The Complete Hands-On Introduction To Apache Airflow Online Course

"This post contains affiliate links, which means that if you click on them and make a purchase, I may receive a small fee at no extra cost to you."

Apache Airflow is an open-source platform used for creating, scheduling, and monitoring workflows. It has become increasingly popular among data engineers and data scientists for its ability to manage complex data pipelines. As more companies adopt Apache Airflow, the demand for courses to learn this technology has increased. In this article, we will explore some of the best online courses available to help individuals master Apache Airflow.

Here’s a look at the Best Apache Airflow Courses and Certifications Online and what they have to offer for you!

1. The Complete Hands-On Introduction to Apache Airflow by Marc Lamberti (Udemy) (Our Best Pick)

The Complete Hands-On Introduction to Apache Airflow course is designed to provide learners with a comprehensive understanding of how to author, schedule, and monitor data pipelines with Apache Airflow. The course is instructed by Marc Lamberti and is suitable for individuals with a basic understanding of data engineering and programming.

The course begins with an introduction to Apache Airflow, covering the basic concepts and how the platform works. Learners will then delve into the essential views of the Airflow UI and learn how to code their first data pipeline with Airflow. Databases and Executors are also covered in the course materials.

As the course progresses, advanced concepts such as creating plugins and making dynamic pipelines are explored. The course includes detailed and practical examples of how to create Airflow plugins using Elasticsearch and PostgreSQL. The course concludes with a bonus appendix section, which provides additional resources to help learners further their understanding of Apache Airflow. Overall, learners will gain a thorough understanding of Apache Airflow and become proficient in managing complex ETL workflows.

2. Apache Airflow: The Hands-On Guide by Marc Lamberti (Udemy)

This course titled Apache Airflow: The Hands-On Guide is focused on teaching learners how to master Apache Airflow from A to Z. The course is designed to help learners gain hands-on experience with Airflow using AWS, Kubernetes, Docker, and more. Apache Airflow is a platform created by the community to programmatically author, schedule, and monitor workflows. The course is structured into ten sections covering the basic concepts and advanced features of Airflow.

The course begins by explaining the fundamentals of Airflow, including what it is and how the web server and scheduler work. The Forex Data Pipeline project is introduced as a way to learn about many Airflow operators and how to use them with Slack, Spark, Hadoop, and other tools. Mastering DAGs is a top priority in the course, and learners will learn how to play with timezones, unit test their DAGs, and structure their DAG folder.

The course covers how to scale Airflow using different executors such as the Local Executor, the Celery Executor, and the Kubernetes Executor. Learners will discover how to specialise workers, add new workers, and deal with node crashes. Additionally, a three-node Kubernetes cluster will be set up locally with Rancher, Airflow, and the Kubernetes Executor to run data pipelines.

Advanced concepts are demonstrated through practical examples such as templating DAGs, making DAGs dependent on one another, and working with SubDAGs and deadlocks. Monitoring Airflow with Elasticsearch and Grafana is also covered. Security is addressed as well, so that your Airflow instance can comply with your company's requirements: specifying roles and permissions for users with RBAC, preventing unauthorized access to the Airflow UI with authentication and passwords, and encrypting data.

The course includes practical exercises to allow learners to apply what they have learned. Best practices are also highlighted when needed to provide the best ways of using Airflow.

3. Apache Airflow | A Real-Time & Hands-On Course on Airflow by A to Z Mentors (Udemy)

This course, titled Apache Airflow, is a real-time and hands-on training program offered by A to Z Mentors. It covers everything one needs to know about Apache Airflow from basic to advanced level, including how to deploy workflows and data pipelines using Airflow with Docker. Apache Airflow is a platform that provides a way to programmatically author, schedule, and monitor workflows. It is adopted by many companies and is a top-level project of Apache.

The course explains Apache Airflow concepts from scratch to advanced level with real-time implementation. Every concept is explained with hands-on examples, including XComs, Hooks, Pools, SubDAGs, Variables, Connections, Plugins, Adhoc queries, Sensors, and many more. Even the concepts whose explanations are not very clear in Apache Airflow’s official documentation are included.

Exclusive features of the course include data profiling, charts, trigger rules, airflowignore file, zombies, undeads, and latest only operator. It also covers best practices and do’s and don’ts to follow in real-time Airflow projects. Additionally, it includes building a data pipeline of a real-time case study using Airflow. After completing the course, one can start working on any Airflow project with full confidence.

The course also provides add-ons such as answering questions and queries quickly, Airflow codes and datasets used in lectures attached in the course for convenience, and frequent updates with new components of Airflow.

4. Apache Airflow on AWS EKS: The Hands-On Guide by Marc Lamberti (Udemy)

This course, titled Apache Airflow on AWS EKS: The Hands-On Guide, is designed to guide users through the steps of setting up a production-ready architecture for Apache Airflow on AWS EKS. The course is intended for individuals who are already familiar with Airflow and have a basic knowledge of Kubernetes, Docker, and AWS. With over 15,000 students, the course aims to address the difficulties users face when configuring Airflow on AWS with the official Helm chart.

The course will cover various topics, including configuring the EKS cluster, deploying changes automatically with GitOps, using Helm to set up Airflow on Kubernetes, deploying DAGs in Airflow with Git-Sync and AWS EFS, deploying DAGs/Airflow through CI/CD pipelines with AWS CodePipeline, testing DAGs automatically, securing credentials and sensitive data in a Secret Backend, enabling remote logging with AWS S3, creating different environments, and making the production environment scalable and highly available.

It is important to note that the course is not meant to teach the basics of Airflow, and users should already be familiar with it. Additionally, the course is not eligible for the AWS free tier and does not cover how to interact with AWS in DAGs.

The course is divided into several sections, including an introduction, configuring AWS, exploring the DevOps world, creating the EKS cluster with GitOps, deploying Airflow with DAGs, building CI/CD pipelines to deploy Airflow, exposing the Airflow UI, logging with Airflow in AWS EKS, configuring the production environment, and a bonus section.

5. Apache Airflow: The Operators Guide by Marc Lamberti (Udemy)

The course titled Apache Airflow: The Operators Guide is offered by instructor Marc Lamberti. This course aims to help individuals master Airflow Operators and create reliable data pipelines. Airflow has more than 700 Operators and can interact with over 70 tools. Operators are important tasks in a data pipeline that correspond to the different steps required to produce the desired output.

The course offers best practices around Operators and covers topics such as versioning DAGs, properly retrying tasks, creating dependencies between tasks and DAG runs, understanding the owner parameter, taking actions if a task fails, choosing the right way to create DAG dependencies, executing a task only in a specific interval of time, grouping tasks to make DAGs cleaner, and triggering DAGs based on a calendar.

It is important to note that individuals must already have a good understanding of Airflow before taking this course. The course should be considered as an Airflow Operators Reference rather than an introductory course.

The course includes an introduction section followed by sections on the BaseOperator, the most common Operators, choosing a path, DAG dependencies, exotic Operators, and last words. Additionally, the instructor invites individuals to vote for specific Operators they would like to see covered in future videos. Overall, this course is designed for individuals who are ready to step up and take their data pipelines to another level.

6. Apache Airflow: Complete Hands-On Beginner to Advanced Class by Alexandra Abbas (Udemy)

The Apache Airflow: Complete Hands-On Beginner to Advanced Class course is a comprehensive online course that teaches users how to use Apache Airflow, a tool for developing, scheduling, and monitoring complex data pipelines. The course is taught by Alexandra Abbas, an Apache Airflow contributor and Google Cloud Certified Data Engineer & Architect with over three years of experience as a Data Engineer.

The course includes 50 lectures, over four hours of video content, quizzes, coding exercises, and two real-life projects that users can add to their GitHub portfolio. Users will learn how to install and set up Airflow on their machines, develop complex real-life data pipelines, interact with Google Cloud from their Airflow instance, extend Airflow with custom operators and sensors, test Airflow pipelines and operators, monitor their Airflow instance using Prometheus and Grafana, track errors with Sentry, and set up and run Airflow in production.

The course is designed for beginners and does not require any previous knowledge of Apache Airflow, Data Engineering or Google Cloud. The course starts with the basics and progresses step by step through the content.

Upon completion of the course, users will have lifetime access to over 50 lectures, corresponding cheat sheets, datasets, and code base for the lectures. The course is divided into nine sections: Introduction, Getting Started with Apache Airflow, Core Concepts in Apache Airflow, Loading Data to a Data Warehouse, Analysing Data using PySpark, Extending Airflow with Custom Plugins, Testing Airflow DAGs, Airflow in Production, and Finale.

7. Mastering Apache Airflow! Deploy to Kubernetes in AWS by Mihail Petkov (Udemy)

The course Mastering Apache Airflow! Deploy to Kubernetes in AWS is designed to teach individuals how to programmatically author, schedule, and monitor workflows using Apache Airflow. The course instructor is Mihail Petkov. The course begins by covering basic concepts related to Apache Airflow, such as its various components: DAG, Plugin, Operator, Sensor, Hook, XCom, Variable, and Connection. Later in the course, advanced topics such as branching, metrics, performance, and log monitoring, as well as Airflow’s REST API, are discussed.

The course also includes guidance on building a development environment using Docker and Docker Compose, as well as creating a Kubernetes cluster on AWS and deploying the application there. Finally, advanced tips are shared to help turn a simple Airflow project into a production-ready system.

The course content is divided into sections, including an introduction to Apache Airflow, an overview of its scheduler and web server, configuration, and its various components. Additionally, there are sections on creating a simple pipeline, using Apache Airflow in Docker and Docker Compose, and deploying to a Kubernetes cluster in AWS. The course ends with tips on how to build a real-world Apache Airflow application.

8. Apache Airflow using Google Cloud Composer: Introduction by Guha Rajan M., B.Engg, MBA, PMP (Udemy)

This course, titled Apache Airflow Using Google Cloud Composer: Introduction, is instructed by Guha Rajan M. The course description highlights that learners can study Apache Airflow without a local installation. The course focuses on Airflow topics, with Airflow hosted in the cloud through Google Cloud Composer, so learners can understand the product's functionality without the hassle of installing Airflow on their local machines.

Apache Airflow is an open-source platform that enables users to author, schedule, and monitor workflows programmatically. Cloud Composer, on the other hand, is a fully managed workflow orchestration service that allows users to author, schedule, and monitor pipelines across clouds and on-premises data centers. Cloud Composer is built on the popular Apache Airflow project and operated using the Python programming language, making it free from lock-in and easy to use.

With Cloud Composer, pipelines are configured as directed acyclic graphs (DAGs) using Python. This makes it easy for users of any experience level to author and schedule a workflow. The deployment process is simplified, as it yields instant access to a rich library of connectors and multiple graphical representations of workflows in action, increasing pipeline reliability by making troubleshooting easy.
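To make the idea of a directed acyclic graph concrete, here is a minimal sketch using only Python's standard library. It is not Airflow or Cloud Composer code: the task names are hypothetical, and `graphlib.TopologicalSorter` stands in for the scheduler's job of working out an order in which every task runs after the tasks it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

If the dependencies contained a cycle (for example, two tasks that each wait on the other), `static_order()` would raise `graphlib.CycleError`, which is the reason a workflow has to be acyclic in the first place.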

The course is designed specifically for beginners who are first-time users of Cloud Composer or Apache Airflow. The course structure includes presentations that introduce each concept, followed by hands-on demonstrations to reinforce the material. Additionally, the Python DAG programs used in the demonstrations are made available for download so students can practice further.

The course content is structured to provide an overview of Apache Airflow, its directed acyclic graphs (DAGs), and its operators. It discusses the Apache Airflow architecture, Google Cloud Platform, and the use of Cloud Composer to run Apache Airflow.

9. Apache Airflow 2.0 : Complete Distributed Configuration by Ganesh Dhareshwar (Udemy)

This course, titled Apache Airflow 2.0: Complete Distributed Configuration, is instructed by Ganesh Dhareshwar. The course focuses on achieving a distributed Airflow setup using the Celery Executor, which allows over 100 jobs or DAGs to run in parallel at any given time. The course covers the Sequential, Local, and Celery Executors, and involves configuring a few EC2 instances on AWS.

The Airflow community recently released Airflow 2.0, which includes several new features such as an HA Scheduler and significant performance improvements to the scheduler and Celery workers. The instructor has migrated all Airflow 1.x to 2.x in their organization and is adding a new module on Apache Airflow 2.0. The module will cover the new enhancements and HA architecture, the installation of components such as the Webserver, Scheduler, Celery workers, and Flower, and the configuration of multiple schedulers to observe performance.

The course also explores features such as Login, Email alerting, and Logs management. By the end of the course, students will have a distributed Airflow setup that can be shared among multiple teams in their organization. The course is broken down into several sections, including an Introduction, Install Airflow and Web Server walkthrough, Sequential Executor with SQLite, Local Executor with MySQL, Celery Executor with MySQL and RabbitMQ, and Apache Airflow 2.0.
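The difference between running tasks one at a time and running them in parallel can be pictured with a small standard-library sketch. This is only an analogy under stated assumptions, not Airflow's executor API: `run_task`, the task names, and the worker count are all hypothetical, and a thread pool stands in for a set of workers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_task(name):
    """Stand-in for a real task: wait briefly, then report completion."""
    time.sleep(0.1)
    return f"{name} done"


def run_sequential(tasks):
    """Run tasks one after another, as a single-worker setup would."""
    return [run_task(task) for task in tasks]


def run_parallel(tasks, max_workers=4):
    """Run tasks concurrently on a pool of workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_task, tasks))


tasks = ["task_a", "task_b", "task_c", "task_d"]
sequential_results = run_sequential(tasks)
parallel_results = run_parallel(tasks)
print(sequential_results == parallel_results)
```

Both functions produce the same results; the parallel version simply finishes sooner when tasks spend their time waiting. A distributed executor extends the same idea across several machines rather than threads in one process.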

10. Practical Apache Airflow by DevTechie Inc (Udemy)

This course, titled Practical Apache Airflow Course, is offered by DevTechie Inc. The aim of the course is to provide learners with the skills to develop data pipelines and workflows using Apache Airflow in Python. Data engineering is a field that incorporates a range of elements from software engineering, business intelligence, and data warehousing. Data engineering provides the toolbox for extracting value from large datasets.

Data pipeline frameworks play a significant role in managing data collection, munging, and consumption. Apache Airflow, which originated at Airbnb, has become an integral part of their tech stack. As the data infrastructure ecosystem continues to evolve, it becomes ever more important to have a tool like Apache Airflow that can bring everything together in one place where each piece of the puzzle can be orchestrated properly.

The course covers a range of topics, starting from the introduction of Apache Airflow and its architecture. The installation of Airflow and its configuration are also included. Learners will also develop their first data pipeline and learn about DAG chaining, authentication, and log storage to the cloud. Airflow on Docker, REST APIs, and SLAs are also covered in the course. Finally, learners will also learn about the Airflow command line.

The course is designed to help learners achieve feature completeness with Apache Airflow. Learners will not only learn how to set up the environment but also how to create workflow pipelines with real-world examples. By the end of the course, learners will have gained the skills to create, schedule, and monitor data pipelines using Apache Airflow in Python.