Cloud Blog: Scalable alerting for Apache Airflow to improve data orchestration reliability and performance

Source URL: https://cloud.google.com/blog/products/data-analytics/apache-airflow-hierarchy-and-alerting-options-with-cloud-composer/
Source: Cloud Blog

Apache Airflow is a popular tool for orchestrating data workflows. Google Cloud offers Cloud Composer, a fully managed workflow orchestration service built on Apache Airflow that enables you to author, schedule, and monitor pipelines. When running Cloud Composer, a robust logging and alerting setup is important for monitoring your DAGs (Directed Acyclic Graphs) and minimizing downtime in your data pipelines.
In this guide, we review the hierarchy of alerting on Cloud Composer and the alerting options available to Google Cloud engineers using Cloud Composer and Apache Airflow.
Getting started
Hierarchy of alerting on Cloud Composer
Composer environment
Cloud Composer environments are self-contained Airflow deployments based on Google Kubernetes Engine. They work with other Google Cloud services using connectors built into Airflow. 
Cloud Composer provisions the Google Cloud services that run your workflows and all Airflow components. The main components of an environment are the GKE cluster, the Airflow web server, the Airflow database, and the environment's Cloud Storage bucket. For more information, check out Cloud Composer environment architecture.
Alerts at this level primarily concern the performance and health of the cluster and the Airflow components.
Airflow DAG Runs
A DAG Run is an object representing an instantiation of a DAG at a point in time. Any time the DAG is executed, a DAG Run is created and all tasks inside it are executed. The status of the DAG Run depends on the states of its tasks. Each DAG Run executes independently of the others, meaning that you can have many runs of a DAG at the same time.
Alerts at this level primarily consist of DAG Run state changes such as Success and Failure, as well as SLA Misses. Airflow’s Callback functionality can trigger code to send these alerts.
Airflow Task instances
A Task is the basic unit of execution in Airflow. Tasks are arranged into DAGs, and then have upstream and downstream dependencies set between them in order to express the order they should run in. Airflow tasks include Operators and Sensors.
Like Airflow DAG Runs, Airflow Tasks can utilize Airflow Callbacks to trigger code to send alerts. 
Summary
To summarize Airflow’s alerting hierarchy: Google Cloud → Cloud Composer Service → Cloud Composer Environment → Airflow Components (Worker) → Airflow DAG Run → Airflow Task Instance.
Any production-level implementation of Cloud Composer should have alerting and monitoring capabilities at each level in the hierarchy. Our Cloud Composer engineering team has extensive documentation around monitoring and alerting at the service/environment level. 
Airflow Alerting on Google Cloud
Now, let’s consider three options for alerting at the Airflow DAG Run and Airflow Task level. 
Option 1: Log-based alerting policies
Google Cloud offers native tools for logging and alerting within your Airflow environment. Cloud Logging centralizes logs from various sources, including Airflow, while Cloud Monitoring lets you set up alerting policies based on specific log entries or metrics thresholds.
You can configure an alerting policy to notify you whenever a specific message appears in your included logs. For example, if you want to know when an audit log records a particular data-access message, you can get notified when the message appears. These types of alerting policies are called log-based alerting policies. Check out Configure log-based alerting policies | Cloud Logging to learn more.
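As a hedged sketch, a log-based alerting policy created through the Cloud Monitoring API could look like the following (the display names, filter token `DAG_FAILURE_ALERT`, and channel ID are illustrative conventions, not values from the original post; log-based alerts also require a notification rate limit):

```json
{
  "displayName": "Airflow DAG failure alert",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "DAG failure log match",
      "conditionMatchedLog": {
        "filter": "resource.type=\"cloud_composer_environment\" AND severity>=ERROR AND textPayload:\"DAG_FAILURE_ALERT\""
      }
    }
  ],
  "notificationChannels": [
    "projects/PROJECT_ID/notificationChannels/CHANNEL_ID"
  ],
  "alertStrategy": {
    "notificationRateLimit": { "period": "300s" },
    "autoClose": "1800s"
  }
}
```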
These services combine nicely with Airflow's Callback feature mentioned above. To accomplish this:

1. Define a Callback function and set it at the DAG or Task level.

2. In the callback, use Python's native logging library to write a specific log message to Cloud Logging.

3. Define a log-based alerting policy that is triggered by the specific log message and sends alerts to a notification channel.
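Concretely, if the callback in step 2 writes a line containing a distinctive token such as `DAG_FAILURE_ALERT` (our own naming convention), the policy in step 3 could match it with a Cloud Logging filter along these lines (the environment name is illustrative):

```
resource.type="cloud_composer_environment"
resource.labels.environment_name="my-environment"
severity>=ERROR
textPayload:"DAG_FAILURE_ALERT"
```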

Pros and cons
Pros:

Lightweight, minimal setup: no third-party tools, no email server setup, and no additional Airflow providers required

Integration with Logs Explorer and Log-based metrics for deeper insights and historical analysis

Multiple notification channel options

Cons:

Email alerts contain minimal info

Learning curve and overhead for setting up log sinks and alerting policies

Costs associated with Cloud Logging and Cloud Monitoring usage

Sample code
Airflow DAG Callback:
