Today, AWS announced Amazon Managed Workflows for Apache Airflow (MWAA) Serverless. This is a new deployment option for MWAA that eliminates the operational overhead of managing Apache Airflow environments while optimizing costs through serverless scaling. This new offering addresses key challenges that data engineers and DevOps teams face when orchestrating workflows: operational scalability, cost optimization, and access management.
With MWAA Serverless you can focus on your workflow logic rather than monitoring provisioned capacity. You can now submit your Airflow workflows for execution on a schedule or on demand, paying only for the actual compute time used during each task's execution. The service automatically handles all infrastructure scaling so that your workflows run efficiently regardless of load.
Beyond simplified operations, MWAA Serverless introduces an updated security model for granular control through AWS Identity and Access Management (IAM). Each workflow can now have its own IAM permissions and run on a VPC of your choosing, so you can implement precise security controls without creating separate Airflow environments. This approach significantly reduces security management overhead while strengthening your security posture.
In this post, we demonstrate how to use MWAA Serverless to build and deploy scalable workflow automation solutions. We walk through practical examples of creating and deploying workflows, setting up observability through Amazon CloudWatch, and converting existing Apache Airflow DAGs (Directed Acyclic Graphs) to the serverless format. We also explore best practices for managing serverless workflows and show you how to implement monitoring and logging.
How does MWAA Serverless work?
MWAA Serverless processes your workflow definitions and executes them efficiently in service-managed Airflow environments, automatically scaling resources based on workflow demands. MWAA Serverless uses the Amazon Elastic Container Service (Amazon ECS) executor to run each individual task in its own ECS Fargate container, on either your VPC or a service-managed VPC. These containers then communicate back to their assigned Airflow cluster using the Airflow 3 Task API.

Figure 1: Amazon MWAA Architecture
MWAA Serverless uses declarative YAML configuration files based on the popular open source DAG Factory format to enhance security through task isolation. You have two options for creating these workflow definitions: writing the YAML directly, or writing DAGs in Python and converting them to YAML with the conversion tool described later in this post.
This declarative approach provides two key benefits. First, because MWAA Serverless reads workflow definitions from YAML, it can determine task scheduling without running any workflow code. Second, this allows MWAA Serverless to grant execution permissions only when tasks run, rather than requiring broad permissions at the workflow level. The result is a more secure environment where task permissions are precisely scoped and time limited.
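To illustrate the first benefit, the following minimal sketch derives a task order from a declarative definition without importing or running any workflow code. It is not a description of how the service is implemented: the `workflow` dictionary is a hypothetical stand-in for a parsed YAML file, and `execution_order` is a helper name introduced here for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for a parsed YAML workflow definition.
workflow = {
    "tasks": {
        "list_objects": {"operator": "S3ListOperator"},
        "create_object_list": {
            "operator": "S3CreateObjectOperator",
            "dependencies": ["list_objects"],
        },
    }
}


def execution_order(definition):
    """Return task names in an order that respects declared dependencies."""
    # Map each task to the set of tasks that must finish before it starts.
    graph = {
        name: set(spec.get("dependencies", []))
        for name, spec in definition["tasks"].items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(workflow))
```

Because the dependencies are plain data, the order can be computed by reading the file alone, which is the property the declarative format relies on.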
Service considerations for MWAA Serverless
MWAA Serverless has the following limitations that you should consider when deciding between serverless and provisioned MWAA deployments:
- Operator support
  - MWAA Serverless only supports operators from the Amazon Provider Package.
  - To execute custom code or scripts, you'll need to delegate that work to other AWS services.
- User interface
  - MWAA Serverless operates without the Airflow web interface.
  - For workflow monitoring and management, we provide integration with Amazon CloudWatch and AWS CloudTrail.
Working with MWAA Serverless
Complete the following prerequisites and steps to use MWAA Serverless.
Prerequisites
Before you begin, verify that you have the following requirements in place:
- Access and permissions
  - An AWS account
  - AWS Command Line Interface (AWS CLI) version 2.31.38 or later installed and configured
  - The appropriate permissions to create and modify IAM roles and policies, including the following required IAM permissions:
    - airflow-serverless:CreateWorkflow
    - airflow-serverless:DeleteWorkflow
    - airflow-serverless:GetTaskInstance
    - airflow-serverless:GetWorkflowRun
    - airflow-serverless:ListTaskInstances
    - airflow-serverless:ListWorkflowRuns
    - airflow-serverless:ListWorkflows
    - airflow-serverless:StartWorkflowRun
    - airflow-serverless:UpdateWorkflow
    - iam:CreateRole
    - iam:DeleteRole
    - iam:DeleteRolePolicy
    - iam:GetRole
    - iam:PutRolePolicy
    - iam:UpdateAssumeRolePolicy
    - logs:CreateLogGroup
    - logs:CreateLogStream
    - logs:PutLogEvents
    - airflow:GetEnvironment
    - airflow:ListEnvironments
    - s3:DeleteObject
    - s3:GetObject
    - s3:ListBucket
    - s3:PutObject
    - s3:Sync
  - Access to an Amazon Virtual Private Cloud (VPC) with internet connectivity
- Required AWS services – In addition to MWAA Serverless, you will need access to the following AWS services:
  - Amazon MWAA to access your existing Airflow environment(s)
  - Amazon CloudWatch to view logs
  - Amazon S3 for DAG and YAML file management
  - AWS IAM to control permissions
- Development environment
- Additional requirements
  - Basic familiarity with Apache Airflow concepts
  - Understanding of YAML syntax
  - Knowledge of AWS CLI commands
Note: Throughout this post, we use example values that you'll need to replace with your own:
- Replace amzn-s3-demo-bucket with your S3 bucket name
- Replace 111122223333 with your AWS account number
- Replace us-east-2 with your AWS Region. MWAA Serverless is available in multiple AWS Regions. Check the List of AWS Services Available by Region for current availability.
Creating your first serverless workflow
Let's start by defining a simple workflow that gets a list of S3 objects and writes that list to a file in the same bucket. Create a new file called simple_s3_test.yaml with the following content:
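As a rough sketch of what a definition for this workflow could look like, the Python snippet below renders a YAML document for a given bucket. The key layout (`dag_id`, `tasks`, `dependencies`) and the operator arguments are assumptions modeled on the open source DAG Factory format and the Amazon Provider Package, not a verified MWAA Serverless schema, so check them against the service documentation before use. `render_workflow` is a helper name introduced here for illustration.

```python
from string import Template

# Sketch only: the key layout and operator arguments are assumptions
# modeled on the DAG Factory format, not a verified schema.
WORKFLOW_TEMPLATE = Template("""\
simple_s3_test:
  dag_id: simple_s3_test
  tasks:
    list_objects:
      operator: airflow.providers.amazon.aws.operators.s3.S3ListOperator
      bucket: $bucket
    create_object_list:
      operator: airflow.providers.amazon.aws.operators.s3.S3CreateObjectOperator
      s3_bucket: $bucket
      s3_key: filelist.txt
      data: '{{ ti.xcom_pull(task_ids="list_objects") }}'
      replace: true
      dependencies: [list_objects]
""")


def render_workflow(bucket):
    """Fill in the bucket placeholder and return the YAML text."""
    return WORKFLOW_TEMPLATE.substitute(bucket=bucket)


workflow_yaml = render_workflow("amzn-s3-demo-bucket")
print(workflow_yaml)
```

The printed text can be saved as simple_s3_test.yaml. The second task depends on the first and reads its result through a templated cross-task pull, which is how the object list reaches the file that gets written.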
For this workflow to run, you must create an execution role that has permissions to list and write to the above bucket. The role must also be assumable by MWAA Serverless. The following CLI commands create this role and its associated policy:
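The two policy documents such a role needs can be sketched in Python as follows. The service principal `airflow-serverless.amazonaws.com` is an assumption inferred from the `airflow-serverless` IAM action prefix used in this post, so confirm it in the service documentation. The function names are hypothetical helpers.

```python
import json

BUCKET = "amzn-s3-demo-bucket"  # placeholder bucket name used in this post
# Assumption: principal name inferred from the airflow-serverless IAM prefix.
SERVICE_PRINCIPAL = "airflow-serverless.amazonaws.com"


def trust_policy(service=SERVICE_PRINCIPAL):
    """Trust policy that lets the given service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_list_and_write_policy(bucket=BUCKET):
    """Permissions to list the bucket and write objects into it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(trust_policy(), indent=2))
print(json.dumps(s3_list_and_write_policy(), indent=2))
```

The resulting JSON can be supplied to `aws iam create-role` through `--assume-role-policy-document` and to `aws iam put-role-policy` through `--policy-document`. Note that listing applies to the bucket ARN while writing applies to the objects under it, which is why the two statements use different resources.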
You then copy your YAML DAG to the same S3 bucket, and create your workflow using the Arn value returned when you created the role.
The output of the last command returns a WorkflowARN value, which you then use to run the workflow:
The output returns a RunId value, which you then use to check the status of the workflow run that you just started.
If you need to make a change to your YAML, you can copy it back to S3 and run the update-workflow command.
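Checking the status of a run usually means polling until the run reaches a terminal state. The sketch below keeps the status lookup abstract: `get_status` is any callable you provide that returns the current status string for your RunId, and the terminal status names in `TERMINAL_STATES` are assumptions to be replaced with the values the service actually reports.

```python
import time

# Assumed terminal status names; replace with the values the service reports.
TERMINAL_STATES = {"SUCCESS", "FAILED", "STOPPED"}


def wait_for_run(get_status, poll_seconds=10.0, max_polls=60, sleep=time.sleep):
    """Poll get_status() until a terminal state is seen or polls run out."""
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"run still {status!r} after {max_polls} polls")


# Demonstration with a stand-in for the real status lookup.
fake_statuses = iter(["QUEUED", "RUNNING", "SUCCESS"])
final_status = wait_for_run(lambda: next(fake_statuses), sleep=lambda _: None)
print(final_status)
```

Passing the sleep function as a parameter keeps the helper easy to test, and the poll limit prevents a stuck run from blocking a script indefinitely.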
Converting Python DAGs to YAML format
AWS has published a conversion tool that uses the open-source Airflow DAG processor to serialize Python DAGs into the YAML DAG Factory format. To install it, run the following:
For example, create the following DAG and name it create_s3_objects.py:
Once you have installed python-to-yaml-dag-converter-mwaa-serverless, run:
The output will end with:
The resulting YAML will look like:
Note that, because the YAML conversion is done after DAG parsing, the loop that creates the tasks runs first, and the resulting static list of tasks is written to the YAML document with their dependencies.
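To illustrate why the converted document contains a fixed task list rather than a loop, here is a small stand-in written in plain Python instead of Airflow. The task names and the chaining pattern are hypothetical and are not taken from create_s3_objects.py; the point is only that the loop executes once, at parse time, and what remains is static data.

```python
def build_tasks(count):
    """Mimic a DAG file that creates a chain of tasks inside a loop."""
    tasks = {}
    previous = None
    for index in range(count):
        name = f"create_object_{index}"
        spec = {"s3_key": f"object_{index}.txt"}
        if previous is not None:
            # Each task depends on the one created in the previous iteration.
            spec["dependencies"] = [previous]
        tasks[name] = spec
        previous = name
    return tasks


# After "parsing", only this static mapping is left to serialize.
static_tasks = build_tasks(3)
for name, spec in static_tasks.items():
    print(name, spec)
```

A practical consequence is that changing the loop bounds in the Python file has no effect on a deployed workflow until the DAG is converted again.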
Migrating an MWAA environment's DAGs to MWAA Serverless
You can use a provisioned MWAA environment to develop and test your workflows and then move them to serverless to run efficiently at scale. Further, if your MWAA environment uses only operators that are compatible with MWAA Serverless, you can convert all of the environment's DAGs at once. The first step is to allow MWAA Serverless to assume the MWAA execution role through a trust relationship. This is a one-time operation for each MWAA execution role, and can be performed manually in the IAM console or using an AWS CLI command as follows:
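The trust relationship change amounts to adding one statement to the role's existing trust policy. The sketch below does this without mutating the input and without adding a duplicate if the service is already trusted. The service principal is an assumption inferred from the `airflow-serverless` IAM action prefix, `add_trusted_service` is a hypothetical helper, and the code assumes the policy's `Statement` element is a list.

```python
import copy

# Assumption: principal name inferred from the airflow-serverless IAM prefix.
SERVICE_PRINCIPAL = "airflow-serverless.amazonaws.com"


def _as_list(value):
    """Normalize an IAM field that may be a string or a list of strings."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def add_trusted_service(policy, service=SERVICE_PRINCIPAL):
    """Return a copy of a trust policy that also lets `service` assume the role."""
    updated = copy.deepcopy(policy)
    statements = updated.setdefault("Statement", [])
    for statement in statements:
        principal = statement.get("Principal")
        services = (
            _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        )
        if (
            statement.get("Effect") == "Allow"
            and service in services
            and "sts:AssumeRole" in _as_list(statement.get("Action"))
        ):
            return updated  # already trusted, nothing to add
    statements.append(
        {
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }
    )
    return updated


# Example input shaped like the trust policy of a provisioned MWAA execution role.
existing = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["airflow.amazonaws.com", "airflow-env.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}
merged = add_trusted_service(existing)
print(len(merged["Statement"]))
```

The merged document can then be applied with `aws iam update-assume-role-policy` through its `--policy-document` option. Appending a statement instead of replacing the policy keeps the provisioned environment's existing trust intact.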
Now we can loop through each successfully converted DAG and create a serverless workflow for each.
To see a list of your created workflows, run:
Monitoring and observability
MWAA Serverless workflow execution status is returned by the GetWorkflowRun operation, whose results contain the details for that particular run. If there are errors in the workflow definition, they are returned under RunDetail in the ErrorMessage field, as in the following example:
Workflows that are properly defined, but whose tasks fail, will return "ErrorMessage": "Workflow execution failed":
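The two failure modes described above can be told apart in code. This minimal sketch assumes the GetWorkflowRun response has already been loaded into a Python dictionary and relies only on the `RunDetail` and `ErrorMessage` fields named in this post. The sample responses, including the definition error text, are hypothetical.

```python
TASK_FAILURE_MESSAGE = "Workflow execution failed"


def classify_run(response):
    """Return (kind, message) where kind is 'ok', 'task_failure', or 'definition_error'."""
    detail = response.get("RunDetail") or {}
    message = detail.get("ErrorMessage")
    if not message:
        return "ok", None
    if message == TASK_FAILURE_MESSAGE:
        # The definition was valid, but at least one task failed at run time.
        return "task_failure", message
    # Any other message is treated as a problem with the definition itself.
    return "definition_error", message


# Hypothetical sample responses for illustration only.
samples = [
    {"RunDetail": {}},
    {"RunDetail": {"ErrorMessage": "Workflow execution failed"}},
    {"RunDetail": {"ErrorMessage": "example definition error text"}},
]
for sample in samples:
    print(classify_run(sample))
```

A task failure is the cue to go to the task logs, while a definition error points back to the YAML file.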
MWAA Serverless task logs are stored in the CloudWatch log group /aws/mwaa-serverless/<workflow id>/ (where <workflow id> is the same string as the unique workflow ID in the ARN of the workflow). For specific task log streams, you will need to list the tasks for the workflow run and then get each task's information. You can combine these operations into a single CLI command.
This would result in the following:
At that point, you can use the CloudWatch LogStream output to debug your workflow.
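The log group naming rule described above can be expressed as a small helper. It assumes that the workflow ID is the final path segment of the workflow ARN (the part after the last `/`). The ARN in the example is hypothetical and only illustrates that assumed shape.

```python
LOG_GROUP_PREFIX = "/aws/mwaa-serverless/"


def log_group_for(workflow_arn):
    """Derive the CloudWatch log group name from a workflow ARN."""
    if "/" not in workflow_arn:
        raise ValueError(f"no workflow id segment in ARN: {workflow_arn!r}")
    # Assumption: the workflow id is everything after the last slash.
    workflow_id = workflow_arn.rsplit("/", 1)[-1]
    if not workflow_id:
        raise ValueError(f"empty workflow id in ARN: {workflow_arn!r}")
    return f"{LOG_GROUP_PREFIX}{workflow_id}/"


# Hypothetical ARN used only to show the assumed shape.
example_arn = (
    "arn:aws:airflow-serverless:us-east-2:111122223333:workflow/simple_s3_test-abc123"
)
print(log_group_for(example_arn))
```

Validating the ARN up front gives a clear error instead of a lookup against a log group that cannot exist.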
You can view and manage your workflows in the Amazon MWAA Serverless console:

For an example that creates detailed metrics and a monitoring dashboard using AWS Lambda, Amazon CloudWatch, Amazon DynamoDB, and Amazon EventBridge, review the example in this GitHub repository.
Clean up resources
To avoid incurring ongoing charges, follow these steps to clean up all resources created during this tutorial:
- Delete MWAA Serverless workflows – Run this AWS CLI command to delete all workflows:
- Remove the IAM roles and policies created for this tutorial:
- Remove the YAML workflow definitions from your S3 bucket:
After completing these steps, verify in the AWS Management Console that all resources have been properly removed. Remember that CloudWatch Logs are retained by default and may need to be deleted separately if you want to remove all traces of your workflow executions.
If you encounter any errors during cleanup, verify that you have the necessary permissions and that the resources exist before attempting to delete them. Some resources may have dependencies that require them to be deleted in a specific order.
Conclusion
In this post, we explored Amazon MWAA Serverless, a new deployment option that simplifies Apache Airflow workflow management. We demonstrated how to create workflows using YAML definitions, convert existing Python DAGs to the serverless format, and monitor your workflows.
MWAA Serverless offers several key advantages:
- No provisioning overhead
- Pay-per-use pricing model
- Automatic scaling based on workflow demands
- Enhanced security through granular IAM permissions
- Simplified workflow definitions using YAML
To learn more about MWAA Serverless, review the documentation.
About the authors

