Local Development Setup for MWAA with PyCharm
Originally published on an external platform.
Introduction
This blog is about setting up a local development environment for MWAA (Managed Workflows for Apache Airflow) using PyCharm. It will cover the steps to configure PyCharm to run and debug Airflow DAGs locally, mimicking the MWAA environment as closely as possible.
Why Local Development for MWAA?
Developing Airflow DAGs directly on MWAA can be slow and cumbersome. A local setup allows for faster iteration, easier debugging, and a more efficient development workflow. It also helps in catching errors early before deploying to the managed service.
Prerequisites
Before you begin, ensure you have the following installed:
- Docker and Docker Compose: MWAA environments are containerized, so Docker is essential for local replication.
- Python 3.8+: Airflow requires a compatible Python version.
- PyCharm (Community or Professional): Your IDE of choice for development.
- Git: For version control.
Step 1: Set up a Local Airflow Environment with Docker Compose
MWAA uses Apache Airflow, which can be run locally using Docker Compose. This setup will include an Airflow scheduler, webserver, and a PostgreSQL database.
- Create a `docker-compose.yaml` file:

  ```yaml
  version: "3.8"
  services:
    postgres:
      image: postgres:13
      environment:
        - POSTGRES_USER=airflow
        - POSTGRES_PASSWORD=airflow
        - POSTGRES_DB=airflow
      ports:
        - "5432:5432"
      volumes:
        - ./pgdata:/var/lib/postgresql/data

    airflow-webserver:
      build:
        context: .
        dockerfile: Dockerfile
      command: webserver
      ports:
        - "8080:8080"
      environment:
        - AIRFLOW_HOME=/opt/airflow
        - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
        - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      volumes:
        - ./dags:/opt/airflow/dags
        - ./plugins:/opt/airflow/plugins
        - ./logs:/opt/airflow/logs
      depends_on:
        - postgres

    airflow-scheduler:
      build:
        context: .
        dockerfile: Dockerfile
      command: scheduler
      environment:
        - AIRFLOW_HOME=/opt/airflow
        - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
        - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      volumes:
        - ./dags:/opt/airflow/dags
        - ./plugins:/opt/airflow/plugins
        - ./logs:/opt/airflow/logs
      depends_on:
        - postgres
  ```

  Note that the Airflow 2.x image serves the RBAC UI with password authentication by default, so no extra webserver auth settings are needed here.

- Create a `Dockerfile` for Airflow:

  ```dockerfile
  FROM apache/airflow:2.2.3

  USER airflow
  WORKDIR /opt/airflow

  COPY requirements.txt .
  RUN pip install --no-cache-dir -r requirements.txt
  ```

- Create a `requirements.txt` file. This file should contain any Python packages your DAGs depend on. For MWAA, you will likely need `apache-airflow-providers-amazon`:

  ```
  apache-airflow-providers-amazon==3.2.0
  ```

- Initialize Airflow and start the services. The database must be initialized before the webserver and scheduler come up:

  ```bash
  mkdir -p dags plugins logs pgdata
  docker-compose up -d postgres
  docker-compose run --rm airflow-webserver airflow db init
  docker-compose run --rm airflow-webserver airflow users create \
    --username admin --firstname admin --lastname admin \
    --role Admin --email admin@example.com --password admin
  docker-compose up -d airflow-webserver airflow-scheduler
  ```

  Access the Airflow UI at `http://localhost:8080` with `admin`/`admin`.
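The `AIRFLOW__CORE__...` entries in the compose file follow Airflow's configuration-from-environment convention: any option in `airflow.cfg` can be overridden by a variable named `AIRFLOW__<SECTION>__<KEY>`. As a quick illustration, here is a tiny helper of my own (not part of Airflow's API) that builds those names:

```python
# Hedged helper, not part of the setup above: Airflow resolves any config
# option from an environment variable named AIRFLOW__<SECTION>__<KEY>,
# which is the convention the docker-compose environment entries follow.
def airflow_env_var(section: str, key: str) -> str:
    """Return the environment variable name Airflow checks for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"

print(airflow_env_var("core", "sql_alchemy_conn"))
# AIRFLOW__CORE__SQL_ALCHEMY_CONN
```

This is handy when you need to override an option locally that MWAA sets for you in its console.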
Step 2: Configure PyCharm for Local Development
Now, let’s integrate this local Airflow environment with PyCharm.
- Open your project in PyCharm.
- Configure Python Interpreter:
  - Go to `File > Settings/Preferences > Project: [Your Project Name] > Python Interpreter`.
  - Click the gear icon and select `Add...`.
  - Choose `Docker Compose`.
  - Select your `docker-compose.yaml` file.
  - Choose the `airflow-webserver` service (or `airflow-scheduler`) as the interpreter. PyCharm will inspect the container and set up the remote interpreter.
  - Click `OK` and then `Apply`.
- Map Volumes: Ensure your local `dags` and `plugins` folders are correctly mapped to `/opt/airflow/dags` and `/opt/airflow/plugins` in the Docker Compose configuration. PyCharm should detect these automatically, but verify under `Settings/Preferences > Build, Execution, Deployment > Docker > [Your Docker Compose configuration] > Volume Bindings`.
- Create a Run/Debug Configuration for DAGs. To run or debug a specific DAG file:
  - Go to `Run > Edit Configurations...`.
  - Click the `+` icon and select `Python`.
  - Name: `Debug DAG: [Your DAG Name]`
  - Script path: navigate to your DAG file (e.g., `dags/my_dag.py`).
  - Python interpreter: select the Docker Compose interpreter you configured earlier.
  - Working directory: set this to your project root or the `dags` directory.
  - Environment variables: add `AIRFLOW_HOME=/opt/airflow` and any other Airflow-specific environment variables your DAGs rely on.
  - Before launch: optionally add a `Run External tool` step that executes `docker-compose up -d`, so the Docker Compose services are running before the configuration starts.
- Debugging a DAG: With the run configuration set up, you can now place breakpoints in your DAG file and run the configuration in debug mode. PyCharm will execute the DAG within the Docker container, allowing you to step through your code.
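One habit that makes breakpoint debugging much easier (a suggestion of mine, not something the setup above requires) is keeping task logic in plain Python functions rather than inline in operators, so you can run and debug them directly without a scheduler. The function and values below are hypothetical:

```python
# Hypothetical example: business logic kept separate from the operator,
# so it can be debugged directly in PyCharm without Airflow running.
def transform_record(record: dict) -> dict:
    """Normalize a record's name field; a PythonOperator could call this."""
    return {**record, "name": record["name"].strip().lower()}

if __name__ == "__main__":
    # Place a breakpoint here and run this file with the Docker Compose
    # interpreter; no scheduler or executor is involved.
    print(transform_record({"name": "  Alice "}))
    # {'name': 'alice'}
```

The DAG file then only wires the function into an operator, keeping the debuggable surface small.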
Step 3: Mimicking MWAA Specifics (Optional but Recommended)
MWAA has some specific configurations and behaviors you might want to replicate locally.
- MWAA Environment Variables: MWAA injects several environment variables into your Airflow tasks (e.g., `AIRFLOW_CONN_AWS_DEFAULT`, `AWS_REGION`). You can simulate these by adding them to your PyCharm run configurations or by creating a `.env` file that Docker Compose can load.

  Example `.env` file:

  ```
  AWS_REGION=us-east-1
  AIRFLOW_CONN_AWS_DEFAULT=aws://AKIA...:SECRET...@
  ```

  Then, in your `docker-compose.yaml`, add `env_file: .env` to your Airflow services.
- Custom Plugins: If you use custom Airflow plugins in MWAA, ensure they are placed in your local `plugins` directory, which is volume-mounted into the Airflow container.
- Python Version Consistency: Ensure the Python version in your `Dockerfile` matches the Python version of your MWAA environment to avoid unexpected compatibility issues.
- Airflow Version: Use the same Airflow version in your `Dockerfile` as your MWAA environment.
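Airflow parses `AIRFLOW_CONN_*` variables as URIs, so any URI-reserved characters in a secret key must be percent-encoded. A minimal sketch of building such a value (the helper name and credentials are made up; real keys should come from a secrets store, never from source):

```python
from urllib.parse import quote

# Hedged sketch: Airflow reads connections from AIRFLOW_CONN_<CONN_ID>
# variables whose value is a URI. The credentials here are placeholders.
def aws_conn_uri(access_key_id: str, secret_access_key: str) -> str:
    """Build a URI value suitable for AIRFLOW_CONN_AWS_DEFAULT."""
    # quote with safe="" percent-encodes '/', '+', and other reserved
    # characters that commonly appear in AWS secret keys.
    return f"aws://{quote(access_key_id, safe='')}:{quote(secret_access_key, safe='')}@"

print(aws_conn_uri("AKIAEXAMPLE", "abc/def+ghi"))
# aws://AKIAEXAMPLE:abc%2Fdef%2Bghi@
```

Without the encoding, a secret containing `/` or `+` produces a connection URI that Airflow cannot parse correctly.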
Example DAG for Local Testing
Create a simple DAG in `dags/my_test_dag.py`:

```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG(
    dag_id='my_test_dag',
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
    tags=['example'],
) as dag:
    start_task = BashOperator(
        task_id='start_task',
        bash_command='echo "Starting DAG!"',
    )
    hello_task = BashOperator(
        task_id='hello_world',
        bash_command='echo "Hello from Airflow!"',
    )
    end_task = BashOperator(
        task_id='end_task',
        bash_command='echo "DAG finished!"',
    )

    start_task >> hello_task >> end_task
```
After placing this file, it should appear in your local Airflow UI. You can then trigger it manually.
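As a final safeguard, you can catch syntax errors before the scheduler even scans the folder. This is a lightweight pre-check of my own using only the standard library; it does not replace the import errors Airflow's UI reports, which remain the authoritative check:

```python
import ast
from pathlib import Path

# Hedged pre-check (my addition, not Airflow's DAG import mechanism):
# parse every .py file under the dags directory and report syntax errors.
def dag_syntax_errors(dags_dir):
    """Return 'file: message' strings for DAG files that fail to parse."""
    errors = []
    for path in sorted(Path(dags_dir).glob("*.py")):
        try:
            ast.parse(path.read_text(), filename=str(path))
        except SyntaxError as exc:
            errors.append(f"{path.name}: {exc.msg} (line {exc.lineno})")
    return errors
```

Running this in a pre-commit hook or CI step catches the most basic class of DAG breakage before anything reaches MWAA.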
Conclusion
Setting up a robust local development environment for MWAA with PyCharm significantly enhances productivity and reduces the feedback loop. By leveraging Docker Compose, you can closely mimic the MWAA runtime, allowing for efficient development and debugging of your Airflow DAGs. This setup provides the best of both worlds: the power of a managed service like MWAA and the flexibility of local development.
Happy DAGging!