Apache Airflow Docker Compose: A Comprehensive Guide to Configuration
Installing Docker
Before we dive into the configuration process, we need to ensure that Docker is installed on your system. If you haven’t installed Docker yet, follow these simple steps:
- Visit the official Docker website.
- Download the appropriate installation file for your operating system.
- Install Docker by following the instructions provided on the website.
Once Docker is installed, make sure Docker Compose is available as well (it ships with Docker Desktop); then we can proceed with the Apache Airflow Docker Compose configuration.
Step 1: Creating a New Folder
To get started, let’s create a new folder to organize our Airflow configuration files. Open a terminal or command prompt, navigate to a directory of your choice, and create a new folder called “Airflow” by executing the following commands:
```bash
mkdir Airflow
cd Airflow
```
Step 2: Obtaining the Docker Compose File
Option 1: Using the Command Line
If you prefer using the command line, execute the following command within the newly created Airflow folder:
```bash
curl https://raw.githubusercontent.com/marvinlanhenke/Airflow/main/01GettingStarted/docker-compose.yml -o docker-compose.yml
```
This command will download the Docker Compose file and save it as “docker-compose.yml” in your Airflow folder.
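If curl is not available on your system, the same file can be fetched with a few lines of Python using only the standard library (the URL is the one from the command above):

```python
# Download the example docker-compose.yml without curl (standard library only).
import urllib.request

url = ("https://raw.githubusercontent.com/marvinlanhenke/Airflow/"
       "main/01GettingStarted/docker-compose.yml")
urllib.request.urlretrieve(url, "docker-compose.yml")
print("Saved docker-compose.yml")
```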
Option 2: Manual Creation
Alternatively, you can manually create the “docker-compose.yml” file in the Airflow folder and populate it with the following content:
```yaml
version: '3.4'

x-common:
  &common
  image: apache/airflow:2.3.0
  user: "${AIRFLOW_UID}:0"
  env_file:
    - .env
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - /var/run/docker.sock:/var/run/docker.sock

x-depends-on:
  &depends-on
  depends_on:
    postgres:
      condition: service_healthy
    airflow-init:
      condition: service_completed_successfully

services:
  postgres:
    image: postgres:13
    container_name: postgres
    ports:
      - "5434:5432"
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    env_file:
      - .env

  scheduler:
    <<: *common
    <<: *depends-on
    container_name: airflow-scheduler
    command: scheduler
    restart: on-failure
    ports:
      - "8793:8793"

  webserver:
    <<: *common
    <<: *depends-on
    container_name: airflow-webserver
    restart: always
    command: webserver
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 30s
      retries: 5

  airflow-init:
    <<: *common
    container_name: airflow-init
    entrypoint: /bin/bash
    command:
      - -c
      - |
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
```
The Docker Compose file above specifies the services required to run Airflow, including the scheduler, webserver, metadatabase (PostgreSQL), and the initialization job for the database.
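If you created the file by hand, a quick sanity check that it parses and declares the expected services can be done with a few lines of Python. This is entirely optional and assumes PyYAML is installed on your host:

```python
# Quick sanity check: parse docker-compose.yml and list the declared services.
# Assumes PyYAML is installed (pip install pyyaml); this is only an illustrative
# check, not part of the Airflow setup itself.
import yaml

with open("docker-compose.yml") as f:
    compose = yaml.safe_load(f)

print("Services defined:", ", ".join(compose["services"].keys()))
# Expected: postgres, scheduler, webserver, airflow-init
```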
Step 3: Setting Environment Variables
With the Docker Compose file in place, we need to set up the required environment variables to complete the Airflow installation and configuration. Create a new file called “.env” in the Airflow folder and add the following contents:
```plaintext
# Meta-Database
POSTGRES_USER=airflow
POSTGRES_PASSWORD=airflow
POSTGRES_DB=airflow

# Airflow Core
AIRFLOW__CORE__FERNET_KEY=UKMzEm3yIuFYEq1y3-2FxPNWSVwRASpahmQ9kQfEr8E=
AIRFLOW__CORE__EXECUTOR=LocalExecutor
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=True
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW_UID=0

# Backend DB
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__DATABASE__LOAD_DEFAULT_CONNECTIONS=False

# Airflow Init
_AIRFLOW_DB_UPGRADE=True
_AIRFLOW_WWW_USER_CREATE=True
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=airflow
```
The variables defined above include the credentials for the meta-database (PostgreSQL), Airflow’s core settings (executor, Fernet key, example DAGs), the connection string for the backend database, and the variables the init job uses to upgrade the database and create the default web user.
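Note that the Fernet key above comes straight from the example and is publicly known, so for anything beyond a local test you should generate your own. A minimal sketch, assuming the cryptography package is installed on your host, looks like this:

```python
# Generate a fresh Fernet key for AIRFLOW__CORE__FERNET_KEY.
# Assumes the "cryptography" package is installed (pip install cryptography),
# which is the same library Airflow uses for Fernet encryption.
from cryptography.fernet import Fernet

print(Fernet.generate_key().decode())  # paste this value into your .env file
```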
Step 4: Running Docker Compose
We have now completed the configuration setup, and it’s time to start the Docker containers for Apache Airflow. Open your terminal or command prompt, navigate to the Airflow folder, and execute the following command:
```bash
docker-compose up -d
```
This command starts all the containers defined in the Docker Compose file in detached mode. After a short while, you can access the Airflow web UI by visiting http://localhost:8080 in your web browser. Sign in with the default credentials (airflow:airflow) defined in the .env file to reach the user interface.
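If you prefer to confirm from code that the webserver is up before opening the browser, a small optional sketch, assuming the requests package is installed on your host, can poll the same /health endpoint used by the Compose healthcheck:

```python
# Poll the Airflow webserver's /health endpoint until it responds.
# Assumes "requests" is installed (pip install requests); this is purely an
# optional convenience check, equivalent to the curl-based healthcheck above.
import time
import requests

for attempt in range(30):
    try:
        response = requests.get("http://localhost:8080/health", timeout=5)
        if response.ok:
            print("Airflow webserver is healthy:", response.json())
            break
    except requests.ConnectionError:
        pass  # webserver not accepting connections yet
    time.sleep(10)
else:
    print("Webserver did not become healthy in time; check `docker-compose logs webserver`.")
```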
Congratulations! You have successfully installed and configured Apache Airflow using Docker Compose. Now you can leverage the power of Airflow to streamline your workflow management and automate various tasks efficiently.
Conclusion
With the Compose file, the environment variables, and the running containers in place, you now have a complete local Airflow environment to build on. If you require further assistance or prefer professional guidance, Codeyo Genie is here to help. Our team of experts specializes in resolving web-related errors and optimizing your online presence. Contact us today to elevate your website performance and ensure uninterrupted access for your users.
FAQs
1. How does Docker Compose work?
Docker Compose is a tool that allows you to define and manage multi-container Docker applications. It works by using a YAML file to specify the services, networks, and volumes required for your application. With Docker Compose, you can define the configuration once and then easily spin up all the containers with a single command. It automates the process of creating and connecting containers, making it easier to manage complex applications that rely on multiple services.
2. Can we use Airflow without Docker?
Yes, it is possible to use Apache Airflow without Docker. Airflow is a Python-based platform for workflow automation, and it can be installed and configured directly on a system without Docker. However, using Docker can provide several benefits, such as simplified installation, portability, and isolation of dependencies. Docker allows you to package Airflow and its dependencies into a container, making it easier to manage and deploy across different environments.
3. Does Airflow require coding?
Yes, Apache Airflow does require coding. Airflow uses Python as its primary programming language for defining and orchestrating workflows. To create a workflow, you write Python code using Airflow’s DAG-authoring API, which revolves around DAGs, tasks, and operators; the code defines the dependencies and relationships between tasks, allowing you to build complex workflows. While basic workflows can be assembled from pre-built operators, more advanced ones may require custom Python code. A minimal DAG might look like the sketch below.
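For illustration only, here is a minimal sketch of a DAG file for the Airflow 2.x image used above; the file name, DAG ID, and task IDs are purely examples:

```python
# dags/hello_airflow.py -- a minimal example DAG (names are illustrative).
# Place it in the ./dags folder mounted into the containers above.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    say_hello = BashOperator(task_id="say_hello", bash_command="echo 'Hello, Airflow!'")
    show_date = BashOperator(task_id="show_date", bash_command="date")

    say_hello >> show_date  # show_date runs only after say_hello succeeds
```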
4. What are the two types of Docker?
Docker provides two main types of containers: Linux containers and Windows containers.
Linux containers: These containers run on Linux-based operating systems and are the most commonly used type. They rely on the host system’s Linux kernel and offer lightweight, isolated environments for running applications. Linux containers are highly portable and widely supported across different platforms and cloud providers.
Windows containers: These containers are designed to run on Windows-based operating systems. They use the Windows kernel and provide a separate, isolated environment for running Windows applications. Windows containers allow developers to package and deploy Windows applications consistently across different environments.
5. What is Docker Airflow?
Docker Airflow refers to the usage of Docker to deploy and run Apache Airflow. Docker provides a convenient way to package Airflow and its dependencies into a containerized environment, ensuring consistency and reproducibility across different systems. With Docker Airflow, you can easily set up Airflow and its required services, such as databases and web servers, using Docker Compose or other container orchestration tools. Docker simplifies the installation process and allows for easier management and scalability of Airflow deployments.