Orchestrating workloads is a common challenge in data engineering. Prefect is an excellent tool that simplifies this task with its robust orchestration capabilities. In this guide, I'll walk you through deploying Prefect on AWS ECS and integrating it with Prefect Cloud for front-end management. We’ll also explore using GitHub Actions for CI/CD and Terraform for infrastructure as code.
Architecture
At a high level, our architecture will enable the deployment of new tasks on ECS using Prefect. Workflows are defined using Prefect flows, committed to GitHub, and deployed using GitHub Actions. Prefect Cloud will serve as the user interface for monitoring and management, while AWS ECS provides the compute environment to run the tasks.
Project Structure
Our project comprises three main components:
CI/CD Pipeline: Implemented with GitHub Actions.
Infrastructure: Managed on AWS via Terraform.
Prefect Flows: The actual data processing workflows.
The pipeline has three main steps: "terraform-apply" to provision the AWS resources, "build-prefect-docker" to build and push the Docker image, and "deploy-prefect-flow" to register the flow with Prefect Cloud. In the GitHub repository, we need to create the corresponding secrets and variables for each of these steps.
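Sketched as a GitHub Actions workflow, the pipeline could look like the following. This is an illustrative skeleton, not the full workflow: the action versions, paths, image name, and secret names are assumptions to adapt to your repository.

```yaml
# .github/workflows/deploy.yml -- illustrative sketch
name: deploy
on:
  push:
    branches: [main]

jobs:
  terraform-apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ vars.AWS_REGION }}
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init && terraform apply -auto-approve
        working-directory: infra

  build-prefect-docker:
    needs: terraform-apply
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ vars.AWS_REGION }}
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/prefect-flows:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/prefect-flows:${{ github.sha }}

  deploy-prefect-flow:
    needs: build-prefect-docker
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - run: pip install prefect prefect-aws
      - run: python src/create_deployment.py --name ecs-flow --work_pool_name ecs-pool
        env:
          PREFECT_API_KEY: ${{ secrets.PREFECT_API_KEY }}
          PREFECT_API_URL: ${{ secrets.PREFECT_API_URL }}
```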
Infrastructure on AWS
In this section, we will use Terraform to provision the necessary AWS infrastructure. The following resources will be created:
A VPC with both public and private subnets.
An ECS cluster to run Prefect workloads.
ECR repositories to store Docker images for the Prefect flows and agents.
IAM roles and policies to provide secure access to ECS, ECR, and Secrets Manager.
VPC endpoints for securely accessing AWS services like Secrets Manager and ECR.
The Terraform configuration covers the VPC, the ECS cluster, and the ECR repositories, alongside the IAM roles and VPC endpoints listed above.
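A minimal sketch of what that configuration can look like is below. The resource names, region, and CIDR ranges are placeholders, and the IAM roles, policies, subnet routing, and VPC endpoints are omitted for brevity.

```hcl
# Illustrative Terraform sketch -- adapt names, region, and CIDRs.
provider "aws" {
  region = "eu-west-1"
}

resource "aws_vpc" "prefect" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.prefect.id
  cidr_block = "10.0.1.0/24"
}

resource "aws_ecs_cluster" "prefect" {
  name = "prefect-cluster"
}

resource "aws_ecr_repository" "prefect_flows" {
  name = "prefect-flows"
}

resource "aws_ecr_repository" "prefect_agent" {
  name = "prefect-agent"
}
```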
Building and Pushing the Prefect Agent Docker Image
To deploy Prefect agents on ECS (Elastic Container Service), we need to build a Docker image that contains the Prefect runtime and push it to AWS ECR (Elastic Container Registry). Note that in Prefect 3 the agent role has been replaced by workers; this image is what the ECS work pool will run.
Dockerfile for the Prefect Agent
First, create a file named Dockerfile_agent with the following content:
# Start from the official Prefect 3 base image
FROM prefecthq/prefect:3-latest
# Install the AWS integrations used to run work on ECS
RUN pip install prefect-aws
Build and Push the Docker Image to ECR
Once the Dockerfile_agent is created, build the image and push it to ECR.
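The build-and-push sequence typically looks like the following. The account ID, region, and repository name are placeholders to substitute with your own values; the ECR repository is assumed to already exist (it was created by Terraform above).

```shell
# Placeholders -- substitute your own values.
AWS_ACCOUNT_ID=123456789012
AWS_REGION=eu-west-1
REPO=prefect-agent

# Authenticate Docker against your ECR registry
aws ecr get-login-password --region "$AWS_REGION" \
  | docker login --username AWS --password-stdin \
      "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com"

# Build the image from Dockerfile_agent
docker build -f Dockerfile_agent -t "$REPO:latest" .

# Tag and push it to the ECR repository
docker tag "$REPO:latest" \
  "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$REPO:latest"
docker push "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$REPO:latest"
```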
Dockerfile for the Prefect Flow
To deploy Prefect flows on ECS, we build a second Docker image in the second step of our CI/CD pipeline. This image contains the flow code and its dependencies.
The Dockerfile that will be used to build the image is as follows:
# Use a Python base image
FROM python:3.9-slim
# Set the working directory
WORKDIR /app
# Install Poetry
RUN pip install --no-cache-dir poetry
# Copy the Poetry files and install dependencies
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --only main
# Copy the flow code into the container
COPY src/flows/prefect_flow.py ./
# Set the entry point to run the Prefect flow
CMD ["poetry", "run", "python", "prefect_flow.py"]
Prefect Flows
Prefect flows define the data processing tasks. In this example, we use a basic flow that prints a message to confirm the deployment’s success. You can extend this flow with more complex logic tailored to your needs. The ecs_flow in prefect_flow.py looks like this:
from prefect import flow, task


@task
def say_hello():
    print("Hello, Prefect on ECS!")


@flow
def ecs_flow():
    say_hello()


if __name__ == "__main__":
    ecs_flow()
Creating a Prefect Deployment
The deployment process is handled by the create_deployment.py script, which registers the flow with Prefect Cloud via the flow.deploy() Python API. This ensures that the ECS infrastructure is ready to execute flows at scale (./src/create_deployment.py):
import argparse

from flows.prefect_flow import ecs_flow


def create_deployment(name, work_pool_name, image, build, push):
    # Register the flow with Prefect Cloud, targeting the ECS work pool
    ecs_flow.deploy(
        name,
        work_pool_name=work_pool_name,
        image=image,
        build=build,
        push=push,
    )


if __name__ == "__main__":
    # Parse input parameters
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", required=True, help="Name of the flow")
    parser.add_argument("--work_pool_name", required=True, help="Name of the work pool")
    parser.add_argument("--image", required=False, help="Docker image")
    # type=bool would treat any non-empty string (even "False") as True,
    # so boolean switches are expressed as store_true flags instead
    parser.add_argument("--build", action="store_true", help="Build the image")
    parser.add_argument("--push", action="store_true", help="Push the image")
    args = parser.parse_args()

    # Call create_deployment with the parsed arguments
    create_deployment(args.name, args.work_pool_name, args.image, args.build, args.push)
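A note on the boolean arguments: passing type=bool to argparse is a classic pitfall, because argparse applies bool() to the raw string and any non-empty string is truthy. A minimal illustration using only the standard library:

```python
import argparse

# Pitfall: with type=bool, "--build False" still yields True,
# because bool("False") is True (any non-empty string is truthy).
naive = argparse.ArgumentParser()
naive.add_argument("--build", type=bool, default=False)
print(naive.parse_args(["--build", "False"]).build)  # True

# Fix: express boolean switches as store_true flags.
fixed = argparse.ArgumentParser()
fixed.add_argument("--build", action="store_true")
print(fixed.parse_args([]).build)           # False
print(fixed.parse_args(["--build"]).build)  # True
```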
Conclusion
By integrating Prefect with AWS ECS and using Terraform for infrastructure automation, we've created a highly scalable, flexible orchestration environment for data processing workloads. Prefect Cloud provides a powerful interface to monitor and manage these tasks, while GitHub Actions automates the CI/CD pipeline for smooth deployments.
This setup allows for easy scaling and management of workflows in a real-world data engineering or DevOps environment. Whether you're automating ETL pipelines or orchestrating machine learning tasks, Prefect's versatility makes it a valuable tool in your stack.