Install a Private Package as a requirement in Apache Airflow job

Note

Apache Airflow job is powered by Apache Airflow.
Apache Airflow is an open-source platform used to programmatically create, schedule, and monitor complex data workflows. It allows you to define a set of tasks, called operators, that can be combined into directed acyclic graphs (DAGs) to represent data pipelines.

A python package is a way to organize related Python modules into a single directory hierarchy. A package is typically represented as a directory that contains a special file called init.py. Inside a package directory, you can have multiple Python module files (.py files) that define functions, classes, and variables. In the context of Apache Airflow Job, you can develop you private packages to add custom Apache Airflow operators, hooks, sensors, plugins etc.

In this tutorial, you will create a simple custom operator as a Python package, add it as a requirement in the Apache Airflow job environment, and import the private package as a module within the DAG file.

Develop a custom operator and test with an Apache Airflow Dag

  1. Create a file sample_operator.py and convert it to Private Package. Refer to the guide: Creating a package in python

    from airflow.models.baseoperator import BaseOperator
    
    
    class SampleOperator(BaseOperator):
        def __init__(self, name: str, **kwargs) -> None:
            super().__init__(**kwargs)
            self.name = name
    
        def execute(self, context):
            message = f"Hello {self.name}"
            return message
    
    
  2. Create the Apache Airflow DAG file sample_dag.py to test the operator defined in Step 1.

    from datetime import datetime
    from airflow import DAG
    
     # Import from private package
    from airflow_operator.sample_operator import SampleOperator
    
    
    with DAG(
    "test-custom-package",
    tags=["example"]
    description="A simple tutorial DAG",
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    ) as dag:
        task = SampleOperator(task_id="sample-task", name="foo_bar")
    
        task
    
  3. Create a GitHub Repository containing the sample_dag.py in Dags folder and your private package file. Common file formats include zip, .whl, or tar.gz. Place the file either in the 'Dags' or 'Plugins' folder, as appropriate. Synchronize your Git Repository with Apache Airflow Job or you can use preconfigured repositoryInstall-Private-Package

Add your package as a requirement

Add the package as a requirement under Airflow requirements. Use the format /opt/airflow/git/<repoName>.git/<pathToPrivatePackage>

For example, if your private package is located at /dags/test/private.whl in a GitHub repo, add the requirement /opt/airflow/git/<repoName>.git/dags/test/private.whl to the Airflow environment.

Screenshot showing private package added as requirement.

Quickstart: Create an Apache Airflow Job