Authorizing GitHub OAuth with Apache Airflow

Allan Dawson
Full Stack Developer

This technical walkthrough shows, step by step, how to set up GitHub OAuth login for Apache Airflow.

Table of Contents

  1. What is Apache Airflow
  2. GitHub Token
  3. GitHub OAuth
  4. Airflow Config

What is Apache Airflow

Apache Airflow is an open-source workflow management platform created by the community to programmatically author, schedule, and monitor workflows. Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers.

Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of “configuration as code”. While other “configuration as code” workflow platforms exist using markup languages like XML, using Python allows developers to import libraries and classes to help them create their workflows.

Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python and then Airflow manages the scheduling and execution. DAGs can be run either on a defined schedule (e.g. hourly or daily) or based on external event triggers (e.g. a file appearing in Hive). Previous DAG-based schedulers like Oozie and Azkaban tended to rely on multiple configuration files and file system trees to create a DAG, whereas in Airflow, DAGs can often be written in one Python file.
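To make the DAG idea concrete, here is a minimal sketch of how task dependencies determine a run order. It deliberately uses only Python's standard library (graphlib, Python 3.9+) rather than Airflow's own API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Every task appears after the tasks it depends on. If the dependencies contained a cycle, TopologicalSorter would raise graphlib.CycleError, which is why the graph has to be acyclic.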

GitHub Token

Personal access tokens (PATs) are an alternative to using passwords for authentication to GitHub when using the GitHub API or the command line.

Create a personal access token at https://github.com/settings/tokens with the read:enterprise, read:org, and user scopes.

To get your team ID, call https://api.github.com/orgs/<org-name>/teams with your token. The response is a JSON array with one object per team.

The value of the id key in the object for your team is your team ID.
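The lookup can also be scripted. The following is a minimal sketch using only the standard library; the organization name, team slug, and sample payload are hypothetical, and the sample is trimmed to the few fields used here (real team objects carry many more).

```python
import json
import urllib.request


def teams_request(org, token):
    """Build an authenticated request for the organization's team list."""
    url = f"https://api.github.com/orgs/{org}/teams"
    headers = {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }
    return urllib.request.Request(url, headers=headers)


def team_id_for(teams, slug):
    """Return the id of the team with the given slug, or None if absent."""
    for team in teams:
        if team.get("slug") == slug:
            return team["id"]
    return None


def fetch_team_id(org, token, slug):
    """Call the API and look up one team (performs a network request)."""
    with urllib.request.urlopen(teams_request(org, token)) as resp:
        return team_id_for(json.load(resp), slug)


# Hypothetical, trimmed sample of the JSON array the endpoint returns.
sample = json.loads('[{"id": 1234567, "name": "Data Eng", "slug": "data-eng"}]')
print(team_id_for(sample, "data-eng"))
```

With a real token you would call fetch_team_id("<org-name>", token, "<team-slug>"). The endpoint is paginated, so an organization with many teams may need additional pages that this sketch does not fetch.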

GitHub OAuth

GitHub’s OAuth implementation supports the standard authorization code grant type.

To authorize your OAuth app, consider which authorization flow best fits your app.

  • web application flow: used to authorize users for standard OAuth apps that run in the browser (the implicit grant type is not supported).
  • device flow: used for headless apps, such as CLI tools.

We will use the web application flow to authorize users for our Airflow instance:

  • Users are redirected to request their GitHub identity
  • Users are redirected back to your site by GitHub
  • Your app accesses the API with the user’s access token

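Create an OAuth app at https://github.com/settings/developers. Set its authorization callback URL to your Airflow base URL plus the oauth_callback_route you will configure in Airflow (/home in this walkthrough). GitHub then gives you a: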

  • Client ID
  • Client Secret
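Airflow's auth backend performs the web application flow for you, but it can help to see what the first two steps look like on the wire. The sketch below builds the authorize URL and the code-for-token request against GitHub's documented OAuth endpoints. The redirect URI, scope string, and placeholder credentials are assumptions for illustration, and nothing is sent over the network.

```python
import secrets
import urllib.parse
import urllib.request

AUTHORIZE_URL = "https://github.com/login/oauth/authorize"
TOKEN_URL = "https://github.com/login/oauth/access_token"


def build_authorize_url(client_id, redirect_uri, state, scope="read:org user:email"):
    """Step 1: the URL the user's browser is redirected to."""
    query = urllib.parse.urlencode({
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # random value, checked again on the callback
    })
    return f"{AUTHORIZE_URL}?{query}"


def build_token_request(client_id, client_secret, code):
    """Step 2: exchange the code GitHub sent back for an access token."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
    }).encode()
    # Supplying a body makes urllib issue a POST request.
    return urllib.request.Request(
        TOKEN_URL, data=body, headers={"Accept": "application/json"}
    )


# Hypothetical values for illustration only.
state = secrets.token_urlsafe(16)
url = build_authorize_url("<YOUR_CLIENT_ID>", "http://localhost:8080/home", state)
token_req = build_token_request("<YOUR_CLIENT_ID>", "<YOUR_CLIENT_SECRET>", "code-from-callback")
print(url)
```

The state value guards against cross-site request forgery: the app should reject a callback whose state does not match the one it generated.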

Airflow Config

The first time you run Airflow, it creates a file called airflow.cfg in your $AIRFLOW_HOME directory (~/airflow by default). This file contains Airflow’s configuration, and you can edit it to change any of the settings. Add the client ID, client secret, and team ID gathered so far to airflow.cfg. The backend module path used here is the one shipped with Airflow 1.10.x.

airflow.cfg

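[webserver]
# Authentication must be switched on for auth_backend to take effect
authenticate = True
auth_backend = airflow.contrib.auth.backends.github_enterprise_auth

[github_enterprise]
api_rev = v3
host = github.com
client_id = <YOUR_CLIENT_ID>
client_secret = <YOUR_CLIENT_SECRET>
oauth_callback_route = /home
# One team ID, or a comma-separated list of team IDs
allowed_teams = <YOUR_TEAM_ID>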

Now when you start the Airflow webserver and open the UI, you are redirected to the GitHub sign-in page. After you log in, GitHub sends you back to Airflow, and if you belong to one of the allowed teams, you land on your Airflow application page.
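If the redirect does not happen, a quick sanity check of the settings can narrow things down. This is a hypothetical helper, not part of Airflow: a minimal sketch that parses the config text with the standard library's configparser and reports keys in the [github_enterprise] section that are missing or still hold a <PLACEHOLDER> value.

```python
import configparser

REQUIRED_KEYS = (
    "host",
    "client_id",
    "client_secret",
    "oauth_callback_route",
    "allowed_teams",
)


def check_github_auth(cfg_text):
    """Return a list of problems found in the [github_enterprise] section."""
    # interpolation=None so that '%' characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("github_enterprise"):
        return ["missing [github_enterprise] section"]
    section = parser["github_enterprise"]
    problems = []
    for key in REQUIRED_KEYS:
        value = section.get(key, "").strip()
        if not value:
            problems.append(f"{key} is not set")
        elif value.startswith("<"):
            problems.append(f"{key} still holds a placeholder")
    return problems


# Hypothetical config text; client_secret is deliberately left as a placeholder.
example = """
[github_enterprise]
api_rev = v3
host = github.com
client_id = abc123
client_secret = <YOUR_CLIENT_SECRET>
oauth_callback_route = /home
allowed_teams = 1234567
"""
print(check_github_auth(example))
```

To check a real installation, read the text of $AIRFLOW_HOME/airflow.cfg and pass it to check_github_auth.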

Conclusion

Implementing GitHub OAuth in Airflow, when your developers already use GitHub for their projects, has two benefits. First, developers do not need a separate login credential for Apache Airflow. Second, you can control which team(s) in your organization can access the Airflow application. This approach keeps authentication simple while relying on tools your developers already know.

About The Author

Allan Dawson

Hi, my name is Allan Dawson, a Full Stack Developer at Indellient. I try to automate everything I can, it is kinda my hobby at times. I am always learning new technologies and constantly improving myself.