forked from datahub-project/datahub
Azkaban Execution
SunZhaonan edited this page Feb 16, 2016 · 4 revisions
Collect Azkaban execution information, including flow and job definitions, DAGs, executions, owners, and schedules.
List of properties required for the ETL process:
| configuration key | description |
|---|---|
| az.db.driver | Azkaban database driver, e.g., com.mysql.jdbc.Driver |
| az.db.jdbc.url | Azkaban database JDBC URL (not including username and password), e.g., jdbc:mysql://localhost:3306/azkaban |
| az.db.password | Azkaban database password |
| az.db.username | Azkaban database username |
| az.exec_etl.lookback_period.in.minutes | Lookback period, in minutes, for collecting executions |
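As an illustration of how these keys fit together, the sketch below parses them from `key=value` text and turns the lookback period into a cutoff timestamp. The page does not say how the properties are stored, so the text format, the helper names `parse_properties` and `lookback_cutoff`, and the placeholder values are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# The five keys listed in the table above.
REQUIRED_KEYS = (
    "az.db.driver",
    "az.db.jdbc.url",
    "az.db.username",
    "az.db.password",
    "az.exec_etl.lookback_period.in.minutes",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in props]
    if missing:
        raise KeyError("missing properties: " + ", ".join(missing))
    return props


def lookback_cutoff(props, now):
    """Return the earliest timestamp the execution ETL would consider."""
    minutes = int(props["az.exec_etl.lookback_period.in.minutes"])
    return now - timedelta(minutes=minutes)


# Placeholder values for illustration only, not real credentials.
EXAMPLE = """
az.db.driver=com.mysql.jdbc.Driver
az.db.jdbc.url=jdbc:mysql://localhost:3306/azkaban
az.db.username=azkaban_reader
az.db.password=change-me
az.exec_etl.lookback_period.in.minutes=90
"""

props = parse_properties(EXAMPLE)
cutoff = lookback_cutoff(props, datetime(2016, 2, 16, 12, 0, tzinfo=timezone.utc))
print(cutoff.isoformat())
```

Failing fast on a missing key keeps a misconfigured job from reaching the database step with partial settings.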
Major related file: AzkabanExtract.py
Connect to the Azkaban MySQL database, collect metadata, and store it in a local file.
Major source tables from Azkaban database: project_flows, execution_flows, triggers, project_permissions
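The extract step can be pictured roughly as follows. This is a minimal sketch, not the contents of AzkabanExtract.py: the column names on `execution_flows`, the filter on `start_time`, the JSON-lines output, and the helper names are assumptions, and an in-memory SQLite database stands in for the Azkaban MySQL database so the example runs without a server.

```python
import json
import os
import sqlite3
import tempfile


def fetch_recent_executions(conn, cutoff_ms):
    """Return executions that started at or after cutoff_ms as dicts."""
    cur = conn.cursor()
    # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
    cur.execute(
        "SELECT exec_id, project_id, flow_id, status, start_time, end_time "
        "FROM execution_flows WHERE start_time >= ? ORDER BY exec_id",
        (cutoff_ms,),
    )
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def write_json_lines(records, path):
    """Write one JSON object per line to a local file."""
    with open(path, "w") as out:
        for record in records:
            out.write(json.dumps(record, sort_keys=True) + "\n")


# Stand-in database with a hypothetical, simplified execution_flows table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE execution_flows (exec_id INTEGER, project_id INTEGER, "
    "flow_id TEXT, status INTEGER, start_time INTEGER, end_time INTEGER)"
)
conn.executemany(
    "INSERT INTO execution_flows VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 10, "daily_load", 50, 1000, 2000),  # older than the cutoff
        (2, 10, "daily_load", 50, 9000, 9500),  # inside the lookback window
    ],
)

records = fetch_recent_executions(conn, cutoff_ms=5000)
out_path = os.path.join(tempfile.mkdtemp(), "executions.json")
write_json_lines(records, out_path)
```

Against a real Azkaban database the connection would come from a MySQL driver configured with the `az.db.*` properties, and the query placeholders would change accordingly.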
Major related files: AzkabanTransform.py, SchedulerTransform.py
Transform the JSON output into CSV format.
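A JSON-to-CSV transform of this kind might look like the sketch below. The shape of the flow JSON (`flow_id`, `nodes`, `edges`), the CSV column names, and the helper names are hypothetical; the actual layout is whatever AzkabanTransform.py and SchedulerTransform.py expect.

```python
import csv
import io
import json


def flow_to_rows(flow):
    """Flatten one flow definition into job rows and DAG edge rows."""
    flow_id = flow["flow_id"]
    job_rows = [
        (flow_id, node["id"], node.get("type", ""))
        for node in flow.get("nodes", [])
    ]
    edge_rows = [
        (flow_id, edge["source"], edge["target"])
        for edge in flow.get("edges", [])
    ]
    return job_rows, edge_rows


def rows_to_csv(header, rows):
    """Render a header and rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical extract output for a two-job flow.
raw = json.dumps({
    "flow_id": "daily_load",
    "nodes": [
        {"id": "extract", "type": "command"},
        {"id": "publish", "type": "command"},
    ],
    "edges": [{"source": "extract", "target": "publish"}],
})

job_rows, edge_rows = flow_to_rows(json.loads(raw))
jobs_csv = rows_to_csv(("flow_id", "job_name", "job_type"), job_rows)
dag_csv = rows_to_csv(("flow_id", "source_job", "target_job"), edge_rows)
print(jobs_csv)
print(dag_csv)
```

Using the `csv` module rather than joining strings by hand keeps quoting correct when a job name contains a comma or a quote.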
Major related files: AzkabanLoad.py, SchedulerLoad.py
Load the transformed data into the MySQL database. Major related tables: flow, flow_job, flow_dag, flow_schedule, flow_owner_permission, flow_execution, job_execution
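As a rough picture of the load step, the sketch below reads CSV text and inserts it into one of the target tables through a DB-API cursor. It is not the logic of AzkabanLoad.py: the `flow_job` column names and the `load_csv` helper are assumptions, and SQLite again stands in for MySQL so the example is runnable as written.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, columns, csv_text):
    """Insert CSV rows into `table`; return the number of rows loaded.

    `table` and `columns` are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(record[c] for c in columns) for record in reader]
    # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders
    )
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Stand-in target table with hypothetical columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flow_job (flow_id TEXT, job_name TEXT, job_type TEXT)"
)

csv_text = (
    "flow_id,job_name,job_type\n"
    "daily_load,extract,command\n"
    "daily_load,publish,command\n"
)
loaded = load_csv(conn, "flow_job", ("flow_id", "job_name", "job_type"), csv_text)
stored = conn.execute(
    "SELECT job_name FROM flow_job ORDER BY job_name"
).fetchall()
```

For large files a bulk-load path is usually faster than row inserts; this version favors clarity over throughput.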