FinFlow Analytics is a modern data pipeline that processes financial transaction data using Apache Airflow, Google Cloud Platform (GCP), and dbt. The project implements a complete data warehouse solution for financial data analytics.
The architecture consists of three main components:
- **Data Generation Layer**
  - Apache Airflow hosted on a GCP Compute Engine VM using Docker
  - Generates synthetic financial data including customers, accounts, transactions, and more
  - Orchestrates the entire data pipeline
- **Data Storage Layer**
  - Google Cloud Storage (GCS) for raw data storage
  - BigQuery for data warehousing
  - Handles both raw and transformed data
- **Data Transformation Layer**
  - dbt for data modeling and transformations
  - Source code available at finflow-analytics-dbt
  - Implements a modern data warehouse model
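To make the generation layer concrete, here is a minimal sketch of the kind of synthetic records it might produce. Field names, value ranges, and record counts are illustrative assumptions, not the project's actual schema:

```python
# Hypothetical sketch of synthetic financial data generation.
# Field names and distributions are illustrative only.
import random
import uuid
from datetime import datetime, timedelta

def generate_customers(n):
    """Generate n synthetic customer records."""
    base = datetime(2023, 1, 1)
    return [
        {
            "customer_id": str(uuid.uuid4()),
            "name": f"customer_{i}",
            "signup_date": (base + timedelta(days=random.randint(0, 364))).date().isoformat(),
        }
        for i in range(n)
    ]

def generate_transactions(customers, n):
    """Generate n synthetic transactions, each linked to an existing customer."""
    return [
        {
            "transaction_id": str(uuid.uuid4()),
            "customer_id": random.choice(customers)["customer_id"],
            "amount": round(random.uniform(-500.0, 500.0), 2),
            "currency": "USD",
        }
        for _ in range(n)
    ]

customers = generate_customers(10)
transactions = generate_transactions(customers, 50)
```

Keeping foreign keys (here, `customer_id`) consistent between generated dimensions and facts is what makes the downstream dimensional model joinable.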
For the service account, you need to grant the following roles:

- `roles/storage.objectViewer` - read access to GCS objects
- `roles/storage.objectCreator` - create new GCS objects
- `roles/storage.admin` - full access to GCS buckets and objects
- `roles/bigquery.dataEditor` - read/write access to BigQuery data
- `roles/bigquery.jobUser` - permission to run BigQuery jobs
- `roles/bigquery.dataOwner` - full access to BigQuery datasets and tables
- `roles/compute.viewer` - view Compute Engine resources
- `roles/logging.viewer` - view logs
- `roles/monitoring.viewer` - view monitoring data
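One way to grant these roles is with `gcloud projects add-iam-policy-binding`. As a sketch, a small helper that builds those commands (the project ID and service-account email passed in are placeholders you would replace with your own):

```python
# Roles listed in this README; the helper only builds gcloud command strings,
# it does not execute them.
ROLES = [
    "roles/storage.objectViewer",
    "roles/storage.objectCreator",
    "roles/storage.admin",
    "roles/bigquery.dataEditor",
    "roles/bigquery.jobUser",
    "roles/bigquery.dataOwner",
    "roles/compute.viewer",
    "roles/logging.viewer",
    "roles/monitoring.viewer",
]

def grant_commands(project_id, sa_email):
    """Build one gcloud IAM binding command per role."""
    return [
        f"gcloud projects add-iam-policy-binding {project_id} "
        f"--member=serviceAccount:{sa_email} --role={role}"
        for role in ROLES
    ]

cmds = grant_commands("my-project", "airflow-sa@my-project.iam.gserviceaccount.com")
```

Note that `roles/storage.admin` already includes the two narrower storage roles, so in practice you may only need the broadest role in each group.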
The data warehouse follows a dimensional modeling approach with:
- Fact tables: transactions, customer metrics, account balances
- Dimension tables: customer, product, location, account, date
- Optimized for analytical queries and reporting
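The fact/dimension split can be illustrated with a toy star-schema query. Table and column names here are illustrative stand-ins, and SQLite is used only to keep the example self-contained; the real warehouse lives in BigQuery:

```python
# Toy star-schema join: aggregate a fact table by a dimension attribute.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, segment TEXT);
CREATE TABLE fct_transactions (transaction_id INTEGER, customer_key INTEGER, amount REAL);
INSERT INTO dim_customer VALUES (1, 'alice', 'retail'), (2, 'bob', 'business');
INSERT INTO fct_transactions VALUES (10, 1, 120.0), (11, 1, -30.0), (12, 2, 500.0);
""")

# Typical analytical query: total transaction amount per customer segment.
rows = conn.execute("""
    SELECT d.segment, SUM(f.amount)
    FROM fct_transactions f
    JOIN dim_customer d USING (customer_key)
    GROUP BY d.segment
    ORDER BY d.segment
""").fetchall()
# rows == [('business', 500.0), ('retail', 90.0)]
```

Queries like this, which join a large fact table to small dimensions and aggregate, are the workload the dimensional model is optimized for.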
- Docker and Docker Compose
- Google Cloud Platform account with:
- Compute Engine
- Cloud Storage
- BigQuery
- Service Account with appropriate permissions
- Python 3.8+
- dbt
- **Clone the repository**

  ```bash
  git clone <repository-url>
  cd finflow-analytics
  ```
- **Configure environment variables**

  ```bash
  cp .env.example .env
  ```

  Update the following variables in `.env`:
  - `AIRFLOW_UID`
  - `_AIRFLOW_WWW_USER_USERNAME`
  - `_AIRFLOW_WWW_USER_PASSWORD`
  - GCP-related configurations
- **Set up the Google Cloud service account**
  - Create a service account with the necessary permissions
  - Download the JSON key file
  - Place it in the `config/google/` directory
  - Update the path in `docker-compose.yml`
- **Start the services**

  ```bash
  docker-compose up -d
  ```
- **Initialize dbt**

  ```bash
  cd dbt
  dbt deps
  dbt seed
  ```
```
finflow-analytics/
├── dags/                 # Airflow DAG definitions
├── logs/                 # Airflow logs
├── config/
│   └── google/           # GCP service account keys
├── dbt/
│   ├── models/           # dbt transformation models
│   └── profiles/         # dbt connection profiles
├── docker-compose.yml    # Docker services configuration
├── requirements.txt      # Python dependencies
└── README.md
```
The main DAG (`finflow.py`) includes:
- **Data Generation Tasks**
  - Generates synthetic data for all dimensions and facts
  - Implements data quality checks and validations
- **Loading Tasks**
  - Uploads data to Google Cloud Storage
  - Loads data into BigQuery staging tables
- **Transformation Tasks**
  - Executes dbt models
  - Performs data quality tests
  - Creates final analytical tables
The pipeline follows these steps:
- Generate synthetic financial data
- Upload data to GCS in Parquet format
- Load data into BigQuery staging tables
- Transform data using dbt models
- Perform data quality checks
- Create final analytical tables
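A heavily simplified, Airflow-free sketch of that ordering is shown below. Function names, the bucket name, and the return values are illustrative placeholders; the real DAG wires equivalent steps up as Airflow tasks:

```python
# Simplified stand-ins for the pipeline stages; names and values are hypothetical.
def generate_data():
    # Stand-in for the synthetic-data generation tasks.
    return {"rows": 100}

def upload_to_gcs(data):
    # Stand-in: the real task writes Parquet files to a GCS bucket.
    return f"gs://example-raw-bucket/batch-{data['rows']}.parquet"

def load_to_bigquery(gcs_uri):
    # Stand-in: the real task loads the GCS files into BigQuery staging tables.
    return f"loaded {gcs_uri} into staging"

def run_dbt_models(staging_status):
    # Stand-in: the real task shells out to dbt for transforms and tests.
    return "dbt run + dbt test complete"

# The pipeline executes these strictly in order:
# generate -> upload -> load -> transform.
result = run_dbt_models(load_to_bigquery(upload_to_gcs(generate_data())))
```

Each stage consumes the output of the previous one, which is why the DAG expresses them as a linear dependency chain rather than independent tasks.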
- Access the Airflow UI at `http://<your-vm-ip>:8080`
- Monitor DAG runs and task status
- View logs in the Airflow UI or the `logs/` directory
- Check the dbt documentation for transformation details
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
For support or questions, please open an issue in the repository.

