FinFlow Analytics 🏦

FinFlow Analytics is a modern data pipeline that processes financial transaction data using Apache Airflow, Google Cloud Platform (GCP), and dbt. The project implements a complete data warehouse solution for financial data analytics.

Architecture Overview πŸ—οΈ

[Diagram: FinFlow Analytics architecture]

The architecture consists of three main components:

  1. Data Generation Layer πŸ”„

    • Apache Airflow hosted on a GCP Compute Engine VM via Docker
    • Generates synthetic financial data (customers, accounts, transactions, and related entities); a minimal sketch follows this list
    • Orchestrates the entire data pipeline
  2. Data Storage Layer πŸ’Ύ

    • Google Cloud Storage (GCS) for raw data storage
    • BigQuery for data warehousing
    • Handles both raw and transformed data
  3. Data Transformation Layer βš™οΈ

    • dbt for data modeling and transformations
    • Source code available at finflow-analytics-dbt
    • Implements a modern data warehouse model
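A minimal sketch of what a generation task might look like, using only the Python standard library (column names and row counts are illustrative, not the project's actual schema):

    import csv
    import random
    import uuid
    from datetime import date, timedelta

    def generate_transactions(n_rows: int, out_path: str) -> None:
        """Write n_rows of synthetic transactions to a local CSV file."""
        start = date.today() - timedelta(days=365)
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["transaction_id", "customer_id", "amount", "transaction_date"])
            for _ in range(n_rows):
                writer.writerow([
                    uuid.uuid4().hex,                                # surrogate key
                    random.randint(1, 10_000),                       # FK into a customer dimension
                    round(random.uniform(-5_000, 5_000), 2),         # signed debit/credit amount
                    start + timedelta(days=random.randint(0, 364)),  # date within the last year
                ])

    generate_transactions(1_000, "/tmp/transactions.csv")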

Required GCP IAM Roles πŸ”‘

Grant the following roles to the pipeline's service account. Note that the broader roles (roles/storage.admin, roles/bigquery.dataOwner) already subsume the narrower ones listed beside them, so grant only the set your deployment actually needs:

Google Cloud Storage Roles

  • roles/storage.objectViewer - Read access to GCS objects
  • roles/storage.objectCreator - Create new GCS objects
  • roles/storage.admin - Full access to GCS buckets and objects

BigQuery Roles

  • roles/bigquery.dataEditor - Read/write access to BigQuery data
  • roles/bigquery.jobUser - Permission to run BigQuery jobs
  • roles/bigquery.dataOwner - Full access to BigQuery datasets and tables

Additional Required Roles

  • roles/compute.viewer - View Compute Engine resources
  • roles/logging.viewer - View logs
  • roles/monitoring.viewer - View monitoring data
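Before wiring the pipeline together, it is worth smoke-testing that the service account key actually carries these permissions. A minimal check with the official Google Cloud client libraries (the bucket name is a placeholder; both clients read GOOGLE_APPLICATION_CREDENTIALS from the environment):

    from google.cloud import bigquery, storage

    storage_client = storage.Client()
    bq_client = bigquery.Client()

    # Exercises storage.objectViewer / storage.admin: list objects in the raw bucket.
    for blob in storage_client.list_blobs("your-finflow-bucket", max_results=5):
        print(blob.name)

    # Exercises bigquery.jobUser (plus dataEditor once real tables are involved).
    print(list(bq_client.query("SELECT 1 AS ok").result()))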

Data Model πŸ“Š

[Diagram: FinFlow data model]

The data warehouse follows a dimensional modeling approach with:

  • Fact tables: transactions, customer metrics, account balances
  • Dimension tables: customer, product, location, account, date
  • Optimized for analytical queries and reporting
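As an illustration of how such a star schema is queried, the snippet below joins the transaction fact to the date and customer dimensions via BigQuery (dataset, table, and column names are assumptions for illustration, not the project's actual model names):

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT
        d.year,
        c.customer_segment,
        SUM(f.amount) AS total_amount,
        COUNT(*)      AS txn_count
    FROM `finflow.fct_transactions` AS f
    JOIN `finflow.dim_date`         AS d ON f.date_key = d.date_key
    JOIN `finflow.dim_customer`     AS c ON f.customer_key = c.customer_key
    GROUP BY d.year, c.customer_segment
    ORDER BY d.year, total_amount DESC
    """

    for row in client.query(sql).result():
        print(dict(row))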

Prerequisites πŸ“‹

  • Docker and Docker Compose 🐳
  • Google Cloud Platform account with:
    • Compute Engine
    • Cloud Storage
    • BigQuery
    • Service Account with appropriate permissions
  • Python 3.8+ 🐍
  • dbt

Setup Instructions πŸš€

  1. Clone the Repository

    git clone <repository-url>
    cd finflow-analytics
  2. Configure Environment Variables

    cp .env.example .env

    Update the following variables in .env:

    • AIRFLOW_UID
    • _AIRFLOW_WWW_USER_USERNAME
    • _AIRFLOW_WWW_USER_PASSWORD
    • GCP-related configurations
  3. Set Up Google Cloud Service Account πŸ”

    • Create a service account with necessary permissions
    • Download the JSON key file
    • Place it in the config/google/ directory
    • Update the path in docker-compose.yml
  4. Start the Services

    docker-compose up -d
  5. Initialize dbt

    cd dbt
    dbt deps
    dbt seed

Project Structure πŸ“

finflow-analytics/
β”œβ”€β”€ dags/                 # Airflow DAG definitions
β”œβ”€β”€ logs/                 # Airflow logs
β”œβ”€β”€ config/              
β”‚   └── google/          # GCP service account keys
β”œβ”€β”€ dbt/
β”‚   β”œβ”€β”€ models/          # dbt transformation models
β”‚   └── profiles/        # dbt connection profiles
β”œβ”€β”€ docker-compose.yml   # Docker services configuration
β”œβ”€β”€ requirements.txt     # Python dependencies
└── README.md

DAG Structure πŸ“ˆ

The main DAG (finflow.py) includes:

  1. Data Generation Tasks

    • Generates synthetic data for all dimensions and facts
    • Implements data quality checks and validations
  2. Loading Tasks

    • Uploads data to Google Cloud Storage
    • Loads data into BigQuery staging tables
  3. Transformation Tasks

    • Executes dbt models
    • Performs data quality tests
    • Creates final analytical tables
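A stripped-down sketch of how these three task groups could be wired together in Airflow (this is not the project's actual finflow.py; bucket, dataset, and path names are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    def generate_data(**context):
        ...  # synthetic data generation + upload to GCS (see earlier sketches)

    with DAG(
        dag_id="finflow",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        generate = PythonOperator(task_id="generate_data", python_callable=generate_data)

        load_staging = GCSToBigQueryOperator(
            task_id="load_staging",
            bucket="your-finflow-bucket",                   # placeholder
            source_objects=["raw/transactions/*.parquet"],  # placeholder path
            source_format="PARQUET",
            destination_project_dataset_table="finflow_staging.transactions",
            write_disposition="WRITE_TRUNCATE",
        )

        run_dbt = BashOperator(
            task_id="run_dbt",
            bash_command="cd /opt/airflow/dbt && dbt run && dbt test",
        )

        generate >> load_staging >> run_dbt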

Data Pipeline πŸ”„

The pipeline follows these steps:

  1. Generate synthetic financial data
  2. Upload data to GCS in Parquet format (sketched after this list)
  3. Load data into BigQuery staging tables
  4. Transform data using dbt models
  5. Perform data quality checks
  6. Create final analytical tables
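Step 2 in miniature: write a DataFrame out as Parquet and push it to the raw bucket (bucket and object paths are placeholders; requires pandas with pyarrow installed):

    import pandas as pd
    from google.cloud import storage

    df = pd.DataFrame({"transaction_id": [1, 2], "amount": [120.50, -33.10]})
    df.to_parquet("/tmp/transactions.parquet", index=False)

    bucket = storage.Client().bucket("your-finflow-bucket")
    bucket.blob("raw/transactions/transactions.parquet").upload_from_filename(
        "/tmp/transactions.parquet"
    )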

Monitoring and Maintenance πŸ”

  • Access the Airflow UI at http://<your-vm-ip>:8080
  • Monitor DAG runs and task status
  • View logs in the Airflow UI or in the logs/ directory
  • Check dbt documentation for transformation details

Contributing 🀝

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

Support πŸ’¬

For support or questions, please open an issue in the repository.
