
Rick & Morty Character Analytics ETL Pipeline


A professional ETL (Extract, Transform, Load) pipeline for collecting, processing, and analyzing character data from the Rick and Morty API. This system provides a robust foundation for data extraction and analysis with production-ready features including PostgreSQL integration.

🌟 Overview

An integrated system for extracting and analyzing Rick and Morty character data using ETL techniques. The pipeline automates collecting data from the public API, processing it, and storing it in multiple formats (JSON, CSV, PostgreSQL) to enable comprehensive analysis.

🚀 Features

🔄 Integrated ETL Process

  • Extraction: Secure and reliable data retrieval from Rick and Morty API
  • Transformation: Data cleaning, enrichment, and preparation for analysis
  • Load: Data storage in multiple formats (JSON, CSV, PostgreSQL)
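
The Transform stage can be sketched as a single pure function per record. This is an illustrative reconstruction, not the repository's actual `data_processor.py`; the field names follow the public Rick and Morty API character schema, and the fallback values are assumptions:

```python
# Hypothetical sketch of the Transform step: flatten one raw API record
# into the row shape used by the CSV/PostgreSQL loaders.

def transform_character(raw: dict) -> dict:
    """Reduce a raw API character record to a flat analytics row."""
    return {
        "id": raw["id"],
        "name": raw["name"].strip(),
        "status": raw.get("status", "unknown"),
        "species": raw.get("species", "unknown"),
        # The API returns a list of episode URLs; only the count is kept.
        "episode_count": len(raw.get("episode", [])),
        # "location" is a nested object with its own name/url fields.
        "location": raw.get("location", {}).get("name", "unknown"),
    }
```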

๐Ÿ—„๏ธ Database Integration

  • PostgreSQL Support: Full database integration with connection management
  • Configurable Settings: Secure database configuration using INI files
  • Connection Management: Robust connection handling with error recovery
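
One way to implement the connection handling described above is a context manager that commits on success, rolls back on error, and always closes. This is a minimal sketch, not the repository's `src/database/connection.py` (which takes a `section` argument); the injected `connect` factory is an assumption used here so the example runs without a live server:

```python
from contextlib import contextmanager

@contextmanager
def db_connection(connect, **params):
    """Commit on success, roll back on error, always close.

    `connect` is a connection factory (e.g. psycopg2.connect),
    injected so the manager stays testable without a database.
    """
    conn = connect(**params)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```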

๐Ÿ—๏ธ Scalable Architecture

  • Modular design allowing easy integration of new data sources
  • Multi-format storage support with database expansion capability
  • Comprehensive error handling ensuring operational continuity

⚡ Enhanced Performance

  • Intelligent request management respecting server limitations
  • Parallel data processing across multiple pages
  • Request rate control preventing blocking and ensuring stability
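
The request-rate control mentioned above can be as simple as enforcing a minimum interval between outgoing calls. A minimal sketch, assuming a sequential fetch loop; the class name and default interval are illustrative, not the pipeline's actual code:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between outgoing API requests."""

    def __init__(self, min_interval: float = 0.5):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        """Sleep just long enough to honour the interval, then record the call."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

A fetch loop would call `limiter.wait()` immediately before each page request, so bursts are smoothed out regardless of how fast responses come back.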

📦 Installation

Prerequisites

  • Python 3.7+
  • PostgreSQL 12+
  • pip package manager

Setup

# Clone the repository
git clone https://github.com/yourusername/rick-morty-etl.git
cd rick-morty-etl

# Install dependencies
pip install -r requirements.txt

Database Setup

  1. Install PostgreSQL and create a database
  2. Update the config/database.ini file with your credentials:
[postgresql]
host=localhost
database=postgres
user=postgres
password=[your_password]
port=[your_port]
connect_timeout=10

๐Ÿ› ๏ธ Project Structure

rick-morty-etl/
├── src/
│   ├── __init__.py
│   ├── etl_pipeline.py          # Main ETL pipeline
│   ├── database/
│   │   ├── __init__.py
│   │   ├── config.py            # Database configuration
│   │   └── connection.py        # Database connection manager
│   └── utils/
│       ├── __init__.py
│       ├── data_processor.py    # Data transformation functions
│       └── file_exporter.py     # JSON/CSV export functions
├── config/
│   └── database.ini             # Database configuration
├── requirements.txt
└── README.md

📋 Usage

Basic Usage

from src.etl_pipeline import main_etl_pipeline

# Run complete ETL pipeline
main_etl_pipeline()

Advanced Usage with Custom Configuration

from src.database.connection import db_connection
from src.etl_pipeline import get_all_characters, save_to_postgresql

# Custom ETL process
characters = get_all_characters()

# Save to PostgreSQL with custom settings
with db_connection(section='production') as conn:
    save_to_postgresql(characters, connection=conn)

# Export to files
from src.utils.file_exporter import save_to_json, save_to_csv
save_to_json(characters, 'output/characters.json')
save_to_csv(characters, 'output/characters.csv')

Database Operations

from src.database.connection import db_connection
from src.database.config import test_connection

# Test database connection
test_connection()

# Execute custom queries
with db_connection() as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) FROM characters")
        count = cursor.fetchone()[0]
        print(f"Total characters: {count}")

๐Ÿ—ƒ๏ธ Database Schema

Characters Table

CREATE TABLE characters (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    status VARCHAR(50),
    species VARCHAR(100),
    episode_count INTEGER,
    location VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Useful Queries

-- Character statistics by status
SELECT status, COUNT(*) as count 
FROM characters 
GROUP BY status 
ORDER BY count DESC;

-- Top characters by episode appearance
SELECT name, episode_count 
FROM characters 
ORDER BY episode_count DESC 
LIMIT 10;

-- Species distribution
SELECT species, COUNT(*) as count 
FROM characters 
GROUP BY species 
ORDER BY count DESC;

โš™๏ธ Configuration

Database Configuration

Create config/database.ini:

[postgresql]
host=localhost
database=postgres
user=postgres
password=[your_password]
port=[your_port]
connect_timeout=10

Environment Variables (Optional)

export DB_HOST=localhost
export DB_NAME=rick_morty_db
export DB_USER=postgres
export DB_PASSWORD=your_password
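
A hypothetical loader that merges config/database.ini with the optional environment variables above (environment variables win when set). The name `load_db_config` is illustrative and not necessarily what the repository's `src/database/config.py` exports:

```python
import configparser
import os

# Mapping from the optional environment variables to INI keys.
ENV_OVERRIDES = {
    "DB_HOST": "host",
    "DB_NAME": "database",
    "DB_USER": "user",
    "DB_PASSWORD": "password",
}

def load_db_config(path="config/database.ini", section="postgresql"):
    """Read the INI section, then let environment variables override it."""
    parser = configparser.ConfigParser()
    parser.read(path)
    cfg = dict(parser[section]) if parser.has_section(section) else {}
    for env_var, key in ENV_OVERRIDES.items():
        value = os.environ.get(env_var)
        if value:
            cfg[key] = value
    return cfg
```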

🧪 Testing

Test Database Connection

from src.database.config import test_connection

# Test connection to default database
test_connection()

# Test connection to specific section
test_connection(section='postgresql')

๐Ÿ› Error Handling

The system includes comprehensive error handling for:

  • Network connectivity issues
  • API rate limiting
  • Database connection failures
  • Data parsing errors
  • File I/O operations

Example Error Recovery

import psycopg2

from src.database.connection import db_connection

try:
    with db_connection() as conn:
        # Database operations go here
        pass
except psycopg2.OperationalError as e:
    print(f"Database connection failed: {e}")
    # Implement retry logic or fall back to file export here
except Exception as e:
    print(f"Unexpected error: {e}")

📊 Output Examples

JSON Output

[
  {
    "id": 1,
    "name": "Rick Sanchez",
    "status": "Alive",
    "species": "Human",
    "episode_count": 51,
    "location": "Earth"
  }
]

CSV Output

id,name,status,species,episode_count,location
1,Rick Sanchez,Alive,Human,51,Earth
2,Morty Smith,Alive,Human,39,Earth

🔧 Development

Adding New Data Sources

  1. Create new extractor in src/extractors/
  2. Implement transformation logic in src/utils/data_processor.py
  3. Update database schema if needed
  4. Add tests in tests/
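
Step 1 might follow a minimal extractor contract like the one below. `BaseExtractor`, `InMemoryExtractor`, and the src/extractors/ layout are assumptions for illustration, not existing code:

```python
from abc import ABC, abstractmethod
from typing import Iterator

class BaseExtractor(ABC):
    """Every data source yields raw records; the shared pipeline
    then handles transformation and loading."""

    @abstractmethod
    def extract(self) -> Iterator[dict]:
        """Yield raw records from the source, one at a time."""

class InMemoryExtractor(BaseExtractor):
    """Trivial source used here only to demonstrate the contract."""

    def __init__(self, records):
        self.records = records

    def extract(self):
        yield from self.records
```

Keeping extraction behind a single `extract()` method means the transform and load stages never need to know whether records came from an API, a file, or a database.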

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Rick and Morty API for providing the data
  • PostgreSQL community for excellent database support
  • Python community for robust data processing libraries

📞 Support

If you have any questions or need help, please open an issue on the repository.

🔮 Future Enhancements

  • Real-time data streaming
  • Docker containerization
  • REST API for data access
  • Dashboard for visualization
  • Machine learning integration

โญ If you find this project useful, please give it a star! โญ
