🚀 Crunchbase Automation System

A powerful, multi-client CSV processing system that automatically uploads company data to Google Sheets with intelligent duplicate prevention and client isolation.

✨ Features

🔒 Multi-Client Support

Client A & Client B - Separate data sheets for different companies
Global Duplicate Prevention - Master sheet prevents cross-client duplicates
Visual Separation - Color-coded rows for easy client identification

📊 Smart Data Processing

CSV Upload - Drag & drop or file picker interface
Column Mapping - Automatic CSV to Google Sheets column mapping
Bulk Processing - Efficient handling of large datasets (1000+ companies)
Real-time Progress - Live progress bar and status updates

🔐 Google Sheets Integration

Master Sheet - Global registry of all companies (Client, Company, Date Added)
Client Sheets - Full data storage for each client
OAuth 2.0 - Secure Google API authentication
Automatic Updates - Next available row detection

🎨 Modern User Interface

React + Vite - Fast, responsive frontend
Tailwind CSS - Beautiful, modern design
Drag & Drop - Intuitive file upload
Real-time Feedback - Processing status and progress tracking

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend        │    │   Google Sheets │
│   (React)       │◄──►│   (Flask)        │◄──►│   API           │
│                 │    │                  │    │                 │
│ • File Upload   │    │ • CSV Processing │    │ • Master Sheet  │
│ • Client Select │    │ • Duplicate Check│    │ • Client Sheets │
│ • Progress Bar  │    │ • Data Mapping   │    │ • Visual Colors │
└─────────────────┘    └──────────────────┘    └─────────────────┘

🚀 Quick Start

Prerequisites

Python 3.8+
Node.js 16+
Google Cloud Project with Sheets API enabled
Google Service Account credentials

1. Clone Repository

git clone <your-repo-url>
cd Crunchbase-AUtomation

2. Backend Setup

cd backend

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Setup environment variables
cp env.example .env
# Edit .env with your Google Sheets configuration

# Upload Google credentials
# - credentials.json (Service Account key)
# - token.json (OAuth tokens)

# Run backend
python app.py

3. Frontend Setup

cd frontend

# Install dependencies
npm install

# Setup environment variables
cp .env.example .env.development  # For local development
cp .env.example .env.production   # For production deployment

# Edit environment files with your configuration
# .env.development - Use for local development
# .env.production - Use for production deployment

# Run development server
npm run dev

4. Access Application

Frontend: http://localhost:5173
Backend API: http://localhost:5000

⚙️ Configuration

Environment Variables (.env)

# Master Sheet Configuration
MASTER_SHEET_ID=your_master_sheet_id
MASTER_SHEET_NAME=Master

# Client A Configuration
CLIENT_A_NAME=Client A
CLIENT_A_SHEET_ID=client_a_sheet_id
CLIENT_A_SHEET_NAME=Sheet1

# Client B Configuration
CLIENT_B_NAME=Client B
CLIENT_B_SHEET_ID=client_b_sheet_id
CLIENT_B_SHEET_NAME=Sheet1

# Google API Configuration
GOOGLE_SHEETS_CREDENTIALS_FILE=credentials.json
GOOGLE_SHEETS_TOKEN_FILE=token.json

# File Upload Configuration
UPLOAD_FOLDER=uploads
MAX_CONTENT_LENGTH=16777216

Frontend Configuration

The frontend uses environment variables for configuration:

# .env.development (Local Development)
VITE_API_BASE_URL=http://localhost:5000
VITE_APP_NAME=Crunchbase Automation (Dev)

# .env.production (Production)
VITE_API_BASE_URL=https://your-domain.com
VITE_APP_NAME=Crunchbase Automation

Important: Update VITE_API_BASE_URL in .env.production to match your deployed backend URL.

Column Mapping

The system automatically maps CSV columns to Google Sheets headers:

CSV Column	Sheet Column	Action
Organization Name	Company Name	Direct Map
Website	Website	Direct Map
LinkedIn	Company Linkedin	Direct Map
Last Funding Type	Campaign	Direct Map
—	Date	Today's date (DD/MM/YY)
—	Source	"Crunchbase"
—	POCs	Leave blank
—	Email ID	Leave blank
—	Reachout LinkedIn	Leave blank
—	Reachout Email	Leave blank
—	Response	Leave blank

📁 Project Structure

Crunchbase-AUtomation/
├── backend/                    # Python Flask backend
│   ├── app.py                 # Main Flask application
│   ├── config.py              # Configuration management
│   ├── csv_processor.py       # CSV processing logic
│   ├── google_sheets_service.py # Google Sheets integration
│   ├── master_sheet_service.py # Master sheet operations
│   ├── requirements.txt       # Python dependencies
│   ├── .env                   # Environment variables
│   ├── credentials.json       # Google API credentials
│   └── uploads/               # File upload directory
├── frontend/                   # React frontend
│   ├── src/                   # Source code
│   │   ├── App.jsx            # Main application component
│   │   ├── App.css            # Application styles
│   │   └── main.jsx           # Application entry point
│   ├── package.json           # Node.js dependencies
│   └── dist/                  # Built frontend files
├── .gitignore                 # Git ignore rules
├── README.md                  # This file
├── deploy-digitalocean.sh     # DigitalOcean deployment script
└── DIGITALOCEAN_SETUP.md      # DigitalOcean setup guide

🚀 Deployment

DigitalOcean (Recommended)

# 1. Create $6/month droplet
# 2. SSH into your droplet
ssh root@your-droplet-ip

# 3. Run deployment script
curl -O https://raw.githubusercontent.com/your-repo/deploy-digitalocean.sh
chmod +x deploy-digitalocean.sh
./deploy-digitalocean.sh

Other Cloud Platforms

Heroku: Use Procfile and environment variables
Google Cloud Run: Containerized deployment
AWS EC2: Similar to DigitalOcean setup

📊 How It Works

1. File Upload

User selects CSV file and client
Frontend sends file to backend API

2. Data Processing

Backend parses CSV and maps columns
Checks master sheet for existing companies
Filters out duplicates

3. Google Sheets Update

Adds new companies to master sheet
Updates client-specific sheet with full data
Applies visual formatting (client colors)

4. Real-time Feedback

Progress bar shows processing status
Success/error messages displayed
Processing details shown to user

🔒 Security Features

OAuth 2.0 authentication with Google
Service Account for secure API access
Environment Variables for sensitive data
Input Validation and sanitization
File Type Restrictions (CSV only)
Client Isolation (no cross-contamination)

🧪 Testing

Backend Testing

cd backend
python -m pytest tests/

Frontend Testing

cd frontend
npm test

API Testing

# Health check
curl http://localhost:5000/api/health

# Get available clients
curl http://localhost:5000/api/clients

# Test master sheet connection
curl http://localhost:5000/api/test-master-connection

🚨 Troubleshooting

Common Issues

Authentication Errors

Verify credentials.json is correct service account key
Check token.json exists and is valid
Ensure Google Sheets API is enabled

Column Mapping Issues

Check CSV column names match expected format
Verify Google Sheet headers exist
Review column mapping configuration

Duplicate Detection Problems

Ensure master sheet is properly initialized
Check DUPLICATE_CHECK_FIELDS configuration
Verify company name field mapping

Debug Mode

# Enable debug logging
export FLASK_ENV=development
python app.py

📈 Performance

Processing Speed: 1000+ companies in under 2 minutes
Memory Usage: Efficient pandas operations
API Calls: Minimal Google Sheets API usage
Scalability: Easy to add more clients

🔄 Updates & Maintenance

Update Application

git pull origin main
cd backend && pip install -r requirements.txt
cd ../frontend && npm install && npm run build
systemctl restart crunchbase-automation  # If deployed

Backup Strategy

Google Sheets: Version history and revision tracking
Application Data: Git repository backup
Credentials: Secure credential storage

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Issues: Create GitHub issue
Documentation: Check this README and setup guides
Deployment: Refer to DIGITALOCEAN_SETUP.md

🎯 Roadmap

Multi-file Upload - Process multiple CSVs simultaneously
Advanced Filtering - Industry, funding, location filters
Data Export - Export processed data to various formats
Analytics Dashboard - Processing statistics and insights
API Rate Limiting - Better Google API quota management
Webhook Support - Notifications on completion

Built with ❤️ for efficient business automation

Your Crunchbase data, organized and secure across multiple clients. 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
backend		backend
frontend		frontend
.gitignore		.gitignore
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

🚀 Crunchbase Automation System

✨ Features

🔒 Multi-Client Support

📊 Smart Data Processing

🔐 Google Sheets Integration

🎨 Modern User Interface

🏗️ Architecture

🚀 Quick Start

Prerequisites

1. Clone Repository

2. Backend Setup

3. Frontend Setup

4. Access Application

⚙️ Configuration

Environment Variables (.env)

Frontend Configuration

Column Mapping

📁 Project Structure

🚀 Deployment

DigitalOcean (Recommended)

Other Cloud Platforms

📊 How It Works

1. File Upload

2. Data Processing

3. Google Sheets Update

4. Real-time Feedback

🔒 Security Features

🧪 Testing

Backend Testing

Frontend Testing

API Testing

🚨 Troubleshooting

Common Issues

Authentication Errors

Column Mapping Issues

Duplicate Detection Problems

Debug Mode

📈 Performance

🔄 Updates & Maintenance

Update Application

Backup Strategy

🤝 Contributing

📄 License

🆘 Support

🎯 Roadmap

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages