A powerful, multi-client CSV processing system that automatically uploads company data to Google Sheets with intelligent duplicate prevention and client isolation.
- Client A & Client B - Separate data sheets for different companies
- Global Duplicate Prevention - Master sheet prevents cross-client duplicates
- Visual Separation - Color-coded rows for easy client identification
- CSV Upload - Drag & drop or file picker interface
- Column Mapping - Automatic CSV to Google Sheets column mapping
- Bulk Processing - Efficient handling of large datasets (1000+ companies)
- Real-time Progress - Live progress bar and status updates
- Master Sheet - Global registry of all companies (Client, Company, Date Added)
- Client Sheets - Full data storage for each client
- OAuth 2.0 - Secure Google API authentication
- Automatic Updates - Next available row detection
- React + Vite - Fast, responsive frontend
- Tailwind CSS - Beautiful, modern design
- Drag & Drop - Intuitive file upload
- Real-time Feedback - Processing status and progress tracking
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Frontend │ │ Backend │ │ Google Sheets │
│ (React) │◄──►│ (Flask) │◄──►│ API │
│ │ │ │ │ │
│ • File Upload │ │ • CSV Processing │ │ • Master Sheet │
│ • Client Select │ │ • Duplicate Check│ │ • Client Sheets │
│ • Progress Bar │ │ • Data Mapping │ │ • Visual Colors │
└─────────────────┘ └──────────────────┘ └─────────────────┘
- Python 3.8+
- Node.js 16+
- Google Cloud Project with Sheets API enabled
- Google Service Account credentials
git clone <your-repo-url>
cd Crunchbase-AUtomationcd backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Setup environment variables
cp env.example .env
# Edit .env with your Google Sheets configuration
# Upload Google credentials
# - credentials.json (Service Account key)
# - token.json (OAuth tokens)
# Run backend
python app.pycd frontend
# Install dependencies
npm install
# Setup environment variables
cp .env.example .env.development # For local development
cp .env.example .env.production # For production deployment
# Edit environment files with your configuration
# .env.development - Use for local development
# .env.production - Use for production deployment
# Run development server
npm run dev- Frontend: http://localhost:5173
- Backend API: http://localhost:5000
# Master Sheet Configuration
MASTER_SHEET_ID=your_master_sheet_id
MASTER_SHEET_NAME=Master
# Client A Configuration
CLIENT_A_NAME=Client A
CLIENT_A_SHEET_ID=client_a_sheet_id
CLIENT_A_SHEET_NAME=Sheet1
# Client B Configuration
CLIENT_B_NAME=Client B
CLIENT_B_SHEET_ID=client_b_sheet_id
CLIENT_B_SHEET_NAME=Sheet1
# Google API Configuration
GOOGLE_SHEETS_CREDENTIALS_FILE=credentials.json
GOOGLE_SHEETS_TOKEN_FILE=token.json
# File Upload Configuration
UPLOAD_FOLDER=uploads
MAX_CONTENT_LENGTH=16777216The frontend uses environment variables for configuration:
# .env.development (Local Development)
VITE_API_BASE_URL=http://localhost:5000
VITE_APP_NAME=Crunchbase Automation (Dev)
# .env.production (Production)
VITE_API_BASE_URL=https://your-domain.com
VITE_APP_NAME=Crunchbase AutomationImportant: Update VITE_API_BASE_URL in .env.production to match your deployed backend URL.
The system automatically maps CSV columns to Google Sheets headers:
| CSV Column | Sheet Column | Action |
|---|---|---|
| Organization Name | Company Name | Direct Map |
| Website | Website | Direct Map |
| Company Linkedin | Direct Map | |
| Last Funding Type | Campaign | Direct Map |
| — | Date | Today's date (DD/MM/YY) |
| — | Source | "Crunchbase" |
| — | POCs | Leave blank |
| — | Email ID | Leave blank |
| — | Reachout LinkedIn | Leave blank |
| — | Reachout Email | Leave blank |
| — | Response | Leave blank |
Crunchbase-AUtomation/
├── backend/ # Python Flask backend
│ ├── app.py # Main Flask application
│ ├── config.py # Configuration management
│ ├── csv_processor.py # CSV processing logic
│ ├── google_sheets_service.py # Google Sheets integration
│ ├── master_sheet_service.py # Master sheet operations
│ ├── requirements.txt # Python dependencies
│ ├── .env # Environment variables
│ ├── credentials.json # Google API credentials
│ └── uploads/ # File upload directory
├── frontend/ # React frontend
│ ├── src/ # Source code
│ │ ├── App.jsx # Main application component
│ │ ├── App.css # Application styles
│ │ └── main.jsx # Application entry point
│ ├── package.json # Node.js dependencies
│ └── dist/ # Built frontend files
├── .gitignore # Git ignore rules
├── README.md # This file
├── deploy-digitalocean.sh # DigitalOcean deployment script
└── DIGITALOCEAN_SETUP.md # DigitalOcean setup guide
# 1. Create $6/month droplet
# 2. SSH into your droplet
ssh root@your-droplet-ip
# 3. Run deployment script
curl -O https://raw.githubusercontent.com/your-repo/deploy-digitalocean.sh
chmod +x deploy-digitalocean.sh
./deploy-digitalocean.sh- Heroku: Use
Procfileand environment variables - Google Cloud Run: Containerized deployment
- AWS EC2: Similar to DigitalOcean setup
- User selects CSV file and client
- Frontend sends file to backend API
- Backend parses CSV and maps columns
- Checks master sheet for existing companies
- Filters out duplicates
- Adds new companies to master sheet
- Updates client-specific sheet with full data
- Applies visual formatting (client colors)
- Progress bar shows processing status
- Success/error messages displayed
- Processing details shown to user
- OAuth 2.0 authentication with Google
- Service Account for secure API access
- Environment Variables for sensitive data
- Input Validation and sanitization
- File Type Restrictions (CSV only)
- Client Isolation (no cross-contamination)
cd backend
python -m pytest tests/cd frontend
npm test# Health check
curl http://localhost:5000/api/health
# Get available clients
curl http://localhost:5000/api/clients
# Test master sheet connection
curl http://localhost:5000/api/test-master-connection- Verify
credentials.jsonis correct service account key - Check
token.jsonexists and is valid - Ensure Google Sheets API is enabled
- Check CSV column names match expected format
- Verify Google Sheet headers exist
- Review column mapping configuration
- Ensure master sheet is properly initialized
- Check
DUPLICATE_CHECK_FIELDSconfiguration - Verify company name field mapping
# Enable debug logging
export FLASK_ENV=development
python app.py- Processing Speed: 1000+ companies in under 2 minutes
- Memory Usage: Efficient pandas operations
- API Calls: Minimal Google Sheets API usage
- Scalability: Easy to add more clients
git pull origin main
cd backend && pip install -r requirements.txt
cd ../frontend && npm install && npm run build
systemctl restart crunchbase-automation # If deployed- Google Sheets: Version history and revision tracking
- Application Data: Git repository backup
- Credentials: Secure credential storage
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: Create GitHub issue
- Documentation: Check this README and setup guides
- Deployment: Refer to
DIGITALOCEAN_SETUP.md
- Multi-file Upload - Process multiple CSVs simultaneously
- Advanced Filtering - Industry, funding, location filters
- Data Export - Export processed data to various formats
- Analytics Dashboard - Processing statistics and insights
- API Rate Limiting - Better Google API quota management
- Webhook Support - Notifications on completion
Built with ❤️ for efficient business automation
Your Crunchbase data, organized and secure across multiple clients. 🚀