musegrowth/email-extractor

📧 Email Finder — Local Snov.io Alternative

A complete, production-ready Chrome Extension + local Node.js backend that replicates Snov.io's domain email-discovery functionality. Personal use only.


🚀 Quick Start

# 1. Install backend dependencies and start the server
cd backend && npm install && node server.js

# 2. (Optional) In another terminal, verify that it is running
curl http://localhost:3000/api/health

# 3. Load the Chrome extension
#    - Open chrome://extensions
#    - Enable "Developer mode"
#    - Click "Load unpacked"
#    - Select the ./extension folder

That's it. Click the toolbar icon and start searching.


🏗️ Architecture

                ┌─────────────────────┐
                │  Chrome Extension   │
                │  (Manifest V3)      │
                └──────────┬──────────┘
                           │ POST /api/search
                           │ POST /api/bulk/start
                           │ POST /api/verify
                           ▼
              ┌────────────────────────┐
              │  Express Server :3000  │
              └─────────┬──────────────┘
                        │
   ┌────────────────────┼─────────────────────┐
   │                    │                     │
   ▼                    ▼                     ▼
┌────────┐         ┌─────────┐          ┌──────────┐
│Crawler │         │Verifier │          │  SQLite  │
│sources │         │SMTP/MX  │          │  cache   │
└────────┘         └─────────┘          └──────────┘
   │                    │
   ▼                    ▼
Direct, GitHub,    syntax → DNS → MX
Wayback,           → disposable → SMTP
DuckDuckGo         → catch-all → score
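The verifier column is an ordered chain, so a cheap early failure can skip the slow SMTP step. The sketch below shows that ordering in JavaScript. It assumes each stage is a function returning `{ ok, fatal }`; `runStages`, `isSyntaxValid`, and that stage interface are hypothetical, not the backend's actual code.

```javascript
// Minimal sketch of an ordered, short-circuiting verification pipeline.
// Stage names follow the diagram; runStages() and the { ok, fatal } stage
// shape are hypothetical, not the backend's real implementation.

// Cheap, local syntax check (deliberately loose, not full RFC 5322).
function isSyntaxValid(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

// Each stage is [name, check]; check(email) may be sync or async and returns
// { ok: boolean, fatal?: boolean }. A failed fatal stage stops the pipeline,
// so later (slower) stages such as SMTP never run. Non-fatal failures
// (e.g. catch-all) are recorded and the pipeline continues.
async function runStages(email, stages) {
  const results = [];
  for (const [name, check] of stages) {
    const { ok, fatal = false } = await check(email);
    results.push({ name, ok });
    if (!ok && fatal) {
      return { valid: false, failedAt: name, results };
    }
  }
  return { valid: true, failedAt: null, results };
}

// Example wiring: only the syntax stage is real here; DNS, MX, disposable,
// SMTP and catch-all stages would be supplied by the caller.
const exampleStages = [
  ["syntax", (email) => ({ ok: isSyntaxValid(email), fatal: true })],
];
```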

Data flow per domain:

  1. Normalize input (https://www.tesla.com/ → tesla.com)
  2. Check the cache (return the stored result immediately if it is less than 7 days old, unless forceRefresh is set)
  3. Run all 5 sources in parallel: direct crawl, GitHub, Wayback, DuckDuckGo, generic generator
  4. Categorize emails into 3 buckets (Prospects, Domain Emails, Generic Contacts)
  5. Score and persist to SQLite
  6. Return to extension
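Steps 1 to 3 can be sketched as below (Node 12.9+ for `Promise.allSettled`). The function names and the `createdAt` cache field are hypothetical. Only the 7-day window and the `forceRefresh` flag come from this README.

```javascript
// Minimal sketch of steps 1-3. normalizeDomain(), isCacheFresh() and
// runSources() are hypothetical names; only the 7-day TTL and the
// forceRefresh flag come from the README.
const CACHE_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days in milliseconds

// Step 1: "https://www.tesla.com/" → "tesla.com"
function normalizeDomain(input) {
  let value = input.trim().toLowerCase();
  // new URL() needs a scheme, so add one to bare inputs like "tesla.com/about".
  if (!/^[a-z][a-z0-9+.-]*:\/\//.test(value)) value = `http://${value}`;
  // hostname drops the scheme, path, query string and port.
  return new URL(value).hostname.replace(/^www\./, "");
}

// Step 2: a cache entry is usable if it is younger than the TTL and the
// caller did not ask for forceRefresh. createdAt (epoch ms) is an assumed field.
function isCacheFresh(entry, { forceRefresh = false, now = Date.now() } = {}) {
  if (forceRefresh || !entry) return false;
  return now - entry.createdAt < CACHE_TTL_MS;
}

// Step 3: run every source concurrently. Promise.allSettled keeps one failing
// source (say, a Wayback timeout) from discarding the others' results.
async function runSources(domain, sources) {
  // The async wrapper turns synchronous throws into rejections too.
  const settled = await Promise.allSettled(sources.map(async (source) => source(domain)));
  const emails = new Set();
  for (const outcome of settled) {
    if (outcome.status !== "fulfilled") continue;
    for (const email of outcome.value) emails.add(email.toLowerCase());
  }
  return [...emails];
}
```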

🧩 Features

| Feature | Description |
|---|---|
| Single domain search | Enter one domain, get categorized results |
| Bulk CSV upload | Upload a .csv/.txt file; domains are processed sequentially with pause/resume |
| 3-tab categorization | Prospects · Domain Emails · Generic Contacts |
| Email verification | 7-step pipeline (syntax → DNS → MX → disposable → SMTP → catch-all → score) |
| SMTP fallback | Auto-detects a blocked port 25 and falls back to MX-only checks |
| Search history | Last 20 searches stored in the extension |
| Export | CSV / JSON, per tab or combined |
| 7-day cache | Re-searching the same domain within 7 days returns instantly |
| Dark mode | Toggle in the popup header |
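This README does not document how the three tabs are filled, so the rules in this sketch are assumptions. Role-style local parts such as info@ and sales@ go to Generic Contacts. Addresses that a source tied to a person's name go to Prospects. Remaining on-domain addresses go to Domain Emails. `categorize` and `ROLE_PREFIXES` are hypothetical.

```javascript
// Minimal sketch of the 3-tab split under assumed rules (not the
// extension's actual logic).

// Hypothetical, non-exhaustive list of role-based local parts.
const ROLE_PREFIXES = new Set([
  "info", "contact", "hello", "sales", "support", "admin",
  "press", "jobs", "careers", "billing", "help", "team",
]);

// found: [{ email, name? }], where name is set when a source tied the
// address to a person (an assumed input shape).
function categorize(domain, found) {
  const buckets = { prospects: [], domainEmails: [], generic: [] };
  for (const { email, name } of found) {
    const [local, host] = email.toLowerCase().split("@");
    // Skip malformed and off-domain addresses (subdomains still count).
    if (!host || (host !== domain && !host.endsWith(`.${domain}`))) continue;
    if (ROLE_PREFIXES.has(local)) buckets.generic.push(email);
    else if (name) buckets.prospects.push({ email, name });
    else buckets.domainEmails.push(email);
  }
  return buckets;
}
```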

📡 API Reference

| Method | Path | Body | Description |
|---|---|---|---|
| POST | /api/search | { domain, forceRefresh? } | Search single domain |
| POST | /api/bulk/start | { domains: [] } | Start bulk job |
| GET | /api/bulk/:jobId/status | | Poll job status |
| POST | /api/bulk/:jobId/pause | | Pause running job |
| POST | /api/bulk/:jobId/resume | | Resume paused job |
| POST | /api/bulk/:jobId/cancel | | Cancel job |
| POST | /api/verify | { email } | Verify single email |
| POST | /api/verify/bulk | { emails: [] } | Verify many |
| GET | /api/history | | Recent searches |
| GET | /api/export/:domain?format=csv\|json | | Download results |
| GET | /api/health | | Liveness check |
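Here is a small client for the bulk endpoints, assuming Node 18+ (global `fetch`). The response fields it reads (`jobId`, `status`) are guesses about the server's JSON, not documented behavior. The same goes for the terminal status values `completed` and `cancelled`. Check `server.js` before relying on any of them.

```javascript
// Minimal bulk-job client sketch. Response field names and status values
// are assumptions about the server, not documented API behavior.
const BASE_URL = "http://localhost:3000";
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// POST /api/bulk/start and return the new job's id (assumed field: jobId).
async function startBulkJob(domains, { baseUrl = BASE_URL, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/bulk/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ domains }),
  });
  if (!res.ok) throw new Error(`bulk start failed: HTTP ${res.status}`);
  const { jobId } = await res.json();
  return jobId;
}

// Poll GET /api/bulk/:jobId/status until the job reaches a terminal state.
// A paused job keeps being polled until it is resumed or cancelled.
async function waitForBulkJob(jobId, { baseUrl = BASE_URL, fetchImpl = fetch, intervalMs = 2000, onProgress = () => {} } = {}) {
  for (;;) {
    const res = await fetchImpl(`${baseUrl}/api/bulk/${encodeURIComponent(jobId)}/status`);
    if (!res.ok) throw new Error(`status check failed: HTTP ${res.status}`);
    const job = await res.json();
    onProgress(job);
    if (job.status === "completed" || job.status === "cancelled") return job;
    await sleep(intervalMs);
  }
}
```

Passing `fetchImpl` lets you test the functions with a stub. `onProgress` receives each status payload, so you can use it to drive a progress counter.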

🛠️ Troubleshooting

SMTP port 25 blocked (most common in Bangladesh / India / residential ISPs)

Many ISPs, especially residential ones, block outbound connections to port 25 to curb spam. The verifier detects this on its first SMTP attempt and switches to MX-only mode. In that mode, scores are capped at 70 instead of 95, and a warning banner appears in the extension.
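The sketch below is a scoring function that honors the two caps. Only the values 95 and 70 come from this README. The base score and per-check weights are hypothetical, chosen so a clean SMTP pass reaches exactly 95 and an MX-only result is clipped to 70.

```javascript
// Hypothetical weights; only the two caps come from the README.
const MAX_SCORE_SMTP = 95;
const MAX_SCORE_MX_ONLY = 70;

// checks: { syntax, mx, disposable, smtpAccepted, catchAll } booleans (assumed shape).
function scoreEmail(checks, { smtpAvailable }) {
  if (!checks.syntax || !checks.mx) return 0; // cannot receive mail
  let score = 55;                              // syntax and MX passed
  if (!checks.disposable) score += 10;
  if (smtpAvailable) {
    if (checks.smtpAccepted) score += 30;
    if (checks.catchAll) score -= 20;          // server accepts any address: weaker signal
  } else {
    score += 10;                               // partial credit when SMTP was never tried
  }
  const cap = smtpAvailable ? MAX_SCORE_SMTP : MAX_SCORE_MX_ONLY;
  return Math.min(score, cap);
}
```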

To regain full SMTP verification:

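  • Use a VPN that allows outbound connections on port 25 (many VPN providers block it as well)
  • Run the backend on a VPS and point the extension at it (providers such as DigitalOcean and Hetzner block outbound port 25 by default, so pick one that allows it or will lift the block)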
  • Use your phone's mobile hotspot — some carriers don't block port 25
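One simple way to detect the block is a timed TCP connect to the first MX host. Below is a sketch using Node's built-in `net` module. `probePort`, `detectSmtpMode`, and the 5-second timeout are assumptions, not the verifier's actual code.

```javascript
const net = require("node:net");

// Hypothetical helper: open a TCP connection to host:port and classify the
// result. A silent "timeout" is the typical symptom of an ISP dropping port 25;
// "refused" means the host answered but nothing is listening.
function probePort(host, port = 25, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    let done = false;
    const finish = (result) => {
      if (done) return;
      done = true;
      socket.destroy();
      resolve(result);
    };
    socket.setTimeout(timeoutMs, () => finish("timeout"));
    socket.on("connect", () => finish("open"));
    socket.on("error", (err) => finish(err.code === "ECONNREFUSED" ? "refused" : "error"));
  });
}

// Decide the verifier mode once, from a probe against the first MX host.
// Anything other than a successful connect falls back to MX-only scoring.
async function detectSmtpMode(mxHost, probe = probePort) {
  return (await probe(mxHost, 25)) === "open" ? "smtp" : "mx-only";
}
```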

Puppeteer download fails on Windows

# Skip the Chromium download and reuse the system Chrome (cmd.exe syntax;
# in PowerShell: $env:PUPPETEER_SKIP_DOWNLOAD="true")
set PUPPETEER_SKIP_DOWNLOAD=true
npm install

Then set the executable path in backend/crawler/fetcher.js if needed.
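If you skipped the download, `puppeteer.launch()` needs an `executablePath` pointing at an installed Chrome. This sketch shows one way `fetcher.js` could build its launch options. `CHROME_PATH` and `buildLaunchOptions` are made-up names, and the Windows paths are common install locations that may differ on your machine.

```javascript
const fs = require("node:fs");

// Typical Windows install locations (adjust for your machine).
const WINDOWS_CHROME_PATHS = [
  "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
  "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
];

// Hypothetical helper: prefer the (made-up) CHROME_PATH variable, then the
// usual install paths. executablePath and headless are real launch options.
function buildLaunchOptions(env = process.env, exists = fs.existsSync) {
  const candidates = [env.CHROME_PATH, ...WINDOWS_CHROME_PATHS].filter(Boolean);
  const executablePath = candidates.find((p) => exists(p));
  // Leaving executablePath unset lets Puppeteer use its own downloaded browser.
  return executablePath ? { headless: true, executablePath } : { headless: true };
}

// Usage in fetcher.js (requires the puppeteer package):
// const browser = await require("puppeteer").launch(buildLaunchOptions());
```

Puppeteer also reads the `PUPPETEER_EXECUTABLE_PATH` environment variable on its own, which avoids touching the code at all.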

"Backend not running" in extension

The extension polls http://localhost:3000. Make sure node server.js is running. The extension shows a clear error banner with restart instructions if it can't reach the backend.
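Here is a reachability check with a few retries, which is handy right after starting `node server.js`. It assumes a runtime with global `fetch` (Chrome, or Node 18+). `checkBackend` and its retry defaults are hypothetical; only `GET /api/health` comes from the API table.

```javascript
// Hypothetical health check: retry GET /api/health a few times before
// reporting the backend as down.
async function checkBackend({ baseUrl = "http://localhost:3000", retries = 3, delayMs = 500, fetchImpl = fetch } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(`${baseUrl}/api/health`);
      if (res.ok) return true;
    } catch {
      // Network error (e.g. connection refused): server.js is probably not running yet.
    }
    if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
  }
  return false; // caller shows the "Backend not running" banner
}
```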

better-sqlite3 build error on Windows

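You need the Visual C++ build tools. On older Node.js versions:

npm install --global windows-build-tools

That package is deprecated; recent Node.js installers offer an "Automatically install the necessary tools" option, or you can install Visual Studio Build Tools with the "Desktop development with C++" workload.

Or use a prebuilt binary: npm install better-sqlite3 --build-from-source=false.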

Domain returns 0 emails

Some sites are heavily protected (Cloudflare bot challenge, JS-only rendering). The crawler:

  1. Fetches the page with axios first
  2. Falls back to Puppeteer if React/Vue/Angular markers are detected
  3. If that still yields nothing, the GitHub, Wayback, and DuckDuckGo sources can still surface emails
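The sketch below covers the first two steps. It pulls addresses out of the static HTML and flags pages that look client-rendered for a Puppeteer pass. The marker patterns and function names are assumptions, not the crawler's real heuristics, and the email regex is deliberately loose.

```javascript
// Loose email pattern; it also matches asset names like "logo@2x.png",
// which extractEmails() filters out.
const EMAIL_RE = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;

// Hypothetical markers for client-rendered pages.
const SPA_MARKERS = [
  /data-reactroot/i,                         // React (older builds)
  /id=["']__next["']/i,                      // Next.js
  /id=["']__nuxt["']/i,                      // Nuxt (Vue)
  /ng-version=/i,                            // Angular
  /<div id=["'](root|app)["']>\s*<\/div>/i,  // empty mount point
];

function extractEmails(html) {
  const matches = html.match(EMAIL_RE) || [];
  const unique = [...new Set(matches.map((m) => m.toLowerCase()))];
  return unique.filter((e) => !/\.(png|jpe?g|gif|svg|webp)$/.test(e));
}

function needsHeadlessRender(html) {
  return SPA_MARKERS.some((re) => re.test(html));
}

// Result of the axios pass: emails found so far, plus whether to re-render.
function planCrawl(html) {
  return { emails: extractEmails(html), usePuppeteer: needsHeadlessRender(html) };
}
```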

💡 Usage Examples

Single Domain

  1. Click extension icon
  2. Domain auto-fills from current tab (or type one)
  3. Click 🔍
  4. Browse the three tabs

Bulk CSV

  1. Switch to "Bulk CSV" tab
  2. Upload sample-domains.csv (provided)
  3. Click "Start"
  4. Watch progress; pause/resume as needed
  5. Export combined results
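Turning an uploaded file into the `{ domains: [] }` body for `POST /api/bulk/start` could look like this. The sketch makes three assumptions. The domain sits in the first column. A header row, if present, is labeled `domain` or `website`. Quoted fields that contain commas are not handled. `parseDomainList` is hypothetical.

```javascript
// Hypothetical parser for an uploaded .csv/.txt domain list.
function parseDomainList(text) {
  const seen = new Set();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue; // blank line or comment
    // First column only; a quoted field containing commas is not handled.
    const cell = line.split(/[,;\t]/)[0].trim().replace(/^"|"$/g, "").toLowerCase();
    if (cell === "domain" || cell === "website") continue; // header row
    const domain = cell.replace(/^https?:\/\//, "").replace(/^www\./, "").split("/")[0];
    if (domain) seen.add(domain); // Set keeps first-seen order and drops duplicates
  }
  return [...seen];
}

// Request body for POST /api/bulk/start:
// JSON.stringify({ domains: parseDomainList(fileText) })
```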

Programmatic

curl -X POST http://localhost:3000/api/search \
  -H "Content-Type: application/json" \
  -d '{"domain":"stripe.com"}'
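Here is the same call from JavaScript, assuming Node 18+ (global `fetch`). `searchDomain` is a hypothetical helper. The response shape isn't documented here, so it returns whatever JSON the server sends.

```javascript
// Hypothetical wrapper around POST /api/search.
async function searchDomain(domain, { forceRefresh = false, baseUrl = "http://localhost:3000", fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/api/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ domain, forceRefresh }), // forceRefresh bypasses the 7-day cache
  });
  if (!res.ok) throw new Error(`search failed: HTTP ${res.status}`);
  return res.json();
}

// Usage: searchDomain("stripe.com").then((data) => console.log(data));
```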

🔮 Suggested Next Steps

  • Add Hunter.io-style email pattern guessing UI
  • Integrate Apollo / Clearbit-style company enrichment
  • Add CSV column mapping (name, company → email guess)
  • Persist verification status across multiple SMTP attempts
  • Add Slack / Discord webhook notifications when bulk job completes
  • Build a separate dashboard at localhost:3000/ for richer browsing

⚠️ Legal / Ethical

  • Personal-use tool only
  • Respect robots.txt (the crawler honors common deny paths)
  • Don't use for cold-email spam — use for legitimate research, sales prospecting, or recruiting
  • Some jurisdictions classify scraping public emails as a gray area; check your local law
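For reference, here is a minimal robots.txt check. It illustrates the idea and is not the crawler's implementation. It only reads groups for `User-agent: *` and treats rule paths as plain prefixes, with no `*` or `$` wildcards. The longest matching rule wins, and Allow wins ties.

```javascript
// Collect Allow/Disallow rules from groups that apply to every crawler ("*").
function parseRobots(text) {
  const rules = [];
  let applies = false;
  let previousWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; a new run starts a new group.
      if (!previousWasAgent) applies = false;
      applies = applies || value === "*";
      previousWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      previousWasAgent = false;
      // An empty Disallow means "allow everything", so it adds no rule.
      if (applies && value) rules.push({ allow: field === "allow", path: value });
    }
  }
  return rules;
}

// Longest matching prefix wins; on a tie, Allow wins.
function isAllowed(path, rules) {
  let best = null;
  for (const rule of rules) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule: allowed
}
```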
