From a single query to a structured, source‑linked research report.
Searches the web → cleans articles → extracts insights with GPT‑4o → de‑duplicates & ranks → serves JSON, SQLite, CLI, and a Streamlit UI.
| Layer | Module | Tech / Libs | Purpose |
|---|---|---|---|
| Search | core/search.py | Serper.dev · httpx | Google‑style search → URLs |
| Parse | core/parse.py | Async httpx · trafilatura | Download & boilerplate‑strip HTML |
| Extract | core/extract.py | OpenAI GPT‑4o · function‑calling | Pull trends, companies, gaps |
| Aggregate | core/aggregate.py | RapidFuzz · frequency rank | De‑dupe & merge into a report |
| Persist | persistence.py | SQLite · sqlmodel | --save-sqlite for history |
| Serve | api/run.py | FastAPI · Typer CLI | /report JSON · gtmind CLI cmd |
| UI | ui/app.py | Streamlit | Query box, saved list, two‑column companies |
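The Aggregate row describes fuzzy de-duplication followed by a frequency rank. A minimal sketch of that idea follows; it is not the project's code. It swaps in the standard library's `difflib.SequenceMatcher` for RapidFuzz so it runs without extra dependencies, and the function names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two strings are near-duplicates (case-insensitive)."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def dedupe_and_rank(items: list[str], threshold: float = 0.85) -> list[tuple[str, int]]:
    """Group near-duplicate strings, then rank the groups by how often they occur."""
    groups: list[list[str]] = []
    for item in items:
        for group in groups:
            # Compare against the group's first member, which serves as its label.
            if similar(item, group[0], threshold):
                group.append(item)
                break
        else:
            # No existing group matched, so this item starts a new one.
            groups.append([item])
    # sorted() is stable, so groups of equal size keep first-seen order.
    ranked = sorted(groups, key=len, reverse=True)
    return [(group[0], len(group)) for group in ranked]


if __name__ == "__main__":
    mentions = ["Shopify", "shopify ", "Amazon", "Amazon", "amazon", "Walmart"]
    for name, count in dedupe_and_rank(mentions):
        print(f"{name}: {count}")
```

Comparing each item only against a group's first member keeps the sketch short; a production aggregator would likely use RapidFuzz scorers and a tuned threshold.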
flowchart LR
Q(Query) --> S(Search Service)
S --> P[Downloader + Trafilatura]
P --> E[OpenAI Extractor]
E --> A[Aggregator]
A --> J[ResearchReport JSON]
J -->|CLI| C[Typer]
J -->|SQLite| D[(DB)]
J -->|HTTP| F[FastAPI]
J -->|UI| U[Streamlit]
- Python 3.10+
- OpenAI API key
- Serper.dev API key (for Google-style search)
- Poetry (for dependency management)
Clone the repository and install dependencies:
git clone https://github.com/amritkochar/GTMind.git && cd GTMind
make install # poetry deps + tools
Before running the application, make sure a .env file exists in the root directory. You can create one by copying the provided .env.example file:
cp .env.example .env
Then open the .env file and add your API keys:
OPENAI_API_KEY="sk-•••"
SEARCH_API_KEY="serp_•••"
Both keys are required for the application to run. Do not commit them to GitHub.
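Since both keys are required, it can help to fail fast when one is missing. The sketch below only checks variables that are already in the environment; it does not read the .env file itself, and `missing_keys` is a hypothetical helper, not part of GTMind.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Key names taken from the .env example above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "SEARCH_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required key names that are unset or blank in the mapping."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not source.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing required keys: {', '.join(missing)}")
    print("All required keys are set.")
```

Passing a mapping explicitly makes the check easy to test without touching the real environment.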
Run the following commands to start the backend and frontend services:
make serve # FastAPI on :8000
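make ui # Streamlit on :8501
To launch the Streamlit UI directly:
streamlit run src/gtmind/ui/app.py
To generate a report from the CLI:
poetry run gtmind run "AI in retail" \
--out sample_output.json \
--save-sqlite research.db
To request the same report over HTTP:
GET http://localhost:8000/report?q=AI+in+retail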
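The HTTP request above can also be issued from Python. This is a small sketch using only the standard library; `report_url` and `fetch_report` are hypothetical helper names, the 120-second timeout is a guess, and the code assumes the endpoint returns a JSON object while the FastAPI server is running on port 8000.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def report_url(query: str, base: str = "http://localhost:8000") -> str:
    """Build the /report URL for a query; urlencode turns spaces into '+'."""
    return f"{base}/report?{urlencode({'q': query})}"


def fetch_report(query: str, base: str = "http://localhost:8000", timeout: float = 120.0) -> dict:
    """GET the report and decode the JSON body (needs the API server running)."""
    with urlopen(report_url(query, base), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL, so this runs even when the server is down.
    print(report_url("AI in retail"))
```

Calling `fetch_report("AI in retail")` performs the actual request; a generous timeout seems sensible because the pipeline searches, downloads, and calls an LLM before responding.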
The Streamlit UI includes:
- 🔹 Two‑column company list
- 🟢 Green‑highlighted whitespace gaps
- 📚 Sidebar of most‑recent reports (reads SQLite)
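The sidebar of recent reports implies a "newest first" query against the SQLite history. The sketch below illustrates that pattern with the standard library's `sqlite3` instead of the project's sqlmodel layer; the `reports` table, its columns, and both function names are assumptions and will not match the real schema.

```python
import json
import sqlite3


def save_report(conn: sqlite3.Connection, query: str, report: dict) -> None:
    """Store one report as a JSON string in a hypothetical `reports` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "query TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO reports (query, payload) VALUES (?, ?)",
        (query, json.dumps(report)),
    )
    conn.commit()


def recent_queries(conn: sqlite3.Connection, limit: int = 5) -> list[str]:
    """Return the queries of the most recently saved reports, newest first."""
    rows = conn.execute(
        "SELECT query FROM reports ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    save_report(conn, "AI in retail", {"trends": [], "companies": [], "gaps": []})
    save_report(conn, "AI in healthcare", {"trends": [], "companies": [], "gaps": []})
    print(recent_queries(conn))
```

Ordering by an auto-incrementing id avoids relying on timestamps, which is enough for a most-recent list.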
src/gtmind/
├─ core/ # search, parse, extract, aggregate
├─ api/ # FastAPI + Typer CLI
├─ ui/ # Streamlit front‑end
├─ persistence.py # SQLite helpers
├─ sample_outputs/ # example JSON reports
└─ tests/ # unit + integration
Real JSON examples live in sample_outputs/:
- ai_in_retail.json
- ai_in_healthcare.json
- 🧠 Build evals so that only good‑quality data is used to prepare reports
- 🚧 Build guardrails to keep pipeline outputs consistent and avoid unnecessary LLM calls
- 🔎 Vector cache of article embeddings for faster re‑runs
- ✨ RAG enrichment for deeper summaries
- 🌐 OAuth‑guarded web UI & shareable URLs
- 🤖 Scheduled cron search with email digests
- 🐳 Docker container & CI pipeline
make check # lint (ruff) + type‑check (mypy)
make test # run pytest
Pull requests welcome; please keep tests green! 🎉