TOVA is a topic modeling platform with a plug-in architecture, supporting training and inference via CLI and web interface.
- TOVA: Topic Visualization & Analysis
- Training
  - Train topic models from the CLI or web UI with the same workflow
  - Supported models: see the Model Support Table for the full list of traditional and LLM-based models
  - Full hyperparameter control over all native parameters of each model
- Topic Enrichment
  - Keyword-based topic descriptions out of the box
  - Optionally generate LLM-powered topic labels and summaries
  - Manually refine generated labels
- Exploration
  - Interactive dashboard with topic lists, top documents, visualizations, and evaluation metrics (coherence, entropy, diversity)
  - Suggestions of similar topics based on co-occurrence
- Inference & Export
  - Run inference on newly supplied documents or a complete corpus
  - Download topic assignments (most representative topic for each document) and document-topic matrices
- Extensibility
  - Plug-in architecture to add new topic model classes
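
Two of the outputs listed above, the per-document topic assignments and the diversity metric, can be illustrated with a small library-free sketch. It assumes a document-topic matrix stored as a list of rows (one row per document, one weight per topic) and uses one common definition of topic diversity: the share of unique words among all topics' top words. The function names are hypothetical and TOVA's own implementation may differ.

```python
from typing import List, Sequence


def dominant_topics(doc_topic: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each document row, the index of its highest-weight topic.

    Ties are resolved in favour of the lowest topic index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def topic_diversity(top_words: Sequence[Sequence[str]]) -> float:
    """Share of unique words among all topics' top words (1.0 means no overlap)."""
    all_words = [word for topic in top_words for word in topic]
    if not all_words:
        return 0.0
    return len(set(all_words)) / len(all_words)


if __name__ == "__main__":
    # Toy document-topic matrix: 3 documents, 2 topics.
    theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    print(dominant_topics(theta))
    # Two topics sharing one of their two top words.
    print(topic_diversity([["tax", "budget"], ["health", "budget"]]))
```

A diversity close to 1.0 suggests topics with little keyword overlap; values near 0 suggest redundant topics.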
| Model | Type | Library |
|---|---|---|
| TomotopyLDA | Traditional (Bayesian) | tomotopy |
| CTM | Traditional (Neural) | contextualized-topic-models |
- Docker ≥ 24 with the Compose plugin
- make
- Git
# 1. Clone the repo
git clone https://github.com/daniel-stephens/TOVA.git && cd TOVA
# 2. Create your environment file (edit ports, credentials, API keys as needed)
cp .env.example .env # or create it manually (see the next section)
# 3. Build images and start the stack
make smart-up

Open the web UI at http://<host>:<WEB_PORT> (default http://localhost:8080).
make smart-up detects uncommitted changes in docker/, ui/, and src/ and offers a full --no-cache rebuild before starting. Use make up to skip the check.
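
How make smart-up performs this check is not shown here. As a rough sketch under the assumption that it relies on Git's working-tree status, the same idea can be reproduced by filtering the output of git status --porcelain for the three watched directories. The function names and the WATCHED_DIRS constant are hypothetical.

```python
import subprocess
from typing import List

# Directories the README says are watched for uncommitted changes.
WATCHED_DIRS = ("docker/", "ui/", "src/")


def changed_paths(porcelain: str) -> List[str]:
    """Extract paths under WATCHED_DIRS from `git status --porcelain` output."""
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]              # skip the two status letters and the space
        if " -> " in path:           # renames are reported as "old -> new"
            path = path.split(" -> ", 1)[1]
        if path.startswith(WATCHED_DIRS):
            paths.append(path)
    return paths


def uncommitted_changes(repo_dir: str = ".") -> List[str]:
    """Run git in repo_dir and return the changed paths in watched directories."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return changed_paths(result.stdout)


if __name__ == "__main__":
    sample = " M src/tova/cli/main.py\n?? ui/new.js\n M README.md\n"
    print(changed_paths(sample))
```

Paths containing unusual characters are quoted by Git in this output format; the sketch does not unquote them.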
Create a .env file in the project root before running any make target.
All variables have safe defaults; change only what you need.
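
A .env file is a plain list of KEY=VALUE lines with # comments. The sketch below is one way to sanity-check such a file before starting the stack: it verifies that the Postgres variables are either all set or all empty and that the port variables hold valid port numbers. The variable names are the ones used in this README; the helper functions and the specific checks are illustrative assumptions, not part of TOVA.

```python
from typing import Dict, List

POSTGRES_KEYS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB")
PORT_KEYS = ("API_PORT", "WEB_PORT")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comment lines."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def check_env(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    set_pg = [k for k in POSTGRES_KEYS if env.get(k)]
    if set_pg and len(set_pg) != len(POSTGRES_KEYS):
        missing = sorted(set(POSTGRES_KEYS) - set(set_pg))
        problems.append(
            "Postgres variables must be set together; missing: " + ", ".join(missing)
        )
    for key in PORT_KEYS:
        value = env.get(key)
        if value and not (value.isdigit() and 1 <= int(value) <= 65535):
            problems.append(f"{key} is not a valid port: {value!r}")
    return problems


if __name__ == "__main__":
    example = "API_PORT=11000\nWEB_PORT=8080\nPOSTGRES_USER=tova_user\n"
    for problem in check_env(parse_env(example)):
        print(problem)
```

The parser is deliberately minimal: it does not handle quoted values or trailing inline comments.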
##############
# Image tags #
##############
# Keep these in sync with what your team publishes.
VERSION=latest
ASSETS_DATE=latest
##############
# Host ports #
##############
# These control the port exposed on the HOST machine.
# The containers always listen internally on 11000 (api) and 8080 (web).
# Change these if the default ports are already taken on your machine.
API_PORT=11000
WEB_PORT=8080
HOST=0.0.0.0
###############
# Environment #
###############
ENV=development
################
# LLM API keys #
################
# Leave empty to use only local / Ollama models.
OPENAI_API_KEY=
############
# Postgres #
############
# All three variables must be set together and must remain consistent with the
# existing postgres_data volume. If you change them after first run, you must
# wipe the volume first: make reset-db
POSTGRES_USER=tova_user
POSTGRES_PASSWORD=supersecretpassword
POSTGRES_DB=tova_db
###################
# Admin bootstrap #
###################
# Comma-separated e-mails that are always granted admin access on first login.
TOVA_ADMIN_EMAILS=

| Command | What it does |
|---|---|
| make build | Build all images in order: builder → assets → api → web |
| make build-api / make build-web | Build a single runtime image (uses cache) |
| make rebuild-all | Rebuild everything with --no-cache, then start the stack |
| make rebuild-run | Rebuild only runtime images (api, web) with --no-cache, then start |
| make rebuild-api / make rebuild-web | Rebuild a single service with --no-cache |

| Command | What it does |
|---|---|
| make smart-up | Recommended. Checks for source changes, offers a rebuild, then starts |
| make up | Build (with cache) and start api, web, postgres |
| make down | Stop and remove all containers |
| make reset-db | Warning: stops containers and wipes the Postgres volume (see Troubleshooting) |

| Command | What it does |
|---|---|
| make logs-api | Stream API logs |
| make logs-web | Stream web UI logs |
| make logs-postgres | Stream Postgres logs |

| Service | Internal port | Default host port | Description |
|---|---|---|---|
| api | 11000 | API_PORT (11000) | FastAPI backend |
| web | 8080 | WEB_PORT (8080) | Flask web UI |
| postgres | 5432 | 5432 | User and session storage |
| solr | 8983 | 8983 | Apache Solr search engine |
| solr-api | 8001 | 8001 | Solr query adapter |
| zookeeper | 2181 | 2181 | Solr coordination |
Runtime behaviour is controlled by static/config/config.yaml. The file has three main sections:
- llm: provider credentials, hosts, and model allowlists
- topic_modeling.general: shared defaults (provider, prompt, topic count)
- Per-model blocks (traditional, llm_based, opentopicrag, topicgpt): overrides for each model family
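
The relationship between shared defaults and per-model overrides can be sketched as a simple dictionary merge. This is only an illustration: it assumes the per-model blocks sit next to general under topic_modeling and that an override replaces the default key by key, neither of which is confirmed here. The function name and the "openai" provider value are hypothetical; the other keys come from the configuration example in this README.

```python
from typing import Any, Dict


def resolve_model_config(config: Dict[str, Any], family: str) -> Dict[str, Any]:
    """Merge topic_modeling.general with the block for one model family.

    Keys in the family block take precedence over the shared defaults.
    """
    topic_cfg = config.get("topic_modeling", {})
    resolved = dict(topic_cfg.get("general", {}))   # start from shared defaults
    resolved.update(topic_cfg.get(family, {}))      # apply family overrides
    return resolved


if __name__ == "__main__":
    # Already-parsed configuration, e.g. the result of loading config.yaml.
    config = {
        "topic_modeling": {
            "general": {"llm_provider": "ollama", "llm_model_type": "gemma3:4b"},
            "topicgpt": {"llm_provider": "openai"},  # assumed override
        }
    }
    print(resolve_model_config(config, "topicgpt"))
    print(resolve_model_config(config, "traditional"))
```

A family without its own block simply inherits the general defaults.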
OpenAI / Azure OpenAI: set OPENAI_API_KEY in .env. No changes needed in config.yaml.
- Start Ollama bound to all interfaces so Docker can reach it. Ollama reads its bind address from the OLLAMA_HOST environment variable; ollama serve does not accept --host or --port flags:

      OLLAMA_HOST=0.0.0.0:11434 ollama serve

- In static/config/config.yaml, use host.docker.internal as the hostname:

      llm:
        ollama:
          host: http://host.docker.internal:11434
          available_models: { ... }
      topic_modeling:
        general:
          llm_provider: "ollama"
          llm_model_type: "gemma3:4b"
          llm_server: "http://host.docker.internal:11434"

Replace host.docker.internal in the previous section with the Ollama server's IP address or hostname, e.g. http://192.168.1.50:11434. No docker-compose changes are needed.
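
Before pointing TOVA at an Ollama server, it can help to confirm that the server is reachable and that the configured model is installed. The sketch below queries Ollama's GET /api/tags endpoint, which lists locally available models. The function names are hypothetical, and the fetch parameter exists only so the HTTP call can be replaced in tests.

```python
import json
import urllib.request
from typing import Callable, List


def _http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()


def list_ollama_models(host: str, fetch: Callable[[str], bytes] = _http_get) -> List[str]:
    """Return the model names reported by the Ollama server at `host`."""
    payload = json.loads(fetch(host.rstrip("/") + "/api/tags"))
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    # Stand-in for a real server response, so the example runs offline.
    def fake_fetch(url: str) -> bytes:
        return json.dumps({"models": [{"name": "gemma3:4b"}]}).encode()

    models = list_ollama_models("http://192.168.1.50:11434", fake_fetch)
    print("gemma3:4b" in models)
```

Run the check from a place where the hostname resolves: host.docker.internal is only meaningful inside a container, so from the host machine use localhost or the server's address instead.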
python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -e .

python -m src.tova.cli.main train run \
--model tomotopyLDA \
--data data_test/bills_sample_100.csv \
--text-col tokenized_text \
--output data/models/tomotopy \
--tr-params '{"num_topics": 10, "num_iters": 50}'

pip install git+https://github.com/daniel-stephens/TOVA.git@master
# or with UI extras:
pip install "tova[ui] @ git+https://github.com/daniel-stephens/TOVA.git@master"

The package exposes modules under tova.* for programmatic training and inference.
Build a local wheel with python -m build --wheel.
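
When the training command above is launched from a script, hand-writing the JSON passed to --tr-params is error-prone. A minimal sketch, using only the CLI flags shown in this README, builds the argument list with json.dumps so that quoting is handled automatically. The helper name build_train_command is hypothetical.

```python
import json
import shlex
import sys
from typing import Any, Dict, List


def build_train_command(
    model: str,
    data: str,
    text_col: str,
    output: str,
    tr_params: Dict[str, Any],
) -> List[str]:
    """Build the argv list for the training CLI shown in this README."""
    return [
        sys.executable, "-m", "src.tova.cli.main", "train", "run",
        "--model", model,
        "--data", data,
        "--text-col", text_col,
        "--output", output,
        "--tr-params", json.dumps(tr_params),  # serialised as a single argument
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        model="tomotopyLDA",
        data="data_test/bills_sample_100.csv",
        text_col="tokenized_text",
        output="data/models/tomotopy",
        tr_params={"num_topics": 10, "num_iters": 50},
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) from the repository root runs the same training job without going through a shell.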
make smart-up / make up check ports before building. If a port is taken:
- Find and stop the conflicting process: ss -tlnp | grep :<PORT>
- Or change API_PORT / WEB_PORT in .env and re-run.
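
The same kind of check can be reproduced by hand. The sketch below tests whether a port can be bound on the host, which is a reasonable proxy for "free"; how the make targets implement their own check is not shown here, and the function name is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


if __name__ == "__main__":
    # Default host ports from the example .env file.
    for name, port in (("API_PORT", 11000), ("WEB_PORT", 8080)):
        state = "free" if port_is_free(port) else "in use"
        print(f"{name}={port}: {state}")
```

The check is conservative: a port that was released very recently may still be reported as in use for a short time.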
Postgres does not re-initialise an existing volume with new credentials.
If you changed POSTGRES_USER, POSTGRES_PASSWORD, or POSTGRES_DB in .env
after the first run, you must wipe the volume:
make reset-db # stops containers, removes postgres_data volume
make up # reinitialises the database with the new credentials

Warning: make reset-db permanently deletes all stored data (users, sessions, model metadata).

Usually caused by a stale Postgres volume whose schema or credentials no longer
match the running app. Run make reset-db then make up.