Local CrossRef database with 167M+ scholarly works, full-text search, and impact factor calculation
MCP Demo Video
Live demonstration of MCP server integration with Claude Code for epilepsy seizure prediction literature review:
- Full-text search on title, abstracts, and keywords across 167M papers (22ms response)
Full demo documentation | Generated diagrams
Why CrossRef Local?
Built for the LLM era - features that matter for AI research assistants:
| Feature | Benefit |
|---|---|
| Abstracts | Full text for semantic understanding |
| Impact Factor | Filter by journal quality |
| Citations | Prioritize influential papers |
| Speed | 167M records in ms, no rate limits |
Perfect for: RAG systems, research assistants, literature review automation.
Installation
pip install crossref-local
From source:
git clone https://github.com/ywatanabe1989/crossref-local
cd crossref-local && make install
Database setup (1.5 TB, ~2 weeks to build):
# 1. Download CrossRef data (~100GB compressed)
aria2c "https://academictorrents.com/details/..."
# 2. Build SQLite database (~days)
pip install dois2sqlite
dois2sqlite build /path/to/crossref-data ./data/crossref.db
# 3. Build FTS5 index (~60 hours) & citations table (~days)
make fts-build-screen
make citations-build-screen
Python API
from crossref_local import search, get, count
# Full-text search (22ms for 541 matches across 167M records)
results = search("hippocampal sharp wave ripples")
for work in results:
    print(f"{work.title} ({work.year})")
# Get by DOI
work = get("10.1126/science.aax0758")
print(work.citation())
# Count matches
n = count("machine learning")  # 477,922 matches
Async API:
import asyncio
from crossref_local import aio
async def main():
    # Run several counts in one call, then a single search
    counts = await aio.count_many(["CRISPR", "neural network", "climate"])
    results = await aio.search("machine learning")
asyncio.run(main())
CLI
crossref-local search "CRISPR genome editing" -n 5
crossref-local search-by-doi 10.1038/nature12373
crossref-local status  # Configuration and database stats
With abstracts (-a flag):
$ crossref-local search "RS-1 enhances CRISPR" -n 1 -a
Found 4 matches in 128.4ms
1. RS-1 enhances CRISPR/Cas9- and TALEN-mediated knock-in efficiency (2016)
DOI: 10.1038/ncomms10548
Journal: Nature Communications
Abstract: Zinc-finger nuclease, transcription activator-like effector nuclease
and CRISPR/Cas9 are becoming major tools for genome editing...
HTTP API
Start the FastAPI server:
crossref-local relay --host 0.0.0.0 --port 31291
Endpoints:
# Search works (FTS5)
curl "http://localhost:31291/works?q=CRISPR&limit=10"
# Get by DOI
curl "http://localhost:31291/works/10.1038/nature12373"
# Batch DOI lookup
curl -X POST "http://localhost:31291/works/batch" \
-H "Content-Type: application/json" \
-d '{"dois": ["10.1038/nature12373", "10.1126/science.aax0758"]}'
# Citation endpoints
curl "http://localhost:31291/citations/10.1038/nature12373/citing"
curl "http://localhost:31291/citations/10.1038/nature12373/cited"
curl "http://localhost:31291/citations/10.1038/nature12373/count"
# Collection endpoints
curl "http://localhost:31291/collections"
curl -X POST "http://localhost:31291/collections" \
-H "Content-Type: application/json" \
-d '{"name": "my_papers", "query": "CRISPR", "limit": 100}'
curl "http://localhost:31291/collections/my_papers/download?format=bibtex"
# Database info
curl "http://localhost:31291/info"
HTTP mode (connect to running server):
# On local machine (if server is remote)
ssh -L 31291:127.0.0.1:31291 your-server
# Python client
from crossref_local import configure_http
configure_http("http://localhost:31291")
# Or via CLI
crossref-local --http search "CRISPR"
MCP Server
Run as MCP (Model Context Protocol) server:
crossref-local mcp start
Local MCP client configuration:
{
  "mcpServers": {
    "crossref-local": {
      "command": "crossref-local",
      "args": ["mcp", "start"],
      "env": {
        "CROSSREF_LOCAL_DB": "/path/to/crossref.db"
      }
    }
  }
}
Remote MCP via HTTP (recommended):
# On server: start persistent MCP server
crossref-local mcp start -t http --host 0.0.0.0 --port 8082
Client configuration:
{
  "mcpServers": {
    "crossref-remote": {
      "url": "http://your-server:8082/mcp"
    }
  }
}
Diagnose setup:
crossref-local mcp doctor # Check dependencies and database
crossref-local mcp list-tools # Show available MCP tools
crossref-local mcp installation # Show client config examples
See docs/remote-deployment.md for systemd and Docker setup.
Available tools:
- search - Full-text search across 167M+ papers
- search_by_doi - Get paper by DOI
- enrich_dois - Add citation counts and references to DOIs
- status - Database statistics
- cache_* - Paper collection management
Impact Factor
from crossref_local.impact_factor import ImpactFactorCalculator
with ImpactFactorCalculator() as calc:
    result = calc.calculate_impact_factor("Nature", target_year=2023)
    print(f"IF: {result['impact_factor']:.3f}")  # 54.067
| Journal | IF 2023 |
|---|---|
| Nature | 54.07 |
| Science | 46.17 |
| Cell | 54.01 |
| PLOS ONE | 3.37 |
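For reference, the conventional two-year impact factor for year Y divides the citations received in Y by items published in Y-1 and Y-2 by the number of citable items published in those two years. The sketch below works through that arithmetic on made-up counts. Whether `ImpactFactorCalculator` uses exactly this definition is an assumption, and `two_year_impact_factor` is a hypothetical helper, not part of crossref-local.

```python
def two_year_impact_factor(citations_by_pub_year, items_by_pub_year, target_year):
    """Conventional two-year impact factor (assumed definition, see above).

    citations_by_pub_year: citations received during target_year, keyed by
        the publication year of the cited item.
    items_by_pub_year: number of citable items, keyed by publication year.
    """
    window = (target_year - 1, target_year - 2)
    citations = sum(citations_by_pub_year.get(year, 0) for year in window)
    items = sum(items_by_pub_year.get(year, 0) for year in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return citations / items


# Illustrative numbers only, not taken from the database.
citations_2023 = {2022: 300, 2021: 450, 2020: 900}  # 2020 is outside the window
items = {2022: 100, 2021: 150}
impact_factor = two_year_impact_factor(citations_2023, items, target_year=2023)
print(f"IF: {impact_factor:.3f}")  # IF: 3.000
```

Here (300 + 450) citations over (100 + 150) items gives 3.0; citations to the 2020 items are ignored because they fall outside the two-year window.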
Citation Network
from crossref_local import get_citing, get_cited, CitationNetwork
citing = get_citing("10.1038/nature12373") # 1539 papers
cited = get_cited("10.1038/nature12373")
# Build visualization (like Connected Papers)
network = CitationNetwork("10.1038/nature12373", depth=2)
network.save_html("citation_network.html")  # requires: pip install crossref-local[viz]
Performance
| Query | Matches | Time |
|---|---|---|
| hippocampal sharp wave ripples | 541 | 22ms |
| machine learning | 477,922 | 113ms |
| CRISPR genome editing | 12,170 | 257ms |
Searching 167M records in milliseconds via FTS5.
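To illustrate the mechanism behind these numbers, here is a minimal sketch of SQLite FTS5 search using Python's built-in `sqlite3` module on an in-memory table. It assumes the local Python build ships SQLite with FTS5 enabled. The table name `works_fts`, its columns, the helper functions, and the three sample rows are all hypothetical and do not reflect the actual crossref-local schema.

```python
import sqlite3

# In-memory database with a hypothetical FTS5 table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE works_fts USING fts5(title, abstract)")
conn.executemany(
    "INSERT INTO works_fts (title, abstract) VALUES (?, ?)",
    [
        ("Hippocampal sharp wave ripples in memory consolidation",
         "Ripples during sleep."),
        ("CRISPR genome editing in human cells", "Cas9 nuclease activity."),
        ("Machine learning for seizure prediction", "EEG classification."),
    ],
)


def search_titles(query):
    """Return titles of rows matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT title FROM works_fts WHERE works_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [title for (title,) in rows]


def count_matches(query):
    """Count rows matching an FTS5 query."""
    (n,) = conn.execute(
        "SELECT count(*) FROM works_fts WHERE works_fts MATCH ?", (query,)
    ).fetchone()
    return n


print(search_titles("sharp wave ripples"))
# ['Hippocampal sharp wave ripples in memory consolidation']
print(count_matches("crispr"))  # 1
```

Multiple terms in an FTS5 query are combined with an implicit AND, and the default tokenizer is case-insensitive for ASCII, which is why `crispr` matches `CRISPR`. The lookup goes through an inverted index rather than scanning every row, which is what keeps queries fast as the table grows.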
Related Projects
openalex-local - Sister project with OpenAlex data:
| Feature | crossref-local | openalex-local |
|---|---|---|
| Works | 167M | 284M |
| Abstracts | ~21% | ~45-60% |
| Update frequency | Real-time | Monthly |
| DOI authority | Yes (source) | Uses CrossRef |
| Citations | Raw references | Linked works |
| Concepts/Topics | No | Yes |
| Author IDs | No | Yes |
| Best for | DOI lookup, raw refs | Semantic search |
When to use CrossRef: Real-time DOI updates, raw reference parsing, authoritative metadata. When to use OpenAlex: Semantic search, citation analysis, topic discovery.
CrossRef Local is part of SciTeX. When used inside the SciTeX framework, DOI resolution and citation checking integrate seamlessly:
import scitex
# Resolve DOIs and enrich bibliography
scitex.scholar.enrich_bibtex("references.bib")
# Check citation accuracy
scitex.scholar.check_citations("manuscript.tex")
The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere: your machine, your terms.
- The freedom to study how every step works, from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0, because we believe research infrastructure deserves the same freedoms as the software it runs on.
