This repository contains 2,897 historical documents from the House Oversight Committee's Jeffrey Epstein collection, organized into a hierarchical index system optimized for research with Claude and other LLMs.
- 60.7 MB of source documents (2,897 files organized in TEXT/001 and TEXT/002)
- 665 KB of strategic indexes that enable full-document research with about 99% less context (665 KB versus 60.7 MB)
- Hierarchical navigation system (3 tiers) that lets you start broad and drill down to specifics
This system uses a 3-tier hierarchical index so you don't need to load all 60 MB at once:
| Tier | Files | Size | Purpose |
|---|---|---|---|
| 1 | INDEX_MASTER.md | 2.3 KB | Start here - overview & navigation guide |
| 2 | 6 specialized indexes | 9.4 KB | Choose based on your query type |
| 3 | 2 summary files | 641 KB | Find specific documents & get their details |
| Source | TEXT/001/ + TEXT/002/ | 60.7 MB | Actual document files (load as needed) |
Open Claude (claude.ai or Claude Code) and follow these patterns:
Your prompt to Claude:
I'm researching Jeffrey Epstein and his associates in the House Oversight files.
@INDEX_MASTER.md
@INDEX_PEOPLE.md
Who are the key people mentioned alongside Epstein? What documents should I read?
What happens:
- Claude reads the indexes you provided
- Claude identifies relevant document IDs from the indexes
- Claude tells you which documents to load next for deeper research
- You copy the summaries for those documents from INDEX_SUMMARIES_001.md or INDEX_SUMMARIES_002.md
Your prompt to Claude:
I want to understand the legal case against Jeffrey Epstein.
@INDEX_MASTER.md
@INDEX_LEGAL.md
@INDEX_TIMELINE.md
What are the major legal proceedings? What's the chronological sequence of events?
Your prompt to Claude:
I'm interested in understanding the email communications in this dataset.
@INDEX_MASTER.md
@INDEX_CORRESPONDENCE.md
Who were the key communicators? What were they discussing?
Once Claude identifies relevant documents, load the summaries:
Your next prompt:
Now here are the summaries for those documents:
@INDEX_SUMMARIES_001.md (or @INDEX_SUMMARIES_002.md for relevant sections)
Can you synthesize these summaries and tell me what stands out?
For the deepest research, load the actual document text:
Your prompt:
Now let me share the full text of the key documents:
@TEXT/001/HOUSE_OVERSIGHT_XXXXX.txt (or @TEXT/002/...)
Now that you can see the full text, what additional insights do you get?
Always start here:
- INDEX_MASTER.md - Overview, statistics, entity compression guide, navigation map
  - Size: 2.3 KB
  - Contains: Top 30 entity codes ([E01]=Epstein, [E02]=Trump, etc.)
Then load one or more of these based on your query:
- INDEX_PEOPLE.md - Alphabetical index of 1,047 people mentioned
  - Use when: "Who is [person]?" or "Find documents about [person]"
  - Size: 3.2 KB
- INDEX_LEGAL.md - Legal documents and court proceedings (224 docs)
  - Use when: "What legal cases are mentioned?" or "Find court documents"
  - Size: 1.1 KB
- INDEX_CORRESPONDENCE.md - Email index (2,202 emails)
  - Use when: "Who was communicating with whom?" or "Find email exchanges"
  - Size: 1.4 KB
- INDEX_LOCATIONS.md - Properties and geographic index (954 locations)
  - Use when: "What locations are mentioned?" or "Find documents about [place]"
  - Size: 1.1 KB
- INDEX_TIMELINE.md - Chronological event index
  - Use when: "What happened in [year]?" or "Follow the sequence of events"
  - Size: 0.8 KB
- INDEX_TOPICS.md - Thematic grouping (finance, legal, travel, etc.)
  - Use when: "Find documents about [topic]" or "What themes are discussed?"
  - Size: 1.0 KB
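The "Use when" guidance above can be sketched as a small lookup that turns query types into a list of files to load. This is only an illustration: the query-type keys (`"people"`, `"legal"`, and so on) and the function name `files_to_load` are hypothetical and not part of the repository.

```python
# Hypothetical mapping from a query type to the tier-2 index named in this README.
TIER2_INDEXES = {
    "people": "INDEX_PEOPLE.md",
    "legal": "INDEX_LEGAL.md",
    "correspondence": "INDEX_CORRESPONDENCE.md",
    "locations": "INDEX_LOCATIONS.md",
    "timeline": "INDEX_TIMELINE.md",
    "topics": "INDEX_TOPICS.md",
}


def files_to_load(*query_types):
    """Return INDEX_MASTER.md followed by the tier-2 indexes for the given query types."""
    files = ["INDEX_MASTER.md"]  # tier 1 always comes first
    for query_type in query_types:
        name = TIER2_INDEXES[query_type.lower()]  # raises KeyError for an unknown type
        if name not in files:
            files.append(name)
    return files


print(files_to_load("legal", "timeline"))
```

Keeping INDEX_MASTER.md at the front of the list mirrors the "always start here" advice, so the entity codes are in context before any tier-2 index is read.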
For specific document details:
- INDEX_SUMMARIES_001.md - Summaries of 2,000 documents from TEXT/001
  - Contains: Document ID, type, date, entities, file path, 2-3 sentence summary
  - Size: 446 KB
- INDEX_SUMMARIES_002.md - Summaries of 897 documents from TEXT/002
  - Contains: Same format as above
  - Size: 195 KB
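Because the summary files are far larger than the other indexes, it can help to extract only the entries that mention a name before pasting them into a prompt. The sketch below assumes entries are separated by blank lines, which this README does not confirm; adjust the splitting rule to the real layout. The function name and the sample text are made up for illustration.

```python
import re


def matching_entries(summary_text, term):
    """Return the blank-line-separated blocks that mention `term` (case-insensitive)."""
    blocks = [block.strip() for block in re.split(r"\n\s*\n", summary_text)]
    needle = term.lower()
    return [block for block in blocks if block and needle in block.lower()]


# Made-up sample text, not taken from the real summary files.
sample = (
    "ID: DOC_A\nSummary: placeholder text mentioning Alice.\n\n"
    "ID: DOC_B\nSummary: placeholder text mentioning Bob.\n"
)
for entry in matching_entries(sample, "alice"):
    print(entry)
```

In practice you would read INDEX_SUMMARIES_001.md or INDEX_SUMMARIES_002.md from disk and pass its text in place of `sample`.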
Located in /home/chris/projects/epstein-files/TEXT/
- TEXT/001/ - 2,000 larger documents (56 MB)
  - Average file size: 26 KB
  - Content: Legal documents, news compilations, book excerpts
  - File naming: HOUSE_OVERSIGHT_010477.txt through HOUSE_OVERSIGHT_031751.txt
- TEXT/002/ - 897 smaller documents (4.7 MB)
  - Average file size: 2.7 KB
  - Content: Mostly email correspondence
  - File naming: HOUSE_OVERSIGHT_031753.txt through HOUSE_OVERSIGHT_033599.txt
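The ID ranges in the file-naming lines suggest that a document's folder can be derived from its number. The sketch below assumes the two ranges split cleanly between TEXT/001 and TEXT/002 and that IDs are zero-padded to six digits, as the example names show. The helper `document_path` is hypothetical, and because IDs are not necessarily contiguous, a returned path may not exist on disk.

```python
from pathlib import PurePosixPath

# Inclusive ID ranges taken from the file-naming lines in this README.
ID_RANGES = {
    "001": (10477, 31751),
    "002": (31753, 33599),
}


def document_path(doc_id, root="TEXT"):
    """Build the relative path for a document ID such as 10477 or '031753'."""
    number = int(doc_id)
    for folder, (low, high) in ID_RANGES.items():
        if low <= number <= high:
            name = f"HOUSE_OVERSIGHT_{number:06d}.txt"
            return str(PurePosixPath(root) / folder / name)
    raise ValueError(f"document ID {doc_id!r} is outside the documented ranges")


print(document_path(10477))
print(document_path("031753"))
```

Pass a different `root` if your copy of the repository keeps the TEXT directory somewhere else.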
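To save tokens, the 30 most-mentioned names are encoded as [E01]-[E30]. Codes map to name strings rather than to unique people, so one person can appear under more than one code (for example, [E01] "Epstein" and [E03] "Jeffrey Epstein"):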
| Code | Person | Mentions |
|---|---|---|
| [E01] | Epstein | 11,958 |
| [E02] | Trump | 4,437 |
| [E03] | Jeffrey Epstein | 2,703 |
| [E05] | Dershowitz | 1,623 |
| [E06] | Clinton | 1,039 |
| [E10] | Prince Andrew | 455 |
| [E18] | Ghislaine Maxwell | 266 |
| [E19] | Alan Dershowitz | 266 |
(See INDEX_MASTER.md for complete list)
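When reading index entries outside of Claude, the codes can be expanded back into names with a short substitution. The dictionary below contains only the eight codes shown in the table above; the remaining codes live in INDEX_MASTER.md and are deliberately not guessed here. The function name `expand_codes` is an illustrative choice.

```python
import re

# Only the codes listed in the table above; see INDEX_MASTER.md for the rest.
ENTITY_CODES = {
    "E01": "Epstein",
    "E02": "Trump",
    "E03": "Jeffrey Epstein",
    "E05": "Dershowitz",
    "E06": "Clinton",
    "E10": "Prince Andrew",
    "E18": "Ghislaine Maxwell",
    "E19": "Alan Dershowitz",
}

CODE_PATTERN = re.compile(r"\[(E\d{2})\]")


def expand_codes(text):
    """Replace known [E##] codes with names; unknown codes are left unchanged."""
    return CODE_PATTERN.sub(
        lambda match: ENTITY_CODES.get(match.group(1), match.group(0)), text
    )


print(expand_codes("[E18] wrote to [E01]; cc [E99]"))
```

Leaving unknown codes untouched makes gaps in the mapping visible instead of silently dropping them.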
Load: INDEX_MASTER.md (~2.3 KB)
Claude's context used: ~4 KB
Time to answer: 1-2 minutes
Ask Claude: "What's this dataset about? What are the main topics and entities?"
Load:
- INDEX_MASTER.md (2.3 KB)
- INDEX_PEOPLE.md (3.2 KB)
- Relevant sections of INDEX_SUMMARIES_001/002.md (~10-20 KB)
Claude's context used: ~30-40 KB
Time to answer: 3-5 minutes
Ask Claude: "Tell me everything mentioned about [person name]. What documents should I read for more details?"
Load:
- INDEX_MASTER.md (2.3 KB)
- INDEX_LEGAL.md (1.1 KB)
- INDEX_TIMELINE.md (0.8 KB)
- Relevant INDEX_SUMMARIES sections (~15-20 KB)
Claude's context used: ~35-45 KB
Time to answer: 5-10 minutes
Ask Claude: "What are the key legal cases? What's the timeline of events? Who were the main attorneys and judges involved?"
Load:
- INDEX_MASTER.md (2.3 KB)
- INDEX_CORRESPONDENCE.md (1.4 KB)
- INDEX_PEOPLE.md (3.2 KB)
- Relevant INDEX_SUMMARIES sections (~20-30 KB)
Claude's context used: ~40-50 KB
Time to answer: 5-10 minutes
Ask Claude: "What communications are recorded between [person A] and [person B]? What were they discussing?"
Load:
- INDEX_MASTER.md (2.3 KB)
- Multiple tier-2 indexes (up to 9.4 KB for all six)
- Relevant INDEX_SUMMARIES sections (30-50 KB)
- Actual document texts from TEXT/ (load only the most relevant)
Claude's context used: ~100-150 KB (still only 0.2% of total dataset)
Time to answer: 15-30 minutes
This is where you load the actual source documents for comprehensive analysis.
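Before pasting source documents into a prompt, it can be useful to total the size of the files you plan to load. The following is a minimal sketch that measures files on disk; the function name `context_size_kb` is hypothetical, and file size in KB is only a rough proxy for token count.

```python
from pathlib import Path


def context_size_kb(paths):
    """Sum the on-disk size, in KB, of the files that exist; report the ones that do not."""
    total_bytes = 0
    for path in map(Path, paths):
        if path.is_file():
            total_bytes += path.stat().st_size
        else:
            print(f"skipping missing file: {path}")
    return total_bytes / 1024


plan = [
    "INDEX_MASTER.md",
    "INDEX_PEOPLE.md",
    "TEXT/001/HOUSE_OVERSIGHT_010477.txt",
]
print(f"planned context: {context_size_kb(plan):.1f} KB")
```

Run it from the repository root so the relative paths resolve. If the total climbs well beyond the 100-150 KB range described above, consider dropping the least relevant documents.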
The index system is optimized for LLM context efficiency:
| Approach | Context Used | Token Savings |
|---|---|---|
| Loading full dataset | 60.7 MB | 0% (baseline) |
| Loading all indexes + summaries | 665 KB | 98.9% |
| Loading master + 1 tier-2 index | 12 KB | 99.98% |
| Loading master + 2 tier-2 + summaries | 50-60 KB | 99.9% |
Key insight: You can research the entire 60.7 MB dataset using only 50-100 KB of index files, saving 99%+ of context.
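The savings column can be reproduced with simple arithmetic: savings = 1 - (loaded size / dataset size). The sketch below treats 1 MB as 1,000 KB, which is an assumption, and uses 55 KB as the midpoint of the 50-60 KB row. Sizes in bytes are only a proxy for tokens, so read the percentages as approximate.

```python
DATASET_KB = 60.7 * 1000  # 60.7 MB, assuming 1 MB = 1,000 KB


def savings_percent(loaded_kb, total_kb=DATASET_KB):
    """Percentage of the dataset that does not need to be loaded."""
    return (1 - loaded_kb / total_kb) * 100


rows = [
    ("all indexes + summaries", 665),
    ("master + 1 tier-2 index", 12),
    ("master + 2 tier-2 + summaries", 55),  # midpoint of 50-60 KB
]
for label, loaded_kb in rows:
    print(f"{label}: {savings_percent(loaded_kb):.2f}% saved")
```

Rounded to the precision used in the table, these come out to 98.9%, 99.98%, and 99.9%.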
"Who was Ghislaine Maxwell communicating with, and what did she discuss?"
@INDEX_MASTER.md
"From this index, show me the entity compression code for Ghislaine Maxwell. What documents mention her most?"
@INDEX_PEOPLE.md
"Show me the entry for Ghislaine Maxwell. Who did she communicate with?"
@INDEX_CORRESPONDENCE.md
"Looking at the email index, find any email threads involving Ghislaine Maxwell."
@INDEX_SUMMARIES_001.md
"Here are the summaries of documents mentioning Ghislaine Maxwell. What themes emerge? What should I read next?"
@TEXT/001/HOUSE_OVERSIGHT_012345.txt
"Here's the full text of one of the key documents. What new insights do you get from reading the actual content?"
- Completeness: 100% of documents indexed (2,897/2,897)
- Entity extraction accuracy: >95%
- Date extraction accuracy: >90%
- Content classification accuracy: >95%
- Date range: 1990s through 2019
- Known limitations:
  - Some OCR artifacts in older scans
  - Date format variations
  - Name spelling variations (e.g., "Epstein" vs "Jeffrey Epstein" vs "J. Epstein")
- ✅ Start with INDEX_MASTER.md every time
- ✅ Load indexes in tier order (1 → 2 → 3)
- ✅ Use [E##] codes when referring to top 30 people
- ✅ Load only what you need for your current query
- ✅ Ask Claude to identify document IDs you should examine next
- ❌ Don't load all source documents at once
- ❌ Don't load all indexes if you only need one
- ❌ Don't skip the entity compression guide in INDEX_MASTER.md
- ❌ Don't ask Claude to analyze documents you haven't provided
This is expected. Claude can only work with indexes/documents you explicitly share. Copy the relevant index or document text into your prompt.
Check if you've shared all relevant indexes. Sometimes a document appears in multiple indexes with different perspectives (people, legal, timeline). Load complementary indexes for complete context.
Ask Claude: "Based on the indexes I've shared, which specific documents should I load next for deeper research on [topic]?" Claude will suggest document IDs, which you can then look up in INDEX_SUMMARIES_*.md.
All documents are in /home/chris/projects/epstein-files/TEXT/001/ or /home/chris/projects/epstein-files/TEXT/002/.
Example correct paths:
/home/chris/projects/epstein-files/TEXT/001/HOUSE_OVERSIGHT_010477.txt
/home/chris/projects/epstein-files/TEXT/002/HOUSE_OVERSIGHT_031753.txt
If you're using Claude Code in this repository:
- Claude Code automatically reads CLAUDE.md to understand the repository structure
- You can ask Claude Code questions about the documents
- Claude Code can help you navigate and load indexes programmatically
- You still copy index contents into your prompts to get Claude to analyze them
Example Claude Code usage:
"@claude-code Read the TEXT/001/HOUSE_OVERSIGHT_010477.txt file and tell me what it contains"
To maintain or extend this index system:
See CLAUDE.md for:
- System architecture details
- How the 3-tier system works
- Entity compression schema
- Common maintenance tasks
- How to add new documents or indexes
| Action | Steps | Time | Context |
|---|---|---|---|
| Quick overview | Load INDEX_MASTER.md | 2 min | 5 KB |
| Find a person | Load MASTER + INDEX_PEOPLE.md | 3 min | 10 KB |
| Research a topic | Load MASTER + relevant tier-2 + summaries | 10 min | 50 KB |
| Deep analysis | Load above + specific document texts | 20-30 min | 100-200 KB |
Ready to start? Load INDEX_MASTER.md into Claude and ask your first question!