FULL_EPSTEIN_INDEX

CONTENT WARNING: This repository contains graphic and highly sensitive material regarding sexual abuse, exploitation, trafficking, and violence. It also contains unverified allegations and raw witness statements. User discretion is strongly advised.

Overview

Note: There is a lot of data, and OCR made mistakes while scanning the files. As a result, the dataset contains considerable noise, whether from OCR pulling words out of ordinary images embedded in the PDFs or from character recognition failures. Feel free to contribute, clean up, add to it, etc.
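As a rough illustration of how OCR noise could be flagged for cleanup, here is a minimal sketch. The function name `looks_like_ocr_noise` and the 0.6 threshold are assumptions made for this example, not part of the repository's scripts. It marks a line as suspect when few of its visible characters are letters or when it contains no run of three or more letters.

```python
import re

# A "word" here is any run of three or more ASCII letters (an assumption
# that suits English-language documents; adjust for other languages).
WORD = re.compile(r"[A-Za-z]{3,}")


def looks_like_ocr_noise(line: str, min_alpha_ratio: float = 0.6) -> bool:
    """Heuristically flag a line of OCR output as probable noise.

    The threshold is illustrative only and should be tuned on real samples.
    """
    stripped = line.strip()
    if not stripped:
        return True  # empty lines carry no content
    visible = [c for c in stripped if not c.isspace()]
    alpha_ratio = sum(c.isalpha() for c in visible) / len(visible)
    has_word = WORD.search(stripped) is not None
    return alpha_ratio < min_alpha_ratio or not has_word


if __name__ == "__main__":
    samples = [
        "The meeting was moved to Tuesday afternoon.",
        "~~|| .,;' _1l| ][",
    ]
    for text in samples:
        print(looks_like_ocr_noise(text), repr(text))
```

A heuristic like this will also flag legitimate lines made mostly of digits, such as dates or phone numbers, so flagged lines are better queued for manual review than deleted outright.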

You can view ALL raw files here. This includes all releases, from all government bodies, from all timelines.

This is a comprehensive, unified research archive aggregating public releases related to the Jeffrey Epstein estate and associated investigations.

This repository expands upon earlier archives (such as the initial November 2025 House Oversight release) by integrating the First Phase of Declassified Epstein Files released by the Department of Justice. Unlike previous datasets restricted to scanned emails, this index combines:

House Oversight Documents: ~20,000 pages of emails and estate records (Nov 2025).
DOJ Disclosures: Flight logs, contact books (redacted), and the "Masseuse List."
Multimedia Evidence: BOP video footage and audio recordings from the Maxwell Proffer sessions.
FBI releases, Customs and Border Patrol releases, etc.

Note on Updates: This index is a living archive. As additional phases of files are declassified and released by government bodies, they will be processed, indexed, and added to this repository.

The repository is organized to facilitate open-source intelligence (OSINT) analysis and research.

Always cross-reference with the original raw files.

Usage Guidelines

This dataset is intended for research, investigative journalism, and legal analysis. By accessing this repository, you agree to the following ethical guidelines:

User Responsibilities

Verify Facts: Much of this data consists of raw evidence, notes, and unverified allegations. Do not present search results as established fact without corroboration.
Respect Privacy: Adhere to all redactions found in the source documents. Do not attempt to use this data to doxx or harass individuals.
Victim Dignity: Treat all information regarding potential victims with extreme care and respect.

Prohibited Uses

No Fine-Tuning: Do not use this dataset to train or fine-tune generative AI models. The risk of generating hallucinations regarding sensitive legal matters is too high.
No Harassment: Do not use this tool to target private individuals.
No Commercial Exploitation: This data is for public interest and educational purposes.

Sources

This index aggregates public domain releases from:
U.S. House Committee on Oversight and Accountability (Nov 12, 2025 Release)
U.S. Department of Justice (First Phase Declassification, Flight Logs, Maxwell Proffer Material, Dec 2025 Release)
All FBI, Border Patrol, and other government body releases

Legal & Disclaimer

Disclaimer: This repository is an independent collection and is not an official service of the U.S. Government.

Copyright: Original government documents are generally in the public domain or released under Fair Use principles for research. The organizing scripts and index structure are licensed under the MIT License.

Liability: The maintainers of FULL_EPSTEIN_INDEX claim no ownership over the underlying documents and assume no liability for the use, misuse, or interpretation of this data. Users are solely responsible for compliance with applicable laws and privacy regulations.

To contribute corrections to transcripts or metadata, please open a Pull Request citing the specific File ID.

Interactive Network Analysis (Added Dec 2025)

Live Demo

Open the Interactive Network Graph

Click any node to open Wikipedia
Use the timeline slider to filter by year (1994-2025)
Hover for details on each person
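The timeline slider's year filtering can be pictured with a short sketch. It assumes each connection carries a single `year` field; the `Connection` class, its field names, and the placeholder node labels are hypothetical, and the real graph data may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One edge in the graph. The field names are assumptions for this sketch."""
    source: str
    target: str
    year: int


def filter_by_year(connections, start, end):
    """Keep only connections whose year falls within [start, end], inclusive."""
    if start > end:
        raise ValueError("start year must not be after end year")
    return [c for c in connections if start <= c.year <= end]


def visible_nodes(connections):
    """Return the set of node labels that still have at least one connection."""
    return {c.source for c in connections} | {c.target for c in connections}


if __name__ == "__main__":
    # Placeholder labels only; no real data is used here.
    sample = [
        Connection("A", "B", 1994),
        Connection("B", "C", 2003),
        Connection("C", "D", 2025),
    ]
    kept = filter_by_year(sample, 1994, 2003)
    print(len(kept), sorted(visible_nodes(kept)))
```

Hiding nodes that lose all their connections, as `visible_nodes` does, is one possible design choice; a viewer could equally keep every node on screen and hide only the edges.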

Key Findings from DataSet 8 Analysis

Content was extracted from improperly redacted documents in DOJ DataSet 8 (10,593 PDFs). In the affected documents, the "redactions" are visual overlays only; the underlying text was never removed.

Discovered Hidden Content:

936 unique email addresses extracted
23,383 names identified
Ghislaine Maxwell's personal email: gmax1@mindspring.com
2002 email chain discussing arranging "girls" in Peru (Document: EFTA00011438)
Epstein's Sex Offender Registry fully extractable (victim ages: 14, 16)
11 email addresses belonging to Epstein including LITTLESTJEFF@YAHOO.COM
Encrypted messaging apps: Signal, Telegram, WhatsApp, Confide

See REDDIT_REPORT.md for top 50 findings with sources.

Network Statistics

Metric	Value
Individuals Mapped	47
Connections Documented	112
Documents Analyzed	19,154
Year Range	1994-2025
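Summary figures like those in the table could be derived from an edge list along the following lines. This is a sketch under stated assumptions: the `(source, target, year)` tuple layout and the function name `network_stats` are hypothetical, and connections are treated as undirected, which may not match how the repository counts them.

```python
def network_stats(edges):
    """Summarize an edge list of (source, target, year) tuples.

    Connections are counted as undirected pairs, so (A, B) and (B, A)
    are treated as the same connection.
    """
    nodes = set()
    pairs = set()
    years = []
    for source, target, year in edges:
        nodes.update((source, target))
        pairs.add(frozenset((source, target)))
        years.append(year)
    return {
        "individuals": len(nodes),
        "connections": len(pairs),
        "year_range": (min(years), max(years)) if years else None,
    }


if __name__ == "__main__":
    # Placeholder labels only; no real data is used here.
    sample = [("A", "B", 1994), ("B", "A", 2001), ("B", "C", 2025)]
    print(network_stats(sample))
```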

Categories in Network

Core (red): Jeffrey Epstein, Ghislaine Maxwell
Accomplices (orange): Staff members
Victims (purple): Virginia Giuffre, Jane Doe
Political (blue): Prince Andrew, Bill Clinton, Donald Trump
Legal (yellow): Defense attorneys, prosecutors, judges
Family (orange): Mark Epstein, Isabel Maxwell
Associates (purple): Stan Pottinger, Juanesteban Ganoza
Financial (teal): Deutsche Bank

Analysis Files

File	Description
`epstein_network_timeline.html`	Interactive visualization (open in browser)
`focused_entities.json`	Network data with connections
`dataset8_redacted_scan.json`	Full extraction results from DataSet 8
`dataset8_emails.csv`	All 936 emails extracted
`dataset8_all_names.csv`	All names with document sources
`REDDIT_REPORT.md`	Top 50 findings report
`DATASET8_HIDDEN_CONTENT_REPORT.md`	Technical analysis

Source Links

Source	Link
HuggingFace Dataset	theelderemo/FULL_EPSTEIN_INDEX
Raw Files (Google Drive)	All Releases
DOJ FOIA DataSets	justice.gov FOIA
WikiEpstein	wikiepstein.com

How to Run Analysis

# Clone repo
git clone https://github.com/phelix001/epstein-network.git
cd epstein-network

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install pdfplumber

# Download DOJ DataSet 8 and extract to additional_data/
# Run extraction
python scan_redacted_content.py

# Rebuild visualization
python build_timeline_graph.py

Methodology

Downloaded DOJ FOIA DataSets 1-8 (~12.4 GB)
Extracted text from 10,593 PDFs using pdfplumber
Pattern matched for emails, names, phone numbers
Cross-referenced with known individuals
Built network graph with vis-network.js

Why Redactions Failed

Many DOJ documents use a flawed redaction method:

Black boxes overlaid on text layer
Underlying text NOT removed from PDF
Copy/paste or programmatic extraction reveals content

This is a known issue in government document releases.


phelix001/epstein-network
