An automated end-to-end pipeline for systematic literature review of traumatic brain injury (TBI) research papers. BioScribe screens, extracts, validates, and stores cognitive test data from scientific publications.
BioScribe streamlines the systematic review process with an AI-powered pipeline:
- Screening - Automatically filters relevant TBI papers using LLM-based analysis
- Extraction - Extracts structured cognitive test data (21 fields) from included papers
- Validation - Validates extraction accuracy with confidence scoring and hallucination detection
- Database - Stores results in DynamoDB with two-table architecture
- Human-Loop - Provides web UI for expert review of medium/low confidence records
Reliability: Built-in network timeout protection with automatic retries and graceful degradation.
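The retry behaviour described above can be illustrated with a minimal sketch. This is not BioScribe's actual implementation: the function name `call_with_retries`, its parameters, and the exponential backoff schedule are assumptions made for illustration only.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    fallback: Optional[T] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call fn, retrying on timeout/connection errors with exponential backoff.

    Hypothetical helper: returns `fallback` if every attempt fails, so the
    caller can skip one paper instead of aborting the whole run.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except (TimeoutError, ConnectionError):
            if attempt == max_attempts:
                break
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return fallback
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays; a real network call would be wrapped as `call_with_retries(lambda: fetch(pmid))`, where `fetch` stands in for whatever request function the pipeline uses.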
Input CSV → Screening → Extraction → Validation → DynamoDB → Human Review UI
↓ ↓
[PubMed Fetch] [extraction table]
[Auto-retry] [validation table]
- Python 3.8+
- AWS account (or local DynamoDB)
- OpenAI API key
# Clone the repository
git clone https://github.com/ece1786-2025/Bioscribe.git
cd Bioscribe
# Install dependencies
pip install -r requirements.txt
# Configure credentials
# 1. Add OpenAI API key to credentials/open_ai_key.txt
# 2. Add AWS credentials to credentials/aws_credentials.py

# Run end-to-end pipeline (tables created automatically)
python main.py --input_csv inputs/test_papers.csv --output-dir outputs/end-to-end

The pipeline will:
- ✅ Check/create DynamoDB tables automatically
- ✅ Screen papers for relevance
- ✅ Extract cognitive test data
- ✅ Validate extractions with confidence scores
- ✅ Insert records to database
- ✅ Launch Human-Loop UI in browser
Bioscribe/
├── main.py # End-to-end pipeline orchestrator
├── screener_script/ # Paper screening module
│ └── end_to_end_screening.py
├── extractor_script/ # Data extraction module
│ └── extractor_script.py
├── validatior_script/ # Validation module
│ └── validator.py
├── database_script/ # DynamoDB operations
│ ├── create_tables.py # Table creation
│ ├── extraction_database.py # Extraction table manager
│ └── validation_database.py # Validation table manager
├── apps/
│ └── human-loop/ # Web UI for human review
│ └── app.py
├── credentials/ # API keys and AWS credentials
├── inputs/ # Input CSV files
└── outputs/ # Pipeline outputs
DynamoDB tables are created automatically on first run - no manual setup required.
- High confidence (≥0.80): Auto-approve, no review needed
- Medium confidence (0.60-0.79): Flag for review
- Low confidence (<0.60): Priority review required
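As a rough illustration of the tiers above, a score can be mapped to a tier as follows. The function name `confidence_tier` and its signature are hypothetical; the project's own routing logic lives in the validator and may differ.

```python
def confidence_tier(score: float, high: float = 0.80, medium: float = 0.60) -> str:
    """Map a composite confidence score in [0, 1] to a review tier.

    Hypothetical helper mirroring the documented cut-offs:
    >= 0.80 is high, 0.60 up to (but not including) 0.80 is medium,
    and anything below 0.60 is low.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if score >= high:
        return "high"    # auto-approve, no review needed
    if score >= medium:
        return "medium"  # flag for review
    return "low"         # priority review required
```

For example, `confidence_tier(0.72)` returns `"medium"`, so that record would be flagged for expert review.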
- bioscribe-successful-entries: Extraction data (21 fields per record)
- bioscribe-validations: Validation metadata (confidence scores, routing decisions)
Flask-based web interface for reviewing flagged records with:
- Confidence tier filtering
- Side-by-side source text comparison
- In-place editing and approval workflow
- Study metadata: Population, N, age, gender, education, injury severity
- Cognitive test: Test name, subtest, domain, outcome measure
- Results: Mean, SD, median, IQR, min/max, range, statistics
- Timing: Time since injury, follow-up duration
- Composite confidence score
- Field-level confidence breakdown
- Hallucination risk assessment
- Routing decision (approve/review/reject)
Default AWS region: us-east-1
Modify in main.py:
ensure_tables_exist(region="us-east-2")  # Change region

Adjust confidence thresholds in validatior_script/validator.py:
DEFAULT_THRESHOLDS = {
"high": 0.80, # Auto-approve threshold
"medium": 0.60, # Review threshold
"low": 0.50 # Reject threshold
}

- screening_final.json: Final screening decisions with justifications
- extractions.json: All extracted records with metadata
- validation_results.json: Validation results with confidence scores
# Screen papers only
cd screener_script
python end_to_end_screening.py --input_csv ../inputs/data/screen/test_papers.csv
# Extract from specific paper
cd extractor_script
python extractor_script.py <PMID>
# Validate extractions
cd validatior_script
python validator.py --input extractions.json --output validation.json
# Create tables manually
python database_script/create_tables.py

cd apps/human-loop
python app.py
# Visit http://127.0.0.1:5000

If table creation fails, create manually:
python database_script/create_tables.py

Tables will auto-create on first run. For local development:
# Install and run local DynamoDB
docker run -p 8000:8000 amazon/dynamodb-local

pip install cloudscraper # For web scraping
pip install boto3 # For AWS DynamoDB

MIT License
ECE1786 2025 - University of Toronto