A research project combining Compressed Trie data structures, BERT-based semantic analysis, and Google Gemini AI to create an intelligent educational story generation system for elementary school children learning English.
- Project Overview
- Backend Story Generation Architecture
- Android Application
- Getting Started
- Performance Metrics
- Research Contributions
- Project Structure
- Development Status
This repository contains a comprehensive educational platform that generates contextually relevant, fill-in-the-blank stories for children aged 6-10. The system uniquely combines three key technologies:
- Compressed Tries - Efficient storage and retrieval of 5000+ vocabulary words
- BERT Neural Networks - Semantic word ranking for contextual relevance
- Google Gemini AI - Dynamic story generation with educational content
The platform includes both an Android mobile application and a command-line backend prototype that demonstrates the core story generation architecture.
```mermaid
graph TB
    A[Story Context] --> B[Compressed Trie]
    B --> C[Unused Words Filter]
    C --> D[BERT Model]
    D --> E[Top Relevant Words]
    E --> F[Gemini API]
    F --> G[Generated Story Template]
    G --> H[Word Placement Algorithm]
    H --> I[Final Story with Blanks]
```
- Memory Efficient: Stores 5K+ words with shared prefixes
- Fast Retrieval: O(m) search complexity where m = word length
- Auto-completion: Levenshtein distance-based suggestions
- Multiple Definitions: Supports words with multiple meanings
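To make these properties concrete, here is a minimal Python sketch of compressed-trie insertion and O(m) lookup. The repository's real implementation is the Kotlin `compressed_tries.kt`; the class and method names below are illustrative stand-ins, and the Levenshtein-based auto-completion is omitted for brevity.

```python
class Node:
    def __init__(self):
        self.children = {}     # first char of edge label -> (edge label, child Node)
        self.definitions = []  # non-empty only for complete words (multiple meanings allowed)

class CompressedTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, word: str, definition: str) -> None:
        node, i = self.root, 0
        while i < len(word):
            first = word[i]
            if first not in node.children:
                # No shared prefix: one edge holds the entire remaining suffix
                leaf = Node()
                node.children[first] = (word[i:], leaf)
                node = leaf
                break
            label, child = node.children[first]
            # Length of the common prefix between the edge label and the rest of the word
            j = 0
            while j < len(label) and i + j < len(word) and label[j] == word[i + j]:
                j += 1
            if j < len(label):
                # Split the edge where the word diverges from the stored label
                mid = Node()
                mid.children[label[j]] = (label[j:], child)
                node.children[first] = (label[:j], mid)
                child = mid
            node, i = child, i + j
        node.definitions.append(definition)

    def search(self, word: str):
        """O(m) lookup where m = len(word); returns the word's definitions or None."""
        node, i = self.root, 0
        while i < len(word):
            entry = node.children.get(word[i])
            if entry is None:
                return None
            label, child = entry
            if word[i:i + len(label)] != label:
                return None
            node, i = child, i + len(label)
        return node.definitions or None

trie = CompressedTrie()
trie.insert("cat", "a small domesticated feline")
trie.insert("cart", "a wheeled vehicle pulled by hand")
trie.insert("cat", "(informal) a person")  # second meaning for the same word
print(trie.search("cart"))                 # ['a wheeled vehicle pulled by hand']
```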
- Model: `bert-base-uncased` for masked language modeling
- Context-Aware: Analyzes story context to predict relevant words
- Semantic Scoring: Uses transformer attention for word relevance
- Duplicate Prevention: Ensures no word repetition across story levels
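Conceptually, the ranking step asks BERT how probable each candidate word is at a `[MASK]` position and keeps the top scorers. The sketch below uses the Hugging Face transformers library; it illustrates the idea behind `rank_words.py` rather than reproducing its exact code, and `rank_candidates` is a hypothetical name.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def rank_candidates(context: str, candidates: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Score each candidate word by BERT's probability at the first [MASK] slot."""
    inputs = tokenizer(context, return_tensors="pt")
    mask_positions = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    scored = []
    for word in candidates:
        ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(word))
        # Single-token words score directly; multi-token words get a rough averaged score
        scored.append((word, probs[ids].mean().item()))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

print(rank_candidates(
    "The children walked to the [MASK] to borrow some books.",
    ["library", "banana", "school", "cloud"],
))
```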
- Model: `gemini-2.0-flash-lite` for natural language generation
- Educational Focus: Prompts optimized for grade-level vocabulary
- Contextual Continuity: Maintains story coherence across levels
- Controlled Output: Generates exactly 3 blanks per story segment
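A minimal sketch of the generation call, here using the google-generativeai Python client (the repository drives Gemini from `testing.main.kts`, and the exact prompt wording below is an assumption):

```python
import google.generativeai as genai

genai.configure(api_key="your-gemini-api-key-here")
model = genai.GenerativeModel("gemini-2.0-flash-lite")

context = "Mia found a tiny door behind the bookshelf."
prompt = (
    "Continue this story for children aged 6-10 in three or four simple sentences. "
    "Replace exactly three vocabulary words with the literal token [MASK]. "
    f"Story so far: {context}"
)

response = model.generate_content(prompt)
print(response.text)  # story template containing three [MASK] placeholders
```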
- **Initialization**
  - Load 5K+ words from the Oxford 5000 CSV
  - Initialize the BERT model and tokenizer
  - Set up the Gemini API connection
- **Story Level Generation**

  ```kotlin
  fun playNextLevel() {
      // Everything in the trie minus words used in earlier levels
      val unusedWords = getAllWordsFromTrie() - usedWords
      val story = generateDynamicStory(currentContext, unusedWords)
      displayStory(story)
  }
  ```
- **Dynamic Story Creation Process**
  - Context Analysis: Current story context is passed to BERT
  - Word Filtering: Unused words from the trie are filtered by relevance
  - Template Generation: Gemini creates a story with `[MASK]` placeholders
  - Word Placement: BERT selects the best word for each mask (see the sketch after this list)
  - Story Assembly: Final story with blanks and answers
- **Educational Game Mechanics**

  ```kotlin
  data class Story(
      val text: String,              // Complete story text
      val blankPositions: List<Int>, // Indices of words to blank out
      val answers: List<String>      // Correct answers for the blanks
  )
  ```
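Putting the last two steps together, the sketch below fills each `[MASK]` in a Gemini template with the highest-ranked unused word and records the blank positions and answers. It reuses the hypothetical `rank_candidates` scorer from the BERT sketch above, and the Python `Story` dataclass simply mirrors the Kotlin data class.

```python
from dataclasses import dataclass

@dataclass
class Story:
    text: str                   # complete story text with answers filled in
    blank_positions: list[int]  # word indices to blank out when displaying
    answers: list[str]          # correct answer for each blank

def assemble_story(template: str, unused_words: list[str]) -> Story:
    """Fill each [MASK] in the template with the best unused word (left to right)."""
    words = template.split()
    blank_positions, answers = [], []
    for i, token in enumerate(words):
        if token.strip(".,!?") != "[MASK]":
            continue
        # Earlier masks are already filled, so the current one is the first [MASK]
        # that rank_candidates (see the BERT sketch above) sees in this context.
        context = " ".join(words)
        best_word, _ = rank_candidates(context, unused_words)[0]
        words[i] = token.replace("[MASK]", best_word)  # keep trailing punctuation
        unused_words = [w for w in unused_words if w != best_word]
        blank_positions.append(i)
        answers.append(best_word)
    return Story(" ".join(words), blank_positions, answers)
```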
| Home Screen | Dictionary Lookup | Word Detail | Story World | Story Level Example |
|---|---|---|---|---|
| ![]() | ![]() | ![]() | ![]() | ![]() |
- Interactive Dictionary: Trie-based word lookup with auto-completion
- Story Mode: Dynamically generated fill-in-the-blank stories
- Educational Design: Child-friendly UI with Jetpack Compose
- Offline Capability: Local trie storage for fast performance
- UI: Jetpack Compose
- Database: Room with compressed trie storage
- Architecture: MVVM with Repository pattern
- Background: WorkManager for data processing
**For the Android App**
- Android Studio Arctic Fox or newer
- Kotlin 1.9+
- Android SDK 24+

**For the Backend Architecture**
- Python 3.8+
- Kotlin 1.9+
- transformers library
- torch library
- **Install Python Dependencies**

  ```bash
  pip install torch transformers numpy pandas
  ```

- **Set Up Gemini API**

  ```kotlin
  // Add your Gemini API key to testing.main.kts
  val apiKey = "your-gemini-api-key-here"
  ```

- **Run the Story Generator**

  ```bash
  cd architecture
  kotlin -script testing.main.kts
  ```
- **Clone and Build**

  ```bash
  git clone https://github.com/ahmedsilat44/Tries-Based-Dictionary.git
  cd Tries-Based-Dictionary
  ./gradlew assembleDebug
  ```

- **Install on Device**

  ```bash
  ./gradlew installDebug
  ```
- Scalable: Unlike most AI storytelling systems, our approach integrates classical data structures (compressed tries) with modern transformer models, enabling both memory efficiency and educationally relevant word selection.
- Contextual Learning: Words selected based on semantic relevance
- Engagement: AI-generated content keeps stories fresh and interesting
```
├── app/                            # Android application
│   ├── src/main/java/              # Kotlin source files
│   │   ├── compressed_tries.kt     # Compressed trie implementation
│   │   ├── MainActivity.kt         # Main app entry point
│   │   └── ui/                     # Compose UI components
│   └── build.gradle.kts            # Android build configuration
├── architecture/                   # Backend story generation system
│   ├── testing.main.kts            # Main backend application
│   ├── rank_words.py               # BERT-based word ranking
│   ├── words.txt                   # 5K word dictionary
│   └── The_Oxford_3000.txt         # Curated vocabulary list
├── trie-implement.kts              # Basic trie implementation
├── compressed-trie-implement.kts   # Advanced compressed trie
└── README.md                       # This documentation
```
- Compressed trie implementation with 5K+ words
- BERT-based contextual word ranking
- Gemini AI story template generation
- Android app with Jetpack Compose UI
- End-to-end story generation pipeline
- Educational game mechanics
Although active development has concluded, several potential extensions could further improve the system:
- Output Processing: More robust parsing of Gemini API responses for `[MASK]` tokens (see the sketch after this list)
- Model Fine-tuning: Better alignment of BERT word relevance with educational content
- Prompt Optimization: Refining Gemini prompts for consistent story quality
- Progressive Difficulty: Implementing intentional progression in story complexity
- Multi-language support for international users
- Advanced difficulty progression algorithms
- Teacher dashboard for progress tracking
- Voice narration and audio features
- Multiplayer collaborative story creation
- In-Game Story Generation: Option for learners to dynamically generate brand-new stories during gameplay
- Additional Minigames: Introduce vocabulary puzzles, matching games, and quizzes to boost gamification and long-term engagement
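As a sketch of the first extension above, a more tolerant parser could normalize the variant mask spellings Gemini sometimes emits before the word-placement step runs. The helper names below are hypothetical:

```python
import re

# Matches [MASK] with optional whitespace and any capitalization, e.g. [mask], [ MASK ]
MASK_PATTERN = re.compile(r"\[\s*mask\s*\]", re.IGNORECASE)

def normalize_masks(template: str) -> str:
    """Collapse variant mask spellings into the canonical [MASK] token."""
    return MASK_PATTERN.sub("[MASK]", template)

def count_masks(template: str) -> int:
    """Count placeholders so the pipeline can verify exactly three blanks."""
    return len(MASK_PATTERN.findall(template))

template = "Mia opened the [ mask ] and saw a [MASK] full of [mask]s."
print(normalize_masks(template))  # all three variants become [MASK]
print(count_masks(template))      # 3 -> matches the expected blank count
```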
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this research in your work, please cite:
```bibtex
@misc{tries-dictionary-2025,
  title={Tries-Based Dictionary: AI-Powered Educational Story Generation},
  author={Ahmed Silat and Taha Zahid and Minhaj Ul Hasan and Maria Samad},
  year={2024},
  url={https://github.com/ahmedsilat44/Tries-Based-Dictionary}
}
```



