A Retrieval Augmented Generation (RAG) system for code understanding, built using a graph-based approach to represent code structure and relationships. BigBrain enhances AI coding assistance by providing context from your codebase through a knowledge graph.
- Create a robust Python backend framework that parses source code and builds a knowledge graph
- Develop a VS Code extension that integrates with the Agent Chat system
- Enhance AI coding assistance by providing relevant code context through graph-based retrieval
- Support multiple programming languages and provide a seamless coding experience
- Python backend that parses source code and builds a knowledge graph
- VS Code extension that communicates with the backend and provides enhanced AI coding assistance
- JSON protocol for communication between the extension and backend
- Support for Python code parsing (with more languages to come)
- Graph-based representation of code structure and relationships
- Retrieval-Augmented Generation (RAG) for improved code understanding
- Clone the repository
- Create a virtual environment:
cd bigbrain uv venv source .venv/bin/activate
- Install dependencies:
uv pip install -e .
To use the parser functionality, you need to have the tree-sitter grammars compiled:
# This feature will be fully implemented in future releases
# For now, the parser will work with a simplified approachpython main.py --graph-path=/path/to/graph.pickle --log-level=INFOThe backend communicates via JSON over stdin/stdout using the protocol defined in bridge/protocol.py.
/home/kelvin/git/bigbrain/
├── bridge/ # JSON communication module
│ ├── __init__.py
│ ├── handler.py # Handles stdin/stdout communication
│ └── protocol.py # JSON protocol definitions
├── core/ # Core functionality
│ ├── __init__.py
│ ├── graph_builder.py # Builds graph from parsed data
│ ├── graph_store.py # Graph management with NetworkX
│ └── parser.py # Code parsing with tree-sitter
├── grammars/ # For tree-sitter language files
├── tests/ # Test suite
│ ├── __init__.py
│ └── test_parser.py # Tests for parser module
├── .env.example # Environment variables template
├── .gitignore # Git ignore rules
├── config.py # Configuration settings
├── main.py # Main entry point
├── README.md # Project documentation
└── setup.py # Package installation configuration
-
Parser Module (
core/parser.py):- Parses source code files and extracts functions, classes
- Currently supports Python files (with placeholders for tree-sitter integration)
- Designed for extensibility to add more languages in future phases
-
Graph Store Module (
core/graph_store.py):- Manages the knowledge graph using NetworkX
- Supports adding nodes and edges with attributes
- Handles persistence to/from disk using pickle
- Includes file-based node removal for updates
-
Graph Builder Module (
core/graph_builder.py):- Converts parsed code structures into graph elements
- Creates file, function, and class nodes with relationships
- Updates the graph when files change
-
Bridge Protocol (
bridge/protocol.py):- Defines the JSON communication protocol as Pydantic models
- Includes commands like UPDATE_FILE, QUERY_GRAPH, GET_STATUS, and SHUTDOWN
- Provides structured request and response formats
-
Bridge Handler (
bridge/handler.py):- Processes JSON messages from stdin
- Dispatches commands to appropriate handlers
- Sends responses back via stdout
- Handles errors and validation
-
Main Entry Point (
main.py):- Parses command line arguments
- Sets up logging
- Initializes components and starts the bridge handler
- Pydantic Models: Used for robust JSON protocol validation and serialization
- NetworkX: Chosen for graph representation due to its flexibility and Python integration
- Tree-Sitter: Selected for reliable and efficient code parsing across languages
- Modular Architecture: Clear separation between parsing, graph building, and communication
- Python-dotenv: Used for environment variable management
- Pickle Serialization: Used for graph persistence (with potential for alternatives in future)
Following a phased approach as outlined in PLANNING.md:
- Phase 1: Backend Foundation & Core Python Parsing (Current)
- Phase 2: Enhanced Parsing & Relationships
- Phase 3: Backend Querying & VS Code Bridge Setup
- Phase 4: VS Code Agent Integration & Basic RAG
- Phase 5: Multi-Language Support & Advanced Features
- Phase 6: Optimization & Packaging
pytestSee the LICENSE file for details.