An intelligent data analysis, search, and agentic workflow system built with Python, combining workflow automation, database querying, charting, and AI-powered insights.
- AI-Powered Workflow System: Dynamic decision-making with branching workflows
- DSPy-Optimized Prompts: Uses modern frameworks like DSPy and GEPA for rapid, systematic prompt iteration
- Database Integration: PostgreSQL connectivity with intelligent SQL generation
- Interactive Charts: Data visualization and charting tools
- Gmail Integration: Email automation and communication features
- WebSocket API: Real-time data streaming and analysis
- Modular Tool System: Extensible architecture for adding custom tools
- Multi-LLM Support: Works with OpenAI GPT and Anthropic Claude models
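For instance, switching providers is a one-line change through DSPy's LM abstraction. A minimal sketch (the OpenAI model ID here is illustrative; the Anthropic ID matches the quick-start example below):

```python
import os

import dspy

# Configure DSPy with either provider; only the model string and key change.
lm = dspy.LM("openai/gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY"))
# lm = dspy.LM("anthropic/claude-sonnet-4-20250514", api_key=os.getenv("ANTHROPIC_API_KEY"))
dspy.configure(lm=lm)
```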
```
├── api/                             # FastAPI web application
│   ├── app/
│   │   ├── main.py                  # FastAPI application entry point
│   │   └── websocket_handler.py     # WebSocket communication handler
│   ├── requirements.txt             # API dependencies
│   └── run.py                       # Server startup script
├── external_tools/                  # Custom tool implementations
│   ├── charting_tool.py             # Data visualization tool
│   ├── python_interpreter_tool.py   # Python code execution
│   ├── output_formatter_tool.py     # Result formatting
│   └── sql_tool.py                  # Database query tool
├── workflow/                        # Workflow management system
│   ├── workflow.py                  # Core workflow orchestration
│   ├── helper_objects.py            # Data structures and helpers
│   ├── prompt_decision.py           # AI decision making
│   └── utils.py                     # Workflow utilities
├── preprocessing/                   # Data preprocessing
│   └── context.json                 # Database schema definitions
├── objects.py                       # Core classes and data structures
├── helper_functions.py              # Utility functions
├── gmail.py                         # Gmail API integration
├── utils.py                         # General utilities
├── debug_test.py                    # Testing and debugging
├── credentials.json                 # Google OAuth credentials (template)
└── .env                             # Environment variables (template)
```
- Python 3.8+
- PostgreSQL database (optional)
- Gmail API credentials (optional)
- Clone the repository

  ```bash
  git clone <repo-url>
  cd ai-data-search-py
  ```

- Install dependencies

  ```bash
  # Install API dependencies
  pip install -r api/requirements.txt

  # Install additional dependencies
  pip install dspy-ai openai anthropic google-api-python-client google-auth-httplib2 google-auth-oauthlib psycopg2-binary
  ```

- Configure environment variables

  ```bash
  cp .env.example .env
  # Edit .env with your actual values
  ```

- Set up Gmail integration (optional)
  - Create a Google Cloud Project
  - Enable the Gmail API
  - Download credentials and save as `credentials.json`
Create a `.env` file with the following variables:
```bash
# AI Model API Keys
export OPENAI_API_KEY=your_openai_api_key_here
export ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Database Configuration (optional)
export PG_HOST=your_postgres_host
export PG_RO_PW=your_postgres_password
export PG_RO_USER=your_postgres_user
export PG_DB=your_database_name
export PG_PORT=5432

# Email Service (optional)
export RESEND_API_KEY=your_resend_api_key_here
```
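To verify the database settings above before running the full system, here is a quick connectivity check with psycopg2 (a sketch; the project's SQL tool manages its own connections):

```python
import os

import psycopg2  # installed above as psycopg2-binary

# Open a read-only connection using the environment variables defined above.
conn = psycopg2.connect(
    host=os.getenv("PG_HOST"),
    port=int(os.getenv("PG_PORT", "5432")),
    dbname=os.getenv("PG_DB"),
    user=os.getenv("PG_RO_USER"),
    password=os.getenv("PG_RO_PW"),
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")  # trivial round-trip to confirm connectivity
    print(cur.fetchone())
conn.close()
```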
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Gmail API
- Create OAuth 2.0 credentials
- Download the credentials file and save it as `credentials.json`
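Once `credentials.json` is in place, the standard installed-app flow looks roughly like this (a sketch using google-auth-oauthlib, installed above; the scope is an assumption, and `gmail.py` presumably wraps this for you):

```python
from google_auth_oauthlib.flow import InstalledAppFlow

# Scope is an assumption -- narrow or broaden to match your use case.
SCOPES = ["https://www.googleapis.com/auth/gmail.send"]

# Opens a browser consent flow against the credentials.json downloaded above.
flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)
print("Authorized:", creds.valid)
```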
```bash
# Start the FastAPI server
python api/run.py

# Or run directly with uvicorn
uvicorn api.app.main:app --host 0.0.0.0 --port 8000 --reload
```
The API will be available at http://localhost:8000
```python
import asyncio
import os

import dspy

from workflow.workflow import Workflow

# Initialize the workflow
lm = dspy.LM('anthropic/claude-sonnet-4-20250514', api_key=os.getenv('ANTHROPIC_API_KEY'))
workflow = Workflow(model=lm)

# Run a query
async def run_analysis():
    async for response in workflow.run("Analyze sales data for Q1 2024"):
        print(response.to_dict())

asyncio.run(run_analysis())
```
Create a custom tool by subclassing the `Tool` class:
```python
from typing import Dict, Any

from objects import Tool, Response

class CustomTool(Tool):
    def __init__(self, model):
        super().__init__(
            name="custom_tool",
            description="Description of what your tool does",
            inputs={
                "parameter": {"type": str, "description": "Parameter description", "required": True}
            }
        )
        self.model = model

    async def __call__(self, tree_data, inputs: Dict[str, Any], **kwargs) -> Response:
        # Your tool logic here
        result = f"Processed: {inputs['parameter']}"
        return Response(
            type="text",
            data=[{"text": result}],
            description="Custom tool result"
        )

# Add to workflow
workflow.add_tool(CustomTool(model), branch_id="base")
```
Connect to the WebSocket endpoint for real-time communication:
```javascript
const ws = new WebSocket('ws://localhost:8000/ws/analyze');

ws.onopen = function() {
  // Send analysis request
  ws.send(JSON.stringify({
    type: "analyze",
    query: "Show me sales trends for this month",
    options: {}
  }));
};

ws.onmessage = function(event) {
  const response = JSON.parse(event.data);
  console.log('Response:', response);
};
```
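The same request can be sent from Python for quick testing. A sketch using the third-party websockets package (an assumption; any WebSocket client works):

```python
import asyncio
import json

import websockets  # pip install websockets

async def analyze():
    async with websockets.connect("ws://localhost:8000/ws/analyze") as ws:
        # Same message shape as the JavaScript example above.
        await ws.send(json.dumps({
            "type": "analyze",
            "query": "Show me sales trends for this month",
            "options": {},
        }))
        # Print streamed responses until the server closes the connection.
        async for message in ws:
            print(json.loads(message))

asyncio.run(analyze())
```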
- SQL Tool (`sql_tool.py`)
  - Execute database queries
  - Generate SQL from natural language
  - Handle complex analytical queries
- Charting Tool (`charting_tool.py`)
  - Create data visualizations
  - Generate charts and graphs
  - Export visual data representations
- Python Interpreter Tool (`python_interpreter_tool.py`)
  - Execute Python code snippets
  - Perform calculations and data processing
  - Handle complex computational tasks
- Output Formatter Tool (`output_formatter_tool.py`)
  - Format and structure results
  - Convert data to different formats
  - Prepare data for presentation
- Create your tool class inheriting from `Tool`
- Implement the `__call__` method
- Define tool inputs and description
- Add to workflow using `workflow.add_tool()`
Example tool structure:
```python
from typing import Dict, Any

from objects import Tool, Response

class YourCustomTool(Tool):
    def __init__(self, model):
        super().__init__(
            name="your_tool_name",
            description="What your tool does",
            inputs={
                "input_param": {
                    "type": str,
                    "description": "Description",
                    "required": True
                }
            }
        )
        self.model = model

    async def __call__(self, tree_data, inputs: Dict[str, Any], **kwargs):
        # Tool implementation: compute your result here
        result = f"Processed: {inputs['input_param']}"
        return Response(data=result, type="your_result_type")
```
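Because tools are async callables, you can also exercise one outside the workflow. A sketch using the `CustomTool` defined earlier (`tree_data=None` is a placeholder, since the workflow normally supplies it, and `to_dict()` is assumed to exist on `Response` as in the quick-start example):

```python
import asyncio

async def main():
    tool = CustomTool(model=None)  # model=None is illustrative only
    # The workflow normally supplies tree_data; None is a stand-in here.
    response = await tool(tree_data=None, inputs={"parameter": "hello"})
    print(response.to_dict())

asyncio.run(main())
```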
Run the debug test to verify your setup:
```bash
python debug_test.py
```
- `ws://localhost:8000/ws/analyze` - Real-time analysis and data streaming
- `GET /` - Health check
- `GET /auth/google` - Google OAuth initiation
- `GET /auth/callback` - OAuth callback handler
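A quick way to confirm the server is up, using only the standard library (assumes the health check responds with HTTP 200):

```python
import urllib.request

# Hit the GET / health-check endpoint listed above.
with urllib.request.urlopen("http://localhost:8000/") as resp:
    print(resp.status, resp.read().decode())
```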
- Never commit actual API keys or credentials
- Use environment variables for sensitive configuration
- The `.env` file should not be tracked in version control
- Credentials files are templates and need your actual values
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Missing API Keys: Ensure all required environment variables are set (see the check after this list)
- Database Connection: Verify PostgreSQL credentials and connectivity
- Gmail Integration: Check Google OAuth setup and credentials file
- Dependencies: Make sure all required packages are installed
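To rule out missing keys quickly, print which variables the process can actually see (names taken from the configuration section above):

```python
import os

# Report which of the expected environment variables are visible.
for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "PG_HOST", "PG_DB"):
    print(f"{var}: {'set' if os.getenv(var) else 'MISSING'}")
```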
- Check the debug output from `debug_test.py`
- Verify your `.env` configuration
- Ensure all dependencies are installed correctly
This is an active development project. Features and APIs may change as the project evolves.