A tool for indexing Google Sheets documents and searching their content with vector search.
This application indexes content from Google Sheets documents into a vector database (ChromaDB) and provides a search interface to quickly find information across multiple spreadsheets. The system connects to Google Drive to access spreadsheets, processes their content, and makes them searchable using natural language queries.
- Vector Search: Find relevant content across multiple Google Sheets documents using semantic search
- Direct Links: Get direct links to specific cells in Google Sheets where the information was found
- Bulk Indexing: Easily index entire folders of Google Sheets documents
- User-friendly Interface: Simple Streamlit interface for searching and indexing
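Direct links depend on turning a cell's row and column into A1 notation. The sketch below shows one way to do that and to build a link. It assumes the URL pattern Google Sheets shows in the browser address bar (`/edit#gid=<worksheet id>&range=<A1>`). The helper names `column_letter` and `cell_link` are hypothetical, and the project's own link format may differ. For the conversion step, gspread also provides `gspread.utils.rowcol_to_a1`.

```python
def column_letter(col: int) -> str:
    """Convert a 1-based column index to spreadsheet letters (1 -> 'A', 27 -> 'AA')."""
    if col < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while col > 0:
        # Bijective base-26: shift to 0-based before each division.
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_link(spreadsheet_id: str, gid: int, row: int, col: int) -> str:
    """Build a URL that opens one worksheet (gid) with a single cell selected.

    Assumption: this mirrors the URL format Google Sheets uses in the browser;
    it is not taken from this project's code.
    """
    a1 = f"{column_letter(col)}{row}"
    return f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/edit#gid={gid}&range={a1}"
```

For example, `cell_link("abc123", 0, 5, 2)` points at cell B5 on the worksheet whose gid is 0.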
Installation:
- Clone the repository
- Install dependencies:

      pip install -r requirements.txt

- Create a .env file based on .env.example:

      OPENAI_API_KEY=your_openai_api_key
      DB_CHROMA_PATH=./data/chroma
      DB_SQLITE_PATH=./data/db.sqlite3

- Create a Google Cloud project and enable the Google Sheets API
- Create a service account and download the credentials JSON file
- Place the credentials file in the secret/ directory
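The three .env settings can be read into one typed object at startup. This is a minimal sketch, not the project's loader. The project may well use python-dotenv's `load_dotenv` instead. The file is parsed by hand here only to keep the example dependency-free. `Settings`, `parse_env`, and `load_settings` are hypothetical names, and the fallback paths are copied from .env.example.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chroma_path: Path
    sqlite_path: Path


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, '#' comments and lines without '='."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_settings(env_file: str = ".env") -> Settings:
    """Merge the .env file with the process environment; real environment variables win."""
    path = Path(env_file)
    file_values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = {**file_values, **os.environ}
    api_key = merged.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Hypothetical defaults, taken from the values shown in .env.example.
    return Settings(
        openai_api_key=api_key,
        chroma_path=Path(merged.get("DB_CHROMA_PATH", "./data/chroma")),
        sqlite_path=Path(merged.get("DB_SQLITE_PATH", "./data/db.sqlite3")),
    )
```

Letting real environment variables override the file makes it possible to change paths in deployment without editing .env.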
Start the application:

    streamlit run app.py
To search:
- Navigate to the "Search" tab
- Enter your search query
- View the results with direct links to the specific cells in Google Sheets
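Semantic search ranks indexed text by how close its embedding vector is to the query's embedding. The toy sketch below illustrates only the ranking idea, using cosine similarity over hand-written vectors. In the application, ChromaDB performs this step internally with its configured distance metric, and the embeddings presumably come from OpenAI, given the OPENAI_API_KEY setting. `cosine_similarity` and `top_k` are hypothetical helpers, not part of the project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With ChromaDB the equivalent lookup is a call such as `collection.query(query_texts=[query], n_results=k)`, which returns the matching ids, documents, metadatas and distances.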
To index a folder of spreadsheets:
- Navigate to the "Index from Google Drive" tab
- Enter the ID of the Google Drive folder containing spreadsheets to index
- Upload your Google API credentials file
- Click "Start indexing"
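Indexing has to keep each piece of text together with enough location data to rebuild a cell link later. The sketch below shows one way to do that. It assumes one document per non-empty cell and input shaped like gspread's `worksheet.get_all_values()`, with the gid taken from `worksheet.id`. The function name, id scheme, and metadata keys are hypothetical. The returned dict matches the keyword arguments of ChromaDB's `Collection.add(ids=..., documents=..., metadatas=...)`.

```python
from typing import Dict, List, Sequence


def cells_to_documents(
    spreadsheet_id: str,
    sheet_title: str,
    gid: int,
    values: Sequence[Sequence[str]],
) -> Dict[str, list]:
    """Turn a 2-D grid of cell values into parallel ids/documents/metadatas lists.

    Empty cells are skipped. Row and column numbers are 1-based, matching A1 notation.
    """
    ids: List[str] = []
    documents: List[str] = []
    metadatas: List[dict] = []
    for row_number, row in enumerate(values, start=1):
        for col_number, cell in enumerate(row, start=1):
            text = str(cell).strip()
            if not text:
                continue
            # Hypothetical id scheme: unique per spreadsheet, worksheet and cell.
            ids.append(f"{spreadsheet_id}:{gid}:{row_number}:{col_number}")
            documents.append(text)
            metadatas.append(
                {
                    "spreadsheet_id": spreadsheet_id,
                    "sheet_title": sheet_title,
                    "gid": gid,
                    "row": row_number,
                    "col": col_number,
                }
            )
    return {"ids": ids, "documents": documents, "metadatas": metadatas}
```

A batch built this way could be stored with `collection.add(**batch)`. Indexing each cell separately keeps links exact but drops row context. Prefixing each cell with its column header is one common variation.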
Project structure:
- app.py - Main Streamlit application
- indexer.py - Logic for indexing Google Sheets into ChromaDB
- project_search.py - Google Sheets connection and search utilities
- sheet_creator_tool.py - Tools for creating and manipulating Google Sheets
- requirements.txt - Project dependencies
- chroma_db/ - Directory for the ChromaDB vector database
Requirements:
- Python 3.7+
- Google Sheets API credentials
- ChromaDB 0.4.6+
- FastAPI 0.68.0+
- Streamlit
- gspread 5.9.0+
- oauth2client 4.1.3+

