A Python-based interactive shell that allows you to load Excel files into a SQLite database and query them using SQL with a beautiful, colorful interface. Features persistent caching, auto-loading, and both interactive and non-interactive modes.
- High Performance: Optimized Python implementation using
calamine(Rust-based) for blazing fast Excel loading (3.3x faster than standard loaders). - Dual Backend: Support for both
DuckDB(default, high performance) andSQLite(compatibility). - Load Excel Files: Load a single
.xlsxfile or an entire directory of files in parallel. - Automatic Sanitization:
- Sheet names are converted to valid SQL table names (e.g., "Sales Data (2024)" ->
Sales_Data__2024_). - Column headers are sanitized to be valid SQL identifiers (e.g., "Salary ($)" ->
Salary____).
- Sheet names are converted to valid SQL table names (e.g., "Sales Data (2024)" ->
- SQL Querying: Full SQL support via DuckDB/SQLite. You can
JOINtables derived from different Excel files. - Large Dataset Support: Handles large Excel files (tested with 1M+ rows) efficiently.
- Rich UI: Colorful, formatted output using the
richlibrary. - Multi-line Queries: Write SQL queries across multiple lines. Press Enter to continue, end with
;to execute. - Smart History: Multi-line queries are stored as a single history entry (use Up arrow to recall).
- Detailed Metadata: The
tablescommand shows row count, column count, and all column names with types.
Since this tool uses the standard SQLite engine, all standard SQLite SQL syntax is supported. This includes but is not limited to:
- Standard Clauses:
SELECT,FROM,WHERE,GROUP BY,HAVING,ORDER BY,LIMIT. - Joins:
INNER JOIN,LEFT JOIN,CROSS JOIN(Right and Full Outer joins are not natively supported by SQLite, but can be emulated). - Aggregations:
COUNT(),SUM(),AVG(),MIN(),MAX(). - Subqueries & CTEs: Common Table Expressions (
WITH ... AS) and nested queries. - Built-in Functions:
- String:
UPPER(),LOWER(),SUBSTR(),TRIM(),REPLACE(). - Date/Time:
DATE(),TIME(),DATETIME(),STRFTIME(). - Math:
ABS(),ROUND(),RANDOM().
- String:
- Set Operations:
UNION,UNION ALL,INTERSECT,EXCEPT.
Note: The database is persistent (stored in ~/.sql_excel_data.db). Changes made with INSERT, UPDATE, or DELETE will persist across sessions. Use the refresh command to reload from Excel files.
This project uses uv for fast dependency management.
-
Install uv (if not already installed):
pip install uv
-
Set up the environment:
uv venv source .venv/bin/activate # On Windows: .venv\Scripts\activate uv pip install -r requirements.txt # Installs: pandas, openpyxl, rich, prompt_toolkit, fastmcp, duckdb, python-calamine, sqlparse
-
Generate Test Data (optional):
uv run python create_test_data.py
-
Auto-load and start REPL (recommended):
uv run python main.py test_data
This loads the data folder on first run and caches it. Subsequent runs use the cache automatically!
Start the REPL with auto-loading:
uv run python main.py test_dataOr start empty and load manually:
uv run python main.pyThen inside the REPL:
(sql-excel) load test_data
View loaded tables:
(sql-excel) tables
Refresh data from Excel files:
(sql-excel) refresh
This clears the cache and reloads from the original Excel files.
Execute queries directly without entering the REPL:
# Query using cached data (defaults to test_data folder)
uv run python main.py --query "SELECT * FROM users_Sheet1"
# Short form
uv run python main.py -q "SELECT name, email FROM users_Sheet1 WHERE id = 1"
# Specify a different source folder
uv run python main.py --source my_data -q "SELECT * FROM my_table"
# Complex query with joins (uses default test_data)
uv run python main.py -q "SELECT u.name, o.product_name FROM users_Sheet1 u JOIN orders_Sheet1 o ON u.id = o.user_id"Simple Select:
SELECT * FROM users_Sheet1;Multi-line Join Query:
SELECT u.name, o.product_name, o.amount
FROM users_Sheet1 u
JOIN orders_Sheet1 o ON u.id = o.user_id;Note: Press Enter to add new lines. End with ; and press Enter to execute.
Querying with Sanitized Names:
SELECT Full_Name, Salary____
FROM complex_data_Employee_Records
WHERE Salary____ > 60000;Aggregation with Group By:
SELECT u.name, SUM(o.amount) as total_spent
FROM users_Sheet1 u
JOIN orders_Sheet1 o ON u.id = o.user_id
GROUP BY u.name
ORDER BY total_spent DESC;load <path>- Load Excel file(s) from a pathtables- List all tables with metadata (rows, columns, types)schema <table>- Show CREATE TABLE statementrefresh- Clear cache and reload from Excel filesexit/quit- Exit the REPL<sql query>- Execute any SQL query (end with;)
python main.py [data_folder] [--query QUERY] [--source SOURCE] [--db DB_PATH]data_folder- Optional positional argument for data folder path--query,-q- Execute a query and exit (non-interactive)--source,-s- Default data source folder (default:test_data)--db- Custom database path (default:~/.sql_excel_data.db)
Note: The positional data_folder takes priority over --source. If neither is provided, it defaults to test_data.
- See
TEST_QUERIES.mdfor more example queries you can copy and paste. - Use the
helpcommand inside the REPL to see all available commands.