Commit 600b110

Convert text2sql example to German Pokemon MySQL database
- Updated from SQLite ecommerce DB to MySQL Pokemon database
- Added German Pokemon database setup instructions with Docker
- Created comprehensive German Pokemon sample queries
- Added complex queries demonstrating evolution chains, type effectiveness, statistical analysis
- Updated README.md with proper setup guide and advanced query examples
- Added .env.example template for configuration
- Cleaned up debugging artifacts while preserving functional debugging workflow
- Updated all code comments and documentation for MySQL implementation
1 parent 23e36bf commit 600b110

9 files changed: +237 -250 lines changed

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
# OpenAI API Key (required)
2+
OPENAI_API_KEY=your-openai-api-key-here
3+
4+
# MySQL Database Configuration
5+
MYSQL_HOST=localhost
6+
MYSQL_PORT=3308
7+
MYSQL_USER=root
8+
MYSQL_PASSWORD=root
9+
MYSQL_DB=db_pokemon
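
The variables in this template can be gathered into one connection configuration in a few lines. The following is a minimal sketch, not the code from this commit: the helper name `load_mysql_config` and the keys of the returned dict are assumptions chosen for illustration, and the fallback values simply mirror the template above.

```python
import os


def load_mysql_config(env=None):
    """Collect MySQL settings from an environment mapping.

    Hypothetical helper for illustration; defaults mirror .env.example.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MYSQL_HOST", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        "port": int(env.get("MYSQL_PORT", "3308")),
        "user": env.get("MYSQL_USER", "root"),
        "password": env.get("MYSQL_PASSWORD", "root"),
        "database": env.get("MYSQL_DB", "db_pokemon"),
    }


if __name__ == "__main__":
    config = load_mysql_config()
    # Avoid printing the password when inspecting the configuration.
    print(config["host"], config["port"], config["database"])
```

Passing the mapping in as a parameter keeps the helper testable without touching the real process environment.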
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
./db
2+
3+
**/db/*
Lines changed: 177 additions & 75 deletions
@@ -1,51 +1,89 @@
11
# Text-to-SQL Workflow
22

3-
A PocketFlow example demonstrating a text-to-SQL workflow that converts natural language questions into executable SQL queries for an SQLite database, including an LLM-powered debugging loop for failed queries.
3+
A PocketFlow example demonstrating a text-to-SQL workflow that converts natural language questions into executable SQL queries for a MySQL Pokemon database, including an LLM-powered debugging loop for failed queries.
44

55
- Check out the [Substack Post Tutorial](https://zacharyhuang.substack.com/p/text-to-sql-from-scratch-tutorial) for more!
66

77
## Features
88

99
- **Schema Awareness**: Automatically retrieves the database schema to provide context to the LLM.
10-
- **LLM-Powered SQL Generation**: Uses an LLM (GPT-4o) to translate natural language questions into SQLite queries (using YAML structured output).
10+
- **LLM-Powered SQL Generation**: Uses an LLM (GPT-4o) to translate natural language questions into MySQL queries (using YAML structured output).
1111
- **Automated Debugging Loop**: If SQL execution fails, an LLM attempts to correct the query based on the error message. This process repeats up to a configurable number of times.
12+
- **German Pokemon Database**: Works with a comprehensive German Pokemon database containing 898 Pokemon, types, moves, and relationships.
1213
## Getting Started
1314

14-
1. **Install Packages:**
15+
### Prerequisites
16+
17+
1. **Pokemon Database Setup:**
18+
19+
Set up the German Pokemon database from the [matse-spicker-db](https://github.com/pblan/matse-spicker-db/) repository:
20+
```bash
21+
git clone https://github.com/pblan/matse-spicker-db.git
22+
cd matse-spicker-db
23+
docker-compose up -d
24+
```
25+
26+
This will start:
27+
- MySQL database on port 3308 with the Pokemon data
28+
- phpMyAdmin interface on port 8080 for database management
29+
30+
2. **Install Packages:**
1531
```bash
1632
pip install -r requirements.txt
1733
```
34+
*or using uv:*
35+
```bash
36+
uv sync
37+
```
1838

19-
2. **Set API Key:**
20-
Set the environment variable for your OpenAI API key.
39+
3. **Set Environment Variables:**
40+
41+
Copy the example environment file and configure your settings:
2142
```bash
22-
export OPENAI_API_KEY="your-api-key-here"
43+
cp .env.example .env
44+
```
45+
46+
Edit the `.env` file with your OpenAI API key:
47+
```
48+
OPENAI_API_KEY=your-openai-api-key-here
49+
```
50+
51+
The MySQL database configuration should match the Docker setup:
52+
```
53+
MYSQL_HOST=localhost
54+
MYSQL_PORT=3308
55+
MYSQL_USER=root
56+
MYSQL_PASSWORD=root
57+
MYSQL_DB=db_pokemon
2358
```
24-
*(Replace `"your-api-key-here"` with your actual key)*
2559

26-
3. **Verify API Key (Optional):**
60+
4. **Verify API Key (Optional):**
2761
Run a quick check using the utility script. If successful, it will print a short joke.
2862
```bash
29-
python utils.py
63+
python utils/call_llm.py
64+
```
65+
*or using uv:*
66+
```bash
67+
uv run python utils/call_llm.py
3068
```
3169
*(Note: This requires a valid API key to be set.)*
3270

33-
4. **Run Default Example:**
34-
Execute the main script. This will create the sample `ecommerce.db` if it doesn't exist and run the workflow with a default query.
71+
5. **Run Default Example:**
72+
Execute the main script with a sample query:
3573
```bash
36-
python main.py
74+
python main.py "Show me 5 Feuer type Pokemon"
3775
```
38-
The default query is:
39-
> Show me the names and email addresses of customers from New York
40-
41-
5. **Run Custom Query:**
42-
Provide your own natural language query as command-line arguments after the script name.
76+
*or using uv:*
4377
```bash
44-
python main.py What is the total stock quantity for products in the 'Accessories' category?
78+
uv run python main.py "Show me 5 Feuer type Pokemon"
4579
```
46-
Or, for queries with spaces, ensure they are treated as a single argument by the shell if necessary (quotes might help depending on your shell):
80+
81+
6. **Run Custom Queries:**
82+
Try different queries in German or English:
4783
```bash
48-
python main.py "List orders placed in the last 30 days with status 'shipped'"
84+
python main.py "Show me 3 Pokemon names"
85+
python main.py "Find all Pokemon with Wasser type"
86+
python main.py "List Pokemon from Generation 1"
4987
```
5088

5189
## How It Works
@@ -57,7 +95,7 @@ graph LR
5795
A[Get Schema] --> B[Generate SQL]
5896
B --> C[Execute SQL]
5997
C -- Success --> E[End]
60-
C -- SQLite Error --> D{Debug SQL Attempt}
98+
C -- MySQL Error --> D{Debug SQL Attempt}
6199
D -- Corrected SQL --> C
62100
C -- Max Retries Reached --> F[End with Error]
63101
@@ -68,11 +106,11 @@ graph LR
68106

69107
**Node Descriptions:**
70108

71-
1. **`GetSchema`**: Connects to the SQLite database (`ecommerce.db` by default) and extracts the schema (table names and columns).
72-
2. **`GenerateSQL`**: Takes the natural language query and the database schema, prompts the LLM to generate an SQLite query (expecting YAML output with the SQL), and parses the result.
73-
3. **`ExecuteSQL`**: Attempts to run the generated SQL against the database.
109+
1. **`GetSchema`**: Connects to the MySQL Pokemon database and extracts the schema (table names and columns) including tables like `pokemon`, `typ`, `attacke`, etc.
110+
2. **`GenerateSQL`**: Takes the natural language query and the database schema, prompts the LLM to generate a MySQL query (expecting YAML output with the SQL), and parses the result.
111+
3. **`ExecuteSQL`**: Attempts to run the generated SQL against the Pokemon database.
74112
* If successful, the results are stored, and the flow ends successfully.
75-
* If an `sqlite3.Error` occurs (e.g., syntax error), it captures the error message and triggers the debug loop.
113+
* If a MySQL error occurs (e.g., syntax error), it captures the error message and triggers the debug loop.
76114
4. **`DebugSQL`**: If `ExecuteSQL` failed, this node takes the original query, schema, failed SQL, and error message, prompts the LLM to generate a *corrected* SQL query (again, expecting YAML).
77115
5. **(Loop)**: The corrected SQL from `DebugSQL` is passed back to `ExecuteSQL` for another attempt.
78116
6. **(End Conditions)**: The loop continues until `ExecuteSQL` succeeds or the maximum number of debug attempts (default: 3) is reached.
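
The execute/debug cycle described in the steps above can be sketched as a plain retry loop. This is an illustration under stated assumptions rather than the flow from `flow.py`: `execute` and `debug` are hypothetical callables standing in for the `ExecuteSQL` and `DebugSQL` nodes, and the exact way the real flow counts attempts may differ.

```python
def run_with_debug_loop(sql, execute, debug, max_debug_attempts=3):
    """Run `sql`; after each failure ask `debug` for a corrected query.

    `execute(sql)` returns rows or raises; `debug(sql, error_message)`
    returns a new SQL string. Both are hypothetical stand-ins.
    """
    attempts = 0
    while True:
        try:
            return execute(sql)
        except Exception as error:
            if attempts >= max_debug_attempts:
                raise RuntimeError(
                    f"giving up after {attempts} debug attempts"
                ) from error
            attempts += 1
            # Feed the failed SQL and the error text back for correction.
            sql = debug(sql, str(error))


if __name__ == "__main__":
    good_sql = "SELECT Name FROM pokemon LIMIT 5"

    def fake_execute(sql):
        if sql != good_sql:
            raise ValueError("syntax error")
        return [("Glumanda",)]

    print(run_with_debug_loop("SELEC Name", fake_execute, lambda s, e: good_sql))
```

With the default of 3, a query that never succeeds is executed four times in this sketch: the initial attempt plus one run per correction.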
@@ -82,81 +120,145 @@ graph LR
82120
- [`main.py`](./main.py): Main entry point to run the workflow. Handles command-line arguments for the query.
83121
- [`flow.py`](./flow.py): Defines the PocketFlow `Flow` connecting the different nodes, including the debug loop logic.
84122
- [`nodes.py`](./nodes.py): Contains the `Node` classes for each step (`GetSchema`, `GenerateSQL`, `ExecuteSQL`, `DebugSQL`).
85-
- [`utils.py`](./utils.py): Contains the minimal `call_llm` utility function.
86-
- [`populate_db.py`](./populate_db.py): Script to create and populate the sample `ecommerce.db` SQLite database.
123+
- [`utils/call_llm.py`](./utils/call_llm.py): Contains the `call_llm` utility function for OpenAI API interactions.
124+
- [`populate_db.py`](./populate_db.py): Utility script with MySQL connection helper function.
87125
- [`requirements.txt`](./requirements.txt): Lists Python package dependencies.
126+
- [`.env.example`](./.env.example): Example environment variables configuration file.
88127
- [`README.md`](./README.md): This file.
89128

90129
## Example Output (Successful Run)
91130

92131
```
93132
=== Starting Text-to-SQL Workflow ===
94-
Query: 'total products per category'
95-
Database: ecommerce.db
133+
Query: 'Show me 5 Feuer type Pokemon'
134+
Database: MySQL Pokemon Database
96135
Max Debug Retries on SQL Error: 3
97136
=============================================
98137
99138
===== DB SCHEMA =====
100139
101-
Table: customers
102-
- customer_id (INTEGER)
103-
- first_name (TEXT)
104-
- last_name (TEXT)
105-
- email (TEXT)
106-
- registration_date (DATE)
107-
- city (TEXT)
108-
- country (TEXT)
109-
110-
Table: sqlite_sequence
111-
- name ()
112-
- seq ()
113-
114-
Table: products
115-
- product_id (INTEGER)
116-
- name (TEXT)
117-
- description (TEXT)
118-
- category (TEXT)
119-
- price (REAL)
120-
- stock_quantity (INTEGER)
121-
122-
Table: orders
123-
- order_id (INTEGER)
124-
- customer_id (INTEGER)
125-
- order_date (TIMESTAMP)
126-
- status (TEXT)
127-
- total_amount (REAL)
128-
- shipping_address (TEXT)
129-
130-
Table: order_items
131-
- order_item_id (INTEGER)
132-
- order_id (INTEGER)
133-
- product_id (INTEGER)
134-
- quantity (INTEGER)
135-
- price_per_unit (REAL)
140+
Table: arenaleiter
141+
- Name (varchar(255))
142+
- Generation (int)
143+
- Standort (varchar(255))
144+
- Typ (varchar(255))
145+
- Orden (varchar(255))
146+
147+
Table: attacke
148+
- ID (int)
149+
- Name (varchar(255))
150+
- Typ (varchar(255))
151+
- Schadensklasse (varchar(255))
152+
- Staerke (int)
153+
- Genauigkeit (int)
154+
- AP (int)
155+
- Generation (int)
156+
157+
Table: pokemon
158+
- ID (int)
159+
- Name (varchar(255))
160+
- Groesse (float)
161+
- Gewicht (float)
162+
- Generation (int)
163+
- PrimaerTyp (varchar(255))
164+
- SekundaerTyp (varchar(255))
165+
166+
Table: typ
167+
- Bezeichnung (varchar(255))
168+
169+
... (additional tables)
136170
137171
=====================
138172
139173
140174
===== GENERATED SQL (Attempt 1) =====
141175
142-
SELECT category, COUNT(*) AS total_products
143-
FROM products
144-
GROUP BY category
176+
SELECT Name
177+
FROM pokemon
178+
WHERE PrimaerTyp = 'Feuer' OR SekundaerTyp = 'Feuer'
179+
LIMIT 5
145180
146181
====================================
147182
148-
SQL executed in 0.000 seconds.
183+
SQL executed in 0.002 seconds.
149184
150185
===== SQL EXECUTION SUCCESS =====
151186
152-
category | total_products
153-
-------------------------
154-
Accessories | 3
155-
Apparel | 1
156-
Electronics | 3
157-
Home Goods | 2
158-
Sports | 1
187+
Name
188+
----
189+
Glumanda
190+
Glutexo
191+
Glurak
192+
Vulpix
193+
Vulnona
194+
195+
=================================
159196
160197
=== Workflow Completed Successfully ===
161198
====================================
162199
```
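
The `DB SCHEMA` section of the output above follows a simple layout: one `Table:` header per table, then one line per column. A minimal sketch of producing that text from `(table, column, type)` rows is shown here; the function name `format_schema` and the input shape are assumptions for illustration, not the `GetSchema` implementation.

```python
def format_schema(columns):
    """Render (table_name, column_name, column_type) rows as schema text.

    Hypothetical helper; tables appear in the order they are first seen.
    """
    tables = {}
    for table, column, col_type in columns:
        tables.setdefault(table, []).append(f"- {column} ({col_type})")
    blocks = [
        f"Table: {name}\n" + "\n".join(lines) for name, lines in tables.items()
    ]
    # A blank line separates tables, as in the example output.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(format_schema([
        ("typ", "Bezeichnung", "varchar(255)"),
        ("pokemon", "ID", "int"),
        ("pokemon", "Name", "varchar(255)"),
    ]))
```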
200+
201+
## Database Schema Overview
202+
203+
The German Pokemon database includes the following key tables:
204+
205+
- **`pokemon`**: Main Pokemon data (Name, Groesse, Gewicht, Generation, PrimaerTyp, SekundaerTyp)
206+
- **`typ`**: Pokemon types (Bezeichnung - e.g., "Feuer", "Wasser", "Pflanze")
207+
- **`attacke`**: Pokemon moves/attacks (Name, Typ, Schadensklasse, Staerke, Genauigkeit, AP)
208+
- **`entwicklung`**: Evolution chains (Von, Zu, Level, Item, etc.)
209+
- **`lernt`**: Pokemon move learning (Pokemon, Attacke, Level, Methode)
210+
- **`effektivitaet`**: Type effectiveness chart (Multiplikator, Angreifend, Verteidigend)
211+
- **`arenaleiter`**: Gym leaders (Name, Generation, Standort, Typ, Orden)
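
To see how the `pokemon` columns listed above fit a query such as the one in the example output, here is a self-contained sketch. It uses Python's built-in `sqlite3` with an in-memory table purely as a stand-in for the MySQL database; the reduced table definition, the handful of sample rows, and the added `ORDER BY ID` are assumptions made so the example runs without Docker.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for the `pokemon` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pokemon ("
        "ID INTEGER, Name TEXT, PrimaerTyp TEXT, SekundaerTyp TEXT)"
    )
    conn.executemany(
        "INSERT INTO pokemon VALUES (?, ?, ?, ?)",
        [
            (4, "Glumanda", "Feuer", None),
            (5, "Glutexo", "Feuer", None),
            (6, "Glurak", "Feuer", "Flug"),
            (7, "Schiggy", "Wasser", None),
            (37, "Vulpix", "Feuer", None),
            (38, "Vulnona", "Feuer", None),
        ],
    )
    return conn


def pokemon_of_type(conn, type_name, limit=5):
    """Return names of Pokemon whose primary or secondary type matches."""
    cursor = conn.execute(
        "SELECT Name FROM pokemon "
        "WHERE PrimaerTyp = ? OR SekundaerTyp = ? "
        "ORDER BY ID LIMIT ?",
        (type_name, type_name, limit),
    )
    return [row[0] for row in cursor.fetchall()]


if __name__ == "__main__":
    print(pokemon_of_type(make_demo_db(), "Feuer"))
```

The type name is passed as a bound parameter rather than interpolated into the SQL string, which is worth keeping in mind whenever values come from user input.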
212+
213+
## Sample Queries
214+
215+
Try these example queries with the German Pokemon database:
216+
217+
```bash
218+
# Basic Pokemon queries
219+
uv run python main.py "Show me 5 Pokemon names"
220+
uv run python main.py "Find all Feuer type Pokemon"
221+
uv run python main.py "Show me Pokemon from Generation 1"
222+
223+
# More complex queries
224+
uv run python main.py "Which Pokemon can learn Donnerschlag?"
225+
uv run python main.py "Show me all Wasser/Flug dual-type Pokemon"
226+
uv run python main.py "Find the heaviest Pokemon in the database"
227+
```
228+
229+
### Advanced German Pokemon Queries
230+
231+
These complex queries demonstrate the power of natural language to SQL conversion with sophisticated database operations:
232+
233+
```bash
234+
# Dual-type analysis with statistics
235+
uv run python main.py "Welche Pokemon haben sowohl Feuer als Primärtyp als auch eine Sekundärtyp, und wie schwer sind sie?"
236+
# → Shows Fire-type Pokemon with secondary types and their weights
237+
238+
# Move power analysis by damage class
239+
uv run python main.py "Zeige mir die stärksten Attacken jeder Schadensklasse und deren Genauigkeit"
240+
# → Finds the most powerful attacks in each damage class with accuracy stats
241+
242+
# Evolution chain analysis
243+
uv run python main.py "Welche Pokemon entwickeln sich durch Items und auf welchem Level müssen sie sein?"
244+
# → Lists Pokemon that evolve using items and required levels
245+
246+
# Type effectiveness relationships
247+
uv run python main.py "Gegen welche Typen ist Feuer besonders effektiv und mit welchem Multiplikator?"
248+
# → Shows which types Fire is super effective against with damage multipliers
249+
250+
# Statistical analysis by type
251+
uv run python main.py "Welcher Typ hat die schwersten Pokemon im Durchschnitt und wie schwer sind sie?"
252+
# → Calculates which Pokemon type has the heaviest average weight
253+
254+
# Cross-type move learning
255+
uv run python main.py "Welche Wasser-Pokemon koennen Feuerattacken lernen?"
256+
# → Finds Water-type Pokemon that can learn Fire-type moves (type coverage analysis)
257+
```
258+
259+
**Query Complexity Features Demonstrated:**
260+
- **Multi-table JOINs**: Combining Pokemon, moves, evolution, and type effectiveness data
261+
- **Statistical Functions**: AVG(), MAX(), COUNT() for data analysis
262+
- **Conditional Logic**: Complex WHERE clauses with multiple conditions
263+
- **Data Relationships**: Evolution chains, type effectiveness charts, move learning patterns
264+
- **German Language Processing**: Natural language queries in German translated to precise SQL
-28 KB
Binary file not shown.

cookbook/pocketflow-text2sql/main.py

Lines changed: 6 additions & 8 deletions
@@ -1,15 +1,13 @@
11
import sys
22
import os
3+
from dotenv import load_dotenv
34
from flow import create_text_to_sql_flow
4-
from populate_db import populate_database, DB_FILE
55

6-
def run_text_to_sql(natural_query, db_path=DB_FILE, max_debug_retries=3):
7-
if not os.path.exists(db_path) or os.path.getsize(db_path) == 0:
8-
print(f"Database at {db_path} missing or empty. Populating...")
9-
populate_database(db_path)
6+
# Load environment variables from .env file
7+
load_dotenv()
108

9+
def run_text_to_sql(natural_query, max_debug_retries=3):
1110
shared = {
12-
"db_path": db_path,
1311
"natural_query": natural_query,
1412
"max_debug_attempts": max_debug_retries,
1513
"debug_attempts": 0,
@@ -19,7 +17,7 @@ def run_text_to_sql(natural_query, db_path=DB_FILE, max_debug_retries=3):
1917

2018
print(f"\n=== Starting Text-to-SQL Workflow ===")
2119
print(f"Query: '{natural_query}'")
22-
print(f"Database: {db_path}")
20+
print(f"Database: MySQL Pokemon Database")
2321
print(f"Max Debug Retries on SQL Error: {max_debug_retries}")
2422
print("=" * 45)
2523

@@ -44,6 +42,6 @@ def run_text_to_sql(natural_query, db_path=DB_FILE, max_debug_retries=3):
4442
if len(sys.argv) > 1:
4543
query = " ".join(sys.argv[1:])
4644
else:
47-
query = "total products per category"
45+
query = "Show me all Pokemon with their types"
4846

4947
run_text_to_sql(query)
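
The argument handling in this hunk joins every command-line token with spaces, which is why a quoted and an unquoted query reach the workflow as the same string. A small sketch of that behavior in isolation follows; the function name `query_from_argv` is hypothetical, while the default string is the one introduced by this change.

```python
import sys

DEFAULT_QUERY = "Show me all Pokemon with their types"


def query_from_argv(argv, default=DEFAULT_QUERY):
    """Join all arguments after the script name, or fall back to the default."""
    if len(argv) > 1:
        return " ".join(argv[1:])
    return default


if __name__ == "__main__":
    print(query_from_argv(sys.argv))
```

Quoting still matters when the query contains characters the shell would interpret itself, such as `?`, `*`, or apostrophes.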
