A tool to convert CSV files containing ChatGPT/GPT-4 conversation logs into mooncake-style JSONL format for load testing and simulation.
Note
Currently, KV reuse is not considered in the output. We will update the script once BurstGPT adds user session information.
The input CSV can be downloaded from BurstGPT Release v1.1. It contains the following columns:
- Timestamp: Request timestamp in seconds
- Model: Model name (e.g., "ChatGPT", "GPT-4")
- Request tokens: Number of input tokens
- Response tokens: Number of output tokens
- Total tokens: Total tokens (not used)
- Log Type: Type of log (e.g., "Conversation log", "API log")
Example:
Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type
5,ChatGPT,472,18,490,Conversation log
45,ChatGPT,1087,230,1317,Conversation log
118,GPT-4,417,276,693,Conversation log
The output is a JSONL file where each line is a JSON object:
{"timestamp": 5000, "input_length": 472, "output_length": 18, "hash_ids": [123, 456, 789, ...]}
Fields:
- timestamp: Request time in milliseconds (integer)
- input_length: Number of input tokens
- output_length: Number of output tokens
- hash_ids: Array of random hash IDs simulating KV cache blocks
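Because every line is a self-contained JSON object, the trace can be written and read back with the standard `json` module alone. A minimal sketch, assuming each record is a plain dict with the four fields above; `write_jsonl` and `read_jsonl` are hypothetical helpers, not functions taken from convert.py, and the hash_ids values in the demo are made up:

```python
import io
import json
from typing import Iterable, Iterator, TextIO

# Fields listed in the output format above.
REQUIRED_KEYS = ("timestamp", "input_length", "output_length", "hash_ids")


def write_jsonl(records: Iterable[dict], fp: TextIO) -> None:
    """Write each record as one JSON object per line."""
    for record in records:
        fp.write(json.dumps(record) + "\n")


def read_jsonl(fp: TextIO) -> Iterator[dict]:
    """Yield records from a JSONL trace, rejecting lines that lack a required field."""
    for line_no, line in enumerate(fp, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        yield record


# Round trip through an in-memory buffer (hash_ids values are illustrative only).
buf = io.StringIO()
write_jsonl([{"timestamp": 5000, "input_length": 472,
              "output_length": 18, "hash_ids": [123, 456, 789, 42]}], buf)
buf.seek(0)
for rec in read_jsonl(buf):
    print(rec["timestamp"], rec["input_length"], len(rec["hash_ids"]))
```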
python convert.py --input-file <BurstGPT CSV data>
If --output-file is not specified, the output file uses the input filename with a .jsonl extension.
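The default output path can be derived with `pathlib`. A minimal sketch; `resolve_output_file` is a hypothetical helper, and replacing only the final extension (as `Path.with_suffix` does) is an assumption about how the script builds the name:

```python
from pathlib import Path
from typing import Optional


def resolve_output_file(input_file: str, output_file: Optional[str] = None) -> str:
    """Return the explicit --output-file, or derive one from --input-file."""
    if output_file is not None:
        return output_file
    # with_suffix replaces only the last extension, e.g. trace.csv -> trace.jsonl
    return str(Path(input_file).with_suffix(".jsonl"))


print(resolve_output_file("trace.csv"))                # trace.jsonl
print(resolve_output_file("trace.csv", "out.jsonl"))   # out.jsonl
```

A path without an extension, such as `trace`, simply gains `.jsonl`.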
Input:
- --input-file: Path to the input CSV file
Filtering:
- --model: Filter by model (ChatGPT or GPT-4); None means no filtering
- --log-type: Filter by log type (Conversation log or API log); None means no filtering
- --num-prompt: Maximum number of rows in the final output; None means no limit
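A minimal sketch of these filters applied to the example CSV from above, assuming the model and log-type filters combine with AND semantics and that --num-prompt keeps the first N rows that pass them, in file order. `load_rows` and `filter_rows` are hypothetical helpers, not the script's own functions:

```python
import csv
import io
from typing import List, Optional, TextIO


def load_rows(fp: TextIO) -> List[dict]:
    """Parse BurstGPT CSV rows into dicts with numeric token counts."""
    return [
        {
            "timestamp": float(raw["Timestamp"]),  # seconds
            "model": raw["Model"],
            "input_length": int(raw["Request tokens"]),
            "output_length": int(raw["Response tokens"]),
            "log_type": raw["Log Type"],
        }
        for raw in csv.DictReader(fp)
    ]


def filter_rows(rows: List[dict], model: Optional[str] = None,
                log_type: Optional[str] = None,
                num_prompt: Optional[int] = None) -> List[dict]:
    """Apply the model and log-type filters, then cap the result at num_prompt rows."""
    selected = [
        row for row in rows
        if (model is None or row["model"] == model)
        and (log_type is None or row["log_type"] == log_type)
    ]
    # The cap is applied after filtering and keeps rows in file order (assumption).
    return selected if num_prompt is None else selected[:num_prompt]


SAMPLE = """Timestamp,Model,Request tokens,Response tokens,Total tokens,Log Type
5,ChatGPT,472,18,490,Conversation log
45,ChatGPT,1087,230,1317,Conversation log
118,GPT-4,417,276,693,Conversation log
"""
rows = load_rows(io.StringIO(SAMPLE))
print(len(filter_rows(rows, model="ChatGPT")))  # 2
```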
Timestamp Adjustment:
- --speed-ratio: Adjust request timing (default: 1.0)
  - Values > 1: Speed up (e.g., 2.0 = 2x faster)
  - Values < 1: Slow down (e.g., 0.5 = 2x slower)
  - Formula: new_timestamp = old_timestamp / speed_ratio
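Combined with the seconds-to-milliseconds conversion visible in the examples (5 s becomes 5000 ms at the default ratio), the formula maps a request at 118 s to 59000 ms with --speed-ratio 2.0 and to 236000 ms with --speed-ratio 0.5. A minimal sketch; `scale_timestamp_ms` is a hypothetical helper, and rounding to the nearest millisecond is an assumption (the script may truncate instead):

```python
def scale_timestamp_ms(timestamp_s: float, speed_ratio: float = 1.0) -> int:
    """Map a BurstGPT timestamp (seconds) to a trace timestamp (milliseconds).

    Applies new_timestamp = old_timestamp / speed_ratio together with the
    seconds-to-milliseconds conversion. Rounding to the nearest millisecond
    is an assumption.
    """
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return round(timestamp_s * 1000 / speed_ratio)


for ratio in (1.0, 2.0, 0.5):
    # 1.0 -> [5000, 45000, 118000], 2.0 -> [2500, 22500, 59000], 0.5 -> [10000, 90000, 236000]
    print(ratio, [scale_timestamp_ms(t, ratio) for t in (5, 45, 118)])
```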
Hash Generation:
- --block-size: Number of tokens per block in the mooncake trace (default: 128)
- --num-hash-blocks: Maximum hash ID value (default: 10000). Hash IDs are randomly chosen from 0 to this value for each block.
Output:
- --output-file: Path to the output JSONL file (default: input filename with .jsonl extension)
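A minimal sketch of hash generation, assuming one hash ID per block of input tokens (output tokens not counted) and an inclusive upper bound of --num-hash-blocks; `make_hash_ids` is a hypothetical helper. Under those assumptions, a request with 472 input tokens spans ceil(472 / 128) = 4 blocks at the default block size and gets 4 hash IDs. Because each ID is drawn independently, requests share blocks only by chance, consistent with the note that KV reuse is not yet modeled:

```python
import random
from typing import List, Optional


def make_hash_ids(input_length: int, block_size: int = 128,
                  num_hash_blocks: int = 10000,
                  rng: Optional[random.Random] = None) -> List[int]:
    """Draw one random hash ID per block of input tokens.

    Assumes the block count is ceil(input_length / block_size) and that IDs
    are drawn independently and uniformly from 0..num_hash_blocks inclusive.
    """
    if rng is None:
        rng = random.Random()
    num_blocks = (input_length + block_size - 1) // block_size  # ceiling division
    return [rng.randint(0, num_hash_blocks) for _ in range(num_blocks)]


rng = random.Random(0)  # fixed seed for a reproducible trace
ids = make_hash_ids(472, rng=rng)
print(len(ids))  # 4
```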
After conversion, the script displays statistics about the generated workload:
============================================================
STATISTICS
============================================================
Input Length (ISL):
Min: 37
Max: 1528
Avg: 705.89
Std: 524.33
Output Length (OSL):
Min: 18
Max: 1656
Avg: 494.67
Std: 513.21
Sequence Length (ISL + OSL):
Max: 3184
Request Rate:
Total requests: 9
Duration: 405.00 seconds
Average RPS: 0.02
============================================================
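In the sample above, Average RPS matches total requests divided by duration: 9 / 405.00 s ≈ 0.022, printed as 0.02. A minimal sketch of how such statistics could be computed from the generated records; `summarize` is a hypothetical helper, and the population standard deviation and first-to-last-request duration are assumptions, not documented behavior:

```python
import statistics
from typing import Dict, List


def summarize(records: List[dict]) -> Dict[str, float]:
    """Compute ISL/OSL statistics and request rate for a list of trace records."""
    if not records:
        raise ValueError("no records to summarize")
    isl = [r["input_length"] for r in records]
    osl = [r["output_length"] for r in records]
    stats: Dict[str, float] = {}
    for name, values in (("isl", isl), ("osl", osl)):
        stats[f"{name}_min"] = min(values)
        stats[f"{name}_max"] = max(values)
        stats[f"{name}_avg"] = statistics.mean(values)
        stats[f"{name}_std"] = statistics.pstdev(values)  # population std (assumption)
    stats["seq_max"] = max(i + o for i, o in zip(isl, osl))
    timestamps_ms = [r["timestamp"] for r in records]
    # Duration measured from the first to the last request (assumption).
    duration_s = (max(timestamps_ms) - min(timestamps_ms)) / 1000
    stats["total_requests"] = len(records)
    stats["duration_s"] = duration_s
    stats["avg_rps"] = len(records) / duration_s if duration_s > 0 else float("nan")
    return stats


records = [
    {"timestamp": 5000, "input_length": 472, "output_length": 18},
    {"timestamp": 45000, "input_length": 1087, "output_length": 230},
    {"timestamp": 118000, "input_length": 417, "output_length": 276},
]
s = summarize(records)
print(f"Max: {s['seq_max']}")                       # Max: 1317
print(f"Duration: {s['duration_s']:.2f} seconds")   # Duration: 113.00 seconds
print(f"Average RPS: {s['avg_rps']:.2f}")           # Average RPS: 0.03
```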