55 changes: 54 additions & 1 deletion examples/How_to_count_tokens_with_tiktoken.ipynb
@@ -783,6 +783,59 @@
" print(f'{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.')\n",
" print()"
]
},
{
"cell_type": "markdown",
"source": "## 8. Counting tokens for structured responses\n\nWhen using structured outputs with the `response_format` parameter, the model uses additional tokens to enforce the schema. This is particularly relevant when using JSON mode or JSON schemas.\n\nBelow we'll explore how to count tokens for requests that use structured responses.",
"metadata": {}
},
{
"cell_type": "code",
"source": "import json\nfrom pydantic import BaseModel\n\ndef num_tokens_for_structured_response(messages, response_format, model=\"gpt-4o-mini\"):\n \"\"\"\n Count tokens for requests with structured responses.\n \n Args:\n messages: List of message dictionaries\n response_format: Dict with \"type\" key, optionally \"json_schema\" for structured output\n model: Model name to use for token counting\n \n Returns:\n Dictionary with prompt_tokens and estimated schema overhead\n \"\"\"\n try:\n encoding = tiktoken.encoding_for_model(model)\n except KeyError:\n print(\"Warning: model not found. Using o200k_base encoding.\")\n encoding = tiktoken.get_encoding(\"o200k_base\")\n \n # Count base message tokens\n base_tokens = num_tokens_from_messages(messages, model)\n \n # Estimate schema overhead\n schema_overhead = 0\n \n if response_format.get(\"type\") == \"json_object\":\n # JSON mode adds minimal overhead (instructions to output valid JSON)\n schema_overhead = 10 # Approximate overhead for JSON mode instructions\n \n elif response_format.get(\"type\") == \"json_schema\":\n # JSON schema mode adds the schema definition as overhead\n if \"json_schema\" in response_format:\n schema = response_format[\"json_schema\"].get(\"schema\", {})\n # Convert schema to string to estimate tokens\n schema_str = json.dumps(schema, separators=(',', ':'))\n schema_overhead = len(encoding.encode(schema_str))\n # Add additional overhead for schema validation instructions\n schema_overhead += 20 # Approximate additional instruction overhead\n \n return {\n \"prompt_tokens\": base_tokens,\n \"schema_overhead\": schema_overhead,\n \"total_estimated\": base_tokens + schema_overhead\n }",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": "### Example 1: Simple JSON mode\n\nWhen using `response_format={\"type\": \"json_object\"}`, the model is instructed to output valid JSON, which adds a small token overhead.",
"metadata": {}
},
{
"cell_type": "code",
"source": "# Example with JSON mode\njson_mode_messages = [\n {\n \"role\": \"system\",\n \"content\": \"You are a helpful assistant. Always respond with valid JSON.\"\n },\n {\n \"role\": \"user\",\n \"content\": \"List three fruits with their colors.\"\n }\n]\n\nresponse_format_json = {\"type\": \"json_object\"}\n\n# Calculate tokens\ntoken_info = num_tokens_for_structured_response(json_mode_messages, response_format_json, \"gpt-4o-mini\")\nprint(\"JSON Mode Token Counting:\")\nprint(f\"Base prompt tokens: {token_info['prompt_tokens']}\")\nprint(f\"Schema overhead: {token_info['schema_overhead']}\")\nprint(f\"Total estimated: {token_info['total_estimated']}\")\n\n# Verify with actual API call\nresponse = client.chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=json_mode_messages,\n response_format=response_format_json,\n temperature=0,\n max_tokens=100\n)\nprint(f\"Actual prompt tokens from API: {response.usage.prompt_tokens}\")\nprint(f\"Response: {response.choices[0].message.content}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": "### Example 2: Structured output with JSON Schema\n\nWhen using structured outputs with a JSON schema, the model needs additional tokens to understand and enforce the schema structure.",
"metadata": {}
},
{
"cell_type": "code",
"source": "# Example with structured JSON schema\nstructured_messages = [\n {\n \"role\": \"system\",\n \"content\": \"You are a helpful assistant that extracts information about books.\"\n },\n {\n \"role\": \"user\",\n \"content\": \"Tell me about 'The Great Gatsby' by F. Scott Fitzgerald.\"\n }\n]\n\n# Define a structured schema for book information\nresponse_format_schema = {\n \"type\": \"json_schema\",\n \"json_schema\": {\n \"name\": \"book_info\",\n \"strict\": True,\n \"schema\": {\n \"type\": \"object\",\n \"properties\": {\n \"title\": {\n \"type\": \"string\",\n \"description\": \"The title of the book\"\n },\n \"author\": {\n \"type\": \"string\",\n \"description\": \"The author of the book\"\n },\n \"year_published\": {\n \"type\": \"integer\",\n \"description\": \"The year the book was published\"\n },\n \"genre\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"string\"\n },\n \"description\": \"List of genres for the book\"\n },\n \"summary\": {\n \"type\": \"string\",\n \"description\": \"A brief summary of the book\"\n }\n },\n \"required\": [\"title\", \"author\", \"year_published\", \"genre\", \"summary\"],\n \"additionalProperties\": False\n }\n }\n}\n\n# Calculate tokens\ntoken_info = num_tokens_for_structured_response(structured_messages, response_format_schema, \"gpt-4o-mini\")\nprint(\"Structured Output Token Counting:\")\nprint(f\"Base prompt tokens: {token_info['prompt_tokens']}\")\nprint(f\"Schema overhead: {token_info['schema_overhead']}\")\nprint(f\"Total estimated: {token_info['total_estimated']}\")\n\n# Verify with actual API call\nresponse = client.chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=structured_messages,\n response_format=response_format_schema,\n temperature=0,\n max_tokens=200\n)\nprint(f\"Actual prompt tokens from API: {response.usage.prompt_tokens}\")\nprint(f\"Response: {json.dumps(json.loads(response.choices[0].message.content), indent=2)}\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": "### Example 3: Comparing token usage across different response formats\n\nLet's compare the token usage for the same request with different response formats.",
"metadata": {}
},
{
"cell_type": "code",
"source": "# Compare token usage across different response formats\ncomparison_messages = [\n {\n \"role\": \"user\",\n \"content\": \"Analyze the sentiment of this text: 'I love sunny days!'\"\n }\n]\n\n# Define a simple sentiment analysis schema\nsentiment_schema = {\n \"type\": \"json_schema\",\n \"json_schema\": {\n \"name\": \"sentiment_analysis\",\n \"strict\": True,\n \"schema\": {\n \"type\": \"object\",\n \"properties\": {\n \"sentiment\": {\n \"type\": \"string\",\n \"enum\": [\"positive\", \"negative\", \"neutral\"],\n \"description\": \"The overall sentiment\"\n },\n \"confidence\": {\n \"type\": \"number\",\n \"description\": \"Confidence score between 0 and 1\"\n },\n \"explanation\": {\n \"type\": \"string\",\n \"description\": \"Brief explanation of the sentiment\"\n }\n },\n \"required\": [\"sentiment\", \"confidence\", \"explanation\"],\n \"additionalProperties\": False\n }\n }\n}\n\nprint(\"Token Usage Comparison for Different Response Formats:\\n\")\nprint(\"=\" * 60)\n\n# Test different formats\nformats_to_test = [\n (\"No format (regular text)\", None),\n (\"JSON mode\", {\"type\": \"json_object\"}),\n (\"Structured output\", sentiment_schema)\n]\n\nfor format_name, response_format in formats_to_test:\n print(f\"\\n{format_name}:\")\n print(\"-\" * 40)\n \n if response_format:\n token_info = num_tokens_for_structured_response(comparison_messages, response_format, \"gpt-4o-mini\")\n print(f\"Estimated total tokens: {token_info['total_estimated']}\")\n print(f\" - Base prompt: {token_info['prompt_tokens']}\")\n print(f\" - Schema overhead: {token_info['schema_overhead']}\")\n else:\n # Regular message without structured output\n tokens = num_tokens_from_messages(comparison_messages, \"gpt-4o-mini\")\n print(f\"Estimated total tokens: {tokens}\")\n \n # Make actual API call for verification\n try:\n if response_format:\n response = client.chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=comparison_messages,\n response_format=response_format,\n temperature=0,\n max_tokens=100\n )\n else:\n response = client.chat.completions.create(\n model=\"gpt-4o-mini\",\n messages=comparison_messages,\n temperature=0,\n max_tokens=100\n )\n \n print(f\"Actual API prompt tokens: {response.usage.prompt_tokens}\")\n print(f\"Completion tokens used: {response.usage.completion_tokens}\")\n except Exception as e:\n print(f\"API call failed: {e}\")\n\nprint(\"\\n\" + \"=\" * 60)",
"metadata": {},
"execution_count": null,
"outputs": []
},
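{
"cell_type": "markdown",
"source": "### Example 4: Schema size drives the overhead\n\nSchema overhead should grow with schema size. As a rough check, we can run our local estimator over three schemas of increasing complexity. The `minimal_schema` below is invented purely for this comparison; the other two are reused from Examples 2 and 3. These numbers come from `num_tokens_for_structured_response`, not from the API, so treat them as approximations.",
"metadata": {}
},
{
"cell_type": "code",
"source": "# Rough illustration: larger schemas should yield larger estimated overhead.\n# minimal_schema is made up for this comparison; the figures come from our\n# local estimator above, not from the OpenAI API.\nminimal_schema = {\n    \"type\": \"json_schema\",\n    \"json_schema\": {\n        \"name\": \"answer\",\n        \"strict\": True,\n        \"schema\": {\n            \"type\": \"object\",\n            \"properties\": {\"answer\": {\"type\": \"string\"}},\n            \"required\": [\"answer\"],\n            \"additionalProperties\": False\n        }\n    }\n}\n\n# Reuse the schemas defined in Examples 2 and 3\nschemas_to_compare = [\n    (\"Minimal schema\", minimal_schema),\n    (\"Sentiment schema (Example 3)\", sentiment_schema),\n    (\"Book schema (Example 2)\", response_format_schema)\n]\n\nfor label, fmt in schemas_to_compare:\n    info = num_tokens_for_structured_response(comparison_messages, fmt, \"gpt-4o-mini\")\n    print(f\"{label}: estimated schema overhead = {info['schema_overhead']} tokens\")",
"metadata": {},
"execution_count": null,
"outputs": []
},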
{
"cell_type": "markdown",
"source": "### Key takeaways for structured response token counting\n\n1. **JSON mode** (`response_format={\"type\": \"json_object\"}`) adds minimal token overhead (~10 tokens) for instructing the model to output valid JSON.\n\n2. **Structured outputs with JSON schemas** add more significant overhead proportional to the schema complexity. The overhead includes:\n - The schema definition itself (encoded as tokens)\n - Additional instructions for schema validation (~20 tokens)\n\n3. **Schema complexity matters**: Larger schemas with more properties, nested objects, and detailed descriptions will consume more tokens.\n\n4. **Plan accordingly**: When using structured outputs, factor in the additional token usage for both cost estimation and context window management.\n\n5. **The estimates above are approximations**: Actual token usage may vary slightly based on the model version and internal optimizations.",
"metadata": {}
}
],
"metadata": {
@@ -811,4 +864,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}