
fix(llm): collect usage stats from final stream chunk#276

Merged
0xallam merged 1 commit into main from fix/input-token-counting
Jan 21, 2026

Conversation

Member

@0xallam 0xallam commented Jan 21, 2026

Problem

Input tokens were always showing as zero in usage stats. This regression was introduced in commit 56526cb, which added an early break when </function> was found in the streamed response.

When using streaming with stream_options: {"include_usage": True}, the LLM API sends token usage data in a separate final chunk after all content chunks:

chunk 1: {"content": "Let me analyze..."}
chunk 2: {"content": "</function>"}
chunk 3: {"content": null, "usage": {"prompt_tokens": 1500, "completion_tokens": 200}}  ← FINAL CHUNK

The early break caused us to exit the loop before receiving chunk 3, so stream_chunk_builder(chunks) built a response with no usage data.
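The failure mode can be reproduced with a minimal simulation (hypothetical chunk objects standing in for the API's stream; collect_with_early_break is an illustrative stand-in for the buggy loop, not the actual strix/llm/llm.py code):

```python
from types import SimpleNamespace

# Simulated stream: content chunks followed by a usage-only final chunk,
# mirroring what the API sends with stream_options={"include_usage": True}.
stream = [
    SimpleNamespace(content="Let me analyze...", usage=None),
    SimpleNamespace(content="</function>", usage=None),
    SimpleNamespace(content=None, usage={"prompt_tokens": 1500, "completion_tokens": 200}),
]

def collect_with_early_break(stream):
    """The buggy loop: break as soon as </function> appears."""
    chunks = []
    for chunk in stream:
        chunks.append(chunk)
        if chunk.content and "</function>" in chunk.content:
            break  # exits before the usage chunk arrives
    return chunks

chunks = collect_with_early_break(stream)
usage = next((c.usage for c in chunks if c.usage), None)
print(usage)  # None: the final usage chunk was never collected
```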

Solution

Instead of breaking immediately when </function> is found, we now:

  1. Set a flag and continue collecting chunks
  2. Break when we receive a chunk with usage data (ideal case)
  3. Fall back to breaking after 5 additional chunks (prevents infinite loops with misbehaving models)
The new loop logic (done_streaming doubles as the flag and the chunk counter):

if done_streaming:
    done_streaming += 1  # count chunks seen since </function>
    if getattr(chunk, "usage", None) or done_streaming > 5:
        break  # got usage, or hit the fallback limit
    continue

This ensures we capture the usage chunk while still protecting against models that don't properly end their streams.
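The three steps above can be sketched as a self-contained loop over simulated chunks (collect_chunks and fallback_limit are illustrative names, not the actual strix/llm/llm.py code):

```python
from types import SimpleNamespace

def collect_chunks(stream, fallback_limit=5):
    """Keep reading after </function> until a usage chunk arrives,
    or until fallback_limit extra chunks have been consumed."""
    chunks = []
    done_streaming = 0  # 0 = still streaming; >0 = chunks seen since </function>
    for chunk in stream:
        chunks.append(chunk)
        if done_streaming:
            done_streaming += 1
            if getattr(chunk, "usage", None) or done_streaming > fallback_limit:
                break  # got usage, or hit the safety limit
            continue
        if chunk.content and "</function>" in chunk.content:
            done_streaming = 1  # set the flag, but keep collecting

    return chunks

stream = [
    SimpleNamespace(content="Let me analyze...", usage=None),
    SimpleNamespace(content="</function>", usage=None),
    SimpleNamespace(content=None, usage={"prompt_tokens": 1500, "completion_tokens": 200}),
]
chunks = collect_chunks(stream)
print(chunks[-1].usage)  # {'prompt_tokens': 1500, 'completion_tokens': 200}
```

With a well-behaved stream the loop exits on the very next chunk after </function>; with a misbehaving model it exits after at most five additional chunks.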

Thanks to @bearsyankees for catching this issue!

The early break on </function> prevented receiving the final chunk
that contains token usage data (input_tokens, output_tokens).
@greptile-apps
Contributor

greptile-apps bot commented Jan 21, 2026

Greptile Summary

This PR fixes a regression where input tokens were always showing as zero in usage stats. The issue was caused by an early break statement that exited the streaming loop before receiving the final chunk containing usage data from the LLM API.

The fix replaces the immediate break with a flag-based approach that:

  • Continues collecting chunks after </function> is detected
  • Breaks when a chunk with usage data arrives (ideal case)
  • Falls back to breaking after 5 additional chunks (prevents infinite loops)

This ensures usage statistics are properly captured while maintaining protection against misbehaving models.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The fix correctly addresses the root cause of the usage stats regression with a simple, well-bounded solution. The logic is sound: it continues streaming after </function> is found, breaks when usage data arrives, and includes a safety limit of 5 additional chunks to prevent infinite loops. The change is minimal, focused, and doesn't affect other functionality.
  • No files require special attention

Important Files Changed

strix/llm/llm.py — Fixed regression where usage stats were always zero by collecting the final stream chunk before breaking

@0xallam 0xallam merged commit b456a4e into main Jan 21, 2026
1 check passed
@0xallam 0xallam deleted the fix/input-token-counting branch January 21, 2026 04:36