
fix(streaming): thinking-only retry guard + 3-layer SSE/middleware/actor diagnostics + RollingFileLogger Debug fix #947

Merged

Aaronontheweb merged 3 commits into netclaw-dev:dev from Aaronontheweb:investigate/llamacpp-openai-chat-faults on May 9, 2026


Conversation

@Aaronontheweb (Collaborator) commented May 9, 2026

Summary

  • Thinking-only retry guard: when a streaming LLM call completes with reasoning content but no visible text/tool calls (Qwen3 + --jinja regime), retry via EvaluateEmptyResponse instead of letting Slack post a silent fallback reply (a sketch of the guard condition follows this list).
  • Stale-message absorption in LlmSessionActor.Ready for late LlmCallFailed / LlmResponseReceived after watchdog timeouts (kills noisy dead letters).
  • 3-layer streaming diagnostics (Debug level): OpenAiCompatibleChatClient (SSE wire), LoggingChatClient (middleware), LlmSessionActor (post-assembly). Counts text deltas/chars, thinking deltas/chars, tool-call deltas, and finish reason at each layer so operators can pinpoint where deltas land or get dropped.
  • ILoggerFactory injection in OpenAiCompatibleProviderPlugin so the SSE-layer log isn't swallowed by NullLogger.Instance.
  • RollingFileLogger Debug-floor fix: removes the hardcoded >= LogLevel.Information filter so Logging:LogLevel:Default=Debug actually persists Debug logs to ~/.netclaw/logs/daemon-*.log (bug: RollingFileLogger hardcodes Information floor, ignores framework log level #908). Without this, the new diagnostics — and any other Debug logs — never reach disk regardless of config.
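For orientation, a minimal sketch of the unified guard condition. The names below (AssembledResponse, EvaluateTurn, Self.Tell) are stand-ins rather than actual LlmSessionActor members, and EvaluateEmptyResponse is modeled as a self-message because the PR doesn't spell out whether it's a message or a method:

    // Sketch only; types and member names are assumptions. The unified condition is
    // the point: retry when there is no visible text and no tool calls, even if
    // thinking/reasoning content arrived during the stream.
    sealed record AssembledResponse(string? Text, IReadOnlyList<object> ToolCalls);
    sealed record EvaluateEmptyResponse;

    void EvaluateTurn(AssembledResponse response)
    {
        var hasVisibleText = !string.IsNullOrWhiteSpace(response.Text);
        var hasToolCalls   = response.ToolCalls.Count > 0;

        if (!hasVisibleText && !hasToolCalls)
        {
            // Covers both the fully empty case and the new thinking-only case
            // (reasoning deltas arrived but the model never transitioned to content).
            Self.Tell(new EvaluateEmptyResponse());
            return;
        }

        // Otherwise deliver the assistant message to the channel as usual.
    }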

Related

Refs netclaw-dev/netclaw-website#16 — delivers the 3-layer SSE / middleware / actor diagnostic counters described in that issue's "What Netclaw provides to help diagnose this" section. Does not close it: the docs/troubleshooting article portion of #16 is still open and should be addressed separately on the netclaw-website repo.

Context

Symptom: a recent self-hosted Slack session (D0AC6CKBK5K/1778333220.192409, 2026-05-09) produced three consecutive streaming turns with output: 2/4/28 final tokens despite many Thinking delta: events arriving. Each turn ended with Turn completed without visible Slack output; posting fallback reply. Same backend, the non-streaming title and distillation calls returned full output (1752 / 1358 / 466 tokens), so the failure was specific to the streaming path with reasoning content.

Pattern lines up with the May 8 testlab fix "fix(llama-server): add --jinja so Qwen3 tool-call template is honored". With --jinja + --reasoning-format deepseek, llama-server now correctly emits Qwen3's <think> content as delta.reasoning_content (separate channel) rather than jumbling it into delta.content. Our streaming consumer surfaces those reasoning deltas as Thinking, but the assistant response can be empty if the model never transitions to content — that's the case the new retry guard handles.
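For illustration, roughly how the two channels present to a streaming consumer. The DTO and helper names below are assumptions rather than Netclaw types, and the sample chunks are abridged:

    // Hypothetical consumer-side shape; the field names mirror what this PR describes
    // (delta.content vs delta.reasoning_content), not actual Netclaw DTOs.
    //
    // Abridged example chunks:
    //   data: {"choices":[{"delta":{"reasoning_content":"First, check the tool schema"}}]}
    //   data: {"choices":[{"delta":{"content":"Here is the answer."}}]}
    sealed record StreamDelta(string? Content, string? ReasoningContent,
                              string? ToolCallUpdate, string? FinishReason);

    static class DeltaKinds
    {
        public static string Classify(StreamDelta d) =>
            !string.IsNullOrEmpty(d.ReasoningContent) ? "thinking"   // surfaced as "Thinking delta:" events
            : !string.IsNullOrEmpty(d.Content)        ? "text"       // becomes the visible Slack reply
            : d.ToolCallUpdate is not null            ? "tool-call"
            : "other";
    }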

Composition

Three commits, smallest-change-first:

  1. a64819c3 — diagnostics + thinking-only retry guard (cherry-picked from prior investigation branch claude-wt-netclaw-insta-crash)
  2. 3a86cf54 — ILoggerFactory injection (cherry-picked; otherwise the SSE-layer log is silent)
  3. 1bf8da37 — RollingFileLogger Debug-floor fix (#908)

The temporary Warning/Info level bump from the original investigation branch was deliberately not included — the diagnostics stay at Debug and configuration decides what gets persisted.

Test plan

  • Full test suite passes (3,342 / 3,342, 0 failures)

  • dotnet build clean (0 warnings, 0 errors)

  • Live validation against testlab (https://llm.testlab.petabridge.net, Qwen3.6-27B-UD-Q4_K_XL.gguf) in an isolated Docker container with NETCLAW_Logging__LogLevel__Default=Debug. All three breakdown logs landed in the daemon log file at [DBG]:

    SSE        : textDeltas=2 textChars=3 thinkingDeltas=37 thinkingChars=150 toolCallDeltas=0 finishReason=stop
    Middleware : textDeltas=2 textChars=3 thinkingDeltas=37 thinkingChars=150 toolCallDeltas=0 finishReason=stop
    Actor      : text=3ch thinking=150ch toolCalls=0 finishReason=stop
    

    Counts agree across all three layers for a healthy call — confirming no delta loss in the happy path and that the instrumentation is wired correctly to compare against fault scenarios.

  • Reproduce the May 9 fault pattern with the retry guard active and confirm a real assistant message replaces the fallback reply.

Commit messages

Add debug-level logging across the LLM response pipeline to diagnose
sessions that produce tokens but no visible Slack output. Three layers
now report content type breakdowns (text/thinking/tool call counts and
char totals): OpenAiCompatibleChatClient (SSE), LoggingChatClient
(middleware), and LlmSessionActor (actor). Comparing the three logs
pinpoints whether content is misrouted upstream (llama.cpp) or dropped
downstream (Netclaw parsing/ToolCallTextFilter).
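A sketch of what one layer's counter pass can look like, reusing the hypothetical StreamDelta shape from the sketch in the Context section above (the real counters live in OpenAiCompatibleChatClient, LoggingChatClient, and LlmSessionActor, and their exact shapes may differ):

    // Sketch only: the logged fields mirror the breakdown lines shown in the test plan
    // (textDeltas/textChars/thinkingDeltas/thinkingChars/toolCallDeltas/finishReason).
    async IAsyncEnumerable<StreamDelta> CountAndForwardAsync(
        IAsyncEnumerable<StreamDelta> upstream, ILogger logger)
    {
        int textDeltas = 0, textChars = 0, thinkingDeltas = 0, thinkingChars = 0, toolCallDeltas = 0;
        string? finishReason = null;

        await foreach (var d in upstream)
        {
            if (!string.IsNullOrEmpty(d.Content)) { textDeltas++; textChars += d.Content.Length; }
            if (!string.IsNullOrEmpty(d.ReasoningContent)) { thinkingDeltas++; thinkingChars += d.ReasoningContent.Length; }
            if (d.ToolCallUpdate is not null) toolCallDeltas++;
            finishReason = d.FinishReason ?? finishReason;

            yield return d; // pass-through: counting must not perturb the stream
        }

        logger.LogDebug(
            "stream content breakdown: textDeltas={TextDeltas} textChars={TextChars} " +
            "thinkingDeltas={ThinkingDeltas} thinkingChars={ThinkingChars} " +
            "toolCallDeltas={ToolCallDeltas} finishReason={FinishReason}",
            textDeltas, textChars, thinkingDeltas, thinkingChars, toolCallDeltas, finishReason);
    }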

Unify the empty-response and thinking-only guards into a single check
that retries via EvaluateEmptyResponse when the LLM produces reasoning
content but no visible text or tool calls — prevents silent fallback
replies in Slack.

Absorb stale LlmCallFailed and LlmResponseReceived messages in the
Ready state to eliminate noisy dead letters after watchdog timeouts.

OpenAiCompatibleProviderPlugin was constructing OpenAiCompatibleChatClient
without a logger, so _logger fell back to NullLogger.Instance and the
SSE-layer "stream content breakdown" Debug log was silently swallowed.
Wire ILoggerFactory through DI and create a categorized logger.
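Roughly, the wiring looks like this; the constructor shapes below are assumptions (the real plugin and client signatures differ), and only the ILoggerFactory-to-CreateLogger<T>() handoff is the point:

    // Hypothetical shapes. What matters: the plugin resolves ILoggerFactory from DI and
    // hands the chat client a categorized logger instead of leaving it on NullLogger.Instance.
    public sealed class OpenAiCompatibleProviderPlugin
    {
        private readonly ILoggerFactory _loggerFactory;

        public OpenAiCompatibleProviderPlugin(ILoggerFactory loggerFactory)
            => _loggerFactory = loggerFactory;

        public OpenAiCompatibleChatClient CreateClient(Uri endpoint) =>
            new(endpoint, _loggerFactory.CreateLogger<OpenAiCompatibleChatClient>());
    }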

RollingFileLogger.IsEnabled was hardcoded to LogLevel.Information,
which silently overrode any Debug-level configuration coming through
Logging.LogLevel.Default or SetMinimumLevel. The framework was
correctly configured for Debug, but every Debug log was rejected
at the file sink — only the console sink saw them.

This made the new SSE / middleware / actor content-breakdown
diagnostics invisible in production daemon logs, and is consistent
with operator reports that "structured log output isn't appearing
in daemon logs."

Defer entirely to the framework's configured minimum level instead
of imposing our own floor.
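For context, a minimal sketch of a file sink that defers to the framework. It assumes nothing about the real RollingFileLogger beyond what's described above; the framework-side Logger applies the configured Logging:LogLevel filters before the provider's ILogger is ever called, so the sink only needs to reject LogLevel.None:

    // Hypothetical stand-in for RollingFileLogger showing the deferral pattern.
    public sealed class FileSinkLogger : ILogger
    {
        public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

        // Defer level filtering to the framework's configured minimum; only None is off.
        public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;

        public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
            Exception? exception, Func<TState, Exception?, string> formatter)
        {
            if (!IsEnabled(logLevel)) return;
            // ...append formatter(state, exception) to the rolling daemon-*.log file.
        }
    }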

Cherry-picked from the diagnostic-investigation branch (the original
6558a47 also bumped specific call sites to Warning/Info temporarily;
that bump is intentionally skipped here — the diagnostic logs stay
at Debug and we let configuration decide what gets persisted).
@Aaronontheweb (Collaborator, Author) left a comment

Comment thread: src/Netclaw.Actors/Sessions/LlmSessionActor.cs

    });

    Command<ProcessingWatchdogExpired>(_ => { });
    +Command<LlmCallFailed>(_ => { }); // stale failure arriving after watchdog timeout

Need to clarify one thing before we merge this
LGTM

Comment thread: RollingFileLogger

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    -public bool IsEnabled(LogLevel logLevel) => logLevel >= LogLevel.Information;
    +public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;

ensures that debug logs can actually show up now

@Aaronontheweb merged commit fe5c89b into netclaw-dev:dev on May 9, 2026
7 of 8 checks passed
@Aaronontheweb deleted the investigate/llamacpp-openai-chat-faults branch on May 9, 2026 at 15:07

Labels

reliability: Retries, resilience, graceful degradation
sessions: LLM session actor, turn lifecycle, pipelines
