fix(streaming): thinking-only retry guard + 3-layer SSE/middleware/actor diagnostics + RollingFileLogger Debug fix (#947)
Merged
Aaronontheweb merged 3 commits into netclaw-dev:dev on May 9, 2026
Conversation
Add debug-level logging across the LLM response pipeline to diagnose sessions that produce tokens but no visible Slack output. Three layers now report content-type breakdowns (text/thinking/tool-call counts and char totals): OpenAiCompatibleChatClient (SSE), LoggingChatClient (middleware), and LlmSessionActor (actor). Comparing the three logs pinpoints whether content is misrouted upstream (llama.cpp) or dropped downstream (Netclaw parsing/ToolCallTextFilter).

Unify the empty-response and thinking-only guards into a single check that retries via EvaluateEmptyResponse when the LLM produces reasoning content but no visible text or tool calls; this prevents silent fallback replies in Slack.

Absorb stale LlmCallFailed and LlmResponseReceived messages in the Ready state to eliminate noisy dead letters after watchdog timeouts.
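The unified guard described above can be sketched as a single predicate over the turn's content breakdown. This is an illustrative sketch only — `TurnContent` and `ResponseGuard` are hypothetical names, not Netclaw's actual types; the real check lives in the session actor and hands off to `EvaluateEmptyResponse`:

```csharp
// Hypothetical sketch of the unified empty/thinking-only guard.
// A turn is retryable when it produced no visible text and no tool
// calls, regardless of how much reasoning ("thinking") content arrived.
public sealed record TurnContent(int TextChars, int ThinkingChars, int ToolCallCount);

public static class ResponseGuard
{
    // Covers both the classic empty response (nothing at all) and the
    // thinking-only case (reasoning deltas arrived, but nothing visible).
    public static bool ShouldRetry(TurnContent turn) =>
        turn.TextChars == 0 && turn.ToolCallCount == 0;
}
```

Folding the thinking-only case into the same predicate means there is exactly one place that decides whether a retry fires, instead of two guards that can drift apart.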
OpenAiCompatibleProviderPlugin was constructing OpenAiCompatibleChatClient without a logger, so _logger fell back to NullLogger.Instance and the SSE-layer "stream content breakdown" Debug log was silently swallowed. Wire ILoggerFactory through DI and create a categorized logger.
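The SSE-layer breakdown that this logger now feeds classifies each streamed delta as visible text or reasoning. A minimal sketch of that classification, assuming llama-server's OpenAI-compatible field names (`delta.content` vs `delta.reasoning_content`, per the Context section below) — `DeltaClassifier` is an illustrative name, not Netclaw's actual parser:

```csharp
using System.Text.Json;

// Illustrative sketch: split one streamed delta object into visible
// text vs reasoning content, based on which JSON field is present.
public static class DeltaClassifier
{
    public static (string? Text, string? Thinking) Classify(string deltaJson)
    {
        using var doc = JsonDocument.Parse(deltaJson);
        var delta = doc.RootElement;
        string? text = delta.TryGetProperty("content", out var c) ? c.GetString() : null;
        string? thinking = delta.TryGetProperty("reasoning_content", out var r) ? r.GetString() : null;
        return (text, thinking);
    }
}
```

Counting each branch at the SSE layer is what makes the three-layer comparison possible: if thinking chars dominate here but the middleware and actor layers report zero text, the content was never misrouted downstream — it simply never arrived as `delta.content`.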
…etclaw-dev#908) RollingFileLogger.IsEnabled was hardcoded to LogLevel.Information, which silently overrode any Debug-level configuration coming through Logging.LogLevel.Default or SetMinimumLevel. The framework was correctly configured for Debug, but every Debug log was rejected at the file sink — only the console sink saw them. This made the new SSE / middleware / actor content-breakdown diagnostics invisible in production daemon logs, and is consistent with operator reports that "structured log output isn't appearing in daemon logs." Defer entirely to the framework's configured minimum level instead of imposing our own floor. Cherry-picked from the diagnostic-investigation branch (the original 6558a47 also bumped specific call sites to Warning/Info temporarily; that bump is intentionally skipped here — the diagnostic logs stay at Debug and we let configuration decide what gets persisted).
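The framework-side configuration this fix defers to is the standard `Logging` section; a minimal illustrative fragment (the exact file and sink setup in a Netclaw deployment may differ):

```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Debug"
    }
  }
}
```

With the old hardcoded floor, this setting reached the console sink but every Debug write was rejected at the file sink; after the fix, the file sink honors it too.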
Aaronontheweb (Collaborator, Author) commented on May 9, 2026:
Aaronontheweb left a comment:
Need to clarify one thing before we merge this
```diff
 });

 Command<ProcessingWatchdogExpired>(_ => { });
+Command<LlmCallFailed>(_ => { }); // stale failure arriving after watchdog timeout
```
```diff
 public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

-public bool IsEnabled(LogLevel logLevel) => logLevel >= LogLevel.Information;
+public bool IsEnabled(LogLevel logLevel) => logLevel != LogLevel.None;
```
ensures that debug logs can actually show up now
Summary
- Thinking-only retry guard: when a streaming turn produces reasoning content but no visible text or tool calls (as under the `--jinja` regime), retry via `EvaluateEmptyResponse` instead of letting Slack post a silent fallback reply.
- Absorb stale messages in `LlmSessionActor.Ready` for late `LlmCallFailed` / `LlmResponseReceived` after watchdog timeouts (kills noisy dead letters).
- 3-layer content-breakdown diagnostics: `OpenAiCompatibleChatClient` (SSE wire), `LoggingChatClient` (middleware), `LlmSessionActor` (post-assembly). Counts text deltas/chars, thinking deltas/chars, tool-call deltas, and finish reason at each layer so operators can pinpoint where deltas land or get dropped.
- `ILoggerFactory` injection in `OpenAiCompatibleProviderPlugin` so the SSE-layer log isn't swallowed by `NullLogger.Instance`.
- `RollingFileLogger` Debug-floor fix: removes the hardcoded `>= LogLevel.Information` filter so `Logging:LogLevel:Default=Debug` actually persists Debug logs to `~/.netclaw/logs/daemon-*.log` (bug: RollingFileLogger hardcodes Information floor, ignores framework log level #908). Without this, the new diagnostics and any other Debug logs never reach disk regardless of config.

Related
Refs netclaw-dev/netclaw-website#16 — delivers the 3-layer SSE / middleware / actor diagnostic counters described in that issue's "What Netclaw provides to help diagnose this" section. Does not close it: the docs/troubleshooting article portion of #16 is still open and should be addressed separately on the netclaw-website repo.
Context
Symptom: a recent self-hosted Slack session (`D0AC6CKBK5K/1778333220.192409`, 2026-05-09) produced three consecutive streaming turns with `output: 2/4/28` final tokens despite many `Thinking delta:` events arriving. Each turn ended with `Turn completed without visible Slack output; posting fallback reply`. On the same backend, the non-streaming title and distillation calls returned full output (1752 / 1358 / 466 tokens), so the failure was specific to the streaming path with reasoning content.

The pattern lines up with the May 8 testlab fix
fix(llama-server): add --jinja so Qwen3 tool-call template is honored. With `--jinja` + `--reasoning-format deepseek`, llama-server now correctly emits Qwen3's `<think>` content as `delta.reasoning_content` (a separate channel) rather than jumbling it into `delta.content`. Our streaming consumer surfaces those reasoning deltas as Thinking, but the assistant response can be empty if the model never transitions to content; that's the case the new retry guard handles.

Composition
Three commits, smallest-change-first:
- `a64819c3` — diagnostics + thinking-only retry guard (cherry-picked from prior investigation branch `claude-wt-netclaw-insta-crash`)
- `3a86cf54` — `ILoggerFactory` injection (cherry-picked; otherwise the SSE-layer log is silent)
- `1bf8da37` — `RollingFileLogger` Debug-floor fix (#908)

The temporary Warning/Info level bump from the original investigation branch was deliberately not included; the diagnostics stay at Debug and configuration decides what gets persisted.
Test plan
- Full test suite passes (3,342 / 3,342, 0 failures).
- `dotnet build` is clean (0 warnings, 0 errors).
- Live validation against testlab (`https://llm.testlab.petabridge.net`, `Qwen3.6-27B-UD-Q4_K_XL.gguf`) in an isolated Docker container with `NETCLAW_Logging__LogLevel__Default=Debug`. All three breakdown logs landed in the daemon log file at `[DBG]`; counts agree across all three layers for a healthy call, confirming no delta loss in the happy path and that the instrumentation is wired correctly to compare against fault scenarios.
- Reproduce the May 9 fault pattern with the retry guard active and confirm a real assistant message replaces the fallback reply.