Background
Now that session-scoped diagnostics route through SessionLogDispatcher and
land in the per-session session.log (#918), we have a strong in-process
contract: every log line emitted under a populated SessionDiagnosticsContext
carries a session id and ends up filed by session.
For deployments that ship traces and logs to an OpenTelemetry collector
(rather than relying on local files), we want the same property: every
session-scoped event arriving at the collector should carry a session.id
(or equivalent) attribute that operators can pivot, filter, and alert on.
This is exploratory because it is not yet clear which layers consistently
propagate the attribute today.
Why this matters
- Alerting: "ERROR in session X" should be a fan-in operator alert that
  doesn't require pulling local files off the daemon. Today the OTel side
  doesn't have a guaranteed session.id tag on diagnostic logs from the
  provider plugins, so we cannot build that alert reliably.
- Reporting: test-lab and customer deployments need rollups by session:
  failure rate per session, slowest session, longest tool call within a
  session. These all key on a structured attribute.
- Cross-cutting: the same property would let us correlate traces from
  the LLM call, the tool execution, and the channel ingress for a single
  Slack thread.
Scope
Verify (or document gaps in) the following propagation paths under
OpenTelemetry export:
- SessionDiagnosticsContext.Push(sessionId) is set at the LLM call
  boundary (per the analyzer in #915, "analyzer: flag session-owned chat
  client calls outside session diagnostics context"). Logs written via MEL
  ILogger<T> inside that scope: do they currently carry session.id in their
  exported attributes? If not, what is the correct shape: a structured
  logging scope, an Activity baggage entry, or both?
- Activities started by the daemon (ActivitySource instrumentation in
  Netclaw.Actors.Telemetry, SessionTelemetry, and channel-side
  instrumentation): do they consistently tag session.id and equivalent
  contextual fields (channel.type, model.id, provider.name)?
- Channel ingress (Slack, Discord, SignalR, CLI): when a turn starts
  from a channel, is the session.id attached to the originating Activity
  such that downstream child activities inherit it via Activity.Current?
- Sidecar paths (compaction, title generation, sub-agents, memory
  distillation): these bypass SessionDiagnosticsContext today (see #920,
  "session log: wrap remaining sidecar IChatClient call sites in
  SessionDiagnosticsContext.Push"). When emitted through OTel, do they
  carry the parent session.id? Likely not until #920 lands.
Deliverables
- Inventory of every place we emit logs or activities that should be
  session-scoped, with current state of session.id tagging.
- Identified gaps with proposed fixes.
- A small set of operator-facing alert recipes that use the attribute
  (e.g., "session error rate above threshold", "session N tool calls
  exceeded budget").
- Documentation of the standard attribute name(s) used (likely session.id
  per OTel semconv-leaning naming, plus any Netclaw-specific attributes
  such as netclaw.session.channel, netclaw.session.model).
Acceptance criteria
- A short report enumerating the propagation gaps.
- Either a follow-up issue per gap, or a single bundled fix PR if the
  gaps are small enough.
- The standard attribute names land in docs/spec/configuration.md's
  telemetry section so operators can rely on them.
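One detail bears on the channel-ingress question in Scope: in System.Diagnostics, tags set on a parent Activity are not copied to its children, whereas baggage set on a parent is readable from any descendant. The sketch below demonstrates that using only the BCL. The source name and activity names are hypothetical, not Netclaw's actual instrumentation, and it makes no claim about what the daemon does today.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name for this sketch; not Netclaw's real ActivitySource.
var source = new ActivitySource("Example.SessionPropagation");

// Without a listener that samples the source, StartActivity returns null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.SessionPropagation",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllData,
};
ActivitySource.AddActivityListener(listener);

string? childTag;
string? childBaggage;

// Stand-in for the Activity started when a turn arrives from a channel.
using (Activity? ingress = source.StartActivity("channel.ingress"))
{
    ingress?.SetTag("session.id", "session-123");     // tag: lives on this activity only
    ingress?.AddBaggage("session.id", "session-123"); // baggage: visible to descendants

    // Child activity; its parent is Activity.Current, i.e. the ingress activity.
    using (Activity? llmCall = source.StartActivity("llm.call"))
    {
        childTag = llmCall?.GetTagItem("session.id") as string;
        childBaggage = llmCall?.GetBaggageItem("session.id");
    }
}

Console.WriteLine($"child tag: {childTag ?? "<none>"}");         // child tag: <none>
Console.WriteLine($"child baggage: {childBaggage ?? "<none>"}"); // child baggage: session-123
```

If the daemon's instrumentation behaves the same way, tagging session.id once at ingress would not be enough for per-span filtering: each activity would need the tag set explicitly, for example by copying it from baggage when the activity starts. Baggage is propagation context rather than a span attribute, so whether it reaches the collector at all likely depends on how the exporter pipeline is configured.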
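To make the "session error rate above threshold" recipe from Deliverables concrete, here is a minimal sketch of the rollup it implies, written against in-memory records. The record shape, the severity strings, and the 0.25 threshold are assumptions for illustration only; a real recipe would more likely be a query in whatever backend sits behind the collector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical exported log records: (session.id attribute, severity).
var records = new List<(string SessionId, string Severity)>
{
    ("s-1", "Information"),
    ("s-1", "Error"),
    ("s-1", "Information"),
    ("s-1", "Error"),
    ("s-2", "Information"),
    ("s-2", "Information"),
};

const double threshold = 0.25; // assumed alert threshold, not a Netclaw default

// Group by session.id and keep the sessions whose error share exceeds the threshold.
var breaching = records
    .GroupBy(r => r.SessionId)
    .Select(g => (SessionId: g.Key,
                  ErrorRate: g.Count(r => r.Severity == "Error") / (double)g.Count()))
    .Where(x => x.ErrorRate > threshold)
    .ToList();

foreach (var (sessionId, errorRate) in breaching)
    Console.WriteLine($"ALERT session.id={sessionId} error rate {errorRate:P0}");
```

With this sample data only s-1 is flagged (2 errors out of 4 records). The grouping works only if every record carries the attribute, which is the property this issue sets out to verify.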
References
- SessionDiagnosticsContext.Push: #915 (analyzer: flag session-owned chat client calls outside session diagnostics context)