@ryan-lempka ryan-lempka commented Sep 16, 2025

Overview:

Adds the ability to capture each chat completion request/response pair and write it to stderr. To use this feature, set the env var DYN_AUDIT_ENABLED=1 and include store=true in the request.

Enables compliance, distillation, evaluation, and analytics use cases.

This PR starts with stderr as the only target but is designed so that further targets, such as a persistent stream, can be added later. An independent process could then monitor that stream and persist the audit data in a database.

Details:

  • Environment controls: DYN_AUDIT_ENABLED=1 enables auditing, DYN_AUDIT_CAPACITY sets buffer size
  • Universal support: Works for streaming/non-streaming, tool-calling, reasoning content
  • Off-hot-path: Bus + sink worker architecture with async I/O to stderr (extensible to other targets)
  • Transport-agnostic: Single integration point in preprocessor works across HTTP/gRPC
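As a rough illustration of the environment controls above, here is a minimal std-only sketch. The `AuditSettings` struct and the helper names are hypothetical and are not the PR's `AuditPolicy` API. Accepting "true" as an alias for "1" and falling back to a capacity of 1024 are assumptions borrowed from the review discussion on this PR, not confirmed behavior of the merged code.

```rust
/// Hypothetical, std-only stand-in for the audit settings; field names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AuditSettings {
    enabled: bool,
    capacity: usize,
}

/// Assumed default buffer size (the walkthrough on this PR mentions 1024).
const DEFAULT_CAPACITY: usize = 1024;

/// Treats "1" or "true" (case-insensitive) as enabled; anything else is off.
fn parse_enabled(raw: Option<&str>) -> bool {
    match raw {
        Some(v) => {
            let v = v.trim();
            v == "1" || v.eq_ignore_ascii_case("true")
        }
        None => false,
    }
}

/// Falls back to the default when the value is unset, unparsable, or zero.
fn parse_capacity(raw: Option<&str>) -> usize {
    raw.and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_CAPACITY)
}

/// Reads both variables once, e.g. at startup.
fn settings_from_env() -> AuditSettings {
    let enabled = std::env::var("DYN_AUDIT_ENABLED").ok();
    let capacity = std::env::var("DYN_AUDIT_CAPACITY").ok();
    AuditSettings {
        enabled: parse_enabled(enabled.as_deref()),
        capacity: parse_capacity(capacity.as_deref()),
    }
}
```

Keeping the parsing in functions that take `Option<&str>` makes the rules testable without mutating the process environment; `settings_from_env()` is then a thin wrapper called once during initialization.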

Architecture: Audit happens post-transform for consistency with client output. Broadcast bus enables fan-out without blocking requests.
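To make the "fan-out without blocking requests" idea concrete, here is a minimal std-only sketch. It is not the PR's implementation: it swaps in one bounded `std::sync::mpsc` queue per subscriber and a plain thread as the sink worker, and the `Bus`, `Record`, and `spawn_stderr_sink` names are hypothetical. The point it illustrates is that `publish` uses `try_send`, so a slow or full sink costs a dropped record rather than a stalled request.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical record type; a real audit record carries far more fields.
#[derive(Debug)]
struct Record {
    request_id: String,
}

/// Fan-out bus: one bounded queue per subscriber (capacity must be > 0).
struct Bus {
    subscribers: Vec<SyncSender<Arc<Record>>>,
    capacity: usize,
}

impl Bus {
    fn new(capacity: usize) -> Self {
        Bus { subscribers: Vec::new(), capacity }
    }

    /// Registers a subscriber and returns its receiving end.
    fn subscribe(&mut self) -> Receiver<Arc<Record>> {
        let (tx, rx) = sync_channel(self.capacity);
        self.subscribers.push(tx);
        rx
    }

    /// Never blocks: a full or closed queue drops the record for that subscriber.
    /// Returns how many subscribers accepted the record.
    fn publish(&self, record: Record) -> usize {
        let shared = Arc::new(record);
        let mut delivered = 0;
        for tx in &self.subscribers {
            match tx.try_send(Arc::clone(&shared)) {
                Ok(()) => delivered += 1,
                Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {}
            }
        }
        delivered
    }
}

/// Sink worker: drains its queue on a separate thread and writes to stderr.
/// Returns the number of records written once every sender has been dropped.
fn spawn_stderr_sink(rx: Receiver<Arc<Record>>) -> JoinHandle<usize> {
    thread::spawn(move || {
        let mut written = 0;
        for rec in rx {
            eprintln!("audit request_id={}", rec.request_id);
            written += 1;
        }
        written
    })
}
```

Sharing the record behind an `Arc` means each extra sink costs a reference-count bump instead of a deep copy of the request and response.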

Example outputs:

// With store=true in the request and DYN_AUDIT_ENABLED=1
{"schema_version":1,"request_id":"d8e62fae-7107-4567-993f-79d5f9757b1b","requested_streaming":true,"mode":"full","model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","usage":null,"request":{"messages":[{"role":"user","content":"What is 10 + 10?"}],"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","store":true,"max_tokens":1000,"stream":true},"response":{"id":"chatcmpl-d8e62fae-7107-4567-993f-79d5f9757b1b","choices":[{"index":0,"message":{"content":"...","role":"assistant"},"finish_reason":"stop"}],"created":1758740068,"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-8B","object":"chat.completion","usage":null}}

Where to start:

  • lib/llm/src/audit/ - Core system (bus, handle, sink, stream)
  • lib/llm/src/preprocessor.rs - Integration point
  • lib/llm/src/entrypoint/input.rs - Initialization

Summary by CodeRabbit

  • New Features
    • Added an auditing subsystem for chat completions, publishing usage-only or full request/response records.
    • Configurable via environment variables:
      • DYN_AUDIT_ENABLED to toggle auditing
      • DYN_AUDIT_SINKS to select outputs (default: stderr)
      • DYN_AUDIT_CAPACITY to size the event buffer
    • Streaming responses are unchanged for users; records are aggregated post-stream for full audits.
    • Background workers are started automatically when enabled, with readiness logging.

@ryan-lempka ryan-lempka requested a review from a team as a code owner September 16, 2025 17:56
@github-actions github-actions bot added the feat label Sep 16, 2025
@ryan-lempka ryan-lempka self-assigned this Sep 16, 2025
@ryan-lempka ryan-lempka changed the title from "feat: add audit logging for chat completions" to "feat: add audit logging for chat completions non-streaming" Sep 16, 2025

coderabbitai bot commented Sep 16, 2025

Walkthrough

Adds a new auditing subsystem: configuration, a broadcast bus, record/handle types, sinks with worker tasks, and a streaming passthrough that aggregates chunks into a final response. Wires initialization into the input entrypoint, conditionally starting the bus and sinks based on env-driven policy and capacity.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Audit module scaffolding: lib/llm/src/audit/mod.rs | Introduces the audit module and publicly re-exports submodules: bus, config, handle, sink, stream. |
| Audit event bus: lib/llm/src/audit/bus.rs | Adds a OnceLock-backed broadcast sender for Arc<AuditRecord> with init(capacity), subscribe(), and publish(AuditRecord). |
| Audit configuration: lib/llm/src/audit/config.rs | Adds AuditPolicy { enabled }, a OnceLock policy store, init_from_env() reading DYN_AUDIT_ENABLED, and policy() accessor. |
| Audit handle and record types: lib/llm/src/audit/handle.rs | Defines AuditMode, AuditRecord, AuditHandle, and CompletionUsage alias. Provides create_handle(...), setters, and emit() to publish to the bus. |
| Audit sinks and workers: lib/llm/src/audit/sink.rs | Introduces AuditSink trait and StderrSink. Parses DYN_AUDIT_SINKS and spawn_workers_from_env() to subscribe and emit records per sink. |
| Streaming passthrough and aggregation: lib/llm/src/audit/stream.rs | Adds PassThroughWithAgg<S> stream wrapper collecting chunks and scan_aggregate_with_future(...) returning passthrough plus a future for aggregated response; includes tests. |
| Entrypoint wiring: lib/llm/src/entrypoint/input.rs | On startup, if policy enabled: initializes audit bus with DYN_AUDIT_CAPACITY (default 1024), spawns sink workers, and logs capacity. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant EP as Entrypoint
  participant C as audit::config
  participant B as audit::bus
  participant S as audit::sink
  participant H as audit::handle
  participant App as App Logic

  EP->>C: policy()
  alt policy.enabled
    EP->>B: init(capacity from DYN_AUDIT_CAPACITY)
    EP->>S: spawn_workers_from_env()
    S->>B: subscribe()
    Note right of S: Workers ready to receive AuditRecord
  else
    Note over EP: Auditing disabled
  end

  App->>H: create_handle(req, request_id)
  alt Some(handle)
    App->>H: set_request(req) / add_usage(...)
    App->>H: set_response(resp)
    App->>H: emit()
    H->>B: publish(AuditRecord)
    B-->>S: broadcast Arc<AuditRecord>
    S->>S: emit(record) via each configured sink
  else None
    Note over App: Skip auditing
  end
sequenceDiagram
  autonumber
  participant Client as Client
  participant Stream as Upstream Stream
  participant PTA as PassThroughWithAgg
  participant Agg as Aggregator Task
  participant Fut as Aggregation Future

  Client->>PTA: poll_next()
  PTA->>Stream: poll_next()
  alt Next chunk
    Stream-->>PTA: Annotated<Chunk>
    PTA->>PTA: buffer clone
    PTA-->>Client: forward chunk
  else End of stream
    Stream-->>PTA: None
    PTA->>Agg: spawn aggregate(buffer)
    Agg-->>Fut: send final Response
    PTA-->>Client: None
  end

  Note over Fut: Future resolves to NvCreateChatCompletionResponse (fallback on failure)
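
The pass-through-and-aggregate flow in the second diagram can be sketched without an async runtime. The version below is a simplified stand-in, not the PR's `PassThroughWithAgg<S>`: it wraps a synchronous `Iterator` of `String` chunks instead of a `Stream`, concatenates chunks instead of building a chat completion response, and hands the aggregate back over a std channel rather than a future. All names in it are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Iterator adapter that forwards every chunk unchanged while keeping a copy.
/// When the inner iterator ends, the aggregate is sent once over a channel.
struct PassThroughAggSketch<I: Iterator<Item = String>> {
    inner: I,
    buffer: Vec<String>,
    done: Option<Sender<String>>,
}

impl<I: Iterator<Item = String>> Iterator for PassThroughAggSketch<I> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        match self.inner.next() {
            Some(chunk) => {
                // Keep a copy for the audit record; the caller gets the chunk as-is.
                self.buffer.push(chunk.clone());
                Some(chunk)
            }
            None => {
                // `take()` ensures the aggregate is sent at most once.
                if let Some(tx) = self.done.take() {
                    // Ignore the error if nobody is waiting for the aggregate.
                    let _ = tx.send(self.buffer.concat());
                }
                None
            }
        }
    }
}

/// Wraps `inner` and returns the pass-through plus a receiver for the aggregate.
fn with_aggregate<I: Iterator<Item = String>>(
    inner: I,
) -> (PassThroughAggSketch<I>, Receiver<String>) {
    let (tx, rx) = channel();
    let wrapped = PassThroughAggSketch { inner, buffer: Vec::new(), done: Some(tx) };
    (wrapped, rx)
}
```

The consumer iterates the wrapper exactly as it would the original source; only after the final `None` does the receiver yield the combined text, which mirrors the "records are aggregated post-stream" behavior described in the summary.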

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit taps logs with a tiny paw,
Catching each whisper, each token it saw.
Streams trickle by, then gather and sing—
A record takes flight on broadcast wing.
Stderr glows softly: “audit complete.”
Hippity-hop—observability sweet.

Pre-merge checks

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.17% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Description Check | ⚠️ Warning | The pull request description includes the Overview, Details, and a "Where to start" section, but it omits the required "Related Issues" section from the repository template and does not use the exact heading "#### Where should the reviewer start?" as specified, making it incomplete relative to the required structure. | Please add the missing "#### Related Issues" section with the correct action keyword (Closes/Fixes/Resolves) and issue reference, and rename the "Where to start:" heading to exactly "#### Where should the reviewer start?" to fully conform to the template. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title succinctly and accurately describes the primary change of adding audit logging for chat completions, aligning with the PR’s objective and providing clear context to reviewers. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (7)
lib/llm/src/http/service.rs (1)

31-31: Confirm need for public exposure of audit module

If external consumers don’t need to call auditing APIs, prefer restricting scope to avoid expanding the public surface.

Apply if internal-only:

-pub mod audit;
+pub(crate) mod audit;
lib/llm/src/http/service/openai.rs (2)

492-501: Audit gating and conditional clone: OK; consider deferring clone to success path

Current approach avoids clone unless needed. You could further defer the clone until after the response is folded to skip cloning on error paths (minor perf), but it would reflect stream=true in the logged request. If preserving the original stream flag matters, keep as-is.


597-601: Emit request/response as structured fields and add an explicit tracing target

With the current % formatter, request/response become stringified JSON. If your JSONL pipeline expects nested objects, serialize via a serde-aware field or attach a dedicated target for easier filtering.

Proposed minimal change (adds a target; keeps current field formatting). For fully structured fields, see follow-up note.

-        if let Some(req_copy) = request_for_audit {
-            let resp_json = serde_json::to_value(&response).unwrap_or(serde_json::Value::Null);
-            audit::log_stored_completion(&request_id, &req_copy, resp_json);
-        }
+        if let Some(req_copy) = request_for_audit {
+            let resp_json = serde_json::to_value(&response).unwrap_or(serde_json::Value::Null);
+            audit::log_stored_completion(&request_id, &req_copy, resp_json);
+        }

Follow-up (optional, requires tracing-serde and logger support): log as nested objects using AsSerde and a target (see audit.rs comment).

lib/llm/src/http/service/audit.rs (4)

26-31: Avoid potential panic and normalize timestamp type

duration_since(UNIX_EPOCH).unwrap() can theoretically panic; also prefer i64 for downstream JSON consumers.

Apply:

-    let ts_ms = SystemTime::now()
-        .duration_since(UNIX_EPOCH)
-        .unwrap()
-        .as_millis();
+    let ts_ms: i64 = SystemTime::now()
+        .duration_since(UNIX_EPOCH)
+        .map(|d| d.as_millis() as i64)
+        .unwrap_or(0);

32-41: Add a dedicated tracing target and (optionally) emit structured JSON fields

A target simplifies log routing. Today %request_val/%response_json serialize to strings; if your JSONL stack expects nested objects, emit serde-backed values.

Minimal (target only):

-    tracing::info!(
+    tracing::info!(target: "dynamo_audit",
         log_type = "audit",
         schema_version = "1.0",
         ts_ms = ts_ms,
         store_id = %store_id,
         request_id = request_id,
         request = %request_val,
         response = %response_json,
         "Audit log for stored completion"
     );

Optional (structured fields; requires adding tracing-serde and configuring the JSON formatter to honor it):

-    tracing::info!(target: "dynamo_audit",
-        request = %request_val,
-        response = %response_json,
-        ...
-    );
+    use tracing_serde::AsSerde;
+    tracing::info!(target: "dynamo_audit",
+        request = ?AsSerde(&request_val),
+        response = ?AsSerde(&response_json),
+        ...
+    );

Please confirm what your DYN_LOGGING_JSONL layer expects.


49-64: Tests cover flag matrix; consider one negative-path clone test (optional)

You might add a test asserting no request clone occurs when stream=true or store=false (using counters or a lightweight wrapper), but this is optional.


81-113: Smoke test is fine; consider asserting log shape via a test subscriber (optional)

If feasible, attach a test tracing subscriber to capture the event and assert presence of log_type="audit" and request_id.

I can draft a minimal test subscriber if you’d like.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc29b59 and ebcadce.

📒 Files selected for processing (3)
  • lib/llm/src/http/service.rs (1 hunks)
  • lib/llm/src/http/service/audit.rs (1 hunks)
  • lib/llm/src/http/service/openai.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/http/service/audit.rs (1)
lib/llm/src/http/service/openai.rs (2)
  • chat_completions (455-605)
  • s (61-61)
lib/llm/src/http/service/openai.rs (1)
lib/llm/src/http/service/audit.rs (2)
  • should_audit_flags (15-17)
  • log_stored_completion (19-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (3)
lib/llm/src/http/service/openai.rs (1)

27-27: Import looks good

lib/llm/src/http/service/audit.rs (2)

9-13: Env flag parsing: LGTM

Covers "1"/"true" (case-insensitive) and defaults off.


15-17: Audit gating logic is correct

Non-streaming + enabled + store=true is enforced.
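
For reference, the gating rule described in this comment reduces to a single boolean expression. The sketch below uses a hypothetical signature; the `should_audit_flags` function under review may take different arguments. It reflects the non-streaming-only revision being reviewed at this point in the thread, whereas the PR description says the final version also covers streaming.

```rust
/// Hypothetical signature: audit only when the feature is enabled, the request
/// is non-streaming, and the caller set store=true.
fn should_audit(enabled: bool, streaming: bool, store: bool) -> bool {
    enabled && !streaming && store
}
```

Only one of the eight flag combinations passes: `should_audit(true, false, true)`.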

@ryanolson ryanolson left a comment

Is HTTP the right place for this, or is it the preprocessor/processor?

HTTP is just a shim/transport. If we put the auditing here, we need to do the same in the gRPC frontend that @GuanLuo is working on (or has finished).

The responses are finalized in the post part of the "processor".

That feels like a better place to audit, since all frontend code paths (regardless of public API or transport) will flow through there.

@ryan-lempka

@ryanolson thanks for the feedback - makes sense. I’ll work through the comments later this afternoon and move the logic into the processor.

@ryan-lempka

@ryanolson ready for re-review. Let me know what you think of using annotations in this manner. Also this PR is scoped to non-streaming for now but I want to make sure the direction will align with streaming as well.

@ryan-lempka ryan-lempka force-pushed the rlempka/log-stdout-store-true branch from 548e5d4 to 6bcc3f3 Compare September 30, 2025 19:47
copy-pr-bot bot commented Sep 30, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ryan-lempka ryan-lempka force-pushed the rlempka/log-stdout-store-true branch from 6bcc3f3 to 9cc0140 Compare September 30, 2025 19:49
Signed-off-by: Ryan Lempka <[email protected]>
@ryan-lempka ryan-lempka force-pushed the rlempka/log-stdout-store-true branch from 9cc0140 to 048d6ef Compare September 30, 2025 19:55
@ryan-lempka ryan-lempka enabled auto-merge (squash) September 30, 2025 19:56
@ayushag-nv

/ok to test 048d6ef

@ryan-lempka ryan-lempka enabled auto-merge (squash) September 30, 2025 21:22
@ryan-lempka ryan-lempka merged commit 56d20f5 into main Sep 30, 2025
25 of 26 checks passed
@ryan-lempka ryan-lempka deleted the rlempka/log-stdout-store-true branch September 30, 2025 21:48
ziqifan617 pushed a commit that referenced this pull request Oct 1, 2025
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025