Skip to content

Conversation

@mohammedabdulwahhab
Copy link
Contributor

@mohammedabdulwahhab mohammedabdulwahhab commented Oct 21, 2025

As part of the introduction of OTEL, I relocated the logging::init() to happen after the tokio runtime is available (Location). The reason for this is that the export of spans happens async in the event loop. This however caused a regression where logs emitted initially are dropped since the logging stack is initialized further down.

This change reversts the init of the logging stack and creates a dedicated thread to manage the exports.

Summary by CodeRabbit

  • Chores
    • Logging is now initialized eagerly when the module is imported rather than during runtime construction.
    • Logging setup now ensures it runs within an appropriate runtime context when OTLP/exporting is enabled, improving compatibility in distributed scenarios.
    • No public APIs or exported signatures were changed.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Oct 21, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Oct 21, 2025
@mohammedabdulwahhab mohammedabdulwahhab marked this pull request as ready for review October 21, 2025 19:24
@mohammedabdulwahhab mohammedabdulwahhab requested a review from a team as a code owner October 21, 2025 19:24
Signed-off-by: mohammedabdulwahhab <[email protected]>
@mohammedabdulwahhab mohammedabdulwahhab force-pushed the mabdulwahhab/tracing-regression-fix branch from d741e95 to 03b236f Compare October 21, 2025 19:30
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 21, 2025

Walkthrough

Logging initialization was moved to Python module import time (_core), and the logging module gained a static OnceCell-backed Tokio runtime to initialize logging when JSONL logging with OTLP exporter requires an async runtime.

Changes

Cohort / File(s) Summary
Python binding init
lib/bindings/python/rust/lib.rs
rs::logging::init() is invoked unconditionally during the Python _core module initialization; the earlier call from DistributedRuntime::new was removed.
Logging runtime management
lib/runtime/src/logging.rs
Added use once_cell::sync::OnceCell; and static LOGGING_RT: OnceCell<tokio::runtime::Runtime>; refactored init() to detect when a runtime is needed (JSONL + OTLP) and to use an existing runtime or create/enter the OnceCell runtime before calling setup_logging(), otherwise initialize directly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped in code at import's chime,

Logs awake at module time.
OnceCell cradles runtime tight,
OTLP hums through the night,
A rabbit cheers — logging's right!

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning No pull request description was provided by the author. The repository's description template requires sections for Overview, Details, Where should the reviewer start, and Related Issues, but the PR submission is entirely empty. This represents a completely missing description that does not follow the expected structure or provide any context for reviewers about the changes, their motivation, implementation details, or related issues. The author should add a complete pull request description following the repository template. The description should include an overview explaining the issue being fixed, detailed information about the separate runtime mechanism introduced for span exporting, guidance on which files reviewers should focus on (particularly lib/runtime/src/logging.rs and lib/bindings/python/rust/lib.rs), and any related GitHub issues using action keywords like "Fixes" or "Closes".
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "fix: start separate runtime for span exporter" directly aligns with the primary change in the codebase. The main modification is the introduction of a dedicated Tokio runtime (LOGGING_RT) in lib/runtime/src/logging.rs to handle logging when OTLP exporting is enabled, which the title accurately captures. The title is concise, clear, and specific without vague language or noise, making it easy for reviewers scanning the history to understand the core purpose of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
lib/runtime/src/logging.rs (2)

118-120: Document that the logging runtime persists for the process lifetime.

The comment mentions the runtime is "only created if needed" but doesn't clarify that it persists until process exit and is never explicitly shut down. This is important for understanding the lifecycle, especially since OTLP batch exporters may not flush remaining spans on abrupt termination.

Consider updating the comment:

-/// Static runtime for OTLP exports and other async logging tasks
-/// Only created if needed (when OTLP export is enabled and no runtime exists)
+/// Static runtime for OTLP exports and other async logging tasks.
+/// Only created if needed (when OTLP export is enabled and no runtime exists).
+/// This runtime persists for the process lifetime and is never explicitly shut down.
+/// OTLP batch exporter background tasks continue until process exit.
 static LOGGING_RT: OnceCell<tokio::runtime::Runtime> = OnceCell::new();

717-756: The runtime initialization logic is correct and addresses the tracing regression.

The three-branch logic properly handles:

  1. Using an existing runtime when available
  2. Creating a dedicated runtime when needed for OTLP exports
  3. No runtime when only basic logging is required

The batch span processor spawns background tasks onto LOGGING_RT while the runtime is entered (line 742's _guard), and those tasks continue running on the worker threads after the context is exited.

Optional: Consider refactoring duplicated error handling.

The error handling is identical in three places (lines 726-729, 743-746, 750-753). While not critical, this could be deduplicated:

 pub fn init() {
     INIT.call_once(|| {
         // Check if we need a runtime (only for OTLP exports)
         let needs_runtime = jsonl_logging_enabled() && otlp_exporter_enabled();
+        
+        let result = if needs_runtime {
-        if needs_runtime {
             // Ensure we're in a tokio runtime context for OTLP exporter initialization
             if tokio::runtime::Handle::try_current().is_ok() {
                 // Already in a runtime, use it
-                if let Err(e) = setup_logging() {
-                    eprintln!("Failed to initialize logging: {}", e);
-                    std::process::exit(1);
-                }
+                setup_logging()
             } else {
                 // No runtime available, create a dedicated one for logging
                 let rt = LOGGING_RT.get_or_init(|| {
                     tokio::runtime::Builder::new_multi_thread()
                         .worker_threads(1)
                         .thread_name("dynamo-logging")
                         .enable_all()
                         .build()
                         .expect("Failed to create logging runtime")
                 });

                 // Enter the runtime context and initialize logging
                 let _guard = rt.enter();
-                if let Err(e) = setup_logging() {
-                    eprintln!("Failed to initialize logging: {}", e);
-                    std::process::exit(1);
-                }
+                setup_logging()
             }
         } else {
             // No runtime needed - just basic logging
-            if let Err(e) = setup_logging() {
-                eprintln!("Failed to initialize logging: {}", e);
-                std::process::exit(1);
-            }
+            setup_logging()
+        };
+
+        if let Err(e) = result {
+            eprintln!("Failed to initialize logging: {}", e);
+            std::process::exit(1);
         }
     });
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 237e978 and 9a7d9f0.

📒 Files selected for processing (2)
  • lib/bindings/python/rust/lib.rs (1 hunks)
  • lib/runtime/src/logging.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/runtime/src/logging.rs (1)
lib/runtime/src/config.rs (1)
  • jsonl_logging_enabled (426-428)
lib/bindings/python/rust/lib.rs (2)
lib/runtime/src/logging.rs (1)
  • init (717-756)
lib/runtime/src/storage/key_value_store.rs (1)
  • init (288-290)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: clippy (lib/runtime/examples)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: clippy (lib/bindings/python)
🔇 Additional comments (2)
lib/bindings/python/rust/lib.rs (1)

121-122: LGTM! Early logging initialization correctly isolates the OTLP exporter.

The eager initialization at module import time ensures logging is available before any application runtime is created. When OTLP exports are enabled, this will create a dedicated runtime (LOGGING_RT) separate from the DistributedRuntime, which should resolve the tracing regression.

lib/runtime/src/logging.rs (1)

712-716: Excellent documentation of the safe-call semantics.

The doc comment clearly explains how the function handles different runtime contexts, which is important given that logging can be initialized from multiple entry points (Python bindings, key_value_store, etc.).

Copy link
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you've tested this and all the CIs pass, LGTM.

Copy link
Contributor

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been decided to revert and punt on this PR.

@github-actions
Copy link

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants