fix: start separate runtime for span exporter #3781

mohammedabdulwahhab · 2025-10-21T18:24:54Z

As part of the introduction of OTEL, I relocated the logging::init() to happen after the tokio runtime is available (Location). The reason for this is that the export of spans happens async in the event loop. This however caused a regression where logs emitted initially are dropped since the logging stack is initialized further down.

This change reversts the init of the logging stack and creates a dedicated thread to manage the exports.

Summary by CodeRabbit

Chores
- Logging is now initialized eagerly when the module is imported rather than during runtime construction.
- Logging setup now ensures it runs within an appropriate runtime context when OTLP/exporting is enabled, improving compatibility in distributed scenarios.
- No public APIs or exported signatures were changed.

copy-pr-bot · 2025-10-21T18:24:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

lib/bindings/python/rust/lib.rs

Signed-off-by: mohammedabdulwahhab <[email protected]>

coderabbitai · 2025-10-21T19:30:57Z

Walkthrough

Logging initialization was moved to Python module import time (_core), and the logging module gained a static OnceCell-backed Tokio runtime to initialize logging when JSONL logging with OTLP exporter requires an async runtime.

Changes

Cohort / File(s)	Summary
Python binding init `lib/bindings/python/rust/lib.rs`	`rs::logging::init()` is invoked unconditionally during the Python `_core` module initialization; the earlier call from `DistributedRuntime::new` was removed.
Logging runtime management `lib/runtime/src/logging.rs`	Added `use once_cell::sync::OnceCell;` and `static LOGGING_RT: OnceCell<tokio::runtime::Runtime>`; refactored `init()` to detect when a runtime is needed (JSONL + OTLP) and to use an existing runtime or create/enter the OnceCell runtime before calling `setup_logging()`, otherwise initialize directly.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped in code at import's chime,

Logs awake at module time.
OnceCell cradles runtime tight,
OTLP hums through the night,
A rabbit cheers — logging's right!

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	No pull request description was provided by the author. The repository's description template requires sections for Overview, Details, Where should the reviewer start, and Related Issues, but the PR submission is entirely empty. This represents a completely missing description that does not follow the expected structure or provide any context for reviewers about the changes, their motivation, implementation details, or related issues.	The author should add a complete pull request description following the repository template. The description should include an overview explaining the issue being fixed, detailed information about the separate runtime mechanism introduced for span exporting, guidance on which files reviewers should focus on (particularly lib/runtime/src/logging.rs and lib/bindings/python/rust/lib.rs), and any related GitHub issues using action keywords like "Fixes" or "Closes".

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "fix: start separate runtime for span exporter" directly aligns with the primary change in the codebase. The main modification is the introduction of a dedicated Tokio runtime (LOGGING_RT) in lib/runtime/src/logging.rs to handle logging when OTLP exporting is enabled, which the title accurately captures. The title is concise, clear, and specific without vague language or noise, making it easy for reviewers scanning the history to understand the core purpose of the changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

lib/runtime/src/logging.rs (2)

118-120: Document that the logging runtime persists for the process lifetime.

The comment mentions the runtime is "only created if needed" but doesn't clarify that it persists until process exit and is never explicitly shut down. This is important for understanding the lifecycle, especially since OTLP batch exporters may not flush remaining spans on abrupt termination.

Consider updating the comment:

-/// Static runtime for OTLP exports and other async logging tasks
-/// Only created if needed (when OTLP export is enabled and no runtime exists)
+/// Static runtime for OTLP exports and other async logging tasks.
+/// Only created if needed (when OTLP export is enabled and no runtime exists).
+/// This runtime persists for the process lifetime and is never explicitly shut down.
+/// OTLP batch exporter background tasks continue until process exit.
 static LOGGING_RT: OnceCell<tokio::runtime::Runtime> = OnceCell::new();

717-756: The runtime initialization logic is correct and addresses the tracing regression.

The three-branch logic properly handles:

Using an existing runtime when available
Creating a dedicated runtime when needed for OTLP exports
No runtime when only basic logging is required

The batch span processor spawns background tasks onto LOGGING_RT while the runtime is entered (line 742's _guard), and those tasks continue running on the worker threads after the context is exited.

Optional: Consider refactoring duplicated error handling.

The error handling is identical in three places (lines 726-729, 743-746, 750-753). While not critical, this could be deduplicated:

 pub fn init() {
     INIT.call_once(|| {
         // Check if we need a runtime (only for OTLP exports)
         let needs_runtime = jsonl_logging_enabled() && otlp_exporter_enabled();
+        
+        let result = if needs_runtime {
-        if needs_runtime {
             // Ensure we're in a tokio runtime context for OTLP exporter initialization
             if tokio::runtime::Handle::try_current().is_ok() {
                 // Already in a runtime, use it
-                if let Err(e) = setup_logging() {
-                    eprintln!("Failed to initialize logging: {}", e);
-                    std::process::exit(1);
-                }
+                setup_logging()
             } else {
                 // No runtime available, create a dedicated one for logging
                 let rt = LOGGING_RT.get_or_init(|| {
                     tokio::runtime::Builder::new_multi_thread()
                         .worker_threads(1)
                         .thread_name("dynamo-logging")
                         .enable_all()
                         .build()
                         .expect("Failed to create logging runtime")
                 });

                 // Enter the runtime context and initialize logging
                 let _guard = rt.enter();
-                if let Err(e) = setup_logging() {
-                    eprintln!("Failed to initialize logging: {}", e);
-                    std::process::exit(1);
-                }
+                setup_logging()
             }
         } else {
             // No runtime needed - just basic logging
-            if let Err(e) = setup_logging() {
-                eprintln!("Failed to initialize logging: {}", e);
-                std::process::exit(1);
-            }
+            setup_logging()
+        };
+
+        if let Err(e) = result {
+            eprintln!("Failed to initialize logging: {}", e);
+            std::process::exit(1);
         }
     });
 }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 237e978 and 9a7d9f0.

📒 Files selected for processing (2)

lib/bindings/python/rust/lib.rs (1 hunks)
lib/runtime/src/logging.rs (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

lib/runtime/src/logging.rs (1)

lib/runtime/src/config.rs (1)

jsonl_logging_enabled (426-428)

lib/bindings/python/rust/lib.rs (2)

lib/runtime/src/logging.rs (1)

init (717-756)

lib/runtime/src/storage/key_value_store.rs (1)

init (288-290)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)

GitHub Check: clippy (lib/runtime/examples)
GitHub Check: clippy (.)
GitHub Check: clippy (launch/dynamo-run)
GitHub Check: tests (.)
GitHub Check: tests (lib/bindings/python)
GitHub Check: clippy (lib/bindings/python)

🔇 Additional comments (2)

lib/bindings/python/rust/lib.rs (1)

121-122: LGTM! Early logging initialization correctly isolates the OTLP exporter.

The eager initialization at module import time ensures logging is available before any application runtime is created. When OTLP exports are enabled, this will create a dedicated runtime (LOGGING_RT) separate from the DistributedRuntime, which should resolve the tracing regression.

lib/runtime/src/logging.rs (1)

712-716: Excellent documentation of the safe-call semantics.

The doc comment clearly explains how the function handles different runtime contexts, which is important given that logging can be initialized from multiple entry points (Python bindings, key_value_store, etc.).

keivenchang

As long as you've tested this and all the CIs pass, LGTM.

keivenchang

It's been decided to revert and punt on this PR.

github-actions · 2025-11-25T09:37:04Z

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

pull-request-size bot added the size/S label Oct 21, 2025

github-actions bot added the fix label Oct 21, 2025

nnshah1 reviewed Oct 21, 2025

View reviewed changes

lib/bindings/python/rust/lib.rs Outdated Show resolved Hide resolved

pull-request-size bot added size/M and removed size/S labels Oct 21, 2025

mohammedabdulwahhab mentioned this pull request Oct 21, 2025

fix: fix OTEL regression #3777

Closed

mohammedabdulwahhab marked this pull request as ready for review October 21, 2025 19:24

mohammedabdulwahhab requested a review from a team as a code owner October 21, 2025 19:24

fix: init runtime for logging

03b236f

Signed-off-by: mohammedabdulwahhab <[email protected]>

mohammedabdulwahhab force-pushed the mabdulwahhab/tracing-regression-fix branch from d741e95 to 03b236f Compare October 21, 2025 19:30

Merge branch 'main' into mabdulwahhab/tracing-regression-fix

9a7d9f0

copy-pr-bot bot temporarily deployed to GITLAB October 21, 2025 19:32 Inactive

coderabbitai bot reviewed Oct 21, 2025

View reviewed changes

copy-pr-bot bot temporarily deployed to GITLAB October 21, 2025 19:58 Inactive

keivenchang approved these changes Oct 24, 2025

View reviewed changes

keivenchang requested changes Oct 24, 2025

View reviewed changes

github-actions bot added the Stale label Nov 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: start separate runtime for span exporter #3781

fix: start separate runtime for span exporter #3781

Uh oh!

mohammedabdulwahhab commented Oct 21, 2025 •

edited

Loading

Uh oh!

copy-pr-bot bot commented Oct 21, 2025

Uh oh!

Uh oh!

coderabbitai bot commented Oct 21, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

keivenchang left a comment

Uh oh!

keivenchang left a comment

Uh oh!

github-actions bot commented Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fix: start separate runtime for span exporter #3781

Are you sure you want to change the base?

fix: start separate runtime for span exporter #3781

Uh oh!

Conversation

mohammedabdulwahhab commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Oct 21, 2025

Uh oh!

Uh oh!

coderabbitai bot commented Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

keivenchang left a comment

Choose a reason for hiding this comment

Uh oh!

keivenchang left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mohammedabdulwahhab commented Oct 21, 2025 •

edited

Loading

coderabbitai bot commented Oct 21, 2025 •

edited

Loading