Conversation

@michaelfeil (Contributor) commented Sep 18, 2025

Signed-off-by: michaelfeil [email protected]

Overview:

  • code quality not good, pls fix.
  • I hope the threading async-loop tick will work here.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Tests

    • Stabilized cancellation test environment with a dedicated server thread and deterministic shutdown, reducing flakiness.
    • Introduced a session-scoped runtime to speed up runs by reusing resources across tests.
    • Removed a subprocess-based wrapper test, streamlining execution and diagnostics.
    • Marked cancellation tests as pre-merge for clearer gating in CI.
  • Refactor

    • Simplified test orchestration and teardown for clearer lifecycle management.
  • Chores

    • Minor imports and setup adjustments to support the new testing approach.

Signed-off-by: michaelfeil <[email protected]>
@michaelfeil michaelfeil requested review from a team as code owners September 18, 2025 02:44
copy-pr-bot bot commented Sep 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions bot

👋 Hi michaelfeil! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: the NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added fix external-contribution Pull request is from an external contributor labels Sep 18, 2025
coderabbitai bot (Contributor) commented Sep 18, 2025

Walkthrough

Reworks cancellation tests’ server lifecycle to run in a dedicated thread coordinated by a threading.Event, introduces a session-scoped runtime fixture, adjusts the server fixture to yield a thread and handler, removes the subprocess wrapper test module, and adds a pre_merge pytest marker to individual test modules.

Changes

Cohort / File(s) / Summary

  • Server lifecycle and fixtures — lib/bindings/python/tests/test_cancellation/conftest.py: Server now runs in a separate thread via asyncio.run(init_server(...)) with a threading.Event for shutdown; the server fixture yields (thread, handler) and handles stop/join; the runtime fixture is session-scoped; imports updated.
  • Removed subprocess wrapper — lib/bindings/python/tests/test_cancellation/test_cancellation.py: Deleted the module that previously executed specific tests via subprocess-based pytest invocations.
  • Apply pre_merge marker to tests — lib/bindings/python/tests/test_cancellation/test_client_loop_break.py, .../test_server_context_cancel.py, .../test_server_raise_cancelled.py: Added module-level pytestmark = pytest.mark.pre_merge; no test logic changes.
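The thread-plus-Event server lifecycle summarized above can be sketched as a minimal, self-contained approximation. Note this is an illustration of the pattern, not code from the repository: `server_main`, `results`, and the `asyncio.sleep(30)` stand-in for `endpoint.serve_endpoint(...)` are all hypothetical names.

```python
import asyncio
import contextlib
import threading

results = []

async def server_main(stop_event: threading.Event) -> None:
    # Stand-in for the real endpoint.serve_endpoint(...) call: run until
    # the stop event is set, then cancel the serving task.
    serve_task = asyncio.create_task(asyncio.sleep(30))  # long-running "serve"
    stop_task = asyncio.create_task(asyncio.to_thread(stop_event.wait))
    done, _ = await asyncio.wait(
        {serve_task, stop_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if stop_task in done and not serve_task.done():
        serve_task.cancel()  # stop signal won the race: cancel the server
        with contextlib.suppress(asyncio.CancelledError):
            await serve_task
    results.append("shutdown")

stop_event = threading.Event()
thread = threading.Thread(
    target=asyncio.run, args=(server_main(stop_event),), daemon=True
)
thread.start()          # fixture setup: the server loop runs in its own thread
stop_event.set()        # fixture teardown: signal shutdown...
thread.join(timeout=5)  # ...and wait for the thread to exit
```

Driving `asyncio.run` as the thread target keeps the event loop fully owned by that thread, while `asyncio.to_thread(stop_event.wait)` bridges the blocking `threading.Event` into the async world so it can race against the serve task.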

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor PyTest as PyTest Runner
    participant CF as conftest.py
    participant RT as runtime (session-scoped)
    participant SF as server fixture
    participant TH as Server Thread
    participant EV as stop_event

    PyTest->>CF: request runtime
    CF-->>PyTest: provide RT (session-scoped)

    PyTest->>SF: request server(namespace)
    SF->>TH: start thread asyncio.run(init_server(RT, ns, EV))
    TH->>TH: init backend service and endpoint
    TH->>TH: start serving (async task)
    SF-->>PyTest: yield (thread, handler)

    rect rgba(230,240,255,0.6)
    note over PyTest,TH: Tests execute and interact with handler/server
    end

    PyTest->>SF: teardown server fixture
    SF->>EV: set() (signal shutdown)
    TH->>TH: await serve task stop or timeout
    TH-->>SF: exit thread
    SF->>SF: join thread with timeout
    SF-->>PyTest: teardown complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

A thread hums softly, servers spin,
An Event waits to tuck them in.
The runtime lasts from dusk to dawn,
Subprocess coach is now withdrawn.
Pre-merge flags on tests now gleam—
Thump-thump! says rabbit, “stream to dream.” 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
  • Description Check — ⚠️ Warning. Explanation: The PR description does not follow the repository template and is largely incomplete: the Overview is a one-line "code quality not good, pls fix.", the Details and "Where should the reviewer start?" sections are empty, and the Related Issues entry is a placeholder, so reviewers cannot determine what changed, why, or where to focus. Resolution: update the PR description to follow the template: provide a clear Overview stating the motivation and high-level changes, fill Details with the specific code changes and behavioral impact (for example, the session-scoped runtime fixture, the threading-based test server shutdown, and the removal of the subprocess test wrapper), add "Where should the reviewer start?" with exact file paths and key functions to inspect, include testing steps and expected results, and replace the placeholder issue number with the actual issue or remove it.
  • Docstring Coverage — ⚠️ Warning. Explanation: Docstring coverage is 62.50%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (1 passed)
  • Title Check — ✅ Passed. Explanation: The title "fix: unit tests broken, better cancellation tests" concisely captures the primary intent of the changes (repairing broken unit tests and improving cancellation-related Python tests) and aligns with the modifications to conftest and the test modules shown in the diff.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).


@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/bindings/python/tests/test_cancellation/conftest.py (1)

151-174: Fix task lifecycle: avoid pending-task leaks, await cancellations, and handle timeout deterministically.

Current flow can leave event_done_task pending, not await a cancelled serve_task, and silently proceed on timeout. Tighten it and surface timeouts.

-        # Serve the endpoint - this will block until shutdown
-        serve_task = asyncio.create_task(endpoint.serve_endpoint(handler.generate))
-        event_done_task = asyncio.create_task(asyncio.to_thread(stop_event.wait))
-
-        done, pending = await asyncio.wait(
-            {serve_task, event_done_task},
-            return_when=asyncio.FIRST_COMPLETED,
-            timeout=30.0,
-        )
-        if serve_task in done:
-            print("Server task completed")
-            await serve_task  # Propagate exceptions if any
-        else:
-            serve_task.cancel()
+        # Serve the endpoint - this will block until shutdown
+        serve_task = asyncio.create_task(
+            endpoint.serve_endpoint(handler.generate), name=f"serve:{namespace}"
+        )
+        stop_wait_task = asyncio.create_task(
+            asyncio.to_thread(stop_event.wait), name=f"stopwait:{namespace}"
+        )
+
+        try:
+            done, _ = await asyncio.wait(
+                {serve_task, stop_wait_task},
+                return_when=asyncio.FIRST_COMPLETED,
+                timeout=30.0,
+            )
+            if not done:
+                raise TimeoutError("server did not start/stop within 30s")
+            # Stop signal: cancel server if still running
+            if stop_wait_task in done and not serve_task.done():
+                serve_task.cancel()
+            # Propagate server outcome (including CancelledError)
+            if serve_task in done:
+                await serve_task
+        finally:
+            for t in (serve_task, stop_wait_task):
+                if not t.done():
+                    t.cancel()
+                with suppress(asyncio.CancelledError):
+                    await t
🧹 Nitpick comments (9)
lib/bindings/python/tests/test_cancellation/conftest.py (3)

176-179: Make the server thread daemon and name it for easier debugging.

Prevents process hangs on interpreter shutdown and improves observability.

-    thread = threading.Thread(
-        target=asyncio.run, args=(init_server(runtime, namespace, stop_event),)
-    )
+    thread = threading.Thread(
+        target=asyncio.run,
+        args=(init_server(runtime, namespace, stop_event),),
+        name=f"server:{namespace}",
+        daemon=True,
+    )

181-183: Drop fixed startup sleep; rely on client.wait_for_instances().

The client fixture already awaits readiness; the sleep just adds nondeterminism and test latency.

-    # Give server time to start up
-    await asyncio.sleep(0.5)
+    # Client fixture handles readiness via wait_for_instances()

184-186: Fail fast if the server thread doesn’t stop.

Surface teardown failures instead of silently continuing with a live thread.

     yield thread, handler
     stop_event.set()
-    await asyncio.to_thread(thread.join, 5)
+    await asyncio.to_thread(thread.join, 5)
+    if thread.is_alive():
+        pytest.fail("Server thread failed to shut down within 5s")
lib/bindings/python/tests/test_cancellation/test_server_raise_cancelled.py (2)

19-20: Register the custom marker to avoid PytestUnknownMarkWarning.

Add pre_merge to pytest markers in pyproject/pytest.ini.

Example (pyproject.toml):

[tool.pytest.ini_options]
markers = [
  "pre_merge: run on pre-merge pipelines",
]

35-41: Assertion on exact error string is brittle; assert on intent.

Match on key substrings instead of the full message with trailing space.

-        assert (
-            str(e)
-            == "a python exception was caught while processing the async generator: CancelledError: "
-        )
+        msg = str(e)
+        assert "python exception was caught" in msg
+        assert "CancelledError" in msg
lib/bindings/python/tests/test_cancellation/test_server_context_cancel.py (2)

19-20: Register the custom marker to avoid PytestUnknownMarkWarning.

Same as other modules adding pre_merge.


35-39: Loosen exact-match assertion to reduce flakiness across backends.

Error wording may vary slightly; assert on the salient part.

-        assert str(e) == "Stream ended before generation completed"
+        assert "Stream ended before generation completed" in str(e)
lib/bindings/python/tests/test_cancellation/test_client_loop_break.py (2)

20-21: Register the custom marker to avoid PytestUnknownMarkWarning.

Same marker advice as siblings.


36-45: Consider explicitly closing the stream after breaking the loop.

If supported by the client API, closing ensures cancellation propagates immediately instead of relying on GC.

Possible pattern:

# after break
aclose = getattr(stream, "aclose", None)
if aclose is not None:
    await aclose()

(A one-liner like await getattr(stream, "aclose", lambda: None)() would raise TypeError when aclose is absent, since None is not awaitable; the guarded form avoids that.)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd3326 and f6e7bd1.

📒 Files selected for processing (5)
  • lib/bindings/python/tests/test_cancellation/conftest.py (3 hunks)
  • lib/bindings/python/tests/test_cancellation/test_cancellation.py (0 hunks)
  • lib/bindings/python/tests/test_cancellation/test_client_loop_break.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_server_context_cancel.py (1 hunks)
  • lib/bindings/python/tests/test_cancellation/test_server_raise_cancelled.py (1 hunks)
💤 Files with no reviewable changes (1)
  • lib/bindings/python/tests/test_cancellation/test_cancellation.py
🧰 Additional context used
🧬 Code graph analysis (1)
lib/bindings/python/tests/test_cancellation/conftest.py (3)
lib/bindings/python/rust/lib.rs (6)
  • namespace (362-367)
  • endpoint (549-555)
  • serve_endpoint (569-595)
  • generate (753-765)
  • cancel (534-536)
  • shutdown (376-378)
lib/bindings/python/rust/engine.rs (2)
  • generate (95-97)
  • generate (150-283)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (31-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo

Comment on lines +189 to +196
@pytest.fixture(scope="session")
async def runtime():
    """Create a DistributedRuntime for testing"""
    # TODO(michaelfeil): consider re-using runtime across tests to not have to launch tests via subprocess.
    loop = asyncio.get_running_loop()
    runtime = DistributedRuntime(loop, True)
    yield runtime
    runtime.shutdown()

⚠️ Potential issue

🧩 Analysis chain

Session-scoped runtime bound to the first test’s event loop can break later tests.

pytest-asyncio often creates a new loop per test by default. Capturing asyncio.get_running_loop() in a session fixture risks “attached to a different loop” issues. Bind the runtime to a session-scoped loop explicitly and inject it.

Proposed change in this file:

-@pytest.fixture(scope="session")
-async def runtime():
+@pytest.fixture(scope="session")
+async def runtime(event_loop):
     """Create a DistributedRuntime for testing"""
-    # TODO(michaelfeil): consider re-using runtime across tests to not have to launch tests via subprocess.
-    loop = asyncio.get_running_loop()
-    runtime = DistributedRuntime(loop, True)
+    runtime = DistributedRuntime(event_loop, True)
     yield runtime
     runtime.shutdown()

Add a session-scoped event loop fixture (can live in the same conftest or a top-level one):

@pytest.fixture(scope="session")
def event_loop():
    loop = asyncio.new_event_loop()
    try:
        yield loop
    finally:
        loop.close()

Run this to verify pytest-asyncio config and marker registration:


🏁 Script executed:

#!/bin/bash
# Check pytest-asyncio version and mode hints
rg -n --glob '{pyproject.toml,pytest.ini,setup.cfg}' -e 'pytest-asyncio|asyncio_mode|markers|pre_merge' -C2 || true
# Detect existing event_loop fixtures
rg -nP 'def\s+event_loop\s*\(' -g '!**/.venv/**' -C2 || true

Length of output: 1004


Don't bind a session-scoped DistributedRuntime to asyncio.get_running_loop()

pyproject.toml sets asyncio_mode = "auto" and there is no custom event_loop fixture; the current session fixture captures the first running loop and can cause "attached to a different loop"/ScopeMismatch failures. Location: lib/bindings/python/tests/test_cancellation/conftest.py (lines 189–196).

Fix (choose one):

  • Recommended: create and manage a dedicated session loop inside the fixture (loop = asyncio.new_event_loop(); pass that loop to DistributedRuntime; close it on teardown).
  • Alternative: override pytest-asyncio's event_loop as session-scoped and inject it into the runtime fixture (changes global per-test loop isolation).
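The recommended option can be sketched as a plain generator; in real code it would be decorated with @pytest.fixture(scope="session"), and DistributedRuntime here is a dummy stand-in (not dynamo's real binding) so the sketch is self-contained:

```python
import asyncio

class DistributedRuntime:
    """Dummy stand-in for dynamo's runtime binding, for illustration only."""
    def __init__(self, loop, detached):
        self.loop = loop
        self.closed = False
    def shutdown(self):
        self.closed = True

def runtime():
    # In real code: decorated with @pytest.fixture(scope="session").
    loop = asyncio.new_event_loop()      # dedicated session-lifetime loop
    rt = DistributedRuntime(loop, True)  # bound to that loop, not a test's loop
    try:
        yield rt
    finally:
        rt.shutdown()
        loop.close()                     # torn down once per session

# Drive the generator the way pytest would: setup, use, teardown.
gen = runtime()
rt = next(gen)   # setup: runtime now owns its own loop
gen.close()      # teardown: runs the finally block
```

Because the fixture creates and closes its own loop, it no longer captures whichever per-test loop pytest-asyncio happened to be running when the first test requested it, avoiding "attached to a different loop" failures.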

@kthui (Contributor) commented Sep 18, 2025

I made some enhancements on top of this PR's commit: #3127

This should fix the runtime already initialized issue and clean up the subprocess launch code.


Labels

external-contribution (Pull request is from an external contributor), fix, size/L