-
Notifications
You must be signed in to change notification settings - Fork 750
fix: add better port logic (#2175) #2192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes modularize and refactor port allocation logic for Dynamo services by moving it from Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ArgsParser
participant Config
participant PortsModule
participant Etcd
User->>ArgsParser: Provide CLI args (including port min/max)
ArgsParser->>Config: Create Config with port_range
Config->>PortsModule: Request port allocation (KV/side channel)
PortsModule->>Etcd: Reserve port(s) in ETCD
PortsModule->>Config: Return allocated port(s)
Config->>User: Configuration ready with allocated ports
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
components/backends/vllm/src/dynamo/vllm/ports.py (1)
142-227: Excellent implementation of atomic port block allocation.The function implements a robust allocation strategy with proper validation, atomic operations, and retry logic. The use of
hold_portscontext manager ensures exclusive access during ETCD reservation, preventing race conditions.Consider making the retry sleep duration configurable or using exponential backoff instead of a fixed 0.01s delay to better handle high contention scenarios:
- if attempt < actual_max_attempts: - await asyncio.sleep(0.01) + if attempt < actual_max_attempts: + # Exponential backoff with jitter + await asyncio.sleep(min(0.01 * (2 ** (attempt - 1)) + random.uniform(0, 0.01), 1.0))components/backends/vllm/src/dynamo/vllm/args.py (1)
234-234: Address the TODO comment about KV port calculation.The comment indicates this is a workaround for vLLM's behavior. This should be tracked and fixed properly in vLLM to avoid confusion.
Would you like me to create an issue to track fixing this in vLLM so the workaround can be removed?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
components/backends/vllm/src/dynamo/vllm/args.py(5 hunks)components/backends/vllm/src/dynamo/vllm/ports.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
components/backends/vllm/src/dynamo/vllm/ports.py (9)
1-23: LGTM!The module setup with appropriate imports and well-defined constants for the default port range is clean and follows best practices.
25-41: Well-designed dataclass with proper validation.The
DynamoPortRangedataclass correctly validates that ports are within the registered ports range (1024-49151) and that min < max. The error messages are clear and informative.
43-54: Clean abstraction for ETCD operations.The
EtcdContextdataclass provides a clean abstraction for ETCD operations with a well-structured key format that includes namespace and node IP for proper isolation.
56-75: Well-structured metadata with useful debugging information.The
PortMetadatadataclass properly captures essential information about port reservations, including timestamp and PID for debugging. Good use offield(default_factory=dict)for the mutable default.
88-114: Robust context manager for exclusive port access.The
hold_portscontext manager correctly handles both single and multiple ports, usesSO_REUSEADDRfor port reuse, and ensures proper cleanup with the finally block.
116-124: Simple and effective port availability check.The function correctly uses a context manager to ensure socket cleanup and appropriately catches
OSErrorfor bind failures.
126-140: Clean ETCD reservation implementation.The function properly reserves ports in ETCD with JSON-encoded metadata and uses the primary lease ID for automatic cleanup on process termination.
229-261: Clean convenience wrapper for single port allocation.The function effectively reuses the block allocation logic for single ports, maintaining consistency and reducing code duplication.
263-291: Robust host IP detection with comprehensive error handling.The function properly handles various failure scenarios (hostname resolution, binding failures) with appropriate fallbacks and informative logging. The bindability test is a smart addition to ensure the resolved IP is actually usable.
components/backends/vllm/src/dynamo/vllm/args.py (4)
15-26: Clean import of port management utilities.The imports from the new
portsmodule are well-organized and all imported items are utilized in the refactored code.
44-44: Good addition of port range configuration.Adding the
port_rangeattribute to the Config class properly encapsulates the port range configuration.
75-86: Well-implemented command-line arguments for port range configuration.The new
--dynamo-port-minand--dynamo-port-maxarguments have sensible defaults, clear help text, and proper validation through theDynamoPortRangeconstructor.Also applies to: 133-135
146-202: Port allocation logic verified and approvedAll NIXL port calculations match the documented formula (base_port + dp_rank*tp_size + tp_rank), and the code correctly allocates a contiguous block, computes the base port, and guards against negative values. A search of the repo shows no other NIXL connector implementation that would conflict. No further changes needed.
Overview:
Cherry-pick #2175
Summary by CodeRabbit
New Features
Refactor