feat: Adding nixl read() multimodal support for vLLM backend #4271
Conversation
async def _read_decoded_image_via_nixl(
    self, decoded_meta: Dict[str, Any]
) -> PIL.Image.Image:
    """Read decoded image via NIXL RDMA and convert to PIL.Image."""
    # Lazy-init connector
    if self._connector is None:
        self._connector = connect.Connector()
        await self._connector.initialize()
        logger.info("NIXL connector initialized for decoded media")

    # Extract fields
    meta_str = decoded_meta["nixl_metadata"]
    desc = decoded_meta["nixl_descriptor"]
    shape = decoded_meta["shape"]

    # Create tensor to receive RDMA data
    tensor = torch.empty(shape, dtype=torch.uint8)

    # Build RdmaMetadata from frontend-provided descriptor
    # Frontend sends compressed metadata (matches Python nixl_connect)
    rdma_meta = RdmaMetadata(
        descriptors=[
            SerializedDescriptor(
                device="cpu"
                if desc.get("mem_type") == "Dram"
                else f"cuda:{desc.get('device_id', 0)}",
                ptr=desc["addr"],
                size=desc["size"],
            )
        ],
        nixl_metadata=meta_str,
        notification_key=f"img-{shape}",
        operation_kind=int(OperationKind.READ),
    )

    # RDMA read
    read_op = await self._connector.begin_read(
        rdma_meta, connect.Descriptor(tensor)
    )
    await read_op.wait_for_completion()
Not a NIXL expert, so please let me know if there is anything I could be doing better here.
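For completeness, here is a minimal sketch of the tensor-to-PIL conversion that the docstring promises but the diff view cuts off before showing. The HWC uint8 layout and the use of PIL.Image.fromarray are assumptions for illustration, not necessarily what the PR does.

import PIL.Image
import torch


def tensor_to_pil(tensor: torch.Tensor) -> PIL.Image.Image:
    """Convert the uint8 tensor filled by the RDMA read into a PIL image.

    Assumes the frontend decoded the image into HWC (height, width, channels)
    uint8 layout; adjust if the actual layout differs.
    """
    array = tensor.cpu().numpy()
    if array.ndim == 3 and array.shape[2] == 1:
        # Squeeze single-channel images so PIL treats them as grayscale.
        array = array.squeeze(2)
    return PIL.Image.fromarray(array)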
// Compress metadata before base64 encoding (matches Python nixl_connect behavior)
// Backend expects: b64:<base64_of_compressed_bytes>
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::new(6));
encoder.write_all(&nixl_md)?;
let compressed = encoder.finish()?;
Once again, I welcome any suggestions on correct NIXL usage.
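As a side note for reviewers of the wire format: the b64:<base64_of_compressed_bytes> convention described above round-trips with plain zlib plus base64, as in the sketch below. This is only an illustration of the format; in the handler shown earlier the compressed string is passed to RdmaMetadata untouched, so the actual inflation presumably happens inside nixl_connect.

import base64
import zlib


def encode_nixl_metadata(raw: bytes) -> str:
    """Mirror of the Rust snippet: zlib-compress (level 6), base64-encode, prefix with 'b64:'."""
    return "b64:" + base64.b64encode(zlib.compress(raw, level=6)).decode("ascii")


def decode_nixl_metadata(meta_str: str) -> bytes:
    """Inverse: strip the 'b64:' prefix, base64-decode, then zlib-inflate."""
    if not meta_str.startswith("b64:"):
        raise ValueError("expected NIXL metadata prefixed with 'b64:'")
    return zlib.decompress(base64.b64decode(meta_str[len("b64:"):]))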
Open Question for Testing: Ideally, we would like to cover both test cases:
Based on my conversation with @nv-tusharma, IIUC they suggested creating a separate workflow outside
# Build RdmaMetadata from frontend-provided descriptor
# Frontend sends compressed metadata (matches Python nixl_connect)
rdma_meta = RdmaMetadata(
Does this work? Have you tested it?
The "normal flow" is to create a passive operation (ReadableOp or WritableOp) and use its .metadata property to get the set of SerializedDescriptors, rather than composing them manually.
Given that this is an active operation (ReadOp), it should be taking in the metadata to perform the read, not sending the metadata.
Usually the metadata comes from the secondary connection, which in turn got it from its ReadableOperation.
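To make the suggested flow concrete, here is a rough sketch of the passive/active split the reviewers describe. The passive-side names (create_readable, .metadata) and the import path are assumptions taken from the comments above rather than verified nixl_connect API, so treat this as an illustration of the shape of the flow, not a drop-in implementation.

import torch

# Same nixl_connect module used in the diff above; the exact import path
# and the passive-side method names are assumptions, not verified API.
from dynamo import nixl_connect as connect  # import path assumed


def expose_decoded_image(connector, image_tensor: torch.Tensor):
    """Passive side (owner of the decoded pixels): publish the buffer as readable
    and hand back serialized metadata for the peer. `create_readable` and
    `.metadata` are placeholder names based on the reviewers' description."""
    readable_op = connector.create_readable(connect.Descriptor(image_tensor))
    return readable_op.metadata, readable_op


async def fetch_decoded_image(connector, serialized_metadata, shape):
    """Active side (vLLM backend): consume the metadata produced by the passive
    side instead of hand-building RdmaMetadata in the handler."""
    target = torch.empty(shape, dtype=torch.uint8)
    read_op = await connector.begin_read(serialized_metadata, connect.Descriptor(target))
    await read_op.wait_for_completion()
    return target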
# Extract fields
meta_str = decoded_meta["nixl_metadata"]
desc = decoded_meta["nixl_descriptor"]
What type is desc?
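From the handler code above, desc looks like a plain dict deserialized from the frontend's descriptor; the keys it is accessed with are collected below. This is inferred from the .get()/[] calls in the diff, not from an actual schema, and the values are purely illustrative.

# Shape of decoded_meta["nixl_descriptor"] as the handler consumes it
# (inferred from the accesses in the diff; not an authoritative schema).
desc = {
    "mem_type": "Dram",      # "Dram" maps to device "cpu"; anything else maps to cuda:<device_id>
    "device_id": 0,          # used only in the non-Dram (cuda) case
    "addr": 0x7F0000000000,  # remote buffer address (illustrative value)
    "size": 3 * 224 * 224,   # remote buffer size in bytes (illustrative value)
}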
# Frontend sends compressed metadata (matches Python nixl_connect)
rdma_meta = RdmaMetadata(
    descriptors=[
        SerializedDescriptor(
Why not pass desc to a Descriptor and then serialize that descriptor to get the metadata?
Overview:
With #3988, we have functional image decoding in the frontend for any b64 or HTTP URLs passed with the inference request. This PR builds on top of #3988 and implements the nixl read() portion of the image decoding workflow for the backend.
Details:
Look at handlers.py for the additions to the DECODED workflow.
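For readers who want a mental model before opening handlers.py, here is a purely hypothetical sketch of how the DECODED branch could hand off to the method shown earlier. Only _read_decoded_image_via_nixl itself appears in this diff; the branch condition, field names, and fallback helper below are invented for illustration.

from typing import Any, Dict

import PIL.Image


async def _load_multimodal_item(self, item: Dict[str, Any]) -> PIL.Image.Image:
    # Hypothetical dispatch: only the DECODED/NIXL branch is grounded in this PR.
    if item.get("kind") == "DECODED":
        # Frontend already decoded the image; pull the raw pixels over NIXL RDMA.
        return await self._read_decoded_image_via_nixl(item["decoded_meta"])
    # Otherwise fall back to the pre-existing (#3988) url/b64 decoding path
    # (helper name is a placeholder).
    return await self._fetch_and_decode(item)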