
Conversation

apbose (Collaborator) commented Sep 22, 2025

TRT-LLM installation tool for the distributed (multi-GPU) case

  1. The download should be performed by a single GPU (one rank) to avoid redundant downloads.
  2. A barrier is used in the tool so that the remaining ranks wait until the download completes (see the sketch after this list).
  3. The util functions for TRT-LLM installation are moved to dynamo/distributed/utils.py; initialization and cleanup of the environment are left to the user side (we do this in tests/py/dynamo/distributed and examples/distributed_inference/tensor_parallel_initialize_dist).
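
A minimal sketch of the rank-0-download-plus-barrier pattern described above, assuming torch.distributed is initialized and cleaned up on the user side as the PR describes. The helper name ensure_trtllm_rank0 and the download_fn callable are hypothetical illustrations, not the PR's actual API:

import torch.distributed as dist

def ensure_trtllm_rank0(download_fn) -> None:
    # Only one rank (rank 0) performs the download; the other ranks block
    # at the barrier below until the shared installation is in place.
    if not dist.is_initialized() or dist.get_rank() == 0:
        download_fn()
    if dist.is_initialized():
        dist.barrier()

# User-side initialization/cleanup, mirroring what the PR does in
# examples/distributed_inference/tensor_parallel_initialize_dist:
#   dist.init_process_group(backend="nccl")
#   ensure_trtllm_rank0(my_download_fn)
#   dist.destroy_process_group()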

github-actions bot added labels: component: tests, component: conversion, component: api [Python], component: dynamo (Sep 22, 2025)
meta-cla bot added the cla signed label (Sep 22, 2025)
github-actions bot requested a review from peri044 (September 22, 2025 06:34)
github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/utils.py	2025-09-22 06:35:28.523784+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/utils.py	2025-09-22 06:36:00.657186+00:00
@@ -863,6 +863,6 @@
    return False


def is_thor() -> bool:
    if torch.cuda.get_device_capability() in [(11, 0)]:
-        return True
\ No newline at end of file
+        return True
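
For context, the lint complaint here is only the missing newline at end of file. A minimal reconstruction of the function after the fix, assuming torch is already imported in utils.py; the explicit return False fallback is an assumption for completeness and is not shown in the hunk:

import torch

def is_thor() -> bool:
    # Thor reports compute capability 11.0; compare against that tuple.
    if torch.cuda.get_device_capability() in [(11, 0)]:
        return True
    return False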

apbose force-pushed the abose/trt_llm_installation_dist branch from 3f1fa7e to 54948d9 (September 25, 2025 19:33)
github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/distributed/utils.py	2025-09-25 19:33:28.176615+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/distributed/utils.py	2025-09-25 19:34:02.325958+00:00
@@ -100,11 +100,10 @@
        return True

    except Exception as e:
        logger.warning(f"Failed to detect CUDA version: {e}")
        return False
-

    return True


def _cache_root() -> Path:
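
The hunk above trims a stray blank line inside a CUDA-version probe; note that the trailing return True after the except block is unreachable as written, since both the try and except paths already return. A self-contained sketch of the same try/except pattern, where the function name and the use of torch.version.cuda are assumptions rather than the PR's actual code:

import logging

import torch

logger = logging.getLogger(__name__)

def cuda_version_detected() -> bool:
    # Hypothetical probe mirroring the diff: success -> True, any failure -> False.
    try:
        if torch.version.cuda is None:  # None on CPU-only builds
            return False
        return True
    except Exception as e:
        logger.warning(f"Failed to detect CUDA version: {e}")
        return False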

apbose force-pushed the abose/trt_llm_installation_dist branch 3 times, most recently from 2bbc423 to 5beefc0 (September 25, 2025 22:13)
apbose changed the title from "Changes to TRT-LLM download tool for multigpu distributed case" to "Changes to TRT-LLM download tool for multigpu distributed case [WIP]" (Sep 25, 2025)
apbose force-pushed the abose/trt_llm_installation_dist branch from 5beefc0 to 809c7ee (September 26, 2025 00:11)
apbose changed the title from "Changes to TRT-LLM download tool for multigpu distributed case [WIP]" to "Changes to TRT-LLM download tool for multigpu distributed case" (Sep 26, 2025)