Skip to content

Conversation

@hhzhang16
Copy link
Contributor

@hhzhang16 hhzhang16 commented Sep 9, 2025

Overview:

Remove the Dynamo k8s setup from the script to install additional resources needed for k8s benchmarking.

Details:

  • Removed cloud deploy from k8s script
  • Renamed benchmarking resources script
  • Updated READMEs to reflect this new change

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added a streamlined script to provision benchmarking resources in Kubernetes, including optional token setup and validation steps.
  • Documentation

    • Updated benchmarking and profiling guides to reference the new setup flow and the main platform installation guide.
    • Expanded utilities documentation to cover benchmarking/profiling workflows, PVC usage, and quick-start steps.
    • Clarified help text in benchmarking tools to direct users to the new workflow and resources.
  • Chores

    • Removed the legacy namespace/operator setup script and associated instructions in favor of the new benchmarking resource setup process.

…s script that only applies manifests and adds needed HF tokens

Signed-off-by: Hannah Zhang <[email protected]>
…-profilerplanner-related-k8s-deployment-setup-should-be-on
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 9, 2025

Walkthrough

Replaces the old Kubernetes namespace setup script with a new benchmarking resource setup script, updates related documentation and help text to reference the new workflow and the main platform installation guide, and adjusts benchmarking/profiling docs accordingly. No runtime logic changes to benchmarking CLI; one new script added, one script removed.

Changes

Cohort / File(s) Summary of Changes
Benchmark script help text
benchmarks/benchmark.sh
Updated NOTE in show_help() to point to deploy/README.md and setup_benchmarking_resources.sh instead of setup_k8s_namespace.sh. No logic changes.
Utils docs overhaul
deploy/utils/README.md
Rewrote README to center on Dynamo benchmarking/profiling workflows, new prerequisites, reorganized manifests references, and instructions for setup_benchmarking_resources.sh and PVC utilities.
Setup scripts pivot
deploy/utils/setup_benchmarking_resources.sh, deploy/utils/setup_k8s_namespace.sh
Added new script to provision benchmarking resources (SA/Role/RoleBinding/PVC, optional HF token secret, verification, optional pip install); removed previous namespace/operator setup script.
Benchmarking & profiling docs
docs/benchmarks/benchmarking.md, docs/benchmarks/pre_deployment_profiling.md
Updated prerequisites and setup steps to require Dynamo Cloud platform installation and to use utils README for benchmarking/profiling resource setup; added dependency install command in profiling doc.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Script as setup_benchmarking_resources.sh
  participant Kubectl as kubectl
  participant K8s as Kubernetes API

  User->>Script: Run with NAMESPACE [+ HF_TOKEN]
  Script->>Kubectl: Check kubectl availability
  Script->>Kubectl: Verify namespace exists
  Kubectl-->>Script: Namespace status
  Script->>K8s: Apply manifests (SA/Role/RoleBinding/PVC) via envsubst
  alt HF_TOKEN provided
    Script->>K8s: Create/Update Secret hf-token-secret
  end
  opt requirements.txt present
    Script->>Script: Install Python deps (pip/pip3 or fallback)
  end
  Script->>K8s: Verify SA, PVC (and Secret if set)
  K8s-->>Script: Resource statuses
  Script-->>User: Success / error summary
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The description follows the repository template by providing Overview, Details, and Related Issues, but the “Where should the reviewer start?” section remains empty, lacking the required pointers to the specific files that need close review.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title “feat: decouple dynamo k8s setup from additional benchmarking requirements” succinctly identifies the primary change—separating the existing Kubernetes setup logic from the new benchmarking resource provisioning—and uses clear, specific language without including unnecessary details or file lists.

Poem

A rabbit taps at kubectl, hop-hop!
Old scripts retire; new ones pop.
ServiceAccounts squeak, PVCs align,
Secrets burrow safe, all by design.
Benchmarks bloom in the Kubernetes sun—
Carrots counted, races run. 🥕


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/benchmarks/pre_deployment_profiling.md (2)

141-144: Path mismatch: use /data path consistently with injector/PVC mount.

You inject configs under /data, but set DGD_CONFIG_FILE to /workspace/...; that will break the job if it expects the PVC path.

-   export DGD_CONFIG_FILE=/workspace/profiling_results/disagg.yaml # or your custom path
+   export DGD_CONFIG_FILE=/data/configs/disagg.yaml # or your injected PVC path under /data

155-156: Undefined variable DYNAMO_HOME in path.

Use a repo‑relative path to avoid confusion.

-Edit `$DYNAMO_HOME/benchmarks/profiler/deploy/profile_sla_job.yaml` to set the target ISL, OSL, TTFT, and ITL.
+Edit `benchmarks/profiler/deploy/profile_sla_job.yaml` to set the target ISL, OSL, TTFT, and ITL.
🧹 Nitpick comments (7)
benchmarks/benchmark.sh (1)

101-103: Help text update looks good; consider adding the full script path.

To reduce ambiguity for new users, reference the full path: deploy/utils/setup_benchmarking_resources.sh.

-    - For Dynamo deployment setup, follow the main installation guide at deploy/README.md
-      to install the platform, then use setup_benchmarking_resources.sh for benchmarking resources.
+    - For Dynamo deployment setup, follow the main installation guide at deploy/README.md
+      to install the platform, then use deploy/utils/setup_benchmarking_resources.sh for benchmarking resources.
docs/benchmarks/pre_deployment_profiling.md (1)

190-191: Incorrect reference to when the PVC is created.

PVC is created by the setup script, not “Step 2”.

-After the profiling job completes successfully, the results are stored in the persistent volume claim (PVC) created during Step 2.
+After the profiling job completes successfully, the results are stored in the persistent volume claim (PVC) created by the setup script.
deploy/utils/setup_benchmarking_resources.sh (3)

10-11: REPO_ROOT is defined but unused.

Remove it to silence SC2034 and avoid confusion.

-REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

47-47: Add a help/usage switch.

Since usage() exists, wire -h/--help.

+if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
+  usage
+  exit 0
+fi

105-111: Verify all created RBAC objects too.

Optional but helpful for debugging.

 kubectl get serviceaccount dynamo-sa -n "$NAMESPACE" >/dev/null && ok "ServiceAccount dynamo-sa exists" || err "ServiceAccount dynamo-sa not found"
 kubectl get pvc dynamo-pvc -n "$NAMESPACE" >/dev/null && ok "PVC dynamo-pvc exists" || err "PVC dynamo-pvc not found"
+kubectl get role dynamo-role -n "$NAMESPACE" >/dev/null && ok "Role dynamo-role exists" || err "Role dynamo-role not found"
+kubectl get rolebinding dynamo-binding -n "$NAMESPACE" >/dev/null && ok "RoleBinding dynamo-binding exists" || err "RoleBinding dynamo-binding not found"
deploy/utils/README.md (2)

51-58: Include RoleBinding in the verification step.

Completes the checklist.

 kubectl get serviceaccount dynamo-sa -n $NAMESPACE
 kubectl get pvc dynamo-pvc -n $NAMESPACE
+kubectl get rolebinding dynamo-binding -n $NAMESPACE

33-42: Document non-interactive mode for setup script.

Reflect FORCE/YES option to avoid prompts in CI.

-After setting up Dynamo Cloud, use this script to prepare your namespace with the additional resources needed for benchmarking and profiling workflows:
+After setting up Dynamo Cloud, use this script to prepare your namespace with the additional resources needed for benchmarking and profiling workflows. In CI/non-interactive environments, set FORCE=1 to skip prompts:
-deploy/utils/setup_benchmarking_resources.sh
+FORCE=1 deploy/utils/setup_benchmarking_resources.sh
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 241bd01 and fe76926.

📒 Files selected for processing (6)
  • benchmarks/benchmark.sh (1 hunks)
  • deploy/utils/README.md (2 hunks)
  • deploy/utils/setup_benchmarking_resources.sh (1 hunks)
  • deploy/utils/setup_k8s_namespace.sh (0 hunks)
  • docs/benchmarks/benchmarking.md (1 hunks)
  • docs/benchmarks/pre_deployment_profiling.md (1 hunks)
💤 Files with no reviewable changes (1)
  • deploy/utils/setup_k8s_namespace.sh
🧰 Additional context used
🪛 Shellcheck (0.10.0)
deploy/utils/setup_benchmarking_resources.sh

[warning] 10-10: REPO_ROOT appears unused. Verify use (or export if used externally).

(SC2034)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
docs/benchmarks/pre_deployment_profiling.md (1)

92-92: Good reframe of the setup step.

Reads clearly and matches the new workflow.

docs/benchmarks/benchmarking.md (1)

40-41: Prerequisites update is clear and consistent with the new flow.

No issues found.

Signed-off-by: Hannah Zhang <[email protected]>
@hhzhang16 hhzhang16 merged commit c82f477 into main Sep 15, 2025
14 of 16 checks passed
@hhzhang16 hhzhang16 deleted the hannahz/dyn-995-profilerplanner-related-k8s-deployment-setup-should-be-on branch September 15, 2025 22:04
zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025
biswapanda pushed a commit that referenced this pull request Sep 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants