-
Notifications
You must be signed in to change notification settings - Fork 750
feat: gaie helm chart based example #2168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update reorganizes and revises the Inference Gateway deployment documentation and example resources for Dynamo. It introduces new installation scripts, updates resource names and namespaces to use "qwen" and "my-model," and adds configuration files for streamlined deployment and usage. Outdated example documentation is removed. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Install Script
participant Kubernetes Cluster
participant Kgateway
participant Dynamo Model Resources
User->>Install Script: Run install_gaie_crd_kgateway.sh
Install Script->>Kubernetes Cluster: Create namespaces (my-model, kgateway-system)
Install Script->>Kubernetes Cluster: Apply Gateway API CRDs
Install Script->>Kubernetes Cluster: Apply Inference Extension CRDs
Install Script->>Kubernetes Cluster: Install Kgateway CRDs via Helm
Install Script->>Kubernetes Cluster: Install Kgateway via Helm (enable inference extension)
Install Script->>Kubernetes Cluster: Deploy Gateway instance
User->>Kubernetes Cluster: Apply model and pool resources (using updated YAMLs)
User->>Kgateway: Send inference requests
Kgateway->>Dynamo Model Resources: Route requests to appropriate model pools
Dynamo Model Resources-->>Kgateway: Return inference results
Kgateway-->>User: Respond with inference output
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~15 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🔭 Outside diff range comments (1)
deploy/inference-gateway/vllm_agg_qwen.yaml (1)
8-15: Port value referenced in comment is not defined – templates will fail.Lines 8-15 mention “This is the port on which the model is exposed” but no
port:key is actually present.
Helm templates that expect.Values.port(or similar) will render empty and produce invalid manifests.-# This is the port on which the model is exposed -model: +# Port on which the model service is exposed +port: 8000 + +model:
🧹 Nitpick comments (4)
deploy/inference-gateway/example/resources/inference-pool.yaml (1)
18-25: Minor style issues & EOF newline missing.
- Static-analysis flagged the absence of a trailing newline – add one to keep
git diffclean.- Double-check that every Frontend pod in namespace
vllm-aggcarries both selector labels (dynamo-component&dynamo-namespace); if not, no targets will match.nvidia.com/dynamo-namespace: vllm-agg ... name: qwen-epp +deploy/inference-gateway/example/resources/inference-model.yaml (1)
26-26: Add missing trailing newlineYAML-lint flags the lack of a final newline – a tiny nit that still breaks some CI pipelines.
- name: qwen-pool +\x0adeploy/inference-gateway/README.md (2)
41-45: Fix typo “Inferenece”-Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs) +Install the Inference Extension CRDs (Inference Model and Inference Pool CRDs)
128-128: Spellings: “alternatively” / “alternative”-This requires `sudo` access to the host machine. alternatively, you can use port-forward to expose the gateway to the host as shown in alternateive (b). +This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host as shown in alternative (b).
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
deploy/inference-gateway/README.md(1 hunks)deploy/inference-gateway/example/README.md(0 hunks)deploy/inference-gateway/example/resources/cluster-role-binding.yaml(1 hunks)deploy/inference-gateway/example/resources/dynamo-epp.yaml(2 hunks)deploy/inference-gateway/example/resources/http-router.yaml(2 hunks)deploy/inference-gateway/example/resources/inference-model.yaml(1 hunks)deploy/inference-gateway/example/resources/inference-pool.yaml(1 hunks)deploy/inference-gateway/example/resources/service.yaml(1 hunks)deploy/inference-gateway/install_gaie_crd_kgateway.sh(1 hunks)deploy/inference-gateway/vllm_agg_qwen.yaml(1 hunks)
💤 Files with no reviewable changes (1)
- deploy/inference-gateway/example/README.md
🧰 Additional context used
🧠 Learnings (4)
deploy/inference-gateway/example/resources/cluster-role-binding.yaml (3)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
Learnt from: julienmancuso
PR: #1337
File: deploy/cloud/helm/platform/components/operator/templates/planner.yaml:21-24
Timestamp: 2025-06-03T19:09:37.312Z
Learning: In Kubernetes ServiceAccount resources, the imagePullSecrets field is a top-level field at the same level as metadata, not nested under metadata. The correct structure is:
apiVersion: v1
kind: ServiceAccount
metadata:
name: service-account-name
imagePullSecrets:
- name: secret-nameLearnt from: julienmancuso
PR: #1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.732Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
deploy/inference-gateway/example/resources/dynamo-epp.yaml (2)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The DYN_DEPLOYMENT_CONFIG environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
deploy/inference-gateway/example/resources/service.yaml (2)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
deploy/inference-gateway/install_gaie_crd_kgateway.sh (1)
Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
🪛 YAMLlint (1.37.1)
deploy/inference-gateway/example/resources/inference-pool.yaml
[error] 29-29: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-model.yaml
[error] 26-26: no new line character at the end of file
(new-line-at-end-of-file)
🪛 GitHub Actions: Copyright Checks
deploy/inference-gateway/vllm_agg_qwen.yaml
[error] 1-1: Invalid or missing copyright header detected. File does not comply with the required SPDX copyright and license header.
deploy/inference-gateway/install_gaie_crd_kgateway.sh
[error] 1-1: Invalid or missing copyright header detected. File does not comply with the required SPDX copyright and license header.
🪛 LanguageTool
deploy/inference-gateway/README.md
[grammar] ~41-~41: Ensure spelling is correct
Context: .... Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs) ```bash ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (6)
deploy/inference-gateway/example/resources/service.yaml (1)
18-23: Verify selector consistency with Deployment.The selector was renamed to
app: qwen-epp. Confirm the corresponding Deployment/Pod template actually setsapp=qwen-epp; otherwise the Service will have zero endpoints.deploy/inference-gateway/example/resources/cluster-role-binding.yaml (1)
20-25: Are you sure thedefaultServiceAccount is the right subject?You switched the namespace to
my-modelbut kept the SA name asdefault. Using the cluster-wide default SA often results in overly broad permissions or, conversely, missing token automount settings.
Consider creating a dedicated ServiceAccount (e.g.qwen-sa) and binding that instead.No action needed if the choice is deliberate; otherwise update:
subjects: - kind: ServiceAccount name: qwen-sa namespace: my-modeldeploy/inference-gateway/example/resources/inference-model.yaml (1)
18-20: Namespace & name update look correct
qwen-modelscoped tomy-modelcleanly aligns with the rest of the manifests and the install script.
No functional or schema issues spotted.deploy/inference-gateway/example/resources/dynamo-epp.yaml (2)
18-22: Renaming & relabelling LGTM
qwen-epp+my-modelconsistently propagate throughmetadata,labels, and the selector – this keeps the deployment coherent.
40-42: Double-check pool-name arguments actually reach the binaryThe EPP image previously expected
--poolName/--poolNamespace. Single-dash flags often get accepted but may be silently ignored depending on the cobra/flag wiring.Run the binary locally or inspect its
--helpoutput to ensure one-dash variants are recognised; if not, use the double-dash form:- - -poolName + - --poolName - - "qwen-pool" - - "-poolNamespace" + - "qwen-pool" + - --poolNamespace - "my-model"deploy/inference-gateway/example/resources/http-router.yaml (1)
28-30: Verify Service port for theqwen-poolbackendThe HTTPRoute is configured with:
File: deploy/inference-gateway/example/resources/http-router.yaml
Lines 28–30name: qwen-pool port: 8000 weight: 1Since this backendRef points at the InferencePool
qwen-pool(which uses the Service namedqwen-eppper its extensionRef), you must ensure that the actual Service exposes port 8000. If it only exposes the default (e.g. 80), requests will be dropped.Please confirm by running:
kubectl get svc qwen-epp -n my-model -o jsonpath='{.spec.ports[*].port}'If the output does not include
8000, update either the Service’s.spec.ports[].portor theport: 8000value in this HTTPRoute.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Biswa Panda <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need an empty NOTES.txt file?
Overview:
Helm chart for simple gaie - dynamo quickstart
Based on proposal:
linear: DEP-282
Summary by CodeRabbit