-
Notifications
You must be signed in to change notification settings - Fork 115
Allow configurable threshold values for noobaa alerts #1750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ityState and NoobaaBucketNoCapacityState Signed-off-by: Aayush Chouhan <[email protected]>
WalkthroughAdds configurable Prometheus alert thresholds to the NooBaa CRD. Introduces AlertThresholds spec with BucketLowCapacityPercent and BucketNoCapacityPercent fields. Updates reconciliation logic to dynamically compute PrometheusRule expressions based on configured threshold percentages. Changes
Sequence DiagramsequenceDiagram
participant User as Operator
participant NooBaa as NooBaa Spec
participant Reconciler
participant setDesiredPrometheusRule as setDesiredPrometheusRule()
participant PrometheusRule
User->>NooBaa: Configure alertThresholds<br/>(bucketLowCapacityPercent,<br/>bucketNoCapacityPercent)
Note over NooBaa: AlertThresholds spec persisted
Reconciler->>Reconciler: ReconcilePrometheusRule()
Reconciler->>setDesiredPrometheusRule: Invoke callback
setDesiredPrometheusRule->>NooBaa: Read AlertThresholds
alt AlertThresholds is set
Note over setDesiredPrometheusRule: Generate PromQL expressions<br/>from percentages
setDesiredPrometheusRule->>PrometheusRule: Update alert rules with<br/>computed expressions
PrometheusRule-->>setDesiredPrometheusRule: Acknowledged
else AlertThresholds is nil
Note over setDesiredPrometheusRule: No-op
end
setDesiredPrometheusRule-->>Reconciler: Return (status)
Reconciler-->>User: Prometheus alerts configured
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/bundle/deploy.go (1)
4885-4905:NooBaaBucketNoCapacityStatealert may not fire for threshold=100 and text still implies “100%”The updated rules that use scalar thresholds are generally good, but there are two subtle behavior issues:
- The expression now uses
> scalar(NooBaa_bucket_no_capacity_threshold). If the threshold metric is a 0–100 percentage and a user sets it to100, the alert will never trigger (usage cannot exceed 100). Previously a>= 100style check would still fire at full capacity.- The alert annotations still say “is using all of its capacity” / “No Capacity State”, but with a configurable threshold (default 95) this is effectively “near full capacity”. Either the expression or the text should be adjusted so they match.
Consider changing the “no capacity” expression to
>= scalar(NooBaa_bucket_no_capacity_threshold)(and possibly capping the CRD max at 100) and/or rewording the annotation to “usage is above the configured no-capacity threshold”.
🧹 Nitpick comments (4)
deploy/crds/noobaa.io_noobaas.yaml (1)
989-1012: CRD schema for spec.alertThresholds matches the Go API
alertThresholdsand itsbucketLowCapacityPercent/bucketNoCapacityPercentfields are correctly modelled as int32 with 0–100 constraints, aligned withAlertThresholdsSpecinnoobaa_types.go. The “default is 80/95” note here is purely documentation, which is fine given the actual defaulting is handled in core.If you ever want API-level defaulting (values materialized in the CR even when omitted), you could add
+kubebuilder:defaulttags on the Go fields so controller-gen emitsdefault:entries in this schema, but that would then need to stay in sync with the core’s hardcoded defaults.pkg/system/phase2_creating.go (1)
537-544: Avoid stale BUCKET_*_THRESHOLD env when AlertThresholds are unsetRight now the env vars are only updated when the pointer fields are non‑nil. If a user removes
spec.alertThresholds(or one of its fields) after previously setting it, the correspondingBUCKET_*_THRESHOLDvalue in the pod spec will remain at the last configured value instead of reverting to the core default.To keep pod env in sync with the CR, consider explicitly clearing the value when the field is unset, e.g.:
- case "BUCKET_LOW_CAPACITY_THRESHOLD": - if r.NooBaa.Spec.AlertThresholds != nil && r.NooBaa.Spec.AlertThresholds.BucketLowCapacityPercent != nil { - c.Env[j].Value = fmt.Sprintf("%d", *r.NooBaa.Spec.AlertThresholds.BucketLowCapacityPercent) - } - case "BUCKET_NO_CAPACITY_THRESHOLD": - if r.NooBaa.Spec.AlertThresholds != nil && r.NooBaa.Spec.AlertThresholds.BucketNoCapacityPercent != nil { - c.Env[j].Value = fmt.Sprintf("%d", *r.NooBaa.Spec.AlertThresholds.BucketNoCapacityPercent) - } + case "BUCKET_LOW_CAPACITY_THRESHOLD": + if at := r.NooBaa.Spec.AlertThresholds; at != nil && at.BucketLowCapacityPercent != nil { + c.Env[j].Value = fmt.Sprintf("%d", *at.BucketLowCapacityPercent) + } else { + c.Env[j].Value = "" + } + case "BUCKET_NO_CAPACITY_THRESHOLD": + if at := r.NooBaa.Spec.AlertThresholds; at != nil && at.BucketNoCapacityPercent != nil { + c.Env[j].Value = fmt.Sprintf("%d", *at.BucketNoCapacityPercent) + } else { + c.Env[j].Value = "" + }This keeps the env vars consistent with the CR when users remove configuration.
pkg/apis/noobaa/v1alpha1/noobaa_types.go (1)
245-248: AlertThresholdsSpec API is well-shaped; defaults are documented but not API-enforcedThe
AlertThresholdspointer onNooBaaSpecand theAlertThresholdsSpecdefinition (pointer int32 fields with 0–100 validation) look clean and align with the CRD schema. Using pointers makes the “unset vs set to 0” distinction unambiguous.Right now the “default is 80/95” behavior is only in comments (and implemented in core), which is perfectly valid. If you want those defaults to be visible directly in the CR when omitted, you could add:
// +kubebuilder:default=80 BucketLowCapacityPercent *int32 `json:"bucketLowCapacityPercent,omitempty"` // +kubebuilder:default=95 BucketNoCapacityPercent *int32 `json:"bucketNoCapacityPercent,omitempty"`and regenerate, but that’s optional and would need to stay consistent with the core implementation.
Also applies to: 314-331
pkg/bundle/deploy.go (1)
2418-2441: CRDalertThresholdsschema is solid; consider explicit defaults/constraintsThe new
spec.alertThresholdsobject and its integer fields look well-placed and correctly constrained (int32,minimum: 0,maximum: 100). Two optional refinements:
- The descriptions mention defaults (
80/95) but the OpenAPI schema does not setdefault:values. If the defaults are only enforced in controller code, consider either addingdefaulthere for discoverability or dropping the explicit numbers from the description to avoid confusing API consumers.- Today both thresholds can be configured to any 0–100 value independently. If you expect
bucketNoCapacityPercentto always be ≥bucketLowCapacityPercent, that invariant will need to be enforced in validation logic (can’t be expressed in this schema alone).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
deploy/crds/noobaa.io_noobaas.yaml(1 hunks)deploy/internal/prometheus-rules.yaml(2 hunks)deploy/internal/statefulset-core.yaml(1 hunks)pkg/apis/noobaa/v1alpha1/noobaa_types.go(2 hunks)pkg/apis/noobaa/v1alpha1/zz_generated.deepcopy.go(2 hunks)pkg/bundle/deploy.go(7 hunks)pkg/system/phase2_creating.go(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: naveenpaul1
Repo: noobaa/noobaa-operator PR: 1607
File: pkg/bundle/deploy.go:0-0
Timestamp: 2025-07-02T04:34:47.006Z
Learning: Default values for `METRICS_AUTH_ENABLED` and `VERSION_AUTH_ENABLED` are set up under `noobaa-config` in the NooBaa operator.
🧬 Code graph analysis (2)
pkg/apis/noobaa/v1alpha1/zz_generated.deepcopy.go (1)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (1)
AlertThresholdsSpec(315-331)
pkg/system/phase2_creating.go (1)
pkg/apis/noobaa/v1alpha1/noobaa_types.go (1)
NooBaa(41-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-admission-test
- GitHub Check: run-cli-tests
- GitHub Check: run-azure-vault-test
- GitHub Check: run-kms-dev-test
- GitHub Check: run-hac-test
- GitHub Check: run-operator-tests
- GitHub Check: run-kms-key-rotate-test
- GitHub Check: run-kms-tls-sa-test
- GitHub Check: run-core-config-map-tests
- GitHub Check: run-kms-kmip-test
- GitHub Check: cnpg-deployment-test
- GitHub Check: run-kms-tls-token-test
- GitHub Check: golangci-lint
🔇 Additional comments (5)
deploy/internal/statefulset-core.yaml (1)
136-137: Env var stubs for bucket capacity thresholds look consistentAdding
BUCKET_LOW_CAPACITY_THRESHOLDandBUCKET_NO_CAPACITY_THRESHOLDhere matches the existing pattern of defining core env vars in the template and populating values from the reconciler. No issues from the manifest side.pkg/apis/noobaa/v1alpha1/zz_generated.deepcopy.go (2)
114-138: DeepCopy for AlertThresholdsSpec correctly handles pointer fieldsThe new
AlertThresholdsSpecdeepcopy functions mirror the standard pattern in this file and correctly allocate and copy the pointer fields.
1494-1498: NooBaaSpec now deep-copies AlertThresholds as expectedWiring
AlertThresholdsintoNooBaaSpec.DeepCopyIntovia(*in).DeepCopyInto(*out)is correct and keeps the spec fully copy-safe.deploy/internal/prometheus-rules.yaml (1)
169-181: Dynamic bucket capacity thresholds via scalar() look good; confirm metric shapeUsing
scalar(NooBaa_bucket_low_capacity_threshold)/scalar(NooBaa_bucket_no_capacity_threshold)is a reasonable way to drive alerts from configuration, as long as each of those metrics exposes exactly one (unlabelled) timeseries in Prometheus. If they ever gain labels or multiple series, these rules will start erroring instead of alerting.Please verify in a test cluster that:
- The expressions evaluate without “multiple series” / scalar conversion errors.
- Adjusting
spec.alertThresholdsin the NooBaa CR actually shifts the effective alert firing point.pkg/bundle/deploy.go (1)
5409-5458: Core statefulset env wiring for bucket capacity thresholds looks consistentAdding
BUCKET_LOW_CAPACITY_THRESHOLDandBUCKET_NO_CAPACITY_THRESHOLDalongside the other core env vars is consistent with the existing pattern (placeholders that core reads/configures at runtime). No issues from the manifest side.
|
Conflicts with #1747? @nimrod-becker |
Signed-off-by: Aayush Chouhan <[email protected]>
Explain the changes
NooBaaBucketLowCapacityStateandNooBaaBucketNoCapacityStatealert thresholds configurable via NooBaa CR.noobaa_types.go: AddedAlertThresholdsSpecwithBucketLowCapacityPercentandBucketNoCapacityPercentphase4_configuring.go: AddedsetDesiredPrometheusRule()to override alert thresholds inPrometheusRulewhen specified in CR.Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
BucketLowCapacityPercentandBucketNoCapacityPercentinAlertThresholdsSpecwhich configure the thershold values for the alerts.a.
helm install prometheus prometheus-community/kube-prometheus-stackb. put the label
release: prometheusin the serviceMonitors and prometheusRules.c.
kubectl port-forward --namespace='default' prometheus-prometheus-kube-prometheus-prometheus-0 9090d. open prometheus dashboard on the browser : http://localhost:9090 , and check for alerts
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.