-
Notifications
You must be signed in to change notification settings - Fork 92
prometheus alerts - replace bucket capacity with BS capacity DF3671 #9334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds low-capacity reporting: a new Prometheus Gauge Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–30 minutes
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/server/analytic_services/prometheus_reports/noobaa_core_report.js(2 hunks)src/server/system_services/stats_aggregator.js(2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.
Applied to files:
src/server/system_services/stats_aggregator.js
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/server/system_services/stats_aggregator.jssrc/server/analytic_services/prometheus_reports/noobaa_core_report.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.
Applied to files:
src/server/system_services/stats_aggregator.jssrc/server/analytic_services/prometheus_reports/noobaa_core_report.js
🧬 Code graph analysis (1)
src/server/system_services/stats_aggregator.js (1)
src/server/system_services/pool_server.js (6)
_(9-9)_(1175-1175)_(1203-1203)pool_info(475-475)pool_info(524-524)pool_info(1187-1187)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
- GitHub Check: run-jest-unit-tests
🔇 Additional comments (4)
src/server/analytic_services/prometheus_reports/noobaa_core_report.js (2)
632-632: LGTM!The metric reset is correctly placed alongside the resource_status reset.
635-637: Ensure backing stores are properly identified.The metric
backing_store_capacityshould only track backing store pools, not all pool resources. According to best practices in the codebase, verify thatget_cloud_pool_statscorrectly identifies backing stores by checkingpool.hosts_pool_info.backingstoreorpool.cloud_pool_info.backingstorerather than relying solely on pool naming patterns oris_default_poolchecks.src/server/system_services/stats_aggregator.js (2)
729-732: LGTM! Low capacity states appropriately separated from health states.The LOW_CAPACITY_MODES constant correctly identifies capacity warning states. Note that these modes also appear in OPTIMAL_MODES (lines 720-728), which is intentional: a pool in low capacity is still considered healthy but requires capacity attention. This separation aligns with the PR objective to track physical capacity independently.
798-802: Verify that only backing store pools are included in the resources array.The
backing_store_capacitymetric should only track backing store resources. Confirm that the filtering logic (skipping default pools) is sufficient to ensure only backing stores are added to the resources array. If not all non-default pools are backing stores, add explicit checks usingpool.cloud_pool_info.backingstoreorpool.hosts_pool_info.backingstorefields to properly identify backing store pools before including them in the metrics.
1f2344b to
8546050
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/server/system_services/stats_aggregator.js (1)
720-732: Clarify low‑capacity modes and consider reusing constants / explicit numeric valueThe new
LOW_CAPACITY_MODESsubset andis_low_capacityflag look reasonable, but a couple of points are worth double‑checking:
- The metric will only treat
TIER_LOW_CAPACITYandLOW_CAPACITYas “low capacity”. If modes likeTIER_NO_CAPACITY/NO_CAPACITYorTIER_NO_CAPACITYare supposed to trigger the same Prometheus alert, you may want to either include them here or rename the metric/flag to reflect that it’s strictly “low but not no capacity”. Please confirm this matches the alerting rules in the operator side.resources.is_low_capacityends up feeding a PrometheusGauge. Even though JS will coerce booleans to0/1, explicitly converting (e.g.Number(_.includes(LOW_CAPACITY_MODES, pool_info.mode))) would make the intent clearer and more robust if the Prom client ever tightens type checks.- Optional: we now have mode lists in a few places in this file; if they need to stay in sync with central definitions, consider hoisting shared mode sets to module‑level constants to avoid subtle drift over time.
Also applies to: 798-802
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/server/analytic_services/prometheus_reports/noobaa_core_report.js(2 hunks)src/server/system_services/stats_aggregator.js(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/server/analytic_services/prometheus_reports/noobaa_core_report.js
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.
Applied to files:
src/server/system_services/stats_aggregator.js
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/server/system_services/stats_aggregator.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.
Applied to files:
src/server/system_services/stats_aggregator.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-jest-unit-tests
- GitHub Check: Build Noobaa Image
- GitHub Check: run-package-lock-validation
Signed-off-by: Amit Prinz Setter <[email protected]>
8546050 to
10aa010
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/api/stats_api.js(1 hunks)src/server/analytic_services/prometheus_reports/noobaa_core_report.js(2 hunks)src/server/system_services/stats_aggregator.js(2 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.
Applied to files:
src/server/system_services/stats_aggregator.jssrc/server/analytic_services/prometheus_reports/noobaa_core_report.js
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.
Applied to files:
src/server/system_services/stats_aggregator.jssrc/server/analytic_services/prometheus_reports/noobaa_core_report.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.
Applied to files:
src/server/system_services/stats_aggregator.jssrc/server/analytic_services/prometheus_reports/noobaa_core_report.js
📚 Learning: 2025-08-11T06:12:12.318Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.
Applied to files:
src/server/analytic_services/prometheus_reports/noobaa_core_report.js
📚 Learning: 2025-08-08T13:10:57.608Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1308-1311
Timestamp: 2025-08-08T13:10:57.608Z
Learning: In noobaa-core, config.INTERNAL_STORAGE_POOL_NAME has been removed. Use config.DEFAULT_POOL_NAME for internal/backingstore pool identification. Prefer exact match first, with prefix fallback to handle potential system-id suffixes in pool names.
Applied to files:
src/server/analytic_services/prometheus_reports/noobaa_core_report.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: run-jest-unit-tests
- GitHub Check: run-package-lock-validation
- GitHub Check: Build Noobaa Image
🔇 Additional comments (4)
src/api/stats_api.js (1)
450-453: LGTM!The schema addition for
is_low_capacityis well-structured and follows the existing pattern of theis_healthyfield. Making it optional ensures backward compatibility.src/server/analytic_services/prometheus_reports/noobaa_core_report.js (1)
631-638: LGTM!The metric reset and update logic is correctly implemented. The use of
Number()to convert the booleanis_low_capacityto 0/1 is appropriate for Prometheus metrics.src/server/system_services/stats_aggregator.js (2)
729-732: LGTM!The
LOW_CAPACITY_MODESconstant correctly identifies the pool modes that indicate low capacity. Note that these modes are also included inOPTIMAL_MODES(lines 720-728), which is intentional—a resource can be healthy (functional) while having low capacity that requires alerting.
798-802: LGTM!The
is_low_capacityfield is correctly set based on pool mode. This integrates properly with the API schema changes and the Prometheus metric reporting in the other files.
| }, { | ||
| type: 'Gauge', | ||
| name: 'backing_store_low_capacity', | ||
| configuration: { | ||
| help: 'Does backing store in has low capacity', | ||
| labelNames: ['backing_store_name'] | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix grammatical error in help text.
The help text contains a grammatical error: "Does backing store in has low capacity" should be corrected to something clearer.
Apply this diff to improve the help text:
}, {
type: 'Gauge',
name: 'backing_store_low_capacity',
configuration: {
- help: 'Does backing store in has low capacity',
+ help: 'Backing Store Low Capacity Status (0=normal, 1=low capacity)',
labelNames: ['backing_store_name']
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| }, { | |
| type: 'Gauge', | |
| name: 'backing_store_low_capacity', | |
| configuration: { | |
| help: 'Does backing store in has low capacity', | |
| labelNames: ['backing_store_name'] | |
| } | |
| }, { | |
| type: 'Gauge', | |
| name: 'backing_store_low_capacity', | |
| configuration: { | |
| help: 'Backing Store Low Capacity Status (0=normal, 1=low capacity)', | |
| labelNames: ['backing_store_name'] | |
| } |
🤖 Prompt for AI Agents
In src/server/analytic_services/prometheus_reports/noobaa_core_report.js around
lines 297 to 303, the Gauge metric's help text is grammatically incorrect ("Does
backing store in has low capacity"); update the configuration.help string to a
clear, correct phrase such as "Indicates whether the backing store has low
capacity" (or "Backing store has low capacity") so the metric description reads
properly.
Describe the Problem
The bucket's capacity doesn't take into account data reduction, which is confusing.
Explain the Changes
We opted to remove the per-bucket capacity check and replace with backing store-level check (physical).
Issues: Fixed
https://issues.redhat.com/browse/DFBUGS-3671
Testing Instructions:
Note corresponding PR in operator for the rules change:
noobaa/noobaa-operator#1747
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.