Mixin: Add and update alerts #2644
Merged
19 commits
fc967aa Add mountpoint to NodeFilesystem alerts (v-zhuravlev)
0e0399d Decrease NodeFilesystem pending time to 15m (v-zhuravlev)
fd2d62a Add CPU and memory alerts (v-zhuravlev)
7479418 Add failed systemd service alert (v-zhuravlev)
3d8075d Decrease NodeNetwork*Errs pending period (v-zhuravlev)
614030b Set 'at' everywhere as preposition for instance (v-zhuravlev)
94fc82e Add NodeDiskIOSaturation alert (v-zhuravlev)
962de6c Add %(nodeExporterSelector)s to Network and conntrack alerts (v-zhuravlev)
c3ec6e8 Add diskDevice selector (v-zhuravlev)
e15e7d6 Fix NodeMemoryHighUtilization alert (v-zhuravlev)
580c497 Add NodeSystemSaturation and NodeMemoryMajorPagesFaults (v-zhuravlev)
da32f8d Decrease NodeSystemdServiceFailed severity to warning (v-zhuravlev)
e48e790 Extend alert description (v-zhuravlev)
2111e70 Add comma after 'mounted on' (v-zhuravlev)
77ae769 Add thresholds for memory alerts (v-zhuravlev)
6bdc1d9 Add thresholds for memory, disk and system alerts (v-zhuravlev)
b7dfb32 Set severity to NodeCPUHighUsage to info (v-zhuravlev)
3e250a9 Update NodeSystemSaturation severity (v-zhuravlev)
e8d7f4e Revert alerts pending durtions (v-zhuravlev)
Set severity to NodeCPUHighUsage to info
Signed-off-by: Vitaly Zhuravlev <[email protected]>
commit b7dfb32bfc1e20bf8c7493427ac085d550589c7e
High CPU usage is not a problem and can just be an indicator of properly utilizing your machine, so I'd remove these.
Perhaps, as long as we can alert on high system load (saturation).
Yeah, CPU usage is good. :) I mean, this would be a case for the "info" level alerts that I like to promote, but I don't think we have them here in the mixin.
(Info-level alerts notify nobody, but you could look at the alerts page while troubleshooting. They point to things that are not problems per se and might be OK, but which you might be interested in while an actual incident is happening.)
Yeah, I'd be fine with an 'info' level severity. No reason not to introduce that now that we're on this.
I'll make it an info according to this guideline:
Perhaps a warning if the usage stays above 98% for 1h would be viable? That would be a case where the host is at capacity and scheduling more tasks there would result in performance degradation. It is a risk folks can accept but something that should be considered as part of the capacity plan.
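To make the "above 98% for 1h" idea concrete, here is a rough Python sketch of a threshold with a pending period. It is only an illustration of the semantics being proposed, not how the mixin or Prometheus implements it; the `PendingAlert` class, its field names, and the state strings are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PendingAlert:
    """Hypothetical helper: fire only after a value stays above a threshold."""

    threshold: float     # e.g. 98.0 (percent CPU usage), assumed for illustration
    hold_seconds: float  # e.g. 3600.0 for a 1h pending period
    # Timestamp at which the value first exceeded the threshold, if it still does.
    _since: Optional[float] = field(default=None, init=False, repr=False)

    def observe(self, timestamp: float, value: float) -> str:
        """Record one sample and return 'inactive', 'pending', or 'firing'."""
        if value <= self.threshold:
            # Any sample at or below the threshold resets the pending period.
            self._since = None
            return "inactive"
        if self._since is None:
            self._since = timestamp
        if timestamp - self._since >= self.hold_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    alert = PendingAlert(threshold=98.0, hold_seconds=3600.0)
    for ts, usage in [(0.0, 99.0), (1800.0, 99.5), (3600.0, 99.0), (3660.0, 50.0)]:
        print(ts, usage, alert.observe(ts, usage))
```

A short dip below the threshold resets the timer in this sketch, so only sustained saturation would reach the firing state, which matches the capacity-planning intent of the comment.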