forked from prometheus/node_exporter
Add node-observ-lib #25
Merged
9 commits
8649f29 Add node-observ-lib (v-zhuravlev)
f5802af Remove trends support (not in 10.0 schema) (v-zhuravlev)
db019c5 Make filteringSelector for logs dashboard configurable (v-zhuravlev)
79b4153 Temp change dependency (until PR is merged for commonlib) (v-zhuravlev)
784cf59 Refactor config (v-zhuravlev)
d9f8ea2 Update jsonnetfile.json (v-zhuravlev)
016bfac Update README (v-zhuravlev)
0e78ebf Add separate loki example (v-zhuravlev)
21c8272 Add sep file example (v-zhuravlev)
Refactor config
commit 784cf59803acc21cd458fc3a5c90151f9e690c82
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -20,14 +20,15 @@ You can use observ-lib to fill in monitoring-mixin structure: | |
| local nodelib = import 'node-observ-lib/main.libsonnet'; | ||
|
|
||
| local linux = | ||
| nodelib.new( | ||
| filteringSelector='job="node"', | ||
| groupLabels=['job'], | ||
| instanceLabels=['instance'], | ||
| dashboardNamePrefix='Node exporter / ', | ||
| dashboardTags=['node-exporter-mixin'], | ||
| uid='node' | ||
| ) | ||
| nodelib.new() | ||
| + nodelib.withConfigMixin({ | ||
|
Member
Maybe we should add a little explanation/example with separate configuration?
Author
Added.
||
| filteringSelector: 'job=~".*node.*"', | ||
| groupLabels: ['job'], | ||
| instanceLabels: ['instance'], | ||
| dashboardNamePrefix: 'Node exporter / ', | ||
| dashboardTags: ['node-exporter-mixin'], | ||
| uid: 'node', | ||
| }) | ||
| + nodelib.withConfigMixin( | ||
| { | ||
| // enable loki logs | ||
|
|
@@ -51,14 +52,15 @@ local g = import './g.libsonnet'; | |
| local nodelib = import 'node-observ-lib/main.libsonnet'; | ||
|
|
||
| local linux = | ||
| nodelib.new( | ||
| filteringSelector='job="node"', | ||
| groupLabels=['job'], | ||
| instanceLabels=['instance'], | ||
| dashboardNamePrefix='Node exporter / ', | ||
| dashboardTags=['node-exporter-mixin'], | ||
| uid='node' | ||
| ) | ||
| nodelib.new() | ||
| + nodelib.withConfigMixin({ | ||
| filteringSelector: 'job=~".*node.*"', | ||
| groupLabels: ['job'], | ||
| instanceLabels: ['instance'], | ||
| dashboardNamePrefix: 'Node exporter / ', | ||
| dashboardTags: ['node-exporter-mixin'], | ||
| uid: 'node', | ||
| }) | ||
| + { | ||
| grafana+: { | ||
| panels+: { | ||
|
|
||
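The change above moves configuration out of the `nodelib.new(...)` arguments and into a mixin applied with `nodelib.withConfigMixin(...)`. The following is a minimal, self-contained sketch of that pattern. `fakelib` is a hypothetical stand-in written for illustration and is not the real node-observ-lib implementation; only the names `new`, `withConfigMixin`, and the config field names come from the diff.

```jsonnet
// Hypothetical stand-in for the library, for illustration only.
// It assumes the config lives under a 'config' field, which the
// diff does not confirm.
local fakelib = {
  // Returns an object carrying default config values.
  new(): {
    config: {
      filteringSelector: 'job="node"',
      groupLabels: ['job'],
      instanceLabels: ['instance'],
      uid: 'node',
      // Derived from uid, so it follows any later override.
      dashboardTags: [self.uid],
    },
  },
  // Returns a mixin that merges the given fields over the defaults.
  withConfigMixin(config): { config+: config },
};

local linux =
  fakelib.new()
  + fakelib.withConfigMixin({
    filteringSelector: 'job=~".*node.*"',
    uid: 'linux',
  });

// Evaluates to the merged config object.
linux.config
```

Because Jsonnet binds `self` late, overriding `uid` in the mixin also changes `dashboardTags` in this sketch, while fields that are not overridden keep their defaults. That is one possible motivation for a mixin over positional arguments: callers override only what they need.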
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| { | ||
|
|
||
| // Any modular observability library should include the following inputs: | ||
| // 'dashboardNamePrefix' - Used as a prefix for all dashboards and (optionally) rule groups. | ||
| // 'filteringSelector' - Static selector to apply to ALL dashboard variables of type query, panel queries, alerts and recording rules. | ||
| // 'groupLabels' - One or more labels that can be used to identify a 'group' of instances. In simple cases, can be 'job' or 'cluster'. | ||
| // 'instanceLabels' - One or more labels that can be used to identify a single instance. In simple cases, can be 'instance' or 'pod'. | ||
| // 'uid' - UID used to prefix the original UIDs of all dashboards. | ||
|
|
||
| filteringSelector: std.get(self, 'nodeExporterSelector', default='job="node"'), | ||
| groupLabels: ['job'], | ||
| instanceLabels: ['instance'], | ||
| dashboardNamePrefix: 'Node exporter / ', | ||
| uid: 'node', | ||
|
|
||
| dashboardTags: [self.uid], | ||
|
|
||
| // Select the fstype for filesystem-related queries. If left | ||
| // empty, all filesystems are selected. If you have unusual | ||
| // filesystems you don't want to include in dashboards and | ||
| // alerting, you can exclude them here, e.g. 'fstype!="tmpfs"'. | ||
| fsSelector: 'fstype!=""', | ||
|
|
||
| // Select the mountpoint for filesystem-related queries. If left | ||
| // empty, all mountpoints are selected. For example if you have a | ||
| // special purpose tmpfs instance that has a fixed size and will | ||
| // always be 100% full, but you still want alerts and dashboards for | ||
| // other tmpfs instances, you can exclude those by mountpoint prefix | ||
| // like so: 'mountpoint!~"/var/lib/foo.*"'. | ||
| fsMountpointSelector: 'mountpoint!=""', | ||
|
|
||
| // Select the device for disk-related queries. If left empty, all | ||
| // devices are selected. If you have unusual devices you don't | ||
| // want to include in dashboards and alerting, you can exclude | ||
| // them here, e.g. 'device!="tmpfs"'. | ||
| diskDeviceSelector: 'device!=""', | ||
|
|
||
| // Some of the alerts are meant to fire if a critical failure of a | ||
| // node is imminent (e.g. the disk is about to run full). In a | ||
| // true “cloud native” setup, failures of a single node should be | ||
| // tolerated. Hence, even imminent failure of a single node is no | ||
| // reason to create a paging alert. However, in practice there are | ||
| // still many situations where operators like to get paged in time | ||
| // before a node runs out of disk space. nodeCriticalSeverity can | ||
| // be set to the desired severity for this kind of alert. This | ||
| // can even be templated to depend on labels of the node, e.g. you | ||
| // could make this critical for traditional database masters but | ||
| // just a warning for K8s nodes. | ||
| nodeCriticalSeverity: 'critical', | ||
|
|
||
| // CPU utilization (%) on which to trigger the | ||
| // 'NodeCPUHighUsage' alert. | ||
| cpuHighUsageThreshold: 90, | ||
| // Load average 1m (per core) on which to trigger the | ||
| // 'NodeSystemSaturation' alert. | ||
| systemSaturationPerCoreThreshold: 2, | ||
|
|
||
| // Available disk space (%) thresholds on which to trigger the | ||
| // 'NodeFilesystemSpaceFillingUp' alerts. These alerts fire if the disk | ||
| // usage grows in a way that it is predicted to run out in 4h or 1d | ||
| // and if the provided thresholds have been reached right now. | ||
| // In some cases you'll want to adjust these, e.g. by default Kubernetes | ||
| // runs the image garbage collection when the disk usage reaches 85% | ||
| // of its available space. In that case, you'll want to reduce the | ||
| // critical threshold below to something like 14 or 15, otherwise | ||
| // the alert could fire under normal node usage. | ||
| fsSpaceFillingUpWarningThreshold: 40, | ||
| fsSpaceFillingUpCriticalThreshold: 20, | ||
|
|
||
| // Available disk space (%) thresholds on which to trigger the | ||
| // 'NodeFilesystemAlmostOutOfSpace' alerts. | ||
| fsSpaceAvailableWarningThreshold: 5, | ||
| fsSpaceAvailableCriticalThreshold: 3, | ||
|
|
||
| // Memory utilization (%) level on which to trigger the | ||
| // 'NodeMemoryHighUtilization' alert. | ||
| memoryHighUtilizationThreshold: 90, | ||
|
|
||
| // Threshold for the rate of memory major page faults to trigger | ||
| // 'NodeMemoryMajorPagesFaults' alert. | ||
| memoryMajorPagesFaultsThreshold: 500, | ||
|
|
||
| // Disk IO queue level above which to trigger | ||
| // 'NodeDiskIOSaturation' alert. | ||
| diskIOSaturationThreshold: 10, | ||
|
|
||
| rateInterval: '5m', | ||
|
|
||
| dashboardPeriod: 'now-1h', | ||
| dashboardTimezone: 'default', | ||
| dashboardRefresh: '1m', | ||
|
|
||
| // logs lib related | ||
| enableLokiLogs: false, | ||
| extraLogLabels: ['transport', 'unit', 'level'], | ||
| logsVolumeGroupBy: 'level', | ||
| showLogsVolume: true, | ||
| logsFilteringSelector: self.filteringSelector, | ||
| logsExtraFilters: | ||
| ||| | ||
| | label_format timestamp="{{__timestamp__}}" | ||
| | line_format `{{ if eq "[[instance]]" ".*" }}{{alignLeft 25 .instance}}|{{alignLeft 25 .unit}}|{{else}}{{alignLeft 25 .unit}}|{{end}} {{__line__}}` | ||
| |||, | ||
| } |
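To illustrate how the selector, threshold, and severity fields above might be consumed, here is a small standalone sketch that interpolates them into an alerting rule. It is an assumption about typical usage, not the expression the library actually generates. The metric names are standard node_exporter metrics, and the `role` label used in the severity template is hypothetical.

```jsonnet
// Illustrative only: a cut-down config reusing field names from the file above.
local config = {
  filteringSelector: 'job="node"',
  fsSelector: 'fstype!=""',
  fsMountpointSelector: 'mountpoint!=""',
  fsSpaceAvailableCriticalThreshold: 3,
  // Templated severity. The 'role' label is hypothetical; Prometheus
  // expands the template for each firing alert.
  nodeCriticalSeverity: '{{ if eq $labels.role "database" }}critical{{ else }}warning{{ end }}',
};

{
  alert: 'NodeFilesystemAlmostOutOfSpace',
  // All three selectors are joined into each metric's label matcher,
  // and the threshold becomes the right-hand side of the comparison.
  expr: |||
    (
      node_filesystem_avail_bytes{%(filteringSelector)s, %(fsSelector)s, %(fsMountpointSelector)s}
      / node_filesystem_size_bytes{%(filteringSelector)s, %(fsSelector)s, %(fsMountpointSelector)s}
      * 100 < %(fsSpaceAvailableCriticalThreshold)d
    )
  ||| % config,
  labels: {
    severity: config.nodeCriticalSeverity,
  },
}
```

Keeping the selectors as plain strings means a user can exclude a filesystem type or mountpoint in one place, for example `fsSelector: 'fstype!="tmpfs"'`, and every query built this way picks up the change.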
Minor nitpicks throughout: I don't mind shortening 'observability' to 'observ' for the folder name, but it feels pretty awkward everywhere else.
Good one, updated the README.