@v-zhuravlev
Contributor

@v-zhuravlev v-zhuravlev commented Nov 29, 2023

Similar to windows-observ-lib, this PR adds node-observ-lib, which reuses many panels from commonlib. It also refactors everything from node-mixin using grafonnet (v10 schema).

The node observability lib would make it possible to:

  • Easily read and maintain dashboards as code
  • Easily modify any dashboard, panel, target, variable... before rendering final result
  • Change instanceLabels and groupLabels from the default job,instance so the dashboards can be used in environments with extra or custom labels
  • Instantiate node dashboards more than once in a single environment (Uid and filteringSelector are used to avoid conflicts).
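
The last two points can be illustrated with a minimal, self-contained Jsonnet sketch. It does not import node-observ-lib: `newLib`, the config defaults, and the `node_load1` query are hypothetical stand-ins chosen for illustration, and the real library's constructor and field names may differ.

```jsonnet
// Hypothetical stand-in for an observ-lib style constructor.
// This is NOT the node-observ-lib API, only an illustration of the idea.
local newLib(config) = {
  // Hidden field: defaults, overridden by whatever the caller passes in.
  config:: {
    groupLabels: ['job'],
    instanceLabels: ['instance'],
    filteringSelector: 'job="node"',
    uid: 'node',
  } + config,

  local c = self.config,

  dashboards: {
    overview: {
      // The uid prefix keeps two instantiations from colliding in Grafana.
      uid: c.uid + '-overview',
      title: 'Overview',
      // The query is scoped by the selector and grouped by the configured labels.
      expr: 'sum by (%s) (node_load1{%s})' % [
        std.join(',', c.groupLabels + c.instanceLabels),
        c.filteringSelector,
      ],
    },
  },
};

// Two independent instantiations in the same environment.
local linux = newLib({ uid: 'linux', filteringSelector: 'job="integrations/node"' });
local edge = newLib({
  uid: 'edge',
  filteringSelector: 'job="edge-node"',
  groupLabels: ['cluster', 'job'],
});

{
  linux: linux.dashboards,
  edge: edge.dashboards,
}
```

In this sketch the two overview dashboards get the uids `linux-overview` and `edge-overview`, and each query only matches series selected by its own filteringSelector, which is the property that lets both sets coexist.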

To learn more about the suggested mixin packaging format (observ-lib), see the dataless example here.

What is recreated/moved from the old node-mixin:

  • Dashboards except USE.
  • All alerts
  • All recording rules

Added/Updated:

  • Dashboards:
    • Fleet overview
    • Overview dashboard (added inventory row)
    • Drill down dashboards:
      • Memory
      • CPU and system
      • Network (Interfaces & Sockstat / Netstat)
      • Disks and filesystem
      • Logs (only added if enableLokiLogs:true)
    • MacOS overview (added inventory row)
    • MacOS logs (only added if enableLokiLogs:true)
  • Prometheus annotations (for all dashboards):
    • Reboot
    • OOM kill detected
    • Kernel update
  • New Loki annotations (only added if enableLokiLogs:true):
    • Service failed
    • Critical system event
    • Session opened
    • Session closed
  • Alerts:
    • Filesystem alerts are moved into a separate alert group to avoid hitting the limit on the number of alerts allowed per group (in Grafana Mimir)
  • Title case is used across all recreated panels and dashboard names

[3 screenshots]

@gaetanars

Just a quick question: why don't you render dashboards with a .json suffix?

@v-zhuravlev
Contributor Author

Just a quick question: why don't you render dashboards with a .json suffix?

No specific reason. It was a bit more convenient to reference dashboards by key (dashboards.overview, dashboards.network, etc.).
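
Key-based references and `.json` file names are not mutually exclusive. As a sketch only (the `dashboards` object below is a hypothetical placeholder, not the library's actual output), the suffix can be added at render time while the keys stay short:

```jsonnet
// Hypothetical dashboards keyed by short names, as in dashboards.overview.
local dashboards = {
  overview: { title: 'Overview' },
  network: { title: 'Network' },
};

// Map each key to a file name with a .json suffix for the rendered output.
{
  [name + '.json']: dashboards[name]
  for name in std.objectFields(dashboards)
}
```

Rendered with the jsonnet CLI's multi-file mode (`jsonnet -m <dir>`), this produces one file per dashboard, named `overview.json` and `network.json`.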

@SuperQ
Member

SuperQ commented Feb 3, 2024

If this is intended to replace the node-mixin, should we remove the old mixin?

@v-zhuravlev v-zhuravlev force-pushed the for-upstream-node-observ-lib branch from 150c28c to c75a766 Compare May 4, 2024 18:58
@v-zhuravlev
Contributor Author

Just a quick question: why don't you render dashboards with a .json suffix?

@gaetanars, I think you are right, rendered files are expected to have a .json suffix. Brought it back.

If this is intended to replace the node-mixin, should we remove the old mixin?

@SuperQ, we could, but we would need to either reimplement the USE dashboards as well or agree that they can be dropped.

@v-zhuravlev
Contributor Author

Just let me know what you think I should do to get this merged :)

@v-zhuravlev v-zhuravlev requested a review from gaetanars June 3, 2024 12:33
@v-zhuravlev
Contributor Author

@SuperQ, hi, sorry for the bump. Is there anything blocking this from moving forward? Thanks

@discordianfish
Member

Can it produce the same dashboards as the old mixin? Then we could remove the old one. If not, I don't know. We certainly don't want to support both.

@v-zhuravlev
Contributor Author

The only difference is the USE dashboards, which are not present in the new set:

USE dashboards (node-rsrc-use.json, node-cluster-rsrc-use.json, node-multicluster-rsrc-use.json)

@discordianfish
Member

@v-zhuravlev Could you add the USE dashboards as well? Then I'm open to replace the old mixins with this.

@v-zhuravlev
Contributor Author

@v-zhuravlev Could you add the USE dashboards as well? Then I'm open to replace the old mixins with this.

Sure, will add them shortly

@v-zhuravlev
Contributor Author

USE dashboard migration: image
Will add commits shortly.

@v-zhuravlev
Contributor Author

v-zhuravlev commented Nov 1, 2024

@v-zhuravlev Could you add the USE dashboards as well? Then I'm open to replace the old mixins with this.

  1. Added the USE dashboards and merged the node-mixin and node-observ-lib folders.
  2. Synced the latest PRs (AIX etc.)
  3. Included "Add NodeSystemdServiceCrashlooping alert to mixin" (#3039)

@discordianfish
Could you take a look, please?

USE:
image
USE cluster:
image

USE multicluster:
Before:
image
After:
image

* Add node-observ-lib

* Remove trends support (not in 10.0 schema)

* Make filteringSelector for logs dashboard configurable

* Temp change dependency (until PR is merged for commonlib)

* Refactor config

* Update jsonnetfile.json

* Update README

* Add separate loki example

* Add sep file example

Signed-off-by: Vitaly Zhuravlev <[email protected]>
prombot and others added 23 commits November 1, 2024 15:18
Signed-off-by: prombot <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Signed-off-by: Steve Wills <[email protected]>
Signed-off-by: Johannes Ziemke <[email protected]>
We already support reading from multiple directories though only using globs. Now we can specify them outright.

Example use case is exporting both static info on a RO FS generated during image building and traditional uses of textfiles (e.g. for R/W service metrics files) without scripting a file copy.

* keep flag name for compatibility
* clarify flag help text
* add test case (replicating the glob one)

Signed-off-by: eduarrrd <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Avoid Linux-specific code scattered in two places by moving it to the
already-existing zfs_linux.go.

Signed-off-by: Daniel Swarbrick <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Signed-off-by: Daniel Swarbrick <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Code does not modify zfsPoolStatesName slice, so make it an array.

Signed-off-by: Daniel Swarbrick <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Ensure identical factory function name across arch-specific files so
that the common init() function in zfs.go works.

Signed-off-by: Daniel Swarbrick <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Also add build-tags to ensure it is ignored on non-relevant archs.

Signed-off-by: Daniel Swarbrick <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Drop superfluous and overly pedantic typecasting for values that fit
within 32 bits or where type comparison is already hinted.

Signed-off-by: Daniel Swarbrick <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Avoid panic for accessing slice out of range in hwmon.

Fixes: prometheus#3108

Signed-off-by: Ben Kochie <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
Bumps [github.com/prometheus/common](https://github.com/prometheus/common) from 0.59.1 to 0.60.0.
- [Release notes](https://github.com/prometheus/common/releases)
- [Changelog](https://github.com/prometheus/common/blob/main/RELEASE.md)
- [Commits](prometheus/common@v0.59.1...v0.60.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/common
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
…theus#3140)

Bumps [github.com/mdlayher/wifi](https://github.com/mdlayher/wifi) from 0.2.0 to 0.3.0.
- [Release notes](https://github.com/mdlayher/wifi/releases)
- [Commits](mdlayher/wifi@v0.2.0...v0.3.0)

---
updated-dependencies:
- dependency-name: github.com/mdlayher/wifi
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Vitaly Zhuravlev <[email protected]>
@v-zhuravlev v-zhuravlev force-pushed the for-upstream-node-observ-lib branch from 8327300 to 6f48c3c Compare November 1, 2024 15:19
@discordianfish discordianfish self-assigned this Nov 2, 2024
@v-zhuravlev
Contributor Author

@discordianfish, hi! Any chance to check it? thanks!
