feat: traefik alerts #1460
Merged
Commits (16 in total; this view shows changes from 10 commits)
- f4268db feat: traefik alerts (wbollock)
- fe655d6 docs: update README for traefik (wbollock)
- a6c274b chore: make fmt (wbollock)
- f485e82 ref: make everything configurable (wbollock)
- 478d293 ref: only have warning alert fire if above crit (wbollock)
- afe04fe docs: mention config vars (wbollock)
- a113742 ref: match convention of config vars (wbollock)
- 0a440a6 docs: match new var names (wbollock)
- c3ebbc9 fix: add summary (wbollock)
- f1ba221 fix: template config reload with environment label (wbollock)
- 7b4b7a1 fix: better grouping for template (wbollock)
- 5b0f366 fix: make all (wbollock)
- 65f86e2 ref: remove config comments (wbollock)
- 899ab9c ref: remove environment label (wbollock)
- 2a612f0 fix: make all (wbollock)
- dae0987 fix: make fmt (wbollock)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,62 @@ | ||
| The Traefik mixin is a set of configurable, reusable, and extensible dashboards based on the metrics exported by Traefik itself. It also creates suitable dashboard descriptions for Grafana. | ||
| # traefik-mixin | ||
|
|
||
| The Traefik mixin is a set of configurable, reusable, and extensible dashboards based on the metrics exported by Traefik itself. It also creates suitable dashboard descriptions for Grafana. Lastly, some alerts are also included. | ||
|
|
||
| To use them, you need to have mixtool and jsonnetfmt installed. If you have a working Go development environment, it's easiest to run the following: | ||
|
|
||
| $ go get github.com/monitoring-mixins/mixtool/cmd/mixtool | ||
| $ go get github.com/google/go-jsonnet/cmd/jsonnetfmt | ||
| You can then build the Prometheus rules files alerts.yaml and rules.yaml and a directory dashboard_out with the JSON dashboard files for Grafana: | ||
| ```shell | ||
| go get github.com/monitoring-mixins/mixtool/cmd/mixtool | ||
| go get github.com/google/go-jsonnet/cmd/jsonnetfmt | ||
| ``` | ||
|
|
||
| You can then build the Prometheus rules files and dashboards for Grafana: | ||
|
|
||
| ```shell | ||
| make build | ||
| ``` | ||
|
|
||
| This will generate: | ||
|
|
||
| - Prometheus alerts in `prometheus_rules_out/prometheus_alerts.yaml` | ||
| - Prometheus rules in `prometheus_rules_out/prometheus_rules.yaml` (if you have rules defined) | ||
| - Grafana dashboards in `dashboards_out/` | ||
|
|
||
| ## Included Alerts | ||
|
|
||
| The following Prometheus alerts are included: | ||
|
|
||
| - **TraefikConfigReloadFailuresIncreasing**: Fires if Traefik is failing to reload its config. | ||
| - **TraefikTLSCertificatesExpiring**: Fires if Traefik is serving certificates that will expire very soon (critical, threshold configurable). | ||
| - **TraefikTLSCertificatesExpiringSoon**: Fires if Traefik is serving certificates that will expire soon (warning, threshold configurable, only fires if the expiry is less than the warning threshold but greater than the critical threshold). | ||
|
|
||
| ## Configuration | ||
|
|
||
| You can configure alert thresholds, selectors, and labels in `config.libsonnet`: | ||
|
|
||
| ```jsonnet | ||
| { | ||
| _config+:: { | ||
| traefik_tls_expiry_days_critical: 7, // critical threshold (days) | ||
| traefik_tls_expiry_days_warning: 14, // warning threshold (days) | ||
| filteringSelector: '', // optional metric label selector for all alerts | ||
| // Example: | ||
| // filteringSelector: "component=\"traefik\",environment=\"production\"", | ||
| groupLabels: 'job, environment', // for config reload alert (sum by) | ||
| instanceLabels: 'instance', // for TLS alerts (max by) | ||
| alertLabels: {}, // optional alert labels | ||
| // Example: | ||
| // alertLabels: { | ||
| // environment: 'production', | ||
| // component: 'traefik', | ||
| // }, | ||
| alertAnnotations: {}, // optional alert annotations | ||
| // Example: | ||
| // alertAnnotations: { | ||
| // runbook: 'https://runbooks.example.com/traefik-tls', | ||
| // grafana: 'https://grafana.example.com/d/traefik', | ||
| // }, | ||
| }, | ||
| } | ||
| ``` | ||
|
|
||
| $ make build | ||
| For more advanced uses of mixins, see https://github.com/monitoring-mixins/docs. | ||
| For more advanced uses of mixins, see [monitoring-mixins/docs](https://github.com/monitoring-mixins/docs). | ||
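
The configuration keys documented in the README diff above can also be overridden from a consuming file rather than by editing `config.libsonnet` in place. The following is a minimal sketch of that pattern, not something taken from this PR: the import path `traefik-mixin/mixin.libsonnet`, the `job="traefik"` selector, and the `team` label are assumptions chosen for illustration.

```jsonnet
// Sketch only: the import path below is an assumption, not confirmed by this PR.
local traefik = import 'traefik-mixin/mixin.libsonnet';

traefik {
  _config+:: {
    // Keep the warning threshold above the critical one so the warning
    // band (critical < days < warning) is not empty.
    traefik_tls_expiry_days_critical: 3,
    traefik_tls_expiry_days_warning: 10,
    // Restrict every alert expression to one job (hypothetical label value).
    filteringSelector: 'job="traefik"',
    // Merged into each alert's labels object (hypothetical label).
    alertLabels: { team: 'platform' },
  },
}
```

Because the alert rules build their labels as `{ severity: ... } + alertLabels`, a `severity` key placed in `alertLabels` would replace the built-in severity, so it is probably best left out of that object.
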
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| { | ||
| prometheusAlerts+:: { | ||
| groups+: [ | ||
| { | ||
| name: 'traefik', | ||
| rules: [ | ||
| { | ||
| alert: 'TraefikConfigReloadFailuresIncreasing', | ||
| expr: ||| | ||
| sum by (%(groupLabels)s, environment) (rate(traefik_config_reloads_failure_total{%(filteringSelector)s}[5m])) > 0 | ||
| ||| % $._config, | ||
| 'for': '5m', | ||
| labels: { | ||
| severity: 'critical', | ||
| } + std.get($._config, 'alertLabels', {}), | ||
| annotations: { | ||
| summary: 'Traefik is failing to reload its configuration.', | ||
| description: ||| | ||
| Traefik is failing to reload its config in {{ $labels.environment }}. | ||
| |||, | ||
| } + std.get($._config, 'alertAnnotations', {}), | ||
| }, | ||
| { | ||
| alert: 'TraefikTLSCertificatesExpiring', | ||
| expr: ||| | ||
| max by (%(instanceLabels)s, sans) ((last_over_time(traefik_tls_certs_not_after{%(filteringSelector)s}[5m]) - time()) / 86400) < %(traefik_tls_expiry_days_critical)s | ||
| ||| % $._config, | ||
| 'for': '5m', | ||
| labels: { | ||
| severity: 'critical', | ||
| } + std.get($._config, 'alertLabels', {}), | ||
| annotations: { | ||
| summary: 'A Traefik-served TLS certificate will expire very soon.', | ||
| description: ||| | ||
| The minimum number of days until a Traefik-served certificate expires is {{ printf "%%.0f" $value }} days on {{ $labels.sans }} which is below the critical threshold of %(traefik_tls_expiry_days_critical)s. | ||
| ||| % $._config, | ||
| } + std.get($._config, 'alertAnnotations', {}), | ||
| }, | ||
| { | ||
| alert: 'TraefikTLSCertificatesExpiringSoon', | ||
| expr: ||| | ||
| max by (%(instanceLabels)s, sans) ((last_over_time(traefik_tls_certs_not_after{%(filteringSelector)s}[5m]) - time()) / 86400) < %(traefik_tls_expiry_days_warning)s > %(traefik_tls_expiry_days_critical)s | ||
| ||| % $._config, | ||
| 'for': '5m', | ||
| labels: { | ||
| severity: 'warning', | ||
| } + std.get($._config, 'alertLabels', {}), | ||
| annotations: { | ||
| summary: 'A Traefik-served TLS certificate will expire soon.', | ||
| description: ||| | ||
| The minimum number of days until a Traefik-served certificate expires is {{ printf "%%.0f" $value }} days on {{ $labels.sans }} which is less than %(traefik_tls_expiry_days_warning)s but greater than %(traefik_tls_expiry_days_critical)s. | ||
| ||| % $._config, | ||
| } + std.get($._config, 'alertAnnotations', {}), | ||
| }, | ||
| ], | ||
| }, | ||
| ], | ||
| }, | ||
| } | ||
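
The rules above build their PromQL with Jsonnet's `%` string formatting, which substitutes `%(name)s` placeholders from the `_config` object and turns `%%` into a literal percent sign. The standalone sketch below reproduces that step for the critical TLS alert using the default values from this PR; the `config` local and the output field names are illustrative only.

```jsonnet
// Standalone illustration of the templating step; `config` mirrors the
// defaults shown in this PR, and the field names below are made up.
local config = {
  traefik_tls_expiry_days_critical: 7,
  traefik_tls_expiry_days_warning: 14,
  filteringSelector: '',
  instanceLabels: 'instance',
};

{
  // %(name)s placeholders are looked up in `config`; unused keys are ignored.
  critical_expr: 'max by (%(instanceLabels)s, sans) ((last_over_time(traefik_tls_certs_not_after{%(filteringSelector)s}[5m]) - time()) / 86400) < %(traefik_tls_expiry_days_critical)s' % config,
  // '%%' yields a single '%', so the Prometheus template keeps its printf "%.0f".
  description: '{{ printf "%%.0f" $value }} days, threshold %(traefik_tls_expiry_days_critical)s' % config,
}
```

With these defaults, `critical_expr` should render with an empty selector (`traefik_tls_certs_not_after{}`) and a threshold of `7`. The warning rule chains two comparison filters (`< warning > critical`), so it only returns series whose remaining days fall strictly between the two thresholds.
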
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| { | ||
| _config+:: { | ||
| // alert thresholds | ||
| traefik_tls_expiry_days_critical: 7, | ||
| traefik_tls_expiry_days_warning: 14, | ||
| filteringSelector: '', | ||
| // Example: | ||
| // filteringSelector: "component=\"traefik\",environment=\"production\"", | ||
| // for config reload alert | ||
| groupLabels: 'job', | ||
| // for TLS alerts | ||
| instanceLabels: 'instance', | ||
| alertLabels: {}, | ||
| // Example: | ||
| // alertLabels: { | ||
| // environment: 'production', | ||
| // component: 'traefik', | ||
| // }, | ||
| alertAnnotations: {}, | ||
| // Example: | ||
| // alertAnnotations: { | ||
| // runbook: 'https://runbooks.example.com/traefik-tls', | ||
| // grafana: 'https://grafana.example.com/d/traefik', | ||
| // }, | ||
| }, | ||
| } |
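
Since the warning alert only matches values strictly between the critical and warning thresholds, a configuration where the warning value is not larger than the critical value would silently produce a warning rule that never fires. A guard along the following lines could catch that at build time. It is a hypothetical addition and not part of this PR; `config` stands in for the relevant part of `_config`.

```jsonnet
// Hypothetical guard, not part of this PR: fail evaluation early when the
// thresholds are ordered in a way that leaves the warning band empty.
local config = {
  traefik_tls_expiry_days_critical: 7,
  traefik_tls_expiry_days_warning: 14,
};

assert config.traefik_tls_expiry_days_warning > config.traefik_tls_expiry_days_critical :
       'traefik_tls_expiry_days_warning must be greater than traefik_tls_expiry_days_critical';

config
```
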