*: support authentication and TLS for Alertmanager #1838
bwplotka merged 10 commits into thanos-io:master
bwplotka left a comment:
Awesome, very nice! It would be nice to add some e2e tests TBH, but it looks good from a static review (:
Branch updated from c76a305 to 358fdcf.
@bwplotka it's ready now.
bwplotka left a comment:
Nice! It looks great, but I have some suggestions. I think we improved the single alertmanagers.urls flag along the way, which is nice (: Thanks!
I think all of those are minor suggestions and this is generally good!
cmd/thanos/rule.go (outdated)
return err
}
var (
alertingcfg alert.AlertingConfig
I think we stick to camelCase here (alertingCfg), but it's not a big deal, it looks readable (:
pkg/alert/alert.go (outdated)
"msg", "sending alerts failed",
"alertmanager", u.Host,
"numAlerts", len(alerts),
"err", err)
I think the closing ) should go on the next line, formatting-wise (:
pkg/alert/alert.go (outdated)
| } | ||
| } | ||
|
|
||
| type AlertmanagerDoer interface { |
There was a problem hiding this comment.
Everywhere else we refer to it as Client; should we rename it here as well? (:
pkg/alert/alert.go (outdated)
| } | ||
|
|
||
| // Send an alert batch to all given Alertmanager URLs. | ||
| // Send an alert batch to all given Alertmanager client. |
There was a problem hiding this comment.
| // Send an alert batch to all given Alertmanager client. | |
| // Send an alert batch to all given Alertmanager clients. |
docs/components/rule.md (outdated)
### Alertmanager

The configuration format supported by the `--alertmanagers.config` and `--alertmanagers.config-file` flags is the following:
Can we mention something like:
"The configuration allows specifying multiple Alertmanagers. Those entries are treated as a single HA group. This means that an alert send failure is reported only if the Ruler fails to send to all instances."
I think we might be missing this, as users could use it in a different way (sharding alerts, fanout, etc.).
test/e2e/rule_test.go (outdated)
| r := rule(a.New(), a.New(), rulesDir, amCfg, []address{qAddr}, nil) | ||
| q := querier(qAddr, a.New(), []address{r.GRPC}, nil) | ||
|
|
||
| ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute) |
There was a problem hiding this comment.
1*time.Minute might not be enough for our sometimes slow CI; let's make it 3m.
right! it should be ok now.
test/e2e/rule_test.go (outdated)
| })) | ||
|
|
||
| // Update the Alertmanager file service discovery configuration. | ||
| writeAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort()) |
There was a problem hiding this comment.
| writeAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort()) | |
| writeRulerAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort()) |
test/e2e/rule_test.go (outdated)
| return nil | ||
| })) | ||
|
|
||
| // Update the Alertmanager file service discovery configuration. |
There was a problem hiding this comment.
It sounds like we are updating some Alertmanager file SD, not the Ruler's file SD for Alertmanager (: Can we clarify a bit?
I've removed writeAlertmanagerFileSD which wasn't really needed since it was only called once. Hopefully it's clearer now.
test/e2e/rule_test.go (outdated)
| <-exit | ||
| }() | ||
|
|
||
| // Wait for a couple of evaluations. |
There was a problem hiding this comment.
Can we comment on what we are waiting for?
| func TestRuleAlertmanagerFileSD(t *testing.T) { | ||
| a := newLocalAddresser() | ||
|
|
||
| am := alertManager(a.New()) |
There was a problem hiding this comment.
What do you think about this versus using an Alertmanager mock? I like the e2e compatibility check against a real Alertmanager. I guess it would be too hard to use a proper Alertmanager in TestRuleAlertmanagerHTTPClient as well? (:
I went with a "fake" Alertmanager for TestRuleAlertmanagerHTTPClient because Alertmanager doesn't support TLS and authentication natively, so we would have to deploy something else in front of it. Since the other tests still exercise the "real" Alertmanager API, I felt that it was worth the trade-off.
Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.
This totally makes sense; maybe worth a comment? (:
Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.
From an API perspective, yes (methods, required parameters, etc.), but it's always nice to have e2e tests against the actual implementation. This is useful to check against hidden invariants, etc.
Agreed. It would be worth revisiting this part when we add support for Alertmanager API v2. I quickly tried to hack something together with httputil.ReverseProxy but fell short.
Comment added.
Branch updated from 8150c9f to 69083ff.
bwplotka left a comment:
Awesome! Looks like the provider, some docs and the test timeout are the only things left to address (:
LGTM otherwise.
| // TODO(simonpasquier): add support for API version (v1 or v2). | ||
| type AlertmanagerConfig struct { | ||
| // HTTP client configuration. | ||
| HTTPClientConfig HTTPClientConfig `yaml:"http_config"` |
There was a problem hiding this comment.
Fair, it's just a bit more work, but I'm happy with this.
| func TestRuleAlertmanagerFileSD(t *testing.T) { | ||
| a := newLocalAddresser() | ||
|
|
||
| am := alertManager(a.New()) |
There was a problem hiding this comment.
This makes sense totally, worth to comment maybe? (:
Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.
From API perspective yes (methods, required parameters etc), but it's always nice to have e2e tests against actual implementation. This useful to check against hidden invariants etc.
test/e2e/rule_test.go
Outdated
| r := rule(a.New(), a.New(), rulesDir, amCfg, []address{qAddr}, nil) | ||
| q := querier(qAddr, a.New(), []address{r.GRPC}, nil) | ||
|
|
||
| ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute) |
This change adds support for authentication with basic auth, client certificates and bearer tokens. It also makes it possible to configure TLS settings for the Alertmanager endpoints. Most of the work leverages the existing Prometheus configuration format and code. In particular TLS certificate files are automatically reloaded whenever they change. Signed-off-by: Simon Pasquier <spasquie@redhat.com>
…re both defined Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Branch updated from 69083ff to f467acf.
Rdy for review? (:
yep
bwplotka left a comment:
🚄 Let's go! LGTM, thanks for this good work @simonpasquier
Thanks a lot for the speedy reviews!
| var userAgent = fmt.Sprintf("Thanos/%s", version.Version) | ||
|
|
||
| type AlertingConfig struct { | ||
| Alertmanagers []AlertmanagerConfig `yaml:"alertmanagers"` |
There was a problem hiding this comment.
Just thinking... It might be nice to also use alert_relabel_configs instead of --alert.label-drop? That would give users more flexibility.
What about also moving external labels into the config instead of the --label flag?
Not sure how far we want to take this configuration. @bwplotka WDYT?
(Sorry for the late comments, I haven't had time lately.)
Closes #606
Changes
This change adds support for authentication with basic auth, client
certificates and bearer tokens. It also makes it possible to configure TLS
settings for the Alertmanager endpoints.
Most of the work leverages the existing Prometheus configuration format
and code. In particular TLS certificate files are automatically reloaded
whenever they change.
Verification
End-to-end tests added to cover various HTTP client configurations (TLS, authentication) and file SD integration.