*: Add initial remote-write-receive component#811
Conversation
|
Finally <3 First of all, as you can see our CI does not like NOT checked errors (: and both: looks like it would be nice at least log them. |
d042091 to
162c12f
Compare
|
Looks like that made CI happy 🙂 . |
|
I think having a simple e2e test added would be good. I'll look into adding one. |
bwplotka
left a comment
There was a problem hiding this comment.
Generally LGTM, just super small nits, mostly around style. (:
Nice work!
cmd/thanos/remote-write-receive.go
Outdated
| ) | ||
|
|
||
| func registerRemoteWriteReceive(m map[string]setupFunc, app *kingpin.Application, name string) { | ||
| cmd := app.Command(name, "accept Prometheus remote write API requests (EXPERIMENTAL, this may change drastically without notice)") |
There was a problem hiding this comment.
| cmd := app.Command(name, "accept Prometheus remote write API requests (EXPERIMENTAL, this may change drastically without notice)") | |
| cmd := app.Command(name, "Accept Prometheus remote write API requests (EXPERIMENTAL, this may change drastically without notice)") |
cmd/thanos/remote-write-receive.go
Outdated
| return nil | ||
| }, | ||
| func(err error) { | ||
| level.Debug(logger).Log("msg", "initial load group errored", "err", err) |
There was a problem hiding this comment.
hmm, I don't get this log messages here and on every group. This code is invoked even though this group did not errof (but e.g other group errored). I think we can remove those debug logs. We will print error for problematic group via g.Run anyway.
There was a problem hiding this comment.
Sorry, this is a leftover from some debugging, I'll remove it.
| }, | ||
| func(err error) { | ||
| level.Debug(logger).Log("msg", "tsdb group errored", "err", err) | ||
| if err := localStorage.Close(); err != nil { |
There was a problem hiding this comment.
just checking .. localStorage.Close closes db as well?
cmd/thanos/remote-write-receive.go
Outdated
| func(err error) { | ||
| level.Debug(logger).Log("msg", "tsdb group errored", "err", err) | ||
| if err := localStorage.Close(); err != nil { | ||
| level.Error(logger).Log("msg", "Error stopping storage", "err", err) |
There was a problem hiding this comment.
all log lines should start lowercase
cmd/thanos/remote-write-receive.go
Outdated
|
|
||
| level.Debug(logger).Log("msg", "setting up metric http listen-group") | ||
| if err := metricHTTPListenGroup(g, logger, reg, httpMetricsBindAddr); err != nil { | ||
| level.Debug(logger).Log("msg", "metric listener errored", "err", err) |
There was a problem hiding this comment.
Replicated error handling. (rule "Handle error just once") (:
I think error wrap is the way to go (and removal of this log line).
pkg/receive/handler.go
Outdated
|
|
||
| var ( | ||
| m = cmux.New(listener) | ||
| httpl = m.Match(cmux.HTTP1Fast()) |
There was a problem hiding this comment.
Can you comment what this is for? It might be not straightforward.
pkg/receive/handler.go
Outdated
| return | ||
| } | ||
|
|
||
| err = h.receiver.Receive(&wreq) |
There was a problem hiding this comment.
we can do if err := ... ; err != nil as well
| package receive | ||
|
|
||
| import ( | ||
| "github.com/go-kit/kit/log" |
scripts/prometheus.yml
Outdated
| @@ -0,0 +1,10 @@ | |||
| # When the Thanos remote-write-receive component is started, | |||
There was a problem hiding this comment.
Hm.. can we add this to remote-write-receive command description instead? Might be easier to discover
|
|
||
| sleep 0.5 | ||
|
|
||
| if [ -n "${REMOTE_WRITE_ENABLED}" ] |
There was a problem hiding this comment.
Wow I did not know that this script still works (:
|
And yea, some tests would be nice. (: |
cmd/thanos/remote-write-receive.go
Outdated
|
|
||
| var metadata = &promMetadata{ | ||
| // Start out with the full time range. The shipper will constrain it later. | ||
| // TODO(fabxc): minimum timestamp is never adjusted if shipping is disabled. |
There was a problem hiding this comment.
Was this code copied from somewhere? Slightly confusing to see TODO(fabxc) in here, but not a big deal I guess.
There was a problem hiding this comment.
haha yea was surprised as well at the beginning (: It is copied indeed.
pkg/receive/handler.go
Outdated
| // Options for the web Handler. | ||
| type Options struct { | ||
| Receiver *ReceiveWriter | ||
| Context context.Context |
There was a problem hiding this comment.
This can be removed as it is not used.
metalmatze
left a comment
There was a problem hiding this comment.
Only a few minor things found next to what @bwplotka already discovered.
Overall really good stuff! 👍
pkg/receive/handler.go
Outdated
| // Handler serves various HTTP endpoints of the Prometheus server | ||
| type Handler struct { | ||
| readyStorage *tsdb.ReadyStorage | ||
| context context.Context |
There was a problem hiding this comment.
This can be removed as it is not used.
pkg/receive/handler.go
Outdated
| ReadTimeout: h.options.ReadTimeout, | ||
| } | ||
|
|
||
| errCh := make(chan error) |
There was a problem hiding this comment.
Yes, this should be a run.Group.
pkg/receive/handler.go
Outdated
| func (h *Handler) receive(w http.ResponseWriter, req *http.Request) { | ||
| defer func() { | ||
| if r := recover(); r != nil { | ||
| fmt.Println("panic recovered:", r, string(debug.Stack())) |
There was a problem hiding this comment.
Maybe we want to have a counter for how many panics were recovered?
pkg/receive/receivewriter.go
Outdated
| Appender() (storage.Appender, error) | ||
| } | ||
|
|
||
| type ReceiveWriter struct { |
There was a problem hiding this comment.
This is spelled receive.ReceiveWriter and should be renamed to receive.Writer to be more idiomatic.
d7765cc to
abf0029
Compare
|
@bwplotka @metalmatze I believe I addressed everything or answered the comments and added the receiver to the e2e test setup (the test is that a prometheus is successfully pushing an |
domgreen
left a comment
There was a problem hiding this comment.
👍 Looks solid overall just a few nits. 👯♂️
Sorry again for taking so long to get around to this :(
cmd/thanos/main.go
Outdated
| registerCompact(cmds, app, "compact") | ||
| registerBucket(cmds, app, "bucket") | ||
| registerDownsample(cmds, app, "downsample") | ||
| registerRemoteWriteReceive(cmds, app, "remote-write-receive") |
There was a problem hiding this comment.
nit - would shorten this to just receive/remote /receiver
cmd/thanos/remote-write-receive.go
Outdated
| cancel() | ||
| peer.Close(2 * time.Second) | ||
| }) | ||
| } |
There was a problem hiding this comment.
can we remove this whole block? Will receive need to peer via gossip? If we are removing gossip in the near future I think we should not add it to any new components and instead help people migrate to the new approach.
cmd/thanos/remote-write-receive.go
Outdated
| level.Debug(logger).Log("msg", "setting up grpc server") | ||
| { | ||
| var ( | ||
| logger = log.With(logger, "component", "receiver") |
There was a problem hiding this comment.
receiver > receive? or the verbose name? Let's keep it consistant.
pkg/component/component.go
Outdated
| Rule = sourceStoreAPI{component: component{name: "rule"}} | ||
| Sidecar = sourceStoreAPI{component: component{name: "sidecar"}} | ||
| Store = sourceStoreAPI{component: component{name: "store"}} | ||
| RemoteWrite = sourceStoreAPI{component: component{name: "remote_write"}} |
pkg/receive/handler.go
Outdated
|
|
||
| reqBuf, err := snappy.Decode(nil, compressed) | ||
| if err != nil { | ||
| fmt.Println("snappy decode error") |
scripts/quickstart.sh
Outdated
|
|
||
| if [ -n "${REMOTE_WRITE_ENABLED}" ] | ||
| then | ||
| ./thanos remote-write-receive \ |
There was a problem hiding this comment.
👍 this is great taht were adding it here as well.
There was a problem hiding this comment.
either we leave this or remove gossip 🙂
There was a problem hiding this comment.
@bwplotka what do you think on this ... personally, I think we should phase out gossip so (in a different PR) change this file to work via the new approach.
So for this PR I would remove receive from this and once the above work has been done add it back in.
There was a problem hiding this comment.
As we have the e2e tests, that works for me.
There was a problem hiding this comment.
As there is no git-history of this, I'm going to comment this out for now, but leave it. Does that sound like an ok compromise?
There was a problem hiding this comment.
I'm happy with that ... can you make sure a comment about why its commented is included 😄
cmd/thanos/remote-write-receive.go
Outdated
| dataDir string, | ||
| peer cluster.Peer, | ||
| ) error { | ||
| level.Info(logger).Log("msg", "setting up receiver") |
There was a problem hiding this comment.
Is it worth adding a warn log here saying that it is currently experimental?
abf0029 to
6c371a6
Compare
pkg/receive/handler.go
Outdated
| var ( | ||
| requestDuration = prometheus.NewHistogramVec( | ||
| prometheus.HistogramOpts{ | ||
| Name: "prometheus_http_request_duration_seconds", |
There was a problem hiding this comment.
We might want thanos prefix for those metrics? Or do we want to keep it similar to get advantage of common dashboards?
There was a problem hiding this comment.
Sorry that was a copy paste mistake. Yes lets make it thanos_http_....
pkg/receive/handler.go
Outdated
| return func(w http.ResponseWriter, r *http.Request) { | ||
| if h.isReady() { | ||
| f(w, r) | ||
| } else { |
There was a problem hiding this comment.
just return here, no need for else
pkg/receive/handler.go
Outdated
| ReadTimeout: h.options.ReadTimeout, | ||
| } | ||
|
|
||
| errCh := make(chan error) |
6c371a6 to
e1315c8
Compare
bwplotka
left a comment
There was a problem hiding this comment.
Still some unaddressed bit, plus some suggestions.. Overall LGTM, but bit major is not versioned remote read receive endpoint, thoughts? (:
IF all will be addressed -> LGTM from me.
cmd/thanos/receive.go
Outdated
| Default("./data").String() | ||
|
|
||
| m[name] = func(g *run.Group, logger log.Logger, reg *prometheus.Registry, tracer opentracing.Tracer, _ bool) error { | ||
| peer, err := newPeerFn(logger, reg, false, "", false) |
There was a problem hiding this comment.
You can actualy remove gossip stuff if you want, up to you - making sure deprecated stuff is in new component is quite not necessary. (: Not a blocker
cmd/thanos/receive.go
Outdated
| dataDir string, | ||
| peer cluster.Peer, | ||
| ) error { | ||
| level.Info(logger).Log("msg", "setting up receive") |
There was a problem hiding this comment.
Let's kill this message, below as setting up receive; this component is ...
cmd/thanos/receive.go
Outdated
| select { | ||
| case <-dbOpen: | ||
| break | ||
| // In case a shutdown is initiated before the dbOpen is released |
There was a problem hiding this comment.
Missing trailing period in comment (:
There was a problem hiding this comment.
Also I would remove this, I think this is quite clear.
cmd/thanos/receive.go
Outdated
| level.Debug(logger).Log("msg", "setting up grpc server") | ||
| { | ||
| var ( | ||
| logger = log.With(logger, "component", "receiver") |
There was a problem hiding this comment.
Why not doing this in very top?
| github.com/onsi/ginkgo v1.6.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE= | ||
| github.com/onsi/gomega v1.4.1 h1:PZSj/UFNaVp3KxrzHOcS7oyuWA7LoOY/77yCTEFu21U= | ||
| github.com/onsi/gomega v1.4.1/go.mod h1:C1qb7wdrVGGVU+Z6iS04AVkA3Q65CEZX59MT0QO5uiA= | ||
| github.com/opentracing-contrib/go-stdlib v0.0.0-20170113013457-1de4cc2120e7 h1:8KbikWulLUcMM96hBxjgoo6gTmCkG6HYSDohv/WygYU= |
pkg/receive/handler.go
Outdated
| } | ||
|
|
||
| readyf := h.testReady | ||
| router.Post("/receive", readyf(h.receive)) |
There was a problem hiding this comment.
Ok, so this starts to be some API. We already keep Prometheus like API to guide people and be intuitive. This means that remote receiver would be nice to use the same or similar... does Prometheus serves remote write endpoint? What endpoint looks like? api/v1/write ? As a basics something like v1 in path is useful.
There was a problem hiding this comment.
tend to agree but flexible on the url ... api/v1/receive feels more correct to me
There was a problem hiding this comment.
agreed with the prefix, going with /api/v1/receive, no strong opinion though
pkg/receive/handler.go
Outdated
| ReadTimeout: h.options.ReadTimeout, | ||
| } | ||
|
|
||
| errCh := make(chan error) |
1ab4011 to
1a7ea0c
Compare
57c0542 to
d459673
Compare
d459673 to
6cb54d4
Compare
|
I believe I addressed everything:
|
bwplotka
left a comment
There was a problem hiding this comment.
Awesome, all LGTM from my side.
| ) | ||
|
|
||
| func regCommonServerFlags(cmd *kingpin.CmdClause) ( | ||
| func regGRPCFlags(cmd *kingpin.CmdClause) ( |
There was a problem hiding this comment.
hahah I was starring at monitor 1m and thinking why you removed Gossip for everyone in this PR... then seen that actually you added this function.... Github troll =D
Changes
This adds the very first pieces of the proposal to add receiving remote-write support to Thanos. The features are just having a single tsdb that remote write requests are written into and that the component joins a mesh network to enable querying. All other features are still outstanding but can be worked on in parallel once this lands.
Verification
As this is very small and primarily hooks up existing components, the extend of my verification is that I added this new component to the quickstart script and verified that indeed data replicated to the endpoint receiving remote-write calls can be queried via the Thanos querier. Happy to add this to the e2e test suite though.
@bwplotka @domgreen
cc @metalmatze @mxinden @squat @s-urbaniak @ant31