In Cortex we recently introduced the store-gateway service (proposal) which, similarly to Thanos, sits in front of the bucket and is used to shard blocks across a pool of store-gateway instances (we also support a replication factor for HA in the read path).
The next step on the Cortex side would be introducing a consistency check. Because there is no strong coordination, there's no guarantee that the set of store-gateway instances picked by a querier has actually loaded the exact set of blocks the querier expects (i.e. a ring hash change might not have propagated yet, blocks could still be loading after a resharding, etc). For this reason, we would like the querier to double check which blocks each store-gateway actually queried, a piece of information that is not available right now.
Since the store-gateway internally uses the BucketStore we would need to expose the list of queried block IDs from the Series() API. So far, we're importing Thanos protobuf in Cortex in order to guarantee the Series() API endpoint exposed by Cortex store-gateway is compatible with Thanos, but it's a soft requirement (even if personally desirable).
I personally see a couple of options:
- Add `info` to `storepb.SeriesResponse` (in addition to `series` and `warnings`). The `info` would contain an entry with the list of queried block IDs. The `info` response wouldn't be included by default, but enabled through a new boolean flag in `SeriesRequest`. This option would allow Cortex to keep full API compatibility with Thanos. One downside is that only `BucketStore` would support it (OK for Cortex, maybe KO for Thanos).
- Refactor `BucketStore.Series()` into `BucketStore.SeriesWithInfo(req, srv) (SeriesInfo, error)` (not exposed through the protobuf) and have `BucketStore.Series()` just call `SeriesWithInfo()` and ignore the returned info. This way the Cortex `store-gateway` could directly call `BucketStore.SeriesWithInfo()` instead of `BucketStore.Series()`.
I've also considered using `Info()`, but there are a couple of downsides:
- We would like the list of block IDs queried for a specific `Series()` request and not the entire list of block IDs loaded in the `BucketStore`
- There would be no guarantee that the returned block IDs are the ones that were loaded when `Series()` was called
Thoughts?