
Expose queried block IDs from the BucketStore #2479

@pracucci


In Cortex we recently introduced the store-gateway service (proposal), which, similarly to the Thanos store gateway, sits in front of the bucket and is used to shard blocks across a pool of store-gateway instances (we also support a replication factor for HA on the read path).

The next step on the Cortex side would be introducing a consistency check. Due to the lack of strong coordination, there's no guarantee that the set of store-gateway instances picked by a querier has actually loaded the exact set of blocks the querier expects (e.g., a ring change may not have propagated yet, or blocks may still be loading after a resharding). For this reason, we would like the querier to double-check which blocks were actually queried by the store-gateways, and this piece of information is currently missing.
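A minimal sketch of the querier-side check described above, assuming the store-gateways can report the IDs of the blocks they queried (the missing piece this issue asks for). Block IDs are modeled as plain strings, and `missingBlocks` is a hypothetical helper, not an existing Cortex or Thanos function.

```go
package main

import (
	"fmt"
	"sort"
)

// missingBlocks returns the block IDs the querier expected to be queried
// but that no store-gateway reported as queried. Block IDs are plain strings
// (e.g. ULIDs rendered as text) to keep the sketch dependency-free.
func missingBlocks(expected []string, queriedPerGateway map[string][]string) []string {
	// Union of all block IDs reported by every store-gateway.
	queried := make(map[string]struct{})
	for _, ids := range queriedPerGateway {
		for _, id := range ids {
			queried[id] = struct{}{}
		}
	}

	var missing []string
	for _, id := range expected {
		if _, ok := queried[id]; !ok {
			missing = append(missing, id)
		}
	}
	sort.Strings(missing) // deterministic order for logging or retries
	return missing
}

func main() {
	expected := []string{"block-1", "block-2", "block-3"}
	// Hypothetical responses: gateway-b has not finished loading block-3 yet.
	queried := map[string][]string{
		"gateway-a": {"block-1"},
		"gateway-b": {"block-2"},
	}
	fmt.Println(missingBlocks(expected, queried)) // [block-3]
}
```

A non-empty result would tell the querier to retry the missing blocks against other replicas (or fail the query) instead of silently returning partial results.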

Since the store-gateway internally uses the BucketStore, we would need to expose the list of queried block IDs from the Series() API. So far, we import the Thanos protobuf definitions into Cortex to guarantee that the Series() API exposed by the Cortex store-gateway is compatible with Thanos, but this is a soft requirement (although a personally desirable one).

I personally see a couple of options:

  1. Add an info entry to storepb.SeriesResponse (in addition to series and warnings). The info would contain the list of queried block IDs. It wouldn't be included by default, but would be enabled through a new boolean flag in SeriesRequest. This option would allow Cortex to keep full API compatibility with Thanos. One downside is that only BucketStore would support it (OK for Cortex, maybe not OK for Thanos).
  2. Refactor BucketStore.Series() into BucketStore.SeriesWithInfo(req, srv) (SeriesInfo, error) (not exposed through the protobuf API) and have BucketStore.Series() simply call SeriesWithInfo() and discard the returned info. This way, the Cortex store-gateway could call BucketStore.SeriesWithInfo() directly instead of BucketStore.Series().
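The shape of option 2 can be sketched as follows. This is a heavily simplified, hypothetical model: `SeriesRequest`, `SeriesInfo`, `block` and the `send` callback stand in for the real protobuf request and gRPC stream the actual BucketStore.Series() receives, and the time-range filter is only a placeholder for the real block selection logic.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real request/response types.
type SeriesRequest struct {
	MinTime, MaxTime int64
}

type SeriesInfo struct {
	QueriedBlocks []string
}

type block struct {
	id               string
	minTime, maxTime int64
}

type BucketStore struct {
	blocks []block
}

// SeriesWithInfo runs the query and also reports which blocks were queried.
// It is a plain Go method, not exposed through the protobuf API.
func (s *BucketStore) SeriesWithInfo(req SeriesRequest, send func(string) error) (SeriesInfo, error) {
	var info SeriesInfo
	for _, b := range s.blocks {
		// Only blocks overlapping the requested time range are queried.
		if b.maxTime < req.MinTime || b.minTime > req.MaxTime {
			continue
		}
		info.QueriedBlocks = append(info.QueriedBlocks, b.id)
		// Placeholder for streaming this block's series to the client.
		if err := send("series from " + b.id); err != nil {
			return info, err
		}
	}
	return info, nil
}

// Series keeps the existing behavior: it delegates to SeriesWithInfo and
// discards the returned info.
func (s *BucketStore) Series(req SeriesRequest, send func(string) error) error {
	_, err := s.SeriesWithInfo(req, send)
	return err
}

func main() {
	store := &BucketStore{blocks: []block{
		{id: "block-1", minTime: 0, maxTime: 10},
		{id: "block-2", minTime: 10, maxTime: 20},
		{id: "block-3", minTime: 20, maxTime: 30},
	}}
	send := func(s string) error { fmt.Println(s); return nil }

	// Cortex store-gateway path: call SeriesWithInfo directly.
	info, err := store.SeriesWithInfo(SeriesRequest{MinTime: 15, MaxTime: 25}, send)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.QueriedBlocks) // [block-2 block-3]
}
```

The appeal of this approach is that the gRPC surface stays unchanged for Thanos, while an embedding service such as the Cortex store-gateway gets the extra information through a regular method call.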

I've also considered using Info(), but there are a couple of downsides:

  1. We want the list of block IDs queried by a specific Series() request, not the entire list of block IDs loaded in the BucketStore.
  2. There would be no guarantee that the returned block IDs are the ones that were loaded when Series() was called.
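Both downsides can be illustrated with a toy, hypothetical store (not the real BucketStore): `loadedBlockIDs` plays the role of an Info()-style snapshot of every loaded block, `queriedBlockIDs` plays the role of the per-request list a Series() call would report, and `addBlock` simulates a bucket sync happening between the two calls.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// store is a hypothetical, simplified block store: block ID -> [minTime, maxTime].
type store struct {
	mu     sync.RWMutex
	blocks map[string][2]int64
}

// loadedBlockIDs mimics an Info()-style answer: all currently loaded blocks.
func (s *store) loadedBlockIDs() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	ids := make([]string, 0, len(s.blocks))
	for id := range s.blocks {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// queriedBlockIDs mimics what a Series() call for [minT, maxT] would report:
// only the blocks overlapping the requested range at the time of the call.
func (s *store) queriedBlockIDs(minT, maxT int64) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var ids []string
	for id, r := range s.blocks {
		if r[1] >= minT && r[0] <= maxT {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

// addBlock simulates a periodic bucket sync loading a new block.
func (s *store) addBlock(id string, minT, maxT int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocks[id] = [2]int64{minT, maxT}
}

func main() {
	s := &store{blocks: map[string][2]int64{
		"block-1": {0, 10},
		"block-2": {10, 20},
	}}

	fromInfo := s.loadedBlockIDs()
	s.addBlock("block-3", 20, 30) // sync happens between Info() and Series()
	fromSeries := s.queriedBlockIDs(15, 25)

	// Info lists block-1, which the request never touched (downside 1),
	// and misses block-3, which it did query (downside 2).
	fmt.Println(fromInfo, fromSeries) // [block-1 block-2] [block-2 block-3]
}
```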

Thoughts?
