Commit ca2690d

Added binary index header implementation with benchmarks.
This PR adds an index-header implementation based on [this design](https://thanos.io/proposals/201912_thanos_binary_index_header.md/). It adds separate `indexheader.Binary*` structs and methods that allow building and reading the index-header in binary format.

## Stats

Size difference: for 10k series of my autogenerated data it's 2.1x. For a realistic block with 8mln series the gain is similar.

| file | 10k series | 8mln series |
|---|---|---|
| `index` | 6.1M | 1.9G |
| `index.cache.json` | 23K | 287M |
| `index-header` | 9.2K | 122M |

NOTE: The size is smaller, but that is not what we are trying to optimize for. Nevertheless, PostingOffsets and Symbols take a significant amount of bytes. The only downside regarding size is that to create such an index-header we have to fetch those two parts (~60MB each) from object storage. Idea for improvement if that becomes a problem: cache only every 32nd posting range and fetch the gaps between them on demand at query time (with some cache).
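A rough Go sketch of that improvement idea, for illustration only; nothing like it is part of this PR. `entry`, `sampledOffsets`, `newSampled` and `lookup` are hypothetical names, and the `fetch` callback is an assumed stand-in for an on-demand, possibly cached, range read from object storage. Only every 32nd entry stays in memory; a lookup binary-searches those samples and then fetches just the gap that can contain the requested value.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one (label value, postings offset) pair of a single label name,
// sorted by value as in the postings offset table.
type entry struct {
	value  string
	offset uint64
}

// sampledOffsets is a hypothetical in-memory structure that keeps only
// every `every`-th entry instead of the whole table.
type sampledOffsets struct {
	every   int
	samples []entry // samples[i] == full[i*every]
	total   int     // number of entries in the full table
}

func newSampled(full []entry, every int) *sampledOffsets {
	if every < 1 {
		every = 1 // guard against an infinite loop below
	}
	s := &sampledOffsets{every: every, total: len(full)}
	for i := 0; i < len(full); i += every {
		s.samples = append(s.samples, full[i])
	}
	return s
}

// lookup binary-searches the samples, then fetches only the gap between two
// neighbouring samples via fetch(from, to), a stand-in for an on-demand
// (and possibly cached) range read from object storage.
func (s *sampledOffsets) lookup(value string, fetch func(from, to int) []entry) (uint64, bool) {
	// i is the index of the first sample whose value sorts after the requested one.
	i := sort.Search(len(s.samples), func(j int) bool { return s.samples[j].value > value })
	if i == 0 {
		return 0, false // value sorts before the first entry
	}
	from := (i - 1) * s.every
	to := from + s.every
	if to > s.total {
		to = s.total
	}
	for _, e := range fetch(from, to) {
		if e.value == value {
			return e.offset, true
		}
	}
	return 0, false
}

func main() {
	var full []entry
	for i := 0; i < 100; i++ {
		full = append(full, entry{value: fmt.Sprintf("v%03d", i), offset: uint64(i * 10)})
	}
	s := newSampled(full, 32)
	fetch := func(from, to int) []entry { return full[from:to] }
	fmt.Println(s.lookup("v050", fetch)) // 500 true
}
```

Sampling trades memory for one extra (cacheable) range read per lookup; the factor of 32 is only the value floated above.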
Real-time latencies for creation and loading (without network traffic): for the 10k series block they are similar for both formats (milliseconds/microseconds), while for 8mln series we can spot the difference:

index-header:
* write 134.197732ms
* read 415.971774ms

index.cache.json:
* write 6.712496338s
* read 6.112222132s

## Go Benchmarks

Before comparing, I renamed the benchmarks so that old and new results line up:

* BenchmarkJSONReader-12 -> BenchmarkRead-12 (old)
* BenchmarkBinaryReader-12 -> BenchmarkRead-12 (new)
* BenchmarkJSONWrite-12 -> BenchmarkWrite-12 (old)
* BenchmarkBinaryWrite-12 -> BenchmarkWrite-12 (new)

### 10k series block

| benchmark | old ns/op | new ns/op | delta |
|---|---|---|---|
| BenchmarkRead-12 | 591780 | 66613 | -88.74% |
| BenchmarkWrite-12 | 2458454 | 6532651 | +165.72% |

| benchmark | old allocs | new allocs | delta |
|---|---|---|---|
| BenchmarkRead-12 | 2306 | 629 | -72.72% |
| BenchmarkWrite-12 | 1995 | 64 | -96.79% |

| benchmark | old bytes | new bytes | delta |
|---|---|---|---|
| BenchmarkRead-12 | 150904 | 32976 | -78.15% |
| BenchmarkWrite-12 | 161501 | 73412 | -54.54% |

The write CPU time for the smaller index file is interesting, as the binary writer is slower there; the absolute value is low anyway. It might be something to follow up on.

### 8mln series (the index takes 2GB, so it is not committed to git)

| benchmark | old ns/op | new ns/op | delta |
|---|---|---|---|
| BenchmarkRead-12 | 6221319001 | 502898265 | -91.92% |
| BenchmarkWrite-12 | 5609757863 | 286510336 | -94.89% |

| benchmark | old allocs | new allocs | delta |
|---|---|---|---|
| BenchmarkRead-12 | 20099976 | 5501314 | -72.63% |
| BenchmarkWrite-12 | 18263425 | 66 | -100.00% |

| benchmark | old bytes | new bytes | delta |
|---|---|---|---|
| BenchmarkRead-12 | 1873778853 | 406021704 | -78.33% |
| BenchmarkWrite-12 | 2133929462 | 8462761 | -99.60% |

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
1 parent 718e51a commit ca2690d

17 files changed (+1376, -106 lines)

docs/components/store.md

Lines changed: 59 additions & 0 deletions
@@ -221,3 +221,62 @@ While the remaining settings are **optional**:
- `max_get_multi_concurrency`: maximum number of concurrent connections when fetching keys. If set to `0`, the concurrency is unlimited.
- `max_get_multi_batch_size`: maximum number of keys a single underlying operation should fetch. If more keys are specified, internally keys are split into multiple batches and fetched concurrently, honoring `max_get_multi_concurrency`. If set to `0`, the batch size is unlimited.
- `dns_provider_update_interval`: the DNS discovery update interval.

## Index Header

In order to query series inside blocks from object storage, the Store Gateway has to know certain initial information about each block, such as:

* the symbol table, to un-intern string values
* the postings offset table, for postings lookup

To achieve this, on startup an `index-header` is built for each block from pieces of the original block's index and stored on disk.
This `index-header` file is then mmapped and used by the Store Gateway.

### Format (version 1)

The following describes the format of the `index-header` file found in each block's directory on the Store Gateway's local disk.
It is terminated by a table of contents which serves as an entry point into the index-header.

```
┌─────────────────────────────┬───────────────────────────────┐
│    magic(0xBAAAD792) <4b>   │      version(1) <1 byte>      │
├─────────────────────────────┼───────────────────────────────┤
│  index version(2) <1 byte>  │ index PostingOffsetTable <8b> │
├─────────────────────────────┴───────────────────────────────┤
│ ┌─────────────────────────────────────────────────────────┐ │
│ │      Symbol Table (exact copy from original index)      │ │
│ ├─────────────────────────────────────────────────────────┤ │
│ │      Posting Offset Table (exact copy from index)       │ │
│ ├─────────────────────────────────────────────────────────┤ │
│ │                           TOC                           │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

When the index is written, an arbitrary number of padding bytes may be added between the lined out main sections above. When sequentially scanning through the file, any zero bytes after a section's specified length must be skipped.

Most of the sections described below start with a `len` field. It always specifies the number of bytes just before the trailing CRC32 checksum. The checksum is always calculated over those `len` bytes.
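A minimal sketch of reading one such `len`-prefixed, checksummed section from an in-memory copy of the file. It assumes a 4-byte big-endian `len` and the CRC32 Castagnoli polynomial, as in the Prometheus TSDB index; `readSection` and `appendSection` are hypothetical helpers, not functions from this PR.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// Assumed to match the Prometheus TSDB index: CRC32 with the Castagnoli polynomial.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// readSection returns the `len` payload bytes of the section starting at off.
// Layout: len <4b, big-endian> | payload <len bytes> | CRC32 <4b> over payload.
func readSection(b []byte, off int) ([]byte, error) {
	if off < 0 || off+4 > len(b) {
		return nil, errors.New("section length out of bounds")
	}
	n := int(binary.BigEndian.Uint32(b[off : off+4]))
	start, end := off+4, off+4+n
	if end+4 > len(b) {
		return nil, errors.New("section out of bounds")
	}
	payload := b[start:end]
	want := binary.BigEndian.Uint32(b[end : end+4])
	if got := crc32.Checksum(payload, castagnoli); got != want {
		return nil, fmt.Errorf("checksum mismatch: got %08x, want %08x", got, want)
	}
	return payload, nil
}

// appendSection encodes payload in the same layout (used here to build input).
func appendSection(dst, payload []byte) []byte {
	dst = binary.BigEndian.AppendUint32(dst, uint32(len(payload)))
	dst = append(dst, payload...)
	return binary.BigEndian.AppendUint32(dst, crc32.Checksum(payload, castagnoli))
}

func main() {
	// Three zero padding bytes precede the section.
	b := appendSection([]byte{0, 0, 0}, []byte("hello"))
	p, err := readSection(b, 3)
	fmt.Println(string(p), err) // hello <nil>
}
```

When scanning sequentially instead of jumping to a TOC reference, the zero padding described above would have to be skipped before calling `readSection`.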

### Symbol Table

See [Symbols](https://github.com/prometheus/prometheus/blob/d782387f814753b0118d402ec8cdbdef01bf9079/tsdb/docs/format/index.md#symbol-table)

### Postings Offset Table

See [Posting Offset Table](https://github.com/prometheus/prometheus/blob/d782387f814753b0118d402ec8cdbdef01bf9079/tsdb/docs/format/index.md#postings-offset-table)

### TOC

The table of contents serves as an entry point to the entire index-header and points to its various sections.
If a reference is zero, it indicates the respective section does not exist and empty results should be returned upon lookup.

```
┌─────────────────────────────────────────┐
│ ref(symbols) <8b>                       │
├─────────────────────────────────────────┤
│ ref(postings offset table) <8b>         │
├─────────────────────────────────────────┤
│ CRC32 <4b>                              │
└─────────────────────────────────────────┘
```
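A minimal sketch of writing and reading this TOC at the end of the file. It assumes big-endian fields and a CRC32 (Castagnoli) checksum over the two 8-byte references, mirroring the Prometheus TSDB index TOC; `toc`, `appendTOC` and `readTOC` are hypothetical names rather than this PR's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// tocLen = ref(symbols) <8b> + ref(postings offset table) <8b> + CRC32 <4b>.
const tocLen = 8 + 8 + 4

// Assumed to match the Prometheus TSDB index: CRC32 with the Castagnoli polynomial.
var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// toc holds the section references; zero means "section does not exist".
type toc struct {
	symbols             uint64
	postingsOffsetTable uint64
}

// appendTOC appends t (big-endian) and a CRC32 over the two references to b.
func appendTOC(b []byte, t toc) []byte {
	start := len(b)
	b = binary.BigEndian.AppendUint64(b, t.symbols)
	b = binary.BigEndian.AppendUint64(b, t.postingsOffsetTable)
	return binary.BigEndian.AppendUint32(b, crc32.Checksum(b[start:], castagnoli))
}

// readTOC decodes the TOC from the last tocLen bytes of the file contents.
func readTOC(b []byte) (toc, error) {
	if len(b) < tocLen {
		return toc{}, errors.New("index-header too short to contain a TOC")
	}
	d := b[len(b)-tocLen:]
	if got, want := crc32.Checksum(d[:16], castagnoli), binary.BigEndian.Uint32(d[16:]); got != want {
		return toc{}, fmt.Errorf("TOC checksum mismatch: got %08x, want %08x", got, want)
	}
	return toc{
		symbols:             binary.BigEndian.Uint64(d[0:8]),
		postingsOffsetTable: binary.BigEndian.Uint64(d[8:16]),
	}, nil
}

func main() {
	file := make([]byte, 100) // stand-in for the header and sections
	file = appendTOC(file, toc{symbols: 14, postingsOffsetTable: 60})
	t, err := readTOC(file)
	fmt.Println(t.symbols, t.postingsOffsetTable, err) // 14 60 <nil>
}
```

A zero `symbols` or `postingsOffsetTable` value returned by `readTOC` would then be treated as a missing section, returning empty results on lookup.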

go.mod

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ require (
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4
github.com/prometheus/common v0.7.0
github.com/prometheus/procfs v0.0.6 // indirect
-github.com/prometheus/prometheus v1.8.2-0.20200107122003-4708915ac6ef // master ~ v2.15.2
+github.com/prometheus/prometheus v1.8.2-0.20200110114423-1e64d757f711 // master ~ v2.15.2
github.com/samuel/go-zookeeper v0.0.0-20190923202752-2cc03de413da // indirect
github.com/satori/go.uuid v1.2.0 // indirect
github.com/smartystreets/assertions v1.0.1 // indirect

go.sum

Lines changed: 2 additions & 2 deletions
@@ -446,8 +446,8 @@ github.com/prometheus/procfs v0.0.5/go.mod h1:4A/X28fw3Fc593LaREMrKMqOKvUAntwMDa
github.com/prometheus/procfs v0.0.6 h1:0qbH+Yqu/cj1ViVLvEWCP6qMQ4efWUj6bQqOEA0V0U4=
github.com/prometheus/procfs v0.0.6/go.mod h1:7Qr8sr6344vo1JqZ6HhLceV9o3AJ1Ff+GxbHq6oeK9A=
github.com/prometheus/prometheus v0.0.0-20180315085919-58e2a31db8de/go.mod h1:oAIUtOny2rjMX0OWN5vPR5/q/twIROJvdqnQKDdil/s=
-github.com/prometheus/prometheus v1.8.2-0.20200107122003-4708915ac6ef h1:pYYKXo/zGx25kyViw+Gdbxd0ItIg+vkVKpwgWUEyIc4=
-github.com/prometheus/prometheus v1.8.2-0.20200107122003-4708915ac6ef/go.mod h1:7U90zPoLkWjEIQcy/rweQla82OCTUzxVHE51G3OhJbI=
+github.com/prometheus/prometheus v1.8.2-0.20200110114423-1e64d757f711 h1:uEq+8hKI4kfycPLSKNw844YYkdMNpC2eZpov73AvlFk=
+github.com/prometheus/prometheus v1.8.2-0.20200110114423-1e64d757f711/go.mod h1:7U90zPoLkWjEIQcy/rweQla82OCTUzxVHE51G3OhJbI=
github.com/rcrowley/go-metrics v0.0.0-20181016184325-3113b8401b8a/go.mod h1:bCqnVzQkZxMG4s8nGwiZ5l3QUCyqpo9Y+/ZMZ9VjZe4=
github.com/rogpeppe/fastuuid v0.0.0-20150106093220-6724a57986af/go.mod h1:XWv6SoW27p1b0cqNHllgS5HIMJraePCO15w5zCzIWYg=
github.com/rogpeppe/fastuuid v1.2.0/go.mod h1:jVj6XXZzXRy/MSR5jhDC/2q6DgLz+nrA6LYCDYWNEvQ=

pkg/block/block.go

Lines changed: 3 additions & 1 deletion
@@ -27,8 +27,10 @@ const (
MetaFilename = "meta.json"
// IndexFilename is the known index file for block index.
IndexFilename = "index"
-// IndexCacheFilename is the canonical name for index cache file that stores essential information needed.
+// IndexCacheFilename is the canonical name for json index cache file that stores essential information.
IndexCacheFilename = "index.cache.json"
+// IndexHeaderFilename is the canonical name for binary index header file that stores essential information.
+IndexHeaderFilename = "index-header"
// ChunksDirname is the known dir name for chunks with compressed samples.
ChunksDirname = "chunks"

pkg/block/block_test.go

Lines changed: 4 additions & 31 deletions
@@ -2,7 +2,6 @@ package block

import (
"context"
-"io"
"io/ioutil"
"os"
"path"
@@ -12,7 +11,6 @@ import (

"github.com/fortytw2/leaktest"
"github.com/go-kit/kit/log"
-"github.com/pkg/errors"
"github.com/prometheus/prometheus/pkg/labels"
"github.com/thanos-io/thanos/pkg/objstore/inmem"
"github.com/thanos-io/thanos/pkg/testutil"
@@ -104,7 +102,7 @@ func TestUpload(t *testing.T) {
testutil.NotOk(t, err)
testutil.Assert(t, strings.HasSuffix(err.Error(), "/meta.json: no such file or directory"), "")
}
-testutil.Ok(t, cpy(path.Join(tmpDir, b1.String(), MetaFilename), path.Join(tmpDir, "test", b1.String(), MetaFilename)))
+testutil.Copy(t, path.Join(tmpDir, b1.String(), MetaFilename), path.Join(tmpDir, "test", b1.String(), MetaFilename))
{
// Missing chunks.
err := Upload(ctx, log.NewNopLogger(), bkt, path.Join(tmpDir, "test", b1.String()))
@@ -115,7 +113,7 @@ func TestUpload(t *testing.T) {
testutil.Equals(t, 1, len(bkt.Objects()))
}
testutil.Ok(t, os.MkdirAll(path.Join(tmpDir, "test", b1.String(), ChunksDirname), os.ModePerm))
-testutil.Ok(t, cpy(path.Join(tmpDir, b1.String(), ChunksDirname, "000001"), path.Join(tmpDir, "test", b1.String(), ChunksDirname, "000001")))
+testutil.Copy(t, path.Join(tmpDir, b1.String(), ChunksDirname, "000001"), path.Join(tmpDir, "test", b1.String(), ChunksDirname, "000001"))
{
// Missing index file.
err := Upload(ctx, log.NewNopLogger(), bkt, path.Join(tmpDir, "test", b1.String()))
@@ -125,7 +123,7 @@ func TestUpload(t *testing.T) {
// Only debug meta.json present.
testutil.Equals(t, 1, len(bkt.Objects()))
}
-testutil.Ok(t, cpy(path.Join(tmpDir, b1.String(), IndexFilename), path.Join(tmpDir, "test", b1.String(), IndexFilename)))
+testutil.Copy(t, path.Join(tmpDir, b1.String(), IndexFilename), path.Join(tmpDir, "test", b1.String(), IndexFilename))
testutil.Ok(t, os.Remove(path.Join(tmpDir, "test", b1.String(), MetaFilename)))
{
// Missing meta.json file.
@@ -136,7 +134,7 @@ func TestUpload(t *testing.T) {
// Only debug meta.json present.
testutil.Equals(t, 1, len(bkt.Objects()))
}
-testutil.Ok(t, cpy(path.Join(tmpDir, b1.String(), MetaFilename), path.Join(tmpDir, "test", b1.String(), MetaFilename)))
+testutil.Copy(t, path.Join(tmpDir, b1.String(), MetaFilename), path.Join(tmpDir, "test", b1.String(), MetaFilename))
{
// Full block.
testutil.Ok(t, Upload(ctx, log.NewNopLogger(), bkt, path.Join(tmpDir, "test", b1.String())))
@@ -170,31 +168,6 @@ func TestUpload(t *testing.T) {
}
}

-func cpy(src, dst string) error {
-sourceFileStat, err := os.Stat(src)
-if err != nil {
-return err
-}
-
-if !sourceFileStat.Mode().IsRegular() {
-return errors.Errorf("%s is not a regular file", src)
-}
-
-source, err := os.Open(src)
-if err != nil {
-return err
-}
-defer source.Close()
-
-destination, err := os.Create(dst)
-if err != nil {
-return err
-}
-defer destination.Close()
-_, err = io.Copy(destination, source)
-return err
-}
-
func TestDelete(t *testing.T) {
defer leaktest.CheckTimeout(t, 10*time.Second)()
