Skip to content

compact: add metric thanos_compactor_iterations_total#1733

Merged
bwplotka merged 5 commits intothanos-io:masterfrom
GiedriusS:add_iterations_complete_metric_compactor
Nov 13, 2019
Merged

compact: add metric thanos_compactor_iterations_total#1733
bwplotka merged 5 commits intothanos-io:masterfrom
GiedriusS:add_iterations_complete_metric_compactor

Conversation

@GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented Nov 8, 2019

Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.

One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.

Add this metric so that users could add alerts like:

rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d

Signed-off-by: Giedrius Statkevičius giedriuswork@gmail.com

There is thanos_objstore_bucket_last_successful_upload_time but still
that doesn't show if we were able to get through all of the cycle successfully.

Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.

One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.

Add this metric so that users could add alerts like:

```
rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d
```

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
@GiedriusS GiedriusS force-pushed the add_iterations_complete_metric_compactor branch from a18e0b3 to 8a9a90d Compare November 8, 2019 09:43
@GiedriusS GiedriusS closed this Nov 8, 2019
@GiedriusS GiedriusS reopened this Nov 8, 2019
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with small nit.

Let's register the metric no matter what since if it is run as a batch
job then this metric does not matter either way.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Copy link
Member

@FUSAKLA FUSAKLA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍
we should update the alerts at some point again maybe

Copy link
Member

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

CHANGELOG.md Outdated
- [#1573](https://github.com/thanos-io/thanos/pull/1573) `AliYun OSS` object storage, see [documents](docs/storage.md#aliyun-oss) for further information.
- [#1680](https://github.com/thanos-io/thanos/pull/1680) Add a new `--http-grace-period` CLI option to components which serve HTTP to set how long to wait until HTTP Server shuts down.
- [#1712](https://github.com/thanos-io/thanos/pull/1712) Rename flag on bucket web component from `--listen` to `--http-address` to match other components.
- [#1733](https://github.com/thanos-io/thanos/pull/1733) New metric `thanos_compactor_iterations_total` on Thanos Compactor which shows the number of successful iterations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit: add a period at the end here to keep them all full sentences.

Add a period at the end of an item in the CHANGELOG to keep it uniform.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
@bwplotka bwplotka merged commit a0747ef into thanos-io:master Nov 13, 2019
IKSIN pushed a commit to monitoring-tools/thanos that referenced this pull request Nov 26, 2019
* compact: add metric thanos_compactor_iterations_total

Add a metric called thanos_compactor_iterations_total that is a counter
and will get increased by 1 every time an iteration gets executed
successfully. This is needed in case --wait is specified and then our
Compactor could die. We need to alert on such a case.

One thing would be to alert on a restart of the container however that
is not the most flexible thing - it might still be OK as long as it
successfully finishes its job in time. However, it is impossible to know
that exact part ATM.

Add this metric so that users could add alerts like:

```
rate(thanos_compactor_iterations_total[1d]) == 0
FOR 3d
```

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: add entry

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* compact: simplify wait check

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* cmd: thanos: compact: remove wait check

Let's register the metric no matter what since if it is run as a batch
job then this metric does not matter either way.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>

* CHANGELOG: add period

Add a period at the end of an item in the CHANGELOG to keep it uniform.

Signed-off-by: Giedrius Statkevičius <giedriuswork@gmail.com>
Signed-off-by: Aleksey Sin <asin@ozon.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants