Thanos Query stops responding if not queried for some time #705

@sysblade

Thanos, Prometheus and Golang version used

Docker tags:

thanos:
  version: v0.2.1
prometheus:
  version: v2.5.0

What happened

When thanos-query is not used for some time, it stops answering when we try to query again and only works after a restart.
We noticed this after we shut down our persistent Grafana dashboard (shown on screens in the office): now, every time we open Grafana or the thanos-query interface, queries hang forever until we restart thanos-query.
CPU usage of all the Thanos-related processes is low, so it doesn't look like they are doing anything.
All the stores show as up in the thanos-query dashboard, but all queries fail (even up).
After restarting thanos-query, everything works fine for a few hours (I don't know the exact time).

What you expected to happen
Queries work fine, even after a long idle period.

How to reproduce it (as minimally and precisely as possible):

Don't run any queries for a few hours, then try to query.
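For anyone trying to confirm the hang without Grafana, here is a minimal sketch of a probe that sends a single up query with a client-side timeout. It assumes thanos-query serves the Prometheus-compatible /api/v1/query endpoint over HTTP; the address http://localhost:10902 mentioned below and the helper names build_query_url and probe are illustrative, not taken from our setup.

```python
import json
import socket
import urllib.error
import urllib.parse
import urllib.request


def build_query_url(base_url, expr="up"):
    """Build an instant-query URL for the Prometheus-compatible HTTP API."""
    params = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def probe(base_url, expr="up", timeout=10.0, opener=urllib.request.urlopen):
    """Run one instant query and return 'ok', 'hang', or 'error'.

    'hang' means no response arrived within `timeout` seconds, which is
    the symptom described in this report. `opener` is injectable so the
    function can be exercised without a live server.
    """
    url = build_query_url(base_url, expr)
    try:
        with opener(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except socket.timeout:
        # Read-phase timeout: connection accepted, but no answer came back.
        return "hang"
    except urllib.error.URLError as exc:
        # urllib wraps connect-phase timeouts in URLError.
        if isinstance(exc.reason, socket.timeout):
            return "hang"
        return "error"
    except (OSError, ValueError):
        # Other network failures or a non-JSON response body.
        return "error"
    if isinstance(body, dict) and body.get("status") == "success":
        return "ok"
    return "error"
```

Calling probe("http://localhost:10902") (assumed address) right after a restart and again after a few idle hours should show whether the result changes from ok to hang.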

Full logs to relevant components

Nothing in the logs except "context canceled" messages, logged because the HTTP queries time out on the Grafana side.

Anything else we need to know

Environment:

  • OS: Docker 18.09.0 on CentOS 7
  • Kernel (e.g. uname -a): Linux gc-euw1-prometheus-central-1 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Others:
