thanos-query crashes with "concurrent map iteration and map write" #1272

**Thanos, Prometheus and Golang version used**



thanos, version 0.5.0 (branch: HEAD, revision: 72820b3f41794140403fd04d6da82299f2c16447)
  build user:       circleci@eeac5eb36061
  build date:       20190606-10:53:12
  go version:       go1.12.5

**What happened**

In one of the Kubernetes clusters where we run thanos-query, it crashes every couple of minutes with either "fatal error: concurrent map iteration and map write" or "fatal error: concurrent map writes".
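For context on the two messages: the Go runtime raises them when it detects, on a best-effort basis, that a built-in map is being written by one goroutine while another goroutine writes to it or iterates over it. They are fatal runtime errors rather than panics, so `recover()` cannot catch them and the process exits. The sketch below is not Thanos code and does not claim to show where the bug is; `labelStore` and `hammer` are hypothetical names used only to illustrate the usual remedy, which is to guard the map with a mutex and hand out copies for iteration.

```go
package main

import (
	"fmt"
	"sync"
)

// labelStore is a hypothetical type (not from Thanos) that wraps a map
// with a RWMutex so that concurrent writers and readers are safe.
type labelStore struct {
	mu     sync.RWMutex
	labels map[string]string
}

func newLabelStore() *labelStore {
	return &labelStore{labels: make(map[string]string)}
}

// Set writes one entry while holding the exclusive lock.
func (s *labelStore) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.labels[k] = v
}

// Snapshot copies the map under the read lock, so callers can iterate
// over the result without racing against later writes.
func (s *labelStore) Snapshot() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.labels))
	for k, v := range s.labels {
		out[k] = v
	}
	return out
}

// hammer starts several goroutines that write and iterate at the same
// time, then returns the final number of entries in the store.
func hammer(writers, keysPerWriter int) int {
	s := newLabelStore()
	var wg sync.WaitGroup
	for w := 0; w < writers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < keysPerWriter; i++ {
				s.Set(fmt.Sprintf("w%d-k%d", w, i), "v")
				_ = s.Snapshot() // iterate while other goroutines write
			}
		}(w)
	}
	wg.Wait()
	return len(s.Snapshot())
}

func main() {
	fmt.Println(hammer(4, 100))
}
```

Without the locks in `Set` and `Snapshot`, a program like this can abort with the same two messages quoted above. Building a binary with Go's race detector (`-race`) is a common way to find the two conflicting goroutines, at the cost of extra CPU and memory.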

**What you expected to happen**

No crash :-)

**How to reproduce it (as minimally and precisely as possible)**:

I have no idea; I haven't found anything that triggers it. The same problem occurred in 0.4.0; I'm not sure about 0.3.0.
Thanos runs in a GKE cluster on GCP, and thanos-query is deployed via our own Helm chart. The crashing containers run:
```
  thanos query
      --log.level=debug
      --query.replica-label=prometheus_replica
      --grpc-server-tls-cert=/etc/certs/tls.crt
      --grpc-server-tls-key=/etc/certs/tls.key
      --store=dnssrv+_grpc._tcp.thanos-sidecars-prometheus.monitoring.svc
      --selector-label=location="REDACTED"
      --selector-label=stack="REDACTED"
      --selector-label=REDACTED
```

The same deployment (differing only in the selector-label values) crashes less often in a second GKE cluster and almost never in a third, although all of them receive similar (very low) traffic over gRPC.

These query instances serve as gRPC endpoints for a global thanos-query, which runs in a separate "observability" cluster and does not crash. They return recent data; older data is served from the bucket. They sit behind a GCP load balancer, which uses HTTP/2 between the LB and Thanos in GKE.

**Full logs to relevant components**

An example crash dump is here: https://gist.github.com/bjakubski/18a98f6f1fc2922e5056df3106fe1477
