SharedIndexInformer for CRD stops getting events #2798

@lumeche

Description

I have a Deployment running in a managed cluster (Azure) that uses kubernetes-client (4.10.3) to run an informer over a CRD defined in the cluster.

I fairly often see this kind of message in the logs, which leads me to believe that the connection to k8s is being closed quite frequently:

i.f.k.c.i.cache.ReflectorWatcher         : Watch closing

After I talked to the cluster admin, he told me that Azure places a sort of proxy between the clients and the k8s server. That proxy has rules for closing idle or long-lived connections, which most likely match what the informer's watch connection looks like.

From what I can see in the logs, the informer handles the connection close well most of the time, but sometimes it never recovers. From that point on, my code is no longer notified of changes to the CRs. My theory is that the thread responsible for reconnecting to k8s and updating the internal cache dies because of an unknown error.
Unfortunately, this error is hard to reproduce, and the pod where it happened was logging at WARN level, so there is not much to see there.
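
One way to check the "thread dies" theory the next time it happens, regardless of the configured log level, might be a JVM-wide uncaught-exception handler. This is only a sketch: `ThreadDeathLogger` is a hypothetical helper, not part of kubernetes-client, and it is an assumption that the informer's internal thread dies through an exception escaping `run()`. Exceptions thrown inside tasks handed to an executor via `submit()` are stored in the returned `Future` and would not reach this handler.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper (not part of kubernetes-client). */
final class ThreadDeathLogger {
    /** Description of the most recent thread death, or null if none was seen. */
    static final AtomicReference<String> lastDeath = new AtomicReference<>();

    private ThreadDeathLogger() {}

    /** Installs a JVM-wide handler for exceptions that escape a thread's run(). */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            lastDeath.set(thread.getName() + ": " + error);
            // Written to stderr directly, so the logger's level does not hide it.
            System.err.println("Thread " + thread.getName() + " died: " + error);
            error.printStackTrace();
        });
    }
}
```

Calling `ThreadDeathLogger.install()` once at application startup, before the informers are started, should be enough for the handler to apply to threads that do not set their own handler.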

One more comment: the cluster admin at my company (the cluster is managed by Azure) told me he observed the same problem with a Python client, which seems to suggest that something in the way Azure sets up its clusters does not get along with Kubernetes clients.

What I would like to have is some kind of notification/event that I could catch in order to kill the pod when the connection to Kubernetes is lost. That way the new pod will "hopefully" have a new connection.
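
Until something like that exists in the client, a rough workaround could be an application-level watchdog. The sketch below is plain Java and uses no kubernetes-client API; `InformerWatchdog`, `recordEvent`, and the injected clock are hypothetical names introduced for illustration. It assumes that the CRs change often enough (or that a heartbeat CR is updated periodically) for prolonged silence to be a meaningful signal. A quiet cluster would trigger it too, and callbacks replayed by a resync period may come from the local cache rather than the server, so they probably should not be counted as events.

```java
import java.time.Duration;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper (not part of kubernetes-client): fires once after prolonged silence. */
final class InformerWatchdog implements AutoCloseable {
    private final long maxSilenceNanos;
    private final LongSupplier nanoClock;   // injectable so the logic can be tested
    private final Runnable onStale;
    private final AtomicLong lastEventNanos;
    private final AtomicBoolean fired = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "informer-watchdog");
                t.setDaemon(true);          // never keeps the JVM alive on its own
                return t;
            });

    InformerWatchdog(Duration maxSilence, LongSupplier nanoClock, Runnable onStale) {
        this.maxSilenceNanos = maxSilence.toNanos();
        this.nanoClock = nanoClock;
        this.onStale = onStale;
        this.lastEventNanos = new AtomicLong(nanoClock.getAsLong());
    }

    /** Call this from the informer's add/update/delete callbacks. */
    void recordEvent() {
        lastEventNanos.set(nanoClock.getAsLong());
    }

    /** True when no event has been recorded for longer than maxSilence. */
    boolean isStale() {
        return nanoClock.getAsLong() - lastEventNanos.get() > maxSilenceNanos;
    }

    /** Runs onStale at most once, the first time staleness is observed. */
    void check() {
        if (isStale() && fired.compareAndSet(false, true)) {
            onStale.run();
        }
    }

    /** Schedules periodic staleness checks on a background daemon thread. */
    void start(Duration checkInterval) {
        long millis = checkInterval.toMillis();
        scheduler.scheduleAtFixedRate(this::check, millis, millis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

In the application this might be wired up as `new InformerWatchdog(Duration.ofMinutes(10), System::nanoTime, () -> System.exit(1))`, with `recordEvent()` called from the event handler registered on the informer and `start(Duration.ofSeconds(30))` called once. The ten-minute and thirty-second values are placeholders. Exiting the JVM relies on the Deployment restarting the container, which matches the "kill the pod" idea above.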
