Description
I have a Deployment object running in a managed cluster (Azure) that uses the kubernetes-client (4.10.3) to implement an informer over a given CRD I have in the cluster.
I'm getting this kind of error in the logs fairly often, which leads me to believe that the connection to k8s is being closed repeatedly:
```
i.f.k.c.i.cache.ReflectorWatcher : Watch closing
```
After talking to the cluster admin, I learned that Azure places a kind of proxy between clients and the Kubernetes API server. That proxy has rules for closing idle or long-lived connections, which most likely matches what the informer's watch connection looks like.
From what I can see in the logs, the informer handles the connection close well most of the time, but occasionally something happens from which it never recovers. At that point my code stops receiving updates for the CRs. My theory is that the thread responsible for reconnecting to k8s and updating the informer's internal cache dies with an unhandled error.
Unfortunately, this is a hard-to-reproduce error, and the pod where it happened was logging at WARN level, so there is not much I can see there.
Another comment: the cluster admin at my company (the cluster is managed by Azure) told me he has observed the same problem with a Python client, which seems to suggest that something in the way Azure sets up their clusters doesn't play well with Kubernetes clients in general.
What I would like to have is some kind of notification/event that I could catch so I can kill the pod when we lose the connection to Kubernetes. That way the replacement pod will "hopefully" get a fresh connection.
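In the meantime, one workaround I'm considering is a watchdog alongside the informer: record a timestamp from the informer's event handlers (`onAdd`/`onUpdate`/`onDelete`), and if the informer goes quiet for longer than a threshold, run a failure action such as `System.exit(1)` so Kubernetes restarts the pod. This is only a sketch of that idea, not a kubernetes-client API; `InformerWatchdog` and its methods are hypothetical names, and the idea assumes your CRs change often enough that prolonged silence really means a dead watch (otherwise you'd want to compare cached resource versions against a periodic list instead):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical watchdog sketch: if the informer stops delivering events
 * for longer than a threshold, invoke a failure action (in production,
 * System.exit(1) or flipping a liveness-probe flag) so the pod is
 * restarted with a fresh connection.
 */
public class InformerWatchdog {
    private final AtomicLong lastEventMillis = new AtomicLong(System.currentTimeMillis());
    private final long staleThresholdMillis;
    private final Runnable onStale;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "informer-watchdog");
                t.setDaemon(true);
                return t;
            });

    public InformerWatchdog(long staleThresholdMillis, Runnable onStale) {
        this.staleThresholdMillis = staleThresholdMillis;
        this.onStale = onStale;
    }

    /** Call this from the informer's onAdd/onUpdate/onDelete handlers. */
    public void recordActivity() {
        lastEventMillis.set(System.currentTimeMillis());
    }

    /** Start checking periodically whether the informer has gone quiet. */
    public void start(long checkPeriodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            long idle = System.currentTimeMillis() - lastEventMillis.get();
            if (idle > staleThresholdMillis) {
                onStale.run();
            }
        }, checkPeriodMillis, checkPeriodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

A resync interval on the informer (so it re-lists periodically and produces `onUpdate` calls even without real changes) would pair well with this, since it gives the watchdog a heartbeat independent of actual CR activity.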