Graduate "Pop pod from backoffQ when activeQ is empty" to GA #5535
base: master
Conversation
ania-borowiec commented Sep 16, 2025
- One-line PR description: Graduate the feature "Pop pod from backoffQ when activeQ is empty" to GA
- Issue link: Pop pod from backoffQ when activeQ is empty #5142
- Other comments:
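For context, the graduating feature lets the scheduler pop a pod from backoffQ when activeQ is empty instead of sitting idle. Below is a minimal Go sketch of that popping logic under stated assumptions; it is not the actual kube-scheduler implementation, and all type and field names are illustrative.

```go
package main

import "fmt"

// Pod is a stand-in for the scheduler's pod representation.
type Pod struct{ Name string }

// SchedulingQueue is an illustrative simplification of the
// kube-scheduler scheduling queue: activeQ holds pods ready to be
// scheduled, backoffQ holds pods waiting out their backoff period.
type SchedulingQueue struct {
	activeQ                []*Pod
	backoffQ               []*Pod
	popFromBackoffQEnabled bool // the behavior this KEP graduates
}

// Pop returns the next pod to schedule. With the feature enabled, an
// empty activeQ no longer leaves the scheduler idle while backoffQ
// has pods: the head of backoffQ is popped early instead.
func (q *SchedulingQueue) Pop() (*Pod, bool) {
	if len(q.activeQ) > 0 {
		p := q.activeQ[0]
		q.activeQ = q.activeQ[1:]
		return p, true
	}
	if q.popFromBackoffQEnabled && len(q.backoffQ) > 0 {
		p := q.backoffQ[0]
		q.backoffQ = q.backoffQ[1:]
		return p, true
	}
	return nil, false // nothing to schedule; the real scheduler blocks here
}

func main() {
	q := &SchedulingQueue{
		backoffQ:               []*Pod{{Name: "pending-pod"}},
		popFromBackoffQEnabled: true,
	}
	if p, ok := q.Pop(); ok {
		fmt.Println("popped early from backoffQ:", p.Name)
	}
}
```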
Force-pushed from a67bf8d to df15d50
@sanposhiho @macsko I'm a bit lost here: I tried to make a change similar to this one (f65b96f), but I clearly got something wrong, as my PR fails the tests.
Nits only from me.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: ania-borowiec. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
I think we should put it on hold, as we spotted some problems related to this feature: kubernetes/kubernetes#134249
/hold
The feature itself is technically correct, but in its current form it brings a risk of creating tight loops whenever other parts of the scheduling framework have gaps. It's a similar problem to why we cannot turn off periodic flushing of the unschedulable queue: it's very hard to rule out the possibility that the system misses some type of notification that would wake up a pod. Even if we can make the in-tree version 100% correct, other plugins may do things wrong, so we need safety nets based on backoff and periodic flushing.
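To make the failure mode above concrete, here is a hedged, self-contained sketch of how a plugin that keeps failing the same pod could turn early popping into a busy loop. Nothing here is real kube-scheduler code, and the loop is bounded purely for demonstration.

```go
package main

import "fmt"

func main() {
	backoffQ := []string{"stuck-pod"}
	activeQEmpty := true

	// With early pop enabled, an always-failing pod can be retried
	// immediately instead of waiting out its backoff, turning the
	// scheduling loop into a busy loop. Bounded to 5 iterations here
	// only so the example terminates.
	for attempts := 1; attempts <= 5; attempts++ {
		if activeQEmpty && len(backoffQ) > 0 {
			pod := backoffQ[0]
			backoffQ = backoffQ[1:]
			// Hypothetical plugin bug: scheduling always fails...
			fmt.Printf("attempt %d: scheduling %s failed\n", attempts, pod)
			// ...and the pod goes straight back to backoffQ with no
			// effective delay, so the next iteration pops it again.
			backoffQ = append(backoffQ, pod)
		}
	}
	fmt.Println("without backoff as a safety net, this loop would spin forever")
}
```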
I agree. The feature itself is correct, but it relies on the correctness of the entire kube-scheduler, which is difficult to ensure. We can consider changing the KEP approach to somehow mitigate such unrelated errors, but for now we have more important things to deal with.
Exactly, so I support not only finding a way to mitigate this, but also turning the feature off by default until we find that mitigation. We could obviously turn off the async preemption feature instead, but we see bigger value in keeping async preemption in its current form, as it most likely brings more value than the pop-from-backoffQ feature.
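For illustration, "turning it off by default" would amount to registering the feature gate with Default: false while it stays beta. The sketch below uses the k8s.io/component-base/featuregate API; the gate name SchedulerPopFromBackoffQ is an assumption taken from this KEP's context and should be verified against the KEP itself.

```go
package main

import (
	"fmt"

	"k8s.io/component-base/featuregate"
)

// Assumed gate name for illustration; check the KEP for the
// authoritative spelling.
const SchedulerPopFromBackoffQ featuregate.Feature = "SchedulerPopFromBackoffQ"

func main() {
	gate := featuregate.NewFeatureGate()
	// Registering the gate as Beta with Default: false is the
	// "disabled by default until we have a mitigation" option
	// discussed in this thread.
	if err := gate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		SchedulerPopFromBackoffQ: {Default: false, PreRelease: featuregate.Beta},
	}); err != nil {
		panic(err)
	}
	fmt.Println("enabled:", gate.Enabled(SchedulerPopFromBackoffQ)) // enabled: false
}
```

Cluster operators who still want the behavior could then opt back in via the scheduler's --feature-gates flag, which is the usual escape hatch when a beta default is flipped.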
Agree, that was my proposal as well: kubernetes/kubernetes#134249 (comment)
@macsko @dom4ha If not, do you think we're able to define any conditions under which we'd want this feature enabled by default or graduated to GA?