Skip to content

Conversation

ania-borowiec
Copy link

  • One-line PR description: Graduate the feature "Pop pod from backoffQ when activeQ is empty" to GA
  • Other comments:

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels Sep 16, 2025
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Sep 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 16, 2025
@ania-borowiec
Copy link
Author

@sanposhiho @macsko I'm a bit lost here, I tried to make a change similar to this one f65b96f but I clearly got something wrong, as my PR fails the tests.
Do you know how to fix this?

@wojtek-t wojtek-t self-assigned this Sep 17, 2025
Copy link
Member

@sanposhiho sanposhiho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits only from me

@sanposhiho
Copy link
Member

/assign @macsko

I'd like to make sure it's aligned with @macsko as well.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ania-borowiec
Once this PR has been reviewed and has the lgtm label, please ask for approval from macsko, wojtek-t. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@macsko
Copy link
Member

macsko commented Sep 25, 2025

I think we should put it on hold, as we spot some problems related to this feature: kubernetes/kubernetes#134249

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 25, 2025
@dom4ha
Copy link
Member

dom4ha commented Sep 25, 2025

I think we should put it on hold, as we spot some problems related to this feature: kubernetes/kubernetes#134249

The feature itself is technically correct, but in its current form it brings a risk of creating a tight loops whenever other parts of the scheduling framework have gaps.

It's a similar problem why we cannot turn off periodic flushing of the unschedulable queue, as it's very hard to exclude a possibility a system would miss some type of a notification waking up a pod.

Even if we can make the in-tree version 100% correct, other plugins may do things wrong, so we need safety nets based on some backoff and periodic flushing.

@macsko
Copy link
Member

macsko commented Sep 25, 2025

I think we should put it on hold, as we spot some problems related to this feature: kubernetes/kubernetes#134249

The feature itself is technically correct, but in its current form it brings a risk of creating a tight loops whenever other parts of the scheduling framework have gaps.

It's a similar problem why we cannot turn off periodic flushing of the unschedulable queue, as it's very hard to exclude a possibility a system would miss some type of a notification waking up a pod.

Even if we can make the in-tree version 100% correct, other plugins may do things wrong, so we need safety nets based on some backoff and periodic flushing.

I agree. The feature itself is correct, but it relies on the correctness of the entire kube-scheduler, which is difficult to ensure. We can consider changing the KEP approach to somehow mitigate such unrelated errors, but for now we have more important things to deal with.

@dom4ha
Copy link
Member

dom4ha commented Sep 25, 2025

We can consider changing the KEP approach to somehow mitigate such unrelated errors, but for now we have more important things to deal with.

Exactly, so I'm supporting not only finding a way to mitigate, but also turn it off by default for now until we find that mitigation.

We obviously could turn off async preemption feature instead, but we see a bigger value in having async preemption in the current form as it most likely brings a higher value than the pop from backoff feature.

@macsko
Copy link
Member

macsko commented Sep 25, 2025

Exactly, so I'm supporting not only finding a way to mitigate, but also turn it off by default for now until we find that mitigation.

We obviously could turn off async preemption feature instead, but we see a bigger value in having async preemption in the current form as it most likely brings a higher value than the pop from backoff feature.

Agree, that was my proposal as well: kubernetes/kubernetes#134249 (comment)

@ania-borowiec
Copy link
Author

@macsko @dom4ha
Does anything change if we merge kubernetes/kubernetes#134294 ?

If not, do you think we're able to define any conditions under which we want to have this feature enabled by default or graduated to GA?

@helayoty helayoty moved this to Backlog in SIG Scheduling Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

6 participants