fix: workflow DeleteFunc use Index data to improve processing speed(#14791) #14875
…rgoproj#14791) Signed-off-by: guanguxiansheng <[email protected]>
@guanguxiansheng We hit similar performance issues when migrating from version 3.5 to 3.6, and applying this change resolved them for us as well. The deletion operation appears to introduce significant overhead and slow down the event handler loop.
Indeed, this is a good discovery. By the informer's design, time-consuming operations should not run in the event handler; they belong in the workqueue.
Getting pods from the informer does speed things up, but there is a problem: under high load, the local informer cache may be inconsistent with the remote API server. This operation cannot tolerate that inconsistency (during reconcile, by contrast, a late update of the pod informer is tolerable), potentially leading to pod leaks because finalizers are never removed.
workflow/controller/controller.go
  wf := obj.(*unstructured.Unstructured)
+ podObjs, err := wfc.PodController.GetPodsByIndex(indexes.WorkflowIndex, indexes.WorkflowIndexValue(wf.GetNamespace(), wf.GetName()))
  if err != nil {
-     logger.WithError(err).Error(ctx, "Failed to list pods")
+     logger.WithError(err).Error(ctx, "Failed to get pods by index")
  }
- for _, p := range podList.Items {
+ for _, podObj := range podObjs {
This is fine, should we maybe also add a backup in case it isn't in the index?
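One way to picture the suggested backup is the sketch below. It uses only the standard library, and every name in it (Pod, podIndex, lister, podsForWorkflow) is hypothetical and stands in for the informer index and the API server call; it is not the controller's actual code.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for a Kubernetes pod; only the fields
// needed for this illustration are included.
type Pod struct {
	Namespace, Name, Workflow string
}

// podIndex is a hypothetical in-memory index keyed by "namespace/workflow",
// playing the role of the informer's workflow index.
type podIndex map[string][]Pod

// lister is a hypothetical stand-in for a direct (slow) API server list call.
type lister func(namespace, workflow string) ([]Pod, error)

// podsForWorkflow consults the index first and calls the lister only when
// the index has no pods for the workflow. The boolean result reports
// whether the fallback was used.
func podsForWorkflow(idx podIndex, list lister, namespace, workflow string) ([]Pod, bool, error) {
	key := namespace + "/" + workflow
	if pods, ok := idx[key]; ok && len(pods) > 0 {
		return pods, false, nil
	}
	pods, err := list(namespace, workflow)
	if err != nil {
		return nil, true, fmt.Errorf("fallback list for %s: %w", key, err)
	}
	return pods, true, nil
}

func main() {
	idx := podIndex{
		"argo/wf-a": {{Namespace: "argo", Name: "wf-a-1", Workflow: "wf-a"}},
	}
	list := func(namespace, workflow string) ([]Pod, error) {
		// Pretend the API server knows about one pod the index missed.
		return []Pod{{Namespace: namespace, Name: workflow + "-1", Workflow: workflow}}, nil
	}

	for _, wf := range []string{"wf-a", "wf-b"} {
		pods, usedFallback, err := podsForWorkflow(idx, list, "argo", wf)
		fmt.Println(wf, len(pods), usedFallback, err)
	}
}
```

Note that an empty index entry is ambiguous: it can mean the cache is stale or that the workflow simply has no pods, so this sketch pays for one direct list call in both cases.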
To solve the problem of incomplete index data, there are two ideas: 1. In the |
I prefer to move this loop outside of the event handler, to the workqueue. However, I think this approach is also acceptable, as large and multiple workflows generally do not coexist in a cluster. |
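The workqueue alternative mentioned here could look roughly like the following sketch. It is a simplified, standard-library-only illustration: cleanupQueue and its methods are hypothetical names, and a buffered channel stands in for a real controller workqueue.

```go
package main

import (
	"fmt"
	"sync"
)

// cleanupQueue is a hypothetical, minimal stand-in for a controller
// workqueue: the event handler records a key, and workers do the slow part.
type cleanupQueue struct {
	keys chan string
	wg   sync.WaitGroup
}

func newCleanupQueue(buffer int) *cleanupQueue {
	return &cleanupQueue{keys: make(chan string, buffer)}
}

// onDelete is what the event handler would call. It only enqueues the key,
// so it returns immediately as long as the buffer has room.
func (q *cleanupQueue) onDelete(namespace, name string) {
	q.keys <- namespace + "/" + name
}

// run starts the given number of workers; each calls cleanup for every key
// it receives until the queue is closed.
func (q *cleanupQueue) run(workers int, cleanup func(key string)) {
	for i := 0; i < workers; i++ {
		q.wg.Add(1)
		go func() {
			defer q.wg.Done()
			for key := range q.keys {
				cleanup(key)
			}
		}()
	}
}

// shutdown closes the queue and waits for outstanding cleanups to finish.
func (q *cleanupQueue) shutdown() {
	close(q.keys)
	q.wg.Wait()
}

func main() {
	q := newCleanupQueue(16)

	var mu sync.Mutex
	cleaned := map[string]bool{}
	q.run(2, func(key string) {
		// The per-pod loop (for example, finalizer removal) would live here.
		mu.Lock()
		defer mu.Unlock()
		cleaned[key] = true
	})

	q.onDelete("argo", "wf-a")
	q.onDelete("argo", "wf-b")
	q.shutdown()

	fmt.Println(len(cleaned))
}
```

The trade-off is that cleanup becomes asynchronous, so a real implementation would also need retry handling for keys whose cleanup fails.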
@shuangkun @isubasinghe hi, I chose to get pod information from |
…essing speed(argoproj#14791) Signed-off-by: guanguxiansheng <[email protected]>
Fixes #14791
Motivation
Try to fix #14791
We encountered performance issues in production: controller processing performance degrades significantly when running workflows at scale. Adding a throttling queue does not significantly improve speed.
Modifications
Modify DeleteFunc in controller.go to use index data instead of querying the API server directly, which improves processing speed.
Documentation
In shared_informer.go, a single goroutine iterates over nextCh and executes AddFunc, UpdateFunc, and DeleteFunc. If the pods.List request in DeleteFunc is slow, the next element of nextCh is delayed, which in turn blocks AddFunc and UpdateFunc.
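The blocking behaviour can be reproduced in miniature with the sketch below. It is a simplified model rather than the client-go implementation: notification and processListener are hypothetical names, and a channel receive stands in for the slow pods.List call.

```go
package main

import "fmt"

// notification is a hypothetical, simplified event: a kind plus an object key.
type notification struct {
	kind string // "add", "update", or "delete"
	key  string
}

// processListener models the listener loop in simplified form: a single
// goroutine reads from nextCh and invokes the handler for each item in order.
func processListener(nextCh <-chan notification, handle func(notification), done chan<- struct{}) {
	for n := range nextCh {
		handle(n)
	}
	close(done)
}

func main() {
	nextCh := make(chan notification, 2)
	done := make(chan struct{})
	deleteStarted := make(chan struct{})
	release := make(chan struct{})
	var order []string

	handle := func(n notification) {
		if n.kind == "delete" {
			close(deleteStarted)
			<-release // stands in for a slow pods.List call
		}
		order = append(order, n.kind+":"+n.key)
	}
	go processListener(nextCh, handle, done)

	nextCh <- notification{kind: "delete", key: "argo/wf-a"}
	nextCh <- notification{kind: "add", key: "argo/wf-b"}
	close(nextCh)

	<-deleteStarted
	// The add notification is already queued, but the only goroutine that
	// could handle it is still stuck inside the delete handler.
	close(release)
	<-done

	fmt.Println(order) // prints [delete:argo/wf-a add:argo/wf-b]
}
```

However long the delete handler waits, the add handler cannot start before it returns, which is the delay described above.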