Make scheduler-plugins the default gang scheduler. #1747
google-oss-prow[bot] merged 1 commit into kubeflow:master
@tenzen-y @johnugeorge PTAL. Related: kubeflow/common#209
/hold
cmd/training-operator.v1/main.go
Outdated
  " Now supporting TFJob, PyTorchJob, MXNetJob, XGBoostJob, PaddleJob. By default, all supported schemes will be enabled.")
- flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "none", "The scheduler to gang-schedule kubeflow jobs, defaults to none")
+ flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "none", "The scheduler to gang-schedule kubeflow jobs, defaults to none."+
+     " Now supporting node, volcano, scheduler-plugins, koord-scheduler.")
Suggested change:
-     " Now supporting node, volcano, scheduler-plugins, koord-scheduler.")
+     " Now supporting none, volcano, scheduler-plugins, koord-scheduler.")
Pull Request Test Coverage Report for Build 4412816813 (Coveralls)
67452d5 to e08164d
@tenzen-y Are there any blockers here?
@johnugeorge We need to cut a release on the common repository.
@tenzen-y created 0.4.7 release https://github.com/kubeflow/common/releases/tag/v0.4.7
@johnugeorge Thank you! @Syulin7 Can you update this PR with a new common library version?
/assign
cmd/training-operator.v1/main.go
Outdated
  " Now supporting TFJob, PyTorchJob, MXNetJob, XGBoostJob, PaddleJob. By default, all supported schemes will be enabled.")
- flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "none", "The scheduler to gang-schedule kubeflow jobs, defaults to none")
+ flag.StringVar(&gangSchedulerName, "gang-scheduler-name", "", "The scheduler to gang-schedule kubeflow jobs."+
+     " Now supporting volcano, default-scheduler, scheduler-plugins, koord-scheduler.")
Suggested change:
-     " Now supporting volcano, default-scheduler, scheduler-plugins, koord-scheduler.")
+     " Now Supporting volcano and scheduler-plugins. Note: If you set another scheduler name, the training-operator assumes it's the scheduler-plugins.")
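For readers following the thread, the fallback described in the suggested help text can be sketched roughly as below. This is a minimal illustration and not the code in this PR: `gangSchedulerKind` and its return labels are hypothetical names, and treating the empty default as "gang scheduling disabled" is an assumption based only on the flag's new default value of `""`.

```go
package main

import (
	"flag"
	"fmt"
)

// gangSchedulerKind is a hypothetical helper (not part of the training-operator).
// It mirrors the rule stated in the flag help text: "volcano" is handled as
// volcano, and any other non-empty name is treated as scheduler-plugins.
// Assumption: an empty name means gang scheduling is turned off.
func gangSchedulerKind(name string) string {
	switch name {
	case "":
		return "disabled"
	case "volcano":
		return "volcano"
	default:
		// Covers "scheduler-plugins" itself and any custom scheduler name.
		return "scheduler-plugins"
	}
}

func main() {
	// A dedicated FlagSet keeps the sketch independent of os.Args.
	fs := flag.NewFlagSet("training-operator", flag.ContinueOnError)
	var gangSchedulerName string
	fs.StringVar(&gangSchedulerName, "gang-scheduler-name", "",
		"The scheduler to gang-schedule kubeflow jobs.")

	// Simulate an operator started with a scheduler name other than volcano.
	if err := fs.Parse([]string{"--gang-scheduler-name=koord-scheduler"}); err != nil {
		panic(err)
	}
	fmt.Println(gangSchedulerKind(gangSchedulerName)) // prints "scheduler-plugins"
}
```

Under these assumptions, only `volcano` needs to be named explicitly; every other non-empty value takes the scheduler-plugins path.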
Signed-off-by: Syulin7 <735122171@qq.com>
tenzen-y left a comment
@Syulin7 Great! Thank you for the awesome contribution!
/lgtm
/assign @johnugeorge
/hold cancel
Thanks @Syulin7. We need to update the docs (https://www.kubeflow.org/docs/components/training/job-scheduling/#running-jobs-with-gang-scheduling) regarding this.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: johnugeorge, Syulin7
* Support for k8s v1.25 in CI
* Support for k8s v1.25 in CI
* Change k8s api to v1.25
* Upgrade golangci-lint version
* Add changelog
* Update CHANGELOG.md
* Update Changelog
* Merge common repo
* Avoid to depend on local env when installing the code-generators (#1810)
  Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Make scheduler-plugins the default gang scheduler. (#1747)
  Signed-off-by: Syulin7 <735122171@qq.com>
* Fix tests
* Fix merge conflicts
* Fix CI issues
* Fix CI issues
* Fix review comments
* Add contributors in Readme file
* Fix review comments
* Fix review comments
---------
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Syulin7 <735122171@qq.com>
Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Co-authored-by: yu lin <37265556+Syulin7@users.noreply.github.com>
What this PR does / why we need it:
Training Operator now supports multiple gang schedulers (volcano, scheduler-plugins), so the koordinator gang scheduler can now be added easily.
Related: #1746
Reviewers can check the koordinator gang schedule feature with the koordinator in the following steps:
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #1746
Checklist: