Add check pods are not scheduled when testing gang-scheduler integrations in e2e #1835
google-oss-prow[bot] merged 1 commit into kubeflow:master from tenzen-y:fix-gang-scheduling-e2e
Pull Request Test Coverage Report for Build 5302424174
💛 - Coveralls
Maybe this PR will resolve #1832?
cc: @lowang-bh
if client.is_job_running(name, namespace, job_kind):
    raise Exception(f"{job_kind} shouldn't be in Running condition")
# Job shouldn't have a Running condition.
if client.is_job_running(name, namespace, job_kind):
Can you explain what you mean by "pending for a while"? Are you referring to a job that has been created but is not yet running? If so, will the job get into the Running state after a retry?
I meant unschedulable pods (gang scheduling).
If this test code reads the job condition before the training-operator updates it from Running=false to Running=true, and the job has Running=false or no Running condition at all, the test passes unintentionally.
So, let's imagine the following situation:
Current e2e:
- Test: Deploys a job with the gang scheduling setting (.runPolicy.schedulingPolicy).
- Operator: Fails to set schedulerName=volcano on the job, or creates an incorrect PodGroup.
- Test: Gets the job with Running=false or without a Running condition.
- Pods: Pods are immediately scheduled to a Node and start, since the job doesn't have appropriate gang scheduling settings.
- Operator: Updates the job condition with Running=true.
- Test: Succeeded! (Unintended)
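The race in the steps above can be reproduced with a small sketch. It contrasts a single condition check with polling over a window, which is the idea behind checking that a job stays pending "for a while". Everything here is hypothetical and for illustration only: `assert_job_stays_pending`, `FakeJob`, and the timeout and interval defaults are not taken from the training-operator e2e code.

```python
import time
from typing import Callable


def assert_job_stays_pending(
    is_running: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fail if the job reports Running at any poll within the window.

    Hypothetical helper: `is_running` stands in for a call such as
    client.is_job_running(name, namespace, job_kind).
    """
    checks = max(1, int(timeout_s / interval_s))
    for _ in range(checks):
        if is_running():
            raise AssertionError("job shouldn't be in Running condition")
        sleep(interval_s)


class FakeJob:
    """Hypothetical stand-in that reports Running from the Nth poll on."""

    def __init__(self, running_from_poll: int):
        self.polls = 0
        self.running_from_poll = running_from_poll

    def is_running(self) -> bool:
        self.polls += 1
        return self.polls >= self.running_from_poll


if __name__ == "__main__":
    # A job whose pods were scheduled by mistake flips to Running on poll 3.
    broken = FakeJob(running_from_poll=3)
    print("single check sees Running:", broken.is_running())

    broken = FakeJob(running_from_poll=3)
    try:
        assert_job_stays_pending(broken.is_running, timeout_s=5, sleep=lambda s: None)
    except AssertionError as err:
        print("polling caught it:", err)
```

A single check only observes the state at one instant, so it cannot distinguish an unschedulable job from one that has not been updated yet. Polling over a window narrows that gap, at the cost of a longer test.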
Note: This verify_unschedulable_job_e2e function verifies that gang scheduler integrations work well.
Thanks for the explanation.
# TODO (tenzen-y): Implement E2E tests using volcano.
elif gang_scheduler_name == TEST_GANG_SCHEDULER_NAME_VOLCANO:
    return ""
    return TEST_GANG_SCHEDULER_NAME_VOLCANO
@johnugeorge In fact, even though we forgot to set volcano to schedulerName in the podSpec, e2e passed in #1831.
This is necessary; it is used to set the scheduler for the gang-scheduling e2e. I forgot to change it in the last PR, sorry.
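As a rough sketch of why the return value matters, the snippet below maps a gang scheduler name to the `schedulerName` placed in a pod spec. The function name `pod_scheduler_name`, the constant's value `"volcano"`, and the dict-shaped pod spec are assumptions made for illustration; only the constant name `TEST_GANG_SCHEDULER_NAME_VOLCANO` appears in the diff.

```python
# Assumed value; the real constant is defined in the e2e test code.
TEST_GANG_SCHEDULER_NAME_VOLCANO = "volcano"


def pod_scheduler_name(gang_scheduler_name: str) -> str:
    """Return the schedulerName for the pod spec (hypothetical helper)."""
    if gang_scheduler_name == TEST_GANG_SCHEDULER_NAME_VOLCANO:
        # Returning "" here would leave pods on the default scheduler,
        # so the gang-scheduling path would never be exercised.
        return TEST_GANG_SCHEDULER_NAME_VOLCANO
    return ""


def build_pod_spec(gang_scheduler_name: str) -> dict:
    """Build a minimal dict-shaped pod spec (illustrative only)."""
    spec = {"containers": [{"name": "worker", "image": "example/image"}]}
    scheduler = pod_scheduler_name(gang_scheduler_name)
    if scheduler:
        spec["schedulerName"] = scheduler
    return spec
```

With an empty return value the pod spec carries no `schedulerName`, so the pods fall back to the default scheduler and start immediately.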
…ions in e2e Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: johnugeorge, tenzen-y
What this PR does / why we need it:
When gang scheduling is enabled, the e2e tests don't check whether jobs stay pending for a while.
So the gang-scheduling tests pass if jobs meet the Created=true and Running=false conditions for just a moment.
I added a check that jobs stay pending for a while.
Also, I fixed a test bug where volcano wasn't set as the schedulerName on jobs when testing the volcano integration.
Note: I faced errors in #1834.
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #
Checklist: