Implement webhook validation for the TFJob#2051
Implement webhook validation for the TFJob#2051google-oss-prow[bot] merged 1 commit intokubeflow:masterfrom tenzen-y:implement-tfjob-webhook-validations
Conversation
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold |
|
/assign @andreyvelich @johnugeorge This is the next PR to introduce webhook validations. PTAL, thanks. |
|
Some tests are timing out |
I guess that some tests reach GHA limitations... |
This error seems to be related it: docker/buildx#841 |
|
/lgtm |
andreyvelich
left a comment
There was a problem hiding this comment.
Thanks for this @tenzen-y!
A few comments.
| func IsChiefOrMaster(typ ReplicaType) bool { | ||
| return typ == TFJobReplicaTypeChief || typ == TFJobReplicaTypeMaster | ||
| } | ||
|
|
There was a problem hiding this comment.
@tenzen-y Does it make sense to move this function in webhooks/tensorflow/tfjob_webhook.go since we use it there and it will help us to avoid creation of additional file named tensorflow_types_test.go?
There was a problem hiding this comment.
Indeed. I'm wondering if we can put this function in the webhook package.
However, I decided to put this here since this function is not only for the webhooks like this: https://github.com/search?q=repo%3Akubeflow%2Ftraining-operator%20IsChiefOrMaster&type=code
There was a problem hiding this comment.
@tenzen-y Do we really need this check in the controller if we already verify if TFJob contains Chief or Master here: https://github.com/kubeflow/training-operator/blob/c20422067e3ef81df39d03c6f285353344d8f77d/pkg/controller.v1/tensorflow/tfjob_controller.go#L452C6-L452C31 ?
There was a problem hiding this comment.
Yes, we need it since the webhook verifies only if the master/chief role replica is 0 or 1.
When TFJob doesn't have any master/chief role, the training operator switches there based on the check.
There was a problem hiding this comment.
Alright, since we need additional check to verify if rtype is Chief of Master
| if errors := apimachineryvalidation.NameIsDNS1035Label(job.Name, false); len(errors) != 0 { | ||
| allErrs = append(allErrs, field.Invalid(field.NewPath("metadata").Child("name"), job.Name, fmt.Sprintf("should match: %v", strings.Join(errors, ",")))) | ||
| } |
There was a problem hiding this comment.
Should we move this validation to the helper function in /webhooks/utils/utils.go since this check is common to all Framework ?
There was a problem hiding this comment.
Yes, we can commonize these functions, but as I mentioned in the description, commonizing in a separate PR would be better. Because the minimum validation change could prove that webhook validations would not bring any bugs/breaks.
Although we potentially commonize podTemplate validations, I leave such commonize in the follow-ups.
There was a problem hiding this comment.
Sure, that sounds good @tenzen-y!
We might need to track it in the separate issue.
| for idx, container := range rSpec.Template.Spec.Containers { | ||
| if container.Image == "" { | ||
| allErrs = append(allErrs, field.Required(containerPath.Index(idx).Child("image"), "must be required")) | ||
| } | ||
| if container.Name == trainingoperator.TFJobDefaultContainerName { | ||
| defaultContainerPresent = true | ||
| } | ||
| } |
There was a problem hiding this comment.
Similar to above comment maybe that could be moved to the helper function in which we pass container name for the check.
andreyvelich
left a comment
There was a problem hiding this comment.
Thank you @tenzen-y!
/lgtm
| func IsChiefOrMaster(typ ReplicaType) bool { | ||
| return typ == TFJobReplicaTypeChief || typ == TFJobReplicaTypeMaster | ||
| } | ||
|
|
There was a problem hiding this comment.
Alright, since we need additional check to verify if rtype is Chief of Master
|
Thanks all! /hold cancel |
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
What this PR does / why we need it:
I implemented webhook validations for the TFJob.
Additionally, I didn't add any additional validations. The traininig-operator has the same validations the same as before.
Although we potentially commonize podTemplate validations, I leave such commonize in the follow-ups.
This PR depends on #2035
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Part-of #1993
Checklist: