

@takoverflow
Member

What this PR does / why we need it:

  • Use machine NamespacedName when populating/checking pendingMachineCreationMap
  • Populate the map with the machine name inside triggerCreationFlow
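A minimal sketch of the keying change, assuming the pending-creation state is held in a sync.Map-style concurrent map. The helper names (machineKey, markCreationPending, isCreationPending, clearCreationPending) and the pendingCreations type are hypothetical and not taken from the PR. In the controller the key would more likely come from types.NamespacedName{Namespace: ..., Name: ...}.String() in k8s.io/apimachinery, which yields the same "namespace/name" form; a plain string is used here so the sketch needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
)

// machineKey builds a fully qualified "namespace/name" key, mirroring the
// format produced by types.NamespacedName.String().
func machineKey(namespace, name string) string {
	return namespace + "/" + name
}

// pendingCreations is a hypothetical stand-in for pendingMachineCreationMap.
type pendingCreations struct {
	m sync.Map
}

// markCreationPending records that the creation flow has started for a machine
// (conceptually, what triggerCreationFlow does on entry).
func (p *pendingCreations) markCreationPending(namespace, name string) {
	p.m.Store(machineKey(namespace, name), struct{}{})
}

// isCreationPending reports whether the creation flow is still running.
func (p *pendingCreations) isCreationPending(namespace, name string) bool {
	_, ok := p.m.Load(machineKey(namespace, name))
	return ok
}

// clearCreationPending removes the entry once the creation flow returns.
func (p *pendingCreations) clearCreationPending(namespace, name string) {
	p.m.Delete(machineKey(namespace, name))
}

func main() {
	var p pendingCreations
	p.markCreationPending("shoot--a", "worker-1")

	// The same name in a different namespace is a different key.
	fmt.Println(p.isCreationPending("shoot--a", "worker-1")) // true
	fmt.Println(p.isCreationPending("shoot--b", "worker-1")) // false

	p.clearCreationPending("shoot--a", "worker-1")
	fmt.Println(p.isCreationPending("shoot--a", "worker-1")) // false
}
```

The important property in this sketch is that every write and every lookup goes through the same key function, so the populate side and the check side cannot drift apart; a namespace-qualified key also keeps same-named machines in different namespaces distinct.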

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
cc: @aaronfern @elankath

Release note:


@takoverflow takoverflow requested a review from a team as a code owner October 15, 2025 07:10
@elankath
Member

Thanks for the change. In addition to the unit test, was a manual test attempted, e.g. by putting an artificial delay of a few minutes inside triggerCreateMachine and then manually deleting the machine using kubectl (after removing the finalizer?), just to check that the overall logic works as expected?

@takoverflow
Member Author

Tested with the virtual provider by adding a 1m delay in the CreateMachine() call and then attempting to delete a machine.

218:W1015 12:54:11.689712   44346 machine.go:885] Cannot delete machine "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r", its deletionTimestamp is set but it is currently being processed by the creation flow
...
598:I1015 12:55:22.807654   44346 machine.go:557] Created new VM for machine: "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r" with ProviderID: "aws:///eu-west-1/i-41a89bfa9d66b60c3" and backing node: "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r"
599:W1015 12:55:22.816770   44346 machine.go:732] Machine labels/annotations UPDATE failed for "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r". Will retry after VM initialization (if required), error: Operation cannot be fulfilled on machines.machine.sapcloud.io "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r": the object has been modified; please apply your changes to the latest version and try again
600:E1015 12:55:22.816951   44346 machine.go:644] failed to update labels and providerID for machine "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r". err="Operation cannot be fulfilled on machines.machine.sapcloud.io \"shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r\": the object has been modified; please apply your changes to the latest version and try again"
601:I1015 12:55:22.817038   44346 machine.go:749] Initializing VM instance for Machine "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r"
602:I1015 12:55:22.817094   44346 machine.go:758] Provider does not support Driver.InitializeMachine - skipping VM instance initialization for "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r".
603:I1015 12:55:22.817140   44346 machine.go:241] reconcileClusterMachine: Stop for "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r"
604:I1015 12:55:22.817176   44346 machine.go:112] Adding machine object to queue "shoot--i349079--aws-01/shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r" after 5s, reason: machine creation in process. Machine initialization (if required) is successful
605:E1015 12:55:22.817274   44346 machine.go:163] Machine "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r" should be in machine termination queue
606:I1015 12:55:22.817332   44346 machine.go:123] Adding machine object to termination queue "shoot--i349079--aws-01/shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r", reason: handling terminating machine object
607:I1015 12:55:22.817496   44346 machine.go:272] reconcileClusterMachineTermination: Start for "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r" with phase:"", description:""
608:I1015 12:55:22.831713   44346 machine_util.go:1305] Machine "shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r" status updated to terminating
609:I1015 12:55:22.831841   44346 machine.go:133] Adding machine object to termination queue "shoot--i349079--aws-01/shoot--i349079--aws-01-etcd-worker-z1-55b56-chm4r" after 5s, reason: Machine deletion in process. Phase set to termination

@elankath
Member

Why can't we see the log for err := fmt.Errorf("machine %q is in creation flow. Deletion cannot proceed", machine.Name)? Is it possible to add a log for this case, so that it is clear from the logs that deletion is postponed until creation is done?

@gardener-robot gardener-robot added the size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) label Oct 15, 2025
@takoverflow
Member Author

Why can't we see the log for err := fmt.Errorf("machine %q is in creation flow. Deletion cannot proceed", machine.Name)? Is it possible to add a log for this case, so that it is clear from the logs that deletion is postponed until creation is done?

The error log would only appear if the machine were actually being processed by machineTerminationQueue. As discussed, the check in updateMachine prevents a machine from being enqueued there while it is undergoing creation, so that code path is not reached.

@gardener-robot gardener-robot added size/s Size of pull request is small (see gardener-robot robot/bots/size.py) and removed size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Oct 15, 2025
@aaronfern aaronfern merged commit e8c7ae0 into gardener:master Oct 15, 2025
12 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Oct 15, 2025
aaronfern pushed a commit to aaronfern/machine-controller-manager that referenced this pull request Oct 15, 2025
* Use fully qualified machine name for pending create map

* Extract pendingCreationMap operations into helper functions
aaronfern added a commit that referenced this pull request Oct 15, 2025
* Safeguard to prevent termination of machines part of creation flow (#1036)

* Use fully qualified machine name for pending create map (#1043)

* Use fully qualified machine name for pending create map

* Extract pendingCreationMap operations into helper functions

---------

Co-authored-by: Prashant Tak <[email protected]>
jamand pushed a commit to stackitcloud/machine-controller-manager that referenced this pull request Oct 27, 2025
* Safeguard to prevent termination of machines part of creation flow (gardener#1036)

* Use fully qualified machine name for pending create map (gardener#1043)

* Use fully qualified machine name for pending create map

* Extract pendingCreationMap operations into helper functions

---------

Co-authored-by: Prashant Tak <[email protected]>
timebertt pushed a commit to stackitcloud/machine-controller-manager that referenced this pull request Oct 28, 2025
* Safeguard to prevent termination of machines part of creation flow (gardener#1036)

* Use fully qualified machine name for pending create map (gardener#1043)

* Use fully qualified machine name for pending create map

* Extract pendingCreationMap operations into helper functions

---------

Co-authored-by: Prashant Tak <[email protected]>
afritzler pushed a commit to afritzler/machine-controller-manager that referenced this pull request Nov 26, 2025
* Use fully qualified machine name for pending create map

* Extract pendingCreationMap operations into helper functions

Labels

size/s Size of pull request is small (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)


4 participants