Skip to content

Conversation

@elankath
Copy link
Member

@elankath elankath commented Aug 5, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes partially #955

Special notes for your reviewer:

Release note:

Machine Class now has VirtualCapacity field hat maps to WorkerPool's NodeTemplate VirtualCapacity.

@elankath elankath requested a review from a team as a code owner August 5, 2025 04:54
@gardener-robot gardener-robot added needs/review Needs review kind/api-change API change with impact on API users needs/second-opinion Needs second review by someone else size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Aug 5, 2025
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 5, 2025
@elankath
Copy link
Member Author

elankath commented Aug 5, 2025

Successfully passed IT using aws provider.

Running Suite: Controller Suite - /Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/test/integration/controller
===============================================================================================================================================
Random Seed: 1754369765

Will run 10 of 10 specs
------------------------------
[BeforeSuite]
/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/test/integration/controller/controller_test.go:47
  > Enter [BeforeSuite] TOP-LEVEL @ 08/05/25 10:26:16.554
  STEP: Checking for the clusters if provided are available @ 08/05/25 10:26:16.555
  2025/08/05 10:26:16 Control cluster kube-config - /Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_control.yaml
  2025/08/05 10:26:16 Target cluster kube-config  - /Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_target.yaml
  STEP: Killing any existing processes @ 08/05/25 10:26:18.743
  STEP: Checking Machine-Controller-Manager repo is available at: ../../../dev/mcm @ 08/05/25 10:26:18.907
  STEP: Scaledown existing machine controllers @ 08/05/25 10:26:18.907
  STEP: Starting Machine Controller  @ 08/05/25 10:26:19.084
  STEP: Starting Machine Controller Manager @ 08/05/25 10:26:19.086
  STEP: Cleaning any old resources @ 08/05/25 10:26:19.088
  2025/08/05 10:26:19 machinedeployments.machine.sapcloud.io "test-machine-deployment" not found
  2025/08/05 10:26:19 machines.machine.sapcloud.io "test-machine" not found
  2025/08/05 10:26:19 machineclasses.machine.sapcloud.io "test-mc-v1" not found
  2025/08/05 10:26:19 machineclasses.machine.sapcloud.io "test-mc-v2" not found
  STEP: Setup MachineClass @ 08/05/25 10:26:19.789
  STEP: Looking for machineclass resource in the control cluster @ 08/05/25 10:26:21.051
  STEP: Looking for secrets refered in machineclass in the control cluster @ 08/05/25 10:26:21.225
  STEP: Initializing orphan resource tracker @ 08/05/25 10:26:21.576
  2025/08/05 10:26:23 orphan resource tracker initialized
  < Exit [BeforeSuite] TOP-LEVEL @ 08/05/25 10:26:23.331 (6.776s)
[BeforeSuite] PASSED [6.777 seconds]
------------------------------
Machine controllers test machine resource creation should not lead to any errors and add 1 more node in target cluster
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:649
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:26:23.331
  STEP: Checking machineController process is running @ 08/05/25 10:26:23.331
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:26:23.331
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:26:23.331
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:26:23.898 (567ms)
  > Enter [It] should not lead to any errors and add 1 more node in target cluster @ 08/05/25 10:26:23.899
  STEP: Checking for errors @ 08/05/25 10:26:24.091
  STEP: Waiting until number of ready nodes is 1 more than initial nodes @ 08/05/25 10:26:24.265
  < Exit [It] should not lead to any errors and add 1 more node in target cluster @ 08/05/25 10:28:08.267 (1m44.369s)
• [104.936 seconds]
------------------------------
Machine controllers test machine resource deletion when machines available should not lead to errors and remove 1 node in target cluster
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:678
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:28:08.267
  STEP: Checking machineController process is running @ 08/05/25 10:28:08.267
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:28:08.267
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:28:08.267
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:28:08.667 (400ms)
  > Enter [It] should not lead to errors and remove 1 node in target cluster @ 08/05/25 10:28:08.667
  STEP: Checking for errors @ 08/05/25 10:28:09.772
  STEP: Waiting until test-machine machine object is deleted @ 08/05/25 10:28:09.955
  STEP: Waiting until number of ready nodes is equal to number of initial nodes @ 08/05/25 10:28:14.496
  < Exit [It] should not lead to errors and remove 1 node in target cluster @ 08/05/25 10:28:15.078 (6.411s)
• [6.812 seconds]
------------------------------
Machine controllers test machine resource deletion when machines are not available should keep nodes intact
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:717
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:28:15.078
  STEP: Checking machineController process is running @ 08/05/25 10:28:15.079
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:28:15.079
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:28:15.079
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:28:15.474 (395ms)
  > Enter [It] should keep nodes intact @ 08/05/25 10:28:15.474
  STEP: Skipping as there are machines available and this check can't be performed @ 08/05/25 10:28:15.814
  < Exit [It] should keep nodes intact @ 08/05/25 10:28:15.814 (340ms)
• [0.736 seconds]
------------------------------
Machine controllers test machine deployment resource creation with replicas=0, scale up with replicas=1 should not lead to errors and add 1 more node to target cluster
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:745
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:28:15.814
  STEP: Checking machineController process is running @ 08/05/25 10:28:15.815
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:28:15.815
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:28:15.815
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:28:16.403 (588ms)
  > Enter [It] should not lead to errors and add 1 more node to target cluster @ 08/05/25 10:28:16.403
  STEP: Checking for errors @ 08/05/25 10:28:16.608
  STEP: Waiting for Machine Set to be created @ 08/05/25 10:28:16.789
  STEP: Updating machineDeployment replicas to 1 @ 08/05/25 10:28:19.49
  STEP: Checking if machineDeployment's status has been updated with correct conditions @ 08/05/25 10:28:19.842
  STEP: Checking number of ready nodes==1 @ 08/05/25 10:30:08.906
  STEP: Fetching initial number of machineset freeze events @ 08/05/25 10:30:10.27
  < Exit [It] should not lead to errors and add 1 more node to target cluster @ 08/05/25 10:30:10.994 (1m54.592s)
• [115.181 seconds]
------------------------------
Machine controllers test machine deployment resource scale-up with replicas=6 should not lead to errors and add further 5 nodes to target cluster
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:813
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:30:10.997
  STEP: Checking machineController process is running @ 08/05/25 10:30:10.998
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:30:10.998
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:30:10.998
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:30:11.601 (604ms)
  > Enter [It] should not lead to errors and add further 5 nodes to target cluster @ 08/05/25 10:30:11.601
  STEP: Checking for errors @ 08/05/25 10:30:11.954
  STEP: Checking number of ready nodes are 6 more than initial @ 08/05/25 10:30:11.954
  < Exit [It] should not lead to errors and add further 5 nodes to target cluster @ 08/05/25 10:32:06.448 (1m54.847s)
• [115.453 seconds]
------------------------------
Machine controllers test machine deployment resource scale-down with replicas=2 should not lead to errors and remove 4 nodes from target cluster
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:843
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:32:06.448
  STEP: Checking machineController process is running @ 08/05/25 10:32:06.448
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:32:06.448
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:32:06.448
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:32:06.896 (448ms)
  > Enter [It] should not lead to errors and remove 4 nodes from target cluster @ 08/05/25 10:32:06.896
  STEP: Checking for errors @ 08/05/25 10:32:07.9
  STEP: Checking number of ready nodes are 2 more than initial @ 08/05/25 10:32:07.9
  < Exit [It] should not lead to errors and remove 4 nodes from target cluster @ 08/05/25 10:32:16.638 (9.742s)
• [10.191 seconds]
------------------------------
Machine controllers test machine deployment resource scale-down with replicas=2 should freeze and unfreeze machineset temporarily
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:872
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:32:16.639
  STEP: Checking machineController process is running @ 08/05/25 10:32:16.639
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:32:16.639
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:32:16.639
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:32:17.061 (421ms)
  > Enter [It] should freeze and unfreeze machineset temporarily @ 08/05/25 10:32:17.061
  < Exit [It] should freeze and unfreeze machineset temporarily @ 08/05/25 10:32:17.798 (738ms)
• [1.159 seconds]
------------------------------
Machine controllers test machine deployment resource updation to v2 machine-class and replicas=4 should upgrade machines and add more nodes to target
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:881
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:32:17.799
  STEP: Checking machineController process is running @ 08/05/25 10:32:17.799
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:32:17.799
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:32:17.799
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:32:18.394 (595ms)
  > Enter [It] should upgrade machines and add more nodes to target @ 08/05/25 10:32:18.394
  STEP: Checking for errors @ 08/05/25 10:32:18.762
  STEP: UpdatedReplicas to be 4 @ 08/05/25 10:32:18.762
  STEP: AvailableReplicas to be 4 @ 08/05/25 10:32:23.309
  STEP: Number of ready nodes be 4 more @ 08/05/25 10:34:23.608
  < Exit [It] should upgrade machines and add more nodes to target @ 08/05/25 10:34:33.299 (2m14.906s)
• [135.501 seconds]
------------------------------
Machine controllers test machine deployment resource deletion When there are machine deployment(s) available in control cluster should not lead to errors and list only initial nodes
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:935
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:34:33.299
  STEP: Checking machineController process is running @ 08/05/25 10:34:33.299
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:34:33.299
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:34:33.299
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:34:33.721 (422ms)
  > Enter [It] should not lead to errors and list only initial nodes @ 08/05/25 10:34:33.721
  STEP: Checking for errors @ 08/05/25 10:34:33.905
  STEP: Waiting until number of ready nodes is equal to number of initial  nodes @ 08/05/25 10:34:34.092
  < Exit [It] should not lead to errors and list only initial nodes @ 08/05/25 10:34:47.208 (13.487s)
• [13.909 seconds]
------------------------------
Machine controllers test orphaned resources when the hyperscaler resources are queried should have been deleted
/Users/I034796/go/src/github.com/gardener/machine-controller-manager/pkg/test/integration/common/framework.go:972
  > Enter [BeforeEach] Machine controllers test @ 08/05/25 10:34:47.208
  STEP: Checking machineController process is running @ 08/05/25 10:34:47.208
  STEP: Checking machineControllerManager process is running @ 08/05/25 10:34:47.208
  STEP: Checking nodes in target cluster are healthy @ 08/05/25 10:34:47.208
  < Exit [BeforeEach] Machine controllers test @ 08/05/25 10:34:47.606 (398ms)
  > Enter [It] should have been deleted @ 08/05/25 10:34:47.606
  STEP: Querying and comparing @ 08/05/25 10:34:47.606
  < Exit [It] should have been deleted @ 08/05/25 10:34:49.429 (1.823s)
• [2.222 seconds]
------------------------------
[AfterSuite]
/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/test/integration/controller/controller_test.go:49
  > Enter [AfterSuite] TOP-LEVEL @ 08/05/25 10:34:49.43
  STEP: Running Cleanup @ 08/05/25 10:34:49.43
  2025/08/05 10:35:09 machinedeployments.machine.sapcloud.io "test-machine-deployment" not found
  2025/08/05 10:35:09 machines.machine.sapcloud.io "test-machine" not found
  2025/08/05 10:35:09 deleting test-mc-v1 machineclass
  2025/08/05 10:35:10 machineclass deleted
  2025/08/05 10:35:10 deleting test-mc-v2 machineclass
  2025/08/05 10:35:11 machineclass deleted
  STEP: Killing any existing processes @ 08/05/25 10:35:11.264
  2025/08/05 10:35:11 controller_manager --control-kubeconfig=/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_control.yaml --target-kubeconfig=/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_target.yaml --namespace=shoot--i034796--aw --safety-up=2 --safety-down=1 --machine-safety-overshooting-period=300ms --leader-elect=false --v=3
  2025/08/05 10:35:11 stopMCM killed MCM process(es) with pid(s): [121]
  2025/08/05 10:35:11 main --control-kubeconfig=/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_control.yaml --target-kubeconfig=/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_target.yaml --namespace=shoot--i034796--aw --machine-creation-timeout=20m --machine-drain-timeout=5m --machine-health-timeout=10m --machine-pv-detach-timeout=2m --machine-safety-apiserver-statuscheck-timeout=30s --machine-safety-apiserver-statuscheck-period=1m --machine-safety-orphan-vms-period=30m --leader-elect=false --v=3
  2025/08/05 10:35:11 stopMCM killed MCM process(es) with pid(s): [123]
  STEP: Scale back the existing machine controllers @ 08/05/25 10:35:11.523
  < Exit [AfterSuite] TOP-LEVEL @ 08/05/25 10:35:12.162 (22.732s)
[AfterSuite] PASSED [22.732 seconds]
------------------------------

Ran 10 of 10 Specs in 535.611 seconds
SUCCESS! -- 10 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Ginkgo ran 1 suite in 9m7.169859375s
Test Suite Passed
Integration tests completed successfully
/Users/I034796/go/src/github.com/gardener/machine-controller-manager-provider-aws

Copy link
Member

@takoverflow takoverflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos and missed changes, PTAL thanks.

targetVirtualCapacity = v1.ResourceList{}
}

// Delete any keys that existed in the past but has been deleted now
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Delete any keys that existed in the past but has been deleted now
// Delete any keys that existed in the past but have been removed now

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Aug 5, 2025
@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) and removed size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Aug 20, 2025
@gardener-robot gardener-robot added size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) and removed size/l Size of pull request is large (see gardener-robot robot/bots/size.py) labels Aug 21, 2025
@elankath
Copy link
Member Author

elankath commented Aug 21, 2025

Simple Manual Test

Pre-requisite

You have setup machine-controller-manager and machine-controller-manager-provider-xxx using local setup. See docs.

Patch a MachineClass and add an attribute to the nodeTemplate.virtualCapacity

kubectl patch mcc shoot--i034796--aw-a-z1-38019 --type merge -p '{"nodeTemplate":{"virtualCapacity":{"hc.hana.com/memory":"1234567"}}}

machineclass.machine.sapcloud.io/shoot--i034796--aw-a-z1-38019 patched

Now check affected node(s) of machine class

kubectl get no ip-10-180-27-149.eu-west-1.compute.internal -oyaml | yq '.status.capacity'

cpu: "2"
ephemeral-storage: 52339692Ki
hc.hana.com/memory: "1234567" <--- extended resource added 
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 7927080Ki
pods: "110"

This is also added automatically by the kubelet to node.status.allocatable

kubectl get no ip-10-180-27-149.eu-west-1.compute.internal -oyaml | yq '.status.allocatable'
cpu: 1920m
ephemeral-storage: "50916052338"
hc.hana.com/memory: "1234567" <-- added by kubelet automatically
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 6776104Ki
pods: "110"

Patch MachineClass and remove attribute from the nodeTemplate.virtualCapacity

 kubectl patch mcc shoot--i034796--aw-a-z1-38019 --type='json' -p='[{"op": "remove", "path": "/nodeTemplate/virtualCapacity/hc.hana.com~1memory"}]'  #mcrvr
 
 machineclass.machine.sapcloud.io/shoot--i034796--aw-a-z1-38019 patched

Re-check affected node(s) of machine class

Extended resource should be removed from both node.status.capacity and node.status.allocatable.

kubectl get no ip-10-180-27-149.eu-west-1.compute.internal -oyaml | yq '.status.capacity'
cpu: "2"
ephemeral-storage: 52339692Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 7927080Ki
pods: "110"

kubectl get no ip-10-180-27-149.eu-west-1.compute.internal -oyaml | yq '.status.allocatable'
cpu: 1920m
ephemeral-storage: "50916052338"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 6776104Ki
pods: "110"

@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) and removed size/xl Size of pull request is huge (see gardener-robot robot/bots/size.py) labels Aug 21, 2025
Copy link
Member

@aaronfern aaronfern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good!
lgtm from me in principle, pending #1024 (comment) gets resolved

Copy link
Member

@aaronfern aaronfern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!
/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/changes Needs (more) changes labels Aug 21, 2025
@gardener-robot gardener-robot removed needs/review Needs review needs/second-opinion Needs second review by someone else labels Aug 21, 2025
Copy link
Member

@takoverflow takoverflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elankath
Copy link
Member Author

IT also succeeded after latest changes and review comments

Ran 10 of 10 Specs in 606.364 seconds
SUCCESS! -- 10 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Ginkgo ran 1 suite in 10m16.334717042s
Test Suite Passed
Integration tests completed successfully

@aaronfern aaronfern merged commit 4da751b into gardener:master Aug 25, 2025
13 checks passed
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Aug 25, 2025
afritzler pushed a commit to afritzler/machine-controller-manager that referenced this pull request Nov 26, 2025
* Added VirtualCapacity field to MachineClass NodeTemplate

* Added more unit tests for SyncVirtualCapacity

* enqueue machine when machineclass is updated

* Update pkg/util/provider/machinecontroller/machine_util.go

Co-authored-by: Prashant Tak <[email protected]>

* fixed typo dependecies->dependencies

* remove setup struct from SyncVirtualCapacity test

* remov LastAppliedVirtualCapacityAnnotation if empty

* resolved with changes from PR 1015

---------

Co-authored-by: Prashant Tak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/api-change API change with impact on API users reviewed/lgtm Has approval for merging reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/l Size of pull request is large (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants