@iamemilio

If a cluster uses self-signed certificates, the master and worker nodes created by cluster-api will be unable to retrieve their Ignition configs without trusting the CA. This fix adds the user-added CAs found in the AdditionalTrustBundle field to the master and worker Ignition pointer configs (shims).
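As a rough illustration of what this description amounts to, the sketch below builds a pointer config that carries an extra CA. It assumes the Ignition spec 2.2.0 field names (`ignition.config.append` and `ignition.security.tls.certificateAuthorities`); the struct types, the `newPointerConfig` helper, and the `api-int.example.com` URL are hypothetical stand-ins and not the installer's actual code.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Hypothetical, trimmed-down mirror of the Ignition spec 2.2.0 fields used here.
type reference struct {
	Source string `json:"source"`
}

type configSection struct {
	Append []reference `json:"append"`
}

type tlsSection struct {
	CertificateAuthorities []reference `json:"certificateAuthorities"`
}

type securitySection struct {
	TLS tlsSection `json:"tls"`
}

type ignitionSection struct {
	Version  string          `json:"version"`
	Config   configSection   `json:"config"`
	Security securitySection `json:"security"`
}

type pointerConfig struct {
	Ignition ignitionSection `json:"ignition"`
}

// dataURL embeds a PEM bundle as a base64 data URL so it can travel inside the config.
func dataURL(pem []byte) string {
	return "data:text/plain;charset=utf-8;base64," + base64.StdEncoding.EncodeToString(pem)
}

// newPointerConfig returns a shim that fetches the real config from configURL
// and trusts every bundle in caBundles while doing so.
func newPointerConfig(configURL string, caBundles ...[]byte) pointerConfig {
	cfg := pointerConfig{}
	cfg.Ignition.Version = "2.2.0"
	cfg.Ignition.Config.Append = []reference{{Source: configURL}}
	for _, bundle := range caBundles {
		cfg.Ignition.Security.TLS.CertificateAuthorities = append(
			cfg.Ignition.Security.TLS.CertificateAuthorities,
			reference{Source: dataURL(bundle)},
		)
	}
	return cfg
}

func main() {
	// Placeholder PEM text; a real AdditionalTrustBundle would go here.
	bundle := []byte("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n")
	cfg := newPointerConfig("https://api-int.example.com:22623/config/worker", bundle)
	out, err := json.MarshalIndent(cfg, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(out))
}
```

A CA listed this way only affects the TLS connections Ignition itself makes while fetching configs; it does not change what pods running on the node trust.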

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 7, 2019
@openshift-ci-robot
Contributor

@iamemilio: This pull request references Bugzilla bug 1769879, which is invalid:

  • expected the bug to target the "4.3.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1769879: add AdditionalTrustBundles to master and worker shims

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iamemilio
To complete the pull request process, please assign abhinavdahiya
You can assign the PR to them by writing /assign @abhinavdahiya in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 7, 2019
@abhinavdahiya
Contributor

/test e2e-azure

@abhinavdahiya
Contributor

/test e2e-gcp

@abhinavdahiya
Contributor

/test e2e-metal

1 similar comment
@iamemilio
Author

/test e2e-metal

@iamemilio
Author

/test e2e-azure

@sdodson
Member

sdodson commented Nov 7, 2019

/test e2e-metal

@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 7, 2019

@iamemilio: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-scaleup-rhel7 | 8d0022b | link | /test e2e-aws-scaleup-rhel7 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wking
Member

wking commented Nov 8, 2019

Who manages this in-cluster as the configured CAs evolve and we continue to launch machines?

@wking
Member

wking commented Nov 8, 2019

Can we have the machine-config operator generate these instead?

@tomassedovic
Contributor

@wking I think the PR description is misleading here.

According to the BZ the issue is with the machine-controller pod being unable to talk to the OpenStack API due to a certificate failure.

Internally-deployed OpenStack clusters often run their API under an effectively self-signed cert (trusted inside the organisation but not publicly). In such cases, the CA needs to be added to the OpenStack VMs via Ignition or any OpenStack API request fails.

These certificates are managed by the OpenStack administrators and they're outside of the OpenShift cert management.

@iamemilio assuming what I wrote above is correct, will you please update the PR description? Masters and workers should get their Ignition from the OpenShift cluster -- there should be no certificate issues there. It's about contacting the OpenStack APIs with cluster-api-provider-openstack.

@tomassedovic
Contributor

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Nov 8, 2019
@openshift-ci-robot
Contributor

@tomassedovic: This pull request references Bugzilla bug 1769879, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Nov 8, 2019
@abhinavdahiya
Contributor

The worker and master pointer configs fetch their configs from the ignition-server running in the cluster, and they don't need the additional trust bundle.

@iamemilio
Author

/test e2e-azure

@abhinavdahiya
Contributor

Internally-deployed OpenStack clusters often run their API under an effectively self-signed cert (trusted inside the organisation but not publicly).

The current change doesn't fix that.

The additional trust bundle is already delivered to all the machines.

The machine-controller is the one that doesn't have the trust, because the machine's trusted bundle is not used by pods unless they mount it in from the host.
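To make the point about pods concrete, here is a minimal sketch of how a pod-based client could opt in to an extra CA bundle that has been mounted into its filesystem. The helper names (`poolWithExtraCAs`, `clientTrusting`) and the default path `/etc/additional-ca/ca-bundle.pem` are assumptions for illustration only; this is not how the machine-controller is actually wired.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// poolWithExtraCAs returns the system trust pool (or an empty pool if the
// system one is unavailable) extended with the PEM certificates in bundle.
func poolWithExtraCAs(bundle []byte) (*x509.CertPool, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool()
	}
	if !pool.AppendCertsFromPEM(bundle) {
		return nil, errors.New("no PEM certificates found in bundle")
	}
	return pool, nil
}

// clientTrusting builds an HTTP client whose TLS verification uses the extended pool.
func clientTrusting(bundle []byte) (*http.Client, error) {
	pool, err := poolWithExtraCAs(bundle)
	if err != nil {
		return nil, err
	}
	transport := &http.Transport{TLSClientConfig: &tls.Config{RootCAs: pool}}
	return &http.Client{Transport: transport}, nil
}

func main() {
	// Hypothetical mount path; pass a different one as the first argument.
	path := "/etc/additional-ca/ca-bundle.pem"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	bundle, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("could not read CA bundle:", err)
		return
	}
	if _, err := clientTrusting(bundle); err != nil {
		fmt.Println("could not build client:", err)
		return
	}
	fmt.Println("HTTP client configured with the additional CA bundle")
}
```

Without a step like this, a Go client inside a container only sees the CA store shipped in its own image, regardless of what the host trusts.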

@iamemilio
Author

/hold I might have misunderstood the root problem David was reporting. Hold until I dig into it further.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 8, 2019
@jobcespedes
Contributor

jobcespedes commented Nov 13, 2019

Facing the same error message. I tried to use the internal (insecure) endpoint in clouds.yaml. However, the machine controller still tries the public endpoint:

Error listing the instances (machine/actuator.go 472): Get service list err: Get https://<IP>:13774/v2.1/servers/detail?name=dev-mgqts-worker-fhnl7: x509: certificate signed by unknown authority

https://<IP>:13774 is the nova (osapi) public endpoint

This is a lab running Newton, with some modifications made so far to make the installer compatible, so you might want to ignore my case.

@iamemilio
Author

iamemilio commented Nov 14, 2019

Closing, this does not solve the root problem. Pod-based services that interact with the OpenStack API need to trust the additionalCAbundle, which needs to be injected via a configmap. Open discussions regarding current and future solutions to this problem are occurring. One possible part of the solution that is being considered for the short term is: #2658.

@iamemilio iamemilio closed this Nov 14, 2019
@iamemilio iamemilio deleted the workers_certs branch November 14, 2019 20:54
Labels

bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

7 participants