Contributor

@lindig lindig commented Jun 17, 2021

Just for discussion. This is my approach to refreshing server certificates. The rotation of the certs works on both the client and the server, including updating bundles and the database. However, I don't yet know how best to activate them without losing connections, so the stunnel reconfiguration is commented out.

let open Certificates in
with_cert_lock @@ fun () ->
let old_certs = Db_util.get_host_certs ~__context ~type' ~host in
let new_cert = write_cert_fs () in
Member

This overwrites the existing certificate. What happens when there's a failure and the previous certificate needs to be reinstated?

Contributor Author

Maybe I should rename them to *.bak so that they are not picked up by bundling but are still around for recovery.
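
A minimal sketch of why the rename would keep backups out of the bundle, assuming the bundling step selects certificates by their `.pem` suffix. That selection rule is an assumption (the bundling code is not shown in this thread), and `is_bundled` and `bundle_candidates` are hypothetical names.

```ocaml
(* Hypothetical selection rule: only files ending in ".pem" are bundled.
   The real bundling code may select files differently. *)
let is_bundled name = Filename.check_suffix name ".pem"

(* Keep only the bundle candidates from a directory listing. *)
let bundle_candidates names = List.filter is_bundled names
```

Under this rule a file renamed to `host.pem.bak` is skipped, while `host.pem` is still bundled.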

WireProtocol.{filename= Printf.sprintf "%s.new.pem" uuid; content}
in
let job rpc session_id host =
Worker.remote_write_certs_fs HostPoolCertificate Merge [file] host rpc
Contributor

What happens if the file already exists on the remote host? We probably don't want to overwrite it, because the remote host could be relying on it.
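
One way to avoid the overwrite is to create the file exclusively, so the write fails when the file is already present. The sketch below uses only the OCaml standard library; `write_new_file` is a hypothetical helper, not a function from this PR, and it says nothing about how `Worker.remote_write_certs_fs` actually writes files.

```ocaml
(* Write [content] to [path] only if [path] does not exist yet.
   Open_creat combined with Open_excl makes the open fail when the file
   is already present, so an existing file is never truncated. *)
let write_new_file ~path ~content =
  match open_out_gen [Open_wronly; Open_creat; Open_excl] 0o600 path with
  | exception Sys_error msg ->
      (* Typically "<path>: File exists"; reported to the caller. *)
      Error msg
  | oc ->
      Fun.protect
        ~finally:(fun () -> close_out_noerr oc)
        (fun () -> output_string oc content ; flush oc ; Ok ())
```

The caller can then decide whether an `Error` means "abort the refresh" or "pick a different file name".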

let pem = cert_path type' in
let path = new_cert_path type' in
let cert = new_host_cert ~dbg ~path in
let bak = backup_cert_path type' in
Contributor

We might want to think about the case where bak already exists on the file system at this point; it probably means that a previous cert refresh failed. Do we just error out in this case or try to resolve the problem?

Contributor

@lippirk lippirk Jun 18, 2021

Ah, I see we don't actually remove bak at the end. Is the idea that it is useful to keep around in case the user needs to intervene manually after a failure?

I'm thinking about the case where a cert refresh has failed: the user's first instinct after running xe cert-refresh and seeing a failure is going to be to run it again, so it would be nice not to overwrite bak in this case (it depends on whether distribute_new_host_cert fails or not, which I am not sure about).

Contributor Author

Good point! Maybe we should fail if the backup exists.
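
A minimal sketch of the "fail if the backup exists" behaviour discussed in this thread. The parameters `pem`, `path` and `bak` mirror the names in the diff above, but `install_with_backup` is a hypothetical function and not the code that was merged.

```ocaml
(* Install the new certificate at [pem], keeping the old one at [bak].
   Refuses to run when [bak] already exists, since that probably means
   an earlier refresh failed and the backup must not be overwritten. *)
let install_with_backup ~pem ~path ~bak =
  if Sys.file_exists bak then
    Error (Printf.sprintf "backup %s exists; a previous refresh may have failed" bak)
  else begin
    (* Keep the current certificate around for manual recovery. *)
    if Sys.file_exists pem then Sys.rename pem bak ;
    (* Move the new certificate to the live name. *)
    Sys.rename path pem ;
    Ok ()
  end
```

With this shape, re-running the refresh after a failure returns an `Error` instead of silently replacing the backup; removing or restoring `bak` is left to the operator.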

Contributor Author

lindig commented Jun 21, 2021

This should not be merged until support for systemd-based reload of stunnel has been merged.

@lindig lindig force-pushed the private/christianlin/CP-36098 branch from a3de94b to c6821a9 Compare June 21, 2021 14:50
lindig added 4 commits June 23, 2021 15:51
Add a new API call that re-creates new self-signed server certificates
and distributes them in the pool. This commit is just introducing the
API call scaffolding.

Signed-off-by: Christian Lindig <[email protected]>

To make the function usable for both internal and external host
certificates, add a path parameter to where to install a certificate.

Signed-off-by: Christian Lindig <[email protected]>

Replace the pool-internal self-signed certificate of a host with a new
one, distribute it in the pool, and disable the previous certificate.

Signed-off-by: Christian Lindig <[email protected]>

Introduce pool operation cert_refresh such that we can block other
operations in parallel with it.

Signed-off-by: Christian Lindig <[email protected]>
@lindig lindig force-pushed the private/christianlin/CP-36098 branch from 7f5299f to ae8cf8b Compare June 23, 2021 14:52
Contributor

@lippirk lippirk left a comment

We should use the cert distribution lock introduced here: https://github.com/xapi-project/xen-api/pull/4441/files . The lock induced by the cert refresh pool operation may seem sufficient, but it does not make pool.join cert distribution and Host.refresh_server_certificate mutually exclusive.

However, I'm happy to get this in and fix that later.

@lindig lindig merged commit 27b41c5 into xapi-project:master Jun 25, 2021