-
Notifications
You must be signed in to change notification settings - Fork 297
Cert Refresh #4440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cert Refresh #4440
Conversation
| let open Certificates in | ||
| with_cert_lock @@ fun () -> | ||
| let old_certs = Db_util.get_host_certs ~__context ~type' ~host in | ||
| let new_cert = write_cert_fs () in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overwrites the existing certificate. What happens when there's a failure and the previous certificates needs to be reinstated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should rename them to *.bak such that they would not be picked up by bundling but they would be around for recovery.
| WireProtocol.{filename= Printf.sprintf "%s.new.pem" uuid; content} | ||
| in | ||
| let job rpc session_id host = | ||
| Worker.remote_write_certs_fs HostPoolCertificate Merge [file] host rpc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if file already exists on the remote host? We probably don't want to overwrite it, because the remote host could be relying on it
| let pem = cert_path type' in | ||
| let path = new_cert_path type' in | ||
| let cert = new_host_cert ~dbg ~path in | ||
| let bak = backup_cert_path type' in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to think about the case where bak exists on the file system at this point - it probably means that a previous cert refresh failed. Do we just error out in this case or try to resolve the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see we don't actually remove bak at the end. The idea is that it is useful to keep around in case the user needs to manually intervene because of a failure?
I'm thinking about the case where a cert refresh has failed - the user's first instinct after running xe cert-refresh and seeing a failure is going to be to run it again, so it would be nice not to overwrite bak in this case (it depends on whether distribute_new_host_cert_fails or not, which I am not sure about)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Maybe we should fail if the backup exists
|
This should not be merged before we have support for systemd-based reload of Stunnel merged. |
a3de94b to
c6821a9
Compare
Add a new API call that re-creates new self-signed server certificates and distributes them in the pool. This commit is just introducing the API call scaffolding. Signed-off-by: Christian Lindig <[email protected]>
To make the function usable for both internal and external host certificates, add a path parameter to where to install a certificate. Signed-off-by: Christian Lindig <[email protected]>
Replace the pool-internal self-signed certificacte of a host with a new one, distribute it in the pool, and disable the previous certificate. Signed-off-by: Christian Lindig <[email protected]>
Introduce pool operation cert_refresh such that we can block other operations in parallel with it. Signed-off-by: Christian Lindig <[email protected]>
7f5299f to
ae8cf8b
Compare
lippirk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use the cert distrib lock introduced here: https://github.com/xapi-project/xen-api/pull/4441/files . Whilst it seems like the lock induced by the cert refresh pool operation should be sufficient, it does not mutually exclude pool.join cert distribs and Host.refresh_server_certificate
However I'm happy to get this in and fix that later
Just for discussion. This is my approach to refresh server certificates. This rotation of the certs works both on the client and the server, including updating bundles and the database. However, I don't know yet how best to activate them without losing connections. Hence, the stunnel re-configuration is commented out.