-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Ppaul/failover ledger endpoint #51947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
TODO:
|
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds failover endpoint support to the Azure Confidential Ledger SDK, enabling automatic routing to backup ledger instances when the primary endpoint fails. The implementation provides a new failover service that can discover and attempt operations against backup ledger endpoints.
Key changes include:
- Introduction of a new
ConfidentialLedgerFailoverServiceclass that handles failover endpoint discovery and execution - Integration of the failover service into the main
ConfidentialLedgerClient - Version bump from 1.4.1-beta.3 to 1.5.0-beta.1
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| ConfidentialLedgerFailoverService.cs | New service class implementing failover endpoint discovery and operation execution logic |
| ConfidentialLedgerClient.cs | Integration of the failover service into the main client |
| Azure.Security.ConfidentialLedger.csproj | Version increment to reflect new failover functionality |
| CHANGELOG.md | Documentation of the new failover routing feature |
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
...onfidentialledger/Azure.Security.ConfidentialLedger/src/ConfidentialLedgerFailoverService.cs
Outdated
Show resolved
Hide resolved
...onfidentialledger/Azure.Security.ConfidentialLedger/src/ConfidentialLedgerFailoverService.cs
Outdated
Show resolved
Hide resolved
...onfidentialledger/Azure.Security.ConfidentialLedger/src/ConfidentialLedgerFailoverService.cs
Outdated
Show resolved
Hide resolved
...onfidentialledger/Azure.Security.ConfidentialLedger/src/ConfidentialLedgerFailoverService.cs
Show resolved
Hide resolved
...onfidentialledger/Azure.Security.ConfidentialLedger/src/ConfidentialLedgerFailoverService.cs
Outdated
Show resolved
Hide resolved
...onfidentialledger/Azure.Security.ConfidentialLedger/src/ConfidentialLedgerFailoverService.cs
Show resolved
Hide resolved
| { | ||
| List<Uri> endpoints = await GetFailoverEndpointsAsync(primaryEndpoint, cancellationToken).ConfigureAwait(false); | ||
| Exception last = null; | ||
| foreach (var ep in endpoints) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a minor suggestion, probably not worth doing for this PR since this customer only has one failover: switch to Task.WhenAny or similar for the async failovers method. That should also simplify handling of the requestfailed exceptions.
| var jsonWriterOptions = new System.Text.Json.JsonWriterOptions | ||
| { | ||
| Indented = true, | ||
| Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious - what breaks if we use the regular utf8 encoder?
| } | ||
| ms.Position = 0; | ||
| // Wrap in a synthetic Response that mimics the original status/headers but with new content. | ||
| return new SyntheticResponse(currentResponse, ms.ToArray()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Azure Response ContentStream is settable. I'm wondering if we can just set the currentResponse.contentStream with your memorystream to avoid wrapping and creating the SyntheticResponse.
| public abstract Stream? ContentStream { get; set; } |
|
|
||
| if (!string.IsNullOrEmpty(failoverLedgerId)) | ||
| { | ||
| Uri endpoint = new UriBuilder(primaryEndpoint) { Host = $"{failoverLedgerId}.{LedgerDomainSuffix}" }.Uri; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is passing the primaryEndpoint needed here?
| { | ||
| // Include 404 and specific UnknownLedgerEntry error code. | ||
| return ex.Status == 404 || | ||
| string.Equals(ex.ErrorCode, "UnknownLedgerEntry", StringComparison.OrdinalIgnoreCase) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should UnknownLedgerEntry be retriable? That's a guarantee that no key was written in that transaction right?
msftsettiy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, please explore the option to set the service identity cert for the fail over endpoints in the transport options object.
| try | ||
| { | ||
| using HttpMessage primaryMessage = CreateGetLedgerEntryRequest(_ledgerEndpoint, transactionId, collectionId, context); | ||
| return await _pipeline.ProcessMessageAsync(primaryMessage, context).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgetting a little: does the pipeline account for Not Ready state? I only remember that the python SDK does not.
|
Hi @PallabPaul. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days. |
|
Hi @PallabPaul. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing |
|
/reopen |
| private readonly HttpPipeline _pipeline; | ||
| private readonly ClientDiagnostics _clientDiagnostics; | ||
|
|
||
| internal const string IdentityServiceBaseUrl = "https://identity.confidential-ledger.core.azure.com"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get the actual value you'd need to follow the same logic as in GetIdentityServerTlsCert, i.e.
ledgerOptions?.CertificateEndpoint ?? new Uri(Default_Certificate_Endpoint)
It can be overriden in ledgerOptions, and please reuse the default endpoint value from ConfidentialLedgerClient but that would need to change private const string to public const string I bet
|
|
||
| if (!string.IsNullOrEmpty(failoverLedgerId)) | ||
| { | ||
| Uri endpoint = new UriBuilder(primaryEndpoint) { Host = $"{failoverLedgerId}.{LedgerDomainSuffix}" }.Uri; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
primary endpoint host already contains the suffix, which might be staging or any new env we add later like ppe. Therefore if would make sense to replace the first part of the existing host instead of constructing the new one with the current prod suffix.
| { | ||
| List<Uri> endpoints = await GetFailoverEndpointsAsync(primaryEndpoint, cancellationToken).ConfigureAwait(false); | ||
| Exception last = null; | ||
| foreach (var ep in endpoints) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting point. It is saying that the order of the failover ledgers is important and sequential. For instance if there are 2 failover ledgers and the primary fails it will redirect all traffic to the first failover ledger and then to third only if the primary and secondary fail. Third one would rarely be used in this case.
If the order does not matter much it might make sense to forward requests to a randomly ordered list of failover ledgers to improve the distribution of the requests. So that when we get x million of requests they do not suddenly get all forwarded to the failover ledger but distribute that load across all of the failover ones.
| private static bool IsRetriableFailure(RequestFailedException ex) | ||
| { | ||
| // Include 404 and specific UnknownLedgerEntry error code. | ||
| return ex.Status == 404 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These errors need to be updated, I doubt that a failover endpoint has so many retriable conditions. 404 would mean that the ledger does not exist, UnknownLedgerEntry is not related to /failover, 408 not necessary either. 503 and 504 can be dropped because they are included in ex.Status >= 500
| }, | ||
| nameof(GetCurrentLedgerEntryAsync), | ||
| collectionId, | ||
| context?.CancellationToken ?? default).ConfigureAwait(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure failover works as expected the timeout would need to be extended, for instance if the client is getting an entry the timer starts. The requests will be retried to primary according to the client options and default pipeline options, i.e. this will eat into timeout quite a bit. Then we say the failover needs to happen with the remaining time which not only needs to fulfill the original intent of getting an entry but also fetch the list of failover ledgers (needs more work).
What I am saying is that we either automatically extend the timeout here, or increase it in the default client options for all ledgers that use failover endpoints or somehow tell the customers to make sure it is increased in their client options to work as expected.
| } | ||
| catch (Exception) | ||
| { | ||
| Response failoverCurrent = _failoverService.ExecuteOnFailovers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion:
to not repeat this block for every request is could be implemented in an additional http pipeline policy which currently has default settings, etc. it would be possible to refactor the ConfidentialLedgerFailoverService into a policy and add it to the base client thus enabling automatic failover for any request: HttpPipelineBuilder.Build(..., failoverPolicy, ...)
The policy might look like:
public class FailoverPolicy : HttpPipelinePolicy
{
...
public override void Process(HttpMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline)
{
TrySendWithFailover(message, pipeline);
}
...
private void TrySendWithFailover(HttpMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline)
{
// TODO: get endpoints _endpoints (primary + failover ones)
// this is an exampl e, in reality we'd want to first retry the primary endpoin and only then failover
for (int attempt = 0; attempt < _endpoints.Count; attempt++)
{
message.Request.Uri.Reset(_endpoints[_currentIndex]);
try
{
ProcessNext(message, pipeline);
if (message.Response.Status >= 200 && message.Response.Status < 500)
return; // Success
}
catch
{
// Ignore and try next endpoint
}
_currentIndex = (_currentIndex + 1) % _endpoints.Count;
}
throw new RequestFailedException("All endpoints failed.");
}
...
}
Contributing to the Azure SDK
Please see our CONTRIBUTING.md if you are not familiar with contributing to this repository or have questions.
For specific information about pull request etiquette and best practices, see this section.
Changes:
GetLedgerEntry,GetLedgerEntryAsync,GetCurrentLedgerEntry, andGetCurrentLedgerEntryAsyncmethods.