Skip to content

Conversation

@PallabPaul
Copy link
Member

@PallabPaul PallabPaul commented Aug 12, 2025

Contributing to the Azure SDK

Please see our CONTRIBUTING.md if you are not familiar with contributing to this repository or have questions.

For specific information about pull request etiquette and best practices, see this section.

Changes:

  • Added support to route to failover ledgers for the GetLedgerEntry, GetLedgerEntryAsync, GetCurrentLedgerEntry, and GetCurrentLedgerEntryAsync methods.

@PallabPaul
Copy link
Member Author

PallabPaul commented Aug 12, 2025

TODO:

  • remove log lines
  • merge /failover endpoint CP logic before merging this

@PallabPaul
Copy link
Member Author

TODO:

  • remove log lines
  • merge /failover endpoint CP logic before merging this

Done

@PallabPaul PallabPaul marked this pull request as ready for review August 14, 2025 22:22
Copilot AI review requested due to automatic review settings August 14, 2025 22:22
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds failover endpoint support to the Azure Confidential Ledger SDK, enabling automatic routing to backup ledger instances when the primary endpoint fails. The implementation provides a new failover service that can discover and attempt operations against backup ledger endpoints.

Key changes include:

  • Introduction of a new ConfidentialLedgerFailoverService class that handles failover endpoint discovery and execution
  • Integration of the failover service into the main ConfidentialLedgerClient
  • Version bump from 1.4.1-beta.3 to 1.5.0-beta.1

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.

File Description
ConfidentialLedgerFailoverService.cs New service class implementing failover endpoint discovery and operation execution logic
ConfidentialLedgerClient.cs Integration of the failover service into the main client
Azure.Security.ConfidentialLedger.csproj Version increment to reflect new failover functionality
CHANGELOG.md Documentation of the new failover routing feature

You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

{
List<Uri> endpoints = await GetFailoverEndpointsAsync(primaryEndpoint, cancellationToken).ConfigureAwait(false);
Exception last = null;
foreach (var ep in endpoints)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a minor suggestion, probably not worth doing for this PR since this customer only has one failover: switch to Task.WhenAny or similar for the async failovers method. That should also simplify handling of the requestfailed exceptions.

var jsonWriterOptions = new System.Text.Json.JsonWriterOptions
{
Indented = true,
Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - what breaks if we use the regular utf8 encoder?

}
ms.Position = 0;
// Wrap in a synthetic Response that mimics the original status/headers but with new content.
return new SyntheticResponse(currentResponse, ms.ToArray());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Azure Response ContentStream is settable. I'm wondering if we can just set the currentResponse.contentStream with your memorystream to avoid wrapping and creating the SyntheticResponse.

public abstract Stream? ContentStream { get; set; }


if (!string.IsNullOrEmpty(failoverLedgerId))
{
Uri endpoint = new UriBuilder(primaryEndpoint) { Host = $"{failoverLedgerId}.{LedgerDomainSuffix}" }.Uri;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is passing the primaryEndpoint needed here?

{
// Include 404 and specific UnknownLedgerEntry error code.
return ex.Status == 404 ||
string.Equals(ex.ErrorCode, "UnknownLedgerEntry", StringComparison.OrdinalIgnoreCase) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should UnknownLedgerEntry be retriable? That's a guarantee that no key was written in that transaction right?

Copy link
Member

@msftsettiy msftsettiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, please explore the option to set the service identity cert for the fail over endpoints in the transport options object.

try
{
using HttpMessage primaryMessage = CreateGetLedgerEntryRequest(_ledgerEndpoint, transactionId, collectionId, context);
return await _pipeline.ProcessMessageAsync(primaryMessage, context).ConfigureAwait(false);
Copy link
Member

@taicchoumsft taicchoumsft Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgetting a little: does the pipeline account for Not Ready state? I only remember that the python SDK does not.

@github-actions
Copy link

Hi @PallabPaul. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions bot added the no-recent-activity There has been no recent activity on this issue. label Oct 17, 2025
@github-actions
Copy link

Hi @PallabPaul. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.

@github-actions github-actions bot closed this Oct 24, 2025
@PallabPaul
Copy link
Member Author

/reopen

@github-actions github-actions bot reopened this Oct 30, 2025
@github-actions github-actions bot removed the no-recent-activity There has been no recent activity on this issue. label Oct 30, 2025
private readonly HttpPipeline _pipeline;
private readonly ClientDiagnostics _clientDiagnostics;

internal const string IdentityServiceBaseUrl = "https://identity.confidential-ledger.core.azure.com";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the actual value you'd need to follow the same logic as in GetIdentityServerTlsCert, i.e.

ledgerOptions?.CertificateEndpoint ?? new Uri(Default_Certificate_Endpoint)

It can be overriden in ledgerOptions, and please reuse the default endpoint value from ConfidentialLedgerClient but that would need to change private const string to public const string I bet


if (!string.IsNullOrEmpty(failoverLedgerId))
{
Uri endpoint = new UriBuilder(primaryEndpoint) { Host = $"{failoverLedgerId}.{LedgerDomainSuffix}" }.Uri;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

primary endpoint host already contains the suffix, which might be staging or any new env we add later like ppe. Therefore if would make sense to replace the first part of the existing host instead of constructing the new one with the current prod suffix.

{
List<Uri> endpoints = await GetFailoverEndpointsAsync(primaryEndpoint, cancellationToken).ConfigureAwait(false);
Exception last = null;
foreach (var ep in endpoints)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting point. It is saying that the order of the failover ledgers is important and sequential. For instance if there are 2 failover ledgers and the primary fails it will redirect all traffic to the first failover ledger and then to third only if the primary and secondary fail. Third one would rarely be used in this case.

If the order does not matter much it might make sense to forward requests to a randomly ordered list of failover ledgers to improve the distribution of the requests. So that when we get x million of requests they do not suddenly get all forwarded to the failover ledger but distribute that load across all of the failover ones.

private static bool IsRetriableFailure(RequestFailedException ex)
{
// Include 404 and specific UnknownLedgerEntry error code.
return ex.Status == 404 ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors need to be updated, I doubt that a failover endpoint has so many retriable conditions. 404 would mean that the ledger does not exist, UnknownLedgerEntry is not related to /failover, 408 not necessary either. 503 and 504 can be dropped because they are included in ex.Status >= 500

},
nameof(GetCurrentLedgerEntryAsync),
collectionId,
context?.CancellationToken ?? default).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure failover works as expected the timeout would need to be extended, for instance if the client is getting an entry the timer starts. The requests will be retried to primary according to the client options and default pipeline options, i.e. this will eat into timeout quite a bit. Then we say the failover needs to happen with the remaining time which not only needs to fulfill the original intent of getting an entry but also fetch the list of failover ledgers (needs more work).

What I am saying is that we either automatically extend the timeout here, or increase it in the default client options for all ledgers that use failover endpoints or somehow tell the customers to make sure it is increased in their client options to work as expected.

}
catch (Exception)
{
Response failoverCurrent = _failoverService.ExecuteOnFailovers(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion:
to not repeat this block for every request is could be implemented in an additional http pipeline policy which currently has default settings, etc. it would be possible to refactor the ConfidentialLedgerFailoverService into a policy and add it to the base client thus enabling automatic failover for any request: HttpPipelineBuilder.Build(..., failoverPolicy, ...)

The policy might look like:

public class FailoverPolicy : HttpPipelinePolicy
{
...

public override void Process(HttpMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline)
    {
        TrySendWithFailover(message, pipeline);
    }
...

private void TrySendWithFailover(HttpMessage message, ReadOnlyMemory<HttpPipelinePolicy> pipeline)
    {
        // TODO: get endpoints _endpoints (primary + failover ones)
        // this is an exampl e, in reality we'd want to first retry the primary endpoin and only then failover
        for (int attempt = 0; attempt < _endpoints.Count; attempt++)
        {
            message.Request.Uri.Reset(_endpoints[_currentIndex]);
            try
            {
                ProcessNext(message, pipeline);
                if (message.Response.Status >= 200 && message.Response.Status < 500)
                    return; // Success
            }
            catch
            {
                // Ignore and try next endpoint
            }
            _currentIndex = (_currentIndex + 1) % _endpoints.Count;
        }
        throw new RequestFailedException("All endpoints failed.");
    }
...
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants