
@aavasthy (Contributor)

Pull Request Template

Description

If a barrier request receives a 410/1022 (Gone / Lease Not Found) from the backend for any replica (the quorum replicas, in the case of reads), the SDK should retry the barrier request on the primary replica. If the primary replica also responds with 410/1022, the entire read or write operation should bail out and fail, and reads and writes (on single-master accounts with PPAF enabled, or on multi-master accounts) should be retried in the next preferred region.
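The retry rule described above can be summarized as a small decision function. This is a minimal sketch for illustration only, not the SDK's implementation: the names BarrierAction, BarrierRetrySketch, Decide, and the crossRegionRetryAllowed flag are hypothetical, and the flag is assumed to be true for PPAF-enabled single-master accounts and for multi-master accounts. Responses other than success or 410/1022 are assumed to stay with the SDK's existing barrier handling.

```csharp
using System;

// Hypothetical outcomes of evaluating a single barrier (HEAD) response.
// Illustrative names only; these are not types from the Cosmos DB SDK.
public enum BarrierAction
{
    Succeeded,        // barrier satisfied; continue the operation
    ExistingHandling, // not a 410/1022; left to the existing barrier retry logic
    RetryOnPrimary,   // 410/1022 from a non-primary replica: retry the barrier on the primary
    RetryNextRegion,  // 410/1022 from the primary: fail here, retry in the next preferred region
    Fail              // 410/1022 from the primary and no cross-region retry is available
}

public static class BarrierRetrySketch
{
    public const int StatusGone = 410;
    public const int SubStatusLeaseNotFound = 1022;

    // crossRegionRetryAllowed is an assumed flag: true when PPAF is enabled on a
    // single-master account, or when the account is multi-master.
    public static BarrierAction Decide(
        int statusCode,
        int subStatusCode,
        bool fromPrimary,
        bool crossRegionRetryAllowed)
    {
        if (statusCode >= 200 && statusCode < 300)
        {
            return BarrierAction.Succeeded;
        }

        bool leaseNotFound = statusCode == StatusGone && subStatusCode == SubStatusLeaseNotFound;
        if (!leaseNotFound)
        {
            return BarrierAction.ExistingHandling;
        }

        if (!fromPrimary)
        {
            // 410/1022 on a secondary/quorum replica: try the barrier once more, on the primary.
            return BarrierAction.RetryOnPrimary;
        }

        // The primary also returned 410/1022: bail out of this region instead of
        // continuing to retry barriers locally.
        return crossRegionRetryAllowed ? BarrierAction.RetryNextRegion : BarrierAction.Fail;
    }
}
```

For example, a read whose barrier first hits 410/1022 on a quorum replica gets RetryOnPrimary; if the primary then also returns 410/1022, the decision becomes RetryNextRegion, or Fail when no cross-region retry is available (the "failure for Writes" scenario raised later in the review).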

This PR adds integration tests that exercise the end-to-end flow with the direct package changes.

Type of change


  • New feature (non-breaking change which adds functionality)

Closing issues

To automatically close an issue: closes #5383

// Handle barrier (HEAD) requests
if (request.OperationType == OperationType.Head)
{
    barrierRequestCount++;
Contributor

To distinguish the first request from the rest, it would be simpler to use a boolean:

bool firstRequest = true;

CosmosClientOptions clientOptions //...
{
    TransportClientHandlerFactory // ...
        interceptorAfterResult: // ...
        {
            // ...
            if (request.OperationType == OperationType.Head)
            {
                // ...
                if (firstRequest)
                {
                    // ...
                    firstRequest = false;
                }
                else
                {
                    // ...
                }
            }
        }
}

@yash2710 (Contributor), Oct 31, 2025

Also, if more than two barrier requests are possible, it would be worth failing on barrierRequestCount > 2 in the if/else ladder so the test fails fast.

Contributor Author

The separate boolean approach might not work here because these tests validate different retry scenarios with different expected behaviors: some tests expect exactly 2 barrier requests (the success case), while others specifically test the SDK's retry-exhaustion behavior, which can generate up to 40 attempts.
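One way to reconcile the fail-fast suggestion with the need for a counter is to keep counting barrier requests but give each scenario its own cap. The sketch below assumes the injection logic can be pulled out of the interceptor callback into a plain helper; BarrierFaultInjector, failingRequests, and maxExpectedRequests are hypothetical names and do not reflect the SDK's TransportClientHandlerFactory API.

```csharp
using System;

// Hypothetical test helper: decides what each intercepted barrier (HEAD) request
// should return, and fails fast when more barrier requests arrive than expected.
public sealed class BarrierFaultInjector
{
    private readonly int failingRequests;     // how many barrier requests get 410/1022
    private readonly int maxExpectedRequests; // per-scenario fail-fast cap
    private int barrierRequestCount;

    public BarrierFaultInjector(int failingRequests, int maxExpectedRequests)
    {
        if (failingRequests < 0 || maxExpectedRequests < failingRequests)
        {
            throw new ArgumentOutOfRangeException(nameof(maxExpectedRequests));
        }

        this.failingRequests = failingRequests;
        this.maxExpectedRequests = maxExpectedRequests;
    }

    public int BarrierRequestCount => barrierRequestCount;

    // Returns the (statusCode, subStatusCode) to inject for the next barrier request.
    public (int StatusCode, int SubStatusCode) OnBarrierRequest()
    {
        barrierRequestCount++;

        if (barrierRequestCount > maxExpectedRequests)
        {
            // Fail fast instead of letting an unexpected retry loop run on.
            throw new InvalidOperationException(
                $"Unexpected barrier request #{barrierRequestCount}; expected at most {maxExpectedRequests}.");
        }

        return barrierRequestCount <= failingRequests
            ? (410, 1022) // inject Gone / Lease Not Found
            : (200, 0);   // let the barrier succeed
    }
}
```

The success scenario would use new BarrierFaultInjector(failingRequests: 1, maxExpectedRequests: 2), so a third barrier request throws immediately, while an exhaustion scenario could use a cap of 40 to match the up-to-40 attempts mentioned above.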

@kirankumarkolli (Member) left a comment

Two more scenarios to cover:

  • 410/1022 from the primary results in failure for writes
  • 410/1022 from the primary results in a cross-region retry that succeeds

@kirankumarkolli (Member)

Please also update the description

@aavasthy aavasthy added the auto-merge Enables automation to merge PRs label Nov 6, 2025
@kirankumarkolli kirankumarkolli merged commit a5c5d23 into master Nov 6, 2025
32 checks passed
@kirankumarkolli kirankumarkolli deleted the users/aavasthy/barrierupdate branch November 6, 2025 23:58

Labels

auto-merge Enables automation to merge PRs

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Barrier Requests: Optimize Retry Logic for Failing Fast on 410/ Lease Not Found (1022) Exceptions

5 participants