Skip to content

[azservicebus] Receiver Connection Recovery Loop Hammering Service #25288

@karenychen

Description

@karenychen

Bug Report

  • import path of package in question github.com/Azure/azure-sdk-for-go/sdk/messaging/azservicebus
  • SDK version v1.10.0
    • Specify the exact commit if possible; one way to get this is the REVISION
      column output by go list -m <module>, for example go list -m github.com/Azure/azure-sdk-for-go/sdk/azcore.
  • output of go version 1.24.4
  • What happened?

We had a short service side unavailability which caused InternalServerErrors during ReceiveMessages() calls. It seems like the internal link recovery attempts are executed without any backoffs, therefore hammering the service even when it's down.

Sampled 1 hour of data and from a single container we made close to 3k link recovery requests to Service Bus (roughly 1 recovery attempt every second?).

Image

Note that Service Bus did have to do a manual mitigation for this to recover, but I am not sure if us hammering their endpoints made the problems worse.

Here is a sample of the link recovery loop, we had this pattern until we manually restarted our container. @richardpark-msft I can DM you some query links too if you need them!

TIMESTAMP level msg
2025-09-17T23:00:00.0115926Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name) refreshing claim
2025-09-17T23:00:00.0116228Z info azservicebus SDK event: azsb.Conn, content: Creating new client, current rev: 4379
2025-09-17T23:00:00.0443104Z info azservicebus SDK event: azsb.Conn, content: Client created, new rev: 4380, took 32ms
2025-09-17T23:00:00.0443277Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name) negotiate claim, token expires on 2025-09-18T08:38:56Z
2025-09-17T23:00:00.0453975Z info azservicebus SDK event: azsb.Auth, content: negotiating claim for audience amqps://akssvcbus-israelnorthwest.servicebus.windows.net/redacted-queue-name with token type jwt and expiry of 1758184736
2025-09-17T23:00:00.0499255Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name/$management) refreshing claim
2025-09-17T23:00:00.0499377Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name/$management) negotiate claim, token expires on 2025-09-18T08:38:56Z
2025-09-17T23:00:00.0500455Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name) next refresh in 9h28m55.950088677s
2025-09-17T23:00:00.0509761Z info azservicebus SDK event: azsb.Auth, content: negotiating claim for audience amqps://akssvcbus-israelnorthwest.servicebus.windows.net/redacted-queue-name/$management with token type jwt and expiry of 1758184736
2025-09-17T23:00:00.0522487Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name/$management) next refresh in 9h28m55.947863564s
2025-09-17T23:00:00.0555609Z info azservicebus SDK event: azsb.Conn, content: [c:4380, l:4380, r:name:mBbW_4] Links created
2025-09-17T23:00:00.0555723Z info azservicebus SDK event: azsb.Receiver, content: [c:4380, l:4380, r:name:mBbW_4] Issuing 100 credits, have 0
2025-09-17T23:00:00.3072010Z info azservicebus SDK event: azsb.Receiver, content: [c:4380, l:4380, r:name:mBbW_4] Received 0/100 messages
2025-09-17T23:00:00.3072188Z error azservicebus SDK event: azsb.Conn, content: [c:4380, l:4380, r:name:mBbW_4] Closing connection AND links for error *Error{Condition: amqp:internal-error, Description: The service was unable to process the request; please retry the operation. For more information on exception types and proper exception handling, please refer to http://go.microsoft.com/fwlink/?LinkId=761101 Reference:9d6234e7-d7c3-4646-b139-0086e82c3aef, TrackingId:654c14f1000000030003629668cb3d70_G11_B22, SystemTracker:gi::G11:2193799:mBbW_4ZYxiqCjKm59_Flvpp9ZtSYTqh9VBfCaH_MKRu3CQnTuPiMog, bi::link221846(2193800_G11), Timestamp:2025-09-17T23:00:00, Info: map[]}
2025-09-17T23:00:00.3072214Z info azservicebus SDK event: azsb.Conn, content: [c:4380, l:4380, r:name:mBbW_4] Links closing (permanent: false)
2025-09-17T23:00:00.3072242Z info azservicebus SDK event: azsb.Conn, content: [c:4380, l:4380, r:name:mBbW_4] Closing Receiver
2025-09-17T23:00:00.3072261Z info azservicebus SDK event: azsb.Conn, content: [c:4380, l:4380, r:name:mBbW_4] Closing Session
2025-09-17T23:00:00.3075907Z info azservicebus SDK event: azsb.Conn, content: [c:4380, l:4380, r:name:mBbW_4] Closing RPC
2025-09-17T23:00:00.3082437Z error azservicebus SDK event: azsb.Receiver, content: [c:4380, l:4380, r:name:mBbW_4] Failure when receiving messages: *Error{Condition: amqp:internal-error, Description: The service was unable to process the request; please retry the operation. For more information on exception types and proper exception handling, please refer to http://go.microsoft.com/fwlink/?LinkId=761101 Reference:9d6234e7-d7c3-4646-b139-0086e82c3aef, TrackingId:654c14f1000000030003629668cb3d70_G11_B22, SystemTracker:gi::G11:2193799:mBbW_4ZYxiqCjKm59_Flvpp9ZtSYTqh9VBfCaH_MKRu3CQnTuPiMog, bi::link221846(2193800_G11), Timestamp:2025-09-17T23:00:00, Info: map[]}
2025-09-17T23:00:01.3087903Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name) refreshing claim
2025-09-17T23:00:01.3088149Z info azservicebus SDK event: azsb.Conn, content: Creating new client, current rev: 4380
2025-09-17T23:00:01.3357072Z info azservicebus SDK event: azsb.Conn, content: Client created, new rev: 4381, took 26ms
2025-09-17T23:00:01.3357246Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name) negotiate claim, token expires on 2025-09-18T08:38:56Z
2025-09-17T23:00:01.3367936Z info azservicebus SDK event: azsb.Auth, content: negotiating claim for audience amqps://akssvcbus-israelnorthwest.servicebus.windows.net/redacted-queue-name with token type jwt and expiry of 1758184736
2025-09-17T23:00:01.3380178Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name/$management) refreshing claim
2025-09-17T23:00:01.3380299Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name/$management) negotiate claim, token expires on 2025-09-18T08:38:56Z
2025-09-17T23:00:01.3381426Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name) next refresh in 9h28m54.662004374s
2025-09-17T23:00:01.3389710Z info azservicebus SDK event: azsb.Auth, content: negotiating claim for audience amqps://akssvcbus-israelnorthwest.servicebus.windows.net/redacted-queue-name/$management with token type jwt and expiry of 1758184736
2025-09-17T23:00:01.3401031Z info azservicebus SDK event: azsb.Auth, content: (redacted-queue-name/$management) next refresh in 9h28m54.659994162s
2025-09-17T23:00:01.3462211Z info azservicebus SDK event: azsb.Conn, content: [c:4381, l:4381, r:name:Jb5Ji6] Links created
2025-09-17T23:00:01.3462994Z info azservicebus SDK event: azsb.Receiver, content: [c:4381, l:4381, r:name:Jb5Ji6] Issuing 100 credits, have 0
2025-09-17T23:00:01.5727826Z info azservicebus SDK event: azsb.Receiver, content: [c:4381, l:4381, r:name:Jb5Ji6] Received 0/100 messages
2025-09-17T23:00:01.5728086Z error azservicebus SDK event: azsb.Conn, content: [c:4381, l:4381, r:name:Jb5Ji6] Closing connection AND links for error *Error{Condition: amqp:internal-error, Description: The service was unable to process the request; please retry the operation. For more information on exception types and proper exception handling, please refer to http://go.microsoft.com/fwlink/?LinkId=761101 Reference:2667034d-508e-4049-af04-2443a300de7c, TrackingId:654c14f1000000030003629968cb3d71_G9_B22, SystemTracker:gi::G9:7863226:Jb5Ji6Bwreb6EG3oiVcJ8Ds27EMkNjLzBIpLMeaQKk4ymSLxVAjVsg, bi::link221849(7863227_G9), Timestamp:2025-09-17T23:00:01, Info: map[]}
2025-09-17T23:00:01.5728164Z info azservicebus SDK event: azsb.Conn, content: [c:4381, l:4381, r:name:Jb5Ji6] Links closing (permanent: false)
2025-09-17T23:00:01.5728220Z info azservicebus SDK event: azsb.Conn, content: [c:4381, l:4381, r:name:Jb5Ji6] Closing Receiver
2025-09-17T23:00:01.5728437Z info azservicebus SDK event: azsb.Conn, content: [c:4381, l:4381, r:name:Jb5Ji6] Closing Session
2025-09-17T23:00:01.5733018Z info azservicebus SDK event: azsb.Conn, content: [c:4381, l:4381, r:name:Jb5Ji6] Closing RPC
  • What did you expect or want to happen?

Should we have some kind of backoff if the link recovery fails repeatedly?

  • How can we reproduce it?

Unsure.. We only caught it because one of our queues were totally down

  • Anything we should know about your environment.

Running on k8s

Metadata

Metadata

Labels

ClientThis issue points to a problem in the data-plane of the library.Service AttentionWorkflow: This issue is responsible by Azure service team.Service Busneeds-team-attentionWorkflow: This issue needs attention from Azure service team or SDK team

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions