-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix(spanner): avoid double decrement in session counting #13395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @egonelbre, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a subtle bug in the Spanner client's session management where the count of in-use sessions could become negative due to an erroneous double decrement. The change ensures the session counter is accurately maintained, preventing potential issues with session pool health and resource management. New comprehensive tests have been added to validate the fix and prevent future regressions. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request addresses a potential race condition in session management that could lead to a double decrement of the in-use session counter. The fix in spanner/session.go is straightforward and correct, preventing the second decrement by returning early. The addition of two targeted tests in spanner/session_test.go is excellent, as they cover the specific scenarios where this bug could manifest. The changes are well-contained and effectively resolve the issue. I have one minor suggestion to improve consistency in the new test code.
0c379ad to
9e8f2ea
Compare
|
@egonelbre Thanks for opening this PR, we are planning to remove all session pool related code in next release since by default client lib uses multiplexed sessions, are you still using regular session in your application? if yes, can you please tell us any specific reason for using it? |
|
@rahul2393 sure. So we've delayed upgrading spanner library for some time because they ended up stalling out our CI pipeline and were unsure why. But, we don't have a nice small reproducer. Yesterday and today I finally had time to bisect the issue to 0491ba6 commit. After updating to latest spanner library and "GOOGLE_CLOUD_SPANNER_MULTIPLEXED_SESSIONS=false" the CI pipeline does seem to work again. The high level scenario is following, the CI pipeline sets up 7 spanner emulators (to avoid emulator internal locks), each tests sets up and tears down a new database/tables. Tests run in a highly parallel fashion (I would guess about 8000 tests, each setting up 2-3 databases per test and 2-6 connection pools per test). After some time the spanner emulator stops accepting connections. This suggests that something isn't being properly cleaned up. I'm guessing that either connections or sessions are left open, which eventually exhaust the spanner emulator. Or that some goroutines are not being shut down. Or that some retry mechanism is causing problems. But, it could also be a spanner emulator bug with regards to multiplexed sessions that break it down. I'm still figuring out what the next best step is in debugging multiplexed sessions. Probably will either try to add some monitoring against spanner emulators. Any suggestions for debugging would be very welcome. This exact fix came about when I ran with that specific environment flag disabled, and had one test failure with the specific message. Directed gemini to diagnose the issue for me. And, if the single sessions are being removed I'm fine closing the PR. |
|
It looks like one of the issues is #13396; still testing whether it resolves the full problem. EDIT: seems that it's not sufficient, but it does avoid goroutine buildup. |
|
@rahul2393 as far as I can able to debug. It seems it's partially related to #10378 issue; and partially that the spanner emulator falters around having ~10000 sessions open (GoogleCloudPlatform/cloud-spanner-emulator#159). I'm not yet sure why multiplexed connections make it worse. |
|
@olavloite / @sakthivelmanii need help in one more approval to merge this one, please help |
PR created by the Librarian CLI to initialize a release. Merging this PR will auto trigger a release. Librarian Version: v1.0.0 Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/librarian-go@sha256:718167d5c23ed389b41f617b3a00ac839bdd938a6bd2d48ae0c2f1fa51ab1c3d <details><summary>spanner: 1.87.0</summary> ## [1.87.0](spanner/v1.86.1...spanner/v1.87.0) (2025-12-10) ### Features * Add Send and Ack mutations for Queues (PiperOrigin-RevId: 832425466) ([185951b](185951b3)) * Exposing AutoscalingConfig in InstancePartition (PiperOrigin-RevId: 825184314) ([185951b](185951b3)) * Add Spanner location API (PiperOrigin-RevId: 833474957) ([185951b](185951b3)) * Add QueryAdvisorResult for query plan (PiperOrigin-RevId: 832425466) ([185951b](185951b3)) * improve the SQL formatting when printing out SQL (#13267) ([af0806f](af0806f4)) * Add grpc.xds.resource_type label to xDS client metrics (#13358) ([b9196cf](b9196cf6)) * support subquery in View Join (#13266) ([d19f797](d19f797b)) ### Bug Fixes * add env var to allow disabling directpath bound token (#13265) ([029bc79](029bc795)) * fix createMultiplexedSession goroutine leak (#13396) ([1805e89](1805e895)) * decoding spanner rows using SelectAll should map values in correct annotations (#13301) ([315f65b](315f65b5)) * error instead of panic for iterator after tx end (#13366) ([a27c19a](a27c19ae)) * transaction_tag should be set on BeginTransactionRequest (#13463) ([a429aea](a429aea4)) * Configure keepAlive time for gRPC TCP connections (#13216) ([ca8f64e](ca8f64e0)) * avoid double decrement in session counting (#13395) ([e036421](e0364214)) ### Documentation * minor update for Spanner Location API (PiperOrigin-RevId: 834841888) ([185951b](185951b3)) * Update description for the BatchCreateSessionsRequest and Session (PiperOrigin-RevId: 832425466) ([185951b](185951b3)) * Update description for the IsolationLevel (PiperOrigin-RevId: 832425466) ([185951b](185951b3)) </details>
…3464) PR created by the Librarian CLI to initialize a release. Merging this PR will auto trigger a release. Librarian Version: v1.0.0 Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/librarian-go@sha256:718167d5c23ed389b41f617b3a00ac839bdd938a6bd2d48ae0c2f1fa51ab1c3d <details><summary>spanner: 1.87.0</summary> ## [1.87.0](googleapis/google-cloud-go@spanner/v1.86.1...spanner/v1.87.0) (2025-12-10) ### Features * Add Send and Ack mutations for Queues (PiperOrigin-RevId: 832425466) ([185951b](googleapis@185951b3)) * Exposing AutoscalingConfig in InstancePartition (PiperOrigin-RevId: 825184314) ([185951b](googleapis@185951b3)) * Add Spanner location API (PiperOrigin-RevId: 833474957) ([185951b](googleapis@185951b3)) * Add QueryAdvisorResult for query plan (PiperOrigin-RevId: 832425466) ([185951b](googleapis@185951b3)) * improve the SQL formatting when printing out SQL (googleapis#13267) ([af0806f](googleapis@af0806f4)) * Add grpc.xds.resource_type label to xDS client metrics (googleapis#13358) ([b9196cf](googleapis@b9196cf6)) * support subquery in View Join (googleapis#13266) ([d19f797](googleapis@d19f797b)) ### Bug Fixes * add env var to allow disabling directpath bound token (googleapis#13265) ([029bc79](googleapis@029bc795)) * fix createMultiplexedSession goroutine leak (googleapis#13396) ([1805e89](googleapis@1805e895)) * decoding spanner rows using SelectAll should map values in correct annotations (googleapis#13301) ([315f65b](googleapis@315f65b5)) * error instead of panic for iterator after tx end (googleapis#13366) ([a27c19a](googleapis@a27c19ae)) * transaction_tag should be set on BeginTransactionRequest (googleapis#13463) ([a429aea](googleapis@a429aea4)) * Configure keepAlive time for gRPC TCP connections (googleapis#13216) ([ca8f64e](googleapis@ca8f64e0)) * avoid double decrement in session counting (googleapis#13395) ([e036421](googleapis@e0364214)) ### Documentation * minor update for Spanner Location API (PiperOrigin-RevId: 834841888) ([185951b](googleapis@185951b3)) * Update description for the BatchCreateSessionsRequest and Session (PiperOrigin-RevId: 832425466) ([185951b](googleapis@185951b3)) * Update description for the IsolationLevel (PiperOrigin-RevId: 832425466) ([185951b](googleapis@185951b3)) </details>
No description provided.