[SPARK-47986][CONNECT][FOLLOW-UP] Unable to create a new session when the default session is closed by the server #47008
Conversation
  val session = tryCreateSessionFromClient()
-   .getOrElse(sessions.get(builder.configuration))
+   .getOrElse({
+     var existingSession = sessions.get(builder.configuration)
Do we need a lock here for sessions?
I don't think so, as the Cache is backed by a ConcurrentMap, which allows concurrent access to the data. Source: https://guava.dev/releases/17.0/api/docs/com/google/common/cache/CacheBuilder.html.
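For illustration, a minimal sketch (the key/value types and thread pool are placeholders, not the PR's actual code) of several threads reading and writing a Guava Cache without any external lock:

```scala
import java.util.concurrent.Executors

import com.google.common.cache.{Cache, CacheBuilder}

object GuavaCacheConcurrency {
  def main(args: Array[String]): Unit = {
    // Guava's Cache is backed by a ConcurrentMap-like structure, so individual
    // put/getIfPresent calls are safe under concurrent access.
    val sessions: Cache[String, String] = CacheBuilder.newBuilder().build[String, String]()

    val pool = Executors.newFixedThreadPool(4)
    (1 to 100).foreach { i =>
      pool.submit(new Runnable {
        override def run(): Unit = {
          sessions.put(s"config-$i", s"session-$i") // concurrent write, no lock needed
          sessions.getIfPresent(s"config-$i")       // concurrent read, no lock needed
        }
      })
    }
    pool.shutdown()
  }
}
```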
   */
- def getDefaultSession: Option[SparkSession] = Option(defaultSession.get())
+ def getDefaultSession: Option[SparkSession] =
+   Option(defaultSession.get()).filterNot(s => s.client != null && s.client.hasSessionChanged)
I wonder if we can use the same naming as the Python side, so we can easily land the same fix on both sides in the future.
Sure, I'll look into the Python code and try to harmonise the naming with it.
Unfortunately there are no Scala counterparts for is_stopped or is_closed, so I named it isSessionValid instead (not public).
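For context, a rough sketch of what such a check might look like on the SparkSession side; hasSessionChanged here stands in for whatever invalidation flag the client exposes, so treat this as an illustration rather than the exact PR code:

```scala
// A session is treated as valid only while its client exists and has not
// observed a server-side session change.
private[sql] def isSessionValid: Boolean =
  client != null && !client.hasSessionChanged
```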
Looks fine, but it would be great to have a look from @nija-at and @juliuszsompolski.
nija-at left a comment:
Overall LGTM.
// Treat a SESSION_CHANGED error from the server as the session having been
// invalidated, so the client can later create a fresh one.
case e: StatusRuntimeException
    if e.getStatus.getCode == Status.Code.INTERNAL &&
      e.getMessage.contains("[INVALID_HANDLE.SESSION_CHANGED]") =>
  sessionInvalidated.setRelease(true)
Should we also reset the serverSideSessionId?
I don't think so, because if we reset it here, serverSideSessionId would just be updated to the new value. Maybe @nemanja-boric-databricks wants to double-check in case I missed something.
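For reference, a hedged sketch of the interplay being discussed (the class and method names here are illustrative, not the actual ResponseValidator code): the first observed server-side session id is recorded, and any later mismatch flags the session as invalidated rather than overwriting the recorded id.

```scala
import java.util.concurrent.atomic.AtomicBoolean

class ResponseValidatorSketch {
  private val sessionInvalidated = new AtomicBoolean(false)
  private var serverSideSessionId: Option[String] = None

  // Record the first server-side session id seen, and treat any later change
  // of that id as the server having replaced the session.
  def verifyResponse(observedSessionId: String): Unit = synchronized {
    serverSideSessionId match {
      case Some(known) if known != observedSessionId =>
        // Do not overwrite the recorded id; otherwise the mismatch could no
        // longer be detected. Just mark the session as invalidated.
        sessionInvalidated.setRelease(true)
      case None =>
        serverSideSessionId = Some(observedSessionId)
      case _ => // ids match; nothing to do
    }
  }

  def hasSessionChanged: Boolean = sessionInvalidated.getAcquire
}
```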
This failure really doesn't seem to be related to this change.

Merged to master.
What changes were proposed in this pull request?
This is a Scala port of #46221 and #46435.
A client is unaware of a server restart, or of the server having closed the session, until it receives an error. At that point, however, the client is unable to create a new session to the same Connect endpoint, because the stale session is still recorded as the active and default session.
With this change, when the server communicates via a gRPC error that the session has changed, the session and the respective client are marked as stale, allowing a new default session to be created via the session builder.
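A hedged sketch of the resulting builder behaviour (createNewSession is a hypothetical helper standing in for the real session construction):

```scala
// Reuse the cached default session only while it is still valid; otherwise
// fall back to creating a fresh session against the same endpoint.
def getOrCreate(): SparkSession =
  getDefaultSession            // filters out sessions marked as invalidated
    .getOrElse(createNewSession())
```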
In some cases, particularly when running against older Spark clusters (3.5), the error instead manifests as a mismatch in the observed server-side session id between calls. This fix detects and handles that case as well.
Why are the changes needed?
Being unable to use getOrCreate() after an error is unacceptable and should be fixed.
Does this PR introduce any user-facing change?
No
How was this patch tested?
./build/sbt "testOnly *SparkSessionE2ESuite"
Was this patch authored or co-authored using generative AI tooling?
No