[SPARK-45516][CONNECT] Include QueryContext in SparkThrowable proto message #43352
Conversation
@hvanhovell @juliuszsompolski @HyukjinKwon Please take a look
juliuszsompolski left a comment
LGTM
connector/connect/common/src/main/protobuf/spark/connect/base.proto (outdated)
…proto Co-authored-by: Maxim Gekk <[email protected]>
@heyihong Please, add the …

Merged to master.
```python
    self, oneof_group: typing_extensions.Literal["_file_name", b"_file_name"]
) -> typing_extensions.Literal["file_name"] | None: ...

class QueryContext(google.protobuf.message.Message):
```
qq; could we do this in Python client too?
Not yet, if I understand correctly. @gatorsmile has some concerns about whether exposing QueryContext in the PySpark exception APIs makes sense for non-SQL PySpark exceptions. There is ongoing discussion about this.
```scala
.newBuilder()
.setObjectType(queryCtx.objectType())
.setObjectName(queryCtx.objectName())
.setStartIndex(queryCtx.startIndex())
```
Hm, this should actually fail after the #43334 PR because DataFrameQueryContext now throws an exception (https://github.com/apache/spark/pull/43334/files#diff-b3bc05fec45cd951053b2876c71c7730b63789cb4336a7537a6654c724db3241R586-R589).
It seems we are missing this information on the Spark Connect client side for now, which is why it appears to work, but we should carry this context for DataFrame operations.
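For illustration, a minimal sketch of that failure mode; the class shape below is assumed for the example and is not Spark's actual definition:

```scala
// Assumed shape, for illustration only: after #43334 a DataFrame-originated
// query context carries no SQL text positions, so the position accessors throw.
class DataFrameQueryContextSketch(objectType: String, objectName: String) {
  def startIndex(): Int =
    throw new UnsupportedOperationException("no startIndex for DataFrame operations")
  def stopIndex(): Int =
    throw new UnsupportedOperationException("no stopIndex for DataFrame operations")
}

// The builder chain quoted above would then fail at .setStartIndex(queryCtx.startIndex()).
```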
@MaxGekk I think there's a bug not only here but also without Spark Connect.
```scala
scala> try {
     |   spark.range(1).select("a")
     | } catch {
     |   case e: org.apache.spark.sql.catalyst.ExtendedAnalysisException => println(e.getQueryContext)
     | }
[Lorg.apache.spark.QueryContext;@6a9d7514
val res2: Any = ()
```

It doesn't contain the QueryContext from #43334 ..
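As an aside, println on a Java array prints only its identity hash, so to actually inspect the contexts one has to iterate explicitly. A small sketch:

```scala
try {
  spark.range(1).select("a")
} catch {
  case e: org.apache.spark.sql.catalyst.ExtendedAnalysisException =>
    // Print each context's details instead of the array's default toString.
    e.getQueryContext.foreach { ctx =>
      println(s"${ctx.objectType()} ${ctx.objectName()}: ${ctx.fragment()}")
    }
}
```

Either way, per the comment above, the array in this case does not include the context added by #43334.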
Alright, there are three issues. TL;DR:

1. There are many places that do not provide an `Origin` containing a `QueryContext` in `QueryCompilationErrors`, so the `QueryContext` is often missing (e.g., the case above). cc @MaxGekk
2. `Origin.context` is used directly on the executor side, e.g., `origin.context` in `DivideYMInterval`. This wrongly returns a `SQLQueryContext` for DataFrame operations (from the executor side) in the Spark Connect server because we do not call `withOrigin` there. That's why this specific code did not throw an exception from https://github.com/apache/spark/pull/43334/files#diff-b3bc05fec45cd951053b2876c71c7730b63789cb4336a7537a6654c724db3241R586-R589: it has never been a `DataFrameQueryContext`. cc @heyihong @juliuszsompolski
3. The current logic in `ErrorUtils.scala` should handle the case of `DataFrameQueryContext`; e.g., `DataFrameQueryContext.stopIndex()` will throw an exception. Should we:
   - Set a default value in `DataFrameQueryContext.stopIndex()` instead of throwing an exception? @MaxGekk
   - Or make the protobuf field optional and throw an exception from the Spark Connect client side (see the sketch after this list)? @heyihong @juliuszsompolski
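A minimal sketch of that second option, populating the position fields only when the context can provide them; the builder trait here is a hypothetical stand-in for the generated proto builder, not the PR's actual code:

```scala
import scala.util.Try

// Hypothetical stand-in for the generated proto builder.
trait QueryContextBuilder {
  def setObjectType(v: String): QueryContextBuilder
  def setObjectName(v: String): QueryContextBuilder
  def setStartIndex(v: Int): QueryContextBuilder
  def setStopIndex(v: Int): QueryContextBuilder
}

def fillContext(queryCtx: org.apache.spark.QueryContext, builder: QueryContextBuilder): Unit = {
  builder
    .setObjectType(queryCtx.objectType())
    .setObjectName(queryCtx.objectName())
  // Set positions only when the context provides them;
  // DataFrameQueryContext.startIndex()/stopIndex() may throw.
  Try(queryCtx.startIndex()).foreach(builder.setStartIndex)
  Try(queryCtx.stopIndex()).foreach(builder.setStopIndex)
}
```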
> We should invoke withOrigin in Spark Connect server.

Do you mean client? The server-side stacktrace is not interesting for users trying to understand their own code mistakes.
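For context, a minimal sketch of what invoking withOrigin looks like with Spark's CurrentOrigin (the position values here are made up):

```scala
import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, Origin}

// Expressions constructed inside the block pick this Origin up via
// CurrentOrigin.get; that Origin is what later yields a QueryContext.
val origin = Origin(line = Some(10), startPosition = Some(4))
CurrentOrigin.withOrigin(origin) {
  // ... construct expressions / plans here ...
}
```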
I made a mistake. I updated my comment.
Either setting a default value or making the protobuf field optional should work. It depends on the semantics of DataFrameQueryContext.stopIndex().
cc @MaxGekk
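A sketch of the optional-field route on the client side: proto3 optional fields generate a has* accessor, so the client can distinguish "absent" from a zero default (the message and error wording below are assumptions, not this PR's code):

```scala
// Raise only when a caller actually asks for a missing index.
def stopIndex(ctx: proto.QueryContext): Int =
  if (ctx.hasStopIndex) ctx.getStopIndex
  else throw new UnsupportedOperationException(
    "stopIndex is not available for DataFrame query contexts")
```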
Let's make sure we fix this.
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
No
How was this patch tested?

```
build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite"
```

Was this patch authored or co-authored using generative AI tooling?