[SPARK-45516][CONNECT] Include QueryContext in SparkThrowable proto message #43352
Conversation
@hvanhovell @juliuszsompolski @HyukjinKwon Please take a look
juliuszsompolski left a comment
LGTM
connector/connect/common/src/main/protobuf/spark/connect/base.proto (outdated)
…proto Co-authored-by: Maxim Gekk <[email protected]>
@heyihong Please, add the …

Merged to master.
```python
    self, oneof_group: typing_extensions.Literal["_file_name", b"_file_name"]
) -> typing_extensions.Literal["file_name"] | None: ...

class QueryContext(google.protobuf.message.Message):
```
qq; could we do this in Python client too?
Not yet, if I understand correctly. @gatorsmile has some concerns about whether exposing QueryContext in the PySpark exception APIs makes sense for non-SQL PySpark exceptions. There is ongoing discussion about this.
```scala
.newBuilder()
.setObjectType(queryCtx.objectType())
.setObjectName(queryCtx.objectName())
.setStartIndex(queryCtx.startIndex())
```
Hm, this should actually fail after the #43334 PR because DataFrameQueryContext now throws an exception (https://github.com/apache/spark/pull/43334/files#diff-b3bc05fec45cd951053b2876c71c7730b63789cb4336a7537a6654c724db3241R586-R589).
It seems we are missing this information on the Spark Connect client side for now, which is why it appears to work, but we should carry this context for DataFrame operations.
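For illustration, a minimal sketch of that failure mode; the class shape below is assumed for the example and is not Spark's actual definition:

```scala
// Assumed shape, for illustration only: after #43334 a DataFrame-originated
// query context carries no SQL text positions, so the position accessors throw.
class DataFrameQueryContextSketch(objectType: String, objectName: String) {
  def startIndex(): Int =
    throw new UnsupportedOperationException("no startIndex for DataFrame operations")
  def stopIndex(): Int =
    throw new UnsupportedOperationException("no stopIndex for DataFrame operations")
}

// The builder chain quoted above would then fail at .setStartIndex(queryCtx.startIndex()).
```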
@MaxGekk I think there's a bug not only here but also without Spark Connect.
```scala
scala> try {
     |   spark.range(1).select("a")
     | } catch {
     |   case e: org.apache.spark.sql.catalyst.ExtendedAnalysisException => println(e.getQueryContext)
     | }
[Lorg.apache.spark.QueryContext;@6a9d7514
val res2: Any = ()
```

It doesn't contain the QueryContext from #43334 ..
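As an aside, println on a Java array prints only its identity hash, so to actually inspect the contexts one has to iterate explicitly. A small sketch:

```scala
try {
  spark.range(1).select("a")
} catch {
  case e: org.apache.spark.sql.catalyst.ExtendedAnalysisException =>
    // Print each context's details instead of the array's default toString.
    e.getQueryContext.foreach { ctx =>
      println(s"${ctx.objectType()} ${ctx.objectName()}: ${ctx.fragment()}")
    }
}
```

Either way, per the comment above, the array in this case does not include the context added by #43334.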
Alright, there are three issues. TL;DR:

1. There are many places that do not provide an `Origin` containing a `QueryContext` in `QueryCompilationErrors`, so the `QueryContext` is often missing (e.g., the case above). cc @MaxGekk
2. `Origin.context` is used directly on the executor side, e.g., `origin.context` in `DivideYMInterval`. This wrongly returns a `SQLQueryContext` for DataFrame operations (from the executor side) in the Spark Connect server because we do not call `withOrigin` there. That's why this specific code did not throw an exception from https://github.com/apache/spark/pull/43334/files#diff-b3bc05fec45cd951053b2876c71c7730b63789cb4336a7537a6654c724db3241R586-R589: it has never been a `DataFrameQueryContext`. cc @heyihong @juliuszsompolski
3. The current logic in `ErrorUtils.scala` should handle the case of `DataFrameQueryContext`; e.g., `DataFrameQueryContext.stopIndex()` will throw an exception. Should we:
   - Set a default value in `DataFrameQueryContext.stopIndex()` instead of throwing an exception? @MaxGekk
   - Or make the protobuf field optional and throw an exception from the Spark Connect client side (see the sketch after this list)? @heyihong @juliuszsompolski
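A minimal sketch of that second option, populating the position fields only when the context can provide them; the builder trait here is a hypothetical stand-in for the generated proto builder, not the PR's actual code:

```scala
import scala.util.Try

// Hypothetical stand-in for the generated proto builder.
trait QueryContextBuilder {
  def setObjectType(v: String): QueryContextBuilder
  def setObjectName(v: String): QueryContextBuilder
  def setStartIndex(v: Int): QueryContextBuilder
  def setStopIndex(v: Int): QueryContextBuilder
}

def fillContext(queryCtx: org.apache.spark.QueryContext, builder: QueryContextBuilder): Unit = {
  builder
    .setObjectType(queryCtx.objectType())
    .setObjectName(queryCtx.objectName())
  // Set positions only when the context provides them;
  // DataFrameQueryContext.startIndex()/stopIndex() may throw.
  Try(queryCtx.startIndex()).foreach(builder.setStartIndex)
  Try(queryCtx.stopIndex()).foreach(builder.setStopIndex)
}
```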
> We should invoke withOrigin in Spark Connect server.

Do you mean client? The server-side stacktrace is not interesting for users trying to understand their own code mistakes.
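For context, a minimal sketch of what invoking withOrigin looks like with Spark's CurrentOrigin (the position values here are made up):

```scala
import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, Origin}

// Expressions constructed inside the block pick this Origin up via
// CurrentOrigin.get; that Origin is what later yields a QueryContext.
val origin = Origin(line = Some(10), startPosition = Some(4))
CurrentOrigin.withOrigin(origin) {
  // ... construct expressions / plans here ...
}
```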
I made a mistake. I updated my comment.
Either setting a default value or making the protobuf field optional should work. It depends on the semantics of DataFrameQueryContext.stopIndex().
cc @MaxGekk
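A sketch of the optional-field route on the client side: proto3 optional fields generate a has* accessor, so the client can distinguish "absent" from a zero default (the message and error wording below are assumptions, not this PR's code):

```scala
// Raise only when a caller actually asks for a missing index.
def stopIndex(ctx: proto.QueryContext): Int =
  if (ctx.hasStopIndex) ctx.getStopIndex
  else throw new UnsupportedOperationException(
    "stopIndex is not available for DataFrame query contexts")
```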
Let's make sure we fix this.
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
No
How was this patch tested?

```
build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite"
```

Was this patch authored or co-authored using generative AI tooling?