[SPARK-45022][SQL] Provide context for dataset API errors #42740
Conversation
Since we need to touch a lot of functions anyway, here is a long-standing cleanup that is not finished: most of the functions can be implemented by creating an UnresolvedFunction. We don't need to create the function expression directly.
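The idea behind this cleanup can be modeled in plain Scala (a minimal sketch with hypothetical names — `Expr`, `Analyzer`, the registry — not Spark's actual Catalyst classes): the DSL layer only records the function name and its arguments in an unresolved node, and a later analysis pass resolves the name against a function registry, just as the analyzer does for functions used in SQL text.

```scala
// Minimal model of the pattern (hypothetical, not Spark's real classes):
// the DSL emits an UnresolvedFunction node; an "analyzer" pass looks the
// name up in a registry instead of each DSL method building a concrete
// expression node directly.
sealed trait Expr
case class Literal(value: Any) extends Expr
case class UnresolvedFunction(name: String, args: Seq[Expr]) extends Expr
case class ResolvedFunction(fn: Seq[Any] => Any, args: Seq[Expr]) extends Expr

object Analyzer {
  // Toy registry standing in for Spark's FunctionRegistry.
  private val registry: Map[String, Seq[Any] => Any] = Map(
    "lower"  -> { args => args.head.toString.toLowerCase },
    "concat" -> { args => args.mkString }
  )

  def resolve(e: Expr): Expr = e match {
    case UnresolvedFunction(name, args) =>
      ResolvedFunction(registry(name), args.map(resolve))
    case other => other
  }

  def eval(e: Expr): Any = e match {
    case Literal(v)                 => v
    case ResolvedFunction(fn, args) => fn(args.map(eval))
    case u: UnresolvedFunction      => sys.error(s"unresolved: ${u.name}")
  }
}
```

With this shape, adding a DSL function is just `UnresolvedFunction("lower", Seq(arg))` rather than a dedicated expression constructor per function.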
Ok, I will update the PR.
#42864 is merged now and this PR is ready for review.
```scala
originObjectName: Option[String]) extends QueryContext {
  override val contextType = QueryContextType.SQL
  override val objectType = originObjectType.getOrElse("")
```
why remove override?
This change accidentally remained here from a different implementation. I will revert it.
```scala
} else {
  methodName
}
val callSite = elements(1).toString
```
A follow-up: make the size of the call site configurable.
```diff
  */
-def join(right: Dataset[_]): DataFrame = withPlan {
-  Join(logicalPlan, right.logicalPlan, joinType = Inner, None, JoinHint.NONE)
+def join(right: Dataset[_]): DataFrame = withOrigin() {
```
Since this feature is mostly for ANSI mode, do we really need to capture the call site for DataFrame APIs that do not create expressions?
We don't need to, but please note that query context can be useful with non-ANSI-related errors too. Here is an example with over(): https://github.com/apache/spark/pull/42740/files#diff-728befc0ad63fec7c44a606c62f3f63204293d3ca87da2830fde8ae7a291656bR188-R189
Is it possible to do this in a central place, like withPlan?
Some methods' stack traces can be captured in withPlan, but others' can't; that's why I decided to add withOrigin to all top-level methods.
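A minimal sketch of how wrapping every top-level method can stay cheap (hypothetical names — `Origin`, `withOrigin`, `callSite` — this is not Spark's actual implementation, which carries more information in its Origin objects): only the outermost wrapped call walks the stack to capture the user call site; nested wrapped calls see the already-captured value and become no-ops.

```scala
import scala.util.DynamicVariable

object Origin {
  // Thread-local, dynamically scoped call-site holder.
  private val current = new DynamicVariable[Option[String]](None)

  // Wrap every top-level API method with this; only the outermost call
  // actually inspects the stack, nested wrapped calls reuse its result.
  def withOrigin[T](body: => T): T =
    if (current.value.isDefined) body
    else {
      // Frame 0 is Thread.getStackTrace and frame 1 is withOrigin itself;
      // when withOrigin is invoked from inside a wrapped API method, the
      // user call site is a couple of frames further up.
      val st = Thread.currentThread().getStackTrace
      val site = if (st.length > 3) st(3).toString else "<unknown>"
      current.withValue(Some(site))(body)
    }

  // Error paths anywhere below the wrapper can read the captured site.
  def callSite: Option[String] = current.value
}
```

Error-construction code can then call `Origin.callSite` from a central place without every intermediate method threading the context through explicitly.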
```scala
  f
} else {
  val st = Thread.currentThread().getStackTrace
  var i = framesToDrop + 3
```
why 3?
Thread.currentThread() and withOrigin() add 2 frames, and because user code can't call withOrigin() directly, we can be sure that the 3rd frame also belongs to Spark code.
I see!
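The frame arithmetic can be checked with a small self-contained sketch (a hypothetical helper, not Spark's code): in the captured array, index 0 is Thread.getStackTrace and index 1 is the capturing method itself, so the direct caller sits at index 2. In Spark's case users never call withOrigin directly, so index 2 is still library code, which is why the scan there starts at framesToDrop + 3.

```scala
object StackWalk {
  // Return the stack frame of the direct caller, after skipping
  // framesToDrop additional frames. Index 0 is Thread.getStackTrace,
  // index 1 is this method, so the direct caller sits at index 2.
  def firstCallerFrame(framesToDrop: Int = 0): String = {
    val st = Thread.currentThread().getStackTrace
    val i = framesToDrop + 2
    if (i < st.length) st(i).toString else "<unknown>"
  }
}
```

Calling `StackWalk.firstCallerFrame()` from a method yields a frame string naming that method, which is exactly the information the error context needs.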
@peter-toth Would you mind if I take this PR over?
Sure, please do it.
What changes were proposed in this pull request?
This PR captures which Dataset API is used by the user code, together with its call site in the user code, and adds this context to error messages.
E.g. consider the following Spark app, SimpleApp.scala:
After this PR the error message contains the error context (which Spark Dataset API is called from where in the user code) in the following form:
which is similar to the context already provided for SQL queries:
Please note that the stack trace in spark-shell doesn't contain meaningful elements:
so this change doesn't help with that use case.
Why are the changes needed?
To provide more user-friendly errors.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Added new UTs to
QueryExecutionAnsiErrorsSuite.
Was this patch authored or co-authored using generative AI tooling?
No.