[SPARK-46655][SQL] Skip query context catching in DataFrame methods
#44501
Conversation
@cloud-fan Could you look at the PR, please?
```scala
private val sparkCodePattern = Pattern.compile("(org\\.apache\\.spark\\.sql\\." +
  "(?:functions|Column|ColumnName|SQLImplicits|Dataset|DataFrameStatFunctions|DatasetHolder)" +
  "(?:|\\..*|\\$.*))" +
  "|(scala\\.collection\\..*)")
```
I treat Scala collections classes as Spark classes here because they can be interleaved with Spark frames inside Spark code.
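As a rough illustration of how such a pattern can be used (a hypothetical sketch, not Spark's actual call-site logic; `userCallSite` and the sample frames are invented for this example), one can scan the class names of a stack trace and take the first frame that matches neither the Spark SQL API classes nor Scala collections:

```scala
import java.util.regex.Pattern

object CallSiteSketch {
  // Re-creation of the pattern from the diff above, for illustration.
  private val sparkCodePattern = Pattern.compile(
    "(org\\.apache\\.spark\\.sql\\." +
      "(?:functions|Column|ColumnName|SQLImplicits|Dataset|DataFrameStatFunctions|DatasetHolder)" +
      "(?:|\\..*|\\$.*))" +
      "|(scala\\.collection\\..*)")

  // The user's call site is the first stack frame whose class name is
  // neither a Spark SQL API class nor a Scala collections class.
  def userCallSite(classNames: Seq[String]): Option[String] =
    classNames.find(name => !sparkCodePattern.matcher(name).matches())
}
```

For a trace like `Seq("org.apache.spark.sql.Dataset", "scala.collection.immutable.List", "com.example.MyApp")`, this sketch would pick `com.example.MyApp` as the user call site.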
...core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala (outdated, resolved)
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala (resolved)
```scala
queryContext = Array(
  ExpectedContext(fragment = "withColumn", callSitePattern = getCurrentClassCallSitePattern)))
```
How about reverting the changes for the withColumn() method?
withColumn takes Column parameters, so why do we lose the error context if Column captures it?
I have added the origin from the head expression as a double check:

```scala
throw QueryExecutionErrors.nonTimeWindowNotSupportedInStreamingError(
  windowFuncList,
  columnNameList,
  windowSpecList,
  windowExpression.head.origin)
```

In that case, the query context is obviously propagated.
Here is the related PR: #41578
@cloud-fan For now, I would revert the changes in withColumn. We need to figure out how to properly fix the case of Window + DataFrame query context.
Then should we add withOrigin in select as well? Any API that produces a Project or Window should have an error context.
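The withOrigin mechanism discussed here can be sketched roughly as a thread-local that records the call site while a plan node is being built (a minimal sketch with illustrative names only; Spark's real Origin/CurrentOrigin machinery in Catalyst differs in detail):

```scala
// Minimal sketch of call-site capture via a thread-local "origin"
// (illustrative; Spark's actual CurrentOrigin lives in Catalyst).
case class Origin(callSite: Option[String] = None)

object CurrentOriginSketch {
  private val current = new ThreadLocal[Origin] {
    override def initialValue(): Origin = Origin()
  }

  def get: Origin = current.get()

  // Run `body` with the given call site recorded, restoring the
  // previous origin afterwards so nested calls behave correctly.
  def withOrigin[T](callSite: String)(body: => T): T = {
    val saved = current.get()
    current.set(Origin(Some(callSite)))
    try body finally current.set(saved)
  }
}
```

Under this sketch, an API such as select would wrap its plan construction in withOrigin so that any Project node it creates can read the current origin and attach it as error context.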
@MaxGekk On second thought, I feel we don't need the DataFrame error context for analysis errors. Analysis is eager in DataFrames, so users can find the DataFrame call site from the stack trace of the analysis exception.
This reverts commit 9850345.
ok, I have reverted all methods in
Merging to master. Thank you, @cloud-fan, for the review.


What changes were proposed in this pull request?
In this PR, I propose to not catch the DataFrame query context in DataFrame methods but to leave that close to Column functions.

Why are the changes needed?
To improve user experience with Spark DataFrame/Dataset APIs, and provide more precise context of errors.
Does this PR introduce any user-facing change?
No, since the feature hasn't been released yet.
How was this patch tested?
By running the modified test suites.
Was this patch authored or co-authored using generative AI tooling?
No.