-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Spark: Fix Alignment of Merge Commands with Mixed Case #4848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark: Fix Alignment of Merge Commands with Mixed Case #4848
Conversation
Prior to this a mixed-case insert statement would fail to be marked as aligned after our alignement rule was applied. This would occur because Spark is allowed to opperate without case sensitivity. Although we would correctly align the fields, our check for alignment required an exact match even with the system was set to be case-insensitive.
aokolnychyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
| matchedActions = alignedMatchedActions, | ||
| notMatchedActions = alignedNotMatchedActions) | ||
|
|
||
| if (!alignedMerge.aligned) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative way of solving this is to provide a check like MergeIntoIcebergTableResolutionCheck. It will be run after all resolution rules and we will know for sure that we failed to align the assignments. If we fail here, we won't give Spark any chances to fix the alignments using other rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we should also cover UPDATEs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did add in a test case for Update. I think that's a good point, I can also just add a case to the Merge Rewrite Rule itself to match on any unaligned merge to an Iceberg table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add check rule for both Updates and Merges
| val key = assignment.key | ||
| val value = assignment.value | ||
| toAssignmentRef(attr) == toAssignmentRef(key) && | ||
| val refsEqual = if (conf.caseSensitiveAnalysis) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we can use conf.resolver that would abstract this away? Then we will have only one case to handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a great idea
| createOrReplaceView( | ||
| "source", | ||
| "{ \"id\": 1, \"c1\": -2, \"c2\": \"new_str_1\" }\n" + | ||
| "{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the rest of this file aligns the json records slightly differently (no extra indentation for 2nd record)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh it was the intellij helper annotations, they made it look aligned :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hahaha that happens to me all the time. So annoying.
| "{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }"); | ||
|
|
||
| sql("MERGE INTO %s t USING source " + | ||
| "ON t.iD == source.Id " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same for MERGE statements
Fix Identation Use conf.resolver for string comparison Add check rule for failing analysis with unaligend ops
| override def apply(plan: LogicalPlan): Unit = { | ||
| plan foreach { | ||
| case m: MergeIntoIcebergTable if !m.aligned => | ||
| throw new AnalysisException(s"Could not align Iceberg MERGE INT: $m") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like a typo at the end?
| case m: MergeIntoIcebergTable if !m.aligned => | ||
| throw new AnalysisException(s"Could not align Iceberg MERGE INT: $m") | ||
| case u: UpdateIcebergTable if !u.aligned => | ||
| throw new AnalysisException(s"Could not align Iceberg UPDATE was never aligned: $u") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we need "was never aligned" part?
aokolnychyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
I blame all typos on the fact that I don't have my external monitor |
* Spark: Fix Alignment of Merge Commands with Mixed Case Prior to this a mixed-case insert statement would fail to be marked as aligned after our alignment rule was applied. This would then fail the entire MERGE INTO command. The commands were correctly aligned but our alignment check was always case sensitive.
* Spark: Fix Alignment of Merge Commands with Mixed Case Prior to this a mixed-case insert statement would fail to be marked as aligned after our alignment rule was applied. This would then fail the entire MERGE INTO command. The commands were correctly aligned but our alignment check was always case sensitive.
…nsitivity (apache#1428) ### What changes were proposed in this pull request? Previously alignment was checked by comparing the exact attribute references between Spark and the underlying table, which failed with case insensitive SQL configurations. To fix this we use the configuration's resolver to compare references. ### Why are the changes needed? This was breaking some migrations from Spark 3.1 where the alignment check was not present. A query which attempted to do a MergeInto with column names which matched in a case-insensitive way would fail to trigger our plan rewrite rules leading to an opaque MERGE INTO is temporarily not supported exception. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? This patch is a cherry pick of the code fixed and released in Apache Iceberg. apache/iceberg#4848 The test for this specific case is in the Iceberg codebase and we will back-port this to ADT following the merge of the fix in Spark.
Prior to this a mixed-case insert statement would fail to be marked
as aligned after our alignement rule was applied. This would occur
because Spark is allowed to opperate without case sensitivity. Although
we would correctly align the fields, our check for alignment required
an exact match even with the system was set to be case-insensitive. Failing this
check would mean our RewriteToMerge analysis rule would never get applied.