Skip to content

Conversation

@RussellSpitzer
Copy link
Member

Prior to this a mixed-case insert statement would fail to be marked
as aligned after our alignement rule was applied. This would occur
because Spark is allowed to opperate without case sensitivity. Although
we would correctly align the fields, our check for alignment required
an exact match even with the system was set to be case-insensitive. Failing this
check would mean our RewriteToMerge analysis rule would never get applied.

Prior to this a mixed-case insert statement would fail to be marked
as aligned after our alignement rule was applied. This would occur
because Spark is allowed to opperate without case sensitivity. Although
we would correctly align the fields, our check for alignment required
an exact match even with the system was set to be case-insensitive.
Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch!

matchedActions = alignedMatchedActions,
notMatchedActions = alignedNotMatchedActions)

if (!alignedMerge.aligned) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative way of solving this is to provide a check like MergeIntoIcebergTableResolutionCheck. It will be run after all resolution rules and we will know for sure that we failed to align the assignments. If we fail here, we won't give Spark any chances to fix the alignments using other rules.

Copy link
Contributor

@aokolnychyi aokolnychyi May 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we should also cover UPDATEs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did add in a test case for Update. I think that's a good point, I can also just add a case to the Merge Rewrite Rule itself to match on any unaligned merge to an Iceberg table

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add check rule for both Updates and Merges

val key = assignment.key
val value = assignment.value
toAssignmentRef(attr) == toAssignmentRef(key) &&
val refsEqual = if (conf.caseSensitiveAnalysis) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we can use conf.resolver that would abstract this away? Then we will have only one case to handle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a great idea

createOrReplaceView(
"source",
"{ \"id\": 1, \"c1\": -2, \"c2\": \"new_str_1\" }\n" +
"{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the rest of this file aligns the json records slightly differently (no extra indentation for 2nd record)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh it was the intellij helper annotations, they made it look aligned :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha that happens to me all the time. So annoying.

"{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }");

sql("MERGE INTO %s t USING source " +
"ON t.iD == source.Id " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same for MERGE statements

Fix Identation
Use conf.resolver for string comparison
Add check rule for failing analysis with unaligend ops
override def apply(plan: LogicalPlan): Unit = {
plan foreach {
case m: MergeIntoIcebergTable if !m.aligned =>
throw new AnalysisException(s"Could not align Iceberg MERGE INT: $m")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like a typo at the end?

case m: MergeIntoIcebergTable if !m.aligned =>
throw new AnalysisException(s"Could not align Iceberg MERGE INT: $m")
case u: UpdateIcebergTable if !u.aligned =>
throw new AnalysisException(s"Could not align Iceberg UPDATE was never aligned: $u")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need "was never aligned" part?

Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RussellSpitzer
Copy link
Member Author

I blame all typos on the fact that I don't have my external monitor

@RussellSpitzer RussellSpitzer merged commit ca8f0a1 into apache:master May 25, 2022
@RussellSpitzer RussellSpitzer deleted the FixMixedCaseMergeAlignment branch May 25, 2022 22:36
RussellSpitzer added a commit to RussellSpitzer/iceberg that referenced this pull request May 25, 2022
* Spark: Fix Alignment of Merge Commands with Mixed Case

Prior to this a mixed-case insert statement would fail to be marked
as aligned after our alignment rule was applied. This would then fail the
entire MERGE INTO command. The commands were correctly aligned but
our alignment check was always case sensitive.
RussellSpitzer added a commit that referenced this pull request May 25, 2022
* Spark: Fix Alignment of Merge Commands with Mixed Case

Prior to this a mixed-case insert statement would fail to be marked
as aligned after our alignment rule was applied. This would then fail the
entire MERGE INTO command. The commands were correctly aligned but
our alignment check was always case sensitive.
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…nsitivity (apache#1428)

### What changes were proposed in this pull request?
Previously alignment was checked by comparing the exact attribute references between
Spark and the underlying table, which failed with case insensitive SQL configurations. To
fix this we use the configuration's resolver to compare references.

### Why are the changes needed?
This was breaking some migrations from Spark 3.1 where the alignment check was not present. A query which attempted
to do a MergeInto with column names which matched in a case-insensitive way would fail to trigger our plan rewrite
rules leading to an opaque MERGE INTO is temporarily not supported exception.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This patch is a cherry pick of the code fixed and released in Apache Iceberg. apache/iceberg#4848 

The test for this specific case is in the Iceberg codebase and we will back-port this to ADT following the merge of the fix in Spark.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants