Spark: Fix Alignment of Merge Commands with Mixed Case #4848

RussellSpitzer · 2022-05-23T23:28:10Z

Prior to this, a mixed-case insert statement would fail to be marked
as aligned after our alignment rule was applied. This happened
because Spark can operate without case sensitivity. Although
we would correctly align the fields, our check for alignment required
an exact match even when the system was set to be case-insensitive. Failing this
check meant our RewriteToMerge analysis rule would never get applied.
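A minimal sketch of the failure mode, assuming a simplified model in which assignments are aligned when there is one per target column, in column order, and each key names that column. `Assignment`, `isAligned`, and the column names are hypothetical. The `Resolver` alias copies the shape of Spark's `(String, String) => Boolean` resolver but is defined locally so the block runs without Spark.

```scala
// Hypothetical, simplified model; not Iceberg's actual AssignmentUtils code.
// A resolver decides whether two identifiers refer to the same column.
type Resolver = (String, String) => Boolean

val exactMatch: Resolver = (a, b) => a == b
val caseInsensitiveMatch: Resolver = (a, b) => a.equalsIgnoreCase(b)

// An assignment `key = value`; only the key name matters for alignment here.
final case class Assignment(key: String, value: String)

// Aligned: exactly one assignment per target column, in target-column order,
// and each key refers to the column at the same position.
def isAligned(targetCols: Seq[String], assignments: Seq[Assignment], resolver: Resolver): Boolean =
  targetCols.size == assignments.size &&
    targetCols.zip(assignments).forall { case (col, a) => resolver(col, a.key) }

// Table columns as stored vs. keys as a user typed them in a mixed-case MERGE.
val tableCols = Seq("id", "c1", "c2")
val userKeys = Seq(
  Assignment("iD", "source.Id"),
  Assignment("C1", "source.c1"),
  Assignment("c2", "source.c2"))

// The exact comparison rejects assignments that are already in the right order...
println(isAligned(tableCols, userKeys, exactMatch))           // false
// ...while a case-insensitive comparison accepts them.
println(isAligned(tableCols, userKeys, caseInsensitiveMatch)) // true
```

The ordering is identical in both calls; only the name comparison differs, which is why the fields were aligned correctly but still reported as unaligned.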

aokolnychyi

Great catch!

aokolnychyi · 2022-05-24T00:38:21Z

.../src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala

+        matchedActions = alignedMatchedActions,
+        notMatchedActions = alignedNotMatchedActions)
+
+      if (!alignedMerge.aligned) {


An alternative way of solving this is to provide a check like MergeIntoIcebergTableResolutionCheck. It will be run after all resolution rules, so we will know for sure that we failed to align the assignments. If we fail here, we won't give Spark any chance to fix the alignments using other rules.
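The ordering argument can be sketched with a toy analyzer: resolution rules run repeatedly until the plan stops changing, and only afterwards does a separate check reject anything still unaligned. Everything here (`Plan`, `Merge`, `alignRule`, `normalizeKeysRule`, `analyze`) is a hypothetical stand-in, not Spark's rule executor or Iceberg's code.

```scala
// Hypothetical toy model of "rules to a fixed point, then a check".
sealed trait Plan
final case class Merge(targetCols: Seq[String], keys: Seq[String], aligned: Boolean) extends Plan

type Rule = Plan => Plan

// Marks a merge as aligned once its keys match the target columns exactly.
// On failure it returns the plan unchanged instead of throwing, so later
// rules (or later iterations) still get a chance to fix the plan.
val alignRule: Rule = {
  case m: Merge if !m.aligned && m.keys == m.targetCols => m.copy(aligned = true)
  case other => other
}

// Stand-in for some other resolution rule; here it lower-cases key names.
val normalizeKeysRule: Rule = {
  case m: Merge => m.copy(keys = m.keys.map(_.toLowerCase))
}

// Apply all rules in order, repeating until nothing changes (bounded).
def runToFixedPoint(plan: Plan, rules: Seq[Rule], maxIterations: Int = 100): Plan = {
  var current = plan
  var changed = true
  var i = 0
  while (changed && i < maxIterations) {
    val next = rules.foldLeft(current)((p, rule) => rule(p))
    changed = next != current
    current = next
    i += 1
  }
  current
}

// Runs once, after every resolution rule has had its chance.
def checkAligned(plan: Plan): Unit = plan match {
  case m: Merge if !m.aligned =>
    throw new IllegalStateException(s"Could not align MERGE INTO: $m")
  case _ => ()
}

def analyze(plan: Plan, rules: Seq[Rule]): Plan = {
  val resolved = runToFixedPoint(plan, rules)
  checkAligned(resolved)
  resolved
}

val merge = Merge(Seq("id", "c1"), Seq("iD", "C1"), aligned = false)
println(analyze(merge, Seq(alignRule, normalizeKeysRule)))
```

On the first pass `alignRule` sees `iD`/`C1` and leaves the plan alone; `normalizeKeysRule` then fixes the keys and the second pass aligns the merge. Had `alignRule` thrown on first sight, that second pass would never happen.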

I wonder whether we should also cover UPDATEs?

I did add a test case for UPDATE. That's a good point; I can also add a case to the Merge Rewrite Rule itself to match on any unaligned MERGE into an Iceberg table.

Add check rule for both Updates and Merges

aokolnychyi · 2022-05-24T00:43:13Z

...rk-extensions/src/main/scala/org/apache/spark/sql/catalyst/expressions/AssignmentUtils.scala

      val key = assignment.key
      val value = assignment.value
-      toAssignmentRef(attr) == toAssignmentRef(key) &&
+      val refsEqual = if (conf.caseSensitiveAnalysis) {


Seems like we can use conf.resolver that would abstract this away? Then we will have only one case to handle.

That is a great idea
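The suggestion can be sketched as follows, reducing the comparison to attribute names. Spark's `SQLConf.resolver` returns a `(String, String) => Boolean` that is case-sensitive or case-insensitive depending on `spark.sql.caseSensitive`; the `Conf` class and the `refsEqual*` helpers below are hypothetical local stand-ins so the block runs without Spark.

```scala
// Mirror Spark's caseSensitiveResolution / caseInsensitiveResolution,
// defined locally so the sketch runs without Spark on the classpath.
val caseSensitiveResolution: (String, String) => Boolean = _ == _
val caseInsensitiveResolution: (String, String) => Boolean = _.equalsIgnoreCase(_)

// Hypothetical stand-in for SQLConf, reduced to the one flag used here.
final case class Conf(caseSensitiveAnalysis: Boolean) {
  // Same idea as SQLConf.resolver: choose the comparison once from config.
  def resolver: (String, String) => Boolean =
    if (caseSensitiveAnalysis) caseSensitiveResolution else caseInsensitiveResolution
}

// Before: the caller branches on the case-sensitivity flag itself.
def refsEqualBranching(conf: Conf, attrName: String, keyName: String): Boolean =
  if (conf.caseSensitiveAnalysis) attrName == keyName
  else attrName.equalsIgnoreCase(keyName)

// After: delegate to the resolver, leaving a single code path.
def refsEqualResolver(conf: Conf, attrName: String, keyName: String): Boolean =
  conf.resolver(attrName, keyName)

println(refsEqualResolver(Conf(caseSensitiveAnalysis = false), "id", "iD")) // true
println(refsEqualResolver(Conf(caseSensitiveAnalysis = true), "id", "iD"))  // false
```

Both helpers return the same answers; the resolver version simply moves the case-sensitivity decision into one place, so the alignment check no longer needs its own `if`.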

aokolnychyi · 2022-05-24T00:45:00Z

spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java

+    createOrReplaceView(
+        "source",
+        "{ \"id\": 1, \"c1\": -2, \"c2\": \"new_str_1\" }\n" +
+            "{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }");


nit: the rest of this file aligns the json records slightly differently (no extra indentation for 2nd record)

Ugh, it was the IntelliJ helper annotations; they made it look aligned :)

Hahaha that happens to me all the time. So annoying.

aokolnychyi · 2022-05-24T00:45:15Z

spark/v3.2/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestMerge.java

+            "{ \"id\": 2, \"c1\": -20, \"c2\": \"new_str_2\" }");
+
+    sql("MERGE INTO %s t USING source " +
+            "ON t.iD == source.Id " +


nit: same for MERGE statements

Fix Indentation
Use conf.resolver for string comparison
Add check rule for failing analysis with unaligned ops

aokolnychyi · 2022-05-25T17:30:10Z

...c/main/scala/org/apache/spark/sql/catalyst/analysis/AlignedRowLevelIcebergCommandCheck.scala

+  override def apply(plan: LogicalPlan): Unit = {
+    plan foreach {
+      case m: MergeIntoIcebergTable if !m.aligned =>
+        throw new AnalysisException(s"Could not align Iceberg MERGE INT: $m")


nit: looks like a typo at the end?

aokolnychyi · 2022-05-25T17:30:32Z

...c/main/scala/org/apache/spark/sql/catalyst/analysis/AlignedRowLevelIcebergCommandCheck.scala

+      case m: MergeIntoIcebergTable if !m.aligned =>
+        throw new AnalysisException(s"Could not align Iceberg MERGE INT: $m")
+      case u: UpdateIcebergTable if !u.aligned =>
+        throw new AnalysisException(s"Could not align Iceberg UPDATE was never aligned: $u")


nit: do we need "was never aligned" part?

aokolnychyi

LGTM

RussellSpitzer · 2022-05-25T17:36:27Z

I blame all typos on the fact that I don't have my external monitor

* Spark: Fix Alignment of Merge Commands with Mixed Case

Prior to this, a mixed-case insert statement would fail to be marked as aligned after our alignment rule was applied. This would then fail the entire MERGE INTO command. The commands were correctly aligned, but our alignment check was always case-sensitive.

…nsitivity (apache#1428)

### What changes were proposed in this pull request?

Previously, alignment was checked by comparing the exact attribute references between Spark and the underlying table, which failed with case-insensitive SQL configurations. To fix this, we use the configuration's resolver to compare references.

### Why are the changes needed?

This was breaking some migrations from Spark 3.1, where the alignment check was not present. A query that attempted a MergeInto with column names that matched only in a case-insensitive way would fail to trigger our plan rewrite rules, leading to an opaque "MERGE INTO is temporarily not supported" exception.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

This patch is a cherry-pick of the code fixed and released in Apache Iceberg: apache/iceberg#4848. The test for this specific case is in the Iceberg codebase, and we will back-port it to ADT following the merge of the fix in Spark.

RussellSpitzer requested a review from aokolnychyi May 23, 2022 23:28

github-actions bot added the spark label May 23, 2022

aokolnychyi reviewed May 24, 2022

RussellSpitzer added 3 commits May 24, 2022 11:20

Reviewer Comments

045b76b

Fix Indentation
Use conf.resolver for string comparison
Add check rule for failing analysis with unaligned ops

Fix block import

64583bc

More Style Errors

0dbff9d

aokolnychyi reviewed May 25, 2022

aokolnychyi approved these changes May 25, 2022

Fix Error Messages

f5923a7

Formatting was bothering me

cc93232

RussellSpitzer merged commit ca8f0a1 into apache:master May 25, 2022

RussellSpitzer deleted the FixMixedCaseMergeAlignment branch May 25, 2022 22:36

RussellSpitzer commented May 23, 2022

aokolnychyi left a comment

aokolnychyi May 24, 2022 • edited

aokolnychyi left a comment

RussellSpitzer commented May 25, 2022

3 participants
