Commit 2e54d68

karenfeng authored and cloud-fan committed
[SPARK-34547][SQL] Only use metadata columns for resolution as last resort
### What changes were proposed in this pull request?

Today, child expressions may be resolved based on "real" or metadata output attributes. We should prefer the real attribute during resolution if one exists.

### Why are the changes needed?

Today, attempting to resolve an expression when there is a "real" output attribute and a metadata attribute with the same name results in resolution failure. This is likely unexpected, as the user may not know about the metadata attribute.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, the user would see an error message when resolving a column with the same name as a "real" output attribute and a metadata attribute as below:

```
org.apache.spark.sql.AnalysisException: Reference 'index' is ambiguous, could be: testcat.ns1.ns2.tableTwo.index, testcat.ns1.ns2.tableOne.index.; line 1 pos 71
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:363)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:107)
```

Now, resolution succeeds and provides the "real" output attribute.

### How was this patch tested?

Added a unit test.

Closes apache#31654 from karenfeng/fallback-resolve-metadata.

Authored-by: Karen Feng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
1 parent 4bda3c0 commit 2e54d68

2 files changed: +35 -2 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala

Lines changed: 4 additions & 2 deletions
@@ -89,8 +89,9 @@ abstract class LogicalPlan
     }
   }
 
-  private[this] lazy val childAttributes =
-    AttributeSeq(children.flatMap(c => c.output ++ c.metadataOutput))
+  private[this] lazy val childAttributes = AttributeSeq(children.flatMap(_.output))
+
+  private[this] lazy val childMetadataAttributes = AttributeSeq(children.flatMap(_.metadataOutput))
 
   private[this] lazy val outputAttributes = AttributeSeq(output)
 
@@ -103,6 +104,7 @@ abstract class LogicalPlan
       nameParts: Seq[String],
       resolver: Resolver): Option[NamedExpression] =
     childAttributes.resolve(nameParts, resolver)
+      .orElse(childMetadataAttributes.resolve(nameParts, resolver))
 
   /**
    * Optionally resolves the given strings to a [[NamedExpression]] based on the output of this
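
The effect of splitting `childAttributes` from `childMetadataAttributes` and chaining them with `orElse` can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Attr`, `AmbiguousReference`, `ResolutionSketch`, `resolveIn`, `resolveMerged`, and `resolveWithFallback` are hypothetical names invented for illustration, not Catalyst APIs, and the model handles unqualified names only, whereas the real `AttributeSeq.resolve` also handles qualifiers and nested fields.

```scala
// Hypothetical stand-in for a Catalyst attribute: a table qualifier plus a column name.
final case class Attr(qualifier: String, name: String)

// Hypothetical stand-in for the AnalysisException raised on ambiguous references.
final case class AmbiguousReference(name: String, candidates: Seq[Attr])
  extends Exception(
    s"Reference '$name' is ambiguous, could be: " +
      candidates.map(a => s"${a.qualifier}.${a.name}").mkString(", "))

object ResolutionSketch {
  // Resolve an unqualified name against a single pool of attributes.
  // Returns None when nothing matches and throws when several attributes match.
  def resolveIn(pool: Seq[Attr], name: String): Option[Attr] =
    pool.filter(_.name.equalsIgnoreCase(name)) match {
      case Seq()       => None
      case Seq(single) => Some(single)
      case many        => throw AmbiguousReference(name, many)
    }

  // Model of the old behaviour: real and metadata attributes share one pool,
  // so a metadata column can clash with a real column of the same name.
  def resolveMerged(output: Seq[Attr], metadata: Seq[Attr], name: String): Option[Attr] =
    resolveIn(output ++ metadata, name)

  // Model of the new behaviour: metadata attributes are consulted only
  // when the real output attributes produce no match.
  def resolveWithFallback(output: Seq[Attr], metadata: Seq[Attr], name: String): Option[Attr] =
    resolveIn(output, name).orElse(resolveIn(metadata, name))
}
```

With real output `Seq(Attr("t1", "id"), Attr("t2", "index"))` and metadata `Seq(Attr("t1", "index"))`, `resolveMerged` throws for `"index"` because two candidates match, while `resolveWithFallback` returns `Some(Attr("t2", "index"))`. A name that exists only in the metadata pool still resolves through the fallback branch.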

sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala

Lines changed: 31 additions & 0 deletions
@@ -2492,6 +2492,37 @@ class DataSourceV2SQLSuite
     }
   }
 
+  test("SPARK-34547: metadata columns are resolved last") {
+    val t1 = s"${catalogAndNamespace}tableOne"
+    val t2 = "t2"
+    withTable(t1) {
+      sql(s"CREATE TABLE $t1 (id bigint, data string) USING $v2Format " +
+        "PARTITIONED BY (bucket(4, id), id)")
+      sql(s"INSERT INTO $t1 VALUES (1, 'a'), (2, 'b'), (3, 'c')")
+      withTempView(t2) {
+        sql(s"CREATE TEMPORARY VIEW $t2 AS SELECT * FROM " +
+          s"VALUES (1, -1), (2, -2), (3, -3) AS $t2(id, index)")
+
+        val sqlQuery = spark.sql(s"SELECT $t1.id, $t2.id, data, index, $t1.index, $t2.index FROM " +
+          s"$t1 JOIN $t2 WHERE $t1.id = $t2.id")
+        val t1Table = spark.table(t1)
+        val t2Table = spark.table(t2)
+        val dfQuery = t1Table.join(t2Table, t1Table.col("id") === t2Table.col("id"))
+          .select(s"$t1.id", s"$t2.id", "data", "index", s"$t1.index", s"$t2.index")
+
+        Seq(sqlQuery, dfQuery).foreach { query =>
+          checkAnswer(query,
+            Seq(
+              Row(1, 1, "a", -1, 0, -1),
+              Row(2, 2, "b", -2, 0, -2),
+              Row(3, 3, "c", -3, 0, -3)
+            )
+          )
+        }
+      }
+    }
+  }
+
   test("SPARK-33505: insert into partitioned table") {
     val t = "testpart.ns1.ns2.tbl"
     withTable(t) {
