[SPARK-18394][SQL] Make an AttributeSet.toSeq output order consistent #18959

maropu · 2017-08-16T07:44:56Z

What changes were proposed in this pull request?

This PR sorts the output attributes by name and exprId in AttributeSet.toSeq to make the order deterministic. If the order differs between runs, Spark may generate different code and then miss the cache in CodeGenerator; e.g., GenerateColumnAccessor generates code that depends on the order of its input attributes.
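
To illustrate why the attribute order matters for the cache, here is a minimal sketch. It is not Spark code: `Attr`, `generate`, `SourceCache`, and `ordered` are hypothetical stand-ins, and the sketch assumes a cache keyed by the generated source text.

```scala
import scala.collection.mutable

object CodegenCacheSketch {
  // Hypothetical stand-in for a Catalyst attribute: a name plus a unique expression id.
  final case class Attr(name: String, exprId: Long)

  // Toy "code generator": the emitted source depends on the order of the input attributes.
  def generate(attrs: Seq[Attr]): String =
    attrs.zipWithIndex
      .map { case (a, i) => s"accessor$i = ${a.name}#${a.exprId};" }
      .mkString("\n")

  // Toy cache keyed by the generated source text (an assumption for this sketch).
  final class SourceCache {
    private val compiled = mutable.HashMap.empty[String, Int]
    var misses = 0

    // Counts a miss whenever the source text has not been seen before.
    def getOrCompile(source: String): Int =
      compiled.getOrElseUpdate(source, { misses += 1; source.hashCode })
  }

  // Deterministic order: sort by name, then by expression id.
  def ordered(attrs: Iterable[Attr]): Seq[Attr] =
    attrs.toSeq.sortBy(a => (a.name, a.exprId))
}
```

In this sketch, feeding `generate` the same attributes in two different orders produces two different cache keys, while passing them through `ordered` first produces a single key.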

How was this patch tested?

Added tests in AttributeSetSuite and manually checked that the cache works as expected for the query given in the JIRA.

srowen · 2017-08-16T07:47:44Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala

+    // We need to keep a deterministic output order for `baseSet` because this affects a variable
+    // order in generated code (e.g., `GenerateColumnAccessor`).
+    // See SPARK-18394 for details.
+    baseSet.map(_.a).toArray.sortBy { a => (a.name, a.exprId.id) }


If it needs to be a Seq, then should the toArray be toSeq? Maybe I missed why it has to be an array first.

Yeah, I initially did it the way you suggested. But I kept the original code because I was afraid the change might wrongly affect other code. cc: @marmbrus

I thought map should always return a strict collection. I think it is safe to sort immediately after that.

OK, I'll fix it that way. Thanks!

SparkQA · 2017-08-16T10:21:18Z

Test build #80725 has finished for PR 18959 at commit 3201f0a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

viirya · 2017-08-16T15:29:37Z

In the title and description, should Attribute.toSeq be AttributeSet.toSeq?

maropu · 2017-08-16T15:32:20Z

oh..ya, my bad.... thanks.

SparkQA · 2017-08-16T16:11:00Z

Test build #80736 has finished for PR 18959 at commit eba844e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-08-16T18:37:34Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AttributeSetSuite.scala

    assert(aSet == AttributeSet(aUpper :: Nil))
  }
+
+  test("SPARK-18394 keep a deterministic output order along with attribute names") {


Modify this test case. Add a scenario in which the attribute set has two columns with the same name but different ids?
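
The scenario requested above could be sketched as follows. This is a simplified model, not the actual AttributeSetSuite test: `Attr` and `toOrderedSeq` are hypothetical stand-ins for Catalyst's attribute class and `AttributeSet.toSeq`, assuming the sort key is (name, expression id).

```scala
object SameNameOrderingSketch {
  // Hypothetical stand-in for a Catalyst attribute.
  final case class Attr(name: String, exprId: Long)

  // Sort by name first; the expression id breaks ties between equal names.
  def toOrderedSeq(attrs: Set[Attr]): Seq[Attr] =
    attrs.toSeq.sortBy(a => (a.name, a.exprId))

  // Two attributes share the name "a" but carry different ids.
  val a10: Attr = Attr("a", 10L)
  val a3: Attr = Attr("a", 3L)
  val b1: Attr = Attr("b", 1L)

  // Returns true if every insertion order of the inputs yields the same sorted output.
  def orderIsStable: Boolean =
    Seq(a10, a3, b1).permutations.forall { p =>
      toOrderedSeq(p.toSet) == Seq(a3, a10, b1)
    }
}
```

Sorting by name alone would leave the relative order of `a3` and `a10` unspecified, which is why the second key is needed.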

gatorsmile · 2017-08-16T18:39:00Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala


      assert(actualOutputColumns === expectedOutputColumns, "Output columns mismatch")
-      assert(actualScannedColumns === expectedScannedColumns, "Scanned columns mismatch")
+      assert(actualScannedColumns.sorted === expectedScannedColumns.sorted,


Could you add a comment to explain where we call AttributeSet.toSeq?

gatorsmile · 2017-08-16T18:40:56Z

LGTM except two minor comments.

maropu · 2017-08-17T03:24:39Z

Jenkins, retest this please.

gatorsmile · 2017-08-17T05:56:29Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala

+      // Scanned columns in `HiveTableScanExec` are generated by the `pruneFilterProject` method
+      // in `SparkPlanner` that internally uses `AttributeSet.toSeq`.
+      // Since we change an output order of `AttributeSet.toSeq` in SPARK-18394,
+      // we need to sort column names for a test below.


How about?

Scanned columns in HiveTableScanExec are generated by the pruneFilterProject method in SparkPlanner. This method internally uses AttributeSet.toSeq, in which the returned output columns are sorted by the names and expression ids.

Looks good, I'll update soon.
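
For reference, the order-insensitive comparison used in the PruningSuite change can be sketched as below. `sameColumns` is a hypothetical helper and is not part of the patch; the column names are made up for illustration.

```scala
object OrderInsensitiveCompareSketch {
  // Compare two column-name lists while ignoring their order.
  // Sorting (rather than converting to a Set) still detects differing duplicate counts.
  def sameColumns(actual: Seq[String], expected: Seq[String]): Boolean =
    actual.sorted == expected.sorted
}
```

Sorting both sides keeps the test independent of whatever order `AttributeSet.toSeq` returns, at the cost of no longer checking the scan order itself.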

SparkQA · 2017-08-17T06:07:28Z

Test build #80763 has finished for PR 18959 at commit b33fde8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-08-17T07:04:49Z

Test build #80770 has finished for PR 18959 at commit 973402b.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2017-08-17T09:07:35Z

Jenkins, retest this please.

SparkQA · 2017-08-17T11:57:49Z

Test build #80781 has finished for PR 18959 at commit 973402b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2017-08-17T20:47:18Z

Merging to master. Thanks!

Keep a deterministic output order in Attribute.toSeq

3201f0a

srowen reviewed Aug 16, 2017

hvanhovell approved these changes Aug 16, 2017

toArray to toSeq

eba844e

maropu changed the title ~~[SPARK-18394][SQL] Make an Attribute.toSeq output order consistent~~ [SPARK-18394][SQL] Make an AttributeSet.toSeq output order consistent Aug 16, 2017

gatorsmile reviewed Aug 16, 2017

Apply review

b33fde8

maropu force-pushed the SPARK-18394 branch from a62d871 to b33fde8 Compare August 17, 2017 03:22

gatorsmile reviewed Aug 17, 2017

Update comments

973402b

asfgit closed this in 6aad02d Aug 17, 2017

6 participants
