[SPARK-25498][SQL] InterpretedMutableProjection should handle UnsafeRow #22512

maropu · 2018-09-21T05:09:43Z

What changes were proposed in this pull request?

Since AggregationIterator uses MutableProjection for UnsafeRow, InterpretedMutableProjection needs to handle UnsafeRow as buffer internally for fixed-length types only.

How was this patch tested?

Run 'SQLQueryTestSuite' with the interpreted mode.

maropu · 2018-09-21T05:19:59Z

We currently have few tests for the interpreted projections, so I was checking whether they have any bugs. I noticed these two issues when running SQLQueryTestSuite with the interpreted mode enabled. I'm still checking for other bugs in the interpreted projections, so I've marked this as WIP.

Btw, it would probably be better to split this PR into multiple ones, but I'd like to identify all the related bugs first.

SparkQA · 2018-09-21T05:37:32Z

Test build #96395 has finished for PR 22512 at commit 39c5e92.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
s\"but $
case class Literal(value: Any, dataType: DataType) extends LeafExpression

maropu · 2018-09-21T07:06:36Z

retest this please

SparkQA · 2018-09-21T07:33:38Z

Test build #96405 has finished for PR 22512 at commit 39c5e92.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
s\"but $
case class Literal(value: Any, dataType: DataType) extends LeafExpression

maropu · 2018-09-21T12:08:21Z

Here is a simple query that reproduces the failure:

$ SPARK_TESTING=1 ./bin/spark-shell
scala> sql("SET spark.sql.codegen.factoryMode=NO_CODEGEN")
scala> sql("CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING PARQUET")
scala> sql("""ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key""")
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 0, localhost, executor driver): java.lang.UnsupportedOperationException
	at org.apache.spark.sql.catalyst.expressions.UnsafeRow.update(UnsafeRow.java:206)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:67)
	at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:31)
	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.createNewAggregationBuffer(TungstenAggregationIterator.scala:129)
	at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:156)
	at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:112)
	at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:102)
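
The trace above ends in UnsafeRow.update, which the interpreted projection reached through the generic row setter. Below is a minimal, self-contained sketch of that failure mode. `MutableRowLike`, `FixedWidthRow`, and `naiveWrite` are hypothetical stand-ins invented for illustration; they are not Spark classes and only mimic the behavior the trace suggests.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
object UnsafeUpdateSketch {
  // A row abstraction with one generic setter and a few typed setters.
  trait MutableRowLike {
    def update(i: Int, value: Any): Unit
    def setInt(i: Int, value: Int): Unit
    def setLong(i: Int, value: Long): Unit
  }

  // Models a binary, fixed-width buffer: each field lives in one 8-byte slot,
  // so only typed setters are supported and the generic setter is rejected.
  final class FixedWidthRow(numFields: Int) extends MutableRowLike {
    private val slots = new Array[Long](numFields)
    def update(i: Int, value: Any): Unit =
      throw new UnsupportedOperationException("generic update is not supported")
    def setInt(i: Int, value: Int): Unit = { slots(i) = value.toLong }
    def setLong(i: Int, value: Long): Unit = { slots(i) = value }
    def getLong(i: Int): Long = slots(i)
  }

  // What a projection does if it treats every target row generically.
  def naiveWrite(row: MutableRowLike, i: Int, value: Any): Unit =
    row.update(i, value)

  // Returns true when the generic path fails on the fixed-width row.
  def genericPathFails(): Boolean = {
    val row = new FixedWidthRow(1)
    try { naiveWrite(row, 0, 1L); false }
    catch { case _: UnsupportedOperationException => true }
  }
}
```

In this model the typed setters still work on the same buffer, which is why dispatching on the field type is enough for fixed-length types.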

SparkQA · 2018-09-21T16:18:50Z

Test build #96430 has finished for PR 22512 at commit bff88ee.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
s\"but $
case class Literal(value: Any, dataType: DataType) extends LeafExpression

maropu · 2018-10-04T03:43:56Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

I'm not sure about this, though: is it OK for a Literal to hold a value whose Scala type differs from its dataType, e.g., new Literal(1 /* int value */, LongType)? The current master does so in some places, e.g.,

spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala

Line 213 in 927e527

val one = Literal(1, LongType)

spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala

Line 796 in 927e527

:: Literal.create(1.0, FloatType)

In the codegen path, this is ok because we add a correct literal suffix in Literal.doGenCode (e.g., 1L for new Literal(1, LongType));

spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

Line 294 in 927e527

dataType match {

But in the non-codegen path (e.g., spark.sql.codegen.factoryMode=NO_CODEGEN and ConstantFolding), this case throws an exception:

scala> import org.apache.spark.sql.Column
scala> import org.apache.spark.sql.catalyst.expressions.Literal
scala> import org.apache.spark.sql.types._
scala> val intOne: Int = 1
scala> val lit = Literal.create(intOne, LongType)
scala> spark.range(1).select(struct(new Column(lit))).collect
18/10/04 11:35:56 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:105)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:42)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getLong(rows.scala:195)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
	...
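
The ClassCastException above comes from unboxing a java.lang.Integer as a Long. The same effect can be reproduced in plain Scala without Spark. This is a minimal sketch; `Lit`, `readAsLong`, and `failsToRead` are hypothetical names used only for illustration and are not part of Spark's `Literal` API.

```scala
object LiteralTypeMismatchSketch {
  // Hypothetical stand-in for a literal: an untyped value plus a declared type name.
  final case class Lit(value: Any, declaredType: String)

  // An interpreted reader trusts the declared type and unboxes accordingly.
  // The cast throws if the boxed value is not actually a java.lang.Long.
  def readAsLong(lit: Lit): Long = lit.value.asInstanceOf[Long]

  // Returns true when reading the literal throws ClassCastException.
  def failsToRead(lit: Lit): Boolean =
    try { readAsLong(lit); false }
    catch { case _: ClassCastException => true }
}
```

A value of `Lit(1, "long")` holds a boxed Integer and fails to read, while `Lit(1L, "long")` reads fine. Generated code avoids the problem only because it emits a typed literal such as 1L.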

WDYT? cc: @gatorsmile @cloud-fan

I think we should not allow it. Can you send a separate PR for this change?

cloud-fan · 2018-10-04T04:10:46Z

.../src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala

UnsafeRow doesn't support it, right?

oh, yes. I'll recheck again.

This match should accept only generic internal rows, so I added code to verify the types in the UnsafeRow case:
https://github.com/apache/spark/pull/22512/files#diff-3ed819282d4e4941571dd3b08fc03e37R55

maropu · 2018-10-04T04:24:44Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala

I think this API (def create(v: Any, dataType: DataType): Literal) might be a little obscure: create(scalaValue, dataType) vs create(catalystValue, dataType). How about splitting this API into the two below?

SparkQA · 2018-10-04T06:44:42Z

Test build #96923 has finished for PR 22512 at commit 31c623f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-04T07:05:01Z

Test build #96922 has finished for PR 22512 at commit 78795be.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2018-10-15T11:49:38Z

I opened a new PR, #22724, to address the Literal issue. I'll fix that first and then resume this PR.

SparkQA · 2018-10-15T13:44:31Z

Test build #97390 has finished for PR 22512 at commit 8e4f2b8.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-15T14:05:47Z

Test build #97391 has finished for PR 22512 at commit 2766bd1.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-10-23T02:10:39Z

Test build #97886 has finished for PR 22512 at commit 79c435a.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-10-23T02:19:56Z

.../src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala

shall we call mutableRow.setNullAt?

We need to take care of e.nullable && e.dataType == NullType here?

The corresponding logic in the codegen version simply calls row.update(i, null).

cloud-fan · 2018-10-23T02:21:10Z

.../src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala

we have InternalRow.getAccessor, shall we move this method there too?

oh, yes! yea, I will.

cloud-fan · 2018-10-23T02:29:42Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

If wholeStageCodegenEnabled is not used, let's not complicate the code now.

cloud-fan · 2018-10-23T02:30:02Z

LGTM, we also need a unit test

maropu · 2018-10-23T05:08:48Z

ok, I'll add tests.

maropu · 2018-10-25T23:22:16Z

I'm looking into the cause of the failure... (it passes locally but fails on Jenkins...)

SparkQA · 2018-10-26T02:17:44Z

Test build #98058 has finished for PR 22512 at commit 5227e42.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

kiszk · 2018-11-23T17:15:14Z

retest this please

SparkQA · 2018-11-23T20:10:09Z

Test build #99217 has finished for PR 22512 at commit 5227e42.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-11-26T02:46:46Z

@maropu are you still working on it?

maropu · 2018-11-26T03:50:02Z

Yea, I'll update in a few days.

SparkQA · 2018-12-03T03:50:36Z

Test build #99583 has finished for PR 22512 at commit 243fae3.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

ueshin · 2018-12-03T05:19:52Z

.../src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala

    while (i < validExprs.length) {
      val (_, ordinal) = validExprs(i)
-      mutableRow(ordinal) = buffer(ordinal)
+      fieldWriters(i)(buffer(ordinal))


Since fieldWriters is accessed via index, we should use IndexedSeq or Array explicitly?

ah, sounds reasonable. I'll update later.

fixed in 95411c8
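
The point about indexed access can be illustrated with a small sketch: writers are resolved once per field and stored in an `Array`, so the per-row loop does a constant-time lookup by index. Everything here (`FieldType`, `FixedWidthRow`, `makeWriters`, `project`) is a hypothetical stand-in under simplified assumptions, not the code in this PR. The null branch echoes the `setNullAt` question raised earlier in the review.

```scala
object FieldWriterSketch {
  // Hypothetical type tags; Spark's real DataType hierarchy is much richer.
  sealed trait FieldType
  case object IntField extends FieldType
  case object LongField extends FieldType

  // Fixed-width buffer with typed setters only (a stand-in, not UnsafeRow).
  final class FixedWidthRow(numFields: Int) {
    private val slots = new Array[Long](numFields)
    private val nulls = new Array[Boolean](numFields)
    def setNullAt(i: Int): Unit = { nulls(i) = true; slots(i) = 0L }
    def setInt(i: Int, v: Int): Unit = { nulls(i) = false; slots(i) = v.toLong }
    def setLong(i: Int, v: Long): Unit = { nulls(i) = false; slots(i) = v }
    def isNullAt(i: Int): Boolean = nulls(i)
    def getLong(i: Int): Long = slots(i)
  }

  // Resolve one writer per field up front, so the per-row loop does no type dispatch.
  def makeWriters(row: FixedWidthRow, schema: Seq[FieldType]): Array[Any => Unit] =
    schema.zipWithIndex.map { case (tpe, i) =>
      val write: Any => Unit = tpe match {
        case IntField  => v => row.setInt(i, v.asInstanceOf[Int])
        case LongField => v => row.setLong(i, v.asInstanceOf[Long])
      }
      // Route nulls to setNullAt instead of the typed setter.
      (v: Any) => if (v == null) row.setNullAt(i) else write(v)
    }.toArray

  // Apply the writers positionally; Array indexing is O(1).
  def project(values: Array[Any], writers: Array[Any => Unit]): Unit = {
    var i = 0
    while (i < writers.length) {
      writers(i)(values(i))
      i += 1
    }
  }
}
```

A plain `Seq` may be backed by a linked list, where `writers(i)` walks i elements, which is why an explicit `Array` or `IndexedSeq` is the safer choice inside a hot loop.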

SparkQA · 2018-12-03T08:05:02Z

Test build #99590 has finished for PR 22512 at commit 3553d91.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-12-03T08:13:12Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

      // all need to return the same result
-      if (regenerateGoldenFiles && configs.nonEmpty) {
+      if (regenerateGoldenFiles) {
        configs.take(1)


what if configs is empty? take(1) will fail

Actually, it returns an empty array?

scala> Array.empty.take(1)
res0: Array[Nothing] = Array()
scala> Seq.empty.take(1)
res1: Seq[Nothing] = List()

For better readability, fixed in 4cdc504

SparkQA · 2018-12-03T12:10:11Z

Test build #99601 has finished for PR 22512 at commit 4cdc504.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-12-03T16:02:00Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala

+        if (configs.nonEmpty) {
+          configs.take(1)
+        } else {
+          Array.empty[Array[(String, String)]]


nit: since configs don't matter when generating result, I think we can just return empty configs here. We can clean it up in a followup PR.
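
The two options discussed here can be compared side by side. This is only a sketch: `ConfigSet`, `firstOnly`, and `noneWhenRegenerating` are hypothetical names, and the real logic in SQLQueryTestSuite may differ in detail.

```scala
object GoldenFileConfigSketch {
  // One config set is a list of key/value SQL settings.
  type ConfigSet = Array[(String, String)]

  // Variant under review: keep at most the first config set when regenerating.
  // take(1) on an empty array returns an empty array, so no guard is required.
  def firstOnly(regenerate: Boolean, configs: Array[ConfigSet]): Array[ConfigSet] =
    if (regenerate) configs.take(1) else configs

  // Variant suggested for the follow-up: ignore configs entirely when regenerating.
  def noneWhenRegenerating(regenerate: Boolean, configs: Array[ConfigSet]): Array[ConfigSet] =
    if (regenerate) Array.empty[ConfigSet] else configs
}
```

Both variants leave the configs untouched during a normal test run and differ only in what is used while golden files are regenerated.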

cloud-fan · 2018-12-03T16:05:31Z

thanks, merging to master!

…ating the golden files

## What changes were proposed in this pull request?

This pr is to return an empty config set when regenerating the golden files in `SQLQueryTestSuite`. This is the follow-up of #22512.

## How was this patch tested?

N/A

Closes #23212 from maropu/SPARK-25498-FOLLOWUP.

Authored-by: Takeshi Yamamuro
Signed-off-by: Wenchen Fan

## What changes were proposed in this pull request?

Since `AggregationIterator` uses `MutableProjection` for `UnsafeRow`, `InterpretedMutableProjection` needs to handle `UnsafeRow` as buffer internally for fixed-length types only.

## How was this patch tested?

Run 'SQLQueryTestSuite' with the interpreted mode.

Closes apache#22512 from maropu/InterpreterTest.

Authored-by: Takeshi Yamamuro
Signed-off-by: Wenchen Fan


maropu force-pushed the InterpreterTest branch from 39c5e92 to bff88ee Compare September 21, 2018 13:51

maropu force-pushed the InterpreterTest branch from bff88ee to 78795be Compare October 4, 2018 04:16

maropu force-pushed the InterpreterTest branch from 78795be to 31c623f Compare October 4, 2018 04:26

maropu force-pushed the InterpreterTest branch from 31c623f to 2309313 Compare October 15, 2018 11:47

maropu force-pushed the InterpreterTest branch from 2309313 to 8e4f2b8 Compare October 15, 2018 11:52

maropu changed the title ~~[SPARK-25498][SQL][WIP] Fix SQLQueryTestSuite failures when the interpreter mode enabled~~ [SPARK-25498][SQL] InterpretedMutableProjection should handle UnsafeRow Oct 15, 2018

maropu force-pushed the InterpreterTest branch from 8e4f2b8 to 2766bd1 Compare October 15, 2018 12:06

maropu force-pushed the InterpreterTest branch from 2766bd1 to 79c435a Compare October 23, 2018 00:20

maropu mentioned this pull request Oct 23, 2018

[SPARK-25374][SQL] SafeProjection supports fallback to an interpreted mode #22468

Closed

maropu force-pushed the InterpreterTest branch from b8c5a17 to 45e65e5 Compare October 23, 2018 05:23

maropu added 6 commits November 30, 2018 10:00

Fix test failures with the interpreter mode enabled

3faf314

Fix

ef20325

Fix

09767a0

Fix

df07fc4

Fix

388213f

Fix

243fae3

maropu force-pushed the InterpreterTest branch from 5227e42 to 243fae3 Compare December 3, 2018 00:54

WIP: fix test failures

3553d91

maropu added 2 commits December 3, 2018 17:28

Fix

95411c8

Fix for readability

4cdc504

cloud-fan approved these changes Dec 3, 2018

asfgit closed this in 04046e5 Dec 3, 2018

maropu mentioned this pull request Dec 4, 2018

[SPARK-25498][SQL][FOLLOW-UP] Return an empty config set when regenerating the golden files #23212

Closed

maropu mentioned this pull request Dec 5, 2018

[SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed config sets: WHOLESTAGE_CODEGEN_ENABLED and CODEGEN_FACTORY_MODE #23213

Closed
