-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25498][SQL] InterpretedMutableProjection should handle UnsafeRow #22512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I thought we currently had less tests for interpreted projections, so I was checking if we had no bug caused by these projections. Then, I noticed these two issues when the interpreted mode enabled in Btw, we'd be better to split this pr into multiple ones, probably. But, I'd like to make all the related bugs clear first. |
|
Test build #96395 has finished for PR 22512 at commit
|
|
retest this please |
|
Test build #96405 has finished for PR 22512 at commit
|
|
This is a simple query to reproduce; |
39c5e92 to
bff88ee
Compare
|
Test build #96430 has finished for PR 22512 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure though, is it ok for Literal to have a different Scala-typed value for the corresponding dataType? , e.g., new Literal(1 /* int value */, LongType)? In the current master, there are some places to do so, e.g.,
spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
Line 213 in 927e527
| val one = Literal(1, LongType) |
spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala
Line 796 in 927e527
| :: Literal.create(1.0, FloatType) |
In the codegen path, this is ok because we add a correct literal suffix in Literal.doGenCode (e.g., 1L for new Literal(1, LongType));
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
Line 294 in 927e527
| dataType match { |
But, in the non-codegen path (e.g., spark.sql.codegen.factoryMode=NO_CODEGEN and ConstantFolding), this case throws an exception ;
scala> import org.apache.spark.sql.Column
scala> import org.apache.spark.sql.catalyst.expressions.Literal
scala> import org.apache.spark.sql.types._
scala> val intOne: Int = 1
scala> val lit = Literal.create(intOne, LongType)
scala> spark.range(1).select(struct(new Column(lit))).collect
18/10/04 11:35:56 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:105)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:42)
at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getLong(rows.scala:195)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
...
WDYT? cc: @gatorsmile @cloud-fan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not allow it. Can you send a separated PR for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnsafeRow doesn't support it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yes. I'll recheck again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This match should only accept the generic internal rows, so I added code to verify types for the UnsafeRow case;
https://github.com/apache/spark/pull/22512/files#diff-3ed819282d4e4941571dd3b08fc03e37R55
bff88ee to
78795be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this API (def create(v: Any, dataType: DataType): Literal) might be a little obscure: create(scalaValue, dataType) vs create(catalystValue, dataType). How about splitting this API into the two below?
78795be to
31c623f
Compare
|
Test build #96923 has finished for PR 22512 at commit
|
|
Test build #96922 has finished for PR 22512 at commit
|
31c623f to
2309313
Compare
|
I made a new pr to address the Literal issue in #22724 (I'll fix the Literal issue first, then I'll resume this pr). |
2309313 to
8e4f2b8
Compare
8e4f2b8 to
2766bd1
Compare
|
Test build #97390 has finished for PR 22512 at commit
|
|
Test build #97391 has finished for PR 22512 at commit
|
2766bd1 to
79c435a
Compare
|
Test build #97886 has finished for PR 22512 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we call mutableRow.setNullAt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to take care of e.nullable && e.dataType == NullType here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the corresponding logic in the codegen version is simply call row.update(null, i).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have InternalRow.getAccessor, shall we move this method there too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yes! yea, I will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If wholeStageCodegenEnabled is not used, let's not complicate the code now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
LGTM, we also need a unit test |
|
ok, I'll add tests. |
b8c5a17 to
45e65e5
Compare
|
I'm looking into the failure reason... (passed in the local, but failed in the jenkins...) |
|
Test build #98058 has finished for PR 22512 at commit
|
|
retest this please |
|
Test build #99217 has finished for PR 22512 at commit
|
|
@maropu are you still working on it? |
|
Yea, I'll update in a few days. |
5227e42 to
243fae3
Compare
|
Test build #99583 has finished for PR 22512 at commit
|
| while (i < validExprs.length) { | ||
| val (_, ordinal) = validExprs(i) | ||
| mutableRow(ordinal) = buffer(ordinal) | ||
| fieldWriters(i)(buffer(ordinal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since fieldWriters is accessed via index, we should use IndexedSeq or Array explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, sounds reasonable. I'll update later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 95411c8
|
Test build #99590 has finished for PR 22512 at commit
|
| // all need to return the same result | ||
| if (regenerateGoldenFiles && configs.nonEmpty) { | ||
| if (regenerateGoldenFiles) { | ||
| configs.take(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if configs is empty? take(1) will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it returns an empty array?
scala> Array.empty.take(1)
res0: Array[Nothing] = Array()
scala> Seq.empty.take(1)
res1: Seq[Nothing] = List()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better readability, fixed in 4cdc504
|
Test build #99601 has finished for PR 22512 at commit
|
| if (configs.nonEmpty) { | ||
| configs.take(1) | ||
| } else { | ||
| Array.empty[Array[(String, String)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: since configs don't matter when generating result, I think we can just return empty configs here. We can clean it up in a followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
thanks, merging to master! |
…ating the golden files ## What changes were proposed in this pull request? This pr is to return an empty config set when regenerating the golden files in `SQLQueryTestSuite`. This is the follow-up of #22512. ## How was this patch tested? N/A Closes #23212 from maropu/SPARK-25498-FOLLOWUP. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
## What changes were proposed in this pull request? Since `AggregationIterator` uses `MutableProjection` for `UnsafeRow`, `InterpretedMutableProjection` needs to handle `UnsafeRow` as buffer internally for fixed-length types only. ## How was this patch tested? Run 'SQLQueryTestSuite' with the interpreted mode. Closes apache#22512 from maropu/InterpreterTest. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ating the golden files ## What changes were proposed in this pull request? This pr is to return an empty config set when regenerating the golden files in `SQLQueryTestSuite`. This is the follow-up of apache#22512. ## How was this patch tested? N/A Closes apache#23212 from maropu/SPARK-25498-FOLLOWUP. Authored-by: Takeshi Yamamuro <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Since
AggregationIteratorusesMutableProjectionforUnsafeRow,InterpretedMutableProjectionneeds to handleUnsafeRowas buffer internally for fixed-length types only.How was this patch tested?
Run 'SQLQueryTestSuite' with the interpreted mode.