
Conversation

@cloud-fan

No description provided.

if (!ansiEnabled) {
  try {
    checkAnswer(df, expectedAnswer)
  } catch {
@skambha (Owner)

  • This has changed the tests to say it is OK to throw an exception even when ansiEnabled is false. Our ansiEnabled flag then isn't doing what it is supposed to do, right? That is a bug.
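
For reference, a minimal sketch (not test code from this PR; the session setup and column name are made up for illustration) of what the ansiEnabled contract is supposed to mean for a decimal sum that overflows:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

// Two values at the top of decimal(38, 0); their sum cannot be represented in that type.
val df = Seq("9" * 38, "9" * 38).toDF("s")
  .select($"s".cast(DecimalType(38, 0)).as("d"))

spark.conf.set("spark.sql.ansi.enabled", "false")
df.agg(sum($"d")).show()   // non-ANSI mode: the overflowed sum should come back as null, not fail

spark.conf.set("spark.sql.ansi.enabled", "true")
df.agg(sum($"d")).show()   // ANSI mode: the overflow should raise an ArithmeticException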

@cloud-fan (Author)

This is a bug in the unsafe row writer, and I think we should fix it. According to apache#27627 (comment), this bug has been there for a long time.

This is a less critical bug, as Spark fails instead of returning a wrong result.

@skambha (Owner)

Just to clarify, the issue we have is that the UnsafeRow accepts an overflowed decimal value but throws an exception when fetching it.

@skambha (Owner) commented May 22, 2020

In this PR, the tests have been changed to say it is OK to throw an exception even when ansiEnabled is false.

I have a few concerns about this simplification. This does not cover all the scenarios that we discussed in apache#27627, right? Basically, here we will error out on overflow in some cases even if ansiEnabled is false. That is a bug. Also, when whole-stage codegen is disabled and ansiEnabled is false, we should get null as the result, but that won't be the case now.

I also have some reservations about how safe it is to make the logic in Sum depend on the underlying implementation detail of UnsafeRowWriter writing out null for an overflowed value. What if that were to change in the future? Is the expectation that the unit tests would uncover the issue?

@cloud-fan (Author)

Checking overflow for each input row is not acceptable. We must allow temporary overflow in the middle of a calculation, to support cases like large_positive_decimal + large_positive_decimal + large_negative_decimal, where we can still get a valid result. That said, we can't merge apache#27627 as it is, as it can break queries that return a correct result.

On the other hand, this patch fixes most of the problems, but we still suffer from the unsafe row writer bug. It's not a big deal, and we can fix it later. See #1 (comment).
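
To make the large_positive + large_positive + large_negative case concrete, here is a hypothetical example (the values, column name, and session setup are illustrative, not from the PR): the intermediate sum overflows decimal(38, 0), yet the final result is representable, which is exactly what a per-row overflow check would reject.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._
spark.conf.set("spark.sql.ansi.enabled", "false")    // non-ANSI mode for this example

val big = "9" * 38                                   // the largest value of decimal(38, 0)
val df = Seq(big, big, "-" + big).toDF("s")
  .select($"s".cast(DecimalType(38, 0)).as("d"))

// big + big overflows decimal(38, 0) part-way through, but big + big - big == big fits,
// so the sum should still produce a valid value. coalesce(1) keeps the temporary
// overflow inside a single aggregation buffer.
df.coalesce(1).agg(sum($"d")).show(truncate = false)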

@skambha (Owner) commented May 26, 2020

Checking overflow for each input row is not acceptable.
We must allow temporary overflow in the middle of calculation, to support cases like large_positive_decimal + large_positive_decimal + large_negative_decimal, where we can still get a valid result.

I see. This means that sum is not deterministic (when there is an intermediate overflow value), since the sum output will depend on the values in each partition.

For example, suppose we have the following scenario:
a1+a2 overflows, but with a3 added it no longer overflows. Neither a4 nor a5 overflows.

Partition 1 -> a1, a2, a3 (Note a1+a2 overflows, but a1+a2+a3 doesn't)
Partition 2 -> a4, a5
Sum value = a1+a2+a3+a4+a5 - let's say this doesn't overflow. This is the result.

But if the same values were split into different partitions, for example:
Partition 1 -> a1,a2
Partition 2 -> a3, a4, a5

In this case, we would get null from a1+a2, and then the end result would be null.

So we have the same values, but the output of sum is different.
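
A plain-Scala sketch of that partition dependence (the helper names and the null-on-overflow merge model are illustrative, not Spark source): each partition emits a partial sum that nulls out on overflow, the final merge propagates the null, and the same five values give different answers under the two layouts above.

// Cap of decimal(38, 0); a partial result beyond this is treated as overflow.
val MaxDec = BigDecimal("9" * 38)

// One partition's partial aggregate: temporary overflow inside the partition is fine,
// but an overflowed partial result is emitted as null (None) in non-ANSI mode.
def partialSum(part: Seq[BigDecimal]): Option[BigDecimal] = {
  val s = part.sum
  if (s.abs > MaxDec) None else Some(s)
}

// Final merge: any null partial result makes the whole sum null.
def totalSum(partitions: Seq[Seq[BigDecimal]]): Option[BigDecimal] = {
  val partials = partitions.map(partialSum)
  if (partials.contains(None)) None
  else {
    val s = partials.flatten.sum
    if (s.abs > MaxDec) None else Some(s)
  }
}

val a1 = MaxDec; val a2 = MaxDec; val a3 = -MaxDec   // a1 + a2 overflows, a1 + a2 + a3 does not
val a4 = BigDecimal(-1); val a5 = BigDecimal(-2)     // small values that never overflow

totalSum(Seq(Seq(a1, a2, a3), Seq(a4, a5)))          // Some(MaxDec - 3): a valid result
totalSum(Seq(Seq(a1, a2), Seq(a3, a4, a5)))          // None: the a1 + a2 partition nulled out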

--
Although not ideal, this does get us closer, and it fixes the incorrect-results scenario for some code paths. The other scenarios throw exceptions, which is less severe than incorrect results, so I am fine with these changes. This will still have the non-deterministic behavior I mentioned above for some scenarios.

@cloud-fan, please let me know how you want to proceed. Would you like me to merge this into the other PR? Let me know if you need me to do anything. Thanks.

@cloud-fan (Author)

Yes, please merge this, and we can polish your PR further to get it merged.

This is a hard problem and we are not going to fix it completely here. There are two things we should keep thinking about:

  1. how to fix the error in non-ANSI mode.
  2. how to fix the non-deterministic result.

Both of them are existing problems, and we are not making things worse.

@skambha (Owner) commented May 28, 2020

Hi @cloud-fan, I merged this locally and am running tests, and I see a test failure in SQLQueryTestSuite. I wanted to check whether you saw it, and whether you maybe missed adding an update here?

@cloud-fan (Author)

Can you let me know which test fails?

@skambha (Owner) commented May 28, 2020

This test: SQLQueryTestSuite for postgreSQL/window_part4.sql. Here is the diff:

[info] - postgreSQL/window_part4.sql *** FAILED *** (8 seconds, 897 milliseconds)
[info]   postgreSQL/window_part4.sql
[info]   Expected "struct<[sum(n) OVER (ORDER BY i ASC NULLS FIRST ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING):decimal(22,2)]>", but got "struct<[]>" Schema did not match for query #8
[info]   SELECT SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
[info]     FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n): -- !query
[info]   SELECT SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
[info]     FROM (VALUES(1,1.01),(2,2),(3,3)) v(i,n)
[info]   -- !query schema
[info]   struct<>
[info]   -- !query output
[info]   java.lang.ClassCastException
[info]   org.apache.spark.sql.catalyst.expressions.SpecificInternalRow cannot be cast to org.apache.spark.sql.types.Decimal (SQLQueryTestSuite.scala:464)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedExc

@skambha (Owner) commented May 29, 2020

There are a few more failures in the Hive suite bucket, for:
org.apache.spark.sql.hive.execution.HashAggregationQuerySuite
org.apache.spark.sql.hive.execution.HashAggregationQueryWithControlledFallbackSuite

I went ahead and fixed them. Here is the diff:

--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala
@@ -159,7 +159,7 @@ case class CheckOverflowInSum(
    if (value == null) {
      if (nullOnOverflow) null else throw new ArithmeticException("Overflow in sum of decimals.")
    } else {
-      input.asInstanceOf[Decimal].toPrecision(
+      value.asInstanceOf[Decimal].toPrecision(
        dataType.precision,
        dataType.scale,
        Decimal.ROUND_HALF_UP,
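
For context on why this one-character change matters, a hedged reconstruction (not verbatim source) of the surrounding method: in CheckOverflowInSum.eval, `input` is the incoming InternalRow argument, so casting it to Decimal is what produced the SpecificInternalRow ClassCastException above in the interpreted path; `value` is the evaluated child, i.e. the decimal that actually needs the final precision check. The trailing nullOnOverflow argument is assumed from the class's own flag.

// Approximate shape of CheckOverflowInSum.eval after the fix (reconstructed, not verbatim):
override def eval(input: InternalRow): Any = {
  val value = child.eval(input)                      // the aggregated decimal, or null on overflow
  if (value == null) {
    if (nullOnOverflow) null
    else throw new ArithmeticException("Overflow in sum of decimals.")
  } else {
    // Cast the evaluated child, not the input row, before re-checking precision.
    value.asInstanceOf[Decimal].toPrecision(
      dataType.precision, dataType.scale, Decimal.ROUND_HALF_UP, nullOnOverflow)
  }
}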

@skambha merged commit de2d68f into skambha:decaggfixwrongresults on May 29, 2020