[SPARK-11275] [SQL] Incorrect results when using rollup/cube #9815
… that included grouping expressions would return the wrong (null) result. Also simplifies the analyzer rule a bit and leaves column pruning to the optimizer.
Test build #46232 has finished for PR 9815 at commit
retest this please
Test build #46256 has finished for PR 9815 at commit
@yhuai can you take a look at this PR?
Is this the only change we need to fix this problem?
No. This was a (minor) problem before that was not caught by any of the test cases; it's now more necessary since we duplicate all the grouping columns in the analyzer rule.
@aray Thank you for the PR! Since we are in the QA period for the 1.6 release, it would be great if we just fix the problem without any other changes. Is this the minimal fix for this issue?
@yhuai I do think this is the minimal fix. However, like I stated in the summary, we are simplifying instead of making more exceptions that might themselves have bugs. Let me know if I can clarify anything else.
these *overlapping* cases will fail without the fix, right?
correct
Test build #46334 has finished for PR 9815 at commit
So, we will rely on our optimizer to remove this Project if it is not necessary, right?
Thank you for the fix! I am merging it to master and branch 1.6.
Fixes bug with grouping sets (including cube/rollup) where aggregates that included grouping expressions would return the wrong (null) result.

Also simplifies the analyzer rule a bit and leaves column pruning to the optimizer.

Added multiple unit tests to DataFrameAggregateSuite and verified it passes hive compatibility suite:

```
build/sbt -Phive -Dspark.hive.whitelist='groupby.*_grouping.*' 'test-only org.apache.spark.sql.hive.execution.HiveCompatibilitySuite'
```

This is an alternative to PR #9419, but I think it's better as it simplifies the analyzer rule instead of adding another special case to it.

Author: Andrew Ray <[email protected]>

Closes #9815 from aray/groupingset-agg-fix.

(cherry picked from commit 37cff1b)
Signed-off-by: Yin Huai <[email protected]>
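
To make the failure mode concrete, here is a rough plain-Scala model of rollup semantics. It is only a sketch and not Spark's analyzer rule: `InputRow`, `Expanded`, `expand`, `rollupSumA` and `sumOfKeyA` are hypothetical names invented for illustration. The assumption is that each input row is expanded once per grouping set, with nulled-out copies of the grouping columns used as keys, while aggregates read the untouched originals.

```scala
object RollupSketch {
  // Hypothetical input row: `a` and `b` are grouping columns, and `a` is also aggregated.
  final case class InputRow(a: Int, b: Int)

  // Expanded row: the original values are kept for aggregation, and separate
  // nullable copies (modelled as Option) serve only as grouping keys.
  final case class Expanded(a: Int, b: Int, keyA: Option[Int], keyB: Option[Int])

  // rollup(a, b) corresponds to the grouping sets (a, b), (a) and ().
  def expand(r: InputRow): Seq[Expanded] = Seq(
    Expanded(r.a, r.b, Some(r.a), Some(r.b)),
    Expanded(r.a, r.b, Some(r.a), None),
    Expanded(r.a, r.b, None, None)
  )

  // sum(a) per grouping key, reading the original column value.
  def rollupSumA(rows: Seq[InputRow]): Map[(Option[Int], Option[Int]), Int] =
    rows.flatMap(expand)
      .groupBy(e => (e.keyA, e.keyB))
      .map { case (k, es) => k -> es.map(_.a).sum }

  // Aggregating the nulled-out key instead models the symptom: wherever `a`
  // has been nulled for the grouping set, the result is None (null).
  def sumOfKeyA(rows: Seq[InputRow]): Map[(Option[Int], Option[Int]), Option[Int]] =
    rows.flatMap(expand)
      .groupBy(e => (e.keyA, e.keyB))
      .map { case (k, es) =>
        k -> es.map(_.keyA).reduce((x, y) => for (p <- x; q <- y) yield p + q)
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(InputRow(1, 2), InputRow(2, 2), InputRow(3, 2))
    println(rollupSumA(rows)((None, None))) // grand total from original values
    println(sumOfKeyA(rows)((None, None)))  // grand total from nulled keys
  }
}
```

In this model the grand-total row is 6 when the aggregate reads the original column and `None` when it reads the nulled key, which mirrors the "wrong (null) result" described in the commit message.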
This might not be related to the rollup logic. It is a DataFrame bug. I will try to fix it soon. Thanks!
@gatorsmile Can you create a JIRA (with a repro in the description) and ping me from that JIRA?
Sorry, I think it is a test case issue. Scala automatically converts null.asInstanceOf[Int] into zero. Thus, Spark treats it as zero. Never mind. I tried another way, like null.asInstanceOf[java.lang.Integer]. It works fine.
oh, i see. Yeah, we need to use
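
The Scala behaviour behind this test-case issue can be checked without Spark. A minimal sketch follows; the object and value names are illustrative and do not come from the PR.

```scala
object NullCastDemo {
  // Casting null to a primitive type yields that type's default value (0 for Int),
  // so the "null" silently becomes an ordinary zero.
  val primitive: Int = null.asInstanceOf[Int]

  // Casting to the boxed Java type keeps the null reference.
  val boxed: java.lang.Integer = null.asInstanceOf[java.lang.Integer]

  def main(args: Array[String]): Unit = {
    println(primitive)     // prints 0
    println(boxed == null) // prints true
  }
}
```

A test that expects a null in an integer column therefore needs the boxed form; with the primitive cast, the expected value is 0 rather than null.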