[SPARK-26572][SQL] fix aggregate codegen result evaluation #23731

peter-toth · 2019-02-03T20:00:06Z

What changes were proposed in this pull request?

This PR is a correctness fix in HashAggregateExec code generation. It forces evaluation of result expressions before calling consume() to avoid multiple executions.

This PR fixes a use case where an aggregate is nested into a broadcast join and appears on the "stream" side. The issue is that Broadcast join generates it's own loop. And without forcing evaluation of resultExpressions of HashAggregateExec before the join's loop these expressions can be executed multiple times giving incorrect results.

How was this patch tested?

New UT was added.

Change-Id: Ie07c913fc4586296c8187f0972c19169da25f613

maropu · 2019-02-04T01:52:19Z

I think we should handle this case in a planner?
For example, if we turn off broadcast join, the behaviour changes;

scala> val baseTable = Seq((1), (1)).toDF("idx")
scala> val distinctWithId = baseTable.distinct.withColumn("id", functions.monotonically_increasing_id())
scala> baseTable.join(distinctWithId, "idx").show
+---+------------+
|idx|          id|
+---+------------+
|  1|369367187456|
|  1|369367187457|
+---+------------+

sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
scala> baseTable.join(distinctWithId, "idx").show
+---+------------+
|idx|          id|
+---+------------+
|  1|369367187456|
|  1|369367187456|
+---+------------+

Could you check again?

maropu · 2019-02-04T01:53:50Z

btw, could you describe more in the PR description? what's the root cause of this issue? How did this pr fix the issue? brabrabra....

peter-toth · 2019-02-04T07:59:25Z

The reason why I think this is a code generation issue is that if you disable spark.sql.codegen.wholeStage then the result is correct.

This is the physical plan of the example in the ticket:

== Physical Plan ==
*(3) Project [idx#4, id#6L]
+- *(3) BroadcastHashJoin [idx#4], [idx#9], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)))
   :  +- *(1) Project [value#1 AS idx#4]
   :     +- LocalTableScan [value#1]
   +- *(3) HashAggregate(keys=[idx#9], functions=[], output=[idx#9, id#6L])
      +- Exchange hashpartitioning(idx#9, 5)
         +- *(2) HashAggregate(keys=[idx#9], functions=[], output=[idx#9])
            +- *(2) Project [value#1 AS idx#9]
               +- LocalTableScan [value#1]

and if you take a look the code of stage 3 (left some comments in it regarding what my PR does):

    ...
    // this method is called for every aggregation key
    private void agg_doAggregateWithKeysOutput_0(UnsafeRow agg_keyTerm_0, UnsafeRow agg_bufferTerm_0)
            throws java.io.IOException {
        ((org.apache.spark.sql.execution.metric.SQLMetric) references[4] /* numOutputRows */).add(1);

        int agg_value_4 = agg_keyTerm_0.getInt(0);
        // this PR moves agg_value_5 calculation and agg_count_0 increment from boradcast join loop to here

        // generate join key for stream side
        boolean bhj_isNull_0 = false;
        long bhj_value_0 = -1L;
        if (!false) {
            bhj_value_0 = (long) agg_value_4;
        }
        // find matches from HashRelation
        scala.collection.Iterator bhj_matches_0 = bhj_isNull_0 ? null
                : (scala.collection.Iterator) bhj_relation_0.get(bhj_value_0);
        if (bhj_matches_0 != null) {
            while (bhj_matches_0.hasNext()) {
                UnsafeRow bhj_matched_0 = (UnsafeRow) bhj_matches_0.next();
                {
                    ((org.apache.spark.sql.execution.metric.SQLMetric) references[6] /* numOutputRows */).add(1);

                    int bhj_value_2 = bhj_matched_0.getInt(0);
                    boolean project_isNull_0 = false;
                    UTF8String project_value_0 = null;
                    if (!false) {
                        project_value_0 = UTF8String.fromString(String.valueOf(bhj_value_2));
                    }
                    final long agg_value_5 = partitionMask + agg_count_0;
                    agg_count_0++;
                    boolean project_isNull_2 = false;
                    UTF8String project_value_2 = null;
                    if (!false) {
                        project_value_2 = UTF8String.fromString(String.valueOf(agg_value_5));
                    }
    ...

So both hash aggregate and broadcast join are required in one codegen stage to experience this issue and also important that aggregate has to be on the "stream" side. This might be a rare case and explains why this issue hasn't come up earlier.
(I also think that there might be other operators than broadcast join that generate loop and so are affected, but I didn't look into that.)
But I think this is an issue with the generated code of HashAggregateExec and it seems to me that we can force evaluation of resultExpressions before generating broadcast join code (ie. calling consume()) without any drawback.

mgaido91 · 2019-02-04T08:53:58Z

The changes makes sense to me, but I think this problem was introduced in SPARK-13404, which claimed to have a significant perf gain (about 30% on TPCDS Q55), so it would be great if we can fix this without introducing perf regression. @peter-toth may you please run (and post the results) the benchmarks in order to ensure we are not introducing a perf regression with this PR?

@davies you are the author of that PR, do you have time to check this?

maropu · 2019-02-04T09:46:13Z

ok to test

maropu · 2019-02-04T09:52:02Z

cc: @cloud-fan @hvanhovell

maropu · 2019-02-04T11:00:24Z

This issue happens in case of stateful exprs only? If so, could you modify the code to apply the current fix only if HashAggregateExec has stateful exprs? I worry about the performance regression @mgaido91 pointed out, too. It seems the current fix affect the other queries, its a corner case though....

maropu · 2019-02-04T11:17:35Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

       $evaluateKeyVars
       $evaluateBufferVars
       $evaluateAggResults
+       $evaluateResultVars


We need this change?

I think so. If you replace .distinct() to .groupBy("idx").max() in the example then this code path runs and the change fixes the same issue.

If so, could you please add test cases to cover all the code paths you added in this pr.

Thanks. I've added that path to the test.

SparkQA · 2019-02-04T13:47:49Z

Test build #102034 has finished for PR 23731 at commit b5d079c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

peter-toth · 2019-02-04T16:26:17Z

Here are my benchmark results of q55. I run 3 times on master and 3 times on this PR branch against scale=5 generated data.
Master:

master:
  Stopped after 5 iterations, 29324 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.14.2
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
TPCDS Snappy:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
q55                                           5683 / 5865          2.6         391.2       1.0X

  Stopped after 5 iterations, 28914 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.14.2
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
TPCDS Snappy:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
q55                                           5584 / 5783          2.6         384.3       1.0X

  Stopped after 5 iterations, 29905 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.14.2
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
TPCDS Snappy:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
q55                                           5873 / 5981          2.5         404.3       1.0X

This PR:

this PR:
  Stopped after 5 iterations, 32577 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.14.2
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
TPCDS Snappy:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
q55                                           6226 / 6515          2.3         428.5       1.0X


  Stopped after 5 iterations, 30612 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.14.2
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
TPCDS Snappy:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
q55                                           5792 / 6122          2.5         398.6       1.0X

  Stopped after 5 iterations, 32918 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.14.2
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
TPCDS Snappy:                            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
q55                                           6415 / 6584          2.3         441.5       1.0X

Although the results are a bit varying, it seems this patch would introduce some performance degradation.
I will try to modify the patch to evaluate only Stateful expressions as @maropu suggested and run the benchmark again.

mgaido91 · 2019-02-04T16:36:06Z

@peter-toth did you run the benchmark also on the other queries? My guess is that it may also happen that q55 gets some perf degradation, but others improve. In that case we should kind of average over all the queries whether the impact is positive or not.

In case we decide to limit this to be done only for some expressions, we should do it for those which aer non-deterministic rather than only for the Stateful ones.

peter-toth · 2019-02-04T16:46:56Z

Thanks @mgaido91, then I will run a full benchmark first.

mgaido91 · 2019-02-04T16:55:16Z

Thanks @peter-toth!

dongjoon-hyun · 2019-02-08T16:56:30Z

Retest this please.

SparkQA · 2019-02-08T21:10:18Z

Test build #102104 has finished for PR 23731 at commit b5d079c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-02-11T06:33:05Z

The issue is that Broadcast join generates it's own loop. And without forcing evaluation of resultExpressions of HashAggregateExec before the join's loop these expressions can be executed multiple times giving incorrect results.

Shouldn't we fix join instead of aggregate?

viirya · 2019-02-11T13:59:03Z

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

-      consume(ctx, eval)
+      val evaluateResultVars = evaluateVariables(resultVars)
+      s"""
+       $evaluateResultVars


For non broadcast join cases, the change will force evaluation unnecessarily too. We should move evaluation out of the loop in broadcast join, if possible.

What I a bit concern about is; is it semantically ok to defer the evaluation of nondeterministic exprs if HashAggregateExec has these exprs?

I think, to fix this issue, its ok to modify code in the join side if we could find a simpler solution there with no performance regression. But, I have just a question about the design regardless of this issue.

oh.. Kris answered my question.. in #23731 (review)

peter-toth · 2019-02-11T14:31:20Z

@mgaido91 @maropu @cloud-fan @viirya I've just collected the results of full TPCDSQueryBenchmark runs on master vs. this PR and overall it doesn't seem to have that big impact if we force the evaluation in aggregate. But I will try to change the PR and fix broadcast join to force evaluation of non-deterministic expressions out of the loop.

Here are the benchmark results if you are interested:

test case (values in ms)	master	master 2nd run	this PR	this PR 2nd run
total	27023411	26650436	27076386	27024466
q1	76017	75679	78988	81349
q10	90319	88032	93112	97605
q10a-v2.7	93878	93220	95113	94199
q11	187918	187985	198156	201343
q11-v2.7	186384	183901	191372	195608
q12	35709	34768	37109	39020
q12-v2.7	33617	34629	35666	37382
q13	76088	75420	79224	79844
q14-v2.7	647991	593543	611725	638555
q14a	793687	779841	786366	780845
q14a-v2.7	835538	849404	845488	824285
q14b	632994	659007	647556	628163
q15	50407	48712	53039	52104
q16	173310	176263	181887	185434
q17	358234	349108	384112	380049
q18	67506	67611	70547	71098
q18a-v2.7	260611	263082	268121	260403
q19	45170	45566	48270	48136
q2	62042	63219	64080	65965
q20	37064	36726	38922	38857
q20-v2.7	35919	35109	37882	37615
q21	30900	30886	31394	32062
q22	134810	140961	139015	146047
q22-v2.7	881799	666436	736824	786129
q22a-v2.7	77062	76621	76412	77984
q23a	551085	562690	570236	554450
q23b	509188	529924	514952	504751
q24-v2.7	232464	223424	235800	242875
q24a	235519	239793	240399	239449
q24b	242000	235056	240476	237701
q25	365661	364580	375756	388874
q26	50035	49711	52832	53557
q27	54088	53121	55344	57900
q27a-v2.7	156000	153863	156385	149929
q28	253946	253869	264420	266887
q29	364358	368899	371732	377960
q3	32569	32863	34460	35264
q30	86921	87179	90322	93566
q31	195233	195026	204116	208575
q32	70098	70355	72000	73724
q33	113700	114023	117777	121381
q34	48608	48859	50382	51431
q34-v2.7	47455	46366	48917	49559
q35	96340	93386	97375	99735
q35-v2.7	92379	93076	95759	97744
q35a-v2.7	98323	97102	98990	95888
q36	52161	52314	53873	55724
q36a-v2.7	56251	55359	55688	55844
q37	71609	72788	74007	75574
q38	113581	111851	115416	117293
q39a	67328	70285	69553	67597
q39b	67457	69747	70059	67390
q4	645699	632785	630790	662483
q40	92189	90362	94774	97688
q41	3931	3906	4157	4170
q42	32727	33828	33368	35238
q43	41522	40911	43279	44498
q44	118338	118557	122399	126860
q45	42166	42729	44086	43529
q46	57798	57073	59813	60527
q47	92922	92823	94551	99835
q47-v2.7	90674	90166	91696	96707
q48	64766	65324	66017	69578
q49	290310	306006	294090	303586
q49-v2.7	288623	320468	294292	290968
q5	242664	244383	258006	263323
q50	177935	187770	193884	188241
q51	180839	180103	184773	189253
q51a-v2.7	1261774	1166285	1173536	1156590
q52	32827	32561	33622	33896
q53	42069	42203	43387	44729
q54	178694	180335	185816	193587
q55	32863	32091	33049	33913
q56	113489	110395	116137	116632
q57	77085	78120	80313	80765
q57-v2.7	75062	75184	78971	78424
q58	102682	104439	106576	109776
q59	58396	57408	59314	62043
q5a-v2.7	264955	257989	264370	266317
q6	109696	107457	107418	109251
q6-v2.7	114195	109684	113283	114997
q60	115152	111163	116920	118019
q61	88653	87815	88905	92676
q62	45958	45002	45842	49475
q63	42271	41499	42553	44231
q64	375608	366708	371139	389530
q64-v2.7	398085	387919	378419	381406
q65	120201	121013	121007	124101
q66	107631	110160	110194	116045
q67	491712	496057	502941	498816
q67a-v2.7	582984	582585	588929	561230
q68	57644	57396	58574	62110
q69	92364	86473	87804	92744
q7	53092	54107	56050	56440
q70	76027	76236	78249	80208
q70a-v2.7	81529	81288	80193	79803
q71	111032	118959	114687	115729
q72	1263412	1215258	1204783	1194142
q72-v2.7	1274719	1297230	1224349	1188723
q73	50254	47880	48539	50459
q74	169414	163799	163079	169219
q74-v2.7	162417	163068	161341	164335
q75	365739	366476	372510	377480
q75-v2.7	352554	372481	374001	359940
q76	114328	110486	114099	114004
q77	181185	174504	177397	179485
q77a-v2.7	191487	188109	196948	188439
q78	381549	366170	382482	377300
q78-v2.7	373872	381872	400429	363220
q79	55042	53635	55588	55914
q8	40407	39912	41745	42687
q80	515092	512636	542615	522297
q80a-v2.7	527537	532262	546034	529508
q81	85570	85808	88579	86749
q82	99038	96385	99921	100342
q83	106249	99379	104401	105534
q84	39353	37683	40165	39717
q85	167298	167352	165008	169000
q86	37422	37246	38432	39093
q86a-v2.7	40440	40165	41697	41213
q87	120091	117061	125508	123227
q88	283596	293759	296313	300676
q89	44557	44882	46236	46012
q9	499645	497991	514793	525809
q90	70330	74905	72998	74123
q91	45903	46732	48843	48705
q92	66107	66381	68388	67379
q93	281893	293320	292280	295871
q94	117957	117824	126470	124528
q95	632772	591228	607336	608948
q96	37997	38276	39832	39946
q97	105541	104452	113225	108664
q98	38516	38436	41049	41108
q98-v2.7	37586	36939	39786	39280
q99	50378	51491	53608	52571

mgaido91 · 2019-02-11T14:43:48Z

@cloud-fan @viirya I am not sure about fixing this in the join is a good idea. First of all we have many kind of joins, so likely we would need to impact all of them and there may be other operators which use loops other than joins. I don't think it is correct to delegate to the consumer the responsibility of computing variables if needed. It seems more reasonable to me to fix it in the aggregate honestly.

cloud-fan · 2019-02-12T03:12:08Z

@mgaido91 are you sure aggregate is the only one that produces unevaluated result expressions? IIRC this is a long-standing optimization in the whole stage codegen framework, and there is no such a rule that operators must evaluate the result expressions before calling parent.consume.

also cc @rednaxelafx @kiszk

dongjoon-hyun · 2019-02-12T07:52:08Z

cc @dbtsai since he is the release manager for 2.4.1.

rednaxelafx

This bug and fix touches a basic design area of Spark SQL's whole-stage codegen:

Deterministic expressions can be evaluated anywhere as long as the inputs (data dependencies) are available, and are allowed to be evaluated multiple times (although from a performance point of view it's not preferred to evaluate them repeatedly); non-deterministic expressions has to be only evaluated once, and the order of evaluation should respect the order in the original query.

Two rules of thumb are:

In the whole-stage codegen framework, the evaluation of a deterministic expression can be deferred to just before its result is used. To improve performance and reduce code size, we only expect output expressions that are used more than once to be eagerly evaluated. This "used more than once" is expressed by CodegenSupport.usedInputs, and CodegenSupport.consume() handles the eager evaluation of such expressions automatically. That's #11274 already mentioned in one of the comments above.
Any physical plan operator that carries an output projection list, such as ProjectExec and in this case HashAggregateExec has to perform special treatment of forcing evaluation of non-deterministic expressions before passing the outputVars to consume(), to make sure the side effects are emitted in the correct order and not evaluated repeatedly in the parents' doConsume(). See ProjectExec.doConsume() for an example of what this special treatment should look like.

Note that Stateful expressions are Nondeterministic by design; the latter covers more expressions than the former.

The reason why this special treatment isn't done in the CodegenSupport.consume() framework function is because: consume() only gets to see the outputVars from the child as a list of ExprCodes but not the list of Expressions that produced the code. The former has lost the notion of whether the generated code is deterministic or not, which can only be found on the latter.
consume() also gets to see the child.outputs but that's a list of Attributes, which doesn't have the knowledge of whether or not the original expression was deterministic. So that doesn't help.
With that, we'd have to perform the special treatment before calling consume().

This brings us to another related note: in the whole-stage codegen world, it really is preferred to host non-trivial expressions in ProjectExec as much as possible, so that we'd only have to non-trivial expression handling in one place. Fusing the output projection list in a fat operator is a design from the past -- it would have helped reduce the operator boundaries and thus reduce materialization/operator dispatch overhead in the Volcano model, but in the whole-stage codegen world that doesn't matter at all.

Here's my suggested fix for HashAggregateExec:

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
index 19a47ffc6d..be457b435b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
@@ -154,6 +154,14 @@ case class HashAggregateExec(
     child.asInstanceOf[CodegenSupport].inputRDDs()
   }
 
+  // Extract the code to evaluate non-deterministic expressions in the resultExpressions.
+  // NOTE: this function will mutate the state of the `ExprCode`s in `resultVars`: the `code` of
+  // non-deterministic expressions will be cleared.
+  private def evaluateNondeterministicResults(resultVars: Seq[ExprCode]): String = {
+    val nondeterministicAttrs = resultExpressions.filterNot(_.deterministic).map(_.toAttribute)
+    evaluateRequiredVariables(output, resultVars, AttributeSet(nondeterministicAttrs))
+  }
+
   protected override def doProduce(ctx: CodegenContext): String = {
     if (groupingExpressions.isEmpty) {
       doProduceWithoutKeys(ctx)
@@ -208,8 +216,10 @@ case class HashAggregateExec(
       // evaluate result expressions
       ctx.currentVars = aggResults
       val resultVars = bindReferences(resultExpressions, aggregateAttributes).map(_.genCode(ctx))
+      val evaluateNondeterministicAggResults = evaluateNondeterministicResults(resultVars)
       (resultVars, s"""
         |$evaluateAggResults
+        |$evaluateNondeterministicAggResults
         |${evaluateVariables(resultVars)}
        """.stripMargin)
     } else if (modes.contains(Partial) || modes.contains(PartialMerge)) {
@@ -466,10 +476,12 @@ case class HashAggregateExec(
       val resultVars = bindReferences[Expression](
         resultExpressions,
         inputAttrs).map(_.genCode(ctx))
+      val evaluateNondeterministicAggResults = evaluateNondeterministicResults(resultVars)
       s"""
        $evaluateKeyVars
        $evaluateBufferVars
        $evaluateAggResults
+       $evaluateNondeterministicAggResults
        ${consume(ctx, resultVars)}
        """
     } else if (modes.contains(Partial) || modes.contains(PartialMerge)) {
@@ -506,10 +518,14 @@ case class HashAggregateExec(
       // generate result based on grouping key
       ctx.INPUT_ROW = keyTerm
       ctx.currentVars = null
-      val eval = bindReferences[Expression](
+      val resultVars = bindReferences[Expression](
         resultExpressions,
         groupingAttributes).map(_.genCode(ctx))
-      consume(ctx, eval)
+      val evaluateNondeterministicResults = evaluateNondeterministicResults(resultVars)
+      s"""
+        |$evaluateNondeterministicAggResults
+        |${consume(ctx, resultVars)}
+       """.stripMargin
     }
     ctx.addNewFunction(funcName,
       s"""

peter-toth · 2019-02-12T08:09:05Z

I was thinking of why this following simple code snippet doesn't have the same issue:

    val baseTable = Seq((1), (1)).toDF("idx")
    val distinctWithId = baseTable.withColumn("id", monotonically_increasing_id())
    val x = baseTable.join(distinctWithId, "idx")
    x.show()

because it produces the expected

+---+----------+
|idx|        id|
+---+----------+
|  1|         0|
|  1|         0|
|  1|8589934592|
|  1|8589934592|
+---+----------+

and it seems because doConsume in ProjectExec evaluates non deterministic result vars before passing to Join. So I think it would be analogous to handle non-determinism in aggregate.

Oops, meanwhile we got the same answer. Thanks @rednaxelafx.

maropu · 2019-02-12T08:18:33Z

Thanks, Kris, I'm just curious that the @rednaxelafx approach has no performance regression..

peter-toth · 2019-02-12T08:23:20Z

So, shall I adjust the fix as @rednaxelafx suggested and maybe run another benchmark? Any objections?

rednaxelafx · 2019-02-12T08:25:23Z

@maropu : my proposed change won't introduce any performance regressions because what used to be both (1) correct and (2) fast will stay the same, no changes whatsoever; whereas what used to be incorrect will be fixed.
You won't see any statistically significant differences in TPC-DS perf numbers because that benchmark doesn't really use a lot of non-deterministic expressions. Such expressions are rare in the SQL world. There isn't even a rand() call in TPC-DS...
We should expect the TPC-DS queries to generate identical whole-stage codegen code before and after my proposed fix.

mgaido91 · 2019-02-12T08:32:07Z

Thanks for your comment @rednaxelafx , huge +1 on everything you just said.

@mgaido91 are you sure aggregate is the only one that produces unevaluated result expressions?

@cloud-fan if it is not the only one, I think we have to fix the others too, but I don't think there are. ProjectExec is fine as mentioned by @rednaxelafx and I can't think of other plans which can generate non-deterministic expressions (there may be, but in this moment none comes to my mind).

maropu · 2019-02-12T08:56:10Z

@rednaxelafx I just worried about performance numbers other than TPCDS though, that's certainly true. Thanks, Kris.

nit: btw, could we move evaluateNondeterministicResults into CodegenSupport, and then ProjectExec reuse it?

peter-toth · 2019-02-12T20:18:08Z

Thank you all for the comments and suggestions.
I pushed a commit with the changes except for the change in doProduceWithoutKeys() as resultVars are force evaluated there and changing the evaluation to non-deterministic only would be an optimization, not a bugfix.
Also I didn't change ProjectExec, let me know if these 2 should be incorporated in this PR.

rednaxelafx

Mostly LGTM, with a comment in the test case.

rednaxelafx · 2019-02-12T20:40:55Z

sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala

+   * Returns source code to evaluate the variables for non-deterministic expressions, and clear the
+   * code of evaluated variables, to prevent them to be evaluated twice.
+   */
+  protected def evaluateNondeterministicVariables(


Nitpick on naming: "variables" are never non-deterministic, only expressions can have the property of being deterministic or not. Two options:

I'd prefer naming this utility function evaluateNondeterministicResults to emphasis this should (mostly) be used on the results of an output projection list.

But the existing utility function evaluateRequiredVariables uses the "variable" notion, so keeping consistency there is fine too.

I'm fine either way.

Also, historically Spark SQL's WSCG would use variable names like eval for the ExprCode type, e.g. evals: Seq[ExprCode]. Not sure why it started that way but you can see that naming pattern throughout the WSCG code base.
Again, your new utility function follows the same names used in evaluateRequiredVariables so that's fine. Local consistency is good enough.

To keep the consistent naming, +1 for evaluateNondeterministicVariables .

rednaxelafx · 2019-02-12T21:02:04Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

+    val baseTable = Seq((1), (1)).toDF("idx")
+
+    // BroadcastHashJoinExec with a HashAggregateExec child containing no aggregate expressions
+    val distinctWithId = baseTable.distinct().withColumn("id", monotonically_increasing_id())


I'm not sure how stable the results are going to be if you use monotonically_increasing_id here with an unspecified number of shuffle partitions. Since you're checking the exact value of the resulting id, if the number of shuffle partitions changes (let's say if someone decides to change the default shuffle partitions setting in all tests), this test can become fragile and fail unnecessarily.

It might be worth setting the shuffle partition to 1 explicitly inside this test case. Or go back to grouping by id instead of checking the exact value of id, or just assert the ids are equal.

Also, how about wrapping with withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> Long.MaxValue.toString) for safeguard.

Thanks. Fixed both.

SparkQA · 2019-02-12T23:33:03Z

Test build #102262 has finished for PR 23731 at commit 567f8f6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

maropu · 2019-02-13T02:12:13Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

+    val distinctWithId = baseTable.distinct().withColumn("id", monotonically_increasing_id())
+      .join(baseTable, "idx")
+    assert(distinctWithId.queryExecution.executedPlan.collectFirst {
+      case BroadcastHashJoinExec(_, _, _, _, _, HashAggregateExec(_, _, Seq(), _, _, _, _), _) =>


How about this?

assert(distinctWithId.queryExecution.executedPlan.collectFirst { case j: BroadcastHashJoinExec if j.left.asInstanceOf[HashAggregateExec] => true }.isDefined)

We need to strictly check agregate exprs? It seems baseTable.distinct() obviously has no aggregate expr?

I prefer avoiding isInstanceOf if possible, but changed it a bit.

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

mgaido91

the fix itself looks fine to me. Just some comments on the test, may you please also re-run the benchmark for the query having a considerable perf issue earlier i order to confirm now we have no regression? Thanks.

mgaido91 · 2019-02-13T08:29:04Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

    }
  }
+
+  test("SPARK-26572: fix aggregate codegen result evaluation") {


Since this is a problem with whole stage codegen, waht about moving this test to WholeStageCodegenSuite? And adding an assert that whole stage codegen is actually used, ie. the HashAggregate is a child of WholeStageCodegenExec?

I'm fine with moving it to WholeStageCodegenSuite but the plan looks like:

*(3) Project [idx#4, id#6L] +- *(3) BroadcastHashJoin [idx#4], [idx#9], Inner, BuildRight :- *(3) HashAggregate(keys=[idx#4], functions=[], output=[idx#4, id#6L]) : +- Exchange hashpartitioning(idx#4, 1) : +- *(1) HashAggregate(keys=[idx#4], functions=[], output=[idx#4]) : +- *(1) Project [value#1 AS idx#4] : +- LocalTableScan [value#1] +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint))) +- *(2) Project [value#1 AS idx#9] +- LocalTableScan [value#1]

so I guess you mean checking WholeStageCodegenExec has a ProjectExec child that has a BroadcastHashJoinExec child?

Moved and added WholeStageCodegenExec check.

SparkQA · 2019-02-13T12:08:49Z

Test build #102288 has finished for PR 23731 at commit 5ae9add.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

peter-toth · 2019-02-13T12:20:33Z

Hmm, the failing UT doesn't seem to be related to the changes in this PR.

mgaido91 · 2019-02-13T12:26:01Z

retest this please

SparkQA · 2019-02-13T16:43:12Z

Test build #102292 has finished for PR 23731 at commit 5ae9add.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

peter-toth · 2019-02-14T08:22:27Z

the fix itself looks fine to me. Just some comments on the test, may you please also re-run the benchmark for the query having a considerable perf issue earlier i order to confirm now we have no regression? Thanks.

@mgaido91, I checked that the PR now doesn't add pref regression.

mgaido91 · 2019-02-14T08:25:02Z

LGTM

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

viirya · 2019-02-14T08:33:16Z

Looks good and a minor comment about variable naming.

Change-Id: I1a2c52e7ba30a186517d91568093da813f201d1f

SparkQA · 2019-02-14T14:20:39Z

Test build #102342 has finished for PR 23731 at commit af861d5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

This PR is a correctness fix in `HashAggregateExec` code generation. It forces evaluation of result expressions before calling `consume()` to avoid multiple executions. This PR fixes a use case where an aggregate is nested into a broadcast join and appears on the "stream" side. The issue is that Broadcast join generates it's own loop. And without forcing evaluation of `resultExpressions` of `HashAggregateExec` before the join's loop these expressions can be executed multiple times giving incorrect results. New UT was added. Closes #23731 from peter-toth/SPARK-26572. Authored-by: Peter Toth <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 2228ee5) Signed-off-by: Wenchen Fan <[email protected]>

cloud-fan · 2019-02-14T15:16:24Z

thanks, merging to master/2.4/2.3!

peter-toth · 2019-02-14T15:25:02Z

Thanks @cloud-fan @maropu @mgaido91 @rednaxelafx and @viirya for your review and help.

## What changes were proposed in this pull request? This PR is a correctness fix in `HashAggregateExec` code generation. It forces evaluation of result expressions before calling `consume()` to avoid multiple executions. This PR fixes a use case where an aggregate is nested into a broadcast join and appears on the "stream" side. The issue is that Broadcast join generates it's own loop. And without forcing evaluation of `resultExpressions` of `HashAggregateExec` before the join's loop these expressions can be executed multiple times giving incorrect results. ## How was this patch tested? New UT was added. Closes apache#23731 from peter-toth/SPARK-26572. Authored-by: Peter Toth <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>

This PR is a correctness fix in `HashAggregateExec` code generation. It forces evaluation of result expressions before calling `consume()` to avoid multiple executions. This PR fixes a use case where an aggregate is nested into a broadcast join and appears on the "stream" side. The issue is that Broadcast join generates it's own loop. And without forcing evaluation of `resultExpressions` of `HashAggregateExec` before the join's loop these expressions can be executed multiple times giving incorrect results. New UT was added. Closes apache#23731 from peter-toth/SPARK-26572. Authored-by: Peter Toth <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 2228ee5) Signed-off-by: Wenchen Fan <[email protected]>

This PR is a correctness fix in `HashAggregateExec` code generation. It forces evaluation of result expressions before calling `consume()` to avoid multiple executions. This PR fixes a use case where an aggregate is nested into a broadcast join and appears on the "stream" side. The issue is that Broadcast join generates it's own loop. And without forcing evaluation of `resultExpressions` of `HashAggregateExec` before the join's loop these expressions can be executed multiple times giving incorrect results. New UT was added. Closes apache#23731 from peter-toth/SPARK-26572. Authored-by: Peter Toth <[email protected]> Signed-off-by: Wenchen Fan <[email protected]> (cherry picked from commit 2228ee5) RB=1571578 G=superfriends-reviewers R=fli,yezhou,edlu,mshen A=yezhou

peter-toth added 2 commits February 3, 2019 20:45

[SPARK-26572][SQL] fix aggregate codegen result evaluation

b906c8c

indentation fix

b5d079c

Change-Id: Ie07c913fc4586296c8187f0972c19169da25f613

maropu reviewed Feb 4, 2019

View reviewed changes

viirya reviewed Feb 11, 2019

View reviewed changes

rednaxelafx reviewed Feb 12, 2019

View reviewed changes

maropu reviewed Feb 13, 2019

View reviewed changes

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala Outdated Show resolved Hide resolved

maropu reviewed Feb 13, 2019

View reviewed changes

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala Outdated Show resolved Hide resolved

maropu reviewed Feb 13, 2019

View reviewed changes

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala Outdated Show resolved Hide resolved

mgaido91 reviewed Feb 13, 2019

View reviewed changes

fix review findings

5ae9add

viirya reviewed Feb 14, 2019

View reviewed changes

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala Outdated Show resolved Hide resolved

fix var name

af861d5

Change-Id: I1a2c52e7ba30a186517d91568093da813f201d1f

cloud-fan closed this in 2228ee5 Feb 14, 2019

[SPARK-26572][SQL] fix aggregate codegen result evaluation #23731

[SPARK-26572][SQL] fix aggregate codegen result evaluation #23731

Uh oh!

Conversation

peter-toth commented Feb 3, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

How was this patch tested?

Uh oh!

maropu commented Feb 4, 2019

Uh oh!

maropu commented Feb 4, 2019

Uh oh!

peter-toth commented Feb 4, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mgaido91 commented Feb 4, 2019

Uh oh!

maropu commented Feb 4, 2019

Uh oh!

maropu commented Feb 4, 2019

Uh oh!

maropu commented Feb 4, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SparkQA commented Feb 4, 2019

Uh oh!

peter-toth commented Feb 4, 2019

Uh oh!

mgaido91 commented Feb 4, 2019

Uh oh!

peter-toth commented Feb 4, 2019

Uh oh!

mgaido91 commented Feb 4, 2019

Uh oh!

dongjoon-hyun commented Feb 8, 2019

Uh oh!

SparkQA commented Feb 8, 2019

Uh oh!

cloud-fan commented Feb 11, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peter-toth commented Feb 11, 2019

Uh oh!

mgaido91 commented Feb 11, 2019

Uh oh!

cloud-fan commented Feb 12, 2019

Uh oh!

dongjoon-hyun commented Feb 12, 2019

Uh oh!

rednaxelafx left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

peter-toth commented Feb 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maropu commented Feb 12, 2019

Uh oh!

peter-toth commented Feb 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rednaxelafx commented Feb 12, 2019

Uh oh!

mgaido91 commented Feb 12, 2019

Uh oh!

maropu commented Feb 12, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

peter-toth commented Feb 3, 2019 •

edited

Loading

peter-toth commented Feb 4, 2019 •

edited

Loading

rednaxelafx left a comment •

edited

Loading

peter-toth commented Feb 12, 2019 •

edited

Loading

peter-toth commented Feb 12, 2019 •

edited

Loading

maropu commented Feb 12, 2019 •

edited

Loading

maropu Feb 13, 2019 •

edited

Loading

peter-toth Feb 13, 2019 •

edited

Loading