Commit dca0d9a
[SPARK-14322][MLLIB] Use treeAggregate instead of reduce in OnlineLDAOptimizer
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-14322
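OnlineLDAOptimizer uses RDD.reduce in two places where it could use treeAggregate. This can cause scalability issues, because reduce merges every partition's result on the driver. This should be an easy fix.
This is also a bug: the function passed to reduce modifies its first argument in place, so we should use aggregate or treeAggregate, which take an explicit zero value that is safe to mutate.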
See this line: https://github.com/apache/spark/blob/f12f11e578169b47e3f8b18b299948c0670ba585/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala#L452
and a few lines below it.
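
For illustration, the difference between the two calls can be sketched as below. This is a minimal example under stated assumptions, not the code changed by this commit: `Array[Double]` stands in for the Breeze matrices used in LDAOptimizer, and the names `TreeAggregateSketch`, `addInPlace`, and `sumStats` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TreeAggregateSketch {
  // Element-wise add that mutates and returns `acc`
  // (stand-in for an in-place `+=` on a matrix).
  def addInPlace(acc: Array[Double], x: Array[Double]): Array[Double] = {
    var i = 0
    while (i < acc.length) { acc(i) += x(i); i += 1 }
    acc
  }

  // Hypothetical helper: sums per-document statistics of length `dim`.
  def sumStats(stats: RDD[Array[Double]], dim: Int): Array[Double] = {
    // Risky alternative: stats.reduce(addInPlace) mutates an element of the
    // RDD itself and merges all partition results on the driver.
    // Here the fresh zero array is the only object mutated, and partial
    // sums are combined in a multi-level tree before reaching the driver.
    stats.treeAggregate(Array.fill(dim)(0.0))(addInPlace, addInPlace)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("treeAggregateSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val stats = sc.parallelize(Seq(
        Array(1.0, 2.0, 3.0),
        Array(4.0, 5.0, 6.0),
        Array(7.0, 8.0, 9.0)), numSlices = 3)
      println(sumStats(stats, 3).mkString(", ")) // prints "12.0, 15.0, 18.0"
    } finally {
      sc.stop()
    }
  }
}
```

Mutating the first argument is allowed in the sequence and combine operations of aggregate and treeAggregate because that argument always derives from the zero value, not from the data in the RDD.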
## How was this patch tested?
unit tests
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #12106 from hhbyyh/ldaTreeReduce.
(cherry picked from commit 8cffcb6)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
1 parent cfe9f02 commit dca0d9a
1 file changed in mllib/src/main/scala/org/apache/spark/mllib/clustering: 3 additions and 2 deletions.
Original line 452 is replaced by new lines 452-453, and original line 455 is replaced by new line 456; the content of the changed lines is not shown.