[SPARK-4431][MLlib] Implement efficient foreachActive for dense and sparse vector #3288

dbtsai · 2014-11-16T01:08:30Z

Previously, we used Breeze's activeIterator to access the non-zero elements
of dense/sparse vectors. Because of its overhead, we switched back to a native while loop
in #SPARK-4129.

However, the #SPARK-4129 approach dereferences dv.values/sv.values on
every access to a value, which is very expensive. Also, MultivariateOnlineSummarizer
uses Breeze's dense vector to store the partial stats, which is very expensive compared
with using a primitive Scala array.

This PR implements an efficient foreachActive that unifies the code path for dense and sparse
vector operations, which makes the codebase easier to maintain. The Breeze dense vector is replaced
by a primitive array to reduce the overhead further.
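
For readers without the diff at hand, here is a minimal sketch of what a unified foreachActive could look like. The names SketchVector, DenseSketch, SparseSketch and ForeachActiveSketch are hypothetical stand-ins, not the MLlib classes, and the actual implementation in Vectors.scala may differ in detail.

```scala
// Hypothetical stand-ins for MLlib's vector classes; names are illustrative only.
sealed trait SketchVector {
  def size: Int
  // Applies f to every active (explicitly stored) entry as (index, value).
  def foreachActive(f: (Int, Double) => Unit): Unit
}

final class DenseSketch(val values: Array[Double]) extends SketchVector {
  def size: Int = values.length

  def foreachActive(f: (Int, Double) => Unit): Unit = {
    val localValues = values       // read the field once, outside the loop
    val n = localValues.length
    var i = 0
    while (i < n) {
      f(i, localValues(i))
      i += 1
    }
  }
}

final class SparseSketch(
    val size: Int,
    val indices: Array[Int],
    val values: Array[Double]) extends SketchVector {

  def foreachActive(f: (Int, Double) => Unit): Unit = {
    val localIndices = indices
    val localValues = values
    val n = localValues.length
    var i = 0
    while (i < n) {
      f(localIndices(i), localValues(i))
      i += 1
    }
  }
}

object ForeachActiveSketch {
  // One code path for both representations: the caller never matches on the vector type.
  def sum(v: SketchVector): Double = {
    var total = 0.0
    v.foreachActive { (_, value) => total += value }
    total
  }
}
```

The caller writes a single loop body, and each representation decides which entries count as active.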

Benchmarked with the mnist8m dataset on a single JVM,
with the first 200 samples loaded in memory, repeated 5000 times.

Before change:
Sparse Vector - 30.02
Dense Vector - 38.27

With this PR:
Sparse Vector - 6.29
Dense Vector - 11.72

SparkQA · 2014-11-16T01:14:59Z

Test build #23430 has started for PR 3288 at commit 101c2ea.

This patch merges cleanly.

SparkQA · 2014-11-16T02:40:03Z

Test build #23430 has finished for PR 3288 at commit 101c2ea.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-16T02:40:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23430/

SparkQA · 2014-11-17T06:55:15Z

Test build #23462 has started for PR 3288 at commit 453b29f.

This patch merges cleanly.

SparkQA · 2014-11-17T08:20:14Z

Test build #23462 has finished for PR 3288 at commit 453b29f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-17T08:20:17Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23462/

mengxr · 2014-11-17T23:05:27Z

mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala

I think we don't need skippingZeros here because it is very easy to chain the iterator with a filter to achieve it.

skippingZeros will be very useful in the foreach operation. If you use iterator -> filter -> foreach, it will not use the optimized foreach, which is implemented with a native while loop.

I think the following code should have the same performance:

vec.foreach { (i, v) => if (v != 0.0) { ... } }

With the following code,

sample.activeIterator(false).foreach { case (index, value) => if(value != 0.0) add(index, value) }

It takes 61.809 for dense vector, and 54.626 for sparse vector.

The most expensive part is calling the anonymous function even when the values are zero.

Okay, the issue is in the anonymous function. Basically, Scala converts the primitive index: Int and value: Double into boxed objects in order to put them in a tuple. My test dataset contains many explicit zeros, and even those zero values have to be converted to a tuple before we reach the if statement. That's why it's dramatically faster to do the if check before calling the anonymous function.

Changing the signature of foreach into

def foreach[@specialized(Unit) U](f: (Int, Double) => U)

to take two primitive variables will solve this problem, but it will not comply with the interface of foreach.

Tuple2[Int, Double] is specialized in Scala: https://github.com/scala/scala/blob/2.10.x/src/library/scala/Tuple2.scala, but the iterator interface still won't be high-performance for numerical computation. The iterator interface is not used in this PR, as we only need foreach. So the best option for us is to define foreach directly:

def foreach(f: (Int, Double) => Unit)

(We could implement ZippedTraversable2, but it doesn't seem to be necessary.)

or

def foreach(skipZeros: Boolean = true)(f: (Int, Double) => Unit)

This is a private function. We can change it when we see more use cases.
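
To make the two shapes under discussion concrete, here is a small sketch. The object name ForeachSignatureSketch and its methods are hypothetical, and it only counts how often the callback runs; it does not measure time or claim anything about boxing.

```scala
object ForeachSignatureSketch {
  // Tuple-based shape: every element is wrapped in a Tuple2 and handed to f,
  // so any zero check has to happen inside the callback.
  def foreachTupled(values: Array[Double])(f: ((Int, Double)) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f((i, values(i)))
      i += 1
    }
  }

  // Two-argument shape with the zero check done in the loop,
  // so f is never invoked for zero entries.
  def foreachNonZero(values: Array[Double])(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      val v = values(i)
      if (v != 0.0) f(i, v)
      i += 1
    }
  }

  // Returns (calls made by the tupled form, calls made by the non-zero form).
  def countCalls(values: Array[Double]): (Int, Int) = {
    var tupledCalls = 0
    var nonZeroCalls = 0
    foreachTupled(values) { _ => tupledCalls += 1 }
    foreachNonZero(values) { (_, _) => nonZeroCalls += 1 }
    (tupledCalls, nonZeroCalls)
  }
}
```

On data with many explicit zeros, the second form skips both the tuple construction and the function call for those entries, which matches the direction of the timings reported above.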

You are right; Tuple2[Int, Double] is specialized, and I misinterpreted the bytecode.
For the following Scala code,

def foreach[@specialized(Unit) U](f: ((Int, Double)) => U) {
  var i = 0
  val localValuesSize = values.size
  val localValues = values
  while (i < localValuesSize) {
    f(i, localValues(i))
    i += 1
  }
}

the generated bytecode will be

public foreach(Lscala/Function1;)V
 L0
  LINENUMBER 296 L0
  ICONST_0
  ISTORE 2
 L1
  LINENUMBER 297 L1
  GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
  ALOAD 0
  INVOKEVIRTUAL org/apache/spark/mllib/linalg/DenseVector.values ()[D
  INVOKEVIRTUAL scala/Predef$.doubleArrayOps ([D)Lscala/collection/mutable/ArrayOps;
  INVOKEINTERFACE scala/collection/mutable/ArrayOps.size ()I
  ISTORE 3
 L2
  LINENUMBER 298 L2
  ALOAD 0
  INVOKEVIRTUAL org/apache/spark/mllib/linalg/DenseVector.values ()[D
  ASTORE 4
 L3
  LINENUMBER 299 L3
  FRAME APPEND [I I [D]
  ILOAD 2
  ILOAD 3
  IF_ICMPGE L4
 L5
  LINENUMBER 300 L5
  ALOAD 1
  NEW scala/Tuple2$mcID$sp
  DUP
  ILOAD 2
  ALOAD 4
  ILOAD 2
  DALOAD
  INVOKESPECIAL scala/Tuple2$mcID$sp.<init> (ID)V
  INVOKEINTERFACE scala/Function1.apply (Ljava/lang/Object;)Ljava/lang/Object;
  POP
 L6
  LINENUMBER 301 L6
  ILOAD 2
  ICONST_1
  IADD
  ISTORE 2
  GOTO L3

However,

  INVOKESPECIAL scala/Tuple2$mcID$sp.<init> (ID)V
  INVOKEINTERFACE scala/Function1.apply (Ljava/lang/Object;)Ljava/lang/Object;

is expensive, so that's why checking zero in the anonymous function will slow down the whole thing.

I agree with you: the iterator is slow by nature, and we are only interested in the foreach implementation. I'll remove the iterator and just keep the foreach method in the vector.

dbtsai · 2014-11-18T23:33:42Z

(PS: when I did the bytecode analysis, I found that accessing the
member variables values and values.size takes two operations.
Keeping a local copy of the reference turns each access into a single operation, which gives
another 8% performance gain. See
http://stackoverflow.com/questions/6602922/is-it-faster-to-access-final-local-variables-than-class-variables-in-java for details.)
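
A minimal sketch of the local-copy pattern described in this comment. FieldAccessSketch is a hypothetical class, and the sketch only shows the shape of the change; any speedup depends on the JVM and the JIT, so no timing is claimed here.

```scala
final class FieldAccessSketch(val values: Array[Double]) {
  // Goes through the `values` accessor on every iteration,
  // both for the bounds check and for the element read.
  def sumViaField: Double = {
    var total = 0.0
    var i = 0
    while (i < values.length) {
      total += values(i)
      i += 1
    }
    total
  }

  // Reads the accessor once and keeps the array reference and its length in locals.
  def sumViaLocal: Double = {
    val localValues = values
    val localSize = localValues.length
    var total = 0.0
    var i = 0
    while (i < localSize) {
      total += localValues(i)
      i += 1
    }
    total
  }
}
```

Both methods compute the same result; the second only changes how often the field is read.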

SparkQA · 2014-11-18T23:35:32Z

Test build #23568 has started for PR 3288 at commit d970498.

This patch does not merge cleanly.

SparkQA · 2014-11-19T01:20:40Z

Test build #23568 has finished for PR 3288 at commit d970498.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-19T01:20:44Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23568/


dbtsai · 2014-11-20T23:48:09Z

mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala

This will be faster than

sample.foreach(true){ case (index, value) => add(index, value) }

by 5%.

See the generated bytecode.

With pattern matching.

 L20
  LINENUMBER 103 L20
  ALOAD 4
  INVOKEVIRTUAL scala/Tuple2._1$mcI$sp ()I
  ISTORE 5
 L21
  ALOAD 4
  INVOKEVIRTUAL scala/Tuple2._2$mcD$sp ()D
  DSTORE 6
 L22
  ALOAD 0
  ILOAD 5
  DLOAD 6
  INVOKEVIRTUAL org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.org$apache$spark$mllib$stat$MultivariateOnlineSummarizer$$add (ID)V
  GETSTATIC scala/runtime/BoxedUnit.UNIT : Lscala/runtime/BoxedUnit;
  ASTORE 8

Without pattern matching.

 L17
  LINENUMBER 100 L17
  ALOAD 0
  ALOAD 3
  INVOKEVIRTUAL scala/Tuple2._1$mcI$sp ()I
  ALOAD 3
  INVOKEVIRTUAL scala/Tuple2._2$mcD$sp ()D
  INVOKEVIRTUAL org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.org$apache$spark$mllib$stat$MultivariateOnlineSummarizer$$add (ID)V
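
The two bytecode listings appear to correspond to source forms like the ones sketched below. This is an assumption based on the accessor calls in the listings, since the diff line under review is not quoted here. TupleAccessSketch and Counter are hypothetical names, and Seq.foreach stands in for the vector's own foreach.

```scala
object TupleAccessSketch {
  final class Counter(n: Int) {
    // Primitive array holding the partial sums.
    val sums: Array[Double] = new Array[Double](n)

    private def add(index: Int, value: Double): Unit = sums(index) += value

    // Pattern-matching form: the tuple fields are first bound to locals.
    def addWithPatternMatch(sample: Seq[(Int, Double)]): Unit =
      sample.foreach { case (index, value) => add(index, value) }

    // Accessor form: the tuple fields are read directly at the call site.
    def addWithAccessors(sample: Seq[(Int, Double)]): Unit =
      sample.foreach { x => add(x._1, x._2) }
  }
}
```

Both forms update the sums identically; they differ only in the bytecode emitted for the callback body.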

SparkQA · 2014-11-20T23:50:13Z

Test build #23694 has started for PR 3288 at commit 1907ae1.

This patch merges cleanly.

dbtsai · 2014-11-21T00:00:19Z

mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala

Calling foreach without parentheses

dv.foreach { case (index: Int, value: Double) => dvMap0.put(index, value) }

will cause

Error:(182, 16) missing parameter type for expanded function
The argument types of an anonymous function must be fully known. (SLS 8.5)
Expected type was: Boolean
    dv.foreach {
               ^

This is a Scala issue with curried functions that have default arguments. It seems that unless we change the signature to

private[spark] def foreach(skippingZeros: Boolean = false, f: ((Int, Double)) => Unit)

we need to call it with explicit parentheses when we want the default value of skippingZeros.
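
A small sketch of the calling convention this implies, assuming the curried signature with a default first argument. CurriedDefaultSketch and its Dense class are hypothetical; only the call shapes matter.

```scala
import scala.collection.mutable

object CurriedDefaultSketch {
  final class Dense(val values: Array[Double]) {
    // Curried signature: the default lives in the first parameter list.
    def foreach(skippingZeros: Boolean = false)(f: ((Int, Double)) => Unit): Unit = {
      val localValues = values
      val n = localValues.length
      var i = 0
      while (i < n) {
        val v = localValues(i)
        if (!skippingZeros || v != 0.0) f((i, v))
        i += 1
      }
    }
  }

  def collectWithDefault(dv: Dense): Map[Int, Double] = {
    val seen = mutable.Map.empty[Int, Double]
    // The empty parentheses select the default skippingZeros = false.
    // Writing dv.foreach { ... } here does not compile: the block is taken
    // as the Boolean argument of the first parameter list.
    dv.foreach() { case (index, value) => seen(index) = value }
    seen.toMap
  }

  def collectNonZeros(dv: Dense): Map[Int, Double] = {
    val seen = mutable.Map.empty[Int, Double]
    dv.foreach(skippingZeros = true) { case (index, value) => seen(index) = value }
    seen.toMap
  }
}
```

Dropping the flag altogether and exposing a single-parameter-list method avoids the empty parentheses at every call site.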

SparkQA · 2014-11-21T01:22:39Z

Test build #23694 has finished for PR 3288 at commit 1907ae1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-21T01:22:43Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23694/

mengxr · 2014-11-21T23:09:33Z

mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala

The following style is more common in the codebase:

sample.foreachActive { (index, value) => ... }

SparkQA · 2014-11-21T23:10:12Z

Test build #23731 has started for PR 3288 at commit 03dd693.

This patch merges cleanly.

mengxr · 2014-11-21T23:10:43Z

LGTM except minor inline comments. Thanks for improving the performance!

SparkQA · 2014-11-21T23:25:04Z

Test build #23733 has started for PR 3288 at commit 844b0e6.

This patch merges cleanly.

SparkQA · 2014-11-22T00:38:55Z

Test build #23731 has finished for PR 3288 at commit 03dd693.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-22T00:38:58Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23731/

SparkQA · 2014-11-22T00:51:56Z

Test build #23733 has finished for PR 3288 at commit 844b0e6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-22T00:52:00Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23733/

mengxr · 2014-11-22T02:15:51Z

Merged into master and branch-1.2. Thanks!


…parse vector

Previously, we were using Breeze's activeIterator to access the non-zero elements in dense/sparse vector. Due to the overhead, we switched back to native `while loop` in #SPARK-4129.

However, #SPARK-4129 requires de-reference the dv.values/sv.values in each access to the value, which is very expensive. Also, in MultivariateOnlineSummarizer, we're using Breeze's dense vector to store the partial stats, and this is very expensive compared with using primitive scala array.

In this PR, efficient foreachActive is implemented to unify the code path for dense and sparse vector operation which makes codebase easier to maintain. Breeze dense vector is replaced by primitive array to reduce the overhead further.

Benchmarking with mnist8m dataset on single JVM with first 200 samples loaded in memory, and repeating 5000 times.

Before change:
Sparse Vector - 30.02
Dense Vector - 38.27

With this PR:
Sparse Vector - 6.29
Dense Vector - 11.72

Author: DB Tsai <[email protected]>

Closes #3288 from dbtsai/activeIterator and squashes the following commits:

844b0e6 [DB Tsai] formating
03dd693 [DB Tsai] futher performance tunning.
1907ae1 [DB Tsai] address feedback
98448bb [DB Tsai] Made the override final, and had a local copy of variables which made the accessing a single step operation.
c0cbd5a [DB Tsai] fix a bug
6441f92 [DB Tsai] Finished SPARK-4431

(cherry picked from commit b5d17ef)
Signed-off-by: Xiangrui Meng <[email protected]>

mengxr reviewed Nov 17, 2014

DB Tsai added 4 commits November 20, 2014 14:51

Finished SPARK-4431

6441f92

fix a bug

c0cbd5a

Made the override final, and had a local copy of variables which made the accessing a single step operation.

98448bb

address feedback

1907ae1

dbtsai reviewed Nov 20, 2014

dbtsai reviewed Nov 21, 2014

futher performance tunning.

03dd693

dbtsai changed the title ~~[SPARK-4431][MLlib] Implement efficient activeIterator for dense and sparse vector~~ [SPARK-4431][MLlib] Implement efficient foreachActive for dense and sparse vector Nov 21, 2014

mengxr reviewed Nov 21, 2014

formating

844b0e6

dbtsai closed this Nov 24, 2014

dbtsai deleted the activeIterator branch November 25, 2014 00:02

dbtsai mentioned this pull request Dec 9, 2014

[SPARK-2309][MLlib] Generalize the binary logistic regression into multinomial logistic regression #1379

Merged
