Skip to content

Conversation

@dbtsai
Copy link
Member

@dbtsai dbtsai commented Nov 16, 2014

Previously, we were using Breeze's activeIterator to access the non-zero elements
in dense/sparse vector. Due to the overhead, we switched back to native while loop
in #SPARK-4129.

However, #SPARK-4129 requires de-reference the dv.values/sv.values in
each access to the value, which is very expensive. Also, in MultivariateOnlineSummarizer,
we're using Breeze's dense vector to store the partial stats, and this is very expensive compared
with using primitive scala array.

In this PR, efficient foreachActive is implemented to unify the code path for dense and sparse
vector operation which makes codebase easier to maintain. Breeze dense vector is replaced
by primitive array to reduce the overhead further.

Benchmarking with mnist8m dataset on single JVM
with first 200 samples loaded in memory, and repeating 5000 times.

Before change:
Sparse Vector - 30.02
Dense Vector - 38.27

With this PR:
Sparse Vector - 6.29
Dense Vector - 11.72

@SparkQA
Copy link

SparkQA commented Nov 16, 2014

Test build #23430 has started for PR 3288 at commit 101c2ea.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 16, 2014

Test build #23430 has finished for PR 3288 at commit 101c2ea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23430/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23462 has started for PR 3288 at commit 453b29f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23462 has finished for PR 3288 at commit 453b29f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23462/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need skippingZeros here because it is very easy to chain the iterator with a filter to achieve it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skippingZeros will be very useful in foreach operation, and if you use iterator -> filter -> foreach, it will not use the optimized foreach which is implemented by native while loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following code should have the same performance:

vec.foreach { (i, v) =>
  if (v != 0.0) {
    ...
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the following code,

sample.activeIterator(false).foreach {
  case (index, value) => if(value != 0.0) add(index, value)
}

It takes 61.809 for dense vector, and 54.626 for sparse vector.

The most expensive part is calling the anonymous function even when the values are zero.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, the issue is in the anonymous function. Basically, scala will convert primitive index: Int and value: Double into boxed object in order to have them in tuple. In my testing dataset, there are so many zeros explicitly, and even those values with zero have to be converted to tuple before we do the if statement. That's why it's dramatically faster if we do the if statement before calling the anonymous function.

Changing the signature of foreach into

def foreach[@specialized(Unit) U](f: (Int, Double) => U)

to take two primitive variables will solve this problem, but it will not comply the interface of foreach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tuple2[Int, Double] is specialized in Scala: https://github.com/scala/scala/blob/2.10.x/src/library/scala/Tuple2.scala, but the iterator interface still won't be high-performance for numerical computation. The iterator interface is not used in this PR, as we only need foreach. Then the best for us is defining foreach directly:

def foreach(f: (Int, Double) => Unit)

( We could implement ZippedTraversable2 but it doesn't seem to be necessary.)

or

def foreach(skipZeros = True)(f: (Int, Double) => Unit)

This is a private function. We can change it when we see more use cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right; the Tuple2[Int, Double] is specialized, and I mistakenly interpreted the bytecode.
For the flowing scala code,

def foreach[@specialized(Unit) U](f: ((Int, Double)) => U) {
  var i = 0
  val localValuesSize = values.size
  val localValues = values
  while (i < localValuesSize) {
    f(i, localValues(i))
    i += 1
  }
}

the generated bytecode will be

  public foreach(Lscala/Function1;)V
   L0
    LINENUMBER 296 L0
    ICONST_0
    ISTORE 2
   L1
    LINENUMBER 297 L1
    GETSTATIC scala/Predef$.MODULE$ : Lscala/Predef$;
    ALOAD 0
    INVOKEVIRTUAL org/apache/spark/mllib/linalg/DenseVector.values ()[D
    INVOKEVIRTUAL scala/Predef$.doubleArrayOps ([D)Lscala/collection/mutable/ArrayOps;
    INVOKEINTERFACE scala/collection/mutable/ArrayOps.size ()I
    ISTORE 3
   L2
    LINENUMBER 298 L2
    ALOAD 0
    INVOKEVIRTUAL org/apache/spark/mllib/linalg/DenseVector.values ()[D
    ASTORE 4
   L3
    LINENUMBER 299 L3
   FRAME APPEND [I I [D]
    ILOAD 2
    ILOAD 3
    IF_ICMPGE L4
   L5
    LINENUMBER 300 L5
    ALOAD 1
    NEW scala/Tuple2$mcID$sp
    DUP
    ILOAD 2
    ALOAD 4
    ILOAD 2
    DALOAD
    INVOKESPECIAL scala/Tuple2$mcID$sp.<init> (ID)V
    INVOKEINTERFACE scala/Function1.apply (Ljava/lang/Object;)Ljava/lang/Object;
    POP
   L6
    LINENUMBER 301 L6
    ILOAD 2
    ICONST_1
    IADD
    ISTORE 2
    GOTO L3

However,

    INVOKESPECIAL scala/Tuple2$mcID$sp.<init> (ID)V
    INVOKEINTERFACE scala/Function1.apply (Ljava/lang/Object;)Ljava/lang/Object;

is expensive, so that's why checking zero in the anonymous function will slow down the whole thing.

I agree with you, the iterator is slow by nature, and we are only interested in foreach implementation. I'll remove the iterator, and just have foreach method in vector.

@dbtsai
Copy link
Member Author

dbtsai commented Nov 18, 2014

(PS, when I did the bytecode analysis, I found that accessing the
member variables of values and values.size require two operation.
By having a local copy of reference to make it as single call, there is
another 8% performance gain. See
http://stackoverflow.com/questions/6602922/is-it-faster-to-access-final-local-variables-than-class-variables-in-java for detail)

@SparkQA
Copy link

SparkQA commented Nov 18, 2014

Test build #23568 has started for PR 3288 at commit d970498.

  • This patch does not merge cleanly.

@SparkQA
Copy link

SparkQA commented Nov 19, 2014

Test build #23568 has finished for PR 3288 at commit d970498.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23568/
Test PASSed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be faster than

sample.foreach(true){
  case (index, value) => add(index, value) 
}

for 5%.

See the generated bytecode.

With pattern matching.

   L20
    LINENUMBER 103 L20
    ALOAD 4
    INVOKEVIRTUAL scala/Tuple2._1$mcI$sp ()I
    ISTORE 5
   L21
    ALOAD 4
    INVOKEVIRTUAL scala/Tuple2._2$mcD$sp ()D
    DSTORE 6
   L22
    ALOAD 0
    ILOAD 5
    DLOAD 6
    INVOKEVIRTUAL org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.org$apache$spark$mllib$stat$MultivariateOnlineSummarizer$$add (ID)V
    GETSTATIC scala/runtime/BoxedUnit.UNIT : Lscala/runtime/BoxedUnit;
    ASTORE 8

Without pattern matching.

   L17
    LINENUMBER 100 L17
    ALOAD 0
    ALOAD 3
    INVOKEVIRTUAL scala/Tuple2._1$mcI$sp ()I
    ALOAD 3
    INVOKEVIRTUAL scala/Tuple2._2$mcD$sp ()D
    INVOKEVIRTUAL org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.org$apache$spark$mllib$stat$MultivariateOnlineSummarizer$$add (ID)V

@SparkQA
Copy link

SparkQA commented Nov 20, 2014

Test build #23694 has started for PR 3288 at commit 1907ae1.

  • This patch merges cleanly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling foreach without parenthesis

dv.foreach {
  case (index: Int, value: Double) => dvMap0.put(index, value)
}

will cause

Error:(182, 16) missing parameter type for expanded function
The argument types of an anonymous function must be fully known. (SLS 8.5)
Expected type was: Boolean
    dv.foreach {
               ^

This is scala curry function overloading issue. It seems that unless we change the signature to

private[spark] def foreach(skippingZeros: Boolean = false, f: ((Int, Double)) => Unit)

we need to explicitly call it with parenthesis when we want to call it with default value of skippingZeros.

@SparkQA
Copy link

SparkQA commented Nov 21, 2014

Test build #23694 has finished for PR 3288 at commit 1907ae1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23694/
Test PASSed.

@dbtsai dbtsai changed the title [SPARK-4431][MLlib] Implement efficient activeIterator for dense and sparse vector [SPARK-4431][MLlib] Implement efficient foreachActive for dense and sparse vector Nov 21, 2014
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following style is more common in the codebase:

sample.foreachActive { (index, value) =>
  ...
}

@SparkQA
Copy link

SparkQA commented Nov 21, 2014

Test build #23731 has started for PR 3288 at commit 03dd693.

  • This patch merges cleanly.

@mengxr
Copy link
Contributor

mengxr commented Nov 21, 2014

LGTM except minor inline comments. Thank for improving the performance!

@SparkQA
Copy link

SparkQA commented Nov 21, 2014

Test build #23733 has started for PR 3288 at commit 844b0e6.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 22, 2014

Test build #23731 has finished for PR 3288 at commit 03dd693.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23731/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Nov 22, 2014

Test build #23733 has finished for PR 3288 at commit 844b0e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23733/
Test PASSed.

@mengxr
Copy link
Contributor

mengxr commented Nov 22, 2014

Merged into master and branch-1.2. Thanks!

@dbtsai dbtsai closed this Nov 24, 2014
guavuslabs-builder pushed a commit to ThalesGroup/spark that referenced this pull request Nov 24, 2014
…parse vector

Previously, we were using Breeze's activeIterator to access the non-zero elements
in dense/sparse vector. Due to the overhead, we switched back to native `while loop`
in #SPARK-4129.

However, #SPARK-4129 requires de-reference the dv.values/sv.values in
each access to the value, which is very expensive. Also, in MultivariateOnlineSummarizer,
we're using Breeze's dense vector to store the partial stats, and this is very expensive compared
with using primitive scala array.

In this PR, efficient foreachActive is implemented to unify the code path for dense and sparse
vector operation which makes codebase easier to maintain. Breeze dense vector is replaced
by primitive array to reduce the overhead further.

Benchmarking with mnist8m dataset on single JVM
with first 200 samples loaded in memory, and repeating 5000 times.

Before change:
Sparse Vector - 30.02
Dense Vector - 38.27

With this PR:
Sparse Vector - 6.29
Dense Vector - 11.72

Author: DB Tsai <[email protected]>

Closes apache#3288 from dbtsai/activeIterator and squashes the following commits:

844b0e6 [DB Tsai] formating
03dd693 [DB Tsai] futher performance tunning.
1907ae1 [DB Tsai] address feedback
98448bb [DB Tsai] Made the override final, and had a local copy of variables which made the accessing a single step operation.
c0cbd5a [DB Tsai] fix a bug
6441f92 [DB Tsai] Finished SPARK-4431
asfgit pushed a commit that referenced this pull request Nov 24, 2014
…parse vector

Previously, we were using Breeze's activeIterator to access the non-zero elements
in dense/sparse vector. Due to the overhead, we switched back to native `while loop`
in #SPARK-4129.

However, #SPARK-4129 requires de-reference the dv.values/sv.values in
each access to the value, which is very expensive. Also, in MultivariateOnlineSummarizer,
we're using Breeze's dense vector to store the partial stats, and this is very expensive compared
with using primitive scala array.

In this PR, efficient foreachActive is implemented to unify the code path for dense and sparse
vector operation which makes codebase easier to maintain. Breeze dense vector is replaced
by primitive array to reduce the overhead further.

Benchmarking with mnist8m dataset on single JVM
with first 200 samples loaded in memory, and repeating 5000 times.

Before change:
Sparse Vector - 30.02
Dense Vector - 38.27

With this PR:
Sparse Vector - 6.29
Dense Vector - 11.72

Author: DB Tsai <[email protected]>

Closes #3288 from dbtsai/activeIterator and squashes the following commits:

844b0e6 [DB Tsai] formating
03dd693 [DB Tsai] futher performance tunning.
1907ae1 [DB Tsai] address feedback
98448bb [DB Tsai] Made the override final, and had a local copy of variables which made the accessing a single step operation.
c0cbd5a [DB Tsai] fix a bug
6441f92 [DB Tsai] Finished SPARK-4431

(cherry picked from commit b5d17ef)
Signed-off-by: Xiangrui Meng <[email protected]>
@dbtsai dbtsai deleted the activeIterator branch November 25, 2014 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants