Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
update
  • Loading branch information
DB Tsai committed Nov 25, 2014
commit fc795e4d7f8792ce4ad70b7ecb41531556669983
Original file line number Diff line number Diff line change
Expand Up @@ -96,14 +96,14 @@ class StandardScalerModel private[mllib] (
*/
override def transform(vector: Vector): Vector = {
require(mean.size == vector.size)
val localFactor = factor
if (withMean) {
val localShift = shift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth to leave a comment here and explain why we need local reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shift is only used in this branch. Shall we just put val shift = mean.toArray here instead of having a member variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'll change it back to lazy since it will not be evaluated in those branches which don't use shift. I don't want to create shift array/object for each sample since shift will always be the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shift only holds a reference to mean.values. We don't really need to define it as a member and make it lazy. It should give the same performance if we only define it inside the if branch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For different implementation of vector, toArray can be very expensive. For example, toArray for sparse vector requires to create a new array object and loop through all the non zero values. As a result, we can have a global lazy shift which can prevent this happens.

vector match {
case dv: DenseVector =>
val values = dv.values.clone()
var i = 0
if(withStd) {
val localFactor = factor
while (i < values.length) {
values(i) = (values(i) - localShift(i)) * localFactor(i)
i += 1
Expand All @@ -118,6 +118,7 @@ class StandardScalerModel private[mllib] (
case v => throw new IllegalArgumentException("Do not support vector type " + v.getClass)
}
} else if (withStd) {
val localFactor = factor
vector match {
case dv: DenseVector =>
val values = dv.values.clone()
Expand Down