Closed
21 commits
04677af
initial work on adding argmax to Vector and SparseVector
May 11, 2015
3cffed4
Adding unit tests for argmax functions for Dense and Sparse vectors
May 12, 2015
df9538a
Added argmax to sparse vector and added unit test
May 12, 2015
4526acc
Merge branch 'master' of github.com:apache/spark into SPARK-7422
May 13, 2015
eeda560
Fixing SparseVector argmax function to ignore zero values while doing…
May 15, 2015
af17981
Initial work fixing bug that was made clear in pr
dittmarg May 22, 2015
f21dcce
commit
GeorgeDittmar May 25, 2015
b1f059f
Added comment before we start arg max calculation. Updated unit tests…
GeorgeDittmar May 29, 2015
3ee8711
Fixing corner case issue with zeros in the active values of the spars…
GeorgeDittmar Jun 1, 2015
ee1a85a
Cleaning up unit tests a bit and modifying a few cases
GeorgeDittmar Jun 1, 2015
d5b5423
Fixing code style and updating if logic on when to check for zero values
GeorgeDittmar Jun 9, 2015
ac53c55
changing dense vector argmax unit test to be one line call vs 2
GeorgeDittmar Jun 9, 2015
aa330e3
Fixing some last if else spacing issues
GeorgeDittmar Jun 9, 2015
f2eba2f
Cleaning up unit tests to be fewer lines
GeorgeDittmar Jun 9, 2015
b22af46
Fixing spaces between commas in unit test
GeorgeDittmar Jun 10, 2015
42341fb
refactoring arg max check to better handle zero values
GeorgeDittmar Jul 9, 2015
5fd9380
fixing style check error
GeorgeDittmar Jul 9, 2015
98058f4
Merge branch 'master' of github.com:apache/spark into SPARK-7422
GeorgeDittmar Jul 15, 2015
2ea6a55
Added MimaExcludes for Vectors.argmax
GeorgeDittmar Jul 15, 2015
127dec5
update argmax impl
mengxr Jul 17, 2015
3e0a939
Merge pull request #1 from mengxr/SPARK-7422
GeorgeDittmar Jul 18, 2015
Fixing corner case issue with zeros in the active values of the sparse vector. Updated unit tests
GeorgeDittmar committed Jun 1, 2015
commit 3ee87118472ae68c18444612ffbaba9cd4eaa080
14 changes: 9 additions & 5 deletions mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala
@@ -725,7 +725,6 @@ class SparseVector(
-1
} else {

Contributor

Remove blank line.

//grab first active index and value by default
var maxIdx = indices(0)
var maxValue = values(0)

@@ -736,9 +735,14 @@ class SparseVector(
}
}

// look for inactive values incase all active node values are negative
// look for inactive values in case all active node values are negative
if(size != values.size && maxValue <= 0){
Contributor

Sorry to be that guy, but it looks like this would also violate our current convention that the first index of the maximum should be returned: if maxValue is zero and the active index holding that zero is smaller than every inactive index, something like

val a = new SparseVector(3, Array(0), Array(0.0))

It seems that argmax would return 1 in this case (correct me if I'm wrong).

Contributor Author

So are you thinking of the case where we have an inactive value that's set to something like 1? I don't think the API allows you to do that. My understanding of this case is that we will return idx=0 if 0 is the only max value found. It's technically correct since that active zero happens at the very beginning of the vector. I don't think we skip it just because someone decided to create a sparse vector with an active zero value. I am pretty sure I cover this case in my unit tests, but I'll go back to the code real quick to double check.

Also, no worries. Better to find bugs than not, right? lol.

Contributor

I had meant something like this.

val a = new SparseVector(3, Array(0, 1), Array(0.0, -1.0))

Up to this block of code, maxIdx would be 0 and maxValue would be 0, and since the condition size != values.size && maxValue <= 0 holds, it would return the first inactive index, i.e. 2. However, we want 0 to be returned.
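
To make the convention under discussion concrete, here is a minimal sketch that models it without Spark: expand the sparse triple (size, indices, values) into a dense array and take the first index of the maximum. The object name `ArgmaxConvention` and its helpers are hypothetical and not part of this PR; returning -1 for an empty vector is an assumption based on the `-1` branch visible in the diff.

```scala
object ArgmaxConvention {
  // Expand (size, indices, values) into a dense array; inactive entries are 0.0.
  def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
    val dense = Array.fill(size)(0.0)
    var k = 0
    while (k < indices.length) {
      dense(indices(k)) = values(k)
      k += 1
    }
    dense
  }

  // Reference argmax: first index holding the maximum, or -1 for an empty vector
  // (the -1 convention is an assumption taken from the diff).
  def referenceArgmax(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    if (size == 0) {
      -1
    } else {
      val dense = toDense(size, indices, values)
      dense.indexOf(dense.max)
    }
  }
}
```

Under this model, `referenceArgmax(3, Array(0, 1), Array(0.0, -1.0))` evaluates to 0, which is the answer the sparse implementation is expected to match.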

Contributor Author

Gotcha. Think I have this case handled as well now. Will push it up soon if it passes my unit tests.

Contributor Author

It seems odd to allow adding active nodes with value 0 if they should really be inactive. Also, if we call SparseVector.toSparseVector, it looks like it filters out the zeros to begin with, so it might make sense to do this more formally at object creation time. @mengxr thoughts?

Contributor

Found another corner case: a 0 value defined in the active set of values at the very end of the vector.

Wouldn't this logic deal with all such cases?

if(size != values.size && maxValue <= 0) {
        // calculate first inactive value
        if (maxValue == 0) {
            if (firstInactiveValue > maxIdx) maxIdx else firstInactiveValue
        }
        else {
            firstInactiveValue
        }
}
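
As a rough, self-contained sketch of how the suggested tie-break could sit inside a full argmax (not the code that was merged): the object `SparseArgmaxSketch`, its function signatures, and the handling of an empty `values` array are assumptions made for illustration. It also assumes `indices` is strictly increasing, and it keeps the simple `contains`-based lookup for the first inactive index for brevity.

```scala
object SparseArgmaxSketch {
  // First index in [0, size) that is not active, or -1 if every index is active.
  // Quadratic in the worst case; kept simple on purpose.
  def firstInactiveIdx(size: Int, indices: Array[Int]): Int =
    (0 until size).find(i => !indices.contains(i)).getOrElse(-1)

  def argmax(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    if (size == 0) {
      -1
    } else if (values.isEmpty) {
      0 // every entry is an implicit zero, so the first index wins
    } else {
      // First maximum among the active values (strict > keeps the earliest one).
      var maxIdx = indices(0)
      var maxValue = values(0)
      var k = 1
      while (k < values.length) {
        if (values(k) > maxValue) {
          maxValue = values(k)
          maxIdx = indices(k)
        }
        k += 1
      }
      // An implicit zero can only win or tie when the active maximum is <= 0.
      if (size != values.length && maxValue <= 0.0) {
        val inactive = firstInactiveIdx(size, indices)
        if (maxValue == 0.0) math.min(maxIdx, inactive) else inactive
      } else {
        maxIdx
      }
    }
  }
}
```

With this sketch, the vector from the earlier comment, `(3, Array(0, 1), Array(0.0, -1.0))`, yields 0, and a trailing active zero such as `(11, Array(0, 3, 10), Array(-1.0, -0.7, 0.0))` yields 1.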

Contributor Author

Yeah, I had a check for that in there but forgot I had reverted my code a bit. Doh! Realized it wasn't a new corner case anyway, just the same one you saw earlier.

Contributor

btw, scipy also allows zero values to be stored in the active nodes. Doing such a check at creation time might be expensive when the values array is large.

Contributor Author

ah good to know.

Contributor

Space after if

maxIdx = calcInactiveIdx(0)
val firstInactiveIdx = calcFirstInactiveIdx(0)
if(maxValue == 0){
Contributor

space after if; here and elsewhere

if(firstInactiveIdx >= maxIdx) maxIdx else maxIdx = firstInactiveIdx
Contributor

I think we can do away with the else here.

if (!(maxValue == 0 && firstInactiveIdx >= maxIdx)) {
  maxIdx = firstInactiveIdx
}

so that the only time maxIdx does not change is when maxValue equals zero and firstInactiveIdx is greater than or equal to maxIdx.
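
A quick way to convince oneself that the flattened condition behaves like the nested if/else in the diff is to write both as pure functions and compare them. This is only an illustrative sketch; the object `TieBreakEquivalence` and the function names `nested` and `flattened` are hypothetical.

```scala
object TieBreakEquivalence {
  // Nested form from the diff: keep maxIdx only on a zero tie where it comes first.
  def nested(maxValue: Double, maxIdx: Int, firstInactiveIdx: Int): Int =
    if (maxValue == 0.0) {
      if (firstInactiveIdx >= maxIdx) maxIdx else firstInactiveIdx
    } else {
      firstInactiveIdx
    }

  // Flattened form suggested in the review: a single negated condition.
  def flattened(maxValue: Double, maxIdx: Int, firstInactiveIdx: Int): Int =
    if (!(maxValue == 0.0 && firstInactiveIdx >= maxIdx)) firstInactiveIdx else maxIdx
}
```

Both functions are only meaningful inside the branch where maxValue <= 0 and at least one inactive index exists, which is the situation the surrounding code guards for.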

}else{
maxIdx = firstInactiveIdx
}
maxValue = 0
Contributor

I think this line can be removed, since only maxIdx is used.

Contributor Author

I mostly kept it in there for clarity, in case anyone is debugging through the code and wants to more easily understand what the associated idx and val are. But I can remove it if it's just too much clutter.

Contributor

not needed

}
maxIdx
@@ -751,12 +755,12 @@ class SparseVector(
* @param idx starting index of computation
* @return index of first inactive node
*/
private[SparseVector] def calcInactiveIdx(idx: Int): Int = {
private[SparseVector] def calcFirstInactiveIdx(idx: Int): Int = {
if (idx < size) {
if (!indices.contains(idx)) {
Contributor

contains is too expensive. See my previous comment.

idx
} else {
calcInactiveIdx(idx + 1)
calcFirstInactiveIdx(idx + 1)
}
} else {
-1
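
On the cost of `contains`: each call scans the whole `indices` array, so the recursive search above can take time proportional to size × indices.length. One possible alternative, sketched here under the assumption that `indices` is strictly increasing with every entry in [0, size), is a single forward scan. The object name `FirstInactive` is hypothetical and this is not necessarily the approach the PR ended up with.

```scala
object FirstInactive {
  // Returns the smallest index in [0, size) that is not in `indices`,
  // or -1 if every index is active.
  // Assumes `indices` is strictly increasing and within [0, size).
  def firstInactiveIdx(size: Int, indices: Array[Int]): Int = {
    var k = 0
    // While the k-th active index equals k, positions 0..k are all active.
    while (k < indices.length && indices(k) == k) {
      k += 1
    }
    // Either indices(k) > k (so k is a gap) or the active prefix ended at k.
    if (k < size) k else -1
  }
}
```

The scan stops at the first gap, so it visits at most indices.length entries and needs no recursion.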
@@ -95,19 +95,24 @@ class VectorsSuite extends FunSuite {
val max2 = vec3.argmax
assert(max2 === 2)

// check for case that sparse vector is created with only negative vaues {0.0, 0.0,-1.0, -0.7, 0.0}
val vec4 = Vectors.sparse(5,Array(2, 3),Array(-1.0,-.7))
// check for case that sparse vector is created with only negative values {0.0, 0.0,-1.0, -0.7, 0.0}
val vec4 = Vectors.sparse(5,Array(0, 1, 2, 3),Array(0.0, 0.0, -1.0,-.7))
val max3 = vec4.argmax
assert(max3 === 0)

// check for case that sparse vector is created with only negative vaues {-1.0, 0.0, -0.7, 0.0, 0.0}
val vec5 = Vectors.sparse(5,Array(0, 3),Array(-1.0,-.7))
val vec5 = Vectors.sparse(11,Array(0, 3, 10),Array(-1.0,-.7,0.0))
val max4 = vec5.argmax
assert(max4 === 1)

val vec6 = Vectors.sparse(2,Array(0, 1),Array(-1.0, 0.0))
val vec6 = Vectors.sparse(5,Array(0, 1, 3),Array(-1.0, 0.0, -.7))
val max5 = vec6.argmax
assert(max5 === 1)

// test that converting the sparse vector to another sparse vector then calling argmax still works right
var vec8 = Vectors.sparse(5,Array(0, 1),Array(0.0, -1.0))
vec8 = vec8.toSparse
val max7 = vec8.argmax
assert(max7 === 0)
}

test("vector equals") {