-
Notifications
You must be signed in to change notification settings - Fork 29k
[Spark-7422][MLLIB] Add argmax to Vector, SparseVector #6112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 3 commits
Commits
Show all changes
21 commits
Select commit
Hold shift + click to select a range
04677af
initial work on adding argmax to Vector and SparseVector
3cffed4
Adding unit tests for argmax functions for Dense and Sparse vectors
df9538a
Added argmax to sparse vector and added unit test
4526acc
Merge branch 'master' of github.com:apache/spark into SPARK-7422
eeda560
Fixing SparseVector argmax function to ignore zero values while doing…
af17981
Initial work fixing bug that was made clear in pr
dittmarg f21dcce
commit
GeorgeDittmar b1f059f
Added comment before we start arg max calculation. Updated unit tests…
GeorgeDittmar 3ee8711
Fixing corner case issue with zeros in the active values of the spars…
GeorgeDittmar ee1a85a
Cleaning up unit tests a bit and modifying a few cases
GeorgeDittmar d5b5423
Fixing code style and updating if logic on when to check for zero values
GeorgeDittmar ac53c55
changing dense vector argmax unit test to be one line call vs 2
GeorgeDittmar aa330e3
Fixing some last if else spacing issues
GeorgeDittmar f2eba2f
Cleaning up unit tests to be fewer lines
GeorgeDittmar b22af46
Fixing spaces between commas in unit test
GeorgeDittmar 42341fb
refactoring arg max check to better handle zero values
GeorgeDittmar 5fd9380
fixing style check error
GeorgeDittmar 98058f4
Merge branch 'master' of github.com:apache/spark into SPARK-7422
GeorgeDittmar 2ea6a55
Added MimaExcludes for Vectors.argmax
GeorgeDittmar 127dec5
update argmax impl
mengxr 3e0a939
Merge pull request #1 from mengxr/SPARK-7422
GeorgeDittmar File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -63,11 +63,35 @@ class VectorsSuite extends FunSuite { | |
| assert(vec.toArray.eq(arr)) | ||
| } | ||
|
|
||
| test("dense argmax"){ | ||
| val vec = Vectors.dense(Array.empty[Double]).asInstanceOf[DenseVector] | ||
| val noMax = vec.argmax | ||
| assert(noMax === -1) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nitpick: I think this can be combined into a single line, here and elsewhere |
||
|
|
||
| val vec2 = Vectors.dense(arr).asInstanceOf[DenseVector] | ||
| val max = vec2.argmax | ||
| assert(max === 3) | ||
| } | ||
|
|
||
| test("sparse to array") { | ||
| val vec = Vectors.sparse(n, indices, values).asInstanceOf[SparseVector] | ||
| assert(vec.toArray === arr) | ||
| } | ||
|
|
||
| test("sparse argmax"){ | ||
| val vec = Vectors.sparse(0,Array.empty[Int],Array.empty[Double]).asInstanceOf[SparseVector] | ||
| val noMax = vec.argmax | ||
| assert(noMax === -1) | ||
|
|
||
| val vec2 = Vectors.sparse(n,indices,values).asInstanceOf[SparseVector] | ||
| val max = vec2.argmax | ||
| assert(max === 3) | ||
|
|
||
| val vec3 = Vectors.sparse(5,Array(1,3,4),Array(1.0,.5,.7)) | ||
| val max2 = vec3.argmax | ||
| assert(max2 === 1) | ||
| } | ||
|
|
||
| test("vector equals") { | ||
| val dv1 = Vectors.dense(arr.clone()) | ||
| val dv2 = Vectors.dense(arr.clone()) | ||
|
|
||
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct if all nonzeros are negative and there exist inactive (zero) entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"there exist inactive (zero) entries" are you meaning we have values in the sparse vector that are zero but are set to not active? I thought that the sparse vector had you define what indices have active elements and that indices.size must equal the size of the values in that vector so I am not sure how you get into the state I think you are describing. Do you have an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait are you getting at say I have a sparse vector with zero in it and an assigned index? Like indices = (1,2,3) values = (-1,0,-.5)? If thats the case then yes I now see what you are saying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bug still exists. Take a sparse vector
val sv = SparseVector(5, Array(1, 3), Array(-1, -5))for example,sv.argmaxshould return one from{0, 2, 4}because0is the max value in this vector.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added unit tests to cover this and they appear to be passing so maybe I am totally misunderstanding the bug you are discussing? Or more likely I am misunderstanding the usage of the sparse vector implementation in regards to the inactive vs active elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't explain this clearly. All inactive values in a sparse vector are zeros. The edge case here is that zero could be the max value of the entries. For example,
[-1.0, 0.0, -3.0].argmax == 1but0.0doesn't appear inforeachActiveif the sparse vector isSparseVector(3, Array(0, 2), Array(-1.0, -3.0)). If we only look at the active values, the argmax would be 0 as-1.0is the max among active values. So we need to cover the following cases if all active values are negative:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah it sounds like I misunderstood the inactive vs active node concept. I'll get a fix in for that. I was assuming inactive meant we wanted to just ignore it and do the argmax over the active portions.