-
Notifications
You must be signed in to change notification settings - Fork 29k
[Spark-7422][MLLIB] Add argmax to Vector, SparseVector #6112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
04677af
3cffed4
df9538a
4526acc
eeda560
af17981
f21dcce
b1f059f
3ee8711
ee1a85a
d5b5423
ac53c55
aa330e3
f2eba2f
b22af46
42341fb
5fd9380
98058f4
2ea6a55
127dec5
3e0a939
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
…e vector. Updated unit tests
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -725,7 +725,6 @@ class SparseVector( | |
| -1 | ||
| } else { | ||
|
|
||
| //grab first active index and value by default | ||
| var maxIdx = indices(0) | ||
| var maxValue = values(0) | ||
|
|
||
|
|
@@ -736,9 +735,14 @@ class SparseVector( | |
| } | ||
| } | ||
|
|
||
| // look for inactive values incase all active node values are negative | ||
| // look for inactive values in case all active node values are negative | ||
| if(size != values.size && maxValue <= 0){ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry to be that guy, but it looks like this would also fail our current convention that the first idx should be returned, if maxValues is zero and if the activeIndex that has a value zero is lesser than all inactive indices, something like. It seems that argmax would return 1 in this case. (correct me if I'm wrong)
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So are you thinking of the case where we have an inactive value thats set to something like 1? I dont think the api allows you to do that. My understanding of this case is that we will return idx=0 if 0 is the only max value found. Its technically correct since that active zero happens at the very beginning of the vector. I dont think we skip it due to the fact that someone decided to create a sparse vector with an active zero value. I am pretty sure i cover this case in my unit tests but I'll go back to the code real quick to double check. Also no worries. Better to find bugs than not right? lol.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had meant something like this. Till this block of code, the maxIdx would be 0 and maxValue would be 0 and since the condition
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha. Think I have this case handled as well now. Will push it up soon if it passes my unit tests.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems odd to allow addition of active nodes with value 0 if they should really be inactive. As well if we call SparseVector.toSparseVector it looks like it filters out the zeros to begin with so might make sense to do this more formally at object creation time. @mengxr thoughts?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Wouldn't this logic deal with all such cases?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah I had a check for that in there but forgot I had reverted my code a bit. doh! realized it wasnt a new corner case anyways just the same one you saw earlier.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. btw, scipy also allows zero values to be stored in the active nodes. Doing such a check might be expensive when values are large.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah good to know.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Space after if |
||
| maxIdx = calcInactiveIdx(0) | ||
| val firstInactiveIdx = calcFirstInactiveIdx(0) | ||
| if(maxValue == 0){ | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. space after if; here and elsewhere |
||
| if(firstInactiveIdx >= maxIdx) maxIdx else maxIdx = firstInactiveIdx | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can do away with the else here. so that the only time when maxIdx does not change is when maxValue equals zero, and if firstInactiveIdx is greater than maxIdx |
||
| }else{ | ||
| maxIdx = firstInactiveIdx | ||
| } | ||
| maxValue = 0 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this line can be removed, since only maxIdx is used.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I more kept it in there for clarity incase anyone is debugging through the code and can more easily understand what the associated idx and val are. But i can remove if its just too much clutter.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not needed |
||
| } | ||
| maxIdx | ||
|
|
@@ -751,12 +755,12 @@ class SparseVector( | |
| * @param idx starting index of computation | ||
| * @return index of first inactive node | ||
| */ | ||
| private[SparseVector] def calcInactiveIdx(idx: Int): Int = { | ||
| private[SparseVector] def calcFirstInactiveIdx(idx: Int): Int = { | ||
| if (idx < size) { | ||
| if (!indices.contains(idx)) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| idx | ||
| } else { | ||
| calcInactiveIdx(idx + 1) | ||
| calcFirstInactiveIdx(idx + 1) | ||
| } | ||
| } else { | ||
| -1 | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove blank line.