-
Notifications
You must be signed in to change notification settings - Fork 29k
[Spark-7422][MLLIB] Add argmax to Vector, SparseVector #6112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 15 commits
04677af
3cffed4
df9538a
4526acc
eeda560
af17981
f21dcce
b1f059f
3ee8711
ee1a85a
d5b5423
ac53c55
aa330e3
f2eba2f
b22af46
42341fb
5fd9380
98058f4
2ea6a55
127dec5
3e0a939
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -150,6 +150,12 @@ sealed trait Vector extends Serializable { | |
| toDense | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Find the index of a maximal element. Returns the first maximal element in case of a tie. | ||
| * Returns -1 if vector has length 0. | ||
| */ | ||
| def argmax: Int | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -588,11 +594,7 @@ class DenseVector(val values: Array[Double]) extends Vector { | |
| new SparseVector(size, ii, vv) | ||
| } | ||
|
|
||
| /** | ||
| * Find the index of a maximal element. Returns the first maximal element in case of a tie. | ||
| * Returns -1 if vector has length 0. | ||
| */ | ||
| private[spark] def argmax: Int = { | ||
| override def argmax: Int = { | ||
| if (size == 0) { | ||
| -1 | ||
| } else { | ||
|
|
@@ -717,6 +719,51 @@ class SparseVector( | |
| new SparseVector(size, ii, vv) | ||
| } | ||
| } | ||
|
|
||
| override def argmax: Int = { | ||
| if (size == 0) { | ||
| -1 | ||
| } else { | ||
|
|
||
| var maxIdx = indices(0) | ||
| var maxValue = values(0) | ||
|
|
||
| foreachActive { (i, v) => | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not correct if all nonzeros are negative and there exist inactive (zero) entries.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "there exist inactive (zero) entries" are you meaning we have values in the sparse vector that are zero but are set to not active? I thought that the sparse vector had you define what indices have active elements and that indices.size must equal the size of the values in that vector so I am not sure how you get into the state I think you are describing. Do you have an example?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait are you getting at say I have a sparse vector with zero in it and an assigned index? Like indices = (1,2,3) values = (-1,0,-.5)? If thats the case then yes I now see what you are saying.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The bug still exists. Take a sparse vector
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added unit tests to cover this and they appear to be passing so maybe I am totally misunderstanding the bug you are discussing? Or more likely I am misunderstanding the usage of the sparse vector implementation in regards to the inactive vs active elements?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, I didn't explain this clearly. All inactive values in a sparse vector are zeros. The edge case here is that zero could be the max value of the entries. For example,
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok yeah it sounds like I misunderstood the inactive vs active node concept. I'll get a fix in for that. I was assuming inactive meant we wanted to just ignore it and do the argmax over the active portions. |
||
| if (v > maxValue) { | ||
| maxIdx = i | ||
| maxValue = v | ||
| } | ||
| } | ||
|
|
||
| // look for inactive values in case all active node values are negative | ||
| if (size != values.size && maxValue <= 0) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can implement this way: var k = 0
while (k < indices.length && indices(k) == k) {
k += 1
}
maxIdx = if (maxValue == 0.0 && k > maxIdx) maxIdx else k |
||
| val firstInactiveIdx = calcFirstInactiveIdx(0) | ||
| if (!(maxValue == 0 && firstInactiveIdx >= maxIdx)) { | ||
| maxIdx = firstInactiveIdx | ||
| } | ||
| maxValue = 0 | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this line can be removed, since only maxIdx is used.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I more kept it in there for clarity incase anyone is debugging through the code and can more easily understand what the associated idx and val are. But i can remove if its just too much clutter.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not needed |
||
| } | ||
| maxIdx | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Calculates the first instance of an inactive node in a sparse vector and returns the Idx | ||
| * of the element. | ||
| * @param idx starting index of computation | ||
| * @return index of first inactive node | ||
| */ | ||
| private[SparseVector] def calcFirstInactiveIdx(idx: Int): Int = { | ||
| if (idx < size) { | ||
| if (!indices.contains(idx)) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
| idx | ||
| } else { | ||
| calcFirstInactiveIdx(idx + 1) | ||
| } | ||
| } else { | ||
| -1 | ||
| } | ||
| } | ||
| } | ||
|
|
||
| object SparseVector { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove blank line.