Closed
@@ -52,7 +52,7 @@ sealed trait Vector extends Serializable {

   override def equals(other: Any): Boolean = {
     other match {
-      case v2: Vector => {
+      case v2: Vector =>
         if (this.size != v2.size) return false
         (this, v2) match {
           case (s1: SparseVector, s2: SparseVector) =>
@@ -63,20 +63,25 @@ sealed trait Vector extends Serializable {
             Vectors.equals(0 until d1.size, d1.values, s1.indices, s1.values)
           case (_, _) => util.Arrays.equals(this.toArray, v2.toArray)
         }
-      }
       case _ => false
     }
   }

   override def hashCode(): Int = {
Member

Why is Vector.hashCode overridden even though it will never be called (SparseVector and DenseVector both override it)?

Contributor Author

This is just a reference implementation for subclasses to optimize.

     var result: Int = size + 31
+    var i = 0
     this.foreachActive { case (index, value) =>
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't mean to have a case here, do you? It should be just a closure, not a partial function.
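To make the reviewer's point concrete: foreachActive takes a two-argument function, and the Scala language specification defines a `{ case (a, b) => ... }` literal passed where a Function2 is expected as a closure that packs its arguments into a tuple and pattern-matches on it. The minimal sketch below contrasts the two forms using a hypothetical stand-in `foreachActive` over a plain array (not the MLlib method); both give the same result, and the plain closure skips the tuple-and-match step.

```scala
object ClosureStyles {
  // Hypothetical stand-in for Vector.foreachActive: applies f to every (index, value) pair.
  def foreachActive(values: Array[Double])(f: (Int, Double) => Unit): Unit = {
    var i = 0
    while (i < values.length) {
      f(i, values(i))
      i += 1
    }
  }

  // Pattern-matching literal: per the language spec, the two arguments are
  // packed into a tuple that is then matched against `case (index, value)`.
  def weightedSumWithCase(values: Array[Double]): Double = {
    var sum = 0.0
    foreachActive(values) { case (index, value) => sum += index * value }
    sum
  }

  // Plain two-argument closure: arguments are bound directly, no tuple or match.
  def weightedSumWithClosure(values: Array[Double]): Double = {
    var sum = 0.0
    foreachActive(values) { (index, value) => sum += index * value }
    sum
  }
}
```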

       // ignore explicit zeros for comparison between sparse and dense
       if (value != 0) {
         result = 31 * result + index
         // refer to {@link java.util.Arrays.hashCode} for hash algorithm
         val bits = java.lang.Double.doubleToLongBits(value)
         result = 31 * result + (bits ^ (bits >>> 32)).toInt
+        i += 1
+        // only scan the first 16 nonzeros
+        if (i >= 16) {
+          return result
Contributor

Note that return inside a closure is quite expensive, because it is implemented by throwing an exception (which requires traversing the current stack). This might be much slower than before. You probably want to pass the maximum number of elements to visit into foreachActive.
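The reviewer's suggestion can be sketched roughly as follows, assuming a hypothetical `foreachActiveLimited` helper over a sparse layout's `indices`/`values` arrays (MLlib does not provide this method). The limit is only a loop bound, so a caller that wants to stop early never has to `return` out of the closure.

```scala
object LimitedTraversal {
  // Hypothetical foreachActive variant: visits at most `maxActive` stored (index, value)
  // pairs, so early exit is a loop bound rather than a non-local `return` from `f`.
  def foreachActiveLimited(indices: Array[Int], values: Array[Double], maxActive: Int)(
      f: (Int, Double) => Unit): Unit = {
    val end = math.min(maxActive, indices.length)
    var k = 0
    while (k < end) {
      f(indices(k), values(k))
      k += 1
    }
  }

  // Example caller (hypothetical): sums the first `n` stored values.
  def sumFirst(indices: Array[Int], values: Array[Double], n: Int): Double = {
    var sum = 0.0
    foreachActiveLimited(indices, values, n) { (_, v) => sum += v }
    sum
  }
}
```

One design caveat: this limit counts stored entries, explicit zeros included, so a hash that skips zeros and wants exactly N nonzeros would still need its own counter.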

Contributor Author

foreachActive is usually on the critical path. This hashCode implementation only matters when a vector is used as a map key, as in the Pyrolite SerDe. I added specialized versions for DenseVector and SparseVector.
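The specialized DenseVector and SparseVector versions mentioned in the reply are not part of this diff. A minimal sketch of what such specializations could look like, written as standalone functions over raw arrays: `SpecializedHash`, `denseHash`, `sparseHash` and `MaxHashNnz` are hypothetical names, and the cap of 16 mirrors the "only scan the first 16 nonzeros" comment in the diff. Both functions use the same mixing step as the reference implementation and skip explicit zeros, so a dense and a sparse vector holding the same values hash the same. The early exit is a `while` condition, not a `return` inside a closure. The sparse version assumes indices are sorted ascending, so entries are mixed in the same order as in the dense scan.

```scala
object SpecializedHash {
  // Hypothetical cap on how many nonzeros contribute to the hash.
  val MaxHashNnz = 16

  // Folds one nonzero entry into the running hash, as in the reference hashCode:
  // first the index, then the Arrays.hashCode-style fold of the double's bits.
  private def mix(result: Int, index: Int, value: Double): Int = {
    val bits = java.lang.Double.doubleToLongBits(value)
    31 * (31 * result + index) + (bits ^ (bits >>> 32)).toInt
  }

  // Dense layout: scan positions in order, skipping explicit zeros.
  def denseHash(values: Array[Double]): Int = {
    var result = values.length + 31
    var i = 0
    var nnz = 0
    while (i < values.length && nnz < MaxHashNnz) {
      val v = values(i)
      if (v != 0) {
        result = mix(result, i, v)
        nnz += 1
      }
      i += 1
    }
    result
  }

  // Sparse layout (indices assumed sorted ascending): scan stored entries,
  // skipping explicit zeros so the result matches denseHash for equal vectors.
  def sparseHash(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    var result = size + 31
    var k = 0
    var nnz = 0
    while (k < indices.length && nnz < MaxHashNnz) {
      val v = values(k)
      if (v != 0) {
        result = mix(result, indices(k), v)
        nnz += 1
      }
      k += 1
    }
    result
  }
}
```

Keeping the skip-zeros rule and the cap identical in both layouts is what keeps hashCode consistent with equals across the two representations.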

+        }
       }
     }
     result