Skip to content

Conversation

sethah
Copy link
Contributor

@sethah sethah commented Oct 25, 2016

What changes were proposed in this pull request?

This patch adds a compressed method to ML Matrix class, which returns the minimal storage representation of the matrix - either sparse or dense. Because the space occupied by a sparse matrix is dependent upon its layout (i.e. column major or row major), this method must consider both cases. It may also be useful to force the layout to be column or row major beforehand, so an overload is added which takes in a columnMajor: Boolean parameter.

The compressed implementation relies upon two new abstract methods toDense(columnMajor: Boolean) and toSparse(columnMajor: Boolean), similar to the compressed method implemented in the Vector class. These methods also allow the layout of the resulting matrix to be specified via the columnMajor parameter. More detail on the new methods is given below.

How was this patch tested?

Added many new unit tests

New methods (summary, not exhaustive list)

Matrix trait

  • private[ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix (abstract) - converts the matrix (either sparse or dense) to dense format
  • private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix (abstract) - converts the matrix (either sparse or dense) to sparse format
  • def toDense: DenseMatrix = toDense(true) - converts the matrix (either sparse or dense) to dense format in column major layout
  • def toSparse: SparseMatrix = toSparse(true) - converts the matrix (either sparse or dense) to sparse format in column major layout
  • def compressed: Matrix - finds the minimum space representation of this matrix, considering both column and row major layouts, and converts it
  • def compressed(columnMajor: Boolean): Matrix - finds the minimum space representation of this matrix considering only column OR row major, and converts it

DenseMatrix class

  • private[ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix - converts the dense matrix to a dense matrix, optionally changing the layout (data is NOT duplicated if the layouts are the same)
  • private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix - converts the dense matrix to sparse matrix, using the specified layout

SparseMatrix class

  • private[ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix - converts the sparse matrix to a dense matrix, using the specified layout
  • private[ml] def toSparseMatrix(columnMajors: Boolean): SparseMatrix - converts the sparse matrix to sparse matrix. If the sparse matrix contains any explicit zeros, they are removed. If the layout requested does not match the current layout, data is copied to a new representation. If the layouts match and no explicit zeros exist, the current matrix is returned.

@sethah
Copy link
Contributor Author

sethah commented Oct 25, 2016

cc @dbtsai @yanboliang

Note for reviewers: The driving need behind this change is that we now have multiclass logistic regression which represents coefficients as a matrix. When we use L1 regularization it can be more efficient to store the coefficient matrix as sparse since there can be a lot of zero elements. We will incorporate this compression in a follow-up PR

@SparkQA
Copy link

SparkQA commented Oct 25, 2016

Test build #67527 has finished for PR 15628 at commit a51e217.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 25, 2016

Test build #67528 has finished for PR 15628 at commit 5600d14.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Oct 25, 2016

I'm not sure I understand why there are MiMa failures. It should not be a problem that the new methods only exist in the current version AFAIK.

* @param columnMajor Whether the values of the resulting dense matrix should be in column major
* or row major order. If `false`, resulting matrix will be row major.
*/
private [ml] def toDenseMatrix(columnMajor: Boolean): DenseMatrix
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: space

* @param columnMajor Whether the values of the resulting sparse matrix should be in column major
* or row major order. If `false`, resulting matrix will be row major.
*/
private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of explanation: I made this a private method with a different name because toDense: DenseMatrix and toSparse: SparseMatrix should be implemented in the trait, not in the subclasses. But we can't just put them here and use overloading, because we will get ambiguous reference compile errors. So, we implement them here and make this private with a different name to avoid this. I appreciate feedback on this approach - it feels a bit awkward.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is very hacky in my opinion too!

The problem is that when one overloads a function without parenthesis, the ambiguity happens because when that function is invoked without parenthesis, this can be calling the actual function without parenthesis, or getting the function with parenthesis. The following is an example demonstrating the issue.

In my opinion, I would like to call it toSparse(columnMajor: Boolean) and toSparse() = toSparse(true), but in the vector api, we already use the one without parenthesis, so it will result inconsistency in the api design.

I think exposing the ability of converting it to columnMajor or rowMajor is very useful, as a result, we can expose it as toCSRMatrix, toCSCMatrix, and toSparse which converts the matrix to the one with smallest storage.

scala> trait A {
     | def foo(b: Boolean): String
     | def foo: String = foo(true)
     | }
defined trait A

scala> class B extends A {
     | def foo(b: Boolean): String = b.toString
     | }
defined class B

scala> val b = new B
b: B = B@67b6d4ae

scala> b.foo
<console>:18: error: ambiguous reference to overloaded definition,
both method foo in class B of type (b: Boolean)String
and  method foo in trait A of type => String
match expected type ?
       b.foo
         ^

scala> val x: String = b.foo
x: String = true

scala> val y: Boolean=> String = b.foo
y: Boolean => String = <function1>

Copy link
Contributor Author

@sethah sethah Mar 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just leave toDenseMatrix private and have toDense always use colMajor = true? I think that's ok to do for now.

EDIT: I don't think we can make toSparse choose the smaller representation since before it always chose column major layout. This would change the behavior.

@SparkQA
Copy link

SparkQA commented Oct 31, 2016

Test build #67819 has finished for PR 15628 at commit b5277c9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 31, 2016

Test build #67820 has finished for PR 15628 at commit a931fb9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Nov 3, 2016

What are you going to add? compressed already exists on vectors...

@WeichenXu123
Copy link
Contributor

@sethah Oh..I miss it..


private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 8 + 8 + 1
12L * numActives + 4L * numPtrs + 17L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says 8 * values while this is 12? Seems like a mistype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered how confusing this comment might be. Since values.length == rowIndices.length == numActives:

8 * values.length + 4 * rowIndices.length = 8 * numActives + 4 * numActives = 12 * numActives

The comment is meant to show where each number comes from and the implementation is meant to just be a condensed computation. But please let me know if you think it's too confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right - no that's fine.

@sethah
Copy link
Contributor Author

sethah commented Dec 7, 2016

ping @dbtsai :)

@SparkQA
Copy link

SparkQA commented Dec 14, 2016

Test build #70135 has finished for PR 15628 at commit 10ec41b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Jan 5, 2017

pinging other potential reviewers @jkbradley @srowen @MLnick I think this is an important patch for multiclass logistic regression.

@sethah
Copy link
Contributor Author

sethah commented Feb 1, 2017

ping @imatiach-msft @dbtsai

@sethah
Copy link
Contributor Author

sethah commented Feb 27, 2017

re-ping @dbtsai @MLnick @yanboliang I still think this is an important patch :D

* Converts this matrix to a dense matrix in column major order.
*/
@Since("2.1.0")
def toDense: DenseMatrix = toDenseMatrix(columnMajor = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, since we're using numCols already, should we call it colMajor? I saw couple packages using colMajor as the variable name.

val csrSize = getSparseSizeInBytes(columnMajor = false)
val minSparseSize = cscSize.min(csrSize)
if (getDenseSizeInBytes < minSparseSize) {
// size is the same either way, so maintain current layout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (getDenseSizeInBytes < math.min(cscSize, csrSize)) 
...
...
if (cscSize < csrSize)

could be easier to read.

Also, can you elaborate the comment like

// sizes for dense matrix in row major or column major are the same, so maintain current layout

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make foreachActive(f: (Int, Int, Double) => Unit) public? This is public for vector. I believe it will be very useful, and I think it's stable enough to make it public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change. Not sure if we should do this in a separate PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine. Small enough change :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, it's nice to have return type in public method. Can you add Unit as return type?

* @param columnMajor Whether the values of the resulting sparse matrix should be in column major
* or row major order. If `false`, resulting matrix will be row major.
*/
private[ml] def toSparseMatrix(columnMajor: Boolean): SparseMatrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is very hacky in my opinion too!

The problem is that when one overloads a function without parenthesis, the ambiguity happens because when that function is invoked without parenthesis, this can be calling the actual function without parenthesis, or getting the function with parenthesis. The following is an example demonstrating the issue.

In my opinion, I would like to call it toSparse(columnMajor: Boolean) and toSparse() = toSparse(true), but in the vector api, we already use the one without parenthesis, so it will result inconsistency in the api design.

I think exposing the ability of converting it to columnMajor or rowMajor is very useful, as a result, we can expose it as toCSRMatrix, toCSCMatrix, and toSparse which converts the matrix to the one with smallest storage.

scala> trait A {
     | def foo(b: Boolean): String
     | def foo: String = foo(true)
     | }
defined trait A

scala> class B extends A {
     | def foo(b: Boolean): String = b.toString
     | }
defined class B

scala> val b = new B
b: B = B@67b6d4ae

scala> b.foo
<console>:18: error: ambiguous reference to overloaded definition,
both method foo in class B of type (b: Boolean)String
and  method foo in trait A of type => String
match expected type ?
       b.foo
         ^

scala> val x: String = b.foo
x: String = true

scala> val y: Boolean=> String = b.foo
y: Boolean => String = <function1>

@SparkQA
Copy link

SparkQA commented Mar 9, 2017

Test build #74280 has finished for PR 15628 at commit 810a077.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* the current layout order.
*/
@Since("2.2.0")
def compressed: Matrix = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't compressed(colMajor: Boolean) and compressed cause the overloading ambiguous issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only a problem if we override/implement it in a subclass. Since it's contained wholly in the trait, it will be fine. I think this is ok to leave, though we could make it final? Also we could make three methods: compressed, compressedCSC, compressedCSR. I think the latter is a good solution, thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the later one.

@@ -153,6 +153,86 @@ sealed trait Matrix extends Serializable {
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine. Small enough change :)

* Converts this matrix to a sparse matrix in column major order.
*/
@Since("2.2.0")
def toCSCMatrix: SparseMatrix = toSparseMatrix(colMajor = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we follow scipy, and call it as toCSC?

* Converts this matrix to a sparse matrix in row major order.
*/
@Since("2.2.0")
def toCSRMatrix: SparseMatrix = toSparseMatrix(colMajor = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toCSR?

*/
@Since("2.2.0")
def toDense: DenseMatrix = toDenseMatrix(colMajor = true)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add toColumnMajorDense and toRowMajorDense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. should we consider to maintain the same layout?

* @param colMajor Whether the values of the resulting sparse matrix should be in column major
* or row major order. If `false`, resulting matrix will be row major.
*/
private[ml] def toSparseMatrix(colMajor: Boolean): SparseMatrix
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean @inline private[ml] def ... ? Do we expect this to be called often enough for that to make a difference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

@sethah
Copy link
Contributor Author

sethah commented Mar 14, 2017

@dbtsai Let me know your thoughts on the comments I left. Thanks for the review!

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74477 has finished for PR 15628 at commit cf8945a.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74486 has started for PR 15628 at commit baa8c9d.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74479 has finished for PR 15628 at commit 254b9fb.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Mar 14, 2017

Jenkins test this please.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74552 has finished for PR 15628 at commit baa8c9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Mar 17, 2017

ping @dbtsai! Code freeze is upcoming :)

* Converts this matrix to a sparse matrix in column major order.
*/
@Since("2.2.0")
def toCSC: SparseMatrix = toSparseMatrix(colMajor = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not good at naming, but since we use toDenseRowMajor for dense vector, should we use toSparseColumnMajor? Almost many packages are using toCSC, but I think we can make them consistent. Just my 2 cents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have a preference. I don't mind leaving them as CSC and CSR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it again, let's have it as toSparseColumnMajor to make the apis consistent with the dense ones if you don't mind?

* Converts this matrix to a sparse matrix in row major order.
*/
@Since("2.2.0")
def toCSR: SparseMatrix = toSparseMatrix(colMajor = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question.

* or row major order. If `false`, resulting matrix will be row major.
*/
@Since("2.2.0")
def compressed(colMajor: Boolean): Matrix = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it private, and follow the previous style. Should add compressedRowMajor, compressedColumnMajor since it can be dense matrix in certain situations.

*/
private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
if (!(isTransposed ^ colMajor)) {
val newValues = new Array[Double](numCols * numRows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler to use foreachActive? With it, both toDenseMatrix can have the same implementation for sparse and dense in trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can put this in the trait, since when it is called on dense matrix we would like to return this in some cases (when no layout change is needed). But, yes, I think it simpler to use toArray which calls foreachActive. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following could work, and we only need one implementation in trait. Thanks.

trait Matrix {

  var isTransposed: Boolean = true
  var numCols: Int = 0
  var numRows: Int = 0

  def foreachActive(f: (Int, Int, Double) => Unit): Unit

  def toDenseMatrix(colMajor: Boolean): Matrix = {
    this match {
      case _: DenseMatrix if this.isTransposed != colMajor =>
        this
      case _: SparseMatrix | _: DenseMatrix if this.isTransposed == colMajor =>
        val newValues = new Array[Double](numCols * numRows)

        this.foreachActive { case (row, col, value) =>
            // filling the newValues
        }
        new DenseMatrix(numRows, numCols, newValues, isTransposed = !colMajor)
      case _ =>
        throw new IllegalArgumentException("")
    }
  }
}

class DenseMatrix extends Matrix {

  def foreachActive(f: (Int, Int, Double) => Unit): Unit = {

  }

}

class SparseMatrix extends Matrix {
  def foreachActive(f: (Int, Int, Double) => Unit): Unit = {

  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't think this solution is better. The entire point of abstract methods is to allow subclasses to implement a method differently. Since we need different implementations depending on the subclass, we should just implement them in the subclasses. We can do this with the following:

trait Matrix {
  def toDenseMatrix(colMajor: Boolean): Matrix
}
class DenseMatrix extends Matrix {
    private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    if (isTransposed && colMajor) {
      new DenseMatrix(numRows, numCols, toArray, isTransposed = false)
    } else if (!isTransposed && !colMajor) {
      new DenseMatrix(numRows, numCols, transpose.toArray, isTransposed = true)
    } else {
      this
    }
  }
}
class SparseMatrix extends Matrix {
  private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
    if (colMajor) new DenseMatrix(numRows, numCols, toArray)
    else new DenseMatrix(numRows, numCols, this.transpose.toArray, isTransposed = true)
  }
}

Which is less verbose than the previous code. I'm going to put that in the next commit. Let me know what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me!

new DenseMatrix(numRows, numCols, toArray)
private[ml] override def toSparseMatrix(colMajor: Boolean): SparseMatrix = {
if (!(colMajor ^ isTransposed)) {
// breeze transpose rearranges values in column major and removes explicit zeros
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's hacky to use breeze's transpose behavior to remove zeros in sparse matrices. Can we have our own implementation given we're potentially remove breeze?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pretty much agree with you, but this is non-trivial code if we want to do it efficiently. Breeze has a pretty well-optimized implementation to do this. I would leave it as a follow up JIRA, or do it when/if we ever remove the Breeze dependency. Or do you think this is a blocker for this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a blocker.

val breezeTransposed = asBreeze.asInstanceOf[BSM[Double]]
Matrices.fromBreeze(breezeTransposed).asInstanceOf[SparseMatrix]
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document here that it's when the layout of this and colMajor is different? Easier read than (colMajor ^ isTranspose) condition here. Even more readable to use pattern matching with exact boolean on both variables.

assert(!dm3.isTransposed)
assert(dm3.values.equals(dm1.values))

val dm4 = dm1.toDenseMatrix(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to make toDenseMatrix as private, and we test against toDenseRowMajor which is more explicit.

* Converts this matrix to a sparse matrix in column major order.
*/
@Since("2.2.0")
def toSparse: SparseMatrix = toSparseMatrix(colMajor = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm debating that should we keep the same ordering of layout when we call toSparse or toDense?

Copy link
Contributor Author

@sethah sethah Mar 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this would change the behavior of external facing code. Before if I called toSparse on a row major matrix, I'd get a column major matrix. If we maintain the layout, then I'd now get something different (row major). Otherwise, I'd agree it is best to maintain the layout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I thought this is a new api being added, so we can make it to maintain the same layout.

1.0 -3.0 -8.0
*/
val dm1 = new DenseMatrix(2, 3, Array(4.0, -1.0, 2.0, 7.0, -8.0, 4.0))
val dm2 = new DenseMatrix(2, 3, Array(5.0, -9.0, 4.0, 1.0, -3.0, -8.0), isTransposed = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make dm2 dm1.transposed, but explicitly assign the value? Thus, you don't need to type the value in the array for the comparison.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your meaning here. These are made to be two entirely different matrices anyway.

*/
@Since("2.2.0")
def toDense: DenseMatrix = toDenseMatrix(colMajor = true)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. should we consider to maintain the same layout?

toDenseMatrix(!isTransposed)
} else {
if (cscSize <= csrSize) {
toSparseMatrix(colMajor = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toSparseColumnMajor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

if (cscSize <= csrSize) {
toSparseMatrix(colMajor = true)
} else {
toSparseMatrix(colMajor = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toSparseRowMajor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

val csrSize = getSparseSizeInBytes(colMajor = false)
if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
// dense matrix size is the same for column major and row major, so maintain current layout
toDenseMatrix(!isTransposed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call toDense if we decide to make toDense and toSparse outputting the same layout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the change here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.toDense

* Generate a `DenseMatrix` from this `DenseMatrix`.
*
* @param colMajor Whether the resulting `DenseMatrix` values will be in column major order.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, can we have

protected def isColMajor = !isTransposed

protected def isRowMajor = isTransposed

So the code can be understood easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added these methods. I updated the test suites to use them instead of isTransposed.

}
new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we override the toArray in DenseMatrix so when this is column major, we just return this.values? Otherwise, it's very expensive to create a new array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to say I'm a bit surprised - I thought that's what toArray already did! Yeah, that seems like a good change, but I'd prefer to do it in another pr because we need to make sure that this doesn't adversely affect other places that use toArray as well as adding unit tests. If that sounds ok, I'll make a JIRA for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's do it in another PR. Thanks.

@SparkQA
Copy link

SparkQA commented Mar 23, 2017

Test build #75116 has finished for PR 15628 at commit 4746ec0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75123 has finished for PR 15628 at commit 4026e89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75122 has finished for PR 15628 at commit 354935b.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75144 has finished for PR 15628 at commit c880a45.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75147 has started for PR 15628 at commit 93ec250.

}
new SparseMatrix(numRows, numCols, colPtrs, rowIndices.result(), spVals.result())
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's do it in another PR. Thanks.

val csrSize = getSparseSizeInBytes(colMajor = false)
if (getDenseSizeInBytes < math.min(cscSize, csrSize)) {
// dense matrix size is the same for column major and row major, so maintain current layout
toDenseMatrix(!isTransposed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the change here.

toDenseMatrix(!isTransposed)
} else {
if (cscSize <= csrSize) {
toSparseMatrix(colMajor = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

if (cscSize <= csrSize) {
toSparseMatrix(colMajor = true)
} else {
toSparseMatrix(colMajor = false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

* @param colMajor Whether the resulting `DenseMatrix` values are in column major order.
*/
private[ml] override def toDenseMatrix(colMajor: Boolean): DenseMatrix = {
if (colMajor) new DenseMatrix(numRows, numCols, toArray)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new DenseMatrix(numRows, numCols, this.toArray, isTransposed = false) to make the style consistent.

assert(cm1 === sm1)
assert(!cm1.isTransposed)
assert(cm1.values.equals(sm1.values))
assert(cm1.getSizeInBytes <= sm1.getSizeInBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. remove =

assert(cm2.isTransposed)
// forced to be row major, so we have increased the size
assert(cm2.getSizeInBytes > sm1.getSizeInBytes)
assert(cm2.getSizeInBytes <= sm1.toDense.getSizeInBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to leave it here since they could potentially be equal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If =, dense should be preferred since it will be faster. As a result, we should only check <.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, before we broke ties with sparse, so I changed it to choose dense, and added unit tests. Also removed the = here.

assert(cm8 === sm2)
assert(!cm8.isTransposed)
assert(cm8.getSizeInBytes > sm2.getSizeInBytes)
assert(cm8.getSizeInBytes <= sm2.toDense.getSizeInBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For completion, could you test sm1. compressedColumnMajor and sm2. compressedRowMajor? Thanks.

val cm4 = sm3.compressed.asInstanceOf[DenseMatrix]
assert(cm4 === sm3)
assert(!cm4.isTransposed)
assert(cm4.getSizeInBytes <= sm3.getSizeInBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove =?

val cm5 = sm3.compressedRowMajor.asInstanceOf[DenseMatrix]
assert(cm5 === sm3)
assert(cm5.isTransposed)
assert(cm5.getSizeInBytes <= sm3.getSizeInBytes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove =? and sm3.compressedColumnMajor?

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75155 has started for PR 15628 at commit 5411d46.

@sethah
Copy link
Contributor Author

sethah commented Mar 24, 2017

@dbtsai Thanks for the good suggestions. I rearranged the test suites, removed redundancies, and filled in some gaps. Things got a bit jumbled when changing some of the methods around. I also added the this prefix to some of the method calls as you suggest.

Copy link
Member

@dbtsai dbtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor

Copy link
Member

@dbtsai dbtsai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Couple small comments.

* Converts this matrix to a sparse matrix while maintaining the layout of the current matrix.
*/
@Since("2.2.0")
def toSparse: SparseMatrix = toSparseMatrix(colMajor = !isTransposed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def toSparse: SparseMatrix = toSparseMatrix(colMajor = isColMajor)

* Converts this matrix to a dense matrix while maintaining the layout of the current matrix.
*/
@Since("2.2.0")
def toDense: DenseMatrix = toDenseMatrix(colMajor = !isTransposed)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def toDense: DenseMatrix = toDenseMatrix(colMajor = isColMajor)

}

private[ml] def getDenseSize(numCols: Long, numRows: Long): Long = {
// 8 * values.length + 12 + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document what is the magical number 12 + 1? Also, can we make it

java.lang.Double.BYTES * numCols * numRows + 13L

since the size of primitive type can depend on the implementation of JVM.


private[ml] def getSparseSize(numActives: Long, numPtrs: Long): Long = {
// 8 * values.length + 4 * rowIndices.length + 4 * colPtrs.length + 12 + 12 + 12 + 1
12L * numActives + 4L * numPtrs + 37L
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(java.lang.Double.BYTES + java.lang.Integer.BYTES) * numActives + java.lang.Integer.BYTES * numPtrs + 37L

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that we can get 37L using java apis to ensure the portability.

@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75167 has finished for PR 15628 at commit 95ac0e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sethah
Copy link
Contributor Author

sethah commented Mar 24, 2017

Thanks for finally cooperating, Jenkins!

@dbtsai
Copy link
Member

dbtsai commented Mar 24, 2017

Thanks @sethah and Jenkins! Merged into master.

@sethah
Copy link
Contributor Author

sethah commented Mar 24, 2017

Thanks for all the time reviewing!

@asfgit asfgit closed this in e8810b7 Mar 24, 2017
@SparkQA
Copy link

SparkQA commented Mar 24, 2017

Test build #75171 has finished for PR 15628 at commit 87dfaa0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -148,7 +154,8 @@ sealed trait Matrix extends Serializable {
* and column indices respectively with the type `Int`, and the final parameter is the
* corresponding value in the matrix with type `Double`.
*/
private[spark] def foreachActive(f: (Int, Int, Double) => Unit)
@Since("2.2.0")
def foreachActive(f: (Int, Int, Double) => Unit): Unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sethah @dbtsai Hi all, just saw this during QA. This method is not very Java-friendly. I'm OK with adding it as long as we document the fact that it's not Java-friendly. We could also consider adding a Java-friendly version, perhaps using https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/function/Function2.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do the same for foreachActive in Vector which was already a public api long time ago.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I guess we can leave it until someone complains & add a Java-friendly one as needed.

dongjoon-hyun pushed a commit that referenced this pull request Sep 16, 2025
### What changes were proposed in this pull request?
This PR upgrades Netty from 4.1.126.Final to 4.1.127.Final and netty-tcnative from 2.0.72.Final to 2.0.73.Final.

### Why are the changes needed?
This version fixes a BouncyCastle-related regression introduced in Netty 4.1.126.Final:
- BouncyCastleAlpnSslUtils needs to use the correct SSLEngine class as otherwise it will fail to init static fields. ([#15628](netty/netty#15628))

The full release notes as follows:
- https://netty.io/news/2025/09/08/4-1-127-Final.html

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #52356 from LuciferYang/SPARK-53599.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants