[SPARK-3974][MLlib] Distributed Block Matrix Abstractions #3200
Closed

Changes from 1 commit (24 commits in this pull request)

Commits
b693209  Ready for Pull request
f378e16  [SPARK-3974] Block Matrix Abstractions ready
aa8f086  [SPARK-3974] Additional comments added
589fbb6  [SPARK-3974] Code review feedback addressed
19c17e8  [SPARK-3974] Changed blockIdRow and blockIdCol
b05aabb  [SPARK-3974] Updated tests to reflect changes
brkyvz 645afbe  [SPARK-3974] Pull latest master
brkyvz 49b9586  [SPARK-3974] Updated testing utils from master
brkyvz d033861  [SPARK-3974] Removed SubMatrixInfo and added constructor without part…
brkyvz 9ae85aa  [SPARK-3974] Made partitioner a variable inside BlockMatrix instead o…
brkyvz ab6cde0  [SPARK-3974] Modifications cleaning code up, making size calculation …
brkyvz ba414d2  [SPARK-3974] fixed frobenius norm
brkyvz 239ab4b  [SPARK-3974] Addressed @jkbradley's comments
brkyvz 1e8bb2a  [SPARK-3974] Change return type of cache and persist
brkyvz 1a63b20  [SPARK-3974] Remove setPartition method. Isn't required
brkyvz eebbdf7  preliminary changes addressing code review
brkyvz f9d664b  updated API and modified partitioning scheme
brkyvz 1694c9e  almost finished addressing comments
brkyvz 140f20e  Merge branch 'master' of github.com:apache/spark into SPARK-3974
brkyvz 5eecd48  fixed gridPartitioner and added tests
brkyvz 24ec7b8  update grid partitioner
mengxr e1d3ee8  minor updates
mengxr feb32a7  update tests
mengxr a8eace2  Merge pull request #2 from mengxr/brkyvz-SPARK-3974
[SPARK-3974] Changed blockIdRow and blockIdCol
@@ -28,27 +28,27 @@ import org.apache.spark.util.Utils
 /**
  * Represents a local matrix that makes up one block of a distributed BlockMatrix
  *
- * @param blockIdRow The row index of this block
- * @param blockIdCol The column index of this block
+ * @param blockRowIndex The row index of this block
+ * @param blockColIndex The column index of this block
  * @param mat The underlying local matrix
  */
-case class SubMatrix(blockIdRow: Int, blockIdCol: Int, mat: DenseMatrix) extends Serializable
+case class SubMatrix(blockRowIndex: Int, blockColIndex: Int, mat: DenseMatrix) extends Serializable

 /**
  * Information of the submatrices of the BlockMatrix maintained on the driver
  *
  * @param partitionId The id of the partition the block is found in
- * @param blockIdRow The row index of this block
- * @param blockIdCol The column index of this block
+ * @param blockRowIndex The row index of this block
+ * @param blockColIndex The column index of this block
  * @param startRow The starting row index with respect to the distributed BlockMatrix
  * @param numRows The number of rows in this block
  * @param startCol The starting column index with respect to the distributed BlockMatrix
  * @param numCols The number of columns in this block
  */
 case class SubMatrixInfo(
     partitionId: Int,
-    blockIdRow: Int,
-    blockIdCol: Int,
+    blockRowIndex: Int,
+    blockColIndex: Int,
     startRow: Long,
     numRows: Int,
     startCol: Long,
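For context, here is a minimal standalone sketch of the renamed block representation. The SubMatrix definition mirrors the case class in this patch; the 2 x 2 matrix and the block indices are made-up illustration values, not part of the change.

import org.apache.spark.mllib.linalg.DenseMatrix

// Mirrors the case class introduced in this patch: a block is a local
// matrix tagged with its (row, column) position in the block grid.
case class SubMatrix(blockRowIndex: Int, blockColIndex: Int, mat: DenseMatrix) extends Serializable

object SubMatrixExample {
  def main(args: Array[String]): Unit = {
    // A 2 x 2 local block, stored in column-major order.
    val local = new DenseMatrix(2, 2, Array(1.0, 2.0, 3.0, 4.0))
    // This block sits in the first block-row and the second block-column.
    val block = SubMatrix(blockRowIndex = 0, blockColIndex = 1, mat = local)
    println(s"block (${block.blockRowIndex}, ${block.blockColIndex}) is " +
      s"${block.mat.numRows} x ${block.mat.numCols}")
  }
}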
@@ -228,7 +228,7 @@ class BlockMatrix(
     // collect may cause akka frameSize errors
     val blockStartRowColsParts = matrixRDD.mapPartitionsWithIndex { case (partId, iter) =>
       iter.map { case (id, block) =>
-        ((block.blockIdRow, block.blockIdCol), (partId, block.mat.numRows, block.mat.numCols))
+        ((block.blockRowIndex, block.blockColIndex), (partId, block.mat.numRows, block.mat.numCols))
       }
     }.collect()
     val blockStartRowCols = blockStartRowColsParts.sortBy(_._1)
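The per-block sizes collected above are what let the driver derive each block's starting offsets. A rough local sketch of that bookkeeping follows (plain Scala, no Spark; the block sizes are made-up values, and this is not the patch's actual implementation):

object BlockOffsets {
  def main(args: Array[String]): Unit = {
    // ((blockRowIndex, blockColIndex), (partitionId, numRows, numCols)),
    // in the shape produced by the mapPartitionsWithIndex/collect step above.
    val blockDims = Seq(
      ((0, 0), (0, 30, 10)), ((1, 0), (0, 30, 10)), ((2, 0), (1, 10, 10)))

    // Cumulative row offsets per block-row: 0, 30, 60, ...
    val rowsPerBlockRow = blockDims.map { case ((i, _), (_, r, _)) => i -> r }.toMap
    val blockRowIndices = rowsPerBlockRow.keys.toSeq.sorted
    val startRows = blockRowIndices
      .scanLeft(0L)((acc, i) => acc + rowsPerBlockRow(i))
      .init
    blockRowIndices.zip(startRows).foreach { case (i, start) =>
      println(s"block-row $i starts at global row $start")
    }
  }
}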
@@ -283,9 +283,9 @@ class BlockMatrix(
   private def keyBy(part: BlockMatrixPartitioner = partitioner): RDD[(Int, SubMatrix)] = {
     rdd.map { block =>
       part match {
-        case r: RowBasedPartitioner => (block.blockIdRow, block)
-        case c: ColumnBasedPartitioner => (block.blockIdCol, block)
-        case g: GridPartitioner => (block.blockIdRow + numRowBlocks * block.blockIdCol, block)
+        case r: RowBasedPartitioner => (block.blockRowIndex, block)
+        case c: ColumnBasedPartitioner => (block.blockColIndex, block)
+        case g: GridPartitioner => (block.blockRowIndex + numRowBlocks * block.blockColIndex, block)
         case _ => throw new IllegalArgumentException("Unrecognized partitioner")
       }
     }
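The GridPartitioner case above flattens a block's (row, column) position into a single integer key in column-major order. A tiny sketch of that mapping for an assumed 3 x 2 block grid:

object GridKeys {
  def main(args: Array[String]): Unit = {
    val numRowBlocks = 3
    val numColBlocks = 2
    // Same formula as the GridPartitioner case in keyBy: blocks in the
    // same block-column receive consecutive keys (column-major order).
    for (col <- 0 until numColBlocks; row <- 0 until numRowBlocks) {
      val key = row + numRowBlocks * col
      println(s"block ($row, $col) -> key $key")
    }
  }
}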
@@ -304,7 +304,7 @@ class BlockMatrix(
   /** Collect the distributed matrix on the driver. */
   def collect(): DenseMatrix = {
-    val parts = rdd.map(x => ((x.blockIdRow, x.blockIdCol), x.mat)).
+    val parts = rdd.map(x => ((x.blockRowIndex, x.blockColIndex), x.mat)).
       collect().sortBy(x => (x._1._2, x._1._1))
     val nRows = numRows().toInt
     val nCols = numCols().toInt

Reviewer comment (member, on the line `val nRows = numRows().toInt`): I'd check numRows and numCols here before converting to Int and throw an error if the matrix is too large.
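One way the suggested guard could look, as a hedged sketch (the helper name and error message are made up; the point is simply to fail fast before the Long dimensions are narrowed to Int inside collect()):

object CollectSizeCheck {
  // Hypothetical helper illustrating the reviewer's suggestion: verify that a
  // distributed matrix can be materialized locally before calling .toInt.
  def checkCollectSize(numRows: Long, numCols: Long): Unit = {
    require(numRows < Int.MaxValue && numCols < Int.MaxValue,
      s"Cannot collect a $numRows x $numCols matrix on the driver: " +
        "its dimensions do not fit in an Int.")
  }

  def main(args: Array[String]): Unit = {
    checkCollectSize(1000L, 1000L)          // fine
    // checkCollectSize(3000000000L, 10L)   // would throw IllegalArgumentException
  }
}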
Reviewer comment: Ditto.

Reviewer comment: BlockPartitionInfo -> SubMatrixInfo? Can startRow be determined from the block row index plus the partitioner? Are we going to support irregular grids?

Author reply: That's a good question. I left it in there for the following reason; I'm not sure it will ever be required, but I thought it would be good to cover corner cases such as irregular grids:
Assume you have a matrix A with dimensions 280 x d, where each SubMatrix has dimensions 30 x d/3. The last block-row will then consist of SubMatrices of size 10 x d/3. If you then vertically append a matrix B with dimensions n x d, you are left with an irregular grid.
Maybe vertical concatenation is not as common as horizontal concatenation, but being ready to support such operations seems beneficial for users.
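To make the arithmetic in that example concrete, here is a small sketch. The 280-row shape and 30-row blocks come from the comment above; the 65-row height for B is an assumed value for n, and the code only shows why startRow can no longer be derived from the block row index alone once block-rows have unequal heights.

object IrregularGrid {
  def main(args: Array[String]): Unit = {
    // Matrix A: 280 rows split into 30-row blocks -> nine 30-row block-rows
    // plus a final 10-row block-row.
    val aBlockRows = Seq.fill(280 / 30)(30) :+ (280 % 30)   // 9 x 30 + 1 x 10
    // Vertically appending a matrix B (assume 65 rows, also in 30-row blocks)
    // stacks its block-rows underneath, leaving the 10-row block in the middle.
    val bBlockRows = Seq.fill(65 / 30)(30) :+ (65 % 30)     // 2 x 30 + 1 x 5
    val combined = aBlockRows ++ bBlockRows

    // With equal-height blocks, startRow would just be index * 30; here it isn't.
    val startRows = combined.scanLeft(0)(_ + _).init
    combined.zip(startRows).zipWithIndex.foreach { case ((rows, start), i) =>
      println(s"block-row $i: $rows rows, startRow = $start")
    }
  }
}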