[SPARK-14739] [PySpark] Fix Vectors parser bugs #12516

arashpa · 2016-04-20T01:26:37Z

What changes were proposed in this pull request?

The PySpark deserialization has a bug that shows up when deserializing all-zero sparse vectors. This fix filters out empty string tokens before casting, so properly stringified SparseVectors are parsed successfully.
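To illustrate the failure mode and the fix, here is a minimal sketch in plain Python. It does not reproduce PySpark's own `SparseVector.parse`. The helpers `parse_tokens` and `parse_sparse` are hypothetical, and the sketch assumes the string form `(size,[indices],[values])`, so an all-zero vector of size 4 would stringify as `(4,[],[])`. The key point is that `"".split(",")` returns `['']`, not `[]`, and `int('')` raises `ValueError`. Dropping empty tokens before the cast avoids this.

```python
def parse_tokens(text, cast):
    """Hypothetical helper: split a comma-separated list and cast each token.

    "".split(",") returns [""], and int("") raises ValueError, so empty
    (or whitespace-only) tokens are dropped before casting.
    """
    return [cast(tok) for tok in text.split(",") if tok.strip()]


def parse_sparse(s):
    """Hypothetical parser for '(size,[indices],[values])'.

    Returns a (size, indices, values) tuple; assumes the format above.
    """
    s = s.strip()
    if not (s.startswith("(") and s.endswith(")")):
        raise ValueError("expected '(size,[indices],[values])', got %r" % s)
    size_str, rest = s[1:-1].split(",", 1)
    # Locate the two bracketed lists inside the remainder.
    ind_start = rest.index("[")
    ind_end = rest.index("]", ind_start)
    val_start = rest.index("[", ind_end)
    val_end = rest.index("]", val_start)
    indices = parse_tokens(rest[ind_start + 1:ind_end], int)
    values = parse_tokens(rest[val_start + 1:val_end], float)
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return int(size_str), indices, values


# An all-zero sparse vector: both bracketed lists are empty.
print(parse_sparse("(4,[],[])"))            # (4, [], [])
print(parse_sparse("(4,[1,3],[1.0,5.5])"))  # (4, [1, 3], [1.0, 5.5])
```

Without the `if tok.strip()` filter, the first call would fail with `ValueError: invalid literal for int() with base 10: ''`.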

How was this patch tested?

Standard unit tests, similar to those for other methods.

srowen · 2016-04-20T10:19:01Z

Jenkins test this please

srowen · 2016-04-20T10:19:39Z

AFAIK that's correct, because similar calls work as described here in Scala. LGTM pending tests

SparkQA · 2016-04-20T10:34:39Z

Test build #56350 has finished for PR 12516 at commit 5a0ace6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.


vishnu667 · 2016-04-20T10:55:14Z

@arashpa I have opened a pull request against your branch (arashpa/spark/pull/#1) that updates your test cases.

@srowen This PR contains the changes to the test cases that I mentioned in #12513.

Once merged, both (#12516 and #12513) will contain the same fix; you can either wait for that or merge #12513.

srowen · 2016-04-20T11:07:48Z

Actually we should probably close this in favor of #12513 which came first and has additional fixes.

arashpa · 2016-04-20T17:12:17Z

@vishnu667 Thanks, merged in your PR.

arashpa · 2016-04-20T17:13:44Z

@srowen It would be great if this PR were considered the bug fix (instead of #12513), since I discovered the bug and also opened the first PR on the JIRA ticket. I made a mistake by closing the PR and opening a new one when I wanted to update the code, instead of adding more commits. With @vishnu667's PR merged, this one should now have all the updated tests.

srowen · 2016-04-20T17:30:09Z

@arashpa there was actually a fourth as well, one before you (WTH?), but the first two were closed.
Huh, this really is an unusual sequence. Maciej really identified the original and follow-on problems (you didn't open the JIRA). Although in the end the important thing is to get one correct fix in, I understand we want to observe some order and fairness, since people care about merge/JIRA credit.

Generally, people shouldn't open a new PR when there are others in progress, unless that one has been abandoned or there's a clear need to try a different approach. I see you began work on the JIRA first, at virtually the same time as zero323, but both of those PRs were closed for some reason: #12510 #12511

Then came #12513, which was correct, but I see it built on Maciej's comment and maybe your first PR. Why didn't you just update the original one instead of closing it? It kind of signaled you weren't working on it.

I am OK merging this one for the reasons above. Maybe a little more communication would have avoided the 3-4x duplicated effort on this one.

arashpa · 2016-04-20T17:38:05Z

@srowen I apologize; closing #12510 was a mistake and I helped make this confusing. I didn't open the JIRA ticket, but the Stack Overflow post is mine.

vishnu667 · 2016-04-20T18:04:51Z

@arashpa You'll need to merge https://github.com/arashpa/spark/pull/2; your test cases are still not updated, because the previous commit got merged into your master instead of the current branch.

@srowen Which PR are you going to merge, so that we can close the other one?


arashpa · 2016-04-20T18:16:14Z

@vishnu667 just merged the second PR.

ameyc · 2016-04-20T18:43:34Z

@srowen I wanted to add a comment regarding fairness of credit. @arashpa did indeed find the bug (we were looking at it yesterday); Maciej reported the issue based on @arashpa's Stack Overflow question about the bug (http://stackoverflow.com/questions/36730727/parsing-all-zero-sparse-vectors-with-pyspark-sparsevectors).

srowen · 2016-04-20T19:34:00Z

OK that all sounds good. With different aliases on different sites, I didn't see the connection.

srowen · 2016-04-21T09:50:36Z

Jenkins retest this please

SparkQA · 2016-04-21T10:04:55Z

Test build #56523 has finished for PR 12516 at commit aeefb82.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

The PySpark deserialization has a bug that shows while deserializing all zero sparse vectors. This fix filters out empty string tokens before casting, hence properly stringified SparseVectors successfully get parsed.

## How was this patch tested?

Standard unit-tests similar to other methods.

Author: Arash Parsa <[email protected]>
Author: Arash Parsa <[email protected]>
Author: Vishnu Prasad <[email protected]>
Author: Vishnu Prasad S <[email protected]>

Closes #12516 from arashpa/SPARK-14739.

(cherry picked from commit 2b8906c)
Signed-off-by: Sean Owen <[email protected]>

srowen · 2016-04-21T10:30:27Z

Merged to master/1.6

## What changes were proposed in this pull request?

The PySpark deserialization has a bug that shows while deserializing all zero sparse vectors. This fix filters out empty string tokens before casting, hence properly stringified SparseVectors successfully get parsed.

## How was this patch tested?

Standard unit-tests similar to other methods.

Author: Arash Parsa <[email protected]>
Author: Arash Parsa <[email protected]>
Author: Vishnu Prasad <[email protected]>
Author: Vishnu Prasad S <[email protected]>

Closes apache#12516 from arashpa/SPARK-14739.

(cherry picked from commit 2b8906c)
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 1cda10b)

Fix bug while parsing all zero sparse vectors

5a0ace6

viirya mentioned this pull request Apr 20, 2016

[SPARK-14739][Python] Fix for Sparse and Dense Vector Parsing Errors #12513

Closed

vishnu667 added 2 commits April 20, 2016 16:16

Merge pull request #2 from arashpa/SPARK-14739

0977493

Spark 14739

Updated the old test cases replacing assertTrue with assertEquals for the parsing test cases

008cdd5
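The commit above switches the parsing tests from `assertTrue` to an equality assertion. A plausible reason (our reading; the thread does not say) is that an equality assertion reports both the expected and actual values on failure, while `assertTrue(a == b)` only reports `False is not true`. Below is a minimal `unittest` sketch of that style, exercising a hypothetical `parse_tokens` helper rather than PySpark's own test suite. Python's `unittest` spells the method `assertEqual`; `assertEquals` is a deprecated alias.

```python
import unittest


def parse_tokens(text, cast):
    """Hypothetical helper: split on commas, drop empty tokens, then cast."""
    return [cast(tok) for tok in text.split(",") if tok.strip()]


class ParseTokensTest(unittest.TestCase):
    def test_empty_list(self):
        # The all-zero case: "[]" leaves an empty string between the brackets.
        self.assertEqual(parse_tokens("", int), [])

    def test_indices_and_values(self):
        self.assertEqual(parse_tokens("1,3", int), [1, 3])
        self.assertEqual(parse_tokens(" 1.0, 5.5 ", float), [1.0, 5.5])

    def test_assert_equal_message(self):
        # On failure, assertEqual shows both sides of the comparison,
        # whereas assertTrue(a == b) would only say "False is not true".
        with self.assertRaises(AssertionError) as ctx:
            self.assertEqual([1, 3], [1, 4])
        self.assertIn("[1, 3] != [1, 4]", str(ctx.exception))
```

Saved to a file, this can be run with `python -m unittest <module>`.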

Merge pull request #2 from vishnu667/testCasesFix

aeefb82

Test cases fix

asfgit closed this in 2b8906c Apr 21, 2016


5 participants
