Skip to content

Conversation

@arashpa
Copy link

@arashpa arashpa commented Apr 20, 2016

What changes were proposed in this pull request?

The PySpark deserialization has a bug that shows while deserializing all zero sparse vectors. This fix filters out empty string tokens before casting, hence properly stringified SparseVectors successfully get parsed.

How was this patch tested?

Standard unit-tests similar to other methods.

@srowen
Copy link
Member

srowen commented Apr 20, 2016

Jenkins test this please

@srowen
Copy link
Member

srowen commented Apr 20, 2016

AFAIK that's correct, because similar calls work as described here in Scala. LGTM pending tests

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56350 has finished for PR 12516 at commit 5a0ace6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vishnu667
Copy link

vishnu667 commented Apr 20, 2016

@arashpa I have updated your test cases in the following pull request to your branch arashpa/spark/pull/#1

@srowen This PR contains the changes to the Test Cases that I mentioned in #12513

Once Merged Both ( #12516 and #12513 ) will contain the same Fix you can wither wait for that or merge #12513

@srowen
Copy link
Member

srowen commented Apr 20, 2016

Actually we should probably close this in favor of #12513 which came first and has additional fixes.

@arashpa
Copy link
Author

arashpa commented Apr 20, 2016

@vishnu667 Thanks merged in your PR.

@arashpa
Copy link
Author

arashpa commented Apr 20, 2016

@srowen would be great if this PR is considered as the bug fix (instead of #12513) since I discovered the bug and also provided the first PR on the Jira ticket. I made a mistake by closing and opening a new PR when I wanted to update the code instead of adding more commits. By merging the @vishnu667 PR this should now have all the updated tests.

@srowen
Copy link
Member

srowen commented Apr 20, 2016

@arashpa there was actually a fourth as well, one before you (WTH?) but the first two were closed.
Huh this really is an unusual sequence. Maciej really identified the original and follow-on problem (you didn't open the JIRA). Although in the end the important thing is to get one correct fix in, I understand we want to observe some order and fairness since people care about merge/JIRA credit.

Generally, people shouldn't open a new PR when there were others in progress, unless that one has been abandoned or there's a clear need to try a different approach. I see you began work on the JIRA first, at virtually the same time as zero323, but both those were closed for some reason: #12510 #12511

Then #12513 which was correct, but I see it built on Maciej's comment and maybe your first PR. Why didn't you just update the original one instead of closing? it kind of signaled you weren't working on it.

I am OK merging this one for reasons above. Maybe a little more communication would have avoided 3-4x duplicated effort on this one.

@arashpa
Copy link
Author

arashpa commented Apr 20, 2016

@srowen I apologize, closing #12510 was a mistake and I helped making this confusing. I didn't open the Jira ticket but the stackoverflow post is by me.

@vishnu667
Copy link

@arashpa You'll need to merge https://github.com/arashpa/spark/pull/2 your test cases are still not updated the previous commit it got merged to your master instead of the current branch.

@srowen which PR are you going to merge so that we can close the other one.

@arashpa
Copy link
Author

arashpa commented Apr 20, 2016

@vishnu667 just merged the second PR.

@ameyc
Copy link

ameyc commented Apr 20, 2016

@srowen i wanted to add a comment regarding fairness of credit. @arashpa did indeed find the bug since we were looking at this yesterday, Maciej reported the issue based off of @arashpa 's stack overflow question about the bug ( http://stackoverflow.com/questions/36730727/parsing-all-zero-sparse-vectors-with-pyspark-sparsevectors ).

@srowen
Copy link
Member

srowen commented Apr 20, 2016

OK that all sounds good. With different aliases on different sites, I didn't see the connection.

@srowen
Copy link
Member

srowen commented Apr 21, 2016

Jenkins retest this please

@SparkQA
Copy link

SparkQA commented Apr 21, 2016

Test build #56523 has finished for PR 12516 at commit aeefb82.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Apr 21, 2016
## What changes were proposed in this pull request?

The PySpark deserialization has a bug that shows while deserializing all zero sparse vectors. This fix filters out empty string tokens before casting, hence properly stringified SparseVectors successfully get parsed.

## How was this patch tested?

Standard unit-tests similar to other methods.

Author: Arash Parsa <[email protected]>
Author: Arash Parsa <[email protected]>
Author: Vishnu Prasad <[email protected]>
Author: Vishnu Prasad S <[email protected]>

Closes #12516 from arashpa/SPARK-14739.

(cherry picked from commit 2b8906c)
Signed-off-by: Sean Owen <[email protected]>
@srowen
Copy link
Member

srowen commented Apr 21, 2016

Merged to master/1.6

@asfgit asfgit closed this in 2b8906c Apr 21, 2016
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 22, 2016
## What changes were proposed in this pull request?

The PySpark deserialization has a bug that shows while deserializing all zero sparse vectors. This fix filters out empty string tokens before casting, hence properly stringified SparseVectors successfully get parsed.

## How was this patch tested?

Standard unit-tests similar to other methods.

Author: Arash Parsa <[email protected]>
Author: Arash Parsa <[email protected]>
Author: Vishnu Prasad <[email protected]>
Author: Vishnu Prasad S <[email protected]>

Closes apache#12516 from arashpa/SPARK-14739.

(cherry picked from commit 2b8906c)
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 1cda10b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants