[SPARK-3838][examples][mllib][python] Word2Vec example in python #2952
Conversation
|
Can one of the admins verify this patch? |
|
@anantasty Could you also update the example code in |
|
@mengxr I updated the example code as well. |
docs/mllib-feature-extraction.md
Outdated
The input to Word2Vec should be sentences instead of individual words, though it doesn't affect the implementation. The following command extracts up to 16 words per line:
grep -o -E '\w+(\W+\w+){0,15}' text8 > text8_lines
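For anyone without GNU grep at hand, the same chunking can be sketched in Python. This is only an approximation of the command above, not part of the patch: `chunk_words` is a hypothetical helper name, and Python's `\w` is Unicode-aware, which makes no difference for plain lowercase ASCII text but may differ from grep on other input.

```python
import re

# Same pattern as the grep command: one word, followed by up to 15 more
# words, each preceded by a run of non-word characters.
CHUNK_PATTERN = re.compile(r"\w+(?:\W+\w+){0,15}")


def chunk_words(text):
    """Return non-overlapping chunks of at most 16 words each.

    Hypothetical helper mirroring `grep -o -E`, which prints every
    non-overlapping match on its own line.
    """
    return [match.group(0) for match in CHUNK_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = " ".join("w%d" % i for i in range(40))
    for line in chunk_words(sample):
        print(line)
```

Writing each returned chunk on its own line would give a file with the same shape as `text8_lines`.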
|
@mengxr I just implemented those changes. |
docs/mllib-feature-extraction.md
Outdated
Rename `inp` to `sentences`, and replace `[row]` with `row.split(' ')`.
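A minimal sketch of what that suggestion amounts to, assuming an existing `SparkContext` named `sc` and a text file with one sentence per line. The helper names `tokenize` and `train_word2vec` are hypothetical and are not taken from the patch; `Word2Vec().fit(...)` and `sc.textFile(...)` are the standard PySpark MLlib calls.

```python
def tokenize(row):
    """Split one line of text into a list of words (one sentence per line)."""
    return row.split(" ")


def train_word2vec(sc, file_path):
    """Fit a Word2Vec model on an RDD of tokenized sentences.

    `sc` is assumed to be a live SparkContext; `file_path` points at a
    file with one sentence per line.
    """
    # Imported here so that tokenize() stays usable without Spark installed.
    from pyspark.mllib.feature import Word2Vec

    # Each RDD element is a list of words, not a single-element [row] list.
    sentences = sc.textFile(file_path).map(tokenize)
    return Word2Vec().fit(sentences)
```

With a fitted model, `model.findSynonyms(word, num)` returns `(word, similarity)` pairs for the `num` closest words. Mapping each row to `[row]` instead would hand Word2Vec a one-token "sentence" made of the whole line, which is why the split matters.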
|
ok to test |
|
test this please |
|
Test build #22364 has started for PR 2952 at commit
|
|
Test build #22364 has finished for PR 2952 at commit
|
|
Test FAILed. |
use file_path
I left that old line in there. Thanks for catching that!
…implified example in docs.
|
Test build #22598 has started for PR 2952 at commit
|
docs/mllib-feature-extraction.md
Outdated
remove the indent or remove this line.
|
Test build #22598 has finished for PR 2952 at commit
|
|
Test PASSed. |
|
@mengxr I think it's ready to merge. |
|
Test build #22618 has started for PR 2952 at commit
|
|
Test build #22618 has finished for PR 2952 at commit
|
|
Test FAILed. |
|
The URL shows 0 failures, so I am not sure why it says the tests failed. |
|
test this please |
|
Test build #22658 has started for PR 2952 at commit
|
|
Test build #22658 has finished for PR 2952 at commit
|
|
Test PASSed. |
|
LGTM. Merged into master. Thanks! |
This pull request refers to issue: https://issues.apache.org/jira/browse/SPARK-3838
Python example for word2vec
@mengxr