[SPARK-12603] [MLlib] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft #10552
Test build #48580 has finished for PR 10552 at commit
These changes look fine to me, but could you please test predictSoft in Python and also add it to the example in the doc?
Test build #48827 has finished for PR 10552 at commit
Test build #48845 has finished for PR 10552 at commit
I'd add a println after this for spacing.
Looks good except for the 1 comment.
Test build #2338 has finished for PR 10552 at commit
af22848 to ee56c7b
Test build #48933 has finished for PR 10552 at commit
This unit test fails on my laptop. I'm wondering if the problem is that default parallelism levels differ across machines, so clusterdata_1 ends up with a different number of partitions and GMM gets initialized with different samples. Could you please change the sc.parallelize call to use 2 partitions to see if that fixes the test?
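The hypothesis above can be illustrated with a small plain-Python sketch. It is not Spark's sampler: it assumes, purely for illustration, that each partition draws from its own RNG seeded from a base seed plus the partition index. The helper names `split_into_partitions` and `seeded_sample` are hypothetical.

```python
import random


def split_into_partitions(data, num_partitions):
    """Split data into num_partitions contiguous chunks."""
    n = len(data)
    bounds = [i * n // num_partitions for i in range(num_partitions + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]


def seeded_sample(data, num_partitions, fraction, seed):
    """Keep each element with probability `fraction`, one RNG per partition.

    Assumption (hypothetical): partition i uses the seed `seed + i`.
    """
    picked = []
    for index, part in enumerate(split_into_partitions(data, num_partitions)):
        rng = random.Random(seed + index)
        picked.extend(x for x in part if rng.random() < fraction)
    return picked


data = list(range(20))
two_a = seeded_sample(data, 2, 0.3, seed=42)
two_b = seeded_sample(data, 2, 0.3, seed=42)
four = seeded_sample(data, 4, 0.3, seed=42)

print(two_a == two_b)  # same seed and same partition count
print(two_a, four)     # same seed, different partition counts
```

With the seed and the partition count both fixed, the sample is reproducible. Changing only the partition count changes which RNG each element is drawn against, so the sample may differ. Under that assumption, pinning the partition count in the test removes the machine-dependent part.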
Test build #49004 has finished for PR 10552 at commit
@jkbradley It can produce a stable result after changing
That looks good. I also just noticed: could you please update the docs for Python predict and predictSoft to say they work on both RDDs and single vectors? That should be it.
Test build #49132 has finished for PR 10552 at commit
LGTM
PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft, just like Scala does.
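To make "single instance" concrete, here is a minimal plain-Python sketch of what soft and hard prediction mean for one point under a Gaussian mixture. It is not the PySpark implementation: it assumes a univariate mixture for brevity, and the function names `gaussian_pdf`, `predict_soft`, and `predict` are hypothetical stand-ins for the model methods.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def predict_soft(x, weights, means, variances):
    """Membership probability of a single point x in each component."""
    scores = [
        w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)
    ]
    # A point far from every component could underflow to a zero total;
    # this sketch does not handle that case.
    total = sum(scores)
    return [s / total for s in scores]


def predict(x, weights, means, variances):
    """Index of the most likely component for a single point x."""
    probs = predict_soft(x, weights, means, variances)
    return max(range(len(probs)), key=probs.__getitem__)


# Two equally weighted components centred at 0 and 10 (illustrative values).
weights = [0.5, 0.5]
means = [0.0, 10.0]
variances = [1.0, 1.0]

soft = predict_soft(1.0, weights, means, variances)
label = predict(1.0, weights, means, variances)
print(soft, label)
```

The RDD form applies the same per-point computation to every element. The single-instance form saves the caller from wrapping one vector in an RDD just to score it.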