[SPARK-12603] [MLlib] PySpark MLlib GaussianMixtureModel should support single instance predict/predictSoft #10552

yanboliang · 2016-01-02T09:51:38Z

PySpark MLlib GaussianMixtureModel should support single-instance predict/predictSoft, just as the Scala API does.
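For readers unfamiliar with the two methods, here is a minimal pure-Python sketch of what a single-instance soft and hard prediction compute. It is a simplified 1-D illustration, not the MLlib implementation: the function names `gaussian_pdf`, `predict_soft`, and `predict`, and the mixture parameters, are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a 1-D normal distribution at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))

def predict_soft(x, weights, means, variances):
    """Membership probabilities of a single point x, one per component."""
    scores = [w * gaussian_pdf(x, m, v)
              for w, m, v in zip(weights, means, variances)]
    total = sum(scores)
    return [s / total for s in scores]

def predict(x, weights, means, variances):
    """Hard assignment: index of the most probable component."""
    probs = predict_soft(x, weights, means, variances)
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical two-component mixture, for illustration only.
weights = [0.5, 0.5]
means = [-5.0, 5.0]
variances = [1.0, 1.0]

soft = predict_soft(4.0, weights, means, variances)  # probabilities summing to 1
label = predict(4.0, weights, means, variances)      # index of the nearer component
```

The hard prediction is just the argmax of the soft prediction, which is why supporting a single vector in one method makes it natural to support it in the other.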

SparkQA · 2016-01-02T10:44:18Z

Test build #48580 has finished for PR 10552 at commit cbda57e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-05T23:29:34Z

These changes look fine to me, but could you please test predictSoft in Python (and also add it to the example in the doc)?

SparkQA · 2016-01-06T04:59:29Z

Test build #48827 has finished for PR 10552 at commit da45010.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-06T09:19:05Z

Test build #48845 has finished for PR 10552 at commit af22848.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-06T18:21:08Z

examples/src/main/scala/org/apache/spark/examples/mllib/DenseGaussianMixture.scala

I'd add a println after this for spacing.

jkbradley · 2016-01-06T18:21:22Z

Looks good except for the 1 comment

SparkQA · 2016-01-06T20:15:40Z

Test build #2338 has finished for PR 10552 at commit af22848.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-01-07T11:10:22Z

Test build #48933 has finished for PR 10552 at commit ee56c7b.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-07T19:37:14Z

I'm finding that this unit test fails on my laptop. I'm wondering if the problem is: different default parallelism levels on different machines => clusterdata_1 has different numbers of partitions => GMM gets initialized with different samples.

Could you please change the sc.parallelize call to use 2 partitions to see if that fixes the test?
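The hypothesis above can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: it is not how Spark partitions or samples data, and the helpers `split_into_partitions` and `sample_per_partition` are hypothetical. It assumes each partition draws from its own random generator seeded from a base seed plus the partition index.

```python
import random

def split_into_partitions(data, num_partitions):
    """Split data into contiguous slices (toy stand-in for partitioning)."""
    n = len(data)
    bounds = [(i * n) // num_partitions for i in range(num_partitions + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]

def sample_per_partition(data, num_partitions, seed, fraction=0.5):
    """Sample each partition with an RNG seeded by (seed + partition index).

    Assumption: per-partition seeding; this is a toy model, not Spark's code.
    """
    picked = []
    for index, part in enumerate(split_into_partitions(data, num_partitions)):
        rng = random.Random(seed + index)
        picked.extend(x for x in part if rng.random() < fraction)
    return picked

data = list(range(20))
same_a = sample_per_partition(data, 2, seed=42)  # fixed partition count
same_b = sample_per_partition(data, 2, seed=42)  # identical call, identical sample
other = sample_per_partition(data, 4, seed=42)   # same seed, different partitioning
```

With the partition count pinned, repeated runs select the same elements. Changing the partition count while keeping the seed will typically select a different subset, which would be enough to change the initial samples an algorithm starts from.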

SparkQA · 2016-01-08T07:21:39Z

Test build #49004 has finished for PR 10552 at commit 06f4033.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2016-01-08T07:29:37Z

@jkbradley The test produces stable results after changing sc.parallelize to use 2 partitions. Thanks!
Because the predictSoft output may be negligibly small but not exactly zero, we cannot match it against 0.000... for equality, so I use abs(softPredicted[1] - 0.0) < 0.001 in the doctest.
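A small standalone sketch of the tolerance check described here, using made-up values rather than real model output (the list `soft_predicted` is hypothetical):

```python
import math

# Hypothetical soft-prediction output: the second entry is tiny but not exactly zero.
soft_predicted = [0.9999999999999979, 2.1e-15]

# Exact comparison fails because the value is not exactly 0.0.
exact_match = soft_predicted[1] == 0.0

# Absolute-tolerance comparison, the same form as the check used in the doctest.
within_tolerance = abs(soft_predicted[1] - 0.0) < 0.001

# Similar check with the standard library; abs_tol is required when comparing to zero.
also_close = math.isclose(soft_predicted[1], 0.0, abs_tol=1e-3)
```

An absolute tolerance is the right tool when the expected value is zero, since a relative tolerance alone would never accept any nonzero value.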

jkbradley · 2016-01-08T20:15:49Z

That looks good. I also just noticed: Could you please update the docs for Python predict, predictSoft to say they work on RDDs and single vectors? That should be it.

SparkQA · 2016-01-11T10:40:30Z

Test build #49132 has finished for PR 10552 at commit f6717a8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2016-01-11T22:43:03Z

LGTM
Merging with master
Thanks for the PR!

yanboliang added 2 commits January 2, 2016 16:30

MLlib GaussianMixtureModel should support single instance predict/pre…

8371f34

…dictSoft

Fix python3 compatibility issue

cbda57e

yanboliang added 2 commits January 6, 2016 12:00

add predictSoft test

334f005

update examples

da45010

Add blank line

ee56c7b

yanboliang force-pushed the spark-12603 branch from af22848 to ee56c7b Compare January 7, 2016 10:20

make test stable

06f4033

add doc

f6717a8

asfgit closed this in ee4ee02 Jan 11, 2016

yanboliang deleted the spark-12603 branch January 12, 2016 01:34

3 participants