@yanboliang
Contributor

PySpark MLlib GaussianMixtureModel should support single-instance predict/predictSoft, just as the Scala API does.

@SparkQA

SparkQA commented Jan 2, 2016

Test build #48580 has finished for PR 10552 at commit cbda57e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

These changes look fine to me, but could you please test predictSoft in Python, and also add it to the example in the doc?

@SparkQA

SparkQA commented Jan 6, 2016

Test build #48827 has finished for PR 10552 at commit da45010.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 6, 2016

Test build #48845 has finished for PR 10552 at commit af22848.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

I'd add a println after this for spacing.

@jkbradley
Member

Looks good except for the 1 comment

@SparkQA

SparkQA commented Jan 6, 2016

Test build #2338 has finished for PR 10552 at commit af22848.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 7, 2016

Test build #48933 has finished for PR 10552 at commit ee56c7b.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

I'm finding that this unit test fails on my laptop. I'm wondering if the problem is: different default parallelism levels on different machines => clusterdata_1 has different numbers of partitions => GMM gets initialized with different samples.

Could you please change the sc.parallelize call to use 2 partitions to see if that fixes the test?
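
A toy sketch of the hypothesis above, in plain Python. This is not Spark's sampling code: `split` and `sample_one_per_partition` are hypothetical helpers, and drawing one element per partition is an assumed stand-in for partition-aware sampling. It only shows how the same data and seed can yield different initial samples once the partition count changes.

```python
import random

def split(data, num_partitions):
    """Cut data into contiguous, near-equal chunks (a stand-in for partitions)."""
    n = len(data)
    bounds = [i * n // num_partitions for i in range(num_partitions + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(num_partitions)]

def sample_one_per_partition(data, num_partitions, seed, k):
    """Draw one element from each partition, then keep the first k draws."""
    draws = []
    for index, part in enumerate(split(data, num_partitions)):
        rng = random.Random(seed + index)  # per-partition generator (assumption)
        draws.append(rng.choice(part))
    return draws[:k]

data = list(range(8))
two = sample_one_per_partition(data, num_partitions=2, seed=42, k=2)
four = sample_one_per_partition(data, num_partitions=4, seed=42, k=2)
print(two, four)  # same data and seed, different partition counts
```

With 2 partitions the second draw always comes from the second half of the data, while with 4 partitions it comes from the first half, so the two samples cannot match. Passing an explicit partition count, as in `sc.parallelize(data, 2)`, removes this source of variation between machines.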

@SparkQA

SparkQA commented Jan 8, 2016

Test build #49004 has finished for PR 10552 at commit 06f4033.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Contributor Author

@jkbradley It produces stable results after changing sc.parallelize to use 2 partitions. Thanks!
Because the predictSoft output may be negligibly small rather than exactly zero, we cannot match it against 0.000... for equality, so I use abs(softPredicted[1] - 0.0) < 0.001 in the doctest.
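
A minimal sketch of why the tolerance check is needed, using plain Python rather than PySpark. The `predict_soft` function, the 1-D mixture, and its parameters are all assumptions made up for illustration; they are not the model or data from this PR.

```python
import math

def predict_soft(x, weights, means, sigmas):
    """Return the membership probability of x for each 1-D Gaussian component."""
    densities = [
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sigmas)
    ]
    total = sum(densities)
    return [d / total for d in densities]

# Hypothetical two-component mixture; x sits on the first component's mean.
soft = predict_soft(0.0, weights=[0.5, 0.5], means=[0.0, 5.0], sigmas=[1.0, 1.0])

print(soft[1] == 0.0)              # exact match fails: the value is tiny but nonzero
print(abs(soft[1] - 0.0) < 0.001)  # tolerance check passes
```

The second membership probability here is on the order of 1e-6, so a doctest that prints it or compares it to a literal zero would be brittle, while the absolute-difference check is not.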

@jkbradley
Member

That looks good. I also just noticed: Could you please update the docs for Python predict, predictSoft to say they work on RDDs and single vectors? That should be it.
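
One way to picture "works on RDDs and single vectors" is a method that dispatches on its input. The sketch below is plain Python under stated assumptions: a list of vectors stands in for an RDD, and `predict` and `toy_soft` are hypothetical names, not the PySpark implementation (which would test for an RDD rather than a nested list).

```python
def predict(x, predict_soft_one):
    """Return one label for a single vector, or a list of labels for a batch."""
    def label(vector):
        probs = predict_soft_one(vector)
        return probs.index(max(probs))  # most likely component

    # Assumption: a batch is a non-empty list whose items are themselves vectors.
    is_batch = len(x) > 0 and isinstance(x[0], (list, tuple))
    return [label(v) for v in x] if is_batch else label(x)

def toy_soft(vector):
    """Hypothetical soft assignment: two clusters split on the first coordinate."""
    return [0.9, 0.1] if vector[0] < 0 else [0.1, 0.9]

single = predict([-1.0, 0.0], toy_soft)
batch = predict([[-1.0, 0.0], [2.0, 0.0]], toy_soft)
print(single, batch)
```

Documenting both accepted input types in the docstring, as requested above, tells callers which return shape to expect for each.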

@SparkQA

SparkQA commented Jan 11, 2016

Test build #49132 has finished for PR 10552 at commit f6717a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

LGTM
Merging with master
Thanks for the PR!

@asfgit asfgit closed this in ee4ee02 Jan 11, 2016
@yanboliang yanboliang deleted the spark-12603 branch January 12, 2016 01:34