[Spark-21854] Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API #19185

jmwdpk · 2017-09-11T03:28:26Z

What changes were proposed in this pull request?

Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API

How was this patch tested?

Added unit test

gatorsmile · 2017-09-11T03:36:01Z

ok to test

gatorsmile · 2017-09-11T03:36:15Z

@yanboliang

yanboliang · 2017-09-11T03:37:03Z

@gatorsmile Thanks.

SparkQA · 2017-09-11T03:55:24Z

Test build #81615 has finished for PR 19185 at commit 53ac68e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2017-09-11T11:37:27Z

python/pyspark/ml/classification.py

        trained on the training set. An exception is thrown if `trainingSummary is None`.
        """
        if self.hasSummary:
            java_blrt_summary = self._call_java("summary")


Rename this to java_lrt_summary, as it's not always binary logistic regression.

yanboliang · 2017-09-11T11:38:33Z

python/pyspark/ml/classification.py

-            # Note: Once multiclass is added, update this to return correct summary
-            return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
+            if (self.numClasses == 2):
+                java_blrt_binarysummary = self._call_java("binarySummary")


Actually, this is not necessary; we can just wrap java_lrt_summary with BinaryLogisticRegressionTrainingSummary.

yanboliang · 2017-09-11T11:38:54Z

python/pyspark/ml/classification.py

            java_blrt_summary = self._call_java("summary")
-            # Note: Once multiclass is added, update this to return correct summary
-            return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
+            if (self.numClasses == 2):


if (self.numClasses <= 2)
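Taken together, the review comments so far point toward a `summary` property shaped roughly like the sketch below. It is a minimal illustration only and does not use Spark: `FakeModel`, its dictionary-returning `_call_java`, and the choice of `RuntimeError` are hypothetical stand-ins, and the two summary classes are empty placeholders that borrow the names used in this PR.

```python
class LogisticRegressionTrainingSummary:
    """Placeholder for the general (multiclass-capable) summary wrapper."""

    def __init__(self, java_obj):
        self._java_obj = java_obj


class BinaryLogisticRegressionTrainingSummary(LogisticRegressionTrainingSummary):
    """Placeholder for the binary-only summary wrapper."""


class FakeModel:
    """Hypothetical stand-in for the model; not the PySpark class."""

    def __init__(self, numClasses, hasSummary=True):
        self.numClasses = numClasses
        self.hasSummary = hasSummary

    def _call_java(self, name):
        # Assumption: the real method returns a handle to a JVM object.
        return {"java_method": name}

    @property
    def summary(self):
        if self.hasSummary:
            # One JVM call; the same object is wrapped either way.
            java_lrt_summary = self._call_java("summary")
            if self.numClasses <= 2:
                return BinaryLogisticRegressionTrainingSummary(java_lrt_summary)
            else:
                return LogisticRegressionTrainingSummary(java_lrt_summary)
        else:
            # Assumption: the exception type here is illustrative.
            raise RuntimeError("No training summary available for this model")
```

With this shape, no separate "binarySummary" call is needed: only the Python wrapper class differs between the binary and multiclass cases.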

yanboliang · 2017-09-11T11:46:00Z

python/pyspark/ml/classification.py

+        """
+        return self._call_java("recallByLabel")
+
+    @property


Remove this annotation.

yanboliang · 2017-09-11T11:48:34Z

python/pyspark/ml/classification.py

+        """
+        return self._call_java("weightedPrecision")
+
+    @property


Remove this annotation.

yanboliang · 2017-09-11T11:51:44Z

python/pyspark/ml/tests.py

+        self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
+        self.assertAlmostEqual(s.weightedFMeasure, 0.65, 2)
+        # test evaluation (with training dataset) produces a summary with same values
+        # one check is enough to verify a summary is returned, Scala version runs full test


Please add test for evaluation like:

sameSummary = model.evaluate(df)
self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)
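For readers unfamiliar with the third positional argument used in the assertions in this test file (`..., 0.583, 2`), it is `places`: `assertAlmostEqual` rounds the difference of the two values to that many decimal places and compares the result with zero. The sketch below uses made-up numbers, not values computed by Spark, to show one comparison that passes and one that fails.

```python
import unittest


class PlacesExample(unittest.TestCase):
    def test_within_two_places(self):
        # Difference is 0.0003, which rounds to 0.00 at two places.
        self.assertAlmostEqual(0.5833, 0.583, 2)

    def test_outside_two_places(self):
        # Difference is about 0.007, which rounds to 0.01 at two places.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(0.59, 0.583, 2)


def run_examples():
    """Run both example tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlacesExample)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

When `places` is omitted, as in the suggested `sameSummary.accuracy` check, the default of 7 decimal places applies.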

yanboliang · 2017-09-11T11:54:16Z

python/pyspark/ml/tests.py

+        self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
+        self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
+        self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
+        self.assertAlmostEqual(s.weightedFMeasure, 0.65, 2)


We need to add these checks to the above test_logistic_regression_summary and rename it to test_binary_logistic_regression_summary, since the binary logistic regression summary has these variables as well.
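As a worked illustration of where values such as 0.25, 0.75, 0.583 and 0.65 can come from, the sketch below computes label-frequency-weighted averages of the per-label false positive rate, recall, precision and F-measure in plain Python. It assumes the weighted metrics follow the usual multiclass definitions (each per-label metric weighted by that label's share of the data). The labels and predictions are hypothetical, chosen only because they reproduce the asserted numbers; they are not taken from the PR, and `weighted_metrics` is not a Spark function.

```python
from collections import Counter


def weighted_metrics(labels, predictions, beta=1.0):
    """Return label-frequency-weighted FPR, recall, precision and F-measure."""
    n = len(labels)
    pairs = list(zip(labels, predictions))
    totals = {"fpr": 0.0, "recall": 0.0, "precision": 0.0, "f": 0.0}
    for cls, count in Counter(labels).items():
        tp = sum(1 for l, p in pairs if l == cls and p == cls)
        fp = sum(1 for l, p in pairs if l != cls and p == cls)
        negatives = n - count
        # Metrics with a zero denominator are treated as 0.0 in this sketch.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / count
        fpr = fp / negatives if negatives else 0.0
        b2 = beta * beta
        denom = b2 * precision + recall
        f = (1 + b2) * precision * recall / denom if denom else 0.0
        weight = count / n  # share of rows carrying this true label
        totals["fpr"] += weight * fpr
        totals["recall"] += weight * recall
        totals["precision"] += weight * precision
        totals["f"] += weight * f
    return totals


# Hypothetical data: four rows, the single label-0 row is predicted as 2.
example = weighted_metrics(labels=[1, 0, 2, 2], predictions=[1, 2, 2, 2])
```

Under these assumptions the weighted recall equals the overall accuracy (3 of 4 rows correct), which is why both show up as 0.75.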

SparkQA · 2017-09-11T15:55:24Z

Test build #81642 has finished for PR 19185 at commit a4755d7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sethah

Just a few minor things. LGTM otherwise. Thanks!

sethah · 2017-09-11T23:40:14Z

python/pyspark/ml/tests.py

+        self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
+        self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
+        self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
+        self.assertAlmostEqual(s.weightedFMeasure(), 0.65, 2)


maybe add beta=1.0 to the methods that take beta as a parameter.
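The distinction behind this comment is that a `@property` cannot accept arguments, so anything parameterized by `beta` has to be a regular method, which the test can then call with `beta=1.0` spelled out. The sketch below is a toy class, not the PySpark summary: `ToySummary` and its attribute names are hypothetical, and the F-measure is computed from a single precision/recall pair for brevity.

```python
class ToySummary:
    """Hypothetical summary object with one property and one method."""

    def __init__(self, precision, recall):
        self._precision = precision
        self._recall = recall

    @property
    def precision(self):
        # No parameters, so attribute-style access is appropriate.
        return self._precision

    def fMeasure(self, beta=1.0):
        # Takes a parameter, so it stays a method rather than a property.
        b2 = beta * beta
        denom = b2 * self._precision + self._recall
        if denom == 0:
            return 0.0
        return (1 + b2) * self._precision * self._recall / denom


s = ToySummary(precision=1.0, recall=0.5)
implicit = s.fMeasure()          # relies on the default beta
explicit = s.fMeasure(beta=1.0)  # same value, but the test states its intent
```

Passing `beta=1.0` explicitly does not change the result; it only makes the tested parameter visible at the call site.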

sethah · 2017-09-11T23:40:47Z

python/pyspark/ml/tests.py

        self.assertTrue(isinstance(s.precisionByThreshold, DataFrame))
        self.assertTrue(isinstance(s.recallByThreshold, DataFrame))
+
+        self.assertAlmostEqual(s.accuracy, 1.0, 2)


Care to add these to the Scala unit test for the binary summary as well?

Also a nit, but we should probably add tests for all the new attributes, like falsePositiveRateByLabel as below.

sethah · 2017-09-11T23:42:45Z

python/pyspark/ml/classification.py

-            # Note: Once multiclass is added, update this to return correct summary
-            return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
+            java_lrt_summary = self._call_java("summary")
+            if (self.numClasses <= 2):


nit: remove parentheses

SparkQA · 2017-09-12T07:04:45Z

Test build #81660 has finished for PR 19185 at commit eb8f6b4.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

yanboliang · 2017-09-12T09:39:37Z

Jenkins, test this please.

SparkQA · 2017-09-12T10:58:29Z

Test build #81669 has finished for PR 19185 at commit eb8f6b4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang

One nit left, otherwise, LGTM.

yanboliang · 2017-09-14T04:06:54Z

python/pyspark/ml/tests.py

+        # test evaluation (with training dataset) produces a summary with same values
+        # one check is enough to verify a summary is returned, Scala version runs full test
+        sameSummary = model.evaluate(df)
+        self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)


Nit: As mentioned in the comment, one check is enough to verify that a summary is returned; let's remove the others to simplify the test. Thanks.

SparkQA · 2017-09-14T05:40:20Z

Test build #81757 has finished for PR 19185 at commit 6529fa6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yanboliang

LGTM, merged into master. Thanks, all.

Ming Jiang added 4 commits September 10, 2017 20:08

added probabilityCol to LogisticRegressionSummary

50cfafe

modified LogisticRegressionSummary and LogisticRegressionModel in classification.py

60579d5

test on multiclass summary

1a73e6c

fixed numClasses

53ac68e

yanboliang reviewed Sep 11, 2017

0911 added more test and simplified summary logic

a4755d7

sethah reviewed Sep 11, 2017

added more scala unit tests for binary summary, and additional tests for other binary logistic regression summary

eb8f6b4

yanboliang reviewed Sep 14, 2017

removed extra summary test

6529fa6

yanboliang approved these changes Sep 14, 2017

asfgit closed this in 8d8641f Sep 14, 2017

5 participants