Skip to content

Conversation

@jmwdpk
Copy link

@jmwdpk jmwdpk commented Sep 11, 2017

What changes were proposed in this pull request?

Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API

How was this patch tested?

Added unit test

Please review http://spark.apache.org/contributing.html before opening a pull request.

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

@yanboliang

@yanboliang
Copy link
Contributor

@gatorsmile Thanks.

@SparkQA
Copy link

SparkQA commented Sep 11, 2017

Test build #81615 has finished for PR 19185 at commit 53ac68e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

trained on the training set. An exception is thrown if `trainingSummary is None`.
"""
if self.hasSummary:
java_blrt_summary = self._call_java("summary")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to java_lrt_summary, as it's not always binary logistic regression.

# Note: Once multiclass is added, update this to return correct summary
return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
if (self.numClasses == 2):
java_blrt_binarysummary = self._call_java("binarySummary")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not necessary, we can just wrap java_lrt_summary with BinaryLogisticRegressionTrainingSummary.

java_blrt_summary = self._call_java("summary")
# Note: Once multiclass is added, update this to return correct summary
return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
if (self.numClasses == 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (self.numClasses <= 2)

"""
return self._call_java("recallByLabel")

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this annotation.

"""
return self._call_java("weightedPrecision")

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this annotation.

self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
self.assertAlmostEqual(s.weightedFMeasure, 0.65, 2)
# test evaluation (with training dataset) produces a summary with same values
# one check is enough to verify a summary is returned, Scala version runs full test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test for evaluation like:

sameSummary = model.evaluate(df)
self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)

self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
self.assertAlmostEqual(s.weightedFMeasure, 0.65, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add these check for the above test_logistic_regression_summary and rename it to test_binary_logistic_regression_summary, since binary logistic regression summary has these variables as well.

@SparkQA
Copy link

SparkQA commented Sep 11, 2017

Test build #81642 has finished for PR 19185 at commit a4755d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@sethah sethah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor things. LGTM otherwise. Thanks!

self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
self.assertAlmostEqual(s.weightedFMeasure(), 0.65, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add beta=1.0 to the methods that take beta as a parameter.

self.assertTrue(isinstance(s.precisionByThreshold, DataFrame))
self.assertTrue(isinstance(s.recallByThreshold, DataFrame))

self.assertAlmostEqual(s.accuracy, 1.0, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

care to add these to the scala unit test for binary summary as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also nit, but should probably add tests for all the new attributes, like falsePositiveRateByLabel as below.

# Note: Once multiclass is added, update this to return correct summary
return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
java_lrt_summary = self._call_java("summary")
if (self.numClasses <= 2):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove parentheses

…for other binary logistic regression summary
@SparkQA
Copy link

SparkQA commented Sep 12, 2017

Test build #81660 has finished for PR 19185 at commit eb8f6b4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yanboliang
Copy link
Contributor

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 12, 2017

Test build #81669 has finished for PR 19185 at commit eb8f6b4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@yanboliang yanboliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit left, otherwise, LGTM.

# test evaluation (with training dataset) produces a summary with same values
# one check is enough to verify a summary is returned, Scala version runs full test
sameSummary = model.evaluate(df)
self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Like mentioned in annotation, one check is enough to verify a summary is returned, let's remove others to simplify the test. Thanks.

@SparkQA
Copy link

SparkQA commented Sep 14, 2017

Test build #81757 has finished for PR 19185 at commit 6529fa6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@yanboliang yanboliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, merged into master. Thanks for all.

@asfgit asfgit closed this in 8d8641f Sep 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants