[Spark-21854] Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API #19185
ok to test
@gatorsmile Thanks.
Test build #81615 has finished for PR 19185 at commit
python/pyspark/ml/classification.py
    trained on the training set. An exception is thrown if `trainingSummary is None`.
    """
    if self.hasSummary:
        java_blrt_summary = self._call_java("summary")
Rename this to java_lrt_summary, as it's not always binary logistic regression.
python/pyspark/ml/classification.py
    # Note: Once multiclass is added, update this to return correct summary
    return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    if (self.numClasses == 2):
        java_blrt_binarysummary = self._call_java("binarySummary")
Actually, this is not necessary; we can just wrap java_lrt_summary with BinaryLogisticRegressionTrainingSummary.
python/pyspark/ml/classification.py
    java_blrt_summary = self._call_java("summary")
    # Note: Once multiclass is added, update this to return correct summary
    return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    if (self.numClasses == 2):
This should be: if (self.numClasses <= 2)
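A minimal sketch of the dispatch these review comments converge on, written with plain-Python stand-ins. The classes, LogisticRegressionModelSketch, and its fake _call_java are hypothetical stubs rather than the PySpark implementation (the real summary classes wrap JVM objects through Py4J). It fetches a single Java summary, named java_lrt_summary, and wraps it in the binary summary when numClasses <= 2, otherwise in the multiclass one.

```python
class LogisticRegressionTrainingSummary:
    """Hypothetical stub of the multiclass summary; stores the Java-side object."""

    def __init__(self, java_obj):
        self._java_obj = java_obj


class BinaryLogisticRegressionTrainingSummary(LogisticRegressionTrainingSummary):
    """Hypothetical stub of the binary summary; binary-only metrics would live here."""


class LogisticRegressionModelSketch:
    """Hypothetical model stub; _call_java stands in for the Py4J call."""

    def __init__(self, num_classes, has_summary=True):
        self.numClasses = num_classes
        self.hasSummary = has_summary

    def _call_java(self, name):
        # Placeholder for the JVM call: just records which method was requested.
        return {"method": name}

    @property
    def summary(self):
        if self.hasSummary:
            # One Java summary serves both cases, hence the neutral name.
            java_lrt_summary = self._call_java("summary")
            if self.numClasses <= 2:
                return BinaryLogisticRegressionTrainingSummary(java_lrt_summary)
            return LogisticRegressionTrainingSummary(java_lrt_summary)
        raise RuntimeError("No training summary available for this %s"
                           % self.__class__.__name__)


binary_summary = LogisticRegressionModelSketch(num_classes=2).summary
multi_summary = LogisticRegressionModelSketch(num_classes=3).summary
print(type(binary_summary).__name__, type(multi_summary).__name__)
```

Because the binary stub subclasses the multiclass one, code written against the multiclass summary also accepts a binary summary, so no separate binarySummary call is needed for the wrapping.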
python/pyspark/ml/classification.py
| """ | ||
| return self._call_java("recallByLabel") | ||
|
|
||
| @property |
Remove this annotation.
python/pyspark/ml/classification.py
| """ | ||
| return self._call_java("weightedPrecision") | ||
|
|
||
| @property |
Remove this annotation.
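The two "Remove this annotation" comments concern @property. A property cannot take arguments, so metrics that accept beta (presumably fMeasureByLabel and weightedFMeasure, which follow recallByLabel and weightedPrecision) have to stay plain methods and be called with parentheses. Below is a hypothetical plain-Python sketch, assuming weighted metrics are label-frequency-weighted averages of per-label values; SummarySketch and its inputs are illustrative, not the PySpark API.

```python
class SummarySketch:
    """Hypothetical multiclass summary built from per-label statistics."""

    def __init__(self, precision_by_label, recall_by_label, label_weights):
        self._precision = list(precision_by_label)
        self._recall = list(recall_by_label)
        # Assumed: fraction of instances carrying each label (sums to 1).
        self._weights = list(label_weights)

    @property
    def weightedPrecision(self):
        # Takes no arguments, so a read-only property is appropriate.
        return sum(w * p for w, p in zip(self._weights, self._precision))

    def fMeasureByLabel(self, beta=1.0):
        # Takes beta, so it must be a method: a @property cannot accept arguments.
        b2 = beta * beta
        scores = []
        for p, r in zip(self._precision, self._recall):
            denom = b2 * p + r
            scores.append(0.0 if denom == 0 else (1 + b2) * p * r / denom)
        return scores

    def weightedFMeasure(self, beta=1.0):
        # Weighted average of the per-label F-measures.
        return sum(w * f for w, f in zip(self._weights, self.fMeasureByLabel(beta)))


s = SummarySketch([1.0, 0.5], [0.5, 1.0], [0.5, 0.5])
print(s.weightedPrecision)   # property: accessed without parentheses
print(s.weightedFMeasure())  # method: must be called
```

Note that s.weightedFMeasure without parentheses evaluates to a bound method, not a number, so passing it straight to assertAlmostEqual raises a TypeError; calling s.weightedFMeasure() avoids that.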
    self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
    self.assertAlmostEqual(s.weightedFMeasure, 0.65, 2)
    # test evaluation (with training dataset) produces a summary with same values
    # one check is enough to verify a summary is returned, Scala version runs full test
Please add a test for evaluation, like:
    sameSummary = model.evaluate(df)
    self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)
python/pyspark/ml/tests.py
    self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
    self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
    self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
    self.assertAlmostEqual(s.weightedFMeasure, 0.65, 2)
We need to add these checks to the test_logistic_regression_summary test above and rename it to test_binary_logistic_regression_summary, since the binary logistic regression summary has these attributes as well.
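The comment relies on the binary summary being a specialization of the multiclass one, so every multiclass metric is also available on it. A minimal sketch of that relationship, with hypothetical class names and accuracy as the single shared metric: it is defined once on the base class and inherited by the binary subclass.

```python
class LogisticRegressionSummarySketch:
    """Hypothetical multiclass summary: metrics valid for any number of labels."""

    def __init__(self, labels, predictions):
        self._labels = list(labels)
        self._predictions = list(predictions)

    @property
    def accuracy(self):
        # Fraction of instances whose prediction equals the true label.
        hits = sum(1 for y, p in zip(self._labels, self._predictions) if y == p)
        return hits / len(self._labels)


class BinaryLogisticRegressionSummarySketch(LogisticRegressionSummarySketch):
    """Hypothetical binary summary: inherits every multiclass metric; binary-only
    metrics (for example areaUnderROC) would be added here."""


binary = BinaryLogisticRegressionSummarySketch([0, 1, 1, 0], [0, 1, 0, 0])
print(binary.accuracy)  # inherited from the multiclass base class
```

Under this design, a test of the multiclass attributes can run unchanged against a binary summary, which is what renaming and extending the binary test achieves.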
Test build #81642 has finished for PR 19185 at commit
sethah left a comment
Just a few minor things. LGTM otherwise. Thanks!
    self.assertAlmostEqual(s.weightedFalsePositiveRate, 0.25, 2)
    self.assertAlmostEqual(s.weightedRecall, 0.75, 2)
    self.assertAlmostEqual(s.weightedPrecision, 0.583, 2)
    self.assertAlmostEqual(s.weightedFMeasure(), 0.65, 2)
Maybe pass beta=1.0 explicitly to the methods that take beta as a parameter.
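For reference, the standard F-measure for precision P and recall R is shown below, with an illustrative worked example (values chosen for arithmetic, not taken from the test). The result depends on beta, so passing beta=1.0 explicitly documents which value an assertion expects.

```latex
F_\beta = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R}

% Example with P = 0.5, R = 1.0:
F_1 = \frac{2 \cdot 0.5 \cdot 1.0}{0.5 + 1.0} = \frac{1.0}{1.5} \approx 0.667
\qquad
F_2 = \frac{5 \cdot 0.5 \cdot 1.0}{4 \cdot 0.5 + 1.0} = \frac{2.5}{3.0} \approx 0.833
```

With beta = 1 this is the harmonic mean of precision and recall; larger beta weights recall more heavily.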
    self.assertTrue(isinstance(s.precisionByThreshold, DataFrame))
    self.assertTrue(isinstance(s.recallByThreshold, DataFrame))

    self.assertAlmostEqual(s.accuracy, 1.0, 2)
Care to add these to the Scala unit test for the binary summary as well?
Also a nit, but we should probably add tests for all the new attributes, such as falsePositiveRateByLabel below.
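falsePositiveRateByLabel reports, for each label k, the fraction of instances whose true label is not k that were predicted as k, that is FP_k / (FP_k + TN_k). A small unweighted sketch for cross-checking expected values by hand when writing such tests; it ignores instance weights, and the function name is hypothetical.

```python
def false_positive_rate_by_label(labels, predictions, num_classes):
    """Per-label FPR: among instances whose true label is not k,
    the fraction predicted as k. Unweighted; hypothetical helper."""
    rates = []
    for k in range(num_classes):
        # Predictions for the instances that are negatives with respect to label k.
        negatives = [p for y, p in zip(labels, predictions) if y != k]
        if not negatives:
            rates.append(0.0)
            continue
        false_pos = sum(1 for p in negatives if p == k)
        rates.append(false_pos / len(negatives))
    return rates


print(false_positive_rate_by_label([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3))
```

For labels [0, 0, 1, 1, 2, 2] and predictions [0, 1, 1, 1, 2, 0], this returns [0.25, 0.25, 0.0]: one of the four non-0 instances is predicted as 0, one of the four non-1 instances is predicted as 1, and no non-2 instance is predicted as 2.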
python/pyspark/ml/classification.py
    # Note: Once multiclass is added, update this to return correct summary
    return BinaryLogisticRegressionTrainingSummary(java_blrt_summary)
    java_lrt_summary = self._call_java("summary")
    if (self.numClasses <= 2):
nit: remove parentheses
…for other binary logistic regression summary
Test build #81660 has finished for PR 19185 at commit
Jenkins, test this please.
Test build #81669 has finished for PR 19185 at commit
yanboliang left a comment
One nit left; otherwise, LGTM.
    # test evaluation (with training dataset) produces a summary with same values
    # one check is enough to verify a summary is returned, Scala version runs full test
    sameSummary = model.evaluate(df)
    self.assertAlmostEqual(sameSummary.accuracy, s.accuracy)
Nit: as the code comment says, one check is enough to verify that a summary is returned, so let's remove the others to simplify the test. Thanks.
Test build #81757 has finished for PR 19185 at commit
yanboliang left a comment
LGTM, merged into master. Thanks, all.
What changes were proposed in this pull request?
Added LogisticRegressionTrainingSummary for MultinomialLogisticRegression in Python API
How was this patch tested?
Added unit tests.