fix deviance calculation when y = 0
tengpeng committed Apr 23, 2018
commit 3c6a4dab973851e385b6c9a2c77e5684ad6171a4
@@ -782,8 +782,12 @@ object GeneralizedLinearRegression extends DefaultParamsReadable[GeneralizedLinearRegression]

override def variance(mu: Double): Double = mu

+  private def ylogy(y: Double, mu: Double): Double = {
+    if (y == 0) 0.0 else y * math.log(y / mu)
+  }

Member
There is another ylogy implementation in Binomial. Can you move this code to object GeneralizedLinearRegression and make it private to this package?

Contributor Author
Thanks so much for the quick review. I have moved the ylogy implementation to object GeneralizedLinearRegression. One quick question: I am not sure I fully understand why this is the right place for ylogy. Thanks!

Member
Any suggestions for avoiding the duplicated code? Let's follow up on this later if you have an idea.
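For illustration, one way the suggested move could look, as a rough sketch rather than the code that was actually merged: a package-private helper on the companion object that both the Binomial and Poisson families can call. The package name is the one GeneralizedLinearRegression lives in; the real companion object also extends DefaultParamsReadable and contains many other members, omitted here.

package org.apache.spark.ml.regression

object GeneralizedLinearRegression {
  // y * log(y / mu), taking the term to be 0 when y == 0 (the limit of
  // y * log(y) as y -> 0), so deviance sums never turn into NaN.
  // private[regression] keeps it visible to the Binomial and Poisson
  // family objects in this package without exposing it publicly.
  private[regression] def ylogy(y: Double, mu: Double): Double = {
    if (y == 0) 0.0 else y * math.log(y / mu)
  }
}

With something like this in place, the Poisson deviance below could call GeneralizedLinearRegression.ylogy(y, mu) and the duplicate in Binomial could be deleted.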

override def deviance(y: Double, mu: Double, weight: Double): Double = {
-    2.0 * weight * (y * math.log(y / mu) - (y - mu))
+    2.0 * weight * (ylogy(y, mu) - (y - mu))
}

override def aic(
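For context on why the guard is needed: with IEEE doubles, 0.0 * math.log(0.0 / mu) is 0.0 * -Infinity, which evaluates to NaN, so the Poisson deviance of any dataset containing y == 0 previously came out as NaN. A small standalone check (plain Scala, independent of the Spark classes; the object name is made up for the example):

object YLogYCheck {
  // Same guard as in the patch: treat y * log(y / mu) as 0 when y == 0.
  def ylogy(y: Double, mu: Double): Double =
    if (y == 0) 0.0 else y * math.log(y / mu)

  def main(args: Array[String]): Unit = {
    val mu = 0.5
    println(0.0 * math.log(0.0 / mu))                   // NaN: 0.0 * -Infinity
    println(ylogy(0.0, mu))                             // 0.0: guarded term
    // One Poisson deviance term, 2 * weight * (ylogy(y, mu) - (y - mu)):
    println(2.0 * 1.0 * (ylogy(0.0, mu) - (0.0 - mu)))  // 1.0 instead of NaN
  }
}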
@@ -495,8 +495,8 @@ class GeneralizedLinearRegressionSuite extends MLTest with DefaultReadWriteTest
[1] 1.8121235 -0.1747493 -0.5815417
Member
Can you update the R script that generates the deviance?

Contributor Author
Updated. The script is now sufficient to calculate the deviance on its own.

*/
val expected = Seq(
-      Vectors.dense(0.0, -0.0457441, -0.6833928),
-      Vectors.dense(1.8121235, -0.1747493, -0.5815417))
+      Vectors.dense(0.0, -0.0457441, -0.6833928, 3.8093),
+      Vectors.dense(1.8121235, -0.1747493, -0.5815417, 3.7006))

Member
Adding them to expected is not consistent with the rest of the test code.

How about

val residualDeviancesR = Array(3.8093, 3.7006)

Contributor Author
Modified. Thanks!

import GeneralizedLinearRegression._

@@ -507,7 +507,8 @@ class GeneralizedLinearRegressionSuite extends MLTest with DefaultReadWriteTest
val trainer = new GeneralizedLinearRegression().setFamily("poisson").setLink(link)
.setFitIntercept(fitIntercept).setLinkPredictionCol("linkPrediction")
val model = trainer.fit(dataset)
-      val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
+      val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1),
+        model.summary.deviance)
assert(actual ~= expected(idx) absTol 1e-4, "Model mismatch: GLM with poisson family, " +
s"$link link and fitIntercept = $fitIntercept (with zero values).")
Member
assert(model.summary.deviance ~== residualDeviancesR(idx) absTol 1E-3)

idx += 1
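Putting the two test-review suggestions together, the loop body could end up roughly as follows. This is a sketch assembled from the lines visible above and the reviewer's comments, not necessarily the exact code that was finally committed; expected and residualDeviancesR hold the R reference values, while dataset, link, fitIntercept, and idx come from the surrounding test.

val expected = Seq(
  Vectors.dense(0.0, -0.0457441, -0.6833928),
  Vectors.dense(1.8121235, -0.1747493, -0.5815417))
val residualDeviancesR = Array(3.8093, 3.7006)

// Inside the existing loop over (link, fitIntercept):
val trainer = new GeneralizedLinearRegression().setFamily("poisson").setLink(link)
  .setFitIntercept(fitIntercept).setLinkPredictionCol("linkPrediction")
val model = trainer.fit(dataset)
val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
assert(actual ~= expected(idx) absTol 1e-4, "Model mismatch: GLM with poisson family, " +
  s"$link link and fitIntercept = $fitIntercept (with zero values).")
assert(model.summary.deviance ~== residualDeviancesR(idx) absTol 1E-3)
idx += 1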