Skip to content

Conversation

@zhengruifeng
Copy link
Contributor

What changes were proposed in this pull request?

ml.MulticlassClassificationEvaluator & mllib.MulticlassMetrics support log-loss

Why are the changes needed?

log-loss is an important classification metric and is widely used in practice

Does this PR introduce any user-facing change?

Yes, add new option ("logloss") and a related param eps

How was this patch tested?

added testsuites & local tests refering to sklearn



private val confusions = predictionAndLabels.map {
private lazy val confusions = predictionAndLabels.map {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the metricName==logloss, then the confusion matrix is not needed, so I make this computation lazy.

(prediction, label, 1.0)
case other =>
throw new IllegalArgumentException(s"Expected Row of tuples, got $other")
this(predictionAndLabels.rdd.map { r =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matching will not work in pyspark, so I have to use r.get instead.
MultilabelMetrics also deals with dataframe in this way.

@SparkQA
Copy link

SparkQA commented Oct 16, 2019

Test build #112149 has finished for PR 26135 at commit dadf716.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2019

Test build #112150 has finished for PR 26135 at commit 90c8ef2.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2019

Test build #112155 has finished for PR 26135 at commit 2b94170.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2019

Test build #112163 has finished for PR 26135 at commit a981f7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def logLoss(eps: Double = 1e-15): Double = {
require(eps > 0 && eps < 0.5, s"eps must be in range (0, 0.5), but got $eps")
val loss1 = - math.log(eps)
val loss2 = - math.log(1 - eps)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- math.log1p(-eps)? because eps is going to be very small

lazy val labels: Array[Double] = tpByClass.keys.toArray.sorted

/**
* Returns the logLoss, aka logistic loss or cross-entropy loss.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just use a @return tag
Also log-loss rather than logLoss

@SparkQA
Copy link

SparkQA commented Oct 17, 2019

Test build #112217 has finished for PR 26135 at commit 38b901c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking OK pending tests and one very minor comment

/**
* An auxiliary constructor taking a DataFrame.
* @param predictionAndLabels a DataFrame with two double columns: prediction and label
* @param predictionAndLabels a DataFrame with columns: prediction, label, weight(optional)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: spaces before paren

@SparkQA
Copy link

SparkQA commented Oct 18, 2019

Test build #112246 has finished for PR 26135 at commit f46046c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zhengruifeng
Copy link
Contributor Author

merged to master, thanks @srowen for reviewing!

@zhengruifeng zhengruifeng deleted the logloss branch October 18, 2019 09:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants