Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Normalize posteriors, change signature to Map interface
  • Loading branch information
Alan Gardner committed Mar 2, 2015
commit 7d6b5b4801c9402bfdfedf0eb8d9f87be8345efa
Original file line number Diff line number Diff line change
Expand Up @@ -68,19 +68,20 @@ class NaiveBayesModel private[mllib] (
}

def classProbabilities(testData: RDD[Vector]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Classes are index from 0, so would it work to return (for each instance) a Vector of probabilities rather than a Map[Double, Double]? That seems simpler (and more efficient).

Also, I have a PR [https://github.com//pull/3637] for a separate part of MLlib which adds a method like this for predicting class conditional probabilities. I'd like us to use the same name for the prediction method, but I'm open about choosing a name. I had used "predictProbabilities" to (a) have it start with "predict" like other prediction methods and (b) leave open the possibility of supporting a similar method for regression algorithms (which can predict probability distributions). But I'll agree "classProbabilities" is more specific. Do you have strong preferences?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I have no strong preference but "predictProbabilities" makes sense for consistency. I can make that change and the style ones mentioned.

My stats background is not super-strong, @jatinpreet seemed to imply there's a correctness issue with this PR. Can anyone comment on if I've got the math wrong?

RDD[mutable.Map[Double, Double]] = {
RDD[scala.collection.Map[Double, Double]] = {
val bcModel = testData.context.broadcast(this)
testData.mapPartitions { iter =>
val model = bcModel.value
iter.map(model.classProbabilities)
}
}

def classProbabilities(testData: Vector): mutable.Map[Double, Double] = {
def classProbabilities(testData: Vector): scala.collection.Map[Double, Double] = {
val posteriors = (brzPi + brzTheta * testData.toBreeze)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't quite the class probabilities as the expression doesn't have the probability of the evidence incorporated. This would work if you first normalized the probabilities to sum to 1.

val sum = posteriors.sum
val probs:mutable.Map[Double,Double] =
mutable.Map.empty[Double, Double]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can fit on 1 line.
Also, there's no need to specify the type of probs since mutable.Map.empty[Double, Double] makes it clear.

posteriors.foreachPair((k,v) => probs += (labels(k) -> v))
posteriors.foreachPair((k,v) => probs += (labels(k) -> v/sum))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scala style: space after comma + space around "/" operator

probs
}

Expand Down