add 1 un
YY-OnCall committed Jul 28, 2017
commit 94e025055a7755460cb83afe375d11a99dda8c0c
@@ -22,11 +22,11 @@ import org.apache.spark.ml.feature.Instance
import org.apache.spark.ml.linalg._

/**
- * LinearSVCAggregator computes the gradient and loss for loss function ("hinge" or
+ * HingeAggregator computes the gradient and loss for loss function ("hinge" or
* "squared_hinge", as used in binary classification for instances in sparse or dense
Contributor:
It seems this only supports hinge loss currently. BTW, if we add squared hinge support in the future, what is the best way: add a loss-function param to HingeAggregator, or just add a new SquaredHingeAggregator? The latter way would be clearer, but means more code.

Contributor Author:
I would prefer to use SquaredHingeAggregator. API-wise, it looks more intuitive and consistent to me. We can continue the review in the other LinearSVC PR.

Contributor:
Yeah, I agree with you on a separate SquaredHingeAggregator. Then should we remove "squared_hinge" from here? (See the sketch after this diff for how the two losses differ.)
BTW, you missed a right parenthesis here.

* vector in an online fashion.
*
- * Two LinearSVCAggregator can be merged together to have a summary of loss and gradient of
+ * Two HingeAggregators can be merged together to have a summary of loss and gradient of
* the corresponding joint dataset.
*
* This class standardizes feature values during computation using bcFeaturesStd.
@@ -50,11 +50,11 @@ private[ml] class HingeAggregator(
protected override val dim: Int = numFeaturesPlusIntercept

/**
- * Add a new training instance to this LinearSVCAggregator, and update the loss and gradient
+ * Add a new training instance to this HingeAggregator, and update the loss and gradient
* of the objective function.
*
* @param instance The instance of data point to be added.
- * @return This LinearSVCAggregator object.
+ * @return This HingeAggregator object.
*/
def add(instance: Instance): this.type = {
instance match { case Instance(label, weight, features) =>
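To make the loss discussion in the thread above concrete, here is a minimal sketch (illustrative only, not code from this PR) contrasting the two options; `labelScaled` is assumed to be the 0/1 label mapped to -1/+1, and `margin` the raw prediction `coefficients . features + intercept`:

```scala
// Sketch only (assumed names, not this PR's code): the two loss options
// discussed above. labelScaled is the 0/1 label mapped to -1/+1;
// margin is the raw model output (coefficients dot features + intercept).
object HingeLosses {
  def hingeLoss(labelScaled: Double, margin: Double): Double =
    math.max(0.0, 1.0 - labelScaled * margin)

  def squaredHingeLoss(labelScaled: Double, margin: Double): Double = {
    val h = math.max(0.0, 1.0 - labelScaled * margin)
    h * h
  }

  // Derivatives w.r.t. the margin; an aggregator scales each feature value
  // by this factor when accumulating the gradient for one instance.
  def hingeGradientFactor(labelScaled: Double, margin: Double): Double =
    if (1.0 - labelScaled * margin > 0.0) -labelScaled else 0.0

  def squaredHingeGradientFactor(labelScaled: Double, margin: Double): Double =
    if (1.0 - labelScaled * margin > 0.0)
      -2.0 * labelScaled * (1.0 - labelScaled * margin)
    else 0.0
}
```

The gradient factors differ in shape (piecewise constant vs. piecewise linear), which is one argument for a separate SquaredHingeAggregator rather than branching on a loss param inside every add call.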
@@ -67,7 +67,7 @@ class HingeAggregatorSuite extends SparkFunSuite with MLlibTestSparkContext {
val interceptArray = Array(2.0)
Contributor:
ditto

val agg = getNewAggregator(instances, Vectors.dense(coefArray ++ interceptArray),
fitIntercept = true)
withClue("LogisticAggregator does not support negative instance weights") {
withClue("HingeAggregator does not support negative instance weights") {
intercept[IllegalArgumentException] {
agg.add(Instance(1.0, -1.0, Vectors.dense(2.0, 1.0)))
}
@@ -133,4 +133,18 @@ class HingeAggregatorSuite extends SparkFunSuite with MLlibTestSparkContext {
assert(gradient ~== agg.gradient relTol 0.01)
}

Contributor:
The other aggregator tests have one for "zero standard deviation". We should add one here too.

Contributor Author:
Sure. Added.

test("check with zero standard deviation") {
Contributor:
As we found out in #18896, this test is not thorough enough. We should check all elements of the gradient for correctness. (A sketch of such a check follows at the end of this page.)

+    val instancesConstantFeature = Array(
Contributor:
What about moving the dataset to the beginning of this class?

Contributor Author:
OK

+      Instance(0.0, 0.1, Vectors.dense(1.0, 2.0)),
+      Instance(1.0, 0.5, Vectors.dense(1.0, 1.0)),
+      Instance(1.0, 0.3, Vectors.dense(1.0, 0.5)))
+    val binaryCoefArray = Array(1.0, 2.0)
+    val intercept = 1.0
+    val aggConstantFeatureBinary = getNewAggregator(instancesConstantFeature,
+      Vectors.dense(binaryCoefArray ++ Array(intercept)), fitIntercept = true)
+    instances.foreach(aggConstantFeatureBinary.add)
+    // constant features should not affect gradient
+    assert(aggConstantFeatureBinary.gradient(0) === 0.0)
+  }

}
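Following up on the #18896 comment above, here is a standalone, hedged sketch of what checking every gradient element could look like; all names and the hand-rolled standardization here are assumptions for illustration, not the suite's actual code. Note also that the added test's loop feeds `instances` rather than `instancesConstantFeature`, which may be unintended; this sketch uses the constant-feature data itself:

```scala
// Standalone sketch (assumed names, not the suite's code) of the
// "check all elements of the gradient" idea from the review thread.
object ZeroStdGradientCheck {
  // (label, weight, features); feature 0 is constant across instances,
  // so its standard deviation is zero.
  val data = Seq(
    (0.0, 0.1, Array(1.0, 2.0)),
    (1.0, 0.5, Array(1.0, 1.0)),
    (1.0, 0.3, Array(1.0, 0.5)))

  // Population standard deviation per feature column.
  def std(rows: Seq[Array[Double]]): Array[Double] = {
    val n = rows.size.toDouble
    val dim = rows.head.length
    val mean = Array.tabulate(dim)(j => rows.map(_(j)).sum / n)
    Array.tabulate(dim) { j =>
      math.sqrt(rows.map(r => math.pow(r(j) - mean(j), 2)).sum / n)
    }
  }

  // Weighted hinge gradient over standardized features; zero-variance
  // columns are skipped, mirroring how the aggregator guards on bcFeaturesStd.
  def hingeGradient(coefs: Array[Double], intercept: Double): Array[Double] = {
    val s = std(data.map(_._3))
    val grad = Array.fill(coefs.length + 1)(0.0) // last slot: intercept
    var weightSum = 0.0
    for ((label, weight, features) <- data) {
      val margin = features.indices.map { j =>
        if (s(j) != 0.0) coefs(j) * features(j) / s(j) else 0.0
      }.sum + intercept
      val y = 2.0 * label - 1.0          // map {0, 1} to {-1, +1}
      if (1.0 - y * margin > 0.0) {      // inside the hinge
        for (j <- features.indices if s(j) != 0.0) {
          grad(j) += weight * -y * features(j) / s(j)
        }
        grad(coefs.length) += weight * -y // intercept component
      }
      weightSum += weight
    }
    grad.map(_ / weightSum)
  }

  def main(args: Array[String]): Unit = {
    val grad = hingeGradient(Array(1.0, 2.0), 1.0)
    assert(grad(0) == 0.0, "constant feature must not move the gradient")
    // The point of the review comment: assert on *every* element, not just
    // the constant feature's slot.
    grad.zipWithIndex.foreach { case (g, j) => println(s"grad($j) = $g") }
  }
}
```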