Commit 5eab030

DOC copyedit SGDClassifier docstring
Make it clear immediately that this beast fits SVMs by default, logistic regression if you ask for it.
1 parent e24a307 commit 5eab030

File tree

1 file changed, +16 -12 lines changed

sklearn/linear_model/stochastic_gradient.py

Lines changed: 16 additions & 12 deletions
@@ -520,11 +520,18 @@ def fit(self, X, y, coef_init=None, intercept_init=None,
 
 
 class SGDClassifier(BaseSGDClassifier, _LearntSelectorMixin):
-    """Linear model fitted by minimizing a regularized empirical loss with SGD.
+    """Linear classifiers (SVM, logistic regression, a.o.) with SGD training.
 
-    SGD stands for Stochastic Gradient Descent: the gradient of the loss is
-    estimated each sample at a time and the model is updated along the way with
-    a decreasing strength schedule (aka learning rate).
+    This estimator implements regularized linear models with stochastic
+    gradient descent (SGD) learning: the gradient of the loss is estimated
+    each sample at a time and the model is updated along the way with a
+    decreasing strength schedule (aka learning rate). SGD allows minibatch
+    (online/out-of-core) learning, see the partial_fit method.
+
+    This implementation works with data represented as dense or sparse arrays
+    of floating point values for the features. The model it fits can be
+    controlled with the loss parameter; by default, it fits a linear support
+    vector machine (SVM).
 
     The regularizer is a penalty added to the loss function that shrinks model
     parameters towards the zero vector using either the squared euclidean norm
@@ -533,19 +540,16 @@ class SGDClassifier(BaseSGDClassifier, _LearntSelectorMixin):
     update is truncated to 0.0 to allow for learning sparse models and achieve
     online feature selection.
 
-    This implementation works with data represented as dense or sparse arrays
-    of floating point values for the features.
-
     Parameters
     ----------
     loss : str, 'hinge', 'log', 'modified_huber', 'squared_hinge',\
            'perceptron', or a regression loss: 'squared_loss', 'huber',\
            'epsilon_insensitive', or 'squared_epsilon_insensitive'
-        The loss function to be used. Defaults to 'hinge'. The hinge loss is
-        a margin loss used by standard linear SVM models. The 'log' loss is
-        the loss of logistic regression models and can be used for
-        probability estimation in binary classifiers. 'modified_huber'
-        is another smooth loss that brings tolerance to outliers.
+        The loss function to be used. Defaults to 'hinge', which gives a
+        linear SVM.
+        The 'log' loss gives logistic regression, a probabilistic classifier.
+        'modified_huber' is another smooth loss that brings tolerance to
+        outliers as well as probability estimates.
         'squared_hinge' is like hinge but is quadratically penalized.
         'perceptron' is the linear loss used by the perceptron algorithm.
         The other losses are designed for regression but can be useful in

0 comments