@@ -141,17 +141,19 @@ The following Theano code defines the (symbolic) loss for a given minibatch:
 
 .. code-block:: python
 
-    loss = -T.sum(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
+    loss = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
     # note on syntax: T.arange(y.shape[0]) is a vector of integers [0,1,2,...,len(y)-1].
     # Indexing a matrix M by the two vectors [0,1,...,K], [a,b,...,k] returns the
     # elements M[0,a], M[1,b], ..., M[K,k] as a vector. Here, we use this
     # syntax to retrieve the log-probability of the correct labels, y.
 
 .. note::
 
-    In practice, we will use the mean (T.mean) instead of the sum. This
-    allows for the learning rate choice to be less dependent of the minibatch size.
-
+    Even though the loss is formally defined as the *sum*, over the data set,
+    of individual error terms, in practice we use the *mean* (``T.mean``)
+    in the code. This makes the choice of learning rate less dependent
+    on the minibatch size.
+
 
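To make the indexing trick concrete, here is a small self-contained check, not part of the tutorial's code, that evaluates the expression on a toy minibatch (the probability values and variable names are illustrative only):

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    # toy minibatch: 3 examples, 4 classes; y holds the correct class indices
    p_y_given_x = T.matrix('p_y_given_x')
    y = T.ivector('y')

    # [T.arange(y.shape[0]), y] picks out p[0, y[0]], p[1, y[1]], p[2, y[2]]
    loss = -T.mean(T.log(p_y_given_x)[T.arange(y.shape[0]), y])
    f = theano.function([p_y_given_x, y], loss)

    p = numpy.array([[0.7, 0.1, 0.1, 0.1],
                     [0.1, 0.8, 0.05, 0.05],
                     [0.25, 0.25, 0.25, 0.25]], dtype=theano.config.floatX)
    # prints the mean of -log(0.7), -log(0.8) and -log(0.25)
    print(f(p, numpy.array([0, 1, 3], dtype='int32')))
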
 Creating a LogisticRegression class
 +++++++++++++++++++++++++++++++++++
@@ -191,21 +193,21 @@ similar to what we have covered so far, and should be self-explanatory.
 
 
     def negative_log_likelihood(self, y):
-        """Return the negative log-likelihood of the prediction of this
-        model under a given target distribution.
+        """Return the mean of the negative log-likelihood of the prediction
+        of this model under a given target distribution.
 
         .. math::
 
-            \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
-            \sum_{i=0}^{|\mathcal{D}|} \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
+            \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
+            \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|} \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
             \ell (\theta=\{W,b\}, \mathcal{D})
 
 
         :param y: corresponds to a vector that gives for each example the
                   correct label;
 
-        note: in practice we use mean instead of sum so that
-              learning rate is less dependent on the batch size
+        Note: we use the mean instead of the sum so that
+              the learning rate is less dependent on the batch size
         """
         return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
 
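For context, ``negative_log_likelihood`` lives inside the tutorial's ``LogisticRegression`` class. A condensed sketch of the surrounding class follows; the attribute names (``W``, ``b``, ``p_y_given_x``, ``y_pred``) match the tutorial, but the body is abbreviated and should not be read as the full implementation:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    class LogisticRegression(object):
        """Condensed sketch of the tutorial's softmax-regression class."""

        def __init__(self, input, n_in, n_out):
            # weights and biases start at zero, as in the tutorial
            self.W = theano.shared(
                numpy.zeros((n_in, n_out), dtype=theano.config.floatX), name='W')
            self.b = theano.shared(
                numpy.zeros((n_out,), dtype=theano.config.floatX), name='b')
            # one row of softmax class probabilities per example
            self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
            # predicted class: index of the largest probability in each row
            self.y_pred = T.argmax(self.p_y_given_x, axis=1)

        def negative_log_likelihood(self, y):
            # the method shown in the diff above
            return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
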
@@ -231,11 +233,8 @@ the instance method ``classifier.negative_log_likelihood``.
 
 .. code-block:: python
 
-    cost = classifier.negative_log_likelihood(y)
+    cost = classifier.negative_log_likelihood(y)
 
-    Note that the return value of ``classifier.negative_log_likelihood`` is a vector
-    containing the cost for each training example within the minibatch. Since we are
-    using MSGD, the cost to minimize is the mean cost across the minibatch.
 Note how ``x`` is an implicit symbolic input to the symbolic definition of ``cost``
 here, because ``classifier.__init__`` has defined its symbolic variables in terms of ``x``.
 
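A minimal sketch of this implicit dependency, reusing the ``LogisticRegression`` sketch above (``cost_fn`` is an illustrative name; the input sizes are the tutorial's MNIST setting):

.. code-block:: python

    import theano
    import theano.tensor as T

    x = T.matrix('x')   # symbolic minibatch of rasterized images
    y = T.ivector('y')  # symbolic vector of correct labels

    # 28*28 input pixels, 10 digit classes
    classifier = LogisticRegression(input=x, n_in=28 * 28, n_out=10)
    cost = classifier.negative_log_likelihood(y)

    # x must be listed as an input even though cost never names it directly:
    # it enters through classifier.p_y_given_x
    cost_fn = theano.function(inputs=[x, y], outputs=cost)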