
Commit 4f56473

Some changes in text so that it remains in sync with the code

1 parent 4211d6f commit 4f56473
3 files changed: 126 additions & 63 deletions

code/linear.py
Lines changed: 2 additions & 2 deletions
@@ -190,6 +190,6 @@ def callback(w_b_value):
              tuple(validation_scores))
 
 if __name__ == '__main__':
-    sgd_optimization_mnist()
-    #cg_optimization_mnist()
+    #sgd_optimization_mnist()
+    cg_optimization_mnist()
 

doc/logreg.txt

Lines changed: 90 additions & 45 deletions
@@ -40,27 +40,44 @@ The code to do this in theano is the following:
 
 .. code-block:: python
 
-    # allocate shared variables for inputs and model params
-    x = theano.shared(numpy.zeros((5,784))
-    y = theano.shared(numpy.zeros((5))
+    # generate symbolic variables for input (x and y represent a
+    # minibatch)
+    x = T.fmatrix()
+    y = T.lvector()
+
+    # allocate shared variables for model params
     b = theano.shared(numpy.random(10))
     W = theano.shared(numpy.random(784,10))
 
-    # compute vector of class-membership probabilities
+    # symbolic expression for computing the vector of
+    # class-membership probabilities
     p_y_given_x = T.softmax(T.dot(x,w)+b)
 
-    print 'Probability that x is of class %i is %f' % i, p_y_given_x[i]
+    # compiled theano function that returns the vector of class-membership
+    # probabilities
+    get_p_y_given_x = theano.function(x, p_y_given_x)
+
+    # print the probability of some example represented by x_value
+    # x_value is not a symbolic variable but a numpy array describing the
+    # datapoint
+    print 'Probability that x is of class %i is %f' % i, get_p_y_given_x(x_value)[i]
 
-    # compute prediction as class whose probability is maximal
+    # symbolic description of how to compute prediction as class whose probability
+    # is maximal
     y_pred = T.argmax(p_y_given_x)
-    classify = pfunc([x,y], y_pred)
+
+    # compiled theano function that returns this value
+    classify = theano.function([x,y], y_pred)
 
 
-We first start by allocating shared variables for the parameters :math:`W,b`
-and inputs :math:`x,y`. This step declares them both as symbolic theano
+We first start by allocating symbolic variables for the inputs
+:math:`x,y`. Afterwards we allocate shared variables for the parameters :math:`W,b`.
+This step declares them both as symbolic theano
 variables, but also initializes their contents. The dot and softmax operators
 are then used to compute the vector :math:`P(Y|x, W,b)`. The resulting
-variable p_y_given_x is a vector and can thus be indexed to retrieve a
+variable p_y_given_x is a symbolic variable pointing to a vector. The function
+`get_p_y_given_x` computes this vector for a given x. The output of the
+function is a vector and can thus be indexed to retrieve a
 particular entry :math:`P(Y=i|x, W,b)`. The final model prediction is then
 computed using the T.argmax operator.
 
@@ -103,10 +120,10 @@ The following Theano code defines the loss for a given minibatch:
 
 .. code-block:: python
 
-    loss = theano.sum(theano.log(p_y_given_x)[y])
+    loss = T.sum(T.log(p_y_given_x)[y])
 
 .. note::
-    In practice, we will use the mean (theano.mean) instead of the sum. This
+    In practice, we will use the mean (T.mean) instead of the sum. This
     allows for the learning rate to be independent of the minibatch size.
 
 
@@ -120,6 +137,7 @@ encapsulates the basic behaviour for LogisticRegression:
 
 class LogisticRegression(object):
 
+
     def __init__(self, input, n_in, n_out):
         """ Initialize the parameters of the logistic regression
         :param input: symbolic variable that describes the input of the
@@ -160,9 +178,15 @@ We instantiate the class and declare a global cost which we wish to minimize:
 .. code-block:: python
 
     # allocate symbolic variables for the data
-    x = tensor.fmatrix()  # the data is presented as rasterized images
-    y = tensor.lvector()  # the labels are presented as 1D vector of [long int] labels
-    classifier = LogisticRegression(input=x.reshape((batch_size,784)), n_in=784, n_out=10)
+    x = T.fmatrix()  # the data is presented as rasterized images
+    y = T.lvector()  # the labels are presented as 1D vector of [long int] labels
+
+    # construct the logistic regression class
+    classifier = LogisticRegression( \
+                    input=x.reshape((batch_size,28*28)), n_in=28*28, n_out=10)
+
+    # the cost we minimize during training is the negative log likelihood of
+    # the model in symbolic format
     cost = classifier.negative_log_likelihood(y).mean()
 
 
@@ -219,46 +243,67 @@ The finished product is as follows:
 .. code-block:: python
 
     # early-stopping parameters
-    patience = 2000    # look at this many examples regardless
-    patience_increase = 2     # wait this much longer when a new best is
-                              # found
-    improvement_threshold = 0.99  # a relative improvement of this much is
-                                  # considered significant
-    validation_frequency = 1000   # make this many SGD updates between
-                                  # validations
+    patience = 5000    # look at this many examples regardless
+    patience_increase = 2     # wait this much longer when a new best is
+                              # found
+    improvement_threshold = 0.995  # a relative improvement of this much is
+                                   # considered significant
+    validation_frequency = 1000    # make this many SGD updates between
+                                   # validations
 
     best_params = None
     best_validation_loss = float('inf')
+    test_score = 0.
+
+    # have a maximum of `n_iter` iterations through the entire dataset
+    for iter in xrange(n_iter * len(train_batches)):
+
+        # get epoch and minibatch index
+        epoch = iter / len(train_batches)
+        minibatch_index = iter % len(train_batches)
+
+        # get the minibatches corresponding to `iter` modulo
+        # `len(train_batches)`
+        x, y = train_batches[minibatch_index]
+        cost_ij = train_model(x, y)
+
+        if (iter + 1) % validation_frequency == 0:
+            # compute zero-one loss on validation set
+            this_validation_loss = 0.
+            for x, y in valid_batches:
+                # sum up the errors for each minibatch
+                this_validation_loss += test_model(x, y)
+            # get the average by dividing with the number of minibatches
+            this_validation_loss /= len(valid_batches)
+
+            print('epoch %i, validation error %f' %
+                  (epoch, this_validation_loss))
 
+            # improve patience
+            if this_validation_loss < best_validation_loss * \
+                    improvement_threshold:
+                patience = max(patience, iter * patience_increase)
 
-    for i in xrange(n_iter):
-        # go through the training set and update the model parameters
-        for x, y in train_batches:
-            cost_ij = train_model(x, y)
-
 
-        # test the model on the validation set (measuring the average
-        # number of errors)
-        valid_score = 0.
-        for x, y in valid_batches:
-            # sum up the errors for each minibatch
-            valid_score += test_model(x, y)
-        # get the average by dividing with the number of minibatches
-        valid_score /= len(valid_batches)
+            # if we got the best validation score until now
+            if this_validation_loss < best_validation_loss:
+                best_validation_loss = this_validation_loss
+                # test it on the test set
+
+                test_score = 0.
+                for x, y in test_batches:
+                    test_score += test_model(x, y)
+                test_score /= len(test_batches)
+                print('    epoch %i, test error of best model %f' %
+                      (epoch, test_score))
 
-        print('epoch %i, validation error %f' % (i, valid_score))
+        if patience <= iter:
+            break
 
 
-        # if we got the best validation score until now
-        if valid_score < best_valid_score:
-            best_valid_score = valid_score
-            # test it on the test set
+    print(('Optimization complete with best validation score of %f,'
+           'with test performance %f') % (best_validation_loss, test_score))
 
-        test_score = 0.
-        for x, y in test_batches:
-            test_score += test_model(x, y)
-        test_score /= len(test_batches)
-        print('epoch %i, test error of best model %f' % (i, test_score))
 
 
doc/optimization.txt

Lines changed: 34 additions & 16 deletions
@@ -86,7 +86,7 @@ hierarchical memory organization in modern computers.
 
 .. code-block:: python
 
-    for (x_batch,y_batch) in training_set_batches(batchsize=B):
+    for (x_batch,y_batch) in train_batches:
         # imagine an infinite generator
         # that may repeat examples
         loss = f(params, x_batch, y_batch)
@@ -113,17 +113,17 @@ is almost arbitrary (though harmless).
 
 .. code-block:: python
 
-    zero_one_loss = theano.sum(theano.neq(argmax(p_y_given_x), y)) ???
+    zero_one_loss = T.sum(T.neq(argmax(p_y_given_x), y)) ???
 
-    loss = theano.sum(theano.log(p_y_given_x)[y])  # option 1 (TODO: advanced indexing, optimization pattern)
+    loss = T.sum(T.log(p_y_given_x)[y])  # option 1 (TODO: advanced indexing, optimization pattern)
 
-    loss = theano.log(p_y_given_x[0,y[0]]) + theano.log(p_y_given_x[1, y[1]])  # option 2: simple indexing on each minibatch element
+    loss = T.log(p_y_given_x[0,y[0]]) + theano.log(p_y_given_x[1, y[1]])  # option 2: simple indexing on each minibatch element
 
-    loss = theano.sum(theano.log(p_y_given_x) * one_of_n(y))  # option 3 (TODO: one_of_n:: integer array, optimization pattern)
+    loss = T.sum(theano.log(p_y_given_x) * one_of_n(y))  # option 3 (TODO: one_of_n:: integer array, optimization pattern)
 
-    loss = theano.sum(theano.nnet.categorical_crossentropy(p_y_given_x, y))  # option 4:
+    loss = T.sum(theano.nnet.categorical_crossentropy(p_y_given_x, y))  # option 4:
 
-    gw, gb = theano.grad(L, [w,b])
+    gw, gb = T.grad(L, [w,b])
 
 
@@ -158,28 +158,46 @@ of a strategy based on a geometrically increasing amount of patience.
     # params refers to [initialized] parameters of our model
 
     # early-stopping parameters
-    patience = 2000    # look at this many training examples regardless
-    patience_increase = 2    # wait this much longer when a new best is found
-    improvement_threshold = 0.99  # a relative improvement of this much is considered significant
-    validation_frequency = 1000   # make this many SGD updates between validations
+    n_iter = 100    # the maximal number of iterations of the
+                    # entire dataset considered
+    patience = 5000    # look at this many training examples regardless
+    patience_increase = 2    # wait this much longer when a new best is
+                             # found
+    improvement_threshold = 0.995  # a relative improvement of this much is
+                                   # considered significant
+    validation_frequency = 1000    # make this many SGD updates between validations
 
     # initialize cross-validation variables
     best_params = None
     best_validation_loss = float('inf')
 
-    for iter, (x_batch,y_batch) in enumerate(training_set_batches(batchsize=B)):
+    for iter in xrange(n_iter * len(train_batches)):
+
+        # get epoch and minibatch index
+        epoch = iter / len(train_batches)
+        minibatch_index = iter % len(train_batches)
+
+        # get the minibatches corresponding to `iter` modulo
+        # `len(train_batches)`
+        x, y = train_batches[minibatch_index]
+
+
         d_loss_wrt_params = ...  # compute gradient
         params -= learning_rate * d_loss_wrt_params  # gradient descent
 
-        if iter % validation_frequency == 0:
+        # note that if we do `iter % validation_frequency` it will be
+        # true for iter = 0 which we do not want
+        if (iter + 1) % validation_frequency == 0:
 
             this_validation_loss = ...  # compute zero-one loss on validation set
-            if this_validation_loss < best_validation_loss:
-                best_params = copy.deepcopy(params)
-                best_validation_loss = this_validation_loss
 
+            # improve patience
             if this_validation_loss < best_validation_loss*improvement_threshold:
                 patience = iter * patience_increase
+
+            if this_validation_loss < best_validation_loss:
+                best_params = copy.deepcopy(params)
+                best_validation_loss = this_validation_loss
 
         if patience <= iter:
             break
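
The index arithmetic in the loop above is easy to trace with toy values (both constants below are made up); note that the (iter + 1) form indeed never triggers a validation at iter = 0:

    validation_frequency = 3
    n_train_batches = 4

    for iter in range(10):
        epoch = iter // n_train_batches            # completed passes over the data
        minibatch_index = iter % n_train_batches   # position within the pass
        validate = (iter + 1) % validation_frequency == 0
        print(iter, epoch, minibatch_index, validate)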
