@@ -6,7 +6,7 @@ Convolutional Neural Networks (LeNet)
66.. note::
77 This section assumes the reader has already read through :doc:`logreg` and
88 :doc:`mlp`. Additionally, it uses the following new Theano functions and concepts:
9- `T.tanh`_, `shared variables`_, `basic arithmetic ops`_, `T.grad`_,
9+ `T.tanh`_, `shared variables`_, `basic arithmetic ops`_, `T.grad`_,
1010 `floatX`_, `downsample`_ , `conv2d`_, `dimshuffle`_. If you intend to run the
1111 code on a GPU, also read `GPU`_.
1212
@@ -33,7 +33,7 @@ Convolutional Neural Networks (LeNet)
3333
3434.. _floatX: http://deeplearning.net/software/theano/library/config.html#config.floatX
3535
36- .. _GPU: http://deeplearning.net/software/theano/tutorial/using_gpu.html
36+ .. _GPU: http://deeplearning.net/software/theano/tutorial/using_gpu.html
3737
3838.. _downsample: http://deeplearning.net/software/theano/library/tensor/signal/downsample.html
3939
@@ -84,7 +84,7 @@ contiguous receptive fields. We can illustrate this graphically as follows:
8484Imagine that layer **m-1** is the input retina.
8585In the above, units in layer **m**
8686have receptive fields of width 3 with respect to the input retina and are thus only
87- connected to 3 adjacent neurons in the layer below (the retina).
87+ connected to 3 adjacent neurons in the layer below (the retina).
8888Units in layer **m+1** have
8989a similar connectivity with the layer below. We say that their receptive
9090field with respect to the layer below is also 3, but their receptive field
@@ -135,7 +135,7 @@ feature map :math:`h^k` is obtained as follows (for :math:`tanh` non-linearities
135135.. math::
136136 h^k_{ij} = \tanh ( (W^k * x)_{ij} + b_k ).
137137
138- .. Note::
138+ .. Note::
139139 Recall the following definition of convolution for a 1D signal.
140140 :math:`o[n] = f[n]*g[n] = \sum_{u=-\infty}^{\infty} f[u] g[n-u] = \sum_{u=-\infty}^{\infty} f[n-u] g[u]`.
141141
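To make the formula above concrete, here is a tiny NumPy/SciPy sketch (purely
illustrative; ``x``, ``W_k`` and ``b_k`` are made-up names, not part of the
tutorial code) that computes one feature map with a true, kernel-flipping
convolution:

.. code-block:: python

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.RandomState(0)
    x = rng.rand(7, 7)      # a toy single-channel input "image"
    W_k = rng.rand(3, 3)    # one 3x3 filter
    b_k = 0.1               # scalar bias for this feature map

    # 'valid' keeps only positions where the filter fits entirely inside x,
    # so the feature map has shape (7-3+1, 7-3+1) = (5, 5)
    h_k = np.tanh(convolve2d(x, W_k, mode='valid') + b_k)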
@@ -144,10 +144,10 @@ feature map :math:`h^k` is obtained as follows (for :math:`tanh` non-linearities
144144
145145To form a richer representation of the data, hidden layers are composed of
146146a set of multiple feature maps, :math:`\{h^{(k)}, k=0..K\}`.
147- The weights :math:`W` of this layer can be parametrized as a 4D tensor
147+ The weights :math:`W` of this layer can be parametrized as a 4D tensor
148148(destination feature map index, source feature map index, source vertical position index, source horizontal position index)
149149and
150- the biases :math:`b` as a vector (one element per destination feature map index).
150+ the biases :math:`b` as a vector (one element per destination feature map index).
151151We illustrate this graphically as follows:
152152
153153.. figure:: images/cnn_explained.png
@@ -167,7 +167,7 @@ input feature maps, while the other two refer to the pixel coordinates.
167167
168168Putting it all together, :math:`W^{kl}_{ij}` denotes the weight connecting
169169each pixel of the k-th feature map at layer m, with the pixel at coordinates
170- (i,j) of the l-th feature map of layer (m-1).
170+ (i,j) of the l-th feature map of layer (m-1).
171171
172172
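To make the 4D indexing above concrete, the following NumPy sketch (the array
names are invented for illustration, not taken from the tutorial code) builds
the feature maps of layer m by looping explicitly over ``W[k, l]``:

.. code-block:: python

    import numpy as np
    from scipy.signal import convolve2d

    rng = np.random.RandomState(0)
    L, K = 2, 3                # number of input / output feature maps
    x = rng.rand(L, 8, 8)      # layer (m-1): L maps of 8x8 positions
    W = rng.rand(K, L, 3, 3)   # W[k, l, i, j], as described above
    b = rng.rand(K)            # one bias per destination feature map

    h = np.empty((K, 6, 6))    # 8-3+1 = 6 valid positions per axis
    for k in range(K):
        acc = np.zeros((6, 6))
        for l in range(L):     # sum the contributions of all input maps
            acc += convolve2d(x[l], W[k, l], mode='valid')
        h[k] = np.tanh(acc + b[k])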
173173The ConvOp
@@ -208,7 +208,7 @@ one of Figure 1. The input consists of 3 features maps (an RGB color image) of s
208208 high=1.0 / w_bound,
209209 size=w_shp),
210210 dtype=input.dtype), name ='W')
211-
211+
212212 # initialize shared variable for bias (1D tensor) with random values
213213 # IMPORTANT: biases are usually initialized to zero. However in this
214214 # particular application, we simply apply the convolutional layer to
@@ -223,10 +223,10 @@ one of Figure 1. The input consists of 3 features maps (an RGB color image) of s
223223 conv_out = conv.conv2d(input, W)
224224
225225 # build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
226- # A few words on ``dimshuffle`` :
226+ # A few words on ``dimshuffle`` :
227227 # ``dimshuffle`` is a powerful tool in reshaping a tensor;
228- # what it allows you to do is to shuffle dimension around
229- # but also to insert new ones along which the tensor will be
228+ # what it allows you to do is to shuffle dimensions around
229+ # but also to insert new ones along which the tensor will be
230230 # broadcastable;
231231 # dimshuffle('x', 2, 'x', 0, 1)
232232 # This will work on 3d tensors with no broadcastable
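    # --- illustrative aside, not part of the original tutorial script -----
    # A concrete dimshuffle example: turn a length-2 vector into a 4D
    # tensor whose first, third and fourth axes are new and broadcastable,
    # the usual pattern for adding one bias per feature map to a
    # (batch, feature map, row, column) tensor such as conv_out.
    b_demo = theano.shared(numpy.zeros(2, dtype=input.dtype), name='b_demo')
    b_demo4d = b_demo.dimshuffle('x', 0, 'x', 'x')
    # b_demo4d.broadcastable is (True, False, True, True)
    # -----------------------------------------------------------------------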
@@ -268,7 +268,7 @@ Let's have a little bit of fun with this...
268268 # plot original image and first and second components of output
269269 pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
270270 pylab.gray();
271- # recall that the convOp output (filtered image) is actually a "minibatch",
271+ # recall that the convOp output (filtered image) is actually a "minibatch",
272272 # of size 1 here, so we take index 0 in the first dimension:
273273 pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
274274 pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
@@ -280,7 +280,7 @@ This should generate the following output.
280280.. image:: images/3wolfmoon_output.png
281281 :align: center
282282
283- Notice that a randomly initialized filter acts very much like an edge detector!
283+ Notice that a randomly initialized filter acts very much like an edge detector!
284284
285285Also note that we use the same weight initialization formula as
286286with the MLP. Weights are sampled randomly from a uniform distribution in the
@@ -384,7 +384,7 @@ The lower-layers are composed to alternating convolution and max-pooling
384384layers. The upper-layers however are fully-connected and correspond to a
385385traditional MLP (hidden layer + logistic regression). The input to the
386386first fully-connected layer is the set of all feature maps at the layer
387- below.
387+ below.
388388
389389From an implementation point of view, this means lower-layers operate on 4D
390390tensors. These are then flattened to a 2D matrix of rasterized feature maps,
@@ -458,7 +458,7 @@ layer.
458458Notice that when initializing the weight values, the fan-in is determined by
459459the size of the receptive fields and the number of input feature maps.
460460
461- Finally, using the LogisticRegression class defined in :doc:`logreg` and
461+ Finally, using the LogisticRegression class defined in :doc:`logreg` and
462462the HiddenLayer class defined in :doc:`mlp` , we can
463463instantiate the network as follows.
464464
@@ -504,7 +504,7 @@ instantiate the network as follows.
504504 layer2_input = layer1.output.flatten(2)
505505
506506 # construct a fully-connected sigmoidal layer
507- layer2 = HiddenLayer(rng, input=layer2_input,
507+ layer2 = HiddenLayer(rng, input=layer2_input,
508508 n_in=50 * 4 * 4, n_out=500,
509509 activation=T.tanh )
510510
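    # --- illustrative aside, not part of the original model code ----------
    # Why n_in = 50 * 4 * 4: with 28x28 MNIST images, 5x5 filters and 2x2
    # max-pooling (the tutorial's defaults), the spatial size shrinks as
    #   (28 - 5 + 1) / 2 = 12   after the first conv+pool layer,
    #   (12 - 5 + 1) / 2 = 4    after the second conv+pool layer,
    # so layer1 outputs 50 feature maps of 4x4 values, i.e. 800 inputs per
    # image once flattened.
    assert (((28 - 5 + 1) // 2) - 5 + 1) // 2 == 4
    # -----------------------------------------------------------------------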
@@ -523,7 +523,7 @@ instantiate the network as follows.
523523
524524 # create a list of gradients for all model parameters
525525 grads = T.grad(cost, params)
526-
526+
527527 # train_model is a function that updates the model parameters by SGD
528528 # Since this model has many parameters, it would be tedious to manually
529529 # create an update rule for each model parameter. We thus create the updates
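    # --- illustrative aside, not part of the original training code -------
    # One standard way to build such an update list is a single list
    # comprehension pairing every parameter with its gradient (a sketch of
    # the idea, kept under a separate name so it does not shadow whatever
    # the script defines later):
    updates_sketch = [(param_i, param_i - learning_rate * grad_i)
                      for param_i, grad_i in zip(params, grads)]
    # -----------------------------------------------------------------------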
@@ -548,36 +548,36 @@ Running the Code
548548The user can then run the code by calling:
549549
550550.. code-block:: bash
551-
551+
552552 python code/convolutional_mlp.py
553553
554- The following output was obtained with the default parameters on a Xeon E5450
555- CPU clocked at 3.00GHz and using flags 'floatX=float32':
554+ The following output was obtained with the default parameters on a Core i7-2600K
555+ CPU clocked at 3.40GHz and using flags 'floatX=float32':
556556
557557.. code-block:: bash
558558
559559 Optimization complete.
560- Best validation score of 0.910000 % obtained at iteration 16099 ,with test
561- performance 0.930000 %
562- The code for file convolutional_mlp.py ran for 755.32m
560+ Best validation score of 0.910000 % obtained at iteration 17800 ,with test
561+ performance 0.920000 %
562+ The code for file convolutional_mlp.py ran for 380.28m
563563
564564Using a GeForce GTX 285, we obtained the following:
565565
566566.. code-block:: bash
567567
568568 Optimization complete.
569- Best validation score of 0.910000 % obtained at iteration 20099 ,with test
569+ Best validation score of 0.910000 % obtained at iteration 15500 ,with test
570570 performance 0.930000 %
571- The code for file convolutional_mlp.py ran for 47.96m
571+ The code for file convolutional_mlp.py ran for 46.76m
572572
573573And similarly on a GeForce GTX 480:
574574
575575.. code-block:: bash
576576
577577 Optimization complete.
578- Best validation score of 0.910000 % obtained at iteration 18499 ,with test
579- performance 0.910000 %
580- The code for file convolutional_mlp.py ran for 43.09m
578+ Best validation score of 0.910000 % obtained at iteration 16400 ,with test
579+ performance 0.930000 %
580+ The code for file convolutional_mlp.py ran for 32.52m
581581
582582Note that the discrepancies in validation and test error (as well as iteration
583583count) are due to different implementations of the rounding mechanism in
@@ -598,10 +598,10 @@ Number of filters
598598*****************
599599When choosing the number of filters per layer, keep in mind that computing the
600600activations of a single convolutional filter is much more expensive than with
601- traditional MLPs !
601+ traditional MLPs!
602602
603603Assume layer :math:`(l-1)` contains :math:`K^{l-1}` feature
604- maps and :math:`M \times N` pixel positions (i.e.,
604+ maps and :math:`M \times N` pixel positions (i.e.,
605605number of positions times number of feature maps),
606606and there are :math:`K^l` filters at layer :math:`l` of shape :math:`m \times n`.
607607Then computing a feature map (applying an :math:`m \times n` filter
@@ -625,7 +625,7 @@ keeping the total number of activations (number of feature maps times
625625number of pixel positions) to be non-decreasing from one layer to the next
626626(of course we could hope to get away with less when we are doing supervised
627627learning). The number of feature maps directly controls capacity and so
628- that depends on the number of available examples and the complexity of
628+ it depends on the number of available examples and the complexity of
629629the task.
630630
631631
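As a back-of-the-envelope illustration of the cost argument above (the layer
sizes below are made up, not taken from the tutorial), the following snippet
counts the multiplications needed for one feature map, for the whole
convolutional layer, and for a single hidden unit of a traditional MLP over
the same input:

.. code-block:: python

    K_prev, M, N = 20, 12, 12   # feature maps and pixel positions at layer l-1
    K, m, n = 50, 5, 5          # number of filters at layer l and filter shape

    # one m*n*K_prev dot product per valid output position
    per_feature_map = (M - m + 1) * (N - n + 1) * m * n * K_prev   # 32,000
    whole_layer = K * per_feature_map                              # 1,600,000

    # a single hidden unit of a traditional MLP over the same input
    per_mlp_unit = K_prev * M * N                                  # 2,880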