@@ -6,7 +6,7 @@ Convolutional Neural Networks (LeNet)
 .. note::
     This section assumes the reader has already read through :doc:`logreg` and
     :doc:`mlp`. Additionally, it uses the following new Theano functions and concepts:
-    `T.tanh`_, `shared variables`_, `basic arithmetic ops`_, `T.grad`_,
+    `T.tanh`_, `shared variables`_, `basic arithmetic ops`_, `T.grad`_,
     `floatX`_, `downsample`_ , `conv2d`_, `dimshuffle`_. If you intend to run the
     code on GPU also read `GPU`_.
 
@@ -20,7 +20,7 @@ Convolutional Neural Networks (LeNet)
 
 .. _floatX: http://deeplearning.net/software/theano/library/config.html#config.floatX
 
-.. _GPU: http://deeplearning.net/software/theano/tutorial/using_gpu.html
+.. _GPU: http://deeplearning.net/software/theano/tutorial/using_gpu.html
 
 .. _downsample: http://deeplearning.net/software/theano/library/tensor/signal/downsample.html
 
@@ -71,7 +71,7 @@ contiguous receptive fields. We can illustrate this graphically as follows:
 Imagine that layer **m-1** is the input retina.
 In the figure above, units in layer **m**
 have receptive fields of width 3 with respect to the input retina and are thus only
-connected to 3 adjacent neurons in the layer below (the retina).
+connected to 3 adjacent neurons in the layer below (the retina).
 Units in layer **m+1** have
 a similar connectivity with the layer below. We say that their receptive
 field with respect to the layer below is also 3, but their receptive field
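As a rough sanity check of this arithmetic, the following small sketch (not part of the tutorial code, and assuming stride 1 and no pooling) computes how the receptive field grows as such width-3 layers are stacked::

    def receptive_field(num_layers, width=3):
        # with stride 1 and no pooling, each extra layer widens the field by (width - 1)
        return width + (num_layers - 1) * (width - 1)

    print(receptive_field(1))   # 3: a unit in layer m, with respect to the retina
    print(receptive_field(2))   # 5: a unit in layer m+1, with respect to the retina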
@@ -122,7 +122,7 @@ feature map :math:`h^k` is obtained as follows (for :math:`tanh` non-linearities
 .. math::
     h^k_{ij} = \tanh ( (W^k * x)_{ij} + b_k ).
 
-.. Note::
+.. Note::
     Recall the following definition of convolution for a 1D signal.
     :math:`o[n] = f[n]*g[n] = \sum_{u=-\infty}^{\infty} f[u] g[n-u] = \sum_{u=-\infty}^{\infty} f[n-u] g[u]`.
 
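To make the 2D analogue of this formula concrete, here is a small NumPy sketch (an illustration only, not the tutorial's implementation) that computes one feature map from a single input channel with a "valid" convolution followed by :math:`\tanh`::

    import numpy

    def feature_map(x, W_k, b_k):
        # x: 2D input image, W_k: 2D filter, b_k: scalar bias
        m, n = W_k.shape
        rows, cols = x.shape[0] - m + 1, x.shape[1] - n + 1
        h = numpy.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                # flipping the filter makes this a true convolution,
                # matching the 1D definition recalled above
                h[i, j] = numpy.sum(x[i:i + m, j:j + n] * W_k[::-1, ::-1])
        return numpy.tanh(h + b_k)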
@@ -131,10 +131,10 @@ feature map :math:`h^k` is obtained as follows (for :math:`tanh` non-linearities
 
 To form a richer representation of the data, hidden layers are composed of
 a set of multiple feature maps, :math:`\{h^{(k)}, k=0..K\}`.
-The weights :math:`W` of this layer can be parametrized as a 4D tensor
+The weights :math:`W` of this layer can be parametrized as a 4D tensor
 (destination feature map index, source feature map index, source vertical position index, source horizontal position index)
 and
-the biases :math:`b` as a vector (one element per destination feature map index).
+the biases :math:`b` as a vector (one element per destination feature map index).
 We illustrate this graphically as follows:
 
 .. figure:: images/cnn_explained.png
@@ -154,7 +154,7 @@ input feature maps, while the other two refer to the pixel coordinates.
 
 Putting it all together, :math:`W^{kl}_{ij}` denotes the weight connecting
 each pixel of the k-th feature map at layer m, with the pixel at coordinates
-(i,j) of the l-th feature map of layer (m-1).
+(i,j) of the l-th feature map of layer (m-1).
 
 
 The ConvOp
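Before looking at the Theano op itself, it may help to pin down the shapes implied by the indexing convention above. The sketch below uses hypothetical sizes (2 destination maps, 3 source maps, 9x9 filters) purely for illustration::

    import numpy

    K_dst, K_src, m, n = 2, 3, 9, 9        # hypothetical sizes
    W = numpy.zeros((K_dst, K_src, m, n))  # W[k, l, i, j] links destination map k to source map l
    b = numpy.zeros(K_dst)                 # one bias per destination feature map
    # the k-th output map accumulates over all source maps l:
    #     h[k] = tanh( sum_l (W[k, l] * x[l]) + b[k] )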
@@ -195,7 +195,7 @@ one of Figure 1. The input consists of 3 features maps (an RGB color image) of s
             high=1.0 / w_bound,
             size=w_shp),
         dtype=input.dtype), name ='W')
-
+
 # initialize shared variable for bias (1D tensor) with random values
 # IMPORTANT: biases are usually initialized to zero. However in this
 # particular application, we simply apply the convolutional layer to
@@ -210,10 +210,10 @@ one of Figure 1. The input consists of 3 features maps (an RGB color image) of s
 conv_out = conv.conv2d(input, W)
 
 # build symbolic expression to add bias and apply activation function, i.e. produce neural net layer output
-# A few words on ``dimshuffle`` :
+# A few words on ``dimshuffle`` :
 # ``dimshuffle`` is a powerful tool in reshaping a tensor;
-# what it allows you to do is to shuffle dimensions around
-# but also to insert new ones along which the tensor will be
+# what it allows you to do is to shuffle dimensions around
+# but also to insert new ones along which the tensor will be
 # broadcastable;
 # dimshuffle('x', 2, 'x', 0, 1)
 # This will work on 3d tensors with no broadcastable
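To make the mechanics concrete, here is a small sketch (assumed shapes, not taken from the excerpt above) of how a per-feature-map bias can be broadcast against the 4D convolution output::

    import numpy
    import theano

    b = theano.shared(numpy.zeros(2, dtype=theano.config.floatX), name='b')
    # conv_out has shape (mini-batch, feature maps, rows, cols); inserting
    # broadcastable axes lets the bias be added to every pixel of its map:
    b_broadcast = b.dimshuffle('x', 0, 'x', 'x')   # shape (1, 2, 1, 1)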
@@ -255,7 +255,7 @@ Let's have a little bit of fun with this...
 # plot original image and first and second components of output
 pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
 pylab.gray();
-# recall that the convOp output (filtered image) is actually a "minibatch",
+# recall that the convOp output (filtered image) is actually a "minibatch",
 # of size 1 here, so we take index 0 in the first dimension:
 pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
 pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
@@ -267,7 +267,7 @@ This should generate the following output.
 .. image:: images/3wolfmoon_output.png
     :align: center
 
-Notice that a randomly initialized filter acts very much like an edge detector!
+Notice that a randomly initialized filter acts very much like an edge detector!
 
 Also of note, remark that we use the same weight initialization formula as
 with the MLP. Weights are sampled randomly from a uniform distribution in the
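For the filtering example above (3 input feature maps and 9x9 filters), the fan-in and the resulting sampling bound could be computed roughly as follows; taking the bound to be the square root of the fan-in is an assumption made for this sketch, and the authoritative expression is the ``w_bound`` definition in the full script::

    import numpy

    fan_in = 3 * 9 * 9                 # input feature maps * filter height * filter width
    w_bound = numpy.sqrt(fan_in)       # assumed bound, consistent with low/high in the snippet above
    low, high = -1.0 / w_bound, 1.0 / w_bound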
@@ -371,7 +371,7 @@ The lower-layers are composed to alternating convolution and max-pooling
 layers. The upper-layers however are fully-connected and correspond to a
 traditional MLP (hidden layer + logistic regression). The input to the
 first fully-connected layer is the set of all feature maps at the layer
-below.
+below.
 
 From an implementation point of view, this means lower-layers operate on 4D
 tensors. These are then flattened to a 2D matrix of rasterized feature maps,
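The flattening step can be pictured with plain NumPy; the shapes below are made up for illustration::

    import numpy

    batch, maps, rows, cols = 500, 50, 4, 4            # illustrative shapes
    pooled = numpy.zeros((batch, maps, rows, cols))    # 4D output of the last pooling layer
    flat = pooled.reshape(batch, maps * rows * cols)   # 2D input for the MLP part, shape (500, 800)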
@@ -445,7 +445,7 @@ layer.
 Notice that when initializing the weight values, the fan-in is determined by
 the size of the receptive fields and the number of input feature maps.
 
-Finally, using the LogisticRegression class defined in :doc:`logreg` and
+Finally, using the LogisticRegression class defined in :doc:`logreg` and
 the HiddenLayer class defined in :doc:`mlp`, we can
 instantiate the network as follows.
 
@@ -491,7 +491,7 @@ instantiate the network as follows.
     layer2_input = layer1.output.flatten(2)
 
     # construct a fully-connected sigmoidal layer
-    layer2 = HiddenLayer(rng, input=layer2_input,
+    layer2 = HiddenLayer(rng, input=layer2_input,
                          n_in=50 * 4 * 4, n_out=500,
                          activation=T.tanh )
 
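The value ``n_in=50 * 4 * 4`` follows from the shapes flowing through the network. Assuming the usual tutorial configuration of 28x28 MNIST inputs, 5x5 filters, 2x2 max-pooling and 50 feature maps in ``layer1`` (these values are not all visible in the excerpt above), the bookkeeping is::

    size = 28                                            # MNIST images are 28x28
    for filter_size, pool_size in [(5, 2), (5, 2)]:
        size = (size - filter_size + 1) // pool_size     # 28 -> 12 -> 4

    print(50 * size * size)                              # 800, i.e. n_in of the hidden layer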
@@ -510,7 +510,7 @@ instantiate the network as follows.
 
     # create a list of gradients for all model parameters
     grads = T.grad(cost, params)
-
+
     # train_model is a function that updates the model parameters by SGD
     # Since this model has many parameters, it would be tedious to manually
     # create an update rule for each model parameter. We thus create the updates
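A minimal sketch of the updates list that this comment describes, assuming ``params``, ``grads`` and a Python float ``learning_rate`` as defined in the surrounding script::

    updates = [(param_i, param_i - learning_rate * grad_i)
               for param_i, grad_i in zip(params, grads)]

The resulting list of (variable, expression) pairs is then handed to ``theano.function`` through its ``updates`` argument.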
@@ -585,10 +585,10 @@ Number of filters
 *****************
 When choosing the number of filters per layer, keep in mind that computing the
 activations of a single convolutional filter is much more expensive than with
-traditional MLPs!
+traditional MLPs!
 
 Assume layer :math:`(l-1)` contains :math:`K^{l-1}` feature
-maps and :math:`M \times N` pixel positions (i.e.,
+maps and :math:`M \times N` pixel positions (i.e.,
 number of positions times number of feature maps),
 and there are :math:`K^l` filters at layer :math:`l` of shape :math:`m \times n`.
 Then computing a feature map (applying an :math:`m \times n` filter
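Plugging in some made-up numbers (not from the tutorial) shows how quickly this cost grows::

    M, N = 32, 32           # pixel positions in layer (l-1)
    K_prev, K_l = 20, 50    # feature maps at layers (l-1) and l
    m, n = 5, 5             # filter shape
    positions = (M - m + 1) * (N - n + 1)      # valid filter placements
    one_map = positions * m * n * K_prev       # multiply-accumulates for one output map
    whole_layer = one_map * K_l
    print(one_map, whole_layer)                # 392000 19600000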
@@ -612,7 +612,7 @@ keeping the total number of activations (number of feature maps times
 number of pixel positions) to be non-decreasing from one layer to the next
 (of course we could hope to get away with less when we are doing supervised
 learning). The number of feature maps directly controls capacity and so
-that depends on the number of available examples and the complexity of
+that depends on the number of available examples and the complexity of
 the task.
 
 