Tips and Tricks
+++++++++++++++

Choosing Hyperparameters
------------------------

CNNs are especially tricky to train, as they add even more hyper-parameters than
a standard MLP. While the usual rules of thumb for learning rates and
regularization constants still apply, the following should be kept in mind when
optimizing CNNs.

Number of filters
*****************
When choosing the number of filters per layer, keep in mind that computing the
activations of a single convolutional filter is much more expensive than with
traditional MLPs!

Assume layer :math:`(l-1)` contains :math:`S^{(l-1)}` pixels (across all
feature maps), and that feature maps at layer :math:`l` are of shape
:math:`m \times n`. Computing the activations of a single convolutional filter
requires :math:`m \times n \times S^{(l-1)}` multiplications, compared to
:math:`S^{(l-1)}` for a standard MLP. As such, the number of filters used in
CNNs is typically much smaller than the number of hidden units in MLPs, and
depends on the size of the feature maps (itself a function of input image size
and filter shapes).

Since feature map size decreases with depth, shallow layers will tend to
have fewer filters, while deeper layers can have many more.

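To make this cost comparison concrete, here is a small back-of-the-envelope
sketch; the layer sizes are hypothetical and chosen only for illustration:

.. code-block:: python

    # Hypothetical layer sizes, for illustration only.
    S_prev = 32 * 32 * 3   # S^(l-1): pixels in layer (l-1), across all feature maps
    m, n = 28, 28          # shape of each feature map at layer l

    # Multiplications needed to compute the activations of...
    conv_cost = m * n * S_prev   # ...a single convolutional filter
    mlp_cost = S_prev            # ...a single hidden unit in a standard MLP

    print(conv_cost // mlp_cost)  # the filter is m*n = 784 times more expensive
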
Filter Shape
************
Common filter shapes found in the literature vary greatly, usually depending
on the dataset. Best results on MNIST-sized images (28x28) are usually in the
5x5 range, while natural image datasets (often with hundreds of pixels in each
dimension) tend to use larger filters of shape 12x12 or 15x15.

When optimizing filter shapes, keep in mind that there is a relationship
between the size of the input image, the filter shape, and the number of
hidden units. Filters which are too large with respect to the input will
project the input onto a very low-dimensional space; creating a useful
high-level abstraction will thus require many hidden units, as in the case of
fully connected MLPs. Smaller filter shapes (with respect to the input) can
get away with fewer hidden units (e.g., as few as 6 in the case of LeNet-5),
as they project into a higher-dimensional space, which preserves more of the
information content of the input signal.
The trick is thus to find the right level of "granularity" (i.e. filter
shapes) in order to create abstractions at the proper scale, given a
particular dataset.

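One way to reason about this granularity is through the size of the resulting
feature maps. A minimal sketch, assuming a "valid" convolution (no padding) as
used elsewhere in this tutorial; the helper function name is ours:

.. code-block:: python

    def feature_map_shape(image_shape, filter_shape):
        """Output shape of a 'valid' convolution: the map shrinks by
        (filter size - 1) pixels in each dimension."""
        return (image_shape[0] - filter_shape[0] + 1,
                image_shape[1] - filter_shape[1] + 1)

    # Small filters preserve most of the spatial resolution of a 28x28 input...
    print(feature_map_shape((28, 28), (5, 5)))    # (24, 24)
    # ...while large filters project onto a much lower-dimensional space.
    print(feature_map_shape((28, 28), (15, 15)))  # (14, 14)
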
Max Pooling Shape
*****************
Typical values are 2x2 or no max-pooling. Very large input images may warrant
4x4 pooling in the lower layers. Keep in mind, however, that this will reduce
the dimension of the signal by a factor of 16, and may result in throwing away
too much information.

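To see where that factor of 16 comes from, a minimal sketch with a
hypothetical 256x256 input, assuming non-overlapping pooling windows:

.. code-block:: python

    def pooled_shape(shape, pool):
        # Non-overlapping max-pooling keeps one value per pool window.
        return (shape[0] // pool[0], shape[1] // pool[1])

    h, w = pooled_shape((256, 256), (4, 4))  # (64, 64)
    print((256 * 256) // (h * w))            # 16: only 1/16th of the signal survives
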
References
++++++++++