Tips and Tricks
+++++++++++++++

Choosing Hyperparameters
------------------------

CNNs are especially tricky to train, as they add even more hyper-parameters than
a standard MLP. While the usual rules of thumb for learning rates and
regularization constants still apply, the following should be kept in mind when
optimizing CNNs.

Number of filters
*****************
When choosing the number of filters per layer, keep in mind that computing the
activations of a single convolutional filter is much more expensive than with
traditional MLPs!

Assume layer :math:`(l-1)` contains :math:`S^{(l-1)}` pixels (across all
feature maps), and that feature maps at layer :math:`l` are of shape
:math:`m \times n`. Computing the activations of a single convolutional filter
requires :math:`m \times n \times S^{(l-1)}` multiplications, compared to
:math:`S^{(l-1)}` for a standard MLP. As such, the number of filters used in
CNNs is typically much smaller than the number of hidden units in MLPs, and
depends on the size of the feature maps (itself a function of input image size
and filter shapes).

Since feature map size decreases with depth, shallow layers will tend to
have fewer filters, while deeper layers can have many more.

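To make this cost comparison concrete, here is a small back-of-the-envelope
sketch; the layer sizes are hypothetical and chosen only for illustration:

.. code-block:: python

    # Hypothetical layer sizes, for illustration only.
    S_prev = 32 * 32 * 3   # S^(l-1): pixels in layer (l-1), across all feature maps
    m, n = 28, 28          # shape of each feature map at layer l

    # Multiplications needed to compute the activations of...
    conv_cost = m * n * S_prev   # ...a single convolutional filter
    mlp_cost = S_prev            # ...a single hidden unit in a standard MLP

    print(conv_cost // mlp_cost)  # the filter is m*n = 784 times more expensive
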
Filter Shape
************
Common filter shapes found in the literature vary greatly, usually depending
on the dataset. Best results on MNIST-sized images (28x28) are usually in the
5x5 range, while natural image datasets (often with hundreds of pixels in each
dimension) tend to use larger filters of shape 12x12 or 15x15.

When optimizing filter shapes, keep in mind that there is a relationship
between the size of the input image, the filter shape, and the number of
hidden units. Filters which are too large with respect to the input will
project the input onto a very low-dimensional space; creating a useful
high-level abstraction will thus require many hidden units, as in the case of
fully connected MLPs. Smaller filter shapes (with respect to the input) can
get away with fewer hidden units (e.g., as few as 6 in the case of LeNet-5),
as they project into a higher-dimensional space, which preserves more of the
information content of the input signal.
The trick is thus to find the right level of "granularity" (i.e. filter
shapes) in order to create abstractions at the proper scale, given a
particular dataset.

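One way to reason about this granularity is through the size of the resulting
feature maps. A minimal sketch, assuming a "valid" convolution (no padding) as
used elsewhere in this tutorial; the helper function name is ours:

.. code-block:: python

    def feature_map_shape(image_shape, filter_shape):
        """Output shape of a 'valid' convolution: the map shrinks by
        (filter size - 1) pixels in each dimension."""
        return (image_shape[0] - filter_shape[0] + 1,
                image_shape[1] - filter_shape[1] + 1)

    # Small filters preserve most of the spatial resolution of a 28x28 input...
    print(feature_map_shape((28, 28), (5, 5)))    # (24, 24)
    # ...while large filters project onto a much lower-dimensional space.
    print(feature_map_shape((28, 28), (15, 15)))  # (14, 14)
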
Max Pooling Shape
*****************
Typical values are 2x2 or no max-pooling. Very large input images may warrant
4x4 pooling in the lower layers. Keep in mind, however, that this will reduce
the dimension of the signal by a factor of 16, and may result in throwing away
too much information.

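To see where that factor of 16 comes from, a minimal sketch with a
hypothetical 256x256 input, assuming non-overlapping pooling windows:

.. code-block:: python

    def pooled_shape(shape, pool):
        # Non-overlapping max-pooling keeps one value per pool window.
        return (shape[0] // pool[0], shape[1] // pool[1])

    h, w = pooled_shape((256, 256), (4, 4))  # (64, 64)
    print((256 * 256) // (h * w))            # 16: only 1/16th of the signal survives
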
References
++++++++++