.. _deep:

Deep Learning
=============

The breakthrough to effective training strategies for deep architectures came
in 2006 with the algorithms for training deep belief networks
(DBN) [Hinton07]_ and stacked auto-encoders [Ranzato07]_, [Bengio07]_.
All these methods are based on a similar approach: **greedy layer-wise
unsupervised pre-training** followed by **supervised fine-tuning**.

The pre-training strategy consists of using unsupervised learning to guide the
training of intermediate levels of representation. Each layer is pre-trained
with an unsupervised learning algorithm, which attempts to learn a nonlinear
transformation of its input that captures its main variations. Higher levels
of abstraction are created by feeding the output of one layer into the input
of the subsequent layer.
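
To illustrate the stacking idea, here is a toy numpy sketch of greedy
layer-wise pre-training, in which each layer is a small tied-weight
auto-encoder trained on the representation produced by the layers below it.
This is only an illustration of the principle, not the tutorial's Theano
code; the actual layers would be RBMs or denoising auto-encoders.

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def pretrain_layer(inputs, n_hidden, n_epochs=5, lr=0.1):
        """Toy stand-in for an unsupervised layer-wise learner: a tied-weight
        auto-encoder with a sigmoid hidden layer and linear reconstruction,
        trained by gradient descent on the squared reconstruction error.
        Returns the learned weights and the hidden representation of `inputs`.
        """
        n_visible = inputs.shape[1]
        W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
        for _ in range(n_epochs):
            h = sigmoid(inputs.dot(W))          # encode
            recon = h.dot(W.T)                  # decode (tied weights)
            err = recon - inputs
            # gradient of 0.5 * ||recon - inputs||^2 with respect to W
            grad = inputs.T.dot(err.dot(W) * h * (1 - h)) + err.T.dot(h)
            W -= lr * grad / inputs.shape[0]
        return W, sigmoid(inputs.dot(W))

    # greedy layer-wise stacking: each new layer is trained (unsupervised)
    # on the representation produced by the layer below it
    x = rng.rand(100, 20)        # toy "dataset": 100 examples, 20 features
    representation = x
    weights = []
    for n_hidden in [15, 10]:
        W, representation = pretrain_layer(representation, n_hidden)
        weights.append(W)
    # `weights` could now initialize a deep network for supervised fine-tuning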

The resulting architecture can then be seen in two lights:

* the pre-trained deep network can be used to initialize the weights of all but
  the last layer of a deep neural network. The weights are then further adapted
  to a supervised task (such as classification) through traditional gradient
  descent (see :ref:`Multilayer perceptron <mlp>`). This is referred to as the
  fine-tuning step.

* the pre-trained deep network can also serve solely as a feature extractor.
  The output of the last layer is fed to a classifier, such as logistic
  regression, which is trained independently. Better results can be obtained
  by concatenating the output of the last layer with the hidden representations
  of all intermediate layers [Lee09]_; a small sketch of this view follows the
  list.
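
To make the feature-extractor view concrete, here is a toy numpy sketch. The
weight matrices below are random placeholders standing in for pre-trained
weights, and the stand-alone classifier is only mentioned in a comment; this
is not the tutorial's Theano code.

.. code-block:: python

    import numpy as np

    def forward(x, weights):
        """Propagate the input through a stack of fixed, pre-trained sigmoid
        layers and return the representation produced at every level."""
        reps = [x]
        for W in weights:
            reps.append(1.0 / (1.0 + np.exp(-reps[-1].dot(W))))
        return reps

    rng = np.random.RandomState(0)
    x = rng.rand(100, 20)                             # toy "dataset"
    weights = [rng.normal(scale=0.1, size=(20, 15)),  # placeholders for the
               rng.normal(scale=0.1, size=(15, 10))]  # pre-trained weights

    reps = forward(x, weights)
    top_features = reps[-1]                           # last layer only
    all_features = np.concatenate(reps[1:], axis=1)   # all hidden layers
    # either feature set would then be handed to a separate classifier
    # (e.g. logistic regression) trained independently of the deep network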

For the purposes of this tutorial, we will focus on the first interpretation,
as that is what was first proposed in [Hinton06]_.

Deep Coding
+++++++++++

Since Deep Belief Networks (DBN) and Stacked Denoising Auto-Encoders (SDA)
share much of the same architecture and have very similar training algorithms
(in terms of pre-training and fine-tuning stages), it makes sense to implement
them in a similar fashion, as part of a "Deep Learning" framework.

We thus define a generic interface that both of these architectures will
share.

.. code-block:: python

    class DeepLayerwiseModel(object):

        def layerwise_pretrain(self, layer_fns, pretrain_amounts):
            """Pre-train each layer in turn, calling its unsupervised
            training function the requested number of times.
            """

        def finetune(self, datasets, lr, batch_size):
            """Fine-tune the whole network on a supervised task using
            minibatch stochastic gradient descent.
            """

    class DBN(DeepLayerwiseModel):
        """Deep Belief Network: a stack of RBMs, pre-trained layer by layer.
        """

    class StackedDAA(DeepLayerwiseModel):
        """Stacked denoising auto-encoder: a stack of denoising auto-encoders,
        pre-trained layer by layer.
        """
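
The pre-training phase can then be driven by a main function along the
following lines (``load_mnist`` returns the number of training examples
together with the train/valid/test splits):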

.. code-block:: python

    def deep_main(finetune_lr=0.1,
                  pretraining_epochs=20,
                  pretrain_lr=0.1,
                  training_epochs=1000,
                  batch_size=20,
                  mnist_file='mnist.pkl.gz'):

        n_train_examples, train_valid_test = load_mnist(mnist_file)
        train_set_x, train_set_y = train_valid_test[0]

        # instantiate model
        deep_model = ...

        ####
        #### Phase 1: Pre-training
        ####

        # create an array of functions, which will be used for the greedy
        # layer-wise unsupervised training procedure

        pretrain_functions = deep_model.pretrain_functions(
            batch_size=batch_size,
            train_set_x=train_set_x,
            learning_rate=pretrain_lr,
            ...
        )

        # loop over all the layers in our network
        for layer_idx, pretrain_fn in enumerate(pretrain_functions):

            # iterate over a certain number of epochs
            for i in xrange(pretraining_epochs * n_train_examples / batch_size):

                # take one step in the gradient of the unsupervised cost
                # function, at the given layer
                pretrain_fn(i)
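
Once all layers have been pre-trained, fine-tuning uses the supervised
training, validation and testing functions returned by ``finetune_functions``: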

.. code-block:: python

    ####
    #### Phase 2: Fine-tuning
    ####

    # create Theano functions for fine-tuning, as well as for
    # validating and testing our model

    train_fn, valid_scores, test_scores = \
        deep_model.finetune_functions(
            train_valid_test[0][0],        # training dataset
            learning_rate=finetune_lr,     # the learning rate
            batch_size=batch_size)         # number of examples to use at once


    # use these functions as part of the generic early-stopping procedure
    for i in xrange(patience_max):

        if i >= patience:
            break

        cost_i = train_fn(i)

        ...
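
The validation and patience-update logic is elided above. A common form of
such a patience-based early-stopping loop, written here with toy stand-in
functions rather than the Theano functions built by ``finetune_functions``
(so the numbers themselves are meaningless), might look like this sketch:

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)
    history = []

    # hypothetical stand-ins for the compiled Theano functions:
    # train_fn(i) performs one gradient step and returns the minibatch cost,
    # valid_scores() returns the current validation error
    def train_fn(i):
        return 1.0 / (i + 1) + 0.01 * rng.rand()

    def valid_scores():
        return max(0.2, 1.0 / (len(history) + 1))

    patience = 100              # minimum number of minibatch updates to run
    patience_max = 1000         # hard upper bound on the number of updates
    patience_increase = 2       # extend patience by this factor on improvement
    validation_frequency = 10   # check the validation error every N updates
    best_valid = np.inf

    for i in xrange(patience_max):

        if i >= patience:
            break

        cost_i = train_fn(i)

        if (i + 1) % validation_frequency == 0:
            history.append(valid_scores())
            if history[-1] < best_valid:
                best_valid = history[-1]
                # validation error improved: allow more updates
                patience = max(patience, i * patience_increase)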