@@ -53,7 +53,7 @@ are trained, we can train the :math:`k+1`-th layer because we can now
 compute the code or latent representation from the layer below.

 Once all layers are pre-trained, the network goes through a second stage
-of training called **fine-tuning**,
+of training called **fine-tuning**. Here we consider **supervised fine-tuning**
 where we want to minimize prediction error on a supervised task.
 For this, we first add a logistic regression
 layer on top of the network (more precisely on the output code of the
@@ -66,15 +66,14 @@ training. (See the :ref:`mlp` for details on the multilayer perceptron.)

 This can be easily implemented in Theano, using the class defined
 previously for a denoising autoencoder. We can see the stacked denoising
-autoencoder as having two facades: One is a list of
-autoencoders. The other is an MLP. During pre-training we use the first facade, i.e., we treat our model
+autoencoder as having two facades: a list of
+autoencoders, and an MLP. During pre-training we use the first facade, i.e., we treat our model
 as a list of autoencoders, and train each autoencoder separately. In the
-second stage of training, we use the second facade. These two
-facades are linked
+second stage of training, we use the second facade. These two facades are linked because:

-* by the parameters shared by the autoencoders and the sigmoid layers of the MLP, and
+* the autoencoders and the sigmoid layers of the MLP share parameters, and

-* by feeding the latent representations of intermediate layers of the MLP as input to the autoencoders.
+* the latent representations computed by intermediate layers of the MLP are fed as input to the autoencoders (see the sketch below).

 .. literalinclude:: ../code/SdA.py
    :start-after: start-snippet-1
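
To make the link between the two facades concrete, the fragment below builds a single
sigmoid layer together with a denoising autoencoder that reuses its weights and hidden
biases. This is only a minimal sketch: it assumes the ``HiddenLayer`` and ``dA`` classes
from the earlier tutorials can be imported as shown, and it is not the ``SdA.py`` code
included above.

.. code-block:: python

    import numpy
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    from mlp import HiddenLayer  # hidden layer class from the MLP tutorial
    from dA import dA            # denoising autoencoder from the previous tutorial

    numpy_rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
    x = T.matrix('x')  # a minibatch of input vectors

    # MLP facade: a sigmoid hidden layer reading the raw input
    sigmoid_layer = HiddenLayer(rng=numpy_rng, input=x,
                                n_in=28 * 28, n_out=500,
                                activation=T.nnet.sigmoid)

    # autoencoder facade: a dA over the same input that shares the
    # weights W and hidden biases bhid of the sigmoid layer
    dA_layer = dA(numpy_rng=numpy_rng, theano_rng=theano_rng,
                  input=x, n_visible=28 * 28, n_hidden=500,
                  W=sigmoid_layer.W, bhid=sigmoid_layer.b)

    # pre-training dA_layer therefore also updates the MLP's parameters,
    # and a deeper layer would take sigmoid_layer.output as its input
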
@@ -83,8 +82,8 @@ facades are linked
 ``self.sigmoid_layers`` will store the sigmoid layers of the MLP facade, while
 ``self.dA_layers`` will store the denoising autoencoders associated with the layers of the MLP.

-Next, we construct ``n_layers`` denoising autoencoders and ``n_layers`` sigmoid
-layers , where ``n_layers`` is the depth of our model. We use the
+Next, we construct ``n_layers`` sigmoid layers and ``n_layers`` denoising
+autoencoders, where ``n_layers`` is the depth of our model. We use the
 ``HiddenLayer`` class introduced in :ref:`mlp`, with one
 modification: we replace the ``tanh`` non-linearity with the
 logistic function :math:`s(x) = \frac{1}{1+e^{-x}}`.
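
Putting these pieces together, the construction loop inside the ``SdA`` constructor
follows the pattern sketched below: each iteration adds one sigmoid layer to the MLP
facade and one denoising autoencoder that shares its parameters, with deeper layers
reading the output of the layer beneath them. The names mirror those of the snippets
included from ``SdA.py``, but this is an illustrative sketch rather than the actual code.

.. code-block:: python

    # sketch of the loop inside SdA.__init__, after self.sigmoid_layers,
    # self.dA_layers and self.params have been initialised to empty lists
    for i in range(self.n_layers):
        if i == 0:
            # the first layer reads the raw input of the whole network
            input_size = n_ins
            layer_input = self.x
        else:
            # deeper layers read the latent representation from below
            input_size = hidden_layers_sizes[i - 1]
            layer_input = self.sigmoid_layers[-1].output

        # MLP facade: sigmoid layer i
        sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                    input=layer_input,
                                    n_in=input_size,
                                    n_out=hidden_layers_sizes[i],
                                    activation=T.nnet.sigmoid)
        self.sigmoid_layers.append(sigmoid_layer)
        self.params.extend(sigmoid_layer.params)

        # autoencoder facade: a dA that shares W and bhid with that layer
        dA_layer = dA(numpy_rng=numpy_rng, theano_rng=theano_rng,
                      input=layer_input,
                      n_visible=input_size,
                      n_hidden=hidden_layers_sizes[i],
                      W=sigmoid_layer.W,
                      bhid=sigmoid_layer.b)
        self.dA_layers.append(dA_layer)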