@@ -8,7 +8,7 @@ Stacked Denoising Autoencoders (SdA)
 
 
 The Stacked Denoising Autoencoder (SdA) is an extension of the stacked
-autoencoder and it was introduced in [Vincent08]_. We will start the
+autoencoder [Bengio07]_ and it was introduced in [Vincent08]_. We will start the
 tutorial with a short digression on :ref:`autoencoders`
 and then move on to how classical
 autoencoders are extended to denoising autoencoders (:ref:`dA`).
@@ -22,39 +22,41 @@ Autoencoders
 +++++++++++++
 
 An autoencoder takes an input :math:`\mathbf{x} \in [0,1]^d` and first
-maps it to a hidden representation :math:`\mathbf{y} \in [0,1]^{d'}`
+maps it (with an *encoder*) to a hidden representation :math:`\mathbf{y} \in [0,1]^{d'}`
 through a deterministic mapping:
 
 .. math::
 
    \mathbf{y} = s(\mathbf{W}\mathbf{x} + \mathbf{b})
 
-The latent representation :math:`\mathbf{y}` is then mapped back into a
+The latent representation :math:`\mathbf{y}` is then mapped back (with a *decoder*) into a
 "reconstructed" vector :math:`\mathbf{z}` of the same shape as
 :math:`\mathbf{x}` through a similar transformation, namely:
 
 .. math::
 
    \mathbf{z} = s(\mathbf{W'}\mathbf{y} + \mathbf{b'})
 
-The weights matrix :math:`\mathbf{W'}` of the reverse mapping may be
-optionally constrained by :math:`\mathbf{W'} = \mathbf{W}^T`, this is
-called using *tied weights*. The parameters of this model (nameley
+where the prime symbol does not indicate transpose, and
+:math:`\mathbf{z}` should be seen as a prediction of :math:`\mathbf{x}`.
+The weight matrix :math:`\mathbf{W'}` of the reverse mapping may be
+optionally constrained by :math:`\mathbf{W'} = \mathbf{W}^T`, which is
+an instance of *tied weights*. The parameters of this model (namely
 :math:`\mathbf{W}`, :math:`\mathbf{b}`,
 :math:`\mathbf{b'}` and, if one doesn't use tied weights, also
 :math:`\mathbf{W'}`) are optimized such that the average reconstruction
 error is minimized. The reconstruction error can be measured using the
-traditional *squared error* :math:`L(\mathbf{x}, \mathbf{z}) = | \mathbf{x} - \mathbf{z} |^2`,
+traditional *squared error* :math:`L(\mathbf{x}, \mathbf{z}) = || \mathbf{x} - \mathbf{z} ||^2`,
 or, if the input is interpreted as either bit vectors or vectors of
 bit probabilities, by the reconstruction *cross-entropy* defined as:
 
 .. math::
 
-   L_{H} (\mathbf{x}, \mathbf{z} = - \sum^d_{k=1}[\mathbf{x}_k \log
+   L_{H} (\mathbf{x}, \mathbf{z}) = - \sum^d_{k=1}[\mathbf{x}_k \log
    \mathbf{z}_k + (1 - \mathbf{x}_k)\log(1 - \mathbf{z}_k)]
 
 
-We want to implent this behaviour using Theano, in the form of a class,
+We want to implement this behavior using Theano, in the form of a class,
 that can afterwards be used in constructing a stacked autoencoder. The
 first step is to create shared variables for the parameters of the
 autoencoder ( :math:`\mathbf{W}`, :math:`\mathbf{b}` and
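
For concreteness, here is a minimal sketch of this setup, assuming MNIST-sized inputs, a plain uniform initialization, and tied weights; the sizes, ranges, and variable names are illustrative, not the tutorial's actual code:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    rng = numpy.random.RandomState(123)
    n_visible, n_hidden = 784, 500   # e.g. MNIST-sized input (an assumption)

    # Shared variables for W, b and b'; with tied weights, W' = W^T needs
    # no storage of its own.
    W = theano.shared(
        numpy.asarray(rng.uniform(low=-0.01, high=0.01,
                                  size=(n_visible, n_hidden)),
                      dtype=theano.config.floatX),
        name='W')
    b = theano.shared(numpy.zeros(n_hidden, dtype=theano.config.floatX),
                      name='b')
    b_prime = theano.shared(numpy.zeros(n_visible, dtype=theano.config.floatX),
                            name='b_prime')

    x = T.matrix('x')                            # a minibatch, one row per example
    y = T.nnet.sigmoid(T.dot(x, W) + b)          # y = s(Wx + b)
    z = T.nnet.sigmoid(T.dot(y, W.T) + b_prime)  # z = s(W'y + b') with W' = W^T

    # Average reconstruction cross-entropy L_H(x, z) over the minibatch.
    L = -T.sum(x * T.log(z) + (1 - x) * T.log(1 - z), axis=1)
    cost = T.mean(L)

From here, ``T.grad(cost, [W, b, b_prime])`` gives the gradients needed to
minimize the average reconstruction error.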
@@ -92,7 +94,9 @@ autoencoder ( :math:`\mathbf{W}`, :math:`\mathbf{b}` and
 
 Note that we pass the ``input`` to the autoencoder as a
 parameter. This is so that later we can concatenate layers of
-autoencoders to form a deep network.
+autoencoders to form a deep network: the symbolic output (the
+:math:`\mathbf{y}` above, ``self.y`` in the code below) of the k-th layer
+will be the symbolic input of the (k+1)-th.
 
 Now we can compute the latent representation and the reconstructed
 signal:
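
To make the role of the ``input`` parameter concrete, here is a
stripped-down sketch of such a chain; the class name, constructor
signature, and layer sizes below are placeholders rather than the
tutorial's actual class:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    class AutoEncoderSketch(object):
        """A stripped-down stand-in for the tutorial's autoencoder class."""
        def __init__(self, input, n_visible, n_hidden, rng):
            self.x = input            # symbolic input, supplied by the caller
            self.W = theano.shared(
                numpy.asarray(rng.uniform(low=-0.01, high=0.01,
                                          size=(n_visible, n_hidden)),
                              dtype=theano.config.floatX), name='W')
            self.b = theano.shared(
                numpy.zeros(n_hidden, dtype=theano.config.floatX), name='b')
            # Latent representation: y = s(Wx + b)
            self.y = T.nnet.sigmoid(T.dot(self.x, self.W) + self.b)

    rng = numpy.random.RandomState(123)
    x = T.matrix('x')
    layer1 = AutoEncoderSketch(x, n_visible=784, n_hidden=500, rng=rng)
    # layer2 consumes the symbolic y of layer1, so any compiled function of
    # layer2.y automatically includes layer1's computation.
    layer2 = AutoEncoderSketch(layer1.y, n_visible=500, n_hidden=250, rng=rng)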
@@ -127,11 +131,11 @@ Denoising Autoencoders (dA)
 The idea behind denoising autoencoders is simple. In order to force
 the hidden layer to discover more robust features, we train the
 autoencoder to reconstruct the input from a corrupted version of it.
-This can be understoond from different perspectives
+This can be understood from different perspectives
 (the manifold learning perspective,
 stochastic operator perspective,
 bottom-up -- information theoretic perspective,
-top-down -- generative model perspective ), all of which being explained in
+top-down -- generative model perspective), all of which are explained in
 [Vincent08]_.
 
 
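Whatever the perspective, the corruption step itself is mechanically
simple. Here is a minimal sketch using Theano's ``RandomStreams``, where
zeroing out a random subset of the inputs and the 30% corruption level
are illustrative choices:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    theano_rng = RandomStreams(numpy.random.RandomState(123).randint(2 ** 30))
    x = T.matrix('x')
    corruption_level = 0.3   # fraction of inputs to destroy (illustrative)

    # Keep each component of x with probability 1 - corruption_level and
    # zero it out otherwise.
    mask = theano_rng.binomial(size=x.shape, n=1, p=1 - corruption_level,
                               dtype=theano.config.floatX)
    tilde_x = mask * x

The encoder is then applied to ``tilde_x`` instead of ``x``, while the
reconstruction cost still compares ``z`` against the uncorrupted ``x``.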