Commit b241bde
Author: Yoshua Bengio
Commit message: editing SdA txt
Parent: d6c0574
1 file changed: doc/SdA.txt (16 additions, 12 deletions)
@@ -8,7 +8,7 @@ Stacked Denoising Autoencoders (SdA)
 
 
 The Stacked Denoising Autoencoder (SdA) is an extension of the stacked
-autoencoder and it was introduced in [Vincent08]_. We will start the
+autoencoder [Bengio07]_ and it was introduced in [Vincent08]_. We will start the
 tutorial with a short digression on :ref:`autoencoders`
 and then move on to how classical
 autoencoders are extended to denoising autoencoders (:ref:`dA`).
@@ -22,39 +22,41 @@ Autoencoders
 +++++++++++++
 
 An autoencoder takes an input :math:`\mathbf{x} \in [0,1]^d` and first
-maps it to a hidden representation :math:`\mathbf{y} \in [0,1]^{d'}`
+maps it (with an *encoder*) to a hidden representation :math:`\mathbf{y} \in [0,1]^{d'}`
 through a deterministic mapping:
 
 .. math::
 
   \mathbf{y} = s(\mathbf{W}\mathbf{x} + \mathbf{b})
 
-The latent representation :math:`\mathbf{y}` is then mapped back into a
+The latent representation :math:`\mathbf{y}` is then mapped back (with a *decoder*) into a
 "reconstructed" vector :math:`\mathbf{z}` of same shape as
 :math:`\mathbf{x}` through a similar transformation, namely:
 
 .. math::
 
   \mathbf{z} = s(\mathbf{W'}\mathbf{y} + \mathbf{b'})
 
-The weights matrix :math:`\mathbf{W'}` of the reverse mapping may be
-optionally constrained by :math:`\mathbf{W'} = \mathbf{W}^T`, this is
-called using *tied weights*. The parameters of this model (nameley
+where ' does not indicate transpose, and
+:math:`\mathbf{z}` should be seen as a prediction of :math:`\mathbf{x}`.
+The weight matrix :math:`\mathbf{W'}` of the reverse mapping may be
+optionally constrained by :math:`\mathbf{W'} = \mathbf{W}^T`, which is
+an instance of *tied weights*. The parameters of this model (namely
 :math:`\mathbf{W}`, :math:`\mathbf{b}`,
 :math:`\mathbf{b'}` and, if one doesn't use tied weights, also
 :math:`\mathbf{W'}`) are optimized such that the average reconstruction
 error is minimized. The reconstruction error can be measured using the
-traditional *squared error* :math:`L(\mathbf{x}, \mathbf{z}) = | \mathbf{x} - \mathbf{z} |^2`,
+traditional *squared error* :math:`L(\mathbf{x}, \mathbf{z}) = || \mathbf{x} - \mathbf{z} ||^2`,
 or if the input is interpreted as either bit vectors or vectors of
 bit probabilities by the reconstruction *cross-entropy* defined as :
 
 .. math::
 
-  L_{H} (\mathbf{x}, \mathbf{z} = - \sum^d_{k=1}[\mathbf{x}_k \log
+  L_{H} (\mathbf{x}, \mathbf{z}) = - \sum^d_{k=1}[\mathbf{x}_k \log
   \mathbf{z}_k + (1 - \mathbf{x}_k)\log(1 - \mathbf{z}_k)]
 
 
-We want to implent this behaviour using Theano, in the form of a class,
+We want to implement this behavior using Theano, in the form of a class,
 that could be afterwards used in constructing a stacked autoencoder. The
 first step is to create shared variables for the parameters of the
 autoencoder ( :math:`\mathbf{W}`, :math:`\mathbf{b}` and
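
As a quick numeric illustration of the equations in this hunk, here is a minimal NumPy sketch of one autoencoder pass with tied weights. This is not the tutorial's Theano code; the sizes and variable names are hypothetical, chosen only to mirror the formulas above.

```python
import numpy as np

def sigmoid(a):
    # s(.) in the equations: elementwise logistic nonlinearity
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
d, d_hidden = 8, 4                 # hypothetical input / hidden sizes

# Parameters: W and b for the encoder; with tied weights W' = W^T,
# only the decoder bias b' is an extra parameter.
W = rng.normal(scale=0.1, size=(d_hidden, d))
b = np.zeros(d_hidden)
b_prime = np.zeros(d)

x = rng.uniform(size=d)            # an input in [0, 1]^d

y = sigmoid(W @ x + b)             # y = s(Wx + b), the hidden code
z = sigmoid(W.T @ y + b_prime)     # z = s(W'y + b'), with W' = W^T

# The two reconstruction errors discussed above:
squared_error = np.sum((x - z) ** 2)
cross_entropy = -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z))
print(squared_error, cross_entropy)
```

Training would then adjust W, b, and b' by gradient descent on one of these losses; in the tutorial this is done symbolically with Theano rather than by hand.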
@@ -92,7 +94,9 @@ autoencoder ( :math:`\mathbf{W}`, :math:`\mathbf{b}` and
 
 Note that we pass the ``input`` to the autoencoder as a
 parameter. This is such that later we can concatenate layers of
-autoencoders to form a deep network.
+autoencoders to form a deep network: the symbolic output (the :math:`\mathbf{y}` above, self.y
+in the code below) of
+the k-th layer will be the symbolic input of the (k+1)-th.
 
 Now we can compute the latent representation and the reconstructed
 signal :
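
The layer-chaining idea added in this hunk (output of layer k feeds layer k+1) can be sketched numerically as follows; the layer widths and names here are hypothetical, and in the tutorial the chaining happens on Theano symbolic variables rather than on arrays.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
sizes = [8, 6, 4, 3]               # hypothetical widths of the stacked layers

# One (W, b) pair per autoencoder layer in the stack.
params = [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

signal = rng.uniform(size=sizes[0])
for W, b in params:
    # The hidden code y of layer k becomes the input of layer k+1.
    signal = sigmoid(W @ signal + b)
print(signal.shape)
```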
@@ -127,11 +131,11 @@ Denoising Autoencoders (dA)
 The idea behind denoising autoencoders is simple. In order to enforce
 the hidden layer to discover more roboust features we train the
 autoencoder to reconstruct the input from a corrupted version of it.
-This can be understoond from different perspectives
+This can be understood from different perspectives
 ( the manifold learning perspective,
 stochastic operator perspective,
 bottom-up -- information theoretic perspective,
-top-down -- generative model perspective ), all of which being explained in
+top-down -- generative model perspective ), all of which are explained in
 [Vincent08]_.
 
 
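
One common corruption process for the denoising autoencoder (the masking noise used in [Vincent08]_) can be sketched as below; the corruption level chosen here is arbitrary, not a value from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
corruption_level = 0.3             # hypothetical fraction of inputs to destroy

x = rng.uniform(size=10)           # the clean input

# Masking noise: zero out a random subset of the input components.
# The autoencoder is then trained to reconstruct the clean x from x_tilde.
mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
x_tilde = mask * x
```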
