Denoising Autoencoders
======================

This section assumes the reader has already read through [Classifying MNIST digits using Logistic Regression](https://github.com/Syndrome777/DeepLearningTutorial/blob/master/2_Classifying_MNIST_using_LR_逻辑回归进行MNIST分类.md) and [Multilayer Perceptron](https://github.com/Syndrome777/DeepLearningTutorial/blob/master/3_Multilayer_Perceptron_多层感知机.md). If you want to run the code on a GPU, you should also read [using the GPU](http://deeplearning.net/software/theano/tutorial/using_gpu.html).

All of the code for this section can be downloaded [here](http://deeplearning.net/tutorial/code/dA.py).

The denoising autoencoder (dA) is an extension of the classical autoencoder. It was introduced as a building block for deep networks in [Vincent08](http://deeplearning.net/tutorial/references.html#vincent08). We begin this tutorial with a short discussion of [autoencoders](http://deeplearning.net/tutorial/dA.html#autoencoders).

### Autoencoders

Section 4.6 of [Bengio09](http://deeplearning.net/tutorial/references.html#bengio09) gives an overview of autoencoders. An autoencoder takes a d-dimensional input vector x in [0,1]^d and first maps it (with an encoder) to a hidden representation y in [0,1]^d' through a deterministic mapping:

![y_mapping](/images/5_autoencoders_1.png)

where s is a non-linearity such as the sigmoid. The latent representation y, or code, is then mapped back (with a decoder) into a reconstruction z of the same shape as x. The mapping happens through a similar transformation:

![z_mapping](/images/5_autoencoders_2.png)

(Here the prime symbol does not denote matrix transposition.) z should be seen as a prediction of x, given the code y. Optionally, the weight matrix W' of the reverse mapping may be constrained to be the transpose of the forward mapping, ![transpose](/images/5_autoencoders_3.png), which is referred to as tied weights. The parameters of this model (namely W, b, b' and, if tied weights are not used, also W') are trained by minimizing the average reconstruction error.

The reconstruction error can be measured in many ways, depending on the distributional assumptions made about the input. The traditional squared error L(x,z) = ||x - z||^2 can be used. If the input is interpreted as a vector of bits or a vector of bit probabilities, the reconstruction `cross-entropy` can be used:

![cross-entropy](/images/5_autoencoders_4.png)
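
Written out explicitly, this is the same expression that appears as equation (4) in the class docstring below:

```
L_H(x, z) = - \sum_{k=1}^{d} [ x_k \log z_k + (1 - x_k) \log(1 - z_k) ]
```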

The hope is that the code y is a distributed representation that captures the coordinates along the main factors of variation in the data. This is similar to the way principal component analysis (PCA) captures the main factors of variation in the data. Indeed, if there is one linear hidden layer (the code) and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input onto the span of the first k principal components of the data. If the hidden layer is non-linear, the autoencoder behaves differently from PCA, gaining the ability to capture multi-modal aspects of the input distribution. The departure from PCA becomes even more important when we consider stacking multiple encoders (as is done when building a deep autoencoder in [Hinton06](http://deeplearning.net/tutorial/references.html#hinton06)).

Because y is viewed as a lossy compression of x, it cannot be a good (small-loss) compression for every x. Optimization makes it a good compression for the training examples, and hopefully for other inputs as well, but not for arbitrary inputs. This is the sense in which an autoencoder generalizes: it gives low reconstruction error on test examples drawn from the same distribution as the training examples, but generally high reconstruction error on samples drawn at random from the input space.

We want to implement the autoencoder in Theano as a class, so that it can later be reused to construct a stacked autoencoder. The first step is to create shared variables for the parameters of the autoencoder (W, b and b').

```Python
# imports needed by the class below
import numpy

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams


class dA(object):
    """Denoising Auto-Encoder class (dA)

    A denoising autoencoder tries to reconstruct the input from a corrupted
    version of it by projecting it first in a latent space and reprojecting
    it afterwards back in the input space. Please refer to Vincent et al., 2008
    for more details. If x is the input then equation (1) computes a partially
    destroyed version of x by means of a stochastic mapping q_D. Equation (2)
    computes the projection of the input into the latent space. Equation (3)
    computes the reconstruction of the input, while equation (4) computes the
    reconstruction error.

    .. math::

        \tilde{x} ~ q_D(\tilde{x}|x)                                    (1)

        y = s(W \tilde{x} + b)                                          (2)

        x = s(W' y + b')                                                (3)

        L(x,z) = -sum_{k=1}^d [x_k \log z_k + (1-x_k) \log( 1-z_k)]     (4)

    """

    def __init__(
        self,
        numpy_rng,
        theano_rng=None,
        input=None,
        n_visible=784,
        n_hidden=500,
        W=None,
        bhid=None,
        bvis=None
    ):
        """
        Initialize the dA class by specifying the number of visible units (the
        dimension d of the input), the number of hidden units (the dimension
        d' of the latent or hidden space) and the corruption level. The
        constructor also receives symbolic variables for the input, weights and
        bias. Such symbolic variables are useful when, for example, the input
        is the result of some computations, or when weights are shared between
        the dA and an MLP layer. When dealing with SdAs this always happens:
        the dA on layer 2 gets as input the output of the dA on layer 1,
        and the weights of the dA are used in the second stage of training
        to construct an MLP.

        :type numpy_rng: numpy.random.RandomState
        :param numpy_rng: numpy random generator used to generate weights

        :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
        :param theano_rng: Theano random generator; if None is given one is
                           generated based on a seed drawn from `rng`

        :type input: theano.tensor.TensorType
        :param input: a symbolic description of the input or None for
                      standalone dA

        :type n_visible: int
        :param n_visible: number of visible units

        :type n_hidden: int
        :param n_hidden: number of hidden units

        :type W: theano.tensor.TensorType
        :param W: Theano variable pointing to a set of weights that should be
                  shared between the dA and another architecture; if dA should
                  be standalone set this to None

        :type bhid: theano.tensor.TensorType
        :param bhid: Theano variable pointing to a set of bias values (for
                     hidden units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        :type bvis: theano.tensor.TensorType
        :param bvis: Theano variable pointing to a set of bias values (for
                     visible units) that should be shared between dA and another
                     architecture; if dA should be standalone set this to None

        """
        self.n_visible = n_visible
        self.n_hidden = n_hidden

        # create a Theano random generator that gives symbolic random values
        if not theano_rng:
            theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

        # note : W' was written as `W_prime` and b' as `b_prime`
        if not W:
            # W is initialized with `initial_W`, which is uniformly sampled
            # from -4*sqrt(6./(n_visible+n_hidden)) and
            # 4*sqrt(6./(n_hidden+n_visible)); the output of uniform is
            # converted using asarray to dtype
            # theano.config.floatX so that the code is runnable on GPU
            initial_W = numpy.asarray(
                numpy_rng.uniform(
                    low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
                    size=(n_visible, n_hidden)
                ),
                dtype=theano.config.floatX
            )
            W = theano.shared(value=initial_W, name='W', borrow=True)

        if not bvis:
            bvis = theano.shared(
                value=numpy.zeros(
                    n_visible,
                    dtype=theano.config.floatX
                ),
                borrow=True
            )

        if not bhid:
            bhid = theano.shared(
                value=numpy.zeros(
                    n_hidden,
                    dtype=theano.config.floatX
                ),
                name='b',
                borrow=True
            )

        self.W = W
        # b corresponds to the bias of the hidden units
        self.b = bhid
        # b_prime corresponds to the bias of the visible units
        self.b_prime = bvis
        # tied weights, therefore W_prime is W transpose
        self.W_prime = self.W.T
        self.theano_rng = theano_rng
        # if no input is given, generate a variable representing the input
        if input is None:
            # we use a matrix because we expect a minibatch of several
            # examples, each example being a row
            self.x = T.dmatrix(name='input')
        else:
            self.x = input

        self.params = [self.W, self.b, self.b_prime]
```

Note that we pass the symbolic `input` to the autoencoder as a parameter. This is what allows us to chain autoencoders into a deep network: the symbolic output (the y above) of autoencoder k can be used as the input of autoencoder k+1.

Now we can express the computation of the latent representation and of the reconstructed signal:

```Python
    def get_hidden_values(self, input):
        """ Computes the values of the hidden layer """
        return T.nnet.sigmoid(T.dot(input, self.W) + self.b)

    def get_reconstructed_input(self, hidden):
        """Computes the reconstructed input given the values of the
        hidden layer

        """
        return T.nnet.sigmoid(T.dot(hidden, self.W_prime) + self.b_prime)
```
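
As noted above, the symbolic `input` makes it possible to chain autoencoders. A minimal sketch of that wiring (not part of dA.py; it assumes `rng`, `theano_rng` and a symbolic matrix `x` already exist, and the layer sizes are arbitrary):

```Python
# hidden representation (y) of the first dA becomes the input of the second dA
da1 = dA(numpy_rng=rng, theano_rng=theano_rng, input=x,
         n_visible=28 * 28, n_hidden=500)
da2 = dA(numpy_rng=rng, theano_rng=theano_rng,
         input=da1.get_hidden_values(x),
         n_visible=500, n_hidden=250)
```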

Using these functions we can now compute the cost and the updates for one step of stochastic gradient descent:

```Python
    def get_cost_updates(self, corruption_level, learning_rate):
        """ This function computes the cost and the updates for one training
        step of the dA """

        tilde_x = self.get_corrupted_input(self.x, corruption_level)
        y = self.get_hidden_values(tilde_x)
        z = self.get_reconstructed_input(y)
        # note : we sum over the size of a datapoint; if we are using
        #        minibatches, L will be a vector, with one entry per
        #        example in the minibatch
        L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1)
        # note : L is now a vector, where each element is the
        #        cross-entropy cost of the reconstruction of the
        #        corresponding example of the minibatch. We need to
        #        compute the average of all these to get the cost of
        #        the minibatch
        cost = T.mean(L)

        # compute the gradients of the cost of the `dA` with respect
        # to its parameters
        gparams = T.grad(cost, self.params)
        # generate the list of updates
        updates = [
            (param, param - learning_rate * gparam)
            for param, gparam in zip(self.params, gparams)
        ]

        return (cost, updates)
```
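
Note that `get_cost_updates` relies on a `get_corrupted_input` method that is not shown in this excerpt. A sketch of what such a method can look like, implementing the stochastic corruption q_D of equation (1) by zeroing each input component with probability `corruption_level`:

```Python
    def get_corrupted_input(self, input, corruption_level):
        """Keeps each entry of the input with probability
        1 - corruption_level and zeroes it out otherwise."""
        return self.theano_rng.binomial(size=input.shape, n=1,
                                        p=1 - corruption_level,
                                        dtype=theano.config.floatX) * input
```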

We can now define a function that, applied iteratively, updates the parameters W, b and b' so that the reconstruction cost is approximately minimized:

```Python
da = dA(
    numpy_rng=rng,
    theano_rng=theano_rng,
    input=x,
    n_visible=28 * 28,
    n_hidden=500
)

cost, updates = da.get_cost_updates(
    corruption_level=0.,
    learning_rate=learning_rate
)

train_da = theano.function(
    [index],
    cost,
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size]
    }
)
```
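
`train_da` computes the cost and applies one parameter update for a single minibatch. A minimal sketch of the loop that calls it repeatedly (assuming `training_epochs`, `n_train_batches` and the other symbols above are set up as in the earlier tutorials):

```Python
for epoch in range(training_epochs):
    # go through the training set one minibatch at a time
    costs = []
    for batch_index in range(n_train_batches):
        costs.append(train_da(batch_index))
    print('Training epoch %d, cost %f' % (epoch, numpy.mean(costs)))
```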

If there is no constraint besides minimizing the reconstruction error, an autoencoder with n inputs and an encoding of dimension at least n could potentially just learn the identity function, merely mapping the input to its copy.