|
51 | 51 | }, |
52 | 52 | { |
53 | 53 | "cell_type": "code", |
54 | | - "execution_count": 2, |
| 54 | + "execution_count": 1, |
55 | 55 | "metadata": {}, |
56 | 56 | "outputs": [ |
57 | 57 | { |
|
124 | 124 | }, |
125 | 125 | { |
126 | 126 | "cell_type": "code", |
127 | | - "execution_count": 4, |
| 127 | + "execution_count": 2, |
128 | 128 | "metadata": {}, |
129 | 129 | "outputs": [ |
130 | 130 | { |
|
153 | 153 | }, |
154 | 154 | { |
155 | 155 | "cell_type": "code", |
156 | | - "execution_count": 5, |
| 156 | + "execution_count": 6, |
157 | 157 | "metadata": {}, |
158 | 158 | "outputs": [ |
159 | 159 | { |
160 | 160 | "name": "stdout", |
161 | 161 | "output_type": "stream", |
162 | 162 | "text": [ |
163 | | - "tensor([[-0.0204, -0.0268, -0.0829, 0.1420, -0.0192, 0.1848, 0.0723, -0.0393,\n", |
164 | | - " -0.0275, 0.0867]], grad_fn=<ThAddmmBackward>)\n" |
| 163 | + "tensor([[ 0.1120, 0.0713, 0.1014, -0.0696, -0.1210, 0.0084, -0.0206, 0.1366,\n", |
| 164 | + " -0.0455, -0.0036]], grad_fn=<AddmmBackward>)\n" |
165 | 165 | ] |
166 | 166 | } |
167 | 167 | ], |
|
181 | 181 | }, |
182 | 182 | { |
183 | 183 | "cell_type": "code", |
184 | | - "execution_count": 6, |
| 184 | + "execution_count": 7, |
185 | 185 | "metadata": {}, |
186 | 186 | "outputs": [], |
187 | 187 | "source": [ |
|
232 | 232 | }, |
233 | 233 | { |
234 | 234 | "cell_type": "code", |
235 | | - "execution_count": 7, |
| 235 | + "execution_count": 8, |
236 | 236 | "metadata": {}, |
237 | 237 | "outputs": [ |
238 | 238 | { |
239 | 239 | "name": "stdout", |
240 | 240 | "output_type": "stream", |
241 | 241 | "text": [ |
242 | | - "tensor(1.3172, grad_fn=<MseLossBackward>)\n" |
| 242 | + "tensor(0.8109, grad_fn=<MseLossBackward>)\n" |
243 | 243 | ] |
244 | 244 | } |
245 | 245 | ], |
|
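The scalar loss shown in the hunk above comes from ``nn.MSELoss``. A minimal runnable sketch of that computation (the shapes and values here are an assumption for illustration, not the tutorial's LeNet output):

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()  # mean squared error: mean((output - target) ** 2)

output = torch.tensor([[1.0, 2.0, 3.0]], requires_grad=True)
target = torch.tensor([[1.0, 0.0, 5.0]])

# Differences are (0, 2, -2); squares are (0, 4, 4); mean is 8/3.
loss = criterion(output, target)
print(loss.item())  # 2.6666...
```

Because ``output`` requires gradients, ``loss`` carries a ``grad_fn`` (named ``MseLossBackward`` or ``MseLossBackward0`` depending on the PyTorch version), matching the printed output in the diff.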
257 | 257 | "cell_type": "markdown", |
258 | 258 | "metadata": {}, |
259 | 259 | "source": [ |
260 | | - "Now, if you follow ``loss`` in the backward direction, using its\n", |
261 | | - "``.grad_fn`` attribute, you will see a graph of computations that looks\n", |
262 | | - "like this:\n", |
| 260 | + "Now, if you follow ``loss`` in the backward direction using its\n", |
| 261 | + "``.grad_fn`` attribute, you will see a computation graph like the one below.\n", |
263 | 262 | "\n", |
264 | 263 | "::\n", |
265 | 264 | "\n", |
|
268 | 267 | " -> MSELoss\n", |
269 | 268 | " -> loss\n", |
270 | 269 | "\n", |
271 | | - "So, when we call ``loss.backward()``, the whole graph is differentiated\n", |
272 | | - "w.r.t. the loss, and all Tensors in the graph that has ``requires_grad=True``\n", |
273 | | - "will have their ``.grad`` Tensor accumulated with the gradient.\n", |
| 270 | + "So, when we call ``loss.backward()``, the whole computation graph is\n", |
| 271 | + "differentiated w.r.t. the loss, and every tensor in the graph with ``requires_grad=True``\n", |
| 272 | + "will have the gradient accumulated into its ``.grad`` tensor.\n", |
274 | 273 | "\n", |
275 | | - "For illustration, let us follow a few steps backward:\n", |
| 274 | + "For illustration, let us follow a few steps backward:\n", |
276 | 275 | "\n" |
277 | 276 | ] |
278 | 277 | }, |
279 | 278 | { |
280 | 279 | "cell_type": "code", |
281 | | - "execution_count": null, |
| 280 | + "execution_count": 9, |
282 | 281 | "metadata": {}, |
283 | | - "outputs": [], |
| 282 | + "outputs": [ |
| 283 | + { |
| 284 | + "name": "stdout", |
| 285 | + "output_type": "stream", |
| 286 | + "text": [ |
| 287 | + "<MseLossBackward object at 0x7f3b49fe2470>\n", |
| 288 | + "<AddmmBackward object at 0x7f3bb05f17f0>\n", |
| 289 | + "<AccumulateGrad object at 0x7f3b4a3c34e0>\n" |
| 290 | + ] |
| 291 | + } |
| 292 | + ], |
284 | 293 | "source": [ |
285 | 294 | "print(loss.grad_fn) # MSELoss\n", |
286 | 295 | "print(loss.grad_fn.next_functions[0][0]) # Linear\n", |
|
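The ``grad_fn`` chain printed in the cell above (``MseLossBackward`` → ``AddmmBackward`` → ``AccumulateGrad``) can be walked generically. A minimal sketch, where ``nn.Linear`` stands in for the tutorial's network (an assumption for brevity; any model ending in a linear layer yields a similar chain):

```python
import torch
import torch.nn as nn

net = nn.Linear(3, 1)
input = torch.randn(1, 3)
target = torch.randn(1, 1)
loss = nn.MSELoss()(net(input), target)

# Walk the autograd graph backwards from the loss node, always taking the
# first parent; AccumulateGrad nodes are leaves (no next_functions).
node = loss.grad_fn
while node is not None:
    print(type(node).__name__)  # e.g. MseLossBackward0, AddmmBackward0, AccumulateGrad
    node = node.next_functions[0][0] if node.next_functions else None
```

The exact class names carry a trailing ``0`` in recent PyTorch versions (``AddmmBackward0``), which is why the diff above shows ``ThAddmmBackward`` being replaced by ``AddmmBackward``: the names changed across releases.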
304 | 313 | }, |
305 | 314 | { |
306 | 315 | "cell_type": "code", |
307 | | - "execution_count": 8, |
| 316 | + "execution_count": 10, |
308 | 317 | "metadata": {}, |
309 | 318 | "outputs": [ |
310 | 319 | { |
|
314 | 323 | "conv1.bias.grad before backward\n", |
315 | 324 | "tensor([0., 0., 0., 0., 0., 0.])\n", |
316 | 325 | "conv1.bias.grad after backward\n", |
317 | | - "tensor([ 0.0074, -0.0249, -0.0107, 0.0326, -0.0017, -0.0059])\n" |
| 326 | + "tensor([ 0.0051, 0.0042, 0.0026, 0.0152, -0.0040, -0.0036])\n" |
318 | 327 | ] |
319 | 328 | } |
320 | 329 | ], |
|
364 | 373 | }, |
365 | 374 | { |
366 | 375 | "cell_type": "code", |
367 | | - "execution_count": 9, |
| 376 | + "execution_count": 11, |
368 | 377 | "metadata": {}, |
369 | 378 | "outputs": [], |
370 | 379 | "source": [ |
|
385 | 394 | "cell_type": "markdown", |
386 | 395 | "metadata": {}, |
387 | 396 | "source": [ |
388 | | - ".. Note::\n", |
| 397 | + ".. Note::\n", |
389 | 398 | " \n", |
390 | | - " Observe how gradient buffers had to be manually set to zero using\n", |
391 | | - " ``optimizer.zero_grad()``. This is because gradients are accumulated\n", |
392 | | - " as explained in `Backprop`_ section.\n", |
| 399 | + " Observe how the gradient buffers had to be manually set to zero using ``optimizer.zero_grad()``.\n", |
| 400 | + " This is because gradients are accumulated, as explained in the Backprop section.\n", |
393 | 401 | "\n" |
394 | 402 | ] |
395 | 403 | }, |
|
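The accumulation behaviour the note describes can be demonstrated directly. A minimal sketch, where a tiny ``nn.Linear`` model stands in for the tutorial's LeNet (an assumption made for brevity; the accumulation behaviour is identical):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(1, 4)
target = torch.randn(1, 2)

# First backward pass fills the .grad buffers.
criterion(model(x), target).backward()
first = model.weight.grad.clone()

# Second backward pass WITHOUT zeroing: gradients are summed, not replaced.
criterion(model(x), target).backward()
second = model.weight.grad.clone()
print(torch.allclose(second, 2 * first))  # True: the gradient doubled

# optimizer.zero_grad() resets the buffers before the next iteration
# (newer PyTorch versions may set .grad to None instead of zeroing it).
optimizer.zero_grad()
print(model.weight.grad is None or model.weight.grad.abs().sum().item() == 0.0)  # True
```

Forgetting the ``zero_grad()`` call is a classic bug: each training step would then descend along the sum of all gradients computed so far rather than the current one.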
403 | 411 | ], |
404 | 412 | "metadata": { |
405 | 413 | "kernelspec": { |
406 | | - "display_name": "Pytorch for Deeplearning", |
| 414 | + "display_name": "Python 3", |
407 | 415 | "language": "python", |
408 | | - "name": "pytorch" |
| 416 | + "name": "python3" |
409 | 417 | }, |
410 | 418 | "language_info": { |
411 | 419 | "codemirror_mode": { |
|
417 | 425 | "name": "python", |
418 | 426 | "nbconvert_exporter": "python", |
419 | 427 | "pygments_lexer": "ipython3", |
420 | | - "version": "3.6.7" |
| 428 | + "version": "3.7.3" |
421 | 429 | } |
422 | 430 | }, |
423 | 431 | "nbformat": 4, |
|