
Commit ca0fe51 (1 parent: 92a7c8a)

small tweaks

3 files changed

Lines changed: 4 additions & 3 deletions

File tree

doc/gettingstarted.txt

Lines changed: 2 additions & 1 deletion
@@ -148,7 +148,8 @@ Math Conventions
 List of Symbols and acronyms
 ++++++++++++++++++++++++++++

-* D: number of input dimensions.
+* :math:`D`: number of input dimensions.
+* :math:`D_h^{(i)}`: number of hidden units in the :math:`i`-th layer.
 * :math:`f_{\theta}(x)`, :math:`f(x)`: prediction function of a model :math:`P(Y|x,\theta)`, defined as :math:`argmax_k P(Y=k|x,\theta)`.
   Note that we will often drop the :math:`\theta` subscript.
 * L: number of labels.
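The prediction function defined in the symbol list, :math:`f(x) = argmax_k P(Y=k|x,\theta)`, amounts to picking the label with the highest model probability. A minimal plain-Python sketch (the `predict` name and the example probability vector are illustrative, not from the tutorial):

```python
def predict(probs):
    # f(x) = argmax_k P(Y=k | x, theta): given the vector of class
    # probabilities for one input x, return the most probable label k.
    return max(range(len(probs)), key=lambda k: probs[k])

# Three classes with P(Y=0)=0.1, P(Y=1)=0.7, P(Y=2)=0.2:
print(predict([0.1, 0.7, 0.2]))  # prints 1
```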

doc/images/mylenet.png

-19.6 KB

doc/mlp.txt

Lines changed: 2 additions & 2 deletions
@@ -296,8 +296,8 @@ This hyper-parameter is very much dataset-dependent. Vaguely speaking, the
 more complicated the input distribution is, the more capacity the network
 will require to model it, and so the larger the number of hidden units that
 will be needed (note that the number of weights in a layer, perhaps a more direct
-measure of capacity, is :math:`D\times H`, where :math:`D` is the number of
-inputs and :math:`H` is the number of hidden units).
+measure of capacity, is :math:`D\times D_h` (recall :math:`D` is the number of
+inputs and :math:`D_h` is the number of hidden units).

 Unless we employ some regularization scheme (early stopping or L1/L2
 penalties), a typical number of hidden units vs. generalization performance graph will be U-shaped.
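The :math:`D \times D_h` weight count that this change clarifies can be verified with a short sketch; the helper name and the example sizes (784 inputs, as for 28x28 MNIST images, feeding 500 hidden units) are illustrative assumptions, not part of the commit:

```python
def hidden_layer_weight_count(D, D_h, include_bias=False):
    # A fully connected layer from D inputs to D_h hidden units has a
    # D x D_h weight matrix; optionally count one bias per hidden unit too.
    return D * D_h + (D_h if include_bias else 0)

# 784 inputs (28x28 pixels) into 500 hidden units:
print(hidden_layer_weight_count(784, 500))        # prints 392000
print(hidden_layer_weight_count(784, 500, True))  # prints 392500
```

Counting biases separately makes explicit that the weight matrix alone, not the bias vector, dominates the layer's capacity.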
