
Commit ca0fe51 (1 parent: 92a7c8a)

small tweaks

3 files changed

Lines changed: 4 additions & 3 deletions

File tree

doc/gettingstarted.txt

Lines changed: 2 additions & 1 deletion
@@ -148,7 +148,8 @@ Math Conventions
 List of Symbols and acronyms
 ++++++++++++++++++++++++++++

-* D: number of input dimensions.
+* :math:`D`: number of input dimensions.
+* :math:`D_h^{(i)}`: number of hidden units in the :math:`i`-th layer.
 * :math:`f_{\theta}(x)`, :math:`f(x)`: prediction function of a model :math:`P(Y|x,\theta)`, defined as :math:`argmax_k P(Y=k|x,\theta)`.
   Note that we will often drop the :math:`\theta` subscript.
 * L: number of labels.
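The prediction function defined in the symbol list, :math:`f(x) = argmax_k P(Y=k|x,\theta)`, amounts to picking the label with the highest model probability. A minimal plain-Python sketch (the `predict` name and the example probability vector are illustrative, not from the tutorial):

```python
def predict(probs):
    # f(x) = argmax_k P(Y=k | x, theta): given the vector of class
    # probabilities for one input x, return the most probable label k.
    return max(range(len(probs)), key=lambda k: probs[k])

# Three classes with P(Y=0)=0.1, P(Y=1)=0.7, P(Y=2)=0.2:
print(predict([0.1, 0.7, 0.2]))  # prints 1
```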

doc/images/mylenet.png

-19.6 KB

doc/mlp.txt

Lines changed: 2 additions & 2 deletions
@@ -296,8 +296,8 @@ This hyper-parameter is very much dataset-dependent. Vaguely speaking, the
 more complicated the input distribution is, the more capacity the network
 will require to model it, and so the larger the number of hidden units that
 will be needed (note that the number of weights in a layer, perhaps a more direct
-measure of capacity, is :math:`D\times H`, where :math:`D` is the number of
-inputs and :math:`H` is the number of hidden units).
+measure of capacity, is :math:`D\times D_h` (recall :math:`D` is the number of
+inputs and :math:`D_h` is the number of hidden units).

 Unless we employ some regularization scheme (early stopping or L1/L2
 penalties), a typical number of hidden units vs. generalization performance graph will be U-shaped.
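The :math:`D \times D_h` weight count that this change clarifies can be verified with a short sketch; the helper name and the example sizes (784 inputs, as for 28x28 MNIST images, feeding 500 hidden units) are illustrative assumptions, not part of the commit:

```python
def hidden_layer_weight_count(D, D_h, include_bias=False):
    # A fully connected layer from D inputs to D_h hidden units has a
    # D x D_h weight matrix; optionally count one bias per hidden unit too.
    return D * D_h + (D_h if include_bias else 0)

# 784 inputs (28x28 pixels) into 500 hidden units:
print(hidden_layer_weight_count(784, 500))        # prints 392000
print(hidden_layer_weight_count(784, 500, True))  # prints 392500
```

Counting biases separately makes explicit that the weight matrix alone, not the bias vector, dominates the layer's capacity.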
