some typos

Razvan Pascanu · Razvan Pascanu · commit dde46d8f64a4 · 2010-04-06T16:05:49.000-04:00
diff --git a/doc/gettingstarted.txt b/doc/gettingstarted.txt
@@ -141,9 +141,12 @@ and then cast it to int.
 .. note::
     
     If you are running your code on the GPU and the dataset you are using 
-    is too large to fit in memory the code will crash. In such a case, do 
-    not store the data in a shared variable. You can however copy a larger chunk
-    of it at once (several minibatches) to reduce the overhead of data transfer.
+    is too large to fit in memory the code will crash. In such a case you
+    should not store the entire dataset in a shared variable. You can however
+    store a sufficiently small chunk of your data (several minibatches) in a
+    shared variable and use that during training. Once you have gone through
+    the chunk, update the values it stores. This way you minimize the number
+    of data transfers between CPU memory and GPU memory.
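
The chunking scheme described in the note can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the names ``chunk_schedule``, ``train_in_chunks``, ``upload`` and ``train_step`` are hypothetical and do not appear in the tutorial code. With Theano, ``upload`` would typically wrap something like ``shared_x.set_value(chunk)``, and ``train_step`` would be a compiled function that takes a minibatch index into the chunk currently held by the shared variable.

```python
def chunk_schedule(n_examples, batch_size, batches_per_chunk):
    """Yield (start, stop, n_batches) for each chunk of the dataset."""
    chunk_size = batch_size * batches_per_chunk
    for start in range(0, n_examples, chunk_size):
        stop = min(start + chunk_size, n_examples)
        # Only complete minibatches are counted; a trailing partial
        # minibatch at the end of the dataset is dropped.
        n_batches = (stop - start) // batch_size
        yield start, stop, n_batches


def train_in_chunks(data, batch_size, batches_per_chunk, upload, train_step):
    """Run one pass over `data`, uploading one chunk at a time.

    `upload` and `train_step` are hypothetical callbacks supplied by the
    caller. Returns the number of host-to-device transfers performed.
    """
    transfers = 0
    schedule = chunk_schedule(len(data), batch_size, batches_per_chunk)
    for start, stop, n_batches in schedule:
        # One transfer per chunk instead of one per minibatch.
        upload(data[start:stop])
        transfers += 1
        for index in range(n_batches):
            # `index` addresses a minibatch inside the current chunk.
            train_step(index)
    return transfers
```

With 1000 examples, minibatches of 20 and 10 minibatches per chunk, this performs 5 transfers for 50 training steps, rather than one transfer per step.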
 
 
 
@@ -170,7 +173,7 @@ use superscripts to distinguish training set examples: :math:`x^{(i)} \in
 \mathcal{R}^D` is thus the i-th training example of dimensionality :math:`D`. Similarly,
 :math:`y^{(i)} \in \{0, ..., L\}` is the i-th label assigned to input
 :math:`x^{(i)}`. It is straightforward to extend these examples to
-:math:`y^{(i)}` that has other types (e.g. Gaussian for regression,
+ones where :math:`y^{(i)}` has other types (e.g. Gaussian for regression,
 or groups of multinomials for predicting multiple symbols).
 
 .. index:: Math Conventions
diff --git a/doc/mlp.txt b/doc/mlp.txt
@@ -410,7 +410,7 @@ This allows information to flow well upward and downward in the network and
 reduces discrepancies between layers.
 Under some assumptions, a compromise between these two constraints leads to the following
 initialization: :math:`uniform[-\frac{\sqrt{6}}{\sqrt{fan_{in}+fan_{out}}},\frac{\sqrt{6}}{\sqrt{fan_{in}+fan_{out}}}]`
-for tanh and :math:`uniform[-\4*frac{6}{\sqrt{fan_{in}+fan_{out}}},\4*frac{6}{\sqrt{fan_{in}+fan_{out}}}]`
+for tanh and :math:`uniform[-4*\frac{\sqrt{6}}{\sqrt{fan_{in}+fan_{out}}},4*\frac{\sqrt{6}}{\sqrt{fan_{in}+fan_{out}}}]`
 for sigmoid, where :math:`fan_{in}` is the number of inputs and :math:`fan_{out}` the number of hidden units.
 For mathematical considerations please refer to [Xavier10].
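
A minimal sketch of this initialization, using only the Python standard library. It assumes the bound :math:`\sqrt{6/(fan_{in}+fan_{out})}` as given in Glorot and Bengio (2010), scaled by 4 for sigmoid units. The function names ``init_bound`` and ``init_weights`` are hypothetical and are not taken from the tutorial code.

```python
import math
import random


def init_bound(fan_in, fan_out, activation="tanh"):
    """Return the half-width of the uniform interval for the weights."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    if activation == "sigmoid":
        # Sigmoid units use an interval four times wider than tanh units.
        bound *= 4.0
    elif activation != "tanh":
        raise ValueError("activation must be 'tanh' or 'sigmoid'")
    return bound


def init_weights(fan_in, fan_out, activation="tanh", seed=0):
    """Sample a fan_in x fan_out weight matrix as a list of rows."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    bound = init_bound(fan_in, fan_out, activation)
    return [
        [rng.uniform(-bound, bound) for _ in range(fan_out)]
        for _ in range(fan_in)
    ]
```

For example, ``init_weights(784, 500)`` would produce a 784 by 500 matrix for a tanh hidden layer; a NumPy or Theano version would draw the same distribution in a single vectorized call.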