@@ -75,10 +75,10 @@ previous state, as needed.
 .. figure:: images/lstm_memorycell.png
    :align: center
 
-   **Figure 1** : Illustration of an LSTM memory cell.
+   **Figure 1**: Illustration of an LSTM memory cell.
 
 The equations below describe how a layer of memory cells is updated at every
-timestep :math:`t`. In these equations :
+timestep :math:`t`. In these equations:
 
 * :math:`x_t` is the input to the memory cell layer at time :math:`t`
 * :math:`W_i`, :math:`W_f`, :math:`W_c`, :math:`W_o`, :math:`U_i`,
@@ -89,7 +89,7 @@ timestep :math:`t`. In these equations :
 
 First, we compute the values for :math:`i_t`, the input gate, and
 :math:`\widetilde{C_t}` the candidate value for the states of the memory
-cells at time :math:`t` :
+cells at time :math:`t`:
 
 .. math::
    :label: 1
@@ -102,7 +102,7 @@ cells at time :math:`t` :
    \widetilde{C_t} = tanh(W_c x_t + U_c h_{t-1} + b_c)
 
 Second, we compute the value for :math:`f_t`, the activation of the memory
-cells' forget gates at time :math:`t` :
+cells' forget gates at time :math:`t`:
 
 .. math::
    :label: 3
@@ -111,15 +111,15 @@ cells' forget gates at time :math:`t` :
 
 Given the value of the input gate activation :math:`i_t`, the forget gate
 activation :math:`f_t` and the candidate state value :math:`\widetilde{C_t}`,
-we can compute :math:`C_t` the memory cells' new state at time :math:`t` :
+we can compute :math:`C_t` the memory cells' new state at time :math:`t`:
 
 .. math::
    :label: 4
 
    C_t = i_t * \widetilde{C_t} + f_t * C_{t-1}
 
 With the new state of the memory cells, we can compute the value of their
-output gates and, subsequently, their outputs :
+output gates and, subsequently, their outputs:
 
 .. math::
    :label: 5
@@ -139,7 +139,7 @@ In this variant, the activation of a cell’s output gate does not depend on the
 memory cell’s state :math:`C_t`. This allows us to perform part of the
 computation more efficiently (see the implementation note, below, for
 details). This means that, in the variant we have implemented, there is no
-matrix :math:`V_o` and equation :eq:`5` is replaced by equation :eq:`5-alt` :
+matrix :math:`V_o` and equation :eq:`5` is replaced by equation :eq:`5-alt`:
 
 .. math::
    :label: 5-alt
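As a reading aid for the hunks above, the full update of the implemented variant (equations :eq:`1` through :eq:`4`, :eq:`5-alt` and :eq:`6`) can be sketched in NumPy. This is an illustrative sketch, not the Theano code from lstm.py, and the parameter names (`W_i`, `U_i`, `b_i`, ...) simply mirror the symbols in the equations:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, used for the three gate activations."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One update of a layer of LSTM memory cells.

    `params` maps names such as "W_i", "U_i", "b_i" to NumPy arrays;
    the names mirror the tutorial's symbols and are illustrative only.
    """
    i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev
                  + params["b_i"])                         # input gate (eq. 1)
    c_tilde = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev
                      + params["b_c"])                     # candidate state (eq. 2)
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev
                  + params["b_f"])                         # forget gate (eq. 3)
    c_t = i_t * c_tilde + f_t * c_prev                     # new cell state (eq. 4)
    o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev
                  + params["b_o"])                         # output gate (eq. 5-alt)
    h_t = o_t * np.tanh(c_t)                               # cell output (eq. 6)
    return h_t, c_t
```

Note that, as in equation :eq:`5-alt`, the output gate here does not look at :math:`C_t`, which is what allows the fused-matrix optimization described in the implementation note below.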
@@ -170,7 +170,7 @@ concatenating the four matrices :math:`W_*` into a single weight matrix
 :math:`W` and performing the same concatenation on the weight matrices
 :math:`U_*` to produce the matrix :math:`U` and the bias vectors :math:`b_*`
 to produce the vector :math:`b`. Then, the pre-nonlinearity activations can
-be computed with :
+be computed with:
 
 .. math::
 
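The concatenation trick described in this hunk can be illustrated with a small NumPy sketch (array names are illustrative, not those used in lstm.py): stacking the four per-gate matrices and doing one fused product yields exactly the same four pre-nonlinearity activations as four separate products, but with fewer, larger matrix multiplications:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3  # hypothetical sizes: n memory cells, m input features

# Hypothetical per-gate parameters, in the order i, f, c, o.
Ws = [rng.standard_normal((n, m)) for _ in range(4)]  # W_i, W_f, W_c, W_o
Us = [rng.standard_normal((n, n)) for _ in range(4)]  # U_i, U_f, U_c, U_o
bs = [rng.standard_normal(n) for _ in range(4)]       # b_i, b_f, b_c, b_o

# Concatenate along the output dimension into single W, U, b.
W = np.concatenate(Ws, axis=0)  # shape (4n, m)
U = np.concatenate(Us, axis=0)  # shape (4n, n)
b = np.concatenate(bs)          # shape (4n,)

x_t = rng.standard_normal(m)
h_prev = rng.standard_normal(n)

# One fused product computes all four pre-nonlinearity activations at once.
z = W @ x_t + U @ h_prev + b

# Slice back into the per-gate pre-activations before applying sigma / tanh.
z_i, z_f, z_c, z_o = np.split(z, 4)
```

This is exactly why dropping :math:`V_o` matters: with no dependence on :math:`C_t`, all four pre-activations can be computed before any gate value is known.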
@@ -187,11 +187,11 @@ Code - Citations - Contact
 Code
 ====
 
-The LSTM implementation can be found in the two following files :
+The LSTM implementation can be found in the following two files:
 
-* `lstm.py <http://deeplearning.net/tutorial/code/lstm.py>`_ : Main script. Defines and train the model.
+* `lstm.py <http://deeplearning.net/tutorial/code/lstm.py>`_: Main script. Defines and trains the model.
 
-* `imdb.py <http://deeplearning.net/tutorial/code/imdb.py>`_ : Secondary script. Handles the loading and preprocessing of the IMDB dataset.
+* `imdb.py <http://deeplearning.net/tutorial/code/imdb.py>`_: Secondary script. Handles the loading and preprocessing of the IMDB dataset.
 
 After downloading both scripts and putting both in the same folder, the user
 can run the code by calling:
@@ -202,7 +202,7 @@ can run the code by calling:
 
 The script will automatically download the data and decompress it.
 
-**Note** : The provided code supports the Stochastic Gradient Descent (SGD),
+**Note**: The provided code supports the Stochastic Gradient Descent (SGD),
 AdaDelta and RMSProp optimization methods. You are advised to use AdaDelta or
 RMSProp because SGD appears to perform poorly on this task with this
 particular model.
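For readers unfamiliar with the recommended optimizers, the core of RMSProp can be sketched as follows. This is the generic textbook update (scale each gradient by a running root-mean-square of past gradients), not the exact code in lstm.py, and the hyperparameter values are illustrative defaults:

```python
import numpy as np

def rmsprop_update(param, grad, cache, lr=0.001, rho=0.9, eps=1e-6):
    """One generic RMSProp step (illustrative, not lstm.py's implementation).

    `cache` is a running average of squared gradients; dividing by its
    square root gives each parameter an adaptive, per-coordinate step size.
    """
    cache = rho * cache + (1.0 - rho) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```

The per-coordinate scaling is what makes RMSProp (and AdaDelta, which extends this idea) less sensitive to the choice of learning rate than plain SGD.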