@@ -41,14 +41,14 @@ hidden layer. This means that, the magnitude of weights in the transition
 matrix can have a strong impact on the learning process.
 
 If the weights in this matrix are small (or, more formally, if the leading
-eigenvalue of the weight matrix is small), it can lead to a situation called
-*vanishing gradients* where the gradient signal gets so small that learning
-either becomes very slow or stops working altogether. It can also make more
-difficult the task of learning long-term dependencies in the data.
-Conversely, if the weights in this matrix are large (or, again, more formally,
-if the leading eigenvalue of the weight matrix is large), it can lead to a
-situation where the gradient signal is so large that it can cause learning to
-diverge. This is often referred to as *exploding gradients*.
+eigenvalue of the weight matrix is smaller than 1.0), it can lead to a
+situation called *vanishing gradients*, where the gradient signal gets so
+small that learning either becomes very slow or stops working altogether.
+It can also make it more difficult to learn long-term dependencies in the
+data. Conversely, if the weights in this matrix are large (or, again, more
+formally, if the leading eigenvalue of the weight matrix is larger than
+1.0), the gradient signal can grow so large that learning diverges. This
+is often referred to as *exploding gradients*.
 
 These issues are the main motivation behind the LSTM model which introduces a
 new structure called a *memory cell* (see Figure 1 below). A memory cell is
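
The 1.0 threshold in the new wording can be made concrete with a small numerical experiment. The sketch below is a minimal illustration, not part of the tutorial's code: the function name, matrix size, and step count are invented here, and the elementwise nonlinearity of a real RNN is ignored, so it only approximates the true backward pass. It follows a backpropagated gradient through a linear recurrence, multiplying it once per time step by the transpose of a recurrent weight matrix rescaled to a chosen leading-eigenvalue magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

def backprop_gradient_norms(leading_eig, hidden_size=50, steps=100):
    """Follow a gradient backward through a linear recurrence.

    Ignoring the elementwise nonlinearity, backpropagation through a
    simple RNN multiplies the gradient by W.T once per time step, so
    its norm is governed by the leading eigenvalue of W.
    """
    W = rng.standard_normal((hidden_size, hidden_size))
    # Rescale W so the magnitude of its leading eigenvalue is `leading_eig`.
    W *= leading_eig / np.max(np.abs(np.linalg.eigvals(W)))
    grad = rng.standard_normal(hidden_size)
    norms = []
    for _ in range(steps):
        grad = W.T @ grad          # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

for rho in (0.9, 1.0, 1.1):
    print(rho, backprop_gradient_norms(rho)[-1])
# Below 1.0 the final norm collapses toward zero (vanishing gradients);
# above 1.0 it grows without bound (exploding gradients).
```

Running it with values such as 0.9 and 1.1 shows the two regimes the revised text describes: after 100 steps the gradient norm differs from its starting value by several orders of magnitude in opposite directions.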