+ .. _rnnslu:
+
Recurrent Neural Networks with Word Embeddings
**********************************************

@@ -15,11 +17,13 @@ in order to perform Semantic Parsing / Slot-Filling (Spoken Language Understanding)
Code - Citations - Contact
++++++++++++++++++++++++++

- **Code**
+ Code
+ ====

Directly running experiments is also possible using this `github repository <https://github.com/mesnilgr/is13>`_.

- **Papers**
+ Papers
+ ======

If you use this tutorial, cite the following papers:

@@ -35,7 +39,8 @@ If you use this tutorial, cite the following papers:

Thank you!

- **Contact**
+ Contact
+ =======

Please email `Grégoire Mesnil <http://www-etud.iro.umontreal.ca/~mesnilgr/>`_ with any
problem report or feedback. We will be glad to hear from you.
@@ -91,7 +96,8 @@ measure the performance of our models.
Recurrent Neural Network Model
++++++++++++++++++++++++++++++

- **Raw input encoding**
+ Raw input encoding
+ ==================

A token corresponds to a word. Each token in the ATIS vocabulary is associated with an index. Each sentence is an
array of indexes (``int32``). Then, each set (train, valid, test) is a list of arrays of indexes. A python
@@ -115,7 +121,8 @@ Same thing for labels corresponding to this particular sentence.
    'O', 'B-arrive_time.time_relative', 'B-arrive_time.time',
    'I-arrive_time.time', 'I-arrive_time.time']

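For illustration, a minimal sketch of how such index arrays map back to
strings; the dictionary names (``idx2word``, ``idx2label``) and the index
values here are hypothetical, not the ones stored in the dataset pickle::

    # hypothetical reverse dictionaries; the real names in the pickle may differ
    idx2word = {0: 'what', 1: 'flights', 2: 'leave', 3: 'boston'}
    idx2label = {0: 'O', 1: 'B-fromloc.city_name'}

    sentence = [0, 1, 2, 3]   # one sentence = one array of int32 indexes
    labels = [0, 0, 0, 1]     # one label index per word

    print([idx2word[i] for i in sentence])  # ['what', 'flights', 'leave', 'boston']
    print([idx2label[i] for i in labels])   # ['O', 'O', 'O', 'B-fromloc.city_name']
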
- **Context window**
+ Context window
+ ==============

Given a sentence, i.e. an array of indexes, and a window size, e.g. 1, 3, 5, ..., we
need to convert each word in the sentence to a context window surrounding this
@@ -165,7 +172,8 @@ Here is a sample:
To summarize, we started with an array of indexes and ended with a matrix of
indexes. Each line corresponds to the context window surrounding this word.

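A minimal sketch of this transformation, assuming the index ``-1`` is
reserved for padding (the actual code may use a different padding convention)::

    def context_window(sentence, win):
        """Return, for each word index, the window of win indexes around it.

        sentence : list of word indexes
        win      : odd window size (1, 3, 5, ...)
        """
        assert win % 2 == 1 and win >= 1
        half = win // 2
        padded = [-1] * half + list(sentence) + [-1] * half  # -1 = padding index
        return [padded[i:i + win] for i in range(len(sentence))]

    # context_window([383, 189, 13], 3)
    # -> [[-1, 383, 189], [383, 189, 13], [189, 13, -1]]
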
- **Word embeddings**
+ Word embeddings
+ =================

Once we have the sentence converted to context windows, i.e. a matrix of indexes, we have to associate
these indexes with the embeddings (the real-valued vector associated with each word).
@@ -218,7 +226,8 @@ We now have a sequence (of length 5, which corresponds to the length of the
sentence) of **context window word embeddings** which is easy to feed to a simple
recurrent neural network to iterate with.

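In plain numpy, this lookup amounts to fancy indexing followed by a reshape;
a sketch with arbitrary dimensions and illustrative index values::

    import numpy as np

    vocab_size, emb_dim, win = 1000, 50, 3
    # one extra row so the padding index -1 selects a dedicated embedding
    emb = 0.2 * np.random.uniform(-1.0, 1.0, (vocab_size + 1, emb_dim))

    # context windows for a 5-word sentence (see the context_window sketch above)
    idxs = np.array([[ -1, 383, 189],
                     [383, 189,  13],
                     [189,  13, 193],
                     [ 13, 193, 208],
                     [193, 208,  -1]])

    # each row becomes the concatenation of win embeddings: shape (5, win * emb_dim)
    x = emb[idxs].reshape(idxs.shape[0], win * emb_dim)
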
- **Elman recurrent neural network**
+ Elman recurrent neural network
+ ==============================

The following (Elman) recurrent neural network (E-RNN) takes as input the current input
(time ``t``) and the previous hidden state (time ``t-1``). Then it iterates.
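
One step of that recurrence, sketched in numpy with a sigmoid non-linearity
and illustrative weight names (``Wx``, ``Wh``, ``bh``)::

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def elman_step(x_t, h_prev, Wx, Wh, bh):
        # the new hidden state mixes the current input with the previous hidden state
        return sigmoid(x_t @ Wx + h_prev @ Wh + bh)

    # iterating over a sentence of context window word embeddings:
    #     h = h0
    #     for x_t in x:   # x has shape (n_words, win * emb_dim)
    #         h = elman_step(x_t, h, Wx, Wh, bh)
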
@@ -329,22 +338,25 @@ Note that the extension is `txt` and you will have to change it to `pl`.
Training
++++++++

- **Updates**
+ Updates
+ =======

For the stochastic gradient descent (SGD) update, we consider the whole sentence as a mini-batch
and perform one update per sentence. It is possible to perform a pure SGD (as opposed to mini-batch)
where the update is done on only a single word at a time.

After each iteration/update, we normalize the word embeddings to keep them on a unit sphere.

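The renormalization is a one-liner in numpy; a sketch, assuming ``emb`` is the
embedding matrix with one row per word::

    import numpy as np

    # project every embedding row back onto the unit sphere after an update
    emb /= np.sqrt((emb ** 2).sum(axis=1, keepdims=True))
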
- **Stopping Criterion**
+ Stopping Criterion
+ ==================

Early-stopping on a validation set is our regularization technique:
the training is run for a given number of epochs (a single pass through the
whole dataset) and we keep the best model with respect to the F1 score
computed on the validation set after each epoch.

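A sketch of that loop; ``train_one_epoch``, ``f1_score_on`` and the
``model.params`` list are stand-ins for the actual training and evaluation
routines (the tutorial computes F1 with ``conlleval.pl``)::

    def train_with_early_stopping(model, train_set, valid_set, n_epochs,
                                  train_one_epoch, f1_score_on):
        """Keep the parameters that score best on the validation set."""
        best_f1, best_params = -1.0, None
        for epoch in range(n_epochs):
            train_one_epoch(model, train_set)     # one pass over the data
            f1 = f1_score_on(model, valid_set)    # validation F1 after the epoch
            if f1 > best_f1:                      # new best model: keep a copy
                best_f1 = f1
                best_params = [p.copy() for p in model.params]
        return best_f1, best_params
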
- **Hyper-Parameter Selection**
+ Hyper-Parameter Selection
+ =========================

Although there is interesting research/`code
<https://github.com/JasperSnoek/spearmint>`_ on the topic of automatic
@@ -373,7 +385,8 @@ The user can then run the code by calling:
    ...
    ('BEST RESULT: epoch', 57, 'valid F1', 97.23, 'best test F1', 94.2, 'with the model', 'rnnslu')

- **Timing**
+ Timing
+ ======

Running experiments on ATIS using this `repository <https://github.com/mesnilgr/is13>`_
will run one epoch in less than 40 seconds on an i7 CPU 950 @ 3.07GHz using less than 200 MB of RAM::
@@ -394,7 +407,8 @@ After a few epochs, you obtain a decent performance of **94.48% F1 score**::
    [learning] epoch 44 >> 100.00% completed in 35.31 (sec) <<
    [...]

- **Word Embedding Nearest Neighbors**
+ Word Embedding Nearest Neighbors
+ ================================

We can check the k-nearest neighbors of the learned embeddings. L2 and
cosine distance gave the same results, so we plot them for the cosine distance.
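
A sketch of the lookup under cosine distance (normalizing the rows reduces
it to a dot product)::

    import numpy as np

    def nearest_neighbors(emb, word_idx, k=5):
        """Indexes of the k nearest embeddings to word_idx under cosine distance."""
        normed = emb / np.sqrt((emb ** 2).sum(axis=1, keepdims=True))
        sims = normed @ normed[word_idx]     # cosine similarity to every word
        return np.argsort(-sims)[1:k + 1]    # drop the word itself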