
Commit 0624849

Add lstm.doc and update dataset path

1 parent d788ebc

File tree

2 files changed: +66 -1 lines

code/imdb.py

Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ def prepare_data(seqs, labels, maxlen=None):
     return x, x_mask, labels

-def load_data(path="/data/lisatmp3/chokyun/tweets_sa/imdb/aclImdb/imdb.pkl", n_words=100000, valid_portion=0.1):
+def load_data(path="imdb.pkl", n_words=100000, valid_portion=0.1):
     ''' Loads the dataset

     :type dataset: string
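As a rough illustration of what a helper like `prepare_data` produces (a simplified sketch, not the tutorial's actual implementation): variable-length sequences are padded into a single matrix `x`, and a companion `x_mask` marks which positions hold real tokens, so the LSTM can ignore the padding.

```python
import numpy

def prepare_data_sketch(seqs, labels, maxlen=None):
    """Pad variable-length sequences into one matrix and build a mask.

    Simplified sketch of what imdb.py's prepare_data does; the real
    function may differ in layout and edge-case handling.
    """
    if maxlen is not None:
        # One possible policy: drop sequences longer than maxlen.
        kept = [(s, l) for s, l in zip(seqs, labels) if len(s) <= maxlen]
        seqs = [s for s, _ in kept]
        labels = [l for _, l in kept]

    n_samples = len(seqs)
    longest = max(len(s) for s in seqs)

    # Time-major layout (timesteps x samples), common in scan-based
    # Theano RNNs; assumed here, not confirmed by the diff.
    x = numpy.zeros((longest, n_samples), dtype="int64")
    x_mask = numpy.zeros((longest, n_samples), dtype="float32")
    for i, s in enumerate(seqs):
        x[: len(s), i] = s
        x_mask[: len(s), i] = 1.0

    return x, x_mask, labels
```

For example, `prepare_data_sketch([[3, 5], [7, 8, 9]], [0, 1])` yields a 3x2 matrix in which the first column is zero-padded and its mask reads `[1, 1, 0]`.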

doc/lstm.txt

Lines changed: 65 additions & 0 deletions (new file)
.. _lstm:

Recurrent Neural Networks with Word Embeddings
**********************************************

Summary
+++++++

This tutorial aims to provide an example of how a Recurrent Neural Network (RNN) using the Long Short Term Memory (LSTM) architecture can be implemented with Theano. In this tutorial, the model is used to perform sentiment analysis on movie reviews from the `Large Movie Review Dataset <http://ai.stanford.edu/~amaas/data/sentiment/>`_, sometimes known as the IMDB dataset.

In this task, given a movie review, the model attempts to predict whether it is positive or negative. This is a binary classification task.

Code - Citations - Contact
++++++++++++++++++++++++++

Code
====

The LSTM implementation can be found in the following two files:

* `lstm.py <http://deeplearning.net/tutorial/code/lstm.py>`_ : Main script. Defines and trains the model.

* `imdb.py <http://deeplearning.net/tutorial/code/imdb.py>`_ : Secondary script. Handles the loading and preprocessing of the IMDB dataset.
Data
====

As previously mentioned, the provided scripts are used to train an LSTM
recurrent neural network on the Large Movie Review Dataset.

While the dataset is public, in this tutorial we provide a copy of the dataset
that has previously been preprocessed according to the needs of this LSTM
implementation. You can download this preprocessed version of the dataset
using the script `download.sh` and uncompress it.
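The `valid_portion` argument in the `load_data` signature above suggests that a fraction of the training data is held out for validation. A minimal sketch of such a split (a hypothetical helper, not the tutorial's code):

```python
import random

def split_train_valid(samples, valid_portion=0.1, seed=123):
    """Shuffle and hold out a fraction of samples for validation.

    Hypothetical sketch of a valid_portion-style split; imdb.py's
    load_data may differ in details (e.g. shuffling strategy).
    """
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)  # deterministic shuffle for the sketch
    n_valid = int(round(len(samples) * valid_portion))
    valid = [samples[i] for i in idx[:n_valid]]
    train = [samples[i] for i in idx[n_valid:]]
    return train, valid
```

With `valid_portion=0.1`, 100 samples would yield a 90/10 train/validation split.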
Papers
======

If you use this tutorial, please cite the following papers:

* `[pdf] <http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf>`_ Hochreiter, Sepp and Schmidhuber, Jürgen. Long short-term memory. Neural Computation, 1997, vol. 9, no. 8, p. 1735-1780.

* `[pdf] <http://www.iro.umontreal.ca/~lisa/pointeurs/nips2012_deep_workshop_theano_final.pdf>`_ Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Bergstra, James, Goodfellow, Ian, Bergeron, Arnaud, Bouchard, Nicolas, and Bengio, Yoshua. Theano: new features and speed improvements. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2012.

* `[pdf] <http://www.iro.umontreal.ca/~lisa/pointeurs/theano_scipy2010.pdf>`_ Bergstra, James, Breuleux, Olivier, Bastien, Frédéric, Lamblin, Pascal, Pascanu, Razvan, Desjardins, Guillaume, Turian, Joseph, Warde-Farley, David, and Bengio, Yoshua. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010.

Thank you!

Contact
=======

Please email `Kyunghyun Cho <http://www.kyunghyuncho.me/>`_ with any
problem reports or feedback. We will be glad to hear from you.
Running the Code
++++++++++++++++

After downloading both scripts, downloading and uncompressing the data, and
putting all these files in the same folder, the user can run the code by
calling:

.. code-block:: bash

    THEANO_FLAGS="floatX=float32" python train_lstm.py
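`THEANO_FLAGS` is a comma-separated list of `key=value` pairs that Theano reads from the environment at import time. As a rough illustration of the format only (this is not Theano's actual configuration parser):

```python
import os

# Illustration: decompose a THEANO_FLAGS-style string into a dict.
# Theano's real config parser handles many more cases and flag types.
os.environ["THEANO_FLAGS"] = "floatX=float32"
flags = dict(
    pair.split("=", 1)
    for pair in os.environ["THEANO_FLAGS"].split(",")
)
print(flags)  # {'floatX': 'float32'}
```

Here `floatX=float32` makes Theano use single-precision floats, which is required for GPU execution on hardware of that era.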
