Add a parameter to make the test set smaller
nouiz committed Jan 31, 2015
commit 26914e4080661abf4211e6d129fbe01206a51334
4 changes: 4 additions & 0 deletions code/lstm.py
@@ -395,6 +395,7 @@ def train_lstm(
use_dropout=True, # if False slightly faster, but worse test error
# This frequently needs a bigger model.
reload_model="", # Path to a saved model we want to start from.
test_size=-1, # If >0, we will truncate the test set to this number of examples.
):

# Model options
@@ -406,6 +407,8 @@ def train_lstm(
print 'Loading data'
train, valid, test = load_data(n_words=n_words, valid_portion=0.05,
maxlen=maxlen)
if test_size > 0:
test = (test[0][:test_size], test[1][:test_size])
Member commented:
Now that the data is sorted by length, the test set consists only of the shortest sequences, which is biased.

Member Author replied:
Fixed.
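The bias the reviewer points out comes from taking the first `test_size` examples of a length-sorted dataset, which keeps only the shortest sequences. One way to avoid it is to draw a random subset instead. A minimal sketch of that idea (not necessarily the exact fix applied in the follow-up commit), assuming `test` is an `(inputs, labels)` pair of parallel lists as returned by `load_data`:

```python
import numpy


def truncate_test_set(test, test_size, seed=42):
    """Randomly subsample the test set down to test_size examples.

    Taking the first test_size entries of a length-sorted set would keep
    only the shortest sequences; a random permutation avoids that bias.
    """
    if test_size <= 0 or test_size >= len(test[0]):
        return test  # nothing to truncate
    rng = numpy.random.RandomState(seed)
    idx = rng.permutation(len(test[0]))[:test_size]
    # Index both halves with the same permutation so inputs stay
    # aligned with their labels.
    return ([test[0][i] for i in idx], [test[1][i] for i in idx])
```

A fixed seed keeps the subsample reproducible across runs, so reported test scores remain comparable.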


ydim = numpy.max(train[1]) + 1

@@ -578,4 +581,5 @@ def train_lstm(
train_lstm(
#reload_model="lstm_model.npz",
max_epochs=100,
test_size=500,
)