Commit ded03dc

more info about dealing with the dataset
1 parent 2c19c92 commit ded03dc

1 file changed

Lines changed: 13 additions & 6 deletions

File tree

doc/gettingstarted.txt

@@ -91,10 +91,9 @@ MNIST Dataset

 Since now the data is in one variable, and a minibatch is defined as a
 slice of that variable, it is more natural to define a minibatch by
-indicating its index and how large one minibatch is
-Note that since the batch size stays constant through out the
-execution of the code, a function will
-require only the index as input in order to identify on which minibatch to work.
+indicating its index and its size. In our setup the batch size stays constant
+throughout the execution of the code, therefore a function will actually
+require only the index to identify on which datapoints to work.
 The code below shows how to store your data and how to
 access a minibatch:

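The minibatch-by-index scheme described in the changed lines can be sketched as follows (a minimal NumPy sketch; the array name, shapes, and batch size are illustrative assumptions, not taken from the tutorial code):

```python
import numpy as np

# Illustrative data: 1500 datapoints with 4 features each (shapes are assumptions).
train_set_x = np.arange(1500 * 4, dtype=np.float32).reshape(1500, 4)

batch_size = 500  # stays constant throughout the execution of the code


def get_minibatch(data, index, batch_size=batch_size):
    """Return minibatch `index` as a slice of the full data array.

    Because the batch size is fixed, the index alone is enough to
    identify which datapoints to work on.
    """
    return data[index * batch_size:(index + 1) * batch_size]


# Third minibatch: rows 1000..1499 of the full array.
batch = get_minibatch(train_set_x, 2)
```

This is the arithmetic behind "only the index is required": the slice boundaries `index * batch_size` and `(index + 1) * batch_size` are fully determined once the batch size is fixed.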
@@ -134,10 +133,18 @@ MNIST Dataset
     label = train_set_y[2*500:3*500]


-Note that the data has to be stored as floats on the GPU ( the right
+The data has to be stored as floats on the GPU (the right
 ``dtype`` for storing on the GPU is given by ``theano.config.floatX``).
 To get around this shortcoming for the labels, we store them as floats,
-and then cast it to int.
+and then cast them to int.
+
+
+.. note::
+
+    If you are running your code on the GPU and the dataset you are using
+    is too large to fit in memory, the code will crash. In such a case, do
+    not store the data in a shared variable. You can however copy a larger
+    chunk of it at once (several minibatches) to reduce the overhead of
+    data transfer.


 .. index:: Notation
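The store-as-float-then-cast workaround for the labels can be sketched without Theano as follows (a NumPy sketch; in the tutorial itself ``theano.shared`` holds the data, ``theano.config.floatX`` supplies the dtype, and the cast is done symbolically, e.g. with ``T.cast``):

```python
import numpy as np

floatX = 'float32'  # stands in for theano.config.floatX

# Labels are integers, but GPU storage requires floats,
# so they are stored with the floatX dtype...
labels = [5, 0, 4, 1, 9]
shared_y = np.asarray(labels, dtype=floatX)

# ...and cast back to int whenever integer labels are needed
# (the symbolic cast in the Theano version).
y_as_int = shared_y.astype('int32')
```

The values survive the round trip unchanged because class labels are small integers that are exactly representable as floats.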
