@@ -31,16 +31,17 @@ Multilayer Perceptron
 .. _GPU: http://deeplearning.net/software/theano/tutorial/using_gpu.html
 
 
-The next architecture we are going to present using Theano is the single-hidden
-layer Multi-Layer Perceptron (MLP). An MLP can be viewed as a logistic
-regressor, where the input is first transformed using a learnt non-linear
-transformation :math:`\Phi`. The purpose of this transformation is to project the
+The next architecture we are going to present using Theano is the
+single-hidden-layer Multi-Layer Perceptron (MLP). An MLP can be viewed as a
+logistic regression classifier where the input is first transformed using a
+learnt non-linear transformation :math:`\Phi`. This transformation projects the
 input data into a space where it becomes linearly separable. This intermediate
-layer is referred to as a **hidden layer**. A single hidden layer is
-sufficient to make MLPs a **universal approximator**. However we will see later
-on that there are substantial benefits to using many such hidden layers, i.e. the
-very premise of **deep learning**. See these course notes for an `introduction
-to MLPs, the back-propagation algorithm, and how to train MLPs <http://www.iro.umontreal.ca/~pift6266/H10/notes/mlp.html>`_.
+layer is referred to as a **hidden layer**. A single hidden layer is sufficient
+to make MLPs a **universal approximator**. However we will see later on that
+there are substantial benefits to using many such hidden layers, i.e. the very
+premise of **deep learning**. See these course notes for an `introduction to
+MLPs, the back-propagation algorithm, and how to train MLPs
+<http://www.iro.umontreal.ca/~pift6266/H10/notes/mlp.html>`_.
 
 This tutorial will again tackle the problem of MNIST digit classification.
 
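The claim a few lines up, that the hidden layer projects the input into a space where it becomes linearly separable, can be illustrated with a tiny NumPy sketch (not part of the tutorial; the weights below are hand-picked rather than learnt, purely for illustration). The XOR problem is not linearly separable in the raw input space, but after a two-unit tanh hidden layer a single linear threshold recovers the labels:

.. code-block:: python

    import numpy as np

    # Hand-picked (not learnt) hidden-layer parameters, for illustration only.
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0])        # XOR labels

    H = np.tanh(X.dot(W1.T) + b1)     # hidden representation Phi(x)

    # In the hidden space a single linear score separates the classes cleanly:
    score = H[:, 0] - H[:, 1]         # approx. [0.44, 0.92, 0.92, 0.44]
    print((score > 0.7).astype(int))  # prints [0 1 1 0], i.e. the XOR labels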
@@ -54,10 +55,9 @@ follows:
 .. figure:: images/mlp.png
  :align: center
 
-Formally, a one-hidden layer MLP constitutes a function :math:`f: R^D \rightarrow R^L`,
-where :math:`D` is the size of input vector :math:`x`
-and :math:`L` is the size of the output vector :math:`f(x)`, such that,
-in matrix notation:
+Formally, a one-hidden-layer MLP is a function :math:`f: R^D \rightarrow
+R^L`, where :math:`D` is the size of input vector :math:`x` and :math:`L` is
+the size of the output vector :math:`f(x)`, such that, in matrix notation:
 
 .. math::
 
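For readers who prefer code to matrix notation, the forward pass just defined can be sketched in plain NumPy as follows. This is only an illustrative sketch: tanh stands in for the hidden non-linearity :math:`s` and a softmax for the output layer, and the names and sizes below are made up for the example rather than taken from the tutorial's code:

.. code-block:: python

    import numpy as np

    def mlp_forward(x, W1, b1, W2, b2):
        # f: R^D -> R^L for a one-hidden-layer MLP
        h = np.tanh(b1 + W1.dot(x))      # hidden representation, size n_hidden
        a = b2 + W2.dot(h)               # output pre-activation, size L
        e = np.exp(a - a.max())          # numerically stable softmax
        return e / e.sum()               # vector of class-membership probabilities

    # Example sizes in the spirit of MNIST: D inputs, n_hidden units, L classes.
    rng = np.random.RandomState(0)
    D, n_hidden, L = 784, 500, 10
    W1 = rng.uniform(-0.1, 0.1, (n_hidden, D))
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.1, 0.1, (L, n_hidden))
    b2 = np.zeros(L)
    p = mlp_forward(rng.rand(D), W1, b1, W2, b2)
    print(p.shape, p.sum())              # (10,) 1.0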
@@ -97,8 +97,8 @@ cover this in the tutorial !
 Going from logistic regression to MLP
 +++++++++++++++++++++++++++++++++++++
 
-This tutorial will focus on a single-layer MLP. We start off by
-implementing a class that will represent any given hidden layer. To
+This tutorial will focus on a single-hidden-layer MLP. We start off by
+implementing a class that will represent a hidden layer. To
 construct the MLP we will then only need to throw a logistic regression
 layer on top.
 
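To make that plan concrete, here is a stripped-down sketch of what such a hidden-layer class can look like in Theano. It is not the tutorial's actual ``HiddenLayer`` class (which, among other things, derives the weight-initialisation interval from the layer sizes and accepts a configurable activation function); only the basic structure is kept:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T

    class HiddenLayer(object):
        # Sketch of a hidden layer computing tanh(W x + b).
        def __init__(self, rng, input, n_in, n_out):
            # Small uniform random weights; the tutorial picks the interval
            # based on n_in and n_out, here we simply use +/- 0.1.
            W_values = numpy.asarray(
                rng.uniform(low=-0.1, high=0.1, size=(n_in, n_out)),
                dtype=theano.config.floatX)
            self.W = theano.shared(value=W_values, name='W', borrow=True)
            self.b = theano.shared(
                value=numpy.zeros((n_out,), dtype=theano.config.floatX),
                name='b', borrow=True)
            # Symbolic graph for h(x) = tanh(b + x W)
            self.output = T.tanh(T.dot(input, self.W) + self.b)
            self.params = [self.W, self.b]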
@@ -132,7 +132,7 @@ to use something else.
 
 If you look into theory this class implements the graph that computes
 the hidden layer value :math:`h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x)`.
-If you give this as input to the ``LogisticRegression`` class,
+If you give this graph as input to the ``LogisticRegression`` class,
 implemented in the previous tutorial :doc:`logreg`, you get the output
 of the MLP. You can see this in the following short implementation of
 the ``MLP`` class.
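The tutorial's actual ``MLP`` class is not reproduced in this excerpt, but the composition it describes can be sketched roughly as follows, assuming the ``HiddenLayer`` sketch above and the ``LogisticRegression`` class from :doc:`logreg` (whose constructor is assumed here to take ``input``, ``n_in`` and ``n_out``):

.. code-block:: python

    class MLP(object):
        # Rough sketch: a hidden layer followed by a logistic regression layer.
        def __init__(self, rng, input, n_in, n_hidden, n_out):
            # Hidden layer computes s(b^{(1)} + W^{(1)} x)
            self.hiddenLayer = HiddenLayer(rng, input=input,
                                           n_in=n_in, n_out=n_hidden)
            # The logistic regression layer takes the hidden layer's output
            # as its input, which is exactly the composition described above.
            self.logRegressionLayer = LogisticRegression(
                input=self.hiddenLayer.output,
                n_in=n_hidden, n_out=n_out)
            # Parameters of the whole model
            self.params = self.hiddenLayer.params + self.logRegressionLayer.params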