DeepLearningTutorials/doc/notation.txt at master · doomie/DeepLearningTutorials · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
Notation
========

Data set notation
+++++++++++++++++

We label data sets as :math:`\mathcal{D}`. When the distinction is important, we
indicate train, validation, and test sets as: :math:`\mathcal{D}_{train}`,
:math:`\mathcal{D}_{valid}` and :math:`\mathcal{D}_{test}`. The validation set
is used to perform model selection and hyper-parameter selection, whereas
the test set is used to evaluate final generalization error and
compare different algorithms in an unbiased way.

The tutorials mostly deal with classification problems, where each data set
:math:`\mathcal{D}` is an indexed set of pairs :math:`(x^{(i)},y^{(i)})`. We
use superscripts to distinguish training set examples. :math:`x^{(i)} \in
\mathcal{R}^D` is thus the i-th training example of dimensionality :math:`D`. Similarly,
:math:`y^{(i)} \in \{0, ..., L\}` is the i-th label assigned to input
:math:`x^{(i)}`. It is straightforward to extend these examples to
:math:`y^{(i)}` that has other types (e.g. Gaussian for regression,
or groups of multinomials for predicting multiple symbols).

Math Conventions
++++++++++++++++

* :math:`W`: upper-case symbols refer to a matrix unless specified otherwise
* :math:`W_{ij}`: element at i-th row and j-th column of matrix :math:`W`
* :math:`W_{i \cdot}, W_i`: vector, i-th row of matrix :math:`W`
* :math:`W_{\cdot j}`: vector, j-th column of matrix :math:`W`
* :math:`b`: lower-case symbols refer to a vector unless specified otherwise
* :math:`b_i`: i-th element of vector :math:`b`

List of Symbols and acronyms
++++++++++++++++++++++++++++

* D: number of input dimensions.
* :math:`f_{\theta}(x)`, :math:`f(x)`: prediction function of a model :math:`P(Y|x,\theta)`, defined as :math:`argmax_k P(Y=k|x,\theta)`.
  Note that we will often drop the :math:`\theta` subscript.
* L: number of labels.
* :math:`\mathcal{L}(\theta, \cal{D})`: log-likelihood :math:`\cal{D}`
  of the model defined by parameters :math:`\theta`.
* :math:`\ell(\theta, \cal{D})` empirical loss of the prediction function f
  parameterized by :math:`\theta` on data set :math:`\cal{D}`.
* NLL: negative log-likelihood
* :math:`\theta`: set of all parameters for a given model