DeepLearningTutorials/doc/unet.txt at master · channgo2203/DeepLearningTutorials · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
.. _unet:

U-Net
**********************************************

.. note::
    This section assumes the reader has already read through :doc:`lenet` for
    convolutional networks motivation and :doc:`fcn_2D_segm` for segmentation
    network.

Summary
+++++++

This tutorial provides a brief explanation of the U-Net architecture as well as a way to implement
it using Theano and Lasagne. U-Net is a Fully Convolutional Network (FCN) that does image segmentation.
Its goal is then to predict each pixel's class. See :doc:`fcn_2D_segm` for differences between
network architecture for classification and segmentation tasks.

Data
++++

The data is from ISBI challenge and can be found `here <http://brainiac2.mit.edu/isbi_challenge/home>`_.
We use data augmentation for training, as specified
in the defaults arguments in the code given below.

Model
+++++

The U-Net architecture is built upon the Fully Convolutional Network and modified
in a way that it yields better segmentation in medical imaging.
Compared to FCN-8, the two main differences are (1) U-net is symmetric and (2) the skip
connections between the downsampling path and the upsampling path apply a concatenation
operator instead of a sum. These skip connections intend to provide local information
to the global information while upsampling.
Because of its symmetry, the network has a large number of feature maps in the upsampling
path, which allows to transfer information. By comparison, the basic FCN architecture only had
*number of classes* feature maps in its upsampling path.

The U-Net owes its name to its symmetric shape, which is different from other FCN variants.

U-Net architecture is separated in 3 parts:

- 1 : The contracting/downsampling path
- 2 : Bottleneck
- 3 : The expanding/upsampling path

.. figure:: images/unet.jpg
    :align: center
    :scale: 60%

    **Figure 1** : Illustration of U-Net architecture (from U-Net paper)


Contracting/downsampling path
=============================

The contracting path is composed of 4 blocks. Each block is composed of

* 3x3 Convolution Layer + activation function (with batch normalization)
* 3x3 Convolution Layer + activation function (with batch normalization)
* 2x2 Max Pooling

Note that the number of feature maps doubles at each pooling, starting with
64 feature maps for the first block, 128 for the second, and so on.
The purpose of this contracting path is to capture the context of the input image
in order to be able to do segmentation. This coarse contextual information will
then be transfered to the upsampling path by means of skip connections.


Bottleneck
==========

This part of the network is between the contracting and expanding paths.
The bottleneck is built from simply 2 convolutional layers (with batch
normalization), with dropout.


Expanding/upsampling path
=========================

The expanding path is also composed of 4 blocks. Each of these blocks is composed of

* Deconvolution layer with stride 2
* Concatenation with the corresponding cropped feature map from the contracting path
* 3x3 Convolution layer + activation function (with batch normalization)
* 3x3 Convolution layer + activation function (with batch normalization)


The purpose of this expanding path is to enable precise localization combined
with contextual information from the contracting path.

Advantages
==========

* The U-Net combines the location information from the downsampling path with the contextual information in the upsampling path to finally obtain a general information combining localisation and context, which is necessary to predict a good segmentation map.

* No dense layer, so images of different sizes can be used as input (since the only parameters to learn on convolution layers are the kernel, and the size of the kernel is independent from input image' size).

* The use of massive data augmentation is important in domains like biomedical segmentation, since the number of annotated samples is usually limited.


Code
++++

.. warning::

    * Current code works with Python 2 only.
    * If you use Theano with GPU backend (e.g. with Theano flag ``device=cuda``),
      you will need at least 12GB free in your video RAM.

The U-Net implementation can be found in the following GitHub repo:

* `Unet_lasagne_recipes.py <../code/unet/Unet_lasagne_recipes.py>`_, from original main script
  `Unet.py <https://github.com/Lasagne/Recipes/blob/master/modelzoo/Unet.py>`_. Defines the model.

* `train_unet.py <../code/unet/train_unet.py>`_ : Training loop (main script to use).


The user must install `Lasagne <http://lasagne.readthedocs.io/en/latest/user/installation.html>`_ ,
`SimpleITK <http://www.simpleitk.org/SimpleITK/resources/software.html>`_ and
clone the GitHub repo `Dataset Loaders <https://github.com/fvisin/dataset_loaders>`_.

Change the ``dataset_loaders/config.ini`` file to set the right path for the dataset:

.. code-block:: cfg

    [isbi_em_stacks]
    shared_path = /path/to/DeepLearningTutorials/data/isbi_challenge_em_stacks/

Folder indicated at section ``[isbi_em_stacks]`` should contain files:

* ``test-volume.tif``
* ``train-labels.tif``
* ``train-volume.tif``

The user can now build a U-Net with a specified number of input channels and number of classes.
First include the Lasagne layers needed to define the U-Net architecture :

.. literalinclude:: ../code/unet/Unet_lasagne_recipes.py
  :start-after: start-snippet-1
  :end-before: end-snippet-1

The *net* variable will be an ordered dictionary containing layers names as keys and layers instances as value.
This is needed to be able to concatenate the feature maps from the contracting to expanding path.


First the contracting path :

.. literalinclude:: ../code/unet/Unet_lasagne_recipes.py
  :start-after: start-snippet-downsampling
  :end-before: end-snippet-downsampling

And then the bottleneck :

.. literalinclude:: ../code/unet/Unet_lasagne_recipes.py
  :start-after: start-snippet-bottleneck
  :end-before: end-snippet-bottleneck

Followed by the expanding path :

.. literalinclude:: ../code/unet/Unet_lasagne_recipes.py
  :start-after: start-snippet-upsampling
  :end-before: end-snippet-upsampling

And finally the output path (to obtain *number of classes* feature maps):

.. literalinclude:: ../code/unet/Unet_lasagne_recipes.py
  :start-after: start-snippet-output
  :end-before: end-snippet-output

Running ``train_unet.py`` on a Titan X lasted for around 60 minutes, ending with the following:

.. code-block:: text

    $ THEANO_FLAGS=device=cuda0,floatX=float32,dnn.conv.algo_fwd=time_once,dnn.conv.algo_bwd_data=time_once,dnn.conv.algo_bwd_filter=time_once,gpuarray.preallocate=1 python train_unet.py
    [...]
    EPOCH 364: Avg epoch training cost train 0.160667, cost val 0.265909, acc val 0.888796, jacc val class 0  0.636058, jacc val class 1 0.861970, jacc val 0.749014 took 4.379772 s


References
++++++++++

If you use this tutorial, please cite the following papers.

* `[pdf] <https://arxiv.org/pdf/1505.04597.pdf>`__ Olaf Ronneberger, Philipp Fischer, Thomas Brox. U_Net: Convolutional Networks for Biomedical Image Segmentation. May 2015.
* `[GitHub Repo] <https://github.com/fvisin/dataset_loaders>`__ Francesco Visin, Adriana Romero - Dataset loaders: a python library to load and preprocess datasets. 2017.

Papers related to Theano/Lasagne:

* `[pdf] <https://arxiv.org/pdf/1605.02688.pdf>`__ Theano Development Team. Theano: A Python framework for fast computation of mathematical expresssions. May 2016.
* `[website] <https://zenodo.org/record/27878#.WQocDrw18yc>`__ Sander Dieleman, Jan Schluter, Colin Raffel, Eben Olson, Søren Kaae Sønderby, Daniel Nouri, Daniel Maturana, Martin Thoma, Eric Battenberg, Jack Kelly, Jeffrey De Fauw, Michael Heilman, diogo149, Brian McFee, Hendrik Weideman, takacsg84, peterderivaz, Jon, instagibbs, Dr. Kashif Rasul, CongLiu, Britefury, and Jonas Degrave, “Lasagne: First release.” (2015).


Thank you!