# TensorFlow-ENet
TensorFlow implementation of [**ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation**](https://arxiv.org/pdf/1606.02147.pdf).
This model was tested on the CamVid dataset with street scenes taken from Cambridge, UK. For more information on this dataset, please visit: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/
## Visualizations
Note that the gifs may play out of sync if they don't all load at the same time. You can refresh the page to see them in sync.
### Test Dataset Output

### TensorBoard Visualizations
Execute `tensorboard --logdir=log` from your root directory to monitor your training and watch your segmentation output form against the ground truth and the original image as you train your model.
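
For context, these TensorBoard images come from TF 1.x image summaries. Below is a minimal, illustrative sketch of how such summaries are typically written; the tensor names and the 360x480 CamVid shape are assumptions, not necessarily this repository's exact code.

```python
import tensorflow as tf

# Illustrative sketch (assumed names and shapes): log the input image, the
# ground truth annotation, and the predicted segmentation as image summaries
# so all three appear under TensorBoard's Images tab during training.
images = tf.placeholder(tf.float32, [None, 360, 480, 3], name='images')
annotations = tf.placeholder(tf.float32, [None, 360, 480, 1], name='annotations')
predictions = tf.placeholder(tf.float32, [None, 360, 480, 1], name='predictions')

tf.summary.image('input/image', images, max_outputs=1)
tf.summary.image('input/ground_truth', annotations, max_outputs=1)
tf.summary.image('output/segmentation', predictions, max_outputs=1)

summary_op = tf.summary.merge_all()
writer = tf.summary.FileWriter('./log')  # the same directory passed to --logdir
```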
## Contents
#### Code
- **enet.py**: The ENet model definition, including the argument scope.
- **train_enet.py**: The file for training. Includes saving of images for visualization and tunable hyperparameters.
- **predict_segmentation.py**: Obtains the segmentation output for visualization purposes. You can create your own gif with these outputs.
- **get_class_weights.py**: The file to obtain either the median frequency balancing class weights or the custom ENet function class weights (a sketch of both formulas appears after this list).
- **train.sh**: Example training script to train the different variations of the model.
- **test.sh**: Example testing script to test the different variants you trained.
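
For reference, here is a minimal NumPy sketch of the two weighting schemes mentioned above, written directly from the published formulas rather than taken from `get_class_weights.py` (the helper name is hypothetical): median frequency balancing (Eigen & Fergus), and the ENet paper's custom weighting `w_class = 1 / ln(c + p_class)` with `c = 1.02`.

```python
import numpy as np

def compute_class_weights(label_images, num_classes, c=1.02):
    """Sketch of both weighting schemes (hypothetical helper, not the repo's API).

    Median frequency balancing: weight = median(freq) / freq_class, where
    freq_class is the pixel count of a class divided by the total pixels of
    the images in which that class appears. ENet custom weighting:
    weight = 1 / ln(c + p_class), with p_class the overall class probability.
    """
    class_pixels = np.zeros(num_classes)    # pixels belonging to each class
    present_pixels = np.zeros(num_classes)  # total pixels of images containing the class
    total_pixels = 0
    for label in label_images:
        total_pixels += label.size
        for cls in range(num_classes):
            count = np.sum(label == cls)
            if count > 0:
                class_pixels[cls] += count
                present_pixels[cls] += label.size
    freq = class_pixels / np.maximum(present_pixels, 1)  # guard against absent classes
    median_freq_weights = np.median(freq) / np.maximum(freq, 1e-12)
    enet_weights = 1.0 / np.log(c + class_pixels / total_pixels)
    return median_freq_weights, enet_weights
```

For example, with the 12 CamVid classes you would call `compute_class_weights(labels, num_classes=12)` and feed the chosen weight vector into a weighted cross-entropy loss.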
#### Folders
- **dataset**: Contains 6 folders that hold the original train-val-test images and their corresponding ground truth annotations.

- **checkpoint**: The checkpoint directory that can be used for predicting the segmentation output. The model was trained using the default parameters mentioned in the paper, except that it uses median frequency balancing to obtain the class weights. **Note:** To use this checkpoint, please set the argument `--stage_two_repeat=3` in both `train_enet.py` and `test_enet.py`, as the checkpoint was trained on a slightly deeper version of ENet that uses 3 stage-two bottleneck series instead of the default 2.

- **visualizations**: Contains the gif files that were created from the output of `predict_segmentation.py`.
## Important Notes
4. On the labels and colouring scheme: The dataset consists of only 12 labels, with the road-marking class merged with the road class. The last class is the unlabelled class.
5. No preprocessing is done to the images for ENet (see the references below for clarifications with the author).
6. Once you've fine-tuned to get your best hyperparameters, there's an option to combine the training and validation datasets. However, if your training dataset is large enough, this won't make much of a difference.
## Implementation and Architectural Changes
1. By default, skip connections are added to connect the corresponding encoder and decoder portions for better performance.
2. The number of initial blocks and the depth of stage 2 residual bottlenecks are tunable hyperparameters. This allows you to build a deeper network if required, since ENet is rather lightweight.
3. Fused batch normalization is used over standard batch normalization for faster computations. See [TensorFlow's best practices](https://www.tensorflow.org/performance/performance_guide).
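
For illustration, enabling the fused implementation in TF 1.x is a one-flag change. The sketch below uses `tf.layers.batch_normalization` with an assumed feature-map shape; it is not this repository's exact call site.

```python
import tensorflow as tf

# Minimal sketch: fused=True runs batch normalization as a single fused kernel
# instead of a composition of smaller ops, which is typically faster on GPU.
x = tf.placeholder(tf.float32, [None, 360, 480, 16])  # assumed feature map shape
y = tf.layers.batch_normalization(x, fused=True, training=True)
```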
## References
5. [Original Torch implementation of ENet](https://github.com/e-lab/ENet-training)
6. [ResNet paper for clarification on residual bottlenecks](https://arxiv.org/pdf/1512.03385.pdf)

This implementation may not be entirely correct and may contain bugs. It would be great if the open source community could spot any bugs and raise a GitHub issue or submit a pull request to fix them!