Commit 082be49

2 parents 389217e + 3e59fe9 commit 082be49

1 file changed: +9 -1 lines changed

README.md

Lines changed: 9 additions & 1 deletion
@@ -2,16 +2,24 @@

**WORK IN PROGRESS**

+ **UPDATE 10/30/17** I was unable to get the RL pretraining model with greedy decoding to learn on the TSP10 or TSP20 environments. I tried both a critic network and an exponential moving average baseline. I trained on a single NVIDIA GTX 1080 for 1-2 days. The variance of the actor loss appears to still be too high, even with these baselines. Please create an Issue and let me know if you get this to work.
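As a rough illustration of the exponential moving average baseline mentioned in the update, here is a minimal, framework-free sketch. It assumes the baseline tracks the mean reward (for TSP, tour length) of each batch and that the advantage is the reward minus that running value. The names `EMABaseline` and `beta` are hypothetical and are not taken from this repository's code.

```python
class EMABaseline:
    """Hypothetical exponential moving average baseline for policy gradients."""

    def __init__(self, beta=0.8):
        # beta close to 1.0 gives a slowly moving baseline (assumed default).
        self.beta = beta
        self.value = None

    def update(self, rewards):
        """Update the running mean with one batch and return per-sample advantages."""
        batch_mean = sum(rewards) / len(rewards)
        if self.value is None:
            # Initialise the baseline with the first batch mean.
            self.value = batch_mean
        else:
            self.value = self.beta * self.value + (1.0 - self.beta) * batch_mean
        # The advantage scales the log-probability gradient in the actor loss.
        return [r - self.value for r in rewards]
```

Subtracting a baseline leaves the expected gradient unchanged but can reduce its variance; whether it reduces it enough depends on the task, which is consistent with the difficulty reported above.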
PyTorch implementation of [Neural Combinatorial Optimization with Reinforcement Learning](https://arxiv.org/abs/1611.09940).

- So far, I have implemented the basic RL pretraining model from the paper. An implementation of the supervised learning baseline model is available [here](https://github.com/pemami4911/neural-combinatorial-rl-tensorflow).
+ I have implemented the basic RL pretraining model from the paper. An implementation of the supervised learning baseline model is available [here](https://github.com/pemami4911/neural-combinatorial-rl-tensorflow).

During training, my implementation uses a stochastic decoding policy in the pointer network, realized via PyTorch's `torch.multinomial()`. At test time it decodes with beam search (not yet finished; it currently supports only 1 beam, i.e. greedy decoding). I have tried to use the same hyperparameters as the paper but have not yet been able to replicate its TSP results.
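The difference between stochastic and greedy decoding can be sketched without any framework. The example below is an assumption-laden stand-in: `pick_next` and `decode` are hypothetical helpers, `random.choices` plays the role that `torch.multinomial()` plays in the actual model, and a fixed probability list replaces the pointer network's per-step output.

```python
import random


def pick_next(probs, visited, greedy=False, rng=random):
    """Choose the next index among unvisited ones, by sampling or by argmax."""
    candidates = [i for i in range(len(probs)) if i not in visited]
    if greedy:
        # Greedy decoding (beam search with 1 beam): take the most likely index.
        return max(candidates, key=lambda i: probs[i])
    # Stochastic decoding: sample in proportion to the unvisited probabilities.
    weights = [probs[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def decode(probs, greedy=False, rng=random):
    """Build a full ordering; a real pointer network would recompute probs each step."""
    visited = []
    for _ in range(len(probs)):
        visited.append(pick_next(probs, set(visited), greedy=greedy, rng=rng))
    return visited
```

Masking visited indices keeps every decoded sequence a valid permutation in both modes; only the selection rule differs.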

Currently, there is support for a sorting task and the Planar Symmetric Euclidean TSP.
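For the planar symmetric Euclidean TSP, the quantity being minimised is the length of the closed tour through all points. A minimal sketch of that computation follows; `tour_length` is a hypothetical helper for illustration, not a function from this repository.

```python
import math


def tour_length(points, tour):
    """Total Euclidean length of a closed tour visiting points in the given order."""
    total = 0.0
    for k in range(len(tour)):
        x1, y1 = points[tour[k]]
        # Wrap around so the tour returns to its starting point.
        x2, y2 = points[tour[(k + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total
```

Because the distance between two points is the same in either direction, a tour and its reverse have the same length, which is what "symmetric" refers to here.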

See `main.sh` for an example of how to write a bash script to easily set the run parameters.

+ ## TODO
+
+ * [ ] Finish implementing beam search decoding to support > 1 beam
+ * [ ] Add support for variable length inputs
+ * [ ] Distributed implementation
+
Examples:

To run `sort_10`:
