diff --git a/docs/rl-algorithms/dqn.md b/docs/rl-algorithms/dqn.md
index 56f617fb6..7e7e72338 100644
--- a/docs/rl-algorithms/dqn.md
+++ b/docs/rl-algorithms/dqn.md
@@ -278,7 +278,7 @@ Tracked experiments and game play videos:
 ## `dqn_jax.py`
-* Uses [Jax](https://github.com/google/jax), [Flax](https://github.com/google/flax), and [Optax](https://github.com/deepmind/optax) instead of `torch`. [dqn_atari_jax.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_jax.py) is roughly 4% faster than [dqn_atari.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py)
+* Uses [Jax](https://github.com/google/jax), [Flax](https://github.com/google/flax), and [Optax](https://github.com/deepmind/optax) instead of `torch`. [dqn_jax.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn_jax.py) is roughly 50% faster than [dqn.py](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/dqn.py)
 * Works with the `Box` observation space of low-level features
 * Works with the `Discrete` action space
 * Works with envs like `CartPole-v1`
@@ -309,9 +309,9 @@ Below are the average episodic returns for [`dqn_jax.py`](https://github.com/vwx
 | Environment | `dqn_jax.py` | `dqn.py` |
 | ----------- | ----------- | ----------- |
-| CartPole-v1 | 499.84 ± 0.24 | 488.69 ± 16.11 |
-| Acrobot-v1 | -89.17 ± 8.79 | -91.54 ± 7.20 |
-| MountainCar-v0 | -173.71 ± 29.14 | -194.95 ± 8.48 |
+| CartPole-v1 | 498.38 ± 2.29 | 488.69 ± 16.11 |
+| Acrobot-v1 | -88.89 ± 1.56 | -91.54 ± 7.20 |
+| MountainCar-v0 | -188.90 ± 11.78 | -194.95 ± 8.48 |
@@ -329,7 +329,7 @@ Below are the average episodic returns for [`dqn_jax.py`](https://github.com/vwx
 Tracked experiments and game play videos:
-
+
diff --git a/docs/rl-algorithms/dqn/jax/Acrobot-v1-time.png b/docs/rl-algorithms/dqn/jax/Acrobot-v1-time.png
index 5c44ebe44..89c1e96f7 100644
Binary files a/docs/rl-algorithms/dqn/jax/Acrobot-v1-time.png and b/docs/rl-algorithms/dqn/jax/Acrobot-v1-time.png differ
diff --git a/docs/rl-algorithms/dqn/jax/Acrobot-v1.png b/docs/rl-algorithms/dqn/jax/Acrobot-v1.png
index d5e9444c7..93f4c68b4 100644
Binary files a/docs/rl-algorithms/dqn/jax/Acrobot-v1.png and b/docs/rl-algorithms/dqn/jax/Acrobot-v1.png differ
diff --git a/docs/rl-algorithms/dqn/jax/CartPole-v1-time.png b/docs/rl-algorithms/dqn/jax/CartPole-v1-time.png
index abe2ea470..ff67ba030 100644
Binary files a/docs/rl-algorithms/dqn/jax/CartPole-v1-time.png and b/docs/rl-algorithms/dqn/jax/CartPole-v1-time.png differ
diff --git a/docs/rl-algorithms/dqn/jax/CartPole-v1.png b/docs/rl-algorithms/dqn/jax/CartPole-v1.png
index 393c3573a..c81b2a862 100644
Binary files a/docs/rl-algorithms/dqn/jax/CartPole-v1.png and b/docs/rl-algorithms/dqn/jax/CartPole-v1.png differ
diff --git a/docs/rl-algorithms/dqn/jax/MountainCar-v0-time.png b/docs/rl-algorithms/dqn/jax/MountainCar-v0-time.png
index bc4f62eec..a995b497d 100644
Binary files a/docs/rl-algorithms/dqn/jax/MountainCar-v0-time.png and b/docs/rl-algorithms/dqn/jax/MountainCar-v0-time.png differ
diff --git a/docs/rl-algorithms/dqn/jax/MountainCar-v0.png b/docs/rl-algorithms/dqn/jax/MountainCar-v0.png
index 9c7abc763..c2d460710 100644
Binary files a/docs/rl-algorithms/dqn/jax/MountainCar-v0.png and b/docs/rl-algorithms/dqn/jax/MountainCar-v0.png differ