Skip to content
Open
Prev Previous commit
Next Next commit
Added doc page for SACD
  • Loading branch information
pauerbach committed Aug 7, 2023
commit d97dbc727c20a8a0ebbd59fcbe42464e0aa0129f
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ RL Baselines3 Zoo also offers a simple interface to train, evaluate agents and d
modules/ppo_mask
modules/ppo_recurrent
modules/qrdqn
modules/sacd
modules/tqc
modules/trpo

Expand Down
99 changes: 99 additions & 0 deletions docs/modules/sacd.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
.. _sacd:

.. automodule:: sb3_contrib.sacd


SACD
====


`Soft Actor Critic Discrete (SACD) <https://arxiv.org/abs/1910.07207>`_ is a modification of the original Soft Actor Critic Algorithm for discrete action spaces.

.. rubric:: Available Policies

.. autosummary::
:nosignatures:

MlpPolicy
CnnPolicy
MultiInputPolicy


Notes
-----

- Original paper: https://arxiv.org/abs/1910.07207
- Original Implementation: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch


Can I use?
----------

- Recurrent policies: ❌
- Multi processing: ✔️
- Gym spaces:


============= ====== ===========
Space Action Observation
============= ====== ===========
Discrete ✔️ ✔️
Box ❌ ✔️
MultiDiscrete ❌ ✔️
MultiBinary ❌ ✔️
Dict ❌ ✔️
============= ====== ===========


Example
-------
.. code-block:: python

import gymnasium as gym

from sb3_contrib import SACD

env = gym.make("CartPole-v1", render_mode="rgb_array")

model = SACD("MlpPolicy", env, verbose=1, policy_kwargs=dict(net_arch=[64,64]))
model.learn(total_timesteps=20_000)
model.save("sacd_cartpole")

del model # remove to demonstrate saving and loading

model = SACD.load("sac_cartpole")

obs, info = env.reset()
while True:
action, _states = model.predict(obs, deterministic=True)
obs, reward, terminated, truncated, info = env.step(action)
if terminated or truncated:
obs, info = env.reset()



Parameters
----------

.. autoclass:: SACD
:members:
:inherited-members:

.. _sac_policies:

SACD Policies
-------------

.. autoclass:: MlpPolicy
:members:
:inherited-members:

.. autoclass:: stable_baselines3.sac.policies.SACPolicy
:members:
:noindex:

.. autoclass:: CnnPolicy
:members:

.. autoclass:: MultiInputPolicy
:members: