Move from Gym 0.22 to Gymnasium 0.28 #445
Conversation

Thank you so much, Govind! I am busy with a paper and will finish the code review ASAP : )

Hi, I wanted to follow up on this. Is there anything I can do to help?

Hi Govind, sorry for the late response. Thanks for helping us finish the gymnasium support. I have some minor questions regarding this PR.
Actually, I want to support the
Best,

BTW, shall we align the

@pengzhenghao Please help check the MARL part. So far everything LGTM.

Regarding each of your points:
Regarding

Great! And regarding the backward compatibility, I think v21 support is good, as
For testing, I recently worked on a training script that uses rllib v1.0 and can thus be used to test the wrapper. Could you follow these steps to test your wrapper?
from metadrive.envs.metadrive_env import MetaDriveEnv
from scenarionet_training.train_utils.utils import train, get_train_parser
from ray.rllib.agents.callbacks import DefaultCallbacks

if __name__ == '__main__':
    args = get_train_parser().parse_args()
    exp_name = args.exp_name or "TEST"
    stop = int(10_000_000)
    config = dict(
        env=Your_wrapper(MetaDriveEnv),
        env_config=dict(environment_num=-1),
        # ===== Evaluation =====
        evaluation_interval=3,
        evaluation_num_episodes=20,
        evaluation_config=dict(env_config=dict(environment_num=-1, start_seed=0)),
        evaluation_num_workers=1,
        metrics_smoothing_episodes=20,
        # ===== Training =====
        horizon=1000,
        num_sgd_iter=20,
        lr=5e-5,
        rollout_fragment_length=200,
        sgd_minibatch_size=100,
        train_batch_size=3000,
        num_gpus=0,
        num_cpus_per_worker=0.13,
        num_cpus_for_driver=1,
        num_workers=1,
    )
    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=1,
        custom_callback=DefaultCallbacks,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        # num_seeds=args.num_seeds,
        num_seeds=1,
        test_mode=args.test,
        # local_mode=True
    )
== Status ==
Memory usage on this node: 10.0/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.26/16 CPUs, 0/0 GPUs, 0.0/33.2 GiB heap, 0.0/11.43 GiB objects (0/1.0 accelerator_type:G)
Result logdir: /mnt/data/scenarionet/scenarionet_training/scripts/TEST
Number of trials: 1 (1 RUNNING)
+------------------------------+----------+----------------------+--------+--------+------------------+------+----------+
| Trial name | status | loc | seed | iter | total time (s) | ts | reward |
|------------------------------+----------+----------------------+--------+--------+------------------+------+----------|
| PPO_MetaDriveEnv_212ea_00000 | RUNNING | 10.161.143.158:46207 | 0 | 1 | 19.2438 | 3000 | 3.9947 |
+------------------------------+----------+----------------------+--------+--------+------------------+------+----------+

I tested the mentioned workflow already, so I am pretty sure it works with the old gym API.
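For contrast with the Gymnasium changes discussed later in this PR, the old gym-style loop that rllib v1.0-era code expects from an environment looks roughly like this. This is a generic sketch using plain gym with a classic-control env (assuming a pre-0.26 gym install), not MetaDrive-specific code:

import gym

# Old gym API (pre-0.26): reset() returns obs only and step() returns a 4-tuple.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
env.close()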

Ah, one more question. I wonder if we can install both gym and gymnasium together?

Hi, luckily it is possible to install both Gym and Gymnasium together. Thank you so much for the detailed instructions! Using them, I got the Gym compatibility wrapper working. I had to modify the script you wrote slightly, since I took a slightly different approach. Here's the link to my implementation: https://github.com/pimpale/metadrive/blob/main/metadrive/envs/gym_wrapper.py
Here's the modified code that runs:
from metadrive.envs.metadrive_env import MetaDriveEnv
from metadrive.envs.gym_wrapper import GymEnvWrapper
from scenarionet_training.train_utils.utils import train, get_train_parser
from ray.rllib.agents.callbacks import DefaultCallbacks

if __name__ == '__main__':
    args = get_train_parser().parse_args()
    exp_name = args.exp_name or "TEST"
    stop = int(10_000_000)
    config = dict(
        env=GymEnvWrapper,
        env_config={
            "inner_class": MetaDriveEnv,
            "inner_config": { "environment_num": -1 }
        },
        # ===== Evaluation =====
        evaluation_interval=3,
        evaluation_num_episodes=20,
        evaluation_config=dict(env_config=dict(environment_num=-1, start_seed=0)),
        evaluation_num_workers=1,
        metrics_smoothing_episodes=20,
        # ===== Training =====
        horizon=1000,
        num_sgd_iter=20,
        lr=5e-5,
        rollout_fragment_length=200,
        sgd_minibatch_size=100,
        train_batch_size=3000,
        num_gpus=0,
        num_cpus_per_worker=0.13,
        num_cpus_for_driver=1,
        num_workers=1,
    )
    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=1,
        custom_callback=DefaultCallbacks,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        # num_seeds=args.num_seeds,
        num_seeds=1,
        test_mode=args.test,
        # local_mode=True
    )

I got this script to start running, and it was able to get to this point:

Well done! You made it! My last question is: can we keep

Hi, regarding gym version compatibility, I tested the
I think we should make gym an optional dependency, so that it's not installed by default; you should only have to install it if you need the wrapper. I've just pushed a commit that makes
Also, I have a question about the truncation behavior when the horizon is reached.
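For readers following the wrapper discussion, the conversion such an optional gym-facing wrapper performs is roughly the following. This is a minimal illustrative sketch, not the actual metadrive/envs/gym_wrapper.py; the GymAdapter name and the inner_class / inner_config keys (mirroring the env_config used above) are assumptions:

import gym


class GymAdapter(gym.Env):
    # Hypothetical adapter exposing a Gymnasium-style env through the old gym API.

    def __init__(self, config):
        inner_class = config["inner_class"]
        inner_config = config.get("inner_config", {})
        self.inner = inner_class(inner_config)
        # A real adapter would also convert gymnasium spaces into gym spaces here.
        self.observation_space = self.inner.observation_space
        self.action_space = self.inner.action_space

    def reset(self):
        # New API returns (obs, info); the old gym API expects obs only.
        obs, _info = self.inner.reset()
        return obs

    def step(self, action):
        # New API returns a 5-tuple; fold terminated/truncated (e.g. the horizon
        # being reached) back into the single old-style done flag.
        obs, reward, terminated, truncated, info = self.inner.step(action)
        return obs, reward, terminated or truncated, info

    def close(self):
        self.inner.close()

Under this scheme, hitting the horizon shows up as truncated on the Gymnasium side and simply folds into done for old-gym consumers.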

The
@pengzhenghao, could you help look into this? I believe that with these updates, we can update our codebase to the latest

Based on the documentation here: https://docs.ray.io/en/latest/_modules/ray/rllib/env/base_env.html#BaseEnv.poll , it seems that we just use two "__all__" keys, one for episode termination and one for episode truncation. This seems fairly easy to implement, and I will do so. Do you have a training script for the new version of RLLib, or will the
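For reference, the dict layout being described would look roughly like this under the new termination/truncation semantics; a sketch with made-up agent IDs and values, not code taken from RLlib or this PR:

# Per-agent dicts returned by a multi-agent step, plus the special "__all__" keys.
observations = {"agent0": [0.0, 0.1], "agent1": [0.2, 0.3]}
rewards = {"agent0": 0.5, "agent1": -0.1}
terminateds = {"agent0": False, "agent1": True, "__all__": False}   # episode-level termination
truncateds = {"agent0": False, "agent1": False, "__all__": False}   # episode-level truncation
infos = {"agent0": {}, "agent1": {}}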

Hi, thank you so much for contributing! Our previous work on multi-agent RL is CoPO; the repo is here: https://github.com/decisionforce/CoPO#training , where the training scripts are:
python ./copo_code/copo/torch_copo/train_ccppo.py
python ./copo_code/copo/torch_copo/train_ippo.py
python ./copo_code/copo/torch_copo/train_copo.py
The multi-agent RL code assumes ray==2.2.0 and is supposed to be runnable by calling the above scripts directly. I would expect your wrapper to be compatible with the above scripts, and that's all. The return
As for Gymnasium 0.28 requiring something like all_truncated (which I haven't dived into yet) and how to fit MD to the latest RLLib, I think we can address these issues in a future PR. So for now you only need to ensure that the old CoPO torch code works with the latest MD on rllib==2.2.0.
Again, thank you so much!! If you encounter some issues in MARL, we can mark them as known issues and address them in a future PR, because I think this PR is already too heavy with 100+ files changed.

If you decide to leave the MARL compatibility for the future, shall I merge this PR?

From my side, it's good to merge.
* replace gym with gymnasium
* go all in on the new termination + truncation semantics
* fixed incompatibility between gym and gymnasium
* fix some instances of done with terminated and truncated
* continue renaming variables
* rewrite more steps
* format
* fix bugs with obs, pass almost all test cases
* fix bugs with test export
* fix infinite loop, all test cases pass
* rename te, tr to tm, tc
* rename force_seed to seed, introduce gym_wrapper
* gate gym behind optional feature, correctly handle when gym is not installed
* fix test
* format code
---------
Co-authored-by: QuanyiLi <[email protected]>

What changes do you make in this PR?
This PR switches from OpenAI Gym version 0.22 to Farama Gymnasium 0.28.
OpenAI Gym is no longer maintained, as stated in the README of https://github.com/openai/gym . Farama's Gymnasium is the designated replacement and is where current development occurs. As such, it would be beneficial to switch to that library, since we would benefit from its ongoing improvements.
One major change in recent versions of Gymnasium is that the signature of the Env.step function has changed from returning a tuple of the form (obs, reward, done, info) to (obs, reward, terminated, truncated, info). More information is given in the associated PR: openai/gym#2752 .
Another major change is that Env.reset has changed from returning a single value obs to a tuple of (obs, info).
Finally, the mode argument of Env.render has been removed. Instead, render_mode should be provided during environment creation.
This PR implements all 3 of these major changes, which allows us to be compatible with the latest version of Gymnasium without needing wrappers.
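To make those signature changes concrete, here is a minimal sketch of an episode loop under the new API; constructing MetaDriveEnv with its default config is only illustrative, and the point is the shape of the reset and step returns:

from metadrive.envs.metadrive_env import MetaDriveEnv

env = MetaDriveEnv()                  # default config, purely for illustration
obs, info = env.reset()               # reset() now returns (obs, info) instead of just obs

terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()
    # step() now returns a 5-tuple; the old "done" flag is split into terminated/truncated.
    obs, reward, terminated, truncated, info = env.step(action)

env.close()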
I have confirmed that all test cases pass after these changes.
Checklist
bash scripts/format.sh before merging.