Move from Gym 0.22 to Gymnasium 0.28 #445
Conversation

Thank you so much, Govind! I am busy with a paper and will finish the code review ASAP : )

Hi, I wanted to follow up on this. Is there anything I can do to help?

Hi Govind, sorry for the late response. Thanks for helping us finish the gymnasium support. I have some minor questions regarding this PR.
Actually, I want to support the
Best,

BTW, shall we align the

@pengzhenghao Please help check the MARL part. So far everything LGTM.

Regarding each of your points:
Regarding

Great! And regarding the backward compatibility, I think v21 support is good, as
For testing, I recently worked on a training script that uses rllib v1.0 and can thus be used to test the wrapper. Could you follow these steps to test your wrapper?
from metadrive.envs.metadrive_env import MetaDriveEnv
from scenarionet_training.train_utils.utils import train, get_train_parser
from ray.rllib.agents.callbacks import DefaultCallbacks

if __name__ == '__main__':
    args = get_train_parser().parse_args()
    exp_name = args.exp_name or "TEST"
    stop = int(10_000_000)
    config = dict(
        env=Your_wrapper(MetaDriveEnv),
        env_config=dict(environment_num=-1),
        # ===== Evaluation =====
        evaluation_interval=3,
        evaluation_num_episodes=20,
        evaluation_config=dict(env_config=dict(environment_num=-1, start_seed=0)),
        evaluation_num_workers=1,
        metrics_smoothing_episodes=20,
        # ===== Training =====
        horizon=1000,
        num_sgd_iter=20,
        lr=5e-5,
        rollout_fragment_length=200,
        sgd_minibatch_size=100,
        train_batch_size=3000,
        num_gpus=0,
        num_cpus_per_worker=0.13,
        num_cpus_for_driver=1,
        num_workers=1,
    )
    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=1,
        custom_callback=DefaultCallbacks,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        # num_seeds=args.num_seeds,
        num_seeds=1,
        test_mode=args.test,
        # local_mode=True
    )
== Status ==
Memory usage on this node: 10.0/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.26/16 CPUs, 0/0 GPUs, 0.0/33.2 GiB heap, 0.0/11.43 GiB objects (0/1.0 accelerator_type:G)
Result logdir: /mnt/data/scenarionet/scenarionet_training/scripts/TEST
Number of trials: 1 (1 RUNNING)
+------------------------------+----------+----------------------+--------+--------+------------------+------+----------+
| Trial name | status | loc | seed | iter | total time (s) | ts | reward |
|------------------------------+----------+----------------------+--------+--------+------------------+------+----------|
| PPO_MetaDriveEnv_212ea_00000 | RUNNING | 10.161.143.158:46207 | 0 | 1 | 19.2438 | 3000 | 3.9947 |
+------------------------------+----------+----------------------+--------+--------+------------------+------+----------+

I tested the mentioned workflow already, so I am pretty sure it works with the old gym API.
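For contrast with the Gymnasium changes discussed later in this PR, the old gym-style loop that rllib v1.0-era code expects from an environment looks roughly like this. This is a generic sketch using plain gym with a classic-control env (assuming a pre-0.26 gym install), not MetaDrive-specific code:

import gym

# Old gym API (pre-0.26): reset() returns obs only and step() returns a 4-tuple.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
env.close()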

Ah, one more question. I wonder if we can install both gym and gymnasium together?

Hi, luckily it is possible to install both Gym and Gymnasium together. Thank you so much for the detailed instructions! Using them, I got the Gym compatibility wrapper working. I had to modify the script you wrote slightly, since I took a slightly different approach. Here's the link to my implementation: https://github.com/pimpale/metadrive/blob/main/metadrive/envs/gym_wrapper.py
Here's the modified code that runs:
from metadrive.envs.metadrive_env import MetaDriveEnv
from metadrive.envs.gym_wrapper import GymEnvWrapper
from scenarionet_training.train_utils.utils import train, get_train_parser
from ray.rllib.agents.callbacks import DefaultCallbacks

if __name__ == '__main__':
    args = get_train_parser().parse_args()
    exp_name = args.exp_name or "TEST"
    stop = int(10_000_000)
    config = dict(
        env=GymEnvWrapper,
        env_config={
            "inner_class": MetaDriveEnv,
            "inner_config": { "environment_num": -1 }
        },
        # ===== Evaluation =====
        evaluation_interval=3,
        evaluation_num_episodes=20,
        evaluation_config=dict(env_config=dict(environment_num=-1, start_seed=0)),
        evaluation_num_workers=1,
        metrics_smoothing_episodes=20,
        # ===== Training =====
        horizon=1000,
        num_sgd_iter=20,
        lr=5e-5,
        rollout_fragment_length=200,
        sgd_minibatch_size=100,
        train_batch_size=3000,
        num_gpus=0,
        num_cpus_per_worker=0.13,
        num_cpus_for_driver=1,
        num_workers=1,
    )
    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=1,
        custom_callback=DefaultCallbacks,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        # num_seeds=args.num_seeds,
        num_seeds=1,
        test_mode=args.test,
        # local_mode=True
    )

I got this script to start running, and it was able to get to this point:

Well done! You made it! My last question is: can we keep

Hi, regarding gym version compatibility, I tested the
I think we should make gym an optional dependency, so that it's not installed by default; you should only have to install it if you need the wrapper. I've just pushed a commit that makes
Also, I have a question about the truncation behavior when the horizon is reached.
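For readers following the wrapper discussion, the conversion such an optional gym-facing wrapper performs is roughly the following. This is a minimal illustrative sketch, not the actual metadrive/envs/gym_wrapper.py; the GymAdapter name and the inner_class / inner_config keys (mirroring the env_config used above) are assumptions:

import gym


class GymAdapter(gym.Env):
    # Hypothetical adapter exposing a Gymnasium-style env through the old gym API.

    def __init__(self, config):
        inner_class = config["inner_class"]
        inner_config = config.get("inner_config", {})
        self.inner = inner_class(inner_config)
        # A real adapter would also convert gymnasium spaces into gym spaces here.
        self.observation_space = self.inner.observation_space
        self.action_space = self.inner.action_space

    def reset(self):
        # New API returns (obs, info); the old gym API expects obs only.
        obs, _info = self.inner.reset()
        return obs

    def step(self, action):
        # New API returns a 5-tuple; fold terminated/truncated (e.g. the horizon
        # being reached) back into the single old-style done flag.
        obs, reward, terminated, truncated, info = self.inner.step(action)
        return obs, reward, terminated or truncated, info

    def close(self):
        self.inner.close()

Under this scheme, hitting the horizon shows up as truncated on the Gymnasium side and simply folds into done for old-gym consumers.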

The
@pengzhenghao, could you help look into this? I believe that with these updates, we can update our codebase to the latest

Based on the documentation here: https://docs.ray.io/en/latest/_modules/ray/rllib/env/base_env.html#BaseEnv.poll , it seems that we just use two "__all__" keys, one for episode termination and one for episode truncation. This seems fairly easy to implement, and I will do so. Do you have a training script for the new version of RLLib, or will the
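For reference, the dict layout being described would look roughly like this under the new termination/truncation semantics; a sketch with made-up agent IDs and values, not code taken from RLlib or this PR:

# Per-agent dicts returned by a multi-agent step, plus the special "__all__" keys.
observations = {"agent0": [0.0, 0.1], "agent1": [0.2, 0.3]}
rewards = {"agent0": 0.5, "agent1": -0.1}
terminateds = {"agent0": False, "agent1": True, "__all__": False}   # episode-level termination
truncateds = {"agent0": False, "agent1": False, "__all__": False}   # episode-level truncation
infos = {"agent0": {}, "agent1": {}}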

Hi, thank you so much for contributing! Our previous work on multi-agent RL is CoPO; the repo is here: https://github.com/decisionforce/CoPO#training , where the training scripts are:
python ./copo_code/copo/torch_copo/train_ccppo.py
python ./copo_code/copo/torch_copo/train_ippo.py
python ./copo_code/copo/torch_copo/train_copo.py
The multi-agent RL code assumes ray==2.2.0 and is supposed to be runnable by calling the above scripts directly. I would expect your wrapper to be compatible with the above scripts, and that's all. The return
As for Gymnasium 0.28 requiring something like all_truncated (which I haven't dived into yet) and how to fit MD to the latest RLLib, I think we can address these issues in a future PR. So for now you only need to ensure that the old CoPO torch code works with the latest MD on rllib==2.2.0.
Again, thank you so much!! If you encounter some issues in MARL, we can mark them as known issues and address them in a future PR, because I think this PR is already too heavy with 100+ files changed.

If you decide to leave the MARL compatibility for the future, shall I merge this PR?

From my side, it's good to merge.
* replace gym with gymnasium
* go all in on the new termination + truncation semantics
* fixed incompatibility between gym and gymnasium
* fix some instances of done with terminated and truncated
* continue renaming variables
* rewrite more steps
* format
* fix bugs with obs, pass almost all test cases
* fix bugs with test export
* fix infinite loop, all test cases pass
* rename te, tr to tm, tc
* rename force_seed to seed, introduce gym_wrapper
* gate gym behind optional feature, correctly handle when gym is not installed
* fix test
* format code
---------
Co-authored-by: QuanyiLi <[email protected]>

What changes do you make in this PR?
This PR switches from OpenAI Gym version 0.22 to Farama Gymnasium 0.28.
OpenAI Gym is no longer maintained, as stated in the README of https://github.com/openai/gym . Farama's Gymnasium is the designated replacement and is where current development occurs. As such, it would be beneficial to switch to that library, since we would benefit from its ongoing improvements.
One major change in recent versions of Gymnasium is that the signature of the Env.step function has changed from returning a tuple of the form (obs, reward, done, info) to (obs, reward, terminated, truncated, info). More information is given in the associated PR: openai/gym#2752 .
Another major change is that Env.reset has changed from returning a single value obs to a tuple of (obs, info).
Finally, the mode argument of Env.render has been removed. Instead, render_mode should be provided during environment creation.
This PR implements all 3 of these major changes, which allows us to be compatible with the latest version of Gymnasium without needing wrappers.
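To make those signature changes concrete, here is a minimal sketch of an episode loop under the new API; constructing MetaDriveEnv with its default config is only illustrative, and the point is the shape of the reset and step returns:

from metadrive.envs.metadrive_env import MetaDriveEnv

env = MetaDriveEnv()                  # default config, purely for illustration
obs, info = env.reset()               # reset() now returns (obs, info) instead of just obs

terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()
    # step() now returns a 5-tuple; the old "done" flag is split into terminated/truncated.
    obs, reward, terminated, truncated, info = env.step(action)

env.close()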
I have confirmed that all test cases pass after these changes.
Checklist
bash scripts/format.sh before merging.