
Conversation

@pimpale (Contributor) commented May 29, 2023

What changes do you make in this PR?

This PR switches from OpenAI Gym version 0.22 to Farama Gymnasium 0.28.

OpenAI Gym is no longer maintained, as stated in the README of https://github.com/openai/gym . Farama Gymnasium is its designated replacement and is where current development occurs. As such, it would be beneficial to use that library instead, so that we benefit from ongoing improvements.

One major change in recent versions of Gymnasium is that the signature of the Env.step function has changed from returning a tuple of the form (obs, reward, done, info) to (obs, reward, terminated, truncated, info). More information is given in the associated PR: openai/gym#2752 .

Another major change is that Env.reset has changed from returning a single value obs to a tuple of (obs, info).

Finally, the mode argument of Env.render has been removed. Instead, render_mode should be provided during environment creation.
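
For reference, a minimal sketch of the new calling convention, shown on a stock Gymnasium environment rather than MetaDrive itself:

import gymnasium

# render_mode is now fixed at construction time instead of being passed to render()
env = gymnasium.make("CartPole-v1", render_mode="rgb_array")

# reset() now returns an (obs, info) tuple
obs, info = env.reset(seed=0)

# step() now returns (obs, reward, terminated, truncated, info)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# the old single done flag corresponds to the disjunction of the two new flags
done = terminated or truncated

env.close()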

This PR implements all three of these major changes, allowing us to be compatible with the latest version of Gymnasium without needing wrappers.

I have confirmed that all test cases pass after these changes.

Checklist

  • I have merged the latest main branch into the current branch.
  • I have run bash scripts/format.sh before merging.
  • Please use "squash and merge" mode.

@pimpale mentioned this pull request May 29, 2023
@QuanyiLi (Member) commented Jun 1, 2023

Thank you so much, Govind! I am busy with a paper and will finish the code review ASAP : )

@pimpale (Contributor, Author) commented Jun 12, 2023

Hi, I wanted to follow up on this. Is there anything I can do to help?

@QuanyiLi self-requested a review June 13, 2023 15:27
@QuanyiLi (Member)

Hi, Govind

Sorry for the late response. Thanks for helping us finish the gymnasium support. I have some minor questions regarding this PR.

  1. Can we keep it compatible with the old gym? I believe there are a lot of users still using the old gym. For example, we could add a new environment config option called old_gym; if set to True, we would use the lower-version gym, and the returned value of step would be o, r, d, i. I believe @pengzhenghao has the same concerns regarding API compatibility.
  2. I see that the render_mode option has been moved from env.reset() to env.config. Is there a special reason for this?
  3. MetaDrive can terminate an episode when reaching the max step limit, which is specified by env.config['horizon']. Can we set truncated to True in this case?

Actually, I wanted to support Gymnasium by providing a wrapper that takes any MetaDrive environment (or derived environment) as input and turns env.action_space, env.observation_space, and the returned values of the APIs into the Gymnasium style. There is a basic implementation already: https://github.com/metadriverse/metadrive/blob/main/metadrive/envs/gymnasium_wrapper.py
But it seems that this wrapper doesn't work with ray/rllib lol. Following this idea, maybe we can provide a wrapper that turns a Gymnasium-style environment into a gym-style one before we merge this PR. Or do you have any suggestions about the compatibility issue?
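
A minimal sketch of what such a gymnasium-to-gym conversion wrapper could look like (a hypothetical class, not the gymnasium_wrapper.py linked above and not the final implementation adopted in this PR):

import gym  # old OpenAI Gym API


class GymnasiumToGymWrapper(gym.Env):
    """Hypothetical sketch: expose a Gymnasium-style env through the old gym API.

    A complete wrapper would also convert gymnasium.spaces objects into
    gym.spaces objects; that step is omitted here for brevity.
    """

    def __init__(self, gymnasium_env):
        self._env = gymnasium_env
        self.observation_space = self._env.observation_space
        self.action_space = self._env.action_space

    def reset(self, **kwargs):
        # Gymnasium reset() returns (obs, info); the old gym API returns obs only
        obs, _info = self._env.reset(**kwargs)
        return obs

    def step(self, action):
        # Collapse (terminated, truncated) back into a single done flag
        obs, reward, terminated, truncated, info = self._env.step(action)
        return obs, reward, terminated or truncated, info

    def render(self, mode="human"):
        # render_mode is fixed at construction time in Gymnasium, so mode is ignored
        return self._env.render()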

Best,
Quanyi

@QuanyiLi (Member)

BTW, shall we align the tr, te with tm, tc? I prefer using tm and tc to represent termination and truncation. What abbreviations do the gymnasium folks use?

@QuanyiLi (Member)

@pengzhenghao Please help check the MARL part. Till now everything LGTM.

@pimpale (Contributor, Author) commented Jun 13, 2023

Regarding each of your points:

  1. I agree with you that compatibility is important. I think that switching from Gym to Gymnasium will have to be a breaking change, but we should offer a very easy way to stay backward compatible. I think the solution of writing a wrapper for the environment is cleaner, and I will look into it. I checked online for any wrappers that do automatic conversion from a Gymnasium environment to a Gym environment, but I couldn't find anything.
    • I have a few questions about this though:
      • Are there any Gym-only libraries we can test on to make sure our wrapper works?
      • Which version of Gym do we aim to support? v21 or v26?
  2. The reason I did this was because it was required by Gymnasium. Here's the link to the relevant section of the documentation: https://gymnasium.farama.org/content/migration-guide/#environment-render
  3. Yes, I will add this in.
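
Regarding point 3, a minimal sketch of the idea, assuming a per-episode step counter and the existing horizon config value (the names here are illustrative, not MetaDrive's actual internals):

def split_done(natural_done: bool, episode_step: int, horizon: int) -> tuple:
    """Illustrative only: derive the new (terminated, truncated) pair.

    terminated keeps the meaning of a natural episode end (crash, arrival, ...),
    while truncated is raised once the step count reaches the configured horizon.
    """
    terminated = natural_done
    truncated = episode_step >= horizon
    return terminated, truncated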

Regarding te, tr vs tm, tc, I agree with you. I will add another commit to rename all of those variables to the tm, tc scheme.

@QuanyiLi (Member) commented Jun 13, 2023

Great! And regarding backward compatibility, I think supporting v21 is good, as gymnasium is already a fork of v26: https://gymnasium.farama.org/content/migration-guide/

For testing, I recently worked on a training script that uses rllib v1.0 and thus can be used to test the wrapper. Could you follow these steps to test your wrapper?

  1. Create a new python3.8 environment and install MetaDrive locally: cd metadrive, git checkout main, and run pip install -e .
  2. Clone ScenarioNet: git clone [email protected]:metadriverse/scenarionet.git
  3. Install ScenarioNet with the training tools: cd scenarionet and run pip install -e .[train]. Note: please make sure the metadrive package is the one you are working on via pip list | grep metadrive.
  4. Modify the script scenarionet_training/scripts/train_pg.py to the following one and then test it:
from metadrive.envs.metadrive_env import MetaDriveEnv
from scenarionet_training.train_utils.utils import train, get_train_parser
from ray.rllib.agents.callbacks import DefaultCallbacks

if __name__ == '__main__':
    args = get_train_parser().parse_args()

    exp_name = args.exp_name or "TEST"
    stop = int(10_000_000)

    config = dict(
        env=Your_wrapper(MetaDriveEnv),
        env_config=dict(environment_num=-1),

        # ===== Evaluation =====
        evaluation_interval=3,
        evaluation_num_episodes=20,
        evaluation_config=dict(env_config=dict(environment_num=-1, start_seed=0)),
        evaluation_num_workers=1,
        metrics_smoothing_episodes=20,

        # ===== Training =====
        horizon=1000,
        num_sgd_iter=20,
        lr=5e-5,
        rollout_fragment_length=200,
        sgd_minibatch_size=100,
        train_batch_size=3000,
        num_gpus=0,
        num_cpus_per_worker=0.13,
        num_cpus_for_driver=1,
        num_workers=1,
    )

    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=1,
        custom_callback=DefaultCallbacks,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        # num_seeds=args.num_seeds,
        num_seeds=1,
        test_mode=args.test,
        # local_mode=True
    )
  5. Once you get messages like this, congratulations! Your wrapper works:
== Status ==
Memory usage on this node: 10.0/62.6 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.26/16 CPUs, 0/0 GPUs, 0.0/33.2 GiB heap, 0.0/11.43 GiB objects (0/1.0 accelerator_type:G)
Result logdir: /mnt/data/scenarionet/scenarionet_training/scripts/TEST
Number of trials: 1 (1 RUNNING)
+------------------------------+----------+----------------------+--------+--------+------------------+------+----------+
| Trial name                   | status   | loc                  |   seed |   iter |   total time (s) |   ts |   reward |
|------------------------------+----------+----------------------+--------+--------+------------------+------+----------|
| PPO_MetaDriveEnv_212ea_00000 | RUNNING  | 10.161.143.158:46207 |      0 |      1 |          19.2438 | 3000 |   3.9947 |
+------------------------------+----------+----------------------+--------+--------+------------------+------+----------+

I tested the mentioned workflow already, so I am pretty sure it works with the old gym API.

@QuanyiLi (Member)

Ah, one more question. I wonder if we can install gymnasium and gym together. Only if that is possible can our wrapper work lol.

@pimpale (Contributor, Author) commented Jun 14, 2023

Hi,

Luckily, it is possible to install both Gym and Gymnasium together.

Thank you so much for the detailed instructions! Using them, I got the Gym compatibility wrapper working. I had to modify the script you wrote a bit, since I took a slightly different approach.

Here's the link to my implementation: https://github.com/pimpale/metadrive/blob/main/metadrive/envs/gym_wrapper.py

Here's the modified code that runs:

from metadrive.envs.metadrive_env import MetaDriveEnv
from metadrive.envs.gym_wrapper import GymEnvWrapper
from scenarionet_training.train_utils.utils import train, get_train_parser
from ray.rllib.agents.callbacks import DefaultCallbacks

if __name__ == '__main__':
    args = get_train_parser().parse_args()

    exp_name = args.exp_name or "TEST"
    stop = int(10_000_000)

    config = dict(
        env=GymEnvWrapper,
        env_config={
            "inner_class": MetaDriveEnv,
            "inner_config": { "environment_num": -1 }
        },

        # ===== Evaluation =====
        evaluation_interval=3,
        evaluation_num_episodes=20,
        evaluation_config=dict(env_config=dict(environment_num=-1, start_seed=0)),
        evaluation_num_workers=1,
        metrics_smoothing_episodes=20,

        # ===== Training =====
        horizon=1000,
        num_sgd_iter=20,
        lr=5e-5,
        rollout_fragment_length=200,
        sgd_minibatch_size=100,
        train_batch_size=3000,
        num_gpus=0,
        num_cpus_per_worker=0.13,
        num_cpus_for_driver=1,
        num_workers=1,
    )

    train(
        "PPO",
        exp_name=exp_name,
        keep_checkpoints_num=1,
        custom_callback=DefaultCallbacks,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        # num_seeds=args.num_seeds,
        num_seeds=1,
        test_mode=args.test,
        # local_mode=True
    )

I got this script to start running, and it was able to get to this point:
[screenshot of the training status output omitted]
I didn't wait for the training to finish, but it seemed like there were no errors.

@QuanyiLi (Member)

Well done! You made it!

My last question is: can we keep gym in the dependency list of setup.py? Previously, it was "gym>=0.20.0, <0.26.0, !=0.23.*, !=0.24.*". If the script you are running works with 0.20, 0.21, 0.22, and 0.25, I think we can restore it. Otherwise, we should exclude the broken gym versions.

@pimpale (Contributor, Author) commented Jun 14, 2023

Hi,

Regarding gym version compatibility, I tested the scenarionet/train_pg.py script with gym versions 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, and 0.26, and all of them seem to work.

I think we should make gym an optional dependency, so that it's not installed by default. You should only have to install it if you need the wrapper.

I've just pushed a commit that makes gym an optional feature flag. The versions I permit are 0.20 to 0.26. So, if you want to use the wrapper, you would run pip install -e .[gym].
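
In setup.py terms, this roughly corresponds to an extras_require entry like the following (the package name and exact version bounds here are assumptions, not necessarily what the final commit uses):

from setuptools import setup

setup(
    name="metadrive-simulator",  # assumed package name
    # ... other arguments elided ...
    extras_require={
        "gym": ["gym>=0.20.0,<0.27.0"],  # assumed bounds covering 0.20 to 0.26
    },
)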

Also, I have a question about the truncation when horizon is reached.

  • I noticed that we add an __all__ key to the done dictionary when all vehicles are done. Now that we have two different dictionaries, one for termination and one for truncation, this seems awkward, since it might be the case that no cars are terminated but all of them are truncated. Wouldn't it be better to change the __all__ key in the terminated dict into an all_done key in the info dict? That way the terminated and truncated dictionaries would have the same set of keys.

@QuanyiLi (Member)

The __all__ key is needed for the MARL interface of rllib, so it would be better to see how the latest rllib handles truncation and done.

@pengzhenghao, could you help look into this? I believe that with these updates, we can update our codebase to the latest ray/rllib.

@pimpale (Contributor, Author) commented Jun 14, 2023

Based on the documentation here: https://docs.ray.io/en/latest/_modules/ray/rllib/env/base_env.html#BaseEnv.poll , it seems that we just use two "__all__" keys, one for episode termination and one for episode truncation. This seems fairly easy to implement, and I will do so.
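
Concretely, the per-agent dictionaries returned by a multi-agent step would then look something like this (agent IDs and values invented for illustration):

# Illustrative multi-agent step output following the rllib convention:
# each dict carries per-agent flags plus a special "__all__" entry.
terminateds = {"agent0": False, "agent1": True, "__all__": False}
truncateds = {"agent0": True, "agent1": False, "__all__": False}

# The episode ends for the whole scene once either "__all__" flag is True.
episode_over = terminateds["__all__"] or truncateds["__all__"]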

Do you have a training script for the new version of RLLib, or will the train_pg.py script suffice for testing?

@pengzhenghao (Member)

Hi,

Thank you so much for contributing!

Our previous work on multi-agent RL is the CoPO repo, here: https://github.com/decisionforce/CoPO#training , where the training scripts are:

python ./copo_code/copo/torch_copo/train_ccppo.py
python ./copo_code/copo/torch_copo/train_ippo.py
python ./copo_code/copo/torch_copo/train_copo.py

The multi-agent RL code assumes ray==2.2.0 and is supposed to be runnable by calling the above scripts directly. I would expect your wrapper to be compatible with the above scripts, and that's all.

The returned __all__ will be True if all agents in the scene are done, so no observation, reward, or info is returned from that step onward.

As for Gymnasium 0.28 requiring something like all_truncated (I haven't dived into it yet) and how to fit MetaDrive to the latest RLLib, I think we can address these issues in a future PR. So for now, you only need to ensure that the old CoPO torch code works with the latest MetaDrive on rllib==2.2.0.

Again, thank you so much!! If you encounter some issues in MARL, we can mark them as known issues and address them in a future PR, because I think this PR is already too heavy with 100+ files changed.

@QuanyiLi (Member)

If you decide to leave the MARL compatibility to the future, shall I merge this PR?

@pimpale (Contributor, Author) commented Jun 19, 2023

From my side, it's good to merge.

@QuanyiLi merged commit fd60d4f into metadriverse:main Jun 19, 2023
@pimpale mentioned this pull request Jun 20, 2023
QuanyiLi added a commit to QuanyiLi/metadrive that referenced this pull request Aug 21, 2023
* replace gym with gymnasium

* go all in on the new termination + truncation semantics

* fixed incompatibility between gym and gymnasium

* fix some instances of done with terminated and truncated

* continue renaming variables

* rewrite more steps

* format

* fix bugs with obs, pass almost all test cases

* fix bugs with test export

* fix infinite loop, all test cases pass

* rename te, tr to tm, tc

* rename force_seed to seed, introduce gym_wrapper

* gate gym behind optional feature, correctly handle when gym is not installed

* fix test

* format code

---------

Co-authored-by: QuanyiLi <[email protected]>