Add MuJoCo v5 environments #572

Kallinteris-Andreas · 2023-06-27T19:26:27Z

cont from: Farama-Foundation/Gymnasium-Robotics#104

Description

Adds the v5 version of the mujoco environments.

Changelog

Minimum mujoco version is now 2.3.3.

All v5 environments

Added support for fully custom/third party mujoco models using the xml_file argument (previously only a few changes could be made to the existing models).
Added default_camera_config argument, a dictionary for setting the mj_camera properties, mainly useful for custom environments.
Added env.observation_structure, a dictionary for specifying the observation space compose (e.g. qpos, qvel), useful for building tooling and wrappers for the MuJoCo environments.
Return a non-empty info with reset(), previously an empty dictionary was returned, the new keys are the same state information as step().
Added frame_skip argument, used to configure the dt (duration of step()), default varies by environment check environment documentation pages.

Ant

Fixed bug: healthy_reward was given on every step (even if the Ant is unhealthy), now it is only given when the Ant is healthy. The info["reward_survive"] is updated with this change (related Github issue).
The reward function now always includes contact_cost, before it was only included if use_contact_forces=True (can be set to 0 with contact_cost_weight=0).
Excluded the cfrc_ext of worldbody from the observation space as it was always 0, and thus provided no useful information to the agent, resulting is slightly faster training (related Github issue).
Added the main_body argument, which specifies the body used to compute the forward reward (mainly useful for custom MuJoCo models).
Added the forward_reward_weight argument, which defaults to 1 (effectively the same behavior as in v4).
Added the include_cfrc_ext_in_observation argument, previously in v4 the inclusion of cfrc_ext observations was controlled by use_contact_forces which defaulted to False, while include_cfrc_ext_in_observation defaults to True.
Removed the use_contact_forces argument (note: its functionality has been replaced by include_cfrc_ext_in_observation and contact_cost_weight) (related Github issue).
Fixed info["reward_ctrl"] sometimes containing contact_cost instead of ctrl_cost.
Fixed info["x_position"] & info["y_position"] & info["distance_from_origin"] giving xpos instead of qpos observations (xpos observations are behind 1 mj_step() more here) (related Github issue #1 & Github issue #2).
Removed info["forward_reward"] as it is equivalent to info["reward_forward"].

HalfCheetah

Restored the xml_file argument (was removed in v4).
Renamed info["reward_run"] to info["reward_forward"] to be consistent with the other environments.

Hopper

Fixed bug: healthy_reward was given on every step (even if the Hopper was unhealthy), now it is only given when the Hopper is healthy. The info["reward_survive"] is updated with this change (related Github issue).
Restored the xml_file argument (was removed in v4).
Added individual reward terms in info (info["reward_forward"], info["reward_ctrl"], info["reward_survive"]).
Added info["z_distance_from_origin"] which is equal to the vertical distance of the "torso" body from its initial position.

Humanoid

Fixed bug: healthy_reward was given on every step (even if the Humanoid was unhealthy), now it is only given when the Humanoid is healthy. The info["reward_survive"] is updated with this change (related Github issue).
Restored contact_cost and the corresponding contact_cost_weight and contact_cost_range arguments, with the same defaults as in Humanoid-v3 (was removed in v4) (related Github issue).
Excluded the cinert & cvel & cfrc_ext of worldbody and root/freejoint qfrc_actuator from the observation space, as it was always 0, and thus provided no useful information to the agent, resulting in slightly faster training) (related Github issue).
Restored the xml_file argument (was removed in v4).
Added include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation, include_cfrc_ext_in_observation arguments to allow for the exclusion of observation elements from the observation space.
Fixed info["x_position"] & info["y_position"] & info["distance_from_origin"] returning xpos instead of qpos based observations (xpos observations are behind 1 mj_step() more here) (related Github issue #1 & Github issue #2).
Added info["tendon_length"] and info["tendon_velocity"] containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
Renamed info["reward_alive"] to info["reward_survive"] to be consistent with the other environments.
Renamed info["reward_linvel"] to info["reward_forward"] to be consistent with the other environments.
Renamed info["reward_quadctrl"] to info["reward_ctrl"] to be consistent with the other environments.
Removed info["forward_reward"] as it is equivalent to info["reward_forward"].

Humanoid Standup

Excluded the cinert & cvel & cfrc_ext of worldbody and root/freejoint qfrc_actuator from the observation space, as it was always 0, and thus provided no useful information to the agent, resulting in slightly faster training) (related Github issue).
Restored the xml_file argument (was removed in v4).
Added xml_file argument.
Added uph_cost_weight, ctrl_cost_weight, impact_cost_weight, impact_cost_range arguments, to configure the reward function (defaults are effectively the same as in v4).
Added reset_noise_scale argument, to set the range of initial states.
Added include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation, include_cfrc_ext_in_observation arguments to allow for the exclusion of observation elements from the observation space.
Added info["tendon_length"] and info["tendon_velocity"] containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
Added info["x_position"] & info["y_position"] , which contain the observations excluded when exclude_current_positions_from_observation == True.
Added info["z_distance_from_origin"] which is equal to the vertical distance of the "torso" body from its initial position.

InvertedDoublePendulum

Fixed bug: healthy_reward was given on every step (even if the Pendulum is unhealthy), now it is only given if the DoublePendulum is healthy (not terminated)(related Github issue).
Excluded the qfrc_constraint ("constraint force") of the hinges from the observation space (as it was always 0, thus providing no useful information to the agent, resulting is slightly faster training) (related Github issue).
Added xml_file argument.
Added reset_noise_scale argument, to set the range of initial states.
Added healthy_reward argument to configure the reward function (defaults are effectively the same as in v4).
Added individual reward terms in info ( info["reward_survive"], info["distance_penalty"], info["velocity_penalty"]).

InvertedPendulum

Fixed bug: healthy_reward was given on every step (even if the Pendulum is unhealthy), now it is only given if the Pendulum is healthy (not terminated) (related Github issue).
Added xml_file argument.
Added reset_noise_scale argument to set the range of initial states.
Added info["reward_survive"] which contains the reward.

Pusher

Added xml_file argument.
Added reward_near_weight, reward_dist_weight, reward_control_weight arguments, to configure the reward function (defaults are effectively the same as in v4).
Fixed info["reward_ctrl"] being not being multiplied by the reward weight.
Added info["reward_near"] which is equal to the reward term reward_near.

Reacher

Removed "z - position_fingertip" from the observation space since it is always 0, and therefore provides no useful information to the agent, this should result is slightly faster training (related Github issue).
Added xml_file argument.
Added reward_dist_weight, reward_control_weight arguments, to configure the reward function (defaults are effectively the same as in v4).
Fixed info["reward_ctrl"] being not being multiplied by the reward weight.

Swimmer

Restored the xml_file argument (was removed in v4).
Added forward_reward_weight, ctrl_cost_weight, to configure the reward function (defaults are effectively the same as in v4).
Added reset_noise_scale argument to set the range of initial states.
Added exclude_current_positions_from_observation argument.
Replaced info["reward_fwd"] and info["forward_reward"] with info["reward_forward"] to be consistent with the other environments.

Walker2D

In v2, v3 and v4 the models have different friction values for the two feet (left foot friction == 1.9 and right foot friction == 0.9). The Walker-v5 model is updated to have the same friction for both feet (set to 1.9). This causes the Walker2d's the right foot to slide less on the surface and therefore require more force to move (related Github issue).
Fixed bug: healthy_reward was given on every step (even if the Walker2D is unhealthy), now it is only given if the Walker2d is healthy. The info "reward_survive" is updated with this change (related Github issue).
Restored the xml_file argument (was removed in v4).
Added individual reward terms in info (info["reward_forward"], info["reward_ctrl"], info["reward_survive"]).
Added info["z_distance_from_origin"] which is equal to the vertical distance of the "torso" body from its initial position.

Type of change

add new revision of MuJoCo environments.

Checklist:

I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Benchmarks

benchmark similarly to Use mujoco bindings instead of mujoco_py openai/gym#2595 (comment) (v3 → v4)
https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation

issues fixed:

TODO

Verify that docs are being built
Update "Version History"
Blog anouncement ???
Minary Dataset generation [Proposal] Gymnasium/MuJoCo-v5 dataset generation Minari#243

Finished environments

Cutting room floor (not included in the `v5` release)

Add option to observe tendons in Humanoids (include_tendon_in_observation).
Update kinematics of Ant & Humanoid after step.
Add ManySegmentSwimmer & CoupledHalfCheetah environments.
Add reset_noise_scale to manipulation environments (Pusher & Reacher).
Increase configurability of the manipulation environments (Pusher & Reacher).
Add termination conditions to the manipulation environments (Pusher & Reacher).
Add more arguments to control the reward function of InvertedDoublePendulum.
Reduce the obsevation space limits of angles.
Add noisy actions & observations & rewards
define healthy_z_range's body
HumanoidStandup.uph_cost based on self.dt and not opt.timestep
HumanoidStandup.model.left_hip_y range fix

Credits

Lead Developer: @Kallinteris-Andreas
Specifications/Requirements & Code Review: @pseudo-rnd-thoughts
Debugging assistance: @rodrigodelazcano

`tests/env/mujoco/test_mojoco_v3.py`

pseudo-rnd-thoughts

The documentation doesn't include in the version history the v5 changes

Are we planning on updating all of the assets to use the compile tool?

gymnasium/envs/mujoco/assets/walker2d_v5_old.xml

Kallinteris-Andreas · 2023-06-30T11:24:20Z

the version history is in the opening comment of this PR (for now)
No we will keep the current assets, no benefit in changing

pseudo-rnd-thoughts · 2023-06-30T11:33:55Z

@Kallinteris-Andreas Could you update Ant (and all other environments) to follow the detail that I outline in Farama-Foundation/Gymnasium-Robotics#104 (comment)

pseudo-rnd-thoughts

Given the number of suggested changes, I have only done Humanoid but can look to do more soon

gymnasium/envs/mujoco/humanoid_v5.py

Kallinteris-Andreas · 2023-07-12T17:57:22Z

@pseudo-rnd-thoughts wow, that was way more comments that I expected, many of which apply to multiple environments, so do not review another environments until I have resolved all the issue in Humanoid.

If you still want to review something, review the test_mujoco_v5.py

pseudo-rnd-thoughts

The tests are very comprehensive and impressive, nice job on them, only two points about them

tests/envs/mujoco/test_mujoco_v5.py

Kallinteris-Andreas · 2023-07-14T17:01:31Z

@pseudo-rnd-thoughts
I have applied most of the requested changed in humanoid to all environments (when applicable),
a second pass to Humanoid should be enough, and then review the HumanoidStandup (Note: for HumanoidStandup, you can skip the action & observation space sections since they are a copy and paste from Humanoid)

pseudo-rnd-thoughts

I have a quick scroll through all of the environments and the changes are super impressive. Amazing job, I have two small requested changes otherwise looks good to merge to me

gymnasium/envs/mujoco/half_cheetah_v5.py

gymnasium/envs/mujoco/humanoid_v5.py

Kallinteris-Andreas · 2023-12-12T10:50:51Z

For future reference, an additional bug has been fixed in Gymnasium/MuJoCo-v5 in this PR #832

Kallinteris-Andreas · 2024-04-23T17:52:31Z

For reference:
In the MuJoCo-v5 release, this PR was also added to fix a bug with Pusher-v5 and mujoco>=3.0.0 #1019

Kallinteris-Andreas and others added 13 commits May 2, 2023 17:55

Add Hopper and Walker2D models for v5

cf153f6

Merge branch 'Farama-Foundation:main' into main

bc92449

Delete hopper_v5.xml

0cbdd72

Delete walker2d_v5.xml

db3734e

General MuJoCo Env Documention Cleanup

a2d2e64

typofix

f58bb5e

typo fix

7a4bc32

update following @pseudo-rnd-thoughts reviews

2418631

Merge branch 'Farama-Foundation:main' into main

3b9080b

Merge branch 'Farama-Foundation:main' into main

77bcb8b

refactor tests/env/test_mojoco.py ->

7639d18

`tests/env/mujoco/test_mojoco_v3.py`

Merge branch 'Farama-Foundation:main' into main

8eb1b11

add files

ab931b7

Kallinteris-Andreas marked this pull request as draft June 27, 2023 19:26

Kallinteris-Andreas added 3 commits June 27, 2023 22:50

pre-commit

c3077ef

remove old v4 doc

52fbae6

fix test_make_erros()

25fe2c3

Kallinteris-Andreas closed this Jun 27, 2023

Kallinteris-Andreas added 4 commits June 28, 2023 01:29

cleanup tests

43c25c6

ant cleanup

2602a04

fix mjc-py error

79c8fa9

update tests

a8af4f5

Kallinteris-Andreas changed the title ~~adding Mujoco v5 environments~~ add Mujoco v5 environments Jun 30, 2023

Kallinteris-Andreas reopened this Jun 30, 2023

Kallinteris-Andreas and others added 2 commits June 30, 2023 13:42

Merge branch 'Farama-Foundation:main' into mujoco-v5

edd839a

cleanup

071d3a1

pseudo-rnd-thoughts requested changes Jun 30, 2023

View reviewed changes

gymnasium/envs/mujoco/assets/walker2d_v5_old.xml Outdated Show resolved Hide resolved

remove old file

3b90221

Kallinteris-Andreas and others added 6 commits July 10, 2023 16:35

undo

0466100

fix test_dt typing

7e8a4d6

undo mujoco env changes

ee66207

Merge branch 'Farama-Foundation:main' into mujoco-v5

525af00

a

a5da62b

pre-commit

f6fba95

pseudo-rnd-thoughts requested changes Jul 12, 2023

View reviewed changes

pseudo-rnd-thoughts reviewed Jul 12, 2023

View reviewed changes

tests/envs/mujoco/test_mujoco_v5.py Show resolved Hide resolved

tests/envs/mujoco/test_mujoco_v5.py Outdated Show resolved Hide resolved

rickstaa mentioned this pull request Jul 13, 2023

refactor(hopper_v4): add forward/health reward and ctrl cost to step info #602

Closed

10 tasks

Farama-Foundation deleted a comment from Kallinteris-Andreas Jul 13, 2023

Kallinteris-Andreas added 3 commits July 14, 2023 19:05

@pseudo-rnd-thoughts 2nd review comments

eaf4664

add test_reset_noise_scale

2bb1687

cleanup humanoidstandup

0c2c45b

Kallinteris-Andreas and others added 4 commits July 16, 2023 01:11

add changelog in "Version History"

cc65b42

typo fix

ff65f7d

Merge branch 'Farama-Foundation:main' into mujoco-v5

148d662

ant fix root obs name in DOCs

3bd9263

pseudo-rnd-thoughts requested changes Jul 23, 2023

View reviewed changes

gymnasium/envs/mujoco/half_cheetah_v5.py Outdated Show resolved Hide resolved

gymnasium/envs/mujoco/humanoid_v5.py Outdated Show resolved Hide resolved

@pseudo-rnd-thoughts 3rd review comments resolve

f559bb3

pseudo-rnd-thoughts approved these changes Jul 24, 2023

View reviewed changes

pseudo-rnd-thoughts merged commit 0736be9 into Farama-Foundation:main Jul 24, 2023

rickstaa mentioned this pull request Jul 26, 2023

Remove control_cost() method from Walker2d and Hopper when gymnasium v5 environments are released rickstaa/stable-gym#267

Open

Kallinteris-Andreas mentioned this pull request Oct 11, 2023

[WIP] add MaMuJoCo-v1 Farama-Foundation/Gymnasium-Robotics#161

Closed

10 tasks

This was referenced Dec 16, 2023

add MaMuJoCo-v1 environments Farama-Foundation/Gymnasium-Robotics#196

Merged

[RFC] MJX environment prototype (WIP) #834

Draft

add default_camera_config argument on Swimmer, InvertedPendulum & InvertedDoublePendulum #854

Merged

Uh oh!

Add MuJoCo v5 environments #572

Add MuJoCo v5 environments #572

Uh oh!

Conversation

Kallinteris-Andreas commented Jun 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changelog

All v5 environments

Ant

HalfCheetah

Hopper

Humanoid

Humanoid Standup

InvertedDoublePendulum

InvertedPendulum

Pusher

Reacher

Swimmer

Walker2D

Type of change

Checklist:

Benchmarks

issues fixed:

TODO

Finished environments

Cutting room floor (not included in the v5 release)

Credits

Uh oh!

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Kallinteris-Andreas commented Jun 30, 2023

Uh oh!

pseudo-rnd-thoughts commented Jun 30, 2023

Uh oh!

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Kallinteris-Andreas commented Jul 12, 2023

Uh oh!

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Kallinteris-Andreas commented Jul 14, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pseudo-rnd-thoughts left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Kallinteris-Andreas commented Dec 12, 2023

Uh oh!

Kallinteris-Andreas commented Apr 23, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Kallinteris-Andreas commented Jun 27, 2023 •

edited

Loading

Cutting room floor (not included in the `v5` release)

Kallinteris-Andreas commented Jul 14, 2023 •

edited

Loading