Add MuJoCo v5 environments #572
Conversation
`tests/env/mujoco/test_mojoco_v3.py`
pseudo-rnd-thoughts left a comment:
The documentation doesn't include the v5 changes in the version history.
Are we planning on updating all of the assets to use the compile tool?
@Kallinteris-Andreas Could you update Ant (and all other environments) to follow the detail that I outline in Farama-Foundation/Gymnasium-Robotics#104 (comment)
pseudo-rnd-thoughts left a comment:
Given the number of suggested changes, I have only done Humanoid but can look to do more soon
@pseudo-rnd-thoughts wow, that was way more comments than I expected, many of which apply to multiple environments, so do not review other environments until I have resolved all the issues in Humanoid. If you still want to review something, review the …
pseudo-rnd-thoughts left a comment:
The tests are very comprehensive and impressive, nice job on them; only two points about them.

@pseudo-rnd-thoughts …
pseudo-rnd-thoughts left a comment:
I had a quick scroll through all of the environments and the changes are super impressive. Amazing job; I have two small requested changes, otherwise this looks good to merge to me.

For future reference, an additional bug has been fixed in …

For reference: …
Continued from: Farama-Foundation/Gymnasium-Robotics#104
Description
Adds the `v5` version of the `mujoco` environments.

Changelog
- Minimum `mujoco` version is now 2.3.3.

All v5 environments
- Added support for fully custom/third party `mujoco` models using the `xml_file` argument (previously only a few changes could be made to the existing models).
- Added `default_camera_config` argument, a dictionary for setting the `mj_camera` properties, mainly useful for custom environments.
- Added `env.observation_structure`, a dictionary for specifying the observation space composition (e.g. `qpos`, `qvel`), useful for building tooling and wrappers for the MuJoCo environments.
- Return a non-empty `info` with `reset()`; previously an empty dictionary was returned, the new keys are the same state information as `step()`.
- Added `frame_skip` argument, used to configure the `dt` (duration of `step()`); the default varies by environment, check the environment documentation pages.
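A minimal usage sketch of the shared `v5` options listed above; the commented-out XML path is a hypothetical placeholder, and `observation_structure` is read through `env.unwrapped` to bypass the `gym.make` wrappers:

```python
import gymnasium as gym

env = gym.make(
    "Ant-v5",
    frame_skip=5,                             # dt = frame_skip * model timestep
    default_camera_config={"distance": 4.0},  # mj_camera properties
    # xml_file="./my_ant_variant.xml",        # hypothetical custom MuJoCo model
)

# reset() now returns a non-empty info dict (same state keys as step())
obs, info = env.reset(seed=0)
print(sorted(info.keys()))

# observation_structure describes how the observation is composed (qpos, qvel, ...)
print(env.unwrapped.observation_structure)
env.close()
```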
Ant

- Fixed bug: `healthy_reward` was given on every step (even if the Ant is unhealthy); now it is only given when the Ant is healthy. The `info["reward_survive"]` is updated with this change (related GitHub issue).
- The reward function now always includes `contact_cost`; before, it was only included if `use_contact_forces=True` (can be set to `0` with `contact_cost_weight=0`).
- Excluded the `cfrc_ext` of `worldbody` from the observation space, as it was always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related GitHub issue).
- Added the `main_body` argument, which specifies the body used to compute the forward reward (mainly useful for custom MuJoCo models).
- Added the `forward_reward_weight` argument, which defaults to `1` (effectively the same behavior as in `v4`).
- Added the `include_cfrc_ext_in_observation` argument; previously in `v4`, the inclusion of `cfrc_ext` observations was controlled by `use_contact_forces`, which defaulted to `False`, while `include_cfrc_ext_in_observation` defaults to `True`.
- Removed the `use_contact_forces` argument (note: its functionality has been replaced by `include_cfrc_ext_in_observation` and `contact_cost_weight`) (related GitHub issue).
- Fixed `info["reward_ctrl"]` sometimes containing `contact_cost` instead of `ctrl_cost`.
- Fixed `info["x_position"]` & `info["y_position"]` & `info["distance_from_origin"]` giving `xpos` instead of `qpos` observations (`xpos` observations are behind 1 `mj_step()`, more here) (related GitHub issue #1 & GitHub issue #2).
- Removed `info["forward_reward"]` as it is equivalent to `info["reward_forward"]`.
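For migration reference, a hedged sketch of replacing the removed `use_contact_forces` flag with the new arguments above (the `5e-4` contact-cost weight is an assumed default, not verified here):

```python
import gymnasium as gym

# v4 behaviour: contact forces in both the observation and the reward
# env = gym.make("Ant-v4", use_contact_forces=True)

# v5 equivalent: cfrc_ext observations default to on, and contact_cost is
# now always part of the reward function
env = gym.make(
    "Ant-v5",
    include_cfrc_ext_in_observation=True,
    contact_cost_weight=5e-4,  # assumed default; set to 0 to disable the cost
)

# To mimic the old use_contact_forces=False reward, zero the weight instead:
# env = gym.make("Ant-v5", include_cfrc_ext_in_observation=False, contact_cost_weight=0)
```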
HalfCheetah

- Restored the `xml_file` argument (was removed in `v4`).
- Renamed `info["reward_run"]` to `info["reward_forward"]` to be consistent with the other environments.
Hopper

- Fixed bug: `healthy_reward` was given on every step (even if the Hopper was unhealthy); now it is only given when the Hopper is healthy. The `info["reward_survive"]` is updated with this change (related GitHub issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added individual reward terms in `info` (`info["reward_forward"]`, `info["reward_ctrl"]`, `info["reward_survive"]`).
- Added `info["z_distance_from_origin"]`, which is equal to the vertical distance of the "torso" body from its initial position.
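A short illustrative sketch of reading the new per-term `info` entries (`Walker2D` below exposes the same keys):

```python
import gymnasium as gym

env = gym.make("Hopper-v5")
env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# individual reward terms for this step
print(info["reward_forward"], info["reward_ctrl"], info["reward_survive"])
# vertical distance of the "torso" body from its initial position
print(info["z_distance_from_origin"])
env.close()
```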
Humanoid

- Fixed bug: `healthy_reward` was given on every step (even if the Humanoid was unhealthy); now it is only given when the Humanoid is healthy. The `info["reward_survive"]` is updated with this change (related GitHub issue).
- The reward function now always includes `contact_cost` and the corresponding `contact_cost_weight` and `contact_cost_range` arguments, with the same defaults as in `Humanoid-v3` (was removed in `v4`) (related GitHub issue).
- Excluded the `cinert` & `cvel` & `cfrc_ext` of `worldbody` and `root`/`freejoint` `qfrc_actuator` from the observation space, as it was always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related GitHub issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments to allow for the exclusion of observation elements from the observation space.
- Fixed `info["x_position"]` & `info["y_position"]` & `info["distance_from_origin"]` returning `xpos` instead of `qpos`-based observations (`xpos` observations are behind 1 `mj_step()`, more here) (related GitHub issue #1 & GitHub issue #2).
- Added `info["tendon_length"]` and `info["tendon_velocity"]` containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
- Renamed `info["reward_alive"]` to `info["reward_survive"]` to be consistent with the other environments.
- Renamed `info["reward_linvel"]` to `info["reward_forward"]` to be consistent with the other environments.
- Renamed `info["reward_quadctrl"]` to `info["reward_ctrl"]` to be consistent with the other environments.
- Removed `info["forward_reward"]` as it is equivalent to `info["reward_forward"]`.
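As an illustration of the new observation flags, a sketch that excludes all four optional observation blocks and inspects the result (a sketch under the assumptions above, not a definitive recipe):

```python
import gymnasium as gym

env = gym.make(
    "Humanoid-v5",
    include_cinert_in_observation=False,
    include_cvel_in_observation=False,
    include_qfrc_actuator_in_observation=False,
    include_cfrc_ext_in_observation=False,
)

obs, info = env.reset(seed=0)
print(obs.shape)                            # smaller than the default observation
print(env.unwrapped.observation_structure)  # per-component sizes

obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["tendon_length"], info["tendon_velocity"])  # the 2 hip-to-knee tendons
env.close()
```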
Humanoid Standup

- Excluded the `cinert` & `cvel` & `cfrc_ext` of `worldbody` and `root`/`freejoint` `qfrc_actuator` from the observation space, as it was always 0 and thus provided no useful information to the agent, resulting in slightly faster training (related GitHub issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added `uph_cost_weight`, `ctrl_cost_weight`, `impact_cost_weight`, `impact_cost_range` arguments, to configure the reward function (defaults are effectively the same as in `v4`).
- Added `reset_noise_scale` argument, to set the range of initial states.
- Added `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments to allow for the exclusion of observation elements from the observation space.
- Added `info["tendon_length"]` and `info["tendon_velocity"]` containing observations of the Humanoid's 2 tendons connecting the hips to the knees.
- Added `info["x_position"]` & `info["y_position"]`, which contain the observations excluded when `exclude_current_positions_from_observation == True`.
- Added `info["z_distance_from_origin"]`, which is equal to the vertical distance of the "torso" body from its initial position.
InvertedDoublePendulum

- Fixed bug: `healthy_reward` was given on every step (even if the Pendulum is unhealthy); now it is only given if the DoublePendulum is healthy (not terminated) (related GitHub issue).
- Excluded the `qfrc_constraint` ("constraint force") of the hinges from the observation space (as it was always 0, thus providing no useful information to the agent, resulting in slightly faster training) (related GitHub issue).
- Added `xml_file` argument.
- Added `reset_noise_scale` argument, to set the range of initial states.
- Added `healthy_reward` argument to configure the reward function (defaults are effectively the same as in `v4`).
- Added individual reward terms in `info` (`info["reward_survive"]`, `info["distance_penalty"]`, `info["velocity_penalty"]`).
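A hedged example of the new `healthy_reward` argument and the per-term `info` entries; the value `10.0` is assumed to match the `v4` default:

```python
import gymnasium as gym

env = gym.make("InvertedDoublePendulum-v5", healthy_reward=10.0)  # assumed default
env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# the scalar reward alongside its individual terms
print(reward)
print(info["reward_survive"], info["distance_penalty"], info["velocity_penalty"])
env.close()
```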
InvertedPendulum

- Fixed bug: `healthy_reward` was given on every step (even if the Pendulum is unhealthy); now it is only given if the Pendulum is healthy (not terminated) (related GitHub issue).
- Added `xml_file` argument.
- Added `reset_noise_scale` argument, to set the range of initial states.
- Added `info["reward_survive"]`, which contains the reward.
Pusher

- Added `xml_file` argument.
- Added `reward_near_weight`, `reward_dist_weight`, `reward_control_weight` arguments, to configure the reward function (defaults are effectively the same as in `v4`).
- Fixed `info["reward_ctrl"]` not being multiplied by the reward weight.
- Added `info["reward_near"]`, which is equal to the reward term `reward_near`.
"z - position_fingertip"from the observation space since it is always 0, and therefore provides no useful information to the agent, this should result is slightly faster training (related Github issue).xml_fileargument.reward_dist_weight,reward_control_weightarguments, to configure the reward function (defaults are effectively the same as inv4).info["reward_ctrl"]being not being multiplied by the reward weight.Swimmer
Swimmer

- Restored the `xml_file` argument (was removed in `v4`).
- Added `forward_reward_weight` and `ctrl_cost_weight` arguments, to configure the reward function (defaults are effectively the same as in `v4`).
- Added `reset_noise_scale` argument, to set the range of initial states.
- Added `exclude_current_positions_from_observation` argument.
- Replaced `info["reward_fwd"]` and `info["forward_reward"]` with `info["reward_forward"]`, to be consistent with the other environments.
Walker2D

- The `Walker-v5` model is updated to have the same friction for both feet (set to 1.9). This causes the Walker2d's right foot to slide less on the surface and therefore require more force to move (related GitHub issue).
- Fixed bug: `healthy_reward` was given on every step (even if the Walker2D is unhealthy); now it is only given if the Walker2d is healthy. The `info["reward_survive"]` is updated with this change (related GitHub issue).
- Restored the `xml_file` argument (was removed in `v4`).
- Added individual reward terms in `info` (`info["reward_forward"]`, `info["reward_ctrl"]`, `info["reward_survive"]`).
- Added `info["z_distance_from_origin"]`, which is equal to the vertical distance of the "torso" body from its initial position.
Type of change

- Add a new revision of the `MuJoCo` environments.

Checklist:
- I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up)

Benchmarks
Benchmark results (including the `v3`→`v4` comparison): https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation
Issues fixed:

- `"global"` with `"local"` coordinate system google-deepmind/mujoco#833
- `Humanoid` & `Ant` have wrong `info["distance_from_origin"]` #539
- `Ant` & `Humanoid` have wrong "x_position" & "y_position" `info` #521
- `Humanoid-v4` does not have `contact_cost` #504
- `InvertedDoublePendulumEnv` and `InvertedPendulumEnv` always gives "alive_bonus" #500
- MuJoCo/Walker2d left foot has different friction than right foot #477
- `mujoco.InvertedDoublePendulum` last 2 observations (constraints) are const 0 #228
- `MuJoCo.Ant` contact forces being off by default is based on a wrong experiment #214
- [MuJoCo] Reacher And Pusher reward is calculated prior to transition #821 (fixed in Fix `Reacher-v5` & `Pusher-v5` reward function being calculated using previous state #832)

TODO
Finished environments
Cutting room floor (not included in the `v5` release)

- Humanoids (`include_tendon_in_observation`).
- `Ant` & `Humanoid` after step.
- `ManySegmentSwimmer` & `CoupledHalfCheetah` environments.
- `reset_noise_scale` to manipulation environments (`Pusher` & `Reacher`).
- … (`Pusher` & `Reacher`).
- `healthy_z_range`'s body
- `HumanoidStandup.uph_cost` based on `self.dt` and not `opt.timestep`
- `HumanoidStandup.model.left_hip_y` range fix

Credits
Lead Developer: @Kallinteris-Andreas
Specifications/Requirements & Code Review: @pseudo-rnd-thoughts
Debugging assistance: @rodrigodelazcano