mujoco-v5 initial commit
#104
Conversation
@Kallinteris-Andreas Thanks for all of this hard work. We are planning on having a gymnasium v0.28.2 and v0.29 release in the next few weeks; if there are any changes you want to make in gymnasium, could you do them soon? Thanks. Also, for what reason is the CI failing?
@Kallinteris-Andreas I can't remember the previous conversations we had about this, but I don't think we are planning on moving the mujoco environments (v2, v3 or v4) to Gymnasium-Robotics.
Yeah. After validation, I can move the PR to the gymnasium repo, it is no problem.
@pseudo-rnd-thoughts & @rodrigodelazcano Note: the changelog here will be used in the [...]. Thanks!
Could you add much more detail to each point, in particular why the change was made? It would be great if someone could look at the notes with minimal previous knowledge of the environment and understand the changes. For example:
@pseudo-rnd-thoughts Thanks, I have made a bunch of improvements. Can you do a second pass of the change list, to make sure that all the changes are desired?

Without more detail, I can't understand all of the changes; could you do a documentation update?
All the changes are in the docstrings of the environments.
cont from: #91
Description
Adds the `v5` version of the `mujoco` environments.

Changelog
- `mujoco` minimum version is 2.3.3 now.
- Support for custom `mujoco` models with the usage of the `xml_file` argument (previously only a few changes could be made).
- Added the `default_camera_config` argument, a dictionary for setting the `mj_camera` properties, primarily useful for custom environments.
- Added `env.observation_structure`, a dictionary indicating the composition of the observation space (e.g. `qpos`, `qvel`), useful for building tooling and wrappers for the MuJoCo environments.
- Return a non-empty `info` with `reset()`, previously an empty dictionary would be returned.
- Added the `frame_skip` argument.

Ant
- Fixed bug: `healthy_reward` being given on every step (even when the Ant is unhealthy), now it is only given when the Ant is healthy. The `info` "reward_survive" is updated with this change.
- The reward function now always includes `contact_cost`, previously it was only included when `use_contact_forces=True` (can be set to `0` with `contact_cost_weight=0`).
- Excluded the `worldbody`'s `cfrc_ext` from the observation space (since it is constantly 0, and therefore provides no useful information to the agent; should result in slightly faster training).
- Added the `main_body` argument.
- Added the `forward_reward_weight` argument.
- Added the `include_cfrc_ext_in_observation` argument.
- Removed the `use_contact_forces` argument (note: its functionality has been replaced with `include_cfrc_ext_in_observation` and `contact_cost_weight`).
- Fixed `info` "reward_ctrl" sometimes containing `contact_cost` instead of `ctrl_cost`.
- Fixed `info` "x_position" & "y_position" giving `xpos` instead of `qpos` observations (`xpos` observations are behind by 1 `mj_step()`).
- Removed "forward_reward" from `info` (note: "reward_forward", which contains the same information, still exists).

Half Cheetah
- Added the `xml_file` argument.
- Renamed `info` "reward_run" → "reward_forward" (to be consistent with the other environments).

Hopper
- Updated the model (it no longer uses `coordinate='global'`, but has near-identical behavior).
- Fixed bug: `healthy_reward` being given on every step (even when the Hopper is unhealthy), now it is only given when the Hopper is healthy. The `info` "reward_survive" is updated with this change.
- Added the `xml_file` argument.
- Added `info` "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin".

Humanoid
- Fixed bug: `healthy_reward` being given on every step (even when the Humanoid is unhealthy), now it is only given when the Humanoid is healthy. The `info` "reward_survive" is updated with this change.
- Restored `contact_cost` (and the corresponding `contact_cost_weight` and `contact_cost_range` arguments).
- Excluded the `worldbody`'s `cinert` & `cvel` & `cfrc_ext` and the `root` (freejoint) `qfrc_actuator` from the observation space (since they are constantly 0, and therefore provide no useful information to the agent; should result in slightly faster training).
- Added the `xml_file` argument.
- Added the `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments.
- Fixed `info` "x_position" & "y_position" giving `xpos` instead of `qpos` observations (`xpos` observations are behind by 1 `mj_step()`).
- Added `info` "tendon_length" & "tendon_velocity".
- Renamed `info` "reward_alive" → "reward_survive" (to be consistent with the other environments).
- Renamed `info` "reward_linvel" → "reward_forward" (to be consistent with the other environments).
- Renamed `info` "reward_quadctrl" → "reward_ctrl" (to be consistent with the other environments).
- Removed "forward_reward" from `info` (note: "reward_forward" still exists).

Humanoid Standup
- Excluded the `worldbody`'s `cinert` & `cvel` & `cfrc_ext` and the `root` (freejoint) `qfrc_actuator` from the observation space (since they are constantly 0, and therefore provide no useful information to the agent; should result in slightly faster training).
- Added the `xml_file`, `uph_cost_weight`, `ctrl_cost_weight`, `impact_cost_weight`, `impact_cost_range`, `reset_noise_scale`, `exclude_current_positions_from_observation`, `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments.
- Added `info` "tendon_length" & "tendon_velocity".
- Added `info` "x_position" & "y_position" & "z_distance_from_origin".

InvertedDoublePendulum
- Fixed bug: `healthy_reward` being given on every step (even when the Pendulum is unhealthy), now it is only given when the Pendulum is healthy. The `info` "reward_survive" is updated with this change.
- Excluded the `qfrc_constraint` ("constraint force") of the hinges from the observation space (since they are constantly 0, and therefore provide no useful information to the agent; should result in slightly faster training).
- Added the `xml_file`, `healthy_reward`, `reset_noise_scale` arguments.
- Added `info` "reward_survive", "distance_penalty", "velocity_penalty".

InvertedPendulum
- Fixed bug: `healthy_reward` being given on every step (even when the Pendulum is unhealthy), now it is only given when the Pendulum is healthy. The `info` "reward_survive" is updated with this change.
- Added the `xml_file`, `reset_noise_scale` arguments.
- Added `info` "reward_survive".

Pusher
- Added the `xml_file` argument.
- Added the `reward_near_weight`, `reward_dist_weight`, `reward_control_weight` arguments.
- Fixed `info` "reward_ctrl" not being multiplied by the reward weight.
- Added `info` "reward_near".

Reacher
- Added the `xml_file` argument.
- Added the `reward_dist_weight`, `reward_control_weight` arguments.
- Fixed `info` "reward_ctrl" not being multiplied by the reward weight.

Swimmer
- Added the `xml_file` argument.
- Added the `forward_reward_weight`, `ctrl_cost_weight`, `reset_noise_scale`, `exclude_current_positions_from_observation` arguments.
- Renamed `info` "reward_fwd" / "forward_reward" → "reward_forward" (to be consistent with the other environments).

Walker2D
- Updated the model (it no longer uses `coordinate='global'`); now both feet have `friction == 1.9`, previously the right foot had `friction == 0.9` and the left foot had `friction == 1.9`.
- Fixed bug: `healthy_reward` being given on every step (even when the Walker2D is unhealthy), now it is only given when the Walker2D is healthy. The `info` "reward_survive" is updated with this change.
- Added the `xml_file` argument.
- Added `info` "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin".

Type of change
Adds a new revision of the `MuJoCo` environments.

Checklist:
- Ran the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` for instructions to set it up).

Benchmarks
(`v3` → `v4`) https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation
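The new `env.observation_structure` dictionary mentioned in the changelog is aimed at exactly this kind of validation and tooling work. A minimal, self-contained sketch of how a wrapper might consume it (all keys and sizes here are illustrative, not taken from any actual environment):

```python
# Illustrative stand-in for `env.observation_structure`: it maps
# observation components to their lengths, in the order they appear
# in the flat observation vector.
observation_structure = {
    "skipped_qpos": 2,  # positions excluded from the observation (e.g. x/y)
    "qpos": 13,         # generalized positions included in the observation
    "qvel": 14,         # generalized velocities
}

def split_observation(obs, structure):
    """Slice a flat observation vector into named components."""
    parts, offset = {}, 0
    for name, size in structure.items():
        if name.startswith("skipped"):
            continue  # excluded components are not in the vector itself
        parts[name] = obs[offset:offset + size]
        offset += size
    return parts

obs = list(range(27))  # stand-in for an actual 27-dimensional observation
parts = split_observation(obs, observation_structure)
print(len(parts["qpos"]), len(parts["qvel"]))  # 13 14
```

A tool built this way keeps working when an environment's observation layout changes, since the offsets are derived from the structure rather than hard-coded.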
issues fixed:
- Replacing the `"global"` with the `"local"` coordinate system (google-deepmind/mujoco#833)
- `Humanoid` & `Ant` have wrong `info["distance_from_origin"]` (Gymnasium#539)
- `Ant` & `Humanoid` have wrong "x_position" & "y_position" `info` (Gymnasium#521)
- `Humanoid-v4` does not have `contact_cost` (Gymnasium#504)
- `InvertedDoublePendulumEnv` and `InvertedPendulumEnv` always give "alive_bonus" (Gymnasium#500)
- MuJoCo/`Walker2d` left foot has different friction than right foot (Gymnasium#477)
- `mujoco.InvertedDoublePendulum` last 2 observations (constraints) are const 0 (Gymnasium#228)
- `MuJoCo.Ant` contact forces being off by default is based on a wrong experiment (Gymnasium#214)

TODO
Finished environments
Cutting room floor (not included in the `v5` release):
- Humanoids
- `Ant` & `Humanoid` after step
- `ManySegmentSwimmer` & `CoupledHalfCheetah` environments
- `reset_noise_scale` to `Pusher` & `Reacher`

Credits
Lead Developer: Kallinteris Andreas
Debugging assistance & setting specification/requirements: Rodrigo, Mark Towers
Technical Advisor: saran-t (helped with the creation of the new `Hopper` and `Walker2D` models)
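Many of the `healthy_reward` fixes listed in the changelog share one pattern: previously the survival bonus was added to the reward on every step, even on unhealthy steps, while in `v5` it is only added while the robot is healthy. A toy sketch of the behavioral difference (function names are illustrative, not Gymnasium's actual implementation):

```python
def survive_reward_old(healthy_reward: float, is_healthy: bool) -> float:
    # Pre-v5 behavior: the survival bonus was granted unconditionally.
    return healthy_reward

def survive_reward_v5(healthy_reward: float, is_healthy: bool) -> float:
    # v5 behavior: the bonus is granted only while the robot is healthy;
    # info["reward_survive"] reports this value.
    return healthy_reward if is_healthy else 0.0

# The two versions differ exactly on unhealthy steps:
print(survive_reward_old(1.0, False), survive_reward_v5(1.0, False))  # 1.0 0.0
```

This matters most with `terminate_when_unhealthy=False`, where an agent could previously keep collecting the bonus while lying on the ground.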