Merged
94 commits
1aa736b
add pygame GUI for frozen_lake.py env
younik Jan 21, 2022
73a9e6b
add new line at EOF
younik Jan 21, 2022
c9a359a
pre-commit reformat
younik Jan 24, 2022
354830c
improve graphics
younik Jan 28, 2022
95258f5
new images and dynamic window size
younik Feb 1, 2022
a121b7b
darker tile borders and fix ICC profile
younik Feb 2, 2022
91f3a03
pre-commit hook
younik Feb 2, 2022
6f56feb
adjust elf and stool size
younik Feb 3, 2022
0abec2f
Update frozen_lake.py
jkterry1 Feb 3, 2022
ad87dbc
reformat
younik Feb 5, 2022
48c22fe
Merge branch 'openai:master' into master
younik Feb 7, 2022
6c38481
fix #2600
younik Feb 7, 2022
2649a1c
#2600
younik Feb 7, 2022
c2da74a
add rgb_array support
younik Feb 8, 2022
45a3154
reformat
younik Feb 8, 2022
db9ba2f
Merge branch 'master' into master
younik Feb 10, 2022
9711282
test render api change on FrozenLake
younik Mar 4, 2022
cd85df4
add render support for reset on frozenlake
younik Mar 5, 2022
1beba40
Merge branch 'render_api'
younik Mar 5, 2022
15d9321
add clock on pygame render
younik Mar 5, 2022
3089e57
new render api for blackjack
younik Mar 5, 2022
fad0f6f
new render api for cliffwalking
younik Mar 5, 2022
ffd9fc8
new render api for Env class
younik Mar 5, 2022
5140dc9
update reset method, lunar and Env
younik Mar 5, 2022
311ac82
fix wrapper
younik Mar 5, 2022
1197a36
fix reset lunar
younik Mar 5, 2022
da77145
new render api for box2d envs
younik Mar 5, 2022
18fe379
new render api for mujoco envs
younik Mar 6, 2022
d586394
fix bug
younik Mar 6, 2022
b078647
new render api for classic control envs
younik Mar 6, 2022
415449c
fix tests
younik Mar 9, 2022
9363aa0
add render_mode None for CartPole
younik Mar 10, 2022
11ab948
Merge branch 'master' into master
younik Mar 10, 2022
4912778
new render api for test fake envs
younik Mar 10, 2022
d5107d4
Merge remote-tracking branch 'origin/master'
younik Mar 10, 2022
98ab069
pre-commit hook
younik Mar 10, 2022
e06d293
fix FrozenLake
younik Mar 10, 2022
6fab5a2
fix FrozenLake
younik Mar 10, 2022
0f970ff
more render_mode to super - frozenlake
younik Mar 11, 2022
9999527
Merge remote-tracking branch 'origin/master'
younik Mar 11, 2022
f945184
Merge branch 'master' into master
younik Mar 11, 2022
4771755
remove kwargs from frozen_lake new
younik Mar 11, 2022
b458f4f
Merge remote-tracking branch 'origin/master'
younik Mar 11, 2022
1f5ddf1
pre-commit hook
younik Mar 11, 2022
96b3a7a
solve conflicts
younik Mar 17, 2022
64dcf77
add deprecated render method
younik Mar 23, 2022
5601778
Merge branch 'master' into master
younik Apr 2, 2022
cffacd7
Merge remote-tracking branch 'origin/master'
younik Apr 2, 2022
9570f30
add backwards compatibility
younik Apr 4, 2022
4345b21
fix test
younik Apr 4, 2022
99c6680
add _render
younik Apr 5, 2022
8ad9ed7
Merge branch 'master' into master
younik Apr 8, 2022
765c014
move pygame.init() (avoid pygame dependency on init)
younik Apr 8, 2022
45cbabd
fix pygame dependencies
younik Apr 8, 2022
7b13622
Merge branch 'master' into master
younik Apr 9, 2022
f53aa27
remove collect_render() maintain multi-behaviours .render()
younik Apr 19, 2022
4d76fe1
Merge remote-tracking branch 'origin/master'
younik Apr 19, 2022
2ab1824
Merge branch 'master' into master
younik Apr 19, 2022
343f72a
add type hints
younik Apr 21, 2022
c4bfe84
fix renderer
younik Apr 21, 2022
2b0ca9a
don't call .render() with None
younik Apr 21, 2022
0869ee7
improve docstring
younik Apr 21, 2022
d08b80d
add single_rgb_array to all envs
younik Apr 25, 2022
c7156bd
remove None from metadata["render_modes"]
younik Apr 25, 2022
e8e3c26
Merge branch 'master' into master
younik Apr 25, 2022
3a2f9b6
add type hints to test_env_checkers
younik Apr 25, 2022
a3ac176
Merge remote-tracking branch 'origin/master'
younik Apr 25, 2022
66b0c23
fix lint
younik Apr 25, 2022
ff4aff3
add comments to renderer
younik Apr 26, 2022
e0753ef
add comments to single_depth_array and single_state_pixels
younik Apr 30, 2022
3a961a3
Merge branch 'master' into master
younik Apr 30, 2022
07cf336
reformat
younik Apr 30, 2022
694220d
add deprecation warnings and env.render_mode declaration
younik May 7, 2022
9d04c6a
fix lint
younik May 7, 2022
52268f9
reformat
younik May 7, 2022
a0a409f
fix tests
younik May 7, 2022
d319228
Merge branch 'master' of https://github.com/openai/gym
younik May 23, 2022
ba01803
add docs
younik May 23, 2022
ce8d471
fix car racing determinism
younik May 23, 2022
f6a0c42
remove warning test envs, customizable modes on renderer
younik Jun 2, 2022
11d2260
remove commments and add todo for env_checker
younik Jun 3, 2022
7efffb8
fix car racing
younik Jun 3, 2022
5fe6e80
replace render mode check with assert
younik Jun 3, 2022
e4dc18c
Merge remote-tracking branch 'openai-gym/master'
younik Jun 3, 2022
0a979cc
update new mujoco
younik Jun 3, 2022
69326af
reformat
younik Jun 3, 2022
c220b4a
Merge remote-tracking branch 'openai-gym/master'
younik Jun 3, 2022
efda297
reformat
younik Jun 3, 2022
edb2c7a
change metaclass definition
younik Jun 4, 2022
88da572
Merge remote-tracking branch 'openai-gym/master'
younik Jun 4, 2022
540019c
fix tests
younik Jun 5, 2022
6c55c7b
implement mark suggestions (test, docs, sets)
younik Jun 6, 2022
34b8d08
Merge remote-tracking branch 'openai-gym/master'
younik Jun 6, 2022
bdb3220
check_render
younik Jun 6, 2022
solve conflicts
younik committed Mar 17, 2022
commit 96b3a7a779d9b46219edb037efe558e3f9302bdb
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -15,4 +15,4 @@ jobs:
--build-arg PYTHON_VERSION=${{ matrix.python-version }} \
--tag gym-docker .
  - name: Run tests
-   run: docker run gym-docker pytest --forked --import-mode=append
+   run: docker run gym-docker pytest --import-mode=append
8 changes: 4 additions & 4 deletions README.md
@@ -32,10 +32,10 @@ env.close()

## Notable Related Libraries

- * [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) is a learning library based on the Gym API. It is our recommendation for beginners who want to start learning things quickly.
- * [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) builds upon SB3, containing optimal hyperparameters for Gym environments as well as code to easily find new ones. Such tuning is almost always required.
- * The [Autonomous Learning Library](https://github.com/cpnota/autonomous-learning-library) and [Tianshou](https://github.com/thu-ml/tianshou) are two reinforcement learning libraries I like that are generally geared towards more experienced users.
- * [RLlib](https://docs.ray.io/en/latest/rllib/index.html) is a commonly used library that allows for distributed training and inferencing.
+ * [Stable Baselines 3](https://github.com/DLR-RM/stable-baselines3) is a learning library based on the Gym API. It is designed to cater to complete beginners in the field who want to start learning things quickly.
+ * [RL Baselines3 Zoo](https://github.com/DLR-RM/rl-baselines3-zoo) builds upon SB3, containing optimal hyperparameters for Gym environments as well as code to easily find new ones.
+ * [Tianshou](https://github.com/thu-ml/tianshou) is a learning library that's geared towards very experienced users and is designed to allow for ease in complex algorithm modifications.
+ * [RLlib](https://docs.ray.io/en/latest/rllib/index.html) is a learning library that allows for distributed training and inferencing and supports an extraordinarily large number of features throughout the reinforcement learning space.
* [PettingZoo](https://github.com/Farama-Foundation/PettingZoo) is like Gym, but for environments with multiple agents.

## Environment Versioning
11 changes: 9 additions & 2 deletions gym/__init__.py
@@ -14,15 +14,22 @@
from gym import vector
from gym import wrappers
import os

import sys

__all__ = ["Env", "Space", "Wrapper", "make", "spec", "register"]

# Initializing pygame initializes audio connections through SDL. SDL uses ALSA by default on all Linux systems.
# SDL connecting to ALSA frequently creates giant lists of warnings every time you import an environment using
# pygame.
# DSP is far more benign (and should probably be the default in SDL anyway).

if sys.platform.startswith("linux"):
os.environ["SDL_AUDIODRIVER"] = "dsp"

os.environ["PYGAME_HIDE_SUPPORT_PROMPT"] = "hide"

try:
import gym_notices.notices as notices
import sys

# print version warning if necessary
notice = notices.notices.get(__version__)
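
The environment-variable setup in this hunk can be sketched in isolation as follows. This is a minimal sketch, not the code from `gym/__init__.py`: the helper name `sdl_env_overrides` is hypothetical and exists only so the platform check can be exercised without changing the real platform.

```python
import os
import sys


def sdl_env_overrides(platform: str) -> dict:
    """Return the environment variables to set before pygame is used.

    `sdl_env_overrides` is a hypothetical helper for illustration; the diff
    sets these variables inline at import time.
    """
    # Suppress the support prompt that pygame prints when it is imported.
    overrides = {"PYGAME_HIDE_SUPPORT_PROMPT": "hide"}
    if platform.startswith("linux"):
        # Ask SDL for the "dsp" audio driver instead of the default ALSA one,
        # which is the source of the warning lists described in the comment.
        overrides["SDL_AUDIODRIVER"] = "dsp"
    return overrides


# Apply the overrides before pygame is imported, since the support prompt
# is printed at import time.
os.environ.update(sdl_env_overrides(sys.platform))
```

Keeping the decision in a pure function makes the Linux-only branch easy to check on any operating system.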
4 changes: 3 additions & 1 deletion gym/envs/box2d/car_racing.py
@@ -169,6 +169,8 @@ def __init__(self, render_mode="human", verbose=1, lap_complete_percent=0.95):
shape=polygonShape(vertices=[(0, 0), (1, 0), (1, -1), (0, -1)])
)

# This will throw a warning in tests/envs/test_envs in utils/env_checker.py as the space is not symmetric
# or normalised; however, this is not possible here, so ignore it
self.action_space = spaces.Box(
np.array([-1, 0, 0]).astype(np.float32),
np.array([+1, +1, +1]).astype(np.float32),
@@ -613,9 +615,9 @@ def _create_image_array(self, screen, size):
)

def close(self):
- pygame.quit()
if self.screen is not None:
pygame.display.quit()
+ pygame.quit()
self.isopen = False
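
The reordering in `close()` (one `pygame.quit()` removed, one added after `pygame.display.quit()`) can be illustrated without a display. This is a sketch under stated assumptions: `FakeBackend` and `Viewer` are hypothetical stand-ins, not gym or pygame classes, and because the diff has lost its indentation, the placement of both quit calls inside the `if` branch is an assumption.

```python
class FakeBackend:
    """Hypothetical stand-in for the pygame module; it only records calls."""

    def __init__(self):
        self.calls = []

    def display_quit(self):
        self.calls.append("display.quit")

    def quit(self):
        self.calls.append("quit")


class Viewer:
    """Hypothetical minimal object whose close() mirrors the hunk above."""

    def __init__(self, backend, open_window=False):
        self.backend = backend
        # A non-None screen means a window was actually created.
        self.screen = object() if open_window else None
        self.isopen = True

    def close(self):
        if self.screen is not None:
            # Tear down the window first, then the library as a whole.
            # Assumption: both calls sit inside the `if` branch.
            self.backend.display_quit()
            self.backend.quit()
        self.isopen = False


opened = Viewer(FakeBackend(), open_window=True)
opened.close()

never_opened = Viewer(FakeBackend(), open_window=False)
never_opened.close()
```

Under this reading, an environment that never rendered does not touch the backend at all when it is closed.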


5 changes: 4 additions & 1 deletion gym/envs/classic_control/pendulum.py
@@ -99,6 +99,9 @@ def __init__(self, render_mode="human", g=10.0):
self.render_list = []

high = np.array([1.0, 1.0, self.max_speed], dtype=np.float32)
# This will throw a warning in tests/envs/test_envs in utils/env_checker.py as the space is not symmetric
# or normalised as max_torque == 2 by default. Ignoring the issue here as the default settings are too old
# to update to follow the openai gym api
self.action_space = spaces.Box(
low=-self.max_torque, high=self.max_torque, shape=(1,), dtype=np.float32
)
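
The comments in this hunk and in the CarRacing hunk refer to an action space being "symmetric" and "normalised". The exact rules applied by `utils/env_checker.py` are not shown in this diff, so the following is only a sketch under assumed definitions: symmetric means `low == -high` for every dimension, and normalised means every bound has magnitude at most 1. Both helper names are hypothetical.

```python
def is_symmetric(low, high):
    """Assumed definition: each lower bound is the negated upper bound."""
    return all(lo == -hi for lo, hi in zip(low, high))


def is_normalised(low, high):
    """Assumed definition: every bound lies within [-1, 1]."""
    return all(abs(lo) <= 1.0 and abs(hi) <= 1.0 for lo, hi in zip(low, high))


max_torque = 2.0  # default value stated in the Pendulum comment

# Pendulum: bounds are (-2, 2), so symmetric but not normalised.
pendulum_low, pendulum_high = [-max_torque], [max_torque]

# CarRacing: bounds are (-1, 0, 0) to (1, 1, 1), so normalised but not symmetric.
racing_low, racing_high = [-1.0, 0.0, 0.0], [1.0, 1.0, 1.0]

# An agent acting in [-1, 1] could be mapped to a torque by scaling.
agent_action = 0.5
torque = agent_action * max_torque
```

Rescaling in a wrapper, as in the last two lines, is one way to present a normalised interface without changing the environment's defaults.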
@@ -202,7 +205,7 @@ def _render(self):
scale_img = pygame.transform.smoothscale(
img, (scale * np.abs(self.last_u) / 2, scale * np.abs(self.last_u) / 2)
)
- is_flip = self.last_u > 0
+ is_flip = bool(self.last_u > 0)
scale_img = pygame.transform.flip(scale_img, is_flip, True)
self.surf.blit(
scale_img,
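
The `bool(...)` wrapper added to `is_flip` matters when `last_u` is a NumPy scalar, because a comparison on a NumPy scalar yields a NumPy boolean rather than a Python `bool`. The sketch below assumes `last_u` is a `np.float32`, which this hunk does not confirm; the presumed motivation is that the flip call expects plain booleans.

```python
import numpy as np

# Hypothetical torque value stored as a NumPy scalar.
last_u = np.float32(1.5)

raw_flag = last_u > 0        # NumPy boolean scalar, not a Python bool
is_flip = bool(last_u > 0)   # plain Python bool, as in the changed line
```

The two values compare equal, but only `is_flip` passes an `isinstance(..., bool)` check.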
84 changes: 0 additions & 84 deletions gym/envs/mujoco/archive/striker.py

This file was deleted.

70 changes: 0 additions & 70 deletions gym/envs/mujoco/archive/thrower.py

This file was deleted.

124 changes: 124 additions & 0 deletions gym/envs/mujoco/pusher.py
@@ -2,8 +2,132 @@
from gym import utils
from gym.envs.mujoco import mujoco_env

import mujoco_py


class PusherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
"""
### Description
"Pusher" is a multi-jointed robot arm which is very similar to that of a human.
The goal is to move a target cylinder (called *object*) to a goal position using the robot's end effector (called *fingertip*).
The robot consists of shoulder, elbow, forearm, and wrist joints.

### Action Space
The action space is a `Box(-2, 2, (7,), float32)`. An action is a 7-element vector representing the torques applied at the hinge joints.

| Num | Action | Control Min | Control Max | Name (in corresponding XML file) | Joint | Unit |
|-----|--------------------------------------------------------------------|-------------|-------------|----------------------------------|-------|--------------|
| 0 | Rotation of the shoulder panning joint | -2 | 2 | r_shoulder_pan_joint | hinge | torque (N m) |
| 1 | Rotation of the shoulder lifting joint | -2 | 2 | r_shoulder_lift_joint | hinge | torque (N m) |
| 2 | Rotation of the shoulder rolling joint | -2 | 2 | r_upper_arm_roll_joint | hinge | torque (N m) |
| 3 | Rotation of the hinge joint that flexes the elbow | -2 | 2 | r_elbow_flex_joint | hinge | torque (N m) |
| 4 | Rotation of hinge that rolls the forearm | -2 | 2 | r_forearm_roll_joint | hinge | torque (N m) |
| 5 | Rotation of flexing the wrist | -2 | 2 | r_wrist_flex_joint | hinge | torque (N m) |
| 6 | Rotation of rolling the wrist | -2 | 2 | r_wrist_roll_joint | hinge | torque (N m) |

### Observation Space

Observations consist of

- Angle of rotational joints on the pusher
- Angular velocities of rotational joints on the pusher
- The coordinates of the fingertip of the pusher
- The coordinates of the object to be moved
- The coordinates of the goal position

The observation is an `ndarray` with shape `(23,)` where the elements correspond to the table below.
An analogy can be drawn to a human arm in order to help understand the state space, with the words flex and roll meaning the
same as for human joints.

| Num | Observation | Min | Max | Name (in corresponding XML file) | Joint| Unit |
|-----|-----------------------|----------------------|--------------------|----------------------|--------------------|--------------------|
| 0 | Rotation of the shoulder panning joint | -Inf | Inf | r_shoulder_pan_joint | hinge | angle (rad) |
| 1 | Rotation of the shoulder lifting joint | -Inf | Inf | r_shoulder_lift_joint | hinge | angle (rad) |
| 2 | Rotation of the shoulder rolling joint | -Inf | Inf | r_upper_arm_roll_joint | hinge | angle (rad) |
| 3 | Rotation of the hinge joint that flexes the elbow | -Inf | Inf | r_elbow_flex_joint | hinge | angle (rad) |
| 4 | Rotation of hinge that rolls the forearm | -Inf | Inf | r_forearm_roll_joint | hinge | angle (rad) |
| 5 | Rotation of flexing the wrist | -Inf | Inf | r_wrist_flex_joint | hinge | angle (rad) |
| 6 | Rotation of rolling the wrist | -Inf | Inf | r_wrist_roll_joint | hinge | angle (rad) |
| 7 | Rotational velocity of the shoulder panning joint | -Inf | Inf | r_shoulder_pan_joint | hinge | angular velocity (rad/s) |
| 8 | Rotational velocity of the shoulder lifting joint | -Inf | Inf | r_shoulder_lift_joint | hinge | angular velocity (rad/s) |
| 9 | Rotational velocity of the shoulder rolling joint | -Inf | Inf | r_upper_arm_roll_joint | hinge | angular velocity (rad/s) |
| 10 | Rotational velocity of the hinge joint that flexes the elbow | -Inf | Inf | r_elbow_flex_joint | hinge | angular velocity (rad/s) |
| 11 | Rotational velocity of hinge that rolls the forearm | -Inf | Inf | r_forearm_roll_joint | hinge | angular velocity (rad/s) |
| 12 | Rotational velocity of flexing the wrist | -Inf | Inf | r_wrist_flex_joint | hinge | angular velocity (rad/s) |
| 13 | Rotational velocity of rolling the wrist | -Inf | Inf | r_wrist_roll_joint | hinge | angular velocity (rad/s) |
| 14 | x-coordinate of the fingertip of the pusher | -Inf | Inf | tips_arm | slide | position (m) |
| 15 | y-coordinate of the fingertip of the pusher | -Inf | Inf | tips_arm | slide | position (m) |
| 16 | z-coordinate of the fingertip of the pusher | -Inf | Inf | tips_arm | slide | position (m) |
| 17 | x-coordinate of the object to be moved | -Inf | Inf | object (obj_slidex) | slide | position (m) |
| 18 | y-coordinate of the object to be moved | -Inf | Inf | object (obj_slidey) | slide | position (m) |
| 19 | z-coordinate of the object to be moved | -Inf | Inf | object | cylinder | position (m) |
| 20 | x-coordinate of the goal position of the object | -Inf | Inf | goal (goal_slidex) | slide | position (m) |
| 21 | y-coordinate of the goal position of the object | -Inf | Inf | goal (goal_slidey) | slide | position (m) |
| 22 | z-coordinate of the goal position of the object | -Inf | Inf | goal | sphere | position (m) |


### Rewards
The reward consists of three parts:
- *reward_near*: This reward is a measure of how far the *fingertip*
of the pusher (the unattached end) is from the object, with a more negative
value assigned when the pusher's *fingertip* is further away from the
object. It is calculated as the negative vector norm of (position of
the fingertip - position of the object), or *-norm("fingertip" - "object")*.
- *reward_dist*: This reward is a measure of how far the object is from
the goal position, with a more negative value assigned when the object is
further away from the goal. It is calculated as the negative vector norm of
(position of the object - position of the goal), or *-norm("object" - "goal")*.
- *reward_ctrl*: A negative reward for penalising the pusher if
it takes actions that are too large. It is measured as the negative squared
Euclidean norm of the action, i.e. as *- sum(action<sup>2</sup>)*.

The total reward returned is ***reward*** *=* *reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near*

Unlike other environments, Pusher does not allow you to specify weights for the individual reward terms.
However, `info` does contain the keys *reward_dist* and *reward_ctrl*. Thus, if you'd like to weight the terms,
you should create a wrapper that computes the weighted reward from `info`.


### Starting State
All pusher (not including object and goal) states start in
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0). Uniform noise in the range
[-0.005, 0.005] is added to the velocity attributes only. The velocities of
the object and goal are permanently set to 0. The object's x-position is selected uniformly
from [-0.3, 0] while the y-position is selected uniformly from [-0.2, 0.2], and this
process is repeated until the vector norm between the object's (x,y) position and origin is not greater
than 0.17. The goal always has the same position of (0.45, -0.05, -0.323).

The default frame skip is 5, with each frame lasting 0.01, giving rise to *dt = 5 * 0.01 = 0.05*

### Episode Termination

The episode terminates when any of the following happens:

1. The episode duration reaches 100 timesteps.
2. Any of the state space values is no longer finite.

### Arguments

No additional arguments are currently supported (in v2 and lower),
but modifications can be made to the XML file in the assets folder
(or by changing the path to a modified XML file in another folder).

```
env = gym.make('Pusher-v2')
```

There is no v3 for Pusher, unlike the robot environments, where v3 and
beyond take `gym.make` kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale, etc.


### Version History

* v2: All continuous control environments now use mujoco_py >= 1.50
* v1: max_time_steps raised to 1000 for robot based tasks (not including reacher, which has a max_time_steps of 50). Added reward_threshold to environments.
* v0: Initial versions release (1.0.0)

"""
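
The observation table in the docstring above implies a fixed layout for the 23-element vector. A minimal sketch of unpacking it is shown below; `split_observation` and the dictionary keys are hypothetical names chosen for illustration, and only the index ranges come from the table.

```python
def split_observation(obs):
    """Split a 23-element Pusher observation into named parts.

    `split_observation` is a hypothetical helper; the index ranges follow
    the observation table in the docstring.
    """
    if len(obs) != 23:
        raise ValueError(f"expected 23 elements, got {len(obs)}")
    return {
        "joint_angles": list(obs[0:7]),       # indices 0-6, rad
        "joint_velocities": list(obs[7:14]),  # indices 7-13, rad/s
        "fingertip_pos": list(obs[14:17]),    # indices 14-16, m
        "object_pos": list(obs[17:20]),       # indices 17-19, m
        "goal_pos": list(obs[20:23]),         # indices 20-22, m
    }


# Use the index values themselves as data so each slice is easy to verify.
parts = split_observation(list(range(23)))
```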

def __init__(self, **kwargs):
utils.EzPickle.__init__(self)
mujoco_env.MujocoEnv.__init__(self, "pusher.xml", 5, **kwargs)
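
The reward formula in the docstring, and its suggestion to re-weight the terms from `info`, can be sketched as plain functions. This is not the environment's implementation: it assumes *reward_near* is the negative fingertip-to-object distance and *reward_dist* the negative object-to-goal distance, and `pusher_reward` and `reweight` are hypothetical helper names. Since `info` is documented to carry only *reward_dist* and *reward_ctrl*, the sketch recovers *reward_near* from the default weighting.

```python
import math


def pusher_reward(fingertip, obj, goal, action):
    """Combine the three reward terms with the weights from the docstring."""
    reward_near = -math.dist(fingertip, obj)   # fingertip-to-object distance
    reward_dist = -math.dist(obj, goal)        # object-to-goal distance
    reward_ctrl = -sum(a * a for a in action)  # squared action magnitude
    reward = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near
    info = {"reward_dist": reward_dist, "reward_ctrl": reward_ctrl}
    return reward, info


def reweight(reward, info, w_dist=1.0, w_ctrl=0.1, w_near=0.5):
    """Recompute the reward with custom weights (hypothetical helper).

    reward_near is not in `info`, so it is recovered by inverting
    reward = reward_dist + 0.1 * reward_ctrl + 0.5 * reward_near.
    """
    reward_near = (reward - info["reward_dist"] - 0.1 * info["reward_ctrl"]) / 0.5
    return (
        w_dist * info["reward_dist"]
        + w_ctrl * info["reward_ctrl"]
        + w_near * reward_near
    )


reward, info = pusher_reward(
    fingertip=(0.0, 0.0, 0.0),
    obj=(3.0, 4.0, 0.0),
    goal=(3.0, 4.0, 12.0),
    action=[1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
)
no_near_term = reweight(reward, info, w_near=0.0)
```

A wrapper around `step()` could call something like `reweight` on each returned `(reward, info)` pair, which keeps the environment itself unchanged.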
3 changes: 3 additions & 0 deletions gym/envs/toy_text/blackjack.py
@@ -292,3 +292,6 @@ def scale_card_img(card_img):
np.array(pygame.surfarray.pixels3d(self.screen)), axes=(1, 0, 2)
)
)


# Pixel art from Mariia Khmelnytska (https://www.123rf.com/photo_104453049_stock-vector-pixel-art-playing-cards-standart-deck-vector-set.html)
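
The `axes=(1, 0, 2)` argument in the context lines of this hunk swaps the first two axes of the pixel array. pygame surface arrays are indexed as `[x][y]`, so the swap turns a `(width, height, 3)` array into the `(height, width, 3)` layout that image consumers usually expect. The sketch below uses a hypothetical NumPy array in place of `pygame.surfarray.pixels3d(self.screen)` so it runs without a display.

```python
import numpy as np

width, height = 4, 3

# Hypothetical stand-in for pygame.surfarray.pixels3d(screen): (width, height, 3).
pixels = np.zeros((width, height, 3), dtype=np.uint8)
pixels[1, 2] = (255, 0, 0)  # mark the pixel at x=1, y=2 in red

# Swap the x and y axes, leaving the colour channel last: (height, width, 3).
frame = np.transpose(pixels, axes=(1, 0, 2))
```

After the transpose the marked pixel is addressed as `frame[2, 1]`, that is, row first and column second.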
This is a condensed view of the merge commit; the full changes are not shown.