Skip to content
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 66 additions & 47 deletions gym/utils/play.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,15 @@
import argparse
from typing import Tuple

import matplotlib
import pygame
from pygame.event import Event

import gym
from gym import logger
from gym import Env, logger

try:
import matplotlib

matplotlib.use("TkAgg")
import matplotlib.pyplot as plt
except ImportError as e:
Expand All @@ -18,6 +21,55 @@
from pygame.locals import VIDEORESIZE


class PlayableGame:
def __init__(self, env: Env, keys_to_action: dict = None, zoom: float = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys_to_action: Optional[dict]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better if we can specify the key/value types on the dict (I guess values are arbitrary actions, but keys should be keypresses? how are they represented?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's Dict[Tuple[int], int] where the key is (ord(key),)

def get_keys_to_action(self):
# Control with left and right arrow keys.
return {(): 1, (276,): 0, (275,): 2, (275, 276): 1}

reference of the actions:

| Num | Observation | Value | Unit |
|-----|-------------------------------------------------------------|---------|------|
| 0 | Accelerate to the left | Inf | position (m) |
| 1 | Don't accelerate | Inf | position (m) |
| 2 | Accelerate to the right | Inf | position (m) |

gym/gym/utils/play.py

Lines 125 to 135 in 9a1d4b6

keys_to_action: dict: tuple(int) -> int or None
Mapping from keys pressed to action performed.
For example if pressed 'w' and space at the same time is supposed
to trigger action number 2 then key_to_action dict would look like this:
{
# ...
sorted(ord('w'), ord(' ')) -> 2
# ...
}
If None, default key_to_action mapping for that env is used, if provided.

self.env = env
self.relevant_keys = self._get_relevant_keys(keys_to_action)
self.video_size = self._get_video_size(zoom)
self.screen = pygame.display.set_mode(self.video_size)
self.pressed_keys = []
self.running = True

def _get_relevant_keys(self, keys_to_action: dict) -> set:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[dict]

if keys_to_action is None:
if hasattr(self.env, "get_keys_to_action"):
keys_to_action = self.env.get_keys_to_action()
elif hasattr(self.env.unwrapped, "get_keys_to_action"):
keys_to_action = self.env.unwrapped.get_keys_to_action()
else:
assert False, (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly raise some Exception instead of a failing assert, you can either use one of the existing exceptions like AttributeError or something; or make a new one in this file to make it more descriptive.

self.env.spec.id
+ " does not have explicit key to action mapping, "
+ "please specify one manually"
)
relevant_keys = set(sum(map(list, keys_to_action.keys()), []))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map is nice (functional programming is great), but in Python it's typically more readable to use a comprehension, so it'd be something like
set(sum(list(key) for key in keys_to_action), []))

But is this actually what we want this to do? I'm not sure what each key is meant to be (see previous comment about types), so take another look at the logic here. Intuition tells me that you might have wanted to just do set(keys_to_actions.keys()) or something like that (which might be equivalent to just set(keys_to_actions))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to explore it more in depth, this logic was already there in the old play(), it seems also to me that set(keys_to_actions.keys()) may be fine

return relevant_keys

def _get_video_size(self, zoom: float = None) -> Tuple[int, int]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[float]

rendered = self.env.render(mode="rgb_array")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put like a TODO here so that we remember to update this when the render API change goes through?

video_size = [rendered.shape[1], rendered.shape[0]]

if zoom is not None:
video_size = int(video_size[0] * zoom), int(video_size[1] * zoom)

return video_size

def process_event(self, event: Event) -> None:
if event.type == pygame.KEYDOWN:
if event.key in self.relevant_keys:
self.pressed_keys.append(event.key)
elif event.key == 27:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is 27? Ideally replace this with some constant that already might exist in pygame, or define it yourself, or at the very least add a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole _get_relevant_keys() was extracted "as is" from the old play(). Didn't want to change it before having all tests in place but this is safe enough to replace with a pygame constants 😅 (it should be the exit key)

self.running = False
elif event.type == pygame.KEYUP:
if event.key in self.relevant_keys:
self.pressed_keys.remove(event.key)
elif event.type == pygame.QUIT:
self.running = False
elif event.type == VIDEORESIZE:
self.video_size = event.size
self.screen = pygame.display.set_mode(self.video_size)


def display_arr(screen, arr, video_size, transpose):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add type hints here and to play()?

arr_min, arr_max = arr.min(), arr.max()
arr = 255.0 * (arr - arr_min) / (arr_max - arr_min)
Expand Down Expand Up @@ -83,63 +135,30 @@ def callback(obs_t, obs_tp1, action, rew, done, info):
If None, default key_to_action mapping for that env is used, if provided.
"""
env.reset()
rendered = env.render(mode="rgb_array")
game = PlayableGame(env, keys_to_action, zoom)

if keys_to_action is None:
if hasattr(env, "get_keys_to_action"):
keys_to_action = env.get_keys_to_action()
elif hasattr(env.unwrapped, "get_keys_to_action"):
keys_to_action = env.unwrapped.get_keys_to_action()
else:
assert False, (
env.spec.id
+ " does not have explicit key to action mapping, "
+ "please specify one manually"
)
relevant_keys = set(sum(map(list, keys_to_action.keys()), []))

video_size = [rendered.shape[1], rendered.shape[0]]
if zoom is not None:
video_size = int(video_size[0] * zoom), int(video_size[1] * zoom)

pressed_keys = []
running = True
env_done = True

screen = pygame.display.set_mode(video_size)
done = True
clock = pygame.time.Clock()

while running:
if env_done:
env_done = False
while game.running:
if done:
done = False
obs = env.reset()
else:
action = keys_to_action.get(tuple(sorted(pressed_keys)), 0)
action = keys_to_action.get(tuple(sorted(game.pressed_keys)), 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we always safely assume that 0 is a good default action? (if I'm understanding this correctly)
Maybe this should be part of the specification?

In particular this seems like it'd crash with any continuous-action environments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is actually the default action if no relevant keys are pressed; I have the same doubt regarding 0, I will explore it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RedTachyon About this I think we can safely say that, since we can not map a continuos-action environment to discrete keyboard presses it is ok to leave 0 as default because play() will crash before since there is no keys_to_action. (Maybe we could add an explicit message for continuos-action environment?)
An example of a keys to action mapping it is indeed this:

def get_keys_to_action(self):
# Control with left and right arrow keys.
return {(): 1, (276,): 0, (275,): 2, (275, 276): 1}

prev_obs = obs
obs, rew, env_done, info = env.step(action)
obs, rew, done, info = env.step(action)
if callback is not None:
callback(prev_obs, obs, action, rew, env_done, info)
callback(prev_obs, obs, action, rew, done, info)
if obs is not None:
rendered = env.render(mode="rgb_array")
display_arr(screen, rendered, transpose=transpose, video_size=video_size)
display_arr(
game.screen, rendered, transpose=transpose, video_size=game.video_size
)

# process pygame events
for event in pygame.event.get():
# test events, set key states
if event.type == pygame.KEYDOWN:
if event.key in relevant_keys:
pressed_keys.append(event.key)
elif event.key == 27:
running = False
elif event.type == pygame.KEYUP:
if event.key in relevant_keys:
pressed_keys.remove(event.key)
elif event.type == pygame.QUIT:
running = False
elif event.type == VIDEORESIZE:
video_size = event.size
screen = pygame.display.set_mode(video_size)
print(video_size)
game.process_event(event)

pygame.display.flip()
clock.tick(fps)
Expand Down
161 changes: 161 additions & 0 deletions tests/utils/test_play.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple

import numpy as np
import pygame
import pytest
from pygame import KEYDOWN, QUIT, event
from pygame.event import Event

import gym
from gym.utils.play import PlayableGame, play

RELEVANT_KEY = 100
IRRELEVANT_KEY = 1


@dataclass
class DummyEnvSpec:
id: str


class DummyPlayEnv(gym.Env):
def step(self, action):
obs = np.zeros((1, 1))
rew, done, info = 1, False, {}
return obs, rew, done, info

def reset(self):
...

def render(self, mode="rgb_array"):
return np.zeros((1, 1))


class PlayStatus:
def __init__(self, callback: Callable):
self.data_callback = callback
self.cumulative_reward = 0

def callback(self, obs_t, obs_tp1, action, rew, done, info):
self.cumulative_reward += self.data_callback(
obs_t, obs_tp1, action, rew, done, info
)


# set of key events to inject into the play loop as callback
callback_events = [
Event(KEYDOWN, {"key": RELEVANT_KEY}),
Event(KEYDOWN, {"key": RELEVANT_KEY}),
Event(QUIT),
]


def callback(obs_t, obs_tp1, action, rew, done, info):
event.post(callback_events.pop(0))
return rew


def dummy_keys_to_action():
return {(ord("a"),): 0, (ord("d"),): 1}


@pytest.fixture(autouse=True)
def close_pygame():
yield
pygame.quit()


def test_play_relevant_keys():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
assert game.relevant_keys == {97, 100}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No magic numbers please



def test_play_relevant_keys_no_mapping():
env = DummyPlayEnv()
env.spec = DummyEnvSpec("DummyPlayEnv")

with pytest.raises(AssertionError) as info:
PlayableGame(env)


def test_play_relevant_keys_with_env_attribute():
"""Env has a keys_to_action attribute"""
env = DummyPlayEnv()
env.get_keys_to_action = dummy_keys_to_action
game = PlayableGame(env)
assert game.relevant_keys == {97, 100}


def test_video_size_no_zoom():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
assert game.video_size == list(env.render().shape)


def test_video_size_zoom():
env = DummyPlayEnv()
zoom = 2.2
game = PlayableGame(env, dummy_keys_to_action(), zoom)
assert game.video_size == tuple(int(shape * zoom) for shape in env.render().shape)


def test_keyboard_quit_event():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
event = Event(pygame.KEYDOWN, {"key": 27})
assert game.running == True
game.process_event(event)
assert game.running == False


def test_pygame_quit_event():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
event = Event(pygame.QUIT)
assert game.running == True
game.process_event(event)
assert game.running == False


def test_keyboard_relevant_keydown_event():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
event = Event(pygame.KEYDOWN, {"key": RELEVANT_KEY})
game.process_event(event)
assert game.pressed_keys == [RELEVANT_KEY]


def test_keyboard_irrelevant_keydown_event():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
event = Event(pygame.KEYDOWN, {"key": IRRELEVANT_KEY})
game.process_event(event)
assert game.pressed_keys == []


def test_keyboard_keyup_event():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
event = Event(pygame.KEYDOWN, {"key": RELEVANT_KEY})
game.process_event(event)
event = Event(pygame.KEYUP, {"key": RELEVANT_KEY})
game.process_event(event)
assert game.pressed_keys == []


def test_play_loop():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do the same test with some actual environment? Say, CartPole, define the game, input a bunch of random actions in a loop just to see that it doesn't crash.

For a more "advanced" version, if possible, you could instantiate one PlayableGame of an env with a fixed seed, and one normal env with the same fixed seed. You input the same sequence of actions through the game interface, and through the regular env.step, and check that the outcome is the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, following what suggested by @pseudo-rnd-thoughts I'm adding a test with an actual environment. It's a little bit tricky to manage the Keydown and Keyup event with pygame but I'm on the way

env = DummyPlayEnv()
cumulative_env_reward = 0
for s in range(
len(callback_events)
): # we run the same number of steps executed with play()
_, rew, _, _ = env.step(None)
cumulative_env_reward += rew

env_play = DummyPlayEnv()
status = PlayStatus(callback)
play(env_play, callback=status.callback, keys_to_action=dummy_keys_to_action())

assert status.cumulative_reward == cumulative_env_reward