Skip to content

Conversation

@gianlucadecola
Copy link
Contributor

Referring to #2729; to add test to gym.utils.play I have refactored the play() script and extracted a PlayableGame class to have more separation and easier testing.
Running headless the whole game is possible but it seems that pygame events are not available (https://www.pygame.org/wiki/HeadlessNoWindowsNeeded) so I prefer to mock them in tests and test the PlayableGame class

@pseudo-rnd-thoughts
Copy link
Contributor

pseudo-rnd-thoughts commented Apr 11, 2022

Thanks for making the PR, couple of things

Would you be able to add typing to the PlayableGame or pydocs for people using the class to be able to know what expected information is?

@pseudo-rnd-thoughts
Copy link
Contributor

Also, you don't seem to have any tests for the primary play function. Could you add that, please

@gianlucadecola
Copy link
Contributor Author

Added type hint 😃
As for testing the play() function, I'm quite unsure how to break the while loop since it does not seem possible to inject events in headless mode. One solution might be adding a testing parameter inside play() and tweaking a little bit the function but it seems like a sort of hacky solution to me 🤔

@pseudo-rnd-thoughts
Copy link
Contributor

I'm not experienced with pygame so this may not work
My idea would be to use a custom testing callback function with the play function that would add a new event action to the event queue that would be processed next by the game.
This could allow you to run the play function forward n time steps with a known input then compare the resulting game observation at n to a standard environment at n

@pseudo-rnd-thoughts
Copy link
Contributor

Looking at the blame of the play.py, there is a __main__ script at the end that appears to be at least 5 years old (42a42fd)
@jkterry1 Would it be ok to remove this __main__ script?

@gianlucadecola
Copy link
Contributor Author

I'm not experienced with pygame so this may not work My idea would be to use a custom testing callback function with the play function that would add a new event action to the event queue that would be processed next by the game. This could allow you to run the play function forward n time steps with a known input then compare the resulting game observation at n to a standard environment at n

Well, it actually worked pretty well 🎉 Really good hint

@pseudo-rnd-thoughts
Copy link
Contributor

Wow, I was half expecting that to not work, thanks for doing that

You probably have a reason but

  1. Why are you using dummy env rather than an actual gym environment, i.e. cartpole?
  2. Why are you using a dummy key event rather than an actual pygame event?

Extra test ideas
3. Could you extend the current test_play to be longer and assert the final observation are equal as checking that the reward hasn't changed at all (as the initial value is 0) doesn't seem to test the function properly to me
4. This may be too much, but we could run the play for all of the spec_list environments with random actions being generated in the callback (key down for the new action, key up for the old action)

Finally, if you think it is possible to add a test for PlayPlot? There doesn't seem much to test, maybe the data saved is correct, particularly for len(plot_names) > 1.
Plus we should add a doc string and type hinting for the class if possible

@gianlucadecola
Copy link
Contributor Author

Wow, I was half expecting that to not work, thanks for doing that

You probably have a reason but

1. Why are you using dummy env rather than an actual gym environment, i.e. cartpole?

I didn't like the idea of test being tied to a particular environment. It seems more generic to me to have the dummy just for testing purpose of the play function

2. Why are you using a dummy key event rather than an actual pygame event?

This can actually be removed and replaced with an actual pygame event, I will commit the fix

Extra test ideas 3. Could you extend the current test_play to be longer and assert the final observation are equal as checking that the reward hasn't changed at all (as the initial value is 0) doesn't seem to test the function properly to me 4. This may be too much, but we could run the play for all of the spec_list environments with random actions being generated in the callback (key down for the new action, key up for the old action)

Finally, if you think it is possible to add a test for PlayPlot? There doesn't seem much to test, maybe the data saved is correct, particularly for len(plot_names) > 1. Plus we should add a doc string and type hinting for the class if possible

Good point on .3;
.4 seems an overkill to me, maybe it could be done in another PR 🤔

I will try to look into PlayPlot, also 👍

@pseudo-rnd-thoughts
Copy link
Contributor

I understand your point about using an abstract implementation (a dummy environment) and actual implementation (i.e. cartpole) but I see two possible problems

  1. What if someone changes the actual implementation of an environment in the future then they would have to both know about this dummy environment and then update it
  2. It assumes that the dummy implementation is equivalent to an actual environment which it possible is but it is easier to check that if we just use an actual environment

Thanks for making those changes and looking into PlayPlot tests

@jkterry1
Copy link
Collaborator

to answer an above question, removing old code that doesn't do anything is fine

Copy link
Contributor

@RedTachyon RedTachyon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks pretty good, left a bunch of comments. Have you tried actually playing this with some environments? As a "full" test which can't really be easily automated, but it's helpful to do it as a human.



class PlayableGame:
def __init__(self, env: Env, keys_to_action: dict = None, zoom: float = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys_to_action: Optional[dict]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better if we can specify the key/value types on the dict (I guess values are arbitrary actions, but keys should be keypresses? how are they represented?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's Dict[Tuple[int], int] where the key is (ord(key),)

def get_keys_to_action(self):
# Control with left and right arrow keys.
return {(): 1, (276,): 0, (275,): 2, (275, 276): 1}

reference of the actions:

| Num | Observation | Value | Unit |
|-----|-------------------------------------------------------------|---------|------|
| 0 | Accelerate to the left | Inf | position (m) |
| 1 | Don't accelerate | Inf | position (m) |
| 2 | Accelerate to the right | Inf | position (m) |

gym/gym/utils/play.py

Lines 125 to 135 in 9a1d4b6

keys_to_action: dict: tuple(int) -> int or None
Mapping from keys pressed to action performed.
For example if pressed 'w' and space at the same time is supposed
to trigger action number 2 then key_to_action dict would look like this:
{
# ...
sorted(ord('w'), ord(' ')) -> 2
# ...
}
If None, default key_to_action mapping for that env is used, if provided.

self.pressed_keys = []
self.running = True

def _get_relevant_keys(self, keys_to_action: dict) -> set:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional[dict]

elif hasattr(self.env.unwrapped, "get_keys_to_action"):
keys_to_action = self.env.unwrapped.get_keys_to_action()
else:
assert False, (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly raise some Exception instead of a failing assert, you can either use one of the existing exceptions like AttributeError or something; or make a new one in this file to make it more descriptive.

+ " does not have explicit key to action mapping, "
+ "please specify one manually"
)
relevant_keys = set(sum(map(list, keys_to_action.keys()), []))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map is nice (functional programming is great), but in Python it's typically more readable to use a comprehension, so it'd be something like
set(sum(list(key) for key in keys_to_action), []))

But is this actually what we want this to do? I'm not sure what each key is meant to be (see previous comment about types), so take another look at the logic here. Intuition tells me that you might have wanted to just do set(keys_to_actions.keys()) or something like that (which might be equivalent to just set(keys_to_actions))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to explore it more in depth, this logic was already there in the old play(), it seems also to me that set(keys_to_actions.keys()) may be fine

return relevant_keys

def _get_video_size(self, zoom: float = None) -> Tuple[int, int]:
rendered = self.env.render(mode="rgb_array")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put like a TODO here so that we remember to update this when the render API change goes through?

if event.type == pygame.KEYDOWN:
if event.key in self.relevant_keys:
self.pressed_keys.append(event.key)
elif event.key == 27:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is 27? Ideally replace this with some constant that already might exist in pygame, or define it yourself, or at the very least add a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole _get_relevant_keys() was extracted "as is" from the old play(). Didn't want to change it before having all tests in place but this is safe enough to replace with a pygame constants 😅 (it should be the exit key)

self.screen = pygame.display.set_mode(self.video_size)


def display_arr(screen, arr, video_size, transpose):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add type hints here and to play()?

obs = env.reset()
else:
action = keys_to_action.get(tuple(sorted(pressed_keys)), 0)
action = keys_to_action.get(tuple(sorted(game.pressed_keys)), 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we always safely assume that 0 is a good default action? (if I'm understanding this correctly)
Maybe this should be part of the specification?

In particular this seems like it'd crash with any continuous-action environments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is actually the default action if no relevant keys are pressed; I have the same doubt regarding 0, I will explore it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RedTachyon About this I think we can safely say that, since we can not map a continuos-action environment to discrete keyboard presses it is ok to leave 0 as default because play() will crash before since there is no keys_to_action. (Maybe we could add an explicit message for continuos-action environment?)
An example of a keys to action mapping it is indeed this:

def get_keys_to_action(self):
# Control with left and right arrow keys.
return {(): 1, (276,): 0, (275,): 2, (275, 276): 1}

def test_play_relevant_keys():
env = DummyPlayEnv()
game = PlayableGame(env, dummy_keys_to_action())
assert game.relevant_keys == {97, 100}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No magic numbers please

assert game.pressed_keys == []


def test_play_loop():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do the same test with some actual environment? Say, CartPole, define the game, input a bunch of random actions in a loop just to see that it doesn't crash.

For a more "advanced" version, if possible, you could instantiate one PlayableGame of an env with a fixed seed, and one normal env with the same fixed seed. You input the same sequence of actions through the game interface, and through the regular env.step, and check that the outcome is the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, following what suggested by @pseudo-rnd-thoughts I'm adding a test with an actual environment. It's a little bit tricky to manage the Keydown and Keyup event with pygame but I'm on the way

@gianlucadecola
Copy link
Contributor Author

Overall looks pretty good, left a bunch of comments. Have you tried actually playing this with some environments? As a "full" test which can't really be easily automated, but it's helpful to do it as a human.

Hey! Thanks for the reply, yes I had some fun trying to balance the cartpole myself 😄 I'm going to look into the comments 👍

@gianlucadecola
Copy link
Contributor Author

gianlucadecola commented Apr 15, 2022

Pushed a bunch of updates; still need to assess some things like set(sum(list(key) for key in keys_to_action), [])) and explore this:

Can we always safely assume that 0 is a good default action? (if I'm understanding this correctly) Maybe this should be part of the specification?

In particular this seems like it'd crash with any continuous-action environments?

@gianlucadecola
Copy link
Contributor Author

@RedTachyon I think everything is in place now

@pseudo-rnd-thoughts
Copy link
Contributor

Thanks @gianlucadecola I think the play function changes are complete.
Do you want to add a PlayPlot docstring, type hinting and test in this PR or another?

@gianlucadecola
Copy link
Contributor Author

I think we can merge this and then I'll submit another PR assessing only the PlayPlot part as soon as I can manage to work on it 👍

@pseudo-rnd-thoughts
Copy link
Contributor

@jkterry1 Looks good to merge

@jkterry1 jkterry1 merged commit 36a7fe5 into openai:master Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants