Commit b9ef23a

update samples from Release-59 as a part of SDK release
1 parent 7e2c1ca commit b9ef23a

File tree

13 files changed: +1758 −1 lines


how-to-use-azureml/reinforcement-learning/README.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -35,6 +35,7 @@ Using these samples, you will learn how to do the following.
 | [cartpole_sc.ipynb](cartpole-on-single-compute/cartpole_sc.ipynb) | Notebook to train a Cartpole playing agent on an Azure Machine Learning Compute Cluster (single node) |
 | [pong_rllib.ipynb](atari-on-distributed-compute/pong_rllib.ipynb) | Notebook for distributed training of Pong agent using RLlib on multiple compute targets |
 | [minecraft.ipynb](minecraft-on-distributed-compute/minecraft.ipynb) | Notebook to train an agent to navigate through a lava maze in the Minecraft game |
+| [particle.ipynb](multiagent-particle-envs/particle.ipynb) | Notebook to train policies in a multiagent cooperative navigation scenario based on OpenAI's Particle environments |
 
 ## Prerequisites
```

Lines changed: 60 additions & 0 deletions

```dockerfile
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu18.04

# Install some basic utilities
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    sudo \
    cpio \
    git \
    bzip2 \
    libx11-6 \
    tmux \
    htop \
    gcc \
    xvfb \
    python-opengl \
    x11-xserver-utils \
    ffmpeg \
    mesa-utils \
    nano \
    vim \
    rsync \
    && rm -rf /var/lib/apt/lists/*

# Install Python 3.7
RUN conda install python==3.7

# Create a working directory
RUN mkdir /app
WORKDIR /app

# Install required pip packages
RUN pip install --upgrade pip setuptools && pip install --upgrade \
    pandas \
    matplotlib \
    psutil \
    numpy \
    scipy \
    gym \
    azureml-defaults \
    tensorboardX \
    tensorflow==1.15 \
    tensorflow-probability==0.8.0 \
    onnxruntime \
    tf2onnx \
    cloudpickle==1.2.0 \
    tabulate \
    dm_tree \
    lz4 \
    opencv-python \
    ray==0.8.3 \
    ray[rllib]==0.8.3 \
    ray[tune]==0.8.3

# Install particle
RUN git clone https://github.com/openai/multiagent-particle-envs.git
COPY patch_files/* multiagent-particle-envs/multiagent/
RUN cd multiagent-particle-envs && \
    pip install -e . && \
    pip install --upgrade pyglet==1.3.2
```
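Assuming this file is saved as `Dockerfile` next to the `patch_files/` directory that the `COPY` step expects, a local build and import smoke test might look like the sketch below; the image tag is illustrative, not part of the commit:

```shell
# Build the training image; patch_files/ must sit alongside the Dockerfile.
docker build -t particle-rllib:local .

# Smoke test: verify the pinned Ray 0.8.3 and the patched particle package import.
docker run --rm particle-rllib:local \
    python -c "import ray, multiagent; print(ray.__version__)"
```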
Lines changed: 70 additions & 0 deletions

```python
# MIT License

# Copyright (c) 2018 OpenAI

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import numpy as np
import gym


class MultiDiscrete(gym.Space):
    """
    - The multi-discrete action space consists of a series of discrete action spaces with different
      parameters
    - It can be adapted to both a Discrete action space or a continuous (Box) action space
    - It is useful to represent game controllers or keyboards where each key can be represented as
      a discrete action space
    - It is parametrized by passing an array of arrays containing [min, max] for each discrete action
      space where the discrete action space can take any integers from `min` to `max` (both inclusive)
    Note: A value of 0 always need to represent the NOOP action.
    e.g. Nintendo Game Controller
    - Can be conceptualized as 3 discrete action spaces:
        1) Arrow Keys: Discrete 5 - NOOP[0], UP[1], RIGHT[2], DOWN[3], LEFT[4] - params: min: 0, max: 4
        2) Button A: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
        3) Button B: Discrete 2 - NOOP[0], Pressed[1] - params: min: 0, max: 1
    - Can be initialized as
        MultiDiscrete([ [0,4], [0,1], [0,1] ])
    """
    def __init__(self, array_of_param_array):
        self.low = np.array([x[0] for x in array_of_param_array])
        self.high = np.array([x[1] for x in array_of_param_array])
        self.num_discrete_space = self.low.shape[0]

    def sample(self):
        """ Returns an array with one sample from each discrete action space """
        # For each row: round(random .* (max - min) + min, 0)
        # random_array = prng.np_random.rand(self.num_discrete_space)
        random_array = np.random.RandomState().rand(self.num_discrete_space)
        return [int(x) for x in np.floor(np.multiply((self.high - self.low + 1.), random_array) + self.low)]

    def contains(self, x):
        return len(x) == self.num_discrete_space \
            and (np.array(x) >= self.low).all() \
            and (np.array(x) <= self.high).all()

    @property
    def shape(self):
        return self.num_discrete_space

    def __repr__(self):
        return "MultiDiscrete" + str(self.num_discrete_space)

    def __eq__(self, other):
        return np.array_equal(self.low, other.low) and np.array_equal(self.high, other.high)
```
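As a quick sanity check, the sampling logic in `sample()` can be exercised standalone. The sketch below uses numpy only (no gym dependency) and reuses the [min, max] bounds from the Nintendo-controller example in the class docstring, with a fixed seed for reproducibility:

```python
import numpy as np

# Bounds from the docstring example: arrow keys [0, 4], buttons A and B [0, 1].
low = np.array([0, 0, 0])
high = np.array([4, 1, 1])

# Same formula as MultiDiscrete.sample(): floor(random * (max - min + 1) + min),
# which yields an integer in [min, max] inclusive for each component.
rng = np.random.RandomState(0)
sample = [int(x) for x in np.floor((high - low + 1.0) * rng.rand(3) + low)]
print(sample)  # [2, 1, 1] with this seed

# Every component falls inside its [min, max] range, matching contains().
assert all(int(lo) <= s <= int(hi) for lo, s, hi in zip(low, sample, high))
```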

0 commit comments
