Python Machine Learning - Code Examples


## Chapter 18: Reinforcement Learning for Decision Making in Complex Environments


### Chapter Outline

- Introduction: learning from experience
  - Understanding reinforcement learning
  - Defining the agent-environment interface of a reinforcement learning system
  - The theoretical foundations of RL
    - Markov decision processes
    - The mathematical formulation of Markov decision processes
    - Visualization of a Markov process
    - Episodic versus continuing tasks
  - RL terminology: return, policy, and value function
    - The return
    - Policy
    - Value function
  - Dynamic programming using the Bellman equation
- Reinforcement learning algorithms
  - Dynamic programming
    - Policy evaluation – predicting the value function with dynamic programming
    - Improving the policy using the estimated value function
    - Policy iteration
    - Value iteration
  - Reinforcement learning with Monte Carlo
    - State-value function estimation using MC
    - Action-value function estimation using MC
    - Finding an optimal policy using MC control
    - Policy improvement – computing the greedy policy from the action-value function
  - Temporal difference learning
    - TD prediction
    - On-policy TD control (SARSA)
    - Off-policy TD control (Q-learning)
- Implementing our first RL algorithm
  - Introducing the OpenAI Gym toolkit
    - Working with the existing environments in OpenAI Gym
  - A grid world example
    - Implementing the grid world environment in OpenAI Gym
  - Solving the grid world problem with Q-learning
    - Implementing the Q-learning algorithm (see the minimal sketch after this outline)
- A glance at deep Q-learning
  - Training a DQN model according to the Q-learning algorithm
    - Replay memory
    - Determining the target values for computing the loss
  - Implementing a deep Q-learning algorithm
- Chapter and book summary
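
As a small taste of what the chapter builds up to, here is a minimal tabular Q-learning sketch. Note that this is an illustrative sketch only: the environment name (`FrozenLake-v0`), the classic pre-0.26 Gym `reset`/`step` API, and the hyperparameter values are assumptions on my part; the chapter itself implements its own grid world environment and a more complete learner.

    import gym
    import numpy as np

    env = gym.make('FrozenLake-v0')  # assumed example environment; classic Gym API
    n_states = env.observation_space.n
    n_actions = env.action_space.n
    q_table = np.zeros((n_states, n_actions))
    alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate

    for episode in range(1000):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection from the current Q-table
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))
            next_state, reward, done, _ = env.step(action)
            # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q_table[state, action] += alpha * (
                reward + gamma * np.max(q_table[next_state]) - q_table[state, action])
            state = next_state

The `max` over next-state action values in the update is what makes Q-learning off-policy: the target is computed from the greedy policy even though the agent behaves epsilon-greedily.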

### A note on using the code examples

The recommended way to interact with the code examples in this book is via Jupyter Notebook (the `.ipynb` files). Using Jupyter Notebook, you will be able to execute the code step by step and have all the resulting outputs (including plots and images) in one convenient document.

Setting up Jupyter Notebook is really easy: if you are using the Anaconda Python distribution, all you need to do to install Jupyter Notebook is execute the following command in your terminal:

    conda install jupyter notebook

Then you can launch Jupyter Notebook by executing

    jupyter notebook

A browser window will open, which you can then use to navigate to the target directory that contains the `.ipynb` file you wish to open.

**More installation and setup instructions can be found in the [README.md file of Chapter 1](../ch01/README.md)**.

**(Even if you decide not to install Jupyter Notebook, note that you can also view the notebook files on GitHub by simply clicking on them: [`ch18.ipynb`](ch18.ipynb))**

In addition to the code examples, I added a table of contents to each Jupyter notebook, as well as section headers that are consistent with the content of the book. Also, I included the original images and figures in the hope that these make it easier to navigate and work with the code interactively as you are reading the book.

When I was creating these notebooks, I was hoping to make your reading (and coding) experience as convenient as possible! However, if you prefer not to use Jupyter Notebook, I have also converted these notebooks to regular Python script files (`.py` files) that can be viewed and edited in any plain text editor.
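
If you ever want to regenerate such a script from a notebook yourself, Jupyter's `nbconvert` tool (installed alongside Jupyter Notebook) can do the conversion, for example:

    jupyter nbconvert --to script ch18.ipynb

This writes a `ch18.py` file next to the notebook.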