[RL Classic Textbook] and [MATLAB Source Code] - 联合开发网 (Pudn.com)


Category: Other
Development tool: matlab
File size: 15634KB
Downloads: 0
Upload date: 2020-12-08 15:55:06
Uploader: 周冬雨的小号

Description: The classic textbook Reinforcement Learning: An Introduction, bundled with source code. The listing is labeled MATLAB, but the files listed below are Python scripts. Recommended download.

File list:

[经典教材]Reinforcement+Learning：An+Introduction.pdf (12613382, 2020-10-10)
reinforcement-learning-an-introduction (0, 2020-10-21)
reinforcement-learning-an-introduction\.travis.yml (148, 2020-07-06)
reinforcement-learning-an-introduction\chapter01 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter01\.idea (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter01\.idea\chapter01.iml (464, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\.idea\inspectionProfiles (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter01\.idea\inspectionProfiles\profiles_settings.xml (174, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\.idea\misc.xml (314, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\.idea\modules.xml (277, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\.idea\workspace.xml (1796, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\policy_first.bin (216274, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\policy_second.bin (214104, 2020-10-15)
reinforcement-learning-an-introduction\chapter01\tic_tac_toe.py (11069, 2020-07-06)
reinforcement-learning-an-introduction\chapter02 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter02\ten_armed_testbed.py (9070, 2020-07-06)
reinforcement-learning-an-introduction\chapter03 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter03\grid_world.py (6312, 2020-07-06)
reinforcement-learning-an-introduction\chapter04 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter04\car_rental.py (7647, 2020-10-15)
reinforcement-learning-an-introduction\chapter04\car_rental_synchronous.py (8747, 2020-07-06)
reinforcement-learning-an-introduction\chapter04\gamblers_problem.py (2677, 2020-07-06)
reinforcement-learning-an-introduction\chapter04\grid_world.py (3331, 2020-07-06)
reinforcement-learning-an-introduction\chapter05 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter05\blackjack.py (13344, 2020-07-06)
reinforcement-learning-an-introduction\chapter05\infinite_variance.py (1814, 2020-07-06)
reinforcement-learning-an-introduction\chapter06 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter06\cliff_walking.py (9355, 2020-07-06)
reinforcement-learning-an-introduction\chapter06\maximization_bias.py (4269, 2020-07-06)
reinforcement-learning-an-introduction\chapter06\random_walk.py (6574, 2020-07-06)
reinforcement-learning-an-introduction\chapter06\windy_grid_world.py (4018, 2020-07-06)
reinforcement-learning-an-introduction\chapter07 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter07\random_walk.py (4222, 2020-07-06)
reinforcement-learning-an-introduction\chapter08 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter08\expectation_vs_sample.py (1627, 2020-07-06)
reinforcement-learning-an-introduction\chapter08\maze.py (23222, 2020-07-06)
reinforcement-learning-an-introduction\chapter08\trajectory_sampling.py (4892, 2020-07-06)
reinforcement-learning-an-introduction\chapter09 (0, 2020-10-21)
reinforcement-learning-an-introduction\chapter09\random_walk.py (15941, 2020-07-06)
... ...

# Reinforcement Learning: An Introduction

[![Build Status](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction.svg?branch=master)](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction)

Python replication for Sutton & Barto's book [*Reinforcement Learning: An Introduction (2nd Edition)*](http://incompleteideas.net/book/the-book-2nd.html)

> If you have any confusion about the code or want to report a bug, please open an issue instead of emailing me directly, and unfortunately I do not have exercise answers for the book.

# Contents

### Chapter 1

1. Tic-Tac-Toe

### Chapter 2

1. [Figure 2.1: An exemplary bandit problem from the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_1.png)
2. [Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_2.png)
3. [Figure 2.3: Optimistic initial action-value estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_3.png)
4. [Figure 2.4: Average performance of UCB action selection on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_4.png)
5. [Figure 2.5: Average performance of the gradient bandit algorithm](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_5.png)
6. [Figure 2.6: A parameter study of the various bandit algorithms](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_6.png)

### Chapter 3

1. [Figure 3.2: Grid example with random policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_2.png)
2. [Figure 3.5: Optimal solutions to the gridworld example](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_5.png)

### Chapter 4

1. [Figure 4.1: Convergence of iterative policy evaluation on a small gridworld](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_1.png)
2. [Figure 4.2: Jack’s car rental problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_2.png)
3. [Figure 4.3: The solution to the gambler’s problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_3.png)

### Chapter 5

1. [Figure 5.1: Approximate state-value functions for the blackjack policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_1.png)
2. [Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_2.png)
3. [Figure 5.3: Weighted importance sampling](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_3.png)
4. [Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_4.png)

### Chapter 6

1. [Example 6.2: Random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_6_2.png)
2. [Figure 6.2: Batch updating](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_2.png)
3. [Figure 6.3: Sarsa applied to windy grid world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_3.png)
4. [Figure 6.4: The cliff-walking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_4.png)
5. [Figure 6.6: Interim and asymptotic performance of TD control methods](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_6.png)
6. [Figure 6.7: Comparison of Q-learning and Double Q-learning](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_7.png)

### Chapter 7

1. [Figure 7.2: Performance of n-step TD methods on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_7_2.png)

### Chapter 8

1. [Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_2.png)
2. [Figure 8.4: Average performance of Dyna agents on a blocking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_4.png)
3. [Figure 8.5: Average performance of Dyna agents on a shortcut task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_5.png)
4. [Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_8_4.png)
5. [Figure 8.7: Comparison of efficiency of expected and sample updates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_7.png)
6. [Figure 8.8: Relative efficiency of different update distributions](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_8.png)

### Chapter 9

1. [Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_1.png)
2. [Figure 9.2: Semi-gradient n-steps TD algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_2.png)
3. [Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_5.png)
4. [Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_8.png)
5. [Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_10.png)

### Chapter 10

1. [Figure 10.1: The cost-to-go function for Mountain Car task in one run](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_1.png)
2. [Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_2.png)
3. [Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_3.png)
4. [Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_4.png)
5. [Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_5.png)

### Chapter 11

1. [Figure 11.2: Baird's Counterexample](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_11_2.png)
2. [Figure 11.6: The behavior of the TDC algorithm on Baird’s counterexample](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_11_6.png)
3. [Figure 11.7: The behavior of the ETD algorithm in expectation on Baird’s counterexample](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_11_7.png)

### Chapter 12

1. [Figure 12.3: Off-line λ-return algorithm on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_3.png)
2. [Figure 12.6: TD(λ) algorithm on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_6.png)
3. [Figure 12.8: True online TD(λ) algorithm on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_8.png)
4. [Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_10.png)
5. [Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_12_11.png)

### Chapter 13

1. [Example 13.1: Short corridor with switched actions](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_13_1.png)
2. [Figure 13.1: REINFORCE on the short-corridor grid world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_13_1.png)
3. [Figure 13.2: REINFORCE with baseline on the short-corridor grid-world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_13_2.png)

# Environment

* python 3.6
* numpy
* matplotlib
* [seaborn](https://seaborn.pydata.org/index.html)
* [tqdm](https://pypi.org/project/tqdm/)

# Usage

> All files are self-contained

```commandline
python any_file_you_want.py
```

# Contribution

If you want to contribute some missing examples or fix some bugs, feel free to open an issue or make a pull request.

Following are missing figures/examples:

* Figure 12.14: The effect of λ
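
# Illustrative sketch

To give a feel for the kind of experiment the Chapter 2 entries refer to (epsilon-greedy action-value methods on the 10-armed testbed), here is a minimal sketch using only the standard library. It is not the code from `ten_armed_testbed.py`; the function names `run_bandit` and `average_reward`, the default run counts, and the seed are assumptions made for illustration.

```python
import random


def run_bandit(epsilon, steps=1000, arms=10, rng=None):
    """Run one epsilon-greedy agent on a freshly sampled k-armed bandit.

    Hypothetical helper for illustration; returns the reward at each step.
    """
    if rng is None:
        rng = random.Random()
    # True action values are drawn from a unit normal distribution.
    q_true = [rng.gauss(0.0, 1.0) for _ in range(arms)]
    q_est = [0.0] * arms  # sample-average estimates of each arm's value
    counts = [0] * arms   # how many times each arm has been pulled
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(arms)  # explore: uniformly random arm
        else:
            best = max(q_est)
            # Exploit: pick a greedy arm, breaking ties at random.
            action = rng.choice([a for a in range(arms) if q_est[a] == best])
        reward = rng.gauss(q_true[action], 1.0)  # noisy reward around the true value
        counts[action] += 1
        # Incremental form of the sample-average update.
        q_est[action] += (reward - q_est[action]) / counts[action]
        rewards.append(reward)
    return rewards


def average_reward(epsilon, runs=200, steps=1000, seed=0):
    """Average the per-step reward over many independent bandit problems."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        for t, r in enumerate(run_bandit(epsilon, steps, rng=rng)):
            totals[t] += r
    return [total / runs for total in totals]


if __name__ == "__main__":
    for eps in (0.0, 0.01, 0.1):
        curve = average_reward(eps)
        late = sum(curve[-100:]) / 100
        print(f"epsilon={eps}: mean reward over last 100 steps = {late:.3f}")
```

Running the file prints one summary line per epsilon value. The scripts in the archive additionally depend on numpy and matplotlib, as listed under Environment.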
