DPOC-Project
Category: Autonomous driving
Development tool: MATLAB
File size: 590KB
Downloads: 0
Upload date: 2022-04-02 10:49:48
Uploader: sh-1993
Description: Programming exercise solved for the course Dynamic Programming and Optimal Control, ETH Zurich, academic year 2019/2020. It is extended with Reinforcement Learning (RL) algorithms.
File list:
ProgrammingExercise.pdf (558357, 2022-04-02)
src (0, 2022-04-02)
src\ComputeBaseStateIndex.m (1296, 2022-04-02)
src\ComputePickUpStateIndex.m (1200, 2022-04-02)
src\ComputeStageCosts.m (35276, 2022-04-02)
src\ComputeTerminalStateIndex.m (1354, 2022-04-02)
src\ComputeTransitionProbabilities.m (35933, 2022-04-02)
src\Double_Q_Learning.m (3475, 2022-04-02)
src\GenerateWorld.p (1103, 2022-04-02)
src\LinearProgramming.m (3140, 2022-04-02)
src\MakePlots.p (1682, 2022-04-02)
src\OldPandG (0, 2022-04-02)
src\OldPandG\ComputeStageCosts.m (34516, 2022-04-02)
src\OldPandG\ComputeTransitionProbabilities.m (35492, 2022-04-02)
src\PolicyIteration.m (4175, 2022-04-02)
src\Q_Learning.m (3148, 2022-04-02)
src\Q_Learning_UCB.m (3284, 2022-04-02)
src\SARSA.m (3082, 2022-04-02)
src\SARSA_UCB.m (3243, 2022-04-02)
src\SampleTrajMDP.m (2618, 2022-04-02)
src\ValueIteration.m (3880, 2022-04-02)
src\exampleG.mat (1338, 2022-04-02)
src\exampleP.mat (80282, 2022-04-02)
src\exampleWorld.mat (227, 2022-04-02)
src\main.m (14340, 2022-04-02)
src\plotOptimalSolution.m (8605, 2022-04-02)
# DPOC-Project
Programming exercise from the course Dynamic Programming and Optimal Control, ETH Zurich, academic year 2019/2020. Read ProgrammingExercise.pdf for the problem statement and a description of the scripts. The aim of this programming exercise is to solve a stochastic shortest path problem using Value Iteration, Policy Iteration, and Linear Programming. The scripts coded by the student are:
1. ComputeTerminalStateIndex.m
2. ComputeTransitionProbabilities.m
3. ComputeStageCosts.m
4. PolicyIteration.m
5. ValueIteration.m
6. LinearProgramming.m
Run main.m to check the obtained solution.
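The core of the Value Iteration solver can be sketched as below. This is a minimal illustrative sketch, not the project's exact code: it assumes `P` is a K-by-K-by-L array of transition probabilities and `G` a K-by-L matrix of expected stage costs, as built by ComputeTransitionProbabilities.m and ComputeStageCosts.m; all variable names are assumptions.

```matlab
% Value Iteration for a stochastic shortest path problem (illustrative sketch).
% Assumes P(i,j,u) = probability of moving from state i to j under input u,
% and G(i,u) = expected stage cost of applying input u in state i.
K = size(G, 1);            % number of states
L = size(G, 2);            % number of control inputs
J = zeros(K, 1);           % cost-to-go estimate, initialized to zero
tol = 1e-8;                % convergence tolerance
while true
    Q = zeros(K, L);
    for u = 1:L
        % Bellman backup: expected stage cost plus expected cost-to-go
        Q(:, u) = G(:, u) + P(:, :, u) * J;
    end
    [J_new, u_opt] = min(Q, [], 2);   % greedy cost and policy per state
    if max(abs(J_new - J)) < tol
        break;
    end
    J = J_new;
end
% J approximates the optimal cost-to-go; u_opt the optimal policy.
```

Policy Iteration and Linear Programming operate on the same `P` and `G` structures but replace the fixed-point sweep with policy evaluation/improvement steps and an LP over the cost-to-go vector, respectively.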
# RL extension
I have extended the programming exercise by solving the same stochastic shortest path problem using Reinforcement Learning (RL) algorithms:
1. SARSA, with and without initialization from an expert
2. Q-Learning, with and without initialization from an expert
3. Double Q-Learning, with and without initialization from an expert
When initialization is guided by the expert, trajectories are sampled using the optimal policy obtained through Dynamic Programming, and the Q-values of the visited state-action pairs are initialized to a higher value. The functions implementing SARSA, Q-Learning, and Double Q-Learning can be found in SARSA.m, Q_Learning.m, and Double_Q_Learning.m, respectively; each uses an epsilon-greedy policy for exploration. SARSA and Q-Learning are also implemented with Upper Confidence Bound (UCB) exploration, in SARSA_UCB.m and Q_Learning_UCB.m, respectively.
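The tabular Q-Learning loop with epsilon-greedy exploration can be sketched as below. This is a hedged sketch under stated assumptions, not the project's exact implementation: the state count `K`, input count `L`, start state `i0`, and the helpers `isTerminal` and `step` (which samples the next state and stage cost from the MDP) are all illustrative names. Since the problem minimizes cost, the greedy action takes the minimum over Q-values, and the expert-guided variants would warm-start `Q` along sampled expert trajectories instead of starting from zeros.

```matlab
% Tabular Q-Learning with an epsilon-greedy policy (illustrative sketch).
% Assumed interface: [j, g] = step(i, u) samples the successor state j and
% stage cost g; isTerminal(i) tests for the terminal state.
alpha = 0.1;       % step size (assumed value)
epsilon = 0.1;     % exploration rate (assumed value)
Q = zeros(K, L);   % Q-values; expert-guided runs would warm-start visited pairs
i = i0;
while ~isTerminal(i)
    if rand < epsilon
        u = randi(L);              % explore: uniformly random input
    else
        [~, u] = min(Q(i, :));     % exploit: costs are minimized
    end
    [j, g] = step(i, u);           % sample transition from the MDP
    % Undiscounted SSP update toward the sampled Bellman target
    Q(i, u) = Q(i, u) + alpha * (g + min(Q(j, :)) - Q(i, u));
    i = j;
end
```

SARSA differs only in the target, using the Q-value of the action actually selected in state `j` rather than `min(Q(j, :))`; the UCB variants replace the epsilon-greedy choice with an exploration bonus based on visit counts.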