DPOC-Project

Category: Autonomous Driving
Development tool: MATLAB
File size: 590KB
Downloads: 0
Upload date: 2022-04-02 10:49:48
Uploader: sh-1993
Description: Programming exercise solved for the course Dynamic Programming and Optimal Control, ETH Zurich, academic year 2019/2020. It is extended with RL algorithms.

File list (size in bytes, date):
ProgrammingExercise.pdf (558357, 2022-04-02)
src (0, 2022-04-02)
src\ComputeBaseStateIndex.m (1296, 2022-04-02)
src\ComputePickUpStateIndex.m (1200, 2022-04-02)
src\ComputeStageCosts.m (35276, 2022-04-02)
src\ComputeTerminalStateIndex.m (1354, 2022-04-02)
src\ComputeTransitionProbabilities.m (35933, 2022-04-02)
src\Double_Q_Learning.m (3475, 2022-04-02)
src\GenerateWorld.p (1103, 2022-04-02)
src\LinearProgramming.m (3140, 2022-04-02)
src\MakePlots.p (1682, 2022-04-02)
src\OldPandG (0, 2022-04-02)
src\OldPandG\ComputeStageCosts.m (34516, 2022-04-02)
src\OldPandG\ComputeTransitionProbabilities.m (35492, 2022-04-02)
src\PolicyIteration.m (4175, 2022-04-02)
src\Q_Learning.m (3148, 2022-04-02)
src\Q_Learning_UCB.m (3284, 2022-04-02)
src\SARSA.m (3082, 2022-04-02)
src\SARSA_UCB.m (3243, 2022-04-02)
src\SampleTrajMDP.m (2618, 2022-04-02)
src\ValueIteration.m (3880, 2022-04-02)
src\exampleG.mat (1338, 2022-04-02)
src\exampleP.mat (80282, 2022-04-02)
src\exampleWorld.mat (227, 2022-04-02)
src\main.m (14340, 2022-04-02)
src\plotOptimalSolution.m (8605, 2022-04-02)

# DPOC-Project

Programming exercise from the course Dynamic Programming and Optimal Control, ETH Zurich, academic year 2019/2020. Read ProgrammingExercise.pdf for a description of the problem and the scripts.

The aim of this programming exercise is to solve a stochastic shortest path problem using Value Iteration, Policy Iteration, and Linear Programming. The scripts coded by the student are:

1. ComputeTerminalStateIndex.m
2. ComputeTransitionProbabilities.m
3. ComputeStageCosts.m
4. PolicyIteration.m
5. ValueIteration.m
6. LinearProgramming.m

Run main.m to check the solution obtained (a minimal sketch of the value-iteration loop appears below).

# RL extension

I have extended the programming exercise by solving the same stochastic shortest path problem using Reinforcement Learning (RL) algorithms:

1. SARSA, with and without initialization from an expert
2. Q-Learning, with and without initialization from an expert
3. Double-Q-Learning, with and without initialization from an expert

When initialization is guided by the expert, trajectories are sampled using the optimal policy obtained through Dynamic Programming, and the Q-values of the visited state-action pairs are initialized to a higher value. The functions implementing SARSA, Q-Learning, and Double-Q-Learning can be found in SARSA.m, Q_Learning.m, and Double_Q_Learning.m, respectively; in these, an epsilon-greedy policy is used for exploration. SARSA and Q-Learning are also implemented with Upper Confidence Bound (UCB) exploration, in SARSA_UCB.m and Q_Learning_UCB.m, respectively. Illustrative sketches of these building blocks follow below.
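To make the DP part concrete, here is a minimal value-iteration sketch. It assumes the conventions suggested by exampleP.mat and exampleG.mat (P a K x K x L array of transition probabilities, G a K x L matrix of stage costs); the actual interface of ValueIteration.m may differ.

```matlab
% Minimal value-iteration sketch (assumed shapes: P is K x K x L, G is
% K x L; names are illustrative, not necessarily those of ValueIteration.m).
function [J_opt, u_opt] = ValueIterationSketch(P, G)
    [K, ~, L] = size(P);
    J = zeros(K, 1);                      % initial cost-to-go
    tol = 1e-6;
    while true
        Q = zeros(K, L);
        for l = 1:L
            % expected cost of applying input l in every state
            Q(:, l) = G(:, l) + P(:, :, l) * J;
        end
        [J_new, u_opt] = min(Q, [], 2);   % Bellman update and greedy policy
        if max(abs(J_new - J)) < tol
            J_opt = J_new;
            return;
        end
        J = J_new;
    end
end
```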
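The epsilon-greedy exploration used in the RL scripts can be sketched as a single Q-Learning episode. This assumes costs are minimized, as in the DP part; `step` is a hypothetical simulator handle introduced here for illustration, and the actual scripts may decay alpha and epsilon over episodes.

```matlab
% One-episode Q-Learning sketch with epsilon-greedy exploration
% (assumptions: costs are minimized; step(s, a) is a hypothetical simulator
% handle returning the next state and stage cost; alpha, epsilon, gamma fixed).
function Q = QLearningEpisodeSketch(Q, step, s0, sTerminal, alpha, epsilon, gamma, maxSteps)
    L = size(Q, 2);
    s = s0;
    for t = 1:maxSteps
        if rand < epsilon
            a = randi(L);                 % explore: uniformly random input
        else
            [~, a] = min(Q(s, :));        % exploit: greedy input (lowest cost)
        end
        [sNext, cost] = step(s, a);       % simulate one transition
        % off-policy target: greedy cost-to-go at the next state
        target = cost + gamma * min(Q(sNext, :));
        Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));
        s = sNext;
        if s == sTerminal, break; end     % episode ends at the terminal state
    end
end
```

SARSA differs only in the target, which uses the input actually selected at sNext rather than the greedy one.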
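For the UCB variants, action selection replaces the epsilon-greedy draw with a confidence bonus. The sketch below assumes a K x L visit-count matrix N and an exploration constant c (both names are illustrative); since the problem minimizes cost, the bonus is subtracted so rarely tried inputs look optimistically cheap.

```matlab
% UCB input-selection sketch (assumptions: N is a K x L matrix of visit
% counts, t the total number of steps so far, c an exploration constant).
function a = SelectInputUCB(Q, N, s, t, c)
    bonus = c * sqrt(log(t + 1) ./ max(N(s, :), 1));  % guard against N = 0
    [~, a] = min(Q(s, :) - bonus);        % optimistic lower-confidence cost
end
```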
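Finally, the expert initialization can be sketched as follows. SampleTrajMDP.m presumably plays the trajectory-sampling role, but its actual signature is not shown here, so `sampleTraj`, `nTraj`, and `q0` are assumed names for illustration.

```matlab
% Expert-initialization sketch (assumptions: u_opt is the DP-optimal policy
% as a K x 1 vector of inputs; sampleTraj(u_opt) returns the states of one
% trajectory under that policy; q0 is the value given to expert-visited pairs).
function Q = InitQFromExpert(Q, u_opt, sampleTraj, nTraj, q0)
    for i = 1:nTraj
        states = sampleTraj(u_opt);       % one trajectory under the expert policy
        for k = 1:numel(states)
            s = states(k);
            Q(s, u_opt(s)) = q0;          % bias Q toward the expert's choices
        end
    end
end
```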
