rl-ml-master

Category: Other
Development tool: MATLAB
File size: 63KB
Downloads: 9
Upload date: 2020-02-23 15:08:44
Uploader: said3988
Description: Q-LEARNING FOR WORLD GRID NAVIGATION. This project implements the Q-learning algorithm for world grid navigation.

File list:
images (0, 2018-01-29)
images\Figure1.png (39115, 2018-01-29)
images\Figure2.png (9266, 2018-01-29)
images\Figure3.png (14927, 2018-01-29)
source (0, 2018-01-29)
source\QLearnTrial.m (713, 2018-01-29)
source\QLearnTrials.m (835, 2018-01-29)
source\RL_main.m (559, 2018-01-29)
source\Task1.m (1438, 2018-01-29)
source\calculateOptimalPolicy.m (154, 2018-01-29)
source\decay.m (468, 2018-01-29)
source\epsilonGreedy.m (802, 2018-01-29)
source\plot_grid.m (860, 2018-01-29)
source\task1.mat (586, 2018-01-29)
source\transition.m (576, 2018-01-29)
source\walkOptimalPolicy.m (490, 2018-01-29)

# Q-LEARNING FOR WORLD GRID NAVIGATION

This project implements the Q-learning algorithm for world grid navigation. MATLAB has been used for the implementation.

## Problem Statement

Suppose that a robot is to traverse a 10 x 10 grid, with the start state being the top-left cell and the goal state being the bottom-right cell, as illustrated in Figure 1.
Fig. 1. Illustration of a 10 x 10 world grid with start state and goal state. The index of each cell follows the MATLAB column-wise convention.
The robot is to reach the goal state by maximizing the total reward of the trip. Note that the numbers (from 1 to 100) assigned to the individual cells represent the states; they do not represent the reward for the cells. At a state, the robot can take one of four actions (as shown in Figure 2) to move up (a = 1), right (a = 2), down (a = 3), or left (a = 4), into the corresponding adjacent state deterministically.
Fig. 2. Possible actions of the robot at a given state.
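Because the states follow the MATLAB column-wise convention, each action corresponds to a fixed change of the (row, column) subscripts. The sketch below shows one way such a deterministic transition could be written; the function name and signature are illustrative and not necessarily identical to transition.m in this package.

```matlab
% Minimal sketch of a deterministic transition on the 10 x 10 grid,
% assuming column-wise state indexing (state 1 top-left, 100 bottom-right)
% and the action coding a = 1 (up), 2 (right), 3 (down), 4 (left).
function sNext = gridTransition(s, a)
    n = 10;                          % grid is 10 x 10
    [row, col] = ind2sub([n n], s);  % column-wise index -> (row, col)
    switch a
        case 1, row = row - 1;       % up
        case 2, col = col + 1;       % right
        case 3, row = row + 1;       % down
        case 4, col = col - 1;       % left
    end
    if row < 1 || row > n || col < 1 || col > n
        sNext = -1;                  % impossible move (reward -1 in task1.mat)
    else
        sNext = sub2ind([n n], row, col);
    end
end
```

For example, taking action a = 3 (down) at state s = 1 gives state s = 2, consistent with the column-wise numbering in Figure 1.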
The learning process will consist of a series of trials. In each trial the robot starts at the initial state (s = 1) and makes transitions, according to the Q-learning algorithm with ε-greedy exploration, until it reaches the goal state (s = 100), upon which the trial ends. The above process repeats until the values of the Q-function converge to the optimal values. An optimal policy can then be obtained.

## Implementation

### Task 1

Write a MATLAB (M-file) program to implement the Q-learning algorithm, using the reward function given in task1.mat and the ε-greedy exploration algorithm, with εk, αk and γ set as specified in Table 1. The file task1.mat is included in the zip file that also contains this document. It can be directly loaded into MATLAB and contains the matrix variable reward (dimension: 100 x 4), in which each column corresponds to an action and each row to a state. For example, the reward for taking action a = 3 at state s = 1 to enter state s = 2 is given by the (1, 3) entry of reward, i.e., ρ(1, 3, 2) = reward(1, 3). Transitions that are not possible are marked by a reward of -1.
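The following is a minimal sketch of one ε-greedy Q-learning trial under the reward-matrix convention just described. The function and variable names are illustrative and not necessarily those used in QLearnTrial.m; the εk and αk schedules of Table 1 are passed in as function handles, and gridTransition refers to the transition sketch above.

```matlab
% One Q-learning trial with epsilon-greedy exploration (illustrative
% sketch; not necessarily identical to QLearnTrial.m in this package).
% reward is the 100 x 4 matrix from task1.mat, Q is the 100 x 4 Q-table,
% epsFn/alphaFn map the step counter k to epsilon_k and alpha_k.
function Q = runTrial(Q, reward, gamma, epsFn, alphaFn)
    s = 1;  k = 0;                          % start state, step counter
    while s ~= 100                          % goal state is s = 100
        k = k + 1;
        valid = find(reward(s, :) ~= -1);   % actions that stay on the grid
        if rand < epsFn(k)
            a = valid(randi(numel(valid))); % explore: random valid action
        else
            [~, idx] = max(Q(s, valid));    % exploit: greedy valid action
            a = valid(idx);
        end
        sNext = gridTransition(s, a);       % see the transition sketch above
        % Standard Q-learning update.
        Q(s, a) = Q(s, a) + alphaFn(k) * ...
            (reward(s, a) + gamma * max(Q(sNext, :)) - Q(s, a));
        s = sNext;
    end
end
```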
#### Table 1

| εk, αk | Training accuracy (γ = 0.5) | Training accuracy (γ = 0.9) | Test accuracy (γ = 0.5) | Test accuracy (γ = 0.9) |
| --- | --- | --- | --- | --- |
| 1 / k | 0 | 0 | - | - |
| 100 / (100 + k) | 0 | 10 | - | 1.1368 |
| (1 + log(k)) / k | 0 | 0 | - | - |
| (1 + 5 log(k)) / k | 0 | 10 | - | 3.4976 |
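The four εk, αk schedules in Table 1 can be written as anonymous functions of the step counter k and plugged into the trial sketch above. This is only a sketch of how such schedules might be expressed; decay.m in this package may organise them differently, and the trial count below is an assumption.

```matlab
% The epsilon_k / alpha_k schedules of Table 1 as anonymous functions of k.
schedules = { ...
    @(k) 1 ./ k, ...
    @(k) 100 ./ (100 + k), ...
    @(k) (1 + log(k)) ./ k, ...
    @(k) (1 + 5*log(k)) ./ k };

% Example usage with the second schedule and gamma = 0.9:
% load('task1.mat');                 % provides the 100 x 4 reward matrix
% Q = zeros(100, 4);
% for trial = 1:3000                 % trial count is an assumption
%     Q = runTrial(Q, reward, 0.9, schedules{2}, schedules{2});
% end
```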
The plot below shows the trajectory and reward.
Fig. 3. Trajectory and reward.
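The trajectory in Fig. 3 is obtained by following the learned Q-table greedily from the start state to the goal. A minimal sketch of that step is given below; the names are illustrative and may differ from calculateOptimalPolicy.m / walkOptimalPolicy.m, and Q is assumed to be the Q-table produced by the trial sketch above.

```matlab
% Extract the greedy policy from the learned Q-table and walk it from
% state 1 to state 100 (illustrative sketch).
[~, policy] = max(Q, [], 2);                % greedy action for each state
s = 1;  trajectory = s;
while s ~= 100 && numel(trajectory) < 100
    sNext = gridTransition(s, policy(s));   % see the transition sketch
    if sNext == -1, break; end              % greedy action leaves the grid
    s = sNext;
    trajectory(end+1, 1) = s;               %#ok<AGROW> column vector of states
end
```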
### Task 2

Write a MATLAB (M-file) program to implement Q-learning using your own values of the relevant parameters. Assume that the grid size is 10 x 10 and implement your program in a MATLAB M-file. This M-file will be used to find the optimal policy using a reward function not provided to the students, as part of the assessment scheme discussed in Section V.

#### Solution

For this task the following design parameters are chosen for optimality:

1. The discount γ is 0.9, as I found in Task 1 that a value of 0.9 yields 100% successful runs.
2. The learning rate α is a constant 1.
3. The exploration rate ε is 100 / (100 + k).

RL_main.m can be executed to run the main program with the unknown reward. The output is generated in a column vector named qevalstates. It is assumed that the variable qevalreward already exists in the MATLAB workspace; if it is not found, the script will exit. The code was implemented and run in MATLAB R2016b (9.1.0.441655).
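A minimal sketch of such a Task 2 entry point is shown below, assuming qevalreward is a 100 x 4 matrix already in the workspace. It reuses the runTrial sketch from Task 1; the trial count is an assumption, and the actual RL_main.m may be structured differently.

```matlab
% Guard for the Task 2 evaluation setup: the reward matrix qevalreward
% must already be in the workspace, otherwise the script stops.
if ~exist('qevalreward', 'var')
    error('Variable qevalreward not found in the workspace; exiting.');
end

gamma   = 0.9;                    % discount chosen in the solution above
alphaFn = @(k) 1;                 % constant learning rate alpha = 1
epsFn   = @(k) 100 ./ (100 + k);  % exploration rate 100 / (100 + k)

Q = zeros(100, 4);
for trial = 1:3000                % trial count is an assumption
    Q = runTrial(Q, qevalreward, gamma, epsFn, alphaFn);
end

% qevalstates: column vector of states visited by the greedy policy
% (see the policy-walk sketch above), starting at 1 and ending at 100.
```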
