# Distance From Edge #

A computer-vision tool that uses Sobel edge detection and Euclidean distance computation to find the distance from a point to an edge. Built for choice [2] of the caMicroscope GSoC 2021 project, the Pathology Smartpens code challenge.

Code challenge: create a page or tool that performs edge detection on a given image and, given a point, returns the distance from that point to the nearest edge.

## Personal background ##

As background, I took the CSE 455 Computer Vision course at the University of Washington, where I implemented my own miniature computer-vision library. One of the filters I implemented was the Sobel operator for edge detection. For this challenge I added an extra function named `get_smallest_dist_from_edge` to the source, which you can find in `dist-from-edge/src/hw2/filter_image.c`. I added this extra `get_smallest_dist_from_edge` function on March 14, 2021.
# CSE 455 Homework 5 #

Welcome, it's time for neural networks! Needless to say, run

```
git pull
make clean
make
```

Then `./main test hw5` and/or `python tryhw5.py`.

## 1. Implementing neural networks ##

### 1.1 Activation functions ###

An important part of machine learning, be it linear classifiers or neural networks, is the activation function you use.

#### TO DO ####

Fill in `void activate_matrix(matrix m, ACTIVATION a)` to modify `m` to be `f(m)` applied elementwise, where the function `f` is given by what the activation `a` is. Remember that for our softmax activation we will take `e^x` for every element `x`, but then we have to normalize each element by the sum as well. Each row in the matrix is a separate data point, so we want to normalize over each data point separately.

### 1.2 Taking gradients ###

As we are calculating the partial derivatives for the weights of our model we have to remember to factor in the gradient from our activation function at every layer. We will have the derivative of the loss with respect to the output of a layer (after activation) and we need the derivative of the loss with respect to the input (before activation). To do that, we take each element in our delta (the partial derivative) and multiply it by the gradient of our activation function.

Normally, to calculate `g'(x)` we would need to remember what the input to the function, `x`, was. However, all of our activation functions have the nice property that we can calculate the gradient `g'(x)` using only the output `g(x)`. This means we have to remember less when running our model.

The gradient of a linear activation is just 1 everywhere.
The gradient of our softmax will also be 1 everywhere, because we will only use the softmax as our output with a cross-entropy loss function; for an explanation of why, see [here](https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) or [here](https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/#softmax-and-cross-entropy-loss).

The gradient of our logistic function is discussed [here](https://en.wikipedia.org/wiki/Logistic_function#Derivative):

    g'(x) = g(x) * (1 - g(x))

As discussed in the backpropagation slides, we just multiply our partial derivative by `g'(x)` to backpropagate through an activation function.

#### TO DO ####

Fill in `void gradient_matrix(matrix m, ACTIVATION a, matrix d)` to multiply the elements of `d` by the correct gradient information, where `m` is the output of a layer that uses `a` as its activation function.

### 1.3 Forward propagation ###

Now we can fill in how to forward-propagate information through a layer. First check out our layer struct:

```
typedef struct {
    matrix in;              // Saved input to a layer
    matrix w;               // Current weights for a layer
    matrix dw;              // Current weight updates
    matrix v;               // Past weight updates (for use with momentum)
    matrix out;             // Saved output from the layer
    ACTIVATION activation;  // Activation the layer uses
} layer;
```

During forward propagation we will do a few things. We'll multiply the input matrix by our weight matrix `w`. We'll apply the activation function `activation`. We'll also save the input and output matrices in the layer so that we can use them during backpropagation. But this is already done for you.

#### TO DO ####

Fill in the TODO in `matrix forward_layer(layer *l, matrix in)`. You will need the matrix multiplication function `matrix matrix_mult_matrix(matrix a, matrix b)` provided by our matrix library. Using matrix operations we can batch-process data.
Each row in the input `in` is a separate data point and each row in the returned matrix is the result of passing that data point through the layer.

### 1.4 Backward propagation ###

We have to backpropagate error through our layer. We will be given `dL/dy`, the derivative of the loss with respect to the output `y` of the layer. The output `y` is given by `y = g(xw)` where `x` is the input, `w` is our weights for that layer, and `g` is the activation function. What we want to calculate is `dL/dw`, so we can update our weights, and `dL/dx`, which is the backpropagated loss to the previous layer.

#### TO DO ####

Fill in `matrix backward_layer(layer *l, matrix delta)`.

#### 1.4.1 Gradient of activation function ####

First we need to calculate `dL/d(xw)` using the gradient of our activation function. Recall:

    dL/d(xw) = dL/dy * dy/d(xw)
             = dL/dy * dg(xw)/d(xw)
             = dL/dy * g'(xw)

Use your `void gradient_matrix(matrix m, ACTIVATION a, matrix d)` function to change `delta` from `dL/dy` to `dL/d(xw)`. Note that `xw` here is the result of the matrix multiplication of `x` times `w` (i.e. `l->out`). `dL/dy` is the `Δo`.

#### 1.4.2 Δw: Derivative of loss w.r.t. weights ####

Next we want to calculate the derivative with respect to our weights, `dL/dw`. Recall:

    dL/dw = dL/d(xw) * d(xw)/dw
          = dL/d(xw) * x

but remember from the slides, to make the matrix dimensions work out right we actually do the matrix operation `xt * dL/d(xw)` where `xt` is the transpose of the input matrix `x`.

In our layer we saved the input as `l->in`. Calculate `xt` using that and the matrix transpose function in our library, `matrix transpose_matrix(matrix m)`. Then calculate `dL/dw` and save it into `l->dw` (free the old one first to avoid leaking memory!). We'll use this later when updating our weights.

#### 1.4.3 Δo: Derivative of loss w.r.t. input ####

Next we want to calculate the derivative with respect to the input (i.e. the output of the previous layer) as well, `dL/dx`.
Recall:

    dL/dx = dL/d(xw) * d(xw)/dx
          = dL/d(xw) * w

again, we have to make the matrix dimensions line up, so it actually ends up being `dL/d(xw) * wt` where `wt` is the transpose of our weights `w`. Calculate `wt` and then calculate `dL/dx`. This is the matrix we will return.

### 1.5 Weight updates ###

After we've performed a round of forward and backward propagation we want to update our weights. Remember from class, with momentum and decay our weight update rule is:

    Δw_t = dL/dw_t - λw_t + mΔw_{t-1}
    w_{t+1} = w_t + α Δw_t

We'll be doing gradient ascent (the partial derivative component is positive) instead of descent, because we'll flip the sign of our partial derivative when we calculate it at the output. We saved `dL/dw_t` as `l->dw`, and by convention we'll use `l->v` to store the previous weight change `Δw_{t-1}`.

Calculate the current weight change as a weighted sum of `l->dw`, `l->w`, and `l->v`. The function `matrix axpy_matrix(double a, matrix x, matrix y)` will be useful here; it calculates the result of the operation `ax + y` where `a` is a scalar and `x` and `y` are matrices. Save the current weight change in `l->v` for the next round (remember to free the current `l->v` first). Finally, apply the weight change to your weights by adding a scaled amount of the change based on the learning rate.

#### TO DO ####

Fill in `void update_layer(layer *l, double rate, double momentum, double decay)`.

## 1.6 Read through the rest of the code ##

Check out the remainder of the functions, which string layers together to make a model, run the model forward and backward, update it, train it, etc. Notable highlights:

#### `layer make_layer(int input, int output, ACTIVATION activation)` ####

Makes a new layer that takes `input` number of inputs and produces `output` number of outputs. Note that we randomly initialize the weight vector, in this case with a uniform random distribution between `[-sqrt(2/input), sqrt(2/input)]`. The weights could be initialized with other random distributions, but this one works pretty well. This is one of those secret tricks you just have to know from experience!

#### `double accuracy_model(model m, data d)` ####

Will be useful to test
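Returning to the update rule from 1.5, here is a minimal sketch on a flat weight array. The real `update_layer` uses `axpy_matrix` on `l->w`, `l->dw`, and `l->v`; the `update_weights` name and flat layout are illustrative stand-ins:

```c
/* Sketch of the 1.5 update rule:
     dw_t    = dL/dw_t - decay * w_t + momentum * dw_{t-1}
     w_{t+1} = w_t + rate * dw_t
   This is gradient ascent: the sign of the partial derivative was
   already flipped when it was computed at the output. */
void update_weights(double *w, const double *dLdw, double *v, int n,
                    double rate, double momentum, double decay)
{
    for (int i = 0; i < n; ++i) {
        double dw = dLdw[i] - decay * w[i] + momentum * v[i];
        v[i] = dw;          /* remember this Δw_t for the next round */
        w[i] += rate * dw;  /* apply the scaled weight change */
    }
}
```

With decay set to 0 and momentum set to 0 this reduces to plain `w += rate * dL/dw`.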