CNN-from-scratch
Category: GPU/Graphics cards
Development tool: Cuda
Upload date: 2023-02-22 23:01:16
Uploader: sh-1993
Description: A from-scratch trainer for a neural-network classifier of MNIST handwritten digits. Only C and CUDA are used.
File list (size in bytes, date):
cnn.cpp (11275, 2023-02-22)
cnn.hpp (655, 2023-02-22)
dnn.cpp (12702, 2023-02-22)
dnn.hpp (1066, 2023-02-22)
dot.cu (45701, 2023-02-22)
main.cpp (14203, 2023-02-22)
mnist/ (0, 2023-02-22)
mnist/t10k-images-idx3-ubyte (7840016, 2023-02-22)
mnist/t10k-labels-idx1-ubyte (10008, 2023-02-22)
mnist/train-images-idx3-ubyte (47040016, 2023-02-22)
mnist/train-labels-idx1-ubyte (60008, 2023-02-22)
tensor.cpp (1334, 2023-02-22)
tensor.hpp (4142, 2023-02-22)
# MNIST Trainer From Scratch on GPU
## About
This is a CNN trainer for the [MNIST handwritten digit dataset](http://yann.lecun.com/exdb/mnist/), written from scratch in C++ and CUDA.
## Tools for GPU implementation
- CUDA (https://developer.nvidia.com/cuda-toolkit)
- OpenACC (https://www.openacc.org/)
## Requirements
- nvcc (NVIDIA CUDA compiler)
- pgcc (PGI C compiler with OpenACC support)
## Feature
![image](https://user-images.githubusercontent.com/62000880/220344724-a0aed1dc-f98a-451c-af8a-497c7aca259d.png)
This program supports [NVIDIA Tensor Cores](https://www.nvidia.com/en-us/data-center/tensor-cores/).
Tensor Cores are hardware units specialized for matrix multiply-accumulate operations.
CUDA code can access Tensor Cores through the WMMA (warp matrix multiply-accumulate) API, as shown below.
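```C++
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// Inside a kernel: each warp (index wid) multiplies one pair of 16x16 half-precision tiles.
// Fragment shape (16x16x16) and row-major layouts are assumed here.
wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;
wmma::fill_fragment(c_frag, __float2half(0.f));                // C = 0
wmma::load_matrix_sync(a_frag, &a_half[wid*ELEMS_TILE], 16);   // load A tile, leading dimension 16
wmma::load_matrix_sync(b_frag, &b_half[wid*ELEMS_TILE], 16);   // load B tile, leading dimension 16
wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);                // C = A * B + C on Tensor Cores
wmma::store_matrix_sync(&c_half[wid*ELEMS_TILE], c_frag, 16, wmma::mem_row_major);
```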
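A Tensor Core operation works on small matrices of a fixed size (with WMMA, for example, 16×16×16 tiles in half precision).
Larger matrix multiplications are built by splitting each matrix into such small 'tiles', multiplying tile pairs on the Tensor Cores, and accumulating the partial products into the output tiles.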
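The tiling idea can be sketched on the CPU in plain C++. In this sketch, `mma_tile` is a hypothetical stand-in for one Tensor Core operation (on the GPU it would correspond to the `load_matrix_sync` / `mma_sync` / `store_matrix_sync` sequence in the WMMA snippet). The names `tiled_matmul` and `TILE = 16`, the `float` storage, and the row-major layout are assumptions for illustration, not code from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile edge, chosen to match a 16x16x16 WMMA shape (assumption for illustration).
constexpr std::size_t TILE = 16;

// Hypothetical stand-in for one Tensor Core op: C_tile += A_tile * B_tile.
// Each tile is TILE x TILE, addressed inside row-major matrices whose
// leading dimensions (row strides) are lda, ldb and ldc.
void mma_tile(const float* a, std::size_t lda,
              const float* b, std::size_t ldb,
              float* c, std::size_t ldc) {
    for (std::size_t i = 0; i < TILE; ++i)
        for (std::size_t j = 0; j < TILE; ++j) {
            float acc = c[i * ldc + j];
            for (std::size_t k = 0; k < TILE; ++k)
                acc += a[i * lda + k] * b[k * ldb + j];
            c[i * ldc + j] = acc;
        }
}

// C (M x N) = A (M x K) * B (K x N), all row-major.
// M, N and K must be multiples of TILE (pad the matrices otherwise).
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K) {
    assert(M % TILE == 0 && N % TILE == 0 && K % TILE == 0);
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t ti = 0; ti < M; ti += TILE)          // output tile row
        for (std::size_t tj = 0; tj < N; tj += TILE)      // output tile column
            for (std::size_t tk = 0; tk < K; tk += TILE)  // accumulate along K
                mma_tile(&A[ti * K + tk], K,
                         &B[tk * N + tj], N,
                         &C[ti * N + tj], N);
    return C;
}
```

On a GPU, each `(ti, tj)` output tile would typically be assigned to one warp, and the `tk` loop would stay inside that warp so the accumulator fragment is reused across the whole K dimension before a single store.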