CNN-from-scratch
GPU cuda 

Category: GPU/Graphics cards
Development tool: CUDA
File size: 0KB
Downloads: 0
Upload date: 2023-02-22 23:01:16
Uploader: sh-1993
Description: A from-scratch trainer for a neural network classifier for MNIST handwritten digits. Only C and CUDA are used.

File list:
cnn.cpp (11275, 2023-02-22)
cnn.hpp (655, 2023-02-22)
dnn.cpp (12702, 2023-02-22)
dnn.hpp (1066, 2023-02-22)
dot.cu (45701, 2023-02-22)
main.cpp (14203, 2023-02-22)
mnist/ (0, 2023-02-22)
mnist/t10k-images-idx3-ubyte (7840016, 2023-02-22)
mnist/t10k-labels-idx1-ubyte (10008, 2023-02-22)
mnist/train-images-idx3-ubyte (47040016, 2023-02-22)
mnist/train-labels-idx1-ubyte (60008, 2023-02-22)
tensor.cpp (1334, 2023-02-22)
tensor.hpp (4142, 2023-02-22)

# MNIST Trainer From Scratch on GPU

## About
This is a CNN trainer for the [MNIST handwritten digit dataset](http://yann.lecun.com/exdb/mnist/), written from scratch in C++ and CUDA.

## Tools for GPU implementation
- CUDA (https://developer.nvidia.com/cuda-toolkit)
- OpenACC (https://www.openacc.org/)

## Requirements
- nvcc
- pgcc

## Feature
![image](https://user-images.githubusercontent.com/62000880/220344724-a0aed1dc-f98a-451c-af8a-497c7aca259d.png)

This program supports [NVIDIA Tensor Core](https://www.nvidia.com/en-us/data-center/tensor-cores/). A Tensor Core is an arithmetic circuit specialized for matrix multiplication. CUDA can access Tensor Cores through the WMMA API, as in the snippet below.

```C++
// Fragment template arguments are assumptions: 16x16x16 half-precision tiles, row-major A and B.
wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;

// Zero the accumulator, then load one tile of A and one tile of B (wid is presumably the warp index).
wmma::fill_fragment(c_frag, __float2half(0.f));
wmma::load_matrix_sync(a_frag, &a_half[wid*ELEMS_TILE], 16);
wmma::load_matrix_sync(b_frag, &b_half[wid*ELEMS_TILE], 16);

// Compute c_frag = a_frag * b_frag + c_frag on the Tensor Cores, then write the tile back.
wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
wmma::store_matrix_sync(&c_half[wid*ELEMS_TILE], c_frag, 16, wmma::mem_row_major);
```

A Tensor Core multiplies matrices of a fixed size. Larger matrix multiplications are built by splitting each matrix into small 'tiles' and feeding the tiles to the Tensor Cores.
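
The tiling idea can be sketched on the CPU in plain C++. This is a minimal illustration, not code from this repository: `TILE`, `mma_tile` and `tiled_matmul` are hypothetical names, the data is `float` rather than `half`, and the matrix dimensions are assumed to be multiples of 16. `mma_tile` stands in for the step that a single `wmma::mma_sync` call would perform on one tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge; 16 mirrors the leading dimension used in the WMMA snippet.
constexpr std::size_t TILE = 16;

// Accumulate one TILE x TILE product into C: c += a * b.
// a, b and c point at the top-left element of a tile inside a larger
// row-major matrix; ld_a, ld_b and ld_c are the row strides of those matrices.
void mma_tile(const float* a, const float* b, float* c,
              std::size_t ld_a, std::size_t ld_b, std::size_t ld_c) {
    for (std::size_t i = 0; i < TILE; ++i) {
        for (std::size_t k = 0; k < TILE; ++k) {
            for (std::size_t j = 0; j < TILE; ++j) {
                c[i * ld_c + j] += a[i * ld_a + k] * b[k * ld_b + j];
            }
        }
    }
}

// C (m x n) = A (m x k) * B (k x n), all row-major.
// Assumes m, k and n are multiples of TILE (no padding is handled here).
std::vector<float> tiled_matmul(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t m, std::size_t k, std::size_t n) {
    assert(m % TILE == 0 && k % TILE == 0 && n % TILE == 0);
    assert(a.size() == m * k && b.size() == k * n);

    std::vector<float> c(m * n, 0.0f);
    for (std::size_t ti = 0; ti < m; ti += TILE) {          // tile row of C
        for (std::size_t tj = 0; tj < n; tj += TILE) {      // tile column of C
            for (std::size_t tk = 0; tk < k; tk += TILE) {  // walk the shared dimension
                mma_tile(&a[ti * k + tk], &b[tk * n + tj], &c[ti * n + tj], k, n, n);
            }
        }
    }
    return c;
}
```

Calling `tiled_matmul(a, b, 32, 16, 32)` would issue four tile products, one per 16x16 tile of the output. On a GPU, each output tile would typically be assigned to a warp and the innermost loop replaced by Tensor Core instructions.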
