openCNN
Category: GPU/graphics card
Development tool: CUDA
File size: 53KB
Downloads: 0
Upload date: 2021-08-25 09:10:28
Uploader: sh-1993
Description: A Winograd minimal filtering implementation in CUDA
File list:
LICENSE (11357, 2021-08-25)
Makefile (218, 2021-08-25)
bench (0, 2021-08-25)
bench\bench.sh (423, 2021-08-25)
bench\smem (0, 2021-08-25)
bench\smem\LDS (0, 2021-08-25)
bench\smem\LDS\Makefile (342, 2021-08-25)
bench\smem\LDS\lds128.sass (788, 2021-08-25)
bench\smem\LDS\lds32.sass (779, 2021-08-25)
bench\smem\LDS\lds64.sass (784, 2021-08-25)
bench\smem\LDS\lds64_opt3.sass (1304, 2021-08-25)
bench\smem\LDS\main.cu (1261, 2021-08-25)
bench\smem\STS (0, 2021-08-25)
bench\smem\STS\Makefile (624, 2021-08-25)
bench\smem\STS\main.cu (3261, 2021-08-25)
bench\smem\STS\sts.sass (823, 2021-08-25)
bench\smem\STS\sts128.sass (1145, 2021-08-25)
bench\smem\STS\sts128_0.sass (823, 2021-08-25)
bench\smem\STS\sts32.sass (819, 2021-08-25)
bench\smem\STS\sts64.sass (821, 2021-08-25)
bench\smem\STS\sts64_2bank_conflict.sass (1239, 2021-08-25)
bench\smem\STS\sts64_broadcast.sass (1325, 2021-08-25)
bench\smem\STS\sts64_opt3.sass (1341, 2021-08-25)
src (0, 2021-08-25)
src\FX_m2.cu (2704, 2021-08-25)
src\ampere (0, 2021-08-25)
src\ampere\convolutionForward_32x64x8.cu (9709, 2021-08-25)
src\ampere\convolutionForward_32x64x8_baseline.cu (9532, 2021-08-25)
src\ampere\store_and_transform_output_baseline.cuh (4594, 2021-08-25)
src\ampere\store_and_transform_output_optLDS64.cuh (5160, 2021-08-25)
src\ampere\store_and_transform_output_optSTS64.cuh (4999, 2021-08-25)
src\ampere\store_and_transform_output_optSTS64_compact.cuh (7237, 2021-08-25)
src\config.hpp (3615, 2021-08-25)
src\convolutionForward_32x64x8.cu (10113, 2021-08-25)
src\convolutionForward_32x64x8_baseline.cu (9313, 2021-08-25)
src\openCNN_winograd.cu (17213, 2021-08-25)
src\outer_product.cuh (11107, 2021-08-25)
src\outer_product_suffle.cuh (11110, 2021-08-25)
... ...
# OpenCNN
An implementation of Winograd's minimal filtering algorithm in CUDA.
## Requirements
- CUDA Toolkit
- cuDNN
- CMake
## Build the project
```
git clone https://github.com/UDC-GAC/openCNN.git
cd openCNN
```
The GPU architecture code for the target device must be specified in the Makefile before compiling.
```
make
```
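The architecture code depends on the compute capability of the target GPU. As a rough aid, the sketch below converts a compute capability such as `7.5` into the corresponding `sm_XX` code. The helper name `cap_to_arch` is made up for this illustration and is not part of openCNN; the Makefile variable that receives the value is not shown here, so check the Makefile itself.

```shell
# Hypothetical helper (not part of openCNN): turn a compute capability
# such as "7.5" into an nvcc architecture code such as "sm_75".
cap_to_arch() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cap_to_arch 7.5   # Turing, prints sm_75
cap_to_arch 8.6   # Ampere, prints sm_86
```

On reasonably recent NVIDIA drivers, the compute capability of the installed GPU can usually be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`.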
Compile-time macros are defined for testing purposes. They can be set in the Makefile to the following values:
```
$OUT (builds a specific output storage and transform version):
- BASE: baseline layout
- OPTSTS*** (default): optSTS*** layout
- OUTSTS***_CMP: optSTS***_compact layout
- OUTLDS***: optLDS*** layout
```
## Run examples
Lock the GPU clocks (recommended before measuring time):
```
sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -lgc 1750 -i 0
```
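The value 1750 is a clock frequency in MHz and may not be supported by every GPU, and locked clocks generally stay in effect until they are reset. A minimal sketch for inspecting the supported clocks and undoing the lock after benchmarking is shown below. The `unlock_clocks` wrapper and the `RUN` dry-run variable are hypothetical names for this example; by default the commands are only printed, not executed.

```shell
# Hypothetical wrapper (not part of openCNN). By default it only prints
# the commands (dry run); call it as `RUN= unlock_clocks 0` to execute them.
unlock_clocks() {
  gpu=${1:-0}
  run=${RUN-echo}
  $run nvidia-smi -i "$gpu" -q -d SUPPORTED_CLOCKS   # list the clock values the GPU accepts
  $run sudo nvidia-smi -i "$gpu" -rgc                # reset the locked GPU clocks
}

unlock_clocks 0
```

Keeping the dry run as the default makes it possible to check the exact commands before running anything with `sudo`.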
1. OpenCNN benchmark
```
cd bench
./bench.sh
```
2. Instruction-level microbenchmarking (requires Turing devices). TuringAs (https://github.com/daadaada/turingas) must be installed to run these microbenchmarks.
```
cd bench/smem/STS
make
./test
```
## Citation
If you find this tool helpful, please cite:
```
@Article{math9172033,
AUTHOR = {Castro, Roberto L. and Andrade, Diego and Fraguela, Basilio B.},
TITLE = {OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA},
JOURNAL = {Mathematics},
VOLUME = {9},
YEAR = {2021},
NUMBER = {17},
ARTICLE-NUMBER = {2033},
URL = {https://www.mdpi.com/2227-7390/9/17/2033},
ISSN = {2227-7390},
DOI = {10.3390/math9172033}
}
```
## License
Apache-2.0 License
-- Roberto Lopez Castro