openCNN

Category: GPU/Graphics cards
Development tool: CUDA
File size: 53 KB
Upload date: 2021-08-25 09:10:28
Uploader: sh-1993
Description: A Winograd minimal filter implementation in CUDA
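The textbook 2D form of Winograd's minimal filtering algorithm, F(2×2, 3×3), computes one 2×2 output tile from a 4×4 input tile as Y = Aᵀ[(G g Gᵀ) ⊙ (Bᵀ d B)]A. The element-wise step needs 16 multiplications, compared with 36 for direct 3×3 correlation. The host-side C++ sketch below illustrates this and checks the result against the direct computation. It is not taken from openCNN. It assumes the standard F(2,3) transform matrices from the literature, and whether openCNN uses exactly these matrices is an assumption. The function names `winograd_f2x2_3x3`, `direct_3x3`, `max_abs_diff` and `self_check` are hypothetical.

```cpp
// Hypothetical sketch: Winograd F(2x2, 3x3) on a single 4x4 input tile,
// compared with direct 3x3 cross-correlation. Plain host C++ (nvcc accepts it too).
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

template <std::size_t R, std::size_t C>
using Mat = std::array<std::array<float, C>, R>;

// out = a * b
template <std::size_t R, std::size_t K, std::size_t C>
Mat<R, C> matmul(const Mat<R, K>& a, const Mat<K, C>& b) {
    Mat<R, C> out{};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j)
            for (std::size_t k = 0; k < K; ++k) out[i][j] += a[i][k] * b[k][j];
    return out;
}

template <std::size_t R, std::size_t C>
Mat<C, R> transpose(const Mat<R, C>& a) {
    Mat<C, R> out{};
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j) out[j][i] = a[i][j];
    return out;
}

// Standard F(2,3) transform matrices (assumed; not read from openCNN).
const Mat<4, 4> BT = {{{1, 0, -1, 0}, {0, 1, 1, 0}, {0, -1, 1, 0}, {0, 1, 0, -1}}};
const Mat<4, 3> G = {{{1, 0, 0}, {0.5f, 0.5f, 0.5f}, {0.5f, -0.5f, 0.5f}, {0, 0, 1}}};
const Mat<2, 4> AT = {{{1, 1, 1, 0}, {0, 1, -1, -1}}};

// Y = A^T [ (G g G^T) .* (B^T d B) ] A
Mat<2, 2> winograd_f2x2_3x3(const Mat<4, 4>& d, const Mat<3, 3>& g) {
    Mat<4, 4> U = matmul(matmul(G, g), transpose(G));    // filter transform
    Mat<4, 4> V = matmul(matmul(BT, d), transpose(BT));  // input transform
    Mat<4, 4> M{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j) M[i][j] = U[i][j] * V[i][j];  // 16 multiplies
    return matmul(matmul(AT, M), transpose(AT));         // output transform
}

// Reference: direct 3x3 cross-correlation as used in CNN layers (36 multiplies).
Mat<2, 2> direct_3x3(const Mat<4, 4>& d, const Mat<3, 3>& g) {
    Mat<2, 2> y{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            for (std::size_t r = 0; r < 3; ++r)
                for (std::size_t s = 0; s < 3; ++s) y[i][j] += d[i + r][j + s] * g[r][s];
    return y;
}

float max_abs_diff(const Mat<2, 2>& a, const Mat<2, 2>& b) {
    float m = 0.0f;
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j) m = std::fmax(m, std::fabs(a[i][j] - b[i][j]));
    return m;
}

// Compare both methods on an arbitrary deterministic tile and filter.
void self_check() {
    Mat<4, 4> d{};
    Mat<3, 3> g{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j) d[i][j] = 0.25f * float(i * 4 + j) - 1.0f;
    for (std::size_t r = 0; r < 3; ++r)
        for (std::size_t s = 0; s < 3; ++s) g[r][s] = float(r) - 0.5f * float(s);
    assert(max_abs_diff(winograd_f2x2_3x3(d, g), direct_3x3(d, g)) < 1e-4f);
}
```

In a real CUDA kernel, the filter transform is usually computed once per filter. The input and output transforms are applied per tile in registers and shared memory. This host sketch shows only the arithmetic, not that data layout.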

File list:
LICENSE (11357, 2021-08-25)
Makefile (218, 2021-08-25)
bench (0, 2021-08-25)
bench\bench.sh (423, 2021-08-25)
bench\smem (0, 2021-08-25)
bench\smem\LDS (0, 2021-08-25)
bench\smem\LDS\Makefile (342, 2021-08-25)
bench\smem\LDS\lds128.sass (788, 2021-08-25)
bench\smem\LDS\lds32.sass (779, 2021-08-25)
bench\smem\LDS\lds64.sass (784, 2021-08-25)
bench\smem\LDS\lds64_opt3.sass (1304, 2021-08-25)
bench\smem\LDS\main.cu (1261, 2021-08-25)
bench\smem\STS (0, 2021-08-25)
bench\smem\STS\Makefile (624, 2021-08-25)
bench\smem\STS\main.cu (3261, 2021-08-25)
bench\smem\STS\sts.sass (823, 2021-08-25)
bench\smem\STS\sts128.sass (1145, 2021-08-25)
bench\smem\STS\sts128_0.sass (823, 2021-08-25)
bench\smem\STS\sts32.sass (819, 2021-08-25)
bench\smem\STS\sts64.sass (821, 2021-08-25)
bench\smem\STS\sts64_2bank_conflict.sass (1239, 2021-08-25)
bench\smem\STS\sts64_broadcast.sass (1325, 2021-08-25)
bench\smem\STS\sts64_opt3.sass (1341, 2021-08-25)
src (0, 2021-08-25)
src\FX_m2.cu (2704, 2021-08-25)
src\ampere (0, 2021-08-25)
src\ampere\convolutionForward_32x64x8.cu (9709, 2021-08-25)
src\ampere\convolutionForward_32x64x8_baseline.cu (9532, 2021-08-25)
src\ampere\store_and_transform_output_baseline.cuh (4594, 2021-08-25)
src\ampere\store_and_transform_output_optLDS64.cuh (5160, 2021-08-25)
src\ampere\store_and_transform_output_optSTS64.cuh (4999, 2021-08-25)
src\ampere\store_and_transform_output_optSTS64_compact.cuh (7237, 2021-08-25)
src\config.hpp (3615, 2021-08-25)
src\convolutionForward_32x64x8.cu (10113, 2021-08-25)
src\convolutionForward_32x64x8_baseline.cu (9313, 2021-08-25)
src\openCNN_winograd.cu (17213, 2021-08-25)
src\outer_product.cuh (11107, 2021-08-25)
src\outer_product_suffle.cuh (11110, 2021-08-25)
... ...

# OpenCNN

A Winograd minimal filtering algorithm implementation in CUDA.

## Requirements

- CUDA Toolkit
- cuDNN
- CMake

## Build the project

```
git clone https://github.com/UDC-GAC/openCNN.git
cd openCNN
```

The GPU architecture code must be specified inside the Makefile before compiling.

```
make
```

Compile-time macros have been defined for testing purposes. They can be specified in the Makefile using the following values:

```
$OUT (builds a specific output storage and transform version):
  - BASE: baseline layout
  - OPTSTS*** (default): optSTS*** layout
  - OUTSTS***_CMP: optSTS***_compact layout
  - OUTLDS***: optLDS*** layout
```

## Run examples

(Recommended before time measurement) Lock the clocks:

```
sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -lgc 1750 -i 0
```

1. OpenCNN benchmark

```
cd bench
./bench.sh
```

2. Instruction-level microbenchmarking (requires a Turing device). TuringAs (https://github.com/daadaada/turingas) must be installed to run the instruction-level microbenchmarks.

```
cd bench/smem/STS
make
./test
```

## Citation

If you find this tool helpful, please cite:

```
@Article{math9172033,
  AUTHOR = {Castro, Roberto L. and Andrade, Diego and Fraguela, Basilio B.},
  TITLE = {OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA},
  JOURNAL = {Mathematics},
  VOLUME = {9},
  YEAR = {2021},
  NUMBER = {17},
  ARTICLE-NUMBER = {2033},
  URL = {https://www.mdpi.com/2227-7390/9/17/2033},
  ISSN = {2227-7390},
  DOI = {10.3390/math9172033}
}
```

## License

Apache-2.0 License

-- Roberto Lopez Castro
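The shared-memory microbenchmarks under `bench/smem` exercise loads (LDS) and stores (STS) of 32, 64 and 128 bits, including bank-conflict and broadcast cases such as `sts64_2bank_conflict.sass` and `sts64_broadcast.sass`. A simplified C++ model can help predict what a 32-bit access pattern should cost. It assumes 32 banks that are each 4 bytes wide. Lanes that read the same 32-bit word are served by a broadcast, while distinct words in the same bank serialize. The model covers only 32-bit accesses; 64- and 128-bit accesses are serviced in several phases by the hardware and are not modeled here. All names below are hypothetical and are not part of openCNN.

```cpp
// Hypothetical model of shared-memory bank conflicts for one warp-wide 32-bit access.
// Assumes 32 banks x 4 bytes; does not model 64/128-bit (LDS.64/STS.128) phases.
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

constexpr std::uint32_t kBanks = 32;      // number of shared-memory banks
constexpr std::uint32_t kBankBytes = 4;   // bytes per bank word
constexpr std::uint32_t kWarpSize = 32;

using WarpAddrs = std::array<std::uint32_t, kWarpSize>;  // byte address per lane

// Returns n for an n-way conflict: the largest number of *distinct* 32-bit
// words that map to a single bank (identical words are broadcast).
unsigned conflict_degree_32bit(const WarpAddrs& byte_addr) {
    std::array<std::set<std::uint32_t>, kBanks> words_per_bank{};
    for (std::uint32_t addr : byte_addr) {
        const std::uint32_t word = addr / kBankBytes;
        words_per_bank[word % kBanks].insert(word);
    }
    std::size_t degree = 0;
    for (const auto& words : words_per_bank)
        degree = std::max<std::size_t>(degree, words.size());
    return static_cast<unsigned>(degree);
}

// Lane i accesses the 32-bit word at index i * stride_words.
WarpAddrs strided(std::uint32_t stride_words) {
    WarpAddrs addrs{};
    for (std::uint32_t lane = 0; lane < kWarpSize; ++lane)
        addrs[lane] = lane * stride_words * kBankBytes;
    return addrs;
}

// Sanity checks on classic access patterns.
void check_bank_model() {
    assert(conflict_degree_32bit(strided(1)) == 1);    // consecutive floats
    assert(conflict_degree_32bit(strided(0)) == 1);    // all lanes read one word
    assert(conflict_degree_32bit(strided(2)) == 2);    // 2-way conflict
    assert(conflict_degree_32bit(strided(32)) == 32);  // every lane hits bank 0
    assert(conflict_degree_32bit(strided(33)) == 1);   // padding to 33 removes it
}
```

The stride-33 case reflects the common padding trick for shared-memory tiles. A stride that is coprime with 32 spreads the lanes over all banks.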
