ppmp_exercises
cuda cpp 

Category: GPU / graphics cards
Development tool: CUDA
File size: 0 KB
Downloads: 0
Upload date: 2020-10-29 06:37:17
Uploader: sh-1993
Description: Exercises that I've been doing while reading the book Programming Massively Parallel Processors.

File list:
CMakeLists.txt (675, 2020-10-28)
LICENSE (1074, 2020-10-28)
bin/ (0, 2020-10-28)
bin/CMakeLists.txt (577, 2020-10-28)
bin/benchmark_matmul.cpp (2109, 2020-10-28)
bin/devicequery.cpp (118, 2020-10-28)
bin/test_cuda_functions.cpp (2263, 2020-10-28)
cmake_modules/ (0, 2020-10-28)
cmake_modules/Findfmt.cmake (2061, 2020-10-28)
cmake_modules/google_benchmark.cmake (761, 2020-10-28)
kernels/ (0, 2020-10-28)
kernels/CMakeLists.txt (558, 2020-10-28)
kernels/core.cu (1897, 2020-10-28)
kernels/core.cuh (1371, 2020-10-28)
kernels/device_query.cu (1500, 2020-10-28)
kernels/device_query.cuh (60, 2020-10-28)
kernels/elementwise.cu (1477, 2020-10-28)
kernels/elementwise.cuh (262, 2020-10-28)
kernels/internal.cu (671, 2020-10-28)
kernels/internal.cuh (185, 2020-10-28)
kernels/matmul.cu (3323, 2020-10-28)
kernels/matmul.cuh (337, 2020-10-28)

# Build & Run

```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
make -j all
./bin/devicequery
./bin/benchmark_matmul
```

# Materials

- [Kirk, David B., and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. 2nd ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2012.](https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0124159923)
- [CoffeeBeforeArch, CUDA Crash Course, 2019](https://www.youtube.com/playlist?list=PLxNPSjHT5qvtYRVdNN1yDcdSl39uHV_sU)
- [PUMPS+AI Summer School | PUMPS+AI 2019, accessed Oct. 29, 2020](https://pumps.bsc.es/2019/)
- [PUMPS+AI Summer School | PUMPS+AI 2019, labs](https://github.com/illinois-impact/gpu-algorithms-labs)

# Plan

I'm updating it as I go.

- build system
- device query
- [benchmarking](https://github.com/google/benchmark)
- [unit tests](https://github.com/google/googletest)
- CUDA thingies
  - unified memory
  - streams
- algorithms
  - vector add
  - naive matrix multiplication
  - tiled matrix multiplication
  - sum reduction
  - 1D/2D convolution (const mem, tiled)
  - histogram
  - Estimate Pi using Monte-Carlo simulation

# Benchmarks

## Matrix multiplication

| Benchmark                     | Time     | CPU      | Iterations |
|-------------------------------|----------|----------|------------|
| MatmulSimple/256/manual_time  | 0.371 ms | 0.495 ms | 1887       |
| MatmulSimple/512/manual_time  | 2.68 ms  | 3.08 ms  | 261        |
| MatmulSimple/1024/manual_time | 21.5 ms  | 22.7 ms  | 33         |
| MatmulSimple/2048/manual_time | 172 ms   | 176 ms   | 4          |
| MatmulSimple/4096/manual_time | 1373 ms  | 1393 ms  | 1          |
| MatmulTiled/256/manual_time   | 0.160 ms | 0.284 ms | 4363       |
| MatmulTiled/512/manual_time   | 1.11 ms  | 1.53 ms  | 631        |
| MatmulTiled/1024/manual_time  | 8.97 ms  | 10.3 ms  | 78         |
| MatmulTiled/2048/manual_time  | 71.5 ms  | 76.3 ms  | 10         |
| MatmulTiled/4096/manual_time  | 571 ms   | 590 ms   | 1          |
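
To read the matrix multiplication numbers above more easily, the two helpers below turn a pair of timings into a speedup and a single timing into an effective throughput. This is only a sketch and not part of the repository: the names `speedup` and `effective_gflops` are hypothetical, and the throughput figure assumes that the benchmark argument is the side length `n` of square `n x n` matrices and that one multiplication costs `2 * n^3` floating-point operations. Neither assumption is stated in the README.

```cpp
#include <cassert>
#include <cstdint>

// Ratio of two measured times. A result above 1 means the
// second ("optimized") measurement is faster than the baseline.
double speedup(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}

// Effective throughput in GFLOP/s for one matrix multiplication.
// ASSUMPTION: both inputs are square n x n matrices and the product
// costs 2 * n^3 floating-point operations (one multiply and one add
// per inner-loop step).
double effective_gflops(std::int64_t n, double time_ms) {
    assert(n > 0 && time_ms > 0.0);
    const double flops = 2.0 * static_cast<double>(n) * n * n;
    const double seconds = time_ms * 1e-3;
    return flops / seconds / 1e9;
}
```

With the 4096 rows from the table, `speedup(1373.0, 571.0)` comes out at about 2.4, and the other sizes give a similar ratio. Under the stated assumptions, `effective_gflops(4096, 571.0)` is roughly 240 GFLOP/s for the tiled kernel, against roughly 100 GFLOP/s for the simple one.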
