ppmp_exercises
Category: GPU / graphics cards
Development tool: CUDA
Upload date: 2020-10-29 06:37:17
Uploader: sh-1993
Description: Exercises that I have been doing while reading the book Programming Massively Parallel Processors.
File list:
CMakeLists.txt (675, 2020-10-28)
LICENSE (1074, 2020-10-28)
bin/ (0, 2020-10-28)
bin/CMakeLists.txt (577, 2020-10-28)
bin/benchmark_matmul.cpp (2109, 2020-10-28)
bin/devicequery.cpp (118, 2020-10-28)
bin/test_cuda_functions.cpp (2263, 2020-10-28)
cmake_modules/ (0, 2020-10-28)
cmake_modules/Findfmt.cmake (2061, 2020-10-28)
cmake_modules/google_benchmark.cmake (761, 2020-10-28)
kernels/ (0, 2020-10-28)
kernels/CMakeLists.txt (558, 2020-10-28)
kernels/core.cu (1897, 2020-10-28)
kernels/core.cuh (1371, 2020-10-28)
kernels/device_query.cu (1500, 2020-10-28)
kernels/device_query.cuh (60, 2020-10-28)
kernels/elementwise.cu (1477, 2020-10-28)
kernels/elementwise.cuh (262, 2020-10-28)
kernels/internal.cu (671, 2020-10-28)
kernels/internal.cuh (185, 2020-10-28)
kernels/matmul.cu (3323, 2020-10-28)
kernels/matmul.cuh (337, 2020-10-28)
# Build & Run
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
make -j all
./bin/devicequery
./bin/benchmark_matmul
```
# Materials
- [Kirk, David B., and Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. 2nd ed. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2012.](https://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0124159923)
- [CoffeeBeforeArch, CUDA Crash Course, 2019](https://www.youtube.com/playlist?list=PLxNPSjHT5qvtYRVdNN1yDcdSl39uHV_sU)
- [PUMPS+AI Summer School | PUMPS+AI 2019, accessed Oct. 29, 2020](https://pumps.bsc.es/2019/)
- [PUMPS+AI Summer School | PUMPS+AI 2019, labs](https://github.com/illinois-impact/gpu-algorithms-labs)
# Plan
I'm updating this list as I go.
- build system
- device query
- [benchmarking](https://github.com/google/benchmark)
- [unit tests](https://github.com/google/googletest)
- CUDA thingies
  - unified memory
  - streams
- algorithms
  - vector add
  - naive matrix multiplication
  - tiled matrix multiplication
  - sum reduction
  - 1D/2D convolution (const mem, tiled)
  - histogram
  - Estimate Pi using Monte-Carlo simulation
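To make the difference between the naive and tiled matrix multiplication items concrete, here is a minimal host-side C++ sketch. It is not the code from `kernels/matmul.cu`: the function names `matmul_naive` and `matmul_tiled`, the row-major square layout, and the default tile size of 16 are all assumptions made for illustration. The tiled loop nest only mirrors the structure a tiled GPU kernel typically has, where each thread block would stage one tile of each input in shared memory before accumulating partial products.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper names; not taken from kernels/matmul.cu.
// Matrices are square, row-major, n x n.

// Naive version: each output element is one full-length dot product.
std::vector<float> matmul_naive(const std::vector<float>& a,
                                const std::vector<float>& b, std::size_t n) {
  assert(a.size() == n * n && b.size() == n * n);
  std::vector<float> c(n * n, 0.0f);
  for (std::size_t row = 0; row < n; ++row) {
    for (std::size_t col = 0; col < n; ++col) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < n; ++k) {
        acc += a[row * n + k] * b[k * n + col];
      }
      c[row * n + col] = acc;
    }
  }
  return c;
}

// Tiled version: the output is computed tile by tile, and the k dimension is
// also walked in tile-sized steps, so each step touches only a small block of
// a and b. On a GPU those blocks would be the shared-memory tiles.
std::vector<float> matmul_tiled(const std::vector<float>& a,
                                const std::vector<float>& b, std::size_t n,
                                std::size_t tile = 16) {
  assert(a.size() == n * n && b.size() == n * n && tile > 0);
  std::vector<float> c(n * n, 0.0f);
  for (std::size_t row0 = 0; row0 < n; row0 += tile) {
    for (std::size_t col0 = 0; col0 < n; col0 += tile) {
      for (std::size_t k0 = 0; k0 < n; k0 += tile) {
        // Clamp tile edges so n need not be a multiple of tile.
        const std::size_t row1 = std::min(row0 + tile, n);
        const std::size_t col1 = std::min(col0 + tile, n);
        const std::size_t k1 = std::min(k0 + tile, n);
        for (std::size_t row = row0; row < row1; ++row) {
          for (std::size_t col = col0; col < col1; ++col) {
            float acc = c[row * n + col];  // partial sum from earlier k tiles
            for (std::size_t k = k0; k < k1; ++k) {
              acc += a[row * n + k] * b[k * n + col];
            }
            c[row * n + col] = acc;
          }
        }
      }
    }
  }
  return c;
}
```

Both functions return the same product; a reference like `matmul_naive` is also a convenient oracle when unit-testing a GPU kernel on small inputs.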
# Benchmarks
## Matrix multiplication
| Benchmark | Time | CPU | Iterations |
|---------------------------------|----------------|-----------------|-----------|
| MatmulSimple/256/manual_time | 0.371 ms | 0.495 ms | 1887 |
| MatmulSimple/512/manual_time | 2.68 ms | 3.08 ms | 261 |
| MatmulSimple/1024/manual_time | 21.5 ms | 22.7 ms | 33 |
| MatmulSimple/2048/manual_time | 172 ms | 176 ms | 4 |
| MatmulSimple/4096/manual_time | 1373 ms | 1393 ms | 1 |
| MatmulTiled/256/manual_time | 0.160 ms | 0.284 ms | 4363 |
| MatmulTiled/512/manual_time | 1.11 ms | 1.53 ms | 631 |
| MatmulTiled/1024/manual_time | 8.97 ms | 10.3 ms | 78 |
| MatmulTiled/2048/manual_time | 71.5 ms | 76.3 ms | 10 |
| MatmulTiled/4096/manual_time | 571 ms | 590 ms | 1 |
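
The timings above can be turned into throughput and speedup figures. The sketch below assumes that the number in each benchmark name is the side length n of square matrices, that the Time column is the manually recorded time per multiplication, and that one multiplication costs 2n^3 floating-point operations (n^3 multiplies plus n^3 adds). The helper names `matmul_flops`, `gflops`, and `speedup` are hypothetical and not part of the repository.

```cpp
#include <cassert>

// Hypothetical helpers for reading the benchmark table; not from the repo.

// Floating-point operations for an n x n by n x n product,
// counting n^3 multiplies and n^3 adds.
double matmul_flops(double n) { return 2.0 * n * n * n; }

// Throughput in GFLOP/s for one multiplication that took time_ms milliseconds.
double gflops(double n, double time_ms) {
  assert(time_ms > 0.0);
  const double seconds = time_ms * 1e-3;
  return matmul_flops(n) / seconds / 1e9;
}

// How many times faster the optimized run is than the baseline run.
double speedup(double baseline_ms, double optimized_ms) {
  assert(optimized_ms > 0.0);
  return baseline_ms / optimized_ms;
}
```

Under those assumptions, `gflops(4096, 1373)` gives roughly 100 GFLOP/s for MatmulSimple and `gflops(4096, 571)` roughly 241 GFLOP/s for MatmulTiled, a speedup of about 2.4. The same ratio, between about 2.3 and 2.4, holds for all five sizes in the table.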