MyCudaSamples
cuda cpp GPGPU 

Category: GPU/Graphics card
Development tool: CUDA
File size: 15KB
Downloads: 0
Upload date: 2018-01-23 12:51:32
Uploader: sh-1993
Description: This is a series of CUDA C++ programming samples developed to study CUDA technology and its parallel programming model.

File list (size in bytes, date):
src (0, 2018-01-23)
src\Convolution.cu (8329, 2018-01-23)
src\Convolution.h (372, 2018-01-23)
src\CudaUtil.h (791, 2018-01-23)
src\Histogram.cu (2394, 2018-01-23)
src\Histogram.h (233, 2018-01-23)
src\MatMatMult.cu (1564, 2018-01-23)
src\MatMatMult.h (238, 2018-01-23)
src\MatrixVectorMult.cu (1615, 2018-01-23)
src\MatrixVectorMult.h (288, 2018-01-23)
src\MyMatrixAdd.cu (2621, 2018-01-23)
src\MyMatrixAdd.h (226, 2018-01-23)
src\QueryDevice.h (1164, 2018-01-23)
src\Run.cpp (5496, 2018-01-23)
src\VectorAdd.cu (1304, 2018-01-23)
src\VectorAdd.h (233, 2018-01-23)
src\VectorReduction.cu (2192, 2018-01-23)
src\VectorReduction.h (247, 2018-01-23)
src\prefixSum.cu (4253, 2018-01-23)
src\prefixSum.h (220, 2018-01-23)

# MyCudaSamples

Collection of programming exercises based on the book [Programming Massively Parallel Processors: A Hands-on Approach by David Kirk and Wen-mei W. Hwu](https://www.elsevier.com/books/programming-massively-parallel-processors/kirk/978-0-12-811***6-0). In this self-study project, we implement different mathematical operations using C++ and the pure CUDA framework.

## Requirements:

- C++
- CUDA Toolkit

## Algorithms:

- [Vector Addition](./src/VectorAdd.cu), [Matrix Vector Multiplication](./src/MatrixVectorMult.cu) and [Matrix Multiplication](./src/MatMatMult.cu): Standard linear algebra operations on CPU and GPU. Check how the GPU implementation becomes faster than the CPU code as the size of the operands grows (a minimal kernel sketch appears at the end of this README).
- [Matrix Addition](./src/MyMatrixAdd.cu): Standard matrix addition on CPU and GPU. Check how to launch kernels with different configurations and what the implications are.
- [Convolution](./src/Convolution.cu): Implementation of 1D and 2D convolution on CPU and GPU. We also show how tiling can speed up the GPU implementation by leveraging data locality.
- [Histogram](./src/Histogram.cu): Code to compute histograms on CPU and GPU. We show how to use shared memory to speed up the GPU implementation by reducing global memory accesses (see the shared-memory sketch at the end of this README).
- [Vector Reduction](./src/VectorReduction.cu): GPU code to perform vector reduction operations. Here we see how to synchronize threads on the GPU (see the reduction sketch at the end of this README).
- [Prefix Sum](./src/prefixSum.cu): Implementation of the cumulative sum, which is trivial in CPU code but challenging on the GPU. We provide two ways to perform this algorithm.
- [Run](./src/Run.cpp): Use this file to run the samples described above.

Note that we **do not use** any CUDA library such as [cuBLAS](https://developer.nvidia.com/cublas), [cuDNN](https://developer.nvidia.com/cuDNN) or [Thrust](https://developer.nvidia.com/thrust) in our project, since we focus on learning how the GPU hardware and programming model work. However, we acknowledge that these libraries are more efficient and very handy for large-scale applications.
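
As a taste of the techniques these samples exercise, below is a minimal vector-addition sketch. It is illustrative only: the kernel name, sizes and the use of managed memory are assumptions, and it is not the code in `VectorAdd.cu`.

```cuda
// Illustrative vector-addition sketch; not the repository's VectorAdd.cu.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAddKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one element per thread
    if (i < n) c[i] = a[i] + b[i];                  // guard the tail of the last block
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Managed memory keeps the sketch short; the samples may instead use cudaMalloc/cudaMemcpy.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up so every element is covered
    vecAddKernel<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // expect 3.000000
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The block/thread arithmetic above is the kind of launch-configuration choice the Matrix Addition sample explores: more threads per block means fewer blocks, and the bounds check keeps out-of-range threads idle.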
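
The shared-memory optimization described for the histogram sample typically follows the pattern below: each block accumulates into a private histogram in shared memory and merges it into global memory once. The bin count and names here are assumptions, not the interface of `Histogram.cu`.

```cuda
// Illustrative shared-memory histogram sketch; not the repository's Histogram.cu.
#define NUM_BINS 256

__global__ void histogramShared(const unsigned char* data, int n, unsigned int* bins) {
    __shared__ unsigned int localBins[NUM_BINS];

    // Each block zeroes its private copy of the histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        localBins[b] = 0;
    __syncthreads();

    // Grid-stride loop: accumulate into fast shared memory instead of global memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&localBins[data[i]], 1u);
    __syncthreads();

    // One round of global atomics per block merges the partial histograms.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&bins[b], localBins[b]);
}
```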
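
Likewise, the thread synchronization mentioned for the reduction sample is commonly the tree-shaped pattern sketched here, launched with `threads * sizeof(float)` bytes of dynamic shared memory. The names are illustrative and this is not the code in `VectorReduction.cu`.

```cuda
// Illustrative tree-based block reduction; not the repository's VectorReduction.cu.
// Assumes blockDim.x is a power of two.
__global__ void reduceSum(const float* in, float* blockSums, int n) {
    extern __shared__ float sdata[];            // sized at launch: blockDim.x * sizeof(float)
    const int tid = threadIdx.x;
    const int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;        // each thread loads one element
    __syncthreads();                            // all loads must land before reducing

    // Halve the number of active threads each step; __syncthreads() keeps the steps in lockstep.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }

    if (tid == 0)
        blockSums[blockIdx.x] = sdata[0];       // one partial sum per block; a second pass finishes the sum
}
```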
