cuda

Category: GPU/Graphics cards
Development tool: CUDA
Upload date: 2021-06-02 04:13:27
Uploader: sh-1993
Description: All assignments from EE 5351 Applied Parallel Programming. All programs are complete, tested, and functional.

File list:
vang0951_mp1/ (0, 2021-06-01)
vang0951_mp1/matrixmul.cu (6980, 2021-06-01)
vang0951_mp1/matrixmul_kernel.cu (988, 2021-06-01)
vang0951_mp2/ (0, 2021-06-01)
vang0951_mp2/matrixmul.cu (8760, 2021-06-01)
vang0951_mp2/matrixmul_kernel.cu (1636, 2021-06-01)
vang0951_mp3/ (0, 2021-06-01)
vang0951_mp3/2Dconvolution.cu (8096, 2021-06-01)
vang0951_mp3/2Dconvolution_kernel.cu (1128, 2021-06-01)
vang0951_mp4-1/ (0, 2021-06-01)
vang0951_mp4-1/Makefile (647, 2021-06-01)
vang0951_mp4-1/data.txt (1939, 2021-06-01)
vang0951_mp4-1/vector_reduction.cu (4424, 2021-06-01)
vang0951_mp4-1/vector_reduction_gold.cpp (989, 2021-06-01)
vang0951_mp4-1/vector_reduction_kernel.cu (1044, 2021-06-01)
vang0951_mp4-2/ (0, 2021-06-01)
vang0951_mp4-2/Makefile (614, 2021-06-01)
vang0951_mp4-2/scan_gold.cpp (1111, 2021-06-01)
vang0951_mp4-2/scan_largearray.cu (9728, 2021-06-01)
vang0951_mp4-2/scan_largearray_kernel.cu (4981, 2021-06-01)
vang0951_mp4-2/size.txt (9, 2021-06-01)
vang0951_mp5/ (0, 2021-06-01)
vang0951_mp5/Makefile (878, 2021-06-01)
vang0951_mp5/opt_2dhisto.cu (2308, 2021-06-01)
vang0951_mp5/opt_2dhisto.h (208, 2021-06-01)
vang0951_mp5/ref_2dhisto.cpp (661, 2021-06-01)
vang0951_mp5/ref_2dhisto.h (203, 2021-06-01)
vang0951_mp5/test_harness.cpp (4302, 2021-06-01)
vang0951_mp5/util.cpp (594, 2021-06-01)
vang0951_mp6/ (0, 2021-06-01)
vang0951_mp6/Makefile (526, 2021-06-01)
vang0951_mp6/kernel.cu (2010, 2021-06-01)
vang0951_mp6/main.cu (8291, 2021-06-01)
vang0951_mp6/support.cu (5009, 2021-06-01)
vang0951_mp6/support.h (1068, 2021-06-01)

# Applied Parallel Programming using CUDA/C++ Programming

All assignments completed by me for course EE5351 Applied Parallel Programming at the University of Minnesota-Twin Cities. All programs are complete, tested, and functional.

The GPUs used in this course:

- 03:00.0 VGA compatible controller: NVIDIA Corporation GF106GL [Quadro 2000] (rev a1)
- 04:00.0 VGA compatible controller: NVIDIA Corporation GF100 [GeForce GTX 480] (rev a3)
- 22:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)

### Machine Problem 1 - Matrix Multiplication

The purpose of this machine problem is to learn the computation pattern for parallel matrix multiplication. The `MatrixMulOnDevice()` function in `matrixmul.cu` and the `MatrixMulKernel()` function in `matrixmul_kernel.cu` were edited to implement matrix multiplication on the device. All other source code was provided with the assignment. The matrix size is defined such that one thread block is sufficient to compute the entire solution matrix.

### Machine Problem 2 - Tiled Matrix Multiplication

The purpose of this machine problem is to learn about shared memory tiling and apply it to matrix multiplication to alleviate the memory bandwidth bottleneck. The `MatrixMulOnDevice()` function in `matrixmul.cu` and the `MatrixMulKernel()` function in `matrixmul_kernel.cu` were edited to implement matrix multiplication on the device. All other source code was provided with the assignment. The two input matrices may be of any size, but a single CUDA grid is guaranteed to cover the entire output matrix.

### Machine Problem 3 - 2D Tiled Convolution

The purpose of this machine problem is to learn how constant memory and shared memory can be used to alleviate memory bandwidth bottlenecks in a convolution computation. This is a tiled implementation of a matrix convolution. It uses a 5x5 convolution kernel and supports arbitrarily sized "images."

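
The tiling scheme described for Machine Problem 3 can be sketched roughly as follows. This is not the code from `2Dconvolution_kernel.cu`; it is a minimal illustration under stated assumptions. The names `conv2dTiled`, `convolve2D`, and `d_mask` are hypothetical, the output tile width of 16 is an arbitrary choice, out-of-range pixels are assumed to be zero, and the mask is applied without flipping. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define MASK_W 5                     // 5x5 convolution kernel, as in the assignment
#define RADIUS (MASK_W / 2)
#define TILE   16                    // output tile width (assumed value)
#define BLOCK  (TILE + MASK_W - 1)   // input tile width, including the halo

// The mask sits in constant memory: all threads read the same element at
// the same time, so the constant cache can broadcast it.
__constant__ float d_mask[MASK_W * MASK_W];

__global__ void conv2dTiled(const float* in, float* out, int height, int width)
{
    __shared__ float tile[BLOCK][BLOCK];

    int tx = threadIdx.x, ty = threadIdx.y;
    int outRow = blockIdx.y * TILE + ty;   // output pixel this thread may compute
    int outCol = blockIdx.x * TILE + tx;
    int inRow  = outRow - RADIUS;          // input pixel this thread loads
    int inCol  = outCol - RADIUS;

    // Each thread loads one input element; pixels outside the image count as 0.
    bool inside = inRow >= 0 && inRow < height && inCol >= 0 && inCol < width;
    tile[ty][tx] = inside ? in[inRow * width + inCol] : 0.0f;
    __syncthreads();

    // Only the first TILE x TILE threads of the block produce an output.
    if (ty < TILE && tx < TILE && outRow < height && outCol < width) {
        float sum = 0.0f;
        for (int i = 0; i < MASK_W; ++i)
            for (int j = 0; j < MASK_W; ++j)
                sum += d_mask[i * MASK_W + j] * tile[ty + i][tx + j];
        out[outRow * width + outCol] = sum;
    }
}

// Host wrapper (hypothetical): `mask` must point to MASK_W * MASK_W floats.
std::vector<float> convolve2D(const std::vector<float>& image,
                              int height, int width, const float* mask)
{
    assert(image.size() == static_cast<size_t>(height) * width);
    std::vector<float> result(image.size());
    size_t bytes = image.size() * sizeof(float);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, image.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, mask, sizeof(float) * MASK_W * MASK_W);

    dim3 block(BLOCK, BLOCK);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    conv2dTiled<<<grid, block>>>(d_in, d_out, height, width);

    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

In this sketch each block is launched with `BLOCK x BLOCK` threads so that every thread loads exactly one input element, halo included, into shared memory. Each loaded element is then reused by up to 25 output computations instead of being fetched from global memory each time.
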
### Machine Problem 4-1 - Reduction

The purpose of this machine problem is to implement a work-efficient parallel reduction algorithm on the GPU. The implementation accepts an input array of any size.

### Machine Problem 4-2 - Parallel Prefix Sum (Scan)

The purpose of this machine problem is to implement a parallel prefix sum, also known as "scan." Scan is a useful building block for many parallel algorithms such as radix sort, quicksort, tree operations, and histograms. Scan works with any associative operator; this problem uses addition.

### Machine Problem 5 - Histogramming

Histograms are a commonly used analysis tool in image processing and data mining applications. They show the frequency of occurrence of data elements over discrete intervals, also known as bins. The purpose of this machine problem is to explore this topic. The key assumptions and constraints are:

- The input data consists of index values to the bins.
- The input values are NOT uniformly distributed across the bins. This non-uniformity is a large part of what makes this problem interesting for GPUs.
- For each bin in the histogram, once the bin count reaches 255, no further incrementing will occur. This is sometimes known as a "saturating counter."

The kernel runtime is also measured.

### Machine Problem 6 - Sparse Matrix-Vector Multiplication

The purpose of this machine problem is to understand sparse matrix storage formats and their impact on performance, using sparse matrix-vector multiplication. The formats used in this problem are Compressed Sparse Row (CSR) and Jagged Diagonal Storage (JDS).

## How to Run Code

Make sure all files are in the same directory before running the code. Open a new command-line shell (e.g. Terminal), navigate to the directory where the files are saved, and enter the following:

`$ make`
