udacity-IntroToParallelProgramming

Category: GPU / graphics cards
Development tool: Cuda
File size: 14895KB
Downloads: 0
Upload date: 2017-07-23 16:11:20
Uploader: sh-1993
Description: CS344 - Introduction To Parallel Programming course (Udacity) proposed solutions

File list:
IntroParallelProgramming.sln (6068, 2017-07-24)
Lesson1-CubeNumbers (0, 2017-07-24)
Lesson1-CubeNumbers\IntroParallelProgramming.vcxproj (8534, 2017-07-24)
Lesson1-CubeNumbers\main.cu (1118, 2017-07-24)
Lesson4-Reduction (0, 2017-07-24)
Lesson4-Reduction\Lesson4-Reduction.vcxproj (8458, 2017-07-24)
ProblemSet1-RGB2Gray (0, 2017-07-24)
ProblemSet1-RGB2Gray\CMakeLists.txt (594, 2017-07-24)
ProblemSet1-RGB2Gray\HW1.cpp (2790, 2017-07-24)
ProblemSet1-RGB2Gray\HW1_differenceImage.png (60061, 2017-07-24)
ProblemSet1-RGB2Gray\HW1_reference.png (120964, 2017-07-24)
ProblemSet1-RGB2Gray\Makefile (1681, 2017-07-24)
ProblemSet1-RGB2Gray\RGB2Gray.vcxproj (9250, 2017-07-24)
ProblemSet1-RGB2Gray\cinque_terre.gold (123140, 2017-07-24)
ProblemSet1-RGB2Gray\cinque_terre_gray.jpg (81136, 2017-07-24)
ProblemSet1-RGB2Gray\cinque_terre_small.jpg (89528, 2017-07-24)
ProblemSet1-RGB2Gray\compare.cpp (1400, 2017-07-24)
ProblemSet1-RGB2Gray\compare.h (208, 2017-07-24)
ProblemSet1-RGB2Gray\main.cpp (2802, 2017-07-24)
ProblemSet1-RGB2Gray\reference_calc.cpp (516, 2017-07-24)
ProblemSet1-RGB2Gray\reference_calc.h (252, 2017-07-24)
ProblemSet1-RGB2Gray\student_func.cu (3294, 2017-07-24)
ProblemSet1-RGB2Gray\timer.h (577, 2017-07-24)
ProblemSet1-RGB2Gray\utils.h (2976, 2017-07-24)
ProblemSet2-Blur (0, 2017-07-24)
ProblemSet2-Blur\CMakeLists.txt (599, 2017-07-24)
ProblemSet2-Blur\HW2.cpp (5087, 2017-07-24)
ProblemSet2-Blur\HW2_differenceImage.png (155744, 2017-07-24)
ProblemSet2-Blur\HW2_reference.png (236188, 2017-07-24)
ProblemSet2-Blur\Makefile (1573, 2017-07-24)
ProblemSet2-Blur\ProblemSet2-Blur.vcxproj (10457, 2017-07-24)
ProblemSet2-Blur\cinque_terre.gold (245505, 2017-07-24)
ProblemSet2-Blur\cinque_terre_blur.jpg (43825, 2017-07-24)
ProblemSet2-Blur\cinque_terre_small.jpg (89528, 2017-07-24)
ProblemSet2-Blur\compare.cpp (1386, 2017-07-24)
ProblemSet2-Blur\compare.h (194, 2017-07-24)
ProblemSet2-Blur\main.cpp (3958, 2017-07-24)
... ...

# udacity-IntroToParallelProgramming

CS344 - Introduction To Parallel Programming course (Udacity) proposed solutions

Testing Environment: Visual Studio 2015 x*** + nVidia CUDA 8.0 + OpenCV 3.2.0

For each problem set, the core of the algorithm to be implemented is located in the _student_func.cu_ file.

## Problem Set 1 - RGB2Gray

### Objective

Convert an input RGBA image into a grayscale version (ignoring the A channel).

### Topics

Example of a **map** primitive operation on a data structure.

## Problem Set 2 - Blur

### Objective

Apply a Gaussian blur convolution filter to an input RGBA image (blur each channel independently, ignoring the A channel).

### Topics

Example of a **stencil** primitive operation on a 2D array.

Use of **shared memory** to speed up the algorithm. Both global memory and shared memory based kernels are provided, the latter giving approx. 1.6x speedup over the former.

## Problem Set 3 - Tone Mapping

### Objective

Map a High Dynamic Range image into an image for a device supporting a smaller range of intensity values.

### Topics

- Compute the range of intensity values of the input image: min and max **reduce** implemented.
- Compute the **histogram** of intensity values (1024-value array).
- Compute the cumulative distribution function of the histogram: Hillis & Steele **scan** algorithm (step-efficient, well suited for small arrays like the histogram one).

## Problem Set 4 - Red eyes removal

### Objective

Remove the red-eye effect from an input RGBA image (it uses Normalized Cross Correlation against a training template).

### Topics

Sorting algorithms on the GPU: given an input array of NCC scores, sort it in ascending order with **radix sort**.

For each bit:

- Compute a predicate vector (0: false, 1: true).
- Perform a **Blelloch scan** on the predicate vector (for both the false and the true cases).
- From the Blelloch scan, extract a histogram of predicate values [0 numberOfFalses] and an offset vector (the actual result of the scan).
- A move kernel computes the new index of each element (using the two structures above) and moves it.

## Problem Set 5 - Optimized histogram computation

### Objective

Improve the histogram computation performance on the GPU over the simple global atomic solution.

### Topics

**Per-block** histogram computation. Each block computes its own histogram in shared memory, and the histograms are combined at the end in global memory (more than 7x speedup over the global atomic implementation, while being relatively simple).

## Problem Set 6 - Seamless Image Cloning

### Objective

Given a target image (e.g. a swimming pool), seamlessly attach a masked source image (e.g. a hippo).

### Topics

The algorithm consists of performing Jacobi iterations on the source and target images to blend one with the other.

- Given the mask, detect the interior points and the boundary points.
- Since the algorithm has to be performed only on the interior points, compute the **bounding box** of the mask region to restrict the Jacobi iterations to a subimage.
- Split the images into the R, G and B channels.
- Run 800 Jacobi iterations on each channel. The code makes use of **CUDA Streams** to run the same kernel concurrently on the 3 different channels (speedup of 3x on my machine, of 1.5x on the Udacity machine). The Jacobi kernel makes extensive use of shared memory, so the number of threads per block has been reduced to maximize SM occupancy.
- Recombine the 3 channels to form the output image.
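As an illustration of the **map** primitive from Problem Set 1, here is a minimal CPU-side sketch in plain C++, so it can be run without a GPU. The `Rgba` struct, the `rgbaToGray` name and the luma weights (0.299, 0.587, 0.114) are assumptions made for this example; the README does not state which weights the solution uses. On the GPU, each loop iteration would be handled by one thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pixel layout used only for this sketch.
struct Rgba {
    unsigned char r, g, b, a;
};

// Map each RGBA pixel to one gray value, ignoring the alpha channel.
// The weights are the common luma coefficients (an assumption here).
std::vector<unsigned char> rgbaToGray(const std::vector<Rgba>& pixels) {
    std::vector<unsigned char> gray(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const float intensity = 0.299f * pixels[i].r
                              + 0.587f * pixels[i].g
                              + 0.114f * pixels[i].b;
        gray[i] = static_cast<unsigned char>(intensity);  // truncates toward zero
    }
    return gray;
}
```

Every output element depends on exactly one input element, which is what makes this a map and why it parallelizes without any communication between threads.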
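The Hillis & Steele scan mentioned in Problem Set 3 can be sketched on the CPU as follows. This is a sequential simulation under stated assumptions, not the repository's kernel: the function name `hillisSteeleScan` is hypothetical, the scan is inclusive and uses addition, and the inner loop stands in for the threads that would all run at once on the GPU. Two buffers are swapped after each step so that no element is read after it has been overwritten.

```cpp
#include <cstddef>
#include <vector>

// Inclusive prefix sum with the Hillis & Steele step structure.
// Each pass of the outer loop is one parallel step; there are about
// log2(n) of them, which is why the algorithm is called step-efficient.
std::vector<unsigned int> hillisSteeleScan(const std::vector<unsigned int>& input) {
    std::vector<unsigned int> in = input;
    std::vector<unsigned int> out(input.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            // Element i adds the value 'offset' positions to its left, if any.
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        }
        in.swap(out);  // the result of this step is the input of the next
    }
    return in;
}
```

For a 1024-bin histogram this takes 10 steps. The total work is O(n log n) rather than O(n), a trade-off that is usually acceptable for arrays this small.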
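The per-bit steps of the radix sort in Problem Set 4 can be sketched as below. This is a CPU-side illustration with assumptions: keys are taken to be `unsigned int`, all function names are hypothetical, and the exclusive scan is written as a simple sequential loop that produces the same result a Blelloch scan would, without reproducing its up-sweep and down-sweep phases.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Exclusive prefix sum of a 0/1 predicate vector (sequential stand-in
// for the Blelloch scan).
std::vector<std::size_t> exclusiveScan(const std::vector<unsigned int>& flags) {
    std::vector<std::size_t> offsets(flags.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < flags.size(); ++i) {
        offsets[i] = running;
        running += flags[i];
    }
    return offsets;
}

// One stable pass: keys whose 'bit' is 0 go first, then keys whose 'bit' is 1.
std::vector<unsigned int> splitByBit(const std::vector<unsigned int>& keys, unsigned int bit) {
    const std::size_t n = keys.size();
    std::vector<unsigned int> isZero(n), isOne(n);
    for (std::size_t i = 0; i < n; ++i) {
        isOne[i] = (keys[i] >> bit) & 1u;
        isZero[i] = 1u - isOne[i];
    }
    const std::vector<std::size_t> zeroOffsets = exclusiveScan(isZero);
    const std::vector<std::size_t> oneOffsets = exclusiveScan(isOne);
    // Start positions of the two groups: [0, numberOfFalses].
    const std::size_t numZeros = (n == 0) ? 0 : zeroOffsets[n - 1] + isZero[n - 1];

    std::vector<unsigned int> out(n);
    for (std::size_t i = 0; i < n; ++i) {  // the "move" step
        const std::size_t dest = isOne[i] ? numZeros + oneOffsets[i] : zeroOffsets[i];
        out[dest] = keys[i];
    }
    return out;
}

// Least-significant-bit-first radix sort: one split pass per bit.
std::vector<unsigned int> radixSort(std::vector<unsigned int> keys) {
    const unsigned int numBits = std::numeric_limits<unsigned int>::digits;
    for (unsigned int bit = 0; bit < numBits; ++bit) {
        keys = splitByBit(keys, bit);
    }
    return keys;
}
```

Because each pass is stable, the ordering established by lower bits survives the passes over higher bits, which is what makes the final array sorted in ascending order.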
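The per-block histogram idea from Problem Set 5 can be sketched on the CPU like this. The function name and parameters are hypothetical, and the input is assumed to already contain bin indices. The `local` vector plays the role of a block's shared-memory histogram, and the final accumulation loop corresponds to the point where a GPU implementation would issue atomic adds to global memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Histogram computed in chunks of 'blockSize' elements. Each chunk fills a
// private histogram first and merges it into the global one afterwards.
std::vector<unsigned int> perBlockHistogram(const std::vector<unsigned int>& binIds,
                                            std::size_t numBins,
                                            std::size_t blockSize) {
    std::vector<unsigned int> global(numBins, 0);
    if (numBins == 0 || blockSize == 0) {
        return global;  // nothing sensible to compute
    }
    for (std::size_t start = 0; start < binIds.size(); start += blockSize) {
        std::vector<unsigned int> local(numBins, 0);  // stand-in for shared memory
        const std::size_t end = std::min(start + blockSize, binIds.size());
        for (std::size_t i = start; i < end; ++i) {
            if (binIds[i] < numBins) {  // ignore out-of-range bin indices
                ++local[binIds[i]];
            }
        }
        for (std::size_t b = 0; b < numBins; ++b) {
            global[b] += local[b];  // one global update per bin per block
        }
    }
    return global;
}
```

The benefit on a GPU comes from contention: global memory receives at most one update per bin per block instead of one per input element, while the frequent updates hit the much faster shared memory.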
