cuda-multigrid

Category: GPU/Graphics cards
Development tool: CUDA
File size: 14KB
Downloads: 0
Upload date: 2021-04-25 06:57:17
Uploader: sh-1993
Description: GPU-accelerated multigrid solver for Poisson's equation in 2D

File list:
CMakeLists.txt (488, 2021-04-25)
src (0, 2021-04-25)
src\assertions.hpp (1781, 2021-04-25)
src\definitions.cuh (450, 2021-04-25)
src\grid.cuh (9486, 2021-04-25)
src\grid.hpp (4765, 2021-04-25)
src\poisson.cuh (7981, 2021-04-25)
src\poisson.hpp (8194, 2021-04-25)
src\solver.hpp (1186, 2021-04-25)
test (0, 2021-04-25)
test\CMakeLists.txt (177, 2021-04-25)
test\test_grid.cu (11037, 2021-04-25)
test\test_poisson.cu (4159, 2021-04-25)

# CUDA Multigrid

The code in this repository solves Poisson's equation in 2D subject to Dirichlet boundary conditions using the multigrid method with a Gauss-Seidel smoother. The PDE is discretized using second-order finite differences and conservative prolongation and restriction operators. There are two implementations of the solver: a single-threaded CPU solver and a GPU solver (custom CUDA kernels). To expose parallelism, the GPU solver uses a red-black Gauss-Seidel smoother.

## Usage

Look at the test program `test/test_poisson.cu` to learn how to use the code.

```CUDA
// Select problem to solve
using CUDAProblem = CUDAPoisson;
// Select smoother
using CUDASmoother = CUDAGaussSeidelRedBlack;
// Select multigrid solver
using CUDAMG = CUDAMultigrid;

CUDAProblem problem(l, h, modes);
CUDAMG solver(problem);

auto out = solve(solver, problem, opts);
printf("Iterations: %d, Residual: %g \n", out.iterations, out.residual);
```

```
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
GPU: NVIDIA RTX 2080 Ti
test/test_poisson

Solver: Gauss-Seidel (red-black)
Iteration    Residual
10           6.674289
20           1.369973
30           0.2812023
40           0.05771992
50           0.01184766
60           0.002431866
70           0.0004991677
80           0.00010245***
90           2.103102e-05
100          4.316853e-06
110          8.860825e-07
120          1.818784e-07
130          3.733259e-08
Iterations: 139, Residual: 8.97769e-09

Solver: Multi-Grid
Iteration    Residual
10           4.95***94e-09
Iterations: 10, Residual: 4.95***9e-09

Solver: CUDA Gauss-Seidel (red-black)
Iteration    Residual
10           6.674289
20           1.369973
30           0.2812023
40           0.05771992
50           0.01184766
60           0.002431866
70           0.0004991677
80           0.00010245***
90           2.103102e-05
100          4.316853e-06
110          8.860825e-07
120          1.818784e-07
130          3.733259e-08
Iterations: 139, Residual: 8.97769e-09

Solver: CUDA Multi-Grid
Iteration    Residual
10           4.95***94e-09
Iterations: 10, Residual: 4.95***9e-09

MMS convergence test
Solver: Multi-Grid
Grid Size        Iterations   Time (ms)      Residual     Error        Rate
6 x 6            1            0.00406        2.4718e-16   0.9348       -inf
10 x 10          8            0.01162        8.3216e-09   0.30908      1.59669
18 x 18          10           0.03814        4.9599e-09   0.08183      1.91727
34 x 34          11           0.14534        1.6851e-09   0.02074      1.***024
66 x 66          11           0.55350        2.2692e-09   0.0052025    1.99512
130 x 130        11           2.1***37       2.4601e-09   0.0013017    1.9***78
258 x 258        11           8.77101        2.5173e-09   0.0003255    1.99970
514 x 514        11           40.58800       2.5417e-09   8.1378e-05   1.99993
1026 x 1026      11           179.36189      2.5836e-09   2.0344e-05   2.00001
2050 x 2050      11           746.79437      2.735e-09    5.0858e-06   2.00010
40*** x 40***    11           3020.28027     3.38e-09     1.2711e-06   2.00041
8194 x 8194      11           12569.50879    7.1411e-09   3.174e-07    2.00166

MMS convergence test
Solver: CUDA Multi-Grid
Grid Size        Iterations   Time (ms)      Residual     Error        Rate
6 x 6            1            0.08147        2.4718e-16   0.9348       -inf
10 x 10          8            0.46346        8.3216e-09   0.30908      1.59669
18 x 18          10           0.7***61       4.9599e-09   0.08183      1.91727
34 x 34          11           1.04093        1.6851e-09   0.02074      1.***024
66 x 66          11           1.23590        2.2692e-09   0.0052025    1.99512
130 x 130        11           1.52275        2.4601e-09   0.0013017    1.9***78
258 x 258        11           2.05760        2.5173e-09   0.0003255    1.99970
514 x 514        11           3.58195        2.5417e-09   8.1378e-05   1.99993
1026 x 1026      11           10.08339       2.5836e-09   2.0344e-05   2.00001
2050 x 2050      11           29.20931       2.735e-09    5.0858e-06   2.00010
40*** x 40***    11           109.10480      3.38e-09     1.2711e-06   2.00041
8194 x 8194      11           397.84283      7.1411e-09   3.174e-07    2.00166
```

## TODO

Currently, the GPU solver is more or less a copy-and-paste version of the CPU solver. It would be nice to remove much of the code duplication.

There is also room for optimization. First, I should run a baseline profile to identify the current hotspots. Nonetheless, I already have some ideas about what can be improved. The red-black Gauss-Seidel implementation is currently inefficient: it takes two trips to DRAM to load the data, once for the red points and once more for the black points, and on each trip only half of the threads are active. It should be possible to keep all threads active by loading the data into shared memory using float2/double2 loads.

Some kernels, such as the restriction and residual kernels, can probably also be merged into a single kernel to further reduce memory traffic.
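
For readers unfamiliar with the red-black ordering discussed above, here is a minimal single-threaded C++ sketch of one red-black Gauss-Seidel sweep. It is not taken from this repository. The function names (`sweep_color`, `red_black_gauss_seidel`, `residual_max_norm`), the row-major storage with boundary points included, and the sign convention -Laplace(u) = f are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: update all interior points with (i + j) % 2 == color.
// Assumes the 5-point stencil for -Laplace(u) = f on an n x n grid with
// spacing h, stored row-major. Boundary values are never written, which
// keeps the Dirichlet data fixed.
void sweep_color(std::vector<double>& u, const std::vector<double>& f,
                 std::size_t n, double h, std::size_t color) {
    for (std::size_t j = 1; j + 1 < n; ++j) {
        for (std::size_t i = 1; i + 1 < n; ++i) {
            if ((i + j) % 2 != color) continue;
            const std::size_t k = j * n + i;
            u[k] = 0.25 * (u[k - 1] + u[k + 1] + u[k - n] + u[k + n]
                           + h * h * f[k]);
        }
    }
}

// One full smoothing step: first the "red" points, then the "black" points.
void red_black_gauss_seidel(std::vector<double>& u,
                            const std::vector<double>& f,
                            std::size_t n, double h) {
    assert(n >= 3 && u.size() == n * n && f.size() == n * n);
    sweep_color(u, f, n, h, 0);  // red pass
    sweep_color(u, f, n, h, 1);  // black pass
}

// Max-norm of the residual f + Laplace(u) over the interior points.
double residual_max_norm(const std::vector<double>& u,
                         const std::vector<double>& f,
                         std::size_t n, double h) {
    double r = 0.0;
    for (std::size_t j = 1; j + 1 < n; ++j) {
        for (std::size_t i = 1; i + 1 < n; ++i) {
            const std::size_t k = j * n + i;
            const double lap = (u[k - 1] + u[k + 1] + u[k - n] + u[k + n]
                                - 4.0 * u[k]) / (h * h);
            r = std::max(r, std::fabs(f[k] + lap));
        }
    }
    return r;
}
```

A point of one color only reads neighbors of the other color, so all updates within a pass are independent of each other. That independence is what allows a GPU kernel to assign one thread per point, and the two separate passes over the grid correspond to the two trips to DRAM mentioned in the TODO.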
