cuda-multigrid
Category: GPU / graphics cards
Development tool: CUDA
File size: 14KB
Downloads: 0
Upload date: 2021-04-25 06:57:17
Uploader: sh-1993
Description: GPU-accelerated multigrid solver for Poisson's equation in 2D
File list:
CMakeLists.txt (488, 2021-04-25)
src (0, 2021-04-25)
src\assertions.hpp (1781, 2021-04-25)
src\definitions.cuh (450, 2021-04-25)
src\grid.cuh (9486, 2021-04-25)
src\grid.hpp (4765, 2021-04-25)
src\poisson.cuh (7981, 2021-04-25)
src\poisson.hpp (8194, 2021-04-25)
src\solver.hpp (1186, 2021-04-25)
test (0, 2021-04-25)
test\CMakeLists.txt (177, 2021-04-25)
test\test_grid.cu (11037, 2021-04-25)
test\test_poisson.cu (4159, 2021-04-25)
# CUDA Multigrid
The code in this repository solves Poisson's equation in 2D subject to Dirichlet boundary conditions
using the multigrid method with a Gauss-Seidel smoother. The PDE is discretized using second-order
finite differences, and grid transfers use conservative prolongation and restriction operators.
There are two implementations of the solver: a single-threaded CPU solver and a GPU solver (custom
CUDA kernels). To expose parallelism, the GPU solver uses a red-black Gauss-Seidel smoother.
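
To illustrate why the red-black ordering exposes parallelism, here is a minimal host-side C++ sketch of one sweep. It is not taken from the repository: the function name `red_black_sweep`, the row-major node layout, and the sign convention `-(u_xx + u_yy) = f` are all assumptions made for this example. With the 5-point stencil, every red point depends only on black neighbors and vice versa, so all points of one color can be updated independently, which is the property a GPU kernel can exploit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the repository's API): one red-black Gauss-Seidel
// sweep for -(u_xx + u_yy) = f on an n x n row-major grid with spacing h.
// The first/last row and column of u hold the Dirichlet boundary values and
// are never modified.
void red_black_sweep(std::vector<double>& u, const std::vector<double>& f,
                     std::size_t n, double h) {
    assert(n >= 3 && u.size() == n * n && f.size() == n * n);
    for (std::size_t color = 0; color < 2; ++color) {  // 0 = red, 1 = black
        for (std::size_t j = 1; j + 1 < n; ++j) {
            // First interior column i in this row with (i + j) % 2 == color.
            std::size_t start = 1 + ((1 + j + color) % 2);
            for (std::size_t i = start; i + 1 < n; i += 2) {
                std::size_t k = j * n + i;
                // Neighbors of k all have the opposite color, so the updates
                // within one color pass do not depend on each other.
                u[k] = 0.25 * (u[k - 1] + u[k + 1] + u[k - n] + u[k + n]
                               + h * h * f[k]);
            }
        }
    }
}
```

On a 3 x 3 grid there is a single interior point, so one sweep already gives the exact discrete solution; on larger grids the sweep is repeated (or used as the smoother inside a multigrid cycle) until the residual is small enough.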
## Usage
Look at the test program `test/test_poisson.cu` to learn how to use the code.
```CUDA
// Select problem to solve
using CUDAProblem = CUDAPoisson;
// Select smoother
using CUDASmoother = CUDAGaussSeidelRedBlack;
// Select multigrid solver
using CUDAMG = CUDAMultigrid;
CUDAProblem problem(l, h, modes);
CUDAMG solver(problem);
auto out = solve(solver, problem, opts);
printf("Iterations: %d, Residual: %g \n", out.iterations, out.residual);
```
Sample output from `test/test_poisson`:
```
CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
GPU: NVIDIA RTX 2080 Ti
test/test_poisson
Solver: Gauss-Seidel (red-black)
Iteration Residual
10 6.674289
20 1.369973
30 0.2812023
40 0.05771992
50 0.01184766
60 0.002431866
70 0.0004991677
80 0.00010245***
90 2.103102e-05
100 4.316853e-06
110 8.860825e-07
120 1.818784e-07
130 3.733259e-08
Iterations: 139, Residual: 8.97769e-09
Solver: Multi-Grid
Iteration Residual
10 4.95***94e-09
Iterations: 10, Residual: 4.95***9e-09
Solver: CUDA Gauss-Seidel (red-black)
Iteration Residual
10 6.674289
20 1.369973
30 0.2812023
40 0.05771992
50 0.01184766
60 0.002431866
70 0.0004991677
80 0.00010245***
90 2.103102e-05
100 4.316853e-06
110 8.860825e-07
120 1.818784e-07
130 3.733259e-08
Iterations: 139, Residual: 8.97769e-09
Solver: CUDA Multi-Grid
Iteration Residual
10 4.95***94e-09
Iterations: 10, Residual: 4.95***9e-09
MMS convergence test
Solver: Multi-Grid
Grid Size     Iterations  Time (ms)    Residual    Error       Rate
6 x 6         1           0.00406      2.4718e-16  0.9348      -inf
10 x 10       8           0.01162      8.3216e-09  0.30908     1.59669
18 x 18       10          0.03814      4.9599e-09  0.08183     1.91727
34 x 34       11          0.14534      1.6851e-09  0.02074     1.***024
66 x 66       11          0.55350      2.2692e-09  0.0052025   1.99512
130 x 130     11          2.1***37     2.4601e-09  0.0013017   1.9***78
258 x 258     11          8.77101      2.5173e-09  0.0003255   1.99970
514 x 514     11          40.58800     2.5417e-09  8.1378e-05  1.99993
1026 x 1026   11          179.36189    2.5836e-09  2.0344e-05  2.00001
2050 x 2050   11          746.79437    2.735e-09   5.0858e-06  2.00010
4098 x 4098   11          3020.28027   3.38e-09    1.2711e-06  2.00041
8194 x 8194   11          12569.50879  7.1411e-09  3.174e-07   2.00166
MMS convergence test
Solver: CUDA Multi-Grid
Grid Size     Iterations  Time (ms)    Residual    Error       Rate
6 x 6         1           0.08147      2.4718e-16  0.9348      -inf
10 x 10       8           0.46346      8.3216e-09  0.30908     1.59669
18 x 18       10          0.7***61     4.9599e-09  0.08183     1.91727
34 x 34       11          1.04093      1.6851e-09  0.02074     1.***024
66 x 66       11          1.23590      2.2692e-09  0.0052025   1.99512
130 x 130     11          1.52275      2.4601e-09  0.0013017   1.9***78
258 x 258     11          2.05760      2.5173e-09  0.0003255   1.99970
514 x 514     11          3.58195      2.5417e-09  8.1378e-05  1.99993
1026 x 1026   11          10.08339     2.5836e-09  2.0344e-05  2.00001
2050 x 2050   11          29.20931     2.735e-09   5.0858e-06  2.00010
4098 x 4098   11          109.10480    3.38e-09    1.2711e-06  2.00041
8194 x 8194   11          397.84283    7.1411e-09  3.174e-07   2.00166
```
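
The `Rate` column in the MMS tables appears to be the observed order of accuracy between consecutive grids, i.e. the base-2 logarithm of the ratio of successive errors (each refinement doubles the number of interior points per direction). Whether the test program computes it exactly this way is an assumption, and `convergence_rate` is a hypothetical name, but the printed values are consistent with it: the errors 0.9348 and 0.30908 give a rate of about 1.5967, against the listed 1.59669.

```cpp
#include <cassert>
#include <cmath>

// Observed order of accuracy between two grids whose spacing differs by a
// factor of two: rate = log2(error_coarse / error_fine).
// Hypothetical helper, written to match the printed table, not copied from
// the repository.
double convergence_rate(double error_coarse, double error_fine) {
    assert(error_coarse > 0.0 && error_fine > 0.0);
    return std::log2(error_coarse / error_fine);
}
```

A rate that approaches 2 under refinement, as in both tables, is what one expects from a second-order finite difference discretization.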
## TODO
Currently, the GPU solver is more or less a copy-and-paste version of the CPU solver. It would be
nice to remove much of the code duplication. There's also room for improvement in terms of
optimization. First, I should perform a baseline profiling run to identify the current hotspots.
Nonetheless, I already have some ideas about what can be improved. The red-black Gauss-Seidel
implementation is currently inefficient: it takes two trips to DRAM to load the data, once for
the red points and once more for the black points, and in each pass only half of the threads are
active. It should be possible to keep all threads active by loading the data into shared memory
using vectorized float2/double2 loads. Some kernels, such as the restriction and residual kernels,
can probably also be merged into a single kernel to further reduce memory traffic.
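
As a rough illustration of the kernel-merging idea, the following host-side C++ sketch computes the residual and restricts it in a single pass, so the fine-grid residual array is never written out and read back. This is a sketch under stated assumptions, not the repository's code: the names `residual_at` and `restrict_residual` are hypothetical, the grid is assumed to be cell-centered with one ghost layer (fine grid `(2m + 2) x (2m + 2)`, coarse grid `(m + 2) x (m + 2)`), the restriction is assumed to average the four fine children of each coarse cell, and the sign convention is `-(u_xx + u_yy) = f`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Residual r = f + (u_W + u_E + u_S + u_N - 4 u) / h^2 at fine-grid point
// (i, j) of a row-major grid with row length nf. Hypothetical helper.
inline double residual_at(const std::vector<double>& u,
                          const std::vector<double>& f,
                          std::size_t nf, double h,
                          std::size_t i, std::size_t j) {
    std::size_t k = j * nf + i;
    return f[k] + (u[k - 1] + u[k + 1] + u[k - nf] + u[k + nf] - 4.0 * u[k])
                      / (h * h);
}

// Fused residual + restriction (assumed layout, see the text above).
// Each interior coarse cell (I, J) receives the average of the residuals in
// its four fine children; coarse ghost cells are left at zero.
std::vector<double> restrict_residual(const std::vector<double>& u,
                                      const std::vector<double>& f,
                                      std::size_t m, double h) {
    const std::size_t nf = 2 * m + 2, nc = m + 2;
    assert(u.size() == nf * nf && f.size() == nf * nf);
    std::vector<double> coarse(nc * nc, 0.0);
    for (std::size_t J = 1; J <= m; ++J) {
        for (std::size_t I = 1; I <= m; ++I) {
            double sum = 0.0;
            for (std::size_t dj = 0; dj < 2; ++dj)
                for (std::size_t di = 0; di < 2; ++di)
                    sum += residual_at(u, f, nf, h,
                                       2 * I - 1 + di, 2 * J - 1 + dj);
            coarse[J * nc + I] = 0.25 * sum;
        }
    }
    return coarse;
}
```

In a CUDA version of the same idea, each thread would presumably own one coarse cell and evaluate the four fine residuals in registers; the access pattern and any shared-memory tiling would need to be checked against a profile first.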