gpuPOM

Category: GPU/Graphics cards
Development tool: CUDA
File size: 236 KB
Downloads: 0
Upload date: 2014-12-24 08:13:26
Uploader: sh-1993
Description: A GPU-based Princeton Ocean Model

File list:
gpuPOMv1.0 (0, 2014-12-24)
gpuPOMv1.0\makefile (4171, 2014-12-24)
gpuPOMv1.0\pom.h (33799, 2014-12-24)
gpuPOMv1.0\pom (0, 2014-12-24)
gpuPOMv1.0\pom\cadvance.c (61744, 2014-12-24)
gpuPOMv1.0\pom\cadvance.h (1899, 2014-12-24)
gpuPOMv1.0\pom\cadvance_gpu.cu (114093, 2014-12-24)
gpuPOMv1.0\pom\cadvance_gpu.h (408, 2014-12-24)
gpuPOMv1.0\pom\cadvance_gpu_kernel.h (9983, 2014-12-24)
gpuPOMv1.0\pom\cassim.c (197, 2014-12-24)
gpuPOMv1.0\pom\cassim.h (79, 2014-12-24)
gpuPOMv1.0\pom\cassim_drf.c (66, 2014-12-24)
gpuPOMv1.0\pom\cassim_drf.h (73, 2014-12-24)
gpuPOMv1.0\pom\cbounds_forcing.c (31832, 2014-12-24)
gpuPOMv1.0\pom\cbounds_forcing.h (149, 2014-12-24)
gpuPOMv1.0\pom\cbounds_forcing_gpu.cu (121077, 2014-12-24)
gpuPOMv1.0\pom\cbounds_forcing_gpu.h (144, 2014-12-24)
gpuPOMv1.0\pom\cbounds_forcing_gpu_kernel.h (2298, 2014-12-24)
gpuPOMv1.0\pom\cinitialize.c (47784, 2014-12-24)
gpuPOMv1.0\pom\cinitialize.h (4694, 2014-12-24)
gpuPOMv1.0\pom\cinterp.c (61, 2014-12-24)
gpuPOMv1.0\pom\cinterp.h (65, 2014-12-24)
gpuPOMv1.0\pom\cio_pnetcdf.c (56690, 2014-12-24)
gpuPOMv1.0\pom\cio_pnetcdf.h (878, 2014-12-24)
gpuPOMv1.0\pom\cmcsst.c (84, 2014-12-24)
gpuPOMv1.0\pom\cmcsst.h (81, 2014-12-24)
gpuPOMv1.0\pom\cparallel_mpi.c (52571, 2014-12-24)
gpuPOMv1.0\pom\cparallel_mpi.h (1601, 2014-12-24)
gpuPOMv1.0\pom\cparallel_mpi_gpu.cu (114972, 2014-12-24)
gpuPOMv1.0\pom\cparallel_mpi_gpu.h (5309, 2014-12-24)
gpuPOMv1.0\pom\cpom.c (3168, 2014-12-24)
gpuPOMv1.0\pom\criver.c (233, 2014-12-24)
gpuPOMv1.0\pom\criver.h (104, 2014-12-24)
gpuPOMv1.0\pom\csolver.c (143306, 2014-12-24)
gpuPOMv1.0\pom\csolver.h (2303, 2014-12-24)
gpuPOMv1.0\pom\csolver_gpu.cu (466665, 2014-12-24)
gpuPOMv1.0\pom\csolver_gpu.h (8830, 2014-12-24)
gpuPOMv1.0\pom\csolver_gpu_kernel.h (18841, 2014-12-24)
... ...

run_exp002.sh is written for the dam-break test case described in the GMDD paper (doi:10.5194/gmdd-7-7651-2014, http://www.geosci-model-dev-discuss.net/7/7651/2014/gmdd-7-7651-2014.html). The "dam break" simulation is used to verify the correctness of gpuPOM and to test its performance and scalability. It is a baroclinic instability problem that simulates flows produced by horizontal temperature gradients. The model domain is a straight channel with a uniform depth of 50 m. Periodic boundary conditions are used in the east-west direction, and the channel is closed to the north and south. The horizontal resolution is 1 km x 1 km. To exercise a large computational grid, the default domain size of this test case is 962 x 722 horizontal grid points with 51 vertical sigma levels. Initially, the temperature is 15 degrees Celsius in the southern half of the channel and 25 degrees Celsius in the northern half. The salinity is fixed at 35 psu. The fluid is then allowed to adjust. In the first 3-5 days, geostrophic adjustment occurs. Unstable waves then develop due to baroclinic instability, and eventually eddies are generated. Notably, the gravity wave is confined to the middle of the channel by the Rossby radius of deformation.

gpuPOMv1.0 implements all the optimizations described in the paper, including read-only data cache utilization, local memory blocking, loop/function fusion, communication overlapping and I/O overlapping. This version was developed incrementally, with the unoptimized code preserved in nearby comments, so the optimized and original code can be compared side by side.

The mpiPOM was first translated into a C version and then into a CUDA-C version called gpuPOM. Therefore, in the source directory pom/, files with the suffix ".c" are the source files of the C-version mpiPOM, and files with the suffix ".cu" are the source files of the CUDA-version gpuPOM.
The main function is in "cpom.c", where the time-stepping function "advance_gpu()" is invoked. The functions that are actually executed are contained in the *.cu source files.

Running gpuPOM correctly has three requirements. First, a C/Fortran compiler and an MPI library are required. Test results in the paper are based on Intel compiler v14.0.1 and Intel MPI Library v4.1.3. Second, an NVIDIA GPU and the CUDA environment have to be installed. To reproduce optimizations such as read-only data cache utilization, a K20/K20X or newer NVIDIA GPU is required. Test results in the paper are based on K20X/K40 GPUs and CUDA 5.5. Last, the I/O procedure is based on the Pnetcdf library, which is also required.

To configure the "dam break" test case, the domain size and the number of MPI processes, set between lines 44 and 55 of run_exp002.sh, have to be chosen carefully. They should satisfy: (im_global-2)/(im_local-2) * (jm_global-2)/(jm_local-2) = n_proc. The "days" value in line 60 is the number of simulated days; Figure 7 in the paper shows the result on the 39th day.
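
The decomposition constraint above can be checked before launching a run. The following is a minimal sketch in C, not code from gpuPOM: the helper names pom_implied_nproc and pom_decomposition_ok are hypothetical, and the sketch assumes that the local interior (im_local-2, jm_local-2) must divide the global interior evenly in each direction, which is how the formula reads but is not stated explicitly in the text.

```c
/* Hypothetical helpers (not part of gpuPOM) for checking
 *   (im_global-2)/(im_local-2) * (jm_global-2)/(jm_local-2) = n_proc
 * Assumption: each local interior must evenly divide the global interior. */

/* Returns the number of MPI processes implied by the grid sizes,
 * or -1 if the sizes are invalid or do not divide evenly. */
int pom_implied_nproc(int im_global, int jm_global,
                      int im_local, int jm_local)
{
    int gi = im_global - 2;  /* global interior points, east-west */
    int gj = jm_global - 2;  /* global interior points, north-south */
    int li = im_local - 2;   /* local interior points, east-west */
    int lj = jm_local - 2;   /* local interior points, north-south */

    if (gi <= 0 || gj <= 0 || li <= 0 || lj <= 0)
        return -1;
    if (gi % li != 0 || gj % lj != 0)
        return -1;
    return (gi / li) * (gj / lj);
}

/* Returns 1 if the sizes are consistent with n_proc, 0 otherwise. */
int pom_decomposition_ok(int im_global, int jm_global,
                         int im_local, int jm_local, int n_proc)
{
    return pom_implied_nproc(im_global, jm_global,
                             im_local, jm_local) == n_proc;
}
```

For the default 962 x 722 domain, a 2 x 2 layout gives im_local = 482 and jm_local = 362, since 960/480 * 720/360 = 4; pom_decomposition_ok(962, 722, 482, 362, 4) therefore returns 1. The local sizes here are an illustrative choice, not values taken from run_exp002.sh.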
