halide-cuda-sat-perf

Category: GPU/Graphics cards
Development tool: CSS
File size: 73KB
Downloads: 0
Upload date: 2014-07-22 23:50:28
Uploader: sh-1993
Description: Summed area table performance test

File list:
Makefile (129, 2014-07-11)
Makefile.common (121, 2014-07-11)
cuda_summed_table (0, 2014-07-11)
cuda_summed_table\Makefile (1027, 2014-07-11)
cuda_summed_table\include (0, 2014-07-11)
cuda_summed_table\include\alloc.h (2179, 2014-07-11)
cuda_summed_table\include\defs.h (2037, 2014-07-11)
cuda_summed_table\include\dvector.h (10243, 2014-07-11)
cuda_summed_table\include\error.h (1180, 2014-07-11)
cuda_summed_table\include\extension.h (5813, 2014-07-11)
cuda_summed_table\include\gpuconsts.cuh (4949, 2014-07-11)
cuda_summed_table\include\gpudefs.h (3647, 2014-07-11)
cuda_summed_table\include\symbol.h (7730, 2014-07-11)
cuda_summed_table\include\util.h (28152, 2014-07-11)
cuda_summed_table\src (0, 2014-07-11)
cuda_summed_table\src\defs.cc (0, 2014-07-11)
cuda_summed_table\src\gpudefs.cu (1, 2014-07-11)
cuda_summed_table\src\kernel_perf.cc (1271, 2014-07-11)
cuda_summed_table\src\sat.cu (4294, 2014-07-11)
hl_summed_table (0, 2014-07-11)
hl_summed_table\Makefile (602, 2014-07-11)
hl_summed_table\hl_summed_table.cpp (5050, 2014-07-11)
nv_profiler (0, 2014-07-11)
nv_profiler\cuda_summed_table.nvvp (6937, 2014-07-11)
nv_profiler\hl_summed_table.nvvp (11085, 2014-07-11)
ptx (0, 2014-07-11)
ptx\fermi (0, 2014-07-11)
ptx\fermi\hl_0.ptx (6808, 2014-07-11)
ptx\fermi\hl_0.sass (42746, 2014-07-11)
ptx\fermi\hl_1.ptx (5517, 2014-07-11)
ptx\fermi\hl_2.ptx (6698, 2014-07-11)
ptx\fermi\hl_3.ptx (6740, 2014-07-11)
ptx\fermi\hl_4.ptx (6696, 2014-07-11)
ptx\fermi\sat.ptx (50746, 2014-07-11)
ptx\fermi\sat.sass (24999, 2014-07-11)
ptx\gaurav-fermi (0, 2014-07-11)
ptx\gaurav-fermi\version_0.ptx (7023, 2014-07-11)
... ...

Halide performance debugging
============================

Comparison between the Halide and CUDA versions of an app that partitions a 4096x4096 image into 32x32 tiles and computes the summed area table within each tile.

**Halide version**: runs several kernels. Only version 0 computes the summed area table; the other kernels are meant to demonstrate the effect of different Halide update definitions on instruction count and global memory throughput.

**CUDA version**: source code from GPU efficient recursive filtering and summed area table (SIGGRAPH 2011), [Nehab et al.]

Compilation
-----------

- Makefiles are provided for both projects
- Edit Makefile.common to set the CUDA include path and Halide base path

Profiling files
---------------

The directory nv_profiler contains the NVIDIA profiling tool logs. They can be opened using:

    $ nvvp cuda_summed_table.nvvp
    $ nvvp hl_summed_table.nvvp

Generated ptx and statement files
---------------------------------

The directories ptx and stmt contain the generated ptx and statement files for the different Halide kernels. These can be regenerated by:

    $ HL_JIT_TARGET=cuda-gpu_debug HL_DEBUG_CODEGEN=1 ./hl_summed_table 2> hl_summed_table.ptx
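
For readers who want a reference for what both versions compute, the per-tile summed area table described at the top of this README can be sketched on the CPU as below. This is only an illustrative sketch, not the Halide or CUDA code from this repository: the function name `tiled_sat` and its parameters are hypothetical, and it assumes a row-major float image whose width and height are multiples of the tile size (as 4096 is of 32).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: computes an independent summed area table
// inside every tile x tile block of a row-major width x height image.
// Assumption: width and height are multiples of tile (e.g. 4096 and 32).
std::vector<float> tiled_sat(const std::vector<float>& in,
                             std::size_t width, std::size_t height,
                             std::size_t tile) {
    assert(tile > 0 && width % tile == 0 && height % tile == 0);
    assert(in.size() == width * height);

    std::vector<float> out(in.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            // Sums restart at every tile boundary, so neighbours that
            // fall outside the current tile contribute zero.
            const bool first_col = (x % tile == 0);
            const bool first_row = (y % tile == 0);
            const float left = first_col ? 0.0f : out[y * width + (x - 1)];
            const float up = first_row ? 0.0f : out[(y - 1) * width + x];
            const float diag = (first_col || first_row)
                                   ? 0.0f
                                   : out[(y - 1) * width + (x - 1)];
            // Standard SAT recurrence: S(x,y) = I + S_left + S_up - S_diag.
            out[y * width + x] = in[y * width + x] + left + up - diag;
        }
    }
    return out;
}
```

Calling `tiled_sat(image, 4096, 4096, 32)` on a `std::vector<float>` of 4096*4096 pixels gives a result that the GPU outputs could be checked against, up to floating-point summation order.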
