mgpu
Category: utility library
Language: C++
File size: 0KB
Downloads: 0
Upload date: 2011-08-09 16:27:08
Uploader: sh-1993
Description: A Multi-GPU C++ Programming Library
File list:
CMakeLists.txt (3354, 2011-08-09)
LICENSE_1_0.txt (1338, 2011-08-09)
bench/ (0, 2011-08-09)
bench/CMakeLists.txt (1943, 2011-08-09)
bench/algorithm/ (0, 2011-08-09)
bench/algorithm/CMakeLists.txt (258, 2011-08-09)
bench/algorithm/fft.cpp (7191, 2011-08-09)
bench/micro/ (0, 2011-08-09)
bench/micro/CMakeLists.txt (408, 2011-08-09)
bench/micro/devicecopy.cu (6015, 2011-08-09)
bench/micro/invoke.cpp (12194, 2011-08-09)
bench/micro/meminfo.cu (746, 2011-08-09)
bench/micro/reduce.cu (16539, 2011-08-09)
bench/stopwatch.cpp (4727, 2011-08-09)
bench/stopwatch.hpp (650, 2011-08-09)
doc/ (0, 2011-08-09)
doc/Makefile (4582, 2011-08-09)
doc/_static/ (0, 2011-08-09)
doc/_static/nature.css (4095, 2011-08-09)
doc/conf.py (7061, 2011-08-09)
doc/gettingstarted.rst (3090, 2011-08-09)
doc/index.rst (1750, 2011-08-09)
doc/make.bat (4507, 2011-08-09)
doc/rationale.rst (1526, 2011-08-09)
doc/tutorial.rst (9596, 2011-08-09)
example/ (0, 2011-08-09)
example/CMakeLists.txt (1546, 2011-08-09)
example/broadcast.cpp (816, 2011-08-09)
example/environment.cpp (634, 2011-08-09)
example/fft.cpp (1195, 2011-08-09)
example/invoke_kernel.cpp (999, 2011-08-09)
example/simple_seg_dev_vector.cpp (779, 2011-08-09)
example/simple_transfer.cpp (1038, 2011-08-09)
include/ (0, 2011-08-09)
include/boost/ (0, 2011-08-09)
include/boost/assert.hpp (4268, 2011-08-09)
include/boost/thread/ (0, 2011-08-09)
include/boost/thread/barrier.hpp (2376, 2011-08-09)
... ...
MGPU
====
The MGPU library strives to simplify the implementation of high-performance
applications and algorithms on multi-GPU systems. Its main goals are to
abstract platform-dependent functions and vendor-specific APIs, and to
simplify communication between different compute elements. The library is
currently an alpha release; its functionality is limited but already useful.
The documentation is available [here](http://sschaetz.github.com/mgpu/).

News
----
* 2011-08-04 MGPU v0.1 has been released; download the
  [bz2](https://github.com/sschaetz/mgpu/raw/archives/mgpu_0_1.tar.bz2) or
  [zip](https://github.com/sschaetz/mgpu/raw/archives/mgpu_0_1.zip) archive,
  or fork the code on GitHub.

Examples
--------
The following example shows how a batched FFT can be calculated in parallel on
all available GPUs in a system.
```cpp
#include <algorithm>
#include <complex>
#include <cstdlib>
#include <vector>

#include <mgpu/mgpu.hpp> // assumed umbrella header for the MGPU library

using namespace mgpu;

int main(void)
{
  environment e;
  {
    unsigned int dim = 128;
    unsigned int batch = 15;
    std::size_t blocksize = dim*dim;
    std::size_t size = blocksize*batch;

    std::vector<std::complex<float> > host_in(size, std::complex<float>(0));
    std::vector<std::complex<float> > host_out(size, std::complex<float>(0));
    std::generate(host_in.begin(), host_in.end(), rand);

    seg_dev_vector<std::complex<float> > in(size, blocksize);
    seg_dev_vector<std::complex<float> > out(size, blocksize);
    copy(host_in, in.begin());

    // plan a batched 2D FFT with the given dimensions and batch size
    fft<std::complex<float>, std::complex<float> > f(dim, dim, batch);
    f.forward(in, out);
    f.inverse(out, in);

    // fetch result
    copy(in, host_out.begin());
    synchronize_barrier();
  }
}
```
This example shows how a vector can be distributed across all devices and how
a kernel can be invoked to operate on the local data.
```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

#include <mgpu/mgpu.hpp> // assumed umbrella header for the MGPU library

using namespace mgpu;

// generate a random number in [0, 1)
float random_number() { return ((float)(rand()%100) / 100); }

// axpy CUDA kernel code
__global__ void axpy_kernel(
  float const a, float * X, float * Y, std::size_t size)
{
  unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < size) Y[i] = (a * X[i]) + Y[i];
}

// axpy CUDA kernel launcher
void axpy(float const a, dev_range<float> X, dev_range<float> Y)
{
  int threads = 256;
  int blocks = (X.size() + threads - 1) / threads;
  axpy_kernel<<< blocks, threads >>>(
    a, X.get_raw_pointer(), Y.get_raw_pointer(), Y.size());
}

int main(void)
{
  const std::size_t size = 1024;
  environment e;
  {
    std::vector<float> X(size), Y(size);
    float const a = 0.42f;
    std::generate(X.begin(), X.end(), random_number);
    std::generate(Y.begin(), Y.end(), random_number);

    seg_dev_vector<float> X_dev(size), Y_dev(size);
    copy(X, X_dev.begin()); copy(Y, Y_dev.begin());

    // calculate on all devices
    invoke_kernel_all(axpy, a, X_dev, Y_dev);

    // fetch the result
    copy(Y_dev, Y.begin());
    synchronize_barrier();
    // the result is now in Y
  }
}
```
Please refer to the [documentation](http://sschaetz.github.com/mgpu/)
for further examples and for information on how to get started.