muda
Category: GPU / graphics
Development tool: C++
File size: 3349KB
Downloads: 0
Upload date: 2023-03-18 17:07:34
Uploader: sh-1993
Description: μ-CUDA, yet another painless CUDA programming paradigm. Features: structured launch, graphs, thread-only STL.
File list:
.clang-format (2011, 2023-07-14)
.editorconfig (40, 2023-07-14)
CMakeLists.txt (4533, 2023-07-14)
LICENSE (11357, 2023-07-14)
doc (0, 2023-07-14)
doc\dev_gui_guide.md (370, 2023-07-14)
doc\developer_zh.md (14383, 2023-07-14)
doc\implement.md (2451, 2023-07-14)
doc\import_guide.md (1095, 2023-07-14)
doc\profile_muda_zh.assets (0, 2023-07-14)
doc\profile_muda_zh.assets\profile_result.png (64662, 2023-07-14)
doc\profile_muda_zh.md (1532, 2023-07-14)
doc\tutorial_zh.md (13213, 2023-07-14)
example (0, 2023-07-14)
example\cooperative_groups (0, 2023-07-14)
example\cooperative_groups\async_transfer.cu (1508, 2023-07-14)
example\device_query (0, 2023-07-14)
example\device_query\device_query.cu (14062, 2023-07-14)
example\device_query\log.txt (12710, 2023-07-14)
example\event (0, 2023-07-14)
example\event\event.cu (1277, 2023-07-14)
example\example_common.h (2991, 2023-07-14)
example\graph (0, 2023-07-14)
example\graph\graph_bindless.cu (4829, 2023-07-14)
example\graph\graph_manager.cu (3804, 2023-07-14)
example\graph\graph_quick_start.cu (1991, 2023-07-14)
example\hello_muda (0, 2023-07-14)
example\hello_muda\hello_muda.cu (621, 2023-07-14)
example\hello_muda\muda_vs_cuda.cu (1008, 2023-07-14)
example\hello_muda\vector_add.cu (1853, 2023-07-14)
example\launch (0, 2023-07-14)
example\launch\parallel_for.cu (2473, 2023-07-14)
example\main.cpp (53, 2023-07-14)
example\mk.py (668, 2023-07-14)
example\pba (0, 2023-07-14)
example\pba\sph2d.cu (10841, 2023-07-14)
... ...
# muda
MUDA is **μ-CUDA**, yet another painless CUDA programming **paradigm**.
> stop bothering yourself with raw CUDA; stay elegant and clear-minded.
## overview
easy launch:
```c++
#include <muda/muda.h>
using namespace muda;

int main()
{
    launch(1, 1)
        .apply(
            [] __device__()
            {
                print("hello muda!\n");
            })
        .wait();
}
```
muda vs cuda:
```c++
/*
 * muda style
 */
void muda()
{
    device_vector<int> dv(64, 1);
    stream             s;
    parallel_for(2, 16, 0, s)  // parallel semantics
        .apply(64,  // automatically cover the range using (griddim=2, blockdim=16)
               [
                   // map the device_vector to a proper viewer,
                   // which can be trivially copied between device and host
                   dv = make_viewer(dv)
               ] __device__(int i) mutable
               {
                   dv(i) *= 2;  // safe: the viewer checks the boundary automatically
               })
        .wait();  // happy waiting, muda remembers the stream
    //.apply(...) // to go on with the same config, just call .apply() again
}

/*
 * cuda style
 */

// manually create the kernel
__global__ void times2(int* i, int N)  // changing the parameters later is painful
{
    auto tid = threadIdx.x;
    if(tid < N)  // check the corner case manually
    {
        i[tid] *= 2;  // unsafe: no boundary check at all
    }
}

void muda_vs_cuda()
{
    // for brevity, we just use thrust to allocate memory
    thrust::device_vector<int> dv(64, 1);
    // cast to a raw pointer
    auto dvptr = thrust::raw_pointer_cast(dv.data());
    // create a stream and check errors
    cudaStream_t s;
    checkCudaErrors(cudaStreamCreate(&s));
    // call the kernel (which always ruins IntelliSense, if you use VS)
    times2<<<1, 64, 0, s>>>(dvptr, dv.size());
    // boring waiting and error checking
    checkCudaErrors(cudaStreamSynchronize(s));
}
```
## quick start
run example:
```shell
$ xmake f --example=true
$ xmake
$ xmake run muda_example hello_muda
```
to show all examples:
```shell
$ xmake run muda_example -l
```
play all examples:
```shell
$ xmake run muda_example
```
## tutorial
- [tutorial_zh](https://github.com/MuGdxy/muda/blob/master/doc/tutorial_zh.md)
## features
### thread_only container
allows you to use STL-like containers and algorithms within a single thread.
```cpp
#include <muda/muda.h>
#include <muda/thread_only/priority_queue.h>
using namespace muda;
namespace to = muda::thread_only;

int main()
{
    launch(1, 1)
        .apply(
            [] __device__() mutable
            {
                to::priority_queue<int> queue;
                auto& container = queue.get_container();
                container.reserve(16);
                queue.push(4); queue.push(5); queue.push(6);
                queue.push(7); queue.push(8); queue.push(9);
                while(!queue.empty())
                {
                    print("%d ", queue.top());
                    queue.pop();
                }
                // result: 9 8 7 6 5 4
            })
        .wait();
}
```
### graph support
a friendly way to create a CUDA graph.
```cpp
void graph_example()
{
    device_var<int> value = 1;
    // create a graph
    auto graph = graph::create();

    // create kernels as graph node parameters (we switch from .apply to .asNodeParms)
    auto pA = parallel_for(1).asNodeParms(1,
        [] __device__(int i) mutable
        {
            print("A\n");
        });
    auto pB = parallel_for(1).asNodeParms(1,
        [] __device__(int i) mutable
        {
            print("B\n");
        });
    auto phB = host_call().asNodeParms(
        [] __host__()
        {
            std::cout << "host" << std::endl;
        });
    auto pC = parallel_for(1).asNodeParms(1,
        [value = make_viewer(value)] __device__(int i) mutable
        {
            print("C, value=%d\n", value);
            value = 2;
        });
    auto pD = launch(1, 1).asNodeParms(
        [value = make_viewer(value)] __device__() mutable
        {
            print("D, value=%d\n", value);
        });

    // create nodes and dependencies
    auto kA = graph->addKernelNode(pA);
    auto kB = graph->addKernelNode(pB);
    // hB depends on kA and kB
    auto hB = graph->addHostNode(phB, {kA, kB});
    // kC depends on hB
    auto kC = graph->addKernelNode(pC, {hB});
    // kD depends on kC
    auto kD = graph->addKernelNode(pD, {kC});

    // create an instance (a.k.a. graphExec)
    auto instance = graph->instantiate();
    // launch on the default stream
    instance->launch();
}
```
## Contribute
Go to [developer_zh.md](https://github.com/MuGdxy/muda/blob/master/doc/developer_zh.md) and [zhihu-ZH](https://zhuanlan.zhihu.com/p/592439225) for further info.
## More Info
[zhihu-ZH](https://zhuanlan.zhihu.com/p/592439225): a description of muda (in Chinese).