Fused Matrix Library - C/C++ Development

  • A6_980280
    Author
  • 1.9MB
    File size
  • zip
    File format
  • 0
    Favorites
  • VIP exclusive
    Resource type
  • 0
    Downloads
  • 2022-04-16 07:23
    Upload date
fml version: 0.3-0. License: BSL-1.0. Project home: https://github.com/fml-fam/fml. Bug reports: https://github.com/fml-fam/fml/issues. Documentation: https://fml-fam.github.io/fml. fml is the Fused Matrix Library, a multi-source, header-only C++ library for dense matrix computing. The emphasis is on real-valued matrix types (float, double, and __half) for numerical operations useful for data analysis. The goal of fml is to be "medium-level": that is, high-level compared to working directly with e.g. the BLAS or CUDA, but low(er)-level compared to other C++ matrix frameworks.
fml-master.zip
Content Introduction
# fml

* **Version:** 0.5-0
* **Status:** [![Build Status](https://travis-ci.org/fml-fam/fml.png)](https://travis-ci.org/fml-fam/fml)
* **License:** [BSL-1.0](http://opensource.org/licenses/BSL-1.0)
* **Project home**: https://github.com/fml-fam/fml
* **Bug reports**: https://github.com/fml-fam/fml/issues
* **Documentation**: https://fml-fam.github.io/fml

fml is the Fused Matrix Library, a multi-source, header-only C++ library for dense matrix computing. The emphasis is on real-valued matrix types (`float`, `double`, and `__half`) for numerical operations useful for data analysis.

The goal of fml is to be "medium-level". That is, high-level compared to working directly with e.g. the BLAS or CUDA™, but low(er)-level compared to other C++ matrix frameworks. Some knowledge of the use of LAPACK will make many choices in fml make more sense.

The library provides 4 main classes: `cpumat`, `gpumat`, `parmat`, and `mpimat`. These are mostly what they sound like, but the particular details are:

* <font color="blue">CPU</font>: Single-node CPU computing (multi-threaded if using a multi-threaded BLAS and linking with OpenMP).
* <font color="green">GPU</font>: Single-GPU computing.
* <font color="red">MPI</font>: Multi-node computing via ScaLAPACK (plus GPUs if using [SLATE](http://icl.utk.edu/slate/)).
* <font color="orange">PAR</font>: Multi-node and/or multi-GPU computing.

There are some differences in how objects of any particular type are constructed, but the high-level APIs are largely the same between the objects. The goal is to be able to quickly create laptop-scale prototypes that are then easily converted into large-scale gpu/multi-node/multi-gpu/multi-node+multi-gpu codes.

## Installation

The library is header-only, so no installation is strictly necessary. You can just include a copy/submodule in your project. However, if you want some analogue of `make install`, then you could do something like:

```bash
ln -s ./src/fml /usr/include/
```

## Dependencies and Other Software

There are no external header dependencies, but there are some shared libraries you need to have (more information below):

* CPU code needs [LAPACK](http://performance.netlib.org/lapack/) (I recommend [OpenBLAS](https://github.com/xianyi/OpenBLAS)).
* GPU code needs [NVIDIA® CUDA™](https://developer.nvidia.com/cuda-downloads).
* MPI code needs [ScaLAPACK](http://performance.netlib.org/scalapack/).
* PAR code needs the libraries required by the CPU and/or GPU features, noted above.

Other software we use:

* Tests use [catch2](https://github.com/catchorg/Catch2) (a copy of which is included under `tests/`).

You can find some examples of how to use the library in the `examples/` tree. Right now there is no real build system beyond some ad hoc makefiles; but ad hoc is better than no hoc.

Depending on which class(es) you want to use, here are some general guidelines for using the library in your own project:

* CPU: `cpumat`
    - Compile with your favorite C++ compiler.
    - Link with LAPACK and BLAS (and ideally with OpenMP).
* GPU: `gpumat`
    - Compile with `nvcc`.
    - For most functionality, link with libcudart, libcublas, and libcusolver. Link with libcurand if using the random generators. Link with libnvidia-ml if using nvml (if you're only using this, then you don't need `nvcc`; an ordinary C++ compiler will do). If you have CUDA installed and do not know what to link with, there is no harm in linking with all of these.
* MPI: `mpimat`
    - Compile with `mpicxx`.
    - Link with libscalapack.
* PAR: `parmat`
    - Compile with `mpicxx`.
    - Link with the CPU libraries if using `parmat_cpu`; link with the GPU libraries if using `parmat_gpu` (you can use both).

Check the makefiles in the `examples/` tree if none of that makes sense; a rough sketch of the compile lines follows.
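As a minimal sketch of those guidelines, the compile lines might look like the following. The source file names (`myprog.cpp`, `myprog.cu`) are placeholders, and the library names are distribution defaults; adjust the include path and linker flags for your platform.

```bash
# CPU (cpumat): any C++ compiler; link LAPACK/BLAS, ideally with OpenMP
g++ -I/path/to/fml/src -fopenmp myprog.cpp -o myprog -llapack -lblas

# GPU (gpumat): nvcc; link the CUDA runtime and math libraries
nvcc -I/path/to/fml/src myprog.cu -o myprog -lcudart -lcublas -lcusolver -lcurand -lnvidia-ml

# MPI (mpimat): the MPI compiler wrapper; link ScaLAPACK
mpicxx -I/path/to/fml/src myprog.cpp -o myprog -lscalapack
```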
## Example

Here's a simple example computing the SVD with some data held on a single CPU:

```C++
#include <fml/cpu.hh>
using namespace fml;

int main()
{
  len_t m = 3;
  len_t n = 2;

  cpumat<float> x(m, n);
  x.fill_linspace(1.f, (float)m*n);

  x.info();
  x.print(0);

  cpuvec<float> s;
  linalg::svd(x, s);

  s.info();
  s.print();

  return 0;
}
```

Save as `svd.cpp` and build with:

```bash
g++ -I/path/to/fml/src -fopenmp svd.cpp -o svd -llapack -lblas
```

You should see output like:

```
# cpumat 3x2 type=f
1 4
2 5
3 6

# cpuvec 2 type=f
9.5080 0.7729
```

The API is largely the same if we change the object storage, but we have to change the object initialization. For example, if `x` is an object of class `mpimat`, we still call `linalg::svd(x, s)`. The differences lie in the creation of the objects. Here is how we might change the above example to use distributed data:

```C++
#include <fml/mpi.hh>
using namespace fml;

int main()
{
  grid g = grid(PROC_GRID_SQUARE);
  g.info();

  len_t m = 3;
  len_t n = 2;

  mpimat<float> x(g, m, n, 1, 1);
  x.fill_linspace(1.f, (float)m*n);

  x.info();
  x.print(0);

  cpuvec<float> s;
  linalg::svd(x, s);

  if (g.rank0())
  {
    s.info();
    s.print();
  }

  g.exit();
  g.finalize();

  return 0;
}
```

In practice, using such small block sizes for an MPI matrix is probably not a good idea; we only do so for the sake of demonstration (we want each process to own some data). A more realistic blocking is sketched after the output below. We can build this new example via:

```bash
mpicxx -I/path/to/fml/src -fopenmp svd.cpp -o svd -lscalapack-openmpi
```

We can launch the example with multiple processes via:

```bash
mpirun -np 4 ./svd
```

And here we see:

```
## Grid 0 2x2

# mpimat 3x2 on 2x2 grid type=f
1 4
2 5
3 6

# cpuvec 2 type=f
9.5080 0.7729
```
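For a real workload one would pass larger blocking factors than 1x1. Only one line of the MPI example above needs to change; the 16x16 blocking below is an illustrative value, not a recommendation from the library:

```C++
// Same shape and grid, but 16x16 blocking for the 2-d block-cyclic
// distribution; the blocking factors are the last two constructor
// arguments (16 is an illustrative choice, not a tuned one).
mpimat<float> x(g, m, n, 16, 16);
```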
## High-Level Language Bindings

* R bindings: [fmlr](https://github.com/fml-fam/fmlr)

## Header and API Stability

tldr:

* Use the super headers (or read the long explanation)
    - CPU - `fml/cpu.hh`
    - GPU - `fml/gpu.hh`
    - MPI - `fml/mpi.hh`
    - PAR - still evolving
* Existing APIs are largely stable. Most changes will be additions rather than modifications.

The project is young and things are still mostly evolving. The current status is:

### Headers

There are currently "super headers" for the CPU (`fml/cpu.hh`), GPU (`fml/gpu.hh`), and MPI (`fml/mpi.hh`) backends. These include all relevant sub-headers. These are "frozen" in the sense that they will not move and will always include everything. However, as more namespaces are added, those too will be included in the super headers. The headers one folder level deep (e.g. those in `fml/cpu`) are similarly frozen, although more may be added over time. Headers two folder levels deep are internal and subject to change at basically any time.

### API

* **Frozen**: Existing APIs will not be developed further.
    - none
* **Stable**: Existing APIs are not expected to change. Some new features may be added slowly.
    - cpumat/gpumat/mpimat classes
    - copy namespace functions
    - linalg namespace functions (all but parmat)
* **Stabilizing**: Core class naming and construction/destruction is probably finalized. Function/method names and arguments are solidifying, but may change somewhat. New features are still being developed.
    - dimops namespace functions
* **Evolving**: Function/method names and arguments are subject to change. New features are actively being developed.
    - stats namespace functions
* **Experimental**: Nothing is remotely finalized.
    - parmat - all functions and methods

Internals are evolving and subject to change at basically any time. Notable changes will be mentioned in the changelog.

## Philosophy and Similar Projects

Some similar C/C++ projects worth mentioning:

* [Armadillo](http://arma.sourceforge.net/)
* [Eigen](http://eigen.tuxfamily.org/)
* [Boost](http://www.boost.org/)
* [PETSc](https://www.mcs.anl.gov/petsc/)
* [GSL](https://www.gnu.org/software/gsl/)

These are all great libraries which have stood the test of time. Armadillo in particular