bluefog

Category: Front-end development
Development tool: Python
File size: 5456KB
Downloads: 0
Upload date: 2023-03-28 03:38:13
Uploader: sh-1993
Description: Distributed and decentralized training framework for PyTorch over graph

File list:
.clang-format (59, 2023-03-28)
.pylintrc (17736, 2023-03-28)
.readthedocs.yml (251, 2023-03-28)
.travis.yml (1725, 2023-03-28)
LICENSE (11342, 2023-03-28)
MANIFEST.in (163, 2023-03-28)
Makefile (1742, 2023-03-28)
bluefog (0, 2023-03-28)
bluefog\__init__.py (22, 2023-03-28)
bluefog\common (0, 2023-03-28)
bluefog\common\__init__.py (0, 2023-03-28)
bluefog\common\basics.py (22552, 2023-03-28)
bluefog\common\common.cc (5064, 2023-03-28)
bluefog\common\common.h (9920, 2023-03-28)
bluefog\common\cuda (0, 2023-03-28)
bluefog\common\cuda\cuda_kernels.cu (4716, 2023-03-28)
bluefog\common\cuda\cuda_kernels.h (1095, 2023-03-28)
bluefog\common\cuda_util.cc (1525, 2023-03-28)
bluefog\common\cuda_util.h (1074, 2023-03-28)
bluefog\common\global_state.h (4197, 2023-03-28)
bluefog\common\half.cc (1333, 2023-03-28)
bluefog\common\half.h (4451, 2023-03-28)
bluefog\common\logging.cc (3700, 2023-03-28)
bluefog\common\logging.h (2510, 2023-03-28)
bluefog\common\message.cc (14702, 2023-03-28)
bluefog\common\message.h (5984, 2023-03-28)
bluefog\common\mpi_context.cc (25470, 2023-03-28)
bluefog\common\mpi_context.h (8625, 2023-03-28)
bluefog\common\mpi_controller.cc (68837, 2023-03-28)
bluefog\common\mpi_controller.h (7934, 2023-03-28)
bluefog\common\nccl_controller.cc (87072, 2023-03-28)
... ...

BlueFog
=======

.. image:: https://github.com/Bluefog-Lib/bluefog/actions/workflows/ci.yml/badge.svg
   :target: https://github.com/Bluefog-Lib/bluefog/actions/workflows/ci.yml/badge.svg
.. image:: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :target: https://img.shields.io/badge/License-Apache%202.0-blue.svg
   :alt: License
.. image:: https://zenodo.org/badge/225537951.svg
   :target: https://zenodo.org/badge/latestdoi/225537951
.. image:: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat
   :target: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat

[BlueFog logo]

BlueFog is a high-performance distributed training framework built with **decentralized optimization** algorithms. The goal of BlueFog is to make decentralized algorithms easy to use, fault-tolerant, friendly to heterogeneous environments, and even faster than training frameworks built with a parameter server or ring-allreduce.

Performance
-----------

The charts below show the performance of BlueFog on the ResNet50 benchmark. Each machine has 8 V100 GPUs (***GB memory) with NVLink enabled, and the inter-machine communication speed is 25Gbps. This is the same hardware setup you can get on AWS_. We test the scaling efficiency with a batch size of *** for a computationally intensive scenario, and a batch size of 32 for a communication-intensive scenario.

[Benchmark 1] [Benchmark 2]
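Scaling efficiency here is the usual ratio of measured throughput to ideal linear scaling; the README does not spell out the formula, so the sketch below (with placeholder numbers, not the benchmark results from the charts) is only for illustration.

.. code-block:: python

   # Scaling efficiency = throughput on N GPUs / (N * throughput on 1 GPU).
   # The numbers below are placeholders, not the results shown in the charts.
   def scaling_efficiency(throughput_n_gpus, throughput_one_gpu, n_gpus):
       return throughput_n_gpus / (n_gpus * throughput_one_gpu)

   print(scaling_efficiency(throughput_n_gpus=36_000, throughput_one_gpu=300, n_gpus=128))  # ~0.94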

In the figures, the black box represents ideal linear scaling. BlueFog achieves over 95% scaling efficiency, while Horovod reaches around 66% scaling efficiency with batch size *** on 128 GPUs. For the communication-intensive scenario with batch size 32, the scaling-efficiency gap between BlueFog and Horovod becomes even larger. To learn more about the BlueFog benchmark, check out our `performance page `_.

Overview
--------

BlueFog is built with decentralized optimization algorithms. This is fundamentally different from other popular distributed training frameworks, such as DistributedDataParallel provided by PyTorch, Horovod, BytePS, etc. In each communication stage, neither the typical star-shaped parameter-server topology nor the pipelined ring-allreduce topology is used. Instead, BlueFog exploits a virtual, and possibly dynamic, network topology (which can be of any shape) to achieve the best communication efficiency.

.. Main Idea: Replace expensive allreduce averaging over gradients by cheap neighbor averaging over parameters

In each training iteration, one process (or agent) updates its model with information received from its **direct** neighbors as defined by the virtual topology. All communication occurs only over the predefined virtual topology and no global communication is required. This is why the algorithms are called *decentralized*. Decentralized training algorithms have been proven in the literature to converge to the same solution as their standard centralized counterparts.

The topology determines the communication efficiency. BlueFog supports both **static** and **dynamic** topology usage. After extensive experiments, the dynamic Exponential-2 graph is observed to achieve the best performance when the number of agents is a power of 2, such as 4, 32, or 128 agents. In the Exponential-2 graph, each agent communicates with the neighbors that are 2 :sup:`0`, 2 :sup:`1`, ..., 2 :sup:`t` hops away. A **dynamic** topology means each agent selects only one of these neighbors in each iteration and moves on to the next neighbor in the next iteration, as illustrated in the following figure:

[Figure: one-peer-exp2]
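To make the dynamic schedule concrete, here is a small illustrative sketch (not the library's own implementation) of how a one-peer Exponential-2 schedule can be generated when the number of agents is a power of 2:

.. code-block:: python

   # Illustrative one-peer Exponential-2 schedule for n_agents = 2**tau agents.
   # At iteration t, every agent talks to the single neighbor 2**(t % tau) hops away.
   def one_peer_expo2_neighbors(rank, n_agents, iteration):
       """Return (send_to, recv_from) ranks for this agent at the given iteration."""
       tau = n_agents.bit_length() - 1      # n_agents is assumed to be 2**tau
       hop = 2 ** (iteration % tau)         # cycles through 1, 2, 4, ..., n_agents // 2
       return (rank + hop) % n_agents, (rank - hop) % n_agents

   # With 8 agents, agent 0 sends to agents 1, 2, 4, 1, 2, 4, ... over the iterations.
   print([one_peer_expo2_neighbors(0, 8, t) for t in range(6)])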

In this scenario, the communication cost for each iteration is only one unit of delay, one standard parameter size to transmit, and no communication conflict occurs, which is better than what a parameter server or ring-allreduce promises. As for loss and accuracy guarantees, please check out our theoretical paper and our `slides `_ presented at MLA'20. [A full tutorial will be added in the future.]

Quick Start
-----------

First, make sure your environment has ``python>=3.7`` and ``openmpi >= 4.0``. Then, install BlueFog with ``pip install --no-cache-dir bluefog``, or ``BLUEFOG_WITH_NCCL=1 pip install bluefog`` if NCCL is supported (``NCCL>=2.7``). Check the `install_bluefog `_ page if you need more information or other install options.

Using BlueFog With Jupyter Notebook
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

BlueFog can run interactively in a Jupyter Notebook. Please check out our `hello world notebook `_ or other notebooks in the examples folder to get started. Interactive BlueFog is great for research and algorithm experiments. For large-scale machine learning problems, we recommend using BlueFog with a script.

Using BlueFog With a Script
^^^^^^^^^^^^^^^^^^^^^^^^^^^

We provide a high-level wrapper for the torch optimizer. To turn an existing script into a distributed implementation, you only need to wrap the optimizer with our ``DistributedNeighborAllreduceOptimizer`` and run it through ``bfrun``. That is it!

.. code-block:: python

   # Execute Python functions in parallel through
   # bfrun -np 4 python file.py
   import torch
   import torch.optim as optim
   import bluefog.torch as bf
   ...
   bf.init()
   optimizer = optim.SGD(model.parameters(), lr=lr * bf.size())
   optimizer = bf.DistributedNeighborAllreduceOptimizer(
       optimizer, model=model
   )
   ...

The previous example is for static topology usage. For the dynamic topology case, you need a little more code:

.. code-block:: python

   from bluefog.common import topology_util
   ...
   # Same setup code as in the previous snippet
   dynamic_neighbors_gen = topology_util.GetInnerOuterExpo2DynamicSendRecvRanks(
       bf.size(), local_size=bf.local_size(), self_rank=bf.rank())

   def dynamic_topology_update(epoch, batch_idx):
       send_neighbors, recv_neighbors = next(dynamic_neighbors_gen)
       avg_weight = 1 / (len(recv_neighbors) + 1)
       optimizer.send_neighbors = send_neighbors
       optimizer.neighbor_weights = {r: avg_weight for r in recv_neighbors}
       optimizer.self_weight = avg_weight

   # Torch training code
   for epoch in range(epochs):
       for batch_idx, (data, target) in enumerate(train_loader):
           dynamic_topology_update(epoch, batch_idx)
           ...
           loss.backward()
           optimizer.step()

Check our BlueFog `dynamic topology neighbor averaging `_ page to see more on how to control and use the topology. See the BlueFog `examples`_ folder for the full code.

We also provide many low-level functions, which you can use as building blocks to construct your own distributed training algorithm. The following example illustrates how to run a simple consensus algorithm through BlueFog.

.. code-block:: python

   import torch
   import bluefog.torch as bf

   bf.init()
   x = torch.Tensor([bf.rank()])
   for _ in range(100):
       x = bf.neighbor_allreduce(x)
   print(f"{bf.rank()}: Average value of all ranks is {x}")

Check out our `API explanation page `_ to see all supported *synchronous* and *asynchronous* features.

The BlueFog source code was based on the `Horovod `_ repository. Hence, BlueFog shares many common features with Horovod, such as the `timeline `_, tensor fusion, etc. Here, we want to express our gratitude to the Horovod team.
Materials
---------

*Bluefog: Make decentralized algorithms practical for optimization and deep learning*. B. Ying, K. Yuan, H. Hu, Y. Chen, and W. Yin. arXiv preprint arXiv:2111.04287, 2021. `[link] `_

*Faster Learning over Networks and BlueFog*, BlueFog Team, invited talk at MLA, 2020. `[slides] `_

Cite
----

Bluefog is uploaded to Zenodo. An equivalent BibTeX reference covering all versions is below:

.. code-block:: bibtex

   % System paper
   @article{bluefog,
     author  = {Ying, Bicheng and Yuan, Kun and Hu, Hanbin and Chen, Yiming and Yin, Wotao},
     title   = {BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning},
     journal = {arXiv preprint arXiv:2111.04287},
     year    = {2021},
   }

   % Theoretical papers
   @article{ying2021exponential,
     title   = {Exponential Graph is Provably Efficient for Decentralized Deep Training},
     author  = {Ying, Bicheng and Yuan, Kun and Chen, Yiming and Hu, Hanbin and Pan, Pan and Yin, Wotao},
     journal = {Advances in Neural Information Processing Systems (NeurIPS), 34. Also available at arXiv:2110.13363},
     year    = {2021}
   }

   @inproceedings{yuan2021decentlam,
     title     = {DecentLaM: Decentralized Momentum SGD for Large-Batch Deep Training},
     author    = {Yuan, Kun and Chen, Yiming and Huang, Xinmeng and Zhang, Yingya and Pan, Pan and Xu, Yinghui and Yin, Wotao},
     booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
     pages     = {3029--3039},
     year      = {2021}
   }

   @article{yuan2020influence,
     title     = {On the influence of bias-correction on distributed stochastic optimization},
     author    = {Yuan, Kun and Alghunaim, Sulaiman A and Ying, Bicheng and Sayed, Ali H},
     journal   = {IEEE Transactions on Signal Processing},
     volume    = {68},
     pages     = {4352--4367},
     year      = {2020},
     publisher = {IEEE}
   }

Troubleshooting
---------------

Import bluefog.torch failed
^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you see the error message below, BlueFog is not installed properly. Please install BlueFog from the GitHub source and recompile it (e.g. ``make clean && make -j $(nproc) && BLUEFOG_WITH_NCCL=1 pip install .``).

.. code-block:: python

   import bluefog.torch as bf
   Traceback (most recent call last):
     File "", line 1, in
     File "/usr/local/lib/python3.7/dist-packages/bluefog/torch/__init__.py", line 34, in
       from bluefog.torch.mpi_ops import init, shutdown
     File "/usr/local/lib/python3.7/dist-packages/bluefog/torch/mpi_ops.py", line 23, in
       from bluefog.torch import mpi_lib  # C library
   ImportError: /usr/local/lib/python3.7/dist-packages/bluefog/torch/mpi_lib.cpython-37m-x86_***-linux-gnu.so: undefined symbol: _ZN7bluefog6common14NCCLController9AllreduceERNS0_16TensorTableEntryE

.. _AWS: https://aws.amazon.com/about-aws/whats-new/2018/12/introducing-amazon-ec2-p3dn-instances-our-most-powerful-gpu-instance-yet/
.. _examples: https://github.com/Bluefog-Lib/bluefog/tree/master/examples
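After recompiling, a minimal sanity check is to launch a tiny script through ``bfrun`` and confirm that every rank initializes; the file name below is only illustrative, not part of the repository.

.. code-block:: python

   # check_bluefog.py -- launch with: bfrun -np 2 python check_bluefog.py
   import bluefog.torch as bf

   bf.init()
   print(f"rank {bf.rank()} of {bf.size()} initialized successfully")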
