DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Colossal-AI: A Unified Deep Learning System for Big Model Era
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
A state-of-the-art multithreading runtime: message-passing based, fast, scalable, ultra-low overhead
Paddle Distributed Training Examples (飞桨分布式训练示例): ResNet, BERT, GPT, MoE, DataParallel, ModelParallel, PipelineParallel, HybridParallel, AutoParallel, Zero Sharding, Recompute, GradientMerge, Offload, AMP, DGC, LocalSGD, Wide&Deep.
LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
Distributed Keras Engine: make Keras faster with only one line of code.
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
Ternary Gradients to Reduce Communication in Distributed Deep Learning (TensorFlow)
Orkhon: ML Inference Framework and Server Runtime
WIP. Veloce is a low-code, Ray-based parallelization library for efficient, heterogeneous machine learning computation.
Understanding the effects of data parallelism and sparsity on neural network training
OpenCL-powered Merklization using BLAKE3
Dependence-Based Code Transformation for Coarse-Grained Parallelism
Development of Project HPGO | Hybrid Parallelism Global Orchestration
A decentralized and distributed framework for training DNNs
Torch Automatic Distributed Neural Network (TorchAD-NN) training library. Built on top of TorchMPI, this module automatically parallelizes neural network training.
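
Most of the projects listed above build on the same underlying data-parallel pattern: replicate the model on every worker, feed each worker a disjoint shard of every batch, and all-reduce gradients before the optimizer step. The following is a minimal sketch of that pattern in plain PyTorch with DistributedDataParallel; it is not taken from any of the repositories above, and the model, dataset, batch size, and learning rate are placeholder assumptions.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process it spawns.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # One full model replica per process (the defining trait of data parallelism).
    model = torch.nn.Linear(1024, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    # DistributedSampler hands each rank a disjoint shard of the dataset.
    dataset = TensorDataset(torch.randn(4096, 1024), torch.randint(0, 10, (4096,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(ddp_model(x), y).backward()   # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=4 train_ddp.py` (the filename is arbitrary), each process trains on 32 samples per step, so the effective global batch size is 32 × world size, while the gradient all-reduce keeps the replicas identical.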
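
Libraries such as DeepSpeed (the first entry in the list) wrap this same pattern, plus optimizer-state sharding (ZeRO), offload, and mixed precision, behind a single engine object. Below is a rough sketch of DeepSpeed's documented initialize / backward / step flow; the config values, placeholder model, and synthetic dataset are illustrative assumptions, not recommendations.

```python
import deepspeed
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset

# Illustrative ZeRO stage-1 config; the batch size is an assumption for the sketch.
ds_config = {
    "train_batch_size": 64,
    "zero_optimization": {"stage": 1},
}

model = torch.nn.Linear(1024, 10)                      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(4096, 1024),       # synthetic data for the sketch
                        torch.randint(0, 10, (4096,)))

# initialize() returns an engine that owns data parallelism, the (ZeRO-sharded)
# optimizer state, and a data loader that shards `dataset` across ranks.
model_engine, optimizer, train_loader, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    training_data=dataset,
    config=ds_config,
)

for x, y in train_loader:
    x, y = x.to(model_engine.device), y.to(model_engine.device)
    loss = F.cross_entropy(model_engine(x), y)
    model_engine.backward(loss)   # gradients are averaged across data-parallel ranks
    model_engine.step()           # optimizer step and gradient zeroing, handled by the engine
```

The script would be launched with DeepSpeed's launcher (for example `deepspeed train.py`), which spawns one engine per GPU and shards the data loader across ranks.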