Pulse · microsoft/DeepSpeed · GitHub

April 7, 2023 – April 14, 2023

Overview

37 Active pull requests

43 Active issues

1 Release published by 1 person

v0.9.0 DeepSpeed v0.9.0
published Apr 13, 2023

23 Pull requests merged by 15 people

Nested zero.Init() and dynamically defined model class
#2989 merged Apr 14, 2023
Update DeepSpeed-Chat docs with latest changes to scripts
#3219 merged Apr 13, 2023
Update DS-Chat docs for v0.9.0
#3216 merged Apr 13, 2023
[CPU support] Optionally bind each rank to different cores on host
#2881 merged Apr 12, 2023
Update AMD workflows
#3179 merged Apr 12, 2023
fix license badge
#3200 merged Apr 12, 2023
feat: Add support for `NamedTuple` when sharding parameters [#3029]
#3037 merged Apr 12, 2023
fix hero figure
#3199 merged Apr 12, 2023
Add Japanese version of ChatGPT-like pipeline blog
#3194 merged Apr 12, 2023
ChatGPT Chinese blog
#3193 merged Apr 12, 2023
Fix typo
#3164 merged Apr 11, 2023
Fix typo
#3183 merged Apr 11, 2023
Fix references to figures
#3189 merged Apr 11, 2023
DeepSpeed Chat
#3186 merged Apr 11, 2023
add news item.
#3188 merged Apr 11, 2023
[docs] add run command for 13b
#3187 merged Apr 11, 2023
Add DeepSpeed-Chat Blogpost
#3185 merged Apr 11, 2023
deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache
#2803 merged Apr 10, 2023
zero.Init() should pin params in GPU memory as requested
#2953 merged Apr 7, 2023
op_builder: conditionally compute relative path for hip compiled files
#3095 merged Apr 7, 2023
fixing a bug in CPU Adam and Adagrad
#3109 merged Apr 7, 2023
Remove benchmark code
#3157 merged Apr 7, 2023
Update curriculum-learning.md
#3031 merged Apr 7, 2023

14 Pull requests opened by 14 people

Fix handling of (CUDA,ROCR)_VISIBLE_DEVICES
#3165 opened Apr 8, 2023
stage_1_and_2.py: do gradient scale only for fp16
#3166 opened Apr 9, 2023
[Fix] _conv_flops_compute when padding is a str and stride=1
#3169 opened Apr 9, 2023
Enable auto TP policy for llama model
#3170 opened Apr 10, 2023
AMD Kernel Compatibility Fixes
#3180 opened Apr 11, 2023
Documentation for DeepSpeed Accelerator Abstraction Interface
#3184 opened Apr 11, 2023
improving int4 asymmetric quantization accuracy
#3190 opened Apr 11, 2023
Add HE support for the rest of model containers
#3191 opened Apr 12, 2023
Update automatic-tensor-parallelism.md
#3198 opened Apr 12, 2023
Make deepspeed.zero.Init() idempotent
#3203 opened Apr 12, 2023
Additional changes to support MI200
#3204 opened Apr 12, 2023
zero3 checkpoint frozen params
#3205 opened Apr 12, 2023
[update] reference in cifar-10
#3212 opened Apr 13, 2023
Fix for Stable Diffusion
#3218 opened Apr 13, 2023

13 Issues closed by 12 people

What is the principle for 15x speedup? Thank you very much!
#3225 closed Apr 14, 2023
[BUG]pip install deepspeed install fail
#3213 closed Apr 14, 2023
[BUG]deepspeed.ops.adam.DeepSpeedCPUAdam how to use in config file?
#3227 closed Apr 14, 2023
[BUG] RuntimeError: Failed to import transformers.models.opt.modeling_opt
#3215 closed Apr 14, 2023
[BUG]TypeError: allocate_workspace_fp16(): incompatible function arguments.
#3209 closed Apr 14, 2023
[BUG] g++: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
#3221 closed Apr 14, 2023
[BUG] using ZeRO-stage 3 with offload, buffer configs don't make memory usage change.
#2869 closed Apr 13, 2023
[BUG] `AttributeError: 'CSVConfig' object has no attribute 'group'`
#2853 closed Apr 12, 2023
[REQUEST]The input sequences must be in the same length when using pipeline parallelism for all batches?
#3195 closed Apr 12, 2023
how to make different rank use same dataloader in pipeline
#3131 closed Apr 12, 2023
[TYPO DOCS] Typo in DeepSpeed Configuration JSON documentation
#3163 closed Apr 11, 2023
[BUG] Cannot build transformer_inference extension
#3171 closed Apr 11, 2023
[BUG] pip install DeepSpeed Error
#3145 closed Apr 10, 2023

30 Issues opened by 26 people

[BUG] return getattr(args, f"{model_type[step_num]}_model")
#3231 opened Apr 14, 2023
[BUG] Fail to run the example in/DeepSpeedExamples
#3229 opened Apr 14, 2023
[BUG] batch_size check failed with zero 2 (deepspeed v0.9.0)
#3228 opened Apr 14, 2023
CUDA out of memory
#3224 opened Apr 14, 2023
[BUG] Installed CUDA version 12.1 does not match the version torch was compiled with 11.8
#3223 opened Apr 14, 2023
[BUG]the following arguments are required: user_script, user_args
#3222 opened Apr 14, 2023
[REQUEST] Please spend more time on the usability of the project, especially the doc.
#3220 opened Apr 14, 2023
[BUG] multi-node inference initialization fails when trying not to use replace_with_kernel_inject
#3217 opened Apr 13, 2023
[BUG]Out of memory when training, and is streaming mode supported ?
#3214 opened Apr 13, 2023
[BUG] Unable to pre-compile async_io
#3211 opened Apr 13, 2023
[BUG] NCCL out of memory on `save_checkpoint()`
#3210 opened Apr 13, 2023
[BUG]RuntimeError: Step 1 exited with non-zero status 1
#3208 opened Apr 13, 2023
[BUG]error: can't copy 'deepspeed/accelerator': doesn't exist or not a regular file
#3207 opened Apr 13, 2023
[REQUEST] "省钱15倍" ("save 15x the money"): isn't this an obviously ungrammatical sentence?
#3206 opened Apr 13, 2023
[BUG] "with deepspeed.zero.Init()" is not idempotent
#3202 opened Apr 12, 2023
[BUG] error: use of undeclared identifier '__double2half'; did you mean '__double2hiint'?"
#3197 opened Apr 12, 2023
whl does not get created following the instructions on Windows 11 [BUG]
#3196 opened Apr 12, 2023
cpu memory out of use when inferring on 30b model
#3192 opened Apr 12, 2023
[BUG] ds inference succeed for 2 gpus, oom for 4 gpus
#3182 opened Apr 11, 2023
[BUG] Inference failed several times
#3181 opened Apr 11, 2023
[BUG] Intermittent RuntimeError: The specified pointer resides on host memory and is not registered with any CUDA device.
#3178 opened Apr 10, 2023
[BUG] Deepspeed inference fp16 gives different results than HuggingFace with FlanT5-XL
#3177 opened Apr 10, 2023
[Deepspeed stage-3 student+teacher crash]
#3175 opened Apr 10, 2023
AssertionError: AutoTP not supported for model. Please use kernel injection since container policy for model exists.
#3174 opened Apr 10, 2023
Installing Ops for using with Pyinstaller
#3173 opened Apr 10, 2023
Does ds inference support op fusion for multi-head attention?
#3172 opened Apr 10, 2023
[BUG] High GPU memory use when fine-tuning Flan-T5-xxl (11B) using stage 3
#3168 opened Apr 9, 2023
[BUG] Memory increase consistently when using multiple NCCL_IB_HCA.
#3167 opened Apr 9, 2023
[BUG] state dict loading issue when running an example in https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-scripts#run
#3161 opened Apr 7, 2023
[BUG] exits with return code = -9
#3160 opened Apr 7, 2023

55 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[BUG] Fail to inference with 8bit quantized bloom-3b model, shape mismatch error
#2865 commented on Apr 13, 2023 • 6 new comments
add bf16 cuda kernel support
#3092 commented on Apr 14, 2023 • 6 new comments
Installation on Windows 10 (Deepspeed inference)
#2588 commented on Apr 11, 2023 • 3 new comments
Checks for user injection policy
#3052 commented on Apr 11, 2023 • 3 new comments
[BUG] (NVMe Offload with Zero3) Not enough buffers 0 for swapping 1
#3062 commented on Apr 7, 2023 • 2 new comments
[REQUEST] Hey, Microsoft...Could you PLEASE Support Your Own OS?
#2427 commented on Apr 11, 2023 • 2 new comments
Zero-R source code
#452 commented on Apr 11, 2023 • 2 new comments
[BUG] Can't build sparse attention op with PyTorch 2.0
#3117 commented on Apr 11, 2023 • 2 new comments
RuntimeError: 'weight' must be 2-D while training Flan-T5 models with stage 3
#2746 commented on Apr 12, 2023 • 2 new comments
[BUG] Install on AMD ROCm system but fails to build on CUDA dependencies
#3091 commented on Apr 12, 2023 • 2 new comments
[BUG] Outputs of type NamedTuple cause crash in `_apply_to_tensors_only` (stage 3 + shard parameters)
#3029 commented on Apr 13, 2023 • 2 new comments
[BUG] ValueError: max() arg is an empty sequence using bf16 zero stage3
#2820 commented on Apr 13, 2023 • 2 new comments
[BUG]
#3069 commented on Apr 13, 2023 • 2 new comments
[BUG] CUDA illegal memory access on large batch with ZeRO-infinity
#1852 commented on Apr 14, 2023 • 2 new comments
Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2
#2999 commented on Apr 14, 2023 • 2 new comments
[CPU] Support Intel CPU inference
#3041 commented on Apr 13, 2023 • 2 new comments
Update torch version check in building sparse_attn
#3152 commented on Apr 13, 2023 • 2 new comments
deepspeed on T4 GPU server and run Stable Diffusion model inference error
#2957 commented on Apr 8, 2023 • 1 new comment
[BUG] Error "exits with return code -7" when finetuning FLANT5-xxl on 8x A100
#2897 commented on Apr 8, 2023 • 1 new comment
[BUG] ds-model inference results go far away from that of original model (megatron ) attention-context_layer error
#3124 commented on Apr 8, 2023 • 1 new comment
[BUG] Incorrect Model Output For Contrastive Search
#2809 commented on Apr 10, 2023 • 1 new comment
[BUG] VRAM increasing after each call to model
#3073 commented on Apr 10, 2023 • 1 new comment
Concurrent generation of responses (one GPU, multiple users)
#3080 commented on Apr 10, 2023 • 1 new comment
[BUG] Can't compile DeepSpeed version 0.8.1+ with Cuda 11.7
#2914 commented on Apr 10, 2023 • 1 new comment
[Error] [Win] Unable to pre-compile async_io on Windows
#1769 commented on Apr 10, 2023 • 1 new comment
[BUG] 'StableDiffusionPipeline' object has no attribute 'children'
#2968 commented on Apr 11, 2023 • 1 new comment
[BUG]pip install doesn't work. Please eeelp.
#2137 commented on Apr 11, 2023 • 1 new comment
[BUG] Inference fail with "mat1 and mat2 shapes cannot be multiplied" for Llama model.
#3099 commented on Apr 11, 2023 • 1 new comment
4 X A100 80G train HF_13B Llama, error
#3153 commented on Apr 11, 2023 • 1 new comment
[BUG] Resume from checkpoint Out of memory error (SIGTERM: Killed) for a large model
#3104 commented on Apr 11, 2023 • 1 new comment
[REQUEST] Add more device-agnostic compression algorithms
#2894 commented on Apr 11, 2023 • 1 new comment
[BUG] INFLIGHT parameters after evaluation
#3068 commented on Apr 12, 2023 • 1 new comment
[math] what network throughput is required to handle ZeRO-3 traffic?
#2928 commented on Apr 12, 2023 • 1 new comment
subprocess.CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1.
#1649 commented on Apr 12, 2023 • 1 new comment
[BUG] Int8 Inference Does Not Work For GPTJ
#2956 commented on Apr 12, 2023 • 1 new comment
[REQUEST] Support multiple models using deepspeed
#3093 commented on Apr 12, 2023 • 1 new comment
[BUG] terminate called after throwing an instance of 'std::bad_alloc'
#3126 commented on Apr 12, 2023 • 1 new comment
[BUG] Bloom inference error with dtype=int8
#2923 commented on Apr 12, 2023 • 1 new comment
Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported.
#2898 commented on Apr 12, 2023 • 1 new comment
[REQUEST] Model serving via deepspeed's inference module
#1508 commented on Apr 13, 2023 • 1 new comment
[Question] How to preshard a model for tensor parallelism
#2379 commented on Apr 14, 2023 • 1 new comment
Error building extension 'cpu_adam'
#889 commented on Apr 14, 2023 • 1 new comment
Add 4-bit quantized inference to run BLOOM-176B on 2 A100 GPUs
#2526 commented on Apr 7, 2023 • 1 new comment
Codegen Inference Support
#2916 commented on Apr 14, 2023 • 1 new comment
[DRAFT] Tentative implementation of MiCS
#2964 commented on Apr 10, 2023 • 1 new comment
Fix pipeline module evaluation when contiguous activation checkpoin…
#3005 commented on Apr 13, 2023 • 1 new comment
fix check params
#3036 commented on Apr 7, 2023 • 1 new comment
[BUG] RuntimeError: still have inflight params [<bound method Init._convert_to_deepspeed_param.<locals>.ds_summary of Parameter containing:
#3156 commented on Apr 7, 2023 • 0 new comments
SSH-less connection on Kubernetes
#2679 commented on Apr 13, 2023 • 0 new comments
Add T5 example using flops profiler to the docs
#2774 commented on Apr 10, 2023 • 0 new comments
ZeRO 1 and 2 Gradient Accumulation Dtype.
#2847 commented on Apr 11, 2023 • 0 new comments
Allow dict datatype for checkpoints (inference)
#3007 commented on Apr 10, 2023 • 0 new comments
fix mpich launcher issue in multi-node
#3078 commented on Apr 10, 2023 • 0 new comments
remove `torch.cuda.is_available()` check when compiling ops
#3085 commented on Apr 7, 2023 • 0 new comments
Disable ZeRO loading when load_module_only=True
#3116 commented on Apr 7, 2023 • 0 new comments