Insights: microsoft/DeepSpeed
Overview
1 Release published by 1 person
v0.9.0 DeepSpeed v0.9.0
published
Apr 13, 2023
23 Pull requests merged by 15 people
Nested zero.Init() and dynamically defined model class
#2989 merged
Apr 14, 2023
Update DeepSpeed-Chat docs with latest changes to scripts
#3219 merged
Apr 13, 2023
Update DS-Chat docs for v0.9.0
#3216 merged
Apr 13, 2023
[CPU support] Optionally bind each rank to different cores on host
#2881 merged
Apr 12, 2023
Update AMD workflows
#3179 merged
Apr 12, 2023
fix license badge
#3200 merged
Apr 12, 2023
feat: Add support for `NamedTuple` when sharding parameters [#3029]
#3037 merged
Apr 12, 2023
fix hero figure
#3199 merged
Apr 12, 2023
Add Japanese version of ChatGPT-like pipeline blog
#3194 merged
Apr 12, 2023
Chatgpt chinese blog
#3193 merged
Apr 12, 2023
Fix typo
#3164 merged
Apr 11, 2023
Fix typo
#3183 merged
Apr 11, 2023
Fix references to figures
#3189 merged
Apr 11, 2023
DeepSpeed Chat
#3186 merged
Apr 11, 2023
add news item.
#3188 merged
Apr 11, 2023
[docs] add run command for 13b
#3187 merged
Apr 11, 2023
Add DeepSpeed-Chat Blogpost
#3185 merged
Apr 11, 2023
deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache
#2803 merged
Apr 10, 2023
zero.Init() should pin params in GPU memory as requested
#2953 merged
Apr 7, 2023
op_builder: conditionally compute relative path for hip compiled files
#3095 merged
Apr 7, 2023
fixing a bug in CPU Adam and Adagrad
#3109 merged
Apr 7, 2023
Remove benchmark code
#3157 merged
Apr 7, 2023
Update curriculum-learning.md
#3031 merged
Apr 7, 2023
14 Pull requests opened by 14 people
Fix handling of (CUDA,ROCR)_VISIBLE_DEVICES
#3165 opened
Apr 8, 2023
stage_1_and_2.py: do gradient scale only for fp16
#3166 opened
Apr 9, 2023
[Fix] _conv_flops_compute when padding is a str and stride=1
#3169 opened
Apr 9, 2023
Enable auto TP policy for llama model
#3170 opened
Apr 10, 2023
AMD Kernel Compatibility Fixes
#3180 opened
Apr 11, 2023
Documentation for DeepSpeed Accelerator Abstraction Interface
#3184 opened
Apr 11, 2023
improving int4 asymmetric quantization accuracy
#3190 opened
Apr 11, 2023
Add HE support for the rest of model containers
#3191 opened
Apr 12, 2023
Update automatic-tensor-parallelism.md
#3198 opened
Apr 12, 2023
Make deepspeed.zero.Init() idempotent
#3203 opened
Apr 12, 2023
Additional changes to support MI200
#3204 opened
Apr 12, 2023
zero3 checkpoint frozen params
#3205 opened
Apr 12, 2023
[update] reference in cifar-10
#3212 opened
Apr 13, 2023
Fix for Stable Diffusion
#3218 opened
Apr 13, 2023
13 Issues closed by 12 people
What is the principle for 15x speedup? Thank you very much!
#3225 closed
Apr 14, 2023
[BUG]pip install deepspeed install fail
#3213 closed
Apr 14, 2023
[BUG]deepspeed.ops.adam.DeepSpeedCPUAdam how to use in config file?
#3227 closed
Apr 14, 2023
[BUG] RuntimeError: Failed to import transformers.models.opt.modeling_opt
#3215 closed
Apr 14, 2023
[BUG]TypeError: allocate_workspace_fp16(): incompatible function arguments.
#3209 closed
Apr 14, 2023
[BUG] g++: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
#3221 closed
Apr 14, 2023
[BUG] using ZeRO-stage 3 with offload, buffer configs don't make memory usage change.
#2869 closed
Apr 13, 2023
[BUG] `AttributeError: 'CSVConfig' object has no attribute 'group'`
#2853 closed
Apr 12, 2023
[REQUEST]The input sequences must be in the same length when using pipeline parallelism for all batches?
#3195 closed
Apr 12, 2023
how to make different rank use same dataloader in pipeline
#3131 closed
Apr 12, 2023
[TYPO DOCS] Typo in DeepSpeed Configuration JSON documentation
#3163 closed
Apr 11, 2023
[BUG] Cannot build transformer_inference extension
#3171 closed
Apr 11, 2023
[BUG] pip install DeepSeed Error
#3145 closed
Apr 10, 2023
30 Issues opened by 26 people
[BUG] return getattr(args, f"{model_type[step_num]}_model")
#3231 opened
Apr 14, 2023
[BUG] Fail to run the example in/DeepSpeedExamples
#3229 opened
Apr 14, 2023
[BUG] batch_size check failed with zero 2 (deepspeed v0.9.0)
#3228 opened
Apr 14, 2023
CUDA out of memory
#3224 opened
Apr 14, 2023
[BUG] Installed CUDA version 12.1 does not match the version torch was compiled with 11.8
#3223 opened
Apr 14, 2023
[BUG]the following arguments are required: user_script, user_args
#3222 opened
Apr 14, 2023
[REQUEST] Please spend more time on the usability of the project, especially the doc.
#3220 opened
Apr 14, 2023
[BUG] multi-node inference initialization fails when trying not to use replace_with_kernel_inject
#3217 opened
Apr 13, 2023
[BUG]Out of memory when training, and is streaming mode supported ?
#3214 opened
Apr 13, 2023
[BUG] Unable to pre-compile async_io
#3211 opened
Apr 13, 2023
[BUG] NCCL out of memory on `save_checkpoint()`
#3210 opened
Apr 13, 2023
[BUG]RuntimeError: Step 1 exited with non-zero status 1
#3208 opened
Apr 13, 2023
[BUG]error: can't copy 'deepspeed/accelerator': doesn't exist or not a regular file
#3207 opened
Apr 13, 2023
[REQUEST] "省钱15倍" ("save money 15x"): isn't this obviously an ungrammatical phrase?
#3206 opened
Apr 13, 2023
[BUG] "with deepspeed.zero.Init()" is not idempotent
#3202 opened
Apr 12, 2023
[BUG] error: use of undeclared identifier '__double2half'; did you mean '__double2hiint'?"
#3197 opened
Apr 12, 2023
whl does not get created following the instructions on Windows 11 [BUG]
#3196 opened
Apr 12, 2023
cpu memory out of use when infering on 30b model
#3192 opened
Apr 12, 2023
[BUG] ds inference succeed for 2 gpus, oom for 4 gpus
#3182 opened
Apr 11, 2023
[BUG] Inference failed serveral times
#3181 opened
Apr 11, 2023
[BUG] Deepspeed inference fp16 gives different results than HuggingFace with FlanT5-XL
#3177 opened
Apr 10, 2023
[Deepspeed stage-3 student+teacher crash]
#3175 opened
Apr 10, 2023
Installing Ops for using with Pyinstaller
#3173 opened
Apr 10, 2023
Does ds inference support op fusion for multi-head attention?
#3172 opened
Apr 10, 2023
[BUG] High GPU memory use when fine-tuning Flan-T5-xxl (11B) using stage 3
#3168 opened
Apr 9, 2023
[BUG] Memory increase consistently when using multiple NCCL_IB_HCA.
#3167 opened
Apr 9, 2023
[BUG] exits with return code = -9
#3160 opened
Apr 7, 2023
55 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
[BUG] Fail to inference with 8bit quantized bloom-3b model, shape mismatch error
#2865 commented on
Apr 13, 2023 • 6 new comments
add bf16 cuda kernel support
#3092 commented on
Apr 14, 2023 • 6 new comments
Installation on Windows 10 (Deepspeed inference)
#2588 commented on
Apr 11, 2023 • 3 new comments
Checks for user injection policy
#3052 commented on
Apr 11, 2023 • 3 new comments
[BUG] (NVMe Offload with Zero3) Not enough buffers 0 for swapping 1
#3062 commented on
Apr 7, 2023 • 2 new comments
[REQUEST] Hey, Microsoft...Could you PLEASE Support Your Own OS?
#2427 commented on
Apr 11, 2023 • 2 new comments
Zero-R source code
#452 commented on
Apr 11, 2023 • 2 new comments
[BUG] Can't build sparse attention op with PyTorch 2.0
#3117 commented on
Apr 11, 2023 • 2 new comments
RuntimeError: 'weight' must be 2-D while training Flan-T5 models with stage 3
#2746 commented on
Apr 12, 2023 • 2 new comments
[BUG] Install on AMD ROCm system but fails to build on CUDA dependencies
#3091 commented on
Apr 12, 2023 • 2 new comments
[BUG] Outputs of type NamedTuple cause crash in `_apply_to_tensors_only` (stage 3 + shard parameters)
#3029 commented on
Apr 13, 2023 • 2 new comments
[BUG] ValueError: max() arg is an empty sequence using bf16 zero stage3
#2820 commented on
Apr 13, 2023 • 2 new comments
[BUG]
#3069 commented on
Apr 13, 2023 • 2 new comments
[BUG] CUDA illegal memory access on large batch with ZeRO-infinity
#1852 commented on
Apr 14, 2023 • 2 new comments
Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2
#2999 commented on
Apr 14, 2023 • 2 new comments
[CPU] Support Intel CPU inference
#3041 commented on
Apr 13, 2023 • 2 new comments
Update torch version check in building sparse_attn
#3152 commented on
Apr 13, 2023 • 2 new comments
deepspeed on T4 GPU server and run Stable diffustion model inference error
#2957 commented on
Apr 8, 2023 • 1 new comment
[BUG] Error "exits with return code -7" when finetuning FLANT5-xxl on 8x A100
#2897 commented on
Apr 8, 2023 • 1 new comment
[BUG] ds-model inference results go far away from that of original model (megatron ) attention-context_layer error
#3124 commented on
Apr 8, 2023 • 1 new comment
[BUG] Incorrect Model Output For Contrastive Search
#2809 commented on
Apr 10, 2023 • 1 new comment
[BUG] VRAM increasing after each call to model
#3073 commented on
Apr 10, 2023 • 1 new comment
Concurrent generation of responses (one GPU, multiple users)
#3080 commented on
Apr 10, 2023 • 1 new comment
[BUG] Can't compile DeepSpeed version 0.8.1+ with Cuda 11.7
#2914 commented on
Apr 10, 2023 • 1 new comment
[Error] [Win] Unable to pre-compile async_io on Windows
#1769 commented on
Apr 10, 2023 • 1 new comment
[BUG] 'StableDiffusionPipeline' object has no attribute 'children'
#2968 commented on
Apr 11, 2023 • 1 new comment
[BUG]pip install doesn't work. Please eeelp.
#2137 commented on
Apr 11, 2023 • 1 new comment
[BUG] Inference fail with "mat1 and mat2 shapes cannot be multiplied" for Llama model.
#3099 commented on
Apr 11, 2023 • 1 new comment
4 X A100 80G train HF_13B Llama, error
#3153 commented on
Apr 11, 2023 • 1 new comment
[BUG] Resume from checkpoint Out of memory error (SIGTERM: Killed) for a large model
#3104 commented on
Apr 11, 2023 • 1 new comment
[REQUEST] Add more device-agnostic compression algorithms
#2894 commented on
Apr 11, 2023 • 1 new comment
[BUG] INFLIGHT parameters after evaluation
#3068 commented on
Apr 12, 2023 • 1 new comment
[math] what network throughput is required to handle ZeRO-3 traffic?
#2928 commented on
Apr 12, 2023 • 1 new comment
subprocess.CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1.
#1649 commented on
Apr 12, 2023 • 1 new comment
[BUG] Int8 Inference Does Not Work For GPTJ
#2956 commented on
Apr 12, 2023 • 1 new comment
[REQUEST] Support multiple models using deepspeed
#3093 commented on
Apr 12, 2023 • 1 new comment
[BUG] terminate called after throwing an instance of 'std::bad_alloc'
#3126 commented on
Apr 12, 2023 • 1 new comment
[BUG] Bloom inference error with dtype=int8
#2923 commented on
Apr 12, 2023 • 1 new comment
Automatic adjustment of ZeRO's optimizer state partitioning with a new world size is not currently supported.
#2898 commented on
Apr 12, 2023 • 1 new comment
[REQUEST] Model serving via deepspeed's inference module
#1508 commented on
Apr 13, 2023 • 1 new comment
[Question] How to preshard a model for tensor parallism
#2379 commented on
Apr 14, 2023 • 1 new comment
Error building extension 'cpu_adam'
#889 commented on
Apr 14, 2023 • 1 new comment
Add 4-bit quantized inference to run BLOOM-176B on 2 A100 GPUs
#2526 commented on
Apr 7, 2023 • 1 new comment
Codegen Inference Support
#2916 commented on
Apr 14, 2023 • 1 new comment
[DRAFT] Tentative implementation of MiCS
#2964 commented on
Apr 10, 2023 • 1 new comment
Fix pipeline module evaluation when contiguous activation checkpoin…
#3005 commented on
Apr 13, 2023 • 1 new comment
fix check params
#3036 commented on
Apr 7, 2023 • 1 new comment
[BUG] RuntimeError: still have inflight params [<bound method Init._convert_to_deepspeed_param.<locals>.ds_summary of Parameter containing:
#3156 commented on
Apr 7, 2023 • 0 new comments
SSH-less connection on Kubernetes
#2679 commented on
Apr 13, 2023 • 0 new comments
Add T5 example using flops profiler to the docs
#2774 commented on
Apr 10, 2023 • 0 new comments
ZeRO 1 and 2 Gradient Accumulation Dtype.
#2847 commented on
Apr 11, 2023 • 0 new comments
Allow dict datatype for checkpoints (inference)
#3007 commented on
Apr 10, 2023 • 0 new comments
fix mpich launcher issue in multi-node
#3078 commented on
Apr 10, 2023 • 0 new comments
remove `torch.cuda.is_available()` check when compiling ops
#3085 commented on
Apr 7, 2023 • 0 new comments
Disable ZeRO loading when load_module_only=True
#3116 commented on
Apr 7, 2023 • 0 new comments