Issues: microsoft/DeepSpeed
[BUG] GPT-j int8 requires more memory than float16
Labels: bug, inference. #2467, opened Nov 2, 2022 by zelcookie
[BUG] GPT-j memory overflows sharply if you use kernel inject
Labels: bug, inference. #2466, opened Nov 2, 2022 by zelcookie
[REQUEST] Does DeepSpeed support ONNX export and Triton Inference Server?
Labels: enhancement. #2465, opened Nov 2, 2022 by stevezheng23
[BUG] Call to torch.cuda.synchronize on every gradient reduction in ZeRO Stage 2
Labels: enhancement, training, user-question. #2463, opened Nov 1, 2022 by li-yi-dong
[BUG] Stage 3 cannot load the checkpoint when the optimizer is not configured
Labels: bug, inference. #2449, opened Oct 27, 2022 by rohitgr7
[Question] MoE: Why do exp_counts have to be moved to the CPU?
Labels: question, training. #2444, opened Oct 25, 2022 by clumsy
[BUG] MP-sharded checkpoint loading does not work for models other than BLOOM
Labels: bug, inference. #2442, opened Oct 24, 2022 by pai4451
What's the use of synchronization in activation checkpointing?
Labels: question, training. #2439, opened Oct 21, 2022 by afcruzs
[QUESTION] Pipeline explanation needed
Labels: question, training. #2435, opened Oct 20, 2022 by bruno-darochac
[REQUEST] Hey, Microsoft... Could you PLEASE Support Your Own OS?
Labels: enhancement. #2427, opened Oct 15, 2022 by d8ahazard
Error when running inference with Megatron GPT-2
Labels: enhancement, inference, user-question. #2421, opened Oct 13, 2022 by cdj0311
Feature request: Ability to disable autocast locally
Labels: enhancement, training. #2400, opened Oct 7, 2022 by gahdritz
[Question] How to pre-shard a model for tensor parallelism
Labels: enhancement, inference. #2379, opened Sep 29, 2022 by lanking520
Schema for ds_config.json
Labels: enhancement, training. #2355, opened Sep 24, 2022 by timoklimmer
[REQUEST] BF16 mixed precision => grad accum in fp32
Labels: enhancement, training. #2352, opened Sep 23, 2022 by stas00
Minibatch gradients aren't readily accessible in ZeRO optimization stage 1
Labels: bug, training. #2329, opened Sep 17, 2022 by bcui19
[REQUEST] An option to save only the model state_dict in save_checkpoint(), and how to manually save and load the model state_dict when using ZeRO-3
Labels: enhancement, training. #2304, opened Sep 8, 2022 by BlinkDL
DeepSpeed still raises CUDA out-of-memory errors
Labels: bug, training. #2302, opened Sep 7, 2022 by buttercutter
Question: how to run inference with INT8 (GPT) models supported through ZeroQuant?
Labels: enhancement, inference. #2301, opened Sep 7, 2022 by xk503775229
[Question] Why are overlap and contiguous grads meaningless in stage 1 and ignored?
Labels: training, user-question. #2295, opened Sep 6, 2022 by woolpeeker
[REQUEST] Individual tensor .grad getter/setter for flattened tensors
Labels: enhancement, training. #2290, opened Sep 2, 2022 by stas00
get_fp32_state_dict_from_zero_checkpoint with fixed weights in pipeline
Labels: enhancement, training. #2288, opened Sep 2, 2022 by CodyLDA
[BUG] Wrong output for batched input with OPT model inference
Labels: bug, inference. #2281, opened Sep 1, 2022 by yingapple