Issues: microsoft/DeepSpeed
[BUG] GPT-j int8 requires more memory than float16
Labels: bug, inference. #2467, opened Nov 2, 2022 by zelcookie
[BUG] GPT-j memory overflows sharply if you use kernel inject
Labels: bug, inference. #2466, opened Nov 2, 2022 by zelcookie
[REQUEST] Does DeepSpeed support ONNX export and Triton Inference Server?
Labels: enhancement. #2465, opened Nov 2, 2022 by stevezheng23
[BUG] Call to torch.cuda.synchronize on every gradient reduction in ZeRO Stage 2
Labels: enhancement, training, user-question. #2463, opened Nov 1, 2022 by li-yi-dong
[BUG] Stage 3 cannot load the checkpoint when the optimizer is not configured
Labels: bug, inference. #2449, opened Oct 27, 2022 by rohitgr7
[Question] MoE: Why do exp_counts have to be moved to the CPU?
Labels: question, training. #2444, opened Oct 25, 2022 by clumsy
[BUG] MP-sharded checkpoint loading does not work for models other than BLOOM
Labels: bug, inference. #2442, opened Oct 24, 2022 by pai4451
What's the use of synchronization in activation checkpointing?
Labels: question, training. #2439, opened Oct 21, 2022 by afcruzs
[QUESTION] Pipeline explanation needed
Labels: question, training. #2435, opened Oct 20, 2022 by bruno-darochac
[REQUEST] Hey, Microsoft... Could you PLEASE Support Your Own OS?
Labels: enhancement. #2427, opened Oct 15, 2022 by d8ahazard
Error when running inference with Megatron GPT-2
Labels: enhancement, inference, user-question. #2421, opened Oct 13, 2022 by cdj0311
Feature request: Ability to disable autocast locally
Labels: enhancement, training. #2400, opened Oct 7, 2022 by gahdritz
[Question] How to pre-shard a model for tensor parallelism
Labels: enhancement, inference. #2379, opened Sep 29, 2022 by lanking520
Schema for ds_config.json
Labels: enhancement, training. #2355, opened Sep 24, 2022 by timoklimmer
[REQUEST] BF16 mixed precision => grad accum in fp32
Labels: enhancement, training. #2352, opened Sep 23, 2022 by stas00
Minibatch gradients aren't readily accessible in ZeRO optimization stage 1
Labels: bug, training. #2329, opened Sep 17, 2022 by bcui19
[REQUEST] An option to save only the model state_dict in save_checkpoint(), and how to manually save and load the model state_dict when using ZeRO-3
Labels: enhancement, training. #2304, opened Sep 8, 2022 by BlinkDL
DeepSpeed still raises CUDA out-of-memory errors
Labels: bug, training. #2302, opened Sep 7, 2022 by buttercutter
Question: how to run inference with INT8 (GPT) models supported through ZeroQuant?
Labels: enhancement, inference. #2301, opened Sep 7, 2022 by xk503775229
[Question] Why are overlap and contiguous grads meaningless in stage 1 and ignored?
Labels: training, user-question. #2295, opened Sep 6, 2022 by woolpeeker
[REQUEST] Individual tensor .grad getter/setter for flattened tensors
Labels: enhancement, training. #2290, opened Sep 2, 2022 by stas00
get_fp32_state_dict_from_zero_checkpoint with fixed weights in pipeline
Labels: enhancement, training. #2288, opened Sep 2, 2022 by CodyLDA
[BUG] Wrong output for batched input with OPT model inference
Labels: bug, inference. #2281, opened Sep 1, 2022 by yingapple