Issues: hpcaitech/ColossalAI
Issues list
[BUG]: the training procedure exit without any error logs or stuck in some steps using multi-nodes with booster api
bug
Something isn't working
#4353
opened Jul 31, 2023 by
zhangvia
[BUG]: stuck when scaling to 8 nodes with 8 GPUs per node
bug
Something isn't working
#4351
opened Jul 30, 2023 by
Atopos-309
[BUG]: With the interleaved pipeline schedule, the model's training loss does not decrease
bug
Something isn't working
#4348
opened Jul 28, 2023 by
huangting4201
[BUG]: llama training script cannot run in branch of example/llama
bug
Something isn't working
#4344
opened Jul 28, 2023 by
Hap-Zhang
[BUG]: Training is slower than DeepSpeed in llama 7B?
bug
Something isn't working
#4342
opened Jul 27, 2023 by
Hap-Zhang
[BUG]: llama example 3d parallel OOM even with cpu offload.
bug
Something isn't working
#4339
opened Jul 27, 2023 by
uygnef
[BUG]: ModuleNotFoundError: No module named 'colossalai._C.cpu_adam'
bug
Something isn't working
#4327
opened Jul 25, 2023 by
Aaricis
[BUG]: can not run batch2_seq2048_flash_attn for 65b llama
bug
Something isn't working
#4326
opened Jul 25, 2023 by
uygnef
[BUG]: Data parallelism and model parallelism have completely different weights
bug
Something isn't working
#4319
opened Jul 25, 2023 by
bobo0810
[BUG]: sft training 13B model outputs 75go model
bug
Something isn't working
#4308
opened Jul 23, 2023 by
allaccs
[BUG]: TypeError: 'LazyTensor' object is not callable
bug
Something isn't working
#4296
opened Jul 20, 2023 by
upwindflys
[plugin] add 3d parallel plugin
enhancement
New feature or request
#4294
opened Jul 20, 2023 by
ver217
[BUG]: Error when training the llama-7b model with the RedPajama-Data-1T-Sample dataset
bug
Something isn't working
#4289
opened Jul 20, 2023 by
Maxhyl
[FEATURE]: Support for LLaMA-2
enhancement
New feature or request
#4281
opened Jul 19, 2023 by
lvcc2018
[BUG]: ATTENTION! tensor.type_as() will return None with colossalai
bug
Something isn't working
#4280
opened Jul 19, 2023 by
densechen
[shardformer] add tensor type of attn mask in Coloattention
#4262
opened Jul 18, 2023 by
flybird1111
[BUG]: dataloader num is not correct
bug
Something isn't working
#4207
opened Jul 10, 2023 by
zhangvia
[FEATURE]: FP8 mixed precision training via Transformer Engine
enhancement
New feature or request
#4199
opened Jul 7, 2023 by
sbhavani