Not seeing the expected GPU memory saving when running DeepSpeedExamples/BingBertSquad #259

Description

@wangli1426

Hi all,

I ran DeepSpeedExamples/BingBertSquad using the docker image deepspeed/deepspeed:latest, launching 16 ranks across two GPU servers, each equipped with 8 V100 GPUs.

However, I observed only about a 24% GPU memory saving (2.5 GB per GPU down to 1.9 GB per GPU) after turning on zero_optimization and activation_checkpointing. Is this normal?
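For context, here is the back-of-envelope model-state estimate I was working from, following the 2 + 2 + 12 bytes-per-parameter accounting in the ZeRO paper (a rough sketch, not a measurement; the ~340M parameter count for BERT-large is an approximation, and activations and buffers are ignored):

# Rough per-GPU model-state estimate for BERT-large with fp16 Adam.
# All figures are approximations for illustration, not measurements.
params = 340e6                      # approximate BERT-large parameter count
ranks = 16                          # 2 servers x 8 V100s

fp16_params = 2 * params            # fp16 weights, 2 bytes/param
fp16_grads  = 2 * params            # fp16 gradients, 2 bytes/param
optim_state = 12 * params           # fp32 master weights + Adam momentum + variance

gb = 1024 ** 3
baseline = fp16_params + fp16_grads + optim_state          # fully replicated
zero2 = fp16_params + (fp16_grads + optim_state) / ranks   # ZeRO-2 partitions grads + optimizer states
print(f"replicated model states per GPU: {baseline / gb:.2f} GB")  # ~5.1 GB
print(f"ZeRO-2 model states per GPU:     {zero2 / gb:.2f} GB")     # ~0.9 GB

Based on this accounting I had expected a much larger drop than 0.6 GB per GPU.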

The config is attached below.

{
  "train_batch_size": 256,
  "train_micro_batch_size_per_gpu": 16,
  "steps_per_print": 32,
  "gradient_accumulation_steps": 1,
  "optimizer": {
    "type": "Adam",
    "params": {
      "lr": 3e-5,
      "weight_decay": 0.0,
      "bias_correction": false
    }
  },
  "gradient_clipping": 1.0,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  },
  "activation_checkpointing": {
    "partition_activations": true,
    "cpu_checkpointing": false,
    "contiguous_memory_optimization": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": true
  }
}
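For reference, this is roughly how the config gets consumed in my training script (a minimal sketch with placeholder names; ds_config.json stands in for the JSON above, and the Linear module is a stand-in for the BingBertSquad BERT-large model):

import argparse
import torch
import deepspeed

parser = argparse.ArgumentParser()
parser.add_argument("--deepspeed_config", default="ds_config.json")  # the JSON above
args, _ = parser.parse_known_args()

model = torch.nn.Linear(1024, 1024)  # placeholder for the BERT-large module

# DeepSpeed builds the optimizer and reads fp16, zero_optimization, etc.
# from the config referenced by args.deepspeed_config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
)

# Activation checkpointing is configured from the same JSON;
# deepspeed.checkpointing.checkpoint then replaces
# torch.utils.checkpoint.checkpoint inside the model's forward.
deepspeed.checkpointing.configure(None, deepspeed_config=args.deepspeed_config)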

The network is BERT-large, with hidden_size 1024 and num_hidden_layers 24.
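Since activation memory scales with micro-batch size, sequence length, hidden size, and depth, here is the rough activation estimate I had in mind (a sketch under assumptions: sequence length 384 as is typical for SQuAD, roughly a dozen seq-by-hidden fp16 tensors per layer, and the seq-by-seq attention maps ignored):

# Hypothetical fp16 activation-memory estimate; tensor counts are assumptions.
micro_batch, seq_len, hidden, layers = 16, 384, 1024, 24
one_act = micro_batch * seq_len * hidden * 2   # one fp16 activation tensor, bytes
per_layer = 12 * one_act                       # assume ~a dozen such tensors per layer
mb = 1024 ** 2

# Without checkpointing, every layer's activations stay live for backward.
print(f"no checkpointing:   ~{layers * per_layer / mb:.0f} MB")
# With checkpointing, one stored input per layer plus one recomputed layer.
print(f"with checkpointing: ~{(layers * one_act + per_layer) / mb:.0f} MB")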

Any suggestions or feedback would be highly appreciated.

Thank you.
