Training without storing model states #705
Open
Description
In some rare cases, for example when you need to finetune a large model on a small dataset, the major part of the training loop is spent waiting for model checkpoints to be saved to the hard drive.
Proposal
It would be logical to add a `CheckpointCallback` with the parameter `save_n_best=0` to the configuration, so that the best checkpoints are not stored and the latest state of the model is used instead.

Note
Everything described above is a proposal mostly for the config API, because the config API loads the best checkpoint from the previous stage between stages.
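To make the proposal concrete, here is a minimal sketch of what `save_n_best=0` could mean for the callback. The class below is a simplified stub, not the real Catalyst `CheckpointCallback`; its constructor arguments and `on_epoch_end` hook are assumptions for illustration only:

```python
import os
import tempfile

# Stub illustrating the proposed semantics of `save_n_best=0`;
# the real Catalyst CheckpointCallback has many more parameters.
class CheckpointCallback:
    def __init__(self, logdir, save_n_best=1):
        self.logdir = logdir
        self.save_n_best = save_n_best

    def on_epoch_end(self, model_state, epoch):
        if self.save_n_best == 0:
            # Skip disk I/O entirely; the latest state stays in memory.
            return None
        path = os.path.join(self.logdir, f"best.{epoch}.pth")
        with open(path, "wb") as f:
            f.write(model_state)
        return path

with tempfile.TemporaryDirectory() as logdir:
    cb = CheckpointCallback(logdir, save_n_best=0)
    result = cb.on_epoch_end(b"fake-weights", epoch=1)
    saved = os.listdir(logdir)

print(result, saved)  # nothing was written to disk
```

With `save_n_best=0`, the epoch-end hook returns immediately, so the training loop never blocks on checkpoint serialization.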
So here are a few steps by which it can be achieved:

- Make the `get_model` property of `ConfigExperiment` return the latest model (something like in `BaseExperiment`).
- Override `get_callbacks` of the experiment class and remove `CheckpointCallback` from the default callbacks.
- `CheckpointCallback`.