Training without storing model states #705
Open
Description
In some rare cases, for example when you need to finetune a large model on a small dataset, the major part of the training loop is spent waiting for model checkpoints to be saved to the hard drive.
Proposal
It would be logical to add a `CheckpointCallback` with the parameter `save_n_best=0` to the configuration, so that the best checkpoints are not stored and the latest state of the model is used instead.

Note
Everything described above is a proposal mostly for the config API, because the config API loads the best checkpoint from the previous stage between stages.
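To make the proposal concrete, here is a minimal sketch of what `save_n_best=0` could mean for the callback. The class below is a simplified stub, not the real Catalyst `CheckpointCallback`; its constructor arguments and `on_epoch_end` hook are assumptions for illustration only:

```python
import os
import tempfile

# Stub illustrating the proposed semantics of `save_n_best=0`;
# the real Catalyst CheckpointCallback has many more parameters.
class CheckpointCallback:
    def __init__(self, logdir, save_n_best=1):
        self.logdir = logdir
        self.save_n_best = save_n_best

    def on_epoch_end(self, model_state, epoch):
        if self.save_n_best == 0:
            # Skip disk I/O entirely; the latest state stays in memory.
            return None
        path = os.path.join(self.logdir, f"best.{epoch}.pth")
        with open(path, "wb") as f:
            f.write(model_state)
        return path

with tempfile.TemporaryDirectory() as logdir:
    cb = CheckpointCallback(logdir, save_n_best=0)
    result = cb.on_epoch_end(b"fake-weights", epoch=1)
    saved = os.listdir(logdir)

print(result, saved)  # nothing was written to disk
```

With `save_n_best=0`, the epoch-end hook returns immediately, so the training loop never blocks on checkpoint serialization.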
So here are a few steps by which it can be achieved:

- Make the `get_model` property of `ConfigExperiment` return the latest model (something like in `BaseExperiment`).
- Override `get_callbacks` of the experiment class and remove `CheckpointCallback` from the default callbacks.
- `CheckpointCallback`.