Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less
Abstract
Optimizers play an important role in both the pretraining and finetuning stages of training large language models (LLMs). In this paper, we observe that full finetuning with the same optimizer used in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while …