Model Merging Scaling Laws in Large Language Models
Abstract
We study empirical scaling laws for language model merging, with performance measured by cross-entropy. Despite its wide practical use, merging lacks a quantitative rule that predicts returns as we add experts or scale model size. We identify a compact power law that links model size and the number of experts: the size-d…