Paper · Hugging Face · Published: 2026-05-10 · HF ↑11

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

Authors: Jeonghye Kim, Jiwon Jeon, Dongsheng Li, Yuqing Yang

Abstract

Self-distillation has emerged as a powerful framework for post-training LLMs, where a teacher conditioned on extra information guides a student without it, both from the same model. While this guidance is useful when the student has failed, on successful rollouts, the same mechanism instead overwrit…
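The abstract only outlines the setup: one model plays both teacher (conditioned on extra information) and student (without it). As a hedged illustration, and not the paper's method, one common way to turn such a teacher into a per-token training signal is the log-probability gap between teacher and student on the tokens the student actually sampled. The function name `distill_signal` and the toy logits are hypothetical. In a real setup, both logit rows would come from the same LLM, evaluated with and without the extra context.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def distill_signal(
    student_logits: List[List[float]],
    teacher_logits: List[List[float]],
    tokens: List[int],
) -> List[float]:
    """Hypothetical per-token self-distillation signal (illustrative only).

    For each position t, returns log p_teacher(y_t) - log p_student(y_t),
    where y_t is the token the student sampled. Assumes teacher and student
    are the same model; the teacher row is computed with extra context.
    """
    signals = []
    for s_row, t_row, tok in zip(student_logits, teacher_logits, tokens):
        s_lp = log_softmax(s_row)[tok]  # student log-prob of sampled token
        t_lp = log_softmax(t_row)[tok]  # teacher log-prob of the same token
        signals.append(t_lp - s_lp)
    return signals


if __name__ == "__main__":
    # Toy 2-token vocabulary, 2 positions; the student sampled tokens 0 then 1.
    student = [[0.0, 0.0], [0.0, 0.0]]
    teacher = [[2.0, 0.0], [2.0, 0.0]]  # the informed teacher prefers token 0
    print(distill_signal(student, teacher, [0, 1]))
```

Positive entries mark tokens that the informed teacher favors more than the student does, so increasing their probability moves the student toward the teacher. Negative entries mark tokens the teacher considers less likely than the student does.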

#rl #llm