Paper deep dive | Hugging Face | Published: 2026-05-19 | HF ↑27

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Authors: Zhepei Wei, Xinyu Zhu, Wei-Lin Chen, Chengsong Huang, Jiaxin Huang, and 1 other

Summary

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extr…
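The abstract is cut off above, so the following is only an illustration of what a "rank-1 trajectory" could mean, inferred from the title and not taken from the paper. It assumes that the cumulative updates W_t - W_0 across RLVR checkpoints lie close to a single direction. It then checks this by measuring how much of the updates' energy one direction captures, and extrapolates by fitting the coefficient along that direction linearly in the training step. The helper names (`leading_direction`, `rank1_energy`, `extrapolate`), the power-iteration SVD substitute, and the linear-in-step fit are all hypothetical choices for this sketch, not the authors' procedure. Weights are flattened into plain Python lists to keep it dependency-free.

```python
import math
from typing import List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(a: Sequence[float]) -> Vector:
    n = math.sqrt(_dot(a, a))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in a]


def leading_direction(deltas: List[Vector], iters: int = 200) -> Vector:
    """Top right singular vector of the T x D matrix M whose rows are `deltas`,
    found by power iteration on M^T M (a stand-in for a full SVD)."""
    v = _normalize(deltas[-1])  # start from the most recent displacement
    for _ in range(iters):
        u = [_dot(row, v) for row in deltas]                   # M v
        w = [sum(u_t * row[j] for u_t, row in zip(u, deltas))  # M^T (M v)
             for j in range(len(v))]
        v = _normalize(w)
    return v


def rank1_energy(deltas: List[Vector], v: Vector) -> float:
    """Fraction of the squared Frobenius norm of the deltas captured by the
    unit direction v (1.0 means the trajectory is exactly rank-1)."""
    total = sum(_dot(row, row) for row in deltas)
    captured = sum(_dot(row, v) ** 2 for row in deltas)
    return captured / total


def extrapolate(w0: Vector, checkpoints: List[Vector], steps: List[float],
                target_step: float) -> Vector:
    """Hypothetical rank-1 extrapolation: project W_t - W_0 onto the leading
    direction, fit that coefficient linearly in the step (assumption), and
    evaluate the fitted line at target_step."""
    deltas = [[wt - w for wt, w in zip(ck, w0)] for ck in checkpoints]
    v = leading_direction(deltas)
    coeffs = [_dot(d, v) for d in deltas]
    n = len(steps)
    s_mean = sum(steps) / n
    c_mean = sum(coeffs) / n
    var = sum((s - s_mean) ** 2 for s in steps)  # needs >= 2 distinct steps
    slope = sum((s - s_mean) * (c - c_mean) for s, c in zip(steps, coeffs)) / var
    intercept = c_mean - slope * s_mean
    c_target = slope * target_step + intercept
    return [w + c_target * vj for w, vj in zip(w0, v)]


if __name__ == "__main__":
    # Toy trajectory: every checkpoint moves along one unit direction d.
    w0 = [0.5, -1.0, 2.0]
    d = [1 / 3, 2 / 3, 2 / 3]
    steps = [1.0, 2.0, 3.0, 4.0]
    ckpts = [[w + 0.1 * s * dj for w, dj in zip(w0, d)] for s in steps]
    deltas = [[c - w for c, w in zip(ck, w0)] for ck in ckpts]
    print(rank1_energy(deltas, leading_direction(deltas)))
    print(extrapolate(w0, ckpts, steps, target_step=8.0))
```

On real checkpoints the flattened vectors would have billions of entries, so one would instead use a library SVD per weight matrix. The energy fraction is the quantity that would tell you whether a single-direction extrapolation is even plausible for a given run.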

#llm #rl #benchmark
