Paper deep dive | Hugging Face | Published: 2026-05-18 | HF ↑44

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

Authors: Minxuan Lv, Tiehua Mei, Tanlong Du, Junmin Chen, Zhenpeng Su, and 7 others

Summary

We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogene…

#rl #alignment #benchmark
