GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
Abstract
We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogene…