DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
Abstract
Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily curated difficult datasets, which limits scalable capability improvement. In this paper, we introduce DenoiseRL, a reinforce…