Paper | Hugging Face | Published: 2026-05-05

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

Authors: Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, and 3 others

Abstract

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments th…
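The trade-off described in the abstract can be made concrete with a toy model. The sketch below is not the authors' method. It assumes a hypothetical confidence-threshold disclosure policy, made-up per-step confidence scores, and a hypothetical `safe_level` used only to count low-confidence public commitments. Each step is either kept private (deliberation) or emitted publicly (an irreversible commitment), and the "silence tax" is approximated as the number of steps before the first public token.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trace:
    # Internal reasoning steps that are never shown to the user.
    private: List[str] = field(default_factory=list)
    # Public, irreversible output as (token, confidence) pairs.
    public: List[Tuple[str, float]] = field(default_factory=list)
    # Index of the first public step; -1 means nothing was disclosed.
    first_public_step: int = -1


def run_threshold_policy(steps: List[str], confidences: List[float], threshold: float) -> Trace:
    """Hypothetical toy policy: disclose a step only if its confidence >= threshold."""
    trace = Trace()
    for i, (tok, conf) in enumerate(zip(steps, confidences)):
        if conf >= threshold:
            trace.public.append((tok, conf))
            if trace.first_public_step < 0:
                trace.first_public_step = i
        else:
            trace.private.append(tok)
    return trace


def silence_tax(trace: Trace, total_steps: int) -> int:
    """Steps spent before the first public token (total_steps if never disclosed)."""
    return trace.first_public_step if trace.first_public_step >= 0 else total_steps


def risky_commitments(trace: Trace, safe_level: float) -> int:
    """Count public tokens committed with confidence below a hypothetical safe_level."""
    return sum(1 for _, conf in trace.public if conf < safe_level)


if __name__ == "__main__":
    steps = ["plan", "check", "draft", "answer"]
    confs = [0.2, 0.5, 0.8, 0.95]
    for threshold in (0.9, 0.4):
        t = run_threshold_policy(steps, confs, threshold)
        print(threshold, silence_tax(t, len(steps)), risky_commitments(t, safe_level=0.9))
```

With these made-up numbers, the cautious threshold waits longer before saying anything but commits nothing risky, while the eager threshold speaks early at the cost of low-confidence public tokens. A learned disclosure policy would replace the fixed threshold with a decision conditioned on the model state.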

#llm #rl #benchmark


Related articles in the same category

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents