When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning
Abstract
In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments th…