Paper Hugging Face Published: 2026-05-19 HF ↑3

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

Authors: Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang

Abstract

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the user's true goal. We study this reward h…

#agent #coding #benchmark
