論文 Hugging Face 発表: 2026-05-19 HF ↑3

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

著者: Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang

要約

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the users true goal. We study this reward h…

#agent#coding#benchmark

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

要約

同じカテゴリの記事

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: テキストから動画生成における3D制約の強化学習による整合

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents