Paper | Deep Dive | Hugging Face | Published: 2026-05-31 | HF ↑41

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Authors: Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim, Dayoon Ko, and 10 others

Summary

Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400…

#agent #benchmark #llm
