Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents
要約
Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as…