Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
Abstract
Vision-language models with extended reasoning succeed on complex problems, but many real-world problems cannot be resolved by internal reasoning alone and instead require external tools. Agentic reasoning therefore interleaves two behaviors with a structural asymmetry: thinking (the self-contained default)…