VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
Abstract
The recent “Reasoning with Video” paradigm uses Video Generation Models (VGMs) to produce temporally coherent visual trajectories that solve reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to log…