MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents
MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents
要約
Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely on raw history replay or text-only memory, which either over…