StableVLA: Towards Robust Vision-Language-Action Models without Extra Data
Abstract
It is infeasible to cover every possible disturbance in the training dataset. This raises a critical question about the robustness of Vision-Language-Action (VLA) models when they encounter unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, …