Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
Abstract
Subject-driven image generation aims to synthesize new images that preserve the identity of a given subject while following textual instructions. Existing approaches often encode the text and the reference images separately, which limits cross-modal reasoning and causes copy-paste artifacts. Rece…