DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images
arXiv (2024)
Abstract
Text-based semantic image editing involves manipulating an image using
a natural language instruction. Although recent works are capable of generating
creative and high-quality images, the problem is still mostly approached as a
black box prone to generating unexpected outputs. Therefore, we propose a
novel model to enhance the text-based control of an image editor by explicitly
reasoning about which parts of the image to alter or preserve. It relies on
the input image and on word alignments between a description of the original
source image and the instruction that reflects the needed updates. The proposed
Diffusion Masking with word Alignments (DM-Align) allows the editing of an
image in a transparent and explainable way. It is evaluated on a subset of the
Bison dataset and a self-defined dataset dubbed Dream. Compared with
state-of-the-art baselines, quantitative and qualitative results show that
DM-Align achieves superior performance in image editing conditioned on language
instructions, preserves the background of the image well, and copes better
with long text instructions.