ANCHOR: LLM-driven News Subject Conditioning for Text-to-Image Synthesis
arXiv (2024)
Abstract
Text-to-Image (T2I) Synthesis has made tremendous strides in enhancing
synthesized image quality, but current datasets evaluate model performance only
on descriptive, instruction-based prompts. Real-world news image captions take
a more pragmatic approach, providing high-level situational and Named-Entity
(NE) information and limited physical object descriptions, making them
abstractive. To evaluate the ability of T2I models to capture intended subjects
from news captions, we introduce the Abstractive News Captions with High-level
cOntext Representation (ANCHOR) dataset, containing 70K+ samples sourced from 5
different news media organizations. With Large Language Models (LLMs) achieving
success in language and commonsense reasoning tasks, we explore the ability of
different LLMs to identify and understand key subjects from abstractive
captions. Our proposed method, Subject-Aware Finetuning (SAFE), selects and
enhances the representation of key subjects in synthesized images by leveraging
LLM-generated subject weights. It also adapts to the domain distribution of
news images and captions through custom Domain Fine-tuning, outperforming
current T2I baselines on ANCHOR. By launching the ANCHOR dataset, we hope to
motivate research in furthering the Natural Language Understanding (NLU)
capabilities of T2I models.