SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
CVPR 2024
Abstract
Misinformation is a prevalent societal issue due to its potentially high risks.
Out-of-context (OOC) misinformation, where authentic images are repurposed with
false text, is one of the easiest and most effective ways to mislead audiences.
Current methods focus on assessing image-text consistency but lack convincing
explanations for their judgments, which are essential for debunking
misinformation. While Multimodal Large Language Models (MLLMs) have rich
knowledge and an innate capability for visual reasoning and explanation
generation, they still lack sophistication in understanding and discovering
subtle cross-modal differences. In this paper, we introduce SNIFFER, a novel
multimodal large language model specifically engineered for OOC misinformation
detection and explanation. SNIFFER employs two-stage instruction tuning on
InstructBLIP. The first stage refines the model's concept alignment of generic
objects with news-domain entities, and the second stage leverages OOC-specific
instruction data generated by language-only GPT-4 to fine-tune the model's
discriminative power. Enhanced by external tools and retrieval, SNIFFER not
only detects inconsistencies between text and image but also utilizes external
knowledge for contextual verification. Our experiments show that SNIFFER
surpasses the original MLLM by over 40% and outperforms state-of-the-art
methods in detection accuracy. SNIFFER also provides accurate and persuasive
explanations, as validated by quantitative and human evaluations.