The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
CoRR (2024)
Abstract
Large vision-language models (LVLMs), designed to interpret and respond to
human instructions, occasionally generate hallucinated or harmful content due
to inappropriate instructions. This study uses linear probing to shed light on
the hidden knowledge at the output layer of LVLMs. We demonstrate that the
logit distributions of the first tokens contain sufficient information to
determine whether to respond to an instruction, including recognizing
unanswerable visual questions, defending against multi-modal jailbreaking
attacks, and identifying deceptive questions. This hidden knowledge is
gradually lost in the logits of subsequent tokens during response generation.
We then illustrate a simple decoding strategy applied at the generation of the
first token that effectively improves the generated content. In our
experiments, we find several interesting insights: First, the CLIP model
already contains a strong signal for solving these tasks, indicating potential
bias in the existing datasets. Second, we observe performance improvements by
utilizing the first-token logit distributions on three additional tasks:
indicating uncertainty in math solving, mitigating hallucination, and image
classification. Last, with the same training data, simply finetuning LVLMs
improves model performance but remains inferior to linear probing on these
tasks.
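To make the core idea concrete, below is a minimal sketch of first-token
linear probing. It assumes you have already extracted the logit vector
produced at the first generated token position for each image-instruction
pair (e.g., via a forward pass through an LVLM); the arrays
`first_token_logits` and `labels` are illustrative placeholders, not the
paper's released code or data.

```python
# Minimal sketch: train a linear probe on first-token logit distributions
# to predict whether an instruction should be answered.
# Assumption: `first_token_logits` holds one vocabulary-sized logit vector
# per example; here we fill it with random placeholder data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

vocab_size = 32000  # illustrative vocabulary size
num_examples = 500

# Placeholder for real extracted logits; replace with LVLM outputs.
first_token_logits = rng.normal(size=(num_examples, vocab_size)).astype(np.float32)
labels = rng.integers(0, 2, size=num_examples)  # e.g., 1 = answerable, 0 = unanswerable

X_train, X_test, y_train, y_test = train_test_split(
    first_token_logits, labels, test_size=0.2, random_state=0
)

# The probe itself: a single linear classifier on frozen logits.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))
```

Because the probe is a single linear layer over frozen model outputs, it is
cheap to train and cannot memorize much beyond what the logits already
encode, which is what makes it a useful diagnostic for the hidden knowledge
the abstract describes.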