Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
CoRR (2024)
Abstract
The recently developed and publicly available synthetic image generation
methods and services make it possible to create extremely realistic imagery on
demand, raising great risks for the integrity and safety of online information.
State-of-the-art Synthetic Image Detection (SID) research has led to strong
evidence on the advantages of feature extraction from foundation models.
However, such extracted features mostly encapsulate high-level visual semantics
instead of fine-grained details, which are more important for the SID task. In
contrast, shallow layers encode low-level visual information. In this work,
we leverage the image representations extracted by intermediate Transformer
blocks of CLIP's image-encoder via a lightweight network that maps them to a
learnable forgery-aware vector space capable of generalizing exceptionally
well. We also employ a trainable module to incorporate the importance of each
Transformer block to the final prediction. Our method is compared against the
state-of-the-art by evaluating it on 20 test datasets and exhibits an average
+10.6% absolute performance improvement. The best performing models
require just a single epoch for training (~8 minutes). Code available at
https://github.com/mever-team/rine.
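The abstract describes combining representations from intermediate encoder blocks via a trainable module that weights each block's contribution before a lightweight classification head. The following is a minimal NumPy sketch of that idea only; the dimensions, weight initialization, and head are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: e.g. a ViT-style image encoder with 24 blocks,
# each producing a 1024-dim CLS-token feature (illustrative only).
n_blocks, dim = 24, 1024

# Stand-in for the features extracted from each intermediate
# Transformer block for one image (in practice taken from CLIP).
block_features = rng.normal(size=(n_blocks, dim))

# Trainable importance logit per block, learned jointly with the head.
importance_logits = np.zeros(n_blocks)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Weighted combination of the intermediate representations.
weights = softmax(importance_logits)   # (n_blocks,), sums to 1
fused = weights @ block_features       # (dim,)

# Lightweight head mapping the fused feature to a synthetic-vs-real score.
w_head = rng.normal(size=dim) / np.sqrt(dim)
logit = float(fused @ w_head)
prob_synthetic = 1.0 / (1.0 + np.exp(-logit))
print(f"p(synthetic) = {prob_synthetic:.3f}")
```

In training, the importance logits, the projection, and the head would be optimized end-to-end on a forgery-detection objective, letting the model emphasize the shallower blocks that carry the low-level cues the abstract highlights.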