Fusion Transformer with Object Mask Guidance for Image Forgery Analysis
CoRR (2024)
Abstract
In this work, we introduce OMG-Fuser, a fusion transformer-based network
designed to extract information from various forensic signals to enable robust
image forgery detection and localization. Our approach can operate with an
arbitrary number of forensic signals and leverages object information for their
analysis, unlike previous methods that rely on fusion schemes with few
signals and often disregard image semantics. To this end, we design a forensic
signal stream composed of a transformer guided by an object attention
mechanism that associates patches depicting the same object. In this way, we
incorporate object-level information from the image. Each forensic signal is
processed by a different stream that adapts to its peculiarities. Subsequently,
a token fusion transformer efficiently aggregates the outputs of an arbitrary
number of network streams and generates a fused representation for each image
patch. These representations are finally processed by a long-range dependencies
transformer that captures the intrinsic relations between the image patches. We
assess two fusion variants on top of the proposed approach: (i) score-level
fusion that fuses the outputs of multiple image forensics algorithms and (ii)
feature-level fusion that fuses low-level forensic traces directly. Both
variants exceed state-of-the-art performance on seven datasets for image
forgery detection and localization, with a relative average improvement of
12.1 […] traditional and novel forgery attacks and can be expanded with new signals
without training from scratch.
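The object attention mechanism described in the abstract can be illustrated with a minimal sketch. It assumes that every image patch has already been assigned an object id (for example, from an instance segmentation of the image) and that "associating patches that depict the same objects" is realized as a hard attention mask, so that a patch attends only to patches with the same id. The function name `object_masked_attention` and the hard-mask rule are illustrative assumptions, not OMG-Fuser's published formulation.

```python
import math
from typing import List

Vector = List[float]


def object_masked_attention(queries: List[Vector], keys: List[Vector],
                            values: List[Vector], object_ids: List[int]) -> List[Vector]:
    """Single-head scaled dot-product attention over image patches.

    Assumption (illustrative only): patch i may attend only to patches j
    that carry the same object id, i.e. a hard object mask.
    """
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        # Indices of patches depicting the same object as patch i (always includes i).
        allowed = [j for j, oid in enumerate(object_ids) if oid == object_ids[i]]
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in allowed]
        # Numerically stable softmax restricted to the allowed patches.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * values[j][c] for w, j in zip(weights, allowed)) / total
            for c in range(len(values[0]))
        ])
    return out


if __name__ == "__main__":
    ids = [0, 0, 1, 1]               # patches 0-1 show one object, 2-3 another
    zeros = [[0.0, 0.0]] * 4         # all-zero queries/keys give uniform weights
    vals = [[1.0], [3.0], [10.0], [20.0]]
    print(object_masked_attention(zeros, zeros, vals, ids))
```

With all-zero queries and keys, every allowed patch receives equal weight, so each output is the mean of the values within its own object: patches 0 and 1 both get `[2.0]`, and patches 2 and 3 both get `[15.0]`. Information never flows between the two objects, which is the property the mask is meant to enforce.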
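The claim that a fusion step can aggregate the outputs of an arbitrary number of streams can be sketched as attention pooling over the per-patch tokens of all streams. This is a simplified stand-in, not the token fusion transformer itself: a single query vector (hypothetical here; in a trained model it would be learned) scores each stream's token for a patch, and the softmax-weighted sum becomes that patch's fused token. The names `fuse_patch_tokens` and `fuse_streams` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def fuse_patch_tokens(stream_tokens: List[Vector], query: Vector) -> Vector:
    """Fuse the tokens that S streams produce for one patch.

    Simplified stand-in for a token fusion transformer: a single query
    vector (hypothetical; learned in a real model) scores each stream token,
    and the softmax-weighted sum is the fused token.
    """
    d = len(query)
    scores = [sum(a * b for a, b in zip(query, tok)) / math.sqrt(d) for tok in stream_tokens]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * tok[c] for w, tok in zip(weights, stream_tokens)) / total for c in range(d)]


def fuse_streams(streams: List[List[Vector]], query: Vector) -> List[Vector]:
    """streams[s][p] is the token of stream s for patch p.

    Returns one fused token per patch; any number of streams S >= 1 works.
    """
    num_patches = len(streams[0])
    return [fuse_patch_tokens([s[p] for s in streams], query) for p in range(num_patches)]


if __name__ == "__main__":
    two = [[[1.0, 2.0], [0.0, 0.0]], [[3.0, 4.0], [2.0, 2.0]]]
    print(fuse_streams(two, [0.0, 0.0]))    # two streams, two patches
    three = two + [[[5.0, 6.0], [4.0, 4.0]]]
    print(fuse_streams(three, [0.0, 0.0]))  # same code, one more stream
```

Because the softmax is taken over however many stream tokens are present, the same function handles two, three, or more streams without any change to its parameters. This is one plausible way a model could accept additional signals without retraining from scratch; the abstract does not state that OMG-Fuser works exactly this way.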