Multilateral Temporal-view Pyramid Transformer for Video Inpainting Detection
arXiv (2024)
Abstract
The task of video inpainting detection is to expose the pixel-level inpainted
regions within a video sequence. Existing methods usually focus on leveraging
spatial and temporal inconsistencies. However, these methods typically employ
fixed operations to combine spatial and temporal clues, limiting their
applicability in different scenarios. In this paper, we introduce a novel
Multilateral Temporal-view Pyramid Transformer (MumPy) that flexibly combines
spatial-temporal clues. Our method utilizes a newly designed
multilateral temporal-view encoder to extract various collaborations of
spatial-temporal clues and introduces a deformable window-based temporal-view
interaction module to enhance the diversity of these collaborations.
Subsequently, we develop a multi-pyramid decoder to aggregate the various types
of features and generate detection maps. By adjusting the contribution strength
of spatial and temporal clues, our method can effectively identify inpainted
regions. We validate our method on existing datasets and also introduce a new,
challenging, large-scale video inpainting dataset based on the YouTube-VOS
dataset, which employs several more recent inpainting methods. The results
demonstrate the superiority of our method in both in-domain and cross-domain
evaluation scenarios.