Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable
arXiv (2024)
Abstract
Foundational generative models should be traceable to protect their owners
and facilitate safety regulation. To achieve this, traditional approaches embed
identifiers based on supervisory trigger-response signals, which are commonly
known as backdoor watermarks. They are prone to failure when the model is
fine-tuned with non-trigger data. Our experiments show that this vulnerability
is due to energetic changes in only a few 'busy' layers during fine-tuning.
This observation motivates a novel arbitrary-in-arbitrary-out (AIAO) strategy
that makes watermarks resilient to fine-tuning-based removal. The trigger-response pairs
of AIAO samples across various neural network depths can be used to construct
watermarked subpaths, and Monte Carlo sampling over these subpaths yields stable
verification results. In addition, unlike existing methods that design a
backdoor for the input/output space of diffusion models, our method embeds the
backdoor into the feature space of sampled subpaths, where we propose a
mask-controlled trigger function that preserves generation performance and keeps
the embedded backdoor invisible. Our empirical
studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm
the robustness of AIAO; while the verification rates of other trigger-based
methods fall from 90% … consistently above 90%.
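The abstract describes verifying ownership by Monte Carlo sampling of watermarked subpaths, with a mask-controlled trigger injected in feature space. The following is a minimal, purely illustrative sketch of that idea in plain Python, not the authors' implementation. It rests on several assumptions. The trigger is blended into a feature vector as x' = (1 - m) * x + m * t. The expected response is the trigger itself on the masked entries. Subpaths are contiguous layer ranges [start, end) drawn uniformly. Layers are toy functions, and a hypothetical `busy` flag stands in for a layer that fine-tuning changed heavily. All function names (`apply_masked_trigger`, `make_layer`, `verify_subpath`, `monte_carlo_verification_rate`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Feature = List[float]
Layer = Callable[[Feature], Feature]


def apply_masked_trigger(x: Sequence[float], trigger: Sequence[float],
                         mask: Sequence[int]) -> Feature:
    """Assumed mask-controlled trigger: x' = (1 - m) * x + m * t, elementwise."""
    return [(1 - m) * xi + m * ti for xi, ti, m in zip(x, trigger, mask)]


def make_layer(mask: Sequence[int], busy: bool) -> Layer:
    """Toy layer (hypothetical). Unmasked entries get an arbitrary 'normal'
    computation; masked entries carry the embedded trigger->response mapping
    (identity here). A busy layer, standing in for one heavily changed by
    fine-tuning, perturbs the masked entries and breaks the response."""
    def layer(x: Feature) -> Feature:
        out = []
        for xi, m in zip(x, mask):
            if m:
                out.append(xi + 0.5 if busy else xi)
            else:
                out.append(0.5 * xi + 1.0)
        return out
    return layer


def verify_subpath(layers: Sequence[Layer], start: int, end: int,
                   x: Feature, trigger: Feature, mask: Sequence[int],
                   tol: float = 1e-6) -> bool:
    """Inject the trigger at the input of layers[start], run layers[start:end],
    and check that the masked entries still match the expected response."""
    h = apply_masked_trigger(x, trigger, mask)
    for layer in layers[start:end]:
        h = layer(h)
    return all(abs(hi - ti) <= tol
               for hi, ti, m in zip(h, trigger, mask) if m)


def monte_carlo_verification_rate(layers: Sequence[Layer], x: Feature,
                                  trigger: Feature, mask: Sequence[int],
                                  n_samples: int = 2000, seed: int = 0) -> float:
    """Estimate the fraction of uniformly sampled subpaths that verify."""
    rng = random.Random(seed)
    n = len(layers)
    hits = 0
    for _ in range(n_samples):
        # Two distinct boundaries in 0..n, sorted so that start < end.
        start, end = sorted(rng.sample(range(n + 1), 2))
        hits += verify_subpath(layers, start, end, x, trigger, mask)
    return hits / n_samples


if __name__ == "__main__":
    mask = [1, 1, 0, 0]
    trigger = [0.3, -0.7, 0.0, 0.0]
    x = [1.0, 2.0, 3.0, 4.0]
    clean = [make_layer(mask, busy=False) for _ in range(10)]
    tuned = [make_layer(mask, busy=(k == 3)) for k in range(10)]
    print(monte_carlo_verification_rate(clean, x, trigger, mask))  # 1.0
    print(monte_carlo_verification_rate(tuned, x, trigger, mask))
```

With ten layers there are 55 possible subpaths, and only those with start <= 3 < end (28 of them) pass through the busy layer, so the tuned estimate should land near 27/55, or about 0.49. Subpaths that skip the busy layer still verify. These numbers describe this toy only, not AIAO's reported results.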