Generative Models are Self-Watermarked: Declaring Model Authentication through Re-Generation
CoRR (2024)
Abstract
As machine- and AI-generated content proliferates, protecting the
intellectual property of generative models has become imperative, yet verifying
data ownership poses formidable challenges, particularly in cases of
unauthorized reuse of generated data. The difficulty is compounded when models
are offered through Machine Learning as a Service (MLaaS), which often functions
as a black-box system.
Our work is dedicated to detecting data reuse from even an individual sample.
Traditionally, watermarking has been leveraged to detect AI-generated content.
However, unlike watermarking techniques that embed additional information as
triggers into models or generated content, potentially compromising output
quality, our approach identifies latent fingerprints inherently present within
the outputs through re-generation. We propose an explainable verification
procedure that attributes data ownership through re-generation, and further
amplifies these fingerprints in the generative models through iterative data
re-generation. This methodology is theoretically grounded and demonstrates
viability and robustness using recent advanced text and image generative
models. Beyond protecting the intellectual property of APIs, the methodology
addresses broader issues such as the spread of misinformation and academic
misconduct. It offers a tool for verifying the integrity of sources and
authorship in scenarios where authenticity and ownership verification are
essential.
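
The re-generation idea described above can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not specify the models, the distance metric, or the decision rule, so everything here is an assumption. `ToyModel`, `regeneration_distance`, `iterate`, and `attribute` are hypothetical names, and each "model" is just a function that pulls a numeric sample toward its own grid of preferred points. The sketch shows only the general intuition: a sample changes less when re-generated by the model that produced it, and iterating re-generation makes that gap larger.

```python
from math import sqrt


class ToyModel:
    """Hypothetical stand-in for a generative model's re-generation step."""

    def __init__(self, step, offset, pull=0.5):
        self.step = step      # spacing of this model's preferred points
        self.offset = offset  # shift of the grid; differs between models
        self.pull = pull      # fraction of the gap closed per re-generation

    def regenerate(self, x):
        # Move each coordinate part of the way toward the nearest grid point.
        out = []
        for v in x:
            target = round((v - self.offset) / self.step) * self.step + self.offset
            out.append(v + self.pull * (target - v))
        return out


def distance(a, b):
    # Euclidean distance; an assumed metric, chosen only for illustration.
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def regeneration_distance(model, x):
    # How much a sample changes when the model re-generates it once.
    return distance(x, model.regenerate(x))


def iterate(model, x, k):
    # Apply re-generation k times, pulling x closer to the model's grid.
    for _ in range(k):
        x = model.regenerate(x)
    return x


def attribute(sample, candidates):
    # Attribute the sample to the candidate that changes it the least.
    scores = {name: regeneration_distance(m, sample) for name, m in candidates.items()}
    return min(scores, key=scores.get), scores


model_a = ToyModel(step=1.0, offset=0.0)
model_b = ToyModel(step=1.0, offset=0.5)
seed = [0.3, 1.8, -0.6]

once = iterate(model_a, seed, 1)    # one round of re-generation by model A
thrice = iterate(model_a, seed, 3)  # three rounds by model A

owner, scores = attribute(thrice, {"A": model_a, "B": model_b})
print(owner, scores)
```

In this toy setting the sample produced by model A scores a much smaller re-generation distance under A than under B, and the distance under A shrinks further with each extra round. A real system would replace `ToyModel.regenerate` with calls to actual text or image generators and would use a metric suited to that modality.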