Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning
CoRR (2024)
Abstract
Multimodal contrastive learning has emerged as a powerful paradigm for
building high-quality features using the complementary strengths of various
data modalities. However, the open nature of such systems inadvertently
increases the possibility of backdoor attacks. These attacks subtly embed
malicious behaviors within the model during training, which can be activated by
specific triggers in the inference phase, posing significant security risks.
Although existing countermeasures based on fine-tuning can reduce the adverse
impact of such attacks, these defenses often degrade clean accuracy and
require constructing large sets of clean training pairs. In this paper, we
explore the possibility of a lower-cost defense from the perspective of model
unlearning, that is, whether the model can be made to quickly unlearn
backdoor threats (UBT) by constructing a small set of poisoned samples.
Specifically, we strengthen the backdoor shortcut through overfitting
training that prioritizes weak-similarity samples, thereby exposing
suspicious samples. Building on this initial identification of suspicious
samples, we introduce a token-based localized forgetting training regime.
This technique specifically targets the poisoned components of the model,
concentrating the unlearning effort on the backdoor associations while
leaving the integrity of the overall model intact. Experimental results show
that our method not only reduces the attack success rate to a minimal level,
but also preserves the model's high clean accuracy.
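
The two-stage pipeline described in the abstract can be sketched in a few lines of PyTorch. The sketch below is a hypothetical illustration, not the authors' implementation: the DualEncoder toy model, the weak_frac and suspect_frac thresholds, the use of raw image-text pair similarity in place of the full multimodal contrastive loss, and the restriction of updates to a chosen parameter subset (standing in for the paper's token-level locality) are all assumptions made for illustration.

import copy
import torch
import torch.nn.functional as F
from torch import nn

# Toy stand-in for a CLIP-style dual encoder; the architecture and the
# 64-dim embedding space are placeholders, not the paper's model.
class DualEncoder(nn.Module):
    def __init__(self, dim_img=128, dim_txt=128, dim_emb=64):
        super().__init__()
        self.image_enc = nn.Linear(dim_img, dim_emb)
        self.text_enc = nn.Linear(dim_txt, dim_emb)

    def forward(self, img, txt):
        zi = F.normalize(self.image_enc(img), dim=-1)
        zt = F.normalize(self.text_enc(txt), dim=-1)
        return zi, zt

def pair_similarity(model, img, txt):
    zi, zt = model(img, txt)
    return (zi * zt).sum(dim=-1)  # cosine similarity of each image-text pair

# Stage 1 (sketch): overfit a copy of the suspect model on the weakest-similarity
# fraction of the pool so that the easily fitted backdoor shortcut is amplified,
# then rank samples by similarity gain to flag suspicious (likely poisoned) pairs.
def select_suspicious(model, img, txt, weak_frac=0.2, suspect_frac=0.05,
                      steps=50, lr=1e-3):
    with torch.no_grad():
        sim_before = pair_similarity(model, img, txt)
    weak_idx = sim_before.argsort()[: int(weak_frac * len(sim_before))]

    overfit = copy.deepcopy(model)
    opt = torch.optim.Adam(overfit.parameters(), lr=lr)
    for _ in range(steps):
        loss = -pair_similarity(overfit, img[weak_idx], txt[weak_idx]).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        gain = pair_similarity(overfit, img, txt) - sim_before
    return gain.argsort(descending=True)[: int(suspect_frac * len(gain))]

# Stage 2 (sketch): localized unlearning. Gradient ascent on a simplified
# contrastive objective for the suspicious pairs (pushing their embeddings
# apart), with updates restricted to `local_params` as a crude stand-in for
# the paper's token-level locality.
def unlearn(model, img, txt, suspicious_idx, local_params, steps=20, lr=1e-4):
    opt = torch.optim.SGD(local_params, lr=lr)
    for _ in range(steps):
        loss = pair_similarity(model, img[suspicious_idx], txt[suspicious_idx]).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = DualEncoder()
    img, txt = torch.randn(256, 128), torch.randn(256, 128)
    suspects = select_suspicious(model, img, txt)
    unlearn(model, img, txt, suspects, list(model.image_enc.parameters()))
    print("unlearned on", len(suspects), "suspected poisoned pairs")

In the actual method, the unlearning objective is applied to the multimodal contrastive loss and locality is enforced at the token level rather than by updating a fixed parameter subset; the sketch only mirrors the overall flow of suspicious-sample discovery followed by focused unlearning.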