Carefully Blending Adversarial Training and Purification Improves Adversarial Robustness
arXiv (2023)
Abstract
In this work, we propose a novel adversarial defence mechanism for image
classification - CARSO - blending the paradigms of adversarial training and
adversarial purification in a synergistic robustness-enhancing way. The method
builds upon an adversarially-trained classifier, and learns to map its internal
representation associated with a potentially perturbed input onto a
distribution of tentative clean reconstructions. Multiple samples from such
distribution are classified by the same adversarially-trained model, and an
aggregation of its outputs finally constitutes the robust prediction of
interest. Experimental evaluation by a well-established benchmark of strong
adaptive attacks, across different image datasets, shows that CARSO is able to
defend itself against adaptive end-to-end white-box attacks devised for
stochastic defences. Paying a modest clean accuracy toll, our method improves
by a significant margin the state-of-the-art for CIFAR-10, CIFAR-100, and
TinyImageNet-200 ℓ_∞ robust classification accuracy against
AutoAttack. Code and instructions to obtain pre-trained models are available
at https://github.com/emaballarin/CARSO .
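The defence pipeline described above — sample several tentative clean reconstructions of a possibly perturbed input, classify each with the same adversarially-trained model, and aggregate the outputs — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the classifier and the purifier are hypothetical stand-ins (a random linear map and additive-noise sampling), and all names and hyperparameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 10
K = 8  # number of tentative clean reconstructions sampled per input (assumed)

# Toy stand-in for the adversarially-trained classifier's weights.
W = rng.normal(size=(32, NUM_CLASSES))

def classify(x):
    """Toy classifier: returns softmax class probabilities for one input."""
    logits = x @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sample_reconstruction(x):
    """Toy purifier: draws one tentative clean reconstruction of x.
    (CARSO instead samples from a learned distribution conditioned on the
    classifier's internal representation.)"""
    return x + rng.normal(scale=0.1, size=x.shape)

def robust_predict(x, k=K):
    """Classify k sampled reconstructions and average the probabilities;
    the aggregate constitutes the robust prediction."""
    probs = np.stack([classify(sample_reconstruction(x)) for _ in range(k)])
    return probs.mean(axis=0)

x = rng.normal(size=32)          # a (possibly perturbed) input
p = robust_predict(x)            # aggregated class probabilities
prediction = int(np.argmax(p))   # final robust class prediction
```

Averaging probabilities over multiple stochastic reconstructions is what makes the defence stochastic end-to-end, which is why the evaluation uses adaptive attacks devised specifically for stochastic defences.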