Assessing Adversarial Robustness of Large Language Models: An Empirical Study
arXiv (2024)
Abstract
Large Language Models (LLMs) have revolutionized natural language processing,
but their robustness against adversarial attacks remains a critical concern. We
present a novel white-box attack approach that exposes vulnerabilities in
leading open-source LLMs, including Llama, OPT, and T5. We assess how model
size, architecture, and fine-tuning strategy affect resistance to adversarial
perturbations. Our comprehensive evaluation across five diverse text
classification tasks establishes a new benchmark for LLM robustness. The
findings have far-reaching implications for the reliable deployment of LLMs in
real-world applications and contribute to the advancement of trustworthy AI
systems.
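The abstract does not detail the attack itself, so as a rough illustration of the white-box attack family it belongs to, the sketch below implements a gradient-guided single-token substitution (in the spirit of HotFlip) against a toy PyTorch classifier. The toy model, vocabulary size, and hyperparameters are illustrative assumptions, not the authors' method or evaluated models.

import torch
import torch.nn as nn

VOCAB, DIM, CLASSES = 1000, 32, 2

class ToyClassifier(nn.Module):
    """Stand-in text classifier: embed tokens, mean-pool, linear head."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, CLASSES)

    def forward(self, embeds):
        # embeds: (batch, seq_len, DIM) -> logits: (batch, CLASSES)
        return self.head(embeds.mean(dim=1))

def best_token_flip(model, token_ids, label):
    """Pick the single-token substitution that most increases the loss,
    using a first-order (gradient) approximation at the input embeddings:
    the predicted loss change for swapping token i to word w is
    grad_i . (e_w - e_{token_i})."""
    embeds = model.emb(token_ids).detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(embeds), label)
    loss.backward()
    grad = embeds.grad[0]                         # (seq_len, DIM)
    emb_matrix = model.emb.weight.detach()        # (VOCAB, DIM)
    scores = grad @ emb_matrix.T                  # (seq_len, VOCAB)
    scores -= (grad * embeds[0].detach()).sum(-1, keepdim=True)
    pos = scores.max(dim=1).values.argmax().item()
    new_tok = scores[pos].argmax().item()
    return pos, new_tok

model = ToyClassifier()
ids = torch.randint(0, VOCAB, (1, 8))            # a random 8-token "sentence"
label = torch.tensor([0])
pos, tok = best_token_flip(model, ids, label)
print(f"flip position {pos} -> token {tok}")

Against real classifiers built on Llama, OPT, or T5, the same first-order score would be computed at the model's input-embedding layer, typically with candidate substitutions filtered for fluency and semantic similarity.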