Investigating the Impact of Model Instability on Explanations and Uncertainty
CoRR (2024)
Abstract
Explainable AI methods facilitate the understanding of model behaviour, yet
small, imperceptible perturbations to inputs can vastly distort explanations.
As these explanations are typically evaluated holistically, before model
deployment, it is difficult to assess when a particular explanation is
trustworthy. Some studies have tried to create confidence estimators for
explanations, but none have investigated an existing link between uncertainty
and explanation quality. We artificially simulate epistemic uncertainty in text
input by introducing noise at inference time. In this large-scale empirical
study, we insert different levels of noise perturbations and measure the effect
on the output of pre-trained language models and different uncertainty metrics.
Realistic perturbations have minimal effect on performance and explanations,
yet masking has a drastic effect. We find that high uncertainty doesn't
necessarily imply low explanation plausibility; the correlation between the two
metrics can be moderately positive when the model is exposed to noise during
training. This suggests that noise-augmented models may be better at identifying
salient tokens when uncertain. Furthermore, when predictive and epistemic
uncertainty measures are over-confident, the robustness of a saliency map to
perturbation can indicate model stability issues. Integrated Gradients shows
the greatest overall robustness to perturbation, while still showing
model-specific patterns in performance; however, this phenomenon is limited to
smaller Transformer-based language models.
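
As a rough illustration of the setup described above, the following minimal sketch masks a fixed fraction of input tokens at inference time and computes the entropy of a predicted class distribution as one possible uncertainty measure. The function names, the "[MASK]" placeholder, the random choice of positions, and the use of Shannon entropy are all assumptions made for illustration; the abstract does not specify the authors' actual perturbation procedure or uncertainty metrics.

```python
import math
import random


def mask_tokens(tokens, noise_level, mask_token="[MASK]", seed=0):
    """Replace a fraction `noise_level` of tokens with `mask_token`.

    Hypothetical helper: positions are drawn uniformly at random, which is
    an assumption and not necessarily the scheme used in the paper.
    """
    rng = random.Random(seed)
    n_mask = round(noise_level * len(tokens))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [mask_token if i in positions else tok for i, tok in enumerate(tokens)]


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector.

    Used here as a stand-in uncertainty measure; zero-probability classes
    contribute nothing to the sum.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    sentence = "the film was surprisingly good".split()
    for level in (0.0, 0.2, 0.4):
        print(level, mask_tokens(sentence, level))
    # Illustrative probabilities only, not model outputs from the paper.
    print(predictive_entropy([0.9, 0.1]), predictive_entropy([0.5, 0.5]))
```

In a full experiment, each perturbed input would be passed through the language model, and the resulting uncertainty scores would then be compared with explanation-quality scores across noise levels.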