Uncertainty in Additive Feature Attribution methods
CoRR (2023)
Abstract
In this work, we explore various topics that fall under the umbrella of
uncertainty in post-hoc Explainable AI (XAI) methods. In particular, we focus
on the class of additive feature attribution explanation methods. We first
describe our specifications of uncertainty and compare various statistical and
recent methods for quantifying it. Next, for a particular instance, we study
the relationship between a feature's attribution and its uncertainty and
observe little correlation. We therefore propose a modification to the
distribution from which perturbations are sampled in LIME-based algorithms,
such that the important features have minimal uncertainty at no additional
computational cost. Then, while studying how the uncertainty in explanations
varies across the feature space of a classifier, we observe that a fraction of
instances exhibit near-zero uncertainty. We coin the term "stable instances"
for such instances and diagnose the factors that make an instance stable.
Finally, we study how an XAI algorithm's uncertainty varies with the size and
complexity of the underlying model, and observe that the more complex the
model, the more inherent uncertainty it exhibits. Based on this, we propose a
measure to quantify the relative complexity of a black-box classifier. This
measure could be incorporated, for example, into the sampling densities of
LIME-based algorithms to help different explanation algorithms achieve tighter
confidence levels. Together, these measures would have a strong impact on
making XAI models more trustworthy for the end-user, as well as aiding
scientific discovery.
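
To make the notion of explanation uncertainty concrete, the sketch below shows one standard statistical way to measure it: re-run LIME on the same instance with fresh perturbation samples and compute the per-feature spread of the attribution weights, flagging near-zero spread as a "stable instance" in the abstract's sense. This is an illustration under assumptions, not the paper's exact procedure; the scikit-learn model, dataset, run counts, and stability threshold are all placeholder choices.

```python
"""Hedged sketch: quantify LIME attribution uncertainty via repeated runs.

Assumes the standard `lime` Python package; the model, data, n_runs,
num_samples, and the 1e-3 stability cutoff are illustrative assumptions,
not values taken from the paper.
"""
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
X, y = data.data, data.target
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=list(data.feature_names), mode="classification"
)

def attribution_runs(x, n_runs=20, num_samples=1000):
    """Explain the same instance n_runs times; each run draws a fresh set
    of perturbations, so the attribution weights vary from run to run."""
    runs = np.zeros((n_runs, X.shape[1]))
    for r in range(n_runs):
        exp = explainer.explain_instance(
            x, model.predict_proba,
            num_features=X.shape[1], num_samples=num_samples,
        )
        # as_map() returns {label: [(feature_index, weight), ...]};
        # label 1 is LIME's default explained label for classification.
        for feat_idx, weight in exp.as_map()[1]:
            runs[r, feat_idx] = weight
    return runs

runs = attribution_runs(X[0])
mean_attr = runs.mean(axis=0)  # point estimate of each feature's attribution
std_attr = runs.std(axis=0)    # its uncertainty across re-runs

# An instance whose attributions barely move across re-runs would count as
# "stable"; the 1e-3 cutoff here is arbitrary, for illustration only.
print("max per-feature std:", std_attr.max())
print("stable instance?", bool(std_attr.max() < 1e-3))
```

The paper's proposed modification to the perturbation-sampling distribution, and its model-complexity measure, would act on exactly the quantity computed here (the per-feature spread), shrinking it for the important features without extra sampling cost.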