Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models
arXiv (2024)
Abstract
As AI systems like language models are increasingly integrated into
decision-making processes affecting people's lives, it's critical to ensure
that these systems have sound moral reasoning. To test whether they do, we need
to develop systematic evaluations. We provide a framework that uses a language
model to translate causal graphs that capture key aspects of moral dilemmas
into prompt templates. With this framework, we procedurally generated a large
and diverse set of moral dilemmas – the OffTheRails benchmark – consisting of
50 scenarios and 400 unique test items. We collected moral permissibility and
intention judgments from human participants for a subset of our items and
compared these judgments to those from two language models (GPT-4 and Claude-2)
across eight conditions. We find that moral dilemmas in which the harm is a
necessary means (as compared to a side effect) resulted in lower permissibility
and higher intention ratings for both participants and language models. The
same pattern was observed for evitable versus inevitable harmful outcomes.
However, there was no clear effect of whether the harm resulted from an agent's
action versus from having omitted to act. We discuss limitations of our prompt
generation pipeline and opportunities for improving scenarios to increase the
strength of experimental effects.
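The abstract describes translating causal graphs of moral dilemmas into prompt templates. The following is a minimal illustrative sketch of that idea, not the authors' actual pipeline: a hypothetical `DilemmaGraph` record captures the manipulated factors (causal role of the harm, evitability, action vs. omission), and a template is filled from it. All names and template wording here are invented for illustration; only the causal-role factor is rendered, for brevity.

```python
from dataclasses import dataclass

@dataclass
class DilemmaGraph:
    """Hypothetical encoding of a dilemma's causal structure (illustrative only)."""
    agent: str
    action: str
    good_outcome: str
    harm: str
    causal_role: str   # "means" or "side_effect"
    evitability: str   # "evitable" or "inevitable" (not rendered in this sketch)
    locus: str         # "action" or "omission"  (not rendered in this sketch)

TEMPLATE = (
    "{agent} can {action}, thereby {good_outcome}. "
    "{harm_clause} Was it morally permissible for {agent} to {action}?"
)

def harm_clause(g: DilemmaGraph) -> str:
    # The means/side-effect distinction is one of the benchmark's key factors.
    if g.causal_role == "means":
        return f"The {g.harm} is a necessary means to that outcome."
    return f"The {g.harm} is a foreseen side effect of that outcome."

def render(g: DilemmaGraph) -> str:
    """Fill the prompt template from the causal-graph record."""
    return TEMPLATE.format(
        agent=g.agent,
        action=g.action,
        good_outcome=g.good_outcome,
        harm_clause=harm_clause(g),
    )

# Example (all content invented for illustration):
g = DilemmaGraph(
    agent="Dr. Lee",
    action="divert the drug supply",
    good_outcome="saving five patients",
    harm="death of one patient",
    causal_role="means",
    evitability="evitable",
    locus="action",
)
print(render(g))
```

In the paper's framework a language model performs this translation, so the generated scenarios are richer than a fixed template; the sketch only shows how varying one slot of the graph yields distinct test items.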