What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders
arXiv (2024)
Abstract
In the past decade, deep learning (DL) models have gained prominence for
their exceptional accuracy on benchmark datasets in recommender systems
(RecSys). However, their evaluation has primarily relied on offline metrics,
overlooking direct user perception and experience. To address this gap, we
conduct a human-centric evaluation case study of four leading DL-RecSys models
in the movie domain. We test how different DL-RecSys models perform in
personalized recommendation generation by conducting a survey study with 445 real
active users. We find some DL-RecSys models to be superior in recommending
novel and unexpected items and weaker in diversity, trustworthiness,
transparency, accuracy, and overall user satisfaction compared to classic
collaborative filtering (CF) methods. To further explain the reasons behind the
underperformance, we apply a comprehensive path analysis. We discover that the
lack of diversity and too much serendipity from DL models can negatively impact
the consequent perceived transparency and personalization of recommendations.
Such a path ultimately leads to lower summative user satisfaction.
Qualitatively, we confirm with real user quotes that accuracy plus at least one
other attribute is necessary to ensure a good user experience, while users'
demands for transparency and trust cannot be neglected. Based on our findings,
we discuss future human-centric DL-RecSys design and optimization strategies.