UniAR: Unifying Human Attention and Response Prediction on Visual Content
CoRR (2023)
Abstract
Progress in human behavior modeling involves understanding both implicit,
early-stage perceptual behavior such as human attention and explicit,
later-stage behavior such as subjective ratings/preferences. Yet, most prior
research has focused on modeling implicit and explicit human behavior in
isolation. Can we build a unified model of human attention and preference
behavior that reliably works across diverse types of visual content? Such a
model would enable predicting subjective feedback such as overall satisfaction
or aesthetic quality ratings, along with the underlying human attention or
interaction heatmaps and viewing order, helping designers and content-creation
models optimize their creations for human-centric improvements. In this
paper, we propose UniAR -- a unified model that predicts both implicit and
explicit human behavior across different types of visual content. UniAR
leverages a multimodal transformer, featuring distinct prediction heads for
each facet, and predicts attention heatmap, scanpath or viewing order, and
subjective rating/preference. We train UniAR on diverse public datasets
spanning natural images, web pages and graphic designs, and achieve leading
performance on multiple benchmarks across different image domains and various
behavior modeling tasks. Potential applications include providing instant
feedback on the effectiveness of UIs/digital designs/images, and serving as a
reward model to further optimize design/image creation.
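The abstract describes a single shared backbone with distinct prediction heads for each behavior facet (heatmap, scanpath, rating). The paper does not give implementation details here, so the following is only an illustrative sketch of that multi-head design: the `trunk`, head functions, shapes, and all numeric choices are assumptions, not the authors' code.

```python
import math

def trunk(image_tokens, text_tokens):
    # Stand-in for the shared multimodal transformer backbone:
    # fuse image and text token values into a small feature vector.
    fused = image_tokens + text_tokens
    mean = sum(fused) / len(fused)
    return [mean * (i + 1) for i in range(4)]

def heatmap_head(feat, h=2, w=2):
    # Attention-heatmap head: softmax over h*w spatial cells,
    # so the predicted heatmap sums to 1.
    logits = [feat[i % len(feat)] for i in range(h * w)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [[exps[r * w + c] / z for c in range(w)] for r in range(h)]

def scanpath_head(feat, steps=3):
    # Scanpath head: a sequence of normalized (x, y) fixation coordinates
    # representing viewing order.
    return [((feat[0] * (t + 1)) % 1.0, (feat[1] * (t + 1)) % 1.0)
            for t in range(steps)]

def rating_head(feat):
    # Rating head: squash the pooled feature into a 1-5 preference score.
    pooled = sum(feat) / len(feat)
    return 1.0 + 4.0 / (1.0 + math.exp(-pooled))

# One forward pass through the shared trunk, then each head in parallel.
feat = trunk([0.2, 0.4], [0.1, 0.3])
heatmap = heatmap_head(feat)
path = scanpath_head(feat)
score = rating_head(feat)
```

The design point being sketched is that all three outputs are read off the same fused representation, which is what lets one model serve both implicit (attention, scanpath) and explicit (rating) behavior prediction.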