Self-Explainable Affordance Learning with Embodied Caption
CoRR (2024)
Abstract
In the field of visual affordance learning, previous methods have mainly relied on
abundant images or videos delineating human behavior patterns to identify
action-possibility regions for object manipulation, with a variety of
applications in robotic tasks. However, they face a key challenge of
action ambiguity, illustrated by uncertainty such as whether to beat or carry a
drum, and by the complexity of processing intricate scenes. Moreover, timely
human intervention is important for rectifying robot errors. To
address these issues, we introduce Self-Explainable Affordance learning (SEA)
with embodied caption. This innovation enables robots to articulate their
intentions and bridges the gap between explainable vision-language captioning and
visual affordance learning. Due to the lack of an appropriate dataset, we unveil a
pioneering dataset and metrics tailored for this task, which integrate images,
heatmaps, and embodied captions. Furthermore, we propose a novel model that
effectively combines affordance grounding with self-explanation in a simple but
efficient manner. Extensive quantitative and qualitative experiments
demonstrate our method's effectiveness.
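
The abstract describes each dataset entry as integrating an image, an affordance heatmap, and an embodied caption. As a rough illustration only, the sketch below shows one hypothetical way such a sample could be represented; the field names, shapes, and the `action_region` helper are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch (not from the paper): a minimal record for the kind of
# sample the SEA dataset is described as containing -- an image, an affordance
# heatmap, and an embodied caption explaining the intended action.
from dataclasses import dataclass

import numpy as np


@dataclass
class SEASample:
    image: np.ndarray    # RGB image of the scene, shape (H, W, 3)
    heatmap: np.ndarray  # affordance heatmap over the image, shape (H, W), values in [0, 1]
    caption: str         # embodied caption, e.g. "grasp the stick to beat the drum"


def action_region(sample: SEASample, threshold: float = 0.5) -> np.ndarray:
    """Return a binary mask of the action-possibility region (assumed thresholding)."""
    return sample.heatmap >= threshold
```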