An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC)

SIGDIAL 2021: 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue (2021)

Abstract
There is a growing interest in virtual assistants with multimodal capabilities, e.g., inferring the context of a conversation through scene understanding. The recently released Situated and Interactive Multimodal Conversations (SIMMC) dataset addresses this trend by enabling research on virtual assistants that can take into account the scene the user sees while conversing and can also interact with items in that scene. The SIMMC dataset is novel in that it contains fully annotated, task-oriented user-assistant dialogs in which the user and the assistant co-observe the same visual elements and the latter can take actions to update the scene. The SIMMC challenge, held as part of the Ninth Dialog System Technology Challenge (DSTC9), propelled the development of various models which together set a new state-of-the-art on the SIMMC dataset. In this work, we compare and analyze these models to identify 'what worked?' and the remaining gaps: 'what next?'. Our analysis shows that even though pretrained language models adapted to this setting show great promise, there are indications that the multimodal context is not fully utilised, and there is a need for better and more scalable knowledge base integration. We hope this first-of-its-kind analysis of SIMMC models provides useful insights and opportunities for further research in multimodal conversational agents.