S-EQA: Tackling Situational Queries in Embodied Question Answering
arXiv (2024)
Abstract
We present and tackle the problem of Embodied Question Answering (EQA) with
Situational Queries (S-EQA) in a household environment. Unlike prior EQA work
tackling simple queries that directly reference target objects and quantifiable
properties pertaining to them, EQA with situational queries (such as "Is the
bathroom clean and dry?") is more challenging, as the agent needs to figure out
not just what the target objects pertaining to the query are, but also a
consensus on their states for the query to be answerable. Towards this objective, we first
introduce a novel Prompt-Generate-Evaluate (PGE) scheme that wraps around an
LLM's output to create a dataset of unique situational queries, corresponding
consensus object information, and predicted answers. PGE maintains uniqueness
among the generated queries, using multiple forms of semantic similarity. We
validate the generated dataset via a large-scale user study conducted on
M-Turk, and introduce it as S-EQA, the first dataset tackling EQA with
situational queries. Our user study establishes the authenticity of S-EQA with
a high 97.26% of the generated queries being deemed answerable, given the
consensus object data. Conversely, we observe a low correlation of 46.2% on the
LLM-predicted answers to human-evaluated ones, indicating the LLM's poor
capability in directly answering situational queries, while establishing
S-EQA's usability in providing a human-validated consensus for an indirect
solution. We evaluate S-EQA via Visual Question Answering (VQA) on VirtualHome,
which, unlike other simulators, contains several objects with modifiable states
that also visually appear different upon modification, enabling us to set a
quantitative benchmark for S-EQA. To the best of our knowledge, this is the
first work to introduce EQA with situational queries, and also the first to use
a generative approach for query creation.
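
The uniqueness check that PGE applies to generated queries can be illustrated with a small sketch. The abstract states only that "multiple forms of semantic similarity" are used and does not say which ones, so the example below substitutes a simple Jaccard token overlap as a stand-in. The function names, the 0.6 threshold, and the sample queries (other than the bathroom query quoted above) are assumptions made for illustration, not the authors' implementation.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a query."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets, in the range 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_unique(candidates, threshold=0.6):
    """Keep a candidate only if it is dissimilar to every query kept so far.

    The threshold is a hypothetical value chosen for this sketch.
    """
    kept = []
    for query in candidates:
        query_tokens = tokens(query)
        if all(jaccard(query_tokens, tokens(k)) < threshold for k in kept):
            kept.append(query)
    return kept


candidates = [
    "Is the bathroom clean and dry?",
    "Is the bathroom dry and clean?",            # reordered duplicate
    "Is the kitchen ready for cooking dinner?",  # hypothetical query
]
unique_queries = filter_unique(candidates)
```

Here the second candidate is rejected because it shares every token with the first, while the third is kept. A token-overlap measure cannot detect paraphrases that use different words, which is presumably why the paper relies on semantic similarity rather than surface overlap.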