Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
ICLR 2024(2023)
摘要
We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world
few-shot reasoning for machine vision. It originates from the classical Bongard
Problems (BPs): Given two sets of images (positive and negative), the model
needs to identify the set that query images belong to by inducing the visual
concepts, which is exclusively depicted by images from the positive set. Our
benchmark inherits the few-shot concept induction of the original BPs while
adding the two novel layers of challenge: 1) open-world free-form concepts, as
the visual concepts in Bongard-OpenWorld are unique compositions of terms from
an open vocabulary, ranging from object categories to abstract visual
attributes and commonsense factual knowledge; 2) real-world images, as opposed
to the synthetic diagrams used by many counterparts. In our exploration,
Bongard-OpenWorld already imposes a significant challenge to current few-shot
reasoning algorithms. We further investigate to which extent the recently
introduced Large Language Models (LLMs) and Vision-Language Models (VLMs) can
solve our task, by directly probing VLMs, and combining VLMs and LLMs in an
interactive reasoning scheme. We even conceived a neuro-symbolic reasoning
approach that reconciles LLMs VLMs with logical reasoning to emulate the
human problem-solving process for Bongard Problems. However, none of these
approaches manage to close the human-machine gap, as the best learner achieves
64
Bongard-OpenWorld can help us better understand the limitations of current
visual intelligence and facilitate future research on visual agents with
stronger few-shot visual reasoning capabilities.
更多查看译文
关键词
Few-shot learning,Visual reasoning,Open world learning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要