Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild
arXiv (2024)
Abstract
Large language models have evolved into data-efficient generalists, benefiting
from the universal language interface and large-scale pre-training. However,
constructing a data-efficient generalist for dense visual prediction presents a
distinct challenge due to the variation in label structures across different
tasks. Consequently, generalization to unseen dense prediction tasks in the
low-data regime is not straightforward and has received less attention from
previous vision generalists. In this study, we explore a universal model that
can flexibly adapt to unseen dense label structures with a few examples,
enabling it to serve as a data-efficient vision generalist in diverse
real-world scenarios. To this end, we base our method on a powerful
meta-learning framework and explore several axes to improve its performance and
versatility for real-world problems, such as flexible adaptation mechanisms and
scalability. We evaluate our model across a spectrum of unseen real-world
scenarios where low-shot learning is desirable, including video, 3D, medical,
biological, and user-interactive tasks. Equipped with a generic architecture
and an effective adaptation mechanism, our model flexibly adapts to all of
these tasks with at most 50 labeled images, showcasing a significant
advancement over existing data-efficient generalist approaches. Code is
available at https://github.com/GitGyun/chameleon.