MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control
CoRR (2024)
Abstract
It is a long-standing goal to design a generalist embodied agent that can
follow diverse instructions in human-like ways. However, existing approaches
often fail to steadily follow instructions due to difficulties in understanding
abstract and sequential natural language instructions. To this end, we
introduce MineDreamer, an open-ended embodied agent built upon the challenging
Minecraft simulator with an innovative paradigm that enhances
instruction-following ability in low-level control signal generation.
Specifically, MineDreamer is developed on top of recent advances in Multimodal
Large Language Models (MLLMs) and diffusion models, and we employ a
Chain-of-Imagination (CoI) mechanism to envision the step-by-step process of
executing instructions and translate these imaginations into more precise visual
prompts tailored to the current state; subsequently, the agent generates
keyboard-and-mouse actions to efficiently achieve these imaginations, steadily
following the instructions at each step. Extensive experiments demonstrate that
MineDreamer follows single and multi-step instructions steadily, significantly
outperforming the best generalist agent baseline and nearly doubling its
performance. Moreover, qualitative analysis of the agent's imaginative ability
reveals its generalization and comprehension of the open world.
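
The Chain-of-Imagination loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `chain_of_imagination`, `imagine`, `act`, and `env_step` are hypothetical, the MLLM-plus-diffusion imaginator and the low-level controller are replaced by toy integer functions, and the sketch assumes a new imagination is produced at every step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    observation: int    # toy stand-in for the current game frame
    imagined_goal: int  # toy stand-in for the imagined visual prompt
    action: str         # toy stand-in for a keyboard-and-mouse action


def chain_of_imagination(
    instruction: str,
    observation: int,
    imagine: Callable[[str, int], int],
    act: Callable[[int, int], str],
    env_step: Callable[[int, str], int],
    num_steps: int,
) -> List[Step]:
    """Alternate between imagining the next state and acting toward it."""
    trajectory: List[Step] = []
    for _ in range(num_steps):
        # 1. Imagine what the next stage of the instruction should look like,
        #    conditioned on the instruction and the current state.
        goal = imagine(instruction, observation)
        # 2. Use the imagination as a visual prompt for the low-level policy.
        action = act(observation, goal)
        trajectory.append(Step(observation, goal, action))
        # 3. Execute the action and observe the new state.
        observation = env_step(observation, action)
    return trajectory


# Toy stand-ins (assumptions for illustration only): the state is a count of
# collected logs, and the imagined goal is always "one more log than now".
def toy_imagine(instruction: str, obs: int) -> int:
    return obs + 1


def toy_act(obs: int, goal: int) -> str:
    return "attack" if goal > obs else "noop"


def toy_env_step(obs: int, action: str) -> int:
    return obs + 1 if action == "attack" else obs


if __name__ == "__main__":
    for step in chain_of_imagination(
        "chop a tree", 0, toy_imagine, toy_act, toy_env_step, num_steps=3
    ):
        print(step)
```

The point of the sketch is the control flow: the policy is never conditioned on the raw instruction, only on the imagined goal derived from the instruction and the current state.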