Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
CoRR (2024)
Abstract
Recent advancements in off-policy Reinforcement Learning (RL) have
significantly improved sample efficiency, primarily due to the incorporation of
various forms of regularization that enable more gradient update steps per
collected sample than traditional agents perform. However, many of these techniques have been tested in
limited settings, often on tasks from single simulation benchmarks and against
well-known algorithms rather than a range of regularization approaches. This
limits our understanding of the specific mechanisms driving RL improvements. To
address this, we implemented over 60 different off-policy agents, each
integrating established regularization techniques from recent state-of-the-art
algorithms. We tested these agents across 14 diverse tasks from 2 simulation
benchmarks. Our findings reveal that while the effectiveness of a specific
regularization setup varies with the task, certain combinations consistently
demonstrate robust and superior performance. Notably, a simple Soft
Actor-Critic agent, appropriately regularized, reliably solves dog tasks, which
were previously solved mainly through model-based approaches.
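The core mechanism the abstract alludes to can be sketched in a toy form: performing several gradient updates per collected transition (a replay ratio above one), with a regularizer keeping the repeated updates stable. The sketch below is purely illustrative, not the paper's method; the linear Q-function, the L2 weight penalty standing in for the paper's regularization techniques, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear Q-function: Q(s, a) = w . phi(s, a).
# Hypothetical setup: several gradient updates per environment
# transition ("replay ratio" > 1), stabilized by an L2 weight
# penalty as a stand-in for the regularizers studied in the paper.
dim = 4
w = np.zeros(dim)
gamma = 0.99
lr = 0.05
weight_decay = 1e-2   # illustrative regularization strength
replay_ratio = 4      # gradient steps per collected transition

replay = []
for step in range(200):
    # Collect one synthetic transition (features, reward, next features).
    phi = rng.normal(size=dim)
    phi_next = rng.normal(size=dim)
    r = 0.1 * phi.sum()
    replay.append((phi, r, phi_next))

    # Several regularized TD(0) updates per collected transition.
    for _ in range(replay_ratio):
        s, rew, s_next = replay[rng.integers(len(replay))]
        td_error = rew + gamma * w @ s_next - w @ s
        w += lr * (td_error * s - weight_decay * w)

print(np.isfinite(w).all())
```

Without the weight-decay term, repeated replayed updates of this kind are more prone to runaway value estimates; the penalty shrinks the weights toward zero at every step, which is the simplest instance of the trade-off between update count and regularization the paper investigates.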