The Power of Noise: Redefining Retrieval for RAG Systems
CoRR(2024)
Abstract
Retrieval-Augmented Generation (RAG) systems represent a significant
advancement over traditional Large Language Models (LLMs). RAG systems enhance
their generation ability by incorporating external data retrieved through an
Information Retrieval (IR) phase, overcoming the limitations of standard LLMs,
which are restricted to their pre-trained knowledge and limited context window.
Research in this area has predominantly concentrated on the generative
aspect of LLMs within RAG systems. Our study fills this gap by thoroughly and
critically analyzing the influence of IR components on RAG systems. This paper
analyzes which characteristics a retriever should possess for effective prompt
formulation in a RAG system, focusing on the type of documents that should be
retrieved. We evaluate various elements, such as the relevance of the documents
to the prompt, their position, and the number included in the context. Our
findings reveal, among other insights, that including irrelevant documents can
unexpectedly enhance performance by more than 30% in accuracy, contradicting
our initial assumption of diminished quality. These findings call for
developing specialized approaches tailored to the specific demands of
integrating retrieval with language generation models and pave the way for
future research.