IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
arxiv(2024)
摘要
As large language models (LLMs) see increasing adoption across the globe, it
is imperative for LLMs to be representative of the linguistic diversity of the
world. India is a linguistically diverse country of 1.4 Billion people. To
facilitate research on multilingual LLM evaluation, we release IndicGenBench -
the largest benchmark for evaluating LLMs on user-facing generation tasks
across a diverse set 29 of Indic languages covering 13 scripts and 4 language
families. IndicGenBench is composed of diverse generation tasks like
cross-lingual summarization, machine translation, and cross-lingual question
answering. IndicGenBench extends existing benchmarks to many Indic languages
through human curation providing multi-way parallel evaluation data for many
under-represented Indic languages for the first time. We evaluate a wide range
of proprietary and open-source LLMs including GPT-3.5, GPT-4, PaLM-2, mT5,
Gemma, BLOOM and LLaMA on IndicGenBench in a variety of settings. The largest
PaLM-2 models performs the best on most tasks, however, there is a significant
performance gap in all languages compared to English showing that further
research is needed for the development of more inclusive multilingual language
models. IndicGenBench is released at
www.github.com/google-research-datasets/indic-gen-bench
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要