TIGQA:An Expert Annotated Question Answering Dataset in Tigrinya
arxiv(2024)
摘要
The absence of explicitly tailored, accessible annotated datasets for
educational purposes presents a notable obstacle for NLP tasks in languages
with limited resources.This study initially explores the feasibility of using
machine translation (MT) to convert an existing dataset into a Tigrinya dataset
in SQuAD format. As a result, we present TIGQA, an expert annotated educational
dataset consisting of 2.68K question-answer pairs covering 122 diverse topics
such as climate, water, and traffic. These pairs are from 537 context
paragraphs in publicly accessible Tigrinya and Biology books. Through
comprehensive analyses, we demonstrate that the TIGQA dataset requires skills
beyond simple word matching, requiring both single-sentence and
multiple-sentence inference abilities. We conduct experiments using
state-of-the art MRC methods, marking the first exploration of such models on
TIGQA. Additionally, we estimate human performance on the dataset and juxtapose
it with the results obtained from pretrained models.The notable disparities
between human performance and best model performance underscore the potential
for further enhancements to TIGQA through continued research. Our dataset is
freely accessible via the provided link to encourage the research community to
address the challenges in the Tigrinya MRC.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要