CrunchQA: A Synthetic Dataset for Question Answering over Crunchbase Knowledge Graph.

Lifan Yu,Nadya Abdel Madjid,Djellel Eddine Difallah

Big Data（2022）

引用 0|浏览22

暂无评分

摘要

The digital transformation in the finance and enterprise sector has been driven by the advances made in big data and artificial intelligence technologies. For instance, data integration enables businesses to make better decisions by consolidating and mining heterogeneous data repositories. In particular, knowledge graphs (KGs) are used to facilitate the integration of disparate data sources and can be utilized to answer complex queries. This work proposes a new dataset for question-answering on knowledge graphs (KGQA) to reflect the challenges we identified in real-world applications which are not covered by existing benchmarks, namely, multi-hop constraints, numeric and literal embeddings, ranking, reification, and hyper-relations. To build the dataset, we create a new Knowledge Graph from the Crunchbase database using a lightweight schema to support high-quality entity embeddings in large graphs. Next, we create a Question Answering dataset based on natural language question generation using predefined multiple-hop templates and paraphrasing. Finally, we conduct extensive experiments with state-of-the-art KGQA models and compare their performance on CrunchQA. The results show that the existing models do not perform well, for example, on multi-hop constrained queries. Hence, CrunchQA can be used as a challenging benchmark dataset for future KGQA reasoning models. The dataset and scripts are available on the project repository. ¹

查看译文

关键词

synthetic dataset,knowledge

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要