Alfa: active learning for graph neural network-based semantic schema alignment

The VLDB Journal (2023)

Abstract
Semantic schema alignment aims to match elements across a pair of schemas based on their semantic representation. It is a key primitive for data integration that facilitates the creation of a common data fabric across heterogeneous data sources. Deep learning approaches such as graph representation learning have shown promise for effective alignment of semantically rich schemas, often captured as ontologies. Most of these approaches are supervised and require large amounts of labeled training data, which is expensive in terms of cost and manual labor. Active learning (AL) techniques can alleviate this issue by intelligently choosing the data to be labeled utilizing a human-in-the-loop approach, while minimizing the amount of labeled training data required. However, existing active learning techniques are limited in their ability to utilize the rich semantic information from underlying schemas. Therefore, they cannot drive effective and efficient sample selection for human labeling that is necessary to scale to larger datasets. In this paper, we propose Alfa, an active learning framework to overcome these limitations. Alfa exploits the schema element properties as well as the relationships between schema elements (structure) to drive a novel ontology-aware sample selection and label propagation algorithm for training highly accurate alignment models. We propose semantic blocking to scale to larger datasets without compromising model quality. Our experimental results across three real-world datasets show that (1) Alfa leads to a substantial reduction (27–82 […] × without adversely affecting model quality and scales AL to large datasets, and (3) sample selection achieves comparable schema matching quality (90 […] × shorter time per AL iteration and (2) requiring half of the AL iterations to achieve the highest convergent F1-score.
Keywords
Semantic schema alignment, Active learning, Data integration