Soft Target-Enhanced Matching Framework for Deep Entity Matching

Authors

  • Wenzhou Dou, Northeastern University
  • Derong Shen, Northeastern University
  • Xiangmin Zhou, RMIT University
  • Tiezheng Nie, Northeastern University
  • Yue Kou, Northeastern University
  • Hang Cui, University of Illinois at Urbana-Champaign
  • Ge Yu, Northeastern University

DOI:

https://doi.org/10.1609/aaai.v37i4.25544

Keywords:

DMKM: Linked Open Data, Knowledge Graphs & KB Completion, DMKM: Applications, SNLP: Text Classification

Abstract

Deep Entity Matching (EM) is one of the core research topics in data integration. Existing works typically construct EM models by training deep neural networks (DNNs) on training samples with one-hot labels. However, the sharp supervision signals of one-hot labels harm the generalization of EM models, causing them to overfit the training samples and perform poorly on unseen datasets. To solve this problem, we first argue that the challenge of training a well-generalized EM model lies in achieving a compromise between fitting the training samples and imposing regularization, i.e., the bias-variance tradeoff. We then propose a novel Soft Target-EnhAnced Matching (Steam) framework, which exploits automatically generated soft targets as label-wise regularizers to constrain model training. Specifically, Steam regards the EM model trained in the previous iteration as a virtual teacher and takes its softened output as an extra regularizer to train the EM model in the current iteration. As such, Steam effectively calibrates the obtained EM model, achieving the bias-variance tradeoff without any additional computational cost. We conduct extensive experiments on open datasets, and the results show that Steam outperforms state-of-the-art EM approaches in terms of effectiveness and label efficiency.
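
The idea of training against both the one-hot label and a softened output from the previous iteration's model can be illustrated with a generic self-distillation loss. The sketch below is not the Steam objective itself, which the abstract does not specify. The function names, the mixing weight `alpha`, and the `temperature` parameter are all assumptions introduced here for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a flatter output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def soft_target_loss(student_logits, teacher_logits, label,
                     alpha=0.5, temperature=2.0):
    """Mix the one-hot loss with a soft-target regularizer.

    Assumption: a convex combination weighted by `alpha`. The teacher
    logits stand in for the output of the model from the previous
    iteration (the "virtual teacher" in the abstract).
    """
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    # Hard term: fit the one-hot training label.
    hard_term = cross_entropy(one_hot, softmax(student_logits))
    # Soft term: match the teacher's softened (temperature-scaled) output.
    soft_targets = softmax(teacher_logits, temperature)
    soft_term = cross_entropy(soft_targets, softmax(student_logits, temperature))
    return (1.0 - alpha) * hard_term + alpha * soft_term


# Binary match / non-match example with hypothetical logits.
loss = soft_target_loss(student_logits=[2.0, 0.0],
                        teacher_logits=[1.0, -1.0],
                        label=0)
```

Under these assumptions, setting `alpha=0` recovers plain one-hot training, which is one way to handle a first iteration where no earlier model exists yet. Because the teacher output is a forward pass that the previous iteration has already computed, a loss of this shape does not require training a separate teacher network.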

Published

2023-06-26

How to Cite

Dou, W., Shen, D., Zhou, X., Nie, T., Kou, Y., Cui, H., & Yu, G. (2023). Soft Target-Enhanced Matching Framework for Deep Entity Matching. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4), 4259-4266. https://doi.org/10.1609/aaai.v37i4.25544

Section

AAAI Technical Track on Data Mining and Knowledge Management