A novel end-to-end neural network for simultaneous filtering of task-unrelated named entities and fine-grained typing of task-related named entities
Introduction
Named Entity Typing (NET) is the task of assigning a semantic label to a named entity mention in text. NET supports many downstream applications, such as relation extraction (Hang et al., 2021, Park et al., 2020), question answering (Gupta et al., 2021, Lee et al., 2006), information retrieval (Carlson et al., 2010, Wang et al., 2009), machine translation (Britz et al., 2017, Kazemi et al., 2017) and knowledge bases (Dong et al., 2019, Zhu and Iglesias, 2018). In the early years, NET considered only a small set of coarse-grained types, such as Person, Location and Organization (Chinchor and Marsh, 1998, Sang and De Meulder, 2003). Nowadays, NET has been extended to a broad range of fine-grained entity types, which leads to an explosion in the number of entity types (Del Corro et al., 2015, Hearst, 1992, Nakashole et al., 2013, Snow et al., 2006). Murty et al. (2017) applied hierarchically-structured types to solve fine-grained NET (FG-NET) with thousands of entity types. Ren (2020) introduced hierarchical inference into the FG-NET model for better performance. Zheng et al. (2020) employed a knowledge graph to help the model recognize fine-grained entity types. In recent years, many new needs for NET have emerged, such as Open NET (Yuan & Downey, 2018) and Ultra-Fine Entity Typing (Choi et al., 2018).
One important practical need of NET is the fine-grained classification of task-related entities that co-exist with task-unrelated entities in a long text such as a news report (Pasca, 2004). Each practical application of NET for long texts has its own focus, and fine-grained NET is often limited to certain specific types of entities, which are named task-related entities; the other entities are considered task-unrelated. It is therefore computationally uneconomical to explicitly classify all entities, because this would require labeled training data for every entity, including the task-unrelated ones. In addition, the huge number of entity types brings great challenges to model design (Ling & Weld, 2012). Discovering all types of entities may be impractical for long texts with rich content, such as news reports (Fernández et al., 2007, Vychegzhanin and Kotelnikov, 2019). Therefore, it is essential to solve the fine-grained NET of task-related entities co-existing with task-unrelated entities.
This problem is typically solved with a pipeline strategy, which decomposes the problem into two sub-tasks (Kim and Yoon, 2007, Yang and Zhou, 2010, Yao et al., 2015). The first sub-task is the detection and filtering of task-unrelated entities. Inspired by anomaly detection, closed-boundary classification is employed to filter task-unrelated entities. A closed-boundary classifier can be constructed with a one-class Support Vector Machine (SVM) (Bounsiar & Madden, 2014), a radial basis function network (Chen et al., 1991), an isolation forest (Liu et al., 2008) and so on. The second sub-task is the fine-grained classification of the task-related entity mentions detected in the first sub-task. Open-boundary classification is commonly used in this fine-grained typing task. It assumes that each class can be well represented by its samples in feature space and can be generalized to regions not covered by training samples (Lee and Landgrebe, 1997, Li et al., 2018, Mickisch et al., 2020). This assumption works well when the category can be explicitly demonstrated. Many pattern classifiers are built on this assumption, such as linear discriminant analysis (Balakrishnama & Ganapathiraju, 1998) and multilayer perceptron neural networks (Murtagh, 1991). Regarding the overall method of NET of task-related entities co-existing with task-unrelated entities, Kim and Yoon (2007) used SVM-based models to solve the two sub-tasks sequentially, and Yao et al. (2015) employed neural networks to improve the pipeline model performance.
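The two-stage pipeline strategy can be sketched as follows. This is a minimal toy illustration with scikit-learn on synthetic 2-D features, not the models of the cited works: a one-class SVM serves as the closed-boundary filter for sub-task one, and a logistic regression serves as the open-boundary fine-grained classifier for sub-task two.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy feature vectors: two tight task-related clusters, plus scattered
# task-unrelated points (all data here is synthetic and illustrative).
related = np.vstack([rng.normal(0, 0.3, (40, 2)),
                     rng.normal(3, 0.3, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
unrelated = rng.uniform(-6, 9, (20, 2))  # heterogeneous "Others" mentions

# Sub-task 1: closed-boundary filter trained only on task-related samples.
filt = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(related)

# Sub-task 2: open-boundary fine-grained classifier on related samples.
clf = LogisticRegression().fit(related, labels)

def pipeline_predict(x):
    """Return a fine-grained label, or 'Others' if the filter rejects x."""
    if filt.predict(x.reshape(1, -1))[0] == -1:  # -1 = outlier
        return "Others"
    return int(clf.predict(x.reshape(1, -1))[0])
```

Because the two stages are trained separately, an error in the filter propagates unrecoverably to the classifier, which is the weakness the end-to-end model below addresses.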
In this paper, we design a novel end-to-end neural network to jointly solve the problem of task-unrelated entity filtering and task-related entity fine-grained classification, where all task-unrelated entities are collectively considered as one class named as “Others”.
However, we found that the inclusion of class “Others” results in deteriorated performance of commonly used multi-class pattern classifiers. Research shows that class “Others” has strong heterogeneity and uncertainty (Kim and Yoon, 2007, Yang and Zhou, 2010). Heterogeneity arises from the very different characteristics of different task-unrelated entities, while uncertainty is due to the fact that some task-unrelated entities remain unknown until they actually appear in the testing data. The significant heterogeneity results in a scattered data distribution of class “Others” in the feature space, while the uncertainty results in insufficient coverage of the class distribution by the training data. Both issues tend to deteriorate the performance of conventional multi-class classifiers. Therefore, a novel classification scheme should be designed to simultaneously deal with task-related and task-unrelated entities. The basic idea of the scheme is to assign a hyper-sphere to each of the task-related entity types to enclose all its training samples, and meanwhile assign the space beyond all the task-related hyper-spheres to the task-unrelated entities, as illustrated in Fig. 1. This integrated method assumes availability of a compact feature-space region for each fine-grained task-related entity type and unavailability of such a compact region for task-unrelated entities.
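The hyper-sphere decision rule can be made concrete with a small sketch. The class names, centers and radii below are hypothetical values for illustration, not quantities learned by the model:

```python
import numpy as np

# One hyper-sphere (center c_k, radius r_k) per task-related class;
# any point outside every sphere falls into class "Others".
centers = {"Person": np.array([0.0, 0.0]),
           "Location": np.array([5.0, 5.0])}
radii = {"Person": 1.5, "Location": 1.5}

def classify(x):
    """Among the spheres containing x, pick the one with the nearest center;
    if no sphere contains x, assign class 'Others'."""
    inside = {k: np.linalg.norm(x - c) for k, c in centers.items()
              if np.linalg.norm(x - c) <= radii[k]}
    return min(inside, key=inside.get) if inside else "Others"
```

No training samples of class “Others” are needed to define its decision region: it is simply the complement of the union of the task-related spheres.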
In this work, we abandon conventional open-boundary classifiers and employ a closed-boundary classifier built on a Radial Basis Function (RBF) neural network. Moreover, we have developed an improved training scheme for the RBF classifier. In this scheme, a pair-sensitive contrastive loss is applied to separate entities of different classes, following the basic idea above, and all-zero class encoding is used to prevent task-unrelated entities from influencing the cross-entropy loss. The improved RBF classifier generates a closed decision boundary for each task-related entity category, and entities outside all closed boundaries in the feature space are classified into class “Others”.
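The two ingredients of the training scheme can be sketched as follows. The contrastive loss is written here in the standard pairwise (Hadsell-style) form as a stand-in for the pair-sensitive variant, with the margin M = 50 taken from the experiment setup; the encoding function shows how “Others” receives an all-zero target so it contributes no positive class signal:

```python
import numpy as np

M = 50.0  # expected distance between different classes (paper's setting)

def contrastive_loss(d, same_class):
    """Pairwise contrastive loss on embedding distance d: pull same-class
    pairs together, push different-class pairs at least M apart."""
    return d ** 2 if same_class else max(0.0, M - d) ** 2

def class_encoding(label, task_related_classes):
    """One-hot target for task-related classes; all-zeros for 'Others',
    so task-unrelated samples add no positive target to the class loss."""
    vec = np.zeros(len(task_related_classes))
    if label in task_related_classes:
        vec[task_related_classes.index(label)] = 1.0
    return vec
```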
In the pipeline framework, the task-unrelated entities are filtered out first, and the task-related entities are then classified (Lee et al., 2004, Yang and Zhou, 2014), without considering the interaction between task-related and task-unrelated entities in the text. We found, however, that the interaction between task-related and task-unrelated entities in a semantically coherent text contains useful information for fine-grained classification of task-related entities. Based on this finding, we have developed an algorithm to learn mention–mention relation features for each candidate entity mention, capturing the interactions of an entity with all entity mentions around it, whether they are task-related or task-unrelated.
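One simple way to realize such mention–mention relation features is an attention-weighted aggregation over the embeddings of all surrounding mentions. The softmax-attention form below is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def mention_relation_features(target, mentions):
    """Aggregate the embeddings of ALL surrounding mentions (task-related
    or not), weighted by their dot-product similarity to the target
    mention's embedding."""
    scores = mentions @ target                 # similarity to each mention
    weights = np.exp(scores - scores.max())    # stable softmax
    weights /= weights.sum()
    return weights @ mentions                  # weighted mean embedding
```

Because the weights depend on the target mention, each candidate mention receives a different relation feature vector even within the same text.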
The main contributions of the present work are summarized as follows:
- (i)
We have developed a novel end-to-end neural network, MM-ImRBF, with an improved RBF classifier for jointly solving the tasks of task-unrelated entity filtering and fine-grained classification of task-related entities.
- (ii)
We have developed a mention–mention relation feature learning algorithm to capture interactions between task-related and task-unrelated entities, which leads to improved performance in fine-grained classification of task-related entities.
The rest of the paper is organized as follows. Our end-to-end model is described in Section 2. Experimental results and conclusions are given in Section 3 and Section 4, respectively.
Section snippets
Methodology
As shown in Fig. 2, our model consists of two parts. In Part I, entity representations are learned from sentence-specific context and entity mention–mention relationships. In Part II, a classifier with a radial basis function (RBF) layer is built using the improved training scheme of one-hot/all-zero class encoding and a pair-sensitive contrastive loss. Finally, a class prediction scheme is applied to simultaneously classify all entities.
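The RBF layer at the heart of Part II can be sketched as one Gaussian unit per task-related class. This is a minimal NumPy illustration with random centers and a fixed width; in the full model the centers and widths are trainable parameters:

```python
import numpy as np

class RBFLayer:
    """Minimal RBF output layer: one Gaussian unit per task-related class.
    The activation is near 1 only when the input lies close to a class
    center, which naturally yields closed decision boundaries."""

    def __init__(self, n_classes, dim, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = rng.normal(size=(n_classes, dim))  # random here
        self.sigma = sigma

    def forward(self, x):
        # Squared distance from x to each class center, mapped to (0, 1].
        d2 = ((x - self.centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * self.sigma ** 2))
```

Thresholding these activations gives the hyper-sphere behavior: if no unit fires above threshold, the mention falls into class “Others”.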
Experiment setup
300-dimensional GloVe pre-trained word embeddings (Pennington et al., 2014) are adopted in our model. The Adam optimizer (Kingma & Ba, 2014) is employed in model training, and early stopping is used to identify the best training epoch. Dropout regularization with a dropout rate of 0.5 is adopted. The weight for the pair-sensitive contrastive loss is set to 1, and the expected distance between different classes M is set to 50.
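The reported hyperparameters can be collected in a single configuration object for reproduction; the key names below are our own and purely illustrative:

```python
# Hypothetical training configuration mirroring the reported setup.
config = {
    "embedding": "GloVe",           # 300-d pre-trained word embeddings
    "embedding_dim": 300,
    "optimizer": "adam",
    "early_stopping": True,         # selects the best training epoch
    "dropout": 0.5,
    "contrastive_loss_weight": 1.0, # weight of pair-sensitive loss term
    "inter_class_margin_M": 50.0,   # expected distance between classes
}
```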
Datasets
Two datasets are used in the experiments: Ontonotes Release 5.0 and Earthquake
Conclusions
In this paper, we have investigated the problem of fine-grained classification of task-related entities that co-exist with task-unrelated entities in a long text such as a news report. The heterogeneity and uncertainty of task-unrelated entities lead to the need for one-class classification with a closed decision boundary to separate the task-related and task-unrelated entities, and hence typical methods usually apply a pipeline strategy that deals with the task-unrelated entities first. In contrast to the pipeline
CRediT authorship contribution statement
Qi Li: Methodology, Software, Investigation, Writing - original draft. Kezhi Mao: Supervision, Conceptualization, Methodology, Writing - review & editing. Pengfei Li: Investigation, Validation. Yuecong Xu: Validation, Visualization. Edmond Y.M. Lo: Investigation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (47)
- et al. Hierarchical deep multi-modal network for medical visual question answering. Expert Systems with Applications (2021)
- et al. Joint extraction of entities and overlapping relations using source–target entity labeling. Expert Systems with Applications (2021)
- et al. Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation. Expert Systems with Applications (2017)
- et al. Biomedical named entity recognition using two-phase model based on SVMs. Journal of Biomedical Informatics (2004)
- Multilayer perceptrons for classification and regression. Neurocomputing (1991)
- et al. AGCN: Attention-based graph convolutional networks for drug-drug interaction extraction. Expert Systems with Applications (2020)
- et al. Web-based pattern learning for named entity translation in Korean–Chinese cross-language information retrieval. Expert Systems with Applications (2009)
- et al. Exploiting semantic similarity for named entity disambiguation in knowledge graphs. Expert Systems with Applications (2018)
- et al. Linear discriminant analysis - a brief tutorial. Institute for Signal and Information Processing (1998)
- et al. One-class support vector machines revisited