A novel end-to-end neural network for simultaneous filtering of task-unrelated named entities and fine-grained typing of task-related named entities
Introduction
Named Entity Typing (NET) is the task of assigning a semantic label to a named entity mention in text. NET supports many downstream applications, such as relation extraction (Hang et al., 2021, Park et al., 2020), question answering (Gupta et al., 2021, Lee et al., 2006), information retrieval (Carlson et al., 2010, Wang et al., 2009), machine translation (Britz et al., 2017, Kazemi et al., 2017) and knowledge bases (Dong et al., 2019, Zhu and Iglesias, 2018). In the early years, NET considered only a small set of coarse-grained types, such as Person, Location and Organization (Chinchor and Marsh, 1998, Sang and De Meulder, 2003). Nowadays, NET has been extended to a broad range of fine-grained entity types, which leads to an explosion in the number of entity types (Del Corro et al., 2015, Hearst, 1992, Nakashole et al., 2013, Snow et al., 2006). Murty et al. (2017) applied hierarchically-structured types to solve fine-grained NET (FG-NET) with thousands of entity types. Ren (2020) introduced hierarchical inference into the FG-NET model for better performance. Zheng et al. (2020) employed a knowledge graph to help the model recognize fine-grained entity types. In recent years, many new needs for NET have emerged, such as Open NET (Yuan & Downey, 2018) and Ultra-Fine Entity Typing (Choi et al., 2018).
One important practical need of NET is the fine-grained classification of task-related entities that co-exist with task-unrelated entities in a long text such as a news report (Pasca, 2004). Each practical application of NET for long texts has its own focus, and fine-grained NET is often limited to certain specific types of entities, which are named task-related entities; the other entities are considered task-unrelated. It is therefore computationally uneconomical to explicitly classify all entities, because this would require labeled training data for every entity, including the task-unrelated ones. In addition, the huge number of entity types brings great challenges to model design (Ling & Weld, 2012). Discovering all types of entities may be impractical for long texts with rich content, such as news reports (Fernández et al., 2007, Vychegzhanin and Kotelnikov, 2019). Therefore, it is essential to solve the fine-grained NET of task-related entities co-existing with task-unrelated entities.
This problem is typically solved with a pipeline strategy, which decomposes the problem into two sub-tasks (Kim and Yoon, 2007, Yang and Zhou, 2010, Yao et al., 2015). The first sub-task is the detection and filtering of task-unrelated entities. Inspired by anomaly detection, closed-boundary classification is employed to filter task-unrelated entities. A closed-boundary classifier can be constructed with a one-class Support Vector Machine (SVM) (Bounsiar & Madden, 2014), a radial basis function network (Chen et al., 1991), an isolation forest (Liu et al., 2008) and so on. The second sub-task is the fine-grained classification of the task-related entity mentions detected in the first sub-task. Open-boundary classification is commonly used in this fine-grained typing task. It assumes that each class can be well represented by its samples in feature space and can be generalized to regions not covered by training samples (Lee and Landgrebe, 1997, Li et al., 2018, Mickisch et al., 2020). This assumption works well when the category can be explicitly demonstrated. Many pattern classifiers are built on this assumption, such as linear discriminant analysis (Balakrishnama & Ganapathiraju, 1998) and multilayer perceptron neural networks (Murtagh, 1991). Regarding the overall method of NET of task-related entities co-existing with task-unrelated entities, Kim and Yoon (2007) used SVM-based models to solve the two sub-tasks sequentially, and Yao et al. (2015) employed neural networks to improve the pipeline model performance.
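The two-stage pipeline strategy can be sketched as follows. This is a minimal toy illustration with scikit-learn on synthetic 2-D features, not the models of the cited works: a one-class SVM serves as the closed-boundary filter for sub-task one, and a logistic regression serves as the open-boundary fine-grained classifier for sub-task two.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy feature vectors: two tight task-related clusters, plus scattered
# task-unrelated points (all data here is synthetic and illustrative).
related = np.vstack([rng.normal(0, 0.3, (40, 2)),
                     rng.normal(3, 0.3, (40, 2))])
labels = np.array([0] * 40 + [1] * 40)
unrelated = rng.uniform(-6, 9, (20, 2))  # heterogeneous "Others" mentions

# Sub-task 1: closed-boundary filter trained only on task-related samples.
filt = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(related)

# Sub-task 2: open-boundary fine-grained classifier on related samples.
clf = LogisticRegression().fit(related, labels)

def pipeline_predict(x):
    """Return a fine-grained label, or 'Others' if the filter rejects x."""
    if filt.predict(x.reshape(1, -1))[0] == -1:  # -1 = outlier
        return "Others"
    return int(clf.predict(x.reshape(1, -1))[0])
```

Because the two stages are trained separately, an error in the filter propagates unrecoverably to the classifier, which is the weakness the end-to-end model below addresses.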
In this paper, we design a novel end-to-end neural network to jointly solve the problem of task-unrelated entity filtering and task-related entity fine-grained classification, where all task-unrelated entities are collectively considered as one class named as “Others”.
However, we found that the inclusion of class “Others” results in deteriorated performance of commonly used multi-class pattern classifiers. Research shows that class “Others” has strong heterogeneity and uncertainty (Kim and Yoon, 2007, Yang and Zhou, 2010). Heterogeneity arises from the very different characteristics of different task-unrelated entities, while uncertainty is due to the fact that some task-unrelated entities remain unknown until they actually appear in the testing data. The significant heterogeneity results in a scattered data distribution of class “Others” in the feature space, while the uncertainty results in insufficient coverage of the class distribution by the training data. Both issues tend to deteriorate the performance of conventional multi-class classifiers. Therefore, a novel classification scheme should be designed to simultaneously deal with task-related and task-unrelated entities. The basic idea of the scheme is to assign a hyper-sphere to each of the task-related entity types to enclose all its training samples, and meanwhile assign the space beyond all the task-related hyper-spheres to the task-unrelated entities, as illustrated in Fig. 1. This integrated method assumes availability of a compact feature-space region for each fine-grained task-related entity type and unavailability of such a compact region for task-unrelated entities.
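The hyper-sphere decision rule can be made concrete with a small sketch. The class names, centers and radii below are hypothetical values for illustration, not quantities learned by the model:

```python
import numpy as np

# One hyper-sphere (center c_k, radius r_k) per task-related class;
# any point outside every sphere falls into class "Others".
centers = {"Person": np.array([0.0, 0.0]),
           "Location": np.array([5.0, 5.0])}
radii = {"Person": 1.5, "Location": 1.5}

def classify(x):
    """Among the spheres containing x, pick the one with the nearest center;
    if no sphere contains x, assign class 'Others'."""
    inside = {k: np.linalg.norm(x - c) for k, c in centers.items()
              if np.linalg.norm(x - c) <= radii[k]}
    return min(inside, key=inside.get) if inside else "Others"
```

No training samples of class “Others” are needed to define its decision region: it is simply the complement of the union of the task-related spheres.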
In this work, we abandon conventional open-boundary classifiers and employ a closed-boundary classifier built on a Radial Basis Function (RBF) neural network. Moreover, we have developed an improved training scheme for the RBF classifier. In this scheme, a pair-sensitive contrastive loss is applied to separate entities of different classes, following the basic idea above, and all-zero class encoding is used to prevent task-unrelated entities from influencing the cross-entropy loss. The improved RBF classifier generates a closed decision boundary for each task-related entity category, and entities outside all closed boundaries in the feature space are classified into class “Others”.
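The two ingredients of the training scheme can be sketched as follows. The contrastive loss is written here in the standard pairwise (Hadsell-style) form as a stand-in for the pair-sensitive variant, with the margin M = 50 taken from the experiment setup; the encoding function shows how “Others” receives an all-zero target so it contributes no positive class signal:

```python
import numpy as np

M = 50.0  # expected distance between different classes (paper's setting)

def contrastive_loss(d, same_class):
    """Pairwise contrastive loss on embedding distance d: pull same-class
    pairs together, push different-class pairs at least M apart."""
    return d ** 2 if same_class else max(0.0, M - d) ** 2

def class_encoding(label, task_related_classes):
    """One-hot target for task-related classes; all-zeros for 'Others',
    so task-unrelated samples add no positive target to the class loss."""
    vec = np.zeros(len(task_related_classes))
    if label in task_related_classes:
        vec[task_related_classes.index(label)] = 1.0
    return vec
```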
In the pipeline framework, the task-unrelated entities are filtered out first, and the task-related entities are then classified (Lee et al., 2004, Yang and Zhou, 2014), without considering the interaction between task-related and task-unrelated entities in the text. We found, however, that the interaction between task-related and task-unrelated entities in a semantically coherent text contains useful information for fine-grained classification of task-related entities. Based on this finding, we have developed an algorithm to learn mention–mention relation features for each candidate entity mention, capturing the interactions of an entity with all entity mentions around it, whether they are task-related or task-unrelated.
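One simple way to realize such mention–mention relation features is an attention-weighted aggregation over the embeddings of all surrounding mentions. The softmax-attention form below is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def mention_relation_features(target, mentions):
    """Aggregate the embeddings of ALL surrounding mentions (task-related
    or not), weighted by their dot-product similarity to the target
    mention's embedding."""
    scores = mentions @ target                 # similarity to each mention
    weights = np.exp(scores - scores.max())    # stable softmax
    weights /= weights.sum()
    return weights @ mentions                  # weighted mean embedding
```

Because the weights depend on the target mention, each candidate mention receives a different relation feature vector even within the same text.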
The main contributions of the present work are summarized as follows:
- (i)
We have developed a novel end-to-end neural network, MM-ImRBF, with an improved RBF classifier for jointly solving the tasks of task-unrelated entity filtering and fine-grained classification of task-related entities.
- (ii)
We have developed a mention–mention relation feature learning algorithm to capture interactions between task-related and task-unrelated entities, which leads to improved performance in fine-grained classification of task-related entities.
The rest of the paper is organized as follows. Our end-to-end model is described in Section 2. Experimental results and conclusions are given in Section 3 and Section 4, respectively.
Section snippets
Methodology
As shown in Fig. 2, our model consists of two parts. In Part I, entity representations are learned from sentence-specific context and entity mention–mention relationships. In Part II, a classifier with a radial basis function (RBF) layer is built using the improved training scheme of one-hot/all-zero class encoding and a pair-sensitive contrastive loss. Finally, a class prediction scheme is applied to simultaneously classify all entities.
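The RBF layer at the heart of Part II can be sketched as one Gaussian unit per task-related class. This is a minimal NumPy illustration with random centers and a fixed width; in the full model the centers and widths are trainable parameters:

```python
import numpy as np

class RBFLayer:
    """Minimal RBF output layer: one Gaussian unit per task-related class.
    The activation is near 1 only when the input lies close to a class
    center, which naturally yields closed decision boundaries."""

    def __init__(self, n_classes, dim, sigma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.centers = rng.normal(size=(n_classes, dim))  # random here
        self.sigma = sigma

    def forward(self, x):
        # Squared distance from x to each class center, mapped to (0, 1].
        d2 = ((x - self.centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * self.sigma ** 2))
```

Thresholding these activations gives the hyper-sphere behavior: if no unit fires above threshold, the mention falls into class “Others”.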
Experiment setup
300-dimensional GloVe pre-trained word embeddings (Pennington et al., 2014) are adopted in our model. The Adam optimizer (Kingma & Ba, 2014) is employed in model training, and early stopping is used to identify the best training epoch. Dropout regularization with a dropout rate of 0.5 is adopted. The weight for the pair-sensitive contrastive loss is set to 1, and the expected distance between different classes M is set to 50.
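The reported hyperparameters can be collected in a single configuration object for reproduction; the key names below are our own and purely illustrative:

```python
# Hypothetical training configuration mirroring the reported setup.
config = {
    "embedding": "GloVe",           # 300-d pre-trained word embeddings
    "embedding_dim": 300,
    "optimizer": "adam",
    "early_stopping": True,         # selects the best training epoch
    "dropout": 0.5,
    "contrastive_loss_weight": 1.0, # weight of pair-sensitive loss term
    "inter_class_margin_M": 50.0,   # expected distance between classes
}
```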
Datasets
Two datasets are used in the experiments: Ontonotes Release 5.0 and Earthquake
Conclusions
In this paper, we have investigated the problem of fine-grained classification of task-related entities that co-exist with task-unrelated entities in a long text such as a news report. The heterogeneity and uncertainty of task-unrelated entities lead to the need for one-class classification with a closed decision boundary to separate the task-related and task-unrelated entities, and hence typical methods usually apply a pipeline strategy that deals with the task-unrelated entities first. In contrast to the pipeline
CRediT authorship contribution statement
Qi Li: Methodology, Software, Investigation, Writing - original draft. Kezhi Mao: Supervision, Conceptualization, Methodology, Writing - review & editing. Pengfei Li: Investigation, Validation. Yuecong Xu: Validation, Visualization. Edmond Y.M. Lo: Investigation.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (47)
- et al. Hierarchical deep multi-modal network for medical visual question answering. Expert Systems with Applications (2021)
- et al. Joint extraction of entities and overlapping relations using source–target entity labeling. Expert Systems with Applications (2021)
- et al. Syntax- and semantic-based reordering in hierarchical phrase-based statistical machine translation. Expert Systems with Applications (2017)
- et al. Biomedical named entity recognition using two-phase model based on SVMs. Journal of Biomedical Informatics (2004)
- Multilayer perceptrons for classification and regression. Neurocomputing (1991)
- et al. AGCN: Attention-based graph convolutional networks for drug-drug interaction extraction. Expert Systems with Applications (2020)
- et al. Web-based pattern learning for named entity translation in Korean–Chinese cross-language information retrieval. Expert Systems with Applications (2009)
- et al. Exploiting semantic similarity for named entity disambiguation in knowledge graphs. Expert Systems with Applications (2018)
- et al. Linear discriminant analysis - a brief tutorial. Institute for Signal and Information Processing (1998)
- et al. One-class support vector machines revisited