Article

Joint Extraction of Entities and Relations Based on Enhanced Span and Gate Mechanism

Nan Zhang, Junfang Xin, Qiang Cai and Vera Chung
1 College of Computing, Beijing Technology and Business University, Beijing 100048, China
2 School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(19), 10643; https://doi.org/10.3390/app131910643
Submission received: 26 August 2023 / Revised: 21 September 2023 / Accepted: 22 September 2023 / Published: 25 September 2023
(This article belongs to the Special Issue AI Techniques in Computational and Automated Fact Checking)

Abstract

Although joint extraction of entities and relations can obtain relational triples efficiently and accurately, several problems remain: information transfer between entities and relations is insufficient, span-based entity extraction is inefficient, and nested entities are difficult to identify. In this paper, a joint entity and relation extraction model based on an Enhanced Span and Gate Mechanism (ESGM) is proposed to solve the above problems. We design a new span scheme to address the problems of entity nesting and inefficiency. We use a pointer network to predict the start and end positions of spans and combine them under a one-to-many matching principle. A binary classification model is then trained to predict whether a combined span is a subject. In the object prediction stage, a gating unit is added to fuse the subject information with the sentence information and strengthen information transfer between entities and relations. Finally, the relation is used as a mapping function to predict the tail entity related to the head entity. Our experimental results prove the effectiveness of this model. The precision of the proposed model reached 93.8% on the NYT dataset, which was 0.4% higher than that of the comparison model. Moreover, when the same experiment was conducted in a nested entity scenario, the precision of the proposed model was 4.4% higher than that of the comparison model.

1. Introduction

Entity and relation extraction is an essential approach to acquiring knowledge and represents a crucial step in knowledge extraction. It holds significant value in understanding both natural language and world knowledge. With the deepening research in knowledge extraction, entity and relation extraction has garnered considerable attention from scholars. The concept of entity and relation extraction was introduced more than twenty years ago. Its objective is to extract both named entities and the relationships between them from unstructured text, typically in the form of triplets.
Early research on entity and relation extraction employed a pipeline-based approach [1,2,3], which decomposed the task into two sequential subtasks, namely, entity recognition and relation extraction. All entities in a sentence are first extracted, the relations within the sentence are then identified, and finally relations are assigned to entity pairs. This method treats relation extraction as a relation classification problem. However, it is prone to error propagation, as errors made in the entity recognition phase spread to the relation extraction phase. Additionally, the pipeline approach overlooks the interaction between the named entity recognition and relation extraction tasks, resulting in loss of information between the two subtasks.
To address the aforementioned issues, researchers have proposed methods for joint entity and relation extraction. Initially, joint extraction methods relied on feature engineering [4]. However, with recent advancements in neural networks, the combination of neural network models [5] with pretrained language models [6] has garnered significant attention in the field of relation extraction. These approaches have shown remarkable progress, and have gradually become the primary method used for joint extraction.
Current research on joint extraction of entities and relations has largely focused on addressing the challenges of relation overlap and nested entities. Relation overlap can be categorized into entity pair overlap and single-entity overlap. On the other hand, nested entities refer to the situation where multiple entity spans have the same start position and different end positions. A concrete example is shown in Figure 1.
Researchers have conducted extensive work on the relation overlap and nested entity issues in joint entity and relation extraction. Wei et al. [7] introduced the CasRel model, which employs a binary tagging strategy to model the relations in a dataset. This model treats each relation as a mapping function from subject to object: it first extracts the subjects and then matches objects under each relation condition, enabling the extraction of relational triplets. While the CasRel model effectively addresses relation overlap, it does not tackle nested entities. Yu et al. [8] decomposed the entity and relation extraction task into two subtasks: head entity (HE) extraction and tail entity and relation (TER) extraction. They first extracted the head entities, then assigned the related relations and tail entities to each head entity. They cast these subtasks as multiple sequence labeling problems and utilized a hierarchical boundary tagger to predict the start and end positions of entities. To accommodate multi-span entities, they proposed a multi-span decoding algorithm. This approach achieved notable results; however, it is not able to handle entity overlap. In addition, Eberts et al. [9] proposed a span-based Transformer model for relation extraction. This model treats each candidate entity as a span and improves extraction efficiency through training on positive and negative samples. After performing type classification for each span and filtering spans based on length restrictions, the model assigns the corresponding relations. While the span-based Transformer model successfully addresses nested entities and relation overlap, it may encounter entity redundancy during the relation classification process. These studies demonstrate the efforts made to address the relation overlap and nested entity challenges in entity and relation extraction; each tackles specific aspects of the problem and contributes to advancing the field.
Despite the significant progress made in entity and relation extraction with the aforementioned methods, there are several areas that require improvement. The challenge of accurately extracting the subject arises from the complexity of sentence structures, semantic diversity, and contextual ambiguity. Traditional rules-based or pattern-matching methods have limitations when faced with these challenges. With the rapid advancement of deep learning, neural network-based approaches have achieved notable performance improvements in entity and relation extraction tasks. However, current deep learning methods have limitations in subject extraction. This becomes particularly challenging in complex sentence structures, long-distance dependencies, and sentences with nested structures. Accurately extracting the subject is crucial for the successful execution of entity and relation extraction tasks and the reliability of downstream applications. Additionally, relation extraction encounters the issue of class imbalance. Many entity pairs in the extraction do not form valid relations, resulting in a vast quantity of negative samples. Moreover, the classifier may become confused when an entity is associated with multiple relations, further affecting the accurate extraction of relations. In scenarios where there is an insufficient volume of training data, the model’s ability to extract complete and precise triplets may be compromised. Consequently, our research is dedicated to enhancing subject extraction accuracy, thereby improving the overall quality and reliability of both entity and relation extraction processes. In addition, this paper focuses on modeling relations and mitigating the impact of class imbalance in relation extraction.
This study presents an innovative joint extraction model called “Joint Extraction of Entities and Relations based on Enhanced Span and Gate Mechanism” (ESGM), which treats relations as a function mapping from subject to object and utilizes a pointer network labeler for the extraction of entities and relations. The model we propose utilizes the pretrained BERT language model as the encoder for sentence encoding, enabling it to capture informative representations. For each relation, the model constructs a function that serves as the condition for mapping from subjects to objects. To address nested entities and improve entity extraction accuracy, the model adopts a span-based approach for entity extraction. Unlike traditional fragment-based methods in relation extraction, this model integrates the span selection technique from Machine Reading Comprehension [10] (MRC). This paper predicts the start and end positions of spans using two binary classifiers and proposes a one-to-many matching strategy to form spans by matching predicted start and end positions. A classifier model is utilized to determine whether a span represents a subject. Before matching the object, a gate mechanism is designed to fuse the features of the subject with the sentence encoding vector. This improves the model’s capacity to capture information and dependencies between subjects and objects throughout the extraction process. Ultimately, using the relation function mapping, the model sequentially pairs each subject with its respective object.
This model mainly consists of five modules: a BERT encoding module, a span extraction module, a subject extraction prediction module, a subject feature and sentence vector fusion module, and an object–relation extraction prediction module. The goal of this paper is to enhance the efficiency of the relation extraction model and solve the problems of poor information transmission between entities and relations, low efficiency of span-based entity extraction, and difficulty in identifying nested entities. We put forward a new extraction model to enhance the entity and relation extraction task. Specifically, our research aims and contributions are as follows:
  • A new one-to-many matching method is proposed which combines the predicted start positions and multiple end positions of spans sequentially. Specifically, a start position can be matched with multiple end positions located to its right. In addition, we design a subject binary classifier to determine whether a span represents the subject entity in the sentence. This approach effectively solves the problem of nested entities.
  • To enhance feature transfer between the subject and object prediction tasks, we present a new subject feature fusion method based on a gate mechanism. Compared with simple connections, this method can integrate information more effectively and enhance the expressiveness of the model.
  • We present a novel joint model called ESGM that combines the pointer network span with a gate mechanism. This new model addresses the challenges of nested entities and relation overlap, enabling the model to capture richer semantic information and resulting in greatly improved performance.

2. Related Work

2.1. Span Extraction

A span typically refers to a contiguous segment or range within a piece of text. It can represent a single word, an entity, a phrase, or a longer fragment. In relation extraction tasks, spans are often used to denote the textual segments corresponding to the source entity and the target entity of a relation. For example, in the sentence “John works at Apple”, we can mark “John” as the span representing the source entity and “Apple” as the span representing the target entity. The model can learn from these spans to understand the relationship between the two entities as “works at”. The use of spans is not limited to relation extraction; they can also be used for tasks such as event extraction, semantic role labeling, and more. In these tasks, spans mark the boundaries of entities, events, or semantic roles.
The earliest examples of using spans to address tasks related to relation extraction include the work by Lee et al. [11] on coreference resolution, which deals with multiple mentions referring to a single entity. Wan et al. [12] improved span representations using a retrieval-based span-level graph, calculating n-gram similarity as the distance between spans to address the issue of nested entities. Dixit et al. [13] proposed a span-level relation extraction model: it first obtains candidate spans from token embeddings, then computes a score vector over entity types for each span, assigns each span to the entity type with the highest score, and finally determines the relations between span pairs based on the computed scores; the model is built on a bidirectional LSTM architecture. Li et al. [14] regarded entity extraction as a reading comprehension task and used a pointer network to predict the start and end indices of spans, allowing the extraction of target entity spans. These studies demonstrate the early adoption of spans in tasks related to relation extraction: spans have been utilized for coreference resolution, addressing nested entities, improving span representation, and extracting entity spans for relation extraction.
The span-based approach is effective in addressing the issue of nested entities, that is, where overlapping or nested entities exist within the text. By marking the start and end positions of entities, the structure of nested entities can be accurately captured. Consequently, this approach has been widely adopted by recent models such as Spert [9] and PURE [15], which perform entity and relation extraction based on spans. However, these span-based relation extraction methods typically employ an enumeration approach, in which all possible spans are generated and then recognized as entities. This leads to a large number of redundant span combinations, increasing the complexity of the model. With the aim of addressing this, in the present paper we adopt a pointer network method inspired by MRC for predicting the start and end positions of spans. Additionally, a one-to-many matching method is proposed to combine the predicted start and end positions to form target spans. In this way, the proposed approach can effectively handle nested entities and alleviate the issue of redundant span combinations.

2.2. Entity and Relation Extraction

Entity and relation extraction is the process of extracting triplets from a given unstructured text and a set of relations. It is a crucial step in information extraction tasks and a prominent research direction in NLP. The typical approach in early research, known as the pipeline method, was to first extract entities and then assign specific relations to each entity pair. For example, Socher et al. [16] applied recursive neural networks to relation extraction tasks. Wang et al. [17] proposed using Convolutional Neural Networks (CNN) for relation extraction. Cai et al. [18] combined CNN with Long Short-Term Memory (LSTM) networks and utilized the Shortest Dependency Path (SDP) principle to solve relation extraction problems, achieving notable results. This approach does not consider the useful information that can be shared during the execution of the two tasks. Performing the two tasks separately can result in redundant entities or relations, leading to accumulated errors and information redundancy. To reduce the impact of error accumulation and information redundancy, researchers have proposed building joint models that concurrently extract entities and relations.
The early joint models were mainly based on feature engineering. Miwa et al. [19] were the first to propose a parameter-sharing joint model in which entities and relations share the embedding layer, although the two tasks were still performed separately, resulting in entity redundancy and dependence on sequential labeling. Building upon this work, Katiyar et al. [20] improved the model by constructing an RNN model using multiple layers of bidirectional LSTMs, presenting the first truly neural network-based joint model. However, these methods required complex feature engineering and manual labeling when sharing feature vector parameters between the two tasks, resulting in lower extraction efficiency. In 2014, Sutskever et al. [21] applied a sequence-to-sequence learning method that uses an RNN-based encoder–decoder structure to handle sequence data; later work incorporated attention mechanisms to focus on the association between the input sequence and the decoder output sequence. Subsequently, researchers started to use neural networks to build joint extraction models encompassing both entity and relation extraction. Zheng et al. [22] proposed a sequence labeling approach that simultaneously extracts entities and relations in the form of triplets using an end-to-end model; however, this model cannot handle relation overlap. Because relation extraction requires manual annotation of unlabeled datasets, distant supervision techniques have also been utilized in entity and relation extraction tasks. Mintz et al. [23] used distant supervision to address entity and relation extraction, and this approach has gradually gained popularity in the field. Additionally, Lin et al. [24] introduced a sentence-level attention mechanism to mitigate the influence of inaccurate labels in distant supervision on the extraction results.
In recent years, scholars have concentrated on addressing nested entities and overlapping relations in joint entity and relation extraction. Xie et al. [25] proposed the RERE model, which first extracts relations and then matches subjects and objects to each relation to obtain relational triplets, effectively improving the accuracy of relation extraction; however, this model cannot handle cases involving multiple relations. Zheng et al. [26] introduced the PRGC model, which predicts potential relations and reduces the relation set to the predicted potential relations, performs “BIO” tagging on the entities with a sentence processing module, and finally aligns entity pairs to extract the triplets. Zhong and Chen [15] argued that joint entity and relation extraction can harm model performance; they therefore constructed separate entity and relation models, trained them independently, used entity features as inputs to connect the two models, and reverted to the pipeline method, achieving good results. Yan et al. [27] proposed the PFN model, which includes a partition-filtering network encoder and two task units. It divides the cells into entity cells, relation cells, and shared cells, filters and stores feature information, and then inputs the feature information into the task units. This model achieved good results, demonstrating the benefits of relation extraction for entity extraction and refuting the viewpoint that the pipeline method is more efficient. Dai et al. [28] created an encoding framework for each token, fused entities and relations using BIES encoding, and applied position attention to identify the entities at the current position and their corresponding relations; however, this method requires repeated encoding of each sentence, resulting in higher complexity. Wang et al. [29] proposed the TPLinker model, which uses three types of matrix tags to label the entities and an upper triangular matrix to represent them, thereby addressing the issue of matrix sparsity; entity decoding is then performed for each relation. The tagging strategy of this model effectively addresses the challenges of nested entities and relation overlap, although it increases the computational cost by expanding the tagging space from N to N², making it unsuitable for processing long sentences. Lastly, Shang et al. [30] introduced the OneRel model, which extracts all triplets in a single step. It uses a score-based classifier to evaluate the correctness of triplets and assigns specific labels to them, then decodes the triplets based on relation-specific Horn clauses to obtain the final results.
Joint entity and relation extraction is the core task of natural language processing and knowledge engineering. A joint model can connect the subtasks of entity extraction and relation extraction, as there are opportunities for information sharing between them. Relation extraction can provide valuable feedback for entity extraction. To address the challenges of relation overlap and nested entities, in this paper we propose joint Enhanced Span and Gate Mechanism (ESGM)-based extraction of entities and relations.

3. ESGM Model

The composition of the ESGM model is shown in Figure 2. It comprises the BERT encoding module, span extraction module, subject extraction prediction module, subject feature–sentence vector fusion module, and object–relation extraction module. First, the sentence is encoded using a BERT encoder to obtain a vector embedding for every token. These representations are then fed into the span extraction module, which predicts the probability of each word being the start or end position of a span. We propose a one-to-many matching approach to sequentially pair all predicted start and end positions, and we train a binary classification model to predict whether each paired span is a subject. To effectively fuse the global information in the sentence with the subject features, we design a gate unit to perform the fusion. The fused information is then passed through a relation mapping function to match the corresponding object. In Figure 2, 9 × 768 denotes the dimensions of the example sentence's representation after BERT, where 9 is the sentence length and 768 is the BERT hidden dimension.
The main functionality of this model is to identify all possible relation triplets in the given dataset while addressing the challenges of relation overlap and nested entities. The model first extracts the subject, then, based on the extracted subject, it simultaneously extracts the relation and object. For example, for the sentence “New York University is located in the United States”, it first extracts all the head entities in the sentence, as follows: “New York, New York University, the United States”. Then, under the condition of the head entity “New York”, the corresponding tail entity “the United States” can be found for the relation “located”, while the corresponding tail entity cannot be found for the relation “contains”. Finally, all the relations are iterated in turn to find all the relational triples in the sentence, with “New York” as the head entity. The same process is followed for the other two head entities. The process of relational triplet extraction in this model can be represented by Formula (1):
$P((s, r, o) \mid x_j) = p(s \mid x_j) \cdot p((r, o), r \in R \mid x_j, s)$ (1)
where $P((s, r, o) \mid x_j)$ represents the probability of extracting the target relational triplet from a given sentence, $p(s \mid x_j)$ represents the probability of extracting the subject from the sentence, and $p((r, o), r \in R \mid x_j, s)$ represents the probability of extracting the object and relation given the subject and the sentence. Here, $(s, r, o)$ denotes a relational triplet, $R$ denotes the set of relations in the dataset, $x_j$ represents the $j$-th sentence in the dataset, and $(r, o)$ represents an object–relation pair.
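As a reading aid, the decomposition in Formula (1) can be sketched as a two-stage decoding loop. The helper callables below are hypothetical wrappers around the modules described in Sections 3.2, 3.3, and 3.5; their names and signatures are ours, not the authors' code.

```python
# A minimal sketch of the two-stage decoding implied by Formula (1).
def extract_triples(sentence, relations, predict_subjects, predict_objects_for):
    triples = []
    # Stage 1: p(s | x_j) -- extract every candidate subject span from the sentence.
    for subject in predict_subjects(sentence):
        # Stage 2: p((r, o), r in R | x_j, s) -- for every relation, look for objects
        # conditioned on the already extracted subject.
        for relation in relations:
            for obj in predict_objects_for(sentence, subject, relation):
                triples.append((subject, relation, obj))
    return triples
```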

3.1. BERT Encoding Module

BERT is a pretrained masked language model consisting of multiple identical bidirectional Transformer encoders [31]. It has been widely applied in various downstream tasks, including information extraction and machine translation. BERT typically divides the input sentence into a segment embedding, token embedding, and position embedding; however, for the discussion of sentence-level entity and relation extraction in this paper, segment embedding is not relevant. The encoding layer is used to obtain the encoded vectors for each token in the input sentence. The algorithm execution procedure of the encoding layer for the input sentence is illustrated in Equations (2) and (3):
$H_0 = S \cdot W_s + W_p$ (2)
$H_\alpha = \mathrm{Encoder}(H_{\alpha-1}), \quad \alpha \in [1, N]$ (3)
where $S$ represents the one-hot vector matrix of the tokens in the input sentence. A one-hot matrix is a common vector representation used to convert discrete categories or labels into vector form; it is a sparse representation in which each category is represented as an independent vector that has a single element equal to 1 at its category index and 0 elsewhere. Here, $W_s$ represents the token embedding matrix and $W_p$ represents the position embedding matrix. The variable $H_\alpha$ denotes the hidden vector output by the $\alpha$-th Transformer layer, and $\mathrm{Encoder}(\cdot)$ refers to the Transformer encoder in the BERT module, which encodes the input vectors; $N$ represents the number of Transformer blocks, which in this paper we set to 12. Through these operations, a sentence representation is obtained that incorporates both semantic and positional information, providing a foundation for the subsequent entity and relation extraction tasks.
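For concreteness, the encoding step in Equations (2) and (3) can be reproduced with the Hugging Face transformers implementation of BERT, as sketched below; the checkpoint name bert-base-cased is our assumption, matching the BERT-base configuration reported in Section 4.1.

```python
# A minimal sketch of obtaining H_N, the per-token representations used downstream.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
encoder = BertModel.from_pretrained("bert-base-cased")  # 12 Transformer blocks, hidden size 768

sentence = "New York University is located in the United States"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# H_N: one 768-dimensional vector per (sub)token, fed to the span extraction module.
token_repr = outputs.last_hidden_state  # shape: (1, sequence_length, 768)
```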

3.2. Span Extraction Module

Entity and relation extraction using spans can effectively address the issue of nested entities. Span-based methods can generally be categorized into two approaches, namely, span enumeration with classification and pointer network methods. In the enumeration approach, all possible spans of different lengths in the sentence are enumerated and entity classification is performed using a softmax function. This results in a large number of redundant spans, as the number of enumerated spans in a sentence with N tokens is N(N + 1)/2, which increases the computational complexity of entity recognition. Pointer network methods are commonly used in MRC tasks. They use two taggers over the n tokens of the sentence to predict the start and end positions of spans, where n is the sentence length. A sigmoid function predicts whether each token is a start or end position, so that only a single span is output for a given query. Building upon these approaches, in this paper we utilize two binary classifiers: one predicts whether each token is the start position of a span, and the other predicts whether each token is the end position of a span. This allows multiple start and end positions to be found, enabling the extraction of multiple relevant entities. To address the issue of nested entities, a one-to-many matching method is proposed which matches each start position token with all subsequent end position tokens, resulting in the final span set.
After passing the sequence of sentences through BERT, the final Transformer layer output, which represents the representation of each token in the sentence, is obtained. These representations are then input into the span extraction module. In this module, the model predicts the probability of each token being the starting position of a span, then predicts the probability of each token being the end position of a span using the same approach. The probability predictions are calculated according to Equations (4) and (5).
$p_i^{start\_s} = \sigma(W_{start\_s} \cdot t_i + b_{start\_s})$ (4)
$p_i^{end\_s} = \sigma(W_{end\_s} \cdot t_i + b_{end\_s})$ (5)
In the above equations, $p_i^{start\_s}$ and $p_i^{end\_s}$ respectively denote the probabilities that the $i$-th token within a given sentence is the start or end position of the subject. Here, $t_i = H_N[i]$ is the final Transformer layer representation of the $i$-th token of the input sentence, $W_{start\_s}$ and $W_{end\_s}$ denote the weights, $b_{start\_s}$ and $b_{end\_s}$ denote the biases, and $\sigma$ represents the sigmoid activation function.
During the probability prediction stage, we set a threshold and compare it with the predicted probabilities; the results are labeled using two binary markers. Following references [7,9,26], we conducted several experiments and found that a threshold of 0.5 works well; we therefore set the threshold to 0.5. Tokens with probabilities that exceed the threshold are marked as 1 at their corresponding positions, while the remaining positions are marked as 0. After labeling the tokens, the next step is to form spans by combining the tokens marked as 1 in the start and end markers. In this paper, we adopt a one-to-many matching principle in which a start position can be matched with multiple end positions. As shown in Figure 2, the start marker labels “New” and “United” as 1, while the end marker labels “York”, “University”, and “States” as 1. By applying the one-to-many matching approach, each token marked as 1 in the start marker is combined with the tokens marked as 1 at subsequent positions in the end marker. For example, “New” is combined with “York” to form “New York”, with “University” to form “New York University”, and with “States” to form “New York University is located in the United States”. This process allows for the identification of nested entities. The indexing of the start and end positions of the spans is shown in Equations (6) and (7):
$T_{start\_s} = \{\, i \mid p_i^{start\_s} > h_{bar},\ i = 1, 2, \ldots, n \,\}$ (6)
$T_{end\_s} = \{\, j \mid p_j^{end\_s} > t_{bar},\ j = 1, 2, \ldots, n \,\}$ (7)
where $T_{start\_s}$ and $T_{end\_s}$ represent the sets of position indexes of tokens whose probability is greater than the corresponding threshold, $i$ and $j$ represent the position indexes of tokens marked as 1 by the two markers, $h_{bar}$ and $t_{bar}$ represent the start position threshold and end position threshold, respectively, and $n$ represents the length of the sentence.
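As a concrete illustration, the start/end prediction of Equations (4) and (5) and the one-to-many matching of Equations (6) and (7) could be implemented roughly as follows; class and function names are illustrative, the 0.5 thresholds follow the text above, and the span-length cap anticipates the limit introduced in Section 3.3.

```python
# A minimal PyTorch sketch of the span extraction module; shapes and names are assumptions.
import torch
import torch.nn as nn

class SpanExtractor(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.start_fc = nn.Linear(hidden_size, 1)  # W_start_s, b_start_s
        self.end_fc = nn.Linear(hidden_size, 1)    # W_end_s, b_end_s

    def forward(self, token_repr):
        # token_repr: (seq_len, hidden) -> per-token start/end probabilities
        p_start = torch.sigmoid(self.start_fc(token_repr)).squeeze(-1)
        p_end = torch.sigmoid(self.end_fc(token_repr)).squeeze(-1)
        return p_start, p_end

def one_to_many_match(p_start, p_end, h_bar=0.5, t_bar=0.5, max_len=15):
    """Pair each predicted start position with every predicted end position to its right."""
    starts = (p_start > h_bar).nonzero(as_tuple=True)[0].tolist()  # T_start_s
    ends = (p_end > t_bar).nonzero(as_tuple=True)[0].tolist()      # T_end_s
    spans = []
    for i in starts:
        for j in ends:
            if j >= i and (j - i + 1) <= max_len:  # end must not precede the start
                spans.append((i, j))
    return spans
```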

3.3. Subject Extraction Prediction Module

In entity and relation extraction, the accuracy and completeness of subject extraction have a profound impact on the subsequent extraction of the object and relation. The subject represents the entity that performs an action or event in the sentence. Therefore, correctly extracting the subject helps to locate the object’s position within the given sentence, thereby determining the direction of relationship extraction. The features and contextual information of the subject are crucial for accurately classifying the relation. The subject can provide contextual information relevant to the object, allowing for a better understanding of the object, its position, and the relationship between them. Furthermore, the subject can offer important clues to sentence meaning and semantic inference. Properly understanding and extracting the subject contributes to a more accurate comprehension of the sentence’s semantics, thereby improving relation extraction and semantic inference. To sum up, the correct extraction of subjects is vital for the successful execution of entity and relation extraction tasks and the reliability of downstream applications. Therefore, in this study we focus on improving the accuracy and effectiveness of subject extraction in order to enhance the quality and reliability of entity and relation extraction.
It can be seen from Figure 2 that, after the span extraction module, the span collection contains four spans, several of which are not entities; thus, it is necessary to judge the formed spans. In this module, the BERT-encoded sentence feature vectors are input into an attention mechanism, allowing the information in the sentence to be learned. Each word in the sentence attends to all words, including itself, in order to obtain the correlation between words and thereby the dependencies between them. Words with strong dependencies have greater correlations and are more likely to form a subject. To capture multi-dimensional dependencies between words, the model utilizes a multi-head attention mechanism that captures dependencies from multiple representation subspaces, allowing it to learn richer features of the sentence. After the attention layer, a multi-layer binary classifier takes the span vector representations as input and leverages the entity features learned by the attention mechanism to determine whether the spans obtained in this module are subjects. By applying the process described in Equation (8), Span 4 is identified as not being a subject, resulting in the output of only three subject results:
$P_m(s \mid x_j) = \sigma(W_{sub} \cdot [E_{start\_s}^{\,i}, E_{end\_s}^{\,j}] + b_{sub})$ (8)
where $W_{sub}$ and $b_{sub}$ respectively represent the training weights and biases of the model, and $E_{start\_s}^{\,i}$ and $E_{end\_s}^{\,j}$ respectively indicate the token vectors corresponding to the span's start and end positions, with $i \in T_{start\_s}$, $j \in T_{end\_s}$, and $i < j$.
In order to reduce the occurrence of entity redundancy, in this paper we introduce a length limitation during the span combination process to remove spans that are deemed too long during the judgment process. Different parameters are set for English and Chinese datasets. In this model, the maximum length for English spans is set to 15.
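A minimal sketch of the subject judgment step (Equation (8)) is given below; the multi-head attention layer follows the description above, while the number of attention heads and the exact layer shapes are our assumptions.

```python
# A sketch of the subject prediction module; hyperparameters are illustrative.
import torch
import torch.nn as nn

class SubjectClassifier(nn.Module):
    def __init__(self, hidden_size=768, num_heads=8):
        super().__init__()
        # Every word attends to all words (including itself) to capture dependencies.
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_size, 1)  # W_sub, b_sub over [E_start, E_end]

    def forward(self, token_repr, spans):
        # token_repr: (1, seq_len, hidden); spans: list of (i, j) from one-to-many matching
        attended, _ = self.attn(token_repr, token_repr, token_repr)
        probs = []
        for i, j in spans:
            span_repr = torch.cat([attended[0, i], attended[0, j]], dim=-1)
            probs.append(torch.sigmoid(self.classifier(span_repr)))  # P_m(s | x_j)
        return probs
```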

3.4. Subject Feature and Sentence Vector Fusion Module

The different stages of entity and relation extraction are interconnected, and the accurate identification of subjects is crucial for the precise recognition of objects and their relations. The extracted subject information helps to determine the scope of the object, making it easier for the model to identify the object's position. Additionally, the information contained in the subject provides more features for relation identification, enabling a better understanding of the relation between entities. In certain cases, the same entity may serve as both subject and object, in which case correctly extracting the subject reduces confusion between entities. Therefore, in this paper we incorporate the features of the subject during the process of object and relation extraction. Simply concatenating the subject representation with the sentence representation makes the feature expression less accurate. To address this issue, a gating unit is introduced to weigh the extracted subject information. By considering the weight information, more important features can be obtained and fused more effectively with the vector of the current word. This method adaptively integrates information from different parts, with the advantages of learnability, information selection capability, and interpretability. This improves the performance of the model while providing explanations and understanding of model decisions. The vector of the $i$-th word after fusion with the subject information is calculated by Formula (9):
$X_i = \mathrm{gate}([W_o \cdot s_m,\ t_i])$ (9)
where $t_i = H_N[i]$ represents the encoded representation of the $i$-th word of the input sentence at the last Transformer layer, $s_m$ refers to the vector representation of the $m$-th subject, $[\,\cdot\,,\,\cdot\,]$ denotes concatenation, $\mathrm{gate}(\cdot)$ denotes the gating unit, and $W_o$ and $b_o$ respectively denote the training weights and biases.
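The paper does not spell out the internal form of $\mathrm{gate}(\cdot)$ in Formula (9), so the sketch below shows one plausible sigmoid-gated fusion of the projected subject vector with each word vector; it is an assumption in the spirit of standard gating units, not the authors' exact implementation.

```python
# A possible gating unit for fusing subject features with the sentence encoding.
import torch
import torch.nn as nn

class SubjectGate(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.subject_proj = nn.Linear(hidden_size, hidden_size)  # plays the role of W_o
        self.gate_fc = nn.Linear(2 * hidden_size, hidden_size)   # weights inside gate(.)

    def forward(self, token_repr, subject_repr):
        # token_repr: (seq_len, hidden) = t_i; subject_repr: (hidden,) = s_m
        s = self.subject_proj(subject_repr).expand_as(token_repr)         # W_o . s_m
        g = torch.sigmoid(self.gate_fc(torch.cat([s, token_repr], dim=-1)))
        # Weighted, element-wise blend of subject information and word information: X_i
        return g * s + (1 - g) * token_repr
```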

3.5. Object–Relation Extraction Module

After the gating unit, we obtain the fused vector of the subject and the current word, normalize it with the ReLU activation function, and use it as input; the relation then serves as the mapping function for calculating the start and end position probabilities of the object corresponding to that relation. The object tagger first learns from the input through a linear layer, then passes the result through a fully connected layer into the sigmoid function, and outputs the probability of each word being the start or end position of the object under the corresponding relation. The specific implementation steps are shown in Formulas (10) and (11):
$p_i^{start\_o} = \sigma(\mathrm{ReLU}(W_{start\_o} \cdot X_i + b_{start\_o}))$ (10)
$p_i^{end\_o} = \sigma(\mathrm{ReLU}(W_{end\_o} \cdot X_i + b_{end\_o}))$ (11)
where $p_i^{start\_o}$ and $p_i^{end\_o}$ respectively indicate the probability of the $i$-th word being the start or end position of the object in the sentence, $\mathrm{ReLU}(\cdot)$ represents the ReLU function, $W_{start\_o}$, $W_{end\_o}$, $b_{start\_o}$, and $b_{end\_o}$ denote the weights and biases, and $X_i$ represents the embedding of the word at the $i$-th position of the sentence after fusion with the subject information. Using the above position probabilities, the start and end positions of the object span are determined in order to predict the complete object span. The specific implementation steps of object extraction are shown in Formulas (12)–(14):
$T_{start\_o} = \{\, i \mid p_i^{start\_o} > h_{bar},\ i = 1, 2, \ldots, n \,\}$ (12)
$T_{end\_o} = \{\, j \mid p_j^{end\_o} > t_{bar},\ j = 1, 2, \ldots, n \,\}$ (13)
$P_r((o, r) \mid s, x_j) = \sigma(W_{obj}^{r} \cdot [E_{start\_o}^{\,i}, E_{end\_o}^{\,j}] + b_{obj}^{r})$ (14)
where $T_{start\_o}$ and $T_{end\_o}$ represent the sets of position indexes of tokens whose probability is greater than the corresponding threshold, $i$ and $j$ represent the position indexes of these tokens in the sentence, $h_{bar}$ and $t_{bar}$ represent the start position threshold and end position threshold, respectively, $n$ represents the length of the sentence, $E_{start\_o}^{\,i}$ and $E_{end\_o}^{\,j}$ represent the token vectors of the start and end of the object span, with $i \in T_{start\_o}$, $j \in T_{end\_o}$, and $i < j$, and $W_{obj}^{r}$ and $b_{obj}^{r}$ respectively represent the training weights and biases of the model under relation $r$.
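The relation-specific object tagger of Formulas (10) and (11) can be sketched as one start classifier and one end classifier over all relations; the layer shapes and names are our assumptions, and the ReLU placement follows the formulas above.

```python
# A sketch of the object-relation tagger applied to the fused vectors X_i.
import torch
import torch.nn as nn

class ObjectTagger(nn.Module):
    def __init__(self, num_relations, hidden_size=768):
        super().__init__()
        self.start_fc = nn.Linear(hidden_size, num_relations)  # W_start_o, b_start_o (per relation)
        self.end_fc = nn.Linear(hidden_size, num_relations)    # W_end_o, b_end_o (per relation)

    def forward(self, fused_repr):
        # fused_repr: (seq_len, hidden), the X_i produced by the gating unit
        p_start = torch.sigmoid(torch.relu(self.start_fc(fused_repr)))  # (seq_len, num_relations)
        p_end = torch.sigmoid(torch.relu(self.end_fc(fused_repr)))
        # For each relation r, positions whose probability exceeds the threshold give
        # T_start_o and T_end_o, from which the object spans are decoded.
        return p_start, p_end
```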
The loss function of this model is the sum of the losses of the two tasks, subject extraction and object–relation extraction, each of which is calculated in this paper using the binary cross-entropy loss function. The specific formulas are as follows:
$L_s = -\frac{1}{M} \sum_{m=1}^{M} \big[ S_m \cdot \log(P_m(s \mid x_j)) + (1 - S_m) \cdot \log(1 - P_m(s \mid x_j)) \big]$ (15)
$L_o = -\frac{1}{N} \sum_{n=1}^{N} \big[ O_n \cdot \log(P_r((r, o) \mid s, x_j)) + (1 - O_n) \cdot \log(1 - P_r((r, o) \mid s, x_j)) \big]$ (16)
$L = L_s + L_o$ (17)
where $P_m(s \mid x_j)$ and $P_r((r, o) \mid s, x_j)$ are the results of Formula (8) and Formula (14), respectively, $m$ and $n$ denote the index of the $m$-th subject and the $n$-th object in the sentence, $M$ and $N$ represent the respective numbers of subjects and objects contained in the sentence, and $S_m$ and $O_n$ are the binary gold labels of the $m$-th subject and the $n$-th object, respectively. The model uses the Adam optimizer [32] to optimize the loss function.
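The training objective of Formulas (15)–(17) amounts to summing two binary cross-entropy losses, which could be written as follows; tensor names are illustrative.

```python
# A sketch of the combined loss L = L_s + L_o.
import torch
import torch.nn.functional as F

def esgm_loss(subject_probs, subject_labels, object_probs, object_labels):
    # subject_probs / subject_labels: predictions P_m(s|x_j) and gold 0/1 labels S_m
    # object_probs / object_labels: predictions P_r((r,o)|s,x_j) and gold 0/1 labels O_n
    loss_s = F.binary_cross_entropy(subject_probs, subject_labels)  # L_s, Formula (15)
    loss_o = F.binary_cross_entropy(object_probs, object_labels)    # L_o, Formula (16)
    return loss_s + loss_o                                          # L,   Formula (17)

# The parameters are then optimized with Adam [32], e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```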
The joint extraction approach can better integrate information between entities and relations. It can improve the consistency of the model and the precision of context association and relation type recognition while making full use of training data and being suitable for a variety of application scenarios. Therefore, joint extraction is widely used in the research and practical application of entity and relation extraction.

4. Experiments

In this paper, we describe experiments targeting nested entities and overlapping relations. First, following the experimental methods of previous relation extraction studies, experiments were conducted on the NYT and WebNLG datasets. Next, to address the issue of nested entities, we processed the dataset to create a new dataset called “new_data” for further experiments. Subsequently, ablation experiments were conducted on the span extraction module and the subject feature–sentence vector fusion module for object extraction in order to validate their effectiveness.

4.1. Datasets and Experimental Setting

The experiments in this paper were conducted on the NYT [33] and WebNLG [34] datasets. In the NYT dataset, the training set consists of 56,195 sentences, the validation set consists of 5000 sentences, and the test set consists of 5000 sentences; there are a total of 24 relation types in this dataset. In the WebNLG dataset, the training set consists of 5019 sentences, the validation set consists of 500 sentences, and the test set consists of 703 sentences, with a total of 247 relation types in the dataset. To evaluate the efficiency of the ESGM model in extracting nested entities, the datasets were preprocessed to create a new dataset called “new_data”. The new_data training set consists of 5589 sentences, the validation set consists of 1226 sentences, and the test set consists of 627 sentences; there are a total of 360 relation types in this dataset. The statistics of the datasets are summarized in Table 1.
In this experiment, we used Python 3.7 and related scientific computing libraries to write and execute the experimental code, with PyTorch v1.12 as the deep learning framework. Specifically, we used the GPU version of PyTorch v1.12 to speed up the training process. All experiments were conducted on a workstation running the Ubuntu 18.04 Linux operating system, and model training relied on an NVIDIA GeForce RTX 3090 GPU; PyTorch was configured to take full advantage of GPU acceleration through CUDA 11.3. The model was trained using mini-batches with a batch size of 3. The learning rate was set to 1 × 10⁻⁵. The maximum sentence length was set to 300. Dropout with a rate of 0.2 was applied during the subject extraction and object extraction stages. The model was trained on the BERT-base-cased pretrained language model, which consists of twelve Transformer blocks with a hidden size of 768 and a total of 110M parameters. For the NYT dataset, the maximum number of epochs was set to 100; for the WebNLG dataset, it was set to 200. To prevent overfitting, training was stopped automatically if the results showed no improvement over ten consecutive epochs or if performance on the validation set worsened while performance on the training set improved.
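For reference, the training configuration described above can be collected into a single settings dictionary, as sketched below; the key names are ours, while the values follow the text.

```python
# Hyperparameters reported in Section 4.1, gathered for convenience.
config = {
    "pretrained_model": "bert-base-cased",   # 12 Transformer blocks, hidden size 768, ~110M parameters
    "batch_size": 3,
    "learning_rate": 1e-5,
    "max_sentence_length": 300,
    "dropout": 0.2,                          # subject and object extraction stages
    "max_epochs": {"NYT": 100, "WebNLG": 200},
    "early_stopping_patience": 10,           # stop after ten epochs without improvement
    "max_span_length": 15,                   # English datasets (Section 3.3)
    "start_end_threshold": 0.5,              # h_bar and t_bar
}
```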
In order to select the most appropriate span length limit, in this experiment we tried span lengths of 5, 10, 15, and 20 and compared the resulting training loss curves, as shown in Figure 3. From the figure, it is evident that the loss converges most readily when the span length limit is set to 15.

4.2. Experiment Results

We compared our model with seven advanced baseline models: NovelTagging [22], CopyRe [35], GraphRel [36], CopyR [37], CasRel [7], TPLinker [29], and CasDE [38]. NovelTagging is a sequence tagging-based extraction method, GraphRel is a graph-based extraction method, and the others are sequence-based methods. To ensure a fair comparison, all baseline results reported in this paper are taken from the original papers.
Following the evaluation settings of the above-mentioned experiments [28,37], a predicted triplet is considered correct only when the subject, object, and relation are all extracted correctly. We evaluated the experimental results using the precision, recall, and F1 value. The precision is the proportion of correctly predicted triples among all triples predicted by the model, and the recall is the ratio of triples correctly predicted by the model to the reference triples in the dataset. The F1 value, which ranges from 0 to 1, is the harmonic mean of precision and recall, as shown in Formula (18). Table 2 presents the comparison of our model with the seven baseline models on the NYT and WebNLG datasets. It can be seen from the experimental results that the F1 value of the proposed model exceeds that of the TPLinker model by 0.3% on the NYT dataset and that of the CasRel model by 2.6%; on the WebNLG dataset, the F1 value exceeds that of the TPLinker model by 0.7%. This indicates that the accuracy of subject extraction and the information contained in the extracted subject play essential roles in the recognition of subsequent objects and relations.
$F_1 = \dfrac{2 \cdot precision \cdot recall}{precision + recall}$ (18)
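A small helper consistent with this exact-match criterion is sketched below: a predicted triple counts as correct only if its subject, relation, and object all match a reference triple.

```python
# Precision, recall, and F1 over sets of (subject, relation, object) triples.
def triple_prf(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```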
Our experiments on the above two datasets indicate that there are few nested entities in these datasets; thus, the results cannot fully demonstrate the advantages of the proposed model in handling nested entities. In order to assess the performance of the model on the nested entity problem, we filtered the datasets, obtained the parts related to nested entities, and integrated the sentences containing nested entities into a new dataset, new_data. A comparative experiment was then carried out between the proposed ESGM model and the CasRel model on the new dataset. The experimental results show that the precision, recall, and F1 value of our model improve upon those of the CasRel model by 4.4%, 6.0%, and 5.6%, respectively. The specific results are shown in Table 3.

4.3. Ablation Experiment

According to the above experimental results, our proposed model enhances the span mechanism: it extracts entities through a pointer network and a one-to-many matching method, and it adds a gating mechanism before the object extraction module, ensuring that information in the sentence is more fully exploited and that entity characteristics are better integrated. To verify the contribution of these modules, we performed ablation experiments on both of them, with the results shown in Table 4. All of the modules introduced in this paper contribute to the final performance; notably, the subject feature and sentence vector fusion module has a significant impact, as its removal reduces the F1 score by 6.9 percentage points, from 92.6% to 85.7%, on the WebNLG dataset. The span matching module also has a considerable impact, as its removal decreases the F1 value by 4.8 percentage points. It can be seen that the new span method proposed in this paper improves the accuracy and completeness of entity extraction, while the multi-dimensional fusion of subject information is effective for the relation and object extraction tasks.

5. Conclusions

This paper proposes a joint extraction model, ESGM, to solve the problems of nested entities and overlapping relations. Different from traditional span-based joint extraction models, this model combines a pointer network with a one-to-many matching principle to predict the start and end of each span, extracting nested entities while avoiding redundant spans. At the same time, in order to enhance the correlation between modules, the ESGM model adds a gating unit before object extraction, fusing the subject information with the global information of the sentence to improve the adaptability and accuracy of the model. The ESGM model was tested on several datasets, including NYT and WebNLG, to verify its superiority, and the results show that the precision and F1 value of the ESGM model are higher than those of the comparison models. However, the ESGM model's handling of document-level and open-domain relation extraction still needs to be improved, which will be the focus of our future work.

Author Contributions

Conceptualization, N.Z. and J.X.; Methodology, N.Z. and J.X.; Software, N.Z. and J.X.; Validation, N.Z. and J.X.; Formal analysis, N.Z. and J.X.; Investigation, N.Z. and J.X.; Resources, N.Z., J.X. and Q.C.; Data curation, N.Z. and J.X.; Writing-original draft, N.Z. and J.X.; Writing—review & editing, N.Z., J.X. and V.C.; Visualization, N.Z., J.X., Q.C. and V.C.; Supervision, N.Z., J.X., Q.C. and V.C.; Project administration, N.Z., J.X. and Q.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zelenko, D.; Aone, C.; Richardella, S.T. Kernel Methods for Relation Extraction. J. Mach. Learn. Res. 2003, 3, 1082–1106. [Google Scholar]
  2. E, H.H.; Zhang, W.-J.; Xiao, S.-Q.; Cheng, R.; Hu, Y.-X.; Zhou, X.-S.; Niu, P.-Q. Survey of entity relationship extraction based on deep learning. J. Softw. 2019, 30, 1793–1818. [Google Scholar]
  3. Ratinov, L.; Roth, D. Design challenges and misconceptions in named entity recognition. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), Boulder, CO, USA, 4–5 June 2009; pp. 147–155. [Google Scholar]
  4. Li, Q.; Ji, H. Incremental joint extraction of entity mentions and relations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 22–27 June 2014; pp. 402–412. [Google Scholar]
  5. Bekoulis, G.; Deleu, J.; Demeester, T.; Develder, C. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Syst. Appl. 2018, 114, 34–45. [Google Scholar] [CrossRef]
  6. Qiao, B.; Zou, Z.; Huang, Y.; Fang, K.; Zhu, X.; Chen, Y. A joint model for entity and relation extraction based on bert. Neural Comput. Appl. 2022, 34, 3471–3481. [Google Scholar] [CrossRef]
  7. Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A novel cascade binary tagging framework for relational triple extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1476–1488. [Google Scholar]
  8. Yu, B.; Zhang, Z.; Shu, X.; Liu, T.; Wang, Y.; Wang, B.; Li, S. Joint extraction of entities and relations based on a novel decomposition strategy. In ECAI 2020; IOS Press: Copenhagen, Denmark, 2020; pp. 2282–2289. [Google Scholar]
  9. Eberts, M.; Ulges, A. Span-based joint entity and relation extraction with transformer pre-training. In ECAI 2020; IOS Press: Copenhagen, Denmark, 2020; pp. 2006–2013. [Google Scholar]
  10. Seo, M.; Kembhavi, A.; Farhadi, A.; Hajishirzi, H. Bidirectional Attention Flow for Machine Comprehension. In Proceedings of the International Conference on Learning Representations, San Juan, PR, USA, 2–4 May 2016. [Google Scholar]
  11. Lee, K.; He, L.; Lewis, M.; Zettlemoyer, L. End-to-end neural coreference resolution. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017. [Google Scholar]
  12. Wan, J.; Ru, D.; Zhang, W.; Yu, Y. Nested named entity recognition with span-level graphs. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 892–903. [Google Scholar]
  13. Dixit, K.; Al-Onaizan, Y. Span-level model for relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 5308–5314. [Google Scholar]
  14. Li, X.; Feng, J.; Meng, Y.; Han, Q.; Wu, F.; Li, J. A unified mrc framework for named entity recognition. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5849–5859. [Google Scholar]
  15. Zhong, Z.; Chen, D. A frustratingly easy approach for entity and relation extraction. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, 7–9 June 2021; pp. 50–61. [Google Scholar]
  16. Socher, R.; Huval, B.; Manning, C.D.; Ng, A.Y. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 1201–1211. [Google Scholar]
  17. Wang, L.; Cao, Z.; De Melo, G.; Liu, Z. Relation classification via multi-level attention cnns. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; pp. 1298–1307. [Google Scholar]
  18. Cai, R.; Zhang, X.; Wang, H. Bidirectional recurrent convolutional neural network for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; pp. 756–765. [Google Scholar]
  19. Miwa, M.; Bansal, M. End-to-end relation extraction using lstms on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016. [Google Scholar]
  20. Katiyar, A.; Cardie, C. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 917–928. [Google Scholar]
  21. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2014, 27, 1227–1236. [Google Scholar]
  22. Zheng, S.; Wang, F.; Bao, H.; Hao, Y.; Zhou, P.; Xu, B. Joint extraction of entities and relations based on a novel tagging scheme. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017. [Google Scholar]
  23. Mintz, M.; Bills, S.; Snow, R.; Jurafsky, D. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Singapore, 2–7 August 2009; pp. 1003–1011. [Google Scholar]
  24. Lin, Y.; Shen, S.; Liu, Z.; Luan, H.; Sun, M. Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; pp. 2124–2133. [Google Scholar]
  25. Xie, C.; Liang, J.; Liu, J.; Huang, C.; Huang, W.; Xiao, Y. Revisiting the negative data of distantly supervised relation extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 3572–3581. [Google Scholar]
  26. Zheng, H.; Wen, R.; Chen, X.; Yang, Y.; Zhang, Y.; Zhang, Z.; Zhang, N.; Qin, B.; Ming, X.; Zheng, Y. Prgc: Potential relation and global correspondence based joint relational triple extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 6225–6235. [Google Scholar]
  27. Yan, Z.; Zhang, C.; Fu, J.; Zhang, Q.; Wei, Z. A partition filter network for joint entity and relation extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 185–197. [Google Scholar]
  28. Dai, D.; Xiao, X.; Lyu, Y.; Dou, S.; She, Q.; Wang, H. Joint extraction of entities and overlapping relations using position-attentive sequence labeling. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6300–6308. [Google Scholar]
  29. Wang, Y.; Yu, B.; Zhang, Y.; Liu, T.; Zhu, H.; Sun, L. Tplinker: Single-stage joint extraction of entities and relations through token pair linking. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1572–1582. [Google Scholar]
  30. Shang, Y.-M.; Huang, H.; Mao, X. Onerel: Joint entity and relation extraction with one module in one step. In Proceedings of the AAAI Conference on Artificial Intelligence, Palo Alto, CA, USA, 22 February–1 March 2022; Volume 36, pp. 11285–11293. [Google Scholar]
  31. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, L.K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the NAACL-HLT, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  33. Riedel, S.; Yao, L.; McCallum, A. Modeling relations and their mentions without labeled text. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD, Barcelona, Spain, 20–24 September 2010; pp. 148–163. [Google Scholar]
  34. Gardent, C.; Shimorina, A.; Narayan, S.; Perez-Beltrachini, L. Creating Training Corpora for NLG Micro-Planners. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 179–188. [Google Scholar]
  35. Zeng, X.; Zeng, D.; He, S.; Liu, K.; Zhao, J. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 506–514. [Google Scholar]
  36. Fu, T.-J.; Li, P.-H.; Ma, W.-Y. Graphrel: Modeling text as relational graphs for joint entity and relation extraction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1409–1418. [Google Scholar]
  37. Zeng, X.; He, S.; Zeng, D.; Liu, K.; Liu, S.; Zhao, J. Learning the extraction order of multiple relational facts in a sentence with reinforcement learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 367–377. [Google Scholar]
  38. Ma, L.; Ren, H.; Zhang, X. Effective cascade dual-decoder model for joint entity and relation extraction. arXiv 2021, arXiv:2106.14163. [Google Scholar]
Figure 1. Examples of normal, SEO, EPO, and nested entities.
Figure 2. The overall structure of ESGM.
Figure 3. Training loss graph, showing how the training loss changes in the new_data dataset when the span length is limited to 5, 10, 15, and 20.
Table 1. Statistics of datasets.

Dataset      Train     Valid    Test
new_data     5589      1226     627
NYT          56,195    5000     5000
WebNLG       5019      500      703
Table 2. The results of our experiments on the datasets (precision, recall, and F1 in %).

Model           NYT                         WebNLG
                Precision   Recall   F1     Precision   Recall   F1
NovelTagging    62.4        31.7     42.0   52.5        19.3     28.3
CopyRe          59.4        53.1     56.0   32.2        28.9     30.5
GraphRel        63.9        60.0     61.9   44.7        41.1     42.9
CopyR           77.9        67.2     72.1   63.3        59.9     61.6
CasRel          89.7        89.5     89.6   93.4        90.1     91.8
TPLinker        91.3        92.5     91.9   91.8        92.0     91.9
CasDE           90.2        90.9     90.5   90.3        91.5     90.9
ESGM (ours)     91.8        92.6     92.2   93.6        91.7     92.6
Table 3. The experiment results of the ESGM model and the CasRel model on the new_data dataset.

Model          Precision   Recall   F1
CasRel         76.1        71.0     73.4
ESGM (ours)    80.5        77.0     79.0
Table 4. Ablation experiment results.

Model                                 F1     Δ
ESGM (ours)                           92.6   -
- gate unit for object extraction     85.7   −6.9
- span match                          87.8   −4.8
