Bidirectional matching and aggregation network for few-shot relation extraction

Few-shot relation extraction addresses the long-tail distribution of data by matching query instances against support instances. Existing methods focus only on a single matching direction, ignoring the symmetry of the data in the process. To address this issue, we propose the bidirectional matching and aggregation network (BMAN), which is particularly powerful when the training data is symmetrical. The model not only extracts relations for query instances but also seeks relational prototypes from the query instances to validate the feature representation of the support set. Moreover, to avoid overfitting in bidirectional matching, a data enhancement method is designed to scale up the number of instances while maintaining the scope of each instance's relation class. Extensive experiments on the public FewRel and FewRel 2.0 datasets demonstrate the effectiveness of BMAN.


INTRODUCTION
Knowledge graphs have multiple applications, from question answering in the medical domain to data analytics and prediction in the biological domain (Alshahrani, Thafar & Essack, 2021). Nevertheless, existing knowledge graphs are incomplete, which limits the scope of knowledge applications. Relation extraction provides indispensable knowledge for building knowledge graphs, represented in the triplet format <head entity, relation, tail entity>. Unlike link prediction, which derives new triples from already existing ones, relation extraction determines the relation between entities in a given text to construct triples. Relation extraction is generally handled as a text classification task, classifying entity-pair relations into a certain relation class; but unlike text classification, which focuses on the text as a whole (Jia & Wang, 2022), the relation extraction task focuses more on the context of the entities. The study of extracting relational facts from a large corpus therefore has important implications for enriching knowledge graphs.
Conventional methods of manually labeling data are inefficient and expensive. Distant supervision learning (Mintz et al., 2009) was proposed to address these issues.

(Figure note: the support set for this task has three relation classes, with two instances of each class. The entities in the instances are underlined and R is the relation label.)
The above approaches match query instances to relational prototypes to extract relations, but ignore the symmetry of the data in this process. We propose the bidirectional matching and aggregation network (BMAN) for relation extraction. Our hypothesis is that if the relational prototype learned from the support set can extract relations on the query set, then the relational prototype learned from the query set should likewise extract relations on instances that share the support instances' relations. This reverse mechanism enables the model to learn features that generalize better across a relation class, which improves the accuracy of relation extraction for tail instances.
Generally, the main contributions of this article are summarized below:
1. We propose the bidirectional matching and aggregation network, which exploits the symmetry of the data in the matching process to mutually validate prediction accuracy in both the forward and reverse directions, thereby improving the accuracy of relation extraction.
2. We design a data enhancement method that generates multiple sets of reverse support instances from a limited number of forward query instances; to avoid overfitting, the reverse query sets are not directly replaced by the forward support sets.
3. We conduct extensive experiments on the FewRel and FewRel 2.0 datasets, and the experimental results show that our model achieves state-of-the-art performance.
The rest of the article is structured as follows. Related work is described in Section 2. In Section 3 the bidirectional few-shot relation extraction task is defined, followed by a detailed description of the framework of the BMAN model. In Section 4 the experiments are presented and the results analyzed. Finally, in Section 5, we summarize the work and list future work.

RELATED WORK
Relation extraction
In recent years, the distant supervision method (Mintz et al., 2009) has been widely used in relation extraction; while achieving excellent results, it also brings problems of noise in annotation quality and a long-tail distribution of data. For the noise problem, the introduction of attention mechanisms (Luo et al., 2018), piecewise convolutional neural networks and multiple-instance learning has effectively improved the accuracy of automatic annotation. For the long-tail problem, Zhang et al. (2019) proposed a distant supervision relation extraction method for long-tailed data, which transfers the knowledge of a large number of identified labeled instances to the relation extraction of scarce instance classes. Subsequently, Cao et al. (2021) proposed a general method for learning relational prototypes from unlabeled text, which uses relational prototypes to transfer knowledge from head classes to tail classes, thereby improving the performance of relation extraction for tail classes.

Few-shot relation extraction
Early few-shot learning was used in image classification tasks (Fu, Zhou & Chen, 2022), which aim to classify instances of a query set with a small number of samples by learning generic features of each class. Relation extraction was first defined as a few-shot learning task with the release of the FewRel dataset, which was generated by combining distant supervision annotation with manual annotation. Gao et al. (2019a) proposed a hybrid-attention prototype network that uses an instance-level attention mechanism to filter out representative support instances and a feature-level attention mechanism to focus on the important dimensions of the feature space. In the same year, the FewRel 2.0 dataset was proposed with (N + 1) relation classes, adding cross-domain data and none-of-the-above data to the FewRel dataset. To strengthen the connection between relational prototypes and query sets, Ye & Ling (2019) proposed a multi-level matching and aggregation network that considers instance-level matching between support instances and query instances when computing relational prototypes. Ren et al. (2020) proposed a two-stage prototype network for the incremental few-shot relation extraction task, which dynamically identifies novel relations in support instances. Liu et al. (2021) proposed the global transformed prototype network (GTPN) based on a learning-to-discriminate paradigm, which extracts relations by learning the differential information between query instances and all target classes; this capability makes the model better suited to processing cross-domain data. Wang et al. (2022) proposed a hybrid enhanced prototype network that enhances the scale and utilization of annotated data.

METHODOLOGY
In this section, we define the bidirectional few-shot relation extraction task, then describe the BMAN model framework, which is designed to extract more representative class-prototype features, and finally describe the data enhancement method. We describe the detailed process of our method in three parts below.

Task definition
In the few-shot learning method, the dataset is generally divided into a training set and a test set. The training set consists of the support set ($t_{ts}$) and the query set ($t_{tq}$). If there are N relations in the $t_{ts}$ set, each containing K instances, then this few-shot learning setting is called the N-way K-shot task, and the label of each $t_{tq}$ instance is one of these N relation labels. The test set is likewise divided into a support set ($t_{cs}$) and a query set ($t_{cq}$) so that the test environment is consistent with the training environment. The prediction of query-instance relations in BMAN, for both the forward and reverse processes, can be expressed as

$$\hat{l} = \arg\max_{i \in \{1,\dots,N\}} f(\{s_k^i\}_{k=1}^{K}, q), \qquad (1)$$

where $S = \{s_k^i;\ i = 1,\dots,N;\ k = 1,\dots,K\}$ represents the forward support set, $s_k^i$ is the k-th instance of class i, l is the label of a support instance, $\hat{l}$ is the predicted label for the query instance, and the function $f(\{s_k\}_{k=1}^{K}, q)$ calculates the degree of match between the query instance q and the support instances $\{s_k\}_{k=1}^{K}$. In the training stage, we use the cross-entropy loss function, which reflects the distance between the predicted label and the true label:

$$J_{forward} = -\sum_{j=1}^{R} \log P(l_j \mid S, q_j), \qquad (3)$$

where $J_{forward}$ is the forward loss function, $Q = \{(q_j, l_j);\ j = 1,\dots,R\}$ represents the forward query set, R is the sample size of the forward query set, and $l_j \in \{1,\dots,N\}$ is the relation label of instance $q_j$. Analogously, $J_{reverse}$ is the reverse loss function:

$$J_{reverse} = -\sum_{j=1}^{R'} \log P(l'_j \mid S', q'_j), \qquad (4)$$

where $Q' = \{(q'_j, l'_j);\ j = 1,\dots,R'\}$ represents the reverse query set and $l'_j \in \{1,\dots,N\}$ is the relation label of instance $q'_j$. Combining Eqs. (3) and (4) forms the final objective

$$J = J_{forward} + \alpha \sum_{x'=1}^{x} J_{reverse}^{(x')},$$

where $\alpha$ is the weighting parameter and x is the number of query-set groups in the reverse mechanism.
In summary, the BMAN model based on few-shot learning generates N-way K-shot tasks by randomly selecting instances in the dataset, and utilizes the N-way K-shot tasks to learn features of relation classes. Next, query instances are matched with the calculated relation class features, and the relations between entities in the query instances are predicted by the matching degree.
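The episode construction just described — randomly selecting N relations and, for each, K support instances plus a few query instances — can be sketched as follows. The data layout (a dict mapping relation names to instance lists) and the helper name `sample_episode` are illustrative, not the authors' released code.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query):
    """Sample one N-way K-shot episode from a {relation: [instances]} dict.

    Returns (support, query), each a list of (instance, label) pairs with
    labels 0..n_way-1 local to this episode.
    """
    relations = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for label, rel in enumerate(relations):
        # Draw K + n_query distinct instances, split into support and query.
        picked = random.sample(dataset[rel], k_shot + n_query)
        support += [(inst, label) for inst in picked[:k_shot]]
        query += [(inst, label) for inst in picked[k_shot:]]
    return support, query

# Toy corpus: 3 relations with 4 instances each, sampled as a 3-way 2-shot task.
corpus = {f"R{i}": [f"sent_{i}_{j}" for j in range(4)] for i in range(3)}
support, query = sample_episode(corpus, n_way=3, k_shot=2, n_query=1)
print(len(support), len(query))  # 6 3
```

Within each episode, the query instances are then matched against the relation-class features learned from the support instances.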

Bidirectional matching and aggregation network framework
This article proposes the bidirectional matching and aggregation network based on MLMAN. The BMAN model has four main components, and the framework is shown in Fig. 1.
1. Instance encoder. Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) extracts features of the instances from different perspectives and embeds the instance features into a vector representation.
2. Multi-level matching and aggregation. The results of the sentence encoder are fed into MLMAN to obtain a matching representation of the support instance and the query instance, then the match between the support instance and the query instance is calculated.
3. Relational prototype. The matching results obtained in the previous step are used as weights for the aggregated support instances to obtain a vector representation of the relation class.
4. Bidirectional matching. The results of the forward matching are used as the support set for the reverse matching, while the results of the reverse matching guide the forward matching, which makes the obtained relational prototypes more accurate.

Instances encoder
The instance encoder module encodes each instance t into the same embedding space: $w = \varphi(t)$, where w is the embedding of the instance, which will be used for matching representation learning. Existing instance encoders include convolutional neural networks (CNN) (Zeng et al., 2014), recurrent neural networks (RNN) (Hochreiter & Schmidhuber, 1997), and Transformer architectures (Vaswani et al., 2017). In our method, we use an instance encoder based on BERT (Devlin et al., 2018; Baldini Soares et al., 2019), which wraps each entity in the instance with the special tokens ⟨ENTITY⟩ and ⟨/ENTITY⟩ and concatenates the representation of the first token ⟨ENTITY⟩ of each entity as the instance embedding.
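As a rough illustration of the entity-marking step, the sketch below wraps both entity mentions with marker tokens and records the positions of the opening markers, whose encoder hidden states would then be concatenated as the instance embedding. The marker spellings and the helper `mark_entities` are assumptions for illustration, not the paper's exact implementation.

```python
def mark_entities(tokens, head_span, tail_span,
                  open_tok="<ENTITY>", close_tok="</ENTITY>"):
    """Wrap the head and tail entity spans with marker tokens.

    Returns the marked token list and the indices of the two opening
    markers; a BERT encoder's hidden states at those indices would be
    concatenated as the instance representation.
    """
    spans = sorted([head_span, tail_span])  # process left-to-right
    out, open_pos = [], []
    i = 0
    for start, end in spans:
        out += tokens[i:start]
        open_pos.append(len(out))           # position of the opening marker
        out += [open_tok] + tokens[start:end] + [close_tok]
        i = end
    out += tokens[i:]
    return out, open_pos

tokens = "Paris is the capital of France".split()
marked, pos = mark_entities(tokens, head_span=(0, 1), tail_span=(5, 6))
print(marked)
print(pos)  # [0, 7]
```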

Matching and aggregation
After instance encoding has produced the embedding of the query instance q and the embedding of the support instance s, these representations are input to the matching-aggregation module to compute the matching representation of the query instance with the support instances. The matching information between them is first calculated by soft alignment:

$$e_{mn} = q_m^{\top} s_n, \qquad q''_m = \sum_{n=1}^{T_s} \frac{\exp(e_{mn})}{\sum_{n'=1}^{T_s} \exp(e_{mn'})}\, s_n, \qquad s''_n = \sum_{m=1}^{T_q} \frac{\exp(e_{mn})}{\sum_{m'=1}^{T_q} \exp(e_{m'n})}\, q_m,$$

where $m \in \{1,\dots,T_q\}$, $T_q$ is the length of the query instance, $n \in \{1,\dots,T_s\}$, and $T_s$ is the length of the support instance. Then, the matching information is fused with the original representation:

$$\bar{Q} = \mathrm{ReLU}([Q;\, Q'';\, |Q - Q''|;\, Q \odot Q'']\, W_1), \qquad (8)$$
$$\bar{S} = \mathrm{ReLU}([S;\, S'';\, |S - S''|;\, S \odot S'']\, W_1), \qquad (9)$$

where Q and S are the matrices aggregating the original representations q and s, Q'' and S'' are the matrices aggregating the matching information q'' and s'', $\odot$ is the element-wise product, and $W_1$ is a weight matrix. Finally, the matches are aggregated into a single vector for each query and each support instance using max pooling together with average pooling:

$$\hat{s}_k = [\max(\bar{S}_k);\, \mathrm{avg}(\bar{S}_k)], \qquad \hat{q} = [\max(\bar{Q});\, \mathrm{avg}(\bar{Q})],$$

where $\hat{s}_k$ is the support-instance matching representation, $\hat{q}$ is the query-instance matching representation, and $\bar{S}_k$ is one of the K splits of $\bar{S}$, corresponding to the K support instances.
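A minimal numpy sketch of this matching-and-aggregation step, assuming dot-product token matching followed by the fusion and max-plus-average pooling described above. The shapes and the weight matrix `W1` are illustrative stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def match_and_aggregate(Q, S, W1):
    """Soft-align a query instance (T_q x d) with a support instance (T_s x d),
    fuse the original and aligned views, and pool to fixed-size vectors."""
    att = Q @ S.T                       # (T_q, T_s) token matching scores
    Q2 = softmax(att, axis=1) @ S       # query tokens re-expressed via support
    S2 = softmax(att.T, axis=1) @ Q     # support tokens re-expressed via query
    fuse = lambda A, B: np.maximum(     # ReLU([A; B; |A-B|; A*B] W1)
        np.concatenate([A, B, np.abs(A - B), A * B], axis=1) @ W1, 0.0)
    Qf, Sf = fuse(Q, Q2), fuse(S, S2)
    pool = lambda X: np.concatenate([X.max(0), X.mean(0)])  # max + average
    return pool(Qf), pool(Sf)

d = 4
Q = rng.standard_normal((5, d))    # query instance of 5 tokens
S = rng.standard_normal((7, d))    # support instance of 7 tokens
W1 = rng.standard_normal((4 * d, d))
q_hat, s_hat = match_and_aggregate(Q, S, W1)
print(q_hat.shape, s_hat.shape)    # (8,) (8,)
```

Each of the K support instances would be processed this way against the query, yielding K pairs of fixed-size matching representations.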

Relational prototype
The query-instance matching representation $\hat{q}$ and the support-instance matching representations $\hat{s}_k$ output by the matching and aggregation module are fed into a ReLU layer to calculate the matching degree $\lambda_k$ between $\hat{q}$ and $\hat{s}_k$:

$$\lambda_k = v^{\top}\, \mathrm{ReLU}(W_2\, [\hat{q};\, \hat{s}_k]),$$

where $v^{\top}$ and $W_2$ are the parameter matrices. Finally, the matching degree $\lambda_k$ is used as a weight to calculate the relational prototype $\hat{s}$ of the support instances:

$$\hat{s} = \sum_{k=1}^{K} \frac{\exp(\lambda_k)}{\sum_{k'=1}^{K} \exp(\lambda_{k'})}\, \hat{s}_k.$$
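The prototype computation can be sketched as a softmax-weighted sum of the support matching representations. The exact input to the ReLU scorer (here the concatenation of the query and support vectors) is an assumption for illustration; `v` and `W2` stand in for learned parameters.

```python
import numpy as np

def relational_prototype(q_hat, S_hat, v, W2):
    """Score each support match representation against the query with a
    ReLU scoring layer, then aggregate supports into one prototype by a
    softmax-weighted sum."""
    scores = np.array([v @ np.maximum(W2 @ np.concatenate([q_hat, s_k]), 0.0)
                       for s_k in S_hat])            # one score per support
    w = np.exp(scores - scores.max())
    w /= w.sum()                                     # softmax weights
    return w @ S_hat                                 # weighted prototype

rng = np.random.default_rng(1)
d, K = 6, 3
q_hat = rng.standard_normal(d)         # query matching representation
S_hat = rng.standard_normal((K, d))    # K support matching representations
v, W2 = rng.standard_normal(d), rng.standard_normal((d, 2 * d))
proto = relational_prototype(q_hat, S_hat, v, W2)
print(proto.shape)  # (6,)
```

Because the weights form a convex combination, the prototype stays inside the per-dimension range spanned by the support representations.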

Bidirectional mechanism
The flow of the bidirectional mechanism is shown in Fig. 1. After the relational prototype $\hat{s}$ and the query representation $\hat{q}$ are calculated, the function $f(\{s_k\}_{k=1}^{K}, q)$ in Eq. (1) is defined as the degree of match between $\hat{q}$ and $\hat{s}$, which is called the forward mechanism in this article.
After the labels of the forward match are identified, the results of the forward match are mapped to the support set S' to participate in the reverse mechanism. To ensure consistency in the matching calculation, the reverse mechanism uses the same method as forward matching: the function $f(\{s'_k\}_{k=1}^{K}, q')$ in reverse matching is defined analogously, where $\hat{s}'$ is the relational prototype and $\hat{q}'$ is the query representation in the reverse mechanism. There are two advantages to using the bidirectional mechanism: (1) information from symmetric data can be obtained, improving the accuracy of relation extraction; (2) the positively and negatively labeled samples generated by the forward mechanism can be used to obtain a more accurate relational prototype.
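Under the stated training objective — the forward cross-entropy loss plus a weighted sum of reverse losses over the x reverse query-set groups — the combination can be sketched as below. The per-group averaging and the value of the weighting parameter are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable cross-entropy of one logit vector vs. a label."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def bidirectional_loss(fwd_logits, fwd_labels,
                       rev_logit_groups, rev_label_groups, alpha=0.5):
    """J = J_forward + alpha * sum over reverse groups of J_reverse."""
    j_fwd = np.mean([cross_entropy(l, y)
                     for l, y in zip(fwd_logits, fwd_labels)])
    j_rev = sum(np.mean([cross_entropy(l, y) for l, y in zip(ls, ys)])
                for ls, ys in zip(rev_logit_groups, rev_label_groups))
    return j_fwd + alpha * j_rev

rng = np.random.default_rng(2)
fwd = rng.standard_normal((4, 5))                    # 4 forward queries, 5-way
fy = [0, 1, 2, 3]
rev = [rng.standard_normal((2, 5)) for _ in range(3)]  # x = 3 reverse groups
ry = [[0, 1]] * 3
loss = bidirectional_loss(fwd, fy, rev, ry)
print(loss > 0)  # True
```

With no reverse groups the objective degenerates to the plain forward loss, which matches the single-direction baseline.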

Data enhancement method
The conversion between forward and reverse data in the framework relies on the data enhancement method, which generates the reverse support set S' from the forward query set Q and also generates the reverse query set Q'. The design of this method progressed from simple to complex. In the simplest variant, called BMAN-simple (Fig. 2), the reverse support set S' and the reverse query set Q' are divided directly from the single-direction queue of instance sets. However, this approach does not exploit the results of forward matching and severs the connection between the forward and reverse passes of the matching and aggregation network, which is not the result we expected. To address this shortcoming, we use the forward matching results to recombine the instances in the query set Q into multiple reverse support sets S' (Fig. 2), and randomly select instances of the same classes as the support set S to form the reverse query set Q'; this variant is called BMAN-link. We also found that the labels of m consecutive instances in the k-shot task were identical, so the logic of BMAN-link does not carry over to the k-shot task. This article therefore improves on BMAN-link to form the final BMAN model, which treats the 1-shot task as a special case of the k-shot task (Fig. 2). To ensure that the reverse support set S' contains instances of all n relation classes, we devise solutions for two possible scenarios:
1. When the distribution of forward matching results is unbalanced, instances within instance-poor classes are reused.
2. When instances of a certain class are completely missing, instances of that class from the forward support set S are introduced.
This method maintains the range of relation classes while expanding the number of instances, which allows the model to obtain more information about the relational prototypes while avoiding overfitting. The specific implementation is shown in Algorithm 1.
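A simplified sketch of the reverse-support-set construction, applying the two fixes above: reuse within instance-poor classes, and borrowing from the forward support set when a class is missing entirely. This illustrates the idea only and is not the paper's Algorithm 1 verbatim; the function and variable names are hypothetical.

```python
import random

def build_reverse_support(pred_by_class, fwd_support_by_class, n_way, k_shot):
    """Build one reverse support set from forward-matching predictions.

    pred_by_class: {class: [instances the forward pass assigned to it]}
    fwd_support_by_class: the forward support set, used as a fallback.
    """
    reverse = {}
    for rel in range(n_way):
        pool = list(pred_by_class.get(rel, []))
        if not pool:                         # class missing: borrow forward
            pool = list(fwd_support_by_class[rel])
        while len(pool) < k_shot:            # class poor: reuse instances
            pool = pool + pool
        reverse[rel] = random.sample(pool, k_shot)
    return reverse

random.seed(0)
preds = {0: ["a1", "a2", "a3"], 1: ["b1"]}   # class 2 missing, class 1 poor
fwd = {0: ["x"], 1: ["y"], 2: ["c1", "c2"]}
S_rev = build_reverse_support(preds, fwd, n_way=3, k_shot=2)
print({r: len(v) for r, v in S_rev.items()})  # {0: 2, 1: 2, 2: 2}
```

Every reverse support set thus covers all n relation classes with exactly k instances each, keeping the reverse task shape identical to the forward one.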

EXPERIMENTS
In this section, the BMAN model is compared with existing baseline methods to show the advantages of this approach.

Dataset
This article evaluates the BMAN model on two publicly available datasets, details of which are shown in Table 2. The FewRel dataset is the first proposed dataset for few-shot relation extraction; it is based on Wikipedia text, constructed using distant supervision and manual annotation, with a balanced data distribution. FewRel contains 100 relations with 700 instances each, where the train set contains 64 relation classes, the validation set 16 relation classes and the test set 20 relation classes. The FewRel 2.0 dataset adds cross-domain data to FewRel: it introduces biomedical texts from the PubMed and UMLS datasets to generate the test set, which includes 25 relation classes with 100 instances each, and selects 17 relations from the SemEval dataset to construct the validation set. FewRel 2.0 and FewRel share the same train set.

Hyperparameter
The effect of hyperparameters on the model was investigated experimentally, with the model trained on a device with an NVIDIA RTX 3090 24 GB GPU. Due to GPU memory limitations, the experiments set the query-set sample size to R = 5·n, where n is the number of relation classes, and the learning rate was selected from {1e−4, 1e−3, 1e−2, 2e−2, 1e−1, 2e−1, 3e−1} according to model accuracy, as shown in Fig. 3. As the figure shows, on both the FewRel and FewRel 2.0 datasets the BMAN model obtained the highest accuracy with a learning rate of 0.2.
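The learning-rate selection amounts to a simple grid search over the candidate values. The sketch below illustrates the procedure with a toy accuracy function that peaks at 0.2, mirroring the reported outcome; `train_fn` is a hypothetical stand-in for one full training-and-validation run.

```python
def grid_search_lr(train_fn, lrs=(1e-4, 1e-3, 1e-2, 2e-2, 1e-1, 2e-1, 3e-1)):
    """Run train_fn once per candidate learning rate and return the rate
    with the highest validation accuracy, plus all results."""
    results = {lr: train_fn(lr) for lr in lrs}
    best = max(results, key=results.get)
    return best, results

# Toy accuracy curve peaking at lr = 0.2 (illustrative, not real training).
best, results = grid_search_lr(lambda lr: -abs(lr - 0.2))
print(best)  # 0.2
```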
Algorithm 1: Generating support sets and query sets for the reverse mechanism. M is a dataset holding instances of the same relation class. Q_l and T_l represent the query set and training set of label l, respectively. x is the number of groups in the reverse mechanism, initially 0. b_x is the number of positive samples in each group of the reverse support set S'_x.
Output: reverse support set S' and reverse query set Q', with parameters x and α.

Data enhancement method
We compare the accuracy of BMAN-simple, BMAN-link and BMAN on the FewRel and FewRel 2.0 datasets, as shown in Table 3. The overall accuracy of the BMAN model is the best, while that of BMAN-simple is poor: BMAN-simple severs the connection between the forward and reverse passes, so it cannot serve the purpose of extracting symmetric information from the data. BMAN-link achieves the best results in the 1-shot task but the worst accuracy in the k-shot task; we attribute this to the overall logic of the k-shot task being broken when 1-shot-style reverse matching is applied to it. The final BMAN model accounts for the difference between the 1-shot and k-shot tasks, which strengthens the connection between the reverse and forward mechanisms while keeping the forward and reverse matching logic consistent.

Experimental results and analysis
In this article, we analyzed the effect of the number of instances and the number of relations on model performance, at the dataset level and at the task level respectively. We set the number of instances of each relation in the train set to 700, 500, 300 and 100, generating four training sets: train_700, train_500, train_300 and train_100. The BMAN model was trained on these sets, with results shown in Fig. 4: the accuracy of the 5-way 1-shot task fluctuates within 1.15 on the FewRel dataset and within 1.24 on FewRel 2.0, and the accuracy of the 10-way 1-shot task fluctuates within 1.61 on FewRel and within 1.58 on FewRel 2.0. When the number of relations is varied instead, the accuracy of the five-way one-shot task fluctuates within 7.5 on FewRel and within 3.62 on FewRel 2.0; the 10-way one-shot task within 8.16 on FewRel and within 1.64 on FewRel 2.0; the five-way five-shot task within 2.69 on FewRel and within 5.27 on FewRel 2.0; and the 10-way five-shot task within 5.11 on FewRel and within 6.74 on FewRel 2.0. Together, these two experiments show that, at the dataset level, the number of instances has a weak effect on BMAN performance while the number of relations has a strong effect, indicating that BMAN can be used in scenarios with sufficient relations but few instances and can effectively address the lack of tail instances in long-tailed data. The experimental results also show that, at the task level, the performance of the BMAN model decreases with increasing 'way' and increases with increasing 'shot': performance drops as the number of relations in a task grows and rises as the number of instances selected per task grows.
To directly validate the performance of the model on long-tailed datasets, we constructed the long-tailed datasets FewRel_L and FewRel 2.0_L by randomly sampling the original datasets, keeping the relation classes of the train, validation and test sets constant while adjusting the number of instances, as shown in Fig. 6. FewRel_L and FewRel 2.0_L share the same training set, and the validation results of the BMAN model are shown in Table 4. On FewRel_L, the accuracy of BMAN for the five-way one-shot, 10-way one-shot, five-way five-shot and 10-way five-shot tasks dropped by 0.15, 0.94, 0.84 and 0.38 respectively; on FewRel 2.0_L, it dropped by 0.23, 1.02, 1.13 and 0.39 for the same tasks. These results show that BMAN maintains excellent performance on long-tailed datasets, indicating that the model is effective at handling the long-tailed distribution of data.
Since both BMAN and MLMAN use a matching and aggregation network, this article evaluates the time complexity of BMAN in different task scenarios using MLMAN as the baseline, as shown in Fig. 7.
The comparison in Fig. 7 shows that the accuracy improvement of the BMAN model comes at the cost of time complexity. On the FewRel and FewRel 2.0 datasets, the accuracy improvements in the five-way one-shot task were 14.63% and 13.89% respectively, with corresponding time-complexity increases of 490.95% and 478.04%. The accuracy improvements in the 10-way one-shot task were 22.20% and 29.60%, with time-complexity increases of 477.30% and 473.44%. The accuracy improvements in the five-way five-shot task were 5.95% and 22.81%, with time-complexity increases of 444.55% and 441.46%. The accuracy improvements in the 10-way five-shot task were 9.04% and 34.59%, with time-complexity increases of 427.02% and 449.57%. The increase in time complexity arises because one forward query set can be mapped to multiple reverse support sets during bidirectional training, adding computation for reverse matching. This article also selects several representative methods as baselines and evaluates the performance of the BMAN model for relation extraction in the vertical domain. The evaluation metrics follow the settings in FewRel and include accuracy on the five-way one-shot, five-way five-shot, 10-way one-shot and 10-way five-shot tasks. This article compares BMAN with the following baselines:
MLMAN (Ye & Ling, 2019): interactively encodes support instances with query instances; the matching process strengthens the connection between the query set and the relational prototype.
CTEG: uses entity-guided attention mechanisms and confusion-aware training to distinguish easily confused relations.
TPN(BERT): integrates the Transformer model into the prototype network and uses pre-trained BERT as the encoder.
ConceptFERE (Yang et al., 2021): enhances the influence of entity concepts on relations in relation extraction.
COL Final (Ding et al., 2021): learns the relational prototype from contextual information, focusing on the influence of semantics on the relational prototype, and uses spherical coordinates as the basis for prototype interpretation.
Table 5 reports the accuracy (%) of the different models on the FewRel dataset. Table 6 shows that BMAN achieves the best performance on these tasks among approaches based on the learning-to-match paradigm, but with a significant gap compared to GTPN. The reason is that the BMAN model focuses more on matching between relational-prototype features and query-instance features, whereas GTPN focuses more on discriminating between the features of query instances and the target relation classes, which matters more for cross-domain relation extraction. Specifically, the main advantage of GTPN is that all candidate relation classes are considered jointly rather than independently when extracting relations, which allows the model to use inter-class discriminative information more accurately to resolve the relation confusion caused by domain shift. In contrast, the BMAN model incurs a larger error when measuring the degree of similarity between query instances and cross-domain relational prototypes.

CONCLUSIONS
This article proposes BMAN, a model based on few-shot learning. The model obtains symmetric information from the data through the bidirectional mechanism, which is used to validate the relational prototype and adjust the model parameters. At the same time, the data enhancement method provides the data scale needed for the model to fully exploit the information in the query set. Compared with other baseline methods, BMAN achieves state-of-the-art performance in vertical domains while remaining highly competitive across domains. In addition, we evaluated the time complexity of BMAN and concluded that the increase in time complexity is acceptable given the extent of the performance improvement. As future work, we will address the deficiencies of the BMAN model in cross-domain scenarios by introducing information on the differences between relation classes to enhance the model's discrimination of out-of-domain relations. We also plan to reduce the model's high time complexity by speeding up its convergence.