Cue prompt adapting model for relation extraction

Prompt-tuning models output relation types as verbalised type tokens instead of predicting confidence scores for each relation type. However, existing prompt-tuning models cannot perceive the named entities of a relation instance because they are normally implemented on raw input, which is too weak to encode the contextual features and semantic dependencies of a relation instance. This study proposes a cue prompt adapting (CPA) model for relation extraction (RE) that encodes contextual features and semantic dependencies by implanting task-relevant cues into a sentence. Additionally, a new transformer architecture is proposed to adapt pre-trained language models (PLMs) to perceive the named entities in a relation instance. Finally, in the decoding process, a goal-oriented prompt template is designed to take advantage of the potential semantic features of a PLM. The proposed model is evaluated on three public corpora: ACE, ReTACRED, and SemEval. It achieves an impressive improvement, outperforming existing state-of-the-art models. The experiments indicate that the proposed model is effective for learning task-specific contextual features and semantic dependencies in a relation instance.


Introduction
Relation extraction (RE) identifies predefined semantic relationships between two named entities in a sentence and plays a vital role in many downstream natural language processing (NLP) tasks, such as knowledge base construction (Q. Liu et al., 2016), question answering (K. Xu et al., 2016), and machine translation (Bao et al., 2014). RE is usually implemented as a classification problem, which predicts a relation type for each entity pair in a sentence. Even though great success has been achieved, RE remains a challenging task that suffers from two problems. First, because some relation types are asymmetric, the order of the entities in each pair should be considered, which leads to a serious data imbalance problem. Second, every entity pair in a sentence must be classified to determine whether a relation holds between its members. Because these entity pairs share the same contextual features in a sentence, it is difficult to distinguish them. To reduce the influence of negative instances and capture the order information between entities, it is important to learn the contextual features and semantic dependencies relevant to the considered entities in a sentence.
In traditional type classification models, many techniques have been developed to learn the contextual features and semantic dependencies relevant to the considered entities, such as position embedding (Zeng et al., 2015), multi-channel architectures (Chen et al., 2020), neuralized feature engineering (Chen et al., 2021), and entity indicators (Qin et al., 2021; W. Zhou & Chen, 2021). In these models, PLMs are mainly used to support token embedding. Subsequently, entity-relevant features (e.g. entity positions or types) are encoded into a task-specific representation. In this case, PLM-based deep architectures are usually designed to compress every relation instance into an abstract representation, and classification depends only on a dense representation of the entire input. Because this representation is usually a single vector, it undoubtedly results in a serious semantic loss. Furthermore, the pre-training of PLMs is implemented as a masked token-prediction task (Devlin et al., 2018), so there is also a gap between the pre-training and fine-tuning objectives.
In the prompt-tuning schema, predicting relation types is transformed into a verbalised-type token-prediction task. In general, prompts are defined as templates with slots that take values from a verbalised-type token set. Predefined prompts are concatenated with an input (sentence) and fed into PLMs to predict the masked slots, similar to a cloze-style schema (Schick & Schütze, 2020). For example, an input in sentiment recognition is first concatenated with a prompt (e.g. "It was [MASK]"). The input is subsequently fed into PLMs to predict the masked tokens (e.g. "Glad" or "Sad"). This approach is effective in making use of the knowledge within PLMs because prompt tuning helps bridge the gap between PLMs and RE. Therefore, it has been successfully applied to tasks such as text classification and natural language inference (Schick & Schütze, 2020).
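As a minimal sketch, the cloze-style schema above can be illustrated in a few lines of Python. The template text and the verbaliser entries ("Glad"/"Sad") follow the example in this section; the function names and the sample sentence are ours:

```python
# A minimal sketch of cloze-style prompt construction. The template wording
# and the verbaliser entries follow the example above; everything else is
# illustrative.

def build_prompt(sentence: str, template: str = "It was [MASK]") -> str:
    """Concatenate an input sentence with a predefined prompt template."""
    return f"{sentence} {template}"

# A verbaliser maps the type tokens a PLM may predict at [MASK] back to labels.
VERBALISER = {"Glad": "positive", "Sad": "negative"}

print(build_prompt("The film exceeded every expectation."))
# The film exceeded every expectation. It was [MASK]
```

A PLM would then be asked to fill the `[MASK]` slot, and the verbaliser maps its prediction back to a class label.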
In related studies, several prompts have been designed for PLM tuning (Brown et al., 2020). Despite the great successes achieved by prompt-tuning models, their effectiveness heavily depends on the quality of the prompt templates. Current prompt-tuning models are often implemented directly on raw inputs concatenated with predefined prompt templates. In this case, it is difficult to encode contextual features and semantic dependencies because these models cannot perceive the named entities of a relation instance.
This study proposes a cue prompt adapting (CPA) model for RE consisting of three components. First, instead of implementing a classifier on the raw input, entity cues are implanted into a sentence to learn the semantic dependencies of a relation instance. Second, a new transformer architecture is proposed to tune PLMs for RE so that they adapt to the proposed entity cues. Third, a goal-oriented prompt template is designed to decode the potential semantic features of a PLM. This model enables PLMs to encode the semantic dependencies between type tokens and contextual words. The proposed model achieves state-of-the-art performance on three public evaluation datasets.
The remainder of this paper is organised as follows. Section 2 introduces the related work. The proposed CPA model is presented in Section 3. Section 4 presents experiments to evaluate the proposed CPA model. Finally, the conclusions are presented in Section 5.

Related work
The task of extracting relations is typically implemented as a classification problem. Shallow architectures were widely used in the early research stage, such as rule-based and feature-engineering-based architectures (Chen et al., 2015; S. Zhao & Grishman, 2005), because manually designed rules were required to extract the features of a relation instance. However, these models were expensive in human labour, and generalisation to a different domain was difficult. In contrast, deep architectures adopt multi-stacked network layers for feature transformations, such as convolutional neural networks (CNNs) (Nguyen & Grishman, 2015; Zeng et al., 2014) and recurrent neural networks (RNNs) (Geng et al., 2020; Wang et al., 2016; P. Zhou et al., 2016). These networks have the advantage of automatically extracting high-order abstract representations from raw input.
PLMs such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have been widely adopted in deep neural networks to embed tokens into distributed representations and learn better relation representations. PLM tuning in RE has achieved great success (Torfi et al., 2020). PLMs typically consist of billions of parameters that are automatically learned from external resources. These parameters encode rich knowledge of sentences that is valuable for downstream tasks (Brown et al., 2020). Therefore, PLMs are tuned with annotated examples during the training process to learn task-relevant representations. In this field, there are two paradigms for tuning PLMs: fine-tuning and prompt tuning.
In the fine-tuning paradigm, PLMs such as BERT, ALBERT (Lan et al., 2019), and RoBERTa (Y. Liu et al., 2019) are used to map every token into a distributed representation. PLMs are effective in addressing the feature sparsity problem because they are pre-trained from external resources using unsupervised methods (C. Li & Tian, 2020; Soares et al., 2019). K. Zhao et al. (2021) proposed a graph neural network for joint entity and relation extraction. R. Li et al. (2021) proposed an entity-correlated attention neural model to extract entities and their relations. W. Zhao, Zhao, et al. (2022) proposed a gated and attentive network to collaboratively extract entities and their relations. Hang et al. (2021) proposed an end-to-end neural network model for the joint extraction of entities and overlapping relations. Chen et al. (2021) combined a neural network with feature engineering and proposed a neuralized feature engineering method. Q. Zhao, Xu, et al. (2022) proposed a knowledge-guided distant supervision model for biomedical RE. Cohen et al. (2020) used a question-answering schema to verify the feasibility of RE. P. Li and Mao (2019) proposed a knowledge-oriented CNN for causal RE. Lyu and Chen (2021) proposed an entity type restriction, in which entity types were exploited to restrict candidate relations. K. Zhao, Yang, et al. (2022) proposed a consistent representation learning method for few-shot RE.
Prompt tuning has received considerable attention in recent years and has achieved great success (Hu et al., 2021; P. Liu et al., 2021). In this paradigm, RE is implemented as a masked language modelling task that addresses two issues: template design and verbaliser construction. In related work, Han et al. (2021) proposed the PTR model, which applied logic rules to construct prompts from several sub-prompts. This model was able to encode prior knowledge of each class into prompt tuning. Shin et al. (2020) proposed a gradient-guided method to create prompts automatically. Gao et al. (2020) presented a prompt model that used sequence-to-sequence models to generate prompt candidates. Cui et al. (2022) proposed a prototypical verbaliser to learn prototype vectors through contrastive learning. Xiang et al. (2020) proposed knowledge-aware prompt tuning, which jointly optimised the representation of a prompt template and the answer words with knowledge constraints.

Methodology
To provide a formalised discussion, the task of extracting entity relations is formalised as follows.
A relation instance is defined as a 3-tuple I = ⟨r, e_1, e_2⟩ that contains a relation mention r and two named entities e_1 and e_2. The relation mention r is a token sequence r = [t_1, t_2, ..., t_n]. Each entity e_k = [t_i, ..., t_j] (k ∈ {1, 2}) is a substring of r. Y = {y_0, y_1, ..., y_M} is a relation type set composed of M positive relation types and one negative relation type y_0. I = {I_1, I_2, ...} represents a relation instance set. RE is subsequently represented as a map between I and Y, expressed as:

y = f(I), (1)

where f is a function that can be a shallow model (e.g. a support vector machine or maximum entropy classifier) or a deep neural network (e.g. a CNN or RNN). In a traditional model, a deep architecture (denoted as N) is implemented on the original input r to extract its representation. The network N can be embedded with a PLM to support token embedding and encode external knowledge; the PLM-embedded network is denoted as N_M:

H = [H_1, H_2, ..., H_n] = N_M(r), (2)

where H_i is an abstract representation of token t_i. H is often mapped into a vector and then fed into a classifier (C) to make a prediction. This process is formalised as:

y = C(N_M(r)). (3)

In prompt tuning, class types are verbalised into a token set V that is composed of relation types or category labels (e.g. "true" or "false"). Elements of V are referred to as "type tokens". A prompt is defined as a template with slots that can be filled by verbalised type tokens, for example, P = "It is [MASK]". This template is concatenated with the raw input and fed into a deep network to predict the distribution of type tokens in the position of "[MASK]". This is expressed as:

H = [H_1, ..., H_L] = N_M(r + P),

where "+" denotes the character-string concatenation operation. In prompt tuning, a confidence score is assigned to each type token v ∈ V for every slot ([MASK]) in the prompt template, instead of outputting a class label based on the token representations [H_1, ..., H_L].
A CPA model was proposed in this study to perceive named entities and encode the semantic dependencies between them. The proposed model is composed of three components: cue encoding, PLM adapting, and prompt decoding. The architecture of the proposed model is shown in Figure 1.
In the cue encoding component, entity cues were implanted into a sentence to learn the semantic dependencies of a relation instance instead of implementing a classifier on raw input. In the PLM adapting component, two Adapter layers were added to tune PLMs for entity cueing (Houlsby et al., 2019). In the prompt decoding component, a goal-oriented prompt template was designed to decode the potential semantic features of a PLM for RE. Each component is discussed in detail below.

Cue encoding component
Directly implementing a deep network on r typically causes serious performance degradation because the network is unaware of the positions of the considered entities. To address this problem, entity cues were implanted into the input to control the attention of a deep network for learning task-specific representations. This is formalised as:

Cueing(e_k) = [c_k, e_k, /c_k], (4)

Cueing(r) = r|_{e_k/Cueing(e_k), k ∈ {1,2}}, (5)

where c_k and /c_k are specific tokens representing the start and end boundaries of the entity e_k (k ∈ {1, 2}), respectively. These are referred to as entity cues. Equation (4) concatenates the two tokens on both sides of e_k. In Equation (5), e_k/Cueing(e_k) denotes the string replacement operation, in which e_k is replaced by Cueing(e_k). Therefore, the function Cueing(r) implants entity cues on both sides of the considered entity pair. Using these settings, Equation (2) can be revised as:

H = N_M(Cueing(r)), (6)

where N_M denotes the PLM-embedded network. In Equation (6), the entity cues implanted into the input enable the deep network to focus on the considered entity pair. The classification is then based on a sentence representation relevant to the considered entities. This approach is effective for learning the contextual features and semantic dependencies of a relation instance.
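The cueing operation of Equations (4) and (5) can be sketched in Python. The cue spellings `<c1>`, `</c1>`, etc. are illustrative stand-ins for the boundary tokens c_k and /c_k, and simple string replacement is assumed:

```python
# Sketch of the cueing operation (Equations (4) and (5)), assuming simple
# string replacement; the cue spellings <c1>, </c1>, ... are illustrative.

def cueing_entity(entity: str, k: int) -> str:
    """Equation (4): wrap entity e_k with its start/end boundary cues."""
    return f"<c{k}> {entity} </c{k}>"

def cueing(r: str, e1: str, e2: str) -> str:
    """Equation (5): replace each considered entity in r by its cued form.
    (A real implementation would work on token offsets to handle nested
    or repeated entity mentions.)"""
    return r.replace(e1, cueing_entity(e1, 1)).replace(e2, cueing_entity(e2, 2))

print(cueing("John was born in Paris", "John", "Paris"))
# <c1> John </c1> was born in <c2> Paris </c2>
```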
In prompt tuning, researchers have mainly focussed on designing prompt templates to tune PLMs for downstream tasks (Brown et al., 2020). It was assumed that tuning PLMs to attend to task-specific information is also valuable in prompt tuning. Therefore, this study focussed on designing and implanting entity cues for tuning PLMs to support RE. Several cueing strategies were proposed in this study, as listed in Table 1.
In Table 1, square brackets indicate that the inner content is a token sequence, and Cueing_o indicates that the input relation mention remains unchanged. Cueing_e replaces entity e_k (k ∈ {1, 2}) with the token sequence "[c_k, e_k, /c_k]". This is the traditional strategy used in related studies (Chen et al., 2021; Qin et al., 2021). Note that all pairs of braces and parentheses are also used as tokens to indicate the positions of the named entities.
In Cueing_et(e_k), the "type_1" and "type_2" tokens denote the head and tail entity types, respectively. Different braces are used to distinguish the different entities.
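A minimal sketch of the typed cueing strategy follows. The head form "{(type_1) e_1}" matches the example given later in the paper; the bracket pair used for the tail entity is an assumption, since the exact tokens are defined in Table 1 (not reproduced here):

```python
def cueing_et(r: str, e1: str, t1: str, e2: str, t2: str) -> str:
    """Sketch of the typed cueing strategy: entity-type tokens act as
    contextual cues, and different bracket pairs distinguish the head and
    tail entities. The head form "{(type1) e1}" follows the paper's example;
    the tail bracket pair here is an assumption (see Table 1)."""
    r = r.replace(e1, f"{{({t1}) {e1}}}")  # head entity: braces + parentheses
    r = r.replace(e2, f"[<{t2}> {e2}]")    # tail entity: assumed bracket pair
    return r

print(cueing_et("John was born in Paris", "John", "person", "Paris", "city"))
# {(person) John} was born in [<city> Paris]
```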

PLM adapting component
The prompt is concatenated with the cued input and fed into a deep neural network to learn the token representations H. This is expressed as:

H = N_M(Cueing(r) + P(e_1, e_2)),

where N_M denotes the PLM-embedded network and P(e_1, e_2) is a prompt template used to generate output for RE. This is discussed in detail in Section 3.3.
The PLM N_M is modified to adapt to the proposed entity cues by adding two Adapter layers, as shown in Figure 1. Each Adapter consists of two linear layers and a nonlinear layer, and its output size is consistent with that of its input. This is formalised as:

A(x) = W′ · σ(W · x + b) + b′,

where x is the input hidden state, σ denotes the nonlinear layer, and W, W′, b, and b′ are trainable parameters. The dimension of A(x) is the same as that of x. N_M with the Adapters is represented as N_M^A. In the output layer, instead of outputting a class label based on the token representations [H_1, ..., H_L], N_M^A assigns a normalised confidence score to each type token v ∈ V for every slot [MASK]_i in the prompt template. This is expressed as:

score(v, [MASK]_i) = H_{M_i} · H_v,

where H_{M_i} ∈ H is the representation of slot [MASK]_i, and H_v is the token-type representation of v ∈ V in the employed PLM. Subsequently, RE is transformed into token prediction in the masked slots. Given a relation instance I, the distribution of the type token v in slot [MASK]_i is expressed as:

p([MASK]_i = v | I) = exp(score(v, [MASK]_i)) / Σ_{v′ ∈ V} exp(score(v′, [MASK]_i)).

Prompt tuning outputs relation types as verbalised type tokens, which has the advantage of making full use of the rich knowledge in PLMs.
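The Adapter forward pass and the mask-slot scoring described above can be sketched in plain Python. The ReLU activation and the softmax normalisation are assumptions consistent with the description ("two linear layers and a nonlinear layer", "normalised confidence score"); the tiny weight matrices below exist only to check dimensions:

```python
import math

def linear(x, W, b):
    """Dense layer: W has shape (out_dim, in_dim); x and b are vectors."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def adapter(x, W1, b1, W2, b2):
    """Two linear layers with a nonlinearity in between (ReLU assumed);
    the output dimension matches the input dimension, as described above."""
    h = [max(0.0, v) for v in linear(x, W1, b1)]  # nonlinear layer
    return linear(h, W2, b2)

def mask_distribution(h_mask, type_embeddings):
    """Normalised confidence of each type token v at a [MASK] slot:
    softmax over dot products between H_Mi and the type-token vectors H_v."""
    logits = {v: sum(a * b for a, b in zip(h_mask, hv))
              for v, hv in type_embeddings.items()}
    zmax = max(logits.values())  # subtract max for numerical stability
    exps = {v: math.exp(s - zmax) for v, s in logits.items()}
    total = sum(exps.values())
    return {v: e / total for v, e in exps.items()}

# Dimension check: a 3-dim input passes through a 2-dim bottleneck and back.
x = [0.5, -1.0, 2.0]
W1, b1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]
W2, b2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
assert len(adapter(x, W1, b1, W2, b2)) == len(x)
```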

Prompt decoding component
Each PLM contains a large number of parameters that are pre-trained from external resources using unsupervised methods, encoding rich knowledge for RE. A goal-oriented prompt template was designed in the prompt decoding component to decode the potential semantic features of a PLM. Specifically, a relation prompt was defined as a template with three slots, for example, "P(e_1, e_2) = the head entity type [MASK]_1 e_1 [MASK]_2 the tail entity type [MASK]_3 e_2", where each [MASK] takes values from the token set V. V is composed of entity types and associations between types. The corresponding labels for each mask are listed in Table 2.
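The template construction is straightforward to sketch; the wording is taken from the example above, and the function name is ours:

```python
def full_prompt(e1: str, e2: str) -> str:
    """Build the goal-oriented template P(e1, e2) with its three slots,
    using the wording given above."""
    return (f"the head entity type [MASK]1 {e1} "
            f"[MASK]2 the tail entity type [MASK]3 {e2}")

print(full_prompt("John", "Paris"))
# the head entity type [MASK]1 John [MASK]2 the tail entity type [MASK]3 Paris
```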
The masked tokens for the relation representations in the three datasets are listed in Table 2. The column "Relation Types" denotes the redefined relation types in the manually annotated datasets. "[MASK]_1", "[MASK]_2", and "[MASK]_3" are template slots that should be predicted by a prompt-tuning relation classifier. For example, in the ACE corpus, a "PER-SOC" entity relation is identified if the outputs for the three slots are the tokens "person", "was related to", and "social".
Examples demonstrating the utilisation of the proposed entity cues and prompt templates are presented in Figure 2. "Full prompt" is a prompt template in which the three slots are embedded in context. Slots [MASK]_1 and [MASK]_3 take values from "person", "country", etc., which denote the types of the named entities. [MASK]_2 takes values from "was born in", "was located in", etc., which indicate the relation between the named entities. In "Naive prompt", the three [MASK]s are used directly without any contextual words. This was mainly used for comparison purposes.
The cueing strategies listed in Figure 2 were concatenated with "Full prompt" and "Naive prompt", where ⊕ denotes the concatenation operation. For example, given a relation instance ⟨r, e_1, e_2⟩, "Cueing_et(e_k)+Full" indicates that e_1 and e_2 are first replaced in r by their typed-cue strings (e.g. "{(type_1) e_1}" for the head entity, and the corresponding bracketed string built with "type_2" for the tail entity). The revised relation mention (r|_{e_k/Cueing_et(e_k), k ∈ {1,2}}) is then concatenated with the full prompt. The output is fed into a PLM to predict the type tokens in each [MASK]. If the PLM outputs "person", "is parent of", and "person", then a "person:parent" relation is identified between e_1 and e_2.
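The decoding step, mapping the three predicted slot tokens to a relation type, can be sketched as a table lookup. The two entries below mirror examples given in the text; Table 2 defines the full mapping, and the negative-class label is an assumption:

```python
# Hypothetical decoding table mapping a triple of predicted slot tokens to a
# relation type; the two entries mirror examples given in the text (Table 2
# defines the full mapping).
DECODE = {
    ("person", "is parent of", "person"): "person:parent",
    ("person", "was related to", "social"): "PER-SOC",
}

def decode(slot_tokens, table=DECODE, negative="no_relation"):
    """Return the relation type identified by the three [MASK] predictions,
    falling back to the negative class for unmatched triples."""
    return table.get(tuple(slot_tokens), negative)

print(decode(["person", "is parent of", "person"]))  # person:parent
```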

Experiments
In this section, the proposed CPA model is verified using three popular evaluation datasets.
The model is then compared with several state-of-the-art models. Experiments were also conducted to demonstrate the advantages of the proposed CPA model using few-shot learning and a case study.

Datasets and experimental settings
The experiments were conducted using three evaluation datasets: ACE 2005 English 1 , SemEval 2010 Task 8 (Hendrickx et al., 2019), and ReTACRED (Stoica et al., 2021). The ACE 2005 English corpus is a classic and widely used dataset annotated from newswire, broadcast news, broadcast conversation, weblogs, discussion forums, and conversational telephone speech. The SemEval 2010 corpus was published for the 8th task of the Semantic Evaluation Conference in 2010. ReTACRED is a large-scale supervised RE dataset obtained through crowdsourcing and targeted toward TAC KBP relations. The statistics for these datasets are presented in Table 3. RoBERTa_LARGE (Y. Liu et al., 2019) was adopted as the PLM in these experiments. The maximum length of each input was set to 150. The Adam optimiser (Kingma & Ba, 2014) was used as the model optimiser. The dropout rate was set to 0.1 to avoid overfitting. The number of epochs, learning rate, and batch size were set to 30, 1e-5, and 64, respectively. For comparison with related work, the experimental settings of Qin et al. (2021) were used for the ACE dataset, and those of Han et al. (2021) were used for the other two datasets.
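For reference, the reported settings can be collected into a single configuration sketch. The key names are illustrative (the paper does not publish a configuration file); only the values come from the text above:

```python
# The experimental settings above, collected into one config dict.
# Key names are illustrative; only the values come from the paper.
CONFIG = {
    "plm": "roberta-large",     # RoBERTa_LARGE
    "max_input_length": 150,
    "optimiser": "Adam",
    "dropout": 0.1,
    "epochs": 30,
    "learning_rate": 1e-5,
    "batch_size": 64,
}
```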

Comparison with related works
The Cueing_et(e_k) strategy listed in Table 1 was adopted in this experiment. Here, entities e_1 and e_2 in a relation mention were replaced by their typed-cue strings (e.g. "{(type_1) e_1}" for the head entity, and the corresponding bracketed string built with "type_2" for the tail entity). The revised relation mention was concatenated with the full prompt, as illustrated in Figure 2. Every concatenated string was fed into a PLM with Adapters to predict the type tokens in the masked slots. The results were compared with those of several related studies on three public evaluation datasets: ACE 2005, ReTACRED, and SemEval 2010. ME (Kambhatla, 2004) and SVM (G. Zhou et al., 2005) represent maximum entropy and support vector machine classifiers for the ACE 2005 dataset, respectively. FCM denotes a feature-rich compositional embedding model for RE (Gormley et al., 2015), and Mix-CNN (Zheng et al., 2016) is a CNN-based method for RE. SSM is a set space model that calculates the features in a sentence to alleviate the sparse-feature problem (Yanping et al., 2017). Dual-PN denotes an integrated dual pointer network with a multi-head attention mechanism (Park & Kim, 2020). BERT-CNN used the rich semantics of PLMs in a CNN (Qin et al., 2021). All of these performances are presented in Table 4.
The results in Table 4 demonstrate that the shallow models (ME and SVM) achieved lower performance because they used categorical features, which cannot encode the semantic features of words. Among the deep neural networks, the Mix-CNN and FCM models also exhibited low performance without the utilisation of PLMs. The performance of Dual-PN, SSM, and BERT-CNN improved considerably owing to the word embeddings from PLMs, which are effective in learning semantic features from external resources. However, these models could not make full use of the knowledge in PLMs because there was a gap between the pre-training and fine-tuning objectives. The proposed model achieved state-of-the-art performance compared with these fine-tuning models.
For the ReTACRED dataset, PA-LSTM combined a long short-term memory (LSTM) sequence model with entity-position-aware attention (Zhang et al., 2017). C-GCN denotes a contextualised graph convolutional network used to learn effective representations (Zhang et al., 2018). Joshi et al. (2020) designed the SpanBERT model to represent and predict spans of text, which achieved good performance. REBEL is an autoregressive approach that frames relation extraction as a seq2seq task (Cabot & Navigli, 2021). These performances are presented in Table 5, where "*" indicates that the performance was generated by running the published source code because its precision and recall were not reported.
The results in Table 5 demonstrate that PA-LSTM and C-GCN achieved low performance because they are fine-tuning models that do not make full use of the rich knowledge in PLMs. The proposed approach achieved the best performance on the ReTACRED dataset compared with other prompt-tuning models (e.g. the PTR model). This is because the proposed model contains cue encoding and PLM adapting components that effectively learn the contextual features and semantic dependency information of a relation instance.
For the SemEval 2010 dataset, MV-RNN (Socher et al., 2012) and SDP-LSTM (Y. Xu et al., 2015) utilised RNNs to learn the dependency information of entity relations. CR-CNN (Santos et al., 2015) and Multi-Channel (Chen et al., 2020) are CNN-based models. TACNN integrated an attention mechanism into a CNN to increase the effect of the relationship matrix weights of the two entities (Geng et al., 2022). TRE (Alt et al., 2019), R-BERT (Wu & He, 2019), and IA-BERT (Tao et al., 2019) are PLM-based fine-tuning methods, whereas PTR (Han et al., 2021) and KnowPrompt (Xiang et al., 2020) use prompt tuning to make PLMs more suitable for RE tasks. All of these performances are presented in Table 6.
The results in Table 6 demonstrate the same trends as those in Tables 4 and 5. On the SemEval dataset, the fine-tuning models achieved higher performance than traditional neural networks without PLMs. The prompt-tuning models exhibited even better performance because they are more adept at using the potential knowledge of PLMs. However, prompt-tuning models are not always better than fine-tuning models, because fine-tuning models can also address the gap with PLMs by integrating external knowledge (e.g. syntactic indicators). The proposed method learned more task-related dependency features from the raw data by integrating the three components, and it also achieved state-of-the-art performance.

Ablation study
An ablation study was conducted to further quantify the contribution of each component of the proposed model. The ACE dataset was used in this experiment. Here, "w/" indicates that the corresponding component was used, and "w/o" indicates that the component was omitted from the CPA model to determine its influence on the final performance. Omitting the prompt component refers to the use of a naive prompt, as discussed in Section 3.3. "replace" indicates that a component was replaced with another component. The results are presented in Table 7.
The results in Table 7 demonstrate that each of the three components influenced the final performance. The model could not effectively learn the representation of the mask when the goal-oriented prompt was removed; therefore, the performance degraded. The model was weak in perceiving named entities when the cueing strategy was removed, leading to serious performance degeneration. Therefore, the cueing component and goal-oriented prompt are highly beneficial because they are effective in learning the semantic dependencies of a relation instance. The results also show that the three components mutually promote each other: the performance decreased considerably when all of them were omitted.
"Entity hints" (Giorgi et al., 2022) and "entity markers" (W. Zhou & Chen, 2021) have also been proposed for relation extraction in related works, where they were applied in fine-tuning models. Compared with them, our entity cues have a different structure. The results in Table 7 indicate that entity cues achieved better performance, suggesting that they are more effective for learning sentence representations relevant to the considered named entities in a prompt-tuning method.
The results in Table 7 also indicate that the CPA model based on BART (Lewis et al., 2019) suffers a significant performance degradation 2 . The reason is that BART was mainly proposed as a decoder to support generative tasks, whereas the CPA model needs to predict verbalised type tokens in the "[MASK]" slots, so its training objective matches the masked token-prediction objective used to pre-train RoBERTa. Therefore, RoBERTa achieved better performance in the CPA model.

Performance of different cueing strategies
Cueing strategies combined with naive and full prompts were compared to demonstrate their effectiveness and their influence on performance. The ReTACRED and ACE datasets were used in this experiment. The results are presented in Table 8.
(1) In Cueing_o(e_k)+Naive, every original input was directly concatenated with a naive prompt. This mainly served as a prompt-tuning baseline for comparison. Cueing_o(e_k)+Full was the strategy used by Han et al. (2021). The full prompt contains contextual words, which considerably improved the performance. (2) In Cueing_e(e_k)+Naive, entity cues were implanted into the input to indicate the positions of the entities e_i (i ∈ {1, 2}). Compared with the related work in Tables 4 and 5, this approach achieved competitive performance. This result indicates that entity cues are very powerful in prompt-tuning-based models. Cueing_e(e_k)+Full also outperformed the naive version.
(3) In the Cueing_et(e_k) strategy, different entity cues were used to distinguish between the entities. Here, contextual words (e.g. entity types) were used as entity cues instead of specific tags (e.g. c_k or /c_k). This strategy was effective in encoding the contextual features and semantic dependencies of a relation instance because both the entity cues and the prompts contain contextual words. This strategy achieved state-of-the-art performance.
In all experiments, RE achieved more robust performance when entity cues and prompts were used simultaneously. The proposed cueing strategy packages the entity order and location together, whereas traditional entity cue methods only mark the location of an entity. This setting achieved the best performance in the experiments.

Performance on few-shot
PLMs encode the rich knowledge of a sentence, which is valuable for supporting few-shot learning. In this experiment, the feasibility of the proposed method was evaluated for few-shot learning. The same seed (seed = 42) was used to randomly sample K shots (K ∈ {8, 16, 32}) from each relation class of the training set. These were used to tune the PLM, which was subsequently evaluated on the entire test set. For comparison, the typical R-BERT and PTR models were used for the fine-tuning and prompt-tuning paradigms, respectively. The results are shown in Table 9.
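The seeded K-shot sampling described above can be sketched as follows; the data format (a list of `(text, label)` pairs) and the function name are our assumptions:

```python
import random
from collections import defaultdict

def sample_k_shot(instances, k, seed=42):
    """Draw K training instances per relation class with a fixed seed,
    mirroring the few-shot setup described above (seed = 42).
    `instances` is a list of (text, label) pairs."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in instances:
        by_class[label].append((text, label))
    shots = []
    for label in sorted(by_class):  # fixed class order keeps runs repeatable
        items = by_class[label]
        shots.extend(rng.sample(items, min(k, len(items))))
    return shots

data = ([(f"sent-{i}", "rel-A") for i in range(10)]
        + [(f"sent-{i}", "rel-B") for i in range(10, 20)])
assert len(sample_k_shot(data, 8)) == 16                 # 8 shots per class
assert sample_k_shot(data, 8) == sample_k_shot(data, 8)  # fixed seed
```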
The prompt-tuning method was more suitable for few-shot learning than the fine-tuning method. The proposed model also outperformed both the traditional fine-tuning (R-BERT) and prompt-tuning (PTR) methods. These results demonstrate that tuning a large model with very few samples is not reliable in the fine-tuning paradigm because it leads to weak performance. In contrast, task-specific information can be learned from PLMs using very few samples in prompt-tuning learning. The performance improved steadily in all experiments as the number of shots increased. Few-shot learning exhibited impressive performance on the SemEval 2010 dataset and achieved competitive performance when K = 32. However, its performance degraded considerably on the ReTACRED dataset compared with the results in Table 5, because the RE task in ReTACRED is more challenging owing to unbalanced data, and its performance depends significantly on the amount of training data. Nevertheless, compared with the R-BERT model, the prompt-tuning methods still achieved robust performance. This result indicates that the prompt-tuning approach was insensitive to the data imbalance problem in few-shot learning.
The influence of the number of training epochs on few-shot learning is shown in Figure 3. The proposed model demonstrated faster convergence and higher performance, and it exhibited stable performance during training. This conclusion is consistent with the results in Table 9.
The effectiveness of prompt tuning in few-shot learning has great potential for real applications. It has the advantage of considerably reducing the requirement for manually annotated datasets, which are expensive in human labour and time. Furthermore, with the support of prompt tuning, migration between different domains is much easier. Another advantage of prompt tuning is the potential to make full use of PLMs for natural language processing. PLMs contain a huge number of parameters pre-trained from external resources with unsupervised methods, which is incredibly thirsty for computational resources.

Case study on ACE
The task of extracting relations from the ACE dataset was more difficult than from the ReTACRED and SemEval 2010 evaluation datasets and is characterised by two challenges. First, the two named entities of a relation instance can overlap in a sentence in the ACE corpus. For example, the phrase "a waiting shed at the Davao City International Airport" was annotated as an FAC (facility) entity. It is also nested with an inner FAC entity, "the Davao City International Airport". Additionally, a PART-WHOLE relation was defined between them.
The second challenge is that all entity pairs in a sentence must be verified to predict the possible relations between them. However, distinguishing them is difficult because all entity pairs in the same sentence share the same contextual features. Two experiments were conducted to demonstrate the influence of these two challenges on performance and the effectiveness of the proposed model. I represents the entire set of relation instances. Two strategies were used to divide the relation instances. In the first strategy, the data were divided into two parts, Î_1 and Ï_1 (Î_1 ∪ Ï_1 = I and Î_1 ∩ Ï_1 = ∅), where the set Î_1 contained only relation instances composed of nested entity mentions. In the second strategy, the data were divided into two parts, Î_2 and Ï_2 (Î_2 ∪ Ï_2 = I and Î_2 ∩ Ï_2 = ∅), where the set Î_2 contained all relation instances that shared the same relation mention at least twice. Subsequently, every part was evaluated with an independent CPA model to compare the performances. The results are presented in Table 10.
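The first splitting strategy (nested vs. non-nested entity pairs) can be sketched with a span-overlap test; the instance format with `e1`/`e2` span fields is an assumption for illustration:

```python
def spans_overlap(s1, s2):
    """True when two (start, end) character spans overlap, i.e. one named
    entity is nested in (or crosses) the other."""
    return s1[0] < s2[1] and s2[0] < s1[1]

def split_by_nesting(instances):
    """Sketch of the first splitting strategy: separate instances whose two
    entity spans overlap (nested mentions) from the remainder. Each instance
    is assumed to carry 'e1' and 'e2' (start, end) span fields."""
    nested, flat = [], []
    for inst in instances:
        (nested if spans_overlap(inst["e1"], inst["e2"]) else flat).append(inst)
    return nested, flat

# One instance with an entity nested inside the other, one with disjoint spans.
examples = [
    {"e1": (0, 45), "e2": (17, 45)},  # overlapping spans -> nested part
    {"e1": (0, 4), "e2": (10, 15)},   # disjoint spans -> remaining part
]
nested, flat = split_by_nesting(examples)
assert len(nested) == 1 and len(flat) == 1
```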
In the first case, a relation mention was composed only of a named entity, where contextual features are rare for RE. Therefore, one would expect the performance of Î1 to be lower. However, unexpectedly, the results show that Î1 performed better than Ï1. The reason may be that relations within a single nested named entity occurred only with specific relation types, which were easily recognisable. Furthermore, named entities with overlapping structures were typically noun phrases, which often have regular structures that are helpful for RE. In the second case, Î2 achieved lower performance because many relation instances in Î2 shared the same relation mentions. In this case, different entity pairs in a relation instance shared the same contextual features. However, the relations between these entity pairs were entirely different, which confused the model and led to performance degradation.
The proposed CPA model exhibited competitive and stable performance in all cases compared with the traditional prompt-tuning model (PTR). The performance was clearly improved, particularly on the Ï1 and Ï2 sets. The results indicate that the proposed model was effective in learning semantic features and encoding semantic dependencies in a relation instance.

Limitations of CPA model
The CPA model is a prompt-learning approach based on PLMs and has three major limitations. First, manually designed prompt templates are required for prediction; therefore, migration between different corpora is difficult. Second, the quality of the templates influences the final performance; however, at present, there is no standard method for generating these templates. Third, the effectiveness of prompt learning depends heavily on PLMs with ever-growing numbers of parameters, now in the billions. The resulting computational complexity limits the practicability of this approach.

Conclusion and future work
This study proposed a CPA model for RE, in which relation prediction was implemented as a verbalised type token prediction task. In this model, a goal-oriented prompt template and a novel cueing strategy were designed to perceive named entities and decode the potential knowledge of PLMs. Furthermore, an Adapter was proposed to learn the interaction between them. Several experiments were conducted to evaluate the effectiveness of our model for relation extraction on three popular public evaluation datasets, where it achieved state-of-the-art performance. In future work, the cueing strategy can be extended to support other NLP tasks. Furthermore, additional studies can be conducted to reveal the mechanisms of the cueing, adapting, and prompting frameworks.

Figure 1 .
Figure 1. Architecture of our proposed CPA model.

Figure 2 .
Figure 2. Example of entity cues and prompt templates.

Table 2 .
Labels of [MASK] on different datasets.

Table 3 .
Statistics of different datasets.

Table 4 .
Performance on ACE 2005 English dataset.

Table 5 .
Performance on the ReTACRED dataset.

Table 8 .
Performance with different cueing strategies on ReTACRED and ACE 2005 datasets.

Table 9 .
Evaluation of the few-shot case using the SemEval 2010 and ReTACRED datasets, and influence of training epochs on few-shot learning using the SemEval 2010 dataset.

Table 10 .
Performance on complex data from the ACE dataset.