Dialogue Logic Aware and Key Utterance Decoupling Model for Multi-Party Dialogue Reading Comprehension

Multi-party dialogue machine reading comprehension (MRC) poses an unprecedented challenge due to the multiple speakers and the complex discourse links among speaker-aware utterances. The majority of current methods consider only the textual aspects of dialogue and pay little attention to crucial speaker-aware cues. This prevents a model from capturing the speaker's intention and the important discourse information for questions embedded in complex discourse relationships, leading the model to give wrong answers. In this paper, we construct a dialogue logic graph module with a relational graph convolutional network (R-GCN) to structure the dialogue information, and design a speaker prediction task to enhance the model's ability to capture discourse logic. Additionally, we construct a key utterance information decoupling module that focuses on the key discourse information flow involving the question and filters out noisy information. Extensive experiments on FriendsQA and Molweni show that our approach outperforms competitive baselines and current state-of-the-art models, especially when dealing with more turns of dialogue and questions involving people, events and time.


I. INTRODUCTION
Teaching machines to comprehend a given context paragraph and answer corresponding questions is one of the long-term goals of natural language processing and artificial intelligence. Machine Reading Comprehension (MRC) [1], [2] gives computers the ability to derive knowledge and answer questions from textual data. Traditional machine reading comprehension usually involves question answering (QA) over a single-scene text. Multi-party dialogue MRC [3], [4] is more challenging due to the complex relationships and knowledge background involved in the dialogue. First, multi-party dialogue MRC involves a large number of speakers, which leads to more complex key utterances and speaker information [5]. Second, the discourse logic is graph structured [6]. The discussion in a multi-party dialogue may involve multiple events, and two utterances with a dependency between them may not be adjacent, resulting in more flexible and complex transmission of dependency information among utterances. Third, multi-party dialogue MRC involves multiple turns of dialogue and contains a lot of invalid information, which not only hinders the inference of key information flows but also interferes with the prediction of the answer span [7].

(The associate editor coordinating the review of this manuscript and approving it for publication was Mu-Yen Chen.)
To demonstrate the challenges of multi-party dialogue MRC more intuitively, we selected a dialogue case from FriendsQA [8]. As shown in Fig. 1, the whole dialogue involves 5 speakers and 9 utterances. The logical information of the discourse in multi-party dialogues shows a complex, graph-like character. Some utterances also contain a lot of referencing and omission, which poses great challenges for multi-party dialogue MRC. In the sample dialogue, Yang and Choi [8] labeled two questions (Q1 and Q2). For Q1, the model should notice the utterance U2 and understand that the ''you'' in U2 refers to the speaker 'Chandler Bing' in U1, so the model needs to extract the correct answer span for Q1 from the next utterance U3 spoken by 'Chandler Bing'. Q2 is exactly the elliptical interrogative utterance U4 posed by 'Ross Geller' to 'Chandler Bing'. Obviously, the subject is omitted in U4, which requires the model to understand that U4 is asking about 'Chandler Bing', thus extracting the correct answer span for Q2 from the next utterance U5 that 'Chandler Bing' says.

(VOLUME 11, 2023. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
To overcome the challenges of multi-party dialogues, Li and Choi [9] encoded the question and utterances and built a multi-head attention mechanism between token and utterance embeddings. Li and Zhao [7] designed a self-supervised speaker prediction task that masks partial speakers to decouple and fuse the speaker information; on this basis, they added a pseudo-self-supervised key utterance prediction task to capture the key discourse information. This line of work enhances the model's ability to understand the logic of discourse. However, both of these methods are ''structureless'' implicit modeling methods, and they fall short when reasoning about the answers to questions because important logical information about the discourse structure is missing from the inference. In contrast to these two ''structureless'' modeling approaches, Liu et al. [10] used relational graph convolutional networks (R-GCN) [11] to explicitly model the discourse structure based on discourse dependencies, which can effectively integrate contextual knowledge into the multi-party dialogue MRC task and combine background knowledge with question answering for effective reasoning. However, Liu et al. [10] used words in the dialogue context as nodes in the relation graph, and such an approach in turn lacks information about the textual structure among utterances.
To tackle the obstacles mentioned above, we propose a new approach to the problem of complex discourse logic relations and key information flow transmission in multi-party dialogues. To preserve the logical structure of the discourse, we construct a Dialogue Logic graph (DLgraph) with R-GCN [11], using two types of nodes for the speaker and the utterance representation of the corresponding dialogue turns, and different types of edges to represent different types of relationships. So that the DLgraph can capture the discourse logic information well over multiple turns of dialogue, we design a self-supervised speaker node prediction task to optimize the representation of nodes and edges in the DLgraph. Besides, we design a key utterance information decoupling module with a multi-head attention mechanism, which enables the model to capture the key information in long dialogues. Compared to previous models, our work contributes to the improvement of multi-party dialogue MRC in three ways: 1) For the characteristics of multi-party dialogue MRC, our method is novel in its graph-structured treatment of discourse logic and its capture of key utterance information.
2) We introduce R-GCN to construct a graph-structured discourse logic decoupling module and design a key utterance information decoupling module based on a multi-head attention mechanism. The two modules work in parallel to improve the model's ability to understand complex contexts.
3) Our model outperforms ESa [5], DADGraph [6], and SPss [7] in experiments on two dialogue MRC benchmark datasets, FriendsQA [8] and Molweni [12]. In addition, the results of performance gain experiments show that our model yields significant improvements for dialogues with more turns and for questions involving people, events and time.

II. RELATED WORK
A. PRE-TRAINED LANGUAGE MODELS
Pre-trained language models (PrLMs) [13] have achieved remarkable results in learning universal natural language representations by pre-training large language models on massive general corpora and fine-tuning them on downstream tasks. BERT [14], derived from the Transformer's encoder, is the most representative PrLM [15]. The multi-head self-attention in the Transformer is a vital mechanism; it is essentially a variant of the graph attention network (GAT) [16]. The conventional workflow for BERT consists of two stages: pre-training and fine-tuning. Pre-training uses two self-supervised tasks: masked language modeling (MLM, prediction of randomly masked input tokens) and next sentence prediction (NSP, predicting whether two input sentences are adjacent to each other). Since BERT only encodes text to produce a language representation, an encoder alone is sufficient. PrLMs such as BERT [14], GPT [17] and ELECTRA [18] have a powerful ability to learn and understand natural language representations.
After BERT [14] was proposed, research [19] on BERT and its related variant models emerged, such as RobBERT [20] (a BERT variant with better robustness) and ALBERT [21] (a lite BERT model). Meanwhile, the emergence of PrLMs has pushed the results on many NLP tasks to a new level [22], [23]. Compared with previous neural network models such as LSTM [24] and RNN [25], BERT representations are hierarchical rather than linear, and BERT embeddings encode information about parts of speech, syntactic chunks and roles, so enough syntactic information is captured in the token embeddings. Hence, we chose BERT as the encoder for our architecture.

B. TRANSFORMERS FOR LEARNING DIALOGUE
Currently, M. Firdaus et al. [26] and W. Liu et al. [27] hold that dialogue reading comprehension is mainly used in dialogue systems, while W. Wei et al. [28] hold that it is also used in intelligent human-computer interaction systems. Compared to the more mature two-party dialogue MRC [29], [30], one would expect applications such as dialogue systems to be able to handle the more complex multi-party dialogue MRC. Due to the excellent performance of PrLMs in text-level NLP tasks (Section II-A), PrLMs were widely used for multi-party dialogue MRC in earlier studies [31], [32]. However, the multi-head attention mechanism in PrLMs is better adapted to the linear structure of regular text, while the discourse structure of multi-party dialogues is more complex and graph-like, so PrLMs perform far worse on multi-party dialogues than on regular text [5], [6]. PrLMs simply match the semantic meaning of the question against the dialogue context, and for dialogues with more colloquial content, this matching is more likely to lead the model to a wrong prediction of the final answer span [7].
To make the model effectively use the speaker information flow, some research has used an attention mechanism with a speaker mask to decouple and fuse the speaker information flow [33]. Liu and Chen [34] started from clusters of discourse trees to find cross-domain factors in relevant data, and modeled the discourse structure of multi-party dialogues with a Transformer using discourse dependency information. To enhance the supervision of model training, Wu et al. [35] introduced the QAConv dataset, on which they constructed two test scenarios, block mode and full mode; they used historical conversations as a source of knowledge and inferred the answers to questions through conversation retrieval. The above three approaches are similar to the ''structureless'' implicit modeling approaches [7], [9]. These approaches miss the logical information about the structure of the discourse, so noisy information cannot be filtered out during reasoning. To capture the logical structure of information between utterances, we design the Dialogue Logic graph using R-GCN.

C. GRAPH MODELS FOR LEARNING DIALOGUE
In addition to these ''structureless'' implicit modeling approaches (Section II-B), some works have used graph neural networks to model multi-party dialogues explicitly, with good results. Since graph convolutional networks (GCNs) perform well in a variety of NLP tasks, Banerjee and Khapra proposed a memory-augmented GCN [36], which models multi-party dialogues using entity-relationship graphs and the discourse dependencies of a knowledge base. Ghosal et al. [37] used a heterogeneous GCN based on a dialogue context window to model the flow of affective information in multi-party dialogues. Hu et al. [38] proposed a model based on a multimodal fusion GCN, which models inter-speaker dependencies more effectively by using the speaker information flow. The research works above have shown that the discourse structure of multi-party dialogues can be modeled with graph neural networks, yielding more information about the discourse logic of the dialogue. Their methods have all achieved excellent results on tasks such as emotion recognition in conversation (ERC). However, most of these methods require manually labeled data, which is rare in practical usage scenarios. Meanwhile, methods that use words in the dialogue context as graph nodes produce models that lack information about the textual structure between the utterances.
Our approach works in parallel through a DLgraph and a key utterance decoupling module. While the model obtains logical information about the discourse, it also avoids losing information about the textual structure between the utterances. In particular, we do not use additional labels.

III. METHODOLOGY
In this section, we introduce our proposed model in detail. As shown in Fig. 2, our model consists of four components: 1) Transformer encoder blocks for sequential encoding of multi-party dialogue contexts; 2) a Dialogue Logic graph (DLgraph) constructed with R-GCN, in which we capture the discourse logic structure information through a speaker-utterance prediction task; 3) a Key Utterance Information Decoupling Module constructed with multi-head attention Transformer blocks, in which we capture the key utterance information through a key utterance prediction task; 4) an extraction layer used by the model to calculate the answer span for the question.

A. TASK FORMULATION
Given a sequence of multi-party dialogue C = {U 1 , U 2 , . . . , U n }, where n is the number of rounds of the multi-party dialogue and U i is the content of the utterance spoken by speaker S i in the i-th round.
FIGURE 2. The overview of our model, which contains a Transformer encoder, a key utterance information decoupling module, a dialogue logic graph module and an information fusing layer.
A sequence of questions Q associated with C is denoted as Q = {q 1 , q 2 , . . . , q s }. If question Q is answerable, the model needs to give the answer span a = [a s , a e ] corresponding to question Q in the dialogue context C; if question Q is unanswerable, the answer span given by the model should be empty or marked unanswerable.

B. MULTI-PARTY DIALOGUE CONTEXTUAL ORDER CODING
In order for the Transformer encoder blocks to better encode the contexts of multi-party dialogue, the dialogue contexts need to be preprocessed. For a given sequence C of multi-party dialogue contexts and a question Q, we use [CLS] and [SEP] tokens to connect the question Q with the dialogue contexts C in the way proposed by Zhang et al. [3], denoted as sequence X:

X = {[CLS], Q, [SEP], U 1 , [SEP], U 2 , [SEP], . . . , U n , [SEP]}

Then we input the sequence X into the Transformer encoder blocks for word embedding and sequential encoding. The output of this module is noted as H ∈ R^{L×D}, where L is the length of the sequence output by the Transformer encoder blocks after encoding, and D is the hidden dimension of the Transformer encoder blocks' output.
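As a concrete illustration, the sequence construction above can be sketched in plain Python. The helper name `build_input` and the speaker-prefix format are our own assumptions; the actual model feeds this string through BERT's tokenizer afterwards.

```python
# Hypothetical sketch of the [CLS]/[SEP] sequence construction (Section III-B).
# build_input and the "speaker: utterance" layout are illustrative assumptions.

def build_input(question, utterances, speakers):
    """Concatenate the question and speaker-prefixed utterances into one sequence."""
    parts = ["[CLS]", question, "[SEP]"]
    for spk, utt in zip(speakers, utterances):
        parts.append(f"{spk}: {utt}")  # keep the speaker attached to each turn
        parts.append("[SEP]")          # one [SEP] closes each utterance U_i
    return " ".join(parts)

seq = build_input(
    "Who spoke first?",
    ["Hello everyone.", "Hi Joey!"],
    ["Joey", "Ross"],
)
print(seq)
# [CLS] Who spoke first? [SEP] Joey: Hello everyone. [SEP] Ross: Hi Joey! [SEP]
```

The trailing [SEP] after each utterance is what Sections III-C and III-D later extract as utterance-level node values.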

C. DIALOGUE LOGIC GRAPH MODULE AND SPEAKER-UTTERANCE PREDICTION TASK
This module and the key utterance information decoupling module (Section III-D) are the most important part of the whole model. For solving complex references and omissions in the multi-party dialogue, we model the discourse structure of multi-party dialogue by R-GCN. To capture the speaker-utterance information flow and utterance dependencies in multi-party dialogues, we design a speaker-utterance prediction task.
We construct the Dialogue Logic graph (DLgraph) by R-GCN, denote as G = (V , E, R), where V is the vertex set, E is the edge set, and R is the relation set.
First, we extract the [SEP] tokens from H (from Section III-B) and use them as the node values of the utterances to embed each vertex in the DLgraph accordingly, where V S is the speaker vertex set and V U is the utterance vertex set. As shown in Figure 3, we construct edges E to connect pairs of vertices between which dependent information passes in the DLgraph. A bidirectional speaker-utterance information interaction edge E Sij is used to connect speaker-utterance vertices.

FIGURE 3. The dialogue logic graph module and speaker-utterance prediction task. In the dialogue logic graph, a bi-directional arrow means that information flows in both directions; a unidirectional arrow means that information only flows from start nodes to end nodes.
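Message passing over a relational graph G = (V, E, R) of this kind can be sketched as a single R-GCN layer in NumPy. This is a minimal illustration under our own assumptions (the function name `rgcn_layer`, the neighbor-dictionary layout, ReLU as the activation, and c_{i,r} = |N_i^r| as the normalization), not the authors' implementation.

```python
import numpy as np

def rgcn_layer(H, neighbors, W, W0):
    """One R-GCN step: h_i' = ReLU(sum_r sum_{j in N_i^r} W_r h_j / c_ir + W0 h_i)."""
    out = H @ W0.T  # self-loop term W0 h_i for every node
    for r, nbrs in neighbors.items():
        for i, js in nbrs.items():
            if js:
                agg = sum(H[j] for j in js) / len(js)  # normalize by c_{i,r} = |N_i^r|
                out[i] += W[r] @ agg                   # relation-specific transform
    return np.maximum(out, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))                # 4 nodes (speakers/utterances), hidden dim 8
W = {"reply": rng.normal(size=(8, 8))}     # one weight matrix per relation type
W0 = rng.normal(size=(8, 8))               # self-loop weight
neighbors = {"reply": {0: [1], 1: [0, 2], 2: [1], 3: []}}
H_new = rgcn_layer(H, neighbors, W, W0)
print(H_new.shape)  # (4, 8)
```

In the real module there would be one relation per edge type of the DLgraph (speaker-utterance, utterance-utterance, etc.) rather than the single illustrative "reply" relation here.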
The representation of vertex V i after an R-GCN propagation layer is denoted as h i:

h i^(l+1) = σ( Σ_{r∈R} Σ_{j∈N i^r} (1/c i,j) W r^(l) h j^(l) + W 0^(l) h i^(l) )  (2)

where σ is the activation function, R is the set of relations in the multi-party dialogue discourse structure, N i^r is the set of neighbors of vertex V i under relation r, c i,j is the normalization term, and h j^(l) is the representation of neighbor V j at layer l. For the m randomly selected vertex pairs (v i^S, v j^U) participating in the speaker-utterance prediction task, the speaker-utterance prediction layer uses the heuristic matching mechanism [39], [40] (Eq. 3) as the basis for determining the source of v i^S through the speaker-utterance matching function (Eq. 4):

Res = σ( β^T [Y 1; Y 2; Y 1 ⊙ Y 2; Y 1 − Y 2] )  (3)

P^forecast_{S−U} = σ( β^T [v i^U; v i^S; v i^U ⊙ v i^S; v i^U − v i^S] )  (4)

where Y 1 and Y 2 in Eq. 3 are the premise and hypothesis in the heuristic matching operation, ⊙ denotes element-wise multiplication, Res is the matching result, σ is the activation function (softmax, sigmoid, etc.), and β is the weight vector; v i^U and v i^S in Eq. 4 correspond to Y 1 and Y 2 in Eq. 3, respectively.
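A minimal sketch of the heuristic matching mechanism [39], [40], assuming sigmoid as the activation; the four-way feature concatenation [Y1; Y2; Y1⊙Y2; Y1−Y2] follows the common formulation of heuristic matching, and all names here are our own.

```python
import numpy as np

def heuristic_match(y1, y2, beta):
    """Score = sigmoid(beta . [y1; y2; y1*y2; y1-y2]) -- heuristic matching sketch."""
    feats = np.concatenate([y1, y2, y1 * y2, y1 - y2])  # premise/hypothesis features
    z = float(beta @ feats)
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid as the activation sigma

y1 = np.array([1.0, 0.0])   # e.g. an utterance node representation v_i^U
y2 = np.array([0.0, 1.0])   # e.g. a candidate speaker node representation v_i^S
beta = np.ones(8)           # weight vector (length = 4 x node dimension)
score = heuristic_match(y1, y2, beta)
print(round(score, 4))  # 0.8808
```

A score near 1 indicates the matching layer predicts that the speaker node is the source of the utterance node.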
We calculate the loss for this prediction task based on the prediction result P^forecast_{S−U} given by the speaker-utterance prediction layer, combined with P^target_{S−U}, using the binary cross-entropy loss function (Eq. 6):

L_{S−U} = − Σ [ P^target_{S−U} log(P^forecast_{S−U}) + (1 − P^target_{S−U}) log(1 − P^forecast_{S−U}) ]  (6)

As shown in Figure 3, the gradients of L_{S−U} and the results of the speaker-utterance prediction task are fed back to the DLgraph, and the model adjusts and optimizes the information representation in each node of the DLgraph based on this feedback, so that each node carries the corresponding discourse logic information.
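The binary cross-entropy loss over the sampled speaker-utterance pairs can be sketched as follows; `bce_loss` is our own name for a generic BCE, averaged here over the m pairs.

```python
import numpy as np

def bce_loss(p_forecast, p_target, eps=1e-12):
    """Binary cross-entropy over the sampled speaker-utterance pairs (cf. Eq. 6)."""
    p = np.clip(p_forecast, eps, 1 - eps)  # guard against log(0)
    return float(-np.mean(p_target * np.log(p) + (1 - p_target) * np.log(1 - p)))

# Two pairs: one true match predicted 0.9, one non-match predicted 0.1.
loss = bce_loss(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
print(round(loss, 4))  # 0.1054
```

Minimizing this loss pushes matched pairs toward probability 1 and unmatched pairs toward 0, which is the feedback signal propagated into the DLgraph nodes.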
Finally, we pass these node representations H Ui (from V U ) to the key utterance information decoupling module (Section III-D) as input, and the output H G of the whole speaker-utterance prediction and information interaction graph module will be fed forward into the information fusion span extraction layer (Section III-E) to participate in the prediction of answer span [a s , a e ].

D. KEY UTTERANCE INFORMATION DECOUPLING MODULE
In this module, we extract the [SEP Q] and [SEP Ui] tokens corresponding to the question and the utterances from H (the dialogue context representation, from Section III-B) as the corresponding node values, thus obtaining 1 question node H q and n+1 token nodes H ti.
We collect the above two kinds of nodes as well as the n utterance information nodes H Ui (from the DLgraph), denoted as H k (Eq. 7):

H k = {H q, H t1, . . . , H t(n+1), H U1, . . . , H Un}  (7)

Subsequently, we input the three types of information nodes in H k into the multi-layer multi-head self-attention blocks; the information exchange process between nodes is as in Eq. 8:

head z = Attention(H k W z^Q, H k W z^K, H k W z^V), MultiHead(H k) = Concat(head 1, . . . , head h) W^O  (8)

where W z^Q, W z^K, W z^V and W^O are the weight matrices of the multi-head self-attention mechanism, which are updated continuously during training.
After the three types of information nodes in H k pass through the multi-layer multi-head self-attention blocks, the multi-party dialogue information contained in the nodes is fully exchanged and fused, finally yielding the question, utterance and token representations of the multi-party dialogue (H q^K, H U^K, H T^K) (Eq. 9). Through Eq. 10, we expand H q^K to the same dimension as H U^K, denoted as H Q-expand^K (to facilitate the processing of the key utterance matching layer). After that, H Q-expand^K and H U^K are co-fed into the key utterance matching layer, which obtains the H ui^K with the greatest degree of association with H q^K (Eq. 11). We calculate the loss for this task using the cross-entropy loss function based on the result P KU given by the key utterance prediction layer, combined with the target utterance result P^target_KU:

L KU = − Σ P^target_KU log(P KU)  (12)

The gradient of L KU is fed back to the multi-layer multi-head self-attention blocks to enhance the interaction and fusion of discourse logic information among the nodes, and the model is optimized to focus on the key dialogue information and filter out the noise.
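Reduced to its essentials, the key utterance matching step scores each utterance node against the question node and normalizes the scores into a distribution. The dot-product scoring below is our own simplification of the matching layer, not the paper's exact function.

```python
import numpy as np

def key_utterance_scores(h_q, H_u):
    """Match the question node against each utterance node and pick the key one."""
    logits = H_u @ h_q                  # one association score per utterance
    e = np.exp(logits - logits.max())   # numerically stable softmax
    p_ku = e / e.sum()                  # distribution P_KU over utterances
    return p_ku, int(np.argmax(p_ku))   # index of the key utterance

h_q = np.array([1.0, 0.0])        # question representation (toy 2-d example)
H_u = np.array([[0.1, 0.9],       # U1: weakly related to the question
                [0.8, 0.2],       # U2: strongly related
                [0.3, 0.5]])      # U3
p_ku, key_idx = key_utterance_scores(h_q, H_u)
print(key_idx)  # 1
```

The cross-entropy loss L_KU is then computed between this distribution and a one-hot target marking the annotated key utterance.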
Through the processing of the key utterance information decoupling module, the multi-party dialogue information contained in the three types of nodes is fully exchanged and fused. We extract all the token node representations H T^K as the output of the whole key utterance information decoupling module, which participates in the prediction of the answer span [a s , a e ] together with H G.

E. INFORMATION FUSION SPAN EXTRACTION LAYER
For the information representations H G and H K fed forward by the two modules in Sections III-C and III-D, we fuse them using Eq. 13. On this basis, we extract the information representation corresponding to the [CLS] token from H f and, together with H f, determine whether question Q is answerable (Eq. 14), where the target is defined as:

P^target_{A−Q} = 1 for an answerable question, 0 for an unanswerable question  (15)

Combining P^target_{A−Q}, we calculate the loss for this judgment task using the binary cross-entropy loss function (Eq. 16). If question Q is answerable, we use H f together with P KU (from Section III-D) to calculate the starting and ending positions a s and a e of the answer to question Q (Eq. 17), and also calculate the cross-entropy loss for this task (Eq. 18). Thus the final loss for the entire task is the sum of L_{S−U} (from Section III-C), L KU (from Section III-D), L_{A−Q} and L A:

L = L_{S−U} + L KU + L_{A−Q} + L A  (19)
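The span extraction and the overall loss can be sketched as follows. The exhaustive start/end search with the constraint start <= end is a common decoding strategy for span MRC; it is our illustrative assumption, not necessarily the authors' exact decoder.

```python
import numpy as np

def predict_span(start_logits, end_logits):
    """Pick the highest-scoring valid span (a_s, a_e) with a_s <= a_e."""
    best, span = -np.inf, (0, 0)
    for s in range(len(start_logits)):
        for e in range(s, len(end_logits)):          # enforce start <= end
            score = start_logits[s] + end_logits[e]  # additive span score
            if score > best:
                best, span = score, (s, e)
    return span

def total_loss(l_su, l_ku, l_aq, l_a):
    """Eq. 19: the overall training loss is the sum of the four task losses."""
    return l_su + l_ku + l_aq + l_a

print(predict_span(np.array([0.1, 2.0, 0.3]), np.array([0.2, 0.1, 3.0])))  # (1, 2)
```

In practice the search is usually limited to a maximum answer length, which this toy version omits.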

IV. EXPERIMENTAL RESULTS AND ANALYSIS
A. EXPERIMENTAL SETTINGS
1) DATASETS
Molweni [12] is derived from the large-scale multi-party dialogue dataset, the Ubuntu Chat Corpus, whose main theme is technical discussion of problems on the Ubuntu system. This dataset is characterized by its informal speaking style and domain-specific technical terms. In total, it contains 10,000 dialogues, whose average and maximum numbers of speakers are 3.51 and 9, respectively. Each dialogue is short, with average and maximum numbers of tokens of 104.4 and 208, respectively. Unanswerable questions are included in this dataset. Additionally, the dataset is equipped with discourse parsing annotations, which, however, are not used by our model.
FriendsQA [8] is also a multi-party multi-turn dialogue dataset, which provides 1,222 dialogues and 10,610 open-domain questions. Compared with Molweni [12], the FriendsQA dataset is smaller in size, but it has more average turns per dialogue, a longer average utterance length, and a more balanced distribution of the various types of questions (shown in Table 1).
The types of questions in the dialogue reading comprehension tasks provided by FriendsQA and Molweni are divided into six main categories (5W1H), containing answerable and unanswerable questions; the number and percentage of the different types of questions in the two benchmark datasets are shown in Table 2.

2) BASELINES
For both FriendsQA [8] and Molweni [12], we use BERT as a baseline model. In addition, we also use ELECTRA [18] as a baseline model to test whether the improvement of our model can still be maintained over a more robust baseline. At the same time, we compare our model with ESa [5], DADgraph [6] and SPss [7].

3) EVALUATION METRICS
For the evaluation metrics of the experimental results, we use the same evaluation criteria as for the numerous dialogue MRC tasks: F1 score and exact match (EM). The F1 score is used to measure the average degree of overlap between model-predicted answers and the target answers. The EM is used to calculate the degree of exact match between the model-predicted answers and the target answers.
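These two metrics can be computed as in the standard SQuAD-style evaluation; the sketch below is simplified (real evaluation scripts also lowercase and strip punctuation and articles before comparing).

```python
from collections import Counter

def exact_match(pred, gold):
    """EM: 1 if the predicted answer string equals the target exactly, else 0."""
    return int(pred.strip() == gold.strip())

def f1_score(pred, gold):
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    p_toks, g_toks = pred.split(), gold.split()
    common = Counter(p_toks) & Counter(g_toks)  # multiset intersection
    n_same = sum(common.values())
    if n_same == 0:
        return 0.0
    precision = n_same / len(p_toks)
    recall = n_same / len(g_toks)
    return 2 * precision * recall / (precision + recall)

print(round(f1_score("in the coffee house", "the coffee house"), 3))  # 0.857
```

With multiple gold answers per question, the reported score is the maximum over the gold answers, averaged over all questions.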

4) PARAMETER SETTINGS
We use different parameter settings (learning rate, batch size and max length) for the two datasets. For FriendsQA, the three parameters are 4e-6, 4 and 512; for Molweni, they are 1.2e-5, 8 and 384, respectively. The number of information decoupling layers is uniformly 3 to 5. As shown in Table 2, FriendsQA has more average turns of dialogue and a longer average utterance length. Therefore, some dialogues in FriendsQA are longer than our max length setting, and we use sliding windows to handle them. Table 3 shows our experimental results on Molweni and FriendsQA. The results show that our model outperforms all the baselines. Also, compared to ESa [5], DADgraph [6] and SPss [7], our model demonstrates advantages to varying degrees.
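A sliding-window split of over-length dialogues can be sketched as follows; the `max_len` and `stride` values here are illustrative, not the paper's exact settings.

```python
def sliding_windows(tokens, max_len, stride):
    """Split a long token sequence into overlapping windows of at most max_len."""
    if len(tokens) <= max_len:
        return [tokens]  # short dialogues pass through unchanged
    windows, start = [], 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break        # last window already covers the tail
        start += stride  # overlap of (max_len - stride) tokens between windows
    return windows

chunks = sliding_windows(list(range(1000)), max_len=512, stride=256)
print(len(chunks), len(chunks[0]))  # 3 512
```

The overlap ensures that an answer span falling on a window boundary is still fully contained in at least one window; predictions are then aggregated over the windows.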

B. QUANTITATIVE EVALUATION
For the experimental results, we explored the way in which the three models ESa [5], DADgraph [6] and SPss [7] modeled the structure of multi-party dialogue discourse:

1) DADgraph [6]
DADgraph used discourse dependency links and discourse relations to construct a graph neural network model for modeling the discourse structure of multi-party dialogues based on R-GCN. It surpassed DialogueGCN [37] and DialogueRNN [41], the neural network models used for dialogue processing at the time, to become the then SOTA model.

2) SPss [7]
The model is based on the Transformer multi-head self-attention mechanism, which captures the speaker information flow and the key discourse information flow through self-supervised and pseudo-supervised prediction tasks, respectively; on this basis, implicit modeling of the discourse structure is accomplished.

3) ESa [5]
Ma et al. constructed an enhanced speaker-aware model by jointly using the Transformer masked self-attention mechanism and a heterogeneous graph network. The model jointly captures multi-party dialogue discourse cues from both speaker attributes and their perceptual relationships, thereby modeling the structure of multi-party dialogue discourse. It surpasses the SPss [7] model to achieve SOTA on Molweni [12].
Compared to the three models above, our model handles QA over dialogue more effectively. This is mainly due to two important modules. We structure the discourse logic through the Dialogue Logic graph (DLgraph) and optimize the representation of nodes and edges in the DLgraph via the speaker-utterance prediction task, which enables the DLgraph to better capture discourse logic information over multiple turns of dialogue. In addition, the key utterance information decoupling module enhances the overall model's ability to focus on and capture the flow of key utterance information in long dialogues.
To investigate the gain our model reveals when dealing with the six types of questions (5W1H) in the dialogue reading comprehension tasks, we conducted a comparison experiment between the baseline model ELECTRA [18] and our model on the FriendsQA dataset. The results of the comparison experiments are shown in Table 4. Compared with ELECTRA [18], our model shows more significant improvement on all six types of questions. Among them, the most significant improvements are observed for the Who, What and When types of questions. The improvement for Who-type questions is mainly due to our speaker-utterance prediction task, which improves the integration of speaker information into the utterance representation nodes. The improvement for the What and When types of questions is due to the key utterance information decoupling module, which enhances the overall model's focus on key information while filtering out the noisier information flows.
We also investigated the effect of the number of dialogue turns on the performance of different models in multi-party dialogues. The results of this investigation are presented in Figure 5. We can see that the difference between our model and the baseline model in terms of EM and F1 scores is insignificant when the number of dialogue turns is small (fewer than 10). Meanwhile, as the number of dialogue turns increases, the score of the baseline model shows a significant downward trend, but our model does not suffer much from the increase in noisy information.

C. CASE STUDY
To more visually demonstrate the improvement of our model for multi-party dialogue MRC, we conducted multiple side-by-side comparison tests of our model against the baseline model BERT (testing both models separately on the same reading comprehension task in FriendsQA [8]). Figure 4 shows the content of one of these side-by-side comparison tests (the dialogue text, the questions, and the responses of the two models).
We intercepted the first seven dialogue turns of this dialogue scenario; Yang and Choi [8] gave three types of questions for these seven turns (Q1 What-type, Q2 Who-type, and Q3 When-type), which are the three question types with the most significant performance gain for our model.
In Q1, the utterance involving Q1 is exactly the 'Scene' given at the beginning of the dialogue text. However, the baseline model does not understand the exact meaning of Q1 and therefore gives a rather absurd answer. In Q2, the baseline model does not notice that the 'Dr. Geller' in U2 is the 'Ross' mentioned in Q2. This referential phenomenon prompted the baseline model to overlook the fact that it was the 'Older Scientist' (the speaker of U2) who told 'Ross' that there was a seat next to him, not the 'Dr. Geller' mentioned in U2. In Q3, the question and its related utterance involve abbreviation (Teddy is abbreviated to Ted in utterance U7, and New York to NY in Q3), but the baseline model does not properly deal with this kind of omission in the multi-party dialogue and incorrectly takes the scene in which the dialogue takes place as the final answer to the question. Our model gives correct answers to all three types of questions, thanks to the two important modules, which improve the overall model's ability to model graph-structured discourse, enhance its ability to focus on and capture the key utterance information flow in long dialogues, and effectively resolve the complex referencing and omission present in dialogues.

V. CONCLUSION
In this paper, we propose a novel model for the multi-party dialogue machine reading comprehension task, which involves a Dialogue Logic graph module and a key utterance decoupling module. We use utterance-level representations and speaker representations to construct the Dialogue Logic graph module. Our model captures the latent discourse logic information by using R-GCN to enable information transfer between nodes, and we design a speaker-utterance prediction task to refine speaker awareness in the node representations. The key utterance decoupling module is used to understand the contextual semantic information in multi-party dialogues, and we design a key utterance prediction task to focus the model on the key utterance related to the question. Hence, our model is able to filter out noisy information irrelevant to the question in complex multi-party dialogues. We conduct experiments on two multi-party multi-turn dialogue benchmark datasets, Molweni and FriendsQA. The experimental results show that our model achieves a significant performance improvement over previous research on multi-party dialogue MRC tasks.
TIANQING YANG received the B.E. degree in computer science and technology from the Yunnan Police Officer Academy and the M.E. degree in technology of computer application from Yunnan University. He is currently a Lecturer with the Big Data School, Baoshan University, Baoshan, China. His research interests include deep learning, natural language processing, and knowledge graph.
TAO WU is currently pursuing the B.E. degree with the Big Data School, Baoshan University, Baoshan, China. He has been working on natural language processing, since 2020. His research interests include deep learning, natural language processing, and knowledge graph.