Research on Question Answering Technology Based on Bi-LSTM

Question Answering (QA) has always been a key issue in Natural Language Processing (NLP). This paper investigates a question answering model based on Bi-directional Long Short-Term Memory networks (Bi-LSTM), trained on the WebQA dataset. Experimental results show that the Bi-LSTM model achieves 55.00% ACC@1, 73.24% ACC@3, and 86.64% ACC@10. The Bi-LSTM model outperforms the Best Match 25 (BM25) algorithm on all three metrics, which demonstrates the superiority of the Bi-LSTM model.


Related Work
In this section, we will elaborate on traditional question answering technology and some question answering techniques based on neural networks.

Traditional Question Answering Technology
The traditional question answering technology generally consists of three parts: question analysis, information retrieval and answer extraction, as shown in Fig. 1. Question analysis classifies the question, extracts keywords from it and expands the keywords when necessary; for Chinese question answering, word segmentation is also required. Information retrieval searches the relevant documents in the data source using the keyword set obtained from question analysis, forming a candidate set. Answer extraction finds candidate answers in the candidate set, ranks them according to certain rules and returns the best answer.

Figure 1: Traditional question answering structure.

Cui et al. proposed a method that obtains keywords from the results returned by an external search engine, where the pointwise mutual information between the returned words can be used to discover the topic [8]. Stoyanchev et al. proposed using named entities, noun phrases, verb phrases and prepositional phrases for semantic matching, which can greatly improve the precision and recall of information retrieval [9]. Ravichandran and Hovy manually constructed features to extract answers based on the surface information of the text [10].
Traditional question answering techniques are mostly based on analyzing sentence structure and performing keyword matching. Although these methods are effective, they have some disadvantages: they require a large amount of manual pre-processing and rely excessively on syntactic parsing of questions, which makes them inefficient and poorly scalable.
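The three-stage pipeline described above can be illustrated with a minimal sketch. The keyword heuristic and the overlap-based ranking rule below are illustrative assumptions for demonstration, not the methods of the cited papers:

```python
# A minimal sketch of the traditional three-stage QA pipeline:
# question analysis -> information retrieval -> answer extraction.

def analyze_question(question):
    """Question analysis: extract keywords (here: non-stopwords longer than 3 chars)."""
    stopwords = {"what", "when", "where", "which", "does", "the"}
    words = question.lower().strip("?").split()
    return [w for w in words if len(w) > 3 and w not in stopwords]

def retrieve(keywords, documents):
    """Information retrieval: keep documents containing any keyword (candidate set)."""
    return [d for d in documents if any(k in d.lower() for k in keywords)]

def extract_answer(keywords, candidates):
    """Answer extraction: rank candidates by keyword overlap, return the best."""
    return max(candidates, key=lambda d: sum(k in d.lower() for k in keywords))

docs = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "France borders Spain and Italy.",
]
keywords = analyze_question("What is the capital of France?")
answer = extract_answer(keywords, retrieve(keywords, docs))
print(answer)  # -> Paris is the capital of France.
```

Real systems replace each stage with far richer components (e.g. query expansion and named-entity features, as in [8] and [9]), but the control flow is the same.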

Neural Network Based Question Answering Technology
Kalchbrenner et al. were among the first to apply Convolutional Neural Networks (CNNs) to the field of NLP [11]. Hu et al. used convolutional neural networks to extract semantic information from sentence contexts and constructed a model for sentence similarity matching [12]. Feng et al. proposed a deep learning framework for sentence matching using single-layer and multi-layer CNNs [13]. Yin et al. used a CNN with an attention mechanism to process the semantic representation of sentences, taking features of different granularities into account at the same time [14].
Tan et al. took advantage of the suitability of recurrent neural networks for sentence sequences instead of convolutional neural networks, and used bidirectional recurrent neural networks to extract the semantic information of sentences [15][16]. LSTM is one of the earliest effective variants of RNN. Hermann et al. applied multiple machine reading mechanisms on top of LSTM to effectively extract the relationship between questions and sentences [17]. Wang et al. proposed using LSTM to learn the semantics of questions and answers in order to select the correct answer [18]. However, LSTM can only use the preceding context; Bi-LSTM performs better by using both the preceding and following context.
In more fine-grained classification, such as classifying sentiment into positive, neutral and negative, it is necessary to highlight the interaction between sentiment words, degree words and negation words. Bi-LSTM can capture bidirectional semantic dependencies by using both the preceding and following context. Accordingly, this paper uses Bi-LSTM to make better use of sentence-level semantic features.

Methodology
RNNs are widely used in NLP tasks, but it is difficult to train deep RNN models because of the gradient vanishing and gradient explosion problems [19]. Gated recurrent networks such as LSTM solve this problem effectively [20]. Fig. 2 shows the basic structure of the LSTM model. The outer layer of the LSTM network is a sequence model; each node contains multiple control gates and a memory unit. The introduction of control gates and memory units allows the LSTM to selectively forget or enhance the information of the input sequence, which effectively alleviates the gradient vanishing and gradient explosion problems. The forget gate $f_t$, input gate $i_t$ and output gate $o_t$ are calculated as follows:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

The input of each gate consists of the input $x_t$ of the current node and the output $h_{t-1}$ of the previous node. The output value is mapped into the range $(0, 1)$ by the sigmoid function $\sigma$. $W$ denotes the weight parameters, while $b$ denotes the bias parameters.
The memory unit $C_t$ is determined by $x_t$ and $h_{t-1}$ as well as the historical memory $C_{t-1}$, and is adjusted by the forget gate $f_t$ and the input gate $i_t$:

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

The final output of the node is determined by the memory unit $C_t$ and the output gate $o_t$:

$$h_t = o_t \odot \tanh(C_t)$$

The memory unit $C_t$ and output $h_t$ of the current node are passed iteratively to the next node as historical information. The Bi-LSTM processes the context of a sentence in both the forward and backward directions to find accurate contextual associations of words.
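The gate equations above can be written out directly. The following is a minimal NumPy sketch of a single LSTM step; the weight shapes, the stacking of the four gate pre-activations into one matrix, and the random initialization are illustrative assumptions, not details from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step: W maps [h_{t-1}; x_t] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.shape[0]
    f_t = sigmoid(z[0:H])            # forget gate
    i_t = sigmoid(z[H:2*H])          # input gate
    o_t = sigmoid(z[2*H:3*H])        # output gate
    C_tilde = np.tanh(z[3*H:4*H])    # candidate memory
    C_t = f_t * C_prev + i_t * C_tilde   # memory update
    h_t = o_t * np.tanh(C_t)             # node output
    return h_t, C_t

rng = np.random.default_rng(0)
D, H = 50, 8                          # word-vector and hidden dimensions
W = rng.normal(0, 0.1, (4 * H, H + D))
b = np.zeros(4 * H)
h, C = np.zeros(H), np.zeros(H)
for x in rng.normal(0, 1, (5, D)):    # a 5-word input sequence
    h, C = lstm_step(x, h, C, W, b)
print(h.shape)  # -> (8,)
```

A Bi-LSTM runs one such recurrence left-to-right and a second one right-to-left over the same sequence, then concatenates the two hidden states at each position.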
The match score between the question and the answer is measured by cosine similarity. Let $o_q$ and $o_a$ represent the Bi-LSTM encoded vectors of the question $q$ and the answer $a$ respectively; then the match score $M$ can be expressed as:

$$M(q, a) = \cos(o_q, o_a) = \frac{o_q \cdot o_a}{\|o_q\| \, \|o_a\|}$$

where $\|\cdot\|$ represents the norm of a vector.
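The cosine match score is a one-line computation. A minimal sketch, with vector names chosen for illustration:

```python
import numpy as np

def match_score(o_q, o_a):
    """Cosine similarity between the encoded question and answer vectors."""
    return float(np.dot(o_q, o_a) / (np.linalg.norm(o_q) * np.linalg.norm(o_a)))

print(match_score(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # -> 1.0
print(match_score(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # -> 0.0
```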
This paper uses the BM25 algorithm as a baseline against which the Bi-LSTM model is compared. BM25 is a ranking algorithm used in information retrieval and is also widely used in the semantic matching task of question answering. The BM25 score of an answer $a$ for a question $q$ is defined as:

$$\text{BM25}(q, a) = \sum_{j} \text{IDF}(w_j) \cdot \frac{f(w_j, a)\,(k_1 + 1)}{f(w_j, a) + k_1 \left(1 - b + b \cdot \frac{|a|}{\text{avgdl}}\right)}$$

where $w_j$ represents the $j$-th word in the question $q$, $f(w_j, a)$ is the frequency of $w_j$ in $a$, $|a|$ is the length of $a$, avgdl is the average document length, and $k_1$ and $b$ are tunable hyperparameters.
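A self-contained sketch of the standard BM25 scoring function follows; the $k_1$ and $b$ values are the commonly used defaults (the paper does not state which it used), and documents are assumed to be pre-tokenized word lists:

```python
import math

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25 score of one tokenized doc against a tokenized query."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    N = len(docs)
    score = 0.0
    for w in query:
        n_w = sum(1 for d in docs if w in d)               # document frequency
        idf = math.log((N - n_w + 0.5) / (n_w + 0.5) + 1)  # smoothed IDF
        f = doc.count(w)                                   # term frequency in doc
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["china", "capital", "beijing"], ["france", "capital", "paris"]]
print(bm25_score(["beijing"], docs[0], docs) > bm25_score(["beijing"], docs[1], docs))  # -> True
```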

Datasets
The model is trained on the WebQA dataset, which was constructed by Baidu using Baidu Knows [21]. Table 1 shows the statistics of the WebQA dataset. The dataset contains 36,181 questions; the number of answers per question ranges from 2 to 12, and there are 140,897 answers in total. The dataset is divided into three parts: training set, development set and test set. Table 2 shows an example question-answer pair from the dataset, where q is the question and a is the answer that matches the question.

Evaluation Metric
For each question in the test set, the match score to each answer in the answer library is calculated by cosine similarity. The result is measured by ACC@k: a match is considered successful when the correct answer is among the top k candidate answers with the highest similarity.
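The ACC@k metric can be sketched as follows, assuming a precomputed similarity matrix between questions and candidate answers (the matrix values below are made up for illustration):

```python
import numpy as np

def acc_at_k(scores, gold, k):
    """scores: (num_questions, num_answers) similarity matrix;
    gold: index of the correct answer for each question."""
    hits = 0
    for i, g in enumerate(gold):
        top_k = np.argsort(scores[i])[::-1][:k]  # indices of k highest scores
        hits += g in top_k
    return hits / len(gold)

scores = np.array([[0.9, 0.2, 0.5],
                   [0.1, 0.3, 0.8]])
print(acc_at_k(scores, [0, 1], 1))  # -> 0.5
print(acc_at_k(scores, [0, 1], 2))  # -> 1.0
```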

Setup
The experiments use the TensorFlow toolkit to build the neural network framework. All questions and answers are segmented by jieba, a widely used Chinese word segmentation tool. The word vectors are initialized with zhwiki embeddings, pre-trained on the Simplified Chinese Wikipedia corpus. The word vectors have 50 dimensions and remain unchanged during training.
The experiment uses the hinge loss function for training. For each question, a correct answer and a randomly selected incorrect answer are paired as a training sample. To improve training, each correct answer is expanded into 5 training samples, giving 582,380 training samples in total. The stochastic gradient descent (SGD) algorithm is used to update the parameters, with a batch size of 20. The model is trained for 80 epochs; the initial learning rate is set to 0.4 and is halved every 20 epochs.

Table 3 compares the results of the Bi-LSTM model and the BM25 model for k values of 1, 3 and 10. The results show that the accuracy of both algorithms increases as the matching restriction is relaxed. When the restriction requires matching the correct answer on the first attempt (k = 1), the accuracy of the Bi-LSTM model is 55.00%, while that of the BM25 algorithm is 50.98%. Under the same conditions, Bi-LSTM consistently achieves higher accuracy than BM25.
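The pairwise hinge loss used for training can be written compactly: the correct answer should score higher than the negative answer by at least a margin. The margin value 0.2 below is an illustrative assumption, not a value reported in the paper:

```python
def hinge_loss(score_pos, score_neg, margin=0.2):
    """Pairwise hinge loss: penalize when the positive answer's cosine
    score does not exceed the negative answer's score by the margin."""
    return max(0.0, margin - score_pos + score_neg)

print(hinge_loss(0.9, 0.3))             # -> 0.0 (margin satisfied, no loss)
print(round(hinge_loss(0.5, 0.45), 2))  # -> 0.15
```

During training, each of the 582,380 question/positive/negative triples contributes one such term, and SGD minimizes the sum.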

Conclusion
This paper discusses question answering technology based on the Bi-LSTM model, trained on the WebQA dataset. The final results show that the Bi-LSTM model outperforms BM25 remarkably. The model will be optimized continuously, and we look forward to achieving better performance on QA tasks.