Attention-BLSTM-CRF Based Method for Named Entity Recognition in Judicial Domain

Although texts in the judicial field are relatively standardized, their entity categories are rich, their structures vary, and entity expressions are unusual in some legal documents, electronic files, and guiding cases. To improve the effect of named entity recognition in the field of justice, this paper divides entity types into four categories and presents a named entity recognition model based on an attention mechanism: the input vector representation is improved by fusing character-level features extracted with a CNN, a BLSTM neural network learns the features of the context, and finally a CRF layer decodes the output sequence. The experimental results show that the method is helpful to application research in the judicial field and achieves a good entity identification effect on the self-built corpus test set.


Introduction
In the Internet information era, judicial texts (judicial documents) play an important role in the construction of a law-based society, and normative judicial documents help citizens understand the specific details of cases and the process of judicial results. Although court judgment documents are standardized and accurate, their structure is relatively complex for entity identification, which differs from general entity identification. For example, for person names, a case may include the defendant, the plaintiff, the judge, the criminal suspect, and so on, while locations may be divided into the place of the case, the place of reporting, and the place of handling the case. Time expressions and document names are also very sensitive and important words in judgment documents. Faced with the current lack of annotated judicial corpora, we can only use manual annotation to obtain small-scale sample sets for training, which may affect the effect of entity identification to some extent. Obviously, the particularity of judicial text brings some difficulties to the identification of named entities.
In recent years, research on named entity recognition has achieved good results and has been widely applied in many fields, especially in proprietary domains such as medicine, the military, and online media. To achieve correct word segmentation of traditional Chinese medicine nouns and prescription nouns, a BLSTM neural network model has been used to identify named entities, which reduces ambiguity and achieves a good segmentation effect [1]. Aiming at the grammatical features of military text, a conditional random field (CRF) model combined with dictionary- and rule-based methods has been used to identify military named entities [2]. Microblogs, as an important platform for information exchange, have a unique content structure [3], and a method combining rules and statistics has been used to identify named entities in them. Compared with its application in the medical and military fields, named entity identification has not been fully promoted in the judicial field. Judicial texts, while relatively standardized, contain rich entity classes with varied structures, and entity expressions are particular in some legal documents, electronic files, and guiding cases; therefore, neither a feature-rich CRF model nor rule- or statistics-based named entity recognition methods alone can effectively identify entities in the judicial field [4].
To improve the effect of named entity recognition in the judicial field, especially in criminal investigation cases, and based on the characteristics of judicial texts, this paper no longer uses the traditional entity types of person names, place names, organization names, time, and so on, but instead sets the entity categories of judicial texts as four classes: legal subject, legal object, legal fact, and legal document. Generally speaking, the subject of law includes natural persons and organizations, while the object of law includes things, natural achievements, and so on; legal facts refer to legal events and legal acts. After solving the long-distance dependence problem that traditional recognition methods cannot handle by using a Bidirectional Long Short-Term Memory (BLSTM)-Conditional Random Field (CRF) model, this paper introduces an attention mechanism to obtain the local features of the text. Experimental results show that this method can effectively improve the identification of named entities in the judicial field.

Named Entity Identification Framework in the Judicial Field
This paper abandons the traditional entity classification and establishes four specific categories for judicial texts: Legal Subject (SUB), Legal Object (OBJ), Legal Fact (FAC), and Legal Document (DOC). Considering the interconnectedness of concepts in judicial texts, and that there are sometimes reciprocal semantic relations among these four types of entities, the identification of named entities in the judicial field is treated as a sequence labeling task. The model framework is shown in Fig.1.

Input Vector Representation
In named entity recognition based on traditional methods, word features and part-of-speech features, which better reflect corpus information [5], are utilized to help complete the recognition task. However, past experience has shown that this approach has certain disadvantages: it cannot deeply mine the information, and it imposes strict requirements on the relationships between words. There are two ways to vectorize Chinese characters in judicial texts: one-hot vector representation and distributed vector representation [6]. Research shows that the distributed vector representation of word features and part-of-speech features can bridge the "semantic gap". This representation expresses large and extremely sparse vectors as low-dimensional dense real-valued vectors, in which each dimension represents a hidden feature, and these distributed representations can be learned automatically. In this paper, an attention mechanism is introduced to better obtain the local features of the text, and to extract the internal features of the text we adopt the concatenation of character vectors and word vectors as the input embedding, as shown in Fig.2.
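As a minimal illustration of the splicing step, the following sketch concatenates a word vector with a character-level feature vector (toy dimensions and placeholder values, not trained embeddings):

```python
import numpy as np

# Toy embeddings: a 4-dim word vector and a 3-dim character-level
# feature vector (e.g. produced by a CNN over the word's characters).
word_vec = np.array([0.1, -0.2, 0.4, 0.05])
char_vec = np.array([0.3, 0.0, -0.1])

# The input embedding fed to the BLSTM is simply their concatenation.
input_vec = np.concatenate([word_vec, char_vec])
print(input_vec.shape)  # (7,)
```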

Fig.2 Word combination vector model
As can be seen from Fig.2, a CNN model is added for the character vectors. The character vector of each word is converted into matrix form; since the lengths of words differ and the matrix sizes therefore differ, placeholders are introduced at both ends of each word as padding to ensure that every character-level matrix has the same size [7]. The matrix then enters the convolutional neural network, which is trained through the back-propagation algorithm. After feature extraction in the convolutional layer, the pooling layer aggregates the included elements; max pooling achieves dimensionality reduction and multi-feature extraction, finally yielding the character-level feature vector.

The LSTM [8] is similar to the traditional RNN structure in that it can process the current information while transmitting information forward. The difference is that the LSTM model structure additionally has a cell state and gate structures, which enable it to remember long-term information [9]. The gating structure of the LSTM mainly includes three mechanisms: the forget gate, the input gate, and the output gate [10]. The forget gate, as the name implies, performs "forgetting": it decides what old information is discarded from the cell state. The input gate determines what new information should be placed in the cell state. The output gate determines what information the next hidden state will contain. The information transmission of the LSTM is given by [11]:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)

Compared with the traditional RNN structure, the LSTM solves the problems of gradient vanishing and long-term dependence [12], but it cannot use the information of the following context. To use contextual information effectively, this paper adopts the BLSTM structure, which not only reads the sentence forward to obtain the preceding context but also reads it backward to obtain the following context.
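The gate computations described above can be sketched as a single LSTM step in numpy (a minimal illustration with toy dimensions and random weights, not the paper's trained model):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W stacks the four gate weight matrices
    (forget, input, candidate, output) applied to [h_prev; x_t]."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    d = h_prev.size
    f = sigmoid(z[0*d:1*d])          # forget gate
    i = sigmoid(z[1*d:2*d])          # input gate
    c_tilde = np.tanh(z[2*d:3*d])    # candidate cell state
    o = sigmoid(z[3*d:4*d])          # output gate
    c = f * c_prev + i * c_tilde     # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = rng.standard_normal((4 * d_h, d_h + d_in)) * 0.1
b = np.zeros(4 * d_h)
h, c = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

A BLSTM simply runs one such recursion left-to-right and another right-to-left and concatenates the two hidden states at each position.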
Let x_1, x_2, ⋯, x_n be an input sequence. Forward recursion and backward recursion are respectively adopted to obtain the forward LSTM hidden states h⃗_1, h⃗_2, ⋯, h⃗_n and the backward LSTM hidden states h⃖_1, h⃖_2, ⋯, h⃖_n.

Attention Model
The mechanism of attention comes from the study of human vision. During the cognitive process, people will choose some specific information in the visual field to focus on and read, while they will discard other visible information [14]. In the neural network model, each output data has different importance to us. By adopting the attention mechanism, the weight of the important data needed in the whole model is increased and the weight of the noise data is reduced.
In this paper, by introducing the attention mechanism and combining it with the BLSTM model, each feature vector in the spliced output of the BLSTM is given an appropriate weight; the attention mechanism attends to local features while the global features are extracted, and the joint features are finally obtained [15]. The neural network model with the attention mechanism introduced into the named entity recognition framework is shown in Fig.6.

The final output state of the attention mechanism is the weighted sum of the BLSTM feature vectors:

s_t = Σ_i α_{t,i} h_i   (7)

The attention mechanism assigns a weight to each feature vector through a softmax over the scores:

α_{t,i} = exp(e_{t,i}) / Σ_k exp(e_{t,k})   (8)

The score of the attention mechanism is:

e_{t,i} = v · tanh(W s_{t-1} + U h_i)   (9)

where s_{t-1} is the output state of the attention mechanism at the previous moment, tanh is the hyperbolic tangent activation function, h_i is a feature vector of the BLSTM output, v is the global weight transformation matrix, W is the weight transformation matrix of the output state of the attention mechanism at the previous time, and U is the weight transformation matrix of the feature vectors.

CRF Layer
After the BLSTM model and the attention layer, although the output sequence of joint global and local features has been obtained, the dependencies between adjacent labels cannot yet be taken into account in the named entity tagging task [16]. To avoid illegal label sequences, such as an I-OBJ tag immediately following a B-SUB tag, a CRF layer is added at the end of the model to ensure that the predicted output is the annotation sequence with the maximum probability [17]. Let x_1, x_2, ⋯, x_n be an input sequence and y_1, y_2, ⋯, y_n the corresponding annotation sequence.
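To make the role of the transition scores concrete, here is a minimal Viterbi decoding sketch in numpy, with a toy tag set and hand-picked scores (not the paper's implementation; a large negative transition score plays the role of forbidding an illegal tag sequence):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Find the highest-scoring tag sequence.
    emissions: (n, k) per-token tag scores; transitions: (k, k) where
    transitions[i, j] is the score of moving from tag i to tag j."""
    n, k = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + transitions + emissions[t]  # (k, k)
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy tags: 0=O, 1=B-SUB, 2=I-SUB. Forbid O -> I-SUB with a big penalty.
trans = np.zeros((3, 3)); trans[0, 2] = -1e4
emis = np.array([[0.1, 2.0, 0.0],   # token 0: strongly B-SUB
                 [0.0, 0.1, 1.5],   # token 1: strongly I-SUB
                 [2.0, 0.0, 0.5]])  # token 2: strongly O
print(viterbi(emis, trans))  # [1, 2, 0]
```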

The final score of an annotation sequence is:

s(X, y) = Σ_i A_{y_i, y_{i+1}} + Σ_i P_{i, y_i}   (10)

where A is the state transition matrix, A_{y_i, y_{i+1}} is the transition score from tag y_i to tag y_{i+1}, P is the output state matrix of the attention layer, and P_{i, y_i} is the probability that the i-th word is labeled y_i. The probability of the annotation sequence y is:

p(y|X) = exp(s(X, y)) / Σ_{ỹ∈Y_X} exp(s(X, ỹ))   (11)

where Y_X is the set of all possible label sequences. The objective function of the model is:

L = log p(y|X)   (12)

According to the function of the CRF layer, the output prediction sequence with the maximum probability of being correctly labeled is:

y* = argmax_{ỹ∈Y_X} s(X, ỹ)   (13)

Experimental Data
Due to the particularity of judicial texts, there is no official judicial corpus. For this experiment, we downloaded from China's official judgment document website 250 each of the civil judgments, administrative judgments, compensation judgments, and enforcement judgments issued in Beijing in 2018, a total of more than 1.201 million characters. The corpus was annotated manually with a second coded annotation pass [4] and divided into training, validation, and test sets in the proportion 8:1:1. The labeling of the corpus is shown in Table 1:

Table 1. Corpus labeling situation

Name            Number
Texts (part)      1000
Sentences        22000
Words           712000
Characters     1201000

In this paper, the named entity identification task is divided into four categories: legal subject, legal object, legal fact, and legal document. To better evaluate system performance, Precision (P), Recall (R), and F-score are used as the evaluation indexes of the named entity identification task.

Element Label
The annotation mode adopted in this experiment is the BIO annotation mode: B marks the beginning of an entity, I marks the inside of an entity, and O marks non-entity tokens. For example, a legal subject is annotated with B-SUB/I-SUB.
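The labeling scheme can be illustrated with a small helper that converts entity spans to BIO tags (a toy sketch with a hypothetical sentence, not the annotation tool used for the corpus):

```python
def bio_tags(tokens, entities):
    """Convert (start, end, type) spans to BIO tags (end exclusive)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype               # entity beginning
        for i in range(start + 1, end):
            tags[i] = "I-" + etype               # entity inside
    return tags

# Hypothetical segmented sentence: "defendant Zhang / in 2018 / judgment document"
tokens = ["被告", "张某", "于", "2018", "年", "在", "判决书"]
print(bio_tags(tokens, [(0, 2, "SUB"), (6, 7, "DOC")]))
# ['B-SUB', 'I-SUB', 'O', 'O', 'O', 'O', 'B-DOC']
```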

Experimental Results and Analysis
To verify the entity recognition effect of the model adopted in this experiment, the CRF and BLSTM-CRF models were used as baselines on the corpus obtained by manual annotation and coded secondary annotation. The comparison results on the judicial corpus of this paper are shown in Table 2 and verify the better effect of the Attention-BLSTM-CRF model adopted in the experiment. Comparing the CRF model with the BLSTM-CRF model, the BLSTM-CRF model performs better, mainly because the BLSTM fully obtains the characteristics of the context and considers the long-term dependence between features. Comparing the BLSTM-CRF model with the Attention-BLSTM-CRF model, the Attention-BLSTM-CRF model recognizes entities better, because the named entity recognition model adopted in this paper adds the attention mechanism: it enriches the local features of the text according to the particularity of judicial text and realizes the acquisition of joint features, and it integrates the CNN structure into the processing of the input vector to better train the character vectors, so that the model as a whole achieves a better recognition effect.

Conclusion
In this paper, by studying the particularity of judicial texts and judicial named entities, a named entity recognition model based on the attention mechanism is proposed. The internal features of character vectors are obtained by a CNN structure, the ability of the BLSTM model to learn context information is exploited, and a CRF layer is used to ensure the rationality of the annotation. Although the recognition effect of this model is better than that of the other two models, the precision and recall are not yet ideal, possibly because our entity labeling scheme for the judicial field is not mature enough. In future work, we will improve the text processing and consider adding the BERT pre-trained language model to improve the effect of named entity recognition.