English Grammar Error Detection Using Recurrent Neural Networks

Automatic marking of English compositions has developed rapidly in recent years. It has gradually replaced teachers' manual reading and become an important tool for relieving the teaching burden. The existing literature shows that verb consistency errors and verb tense errors are the two types of grammatical errors with the highest error rate in English compositions. Hence, the detection results for verb errors can reflect the practicability and effectiveness of an automatic reading system. This paper proposes a grammar error detection algorithm for English verbs based on the recurrent neural network. Since LSTM can effectively retain the valid information in the context during training, this paper uses LSTM to model the labeled training corpus. At the same time, how to convert the text information in English compositions into numerical values for subsequent calculation is also an important step in automatic reading. Most mainstream tools use the bag-of-words model, i.e., each word is encoded according to its order in the dictionary. Although this encoding method is simple and easy to use, it not only causes the vector to lose the sequence information of the text but also is prone to dimensional disaster. Therefore, the word embedding model is adopted in this paper to encode the text, and the text information is sequentially mapped to a low-dimensional vector space. In this way, the position information of the text is not lost, and the dimensional disaster is avoided. The proposed work collects corpus samples and compares the proposed algorithm with Jouku and Bingguo. The verification results show the superiority of the proposed algorithm in verb error detection.


Introduction
For English learning, grammar and practice are extremely important. Writing is an effective way to test and improve grammar. Hence, English learners need to carry out a lot of writing training to improve their English proficiency [1,2]. On the one hand, due to the increasing importance of English, there are more and more learners. At the same time, the number of English teachers is growing slowly, and teaching resources are becoming scarcer. Because the correction of compositions is difficult and time-consuming, the feedback period after students' practice is long. Thus, the longer the feedback period, the worse the effect, which hinders individual learning. In daily teaching, an English teacher usually needs to teach English to several classes at the same time. The teaching load is extremely heavy, and it is difficult to spend much energy on reviewing compositions.
China has a large population and an increasing number of English learners. However, China's basic education resources are relatively scarce, quite unlike the situation in developed nations. Nowadays, the scale of English tests in China is growing, and many teachers are needed to correct compositions. English has very complicated grammar rules, which require a lot of effort on the teacher's part. Many students cannot get targeted guidance during training, so there is an increasing demand for using computers to correct English compositions. Some widely used English writing self-correcting systems, such as Bingguo and Jouku, have been adopted in many colleges and universities, which greatly relieves the pressure on English teachers in teaching and correcting English compositions [3][4][5][6]. Moreover, in China's English database, the error rate of noun use is 8.4%, while the error rate of verb use is nearly twice that, making verbs the most error-prone part of speech. The most common mistakes in the use of verbs are subject-verb agreement [7][8][9] and verb tenses and forms. Therefore, the effect of verb error detection is a key sign of the maturity of an automatic English composition reading system. Through usage and statistics, this paper finds that Jouku and Bingguo perform poorly on verb consistency and tense errors. Therefore, it is important to improve the detection rate of verb grammatical errors for automatic English composition reading [10].
In view of the poor performance of mainstream reading systems in verb error detection [11][12][13], this paper proposes a grammatical error detection method [14][15][16][17] based on deep learning [18,19], especially for verb consistency and verb tense. Because verb tenses depend on context information, this paper uses the LSTM model's ability to retain context information to solve the problem of incorrect verb tenses. At the same time, different from the bag-of-words model used in mainstream reading systems, the word embedding model is used in this paper to transform the text information into numerical information, which not only remedies the loss of text position information in the bag-of-words model but also avoids the dimensional disaster that the bag-of-words model may bring. The main contributions of this paper are as follows: (1) This article uses recurrent neural network technology to solve the problem of English grammar error correction and provides a simple and convenient tool for English learners, which helps improve learners' grammar level. (2) This paper maps the text information to a low-dimensional vector space so that the position information of the text is not lost, and the dimensional disaster is avoided. Moreover, the superiority of the algorithm in the error detection of English verbs is proved through experiments. The rest of the paper is organized as follows. In Section 2, the background information relevant to the proposed work is presented. In Section 3, the proposed methodology is presented. In Section 4, the experimental results are provided. Finally, the paper is concluded, and future research directions are provided in Section 5.
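To make the bag-of-words versus word-embedding contrast concrete, the following sketch (illustrative only; the vocabulary, dimensions, and random initialization are invented, not taken from the paper) compares a one-hot code, whose dimension grows with the dictionary, against an embedding lookup into a fixed low-dimensional space:

```python
import numpy as np

# Hypothetical 5-word dictionary; a real corpus vocabulary has tens of
# thousands of entries, so one-hot vectors become huge (dimensional disaster).
vocab = {"i": 0, "love": 1, "english": 2, "grammar": 3, "writing": 4}

def one_hot(word):
    """Bag-of-words style encoding: one dimension per dictionary entry."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

# Word-embedding style encoding: each word maps to a dense low-dimensional
# vector (here 3-d, randomly initialised; in practice learned in training).
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 3))

def embed(word):
    return embedding_table[vocab[word]]

print(one_hot("grammar").shape)  # grows with the vocabulary size
print(embed("grammar").shape)    # stays fixed and low-dimensional
```

The one-hot vector also carries no notion of word similarity, whereas embedding vectors of related words can end up close together after training.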

Related Research.
One of the classic problems in the field of natural language processing [20][21][22] is machine translation [23]. The development of machine translation is a microcosm of the development of natural language processing: from early expert systems, to rule-based methods and grammar trees, and then to statistical probability methods, a set of solutions has been formed. Grammatical error correction is similar to the process of machine translation, from which there are many lessons to be learned, and its development has followed a similar path. At the beginning of the development of grammar correction, common grammar checkers were also rule-based, but due to the complexity and changeability of English grammar, it is basically impossible to rely on exhaustive grammar rules to detect errors. Later, statistical machine translation technology was introduced into grammar error correction; this statistically grounded, data-driven approach received considerable attention and greatly promoted the development of grammar error correction.
For individual grammatical errors of verbs, there is no dedicated algorithm. The mainstream automatic marking systems all use a unified algorithm to handle all grammatical errors. At present, this has produced obvious results. For example, the United States has successfully made use of computer correction in the process of marking the TOEFL test. At present, grammar detection is mainly carried out by the rule-based detection model, but the complexity and diversity of English grammar limit what rule-based detection can cover, so it cannot fully realize the function of error correction. However, the rule model is not very complicated and is convenient to use, so various grammar detection systems generally choose this model.
The EasyEnglish grammar detection system [24], an IBM product, is based on the English Slot Grammar language model and uses the syntax tree to carry out error correction.
The system successfully converts grammatical problems into pattern problems, enabling it to match against the syntactic tree, but one problem this system cannot solve is that not all sentences can be successfully decomposed into a complete syntactic tree. Microsoft Word contains tools for checking spelling and grammar. High detection efficiency is one of its advantages, and precisely because of its superior performance in all aspects, it has become the benchmark for many grammar checking systems. However, the main function of Word is text editing, and grammar checking is only an auxiliary function. Therefore, there is no special treatment of verb errors, resulting in a very low detection rate for them. Link Grammar Parser is essentially a rule-based detection system, but it differs from traditional rule-based systems. Its basic idea is to connect the words of a sentence by links, which can replace the traditional method of tree-structure analysis. Experiments show that link grammar achieves high accuracy in judging sentences. In addition, through the definition of different links, link grammar itself also indirectly provides a tool for evaluating sentences. Since the rules for verb errors are complex and difficult to define, the detection effect of most grammar checkers on verb errors depends heavily on the definition of the rules.

Grammar Check.
The design of the grammar detection model is generally based on natural language processing, i.e., part-of-speech tagging, word segmentation, corpora, and other basic technologies.

Word Segmentation.
For natural language processing, words are atomic units. Word segmentation refers to parsing a paragraph or a sentence into individual words and combining the words for analysis in order to study sentence-level problems. Therefore, word segmentation is the first step of the entire language processing program. Only by accurately and effectively decomposing sentences into individual words can overall effective detection be carried out. Because English itself has obvious delimiters, with spaces or other symbols between words, the difficulty of segmenting English sentences is relatively low. However, segmenting English text still has difficulties, mainly in how to correctly judge where a sentence ends.
At present, common word segmentation techniques include rule model-driven techniques and statistical model-driven techniques. Rule model-driven techniques store word classifications as rules in the system; for example, "can't" can be divided into "can" and "not" by the corresponding rule. Statistical model-based word segmentation techniques usually analyze the plausibility of sentences under various segmentations and finally produce the optimal segmentation result.
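The rule-driven expansion of contractions such as "can't" can be sketched as follows; the contraction table and the tokenizing regex are illustrative stand-ins for the rule store described above, not the paper's actual rules:

```python
import re

# A minimal rule store: each contraction maps to its expanded word list.
CONTRACTIONS = {"can't": ["can", "not"], "won't": ["will", "not"],
                "i'm": ["i", "am"]}

def segment(sentence):
    """Split on whitespace/punctuation, then apply contraction rules."""
    tokens = []
    # English words are separated by spaces; punctuation is split off here.
    for raw in re.findall(r"[A-Za-z']+|[.,!?;]", sentence.lower()):
        tokens.extend(CONTRACTIONS.get(raw, [raw]))
    return tokens

print(segment("I can't swim."))  # ['i', 'can', 'not', 'swim', '.']
```

A statistical segmenter would instead score candidate segmentations and keep the most plausible one, but the rule lookup above mirrors the simpler approach the paragraph describes.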

Part-of-Speech Tagging.
The part-of-speech tagging system is based on grammatical coherence and marks the constituent words of a sentence with different parts of speech. Rule model dominance and statistical model dominance are the two common implementations. Early part-of-speech taggers usually adopted a rule-based approach, with the rules written manually by linguists. The most representative one is the Taggit tagging system, whose tag set of 86 kinds can mark the parts of speech and which contains more than 3300 contextual association rules. On the Brown corpus, which contains 1 million words, its tagging accuracy reaches 77%. This kind of approach takes time and experience and cannot maintain absolute objectivity. It is difficult to guarantee the consistency of rules over time, and problems such as contradictions and incompleteness among rules easily occur.
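A toy illustration of the rule-based tagging idea: a small lexicon assigns default tags, then a contextual association rule revises them. The lexicon and the single rule here are invented for illustration and are not taken from Taggit:

```python
# Illustrative lexicon: default tag for each known word; unknown words
# default to NOUN, a common fallback in simple taggers.
LEXICON = {"the": "DET", "dog": "NOUN", "dogs": "NOUN",
           "runs": "VERB", "run": "VERB", "fast": "ADV"}

def tag(tokens):
    tags = [LEXICON.get(t, "NOUN") for t in tokens]
    # Contextual rule: a word immediately after a determiner is a noun,
    # so ambiguous words like "run" get corrected by context.
    for i in range(1, len(tags)):
        if tags[i - 1] == "DET":
            tags[i] = "NOUN"
    return list(zip(tokens, tags))

print(tag(["the", "run", "runs", "fast"]))
```

Real systems hold thousands of such rules, and keeping them mutually consistent is exactly the maintenance problem the paragraph points out.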

Corpus.
A corpus can be regarded as a structured collection of a large number of texts consisting of commonly used language. Current corpora are usually stored on electronic devices and analyzed and sorted by computers. Corpora are very common in practical applications and are generally used for hypothesis testing, statistical analysis, resource querying, verification of linguistic rules, etc. In order to give full play to the function of the corpus in research, preliminary processing is necessary. For example, some small-scale corpora are annotated with PoS tags and other methods, and some mark the lexical prototype. Smaller corpora usually contain 1 million to 3 million lexical items, which are usually annotated and parsed, or deeply parsed in terms of semantics or morphology. A large amount of information can be obtained by studying these annotated corpora, which is of great help to the development of computational linguistics, machine translation, and speech recognition. A corpus can not only produce hidden Markov chains but also support English teaching according to vocabulary usage rates.
There are several classifications of corpora. According to balance and representativeness, corpora can be divided into two categories: balanced corpora and parallel corpora. According to use, they can be divided into two categories: special-purpose and general-purpose corpora. According to time, they can be divided into two categories: diachronic and synchronic corpora. In addition, according to content, they can be divided into two categories: markup and raw corpora. The distribution of the four error types is shown in Table 1.

LSTM Verb Grammar Checking Model Based on the Transformer Attention Model.
Information present before and after a given position in the text is crucial for checking grammatical errors and completing the text accordingly. When the LSTM model based on the attention model is used for syntax error checking, the full-text information is not fully considered, which leads to missing potential information. To deal with this problem, the Bi-LSTM model [25] is adopted. In other words, the forward transfer layer and the backward transfer layer, respectively, obtain the preceding and subsequent text information of the sequence input into the system, and the hidden states of the forward and backward layers at the same input node are combined before being fed to the final layer. The LSTM model based on the attention model [26], combining the positive order and the negative order, is designed as the model structure shown in Figure 1.
It can be seen from Figure 1 that x_1, x_2, . . . , x_t is the sequence entered into the system, h_1, h_2, . . . , h_t represents the hidden nodes of the layer whose transfer direction is forward, and h_1′, h_2′, . . . , h_t′ denotes the hidden nodes of the layer whose transfer direction is backward. Y_1, Y_2, . . . , Y_t denotes the final output sequence. The model proceeds as follows: (1) Both the positive and the reverse sequences are input into the model. (2) The forward and backward layers each produce a semantic encoding so that all the information of the text is taken into account. (3) The semantic coding obtained by combining the forward and reverse semantic codings is the vector used to build the classification system. Logistic regression is used to design the classifier.
In model training, the word vector generation module and the feature extraction module [27,28] adopt a layer-by-layer training method. Each layer first generates the word vectors and then trains the attention-based LSTM feature extraction model to generate its respective semantic coding. In the training and testing of the classifier, the two-layer structure is trained together, and the input of the classifier is the final semantic code generated by combining the positive-order and reverse-order semantics.
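The bidirectional pass described above can be sketched numerically as follows. This is a minimal NumPy illustration with randomly initialised weights and toy dimensions (all sizes are invented), showing how the forward hidden states h_1 … h_t and backward hidden states h′_1 … h′_t are combined position by position:

```python
import numpy as np

def lstm_step(x, h, c, W, b):
    """One LSTM step: gates computed from the input x and previous hidden h."""
    z = W @ np.concatenate([x, h]) + b          # stacked gate pre-activations
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))                # input gate
    f = 1 / (1 + np.exp(-z[H:2 * H]))           # forget gate
    o = 1 / (1 + np.exp(-z[2 * H:3 * H]))       # output gate
    g = np.tanh(z[3 * H:])                      # candidate cell state
    c = f * c + i * g
    return o * np.tanh(c), c

def bi_lstm(xs, W_f, b_f, W_b, b_b, H):
    """Run one LSTM forwards and one backwards over the sequence, then
    concatenate the hidden states at each position (the combination step)."""
    h, c, fwd = np.zeros(H), np.zeros(H), []
    for x in xs:                                # forward pass: h_1 ... h_t
        h, c = lstm_step(x, h, c, W_f, b_f)
        fwd.append(h)
    h, c, bwd = np.zeros(H), np.zeros(H), []
    for x in reversed(xs):                      # backward pass: h'_t ... h'_1
        h, c = lstm_step(x, h, c, W_b, b_b)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(1)
D, H, T = 4, 3, 5                               # input dim, hidden dim, length
xs = [rng.normal(size=D) for _ in range(T)]
W_f, W_b = rng.normal(size=(4 * H, D + H)), rng.normal(size=(4 * H, D + H))
b_f, b_b = np.zeros(4 * H), np.zeros(4 * H)
out = bi_lstm(xs, W_f, b_f, W_b, b_b, H)
print(len(out), out[0].shape)                   # T positions, each 2H wide
```

Each output vector thus carries both the preceding and subsequent context of its position, which is what the classifier then consumes.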

Transformer.
The transformer model contains an encoder and a decoder [29]. Figure 2 shows the structure of the model. Given the source-end erroneous sentence x = (x_1, x_2, . . . , x_m), x_i ∈ X, where X is the source-end vocabulary, the transformer encoder encodes x as a set of hidden states in a continuous space, e = (e_1, e_2, . . . , e_m). Based on this representation, the transformer decoder generates the target-end corrected sentence y = (y_1, y_2, . . . , y_n), y_i ∈ Y, step by step, where Y is the target-end vocabulary. The encoder and decoder in the transformer each contain 6 identical layers. Each encoder layer consists of a self-attention sublayer and a position-wise feed-forward network sublayer: the input first passes through the self-attention sublayer, and then the same feed-forward network is applied to the output at each position of the self-attention sublayer. Each decoder layer also contains a self-attention sublayer and a feed-forward network sublayer; in addition, there is an encoder-decoder attention sublayer between the two, which is similar to the attention layer in the typical encoder-decoder model of the recurrent neural network. At the output of every sublayer of the encoder and decoder, a residual connection is applied, followed by layer normalization.

Attention Mechanism in the Transformer.
Given the query vector q, the key vector set K, and the value vector set V, the scaled dot-product attention is calculated as

Attention(q, K, V) = softmax(qKᵀ / √(d_k)) V.  (1)

Here, d_k represents the dimension of the key vectors, and the scaling factor 1/√(d_k) is introduced to prevent the probability distribution calculated by softmax from becoming too extreme, which would result in vanishingly small gradients when the parameters are updated.
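Equation (1) can be checked with a few lines of NumPy; the dimensions below are arbitrary illustrations:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                     # shift for numerical stability
    return e / e.sum()

def scaled_dot_product_attention(q, K, V):
    """Attention(q, K, V) = softmax(q K^T / sqrt(d_k)) V  -- Equation (1)."""
    d_k = K.shape[1]
    weights = softmax(q @ K.T / np.sqrt(d_k))   # scaling keeps softmax smooth
    return weights @ V, weights

rng = np.random.default_rng(0)
q = rng.normal(size=4)            # one query vector, d_k = 4
K = rng.normal(size=(6, 4))       # 6 key vectors
V = rng.normal(size=(6, 8))       # 6 value vectors, d_v = 8
out, w = scaled_dot_product_attention(q, K, V)
print(out.shape, round(w.sum(), 6))  # output has d_v dims; weights sum to 1
```

The attention weights form a probability distribution over the six value vectors, and the output is their weighted average.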
To allow the model to obtain information from different representation subspaces at the same time when encoding different positions in the sequence, multi-head attention performs multiple scaled dot-product attention calculations in parallel.
Each attention head projects the queries, keys, and values with its own matrices, whose dimensions are, respectively, W_i^Q ∈ R^(d_model×d_k), W_i^K ∈ R^(d_model×d_k), and W_i^V ∈ R^(d_model×d_v). Since the transformer contains no recurrence, a sinusoidal positional encoding is added to the input embeddings:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).

Here, pos is the position label of a symbol in the sequence, and i indicates a component of the position encoding vector.
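The sinusoidal encoding can be sketched as follows (sequence length and model dimension are illustrative):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Standard transformer positional encoding: even components use sine,
    odd components cosine, with geometrically growing wavelengths."""
    pe = np.zeros((max_len, d_model))
    pos = np.arange(max_len)[:, None]            # position label pos
    i = np.arange(0, d_model, 2)                 # component index i
    angle = pos / 10000 ** (i / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

pe = positional_encoding(50, 16)
print(pe.shape)            # one encoding vector per position
print(pe[0, 0], pe[0, 1])  # position 0: sin(0) = 0.0, cos(0) = 1.0
```

Because each position receives a distinct, deterministic vector, the model can recover word order despite processing all positions in parallel.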

Model Training.
When training the transformer, maximum likelihood estimation is used. The goal is to maximize the likelihood of the model on the training data S:

L(θ) = Σ_((x,y)∈S) log P(y | x; θ).
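Maximizing the likelihood is equivalent to minimizing the negative log-likelihood of the reference tokens under the model's per-step output distributions. A toy illustration with an invented 3-word target vocabulary and a 2-token reference sentence:

```python
import numpy as np

def nll(probs, targets):
    """Negative log-likelihood of the target tokens: the quantity
    minimised in maximum likelihood training."""
    return -sum(np.log(p[t]) for p, t in zip(probs, targets))

# Toy per-step distributions over a 3-word target vocabulary.
probs = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
targets = [0, 1]                     # the reference corrected sentence
loss = nll(probs, targets)
print(round(loss, 4))                # -(log 0.7 + log 0.8)
```

In practice this sum runs over every sentence pair in S and every target position, and gradients of the loss drive the parameter updates.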

Experimental Setup.
We train the model on two GeForce RTX 2080 Ti GPUs. The batch size is set to 256, the maximum sentence length is set to 50 (any part exceeding this length is truncated), and updating stops after about 30,000 steps. The source end and the target end use different vocabularies, each consisting of the 30,000 most frequent BPE subword units. When decoding, beam search with a length penalty is used: the beam size is set to 8, the length-penalty parameter is set to 0.6, and the maximum length of the generated corrected sentence is set to 300.
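The text does not spell out the length-penalty formula; the sketch below assumes the commonly used GNMT-style form with the stated parameter 0.6, purely as an illustration of why the penalty matters during beam search:

```python
def length_penalty(length, alpha=0.6):
    """Assumed GNMT-style penalty (not stated in the paper):
    lp(Y) = ((5 + |Y|) / 6) ** alpha."""
    return ((5 + length) / 6) ** alpha

def normalised_score(log_prob, length, alpha=0.6):
    # Beam candidates are ranked by log-probability divided by the penalty,
    # so longer corrected sentences are not unfairly discarded.
    return log_prob / length_penalty(length, alpha)

# The same total log-probability scores better at length 10 than length 5.
print(normalised_score(-6.0, 10) > normalised_score(-6.0, 5))
```

Without such normalisation, beam search systematically prefers short outputs, since every extra token multiplies in another probability below 1.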

Datasets.
The dataset used for the experiments in this paper is the Chinese Learner English Corpus (CLEC), which contains more than one million composition materials written by middle school and college students in China. The editors of the corpus have marked the grammar and errors of all the materials in the corpus. Owing to the complexity and scale of this work, it is called the first corpus officially open to the world that marks language errors for English learners. In this paper, 100 CLEC English compositions were randomly selected for the experiment. There were a total of 1128 sentences in the 100 compositions, including 1083 sentences marked as erroneous.

Evaluation Methods.
We generally use the F-measure (F-score) to evaluate the results and effectiveness of error checking. Therefore, this article uses the F_1-measure, calculated as

F_1 = 2PR / (P + R).

Here, P stands for precision and R stands for recall.
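The F_1 computation can be illustrated directly; the counts below are invented for illustration and are not experimental results from this paper:

```python
def f1_score(tp, fp, fn):
    """F1 from error counts: P = tp / (tp + fp), R = tp / (tp + fn),
    F1 = 2PR / (P + R)."""
    p = tp / (tp + fp)   # precision: flagged errors that were real
    r = tp / (tp + fn)   # recall: real errors that were flagged
    return 2 * p * r / (p + r)

# Illustrative counts: 80 errors correctly flagged, 10 false alarms,
# 30 errors missed by the checker.
print(round(f1_score(80, 10, 30), 4))
```

F_1 is the harmonic mean of precision and recall, so a checker cannot score well by excelling at only one of the two.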

Experimental Results.
As can be seen from Table 2, the algorithm in this paper improves the accuracy of judging whether the grammar of a sentence is correct, and at the same time, the F-value is improved to a certain extent. In addition, this paper also conducted a comparison with a system of the same type, namely, Jouku.com; the results show that Jouku.com has a recall rate of 0.487 for grammatical errors, while its accuracy rate is 0.897. Therefore, our system has a higher accuracy than Jouku.com, but there is a certain gap in recall rate. Analysis suggests that an insufficient number of rules causes the lower recall rate: there are fewer than 1100 rules in the system studied in this paper, while there are more than 5000 rules in Jouku.com. Therefore, in order to improve the recall rate, the system must increase the quantity and quality of its rules. In addition, a comparison with GCSCL on the CLEC test set showed that our system had an accuracy rate of 77%, and the recall rate was improved. Figure 3 shows the training loss curve, which can be seen to be very smooth. In addition, Figures 4-6 show histograms of precision, recall, and F-score, which clearly show the effectiveness of our model.

Conclusion
This paper proposes an error detection algorithm for English verbs based on a recurrent neural network. Since LSTM can effectively retain contextual information during training, this paper uses LSTM to model the labeled training corpus. At the same time, how to convert the text information in English compositions into numerical values for subsequent calculation is also an essential step in automatic review. Most mainstream tools use the bag-of-words model, where each word is encoded according to its order in the dictionary. Although this encoding method is simple and easy to use, it causes the vector to lose the sequence information of the text and is prone to dimensional disaster. Therefore, this paper uses the word embedding model to encode the text and maps the text information to a low-dimensional vector space. The text's position information is not lost, and the dimensional disaster is also avoided. Next, this paper collects specific corpus samples and compares the proposed algorithm with Jouku and Bingguo, respectively. The verification results show the superiority of the algorithm in verb error detection.

Data Availability
The data used to support the findings of this study are included within the article.