NTOU Chinese Grammar Checker for CGED Shared Task

Grammatical error diagnosis is an essential part of a language-learning tutoring system. For the second Chinese grammatical error diagnosis task, we propose a new system that measures the likelihood of sentences generated by deleting, inserting, or exchanging characters or words. Two sentence likelihood functions are proposed, based on frequencies from a space-removed version of the Google n-grams. The best system achieved a precision of 23.4% and a recall of 36.4% at the identification level.


Introduction
Although Chinese grammar is not defined as clearly as English grammar, native Chinese speakers can easily identify grammatical errors in sentences. Grammar is one of the most difficult parts of learning Chinese for foreigners, who are often uncertain about the proper grammar for constructing sentences. Developing a Chinese grammar checker to help learners is therefore an interesting research topic. Several studies have focused on Chinese (Wu et al., 2010; Chang et al., 2012; Yu and Chen, 2012; Tseng et al., 2014).
In NLPTEA-1 (Yu et al., 2014), the first Chinese grammatical error diagnosis evaluation project, the organizers defined four kinds of grammatical errors: redundant, missing, selection, and disorder. The evaluation was based on detecting whether an error occurs in a sentence, disregarding its location and correction. For that task, we developed an error detection system based on machine learning.
NLPTEA2-CGED (Lee et al., 2015), however, requires systems to report the location of a detected error. To meet this requirement, two new systems are proposed in this paper. The first is an adaptation of our machine-learning classifier that takes location information into account. The second employs hand-crafted rules to predict the locations of errors.
We also designed two scoring functions to measure the likelihood of a sentence. In total, three runs were submitted to the NLPTEA2-CGED task. Evaluation results showed that the rule-based systems achieved better performance.
This paper is organized as follows. Section 2 defines the Chinese grammatical error diagnosis task. Section 3 presents our newly proposed n-gram statistics-based systems. Section 4 briefly describes our SVM classifier. Section 5 reports the evaluation results, and Section 6 concludes the paper.

Task Definition
The task of Chinese grammatical error diagnosis (CGED) in NLPTEA2 is defined as follows. Given a sentence, a CGED system should first decide whether any of four types of errors occurs in the sentence: redundant, missing, selection, or disorder. If an error is found, the system should report its beginning and ending locations.
Training data provided by the task organizers contain the error types and corrected sentences.

N-gram Statistics-Based System
In addition to the classifiers developed for the previous CGED task (Yu et al., 2014), we propose a new method for building a CGED system based on n-gram statistics from the World Wide Web.
Our assumption is that a corrected sentence has a larger probability than an erroneous one; i.e., deleting unnecessary characters, adding necessary characters, or exchanging the locations of misplaced words results in a better sentence. Our system tries to delete, insert, or exchange characters or words in a given sentence to see whether the newly generated sentence receives a higher likelihood score. The steps and details are described in this section.

Sentence Likelihood Scores
Since our method relies heavily on the likelihood of a sentence being seen in Chinese, it is important to choose a good scoring function to measure that likelihood. Although an n-gram language model is a common choice, a very large corpus with word-segmentation information is not easy to obtain. An alternative is to use Google n-gram frequency data.
Chinese Web 5-gram is real data released by Google Inc., consisting of unigrams to 5-grams collected from webpages on the World Wide Web. The frequencies of these n-grams are also provided. Some examples from the Chinese Web 5-gram dataset are given here:
We proposed several sentence likelihood scoring functions when dealing with Chinese spelling errors (Lin and Chu, 2015). To avoid interference from word segmentation errors, however, we further designed likelihood scoring functions that use substring frequencies instead of word n-gram frequencies. By removing the spaces between words in the n-grams of the Chinese Web 5-gram dataset, we constructed a new dataset of strings with their web frequencies. For instance, the n-grams in the previous example become:
Note that if two different n-grams become the same string after the spaces are removed, they are merged into one entry whose frequency is the sum of their frequencies. Simplified Chinese words were translated into Traditional Chinese in advance.
Given a sentence S, let SubStr(S, n) be the set of all substrings of S whose lengths are n bytes. We define the Google String Frequency gsf(u) of a string u of length n to be its frequency in the modified Chinese Web 5-gram dataset. If a string does not appear in that dataset, its gsf value is defined to be 0.
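The construction of the space-removed dataset and the lookups SubStr and gsf can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function names are hypothetical, the n-gram counts are invented toy values rather than figures from the Chinese Web 5-gram dataset, and lengths are counted in characters rather than bytes for simplicity.

```python
from collections import Counter


def build_string_frequencies(ngram_counts):
    """Merge space-separated n-grams into space-free strings.

    n-grams that collapse to the same string are merged into one
    entry whose frequency is the sum of their frequencies.
    """
    table = Counter()
    for ngram, freq in ngram_counts:
        table["".join(ngram.split())] += freq
    return table


def gsf(table, u):
    """Frequency of string u in the merged table; 0 if u is absent."""
    return table.get(u, 0)


def substrings(sentence, n):
    """Set of all substrings of `sentence` with length n.

    Assumption: n counts characters here, whereas the text counts bytes.
    """
    return {sentence[i:i + n] for i in range(len(sentence) - n + 1)}


# Toy counts, invented purely for illustration.
toy_ngrams = [("想 到", 10), ("想到", 5), ("見 面", 7)]
table = build_string_frequencies(toy_ngrams)
```

With these toy counts, the bigram "想 到" and the unigram "想到" collapse to the same string and share a single entry, which is the merging behavior described above.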
Two new sentence likelihood scoring functions are defined as follows. Equation 1 gives the definition of the length-weighted string log-frequency score SL(S), in which each substring of S with length n contributes the logarithm of its Google string frequency multiplied by n. Because we consider short strings not very meaningful, this function only considers strings no shorter than 6 bytes (i.e., a two-character Chinese word or a bigram of one-character Chinese words). Equation 2 gives a macro-averaging version of Equation 1, in which scores are averaged within each length before summation over the different lengths.
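One possible reading of the two scoring functions, written out from the verbal description only, is sketched below. Several points are assumptions rather than facts from the text: lengths are counted in characters (two characters standing in for the 6-byte minimum), the summation runs up to the full sentence length, and substrings with zero frequency are skipped because their logarithm is undefined. The function names and the toy frequencies are hypothetical.

```python
import math

MIN_LEN = 2  # assumption: 2 Chinese characters stand in for the 6-byte minimum


def substrings(sentence, n):
    """Set of all length-n substrings (n counted in characters)."""
    return {sentence[i:i + n] for i in range(len(sentence) - n + 1)}


def sl_score(sentence, freq):
    """Length-weighted string log-frequency score.

    Each substring of length n contributes n * log(frequency).
    Assumption: substrings with zero frequency contribute nothing.
    """
    total = 0.0
    for n in range(MIN_LEN, len(sentence) + 1):
        for u in substrings(sentence, n):
            f = freq.get(u, 0)
            if f > 0:
                total += n * math.log(f)
    return total


def sl_macro_score(sentence, freq):
    """Macro-averaged variant: average within each length, then sum."""
    total = 0.0
    for n in range(MIN_LEN, len(sentence) + 1):
        subs = substrings(sentence, n)
        length_sum = sum(
            n * math.log(freq[u]) for u in subs if freq.get(u, 0) > 0
        )
        total += length_sum / len(subs)
    return total


# Toy frequencies, invented for illustration only.
toy_freq = {"想跟": 100, "跟你": 10, "想跟你": 10}
```

In the macro-averaged variant, a length with many substrings no longer dominates the total simply because it has more terms, which appears to be the motivation for averaging within each length.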

Character Deletion (Case of Redundant)
To test whether a sentence has a redundant character, a set of new sentences is generated by removing characters from the original sentence one at a time. If any of the new sentences has a higher likelihood score than the original sentence, a redundant-type error may be present.
Because the experimental data are essays written by foreign students learning Chinese, some redundant errors recur across different students. To avoid generating too many new sentences, we only delete characters belonging to the frequent redundant errors, i.e., those that occurred at least three times. There were 23 of them, covering 66% of the redundant errors in the training data. Examples of character deletion are as follows, where 很 and 到 are frequent redundant errors.
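The restricted deletion step might be sketched as below. This is only an illustration under stated assumptions: the function name is hypothetical, the set of frequent redundant characters is passed in as a parameter (only 很 and 到 are named in the text; the full list of 23 is not given here), and the test sentence is the unsegmented form of the B1-0764 sentence quoted in the Selection-Error Detection section.

```python
def deletion_candidates(sentence, frequent_redundant):
    """Generate new sentences by deleting one character at a time.

    Only characters in `frequent_redundant` are deleted, which keeps
    the number of candidates small. Returns (position, new_sentence)
    pairs, where position is the index of the deleted character.
    """
    candidates = []
    for i, ch in enumerate(sentence):
        if ch in frequent_redundant:
            candidates.append((i, sentence[:i] + sentence[i + 1:]))
    return candidates


# Two of the frequent redundant characters named in the text.
frequent = {"很", "到"}
candidates = deletion_candidates("我很想到跟你見面", frequent)
```

Each candidate keeps the position of the deleted character, so that the error location can be reported if the candidate is later selected.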

Character Insertion (Case of Missing)
To test whether a sentence has a missing character, a set of new sentences is generated by inserting a character into the original sentence at each position (including the beginning and the end). If any of the new sentences has a higher likelihood score than the original sentence, a missing-type error may be present.
Similarly, some missing errors are commonly seen across the essays written by foreign students learning Chinese.
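A rough sketch of the insertion step follows. It assumes, by analogy with the deletion case, that only a small inventory of frequently missing characters is tried; that inventory is a parameter here, and the character used in the example (的) is a placeholder chosen for illustration, not an entry taken from the training data. The function name is hypothetical.

```python
def insertion_candidates(sentence, insertable):
    """Generate new sentences by inserting one character at each position.

    Positions run from 0 (before the first character) to len(sentence)
    (after the last character). Returns (position, character,
    new_sentence) triples.
    """
    candidates = []
    for i in range(len(sentence) + 1):
        for ch in insertable:
            candidates.append((i, ch, sentence[:i] + ch + sentence[i:]))
    return candidates


# Placeholder inventory, for illustration only.
candidates = insertion_candidates("我想", ["的"])
```

A sentence of k characters and an inventory of m characters yields (k + 1) * m candidates, which is why limiting the inventory matters.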

Word Exchanging (Case of Disorder)
To test whether a sentence has a disorder error, the original sentence is word-segmented, and a set of new sentences is generated by exchanging words in the original sentence, one pair at a time.
If any of the new sentences has a higher likelihood score than the original sentence, a disorder-type error may be present.
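The pairwise exchange can be sketched as follows, assuming the sentence has already been word-segmented by some external segmenter (not shown) and that every pair of word positions is tried, not only adjacent pairs. Both points are assumptions, and the function name is hypothetical.

```python
from itertools import combinations


def exchange_candidates(words):
    """Generate new sentences by swapping one pair of words at a time.

    `words` is the word-segmented sentence. Returns ((i, j), new_sentence)
    pairs, where i < j are the positions of the two exchanged words.
    """
    candidates = []
    for i, j in combinations(range(len(words)), 2):
        swapped = list(words)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        candidates.append(((i, j), "".join(swapped)))
    return candidates


candidates = exchange_candidates(["我", "很", "想"])
```

A sentence of k words produces k(k - 1)/2 candidates under this all-pairs assumption.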
Examples of word exchange are as follows.

Error Decision
All the new sentences, whether generated by removing characters, inserting characters, or exchanging words, are scored by the sentence likelihood functions. The generation type and modification location of the top-1 new sentence are reported as the error type and error location.
If no new sentence scores higher than the original, the sentence is reported as a "Correct" case.
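The decision rule can be illustrated with the short sketch below. The function name, the candidate tuple layout, and the toy scores are all hypothetical; `score` stands for whichever sentence likelihood function is in use.

```python
def decide(original, candidates, score):
    """Pick the error type and location of the top-scoring candidate.

    `candidates` is a list of (error_type, location, new_sentence)
    tuples. If no candidate scores strictly higher than the original
    sentence, the sentence is reported as "Correct".
    """
    best = max(candidates, key=lambda c: score(c[2]), default=None)
    if best is None or score(best[2]) <= score(original):
        return ("Correct", None)
    return (best[0], best[1])


# Toy scores, invented for illustration only.
toy_scores = {
    "我很想到跟你見面": 1.0,
    "我想到跟你見面": 2.0,
    "我很想跟你見面": 3.0,
}
toy_score = lambda s: toy_scores.get(s, 0.0)

candidates = [
    ("Redundant", 1, "我想到跟你見面"),
    ("Redundant", 3, "我很想跟你見面"),
]
decision = decide("我很想到跟你見面", candidates, toy_score)
```

Because all candidate types compete in a single ranking, only one error type and location is reported per sentence in this sketch.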

Selection-Error Detection
If an error detected in Section 3.5 is a redundant case, it may instead be a Selection-type error. If the deleted character occurs in a multi-character word in the original sentence, the error is reported as a Selection-type error.
[B1-0764] Redundant => Selection
org: 我 很 想到 跟 你 見面 (I really want to to meet you.)
new: 我 很 想 跟 你 見面 (I really want to meet you.)
Similarly, if an error detected in Section 3.5 is a missing case, it may instead be a Selection-type error. To make the decision, the new sentence is also word-segmented. If the inserted character occurs in a multi-character word in the new sentence, the error is reported as a Selection-type error.
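The relabeling rule can be sketched as below, using the segmentation shown in the B1-0764 example. The function names are hypothetical, and word segmentation itself is assumed to be done elsewhere. For a redundant case, `words` is the segmented original sentence; for a missing case, this sketch assumes `words` is the segmented new sentence, since that is where the inserted character is located.

```python
def word_containing(words, char_index):
    """Return the word that covers the given character position."""
    start = 0
    for w in words:
        if start <= char_index < start + len(w):
            return w
        start += len(w)
    raise IndexError("character index outside the sentence")


def relabel(error_type, words, char_index):
    """Relabel a redundant or missing error as Selection when the
    deleted or inserted character lies inside a multi-character word."""
    if error_type in ("Redundant", "Missing"):
        if len(word_containing(words, char_index)) > 1:
            return "Selection"
    return error_type


# Segmentation of the original sentence in example B1-0764.
org_words = ["我", "很", "想到", "跟", "你", "見面"]
label = relabel("Redundant", org_words, 3)  # character 到 inside 想到
```

Deleting 很 at position 1 would remain a Redundant case under this rule, because 很 forms a single-character word in the segmentation.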

Error Detection by Machine Learning
We also modified our previous CGED system, which participated in NLPTEA-1, to perform error detection.

Experiments
Three formal runs from our systems were submitted to NLPTEA2-CGED this year. The first run was created by the SVM classifier. The second run was created by the newly proposed CGED system with the original version of the length-weighted string log-frequency function. The third run was created by the newly proposed CGED system with the macro-averaging version of the length-weighted string log-frequency function. Table 3 shows the evaluation results of our three formal runs. All results suggest that a system using the length-weighted string log-frequency function achieves better performance than an SVM classifier.

Conclusion
This is the second Chinese grammatical error diagnosis task. We proposed three systems for the task. One is an SVM classifier whose features are length, the number of infrequent word bigrams, and the occurrence of stop POS bigrams. The other two measure the likelihood of new sentences generated by deleting, inserting, or exchanging characters or words. Two sentence likelihood functions were proposed, based on frequencies of space-removed Google n-grams. The second system performed better than the other two, achieving a precision of 23.4% and a recall of 36.4%. Although this performance does not seem good enough, our system was ranked second at the identification level and third at the position level, which suggests that the task is very hard. More rules and features should be studied in the future.
Table 2 shows the most frequent missing errors in the training data.

Table 3. Evaluation Results of NTOU Runs