Hybrid Neural Network for Automatic Recovery of Elliptical Chinese Quantity Noun Phrases

In Mandarin Chinese, when the noun head appears in the context, a quantity noun phrase can be reduced to a quantity phrase with the noun head omitted. This phrase structure is called an elliptical quantity noun phrase. The automatic recovery of elliptical quantity noun phrases is crucial for syntactic parsing, semantic representation and other downstream tasks. In this paper, we propose a hybrid neural network model that identifies the semantic category of an elliptical quantity noun phrase and recovers the omitted semantics by supplementing concept categories. First, we use BERT to generate character-level vectors. Second, Bi-LSTM is applied to capture the context information of each character and compress the input into the context memory history. Then CNN is utilized to capture the local semantics of n-grams with various granularities. Based on the Chinese Abstract Meaning Representation (CAMR) corpus and the Xinhua News Agency corpus, we construct a hand-labeled elliptical quantity noun phrase dataset and carry out semantic recovery of elliptical quantity noun phrases on this dataset. The experimental results show that our hybrid neural network model effectively improves the performance of semantic completion for elliptical quantity noun phrases.


Introduction
Ellipsis is a widespread linguistic phenomenon, and ellipsis recovery aims to restore the missing elements of a sentence based on the surrounding discourse. Humans can quickly restore the omitted semantics from context, but this remains a big challenge for machine understanding. The gap in syntax and the omission in semantics create obstacles for many NLP tasks such as syntactic parsing, machine translation and information extraction. Therefore, ellipsis recovery has become one of the fundamental tasks in natural language processing. The quantity noun phrase is one of the most common structures in Chinese, in which a quantity phrase modifies and specifies the noun head. However, in some contexts the noun head is omitted, and the quantity phrase alone represents the meaning of the whole quantity noun phrase. According to Dai et al. [1], elliptical quantity phrases account for 38.7% of the omissions in the Chinese Abstract Meaning Representation (CAMR) corpus. As the omitted noun bears the core meaning of the quantity noun phrase, restoring it is necessary and critical for the downstream tasks of natural language understanding [2]. However, this research topic has been neglected by most of the research community. To fill this gap, we propose a hybrid neural network model that automatically complements elliptical quantity noun phrases in Chinese.
Quantity noun phrase, usually in the form of "numerical word + quantifier + noun", is a common type of phrase in modern Chinese, in which numerical words and quantifiers are the noun's modifiers. For example, in the case of " (one) (unit) (student)", the numeral " (one)" and the quantifier " (unit)" are the modifiers of the noun head " (student)". In actual language use, following the economic principle of pragmatics, if the head of a quantity noun phrase appears in the context as an antecedent, the noun head can be omitted and the quantity noun phrase is reduced to a quantity phrase and forms an elliptical quantity noun phrase. For example:

The clerk had no choice but to pour spirits for Wu Song. Wu Song drank eighteen bowls [of spirits] altogether.
In CAMR, the concept nodes of omitted arguments are allowed to be added, which makes the semantic representation of sentences more natural and comprehensive. As is shown in Fig. 1, CAMR recovers the noun head of the elliptical quantity noun phrase " (eighteen bowls)" by adding the concept thing. The quantity phrase " (eighteen bowls)" is the elliptical form of the quantity noun phrase " [ ](eighteen bowls of spirits)", as the noun head " (spirits)" appears in the previous sentence. In our work, we realize semantic recovery by supplementing concept categories that are automatically distinguished by our model. Semantic recovery of elliptical quantity noun phrases will help the subsequent Chinese Abstract Meaning Representation (CAMR) parsing by improving the accuracy of the parser; hence, it is beneficial to subsequent work based on the CAMR such as machine translation [3,4], text summarization [5], event extraction [6] and other works.
Abstract Meaning Representation (AMR) is a method that represents the semantics of a sentence with a single rooted directed acyclic graph [7]. What distinguishes it from other sentential semantic representation methods is that elliptical arguments are also represented with concept nodes in its semantic graph [8]. There are 109 concepts overall in its Name Entity List that can be used to represent elliptical arguments in AMR. Inspired by AMR [9], we use the concepts from the Name Entity List of AMR to denote the omitted noun heads in elliptical quantity noun phrases [10]. In the case " (Wu Song drank eighteen bowls [of spirits] altogether.)", we use the concept "thing" to represent the head that is omitted by the quantity phrase " (eighteen bowls)". Moreover, in our work, we build an elliptical quantity noun phrase dataset based on the Chinese AMR corpus [11] and the Xinhua News Agency corpus [12].
Since both the characters and the words in an elliptical quantity noun phrase are crucial for recovering the omitted noun head, we focus on the information of n-grams with various granularities in the target sentence. We utilize a hybrid neural network to obtain sufficient semantic information from the elliptical quantity noun phrase and its context for the omission recovery. We aim to recover the omitted head of an elliptical quantity noun phrase automatically with a hybrid neural network model that combines Bidirectional Encoder Representations from Transformers (BERT) [13], a bidirectional long short-term memory network (Bi-LSTM) [14] and a convolutional neural network (CNN) [15].
To be specific, firstly, we utilize BERT to generate character-level vectors. Secondly, we utilize Bi-LSTM to capture both the preceding and following context information of each character and compress the input into the context memory history. Then CNN is utilized to capture the local semantics of n-grams with various granularities. Finally, the semantic representation of elliptical quantity noun phrase and its context are both used to predict the probabilities of the concepts that can represent the elliptical component.
To our knowledge, this is the first work to apply deep learning to a Chinese elliptical quantity noun phrase dataset, filling a gap in the research on the semantic completion of elliptical quantity noun phrases. The main contributions of this paper are as follows: 1. We build a hand-labeled dataset of elliptical quantity noun phrases. 2. We propose the first method based on a hybrid neural network model to complement elliptical quantity noun phrases. We anticipate that our model can recover the omitted noun head of elliptical quantity noun phrases with high accuracy and thereby help to improve downstream tasks.

Related Work
In this paper, we utilize a neural network model to predict the concept that complements an elliptical quantity noun phrase. Our work is related to previous work on quantity noun phrases, ellipsis recovery and AMR.

Quantity Noun Phrase
Existing semantic research on quantity noun phrases mainly focuses on boundary recognition, mostly adopting rule-based and database-based methods to identify the structural features of quantity noun phrases. Zhang et al. [16] built a database that provides vocabulary knowledge and phrase structure knowledge for numerical phrase recognition and implemented the boundary recognition of "numerical word + quantifier". Xiong et al. [17] proposed a rule-based Chinese quantitative phrase recognition method that builds on Zhang's method and requires no word segmentation. Fang et al. [18] proposed a backoff algorithm to obtain the quantifier-noun collocations that are not collected in dictionaries, which greatly improved recall in quantity noun phrase recognition. However, no researchers have yet paid attention to the semantic recovery of elliptical quantity noun phrases. Therefore, this paper aims to recover the omitted noun heads of elliptical quantity noun phrases to provide an accurate and comprehensive sentential semantic representation for downstream semantic analysis and applications.

Ellipsis Recovery
In recent years, there has been a lot of work on semantic omission recovery in Chinese. Shi et al. [19] explored a method to automatically complement the omitted head of elliptical DE phrases with a hybrid model that combines a densely connected Bi-LSTM (DC-Bi-LSTM) and CNN. Zhang et al. [20] proposed a neural framework to uniformly integrate two subtasks, VPE (Verb Phrase Ellipsis) detection and VPE resolution. For VPE detection, Zhang chose an SVM model with a non-linear kernel function, a simple multilayer perceptron (MLP) and the Transformer. For VPE resolution, Zhang applied an MLP and the Transformer model, and finally proposed a novel neural framework to integrate the two subtasks uniformly. Wu et al. [21] presented CorefQA (Coreference Resolution as Query-based Span Prediction), an accurate and extensible approach for the coreference resolution task. Wu formulated the problem as a span prediction task, as in question answering: a query is generated for each candidate mention using its surrounding context, and a span prediction module is employed to extract the spans of the coreferences within the document using the generated query. Existing approaches to omission recovery are tailored to the specific phrase structure of each task and might not be suitable for all omission recovery tasks. We learn from their methods and research ideas, and carry out omission recovery on elliptical quantity noun phrases in this article.

Abstract Meaning Representation (AMR)
AMR [9] represents the semantics of sentences with a single rooted directed acyclic graph and allows sharing arguments. As it is more specific and accurate in semantic representation than other semantic representation methods, it has been applied to and proved effective in many NLP tasks such as event extraction [22], natural language modeling [23], text summarization [24], machine translation [25] and question answering [26]. Chinese AMR corpus (CAMR) was released in 2019 containing 10,419 Chinese sentences annotated with 109 concepts. We create our own dataset by extracting the elliptical quantity noun phrases from CAMR corpus and Xinhua News Agency corpus [12].
Building on the existing work on omission recovery, we use neural network methods to implement semantic completion for elliptical quantity noun phrases in this paper. We regard the head complement of elliptical quantity noun phrases as a classification task, and propose a hybrid model to determine the concept that corresponds to the omitted noun head.

Model Overview
An elliptical quantity noun phrase in a Chinese sentence can be denoted as:
$$S = \{s_1, \ldots, s_{y-1}, s^q_y, s^q_{y+1}, \ldots, s^q_{y+m}, s_{y+m+1}, \ldots, s_n\}$$
where $s_i$ represents a character in the sentence, $1 \le i \le n$, $1 \le y \le n$, and $m$ is the length of the elliptical quantity noun phrase. $\{s_1, \ldots, s_{y-1}\}$ represents the preceding context of the quantity noun phrase, $\{s^q_y, s^q_{y+1}, \ldots, s^q_{y+m}\}$ represents the elliptical quantity noun phrase or quantity noun phrase, and $\{s_{y+m+1}, \ldots, s_n\}$ represents the following context. This paper aims to select the correct concept to complement the omitted head of the elliptical quantity noun phrase. We use the powerful BERT pre-training model, which generates character-level embedding vectors that incorporate contextual information. Subsequently, Bi-LSTM is used to further generate a textual representation based on contextual semantics. Then a CNN is used to extract features from the sentence: after Bi-LSTM semantic encoding, the feature extraction module obtains features at various granularities.
The overview of the model framework proposed in this paper is shown in Fig. 2. Overall, the model has four principal modules:
• Input Module: The BERT pre-training model is used to generate character-level vectors combined with contextual semantics.
• Semantic Encoding Module: The Bi-LSTM model is utilized to obtain the semantic information of each word in the elliptical quantity noun phrase and its context.
• Feature Extraction Module: The CNN model and max pooling are used to capture the semantics of n-grams with various granularities.
• Output Module: Generates the predicted probability distribution over all concepts.
We introduce the details of these four modules in the rest of this section.

Input Module
In the input module, we feed the data set of elliptical quantity noun phrases into the BERT pre-training module by character to obtain character vectors combined with contextual semantics. We take the output of this module based on the character-level vector of BERT as the input of the semantic encoding module.
BERT is a pre-trained language model proposed in 2018. Due to its powerful text representation capability, the model provides strong support in many different natural language processing tasks [27], especially in semantic analysis and semantic similarity computing. BERT pre-trains a general "language understanding" model in an unsupervised manner on a very large-scale corpus. When it is applied to other downstream tasks, it only needs to fine-tune the parameters of the network according to the specific task requirements [28].
The sub-structure of BERT is the encoder block of Transformer [29]. BERT abandons the recurrent network structure of RNN, regards the Transformer encoder as the main structure of the model, and uses the self-attention mechanism to model sentences. Its model structure is shown in Fig. 3.
In Eq. (1), two special tokens need to be embedded: [CLS] is the tag at the beginning of each sentence and [SEP] is the tag at the end of the sentence. Since an elliptical quantity noun phrase always appears within one sentence, each input contains a single sentence and there is no second sentence to tag. For the input, we therefore fill the segment embedding with 0s. Besides, the position embedding is mainly used to encode the sequence order. The BERT model jointly adjusts the internal two-way Transformer encoder and uses the self-attention mechanism to learn the contribution of the remaining characters in the context to the current character, thereby enhancing the acquisition of contextual semantic information. Finally, the BERT model generates character-level embedding vectors $X = \{x_1, \ldots, x_n\}$ based on the current context.
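The input layout described above can be illustrated with a minimal sketch. It assumes a tokenizer that splits Chinese text into single characters; the function name `build_bert_inputs` and the example sentence are hypothetical and not taken from the dataset.

```python
def build_bert_inputs(sentence):
    """Build character-level inputs for a single-sentence BERT encoder.

    Hypothetical helper for illustration: it only prepares the token,
    segment and position sequences; it does not run BERT itself.
    """
    # [CLS] marks the beginning of the sentence, [SEP] marks its end.
    tokens = ["[CLS]"] + list(sentence) + ["[SEP]"]
    # Only one sentence is present, so every segment id is 0.
    segment_ids = [0] * len(tokens)
    # Position ids encode the order of the sequence.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids


# Hypothetical example sentence ("He bought three [books]").
tokens, segment_ids, position_ids = build_bert_inputs("他买了三本")
```

In BERT, the embeddings looked up from these three sequences are summed per position before entering the Transformer encoder.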

Semantic Encoding Module
Semantic encoding module obtains the character-level vector X from the previous module as input. This module uses the two-layer Bi-LSTM model to perform semantic encoding and passes the Bi-LSTM hidden layer value h as the output to the local feature extraction module.
Long short-term memory neural network (LSTM) is a specific type of recurrent neural network (RNN) [30] that introduces memory mechanism and forgetting mechanism for neurons in hidden layers [31]. It effectively addresses the issue of long-distance dependence in RNN and avoids the gradient exploding or vanishing in the stage of error backpropagation calculation caused by long input sequence.
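As shown in Fig. 4, each memory element of LSTM contains one or more memory blocks and three adaptive multiplication gates, i.e., Input Gate, Output Gate and Forget Gate, through which information can be saved and controlled. LSTM computes the hidden states $\{h_1, h_2, \ldots, h_t\}$ by iterating the following equation:
$$h_t = \mathrm{LSTM}(h_{t-1}, x_t)$$
At time step $t$, the memory $c_t$ and the hidden state $h_t$ are updated with the following equations, written here in the standard LSTM form:
$$\begin{bmatrix} i_t \\ f_t \\ o_t \\ g_t \end{bmatrix} = \begin{bmatrix} \sigma \\ \sigma \\ \sigma \\ \tanh \end{bmatrix} \left( W \begin{bmatrix} x_t \\ h_{t-1} \end{bmatrix} + b \right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t$$
$$h_t = o_t \odot \tanh(c_t)$$
where $x_t$ is the input at the current time step, $W$ and $b$ are the trainable weight matrix and bias, $i$, $f$, $o$, $g$ are respectively the input gate, forget gate, output gate and new candidate memory state, $\sigma$ and $\tanh$ are respectively the sigmoid and hyperbolic tangent activation functions, and $\odot$ denotes element-wise multiplication.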
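The gated update described above can be sketched in plain Python. This is an illustrative single step of a standard LSTM cell, not the authors' implementation; the stacked weight layout (rows ordered input, forget, output, candidate) and the toy values are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has 4*d rows (input, forget, output, candidate blocks of d rows each),
    every row of length len(x_t) + len(h_prev); b has 4*d entries.
    """
    d = len(h_prev)
    z = list(x_t) + list(h_prev)  # concatenation [x_t; h_{t-1}]
    pre = [sum(w * v for w, v in zip(row, z)) + b_k for row, b_k in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:d]]            # input gate
    f = [sigmoid(p) for p in pre[d:2 * d]]        # forget gate
    o = [sigmoid(p) for p in pre[2 * d:3 * d]]    # output gate
    g = [math.tanh(p) for p in pre[3 * d:4 * d]]  # candidate memory
    # New memory: keep part of the old memory, add part of the candidate.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # New hidden state: gated view of the squashed memory.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t


# Toy check with all-zero weights: every gate is 0.5 and the candidate is 0,
# so the memory is simply halved.
W_zero = [[0.0, 0.0] for _ in range(4)]
b_zero = [0.0] * 4
h_t, c_t = lstm_step([1.0], [0.0], [2.0], W_zero, b_zero)
```

A Bi-LSTM runs one such recurrence from left to right and another from right to left over the character vectors, then concatenates the two results.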

Figure 4: The Bi-LSTM architecture
However, the LSTM model, as a unidirectional neural network, is unable to learn the context information sufficiently, so Bi-LSTM was developed to improve semantic representation. In our work, we feed the character vectors $X = \{x_1, x_2, \ldots, x_{y-1}, x^q_y, x^q_{y+1}, \ldots, x^q_{y+m}, x_{y+m+1}, \ldots, x_n\}$ pre-trained by BERT to the two-layer Bi-LSTM. The output of the first Bi-LSTM layer is fed to the second Bi-LSTM layer as input, and the output of the second layer is the concatenation of the last unit outputs of the forward and backward passes. The final output $h$ is obtained through the stacked Bi-LSTM layers:
$$h = [\overrightarrow{h}; \overleftarrow{h}]$$
where $\overrightarrow{h}$ and $\overleftarrow{h}$ denote the forward and backward outputs of the second layer. The architecture of the proposed "BERT + Bi-LSTM" model is shown in Fig. 5.

Local Feature Extraction Module
The local feature extraction module takes the combination of the output of the semantic encoding module and the output of the BERT pre-training layer as its input. It passes this input through the convolutional neural network and max-pooling feature extraction, and sends the result to the output module.
CNN is an effective model for extracting semantic features and capturing salient features in a flat structure [32], because it is capable of capturing the local semantics of n-grams with various granularities. In a sentence containing an elliptical quantity phrase, the omitted noun head can usually be inferred from the context, as the words in the context are informative. Therefore, for the feature extraction module, we use a CNN and a max-pooling layer to extract features. Meanwhile, in order to obtain more effective information and avoid the loss of semantic information during semantic encoding, we concatenate the character vectors obtained by BERT pre-training with the semantic vectors generated by Bi-LSTM and input them into the feature extraction module.
We use a convolution filter $F \in \mathbb{R}^{\omega \times d}$ to obtain the feature map $Y \in \mathbb{R}^{n-\omega+1}$, whose $j$-th element $y_j$ is given by:
$$y_j = f(W \cdot x_{j:j+\omega-1} + b)$$
where $x_{j:j+\omega-1}$ is the window of $\omega$ consecutive input vectors starting at position $j$, $f$ is the ReLU activation function, $W$ is the weight matrix of the convolution filter, $b$ is a bias, $\omega$ is the length of the filter, and $d$ is the dimension of the word vector. The convolutional layer uses multiple filters in parallel to obtain feature maps. We can also use convolutional filters of various lengths to extract different feature information.
The feature maps are fed to the max-pooling layer, which keeps the most salient information in each map to form the feature vectors. Max pooling has proved effective in improving model performance.
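The convolution and max-pooling steps described above can be sketched in plain Python. This is a minimal illustration that assumes the input has at least as many positions as the longest filter; the function names and toy values are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def conv1d_feature_map(X, W, b):
    """Slide one filter W (omega x d) over X (n x d).

    Returns the feature map of length n - omega + 1.
    """
    n, omega = len(X), len(W)
    feature_map = []
    for j in range(n - omega + 1):
        window = X[j:j + omega]  # omega consecutive input vectors
        s = sum(w * x
                for w_row, x_row in zip(W, window)
                for w, x in zip(w_row, x_row))
        feature_map.append(relu(s + b))
    return feature_map


def extract_features(X, filters):
    """Apply each (W, b) filter and keep the maximum of its feature map."""
    return [max(conv1d_feature_map(X, W, b)) for W, b in filters]


# Toy input: n = 4 positions, d = 2 dimensions.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bigram_filter = ([[1.0, 0.0], [0.0, 1.0]], 0.0)                # omega = 2
trigram_filter = ([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]], -3.0)  # omega = 3
bigram_map = conv1d_feature_map(X, *bigram_filter)
features = extract_features(X, [bigram_filter, trigram_filter])
```

Using filters of different lengths (here 2 and 3) is what lets the module look at n-grams of different granularities, with one pooled value per filter.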

Output Module
The output module uses the output of the previous feature extraction module to obtain the probability distribution of the concept category through the classification function.
With the semantic encoding module and the feature extraction module, we obtain the feature representation of elliptical quantity noun phrase and its context. Then we feed the output of local feature extraction module into softmax function to classify the omitted semantic categories of elliptical quantity noun phrase.
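As a minimal sketch of the classification step, the snippet below maps a feature vector to a probability distribution over concepts with softmax. The linear scoring layer before softmax, the weights, and the four-concept subset (thing, person, location, animal) are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_concept(features, W, b, concepts):
    """Score each concept linearly (assumed layer), then normalize with softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b_k
              for row, b_k in zip(W, b)]
    probs = softmax(scores)
    best = max(range(len(concepts)), key=probs.__getitem__)
    return concepts[best], probs


concepts = ["thing", "person", "location", "animal"]  # illustrative subset
W_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]  # hypothetical weights
b_out = [0.0, 0.0, 0.0, 0.0]
label, probs = predict_concept([2.0, 1.0], W_out, b_out, concepts)
```

The concept with the highest probability is taken as the predicted category of the omitted noun head.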

Experiments
The experiment in this paper utilizes a hybrid neural network model to recover elliptical quantity noun phrases and compares multiple sets of experiments to show the improvement of our model. We first compare and analyze the experimental results, then discuss some error cases, and propose the feasible future improvements in this section.

Dataset
As no such dataset is ready for use, we build our own elliptical quantity noun phrase dataset by extracting all the head-omitted quantity noun phrases from the CAMR corpus and the Xinhua News Agency corpus. In this dataset, we extract 838 elliptical quantity noun phrases from the CAMR corpus and 676 elliptical quantity noun phrases from the Xinhua News Agency corpus. We observe that the omitted head concepts of the elliptical quantity noun phrases are concentrated in a few categories such as thing and person, which account for 85% of all elliptical quantity noun phrases. Tab. 1 shows the distribution of the elliptical quantity noun phrases and their concept categories in the training and test sets.

Experiment Settings
The experimental settings of our hybrid neural network are shown in Tab. 2.

Evaluation Metrics
We use accuracy and Macro-F1 score to measure the performance of our proposed method. Macro-F1 gives equal weight to each class label, which makes it suitable for classification tasks with unbalanced categories [33].
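The two metrics can be computed as in the following sketch. The gold and predicted labels are hypothetical toy values, and averaging over the union of gold and predicted labels is an assumption of this sketch.

```python
def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels: the rare class "location" is missed entirely.
gold = ["thing", "thing", "thing", "person", "location"]
pred = ["thing", "thing", "person", "person", "thing"]
acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

In this toy case the accuracy is 3/5 while the Macro-F1 is only 4/9, because the missed rare class counts as much as the frequent ones; this is why Macro-F1 is informative when categories are unbalanced.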

Results
We conduct comparative experiments to demonstrate the effectiveness of our proposed method; the results are shown in Tab. 3. CharVec means that we adopt 300-dimensional character embedding vectors trained on the Baidu Encyclopedia corpus with the Skip-Gram with Negative Sampling (SGNS) [34] method. Bi-LSTM means that a bidirectional LSTM network is used. ATT refers to an attention mechanism for extracting features. CNN + Max pooling means using a convolutional neural network and max pooling to extract features. As shown in line 1 of Tab. 3, with the conventional character vector CharVec and the Bi-LSTM model, we obtain an accuracy of 70.98%, which shows that the baseline model is already effective at complementing elliptical quantity noun phrases. When extra context features are incorporated, the performance improves accordingly. Comparing method 4 with method 5, we find that the syntactic information provided through the BERT pre-trained character vectors plays a vital role in improving our proposed model. It can help locate important characters in the elliptical quantity noun phrase, so that the following modules capture better contextual information. Comparing method 2 with method 4, we can see that the CNN and max-pooling feature extraction performs better than the attention-based mechanism in this paper. That is because, in a sentence containing an elliptical quantity noun phrase, the words in the context are crucial for people to infer what is omitted. Based on this observation, we use convolutional neural networks to obtain contextual semantics of various granularities for feature extraction. As shown in lines 2 and 3, the CNN and max-pooling model obtains better results than Bi-LSTM+ATT, which also demonstrates the effectiveness of CNN in extracting features.
Despite these results, we notice an imbalance among the categories of the omitted concepts. Tab. 4 shows the recognition results of the models for each type of concept. The numbering in Tab. 4 is the same as in Tab. 3.
One can notice that the poor performance on location and animal does not noticeably affect the overall performance, due to the small amount of data for these concepts. In addition to continuously improving the model, we can also expand our experimental dataset: a larger training corpus would allow the model to learn better features.
Tab. 4 (surviving row, method 5): 95.44 95.44 95.44 83.33 88.61 85.89 100.0 100.0 100.0 76.92 71.43 74.07 90.91 76.92 83.33

Case Study
Most of the concept complement errors occur in cross-sentence cases. We use the following sentence as an example and analyze it. had an area of more than 2 square kilometers, and the layout was scattered and the agglomeration function was poor.)" In this example, the elliptical quantity noun phrase "213 (213)" should be recognized as the location concept to complete its noun head. Since our work uses one sentence as the unit for ellipsis recovery, cross-sentence omissions cannot be recovered well. This limitation caused our model to fail to obtain the information of "small towns" in the example sentence, and the elliptical quantity noun phrase "213" was classified into the thing concept. Therefore, in later work, we will introduce cross-sentence semantic information into the semantic complement task of elliptical quantity noun phrases, while expanding the dataset to balance the elliptical quantity noun phrases of each concept category.

Ablation Experiment
The ablation experiments evaluate the contribution of each module. As shown in Tab. 5, we conduct four sets of experiments to compare the effects of the modules. Comparing experiment I with experiment IV, the local feature extraction module improves the experimental results by 4.55%. Comparing experiment II with experiment IV, the semantic encoding module increases the accuracy of the overall recovery by 2.63%. The BERT pre-training in the input module brings the largest improvement: comparing experiment III with experiment IV, pre-training increases the accuracy of the model by 8.7%. This also suggests that training on a larger-scale corpus leads to better deep learning models, so enlarging the scale of the corpus is an effective way to improve the experimental results.

Conclusion
In this paper, we aim to recover elliptical quantity noun phrases by identifying the corresponding semantic categories automatically. We propose a hybrid BERT-Bi-LSTM-CNN model, which utilizes BERT to obtain a good representation, the Bi-LSTM model to encode long sentences and CNN to extract features. Experiments show that our model can effectively complement the omissions in elliptical quantity noun phrases. The recovery of elliptical Chinese quantity noun phrases helps subsequent Chinese Abstract Meaning Representation (CAMR) parsing by improving the accuracy of the parser; hence, it is beneficial to downstream tasks based on CAMR.
In the future, we will explore the following directions. First, we will develop better models to recognize elliptical quantity noun phrases with more concrete concept categories. Second, we will incorporate comprehensive linguistic knowledge of elliptical quantity noun phrases into the model construction to improve the performance.