Enhancing Embedding-Based Chinese Word Similarity Evaluation with
Concepts and Synonyms Knowledge

Word similarity (WS) is a fundamental and critical task in natural language processing. Existing approaches to WS mainly calculate the similarity or relatedness of word pairs from word embeddings trained on a massive, high-quality corpus. However, they may perform poorly when the corpus is insufficient, as in some specific fields, and they cannot capture rich semantic and sentiment information. To address these problems, we propose an enhancing embedding-based word similarity evaluation method with character-word concepts and synonyms knowledge, namely the EWS-CS model, which provides extra semantic information to enhance word similarity evaluation. The core of our approach consists of a knowledge encoder and a word encoder. In the knowledge encoder, we incorporate semantic knowledge extracted from knowledge resources, including character-word concepts, synonyms and sentiment lexicons, to obtain a knowledge representation. The word encoder learns an enhanced embedding-based word representation from a pre-trained model and the knowledge representation, based on the similarity task. Finally, experiments on four similarity evaluation datasets validate the effectiveness of our EWS-CS model on the WS task in comparison with baseline models.


Introduction
Currently, information security has become a global problem, and it is important to study and learn about security-related technologies. Especially in the field of text information security, through similarity technology research, we can not only detect information security vulnerabilities, but also effectively prevent text information security problems. Word similarity (WS) aims to measure the relatedness or similarity degree between word pairs [1][2][3], which is a fundamental and critical component in many tasks, such as information retrieval [4,5], detection of information security [6,7], machine translation [8], semantic disambiguation [9] etc.
Traditional WS methods obtain the similarity of word pairs from their relationships in public lexical resources, which provide professional and authoritative knowledge compiled by experts and scholars, such as character-word concepts [10,11] and synonym information [11,12]. Subsequently, corpus-based embedding approaches have attracted increasing attention for measuring WS, including well-known models such as continuous bag-of-words (CBOW) and Skip-gram (SG) in Word2Vec [13,14], GloVe [15], and improved methods with more complex network structures [10,16-19]. However, most of these models achieve excellent results only with a massive corpus, and they suffer from serious ambiguity of expression.
Some studies have also captured rich semantic knowledge by incorporating extra information, such as sentiment information [7,27-29], synonym information [1] and concept information [30]. Niu et al. [30] used the lexical concepts in HowNet as prior knowledge to enhance word embedding representations, achieving sense disambiguation for better word similarity. Huang et al. [1] introduced multiple kinds of prior knowledge, including statistical features and lexicon resources, into word embeddings to improve the performance of word similarity. However, these methods cannot handle words that are not included in the training corpus, and they simply combine the word similarities calculated from different features, ignoring the lexical overlap between those features. Taking the word "骄傲" (pride) as an example, Fig. 1 shows its related words in different expert knowledge resources, including the synonym base CiLin, the word concept base HowNet and a character concept base, together with the top ten related words extracted from the Skip-gram model. The results show that the different related word sets obtained from prior knowledge resources provide rich semantic information. For example, the synonyms of "骄傲" (pride) carry different meanings with different sentiment tendencies: "光荣" (glory) has a positive sentiment, whereas "自满" (complacent) has a negative one. Such information is difficult to learn with a pre-trained embedding model. Motivated by this observation, we encode a knowledge representation of each word by incorporating different kinds of semantic knowledge, such as synonyms and character-word concepts.
In this paper, we propose an enhancing embedding-based Chinese word similarity evaluation model with concepts and synonyms knowledge (EWS-CS), which consists of three major modules: knowledge extraction, a knowledge encoder and a word encoder. First, we extract related knowledge, including concepts and synonyms, to construct a related knowledge word set. Then, the core of the knowledge encoder is to encode the knowledge representation by integrating synonym information from CiLin, character-word level concepts from HowNet and sentiment information from lexicon resources, which supplements the semantic information under a small corpus. The word encoder learns an enhanced embedding-based word representation from the pre-trained model and the knowledge representation, based on the similarity task. Experiments are conducted on four evaluation datasets to validate the effectiveness of our method on the WS task. The results show that our EWS-CS model improves stability and adaptability under a small corpus.

Figure 1: An example of semantic knowledge for the word "骄傲" (pride). A word that appears in multiple lexical resources is marked in blue. Because the related words are extracted from different knowledge resources, many of them have nearly the same meaning, which leads to the same English translation
The rest of the paper is organized as follows: we introduce some methods of word similarity in Section 2, and describe our model in Section 3. Then, we present the results and performance comparisons in Section 4, followed by the conclusions and next research plan in Section 5.

Related Work
There are mainly three popular methods for word similarity (WS), including embedding-based method, lexical resource method and hybrid method.

Embedding-Based Method
The currently popular WS method is to calculate the cosine similarity between the vectors of a word pair, using a word embedding model trained on a large-scale corpus. Widely used models include CBOW, SG [13,14] and GloVe [15]. The CBOW model predicts the vector representation of the current word from its context words; by contrast, the SG model uses the current word to predict its context words. The GloVe model integrates global information with local contexts and learns word representations by matrix factorization. Most subsequent models are improvements on these models. Ji et al. [31] proposed the WordRank model, which converts the word vector learning problem into a ranking problem that places strongly relevant context words at the top of the list. The directional Skip-Gram (DSG) model proposed by Song et al. [18] extends SG by considering the direction of context words: it not only predicts the context words, but also explicitly indicates whether they lie to the left or right. Sakketou et al. [32] incorporated semantic information and complex word relationships from semantic lexicons into GloVe to improve similarity calculation. Peters et al. [17] proposed the ELMo model, which represents word vectors as a linear combination of the layers of a bidirectional language model. BERT [16] pre-trains deep bidirectional representations by conditioning on both left and right context in all layers. Zhang et al. [19] proposed the ERNIE model, which fuses text and knowledge graph information on the basis of BERT. These methods rely on training word embeddings on a large-scale corpus, but they ignore some limitations.
First, the resource consumption of these models is huge, which hinders research and development.
Second, the distributional hypothesis that similar words have similar distributions is inherently questionable, because words that occur in the same positions are not always synonymous. For example, the distributions of "good" and "bad" are similar, but they are in fact antonyms.
Third, the training objective and the task are inconsistent. The parameters that achieve state-of-the-art results in the training process may not be suitable for similarity tasks.
In addition, the internal information of a word has been taken into account, mainly including character features [20-22,24,33] and the radicals of Chinese characters [24-26]. Chen et al. [20] proposed the character-enhanced word embeddings (CWE) model, which introduces internal character information into word embedding methods to alleviate excessive reliance on external information. Sun et al. [22] proposed a hybrid model that learns word embeddings by simultaneously considering the pixel-level, character-level and context characteristics of words.

Lexical Resource Method
WordNet is a lexical database for the English language [11]. It provides a short summary definition for each synset, which consists of a group of words with the same meaning. Jimenez et al. [35] exploited the related word set from the WordNet graph to calculate word similarity, and achieved an effect similar to that of word embedding methods. CiLin [12] consists of the synonyms and related words of each word. Chen et al. [36] calculated the semantic similarity between words by exploiting path and depth in CiLin, and then assigned different weights to the edges between different layers. This method makes the similarity value change dynamically rather than being limited to fixed values.
At present, the most widely used word concept lexicon in Chinese is HowNet [38], which describes the concepts represented by Chinese and English words and reveals the relationships between concepts and their attributes. Liu et al. [38] first explored the calculation of lexical semantic similarity in HowNet. Zhu et al. [39] calculated word similarity by integrating HowNet and CiLin: they first calculated a separate similarity according to the characteristics of each lexical resource, and then obtained the final similarity with a dynamic weighting strategy. Compared with using a single resource, the combined method can include more semantic information and improve the accuracy of word similarity. However, lexical resources are not always kept up to date, and their poor timeliness leads to low word coverage.
In addition, some methods consider sentiment information. Smarandache et al. [2] proposed a fuzzy-based sentiment similarity measurement method, which assigns each word positive, negative and neutral sentiment values extracted from SentiWordNet 3.0 [40] (an English sentiment lexicon) to construct a fuzzy sentiment vector representation. The word similarity is then obtained by calculating the vector distance of each word pair. Tang et al. [41] integrated word context and sentiment polarity to construct a hybrid model, HyRank. Lan et al. [27] proposed a sentiment word vector learning model based on a convolutional neural network. First, sentiment tags were automatically recognized using emoticons, and then a traditional CNN was extended with two channels, one for semantics and one for sentiment. The two were integrated to create a sentiment word vector (SWV) for similarity calculations.

Hybrid Method
Currently, word similarity research focuses on combination methods, which incorporate multiple kinds of semantic information from different knowledge resources, because any single method is limited in accuracy and coverage.
A widely used approach incorporates lexical resources into word embeddings [29,30,42]. Niu et al. [30] fused the semantic information from HowNet into the Skip-gram framework, and calculated the weights of the different senses of a word with a context attention mechanism, which alleviates word sense ambiguity. Yan et al. [29] combined word embeddings with lexical resources to improve similarity calculation in retrieval tasks, which closed the synonym gap of purely lexical methods.
In addition, Huang et al. [1] incorporated statistical methods, lexicon methods and word embedding methods, and used a variety of mathematical and counter-fitting combination strategies [43] for similarity calculation. Guo et al. [44] proposed a multi-feature fusion similarity algorithm, which adopted prior knowledge features and corpus statistical features. However, these methods generally combine a single internal or external feature with word embeddings, without taking into account the correlation between different kinds of knowledge.
Knowledge information has recently begun to be explored for word similarity and has so far shown great promise. Inspired by this, we propose an enhancing embedding-based Chinese word similarity evaluation model with synonyms and concepts knowledge (EWS-CS). The core of our method is to encode the knowledge representation by integrating synonym information from CiLin, character-word concepts from HowNet and sentiment information from lexicon resources, which supplements the semantic information under a small corpus.

Our Enhancing Embedding-Based Chinese Word Similarity Model
Our EWS-CS model consists of three parts: knowledge extraction, a knowledge encoder, and a word encoder for similarity calculation, as shown in Fig. 2. First, we extract related knowledge, including concepts and synonyms, from different knowledge resources to construct a related word set $R_i$ for a word $w_i$. Then, we introduce a dual weight method that calculates the importance of the $n$-th related word $r_i^n$ in $R_i$ by combining semantic and sentiment weights. Next, we combine the resulting knowledge vector $v_i^k$ with the pre-trained vector $v_i^o$ of $w_i$ to obtain the final word representation of $w_i$. The cosine similarity between the vectors of each word pair is taken as their semantic similarity, from which the Pearson and Spearman coefficients are computed as the output of the model. Finally, by continuously adjusting the weight parameters of the various kinds of knowledge during training, the optimal Pearson and Spearman coefficients are obtained and used to evaluate the model.

Knowledge Extraction
Lexical knowledge resources are constructed by numerous experts and scholars, and can be considered to provide highly refined and correct information. In our work, we assume that the similarity of a word pair $(w_i, w_j)$ relates not only to contextual semantic information but also to semantic knowledge, including concepts and synonyms, from lexical resources. Hence, we extract a candidate knowledge word set $R_i'$ for each word $w_i$ from multiple knowledge resources, such as HowNet and CiLin; this set may contain some poorly related words.
Tab. 1 shows the word concept set (from HowNet), character concept set (from the Xinhua online dictionary) and synonym set (from CiLin) constructed for four example words. It shows that the word sets from different lexicon resources differ in their importance for representing a word. Take the word "街道" (street) as an example: in the word concept set, "道路" (road) is more important for the representation of the word than "居民区" (residential area) and "地方" (local). Similarly, in the character concept set, "街巷" (street) is more meaningful for the word representation than "两边" (both sides). Therefore, we use a scoring function to select the strongly correlated words and construct the related word set for each word, as shown in Fig. 3.
First, we use pre-trained word embeddings to obtain the representations of the $D$ candidate words in $R_i'$. Let $v(w_{R_m})$ and $v(w_{R_k})$ denote the pre-trained vectors of the $m$-th and $k$-th candidate words $w_{R_m}$ and $w_{R_k}$; we calculate the relevance $R(w_{R_m}, w_{R_k})$ of each pair of candidate words from these vector representations. Then, the similarity matrix $H$ is constructed, in which $R(w_{R_m}, w_{R_k})$ is the similarity degree in the $m$-th row and $k$-th column. We score the importance of each candidate word by summing the matrix by rows, so the score $S(w_{R_m})$ of the $m$-th word is

$$S(w_{R_m}) = \sum_{k=1}^{D} R(w_{R_m}, w_{R_k}).$$

Finally, we select the candidate words with the highest scores to construct the knowledge set $R_i$.
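The candidate scoring step can be illustrated with a minimal sketch. It assumes that the pairwise relevance is cosine similarity (the text only says it is computed from the vectors) and that candidates without a pre-trained vector are skipped; the function names, the `top_k` parameter and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_related_words(candidates, vectors, top_k):
    """Rank candidates by the row sums of the pairwise matrix H; keep the top_k."""
    # Assumption: candidates without a pre-trained vector are skipped.
    words = [w for w in candidates if w in vectors]
    # H[m][k] plays the role of R(w_Rm, w_Rk); cosine similarity is an assumption.
    H = [[cosine(vectors[m], vectors[k]) for k in words] for m in words]
    # The score of each word is the sum of its row in H.
    scores = {w: sum(row) for w, row in zip(words, H)}
    ranked = sorted(words, key=lambda w: scores[w], reverse=True)
    return ranked[:top_k], scores


# Toy 2-d vectors, invented purely for illustration.
toy_vectors = {
    "road": [1.0, 0.0],
    "street": [0.9, 0.1],
    "lane": [0.8, 0.2],
    "both_sides": [0.0, 1.0],
}
selected, scores = select_related_words(list(toy_vectors), toy_vectors, top_k=2)
print(selected)
```

Because every row includes the word's similarity to itself, each score is shifted by the same constant, so the ranking is unaffected. A word that is close to most other candidates scores high, while an outlier such as the toy "both_sides" entry is filtered out.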

Knowledge Encoder
Currently, most knowledge resources are general-purpose. For example, the single-character word "大" (big) shown in Fig. 4 has 12 meanings in HowNet, and some of them, such as "龄大" (age old) and "高于正常" (above normal), are more important than others. It is therefore worthwhile to measure the importance of the different meanings or related words in $R_i$ when building the knowledge representation. In this section, we propose a dual weight method that assigns an importance to each word in $R_i$ by considering semantic and sentiment weights. The semantic weight is the cosine similarity between the vectors of $w_i$ and the $n$-th related word $r_i^n$, which highlights the semantic importance of $r_i^n$. The sentiment weight is based on sentiment lexicons and assigns a correlation weight to each related word. Because some words have no pre-trained vectors, we design two strategies to obtain the dual weight, as shown in Fig. 5.

Strategy One
If $w_i$ has a pre-trained vector $v_i$, we use the cosine similarity between the vectors of $w_i$ and each related word $r_i^n$ to determine the importance of the related words:

$$W_{sem}(r_i^n) = \cos\left(v_i, v(r_i^n)\right),$$

where $W_{sem}(r_i^n)$ is the semantic weight of $r_i^n$ and $v(r_i^n)$ is the vector of the $n$-th related word $r_i^n$.

Strategy Two
If $w_i$ has no pre-trained vector $v_i$, we build a semantic matrix $S_i$ whose element $S_i(m, n)$ is the correlation between the vectors of the $m$-th and $n$-th related words $r_i^m$ and $r_i^n$.
The matrix $S_i$ is then summed by rows to get the semantic weight of each related word:

$$W_{sem}(r_i^n) = \sum_{m=1}^{N} S_i(n, m).$$

To incorporate sentiment information, we look up the sentiment value of each word in $R_i$ in the sentiment lexicons and build the corresponding sentiment set $s_{set}(R_i) = \{sen(r_i^1), \dots, sen(r_i^n), \dots, sen(r_i^N)\}$. The sentiment polarity of the $n$-th word is defined as

$$sen(r_i^n) = \begin{cases} 1, & r_i^n \text{ is positive} \\ -1, & r_i^n \text{ is negative} \\ 0, & r_i^n \text{ is neutral} \end{cases}$$

where $N$ is the total number of words in $R_i$ and $sen(r_i^n)$ is the sentiment polarity of $r_i^n$; 1, -1 and 0 indicate positive, negative and neutral sentiment, respectively. By comparing the sentiment values $sen(w_i)$ and $sen(r_i^n)$, the sentiment weight of $r_i^n$ is obtained, where $\beta$ is the sentiment weight used when the two words have different sentiment polarities. We explore its optimal value in the experiments. Finally, the dual importance weight of each word in $R_i$ is obtained by integrating the semantic and sentiment weights.
The corresponding knowledge vector $v_i^k$ is then computed from these dual weights and the related-word vectors, where $v^o(r_i^n)$ is the pre-trained word representation of the $n$-th related word $r_i^n$.
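The dual weighting and the knowledge vector can be sketched as follows. Several details are assumptions made only for illustration, because the text does not fix them: the "correlation" in strategy two is taken to be cosine similarity, words with the same polarity receive a sentiment weight of 1.0, the dual weight is the product of the semantic and sentiment weights, and the knowledge vector is a weight-normalised average of the related-word vectors. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def semantic_weights(word_vec, related_vecs):
    """Strategy one if the word has a vector, otherwise strategy two."""
    if word_vec is not None:
        # Strategy one: similarity between the word and each related word.
        return [cosine(word_vec, r) for r in related_vecs]
    # Strategy two: row sums of the pairwise matrix S_i over the related words.
    return [sum(cosine(r_n, r_m) for r_m in related_vecs) for r_n in related_vecs]


def sentiment_weight(word_polarity, related_polarity, beta):
    """Assumed rule: 1.0 for matching polarity, beta for differing polarity."""
    return 1.0 if word_polarity == related_polarity else beta


def knowledge_vector(word_vec, word_polarity, related, beta=0.1):
    """related is a list of (vector, polarity) pairs with polarity in {1, -1, 0}."""
    vecs = [vec for vec, _ in related]
    sem = semantic_weights(word_vec, vecs)
    # Assumed combination: dual weight = semantic weight * sentiment weight.
    dual = [s * sentiment_weight(word_polarity, pol, beta)
            for s, (_, pol) in zip(sem, related)]
    total = sum(dual)
    dim = len(vecs[0])
    if total == 0:
        return [0.0] * dim
    # Assumed aggregation: weight-normalised average of related-word vectors.
    # Negative cosine weights are not handled specially in this sketch.
    return [sum(w * vec[d] for w, vec in zip(dual, vecs)) / total for d in range(dim)]


related = [([1.0, 0.0], 1), ([0.0, 1.0], -1)]
kv_with_vector = knowledge_vector([1.0, 0.0], 1, related, beta=0.1)
kv_without_vector = knowledge_vector(None, 1, related, beta=0.5)
print(kv_with_vector, kv_without_vector)
```

In the second call the word has no pre-trained vector, so the semantic weights come only from the related words themselves, and the related word with the opposite polarity is down-weighted by `beta`.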

Word Encoder and Similarity Calculation
After obtaining the knowledge vector, we combine it with the word's contextual semantic vector to get an updated word representation $\hat{v}(w_i)$, where $\alpha$ is the harmonic weight of the original semantic features of $w_i$ and adjusts the proportion of the knowledge vector $v_i^k$ and the context semantic vector $v_i^o$. The similarity between any two words $w_i$ and $w_j$ is then calculated as the cosine similarity of their updated representations.
In the experiments, 1 million news articles are randomly selected to train the pre-trained model. In addition, to increase the diversity of the data and verify the broad applicability of our model, we also use pre-trained word vectors provided by Li et al. [46], which were trained with the Skip-gram model on multiple corpora, including Baidu Encyclopedia, Wikipedia, People's Daily News, Financial News and Literature. For these pre-trained embeddings, the window size is five, negative sampling is five, the number of iterations is five, the low-frequency word threshold is ten, and the vector dimension is 300; we only use the embeddings trained with the word feature.
Training data: To train the parameters $\alpha$ and $\beta$ of our method, we use the translated SimLex-999 dataset [47], which contains 999 word pairs with similarity scores translated from English, to train the model and predict the similarity of each word pair. We then calculate the correlation between the predicted similarity sequence and the standard similarity sequence.
Evaluation data: The purpose of our work is to construct a word representation model for calculating the similarity of Chinese words. At present, four evaluation datasets are commonly used for Chinese, namely WordSim-240 [20], WordSim-296 [48], MC30 [11] and RG35 [33], all of which consist of word pairs with similarity scores. The details of the evaluation datasets are shown in Tab. 2.
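The word-encoder step, in which the pre-trained vector and the knowledge vector are mixed and word pairs are compared by cosine similarity, can be sketched as below. The convex combination `alpha * v_o + (1 - alpha) * v_k` is an assumed form chosen for illustration, since the text only states that the harmonic weight balances the two vectors; `enhance` and `word_similarity` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def enhance(v_o, v_k, alpha=0.2):
    """Mix the pre-trained vector v_o with the knowledge vector v_k.

    Assumption: alpha weights v_o and (1 - alpha) weights v_k.
    """
    return [alpha * o + (1.0 - alpha) * k for o, k in zip(v_o, v_k)]


def word_similarity(word_i, word_j, alpha=0.2):
    """Each word is given as a (pre-trained vector, knowledge vector) pair."""
    v_i = enhance(word_i[0], word_i[1], alpha)
    v_j = enhance(word_j[0], word_j[1], alpha)
    return cosine(v_i, v_j)


# Toy vectors, invented purely for illustration.
word_a = ([1.0, 0.0], [0.0, 1.0])
word_b = ([0.0, 1.0], [0.0, 1.0])
mixed = enhance(word_a[0], word_a[1], alpha=0.2)
similarity = word_similarity(word_a, word_b, alpha=0.2)
print(mixed, similarity)
```

With a small `alpha`, two words whose pre-trained vectors differ but whose knowledge vectors agree end up close together, which is the intended effect of adding lexical knowledge under this assumed form.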

Metrics
To evaluate the effectiveness of our proposed method, we use the Spearman rank correlation coefficient ($\rho$) and the Pearson correlation coefficient ($r$), both of which are widely applied in the word similarity task. For each evaluation dataset $D = \{(w_1^1, w_1^2, X_1), \dots, (w_n^1, w_n^2, X_n), \dots, (w_N^1, w_N^2, X_N)\}$, $N$ is the total number of word pairs, $(w_n^1, w_n^2, X_n)$ is the $n$-th word pair, $w_n^1$ and $w_n^2$ are the two words of the $n$-th pair, and $X_n$ is the $n$-th gold-standard similarity score. With our EWS-CS model, we predict the similarity $Y_n$ of the $n$-th word pair and obtain two sequences $X = \{X_1, \dots, X_n, \dots, X_N\}$ and $Y = \{Y_1, \dots, Y_n, \dots, Y_N\}$. The key to evaluating the similarity task is to find the correlation between the two sequences. The Pearson coefficient ($r$) is defined as

$$r = \frac{\sum_{n=1}^{N} (X_n - \bar{X})(Y_n - \bar{Y})}{\sqrt{\sum_{n=1}^{N} (X_n - \bar{X})^2}\,\sqrt{\sum_{n=1}^{N} (Y_n - \bar{Y})^2}},$$

where $\bar{X}$ and $\bar{Y}$ are the average values of the two sequences $X$ and $Y$.
The Spearman correlation coefficient ($\rho$) is defined as

$$\rho = 1 - \frac{6 \sum_{n=1}^{N} (R_{X_n} - R_{Y_n})^2}{N (N^2 - 1)},$$

where $R_{X_n}$ and $R_{Y_n}$ are the rank of $X_n$ in $X$ and the rank of $Y_n$ in $Y$, respectively.
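Both coefficients can be computed with a few lines of standard-library Python, as in the sketch below. Spearman's coefficient is computed here as the Pearson correlation of average ranks, which coincides with the rank-difference formula when there are no ties; this tie handling is a choice made for the sketch and is not stated in the text. The scores in the example are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


gold = [1.0, 2.0, 3.0, 4.0]        # invented gold-standard scores X
predicted = [0.1, 0.4, 0.3, 0.9]   # invented model similarities Y
rho = spearman(gold, predicted)
r = pearson(gold, predicted)
print(rho, r)
```

In this toy example the predicted ranks are 1, 3, 2, 4, so the squared rank differences sum to 2 and the rank-difference formula gives 1 - 12/60 = 0.8, which matches the value returned by `spearman`.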

Parameter Settings
In our experiments, the pre-trained word embeddings have 100 dimensions. We select the SG model as the basic pre-training method. The parameters follow [13,14]: the window size is 5, the minimum word count is 20, the number of negative samples is 3, and the sampling threshold is 0.001. The other comparison models use the same parameters. In the similarity comparison experiments of Section 4.2, we set $\alpha = 0.2$ and $\beta = 0.1$; the parameter exploration experiments are described in Section 4.3. To avoid chance results, each evaluation dataset is run five times and the average result is reported.
In order to obtain the sentiment of each word, we integrate multiple Chinese sentiment lexicon resources, including HowNet 4 , DUTIR 5 , NTUSD 6 and sentiment lexicon from Tsinghua University [49].

Word Similarity Experiments
We evaluate our model based on concepts and synonyms knowledge on the task of word similarity. To demonstrate its effectiveness, we compare and analyze the performance of our model against the following state-of-the-art models, which are widely used in Chinese word similarity:
Lexical-based method: The commonly used resources in Chinese are HowNet, a word concept resource, and CiLin, a synonym resource. HowNet provides the concept set of each word, and the similarity of two words is calculated from the path relationship between the concept word sets of the word pair [39]. CiLin contains synonyms and related words for each word, and word similarity is calculated from the path relationship between the synonyms and related words of the word pair [39].
Word embedding method: We apply widely used word embedding models, including CBOW, SG and GloVe, to obtain the vectors of word pairs, and then use them for the word similarity task. In addition, we also compare with some improved embedding methods, such as CWE [20], SCWE [23] and JWE [24]. JWE was proposed by Yu et al. [24] to learn joint embeddings of Chinese words, characters and fine-grained sub-character components. SCWE considers the Chinese word and its internal character structure to learn the word embedding [23]. CWE was proposed to obtain multiple-prototype character embeddings for the word similarity task [20].
Hybrid method of word embedding and lexicons: Niu et al. [30] proposed a sememe attention over target model (SAT) to incorporate word concepts from HowNet into word embedding representation learning for the word similarity task. Sememes are used to describe the meaning of a word, and each sememe has a different importance for the meaning of the word.
4 HowNet: http://www.keenage.com/html/e_index.html
5 Sentiment Ontology: http://ir.dlut.edu.cn/EmotionOntologyDownload
6 Lexicon from National Taiwan University: https://down.itsvse.com/amp/16003.html
Our EWS-CS model: We propose an enhancing embedding-based Chinese word similarity evaluation model with concepts and synonyms knowledge, which also takes the sentiment information of words into consideration. The concept features include character-word concepts from HowNet and the Xinhua online dictionary, and the synonym features contain synonyms from CiLin.
The evaluation results of our EWS-CS model and the baseline methods on the word similarity datasets are shown in Tab. 3. Our model outperforms the baseline models, and we can observe that:
1. The performance of the lexicon-based methods is very unstable. They perform well on MC30 and RG35, which have a small number of word pairs, especially when using synonym information from CiLin, but they do not perform well on WordSim-240 and WordSim-296, which have more word pairs. MC30 and RG35 perform well because related concepts and synonyms can be extracted from the knowledge resources for most of their word pairs. However, many words in the WordSim-240 and WordSim-296 evaluation datasets cannot be matched in the knowledge resources, leading to poor results. This reflects a shortcoming of the lexicon-based method: the similarity of a word pair that is not in the lexicons or knowledge base resources cannot be obtained.
2. The evaluation indices of the word embedding-based methods fluctuate little. The overall performance on the small evaluation datasets (MC30 and RG35) is better than that on the large datasets (WordSim-240 and WordSim-296). On the one hand, this shows that different word embedding methods can capture the semantics in the corpus to a certain extent; on the other hand, it also suggests that word embedding methods alone can no longer further improve word similarity.
3. Our model outstrips the other state-of-the-art baseline methods, including the lexicon-based methods, word embedding-based methods and hybrid methods. Compared with the lexicon-based method, the performance of our model improves significantly when there are many word pairs, as in WordSim-240 and WordSim-296, where the Spearman correlation coefficient ($\rho$) improves by more than 50%. Compared with the word embedding-based methods, the improvement is obvious when the number of word pairs is small (MC30 and RG35), where the Spearman correlation coefficient ($\rho$) increases by more than 20%.
On the whole, the EWS-CS model achieves outstanding results on WordSim-240 and WordSim-296, indicating that our method with synonym and character-level concept knowledge can effectively represent words.

Applicability of Our Model in Different Corpora
To strengthen the diversity of the data samples and further verify the applicability of our model to different text, we used vectors pre-trained with the Skip-gram model on different corpora, including Baidu Encyclopedia, Wikipedia, People's Daily News, Financial News and Literature. As shown in Section 4.2, the performance on WordSim-240 and WordSim-296 is relatively stable, so we choose these two evaluation datasets for verification in this section. As Tab. 4 shows, our model performs significantly better than the pre-trained Skip-gram model on the different corpora.
Specifically, we can observe that:
1. Different corpora: On the whole, task performance improves as the size of the corpus increases. Although Baidu Encyclopedia is larger than Financial News, the effect is similar. A possible reason is that financial news is more professional, so the quality of the corpus is better than that of Baidu Encyclopedia.
2. Different evaluation datasets: Although the overall effect of our model is better than that of the pre-trained Skip-gram model, the results on the WordSim-296 dataset improve more than those on WordSim-240. Most of the word pairs in WordSim-240 are related words, whereas the WordSim-296 dataset contains many similar words. From this perspective, because our model incorporates synonym information, the word pairs in WordSim-296 can be better supplemented with semantic knowledge, so the results improve more.

Parameter Tuning and Determination
There are two parameters in the EWS-CS model, namely the harmony coefficient $\alpha$ between knowledge and context semantics, and the sentiment similarity $\beta$ of antonyms in sentiment assignment. To determine the importance of the two parameters for similarity, we train on the translated SimLex-999 dataset with Skip-gram as the basic pre-training method; each parameter setting is trained five times and the results are averaged. Figs. 6 and 7 show the performance under different parameter values.
In Fig. 6, each boxplot indicates the performance of word similarity under different $\beta$ when $\alpha$ is fixed. The area of a boxplot represents the fluctuation range of the final result over different $\beta$ when $\alpha$ is fixed. As $\alpha$ increases, the area of the boxplots becomes smaller and smaller, which shows that the larger $\alpha$ is, the less the result fluctuates with $\beta$. For the Spearman rank correlation coefficient, the values first increase and then decrease as $\alpha$ increases. This reflects that our model, by incorporating character-word concepts and synonyms, can effectively capture more semantic knowledge and thereby obtain a better word semantic representation vector. The Pearson coefficient follows a similar pattern to the Spearman rank correlation coefficient, but with slightly higher overall fluctuation, which illustrates that integrating concepts and synonyms knowledge into word vectors can effectively improve performance. In particular, the evaluation achieves the best result when $\alpha = 0.2$, so $\alpha$ is set to 0.2.
Fig. 7 shows the Spearman rank correlation coefficient (panel a) and the Pearson correlation coefficient (panel b) for different $\beta$ values. Each boxplot indicates the performance of word similarity under different $\alpha$ when $\beta$ is fixed. The area of each boxplot is comparatively large and decreases slightly as $\beta$ increases, indicating that when $\beta$ is fixed, different values of $\alpha$ have a greater impact on the result, and that this effect decreases slightly as $\beta$ increases. Fig. 7 also shows that the sentiment similarity of antonyms has only a slight effect on our method. When the antonym sentiment similarity $\beta$ equals 0.1, the overall result is relatively optimal.
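The tuning procedure described in this section (try each combination of the two parameters, repeat each run five times and average) can be sketched as a plain grid search. The `evaluate` callable is a hypothetical stand-in for training and scoring the model on the training set; the toy objective below is synthetic, is constructed only so that the example runs, and does not reproduce any result from the paper.

```python
from itertools import product
from statistics import mean


def grid_search(evaluate, alphas, betas, runs=5):
    """Average `runs` evaluations per (alpha, beta) pair and return the best pair."""
    results = {}
    for alpha, beta in product(alphas, betas):
        # Each call is assumed to train and score the model once.
        results[(alpha, beta)] = mean(evaluate(alpha, beta) for _ in range(runs))
    best = max(results, key=results.get)
    return best, results


def toy_evaluate(alpha, beta):
    """Synthetic score with a single peak; NOT the paper's model or data."""
    return -((alpha - 0.2) ** 2) - 0.1 * (beta - 0.1) ** 2


best_pair, all_results = grid_search(
    toy_evaluate, alphas=[0.1, 0.2, 0.3], betas=[0.0, 0.1, 0.2]
)
print(best_pair)
```

Grouping `all_results` by one parameter gives the per-value distributions that a boxplot over the other parameter would summarise.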

Case Analysis
To better understand the quality of our proposed model, we conduct a case analysis in Tab. 5, which illustrates the similarity of 10 word pairs under different methods. Our model outperforms the other methods in three aspects:
1. For similar words, such as lines 1, 2 and 3, the result of our method is very close to the manually labeled standard result and is evidently better than the other methods. The reason is that our model integrates synonym and sentiment information, which better represents the synonymy of words.
2. For related words, such as lines 4, 5 and 6, our model considers the character-word level concepts of words, which complements the relatedness of words and makes our model outperform existing methods.
3. For unrelated word pairs, such as lines 8, 9 and 10, our method uses the dual superposition of semantic and sentiment knowledge to update the word representation, making it significantly better than existing methods.
In summary, our method has good results for various types of word similarity calculations. However, for some word pairs, such as line 7 "日本" (Japan) and "南京大屠杀" (Nanjing massacre), due to the influence of history and other factors, the above similarity methods have not achieved good performance.

Conclusion
Similarity calculation is a basic task in natural language processing and is of great significance for information retrieval and information security detection. We propose an enhancing embedding-based word similarity evaluation method, which emphasizes knowledge of synonyms and character-word level concepts. Unlike traditional methods that calculate similarity within a single feature, we first construct a knowledge-related word set to enrich the semantic information of each word, and then obtain a knowledge representation that uses semantic and sentiment information to enhance the word embedding and to distinguish the significance of different kinds of knowledge. In our work, we break the boundary between multiple features and consider synonyms, character-word level concepts and sentiment knowledge together, which achieves excellent word representations. Experiments on the similarity task validate the effectiveness of our proposed model, which not only improves the performance of word similarity under small samples, but also increases the stability of the results.
Of course, there are still some issues worthy of further study in the similarity calculation based on small samples.
First, synonyms and related words are confused when conducting the word similarity task. Similar and related word pairs are essentially different: similar words can replace each other in the same position, whereas related words have certain associations and appear in each other's context. These different connotations place higher requirements on the WS task, so distinguishing the relatedness and similarity relationships of word pairs is a key issue for improving the accuracy of similarity calculation.
Second, the problem of low vocabulary coverage cannot be ignored under a small-sample corpus. It is not prominent with a large corpus because of the wide range of vocabulary. In this paper, we incorporate character concepts in the small-corpus case. Although this alleviates the problem, the approach still cannot handle words without character concepts. Therefore, extending word similarity calculation to cover more vocabulary is still a key issue.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.