An Opinion Spam Detection Method Based on Multi-Filters Convolutional Neural Network

Abstract: With the continuous development of e-commerce, consumers show increasing interest in posting comments on their consumption experience and on the quality of commodities. Meanwhile, people rely on other consumers' comments when making purchasing decisions much more than ever before. The reliability of commodity comments therefore has a significant impact on protecting consumers' rights and building a fair online trading environment. However, some unscrupulous online sellers write fake favorable reviews for themselves and malicious comments about their competitors to maximize their profits. These improper practices have severely harmed the entire online shopping industry. To detect and prevent these deceptive comments effectively, we construct a Multi-Filters Convolutional Neural Network (MFCNN) model for opinion spam detection. MFCNN is designed with a fixed-length sequence input and an improved activation function to avoid the gradient vanishing problem in opinion spam detection. Moreover, convolution filters with different widths are used in MFCNN to represent sentences and documents. Our experimental results show that MFCNN outperforms current state-of-the-art methods on standard spam detection benchmarks.


Introduction
In recent years, online shopping has been favored by consumers because of its convenience and speed. When purchasing commodities on the Internet, instead of relying on intuitive feelings, consumers are more likely to base their purchasing decisions on the seller's reputation, the sales volume of the commodities, and the product reviews. These can be considered the essential factors shaping customers' online shopping preferences. As a result, sellers and customers pay increasing attention to the content of reviews, which creates a motivation for generating spam comments [Jansen (2010)]. Numerous biased comments deliberately promote or maliciously lower the rating of a certain product. These comments are usually highly deceptive and misleading [Jindal and Liu (2007)]. Sellers may fabricate praise for their own products by hiring astroturfers to obtain higher ratings, and they may maliciously lower the ratings of their competitors in the same way. The spread of fraudulent and spam reviews has adverse effects on the entire online shopping industry. To break this black market and protect consumers' legitimate rights, we design an opinion spam detection method in this paper. Research on traditional spam such as spam email and advertisements has received considerable attention, while the recognition of spam opinions has been neglected. Existing spam recognition work mainly focuses on identifying destructive spam opinions, such as advertisements, non-commentary texts, or other irrelevant texts [Jindal and Liu (2008)]. Although destructive spam opinions do cause some disruption to customers, the actual impact is not significant because customers can choose to ignore them. There is also a stealthier type of spam opinion, namely deceptive spam opinion, which is deliberately written to sound authentic.
Deceptive spam opinion detection can then be treated as the task of taking a review and determining whether it is spam or not. Therefore, in this paper, we focus on how to identify deceptive spam opinions. To summarize, our main contributions are as follows: 1) We introduce a CNN-based algorithm with a simple structure and powerful representation ability to obtain a good representation of spam opinions. 2) We propose an LDA structure, which yields fixed-length inputs, and a new activation function; together they combine the advantages of traditional methods and make up for their shortcomings.

Related work
Over the last decade, research on spam opinion detection has followed two directions: detection based on comment content and detection based on commentators' behavior. Jindal et al. [Jindal and Liu (2008)] first proposed the concept of spam opinion detection, using a logistic regression algorithm to identify repetitive reviews, non-commentary reviews, and deceptive reviews. A few years later, Jindal et al. [Jindal, Liu and Lim (2010)] used frequent itemset mining to detect spam comments. Ott et al. [Ott, Choi, Cardie et al. (2011)] combined linguistic and psychological features in a classifier to identify spam opinions effectively. Chen et al. [Chen, Zhao and Yang (2015)] used deep-level language features to identify spam comments by analyzing intra-sentence and inter-sentence information. Many works also exploit features outside the comment content itself. Mukherjee et al. [Mukherjee, Liu and Glance (2012)] identified spam comments by grouping reviewers with similar behaviors. Li et al. [Li, Huang, Yang et al. (2011)] jointly considered the content of comments and the characteristics of commentators and concluded that the emotional characteristics of comments have less influence on the identification of deceptive comments. Akoglu et al. [Akoglu, Chandy and Faloutsos (2013)] grouped commentators and comment content into a network,

MFCNN framework
We propose a Multi-Filters Convolutional Neural Network (MFCNN) with an LDA-structured input and an improved activation function (SRLU), building on the work of Kim [Kim (2014)]. Moreover, we also draw on ideas from other research fields [Chen, Yu, Liu et al.]. As shown in Fig. 1, the CNN-based model proposed in this paper can generate a representation of a document (the part enclosed by the dotted line is the sentence representation model). Taking the sentence representation model as an example, the model consists of four layers: input layer, convolution layer, pooling layer, and fully connected layer. The input layer outputs the word vector matrix retrieved from the lookup matrix, which converts words into short-text features. The convolution layer is composed of three convolution kernels of different widths, each of which extracts different local features, so that semantic information of different granularities is captured. The pooling layer adopts max pooling to summarize the information in each feature map. Finally, the fully connected layer produces the sentence output through the activation function.

Input layer
The role of the input layer is to convert a short text into input data and transfer the input into the convolutional neural network by connecting to the next layer. Let indicates a word in a sentence, and a sentence in a document is represented as , where represents the connection between and .
However, sentence lengths are inconsistent, while the input layer requires sequences of a consistent length, so each sentence must be converted into a fixed-length sequence. To respect word order and obtain a more reasonable vector representation, we use the LDA model to get the final vector [Duan and Ai (2015)]. Unlike the previous approach of directly truncating a sentence to a given length, the LDA model is used to extract the essential words, which are then reassembled in their original order. For each word , we use the lookup matrix to obtain its word embedding , where is the model parameter, is the word vector dimension, and is the vocabulary. In a traditional neural network, can be randomly initialized from a uniform distribution, or it can be pre-trained on a sizeable corpus with an embedding learning algorithm [Socher, Perelygin, Wu et al. (2013); Mikolov, Sutskever, ]. Convolutional neural networks were initially designed for image processing tasks. An image can be directly converted into a two-dimensional matrix that the input layer of a convolutional neural network can use as is, whereas text cannot. Therefore, we use word2vec to transform the text so that the result can serve as the input of the convolutional neural network. The input layer of the sentence representation model proposed in this paper uses Google's pre-trained word2vec vectors, GoogleNews-vectors-negative300. These vectors were trained on a Google News corpus of about 100 billion words and cover a vocabulary of 3 million words and phrases, each represented by a 300-dimensional vector, so the input layer in this paper is a word vector matrix with a dimension of 300. Word2vec captures the relationships between words and sentences in the text, which in turn affects the classification results.
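The fixed-length conversion and embedding lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-word `importance` scores are hypothetical stand-ins for weights derived from an LDA model, the function names `fixed_length_sequence` and `embed` are illustrative, and a small random 4-dimensional lookup table replaces the 300-dimensional pre-trained vectors.

```python
import random

def fixed_length_sequence(words, importance, length, pad_token="<pad>"):
    """Keep the `length` most important words in their original order, padding if short."""
    # Rank positions by importance (ties broken by earlier position) and keep the top ones.
    ranked = sorted(range(len(words)), key=lambda i: (-importance[i], i))
    keep = sorted(ranked[:length])  # restore the original word order
    selected = [words[i] for i in keep]
    return selected + [pad_token] * (length - len(selected))

def embed(sequence, lookup, dim):
    """Map each word to its vector; unknown words and padding become zero vectors."""
    zero = [0.0] * dim
    return [list(lookup.get(word, zero)) for word in sequence]

# Toy data: the importance scores are hypothetical, standing in for LDA-derived weights.
words = ["the", "room", "was", "absolutely", "spotless", "and", "quiet"]
importance = [0.01, 0.30, 0.02, 0.15, 0.40, 0.01, 0.25]

rng = random.Random(0)
dim = 4  # illustrative only; the paper uses 300-dimensional pre-trained vectors
lookup = {w: [rng.uniform(-0.25, 0.25) for _ in range(dim)] for w in words}

sequence = fixed_length_sequence(words, importance, length=5)
matrix = embed(sequence, lookup, dim)
print(sequence)
print(len(matrix), len(matrix[0]))
```

Selecting by importance and then re-sorting the kept positions is what preserves the original word order, in contrast to simply cutting the sentence after a fixed number of words.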

Convolution layer
In order to capture semantic fragments of various granularities, three convolution kernels of different widths are used to generate sentence representations. Each convolution kernel is a list of linear layers with shared parameters. Parameter sharing ensures that only one set of parameters needs to be learned during training, rather than a separate set for each position.
where , and is the output vector matrix sizes of the convolutional layer.
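As a rough illustration of how kernels of different widths slide over a sentence matrix, the following sketch applies one kernel each of widths 1, 2, and 3 (the widths reported in the experiments). It assumes a plain "valid" convolution with no padding and uses all-ones weights so the outputs can be checked by hand; the function name `convolve` and the toy data are illustrative, and the model itself uses 100 channels per width with learned weights.

```python
def convolve(sentence, kernel, bias=0.0):
    """Slide one kernel (width x dim weights) over the sentence matrix, without padding."""
    width = len(kernel)
    features = []
    for start in range(len(sentence) - width + 1):
        window = sentence[start:start + width]
        total = bias
        # The same kernel weights are reused at every position (parameter sharing).
        for row, weights in zip(window, kernel):
            total += sum(x * w for x, w in zip(row, weights))
        features.append(total)
    return features

# Toy sentence: 5 words with 2-dimensional embeddings.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 2.0]]

# One kernel per width, all weights 1.0, so each output is the sum of the window.
kernels = {h: [[1.0, 1.0] for _ in range(h)] for h in (1, 2, 3)}

feature_maps = {h: convolve(sentence, k) for h, k in kernels.items()}
for h, fmap in feature_maps.items():
    print(h, fmap)
# 1 [1.0, 1.0, 2.0, 1.0, 2.0]
# 2 [2.0, 3.0, 3.0, 3.0]
# 3 [4.0, 4.0, 5.0]
```

A wider kernel produces a shorter feature map, which is why a pooling step is needed afterwards to obtain a fixed-length vector.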

Pooling layer
In traditional text classification, text is mainly classified according to the most representative words in each sentence. However, such words are hard to identify, and the resulting accuracy is not satisfactory. The convolutional neural network model designed in this paper uses max pooling. From the feature vectors output by the preceding convolution layer, it extracts the factors that have the most significant influence on the classification result, which completes the extraction of local text features from the input data. Max pooling also reduces the amount of computation and limits the network scale, while retaining the essential local features for the subsequent computation.
Let denotes the output of the pooling layer. Taking as an example, let the size of the pooling function be , then the calculation of is as shown in Eq. (2): (2) Then the maximum values are combined as a fixed length vector matrix to be the output matrix of the pooling layer.
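The pooling step can be illustrated with a short sketch. It assumes max-over-time pooling, in which each feature map is reduced to its single largest value; the paper's Eq. (2) may use a different pooling window, so this is an assumption rather than the exact formulation. The feature maps below are hypothetical values for kernels of widths 1, 2, and 3.

```python
def max_over_time(feature_map):
    """Reduce a variable-length feature map to its largest activation."""
    return max(feature_map)

# Hypothetical feature maps of different lengths, keyed by kernel width.
feature_maps = {
    1: [1.0, 1.0, 2.0, 1.0, 2.0],
    2: [2.0, 3.0, 3.0, 3.0],
    3: [4.0, 4.0, 5.0],
}

# One value per feature map, concatenated into a fixed-length vector.
pooled = [max_over_time(feature_maps[h]) for h in sorted(feature_maps)]
print(pooled)  # [2.0, 3.0, 5.0]
```

Whatever the lengths of the individual feature maps, the pooled vector has one entry per kernel, which gives the next layer an input of constant size.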

Activation function
The activation function maps the features through a nonlinear function and removes redundancy in the data, which makes the neural network applicable to nonlinear problems. In this paper, the SRLU function is used as the activation function, and the SRLU function is shown in Eq. (3): (3) where is the slope factor of the positive interval. The larger the value of the slope factor is, the steeper the function becomes in the positive interval. The setting of is related to the saturation point of the negative interval. The larger the value of is, the lower the saturation point will be. represents the distance between the origin and the intersection of the function with the negative part of the y-axis. The larger the value of is, the greater the distance will be. Let , obviously, the negative part of SRLU is always less than zero. Compared with ReLU, the mean output of SRLU tends toward 0, which reduces the bias shift. The derivative of the SRLU function is shown in Eq. (4): (4) Eq. (4) shows that, in the negative interval, the derivative of the SRLU function is the Sigmoid function multiplied by a constant. To ensure the function is differentiable at zero, the parameter constraints are shown in Eq. (5): In the backpropagation of a convolutional neural network, the change in the gradient is represented by a power function whose base is the derivative of the activation function and whose exponent is the number of nonlinear transformation layers [Trottier, Giguère, Chaib-draa et al. (2017)]. In order to avoid gradient vanishing or explosion problems during backpropagation, the slope factor should be set as shown in Eq. (6): The result shown in Eq. (7) can be obtained by solving Eq. (5): (7) So, the exact mathematical definition of the SRLU function is shown in Eq. (8): The improved activation function is shown in Fig. 2. It has sparsity and smoothness, avoids the gradient vanishing problem in the convolutional neural network, and allows the network to converge effectively. Meanwhile, its computational cost is smaller than that of saturated nonlinear activation functions, and its convergence speed is faster.
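The statement that the gradient behaves like a power of the activation derivative can be made concrete with a small numerical sketch. It does not implement SRLU, whose exact definition is given by the paper's equations. Instead it uses the sigmoid derivative at its maximum (0.25) as an example of a saturating activation, and two constant slopes (1.0 and 1.5) as hypothetical positive-interval slopes.

```python
import math

def sigmoid_derivative(x):
    """Derivative of the logistic sigmoid; its maximum value is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def gradient_scale(derivative, layers):
    """Factor contributed by the activation derivative after `layers` nonlinear layers."""
    return derivative ** layers

layers = 10
saturating = gradient_scale(sigmoid_derivative(0.0), layers)  # 0.25 ** 10, vanishes
unit_slope = gradient_scale(1.0, layers)                      # stays exactly 1
steep = gradient_scale(1.5, layers)                           # grows, risk of explosion

print(saturating, unit_slope, steep)
```

Even in the best case for the sigmoid, ten layers shrink the factor to below one millionth, while a slope above 1 makes it grow quickly; a slope of exactly 1 in the positive interval keeps the factor constant.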

Dropout layer
The function of the dropout layer is to prevent over-fitting and improve the training of convolutional neural networks. Let obeys the Bernoulli distribution, then the final output of the model is as shown in Eqs. (9) and (10): where is the th dimension of the upper layer output and is the th dimension of the dropout layer output.
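A minimal sketch of the masking step is given below. It assumes the standard formulation in which each dimension of the previous layer's output is multiplied by an independent Bernoulli variable; the paper's Eqs. (9) and (10) are not reproduced here, and the name `dropout`, the parameter `keep_prob`, and the toy vector are illustrative.

```python
import random

def dropout(vector, keep_prob, rng):
    """Multiply each dimension by an independent Bernoulli(keep_prob) mask value."""
    mask = [1 if rng.random() < keep_prob else 0 for _ in vector]
    return [m * v for m, v in zip(mask, vector)], mask

rng = random.Random(42)
upper_output = [0.8, -1.2, 0.3, 2.5, -0.7, 1.1]

# With a rate of 0.5, keeping and dropping a dimension are equally likely.
dropped, mask = dropout(upper_output, keep_prob=0.5, rng=rng)
print(mask)
print(dropped)
```

Each training step draws a fresh mask, so the network cannot rely on any single dimension being present. At test time the mask is normally not applied, and the outputs are typically rescaled instead.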

Output layer
By putting together the sentence vectors that belong to the same comment, the outputs of the sentence representation model are recombined to form the input of the document representation model. A softmax layer is added on top of the document representation model; it comprehensively considers all the extracted local features to complete the task of identifying deceptive spam opinions. Softmax regression is an algorithm that assigns a target to one of multiple classes. The document representation model first converts the document vector into a real-valued vector whose length is the number of classes N. The softmax layer then converts the real-valued vector to the conditional probability , which uses the average of several output values from the previous layer, namely it considers the feature values represented by the document to identify deceptive spam opinions. The calculation of the softmax function is as shown in Eq. (11): (11)
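The conversion from a real-valued score vector of length N to class probabilities can be sketched with the standard softmax definition. The scores below are hypothetical, and N = 2 is assumed here to stand for the two classes, deceptive and truthful.

```python
import math

def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the classes (deceptive, truthful).
probs = softmax([2.0, 0.5])
predicted = probs.index(max(probs))
print(probs, predicted)
```

Subtracting the maximum score before exponentiation does not change the result, because softmax is invariant to adding a constant to every score. It only improves numerical stability.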

Model training
We use mini-batch gradient descent to train the convolutional neural networks. Mini-batch gradient descent combines the advantages of batch gradient descent and stochastic gradient descent. In each iteration, only a part of the training samples is used, which accelerates convergence and helps avoid local optima as much as possible [Ruder (2016)]. For example, if the number of training samples is 1000 and each iteration uses 10 training samples, the whole data set is divided into 100 mini-batches. Let the loss function of the batch descent method be as shown in Eq. (12): The iterative formula for batch gradient descent is as shown in Eq. (13): where is the learning rate. Then the update of mini-batch gradient descent at each iteration is as shown in Eq. (14): Besides, mitigating over-fitting during model training is important for obtaining accurate results. We use L2 regularization to constrain the parameters of the convolutional neural network and add a dropout layer after generating the sentence representation and the document representation [Srivastava, Hinton, Krizhevsky et al. (2014)]. Dropout is a commonly used method to avoid over-fitting when training deep learning models.
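The mini-batch procedure with the numbers used above (1000 samples, 10 per mini-batch, 100 mini-batches per pass) can be sketched on a toy problem. The example fits a one-dimensional linear model with a squared-error loss, which is an assumption made for brevity and not the loss of the paper's Eq. (12). The optional `l2` term shows where an L2 penalty would enter the update, and all names are illustrative.

```python
import random

def minibatches(indices, batch_size):
    """Split a list of sample indices into consecutive mini-batches."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]

def train(xs, ys, batch_size=10, lr=0.05, l2=0.0, epochs=20, seed=0):
    """Fit y = w * x + b with mini-batch gradient descent on a squared-error loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # a new random partition in every pass over the data
        for batch in minibatches(order, batch_size):
            grad_w = grad_b = 0.0
            for i in batch:
                err = w * xs[i] + b - ys[i]
                grad_w += err * xs[i]
                grad_b += err
            m = len(batch)
            w -= lr * (grad_w / m + l2 * w)  # l2 * w is the gradient of the L2 penalty
            b -= lr * (grad_b / m)
    return w, b

data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(1000)]
ys = [3.0 * x + 1.0 for x in xs]  # noise-free toy target

batches = minibatches(list(range(1000)), 10)
w, b = train(xs, ys)
print(len(batches), round(w, 3), round(b, 3))
```

Each update uses the average gradient over 10 samples, so the parameters are updated 100 times per pass over the data instead of once, as would be the case for full-batch gradient descent.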

Datasets
We use the standard opinion spam datasets from Ott et al. In 2011, Ott et al. [Ott, Choi, Cardie et al. (2011)] first published a gold-standard dataset of positive reviews, which includes 400 truthful positive reviews and 400 deceptive positive reviews. Later, in 2013, a gold-standard dataset of negative reviews was published, which includes 400 truthful negative reviews and 400 deceptive negative reviews [Ott, Cardie and Hancock (2013)]. In 2014, Li et al. [Li, Ott, Cardie et al. (2014)] used the same method to collect 1,636 deceptive reviews and 1,200 truthful reviews in three different domains. We also use a dataset named YelpZip [Rayana and Akoglu (2015)], which contains 608,598 reviews, to test our model.

Results and analysis
To evaluate the different methods, we use four indicators: Accuracy, Precision, Recall, and F1 (F1-score). In this paper, 10-fold cross-validation is adopted [Zhang, Jin, Sun et al. (2018); Zhang, Wu, Feng et al. (2019); Zhang, Lu, Li et al. (2019); Yu, Long and Cai (2017)]: the original data set is divided equally into ten parts, 1/10 of the data is taken as the test set, 1/10 of the remaining data is used as the validation set, and the rest is used as the training set. For each experiment, the ten results obtained are averaged to give the final result. We set the convolution kernel widths of both the sentence representation and the document representation to (1, 2, 3), the number of convolution kernel channels to 100, the dropout rate to 0.5, and the learning rate to 0.001.
The experimental results compared with other algorithms are shown in Tab. 1. The automatically optimized logistic regression method only achieves an F1-score of 55.75%. CNN gives better results by capturing the relationship between words. The other methods we compare against also give lower results than ours.
The experiment based on the dataset of Li mainly verifies the influence of data volume on the experimental results, ignoring the influence of parameter settings. As can be seen from Tab. 2, the best results were obtained on the Hotel dataset, followed by the Doctor dataset, and the worst results were obtained on the Restaurant dataset. This is mainly due to the different amounts of data in the data sets: in general, the larger the amount of data, the more accurate the results. However, the results obtained when using all data sets together are not the best, because the dataset category also has an impact: multi-category spam recognition is less accurate than single-category spam identification.
The experiment based on YelpZip shows the practical value of our method.
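The data split and the four indicators described in this section can be sketched as follows. The split assumes contiguous folds over an index list and takes the first tenth of the remaining data as the validation set, which is one possible reading of the procedure. The labels in the metric example are hypothetical, with 1 standing for the deceptive class, and the function names are illustrative.

```python
def split_fold(n_samples, fold, n_folds=10):
    """Return (train, validation, test) index lists for one cross-validation fold."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    test = indices[fold * fold_size:(fold + 1) * fold_size]
    # Any leftover samples from an uneven division stay in the remaining data.
    rest = indices[:fold * fold_size] + indices[(fold + 1) * fold_size:]
    n_val = len(rest) // n_folds  # 1/10 of the remaining data
    return rest[n_val:], rest[:n_val], test

def scores(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# 800 reviews, as in the 400 truthful + 400 deceptive positive-review dataset.
train_idx, val_idx, test_idx = split_fold(800, fold=0)
print(len(train_idx), len(val_idx), len(test_idx))  # 648 72 80

# Hypothetical labels and predictions for eight reviews.
result = scores([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
print(result)
```

Repeating the split for `fold` values 0 through 9 and averaging the resulting scores gives the averaged result described above.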

Conclusion
In this paper, we build the word vector input by looking up the pre-trained word vectors that Google released, trained on Google News. Moreover, we propose MFCNN, a new document representation model based on the convolutional neural network with a fixed-length input and a new activation function. Experimental results on standard evaluation datasets show that, compared with other existing methods, this model performs better on the problem of opinion spam detection.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.