Abstract
Because online forums are anonymous and open to anyone, it is very hard for human readers to distinguish deceptive reviews from truthful ones. This paper proposes a deep learning approach to text representation, called DCWord (Deep Context representation by Word vectors), for deceptive review identification. The basic idea is that deceptive reviews are written by people without real experience of the purchased goods or services, whereas truthful reviews are written by people with such experience, so the contextual information of words should differ between the two. Unlike state-of-the-art techniques that search for the best linguistic features for representation, we use word vectors to characterize the contextual information of words in deceptive and truthful reviews automatically. An average-pooling strategy (called DCWord-A) and a max-pooling strategy (called DCWord-M) are used to produce review vectors from word vectors. Experimental results on the Spam dataset and the Deception dataset demonstrate that the DCWord-M representation with LR (Logistic Regression) produces the best performance and outperforms state-of-the-art techniques on deceptive review identification. Moreover, the DCWord-M strategy outperforms the DCWord-A strategy in review representation for deceptive review identification. The outcome of this study has potential implications for online review management and for business intelligence applications of deceptive review identification.
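The two pooling strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `review_vector`, the toy three-dimensional word vectors, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration. In practice the word vectors would come from a trained embedding model.

```python
import numpy as np

def review_vector(tokens, word_vectors, strategy="max"):
    """Pool the word vectors of one review into a single fixed-length vector.

    `word_vectors` is a hypothetical dict mapping a word to a 1-D NumPy array.
    """
    # Assumption: tokens without a known vector are simply skipped.
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        # No known words: fall back to a zero vector of the embedding size.
        dim = len(next(iter(word_vectors.values())))
        return np.zeros(dim)
    stacked = np.vstack(vecs)  # shape: (number of known words, dimension)
    if strategy == "average":
        return stacked.mean(axis=0)  # element-wise mean (DCWord-A style)
    if strategy == "max":
        return stacked.max(axis=0)   # element-wise maximum (DCWord-M style)
    raise ValueError(f"unknown pooling strategy: {strategy}")

# Toy word vectors, invented purely for this example.
toy_vectors = {
    "room": np.array([0.2, -0.1, 0.5]),
    "was": np.array([0.0, 0.3, -0.2]),
    "clean": np.array([0.4, 0.1, 0.1]),
}
tokens = ["the", "room", "was", "clean"]  # "the" has no vector and is skipped

avg_vec = review_vector(tokens, toy_vectors, strategy="average")
max_vec = review_vector(tokens, toy_vectors, strategy="max")
```

Either review vector could then be passed to a classifier such as logistic regression, which is the pairing the abstract reports as performing best with max-pooling.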
Acknowledgments
This research is supported in part by the National Natural Science Foundation of China under Grant Nos. 71932002, 61379046, 91318302 and 61432001, and by the Innovation Fund Project of Xi’an Science and Technology Program (Special Series for Xi’an University) under Grant No. 2016CXWL21. The authors also sincerely thank the referees for their practical help in improving the quality of this paper.
Additional information
Wen Zhang is a professor in the College of Economics and Management at Beijing University of Technology (BJUT). He received his PhD degree in knowledge science from the Japan Advanced Institute of Science and Technology in 2009. His recent research interests include machine learning, data mining, and information systems.
Qiang Wang is a PhD candidate in the College of Economics and Management at Beijing University of Technology (BJUT). He received his BS degree in marketing from Qufu Normal University in 2016. His research interests include e-commerce big data analysis, data mining, and machine learning.
Xiangjun Li is a professor with the School of Information Engineering, Xi’an University. She received her PhD from Xidian University in 2013. Her current research interests include data mining, knowledge discovery, and machine learning.
Taketoshi Yoshida is a professor with the School of Knowledge Science, Japan Advanced Institute of Science and Technology. He received his PhD degree in systems engineering from Case Western Reserve University in 1984. His current research interests include knowledge management, knowledge discovery, and information systems.
Jian Li is a professor in the College of Economics and Management at Beijing University of Technology (BJUT). He received his PhD degree from the Chinese Academy of Sciences in 2007. His recent research interests include supply chain finance, blockchain technology, and emergency management.
Cite this article
Zhang, W., Wang, Q., Li, X. et al. DCWord: A Novel Deep Learning Approach to Deceptive Review Identification by Word Vectors. J. Syst. Sci. Syst. Eng. 28, 731–746 (2019). https://doi.org/10.1007/s11518-019-5438-4
DOI: https://doi.org/10.1007/s11518-019-5438-4