
DCWord: A Novel Deep Learning Approach to Deceptive Review Identification by Word Vectors

Journal of Systems Science and Systems Engineering

Abstract

Because online forums are anonymous and open to anyone, it is very hard for human readers to tell deceptive reviews from truthful ones. This paper proposes DCWord (Deep Context representation by Word vectors), a deep learning approach to text representation for deceptive review identification. The basic idea is that deceptive reviews are written by people without real experience of the goods or services purchased online, whereas truthful reviews are written by people with such experience, so the contextual information of words should differ between the two. Unlike state-of-the-art techniques that search for the best linguistic features for representation, we use word vectors to characterize the contextual information of words in deceptive and truthful reviews automatically. An average-pooling strategy (called DCWord-A) and a max-pooling strategy (called DCWord-M) are used to produce review vectors from word vectors. Experimental results on the Spam dataset and the Deception dataset demonstrate that the DCWord-M representation with LR (Logistic Regression) produces the best performance and outperforms state-of-the-art techniques on deceptive review identification. Moreover, the DCWord-M strategy outperforms the DCWord-A strategy in review representation for deceptive review identification. The outcome of this study has potential implications for online review management and for business intelligence related to deceptive review identification.
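
The two pooling strategies named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function name `pool_review`, the toy three-dimensional vectors, the choice to skip words that have no vector, and the reading of max-pooling as an element-wise maximum over embedding dimensions. It is not the authors' code.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pool_review(tokens: Sequence[str],
                word_vectors: Dict[str, Vector],
                strategy: str = "max") -> Vector:
    """Collapse the word vectors of one review into a single review vector.

    Hypothetical helper: "average" mirrors the idea behind DCWord-A and
    "max" the idea behind DCWord-M, both applied per embedding dimension.
    """
    # Assumption: words without a vector are simply skipped.
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("review contains no known words")

    # Transpose so that each tuple holds one dimension across all words.
    columns = list(zip(*vectors))

    if strategy == "average":
        return [sum(col) / len(col) for col in columns]
    if strategy == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown strategy: {strategy}")


# Toy vectors invented for this example; real word vectors would be
# learned from a corpus and have far more dimensions.
toy_vectors = {
    "great": [0.5, -1.0, 2.0],
    "hotel": [1.5, 0.0, -2.0],
    "stay": [-0.5, 4.0, 3.0],
}
review = ["great", "hotel", "stay", "unseen"]

avg_vec = pool_review(review, toy_vectors, "average")
max_vec = pool_review(review, toy_vectors, "max")
print(avg_vec)  # [0.5, 1.0, 1.0]
print(max_vec)  # [1.5, 4.0, 3.0]
```

Either review vector has the same length as a single word vector regardless of review length, which is what lets a fixed-input classifier such as logistic regression be trained on reviews of different sizes.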



Acknowledgments

This research is supported in part by the National Natural Science Foundation of China under Grant Nos. 71932002, 61379046, 91318302 and 61432001, and by the Innovation Fund Project of Xi’an Science and Technology Program (Special Series for Xi’an University under Grant No. 2016CXWL21). The authors also sincerely thank the referees for their practical help in improving the quality of this paper.

Author information

Corresponding author

Correspondence to Wen Zhang.

Additional information

Wen Zhang is a professor in the College of Economics and Management at Beijing University of Technology (BJUT). He received his PhD degree in knowledge science from the Japan Advanced Institute of Science and Technology in 2009. His recent research interests include machine learning, data mining, and information systems.

Qiang Wang is a PhD candidate in the College of Economics and Management at Beijing University of Technology (BJUT). He received his BS degree in marketing from Qufu Normal University in 2016. His research interests include e-commerce big data analysis, data mining, and machine learning.

Xiangjun Li is a professor with the School of Information Engineering, Xi’an University. She received her PhD from Xidian University in 2013. Her current research interests include data mining, knowledge discovery, and machine learning.

Taketoshi Yoshida is a professor with the School of Knowledge Science, Japan Advanced Institute of Science and Technology. He received his PhD degree in systems engineering from Case Western Reserve University in 1984. His current research interests include knowledge management, knowledge discovery, and information systems.

Jian Li is a professor in the College of Economics and Management at Beijing University of Technology (BJUT). He received his PhD degree from the Chinese Academy of Sciences in 2007. His recent research interests include supply chain finance, blockchain technology, and emergency management.


Cite this article

Zhang, W., Wang, Q., Li, X. et al. DCWord: A Novel Deep Learning Approach to Deceptive Review Identification by Word Vectors. J. Syst. Sci. Syst. Eng. 28, 731–746 (2019). https://doi.org/10.1007/s11518-019-5438-4


  • DOI: https://doi.org/10.1007/s11518-019-5438-4
