A framework for fake review detection in online consumer electronics retailers

https://doi.org/10.1016/j.ipm.2019.03.002

Abstract

The impact of online reviews on businesses has grown significantly in recent years, and they are now crucial to business success in a wide array of sectors, ranging from restaurants and hotels to e-commerce. Unfortunately, some users resort to unethical means to improve their online reputation, writing fake reviews of their own businesses or of competitors. Previous research has addressed fake review detection in a number of domains, such as product or business reviews in restaurants and hotels. However, in spite of its economic interest, the domain of consumer electronics businesses has not yet been thoroughly studied. This article proposes a feature framework for detecting fake reviews that has been evaluated in the consumer electronics domain. The contributions are fourfold: (i) construction of a dataset for classifying fake reviews in the consumer electronics domain in four different cities, based on scraping techniques; (ii) definition of a feature framework for fake review detection; (iii) development of a fake review classification method based on the proposed framework; and (iv) evaluation and analysis of the results for each of the cities under study. We reached an 82% F-score on the classification task, and the AdaBoost classifier proved to be the best one according to the Friedman statistical test.

Introduction

Online consumer product reviews are playing an increasingly important role for customers, constituting a new type of word-of-mouth (WOM) information (Chen & Xie, 2008). Recent research shows that 52% of online consumers use the Internet to search for product information, while 24% of them use the Internet to browse products before making purchases (Ha, Bae, & Son, 2015). As a result, online reviews have a strong impact on consumers’ purchase decisions in e-commerce, affecting the most relevant areas, such as travel and accommodation (Filieri & McLeay, 2014; Sotiriadis & Van Zyl, 2013), online retailers (Awad & Ragowsky, 2008), and entertainment (Chevalier & Mayzlin, 2006; Dhar & Chang, 2009; Zhu & Zhang, 2006). Moreover, online reviews of the same product can be found in multiple sources of information, which can be classified according to the party that hosts the WOM information (Park, Gu, & Lee, 2012) into internal WOM, hosted by retailers (e.g. Amazon, Walmart, BestBuy), and external WOM, hosted by independent product review providers (e.g. CNET, Yelp, TripAdvisor, Epinions).

Nevertheless, only credible reviews have a significant impact on consumers’ purchase decisions (Chakraborty & Bhat, 2018). Moreover, product category significantly affects the credibility of WOM (Mudambi & Schuff, 2010). Consumer electronics is the most reviewed product category online (Chan & Ngai, 2011), for a number of reasons. On the one hand, consumer electronics usually require a significant investment, and the more valuable and expensive an item is, the more it is researched. According to Riegner (2007), consumer electronics is the product category most influenced by online reviews, which affect 24% of the purchases made in it, and WOM is the second most influential information source for this category after search engines. On the other hand, consumers tend to research consumer electronics products because these products change very frequently, with new products and updates of existing ones (Chakraborty & Bhat, 2018). Thus, consumers frequently rely on reviews to avoid making a wrong purchase decision (Park & Kim, 2008). As a result, Horrigan and Vitak (2008) report that more than 50% of consumer electronics buyers consult several WOM sources before making a purchase decision.

Some studies (Gu, Park, & Konana, 2012) show that retailer-hosted online WOM strongly influences sales of low-involvement products, such as books or CDs. However, consumers usually conduct pre-sales research for high-involvement products, such as consumer electronics. Thus, for consumer electronics, the retailer’s internal WOM has a limited influence, while external WOM sources have a significant impact on the retailer’s reputation and sales (Cui, Lui, & Guo, 2012). Hence, consumer electronics retailers are more sensitive to the effects of external WOM, since they cannot easily act on it.

Since both consumers and retailers are overwhelmed by the huge number of opinions available in internal and external WOM sources, automatic natural language processing and sentiment analysis techniques have frequently been applied. Some of the most frequent application domains are review polarity classification (Poria, Cambria, & Gelbukh, 2016), review summarization (Potthast & Becker, 2010), competitive intelligence acquisition (Dey, Haque, Khurdiya, & Shroff, 2011) and reputation monitoring (Ziegler & Skubacz, 2006).

Given the importance of reviews for businesses and the difficulty of building a good reputation on the Internet, several techniques have been used to improve online presence, including unethical ones. Fake reviews are one of the most popular unethical methods, and they are present on sites such as Yelp and TripAdvisor. However, according to Jindal and Liu (2007b), not all fake reviews are equally harmful. Fake negative reviews on good-quality products are very harmful for the targeted businesses and, along with fake positive reviews on poor-quality products, are also harmful for consumers. Fake positive reviews on poor-quality products additionally harm competitors who offer average or good-quality products but have fewer reviews.

The goal of this article is to analyze the fake review problem in the consumer electronics field, more precisely by studying Yelp businesses in four of the biggest cities in the USA. No prior research has been carried out in this specific field, restaurants and hotels being the most studied domains to date. We aim to show that the fake review detection problem in online consumer electronics retailers can be addressed by machine learning means, and to examine whether its difficulty depends on geographic location.

In order to achieve this goal, we have followed a principled approach. Based on a literature review and experimentation, a feature framework for fake review detection is proposed, which includes contributions such as the exploitation of the social perspective. This framework, called the Fake Feature Framework (F3), helps to organize and characterize features for fake review detection. F3 considers information coming from both the user (personal profile, reviewing activity, trust information and social interactions) and the review itself (review text), establishing a framework with which to categorize existing research. In order to evaluate the effectiveness of the features defined in F3, a dataset covering four different cities has been collected from the social network Yelp, and a classification model has been developed and evaluated.
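To make the two-dimensional organization of F3 concrete, the following Python sketch groups features into the user and review categories described above. The field names are hypothetical illustrations of each dimension (profile, activity, trust, social interactions, and review text), not the paper's exact feature set.

```python
from dataclasses import dataclass

@dataclass
class UserCentricFeatures:
    """F3 user dimension; field names are illustrative only."""
    profile_has_photo: bool   # personal profile
    review_count: int         # reviewing activity
    elite_years: int          # trust information (e.g. Yelp Elite badges)
    friend_count: int         # social interactions

@dataclass
class ReviewCentricFeatures:
    """F3 review dimension: derived from the review text alone."""
    word_count: int           # length of the review
    capital_ratio: float      # share of upper-case characters
    sentiment_score: float    # polarity of the text

@dataclass
class F3Instance:
    """One labeled review, combining both F3 dimensions."""
    user: UserCentricFeatures
    review: ReviewCentricFeatures
    is_fake: bool             # class label (fake vs. genuine)
```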

The remainder of the paper is structured as follows. Section 2 reviews the state of the art on fake review detection in other domains. Afterwards, Section 3 presents the methodology followed and introduces the proposed feature framework. The experimentation is detailed in Section 4. Finally, Section 5 highlights and discusses the main results obtained.

Related work

The task of fake review detection has been studied since 2007, starting with the analysis of review spamming (Jindal & Liu, 2007b). In that work, the authors analyzed the case of Amazon, concluding that manually labeling fake reviews may prove challenging, as fake reviewers can carefully craft their reviews in order to make them more believable to other users. Consequently, they proposed using duplicate or near-duplicate reviews as spam in order to develop a model that detects fake reviews (Jindal & Liu, 2007b).
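Jindal and Liu's idea of using near-duplicates as a proxy spam label can be approximated with a simple shingling check. The sketch below, which is an illustration rather than the authors' exact procedure, computes Jaccard similarity over word bigrams and uses a hypothetical 0.9 threshold.

```python
def shingles(text: str, n: int = 2) -> set:
    """Return the set of word n-grams (shingles) in a review."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(review_a: str, review_b: str, threshold: float = 0.9) -> bool:
    """Flag a review pair as near-duplicates (candidate spam label)."""
    return jaccard(shingles(review_a), shingles(review_b)) >= threshold
```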

Methodology

The methodology followed in this article is shown in Fig. 1. The first step is building the dataset from Yelp by means of web scraping (Section 3.1). Then, a feature model is defined and computed (Section 3.2) and used to train a classifier that detects fake reviews (Section 4).
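A minimal sketch of this pipeline in Python, assuming the scraped reviews have already been converted into a feature matrix X with labels y; the AdaBoost classifier and ten-fold cross-validation mirror the setup reported in the experimentation, while the placeholder data below merely stands in for the scraped Yelp dataset.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# X: one row per review with its F3 features; y: 1 = fake, 0 = genuine.
# Random placeholders stand in for the scraped Yelp dataset.
rng = np.random.default_rng(0)
X = rng.random((500, 12))
y = rng.integers(0, 2, 500)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1")  # ten-fold CV, F-score
print(f"Mean F1 over 10 folds: {scores.mean():.2f}")
```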

Experimentation and evaluation

This section describes the experiments carried out to develop and evaluate a fake review classification model based on the framework previously described. The classifier has been trained and tested on the previously scraped consumer electronics dataset and evaluated using ten-fold cross-validation. We also study the impact that user centric and review centric features have on performance, and the possible differences across the cities under study. After analyzing the results obtained, a statistical significance analysis based on the Friedman test is carried out.
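The statistical comparison of classifiers over multiple folds can be performed with the Friedman test mentioned in the abstract (see Demšar, 2006, in the references). The sketch below applies scipy's implementation to hypothetical per-fold F-scores of three classifiers; the numbers are illustrative, not the paper's results.

```python
from scipy.stats import friedmanchisquare

# Hypothetical F-scores of three classifiers over the same ten folds.
adaboost    = [0.83, 0.81, 0.82, 0.84, 0.80, 0.82, 0.83, 0.81, 0.82, 0.84]
rand_forest = [0.80, 0.79, 0.81, 0.80, 0.78, 0.80, 0.79, 0.81, 0.80, 0.79]
log_reg     = [0.76, 0.75, 0.77, 0.76, 0.74, 0.75, 0.76, 0.77, 0.75, 0.76]

# Null hypothesis: all classifiers perform equally across folds.
stat, p = friedmanchisquare(adaboost, rand_forest, log_reg)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```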

Conclusions

In this paper we have addressed fake review detection in the consumer electronics domain. We have proposed a feature framework oriented to the analysis of social sites, and we have also built a dataset which is made available for future research. Our framework is composed of two main types of features: review centric and user centric. Review centric features are related only to the text of the review. User centric features, on the other hand, capture the user's personal profile, reviewing activity, trust information and social interactions.

Acknowledgments

This work is supported by the Spanish Ministry of Economy and Competitiveness under the R&D projects SEMOLA (TEC2015-68284-R) and EmoSpaces (RTC-2016-5053-7), by the Regional Government of Madrid through the project MOSI-AGIL-CM (grant P2013/ICE-3019), by the European Union under TRIVALENT (H2020 RIA Action, Grant No. 740934, call SEC-06-FCT-2016), and by the Spanish MINETAD (TSI-102600-2016-1).

References (60)

  • S. Banerjee et al. (2015). Using supervised learning to classify authentic and fake online reviews. Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication.
  • L. Breiman (2001). Random forests. Machine Learning.
  • L. Breiman et al. (1984). Classification and Regression Trees.
  • U. Chakraborty et al. (2018). The effects of credible online reviews on brand equity dimensions and its consequence on consumer behavior. Journal of Promotion Management.
  • Y.Y. Chan et al. (2011). Conceptualising electronic word of mouth activity: An input-process-output perspective. Marketing Intelligence & Planning.
  • Y. Chen et al. (2008). Online consumer review: Word-of-mouth as a new element of marketing communication mix. Management Science.
  • J.A. Chevalier et al. (2006). The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research.
  • D.R. Cox (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society, Series B (Methodological).
  • G. Cui et al. (2012). The effect of online consumer reviews on new product sales. International Journal of Electronic Commerce.
  • J. Demšar (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research.
  • R.K. Dewang et al. (2015). Identification of fake reviews using new set of lexical and syntactic features. Proceedings of the Sixth International Conference on Computer and Communication Technology.
  • L. Dey et al. (2011). Acquiring competitive intelligence from social media. Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data.
  • G. Fei et al. (2013). Exploiting burstiness in reviews for review spammer detection. ICWSM.
  • S. Feng et al. (2012). Distributional footprints of deceptive product reviews. ICWSM.
  • R. Filieri et al. (2014). E-WOM and accommodation: An analysis of the factors that influence travelers’ adoption of information from online reviews. Journal of Travel Research.
  • E. Fitzpatrick et al. (2015). Automatic detection of verbal deception. Synthesis Lectures on Human Language Technologies.
  • T. Fornaciari et al. (2014). Identifying fake Amazon reviews as learning from crowds. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics.
  • N. Friedman et al. (1997). Bayesian network classifiers. Machine Learning.
  • D.H. Fusilier et al. (2015). Detecting positive and negative deceptive opinions using PU-learning. Information Processing & Management.
  • Y. Goldberg et al. (2014). word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.