A framework for fake review detection in online consumer electronics retailers
Introduction
Online consumer product reviews play an increasingly important role for customers, constituting a new type of word-of-mouth (WOM) information (Chen & Xie, 2008). Recent research shows that 52% of online consumers use the Internet to search for product information, while 24% of them use the Internet to browse products before making purchases (Ha, yong Bae, & Son, 2015). As a result, online reviews have a strong impact on consumers’ purchase decisions in e-commerce, affecting the most relevant areas, such as travel and accommodation (Filieri & McLeay, 2014; Sotiriadis & Van Zyl, 2013), online retailers (Awad & Ragowsky, 2008), and entertainment (Chevalier & Mayzlin, 2006; Dhar & Chang, 2009; Zhu & Zhang, 2006). Moreover, online reviews of the same product can be found in multiple sources of information, which can be classified (Park, Gu, & Lee, 2012) according to the party that hosts the WOM information: internal WOMs, hosted by retailers (e.g. Amazon, Walmart, BestBuy), and external ones, hosted by independent product review providers (e.g. CNET, Yelp, TripAdvisor, Epinions).
Nevertheless, only credible reviews have a significant impact on consumers’ purchase decisions (Chakraborty & Bhat, 2018). Moreover, product category significantly affects the credibility of WOMs (Mudambi & Schuff, 2010). Consumer electronics is the most reviewed product category online (Chan & Ngai, 2011), for several reasons. On the one hand, consumer electronics usually require a significant investment, and the more valuable and expensive an item is, the more it is researched. According to one study (Riegner, 2007), consumer electronics is the product category most influenced by online reviews: they influence 24% of the products acquired in this category, and WOMs are the second most influential source after search engines. On the other hand, consumers tend to research consumer electronics because these products change very frequently, with new products and updates of existing ones (Chakraborty & Bhat, 2018). Thus, consumers frequently rely on reviews to avoid making a wrong purchase decision (Park & Kim, 2008). As a result, Horrigan and Vitak (2008) report that more than 50% of consumer electronics buyers tend to consult several WOMs before making a purchase decision.
Some studies (Gu, Park, & Konana, 2012) show that retailer-hosted online WOM strongly influences sales of low-involvement products, such as books or CDs. However, consumers usually conduct pre-purchase research for high-involvement products, such as consumer electronics. Thus, in consumer electronics, the retailer’s internal WOM has a limited influence, while external WOM sources have a significant impact on the retailer’s reputation and sales (Cui, Lui, & Guo, 2012). Hence, consumer electronics retailers are more sensitive to the effects of external WOMs, since they cannot easily act on them.
Since both consumers and retailers are overwhelmed by the huge number of opinions available in internal and external WOM sources, automatic natural language processing and sentiment analysis techniques have frequently been applied. Some of the most frequent application domains are review polarity classification (Poria, Cambria, & Gelbukh, 2016), review summarization (Potthast & Becker, 2010), competitive intelligence acquisition (Dey, Haque, Khurdiya, & Shroff, 2011) and reputation monitoring (Ziegler & Skubacz, 2006).
Given the importance of reviews for businesses and the difficulty of building a good reputation on the Internet, several techniques have been used to improve online presence, including unethical ones. Fake reviews are one of the most popular unethical methods, and they are present on sites such as Yelp or TripAdvisor. However, according to Jindal and Liu (2007b), not all fake reviews are equally harmful. Fake negative reviews of good-quality products are very harmful for enterprises and, along with fake positive reviews of poor-quality products, are also harmful for consumers. Fake positive reviews of poor-quality products also harm competitors who offer average or good-quality products but do not have as many reviews.
The goal of this article is to analyze the fake review problem in the consumer electronics field, more precisely by studying Yelp businesses from four of the biggest cities in the USA. No prior research has been carried out in this specific field; restaurants and hotels are the most studied cases to date. We want to show that the fake review detection problem in online consumer electronics retailers can be solved by machine learning means, and to determine whether the difficulty of doing so depends on geographic location.
In order to achieve this goal, we have followed a principled approach. Based on literature review and experimentation, a feature framework for fake review detection is proposed, which includes some contributions such as the exploitation of the social perspective. This framework, called the Fake Feature Framework (F3), helps to organize and characterize features for fake review detection. F3 considers information coming from both the user (personal profile, reviewing activity, trusting information and social interactions) and review elements (review text), establishing a framework with which to categorize existing research. In order to evaluate the effectiveness of the features defined in F3, a dataset from the social site Yelp in four different cities has been collected, and a classification model has been developed and evaluated.
The remainder of the paper is structured as follows. Section 2 reviews the state of the art on fake review detection in other domains. Afterwards, Section 3 presents the methodology followed and introduces the proposed feature framework. Experimentation is detailed in Section 4. Finally, Section 5 highlights and discusses the main results obtained.
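As a minimal sketch of how the two feature families in F3 could be organized in code, the following Python groups user-centric and review-centric features into separate records and flattens them into a single vector. The individual field names (for example `friend_count` or `exclamation_ratio`) and the helper functions are illustrative assumptions, not the features or implementation defined in the paper.

```python
from dataclasses import dataclass, asdict


@dataclass
class UserCentricFeatures:
    # Hypothetical examples, one per user-level information source named in F3.
    has_profile_photo: bool   # personal profile
    review_count: int         # reviewing activity
    useful_votes: int         # trusting information
    friend_count: int         # social interactions


@dataclass
class ReviewCentricFeatures:
    # Hypothetical examples derived only from the review text.
    word_count: int
    exclamation_ratio: float


def review_features(text: str) -> ReviewCentricFeatures:
    """Compute the illustrative review-centric features from raw text."""
    words = text.split()
    # Share of characters that are exclamation marks (0.0 for empty text).
    ratio = text.count("!") / len(text) if text else 0.0
    return ReviewCentricFeatures(word_count=len(words), exclamation_ratio=ratio)


def to_vector(user: UserCentricFeatures, review: ReviewCentricFeatures) -> list[float]:
    """Flatten both feature groups into one numeric vector, user features first."""
    values = list(asdict(user).values()) + list(asdict(review).values())
    return [float(v) for v in values]


user = UserCentricFeatures(has_profile_photo=True, review_count=12,
                           useful_votes=3, friend_count=40)
vector = to_vector(user, review_features("Great phone! Fast delivery!"))
```

Keeping the two families in separate records makes it straightforward to train on only one of them later, which is one way the contribution of each group could be compared.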
Related work
The task of fake review detection has been studied since 2007, with the analysis of review spamming (Jindal & Liu, 2007b). In this work, the authors analyzed the case of Amazon, concluding that manually labeling fake reviews can be challenging, as fake reviewers could carefully craft their reviews in order to make them appear more credible to other users. Consequently, they proposed the use of duplicates or near-duplicates as spam in order to develop a model that detects fake reviews (Jindal &
Methodology
The methodology followed in this article is shown in Fig. 1. The first step is building the dataset from Yelp by means of web scraping (Section 3.1). Then, a feature model is defined and computed (Section 3.2) for training a classifier that detects fake reviews (Section 4).
Experimentation and evaluation
This section describes the experiments carried out to develop and evaluate a fake review classifier model based on the framework previously described. The classifier has been trained and tested on the previously scraped consumer electronics dataset and evaluated using ten-fold cross-validation. We also examine the impact that the user-centric and review-centric features may have on performance, and the possible differences across cities. After analyzing the results obtained, a statistical
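The evaluation protocol mentioned in this section can be sketched as follows. This is a minimal pure-Python illustration of ten-fold cross-validation, assuming a hypothetical `train` callable that returns a prediction function. The majority-class learner is only a placeholder and is not the classifier used in the paper, the folds are not stratified, and the data are toy values. The helper `select_columns` is likewise an assumption, showing one way to restrict the feature matrix to a single feature family.

```python
import random
from typing import Callable, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]
Trainer = Callable[[list, list], Model]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train: Trainer, k: int = 10) -> float:
    """Return mean accuracy over k folds; assumes len(rows) >= k."""
    scores = []
    for held_out in k_fold_indices(len(rows), k):
        test_set = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in test_set]
        model = train([rows[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(model(rows[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


def majority_trainer(train_rows, train_labels) -> Model:
    """Placeholder learner: always predicts the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda row: majority


def select_columns(rows, columns):
    """Keep only the given feature columns, e.g. one feature family."""
    return [[row[c] for c in columns] for row in rows]


# Toy data: 20 rows with three hypothetical feature columns each.
rows = [[float(i), float(i % 3), float(i % 2)] for i in range(20)]
labels = [0] * 15 + [1] * 5  # 0 = genuine, 1 = fake (illustrative labels)

accuracy_all = cross_validate(rows, labels, majority_trainer)
accuracy_subset = cross_validate(select_columns(rows, [0, 1]), labels,
                                 majority_trainer)
```

Swapping `majority_trainer` for a real learner and calling `cross_validate` once per feature subset gives a simple way to compare user-centric and review-centric features under the same folds, since the fold assignment depends only on the fixed seed and the number of rows.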
Conclusions
In this paper we have addressed fake review detection in the consumer electronics domain. We have proposed a feature framework oriented to the analysis of social sites, and we have also developed a dataset which is made available for future research. Our framework is composed of two main types of features: review-centric and user-centric. Review-centric features are only related to the text of the review. On the other
Acknowledgments
This work is supported by the Spanish Ministry of Economy and Competitiveness under the R&D projects SEMOLA (TEC2015-68284-R) and EmoSpaces (RTC-2016-5053-7), by the Regional Government of Madrid through the project MOSI-AGIL-CM (grant P2013/ICE-3019), the European Union under Trivalent (2020 RIA Action Grant No. 740934 under the call SEC-06-FCT-2016), and Spanish MINETAD (TSI-102600-2016-1).
References (60)
An effective LDA-based time topic model to improve blog search performance. Information Processing & Management (2017).
Does chatter matter? The impact of user-generated content on music sales. Journal of Interactive Marketing (2009).
Detection of fake opinions using time series. Expert Systems with Applications (2016).
The effects of consumer knowledge on message processing of electronic word-of-mouth via online consumer reviews. Electronic Commerce Research and Applications (2008).
The relationship between retailer-hosted and third-party hosted WOM sources and their influence on retailer sales. Electronic Commerce Research and Applications (2012).
Computational approaches for mining user’s opinions on the web 2.0. Information Processing & Management (2014).
Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems (2016).
Opinion summarization of web comments. European Conference on Information Retrieval (2010).
Opinion fraud detection in online reviews by network effects. ICWSM (2013).
Establishing trust in electronic commerce through online word of mouth: An examination across genders. Journal of Management Information Systems (2008).
Using supervised learning to classify authentic and fake online reviews. Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication.
Random forests. Machine Learning.
Classification and regression trees.
The effects of credible online reviews on brand equity dimensions and its consequence on consumer behavior. Journal of Promotion Management.
Conceptualising electronic word of mouth activity: An input-process-output perspective. Marketing Intelligence & Planning.
Online consumer review: Word-of-mouth as a new element of marketing communication mix. Management Science.
The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research.
The regression analysis of binary sequences. Journal of the Royal Statistical Society. Series B (Methodological).
The effect of online consumer reviews on new product sales. International Journal of Electronic Commerce.
Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research.
Identification of fake reviews using new set of lexical and syntactic features. Proceedings of the Sixth International Conference on Computer and Communication Technology 2015.
Acquiring competitive intelligence from social media. Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data.
Exploiting burstiness in reviews for review spammer detection. ICWSM.
Distributional footprints of deceptive product reviews. ICWSM.
E-WOM and accommodation: An analysis of the factors that influence travelers’ adoption of information from online reviews. Journal of Travel Research.
Automatic detection of verbal deception. Synthesis Lectures on Human Language Technologies.
Identifying fake Amazon reviews as learning from crowds. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics.
Bayesian network classifiers. Machine Learning.
Detecting positive and negative deceptive opinions using PU-learning. Information Processing & Management.
Word2vec explained: Deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722.