A systematic literature review about the consumers' side of fake review detection – Which cues do consumers use to determine the veracity of online user reviews?

Background: Consumers rely heavily on online user reviews when shopping online, and cybercriminals produce fake reviews to manipulate consumer opinion. Much prior research focuses on the automated detection of these fake reviews, but automated methods are far from perfect. Therefore, consumers must be able to detect fake reviews on their own. In this study, we survey the research examining how consumers detect fake reviews online. Methods: We conducted a systematic literature review of the research on fake review detection from the consumer perspective. We included academic literature that reports new empirical data. We provide a narrative synthesis comparing the theories, methods and outcomes used across studies to identify how consumers detect fake reviews online. Results: Only 15 articles met our inclusion criteria. We classify the most frequently identified cues into five categories: (1) review characteristics, (2) textual characteristics, (3) reviewer characteristics, (4) seller characteristics and (5) characteristics of the platform where the review is displayed. Discussion: We find that theory is applied inconsistently across studies and that cues to deception are often identified in isolation, without any unifying theoretical framework. Consequently, we discuss how such a theoretical framework could be developed.


Introduction
Private and business life has increasingly migrated to online platforms in recent years (Spithoven, 2020). Likewise, criminals are increasingly turning to the online world and finding new ways to deceive victims (Van Nek & Bolz, 2021). This includes online shopping fraud (Spithoven, 2020), which can have (severe) physical, emotional and psychological consequences (Button & Cross, 2017; Button, Nicholls, Kerr, & Owen, 2014; Jansen & Leukfeldt, 2018). The emotional and psychological consequences of online shopping fraud include feeling unsafe in the online environment, stress about finances, and feeling guilty about, or being blamed for, having been victimized through one's own behavior (Button & Cross, 2017; Button et al., 2014; Spalek, 2016; Stevens et al., 2023). The experience of being defrauded can lead to feelings of hopelessness, low self-worth or depression (Button & Cross, 2017; Spalek, 2016). The stress can also lead to (severe) physical consequences, such as psychosomatic symptoms or post-traumatic stress disorder, which can present as skin diseases and insomnia (Button et al., 2014; Spalek, 2016). Due to the increasing use of online shopping, especially since the COVID-19 pandemic (Statista Research Department, 2021), online shopping fraud has also increased (Chevalier, 2021; Trivedi, 2021). Fraudsters use different techniques to trick consumers within online shopping fraud; we focus on fake user reviews as one of the most prominent examples of consumer fraud (Bambauer-Sachse & Mangold, 2013; Malbon, 2013).
Research has shown that user reviews have a substantial impact on consumers' purchase decisions (Malbon, 2013), and Hu, Bose, Koh, and Liu (2012) show that consumers increasingly rely on these reviews. Therefore, if user reviews on online platforms such as Amazon, Google, or TripAdvisor are fake, consumers could be manipulated into buying low-quality or unsafe products or services. It is not yet possible to determine the specific costs of fake reviews to consumers, in part because there are as yet no longitudinal studies of their consequences (Spithoven, 2020). However, there are studies showing the costs of shopping fraud more generally. A 2022 representative study by the German Federal Office for Information Security (BSI) shows that eight percent of its participants had been victims of online shopping fraud. The consequences of this fraud included financial damage, damaged trust in brands and shopping platforms, and lost time (BSI, 2022). Unfortunately, the report could not specify how these costs are distributed across different cybercrimes, and it does not differentiate between types of online shopping fraud (BSI, 2022). Nonetheless, given the known high costs of shopping fraud generally, the costs of fake online reviews are likely to be substantial (Román, Riquelme, & Iacobucci, 2023).
Fake reviews are reviews intended to mislead customers in their decision to purchase a product. They are often written by reviewers with little or no actual experience of the products or services. Fake reviews can be either unwarrantedly positive, aiming to promote a product, or unjustifiably negative, for example to damage the reputation or sales of competitor products and brands. While research on fake review detection has increased in recent years (e.g. Patel & Patel, 2018; Rastogi & Mehrotra, 2017; Santos, Camargo, & Lacerda, 2020), existing literature reviews such as Vidanagama et al. (2020) and Hussain, Turab Mirza, Rasool, Hussain, and Kaleem (2019) have focused on algorithmic, machine learning techniques. Algorithmic fake review detection is mostly used by businesses, since the typical user cannot compare thousands of reviews algorithmically (Ansari & Gupta, 2021a; Cohen, 2020). Consequently, it is important that we also understand how the typical consumer judges whether a review is genuine or fake. Some literature reviews do consider consumers and do not focus solely on algorithmic fake review detection (Ansari & Gupta, 2021a; Wu et al., 2020). Ansari and Gupta (2021a) focus on the umbrella term review manipulation: they review different spam detection models and identify characteristics of fake reviews (e.g. fake reviews lack first-hand experience of the product). Furthermore, they review the different themes that have been researched in the online review manipulation context. For example, they examine the characteristics and motives of organizations that engage in review manipulation, the impact of review manipulation, and how consumers respond to an online review that they have determined is not genuine. However, they do not consider how consumers detect fake reviews in the first place (Ansari & Gupta, 2021a). Wu et al. (2020) do consider the critical role consumers play, as the ultimate decision makers, in determining how fake reviews impact the market. However, their review does not consider consumers' decision-making processes. Rather, it identifies the antecedents (e.g. who writes fake reviews and for what purpose) and consequences (e.g. damaged credibility of reviews and reduced trust in businesses) of fake reviews, and contrasts existing interventions against fake reviews. These interventions are technical and legal, such as algorithmic detection or regulatory controls. Nevertheless, Wu et al. (2020) conclude that how much fake reviews impact the market ultimately depends on which website displays the review and on whether consumers perceive themselves to have an influence.
Therefore, there remains a need to consider ways of informing and protecting consumers from fake reviews (Ansari & Gupta, 2021a). An important step toward supporting consumers is to know not only how they respond to fake reviews (as studied by, e.g., Ansari & Gupta, 2021a), but also how they process cues from online reviews to estimate veracity. This literature review is a first step in gathering the existing literature on how consumers determine whether an online review is genuine. In particular, we seek to determine, based on the existing literature, which cues to veracity actually inform consumer decisions and how consumers process those cues to form a veracity judgment.
Understanding the processes by which consumers make judgments will allow for the development of more directly tailored interventions. Thus far, only a few websites help consumers analyze the credibility of reviews, for example ReviewMeta, Fakespot, The Review Index and Review Sceptic (Cohen, 2020). However, it is unclear what proportion of users are aware of or engage with such tools, or what level of protection they confer. Given the limitations and inaccessibility of technical support systems, it remains important for consumers to be able to distinguish between fake and genuine reviews themselves. However, there is at present no clear overview of how people make this distinction. To be successful from a socio-technical perspective, websites or algorithms that support consumers' veracity judgments must facilitate users' ability to accurately identify relevant detection cues. That is why in this paper we consolidate existing work on fake review detection in order to identify the detection cues used by consumers that prior research has already identified.

State of the art
Researchers commonly distinguish between three types of fake reviews (Ansari, Gupta, & Dewangan, 2018; Hu et al., 2012; Mayzlin, Dover, & Chevalier, 2014; Ren & Ji, 2019). The first type is untruthful opinions, which are reviews that present false opinions of products. The second type is reviews that focus on the brand of the product rather than the product itself, which means that they do not actually give consumers advice about a specific product. The last type is non-reviews, which cover advertisements for different products or services and other irrelevant information without containing any opinions. The second and third types of fake reviews are easily detected by machines and people, yet most consumers have no idea, or are unsure, how to detect the first type, untruthful opinions (Cohen, 2020). In addition, even when people are confident in detecting fake reviews, research shows that humans are only about 60% to 80% accurate in labeling user reviews as fake or genuine (Plotkina, Munzel, & Pallud, 2020; Shukla et al., 2019). Compared to machine learning approaches, which can be up to 90% accurate, human judges are far less accurate in detecting fake reviews (Harris, 2012; Hovy, 2016; Kim, Kang, Shin, & Myaeng, 2021; Kronrod, Lee, & Gordeliy, 2017; Masip, Bethencourt, Lucas, Segundo, & Herrero, 2012; Ott, Choi, Cardie, & Hancock, 2011).
Due to the superiority of machine learning over human detection, existing research on fake review detection mostly focuses on automated fake review detection (Ansari et al., 2018; Ezhilarasan, Govindasamy, Akila, & Vadivelan, 2019; Hossain et al., 2020; Hussain et al., 2019; Patel & Patel, 2018; Vidanagama et al., 2020). Since most of the research is on automated fake review detection, and because the cues used to train machine learning approaches to fake review detection may have relevance to how humans detect fake reviews, we will also consider the features that are used to train machine learning approaches to identify fake reviews.
Most features used to identify fake reviews can be categorized as (1) review centric, (2) reviewer centric, (3) meta-data centric or (4) product centric (Heydari, Tavakoli, & Salim, 2015; Mohawesh et al., 2021; Rastogi, 2020; Rastogi & Mehrotra, 2017; Ren & Ji, 2019). Features from the different categories are sometimes combined for better detection accuracy. Appendix C provides a detailed overview of the features used in the existing literature; we briefly describe each category here.
Review centric features are the most popular feature category used to train automated fake review detection tools (Alsubari, Shelke, & Deshmukh, 2020, pp. 3846-3856; Budhi, Chiong, & Wang, 2021; Ott et al., 2011; Reddy, Babu, Scholar, & Tech, 2020). Review centric features are aspects of the reviews themselves that can indicate whether a review is genuine or fake. For example, the use of adverbs can be indicative, because fake reviews generally contain more adverbs than genuine reviews (Ahmed, Traore, & Saad, 2018; Ott et al., 2011; Shukla et al., 2019).
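To illustrate what a review centric feature looks like in practice, the sketch below computes a few such features from the review text alone. It is a minimal sketch under stated assumptions: the function name `review_centric_features` is hypothetical, and adverbs are approximated by tokens ending in "-ly" rather than identified with a proper part-of-speech tagger, so it is not the feature extraction used in any of the cited studies.

```python
import re

# Lowercase word tokens; punctuation and digits are ignored.
TOKEN_RE = re.compile(r"[a-z']+")


def review_centric_features(text: str) -> dict[str, float]:
    """Compute a few illustrative review-centric features from the text alone.

    Assumption: tokens longer than three letters that end in "ly" are treated
    as adverbs. This is a crude proxy (it misses "well", miscounts "family")
    and stands in for a real part-of-speech tagger.
    """
    tokens = TOKEN_RE.findall(text.lower())
    n_tokens = len(tokens)
    adverb_like = sum(1 for tok in tokens if len(tok) > 3 and tok.endswith("ly"))
    return {
        "length_tokens": float(n_tokens),
        # Share of adverb-like tokens; defined as 0.0 for an empty review.
        "adverb_ratio": adverb_like / n_tokens if n_tokens else 0.0,
        "exclamations": float(text.count("!")),
    }


print(review_centric_features("Absolutely amazing hotel, really truly perfect!"))
# {'length_tokens': 6.0, 'adverb_ratio': 0.5, 'exclamations': 1.0}
```

In an automated detector such values would be only some of many inputs to a trained classifier; on their own they say little about any single review.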
The second most popular feature category is the reviewer centric type (Barbado, Araque, & Iglesias, 2019; Fazzolari, Buccafurri, Lax, & Petrocchi, 2020; Fontanarava, Pasi, & Viviani, 2017; Gera, Thakur, & Singh, 2015; Goswami, Park, & Song, 2017; Harris, 2018; Y. Li, Feng, & Zhang, 2016; Pillala, Bhansali, Reddy, & Rojamani, 2020; Pokharkar, Shete, & Ghogare, 2017, p. 7; Shehnepoor, Salehi, Farahbakhsh, & Crespi, 2017; Wang et al., 2016). These features assess aspects of the user who posted a review, rather than the content of the review itself. Reviewer centric features include trust features (Fazzolari et al., 2020; Goswami et al., 2017). An example of trust features is personal features of the reviewer, such as whether they use their real name (Pillala et al., 2020). Recent research on reviewer centric features also focuses on detecting reviews written by the same reviewer under different names (Le, Li, & Li, 2022): some reviewers post the same review of a product under different usernames to pretend to be multiple people expressing the same opinion of that product.
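The multi-account pattern described above can be made concrete with a small check that groups identical review texts for one product and reports texts posted under more than one username. This is only a sketch: it assumes exact matches after lowercasing and whitespace normalization, the function names are hypothetical, and it does not reproduce the method of Le, Li, and Li (2022).

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not hide duplicates."""
    return " ".join(text.lower().split())


def multi_account_duplicates(reviews: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map each review text posted under more than one username to those usernames.

    `reviews` is a list of (username, review_text) pairs for a single product.
    """
    authors_by_text: dict[str, set[str]] = defaultdict(set)
    for username, text in reviews:
        authors_by_text[normalize(text)].add(username)
    return {text: users for text, users in authors_by_text.items() if len(users) > 1}


reviews = [
    ("anna_k", "Best blender I have ever owned."),
    ("mark77", "best blender I have   ever owned."),
    ("jules", "Too loud, returned it after a week."),
]
print(multi_account_duplicates(reviews))
```

Exact matching keeps the idea visible but misses lightly paraphrased copies; a real system would need fuzzier text comparison and signals beyond the text.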
The third category is meta-data features, which include temporal features (Khurshid, Zhu, Xu, Ahmad, & Ahmad, 2019), such as the date variance of the reviews (the amount of time that passes between reviews).
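One way to turn "the amount of time that passes between reviews" into a number is to sort a product's review dates, take the gaps in days between consecutive reviews, and summarize them. The sketch below assumes that the mean and population variance of those gaps are the summary statistics; the cited work may define date variance differently, and the function names are hypothetical.

```python
from datetime import date
from statistics import mean, pvariance


def inter_review_gaps(dates: list[date]) -> list[int]:
    """Days elapsed between consecutive reviews, after sorting chronologically."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def date_variance_features(dates: list[date]) -> dict[str, float]:
    """Summarize inter-review gaps; a burst of reviews shows up as small gaps."""
    gaps = inter_review_gaps(dates)
    if not gaps:  # fewer than two reviews: there is no gap to summarize
        return {"mean_gap_days": 0.0, "gap_variance": 0.0}
    return {
        "mean_gap_days": float(mean(gaps)),
        "gap_variance": float(pvariance(gaps)),
    }


review_dates = [date(2023, 1, 7), date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)]
print(date_variance_features(review_dates))  # {'mean_gap_days': 2.0, 'gap_variance': 2.0}
```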
The fourth category is product centric features (Wang et al., 2016), for example the price or the average rating of the product (Wang et al., 2016).
Other authors, such as Z. Wu et al. (2017), include topological features (interaction features on social media) and group features in addition to the four most common categories. Furthermore, some algorithmic approaches based on deep learning do not use specific review-, behavior- or product centric features (Hernandez, Rahman, Recabarren, & Carbunar, 2018; Wu et al., 2020).
This prior work helpfully identifies cues to online review veracity and provides a classification system for those cues. It therefore offers a useful starting point for identifying and classifying the cues consumers use to detect fake online reviews. However, humans do not process information in an algorithmic way (Korteling et al., 2021), so we need to know whether humans use the cues identified in algorithmic approaches, and whether those cues can be categorized similarly. Moreover, identifying and classifying cues would not directly tell us how consumers process cues to make veracity decisions. It is also likely that the ultimate decision about whether to trust a review will remain with the consumer: algorithmic fake review detection is unlikely to remove all fake reviews, and automated fake review detection is not realistically accessible to all consumers (Kennedy, Walsh, Sloka, Foster, & McCarren, 2019). Therefore, there is a clear need to better understand how consumers identify and process cues to deception in online reviews.

Research question
In light of the shortcomings of machine learning approaches and the high likelihood consumers will encounter fake reviews, supporting consumers in their judgement remains an important task. However, it is not well understood how consumers approach detecting fake reviews in the first place, or how far consumer judgements make use of known cues of deception used within automated detection. In order to develop successful interventions for consumers we first need to understand how they make their own judgements regarding review veracity. Therefore, our research question is: What cues and features are being used by consumers for the detection of fake reviews in online shopping recommendation systems?
We address this question via a systematic review of the extant peer reviewed literature and use our findings to discuss a research agenda to further shape the direction of this young research field.

Methods
To address our research question we use a systematic review methodology following the guidelines of Kitchenham (2004).

Inclusion and exclusion criteria
We first determined the scope of our study through our inclusion and exclusion criteria, which we present in Table 1. The inclusion criteria required that the research focus on fake review detection, be user-centered, and identify cues for detecting online shopping fraud. Therefore, studies that compare the accuracy and performance of automated fake review detection approaches and human judges, without indicating the cues the human judges used to identify the veracity of a review, were not included in the scope of this study (e.g. Kim et al., 2021; Ott et al., 2011; Plotkina et al., 2020; Yuan et al., 2016). We included papers about any kind of online shopping, for example booking hotels, shopping for clothing, gifts or household goods, or shopping via Marketplace platforms. Included research had to provide new empirical data that had not been previously reported, but we did not limit studies by design: both qualitative and quantitative studies of all types could be included as long as they met our other inclusion criteria. Under our exclusion criteria, we excluded papers that are not peer-reviewed academic literature, so citations, patents, books, PhD theses, and Master and Bachelor theses were excluded from the literature search. We also excluded everything published before 2007, because Jindal and Liu (2007) were the first to provoke wider analysis of fake reviews and their detection. Furthermore, we excluded research that does not provide new empirical data on fake review detection, such as literature reviews. Research on spammer and spammer group identification, as well as fake news detection, was excluded because it did not fall within the scope of this study. Finally, for pragmatic reasons we excluded papers not written in Dutch, English or German, since we were not able to translate other languages for our analysis.

Data sources and search strategy
Table 1. Inclusion and exclusion criteria for the papers.
Inclusion criteria:
- Fake review detection
- Quantitative studies when cues are mentioned for detecting fake online reviews
- Qualitative studies when cues are mentioned for detecting fake online reviews
- User-centered
- Must include new empirical data that identifies human-rated cues for identifying fake online reviews
Exclusion criteria:
- Machine learning where algorithms were used
- Not peer-reviewed academic literature, such as books, Master and Bachelor theses, etc.
- Papers written in a language other than German, Dutch or English
- Citations & patents
- Books
- Articles that could not be accessed despite our best efforts

The key objective of this research is to identify all studies that focus on how consumers detect fake reviews in online shopping. Before conducting the systematic search, we ran some preliminary searches to identify keywords and synonyms associated with our topic (see Appendix A and Table 2). From these synonyms, we constructed a search string that included all known synonyms of fake reviews and detection cues, using Boolean operators to distinguish between synonyms and essential keywords. We included the user perspective because we focus on consumers' detection techniques, and we included synonyms for e-commerce. Because we did not want to include automated fake review detection, we excluded machine learning terms. Our final search string was entered into the search engines Web of Science, IEEE Xplore and Elsevier ScienceDirect: (deception cues OR detection strategies OR deception detection OR spam indicators OR opinion fraud detection OR opinion spam detection OR opinion spam features) AND (opinion scams OR opinion fraud OR fake reviews OR opinion spam OR online review OR review spam) AND (user-perspective OR consumer-perspective OR consumer OR human) AND (recommender systems OR persuasive recommendations OR e-commerce) NOT (machine learning OR autoencoder OR deep learning). However, Web of Science and IEEE Xplore produced only six results, and Elsevier ScienceDirect returned none. Therefore, we decided to broaden the search to Google Scholar.
The search string was too long for Google Scholar, so we used Harzing's Publish or Perish software to extract the literature. We combined the results from all search engines for our analyses.
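The structure of the search string reported in this section (synonyms joined with OR inside each group, groups joined with AND, and machine learning terms excluded with NOT) can be reproduced programmatically. The sketch below rebuilds the exact string from its term groups; the group names are our own labels, and how each database parses unquoted multi-word terms is not modeled.

```python
# Term groups copied from the reported search string; group names are our labels.
DETECTION_TERMS = [
    "deception cues", "detection strategies", "deception detection",
    "spam indicators", "opinion fraud detection", "opinion spam detection",
    "opinion spam features",
]
REVIEW_TERMS = [
    "opinion scams", "opinion fraud", "fake reviews", "opinion spam",
    "online review", "review spam",
]
PERSPECTIVE_TERMS = ["user-perspective", "consumer-perspective", "consumer", "human"]
CONTEXT_TERMS = ["recommender systems", "persuasive recommendations", "e-commerce"]
EXCLUDED_TERMS = ["machine learning", "autoencoder", "deep learning"]


def or_group(terms: list[str]) -> str:
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(required: list[list[str]], excluded: list[str]) -> str:
    """AND together the required synonym groups, then NOT the excluded group."""
    required_part = " AND ".join(or_group(group) for group in required)
    return required_part + " NOT " + or_group(excluded)


query = build_query(
    [DETECTION_TERMS, REVIEW_TERMS, PERSPECTIVE_TERMS, CONTEXT_TERMS], EXCLUDED_TERMS
)
print(query)
print(len(query), "characters")
```

Printing the length makes it easy to see how quickly a string of this kind grows, which is consistent with the need for a tool such as Publish or Perish when a search interface rejects long queries.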

Study selection process
The selection process of this literature review had several steps. First, the first author read the titles of the papers and decided whether each title indicated that the paper might meet the inclusion criteria. Articles were excluded only when the title indicated they were clearly irrelevant. The abstracts of the retained articles were then read. Again, articles were excluded at this stage only if it was clear that the paper did not fall within the scope of the review; all remaining articles were retained for full-text screening. At full-text reading, papers were retained only if they met all inclusion criteria and no exclusion criteria. Data collection and extraction were performed by the first author alone. We agreed that if the first author was unsure whether a study fitted the scope of the review, the other authors would examine the paper and a decision would be made through discussion. In practice, this was never necessary, since the decisions were clear.

Data extraction and data items
The data extraction was conducted in a standardized way (Biolchini, Mian, Natali, & Travassos, 2005; Castelli, Stevens, & Jakobi, 2019). We collected general information such as publication year and the country in which the study was conducted. Additionally, we collected information about the methodology and the theories (if any) that were used. Lastly, we collected the deception cues that were used or identified (depending on the methodology) and the main findings of each study. We used the classification of cues to review veracity from algorithmic approaches to fake review detection as our starting point; we present these classifications in Appendix C. Where those categorizations did not apply, we used the reviewed authors' descriptions of the cues, and any categorical classification of those cues they provided, to determine the classification. Where these descriptions did not fit any of the preexisting categories in Appendix C, we clustered cues based on core common characteristics. For example, Ansari et al. (2018) found that the more influence the seller has on the website, the less trustworthy the reviews are perceived to be. Since this feature does not fit the meta-data, reviewer centric, review centric or product centric categories, we grouped it with seller reputation and trade record under the category "seller characteristics". We coded the data with the software MAXQDA, which allowed us to keep track of and classify all cues to deception identified across all studies in our review. We also used MAXQDA to track the methods and theories used in each reviewed study by forming a descriptive table.

Study risk of bias
Systematic reviews often assess the extent to which individual study findings may be at risk of bias. However, we did not formally assess risk of bias, because the variety of methodologies in the studies meant there would be no clear or fair way to compare risk of bias across studies (Higgins et al., 2021). Instead, we limited our analysis to assessing the quality of the methodology used across the identified studies as a whole.

Synthesis
Our aims were to survey the theories, methods and outcomes used in studies on fake review detection thus far, and to synthesize their commonalities and differences. Therefore, we employed a narrative rather than statistical synthesis. Since we did not perform any meta-analysis due to the variety of different methodologies used in the different studies, we did not ask the authors for original data.

Study selection
Our search strategy initially identified 989 articles. However, our preliminary keyword search had found 26 articles that the systematic search did not find but whose titles implied they might be relevant for our study. Therefore, we added these articles to those identified through our keyword search. In total, we identified 1,015 potentially relevant articles for our analysis. We did four rounds of selection, as outlined in Fig. 1. In the first round, we scanned the titles of all articles and excluded literature that obviously did not cover fake review detection, duplicates and literature about reviewer detection (n = 697), which left 318 articles. In the next round, we examined the abstracts and excluded research about fake news and research focusing on the perpetrators of fake reviews, because these also fall outside the scope of our review (n = 223). Of the 95 articles that remained after abstract screening, four were not available online, so we contacted their authors. One author did not respond to these requests, so Gupta and George (2013) could not be accessed and was excluded. We then examined the full text of the remaining 94 articles and, after applying the inclusion and exclusion criteria, excluded 79 more papers.
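The counts reported for the selection process can be checked by recomputing each stage of the flow. The stage labels below are our own shorthand for the steps described in the text, and `selection_flow` is a hypothetical helper name.

```python
def selection_flow() -> list[tuple[str, int]]:
    """Recompute the number of articles remaining after each selection stage."""
    identified = 989 + 26  # systematic search + additions from the preliminary search
    after_titles = identified - 697  # excluded on title screening
    after_abstracts = after_titles - 223  # excluded on abstract screening
    full_text = after_abstracts - 1  # Gupta and George (2013) could not be accessed
    included = full_text - 79  # excluded on full-text screening
    return [
        ("identified", identified),
        ("after title screening", after_titles),
        ("after abstract screening", after_abstracts),
        ("full text assessed", full_text),
        ("included", included),
    ]


for stage, count in selection_flow():
    print(f"{stage}: {count}")
```

Each stage matches the figures in the text (1,015, 318, 95, 94 and finally 15 included articles).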
This means that the analysis is based on the following 15 articles, of which five were from the Google Scholar orientation search: Ananthakrishnan, Li, and Smith (2020)

Study characteristics
The 15 papers we found during our literature search can be categorized into three groups. First, there are papers that directly address our research question by identifying or testing cues consumers use when determining the veracity of an online review (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b; DeAndrea et al., 2018; Kronrod et al., 2017; Peng et al., 2016; Román et al., 2019, pp. 141-166). Second, there are papers that did not focus on actual deception cues but on the broader concept of trust (Ananthakrishnan et al., 2020; Filieri, 2016; Munzel, 2015, 2016; Racherla et al., 2012). Lastly, there are papers that did not focus on actual deception cues but on the broader concept of credibility (Jensen et al., 2013; Kusumasondjaja et al., 2012; Luo et al., 2013). For an overview of all papers, including the theories and methods they used and the cues they found, see Appendix B.
In the following, we discuss the cues identified and the methods and theories used across the studies.

Detection cues found to be used by consumers
We found seven papers that concentrated on identifying the fake review detection cues used by consumers (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b; DeAndrea et al., 2018; Kronrod et al., 2017; Peng et al., 2016; Román et al., 2019, pp. 141-166). However, three papers by Ansari and colleagues report on different aspects of the same mixed-methods study. The three reports use different theoretical frameworks and report different aspects of the results. Therefore, we report their results and theoretical frameworks separately but treat the methods as belonging to a single study.
We found five studies (Ananthakrishnan et al., 2020; Filieri, 2016; Munzel, 2015, 2016; Racherla et al., 2012) that focus on the factors that influence perceived trustworthiness, and three studies (Jensen et al., 2013; Kusumasondjaja et al., 2012; Luo et al., 2013) that focus on the cues that influence the perceived credibility of reviews. We find that deception cues and cues to trustworthiness and credibility are inseparable: while deception cues give reasons why reviews might be fake, trustworthiness and credibility cues give reasons why reviews might be genuine. Conversely, a lack of deception cues could also influence trustworthiness and credibility. Therefore, we considered them all together.

Review characteristics
Review characteristics include the review content (what the review actually says and the details it gives about the product), the review's usefulness count (Ansari et al., 2018), and other characteristics of the review itself.
Review extremity, for example the use of 1- or 5-star ratings, was found in four different studies to be an indicator of untruthfulness (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b; Filieri, 2016). Argument quality was found to be an indicator of genuine reviews (Luo et al., 2013; Racherla et al., 2012; Román et al., 2019, pp. 141-166), with higher quality arguments indicating genuineness. Ansari and colleagues (Ansari et al., 2018; Ansari & Gupta, 2019) and Kronrod et al. (2017) both found that a review that is overly short or long is perceived as more suspicious. Filieri (2016) and Kusumasondjaja et al. (2012) found that some consumers use the valence of a review as a cue to its veracity: negative online reviews are perceived as more credible than positive ones. However, a positive review has more influence on initial trust than a negative one (Kusumasondjaja et al., 2012). Jensen et al. (2013), on the other hand, found that two-sided reviews are more credible. They furthermore found that the complexity of language does not influence the credibility assessment of a review, while high affective intensity decreases review credibility (Jensen et al., 2013). Ansari et al. (2018) found that reviews with a higher helpfulness count are more likely to be perceived as genuine. Additionally, Ansari and Gupta (2019) identify peripheral cues, such as verified badges, the most recent reviews and 1- and 5-star ratings, which shift the assessment of a review in a positive direction (verified badges and the most recent reviews) or a negative direction (1- and 5-star ratings). Multiple studies show that reviews that are consistent with other reviews in content and rating are perceived to be more genuine (Filieri, 2016; Munzel, 2015, 2016; Peng et al., 2016; Román et al., 2019, pp. 141-166).
In this vein, Filieri (2016) and Munzel (2016) show that the number of reviewers complimenting or complaining about the same things positively influences the reviews' perceived plausibility. Essentially, consistent reviews provide social proof, which is known to affect purchasing decisions (Amblee & Bui, 2011). However, if the wording of the reviews is too similar, consumers perceive them as fabricated and therefore less genuine (Peng et al., 2016). Furthermore, Peng et al. (2016) and Román et al. (2019, pp. 141-166) found that the number of reviews per product has a positive influence on the perceived veracity of a review and on the perceived trustworthiness of the product: consumers perceive a product as more trustworthy the more reviews it has.
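Peng et al. (2016) describe a judgment that consumers make by reading. An automated analogue, swapped in here purely for illustration, is to compare the word sets of two reviews with Jaccard similarity (shared distinct words divided by all distinct words) and flag pairs above a threshold. The 0.8 threshold and the function names are assumptions, not values or methods from the reviewed studies.

```python
import re
from itertools import combinations

WORD_RE = re.compile(r"[a-z']+")


def word_set(text: str) -> set[str]:
    """Distinct lowercase words in a review, ignoring punctuation."""
    return set(WORD_RE.findall(text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two reviews (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def too_similar_pairs(reviews: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Index pairs of reviews whose wording overlap meets the assumed threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(reviews), 2)
        if jaccard(a, b) >= threshold
    ]


reviews = [
    "Great coffee machine, fast delivery, would buy again.",
    "Great coffee machine and fast delivery, would buy again!",
    "The milk frother broke after two weeks.",
]
print(too_similar_pairs(reviews))  # [(0, 1)]
```

Word-set overlap ignores word order and synonyms, so it only captures near-verbatim copying, which is closest to the "too similar wording" cue described above rather than to the broader consistency between reviews that consumers treat as social proof.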

Textual characteristics
Textual characteristics include writing style, such as the use of pronouns and adverbs, but also the amount of detail and the spelling and grammar of the review (Ansari et al., 2018; Filieri, 2016). Research has found that textual characteristics can influence the perceived veracity of a review (Ansari et al., 2018; Ansari & Gupta, 2021b; Kronrod et al., 2017). Kronrod et al. (2017) found that people assess a review based on avoidance of detail, specific descriptions or names, vagueness, omission of information, and the presence of grammar or spelling errors. Ansari and Gupta (2021b) find that referencing, flattering, contextual embedding, and detailing influence the perceived veracity of a review. The authors define "referencing" as providing cues that identify the people involved, for example the use of first-person pronouns. Ansari and Gupta (2021b) find that reviews containing more personal pronouns are perceived as more genuine. The authors argue that this counter-intuitive outcome could be due to a relational connection that the reader makes between the reviewer and the product, which enhances the subjectivity and therefore the believability of the review (Ansari & Gupta, 2021b). In a different study, Ansari et al. (2018) also find that reviews with a higher level of subjectivity are perceived as more likely to be genuine. "Contextual embedding" means making statements that place the event in the context of a specific time and place. Deceivers often leave out these connections to actual events, so contextual embedding increases the perception that a review is genuine (Ansari & Gupta, 2021b). "Detailing" means providing enough detail in a text or review that only a few open questions about the product remain. In many contexts, detailing is associated with genuineness, because it implies knowledge of and experience with the product.
However, in online reviews it is perceived as deceptive because consumers believe excessive details are suspicious (Ansari & Gupta, 2021b). "Flattering", on the other hand, refers to the existence of stylistic textual elements which review authors uses to appear pleasant and to connect with the audience. In an online review, flattering could be represented through the use of exclamation marks or emoticons. Flattering often leads to reviews being perceived as not genuine, because, for example the excessive use of exclamation marks, is assumed to reflect harmful intent in a reviewer (Ansari & Gupta, 2021b).

Reviewer characteristics
Another important aspect consumers look at when trying to determine the veracity of a review is the identity of the reviewer (Ansari et al., 2018; DeAndrea et al., 2018; Kusumasondjaja et al., 2012; Luo et al., 2013; Munzel, 2016; Peng et al., 2016; Román et al., 2019, pp. 141-166). That is, the more information there is about the reviewer, the more likely the review is to be perceived as genuine. If the perceived similarity between the reader and the reviewer is high, the review is also more likely to be perceived as genuine (Racherla et al., 2012). Consumers judge the similarity between themselves and reviewers based on the background information and details that reviewers provide. Examples include opinions or preferences expressed in a review that match those of the consumer, and sociodemographic information that some platforms display, such as age or nationality (Racherla et al., 2012; Román et al., 2019, pp. 141-166). DeAndrea et al. (2018) found that the more independent of the product the reviewer is perceived to be, the more genuine the review is perceived to be. They also found that the less influence the seller has on the displayed reviews, the more genuine the reviews are perceived to be. Luo et al. (2013) found that high source credibility can increase the credibility of a recommendation. Peng et al. (2016) found that participants consider the seller's reputation and trade record: the better these are, the more trustworthy the product's reviews are perceived to be. The more influence the seller has on the website, the less trustworthy the reviews are perceived to be (Ansari et al., 2018; DeAndrea et al., 2018). That is, if consumers believe the seller can delete or publish the displayed reviews (for example, on the seller's own business website), they perceive the reviews as less trustworthy.

Platform characteristics
Platform characteristics include not only which platform is used (a company website or a website with different vendors) but also the layout, design and information displayed on the platform (Ansari et al., 2018; Munzel, 2015). Several studies found that the type of website on which a review is displayed matters, for example, a company website, a review platform or a neutral website (Ansari et al., 2018; DeAndrea et al., 2018; Filieri, 2016; Román et al., 2019, pp. 141-166). Furthermore, if a website displays fake reviews and marks them as such, the website and its reviews are appraised as more trustworthy (Ananthakrishnan et al., 2020; Munzel, 2015, 2016). These studies show that participants place more trust in websites that flag fraudulent reviews and summarize the fraud information in the form of a trust score (Ananthakrishnan et al., 2020; Munzel, 2016). This suggests that consumers prefer to be aware of fraudulent reviews. They also likely prefer to have their cognitive load reduced by a summary score rather than having to go through all reviews themselves. Munzel (2015) shows in an experiment that two features increase the perceived trustworthiness of reviews. The first is a third-party seal indicating that the website's fake review detection algorithm is approved. The second is the display of information about the website's detection support mechanisms.
Three studies indicated that prior knowledge and training affect veracity decisions. DeAndrea et al. (2018), Munzel (2015) and Munzel (2016) show that knowledge about review spam has a positive effect on accurately detecting fake reviews. However, Kronrod et al. (2017) found that educating participants about specific cues to deception makes them overly suspicious, so that they tend to judge real reviews as fake. This could reduce the popularity and sales of sellers on a platform (Ansari et al., 2018). This negative effect of knowledge about cues to deception disappears, however, if a third-party seal indicates that the website uses a good detection support mechanism (Munzel, 2016).

Mixed-methods
Four studies used a mixed-methods approach that combined qualitative and quantitative methods. Ansari and colleagues (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b) used interviews as a preliminary study and followed them with an experimental survey. They reported the combined findings across three papers. Peng et al. (2016) first conducted interviews and then an online survey without any experimental component. Kronrod et al. (2017) used a different approach: they asked participants to generate fake and genuine reviews and then asked a second batch of participants to judge the veracity of the generated reviews in an experimental survey. Racherla et al. (2012) first conducted focus groups with undergraduate students and then an experimental survey with a second batch of participants.
Ansari and colleagues (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b) conducted an exploratory study interviewing 20 Indian online shoppers. Its aim was to understand customers' information processing mechanisms and the strategies they adopt to deal with review manipulation in India. Peng et al. (2016) conducted 16 semi-structured interviews, asking participants about their knowledge of and experience with fake review detection. Racherla et al. (2012) conducted focus groups with 20 undergraduate students. Kronrod et al. (2017) randomly assigned 1190 participants from the United States of America to one of six conditions in which they wrote a review of a hotel or motel stay. In the authentic condition, participants wrote a review of a hotel they had actually stayed at. The remaining five conditions were fake conditions. In the first of these, participants were given no guidance on how to write a fake review that seems genuine. In the other four, they were given the following clues: use more past tense, more unique words, more abstract language, and more personal pronouns.
In their main study, Ansari et al. (2018) randomly assigned 202 postgraduate students to one of four groups and asked them to sort 30 reviews into fake and truthful reviews (Ansari et al., 2018; Ansari & Gupta, 2021b). Similarly, Kronrod et al. (2017) asked 328 MTurk workers to read 60 reviews each and to classify them as genuine or fake, with five conditions receiving different (or no) clues beforehand. The 60 reviews comprised 30 real reviews and 30 fake reviews taken from the control group of their Study 1, which had received no directions on how to write a convincing fake review. Peng et al. (2016), on the other hand, conducted a pen-and-paper survey with 199 Chinese students to assess how consumers perceive different manipulation tactics. Racherla et al. (2012) used a 2 × 2 (within-subjects) × 2 (between-subjects) experimental design to identify the factors that influence consumers' trust in a review. The manipulated variables were valence and argument quality of the content, and sociodemographic information. They asked 283 undergraduates to search online for a hotel where they could stay with their friends. They then presented the participants with four reviews and asked them to assign trust scores (Racherla et al., 2012).

Interviews
While the mixed-methods studies used interviews as exploratory pre-studies, two studies conducted interviews as the main study to gain more insight into how consumers deal with online consumer reviews (Filieri, 2016; Román et al., 2019, pp. 141-166). Filieri (2016) conducted 38 in-depth interviews with TripAdvisor users in the United Kingdom about their experiences with online user reviews. Román et al. (2019, pp. 141-166) held 18 in-depth, semi-structured interviews in which participants were asked to describe their most recent experiences with online consumer reviews before buying a product or service. These were followed by a few open-ended questions about how often they read reviews, on which platforms, and how much they trust reviews.

Experiments
Three studies conducted a single experimental survey (DeAndrea et al., 2018; Jensen et al., 2013; Kusumasondjaja et al., 2012), while two studies conducted multiple experiments in the form of surveys (Ananthakrishnan et al., 2020; Munzel, 2015). Munzel (2016) first analyzed real online reviews and then used the data to conduct an experimental survey with 390 participants. Jensen et al. (2013) conducted a two-part experimental laboratory survey with 231 students.
DeAndrea et al. (2018) conducted a 2 (review spamming knowledge) × 2 (review platform) between-subjects experiment in which 123 participants in the United States of America were randomly assigned to one of four conditions. For the review spamming knowledge factor, participants read one of two articles: an article about fake reviews on Yelp as an awareness stimulus, or an article about iPhones as a control stimulus. For the review platform factor, the authors created dummy Yelp webpages and dummy restaurant webpages for restaurants that do not exist (DeAndrea et al., 2018). Jensen et al. (2013) conducted a 2 (lexical complexity: high versus low) × 2 (two-sidedness: high versus low) × 2 (affect intensity: high versus low) fully crossed experiment in the United States of America. The 231 participants were randomly assigned to one of the eight experimental conditions, each presented as a separate product webpage. A single base product review was edited for every combination of high and low (1) lexical complexity, (2) two-sidedness and (3) affect intensity (Jensen et al., 2013). Kusumasondjaja et al. (2012) conducted a 2 × 2 between-subjects experiment with travellers from 31 countries who were visiting Bali at the time of recruitment. The reviews displayed to participants were manipulated on message valence (positive or negative) and reviewer identity (identified or anonymous) (Kusumasondjaja et al., 2012). Ananthakrishnan et al. (2020) conducted three online experiments in the United States of America in which participants selected a review from a fictitious restaurant review portal. They had 820 participants in total (238, 293 and 289 in the first, second and third experiments, respectively). In the first experiment, they compared the groups' behavior using metrics such as clicks, time spent, number of page visits and page activities.
In the second experiment, they measured the influence of four display conditions: fraud information displayed; fraud information displayed together with a trust score (footnote 7); only the trust score displayed; and a silent approach with a note that reviews had been removed. Participants were asked to bet on which review they trusted more and could also divide their virtual chips. The third experiment tested how fake reviews influenced consumer behavior under different product quality settings (high, medium and low quality). Participants were again asked to bet on which restaurant would win a competition for the best restaurant (all the necessary information was available on the website), and they could also split their money between betting for and against the restaurant. After placing their bets, participants were informed that the website contained fake reviews and were offered the opportunity to look at the fake reviews and change their bets (Ananthakrishnan et al., 2020). Munzel (2015) conducted three online experiments in France with a total of 549 participants (197, 211 and 141 in the first, second and third experiments, respectively). In all experiments, the control group was given no additional information, while the other groups were primed with a news article about the existence of fake reviews. Three further (dis)trust cues were also tested: an independent Consumer Report seal for the website's detection capabilities, the review site's own indication of the probability that a review is fake, and information about the overall rating of the product (Munzel, 2015).
In a follow-up study, Munzel (2016) conducted a related 3 (identity disclosure) × 2 (consensus) × 2 (priming of fake reviews) experiment in France with 390 participants. Participants read newspaper articles that either did or did not activate knowledge of fake reviews and were then shown a screenshot of a restaurant review. The review was manipulated to have high, medium or low identity disclosure and to be either consistent or inconsistent with the overall rating. Afterwards, participants were asked about their perception of trustworthiness, purchase intention and avoidance behavior (Munzel, 2016). Luo et al. (2013) conducted an online questionnaire with 199 students in China to test whether recommendation persuasiveness affects the perceived credibility of a review. First, participants were asked to read a new recommendation relevant to a current need or interest and copy its link into the questionnaire. The second part consisted of questions measuring the constructs of their model, and the third part consisted of demographic questions (Luo et al., 2013).

Theories used in research on human fake review detection
In this section we introduce the theories used across the reviewed studies and describe how they were applied to consumers' review veracity decisions. Warranting theory was used by three studies (Ansari et al., 2018; Ansari & Gupta, 2019; DeAndrea et al., 2018). The elaboration likelihood model (ELM) was also used by three studies (Ansari & Gupta, 2019; Luo et al., 2013; Román et al., 2019, pp. 141-166), in each case in combination with other theories. Two studies used uncertainty reduction theory (Kusumasondjaja et al., 2012; Racherla et al., 2012), and one study (Ansari & Gupta, 2021b) used speech act theory (SAT). Credibility theory was used by two studies (Filieri, 2016; Jensen et al., 2013). Five studies did not use existing theoretical frameworks (Ananthakrishnan et al., 2020; Kronrod et al., 2017; Munzel, 2015, 2016; Peng et al., 2016). Of these five studies, four based their research questions and hypotheses on earlier research without constructing a specific theoretical model (Ananthakrishnan et al., 2020; Kronrod et al., 2017; Munzel, 2015; Peng et al., 2016).

Warranting theory
Warranting theory was used by three studies (Ansari et al., 2018; Ansari & Gupta, 2019; DeAndrea et al., 2018), one of which combined it with the ELM (Ansari & Gupta, 2019). The key principle of warranting theory is that the amount of control a sender is perceived to have over information shapes how that information is judged (Ansari et al., 2018; Ansari & Gupta, 2019; DeAndrea et al., 2018). If the receiver has the impression that the sender has strong influence over the information, the receiver will perceive the information as less valuable; that is, it has a low warranting value. In our context, this means that the more consumers perceive the content to be open to manipulation (e.g., through the ability to edit or censor reviews), the less authentic they will perceive the review to be (DeAndrea et al., 2018). The studies show that warranting theory can explain the relationship between how the sender of a message (e.g., a reviewer) is perceived and how credible the message is judged to be. Ansari et al. (2018) examined review extremity, helpfulness, informativeness, subjectivity and objectivity. They showed that some of these features are perceived as more impressive (helpfulness votes, subjectivity and moderate informativeness) and others as less impressive (extreme star ratings, objectivity, and extreme or no informativeness), and that they therefore affect the warranting value of the review. DeAndrea et al. (2018) showed that knowledge about review spamming and the review platform (whether the reviews appear on a business review website or on the business's own website) influence the perceived veracity of a review by increasing or decreasing its warranting value. The more neutral and independent a review platform is perceived to be, the higher the perceived veracity of its reviews (DeAndrea et al., 2018). Ansari et al. (2018) state that this effect on warranting value can also be explained by the ELM, which can additionally explain why some information is more persuasive than other information.

Elaboration Likelihood Model (ELM)
The ELM was used by three studies (Ansari & Gupta, 2019; Luo et al., 2013; Román et al., 2019, pp. 141-166). Of these, one combined it with cognitive dissonance theory (Román et al., 2019, pp. 141-166), one with warranting theory (Ansari & Gupta, 2019), and one with theoretical considerations about information quality (Luo et al., 2013). The key principle of the ELM is that information is processed via either a central or a peripheral route (Petty & Brinol, 2011; Petty & Cacioppo, 1984). When the receiver of a message is motivated and able to actively process the information, it is processed via the central route. It is then systematically analyzed in terms of argument strength, the credibility of the facts presented, and compatibility with prior knowledge, producing a firm decision that is not easily changed afterwards. If the receiver's motivation or ability is low, however, the information is processed via the peripheral route and analyzed heuristically, which leads to a decision that can be changed more readily through persuasion (Petty & Brinol, 2011; Petty & Cacioppo, 1984). Studies show that the ELM can help identify factors that influence whether a review is identified as genuine or fake (Ansari et al., 2018; Román et al., 2019, pp. 141-166). Both Ansari et al. (2018) and Román et al. (2019, pp. 141-166) differentiate between central and peripheral cues. For Ansari et al. (2018), the central cues are argument quality, source quality and message length, and the peripheral cues are verified badges, most recent reviews, and 1- and 5-star ratings. For Román et al. (2019, pp. 141-166), the central cues are argument quality and review quantity, and the peripheral cues are review quantity, review consistency and review homophily. Additionally, Román et al. (2019, pp. 141-166) identified moderator variables, such as purchase context, the review platform, hedonic versus utilitarian purchases, and purchases involving negative needs versus positive wants. Other studies also show that the identity of a source (Peng et al., 2016) and prior knowledge about review spam (DeAndrea et al., 2018; Peng et al., 2016) can influence the perceived veracity of a review.

7 Because processing all the information requires high cognitive engagement, each restaurant was given a trust score, based on all of its reviews, indicating the proportion of fraudulent reviews. This score was displayed as a 'review quality score'.

Cognitive dissonance theory
As mentioned above, one study combined Cognitive Dissonance Theory with the ELM (Román et al., 2019, pp. 141-166). A person holds a set of attitudes, beliefs and behaviors and usually acts in accordance with them. Festinger's Cognitive Dissonance Theory (Festinger, 1957) states that doing something that contradicts this cognitive state causes dissonance, that is, a state of tension or arousal. This discomfort can lead to a change in either behavior or attitude that resolves the dissonant cognitive state. Festinger argues that people are motivated to be consistent and to avoid dissonance, even if that means changing their behavior or attitude (Festinger, 1957; Román et al., 2019, pp. 141-166). Román et al. (2019, pp. 141-166) used the ELM to explain how consumers determine the veracity of a review. They used Cognitive Dissonance Theory to explain what happens when a positive review is identified as fake: the more dissonance the discovery provokes, the more severe the consumer's response. If the customer thinks the deception is minor, they will simply not purchase the product. When the deception is perceived to be stronger, they will report the review, and if it is perceived to be especially strong, they may stop buying anything from the company altogether (Román et al., 2019, pp. 141-166).

Uncertainty reduction theory
Two studies used Uncertainty Reduction Theory (Kusumasondjaja et al., 2012; Racherla et al., 2012). This theory states that when two strangers meet, the primary concern of both is to reduce uncertainty, that is, to increase the predictability of the behavior of all parties involved in the interaction (Racherla et al., 2012). It further states that uncertainty influences the level of intimacy, liking and trust between the parties, and it describes three phases: (1) the entry phase, (2) the personal phase and (3) the attitude assessment phase. In the context of reviews, there is no history and no expectation of future interaction between the sender and receiver of an online consumer review, so readers use active and passive strategies to evaluate the information (Racherla et al., 2012). Active strategies could include scanning other reviews of the same product, or the review itself, for arguments; passive strategies could include comparing reviewers' sociodemographic characteristics. Racherla et al. (2012) found that high similarity and high argument quality both increase trust in a review. This parallels Ansari et al. (2018) and Román et al. (2019, pp. 141-166), who used the Elaboration Likelihood Model and found argument quality to be a central cue in fake review detection. Kusumasondjaja et al. (2012), on the other hand, found that the identity of the reviewer is highly important. When the reviewer is unknown, uncertainty is high and the review is perceived as less trustworthy, whereas trustworthiness increases when more information about the reviewer is available.

Speech act theory
Speech Act Theory was used by one study (Ansari & Gupta, 2021b). It proposes that, beyond the actual statements that are said or written by a sender and understood by a receiver, spoken and written language carries a second message, called an utterance (Ansari & Gupta, 2021b). These utterances describe an indicative action or request some form of action. Speech Act Theory further states that a statement contains three different messages: the statement itself, what the sender intends to convey with it, and its effect on the receiver (Ansari & Gupta, 2021b). Take the statement "the traffic light is green". At the first level, it describes the fact that the traffic light is green. At the second level, it may be intended as a request to drive on. At the third level, the receiver could interpret it as a sign that the sender believes the receiver is inattentive and needs to be told to proceed. Ansari and Gupta (2021b) show that the perceived veracity of a review can be influenced by referencing, flattering, contextual embedding, detailing and argument structure. Referencing, for example through pronouns, can be used to connect with the reader, which influences the receiver's comprehension. In this study, however, the use of more pronouns was perceived as more deceptive (Ansari & Gupta, 2021b). Detailing can help support decisions or recommendations, but, as mentioned earlier, in reviews it is perceived as suspicious. Flattering includes emojis, quotation marks and exclamation marks. In online reviews these can build rapport and help convey the reviewer's emotions to the receiver. However, readers perceive them as a form of manipulation, used to make the receiver feel a certain way about the product being reviewed (Ansari & Gupta, 2021b). Contextual embedding indicates personal experience and helps the sender connect with the receiver by showing them the whole picture. Too little contextual embedding can therefore create distance and indicate deception (Ansari & Gupta, 2021b). Argument structure, measured here with cognitive process words, can indicate thoughtfulness, showing that the sender wants to make sure something is understood correctly. In online reviews, this is perceived as a sign that a review is less likely to be deceptive.

Credibility theory
Credibility Theory was used by two studies (Filieri, 2016; Jensen et al., 2013). Credibility Theory states that credibility results from the interaction of four groups of characteristics (Filieri, 2016; Wathen & Burkell, 2002). These are source characteristics (such as expertise, reputation and labels), message characteristics (argument quality and plausibility), receiver characteristics (cultural background, involvement, beliefs and motivation), and the medium through which the message is conveyed (design features such as usability and ease of navigation). However, the theory does not clearly describe how these factors are processed. Like the Elaboration Likelihood Model and Speech Act Theory, Credibility Theory takes argument quality into account as a factor, assuming that higher argument quality has a positive effect on the credibility of a review. Filieri (2016) mainly used Credibility Theory to develop his interview guide, whereas Jensen et al. (2013) combined it with Language Expectancy Theory. Language Expectancy Theory describes how the language characteristics of different groups of individuals affect attitude change (Burgoon & Miller, 2018; Jensen et al., 2013). It holds that when we know a person has a certain characteristic or profession, we expect them to communicate in a certain way; for example, we expect more formal communication from a professor than from a student. It also covers how persuasive the communication is expected to be (Burgoon & Miller, 2018; Jensen et al., 2013). Following Jensen et al. (2013), this means readers expect reviews to show a certain level of language complexity, one-sided arguments with extreme ratings, and two-sided arguments with moderate ratings. If these expectations are violated, the perceived credibility of the reviewer and the perception of the product are negatively affected (Jensen et al., 2013).
Munzel (2016) used earlier research from different authors to build a new theoretical framework. He stated that identity disclosure, consensus information and persuasion knowledge activation influence the trustworthiness of a review, which in turn influences purchase intention and avoidance intention. Munzel (2016) furthermore states that consensus information moderates the effect of the reviewer's identity disclosure: if there is little information on the reviewer's identity but consensus between reviewers is high, trust is still high. The framework argues that the more is known about the sender (reviewer), the more trustworthy the message (review) is perceived to be. Román et al. (2019) used the findings of their interviews to develop a new theoretical framework; the interviews themselves were designed by drawing on a combination of Cognitive Dissonance Theory and the Elaboration Likelihood Model. The framework proposes that elaboration motivation and elaboration ability influence the central cues to veracity (argument quality and review quantity) and the peripheral cues (review quantity, review consistency and review homophily). Product knowledge moderates the strength of these relationships. The central and peripheral cues and initial expectations then influence the perceived deception in online consumer reviews (PDOCR). Perceived deception in turn influences self-protective behaviors as well as public and revenge behaviors. Román et al. (2019) also find moderating variables that influence perceived deception and the resulting behavior. These are purchase context (products vs. services), review platform (retailer's website vs. other platforms), hedonic vs. utilitarian purchases, and purchases involving negative needs vs. positive wants.

Other/no theoretical background
In addition to drawing on an existing theoretical framework, Credibility Theory, Filieri (2016) used his findings to develop a new theoretical framework. Filieri argued that the inductively generated cues he identified (source trustworthiness, message trustworthiness, review valence and pattern in reviews) determined consumers' perceptions of a review's trustworthiness, which in turn determined how persuasive the review was. Filieri (2016) also found that consumer involvement, consumer experience with consumer reviews and the medium type moderate how the cues influence trustworthiness and, via trustworthiness, persuasion.
There were four studies that based their research questions and hypotheses on earlier research without constructing a specific theoretical model (Ananthakrishnan et al., 2020;Kronrod et al., 2017;Munzel, 2015;Peng et al., 2016).

Discussion: A research agenda for consumers' fake review detection
This review gives an overview of the cues and features consumers use to detect fake reviews in online shopping systems. We found 15 papers that identified or tested different cues consumers use to judge the veracity of a review. These cues have been identified using a diversity of theoretical and methodological approaches. Overall, we show that, in contrast to algorithmic fake review detection, human fake review detection is a multi-faceted phenomenon in which both situational and cognitive elements play a role. The most common findings in the literature are that argument quality, information about the reviewer's identity, and the characteristics of the reviewer (what the reviewer is like) have a positive influence on how genuine a review is perceived to be. Furthermore, many theories indicate that how (fake) reviews are displayed, including on which website, influences how genuine reviews are perceived to be. However, few theories can both describe the cues and factors consumers use to determine the veracity of an online review and describe the process by which these cues and factors lead to consumer decisions.
We will first discuss the strengths and limitations of the theories and methods captured within our review. Afterwards, we compare human detection and automated detection, discuss the limitations of our literature review and then discuss future work.

Strengths and limitations of the theories applied to fake review detection
We found very little consistency across the literature with regard to theoretical approach. Consequently, we review the strengths and limitations of each theory. The theoretical frameworks used were Warranting Theory, the Elaboration Likelihood Model, Cognitive Dissonance Theory, Uncertainty Reduction Theory, Speech Act Theory, Credibility Theory and Language Expectancy Theory.
When comparing these theories and how they are used in human fake review detection online, we found that Language Expectancy Theory, Uncertainty Reduction Theory and Credibility Theory propose a mental model of what consumers believe an online consumer review ought to be like. In simplified terms, these theories suggest that people perceive online reviews as suspicious if the reviews do not match these mental models. Similarly, Warranting Theory proposes that people are more likely to believe a message when they trust the sender. Warranting Theory has thereby been used to explain how the consumers' perception of the author of a review influences how they perceive the review. The Elaboration Likelihood Model (ELM) helps explain how consumers' motivation and ability lead to (sub)optimal decision making, depending on whether consumers are willing or able to process online reviews via the central or peripheral route. Whereas the previous theories explain how people try to detect fake reviews, Cognitive Dissonance Theory and Speech Act Theory explain how the identification of fake online reviews influences the buying behavior of consumers.
As a consequence of these strengths and weaknesses, we argue that a combination of the Language Expectancy Theory, Uncertainty Reduction Theory, Credibility Theory, Elaboration Likelihood Model, Cognitive Dissonance Theory and Speech Act Theory would be most valuable in explaining what review cues are used by humans and how they are considered during the act of purchasing.
The application of theory in the literature we identified also has several limitations. First, on a theoretical level, none of the theories sufficiently explains how the different cues and factors are considered during a purchase and how much weight each carries in the decision-making process. Furthermore, most studies employed a deductive approach, using a priori (categories of) cues presumed to be relevant from prior research, rather than an inductive approach. Since most of the theories applied deductively are taken from adjacent research areas, the lack of inductive research risks overlooking important cues that are specific to fake review detection. This concern is compounded by the overall scarcity of research on this topic, and there is a real need to galvanize research to ensure that relevant cues are identified. Additionally, even where studies used the same or similar theoretical approaches, they often examined different cues, which makes it hard to determine how to combine the theories and cues most effectively. We hope that our identification of specific clusters of cues will make it easier for future work in this area to be both more comprehensive in its consideration of cues and more specific about how these cues lead to behaviors.
On an empirical level, we find that research on how consumers determine the veracity of online consumer reviews is very scarce. This suggests that there may be further cues that have not yet been included in research and theory. Coupled with the uncertainty we identify regarding how cues are processed, we conclude that the extant literature does not yet allow the development of a full-fledged theoretical framework explaining how consumers detect fake online consumer reviews. Developing such a framework is therefore a priority for future work on fake review detection. Based on our findings, its foundations should integrate the best qualities of the theories that identify the relevant cues with those of the theories that explain how consumers process information. We advocate more inductive work to develop theory in this area, complementing the more extensive deductive work that has already been done. This would allow any new theory to be grounded in observations of actual consumer behavior while building on the existing theories we review here.

Strengths and limitations of the methods applied in fake review detection
Of the 13 studies in our literature review (three papers by Ansari and colleagues describe one mixed-methods design), four used mixed-methods approaches (Ansari et al., 2018; Kronrod et al., 2017; Peng et al., 2016; Racherla et al., 2012) and two conducted qualitative interviews (Filieri, 2016; Román et al., 2019, pp. 141-166). Although mixed-methods approaches offer the possibility to first develop a theoretical framework and then test it, none of these studies did so. The mixed-methods studies all used pre-existing theoretical frameworks and tested the influence of specific aspects on the perceived veracity of a review. Their qualitative components were primarily descriptive rather than theoretical: they gathered insight into consumers' awareness of fake reviews and asked which cues consumers used (Ansari et al., 2018; Ansari & Gupta, 2019, 2021b; Peng et al., 2016), or they asked participants to write online consumer reviews that were then used in follow-up studies (Kronrod et al., 2017). There was no attempt to generate and test new theory. As a result, these studies are less able to capture the full picture of how consumers detect fake reviews.
Exploratory qualitative interviews are well suited to investigating how a decision about the veracity of a review is made, and so allow an inductive analysis of the processes or mechanisms that underpin decision making. However, Filieri (2016) and Román et al. (2019, pp. 141-166) both constructed their interview guides based on existing theoretical frameworks. By relying on existing and limited frameworks, they may have discouraged participants from describing novel processes or cues that have not previously been considered in the fake review detection literature. Furthermore, both Román et al. (2019, pp. 141-166) and Filieri (2016) embed the cues they found almost entirely within the Elaboration Likelihood Model by classifying them as central or peripheral cues. In later articles, Filieri et al. (Filieri, Hofacker, & Alguezaui, 2018; Filieri, Raguseo, & Vitari, 2018) highlight that the primary benefit of their approach was the inductive identification of relevant cues, rather than the development of new theory explaining how those cues are processed.
Because the extant qualitative literature we identified takes an at least partly deductive approach, there is still a need for exploratory qualitative research to uncover cues and relations not covered by existing theories, using methods that help to develop theory for understanding fake review detection.
The non-qualitative studies used experimental designs to test specific factors. Their strength is that they can empirically test some of the theoretical assumptions outlined above, such as those concerning argument quality and consensus. However, in a field as underdeveloped as human fake review detection, there remains a significant risk that many key variables are unknown, and where they are known, we cannot yet be confident about how they affect judgments. Again, we argue that research that builds theory from observations of behavior and explores the thought processes of actual consumers would complement the existing evidence base and allow greater confidence when describing relevant cues and how they are processed. In this way, we can better determine which theory or theories best describe how consumers make veracity judgments about online reviews, and how these judgments translate into concrete actions.

Comparing human and automated fake review detection
When comparing human and automated fake review detection, some cues are used by both, such as review characteristics (Ansari & Gupta, 2021b; Heydari et al., 2015) and reviewer characteristics (DeAndrea et al., 2018). Examples of review characteristics are usefulness rating counts, linguistic features (Reddy et al., 2020) and metrics such as word counts (Dewang, Singh, & Singh, 2016). Reviewer characteristics are features that can identify the reviewer, such as a name, but also the number of friends or followers (Z. Wu et al., 2017). Both humans and machines also use textual characteristics. However, for humans these are content-based, such as the amount of detail (Ansari et al., 2018; Ansari & Gupta, 2021b; Jensen et al., 2013; Kronrod et al., 2017; Román et al., 2019, pp. 141-166), whereas machine learning approaches focus on the number of adverbs, verbs or nouns used, the percentage of capital letters, or other text statistics.
There are, however, also differences between the human and machine approaches. Unlike machine approaches, humans also consider seller characteristics, such as trade records (DeAndrea et al., 2018; Peng et al., 2016; Racherla et al., 2012), and characteristics of the platform where the review is displayed (Ananthakrishnan et al., 2020; Munzel, 2015, 2016). Conversely, machine fake review detection incorporates product-centric cues such as price and meta-feature-centric cues such as date variance. One possible reason for these discrepancies is that consumers perceive product-centric features, such as the price of an item, as less affected by reviews. However, why consumers use different cues has not yet been fully explored, which further motivates exploratory research addressing such questions. Beyond the cues used, humans and machines also process information differently. For instance, theories such as the ELM argue that humans process information via two routes, the central and the peripheral (Ansari et al., 2018), but we do not know of any algorithmic approach that processes information in the same way.
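To make the machine-side "text statistics" more concrete, the following minimal Python sketch computes a few surface features of a review: the word count and share of capital letters mentioned above, plus average word length and exclamation-mark count, which we add purely for illustration. It assumes plain English (ASCII) review text, and the names `text_stats` and `TextStats` are hypothetical; it is not the feature set of any study cited here. Part-of-speech counts (adverbs, verbs, nouns) would additionally require a part-of-speech tagger and are omitted.

```python
import re
from dataclasses import dataclass


# Hypothetical feature container; the field names are illustrative only.
@dataclass
class TextStats:
    word_count: int         # number of word tokens
    capital_ratio: float    # share of alphabetic characters that are upper case
    avg_word_length: float  # mean token length in characters
    exclamations: int       # number of "!" characters


# Crude tokenizer (assumption): runs of ASCII letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def text_stats(review: str) -> TextStats:
    """Compute simple surface statistics of a review text (illustrative sketch)."""
    words = WORD_RE.findall(review)
    letters = [ch for ch in review if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    return TextStats(
        word_count=len(words),
        capital_ratio=upper / len(letters) if letters else 0.0,
        avg_word_length=sum(len(w) for w in words) / len(words) if words else 0.0,
        exclamations=review.count("!"),
    )


if __name__ == "__main__":
    print(text_stats("GREAT hotel! Loved it!"))
```

Features of this kind are cheap to compute at scale, which may be one reason automated detectors rely on them; the content-based cues consumers report, such as the amount of detail, are much harder to reduce to simple counts.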

Key lessons and suggestions for future research based on the current findings
Our first key lesson is that, because of the diversity of theories and cues employed, it remains unclear how well the extant literature has mapped the cues and cognitive processes involved in fake review detection. For example, the theoretical framework combining the Elaboration Likelihood Model and Cognitive Dissonance Theory proposed by Román et al. (2019, pp. 141-166) includes only some of the factors identified as relevant by the studies analyzed in this systematic review, and it does not include factors that influence the trustworthiness and credibility of reviews. We argue that more openly designed interview guides could help develop a new theoretical framework that covers the relevant factors and cues, how consumers prioritize them, and the thought processes involved in judging the veracity of a review. In other words, more inductive, bottom-up research based on the lived experiences of consumers is needed to supplement the predominantly deductive, top-down approach taken so far.
Second, we identified two strands of research on human detection of fake reviews that complement automated detection: (1) cues to deception and (2) research on the trustworthiness and credibility of reviews. This is a meaningful finding because it shows that human and automated fake review detection are not interchangeable. This is further supported by research showing that automated review detection outperforms humans, and by research showing that the detection rate increases when human detection is supported by automated detection (Kim et al., 2021; Yuan et al., 2016). At the same time, research on automated fake review detection shows that no tool achieves perfect accuracy (Kennedy et al., 2019; Ott et al., 2011). We therefore propose that research should investigate both areas thoroughly and combine them in a 'human-in-the-loop' approach to achieve higher accuracy in fake review detection. Concentrating solely on textual features is especially problematic for the new generation of fake reviews, because these are often written by humans who have been instructed on how to write reviews (Bode, 2022; Stiftung Warentest, 2020); a different approach therefore needs to be developed (Hovy, 2016).
To summarize, further research should address the limitations of current theoretical approaches by combining their stronger elements with a new, inductively generated theory. This theory could then inform interventions that increase the salience of cues that humans attend to intuitively as well as cues that can be trained; these would likely include cues already used successfully by machine learning approaches. Any such theory needs to be more than a collection of cues: it must also explain how and why people attend to relevant cues, and how the perception of cues can be translated into improved behavior.
Additionally, our review does not differentiate between industries, such as hotels, products or services. We consider it likely that the nature (and cost) of the purchased item or experience influences consumer behavior and judgments. Unfortunately, given the scarcity of research, we do not believe it is currently possible to make reliable claims about how the effects of reviews differ by industry or item. We do, however, recommend including industry as a predictor variable in future research.
Concretely, a research agenda addressing the gaps we identify might begin with qualitative work to understand how consumers themselves believe they think about and determine the veracity of a review, including which cues they attend to and how these are combined with other considerations. For example, consumers' prior beliefs about a product, brand or platform are likely to influence purchases, as are cognitive biases such as confirmation bias. These findings can then be integrated with existing theoretical frameworks. This combined inductive and deductive approach would provide a solid foundation for a comprehensive theoretical framework, which should then be tested and further developed both in the laboratory and in studies of actual consumer behavior.
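One simple way to operationalize 'human-in-the-loop' detection is a confidence-based triage rule, sketched minimally in Python below. It assumes a hypothetical classifier that outputs a probability `p_fake` that a review is fake: reviews the model is confident about are labeled automatically, and only those in an uncertain band (here an arbitrary 0.2 to 0.8) are passed to a human judge. The function name, callback and thresholds are illustrative assumptions; this is one possible design, not the procedure used in any study cited here.

```python
from typing import Callable, Tuple


def triage(
    p_fake: float,
    ask_human: Callable[[], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[bool, str]:
    """Return (is_fake, decided_by) for one review.

    p_fake: probability from a hypothetical fake-review classifier.
    ask_human: callback returning a human's verdict (True = fake).
    low/high: illustrative confidence thresholds, not empirically derived.
    """
    if not 0.0 <= p_fake <= 1.0:
        raise ValueError("p_fake must be a probability in [0, 1]")
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")
    if p_fake >= high:
        return True, "model"   # model is confident the review is fake
    if p_fake <= low:
        return False, "model"  # model is confident the review is genuine
    # Uncertain band: defer the decision to the human reviewer.
    return ask_human(), "human"


if __name__ == "__main__":
    for p in (0.95, 0.5, 0.05):
        print(p, triage(p, ask_human=lambda: True))
```

Widening or narrowing the uncertain band trades human workload against reliance on the model; where to set it would itself be an empirical question.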

Strengths and limitations of this review
Identifying that the evidence base is limited is one contribution of this review, and we hope that it will provide a springboard for future research. Nonetheless, the limitations of this study are partly determined by the scarcity of, and inconsistency in, the extant research. This is exemplified by the use of many different terms to describe fake reviews, including but not limited to 'opinion spam', 'opinion scam', 'opinion fraud', 'review spam' and 'online review'. Likewise, different terms are used for cues, including but not limited to 'features', 'factors', 'indicators' and 'detection strategies'. We addressed this variety of terms by extending our initial backward and forward search for potential keywords, and by including many terms in our search string. Nevertheless, some keywords may have been omitted from our search, so we cannot guarantee that we included all research on consumer fake review detection.
Similarly, as with all systematic reviews, new research continues to be published after the search is complete. To keep our article as up to date as possible, we also conducted a post-review search. This search identified six additional articles on fake review detection that had not been captured by the initial search; however, none of them met our inclusion criteria. Five of them discuss algorithmic ways to identify fake reviews online (Birim et al., 2022; Lee, Song, Li, Lee, & Yang, 2022; Salminen, Kandpal, Kamel, Jung, & Jansen, 2022; Shi et al., 2022; Tufail et al., 2022). The sixth investigated the consequences of the perceived credibility of exaggerated positive online consumer reviews (Román et al., 2023). The authors found that perceived review credibility affected consumers' thoughts about the brand's reputation, their purchase intention, and their perceptions of the trustworthiness of the review site. In turn, brand identification enhanced review credibility (Román et al., 2023). However, since Román et al. (2023) do not discuss any detection cues, this article also does not meet our inclusion criteria.
A related limitation is that we considered only peer-reviewed literature and did not seek grey literature on this topic. We therefore excluded student projects and one PhD project (Roland, 2019) that would otherwise have met the criteria of our review. Nonetheless, we argue that the studies we found represent a variety of research that future work will need to combine. That is, we believe we have captured the breadth of the literature, even though it is never possible to identify the full depth of extant research. Capturing breadth rather than depth is, however, more fundamental to our research goals, since we aimed to consider the diversity of cues, theories and methods used in the published literature. Furthermore, we argue that the holistic way in which we evaluated the existing literature, considering the strength of the theory and methods rather than only the outcomes, is another strength of this literature review.

Conclusion
Online fake reviews are a challenge for online markets and platforms, since they undermine user ratings as a tool that helps consumers make informed choices. Given the shortcomings of machine learning approaches to identifying fake reviews, it stands to reason that there is benefit in supporting consumers in identifying fake reviews themselves. However, the cues consumers use to detect online fake reviews are not yet well understood. Our systematic review consolidates the scattered research landscape by analyzing the cues used by consumers as well as research on factors that influence the trustworthiness and credibility of online reviews. Our review shows that consumers draw on a broad variety of cues and factors, which can be grouped into five categories: (1) review characteristics, (2) textual characteristics, (3) reviewer characteristics, (4) seller characteristics and (5) characteristics of the platform where the review is displayed.
However, since research on this topic is scarce, there may be factors and cues that remain undiscovered. Future research should therefore establish which cues consumers believe are most relevant and how these cues influence consumer behavior. This requires a consolidated theory that exploits the best aspects of extant theories and is supplemented by theory grounded in the experiences of consumers.

Consent for publication
All authors and the University of Applied Sciences Bonn-Rhein-Sieg consent to the publication of this article. This manuscript has not been published in, and is not under review by, any other journal.

Author contributions
The authors' contributions are as follows. The first author wrote most of the article as part of her PhD program at the University of Applied Sciences Bonn-Rhein-Sieg and the University of Twente. The supervisors, Prof. Dr. Gunnar Stevens and Dr. Timo Jakobi, were involved in the initial design and throughout the whole project, including the editing process and the decision to approve the article for publication. Dr. Steven Watson was involved in co-drafting the article, the editing process and the decision to approve the article for publication.

Funding
None.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
No data was used for the research described in the article.

Appendix A
Key words: opinion fraud, fake reviews, opinion spam, detection cues, recommender systems, user-perspective, detection strategies, persuasive recommendations, consumer, Opinion fraud detection, online deception, online review, opinion spam detection, opinion spam features, review spam, deception detection, spam indicators, trust, lie detection.

Appendix B
Research design: Mixed-method approach.
Study 1: Generating reviews, either real or fake; for fake reviews, participants were given clues: no clue (control), past tense, unique words, abstract language, personal pronouns.
Study 2: Asking participants to determine the veracity of 60 reviews, using the same clues as in Study 1.
Sample: 328 usable MTurk workers.
Stimuli: 60 reviews (30 real and 30 fake reviews from Study 1; control group).
Conditions: 5 conditions: no clue, past tense, unique words, abstract language, personal pronouns.
Procedure: To ensure a consistent understanding of the tactics, participants were presented with the description of each of the three manipulation tactics in random order to avoid bias. Participants were then asked how they perceived the tactics in terms of deceptiveness, ease of detection and ethicality, and to indicate their purchase intention and the perceived usefulness of online product reviews.
Study 1: the quality of content; the quantity of the reviews; the mismatch between reviews, seller reputation and the trade record; and the identities of the reviewers.
Román, S., Riquelme, I. P., & Iacobucci, D. (2019). Perceived deception in online consumer reviews: Antecedents, consequences, and moderators (pp. 141-166).

Appendix C
Timo Jakobi is a postdoctoral researcher and managing director of the institute for digital consumption at the University of Applied Sciences Bonn-Rhein-Sieg. His main research direction is the investigation and development of usable data security and data safety solutions that companies can use in customer experience design. In this work, Timo operates at the intersection of user research and jurisprudence, to provide a robust, empirical consumer perspective on the definition and interpretation of (data security) laws. Topics include, but are not restricted to, the implementation of complex requirements, such as transparency, self-explanatory intelligent systems, data subjects' rights and consent design, taking into account both consumer and company interests.
Steven Watson is an Assistant Professor within the Psychology of Conflict, Risk and Safety section at the University of Twente. Steven researches decision making and communication in applied contexts, including in the areas of deception detection and cybersecurity.
Gunnar Stevens is professor of system informatics, especially IT security, at the University of Siegen (Germany) and co-director of the institute for digital consumption at the University of Applied Sciences Bonn-Rhein-Sieg. His research fields include, but are not restricted to, User-Centered Security, Usable Privacy & Data Literacy, and Consumer Data Analytics & Visualization. His expertise lies in informatics and Human-Computer Interaction as well as the ergonomic design of data protection and information system solutions.