Abstract
Online shopping reviews provide valuable information that helps customers compare the quality of products, store services, and many other aspects of future purchases. However, spammers are joining this community and trying to mislead consumers by writing fake or unfair reviews. Previous attempts have used reviewers’ behaviors, such as text similarity and rating patterns, to detect spammers. These studies can identify certain types of spammers, for instance, those who post many similar reviews about one target. In reality, however, other kinds of spammers can manipulate their behaviors to act just like normal reviewers, and thus cannot be detected by the available techniques.
In this article, we propose a novel concept, the review graph, which captures the relationships among all reviewers, reviews, and the stores that the reviewers have reviewed as a heterogeneous graph. We explore how interactions between nodes in this graph could reveal the cause of spam and propose an iterative computation model to identify suspicious reviewers. The review graph has three kinds of nodes, namely, reviewer, review, and store. We capture their relationships by introducing three fundamental concepts, the trustiness of reviewers, the honesty of reviews, and the reliability of stores, and by identifying their interrelationships: a reviewer is more trustworthy if the person has written more honest reviews; a store is more reliable if it has more positive reviews from trustworthy reviewers; and a review is more honest if many other honest reviews support it. This is the first time such intricate relationships have been identified for spam detection and captured in a graph model. We further develop an effective computation method based on the proposed graph model. Unlike existing approaches, we do not use any review text information. Our model is thus complementary to existing approaches and able to find more difficult and subtle spamming activities, which human judges agree upon after evaluating our results.
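The three interrelationships described above can be illustrated with a minimal Python sketch. It is not the article's computation method, which the abstract does not spell out. Everything here is an assumption made for illustration: the function names `squash` and `score_review_graph`, the parameters `neutral_rating` and `agree_within`, the logistic-style scaling into (-1, 1), and the rule that a review is "supported" when other reviews of the same store give a similar rating, weighted by their authors' current trustiness as a stand-in for those reviews' honesty. In this simplified version only trustiness and honesty are iterated, and store reliability is derived at the end. Like the model in the abstract, it uses ratings only and no review text.

```python
import math
from collections import defaultdict


def squash(x):
    """Map a real number into (-1, 1), keeping its sign (assumed scaling)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0


def score_review_graph(reviews, iterations=20, neutral_rating=3.0, agree_within=1.0):
    """reviews: list of (reviewer, store, rating) tuples; no review text is used."""
    trust = {reviewer: 1.0 for reviewer, _, _ in reviews}  # start by trusting everyone
    honesty = [0.0] * len(reviews)
    by_store = defaultdict(list)
    for i, (_, store, _) in enumerate(reviews):
        by_store[store].append(i)

    for _ in range(iterations):
        # Honesty: a review gains when trusted reviewers rate the same store
        # similarly, and loses when they disagree with it.
        for idxs in by_store.values():
            for i in idxs:
                support = 0.0
                for j in idxs:
                    if j == i:
                        continue
                    weight = max(trust[reviews[j][0]], 0.0)
                    agrees = abs(reviews[i][2] - reviews[j][2]) <= agree_within
                    support += weight if agrees else -weight
                honesty[i] = squash(support)
        # Trustiness: accumulated honesty of the reviewer's own reviews.
        totals = defaultdict(float)
        for i, (reviewer, _, _) in enumerate(reviews):
            totals[reviewer] += honesty[i]
        trust = {reviewer: squash(totals[reviewer]) for reviewer in trust}

    # Reliability: positive ratings from trustworthy reviewers raise it,
    # negative ratings from them lower it; distrusted reviewers are ignored.
    reliability = {}
    for store, idxs in by_store.items():
        total = sum(trust[reviews[i][0]] * (reviews[i][2] - neutral_rating)
                    for i in idxs if trust[reviews[i][0]] > 0)
        reliability[store] = squash(total)
    return trust, honesty, reliability


# Toy data: four consistent reviewers and one reviewer "x" who rates against them.
toy_reviews = [(r, "s1", 5) for r in "abcd"] + [("x", "s1", 1)]
toy_reviews += [("a", "s2", 1), ("b", "s2", 2), ("c", "s2", 1), ("d", "s2", 2), ("x", "s2", 5)]
trust, honesty, reliability = score_review_graph(toy_reviews)
```

On this toy data, reviewer `x` ends with negative trustiness while the four consistent reviewers end with positive trustiness, and store `s1` comes out as reliable while `s2` does not. A real system would need a more careful definition of agreement between reviews than a fixed rating gap.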
Index Terms
- Identify Online Store Review Spammers via Social Review Graph