Identify Online Store Review Spammers via Social Review Graph

Published: 01 September 2012

Abstract

Online shopping reviews provide valuable information that helps customers compare product quality, store services, and many other aspects of future purchases. However, spammers are joining this community and trying to mislead consumers by writing fake or unfair reviews. Previous attempts have used reviewers’ behaviors, such as text similarity and rating patterns, to detect spammers. These studies can identify certain types of spammers, for instance, those who post many similar reviews about one target. In reality, however, other kinds of spammers can manipulate their behaviors to act just like normal reviewers, and thus cannot be detected by the available techniques.

In this article, we propose a novel concept, the review graph, which captures the relationships among all reviewers, reviews, and the stores that the reviewers have reviewed as a heterogeneous graph. We explore how interactions between nodes in this graph could reveal the cause of spam and propose an iterative computation model to identify suspicious reviewers. The review graph has three kinds of nodes: reviewer, review, and store. We capture their relationships by introducing three fundamental concepts, the trustiness of reviewers, the honesty of reviews, and the reliability of stores, and by identifying their interrelationships: a reviewer is more trustworthy if the person has written more honest reviews; a store is more reliable if it has more positive reviews from trustworthy reviewers; and a review is more honest if many other honest reviews support it. This is the first time such intricate relationships have been identified for spam detection and captured in a graph model. We further develop an effective computation method based on the proposed graph model. Unlike existing approaches, we do not use any review text information. Our model is thus complementary to existing approaches and able to find more difficult and subtle spamming activities, which are agreed upon by human judges after they evaluate our results.
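The three mutually dependent scores described above can be illustrated with a small fixed-point iteration. The sketch below is not the article's computation method, which the abstract does not specify; the `tanh` squashing, the ±1 rating encoding, the update order, and all function and variable names are assumptions chosen only to show how trustiness, honesty, and reliability might feed into one another without using any review text.

```python
import math
from collections import defaultdict

def squash(x):
    # Assumed normalisation into (-1, 1); the abstract does not give one.
    return math.tanh(x)

def score_review_graph(reviews, iterations=20):
    """reviews: list of (reviewer, store, rating), rating is +1 or -1.

    Returns (trustiness by reviewer, honesty by review index,
    reliability by store) after a fixed number of update rounds.
    """
    by_reviewer = defaultdict(list)
    by_store = defaultdict(list)
    for i, (reviewer, store, _) in enumerate(reviews):
        by_reviewer[reviewer].append(i)
        by_store[store].append(i)

    honesty = [1.0] * len(reviews)  # assumed optimistic starting point
    trust, reliability = {}, {}
    for _ in range(iterations):
        # A reviewer is more trustworthy with more honest reviews.
        trust = {r: squash(sum(honesty[i] for i in idx))
                 for r, idx in by_reviewer.items()}
        # A store is more reliable with more positive reviews from
        # trustworthy reviewers (untrusted reviewers contribute nothing).
        reliability = {
            s: squash(sum(reviews[i][2] * max(trust[reviews[i][0]], 0.0)
                          for i in idx))
            for s, idx in by_store.items()
        }
        # A review is more honest if other honest reviews of the same
        # store agree with its rating.
        new_honesty = []
        for i, (_, store, rating) in enumerate(reviews):
            support = sum(
                honesty[j] * (1 if reviews[j][2] == rating else -1)
                for j in by_store[store] if j != i
            )
            new_honesty.append(squash(support))
        honesty = new_honesty
    return trust, honesty, reliability

# Hypothetical toy data: three reviewers praise a store, one attacks it.
reviews = [
    ("alice", "s1", 1),
    ("bob", "s1", 1),
    ("carol", "s1", 1),
    ("mallory", "s1", -1),
]
trust, honesty, reliability = score_review_graph(reviews)
```

On this toy input the dissenting reviewer ends up with negative trustiness. A plain agreement rule like this one would also penalise an honest minority opinion, so it should be read as an illustration of the score dependencies rather than a usable detector.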

    • Published in

      ACM Transactions on Intelligent Systems and Technology, Volume 3, Issue 4
      September 2012
      410 pages
      ISSN: 2157-6904
      EISSN: 2157-6912
      DOI: 10.1145/2337542

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 2012
      • Accepted: 1 May 2011
      • Revised: 1 March 2011
      • Received: 1 December 2010

      Qualifiers

      • research-article
      • Research
      • Refereed
