Anomalous User Comment Detection in Social News Websites

  • Conference paper
International Joint Conference SOCO’14-CISIS’14-ICEUTE’14

Abstract

The Web has evolved over the years, and content is no longer generated only by site administrators. Users of a website can now express their feelings and opinions. This has led to negative side effects: some user-generated content is inappropriate, and it is frequently authored by troll users who deliberately seek controversy. In this paper we propose a new method to detect trolling comments on social news websites. To this end, we extract a combination of statistical, syntactic and opinion features from user comments. Since trolling is quite common on the Web, we propose a novel experimental setup for our anomaly detection method: troll comments are taken as the base model (normal behaviour, or ‘normality’). We evaluate our approach with data from ‘Menéame’, a popular Spanish social news site, showing that our method can obtain high detection rates whilst minimising the labelling effort.
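The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three features, the toy opinion lexicon, the mean Euclidean distance and the threshold value are all assumptions chosen for illustration. The only idea taken from the abstract is that troll comments form the base model ('normality') and a new comment is judged by how far it deviates from that set.

```python
import math

# Hypothetical toy lexicon (an assumption); a real system would use a
# proper opinion lexicon for the target language.
OPINION_WORDS = {"idiot", "idiots", "stupid", "hate", "awful"}


def features(comment):
    """Map a comment to a small numeric vector (illustrative features only)."""
    words = comment.split()
    n_words = len(words) or 1
    letters = [c for c in comment if c.isalpha()]
    # Statistical features: share of uppercase letters, scaled mean word length.
    upper_ratio = sum(c.isupper() for c in letters) / (len(letters) or 1)
    mean_len = sum(len(w) for w in words) / n_words / 10.0
    # Crude opinion feature: share of words found in the toy lexicon.
    opinion_ratio = sum(
        w.strip(".,!?").lower() in OPINION_WORDS for w in words
    ) / n_words
    return [upper_ratio, mean_len, opinion_ratio]


def mean_distance(vec, base):
    """Mean Euclidean distance from vec to every vector in the base model."""
    return sum(math.dist(vec, b) for b in base) / len(base)


def is_anomalous(comment, base, threshold):
    """Flag a comment whose deviation from the base model exceeds threshold."""
    return mean_distance(features(comment), base) > threshold


# Base model built from troll comments only ('normality' in this setup).
troll_comments = [
    "YOU ARE ALL IDIOTS",
    "I HATE THIS STUPID SITE",
    "AWFUL AWFUL PEOPLE",
]
base_model = [features(c) for c in troll_comments]

for text in ("STUPID IDIOTS EVERYWHERE",
             "Thanks for sharing, the linked study is interesting."):
    print(text, "->", is_anomalous(text, base_model, threshold=0.5))
```

Only one class has to be labelled to build the base model, which is consistent with the abstract's aim of minimising the labelling task. In this sketch a comment flagged as anomalous is one that does not resemble the troll set.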

Author information

Corresponding author

Correspondence to Jorge de-la-Peña-Sordo.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

de-la-Peña-Sordo, J., Pastor-López, I., Ugarte-Pedrero, X., Santos, I., Bringas, P.G. (2014). Anomalous User Comment Detection in Social News Websites. In: de la Puerta, J., et al. International Joint Conference SOCO’14-CISIS’14-ICEUTE’14. Advances in Intelligent Systems and Computing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-07995-0_51

  • DOI: https://doi.org/10.1007/978-3-319-07995-0_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07994-3

  • Online ISBN: 978-3-319-07995-0

  • eBook Packages: Engineering, Engineering (R0)