Anomalous User Comment Detection in Social News Websites

de-la-Peña-Sordo, Jorge; Pastor-López, Iker; Ugarte-Pedrero, Xabier; Santos, Igor; Bringas, Pablo García

doi:10.1007/978-3-319-07995-0_51

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 299)

Abstract

The Web has evolved over the years, and site administrators are no longer the only ones who generate content. Users of a website can also express their feelings and opinions. This has had negative side effects: the content generated is sometimes inappropriate, and it is frequently authored by troll users who deliberately seek controversy. In this paper we propose a new method to detect trolling comments in social news websites. To this end, we extract a combination of statistical, syntactic and opinion features from the user comments. Since the troll phenomenon is quite common on the Web, we propose a novel experimental setup for our anomaly detection method: we take troll comments as the base model (normal behaviour, or ‘normality’). We evaluate our approach with data from ‘Menéame’, a popular Spanish social news site, showing that our method can obtain high detection rates whilst minimising the labelling task.
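The general idea of scoring a comment against a base model of ‘normal’ comments can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three toy statistical features, the mean Euclidean distance used as the anomaly score, the threshold value, and the function names `extract_features`, `anomaly_score` and `is_anomalous`. The syntactic and opinion features mentioned in the abstract are not modelled here.

```python
import math


def extract_features(comment: str) -> list[float]:
    """Toy statistical features (hypothetical, not the paper's feature set)."""
    words = comment.split()
    n_chars = max(len(comment), 1)  # avoid division by zero on empty input
    return [
        float(len(words)),                            # length in words
        sum(c.isupper() for c in comment) / n_chars,  # share of uppercase characters
        comment.count("!") / n_chars,                 # share of exclamation marks
    ]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equally sized feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anomaly_score(comment: str, normal_vectors: list[list[float]]) -> float:
    """Mean distance from the comment to every vector of the base model."""
    vector = extract_features(comment)
    return sum(euclidean(vector, n) for n in normal_vectors) / len(normal_vectors)


def is_anomalous(comment: str, normal_vectors: list[list[float]], threshold: float) -> bool:
    """Flag the comment when it deviates from the base model by more than threshold."""
    return anomaly_score(comment, normal_vectors) > threshold


# Base model built only from comments labelled as trolling (assumed toy data).
troll_comments = ["YOU ARE ALL IDIOTS!!!", "WHAT A STUPID POST!!!"]
normal_vectors = [extract_features(c) for c in troll_comments]

THRESHOLD = 2.0  # assumed value; in practice it would be tuned on held-out data
```

With this setup only the class used as ‘normality’ needs labelling, and any comment whose score exceeds the threshold is treated as deviating from it. In this toy version the unscaled word count dominates the distance, so a realistic variant would normalise the features first.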


Author information

Authors and Affiliations

S3Lab, DeustoTech Computing, University of Deusto, Avenida de las Universidades 24, 48007, Bilbao, Spain
Jorge de-la-Peña-Sordo, Iker Pastor-López, Xabier Ugarte-Pedrero, Igor Santos & Pablo García Bringas

Corresponding author

Correspondence to Jorge de-la-Peña-Sordo.

Editor information

Editors and Affiliations

DeustoTech Computing, University of Deusto, Bilbao, Spain
José Gaviria de la Puerta
DeustoTech Computing, University of Deusto, Bilbao, Spain
Iván García Ferreira
DeustoTech Computing, University of Deusto, Bilbao, Spain
Pablo Garcia Bringas
German Workforce ADL Partnership Laboratory, Waltershausen, Germany
Fanny Klett
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Washington, USA
Ajith Abraham
Department of Computer Science, University of Sao Paulo at Sao Carlos, Sao Carlos, Brazil
André C.P.L.F. de Carvalho
Department of Civil Engineering, Escuela Politécnica Superior, University of Burgos, Burgos, Spain
Álvaro Herrero
Department of Civil Engineering, Escuela Politécnica Superior, University of Burgos, Burgos, Spain
Bruno Baruque
Departamento de Enxeñeria Industrial, Escuela Universitaria Politécnica, University of Salamanca, Salamanca, Spain and University of Coruña, La Coruña, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

de-la-Peña-Sordo, J., Pastor-López, I., Ugarte-Pedrero, X., Santos, I., Bringas, P.G. (2014). Anomalous User Comment Detection in Social News Websites. In: de la Puerta, J., et al. International Joint Conference SOCO’14-CISIS’14-ICEUTE’14. Advances in Intelligent Systems and Computing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-07995-0_51

DOI: https://doi.org/10.1007/978-3-319-07995-0_51
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07994-3
Online ISBN: 978-3-319-07995-0
eBook Packages: Engineering, Engineering (R0)
