Post or Block? Advances in Automatically Filtering Undesired Comments

Alberto, Túlio C.; Lochter, Johannes V.; Almeida, Tiago A.

doi:10.1007/s10846-014-0105-y

Published: 05 September 2014

Volume 80, pages 245–259 (2015)

Journal of Intelligent & Robotic Systems

Abstract

Much of the information available on websites such as social networks, forums and blogs now comes from interaction with users: readers post comments and sometimes become regular visitors. Blogs that specialize in certain subjects gain their users' trust and become references in the field. Nevertheless, the ease of inserting content through text comments makes room for unwanted messages, which degrade the user experience, reduce the quality of the information the websites provide and indirectly cause personal and economic losses. In this scenario, this paper presents a comprehensive study of established machine learning techniques applied to automatically detecting undesired comments posted on blogs. Different sets of attributes were also evaluated, along with text normalization techniques. Experiments carried out on a real, public database indicate that support vector machines, logistic regression and stacking ensemble methods, trained with attributes extracted from both the text messages and the posting information, are promising for the task of blocking undesired comments.
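
To make the idea of combining text attributes with posting information concrete, here is a minimal Python sketch. It is not the authors' pipeline: the abstract does not say which normalization steps or posting attributes were used, so the rules in `normalize`, the helper `extract_features` and the attribute name `num_links` are all illustrative assumptions.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(text):
    """Assumed normalization: lowercase, mask URLs, squeeze long character runs."""
    text = text.lower()
    # Replace every URL with a single placeholder token.
    text = URL_RE.sub(" urltoken ", text)
    # Collapse runs of 3+ identical characters to two ("freeeee" -> "free").
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


def extract_features(comment, posting_info):
    """Merge bag-of-words counts with posting attributes into one feature dict."""
    features = Counter("tok=" + t for t in TOKEN_RE.findall(normalize(comment)))
    # Prefix posting attributes (hypothetical names) so they cannot collide with tokens.
    for name, value in posting_info.items():
        features["post=" + name] = float(value)
    return dict(features)


if __name__ == "__main__":
    example = extract_features(
        "FREEEEE pills at http://spam.example/buy NOW!!!",
        {"num_links": 1},  # hypothetical posting attribute
    )
    print(example)
```

A feature dictionary of this kind could then be vectorized and passed to any of the classifiers the abstract names, such as a support vector machine or logistic regression.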

Author information

Authors and Affiliations

Department of Computer Science, Federal University of São Carlos – UFSCar, 18052-780, Sorocaba, São Paulo, Brazil
Túlio C. Alberto, Johannes V. Lochter & Tiago A. Almeida

Corresponding author

Correspondence to Tiago A. Almeida.

About this article

Cite this article

Alberto, T.C., Lochter, J.V. & Almeida, T.A. Post or Block? Advances in Automatically Filtering Undesired Comments. J Intell Robot Syst 80 (Suppl 1), 245–259 (2015). https://doi.org/10.1007/s10846-014-0105-y

Received: 28 February 2014
Accepted: 26 August 2014
Published: 05 September 2014
Issue Date: December 2015
DOI: https://doi.org/10.1007/s10846-014-0105-y
