Leveraging posts’ and authors’ metadata to spot several forms of abusive comments in Twitter

Journal of Intelligent Information Systems

Abstract

Social media is frequently plagued by undesirable phenomena such as cyberbullying and abusive content in the form of hateful and racist posts. It is therefore crucial to study and propose better mechanisms for automatically identifying communication that promotes hate speech, hostility, and aggressiveness. Traditional approaches have focused only on exploiting the content and writing style of social media posts while ignoring information related to their context. Several recent works have reported interesting findings in this direction, but they lack an exhaustive analysis of contextual information and an evaluation of whether the same premise holds for detecting different types of abusive comments, e.g., offensive, hostile, and hateful. To this end, we have extended seven Twitter benchmark datasets related to the detection of offensive, aggressive, hostile, and hateful communication. We evaluate our hypothesis using three different learning models, considering classical (Bag of Words), advanced (GloVe), and state-of-the-art (BERT) text representations. Experiments show statistically significant differences between the classification scores of all methods that use a combination of text and metadata and those of the classical approach that uses only the text content of the messages, suggesting the importance of paying attention to context when spotting the different kinds of abusive comments on social networks.

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. In just one minute, Facebook users upload 147,000 photos, Twitter registers 319 new users, Instagram adds 350,000 new stories, etc. Source: https://www.socialmediatoday.com/news/what-happens-on-the-internet-every-minute-2020-version-infographic/583340/

  2. It is important to note that, although these data are specific to individual posts, the privacy of their authors is never compromised.

  3. Those tweets were probably easier for Twitter itself to spot and delete because of the racist keywords used for corpus collection.

  4. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.mutual_info_classif.html

  5. A zero value means both variables are independent.

  6. https://ec.europa.eu/commission/presscorner/detail/en/IP_16_1937
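
Notes 4 and 5 refer to scikit-learn's `mutual_info_classif` and to the fact that a score of zero indicates independence. The following is a minimal sketch of that behaviour on made-up toy data; the feature names (`has_url`, `is_reply`) and all values are hypothetical and are not taken from the article or its datasets.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Toy labels for 8 posts (0 = not abusive, 1 = abusive); invented for illustration.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Two hypothetical binary metadata features per post:
#   column 0 ("has_url")  is identical to the label, so it is fully informative;
#   column 1 ("is_reply") is evenly split within each class, so it is independent of the label.
X = np.array([
    [0, 0],
    [0, 1],
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
    [1, 0],
    [1, 1],
])

# discrete_features=True treats every column as categorical, which makes the
# estimate deterministic (scores are reported in nats).
mi = mutual_info_classif(X, y, discrete_features=True)

for name, score in zip(["has_url", "is_reply"], mi):
    print(f"{name}: {score:.3f}")
```

In this sketch the fully informative feature scores ln 2 (about 0.693 nats), the entropy of a balanced binary label, while the independent feature scores zero.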

Funding

This work was supported by the Mexican National Council for Science and Technology (CONACYT) under grant agreements no. 701616 and no. 654803.

Author information

Contributions

Conceptualization: [Marco Casavantes]; Methodology: [Mario Ezra Aragón]; Formal analysis: [Marco Casavantes]; Investigation: [Marco Casavantes, Mario Ezra Aragón]; Data curation: [Marco Casavantes]; Validation: [Mario Ezra Aragón]; Writing - original draft preparation: [Marco Casavantes, Mario Ezra Aragón]; Writing - review and editing: [Luis C. González, Manuel Montes-y-Gómez]; Supervision: [Luis C. González, Manuel Montes-y-Gómez]; Project administration: [Luis C. González, Manuel Montes-y-Gómez].

Corresponding author

Correspondence to Luis C. González.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Casavantes, M., Aragón, M.E., González, L.C. et al. Leveraging posts’ and authors’ metadata to spot several forms of abusive comments in Twitter. J Intell Inf Syst 61, 519–539 (2023). https://doi.org/10.1007/s10844-023-00779-z

  • DOI: https://doi.org/10.1007/s10844-023-00779-z
