Offensive Comments in the Brazilian Web: a dataset and baseline results

  • Rogers Prates de Pelle
  • Viviane P. Moreira

Abstract


Brazilian Web users are among the most active on social networks and are very keen on interacting with others. Offensive comments, known as hate speech, have been plaguing online media and giving rise to a number of lawsuits against companies that publish Web content. Given the massive volume of user-generated text published daily, manually filtering offensive comments is infeasible. The identification of offensive comments can be treated as a supervised classification task. Training a model to classify comments requires an annotated dataset containing positive and negative examples. The lack of such a dataset in Portuguese limits the development of detection approaches for this language. In this paper, we describe how we created annotated datasets of offensive comments in Portuguese by collecting news comments on the Brazilian Web. In addition, we provide classification results achieved by standard classification algorithms on these datasets, which can serve as a baseline for future work on this topic.
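
To make the supervised framing concrete, the following is a minimal sketch of a binary comment classifier, assuming a bag-of-words multinomial Naive Bayes model with add-one smoothing. Naive Bayes is used here only as one example of a standard algorithm; the abstract does not name the algorithms evaluated. The training comments, the labels, and the names `TRAIN`, `tokenize`, `train`, and `predict` are hypothetical and are not taken from the datasets described above.

```python
import math
from collections import Counter

# Hypothetical toy examples, NOT from the paper's datasets.
# Label 1 = offensive (positive), label 0 = not offensive (negative).
TRAIN = [
    ("you are an idiot", 1),
    ("what a stupid idiot", 1),
    ("shut up you stupid fool", 1),
    ("great article thanks", 0),
    ("thanks for the interesting report", 0),
    ("i agree with the article", 0),
]


def tokenize(text):
    """Lowercase the comment and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per class and comments per class."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the class with the higher log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        # Log prior: fraction of training comments with this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
```

Calling `predict(model, "some new comment")` returns 1 or 0. A real baseline would be trained on the annotated comments and evaluated on held-out data rather than on a handful of invented sentences.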

Published
06/07/2017
How to Cite

DE PELLE, Rogers Prates; MOREIRA, Viviane P. Offensive Comments in the Brazilian Web: a dataset and baseline results. In: BRAZILIAN WORKSHOP ON SOCIAL NETWORK ANALYSIS AND MINING (BRASNAM), 6., 2017, São Paulo. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2017. ISSN 2595-6094. DOI: https://doi.org/10.5753/brasnam.2017.3260.