Abstract
Profanity is a common occurrence in online text. Recent studies found swear words in over 7% of English tweets and 9% of Yahoo! Buzz messages. However, efforts to recognize, understand and deal with profanity do not share resources, namely their datasets, which leads to duplicated effort and results that cannot be compared.
Here we present a freely available dataset of 2500 messages from a popular Portuguese sports website. About 20% of the messages contained profanity; in them we annotated 726 swear words, 510 of which had been obfuscated by their authors. We also identified the most frequent profanities and the methods, and combinations of methods, that people used to disguise their cursing.
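The abstract's counts imply a few proportions that it does not state outright. The short calculation below derives them from the reported figures only. The message count is approximate because the abstract says "about 20%", and the variable names are illustrative, not taken from the paper or its dataset.

```python
# Figures reported in the abstract
total_messages = 2500
profanity_share = 0.20   # "about 20%", so the derived message count is approximate
swear_words = 726        # annotated swear words
obfuscated = 510         # swear words disguised by their authors

approx_profane_messages = round(total_messages * profanity_share)
obfuscated_share = obfuscated / swear_words
plain_swear_words = swear_words - obfuscated

print(approx_profane_messages)    # roughly 500 messages with profanity
print(f"{obfuscated_share:.1%}")  # share of swear words that were obfuscated
print(plain_swear_words)          # swear words written without disguise
```

In other words, roughly 70% of the annotated swear words were obfuscated, leaving 216 written in plain form.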
References
Constant, N., Davis, C., Potts, C., Schwarz, F.: The pragmatics of expressive content: Evidence from large corpora. Sprache und Datenverarbeitung: International Journal for Language Data Processing (33), 5–21 (2009)
Jay, T., Janschewitz, K.: Filling the emotional gap in linguistic theory: Commentary on Potts’ expressive dimension (33), 215–221 (2007)
Jay, T.: The utility and ubiquity of taboo words. 4(2), 153–161 (2009)
Wang, W., Chen, L., Thirunarayan, K., Sheth, A.P.: Cursing in English on Twitter. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, CSCW 2014 (February 2014)
Thelwall, M.: Fk yea I swear: Cursing and gender in MySpace. Corpora. 3(1), 83–107 (2008)
Sood, S.O., Antin, J., Churchill, E.: Profanity use in online communities. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2012, pp. 1481–1490. ACM, New York (2012)
Mehl, M.R., Pennebaker, J.W.: The Sounds of Social Life: A Psychometric Analysis of Students’ Daily Social Environments and Natural Conversations. Journal of Personality and Social Psychology 84(4), 857–870 (2003)
Crisp, R.J., Heuston, S., Farr, M.J., Turner, R.N.: Seeing Red or Feeling Blue: Differentiated Intergroup Emotions and Ingroup Identification in Soccer Fans
Sousa Silva, R., Laboreiro, G., Sarmento, L., Grant, T., Oliveira, E., Maia, B.: ‘twazn me!!!;(’ Automatic Authorship Analysis of Micro-Blogging Messages. In: Muñoz, R., Montoyo, A., Métais, E. (eds.) NLDB 2011. LNCS, vol. 6716, pp. 161–168. Springer, Heidelberg (2011)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Laboreiro, G., Oliveira, E. (2014). What We Can Learn from Looking at Profanity. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_11
DOI: https://doi.org/10.1007/978-3-319-09761-9_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science, Computer Science (R0)