Published July 17, 2020 | Version 1.0
Dataset Open

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

  • 1. Rochester Institute of Technology
  • 2. Qatar Computing Research Institute, HBKU
  • 3. IBM Research
  • 4. University of Copenhagen
  • 5. University of Cambridge
  • 6. Qatar Computing Research Institute, Qatar
  • 7. IT University Copenhagen, Denmark
  • 8. University of Wolverhampton, UK
  • 9. University of Tübingen, Germany

Description

The task comprises three subtasks that correspond to the levels of the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019) from OffensEval 2019. The task covered five languages; this upload contains the English data. The English track also featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and all languages. A total of 528 teams signed up to participate, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.
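The hierarchy can be illustrated with a small sketch. It assumes the label names published with OLID (NOT/OFF for Subtask A, TIN/UNT for Subtask B, IND/GRP/OTH for Subtask C), which this record does not spell out, and the helper `is_valid_annotation` is a hypothetical name introduced only for illustration. It is not part of the official task tooling.

```python
from typing import Optional

# Assumed label sets, following the OLID taxonomy (Zampieri et al., 2019).
SUBTASK_A = {"NOT", "OFF"}         # not offensive / offensive
SUBTASK_B = {"TIN", "UNT"}         # targeted insult or threat / untargeted
SUBTASK_C = {"IND", "GRP", "OTH"}  # individual / group / other target


def is_valid_annotation(a: str,
                        b: Optional[str] = None,
                        c: Optional[str] = None) -> bool:
    """Check that a label triple respects the three-level hierarchy.

    Hypothetical helper: a Subtask B label is only expected for
    offensive posts, and a Subtask C label only for targeted ones.
    """
    if a not in SUBTASK_A:
        return False
    if a == "NOT":
        # Non-offensive posts carry no lower-level labels.
        return b is None and c is None
    if b not in SUBTASK_B:
        return False
    if b == "UNT":
        # Untargeted offensive posts carry no target label.
        return c is None
    return c in SUBTASK_C
```

Under these assumptions, `is_valid_annotation("OFF", "TIN", "GRP")` is accepted, while `is_valid_annotation("NOT", "TIN")` is rejected because a non-offensive post cannot carry a Subtask B label.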

This upload includes two test sets: the one used in the paper that describes the shared-task dataset, and the official test set used in the shared task itself.

The evaluation phase for English is hosted on CodaLab: https://competitions.codalab.org/competitions/23285

The shared task website is https://sites.google.com/site/offensevalsharedtask/home

Files

extended_test-20200717T190516Z-001.zip

Files (244.1 MB)

Checksum                                Size
md5:72a47ea414eaa6116075d43810b6eb00    469.3 kB
md5:c4026ffda9998603b9d9d94ce16e2692    8.6 kB
md5:7476d6555ea05374002cfd0a46667a0b    261.7 kB
md5:4ff61a2c75f36e91d7b6616af2684859    226.9 MB
md5:2bf18f6eb890ac7fc787897a79a7bc48    4.8 MB
md5:f34dc50cffed1313026b29b91173a78d    11.6 MB
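The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_checksum` are hypothetical, and because the listing does not pair file names with checksums, the path in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum written either as a bare
    hex string or in the 'md5:<hex>' form used in the listing."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example usage (placeholder path, not a file name from this record):
# matches_checksum("downloaded_file.zip",
#                  "md5:72a47ea414eaa6116075d43810b6eb00")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.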

Additional details

References

  • Rosenthal, Sara, et al. "A large-scale semi-supervised dataset for offensive language identification." arXiv preprint arXiv:2004.14454 (2020).
  • Zampieri, Marcos, et al. "SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)." arXiv preprint arXiv:2006.07235 (2020).