Overview of Annotation Creation: Processes and Tools

Handbook of Linguistic Annotation

Abstract

Creating linguistic annotations requires more than just a reliable annotation scheme. Annotation can be a complex endeavour, potentially involving many people, stages, and tools. This chapter outlines the end-to-end process of creating linguistic annotations, identifying specific tasks that researchers often perform. Because tool support is so central to achieving high-quality, reusable annotations at low cost, the focus is on identifying capabilities that are necessary or useful for annotation tools, as well as common problems with these tools that reduce their utility. Although examples of specific tools are provided in many cases, this chapter concentrates more on abstract capabilities and problems, because new tools appear continuously while old ones fall into disuse or disrepair. The two core capabilities tools must have are support for the chosen annotation scheme and the ability to work with the language under study. Additional capabilities are organized into three categories: those that are widely provided; those that are often useful but found in only a few tools; and those that as yet have little or no tool support.

Author information

Correspondence to Tomaž Erjavec.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Finlayson, M.A., Erjavec, T. (2017). Overview of Annotation Creation: Processes and Tools. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_5

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_5

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences; Social Sciences (R0)