Published September 12, 2016 | Version v1
Presentation | Open access

WebAnno for less resourced and historical data annotation

  • Technische Universität Darmstadt

Description

WebAnno is a generic, web-based, distributed annotation tool. It supports annotating a range of linguistic types and structures: token spans (e.g. part of speech), sub-tokens (e.g. morphology markers), relations (e.g. dependency grammar), chains (e.g. co-reference), and complex slot-based annotations (e.g. semantic role labelling).
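
To make the distinction between these annotation types concrete, here is a minimal Python sketch of how they might be represented as data. All class and field names (Span, Relation, Chain, SlotAnnotation) are hypothetical and chosen for illustration; they are not WebAnno's internal data model. Spans are assumed to use character offsets with an exclusive end.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# HYPOTHETICAL illustration of the annotation structures listed above.
# These names are not WebAnno's internal types.

@dataclass(frozen=True)
class Span:
    """A labelled character range: a part-of-speech tag on a token,
    or a morphology marker on a sub-token range."""
    begin: int  # inclusive character offset
    end: int    # exclusive character offset (assumption)
    label: str

    def covered_text(self, text: str) -> str:
        return text[self.begin:self.end]

@dataclass(frozen=True)
class Relation:
    """A labelled, directed arc between two spans, e.g. a dependency."""
    governor: Span
    dependent: Span
    label: str

@dataclass
class Chain:
    """An ordered sequence of spans referring to the same entity (co-reference)."""
    mentions: List[Span] = field(default_factory=list)

@dataclass
class SlotAnnotation:
    """A span whose named slots are filled by other spans,
    e.g. a predicate with semantic-role arguments."""
    anchor: Span
    slots: Dict[str, List[Span]] = field(default_factory=dict)

    def fill(self, role: str, filler: Span) -> None:
        # A role may take several fillers, so each slot holds a list.
        self.slots.setdefault(role, []).append(filler)

# Usage: one sentence annotated with every structure type.
text = "Mary saw her dog"
mary = Span(0, 4, "PROPN")
saw = Span(5, 8, "VERB")
her = Span(9, 12, "PRON")
dog = Span(13, 16, "NOUN")
subj = Relation(governor=saw, dependent=mary, label="nsubj")
coref = Chain([mary, her])
frame = SlotAnnotation(anchor=saw)
frame.fill("Agent", mary)
frame.fill("Patient", dog)
```

Keeping relations, chains, and slots as references to spans (rather than copies of text) means a correction to a span's label or offsets is seen by every structure that uses it.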

Unlike many annotation tools, it supports annotation in many languages, including less resourced and historical ones, as long as their writing systems have valid Unicode representations. Besides the left-to-right writing systems of European languages, the latest WebAnno release also supports right-to-left scripts such as Arabic and Hebrew. To speed up annotation for less resourced languages, the tool also includes an integrated automation component that suggests annotations automatically and incrementally, so that annotators can easily correct the suggestions.
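
The incremental suggest-then-correct loop can be sketched as follows. This is a deliberately simple stand-in, assuming token-level labels: the hypothetical IncrementalSuggester proposes the label most frequently assigned to each token so far and updates its counts whenever the annotator submits corrected labels. WebAnno's automation component uses its own learning method, which is not described here; only the loop structure is meant to carry over.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

class IncrementalSuggester:
    """HYPOTHETICAL suggester: proposes the label most often seen for a token
    in previously corrected annotations. Not WebAnno's actual learner."""

    def __init__(self) -> None:
        # token -> counts of labels annotators have confirmed for it
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def suggest(self, tokens: List[str]) -> List[Optional[str]]:
        # None means "no suggestion yet" for a token never seen before.
        suggestions: List[Optional[str]] = []
        for tok in tokens:
            counts = self._counts.get(tok)  # .get does not create entries
            suggestions.append(counts.most_common(1)[0][0] if counts else None)
        return suggestions

    def learn(self, tokens: List[str], gold: List[str]) -> None:
        # Called after the annotator accepts or corrects the suggestions.
        if len(tokens) != len(gold):
            raise ValueError("tokens and labels must align one-to-one")
        for tok, label in zip(tokens, gold):
            self._counts[tok][label] += 1

# Usage: suggestions improve as corrected sentences are fed back.
s = IncrementalSuggester()
sent1 = ["the", "dog", "runs"]
first = s.suggest(sent1)                    # nothing learned yet
s.learn(sent1, ["DET", "NOUN", "VERB"])     # annotator's corrected labels
second = s.suggest(["the", "cat", "runs"])  # known tokens now get suggestions
```

Each corrected sentence immediately improves the suggestions for the next one, which is the property that lets annotators shift from labelling from scratch to reviewing.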

In an annotation study for Amharic, this automation component increased annotation speed by 21%. WebAnno also provides broad support for the annotation workflow, including user management, agreement computation, adjudication of material annotated by multiple annotators, and a variety of import and export formats. WebAnno has been developed over the past three years as part of the CLARIN-D infrastructure and is available as open source, so others can customize it to their specific needs.
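
Agreement computation compares annotations of the same material made by different annotators. Which measures WebAnno reports is not specified here; as one common example, the sketch below computes Cohen's kappa for two annotators who labelled the same items, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution. The function name cohens_kappa is illustrative.

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items.
    Illustrative sketch; not taken from WebAnno's code."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty, equally long label sequences")
    n = len(a)
    # p_o: fraction of items on which both annotators chose the same label
    observed = sum(x == y for x, y in zip(a, b)) / n
    # p_e: chance agreement from each annotator's marginal label frequencies
    ca, cb = Counter(a), Counter(b)
    chance_pairs = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys())
    if chance_pairs == n * n:
        # Both used one identical label throughout; kappa is undefined here.
        return 1.0
    expected = chance_pairs / (n * n)
    return (observed - expected) / (1 - expected)

# Usage: two annotators' POS tags for the same six tokens.
ann1 = ["N", "N", "V", "V", "N", "A"]
ann2 = ["N", "V", "V", "V", "N", "A"]
k = cohens_kappa(ann1, ann2)
```

Here p_o = 5/6 and p_e = (3*2 + 2*3 + 1*1)/36 = 13/36, so kappa = 17/23, about 0.74. Returning 1.0 in the degenerate single-label case is a convention chosen for this sketch.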

Files

WebAnno for less resourced and historical data annotation.pdf (2.6 MB)