DOI: 10.5555/1621034.1621040
research-article
Free Access

Multi-dimensional annotation and alignment in an English-German translation corpus

Published: 04 April 2006

ABSTRACT

This paper presents the compilation of the CroCo Corpus, an English-German translation corpus. Corpus design, annotation and alignment are described in detail. To guarantee the searchability and exchangeability of the corpus, XML stand-off mark-up is used as the representation format for the multi-layer annotation. On this basis, it is shown how the corpus can be queried using XQuery. Furthermore, the generalisation of results in terms of linguistic and translational research questions is briefly discussed.
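
To make the idea of stand-off mark-up concrete, the following is a minimal sketch, not the CroCo schema or tooling. It assumes a hypothetical layout in which tokens sit in a base layer with ids, while part-of-speech tags and translation alignments live in separate layers that only point at those ids. All element names, attribute names, tag values and sample sentences are invented for illustration. The paper itself queries the corpus with XQuery; Python's standard `xml.etree.ElementTree` is swapped in here only to show how layers can be joined on shared ids.

```python
import xml.etree.ElementTree as ET

# Hypothetical base layers: one token file per language (not the CroCo format).
EN_TOKENS = """<tokens lang="en">
  <token id="t1">The</token><token id="t2">corpus</token><token id="t3">grows</token>
</tokens>"""
DE_TOKENS = """<tokens lang="de">
  <token id="d1">Das</token><token id="d2">Korpus</token><token id="d3">wächst</token>
</tokens>"""

# Hypothetical stand-off layer: POS tags refer to English tokens by id only.
EN_POS = """<pos>
  <tag ref="t1" value="DT"/><tag ref="t2" value="NN"/><tag ref="t3" value="VBZ"/>
</pos>"""

# Hypothetical stand-off layer: word alignment between the two token layers.
ALIGNMENT = """<alignment>
  <link src="t1" tgt="d1"/><link src="t2" tgt="d2"/><link src="t3" tgt="d3"/>
</alignment>"""


def load_tokens(xml_text):
    """Map token id -> surface form for one base layer."""
    return {t.get("id"): t.text for t in ET.fromstring(xml_text).iter("token")}


def load_pos(xml_text):
    """Map token id -> POS value for one annotation layer."""
    return {t.get("ref"): t.get("value") for t in ET.fromstring(xml_text).iter("tag")}


def aligned_pairs_with_pos(wanted_pos):
    """Return (source, target) word pairs whose source token has the given POS."""
    en, de, pos = load_tokens(EN_TOKENS), load_tokens(DE_TOKENS), load_pos(EN_POS)
    pairs = []
    for link in ET.fromstring(ALIGNMENT).iter("link"):
        src, tgt = link.get("src"), link.get("tgt")
        # Join the three layers on the shared token ids.
        if pos.get(src) == wanted_pos and src in en and tgt in de:
            pairs.append((en[src], de[tgt]))
    return pairs


if __name__ == "__main__":
    print(aligned_pairs_with_pos("NN"))
```

Because each layer refers to tokens only by id, a new annotation layer can be added or replaced without touching the token files, which is the property the abstract relies on for searchability and exchangeability.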

      • Published in

        NLPXML '06: Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing
        April 2006
        104 pages

        Publisher

        Association for Computational Linguistics

        United States

