Multi-dimensional annotation and alignment in an English-German translation corpus

Authors:
Silvia Hansen-Schirra, Saarland University, Germany
Stella Neumann, Saarland University, Germany
Mihaela Vela, Saarland University, Germany

NLPXML '06: Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing, April 2006, Pages 35–42

Published: 4 April 2006

ABSTRACT

This paper presents the compilation of the CroCo Corpus, an English-German translation corpus. Corpus design, annotation and alignment are described in detail. To guarantee the searchability and exchangeability of the corpus, XML stand-off mark-up is used as the representation format for the multi-layer annotation. On this basis, it is shown how the corpus can be queried using XQuery. Furthermore, the generalisation of results in terms of linguistic and translational research questions is briefly discussed.
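
To make the idea of stand-off mark-up concrete, the sketch below keeps tokens, part-of-speech tags and alignment links in separate XML layers that refer to each other only by token id, and then joins the layers to answer a simple question. It is an illustration under stated assumptions, not the CroCo format: the element names, attribute names, sample sentences and the helper functions `token_index` and `aligned_targets_for_pos` are all hypothetical, and Python's standard `xml.etree.ElementTree` stands in for the XQuery engine mentioned in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off layers. Element and attribute names are
# illustrative only and are NOT taken from the CroCo schema.
TOKENS_EN = """<tokens lang="en">
  <token id="en1">The</token>
  <token id="en2">corpus</token>
  <token id="en3">has</token>
  <token id="en4">texts</token>
</tokens>"""

TOKENS_DE = """<tokens lang="de">
  <token id="de1">Das</token>
  <token id="de2">Korpus</token>
  <token id="de3">hat</token>
  <token id="de4">Texte</token>
</tokens>"""

# Annotation layer: tags point at tokens by id instead of wrapping them.
POS_EN = """<pos>
  <tag ref="en1" value="DT"/>
  <tag ref="en2" value="NN"/>
  <tag ref="en3" value="VBZ"/>
  <tag ref="en4" value="NNS"/>
</pos>"""

# Alignment layer: links pair source token ids with target token ids.
ALIGN = """<alignment>
  <link source="en1" target="de1"/>
  <link source="en2" target="de2"/>
  <link source="en3" target="de3"/>
  <link source="en4" target="de4"/>
</alignment>"""


def token_index(xml_text):
    """Map token id -> surface form for one tokenisation layer."""
    root = ET.fromstring(xml_text)
    return {tok.get("id"): tok.text for tok in root.iter("token")}


def aligned_targets_for_pos(pos_value):
    """Return (source word, target word) pairs whose source token has pos_value."""
    en = token_index(TOKENS_EN)
    de = token_index(TOKENS_DE)
    # Step 1: collect ids of source tokens carrying the requested tag.
    wanted = {tag.get("ref")
              for tag in ET.fromstring(POS_EN).iter("tag")
              if tag.get("value") == pos_value}
    # Step 2: follow alignment links from those ids to the target side.
    return [(en[link.get("source")], de[link.get("target")])
            for link in ET.fromstring(ALIGN).iter("link")
            if link.get("source") in wanted]


print(aligned_targets_for_pos("NN"))
```

Because each layer only references token ids, a further layer could be added or replaced without touching the token files, which is the property that makes stand-off mark-up attractive for multi-layer annotation.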

Published in
NLPXML '06: Proceedings of the 5th Workshop on NLP and XML: Multi-Dimensional Markup in Natural Language Processing
April 2006, 104 pages
Conference Chairs: David Ahn (University of Amsterdam), Erik Tjong Kim Sang (University of Amsterdam), Graham Wilcock (University of Helsinki)
Publisher: Association for Computational Linguistics, United States