Corpus linguistics

doi:10.4324/9781315158945-16

ABSTRACT

Corpora are collections of authentic texts selected and assembled to study language. This chapter introduces corpus use for translation scholars, reviewing its underlying theoretical assumptions and surveying the main corpus types (monolingual comparable, parallel, and composite designs) and the tools available for studying translation through corpora (concordances, clusters, collocates, frequency lists, keywords, n-grams). Several application areas are identified as especially central, including the search for typical features of translation, the investigation of translator style, and translator education. While remaining accessible, the chapter also explores some more advanced notions. First, in terms of methods for corpus exploration, it briefly presents quantitative lexico-grammatical indices, annotation-based data extraction methods, and inferential and exploratory statistics for corpus comparison. Second, it discusses several critical issues and topics: the emergence of increasingly composite corpus designs; the wider adoption of quantitative methods and sophisticated techniques borrowed from natural language processing; the emergence of new hypotheses and research paradigms that combine corpus techniques with other data sources; the inclusion of novel forms of translation in the research paradigm; and the comparison of translation with other forms of bilingual communication.