Data-driven learning for languages other than English: the cases of French, German, Italian, and Spanish

This paper summarises the contributions to EuroCALL’s CorpusCALL SIG Symposium for the year 2020. In line with this year’s EuroCALL conference theme, ‘CALL for widening participation’, the Symposium centred around the theme of Data-driven learning for languages other than English. This paper gives a brief overview of developments and challenges when using Data-Driven Learning (DDL) to teach French, German, Italian, and Spanish. Research suggests that a DDL approach has been used effectively to teach these languages. However, the languages differ in the availability of DDL resources and corpora that are appropriate for language teaching. The main challenges for future developments are also discussed.


Introduction
This paper shares developments in using DDL in teaching Languages Other Than English (LOTEs) within the wider DDL community. As literature on DDL has primarily focused on studies in the context of teaching English (Chambers, 2019), we provide brief overviews of the current state of DDL in relation to the teaching of French, German, Italian, and Spanish. Each overview discusses challenges and proposes solutions to realising the full potential of the DDL approach. First, an empirical study of DDL for French is reported. Next, we provide a brief overview of the range and effectiveness of corpus resources used for teaching and learning German and indicate directions for future resource development and empirical research. We then trace a brief historical overview of DDL for Italian, with an indication of the main challenges that the field faces today. Finally, challenges of DDL for teachers and learners of Spanish are discussed.

DDL for French: linking professional communication skills and linguistic features
Research papers from the French DDL community mainly report on indirect applications (Vyatkina, 2020a), with learner corpora analysed as error repositories (Dubois, Kamber, & Dekens, 2013) or as resources for designing learning materials (Di Vito, 2013). Direct applications are mentioned within the context of academic writing (Jacques & Rinck, 2017) and French for specific purposes (Rodgers & Chambers, 2011). Here we present the results of a study focusing on the direct use of a small, specialised corpus by a group of 12 international engineering students enrolled on a professional writing course for advanced learners of French as a foreign language (target level: B2-C1). The study aimed to determine whether guided observation of corpus data could help these students better understand recurrent language errors in their first drafts of technical specification documents, known in French as 'Cahier des Clauses Techniques Particulières' (CCTP). We chose 14 CCTP samples to create a corpus accessible via Sketch Engine (Kilgarriff et al., 2014). In this corpus, we identified linguistic features corresponding to the professional communication skills targeted (see Table 1). The observed errors mainly correspond to these features. During the course, the participants completed worksheets containing activities partly inspired by their own errors, and they answered two online questionnaires.
The data obtained provide information on the learners' progress and remaining needs. We conclude from this study that the specialised CCTP corpus offers enough data to support students who have to write a pedagogical version of a CCTP. However, more training time is needed to explain the technical features of Sketch Engine to them more fully. They also need to learn how to notice linguistic features and report their findings.
To boost the DDL L2 French sector, we recommend choosing a user-friendly corpus tool and concentrating on learning issues. The content of the corpus must correspond to the writing task and the query activities should focus on the observed learning needs.

DDL for German: available resources, learning outcomes, and future directions
The subfield of DDL for German, like the broader DDL field, can be divided into pedagogical materials, classroom reports, and empirical research. The subfield's origins go back to the turn of the 21st century (e.g. Dodd, 2000; St. John, 2001).
In the most recent synthesis of DDL research, Boulton and Vyatkina (forthcoming) identify 14 empirical studies that explored the effectiveness of DDL for teaching German. Like most DDL research (ca. 90% of which has been dedicated to teaching English), studies on DDL for German primarily focus on university contexts and DDL interventions developed and administered by the researchers themselves. They report improved learner knowledge of German lexico-grammar and pragmatics as well as writing, translation, and interpreting skills and favourable learner attitudes. The geographic coverage of these studies is encouragingly broad, including seven countries and three continents, which attests to the generalisability of the findings. While more studies are needed in university contexts, promising future directions could also include an expansion of DDL for German to primary and secondary schools.
A unique feature of the German subfield is the availability of several large, well-designed, sustainable, and open-access corpora. The missing link between these rich resources and a broader German-learning and German-teaching population is teacher and learner DDL guides, written in accessible language and tethered to specific corpora. One such guide to using the DWDS corpus (http://dwds.de) and associated DDL exercises are currently being developed and gradually released with open access at the University of Kansas (Vyatkina, 2020b). It is hoped that other DDL researchers can use this resource as a model for "bringing corpora to the masses" (Boulton, 2011, p. 69) in DDL for German and beyond.

DDL for Italian: studies, practices, and future prospects
The studies on DDL for Italian cover a time span of at least 27 years. A solid starting point can be traced back to 1993, when Polezzi published her pioneering work in ReCALL. Polezzi (1993) showed how a corpus of Italian for specific purposes could be built and used with beginner learners of Italian enrolled in a postgraduate course in Renaissance Studies. She supported the idea of a didactic language corpus, identifying the characteristics that would make such a corpus suitable for specific language learning needs.
Since then, the number of studies on DDL for Italian has risen steadily but not steeply. To the best of our knowledge, there are no more than 20 in total, consisting mostly of descriptive studies (e.g. Corino & Marello, 2009), with still very few empirical studies (e.g. Forti, 2019).
The pedagogical practices adopted in the context of DDL for Italian have been closely linked to the characteristics of available corpora. While freely accessible reference corpora of Italian are available, they were primarily built by researchers for researchers. As a result, their pedagogical potential is generally restricted to the development of paper-based materials and to advanced-level learners. The first learner-friendly corpus exploration tool for Italian was developed very recently, within the SkELL platform.
Bridging the teacher-researcher gap (Chambers, 2019) is one of the main challenges that DDL for Italian faces today. Integrating corpora in teacher training programmes, publishing teacher guides and developing more learner-friendly corpus exploration tools are ways to help bridge this gap.

DDL for Spanish: attitudes and tasks in the use of corpora
DDL did not have a name in Spanish until fairly recently. Two terms were coined (aprendizaje basado en datos and aprendizaje guiado por datos). The field adopted the former, likely thanks to the seminal article by Asención-Delaney et al. (2015), which reported the profusion of pedagogical articles and the shortage of empirical studies. Since then, the field has experienced a steady growth of empirical research in DDL with both native and learner corpora as sources (Benavides, 2015; Yao, 2019).
In terms of resources, there are vast open-access native corpora, such as Corpus del Español (BYU) or CORPES XXI, and also important learner corpora (such as CAES, Aprescrilov, CEDEL 2). Among the numerous pedagogical articles, the scope of learning targets has widened from lexico-grammar to pragmatics, discourse features and pronunciation (using oral corpora), and varieties of Spanish.
Corpus-based tasks can also be found in recently published textbooks (e.g. Aula Internacional 4, Prisma C2), which is helping to spread DDL among practitioners and learners.
Despite this growth spurt, DDL is very far from being normalised in Spanish as a foreign language teaching practice. One main challenge lies in changing teachers' attitudes towards corpus use through training programmes and by integrating corpus use in the syllabus. As in other LOTEs, most Spanish teachers do not seem to be aware of the benefits of using corpora in language teaching. In addition, there is a need for ready-made materials and "online corpus user guides for teachers and exercises integrated with specific corpora" (Vyatkina, 2020a, p. 364) that can inspire teachers to develop their own corpora.

Conclusions
This brief overview of DDL research for LOTEs revealed that DDL has been used effectively for teaching the languages considered. Challenges to DDL often centre around the availability of appropriate corpora and tools for practitioners. The paper concentrated on a handful of European languages. Further reviews should explore developments of DDL within a wider geographical scope, including, for example, Arabic, Mandarin, and Russian.