LARA: an extensible open source platform for learning languages by reading

Learning and Reading Assistant (LARA) is an open source platform that enables conversion of plain texts into an interactive multimedia form designed to support second- and foreign-language (L2) learners. In this workshop, we illustrate the open source aspects using collaborative work carried out during a six-week summer project at the Árni Magnússon Institute for Icelandic Studies. Three undergraduate-level students extended the platform in different directions in cooperation with other members of the international LARA team. The three subprojects were respectively concerned with adding automatically generated flashcards, adding multimedia versions of poetic texts in the archaic language Old Norse, and extending LARA to allow the inclusion of sign language content in Icelandic sign language – Íslenskt TáknMál (ÍTM). All three reached successful conclusions.


Introduction
LARA (Akhlaghi et al., 2019) is a collaborative open source project, active since mid-2018, whose goal is to develop tools that enable conversion of plain texts into an interactive multimedia form designed to support development of L2 language skills by reading. The basic approach is in line with Krashen's (1982) influential theory of input, suggesting that language learning proceeds most successfully when learners are presented with interesting and comprehensible L2 material in a low-anxiety situation. LARA implements this abstract programme by providing concrete assistance to L2 learners, making texts more comprehensible to help them develop their reading, vocabulary, and pronunciation skills. In particular, LARA texts include translations and human-recorded audio attached to words and sentences, and a personalised concordance constructed from the learner's reading history. The learner, just by clicking or hovering over a word, is always in a position to answer three questions: what does it mean, what does it sound like, and where have I seen it before? Figure 1 shows an example.
Related platforms, from which we have adapted some ideas, include Learning With Text and Clilstore. LARA, however, offers considerably more functionality. In particular, generation of learner-specific concordances is, as far as we know, unique to LARA.
The LARA tools are made available through a free portal, divided into two layers. The core LARA engine consists of a suite of Python modules, which can also be run stand-alone from the command line. These are accessed through a web layer implemented in PHP. There is comprehensive online documentation.
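The idea behind a learner-specific concordance can be illustrated with a minimal Python sketch. It is not the LARA implementation: the function name build_concordance, the format of the reading history, and the lemma table are all assumptions made purely for illustration.

```python
from collections import defaultdict


def build_concordance(pages_read, lemma_of):
    """Map each lemma to the sentences in which the learner has met it.

    pages_read: list of (page_id, [sentence, ...]) pairs, in reading order
                (hypothetical format, not LARA's internal representation).
    lemma_of:   dict from lower-cased surface word to lemma; words that are
                missing from the table are treated as their own lemma.
    """
    concordance = defaultdict(list)
    for page_id, sentences in pages_read:
        for sentence in sentences:
            seen_in_sentence = set()
            for word in sentence.split():
                token = word.strip(".,;:!?'\"()").lower()
                if not token:
                    continue
                lemma = lemma_of.get(token, token)
                # List each sentence at most once per lemma.
                if lemma not in seen_in_sentence:
                    seen_in_sentence.add(lemma)
                    concordance[lemma].append((page_id, sentence))
    return dict(concordance)


# Toy reading history and lemma table (illustrative data only).
reading_history = [
    ("page1", ["Il y a un chat.", "Le chat dort."]),
    ("page2", ["Les chats dorment."]),
]
lemmas = {"chats": "chat", "dort": "dormir", "dorment": "dormir"}

concordance = build_concordance(reading_history, lemmas)
for page_id, sentence in concordance["chat"]:
    print(page_id, sentence)
```

Because the concordance is built only from pages the learner has actually read, two learners reading different texts see different examples for the same word.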
In this paper, we will concentrate on the open source aspects. We illustrate these using work carried out during a six-week summer project at the Árni Magnússon Institute for Icelandic Studies, where three BA students, who had previously not worked with LARA, extended the platform in different directions with some assistance from other members of the LARA team. The three subprojects were respectively concerned with adding automatically generated flashcards; creating multimedia versions of poetic texts written in the archaic language Old Norse; and extending LARA to allow inclusion of sign language content. In sections 2 to 4, we briefly sketch the three subprojects. The final section summarises and concludes.
Figure 1: [...] (1); the text is in the upper pane, and clicking on a word displays information about it in the lower pane; here, the user has just clicked on part of the multiword il y a ('there is') (2), showing an automatically generated concordance (4); hovering the mouse over a word plays audio and shows a popup translation; clicking on a loudspeaker plays audio for the preceding sentence (3); the back-arrows (5) link each line in the concordance to its context of occurrence; a link to the document can be found on the LARA content page.

Extending LARA with flashcards
A common suggestion we have received from LARA users is that it would be useful to make the platform more interactive and include functionality that allows learners to test their understanding of a text. The first subproject, carried out by a student who had just completed a Bachelor of Science in computer science, addressed this idea by adding capabilities to create flashcards automatically extracted from a LARA text. A member of the core LARA team first wrote a toy version of the flashcard module in Python, showing how to extract the necessary information from the internalised form. The student then worked autonomously, except for a couple of requests for low-level functions to obtain other types of internal information. Finally, the flashcard module was incorporated into the web layer by another member of the core LARA team, working together with the student.
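To make the idea concrete, the following Python sketch shows one way a 'sentence with gap' flashcard could be generated from a sentence. It is an illustration under stated assumptions, not the module written during the project: the function make_gap_flashcard, its parameters, and the sample vocabulary are hypothetical, and the real module works from LARA's internalised form rather than from raw strings.

```python
import random


def make_gap_flashcard(sentence, vocabulary, rng, n_distractors=3):
    """Build a 'sentence with gap' card: blank one word, offer choices.

    sentence:   plain sentence, words separated by single spaces.
    vocabulary: words to draw wrong answers from (hypothetical input).
    rng:        a random.Random instance, passed in for reproducibility.
    """
    words = sentence.split()
    # Only purely alphabetic words (ignoring edge punctuation) can be blanked.
    candidates = [
        i for i, w in enumerate(words) if w.strip(".,;:!?").isalpha()
    ]
    if not candidates:
        raise ValueError("sentence contains no word that can be blanked")

    idx = rng.choice(candidates)
    answer = words[idx].strip(".,;:!?")
    # Keep surrounding punctuation: only the word itself becomes the gap.
    gapped = words[idx].replace(answer, "_____")
    prompt = " ".join(gapped if i == idx else w for i, w in enumerate(words))

    # Wrong answers are sampled from the vocabulary, never the answer itself.
    pool = sorted(set(vocabulary) - {answer})
    choices = rng.sample(pool, k=min(n_distractors, len(pool)))
    choices.append(answer)
    rng.shuffle(choices)
    return {"prompt": prompt, "answer": answer, "choices": choices}


# Illustrative data only: a short sentence and three unrelated words.
card = make_gap_flashcard(
    "Tína fer í frí.", ["hús", "bók", "köttur"], random.Random(0)
)
print(card["prompt"])
print(card["choices"])
```

Passing the random number generator in as an argument keeps the sketch deterministic under test; a production version would more plausibly choose gaps and distractors using the learner's reading history.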

Using LARA for Old Norse
In most countries, students at middle schools are required to read classic works of literature that play an important part in the relevant country's cultural history: English children read Shakespeare, French children Molière, etc. The archaic language of the texts is, in general, not fully comprehensible to the students without some explanation. Our second subproject was designed to see if LARA could provide assistance in this kind of situation. In Iceland, the appropriate cultural referent is the Poetic Edda, a poem-cycle first written down in the late 13th century, but composed earlier. The Edda is written in Old Norse, the language spoken in Iceland between the 8th and 14th centuries, from which Modern Icelandic has developed. Old Norse is much closer to Modern Icelandic than English is to Old English, but still displays substantial differences: the grammar is not exactly the same, many words have shifted in meaning or have different spellings, and some have disappeared.
Figure 3: [...] (1); recorded audio for each verse (2); text in both original (3) and modern orthography (4); words in red are linked to informative notes (5); clicking on a word, here skein, displays the information page for the lemma, here skína (6); runic symbol (7) displays translation of verse; automatically generated links (8) to online resources; automatically generated concordance (9); list of notes (10); frequency and alphabetical indexes (11); hovering the mouse over a word plays audio and shows a popup translation (not featured).
The student responsible for the subproject, who had just completed her second year of a BA degree in Icelandic, used the LARA portal to create versions of the two best-known Edda poems, Völuspá and Hávamál; in contrast to the other subprojects, this did not involve developing any new platform functionality.
The poems are annotated with glosses in Modern Icelandic, and read with adapted Old Norse pronunciation. Key words and phrases, most often names of gods and places, are linked to explanatory notes. An interesting aspect concerns kennings, poetic phrases, typically of two or three not necessarily contiguous words, that are characteristic of the Edda and related Old Norse poems. These could successfully be handled by the multiword annotation scheme illustrated in Figure 1, a use of this mechanism we had not anticipated. An example of a page from an Edda text is shown in Figure 3, and a 'sentence with gap' flashcard in Figure 4. This work is described at greater length in Bédi et al. (2020, this volume).
Figure 4: [...] The student has to find the missing word in the incomplete verse presented at the top; after they have answered correctly, they can listen to the whole verse using the audio control.

Adding sign language to LARA documents
The theme of the third subproject was sign language. Here, the intention was to create an initial version of a LARA text designed for Deaf learners. The starting point was an existing LARA text for an Icelandic children's story, Tína fer í frí; this story had been constructed for a previous experiment, where it had been used by beginner/intermediate L2 learners in an Icelandic-for-foreigners course. In the current project, we repurposed the text so that it could be used by Deaf signers of ÍTM who wished to strengthen their Icelandic reading skills. Like all sign languages, ÍTM has no grammatical connection with the surrounding oral/aural language, here Icelandic. It is thus by no means assured that native signers of ÍTM will have strong reading skills in Icelandic, which, for them, is a second language.
As with the other two subprojects, core members of the LARA team did a small amount of preparatory work, generalising the treatment of multimedia so that word and sentence annotations could be supplied in video and audio forms. The rest of the project was performed by the student, who had just completed a BA degree in ÍTM and translation. He created signed videos for the words and sentences in the text, using the online recording tool integrated with LARA, after which the LARA platform scripts downloaded and linked everything together to create the final document.
The signed video extension was incorporated into the flashcard module developed during the first subproject. Examples of LARA pages and flashcards for ÍTM are shown in Figure 5 and Figure 6. This work will be presented at greater length elsewhere.

Summary and further directions
We have briefly described three summer projects where students without previous exposure to LARA extended it in different directions over a six-week period. This ambitious programme was completed in half the time that was originally planned; encouraged by the successful results, we envisage further work with the same collaborators and with new ones. If you are interested in developing other open source extensions to LARA and need assistance, please feel free to contact us at the addresses given above.