Multimodal, Digital Artefacts as Learning Tools in a University Subject-Specific English Language Course

This paper explores the practice of using multimodal, digital assessment tasks assigned to students of an English for Architects and Civil Engineers course at a university in Germany. Students were tasked with creating multimodal video compositions and interviewed about the processes behind composing their artefacts. The goal was to interrogate to what extent multimodal assessment tasks such as these can promote the communication of technical concepts, facilitate nuanced opportunities for language development and develop the students as social agents. The artefacts were examined through the lens of Systemic Functional Semiotics, drawing particularly upon the Genre and Multimodality framework (Bateman et al., 2017; Bateman & Schmidt-Borcherding, 2018) and a recent approach to analysing multimodal artefacts developed by Turney & Jones (2021).


Introduction
The overt emphasis on digital, multimodal communication in the applied disciplines of architecture and civil engineering is not always reflected in related subject-specific English language courses. This is especially the case in Germany, where multimodal literacy has been neglected in favour of a text-centric approach to language education (Wilke, 2012). There is a pressing need to develop curriculum and assessment tasks in Teaching English to Speakers of Other Languages (TESOL) education that better reflect the demands placed on English language students, both during their disciplinary studies and within their future workplaces. Additionally, the recent changes to the Council of Europe's Common European Framework of Reference for Languages (CEFR) have asked educators to substantially reimagine "… the user/learner as a social agent" (Council of Europe, 2020) and measure new competencies around mediation and plurilingualism. Further, educational policy around the world has increasingly emphasised the importance of digital literacy for 21st-century social and civic participation (Lamb et al., 2017), especially with the recent move towards online education necessitated by the spread of […] "communicative modes", of which the four skills form only a part. These modes of communication are: reception (listening, reading); production (speaking, writing); interaction (a social skill); and a new mode, mediation. It is this last mode that is the most complex, involving as it does all the other modes, and it is also of most relevance to this paper. According to the CEFR, mediation is an "…interpretation or reformulation of a source text", very similar to "resemiotization", where meaning making shifts from context to context, from practice to practice, or from one stage of a practice to the next (Iedema, 2003, p. 41). Mediation, and its relationship to multimodal literacy, will be explored in more detail in 4.3 of the Results section.
It will be suggested that multimodal assessment tasks are an excellent way to expand upon this area of language acquisition and provide educators with a way of satisfying the descriptors included within it. The multimodal assessment task explored in this paper involved the production of a 3-5-minute video composition (VC) explaining a concept from architecture or civil engineering to a nonspecialist audience using a variety of modes. This task was worth 20% of the students' overall grade.

Theoretical background
Multimodality theory allows us to explore the ways in which the connections and combinations between modes can unsettle existing practices, forge new connections and animate new meanings. In this sense, multimodality is closely tied to semiotics, in particular, to social semiotics (Hodge & Kress, 1988). This approach draws on Halliday's work on systemic functional linguistics (SFL), which sees language as a social action constituting culture (Halliday, 1994). Multimodal artefacts are those which communicate through a variety of modes simultaneously (Jewitt, 2005), and the lens of multimodality can help us to understand the complexities at play within them. Examples of modes include, but are not limited to, writing, images, music and architecture. At their essence, modes are shaped by culture to produce different kinds of knowledges, enact different social relationships and perform identities. However, the boundaries of what constitutes a mode can be somewhat diffuse; for example, the mode of "image" can usefully be broken down into image types, such as a photograph or a charcoal sketch. Indeed, even within a photograph there are different communicative elements, such as the choice of black & white or colour, and further still, within the realm of colour, communicative choices such as brightness, saturation and so on can be made.
For clarity, and in accordance with the work of Bateman et al. (2017), this paper conceptualises a mode as limited by the affordances of its materiality. They write, "The reach of a semiotic mode will usually be a refinement and extension of what its material carrier affords" (Bateman et al., 2017, p. 119); that is, the definitional boundaries of a mode are determined by the opportunities and constraints of its material context (such as which senses are engaged, whether and how time is involved, and so on) and also in terms of the discourse community within which it functions. It can be reasonably assumed that the student-creators of the video compositions (VCs) reported on here belong to one shared discourse community, narrowing the scope of available modes. Once the mode has been identified, it is then useful to think of modes as having different 'modal resources' (Bezemer & Kress, 2008); that is, within a mode like 'image', modal resources such as framing, composition and colour affect the meanings that are created and connoted (Kress & van Leeuwen, 1996). Jewitt et al. (2001, p. 27, emphasis original) argue that social semiotics allows us to see "…the process of learning as a dynamic process of sign-making". Small wonder, then, that in the past twenty years in particular, multimodal assessment tasks have flourished in classrooms around the world. Scholars have found that multimodal assessment tasks can enhance student creativity and agency (McGinnis, 2007), provide opportunities for increased levels of student engagement (Pandya et al., 2018) and better prepare learners for the future (Hafner, 2015). Although the two concepts are distinct (Alvermann, 2017), multimodal literacy has some overlap with the theory of "multiliteracies", first proposed by the New London Group (1996) and further developed by Cope and Kalantzis (2009), who emphasised the sociohistorical context of literacy.
While the creation of multimodal texts may not be a new phenomenon, the increasing ubiquity of digital technologies has changed social practices (Lotherington & Jenson, 2011) and further democratised knowledge, with an increasing emphasis on collaboration and participation (Knobel & Lankshear, 2014).

Literature review
Digital, multimodal projects have been popular at all levels of education for at least the past twenty years, whether at primary (e.g. Burn & Parker, 2001), secondary (e.g. Nash, 2018) or tertiary levels (e.g. Nielsen et al., 2016). They have also been explicitly included in curricula all around the world, such as in Australia (ACARA, 2015), the United States (Lapp & Fisher, 2011) and Europe (EUMade4All, 2019). As such, it is no surprise that TESOL educators have also embraced multimodal literacy at all levels, including in primary (e.g. Grapin, 2019) and secondary education (e.g. Huang, 2019). From a tertiary TESOL perspective, Oldakowski (2014) argues that multimodal assessments deepen comprehension and promote engagement, while Zacchi (2016) asserts that multimodal meaning making is an effective way of traversing cultural differences in an increasingly globalised world. Jiang and Luk (2016), writing from a Chinese context, found that such multimodal assessment tasks increase students' "motivational capacity", with interviews indicating an increased sense of curiosity and cooperation, among other qualities. In Taiwan, Lee (2014) claims that multimodal learning practices enhance students' motivation and self-confidence.
Nevertheless, digital, multimodal assessment tasks remain the exception rather than the rule within most TESOL curricula. Writing not only from the tertiary perspective but across their work with young children, adolescents, and adults, Early et al. (2015) suggest that multimodality is "on the margins… in the TESOL community" (pp. 450-1) and that we may need to rethink our course design and rewrite our textbooks if we are to wholly incorporate multimodality into language education. Lotherington & Jenson (2011) further this claim, arguing that in all L2 teaching contexts, teachers have been reluctant to embrace or even acknowledge multimodal literacy, favouring instead the 'flat literacies' of paper-based assessment.

Case context
The cases within the study were drawn from two courses of "English for Architects and Civil Engineers A". This was a 4-credit-point, 14-week course taught in the winter semester 2019-2020 at a university in Germany. It was designed for students with an English language level of intermediate to advanced (B2-C1 according to the CEFR scale), and the courses attracted both undergraduate and postgraduate students (n = 38), 32 of whom were majoring in civil engineering, with only six students majoring in architecture. This bias was likely a consequence of the students needing B2-level English to graduate from Civil Engineering, while the Architecture students had no such requirement. For 20% of their grade, students were tasked with producing a 3-5 minute video composition (VC) explaining a concept from architecture or civil engineering to a non-specialised audience using a variety of modes. The number of modes could vary, resulting in dynamic, standalone, two-dimensional artefacts (the material parameters of these artefacts will be explored in more detail in 4.1 of the Results section). Students were asked to upload the artefacts to the sharing platform Moodle before they were screened in class, where they were encouraged to lead discussions around the VCs, offering feedback and asking questions of their fellow student creators.
Students in both courses were also invited to participate in this research project, which had two phases of data collection: the artefacts themselves were collected, and after semester's end participants were interviewed about the processes of creating the artefacts and their perspectives as audience members. Of the 38 students in the two classes, 17 agreed to participate in the first phase of the study (artefact collection), with 7 of those also agreeing to the second phase (the interview). The semistructured interviews lasted between 19:56 and 46:11 minutes and involved a pre-prepared interview protocol. There were seven general questions relating to the processes behind designing the artefacts as well as a number of questions developed in response to specific elements of the VCs. There were also four additional general questions relating to their experience as audience members. The data was then investigated inductively, using NVivo to develop themes and code the responses.

Case study: "Human comfort in relation to architectural spaces"
The case reported on in this paper comprises a video composition and the information provided in two post-task interviews, one with the student composer (Student C) and one with a fellow-student audience member (Student A). This artefact was selected as it was considered to exemplify the task, as well as including six modes with varying affordances: text (both written and spoken), image (hand-drawn sketches, cartoons and photographs), music, various typographical and layout elements, film and gesture. Student C is a C1 level (advanced) English language learner in an Architecture track program. Her artefact is 5:15 minutes long and her interview lasted 36:02 minutes. Student A is a B2 level (intermediate) language learner in a Civil Engineering program, and his interview lasted 46:11 minutes, of which 9:17 minutes were devoted to responses to Student C's artefact. The remainder of Student A's interview is not relevant to this paper and is not included in the data drawn upon here.

RQ1: To what extent can digital, multimodal assessment tasks promote the communication of technical concepts?
Measuring the impact of these video compositions on students' understanding of technical concepts is far beyond the scope of this qualitative study. However, one way of approaching this question is by examining the artefact through a lens developed by Bateman and Schmidt-Borcherding (2018) in their quantitative study of the effectiveness of educational videos in terms of learner uptake and engagement. After analysing the results of a knowledge test and an engagement survey, they suggest that a successful educational video should establish clear expectations and avoid sensory overload. The more successful videos "…prepare their audiences for their messages audiovisually and then use this preparation for presenting new information" (Bateman & Schmidt-Borcherding, 2018, p. 4), a process they divide into the two 'discourse units' of 'scaffolding' and 'development'. Units that scaffold information prepare audiences for what to expect later in the video, and units that develop information "elaborate or extend what has been introduced previously" (Bateman & Schmidt-Borcherding, 2018, p. 11).
In order to perform a fine-grained analysis of these very complex artefacts, they argue that the constraints and affordances of the media used must first be recognised in order to identify what is intentional and what is a result of a limitation of the medium (as this in turn restricts which semiotic modes can be employed). It is therefore important to identify the parameters of the physical situation in which VCs take place, what Bateman et al. (2017, p. 96) term "…the 'canvas' that meaning is inscribed on…". According to their definitions, a video composition changes over time and is therefore "dynamic"; it is also viewed rather than participated in, making it "observational", and this is done through a computer screen, rendering it "two-dimensional". Further, the artefact itself cannot change, making it "immutable", and it can be rewatched for a limited time (until the course content is removed from Moodle), making it "partially transient". To analyse such a dynamic, 2D, immutable, observational, partially transient artefact, the artefact must be broken down into smaller, trackable units of meaning. The term "presentational micro-event" (PME) is useful here, as it facilitates looking at the artefacts as a collection of "…unit(s) of meaningful behaviour that may be distributed across several coordinated sensory channels…" (Bateman & Schmidt-Borcherding, 2018, p. 6).

Figure 1 A Visual Representation of Some Presentational Micro-events (PMEs) Sharing Meaning Intersemiotically
This is perhaps best illustrated with an example. In Figure 1, a section of the artefact is presented. In this image, five still frames of the video are presented above a text of the accompanying narration. This section of the VC has been broken down into five PMEs, and it is possible to see here how meaning is shared across modes, or sensory channels. Three noteworthy moments occur: at 2:02 minutes, when a photograph of a highway appears on screen with the narration of the term "street space"; at 2:28 minutes, when the architectural term "dominance" is accompanied by a sketch of Zaha Hadid's Heydar Aliyev Centre in Azerbaijan; and at 2:30 minutes, when a sketch of a terrace house co-occurs with the narrated term "adaption". The example at 2:28 minutes should perhaps be unpacked: this is a good example of how an image can co-instantiate the technical field of discourse as it draws upon a shared repertoire of architectural knowledge. For the initiated, the Heydar Aliyev Centre is one of the most recognisable of Hadid's buildings. It carries with it some of the meaning of the term "dominance", because Hadid's designs are synonymous with the kind of architecture that ignores the context of its surroundings and has "…neither respect nor reference to its locality" (Bayley cited in Fairs, 2015). Student C follows this PME with a sketch of a terrace house at 2:30 minutes to co-instantiate the meaning of "adapting to the cityscape", shown in synchrony with the narrated term, "adaption". In this way she invites her audience to unpack the technical nominalisations of "dominance" and "adaption" by showing us a visual example of each.
These can also be considered "grammatical metaphors", as they package complex processes as single elements within the clause (Macnaught et al., 2013); that is, the activities of "dominating a cityscape" or "adapting to a cityscape" are bundled into technical terms of considerable complexity ('dominance', 'adaption'), especially for English language learners. As such, Student C has chosen to depict buildings that exemplify the processes of dominating or adapting to the cityscape in order to help the audience "make sense" of these technical terms by sharing the "work" of making meaning intersemiotically.
Student C helps her audience understand her technical concepts in other ways as well. She frequently uses parallelisms and redundancies across the narration and the images, or the "audio and visual sensory channels", to support her audience's understanding. As can be seen in Figure 1 at 2:06 minutes, the terms shown in colour in the text of the narration coincide with the moments at which the corresponding bullet points of written text appear on screen. Student C has taken the time to temporally coordinate her visual information in order to support her audience's conceptual understanding: when she says "balanced" in the narration, the bullet point "balanced relationship" appears; when she says "connection", the bullet point "connecting with the surrounding" appears; and when she says "to enter into", the bullet point "dialogue with other buildings" appears. This attention to detail shows not only that she has harnessed the affordances of the medium effectively and appropriately, but that she has given considerable thought to its pedagogic potential and her relationship to the audience.
Further, returning to Bateman and Schmidt-Borcherding's discourse functions of "scaffolding" and "development", Student C has placed scaffolding segments at regular and appropriate intervals throughout the artefact. They can be found at four intervals: at 0:35 minutes, when she introduces a taxonomy of the elements to consider when designing interior space; again at 1:34 when she taxonomises exterior space in the same manner; at 2:06 (see Figure 1), when she relates the building to its environment; and again at 2:39 minutes, when she categorises the ways in which individuals experience the built environment. This strengthens her participation in the genre of a 'system explanation', which will be elaborated upon below. It also helps the audience anticipate the meanings to come and focus their attention upon the most salient concepts.
She also makes meaning intersemiotically by repeating a sketch of a face throughout the VC. This sketch occurs at the beginning and end of the artefact, and, crucially, re-occurs when new information is being scaffolded as a "visual reminder" for the audience to focus their attention. The image also remains in the background of the artefact as a sort of visual "anchor", albeit at varying degrees of magnification. It can be seen clearly at 2:04 minutes, for example (Figure 1), but it is also present in the background at 4:27 minutes (Figure 2), although the magnification transforms it into an indistinguishable blur of pixels. It works as an almost subliminal cohesive device, guiding the audience into unfamiliar conceptual territory while retaining a "familiar face". The fact that this is a hand-drawn sketch by the participant herself of a singer she enjoys speaks to the rich vein of interpersonal meaning that is present in the artefact but beyond the scope of this paper. It is worth noting, however, that eye-tracking research suggests that faces on screen, however small, attract and fixate the gaze (Wang & Antonenko, 2017, cited in Bateman & Schmidt-Borcherding, 2018), and it could be said that the sketch was included in an attempt to hold the audience's attention.

Summary of RQ1:
Bateman and Schmidt-Borcherding (2018) suggest the constraints and affordances of the media employed (the "canvas") should be identified in order to separate intentional from inadvertent meaning making. After this, the artefact can be broken down into smaller units of meaning, termed "presentational micro-events" (PMEs), to track how meaning is made. Such an approach illuminates how Student C consistently shares the semiotic labour across modes to help her audience understand the technical concepts she communicates. For example, certain technical terms employed in the narration are elaborated visually in the form of sketches, and she frequently uses parallelisms and redundancies to support her audience's understanding. A successful educational video should also establish clear expectations and avoid sensory overload. According to Bateman and Schmidt-Borcherding (2018), this can be achieved through two processes they term "scaffolding" and "development". Scaffolding, that is, preparing the audience for what to expect, is observable in Student C's VC when she categorises elements of space and the built environment. She then "develops" these taxonomies by elaborating upon what has already been introduced. All of these elements suggest that digital, multimodal assessment tasks such as this one can very effectively promote the communication of technical concepts.

RQ2: To what extent can digital, multimodal assessment tasks facilitate more nuanced opportunities for meaning making?
In order to understand meaning making, we need to situate the artefact in the context of the culture, and one of the best ways to do this is to identify which genre it is participating in. Genres can be seen as 'recurrent configurations of meaning' (Rose & Martin, 2012, p. 53) which make sense to the discourse communities that comprise the culture. As such, it is important that the students correctly identify and reproduce the genre required by the task. In order to better determine multimodal task fulfilment, Turney & Jones (2021) have developed a method of analysing tertiary-level, student-generated, educational videos. Drawing upon the Genre and Multimodality model (Bateman et al., 2017), they suggest first identifying the genre of the artefact before examining the media used and then exploring how the artefact unfolds intersemiotically, realising configurations of register variables (patternings of field, tenor and mode). The medium and some of the intersemiotic meanings realised in the artefact have been briefly touched upon in the previous section, but it is worth unpacking the VC in terms of its genre. In the Martinian systemic functional perspective, genre is situated in the stratified context plane, departing from a strictly Hallidayan perspective which associates genre with mode (Martin, 2009). For an artefact to participate in a genre, it must have a characteristic structure and observable stages and phases contributing to the achievement of its social purpose (Derewianka & Jones, 2016). It is also realised in terms of three simultaneously occurring parameters: the field, tenor and mode of discourse. While field is concerned with the topic of the text and its representation of the world, tenor focuses on the relationships between the participants, and mode is concerned with the text type and its organisation (Martin, 2009).
Identifying and understanding the boundaries of genre is essential to teaching and assessment, especially in a TESOL context, where the differences between genres may be more difficult for second language learners to identify and reproduce.
The task question explicitly asked students to "explain a concept from the fields of architecture or civil engineering…", and unsurprisingly, a great number of language features present in the narration of Student C's VC were typical of the "explanation" genre. Looking firstly at how she constructs her field of discourse, there are a marked number of generalised participants (e.g., "interior and exterior spaces") and nominalised abstract concepts (e.g., "dominance", "adaption") as well as causal relationships (e.g., "These experiences are triggered by the senses…") and a considerable amount of technical and specialised vocabulary (e.g., "form dimensioning"). It is harder to ascertain which of the explanation genres the artefact is participating in, as it does not unproblematically conform to any one. It can, however, be read as a system explanation, the explanation genre concerned with the relationships and interactions between different parts of a system. This genre typically begins by identifying the Phenomenon, describing the System, explaining the interaction between the Components of the System and concluding with a Generalisation (Derewianka & Jones, 2016, pp. 205-6). These elements are broadly observable in Student C's VC, with "the interior", "the public space", "the architect" and "the human comfort" functioning as the Components. In this way, the interaction between the Components reflects the student's own sequential process through the narration: Interaction One is between the interior/exterior and human senses, or as she terms it, the "first step" of cognitive appraisal; Interaction Two occurs between the interior/exterior and human emotions ("the affective reaction"); and Interaction Three is between the interior/exterior and human aesthetics, or what she calls the third process of "aesthetic reaction".
The interaction between the levels is less explicitly realised, but can be seen in her conclusion (4:38-5:13 minutes), where she narrates, "the relationship between space, human and content is what gives the place its expression".

Figure 2 Intersemiotic Meaning Making with Film, Image and Text
Having identified the genre, the question remains as to what extent multimodal assessment tasks such as this one can facilitate more nuanced opportunities for meaning making. Perhaps the most effective method of exploring this is to identify where meaning is made with notable depth or subtlety. Figure 2 depicts one such section of the artefact. Much like in Figure 1, the still frames in Figure 2 were taken from the VC between 4:11 and 4:37 minutes, the text of the narration is printed underneath, and the section is divided into four PMEs. What makes this section different to the one shown in Figure 1 is the use of video, shot by a friend of the participant at her request. Two separate videos were recorded, one at 4:11-4:18 minutes and another at 4:21-4:26 minutes, both of which share the meaning of the narration in subtle and interesting ways. In the first video, a hand-held camera pans upwards, simulating the craning of a neck as Student C narrates "… a church or mosque where you feel delightful, peaceful". Neck craning both connotes and is a physical response to awe, and this gesture corroborates the meaning made in her narration. Similarly, a second film, also suggestive of a first-person experience with the use of a hand-held camera at head height, carries some of the meaning of the narration at 4:21 minutes ("… radiates a feeling of coldness, such as a basement"), with the jolting movement of someone walking into a dark, narrow space. Student C is sharing the work of making meaning across modes and is also elaborating upon the experiences she describes, inviting her audience to share in her experiences and affiliate around her values (Knight, 2010): feeling "peaceful" and "delightful" as they "gaze" upwards at a light-filled dome in a mosque, and feeling "cold" in the dark basement.
Exploring the interpersonal patternings instantiated in the tenor of this artefact is beyond the scope of this paper, but such fine-grained attention to detail also guides the audience to engage with and comprehend the key concepts explained in her artefact and emphasised in her title "Human comfort in architectural spaces". Although some additional information is provided in the narration that is not manifested visually, her most salient ideational meanings are almost unfailingly supported across more than one mode simultaneously. Whenever she presents key ideas, for example, she both narrates and visually displays the key points on screen (see Figure 2 at 4:27 minutes).
Similarly, the information presented visually supports the meanings being made in the audio text, even if this is not always performed flawlessly. When asked about her inclusion of what appears to be a cartoonish drawing of David Bowie at 4:27 minutes (see Figure 2), alongside drawings of a man in a suit and a woman with a flower crown, Student C commented in her interview that "…they're all different, so I just wanted to include that or stress that". For the student creator, these animations expand upon the meanings made in her narration ("…the aesthetic reaction, which is taste dependent and different from each person. Does the room correspond to my style?"), and so what at first appears unnecessary or distracting is, upon closer examination, yet another example of her semiotic decision making. Such "representation" produces a sign that is focused on Student C's interest, rather than "...the assumed interest of the recipient of the sign" (Kress, 2010, p. 71). Although the cartoons she chose may not be as transparent in their meanings as other elements in the artefact, they are nevertheless examples of motivated representation rather than meanings made inadvertently.

Summary of RQ2:
(To what extent can digital, multimodal assessment tasks facilitate more nuanced opportunities for meaning making?) For meaning to be made effectively, the artefact should be appropriately situated within the context of the culture. TESOL students in particular often struggle with identifying and participating in genres and reproducing the stages and phases that contribute to their structures. This multimodal assessment task facilitates opportunities for effective meaning making in a broad sense by necessitating that the artefact participates in one of the explanation genres, as Student C's does. The task also facilitates more nuanced opportunities for meaning making, as evidenced by the rich and subtle meanings made by, for example, certain camera movements in her filmed segments to variously communicate awe and claustrophobia.

RQ3: To what extent do digital, multimodal assessment tasks develop the students as social agents?
As mentioned in section 2.1 of the Background, the Council of Europe has redesigned their CEFR assessment criteria in "...a move away from the matrix of four skills…" and towards "… real-life language use" (Council of Europe, 2020, p. 33). They have not only added a new competence, plurilingualism, but have also expanded on the four skills to add two new "communicative modes" (see Figure 3), of which one, mediation, is of particular relevance here. Mediation emphasises "…the constant movement between the individual and social level in language learning, mainly through its vision of the user/learner as a social agent" (Council of Europe, 2020, p. 36). It also bears a striking resemblance to Iedema's "resemiotization", the process of transforming meaning across contexts and practices (Iedema, 2003). The theoretical background for this conceptual shift is based, however, on the work of Vygotsky (1978) and sociocultural theory, as well as the ecological model (van Lier, 2000) and complexity theories (Piccardo, 2015; all cited in Council of Europe, 2016).

Figure 3
The Relationship Between Reception, Production, Interaction and Mediation (Council of Europe, 2020, p. 34)
Multimodal artefacts such as the one explored here position the learner as a social agent in screenings and uploadings of their work. The skills demonstrated throughout this task also bear a striking resemblance to some of the updated descriptors included in the appendix of the latest companion volume of the CEFR. This could be of benefit to TESOL educators wishing to accredit their students with CEFR certification. Examples of how multimodal assessment tasks could meet CEFR descriptors are provided in Figure 4. Multimodal assessment tasks such as this one could also be adapted to groupwork, which would incorporate some of the other new mediation competences, such as "Managing interaction" and "Collaborating to construct meaning".
The Council of Europe has added eighteen new competences, with detailed descriptors provided for learners from beginner (A1) to advanced (C2) levels (Council of Europe, 2020). Of these eighteen, only the eight competences most relevant to multimodal literacy are shown here (Figure 4), each with one corresponding descriptor at B2 level and one at C1 level. To demonstrate how closely some of the new descriptors align with the learning outcomes of this assessment task, it is worth looking at one competence in closer detail, along with two of its related descriptors, one at C1 and one at B2 level. At C1 level, students demonstrating mediation competence should be able to "…explain (in Language B) the relevance of specific information found in a particular section of a long, complex text (in Language A)", while at B2 level, they should be able to "…interpret and describe reliably (in Language B) detailed information contained in complex diagrams, charts and other visually organised information (with text in Language A)" (see Figure 4). These descriptors correspond closely to the task requirements of the assessment explored here and are richly demonstrated in the artefacts. When students rephrase, or "mediate", the academic German of their lectures and textbooks into "everyday" English, they also fulfil the descriptors for both spoken and written mediation competence. This is visible not only in the labels and text boxes of their VCs, but also in the scripts they compose to prepare for their narration: six of the seven students interviewed reported that they wrote a text to read aloud in advance of their audio narration.
Within the category "mediation strategies", two sub-categories are also of particular relevance here: "strategies to explain a new concept and strategies to simplify a text" (see Figure 5). The descriptors here are remarkably relevant to this project: for example, students at C1 level are expected to be able to "...explain technical terminology and difficult concepts when communicating with non-experts about matters within their own field of specialisation". Similarly, at B2 level, learners should be able to "...explain technical topics within their field, using suitably non-technical language for a recipient who does not have specialist knowledge" (see Figure 5). The simplifying strategies are also highly pertinent: at C1 level, students are expected to "...make complex, challenging content more accessible by explaining difficult aspects more explicitly…", while at B2 level, students should "...make concepts on subjects in their fields of interest more accessible by giving concrete examples…" (see Figure 5).

Figure 4
Some of the Mediation Descriptors Satisfied by this Assessment Task (Council of Europe, 2020, pp. 198-241).
(Council of Europe, 2020, pp. 119-122)
Summary of RQ3: (To what extent do digital, multimodal assessment tasks develop the students as social agents?) The Council of Europe has redesigned their CEFR assessment criteria to include plurilingualism and mediation, and the skills demonstrated throughout this multimodal assessment task are very closely aligned with a great number of the related descriptors. For example, students should be able to "… interpret and describe reliably (in Language B) detailed information contained in complex diagrams, charts and other visually organised information (with text in Language A)" as well as "...explain technical terminology and difficult concepts when communicating with non-experts about matters within their own field of specialisation" (Council of Europe, 2020, pp. 119-122). In this sense, digital, multimodal assessment tasks such as this one contribute considerably to developing the students as social agents through "...the constant movement between the individual and social level in language learning" (Council of Europe, 2020, p. 36), as well as in the screenings and uploadings of their work.

Conclusion
Despite the ubiquity of multimodal communication, the skills involved are largely neglected in tertiary TESOL classrooms in Germany. The Council of Europe has attempted to address this with new competences, claiming that "…tasks in the language classroom should involve communicative language activities… that also occur in the real world" (Council of Europe, 2020, p. 32). However, a reluctance to move away from the 'four skills' persists. This case study is an attempt to demonstrate the usefulness of multimodal assessment tasks by examining the results of one in close detail. Returning to the research questions posed by this paper, the analysis indicates that this task can effectively promote the communication of technical concepts (RQ1) and provide opportunities for nuanced meaning making in English (RQ2) while simultaneously developing the students as social agents (RQ3). More broadly, tasks such as these can prepare students for their disciplinary studies and the job markets of the future, while also helping them gain CEFR accreditation, especially in terms of their mediation skills.