A systematic review of foreign language learning with immersive technologies (2001-2020)

This study provides a systematic literature review of research (2001–2020) on teaching and learning a foreign language and on intercultural learning using immersive technologies. From 2507 sources, 54 articles were selected according to predefined selection criteria. The review aims to show which immersive interventions are being used for foreign language learning and teaching and where potential research gaps exist. The papers were analyzed and coded according to the following categories: (1) investigation form and education level, (2) degree of immersion and technology used, (3) predictors, and (4) criteria. The review identified key research findings relating to the use of immersive technologies for learning and teaching a foreign language and for intercultural learning at cognitive, affective, and conative levels. The findings revealed research gaps concerning teachers as a target group and virtual reality (VR) as a fully immersive intervention form. Furthermore, the studies reviewed rarely examined behavior or implicit measurements related to inter- and transcultural learning and teaching. Inter- and transcultural learning and teaching in particular is an underrepresented subject of investigation. Finally, concrete suggestions for future research are given. The systematic review contributes to the challenge of interdisciplinary cooperation between pedagogy, foreign language didactics, and Human-Computer Interaction to achieve innovative teaching-learning formats and a successful digital transformation.


Introduction
"Digitalization [...] is an issue that concerns many educational stakeholders" [8, p. 1]. Digital devices should be commonplace in all schools. However, many scientists, as well as the pandemic, have revealed many challenges to appropriate digitalization in educational systems [89,90]. First, the digital equipment of schools is one success factor. Second, acceptance among teachers and the digital competences needed to handle the devices are crucial success factors. Third, the interdisciplinary integration of digital and pedagogical knowledge, resulting in innovative teaching-learning formats, is essential for a successful digital transformation and for effective teaching and learning [97]. Immersive technologies in particular offer great potential to train competencies and increase acceptance [72,73]. At the same time, they lend themselves to the development of innovative teaching-learning formats and to effective teaching and learning [96]. Garzón and Acevedo analyzed 64 quantitative research studies from 2010 to 2018 to investigate the impact of AR on students' learning outcomes and showed that AR had a meaningful effect on student learning gains; however, less research has systematized learning with VR than with AR [34]. Hamilton and colleagues noted in their literature review that affective behavioral changes in non-pedagogical VR applications have been studied extensively but are underrepresented in education, which makes them an important area for future research [39]. VR can make an essential contribution to education by allowing students to directly experience environments or situations that are difficult to recreate with traditional teaching methods [79,97]. This potential might be highly relevant for foreign languages, especially during the current Covid-19 pandemic and the climate crisis, when immersion programs are less feasible.
In sum, the field lacks systematic reviews revealing successful gestalt features of the whole immersive technology spectrum in foreign language learning and research gaps that should be addressed by empirical studies. The present systematic review of the literature addresses foreign language learning with immersive technology to identify and analyze research gaps and encourage the interdisciplinary cooperation between pedagogy, foreign language didactics, and Human-Computer Interaction. Five research questions are to be answered: (RQ1) How are virtual, fully immersive learning environments used for foreign language learning, and (RQ2) which characteristics of immersive technology support foreign language learning? (RQ3) Can virtual, fully immersive learning environments increase motivation, and success in learning a foreign language? (RQ4) Can they change participants' attitudes through intercultural encounters, and (RQ5) how are they used for teacher training?

A short view on virtual reality
Virtual reality (VR) and augmented reality (AR) are subsumed under the term immersive technologies or extended reality (XR) [99]. AR refers to the real-time combination of digital and physical information through different technological devices, whereas VR creates a complete, artificial virtual environment and thus offers complete virtualization [9,99]. In the context of behavior change, the BehaveFIT framework described four potentials of VR to support behavior change processes: self-representation (e.g., perspective-taking), context-representation (e.g., showing different environments), representation of others (e.g., showing diverse others), and the representation of objects (e.g., including personal everyday objects) [96]. Those potentials can be transferred to a teaching and learning context and have already shown significant influence [1,2,44,74,77].
Experiencing VR is strongly linked with the concepts of immersion and presence (e.g., [83]). Immersion and presence can probably be considered a so-called hygiene factor [95]: to a certain extent, they might be necessary to allow other VR potentials to become effective. However, the term immersion is understood differently in many areas, and also within the field of human-computer interaction [85]. The best known and most widely accepted definitions of immersion in the field of Human-Computer Interaction (HCI) are those of Witmer and Singer [98] and of Slater [84]. We followed the latter, defining immersion as objective system factors that influence the sense of presence. Immersion stands for what the technology delivers in all sensory and tracking modalities, and it can be objectively assessed (for more discussion of the different uses of the term immersion, see, e.g., [83,95]). In contrast, presence can be defined as a human reaction to a system with a certain level of immersion and thus describes a subjective state [85,86]. When immersion is defined by technological factors, immersive applications can be categorized by their degree of immersion. One possibility is a categorization along the virtual-reality-continuum [69]. VR applications create an entirely artificial virtual environment and are regarded as the most immersive [9], followed by mixed reality (MR) applications. MR refers to a specific subset of immersive technologies that involve merging real and virtual worlds [69]. This includes, for example, augmented reality (AR) applications and applications in which 360° content is shown through a head-mounted display (HMD). Another, wider categorization refers to extended reality (XR), which combines all real and virtual environments and refers to human-machine interactions generated by computer technology and wearables [32]. It is used to cover all these approaches or to indicate their combined use.
Therefore, multi-user virtual environments (MUVEs), such as Second Life, can be included in such a categorization. To enable a comprehensive systematization, the present review included all studies designated as immersive and follows the wider categorization.

A short view on language learning and beyond
Language learning is initially shaped by learning vocabulary, grammar, and text analysis. However, in Europe, where constructivist views on language learning and education dominate the academic discourse, language learning is integrated into communication-based learning. Language acquisition takes place during a communicative exchange [37,57]. Topics are mostly related to the foreign country and convey knowledge about, for example, culture, traditions, and everyday life. Further, due to advancing globalization, not only language competencies but also intercultural competencies have become an important part of modern foreign language teaching [33]. While intercultural competence is often seen as a linear model [40], Chen and Starosta assume a multidimensional construct, "consisting of three interrelated aspects: intercultural sensitivity (affective aspect), intercultural awareness (cognitive aspect) and intercultural adroitness (behavioral aspect)" [13,103]. However, research shows that many adult students do not possess the necessary intercultural skills [103]. A lack of intercultural competencies leads to prejudice, discrimination, and unfriendly expressions directly related to misunderstandings between people from different cultural backgrounds and affiliations [4], consequences that might be particularly severe in school. A meta-analysis that examined different interventions for intercultural competencies found that immersion programs moderated intercultural competence more strongly than educational interventions with students. The study revealed that immersion programs primarily address the cognitive and affective levels. The authors therefore recommend that future studies also include formats addressing the conative level [103]. This recommendation is in line with modern pedagogic principles focusing on competence- and action-oriented as well as situated learning, connectivity, and co-construction [55].
According to the potentials described above, VR enables an active exploration of distant and diverse learning contexts, the highlighting of achieved milestones, the possibility to change roles, and to encounter the unknown in a tangible way [97]. Thus, the potentials of immersive technologies draw on these pedagogic principles.
Especially in times of Covid-19, when immersion programs are much less feasible, and in times of climate crisis, social VR applications may offer enormous potential. Exchange programs and the experience they provide learners with are probably irreplaceable. However, for larger school classes or to create social fairness, a virtual exchange via a social VR application may be the ideal solution or supplement.

Research questions
VR has much potential for language and intercultural learning. However, the field lacks systematic reviews describing VR interventions, manipulations, distinctive immersion factors, and corresponding outcomes. This gap leads to the five research questions (RQ1 to RQ5) stated in the introduction.

Method
For this literature search and review, we followed the PRISMA statement [70] and the search strategy of Cooper et al. [24]. In addition, we followed the Population Intervention Comparison Outcome (PICO) scheme [61]. The selected populations were schools, universities, or other educational institutes. All interventions analyzed included an immersive setting, i.e., AR, MR, VR, or applications described as immersive by the authors. These immersive interventions were to be compared with conventional learning and teaching or with one another. The desired outcome is an improvement of language skills and/or an increase in intercultural competence, that is, a change in attitude toward the foreign culture, its people, or its language [10].
In the following, we describe all sources of information used during the search process. Furthermore, we describe the selection process of studies (identification, screening, eligibility, inclusion in the systematic review). The numbers of studies pre-selected, screened for eligibility, and included in the review, with reasons for exclusion at each stage, are presented in a flowchart (Figure 1). Next, the categories according to which the studies are examined and classified are presented in order to identify successful gestalt features, answers to the research questions, and potential research gaps.

Data collection
The selection process of studies followed four steps: (1) identification, (2) screening, (3) eligibility, and (4) final inclusion (Figure 1).

Figure 1. The PRISMA flow diagram for the systematic review detailing the database searches, the number of abstracts screened, the full texts retrieved, and the number of papers included.

STEP 1: identification
For the literature search, the databases IEEE Xplore, EBSCOhost (Academic Search Ultimate), Web of Science, and ACM Library were searched. The search phrases used resulted in 32 search queries (Table 1). Only peer-reviewed academic journals, conference papers, reports, and reviews that were published since April 2001, are written in English or German, and contain qualitative data acquisition, quantitative data acquisition, or both were included. The article had to be an empirical study and explicitly address the learning of a second language. We did not explicitly search for intercultural learning since our focus was on intercultural aspects during language learning. The last search request was submitted on May 31, 2020. The studies' intervention had to include an immersive setting, e.g., AR, MR, VR, or a system described as immersive by the authors. All papers that had unfitting terms or did not fit the journal topic or population were excluded. The search yielded 2507 results after removing duplicates.

STEP 2: screening
During the Level 1 screening, the titles and abstracts of the papers were scanned. Studies without qualitative, quantitative, or mixed-methods data collection were excluded. In addition, papers whose target group was not appropriate were excluded (n=18). These included, for instance, studies with the following terms: blind, autism, deaf, disabilit*, down syndrome, etc. People with disabilities have special needs for learning a language, which we do not want to investigate in this context. Because these excluded target groups frequently overlapped with unrelated "medical" terms, these studies were assigned to that category. Further unrelated topics included papers with medical terms like "rehabilitation", "medical", or "physio*", as well as general medical journals (n=97), papers with industrial terms like "Internet of Things" or "mechanical" (n=61), and papers with terms like "climate change".
Moreover, we noticed that immersion was often used in connection with an immersion program or immersion class. This implied that the paper was about an exchange program and not about immersion through a technical medium. These papers were excluded as well (n=481). Many papers retrieved in connection with the terms "behavior, attitude, or change" and the second keywords of the search query were excluded because no educational topic was apparent (n=69). In most cases, the papers were not related to foreign language (FL) learning or teaching (n=1158). There were also papers in which neither an immersive setting nor FL was the subject of investigation (n=380). After the first-level screening, 174 papers remained for further analysis.

STEP 3: eligibility
The exclusion criteria under Level 2 applied to the entire text. We excluded papers in which the study design did not include a qualitative, quantitative, or mixed-method measure. Additionally, studies were excluded if they had no intervention described as immersive or no focus on second/foreign language learning or acquisition. Of the identified papers, 35 did not present a study or study results, and 36 did not examine an immersive intervention or did not describe it as such. Another 34 papers did not conduct research in FL learning and teaching, and we did not have access to eight papers. After the second-level screening, 61 papers remained for further analysis.

STEP 4: final inclusion
Among the 61 identified papers, another seven papers were excluded due to lack of soundness of the findings and inconsistencies in the results. Finally, 54 papers were included for the systematic review (Figure 1).
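
The counts reported for the eligibility and final-inclusion steps can be checked with a few lines of arithmetic. The following is a minimal sketch for illustration only and is not part of the original review: the dictionary labels are paraphrased, and the helper name `remaining` is hypothetical. The Level 1 exclusions are left out because the text does not report a count for every exclusion reason.

```python
# Counts as reported in the text for the Level 2 (full-text) and final stages.
after_level1 = 174

level2_exclusions = {
    "no study or study results presented": 35,
    "no immersive intervention (or not described as such)": 36,
    "no research on FL learning and teaching": 34,
    "full text not accessible": 8,
}

final_exclusions = {
    "lack of soundness / inconsistent results": 7,
}


def remaining(start: int, exclusions: dict[str, int]) -> int:
    """Return how many papers remain after subtracting all exclusions."""
    return start - sum(exclusions.values())


after_level2 = remaining(after_level1, level2_exclusions)
included = remaining(after_level2, final_exclusions)

print(after_level2)  # 61
print(included)      # 54
```

Both results agree with the numbers stated in the text (61 papers after the second-level screening, 54 papers in the final sample).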

Data Analysis
The 54 papers that fulfill the inclusion criteria were further classified according to the following categories.

Category 1: Investigation form and education level
To identify research gaps and areas regarding target groups, the population participating in the empirical study was recorded. First, a distinction was made between teachers and learners and, among the learners, regarding their education level (type of school, (college) student, or pupil), age, or proficiency level. Furthermore, a distinction was made between blended learning (BL) formats, experiments, and qualitative forms of inquiry. The main goal of BL design is to find the most effective and efficient combination of two learning modalities [71]. In this case, we distinguished whether the immersive technology and the corresponding intervention were integrated into an experiment or into a teaching concept lasting several weeks.

Category 2: Degree of immersion and technology used
As mentioned earlier, there are discussions in HCI around the definition of presence and immersion. Because authors used different definitions, we sorted the included articles by the definition used. As described above, when immersion is defined by technological factors, the technologies used can be sorted by their degree of immersion. VR is the most immersive, followed by 360° formats shown via an HMD, and AR. The 360° formats also include so-called CAVE systems or SmartSpaces. MUVE is the umbrella term for all interventions that do not use any technical means other than a mobile-, tablet-, or desktop-based application, such as Second Life. In sum, this category reports which technological medium was used in the studies.

Category 3: Predictor
In any empirical study, there is at least one independent variable that is manipulated. We list all such variables used in the identified studies in order to identify successful gestalt principles as well as possible research gaps and areas.

Category 4: Criterion
Similarly, we categorized the dependent or outcome variables to determine possible research areas and pinpoint outcomes with promising results. Additionally, this category provides an overview of the effects of using immersive interventions in second language learning and the closely linked intercultural learning.

Results
We examined the studies according to the four categories defined above.

Category 1: Investigation form and education level
A clear trend can be seen in the studies selected for the literature analysis. The number of studies with pupils (n = 25) and with students (n = 29) is almost equal. Teachers (n = 8) and others (n = 2) have been addressed less often. Although eight studies interviewed teachers, most of these were observational or qualitative. Only in one study were educators, in this case university professors, the exclusive target group [62]. As can be seen in Figure 2, a distinction was made between blended learning (BL), experiment (between-design), experiment (within-design), and solely qualitative studies, such as semi-structured interviews. Because some studies addressed mixed target groups and some papers used more than one form of investigation, the totals in Figure 2 exceed the number of papers. The BL formats used in conjunction with an immersive intervention ran for up to one year. The results revealed more experiments (n = 37) than BL concepts (n = 16). We assume that this is due to the complexity of creating and integrating a BL design. The high number of studies with students (higher education) could be explained by the fact that students are more easily recruited through credit programs, financial compensation, or similar incentives. A total of four studies surveyed a mixed population [25,45,47,51]. Investigation forms and target groups were analyzed to identify potential research gaps. Overall, blended learning concepts and the target group of teachers have rarely been addressed in the context of language and intercultural learning.

Category 2: Degree of immersion and technology used
Half of the studies used AR as the medium (50%). The next largest share is taken by studies with MUVEs (24%). In 13% of the studies, 360° content was shown via an HMD or in Smart Spaces. Fully immersive VR applications were deployed in only 13% of the studies, and none of these studies defined immersion according to the definition by Slater and Wilbur [84]. In the few studies in which immersion was measured, either Witmer and Singer's definition [98] was used [94], or immersion was part of a larger questionnaire, such as the TAM questionnaire [65] or the game experience questionnaire (GEQ) [28]. Yeh et al. (2020) treated immersion as a quality of the VR learning system and surveyed it qualitatively by asking participants the following question: "Which of the following features (VR panorama, audio, interactive features, and structure) best help immersion in the VR experience?" The open-ended questions were evaluated using context analysis [102]. Nineteen studies made no claims about immersion but used a technology that is immersive by definition, namely AR. In 18 studies, "immersive" was an attribute of the application but no definition was given, and only eleven of these studies used AR, VR, or 360° content. However, the terms "immersive" and "immersion" have also been used to describe diving into the intervention [76,93], in connection with flow immersion [62], and for cultural immersion [19,82]. Quintín et al., for instance, referred to playing Second Life as "Immersion in Second Life" [76].
In the field of HCI, immersion is most often defined in Witmer and Singer's or in Slater and Wilbur's terms [84,98]. Contrary to expectations, these definitions were rarely used in the identified studies. Only Liaw [63] and Wang et al. [94] used Witmer and Singer's definition, and only Qu et al. [75] extended Witmer and Singer's IPQ questionnaire [98] with the one-item presence measure from Slater [88]. Gelsomini et al. [35] cited Johnson-Glenberg et al. [53] as stating that immersion is a design consideration, and Guillen-Nieto et al. [38] cited de Freitas [27] regarding immersion. The definition of Sherman and Craig [80] was used by three studies [15,49,52]. Draxler et al. [29] used Sanchez-Vives and Slater's [87] definition. It should be kept in mind that not all authors have the same basis and the same standards for immersion and presence, which makes the studies more difficult to compare in terms of their degree of immersivity. The results therefore revealed a lack of interdisciplinary cross-discussion: while clear definitions of immersion and presence exist in the field of HCI, including theories, measures, and a large corpus of empirical results, immersive technologies are used in the field of education without such a clear definition and theoretical base.
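
The technology shares reported at the start of this category can be translated back into approximate study counts. The sketch below is an illustration under a stated assumption, namely that each percentage was computed from a whole-number count out of the 54 included papers. The resulting counts are inferred here and are not reported in the review itself.

```python
TOTAL_STUDIES = 54  # papers included in the review

# Shares as reported in the text, in percent.
reported_share = {
    "AR": 50,
    "MUVE": 24,
    "360-degree content (HMD or Smart Spaces)": 13,
    "fully immersive VR": 13,
}

# Assumption: each share was computed from a whole-number count out of 54.
approx_count = {
    tech: round(TOTAL_STUDIES * pct / 100) for tech, pct in reported_share.items()
}

for tech, n in approx_count.items():
    # Convert the inferred count back to a rounded percentage as a sanity check.
    back = round(100 * n / TOTAL_STUDIES)
    print(f"{tech}: ~{n} studies ({back}%)")
```

Under this assumption, the inferred counts add up to the 54 included papers and round back to the reported percentages, which suggests that the four technology categories were treated as mutually exclusive.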

Category 3: Predictor
Most studies compared the findings over time (31%), particularly blended learning studies or studies with at least two measurement points. The second most common predictor was "degree of immersion", where a traditional intervention was compared to an immersive intervention (28%). Studies with only one measurement time point (post measurements) accounted for 13% of the papers.

Figure 3. This pie chart shows the reviewed predictors from the studies divided into six categories, and their frequency (in %).

Setting manipulation as predictor variable
A total of ten studies (15%) manipulated the setting of the application. Hoa et al. tried to predict learning effectiveness and motivation from different game designs (puzzle game, word matching, quiz, role-playing) [41], as did Hsu et al. [48]. Hsu et al. used an AR game mechanism called collective game design (CGB) and another called sequential mission game design (SMG). "In both systems, there were a total of seven reality targets corresponding to seven objects in the real-world setting. These seven objects could be collected randomly at the same ward in the CGB mode, while they had to be collected step by step at different wards in the SMG mode" [48, p. 317]. While the teachers interviewed found that students in the CGB mode were significantly more enthusiastic, students made significant progress in their learning in the SMG mode (paired-sample t = 2.34, p < 0.05). Students in the CGB mode did not make such remarkable progress (paired-sample t = 1.46, p > 0.05). The SMG mode forced students to study the learning material repeatedly until they passed the spelling task using the learning objective, which apparently helped them to remember the spelling [48]. In the study by Wang et al. (2017), there were four different intervention designs with two key artifacts. The first intervention was an English learning environment in a 3D virtual world, OpenSimulator, without any other artifacts. The second intervention included a chatbot, the third a time machine, and the fourth both the chatbot and the time machine [94]. In summary, the chatbot and the time machine increased learners' sense of presence in the virtual environment: learners who experienced the chatbot and those who experienced the time machine felt that they gained a sense of presence within the virtual environment. However, no data were provided on the impact on FL learning.
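
For readers less familiar with the paired-sample t statistic reported for the CGB and SMG modes, the following minimal sketch shows how such a value is computed from pre- and post-test scores. The function name `paired_t` and the scores are hypothetical and serve only as an illustration; they are not data from [48].

```python
import math
import statistics


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Return the paired-sample t statistic and its degrees of freedom."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical spelling scores for six learners (not data from [48]).
pre_scores = [4, 5, 3, 6, 5, 4]
post_scores = [6, 6, 5, 7, 5, 6]

t_value, df = paired_t(pre_scores, post_scores)
print(f"t({df}) = {t_value:.2f}")
```

The statistic is the mean of the pairwise differences divided by its standard error. It is then compared with the critical value of the t distribution for the given degrees of freedom to decide whether the pre-post change is significant.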
Other studies have investigated whether the availability of certain support options makes a difference in language learning, such as scaffolding assistance (yes, no) [21], caption type (no caption, English caption, Chinese caption) [17], speech support (auditive, visual, or both) [92], speech input (yes, no) [26], or corrective feedback [25]. Liu et al. [66] investigated whether the social context during the use of an AR 3D pop-up book affects learning achievements. There were three trial settings: a) individual trial, b) small group trial, and c) real class trials. Both the actual manipulations of the setting and the participants' perceived effectiveness of the setting were examined. Chang et al. [12] investigated the effectiveness of the system using the TAM questionnaire (Technology Acceptance Model) [65], which aimed to gather information on students' learning experience and attitude. Thus, they investigated whether the perceived effectiveness of the system influences behavioral intention, motivation, and satisfaction in learning English with an AR application [12].

Participants' traits and states as predictor variable
Pre-existing abilities or characteristics of the participants were chosen as predictors in 13% of the reviewed studies.
English or language ability was considered separately for this review and accounted for 6% overall. Participants were divided into two or three groups based on their scores on an English test or their school performance. For example, Gelsomini et al. explored whether children with more significant learning difficulties found it easier to remember vocabulary and terms within IMAGINE [35].
Five studies (7%) used other traits or states of participants as predictors. For example, Hsu et al. [48] used gender as an independent variable. There was no remarkable influence on learning effectiveness due to the participants' gender. The authors of this study, therefore, recommend that other personal characteristics such as learning style or cognitive traits (working memory capacity, inductive reasoning, and associative learning abilities) be considered in future studies [48]. Another study by Qu et al. divided the participants according to the attitude of their virtual counterparts. They used virtual bystanders in a virtual classroom setting to influence people's beliefs, self-efficacy, and anxiety [75]. In two other studies, the participants were divided according to their cognitive and learning styles [18,45]. In the study by Chang et al. [12], the user experience was the central object of investigation. On the one hand, the subjects' self-efficacy was used as a predictor. They found that perceived self-efficacy was a predictor of perceived learner satisfaction. The results also showed that perceived usefulness and perceived satisfaction were predictors of learners' behavioral intention to use e-learning. In addition, the authors of this study categorized the participants by their perceived effectiveness of the system itself. Therefore, the predictors of this study belong to the category of traits and states as well as to the category setting manipulation. A complete listing of the distribution of the predictors is shown in Figure 5.
As predictors, most studies used the most apparent ones: change over time and the comparison of an immersive intervention with a traditional one. Of special interest were the results on participant characteristics and states as predictors, as well as on setting manipulations. Specifically, gamification has a positive effect on motivation to learn, whereby designs with more scaffolding show better learning outcomes. Regarding states and traits, personal characteristics such as learning style or cognitive traits (working memory capacity, inductive reasoning, and associative learning skills) are recommended as predictors for future studies.

Category 4: Criterion
The following tables show the identified dependent variables sorted according to qualitative and behavioral methods (see Tables 2 and 3), indicating the investigation form (questionnaire, eye-tracking data, etc.). Furthermore, the quantitatively measured dependent variables are listed below (Table 4).

This pie chart shows the criteria, that is, the corresponding dependent variables, that were examined in the studies reviewed. These were grouped into nine superordinate categories.

Results of qualitative measurements
Qualitatively, teachers often found the systems used very motivating and enjoyable [19,43,66]. They found it easy to use the systems, but only if pre-training had taken place [66]. Li et al. [62] conducted an exclusively qualitative study and interviewed six professors from the departments of Applied Linguistics and Linguistics, Information Management, and Information and Computer Engineering in semi-structured in-depth interviews after they were shown an AR learning environment. Although some technical problems occurred during the demonstration and caused delays, almost all of them perceived the AR system to be interesting, innovative, and attractive [62]. Some teachers assumed that AR might help students memorize vocabulary [66], while others perceived little vocabulary achievement [43]. Teachers also frequently noted that the application could distract users from the task because of its novelty. Instructors also mentioned that AR has an advantage over VR due to greater peer communication and lower cost. In interviews, students and learners likewise increasingly mentioned the positive effect of interactivity and collaboration, and the resulting enjoyment and motivation from the new intervention. In none of the qualitative studies were learners found to have difficulties with the new technology.

Table 2 (fragment; entries in their extracted order): Answers from video and log data [60,93]; Reflection papers or blogs [81,91]; Ethnographic procedure, researcher as participant-observer; interviews [38]; Qualitative analysis of comments (positive or negative) [58]; Qualitative data mentioned, but no further information [42]; Analysis of participants' speech [75]; Teachers' performance

Results of observational, behavioral, and quantitative measurements
Behavioral measures confirmed the results on enjoyment and motivation. Experiential motivation was identified, for example, by longer game playing time or more interaction, and enjoyment by increased or extended smiles and laughter [26]. There were also other observational and behavioral assessments. For example, Chen and Hwang [18] analyzed the move structure during an oral presentation according to five main points and eleven sub-criteria. Participants learning with the proposed ISVVR learning mode (360° content) used more move structures similar to those of professional TED speakers than participants in the conventional multimedia mode. The use of the immersive application (ISVVR) as a visual aid thus improved the participants' move structures in their oral presentations [18]. Ibrahim and colleagues investigated participants' attention during a conventional multimedia application (flashcard application) and the more immersive AR application (ARbis pictus) in terms of vocabulary learning behavior, recall, and recognition. For this purpose, gaze data were collected for both modalities. For the flashcard application, eye-tracking data were collected with a screen-based eye-tracker, and for the HoloLens application, head orientation focus was recorded. Furthermore, click behavior in the multimedia application was collected to identify possible learning patterns. The differences in attention times between the groups were not significant. The click patterns showed that participants tended to click more frequently toward the end of the study. Users later reported in the interview that they found the ability to see the object and the word simultaneously in AR very helpful for learning. However, for autonomous self-testing, the flashcard application was more effective [51].
Physiological measures, for instance heart rate and skin conductance, were included to measure elicited arousal during the virtual English lesson (Table 3). The results of Qu et al. [75] showed a significant effect of phase on participants' heart rate, with an increase during the second phase, in which they answered questions. In addition, witnessing bystanders commenting negatively on other students' performance increased participants' heart rates when it was their turn to speak.

Study: behavioral investigation (number of studies: 12)
[51]: attention and gaze (eye-tracking data)
[3,26,29,36,38]: behavioral cues (subjective, observed data)
[18]: move structures
[25]: gaming behavior (log data)
[50,59,76]: interaction patterns between students (video, and subjective, observed data)
[75]: physiologically, through skin conductance and heart rate

Effects on language learning
Quantitatively, learning achievements, especially on a cognitive or affective level, were most frequently investigated, as well as the motivation and acceptance to use the new systems. Chen [20] reported in his study that the implemented method (ARVEL) effectively improved students' learning achievements and motivation. Students showed a higher level of satisfaction with English as a Foreign Language (EFL) learning under ARVEL than under the video-assisted learning method (CVEL). The ARVEL method significantly increased students' intrinsic motivation to learn English, and their satisfaction was significantly higher than with the computer-based method. Wu [100] showed promising results in vocabulary learning success by comparing Pokémon Go with a flashcard application. Similarly, Redondo et al. [78] confirmed that the use of AR in EFL instruction in early childhood education produced higher levels of motivation in students than the more traditional instructional approach in the control group.
Also, an improvement in English learning was observed in the children who used AR in the classroom compared to those who used the traditional method. Students who practiced in IMAGINE retained many more concepts over the long term than those who practiced in the classroom. Children with more significant learning difficulties found it easier to remember terms within IMAGINE [35]. To study learning achievement in younger groups, matching games, for example, were used instead of tests. Here, AR improved the matching of vocabulary words to the corresponding pictures [3]. There were also studies comparing immersive interventions. Chen et al. [17], for example, examined interactions between subtitles and English proficiency in terms of motivation, attitude, and learning effectiveness in augmented-reality-enhanced, theme-based, contextualized EFL learning. Although all learners expressed positive attitudes toward AR-assisted learning, interactions between predictors (subtitles and English proficiency) and attitude were not significant. The captioning condition poorly influenced learning achievement.

Effects on intercultural learning
Intercultural competence was the least frequently used research item in the reviewed studies. In the study by Liaw [63], groups of different sizes (2-5 participants) from different parts of the world collaborated in vTime, a VR social networking site. Then, Intercultural Communication Competence (ICCQ, [68]) was assessed via questionnaire. The ICCQ results show that 88% of the participants were satisfied with the VR approach. Almost half of the participants (48%) disagreed that it was challenging to build a good relationship with people from other countries. Most participants did not find dealing with and managing cultural uncertainties troublesome (59%), and 48% did not find it frustrating. Most of them found it quite enjoyable (60%) and even exciting (76%). Participants indicated that they had no prejudice against people who spoke with an accent (77%) and that they actively listened to others during vTime interactions. All felt that they were attentive to the cultural and behavioral norms of others [63].
Yeh and colleagues took a different approach [102]. Here, participants were asked to create their own VR content that presented the cultural significance of local Taiwanese sites in an educational VR application. The blended learning (BL) approach took 18 weeks to complete. In the first eleven weeks, the instructor taught participants how to use the EduVenture VR app to create panoramas and add interactive features. In weeks 12 through 15, participants created their content with the instructor's assistance by adding interactive features, panoramas, and audio. In the last three weeks, participants reviewed their classmates' work on the app. Students were then asked about their reactions to the VR cultural learning project. Immersion was not measured, but two open-ended questions were designed to explore how VR technology enriched students' intracultural learning experience (presence). Most students agreed that VR technology provided them with better intracultural learning experiences because it made them feel as if they were physically visiting and traveling to the places presented. All the features of VR helped students to present the local cultural sites in a vivid, attractive, and clear way [102].
Shih examined cultural knowledge acquisition and the emergence of attitudes toward the target culture [82]. She followed four participants in a case study for one year. They walked the virtual streets of London (Google Street View integrated into a virtual environment) with the help of avatars and under a native English teacher's guidance, and interacted with each other via text and voice chat features. The results show that learners increased their cultural knowledge to varying degrees concerning the two main aspects of culture: visible aspects and invisible aspects. The effect of cultural immersion on participants' cultural knowledge acquisition tended to increase over time. All learners benefited from cultural learning and showed, to varying degrees, positive attitudes toward the target culture, the country, and interaction with English speakers after completing the course. The results suggest the possibility of a relationship between language level, character traits, motivation to learn about the target culture, initial attitudes, and cultural learning in terms of developing cultural knowledge and positive attitudes toward the target culture. Yang and Liao investigated whether the effectiveness of cultural learning can be increased by using an AR application (VECAR). This intervention was compared to Google Earth, which participants operated via a screen using the mouse. The results show that the effectiveness of learning cultural content was better in the experimental group using VECAR than in the control group. "VECAR enables students to translate, rotate, scale, and modify virtual objects in 3D by using the intuitive hand gestures, thus it helps students to visualize and play around some cultural contents, such as football rules, that are difficult to understand solely based on verbal explanation" [101, p. 114].
The last of the five studies that investigated intercultural competence examined the language proficiency of the participants. Here, they played a serious game (It's a deal) for teaching intercultural business communication. The statistical test for the correlation between learning effectiveness and English proficiency level showed that the students at A2, B1, and B2 proficiency levels made more intercultural learning progress than the students at C1 and C2 proficiency levels. Accordingly, the native speakers participating in the survey made the least learning progress [38]. However, a limitation of this study is its low immersion: the serious game was played on a computer. Immersion has been described as part of a dimension in a framework for educational games [27].

Combination and summary of the results
The explanatory variable in over half of the studies was change over time or degree of immersion. As shown in the following plot, no study examined how immersive its application was compared to a traditional intervention. The following figure shows the predictors in combination with the criteria. All pairs of variables that appeared in the identified studies were combined here. Because many studies examined more than one independent or dependent variable, there are more combinations than studies (see also Table 4).
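The counting of predictor-criterion combinations described above can be illustrated with a short Python sketch. It is only a minimal illustration of the bookkeeping: the study labels and the coded variables below are hypothetical and are not taken from Table 4 or from any of the reviewed papers.

```python
from collections import Counter
from itertools import product

# Hypothetical coding sheet: study label -> (predictors, criteria).
# These entries are invented for illustration, not taken from the review.
coded_studies = {
    "study_A": (["degree of immersion"], ["learning achievements", "motivation"]),
    "study_B": (["change over time"], ["learning achievements"]),
    "study_C": (["degree of immersion", "change over time"], ["learning achievements"]),
}


def count_pairs(studies):
    """Count every predictor-criterion pair across all coded studies."""
    counts = Counter()
    for predictors, criteria in studies.values():
        # A study with several predictors or criteria contributes several pairs.
        counts.update(product(predictors, criteria))
    return counts


pair_counts = count_pairs(coded_studies)
total_pairs = sum(pair_counts.values())

for (predictor, criterion), n in pair_counts.most_common():
    print(f"{predictor} x {criterion}: {n}")
print(f"{total_pairs} combinations from {len(coded_studies)} studies")
```

In this toy example, three studies yield five combinations, because two of the hypothetical studies were coded with more than one predictor or criterion.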
Most frequently, subjects' learning achievements were examined in combination with the degree of immersion (immersive vs. traditional setting). This combination was investigated in a total of 16 studies. Often, the subject of the investigation was vocabulary learning. Most studies found improved language skills. However, researchers considered that questionnaires and achievement tests might not be the best way to obtain information from children, as they are still too young to read and understand them [46]. Learning achievements over time were also examined very often (N=14), as were motivation and acceptance of the new technology as predicted by the degree of immersion (N=12). Motivation was usually higher in the more immersive interventions, and attitudes toward and acceptance of the new technology were also high. Most participants found the systems easy to use (usability, satisfaction, and user experience). However, no study investigated why immersive technologies brought better results and which gestalt features have been successful. As stated above, studies addressing a different focus already revealed some specific characteristics of immersive technologies. Future studies might investigate whether those characteristics are useful gestalt features for language and intercultural learning. As already mentioned, fully immersive technologies, such as VR, provide opportunities for perspective-taking, intercultural encounters, and changes in certain attitudes and behaviors. Nevertheless, the results of this review show that few studies used fully immersive interventions (N=7).
Moreover, in only eight studies were teachers the participants (Table 4). They mostly took an observational role. Only one study surveyed solely teachers qualitatively [62], and only one other study investigated teachers' communication behavior quantitatively [101].
Zhang and Zhou's [103] assertion that language skills are investigated more often than intercultural competences or conative objects of inquiry can be confirmed. Nowadays, intercultural competence is a learning objective that should be taken into account when teaching foreign languages. Nevertheless, it was investigated in only five of the reviewed studies and thus the least frequently (Table 4). Constructive views on language learning and teaching focus additionally on intercultural competencies [33]. Communication in realistic contexts was identified as one primary method to improve knowledge about the language itself and about the culture and traditions. Although intercultural learning is an essential part of modern language learning and immersive technologies are proven to be useful in this context, only a few reviewed studies addressed this topic. Thus, future studies might focus on intercultural aspects and ask how immersive technologies can increase intercultural learning in language learning and teaching.

Table 4. This table summarizes the study results from all studies that used at least one quantitative measurement. These are sorted by the previously defined categories of (1) degree of immersion, (2) investigation form and education level, (3) predictors, and (4) criteria.

[Table 4 fragment] Table 2 Second Life [64] MUVE Blended learning change over time Table 2 AR-application [66] AR Blended learning post evaluation Table 2 Mobile-AR [14] AR Quasi experiment English ability traits and states learning achievements motivation AR Experiment (between design) degree of immersion learning achievements, Table 2 MOW [3] AR Experiment (between design) degree of immersion learning achievements usability Table 3 Interactive Tablet

Discussion
At the outset, the following questions were posed, which are to be answered with the newly gained knowledge:
RQ1: How are virtual, fully immersive learning environments used for foreign language learning?
A large number of studies compared conventional teaching methods with interventions described as immersive for foreign language learning. The criteria studied were mostly cognitive learning achievements, such as vocabulary or speaking skills, or affective variables, such as motivation, satisfaction, or speaking anxiety and discomfort. Measurements on a conative level are quite rare. In only two studies was behavior assessed by collecting eye-tracking data or by using physiological measurement methods. We see great potential here, as these measurements are almost uncontrollable by the participants.
We also see great potential and a present research gap in the area of intercultural and transcultural language learning. Intercultural competence was the subject of research in only five identified studies. As mentioned at the beginning, Chen and Starosta [13] assume that three levels define intercultural competence. At the cognitive level, intercultural awareness should be created; at the affective level, intercultural sensitivity should be addressed; and at the conative level, intercultural fluency should be addressed [13,103]. Current research provides little on the topic of intercultural competence at the cognitive, affective, or conative level. Intercultural sensitivity is less studied than criteria such as motivation or speech anxiety. In today's English classes, teachers and learners have to acquire knowledge, develop a critical and sensitive awareness of racial thinking, and act with respect and open-mindedness towards their own and others' perspectives [30,31]. Particularly with the renewed attention to the Black Lives Matter movement, we see that this learning objective is of enormous importance. It should therefore be given a corresponding priority in language education. In most of the studies identified, immersive learning environments were used for foreign language learning to examine primarily cognitive teaching and learning processes, such as vocabulary learning achievements, or affective variables, such as motivation or speaking anxiety. Constructive views of language learning and teaching additionally focus on intercultural competencies [33]. Intercultural learning is an essential component of modern language learning. In the context of immersive technologies, few reviewed studies have addressed this topic. Researchers should therefore focus on intercultural aspects in the future and ask how immersive technologies can enhance intercultural learning in language learning and teaching.
RQ2: Which characteristics of immersive technology support language and intercultural learning?
The benefit of increased immersion was mentioned in most studies but not checked. Only six studies quantitatively measured immersion or presence. In the three studies that measured immersion, it was only one item of a broader questionnaire. In one study, presence was measured qualitatively [102]. Hence, studies addressing language and intercultural learning with immersive technologies lack systematic investigations of the incremental value of immersive technologies and of different degrees of immersion. Furthermore, as can be seen in Figure 5, no study comparing an immersive intervention with a traditional intervention measured immersion or presence. In other words, no study tested whether the interventions were truly distinct enough in their immersiveness to support further assumptions. Moreover, only seven of the 54 papers offered fully immersive settings. Thus, the spectrum of immersive technologies is largely unexplored, particularly fully immersive technologies. Future researchers should therefore focus on the area of foreign language and intercultural learning and teaching using fully immersive VR applications. Liaw's study [63] also shows that VR can offer great potential for use in the classroom and especially for intercultural encounters (41% of participants felt that VR helped them learn). Yet another 41% did not seem to feel such effects, and 18% responded negatively to this statement. Only 18% of the participants felt that using VR fit their learning style. Therefore, it is imperative to focus on interdisciplinary collaboration when developing fully immersive learning environments [63].
The Behavioral Framework for immersive Technologies (BehaveFiT) presents four main factors that can be manipulated in VR environments to promote change in attitude and behavior [96]. The first factor is the situational context, i.e., the appearance of the virtual environment in which the user feels present. In the study by Yeh and colleagues, participants were allowed to create their own VR environment. Students ranked panoramas (situational context) as the most effective technology to deliver an immersive cultural learning experience, followed by audio, interaction, and structure [102]. The second factor is the virtual objects and their appearance, interactivity, etc. As Yang and Liao showed, the AR application VECAR enabled students to translate, rotate, scale, and modify virtual objects in 3D using intuitive hand gestures. This allowed cultural content to be easily and quickly visualized and thus paraphrased [101]. Object integration in VR is almost limitless and can be used effectively in foreign language learning. Nevertheless, because of this limitlessness, especially with regard to cultural learning, it is essential to pay attention to possible associations and connotations of the objects in order to avoid critical incidents. The third and fourth factors are self-presentation and other-presentation. For example, in the study by Qu et al. [75], virtual bystanders' perceived attitudes toward participants and virtual peers were influenced by their appearance. Bystanders' attitudes toward participants significantly influenced participants' performance beliefs, self-efficacy, or anxiety. Virtual bystanders who displayed consistently positive attitudes toward both peer speakers and participants elicited the least anxiety among participants. Furthermore, participants rated their presence higher when the bystanders exhibited a positive instead of a negative attitude towards them [75]. Thus, some studies showed positive effects.
However, future research on language and intercultural learning might draw on the promising research stemming from other fields (e.g., [1,2,44,74,77]).
RQ3: Can virtual, fully immersive learning environments increase motivation and success in learning a foreign language?
Immersive applications can increase motivation and are highly accepted by learners. There are already some promising research results in this area. At the same time, researchers, as well as teachers, warn against the strong flood of stimuli, the effect of the new, and the distracting effect of animated objects, especially on younger participants [46]. In terms of learning success, nearly all studies measured an improvement in learning achievements.
RQ4: Can virtual, fully immersive learning environments change participants' attitudes through intercultural encounters?
Affective behavioral changes in non-educational VR applications have been extensively studied [39]. However, they are underrepresented in educational settings and represent an important area for future research. Only one study demonstrates that virtual encounters affect learners' self-efficacy, speaking anxiety, and avoidance behaviors in the foreign language classroom. However, it was not the cross-cultural component that was manipulated, but the virtual bystanders' attitudes toward the participants and the virtual peer speakers. Thus, a large research gap remains, especially in the pedagogical framework and in the field of foreign language teaching, which, as the Covid-19 crisis shows, should receive special attention.
RQ5: How are virtual, fully immersive learning environments used for teacher training?
Teachers were rarely included in the evaluation and usually only took part as observers. Their observations were surveyed qualitatively in only seven studies. Just one study investigated teachers' communication behavior quantitatively as well as qualitatively [101]. Since student teachers are relevant future multipliers of teaching and learning with digital media, they might be addressed in two ways [97]. Firstly, the teaching and learning processes at the university set an example for the student teachers by imparting key digital competencies. There are some promising projects and suggestions on how to address student teachers. For example, prospective teachers can explore and experience various digital devices and can be informed about their basic functionalities. They can gain a reflective understanding of the benefits and limitations of digital devices and develop digital competencies by working on projects with digital media. Further, students can learn how to use digital devices to implement pedagogical as well as methodological objectives. Finally, the students can evaluate their material during the course, which imparts knowledge about the appropriate use of digital equipment to attain didactic aims. Secondly, the prospective teachers prepare teaching units for practical implementation. Thus, innovative teaching and learning contexts can be directly explored in realistic situations and transferred into real-life teaching scenarios in schools [31,79,97].

Conclusion
Some benefits of immersive technology and its use in foreign language teaching were utilized and evidenced within the identified studies, such as increased attention, motivation, and enjoyment. Nevertheless, the review also revealed research gaps concerning implicit measurements of learning achievements, and intercultural competence on a cognitive, affective, and conative level. In modern foreign language teaching, the focus is no longer exclusively on native language competence. It is now common to see the overall goal of foreign language teaching as the development of intercultural and language action-oriented communicative competence, which consciously includes the development of competencies in the learners' inter-language. "This change in goals marks a shift away from a standards-based deficit orientation toward the speaker as a linguistically and culturally shaped individual. A language embodies and expresses aspects of its speakers' culture and worldview. Full communicative competence in a language, therefore, requires an understanding of and the ability to interact with the culture and worldview of the speakers of that language" [7, p. 2]. In addition, the renewed Black Lives Matter movement indicates that inter- and transcultural competencies are not given enough space in the foreign language classroom compared to deficit-oriented language acquisition measured by vocabulary and grammar. Yet, as this review shows, the study of intercultural competence through immersive technologies is a research gap that needs to be filled.
Furthermore, fully immersive interventions, such as VR, and their associated benefits were comparatively rarely used. Especially in the context of studies and basic research, VR environments offer a comparatively cheap and very controllable setting. Studies in VR are more reproducible and therefore more comparable than studies of conventional teaching methods. The systematic review further revealed that many potentials remained unexploited. For example, spatial and temporal distances can be overcome, which might support the learning of tenses and intercultural competence. Especially in times of pandemics, it can be a great advantage to be able to meet remotely at a virtual location. At the same time, the design of this location is nearly limitless and can be changed easily. Furthermore, the representation of other persons and the self-presentation can be manipulated. Studies from behavioral psychology show promising results here, which would be particularly applicable in the area of intercultural competence. However, stylized avatars can also have advantages. For example, they can provide anonymity, which can have a positive impact on speech anxiety. In addition to these factors, virtual objects can be integrated, which might steer and inspire discussions. Visual feedback and learning progress can be displayed directly by an instructor and made tangible without a student feeling exposed in front of a class. Future researchers on language and intercultural learning should therefore focus on these potentials of immersive technologies to design innovative and effective learning environments.
Immersive learning environments can address the challenges of digitization in educational systems. The first success factor of digitization is the digital equipment of schools. The second is acceptance among teachers and their digital skills in using the equipment. Nevertheless, the target group of educators was heavily underrepresented in this review. The third success factor of digitization is interdisciplinary cooperation. This seems essential in order to exploit the whole potential of digitized programs. Both pedagogical and didactic concepts must be combined here with research from HCI to achieve innovative teaching-learning formats and a successful digital transformation.