Introduction

The ability to instantaneously transform ourselves into being whatever we want and to explore worlds that are only bound by the realm of human imagination has historically been limited to science fiction. However, with the wide availability of modern technology like virtual and augmented reality and the upsurge in free development engines, it is becoming possible for anyone to build a virtual experience that is engaging and captivating. This is intriguing for fields like education wherein existing methods have long been criticized for not adapting to the opportunities and challenges of the twenty-first century (Scott 2015). With predictions that virtual reality (VR) and related technologies could reach 15 million learners by 2025 (Goldman Sachs 2018), it is natural that the number of research studies on learning with VR is rapidly increasing. A literature search shows that the number of publications on Scopus that refer to VR in combination with either learning, education, or training is quickly growing (see Fig. 1). Although the number of studies is rapidly increasing, a recent review (Radianti et al. 2020) and meta-analysis (Wu et al. 2020) highlight a lack of theories to guide research and application development as a major challenge facing this field. It is therefore relevant to develop a research-based theoretical model that provides an understanding of learning in immersive VR (IVR), so that stakeholders such as students, teachers, instructional designers, or policy-makers know what to be aware of in using, choosing, designing, developing, and purchasing IVR-based learning applications.

Fig. 1

Number of articles on the Scopus database that refer to VR in combination with either learn, education, train, or teach. Note: The following search string was applied: (TITLE-ABS-KEY (“virtual reality” OR vr) AND TITLE-ABS-KEY (learn* OR education OR train OR teach*)) AND DOCTYPE (ar OR re) AND PUBYEAR > 1981 AND PUBYEAR < 2021. Retrieved October 23, 2020

This paper presents the Cognitive Affective Model of Immersive Learning (CAMIL) with the intention of providing a research-based theoretical framework for understanding learning in immersive environments. Although there are an endless number of potential factors that influence an immersive learning experience, the CAMIL defines some of the most important ones based upon previous empirical literature within the field of learning with immersive technology. Below, we first present a definition of IVR before describing the model in detail.

Defining Immersive Virtual Reality

VR has been defined as a complex media system that encompasses a specific technological setup for sensory immersion as well as a means of sophisticated content representation, which is capable of simulating or imitating real and imagined worlds (Mikropoulos and Natsis 2011). VR can be accessed through various displays such as a desktop computer, a head-mounted display (HMD), or a cave automatic virtual environment (CAVE; Buttussi and Chittaro 2018). The major factor that distinguishes a VR learning session accessed through HMD and CAVE as compared with a VR session accessed through a desktop computer is the degree of immersion. Immersion is an objective measure of the vividness offered by a system, and the extent to which the system is capable of shutting out the outside world (Cummings and Bailenson 2016). Although the degree of immersion can vary based on the number of senses that are activated by the technology and the quality of the hardware, VR experiences accessed through an HMD or in a CAVE are generally regarded as high immersion. Although the CAMIL is relevant for existing and future immersive learning technologies, and is not a technology-specific theory, in this paper, we focus on immersive learning experiences that are accessed through an HMD (which we refer to as IVR) because most of the recent research has used this technology due to its broad availability. This allows us to provide a concrete description of the process of learning in immersive environments by using a specific technological solution as an example. Simulations or 3D worlds accessed through a desktop computer or tablet are referred to as low immersion or desktop VR in the literature and will only be used as comparisons to IVR in this paper.

IVR allows for head and position tracking and is able to render a different image for each eye, which creates visual cues for depth perception. IVR also increases the size of the visual field of view as compared to a monitor. These factors are important for determining the types of learning experiences that benefit from using IVR, and are essential in determining how to design learning content for IVR. IVR is also qualitatively different from mixed or augmented reality technologies because these allow the learner to experience the virtual and the real world simultaneously, while IVR completely shuts out the real world, psychologically isolating the learner in the virtual environment (Loomis et al. 1999).

The other defining factor of an IVR learning experience as compared to more traditional multimedia lessons—for instance those delivered through videos or PowerPoint—is the level of interaction that is possible (Makransky et al. 2020a). Interaction is a technical feature of an IVR lesson, which is related to how much freedom the learner is given to control the learning experience as well as the fidelity (the accordance between actual movements and the corresponding visual feedback; Kilteni et al. 2012) with which that control can be exerted.

Since immersion and interaction are the main characteristics that differentiate IVR from less immersive media such as desktop VR, videos, or PowerPoint lessons (Johnson-Glenberg 2019; Makransky et al. 2020a), it is important to understand if, and how, an immersive/interactive learning experience can influence learning.

The Theoretical Perspective of CAMIL

Historically, there has been a distinction between the roles of media vs. methods in promoting learning (Clark 1994; Kozma 1994; Salomon 1979). One perspective is that each medium has its own underlying rules and conventions which can form our cognition, impact our social structures, and set cultural norms (McLuhan 1964). McLuhan (1964) is famous for coining the phrase the “medium is the message”, and he argued that the nature of a medium is more important than the content of the message. An example of this would be that watching TV is more relevant in shaping how you think, behave, and interact with others than anything you might watch on TV. An opposite theoretical perspective is that a medium in itself cannot lead to increased learning, and the only relevant factor for learning is the instructional method (Clark 1994; Clark and Salomon 1986). Clark (1994) famously claimed that media are mere vehicles that deliver instruction, and that they do not influence student achievement, learning, or motivation. He further argued for the lack of evidence of media effects, and reasoned that media studies are “confounded” because they fail to control for instructional method (Clark 1994, p. 22). Kozma agreed with the lack of evidence, but responded with hopes that future media research would prove more positive, and contended that “if we can find a relationship between media and learning then we will be able to see how technology influences learning” (Kozma 1994, p. 8).

CAMIL provides a theory of change that describes how it is not the medium of IVR that causes more or less learning, but rather that the instructional method used in an IVR lesson will be specifically effective if it facilitates the unique affordances of the medium. That is, interaction and immersion are limited with lessons presented on a video or PowerPoint, but are greater with IVR or other existing/future immersive technologies. Consequently, students’ presence and agency, which are psychological constructs that arise from immersion and interaction, will generally be higher in immersive media. This means that instructional methods that enrich learning through higher presence or agency will specifically increase learning through immersive technology.

Several recent studies have attempted to disentangle the effects of media and instructional methods in the field of IVR-based learning. These studies have investigated the effectiveness of an instructional method presented across media by using the same lesson presented in IVR compared to a video or desktop VR. For instance, Meyer et al. (2019) used a 2 × 2 design to investigate the effect of the pre-training principle across a biology lesson presented in IVR or by video in a sample of 118 students. The authors found an interaction between media and method, where pre-training had a significant positive effect on the outcomes of knowledge retention, transfer, and self-efficacy within the IVR condition; but no effect was found for any of these variables within the video condition. The findings suggest that the method of pre-training specifically enables the medium of IVR to be a successful learning tool. That is, having high prior knowledge enables the affordances of presence and agency to be conducive to learning by allowing learners to interpret their experiences in the IVR lesson in a meaningful way. By contrast, learners who have no prior knowledge to anchor these experiences on may experience entertainment value stemming from high presence and agency, without properly selecting, organizing, and integrating the information into long-term memory. In another study, Makransky et al. (2020a) also used a 2 × 2 design to investigate the effectiveness of the generative learning strategy of enactment after a lesson across two media conditions consisting of an interactive IVR simulation compared to a video of the simulation in a sample of 165 high school students. There was an interaction between media and method with enactment resulting in significantly better procedural knowledge and transfer in the IVR group, but not in the video group.
The authors concluded that learning in IVR is not more effective than learning with video; but incorporating the generative learning strategy of enactment is specifically effective when learning through IVR because the affordances of presence and agency result in highly engaging experiences which do not necessarily facilitate self-regulated learning. Nonetheless, adding a generative learning strategy specifically helped the students who learned in IVR because this provided the time for reflection, which was necessary for integrating the highly engaging experiences into meaningful schema. Finally, Klingenberg et al. (2020) conducted a 2 × 2 mixed-methods experiment with 89 undergraduate biochemistry students. Students learned about the electron transport chain through desktop VR and IVR (media conditions), with about half of each group engaging in the subsequent generative learning activity of teaching a fellow student (method conditions). The authors found a significant interaction between media and methods indicating that the generative learning strategy of teaching significantly improved retention, transfer, and self-efficacy when learning through IVR, but not desktop VR. These studies provide evidence that certain instructional methods enable IVR to be more effective, and help build the theoretical foundation for CAMIL.
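The media-by-method interaction reported in these 2 × 2 studies can be read as a difference of differences: the effect of the method within one medium minus its effect within the other. The following is a minimal Python sketch of that descriptive contrast only. It is not the analysis used in any of the cited studies, the function names are hypothetical, and the cell means are arbitrary placeholders rather than reported results.

```python
from typing import Mapping, Tuple

# A cell of the 2 x 2 design: (medium, method condition).
Cell = Tuple[str, str]


def simple_effect(means: Mapping[Cell, float], medium: str) -> float:
    """Effect of the instructional method within a single medium."""
    return means[(medium, "with")] - means[(medium, "without")]


def interaction_contrast(
    means: Mapping[Cell, float],
    medium_a: str = "ivr",
    medium_b: str = "video",
) -> float:
    """Difference between the method's simple effects in two media."""
    return simple_effect(means, medium_a) - simple_effect(means, medium_b)


# Arbitrary placeholder cell means, NOT data from any cited study.
placeholder_means = {
    ("ivr", "with"): 8.0,
    ("ivr", "without"): 6.0,
    ("video", "with"): 6.5,
    ("video", "without"): 6.5,
}

print(simple_effect(placeholder_means, "ivr"))    # 2.0
print(simple_effect(placeholder_means, "video"))  # 0.0
print(interaction_contrast(placeholder_means))    # 2.0
```

A non-zero contrast is only a description of the cell means; whether an interaction is statistically significant would have to be established with an inferential test such as a two-way analysis of variance.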

The above examples describe how methods enable media in the sense that an instructional method specifically facilitates the affordances of, or limits the shortcomings of, learning in a specific medium. It is also possible that the affordances of a medium specifically enable an instructional method. An example is the embodiment principle, which states that people learn more deeply when onscreen agents display human-like gesturing, movement, eye contact, and facial expressions (Mayer 2014a). The principle is based on social agency theory (Mayer 2014a), which describes how using the embodiment principle can prime a learner’s social presence and increase their motivation to exert more effort to make sense of a lesson. The CAMIL predicts that the instructional effectiveness of the embodiment principle would be greater when learning through an IVR lesson compared to a video because learners will generally have a higher sense of presence in IVR. Therefore, the potential difference between a lesson that facilitates this process in IVR as compared to a lesson that does not (e.g., if it was presented through video) will be greater. That is, the CAMIL would predict that the embodiment principle improves learning in both an IVR and a video lesson, thus supporting the method perspective. However, the CAMIL would go on to predict an interaction between media and methods where learners in the IVR-based lesson would benefit more from the embodiment principle than learners in the video-based lesson, because the affordances of the medium of IVR specifically enable the method.

To sum up, the CAMIL takes the theoretical perspective that media interacts with method. Therefore, the CAMIL recognizes that motivational and learning theories that have been developed for less immersive media generalize to learning in IVR; however, the model predicts that there will be an interaction when the instruction facilitates one of two principal affordances of learning in IVR: presence and agency. Furthermore, the CAMIL describes how these two affordances result from technological features, and how they predict learning through affective and cognitive processes. Finally, the model describes how six affective and cognitive factors including interest, intrinsic motivation, self-efficacy, embodiment, cognitive load, and self-regulation lead to factual, conceptual, and procedural knowledge acquisition as well as knowledge transfer. The factors are identified based on previous VR-based research (e.g., Lee et al. 2010; Makransky and Lilleholt 2018; Makransky and Petersen 2019).

General Overview of the CAMIL

1. The general theoretical framework of the model suggests that motivational and learning methods developed based on evidence from research with less immersive media generalize to learning in IVR; however, the CAMIL takes the perspective that media interacts with method. This view acknowledges that learning methods affect learning, but suggests that certain methods are more or less relevant in IVR.
2. The general affordances of learning in IVR are presence and agency.
3. The model describes how these affordances influence six affective and cognitive factors that play a role in immersive learning, including interest, intrinsic motivation, self-efficacy, embodiment, cognitive load, and self-regulation.
4. The model predicts how these relationships relate to different learning outcomes.

Figure 2 illustrates the constructs that are included in CAMIL and the relationships between these constructs. The model is built on previous research (Makransky and Lilleholt 2018; Makransky and Petersen 2019) that recognizes that affective and cognitive factors play a role in immersive learning. These studies empirically tested a framework developed and proposed by Lee et al. (2010), which incorporates constructs that are relevant for learning in desktop VR based upon several previously proposed media technology models (e.g., Alavi and Leidner 2001; Piccoli et al. 2001; Salzman et al. 1999; Wan et al. 2007). In the following sections, we describe the relations between the different variables in the CAMIL (that is, the paths illustrated in Fig. 2). In this way, we first describe how technological factors including immersion, control factors, and representational fidelity influence the main psychological affordances of learning in IVR, which are a high sense of psychological presence and agency (Johnson-Glenberg 2019; Makransky et al. 2020a). Then we describe six affective and cognitive factors through which the affordances of presence and agency can lead to learning outcomes. These include interest, intrinsic motivation, self-efficacy, embodiment, cognitive load, and self-regulation. Finally, we describe how these factors lead to important learning outcomes, including factual, conceptual, and procedural knowledge, and transfer of learning. We conclude the paper with a discussion of the implications of the CAMIL for future research and instructional design as well as external factors that may influence the CAMIL.

Fig. 2

Overview of the CAMIL

What Factors Lead to Presence?

Presence can roughly be translated to a feeling of “being there” (Ijsselsteijn and Riva 2003). Ijsselsteijn and Riva (2003) subdivide the determinants of presence in mediated environments into media characteristics and user characteristics. As presence is related to perceiving, there is an individual component to it (i.e., different individuals may experience different amounts of presence in response to the same experience). This could, for instance, be related to an individual’s attentional capacities. In terms of media characteristics, Ijsselsteijn and Riva (2003) refer to Sheridan (1992) who suggested three types of determinants of presence: (1) the extent of sensory information presented, (2) the amount of control one has over the sensors in the environment, and (3) the degree to which one can modify the environment and its objects. The first determinant has to do with the degree of immersion offered by the system in question. The second and third determinants are related to the degree of control afforded by the environment, where the immediacy with which it is effectuated plays a central role (Witmer and Singer 1998). Another important determinant of presence is the representational fidelity of the environment, which has to do with how realistically the environment is displayed as well as the smoothness of view changes (Dalgarno and Lee 2010). In summary, the CAMIL regards immersion (positive relation, path 1 in Fig. 2), control factors (positive relation, path 2 in Fig. 2), and representational fidelity (positive relation, path 4 in Fig. 2) as important factors for instigating a sense of presence in virtual environments. Control factors encompass variables such as degree of control, immediacy of control, and mode of control (Witmer and Singer 1998). Representational fidelity includes variables such as realism of display, smoothness of display, and consistency of object behavior (Dalgarno and Lee 2010). 
Furthermore, the CAMIL distinguishes between three different dimensions of presence including physical, social, and self-presence (Lee 2004; Makransky et al. 2017). We use Lee’s (2004) definition of physical presence as a psychological state in which virtual physical objects are experienced as actual physical objects in either sensory or non-sensory ways. Social presence is defined as a psychological state in which virtual social actors are experienced as actual social actors in either sensory or non-sensory ways (Lee 2004). Self-presence is defined as a psychological state in which virtual self/selves are experienced as the actual self in either sensory or non-sensory ways (Lee 2004).

What Factors Lead to Agency?

According to Moore and Fletcher (2012), sense of agency (here referred to as agency) can be described as a feeling of generating and controlling actions. The most important predictor of agency in virtual environments is that users have control over their actions and are able to exert that control over parameters in the environment (Johnson-Glenberg 2019). It follows that low agency would result from immersive virtual environments where interaction is not possible and where the user follows a fixed narrative. Furthermore, Kilteni et al. (2012) refer to studies that indicate a particular role for accordance between an actual movement and the corresponding visual feedback in creating agency. This phenomenon is related to forward modeling of the central nervous system (CNS; Farrer et al. 2008). In this sense, the CNS represents the predicted sensory consequences of a given movement, which is then compared to the actual sensory feedback signals arising as a consequence of the movement (Farrer et al. 2008). If these are correlated, it gives rise to agency (Farrer et al. 2008). It follows from this that a body representation (anatomically correct or not) and the ability to control this representation are important in order to experience agency in immersive environments. In other words, control factors (positive relation, path 3 in Fig. 2; e.g., being able to control the body representation and modify the environment and its objects) are regarded as the most important predictor of agency in the CAMIL.
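The comparator idea described above, in which predicted sensory consequences are compared with actual feedback, can be illustrated with a toy computation. The following Python sketch is an assumption-laden illustration and not a model from Farrer et al. (2008) or Kilteni et al. (2012): the signals are arbitrary numbers, the function names are hypothetical, and the 0.8 threshold is chosen purely for demonstration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long signals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("signals must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / sqrt(var_x * var_y)


def matches_prediction(
    predicted: Sequence[float],
    actual: Sequence[float],
    threshold: float = 0.8,  # arbitrary illustrative cut-off (assumption)
) -> bool:
    """True when actual feedback tracks the predicted feedback closely."""
    return pearson_r(predicted, actual) >= threshold


# Toy signals: visual feedback that follows the movement versus feedback
# that runs in the opposite direction.
predicted = [0.0, 1.0, 2.0, 3.0]
print(matches_prediction(predicted, [0.0, 1.0, 2.0, 3.0]))  # True
print(matches_prediction(predicted, [3.0, 2.0, 1.0, 0.0]))  # False
```

In this toy reading, feedback that tracks the predicted consequences of a movement corresponds to the condition under which agency is said to arise, whereas mismatched feedback does not.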

The CAMIL includes six affective and cognitive factors that can lead to IVR-based learning outcomes. We start by presenting how presence and agency influence each of these six factors (interest, intrinsic motivation, self-efficacy, embodiment, cognitive load, and self-regulation), before describing how these variables lead to learning outcomes.

How Do Presence and Agency Influence Interest, Intrinsic Motivation, Self-Efficacy, Embodiment, Cognitive Load, and Self-Regulation?

Situational Interest

Interest is a psychological construct that represents a relationship between an individual and a specific topic or content area, and is characterized by both affective and cognitive factors (Krapp 1999). Broadly speaking, two types of interest are described in the literature: situational and individual interest (Hidi and Renninger 2006). We focus on situational interest and define it as the focused attention and affective reaction that is activated in the moment by certain stimuli (Hidi and Renninger 2006). Situational interest may elicit short-term, situational knowledge-seeking behavior, that is, a state of wanting to know more (Knogler et al. 2015). Although the main focus in the CAMIL is on situational interest, as IVR provides an ideal way of triggering and maintaining situational interest, we recognize that this may develop into an individual interest (i.e., a disposition to reengage content over time). The empirical articles that have investigated affective outcomes of educational interventions in IVR compared to less immersive media have generally been consistent in finding higher levels of presence (e.g., Buttussi and Chittaro 2018; Makransky and Lilleholt 2018; Makransky et al. 2019b; Parong and Mayer 2018), and interest (Makransky et al. 2020c; Parong and Mayer 2018). Presence can foster the conditions necessary for sparking a situational interest in the learner (positive relation, path 5 in Fig. 2). Situational interest can be initiated by environmental stimuli, often of a novel and intense nature (Hidi and Renninger 2006; Renninger et al. 2008). Feeling a high level of presence in a realistic virtual environment may constitute such a novel and intense experience, triggering one’s interest in the moment. In addition, a high degree of agency in virtual environments can have a positive effect on learners’ situational interest (positive relation, path 6 in Fig. 2), as exemplified by Schraw et al. (2001), who accounted for the role of choice and autonomy in increasing situational interest in the classroom.

Intrinsic Motivation

Intrinsic motivation refers to engaging in an activity for the built-in satisfaction associated with the activity itself, rather than for some separate consequence (Deci and Ryan 2000). Self-determination theory (SDT; Deci and Ryan 2015) highlights autonomy, competence, and relatedness as important needs that should be met in order to develop intrinsic motivation. The empirical articles that have compared IVR to less immersive media have also consistently identified higher levels of enjoyment (Makransky and Lilleholt 2018; Meyer et al. 2019), and intrinsic motivation (Makransky and Lilleholt 2018; Olmos-Raya et al. 2018; Villena Taranilla et al. 2019) in IVR-based lessons. Previous literature using structural equation modeling has also identified an affective path related to learning with IVR, where higher presence was associated with higher motivation and enjoyment and thereby more perceived learning (Makransky and Lilleholt 2018). Being in the presence of a perceived real virtual instructor (social presence) capable of providing positive feedback may satisfy learners’ need for competence as well as social relatedness, thereby leading to higher intrinsic motivation for the activity (positive relation, path 7 in Fig. 2; Deci and Ryan 2015). According to social agency theory (Mayer 2014a), such social interactions are motivating to the extent that they activate a social response in the learner, leading to the exertion of cognitive activity in order to make sense of the learning material. According to the CAMIL, agency during immersive learning also affects the level of intrinsic motivation felt by the learner. This link can be explained by SDT, which holds that providing individuals with choice and acknowledgement of their internal perspective enhances their sense of autonomy and thereby their intrinsic motivation (Deci and Ryan 2015). 
According to the Control-Value Theory of Achievement Emotions (CVTAE), achievement activities of high perceived value and controllability trigger enjoyment in the learner (Pekrun 2006). Pekrun (2006) cites Skinner (1996) in his description of perceived control as one’s perceived causal influence over actions. This is directly related to agency, and we thus can infer that high agency during immersive learning instigates intrinsic motivation (positive relation, path 8 in Fig. 2).

Self-Efficacy

Self-efficacy refers to one’s perceived capabilities for learning or performing actions (Schunk and DiBenedetto 2016). In a meta-analysis, Sitzmann (2011) concluded that computer-based simulation games can increase self-efficacy by 20%. Several empirical studies investigating the effect of IVR-based lessons on self-efficacy have also identified positive effects (e.g., Buttussi and Chittaro 2018; Klingenberg et al. 2020; Makransky et al. 2019a, 2020a; Petersen et al. 2020). The CAMIL builds on the work from Bandura (1977), who describes four major sources of information that can increase expectations of personal efficacy. The strongest is performance accomplishments, followed by vicarious experience, verbal persuasion, and physiological states. The CAMIL describes how a high sense of presence and agency leads learners to experience activities in a virtual lesson as performance accomplishments because they perceive the virtual experience as “real” (positive relation between presence and self-efficacy, path 9, Fig. 2), and feel like they are in control of their actions (positive relation between agency and self-efficacy, path 10, Fig. 2). This is in contrast to other media such as a video, which provides learners with a vicarious experience rather than a mastery experience.

The predictions are supported by previous literature including a meta-analysis by Gegenfurtner and colleagues (2014) which concluded that higher levels of interaction and user control result in higher estimates of self-efficacy. Immersive simulations can increase self-efficacy through immediate high-fidelity feedback on one’s actions and choices (Makransky et al. 2020b). The relation between presence and self-efficacy has also been identified in previous research. Makransky and Petersen (2019) used structural equation modeling to identify a positive path between presence and self-efficacy which went through intrinsic motivation.

Embodiment

The term embodiment can be used to describe the sensations that arise as part of “being inside, having, and controlling a body” (Kilteni et al. 2012, p. 375). Embodiment is a central part of embodied cognition, which suggests that the way we think and make sense of the world depends on our sensorimotor system and bodily interactions with the environment (Wilson 2002). In general, this view emphasizes the role of the body in human experience and links it with cognitive processes (Stolz 2015) as well as affective processes (i.e., when emotions involve bodily sensations; Furtak 2018). In IVR, embodiment refers to the experience of owning a virtual body (body ownership), which can be influenced by the external appearance of the body, the ability to control the actions of the body (agency), and the possibility to feel the sensorial events directed to the body (such as touch; Kilteni et al. 2012; Longo et al. 2008). In the CAMIL, presence, and especially self-presence, is posited to be associated with increased levels of embodiment experienced by learners in IVR (positive relation, path 11 in Fig. 2; Biocca and Frank 1997). As mentioned above, self-presence refers to experiencing a virtual self as the actual self in either sensory or non-sensory ways, and is thus closely linked to feelings of embodiment. Likewise, feeling in control of the actions of the body (agency) is positively linked to embodiment (positive relation, path 12 in Fig. 2; Gonzalez-Franco and Peck 2018).

Cognitive Load

Cognitive load theory (CLT; Sweller et al. 2011) and the cognitive theory of multimedia learning (CTML; Mayer 2014b) describe how cognitive overload occurs when the information to be processed during learning exceeds the limited capacity of working memory. Cognitive load (CL) is posited to be caused by the cognitive demands involved in the learning task, and it is a multifaceted construct consisting of intrinsic and extraneous load (Kalyuga 2011; Sweller 2010). Intrinsic CL is influenced by the number of elements that must be processed simultaneously in working memory and the expertise of the learner (Van Merriënboer and Sweller 2005). Extraneous CL is dependent on the design of the learning task based on how information is presented to the learner. A number of studies have identified CL as a specifically important component of understanding the learning process when learning in IVR (Makransky et al. 2019b; Meyer et al. 2019; Moreno and Mayer 2002; Parong and Mayer 2018, 2020). This research suggests that learning in IVR leads to higher extraneous CL than learning in less immersive media, and highlights the importance of considering CL when designing IVR learning tools. IVR systems increase the size of the visual field of view compared to a monitor, which can increase presence. However, this can also increase extraneous CL because learners have to find relevant content, specifically when the content includes seductive details that are not necessary for learning (Makransky et al. 2020a). Therefore, CAMIL describes a positive relationship between presence and extraneous CL (positive relation, path 13 in Fig. 2). Extraneous CL can also result from high levels of agency (positive relation, path 14 in Fig. 2). Makransky et al. (2020a) describe how an IVR intervention, which allowed for more agency by giving learners autonomy to view the content of their choice and to interact with features in an interactive immersive laboratory simulation, was not optimal or conducive to learning because this might have led to extraneous CL, whereas learners who watched a video of the same content only viewed an optimal run, leading to less extraneous CL.

Self-Regulation

Self-regulation is defined as “the ability to manage one’s behavior, so as to withstand impulses, maintain focus, and undertake tasks, even if there are other more enticing alternatives available” (Boyd et al. 2005, p. 3). Students who successfully self-regulate generate thoughts, feelings, and actions to attain their learning goals (Zimmerman 2013). IVR-based lessons can potentially facilitate this process through high levels of social presence, which makes it possible to increase self-regulated learning through meaningful interactions with peer avatars or pedagogical agents (potential positive relation, path 15 in Fig. 2; Makransky et al. 2019c). Nevertheless, immersive learning environments are highly engaging, yet cognitively demanding due to high levels of presence, so self-regulated learning can suffer when immersive lessons do not provide natural reflection opportunities (potential negative relation, path 15 in Fig. 2; Makransky et al. 2019b). This is the case because a highly engaging environment with high levels of presence and agency may cause the learner to not actively monitor or adapt their affective, cognitive, metacognitive, and motivational processes unless lessons are heavily scaffolded (Meyer et al. 2019; Makransky 2020; Parong and Mayer 2018). An example is that students may be tempted to engage in hedonic activities (Van Der Heijden 2004). Hedonic information systems focus on the fun aspect of using the system and are designed to provide self-fulfilling rather than instrumental value to the user, thereby encouraging prolonged rather than productive use. Although these activities lead to more interest and enjoyment, they can also result in more superficial learning strategies and thus lower learning and transfer (Makransky et al. 2020a). 
Even when IVR systems are not specifically designed to engage learners in hedonic activities, there is a risk that learners are overwhelmed by the engaging activities they experience in IVR, and the high levels of agency and presence. By definition, reflection requires a momentary decoupling from one’s activities, which may be undermined by high presence and agency (potentially negative relations; paths 15 and 16 in Fig. 2, respectively). Nevertheless, more agency in the form of activating participants in their own learning process can also provide opportunities for self-regulated learning. Introducing reflection activities that prompt metacognition and deeper learning within, or after, IVR is therefore critical (potentially positive relation; path 16 in Fig. 2; Makransky et al. 2020a). Consequently, unlike the other paths in CAMIL, where the direction of each path is hypothesized, the directions of paths 15 and 16 are contingent on whether the instructional design components of an immersive lesson explicitly facilitate self-regulation.

What Are the Different Learning Outcomes Included in the CAMIL?

The types of learning outcomes predicted by the CAMIL include factual, conceptual, and procedural knowledge, and transfer of knowledge; all of which have been identified in connection with uses of IVR in education (Radianti et al. 2020). These are in part based on Anderson et al.’s (2001) taxonomy for learning, teaching, and assessing, including factual knowledge, conceptual knowledge, and procedural knowledge. We added transfer of knowledge as it is often cited as the ultimate outcome of education (Mayer 2014b; Prawat 1989). In the following, we define each of these learning outcomes and briefly describe their relevance in relation to IVR. This is followed by a description of the connection between the six affective and cognitive factors that play a role in immersive learning described in CAMIL and these learning outcomes.

Factual and Conceptual Knowledge

Factual knowledge is defined as knowledge of discrete, isolated content elements or “bits of information” (Anderson et al. 2001, p. 45). These bits of information can include knowledge of terminology and knowledge of specific details and elements. Conceptual knowledge is defined as knowledge of “more complex, organized knowledge forms” (Anderson et al. 2001, p. 48). Conceptual knowledge can include classifications and categories, principles and generalizations, and theories, models, and structures. Parong and Mayer (2018) found that an immersive VR lesson was less effective than a PowerPoint lesson for acquiring factual knowledge, but they found no significant difference in conceptual knowledge acquisition. Other studies that have compared immersive and non-immersive learning systems have not differentiated between factual and conceptual knowledge and instead combined these into declarative knowledge. This research shows mixed findings. For instance, Webster (2016) found that VR-based instruction produced higher declarative knowledge learning gain scores than lecture-based instruction. Makransky et al. (2019a) report no significant differences when comparing an immersive simulation to a desktop simulation or a booklet for declarative knowledge acquisition. Finally, Makransky et al. (2019b) found that an immersive VR simulation was less effective in terms of developing declarative knowledge compared to the same simulation presented on a desktop computer. These results suggest that IVR is not necessarily the ideal medium for teaching factual knowledge, and that the exact mechanisms that cause IVR to be more or less effective for developing factual and conceptual knowledge depend very much on how the IVR lesson is designed.

Procedural Knowledge

Procedural knowledge is defined as knowledge about how to do something (Anderson et al. 2001), and reveals itself through behavior (e.g., knowing how to drive a car) rather than conscious recollection. A recent systematic review found that IVR was used most frequently to teach procedural-practical knowledge (Radianti et al. 2020). One reason for this is that IVR provides optimal conditions for rehearsing procedures, through the provision of appropriate sensors such as hand-control devices, gloves, or camera-based real hand tracking, making it possible to slow down the performance of a procedure or rehearse it an unlimited number of times. IVR has been used to build procedural knowledge especially for procedures that are difficult or dangerous to train in real life, such as fire safety behavior (Sankaranarayanan et al. 2018), complicated surgical procedures (Xin et al. 2019), or flying planes (Oberhauser and Dreyer 2017).

Transfer

Transfer of learning refers to situations where learning that has taken place in one context impacts performance in another context, and is considered a key educational concept and goal (Perkins and Salomon 1992). By providing virtual simulations of real-life performance situations, IVR can enhance the transfer of learning to actual real-life situations (e.g., Makransky et al. 2019a). Such transfer can be either procedural (e.g., in the case of using skills learned in a fire safety simulation during a real-life fire accident) or conceptual (e.g., when a virtual “tour” of the human brain via IVR impacts performance on a real-life test of brain anatomy).

How Do the Six Affective and Cognitive Factors Lead to Learning Outcomes?

Heightened levels of situational interest, intrinsic motivation, self-efficacy, embodiment, and self-regulation and lower levels of cognitive load can have positive effects on learning outcomes, as predicted by the CAMIL (see Fig. 2). Below we go through these paths individually.

According to Harackiewicz et al. (2016), situational interest promotes learning by increasing the learner’s attention and engagement, making learning feel effortless (positive relation, path 17 in Fig. 2).

Intrinsic motivation influences learning by exciting persistence and curiosity in the learner (positive relation, path 18 in Fig. 2; Dev 1997). Enjoyment which results from intrinsically motivating learning activities is assumed to positively impact learning through facilitating the use of flexible, creative learning strategies as proposed in the CVTAE (path 19 in Fig. 2; Pekrun 2006). By keeping the learner’s focus on the task and inciting awareness of one’s learning process, these processes can promote factual, conceptual, and procedural knowledge, and ultimately transfer of learning.

Self-efficacy influences learning because beliefs about whether one can effectively perform the behaviors necessary to produce an outcome are a major determinant of goal setting, activity choice, willingness to expend effort, and persistence (Eccles and Wigfield 2002). These are all important and have a positive effect on academic performance and learning (Pajares 1996). In their meta-analysis, Richardson et al. (2012) found a medium correlation of 0.31 between GPA and academic self-efficacy (i.e., general perceptions of academic capability), and a correlation of 0.59 between GPA and performance self-efficacy (i.e., perceptions of academic performance capability). The CAMIL predicts a positive relation between self-efficacy and learning outcomes (path 19 in Fig. 2).

The theories of embodied cognition suggest that there is a connection between motor and visual processes; and the more explicit the connection the better the learning, suggesting that embodiment is important for learning. Agency through appropriate interaction fidelity can facilitate learning, as direct manipulation of external representations of materials is an implicit part of learning. Evidence suggests that when a motoric modality is added to the learning experience, more neural pathways are activated, which results in more learning or memory trace (Broaders et al. 2007; Goldin-Meadow 2011). Furthermore, there is evidence that learning increases when bodily interactions and visual features of a particular concept are coordinated (Jang et al. 2017). This is the case when physical activities are meaningful for the learning outcome, such as manipulating an object to understand its physical dimensions. In the CAMIL, the relationship is outlined through a higher level of self-presence and agency, which is associated with embodiment and embodied learning experiences. Embodiment is especially relevant for developing procedural knowledge (e.g., Kilteni et al. 2013), but may also strengthen neural pathways during factual/conceptual learning, and thus lead to the development of factual and conceptual knowledge (positive relation, path 20 in Fig. 2). Ultimately, this can reveal itself as enhanced transfer performance.

CL is an important factor in CAMIL because it provides an understanding of the complexity that occurs when designing IVR learning experiences. This is the case because more complex visual representations and details, which represent better representational fidelity and can lead to higher presence, may also lead to virtual environments that result in higher extraneous CL and less learning. This is the case when these features are seductive details (also referred to as “bells and whistles”) that are not relevant for learning (Moreno and Mayer 2002). Similarly, more agency does not necessarily mean more learning when it can lead to more extraneous CL. Makransky et al. (2020a) describe how viewing a science simulation on video led to better factual knowledge acquisition than learning the same content while controlling the interaction and viewing in an IVR simulation, presumably due to extraneous CL. As such, extraneous CL negatively influences the learning of factual, conceptual, and procedural knowledge, as well as the transfer of this knowledge (negative relation, path 21 in Fig. 2). The goal of instructional design is to optimize learning by reducing the degree of unnecessary processing (extraneous CL) produced by the learning task, while simultaneously increasing cognitive engagement, so that the learner’s limited cognitive resources can be used to engage in the type of processing that is necessary for learning (Moreno and Mayer 2007).

Self-regulation is also an important yet complicated factor when designing immersive lessons. This is the case because immersive lessons can be designed to help learners self-regulate, but they can also add enticing alternatives in the form of seductive details (Moreno and Mayer 2002). Several studies have investigated generative learning strategies as ways of stimulating self-regulation during learning. Generative learning theory (Fiorella and Mayer 2016; Wittrock 1974) suggests that learning is the “process of generating and transferring meaning for stimuli and events from one’s background, attitudes, abilities, and experiences” (Wittrock 1989, p. 93). Initial research suggests that the generative learning strategy of summarization improves factual and conceptual knowledge when applied within an IVR science simulation (Parong and Mayer 2018). Furthermore, Makransky et al. (2020a) found that the generative learning strategy of enacting improved procedural knowledge and transfer when used after an IVR simulation, but not when the same lesson was presented as a video. Finally, Klingenberg et al. (2020) found that the generative learning strategy of summarization increased retention of declarative knowledge when conducted following an IVR simulation, as compared to the same lesson in a desktop simulation. In summary, self-regulation promotes factual, conceptual, and procedural knowledge, as well as transfer (positive relation, path 22; Sitzmann and Ely 2011); however, the extent to which an immersive learning experience facilitates self-regulated learning depends greatly on how the lesson is designed and implemented. This is described in more detail in the following sections.

What Are the Implications for Future Research Based on CAMIL?

The CAMIL provides several important implications for future research in the field of immersive learning. Rather than conducting media comparisons that could potentially lead to a body of research that may not show any consistent differences between modalities, CAMIL identifies specific affordances of learning in immersive virtual environments, and proposes that future research should attempt to understand how these affordances interact with different instructional methods. This view prioritizes research which investigates interactions between media and instructional methods. Therefore, future research should investigate if motivational and learning theories generalize to immersive environments, and should specifically test the proposal that there will be an interaction between media and methods when an instructional method facilitates one of the two affordances of learning in IVR: presence and agency. This will ultimately provide a better understanding of learning in a particular modality such as IVR.

Another relevant perspective to consider is the hype factor often associated with educational VR. In general, emerging technologies are said to progress through different levels of hype (i.e., expectations that surround a technology over time from its initial launch; Fenn and Blosch 2020). Such expectations are often exaggerated at first due to media coverage and heavy marketing campaigns (Fenn and Blosch 2020). Similar to the views provided in Chandler (2009), we argue that it is important to look beyond the “wow” factor of dynamic visualizations for instruction. If instructional support and learning processes are not emphasized, even the latest technological advancements may have limited instructive value. Hype may have benefits in itself as demonstrated by the novelty effect, where people show increased effort and attention when dealing with media that are new to them (Clark 1983). Consequently, level of prior experience with IVR is an important external factor to control for, as we discuss later in this manuscript. Nevertheless, the effect of novelty is only transient and may never take the place of instructional support.

Although the CAMIL is based on empirical research and existing educational theories, few empirical studies have specifically tested the paths outlined in the model. More research is therefore needed to test, extend, and revise the model. For instance, more studies are needed to investigate the antecedents of presence and agency. Furthermore, the CAMIL provides a theory of change that can help describe how the affordances of presence and agency can lead to learning outcomes through affective and cognitive factors. More research is needed because embodiment, cognitive load, and self-regulation are complicated theoretical frameworks which can be used to design immersive learning interventions that can facilitate learning, but these frameworks can also help understand why immersion can be detrimental to learning. Research that specifically untangles these complicated relationships is thus needed. More research is also needed to establish how factors such as interest, intrinsic motivation, and self-efficacy can mediate the relationship between presence and agency and different learning outcomes. The CAMIL differentiates between factual, conceptual, and procedural knowledge and transfer, but few studies within the field of immersive learning differentiate between these constructs. Future research should use knowledge taxonomies to differentiate between different learning outcomes in order to provide a better understanding of how the affordances of learning in immersive environments benefit different learning outcomes. Knowledge transfer is specifically relevant as immersive simulations can simulate realistic settings where the knowledge is ultimately to be used; however, immersive lessons are also typically situated, which could make it difficult to transfer knowledge to a different situation.

What Are the Implications for Instructional Design Based on CAMIL?

The implications for instructional design are that IVR learning tools should be developed with a focus on the affordances of IVR. This would suggest that designers should emphasize immersion, representational fidelity, and control factors when developing IVR learning tools, thereby increasing presence and agency. Specifically, this would mean designing the environment for use with an HMD, ensuring a realistic display of the environment and smoothness of view changes, and affording a high degree and immediacy of control. By doing so, instructional designers can create realistic experiential learning opportunities, allowing learners to perform tasks that would be impossible, impractical, or too expensive to perform in the real world (Dalgarno and Lee 2010). In general, as described by the CAMIL, high presence and agency obtained through immersive learning experiences can facilitate interest, intrinsic motivation, self-efficacy, and embodiment and thereby facilitate learning. At the same time, however, it is important to consider cognitive load and self-regulation when designing immersive learning scenarios. This includes reducing extraneous processing. Instructional design principles that could be specifically relevant for immersive learning environments include the coherence principle (Mayer and Fiorella 2014; e.g., leave out irrelevant material that leads to hedonic activities but does not support learning) and the signaling principle (Mayer and Fiorella 2014; e.g., help learners focus their attention on relevant features in a way that does not diminish their presence). Instructional design principles that help manage essential processing are also specifically relevant for IVR, including the pre-training (Meyer et al. 2019), segmentation (present multimedia messages in learner-paced segments; Parong and Mayer 2018), and modality principles (use spoken rather than printed words in a multimedia message; Mayer and Pilegard 2014).
Furthermore, instructional design principles such as the embodiment principle (Mayer 2014a) are especially relevant for immersive learning environments because they foster generative processing through higher social presence. Self-efficacy can also be increased by using the feedback principle (Johnson and Priest 2014). Adding self-regulation activities to manage essential processing during immersive learning is also important. Reflection activities can take the shape of explaining the content of a lesson to an avatar peer in VR, or to a real peer after an IVR lesson (Klingenberg et al. 2020), or summarizing after segments of an IVR lesson (Parong and Mayer 2018).

Important External Factors that Influence the CAMIL

There are a number of overarching factors that do not appear in the CAMIL but nonetheless influence the model. These encompass usability, social factors, and a range of individual differences variables, including age, tendency to experience cyber sickness, working memory, personality, predisposition towards absorption, and spatial ability.

In the context of IVR, usability can be defined as the extent to which the IVR system can be utilized by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use (ISO 2018). Thus, usability has to do with the outcome of interacting with a system, and can be understood in terms of user performance and satisfaction (ISO 2018). Importantly, however, suitable system attributes can be instrumental in making a system usable (ISO 2018). This means taking important factors such as system design into account. Why is it important to emphasize usability when dealing with educational IVR, and specifically in the CAMIL? In general, the intended users might not be able or willing to use the system if the usability is low (ISO 2018). Low usability, e.g., by virtue of errors in navigating, may also result in breaks in presence. Likewise, low usability may impair the users’ sense of agency through limiting their control. Additionally, the role of social influence is included in theories such as the technology acceptance model (Venkatesh and Davis 2000), and can influence learning with immersive technology.

It is also important to recognize that learners may possess different degrees of certain traits and dispositions that can moderate the impact of IVR learning interventions. For instance, younger users have been shown to be more likely to accept immersive technologies compared to older users (Suh and Prophet 2018). Cyber sickness is also a factor known to occur for some users of HMDs (Munafo et al. 2017), and can diminish learning from IVR by shifting the learners’ focus. Munafo et al. (2017) found indications of a gender difference in the occurrence of cyber sickness, with women being more susceptible than men. Cognitive differences in variables, such as spatial ability, may also contribute to variability in IVR learning between individuals (Li et al. 2020). Although no extensive investigations, to our knowledge, have been carried out with respect to the influence of personality traits on IVR learning, Watjatrakul (2016) found that neuroticism and openness to experience influence university students’ intentions to adopt online learning programs by influencing their perceived value of online learning. This highlights the role of personality in learning with technology. Furthermore, the tendency of the learner to become absorbed in activities may make them more inclined to experience spatial presence (a combination of physical presence and self-presence; Wirth et al. 2007). Prior experience and familiarity with IVR are also essential factors to consider, as measuring them makes it possible to investigate novelty effects.

Conclusion

In conclusion, CAMIL extends previous research and theory from the fields of virtual reality, multimedia, educational psychology, and educational technology, to describe how IVR can lead to factual, conceptual, and procedural knowledge acquisition, as well as transfer of learning. Recent reviews and meta-analyses of IVR in education have highlighted challenges facing the research in this field, including not using learning theories (Radianti et al. 2020; Wu et al. 2020) and lack of theoretical and methodological rigor (Jensen and Konradsen 2018; Radianti et al. 2020). Given that the number of research articles in the field of immersive learning is rapidly increasing, we hope that researchers will include measures of the variables included in the CAMIL, as well as the external factors, when conducting research on the use of IVR for learning. This would make it possible to test the paths and relationships depicted in the CAMIL as well as further refine the model. Since the model is developed based on empirical research, we expect and encourage researchers to empirically test the assumptions and predictions made by the model, and to include other relevant variables. We also encourage researchers to investigate the influence of external factors on the different variables and relationships in the model.