1 Introduction

Whether through symbolic games, stories, television, or video games, children spend a significant amount of time in contact with fictional worlds and fantastical stories. In a systematic review of a selection of popular books, movies, and TV series for children aged three to six (i.e., top-selling books, shows with best TV ratings, movies with strong ticket sales, and movies and TV shows rented or streamed in 2016), Goldstein and Alperson (2020) found that 91.6% of these selected media contained fantastical elements. In educational settings, too, learning material is often integrated into a fantastical context. In their review of a sample of 252 educational books and videos, Chlebuch et al. (2022) found that 81% contained at least one fantastical element. Anthropomorphic representation of characters is a frequent way of incorporating fantastical elements in educational materials. According to Goldstein and Alperson (2020), this is the fantastical element most often used across all types of media, accounting for 69.5% (81.4% in the case of TV shows). Despite the large use of anthropomorphic characters in recreational and educational settings, it remains somewhat uncertain whether they benefit young children’s learning. Investigating whether fantastical elements, and more specifically anthropomorphism, impact preschool children’s learning is the primary goal of this study.

By allowing high levels of control with high psychological relevance (Blascovich et al. 2002), virtual reality (VR) is ideally suited to investigate our research question. For several years, immersive VR devices, such as VR headsets, have been offering a new way to explore different issues related to learning and memory, with increased ecological validity and logistic ease (Smith 2019). Moreover, with VR, it is possible to dynamically simulate not only realistic settings but also fantastical elements in a realistic manner, thus allowing for the investigation of how fantastical elements impact learning. Nevertheless, results remain mixed regarding the advantages of immersive over non-immersive VR for learning (e.g., Babu et al. 2018; Buttussi and Chittaro 2018; Makransky et al. 2019; Makransky and Mayer 2022; Olmos-Raya et al. 2018; Parong and Mayer 2018). One of the main advantages of VR headsets compared to more traditional media, such as television or tablets, is their ability to completely immerse the user in a new world (Slater 2003). We think that this immersive aspect could influence the perceived realism of fantastical elements. Indeed, a highly immersive environment could make the fantastical elements that surround children appear more realistic. To our knowledge, no study has addressed this issue in preschool children. Investigating the impact of immersion and the possible combined impact of immersion and realism on learning is the secondary goal of our study.

1.1 Children learning with fantasy

Fantastical elements are often present in the educational materials of young children. Nevertheless, the impact of such materials on children’s learning and their effectiveness have not been clearly attested. Indeed, the current findings are in some measure contradictory. While researchers agree that young children can learn from fantasy (e.g., Weisberg and Hopkins 2020; Hopkins and Lillard 2021; Hopkins and Weisberg 2021), there is also evidence of fantasy hampering learning, especially when it comes to the transfer of knowledge (e.g., Bonus and Mares 2019; Richert and Smith 2011; Walker et al. 2015).

One of the arguments made for the effectiveness of fantastical elements in learning is the way in which these elements may capture children’s attention and thus help them stay engaged with the learning material. Hopkins and Weisberg (2021), for example, observed that children learned science-related educational content better with a story containing improbable events that violated the laws of physics than with an ordinary story. This effect was found to be even more important when the violation concerned a domain in which the children had knowledge, which in turn would indicate to them to remain particularly vigilant about what they were told (Hopkins and Weisberg 2021). Beyond simply capturing children’s attention, these elements may even induce deeper processing of the to-be-learned material (Weisberg and Hopkins 2020).

However, learning with fantastical material can be counterproductive when it comes to transferring what has been learned in an unrealistic context to the real world, especially in the long run (Bonus and Mares 2019). Indeed, when learning with fantastical content, children must evaluate whether what they are learning is true only in the world of the story, or whether the new information is also applicable to the real world. This evaluation is called the “reader dilemma” (Hopkins and Weisberg 2017). In a study by Richert and Smith (2011), it was more difficult for 3.5- to 4.5-year-olds to transfer the solution to a problem from a story to the real world when the story involved a fantastical character (i.e., a giant robot) than when it involved a realistic character (i.e., a human being).

Different types of learning are possible through fictional content (e.g., science, history, geography). Among these, the effectiveness of cultural learning has often been investigated (Bonus and Mares 2019; Borzekowski and Macha 2010; Cole et al. 2003; Goldstein et al. 2022). One of the main challenges young children face in this type of learning is to correctly characterize the learning of a new culture, very different from their own, as something that can really exist rather than merely invented for the story (Mares and Sivakumar 2014). For example, in a study by Bonus and Mares (2019), three-to-five-year-olds watched a short nine-minute video of Sesame Street that described aspects of Hispanic culture. They were then asked to answer comprehension questions and judge the realism of the educational content. A week later, they were asked to prepare a party for a friend from a Hispanic country (and think about what food should be prepared, what music should be played, etc.). The authors observed that the success or failure of this transfer depended not only on memory but also on children’s judgment of the realism of the video. Better memory predicted better performance on the transfer task, but only for children who remembered the content as realistic. The authors explained the underlying mechanism by arguing that if children consider educational material to be unreal, they might not integrate it into their knowledge about the real world. Another explanation is that children might invest less effort in processing material that they consider unreal (Bonus and Mares 2019). Some studies have also shown that the number of fantastical elements in a learning material is related to children’s learning performance (Geerdts 2016; Richert and Schlesinger 2022). 
It seems that a high number of fantastical elements tends to negatively impact children’s learning; thus, it has been proposed that learning material should contain enough fantasy to attract children’s attention and motivate them to learn, but not so much that it overwhelms them with information they are not able to process (Weisberg and Richert 2022).

Children Learning with Anthropomorphic Characters

Anthropomorphism, which involves attributing human-specific characteristics to nonhuman animals or objects (Geerdts 2016), is a prominent way of incorporating fantastical elements into educational materials (Goldstein and Alperson 2020). Research on the effect of anthropomorphism on children’s learning has yielded mixed results. For instance, Larsen et al. (2018) reported that children had more difficulty understanding the moral lessons conveyed in educational materials featuring anthropomorphic characters compared to human characters. Furthermore, studies have suggested that anthropomorphism may have a negative impact on children’s learning by increasing the likelihood that they will apply anthropomorphic qualities (e.g., the entity can talk, the entity can be sad) to real-world entities (Conrad et al. 2021; Ganea et al. 2014; Li et al. 2019). On the other hand, some research did not find a significant negative effect of anthropomorphism in learning materials (Conrad et al. 2021) or observed that it could, in some cases, enhance children’s learning outcomes (Bonus and Mares 2018; Geerdts et al. 2016).

The mixed effects of anthropomorphism, similar to the more general impact of fantasy elements, have been linked to their more or less effective integration within educational materials. Researchers have noted that a crucial factor is the connection between fantastical elements and the educational goals, as poorer learning outcomes were observed when these elements were not directly linked to the educational goals (Richert and Schlesinger 2017). Additionally, the degree of anthropomorphism has been shown to influence the quality of learning. Excessive anthropomorphism may overstimulate children and distract them from the learning content, while insufficient anthropomorphism may fail to capture their attention and focus (Bonus and Mares 2018; Conrad et al. 2021; Geerdts 2016). However, it has been highlighted that the use of anthropomorphism in books and TV shows is not independent of the explicitness of educational content (the higher the anthropomorphism, the higher the explicitness), and this confound could also account for the mixed results (Nguyentran and Weisberg 2023).

1.2 Learning with virtual reality

New technologies, such as VR, are occupying an increasingly important place in our society. VR can be defined as a computer-generated environment that simulates and/or reproduces various aspects and features of the physical world (Araiza-Alba et al. 2021; Makransky and Lilleholt 2018). The arrival of these technologies in the world of education through the digitization of educational tools and materials was inevitable. However, there is still a lack of empirical evidence on how they affect children’s learning and memory.

In the cognitive theory of multimedia learning (CTML), Mayer (2009) proposed three types of cognitive processes that an adult individual must deal with during multimedia learning: essential processing, generative processing, and extraneous processing. Essential processing is the cognitive processing essential to the mental representation of the learning material. Generative processing allows one to make sense of the learning material and to understand it, and extraneous processing is any cognitive processing that does not support the learning objective and that, on the contrary, can actually distract the learner. Given that processing capacities are limited, one of the goals of VR should be to promote generative processing without adding extraneous processing.

It has been suggested that VR can promote generative processing by providing a realistic experience of being in a particular situation (Slater and Wilbur 1997). VR devices are usually differentiated according to the degree of immersion they offer. Experiencing content—whether it is a game or a simple presentation—on a desktop computer is referred to as low-immersion VR or desktop VR (D-VR). In contrast, what is referred to as high-immersion VR or immersive VR (IVR) is when content is experienced with a VR headset. A sense of presence is supposed to help the user actively engage in the learning process and to improve learning outcomes. It has been shown that immersion can have a positive impact on an individual’s sense of presence, as well as on the ease of learning (Makransky and Mayer 2022).

Researchers have observed that learning in IVR is as effective (e.g., Buttussi and Chittaro 2018; Leder et al. 2019), and sometimes more effective (e.g., Babu et al. 2018; Olmos-Raya et al. 2018; Rupp et al. 2019) than learning with D-VR. Among the arguments put forward in favor of learning with IVR, both the immersive and motivational aspects of IVR have been advanced, as well as the interactive experience it offers. For example, Makransky and Mayer (2022) investigated the impact of a virtual field trip to Greenland in a sample of middle school students and found that IVR outperformed D-VR in the aspects of presence, enjoyment, interest, and retention in immediate and delayed follow-up tests.

However, other studies have shown that IVR can also be less effective than D-VR when learning (Makransky et al. 2019; Parong and Mayer 2018). Among the arguments explaining the lower efficiency of IVR, researchers have pointed to cognitive overload due to the addition of extraneous processing, which offsets the gain in generative processing (Makransky et al. 2019). In that study, extraneous processing was mainly due to the overload of information to be considered during the IVR experiment and the participants’ lack of experience with IVR. Indeed, VR headsets are still relatively new for the wider population, and for many participants these experiments are their first contact with IVR. Meyer et al. (2019) have shown that this extraneous processing can be reduced when the participants train and become accustomed to VR headsets. For IVR to be an effective learning medium, the learning experience must be specifically designed to take advantage of this medium (Makransky and Mayer 2022).

Finally, most studies concerning the effectiveness of learning in IVR have focused on an adult population. To our knowledge, no study has yet investigated the use of IVR in culturally anchored learning in young children. One reason for this may be that current IVR devices have been designed with adults and young adults in mind as primary users and are not necessarily suitable for younger children. Before extending their use to a younger audience, it would be essential to observe their effects and effectiveness in the learning of such a population.

1.3 Present study

The main goal of this study was to investigate children’s ability to learn from a virtual presentation according to its realistic (vs. anthropomorphic) characteristics. We chose to investigate children between the ages of four and six because this is a key age in the development of children and their relationship with fantasy. Indeed, from the age of four, a young child can differentiate between a fantasy character and a real person (Woolley and Cox 2007). Nevertheless, this ability still develops over the course of young children’s lives, and children of about five years old will have an easier time making this distinction than children of three and four years old (Corriveau and Harris 2015; Martarelli and Mast 2013; Martarelli et al. 2015; Sharon and Woolley 2004). In the present study, we manipulated the realism of the presentation according to the appearance of the avatar giving the presentation. In the realistic condition, the presentation was made by an avatar with the appearance of a young girl. In the anthropomorphic condition, the presentation was made by an avatar with the appearance of an anthropomorphic animal (goat). The second goal of this study was to investigate the possible impact of immersion on realistic/anthropomorphic manipulation, as well as the overall impact of immersion on learning. We manipulated the immersion of the presentation via the medium: half of the children followed the presentation with an Oculus Quest 2 VR headset (IVR condition) and the other half with a tablet (D-VR condition).

The presentation that the children had to follow concerned China and its culture. We opted for culturally anchored learning because the contextual immersion that VR offers can be particularly favorable to this type of learning (Makransky and Mayer 2022). Additionally, we chose cultural learning because it is prominently featured in educational programs for preschool children, such as Sesame Street, and has thus been extensively investigated, yielding promising results (Bonus and Mares 2019; Borzekowski and Macha 2010; Cole et al. 2003; Goldstein et al. 2022). To assess learning and memory, the children completed two tasks, a new/old recognition task and a quiz task, immediately after the presentation and one week later. To specifically measure the transfer of the information acquired during the virtual presentation into the real world, the children were also asked to complete a transfer task one week after the presentation. Finally, we controlled for verbal comprehension, theory of mind abilities, perception of the avatar, age, and gender. In agreement with Bonus and Mares (2019), we hypothesized that children in the realistic conditions might outperform children in the anthropomorphic conditions. Because of the mixed findings in the literature (e.g., Babu et al. (2018) show an advantage of IVR over D-VR, whereas Makransky et al. (2019) show an advantage of D-VR over IVR), our prediction about the impact of immersion on learning was tentative. Nevertheless, given the advantage of IVR for contextualizing learning (Makransky and Mayer 2022), we expected immersion to have a positive impact on the children’s learning outcomes. Moreover, regarding the interaction between realism and immersion, we expected children to learn better with the realistic avatar in IVR because immersion makes the experience more realistic.

2 Methods

2.1 Participants

The required sample size and exclusion criteria were preregistered on Open Science Framework (OSF, https://osf.io/g2x9v). An a priori G*Power analysis revealed that we needed a sample of at least 128 participants to detect a medium effect (2 × 2 between-subjects analysis of variance; parameters: f = 0.25, α = 0.05, 1 − β = 0.80; Faul et al. 2007). We increased the sample size to around 40 children per group (N = 168, 85 female, mean age = 5.4 years, range 4 to 6 years) to ensure that enough participants remained after data exclusion.
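As a rough check on the reported power analysis, the quantities behind it can be worked out by hand. This sketch assumes the test concerns a single effect with one numerator degree of freedom in a four-cell design, and that the noncentrality parameter is defined as the squared effect size times the total sample size (the convention G*Power uses for fixed-effects analyses of variance). The text does not state the exact procedure selected, so both points are assumptions.

$$\lambda = f^{2} N = 0.25^{2} \times 128 = 8, \qquad df_{1} = 1, \qquad df_{2} = N - 4 = 124$$

Under these assumptions, the minimum of 128 participants corresponds to 32 children per cell, which is consistent with the decision to recruit around 40 per group as a buffer against exclusions.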

Children were recruited from different preschools across Switzerland and were semi-randomly assigned to one of our four different conditions, which included a presentation by an anthropomorphic avatar in IVR (N = 46, 25 female, mean age = 5.39, 31 French-speaking children), a presentation by an anthropomorphic avatar in D-VR (N = 39, 15 female, mean age = 5.33, 31 French-speaking children), a presentation by a realistic avatar in IVR (N = 47, 24 female, mean age = 5.34, 33 French-speaking children), and a presentation by a realistic avatar in D-VR (N = 36, 17 female, mean age = 5.56, 28 French-speaking children). Since we tested different children from the same classroom on the same day, we decided for logistical and practical reasons to test each classroom in the same condition. Switzerland is a country with several national languages; 123 children were tested in French, and 45 in German. The different tools and tests were adapted accordingly, either with the official translation or adaptation by a native speaker. We obtained ethical approval for the study from the institute’s ethics committee, and written parental consent was obtained for each child.

2.2 Design

In this study, we manipulated the realism of the avatar (realistic vs. anthropomorphic), as well as the level of immersion (IVR vs. D-VR). This gave us a 2 × 2 design with four conditions: an anthropomorphic presentation in IVR, an anthropomorphic presentation in D-VR, a realistic presentation in IVR, and a realistic presentation in D-VR. We measured memory performance with two tasks (new/old recognition task and quiz task) and generalization with a transfer task. Control tasks were also included to assess the children’s theory of mind and language abilities. We controlled for theory of mind because of its impact on the understanding of fantasy (Martarelli et al. 2015) and for language abilities because of the verbal skills the children needed during the experiment (see Materials below). For exploratory reasons, we included five items to assess the perception of the avatar, as well as enjoyment. We used these items to check that the children effectively differentiated between the two avatars as being more or less realistic (manipulation check) and to find out whether there were conditions in which they experienced more enjoyment.

2.3 Materials

2.3.1 Virtual presentation and virtual environment

The virtual environment was designed in Unity (version 2019.4.32). Different 3D models were created using Blender software (www.blender.org) in versions 2.82 to 2.93. The textures used came from the open-source database “TextureHaven” (www.TextureHaven.com) and were edited with Photoshop (Creative Cloud 2019). The music and different sounds used came from a license-free database (www.freesound.org). Apart from the resolution (1833 × 1920 for each eye in IVR and 1080 × 1920 in D-VR) and the field of view (89° ± 4 in IVR and 87.1° in D-VR), the virtual environment was identical in IVR and D-VR. A video of the virtual presentation is available online on OSF (https://osf.io/v8bnp/).

The virtual presentation consisted of a short, two-minute description of China and certain aspects of its culture by an avatar. The virtual environment consisted of a representation of a Chinese temple in a forest (see Fig. 1). The virtual presentation was made either by an avatar with the appearance of a young Asian girl or by an anthropomorphic animal (goat), according to the testing condition of the participant (see Fig. 2).

Fig. 1 A screenshot of the virtual environment

Fig. 2 Human avatar (left) and anthropomorphic avatar (right)

Before the presentation, each participant had time to get used to the virtual environment and the viewing medium (approximately two minutes). At the beginning of the presentation, the children were told that a character was going to appear and that they had to listen carefully to what she said and observe what she showed. During the first part of the presentation, different types of information, such as the name of the character and the location of China on the world map, were revealed. During the second part of the presentation, the avatar displayed a series of 12 visual stimuli for the participant (see Appendix 1 for an illustration of the stimuli). The first six stimuli were 3D models related to the food that one can typically eat in China, three others were images of typical places to visit in China, and the last three visual stimuli were images of activities to do in China. Each of these stimuli was accompanied by a commentary from the avatar that presented the 3D models and images in more detail. The different visual stimuli were either created by the authors or obtained from a copyright-free database (www.pixabay.com). The presentation was completely dubbed with the voiceover of two native speakers for our two different versions—one in German and one in French.

2.3.2 Previous knowledge of China

To assess previous knowledge about China, we asked the children about the color of China’s flag and the position of China on a world map. More precisely, we showed the children the Swiss flag (i.e., Do you know what this is?), described the Swiss flag, and then asked the children about the Chinese flag (color and form). With the map, the procedure was similar. We showed a map as well as the position of Switzerland on the map and then asked the children about the position of China on the map.

2.3.3 Memory tasks

To assess the children’s recall of the virtual presentation, we used three different tasks: a new/old recognition task, a quiz task, and a transfer task.

New/Old Recognition Task

Designed in PsychoPy (version 2020.2.10; Peirce et al. 2019), the new/old recognition task was completed by the child on a computer. This task consisted of the presentation of a series of 12 stimuli in random order, including six old stimuli and six new stimuli. The stimuli are reported in Appendix 1. Old stimuli were taken from the 12 visual stimuli presented during the virtual presentation. The 12 new stimuli (six in each of the two sets described below) were images from the internet related to things one could visit or do in another foreign country (India). For each trial, the child had to indicate whether the item was old or new by pressing a key on the computer keyboard. The task was self-paced. To facilitate the child’s task, colored stickers were applied to the corresponding keys on the computer. To avoid any confusion, the children were told that an “old” item was an image seen in the presentation, and a “new” item was an image that did not appear in the presentation. Between trials, a fixation cross appeared for 800 ms. The child completed this task twice: once immediately after the presentation, and a second time one week later. We used two different sets (sets A and B) of 12 pictures for a total of 24 items. The order in which these sets were used (post-test vs. follow-up test) was counterbalanced.

Quiz Task

The quiz task consisted of a series of eight questions related to the presentation that the child had just completed (e.g., What was the name of the character in the story? What does her first name mean?). The questions are reported in Appendix 2. Given the young age of the children, the questions were asked orally by the experimenter. Four of these questions were related to information given only orally by the avatar during the presentation (see Appendix 2), while the other four were presented orally and visually during the presentation. This was done to ensure different levels of difficulty and to avoid ceiling effects. When children were unable to provide the correct answer, they were given four choices (one correct and three wrong), and they could choose their answer among these options. Each participant completed this task twice (same questions): the first time after the presentation, and the second time one week later.

Transfer Task

For the transfer task, we presented the children with a picture of 10 different types of food on a table (see Appendix 3). Among these 10 items, five were Chinese foods from the virtual presentation, and five were foods that did not come from China. We asked the children to indicate four types of food that were typical of China. The children completed this task only once (one week after the presentation).

2.3.4 Control tasks

Theory of Mind Abilities

To measure theory of mind abilities, we presented the children with four different theory of mind tasks, as used by Martarelli et al. (2015). These included two false belief tasks with representational change, the “Crayon Box” and the “Smarties Box” (Perner et al. 1987), and two other classic theory of mind tasks, “Maxi and the Chocolate” (Wimmer and Perner 1983) and “Mouse and the Cheese” (Clement and Perner 1994). These tasks were enacted by the experimenter for the children using test materials (i.e., colored boxes, puppets/dolls, and wooden objects). Each task was scored as either successful or unsuccessful for a total of four points. The higher the values, the better the performance. The two false belief tasks with representational change were successfully passed if the children answered the four different questions correctly (two control and two test questions), and the two classical false belief tasks were successfully passed if the children answered the three different questions correctly (one control and two test questions). The complete scripts for these tasks can be found on OSF (https://osf.io/v8bnp/).

Language Abilities

To assess language abilities, we used the corresponding subtests of the Wechsler Preschool and Primary Scale of Intelligence IV (WPPSI-IV) on active and passive language skills (Wechsler 2012). As the French and German tests had a different number of items, we decided to use the percentage of correct answers rather than the original scoring system. For the rest of the procedure, the original instructions were respected. Language abilities were tested on the second day of testing. The two scores were averaged to create a single language abilities score. The higher the values, the better the performance.
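The language scoring described above (percentage correct per subtest, then the average of the two) can be sketched as follows. This is only an illustration: the function names and the item counts in the usage note are hypothetical and are not taken from the WPPSI-IV or from the study materials.

```python
def percent_correct(n_correct, n_items):
    """Percentage of correctly answered items in one subtest."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 100.0 * n_correct / n_items


def language_score(passive, active):
    """Average the passive and active percentages into one score.

    Each argument is a (n_correct, n_items) pair. Item counts may
    differ between language versions, which is why percentages are
    used instead of raw scores.
    """
    return (percent_correct(*passive) + percent_correct(*active)) / 2
```

For example, with hypothetical counts, `language_score((10, 20), (15, 20))` averages 50% and 75% into a single score of 62.5.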

2.3.5 Exploratory items

We assessed the perception of the avatar with three items: one about the appearance of the avatar (item: “What did the character in the video look like?”, to be answered by choosing between human and animal), one about the veracity of the information she transmitted (item: “Do you believe what the character told you? Did the character tell the truth?”, to be answered by a yes/no response), and one about its possible existence in the real world (item: “Can a character like this really exist in our world? Could we meet a character like this in our world?”, to be answered by a yes/no response). Furthermore, we asked participants whether they liked the virtual presentation (item: “Did you like the virtual presentation?”). Responses to this item were given on a pictorial five-point Likert scale (1 = not at all to 5 = a lot). Finally, following Bonus and Mares (2019), we asked the children whether they thought the presentation was just for fun or for educational purposes (choice between two options).

2.4 Procedure

For each child, the study was separated into two 20-min periods on two days, one week apart. The children were tested individually in isolated rooms in their respective schools. On the first day, the children had to answer a questionnaire regarding their previous knowledge about China. They were then asked to attend to a two-minute presentation about China delivered by a virtual character (young Asian girl or anthropomorphic animal) with a VR headset or a tablet, depending on their experimental condition. Before the presentation, the children had a short time (approximately two minutes) to get used to the virtual environment and the medium used (VR headset/tablet). The presentation in IVR was rendered on an Oculus Quest 2. The tablet used during the D-VR presentation was a Samsung Galaxy Tab A7. After the presentation, the children had to answer the quiz task as well as the new/old recognition task to assess what they had retained from the presentation, as well as five exploratory items to assess how they perceived the virtual character delivering the presentation and whether they enjoyed the presentation. Finally, they had to complete the first two tasks to assess their theory of mind abilities. These tasks were carried out in a fixed order, as listed above.

One week later, the children were again asked to perform the two memory tasks, which included the quiz task and the new/old recognition task, to assess memory consolidation. After the memory tasks, the children had to perform the transfer task and finally finish the testing with the last four tasks: two assessing their theory of mind abilities and two assessing their language abilities (passive and then active language). The study also included a questionnaire sent out to the parents that was not considered in this paper. See Fig. 3 for a graphical representation of the experimental procedure.

Fig. 3 Graphical representation of the procedure

2.5 Analytical approach

Following our preregistration on OSF (https://osf.io/g2x9v), we computed different two-way analyses of variance to compare the different memory measures among the four conditions, which included the presentation by an anthropomorphic avatar in IVR, by an anthropomorphic avatar in D-VR, by a realistic avatar in IVR, and by a realistic avatar in D-VR. We report the analyses of variance in the main text and the analyses of covariance (with the covariates of age, gender, theory of mind abilities, and language skills) in Appendices 4, 5 and 6. In addition, we computed logistic regressions to analyze the answers to the four questions about avatar perception and the purpose of the virtual presentation. For the liking score, we computed a further analysis of variance. The logistic regressions, as well as this last analysis of variance, were not preregistered. The analyses were computed using jamovi version 1.6.23 (The Jamovi Project, 2021). The dataset is available on OSF (https://osf.io/v8bnp/).

3 Results

Due to dropouts and technical problems, the number of participants differed between the first day and the second day one week later; therefore, we computed analyses for each day separately. Additional analyses that consider both days in the same model, using a mixed design and focusing on a smaller sample due to the dropouts, produced the same results in terms of significance and are reported in Appendices 7 and 8. Moreover, we excluded children who had previous knowledge about China (children with a score > 2 on the basic knowledge task, n = 4) and outliers whose scores were more than three standard deviations above or below the mean. We provide more information about data exclusion for each analysis in the corresponding section.
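The three-standard-deviation exclusion rule can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `exclude_outliers`, the use of the sample standard deviation, the computation over a single list of scores (rather than, for example, per condition), and the example scores are all assumptions.

```python
from statistics import mean, stdev

def exclude_outliers(scores, n_sd=3.0):
    """Keep only scores within n_sd sample standard deviations of the mean."""
    m = mean(scores)
    sd = stdev(scores)  # sample standard deviation (assumption)
    return [s for s in scores if abs(s - m) <= n_sd * sd]

# Hypothetical scores: twenty typical values and one extreme value
hypothetical_scores = [6] * 20 + [100]
kept = exclude_outliers(hypothetical_scores)
print(len(hypothetical_scores) - len(kept), "score(s) excluded")
```

Note that with very small samples no score can lie more than three standard deviations from the mean, so a rule of this kind only has an effect in reasonably large groups.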

3.1 New/old recognition task

To correctly assess sensitivity, we used signal detection theory (Macmillan and Creelman 1991). We calculated d’ as in Martarelli and Mast (2013):

$$d' = z^{-1}\left(\text{Hit}\right) - z^{-1}\left(\text{False Alarms}\right)$$

where $z^{-1}$ is the inverse of the standard normal cumulative distribution function, Hit is the proportion of stimuli correctly categorized as old, and False Alarms is the proportion of stimuli erroneously categorized as old. As we obtained extreme values for the hit and false alarm rates (1 and 0, respectively), we adjusted these values according to the method proposed by Hautus (1995) and Brown and White (2005). We added 0.5 to the number of hits and the number of false alarms and 1 to the number of signal trials and the number of noise trials. Two two-way analyses of variance were performed to compare the memory scores of the new/old recognition task between our four experimental groups depending on the level of immersion and realism of the presentation: one for the data collected during the post-test (immediately after the intervention) and one for the data collected during the follow-up test (one week after the intervention). We report analyses of covariance (with the covariates of age, gender, theory of mind, and language abilities) in Appendix 4. Adding the covariates did not change the results reported here in terms of rejecting the null hypotheses. We depict all the data in Fig. 4.
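The corrected sensitivity index described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' analysis script: the function name `d_prime` and the trial counts in the example are hypothetical, and the inverse normal function is taken from Python's standard library.

```python
from statistics import NormalDist

def d_prime(hits, signal_trials, false_alarms, noise_trials):
    """d' with the correction described in the text:
    add 0.5 to each count and 1 to each number of trials."""
    hit_rate = (hits + 0.5) / (signal_trials + 1)
    fa_rate = (false_alarms + 0.5) / (noise_trials + 1)
    z_inv = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z_inv(hit_rate) - z_inv(fa_rate)

# Hypothetical child: all 10 old items recognized, no new item endorsed.
# Uncorrected rates would be 1 and 0, for which d' is undefined.
print(d_prime(hits=10, signal_trials=10, false_alarms=0, noise_trials=10))

# Hypothetical child responding at chance: d' is 0
print(d_prime(hits=5, signal_trials=10, false_alarms=5, noise_trials=10))
```

Because the correction keeps both rates strictly between 0 and 1, the inverse normal function is always defined, which is the reason for adjusting the extreme values in the first place.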

Fig. 4 d’ scores for the new/old recognition task. These scores were calculated according to the realism of the presentation (realistic/anthropomorphic) and the level of immersion (IVR/D-VR) during the post-test directly after the presentation and during the follow-up test one week later. Error bars represent one SEM

Post-Test (Immediately After the Intervention) On the first day, we had to exclude three participants because they were not able to finish the task due to a lack of time and three other participants due to technical problems. In the end, we carried out the analysis with 158 children, including 34 in the realistic D-VR condition, 46 in the realistic IVR condition, 37 in the anthropomorphic D-VR condition, and 41 in the anthropomorphic IVR condition.

We observed a significant impact of immersion on the results for the new/old recognition task (F(1, 154) = 18.331; p < 0.001, ηp2 = 0.106). Contrary to our expectations, participants in the D-VR condition (M = 1.02, SD = 0.143) performed significantly better than participants in the IVR condition (M = 0.906, SD = 0.180). In contrast, we did not observe a significant difference between the realistic (M = 0.956, SD = 0.175) and anthropomorphic (M = 0.957, SD = 0.173) conditions for the results of the new/old recognition task (F(1, 154) = 0.008; p = 0.927, ηp2 < 0.001). Further, we found no significant interaction between the levels of realism and immersion of the presentation (F(1, 154) = 0.936; p = 0.335, ηp2 = 0.006).
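As a reading aid, partial eta squared can be recovered from a reported F statistic and its degrees of freedom, since ηp2 = SS_effect / (SS_effect + SS_error) and F = (SS_effect / df_effect) / (SS_error / df_error). The sketch below applies this identity to the immersion effect reported above; the function name is illustrative and the code is not part of the authors' jamovi analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Immersion effect on the post-test new/old recognition task: F(1, 154) = 18.331
print(round(partial_eta_squared(18.331, 1, 154), 3))  # 0.106
```

The same identity gives values below 0.001 for the realism effect (F = 0.008) and about 0.006 for the interaction (F = 0.936), in line with the reported effect sizes.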

Follow-Up Test (One Week After the Intervention) On the second day, we had to exclude seven participants because they dropped out or were not able to finish the testing, 10 participants due to technical problems, and one participant whose score was more than three standard deviations below the mean. In the end, we carried out the analysis on 146 children, including 32 in the realistic D-VR condition, 44 in the realistic IVR condition, 39 in the anthropomorphic D-VR condition, and 31 in the anthropomorphic IVR condition.

On the second day, we found, once again, a significant effect for the level of immersion on the children’s performance on the new/old recognition task (F(1, 142) = 13.389; p < 0.001, ηp2 = 0.086). Again, participants performed better in the D-VR condition (M = 0.573, SD = 0.177) than in the IVR condition (M = 0.478, SD = 0.175). Unlike on the first day, we observed a significant difference in the children’s performance on the new/old recognition task one week later depending on the level of realism of the presentation (F(1, 142) = 7.722; p = 0.006, ηp2 = 0.052). Participants performed better in the realistic condition (M = 0.556, SD = 0.196) than in the anthropomorphic condition (M = 0.490, SD = 0.160). Moreover, we found no significant interaction between the levels of immersion and realism of the presentation (F(1, 142) = 0.019; p = 0.890, ηp2 < 0.001).

3.2 Quiz task

The children were able to obtain a maximum of 12 points on the quiz task. The higher the score, the better the children performed. Two two-way analyses of variance were performed to compare the memory scores of the quiz task between our four experimental groups depending on the level of immersion and realism of the presentation, including one for the data collected during the post-test (immediately after the intervention) and one for the data collected during the follow-up test (one week after the intervention). We report the analyses of covariance (with the covariates of age, gender, theory of mind, and language abilities) in Appendix 5. Adding the covariates did not change the results reported here in terms of rejecting the null hypotheses.

Post-Test (Immediately After the Intervention) On the first day, we had to exclude one participant who was unable to finish the task due to lack of time and two other participants whose scores were more than three standard deviations below the mean. In the end, we carried out the analysis on 161 children, including 34 in the realistic D-VR condition, 45 in the realistic IVR condition, 39 in the anthropomorphic D-VR condition, and 43 in the anthropomorphic IVR condition.

As with the new/old recognition task, we observed a significant impact of immersion on the results of the quiz task for the first day (F(1, 157) = 8.792; p = 0.003, ηp2 = 0.053). Once again, participants in the D-VR condition (M = 6.34, SD = 1.25) performed significantly better than participants in the IVR condition (M = 5.67, SD = 1.52). We did not observe a significant difference between the realistic (M = 5.92, SD = 1.56) and anthropomorphic (M = 6.02, SD = 1.32) conditions in the results of this task (F(1, 157) = 0.204; p = 0.652, ηp2 = 0.001). We found no significant interaction between the levels of realism and immersion in the presentation (F(1, 157) = 2.069; p = 0.152, ηp2 = 0.013).

Follow-Up Test (One Week After the Intervention) On the second day, we had to exclude nine participants because they dropped out or were not able to finish the testing, and one participant whose score was more than three standard deviations below the mean. In the end, we carried out the analysis on 154 children, including 32 in the realistic D-VR condition, 45 in the realistic IVR condition, 39 in the anthropomorphic D-VR condition, and 38 in the anthropomorphic IVR condition.

Unlike on the first day and unlike for the new/old recognition task, we did not find a significant difference in the quiz scores depending on the level of immersion (F(1, 150) = 2.95; p = 0.088, ηp2 = 0.019). The scores of participants in the D-VR conditions (M = 6.24, SD = 1.20) did not differ significantly from those of participants in the IVR conditions (M = 5.82, SD = 1.59). We also found no difference in the quiz results between participants in the realistic (M = 5.88, SD = 1.50) and anthropomorphic (M = 6.14, SD = 1.35) conditions on the second day (F(1, 150) = 1.20; p = 0.275, ηp2 = 0.008). Further, we found no significant interaction between the levels of realism and immersion of the presentation (F(1, 150) = 2.39; p = 0.124, ηp2 = 0.016).

3.3 Transfer

The children were able to obtain a maximum of four points (one point for each item correctly remembered as Chinese food) on the transfer task. The higher the score, the better the children performed. The children were tested on transfer only in the follow-up testing phase. Once again, a two-way analysis of variance was performed to compare the transfer scores between our four experimental groups, depending on the levels of immersion and realism of the presentation. We had to exclude nine participants because they dropped out or were not able to finish the testing, and one participant whose score was more than three standard deviations below the mean. In the end, we carried out the analysis on 154 children, including 33 in the realistic D-VR condition, 45 in the realistic IVR condition, 38 in the anthropomorphic D-VR condition, and 38 in the anthropomorphic IVR condition. We report the analyses of covariance (with the covariates of age, gender, theory of mind, and language abilities) in Appendix 6. Adding the covariates did not change the results reported here in terms of rejecting the null hypotheses.

As for the other tasks, we also observed a significant impact of the immersion level on the transfer task (F(1, 150) = 5.286; p = 0.023, ηp2 = 0.034). Again, participants in the D-VR condition (M = 3.23, SD = 0.778) performed significantly better than participants in the IVR condition (M = 2.94, SD = 0.817). We did not observe a significant difference between the realistic (M = 3.14, SD = 0.768) and anthropomorphic (M = 3.00, SD = 0.849) conditions (F(1, 150) = 1.651; p = 0.201, ηp2 = 0.011). We also found no significant interaction between the levels of realism and immersion (F(1, 150) = 0.073; p = 0.787, ηp2 < 0.001).

3.4 Perception of the Avatar

The children were asked three questions concerning their perception of the avatar and the truthfulness of its words. As a reminder, the children had the choice between two propositions for each of these questions. Although the children were encouraged to answer with one of the two propositions, some of them failed to provide an answer and were therefore not taken into account in the analyses. Each question was analyzed individually with respect to the realism of the presentation and the medium used. To do this, we computed binomial logistic regressions for each of the three questions, with immersion (D-VR/IVR) and realism (realistic/anthropomorphic) as factors. The count of the children’s responses to each question is reported in Appendix 9.

For the first question (“What did the character in the video look like?”), we had to exclude 18 participants because they were not able to select an answer from among the propositions. In the end, we carried out the analysis for 146 children, including 28 in the realistic D-VR condition, 43 in the realistic IVR condition, 34 in the anthropomorphic D-VR condition, and 37 in the anthropomorphic IVR condition. The logistic regression model was statistically significant (χ2(2) = 169, p < 0.001). The realism of the presentation was a significant predictor of how children perceived the avatar (Z = 6.108, p < 0.001). The odds of classifying the avatar as a little girl were 2626.210 times higher (95% CI 209.920–32855.180) in the realistic condition than in the anthropomorphic condition. Immersion did not significantly predict how the children perceived the avatar (Z = 0.648, p = 0.517).

For the second question (“Do you believe what the character told you? Did the character tell the truth?”), we had to exclude 23 participants because they failed to provide an answer. In the end, we carried out the analysis on 141 children, including 30 in the realistic D-VR condition, 38 in the realistic IVR condition, 33 in the anthropomorphic D-VR condition, and 40 in the anthropomorphic IVR condition. The logistic regression model was not statistically significant (χ2(2) = 0.379, p = 0.828). The count of the children’s responses reported in Appendix 9 illustrates that the children overall believed what the character told them.

For the third question (“Can a character like this really exist in our world? Could we meet a character like this in our world?”), we had to exclude 16 participants because they failed to provide an answer. In the end, we carried out the analysis on 146 children, including 29 in the realistic D-VR condition, 41 in the realistic IVR condition, 40 in the anthropomorphic D-VR condition, and 36 in the anthropomorphic IVR condition. The logistic regression model was statistically significant (χ2(2) = 9.61, p = 0.008). The realism of the presentation was a significant predictor of how the children perceived the possible existence of the avatar (Z = 2.75, p = 0.006). The odds of classifying the avatar as possibly existing in the real world were 2.831 times higher (95% CI 1.3482–5.945) in the realistic condition than in the anthropomorphic condition. Immersion was not a significant predictor of responses to this question (Z = 1.16, p = 0.244).
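To see how the reported odds ratio, its confidence interval, and the Z statistic relate, the sketch below recovers Z from the odds ratio and interval for the third question. It assumes the interval is a Wald interval computed on the log-odds scale, which the text does not state explicitly; the function name is illustrative.

```python
import math

def wald_z_from_odds_ratio(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the Wald Z statistic from an odds ratio and its 95% CI,
    assuming the CI is exp(beta +/- z_crit * SE) on the log-odds scale."""
    beta = math.log(odds_ratio)  # logistic regression coefficient
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta / se

# Third question, effect of realism: OR = 2.831, 95% CI 1.3482 to 5.945
print(round(wald_z_from_odds_ratio(2.831, 1.3482, 5.945), 2))  # 2.75
```

Under the same assumption, the values reported for the first question (odds ratio 2626.210, 95% CI 209.920 to 32855.180) give a Z of roughly 6.1. The very wide interval there reflects how few children fell into the off-diagonal response categories.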

3.5 Enjoyment and Perceived Purpose of the Virtual Presentation

To assess the enjoyment and perceived purpose of the virtual presentation, the children had to answer two questions afterward. First, the children were asked whether they enjoyed the presentation on a five-point Likert scale. To investigate the impact of realism and immersion on their appreciation of the presentation, we computed a two-way analysis of variance. We had to exclude one participant because of the inability to select an answer from among the propositions. We did not observe a significant difference between the IVR (M = 4.70, SD = 0.714) and D-VR (M = 4.69, SD = 0.701) conditions on the appreciation of the presentation (F(1, 159) = 0.009; p = 0.925, ηp2 = 0.000). We also did not observe a significant difference between the realistic (M = 4.67, SD = 0.758) and anthropomorphic (M = 4.72, SD = 0.653) conditions with respect to their effect on the appreciation of the presentation (F(1, 159) = 0.273; p = 0.602, ηp2 = 0.002). We found no significant interaction between the levels of realism and immersion in the presentation (F(1, 159) = 0.254; p = 0.615, ηp2 = 0.002).

Second, the children were asked if they thought we had shown them the presentation for fun or for learning. We had to exclude five participants because they were unable to select an answer from among the propositions. In the end, we carried out the analysis on 159 children, including 33 in the realistic D-VR condition, 46 in the realistic IVR condition, 38 in the anthropomorphic D-VR condition, and 42 in the anthropomorphic IVR condition. To investigate the impact of realism and immersion, we computed a binomial logistic regression with immersion (D-VR/IVR) and realism (realistic/anthropomorphic) as factors. The logistic regression model was not statistically significant (χ2(2) = 0.077, p = 0.962). The count of the children’s responses reported in Appendix 10 illustrates that the children overall considered the presentation to be for learning.

4 Discussion

In the present study, we investigated the impact of fantasy and immersion on young children’s recall of cultural information. The findings provide evidence that young children are sensitive to the appearance of the avatar doing the presentation; specifically, whether the avatar was realistic or anthropomorphic had an impact on their memory performance one week after the intervention. Contrary to our expectations, children in the D-VR conditions outperformed children working with headsets (IVR conditions). Gender, age, theory of mind, and verbal abilities did not have an impact on the significance of our results. From a more practical point of view, almost all the children very much appreciated the presentation, regardless of its realistic and immersive nature (mean of 4.69 on a 5-point Likert scale); they also considered the presentation to have been delivered for learning. In the following sections, we provide possible explanations of our findings.

4.1 Children learning with fantasy

As expected, the children performed better in the realistic condition than in the anthropomorphic condition. However, this difference was significant only one week after the virtual presentation and only for the new/old recognition task. It seems that at first, children memorized the information equally well, independently of the realism of the avatar. These results are consistent with the literature indicating that children can learn from fiction (Aydin et al. 2021; Conrad et al. 2021), although unlike some other authors (e.g., Hopkins and Lillard 2021), we did not find that fantastical elements significantly enhanced learning. On the contrary, our results for the new/old recognition task one week after the virtual presentation suggest an advantage of realistic information. It is possible that the information that children consider realistic is better consolidated in their long-term memory. This finding is consistent with the results of Bonus and Mares (2019), who found that children remembered and understood a cultural presentation better when they judged it to be realistic, but only in the long-term period of 5 to 12 days after the intervention.

Why should it be easier for young children to recall the elements of a realistic presentation? First, when presented with realistic information, children can import more real information and thus create a solid mental representation of the information by using prior knowledge of the world to support their understanding. Furthermore, information from an anthropomorphic character might interfere with memory processes; fantastical material might increase the cognitive demand of the task and hamper the encoding/retrieval process (Fisch 2000; Fisch et al. 2005). Yet another explanation for the better memory of information from a realistic avatar is that children might invest more effort in processing realistic material since it is more relevant to the real world (Mares and Sivakumar 2014). It remains an open question whether children put more effort into processing elements from a realistic avatar, or whether elements presented by a realistic avatar are simply easier to recall in a long-term new/old recognition task.

We believe that one possible reason why we did not find an effect of realism with the quiz task is that we used the exact same questions immediately after the intervention and one week after the presentation. Therefore, on the second quiz, one week after the presentation, the children could rely not only on what they learned from the presentation but also on the quiz itself, which might have reinforced and decontextualized the learned cultural information. However, it is surprising that we did not find an impact of realism on the transfer task, which was used only one week later. A possible explanation here is that the task had too few items to detect the effect. Although the effect did not generalize across our measures, we think that the result is interesting and that it deserves further research. Future studies should investigate the conditions under which realism has an impact on learning. In their recent literature review, Weisberg and Richert (2022) propose that the quality of learning with material containing fantastical elements is dependent on multiple factors (i.e., degree of realism, target educational content, and depth of integration of the fantastical elements in the story). Future studies should also test the extent to which these different factors are important for short- and long-term retention.

Finally, the children correctly differentiated the two avatars that were presented to them (young girl vs. anthropomorphic animal). One element that our questions identified as distinguishing the two avatars in terms of realism was the possibility that the avatar exists in the real world and that one could meet it. The children in the realistic conditions thought significantly more often that one could meet the avatar of the young girl in the real world. In contrast, children in the anthropomorphic conditions thought significantly more often that the anthropomorphic avatar did not exist in the real world and could not be met anywhere on Earth. We did not observe any other significant differences between the conditions in the responses to our exploratory questions. Future research should continue to investigate the different characteristics that make an avatar more or less realistic and the impact of such characteristics on learning.

4.2 Children learning with immersive virtual reality

Contrary to our hypothesis, children performed significantly worse in the IVR conditions across tasks (new/old recognition task, quiz task, and transfer task). Based on previous research (Babu et al. 2018; Makransky and Mayer 2022), we expected that children would retrieve information at least as well in the IVR as in the D-VR conditions. A first possible explanation for this finding is that IVR induced too much extraneous processing compared to the generative processing that was thought to result from immersion. This interpretation is in line with the findings of Makransky et al. (2019), who reported that IVR elicited a stronger feeling of presence and more liking but less learning.

What were the possible drivers of extraneous processing in our immersive environment? First, the novelty of IVR could have generated increased cognitive processing; in other words, children in the D-VR conditions may have performed better because they were more familiar with the learning medium than children working with IVR. An investigation by Bernath et al. (2020) showed that in Switzerland, out of nearly 900 households with children aged four to six, 79% were equipped with at least one tablet. VR headsets are much less common, especially among young children. In our study, 54% of the households reported that their children used a tablet at least once a month, whereas only 11.3% of the children had already tried a VR headset, including 8.3% who had done so on only one occasion. According to the theory of multimedia learning (Mayer 2009), habituation to the use of a VR headset can be considered extraneous processing and would offset the generative processing linked to the benefits of immersion offered by IVR. Even though the children in our study had a short phase of familiarization with the virtual environment and the corresponding medium before the virtual presentation, this might not have been sufficient to bring familiarity with the headset to the same level as familiarity with the tablet. It is plausible that children might profit more from IVR if they have a longer phase of habituation or use IVR repeatedly. This is what Meyer et al. (2019) observed when they added IVR pre-training phases to their IVR teaching design.

Another driver of extraneous processing in our environment might have been the details of the graphics, especially assets that were not directly useful for learning purposes. Despite our careful design of the environment, it is still possible that the level of graphic detail in the VR environment was too high and created extraneous processing. These details would have been particularly disturbing in the IVR conditions, where the level of resolution was higher, and the environment fully surrounded the children, as compared to the lower-resolution D-VR conditions, where the children were sitting in front of the tablet (i.e., a more restricted field of view). It has been suggested that to fully benefit from IVR, the virtual environment should not be too detailed, especially regarding elements that are not related to learning. Nevertheless, this reduction in detail should not be at the expense of a loss of immersion or of the learner’s ability to contextualize the learning materials (Makransky and Mayer 2022).

Yet another explanation for the D-VR advantage we found is the concordance of the medium used during the encoding and recall tasks. Indeed, while the encoding was done on a tablet or with a VR headset, depending on the condition, the new/old recognition task was always carried out on a computer, a medium closer to the tablet than to the VR headset. In this regard, children in the D-VR conditions may have had an advantage over children in the IVR conditions. Future research should carefully compare the use of different media (IVR, D-VR, and computer experiments) during the encoding and recall phases.

5 Conclusion

In view of our findings and the rest of the literature, we continue to note mixed results regarding the effectiveness of IVR and materials containing fantastical elements in learning experiences. Regarding VR, our results clearly point to a benefit of D-VR when compared to IVR for four-to-six-year-olds. As discussed above, follow-up studies (e.g., manipulating the degree of habituation with the medium, the level of detail of the VR environment, or using IVR to assess memory performance) are needed to fully understand this finding. To our knowledge, this is one of the first studies where such young children had to learn with a VR headset, and before proposing the use of IVR with preschool children, much more research is needed about children’s learning in VR. This type of research is needed to assess the full educational potential of these methods and to avoid potential detrimental effects, as it is currently unknown whether VR adds value to more traditional materials in preschool settings.

Finally, our results seem to suggest that children do not retrieve information from an anthropomorphic avatar as well as they retrieve information from a realistic avatar when considering memory consolidation (i.e., assessment one week after the presentation). It is important to specify that this study has looked at the impact of an anthropomorphic avatar on the recall of cultural information, and that because fantasy appears to place more pressure on cognitive processes, it may have some value in supporting cognitive development in other contexts, such as fantastical play (Thibodeau et al. 2016). As these results were significant only for the new/old recognition task, it is important to continue investigating the impact of realism on children’s learning and see if these results can be replicated and generalized to other tasks.

It is also important to note that the length of the presentation in the study (5 min) is below the standard length of TV programs aimed at children in this age group. The shorter duration of the presentation was chosen mainly due to limited scientific data on the effects of IVR on such a young population. In prioritizing ethical considerations, we decided not to make the presentation too long, to avoid any potential negative effects. Replicating this study with a longer duration in IVR versus D-VR would be interesting, as it would better align with children’s everyday experience with educational TV programs.

To summarize, this study provides evidence that the type of medium (IVR vs. D-VR) and the type of material (anthropomorphic vs. realistic) influence the way information is processed, encoded, and memorized by young children. Even though the underlying mechanisms remain unclear, these findings highlight the importance of carefully selecting and designing educational material for young children when seeking to support their recall of cultural information.