The effect of immersion on sense of presence and affect when experiencing an educational scenario in virtual reality: A randomized controlled study

1 Abstract Virtual reality (VR) technology has been used to learn skills for decades. While no standardized measure exists for learning outcomes in VR training, commonly explored outcomes are immersion, sense of presence and emotions. Methods In this paper, the objective was to investigate these outcomes in two VR conditions, immersive and desktop in a randomized controlled trial with a parallel design. The sample consisted of 134 university students (70 women, mean age 23 years, SD = 2.99). These were randomized using a covariate-adaptive randomization procedure based on stratification by gender into two interventions; play out a VR scenario in either desktop (control group) or immersive VR (intervention group). The setting was a university lab. Results There was a significant within subject effect for positive affect and a significant between-group effect for the immersive compared to desktop VR groups. Positive affect was reduced after interacting with the VR scenario in both the immersive and desktop versions, however, positive affect was overall higher in the immersive, compared to the desktop version. The results show higher scores for sense of presence (d = 0.90, p < 0.001) and positive affect pre- and post-scenario in the immersive VR condition (d = 0.42, p = 0.017 and d = 0.54, p = 0.002) compared to the desktop condition. Conclusion Immersive VR may be beneficial in higher education as it promotes high levels of sense of presence as well as positive emotions. When it comes to changing the immediate emotions of the students, type of VR does not seem to matter. The project was funded by the Norwegian Directorate for Higher Education and Skills.


Introduction
There is an increasing use of technology in higher education across most educational fields, gradually replacing traditional face-toface learning [1]. This is not only due to the new experiences gained by the COVID-19 pandemic where many were forced to use more technology in their education [2], but also because technology now can offer many benefits for teaching in terms of for example possibilities of individualized teaching, improved feedback and better alignment of learning goals [3]. One of these emerging educational technologies is the immersive head-mounted displays (HMD) used to explore different virtual realities (VR) [4]. Immersive H2. The changes in positive affect after playing through a virtual scenario in the immersive VR group will differ from the changes in the desktop VR group.
H3. The changes in negative affect after playing through a virtual scenario in the immersive VR group will differ from the changes in the desktop VR group.
H4. There will be a higher sense of presence in immersive VR than in desktop VR.

Design
We used the CONSORT statement to report the present study but did not preregister the study in a trial register. To test the hypotheses a randomized controlled trial was designed. Prior to the present study, a pilot study was run. This was to make sure that the experiment would run smoothly, that the instructions were clear, that the design worked as planned, and to identify other experimental challenges. As a result of this pilot study, multiple factors were adjusted such as the translation of the questionnaire, final adjustments of the interactive video to exclude bugs, the instructions given to the participants, as well as the design of the study. The original design was a crossover-design where all participants were tested in both the immersive VR and desktop VR condition. Half of them started in the immersive VR, and half in the desktop VR. All participants reported that they got bored when playing through the scenario multiple times in each group which compromised the truthfulness of the answers on the questionnaire. Resultingly, the design of the study was changed. Thus, the final study consisted of two independent groups, testing two independent setups to ensure that none of the participants had played through the scenario before, making it a novel experience for everyone. Thus, a parallel design was used.
A mixed 2 × 2 design consisting of two (immersive VR and desktop VR) condition between-subjects and two timepoints (before and after) was implemented. The participants were randomly assigned to play through an interactive scenario in VR displaying a workrelated conflict relevant for soft skills in one of the two conditions. Positive affect and negative affect were measured before and after playing through the scenario. User experience, including sense of presence was measured after the scenario.

Participants
Prior to the data collection, a power analysis was performed to compute the required sample size. G*Power v.3.1.9.2 was used for this task, with the following parameters: ANOVA, repeated measures, within-between interactions. For effect size, we used direct input, f(v) 0.25 the alpha level 0.05, and the power to 0.80. The effect size input was selected using the function in G*Power that bases the calculation on recommendations from Cohen [46]. The power analysis tool returned a recommended sample size of 126 participants. Participants were recruited based on this, randomly assigned to one of the groups, matched by gender, to ensure that the sample was representative of the population.
Participants were recruited amongst the students at the local University. Inclusion criteria were that they had to be students at the university and had to be between 19 and 40 years of age. Participant recruitment strategies used included announcements in lectures, snowballing methods where participants themselves informed peers, announcements on the teaching and learning platform Blackboard, posters in the hallways, directly approaching candidates at the university, and announcements on social media. As a reward for participating in the study, five randomly selected participants across the groups got a gift card of 200 NOK. The participants in the desktop VR group also got free pizza after participation.
A total of 134 individuals were recruited and assessed for eligibility, all met the inclusion criteria, all gave their informed consent and all completed the entire study (N = 134, 70 female and 63 male, one participant reported gender as "other"). None were excluded, did not receive allocated intervention nor were lost to follow up (see flow diagram). None of them had participated in the pilot study. The ages ranged from 19 to 40 years (M = 23, SD = 2.99). Using a covariate-adaptive randomization procedure [47] based on stratification by gender the participants were assigned into two groups; To play through the scenario in either immersive VR (intervention group) (n = 69, 34 female) or desktop VR (control group) (n = 65, 36 female). The randomization was performed by one of the study administrators as the participants signed up for the study and was evaluated for eligibility. The participants first indicated what day and time they could come in for testing. This study administrator then assigned participants into the control group or intervention group by alternating between the groups by every new eligible participant, at the same time also bearing in mind the preferred day and time and the gender distribution to obtain an even gender distribution in both groups. The participants were blinded to the intervention before taking part in it, but discovered if they had been assigned to the control or intervention group by themselves immediately after starting the trial.

Procedure
The study was carried out at the university campus, in small classrooms. The trial was administered by one of the authors with the help of two student assistants during the fall of 2021, from the 20th of September to the 10th of November. In the immersive VR condition, the participants were tested in groups of maximum three people at a time. They were each assigned to a setup with an HMD, a protective face mask, a computer, two position sensors, and one hand-controller. In the desktop VR condition, the participants were tested in groups of maximum fourteen people at a time. The participants were tested in a room with three rows of computers prepared for the experiment. The computers were stationed with a 1-m gap between them. Each of the participants were assigned to a setup including a computer, a computer-mouse, and an audio headset.
Before starting the experiment both groups were given oral information about consent, the equipment, the questionnaire, the video, that the experiment was expected to last for approximately 340 min, and that they could end the experiment at any point without any consequences. This information was also given in writing before the start of the questionnaire. All participants signed a letter of consent. After been given the information, the participants answered the first part of the questionnaire: the demographics and the Positive and Negative Affect Schedule (PANAS [48]) pre.
In the immersive VR condition, the participants were asked to stand after answering the PANAS pre before an experimenter who instructed them on how to put on and adjust the HMD. They were instructed on how to put on the protective face mask, the HMD itself, adjust the head straps, the inter-pupillary distance, the speakers, and the hand-controller. They continued to stand when playing through the scenario.
When all the participants were done with the first part of the questionnaire, they were given instructions that there were to play through the scenario three times, trying out different in-game alternatives, and that when they were done, they were to continue to the second part of the questionnaire. After playing through the scenario three times, the participants of both groups found their way to the questionnaires on the computer themselves and answered the second part of the questionnaire consisting of PANAS [48] post and the user experience questionnaire. Both groups answered the questionnaires on the computers using a web-based survey builder. This was the last part of the experiment.
Prior to the data collection, the ethical and data storage aspects of the study were approved by the Norwegian Centre for Research Data (application number: 693393).

Software
The interactive scenario used in the experiment was produced for use in a lecture in work and organisational psychology and was recorded using a 360 • -camera, recording live action. The scenario had a duration of approximately 2 min and showed an interactive dialogue between an employee, acted out by a male actor (named "Olaf"), but played in a first-person perspective by the participant, and the employee's manager (named "Berit"), acted out by a female actor. The scene was a potential conflict situation, where the manager comes close up to the player and after a short session of small talk comment that the worker's last decision was made too quickly. A set of optional dialogue choices and their consequences follows in an interactive video. The player can choose three different sets of answers; a) asking what Berit means (avoiding), b) say that it was Berits' own fault, she should have been clearer and is always like this (aggressive/person oriented) and c) admit to being too quick (constructive/solution oriented). Several new dialogues follows until the player once again get to choose what to answer, with similar choice options. The participants played through the scenario as the employee, not seeing the body of their character in the video but answering with the actor's voice. See Fig. 1 for a screen capture of the video illustrating the scenario.
When playing through the scenario, the participants themselves chose, at multiple times, amongst different pre-recorded alternatives, which response to give to the manager. The manager's responses depended on which alternatives were chosen. The reaction led the dialogue to end in either a positive manner, a neutral promise of a follow up conversation with the manger, or in a conflict. Thus, all participants played through three different versions of the scenario resulting in both negative and positive outcomes. These points of interaction happened at three different points in the scenario.

Hardware
Even though the two VR conditions were made to be as similar as possible, there were some differences between the two conditions. Participants had full freedom of camera movement in the immersive VR conditions, meaning they had three degrees of freedom. They were not able to move around in the VR condition. In the desktop VR condition, participants had no control over the camera. The camera was mostly fixed in one direction, apart from in the beginning where a small camera pan occurred from a coffee machine to the person speaking.  The group in the immersive VR version used an immersive VR HMD (Oculus Rift) connected to a compatible laptop with two position sensors. The 470 g HMD consists of an OLED display with a resolution of 2160 × 1200 pixels (1080 × 1200 pixels per eye) and a field of view of 110 • , as well as integrated "3D audio" headphones. In addition, they had a controller in their hand to choose a response in the scenario. Participants in the desktop VR group played through the scenario on a standard 23-inch stationary computer screen with a resolution of 1920 × 1080 pixels accompanied by on-ear headphones. A computer mouse was used to choose a response in the scenario.

Measures of emotions and user experience (outcomes)
To measure the participant's affect and subjective user experience of the VR, a questionnaire was developed based on existing questionnaires [49]. The participants answered the first part (consisting of demographics and PANAS [48]) before playing through the scenario, and the second/final part (consisting of PANAS and user experience) directly after playing through the scenario.

Affect
The Positive and Negative Affect Schedule (PANAS) was used to measure the participant's affect both before and after playing through the scenario [48]. A Norwegian translation of PANAS was used to measure to which extent the participants experienced 20 different affects on a 5-point Likert scale ranging from "not at all" to "very much". The scale consists of 10 questions measuring positive affect (PA) (interested, alert, attentive, excited, enthusiastic, inspired, proud, determined, strong, active) and 10 questions measuring negative affect (NA) (distressed, upset, guilty, hostile, irritable, nervous, jittery, scared, afraid). Omega (ω) was used as it is generally viewed as superior measure compared to alfa [50]. In the present study the internal consistency was high (ω = 00.81 PA before intervention) to 0.87 (PA after intervention) and ω = 00.83 (NA before intervention) to 0.87 (NA after intervention). The questionnaire has been used previously to measure affect in virtual environments [e.g. 51]. The first PANAS (PANAS pre) was administrated directly before playing through the scenario, and the second one (PANAS post) directly after playing through the scenario in both conditions. There were some missing data on PANAS and only participants who had answered at least 8 out of 10 affect questions were included in the analyses.

Sense of presence
System immersion is defined as an objective measure. However, there is no established way to measure such system immersion other than classifying the different VRs as immersive or not according to objective properties. Visual immersion is a prominent part of the system immersion and consists of the components: the size of the field of view of the user, the size of the field of view that the user can orientate in, screen size, screen resolution, stereoscopy, reproduction based on head movement, realism of light, frame rate and refresh rate. Thus, different VR technologies have different measurable degrees of immersion [12]. For example, an immersive VR HMD has a larger immersion than a mobile screen or a standard desktop display. In the present study, the "measure" of system immersion reflects this and is incorporated in the different VR-technologies, one classified as high-immersive (immersive VR) and one classified as non-immersive (desktop VR) [52].
To measure the participants' subjective user experience in the virtual realities, a questionnaire was used consisting of a collection of questions from different well-known questionnaires such as "the presence questionnaire" [49]. Participants answered the questions after answering the PANAS post. The questionnaire consisted of 41 questions with answers on a 10-point Likert scale, as well as three open ended questions where the participants answered through self-produced text. The full questionnaire consisted of multiple subscales to measure the user experience. In this study the subscale "sense of presence" of the questionnaire are used. The questionnaire measures sense of presence subjectively.
Seven of the questions belonged to the subscale sense of presence, e.g. "My interactions with the virtual environment seemed natural", and "I was able to actively survey the virtual environment using vision". These items measured how real the participants experienced the scenario to be, if they felt that the experience of being in a conflict with the manager "Berit" felt like a real and natural experience that they could actively engage in. The questionnaire was translated to Norwegian by the first author prior to the experiment to fit the first language of the participants, but no other modifications of the questions were made. See Fig. 2 for a spider plot showing the various presence items. When examining the relationship between the items, we observed that one item (item 5 "The visual display distracted me from performing the assigned tasks) had negative, although very small correlation with two of the other items r − 0.02) This was the only reversed item of the instrument. As omega cannot be calculated with negative values, we removed this item, and the ω was acceptable 0.69. Thus, the "presence" variable consists of six out of seven original items.

Data analysis and statistics
IBM SPSS Statistics (version 27.0) was used for data analysis. An examination of boxplots revealed no outliers to be removed. The central limit theorem ensured that the sampling distribution of the model estimates were normally distributed [53]. The assumption of normality was thus supported. Levene's test was used to test the assumption of variance. In cases where this assumption was violated, corrected values are reported. A linear mixed-effects ANOVA was used to examine within-group and between group differences in positive affect and negative affect before and after the VR experience by the two VR conditions, assessing hypothesis 1, 2 and 3. To assess hypothesis 4, comparing sense of presence for the desktop VR and immersive VR conditions measured after the VR experience independent samples t-tests were run. Cohen's [54]guidelines were used when interpreting effect size, i.e. 0.20, 0.50 and 0.80 represent small, medium and large effect sizes.

Results
All participants were included in the analysis as they all completed the experiment. Table 1 shows the descriptive statistics of the variables in this study.

Changes in affect before and after the VR experience by condition
Positive affect: We observed a significant linear within-subject effect F(1, 132) = 22.30, p < 0.001, partial Eta squared = 14, i.e. we observed a reduction in level of positive affect from time 1 (before the VR experience) to time 2 (after the VR experience). Further, there was a significant difference between the two groups, where the immersive VR group had a significant higher level of positive affect F(1) = 10.35, p = 0.002, partial Eta squared=0.07. There were no significant interaction effect between type of intervention and time on positive affect F(1, 132) = 0.1.85, p = 0.180.
Negative affect: For negative affect we observed a significant increase in negative affect (main effect) in the sample as a whole F(1, 132) = 98.38, p < 0.001., partial Eta squared = 43. As with positive affect, there were no significant interaction effect F(1, 132) = 0.005, p = 0.940. Tests of between-subject effects showed that there was no significant differences between the two groups; F(1) = 0.15, p = 0.700.

Differences in sense of presence, positive affect and negative affect before and after the VR experience
To test if there were any statistical differences in sense of presence, and previous experience with the device between the immersive VR and desktop VR condition multiple t-tests were conducted (Table 2). We also explored the within-and between effect for positive and negative affect identified in the ANOVA. An alpha level of 0.05 was used for all statistical tests, and Cohen's d was used to examine the magnitude of the effect. On average, the participants that watched the scenario in immersive VR had a higher sense of presence (M = 7.10, SD = 1.25) than the participants in the desktop VR group (M = 5.77, SD = 1.46). This difference between the two conditions (MD = − 1. We are not aware of any harms or unintended effects in any of the groups, and did not test for e.g. motion sickness in the VR condition.

Discussion
The aim of this study was to examine whether immersive VR could be a good replacement for desktop VR in terms of soft skills training in higher education by examining affect and presence. This was done through the administration of a virtual scenario of an interpersonal conflict in either immersive VR or desktop VR. The outcomes measured were positive and negative affect, and sense of presence. We found that positive affect was reduced, and negative affect increased after interacting with the VR scenario in both the immersive and desktop versions, however, positive affect was overall greater in the immersive, compared to the desktop version. Moreover, participants had a higher sense of presence in the immersive than in desktop VR. These findings have implications for the use of VR in educational activities, as well as future research.
Multiple important findings were identified with regards to affect. The participants who experienced the scenario through immersive VR had a significantly higher positive affect after playing through the scenario than the participants in the desktop VR group. Thus, supporting Hypothesis 1, stating that there would be greater positive affect after playing through the scenario in immersive VR than in desktop VR. This corresponds with previous findings [e.g. 19,28] showing that students had higher levels of enjoyment in immersive VR than in desktop VR. In the present study the participants also had higher positive affect prior to start in immersive VR than desktop VR. As the participants were able to identify if they were in the desktop or immersive VR group when answering the affect questions, the high positive affect in the immersive condition may be due to excitement about the immersive VR experience. Positive affect was reduced after watching the conflict scenario in both conditions. This is likely due to the nature of the unpleasant interpersonal conflict theme of the video. Thus, overall, it indicate that the participants felt that the interactive videos affected them emotionally, but they still enjoyed the immersive VR version more than the non-immersive version. However, it is important to note that this could be due to the novelty of the experience with the immersive VR and could be reduced as the technology is maturing. The present study does not clarify this.
The participants also reported low levels of negative affect before and after they played through the scenario in both groups, thus indicating that they did not have a negative attitude either prior to start or after the experiment. However, negative affect increased after watching the scenario, possibly due to the negative content. Results from previous studies imply that learning in immersive VR may have a positive effect on the participants mood with higher registered positive affect in immersive VR than desktop VR [28].
As the scenario dealt with a work-related conflict where the participant played the role of an employee in a first-person perspective, one would assume that such a situation was perceived as challenging and unpleasant for everyone experiencing the scenario. Thus, expecting the scenario itself to have a negative effect on the affective state of the participants. The results reflect this, as the positive affect decreased after experiencing the scenario while negative affect increased in both groups. Although there was a significant change in affects within both groups, no differences were found in the change of negative affect between the two groups. Thus, Hypotheses 2 and 3, stating that there would be a larger change in positive/negative affect after the scenario in the immersive VR group compared to the desktop VR group, were only supported for positive affect. This is in line with previous studies showing that immersive VR, elicit high levels of sense of presence, also elicit greater emotion than less immersive technology [55].
The participants felt more present in the scenario when watching it through immersive VR, than on a desktop VR, in line with previous research [56], supporting Hypothesis 4. According to the definition of sense of presence, this indicates that when playing through the scenario in immersive VR, the participants felt more as a part of the virtual environment when being physically present in another environment, than the participants in desktop VR.
Overall our study support that level of system immersion, in the present study measured by the type of VR, increase sense of presence and positive affect, which may ultimately influence learning outcomes. This relation is generally accepted in the literature [57][58][59].
Emotions are shown to influence the learning outcome in multiple ways such as encoding, retrieving, motivation and joy, even though there is inconclusiveness in if this effect has a positive or negative effect on learning outcomes [1]. As positive emotions and enjoyment have been linked to student performance [e.g. 31] simply by being more engaging, the use of immersive VR as an educational technology can promote learning. Thus, both emotion and sense of presence may have an effect on the learning outcomes, as well as influence each other.

Strengths and limitations
Only a few studies found in the literature exploring the effect of immersive VR compare the technology to a control group [4], thus the experimental design with a control group represents a strength of the present study. While we initially planned to implement a cross-lagged design, feedback from the pilot study revealed that it would have compromised the integrity of the results, leading us to have the participants go through only one VR condition each.
Studies in the field of immersive VR are often under-powered and lack a randomized and controlled design with two or more experimental conditions. Thus, the selection of a sample size based on a power analysis performed before the data collection, also strengthens this study. The power analysis resulted in the recruitment of 134 participants, a much larger sample size compared to several previous studies. This is reflected in a review by Checa and Bustillo [4] where the number of participants often were between 20 and 50 participants. A student sample was used in this study, limiting generalisability to the larger population. On the other hand, the VR scenario was developed for use by students during the coursework, and thus has ecological validity.
Studies focusing on immersive VR have often had quite different conditions, e.g., watching an instructional video in desktop VR, while performing the task in immersive VR [9]. This makes it hard to conclude whether the effects observed stem from the immersive VR itself or the performance of the task independent of the immersive VR. In the present study, the two VR scenarios were nearly identical. The similarity in the conditions minimized the number of factors influencing the difference between the two conditions, thus making the outcome of the analysis more valid. However, the novelty of the VR environment can limit the learning experience [60], and probably also their affect and sense of presence and could have influenced the results of the present study.
The scales used in the current study was based on a questionnaire previously used in the measurement of user experience in virtual environments. The questionnaire had a high internal validity and consists of questions from several previously used, well-known questionnaires [49]. As per now, the measurement of soft skills is challenging, and there is no accessible and efficient way of measuring soft skills after an experience in immersive VR [45]. Thus, the present study can provide important knowledge about the effect of immersive VR on affect, which is known to influence learning, but cannot on predict direct learning outcomes. In addition, it is not possible to understand if the VR/desktop differences persist in the long term.

Implications
The findings of this study are relevant for several disciplines such as teaching, learning, and training. Based on previous studies linking sense of presence to higher learning outcomes as previously described [e.g. 19], the results implies that the learning outcome may be higher in the immersive VR condition than for the desktop VR condition [e.g. 4,14]. Thus, immersive VR seem to be better than desktop VR in eliciting positive emotional responses. As our participants experienced more positive affect before the immersive VR condition compared to the desktop VR condition, immersive VR may be recommended in learning situations where it is important to boost the engagement for the education.

Conclusion
The present study illustrated that students experienced a higher sense of presence in immersive VR as compared to desktop VR when playing through an educational scenario relevant for soft skills. Additionally, they had greater positive affect prior to the experience in immersive VR than desktop VR. This may reflect their motivation or attitude towards the learning, which may influence the learning outcomes. Watching the conflict scenario in immersive and desktop VR reduced positive affect and increased negative affect. However, positive affect was overall greater in the immersive, compared to the desktop version.
The result of the present study supports that immersive VR may replace desktop VR as an educational tool in higher education. This is because immersive VR elicits higher sense of presence than desktop VR. Additionally, students may be more exited to learn through such technology, as reflected by higher levels of positive affect in immersive than desktop scenarios.
Virtual experiences in immersive VR have a notable potential for use in teaching by offering students relevant experiences that they may never be able to experience otherwise. Future research should strive to obtain good measures of the learning effect of soft skill training in immersive VR. More research is needed to investigate the relation between the factors such as immersion, sense of presence and emotion on learning outcomes, and if these influence the immersive VR on specific skills. Longitudinal studies on the effect of longterm use of immersive VR in education is crucial to determine the actual effect of using such technology in educational settings across time. More controlled, randomized studies on large samples, as the present study, are needed to examine the effect immersive VR has on specific learning outcomes and their relations to immersion, sense of presence and emotions. However, the value VR creates for student learning should not be underestimated, and increased focus should be given to this technology and its potential to further enhance teaching excellence in higher education in novel and innovative ways.

Author contribution statement
Tuva Fjaertoft Lønne: Håvard R Karlsen: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Eva Langvik: Conceived and designed the experiments; Analyzed and interpreted the data. Ingvild Saksvik-Lehouillier: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper