1 Introduction

Museums continually strive to enhance the way in which they engage visitors, including how visitors interact with museum objects such as historic artefacts, documents, works of art and natural specimens. In recent years, museums have been moving away from traditional “do not touch” approaches towards providing hands-on, interactive and engaging experiences (Wood and Latham 2016). Recent advancements in enabling virtual reality (VR) technologies provide the ability to experience objects and situations that are difficult to access or simply not available, representing a powerful tool for a range of application domains, including museums. Museum visitors are members of the public and a diverse user group, unlike the more focussed groups involved in VR application domains such as surgical training (Visser et al. 2011), specialist driver training (Callari et al. 2022), and robotic teleoperation (Naceri et al. 2021). As such, the amount of prior experience with technology can span from zero to expert level and, given the nature of visiting a museum, interaction with a particular VR experience is typically sporadic or even just a one-off experience.

Vehicles are an important part of history and are often displayed in museums. Displaying large objects such as these, however, poses obvious logistical, display, and interpretation challenges (Clark 2010; GMOM 2022). Museum communication around these types of objects is complex, and VR provides opportunities for enhanced engagement with museum audiences (Antlej 2022). To provide a real-world case study for this work, the first Ford ute, designed by Lewis T. Bandt in 1934, was chosen. This was deemed a suitably large and historically significant museum object and one where the ability to interact with it is important. In this work, the focus is on the task of loading objects onto the cargo area at the back of the ute.

Museums continue to explore VR (Hornecker and Ciolfi 2019; Shehade 2020; Margetis et al. 2021), and interaction in VR remains an ongoing research challenge. Interaction in VR typically involves using handheld controllers that are tracked by a VR system. This constitutes a relatively accurate and robust method of interaction and one available with most commercially available VR headsets. Natural user interfaces (NUIs) are based around natural human abilities (Wigdor and Wixon 2011). Natural hand interaction, where the user’s hands are tracked directly using cameras, has long been explored in the literature and has recently become available with off-the-shelf VR technologies (Manresa et al. 2005; Beattie et al. 2015; Hillman 2019; Voigt-Antons et al. 2020). The ability for users to interact with the VR experience using their bare hands provides the potential for interaction that more closely resembles that in the real world. It also means that less physical hardware is required, potentially reducing cost and maintenance, which could be a welcome attribute in museums and other settings. The mechanical design of most traditional real-world computer input devices (e.g. a mouse, keyboard, and touchpads) inherently provides some form of passive tactile or haptic feedback to assist user interaction. The same can be said for handheld VR controllers, which provide some degree of physical feedback to users when, for example, grasping the controller or pressing a mechanical button. When interacting with VR using freehand motion, however, the user typically relies solely on the visual information provided by the VR headset and their own sensory feedback on the position of their hands and arms.

This paper aims to add empirical knowledge on the impact of passive physical everyday tools (e.g. glove, pen, kitchen tongs) on manipulating desk-top sized museum objects within VR. This is achieved through the evaluation of usability and user experience for three different free-hand interaction techniques. These techniques are used in the context of loading a utility vehicle (i.e. ute) with different objects. Each of the three interaction techniques is used with and without the physical tool, constituting a total of six variants of the techniques. The evaluation focuses on the selection and positioning of rigid objects because of the expected applicability to a wide range of applications.

Experiencing museums at home has been increasing in popularity and in coverage in the literature, and this has continued through COVID-19 (Koumaditis et al. 2021). The evaluation presented in this paper was designed to be undertaken within a museum-from-home environment. Museums are an important adopter of VR, and museum visitors are a diverse user group who, when it comes to interacting with VR applications in museums, are likely to be first-time or infrequent users. As such, the participant group was selected as one with similar characteristics.

2 Background

2.1 Natural user interfaces (NUIs)

Unlike traditional graphical user interfaces (GUIs) that rely largely on WIMP (windows, icons, menus, pointer) for interaction, natural user interfaces, or NUIs, utilise natural human abilities such as touch, gesture and verbal language and corresponding enabling technologies. Such types of interaction, according to Bill Buxton from Microsoft, “exploit skills that we have acquired through a lifetime of living in the world, which minimises the cognitive load and therefore, minimises the distraction” (Larson 2010, Time:13:45).

User interaction with NUIs is often described as direct, intuitive, and similar to natural behaviour (Norman 2010). The book Brave NUI World focuses on the user’s behaviour and feeling during the experience rather than on the NUI itself (Wigdor and Wixon 2011). Natural interaction in this paper refers to interactions that include free-hand movement (via hand tracking technology) and the corresponding metaphors used to translate those movements to actions in VR.

Handheld controllers such as the Oculus Touch are widely used in VR applications because they provide accurate and reliable interaction (Masurovsky et al. 2020). Research that compares user preference and the performance of physical input devices (e.g. Oculus Touch controllers and 3D mouse) against hand-based gesture interaction for virtual environments (VEs) suggests that traditional input devices provide more natural, immersive (Masurovsky et al. 2020) and efficient interaction for complex tasks (Tscharn et al. 2016). One of the reasons for this is that existing hand tracking technologies have inherent limitations, including occlusion and dependency on specific lighting conditions. The challenge is also due to the anatomical structure of the human hand, and a high degree of freedom combined with the grasping parameters and the geometry of the objects (Nasim and Kim 2018). Despite these challenges, the benefits of hand tracking mean that it continues to appear in commercially available VR technologies (Abdlkarim et al. 2022) laying the foundation to explore more NUIs.

2.2 Natural user interfaces (NUIs) and their role in museum experiences

Museums, like other application domains, are starting to explore VR, given the ability to recreate environments that are difficult to access (e.g. historical artefacts), or even environments that no longer exist such as situations that occurred in the past (e.g. the fall of the Roman empire).

Interactivity is important to visitor interpretation of museum objects, and this topic has been widely investigated. Interaction has been shown to have a positive impact on learning outcomes and can lead to increased retention of information (Veverka 2018). Research suggests that interacting with an object can reinforce personal interest or present new and valuable information, which extends a user’s existing knowledge (Wood and Latham 2016). Interactivity also supports user satisfaction through engagement and exploration of content across a variety of contexts (Adams and Moussouri 2002). A community survey conducted across 5 months in 2019–2020 suggests that users would benefit from the ability to manipulate and interact with museum objects (Antlej 2021).

Related work on natural free-hand interaction for museum and heritage applications demonstrates that it is beneficial in enabling visitors to be physically and emotionally involved (Carrozzino et al. 2016). It can also be effective in transferring intangible knowledge, for example traditional craftsmanship techniques that are no longer commonly used, including pottery, wood carving, and printmaking (Brondi et al. 2016). Recent research has proposed interactive solutions in museums that aim to improve user experience through more accessible, natural and intuitive interfaces utilising hand gestures and full-body movement (Hsu 2011; Pietroni et al. 2012, 2021; Vosinakis et al. 2016). Results have shown that such solutions can make the user experience more enjoyable and therefore retain user interest and engagement.

2.3 Passive haptic feedback in virtual reality

Touching and grasping are natural ways to manipulate and explore objects in both the virtual and real worlds. Manipulating computerised objects typically involves manipulation tasks such as selection, positioning, rotation, and scaling (Bowman et al. 2008).

A body of work focuses on the incorporation of haptic feedback into user interfaces (Bowman et al. 2008). Haptic feedback can be achieved using either active or passive haptic devices. Active haptic devices render simulated force by using actuators and have been successfully incorporated in a wide range of applications, such as training and simulation for complex tasks (Sigrist et al. 2015). While active haptic devices enable a wide range of haptic feedback, they are typically cost prohibitive and complex. Passive haptics provide reaction forces and friction and, despite not being able to actively render different forces to the user, are typically inexpensive, robust and less complex than active devices. Passive haptic interfaces have been shown to provide several benefits, such as increased levels of realism and immersion in VEs (Hoffman et al. 1998; Insko 2001, Han et al. 2018; and Mantovani 2019) and an improved sense of presence (Schulz et al. 2019). Passive haptic interfaces have been used for conducting music (Barmpoutis et al. 2020) and for medical simulation and training (Fucentese et al. 2015). Passive haptic interaction using 3D physical replicas of the virtual object being manipulated has been successfully applied in training applications to assist with a complex task (Oda et al. 2015). 3D printed artefacts have been used to augment VR museum experiences and provide passive haptic feedback (Di Franco et al. 2015; Antlej et al. 2018) to increase realism and improve engagement. A recent study has explored the use of inexpensive touch-sensitive 3D printed replicas to interact with the virtual representation of cultural heritage artefacts (Palma et al. 2021). Cheng et al. (2018) propose passive physical props that could be turned into active haptics by the user’s physical action to make the experience more realistic and enjoyable.

3 Method

An evaluation was undertaken to investigate the impact of the physical everyday tools, and corresponding passive haptic feedback, on users’ interaction with three different interaction techniques. Each technique was performed twice (once with the physical tool and once without the tool), constituting six variants of the interaction techniques. This research considers different interaction techniques within a museum-from-home setting. Usability and user experience criteria were used to conduct and evaluate a comparative study between the versions of the same technique. Usability testing was conducted with participants to evaluate the efficiency, efficacy, and satisfaction with the techniques. After each technique, a post-task survey was undertaken to capture users’ experience with the related interaction technique. Criteria included ease of use, confidence, naturalness, ease of understanding, helpfulness, joy of use, frustration, and cognitive and physical load, all rated on a 5-point Likert scale. A post-experience survey was conducted offline to collect demographic data and satisfaction ratings. The measures collected for this evaluation are listed in Table 1.

Table 1 Usability and user experience measures collected during the experiment

3.1 Participants

In total 24 participants (10 female, 14 male) completed the evaluation. Their ages ranged from 18 to 65 years (M = age group 26–35) and included postgraduate research students from Deakin University and members of the public from the Geelong and Melbourne regions in Australia.

Twenty-two participants completed the post-experience survey, and of these, 7 (31.82%) had not been immersed in VR in the past 5 years, 5 (22.73%) had only had a single experience, 4 (18.18%) had between 2 and 5 experiences, 1 (4.55%) had between 6 and 15 experiences and 5 (22.73%) had more than 16 experiences. The participants’ self-reported familiarity with VR was as follows: 1 participant self-reported as an expert (4.55%), 5 participants as very familiar (22.73%), 10 participants as moderately familiar (45.45%), 3 participants as slightly familiar (13.64%), 2 participants as not familiar at all (9.09%) and 1 participant as none of the above (4.55%). Considering prior experience with 3D gaming, 5 (22.73%) participants had never played 3D video games, 4 (18.18%) had tried once, 8 (36.36%) played occasionally, 3 (13.64%) played weekly and 2 (9.09%) played daily.

This broadly distributed demographic profile, in terms of age and relevant prior experience, aligns well with typical museum visitors. Some of the cohort was recruited from earlier research with the National Wool Museum in Geelong, Australia.

3.2 Apparatus

The study was conducted within participants’ homes. Participants were provided with the Oculus Quest VR headset pre-programmed with the VR experience, relevant equipment (e.g. charging cable, sanitising wipes, and disposable protection cover), tools (glove, pen and kitchen tongs) and instructions on how to configure and use the VR experience. The research team were telepresent during the experiment via the Zoom video conferencing platform.

The VR experiences were developed using Unity and the C# programming language. Figure 1 shows the instructional diagram provided to participants as part of the instructions and illustrates the relevant setup requirements. Participants were required to use their bare hands to complete three of the VR experiences. Another three were undertaken with the aid of corresponding everyday tools. During the experiments, video and audio from the Zoom call were used to monitor and record the participants’ behaviour.

Fig. 1 Instructional diagram provided to participants to illustrate relevant information including: A the minimum area required to be clear of obstacles; B the ideal distance from the computer camera for the Zoom call; and C the provided suitcase with equipment

3.3 Experimental design

3.3.1 Scenario

The scenario for the VR museum experience was based on the case study of a utility vehicle (i.e. ute), an iconic Australian invention designed by Lew Bandt in 1934 for Ford in Geelong. The task of loading the ute with five virtual farm objects was selected to evaluate the proposed interaction techniques.

3.3.2 Interactive virtual objects

Figure 2 shows the virtual museum objects that participants were required to select and position onto the cargo area at the back of the ute. These interactive virtual objects include an iconic Australian 1934 utility vehicle, colloquially referred to as a ute (Fig. 2a), and five desk-top size farm-related objects that might ordinarily be carried on the back of the ute: a pumpkin, pitchfork, milk urn, bale of hay, and pig, shown in Fig. 2b–f, respectively. In the task assigned to participants, the ute remained static, and participants were required to move the five desk-top size objects onto the back of the ute using one of the six variants of the interaction techniques shown in Fig. 4. The task was considered complete once all five objects had been successfully positioned onto the back of the ute (Fig. 3b).

Fig. 2 3D models of museum objects used within the VR environment. a ute, b pumpkin, c pitchfork, d milk urn, e bale of hay, and f pig

Fig. 3 a Initial placement of objects using a circle’s circumference (placement circle) to ensure equal distance between objects and user. The white dot represents the user’s initial spawn location at the centre of the placement circle, and the diagonal striped pattern shows the area in which participants were asked to place objects. b Final position of the objects once a participant has completed the task

As shown in Fig. 5, the participant’s starting location was behind the vehicle and in line with the ground plane at the centre of the red circle used for placement (not visible within the VR experience). Although the participant was initially placed at the centre of this placement circle, they were able to move around the VE during the experience as long as they remained within the tracking volume assigned during the setup of the system. In each experience, the five desk-top sized objects were randomly placed along the circumference of the placement circle, excluding the region at the rear of the ute, as presented in Figs. 3a and 5. The objects were orientated so they faced towards the origin of the circle (normal to the circle’s circumference), and the forward direction used was the same as that presented in Fig. 5. The placement of virtual objects was such that once an object was assigned to its random position along the placement circle’s circumference, a surrounding region of this circumference was excluded from future placement to ensure that the objects did not overlap. The placement circle was positioned to ensure an equal distance from its centre to the farm objects and the back of the ute where objects were stored.
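The placement rule described above can be illustrated with a minimal Python sketch. It is not the authors' Unity/C# implementation: the rejection-sampling strategy, the minimum angular gap, the excluded arc for the rear of the ute, and the x/z coordinate convention are all assumptions introduced here for illustration.

```python
import math
import random


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def in_arc(angle, start, end):
    """True if angle lies on the arc running counter-clockwise from start to end."""
    return (angle - start) % 360.0 <= (end - start) % 360.0


def place_objects(n_objects, radius, min_gap_deg, excluded_arc_deg,
                  rng=None, max_tries=10_000):
    """Return (x, z, facing_deg) poses on a circle centred on the participant.

    min_gap_deg and excluded_arc_deg are hypothetical parameters: the paper
    states only that a region around each placed object and the region at
    the rear of the ute are excluded, not how large those regions are.
    """
    rng = rng or random.Random()
    angles = []
    tries = 0
    while len(angles) < n_objects:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place all objects; relax the constraints")
        candidate = rng.uniform(0.0, 360.0)
        if in_arc(candidate, *excluded_arc_deg):
            continue  # reserved for the rear of the ute
        if any(angular_distance(candidate, a) < min_gap_deg for a in angles):
            continue  # too close to an object that is already placed
        angles.append(candidate)
    poses = []
    for a in angles:
        x = radius * math.cos(math.radians(a))
        z = radius * math.sin(math.radians(a))
        facing = (a + 180.0) % 360.0  # object faces the centre of the circle
        poses.append((x, z, facing))
    return poses
```

For example, `place_objects(5, 0.5, 30.0, (60.0, 120.0))` would scatter five objects on a 0.5 m circle while keeping a 60° arc free; the 30° gap and the arc limits are placeholders rather than values reported in the paper.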

3.3.3 Interaction techniques

Each of the three interaction techniques has a corresponding everyday physical tool (Fig. 4):

(i) Natural hand grasping: a glove;

(ii) Ray-casting: a pen; and

(iii) Jaw tool-type grasping: kitchen tongs.

Fig. 4 Each interaction technique was used twice (once with the physical everyday tool to provide passive haptic feedback and once without the physical tool), which comprises a total of six variants of the interaction techniques

The interaction techniques were chosen based on their suitability for selection and manipulation of desk-top sized objects at different distances (Fig. 5). In choosing the interaction techniques and corresponding everyday tools, care was taken to ensure that each technique, both with and without the physical tool, could be satisfactorily tracked by the hand-tracking system in the VR headset (Oculus Quest). The first two interaction techniques have been widely used and published in the literature (LaViola Jr et al. 2017). These were chosen because they are based on two fundamental interaction metaphors, virtual hand (grasping metaphor) and virtual pointer (pointing metaphor) (Poupyrev et al. 1998), which are commonly used for manipulation and selection in VR. The familiar everyday tool is intended to enhance these two metaphors. The third technique is a variation on hand grasping for manipulating objects at a distance and was developed for this work because its capabilities are distinct from those of the ray-cast and hand grasp techniques. The technique also simulates the interaction with an everyday tool familiar to most users. A technique using tongs was proposed by Schkolne et al. (2002), and it incorporates sensors and other technology to manipulate objects in VR. Our approach differs in that the tongs are passive and have no embedded technology or connection to the system.

Fig. 5 Top view showing the initial placement of the virtual objects for each interaction technique

Whether the participants used the technique with or without the physical tool, they would see a virtual version of the tool in VR, as depicted in the right-hand column of Fig. 4. When using the techniques with a physical tool, the participants held the physical tool (glove, pen or tongs) in their hand as they moved their hand to perform the interaction technique. The participants were instructed to use their preferred hand to interact with the technique and not to change this as the experiment progressed.

The radius of the red placement circle shown in Fig. 5 was specified relative to the distances considered to best support each interaction technique. The grasp technique is for directly grasping objects by hand, and as such the radius of the circle was set to 0.5 m so that objects were positioned within arm’s reach. The ray-casting method, on the other hand, is better suited for selection and manipulation at distances beyond arm’s reach; therefore, its placement circle was assigned a radius of 1.5 m. Finally, the jaw tool-type grasping technique was designed to accommodate an extended reach and was deemed suitable for objects just outside of arm’s reach; as such, its placement circle was assigned a radius of 0.6 m.

A within-group design was used for the experiment where each participant used the three techniques twice (i.e. with and without the physical tool), comprising a total of six interaction techniques. The ordering of the techniques was randomly assigned to control and reduce the learning effect. The tasks assigned to the participants with respect to object placement were kept simple, with corresponding instructional videos provided to give participants sufficient time to become familiar with the task and interaction technique.
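As a rough illustration of a randomised within-group ordering, the sketch below draws a random presentation order over the six variants. The paper does not say how the randomisation was generated, so the use of a simple shuffle (rather than, say, a Latin square) and the labels in `VARIANTS` are assumptions.

```python
import random

# Hypothetical labels for the six variants: (technique, physical tool or None).
VARIANTS = [
    ("hand grasp", "glove"), ("hand grasp", None),
    ("ray-cast", "pen"), ("ray-cast", None),
    ("jaw tool-type grasp", "tongs"), ("jaw tool-type grasp", None),
]


def presentation_order(rng=None):
    """Return the six variants in a random order, each appearing exactly once."""
    rng = rng or random.Random()
    return rng.sample(VARIANTS, k=len(VARIANTS))
```

Calling `presentation_order()` once per participant gives every participant all six variants in an independently drawn order.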

3.3.4 Selection and manipulation task

The participants were required to select a virtual object and then place it on the back of the ute. In general, selection refers to the task of choosing an object, whereas manipulation refers to setting its position and orientation and changing other properties of the object, such as scale, shape, or colour. During the experiment, the participants were required to change only the position, and rotation as required, of the virtual object to move it from the starting location to the target destination on the back of the ute. As such, in this work, manipulation refers only to changing the position (and potentially rotation) of the selected object and not to other properties.

Once a participant had selected and picked up a virtual object from its initial location, they were required to move it to the back of the virtual ute. Oculus Quest hand tracking provides six degrees of freedom allowing participants to move and rotate virtual objects naturally as if they were doing so in the real-world. Once a participant moved the selected object to the cargo area at the back of the ute the object was automatically snapped to its predefined location. The task was completed once all five objects had been moved to the required location.

Selection time (ST) was defined as the time taken to select an object and was calculated as the time between the previous object being released on the back of the ute (predefined location) and the next object being selected (Fig. 6). The ST for the first object was the time from the start of the experience until the first confirmed selection of an object. The ST may include the participants exploring the environment and deciding which object to select before the selection is confirmed. On the basis of observation of the participants undertaking the experiences, a large proportion of ST was spent choosing the next object to be moved to the back of the ute. No significant difference in ST was found for the selection of the first object versus the following objects. Therefore, ST for the first object was retained within the data for analysis and not excluded. Manipulation refers to the act of moving the selected object to, and placing it at, its storage location at the back of the ute. Manipulation time (MT) was defined as the time spent moving objects, starting from when the participant selects an object until the object is positioned on the back of the ute.
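The ST and MT definitions above can be expressed as a short Python sketch over a timestamped event log. The log format is hypothetical: the event names `start`, `select` and `release` are introduced here for illustration, and `release` is assumed to mean a successful placement on the back of the ute.

```python
def selection_and_manipulation_times(events):
    """Split a chronological event log into per-object ST and MT lists (ms).

    events: iterable of (timestamp_ms, kind), kind in {'start', 'select', 'release'}.
    The first ST is measured from 'start'; later STs from the previous 'release'.
    """
    st, mt = [], []
    reference = None    # time from which the next ST is measured
    selected_at = None  # time of the most recent confirmed selection
    for timestamp, kind in events:
        if kind == "start":
            reference = timestamp
        elif kind == "select":
            st.append(timestamp - reference)
            selected_at = timestamp
        elif kind == "release":
            mt.append(timestamp - selected_at)
            reference = timestamp
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return st, mt
```

With a log such as `[(0, "start"), (4000, "select"), (9000, "release"), (12000, "select"), (15000, "release")]`, the first object has ST 4000 ms and MT 5000 ms, and the second has ST 3000 ms and MT 3000 ms. Dropped or re-grasped objects are not modelled in this sketch.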

Fig. 6 Protocol for the experiment showing the sequence of activities, the data collection protocol used to capture data, and the breakdown of tasks

3.4 Procedure

Before participating in the experiment, participants underwent a process to obtain formal consent. This included receiving a digital copy of a plain language statement and consent form via email. Figure 6 depicts the protocol and data collection for the experiment, which comprises the experiment set-up, the VR experience (i.e. task/technique performance and post-task/technique survey) and the post-experience survey. The Zoom session was video and audio recorded, and data collected during the VR experiences were stored directly on the headset. Think-aloud and observation were used as techniques for capturing the user’s performance during the task and to complement primary data collection. A post-experience survey was undertaken outside of VR after the VR experiences with all six techniques had been completed. The survey was completed electronically once the researcher had left the participant.

After the equipment was delivered to the participant’s home, participants met with the researcher at a pre-scheduled time on Zoom. The participants were first introduced to the setup and required to configure the devices, which took approximately 20 min in total. The participants then undertook the six VR experiences (one for each technique) and corresponding post-task surveys (one for each technique), which took approximately 30–45 min in total (including breaks). The post-experience survey took approximately 10 min for participants to complete. The total time of participation was approximately 90 min inclusive of set-up time.

Before starting each VR experience the participants were provided with a random number between 1 and 6 corresponding to one of the six interaction techniques. The participant was then presented with an instructional video (within VR) on how to interact with tools corresponding to the particular technique. Once ready they were required to select an object and then position it on the back of a utility vehicle (ute) within the VE. Five objects needed to be loaded onto the ute and once all five objects had been successfully positioned on the back of the ute the task was complete. The participants were then prompted to press a virtual button (located on the back of the ute) to begin the post-task/technique survey. This process was repeated six times, once for each of the interaction techniques.

Measurements were collected for the usability criteria (i.e. efficacy, efficiency and satisfaction). The efficiency and efficacy criteria were measured while the participants performed the tasks, and the data were stored on the headset; the satisfaction criterion was evaluated using a post-task/technique survey. Efficiency was determined using MT (time spent moving objects) and ST (time taken to select an object). All logged data were timestamped (in milliseconds), and each data point was stored on a rendered visual frame, which occurred approximately 72 times per second (~ 14 ms intervals). This corresponds to the 72 Hz visual refresh rate of the Oculus Quest headset. Efficacy was determined by looking at the success rate and error rate. The task was successfully completed when all five objects were loaded on the back of the ute. Success rate was defined as the percentage of users who completed all six experiences and the percentage of completed tasks. The error rate (ER) was determined by counting the objects that were not loaded relative to the number of objects. Think-aloud and observation notes were used to capture perceived effort, difficulty and confusion while performing all experimental tasks. This qualitative input was used to add insights and clarify the challenges faced by the participants.
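The logging interval and the efficacy measures described above reduce to simple arithmetic, sketched below in Python. The function names are illustrative only, and expressing the rates as fractions rather than percentages is an assumption.

```python
REFRESH_HZ = 72  # visual refresh rate of the Oculus Quest, as stated in the text


def frame_interval_ms(refresh_hz=REFRESH_HZ):
    """Time between logged data points when one point is stored per frame."""
    return 1000.0 / refresh_hz


def error_rate(objects_loaded, objects_total=5):
    """Objects not loaded, relative to the number of objects in the task."""
    return (objects_total - objects_loaded) / objects_total


def task_success_rate(loaded_counts, objects_total=5):
    """Share of tasks in which every object ended up on the back of the ute."""
    completed = sum(1 for n in loaded_counts if n == objects_total)
    return completed / len(loaded_counts)
```

At 72 Hz the interval is 1000 / 72 ≈ 13.9 ms, which matches the ~14 ms stated above; a task with four of five objects loaded would give an error rate of 0.2.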

The post-task/technique survey was undertaken directly in VR after task/technique performance and repeated following each of the six interaction techniques. The participants selected their response by pressing a virtual button using their hand, as depicted in Fig. 7. The post-task/technique survey comprised nine questions relating to the participant’s experience with respect to the interaction technique and the questions are presented with results in Table 6. The questions considered user experience, including ease of use, naturalness, confidence, ease of understanding, helpfulness, joy of use, frustration, and physical and cognitive load. The questions were partially taken from the System Usability Scale (SUS) and adapted to suit the given experiment and scenario. Other questions were created to address aspects specific to interaction with passive physical tools, and the participants responded using a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree).

Fig. 7 Screen capture of the VR environment showing a participant responding to the post-task/technique survey

To capture the participants’ demographic data, feedback and preferences, a post-experience survey was undertaken after completion of VR experiences. Twenty-two of the 24 participants undertook the survey. Aside from demographics, the survey included questions on self-reported satisfaction, as presented in Table 4, and perceived effectiveness as presented in Table 5. The question on satisfaction was repeated for each of the six interaction techniques and a 5-point Likert scale was used. Questions related to perceived effectiveness were repeated for each technique that used a physical tool (three times). Self-reported effectiveness questions were answered using an integer scale ranging from 0 to 100 aided by a slider.

4 Results

4.1 Usability of the interaction techniques

Evaluation of usability included the efficiency and efficacy criteria, and the overall satisfaction. Quantitative measures used to evaluate participant efficiency included ST (i.e. time taken to select an object), MT (i.e. time spent moving an object), and dwell time (i.e. time required for the system to confirm a selection). When the ray-cast interaction technique was used, an object was selected by directing the cast ray onto the object for 3 s. This time is required to confirm that the object was selected intentionally and is referred to herein as the dwell time. STs longer than 15 s were considered unusual and potentially caused by issues with tracking. As such, these were excluded from the analysis.
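A dwell-based selection and the 15 s exclusion rule can be sketched as follows. This is an illustrative Python model rather than the Unity/C# code used in the study; the class and function names, and the choice to reset the timer whenever the ray leaves the object, are assumptions.

```python
DWELL_TIME_S = 3.0  # ray must stay on an object this long to confirm selection
MAX_ST_S = 15.0     # selection times above this were excluded from the analysis


class DwellSelector:
    """Confirms a selection once the ray has rested on one object long enough."""

    def __init__(self, dwell_time_s=DWELL_TIME_S):
        self.dwell_time_s = dwell_time_s
        self.target = None
        self.elapsed = 0.0

    def update(self, hit_object, dt_s):
        """Advance one frame; return the object when the dwell completes, else None."""
        if hit_object is None or hit_object != self.target:
            # Ray moved to a different object (or to nothing): restart the timer.
            self.target = hit_object
            self.elapsed = 0.0
            return None
        self.elapsed += dt_s
        if self.elapsed >= self.dwell_time_s:
            self.target = None
            self.elapsed = 0.0
            return hit_object
        return None


def drop_long_selection_times(st_seconds, max_st_s=MAX_ST_S):
    """Keep only selection times at or below the exclusion threshold."""
    return [t for t in st_seconds if t <= max_st_s]
```

In a render loop, `update` would be called once per frame with the object currently hit by the ray and the frame duration; the selection fires only after the ray has stayed on the same object for the full dwell time.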

4.1.1 Efficiency and efficacy

The Friedman test was performed to test for differences in MT. As can be seen in Table 2, a statistically significant difference in MT was found between the hand grasp, ray-cast and jaw tool-type grasping techniques with and without the physical tool (χ²(5) = 61.683, p < 0.001). Post-hoc (Wilcoxon signed-rank test) comparisons were performed with Bonferroni correction applied to maintain a significance level (α) of 5%. The significance level for MT was set at p < 0.003, and a statistically significant difference (p < 0.001) in MT existed between the hand grasping and ray-casting techniques for all four combinations (the two techniques each with and without the tool). The same was observed for the jaw tool-type and ray-casting techniques. No significant difference in MT was found between using and not using the tool for each of the three techniques. The ray-cast technique resulted in significantly faster MT compared with hand grasp and jaw tool-type grasping.
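The corrected threshold of 0.003 is consistent with a Bonferroni adjustment over all pairwise comparisons of the six variants, as the short Python check below shows. That all 15 pairs were tested is an assumption; the text does not state the number of comparisons explicitly.

```python
from math import comb


def bonferroni_threshold(n_conditions, alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison significance level)."""
    n_comparisons = comb(n_conditions, 2)
    return n_comparisons, alpha / n_comparisons


n_pairs, threshold = bonferroni_threshold(6)
print(n_pairs, round(threshold, 4))
```

Six conditions give 15 pairs and a per-comparison level of 0.05 / 15 ≈ 0.0033. The 5 degrees of freedom reported for the Friedman test likewise correspond to k − 1 with k = 6 conditions.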

Table 2 Test statistics (N = 23)

Table 3 shows the average MT taken by participants (N = 24) to correctly position the required objects on the back of the ute by using each interaction technique. Although there was no significant difference for the ray-cast and jaw tool-type grasping techniques, the participants were approximately 17–18% faster when using the physical tool. The largest difference in SD between the option with the tool (SD = 2.01) and without the tool (SD = 4.01) was for the jaw tool-type technique. The lower SD when a tool is used could suggest that it provides not only a lower MT but also more consistency.

Table 3 Descriptive statistics based on average manipulation time (MT) and selection time (ST) with the different techniques

As mentioned earlier, ST is the time between correctly positioning an object onto the back of the ute and selecting the next object. Repeated measures ANOVA was used to compare ST for all six interaction techniques. A statistically significant difference in average ST was found between the interaction techniques (F(5, 110) = 13.909, p < 0.001). Results of a post hoc pairwise comparison test with a Bonferroni adjustment revealed that ST was faster for the ray-cast technique than for hand grasp and jaw tool-type grasping (p < 0.05). However, no difference was found between the same technique with and without the physical tool (Table 2).
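As a reading aid, the degrees of freedom in F(5, 110) can be related to the design using the standard formula for a one-way repeated measures ANOVA without a sphericity correction. The sketch below assumes that uncorrected form; the function name is illustrative.

```python
from typing import Tuple


def rm_anova_dof(n_participants: int, n_conditions: int) -> Tuple[int, int]:
    """Degrees of freedom (effect, error) for a one-way repeated measures ANOVA."""
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_participants - 1)
    return df_effect, df_error


print(rm_anova_dof(n_participants=23, n_conditions=6))  # prints (5, 110)
```

Under this assumption, six conditions and 23 participants with complete data give the reported degrees of freedom, which matches the N = 23 shown for Table 2.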

Figure 8 shows the ST and MT for all five objects using each of the six interaction techniques. For selection, the slowest times for all five objects corresponded to the jaw tool-type grasping interaction, which can potentially be considered more mechanically complex to perform than the other two techniques. Comparing each technique with and without the physical tool, the longest ST was for the jaw tool-type grasping interaction technique with the physical tool. The same technique without the physical tool was around 10% faster. The STs for the ray-cast with and without the physical tool were comparable (−1%), and that for the hand grasp was faster (7%) when using the physical tool.

Fig. 8

Selection time (ST) and manipulation time (MT) for positioning the five objects

Over the course of selecting the five objects, the difference in average ST between the six techniques narrowed from mean = 5.89 (SD = 0.76) for the first object to mean = 4.21 (SD = 0.93) for the last object. When objects were selected, the jaw tool-type grasping interaction technique, both with and without the physical tool, was largely the slowest of all techniques. Its ST does, however, decrease, and by the fifth and final object it is much closer to that of the other techniques. This finding may indicate that despite the technique being perceived to have the lowest level of satisfaction and high levels of frustration, as discussed in the following sections, the participants quickly learnt how to use it efficiently for selecting objects.

In terms of the time taken to manipulate objects, the hand grasp interaction technique was largely the slowest. This finding appears logical given that both the ray-cast and jaw tool-type grasping techniques inherently provide the ability to move objects at a distance, unlike the hand grasp technique, which requires the object to be picked up and placed. The ray-cast technique, both with and without the physical tool, was relatively fast for selection and was also the fastest technique for manipulation, likely because of the ease with which objects can be moved at a distance.

All participants attempted to complete the VR experiences with each of the six techniques and half of the participants (N = 12) loaded all five objects for all techniques. Overall, 89.58% of the tasks were completed successfully. The error rate was recorded by counting each unloaded object. The highest error rate was recorded for the jaw tool-type interaction technique with (5%) and without (4.17%) a tool and for ray-casting without a tool (4.17%).
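The completion and error rates above are simple proportions. The sketch below shows the arithmetic with counts that are illustrative assumptions, not values taken from the study's data: they assume 24 participants, six techniques (144 tasks) and five objects per task (120 objects per technique), and are chosen only because they reproduce the reported percentages.

```python
def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimal places."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


# Hypothetical counts (assumptions for illustration only).
tasks_total = 24 * 6          # participants x techniques
objects_per_technique = 24 * 5  # participants x objects

print(percentage(129, tasks_total))          # prints 89.58
print(percentage(6, objects_per_technique))  # prints 5.0
print(percentage(5, objects_per_technique))  # prints 4.17
```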

Spearman’s correlation was performed to identify correlations between demographics (e.g. age and gender), prior experience (e.g. 3D gaming, VR experiences, and familiarity with VR) and efficiency (e.g. MT and ST). Only one significant correlation was found, between prior VR experiences and ST for the ray-casting technique without the tool (ρ = 0.437, p = 0.042, N = 22). These results suggest that the above factors do not affect the participants’ efficiency with the different techniques to a large extent.

4.1.2 Satisfaction

The post-experience survey rated overall user satisfaction for the six interaction techniques (Table 4). Satisfaction was measured using a 5-point Likert scale represented as a star rating system, where 1-star represents “not satisfied at all” and 5-stars represents “very satisfied”. The difference in results for the same technique with and without physical tools is provided as a percentage, with a negative difference representing a preference for the technique without the physical tool and a positive difference indicating a preference for the technique with the physical tool. The Friedman test was performed to ascertain differences in satisfaction ratings when using different techniques. Although a significant difference was found (p = 0.046), post hoc analysis found that none of the pairwise differences reached significance when alpha was adjusted using Bonferroni correction.
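For readers unfamiliar with the Friedman test, the statistic can be sketched in a few lines. This is a minimal version that ranks each participant's ratings across conditions and omits the correction for tied ranks that statistical packages typically apply (ties are common with Likert ratings), so it is illustrative rather than a reproduction of the reported analysis. The function names and the example ratings are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(ratings: Sequence[Sequence[float]]) -> float:
    """Friedman statistic without tie correction; one row per participant."""
    n = len(ratings)
    k = len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum_sq - 3.0 * n * (k + 1)


# Hypothetical ratings: three participants, three conditions.
example = [[1, 2, 3], [2, 3, 5], [1, 3, 4]]
statistic = friedman_chi_square(example)
```

The statistic is compared against a chi-square distribution with k − 1 degrees of freedom, which for six conditions gives the five degrees of freedom reported for MT.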

Table 4 User satisfaction ratings for each of the six interaction techniques

Results show that for all techniques, not using the physical tool was more satisfying than using it. The largest difference was observed for the hand grasp interaction technique, where the version without the physical tool was preferred (13.33%) over the version with it. For the ray-cast technique, not using a physical tool was more satisfying than using one (5.13%), and no noticeable difference was observed for the jaw tool-type grasping technique (0.70%).

A statistically significant negative correlation was observed between MT and satisfaction for the hand grasp technique without the tool (ρ = −0.491, p = 0.024). This association, in which participants who manipulated objects faster also reported higher satisfaction, was observed for this technique only and not for the others.

The results related to perceived effectiveness (Table 5) show that for all interaction techniques participants marginally felt that holding a corresponding physical tool (i.e. pen, glove or tongs) helped them in using the particular interaction technique.

Table 5 Self-reported effectiveness in using a tool for each of the three interaction techniques

4.2 User experience

The post-task/technique survey mainly considers the user experience when employing specific interaction techniques in the given scenario and task (Table 6). In this task, user experience considers ease of use, naturalness, confidence, ease of understanding, helpfulness, joy of use, frustration, and cognitive and physical load. The difference between using the interaction technique with and without the physical tool is provided as percentage difference. A positive percentage difference means that the technique when using the physical tool received a higher rating, and a negative percentage difference indicates that the technique rated higher when the physical tool was not used.
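The sign convention for the percentage difference can be illustrated with a short sketch. The text does not state the exact formula, so the version below assumes the rating without the physical tool is used as the baseline; the function name and the ratings are hypothetical.

```python
def tool_effect_percent(with_tool: float, without_tool: float) -> float:
    """Percentage difference relative to the no-tool rating (assumed baseline).

    Positive: the technique rated higher with the physical tool.
    Negative: the technique rated higher without the physical tool.
    """
    if without_tool == 0:
        raise ValueError("baseline rating must be non-zero")
    return 100.0 * (with_tool - without_tool) / without_tool


print(tool_effect_percent(with_tool=4.5, without_tool=4.0))  # prints 12.5
print(tool_effect_percent(with_tool=3.5, without_tool=4.0))  # prints -12.5
```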

Table 6 Self-reported user experience ratings

When considering ease of use, as a whole, participants found the techniques easier to use without the physical tool. The ray-cast technique without a physical tool had the highest ease of use, and the participants commented “That was easy” and “Easy but did not get the first time”. For the hand grasping technique, the preference was also for the version without the physical tool, and the participants made comments such as “This one is easy” and “That was realistic, grabbing the handles of the milk urn.” The jaw tool-type grasping technique with a physical tool had the lowest ease of use. The participants’ comments on the physical jaw tool-type grasping interaction technique included “I like the physical feeling, but marking the tracking”, “They are not staying still”, “Can’t release the tongs”, and “Weird with the tongs, the real physics is not there”.

When comparing the jaw tool-type grasping with and without the physical tool, the participants felt that using the physical tool was slightly more natural (3.66%), despite indicating that using the jaw tool-type grasping with the physical tool (tongs) was more physically and cognitively demanding.

Figure 9 shows that using the physical tool with the jaw tool-type grasping technique scored higher across all three questions in terms of confidence, ease of understanding and helpfulness. Comments such as “Tongs help a lot” and “this one is easier” were made. For the ray-cast technique, little difference was found when using and not using the physical tool.

Fig. 9

Self-reported ease-of-use, naturalness, confidence, ease of understanding and helpfulness for the jaw tool-type grasping technique

The results indicate that the participants reported higher cognitive demand when using the physical tool for all three interaction techniques. Interestingly, the participants found interaction with a physical pen 17% more demanding than the option without a tool, which is the highest percentage difference. The participants found that using the passive physical tool for the jaw tool-type technique was only 9.62% more cognitively demanding.

In terms of perceived physical demand, the largest difference was observed for the jaw tool-type grasping interaction, which is also the most mechanically complex task, with results showing that use of the physical tool (tongs) was 12.5% more physically demanding than without the tool. For the ray-cast interaction, which is also the least mechanically complex, no difference was observed. This result was supported by the participants’ comments, such as “Not much difference even with physical” and “Tongs helped a lot, pen not much difference.”

Regarding the difference in frustration for the same technique (with and without the physical tool), the participants reported higher frustration levels when using the jaw tool-type grasping (26.92%) and ray-cast (10.2%) interaction techniques without physical tools than they did when the physical tools were used. This increased frustration corresponds to slower performance when using the jaw tool-type grasping (17%) and ray-cast (18%) interaction without the physical tool. The same pattern was seen for the hand grasp interaction technique, where the participants found that using the physical tool was more frustrating (by 8.24%) than not using it and were also slower with it (by only 2%). These results suggest that the version of the same interaction technique (i.e. with or without physical tool) that is more frustrating is likely to relate to slower performance in manipulating/positioning objects.

Spearman’s correlation was performed on MT and frustration and yielded a statistically significant positive correlation for the hand grasp technique without the tool (ρ = 0.524, p = 0.009) and a nearly significant correlation (ρ = 0.135, p = 0.058) for the ray-casting technique without the tool. A positive correlation indicates that the more frustrated participants are with the technique, the longer it takes them to manipulate the object.
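Spearman’s ρ is the Pearson correlation computed on ranks, which makes it suitable for ordinal ratings such as frustration. The sketch below is a minimal standard-library version that returns ρ only (no p-value); the function names and the example values are hypothetical and are not the study's data.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical manipulation times (s) and frustration ratings (1-5).
mt_seconds = [4.2, 5.1, 6.3, 7.0, 9.4]
frustration = [2, 1, 4, 3, 5]
rho = spearman_rho(mt_seconds, frustration)
```

With these example values ρ is positive, meaning that longer manipulation times go together with higher frustration ratings.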

As discussed, the difference in self-reported frustration for tool versus no tool was largest for the jaw tool-type grasping technique, followed by the ray-cast technique, and smallest for the hand grasp technique. For satisfaction, the opposite was the case, with the largest difference being for hand grasping, followed by the ray-cast, and the smallest for jaw tool-type grasping (Table 4).

Considering the size of the difference between using and not using a physical tool (Tables 3, 4 and 6), the jaw tool-type grasping technique had the largest difference for frustration but the smallest for satisfaction. The hand grasp interaction technique had the smallest difference for frustration but the largest for satisfaction. Participant comments supported the self-reported higher satisfaction when not using the physical tool (Table 4), including “Feeling more comfortable without using any tool in my hands” and “There is no difference when I experience, a physical item in my hand”. The participants differed in their preferences for the jaw tool-type technique, with some commenting on not observing any difference between “virtual or physical” and others saying that “Tongs helped a lot” and that “Tongs help a bit because there's like feedback. Like spring feedback.” Some participants felt that it was better without the physical tool, commenting that “The virtual tongs were the best method”, “Virtual is ok”, and “Physical tongs were not so comfortable to use”. Notably, the jaw tool-type grasping technique and corresponding kitchen tongs potentially constitute a more mechanically challenging task for users than performing the ray-cast (potentially holding a pen) and hand grasp (potentially wearing a glove).

Spearman’s correlation was conducted to look for correlations between demographics (e.g. age, gender), prior experience (e.g. 3D gaming, VR experiences, familiarity with VR) and user experience (e.g. ease of use, naturalness, confidence and more). Gaming experience, VR level and VR experience seem to correlate with the user experience variables. This relationship is more apparent for techniques that do not use an accompanying tool, particularly for the ray-cast technique without a tool. Frustration, confidence and physical effort seem to be the most affected; however, due to the large number of dependent variables (i.e. comparisons) and the small sample size, this matter requires further investigation.

5 Discussion

This paper investigates the impact of passive physical everyday tools on users’ interaction with objects within a VR environment. Museum visitors are a diverse group, and their interaction with museum objects and exhibits is often infrequent or a first-time experience. Participants in the study represented a wide age demographic (18–65) and a wide variety of experience with VR and related technologies. The case study was chosen as one suitable for a museum and the particular interaction techniques as ones with wide applicability. An experiment was undertaken to evaluate the impact of using the passive physical tool on the usability and user experience of three interaction techniques, each with and without a corresponding physical tool held by the user. The evaluation was conducted within participants’ homes.

The usability results show that the MT was on average lower (17–18%) for the ray-cast and jaw tool-type grasping techniques when using the passive physical tool. The findings also suggest that, for the jaw tool-type and ray-cast techniques, incorporating a passive physical tool increases the physical and cognitive load but lowers the level of experienced frustration. For the jaw tool-type technique, the participants felt that movements were more natural when using the physical tool.

The jaw tool-type grasping technique was the most difficult to use in general, possibly because the technique is potentially the most mechanically complex to perform (compared with grabbing using one’s hand or pointing a pen) and aims to simulate interaction with a familiar tool. The technique with the physical tool was reported by the participants as the most difficult of all three techniques. Despite this, participants felt it was more “natural”, were more confident, and found it easier to understand and more helpful than not using the tool. This finding may suggest that for mechanically difficult 3D interaction techniques, the provision and use of a physical tool may positively affect the user’s confidence and perceived ease of understanding and helpfulness, thus making it feel more natural to use the tool overall.

The participants found the ray-cast, hand grasp and jaw tool-type grasping techniques more satisfying without the physical tool than with it. However, the reported satisfaction levels for the jaw tool-type grasping technique were similar for the options with and without the physical tool. The jaw tool-type technique may be the most mechanically complex to use, and as noted above, when used with the physical tool, this technique corresponded to higher reported levels of naturalness, confidence, ease of understanding and helpfulness than without the physical tool. These perceived benefits may have offset the low level of relative satisfaction with the use of the physical tool with this technique.

The interaction techniques utilise hand tracking of the participant’s preferred hand. When a physical tool is used, it potentially occludes the camera’s view of the hand and causes tracking issues. Based on observation and participant comments, this condition appeared to have a greater effect on the hand grasp and jaw tool-type interaction techniques because they require movement with multiple hand rotations. While free-hand interaction encourages the user to freely explore the virtual environment, the lack of physical constraints, quick and uncontrolled movements, and the restriction of the virtual world to the boundaries of the physical space all challenge the capability of camera tracking. The participants felt the challenges caused by tracking limitations and issues, as evidenced by remarks such as,

“Demonstrates some of the limitations of hand tracking still”, “The bale of hay at the beginning of the experience it went on top of my head. I don’t know what happened. But it’s still fun”, and “Getting the program to recognise hand gestures such as grabbing”.

Comments such as

“Weird with the tongs, the real physics is not there”, “Sometimes, the objects were too close and also coordinating with tong movements was not that smooth.”

show that tracking had a significant impact on the jaw tool-type technique when used with the tool, thus causing a mismatch between the physical tool and the visually represented version in VR.

The results indicate that holding a passive physical tool when performing free-hand 3D interaction can improve usability and user experience. The impact of using the physical tool may differ with the level of mechanical complexity of the interaction technique. When designing and incorporating 3D user interaction techniques into VR experiences, the more mechanically complex the interaction technique, the more likely it is that employing a physical object will enhance the usability and user experience associated with the technique. For metaphor-based interaction techniques commonly used in VR, such as ray-casting and hand grasping, however, the downsides of using a passive physical object may outweigh the benefits achieved.

Camera tracking has inherent limitations such as occlusion and sensitivity to lighting, but if experiences are carefully designed with these limitations taken into consideration, free-hand interaction techniques can still provide engaging and enjoyable VR museum applications. Despite the challenges and limitations of camera tracking, the satisfaction ratings, positive user comments and 89.58% task completion rate suggest that the proposed free-hand interaction techniques are suitable for this type of audience and worth further development and exploration.

6 Conclusions and future work

This paper investigated the impact of physical tools on the usability and user experience of three different interaction techniques. The participants were asked to select and manipulate desk-top sized objects comparable to those that may be encountered in a museum. Although the experiment was undertaken by a participant group similar to museum visitors with diverse demographic backgrounds, further study is required in relation to different age groups, motivations, and other backgrounds due to the low number of participants.

In the considered scenario, the utility vehicle was chosen, and the participants were asked to select and position five desk-top sized farm objects on the back of the ute. The task was repeated six times, once with each of the interaction techniques. In this context, it has been seen that object shape and other characteristics affect users’ free-hand gestures, especially in the case of the hand grasp and jaw tool-type techniques. These techniques are also more affected by tracking limitations and issues.

The results suggest that using passive physical tools for more mechanically complex techniques, such as the jaw tool-type interaction technique, has several benefits. While a passive physical tool increases the cognitive and physical load, the frustration is lower, and the participants felt that using the tool was more natural. User experience ratings confirm that the jaw tool-type grasping technique was the most difficult to use in general. However, the participants were more confident and found it easier to understand and more helpful than when the tool was not used. The efficiency results based on MT show that the ray-cast and jaw tool-type grasping techniques were 17–18% faster with the tool. Although the reported satisfaction ratings do not show significant preferences for either option, the participants’ comments confirm the benefits of using the passive physical tool for the jaw tool-type technique. These findings may be valuable for those designing and incorporating free-hand interaction techniques into VR experiences.

Tracking issues affected the performance and subjective quantitative ratings but did not prevent the participants from completing the task successfully. The quantitative rating results show low frustration, likely due to the excitement of undertaking the experience. Usability is an important aspect of user experience; however, considering visitor interaction with museum objects, a more holistic approach is required to improve the evaluation of these types of experiences. Future work could consider the impact of passive tools on presence and agency.