Dataset for the assessment of presence and performance in an augmented reality environment for motor imitation learning: A case-study on violinists


Abstract
This dataset comprises motion capture, audio, and questionnaire data from violinists who underwent four augmented reality training sessions spanning a month. The motion capture data was recorded using a 42-marker Qualisys Animation marker set at 120 Hz. Audio data was captured using two condenser microphones at a sampling rate of 48 kHz and a bit depth of 24 bits. The dataset encompasses recordings from 2 violin orchestra section leaders and 11 participants. Initially, we collected motion capture (MoCap) and audio data from the section leaders, who performed 2 distinct musical pieces. These recordings were then used to create 2 avatars, each representing a section leader and their respective musical piece. Subsequently, each avatar was assigned to a group of violinists, forming groups of 5 and 6 participants. Throughout the experiment, participants rehearsed one piece four times using a 2D representation of the avatar, and the other piece four times using a 3D representation. During the practice sessions, participants were instructed to replicate the avatar's bowing techniques, encompassing gestures related to bowing, articulation, and dynamics. For each trial, we collected motion capture data, audio data, and self-reported questionnaires from all participants. The questionnaires included the Witmer presence questionnaire, a subset of the Makransky presence questionnaire, and the sense of musical agency questionnaire, as well as open-ended questions for participants to express their thoughts and experiences. Additionally, participants completed the Immersive Tendencies questionnaire and the Music Sophistication Index questionnaire, and provided demographic information before the first session commenced.
MoCap
Obtained with a Qualisys motion capture system, using 18 infrared OQUS cameras and 42 reflective markers (Animation marker set), plus a violin bow (3 markers) and a violin (3-4 markers), recorded at 120 Hz.

Audio
Obtained with a Y-pair of condenser microphones in Ableton and synchronized with the motion capture data using SMPTE timecode; recorded at 48 kHz with a bit depth of 24 bits.

Questionnaires
Completed before the first trial, after each trial, or after the last trial, in LimeSurvey. Eleven amateur violin players from a student orchestra took part in a study where they practiced two previously unfamiliar pieces with an avatar through an augmented reality application. The avatars used in the application were generated from data provided by the section leaders of the first and second violin sections. The avatars were presented in two different formats: 2D and 3D renderings.
During the span of one month, the amateur players engaged in four practice sessions with the application, focusing on two pieces. Throughout the experiment, the pieces were performed either by a 2D or a 3D avatar. It is worth noting that the participants learned the pieces during the study and had almost no prior experience with them.

Value of the Data
• These data capture the performance, feeling of presence, and learning process of a small group of violinists who use augmented reality with different layers of reality/stereoscopic information to master a musical piece. This dataset is unique because it assesses kinematics and audio of musicians performing a complex task in different states of presence and immersion while using augmented reality.
• These data are of interest to the scientific community studying behavior in augmented reality, the interplay between presence and learning, the interplay between immersion and presence, and motor imitation learning.
• The data can be used to study kinematics and audio of musicians rehearsing a piece, and to study the interaction of performance, feeling of presence, and learning process in an augmented reality environment.

Objective
This study aims to address the educational challenge of acquiring sophisticated gestures. The dataset collected for this study is used to investigate the impact of practicing with an augmented reality (AR) play-along application on playing performance, learning process, and feeling of presence.
The study consists of two experimental conditions: one involving a 2D simulation of a virtual section leader and the other a 3D simulation, both presented in a HoloLens 2 environment. Eleven participants engaged in four practice trials spaced evenly over the course of a month, with each participant experiencing both conditions in a within-subject design.
Mirroring the dynamics of real orchestral playing, participants were instructed to closely imitate the avatar's bow movements, encompassing bowings, articulations, and dynamics. The study recorded and analyzed violin playing gestures using kinematic metrics. Additionally, questionnaires were administered to explore subjective experiences of presence and to establish participants' musicality and immersive tendencies.
By employing hierarchical regression modeling, the study examines whether the play-along application's conditions influenced gesture similarity, imitation learning, and feelings of presence. This analysis provides insights into the effects of the different experimental conditions on skill acquisition and immersive experiences.

Repository structure
The repository contains several zip files, each holding a set of individual data files. We describe the general content of every zip file and supplement this information with a table describing the individual data files it contains. An overview of the different zip files is given in Table 1.

Labeled_MoCap_Data.zip
CSV files with labeled MoCap data, including data labels. Every column is a data stream from a marker. Marker positions are indicated in Fig. 1. Every marker has 3 data streams, referring to the x, y, and z coordinates of the marker position. In addition to the markers indicated in Fig. 1, the violin (3-4 markers) and the violin bow (3 markers) are labelled as well. An overview of the different labels and their meaning is given in Table 2. One data file per participant (P001-P011), per trial (T1-T4), per condition (2D/3D) is provided. Additionally, the data type (MoCap) and the performed fragment (F1-F4) are given in the filename, e.g., 'P001_T1_2D_F1_MoCap.csv' (see Table 2).
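As a minimal sketch of how such a file can be loaded, assuming Python with pandas and a '<label>_x/_y/_z' column naming (an assumption for illustration only; the actual labels are listed in Table 2):

import pandas as pd

FS = 120  # MoCap sampling rate (Hz)

df = pd.read_csv("P001_T1_2D_F1_MoCap.csv")

# Derive the marker labels from the column names (naming convention assumed).
markers = sorted({c.rsplit("_", 1)[0]
                  for c in df.columns if c.endswith(("_x", "_y", "_z"))})

# One (n_frames, 3) position array per marker, plus a time axis in seconds.
positions = {m: df[[f"{m}_x", f"{m}_y", f"{m}_z"]].to_numpy() for m in markers}
t = df.index.to_numpy() / FS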

Joint_Angle_Data.zip
CSV files with joint angles, including data labels. Every column is a data stream from a joint (see Fig. 2). Every joint has a varying number of data streams, depending on the calculated angles. In addition to joint angles, the angles of the instrument relative to the body are given, as well as the distances of the bow to the bridge and the distances of the bow to the strings, respectively. An overview of the different labels and their meaning is given in Table 3. One data file per participant (P001-P011), per trial (T1-T4), per condition (2D/3D) is provided. Additionally, the data type (JointAngles) and the performed fragment (F1-F4) are given in the filename, e.g., 'P001_T1_2D_F1_JointAngles.csv' (see Table 3).
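For instance, a single stream such as the bow-bridge distance can be extracted and plotted against time (a sketch assuming pandas and matplotlib; 'BowBridgeDist' is a hypothetical placeholder, the real label is listed in Table 3):

import pandas as pd
import matplotlib.pyplot as plt

FS = 120  # sampling rate (Hz)

ja = pd.read_csv("P001_T1_2D_F1_JointAngles.csv")
dist = ja["BowBridgeDist"]  # placeholder column name; see Table 3

plt.plot(dist.index / FS, dist)
plt.xlabel("time (s)")
plt.ylabel("bow-bridge distance")
plt.show()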

Analyzed_Data.zip
Time series of filtered and analyzed data. The distances of the bow to the bridge and frog are filtered so that only bow strokes with a certain bowing length and a certain loudness level are retained. The resulting collection of regions-of-interest (ROIs) is then analyzed for movement smoothness (as assessed with the SPARC index [1]), and a comparison is made between the profile of avatar bowing movements and participant bowing movements by means of the Procrustes distance [2] (see Fig. 3). These data are presented as csv files with 4 columns: the SPARC index per ROI, the Procrustes distance between the bow movement of the avatar and participant, and the indices of the start and end of each ROI (see Fig. 4). One data file per participant (P001-P011), per trial (T1-T4), per condition (2D/3D) is provided. Additionally, the data type (AnalyzedData) and the performed fragment (F1-F4) are given in the filename, e.g., 'P001_T1_2D_F1_AnalyzedData.csv' (see Table 4).
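For orientation, the two metrics could be recomputed along the following lines. This is a sketch under stated assumptions, not the authors' exact pipeline: the toy trajectories stand in for one ROI of the 2D bow position (distance to bridge vs. distance to frog), both signals are resampled to equal length because scipy's procrustes requires matching shapes, and the SPARC parameters are common defaults:

import numpy as np
from scipy.spatial import procrustes

FS = 120  # MoCap sampling rate (Hz)

def sparc(speed, fs, padlevel=4, fc=10.0, amp_th=0.05):
    # Spectral arc length of a speed profile; values closer to 0 mean
    # smoother movement (a common parameterization of the SPARC index [1]).
    n = 2 ** (int(np.ceil(np.log2(len(speed)))) + padlevel)
    freqs = np.arange(n) * fs / n
    mag = np.abs(np.fft.fft(speed, n))
    mag = mag / mag.max()
    sel = freqs <= fc                      # keep the 0..fc Hz band
    freqs, mag = freqs[sel], mag[sel]
    above = np.nonzero(mag >= amp_th)[0]   # trim the low-amplitude tail
    freqs, mag = freqs[: above[-1] + 1], mag[: above[-1] + 1]
    dn = np.diff(freqs) / (freqs[-1] - freqs[0])
    return -np.sum(np.sqrt(dn ** 2 + np.diff(mag) ** 2))

def resample(sig, n):
    # Linear resampling so both trajectories have the same length.
    return np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(sig)), sig)

# Toy stand-ins for one ROI of the 2D bow position (bridge vs. frog distance).
t = np.linspace(0, 1, 120)
av_bridge, av_frog = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
pl_bridge, pl_frog = np.sin(2 * np.pi * t + 0.1), np.cos(2 * np.pi * t + 0.1)

avatar_roi = np.column_stack([resample(av_bridge, 200), resample(av_frog, 200)])
player_roi = np.column_stack([resample(pl_bridge, 200), resample(pl_frog, 200)])

_, _, disparity = procrustes(avatar_roi, player_roi)  # Procrustes distance [2]
speed = np.linalg.norm(np.diff(player_roi, axis=0), axis=1) * FS
print(disparity, sparc(speed, FS))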

Audio_Data.zip
Wav files with the participants' audio recordings. Audio files contain 2 tracks (left and right microphone), i.e., they are stereo recordings. One data file per participant (P001-P011), per trial (T1-T4), per condition (2D/3D) is provided, following the same naming convention as the other per-trial files, with the data type (Audio) and the performed fragment (F1-F4) in the filename, e.g., 'P001_T1_2D_F1_Audio.wav' (see Table 5).

Questionnaire_Data.zip
The results of 5 standardized questionnaires are presented: the Makransky Multimodal Presence Questionnaire (the social presence subset, MPQS, and the physical presence subset, MPQP), the Witmer Presence Questionnaire (WPQ), the Immersive Tendencies Questionnaire (ITQ), the Musical Sophistication Index (MSI), and the Sense of Musical Agency Questionnaire (SOMA). Additionally, demographic data (DQ) were collected, along with some open questions (OQ). The answers to the questionnaires are organized in 3 csv files: 'MB.csv', containing answers to the questionnaires presented before the first session (ITQ, MSI, and some DQ); and 'C1.csv' and 'C2.csv', containing the answers to the questionnaires presented before and after each session (MPQS, MPQP, WPQ, SOMA, some DQ, and some OQ) in the first and second condition, respectively. A csv file named 'Legend.csv' indicates the codes of all the questions and where the answers to each question can be found (see Table 6). Since some participants answered in Dutch, all responses were translated to English before adding them to the repository.
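A minimal sketch for loading these files, assuming Python with pandas; the 'WPQ' prefix used to select the Witmer items is a hypothetical placeholder, and the real question codes are defined in 'Legend.csv':

import pandas as pd

legend = pd.read_csv("Legend.csv")   # question codes and their location
before = pd.read_csv("MB.csv")       # ITQ, MSI, demographics (pre-study)
cond1 = pd.read_csv("C1.csv")        # per-session answers, first condition
cond2 = pd.read_csv("C2.csv")        # per-session answers, second condition

# Hypothetical selection of all Witmer presence items, assuming their
# question codes start with 'WPQ'; check Legend.csv for the real codes.
wpq_cols = [c for c in cond1.columns if c.startswith("WPQ")]
print(cond1[wpq_cols].mean(numeric_only=True))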

Scores.zip
The scores played by both the avatar and the participants are provided, with the correct bowings and articulations. The fragment and the violin section are indicated in the filename; e.g., 'First_Violin_F2.pdf' contains the score of fragment F2, as played by the first violins. See Table 7 for an overview of the file structure and the content.

Avatar_Data.zip
This directory contains files in csv format with labeled MoCap data, including data labels. Every column is a data stream from a marker. Marker positions are indicated in Fig. 1. Every marker has 3 data streams, referring to the x, y, and z coordinates of the marker position. In addition to the markers indicated in Fig. 1, the violin (3-4 markers) and the violin bow (3 markers) are labelled as well. One data file per avatar (First Violin or Second Violin) is provided. Additionally, the data type (MoCap) and the performed fragment (F1-F4) are given in the filename, e.g., 'First_Violin_F2_MoCap.csv' (see Table 8). Additionally, the directory contains files in csv format with joint angles, including data labels. Every column is a data stream from a joint (see Fig. 2). Every joint has a varying number of data streams, depending on the calculated angles. In addition to joint angles, the angles of the instrument relative to the body are given, as well as the distances of the bow to the bridge and the distances of the bow to the strings, respectively. One data file per avatar (First Violin or Second Violin) is provided, with the data type (JointAngles) and the performed fragment (F1-F4) in the filename, e.g., 'Second_Violin_F3_JointAngles.csv' (see Table 8). Finally, this directory contains wav files per avatar (First Violin or Second Violin). Audio files contain 2 tracks (left and right microphone), i.e., they are stereo recordings. Additionally, the data type (Audio) and the performed fragment (F1-F4) are given in the filename, e.g., 'First_Violin_F1_Audio.wav' (see Table 8).
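A short sketch for reading one avatar recording and pairing it with the corresponding participant file, assuming the soundfile and pandas packages; synchronization and trimming details are omitted:

import pandas as pd
import soundfile as sf  # third-party wav reader, assumed available

# Stereo avatar recording: two tracks (left and right microphone).
audio, sr = sf.read("First_Violin_F1_Audio.wav")  # audio.shape == (n_samples, 2)
assert sr == 48000

# Matching avatar and participant joint-angle streams for fragment F1;
# both are sampled at 120 Hz, so frames correspond one-to-one once the
# recordings are trimmed to a common start (trimming not shown here).
avatar = pd.read_csv("First_Violin_F1_JointAngles.csv")
player = pd.read_csv("P001_T1_2D_F1_JointAngles.csv")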

General protocol
A total of 11 participants were recruited from the Ghent University Student Orchestra (GUSO), a local amateur orchestra. Among them, 6 participants belonged to the first violin section, while 5 were part of the second violin section. Each group was assigned to rehearse two different orchestral fragments in an augmented reality (AR) environment with a virtual audiovisual representation of a section leader. In this study, there were four fragments in total, labeled as fragments 1, 2, 3, and 4, with two fragments assigned to each violin section.
The participants from the first and second violin sections were randomly divided into two groups: Group 1 and Group 2. Group 1 rehearsed with a 3D avatar for fragments 1 and 3, and a 2D projection of the avatar for fragments 2 and 4. Conversely, Group 2 rehearsed with a 2D projection of the avatar for fragments 1 and 3, and a 3D avatar for fragments 2 and 4, as indicated in Table 9.
Throughout the experiment, the participants took part in four sessions, conducted once per week over the course of one month. The condition (2D or 3D) for each fragment remained consistent within each group. In each trial, the participants followed the same procedure: they practiced the designated fragment with the avatar for 15 min, took a short break, and then performed the fragment while being recorded along with the avatar clip. The participants were instructed to carefully imitate the bowings, dynamics, and articulation of the avatar during the recording. Motion capture (MoCap), audio, and video data of the participants were recorded during each complete trial, and these data were synchronized with the avatar simulations for subsequent analysis. It is worth noting that participants were allowed to move around freely and choose their preferred playing position during practice in both the 2D and 3D conditions, but a fixed position was assigned to them during the recording phase.
Following the recording, participants completed the MPQS, MPQP, WPQ, SOMA, and OQ questionnaires. They then repeated all the aforementioned steps with the other condition and the remaining fragment. Importantly, participants were not allowed to practice the fragments between trials. An outline of the pipeline is given in Campo et al. (see Fig. 2 in [3]). Before the first trial, participants completed the ITQ and MSI questionnaires.
In total, the experiment yielded 11 × 2 × 4 = 88 raw datasets (participants × conditions × trials), each consisting of MoCap, audio, and questionnaire data and capturing the entire duration of the study.
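As a small illustration of this grid, the expected per-trial file stems can be enumerated as follows (a sketch; which fragment code (F1-F4) attaches to each stem depends on the participant's group, see Table 9):

# 11 participants x 4 trials x 2 conditions = 88 per-trial datasets
stems = [f"P{p:03d}_T{t}_{cond}"
         for p in range(1, 12)
         for t in range(1, 5)
         for cond in ("2D", "3D")]
print(len(stems))  # 88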

Participants
Participants had a mean age of 21.3 ± 2.2 years (mean ± SD). All participants in the study had extensive experience playing the violin, with a minimum of 12 years of violin-playing experience. Additionally, the level of musical skill and engagement of the participants was assessed using the Goldsmiths Musical Sophistication Index (Gold-MSI), which yielded an average score of 4.9 ± 0.5 [4]. Furthermore, participants' tendency to become immersed in an artificial environment was evaluated by administering a standardized self-reported Immersive Tendencies Questionnaire during the registration process, with an average score of 4.2 ± 0.8 [5] (see Table 10).

Table 9
Overview of participants, their respective violin section and stimulus (columns: Participant, Violin section, Piece (2D condition), Piece (3D condition)).

Questionnaires
We used the standardized self-reported Presence Questionnaire [6] (the Witmer Presence Questionnaire, WPQ) and two subsets of questions from the Multimodal Presence Scale for Virtual Reality [7] (the Makransky Multimodal Presence Questionnaire, MPQS and MPQP) to inquire about participants' physical and social presence in a mixed reality environment. Participants were asked to fill in these questionnaires after every condition in every trial. In total, each participant filled in the WPQ, the MPQS, and the MPQP eight times. In addition, after each session, we asked several open-ended questions (OQ) regarding the effectiveness of training in the mixed reality setup, the similarity of the experience to practicing with a colleague or with a video at home, and possible application improvements.
Before the experiment, participants filled in the Immersive Tendencies Questionnaire (ITQ) and the Musical Sophistication Index (MSI). Additionally, demographic data (DQ) were collected.

HoloLens app
The avatars of the first and second violin were integrated into a HoloLens application developed in Unity (version 2020.3.2f1, San Francisco, CA, USA). For the 2D condition, the avatar was projected in a frontal view onto a virtual 2D screen (see Fig. 3c in [3]). In contrast, for the 3D condition, a fully rendered avatar was positioned in front of the participants (see Fig. 3d in [3]). The performance of the avatar could be controlled by a user interface with start/stop buttons and a slider, allowing for starting, stopping, forwarding, or rewinding the performance (see Fig. 4 in [3]). The app is not included in the repository.

Data analysis
MoCap, audio, and video data were acquired for every participant. Joint angles of the wrist, elbow, and shoulder were approximated from the MoCap data using a custom-made MATLAB package [8], based on the standards of the International Society of Biomechanics (ISB) [9-12]. Additionally, the tilting angle between the bow and the violin and the bow position were calculated, the latter defined as the distance of the contact point between bow and string to the bridge and to the frog, respectively (see Fig. 2).
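The ISB-based joint angles themselves come from the cited MATLAB package [8]. Purely as an illustration of the underlying idea, and not of the ISB method (which uses segment coordinate systems and Euler angle decompositions), the included angle at a joint can be approximated from three marker positions:

import numpy as np

def joint_angle(prox, joint, dist):
    # Angle (in degrees) at `joint`, formed by the markers prox-joint-dist.
    u, v = prox - joint, dist - joint
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

# Toy marker positions (metres) for shoulder, elbow, and wrist.
shoulder = np.array([0.0, 0.0, 1.4])
elbow = np.array([0.0, 0.0, 1.1])
wrist = np.array([0.3, 0.0, 1.1])
print(joint_angle(shoulder, elbow, wrist))  # ~90 degrees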

Stimulus
The leaders of the first and second violins of the GUSO were each recorded playing two pieces for the study. The leader of the first violins played Dvorak's Symphony No. 8 in G major, Op. 88, Part III (bars 1-180) and Holst's The Planets, Op. 32: I. Mars (bars 17-83; 95-133; 149-167). The leader of the second violins played Dvorak's Symphony No. 8 in G major, Op. 88, Part I (bars 33-240) and Brahms' Symphony No. 2 in D major, Op. 73, Part III (bars 144-318). These pieces were deliberately chosen to encompass a wide range of techniques and difficulties, with minimal rests. During the recording sessions, the violinists had the option to play along with either a metronome or an orchestral recording through headphones, depending on their preference. The scores of these pieces are also included in the repository.

Motion capture acquisition
Motion capture data were captured using a Qualisys MoCap system with 18 cameras, including 4 RGB cameras (see Fig. 2a in [3]). MoCap and video data were recorded at a rate of 120 Hz, while audio was recorded using a Y-pair of condenser microphones at 48,000 Hz with a bit depth of 24 bits. All data, including audio, video, and MoCap, were recorded simultaneously and later synchronized using the SMPTE protocol. The motion capture data were then used to generate a whole-body skeleton (see Fig. 2b in [3]), which served as the basis for modeling and rigging a male avatar (for the first violin) and a female avatar (for the second violin) by the company ARVRtech (ARVRtech, Novi Sad, Serbia).

Acknowledgments
This research was conducted at IPEM (UGent, BOF, FWO). We thank dr. Bart Moens for technical assistance, and the GUSO for their enthusiastic collaboration.

Fig. 1. Overview of marker placement on the participant, violin, and bow. Markers are indicated as white dots with a black edge.

Fig. 2. Example of joint angle data of both the avatar (blue) and a participant (red) of the first violin section, playing Dvorak (fragment 1). Data represent wrist (a,b), elbow (c,d), and shoulder angles (e-g), as well as the distance between the contact point of bow and string and the frog (h) and the bridge (j), respectively, and the tilting angle between the bow and the violin (i). AA = adduction-abduction, FE = flexion-extension, PS = pronation-supination, E = elevation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 3. Pipeline illustrating the time series comparison between participant and avatar. (a) Represents the bow position of the avatar (black) and participant (blue); green and red dotted lines represent analyzed downstrokes and upstrokes, respectively. (b) Represents the loudness of the performance; the green dotted line represents the loudness threshold for further analysis. (c) Displays a close-up of the red area in (a). (d) Represents a 2D plot of the bow position relative to the bridge (horizontal axis) and the bow position relative to the frog (vertical axis). (e) Alignment of both signals in (d) before Procrustes distance calculation of the 2D signal. (f) Displays a close-up of the red area in (c). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Example of analyzed data. (a) Represents the bow position of the avatar (black) and participant (blue); green and red dotted lines represent analyzed downstrokes and upstrokes, respectively. (b) Displays the computed Procrustes distance (PD) of the participant's bow strokes as compared to the avatar, and (c,d) display the SPARC index and bowing length of every participant (dotted blue line) and avatar (dotted green line) bow stroke. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1
Overview of the different zip files in the repository.

Table 2
Content and file structure of Labeled_MoCap_Data.zip.

Table 3
Content and file structure of Joint_Angle_Data.zip.

Table 4
Content and file structure of Analyzed_Data.zip.

Table 5
Content and file structure of Audio_Data.zip.

Table 6
Content and file structure of Questionnaire_Data.zip.

Table 7
Content and file structure of Scores.zip.

Table 8
Content and file structure of Avatar_Data.zip.

Table 10
Participant demographics. SD = standard deviation, MSI = Musical Sophistication Index, ITQ = Immersive Tendencies Questionnaire.