1 Introduction

Augmented reality (AR) is a technology that overlays virtual objects (augmented components) onto the real world. These virtual objects then appear to coexist in the same space as objects in the real world (Akçayır and Akçayır 2017; Azuma et al. 2001). Another widely used definition comes from Milgram and Kishino (1994): in a continuum ranging from a purely virtual environment to a completely real one, AR is positioned close to the real environment, and users perceive the real world with an additional layer of virtuality. Although the technology was originally used to help assembly workers at Boeing by showing them virtual labels through custom-made glasses (Caudell and Mizell 1992), AR has quickly attracted research and industry attention in many different areas such as gaming (Das et al. 2017), cultural heritage (Vlachos et al. 2022), customer engagement (McLean and Wilson 2019), manufacturing (Ong et al. 2008) and education (Avila-Garzon et al. 2021; Garzón et al. 2019). In the last ten years in particular, there has been a surge of research on the use of AR in education and its impact on student engagement and academic results (Bressler and Bodzin 2013; Sirakaya and Cakmak 2018; Chang et al. 2022). This can largely be attributed to the increased availability of AR-ready devices as well as the familiarity of today’s students with technology-enhanced learning (TEL).

Despite huge improvements in both hardware and software, which have led to a wider offering of AR applications for mobile devices and AR headsets, and despite extensive research on the use of AR in education (Dinis et al. 2017; Akçayır and Akçayır 2017; Chen et al. 2017; Ibáñez and Delgado-Kloos 2018; Pellas et al. 2019; Masneri et al. 2022b), AR is still not commonly used in primary and secondary schools (Commission et al. 2023). A previous work (Masneri et al. 2023) identified the main reasons behind this, namely the limited collaboration capabilities of existing apps, the inability to create new content and the difficulty of adapting apps to existing school curricula. The same work presented the cleAR architecture (see Fig. 1), an interoperable architecture that enables the creation of multi-user AR applications, simplifies the development process and allows the stakeholders to add new content to existing applications, track user progress and integrate application data into the learning management system (LMS) used by the teachers.

Fig. 1 The cleAR architecture as defined in Masneri et al. (2023)

In Masneri et al. (2023), 47 teachers were asked to fill in a questionnaire to help identify the requirements of the architecture and were also involved in several interviews to define the scope of the solutions. Ultimately, four design objectives (DOs) were identified:

  • Interoperability (DO1): The architecture enables the creation of applications that can run on multiple devices, namely tablets, smartphones, head-mounted displays (HMDs) or laptops, and provides APIs for development on multiple platforms.

  • Multi-user capabilities (DO2): As collaboration is a key requirement, the architecture provides tools to support multi-user functionalities, both for remote and in-class collaboration.

  • Data analytics (DO3): The architecture enables long-term data storage as well as tools for automatic data analysis and visualization, by providing an API to access standard dataviz and machine learning libraries.

  • Easy to develop (DO4): The applications relying on the architecture can be developed quickly and easily.

In the previous work, three proofs of concept (PoCs) were developed to perform a conceptual evaluation of the architecture and to demonstrate that it fulfils the aforementioned DOs.

In this manuscript, we build upon that previous work and introduce ARoundTheWorld, a multiplatform collaborative AR geography game. The primary goal of the application is to demonstrate the potential of the cleAR architecture for creating interoperable applications that can be integrated into school curricula.Footnote 1 Additionally, the game aims to enhance student engagement through the collaborative functionalities provided by cleAR. The application has been developed incorporating feedback from the teachers of a Basque primary and secondary school associationFootnote 2 and it has been evaluated after being tested with 44 students. Once the test was complete, teachers were interviewed, while students were asked to fill in a short questionnaire about the ARoundTheWorld User Interface (UI), the User eXperience (UX) it offered, and its effectiveness as a tool for raising the engagement of the students and enabling collaboration between them. To perform a quantitative evaluation of the collaboration capabilities of ARoundTheWorld, the app collected data, in the form of xAPI statements (Clarke et al. 2020), about its usage, the number of interactions between students and the performance of the students.

The choice of Geography as the application domain is motivated by the fact that geographical exploration is an integral part of child development (Catling 1993), and the use of maps helps students improve spatial thinking skills (Collins 2018). The application is structured as a quiz where students take turns to answer geography questions. If a student is struggling to answer a question, other students can interact in the augmented space and provide hints to the active user. ARoundTheWorld works both as a mobile and a web application, is easily extensible and provides several logging and tracking mechanisms, which can be easily integrated into the school’s LMS to enable automatic tracking of the progress of the students.

The main contributions of this work can be summarized as:

  • A complete description of the application and the feasibility of implementing the different components of the cleAR architecture to fulfil all the design objectives.

  • A qualitative evaluation of the technology integrated in the application, based on the questionnaires filled in by the 44 students and the interviews with their teachers.

  • An analysis of the data collected by the application during the user study, with a focus on the effects of collaboration capabilities on the quiz results and the engagement of the students.

The number of users who participated in the study is not representative of the whole student population. The aim of this study is not to demonstrate the positive effects of AR in school settings, but rather to validate the architecture presented in a real school setting, and not only through the development of PoCs.

The rest of the paper is structured as follows: Sect. 2 covers the relevant state of the art regarding collaborative AR applications for education. Section 3 describes the implementation details of ARoundTheWorld, while Sect. 4 outlines our methodological framework and the evaluation process. In Sect. 5, we describe the results obtained from the student questionnaires and teacher interviews and the quantitative analysis of the data collected through the application. Finally, Sect. 6 presents the conclusions and suggests future research lines.

2 Related work

While the first mention of collaborative AR dates back more than 20 years (Billinghurst and Kato 2002), only a few works present such applications in educational contexts, mainly due to the difficulty of including multi-user capabilities in AR apps. A systematic review of AR applications used in education (Iqbal et al. 2022) mentions that collaborative learning in AR represents a critical research direction, but so far very few studies provide collaborative functionalities in an AR environment (Pan et al. 2021; Choi et al. 2017). The work from Cai et al. (2017) presents an application that makes use of a Kinect camera to extract 3D information from the scene and display virtual magnetic induction lines. As the students move the objects simulating the magnets around the room, the system updates the representation of the magnetic field in real time. In Takahashi et al. (2018), the authors designed a large-scale AR and projection system, modifying the gymnasium of the school, to create a learning game for children with autism spectrum disorder (ASD), which was designed to keep their attention focused on the content provided. Laviole et al. (2018) presented a markerless application for learning how an artificial neural network works, where the students can manually tweak the values of the network parameters and see how this affects the ability of the network to classify images.

As geography relies heavily on the visual representation of data, several technologies have been exploited to make its teaching more effective and engaging. In the context of AR, Palaigeorgiou et al. (2018) used a projector to create tangible 3D maps with which up to three students could interact at the same time. Xefteris and Palaigeorgiou (2019) extended the concept of tangible maps by including the usage of programmable robots to guide the students through a virtual journey.

Regarding the evaluation of the effectiveness of AR applications for education, the vast majority of studies highlight a positive (albeit limited) effect derived from using the technology. Chang et al. (2022) performed a meta-analysis of 134 studies which suggests that AR benefits all the learning outcomes evaluated, with the largest effect being on students’ performance. A systematic review of 45 studies (da Silva et al. 2019) reaches the same conclusions, but highlights the many differences in the evaluation protocols, which complicate the statistical analysis of AR effectiveness across different applications.

AR applications are often implemented as serious games, in which gamification helps students learn and retain concepts that would otherwise not interest them. Oh et al. (2017) described a game-based simulation where the users can study the properties of light such as reflection and refraction. López-Faican and Jaen (2020) created a multiplayer game in which children can improve their communication skills by practising in an AR environment, while Çelik and Yangın Ersanlı (2022) described a gamified AR app used in a Content and Language Integrated class.

Several publications focus on the importance of effective UIs and UXs in enhancing student engagement. A systematic review of the literature analysed 49 studies (Law and Heintz 2021) and identified a lack of knowledge about usability and user experience frameworks, suggesting that there is a disconnect between Human–Computer Interaction (HCI) and TEL communities, as well as a lack of AR-specific UX evaluation metrics. The work of Thamrongrat and Law (2019) evaluated the learning effect for teaching 3D geometry using an AR application compared to traditional learning, as well as the User Engagement (UE) of the students using the app compared to the ones in the control group. Another study (Alrashidi et al. 2017) compared the effectiveness of learning software debugging concepts using an AR application versus a non-AR approach.

Applications used in schools usually generate data that are stored on the school LMS. A standard that has recently been gaining traction for collecting data about learners’ activities is eXperience API (xAPI),Footnote 3 an open-source software specification that makes it possible to collect data about a wide range of learning experiences. This is achieved by sending each activity that needs to be recorded to a learning record store (LRS) in a consistent and secure format (Clarke et al. 2020). The activities are collected as statements stored as JSON objects. Statements can be tuned to a specific use case by defining a vocabulary of valid statements. Secretan et al. (2019) described a system where xAPI is used to perform learning analytics in an AR environment, while Wu et al. (2020) used xAPI to collect data for a 3D design course.

Despite the impressive amount of literature about the use of AR technology in educational environments, to the best of our knowledge, there is no solution that incorporates all the requirements identified by the teachers. These requirements aim to make AR apps a useful learning tool by providing collaboration capabilities, customisation, interoperability, analytics, LMS integration and ease of use.

3 Collaborative AR application

In this section, we summarize the application implemented using cleAR, the architecture presented in Masneri et al. (2023), and how the developed application fulfils the DOs presented in Sect. 1. The application, called ARoundTheWorld, is a collaborative geography quiz in which students answer a set of questions prepared by the teacher. Once started, the application sends a question to the first student (for example, “Where is Kyoto?”). The student answers by placing a pin on the 3D globe of the Earth shown in the augmented space. Other students can collaborate with the active user in two ways. They can suggest the continent in which the answer is located and, once the user has placed the pin but has not yet confirmed her choice, they can send a “thumbs up” or “thumbs down” feedback.Footnote 4 Once a student answers, the application sends a new question to the next user and repeats the process until all the questions have been answered. Figure 2 shows the application workflow, highlighting the interactions of teacher and students.

Fig. 2 Workflow of the ARoundTheWorld application

The application considers three types of users (teacher, players and watchers), depending on their role and their means of interacting with other users. The first role is that of the player, a student participating in the quiz as described above. Another role is that of the teacher, who controls the overall status of the app through a web-based interface. The final role, the watcher, is that of the students who are not actively participating in the quiz (that is, they are not answering any questions). They can watch what other students are doing and suggest the correct answer to them. This role was designed to let students without an AR-capable device engage with the players by following what they are doing and collaborate with them by suggesting the correct answer.

The application is designed to require minimal supervision from the teacher to let him or her interact as much as possible with the students. As shown in Fig. 3, the teacher interface consists of four parts:

  • A 3D representation of the augmented content as viewed by the active user (that is, the student who is answering the current question).

  • The list of users connected to the app, together with the current score of the players and the last question they answered.

  • The suggestions sent to the student currently answering the question.

  • A dashboard (accessible in a separate tab) with charts of the scores achieved by each student across different sets of questions.

The teachers who filled in the questionnaire described in Masneri et al. (2023) mentioned that one of the factors limiting the usage of AR apps in schools is the lack of customisation capabilities. In this respect, ARoundTheWorld provides an additional web interface from which the teacher can create new sets of questions. To minimize the amount of work required by the teacher, the coordinates of each location are computed automatically using the Wikimedia APIFootnote 5 and the questions are stored as JSON files which are directly added to the application. The interface of the watchers is web based, too, and has a look and feel similar to the teacher interface.
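
As an illustration of how a question set with precomputed coordinates might be stored and used, the following Python sketch parses a small JSON document and measures how far a placed pin is from the expected location. The field names (`title`, `questions`, `text`, `lat`, `lon`), the Kyoto coordinates and the use of the great-circle distance are assumptions made for this example; the actual file schema and scoring rule of ARoundTheWorld are not described in this section.

```python
import json
import math

# Hypothetical question set; the field names are illustrative assumptions.
QUESTION_SET_JSON = """
{
  "title": "Cities of Asia",
  "questions": [
    {"text": "Where is Kyoto?", "lat": 35.0116, "lon": 135.7681}
  ]
}
"""

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding slightly outside [0, 1].
    a = min(1.0, max(0.0, a))
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def pin_error_km(question, pin_lat, pin_lon):
    """Distance between the expected location of a question and the placed pin."""
    return great_circle_km(question["lat"], question["lon"], pin_lat, pin_lon)


question_set = json.loads(QUESTION_SET_JSON)
kyoto = question_set["questions"][0]
```

A call such as `pin_error_km(kyoto, pin_lat, pin_lon)` would then give one possible input to a scoring function; how the distance maps to points is left open here.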

For the application to successfully achieve interoperability (DO1), several types of hardware as well as software libraries need to be supported. In the aforementioned survey, the teachers reported a wide spectrum of devices available in their schools. Chromebooks and Android tablets were the most commonly used, but other options included laptops, PCs, smartphones (both Android and iOS based) and iPads. Furthermore, while none of the teachers reported using AR headsets such as HoloLens, we believe that such devices provide the best AR learning experience for users. For this reason, our application supports Microsoft Mixed Reality Toolkit and is fully compatible with HoloLens devices. The application for mobile and tablet devices has been developed using Unity 2020.3, and the AR functionalities are provided by the AR Foundation framework. The web application has been built using TypeScript and Three.js (to enable 3D content to be displayed in the browser). All the logging data and the statements collected during app usage are stored in a MongoDB database in the Learning Locker instance deployed on AWS. Porting to HoloLens and iOS devices is achieved, respectively, through Unity integration with Microsoft MR Toolkit and by exporting the Unity project to an Xcode environment.

Fig. 3 The web-based teacher interface of the ARoundTheWorld application

The application supports multi-user capabilities (DO2) by relying on the functionalities provided by the cleAR architecture, which includes a library for 1-to-1 or broadcast message passing (Masneri et al. 2022a) that requires minimal changes to the existing code base. When a student is asked to answer a question, she becomes the active user. She shares the camera position (which determines her view of the 3D globe) as well as the position of the pin, once placed, with the other users. The other students will then see the 3D globe on their devices in the same way the active user does. For users on a mobile device, this happens directly in the augmented space, while users on a PC will see the globe in a virtual 3D environment on a <canvas> element. At the same time, suggestions from users are shared in a broadcast fashion, so that every student knows about the suggestions sent by others. Finally, the teacher interface shares information about the current question, the score obtained by the active user after receiving her answer and the cumulative score of each user. The information is shared 30 times per second, which allows a smooth UX for every participant. Bandwidth usage is low, since only basic data types such as strings and numbers are shared between users, and the delay is below 15 milliseconds on both WiFi and mobile networks. A previous approach tried to combine message passing with the transmission, using WebRTC, of the screen of the active user to the students on a PC, to better simulate the AR experience (Matsumoto et al. 2023). Unfortunately, such a solution proved not to be scalable. Due to poor Unity support for WebRTC servers such as Janus, the application suffered delays which severely impacted the performance. With more than five users, the UX was severely affected, and the app became unusable when more than ten users were connected to the same session.
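
The kind of lightweight state update described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `ActiveUserState`, its fields and the JSON encoding are hypothetical and do not reproduce the API of the cleAR message-passing library. Only the rate of 30 updates per second is taken from the text.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, Tuple

UPDATES_PER_SECOND = 30  # sharing rate reported in the text


@dataclass
class ActiveUserState:
    """Hypothetical broadcast message: who is active, where the camera and pin are."""
    user_id: str
    camera_position: Tuple[float, float, float]
    pin_lat_lon: Optional[Tuple[float, float]] = None  # None until a pin is placed


def encode(state: ActiveUserState) -> bytes:
    """Serialize the state to compact JSON (an assumed wire format)."""
    return json.dumps(asdict(state), separators=(",", ":")).encode("utf-8")


def decode(payload: bytes) -> ActiveUserState:
    """Rebuild the state on the receiving side; JSON lists become tuples again."""
    data = json.loads(payload.decode("utf-8"))
    pin = data["pin_lat_lon"]
    return ActiveUserState(
        user_id=data["user_id"],
        camera_position=tuple(data["camera_position"]),
        pin_lat_lon=tuple(pin) if pin is not None else None,
    )


def bytes_per_second(state: ActiveUserState) -> int:
    """Rough payload bandwidth for one sender, ignoring transport overhead."""
    return len(encode(state)) * UPDATES_PER_SECOND
```

Because each message carries only a handful of strings and numbers, `bytes_per_second` stays small even at 30 updates per second, which is consistent with the low bandwidth usage noted above.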

To comply with the data analytics design objective (DO3), the application enables data collection through the storage of eXperience API (xAPI) statements on a Learning LockerFootnote 6 instance. Learning Locker is the standard data repository for storing learning activity statements generated by xAPI. xAPI enables the secure sending and storing of learning experiences to an LRS. xAPI statements use the JSON format and, at their core, are formed by the triplet Actor–Verb–Object. The Actor represents the person performing a specific action (the Verb), while the Object could be another person or an xAPI activity on which the actor acts. xAPI statements can optionally include additional information such as Timestamps, Context or Results, to provide more detail. Storing statements across each session enables the application to keep track of user activity and to store additional logging messages that simplify application debugging. Learning Locker provides basic analytics and plotting capabilities through a web interface, as well as filtering and exporting the data in CSV format. These functionalities have been extended through the development of a Python packageFootnote 7 that includes methods to perform advanced data exploration and plotting, as well as to run common machine learning models on xAPI statement data. One of the aims of the package is to simplify data analysis as much as possible, enabling teachers without development skills to extract valuable information from the collected data. For this reason, the package directly integrates GPT-4 (OpenAI 2023; Osmulski 2023), so that users with a valid OpenAI Key can use natural language to debug or generate code when needed. The package has been used to analyse the data collected during the evaluation of the app and the results will be presented in Sect. 5.
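
To make the Actor–Verb–Object structure concrete, the following Python sketch builds a minimal statement as a dictionary that can be serialized to JSON. The helper name `make_statement`, the `example.org` IRIs and the session extension key are placeholders chosen for this example, not the vocabulary actually used by ARoundTheWorld; the verb IRI shown in the usage note is taken from the public ADL verb list.

```python
import json
from datetime import datetime, timezone


def make_statement(user_id, verb_id, verb_name, object_id, result=None, session_id=None):
    """Build a minimal xAPI statement (Actor-Verb-Object) as a plain dict."""
    statement = {
        "actor": {
            "objectType": "Agent",
            # Placeholder home page; the user ID selected at login acts as the account name.
            "account": {"homePage": "https://example.org/aroundtheworld", "name": user_id},
        },
        "verb": {"id": verb_id, "display": {"en-US": verb_name}},
        "object": {"objectType": "Activity", "id": object_id},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if result is not None:
        statement["result"] = result  # optional Result, e.g. {"success": True}
    if session_id is not None:
        # Optional Context; the extension IRI is a hypothetical key.
        statement["context"] = {
            "extensions": {"https://example.org/xapi/ext/session": session_id}
        }
    return statement
```

For instance, `make_statement("player-07", "http://adlnet.gov/expapi/verbs/answered", "answered", "https://example.org/aroundtheworld/questions/kyoto", result={"success": True})` yields a dictionary that `json.dumps` can turn into the body of a request to the LRS.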

Finally, to demonstrate how the aforementioned architecture enables developers to easily create multi-user applications (DO4), we asked the developer of ARoundTheWorld if and how the architecture helped him in the development process. The developer mentioned that the architecture API enabled him to extend the application to support multiple users in a transparent way. He could avoid having to deal with low-level networking issues or having to implement platform-specific methods. While it was not possible to estimate the number of lines of code or hours saved by its usage, the developer said that he was satisfied with the capabilities of the architecture and would use it again for future projects. Nevertheless, in order to enable teachers to create an ecosystem of collaborative AR applications for education, the developer mentioned that the availability of authoring tools to easily create applications on top of the cleAR architectural design would be desirable.

4 Evaluation

Our study aims to investigate how collaborative AR solutions may benefit the learning experience, and what, if any, are the usability issues of multi-user applications that can be used on different devices such as tablets, mobile phones or laptops. In the literature, there is no agreement on how to conduct the evaluation of AR-based educational apps. The survey by Santos et al. (2013) analyses 87 AR applications, whose evaluation protocols included interviews, observation and coding of overt behaviour, and expert reviews. Of those who used questionnaires, the majority crafted their own. Among the works that used established questionnaires, some relied on the ISONORM Usability questionnaire (Prümper 1999), the Technology Acceptance Model (Davis and Venkatesh 1996), the Constructivist Multimedia Learning Environment Survey (Maor 1999), the Instructional Materials Motivation Survey (Keller 1987) or the Intrinsic Motivation Inventory (Ryan and Deci 2000). The number of participants in the evaluation of AR applications for education varies considerably depending on the study. A systematic review of the literature conducted by Masneri et al. (2020) on articles published between 2015 and 2020 shows that this number varies between 2 and 290 participants, while the survey by Santos et al. (2013) analysed studies where the number of participants ranged from 4 to 419.

In this work, we developed and used a questionnaire that adapts and extends the Positive System Usability Scale (P-SUS) (Brooke et al. 1996; Sauro and Lewis 2011), with a few additional questions that specifically evaluate the collaborative capabilities of the application. The questions, presented in Appendix A, were grouped into four classes depending on what they were evaluating: the interest of the application as an educational tool, the usability of the app, its collaboration capabilities and its functionality. Additionally, the participants could provide free-form feedback about the overall experience and whether they would recommend it to other students. Finally, we also conducted an interview with the teachers to collect their feedback about the learning experience, how collaboration may impact the involvement of the students, how AR apps could be used to evaluate the students’ knowledge of a subject and how they would take advantage of the data collected through the application.

Table 1 Details of the demographics and number of devices used across each test

At the beginning of the evaluation, the participants were briefed about the experiment and its purpose and were asked to sign a consent form. The questionnaires were anonymous but had an ID associated, so that during data analysis we were able to associate the answers to the questionnaires with the data collected from each device through the xAPI statements. The evaluation involved 44 students from three schools in San Sebastian between March and May 2023. Each experiment involved students of different ages (14, 17 and 19 years old) and their corresponding teachers. None of the participants had previous experience with AR applications, but they were familiar with the hardware devices (smartphones, tablets and PCs) used during the evaluation. In each school, the students were split into two groups: the first one represented the players, tasked with answering the quiz questions using the application on mobile or tablet devices, while the second group represented the watchers, who used a laptop or PC to see how the students answered the quiz and provided suggestions along the way. Table 1 shows the number of participants in each experiment, as well as details about the type of devices used when interacting with the application.

Once each participant had a device assigned, they were asked to connect to the application by selecting the session ID representing the experiment and the user ID which would be used as the Actor value for the xAPI statements generated while using the application. After a short Q&A session to clarify doubts about the app usage, the teacher started the quiz and the students then took turns to answer two sets of questions. Once the quiz was over, the teacher stopped the data collection and was able to check the score of each student and to export the data. After logging out of the session, the students filled in the questionnaires while we conducted the post-study interview with the teacher.

5 Results and discussion

In this section, we present and discuss the results of the survey as well as the data collected from the app. An extended version, which also includes a subgroup analysis of the data split by age group, is available in the data repositoryFootnote 8. The quantitative results from the responses to the post-intervention survey (shown in Appendix A) are summarised in Fig. 4. They are shown as stacked bar charts (Friedman and Amoo 1999), where the chart on the left refers to the answers to each question and the chart on the right to the groups of questions mentioned in Sect. 4. The figure shows that the application was very well received by the students and that every question except the first one was answered positively (“Agree” or “Strongly agree”) more than 60% of the time. The average rating for each question ranges from 3.45 to 4.43, with limited variability across answers: the standard deviation is below 1 for most of the questions. The plot on the right in Fig. 4 shows similar results: the students assigned the highest score to questions related to usability, especially questions 2 (“I found the application to be simple”) and 3 (“I thought the application was easy to use”). The questions related to functionality received a high score as well, while the ones relating to the educational content of the app show the highest variability: those questions received the highest number of negative answers while also receiving the highest score more than 40% of the time.

Fig. 4 Left: survey results on each question. Right: survey results grouped by question type

The bar plots of Fig. 5 are used to identify differences in the answers of the students based on their role when using the app and the device they used. Somewhat surprisingly, the watchers gave a slightly higher mean score, albeit with a much higher variability in the answers. As for the device type, the users on an Apple device (iPhone or iPad) gave a higher score compared to students using an Android device or a PC, but the differences are not statistically significant.

Fig. 5 Mean score of the questionnaire answers. The vertical bars represent the standard deviation

Figure 6 shows the mean score and the standard deviation for the questionnaire answers grouped by user. We split the students by age, their role when using the app and the device they used. From this visualization, an outlier can be easily identified: the only student in the 19-year-olds’ group who used a PC, and who was the only non-active user in that session. The reason for the lower score, as indicated by the comments provided by the student in the questionnaire, was that the experience did not feel particularly immersive or collaborative to him, as his role was fairly different from that of his classmates.

Fig. 6 Survey results split by user type. Left: average question score by device used and student role. Right: average question score by students’ age

Since ARoundTheWorld collected data in the form of xAPI statements, an analysis of the data received was conducted in order to detect whether there was a correlation between the score in the questionnaire and the number of statements collected by the application for each user. The actions registered by the app include both interaction between students, such as the suggestions sent, and the interactions of a user with the app. As shown in Fig. 7, there is a high variability in how much the students interacted. It is clear, though, that the players (the students using a mobile device and interacting with the augmented content) were much more involved in using ARoundTheWorld. This is probably because their role was much more interactive and immersive.

Fig. 7 Interactions with the application for each student (identified by the device used)

An interesting aspect to analyse is whether there is any correlation between the number of interactions of each student and the answers they gave to the survey questions. Two statistical approaches were followed. First, a correlation analysis was performed to check whether there was a relation between the number of interactions and the average scores given to the questions by the students, by calculating the Pearson correlation coefficient. The second approach is that of hypothesis testing, which checks whether the answers given by students with a high number of interactions are significantly different from the answers given by students with a low number of interactions. A two-sample t-test assuming equal variances was used for testing. In both cases, a significance level of \(p < 0.05\) was established. Since the interactions of the watchers (students on a PC) and of the players (students on a mobile device) are significantly different, we also performed the analysis on these specific subsets of the data.
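
The two statistics named above can be written out in a few lines of standard-library Python. This is a sketch of the textbook formulas, not the code of the analysis package used in the study; the function names are chosen for this example. It returns the Pearson coefficient and the pooled-variance t statistic with its degrees of freedom, and leaves the p value to a statistics library.

```python
import math
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances, with its degrees of freedom."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, dof
```

Here `x` would hold the number of interactions per student and `y` the mean survey score, while `a` and `b` would hold the survey scores of the high-interaction and low-interaction groups. In practice, `scipy.stats.pearsonr` and `scipy.stats.ttest_ind` (with `equal_var=True`) return the same statistics together with the p values to be compared against the 0.05 threshold.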

Unfortunately, the analysis performed is not conclusive. None of the tests returned a p value below the significance level, and the correlations identified (most notably between the interactions of students on a PC and their answers to the survey) are not statistically significant.

A similar analysis was conducted to inspect whether there were correlations between the score obtained in the quiz and the answers in the survey. In this case as well, no statistically significant correlation was found. This was expected since the app was designed to ask questions of varying difficulties, but the difficulty level of the questions received by a student did not change during the test. As expected, students who received easier questions achieved a higher score and there is indeed a significant correlation between these two variables.

Another correlation we wanted to analyse was the one between the mean score obtained in the survey and the number of suggestions sent by the user. The statements about suggestions are an interesting variable because, for each question, every user (besides the active player, the one answering the current question) was able to send two suggestions. For this reason, many such statements were collected during the trial. To encourage students to provide suggestions, the application assigned points for each correct suggestion. In this case, the analysis showed a significant positive correlation (\(r = .37\), \(p = .044\)), meaning that the students who sent more suggestions tended to give higher scores in the survey.

Finally, we wanted to perform a clustering analysis of the data, to check if we could identify distinct groups of users. In this case, we focused only on the students who used a mobile device, since they provided a greater number of features to work with. We considered as variables of interest the average time left per question, the number of suggestions accepted, the total number of interactions and the mean value of the answers in the survey. A dimensionality reduction using PCA (Jolliffe 2002), shown on the left in Fig. 8, revealed that the first two principal components explained more than 70% of the variance in the data. Additionally, a biplot analysis indicated that the most relevant variable for the first principal component was the number of user interactions, while for the second one it was the results of the survey.
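A PCA of this kind can be sketched with a plain NumPy singular value decomposition. The feature matrix is synthetic, the column names are hypothetical labels for the four variables listed above, and standardizing the features before the decomposition is an assumption, since the text does not describe the preprocessing.

```python
import numpy as np

# Hypothetical column names for the four variables of interest.
feature_names = [
    "avg_time_left",
    "suggestions_accepted",
    "total_interactions",
    "survey_mean",
]

# Synthetic data for 20 hypothetical players (not the study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))

# Assumption: standardize each feature so that scale does not dominate.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal axes, S**2 is proportional
# to the variance captured by each component.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained = S**2 / np.sum(S**2)
loadings = Vt[:2]                # first two principal axes, shape (2, 4)
scores = X_std @ loadings.T      # data projected on PC1 and PC2

print("variance explained by PC1 + PC2:", explained[:2].sum())
for i, component in enumerate(loadings, start=1):
    dominant = feature_names[int(np.argmax(np.abs(component)))]
    print(f"PC{i} is dominated by {dominant}")
```

The variable with the largest absolute loading on a component is the one a biplot would show as most aligned with that axis.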

As the number of data points is small, we used a hierarchical clustering algorithm (Caliński and Harabasz 1974). A silhouette score (Rousseeuw 1987) computed for cuts between 2 and 6 suggested that the optimal number of clusters in this case was either 2 or 4. The clustering results (shown in Fig. 8, right) identified one big group of students characterized by a particularly high number of interactions and another one with higher scores in the survey answers. The other two clusters were harder to characterize. In one case, we could not clearly identify a common feature in the data, while in the other the cluster only contained two members, and the intra-cluster variability suggests that those data points are probably outliers.
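The selection of the number of clusters can be sketched as below, using SciPy for the hierarchical clustering and scikit-learn for the silhouette score. The data are two synthetic, well-separated groups, and Ward linkage is an assumption, since the text does not state which linkage criterion was used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import silhouette_score

# Synthetic 2-D points standing in for the PCA scores (not the study data).
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(10, 2)),
    rng.normal(4.0, 0.5, size=(10, 2)),
])

# Assumption: Ward linkage on Euclidean distances.
Z = linkage(X, method="ward")

# Cut the dendrogram into k = 2..6 clusters and score each cut.
silhouette_by_k = {}
for k in range(2, 7):
    labels = fcluster(Z, t=k, criterion="maxclust")
    silhouette_by_k[k] = silhouette_score(X, labels)

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
for k, score in silhouette_by_k.items():
    print(f"k = {k}: silhouette = {score:.3f}")
print("best number of clusters:", best_k)
```

The linkage matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` takes as input to draw a dendrogram like the one on the right of Fig. 8.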

After running the trials in school, we also conducted a post-intervention interview with the teachers. The three teachers seemed very intrigued by the possibility of easily using AR in school without having to resort to any specific hardware. They especially valued the fact that the collaborative features of ARoundTheWorld encouraged the students to work together to provide the answer, either through the features of the application or simply by talking to each other. Another relevant point for the teachers was the possibility of adding new content on their own, as well as the fact that they could export the results to the school LMS. The teachers were more sceptical about the AI features provided by the backend. They mentioned that the vast majority of their colleagues do not have sufficient knowledge to perform the analysis on their own. They would prefer to use a PowerBI or Tableau interface to visualize data and extract basic reports. The teachers also mentioned that the role of the watchers was too passive and that in longer experiments these students might lose interest. They suggested enabling the role of active user when using a PC, even if that means not using AR components but a browser-based 3D graphics library.

Fig. 8: Left: clustering of the active users' data in the PCA space. Right: the dendrogram representing the hierarchical clustering

6 Conclusions and future work

In this work, we presented ARoundTheWorld, a multiplatform AR application which implements collaborative capabilities and gamification concepts. The application is based on a Geography quiz and it fulfils the design objectives identified in Masneri et al. (2023). The evaluation, conducted with 44 students and 3 teachers, and the analysis of the xAPI statements showed that students evaluated the application very positively. Additionally, we measured a small but statistically significant correlation between the ratings in the questionnaire and the engagement of the students. Furthermore, post-study interviews with the teachers identified the collaborative capabilities and the possibility of personalizing the app content as key factors for a sustained usage of AR apps. In fact, one of the teachers suggested adding more collaborative features, such as a chat system or speech-based interactions, to make the application more immersive and more appealing when used in a distributed setting.

Future work directions include testing the application with a larger sample size, in schools spread over a wider geographical area. This would reduce the noise in the data collected while also allowing a more statistically robust analysis of the data. Another line of work is the creation of a software suite that simplifies the analysis of the data for teachers without a software development background, as well as an authoring tool to further simplify the development of collaborative AR applications. While this is not a new idea (Rajaram and Nebeling 2022; Thanyadit et al. 2022), accomplishing a widespread diffusion of AR applications in education requires software solutions that allow quick and easy development, content personalisation after the release of the app and the creation of a central repository for storage and sharing of assets, plugins and applications. We believe that this work represents a first step towards this goal.