A Systematic Map of Research Characteristics in Studies on Augmented Reality and Cognitive Load

In this paper, we present results from a systematic review of research on Augmented Reality (AR) with a special focus on cognitive load (CL). A total of 64 studies from the years 2007 to 2019 were analyzed. The number of publications on AR and CL is steadily increasing. While studies are often conducted by multidisciplinary teams, most are from the US and Taiwan. From a methodological perspective, quantitative research methods with experimental designs dominate. Usually, studies are conducted as media comparison studies measuring effects of AR on declarative or procedural knowledge compared to one or more control groups. The examination of AR focuses on different components


Introduction
Research on the use of Augmented Reality (AR) for teaching and learning has gained much attention in recent years [1][2][3]. The number of publications has been increasing rapidly, and the topic is likely to gain further importance as technical hurdles fall. AR is defined as the computer-based extension of reality [4]. The virtual objects align themselves with the objects of the real world, enable interaction, and react in real time to these interactions [5,6]. In the field of education, the use of AR is mainly discussed from the point of view of learner-centered learning [7]. AR can, e.g., be used to make the invisible visible [8], to realize situated learning [9], and to view environments familiar to learners in a completely new way [10]. In the past, bulky devices such as head-mounted displays were needed for this purpose, but today mobile devices with an Internet connection and an appropriate app are sufficient. Marker-based and location-based AR experiences, at least, can be realized very easily. Marker-based AR uses image recognition. So-called markers or triggers are made available to the learners. By scanning them with the camera of a mobile device, additional information, e.g. a 3D model in a printed book, becomes visible and can be manipulated via touch. The situation is similar with location-based AR, where the virtual insertions become visible in connection with GPS data [11,12].
See-through and spatial AR are currently still technically more complex and require either special glasses or projectors to display AR elements. Newer AR techniques are markerless AR and web AR. Markerless AR allows the projection of virtual objects on any surface without scanning a marker beforehand. With web AR, no app is needed on mobile devices anymore; the virtual objects can be viewed directly via a browser. Because it is so easy to use, web AR in particular is seen to have great potential for many different areas [13].
Research on teaching and learning with AR has already targeted many different areas. Especially in industry, AR seems to have become more popular and has been integrated to support and train everyday tasks [14,15].
The use of AR in formal and informal learning contexts has also been studied more intensively in recent years and, thus, some of the potential of AR can already be summed up:
• Motivation: As with other technologies, there are many positive results regarding the motivating effect of AR [16][17][18]. Here, flow experience as well as the ARCS model [19] have been researched particularly often [20].
• Attitudes: AR is perceived by learners as a useful educational technology with which they would like to continue learning. In addition, there are studies that show that attitudes towards, e.g., science learning have changed positively with the help of AR [21][22][23].
• Learning achievement: Regarding learning success, current meta-analyses show medium effect sizes when learning with AR. However, as the authors themselves point out, the number of primary studies is still rather limited and there are methodological limitations [24][25][26].
While research on the potentials mentioned is relatively consistent, this does not apply to another important variable for the acquisition of knowledge and skills: cognitive load (CL).
Here, authors report different study results. Some studies conclude that AR can reduce cognitive load, while others see AR as a risk of cognitive overload [27][28][29].
The construct of cognitive load has its theoretical foundations in Cognitive Load Theory (CLT) [30,31]. This instructional theory assumes that the human working memory is limited in its capacity. This limitation must be taken into account when providing teaching and learning opportunities so that effective learning can take place.
Prior knowledge has emerged as the strongest predictor for the perception of cognitive load [32]. If learners are beginners in a certain domain, they need more guided learning opportunities, for example in problem solving.
Experts, on the other hand, can also benefit from more unguided learning opportunities and acquire new knowledge and skills [33]. The preparation of the learning materials is also crucial, as unnecessary cognitive load can be reduced if the principles of multimedia learning are taken into account [34]. This type of load is called extraneous load and can be actively changed by teachers. The intrinsic load, on the other hand, is the task-induced cognitive load, which can be changed by building up knowledge or by changing the task itself. The last load type is germane load, which represents the learning-relevant cognitive load [35].
AR can reduce the cognitive load when used appropriately, e.g. by scaffolding [36], or unnecessarily increase it in the case of poorly designed offers [37].
Reviews of research on AR and cognitive load are not yet available. The study by Ibili [38] does not consider primary studies, but only summarizes the findings of existing literature reviews. Again, the conclusion is that more research is needed on cognitive load in AR-supported learning. In another study, research on AR was reviewed regarding its theoretical relevance. Of 45 studies, only three mentioned cognitive load theory as a theoretical reference [39]. A detailed review of multimedia learning and cognitive load shows a similar picture: Only four studies examined the influence of AR on cognitive load [40]. This paper makes a first attempt to extend our knowledge of previous research on AR and cognitive load by analyzing the characteristics of available studies. Being aware of the characteristics of a research field is central to overcoming possible methodological deficiencies, for example. Therefore, this work makes an essential contribution to the field by providing clues for future research on AR and cognitive load in the selection of methodological approaches, survey instruments, and methodological designs. The analysis of these methodological characteristics of research can be classified as a critical issue. Findings on this can contribute significantly to shaping the future of research in a field [41,42]. Therefore, the following research questions will be addressed:
The further structure of the article is as follows: First, we introduce the methodology we use and describe the process that led to the sample of 64 studies on AR and cognitive load. Then, the results for each research question are presented and discussed. The conclusion summarizes the most important findings for the reader.

Method
To address our research questions, we systematically map the nature of the research on AR and cognitive load.
Therefore, we conducted a systematic review. A systematic review is a systematically performed literature review that uses specific research methods with the aim of answering a specific question. It is characterized by a transparent search strategy and inclusion/exclusion criteria which lead to those studies that can contribute to answering the question. Included studies are then coded, synthesized and used to answer the research questions and to guide further research on the mapped topic [43].

Search strategy and selection criteria
Four databases were searched in October 2019: ERIC, Web of Science, Scopus and PsycINFO. Table 1 shows the search terms used for each topic.

Topic | Search Terms
Augmented Reality | "augmented reality" OR "mixed reality" OR "glass" OR "head mounted display" OR "virtual reality" OR "augmented reality AR"
AND
Cognitive Load | "cognitive load" OR "cognitive load theory" OR "dual task" OR "working memory" OR "cognition" OR "attention" OR "load" OR "mental load" OR "overload" OR "mental effort" OR "germane load" OR "germane cognitive load" OR "intrinsic load" OR "intrinsic cognitive load" OR "extraneous load" OR "extraneous cognitive load"

Inclusion criteria were journal articles, conference proceedings and book chapters in the English language reporting empirical results of primary studies on cognitive load and AR, including all types and devices that enable the presentation of AR content in all educational areas. No limit was set regarding the time span.
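The Boolean combination of the two topic blocks in Table 1 can be sketched in a few lines of Python. This is only an illustration: the helper names (`or_block`, `build_query`) are hypothetical, the parentheses around each block are an assumption, and database-specific field tags or syntax are not reported in the text and are therefore left out.

```python
# Illustrative sketch: assembling a Boolean search string from the
# terms in Table 1. Helper names are hypothetical, and the grouping
# parentheses are an assumption (the exact database syntax is not given).

AR_TERMS = [
    "augmented reality", "mixed reality", "glass", "head mounted display",
    "virtual reality", "augmented reality AR",
]

CL_TERMS = [
    "cognitive load", "cognitive load theory", "dual task", "working memory",
    "cognition", "attention", "load", "mental load", "overload",
    "mental effort", "germane load", "germane cognitive load",
    "intrinsic load", "intrinsic cognitive load", "extraneous load",
    "extraneous cognitive load",
]


def or_block(terms):
    """Quote each term, join the terms with OR, and wrap them in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(topic_a, topic_b):
    """Combine the two OR-blocks with a single AND."""
    return f"{or_block(topic_a)} AND {or_block(topic_b)}"


QUERY = build_query(AR_TERMS, CL_TERMS)
print(QUERY)
```

Keeping the terms in two lists makes it easy to document and rerun the search, e.g. when a review is updated at a later date.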
The search initially revealed 2,008 references (see Fig. 1). After removing 10 duplicates, 1,998 sources remained for the first screening. The titles and abstracts of 300 publications were each screened based on the inclusion and exclusion criteria by two researchers. In case of conflicts regarding the inclusion, the title and abstract were read again together, and a decision was made on whether to accept or reject the publication.
The remaining sources were then divided between the researchers and screened according to the criteria, which resulted in 126 potential references for the review. Finally, 66 references were excluded and 60 references containing 64 studies remained for the data extraction process (see Appendix 1).
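The selection flow described above can be checked with simple arithmetic. The following sketch uses only the counts reported in the text; the variable names are illustrative, and the number of references excluded during screening is derived here rather than reported.

```python
# Counts reported in the text; variable names are illustrative.
identified = 2008            # references returned by the database search
duplicates = 10              # duplicates removed
screened = identified - duplicates

potential_references = 126   # references remaining after screening
excluded_afterwards = 66     # references excluded in the final step
included_references = potential_references - excluded_afterwards
included_studies = 64        # some references report more than one study

# Derived value (not stated in the text): references dropped at screening.
excluded_at_screening = screened - potential_references

print(f"screened: {screened}")
print(f"excluded at screening (derived): {excluded_at_screening}")
print(f"included references: {included_references}")
print(f"included studies: {included_studies}")
```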

Data coding and categories
A comprehensive coding system was developed to extract the data from these studies. It includes more general information, as is usual in a mapping study [44], such as the origin of the authors, the institutional classification, the type of publication, and the assignment of the study to a research field. The methodological parameters were also coded to reveal possible trends or gaps in the research methodology. Furthermore, codes were developed for the research procedure, e.g., what is to be compared to find out more about the effect of AR on cognitive load.
Here, we took the different types of AR and the devices with which the subjects use AR into account. We also coded the purpose of AR in each study. We distinguish between assistance systems, instructional material, training systems, AR design research, and AR games. We also coded the type of knowledge, declarative or procedural, that should be taught/trained using AR.

RQ1: Bibliometric and geographical characteristics
The systematic map shows that 39 studies (60.9%) were published in journals and 25 studies (39.1%) in conference proceedings. As can be seen in Figure

RQ3: Characteristics of research on AR and CL to date.
As the systematic analysis shows, studies examine the use of AR for six different purposes and how it affects the cognitive load. In 27 studies (42%), AR is used as an assistance system to support specific action requirements, like physical computing, surgery or navigation tasks (e.g., [47][48][49]). AR is used in 18 studies (28%) as a technology for instruction (e.g., [50][51][52]) and in 15 further studies (23%) to guide assembly tasks (e.g., [53][54][55]). In two studies (3%), by Alrashidi et al. [56] and Loup-Escande et al. [57], AR is used to provide real-time feedback. One study examines AR for spatial ability training [58], and a further one focuses on collaborative problem solving [59].
In 73% (n=47) of the studies, AR is compared with one or more other media types (see Fig. 4). For example, 29 studies compare AR with screen-based presentation and 20 studies compare AR with paper-based materials. Auditory information is contrasted with AR in four studies, real task situations in three. One study compares AR with VR.
In twelve studies (19%), an AR system is compared to another AR system and in six studies (9.4%), more than one AR system is contrasted with another medium.
Five studies (8%) examine only the influence of an AR system on the cognitive load. In just two studies, a correlation is calculated and the influence of motivational or affective variables on the cognitive load while learning with AR is explored [63,64].

Fig. 4. Distribution according to the comparisons made
Regarding survey instruments, our mapping study shows that the NASA Task Load Index (NASA-TLX) [65] is used in 36 studies (56.3%); in two of these, it is used together with the Cooper-Harper Workload Rating Scale [66]. Five studies (7.8%) use a questionnaire with five items on mental load and three items on mental effort (referenced as based on [67]). A similar questionnaire with two mental load items and two mental effort items (referenced as based on [31]) is used in four studies (6.3%). In another four studies (6.3%), a self-made survey instrument is used to record the cognitive load. Three studies (4.7%) each assess the cognitive load using the 9-Point Cognitive Load Scale (9-Point CLS, referenced as based on [68]) and a dual-task approach [69]. The Rating Scale Mental Effort (RSME) [67] and a subjective cognitive load rating scale (PAAS) (referenced as based on [70]) are each used twice (3.1%). An adapted NASA-TLX version, the RAW-TLX [71], the Surg-TLX [72], the Cognitive Load Scale (CLS) [73], and a qualitative approach in the form of an interview [28] are each chosen once (1.6%). Fig. 5 provides a summary of the survey instruments used. In twelve studies (18.8%), mental effort and mental load are reported.
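As a plausibility check, the instrument counts above can be tallied and converted into shares of the 64 studies. The sketch below is an illustration only: the dictionary labels are shorthand introduced here, and it assumes that each study is counted under exactly one instrument. Rounding half up to one decimal place is also an assumption; Python's built-in `round` rounds ties to the nearest even digit, which would turn 56.25 into 56.2.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_STUDIES = 64

# Counts as reported in the text; the labels are shorthand for this sketch.
INSTRUMENT_COUNTS = {
    "NASA-TLX": 36,
    "5 mental load + 3 mental effort items": 5,
    "2 mental load + 2 mental effort items": 4,
    "self-made instrument": 4,
    "9-Point CLS": 3,
    "dual-task approach": 3,
    "RSME": 2,
    "PAAS": 2,
    "adapted NASA-TLX version": 1,
    "RAW-TLX": 1,
    "Surg-TLX": 1,
    "CLS": 1,
    "interview": 1,
}


def percent(count, total=TOTAL_STUDIES):
    """Share in percent, rounded half up to one decimal place."""
    value = Decimal(100 * count) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Under the one-instrument-per-study assumption the counts cover all studies.
assert sum(INSTRUMENT_COUNTS.values()) == TOTAL_STUDIES

for name, count in INSTRUMENT_COUNTS.items():
    print(f"{name:<40} {count:>2}  {percent(count)}%")
```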
Extraneous, germane, and intrinsic load were measured in one of the 64 studies (1.6%), task difficulty in three studies (4.7%), and psychological effort in 3.1% of the studies (n=2).
In 24 studies (37.5%), only the overall cognitive load is reported. In 16 of these studies (66.7%), it is measured using several scales, e.g. from the NASA-TLX; in the remaining studies, it is measured four times using the Mental Effort scale, three times using the Mental Load scale, and once using only the Mental Demand scale.

Discussion
The aim of this study is to map research characteristics of studies investigating the role of cognitive load during AR-enriched learning and training. As the bibliometric characteristics show, research on AR and cognitive load is an emerging field, judging by the increasing number of studies published each year. We identified a high number of existing studies addressing the issue of cognitive load while learning with AR, in contrast to the review in [40]. Researchers should be aware of this body of research when setting research questions and planning studies. Consequently, this will enable the expansion of knowledge on cognitive load and AR.
A possible extension for future studies is to include memory performance as a moderating variable. As other review studies have shown, it is neglected in research on multimedia and CL [74]. Neither did we find any study that included memory performance as an influencing variable. A possible cause is that half of the studies were conducted by authors from the same discipline. Since the assumptions of CLT are based on Baddeley's working memory model [31], we recommend for future studies the collaboration of multidisciplinary teams, especially with experts in memory research, to overcome these limitations.
The majority of the studies were conducted by researchers from Western countries and Taiwan, as the geographical characteristics show. This pattern is found not only in AR-based learning studies, but also in research on other educational technologies [75]. No study from an African country is available, which results in an enormous research gap. Learning is always dependent on cultural and educational realities, so there is a need for diverse findings on the use of educational technologies.
Regarding the methodological characteristics, quantitative approaches are preferentially applied. Interestingly, there is also a qualitative study that reports on the risk of cognitive overload when learning with AR. This finding comes from interviews with learners after the AR intervention [28].
An exploratory research approach dominates in the studies, meaning that no hypotheses are tested but an open research question is pursued. Researchers justify this by saying that there are still few studies on AR and CLT. Therefore, it is not possible to formulate and test hypotheses. Our study, on the other hand, shows that a large number of studies on AR and CLT are already available and that it would be appropriate to test hypotheses. In addition, research on cognitive load has a long tradition, which would make hypothesis testing reasonable even in AR-enhanced learning environments [31].
With regard to the research objectives, the analysis shows that most of the studies investigate the role of cognitive load during the AR-based fostering of procedural knowledge. Here, AR serves as a supportive technology that might reduce cognitive load and thus assist the performance of different tasks. The other studies use AR to teach declarative knowledge and examine whether AR is perceived as an additional burden while learning.
As noted in the CLT, the learning environment, including the technologies used, can influence cognitive load [76]. Therefore, both the assumption of a reduction and that of an increase of CL in AR-based learning environments are worth investigating.
However, the research designs used to investigate these assumptions are questionable. As our data show, media comparison studies dominate in the analyzed studies, e.g., an AR system is compared to video instruction.
Media comparison studies have been criticized for over 40 years because they focus on technology rather than the actions and processes of the learners [77][78][79][80][81]. Reeves and Reeves [82] call this thing-oriented research, which has no impact on practice because only contradictory findings are produced. This is because media comparison studies are inherently wrong, as exactly the same conditions can never be established for the experimental and control groups [81,83,84].
Incidentally, this cannot be justified by a randomized design either. Here, too, researchers believe that it is the technology that influences performance and not the actions triggered by it [85].
This belief is also not in line with the theoretical assumptions from CLT, according to which prior knowledge and the individual capacities of working memory as well as the instructional approach are decisive for learning effectiveness and not the technology used [32].
In order to generate robust insights into the role of cognitive load in learning with AR, other, usually more complex, research designs are needed. Such studies can consider the interplay of technology-method-task or investigate the effect of an AR system for learners with different prerequisites, e.g. higher vs. lower prior knowledge [81].
If media comparisons are still conducted in the future, they should at least address different learning objectives [84]. Thus, it could be assumed that AR-based 3D representations are more effective for training spatial skills than 2D paper-based illustrations. Studies with such designs no longer ask whether one technology is better than the other but address educational problems [82]. As a result, these studies provide solutions for how AR may support learners in achieving specific learning goals while perceiving lower cognitive load.
In 12 studies we identified value-added or intra-media research designs that allow the investigation of such solutions. These studies compare two or more AR applications under different instructional conditions or variations with regard to the media attributes. By focusing on learning processes and activities rather than the technologies used, these research designs are best suited for studying instructional effectiveness [86]. As an example, Lampen et al.'s study explored three different display variations when completing a task using AR glasses [54]. As it turned out, the demonstration of the task by a human avatar was the most effective support to accomplish the task and the cognitive load was also lowest in this condition. It is strongly recommended to conduct more such studies and focus more on value-added studies rather than continuing to conduct media comparisons. The described study by Lampen et al. helps to figure out how AR applications should be designed to make learning and training more effective.
With regard to the instruments used to measure cognitive load, it is noticeable that the NASA-TLX questionnaire dominates. In principle, this questionnaire is well suited to capture the multidimensionality of cognitive load [87]. However, all scales from the questionnaire should then also be reported in the results section of the studies.
As our data show, this is not the case. Rather, the reporting of the scales varies greatly without being justified by the authors. Another problem with the NASA-TLX scale is its estimation of cognitive load level. For example, many authors find lower cognitive load scores for the AR condition, but only compared to the control condition.
As described above, the control group mostly consists of participants learning with other media or technologies.
Whether the cognitive load from, for example, a paper-based instruction was high or perhaps already very low and actually not perceived as cognitively demanding is not addressed in the studies. However, this would be necessary, because otherwise no real statement can be made about whether AR actually reduces the load. When interpreting the values from the NASA-TLX, we recommend referring to the recommendations in Grier [88] in order to be able to interpret the values found accordingly.
Furthermore, if the NASA-TLX is used, all scales should also be reported for the readers and if not all scales are relevant, authors should justify this.
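What complete reporting could look like is sketched below for the unweighted (raw) variant of the NASA-TLX, in which the overall score is the mean of the six subscale ratings. The function names are hypothetical, a 0 to 100 rating scale is assumed, and the example ratings are invented for illustration; they are not data from any of the reviewed studies.

```python
from statistics import mean

# The six NASA-TLX subscales.
SUBSCALES = (
    "Mental Demand",
    "Physical Demand",
    "Temporal Demand",
    "Performance",
    "Effort",
    "Frustration",
)


def raw_tlx(ratings):
    """Unweighted overall score: the mean of all six subscale ratings."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return mean(ratings[name] for name in SUBSCALES)


def report(ratings):
    """List every subscale next to the overall score, not the overall score alone."""
    lines = [f"{name}: {ratings[name]}" for name in SUBSCALES]
    lines.append(f"Overall (raw TLX): {raw_tlx(ratings):.1f}")
    return "\n".join(lines)


# Hypothetical ratings on an assumed 0-100 scale, for illustration only.
EXAMPLE_RATINGS = {
    "Mental Demand": 60,
    "Physical Demand": 20,
    "Temporal Demand": 40,
    "Performance": 30,
    "Effort": 50,
    "Frustration": 40,
}

print(report(EXAMPLE_RATINGS))
```

Raising an error for missing subscales mirrors the recommendation above: an overall score is only produced when all six scales are available, so that selective reporting becomes visible.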
Furthermore, it is striking that the measurement of the different cognitive load types intrinsic, extraneous and germane cognitive load has not been considered so far. Only one study used a corresponding questionnaire [89].
It should be noted that currently germane load is no longer recognized as a load type on its own by some CLT researchers [31]. On the other hand, instructional design researchers continue to see great importance in measuring germane load. Namely, it allows learning designers to determine whether the interventions they develop are actually triggering the processes that promote learning [90].
For research on AR and cognitive load, it must be strongly recommended at this point to at least distinguish between intrinsic and extraneous cognitive load when measuring the cognitive affordances of AR-enhanced learning environments. Further consideration of germane load is also appropriate at this time, as there is currently no evidence of this with respect to AR technologies. Future studies should also increasingly use alternative methods to measure cognitive load, for example dual-task methods or eye tracking.

Conclusion
Research on AR and cognitive load is an emerging research field that is still dominated by media comparison studies that investigate the question whether AR can be used to learn or perform better. Such studies have to be interpreted with great care, since exactly comparable conditions can never be established. As a consequence, no conclusions can be drawn about causal relationships. As an alternative, value-added studies or studies that take into account the characteristics of the learners are suggested. Especially in the case of cognitive load, it is useful to differentiate between lower and higher prior knowledge and to examine AR systems against this background.
Value-added studies are characterized by comparing two AR systems under variation of a variable, e.g., the addition of a learning strategy. Such studies help establish features and principles for designing AR applications for the purpose of learning and training.
Measurement of cognitive load is based on self-reporting scales, mostly the NASA-TLX. Reporting lacks any indication of whether the reported load is high or low; it is interpreted only in comparison to the control group.
Almost completely missing are measurements of the three cognitive load types; further research is urgently needed here. The studies should then not only be exploratory but should derive and test hypotheses based on the CLT.
Research on AR and cognitive load is a multidisciplinary research field; therefore, researchers from different disciplines should also collaborate in conducting such studies.
Long-term studies are also needed to further inform practice, and such studies are not yet available in the sample of this study.
In conclusion, the characteristics found in this mapping study may shape the future of research on AR and cognitive load and can contribute to the design of more rigorous studies. This would help both research and practice.