The HoloLens in Medicine: A Systematic Review and Taxonomy

The HoloLens (Microsoft Corp., Redmond, WA), a head-worn, optically see-through augmented reality display, is the main player in the recent boost in medical augmented reality research. In medical settings, the HoloLens enables the physician to obtain immediate insight into patient information, directly overlaid with their view of the clinical scenario, the medical student to gain a better understanding of complex anatomies or procedures, and even the patient to execute therapeutic tasks with improved, immersive guidance. In this systematic review, we provide a comprehensive overview of the usage of the first-generation HoloLens within the medical domain, from its release in March 2016 until the year 2021, when attention shifted towards its successor, the HoloLens 2. We identified 171 relevant publications through a systematic search of the PubMed and Scopus databases. We analyze these publications with regard to their intended use case, technical methodology for registration and tracking, data sources and visualization, as well as validation and evaluation. We find that, although the feasibility of using the HoloLens in various medical scenarios has been demonstrated, increased efforts in the areas of precision, reliability, usability, workflow and perception are necessary to establish AR in clinical practice.


Introduction
Augmented Reality (AR) enhances the real world with virtual content by interposing computer graphics between the human eye and its field of vision. Recent consumer-oriented developments made AR devices accessible to the general public. As a result, the AR field saw strong growth in various domains, such as industry and entertainment. A main player in this new development was the HoloLens (Microsoft Corp., Redmond, WA), released in 2016. The HoloLens was originally marketed for applications in gaming, communication and 3D modeling; nevertheless, it quickly drew attention from the medical domain. This development is unsurprising: after all, one can hardly imagine a professional domain in which AR could have a more significant impact than in medicine. AR has the potential to grant physicians "X-ray vision", the ability to see critical structures within the patient without making a single incision. Wearable devices, such as the HoloLens, can make critical patient information permanently and readily available, and show it directly in the physician's field of view. This approach allows physicians to keep their focus on the patient and could make operating rooms cluttered with monitors obsolete. Immersing remote experts into the mixed reality environment would further permit more patients to benefit from their expertise. Patients could be monitored and guided through various treatment and rehabilitation stages using AR, be it within the clinic or in their own homes, while medical students could practice critical interventions in a safe, virtually enhanced setting or immerse themselves in 3D anatomy. The HoloLens 1, as the first wearable, fully untethered AR device, was certainly an important step towards the future of AR in medicine. But how much could it contribute, and how far are we in making the aforementioned scenarios a reality?
In this systematic review, we provide a comprehensive overview of works that reported the usage of the first-generation HoloLens in the medical domain from 2016 to 2021. We identified 171 relevant publications through a systematic search of the PubMed and Scopus databases and analyzed them according to their intended use case, technical methodology concerning registration and tracking, data sources and data visualization, as well as evaluation and validation. Throughout our review, we highlight principal findings, identify gaps and discuss challenges and limitations. With the recent availability of its successor, the HoloLens 2, this review outlines the impact the first-generation HoloLens had during its lifetime in the medical area.

Augmented reality
One of the most common definitions of AR stems from the virtuality continuum by Milgram et al. [137], who describe AR as a mixed reality (MR) which contains mainly real elements, enhanced with virtual content. Azuma et al. [11] further characterize AR environments as combining reality and virtuality by registration in 3D, while being interactive in real-time. Although this definition makes clear that AR can appeal to all senses, it is mostly concerned with visual data. In the medical field, where digital imaging techniques provide rich information, AR has huge potential. Unsurprisingly, once technology was advanced enough to consider real-world AR applications, it quickly drew attention from the medical domain.
AR displays. The first medical AR systems were introduced as early as the late 1980s, with Roberts et al. [172] describing the first system, an operating microscope augmented with segmented computed tomography (CT) images. Head-mounted displays (HMDs) continued to be a popular display choice in early medical AR systems, as demonstrated, for example, by works in the 1990s [195,60] and the early 2000s [182], which developed video see-through HMDs for medical applications. HMDs are a natural choice for medicine, as they intuitively align the head gaze of the wearer with the viewpoint of the content, while keeping the hands of the wearer free. However, early HMD designs could not easily fulfill the high demands of medical AR systems in terms of performance, latency and accuracy, which require powerful computational infrastructure. Usually, this challenge resulted in bulky form factors, with wired connections between the HMD and more capable computing and tracking infrastructure, making these systems difficult to implement in real clinical scenarios. Although head-worn microscopes and optical and video see-through displays continued to be relevant, in the years between 2011 and 2017 we see a shift towards world-localized displays, such as stationary monitors or projector systems [49]. The release of the HoloLens 1, the first self-contained AR-HMD with a slim form factor, subsequently caused research attention to shift towards optical see-through (OST) displays again [75].
Registration and tracking. Alignment between reality and virtuality is a fundamental concept of AR, which is realized via registration. In a medical context, registration is mostly desired between medical data, often volumetric imaging such as CT or magnetic resonance imaging (MRI), and the patient. To maintain registration and synchronize the viewpoint with the user's perspective, the position and orientation of the AR viewing camera with respect to the environment need to be tracked.
For tracking and registration, two fundamental paradigms can be distinguished: outside-in and inside-out approaches. Outside-in (or extrinsic) tracking refers to strategies where external sensors (e.g., cameras) are stationed around the user and thus observe the movement of the device from the outside. Such methods can be very accurate, but require many components and only work in a limited space. In inside-out (or intrinsic) tracking, the sensors are integrated within the AR device itself, and thus the device can self-locate within an unprepared environment. Although diverse types of sensors can be used for tracking, vision-based methods, relying on visible light, infrared (IR) cameras or depth sensors, have dominated the field for many years [223]. For vision-based tracking, observable features need to be visible to the tracking cameras. Typically, these features can be divided into artificial features for marker-based tracking and natural features for marker-less tracking.
Marker-based tracking relies on indicators with a pre-defined pattern and size, whose location in relation to the real world is precisely known. These indicators can, for example, be fiducial markers visible to standard RGB cameras, or IR emitters (either active or passive), which are more robust to variable lighting conditions. Medical technology adopted this principle years ago: IR markers are well-established in surgical navigation systems, where they are anchored in rigid tissue, such as the patient's bones, and on surgical instruments, while being tracked with stereo IR cameras. This approach allows computing the position of tools relative to critical anatomy.
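The underlying computation of such navigation systems can be illustrated as a short chain of rigid transforms: given the poses of a patient-anchored reference marker and of a tool marker, both reported by the tracker in its own coordinate frame, the tool tip position relative to the anatomy follows from one matrix inversion and two multiplications. A minimal sketch with hypothetical, hand-picked poses (all names and values are illustrative, not taken from any specific system):

```python
import numpy as np

def make_pose(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Hypothetical poses reported by a stereo IR tracker (tracker -> marker):
# a reference marker anchored to the patient's bone, and a tool marker.
T_tracker_patient = make_pose(np.eye(3), [100.0, 50.0, 800.0])   # in mm
T_tracker_tool    = make_pose(np.eye(3), [120.0, 55.0, 790.0])

# Tool pose expressed in the patient's coordinate frame:
T_patient_tool = np.linalg.inv(T_tracker_patient) @ T_tracker_tool

# The tool tip is offset from its marker by a calibrated, rigid vector.
tip_in_tool = np.array([0.0, 0.0, -150.0, 1.0])  # tip 150 mm along the shaft
tip_in_patient = T_patient_tool @ tip_in_tool    # tip relative to the anatomy
```

In practice, the tip offset would be obtained through a pivot calibration of the instrument, and the marker rotations are, of course, not identity matrices.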
Marker-less systems do not require artificial objects and instead rely on naturally observable features. Simultaneous localization and mapping (SLAM) [48] and its variants are the most common marker-less tracking techniques; they are capable of fusing information from various sensors (e.g., visible light, depth, GPS) to build a map of the environment and track the device within it. Virtual content can then be placed into the mapped world manually or with the aid of markers. Other marker-less tracking approaches involve models or templates of known, stationary real-world objects, which are then fitted to their real counterparts, either through 2D-3D (in the case of visible light cameras) or 3D-3D (if 3D information of the scene is available) matching. Since 3D models of the patient's skin surface are typically available from medical imaging, such methods are well-suited for medical applications.
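Point-based 3D-3D matching is typically solved in closed form: given corresponding landmark pairs, for instance fiducials identified both in a CT scan and on the patient, the least-squares rigid transform follows from a singular value decomposition (the Kabsch/Umeyama method). A brief sketch under the assumption of already-matched, noise-free landmarks (all values are illustrative):

```python
import numpy as np

def rigid_register(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q (Kabsch)."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)              # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

# Hypothetical landmarks in image space (e.g., from a CT scan), in mm.
P = np.array([[0, 0, 0], [100, 0, 0], [0, 100, 0], [0, 0, 100]], float)

# The same landmarks located on the patient: rotated 90 deg about z, then shifted.
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
Q = P @ Rz.T + np.array([10.0, 20.0, 30.0])

R, t = rigid_register(P, Q)
# Mean residual of the fit: with noisy measurements, this is the fiducial
# registration error (FRE) commonly reported in the registration literature.
fre = np.linalg.norm(P @ R.T + t - Q, axis=1).mean()
```

Surface-based variants (e.g., iterative closest point) follow the same principle, but alternate this closed-form step with re-estimating the point correspondences.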

The HoloLens
The first-generation HoloLens is a wearable computer in glasses form (often also referred to as a "smartglass"), which delivers augmented reality experiences through a 3D optical see-through head-mounted display (OST-HMD). It was developed by Microsoft and released in 2016. Contrary to other mixed or virtual reality headsets, the HoloLens was the first to work fully untethered, requiring no wired connections to stationary infrastructure or prepared environments [222].
The HoloLens features a set of built-in sensors, including an inertial measurement unit (IMU), four side-facing visible light cameras for capturing the environment, a time-of-flight (ToF) depth sensor, an ambient light sensor, four microphones and a front-facing, high-definition photo/video camera. Initially, only the microphones and the photo/video camera were accessible to developers. In mid-2018, however, the so-called Research Mode enabled access to the ToF and environmental understanding cameras for research purposes [53]. Stereoscopic virtual content is displayed on two semi-transparent combiner lenses in front of the user's eyes for 3D vision, combined with the real environment. The equivalent of two 720p displays, one in front of each eye, allows a diagonal field of view (FOV) of 34 degrees, with a resolution of 47 pixels per degree [73]. Sound is delivered via built-in speakers. The HoloLens is equipped with an Intel Atom x5 32-bit central processing unit (CPU) with 1 GB of random access memory (RAM), and has 64 GB of storage. Its active battery life is specified at 2-3 hours.
A custom, dedicated hardware accelerator, the Holographic Processing Unit with 1 GB of additional RAM, enables efficient processing of the sensor data in parallel to processes running on the HoloLens' CPU. This custom chip facilitates a set of on-board capabilities to understand the user's actions, as well as the environment around the device. A SLAM algorithm continuously constructs and refines a spatial map of the environment and locates the device within it, resulting in on-board, marker-less inside-out tracking of the HoloLens [102]. Gaze tracking is supported by analyzing the user's head movement. Users can interact with virtual content via hand gestures or voice commands, both of which are automatically recognized. Additional input devices can be connected to the device via Bluetooth 4.1 LE, for example, the included clicker, a gamepad or an external keyboard. Connections can further be established wirelessly via Wi-Fi 802.11ac, or wired via Micro USB 2.0.

Search strategy and selection process
We conducted a systematic review of existing research on the HoloLens applied in medical scenarios. The review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines by Moher et al. [140]. A systematic literature search in the databases PubMed and Scopus was performed in March 2022 for the keyword [hololens], together with any of the terms [medicine], [surgery] or [healthcare]. The publication period was restricted to the years 2016 to 2021. Duplicates were removed; then, an initial screening of titles and abstracts was performed. After the initial screening, full texts were retrieved and reviewed for eligibility. Criteria for inclusion in both phases of screening were: 1) studies with English full texts, 2) studies describing full original research by the authors, 3) studies which have been peer-reviewed, and 4) studies describing the application of the HoloLens primarily for a human medical purpose. Consequently, exclusion criteria were: 1) studies without English full texts, 2) studies not describing full original research, such as reviews or book chapters, 3) studies which have not been peer-reviewed, for example conference posters/abstracts or commentaries, 4) studies which do not use the HoloLens as the main AR device, but only mention it, and 5) studies which are not primarily focused on a human medical purpose, but on other applications such as industry or gaming, and only mention medicine as a possible field of application.
The systematic electronic search resulted in a total of 975 records. 18 additional records previously known to the authors were also considered. After removal of duplicates and screening of titles, abstracts and full texts according to our inclusion criteria, 171 studies were selected for the final analysis (see Figure 1).

Data extraction and taxonomy
Each study was reviewed by one author. We extracted information about authors, year of publication and medical speciality from every publication. Medical specialities were determined as stated by the authors, by publication venue or by targeted anatomy, and grouped where applicable; e.g., cranial and facial sub-specialities were combined as cranio-maxillofacial. Then, we extracted information about every publication according to our taxonomy, shown in Figure 2.
In section 6, we classify each study by the main intended user of the HoloLens: 1) clinical systems, whose main purpose is the support of physicians and healthcare professionals in the clinical routine, 2) educational works, which aid medical and healthcare students in their schooling and training, and 3) applications focused on treatment and rehabilitation, which aim at supporting patients during different stages of therapy and disease management. Further, we divide each main category into sub-categories based on application areas. From every publication, we also extracted information about applied registration and tracking methodologies, if any (see section 7), where we first categorized studies based on their tracking paradigm (manual vs. inside-out vs. outside-in), and further distinguished between marker-based and marker-less methods. Data and visualization techniques are reviewed in section 8, where we define categories based on data source (medical vs. non-medical), data type (2D, 3D and other), as well as acquisition time. Finally, we analyzed how medical AR applications using the HoloLens have been evaluated, grouping studies according to their evaluation scenarios, and identified commonly used qualitative and quantitative measures in section 9.

Related reviews
According to our exclusion criteria, review publications are not analyzed in this study. Still, we identified several related reviews during our literature search, which might be of interest to the reader.
Barsom et al. [15] provide a systematic review of AR for medical training up to the year 2015, and found that, although promising results were achieved, full validation of training systems was lacking. Chen et al. [34] analyze trends and challenges in medical AR found in over 1400 publications in the period between 1995 and 2015. They identify powerful enabling technologies, human-computer interaction and validation as major research challenges. Eckert et al. [49] review medical AR applications described between the years 2012 and 2017. In these years, a trend towards display technology research and medical treatment scenarios could be identified. Still, a lack of evidence in clinical studies was noted.
Several reviews about AR specifically for surgical applications have been published. Vavra et al. [208] and Yoon et al. [221] review articles published pre-HoloLens, between 2010 and 2016, and between 1995 and 2017, respectively. In this period, live streaming from endoscopy, followed by navigation and video recording, were the most popular applications. Rahman et al. [170] focus specifically on HMD use in surgical scenarios up to the year 2017.
More recent reviews about surgical AR using OST-HMDs come from Birlo et al. [17] and Doughty et al. [47], covering the years between 2013 and 2020, and 2021 to March 2022, respectively. They clearly show that the Microsoft HoloLens was the major driving force in OST-HMD research for surgery in the past years. Even more specialized surgical reviews have been published for orthopedic surgery [98], oral and cranio-maxillofacial surgery [12,75], neurosurgery [134,81,126], laparoscopic surgery [16] and robotic surgery [165].
In all these reviews, the lack of clinical validation is the most recurring aspect, something we also identify in this study. Other commonly mentioned challenges include technical limitations with regard to device tracking and rendering, and limited usability due to complicated workflows. The HoloLens, with its self-tracking capabilities, good support for the development of user interfaces and interactions, and improved rendering capabilities, makes some of these challenges obsolete. Therefore, in this review, we focus exclusively on aspects and challenges coming with this new generation of OST-HMD devices, which still bear significance for more recent hardware, such as the HoloLens 2 or Magic Leap 2 (Magic Leap, Plantation, FL). Thus, we hope that it is interesting not only for looking back, but in particular also for pointing future researchers towards directions in which increased efforts are required.

Although the HoloLens was released in March 2016 in North America and October 2016 worldwide, no publications reporting its use in the medical domain were published in that year. After that, the number of publications in all categories shows a steady increase, with the highest number of research reported in 2020. In 2021, the number of papers decreases again, likely caused by the release of the HoloLens 2, which led many researchers to shift their attention towards the newer-generation device.

Medical fields of applications
As shown in Figure 4, the HoloLens saw applications in a large variety of medical areas, which we group into 21 fields. Surgical disciplines, in particular orthopedic surgery (18) and neurosurgery (14), were most frequently supported by AR applications, especially those targeted at physicians. Interestingly, in these most frequent disciplines, image-guided and navigated interventions are already particularly common, e.g., through surgical navigation systems or fluoroscopy. Hence, it can be assumed that, from the perspective of user acceptance and recognition, the translation of AR technology into clinical practice can be more successful in areas which already heavily rely on such technological assistance. However, the relevant procedures also have the highest demands in accuracy and safety, which makes the implementation of AR much more difficult from a technical standpoint. 15 publications do not indicate a specific medical field, and eleven target surgical procedures in general. These publications mostly introduce more general concepts not targeted at specific medical procedures; thus, they could be used in more than one specialty. Patient-focused applications are rather situated in speciality areas where patient cooperation and motivation have a large impact on treatment outcome, such as neurology and kinesiology.

Use cases
We first categorize publications by their intended users, and further by the supported application. An overview of the identified categories and the number of associated publications is given in Figure 5.

Figure 5: Overview of the number of papers identified in each of our three main categories (defined by targeted users) and sub-categories (defined by application area).

Physician-centered applications of the HoloLens
We group research within this category based on application, ranked by technological complexity: 1) Data visualization applications, where the HoloLens primarily serves as a display, are relatively simple to implement. 2) Image-guided interventions demand either a registration between virtual content and the patient or a way to display intra-operative imaging in real-time, and are, consequently, more challenging. 3) Surgical navigation applications require tracking of medical tools in addition to the patient and the HoloLens, and have the highest demands in accuracy and reliability, which makes them the most complex.
Table 1 shows all studies targeted at physicians and other healthcare professionals, including their applications.

Data display
In its simplest form, the HoloLens can be used as an immersive display for medical data, such as 2D/3D imaging or healthcare records (see Figure 6 (a) and (b)). Pure data display applications do not need to establish a correspondence between the physical space and the shown data; content can simply be anchored to a fixed position relative to the display itself, so that it is always visible to the wearer. Since the HoloLens self-locates within its environment, virtual objects can further be anchored to a stationary position within the real world without additional expenditure, to be naturally examined from different perspectives. This ability can have several advantages for clinicians. Access to medical data can be detached from stationary infrastructure and brought to treatment rooms, operating theaters and the bedside of the patient. For 3D data, such as volumetric medical imaging, stereoscopic visualization through the HoloLens may lead to an improved perception of 3D relations. Furthermore, the possibility of touch-less interaction with data is ideal for scenarios where sterility is important. Finally, by synchronizing several headsets, visualizations may be more easily shared between users. These factors could make inspection of and interaction with medical data during diagnosis, intervention planning and procedures more intuitive and less cumbersome. We identified 33 publications in this category. Most of them describe workflows for visualizing pre-interventionally acquired 3D volumetric imaging data, such as CT, MRI or positron emission tomography (PET), but also healthcare records and other documents.
A smaller group of works explores telemedicine, where remote monitoring and assistance are important concepts. A remote expert can assist local staff in carrying out critical interventions, which is particularly useful in rural or disadvantaged areas with limited funding and staff. The HoloLens features video conferencing capabilities, which enable the real-time transmission and visualization of the viewpoint of an interventionist to remote experts or observers, and, vice versa, expert guidance via voice, video or annotations, without having to look away from the patient or use an external computer. Sirilak et al. [190] developed an e-consulting platform to connect specialized physicians with rural and remote hospitals. The feasibility of video and voice communication during intervention or surgery has further been explored by Mitsuno et al. [139] and Glick et al. [71]. Proniewska et al. [161] developed a strategy for digitizing the operating room, allowing telemonitoring from different perspectives with the HoloLens.

Image-guided interventions
The majority of papers reviewed in this study describe an application in image-guided intervention (IGI). AR for IGI is mainly motivated by the desire to obtain X-ray vision of a patient, which can incorporate medical imaging data intuitively into interventional workflows by aligning patient anatomy, imaging data and the physician's viewpoint. AR technology can superimpose pre- or intra-operative images and planning data directly onto the patient, allowing the physician to see target structures through skin or obstructive anatomy, as seen in Figure 6 (c). It can thus either replace traditional image guidance, or provide guidance for interventions usually performed without it.
X-ray visualization with the HoloLens has also been applied in procedures where the target anatomy is surgically exposed, such as tumor removal [158,91,175,192,88,180,94,184,185,74], vessel surgery [160,100,215] or cranio-maxillofacial surgeries [105,197,133]. In these scenarios, the visualization of critical anatomical structures which are not directly or clearly visible at the surgical site, such as blood vessels and nerves, or of important planning information, for example tumor resection margins or osteotomy lines, has the potential to make interventions safer.
For a convincing X-ray visualization, an accurate overlay of imaging data with the patient is a prerequisite. Image-to-patient registration, relating virtual content to the target anatomy, is the key component of such a system, but other factors, such as display calibration and the stability of the HoloLens self-tracking, also play an important role. While many of the aforementioned works rely on a manual alignment of virtual content with the patient, several publications within this category focus on addressing these technical challenges. Mostly, they do not focus on specific medical applications, but develop new concepts for system calibration [6,82,57] or image-to-patient registration [217,35,156,201,77], which could be applied in various medical scenarios. Other works evaluate and compare selected technical aspects [58,138,205,79,157]. We will discuss image-to-patient registration and calibration methods in more detail in section 7.

A third category of works does not target X-ray visualization; instead, the HoloLens is used to enhance traditional image-guided interventions, such as laparoscopy, endoscopy, fluoroscopy or ultrasound. It has been shown that monitor placement during image-guided procedures plays an important role: a misalignment of the visual-motor axis can increase fatigue, decrease orientation and hand-eye coordination of the operator, and, consequently, increase the risk of intervention-induced injuries [51]. By anchoring the virtual 2D "monitor" to a convenient physical location or to the head gaze of the user, ergonomics and subjective workload may be improved. These applications require methods to deliver live medical data to the HoloLens in real-time. While most frameworks could support a variety of imaging sources, studies specifically evaluate intra-operative X-ray [43,3], endoscopy [3], ultrasound [30], electro-anatomic mapping [193] and MRI [209]. An example is shown in Figure 6 (d).
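The transport layer of such live-streaming pipelines is commonly a simple length-prefixed message format pushed over a network socket, so that the receiving headset can re-synchronize on frame boundaries and discard corrupt data. The following is only a minimal illustration of this idea; the header layout and function names are our own sketch, not taken from any of the reviewed frameworks:

```python
import struct
import zlib

# Illustrative header: frame id, payload length, CRC32 checksum (network byte order).
HEADER = struct.Struct("!III")

def pack_frame(frame_id: int, payload: bytes) -> bytes:
    """Prefix one compressed image frame with id, length and checksum."""
    return HEADER.pack(frame_id, len(payload), zlib.crc32(payload)) + payload

def unpack_frame(message: bytes) -> tuple:
    """Parse a message back into (frame_id, payload), verifying integrity."""
    frame_id, length, crc = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("corrupt or truncated frame")
    return frame_id, payload

# Round trip with a stand-in payload (a real system would send JPEG-compressed
# ultrasound or endoscopy frames here).
message = pack_frame(42, b"\xff\xd8 fake jpeg bytes \xff\xd9")
frame_id, payload = unpack_frame(message)
```

Latency and bandwidth constraints then dictate the compression scheme and frame rate; the checksum allows the display side to drop damaged frames rather than render them.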

Surgical navigation
Surgical navigation systems (SNS) have been shown to make procedures more accurate, less invasive and faster, resulting in improved outcomes for the patient [136]. Compared to conventional image guidance using intra-operative X-ray or CT, SNS do not burden operators and patients with additional radiation exposure, and compared to ultrasound-based guidance, they are more accurate and work for every tissue type. Conventional SNS rely on visualizing navigation information on separate monitors, which leads to a switching-focus problem for surgeons: they have to divide their attention between the surgical site and the navigation information. Such a division leads to increased workload, disorientation and deteriorated hand-eye coordination [84], which AR could alleviate by fusing navigation information with the operating site. While IGI systems, as described above, can already provide basic image-based guidance, precise surgical navigation requires real-time tracking of medical instruments and tools in relation to the patient anatomy, in addition to image-to-patient registration. In AR, navigation information can then be displayed in situ, fused with the target anatomy, as shown in Figure 6 (e).
Surgical navigation with the HoloLens has been explored as an alternative to commercial SNS in 27 publications. Mostly, AR navigation was studied in procedures where conventional SNS are already the gold standard, such as neurosurgery [29,112,206,207], orthopedic (in particular, spinal) surgery [50,41,119,107,147,194], general surgery [135] or cranio-maxillofacial surgery [63,198,70]. AR navigation can also provide an X-ray-free alternative for interventions typically guided by intra-operative imaging, such as endovascular procedures [109,64,121] or tissue ablations [114,117], or can be integrated into robotic surgery [167,164,168].
These procedures have the highest demands in accuracy and reliability of registration and tracking, with reference precision in the millimeter or sub-millimeter range. With the HoloLens hardware, it is difficult to meet these requirements. However, the HoloLens has a much slimmer form factor than conventional image guidance systems, which makes navigation attractive for less critical procedures that are usually performed without it. Examples of such procedures include brain stimulation treatment [115] and ultrasound examinations [178,150].
Instrument tracking methods with the HoloLens will be reviewed in more detail in section 7.

Interventional and surgical training
Simulation-based skill training has made its way into standard medical education, replacing or enhancing traditional teaching and training methods [61]. Aside from traditional simulators based on physical manikins, mixed reality technology has gained considerable popularity in this domain, either by enabling fully virtual environments, or by enhancing manikin-based training through virtual guidance and feedback [191]. 15 reviewed studies fall into this category.
The HoloLens 1 has been integrated into hybrid simulators, where it can be used to display additional guidance or even direct feedback to the user. Examples include the training of orthopedic surgery [32,37], emergency medicine interventions [103,13,86,162], laparoscopic or ultrasound examinations [129,171,85] and urological procedures [146,187]. Other possibilities are to build fully simulated, virtual training scenarios [32,24] or to include remote experts in the training sessions [214].

Anatomy learning
A meta-survey by Yammine et al. [220] has shown that 3D visualization techniques are preferable over traditional methods for learning and teaching anatomy, both in terms of factual and spatial knowledge. Contrary to such visualizations on conventional monitors or in virtual reality (VR), AR can not only provide 3D visuals, but also annotate real, physical models or cadavers with digital information.
Several studies evaluated the use of the HoloLens to teach gross anatomy via 3D visualizations of anatomical models [196,130,8,72,179,145]. Robinson et al. [173] further tested the HoloLens as a learning platform for studying microscopic anatomy.

Patient-focused applications of the HoloLens
19 publications describe HoloLens-based systems for assisting patients during rehabilitation and treatment. Designing AR applications for patients is challenging due to age demographics, varying affinity to novel technologies and general anxiety when it comes to medical treatments. At the same time, the novelty of AR technology provides opportunities, since it can make otherwise repetitive or dull activities significantly more engaging. We deduce three main application areas in this domain: a) patient training and education, b) assistance and monitoring, and c) assessment and diagnosis. An overview of all studies, grouped by their application, is given in Table 3.

Patient training and education
It has been shown that immersive experiences can improve patient engagement and satisfaction during training tasks in rehabilitation [204] and pre-interventional patient education [153]. AR environments therefore have the advantage of being potentially more engaging for patients than conventional methods. At the same time, AR scenarios are safe and easy to control.
A series of studies has investigated the usage of the HoloLens to create virtual training environments for people with cognitive disorders, such as Alzheimer's disease [10,66,9]. Another training task which has benefited from AR support through the HoloLens is the control of functional prostheses [188,152]. In the context of patient education, the HoloLens has been used to provide a more comprehensible and vivid explanation to patients before surgery [212,87,176].

Assistance and monitoring
AR, with its ability to enhance the reality around users in real-time without isolating them from it, could be ideal for compensating various impairments and overcoming difficulties in the daily lives of patients. Mobile health (mHealth) applications support such scenarios through mobile devices, such as smartphones, smartwatches, or, in this case, the HoloLens, and are consequently well suited for use outside of a clinical environment, e.g., in the homes of patients.
The HoloLens has been explored for aiding patients with vision impairments in navigating their surroundings [219,7]. Other applications include assisting patients with cognitive disorders in everyday activities [174,96], helping outpatients to adhere to their care plans [92,18,20] and text editing for people with motor disabilities [80].
As mHealth applications are becoming more and more pervasive in our everyday lives, integrating them into augmented environments is a logical step, and the above-mentioned studies suggest promising applications of head-worn AR devices in mHealth. However, it should be noted that the HoloLens is not yet suitable for operation during everyday activities, as it is relatively expensive, and its short battery life and bulky form factor make it unfit for being worn and used for an extended period of time.

Assessment and diagnosis
The variety of built-in sensors, along with the device's self-tracking capabilities, opens up the possibility of utilizing the HoloLens as a measurement device during patient assessment and diagnosis. At the same time, instructions and demonstrations guiding patients through these tests can be displayed immersively and interactively.
Sun et al. [199] used the HoloLens for leading and tracking patient performance during functional mobility tests, by evaluating the inertial measurement unit (IMU) data recorded by the device. Geerse [67] and Koop [104] utilize motion data collected by the HoloLens to assess gait parameters (e.g., walking speed, step length, cadence) in patients with movement disorders, in particular Parkinson's disease.
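Gait parameters of this kind can be derived directly from the head pose stream that the HoloLens exposes through its self-tracking. The following minimal sketch (plain NumPy, with a synthetic position trace standing in for real device data) estimates walking speed, cadence and mean step length by counting vertical head oscillations; all function and variable names are illustrative, not taken from any reviewed system.

```python
import numpy as np

def gait_parameters(t, pos):
    """Estimate walking speed (m/s), cadence (steps/min) and mean step
    length (m) from a head position trace.

    t   : (N,) timestamps in seconds
    pos : (N, 3) positions in meters, with y (index 1) pointing up

    Steps are counted as strict local maxima of the vertical head
    oscillation, which rises roughly once per step during walking.
    """
    duration = t[-1] - t[0]
    # horizontal path length from the x/z components only
    distance = np.sum(np.linalg.norm(np.diff(pos[:, [0, 2]], axis=0), axis=1))
    speed = distance / duration
    y = pos[:, 1] - pos[:, 1].mean()
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:]) & (y[1:-1] > 0)
    steps = int(np.count_nonzero(is_peak))
    cadence = steps / duration * 60.0
    step_length = distance / steps if steps else 0.0
    return speed, cadence, step_length

# Synthetic 10 s trace: 1.2 m/s forward walk with 1.9 steps/s head bobbing.
t = np.linspace(0.0, 10.0, 1001)
pos = np.stack([np.zeros_like(t),                          # x
                1.7 + 0.02 * np.sin(2 * np.pi * 1.9 * t),  # y (head height)
                1.2 * t],                                  # z (walking direction)
               axis=1)
speed, cadence, step_length = gait_parameters(t, pos)
```

Real systems such as those of Geerse [67] and Koop [104] naturally apply more robust step detection and filtering; the sketch only illustrates why the self-tracked head pose alone suffices for these metrics.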
HoloLens-supported assessment and diagnosis is presumably closest to real clinical applicability among the patient-oriented applications, as all reviewed studies have shown the reliability of the measurements derived from the HoloLens sensors. At the same time, the ability to simultaneously monitor clinical parameters while providing instructions to the patient with a single device has obvious ergonomic and economic benefits. Furthermore, using the HoloLens during the confined timespan of such screenings is feasible without undue discomfort for the patient.

Registration and object tracking with the HoloLens
As already mentioned, the registration of virtual content to the physical scene is one of the fundamental concepts of medical AR. Tracking for AR, in contrast, mostly focuses on the self-localization of the AR device. Since the HoloLens already provides SLAM, applications in surgical navigation or advanced medical simulators only need to track additional non-stationary objects (i.e., medical tools and instruments) with high precision. SLAM is not suitable for this type of dynamic tracking; therefore, other methods need to be implemented. Registration and tracking are usually closely related, as the same paradigms and methods can be applied to both tasks. In our analysis, we found 99 studies which establish a registration between virtual and real content; they are listed in Table 4. 31 studies further integrate methods for object tracking into their AR systems; they are shown in Table 5. Figure 9 visualizes the frequencies of the identified paradigms and methods.

Manual registration
Due to the self-tracking capabilities of the HoloLens, registration between real and virtual content can be achieved simply by manually aligning the position, orientation and scale of virtual items to match their physical counterparts. Since registration is performed from the perspective of the user, factors hindering accurate perception, such as poor display calibration, may be mitigated. 32 studies in this review adopt such a manual registration technique, mostly by transforming objects via on-board input methods (hand gestures and voice commands) or additional input devices, e.g., gamepads [25,133] and keyboards [149].
Obviously, manual alignment of virtual content can be time-consuming and cumbersome, which limits applicability in clinical settings, where time and personnel are usually scarce. Landmark-based methods can make manual alignment faster and less tedious. They involve the manual annotation of pre-defined anatomical landmarks in the spatial map of the real environment using gestures; these landmarks are then matched with their counterparts in pre-interventional imaging [138,149,148]. However, due to the coarseness of the spatial map and the lack of haptic feedback when selecting landmarks, these approaches may not be reliable or accurate. All manual registration methods have the disadvantage of being static: if the patient moves, the registration has to be manually adapted accordingly.

Inside-out methods
The built-in sensors of the HoloLens offer several possibilities for inside-out registration and tracking. The advantages of inside-out approaches in medical scenarios are evident: they work in unprepared and unrestricted environments and do not rely on expensive, specialized hardware, thus avoiding extra costs and further cluttering of already densely occupied spaces, such as operating rooms. However, it is still difficult to meet the high accuracy and robustness demands of medical procedures with inside-out approaches [189,75]. With 52 occurrences, they are the most frequent in our reviewed studies.
Marker-based. Marker-based inside-out registration is the most common registration technique identified in this review, employed by 42 studies. Freely available AR libraries, such as Vuforia (PTC Inc., Boston, USA) or the ArUco library [65], facilitate optimized, close-to-real-time detection and tracking of image fiducials via the HoloLens' front-facing RGB camera, which makes marker-based inside-out strategies easy to implement. The most straightforward method for registration, also employed by commercial surgical navigation systems (SNS), is to anchor markers directly to rigid tissue of the patient, e.g., bones. To precisely relate the coordinate frame of the marker to the target anatomy, it is common practice to perform a pre-interventional scan that includes the marker. However, attaching markers to patients is invasive, and the additional imaging scan may increase the radiation exposure of the patient. Additive manufacturing offers an interesting alternative, allowing the creation of patient-specific bone guides or occlusal splints for holding the markers [144,63,105].
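As a sketch of the geometry behind this approach: once the marker is detected in the HoloLens world frame and its offset to the anatomy is known from the pre-interventional scan, placing the overlay amounts to composing two rigid transforms. The snippet below (plain NumPy; all frame names and numeric values are illustrative, not from any reviewed system) shows this chain.

```python
import numpy as np

def make_transform(R, t):
    """Assemble a 4x4 homogeneous rigid transform from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(deg):
    """Rotation about the z axis by `deg` degrees."""
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Pose of the fiducial marker in the HoloLens world frame, as delivered
# by inside-out marker detection (illustrative values, meters).
T_world_marker = make_transform(rot_z(30.0), np.array([0.1, 0.0, 0.5]))

# Pose of the target anatomy relative to the marker, known from the
# pre-interventional scan that includes the marker.
T_marker_anatomy = make_transform(np.eye(3), np.array([0.0, 0.05, 0.0]))

# Chaining both transforms yields the anatomy pose in the world frame,
# which is where the virtual overlay has to be rendered.
T_world_anatomy = T_world_marker @ T_marker_anatomy
anatomy_origin = T_world_anatomy[:3, 3]
```

If the patient (and thus the marker) moves, only `T_world_marker` changes, which is why a rigidly attached marker keeps such a registration dynamic.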
In laboratory settings, 3D printing is also commonly used to create custom, marker-embedded phantoms for testing the registration method. Andress et al. [6] even developed a multi-modal marker, allowing intra-interventional marker-based registration.
Alternatively, landmark-based approaches can be used, in which distinct anatomical landmarks are digitized in the coordinate frame of the HoloLens and matched to their virtual counterparts using point-based registration. A marker-tracked pointing device is used for landmark selection in these studies [205,119,107,147,215]. To adapt to movements of the patient, a rigidly attached marker is necessary, or the entire procedure has to be repeated.
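The point-based registration step in such landmark approaches typically solves an orthogonal Procrustes problem. A minimal NumPy sketch of the closed-form SVD (Kabsch) solution, with synthetic landmarks standing in for digitized ones, could look as follows; none of this code stems from the reviewed systems.

```python
import numpy as np

def point_based_registration(src, dst):
    """Least-squares rigid transform (R, t) mapping src landmarks onto dst.

    src, dst: (N, 3) arrays of corresponding landmark positions.
    SVD-based (Kabsch) solution without scaling.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                       # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflections
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Example: landmarks in pre-interventional imaging (src) vs. the same
# landmarks digitized in the HoloLens frame (dst), related by a known transform.
rng = np.random.default_rng(0)
src = rng.uniform(-0.1, 0.1, size=(4, 3))
angle = np.radians(20.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.3])
dst = src @ R_true.T + t_true

R_est, t_est = point_based_registration(src, dst)
```

With exact correspondences, the recovered transform matches the ground truth; in practice, digitization noise at each landmark propagates into the registration, which is one source of the accuracy issues discussed above.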
It is straightforward to extend marker-based inside-out methods to medical instrument tracking by simply attaching markers to the tracked objects as well. 14 reviewed studies apply such a combined strategy. Liu et al. [122] and Condino et al. [37] combine marker-based, inside-out patient registration with outside-in tracking, using stereo cameras and electromagnetic sensors, respectively.
A drawback of using fiducial markers is a general lack of robustness and accuracy. It has been shown that the tracking error using common libraries can range from several millimeters to even centimeters [21,28] and is highly dependent on viewing angle, distance, lighting conditions and movement patterns [115,97,128,224]. These issues make planar image targets less than ideal for highly precise, six-degrees-of-freedom (6DoF) applications, as required in most medical scenarios. In proof-of-concept studies, Kunz et al. [112] and Van Gestel et al. [206] have explored the possibility of tracking spherical, IR-reflective markers inside-out using the IR sensor of the HoloLens, which appears to be a promising direction.

Marker-less. Ten studies explore the possibility of using the various on-board sensors of the HoloLens for inside-out, marker-less registration. An early work by Xie et al. [218] explored surface-based registration of a patient's skin surface with the spatial map created by the HoloLens SLAM. However, the spatial map accessible to developers is very coarse, so the natural features extractable from it are insufficiently accurate. Hajek et al. [82] also exploit the HoloLens SLAM by using two devices in a master-worker configuration, while Liu et al. [121] use image-based matching to align intra-operative X-ray images with the patient anatomy.
Landmark-based registration approaches have been employed as well. For example, Pepe et al. [156,155] use automatically detected facial landmarks for registration.
From mid-2018 on, the Research Mode allowed access to the HoloLens' built-in sensors beyond the RGB camera, opening new possibilities for inside-out registration. Sylos-Labini et al. [201] also used automatically detected facial landmarks, but showed that combining them with ToF depth data slightly improves accuracy. Gsaxner et al. [77,78] subsequently introduced a pipeline for fully automatic registration via point cloud matching, using 3D features from ToF depth alone. This method was later also employed by Gu et al. [79], who compared surface-based registration with marker-based and outside-in methods.
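Point-cloud-based registration of this kind is commonly built on some variant of the iterative closest point (ICP) algorithm. The sketch below shows a deliberately minimal ICP with brute-force nearest-neighbour matching on synthetic clouds; real pipelines such as those cited above add feature-based initialization and robust outlier rejection, which are omitted here.

```python
import numpy as np

def best_rigid(src, dst):
    """Closed-form (SVD) rigid transform mapping corresponding src -> dst."""
    sc, dc = src - src.mean(axis=0), dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(sc.T @ dc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, dst.mean(axis=0) - R @ src.mean(axis=0)

def icp(src, dst, iterations=30):
    """Minimal ICP: alternate nearest-neighbour matching and rigid updates."""
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iterations):
        moved = src @ R.T + t
        # brute-force nearest neighbour in dst for each moved src point
        dists = np.linalg.norm(moved[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[np.argmin(dists, axis=1)]
        R_step, t_step = best_rigid(moved, matched)
        R, t = R_step @ R, R_step @ t + t_step   # compose the incremental update
    return R, t

# Synthetic example: a surface point cloud and a slightly rotated/shifted copy,
# standing in for a pre-interventional surface and a ToF depth scan.
rng = np.random.default_rng(1)
cloud = rng.uniform(-0.1, 0.1, size=(60, 3))
a = np.radians(3.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.005, -0.003, 0.008])
observed = cloud @ R_true.T + t_true

R_icp, t_icp = icp(cloud, observed)
```

ICP only converges from a reasonable initial alignment, which is why the cited pipelines first compute coarse correspondences from 3D features before refining.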

Outside-in methods
25 reviewed publications use an outside-in paradigm for registration and tracking. Outside-in approaches rely on external infrastructure, which makes it possible to exploit highly precise, specialized hardware, such as commercial SNS. The high reference accuracy of such systems (usually ≤ 1 mm and ≤ 1°) makes their integration into an AR environment promising. However, integrating external systems requires the calibration of coordinate frames between the HoloLens and the navigation device. This procedure usually involves manual and/or semi-automatic steps, which can be cumbersome and disruptive to the clinical workflow, as well as error-prone and highly subjective [42].
Marker-based. Most commercially available SNS track passively reflecting markers using stereoscopic IR cameras [106]. By attaching such markers to the patient, their localization relative to pre-interventional imaging can be determined. The HoloLens can be integrated into such a setup by affixing markers to the headset as well. Since SNS are designed to track not only patients but, in particular, medical instruments, object tracking can be integrated easily with such systems, and most reviewed studies in this category use this principle. Liu et al. [122] use stereo cameras and LED markers instead.
Such marker-based SNS have a high reference precision, often below one millimeter; however, in addition to potential complications resulting from system calibration, they require a constant line of sight between IR camera, patient and device, which may restrict movements.
Marker-less. Before the HoloLens Research Mode enabled access to the on-board ToF camera, some works integrated external depth sensors with the HoloLens to enable surface-based registration [115,217,35,213]. As an alternative to capturing the full surface of the patient, again, a subset of points in the form of anatomical landmarks can be used, for example digitized via external electromagnetic trackers [109,64,146]. In these scenarios, the electromagnetic sensors have been used for instrument and tool tracking as well. However, electromagnetic tracking is generally less popular than optical tracking, as it suffers from interference with metallic materials, which are commonly found in clinical spaces [106].

Data and visualization
Various data were visualized in augmented environments through the HoloLens. We distinguish data based on its source (medical or non-medical) and its type, according to dimensionality (2D, 3D, other), and discuss how this data is typically visualized. An overview of data source frequencies in the reviewed publications is given in Figure 10, and a list of all papers in each category is provided in Table 6. Note that most reviewed studies utilize more than one source and type of data; therefore, multiple mentions are possible.

Acquisition time
Regardless of source and type, data can further be distinguished based on its acquisition time: pre-interventional data is acquired offline, processed and uploaded to the HoloLens before the actual intervention. This allows more complex workflows, including manual manipulation of the data. With 206 instances overall, pre-interventional data makes up the majority of sources.
Intra-interventional data is collected at run-time and streamed to the device for visualization. Obviously, intra-interventional approaches are technically more complex, since they require a connection between the HoloLens and the raw data source, and necessary processing steps need to be performed automatically, in real-time. Overall, 58 intra-interventional data sources have been identified in this review.

Medical data
3D volumetric medical image data. For the majority (110) of reviewed papers, 3D medical images, acquired primarily through CT/CTA (89) and MRI (38), are a main source of data. They are represented as volumetric grids, where each voxel holds a value measured by the imaging device. For visualization, they have to be rendered for presentation on the HoloLens display.
Volumetric medical data is conventionally visualized in 2D on monitors in clinical practice, in the form of orthogonal slices through the image volume (mainly axial, sagittal and coronal planes or, sometimes, oblique reformats, so-called multi-planar reformations). Since physicians are accustomed to this type of visualization, slice rendering of volumetric data has also been employed in 29 reviewed medical HoloLens systems.
This technique has, of course, the drawback that data is only shown in selected planes. Given a stereoscopic AR display, true 3D visualization is becoming more widely used, mostly in the form of 3D surface renderings, which are computationally efficient and natively supported by all graphics engines compatible with the HoloLens. Furthermore, colors and opacities can easily be modified, enabling visualization techniques such as wire frames or outlines. However, for surface rendering, tissue has to be segmented and converted to polygons prior to visualization, leading to more time-intensive workflows and quantization inaccuracies. In contrast, direct volume rendering offers superior image quality [113,101] and does not require surface extraction before visualization. Instead, color and opacity are computed directly from the underlying voxel values using specialized transfer functions. Still, the performance requirements of volume rendering cannot easily be met with mobile hardware, such as the HoloLens. Consequently, only six reviewed studies attempt volume rendering [59,216,87,94,68,4].
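To illustrate the difference: where surface rendering draws pre-extracted polygons, direct volume rendering maps raw voxel values to color and opacity through a transfer function and composites them along each viewing ray. The fragment below is a schematic, single-ray NumPy version of this front-to-back compositing; the transfer function is a toy example, not one from the cited studies.

```python
import numpy as np

def transfer_function(v):
    """Toy 1D transfer function: map normalized voxel values in [0, 1]
    to RGB colors (blue-to-red ramp) and opacities."""
    color = np.stack([v, 0.2 * np.ones_like(v), 1.0 - v], axis=-1)
    alpha = np.clip((v - 0.3) / 0.7, 0.0, 1.0) * 0.4  # low values stay transparent
    return color, alpha

def composite_ray(samples):
    """Front-to-back alpha compositing of the voxel samples along one ray."""
    color, alpha = transfer_function(samples)
    rgb = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(color, alpha):
        rgb += transmittance * a * c   # contribution attenuated by what is in front
        transmittance *= 1.0 - a
        if transmittance < 1e-3:       # early ray termination
            break
    return rgb, 1.0 - transmittance    # accumulated color and opacity

ray_samples = np.linspace(0.0, 1.0, 32)  # voxel values along one viewing ray
rgb, opacity = composite_ray(ray_samples)
```

In an actual renderer this loop runs per pixel, typically in a shader, which is exactly the workload that is hard to fit onto mobile graphics hardware such as the HoloLens'.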
Since data acquisition and reconstruction of 3D volumetric data is relatively costly, only few applications with intra-operative acquisition times exist. Velazco-Garcia et al. [209] describe a framework for live interactions with MRI scanners. Qian et al. [166] stream 3D endoscopy data to the HoloLens in real-time, while Southworth et al. [193] display live 3D cardiac electrophysiology data with the HoloLens.
2D medical image data. 20 reviewed studies use 2D medical imaging as a data source. Common modalities include X-ray/fluoroscopy (7), ultrasound (7) and endoscopic video (3). Contrary to 3D imaging, 2D modalities usually have short acquisition times (close to or even meeting real-time requirements) and are comparably easy to deploy; they are therefore popular for intra-interventional guidance of procedures. 15 publications in this category support intra-interventional data acquisition during the runtime of the HoloLens.
Analogous to ordinary clinical practice, 2D imaging data in AR is often visualized on virtual (AR) monitors, which can be anchored to the head gaze of the HoloLens wearer. Another possibility is to position 2D images on 3D planes in the environment, which allows an in-situ visualization if a registration between imaging data and patient is available.
Other data from medical sources. In many situations, it is beneficial to integrate other medical data not stemming from medical imaging into the workflow. Medical planning data is a particularly common example, with 29 publications integrating planning data into their workflows. This data is usually created manually and pre-operatively by medical professionals on the basis of medical imaging. It can include access points, tool trajectories, cutting lines, resection margins and target positions of implants, amongst others. This type of data is usually translated into geometric primitives, which are displayed in relation to the target anatomy.
For intra-interventional data, the positional coordinates of medical tools (such as needles, wires, or screws) or other tracked objects (parts of the anatomy, imaging systems), obtained from outside-in or inside-out navigation systems, are the most common data source. Mostly, these objects are represented by geometric primitives or 3D models, which are transformed according to the positional information. However, a simple numerical representation is also used in some studies [119,121,107].
Other medical data sources, captured both pre- and intra-interventionally, include vital signs or other biosignals and patient records, which can be displayed on virtual monitors in AR.

Non-medical data
3D data. The inherent ability of the HoloLens to render stereoscopically makes all sorts of 3D meshes an obvious choice of data source for AR visualizations.
Eight studies use 3D scans of patients, captured with depth or stereo cameras, instead of volumetric medical imaging, mostly for the purpose of image-to-patient registration.
Contrary to medical 3D data, such scans only capture the surface of patients and do not inform about the underlying anatomy. In particular in educational scenarios (targeting both patients and students), the visualization of anatomical models created by medical artists is common, appearing in 11 studies. Both of these data sources have exclusively been deployed pre-interventionally to the HoloLens.
2D data. Eleven studies visualize non-medical two-dimensional data in the form of pre-recorded or live-streamed videos or documents. As with 2D medical data, it is usually displayed on virtual monitors anchored to the display or environment.
Other data. Six works have explored the possibility of integrating other data, in most cases coming from the HoloLens itself, into their applications. Three publications track the user wearing the HoloLens, to measure gait parameters [67,104] or guide the user [219]. Two publications utilize the gaze data from the HoloLens [199,85]. Only Sharma et al. [188] use an external data source, namely IMU data, for training limb prosthesis control.

Evaluation of medical HoloLens applications
In general, an objective evaluation of AR applications is challenging, because each user has a different perception of augmented content, depending on individual anatomy (interpupillary distance, eyesight), familiarity with the technology, familiarity with 3D visualizations in general [177] and external influences, such as comfort while wearing the AR device. In comparison to other areas of computer science, no benchmarks, datasets or standard protocols exist to evaluate the experience and usefulness of AR. Clinically, comparative trials measuring and comparing parameters of treatment outcome, such as treatment time, number and severity of complications or survival rate, are considered the gold standard. However, each AR application requires approval by a relevant agency or committee before it can be tested on cadavers, healthy human subjects or even patients. Depending on the executing research institution and national regulations, obtaining such an approval, and the quantitative data that comes with it, can be very difficult for researchers. Therefore, in our reviewed studies, a large variety of evaluation metrics have been collected in distinct experimental scenarios, which are summarized in Table 7.

Evaluation scenario
We first analyze the reviewed publications with regard to the evaluation scenario. Inspired by the Technology Readiness Level [131], we group the studies according to their evaluation settings, as shown in Figure 11: proof-of-concept studies focus on reporting a medical problem and how AR could overcome it, and describe prototype workflows and applications. Sometimes, anecdotal or informal feedback from users or general observations are reported, but, in general, these studies do not follow a rigorous experimental protocol and do not collect quantitative or qualitative measurements. Therefore, it is difficult to draw general conclusions from them. With 35 papers, proof-of-concept studies are in the minority.
Laboratory studies typically focus on the technical aspects of their applications and report quantitative measurements. We identify 47 records in this category. The study can be carried out using only hardware (e.g., the HoloLens), or on cadavers, animals or humans (healthy subjects or patients). Most commonly, however, phantoms are used to collect measurements. Specialized medical phantoms, which include realistic anatomical structures and tissue characteristics, are commercially available; however, they are very expensive. Consequently, many researchers resort to additive manufacturing (i.e., 3D printing) to replicate the target anatomy or build more abstract phantoms.
Studies performed in a relevant environment evaluate AR systems directly in the environment in which they are intended to be used. Such an approach involves the usage of the system by one or more individuals of the intended target group: clinicians, patients or medical students. Most of the time, qualitative feedback in the form of questionnaires is collected from them, although quantitative measurements, for example of task performance, might also be taken. As seen in Figure 11, most of the reviewed studies (89) fall into this category, which indicates advanced research maturity. We further distinguish between non-comparative studies, where results acquired through AR are not compared to another method (for example, case series or uncontrolled cohort studies), and comparative studies, which provide comparisons to non-AR conditions. The latter are most conclusive about the possible advantages and implications of the HoloLens in their domain.

Quantitative metrics
Quantitative metrics are often focused on technical aspects of the AR system. Therefore, acquiring them does not require a large number of test subjects and can instead be done by individuals. However, they can also characterize the performance of individuals in carrying out certain tasks. In this case, quantitative measures are usually closely related to the application scenario.
Technical performance metrics. Several works, in particular in the area of data display, measure performance metrics of the HoloLens itself, such as hardware utilization, frame rate, power usage, execution time and latency. These studies conclude that the HoloLens is suitable for displaying pre- and intra-interventional medical data given an appropriate software framework, also within safety-critical environments, such as the OR. A commonly reported limiting factor is battery life, which restricts device usage to around two hours, too short for many medical interventions.
The HoloLens was also compared to other OST-HMD devices for medical usage. Qian et al. [167] evaluated the HoloLens, Epson Moverio BT-200 and ODG R-7 for displaying object-anchored 2D medical data, and concluded that the HoloLens is the best choice in terms of contrast, frame rate and perceived task load. Moosburner et al. [141] compared the HoloLens to the Meta 2 (Meta Company, San Mateo, California, USA) and found that, although the HoloLens was criticized for its comparably small FoV and for being more complicated and difficult to operate, medical students preferred it over the competitor, as it does not rely on a wired connection to a powerful external computer and presented virtual models more stably.
Accuracy metrics. For registration and tracking, accuracy metrics, measuring the spatial distance between the virtual and real position of an object, are usually acquired.
While many different measures can be computed, the target registration error (TRE) is one of the most commonly and consistently used metrics for evaluating registration accuracy, and has been employed by 22 reviewed studies.
TRE measures the Euclidean distance between 3D target points in the physical world and their virtual counterparts. Studies evaluating the TRE report averages from just above 1 mm up to 40 mm. For example, for a registration using outside-in tracking, El Hariri et al. [50] report a TRE of 36.9 mm, while Kuhlemann et al. [109], Li et al. [117] and Sun et al. [198] report much lower values of 4.3 mm, 2.2 mm and 1.3 mm, respectively. For manual registration, the reported error spectrum is also large, ranging between 20 mm [25] and 3 mm [203]. Registration using image fiducials seems to be the most reliable in terms of TRE, with values in the 2 mm region [37,144], but several studies show that the achievable accuracy with image fiducials is highly dependent on illumination, viewing angle and movement [97,128,224]. Whether the reported registration accuracies are acceptable is, of course, contingent upon the clinical scenario. However, most studies express the need to reduce the registration error before clinical use. While TRE provides some comparability between registration methods, measuring it involves the selection or digitization of matching landmark points, which is itself a subjective, error-prone procedure, encumbered by a lack of haptic feedback, fine-grained input possibilities and depth perception. These problems explain the large variability reported for this metric.
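In its basic form, the TRE is simply a set of point-to-point distances. A short NumPy sketch of how mean and maximum TRE are computed from matched physical and virtual target points (the coordinates are made up for illustration):

```python
import numpy as np

def target_registration_error(real_pts, virtual_pts):
    """Per-point Euclidean distances between matched physical and virtual
    target points; returns (mean TRE, max TRE) in the input units (mm)."""
    d = np.linalg.norm(real_pts - virtual_pts, axis=1)
    return d.mean(), d.max()

# Illustrative target points (mm): physical positions and where the AR
# system displays their virtual counterparts after registration.
real = np.array([[0.0, 0.0, 0.0],
                 [10.0, 0.0, 0.0],
                 [0.0, 10.0, 0.0]])
virtual = real + np.array([[1.0, 0.0, 0.0],
                           [0.0, 2.0, 0.0],
                           [0.0, 0.0, 2.0]])
mean_tre, max_tre = target_registration_error(real, virtual)
# mean_tre ≈ 1.67 mm, max_tre = 2.0 mm
```

Note that the formula itself is trivial; the difficulty discussed above lies entirely in obtaining the matched point pairs without operator bias.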
A variant of the TRE is the fiducial registration error (FRE), which uses the fiducials employed in landmark-based registration as target points for the error computation. Three publications report FRE. For example, Van Doormaal et al. [205] compared the FRE achievable with marker-based inside-out registration using landmarks to that of a conventional SNS registration. They found that the AR system is less accurate and not yet suitable for clinical application. However, it has been shown that the FRE does not correlate with the TRE and thus does not inform much about the actual registration accuracy [56].
Another common measure, evaluated in 18 studies, is the target visualization error (TVE), which measures the reprojection error between physical and virtual objects as perceived by the user, e.g., using a ruler or a millimeter grid, or by marking the virtual projection directly on the real counterpart. Most studies report TVE values in the millimeter range. Three studies compare the TVE achieved using the HoloLens with a non-AR baseline: Ivan et al. [94] found no significant difference to a commercial SNS in terms of TVE. Qi et al. [163] state that AR could reach the reference precision in 80% of cases, while Incekara et al. [91] determined that the reference could be met in only 38% of cases, and that the mean deviation of 4 mm between HoloLens and SNS is too large for clinical applicability. Still, the manual measurement of TVE is, again, subject to operator bias.
The registration error, measured in three studies, quantifies the deviation between the source-to-target transformation computed by the employed algorithm and a reference transformation obtained from a reference tracking system. Analogously, the tracking error compares the pose of a tracked object to a ground truth, ideally in six degrees of freedom. Since the reference system and the HoloLens need to be calibrated to each other, such experiments are complicated to set up. Therefore, many studies report a simplified tracking error, e.g., in 2D [121] or positional only [112,122].
In clinical interventions where pre-interventional planning data is available, the target deviation error (TDE), which measures the Euclidean distance between a pre-operatively planned target point and the actual point reached after the intervention, can be determined. A typical scenario is the insertion of objects, such as needles, wires or screws, into a phantom, cadaver or patient under AR guidance. After insertion, post-operative imaging is acquired and compared to the plan. This type of clinically specific evaluation is the most objective and informative about how an AR system can support the intervention in question. We identified 24 publications evaluating TDE. Several studies perform such a clinically specific evaluation by comparing the outcome of an AR-supported procedure to a non-AR control condition; however, results are inconclusive. Several studies [2,6,120,147,125] compare needle/wire placements under AR guidance with a conventional, fluoroscopy-guided procedure. They found that placements in AR were slightly less accurate than in the reference condition, although AR guidance led to faster task completion. Andress et al. [6] and Long et al. [125] further point out that, with AR, less radiation was required during image-guided procedures. Li et al. [117], Ruger et al. [178] and Glas et al. [70] report favorable needle insertion accuracies in AR-guided procedures versus conventional image-guided procedures.
Compared to freehand, non-guided procedures, AR could improve both accuracy and the number of successful task completions in placement tasks [52,45,207].
Task-specific scores. Studies using the HoloLens to support specific medical tasks usually report some quantification of task completion. The task completion time (TCT) is measured most commonly, namely in 32 reviewed studies. Most comparative studies report that AR guidance helped users carry out tasks faster [62,2,3,6,120,147,125,85,70,188,52,168,200], while others did not find significant differences [214,43]. Only Qi et al. [163] and Rohrbach et al. [174] report longer TCT for the AR condition; however, the latter application is targeted at Alzheimer's patients, who may have more difficulty adapting to novel technology, such as AR and the HoloLens.
The number of successful task completions (NSC) is measured in 12 studies. Most studies report favorable outcomes of HoloLens usage in terms of NSC [45,186,188,168]. Only Agten et al. [2] found that AR actually led to fewer successful outcomes, compared to a conventional image-guided procedure.
The effectiveness of AR for learning in an educational scenario can be quantitatively measured by comparing exam scores between AR-supported learners and a control group.
Seven studies perform such an evaluation. No statistically significant difference in knowledge gain was found between students receiving AR lectures through the HoloLens and students undergoing conventional anatomy courses based on cadavers [196,173,179]. Robinson et al. [173], however, highlight that students perceived the AR activity more favorably. Similar findings are described in comparison to other computerized learning methods by [8,72,145]: while student engagement, motivation and excitement are typically higher for HoloLens-based education, the outcomes in terms of learning effect are not significantly different.

Qualitative metrics
We define qualitative metrics as parameters and data which reflect the personal opinion of individuals and can, therefore, not be objectively and repeatably measured. Usually, they are collected from application users by means of questionnaires or interviews. Since AR experiences are highly individual, qualitative metrics can be considered equally important as, if not more important than, quantitative measures. After all, the theoretical benefits of medical AR are negligible if the system that delivers them is deemed cumbersome or fails to meet the user's needs.
Commonly, questionnaires use a Likert scale, where respondents express their level of agreement or disagreement with certain statements. Forty-two reviewed publications use such questionnaires for evaluating various system aspects. Examples include the general comfort, image quality and audio quality of the HoloLens and its suitability for medical applications [37,95,190,141,62,111,3,184,45], the effectiveness of certain types of visualization [23,212,87,68] or, most commonly, how well the proposed application can support a certain procedure.
Generally, the reported questionnaire outcomes are favorable towards the HoloLens and AR, and the common consensus is that AR can have a large impact in the medical domain. However, limitations of the device itself, such as the small field of view, short battery life and relative discomfort while wearing it, are frequently mentioned. For IGI or navigation applications, users also frequently noticed a lack of registration accuracy or a drift of virtual content due to instabilities in the HoloLens SLAM, which negatively influenced user ratings. In these scenarios, issues of depth perception, where users perceived internal anatomy to be on top of, not within, the patient, were also frequently mentioned.
Some reviewed studies employed standardized questionnaires, with the NASA Task Load Index (NASA-TLX), a tool to assess subjective workload, being the most commonly used one. A drawback of the NASA-TLX is that it is only fully descriptive in comparative studies, where it is measured for several conditions; the interpretation of a single final score in isolation is generally not well established. Unfortunately, only a few comparative studies measure the NASA-TLX: two of them report a reduced task load for AR-supported procedures [52,168], Rüger et al. [178] found no significant difference, and Saito et al. [180] found that the mental demand was higher for participants using AR.
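For readers unfamiliar with the instrument: a commonly used simplification, the "raw" NASA-TLX (RTLX), averages the six subscale ratings without the pairwise-comparison weighting of the full protocol. A minimal sketch (the rating values below are hypothetical, not from any reviewed study):

```python
def raw_tlx(ratings: dict) -> float:
    """Raw NASA-TLX (RTLX): unweighted mean of the six subscale
    ratings, each on a 0-100 scale. The full NASA-TLX additionally
    weights the subscales via 15 pairwise comparisons."""
    subscales = ("mental", "physical", "temporal",
                 "performance", "effort", "frustration")
    return sum(ratings[s] for s in subscales) / len(subscales)

# Hypothetical ratings for one participant in an AR condition
score = raw_tlx({"mental": 55, "physical": 20, "temporal": 40,
                 "performance": 30, "effort": 45, "frustration": 25})
```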
The System Usability Scale (SUS) [22] was applied in five studies as a measure of application usability. Compared to the NASA-TLX, the advantage of the SUS is that it is a fast way of classifying the ease of use of a system, even without a comparison. Generally, overall scores greater than 68 are considered above average; furthermore, an adjective rating scale has been proposed [14]. Only two reviewed studies compute the overall SUS, both reporting above-average usability with SUS values of 71.5 [5] and 74.8 [78], scoring a "Good" on the adjective rating scale. While these ratings are encouraging, they suggest that there is room for improvement.
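The standard SUS scoring scheme, which maps ten 5-point Likert items to a 0-100 score, can be sketched as follows (response values are illustrative only):

```python
def sus_score(responses: list) -> float:
    """System Usability Scale: 10 Likert items rated 1-5.
    Odd-numbered items (positively worded) contribute (rating - 1),
    even-numbered items (negatively worded) contribute (5 - rating);
    the summed contributions are scaled by 2.5 to yield 0-100."""
    assert len(responses) == 10, "SUS requires exactly 10 item responses"
    contributions = [(r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based
                     for i, r in enumerate(responses)]
    return sum(contributions) * 2.5

# All-neutral answers (3 everywhere) yield the midpoint score of 50
assert sus_score([3] * 10) == 50.0
```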

Conclusion and outlook
With 171 original, peer-reviewed works in the medical field, the HoloLens has certainly had a large impact on medical AR already. In this systematic review, we found that, while various medical specialties and applications have been investigated, and a fair number of systems have been studied clinically, only a few works have clinically demonstrated clear advantages of HoloLens-based systems over the current state of the art. The acceptance of new technologies, such as AR, in the medical field is an ongoing challenge for researchers, medical professionals and patients alike. In this review, we identify that increased efforts in the areas of precision, reliability, usability, workflow and perception are necessary to establish AR in clinical practice.
We found that applications targeted at physicians and healthcare professionals are, by far, the most common.
While the potential benefit of AR-supported image guidance and navigation is very high, those systems are also difficult to implement, mostly due to the high accuracy and reliability demands. The reviewed studies suggest that, for high-precision applications, registration and tracking errors achieved with the HoloLens are generally too high, regardless of the employed technical paradigm and method. However, for procedures carried out without image guidance, for which sub-millimeter precision is not necessary (e.g., ablations [52], ventriculostomy [118,207,186] or certain orthopedic interventions [44]), the HoloLens is already a very promising tool. In these scenarios, the slim form factor and low cost of the HoloLens, in comparison to traditional image guidance systems, could make navigation feasible for procedures which have not benefited from it before. For this purpose, however, automatic and accurate inside-out registration and tracking is paramount to streamline the setup workflow.
The second most common intended user group were students. In educational training scenarios, the HoloLens was shown to be an effective enhancement for medical simulators [13,86,146,187,200,85], in particular for providing visual feedback during training tasks. In anatomy learning, the effects of HoloLens-based learning compared to conventional learning using cadavers or other computerized methods seem to be small, although several studies report improved engagement and motivation of students, which could have positive effects in the long term. Anatomy learning studies in this review also usually feature relatively simple, conventional 3D models. More innovative visualizations, including interactive, dynamic content which cannot be easily delivered by regular computerized methods, have not yet been explored in depth.
The majority of studies in this review seek a registration between the real and virtual environment, and inside-out approaches, in particular using image fiducials, are the most common methods to achieve the required registration. This observation is unsurprising; after all, such approaches are relatively easy to implement. Our analysis shows that they deliver a reliable, acceptable accuracy in controlled settings. Their described disadvantages, however, such as line-of-sight constraints and susceptibility to different viewing positions, movement patterns and lighting conditions, likely impede clinical adoption. Spherical markers seem to be more robust, and encouraging results have been reported [112], more recently also for the HoloLens 2 [76].
Innovative, marker-less, inside-out strategies have been reported for registration, but are still hampered by technical limitations. For instrument tracking, research in the direction of marker-less, inside-out methods based on deep learning has only recently gained traction [46], but will surely have a large impact on the field.
When it comes to data and visualization, the majority of studies display pre-interventionally acquired 3D medical imaging data, primarily from CT or CTA, visualized through surface rendering. We expect this trend to continue: compared to volume rendering, surface renderings are easy to create and modify and efficient to render, and no clear advantage of volume rendering through the HoloLens has been shown so far. Perceptual issues, in particular depth perception, are a known problem in AR [123], and several works mention that incorrect depth perception negatively influenced the perceived accuracy of their application and impaired guidance through the HoloLens. Still, very few reports concern themselves with visualization strategies overcoming these limitations, and those that do use very simple methods (e.g., wireframes [54]). While many strategies exist to improve depth perception in medical AR [75], most of them are difficult to apply with OST displays, such as the HoloLens, where only additive visual information is possible and the view of reality cannot be altered. Novel, innovative strategies will be necessary to overcome this limitation in the future.
It is paramount that medical AR applications are validated with the intended user in the loop, and it is encouraging to see that the majority of studies in this review evaluate their applications in a relevant setting. Still, the large variety in experimental setups and acquired measures, together with the lack of standardized protocols, makes it very difficult to clinically validate these methods. We believe that this review can serve as a guideline for researchers, helping them pick appropriate experimental protocols and measures for their scenario. We think that it is time for medical AR to step out of the comfort zone of controlled laboratory settings and finally find its way into medical routine. To this end, close collaborations between researchers, universities, clinicians and patients, as well as comparative studies on a larger scale, are necessary.
The HoloLens 1 has likely reached the end of its life cycle in research, due to the release of its direct successor, but it has caused a major boost in medical AR research. With the availability of novel hardware, such as the HoloLens 2 or the Magic Leap 2, and the recently increased interest of other leading tech companies in AR technologies, we expect this trend to continue. Furthermore, specialized medical OST-HMD devices, e.g., xvision (Augmedics Inc., Arlington Heights, IL) or VOSTARS (University of Pisa, Pisa, IT), have the potential to address technical limitations of current commercial devices. Improved hardware can also facilitate the use of deep learning models on the HMD itself, opening up countless possibilities in terms of recognition, tracking and scene understanding. In conclusion, we think that, although the feasibility of using the HoloLens for various medical scenarios has been suggested, research in medical AR is still in its early stages, and abundant areas for future work remain.

Figure 1 :
Figure 1: Search strategy used in this systematic review. Adapted from the PRISMA flow diagram by Moher et al. [140].

Figure 2 :
Figure 2: Taxonomy employed in this review. Each publication is analyzed with regard to intended use case, registration and tracking principles, data sources and visualization, as well as evaluation and validation.

Figure 3
Figure 3 provides an overview of the number of papers published in each reviewed year, from 2016 to 2021. Although the HoloLens was available from March 2016 in North

Figure 3 :
Figure 3: Number of papers published per year in each use case category between the years 2016 and 2021.

Figure 5:
Physicians and healthcare professionals working within the clinical routine have been, by far, the most popular target audience of proposed HoloLens-based AR systems. 128 out of 171 studies, almost 75%, describe an application of the device for supporting healthcare professionals in tasks such as diagnosis, treatment planning and treatment execution. Medical students come second, with 24 works dedicated to anatomy learning or training of interventional procedures. Lastly, 19 studies targeted an application for patients, either for patient education, monitoring and guidance, or diagnosis.

Figure 4 :
Figure 4: Frequency of papers in each of the 21 identified medical fields. "Nonspecific" refers to applications where authors did not indicate a specific area, which means they could be used in several disciplines.

Figure 9 :
Figure 9: Frequency of registration and tracking methods employed by the reviewed studies. Most works rely on inside-out, marker-based tracking, followed by manual alignment.

Figure 10 :
Figure 10: Frequency of data sources used in the reviewed studies. 3D medical imaging data is, by far, the most common source of data for a visualization in AR.

Figure 11 :
Figure 11: Number of papers for each experimental setting (inner circle) and experimental level (outer circle).

Table 2 :
Studies reporting an application of the HoloLens for medical students and residents in an educational context.

Table 3 :
Studies reporting a patient-focused application of the HoloLens.