What is XR? Towards a Framework for Augmented and Virtual Reality

organize them in our proposed framework. As a result, we conclude that (1) XR should not be used to connote extended reality, but as a more open approach where the X implies the unknown variable: xReality; (2) AR and VR have fundamental differences and thus should be treated as different experiences; (3) AR experiences can be described on a continuum ranging from assisted reality to mixed reality (based on the level of local presence); and (4) VR experiences can be conceptualized on a telepresence continuum ranging from atomistic to holistic VR.


Introduction
Recent advances in information technology, such as high-speed mobile Internet, artificial intelligence, increased computing power, and high-resolution displays, create new ways for users to experience reality (Dwivedi et al., 2020; Hoyer et al., 2020). Important industry players have developed a plethora of devices, brands, and labels to position themselves in this market. For example, Microsoft is promoting its HoloLens as a "Mixed Reality" device (Rauschnabel, 2018). Meta Platforms, Inc. (formerly Facebook, Inc.) purchased Oculus, a "Virtual Reality" (VR) company (Hoffman et al., 2014), to complement its primary social media products as a "metaverse" company. PTC discusses "Assisted Reality" as a new reality format for warehousing companies (Coon, 2018). Apple touted "Augmented Reality" (AR) as a technology that will disrupt the world (Raymundo, 2016). Furthermore, Deloitte (2018) uses the term "Digital Reality" and Accenture embraces "Extended Reality" (Raghavan & Rao, 2018).
This ambiguity and confusion of terms and concepts is also notable in the academic literature. For instance, Wedel et al. (2020, p. 443) state that "mixed reality (MR) merges both VR and AR", indicating that "we refer to all these technologies [AR, VR, and MR] as VR and use the term AR only when the distinction is needed in a specific context." At the same time, Milgram and Kishino's (1994) influential real-virtual environment continuum conceptualizes mixed reality as an umbrella term, combining virtual and real elements. However, other scholars contest the Milgram and Kishino view by suggesting that mixed reality is a very specific type of reality, situated between AR and "augmented virtuality (AV)" (Farshid et al., 2018; Flavián et al., 2019). Hoyer et al. (2020) propose that mixed reality is an extension of AR and argue that "while AR is mainly available through smartphone apps, MR requires a headset or an equivalent wearable device" (p. 59). In addition, some authors point out that AR is fundamentally different from VR (Tan et al., 2021). The discussion is further complicated by Milgram et al.'s (1995) observation that "Perhaps surprisingly, we do in fact agree that AR and VR are related and that it is quite valid to consider the two concepts together" (p. 283). Finally, the meaning of the term or abbreviation "XR" remains ambiguous.
Importantly, the ongoing ambiguity regarding AR, AV, mixed reality, and related concepts may be detrimental to the user experience for a number of reasons. First, this ambiguity "holds back those eager to explore the different opportunities these new technologies represent" (Farshid et al., 2018, p. 658), which, in turn, constricts both consumer value realization and cash flow for producers. Second, ambiguity and user confusion impact managerially relevant outcome variables such as customer intention to use a product (Deng et al., 2010). By definition, customer perceptions that do not align with customer expectations will result in issues with satisfaction. Since satisfaction is linked to equity and other important managerial variables (Poushneh & Vasquez-Parraga, 2017; Szymanski & Henard, 2001), user experience is important for managers. In short, we concur with Flavián et al. (2019), who observe that the boundaries of AR, VR, and mixed reality have not been defined adequately, and we posit that the extant literature is ripe for a reorganization and reconceptualization of existing approaches to reality.
We address this gap by detailing a coherent framework to consolidate the existing and often contradictory perspectives currently found in both industry and academic literature. Specifically, this research has three objectives. First, we identify and organize extant terms, views, and definitions in the academic and practitioner-oriented literature. Second, we synthesize these concepts and terms into an ordering framework that is externally informed and validated through insights from focus groups and in-depth interviews with industry experts. Third, we delineate the core differences between new reality formats and guide future scholarly work by proposing avenues for future research in various disciplines.
Our work provides contributions to several streams of literature. Specifically, we advance the current literature on AR (e.g., Chylinski et al., 2020; Hilken et al., 2020; Rauschnabel et al., 2019) by conceptualizing local presence, defined as the degree to which a user perceives AR content as being actually here (Verhagen et al., 2014), as a key criterion for AR. Our proposed AR continuum ranges from assisted reality (with low levels of local presence) to mixed reality (with high levels of local presence). Furthermore, we advance the extant literature on VR (e.g., Cowan & Ketron, 2019; Hudson et al., 2019; Mütterlein, 2018) by conceiving telepresence, defined as the degree to which a user feels present in the virtual rather than the physical environment (Steuer, 1992), as the focal construct. Our VR continuum ranges from atomistic VR (with low levels of telepresence) to holistic VR (with high levels of telepresence). Importantly, our conceptual framework clearly separates AR from VR, as opposed to previous streams of research (Milgram et al., 1995; Milgram & Kishino, 1994) which proposed a fluent AR-VR continuum. In other words, we suggest that users can either be immersed in an AR environment or in a VR environment, but not both simultaneously. Finally, our work addresses the ongoing confusion regarding the term XR, which we do not define as "extended reality" but with X as a placeholder for any form of new reality.

Augmented reality
Augmenting the view of the world has a long history. In his novel "The Master Key", L. Frank Baum's (1901) protagonist receives the supernatural power of a "character marker", a special set of spectacles that superimpose a letter indicating an individual's underlying personality on their forehead. Before this, the concept of Pepper's ghost symbolized an "AR-like" illusion technique (although not digitally) in stage productions from the 1860s.
While the concept of AR dates back to the 1950s (Carmigniani et al., 2011), the phrase is generally considered to have been coined by Tom Caudell and David Mizell in 1990 (Berryman, 2012). AR has been defined in a number of ways, but it typically refers to a combination of digital information with the real world that is presented in real-time (Azuma, 1997; Feiner et al., 1993; Milgram et al., 1995). A Google Scholar search of AR literature reveals hundreds of articles on topics as diverse as the development of the underlying technology (Zhou et al., 2008), the impact on social interaction (Miller et al., 2019), and the use of this technology in differentiated application areas, including medical training (Berryman, 2012), tourism (Yin et al., 2021; tom Dieck & Han, 2019), manufacturing (Schein & Rauschnabel, 2021), marketing (Hilken et al., 2017, 2020; Heller et al., 2019a, 2019b; Tan et al., 2021; Hoffman et al., 2022; Scholz & Duffy, 2018; Rauschnabel et al., 2022), service management, and architecture (Lin & Hsu, 2017). Overall, technological advances and the ubiquity of mobile devices have made AR available to a substantial number of users (Billinghurst et al., 2015).
It is important to note that multiple AR classifications coexist in the extant literature, and many are either incompatible or contradictory. Fig. 1 presents a classification of AR characteristics based on the most prominent devices, enablers, and display types for visual AR (non-visual AR will be discussed later). As a general rule of thumb, newer, dedicated AR devices typically include more specialized hardware (e.g., depth sensors, eye tracking, see-through/retinal displays, etc.), which also allows new forms of human-computer interfaces (e.g., controllers, hand and finger tracking, voice commands, retinal control, and brain-computer interfaces). Moreover, newer AR devices typically provide a higher level of embodiment by moving the technology closer to the human body, whereas more established approaches leverage ubiquitous technologies characterized by a wide market penetration (e.g., smartphones or WebAR on a laptop computer).

Virtual reality
The idea of providing users with an immersive, artificially constructed reality predates the concept of AR. While the concepts of engaging in a technology-driven, fabricated reality go back to early science fiction and fantasy novels, the first approaches to "VR" were in fact panoramic paintings that sought to fill an individual's field of view and make the viewer feel as if he/she were actually embedded in the scene (Bown et al., 2017). Whereas panoramic paintings utilize foreshortening to create a feeling of presence in a scene, stereoscopic photo viewers more effectively used this concept to create a realistic perception of "being there". The first widespread use of technology that resembled contemporary VR was the Link Trainer application used to train pilots before and during World War II (Jeon, 2015). When we think of VR today, most of us picture a head-mounted system that occludes information from the environment while presenting information depicting a virtual environment to the user. These "head-mounted displays" (HMDs) were initially designed for gaming and entertainment, but usage has gradually broadened to include areas like job training, prototyping, marketing, and tourism (Shahab et al., 2021). Researchers have also explored the usage of VR in several commercial applications such as retail outlets and supermarkets (Krasonikolakis et al., 2018), the fashion industry (Yaoyuneyong et al., 2018), manufacturing (Berg & Vance, 2017), tourism (Lee et al., 2020; Wei et al., 2019), healthcare (Fertleman et al., 2018), and as a research tool (Holländer et al., 2019; Stadler et al., 2019).
With few exceptions (e.g., VR caves; compare Lu & Smith, 2009), VR has traditionally been limited to headset-based applications. The main distinction between different VR devices is the number of degrees-of-freedom (DoF), i.e., the number of parameters in a system that can vary independently of each other. For example, 3 DoF only supports rotational tracking, whereas 6 DoF supports both rotational and translational tracking (Pan & Hamilton, 2018).
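The difference between 3 DoF and 6 DoF tracking can be illustrated with a minimal sketch. The `Pose` class, its field names, and the `tracked_pose` function are hypothetical and chosen for illustration only; they do not correspond to any particular device or SDK. The sketch assumes that an untracked parameter simply stays at its default value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical head pose: three rotational and three translational parameters."""
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


def tracked_pose(actual: Pose, dof: int) -> Pose:
    """Return the part of the user's actual pose that the device can track."""
    if dof == 6:
        # Rotational and translational tracking: the full pose is available.
        return actual
    if dof == 3:
        # Rotational tracking only: translation stays at its default (0.0).
        return Pose(yaw=actual.yaw, pitch=actual.pitch, roll=actual.roll)
    raise ValueError("this sketch only models 3 DoF and 6 DoF devices")


# A user turns the head and takes half a step forward (assumed units).
movement = Pose(yaw=30.0, pitch=-5.0, roll=0.0, x=0.0, y=0.0, z=0.5)
print(tracked_pose(movement, dof=3))  # the step forward (z) is not reported
print(tracked_pose(movement, dof=6))  # the step forward is preserved
```

In this sketch, a 3 DoF device lets the user look around from a fixed point, whereas a 6 DoF device also reflects the user's movement through space.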

Four views of new realities
When screening the literature for existing definitions and frameworks, we searched for publications that introduce new reality formats to the literature in various disciplines, such as marketing, tourism, human-computer interaction (HCI), management information systems (MIS), and computer science. As a first step, we conducted a literature search in common academic databases, including Google Scholar, Web of Science, IEEE Xplore, and the ACM Digital Library. By iteratively comparing similarities and differences expressed in extant classifications of these concepts, we grouped existing definitions and perspectives into four prototypical views. Once we determined that the four views adequately encapsulated the perspectives of the existing literature, we requested feedback from our informants during the interviews and found general support for our classification. Fig. 2 outlines these views, which are discussed in detail below.

The "MR-dominant view"
In what is probably the most cited definition in this area, Milgram et al. (1995) presented the Reality-Virtuality Continuum as a way to understand the relationship between real and virtual elements of the user experience. To the left side of the continuum lies the real environment without the addition of virtual objects. To the right of the continuum lies the virtual environment, which refers to a fully virtual user experience without the inclusion of elements from the real world. Milgram et al. (1995) define any user experience combining real and virtual objects as mixed reality, with mixed reality's two sub forms being AR and AV.
The "MR-dominant view" is compelling and well-known due to its simplicity and flexibility (Skarbez et al., 2021). However, several issues are notable with this perspective. First, the MR-dominant view states that other realities (AR, AV) are a subclass of mixed reality (Milgram & Kishino, 1994) and that "AR and VR are related and that it is quite valid to consider the two concepts together" (Milgram et al., 1995, p. 283). However, given the differences in both designer goals and user experiences associated with AR and VR, considering them together might be problematic. Second, some authors argue that the MR-dominant view distinguishes between AR and AV based on the proportion of real vs. virtual content (Leclet-Groux et al., 2013; Looser et al., 2004). This proportion-based interpretation, however, is not without limitations. For instance, consider a user wearing a pair of functional AR glasses integrating textual content into the person's field of view. This use case would, in the MR-dominant view, be considered a "mixed reality environment", and, more specifically, represent either AR or AV, depending on the proportion of the user's field of view that is covered by text. As we will show later, such examples contradict current industry practices. Third, some authors also interpret this view by the dichotomous distinction of whether virtual content is overlaid on the real world (AR) or whether real objects are overlaid on virtual content (AV). While a dichotomous distinction contradicts the idea of a continuum, this distinction might become especially challenging in video see-through systems (where everything is presented on a digital screen), and it also may not matter to consumers. Fourth, real-world occurrences of AV are difficult to find. As we will show later, several industry informants with numerous years of experience were unfamiliar with this term, and a Google Trends analysis supports this conclusion.

The "VR-dominant view"
The "VR-dominant view" argues that VR is the main medium standing above all other formats. For instance, Azuma (1997) suggests that AR "is a variation of Virtual Environments (VE), or Virtual Reality as it is more commonly called" (p. 2). In a similar vein, Guttentag (2010) states that "this paper accepts augmented reality (AR) -the projection of computer-generated images onto a real-world view […] as a type of VR." Wedel et al. (2020) recently argued that the term VR is sufficient and the use of the term AR is only necessary when a distinction is specifically needed in a given context. Hence, the VR-dominant view tends to classify AR as a sub form of VR and discusses mixed reality as something merging both VR and AR (Wedel et al., 2020), without further specifying the relationship between mixed reality on the one hand and VR/AR on the other. As with the MR-dominant view, given the substantially different goals and experiences of AR and VR, it seems unsatisfactory to declare AR to be a subset of VR.

The "MR-centered view"
Other scholars (e.g., Farshid et al., 2018) propose a continuum including mixed reality in the center between AR and AV, which is surrounded by the real world and VR. In addition, and contrary to Milgram et al. (1995), mixed reality is not conceptualized as an umbrella term for these realities that include both real and virtual elements, but rather as a very specific type of reality that "combines what's real with what's possible" (Farshid et al., 2018, p. 660). Flavián et al. (2019) similarly designate "pure mixed reality" as a specific technology that fits between AR (defined as when virtuality overlaps reality) and AV (when reality overlaps virtuality). Overall, the MR-centered view makes an important distinction by segregating the real from the possible and follows Deleuze (1966), who first proposed this split, ranging from the real (e.g., the telephone on your desk) and the virtual (e.g., using Siri as a virtual assistant) to virtuality (e.g., playing a virtual game that has no connection to reality) (Farshid et al., 2018). However, Farshid et al.'s distinction between VR (left side of the VR continuum) and virtuality (right side of the VR continuum) seems to remain ambiguous, especially given that mixed reality and AV are positioned between these two poles. For example, it remains unclear why VR is described as being "real", whereas virtuality is described as being "possible" (cf. Farshid et al., 2018, Fig. 1, p. 658). Furthermore, as with the "MR-dominant view" and the "VR-dominant view," the "actual reality/virtual reality continuum" view does not define a role for XR.

The "extended reality view"
A myriad of firms and consultants have developed their own approaches and terminologies. XR (often used as an abbreviation for extended reality) is frequently employed as an umbrella term for a variety of distinct concepts, most prominently AR and VR. The term mixed reality is often loosely and vaguely incorporated, typically as "a combination of AR and VR" and without specifying further what this means, while other authors specify mixed reality in more detail. For instance, Kunkel and Soechtig (2017, pp. 48-63), in a recent report from Deloitte, propose that in mixed reality, "the virtual and real worlds come together to create new environments in which both digital and physical objects-and their data-can coexist and interact with one another" (p. 49). Dalton (2021, p. 5) states that AR is "sometimes subcategorized as mixed reality" which represents "another form of AR". He explains the unique characteristics of mixed reality in technical terms, such as that "digital elements can be anchored to points in the physical environment" (p. 5), and discusses it in "contrast … [to being] … simply overlaid" (p. 5). This specification is basically the translation of technical mixed reality characteristics (e.g., spatial anchors, stereoscopic 3D, etc.) to a higher level of abstraction in the user's voice ("very realistic"), as discussed in Dwivedi et al. (2020). However, it is important to note that we observed a shift in the use of the term mixed reality toward "realistic AR" with the launch of the HoloLens device in 2016, which the experts in our study also confirmed. The question mark and the unclear boundaries (blurred circle) in Fig. 2 indicate that many industry professionals have not yet scrutinized the organization of these terms under a clearly defined umbrella. Furthermore, this view has not yet received sufficient academic attention.

The need for an updated framework
In the previous section, we observed an inconsistent and incomplete use of new reality terminology. We also noted that three of the four different "views" are academically driven whereas the "extended reality view" is primarily informed by industry. Since the new reality field is shaped by various stakeholders, such as academics, commercial players (e.g., hardware and software providers, and their marketing departments), industry associations, and so forth, we conclude that both academics and industry will benefit from reconceptualizing and organizing the field through expert informant opinion. Against this background, the current research aims at identifying, (re-)defining, distinguishing, and organizing relevant terms in a managerially focused framework by consolidating published research and input from a variety of expert informants. More formally, the proposed framework is designed to:
• include all relevant terms
• provide an informant-driven definition
• identify and explain core differences between relevant terms
• organize these concepts into a coherent framework

Focus group and expert interviews as an iterative validation
We began our research with a detailed inspection of the academic literature, industry publications, and other publicly available materials (e.g., self-descriptions from companies). Based on the review of these materials, we developed a first tentative draft of the xReality framework which included definitions and delineations between and across different terminologies.
Next, we conducted a focus group with seven experienced industry practitioners, all with extensive but varied AR and VR expertise. We conducted the focus group via a video conferencing tool, which allowed participants to engage in active discussions and conversations (Bosco & Herman, 2010). The focus group lasted 1.5 h and was recorded. After a general introduction, participants were presented with the four views of new realities (described in section 2) as well as the draft of the xReality framework, and then asked for impressions and feedback on these materials. This procedure produced intense discussions regarding the existing draft. In addition, concepts and terminologies that were not part of the initial framework were discussed. Four participants later contacted the research team with additional input. Following qualitative analysis of the focus group, the authors iteratively referred back to the literature and compared the results with documented views and findings.
Additionally, the authors conducted 15 qualitative interviews with AR and VR professionals of different backgrounds, foci, and tenure. The interviews lasted between 40 and 105 min and were guided by presenting the framework as it evolved with each interview. Specifically, after each interview, we adjusted the model, revised terms, added new concepts whenever necessary, and documented additional feedback and suggestions. Four experts shared further documents, videos, or thoughts with us after the interviews. We ceased data collection when saturation was reached and additional interviews failed to generate novel insights (Saunders et al., 2018).
We identified both focus group and interview informants based on public presentations, recommendations, publications, and personal contacts. The sample included individuals with differing perspectives, terminologies, and backgrounds (e.g., academics, managers, developers, consultants, etc.), and Appendix 1 displays their demographics. Our empirical approach to integrating the voice of both industry and academic experts iterated "through data collection and analysis in such a way that preceding operations shape subsequent ones" (Spiggle, 1994, p. 495). The qualitative approach was appropriate as we were exploring a new and emerging area that is rapidly changing, and understanding the processes involved in designing and categorizing these interrelated concepts requires the flexibility availed by the qualitative process.

Development of the integrative xReality framework
Based on our review of related literature and the insights provided by our informants, we developed a contemporary classification of new media formats. We name this framework the "xReality framework" (see Figs. 3 and 4). In the subsequent sections, we explain each element in more detail.

XR: extended reality vs. xReality
The term XR is often used as a generic expression covering both AR and VR (Çöltekin et al., 2020). Our informants supported this notion of an "umbrella term" where reality formats are "put in" (JAKE). Extant literature frequently establishes XR as an abbreviation for "extended reality" (Alcañiz et al., 2019), and some of our informants first echoed this view. However, we also found that many experts felt that this conceptualization could be misleading, as the term "extended", by definition, excludes VR since reality in VR is not extended but rather replaced. Some of our informants, for example JOE, indicated this immediately ("I do not say extended realities [to XR] because VR, to me, is not an extended reality but rather an alternative reality"), whereas others altered their stance after a discussion about whether the term "extended" can appropriately include VR, given that reality is replaced there. Some informants suggested alternative terms, such as "reality x" (DORIAN), "digital reality" (PAT), "new realities" (JOE), or sticking with "XR" (CARL). MARTIN specified this as follows: "XR, in my mind, always was where X is replaced with whatever; it's 'something R'". We decided to follow the general consensus that X represents a placeholder for any digital reality format, embracing the notion of using XR as an abbreviation for X Reality (as, for instance, suggested by BILL), conceptualized as an established umbrella term for a variety of digital reality formats.

Proposition 1.
We posit that X, in XR, represents a placeholder (similar to an X variable in algebra) for any form of new reality.

Strict separation of AR from VR and of experience from hardware
Section 2.3 above outlined four views of the conceptual space described in the XR literature, where most of them conceptualize AR and VR on the same continuum (Flavián et al., 2019; Milgram et al., 1995; Milgram & Kishino, 1994) or AR as a specific sub form of VR (e.g., Guttentag, 2010; Wedel et al., 2020). Contrasting these views, we found a general agreement 4 among our informants that AR and VR represent "fundamentally different concepts" (CARLA), where typically different types of content are relevant (MAX) and need to be "separated" (ANNA), especially "from the user's viewpoint" (JAKE). In short, as stated by GARY, AR and VR "are not the same thing at all […] I don't think they should be considered on the same scale […] and it is better to have a split between those two". RICK reported his observation that "many firms completely separate AR from VR". ANNA discussed "different purposes" of these two formats which are driven by different success factors; for instance, "how well the content is integrated in the reality" matters in AR, but not in VR (SAM).
Following suggestions from the human-computer interaction literature (e.g., Hassenzahl & Tractinsky, 2006), we conclude that a conceptualization of new realities should be based on user experience, and that AR should be clearly differentiated from VR. This contradicts some streams of research (e.g., Milgram et al., 1995; Milgram & Kishino, 1994), yet echoes the views of nearly all of our informants. Specifically, we observed a general agreement that the distinction between AR and VR should be made based on whether the physical environment is, at least visually, part of the user's experience or not. That is, any distinction based on the underlying hardware alone is not appropriate.

Fig. 3. XR (xReality) as an umbrella term for AR and VR.

4 We acknowledge that some informants (e.g., TRISTAN, MIKE) provided some less common examples where the differentiation might not be immediately clear (we discuss some of them in the discussion section). However, we found a general agreement that common use cases can be well separated into AR or VR.
For example, one could use a "VR-branded device" with front cameras and present video-see-through AR to a user. Although the device itself might be classified (or marketed) as "VR", the user would in fact experience AR (similar to AR on a smartphone). Our informants corroborated this perspective. For instance, SVEN observed: "You are either in an AR environment, so you see the real world, or you are in a VR environment where […] you are basically in a digital environment. […]. You can move from one to the other; that is not happening very often now, the technology is not really there. But in the future, I see that as a thing that can happen. But you are only either in one, you can't be in both at the same time".
We also identified general differences between AR and VR (see Table 1). As discussed earlier, there are a variety of AR devices, whereas consumer VR is typically limited to HMDs (and in rare instances to caves or similar formats). Many of the informants emphasized AR technology's potential to develop into something that is used always and everywhere (e.g., BEN), whereas VR devices, without substantial innovation, remain devices for temporary use. In addition, a precise understanding of the physical environment through tracking technology is necessary to realistically integrate virtual objects into the real world in AR. On the other hand, in VR this is usually less crucial and often limited to collision avoidance, i.e., the identification of potentially dangerous real objects close to a user. Furthermore, from a practical perspective, specific content might be effective (e.g., accepted by consumers) in AR, but not in VR, and vice versa. Therefore, developing an application in one or the other requires an understanding of both the specific use case and the availability of devices within the user group.
The physical environment is, at least visually, replaced in VR, which represents an experience where users "go in" (CARL). The notion "at least visually" is important since other external sensory stimuli (e.g., smell) are challenging to suppress. However, similar to AR, typical VR systems are "primarily centered around vision" (Slater & Sanchez-Vives, 2016, p. 4). In contrast, the physical environment is extended and enriched (also by diminishing real objects) in AR, and thus, AR experiences are driven by the experience of digital content within a physical space (BEN, JOE, RICK). Furthermore, barriers to the experience tend to matter. In VR, users may feel "lost", struggle with motion sickness, or fear collisions with physical objects (e.g., due to abstraction, complete mental immersion, and an inability to perceive the real world). In AR, physical threats may result from distraction or misinterpretation (e.g., by perceiving real objects as virtual), which can eventually also result in collisions with physical objects. Furthermore, users can compromise not only their own, but also other people's privacy (Cowan et al., 2021; Lammerding et al., 2021; Rauschnabel et al., 2018). These aforementioned differences are also echoed in typical use cases. For example, both AR and VR can be effective for training and education. However, SVEN, for instance, argues that VR is effective in training before starting a new job, whereas AR allows training on the job. In more general terms, typical AR use cases usually emerge in situations where combined experiences of real and virtual content are beneficial (e.g., to compare sizes of furniture or clothing) and possible (e.g., when the space for a specific piece of furniture already exists). VR, in contrast, is preferred in situations where the physical context does not exist (e.g., a fictitious game), is not accessible to a user (e.g., the moon), or where the actual physical context is not desirable (e.g., in training situations that would be dangerous in the real world). Furthermore, as with other media, both AR and VR have the potential to cause psychological or physical harm, either intentionally, due to carelessness, or out of malicious intent. Examples include experiencing a war scene in which people are killed in VR, the inclusion of scary or disturbing virtual objects in AR, or the design of visual content in a way that causes nausea, headaches, or even seizures, as was demonstrated by the so-called Pokemon Shock in 1997, where strobe effects in a TV episode caused health issues among 600 viewers. Fig. 4 presents the final model related to Proposition 2.

Table 1 (excerpt). Typical Use Cases
AR: Situations where combined experiences of real and virtual content are beneficial (e.g., to compare sizes, e.g., of furniture) and possible (e.g., the home for the furniture already exists).
VR: Situations where the physical or story context does not exist (e.g., a fictitious game), is not accessible to a user (e.g., the moon, time travel), or where the actual physical context is not desirable (e.g., in training situations that would be dangerous in the real world).
Note: We refer here to "generic" experiences and standard devices. There may be situations where these differences do not (fully) apply.

Proposition 2.
There is a need to separate AR from VR based on whether the physical environment is, at least visually, part of the user experience (=AR) or not (=VR).

Refining AR: the assisted versus mixed reality continuum
Following the general consensus of our expert informants, we suggest that AR represents a combination of real and virtual content that is displayed in real time. Furthermore, our empirical findings suggest that it is meaningful to distinguish between different types of AR (cf. Fig. 5). For example, workers can use AR glasses to obtain text-based work instructions overlaid on the physical environment (Mura et al., 2016), and tourists can gain access to overlaid information for places of interest when on a sightseeing tour (Han et al., 2013). Many of our informants (e.g., MEL, PAT, SVEN) used the term "assisted reality" to describe this form of AR because the purpose of the virtual objects is to assist the user in obtaining a better understanding of the physical environment rather than to merge virtual objects with the real world. On the other hand, our informants described a highly sophisticated form of AR that tracks and maps the environment in three dimensions, and which integrates digital objects realistically and seamlessly into the user's perception of the real world. This seamless integration of virtual and real objects is termed "mixed reality" because the two realities (real and virtual) merge and, in its extreme form, become indistinguishable to the user (MEL, PAT, SVEN, MIKE).
Although we acknowledge that the term "mixed reality" is often used differently in the literature [compare, e.g., also Speicher et al. (2019) who observe that different authors treat mixed reality either as a synonym for AR, as a combination of AR and VR, or as a "stronger" version of AR], we argue that our conceptualization of the assisted reality-mixed reality continuum is meaningful and beneficial for academia and practitioners. First, we found substantial support from our informants for this conceptualization, especially from those informants who felt that the traditional mixed reality continuum suggested by Milgram et al. (1995) has become conceptually problematic, given the development of AR over the last 25 years. For example, one of our informants emphasized that "there is a big step from assisted into mixed reality" (SVEN) within the range of the AR continuum. Others associated mixed reality with "hybrid experiences" (BILL), where virtual content is "interacting with physical objects and logically matching" (DORIAN), or user experiences where "you cannot really tell anymore what is real from what not" (LENA). Second, this view is echoed by recent industry publications from reputable players in the XR market (e.g., Dalton, 2021) who use the term mixed reality similarly to our conceptualization (see footnote 5). Third, the term "mixed" is etymologically closely related to the idea of a realistic integration of real and virtual content. For instance, the Oxford English Dictionary (2021a) defines the term mix as "combine and put together to form one substance of mass" (here: one experience) and uses "oil and water do not mix" as a negation example. This example metaphorically represents the opposite end of the continuum, assisted reality, where virtual content (oil) is just overlaid ("floating") on top of the real world ("water"), or, as stated by MIKE, "floating in front" of the user.
The discussions with our informants about the differences between assisted and mixed reality centered around the term realism (e.g., LENA), and identified different views of what this term means to them. Some argued that fictitious characters (e.g., monsters) can, per definition, never be realistic, whereas others discussed this issue through the lens of "suspension of disbelief", which describes users' willingness to suppress information that contradicts real-world knowledge (Weibel et al., 2015). Linking these observations to the literature, we identified the term "local presence" (see footnote 6) to best describe this distinguishing factor between assisted and mixed reality. Drawing on prior research (e.g., Lombard & Ditton, 1997; Smink et al., 2020; Spagnolli et al., 2009; Verhagen et al., 2014; Vonkeman et al., 2017), we define local presence as the degree to which a user experiences AR objects as being actually present in his or her own physical environment. In assisted reality, content is perceived as clearly artificial and overlaid, and thus, not perceived as being actually there. In contrast, when it comes to mixed reality, users experience virtual content as being actually in their physical environment (e.g., a decorative vase on a table or a flying monster in a game).
Footnote 5. The experts in the study highlighted their associations of mixed reality with the Microsoft Hololens device from 2016 (e.g., PAT, ANNA; RICK: "a super marketing term"), which was marketed as the first technology that realistically integrates, rather than overlays, virtual content. Others stated the term is, in their view, poorly defined and used without a clear understanding of what it means (e.g., JAKE).
Proposition 3. Assisted reality and mixed reality are the opposite poles of the AR continuum. This categorization depends on the level of local presence perceived by the user.
In addition, two more terms were subject to the discussion: First, several informants mentioned diminished reality where physical objects are omitted from user perception. In other words, the technology "erases" objects from the real-world by overlaying them with virtual objects (Mann & Fung, 2002). The xReality framework appropriately handles this emerging concept. At the assisted reality endpoint, diminished reality would utilize an unrealistic overlay (e.g., a censor bar blurring content). Mixed reality would seamlessly remove perception of a real object in a way that is difficult or impossible to detect by users. An example use case might be an ad blocker in AR glasses that realistically overlays virtual content over environmental ads, allowing users to experience an "ad free" environment (Rauschnabel, 2021).
Second, when discussing different reality views (e.g., the MR-dominant view or the MR-centered view), we asked informants about their view of the concept of AV. Surprisingly, many informants were not aware of the term, and others consciously chose not to use it (e.g., BILL, DORIAN, GARRY, MIKE). Still others saw it as "maybe a niche" (JOE) with limited use cases, or associated it with green screen technology (RICK) or other non-XR formats.
Throughout our research, we identified numerous characteristics of AR that determine the position of an AR experience on the assisted reality-mixed reality continuum. Before discussing them in more detail, we need to acknowledge two premises of our framework. First, AR/VR conceptualizations based on Milgram et al.'s (1995) reality-virtuality continuum determine the specific type of reality via the proportion of visual "content that is real versus how much is computer-generated" (Looser et al., 2004, p. 2; see also Leclet-Groux et al., 2013). For example, in the Milgram et al. (1995) view, environments with a larger proportion of real objects are termed AR, and environments with a larger proportion of virtual elements are termed AV. However, we argue that such a "view of proportions" remains limited and does not acknowledge substantial changes and developments in recent AR technology. Hence, our AR continuum considers aspects related to the type of content (e.g., the quality, transition, and integration of real vs. virtual objects) rather than merely the proportion between these elements. Second, we suggest that mixed reality is not per se "better" than assisted reality; rather, this depends on the general context. Hence, user goals determine whether users perceive one specific AR application as being better than another one. For example, even though mixed reality provides a substantially higher integration of virtual and real objects than assisted reality, assisted reality may very well be the superior environment if the goal of the user is to enrich the real environment with factual information. In this example, a high integration of virtual and real elements might distract or confuse the user rather than generate benefits. On the other hand, in highly hedonic contexts (e.g., games), a high integration between virtual and real objects (leaning towards mixed reality) is likely to improve user experience.
Finally, given our conceptualization of AR as an experience, we argue that personal user characteristics (e.g., expectations, prior experiences with AR, etc.) determine how realistically they perceive specific uses.
The following paragraphs outline elements that our informants identified as the most important drivers of the assisted/mixed reality distinction. It is important to note that the factors listed below are not presented as either exhaustive or perpetual, as other factors may play a role and some factors may lose importance as both user expectations and technology change.

Content stability and persistence
There are two approaches to placing virtual content in the real world (Jaekl et al., 2002), and these approaches determine content stability. Head-stable content moves according to the orientation of the user's head and is appropriate in cases where no relationship between virtual and real-world objects exists or where it is essential that users can quickly process information (e.g., text or notifications). This approach aligns well with the notion of assisted reality. On the other hand, world-stable content is anchored in a fixed position within the 3D space. This approach is commonly used in cases where a relationship exists between a virtual and a physical object and, hence, is prominent in mixed reality applications. From a technical point of view, world-stable content requires extensive tracking technology so that virtual content can be rendered in 3D space realistically and in registration with both real and virtual objects (Keil et al., 2019; Tamura et al., 2001).
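The difference between the two placement approaches can be illustrated with a small coordinate-transform sketch. It is a simplified illustration rather than a description of any specific AR system: it assumes a top-down 2D view with the head pose reduced to a position and a yaw angle, and the names `Content`, `position_in_view`, and `position_in_world` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Content:
    name: str
    anchor: str      # "head" or "world" (hypothetical labels for this sketch)
    position: tuple  # (x, z): offset in the head frame if anchor == "head",
                     # otherwise fixed coordinates in the world frame


def rotate(v, angle):
    """Rotate a 2D vector counter-clockwise by `angle` radians."""
    x, z = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * z, s * x + c * z)


def position_in_view(content, head_pos, head_yaw):
    """Where the content appears relative to the user's head."""
    if content.anchor == "head":
        # Head-stable: the offset in the head frame never changes.
        return content.position
    # World-stable: re-express the fixed world anchor in the head frame,
    # which is why the head pose must be tracked continuously.
    dx = content.position[0] - head_pos[0]
    dz = content.position[1] - head_pos[1]
    return rotate((dx, dz), -head_yaw)


def position_in_world(content, head_pos, head_yaw):
    """Where the content is located in the physical room."""
    if content.anchor == "world":
        # World-stable: stays at its anchor regardless of head movement.
        return content.position
    # Head-stable: carried along with every head movement.
    rx, rz = rotate(content.position, head_yaw)
    return (head_pos[0] + rx, head_pos[1] + rz)
```

In this sketch, a head-stable notification keeps the same view coordinates when the user turns, while a world-stable object keeps the same room coordinates and therefore shifts within the view. Only the world-stable branch depends on the tracked head pose when rendering, which mirrors the tracking requirement noted above.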
Persistence is a specific characteristic of world-stable content that refers to how augmented content is spatially attached to specific physical objects (e.g., a digital vase on a user's physical desk), or attached to a specific geographic location through geo-coordinates (Bachras et al., 2019). Augmented content is required to respond to movements within the physical world. Informant MEL stated that mixed reality experiences allow users "to place an object in the space, to turn around and back, and it still appears at the same place" and added that this does not always work well with all technologies, such as those that suffer from calibration drift (i.e., a loss of calibration quality). RICK highlighted the importance of world-wide and device-independent platforms ("mirror worlds") for applications that enable AR content to persist in a specific location forever and for multiple users to access (AR metaverses).

Dimensionality of content
The type of display influences how a perception of depth can be created in AR applications (Greene et al., 2021). In general, the processing of depth information based on binocular disparity requires the use of stereoscopic AR displays (Heinrich et al., 2019). Alternatively, if no stereoscopic display is available (for example, when using a smartphone), other depth cues such as perspective, occlusion or shading could be used. The visual content displayed in assisted reality is typically 2D (e.g., text), whereas content toward the mixed reality endpoint is usually 3D, and thus, typically increases local presence.

Contextual embedding
In general terms, contextual embedding refers to how cues in the environment are situated and interpreted within a specific context (cf. Hornecker, 2010). Assisted reality provides a minimum level of contextual embedding as the technology is not fully aware of the context. Here, the technology requires the user to indicate when a specific step in a process has been completed so that it can advance to display the next step (processual context). Mixed reality, on the other hand, recognizes and identifies objects in the surroundings and can also track the user's progress in a task, thereby advancing the display as progress is made (physical context). As AR moves from assisted to mixed reality, the requirements for tracking and understanding the environment increase (cf. Fig. 4). For instance, LUKE commented on the need to understand and incorporate real world lighting characteristics to model realistic shadows. RICK discussed varifocal technology that allows users to perceive virtual objects displayed a centimeter from the eye as if they were far away.
Footnote 6. Note that in the extant literature, similar terms such as "local presence", "object presence", and "spatial presence" are used, among others. We deliberately opted for using local presence to avoid misunderstandings among different scientific communities. For example, "spatial presence" could be understood as a term to describe the spatial relationship between the physical location of the user and the virtual location in which he or she is. In this case, spatial presence would be ambiguous. Likewise, "object presence" refers to the object-focused appearance or position of virtual objects (in AR or VR), rather than a user's perception that virtual content is perceived as actually being in his or her local physical environment.
Advancements in tracking technology (e.g., markerless tracking, LIDAR scanners, and recent versions of Apple's ARKit and Google's ARCore) substantially improve the perception of embedded content. Hence, embedding in the context is reflected by perceived augmentation quality (Rauschnabel et al., 2019), which in turn leads to higher levels of local presence (cf. Daassi & Debbabi, 2021) and thus moves the user experience more towards the mixed reality end of the AR continuum.

Technological embodiment
The concept of technological embodiment describes how AR technology can become an extension of a user's body (Tussyadiah et al., 2017). Flavián et al. (2019) suggest that wearable devices increase and stationary devices decrease embodiment. Furthermore, our informants reported industry developments on AR brain interfaces that could directly integrate virtual information into the optic nerve.
An integral aspect of technological embodiment is how interaction is designed. For interaction with (2D) content on the assisted reality end of the continuum, techniques that have been designed for other contexts in which interaction with 2D content is common (e.g., desktop and smartphone applications) typically work. Yet novel approaches may be required for interaction with content in 3D space, with the objective of enabling intuitive interaction and ultimately supporting a stronger sense of technological embodiment. Examples include, but are not limited to, interfaces based on speech, gaze, EEG, and EMG. Overall, assisted reality is usually characterized by lower levels of technological embodiment, as opposed to higher levels of technological embodiment for mixed reality.

Interactivity
In the context of technology-mediated communication, interactivity has been defined as "the extent to which users can participate in modifying the form and content of a mediated environment in real time" (Steuer, 1992, p. 84). Previous research suggests that interactivity is important for an effective AR experience (Park & Yoo, 2020;Yim et al., 2017). Park and Yoo (2020) find that perceived interactivity increases mental imagery, which in turn improves consumers' attitudes towards a product and increases purchase intentions in an AR-mediated shopping context. In a similar vein, Yim et al. (2017) show that higher levels of interactivity in AR applications increase consumers' perceived usefulness and enjoyment. The extant literature acknowledges that interactivity and realism may interact in AR experiences (Montero et al., 2019), yet higher levels of interactivity may not necessarily result in higher perceived local presence. For instance, assisted reality applications can provide a high level of interactivity by allowing users to manipulate and control superimposed objects such as text or virtual icons. However, such applications may not provide high levels of local presence as virtual objects and the physical environment are (intentionally) not seamlessly integrated.

Shared and social experiences
Content in traditional AR has typically been restricted to a single user at a time. However, advances in technology create the opportunity for multiple users to experience the same (virtual) content together (Hilken et al., 2020; Lebeck et al., 2018), which is often discussed in terms of co-presence (Nowak, 2001) and co-experience (Battarbee & Koskinen, 2008). Shared experiences build upon persistence (as outlined above) since the content needs to remain in the same location for multiple users to perceive it (e.g., in an AR metaverse).
Shared experiences can not only enable interactions between one user and the (real and virtual) content, but also between users. This enables a form of interaction that is more similar (i.e., realistic) to real-world interactions. For example, Carrozzi et al. (2019) found that shared AR experiences enhance feelings of psychological ownership for virtual objects, and Hilken et al. (2020) showed that shared AR experiences can enhance users' social empowerment and improve joint decision-making. Hence, we suggest that with increasing levels of shared experiences, the user experience shifts towards mixed reality in the AR continuum. In addition, yet not fully explored, shared experiences might also occur with anthropomorphic or animalistic virtual creatures (e.g., a user-pet relationship with an AR animal).

Augmentational/environmental control
Environmental control refers to the level of control users have over content in AR applications and how this content interacts with real objects (Brooks, 1990). Previous research suggests that increased levels of perceived control raise the need for user identification and psychological ownership of the AR application (Carrozzi et al., 2019). Augmentational control is very low or potentially non-existent in a typical assisted reality application. Control increases in common mobile AR apps (e.g., a makeup app where users can augment their eyes, lips, etc.) and tends to be very high in mixed reality, where users can manipulate virtual objects and enhance real objects. For example, in a mixed reality environment, a user might be able to click on a virtual light switch that is connected to a real lamp.

Refining VR: the atomistic versus holistic virtual reality continuum
Compared to their categorizations of AR and mixed reality, Milgram and Kishino (1994), as well as many authors building on Milgram et al.'s work, remain surprisingly silent when it comes to specific forms or types of VR (Farshid et al., 2018; Flavián et al., 2019; Milgram et al., 1995; Milgram & Kishino, 1994). However, given the drastic advances in VR technology and applications (Hollebeek et al., 2020), a distinction between different types of VR will further advance our understanding of virtual environments.
Informed by the experts and inspired by prior research, we argue that telepresence allows us to distinguish between different forms of VR. Hence, telepresence with its notion of "being there" (Rodríguez-Ardura & Martínez-López, 2014) is clearly delineated from local presence, as discussed in the section on AR. More formally, we draw on existing research (Lim & Ayyagari, 2018; Mantovani & Riva, 1999; Mollen & Wilson, 2010; Steuer, 1992; Tussyadiah et al., 2018) and define telepresence as the degree to which a user feels present in the virtual rather than the physical environment. We acknowledge that existing research uses several variations of this term, such as simply "presence" (Mantovani & Riva, 1999), virtual presence (Sheridan, 1992), or mediated presence (Bourdon, 2020), synonymously with our definition of telepresence. However, by adding the prefix "tele," we highlight the distinction from local presence. More specifically, according to the Oxford English Dictionary (2021b), the prefix "tele" is borrowed from Greek and denotes or relates to "action, observation, or communication at, over, or across a distance, or denoting devices used for this." In this sense, telepresence refers to presence mediated through a fully virtual environment (Mantovani & Riva, 1999).
Based on the notion of telepresence, we propose VR applications to be positioned on a continuum between atomistic and holistic VR experiences. On the one hand, atomistic VR refers to applications of VR for which the quality of the user experience is often secondary to some other goal. For example, VR can be used for training or modeling physical spaces (such as virtual blueprints in construction applications) where the completion of a task is a primary concern. In these cases, the user's perception of telepresence is less important than accomplishing a specific goal or outcome. On the other hand, holistic VR is signified by a VR experience that is nearly indistinguishable from a real-world experience in the mind of the user; in these cases, it is the perception that the user feels present in the virtual world that supersedes other aspects.

Proposition 4. Atomistic VR and Holistic VR are the opposite poles of the VR continuum, and this categorization depends on the degree of telepresence perceived by the user.
Two aspects primarily emerged from our discussions with experts in this area. First, the extant literature often refers to VR experiences consistent with Hassenzahl's (2018) classification of VR products or services based on their hedonic or pragmatic qualities. Whereas pragmatic quality refers to the perceived usefulness and ease of use, hedonic quality refers to an intrinsic "joy of use" that accompanies the user's experience. According to our conceptualization of atomistic VR, VR experiences at this end of the spectrum are likely to have a higher pragmatic quality, while holistic VR might also be characterized by a higher hedonic quality as perceived by the user. Second, this dichotomy is analogous to the concept of instrumental versus terminal use. In instrumental examples, the user employs VR technology as a means to an end as the technology is an instrument designed to accomplish some other goal. The word "terminal" signifies an end, and holistic VR experiences are often an end in and of themselves. In the following section, we categorize properties of VR applications and explain how they impact the position of a VR experience on the VR continuum.

Content stability and persistence
As in AR, (portions of) content in VR can be stable with respect to the user's head movements (head or user stability) or the surrounding virtual environment (world stability) (Sipatchin et al., 2021). A VR experience can contain both head-stable and world-stable content at the same time. For example, the environment in a game can be world-stable, but objects belonging to the user (e.g., a map) or status information would be displayed in a head-stable manner.
Similar to AR, persistence in VR means that virtual objects are attached to a fixed location in 3D space. In contrast to AR (where objects are fixed in the physical world), persistent objects in VR are attached to a digital 3D position in the virtual world (Zielinski et al., 2015). The implications are similar, as different users can see and potentially manipulate these digital objects by navigating to a distinct location virtually.

Dimensionality of content
In VR applications, the dimensionality of content refers to how the elements forming the virtual world are rendered, typically in 3D (Zielinski et al., 2015). A specific example, where the virtual world is only rendered in 2D, is 360-degree videos. VR users require depth cues to allow them to judge the size of and distance to virtual objects and to perceive high levels of telepresence (LENA). Note that designers can deliberately design 2D or 3D content based on its purpose. For example, information on objects or status information about the user's location could be designed using 2D objects even though the user navigates in a 3D environment, and vice versa.

Contextual embedding
Contextual embedding is relevant for many XR applications, but it is different in VR compared to AR. In order to embed content in context, knowledge of the context is required. In VR, the context is typically entirely virtual (e.g., a virtual room), where such a 3D map is already part of the application, whereas in AR, this is much more challenging, as the context must be obtained externally (as outlined in the AR section).

Technological embodiment
Similar to the discussion in the AR section, advances in technology and hardware support higher levels of technological embodiment in VR applications (Flavián et al., 2021). However, the nature of this embodiment differs between AR and VR applications. Whereas a high level of technological embodiment in AR is based on unobtrusiveness and registration between real and augmented content, a high-level embodiment in VR depends on the individual's perception of telepresence in the virtual world. Instead of registering the real to the augmented content, technological embodiment in VR remains in the mind of the user, yet can be enhanced by technologies such as smart gloves.

Interactivity
How users interact with the application, the controllers, and other users in VR strongly contributes to the perception of telepresence. This differs from AR where the typical objective is to make interaction closely resemble interactions in the physical world. Traditional interaction techniques (i.e., the way in which I/O devices are used for interactive tasks) were developed for computer-human interaction through desktop computer interfaces, and these generally do not translate well to VR interaction (cf. Flavián et al., 2021).
Users often interact in VR using a controller, their hands, their gaze, or a combination thereof. These approaches to interaction can pose specific challenges. Because the entire VR world is synthetic, users are not able to see their own hands, and a virtual representation of the hand (or any other part of the user) is needed. Informant LENA highlighted the importance of self-perception in the virtual world, and research has shown that the realism of projected body parts has a substantial effect on both perceptions of embodiment (Argelaguet et al., 2016) and task performance (Knierim et al., 2018). VR systems that project representations of the human body require motion capture systems (cf. Liebers et al., 2021; Pfeuffer et al., 2019) to track user movements, and body movements are often extrapolated from the position and orientation of the headset and/or other controllers (Yung et al., 2021). For atomistic VR applications, it might be sufficient if the user (in particular the parts of their body with which they interact, such as an arm, finger, or foot) is shown in a very simplified way, or if only the controller is visible. For example, if users are positioning objects in VR to build a model of a product, they only need to perceive the controller and not their entire body. In contrast, holistic applications typically require a higher level of self-perception as body position may significantly influence the user's performance. For example, VR applications modeling fine-grained motor control like typing (Knierim et al., 2018) or playing piano (Fanger et al., 2020) require higher levels of self-perception.
Furthermore, when objects are displayed beyond arm's reach where direct interaction with the object is not possible, more indirect interaction techniques, such as using a laser pointer metaphor (Hoppe et al., 2018) or the user's gaze can be employed. Many headsets now employ eye tracking technology (e.g., Pico Neo or HTC Vive Pro Eye) that can be used to select objects at a distance using simple visual focus. Gaze can be paired with other techniques for operations such as translating or rotating an object (Pfeuffer et al., 2014). The choice of such techniques may influence telepresence.
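Selecting a distant object by visual focus can be sketched as picking the object whose direction deviates least from the gaze ray. The sketch below is only an illustration under stated assumptions: it treats objects as points, uses a hypothetical angular tolerance of 5 degrees, and does not reflect the API of any particular headset or eye tracker. The function names `angle_to` and `select_by_gaze` are invented for this example.

```python
import math


def angle_to(gaze_origin, gaze_dir, target):
    """Angle in radians between the gaze direction and the direction to `target`."""
    to_target = [t - o for t, o in zip(target, gaze_origin)]
    norm_t = math.sqrt(sum(c * c for c in to_target))
    norm_g = math.sqrt(sum(c * c for c in gaze_dir))
    if norm_t == 0.0 or norm_g == 0.0:
        # Degenerate input: report the largest possible angle so the
        # target is effectively never selected.
        return math.pi
    cos_a = sum(a * b for a, b in zip(to_target, gaze_dir)) / (norm_t * norm_g)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_a)))


def select_by_gaze(gaze_origin, gaze_dir, objects, max_angle_deg=5.0):
    """Return the name of the object closest to the gaze ray, or None.

    `objects` maps names to 3D positions. `max_angle_deg` is an assumed
    tolerance, not a value taken from any device specification.
    """
    best_name = None
    best_angle = math.radians(max_angle_deg)
    for name, position in objects.items():
        angle = angle_to(gaze_origin, gaze_dir, position)
        if angle <= best_angle:
            best_name, best_angle = name, angle
    return best_name
```

A production system would typically also require a dwell time or a confirming input before triggering a selection, since a purely gaze-based trigger would otherwise activate whatever the user happens to look at.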
A particular form of interaction in VR is navigation, and in particular, locomotion. While users in AR simply physically move through the environment, other concepts are needed in VR. A high-quality experience while navigating is crucial for users to experience high levels of telepresence. Depending on the degrees of freedom, locomotion in VR may be realized using simple walking patterns. VR systems track the users' movements and map them to the virtual world. Yet the physical world in which the user experiences the VR application is usually limited in terms of space (e.g., the user's living room). To accommodate this, treadmills or so-called VR walkers can be used. As an alternative form, locomotion in VR can be realized using controllers to proceed through the environment. However, this approach typically results in a higher cognitive load for the user.
Summarizing, the extant literature consistently finds that higher levels of interactivity increase perceived telepresence (Beck et al., 2019; Kim & Ko, 2019; Mütterlein, 2018). This perspective is also supported by a large number of our informants. For example, the focus group discussed telepresence for atomistic and holistic VR and indicated that "looking around" 360° is a very "simple form of interaction" (JOE).
Interaction within VR must be logical, as stated by DORIAN: "It's not only the extent or capacity of interaction that I can do and the quantity of interaction, but whether it makes sense or not. The interaction, to me, must be logical". Supporting this reasoning, MIKE argued that "interactivity leads to high levels of [tele]presence because you are more connected to the experience when you are making a choice, when you are affecting the outcome of the experience in some way" and uses examples of lower (e.g., making a choice) and higher (e.g., picking up a tool and unscrewing a screw) interactions. Hence, we posit that low levels of interactivity typically indicate a more atomistic form of VR, whereas high levels of interactivity lead to holistic representations of VR.

Shared and social experience
To enable social interaction (with either virtual or real characters), VR experiences rely on virtual representations, commonly referred to as avatars (Schroeder, 2002). However, when employing avatars, non-verbal communication cues pose a particular challenge. For example, sophisticated user representations record and display not only overall avatar movement, but also the specific head and eye movements of these avatars. Failure to accurately animate the head and the eyes may lead to avatars being perceived as unrealistic or inattentive, and this can significantly reduce users' perception of telepresence (Itti et al., 2003). For avatars depicting real world characters, representing their appearance and behavior in realistic ways is challenging. Furthermore, whereas creating static avatar models is less problematic, adding motion adds complexity. Typically, the more appearance and behavior coincide with what users would expect in the physical world, the higher the perceived telepresence. However, our informants pointed out that telepresence might be negatively influenced by content that is too realistic. This is particularly true when displaying representations of humans, as this evokes strangely familiar feelings or eeriness and revulsion, an effect that is commonly referred to as the uncanny valley (MacDorman et al., 2009; also suggested by our informants, e.g., BILL). In addition, whereas AR is often built to accommodate multiple users, doing so in a shared VR environment poses challenges for designers. For example, when implementing conversations with multiple users, VR environments need to account for the distance to the user (e.g., the closer a communication partner, the louder the voice of this avatar should be). If not done properly, this might negatively influence the experience. An example where distance-based volume of other users' voices is implemented is Mozilla Hubs.
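The idea that a closer communication partner should sound louder can be sketched with a simple inverse-distance attenuation rule. This is one common model chosen for illustration, not a description of how Mozilla Hubs or any other platform implements it. The function name `voice_gain` and the default values for `ref_distance` and `max_distance` are assumptions.

```python
import math


def voice_gain(listener_pos, speaker_pos, ref_distance=1.0, max_distance=20.0):
    """Return a volume multiplier in [0, 1] for a speaking avatar.

    Assumed model: full volume up to `ref_distance`, then gain falls off
    as ref_distance / distance, and the voice is muted beyond `max_distance`.
    """
    if ref_distance <= 0 or max_distance < ref_distance:
        raise ValueError("require 0 < ref_distance <= max_distance")
    distance = math.dist(listener_pos, speaker_pos)
    if distance > max_distance:
        return 0.0  # too far away to be heard at all
    # Clamping at ref_distance keeps very close voices from exceeding full volume.
    return ref_distance / max(distance, ref_distance)
```

With the assumed defaults, an avatar two meters away is played at half the volume of one standing within one meter. Whether such a falloff feels natural to users would need to be tuned for the specific environment.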

Perceptual experience
To maximize the feeling of telepresence, a plausible virtual world needs to be presented to the user (Lee, 2004) where aspects such as the quality of the graphics, dimensionality of content, and self-perception play an integral role (DORIAN). When events in the environment correlate with users' actions and meet their expectations of how objects and people are expected to behave, plausibility increases because they feel that the events are really happening (Slater & Sanchez-Vives, 2016).
Graphics are fundamental in driving user perceptions of telepresence in VR (Slater & Sanchez-Vives, 2016). This aspect depends on the hardware, such as the resolution of the VR display device, the visual field-of-view, and the frame rate at which the graphic updates (Bowman & McMahan, 2007). For example, for holistic VR, HMDs should allow for high resolution to increase the sense of immersion (CARLA), optimally as close as possible to the resolution of the human eye. Trackers should update just as quickly to translate user feedback instantly. The quality of the VR model plays an important role in the determination of atomistic vs. holistic VR as well. There is a spectrum ranging from rendering simple abstract geometric shapes to highly realistic objects (Bierbaum et al., 2001) with perfect texturing and shading, making the virtual world closely resemble what users expect from the real world. For many atomistic use cases (i.e., training, familiarization, etc.), simpler forms and shapes might be appropriate, but for holistic use cases, high-quality graphics are more likely to result in increased levels of perceived telepresence.
Despite a frequent focus on visual output, it is important to acknowledge that multiple human senses are part of a VR experience. The feeling of telepresence is strongly influenced by how well a VR experience communicates through all of our senses (Baus & Bouchard, 2017), and is most positive when the sensory cues are congruent (Flavián et al., 2021; Petit et al., 2019). Several experts referred to these communication options as modalities (visual, audio, tactile), or different feedback channels through which people can perceive the world (CARLA, LENA, PAT). For example, the lack of a haptic experience poses a challenge when interacting with virtual objects. For atomistic VR, it might be sufficient if the VR application primarily focuses on the senses required to accomplish the main task, even though a specific combination of senses may positively influence task performance. To increase telepresence for holistic VR use cases, approaches like sensory substitution (e.g., replacing the lack of haptic feedback with visual feedback), providing appropriate controllers that match the physical properties of the virtual object (for example, a sphere when interacting with a globe; Englmeier et al., 2020), or using electric muscle stimulation to simulate forces (Lopes et al., 2018) may be employed.
The feeling of telepresence in VR is equally influenced by the physical behavior of virtual objects, i.e., the laws of physics, as mentioned by LENA: "If something is falling down, it would not fall down and stay on the floor. It would basically bump up, like physical laws would have to be integrated into the virtual object." This may stand in contrast to AR, where the physical behavior of a virtual object is expected to match the laws of physics on earth. For example, when putting users in a VR application where they can experience being on the moon, gravity should match the physical laws on the moon. However, there are certain exceptions to this rule. For example, some virtual worlds allow users to fly or teleport (Hinsch & Bloch, 2009). These capabilities, while not in line with the expected laws of physics, are novel elements of the virtual world that are embraced by users specifically for this reason. Informants used the term "suspension of disbelief" as a component of certain virtual worlds (Steffen et al., 2019). However, we suggest that in most cases, the system's behavior in terms of gravity, movement, or size of objects should be internally consistent to create a higher level of telepresence.
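To make the notion of internally consistent physics concrete, the following sketch contrasts how long a dropped object takes to reach the floor on earth and on the moon, and how high it rebounds. It is an illustration under stated assumptions, not part of the study: the names `GRAVITY_M_S2`, `fall_time`, and `rebound_height` are hypothetical, air resistance is ignored, and the coefficient of restitution is an arbitrary example value.

```python
import math

# Approximate surface gravity in m/s^2 (standard textbook values).
GRAVITY_M_S2 = {"earth": 9.81, "moon": 1.62}


def fall_time(height_m: float, world: str) -> float:
    """Time in seconds for an object dropped from rest to fall height_m."""
    g = GRAVITY_M_S2[world]
    # From h = 0.5 * g * t^2, solved for t.
    return math.sqrt(2.0 * height_m / g)


def rebound_height(height_m: float, restitution: float = 0.5) -> float:
    """Peak height after one bounce for a given coefficient of restitution.

    The rebound speed is restitution * impact speed, so the rebound
    height scales with restitution squared, independent of gravity.
    """
    return restitution ** 2 * height_m


if __name__ == "__main__":
    for world in GRAVITY_M_S2:
        print(f"{world}: falls 2 m in {fall_time(2.0, world):.2f} s, "
              f"rebounds to {rebound_height(2.0):.2f} m")
```

A virtual moon scene that applied earth-like fall times while otherwise presenting lunar conditions would break the internal consistency discussed above.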

Motion sickness
A specific challenge in VR is motion sickness, also referred to as VR sickness (Mai & Steinbrecher, 2018). It typically results from a mismatch between users' actual movements and the movements that they perceive through the virtual world but is also influenced by human factors (Chang et al., 2020). A common cause for this phenomenon is latency in VR rendering, i.e., the system is not capable of responding to users' movements in real-time (Köse et al., 2020). As technology advances, sensing user motion, calculating the required changes in the virtual scene, and rendering content in real-time will improve and continue to attenuate this issue. However, it currently poses a challenge specifically in highly sophisticated VR environments which require substantial computing power. For atomistic applications where the time users spend in VR is rather short, motion sickness might present less of an issue than holistic VR applications, where users spend a considerable amount of time (Ruddle, 2004). A summary of sections 4.3 and 4.4 is given in Table 2.

Discussion & implications
Due to their immense opportunities in many disciplines, AR, VR, and related reality formats have recently received increased attention. Enthusiasts from companies and research institutions have developed fascinating experiences both through augmentation of the real world (AR) and the creation of virtual worlds (VR). Scholars and industry practitioners, coping with a rapid evolution in this field, have defined and organized terms associated with these developments. Twenty-five years ago, Milgram outlined a technology-focused continuum leading from the real to the virtual, with mixed reality between these poles. Other authors (e.g., Farshid et al., 2018; Flavián et al., 2019) defined mixed reality as a combination of AR and VR, while industry (e.g., Kunkel & Soechtig, 2017, pp. 48-63) has proposed new terms and repurposed (or, as stated by some experts, "misused") existing ones. In short, the current literature contains inconsistent and often conflicting conceptualizations of these realities, resulting in confusion amongst academics, users, and practitioners. Meanwhile, the AR and VR industries have steadily matured to generate multiple billions in revenue each year.
Based on an intense review of the academic literature, industry publications, and expert input, we propose a complementary approach to define, organize, and conceptualize common reality formats. More specifically, the xReality framework separates AR from VR based on whether the physical environment plays a role in the user's experience or not. If yes, the experience is AR; if no, and the experience is virtual, it is VR. In order to specify AR and VR in more detail, the framework provides two continua: the AR continuum ranges from assisted to mixed reality with local presence forming the core distinction between poles. The VR continuum ranges from atomistic to holistic, and the level of telepresence is the primary discriminating factor between these poles. Our findings provide a series of implications for the emerging XR discipline. Importantly, rather than separating managerial from theoretical implications, we sought to understand these perspectives and propose an approach that incorporates both. We argue that many current discrepancies exist because academia and industry management are conceptually separated, and the current work attempts to consolidate these divergent perspectives.
XR is an overarching term used primarily by practitioners to describe "all" forms of new realities (e.g., Dalton, 2021). XR subsumes both AR and VR, as well as their various sub forms. Contrary to extant research, we propose that the term "extended reality" might be misleading since it does not include VR (where reality is replaced, not extended). Therefore, we propose to maintain the term XR (Dwivedi et al., 2020), but use it as an abbreviation for xReality. Practically speaking, the variable x serves as a placeholder for Augmented, Assisted, Mixed, Virtual, Atomistic Virtual, Holistic Virtual, or Diminished Reality.

The xReality framework
The proposed framework presents XR as an umbrella term, with two distinct sub streams: AR and VR, which contain their own continua. This conceptualization differs from existing classifications (e.g., Farshid et al., 2018; Flavián et al., 2019; Milgram et al., 1995; Milgram & Kishino, 1994) in which AR and VR are located on the same continuum. Likewise, to the best of our knowledge, this framework is the first to include all commonly used terms in a coherent framework, including AR, VR, XR, mixed reality, and assisted reality. For instance, many older (e.g., Milgram & Kishino, 1994) and more recent (e.g., Farshid et al., 2018; Flavián et al., 2019) frameworks remain silent on some aspects of reality (e.g., assisted reality). Industry frameworks sometimes incorporate the term XR, but it is frequently not used consistently.
However, contrary to the MR-dominant (e.g., Milgram & Kishino, 1994) and MR-centered view (e.g., Farshid et al., 2018; Flavián et al., 2019), our framework excludes AV. We justify this based on the viewpoints expressed by the academic and industrial informants along with our inspection of both recent academic literature and online search trends. Moreover, following Looser et al. (2004) and Leclet-Groux et al. (2013), the AR/AV distinction is based on the proportion of virtual versus real content, and this is difficult to quantify. Other perspectives suggest that the AR/AV distinction is based on whether virtual content is augmented to the real world, or if the real world is mapped to the digital content, but users may not perceive any difference between these two approaches. Our framework simply argues that if the physical environment is part of the user experience, then it is some sort of AR; if not, it is VR.
Our distinction between AR and VR is not dependent on the devices a person uses, but rather complements more technology-focused definitions (e.g., Azuma, 1997; Zhou et al., 2008). For instance, one could consider a wearable device occluding the real world from the user as VR, but this device may also include cameras to capture the real world for AR applications. Potentially, this would allow users to switch from a VR mode to a video see-through AR mode. However, while the hardware might accommodate both AR and VR, a user could only be in either AR or VR at any given time. Furthermore, shared experiences, often discussed as "metaverses" (such as spatial.io), could also fit into the framework. Here, multiple users located in different physical locations could interact in a shared experience through either AR or VR; some seeing the others as holographic avatars and some interacting from a VR environment. Our framework also acknowledges that not all AR experiences are equal, and the same is true for VR. More specifically, we propose separate continua for AR and VR that describe how 'sophisticated' the experiences are as perceived by the user.
Moreover, managers often want to solve business problems through XR. While we clearly acknowledge that XR cannot solve every problem, our framework further suggests that the distinction between AR and VR, and its sub forms in particular, is important. For instance, if a company wants to guide its production workers through a specific task, AR, in particular assisted reality, is most likely beneficial. On the other hand, mixed reality might be best for letting a customer aesthetically experience a product in their living space. However, if the firm seeks to provide customers with an understanding of their production environment, VR might be the best choice. Nevertheless, in all cases, the availability of devices to the target consumer (e.g., mixed reality devices) is crucial.
For AR, we argue that user experiences can range from a very low functional level (assisted reality) to highly interactive and realistic experiences (mixed reality). We propose the degree of local presence of the experience as the primary distinction on this continuum. Low local presence indicates the assisted reality pole (e.g., simple text overlaid over real world data) and high levels of local presence indicate mixed reality. In true mixed reality, local presence would be so high that users may not be able to distinguish virtual from real content; they would actually experience it as being in their physical environment. However, we acknowledge that higher levels of local presence might be preferable in many, but not in all cases. There might indeed be situations where lower levels of local presence are preferred, such as when providing instructions to an employee (e.g., a worker should clearly be able to distinguish a real cable with high voltage from a virtual one).
In VR, higher levels of telepresence indicate a higher feeling of "being there" in the simulated environment. The xReality framework's VR continuum indicates that user experiences can be described between the end poles of atomistic and holistic VR. Atomistic experiences are typically simply designed, have low levels of interaction, and generally have a more "pragmatic" purpose. Holistic VR experiences are characterized by multi-sensory, complex, social experiences. As in AR, we argue that higher levels of telepresence might be better in many, yet not all, cases. For instance, a higher level of telepresence can lead to a flow experience of "time flying by" and longer use, which might not always be desirable (e.g., an instrumental task such as finding a specific piece of information). Likewise, our informants mentioned various practical arguments as to why lower telepresence VR experiences are acceptable and potentially preferable in certain situations. For instance, although most experts expect rapid improvements in technology, they suggest that simpler devices (which might not enable very high levels of telepresence) will remain more practicable (e.g., not require external computing power or external tracking technology, be lower priced, more flexible in use, or even owned by a large number of people) than highly sophisticated ones.
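The classification logic of the framework, as described in this section, can be summarized in a short sketch. It is one possible reading rather than a formal specification: the class `Experience`, the function `classify`, and the normalization of local presence and telepresence to a score between 0 and 1 are assumptions introduced for illustration, and the framework itself does not prescribe numeric thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    # Is the physical environment part of the user's experience?
    physical_environment_in_experience: bool
    # Perceived local presence (AR) or telepresence (VR);
    # hypothetical scale: 0.0 = low, 1.0 = high.
    presence: float


def classify(exp: Experience) -> tuple[str, str, float]:
    """Return (format, continuum, position on that continuum)."""
    if not 0.0 <= exp.presence <= 1.0:
        raise ValueError("presence must lie between 0.0 and 1.0")
    if exp.physical_environment_in_experience:
        # AR continuum: assisted reality (low) to mixed reality (high).
        return ("AR", "assisted-to-mixed reality (local presence)", exp.presence)
    # VR continuum: atomistic (low) to holistic (high).
    return ("VR", "atomistic-to-holistic VR (telepresence)", exp.presence)


if __name__ == "__main__":
    # Text instructions overlaid on a worker's view: AR near the assisted pole.
    print(classify(Experience(True, 0.1)))
    # Multi-sensory social virtual world: VR near the holistic pole.
    print(classify(Experience(False, 0.9)))
```

The sketch deliberately returns a position on a continuum instead of a discrete label, because the framework treats assisted versus mixed reality and atomistic versus holistic VR as poles rather than categories.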

AR and VR as technologically-mediated experiences
A substantial amount of research conceptualizes xReality through the lens of technology. For instance, Milgram and Kishino (1994) argue that in AR, "real world and virtual world objects are presented together within a single display", and Azuma et al. (2001) define AR as "[a technology which] supplements the real world with virtual (computer generated) objects that appear to coexist in the same space as the real world". Azuma et al. (2001) go on to add three characteristics: first, AR combines real and virtual objects into a real environment and runs interactively; second, AR is in real time; and third, it registers real and virtual objects with each other (p. 34). The current research complements Azuma et al. by conceptualizing XR from a user experience perspective which requires certain technology.

Defining AR in the xReality framework
Augmented Reality is a hybrid experience consisting of context-specific virtual content that is merged into a user's real-time perception of the physical environment through computing devices. AR can further be refined based on the level of local presence, ranging from assisted reality (low) to mixed reality (high).
Several elements of this definition require discussion in more detail. First, determining at what point forms of merged virtual and real objects become AR may remain at times ambiguous. For instance, the first versions of Snapchat Glasses are often discussed as AR, but according to our definition, they would not be considered AR in a strict sense. These glasses capture the real world through a camera, and subsequently (not in real-time) add virtual elements on a user's smartphone or tablet. However, if this was in real-time (e.g., as in the Snap Spectacles AR Developer Edition), it would be AR.
Second, AR content must be associated with a user's physical, real-world context, which might also include the user herself (e.g., through a mirror-like make-up app on a tablet). Azuma (1997) and other scholars argue that augmented content needs to be physically registered to the real world. In other words, from Azuma's perspective, the virtual content must be attached to a specific location or object ("world-stable"), which means that head-stable content (a common use case for AR glasses) would not be AR. We argue that some form of context relevance is required (e.g., the processual context, such as receiving information related to certain work instructions), but world-stable context is not a specific requirement for AR.
Third, the term 'hybrid' echoes the general consensus that AR experiences must consist of digital and real-content, and both need to coexist during the experience. In most cases, the core digital content is visual, but our definition does not exclude user experiences with, for instance, acoustic digital content only.
Fourth, AR is not limited to specific hardware, but each technology has some specific characteristics that determine whether an experience is AR or not:
• In video see-through AR (including AR mirrors where users see themselves, such as in a makeup trial app), the AR experience happens on an opaque screen. Hence, a regular digital picture frame or screen in a living room (without merging real and virtual content in real-time) might influence how a person experiences the room in general, but this would not be considered AR. In contrast, a virtual TV screen (cf. Rauschnabel et al., 2020) which users experience through AR smart glasses with optical see-through technology only would be considered AR. The same would be true for video see-through glasses or AR in live TV programs; the fact that real-world objects are digitized is inconsequential, since they are still displayed as real (as in AR mirrors).
• In optical see-through AR, virtual elements presented on the screen must be part of the experience. That is, a (digital) street sign would not be considered AR, since the virtual content is not integrated but rather a part of the physical object (i.e., the sign).
• In projection-based AR, the projected content must be digitally controlled and be registered to the physical world. A decorative color light would not be considered AR.
• Although most AR discourses typically center around visual content (e.g., Gatter et al., 2022), similar principles may apply in non-visual AR. There, the content must also merge with a specific context in real-time (e.g., a specific person's actions). For instance, a user wearing Amazon's Alexa spectacles (screen-less glasses, basically an Amazon Echo in the spectacles' frame) that react to auditory commands or environmental triggers is considered AR, whereas a PA announcement in an airport not to leave the luggage unattended would not be considered auditory AR.
Finally, we acknowledge diminished reality as a specific sub form of AR that can occur anywhere on the AR continuum. In assisted reality, objects might be blurred out or hidden by a censor bar, whereas they are realistically erased in mixed reality. Diminished reality also works for audio (e.g., active noise reduction in headphones), but diminishing other sensory stimuli (smell or taste) might be challenging.

Defining VR in the xReality framework
Virtual Reality is an artificial, virtual, and viewer-centered experience in which the user is enclosed in an all-encompassing 3D space that is, at least visually, sealed off from the physical environment. VR experiences can lie on a telepresence continuum ranging from atomistic (low) to holistic (high).
As with AR, several elements of this VR definition require discussion in more detail. First, the term viewer-centered implies that the content is typically built around a certain user (we acknowledge second-person VR as a specific sub form which we do not discuss further). Second, visually sealed off implies that a user does not see the physical environment. We reduce this notion to visual senses since temperature, loud noises, or haptics (e.g., wind, surface of the floor etc.) are not very well omittable or controllable. Furthermore, looking at 3D content on a 2D screen and browsing through it with a mouse is excluded from our conceptualization of VR. Third, users see only virtual content (and not content from the physical environment) which can range from static 360-degree views to high-end multi-sensual and fully immersive VR experiences. It is important to acknowledge that we did not receive full agreement on the minimum requirements for the lowest possible level of VR. For instance, some experts suggested that VR requires the possibility for users to manipulate content (in contrast to purely consumable content, e.g., a 360-degree video displayed on an HMD), whereas others argued that looking around and switching apps qualifies as a minimum level of interactivity. Fourth, most discussions in this article revolve around common HMDs. Although our industry experts observed a declining relevance of VR caves, the general propositions of our framework might still apply to them. Fifth, we conclude that in most cases, higher levels of telepresence lead to a better user experience and, thus, are desired. This aligns with the views of our informants that VR experiences situated toward the atomistic endpoint are in many cases perceived to be more "practical" (e.g., mobile, easy-to-use, etc.), whereas user experiences toward the holistic endpoint often require stationary infrastructure and complex tracking technology.

Limitations and future research
This study has several limitations. Our research represents a "snapshot" of terminologies for AR and VR scenarios and devices as they are currently being used. These terminologies and uses will evolve with technological advances and user experience. Our framework and how we look at XR provides a complementary view to much of the extant work on the XR landscape without claiming that the proposed framework is a one-size-fits-all solution. More specifically, we acknowledge that the purpose of this framework is to organize common XR use cases.
We discussed several uncommon cases with our panel of experts, such as a person sitting on a real chair in a VR cave while using an AR device inside of it, which may be unlikely to ever take place. One could argue that these examples represent exceptions where a user experiences elements of both AR and VR at the same time, or where one could argue that the "AR content" is part of a VR experience. However, we clearly acknowledge that these examples require significant justification to be implemented in other frameworks, if this is even possible. With this in mind, we argue that the purpose of the xReality framework is to classify and organize current use cases and devices, which implies a call to reassess the framework later. Moreover, we identified various factors throughout our research that determine an experience's position on the AR or VR continuum, respectively. We acknowledge that this list of factors might not be complete and thus opportunities for further research arise. Hence, our work remains largely conceptual, and future research should be conducted to validate and extend our findings. Table 3 presents more specific suggestions for future academic research that emerged from the current research findings. Here, we distinguish between user-focused (i.e., how users react to XR) and management-focused (i.e., how practitioners should apply XR) research. On a meta level, we provide several broader implications on how academics can shape the future and impact of XR, including ethics and privacy (e.g., Lammerding et al., 2021; Cowan et al., 2021; Finnegan et al., 2021; Rauschnabel et al., 2022). However, this distinction may be perceived as a false dichotomy since the rapidly evolving nature of XR will blur the lines between what is currently considered "academic" or "practitioner" research. One purpose of the current research is to define XR in a way that transcends this distinction.
Much like the contention that approaches from diverse disciplines will be beneficial in understanding XR, merging the perspectives of academics and practitioners will speed our understanding of this unique arena. We hope that the current research contributes here.
During the review process for this paper, Facebook, Inc. changed its name to Meta Platforms, Inc. in an effort to shift the brand's focus from social networking to the creation of an AR/VR driven metaverse. As the next iteration of social networking, Meta seeks to leverage the technologies to simulate social connection, fulfilling a purpose consistent with the Facebook legacy. Future research might apply the xReality framework to the metaverse concept as the differences between XR and VR will become increasingly important.

Table 3. Suggestions for future academic research.

Research could identify the differences in/influences on user behavior (e.g., success factors for pleasant user experiences, risk factors, performance and usability) when interacting through either AR or VR.
The findings of the current research (e.g., Table 1) can inform a research agenda.
A framework that theorizes the feasibility of using AR or VR in different contexts. Furthermore, academic research could develop approaches to the measurement of effectiveness and efficiency of AR and VR applications (e.g., identification of KPIs like engagement, intention for repeated use, etc.).

The scope of AR and VR (referencing P3, P4)
What additional properties/features determine an experience's position on the AR or VR continuum? How can these criteria be measured? What are the psychological constructs (e.g., user goals, information processing styles) that mediate the characteristics? How do these factors impact users?
For which use cases is it necessary to focus specifically on sophisticated mixed reality or holistic VR? When is assisted reality or atomistic VR sufficient or better suited to accomplish organizational goals? How should companies react to differences in users' perceptions? How can these differences be monitored to facilitate effective responses?

On a meta level

Research principles, methodologies, data, and research practices
Future academic research should address practical problems, such as defining the scope of an evolving industry. As is common in the human-computer interaction field, the use of design science research could be beneficial in other disciplines when exploring these topics. Furthermore, XR offers new forms of data that can help understand users from behavioral, physiological, emotional, and attitudinal perspectives (e.g., through tracking via embedded sensors such as eye tracking and motion sensing; embedded surveys; etc.). Scholars and practitioners might also make use of the data about the surrounding physical environment that is gathered through sensors. The above also points to new ethical challenges, such as how scholars can protect respondents physically (e.g., from collisions or motion sickness in VR) and psychologically (e.g., from traumatizing content) as well as from a security (e.g., impostors manipulating the visual output or capturing the input) and privacy (e.g., by collecting user data without prior consent) perspective. Finally, exploring how knowledge can be transferred effectively between industry, academia, and public policy should be an area of future research.

Disciplines
The current research serves as a call for more interdisciplinary research on XR, AR and VR. XR has suffered from inconsistent definition and industrial application, and few strategies have been developed for its effective implementation. Furthermore, innovative ideas from one discipline might lead to legal problems related to another. Interdisciplinary teams of researchers could tackle these challenges better than scholars from a single discipline.

Ethics
Very little is known about the dark side of XR. Future research should investigate how the excessive or repetitive use of XR impacts the physical and psychological wellbeing of individuals and societies at large.

General conclusion
xReality is a dynamic and rapidly developing field that is influenced by technology companies and their marketing departments who aim to differentiate themselves from existing solutions. Consultancies, academics, bloggers, and journalists are shaping the discipline by communicating their own understanding of terms. The current research provides an attempt to structure and organize terminologies through the lenses of industry, academia, and the user. We hope that our framework supports the discipline.

Author statement
All authors contributed equally to the manuscript. Philipp Rauschnabel initiated the project. teractions Conference (airsi) for their valuable feedback on previous versions of this manuscript. Furthermore, we thank the students and managers enrolled in the XR marketing seminars at Universität der Bundeswehr München, Otto-Friedrich Universität Bamberg and MCI Innsbruck. Finally, we acknowledge the constructive feedback and guidance of the guest editors and two anonymous Computers in Human Behavior reviewers.