Abstract

As smartphones, tablet computers, and other mobile devices continue to dominate our digital ecosystem, many industries are using mobile or wearable devices to perform Augmented Reality (AR) functions in their workplaces in order to increase productivity and reduce unnecessary workloads. Mobile-based AR can be divided into three main types: phone-based AR, wearable AR, and projector-based AR. Among these, projector-based AR, or Spatial Augmented Reality (SAR), is the least mature and least recognized type of AR among end users. This is because only a small number of commercial products provide projector-based AR functionalities in a mobile manner, prices of mobile projectors remain relatively high, and many technical problems regarding projector-based AR are still unsolved. Nevertheless, it is projector-based AR that has the potential to solve a fundamental problem shared by most mobile-based AR systems, and its always-visible nature is one good answer to current user experience issues of phone-based AR and wearable AR systems. Hence, in this paper, we analyze the user experience issues and technical issues of common mobile-based AR systems, the recently widespread phone-based AR systems, and the rising wearable AR systems. For each issue, we then propose and explain how projector-based AR can solve the problem and/or enhance the user experience. Our proposed framework includes hardware designs and architectures as well as a software computing paradigm for mobile projector-based AR systems. The proposed design is evaluated by three experts using qualitative and semiquantitative research approaches.

1. Introduction

The digital economy is often described as “an economy based on digital computing technologies”, covering a vast variety of technologies ranging from Internet technologies to reality-based technologies like virtual, mixed, and augmented realities. Nevertheless, along with the growth of the digital economy, social-physical gaps have kept expanding, leading us to a new world where we are more digitally connected with those who are physically far away but interact less with those who are physically nearby. One reason for these gaps is that there is still a huge divide between the digital and physical worlds. At this point, Augmented Reality (AR), a technology for seamlessly augmenting the physical world with digital contents, can be an answer that helps bridge this divide and brings the two worlds closer together, both digitally and physically.

At present, the majority of people in this digital economy age are still not aware of the true potential of AR technologies beyond gaming, entertainment, and technology gimmicks. As Deloitte emphasized in [1], the core of the digital economy is not about creating another unicorn but about “using the latest technology to do what you already do, but better”; this helps explain the secret of recently successful technology products and services, where technology is not used to force users to do new things but to persuade them to do the same old things with better user experiences. Many success and failure stories of AR products have followed this user-experience-oriented path. For example, in the Google Translate smartphone application, the camera-based real-time translation feature (previously known as the Word Lens application) is a use-at-will AR option that attracts many users by providing the better experience of point-camera-to-translate instead of the traditional type-characters-to-translate. A similar story can be found in the famous Pokemon Go game, whose simple AR mode spread the term AR worldwide; for the first time in AR history, the technology was used by masses of people of all ages across many countries. One key to success shared by these two AR products is that they managed to slip their AR features into existing user behaviors. This is in contrast to the story of Google Glass (version 1.0), an AR glass which was once a rising star in AR but failed due to lack of social acceptance. It can be said that the first Google Glass is an AR case study where the latest technology failed to slip into existing user behaviors, resulting in unexpectedly bad user experiences.

Considering recent trends in mobile-based AR technologies, they can be categorized into three types—phone-based (or monitor-based) AR; wearable AR using either an optical see-through or video see-through glass or head-mounted device (HMD); and projector-based AR or Spatial Augmented Reality (SAR), as introduced in [2]. For phone-based AR, a large number of examples, applications, and software are available, ranging from easy-to-use AR content creator and player software for multimedia designers to free AR libraries and toolkits for programmers and developers. As for wearable AR, although the story of Google Glass 1.0 did not have a happy ending, the comeback of Google Glass Enterprise Edition (Google Glass EE) as well as other smartglasses seems to perform well as “a practical workplace tool that saves time and money” [3]. There is also Microsoft HoloLens (developer edition), an optical see-through HMD that helps promote this second type of mobile-based AR among researchers and technology enthusiasts.

Nevertheless, because of the small monitors of phone-based AR and the wearer-only private experiences as well as bulky head-mounted devices of wearable AR, these two types of AR share similar user experience problems: group observation is difficult and face-to-face social interactions are not facilitated. This means that these two types of AR may not be good mobile solutions for bridging the social-physical gap caused by the growth of the digital economy, as mentioned earlier. At this moment, while true 3D holographic display technologies are not ready yet, it is the third type of mobile-based AR, projector-based AR, that can help relieve these user experience problems.

A projector is a supremely programmable light source that allows creating space-efficient and seamless visual displays in an AR manner. Using light projected from a projector, we are able to augment tangible objects with projected imagery, bringing the immediate physical objects and environment to life in a shared space. This controllable light source allows us to transform an arbitrary surface into an interactive touch screen in OmniTouch [4], repaint a historical painting without damaging the invaluable canvas [5], virtually restore an old dwarf’s house in a Disney theme park to a dreamy house [6], enhance a low-dynamic-range photograph to a high-dynamic-range one [7], and simulate AR virtual objects that users can catch and throw to one another across a room [8].

A camera used together with a projector forms a pro-cam device, which has been a popular pairing for creating mobile projector-based AR as it needs no active sensor set up in the environment, allowing interactions that move from space to space. However, in a vision-based pro-cam device, the camera and the human eye generally operate in the same visible light spectrum, leading to two difficult situations in mobile projector-based AR. The first is when projected imagery significantly changes the visual appearance of physical objects as seen by the camera; this situation may lead to misinterpretation or even failure of visual analysis algorithms, especially in mobile projector-based AR systems that heavily rely on vision-based object recognition. The second difficult situation is when a projector needs to project special imagery to facilitate or simplify vision-based recognition: for example, projecting a checkerboard pattern for pro-cam calibration or depth computation, or projecting a predefined marker as a visual message to communicate among many projectors in a peer-to-peer (P2P) style. In this second situation, projecting these special markers or patterns generally cannot be done without impairing or interrupting the normal real-time projection or being noticed by the audience.

At this point, it can be seen that all three types of mobile-based AR possess their own problems, advantages, and disadvantages. In this paper, our purpose is to discuss user experience problems of the already well-known phone-based AR and wearable AR. Then, for each problem, we propose an alternative solution using projector-based AR together with its integrated hardware design, architecture, and software computing paradigm. Section 2 describes works in areas related to mobile projector-based AR in order to familiarize readers with this type of mobile-based AR as well as its common designs and architectures. Section 3 explains user experience problems of existing mobile-based AR systems. Section 4 presents our proposed projector-based AR for enhancing user experiences of mobile-based AR, including hardware designs and a software computing paradigm. Section 5 shows evaluation results and discussion regarding our proposal from experts in AR and related fields. Finally, Section 6 concludes this paper together with our plans for future work.

2. Related Works

This section gives readers a broader background on interactive projection systems, whose coverage includes almost everything about mobile projector-based AR. We discuss previous research in interactive projection systems, particularly ubiquitous and mobile projection as required in mobile-based AR usage scenarios. To narrow down the scope, however, our focus is on previous works whose interactive projection involves difficulties or solutions regarding crosstalk between projected imagery and camera feedback in the visible light spectrum, the two fundamental problems of a vision-based pro-cam device mentioned in Section 1.

One of the most common and easiest solutions for interactive projection using a vision-based pro-cam device is to use a predefined fiducial marker whose essential visibility to the camera is barely degraded by overlaid projection imagery. Examples can be found in many proof-of-concept works like [10–12]. This solution is simple and effective, allowing interactive mobile projection on any object attached or printed with the predefined marker or pattern. However, the obtrusive nature of these markers violates the true spirit of mobile-based AR, where the physical environment is supposed to be dynamic and unprepared.

Avoiding physical overlap between projected imagery and physical objects by careful calculation, as in [13–15], is another solution that keeps the physical environment unmodified. Nevertheless, projector-camera-object geometric calibration needs to be done precisely, and extra time is spent computing nonoverlapping areas for projection. The limited area for projection is a major drawback of this solution, which may make it unsuitable for projection in an area cluttered with physical objects.

Instead of compromising the projected imagery, many works utilize more complicated techniques to make projected imagery nearly invisible or imperceptible in camera feedback: for example, reengineering the internal mechanism of the projection engine in [16], or analyzing unique characteristics of micromirror flipping and the color wheel in an off-the-shelf DLP (Digital Light Processing) projector in [17–19]. Results of these techniques are impressive and require little effort during online computation. However, they highly depend on special cameras (high-speed and externally triggered) and on specific projection technologies that may not last long. Besides, it is not easy to reengineer or control an internal projection mechanism without cooperation from the technology’s owner.

A cheaper and more independent alternative that makes projected imagery invisible in camera feedback is to completely separate their working light spectrums: an active infrared technique. By letting projected imagery stay in the visible light spectrum and camera feedback in the infrared light spectrum, there is no crosstalk to be concerned about, regardless of any context-sensitive imagery being projected onto or around physical objects. For simple scenarios of eliminating projected imagery from camera feedback, the costs of active infrared alternatives are relatively low and require little specialized optical knowledge. By attaching an onboard infrared light source and applying an infrared-pass filter to a regular camera, a pro-cam device is ready for simultaneous infrared visual sensing and color projection; this is the same technique used in surveillance cameras supporting night vision. Examples of using active infrared techniques for interactive projection systems can be found in [20, 21].

Nonetheless, using active infrared to assist interactive projection has already gone far beyond crosstalk elimination between projected imagery and camera feedback. As proposed in [22], projecting one or more predefined markers in infrared enables M2M (machine-to-machine) visual communication among multiple projectors in a P2P style. For these advanced scenarios of projector-to-projector P2P visual communication that does not disrupt normal color projection, a projector capable of simultaneously projecting visible color images and invisible infrared images (i.e., an IR-RGB dual-input projector) is required. This is a difficult requirement at this moment, as such projectors mostly exist as prototypes in research laboratories or as internally sealed parts inside specially designed products like the Kinect sensor, Leap Motion, or Sony Xperia Touch. In the past decade, the most popular usage of vision-based active infrared has perhaps been real-time depth sensing, as this is a feature internally implemented in the Microsoft Kinect sensor, smart TVs, and Leap Motion. Existing interactive projection systems utilizing this active infrared feature of the Microsoft Kinect sensor include [4, 8, 21–24].

Apart from illuminating an environment with uniform infrared light or projecting a desired image in infrared, many works further use active infrared in cooperation with other techniques to increase its power for vision-based interactive projection systems: for example, wrapping a target object with infrared retroreflective material to enhance its visibility in infrared camera feedback as in [21], printing objects with infrared-absorbing ink so that the markers are invisible to humans but visible to an infrared camera as in [25], using two wavelengths of infrared (i.e., 830 nm and 950 nm) to enable two different interactive infrared projections at a time as in [21], and applying time-modulation, coding, or encryption techniques to embed more sophisticated M2M visual communication messages among many interactive projectors.

So far, many designs of mobile pro-cam devices supporting vision-based active infrared have been proposed, mostly to enable specific interactive projection features required by a specific system. In this paper, we focus on proposing designs and architectures of pro-cam devices that come with active infrared features, for the purpose of enhancing user experiences of mobile-based AR. Rather than creating a design that works for one specific interactive projection requirement, our proposal aims to be more general and sustainable, enabling all previously proposed features, from eliminating crosstalk between projection and camera to advanced M2M communication, in a standalone and cooperative manner.

3. User Experience Issues in Mobile-Based AR towards Digital Economy

For driving our digital economy, the strength of AR technologies is that they allow alternative user experiences where a person need not switch attention back and forth between a physical object and an electronic monitor displaying digital contents. This is because the digital contents are already displayed on/around/over the physical object in real time when viewed through proper channels. According to [3, 27–29], Google Glass EE has been reported to help increase productivity and quality of work in manufacturing, logistics, field services, and healthcare at many giant companies including General Electric, Boeing, DHL, Volkswagen, and AGCO, as well as many physician offices. For example, DHL gained increases in supply chain productivity from 6% up to 15%, AGCO achieved a 25% reduction in production time, and there was a 20% reduction in physicians’ administrative paperwork when using Google Glass EE. Also, according to user studies conducted by Fujitsu Laboratories Ltd. in a factory environment, [30] reported that projecting expert hands at real size and real speed directly onto real objects helps novice workers improve the precision of their work, while making the work feel less difficult and easier to memorize and follow.

It can be said that, for the purpose of closing gaps between the physical and digital worlds in order to enhance individual user experiences, phone-based AR and wearable AR are suggested alternatives that have the potential to drive the recent digital economy, according to many available reports and studies. Another shared advantage of phone-based AR and wearable AR is that both can be easily implemented on top of existing mobile devices and platforms like smartphones, tablet computers, Google Glass, Microsoft HoloLens, and so on. Nevertheless, despite their popularity, phone-based AR suffers from the small size of display monitors, wearable AR provides wearer-only private AR experiences, and both share the same drawbacks of limited field of view, difficult group observation, and not encouraging face-to-face social interactions. This means that for the purpose of bringing people closer together and triggering more face-to-face interactions, both digitally and physically, phone-based AR and wearable AR alone may not be enough.

In contrast to phone-based AR and wearable AR, projector-based AR projects digital contents outward and overlays them directly on physical surfaces. Hence, public AR experiences are automatically conveyed by this type of mobile-based AR, allowing more possibilities and usage scenarios for social interaction in both physically and digitally connected manners. Note that these interactive characteristics of real-time projection on surfaces are similar to CAVE (Cave Automatic Virtual Environment) Virtual Reality (VR), a room-sized VR environment where walls, floor, and ceiling are usually painted with interactive light projected from many static projectors. On the one hand, both projector-based AR and CAVE VR share the same problem space of using projected virtual contents to interactively paint physical surfaces. On the other hand, their usage purposes, device setups, and related user experience issues are different. While CAVE VR uses static projectors in a static room and focuses on simulating immersive experiences in a virtual environment, our mobile projector-based AR uses one or more mobile projectors in a dynamic environment and focuses on accurate physical environment interpretation for precise virtual augmentation. User experience studies of CAVE VR include [31], where user interactions with and without virtual hands are compared between the real world and the CAVE VR. This kind of user experience study is specific to VR environments; hence, it is beyond the scope of this paper.

Compared to phone-based AR and wearable AR, projector-based AR is less mature and not yet well known. However, for fulfilling the incomplete user experiences of phone-based AR and wearable AR as discussed earlier, projector-based AR is one good answer, as the size of its projected imagery does not depend on the size of the projection engine and the projected imagery is generally always visible to public bystanders, encouraging spontaneous collaboration among people surrounding the projected AR contents. For example, SurfacePhone [32] presents a smartphone case with an embedded projector that helps extend the visual display from the smartphone’s small monitor to any nearby tabletop-like surface, and [33] projects a user’s online social identity onto a physical surface in order to trigger face-to-face social interaction between the user and other nearby people.

In addition to the user experience issues of wearable AR above, a recent study [34] mentioned that using an optical see-through HMD for mobile-based AR (i.e., Microsoft HoloLens) makes AR contents appear too transparent to users, particularly when dark AR contents are displayed in a bright environment. In order to make AR objects look more solid and less transparent to the HoloLens wearer, [34] proposed the SolidAR framework—an open framework that uses an off-the-shelf projector to project an AR occlusion mask onto the physical surface, resulting in more opaque appearances of AR objects and better localization ability of users when AR objects are overlaid on a complex surface.

Although projector-based AR has strengths that can fill many gaps in recent phone-based AR and wearable AR systems, using a projector for mobile-based AR introduces other user experience issues, as there is always a gap between a technical solution and real-world utilization. One obvious issue of projector-based AR is data privacy, as it is not easy to control who is or is not allowed to see the always-visible contents projected onto a public physical surface. Straightforward solutions to this privacy concern are to switch back to the private experiences of wearable AR or the less public visibility of phone-based AR. Otherwise, user interactions need to be designed more carefully in combination with other interaction tricks, as in [35], where a sheet of retroreflective material is used in conjunction with a chest-worn pro-cam device so that the pro-cam wearer is the only one who sees contents projected onto that sheet.

Other user experience issues of projector-based AR are technical issues that originate from fundamental problems in most mobile-based AR systems. Mobile-based AR implies working in a dynamic, uncontrolled environment that may change unpredictably from time to time. Hence, real-time scene analysis is an important feature required by most mobile-based AR systems in order to provide correct and meaningful real-time augmentation of the physical environment. The problem of real-time scene analysis remains nontrivial, especially in ubiquitous or mobile usage scenarios where fixing sensors inside an environment is not possible or preferable. Vision-based approaches using camera sensors are the most popular solutions in this problem domain for three main reasons. First, recent advances in, and the availability of, embedded cameras in mobile devices have made cameras one of the most accessible sensors. Second, mainstream AR has been dominated by visual outputs; therefore, using a camera to observe the visual outputs is a straightforward and intuitive solution. Third, using a camera allows sensing an arbitrary environment as is, without any predefined setup; hence, it does not violate the true spirit of mobile-based AR.

Nevertheless, in practice, analyzing streaming images of a dynamic scene is difficult due to the big-data nature of images and the high complexity of image analysis algorithms. One popular technique for dealing with these difficulties is to use some kind of active sensor that emits known energy into the unknown environment. Then, instead of directly measuring changes in the unknown environment, the system indirectly calculates them via changes in the known energy. Using a projector and a camera together forms a pro-cam device that can serve as an active sensor facilitating real-time scene analysis in mobile-based AR systems. Considering projector-based AR in the role of an active sensor for real-time scene analysis, crosstalk between visual analysis and projected imagery is a fundamental problem that can lead to interrupted or impaired user experiences while viewing real-time AR contents. There exist many alternative solutions for fixing this user experience issue, as described in Section 2. However, for the sake of faster speed and lighter online computational loads, this paper focuses on proposing hardware-oriented solutions.
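As a concrete illustration of this indirect measurement idea, the minimal sketch below (in Python with OpenCV; the camera index and threshold are placeholder assumptions, not part of our prototype) detects scene changes by comparing live infrared feedback against a reference frame captured under the same known infrared illumination:

```python
import cv2

# Placeholder handle for the IR-pass camera of an active infrared pro-cam
# device; the device index depends on how the OS enumerates the sensors.
ir_camera = cv2.VideoCapture(0)

def detect_scene_change(reference_ir, threshold=25):
    """Active sensing: deviations of the live IR feedback from a reference
    frame (both lit by the same known IR source) reveal changes in the
    otherwise unknown scene."""
    ok, frame = ir_camera.read()
    if not ok:
        return None
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, reference_ir)               # change in known energy
    return cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)[1]
```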

In conclusion, there are six issues regarding mobile-based AR user experiences to be discussed in this paper; some originate directly from phone-based AR and wearable AR, some are fundamental to most mobile-based AR, and some are consequences of using projector-based AR to solve problems of mobile-based, phone-based, or wearable AR. Table 1 summarizes the six issues. Because solutions to the 1st, 2nd, and 3rd issues can be inferred from the nature of projector-based AR as mentioned already in this section, we will not discuss them further. As for the 4th issue, AR contents projected from a projector can be impaired by inappropriate environment lighting and complex surface textures, resulting in overly transparent or confusing visual appearances of AR contents as seen by audiences. However, solving the 4th issue requires real-time radiometric compensation, whose technical details are too deep and beyond the scope of this paper. Hence, our proposal in Section 4 will mainly focus on the 5th and 6th issues of using a pro-cam device as an active sensor for facilitating real-time dynamic scene analysis in mobile-based AR systems.

Note that because all three types of mobile-based AR possess their unique strengths and weaknesses, the purpose of our study is not to replace existing phone-based AR or wearable AR systems but to unite them with projector-based AR in order to enhance their user experiences where necessary. Besides, there exist other alternative solutions without a projector that can also solve user experience issues of mobile-based AR, phone-based AR, and wearable AR. For example, the Pinlight display [36] proposes a new prototype design of an optical see-through AR glass with a compact form factor and a wider field of view; this is accomplished by coding an array of point light sources on an LCD panel placed in front of the eye but out of focus.

4. Mobile Projector-Camera Designs and Architectures

In this section, we propose and discuss how to design projector-based AR systems from a hardware-oriented perspective. The purpose is to fulfill the incomplete user experiences of mobile-based AR, phone-based AR, and wearable AR and to solve technical problems originating from projector-based AR itself. Section 4.1 discusses hardware designs and architectures of the pro-cam device in projector-based AR according to the user experience issues stated in Table 1. Section 4.2 then describes our software computing paradigm to be used with the hardware proposed in Section 4.1. The combination of both will not only solve the user experience issues mentioned in Table 1 but also facilitate visual communication among many projectors in an invisible P2P style for future mobile-based AR applications.

4.1. Hardware

Although this section focuses on mobile-based AR relying on mobile devices, the proposed designs and architectures can be used for other AR systems using steerable or portable projectors as well. In the following subsections, we will refer to the invisible infrared light as IR and the visible color light as RGB.

4.1.1. An IR-RGB Dual-Input Projector

To build an IR-RGB dual-input projector that completely separates the working light spectrums of visual analysis (a background process in IR) and normal color projection for projector-based AR (a foreground process in RGB), techniques for altering an off-the-shelf commercial projector into an infrared-only projector are proposed in [37]. Alternatively, two projectors can be used together to simulate an IR-RGB dual-input projector, as presented in [38]. Figure 1 shows our prototype simulating an IR-RGB projector by stacking a normal RGB pico-projector with an infrared projector (as embedded in the Kinect sensor for Xbox 360). This prototype is, however, not yet capable of projecting an arbitrary infrared image, as the inputs to the infrared projector are fixed by Kinect’s manufacturer. Note that semitransparent papers are used here to blur out the special infrared dot pattern projected by the Kinect. As a result, the prototype in Figure 1 is able to project an arbitrary RGB image and, at the same time, illuminate an environment with static, smooth infrared light.

Nevertheless, suggestions on how to create a projector capable of projecting both infrared and RGB images through a single lens are given in [22, 39, 40]. The papers [22, 40] explain how to add an infrared channel to the existing image-forming mechanism of a projector. This is done by preparing one light source (e.g., LEDs) for the infrared channel so that the projector has four instead of three light sources for four projected channels (i.e., red, green, blue, and infrared). To pass RGB and infrared images to the projector, users can simply create a single four-channel image and treat the fourth channel as the infrared channel, as sketched below.
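A minimal sketch of this four-channel packing follows (Python with NumPy; the projector driver call is hypothetical, as no standard API exists for such dual-input projectors):

```python
import numpy as np

def make_rgbi_frame(rgb, ir):
    """Pack an RGB image and a single-channel IR image into one four-channel
    frame, treating channel 3 as the infrared channel as suggested in [22, 40].
    A second IR wavelength would simply add a fifth channel."""
    assert rgb.shape[:2] == ir.shape[:2], "images must share one resolution"
    return np.dstack([rgb, ir]).astype(np.uint8)          # H x W x 4

# Example: a black visible frame carrying an invisible IR fiducial marker.
rgb = np.zeros((720, 1280, 3), np.uint8)
ir = np.zeros((720, 1280), np.uint8)
ir[300:420, 580:700] = 255                                # IR-only square marker
frame = make_rgbi_frame(rgb, ir)
# dual_input_projector.display(frame)   # hypothetical driver call
```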

The four-channel projection technique can easily be extended so that the projector projects not only one but two infrared wavelengths, as in ROOMProjector [21]. An illustration of this technique is shown in Figure 2. Note that this illustration is not an actual optical implementation, as that depends on each projection technology, which is beyond the scope of our proposal.

4.1.2. A Multiband Camera

In this context, a multiband camera means a camera with the ability to sense images in different wavelengths simultaneously: for example, sensing one infrared image and one RGB image, or one RGB image and two infrared images; these correspond to the four- and five-channel projection mechanisms mentioned in Section 4.1.1. Using this multiband camera instead of the usual single-band camera enables more possibilities for projector-based AR interactions as well as more complex real-time scene analysis techniques.

A design of a multiband camera capable of sensing one visible light image and two infrared light images is presented in [39], using a combination of a hot mirror (reflecting infrared light but transmitting visible light), an infrared beamsplitter (splitting incoming infrared light into two directions), and bandpass filters (passing only light of a specific wavelength). Figure 3 illustrates how to implement multiband cameras. It can be seen that, for the four- and five-channel projection mechanisms, the multiband cameras internally consist of two and three cameras, respectively.
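A minimal acquisition sketch for such a rig follows (Python with OpenCV; the device indices and the assumption that each internal sensor appears as a separate capture device are ours, not from [39]):

```python
import cv2
import numpy as np

# The multiband camera of Figure 3 internally consists of separate sensors;
# the indices below are placeholders for however the OS enumerates them.
cam_rgb = cv2.VideoCapture(0)     # visible light sensor
cam_ir_a = cv2.VideoCapture(1)    # behind, e.g., an 830 nm bandpass filter
cam_ir_b = cv2.VideoCapture(2)    # behind, e.g., a 950 nm bandpass filter

def grab_multiband():
    """Return one five-channel frame (B, G, R, IR830, IR950), or None."""
    frames = []
    for cam in (cam_rgb, cam_ir_a, cam_ir_b):
        ok, f = cam.read()
        if not ok:
            return None
        frames.append(f)
    ir_a = cv2.cvtColor(frames[1], cv2.COLOR_BGR2GRAY)    # collapse to 1 channel
    ir_b = cv2.cvtColor(frames[2], cv2.COLOR_BGR2GRAY)
    return np.dstack([frames[0], ir_a, ir_b])             # H x W x 5
```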

4.1.3. An IR-RGB Dual-Input Projector with a Multiband Camera

Combining the IR-RGB dual-input projector and the multiband camera is a complete design and architecture for our self-contained pro-cam device, capable of solving or relieving the 5th and 6th user experience issues listed in Table 1. As mentioned in Sections 4.1.1 and 4.1.2, internal coaxialization can be done by simply adding a prism or special mirror. Coaxializing everything guarantees that everything that can be projected upon can also be seen by the camera; hence, there is no need to compute the real-time 3D pose of the projectors relative to the camera. Nevertheless, coaxializing the IR-RGB dual-input projector with the multiband camera is not simple, as the single optical path will be responsible for both RGB and infrared light, in both incoming (camera) and outgoing (projector) directions.

Figure 4 illustrates optical guides for coaxializing a pair of one projector and one camera. It can be seen from Figure 4, and from our prototyping experiment in Figure 5, that using a beamsplitter to coaxialize a projector and a camera operating in the same wavelength (i.e., RGB projector with RGB camera, or infrared projector with infrared camera) causes half of the projected light to be reflected away, which is power inefficient, particularly for a mobile projector with limited brightness. Besides, half of the light reflected back from the environment is also reflected away from the camera, degrading visibility in camera feedback.
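A rough light-budget estimate makes this inefficiency concrete. Assuming an ideal, lossless 50/50 beamsplitter shared by a same-wavelength projector and camera, and a surface reflectance $\rho \le 1$:

$$
P_{\text{projected}} = \tfrac{1}{2}\,P_{\text{source}}, \qquad
P_{\text{sensed}} = \tfrac{1}{2}\,\rho\,P_{\text{projected}} = \tfrac{1}{4}\,\rho\,P_{\text{source}},
$$

so at most 25% of the source light completes the round trip, even before real optical losses are counted.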

Coaxializing everything is not impossible, but more complex fabrication and optical mechanisms are required. Because this complicated optical work is beyond our scope, we leave the discussion here. Nevertheless, to keep things simple, some may choose to externally calibrate all parts instead of coaxializing them. This is the implementation found in the Microsoft Kinect sensor for Xbox 360, where the RGB camera, IR-pass camera, and infrared emitter are placed linearly next to one another in the horizontal direction. This Kinect-like alternative requires recovering the 3D pose of the projector (either RGB or infrared) relative to the camera (either RGB or infrared), and there is no guarantee that what can be seen by the camera can also be projected upon. However, the parallax nature of this alternative can be useful for tasks like depth sensing and 3D recovery.
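For the externally calibrated alternative, the relative pose between, say, the RGB and IR cameras can be recovered with standard stereo calibration. The sketch below (Python with OpenCV) assumes a checkerboard that shows contrast in both visible and infrared feedback, and per-camera intrinsics obtained beforehand with cv2.calibrateCamera; the pattern and square sizes are placeholder values:

```python
import cv2
import numpy as np

pattern = (9, 6)                 # inner corners of the checkerboard (assumed)
square = 0.025                   # square size in meters (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

def rgb_to_ir_pose(rgb_imgs, ir_imgs, K_rgb, d_rgb, K_ir, d_ir, image_size):
    """Recover rotation R and translation T mapping points from the RGB
    camera frame into the IR camera frame, from paired checkerboard views."""
    obj_pts, rgb_pts, ir_pts = [], [], []
    for rgb, ir in zip(rgb_imgs, ir_imgs):
        ok1, c1 = cv2.findChessboardCorners(rgb, pattern)
        ok2, c2 = cv2.findChessboardCorners(ir, pattern)
        if ok1 and ok2:                      # keep only views seen by both
            obj_pts.append(objp)
            rgb_pts.append(c1)
            ir_pts.append(c2)
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, rgb_pts, ir_pts, K_rgb, d_rgb, K_ir, d_ir, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)       # intrinsics already known
    return R, T
```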

4.1.4. A Laser Projector

Another part of our vision for a fully developed projector-based AR hardware architecture is short-throw laser projection. Laser-based projection allows both RGB and infrared projected images to stay in focus regardless of the 3D geometry of the surface or changes in depth. Nevertheless, according to the survey in [41], new mobile RGB-laser projectors have been absent from the market since 2011, whereas infrared-laser projectors, to our knowledge, have barely been discussed as mobile projectors. Therefore, although it is definitely useful for mobile scenarios, we exclude this focus-free ability from our current proposal until the technologies are more ready.

4.2. Software

The main software requirement for the proposed design in Section 4.1 is the capability to perform real-time image processing and computer vision. Four main computational modules for using the proposed pro-cam design in an interactive manner are illustrated in Figure 6. Input to the combined module is light from the environment, including visible light and infrared light. Outputs from the combined module are two independent projected images: one RGB image (visible light wavelength) and one invisible infrared image. For a fully computed system consisting of all four modules, the responsibility of each module is as follows:

(i) Module A is responsible for synchronizing or calibrating the RGB and infrared images so that both images can be used together correctly during analysis; the synchronization and calibration apply not only to cameras but also to projectors. This module is crucial for the noncoaxialized design, where the optical axes of the RGB and infrared cameras/projectors are not coaligned and there is parallax between fields of view. An example is shown in camera_rgb and camera_ir of Figure 6: although both images represent the same scene, object positions in the two images are not coaligned.

(ii) Module B is responsible for processing, analyzing, or interpreting the two images (i.e., camera_rgb and camera_ir in Figure 6) using image processing and computer vision algorithms in order to extract the required knowledge from both images.

(iii) Module C uses the results from B to create real-time AR projection responses (i.e., project_rgb1 and project_ir1 in Figure 6) that interact appropriately with the immediate environment. Note that the implementation of this module varies, depending on each system’s requirements.

(iv) Module D is often found in smart projection systems, including projector-based AR systems, where the original images are transformed before actual projection. Generally, this refers to geometric [4, 14, 18, 24, 42–44] and photometric (a.k.a. radiometric [5, 7, 24, 26, 45–47]) compensation, whose purpose is to precompensate the projected imagery so that the image, when overlaid on the surface, looks as similar as possible to its originally intended appearance, regardless of the surface’s geometry, color, texture, and so on. An example is illustrated in Figure 6, where the original project_rgb1 and project_ir1 are altered both geometrically and photometrically into project_rgb2 and project_ir2.

Nevertheless, as noted in Figure 6, modules A and D can be omitted: the two camera images may go directly to module B, and the projected images generated by module C may be projected out directly. Omitting module A corresponds to cases where a system does not require cointerpretation or coanalysis between RGB and infrared camera images or between projector and camera. Skipping module D is for a system that is not concerned with adaptive projection on an arbitrary surface. A minimal skeleton of this paradigm is sketched below.
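The following skeleton (Python with OpenCV/NumPy) shows one possible arrangement of the four modules; every function body is a trivial stand-in for the system-specific logic described above, and the homography, threshold, and gain values are illustrative assumptions only:

```python
import cv2
import numpy as np

def module_a(cam_rgb, cam_ir, H=np.eye(3)):
    """A: register the IR image onto the RGB view (H comes from calibration;
    the identity stands in for a perfectly coaxial rig)."""
    h, w = cam_rgb.shape[:2]
    return cam_rgb, cv2.warpPerspective(cam_ir, H, (w, h))

def module_b(cam_rgb, cam_ir):
    """B: scene analysis; a bright-blob threshold stands in for real
    recognition of objects, hands, or markers."""
    mask = cv2.threshold(cam_ir, 200, 255, cv2.THRESH_BINARY)[1]
    return {"mask": mask}

def module_c(scene):
    """C: compose the two responses: a visible RGB image for the audience
    and an invisible IR image (e.g., a P2P marker for other projectors)."""
    h, w = scene["mask"].shape[:2]
    project_rgb = np.zeros((h, w, 3), np.uint8)
    project_rgb[scene["mask"] > 0] = (0, 255, 0)   # highlight detected areas
    project_ir = np.zeros((h, w), np.uint8)        # IR payload would go here
    return project_rgb, project_ir

def module_d(project_rgb, project_ir, gain=1.0):
    """D: placeholder precompensation; real systems apply per-pixel geometric
    and photometric correction for the actual surface."""
    fix = lambda img: np.clip(img.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return fix(project_rgb), fix(project_ir)

def pipeline(cam_rgb, cam_ir, coaxial=True, adaptive=True):
    """Full loop; modules A and D are skipped exactly as described above."""
    rgb, ir = (cam_rgb, cam_ir) if coaxial else module_a(cam_rgb, cam_ir)
    out_rgb, out_ir = module_c(module_b(rgb, ir))
    return module_d(out_rgb, out_ir) if adaptive else (out_rgb, out_ir)
```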

Finally, as our proposed design is intended for real-time mobile usage, everything in the environment may be dynamic, and interactive responses must be updated in real time. The key problem is that real-time interactive projection in a mobile environment introduces huge computational loads that are mostly beyond the limits of recent mobile devices: for example, 3D depth recognition in module B and computing real-time photometric compensation values for every single pixel in module D. Hence, to realize this software in practice, we strongly recommend adding specific hardware or processing units to speed up the calculation, particularly for time-consuming and frequently used tasks.

As for the efficiency of visual analysis with a pro-cam device in an active infrared environment, experiments conducted in [9] conclude that, although in theory indoor ambient light should make no difference to visual analysis of active infrared camera feedback, in practice visual recognition on RGB camera feedback is noticeably more precise and stable than on IR camera feedback.

5. Evaluation and Results

In this section, we evaluate, from experts’ points of view, how well our proposed design in Section 4 can solve or relieve user experience issues of mobile projector-based AR, particularly the 5th and 6th issues listed in Table 1. To obtain in-depth information, we applied both qualitative and semiquantitative research methodologies using a combination of open-ended and closed-ended interview questions as well as the think-aloud protocol [48]. Three experts in AR and related fields were involved in this evaluation; we refer to them as E1, E2, and E3, respectively. Expert E1 is an active senior game programmer/developer working at one of the world’s top-ranked and most famous game companies. Expert E2 is an academic researcher in image processing and computer vision, currently working as a senior data scientist for an international private company. Expert E3 is an active AR academic researcher. Note that the academic records of experts E2 and E3 include many international publications in journals with impact factors and high quartile rankings in SJR (Scimago Journal and Country Rank).

Each one-to-one interview session was designed to take 20–30 minutes per expert, following the step-by-step procedure below:

(1) Current landscape of mobile-based AR (freestyle opinion): we use open-ended questions to ask what the expert thinks about the current landscape of AR, including problems and their suggestions (if any). The same set of questions is repeated three times, once for each type of mobile-based AR.

(2) Current landscape of mobile-based AR (Likert rating scale): we ask the expert to rate each user experience problem listed in Table 1 for all three types of mobile-based AR. The five-point Likert scale is interpreted here as 5: very important, 4: important, 3: fairly important, 2: slightly important, and 1: not important.

(3) Evaluation of our proposal (Likert rating scale and freestyle opinion): after explaining the concept of an IR-RGB dual-input projector with a multiband camera as proposed in Section 4, we use open-ended questions to ask the expert’s opinion on using the proposed design to solve the 5th and 6th user experience problems in Table 1. Then, we use closed-ended questions and ask the expert to rate our proposal on five attributes—novelty, practicality, (helping improve) mobile-based AR user experiences, difficulty (of actual implementation), and possibility in commercial markets. The five-point Likert scale is interpreted here as 5: very high, 4: high, 3: fair, 2: low, and 1: very low.

Tables 2, 4, and 5 and Figures 7 and 8 summarize our evaluation results. Using an interval of 0.80, our semiquantitative results in Tables 4 and 5 are interpreted as in Table 3.
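This interval width follows the standard equal-width partition of a five-point Likert scale (our assumption, though it is consistent with every rating-to-label mapping reported below):

$$
w = \frac{5 - 1}{5} = 0.80,
$$

yielding the bands 1.00–1.80, 1.81–2.60, 2.61–3.40, 3.41–4.20, and 4.21–5.00 for the five labels from lowest to highest.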

According to the results in Tables 2 and 4 and Figure 7, the problem of AR not encouraging face-to-face social interactions is rated lowest (1.78 = not important), whereas the problem of crosstalk is rated highest (4.67 = very important). Together with the qualitative results from the interviews, this confirms our idea that projector-based AR should be utilized as an alternative way of experiencing AR, not a replacement for existing phone-based or wearable AR. This is because projector-based AR not only introduces new AR user experience problems but also adds complexity and difficulty to existing ones.

Finally, according to the results in Table 5 and Figure 8 as well as the qualitative information from the interviews, our proposed design of an IR-RGB dual-input projector with a multiband camera is rated 3.67 (= high) for both novelty and practicality. The highest- and lowest-rated attributes of our proposal are implementation difficulty (4.50 = very high) and possibility in commercial markets (2.67 = fair). According to the interviews, the experts had no doubt that the proposed design has the potential to solve or relieve the problems of difficult visual scene analysis and crosstalk in projector-based AR. However, its high complexity (and probably high price) contradicts the current fact that projector-based AR is the least known and least used type of mobile-based AR among end users. As long as projector-based AR cannot prove that its usage scenarios are worth dealing with all these technical difficulties, such questions will continue to come from both experts and end users.

6. Conclusion and Future Works

In this paper, we discuss user experience issues of mobile-based AR in the current digital economy ecosystem. The discussion includes issues currently found in phone-based AR and wearable AR, the two popular types of mobile-based AR. For each issue, we describe how projector-based AR, the least mature type of mobile-based AR, can be used to enhance that user experience. In addition, we propose a framework for hardware design and architecture as well as a software computing paradigm in order to efficiently and sustainably deal with current and future user experience problems originating from the actual utilization of mobile-based AR systems. Our proposed design is evaluated by three experts from related fields.

The framework proposed in this paper can be considered a guideline for future IR-RGB interactive mobile projection systems, particularly those with high expectations of intuitive, context-sensitive interactions with users or physical objects, like mobile-based AR systems. Our future work can be categorized into two tasks that can be conducted in parallel. The first task is to perform nonexpert user studies on mobile AR user experience issues and compare the results with those presented in this paper. The second task is to create the next prototype of the IR-RGB dual-input projector with a multiband camera, including performance evaluation for each user experience issue focused on in this paper.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding the publication of this paper.