The Good News, the Bad News, and the Ugly Truth: A Review on the 3D Interaction of Light Field Displays

Abstract: Light field displays offer glasses-free 3D visualization, which means that multiple individuals may observe the same content simultaneously from a virtually infinite number of perspectives without the need for viewing devices. The practical utilization of such visualization systems includes various passive and active use cases. In the case of the latter, users often engage with the utilized system via human-computer interaction. Beyond conventional controls and interfaces, it is also possible to use advanced solutions such as motion tracking, which may seem seamless and highly convenient when paired with glasses-free 3D visualization. However, such solutions may not necessarily outperform conventional controls, and their true potential may fundamentally depend on the use case in which they are deployed. In this paper, we provide a review on the 3D interaction of light field displays. Our work takes into consideration the different requirements posed by passive and active use cases, discusses the numerous challenges, limitations and potentials, and proposes research initiatives that could progress the investigated field of science.


Introduction
As we live in a 3D world, the need for its faithful 3D representation is not surprising. While photography was introduced back in 1839, English scientist and inventor Sir Charles Wheatstone was already utilizing his work on stereopsis [1], which was initially based on drawings, to create photographic stereoscopic pairs and exhibit them via stereoscopes in the early 1840s. Stereoscopes gained immense popularity in the subsequent decades, partially due to the stereoscopes of Sir David Brewster [2,3] and of Oliver Wendell Holmes [4]. The principle of stereoscopes is to present the viewer with two photographs of the same object or scene, captured from different perspectives. Technically, one image is allocated to each eye.
Although we have come a rather long way since the era of stereoscopes, contemporary digital stereoscopic 3D (S3D) imaging is evidently based on the same concept. However, the concept also extends to the need for a viewing device. Today, use case contexts reach far beyond the entertainment of individuals via stereoscopic photographs, and the modern applications of 3D visualization technologies may benefit tremendously from autostereoscopy, that is, the viewing-device-free visualization of S3D contents. Yet, it is important to highlight that the following technologies are not actually autostereoscopic, as they achieve 3D visualization without generating stereoscopic pairs. Such technologies are commonly labeled as "glasses-free 3D", as no glasses or any other viewing devices are required.
The first technology that may come to mind is holography. It was initially proposed by Hungarian physicist Dennis Gabor [5][6][7] in the 1940s but only became a reality after [...]
Projection-based light field displays, which also appear in the scientific literature as "projector-based light field displays", use an array of optical engines (i.e., projectors) and a reflective holographic screen. Possibly the most well-known implementations of projection-based light field displays are the HoloVizio displays of Holografika [33][34][35], first introduced roughly 18 years ago. These displays provide horizontal-only parallax (HOP), which means that vertical changes in the angle of observation do not affect the perspective. It is feasible to create vertical-only parallax (VOP) displays; however, they are not practical, as the human eyes are horizontally separated, and the vast majority of changes in the angle of observation in utilization scenarios are also horizontal. Supporting both horizontal and vertical parallax classifies the visualization system as a full parallax (FP) display.
Light field displays are slowly but surely emerging. As light field visualization is an expensive and resource-demanding technology (e.g., high power consumption, great computation capacity, massive data size, etc.), not only are light field displays yet to penetrate the consumer market, but the number of research institutions that have access to such displays is quite limited as well. This, of course, does not stifle innovation, as many institutions propose and develop their own prototypes. Beyond the any-size, any-aspect and any-shape light field display of Holografika [36], there are the layer-based solutions (i.e., multiple LCD layers are used to create the depth of field) of Teng and Liu [37], Alpaslan and El-Ghoroury [38], and Lanman et al. [39]; the integral imaging methods (i.e., a micro-lens array transforms the light rays) of Zhao et al. [40], Yu et al. [41] and Lee et al. [42]; the projection-based implementations of Zhong et al. [43], Shim et al. [44], and Jang et al. [45]; and many more.
Light field displays can be used in numerous different contexts. At the time of writing this paper, while some use cases are still in a purely conceptual phase, others have specific system designs or are already implemented on the level of prototypes. For example, designs for large-scale light field cinema systems (i.e., those with the size of conventional cinema standards) have been proposed [46], yet constructing these systems would be greatly challenging and excessively expensive, not to mention the lack of appropriate contents. Among the largest state-of-the-art systems are screens with 140-inch [47] and 162-inch [48] diagonal sizes, while even the smaller conventional 2D movie theatre screens are more than 50% larger. On the other hand, light field telepresence prototypes have already been implemented and tested [49][50][51]. Naturally, use cases that may utilize general-purpose light field displays are not affected by the emergence of dedicated devices.
The use cases of light field visualization may be either passive or active. In the context of passive use cases, the observer has no option for human-computer interactions (HCI); the individual is purely an observer of the visualized content and has no means of interaction. Active use cases allow the observer to provide input to the system via a control mechanism. A classic example for interaction is the adjustment of the viewing parameters (e.g., rotation) of a static model or scene, which may be executed by using either conventional controls (e.g., the keyboard of the system's computer) or more advanced solutions (e.g., gesture recognition). For such a glasses-free 3D visualization technology, using a touchless user interface (TUI) may seem to be the evident choice to maximize the potential of active use cases. In fact, it is possible to provide a light field user interface for the user to enable proper 3D interaction. For example, such a solution may combine light field visualization within arm's reach and hand gesture recognition via a motion-gesture sensor.
However, advanced user interfaces are not necessarily better than conventional ones in every single aspect. One particular aspect can be the precision of the input. While it may be marginal for certain use cases, it may be of paramount importance for others. Another important performance metric can be task completion time, which may be crucial in professional contexts, as well as in other activities (e.g., competitive gaming).
The comparative study of Pittarello et al. [52] investigated different aspects of 3D interaction for keyboard and mouse setups, gamepads, and Leap Motion, an optical hand-tracking module. The studied aspects were usability, emotional involvement, cognitive involvement, aesthetics, novelty, the will to play again, and physical fatigue. The obtained results clearly indicate that optical hand tracking under-performs in terms of usability, while it induces the smallest extent of physical fatigue. Additionally, optical hand tracking resulted in the highest number of task errors in the experiment. Similar findings are reported by the work of Ardito et al. [53], which used a Nintendo Wii Remote (also known as WiiMote), as well as the other two conventional controls. The collected data show that the mean task completion time for the WiiMote is nearly 50% higher compared to the keyboard and mouse and the gamepad. Moreover, the users expressed that handling the WiiMote was roughly twice as difficult as the other controllers, and this was also reflected in subjective measurements of personal preference and satisfaction.
While the experiments mentioned above were carried out for 2D displays, head-mounted displays (HMDs), such as virtual reality (VR) devices, may benefit more from advanced controllers. Indeed, commercial VR systems now have their own special controllers that dominate use cases of entertainment, and digital gloves (also known as smart gloves or haptic gloves) are being investigated by the scientific community [54][55][56][57][58] and have emerged on the consumer market as well [59,60]. For light field visualization, such HCI solutions may be considered, yet the device-free nature of optical tracking may match the glasses-free nature of light field displays better.
In this paper, we review the state-of-the-art solutions for 3D interactions with light field displays. Our work specifically focuses on projection-based light field visualization, and thus, we do not address near-eye light field displays, which are analogous to HMD-based systems on many fronts. The analysis presented in this paper separately discusses the many passive and active use cases of light field visualization. The latter may vary greatly in terms of HCI, as certain use cases may have more specialized requirements toward the aspects of interaction performance. We also highlight how specific use case archetypes may be implemented as either passive or active and demonstrate potential hybrid multiuser solutions. Essentially, this paper elaborates on the good news (i.e., the advantages and the potentials) and the bad news (i.e., the challenges and the limitations) about 3D interactions with light field displays, as well as the ugly truth (i.e., that 3D interaction is not universally superior to conventional controls, and in fact, may be easily outperformed by such). Furthermore, our work proposes future research efforts that are necessary to advance our understanding of the related HCI, which may ultimately assist the emergence of the active use cases of light field visualization.
The remainder of this paper is structured as follows. The brief history of light field visualization and the current state of the research efforts are reviewed in Section 2. The different use cases are listed and categorized in Section 3. The findings on 3D light field interactions are analyzed in Section 4. The discussion on HCI is presented in Section 5. The paper is concluded in Section 6.

Historical Overview and State-of-the-Art Research of Light Field Visualization
Although real light field displays emerged in the past two decades, the concept of such a form of visualization was already conceived in the beginning of the 20th century. In 1908, Gabriel Lippmann [61] proposed integral photography, which relied on the same principle as today's integral imaging solutions. Basically, a micro-lens array is employed in order to facilitate small differences in perspective. Even in the case of rendering for state-of-the-art projection-based light field displays, if the input is an array of 2D images (a one-dimensional array for HOP and VOP content, and a two-dimensional array for FP content), then the differences between these images also encapsulate such small disparities.
The technical term "light field" was introduced by Andrey Gershun [62] in 1936, although Michael Faraday already considered light as a field in 1846 [63]. The complex characteristics of light fields were quantified by Adelson and Bergen in 1991 through the plenoptic function [64]. The plenoptic function describes the intensity (i.e., radiance) of all the light rays in a region of 3D space. It may be parametrized by a position (three coordinates) and a direction (two angles). This parametrization is the basis of 5D plenoptic modeling or image-based rendering (IBR), as proposed by McMillan and Bishop in 1995 [65]. The 5D model represents light fields as a set of panoramic images captured from different positions in space. In 1996, Levoy and Hanrahan presented the light slab representation [66]. It builds on the idea that radiance does not change along a line in free space, where free space refers to regions free of occluders. The light slab is a 4D representation, as a ray can be described by its intersections with two planes (i.e., two point pairs). Such planes are often illustrated as parallel planes placed in front of and behind the 3D object or scene, but their position is arbitrary. In fact, there are numerous alternative 4D representations, such as two point pairs on the surface of a sphere. The 4D light field is also known as a lumigraph [67]. In 2002, Yang et al. introduced a capture system composed of 64 web cameras in an 8-by-8 array [68]. The first commercial plenoptic camera was the Raytrix R11 in 2010, followed by the consumer-grade light field cameras of Lytro in 2012. Regarding visualization, besides the aforementioned HoloVizio displays, the 360-degree system presented by Jones et al. [69] in 2007 should be mentioned, as well as the different works of Lanman et al. and Wetzstein et al. [39,70,71,72].
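To make the two-plane idea concrete, the following minimal sketch maps a ray to four numbers by intersecting it with two parallel planes. The function name, the plane positions (z = 0 and z = 1), and the coordinate labels (u, v, s, t) are illustrative assumptions and are not taken from the cited works.

```python
def ray_to_light_slab(origin, direction, z_uv=0.0, z_st=1.0):
    """Return (u, v, s, t) for a ray given by a 3D origin and direction.

    (u, v) is the intersection with the plane z = z_uv and (s, t) is the
    intersection with the plane z = z_st. Both plane positions are
    assumptions chosen only for illustration.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-12:
        # A ray parallel to the planes never crosses them.
        raise ValueError("ray is parallel to the parametrization planes")
    # Solve oz + k * dz = plane position for each plane.
    k_uv = (z_uv - oz) / dz
    k_st = (z_st - oz) / dz
    return (ox + k_uv * dx, oy + k_uv * dy, ox + k_st * dx, oy + k_st * dy)


# Two different starting points on the same line map to the same 4D
# coordinates, which mirrors the statement that radiance is constant
# along a line in free space.
slab_a = ray_to_light_slab((0.0, 0.0, -1.0), (0.5, 0.25, 1.0))
slab_b = ray_to_light_slab((1.0, 0.5, 1.0), (0.5, 0.25, 1.0))
```

Because the position along the line drops out, one of the five plenoptic dimensions becomes redundant in free space, which is why four coordinates suffice in this sketch.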
As the representation of light field contents is significantly larger than that of conventional 2D and S3D contents, compression is an essential research topic. Among the most notable initiatives is JPEG Pleno [73][74][75]. At the time of writing this paper, there are already five ISO/IEC JPEG Pleno standards published, and two more are currently under development. Other proposals of light field compression include the works of Magnor and Girod [76], Liu et al. [77], Chen et al. [78], and Jiang et al. [79,80].
Beyond these topics, there are numerous research questions that are still to be thoroughly investigated in the area of light field QoE [128], such as immersion, interaction, inter-user effects, perceptual fatigue, and many more. In Section 4 of this paper, we analyze the few published scientific contributions that address 3D light field interaction.

Use Cases of Light Field Visualization
In this section, we classify and describe the use cases of light field visualization. In the cases of passive utilization, individuals do not directly interact with the system, and thus, HCI either does not play an essential role or it is completely absent from the users' perspective. Note that the classification of the vast majority of use cases depends on the implementation (i.e., both use case classes are feasible).

Prototype Review
Prototype review is one of the most common instances of industrial visualization via light field technology. Usually a static model or scene is visualized, but it is also meaningful to display animated content. During the passive implementation of a prototype review, individuals (e.g., stakeholders, developers, etc.) may move within the field of view (FOV), or rather the valid viewing area (VVA), of the light field display to observe the content (e.g., a mechanical component) from various perspectives. The FOV is the angle measured from the screen of the display in which light rays are reproduced, while the VVA takes into consideration the actual shape of the area, defined by the overlapping spread of light rays (also known as emission cones), and may also limit the viable viewing distance through light ray density (i.e., angular resolution). Passive prototype review is feasible if and only if the orientation of the prototype and the display FOV together enable all the perspectives of interest and the smallest detail of interest is properly perceivable at the default content zoom.
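The difference between the FOV and the VVA can be illustrated with a simplified top-down geometric test for an HOP display. The sketch below assumes a flat screen centered at the origin, emission cones that are all centered on the screen normal and as wide as the FOV, and an optional distance limit standing in for the angular-resolution constraint. All names and parameters are hypothetical, and real displays may have differently shaped viewing areas.

```python
import math


def in_valid_viewing_area(x, z, screen_width, fov_deg, max_distance=None):
    """Check whether a viewer at (x, z) sees the whole screen validly.

    x is the lateral offset from the screen center and z is the distance
    from the screen plane, both in the same unit as screen_width.
    """
    if z <= 0:
        return False  # the viewer is behind or on the screen plane
    if max_distance is not None and z > max_distance:
        return False  # assumed limit imposed by light ray density
    half_fov = math.radians(fov_deg) / 2.0
    half_width = screen_width / 2.0
    # The viewing angle changes monotonically across the screen, so it is
    # enough to test the emission cones at the two screen edges.
    for edge_x in (-half_width, half_width):
        angle = math.atan2(abs(x - edge_x), z)
        if angle > half_fov:
            return False
    return True
```

Under these assumptions, a viewer standing centrally but too close to a wide screen falls outside the VVA even though the viewer is well within the FOV angle measured from the screen center.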

Medical Imaging
The primary considerations of passive prototype review for feasibility are also applicable to passive medical imaging. However, such implementations are less likely to be practical, as limitations in observation may result in decreased diagnostic accuracy. While content interaction (i.e., rotation and zoom) may not necessarily be utilized for each and every instance of medical data analysis, the lack of such an option may increase the percentage of false negatives (i.e., a health-related issue remains undetected) and false positives (i.e., a non-present health-related issue is falsely confirmed).

Resource Exploration
In the context of light field display systems, the use cases of resource exploration primarily refer to the visualization of oil and gas resources [129,130]. The aforementioned considerations regarding perspectives and details apply to the passive instances of resource exploration as well; however, these use cases are rarely time-sensitive, unlike many applications of medical imaging. At this point, the option of automatic rotation (i.e., the visualized object rotates slowly in a given direction) needs to be mentioned, which may be used for industrial use cases such as prototype review and resource exploration. For medical contexts, it may be feasible, but mostly for training and education purposes (e.g., the visualization of an organ affected by a certain disease).

Training and Education
Various instances of training (particularly specialized training) and education support the passive implementations of the utilization of light field visualization. In such cases, the perspective of interest is provided to the individual by default, or the content is animated, and changes to the parameters of visualization are not possible (e.g., due to the lack of HCI). Evidently, no interaction at all is required from the individual for passive use cases of training and education. A straightforward example of such a use case can be the provision of 3D educational multimedia.

Digital Signage
Digital signage is a very typical utilization of visualization technologies for commercial purposes (e.g., advertising on billboards). The emergence of 3D digital signage has long been expected, as the primary function of such visuals is to attract attention, which can be achieved by the content, by the visualization, or by their combination. Digital signage in general is a dominantly passive use case (i.e., the individual observes an eye-catching digital billboard), although active implementations are also possible, particularly for small-scale units.

Cultural Heritage Exhibition
Cultural heritage exhibition (e.g., an exhibition at a museum) as a passive use case is rather straightforward. Individuals observe either a static 3D object or scene, or animated content. Object orientation is often not an issue. For example, for the visualization of the 3D life-size replica of a classical-era vase, if only a given portion of the vase is decorated, then that portion should face the audience; or if the entire circumference of the vase is imbued with unique imagery, then the vase should be slowly rotating.

Traffic Control
Of all the different types of traffic control, light field visualization could benefit air traffic control the most, as it may show accurate vertical distances between aircraft. In passive implementations, the operator purely observes the visualized region and does not interact with it. Interaction with the system is not necessary, and is not necessarily beneficial, as explained in the active variant of this use case.

Driver Assistance Systems
Driver assistance systems in the investigated context are technically light field windshields. The main rationale behind the implementation of such systems is that vehicle- or traffic-related information is visualized close to where the driver's visual attention should be, namely, on the road. There are numerous applications based on Vehicle-to-Everything (V2X) communication, yet the data they convey is shown either on a smartphone or on the digital dashboard of the vehicle [131][132][133][134]. Such light field solutions are particularly beneficial to the driver's reactive capabilities to V2X-based information [135]. Also, as the visualized content itself is 3D, the driver does not need to regularly switch between 2D and 3D visuals. In passive implementations, the driver receives relevant information via the visualization system.

Defense Applications
There are multiple utilization scenarios for light field visualization in the context of defense applications. They are economically feasible as well, since the military tends to have a generous budget. One particular form of light field technology in this context is known as 3D battlespace visualization [136]. It is analogous to air traffic control in many aspects, yet interaction may be greatly beneficial to such systems.

Telepresence
One can look at the telepresence use case as a "3D video call", although it is more than that. The purpose of true-to-scale systems, such as the prototype of Cserkaszky et al. [50], is to enable a sense of presence via realistic size and glasses-free 3D visualization. Other implementations are feasible as well, such as the levitating system of Zhang et al. [51], which only displays the head of the individual; or the cylindrical teleconferencing system of Gotsch et al. [49]. 3D teleconferencing can also be implemented by general-purpose displays, such as the Looking Glass light field display in the design of Blackwell et al. [137]. In a passive use case, the individual interacts with others, and not with the system itself.

Home Multimedia Entertainment
The final use case which may be both passive and active is home multimedia entertainment. In its passive form, it is analogous to simply watching a movie on the television. Of course, its active variants go beyond the option to pause the content or to change its sound volume. Moreover, of all the use cases discussed so far, home multimedia entertainment is the only one that necessitates a privately owned light field display; even telepresence can begin to emerge (i.e., in professional contexts) without consumer-grade light field visualization systems.

Cinematography
The only use case in this list that is strictly passive is cinematography. As stated earlier, while large-scale light field cinema systems have already been proposed [46], they are greatly challenging to implement. However, such solutions carry immense potential for innovation on many fronts. First of all, ticket pricing for light field cinema would differ quite a lot from conventional pricing schemes, as closer seats could provide better 3D perception of the movie. Basically, the perceived density of light rays fundamentally depends on the viewing distance. This means that it is more difficult to address the two eyes of the observer with at least two distinct light rays at greater distances. Moreover, light field cinema could open various artistic options to be explored. For instance, storytelling could be affected by the perspective (e.g., some details could be perceptually occluded from one perspective, while visible from another).
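The dependence of 3D perception on viewing distance can be sketched with simple geometry. Seen from a point on the screen, the two eyes of an observer are separated by an angle that shrinks with distance; once this angle drops below the angular spacing of adjacent light rays, both eyes may receive the same ray. The interpupillary distance of 0.065 m, the function names, and the strict "at least one ray spacing" criterion are assumptions made for illustration only.

```python
import math

ASSUMED_IPD_M = 0.065  # assumption: typical adult interpupillary distance


def eye_separation_angle_deg(distance_m, ipd_m=ASSUMED_IPD_M):
    """Angle (degrees) subtended by the two eyes at a screen point."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


def max_stereo_distance_m(angular_resolution_deg, ipd_m=ASSUMED_IPD_M):
    """Largest distance at which the eyes are one ray spacing apart."""
    half_step = math.radians(angular_resolution_deg) / 2.0
    return ipd_m / (2.0 * math.tan(half_step))


def eyes_get_distinct_rays(distance_m, angular_resolution_deg,
                           ipd_m=ASSUMED_IPD_M):
    """True if the eye separation angle covers at least one ray spacing."""
    return eye_separation_angle_deg(distance_m, ipd_m) >= angular_resolution_deg


# Hypothetical display with 0.5 degrees between adjacent light rays.
limit_m = max_stereo_distance_m(0.5)
near_ok = eyes_get_distinct_rays(5.0, 0.5)
far_ok = eyes_get_distinct_rays(10.0, 0.5)
```

Under these assumptions, halving the angular spacing of the rays roughly doubles the distance up to which the two eyes can still be addressed separately, which is consistent with the argument that closer seats offer better 3D perception.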

Prototype Review
For the majority of active use cases, including the active implementations of prototype review, there are two types of content visualization. One is a typical model viewer, which enables the modifications of the viewing parameters, such as rotation and zoom, for both static and animated contents. The other type includes all the content-related interactions. For example, in the case of prototype review, if the leg of a bipedal robot is moved through commands via an HCI, then that is a content-related interaction. However, if a looping animation of a bipedal robot is displayed, then changing its orientation via the HCI is a view-related interaction. Prototype review is not a time-sensitive use case (i.e., strict deadlines and rushed development do not make a use case time-sensitive). The audience is commonly composed of multiple individuals. It is sufficient if only a single individual engages with the HCI at a time. Input accuracy is primarily relevant for content-related interaction.

Medical Imaging
The most common operations for medical imaging are view-related interactions. Content-related interactions are more frequent in the context of medical training. Medical imaging can be a time-sensitive use case, as in many scenarios, the need for medical treatment can be urgent. It is possible to have simultaneous observers, but having a single medical expert as the user is also typical. Even if there are multiple observers, typically, there is no need for simultaneous interactions. However, input accuracy may be important due to the potentially time-sensitive nature of the use case.

Resource Exploration
Content-related interactions are feasible for the active instances of resource exploration (e.g., changing the visualized drilling positions), yet view-related interactions are much more common, such as zooming in on an oil field. Generally, it is not a time-sensitive use case, as the primary purpose of the utilization of light field visualization is to aid careful planning. Simultaneous viewers are typical, yet for both prototype review and resource exploration, single-viewer scenarios are possible. For visualization with multiple observers, a single input for interaction is sufficient, and input accuracy is not of the greatest concern.

Training and Education
The active implementations of the use cases for training and education include both content-related and view-related interactions. The utilization of light field displays may be time-sensitive, particularly for specialized training. Simultaneous users may be common, and simultaneous inputs may be common as well, the accuracy of which may be of paramount importance.

Digital Signage
Active digital signage is primarily applicable to small-scale instances (i.e., sidewalk signage), and it is not feasible for billboards and façade-size signage (i.e., the largest format, used on the surface of buildings). The use is rather simple: the individual approaches the digital signage, becomes interested in its content, and interacts with it for more information. Although view-related interaction is meaningful in such a context (e.g., the individual may rotate a commercial product to view it from different angles), options for content-related interaction (e.g., modifying the color of the commercial product) are expected to dominate the use case. Again, the entire essence of signage is to obtain the attention of individuals, and therefore, it should be as attractive as possible. Digital signage is not a time-sensitive use case, and while there are expected to be simultaneous viewers, such small-scale systems shall mainly focus on the input of a single individual. Regarding the input itself, its accuracy is not particularly important. Still, the overall experience should excel, as it may greatly contribute to the financial decisions of the individual (e.g., buying a commercial product or subscribing to a service).

Cultural Heritage Exhibitions
The active use cases of cultural heritage exhibitions are rather similar to those of digital signage, as one of their primary goals is to grab attention and make the individual interested. Of course, the intention is to convey cultural heritage, to enrich the individual with cultural knowledge, and not to generate profit. For this purpose, museums and exhibitions often experiment with novel technologies, as they tend to gain the interest of younger generations. Both interaction types may serve this purpose well. Exhibitions of cultural heritage are far from being time-sensitive, and simultaneous viewers are very typical. There is the potential for simultaneous input, much more so than in the case of digital signage. However, accuracy is significantly more important if the content-related interaction plays a central role in the experience. Basically, insufficient input accuracy may easily degrade the experience and make the individual lose interest in the cultural content.

Traffic Control
In the discussion of the passive variant of traffic control, particularly air traffic control, it was mentioned that interaction is not necessarily beneficial. For instance, changing the zoom level may be counter-productive and hazardous. If the operator zooms in on a particular region, then other portions of the region are not visible for that given duration. Of course, at the same time, content-related interaction can be rather advantageous, such as adding information overlays dynamically (e.g., the visualization of calculated trajectories). Traffic control is a highly time-sensitive use case. While there is the potential for simultaneous users, simultaneous input is not expected. However, since this is not only a time-critical but a safety-critical use case as well, input accuracy is extremely important.

Driver Assistance Systems
The interaction type for driver assistance systems is mostly content-related, such as adjusting the visualized information. It is expected that the usage of the windshield surface shall be strictly regulated by compulsory future standards, which also decreases the relevance of view-related interactions. Typically, the only user is the driver, who is also the sole source of input. Just as in air traffic control, driver assistance systems are greatly time-sensitive. Therefore, the input of such solutions must be highly accurate.

Defense Applications
For defense applications, such as a 3D battlespace, there are many information overlays that can greatly assist decision makers. These include the visualization of various ranges, such as radar, sonar, or even ballistic ranges. Both interaction types are feasible, although the considerations for zoom are analogous to air traffic control. Real-time defense applications are time-sensitive use cases, typically with multiple viewers and a single input, the accuracy of which is absolutely crucial.

Telepresence
Although the telepresence use case is more passive than active, both view-related and content-related interactions can be meaningful. For instance, if one party is too far from the camera array, the other party could zoom in on the view for a better visual experience. While large-scale, portrait-oriented systems are designed to encompass a single individual, many solutions may easily accommodate multiple simultaneous users on one end. As the use case is not fundamentally designed to be active, simultaneous inputs are not expected, and there are no major requirements about input accuracy.

Home Multimedia Entertainment
Similarly to telepresence, home multimedia entertainment is a mostly passive use case, although functionalities of contemporary smart televisions are expected. It is not a time-sensitive use case, and the input generally does not play a major role. Regarding simultaneous viewers, light field visualization poses no restriction about the number, unlike HMD-based technologies.

Gaming
One of the most important active-only use cases is gaming, which is evidently based on content-related interaction. Gaming is commonly time-sensitive, unless timerless turn-based games or similar genres are played, and the accuracy of the input is typically important. Possibly the greatest potential of light field gaming is split-domain gaming. While split-screen gaming divides the screen based on the number of players (e.g., in the case of two players, either horizontally or vertically), split-domain gaming allocates a VVA to each player. An example of the VVAs of two players is shown in Figure 2. In the middle, the perspectives of the two players overlap; thus, no valid visualization can be perceived in that region. The main benefit is that both players can utilize the entire screen, and an added bonus is that during competitive gaming, the two players may not see each other's views (i.e., no "screen peeking").
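The allocation of viewing domains can be sketched as a mapping from the horizontal viewing angle to a player. The snippet below assumes that the FOV is divided into equal angular domains and that a transition zone of a fixed angular width around each internal boundary yields no valid view. The function name, the equal split, and the 5-degree default for the transition zone are hypothetical choices made for illustration.

```python
def player_for_angle(angle_deg, fov_deg, num_players=2, overlap_deg=5.0):
    """Return the index of the player whose domain contains the angle.

    angle_deg is measured from the screen normal, negative to the left.
    None is returned outside the FOV and inside a transition zone, where
    the perspectives of neighboring players overlap.
    """
    half_fov = fov_deg / 2.0
    if not -half_fov <= angle_deg <= half_fov:
        return None  # outside the angular range served by the display
    domain_width = fov_deg / num_players
    offset = angle_deg + half_fov  # shift the range to 0..fov_deg
    # Reject angles that are too close to an internal domain boundary.
    for boundary in range(1, num_players):
        if abs(offset - boundary * domain_width) < overlap_deg / 2.0:
            return None
    # Clamp so that the right edge of the FOV belongs to the last player.
    return min(int(offset // domain_width), num_players - 1)
```

For a hypothetical 60-degree display shared by two players, an observer at -20 degrees would be served the view of the first player, an observer at 20 degrees the view of the second player, and an observer directly in front of the screen would fall into the overlapping region.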

Metaverse
Another potential utilization of light field visualization is the active use case of the metaverse [138]. For such, interactions are expected to have similar characteristics to gaming. The metaverse can be used for a virtually infinite number of purposes. Although the concept itself dates back to Neal Stephenson's book published in 1992 [139], practical applications of the metaverse are currently being shaped.
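The characteristics discussed for the active use cases can be captured in a small data structure, which may be convenient when matching an input method to a use case. The sketch below transcribes four use cases from the prose of this section into boolean flags; the class name, the field names, and the reduction of the textual descriptions to booleans are simplifying assumptions, and the snippet is not a reproduction of Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveUseCaseProfile:
    """Simplified interaction requirements of an active use case."""
    name: str
    time_sensitive: bool
    simultaneous_input: bool
    accuracy_critical: bool


# Values transcribed from the text above; the boolean form is an assumption.
PROFILES = [
    ActiveUseCaseProfile("resource exploration", False, False, False),
    ActiveUseCaseProfile("digital signage", False, False, False),
    ActiveUseCaseProfile("traffic control", True, False, True),
    ActiveUseCaseProfile("driver assistance systems", True, False, True),
]


def demanding_use_cases(profiles):
    """Names of use cases that need both fast and accurate input."""
    return [p.name for p in profiles
            if p.time_sensitive and p.accuracy_critical]


demanding = demanding_use_cases(PROFILES)
```

A filter of this kind highlights the use cases in which an input method with longer task completion times or lower precision would be the most problematic.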

Research on 3D Light Field Interactions
A summary of the typical parameters for active use cases elaborated in the previous section is shown in Table 1. The table emphasizes that task completion time can be crucial for the use cases, as well as the accuracy of the input. In this section, we review the state-of-the-art research on 3D light field interactions in light of the different use cases. Adhikarla et al. [140,141] proposed a 3D light field HCI via a prototype light field display. The framework was designed for realistic direct haptic interaction. The solution relied on a Leap Motion Controller for hand tracking and a HoloVizio-like, small-scale, back-projection light field display for visualizing the HCI. In essence, it consisted of a projector array, two sidewall mirrors, a holographic screen, and a computer that controlled the projector array.
The proposed HCI was evaluated in a subjective study with 12 test participants. In order to directly compare the light field interface to a conventional 2D solution, the authors designed a so-called "2D mode" and a "3D mode" for the experiment. In the case of the 2D mode, the perceived visualization was uniformly close to the physical surface of the device (i.e., without any variation in depth), while for the 3D mode, the distance from the screen varied up to 7 cm. Three tiles (i.e., squares) were visualized on the interface, one of which was red. The task of the test participant was to touch the red tile. In the 2D mode, the three tiles were distributed on a plane, while in the 3D mode, the depth of the tile varied as well, between 0 cm (i.e., the tile was in the plane of the 2D mode) and 7 cm.
The experiment measured task completion time, cognitive workload, and QoE. The obtained results indicate that the same task required significantly more time to complete in the 3D mode. For cognitive workload, the NASA-TLX (Task Load Index) [142] was used, the results of which show higher loads for most aspects (frustration, effort, performance, temporal demand, mental demand, and total workload), but the difference was not statistically significant. Regarding QoE, the User Experience Questionnaire (UEQ) of Laugwitz et al. [143] was used, and it revealed that the light field HCI achieved better attractiveness, efficiency, stimulation, and novelty, although none of the categories achieved statistical significance in their differences.
In a different work by Adhikarla et al. [144], the usage of hand gestures for panning, rotating, and zooming was investigated in the context of a 3D map. The hand gestures were tracked by a Leap Motion Controller, and the map was visualized on the HoloVizio C80 light field cinema system [47]. The HCI was implemented by separating the sensed zone of the device into two parts: a hover zone and an interaction zone. Hand movement in the hover zone resulted in no action, while the interaction zone responded to the pre-defined movements for panning, rotating, and zooming on the map. The solution was evaluated by experts, and it was concluded that it may be difficult for the user to keep track of hand positioning within the zones.
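The two-zone logic described above can be sketched in a few lines of Python. This is not the implementation of the cited work: the geometry of the zones is not detailed here, so the sketch assumes that the zones are separated by a single distance threshold along one axis of the sensor. The function names, the 400 mm sensed range, and the 200 mm boundary are hypothetical.

```python
from typing import Optional

# Gestures the interaction zone responds to (as named in the text).
SUPPORTED_GESTURES = {"pan", "rotate", "zoom"}


def classify_zone(distance_mm: float,
                  sensed_range_mm: float = 400.0,
                  boundary_mm: float = 200.0) -> str:
    """Classify a tracked hand position by its distance from the sensor.

    Assumption: the interaction zone is the nearer part of the sensed range
    and the hover zone is the farther part.
    """
    if distance_mm < 0.0 or distance_mm > sensed_range_mm:
        return "outside"
    return "interaction" if distance_mm <= boundary_mm else "hover"


def handle_gesture(distance_mm: float, gesture: str) -> Optional[str]:
    """Return the map action to perform, or None if nothing should happen."""
    if classify_zone(distance_mm) != "interaction":
        return None  # hand movement in the hover zone results in no action
    return gesture if gesture in SUPPORTED_GESTURES else None


if __name__ == "__main__":
    print(handle_gesture(150.0, "zoom"))  # inside the interaction zone
    print(handle_gesture(300.0, "zoom"))  # inside the hover zone
```

Because the boundary between the zones is invisible to the user, even this simple model hints at why the experts found it difficult to keep track of hand positioning.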
Yamaguchi and Higashida [145,146] proposed a small-scale visualization system, the screen of which was composed of a 2D array of small elementary holograms that functioned as convex mirrors. User interaction with the projected content was tracked by a color image sensor, which detected the light scattered by the user's finger. For testing purposes, the character "T" and the text "Touch Screen" were visualized, and if the individual touched the screen (i.e., the light scattered), then the text "OK" appeared as well. In a subsequent test, the characters "Y" and "N" were visualized, resulting in a "Yes" or a "No" if touched, respectively. A limitation of this solution is that interactions can only be registered if light is scattered, so finger motions between visualized areas are not detectable. The authors highlighted interactive digital signage as an active use case.
Chavarría et al. [147][148][149] used the same projection-based system for HCI and enhanced its registration procedure (i.e., detecting the user's fingers) to combat the aforementioned limitation. The work demonstrates novel functions achieved by the proposed method, such as mid-air light field drawing without any additional device. The authors also tested the system as an ATM interface with grab and poke gestures. Subjective studies related to performance and QoE are yet to be carried out.
The RePro3D display of Yoshida et al. [150] was demonstrated through interactions with a computer-generated character (i.e., an animated fairy). The input interface used an infrared camera to recognize the hand gestures. However, the user wore a haptic device on the finger [151] for tactile sensation. Yet, as it was solely used for feedback, the considerations of bare-finger touch [152][153][154] were still relevant. In the investigated use case, the animated character, who was superimposed in 3D space, responded to touch with both visual and audio cues. A limitation of the solution is that the positional relationship between the hand and the animated character is not fully addressed (i.e., if the user's hand was placed perceptually in front of the character, the character was not hidden by the hand). On the level of interaction, only the binary action of touch (i.e., whether the user perceptually touches the 3D content or not) was investigated.
Matsubayashi et al. [155] used ultrasound haptic feedback in two user studies. In the first study, the task of the test participants was to estimate the position and angle of the virtual object based on haptic feedback. The estimation itself was carried out as view-related interactions (i.e., repositioning and rotation), which were executed via a keyboard. In the second study, the test participants were asked to lift a virtual cube, with and without visual access to the cube; in the case of the latter, the test participants had to rely on haptic feedback. The obtained results emphasize the importance of angle recognition and demonstrate that haptic feedback may compensate for issues of occlusion, a topic that is investigated by several recent works. For example, Yasui et al. [156] proposed an occlusion-robust sensing method based on aerial imaging by retro-reflection (i.e., reflecting light back to its source with minimal scattering).
Sang et al. [157] introduced a light field visualization system for medical imaging. The supported view-related interactions were rotating and zooming. However, the work did not detail the means of interaction.
In the experiment by Tamboli et al. [158], canonical 3D object orientation was addressed. The task of the test participants was to rotate 3D objects into their preferred orientation. The objects were visualized on the HoloVizio C80 light field display. As it was important from the perspective of the scientific work to obtain accurate data, the authors decided to include a conventional controller in the tests. The test participants used the thumbstick of the controller to rotate the 20 objects.

Discussion
This section aims to discuss and summarize the good news, the bad news, and the ugly truth about 3D interactions with light field displays.

The Good News
The good news is that light field visualization and its interactive use cases have absolutely immense potential. Particularly in the era of the COVID-19 pandemic, the possibility of avoiding physical contact during interaction is simply invaluable. The use cases are numerous, and as the technology emerges, there may be even more than the ones covered by this article. Eventually, passive and active instances of light field visualization may become an organic part of everyday life.
Although the availability of light field displays at the time of writing this paper is quite limited, there are more and more prototypes being built and tested by institutions. Regarding research, there is a continuous stream of scientific efforts, all of which contribute to the successful future emergence of the use cases of light field visualization.
Even without haptic feedback, the device-free nature of 3D interactions through projected light field HCIs combines well with the glasses-free 3D nature of light field displays. Furthermore, as shown via recent research efforts, haptic feedback for such systems can actually be implemented without the need for additional user devices.
Generally, control by hand gestures is quite intuitive. It is sufficient to see how fast humanity adapted to the touchscreens of smartphones, tablets, and other devices. Moreover, the analysis of hand gestures on 3D interfaces is expected to follow the research directions on touchscreens, such as identifying [159][160][161][162] or characterizing the user [163,164], including via gender [165][166][167] and age [168][169][170] recognition.
An additional benefit of light field HCIs is that they are much more durable as controllers, as there is no physical contact with the user. For instance, gamepad controllers and joysticks are more susceptible to damage when their users engage with intense fighting games or games that require frequent input, not to mention that the controller may be a victim of the player's frustration.
While split-domain visualization is most apparent for gaming, many other use cases may benefit from it as well. For example, light field displays in defense use cases may be used by multiple personnel simultaneously, with different information overlays. In such a scenario, both individuals could perceive the same real-time map of military entities, but one could overview radar or sonar ranges, while the other could supervise strike ranges or trajectories.
Light field visualization may also benefit from other technological advancements, such as the emerging type of diffractive optics known as metasurfaces [171][172][173][174]. Metasurfaces, which are typically metallic [175][176][177] or dielectric [178][179][180], are subwavelength-patterned surfaces that may be used in meta-optics to control the phase, the amplitude, and the polarization of light rays. With such and even more advanced optical technologies, light field visualization may take significant steps toward passing the visual Turing test [25], which is the ultimate goal of any glasses-free 3D imaging system.

The Bad News
On a more pessimistic note, the development of light field technology is constrained by significant challenges and limitations, which also extend to 3D interactions. There are important trade-offs between display characteristics, more densely aligned projectors are needed, many commercial systems should not be too great in terms of size and weight, and heat dissipation should be properly addressed, not to mention data size, computational requirements, power consumption, and the expense of manufacturing, which also translates into commercial cost. These factors not only delay the emergence of light field displays and their use cases, but they slow down research efforts as well. While it is true that many scientific contributions do not rely on actual light field displays (i.e., light field contents can be visualized by other display technologies as well, including conventional 2D displays), 3D interactions via light field necessitate such displays. One may say that augmented reality (AR) has the potential to emulate the perceptual circumstances; however, it is not even remotely straightforward to match the QoE of a glasses-free visualization technology with the QoE of an HMD-based system.
In order for a display system to achieve a sufficient QoE, it needs to be somewhat free of blur, the parallax effect must be smooth and continuous, and crosstalk should be completely avoided. Blurred visuals can be caused by both insufficient spatial and angular resolution. It is a limiting property of light field visualization that the displayed content is always the sharpest in the plane of the screen. Angular resolution is of the utmost importance, not only because it enables 3D perception at given distances (i.e., viewing the same visualization from a greater distance may result in a more 2D-like visual experience), but also because it determines the characteristics of the achieved parallax effect. Technically speaking, a disturbed parallax can severely degrade the QoE as well as hinder interaction performance. One of the worst threats to QoE and to interaction is the crosstalk effect, during which adjacent perspectives interfere with each other and may potentially make the visualized content unrecognizable. Constraining the depth of visualization may indeed be a solution to avoid such issues; however, perceived depth is one of the most important building blocks of the entire 3D experience.
The size of the display, and thus, the size of the HCI, is also a difficult matter. If the HCI is too small, then that can be a serious compromise against input accuracy. However, larger systems may be more difficult to implement in a given context, they may not even be possible for certain use cases, or there may be potential issues regarding visualization quality. In the world of QoE, there are many interesting questions of preference, particularly when choosing between characteristics that may degrade the QoE [181,182]. In research efforts, it would be beneficial to see the preference between smaller but higher-quality systems and larger but lower-quality ones. Of course, in many scenarios, display size is dictated by the use case itself.
There are also significant technical considerations, including challenges and outright drawbacks, for specific use cases. In the passive use cases, the aforementioned display size can be a major issue for cinematography due to the challenges of both manufacturing a single screen in that size and creating an appropriate projection system. Regarding digital signage in general, having a light field visualization system outdoors is definitely a challenging endeavour. While unfavorable lighting conditions (e.g., exceptionally sunny weather) can be overcome by the necessary projector properties, the system may have a great maintenance cost (e.g., due to potential damage), guaranteeing proper operation temperatures may pose an issue, the continuous operation itself may be taxing in general, and the total system size of interactive units may also be difficult to minimize.
It applies to each and every use case that achieving a visual-Turing-test-passing level of excellence in light field visualization requires that the observer may focus within the visualized content at different depths; in the case of general 3D perception, the eyes normally focus on the plane of the screen. In order to enable this perceptual phenomenon, super-resolution is needed. In scientific research, the technical term super-resolution often refers to the enhancement of the resolution of light field content, also known as image super-resolution [183][184][185][186]. Super-resolution also means an angular resolution so high that at least two distinct light rays may address a single pupil of an individual with respect to a given point on the screen, which is required for the above-mentioned focusing. This concept is illustrated in Figure 3. The problem is that many use cases need to support greater viewing distances. However, the farther away the observer is, the lower the perceived angular density is. At the time of writing this paper, enabling super-resolution even for the shortest feasible viewing distances is a major technological challenge.
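The dependence of this requirement on viewing distance can be made concrete with a short Python sketch. It relies on simple geometry rather than on any method from the cited works: seen from a point on the screen, a pupil of diameter p at distance d subtends an angle of 2·atan(p / (2d)). The sketch then uses a conservative criterion, namely that the angular spacing between adjacent rays must be at most half of that angle, so that at least two rays fall within the pupil. The 4 mm pupil diameter, the function names, and the example ray spacings are assumptions chosen for illustration.

```python
import math


def pupil_angle_deg(distance_mm: float, pupil_diameter_mm: float = 4.0) -> float:
    """Angle (in degrees) subtended by the pupil as seen from a screen point."""
    return math.degrees(2.0 * math.atan(pupil_diameter_mm / (2.0 * distance_mm)))


def supports_focus(distance_mm: float,
                   ray_spacing_deg: float,
                   pupil_diameter_mm: float = 4.0) -> bool:
    """Conservative check: do at least two adjacent rays fit inside the pupil?"""
    ratio = pupil_angle_deg(distance_mm, pupil_diameter_mm) / ray_spacing_deg
    return ratio >= 2.0


if __name__ == "__main__":
    for d in (500.0, 2000.0):
        print(d, round(pupil_angle_deg(d), 3), supports_focus(d, 0.1))
```

Under these assumptions, the pupil subtends roughly 0.46° at 0.5 m but only about 0.11° at 2 m, so a hypothetical ray spacing of 0.1° would satisfy the criterion at the shorter distance and fail it at the longer one.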

The Ugly Truth
The ugly truth is that no matter how much research is conducted on light field visualization, it is possible that 3D interactions will not be able to surpass conventional controls in many aspects. The most important characteristics of interactions are task completion time, input accuracy, cognitive demand, and QoE. For task completion time, we can see that 3D interactions generally take more time to complete. This is partially due to the fact that the interface elements may be at different depths, and therefore, it evidently takes more time to reach elements that are physically farther away. If every element on an interface aligns to a single plane, then this issue may be mitigated. However, there are many time-sensitive use cases, which may not tolerate the additional action delay. This aspect is highly intertwined with input accuracy, which is also crucial to a great number of use cases, yet based on what we can see so far, 3D light field interfaces tend to under-perform in comparison with conventional controls. Although cognitive demand may be an issue as well, it is less important in terms of use case success, and there may also be a phase of adaptation that may compensate. After all, new technologies may be demanding at first. Note that a certain degree of compensation is expected for task completion time and input accuracy as well. Regarding QoE, on the one hand, such 3D interfaces may provide an exceptional experience through novelty and visual appearance, but on the other hand, guaranteeing a sufficiently good visual experience is a rather challenging task for light field visualization. Furthermore, poor task performance, particularly the potential frustration caused by insufficient input accuracy, may severely penalize the overall QoE.
It is also possible that many of the use cases will remain constrained. For example, light field technology can only visualize spatially finite contents. If we think about HCIs, this is not necessarily an issue, as such visual interfaces are meant to be finite by definition. Of course, there may be design elements that point toward great depths and distances, but those do not contribute to HCI functionalities. However, let us consider the use case of gaming, where certain genres tend to visualize virtually infinite distances (e.g., outdoor, open-world, first-person games). An example for a passive use case could be cinematography.
At the end of the day, we need to ask ourselves the question: Is it really such a great problem if the user prefers conventional controls over 3D light field interfaces? Light field HCIs may have numerous benefits, yet we need to face the fact that they are not necessary for the success of many active use cases. Light field visualization technology, as the name suggests, is a visualization technology, and while using it as an HCI is indeed an option worth considering, the primary focus shall always remain on the visualization of the content. Of course, it should be noted that 3D light field HCIs may be used in conjunction with other visualization technologies as well. A good example of this is replacing the physical touchscreen at ticket vendor machines; while the information is visualized on a flat 2D screen, the touch-free input comes from a virtual 3D interface. Naturally, the considerations regarding such 3D input discussed above apply here as well.

Conclusions
In this paper, we provided a comprehensive review on 3D interactions with light field displays. We categorized the potential use cases by interaction and analyzed the active use cases by interaction type, time sensitivity, simultaneous users, simultaneous input, and input accuracy. We examined the state-of-the-art solutions and discussed the positive and negative aspects of current and future research. We conclude that the utilization of the technology has immense potential, and both the passive and the active use cases may greatly benefit humanity, yet there are significant constraints, and it is quite possible that 3D interactions via light field shall not prove to be superior in every single aspect.
Regarding future work, there is a virtually infinite, absolutely vast ocean of research questions that need to be addressed. Technically speaking, every active use case should be properly evaluated with particular emphasis on their characteristics, and further use cases should be explored. From the perspective of the authors of this article, possibly the most exciting research direction is the investigation of split-domain solutions. Domain separation, dynamic domains, domain capacities, uneven domain distribution, simultaneous input, asynchronous solutions, and inter-user effects should be addressed.

Acknowledgments:
The authors would like to thank Tibor Balogh and Holografika for the know-how and expertise that ultimately led to the creation of this article.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

AR
Augmented