Roadmap on 3D integral imaging: sensing, processing, and display

This Roadmap article on three-dimensional integral imaging provides an overview of some of the research activities in the field of integral imaging. The article discusses various aspects of the field including sensing of 3D scenes, processing of captured information, and 3D display and visualization of information. The paper consists of a series of 15 sections from the experts presenting various aspects of the field on sensing, processing, displays, augmented reality, microscopy, object recognition, and other applications. Each section represents the vision of its author to describe the progress, potential, vision, and challenging issues in this field. © 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

driving, augmented reality, security and defense, biomedicine, etc. Research and development activities in 3D technologies are conducted in academia, industry, and government labs, and they have been implemented for objects from macro to micro scales. The broad scope of these activities is reflected in the large number of publications, conferences, seminars, and industrial activities in the 3D field conducted across the globe in many international organizations.
Integral imaging is one of the several approaches used to implement 3D technologies. It was invented by Lippmann [1], who named it Integral Photography and later won the Nobel Prize in Physics for his inventions. The pioneering work of a number of researchers [5][6][7][8][9] in the 1970s, 80s, and 90s rejuvenated the interest in this 3D field. In recent years, this 3D approach has been referred to as integral imaging, since a digital camera is used for scene capture and spatial light modulators are used for display instead of photographic film. In addition to the integral imaging terminology, this approach has also been called plenoptics [19,20] and lightfield [21,23]. Integral imaging is an attractive approach because it is a passive imaging system that can operate in outdoor scenes and under incoherent or ambient light for important applications.
This roadmap paper on 3D Integral Imaging: Sensing, Processing, and Display is intended to provide an overview of research activities in the broad field of integral imaging. This roadmap consists of a series of 15 sections from the experts presenting various aspects of integral imaging, including sensing, processing, microscopy, biomedicine, object recognition, displays, and augmented reality. Each section represents the vision of its author to describe the progress, potential, applications, and challenging issues in this field. The contributions are ordered as follows (Table 1). The first three sections analyze problems related to the detection of signals in turbid media using multiple light sources, strategies to record and display 3D scenes in low light conditions, and the measurement of 3D polarimetric information, i.e., Stokes parameters and the Müller matrix, respectively. Sections 5 and 6 describe recent advances in light field microscopy, including Fourier and lens-less approaches in which the micro-lens array is replaced with a diffuser, respectively. In Section 7, we discuss the necessity of using data compression methods adapted to 3D imaging because of the large amount of data required for the description of the light field. Section 8 summarizes previous research work on 3D sensing for gesture recognition based on integral imaging. Sections 9 to 16 analyze a variety of problems related to 3D displays. In Section 9, we introduce a technique to calculate the best perceivable light distribution that ideally should be provided to the viewer, namely the Perceivable Light Field. In Section 10 we discuss how design variables are selected depending on whether the display is intended for one or multiple users, whereas in Section 11 we analyze trade-off restrictions between angular diversity of light rays and spatial resolution of images.
Section 12 provides an overview of head-mounted light field displays, focusing on present designs and future challenges. Applications of integral imaging and augmented reality (AR) for biomedicine are considered in Section 13: the main problems of these devices are (i) the trade-off between viewing angle and resolution and (ii) the requirement of high-quality real-time rendering. In Sections 14 and 15 we describe two possible approaches for 3D displays: the tabletop display, which enables a vivid and natural 3D visual experience with a 360-degree viewing zone, and the so-called aerial display, designed to show information in mid-air where there is no display hardware, respectively. Finally, in Section 16, we analyze how holography and integral imaging can be combined as a solution to various application challenges. The conclusions are presented in Section 17.

Optical signal detection in turbid water by multidimensional integral imaging
This Section presents an overview of a recently reported system for underwater optical signal detection based on multi-dimensional integral imaging and temporally encoded light sources [30][31][32]. Figure 1 illustrates the approach based on multi-dimensional integral imaging for underwater signal detection using single or multiple light sources. The advantages of multiple light sources are increased bandwidth and improved detection capabilities [31]. The underwater optical signal detection method comprises three stages: 1) time-varying optical signal transmission in turbid water, 2) 3D integral imaging sensing, turbidity removal processing [33,34], and reconstruction, and 3) signal detection using a 4D correlation filter matched to the temporally and spatially varying signal. The light sources are temporally encoded using spread spectrum techniques to generate four-dimensional (4D) spatial-temporal signals, which are transmitted through turbid water and recorded using an integral imaging system. In Fig. 1(a), an example application of the proposed approach is presented. In Fig. 1(b)-(c), the principle of the integral imaging pickup stage used to capture the optical signal and the 3D computational volumetric reconstruction process are presented, respectively. In Fig. 1(d), the experimental setup for signal detection in turbid water is illustrated [30][31][32]. Additional discussions on the principle of integral imaging image capture and image reconstruction are presented in Section 3. The white LED lamp on the top mimics shallow water conditions. Turbidity mitigation techniques can be applied to the 2D elemental images degraded by turbidity to reduce the noise and improve the computational 3D reconstruction of the signals [30][31][32][33]. Once the signals have been captured and processed to remove turbidity, integral imaging reconstruction algorithms are used to computationally reconstruct the time-varying images of the light sources [35].
Then, a 4D correlation filter is synthesized which includes both spatial and temporal information of the reconstructed signal to be detected. The correlation filter is applied to the 4D computationally reconstructed temporal and spatial data to detect the transmitted signals in turbidity (Fig. 2). The correlation filter is synthesized based on a template that contains the reconstructed light sources and the temporal sequence of the spread spectrum or pseudo-random codes [36] used in the transmission process. The correlation output of the receiver is generated by correlating the synthesized filter with the 4D spatial-temporal reconstructed input data. Using the correlation output and the optimal threshold values calculated from the receiver operating characteristic (ROC) curves, one can detect the transmitted signals in turbid water [30,31]. In summary, multi-dimensional integral imaging systems [37] are promising for signal detection in turbid water.
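The detection stage can be illustrated with a minimal Python sketch, given here only as an assumption-laden toy and not as the filter design of [30,31]. A 4D template is formed as the outer product of a hypothetical pseudo-random chip sequence and a hypothetical reconstructed source volume, and a normalized zero-lag correlation score is compared against a threshold. The array sizes, the noise level, and the threshold of 0.03 are illustrative choices; in the reported system the threshold is derived from ROC curves.

```python
import numpy as np

def make_template(spot, code):
    """Build a 4D template (t, z, y, x): a 3D source volume modulated in time by a +/-1 code."""
    return code[:, None, None, None] * spot[None, ...]

def correlate_4d(data, template):
    """Normalized zero-lag correlation between 4D data and a 4D template."""
    d = data - data.mean()
    t = template - template.mean()
    return float(np.sum(d * t) / (np.linalg.norm(d) * np.linalg.norm(t) + 1e-12))

rng = np.random.default_rng(0)
code = rng.choice([-1.0, 1.0], size=63)      # hypothetical pseudo-random chip sequence
spot = np.zeros((4, 16, 16))                 # hypothetical reconstructed volume (z, y, x)
spot[2, 6:10, 6:10] = 1.0                    # light source located at one depth slice
template = make_template(spot, code)

noise_sigma = 2.0                            # strong additive noise standing in for turbidity
signal_present = template + rng.normal(0.0, noise_sigma, template.shape)
signal_absent = rng.normal(0.0, noise_sigma, template.shape)

score_present = correlate_4d(signal_present, template)
score_absent = correlate_4d(signal_absent, template)
threshold = 0.03                             # illustrative value, not an ROC-derived one
detected = score_present > threshold
```

Because the code spreads the signal energy over many time samples, the correlation score separates the two cases even though the source is not visible in any single noisy frame.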

Low light 3D object visualization and recognition with visible range image sensors
In this Section, we present an overview of 3D integral imaging object visualization and recognition in very low illumination conditions using visible range image sensors [38,39]. Passive imaging in very low illumination conditions with low-cost visible-range image sensors such as CMOS sensors has many applications in manufacturing, remote sensing, night vision, underwater imaging, security and defense, and transportation, to name a few. However, this is a challenging problem, mainly because the captured images are dominated by read noise in photon-starved conditions. A simple experiment using conventional 2D imaging with a CMOS image sensor in a very low light scene produces unsatisfactory results, with noise-like captured images. It has been reported that passive 3D integral imaging may perform visualization and recognition under very low illumination conditions, in part because integral imaging reconstruction is optimal in a maximum likelihood sense under low light conditions [18,38-45]. In addition, 3D integral imaging utilizing convolutional neural networks can be effective in object recognition in very low illumination conditions [39]. Integral imaging has been shown to provide superior performance over 2D imaging in degraded environments [40][41][42][43][44][45]. Clearly, high-sensitivity image sensors such as EMCCD cameras [44,45] may be used. However, the focus in this section is on 3D low light imaging with conventional and potentially low-cost CMOS image sensors to enable object visualization and detection in poor illumination conditions. In [38], 3D integral imaging was used in low illumination for object visualization and detection using a conventional low-cost and compact CMOS sensor. The input scene consisted of a person standing behind an occluding tree branch in low light (night time).
A Total Variation (TV) denoising algorithm [46] and the Viola-Jones object detector [47] were used to process the reconstructed 3D image, which resulted in successful face detection. Sample experimental results are presented in Figs. 3-4. The estimated photon counts are about 7 and 5.3 photons/pixel for the two light levels [38]. The experimental results showed increases in the 3D reconstructed image SNR and entropy compared with 2D imaging [38]. The use of convolutional neural networks (CNN) for 3D integral imaging object classification in very low illumination conditions has been reported [39]. The CNN is trained to perform object recognition on the 3D reconstructed images for different low illumination conditions and different persons. As in [38], TV denoising is applied to improve SNR, and Viola-Jones face detection is used to extract the regions of interest from the denoised 3D reconstructed images to be used as input into a CNN for training and testing. The CNN approach resulted in 100% classification accuracy among 6 subjects in very low illumination conditions.
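The benefit of averaging elemental images in photon-starved conditions can be illustrated with a short Python sketch. It is a simplified toy under stated assumptions, not the reconstruction code of [38,39]: the scene is planar, the disparity between neighbouring cameras is a hypothetical integer number of pixels, shifts are circular, and only Poisson photon noise is simulated (no read noise). For Poisson-distributed counts the sample mean is the maximum likelihood estimate of the underlying rate, which is the sense in which averaging is statistically favourable here.

```python
import numpy as np

def reconstruct_plane(elemental, shifts):
    """Shift each elemental image back by its integer disparity and average the result."""
    acc = np.zeros_like(elemental[0], dtype=float)
    for img, (dy, dx) in zip(elemental, shifts):
        acc += np.roll(img, shift=(-dy, -dx), axis=(0, 1))
    return acc / len(elemental)

rng = np.random.default_rng(1)
scene = np.full((32, 32), 0.5)        # weak background, in mean photons per pixel
scene[12:20, 12:20] = 5.5             # object region, a few photons per pixel

# Hypothetical 3x3 camera array; disparity at the object depth assumed to be 2 px per camera step.
shifts = [(2 * i, 2 * j) for i in range(-1, 2) for j in range(-1, 2)]
elemental = [rng.poisson(np.roll(scene, s, axis=(0, 1))).astype(float) for s in shifts]

recon = reconstruct_plane(elemental, shifts)
mse_single = np.mean((elemental[4] - scene) ** 2)   # central camera (zero shift) alone
mse_recon = np.mean((recon - scene) ** 2)           # average over the nine cameras
```

With nine cameras the noise variance of the in-focus plane drops by roughly a factor of nine relative to a single elemental image, which is consistent with the SNR gain reported qualitatively above.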

Polarimetric measurements with integral imaging
The vector character of light fields is often disregarded in imaging problems, since the intensity provides enough information during the recording and visualization processes. Nevertheless, the information obtained from the polarization of light is a powerful and convenient analysis tool that can be used in a variety of problems: pattern recognition, machine vision, target detection in turbid media, underwater imaging, etc. [48]. The measurement of polarization requires several recordings using a linear polarizer and a quarter-wave plate. Nowadays, several companies commercialize cameras able to determine the Stokes parameters and the degree of polarization (DoP) in a single shot. The process to generate integral imaging polarimetric distributions is equivalent to the one used in conventional 2D imaging [49]. A polarizer and a phase plate that determine the required state of polarization (SoP) are located in front of an integral imaging device. At each shot, the complete set of elemental images corresponding to the SoP is recorded. By combining these sets in the proper way, the DoP for each elemental image can be calculated [50,51].
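As an illustration of how such recordings are combined, the following Python sketch computes the Stokes parameters and the DoP pixel by pixel from six analyzer intensities, using the standard textbook relations S0 = I0 + I90, S1 = I0 - I90, S2 = I45 - I135, S3 = IR - IL and DoP = sqrt(S1^2 + S2^2 + S3^2) / S0. The function and argument names are hypothetical, the six-analyzer scheme is an assumption (the procedures in [49-51] may use a different set of recordings), and the sign convention of S3 varies between authors.

```python
import numpy as np

def stokes_from_intensities(i0, i90, i45, i135, ircp, ilcp):
    """Stokes parameters from intensities behind six ideal analyzers (arrays of equal shape)."""
    s0 = i0 + i90
    s1 = i0 - i90
    s2 = i45 - i135
    s3 = ircp - ilcp          # sign convention for circular polarization is an assumption
    return np.stack([s0, s1, s2, s3])

def degree_of_polarization(s, eps=1e-12):
    """DoP = sqrt(S1^2 + S2^2 + S3^2) / S0, evaluated per pixel."""
    return np.sqrt(s[1] ** 2 + s[2] ** 2 + s[3] ** 2) / (s[0] + eps)

# Two-pixel example: pixel 0 is horizontally polarized, pixel 1 is unpolarized.
i0 = np.array([1.0, 0.5])
i90 = np.array([0.0, 0.5])
i45 = np.array([0.5, 0.5])
i135 = np.array([0.5, 0.5])
ircp = np.array([0.5, 0.5])
ilcp = np.array([0.5, 0.5])

stokes = stokes_from_intensities(i0, i90, i45, i135, ircp, ilcp)
dop = degree_of_polarization(stokes)
```

In an integral imaging setting the same functions would be applied to each elemental image, or to each reconstructed depth plane, treated as an intensity array.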
The use of 3D polarimetric techniques is particularly interesting in photon-starved conditions. The estimation of the Stokes parameters and the DoP is particularly challenging in conventional imaging because the signal-to-noise ratio is very low and numerical errors propagate during the calculation stage. Note that the estimation of the Stokes parameters involves the difference of two intensities; when the number of photons involved is very low, these parameters become ill-defined, resulting in an underestimation of the DoP. Nevertheless, we demonstrated that it is possible to determine the polarimetric information of a scene in low light conditions using integral imaging [52]. The reconstruction of the 3D information involves averaging of elemental images, which might be statistically optimum in a maximum likelihood sense [41]. Interestingly, we found that the analysis of the statistical distribution of the DoP provides enough information to distinguish between areas with strong polarimetric signal and noise [53].
The Stokes parameters characterize the scene for a specific SoP. In particular, if natural light is used, the polarimetric response of the objects can be weak. In contrast, if the scene is illuminated with fully polarized light, the signal is stronger but dependent on the illumination SoP. The measurement of the Müller matrix (MM) provides a complete polarimetric description of the scene for any SoP of the light source. We recently extended this technique from 2D to 3D imaging [54]. Generally speaking, the calculation of the 16 components of the MM requires 36 recordings of the light field (six input SoPs times six recordings for each input SoP). With this information, it is possible to derive the MM for each point of the light field. This procedure is time-consuming and can be a disadvantage when the scene is dynamic.
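The 36-recording procedure can be sketched for a single point of the light field as follows. This is a minimal illustration under assumptions and not the method of [54]: the six input SoPs are taken to be ideal horizontal, vertical, +45, -45, right-circular, and left-circular states, the six recordings per input are taken behind the matching ideal analyzers, and the MM is obtained by a least-squares solve of S_out = M S_in. All names are hypothetical.

```python
import numpy as np

# Stokes vectors of the six assumed input states (H, V, +45, -45, RCP, LCP), one per column.
S_IN = np.array([
    [1.0,  1.0,  0.0,  0.0],
    [1.0, -1.0,  0.0,  0.0],
    [1.0,  0.0,  1.0,  0.0],
    [1.0,  0.0, -1.0,  0.0],
    [1.0,  0.0,  0.0,  1.0],
    [1.0,  0.0,  0.0, -1.0],
]).T                                              # shape (4, 6)

def analyzer_intensities(s_out):
    """Simulate the six analyzer recordings for each output Stokes vector (columns of s_out)."""
    s0, s1, s2, s3 = s_out
    return np.stack([(s0 + s1) / 2, (s0 - s1) / 2,
                     (s0 + s2) / 2, (s0 - s2) / 2,
                     (s0 + s3) / 2, (s0 - s3) / 2])

def stokes_from_analyzers(i):
    """Recover output Stokes vectors from the six analyzer recordings (rows of i)."""
    return np.stack([i[0] + i[1], i[0] - i[1], i[2] - i[3], i[4] - i[5]])

def mueller_from_recordings(recordings):
    """recordings[k, j]: intensity behind analyzer k for input state j (6 x 6 = 36 values)."""
    s_out = stokes_from_analyzers(recordings)     # shape (4, 6)
    return s_out @ np.linalg.pinv(S_IN)           # least-squares solution of S_out = M S_IN

# Synthetic check with an ideal horizontal linear polarizer as the sample.
m_true = 0.5 * np.array([[1.0, 1.0, 0.0, 0.0],
                         [1.0, 1.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 0.0]])
recordings = analyzer_intensities(m_true @ S_IN)
m_est = mueller_from_recordings(recordings)
```

Six input states overdetermine the 16 unknowns, so the pseudo-inverse averages over the redundant measurements; for a full light field the same solve would be repeated at every pixel.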
The MM technique is able to display the polarimetric response of the scene for any SoP of the illumination source, including partially polarized light. Fig. 5 shows some DoP results obtained with a commercial plenoptic camera (Lytro Illum). The plane that contains the larger clock appears in focus. It is apparent that the polarimetric signal obtained with natural light is very weak (Fig. 5(a)), whereas large areas of Fig. 5(b), illuminated with fully polarized light, appear to be saturated. The use of partially polarized light (Fig. 5(c)) provides a balanced description of the scene: only a few pixels of the scene (e.g., the screw) display a DoP close to 1. Since the polarization landscape depends on the input SoP, it is possible to produce a synthetic DoP signal as the fusion of polarimetric images generated with different SoPs (Fig. 5(d)), resulting in a distribution that is almost independent of the illumination.

Integral microscopy
The main shortcoming of plenoptic cameras [20,23,55] is their poor parallax, which restricts their capability for resolving occlusions or for calculating accurate depth maps. Thus, applications in which the lightfield is captured with high parallax are desirable. This is the case in microscopy, where the objective, especially if it has a high NA, captures rays with high angular content.
In 2006 two groups proposed the first schemes that took advantage of the integral imaging concept in microscopy. On the one hand, Javidi et al. [27] used the images captured directly with a microlens array for the identification of microorganisms. On the other hand, Levoy et al. [26,56] proposed the lightfield microscope (LFM), a novel scheme based on adapting the plenoptic-camera design to microscopy. As shown in Fig. 6(a), the lightfield microscope can be implemented from a conventional optical microscope by simply inserting a microlens array at the image plane and displacing the CCD to the back focal plane of the lenslets.
Clearly, the LFM does not directly capture perspective images of the sample, but they are easily calculated from the captured microimages. In fact, due to a transposition property [16], it is possible to calculate as many view images as there are pixels in each microimage. The LFM has inspired a lot of research in the past few years, and it could even be said that it has opened a research field. However, this design has some shortcomings that have prevented its broad application to real microscopy problems: its poor spatial resolution, the inhomogeneous resolution of refocused images, and the low number of refocused planes.
Fig. 6. (a) Scheme of the LFM [26,56]; (b) Scheme of the FiMic reported in [16,59].
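The transposition between microimages and view images can be written in a few lines of Python. The sketch below is an idealized illustration, assuming a square lenslet grid perfectly aligned with the sensor and an integer number p of pixels per microimage side; real data would first require calibration and resampling. The function name is hypothetical.

```python
import numpy as np

def microimages_to_views(raw, p):
    """Rearrange a sensor image made of Ny x Nx microimages (p x p pixels each) into views.

    Returns an array of shape (p, p, Ny, Nx); views[v, u] gathers pixel (v, u)
    of every microimage, so there are p * p view images in total.
    """
    ny, nx = raw.shape[0] // p, raw.shape[1] // p
    lf = raw[:ny * p, :nx * p].reshape(ny, p, nx, p)   # (lenslet_y, pixel_y, lenslet_x, pixel_x)
    return lf.transpose(1, 3, 0, 2)

raw = np.arange(36).reshape(6, 6)     # toy sensor: 2 x 2 microimages of 3 x 3 pixels
views = microimages_to_views(raw, p=3)
```

Each view has only as many pixels as there are lenslets, which makes the spatial-resolution penalty of the LFM explicit.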
Aiming to overcome these drawbacks, Georgiev proposed the so-called Plenoptic 2.0 scheme [57,58]. Based on inserting the microlenses at an intermediate plane, this design allows the direct capture of many view images, each with a small field of view, and provides some improvement in resolution. However, the captured microimages have vignetting problems, and the refocused images are still few, have inhomogeneous resolution, and show periodic artifacts.
Much more recently, a new paradigm for integral microscopy has been reported [16,59]. The new architecture is based on the insertion of millilenses at the Fourier plane of the microscope, i.e., at the aperture stop of the objective or at a conjugate plane; see Fig. 6(b). This setup permits the direct capture of a number of orthographic view images of the sample as large as the number of lenslets in the array. This instrument, named the Fourier integral Microscope (FiMic), overcomes many of the problems listed above. More specifically, it can provide view images with resolution up to one third of the resolution of the native microscope, with much larger depth of field, a broad field of view, and the same point-spread function over the complete sample. Other advantages are the higher density of computable refocused depth images and their homogeneous lateral resolution.
Naturally, since integral microscopy is a computational-imaging technology, the inception, development, and optimization of new computational tools for the accurate calculation of refocused images and 3D point clouds will be the subject of research over the next few years. In any case, integral microscopy has already started to demonstrate its applicability in biomedical sciences [60][61][62].

DiffuserCam: a new method for single-shot 3D microscopy
The DiffuserCam project started with a question: is it possible to capture a light field by replacing the microlens array with a diffuser? The idea is that, like a microlens array, a smooth diffuser has small bumps that focus light, albeit in a random way. Hence, the diffuser should also be able to encode 4D space-angle information. In [63], we demonstrated LFM with a diffuser in place of the microlens array, then used a computational inverse solver to reconstruct the 4D light field. The diffuser-LFM had several advantages over traditional LFM [25,26]: 1) Off-the-shelf diffusers are significantly less expensive than microlens arrays. 2) The diffuser need not be carefully aligned, making fabrication easier. 3) The numerical aperture (NA) of the diffuser bumps need not match the NA of the objective lens, allowing users to swap in objectives of different magnification/NA. We demonstrated digital refocusing and perspective shifts with the diffuser-LFM. However, the system still suffered from the typical trade-off between spatial and angular sampling that results in reduced resolution, which is a key performance metric for microscopy. LFM resolution can be significantly improved by a 3D deconvolution approach [64] in which the 2D measurement is used directly to solve for a 3D intensity image, rather than taking the intermediate step of recovering the 4D light field. The only loss of generality is an assumption of no occlusions, which holds well for fluorescent samples in bio-microscopy. Deconvolution LFM achieves nearly diffraction-limited resolution at some depths, but performance degrades sharply with depth, the system suffers artifacts near the native image plane, and the spatially-varying operations require computationally-intensive reconstruction algorithms.
Fourier Light Field Microscopy (FLFM), in which the microlens array and sensor are placed at the pupil plane of the objective [28,59], reduces artifacts near focus and provides a computationally-efficient shift-invariant model. The same benefits can be obtained for diffuser-LFM by placing the diffuser and sensor in the Fourier plane (Fig. 7). The diffuser version further improves the depth range and resolution uniformity because the diffuser has bumps with a wide range of focal lengths, meaning that we have a sharp response from a wide range of depth planes [65,66]. When the diffuser is placed directly on the back aperture of the microscope objective, the entire system has the added advantage of being extremely compact [65]. With this configuration, the randomness in the diffuser brings a major new advantage by enabling compressed sensing. Because the diffuser response is not periodic like that of a microlens array, it does not have degeneracies that require physically limiting the FOV. Sub-images may overlap, and a sparsity-constrained inverse problem can recover the 3D scene with the fully-available FOV. This removes the need to trade off spatial and angular resolution, giving the best of both worlds if the sample is sufficiently sparse. Since we no longer need limiting apertures, we can even remove the objective lens, creating a lensless 3D imager that is just a sensor and a diffuser [59,67-69] (Fig. 7). The resulting system is compact and inexpensive, while still providing high-resolution large-volume 3D reconstructions at speeds set by the frame rate of the sensor, or even faster when rolling shutter scanning effects are exploited [5].
Fig. 7. ... [65,66], and the lensless 3D DiffuserCam [67], which is simply a diffuser and a sensor. The DiffuserCam reconstruction pipeline takes the single-shot captured image and reconstructs non-occluding 3D volumes by solving a nonlinear inverse problem with a sparsity prior, after a one-time calibration process.
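The sparsity-constrained reconstruction idea can be sketched in Python with a deliberately reduced model. The sketch assumes a single depth plane, a shift-invariant forward model implemented as circular convolution with a random, caustic-like point-spread function, and plain ISTA (iterative soft thresholding with a nonnegativity constraint) as the solver. None of these choices is claimed to match the algorithms of [65-67]; the PSF, the regularization weight, and the iteration count are hypothetical.

```python
import numpy as np

def forward(x, H):
    """Shift-invariant forward model: circular convolution with the PSF (H is its 2D FFT)."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * H))

def adjoint(y, H):
    """Adjoint of the forward model: circular correlation with the PSF."""
    return np.real(np.fft.ifft2(np.fft.fft2(y) * np.conj(H)))

def objective(x, y, H, lam):
    """Least-squares data term plus L1 sparsity penalty."""
    return 0.5 * np.sum((forward(x, H) - y) ** 2) + lam * np.sum(np.abs(x))

def ista(y, psf, lam=0.01, n_iter=200):
    """Proximal-gradient (ISTA) solver with an L1 prior and a nonnegativity constraint."""
    H = np.fft.fft2(psf)
    step = 1.0 / np.max(np.abs(H)) ** 2        # 1 / Lipschitz constant of the data-term gradient
    x = np.zeros_like(y)
    for _ in range(n_iter):
        x = x - step * adjoint(forward(x, H) - y, H)
        x = np.maximum(x - step * lam, 0.0)    # soft threshold combined with x >= 0
    return x

rng = np.random.default_rng(3)
psf = rng.random((32, 32)) ** 8                # hypothetical caustic-like random PSF
psf /= psf.sum()
x_true = np.zeros((32, 32))
x_true[rng.integers(0, 32, 5), rng.integers(0, 32, 5)] = 1.0   # sparse point emitters
H = np.fft.fft2(psf)
y = forward(x_true, H) + rng.normal(0.0, 1e-4, x_true.shape)   # single-shot measurement

x_hat = ista(y, psf)
```

In a volumetric version the unknown would be a stack of depth planes, each with its own calibrated PSF, and the measurement would be the sum of the per-plane convolutions.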

Data compression and coding of the integral imaging data
Integral imaging data is spatially multiplexed 4D light field (ray space) data. Light field representation requires tens to hundreds of thousands of images, and therefore light field data compression has been one of the critical aspects for the practical usage of light fields since the early stages of light field research [70,71]. Figures 8(a) and 8(b) show examples of ray space data (shown in 3D) and spatially multiplexed data (lenslet images). The challenge is how to reduce the amount of data by exploiting the redundancy present in 4D light field data.
As light field data can be interpreted as a collection of 2D images, research on light field data compression has tried to apply coding schemes that were originally intended for 2D images and video. The core principles of image/video coding are vector quantization, transform coding such as the Discrete Cosine Transform (DCT), and predictive coding such as motion/disparity compensation. The basic approach to light field coding is to apply 2D video coding methods to a 2D array of 2D images, which corresponds to the data structure of light field data. Early research aimed at improving the compression performance of given standard coding tools [72]. In the mid 2010s, the problem of light field coding attracted considerable interest again, with an increased number of academic and industrial research papers. At that time, several light field coding challenges were held at signal processing conferences such as the IEEE International Conference on Multimedia and Expo (ICME, 2016) [73] and the IEEE International Conference on Image Processing (ICIP, 2017). Several methods proposed in the papers on light field coding applied standard image coding tools, such as JPEG standards from ISO/IEC JTC1/SC29/WG1, or video coding tools based on MPEG standards from ISO/IEC JTC1/SC29/WG11. Based on the results of the challenges, the JPEG standardization committee created an initiative called 'JPEG Pleno' in 2016 [74]. The key technologies are 4D transform and 4D prediction. For the performance of JPEG Pleno, please see [75]. One example point on the rate-distortion (R-D) curve is a PSNR (Peak Signal to Noise Ratio) of 38 dB at a rate of 0.1 bpp (bits per pixel). At this moment, JPEG Pleno is in the Draft International Standard (DIS) phase of the international standardization timeline [76]. MPEG has also started standardization activity: dynamic light field coding is discussed in the MPEG-I Visual group under the name 'Dense Light Fields Coding' [77,78].
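Two of the notions used above, disparity-compensated prediction between neighbouring views and PSNR as a distortion measure, can be illustrated with a small Python sketch. It is a toy under stated assumptions and is unrelated to the actual JPEG Pleno or MPEG tools: the neighbouring view is modelled as a pure integer horizontal shift of the reference view, shifts are circular, and the search range is hypothetical.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def best_disparity(ref_view, cur_view, max_d=4):
    """Integer horizontal shift of ref_view that best predicts cur_view (exhaustive search)."""
    costs = {d: np.sum((cur_view - np.roll(ref_view, d, axis=1)) ** 2)
             for d in range(-max_d, max_d + 1)}
    return min(costs, key=costs.get)

rng = np.random.default_rng(2)
ref_view = rng.integers(0, 256, size=(16, 32)).astype(float)
cur_view = np.roll(ref_view, 3, axis=1)          # neighbouring view with a 3 px disparity

d = best_disparity(ref_view, cur_view)
residual = cur_view - np.roll(ref_view, d, axis=1)   # what a predictive coder would encode
energy_raw = np.sum(cur_view ** 2)
energy_residual = np.sum(residual ** 2)
psnr_off_by_one = psnr(ref_view, ref_view + 1.0)     # uniform error of one gray level
```

In this idealized case the residual vanishes, so only the disparity needs to be transmitted; real light fields leave a nonzero residual that is then transform coded and quantized, which is where the rate-distortion trade-off arises.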
An emerging research topic in light field coding is the use of neural network (NN)-based methods. NN-based methods have been widely used in image processing tasks such as depth estimation and view synthesis, and they have been reported to outperform conventional image processing methods. One example applied to light field coding is generative adversarial network (GAN)-based light field coding [79].
The challenge of integral imaging data coding is to further enhance coding efficiency as well as coding/decoding speed. With advances in efficient data compression technologies, together with the development of high-speed, high-bandwidth networks such as 5G, integral imaging data communication will be realized in the near future.

Hand gesture recognition using a 3D sensing approach based on integral imaging
Human gesture recognition, and particularly hand gesture recognition, is an increasingly in-demand application in the multimedia and human-machine interaction fields. RGB-D image sensors have been used as the main 3D imaging technology for hand gesture recognition tasks in these applications [80][81][82]. Integral imaging is a powerful alternative to RGB-D sensors, due to its passive sensing nature and the fact that it can work under certain challenging limitations such as partial occlusion and low illumination conditions. This section summarizes part of the previous research work on 3D sensing for gesture recognition based on integral imaging [83,84]. These works showed the capabilities of integral imaging for hand gesture recognition using 3D image reconstruction techniques that outperform active RGB-D sensors in challenging partial occlusion scenarios. Camera arrays are an integral imaging modality that acquires a set of elemental images, which capture the scene from different viewpoints. High resolution cameras make it possible to acquire elemental images with larger physical aperture and camera separation, which provide higher image resolution and better depth estimation within some depth ranges. 3D reconstruction focusing at a certain depth can be performed from elemental images [51] using computational models based on pinhole arrays [35].
Integral imaging from an array of cameras can reconstruct the image sequence at the depth where a hand gesture is located, focusing on the hand movements. The hand gesture motion is characterized by means of a Bag of Words (BoW) method from previously extracted Spatiotemporal Interest Points (STIPs). These feature points are characterized by extracting local features around the STIPs, which are used to build the BoW characterization that is eventually used in a Support Vector Machine (SVM) classifier [83,84]. Experiments were carried out using a 3x3 camera array for the elemental image acquisition. The integral imaging method was also compared with a Kinect RGB-D sensor, using the same hand gesture characterization and classification technique. Image sequences from a single camera were also used as a baseline technique. Results showed that integral imaging using an array of cameras outperformed RGB-D sensing and single image capture, particularly in challenging partial occlusion conditions (Fig. 9) [83,84]. Integral imaging provides a variety of features that are well suited to application scenarios where challenging conditions, such as partial occlusion, cannot be overcome by other 3D sensing technologies. Integral imaging capture is a powerful image acquisition technique for passive 3D reconstruction with high image resolution and sufficient depth estimation accuracy in certain depth ranges. 3D image reconstruction is a useful tool to extract 3D features to characterize and recognize 3D movements such as human hand gestures in multimedia and human-machine interaction applications.
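The BoW characterization step can be illustrated with a short Python sketch. It shows only the quantization of local descriptors against a codebook and the resulting normalized histogram. The STIP detector, the descriptor type, the codebook construction (commonly k-means, which is an assumption here), and the SVM classifier of [83,84] are not reproduced, and the toy two-dimensional descriptors are purely hypothetical.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and return a normalized histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # index of the nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])     # hypothetical two-word vocabulary
descriptors = np.array([[1.0, 0.0], [9.0, 9.0], [11.0, 10.0], [0.0, 1.0]])
hist = bow_histogram(descriptors, codebook)
```

One such histogram per video sequence forms the fixed-length feature vector that a classifier such as an SVM can consume, regardless of how many interest points were detected.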

Perceivable light fields for integral imaging display design
Light Fields (LFs) [70], $L(x, y, u, v)$, are four-dimensional functions representing radiance along rays as a function of position $(x, y)$ and (generalized) direction $(u, v)$. They are commonly used for the analysis of the propagation of 3D visual information from the display to the viewer. In [85] we proposed an analysis approach that follows the reverse direction, that is, from the viewer to the display device, to better evaluate the display device specifications needed to fulfill the viewer requirements. For this purpose, we introduced the notion of the Perceivable Light Field (PLF) [85,86] to describe the best perceivable light distribution that ideally should be provided to the viewer. The PLF is propagated back to the display device to determine the LF distribution that the display needs to generate.
One of the main uses of the LF representation is the analysis of light transport through free space and through common optical components, because LF propagation through first-order optical systems can be easily described by simple affine transforms [85,87,88]. For improved heuristics, we proposed in [85] to use a linear decomposition of the LF: $L(x, u) = \sum_{n,m} l_{n,m}\,\phi_{n,m}(x, u)$, where $\phi(x, u)$ denotes the light field atom (LFA), defined as the most concentrated LF element that a system can support. The PLF is the LF, $L(x_e, u_e)$, captured and perceived by the human visual system. Figure 10(b) illustrates the 2D PLF chart of binocular viewing (Fig. 10(a)). The dots in the PLF chart in Fig. 10(b) represent the centers of the LFAs, according to one possible tiling of the PLF chart [85]. After backpropagation to the integral imaging [17] display (Fig. 10(a)), the PLF is horizontally sheared, $L_e(x_e + z_d u_e, u_e)$, as shown in Fig. 10(c). For high-quality 3D image generation, the integral imaging LF support needs to enclose the back-projected PLF, and each PLF atom should be matched by at least one display LFA. In Fig. 10 we considered an integral imaging display working in unfocused mode [17] (a.k.a. resolution priority integral imaging [89]) with a static viewer. The same methodology can be used for integral imaging in focused mode, and to include motion parallax as well [85].
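The shear that back-propagates the PLF to the display plane can be checked numerically with a small Python sketch. It is an illustration under assumptions rather than the analysis of [85]: LFA centers are treated as points (x, u) with x in millimetres and u as a dimensionless generalized direction, the eye positions, angular range, viewing distance, and display half-widths are hypothetical numbers, and the display LF support is approximated by an axis-aligned rectangle.

```python
import numpy as np

def propagate(points, z):
    """Free-space propagation of LF sample points (x, u) over a distance z: x -> x + z * u."""
    shear = np.array([[1.0, z],
                      [0.0, 1.0]])
    return points @ shear.T

def enclosed(points, x_half_width, u_half_width):
    """True if every point lies inside a rectangular LF support centred at the origin."""
    inside_x = np.abs(points[:, 0]) <= x_half_width
    inside_u = np.abs(points[:, 1]) <= u_half_width
    return bool(np.all(inside_x & inside_u))

# Hypothetical PLF atom centers for binocular viewing: two eyes at x = +/-32.5 mm,
# each sampling directions u in [-0.1, 0.1].
u_samples = np.linspace(-0.1, 0.1, 5)
plf = np.array([(x_eye, u) for x_eye in (-32.5, 32.5) for u in u_samples])

z_d = 500.0                                   # assumed viewer-to-display distance in mm
plf_at_display = propagate(plf, z_d)

fits_wide_display = enclosed(plf_at_display, x_half_width=100.0, u_half_width=0.1)
fits_narrow_display = enclosed(plf_at_display, x_half_width=50.0, u_half_width=0.1)
```

The directions are unchanged by the propagation while the positions spread in proportion to the viewing distance, so the required display width grows with both the distance and the angular extent of the PLF.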

Towards the development of high-quality three-dimensional displays
Since Lippmann's invention of integral imaging [1], there has been research aimed at developing high-quality 3D displays based on this method [2]. In addition, a system for capturing and displaying objects as 3D images in real-time has been proposed [7,8,90]. For reconstructing high-quality 3D images in real-time, the capture and display devices must have an image sensor and display panel with many fine-pitch pixels along with lens arrays with many fine-pitch lenses. This section presents a recently developed high-resolution 3D display that uses multiple projectors and a wide-viewing-angle 3D display that utilizes eye-tracking technology.
To reconstruct 3D images in integral imaging, the directions of light rays are controlled by the micro-lenses that make up a lens array. Two measures are mainly used to represent the quality of the 3D images: resolution and viewing angle. Moreover, in consumer and industrial use-case scenarios, a 3D display may be viewed either by multiple users or by an individual user.
For multiple users, a 3D display with a large area is preferred; however, it is difficult to fabricate a lens array composed of many micro-lenses covering a large area. Aktina Vision solves this problem by controlling the display directions of multi-view images by using a lens larger than the micro-lenses [91,92]. In [91], multi-view images consisting of a total of 350 viewpoints are projected onto a diffusing screen by using fourteen 4K projectors. The resolution of each view image is 768(H) x 432(V) pixels, and the viewing angle of the 3D image is 35.1(H) x 4.7(V) degrees. Although the current system is not very compact, Aktina Vision is capable of achieving higher resolution and a larger display area by projecting high-resolution multi-view images over a large area.
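The figures quoted for [91] can be cross-checked with a few lines of Python. The only assumption is that "4K" denotes a 3840 x 2160 pixel projector, which is not stated in the text; under that assumption the pixel budget of the fourteen projectors matches the total pixel count of the 350 view images exactly.

```python
def total_pixels(count, width, height):
    """Total number of pixels in `count` images of size width x height."""
    return count * width * height

# Assumption: each "4K" projector has 3840 x 2160 pixels (UHD).
projector_budget = total_pixels(14, 3840, 2160)
view_budget = total_pixels(350, 768, 432)
views_per_projector = 350 // 14
```

Under this assumption each projector carries 25 views, and any increase in the number of views or in per-view resolution would have to be paid for with additional projector pixels.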
For individual users, the 3D display area need not be so large. In addition, because it is enough to display a 3D image within a single viewer's area, the resolution of the 3D image can be improved by not allocating light rays to an unnecessarily wide viewing area. A 3D display using eye-tracking technology has been proposed as a way of maintaining a certain amount of resolution over a sufficiently wide viewing angle [93][94][95]. In [94,95], the lens array is composed of 425(H) x 207(V) micro-lenses and the viewing angle of the 3D image is 81.4(H) x 47.6(V) degrees. An exterior view of the integral 3D display with the eye-tracking system is shown in Fig. 11.
Fig. 11. Exterior view of the integral 3D display with eye-tracking system.
A challenging issue in realizing a high-quality 3D display is the need to display a huge number of light rays. Demand for 3D displays is expected to grow in both industrial and consumer uses. Further development toward high-quality 3D displays that satisfy the requirements of each use-case scenario is expected.
Fig. 12. Light field processing pipeline. A computational camera system, including one or more cameras, captures light field data, which is subsequently processed by a neural network or some other algorithmic framework. The algorithms perform low-level and high-level image processing tasks, such as demosaicking and view synthesis, and transmit the data to a direct-view or near-eye light field display.

On the duality of light field imaging and display
When Gabriel Lippmann invented integral imaging, the foundation of most modern light field imaging systems, he envisioned this technology as a fully integrated imaging and display system [1]. Over the course of the last century, however, the intuitive interpretation of this duality between capture and display got lost, mostly because digital and computational approaches to light field acquisition and synthesis have evolved into sophisticated opto-computational systems that are highly specialized and adapted to specific application domains. This section focuses on the duality between light field capture, processing, and display.
Outside the optics community, in the computer vision, graphics, and machine learning communities, (unstructured) light field capture and view interpolation, extrapolation, and synthesis have become extremely "hot" topics. Although image-based rendering and conventional 3D computer vision have long aimed at reconstructing 3D scenes from 2D images, emerging neural view synthesis approaches are the first to demonstrate photorealistic quality for these applications. In addition to these reconstruction and processing approaches, the emergence of virtual reality has created a strong need to capture multiview image and video data for immersive experiences. In light of this need, custom camera rigs and hand-held camera systems that record unstructured light field data have attracted much interest. Finally, in the computational optics and graphics communities, much work has been done over the last few years on developing near-eye light field displays for next-generation head-mounted displays. Today, all of these research and engineering efforts on recording, processing, and displaying light fields are fragmented. In this roadmap article, we argue for a streamlined approach that considers all of these aspects and potentially optimizes such systems end-to-end, from recording photons to displaying them with a near-eye display (see Fig. 12).
In direct-view displays, light field capabilities enable glasses-free 3D image presentation. In contrast to conventional 2D displays, such displays provide a richer set of depth cues to the human visual system, including binocular disparity and motion parallax in addition to the pictorial cues supported by 2D displays. This capability provides new user experiences in a variety of applications, such as communication, teleconferencing, entertainment, and visualization. However, one of the biggest challenges of integral imaging-based light field displays and cameras is the spatio-angular resolution tradeoff. In order to provide the angular diversity of light rays required for light field capture or display with a single device, the spatial resolution of the corresponding images typically has to be sacrificed [1][2][3][23]. This tradeoff is often undesirable to users and may be one of the primary reasons why neither light field cameras nor light field displays have succeeded in the consumer market. Through the co-design of optics, electronics, and algorithms, emerging compressive light field systems provide a modern approach to light field imaging and display that can leverage redundancy in natural light field data to overcome the long-standing spatio-angular resolution tradeoff and enable high spatial and angular light field resolutions simultaneously [96,97].
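The spatio-angular resolution tradeoff can be made concrete with a small Python sketch. It uses an idealized model, assumed here for illustration only, in which every lenslet covers a square block of `pixels_per_lens` x `pixels_per_lens` panel (or sensor) pixels, so that each lenslet yields one spatial sample and the pixels beneath it yield the angular samples. The panel size and the function name are hypothetical.

```python
def lenslet_tradeoff(panel_w, panel_h, pixels_per_lens):
    """Return (spatial resolution, angular samples) for a square lenslet footprint."""
    # One spatial sample per lenslet along each dimension.
    spatial = (panel_w // pixels_per_lens, panel_h // pixels_per_lens)
    # Every pixel under a lenslet corresponds to one ray direction.
    angular = pixels_per_lens ** 2
    return spatial, angular

# Hypothetical 3840 x 2160 panel with increasingly large lenslet footprints.
for k in (1, 4, 8, 16):
    print(k, lenslet_tradeoff(3840, 2160, k))
```

In this model the product of spatial samples and angular samples stays bounded by the panel pixel count, so every gain in angular diversity is paid for in spatial resolution.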
Over the last decade, virtual and augmented reality (VR/AR) applications have sparked renewed interest in novel camera and display technologies. In these applications, near-eye light field displays may be able to provide focus cues to a user (e.g., [98]). Focus cues, including retinal blur and accommodation, allow the visual system of non-presbyopic users to accommodate at various distances and thus mitigate the vergence-accommodation conflict in VR/AR. Alternative technologies offering similar benefits include gaze-contingent varifocal (e.g., [99,100]) and multifocal displays (e.g., [101]). Thus, the depth cues supported by light fields in these near-eye display applications are slightly different from direct-view displays, but crucial for visual comfort and perceptual realism. Here too, the duality of light field imaging and display is important, although light field camera systems for VR/AR are primarily used to capture omnidirectional stereo panoramas (e.g., [102,103]). Such an approach to cinematic VR allows immersive events to be captured and later replayed in VR while providing stereoscopic depth cues for 360° viewing experiences.
Another emerging research area that provides a strong link between light field capture and display is neural scene representation and rendering (e.g., [104,105]). Instead of focusing on camera or display device development, these machine learning-driven methods take as input one or multiple views of a scene and distill them into a differentiable 3D scene representation, typically a neural network. Such a neural scene representation can then be converted into 2D images using a neural renderer. The result is a fully differentiable pipeline that delivers state-of-the-art results for view interpolation, hole filling, compression and bandwidth management, and many other problems directly associated with light field imaging and display.
More than a century after integral imaging was developed by Gabriel Lippmann, this technology continues to promise unprecedented user experiences in many applications related to photography and to direct-view and near-eye VR/AR displays. Advanced algorithms and optical techniques for improving light field systems remain one of the most active areas of research in applied optics, computer graphics, computer vision, and machine learning.

Progress overview on head-mounted light field displays
A light-field-based 3D head-mounted display (LF-3D HMD) is one of the most promising techniques to address the well-known vergence-accommodation conflict (VAC) problem, which plagues most state-of-the-art HMD technologies because they cannot render correct cues for stimulating the accommodative responses of human eyes [70]. It renders the perception of a 3D scene by reproducing directional samples of the light rays apparently emitted by each point of the scene. Each angular sample of the rays represents the subtle difference of the scene when viewed from slightly different positions and thus is regarded as an elemental view of the scene.
Among the various methods that are capable of rendering partial or full-parallax 4-D light fields [1][106][107][108], the simple optical architecture of an integral imaging-based technique makes it attractive to integrate with an HMD optical system and create a wearable light field display. There exist two basic architectures for implementing an integral imaging-based method in an HMD: a direct-view configuration and a magnified-view configuration. In a direct-view configuration, a microdisplay and array optics are placed directly in front of the eyes. For instance, Lanman et al. demonstrated a prototype of an immersive LF-3D HMD design for VR applications [109], and Yao et al. demonstrated a see-through prototype by creating transparent gaps between adjacent micro-lenses and using a transparent microdisplay. In a magnified-view configuration, a microscopic integral imaging (micro-InI) unit is combined with a magnifying eyepiece to improve the overall depth of reconstruction and image quality. Hua and Javidi demonstrated the first practical implementation of an optical see-through (OST) LF-HMD design by integrating a micro-InI unit for full-parallax 3D scene visualization with a freeform eyepiece [110], and later Song et al. demonstrated another OST InI-HMD design using a pinhole array together with a similar freeform eyepiece [111].
Conventional integral imaging-based displays suffer from several major limitations when applied to HMD systems [109][110][111][112], such as a tradeoff between depth of field (DOF) and spatial resolution, and tradeoffs between viewing angle or viewing window range and view density. To address these limitations, Huang and Hua presented a systematic approach to investigate the relationships between these tradeoff parameters and to establish methods for quantifying them, together with threshold requirements and design guidelines [113,114]. Based on their analytical work, Huang and Hua recently proposed a new optical architecture that improves the performance of an integral imaging-based light field HMD by incorporating a tunable lens to extend the DOF without sacrificing the spatial resolution and an aperture array to reduce crosstalk, or equivalently expand the viewing window [115,116]. Figure 13(a) shows the optical layout based on this new architecture, and Fig. 13(b) shows two photographs of rendered Snellen letter targets at the depths of 3.5 and 0.5 diopters, respectively, along with two physical references placed at the same depths as their corresponding virtual targets [116]. The system supports three different rendering methods based on the central depth plane (CDP): a fixed-CDP mode, a vari-CDP mode, and a time-multiplexed multi-CDP mode, enabling a large depth volume, from as close as 3.5 diopters to near optical infinity, without compromising the spatial resolution.
Fig. 13. Example of a high-performance integral imaging-based LF-3D OST-HMD: (a) the optical layout and prototype, and (b) images captured through the prototype with the camera focused at the depths of 3.5 and 0.5 diopters, respectively [116].
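For readers less used to dioptric units, the depths quoted above can be converted to metric distances with the reciprocal relation between diopters and meters. The helper below is a minimal sketch; the function name is an illustrative choice.

```python
def diopters_to_meters(d):
    """Convert a depth in diopters to a distance in meters (0 D maps to infinity)."""
    return float("inf") if d == 0 else 1.0 / d

# Depths of the two Snellen targets, plus optical infinity for reference.
for d in (3.5, 0.5, 0.0):
    print(f"{d} D -> {diopters_to_meters(d):.3f} m")
```

The 3.5 diopter target therefore sits at roughly 0.29 m from the eye and the 0.5 diopter target at 2 m.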
Although the prototype examples above demonstrated that an integral imaging-based HMD method can potentially produce correct focus cues and true 3D viewing, there exist many technical gaps and challenges in developing this technology into a commercially viable solution. For instance, to scale up the spatial resolution to the level of 1 arc minute per pixel or the FOV to as wide as 100 degrees, matching the visual acuity and FOV of the human eye, the microdisplays required for building such an LF-HMD system would need to offer a pixel density as high as 25000 pixels per inch (PPI), which is still beyond the reach of today's display technology, not to mention the amount of computational power required.
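To give a sense of scale for these targets, the sketch below counts the pixels needed along one dimension for a given FOV and angular resolution. The `views_per_dim` multiplier is a hypothetical parameter standing in for the number of elemental views rendered per spatial sample in a light field display; it is an assumption for illustration, and the sketch does not attempt to reproduce the 25000 PPI figure, which depends on display size and optics not given here.

```python
def pixels_across_fov(fov_deg, arcmin_per_pixel, views_per_dim=1):
    """Pixels needed along one dimension.

    views_per_dim is a hypothetical multiplier for the number of elemental
    views rendered per spatial sample along that dimension (1 = plain 2D).
    """
    spatial_samples = fov_deg * 60.0 / arcmin_per_pixel
    return spatial_samples * views_per_dim

print(pixels_across_fov(100, 1.0))                   # plain 2D case
print(pixels_across_fov(100, 1.0, views_per_dim=3))  # illustrative light field case
```

Even the plain 2D case calls for 6000 pixels across a 100-degree FOV, and any angular sampling multiplies that count further.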

Innovation of 3D integral imaging display and AR for biomedicine
3D information can significantly accelerate human cognition compared with 2D information in medical applications. High-quality, high-accuracy, real-time processing, visualization, and display of 3D images are important for accurate medical decision-making, which can reduce invasiveness and improve precision in surgical treatment. Researchers have made significant progress in 3D medical integral imaging displays and intelligent augmented reality (AR) surgical navigation systems.
The first requirement of a 3D medical display is high resolution and high accuracy when reproducing images of anatomic structures. In the field of high-performance 3D medical integral imaging display, a multi-projector-based high-quality display method was proposed to solve the inadequate pixel density problem of the 2D elemental image [117]. To further break the tradeoff between viewing angle and resolution in the conventional integral imaging technique, an image enhancement method for the 3D AR system was proposed to achieve enhanced image resolution and enlarged viewing angle at the same time [118]. With the development of telemedicine and medical education, the 3D medical display is required to present a larger scene with long viewing depth. A computer-generated integral imaging elemental image generation method was proposed to achieve a long visualization depth [119].
The second requirement of 3D medical visualization is high-quality, real-time rendering. Super-multiview integral imaging can provide better image quality and interactivity, but it also suffers from a high computational cost during rendering. A real-time lens-based rendering algorithm for super-multiview integral imaging without image resampling was proposed and showed a significant advantage in image quality and calculation efficiency [120]. The research demonstrated that a real-time 3D medical display and interaction system could potentially help to improve medical learning efficiency and to reduce operational time in medical education and training [121].
A novel AR navigation system that uses real 3D image in situ overlay for intuitive guidance in biomedicine was proposed (Fig. 14). The region of interest in the medical images is reconstructed and rendered in real time [122]. When the surgeon observes through the viewing window, the real 3D image is overlaid onto the corresponding anatomic structure in situ based on the spatial tracking of the patient [123], tools [124], and overlay system [125]. In this way, all internal anatomic structures are in the surgeon's sight during minimally invasive procedures. The 3D AR overlay system has been used in clinical experiments in neurosurgery, orthopedic surgery, maxillofacial surgery, and other areas.
Fast technical progress in recent years has accelerated innovation in 3D display. Researchers proposed an innovative autostereoscopic 3D display approach, based on a MEMS scanning mechanism with homogeneous light emission, that needs no optical lenses or gratings and achieves a super long viewing distance of over six meters [126]. The integration of conventional integral imaging and multilayer light field display will also open up new areas for future 3D medical display [127].

Tabletop integral imaging 3D display
Tabletop 3D display is one of the most challenging and interesting 3D displays [128,129]. It enables a vivid and natural 3D visual experience and a 360-degree viewing zone. Because of the unique full-parallax and full-color characteristics of integral imaging, it is natural to apply the integral imaging concept to the tabletop 3D display. The first proposal in this sense was made by J. H. Park, who used this technology with the aim of displaying 3D images with a 360-degree lateral viewing zone [130]. Later, some improved system configurations were proposed [131][132][133][134][135].
Recently, a swept-type tabletop integral imaging 3D display system has been reported [135]. As shown in Fig. 15(a), the system uses a dynamic tilted barrier array to integrate different elemental image arrays (EIAs) into directional viewing sub-zones. By rotating the tilted barrier array in synchronization with the 2D display device, the lens array and the EIA display, a 360-degree viewing zone can be achieved. The main advantages of this system are that the crosstalk is eliminated and the longitudinal viewing angle is improved to 40 degrees. Figure 15(b) shows the tabletop 3D images at different lateral viewing positions. The parallaxes are apparent; however, the tabletop 3D images are blurred. Another tabletop integral imaging 3D display system with improved 3D image quality has been proposed. As shown in Fig. 16(a), the system utilizes a compound lens array composed of three lens elements to optimize the 3D image quality over a large longitudinal viewing angle. The longitudinal viewing angle can be enlarged to 70 degrees with suppressed aberration. In addition, an 8K display panel is used in the system for improved spatial resolution. Figure 16(b) shows the different perspectives across the 360-degree viewing zone, and a display video for the system is also included (Visualization 1). The tabletop 3D images in this system show good quality.
Although several attempts have been made to improve the tabletop integral imaging 3D display effect, the spatial resolution and the longitudinal viewing angle are still limited, and the volume of content data is huge. As these problems are overcome, high-performance tabletop 3D displays are expected to find wide applications in the future.

Aerial display
One of the important functions of integral imaging is refocusing. As shown in Fig. 17(a), when elemental images are shown on a high-density (HD) display, a micro-lens array (MLA) forms the aerial image. Instead of using the HD display, the aerial image of a source display can be formed by use of an MLA, a scattering screen, and a second MLA, as shown in Fig. 17(b). These optical components can be replaced by a reflective optical component such as a slit-mirror array, as shown in Fig. 17(c). The formed aerial image shows information in mid-air. This function is called aerial display, and its international standardization is being handled by the International Electrotechnical Commission (IEC) [136]. In the broad sense, aerial display refers to a display that shows information in mid-air, where there is no hardware. Aerial display can be realized by use of a light-source display and some imaging optics [137][138][139][140]. In the IEC technical report, aerial display in the strict sense forms a real image in mid-air by use of a light-source display and a passive optical component that converges the diverging light from the light-source display [136]. The essentials of aerial display in the strict sense are shown in Fig. 18. The light-source display emits diverging light rays. A passive optical component changes the direction of each light ray so that the light converges to the image position in mid-air. Thus, the real image of the light source is formed because the multiple diverging rays emitted from a source position converge to a single position. The formed real image is visible over a wide range of angles when light rays from a wide range converge to the image position. When this converging angle is sufficiently wide, the formed real image maintains the visual 3D depth cues, including convergence, binocular parallax, accommodation, and smooth motion parallax.
Real-image formation enables us to realize aerial applications. Prospective applications of aerial displays are direct-view augmented reality (AR) displays and aerial interfaces. See-through augmented information screens will be utilized in museums, theaters, and next-generation car cockpits. Touchless aerial interfaces avoid the hygiene issues of pressing a button to operate machines.
Aerial displays are not limited to showing 2D information. In combination with conventional 3D display techniques, we have realized an aerial light-field display [141] and an aerial depth-fused 3D (DFD) display [142]. Furthermore, an aerial secure display that prevents peeping at the screen has been realized by use of polarization encryption [143]. An omni-directional aerial display was developed and utilized for behavioral biology experiments [144]. Thus, the next challenges include versatile aerial displays. The performance and specifications of an aerial display include image size, floating distance, viewing angle, and resolution. Unlike in a conventional flat-panel display, the resolution of the formed aerial screen depends not only on the number of pixels but also on the imaging optics, the floating distance, and the viewing distance [145]. Optimization of the optical components and systems is the next challenging issue.

Spatial displays for 3D human interface by integral and holographic imaging technologies
In contrast to traditional 3D displays, which are based on the stereoscopic effect of binocular parallax, an integral imaging display reproduces the light field [20,21] by directly reproducing the light rays from an object [1,2]. Similarly, holography can reproduce the wavefront from an object. Both technologies allow for the reproduction of 3D space in the form of a virtual image or a real image [146,147], as in Fig. 19(a), and can therefore be called "spatial displays." The development of such spatial displays is ongoing, as their application fields are broad. The 3D reproduction of virtual or real images using spatial displays gives an unprecedented sense of presence, realism, and impact, and is expected to be applied as a new visual medium for realistic, impressive, or artistic expression. It is also desired in communication systems such as video conferencing or smart speakers. The eye-catching effect is another feature of the spatial display, which will facilitate its application in digital signage and kiosk terminals, where the application to a 3D human-machine interface is highly promising. The combination of gesture recognition and 3D display enables a more intuitive "3D touch" interface [147,148]. A noncontact 3D interface is furthermore necessary in human interface situations where contamination must be avoided.
The essential factors for developing practical spatial 3D displays are screen size, resolution, depth-range, image quality, and device size. The requirement for these factors depends on application types. In addition to these primary factors, it will enhance the sense of real existence if the display screen is discernible. The 3D image is expected to be reachable by users for the intuitive 3D touch user interface.
What, then, are the key technical issues for practical spatial displays? The most demanding device requirement is a spatial light modulator with an extremely high space-bandwidth product per unit time [149]. For example, 300k x 150k = 45G pixels are needed in one frame. The screen size is flexible in an integral display, while holography requires a small pixel pitch. Other important issues are the communication of huge amounts of data and the efficient computation of high-quality images. The system configuration for an impressive visual effect and a compact optical setup is also crucial for maximizing the benefits of a spatial display in the envisioned applications.
Since holographic and integral technologies for spatial displays have different features, their combination facilitates the solution of various application challenges [150][151][152][153][154][155]. One example is the use of a holographic screen for an integral display: by projecting the elemental images onto the screen, a 3D user interface with a thin, transparent screen has been realized [147,148,151]. Another example is the use of advanced computer graphics rendering techniques in the computation of holograms [152][153][154][155], so that a high-resolution, deep 3D image with realistic material appearance can be reproduced in a holographic display, as shown in Fig. 19(b). Further combinations of these technologies will allow new capabilities to emerge in the future.

Conclusion
While there are many approaches in 3D technologies, this article has focused on integral imaging. The Roadmap paper comprises 15 sections that provide an overview of research activities in 3D integral imaging. Each section is prepared by an expert in the field. The author of each section describes the progress, potential, vision, and challenges in a particular application of integral imaging, including signal detection in turbid water, low light object visualization and recognition, polarimetric imaging, microscopy, object recognition, 3D data compression, displays, and augmented reality. As in any overview paper of this nature, it is not possible to describe and represent all the possible applications, approaches, and activities in the broad field of 3D integral imaging. Thus, we apologize in advance if we have overlooked any relevant work.

Authors' Contributions
This Section describes how the authors contributed to this manuscript. B. Javidi prepared sections 2 and 3. A. Carnicer prepared Section 4. M. Martínez-Corral prepared Section 5. L. Waller prepared Section 6. T. Fujii prepared Section 7. F. Pla prepared Section 8. A. Stern prepared Section 9. J. Arai prepared Section 10. G. Wetzstein prepared Section 11. H. Hua prepared