Implementation of a full-color holographic system using RGB-D salient object detection and divided point cloud gridding

Yu Zhao; Jing-Wen Bu; Wei Liu; Jia-Hui Ji; Qin-Hui Yang; Shu-Feng Lin

doi:10.1364/OE.477666

1. Introduction

1.1 Background

The discussion about the “Metaverse” has become a hot topic both in academia and industry recently. Numerous domestic and overseas companies have vigorously entered the “Metaverse” industry. Amid the growing demand for non-face-to-face communication later in the pandemic, the VR industry ushers in development opportunities. Compared with two-dimensional (2-D) display, three-dimensional(3-D) display technology can provide image content that is closer to the real world. It is a critical technology in the fields of 5 G communication, big data, metaverse, and the Internet of Things [1]. The computational holographic 3-D display is considered as an optical display technology with the potential for change. Its applied ranges include medical, entertainment, scientific research, military, distance education, vehicle display systems, and other fields with broad application prospects [2–4].

As one of the holographic display technologies, the computer-generated holographic display has become the focus of current research. Compared with traditional optical holography, computer-generated holography has many advantages, such as low production cost, fast imaging speed, flexible recording and reconstruction, and easy storage and transmission of information. Therefore, computer-generated holographic display technology has become a hot topic of 3-D display in various countries [4,5]. The development of computer-generated holographic 3-D display technology has made remarkable progress, and we are about to enter the 5 G communication era. The high speed of 5 G communication enables it possible to make a holographic video call and real-time remote holographic video displays based on computer-generated holography possible. In order to realize holographic video calls and real-time holographic video displays, researchers have actively studied holographic displays and computational holography technology for real objects to solve their existing problems. The challenges of high-quality holographic display include the difficulty of real-time data acquisition and fast calculation of holograms, insufficient quality of the reconstruction of computational holograms, limited performance of phase front modulators and holographic display systems, and lack of 3-D content sources, which are difficult to meet the actual requirements [6]. Therefore, it is urgent to study the fast algorithm of computer-generated hologram(CGH) for real objects, improve the adaptability of the algorithm to data, and improve the quality of reconstructed images. They are also unavoidable and necessary problems in the practical process of computer-generated hologram display. It is the only way for the development of CGH 3-D display to realize color dynamic holographic 3-D display facing real objects with a high refresh, high quality, low noise, and no distortion. And, it is also an inevitable requirement for holographic 3-D display in applications such as Metaverse [7,8].

With the improvement of computer technology and sensor technology, we can get 3-D information about real objects more easily. Consequently, holographic display for a real object has become more popular. In these studies, the main methods to obtain 3-D information about real objects are camera scanning, DSLR camera array scanning, depth camera acquisition, and so on. Depth cameras can easily obtain the color distribution and 3-D coordinates of real objects. Therefore, the current full-color computer-generated holographic display technology for real objects usually uses depth cameras to collect scene information and uses spatial light modulators to modulate the input light and reconstruct a 3-D scene of the real object.

1.2 Relate works

Korean scholars Lee et al. proposed a method to generate a digital hologram of a real 3-D object through a depth camera [9]. In their studies, depth cameras used the method of time-of-flight optical distance measurement to acquire depth and color information of natural 3-D objects. Li et al. used a simplified CGH pick-up method by employing a Kinect depth camera [10]. In the researchers’ collection scheme, both depth and color information was obtained by Kinect depth camera, and a hologram was then generated from the 3-D information using a point cloud approach. Yamaguchi et al. used a ray-sampling (RS) plane approach featuring a scanning vertical camera array for computational holography and the acquisition of information for real objects [11]. Holograms were created using light ray information processed by conventional CG-rendering software to reproduce deep 3-D scenes at high resolution. In addition, there is 360°full-color holographic reconstruction based on a Kinect depth camera for real objects [12]. However, it is hard to meet the actual needs of holographic video calls and real-time holographic video displays due to its difficulty in obtaining real-time data, calculation speed and insufficient reconstruction quality. Therefore, for color holographic display, the authors have constructed a full-color holographic system for natural objects [13], but due to the acquisition speed and reconstruction speed, the system is difficult to achieve real-time standards.

The wave-front recording plane (WRP) method [14] is an excellent acceleration scheme to overcome the shortcomings of traditional point cloud computing methods. Although it is closer to the object plane, WRP is a small virtual window placed between the object plane and the hologram plane. By calculating the complex amplitude of a small region on the WRP, rather than the complex amplitude over the whole holographic plane, the computational time required can be reduced. The authors and team members have also optimized this algorithm and proposed new algorithms based on wave-front recording plane surface (WRS) [15], depth ranging wave-front plane [16], and uniform ranging multi wave-front recording surface [17] to accelerate the speed of computational holograms based on the point cloud. However, the light field of the point cloud needs to be pre-calculated to determine the light field of the wave-front plane. So, it is much more difficult to use these methods to achieve real-time holographic display. Some scholars have also proposed methods for the angular spectrum layer to generate computational holograms from 3-D scenes [18,19]. The fast Fourier transform is applied without paraxial approximation to generate the angular spectrum of each layer, thus corresponding to the hologram of each layer. Japanese scholars have proposed a method to calculate 3-D holograms by using block radial and diffusion functions, which can accelerate the calculation of holograms at fixed viewpoints. However, due to the reduction of information in the point spread function, it is unsuitable for wide-angle naked-eye display [20].

In our previous work, the authors have proposed a depth-layer weighted prediction approach for full-color CGHs to achieve high-quality color holographic display [13]. In order to achieve higher reconstruction quality, we proposed a holographic display system based on a multi-depth camera [21]. We have also used a high-quality DSLR cameras to capture 2D images and constructed a high-precision color holographic optical reconstruction system based on multi-digital DSLR cameras to reconstruct real objects with high precision and large viewing angle [22]. Nevertheless, the research on reconstruction quality needs to be further expanded to meet the actual needs, which makes the color holographic reconstruction image more similar to the real object. A point cloud gridding (PCG) algorithm was proposed to lift the generation speed of CGHs for real objects [23]. Then, a relocated point cloud gridding algorithm (R-PCG) was proposed. The effect of overlapping data is eliminated through approximate calculation of small amplitude layer displacement, and the reconstruction efficiency and quality of CGH of the full-color holographic system are improved [24]. The author has also proposed a segmented point cloud gridding algorithm to improve the hologram generation speed [25]. However, it is still difficult to meet the needs of real-time display with the current calculation speed.

With the development of the related technology of the depth camera, it is very convenient to collect information from 3-D real objects through a depth camera, and there comes an urgent need for personalized editing and processing of 3-D information [26]. Salient object detection is an image pre-processing step that simulates the human visual attention mechanism in the field of computer vision. By detecting the region of interest in the image, redundant background information can be filtered out, which can effectively reduce the complexity of large information processing. This is also important for 3-D data in holographic display systems [27]. In the previous full-color holographic system, the authors usually use a depth camera to obtain depth and color information from real objects and then introduce the traditional region of interest (ROI) algorithm to generate a color point cloud model and proceed with the point cloud to adapt to the fast Fourier algorithm (FFT). However, the traditional ROI algorithm has large redundancy and various drawbacks, which limits the further improvement of the processing performance of the acquisition unit.

In this research, the depth camera is used to collect information from real objects, and then we filter out redundant background information by an RGB-D salient object detection, which is named ReSidual U-blocks and U-net based reverse attention residual network(U²-RAS) to capture more contextual information from different scales. It can effectively reduce the complexity of 3-D image processing. The proposed divided depth gridding algorithm is used to realize the rapid generation of holograms. Finally, the full-color holographic system for real objects is constructed. In addition, the feasibility of this method is proved by numerical simulations and optical experiments.

2. Full-color holographic system

This study focuses on two key issues of full-color holographic system, which are the acceleration of 3-D information acquisition, and the acceleration of hologram generation. The full-color holographic system for real objects is constructed by RGB-D salient object detection, divided point cloud gridding (D-PCG) algorithm, GPU parallel computing, and 4f system construction.

The content of this research is shown in Fig. 1. The full-color holographic system includes three units: (1) a 3-D information acquisition and pre-processing unit, (2) a hologram generation unit, and (3) a color reconstruction unit. The research scheme of 3-D salient object detection is adopted in unit 1, and the D-PCG algorithm is used to improve the calculation speed of the hologram in unit 2.

Fig. 1. Schematic diagram of the proposed full-color holographic system.

Download Full Size | PDF

2.1 RGB-D salient object detection

A depth camera that can quickly and accurately obtain object information is used to collect 3-D information from real objects. Due to the influence of the position, performance, space light, and other factors of the acquisition equipment, the collected 3-D information has the phenomena of miscellaneous points, occlusion, distortion, and low edge accuracy for the sampling of real objects. Moreover, the amount of data collected directly from the 3-D object is relatively large. Each pixel of the calculated hologram should contain all the information from the object. Therefore, the calculation speed of the complex 3-D object hologram is not enough, so the redundant information needs to be filtered at the acquisition step unit. In the research field of computer vision, salient object detection is a step of image preprocessing that simulates the human visual attention mechanism. By detecting the most interesting areas in the image, the redundant background information of the 3-D object can be filtered out, which can effectively reduce the complexity of massive 3-D image processing. This is particularly important in a full-color holographic system. In the system, the traditional ROI algorithm has various drawbacks, especially for the large redundancy at the boundary of the salient object, which limits the further improvement of performance. So the research should study and design a salient object detection model based on depth learning first to make sure it is more suitable for depth camera applications.

Although the method based on the traditional ROI can extract the saliency detection results simply and effectively, it is still not satisfactory in the aspect of object edge detail extraction. Therefore, as shown in Fig. 2, an RGB-D salient object detection network based on holistically nested edge detection (HED) [27] architecture and U-Net is proposed in this paper. The proposed algorithm called U²-RAS, which is an accurate and compact deep salient object detection network, is mainly composed of reverse attention residual network (RAS) and ReSidual U-blocks (RSU) [28].

Fig. 2. Schematic diagram of RGB-D salient object detection (U²-RAS)

Download Full Size | PDF

Firstly, we assign the color image and depth image to the detection network. then we use residual learning to learn the residual features of the side output for the deepest roughness salient prediction and apply deep supervision to learn the residual features. Instead of directly learning the multi-scale salience object features in different side output stages, we can achieve a lighter model under the premise of ensuring accuracy. Given the up-sampled input image by a factor in the side-output stage, and the residual feature learned in the side-output stage, the deep supervision can be formulated as:

(1)$$\left\{ \begin{array}{l} {\{{{S_{t + 1}}} \}^{up\mathrm{\ \times }{2^{\textrm{t + }1}}}} \approx {\textrm{G}_T}\\ {\{{S_{t + 1}^{up} + {R_t}} \}^{up \times {2^t}}} = {\{{{S_t}} \}^{up \times {2^t}}} \approx {G_T} \end{array} \right., $$

where ${S_t}$ is the output of the residual unit and ${G_T}$ is the ground truth, $up \times {2^t}$ denotes the up-sample operation by a factor ${2^t}$, which is implemented by the same bilinear interpolation with HED. The residual units establish a shortcut connection between predictions at different scales and correct data, which makes them more adaptable to scales and easier to correct for errors. Then the prediction of the shallowest side output is sent to the sigmoid layer for output.

Secondly, the reverse attention mechanism is adopted to guide the feature learning of each side output layer. Starting from the rough saliency map with high semantic confidence and low resolution generated at the bottom layer, the current predicted saliency region is deleted from the side output feature to guide the entire network to discover supplementary object regions. The attention weight is applied to each side output layer in a top-down manner. And the coarse low-resolution prediction can be refined into a complete high-resolution saliency map, which contains object boundaries and other undetected object parts. Given the side-output feature and reverse attention weight, then the output attentive feature can be produced by their element-wise multiplication, which can be formulated as:

(2)$${F_{z,c}} = {A_z} \cdot {T_{z,c}}, $$

where ${A_z}$ is the reverse attention weight of the characteristic graph, ${T_{z,c}}$ is the measured output feature of the feature map, z and c denote the spatial position of the feature map and the index of the feature channel, respectively.

For the part of the detected significant object, the mask map is taken from the lower feature map, only the detection of the currently undetected is completed. This makes the network training convergence very fast, which is also the advantage of residual networks. However, if the saliency map has incorrect redundant parts, it will inevitably lead to some errors. Of course, this is also related to the current salient object detection which is mainly detected by the incomplete contour of the object. Therefore, this research cites the RSU unit, which is able to acquire contextual information at different scales without reducing the resolution of the feature map. In the original U-Net structure, an RSU unit is filled in each stage to form a nested U-shaped structure, called U²-Net. The main design difference between the RSU and the remaining blocks is shown in Fig. 2. The RSU replaces the ordinary single-stream convolution with a structure similar to U-Net and the original features with local features transformed by the weight layer:

(3)$${H_{RSU}}(x) = u({F_1}(x)) + {F_1}(x), $$

where u represents the multi-layer U-structure illustrated, $u({F_1}(x))$ is multi-scale feature and ${F_1}(x)$ is local feature. This design enables the network to extract features directly from multiple scales for each residual block, increasing the network depth without increasing the computational intensity.

In addition, a U²-RAS algorithm is proposed, which updated the RAS unit with the RSU unit on U-Net to obtain a novel and efficient network architecture. The U²-RAS is designed for RGB-D salient object detection without using any pre-trained backbones from image classification. It can be trained from scratch to achieve competitive performance. The new architecture enables the network to be more in-depth and obtain high resolution without significantly increasing memory and computing costs. In the training process, we use deep supervision similar to HED. Our training loss is defined as:

(4)$$L = \sum\limits_{m = 1}^M {w_{side}^{(m)}l_{side}^{(m)} + {w_{fuse}}{l_{fuse}}} , $$

where $l_{side}^{(m )}$(M = 6, as the Sup1, Sup2, $\cdots $, Sup6) is the loss of the side output saliency map $S_{side}^{(m )}$ and ${l_{fuse}}$ are the loss of the final fusion output saliency map ${S_{fuse}}$. $w_{side}^{(m )}$ and ${w_{fuse}}$ are the weights of each loss term, respectively. For each term l, we use the standard binary crossentropy to calculate the loss:

(5)$$l ={-} \sum\limits_{(r,c)}^{(H,W)} {[{P_{G(r,c)}}\log {P_{S(r,c)}} + (1 - {P_{G(r,c)}})\log (1 - {P_{S(r,c)}})]} , $$

where $({r,c} )$ is the pixel coordinates and $({H,W} )$ is image size: height ang width. ${P_{G({r,c} )}}$ and ${P_{S({r,c} )}}$ denote the pixel values of the ground truth and the predicted saliency probability map, respectively.

Then, the point cloud map is generated by the color image and depth image of the object. The point cloud generated by the depth camera usually contains a large amount of depth information, as shown in Fig. 1 (unit 2). However, the depth difference between each point is constant on the z-axis of the point cloud. The point clouds can be classified into multiple sub-layers by using depth information. After classifying the point clouds, the point cloud model can be resampled and transformed into a deep grid.

Finally, we obtain the point cloud model of the salient object by multiplying the depth grids of the point cloud model with the predicted detection map output from the depth layer.

2.2 Divided point cloud gridding algorithm

After obtaining the point cloud model of the salient object, the D-PCG method is used to remove the invalid sub-grids to achieve the efficient generation of color holograms. In the D-PCG algorithm, the point cloud is stretched to generate a sparse matrix. Therefore, the FFT calculation of the entire area of the depth grid will cost a large amount of unnecessary calculation time. The D-PCG algorithm will perform effective calculations of the depth grid according to the effective area. Each sub-depth grid is determined to remove redundant grid information to reduce calculation time. Meanwhile, FFT is used to generate computational hologram (CGH) from effective depth grids. Finally, this algorithm establishes a GPU parallel processing platform to achieve the purpose of fast acquisition and processing of 3-D information of real objects.

As shown in Figs. 3 (a-d), after dividing the point cloud model into sub-depth grids, the grids in each depth are composed as sparse images. The depth grids should be generated from the effective areas. The FFT calculation of the entire area from all of the depth grids is a large waste of time, the region of interest (ROI) operator is adopted for each depth grid to reduce the calculation time. Among of D-PCG algorithms, every point has the same depth value in red, green, and blue (RGB) channels as the gridding point cloud model. In each channel, the depth grid is divided into M × N parts, and only the effective area of the depth grid is calculated. After the point cloud divided into sub-depth grids, the depth grids of different distances will form a sparse matrix, as shown in Fig. 3(e). As the FFT calculation of the entire area of the deep grid will cost a large amount of non-essential calculation time, the D-PCG used in this study will divide the sub-grids and perform effective calculations of the deep grid based on the effective area. An ROI operator is used to remove the invalid sub-grid sections for each depth grid. Compared with wave-front recording plane (WRP) and the traditional PCG method. The complexity of computation will be greatly reduced.

Fig. 3. (a) Point cloud model, (b)data of 18th depth grid and,(c)data of 180th depth grid and (d)data of 180th depth grid. (e) The principles of the D-PCG method.

Download Full Size | PDF

Larger M and N will cause faster calculation. However, more segmentation number causes more splicing time. With this feature, the authors determined the M and N by experimental data. When setting the resolution of depth grids and holograms as 1024 × 1024 (2048 × 2048) pixels, the distance between object and hologram is set as 0.5 m, the wavelength is 532 nm, and the total number of samples in each dimension becomes Nx = Ny = 43(169), and even if we divide the depth grids into 10 × 10 parts, the number of samples required in the object domain and hologram domain is sufficient.

The FFT algorithm is used to improve the computational efficiency when generating holograms. The hologram of three channels is generated by using 2D FFT on the deep grid. By calculating the hologram from a 2-D multi-depth grid rather than a single point in a point cloud, the overall calculation time can be greatly reduced. Then the calculated hologram can be obtained by performing light field diffraction calculations on each layer:

(6)$${U_M}\textrm{(}{f_x}\textrm{,}{f_y}\textrm{) = }{\cal F}[{U_N}[x,y]],$$

where U_M is the light field information in red, green, and blue channels. Applying FFT to the Fresnel diffraction impulse response can obtain the corresponding angular spectrum:

(7)$$H({f_X},{f_Y}) = {e^{jkz}}\exp [ - j\pi \lambda z(f_X^2 + f_Y^2)],$$

where λ is the wavelength, and z is the distance between the object point and the hologram, and $H({f_X},{f_Y})$ represents the angular spectrum. As shown in the Eq. (8) and Eq. (9), by applying the 2D FFT to the formula and the result of multiplication, the sub-hologram of the cutting grid can be quickly generated.

(8)$${H_{Depth\textrm{ }grid\textrm{ }N}} = {\mathrm{{\cal F}}^{ - 1}}[{\mathrm{{\cal F}}[{{U_M}[{f_x}\textrm{,}{f_y}]} ]H({f_X},{f_Y})} ],$$

(9)$${H_{M\_sub}} = {H_{Depth\textrm{ }grid\textrm{ }1}} + {H_{Depth\textrm{ }grid\textrm{ }2}} +{\cdot}{\cdot} \cdot{\cdot} \cdot{\cdot} + {H_{Depth\textrm{ }grid\textrm{ }N}},$$

In which ${H_{Depth\textrm{ }grid\textrm{ }N}}$ represents the hologram of a deep grid in the M channel (M = R, G, or B). Then the RGB sub-holograms are combined into a color hologram. Since the hologram is composed of three channels (RGB), the color hologram is encoded as 24-bit (8-bit × 3) image.

3. Experiment and results

In this section, the 3-D salient object detection model proposed in this paper was combined with the D-PCG algorithm to generate a hologram, and we compare its reconstruction speed and quality with the ROI algorithm. The Kinect depth camera with version 2.0 was used to collect depth images and color images. The simulation test was implemented in MATLAB 2021b and runs on Windows 10 and Ubuntu 22.0 64-bit PCs. The PC used in the experimental environment includes an RTX3070 unique display graphics card (8 G video memory) and a 12th generation CPU of Core i7-12700 H with 16 GB RAM.

The experiments prove the effectiveness of the proposed method. Figures 4 (a) and (b) are color information and depth information collected by a depth camera. We compare the results generated by the traditional salient object detection dilated residual networks (DRN) algorithm [29], the reverse RAS algorithm [30], and the proposed U²-RAS algorithm introduced. The detected object is white and the background part is black. The training set contains 803 images, and then we get the contour of the objects detected by 200 training times with DRN, RAS and U²-RAS method respectively. From Figs. 4 (c), (d), and (e), it can be seen that the object detected by our proposed method is more accurate.

Fig. 4. (a) Color image and (b) depth image obtained by the depth camera. Prediction detection images obtained by (c) DRN algorithm, (d) RAS algorithm and proposed (e) U²-RAS algorithm.

Download Full Size | PDF

Figure 5 shows the prediction test graph of the U²-RAS algorithm after training 30, 100, and 200 times to get the model. The training set contains 803 images of different types and complexity, and the average time required for each training session is 160 seconds. It can be seen that the contour of the object detected by 200 training times is clearer than that of 30 and 100 times.

Fig. 5. (a) Color image and the prediction detection map obtained by U²-RAS algorithm trains (b)30, (c)100 and (d)200 times respectively.

Download Full Size | PDF

As shown in Figs. 6 (a) and (b), compared with the traditional ROI algorithm, the U²-RAS algorithm can remove the redundant background more effectively. Figures 6 (c) and (d) show the reconstruction images of different focal lengths obtained by the U²-RAS algorithm.

Fig. 6. Point cloud model generated using (a) ROI algorithm and (b) U²-RAS algorithm. Reconstruction image with distance at (c) 25 cm and (d) 35 cm.

Download Full Size | PDF

Table 1 shows popular object-level Salient object detection (SOD) evaluation metrics: F-measure [30], Structural measure (S-measure) [31], Enhanced-alignment measure (E-measure) [32] and mean absolute error (MAE) [33]. Compared with the U²-NET and RAS algorithm, the U²-RAS algorithm can remove redundant backgrounds more effectively.

Table 1. Evaluation metrics

View Table | View all tables in this article

F-measure comprehensively considers both precision and recall by computing the weighted harmonic mean:

(10)$${F_\beta } = \frac{{(1 + {\beta ^2})\textrm{Precision} \times \textrm{Recall}}}{{{\beta ^2}\textrm{Precision + Rcall}}}, $$

where ${\beta ^2}$ is set to 0.3 to put more emphasis on precision.

S-measure evaluates the structural similarity between the real-valued saliency map and the binary ground truth. It considers object-aware (${S_o}$) and region-aware (${S_r}$) structure similarities:

(11)$$S = \alpha \times {S_o} + (1 - \alpha ) \times {S_r}, $$

where α is empirically set to 0.5.

E-measure considers global means of the image and local pixel matching simultaneously:

(12)$${Q_s} = \frac{1}{{W \times H}}\sum\nolimits_{i = 1}^W {\sum\nolimits_{j = 1}^H {{\phi _S}} } (i,j), $$

where ${\phi _S}$ is the enhanced alignment matrix, reflecting the correlation between S and G after subtracting their global means, respectively.

MAE denotes the average per-pixel difference between a predicted saliency map and its ground truth mask. It is defined as:

(13)$$MAE = \frac{1}{{W \times H}}\sum\nolimits_{r = 1}^H {\sum\nolimits_{c = 1}^W {|{P(r,c) - G(r,c)} |} } , $$

where P and G are the probability map of the salient object detection and the corresponding ground truth respectively, (H, W) and (r, c) are the (height, width) and the pixel coordinates.

The computation time of WRP, R-PCG and D-PCG methods are compared in Tables 2 and 3 when using CPU and GPU. The resolutions of the holograms are 1024 × 1024 and 2048 × 2048 pixels respectively, and the pixel spacing is 7.4µm. The wavelengths of red, green, and blue light are set to 633 nm, 532 nm, and 473 nm, respectively. To balance the computation time and complexity, the number of partitions is chosen to be m = 8. Compared with the traditional WRP method, the D-PCG method proposed in this paper improves the computational hologram generation speed by 31.6-104.9 times using CPU and 5.8-17.9 times using GPU. The CPU and GPU computational hologram generation speed of the D-PCG method is about 59% of the R-PCG method.

Table 2. Calculation time of the full-color holographic system when the resolution is 1024 × 1024. Running times are in seconds

View Table | View all tables in this article

Table 3. Calculation time of the full-color holographic system when the resolution is 2048 × 2048. Running times are in seconds

View Table | View all tables in this article

The optical reconstructions were performed by using RGB lasers and a transmission-type SLM (8.0µm, 1,920 × 1,200 pixels). As shown in Fig. 7, the RGB lasers were used as reference beams. The wavelengths are as follows: red laser, 660 nm; green laser, 532 nm; and blue laser, 473 nm. A 4f system was used to eliminate the noise in the display system.

Fig. 7. The setup of full-color holographic system.

Download Full Size | PDF

Figures 8(a) and (b) show the color images and depth images of the real objects captured in the full-color holographic system. As shown in Fig. 8(c), the point clouds were generated with the proposed U²-RAS method, and the distance from the center of the SLM to the CCD camera is 30 cm. In the numerical and optical experiment, the resolution of the hologram was set as 1024 × 1024, the depth grids are divided into 10 × 10 parts. It demonstrates that the real objects have been reconstructed successfully with the proposed U²-RAS and D-PCG methods in the full-color holographic system.

Fig. 8. (a) Color images and (b) depth maps of real objects. (b) Point cloud models generated from U²-RAS and (c) Numerical and (d)optical reconstructed images without 4f system.

Download Full Size | PDF

Figure 9 shows the reconstruction results by using the U²-RAS algorithm and D-PCG method at focal lengths of 35 cm and 25 cm, respectively. In the experiment, we adopted a 4f system to eliminate the speckle noise. The experimental results show that 3-D object can be reconstructed by U²-RAS salient object detection and the D-PCG algorithm.

Fig. 9. Reconstructed images using red, green and blue laser with central positions located at (a) (b) (c)35 cm and (f) (g) (h) 25 cm. Full-color reconstructed images with central positions located at (a)35 cm and (b) 25 cm with 4f system.

Download Full Size | PDF

4. Conclusion

In this paper, two units corresponding to the 3-D information acquisition and the hologram generation are focused on the full-color holographic system to improve the calculation speed of holograms and the reconstruction quality, respectively. In the acquisition unit, U²-RAS based salient object detection is adopted. In the hologram generation unit, the D-PCG algorithm is used to improve the calculation speed of the hologram. The experimental results show that the full-color holographic system based on U²-RAS salient object detection and D-PCG algorithm can obtain better imaging effect and cost less time compared with the traditional ROI, DRN, U²-net, WRP, and R-PCG methods, which can effectively improve processing speed and image accuracy of the holographic system.

Funding

National Natural Science Foundation of China (No.61905008, No.62205283); Natural Science Foundation of Jiangsu Province (No. BK20200921); China Postdoctoral Science Foundation (No.2022M712697); Natural Science Research of Jiangsu Higher Education Institutions of China (No. 20KJB510024).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. C. Cao, Z. H. He, K. X. Liu, and X. M. Sui, “Progress and challenges in dynamic holographic 3D display for the metaverse,” Infrared and Laser Engineering 51(1), 267–281 (2022). [CrossRef]

2. L. Shi, B. C. Li, C. Kim, P. Kellnhofer, and W. Matusik, “Towards real-time photorealistic 3d holography with deep neural networks,” Nature 591(7849), 234–239 (2021). [CrossRef]

3. C. P. Chen, Y. Cui, Y. Ye, F. Yin, H. Shao, Y. Lu, and G. Li, “Wide-field-of-view near-eye display with dual-channel waveguide,” Photonics 8(12), 557 (2021). [CrossRef]

4. Z. Zhang, C. P. Chen, Y. Li, B. Yu, L. Zhou, and Y. Wu, “Angular multiplexing of holographic display using tunable multi-stage gratings,” Mol. Cryst. Liq. Cryst. 657(1), 102–106 (2017). [CrossRef]

5. N. Chen, E. Y. Lam, T. C. Poon, and B. Lee, “Sectional hologram reconstruction through complex deconvolution,” Opt. LasersEng. 127(Apr), 105945 (2020). [CrossRef]

6. J. H. Park, “Recent progress in computer-generated holography for three-dimensional scenes,” J. Inf. Disp 18(1), 1–12 (2017). [CrossRef]

7. C. Chen, D. Kim, D. Yoo, B. Lee, and B. Lee, “Off-axis camera-in-the-loop optimization with noise reduction strategy for high-quality hologram generation,” Opt. Lett. 47(4), 790–793 (2022). [CrossRef]

8. B. Lee, D. Kim, S. Lee, C. Chen, and B. Lee, “High-contrast, speckle-free, true 3D holography via binary CGH optimization,” arXivarXiv:2201.02619 [eess.IV] (2022). [CrossRef]

9. S. H. Lee, S. C. Kwon, H. B. Chae, J. Y. Park, H. J. Kang, and J. D. K. Kim, “Digital hologram generation for a real 3-D object using by a depth camera,” J. Phys.: Conf. Ser. 415(1), 012049 (2013). [CrossRef]

10. G. Li, K. Hong, J. Yeom, N. Chen, J. H. Park, N. Kim, and B. Lee, “Acceleration method for computer generated spherical hologram calculation of real objects using graphics processing unit,” Chin. Opt. Lett. 12(6), 71–75 (2014).

11. M. Yamaguchi, K. Wakunami, and M. Inaniwa, “Computer generated hologram from full-parallax 3-D image data captured by scanning vertical camera array,” Chin. Opt. Lett. 12(6), 80–85 (2014).

12. E. Y. Chang, J. Choi, S. Lee, S. Kwon, J. Yoo, M. Park, and J. Kim, “360-degree color hologram generation for real 3-D objects,” Appl. Opt. 57(1), A91–A100 (2018). [CrossRef]

13. Y. Zhao, K. C. Kwon, Y. L. Piao, S. H. Jeon, and N. Kim, “Depth-layer weighted prediction method for a full-color polygon-based holographic system with real objects,” Opt. Lett. 42(13), 2599–2602 (2017). [CrossRef]

14. T. Shimobaba, N. Masuda, and T. Ito, “Simple and fast calculation algorithm for computer-generated hologram with wavefront recording plane,” Opt. Lett. 34(20), 3133–3135 (2009). [CrossRef]

15. Y. Zhao, Mei-Lan Piao, Gang Li, and Nam Kim, “Fast calculation method of computer-generated cylindrical hologram using wave-front recording surface,” Opt. Lett. 40(13), 3017–3020 (2015). [CrossRef]

16. M. S. Islam, Y. L. Piao, Y. Zhao, K. C. Kwon, E. Cho, and N. Kim, “Max-depth-range technique for faster full-color hologram generation,” Appl. Opt. 59(10), 3156–3164 (2020). [CrossRef]

17. Y. L. Piao, M. U. Erdenebat, Y. Zhao, K. C. Kwon, and N. Kim, “Improving the quality of full-color holographicthree-dimensional displays using depth related multiple wavefront recording planeswith uniform active areas,” Appl. Opt. 59(17), 5179–5188 (2020). [CrossRef]

18. Y. Zhao, L. Cao, H. Zhang, W. Tan, S. Wu, Z. Wang, and G. Jin, “Time-division multiplexing holographic display using angular-spectrum layer-oriented method,” Chin. Opt. Lett. 14(1), 16–20 (2016).

19. H. Y. Wu, C. W. Shin, and N. Kim, “Full-color holographic optical elements for augmented reality display,” in Holographic Materials and Applications, M. Kumar, ed. (IntechOpen, 2019), Chap. 3.25.

20. D. Yasuki, T. Shimobaba, M. Makowski, D. Blinder, J. Suszek, M. Sypek, T. Birnbaum, P. Schelkens, T. Kakue, and T. Ito, “Three-dimensional hologram calculations using blocked radial and windmill point spread functions,” Opt. Express 29(26), 44283–44298 (2021). [CrossRef]

21. Y. Zhao, M. U. Erdenebat, M. L. Piao, M. S. Alam, S. H. Jeon, and N. Kim, “Multiple-camera holographic system featuring efficient depth grids for representation of real 3D objects,” Appl. Opt. 58(5), A242–A250 (2019). [CrossRef]

22. Y. Zhao, K. C. Kwon, M. U. Erdenebat, S. H. Jeon, M. L. Piao, and N. Kim, “Implementation of full-color holographic system using non-uniformly sampled 2D images and compressed point cloud gridding,” Opt. Express 27(21), 29746–29758 (2019). [CrossRef]

23. Y. Zhao, C. X. Shi, K. C. Kwon, Y. L. Piao, M. L. Piao, and N. Kim, “Fast calculation method of computer-generated hologram using a depth camera with point cloud gridding,” Opt. Commun. 411, 166–169 (2018). [CrossRef]

24. Y. Zhao, K. C. Kwon, M. U. Erdenebat, M. S. Islam, S. H. Jeon, and N. Kim, “Quality enhancement and GPU acceleration for a full-color holographic system using a relocated point cloud gridding method,” Appl. Opt. 57(15), 4253–4262 (2018). [CrossRef]

25. Y. Zhao, L. M. Zhu, J. R. Zhu, Y. Huang, and M. Y. Zhu, “Full-color holographic system featuring segmented point cloud gridding and parallel computing for real objects,” Optics InfoBase Conference Papers, 2021, Applied Industrial Spectroscopy, AIS2021.

26. Y. M. Fang, C. Zhang, H. Q. Huang, and J. J. Lei, “Visual attention prediction for stereoscopic video by multi-unit fully convolutional network,” IEEE Trans. on Image Process. 28(11), 5253–5265 (2019). [CrossRef]

27. S. N. Xie and Z. W. Tu, “Holistically-nested edge detection,” In Proceedings of the IEEE international conference on computer vision, pages1395–1403 (2015).

28. S. H. Chen, X. L. Tan, B. Wang, H. C. Lu, X. L. Hu, and Y. Fu, “Reverse attention based residual network for salient object detection,” IEEE Trans. on Image Process. 29, 3763–3776 (2020). [CrossRef]

29. F. Yu, V. Koltun, and T. Funkhouser, “Dilated residual networks,” Proceedings of the IEEE conference on computer vision and pattern recognition (2017). [CrossRef]

30. R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequencytuned salient region detection,” in Proc. IEEE Conf. Comput. Vis.Pattern Recognit, pp. 1597–1604 (2019).

31. D. P. Fan, M. M. Cheng, Y. Liu, T. Li, and A. Borji, “Structuremeasure: A new way to evaluate foreground maps,” in Proc. IEEE. Int. Conf. Comput. Vis (2017).

32. D. P. Fan, M. M. Cheng, J. J. Liu, S. H. Gao, Q. Hou, and A. Borji, “Enhanced-alignment measure for binary foreground map evaluation,” in International Joint Conferences on Artificial Intelligence (2018). [CrossRef]

33. F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. IEEE, pp. 733–740 (2012).

	F-measure	S-measure	E-measure	MAE
U²-NET	0.823	0.951	0.943	0.033
DRN	0.158	0.301	0.319	0.432
RAS	0.630	0.457	0.564	0.311
U²-RAS	0.987	0.991	0.992	0.004

Object			CPU running time			GPU running time
Name	Points	Layers	WRP	R-PCG	D-PCG	WRP	R-PCG	D-PCG
Fig. 6–1	27962	453	311.16	15.91	9.82	1.44	0.32	0.25
Figure 6–2	41752	367	547.755	9.67	5.11	2.51	0.19	0.14
Fig. 6–3	56088	659	717.33	25.13	15.31	3.36	0.53	0.39
Figure 6–4	65369	789	989.42	30.58	17.47	4.14	0.65	0.44
Fig. 6–5	18616	88	245.22	2.71	1.22	0.65	0.04	0.03
Figure 6–6	42217	163	610.37	5.44	2.43	1.31	0.08	0.07

Object			CPU running time			GPU running time
Name	Points	Layers	WRP	R-PCG	D-PCG	WRP	R-PCG	D-PCG
Fig. 6–1	27962	453	731.61	37.88	22.67	4.33	1.12	0.98
Figure 6–2	41752	367	1294.57	23.15	12.34	7.26	0.64	0.38
Fig. 6–3	56088	659	1717.49	68.82	39.65	10.16	1.97	1.53
Figure 6–4	65369	789	2017.56	76.16	42.17	13.37	2.24	1.83
Fig. 6–5	18616	88	567.27	5.57	2.98	3.35	0.09	0.06
Figure 6–6	42217	163	1315.96	11.14	5.78	7.32	0.12	0.09

	F-measure	S-measure	E-measure	MAE
U²-NET	0.823	0.951	0.943	0.033
DRN	0.158	0.301	0.319	0.432
RAS	0.630	0.457	0.564	0.311
U²-RAS	0.987	0.991	0.992	0.004

Object			CPU running time			GPU running time
Name	Points	Layers	WRP	R-PCG	D-PCG	WRP	R-PCG	D-PCG
Fig. 6–1	27962	453	311.16	15.91	9.82	1.44	0.32	0.25
Figure 6–2	41752	367	547.755	9.67	5.11	2.51	0.19	0.14
Fig. 6–3	56088	659	717.33	25.13	15.31	3.36	0.53	0.39
Figure 6–4	65369	789	989.42	30.58	17.47	4.14	0.65	0.44
Fig. 6–5	18616	88	245.22	2.71	1.22	0.65	0.04	0.03
Figure 6–6	42217	163	610.37	5.44	2.43	1.31	0.08	0.07

Implementation of a full-color holographic system using RGB-D salient object detection and divided point cloud gridding

Abstract

1. Introduction

1.1 Background

1.2 Relate works

2. Full-color holographic system

2.1 RGB-D salient object detection

2.2 Divided point cloud gridding algorithm

3. Experiment and results

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (9)

Tables (3)

Equations (13)

Optics Express