Direct calculation of computer-generated holograms in sparse bases.

Computer-generated holography is computationally intensive, making it especially challenging for holographic displays where high-resolutions and video rates are needed. To this end, we propose a technique for directly calculating short-time Fourier transform coefficients without the need for a look-up table. Because point spread functions are sparse in this transform domain, only a small fraction of the coefficients need to be updated, enabling significant speed gains. Twenty-fold accelerations are reported over the reference implementation. This approach generalizes the notion of the phase-added stereogram, allowing for the calculatiion of an arbitrary number of Fourier coefficients per block, enabling high calculation speed with holograms of good visual quality, targeting minimal memory requirements.


Introduction
Numerical diffraction simulates how coherent light propagates through free space.Because of the properties of waves, these calculations are computationally costly: every point in the 3D scene can potentially affect every other point in space.
This problem is especially important for holographic displays.They are considered to be the ultimate 3D display technology [1], because holograms can account for all human visual cues, including stereopsis, eye-focusing, parallax and no exhibit accommodation-vergence conflict.These displays require holograms to be calculated at video rates and at high resolutions, which require highly efficient Computer Generated Holography (CGH) algorithms.
CGH algorithms come in many forms and have various trade-offs [1,2], such as: calculation time, what graphical effects are supported (occlusion, shadows, reflectivity, etc.) and what type of 3D scenes and models can be accounted for.They can be classified as point cloud methods [3][4][5][6][7][8], layer-based methods [9][10][11][12], polygon methods [13][14][15][16], RGB+depth methods [17,18], and ray-based methods [19][20][21][22][23].These categories are not sharply defined or mutually exclusive, as hybrid algorithms exist [24].This paper will focus on a subset of the point cloud algorithms, which we call "sparse CGH algorithms".Sparsity is a concept from signal processing literature; it is a measure of how few coefficients are needed to accurately describe a signal in particular basis.For examples, images such as natural photographs typically are sparse in wavelet space, making the wavelet transform highly suitable for compressing images [25], since most coefficients will be (almost) zero.
Sparse CGH algorithms thus calculate the hologram in a well-chosen transform domain, so that most of the signal's energy is concentrated in a small number of coefficients, thereby requiring much fewer updates than the total hologram pixel count; this can speed up hologram calculation considerably.The challenge is to choose the right transform, balancing sparsity (i.e.minimizing the number of needed coefficients) with the complexity of updating individual coefficients and the calculation cost of the inverse transform, maximizing performance and quality.
Presently, there are only a few examples of sparse CGH in literature [1].Although initially not described as such, it can be argued that the first instance of sparse CGH is the well-known Wavefront Recording Plane (WRP) [5], which calculates Point Spread Functions (PSFs) in an intermediate plane, which effectively corresponds to performing CGH in a convolved space.Because the propagation distance of a point to the intermediary plane is relatively short, the spatial PSF support is small, rendering them sparser in the spatial domain.After accumulating all the PSFs, the intermediary plane can be propagated efficiently using a convolution, such as the Fresnel transform or the Angular Spectrum Method (ASM) [26].This notion can be extended further by using multiple WRPs, allocating points to their respective nearest WRP [9,11].
Recently, another type of approach has been proposed that leverages the redundancies within a PSF in the wavefront plane instead of only relying on optimizing WRP depth placement.This was first proposed by Shimobaba et al. [27], by precomputing PSF coefficients in wavelet space using a Look-Up Table (LUT).Because neighboring PSF samples are highly correlated, much like in conventional imagery, fewer coefficients are needed to accurately represent PSFs.Later, sparse CGH was proposed using the Short-Time Fourier Transform (STFT), which was shown to yield even higher sparsity and better preservation of views at large angles [28].Both these approaches can be combined with WRPs to even further increase sparsity [28,29].
With this outlook, the phase-added stereogram [20] using the Fast Fourier Transform (FFT) [30] can be viewed as a special case of the sparse STFT CGH: for every PSF, a single STFT coefficient is updated in every block.Albeit fast, this approximation is relatively coarse, because the number of available discrete frequencies is limited.This will reduce the visual quality and hence limiting the application domain.Several extensions have been proposed [31] to improve quality, but these may significantly increase memory requirements by oversampling the FFT space by using blocks which are much larger than those of the basic method.
However, typically sparse CGH algorithm rely on a LUT for their operation, which introduces limitations.To this end, we propose the first sparse CGH algorithm without a LUT.This makes the system much more flexible: (1) PSFs positions are not discretized anymore for a fixed set of 3D locations; (2) no need for precomputing and storing sizeable LUT or using overcomplete representations, enabling a wider range of applications such as FPGAs (Field Programmable Gate Arrays) and embedded systems; (3) it reduces memory-bound performance limitations, making the algorithm more cache-friendly.Hence, the proposed algorithm can be viewed as a generalization of the phase-added stereogram.
The main novel contributions are the following: • A derivation of an analytical model for directly calculating STFT coefficients, and proposing an efficiently computable and accurate approximation.
• An algorithm to selectively update a small subset of the highest-magnitude STFT coefficients, with a configurable sparsity level trading-off speed and quality.
• A validation reporting a 20-fold speedup over the conventional algorithm using a C++ implementation tested on 3D point cloud models, and an error analysis measuring how much the proposed algorithm deviates from a reference PSF for various sparsity levels.
The paper is organized as follows: In section 2, we first derive the analytical expression of the STFT coefficients for PSFs; then an accurate and efficiently computable approximation is proposed, followed by a technique to select which subset of coefficients to update.Then, in section 3, the experiments are divided in two parts: (1) measurment of the accuracy of the proposed algorithm for various sparsity levels, and (2) measuring the gain in calculation speed w.r.t. a reference method, including a comparison in visual quality.Finally, we conclude in section 4 and discuss potential avenues for future work.

Methodology
The Fresnel approximation of a PSF P, laterally displaced with a translation (δ, ) and at a depth z from the hologram plane (ignoring the scaling factor) is given by where i is the imaginary unit, γ 2 = π λz with wavelength λ.This expression can be used to calculate a hologram H by summing over a collection of M points P j , j ∈ {1, ..., M }, each with their own values (γ j , δ j , j ): Sparse CGH will compute the values of P j indirectly using a linear transform T .If T is well-chosen, the PSF P j will be sparse and can be computed using only a few coefficients.Because of its linearity, we get that meaning we can first accumulate all P j in the transform domain, and only need to apply the inverse transform T −1 once on the summed PSF coefficients.If T can be computed quickly compared to the summation, we can obtain a large speedup.Here, we choose T to be the STFT.
The STFT divides the hologram into evenly-sized blocks, followed by a local Discrete Fourier Transform (DFT) on each block.The goal is to get an expression for directly calculating these coefficients for any given P(x, y) without relying on a LUT.To obtain the local Fourier coefficients of a hologram block of dimensions 2B × 2B, we have to evaluate the following integral f : Since both the PSF P as well as the Fourier transform are separable, so is f .We thus need to compute which are identical up to constants.The function f x cannot be written using only elementary functions, but it can be expressed in terms of the error function with t as the variable of integration.We can now rewrite f x (ω) = This expression can be simplified further.Firstly, we are not interested in the leading complex constant, since this constant phase delay and scaling will not affect the appearance of the hologram.Secondly, we can group the constants to obtain: where α = ω 2γ and x) is proportional to the error function evaluated on a diagonal of the complex plane, where c ∈ C can be chosen freely.Directly evaluating the integral from Eq. ( 6) to calculate E(x) would be too computationally costly, so this will be tackled in the next subsection.

Efficiently approximating E(x)
The aim is to find an efficiently computable, yet accurate approximation of E(x).This is needed if we want the sparse STFT coefficient computation to be faster than the conventional PSF computation in the spatial domain.To achieve this, an approximation was chosen with two use cases: one for small absolute values of x, and one for large values of |x|.Furthermore, c = (i − 1) π 2 was selected to simplify the subsequent expressions.For large values, we model the limit behavior of E using its asymptotic expansion: The series is truncated and simplified to a function E L for x → ±∞: where sgn(x) is the sign function, equal to 1 when x is positive and equal to −1 when x is negative.
Given that E(x) is an odd function, namely erf(x) = −erf(−x), the same expansion can be reused for very small values when x → −∞.However, this approximation cannot be used for small values of |x|; as x approaches 0, the singularity in the denominator of the first term of E L (x) will dominate, leading to large deviations from E(x).Instead, we can use the Taylor series around x = 0. We get the following Taylor approximation E S (x) valid for small values of |x|: for the real and imaginary parts, respectively.Note that the terms are separated by powers of x 4 , meaning that x 4 needs only to be computed once and can be reused for evaluating the polynomials more efficiently.What remains to be done is to find out when E L or E S should be selected for any given value of x.This can be determined by looking at their errors w.r.t. to E, and finding the intersection point.On Fig. 1 we plot the Mean Squared Error (MSE) using a logarithmic plot scale.Numerically, the intersection point is found to be x ≈ 1.609.We thus define the approximation of E(x) to be The values of E(x) and Ẽ(x) are shown (for the imaginary part) on Fig. 2.
x ≈ 1.609 MSE Fig. 1.MSE of the approximations for small values (E S (x)) and large values (E L (x)) compared to the reference E(x), on a logarithmic scale.The crossing point of both curves (at x ≈ 1.609) is chosen to select the approximation with the smallest error depending on x.Given the symmetry of E(x), the curves look exactly the same (but mirrored) for negative values of x.
2. Comparison of the (imaginary parts of) E(x) and Ẽ(x).The curves show good agreement (see Fig. 1 for the error analysis).

Accurate approximations of the PSF
Now that we have an expression f (ω, η) for directly computing the PSF in the STFT domain, a new question arises: how do we determine what subset of the coefficients to compute?The location of the highest-magnitude coefficients should be known a priori, since calculating all of them and finding the maximum would defeat the purpose of using a sparse representation.
This problem can be addressed by analyzing the simultaneous time-frequency representation of signals (a.k.a.Wigner space or phase space).Most of the energy of a single STFT coefficient is concentrated within a bounded region in space and bandwidth, which can be represented graphically by Heisenberg boxes (cf.Fig. 3).Every column represents a single STFT block, and every row are all coefficients of the same frequency.
The energy distribution of the STFT coefficients can be found by looking at the instantaneous frequency of PSFs.The instantaneous frequency ν(x) of a (one-dimensional) PSF P(x) is given by where ∠ is the complex argument (or angle).The curve ν(x) is a line with a slope inversely proportional to z and displaced by a lateral translation δ.Because of the non-stationary behavior of P(x) and the discreteness of the transforms, the energy won't be concentrated at a single coefficient, but rather leak and spread out to neighboring ones.Therefore, a group of STFT coefficients centered around the line ν(x) for every block are selected for calculation to maximize the energy capture, as shown in blue on Fig. 3.The leftmost STFT coefficient index j x (or equivalently j y ) is taken so that the subblock is centered as closely around the expected peak as possible: clamping the value to ensure valid indices for any point position, and using the floor operator • (rounding down).The value b x is the x-coordinate of the STFT block, which is a multiple of N • p (for pixel pitch p).An example is shown on Fig. 4, which also confirms the contiguousness of the coefficients: the energy is localized in a small group of coefficients, leaving all others to be close to zero.Note that the proposed algorithm will also naturally suppress aliasing when the PSF frequency surpasses the Nyquist rate because of potentially too steep incidence angles of light, since it won't compute the aliased coefficients.Furthermore, changing the STFT block size N will trade-off time and frequency resolution.The best block size to maximize sparsity will largely depend on the distribution of point distances to the hologram plane, as analyzed more in detail in [28].
Finally, the block-wise inverse FFT should be performed after accumulating all PSF contributions to obtain the final hologram.This has a computational complexity of O(t log N) for t hologram pixels and block size N.Because N is a relatively small constant, the inverse STFT can be executed very fast.Moreover, individual blocks will generally completely fit in the cache, contrary to global transforms spanning over the whole hologram, further improving speed.

Experiments
This section consists of two parts.Firstly, we evaluate the accuracy of the proposed algorithm w.r.t. the reference approach, i.e. calculating the hologram in the spatial domain by directly evaluating Eq. ( 2).Secondly, we measure the calculation times and report the speedup and compare rendered views from the generated holograms.

Measuring the accuracy of the sparse STFT algorithm
For this experiment, a PSF is computed with dimensions of 512 × 512 pixels, p = 4 µm, λ = 633 nm and z = 3 cm.The proposed algorithm uses blocks of size N = 32, and is tested for subblock sizes ranging from n = 2 up to n = 32.
In Fig. 5, several renditions are shown for different values of n, besides a reference PSF.Moreover, Fig. 6 depicts a few examples of sparse PSFs for n = 4 at distances z = 2 cm up to z = 8 cm.At higher n, the PSFs are visually indistinguishale from the reference, validating the adequacy of the method.At lower subblock size, block edge defects become increasingly visible, due to the properties of the block-wise DFT.These are reminiscent of JPEG artifacts at high compression rates.Nonetheless, the circular shape of the PSF is still well-preserved even at low n values.To quantify the magnitude of the error, the Normalized Mean Square Error (NMSE) is reported as well, namely where R is the original reference PSF and S is the sparsified PSF.The subscript j denotes the pixel index, and the norm is Euclidean.On Fig. 7 the NMSE values are plotted for different values of n using the same settings.As expected, the error decreases progressively as n increases.Note that the error for n = 32 is not zero (≈ 3.0 • 10 −3 ), because of the approximation introduced by Ẽ(x) in section 2.1, and also because the DFT is an approximation of the analytical (continuous) Fourier transform.Because conventional point-cloud CGH algorithms are a linear combination of many individual PSFs, the distortion will be an accumulation of all PSF errors as well.Therefore these results largely generalize for multiple points or even complete 3D point cloud models.
The gain in calculation time is reported for a collection of M = 10 5 points in Fig. 8, for n going from 2 up to 32.This is compared to the reference approach, taking up 557.9 seconds.As expected, the calculation time increases progressively as n grows larger.At n = 2, the calculation time is 9.8 s (a 56.8-fold speedup).
When n ≥ 30, the reference method is faster, mainly because the expression for calculating STFT coefficients (Eq.( 7)) is more complex than the standard expression (Eq.( 1)).

Generating holograms of 3D objects
In this subsection, the aim is to compare how directly calculating the hologram of 3D objects in the spatial domain (the reference approach) compares to the proposed sparse algorithm.The reference implementation utilized the separability of the Fresnel transform instead of directly calculating P(x, y): first, the PSF was evaluated to compute the horizontal and vertical components, followed by an outer product to accumulate all hologram pixel values.
The holograms were computed on a machine with an Intel Core i7 6700K CPU, 32GB RAM and ran using a C++17 implementation compiled with Microsoft Visual C++ 2017 on the OS Windows 10.The FFT was calculated using the FFTW3 library.STFT blocks of size N = 32 were used for the holograms of both 3D scenes.
The first 3D scene was configured as shown on Fig. 9.To generate a point cloud, the "Ship" model surface was randomly sampled to create a point cloud consisting of M = 10 5 points.The hologram dimensions were 2048 × 2048 pixels, with p = 4 µm and λ = 633 nm.The subblock size was n = 4, leading to a sparsity of s = (n/N) 2 ≈ 1.56%.
The reference implementation took 557.9 seconds, while the proposed implementation only took 26.4 seconds.We thus observe a 21-fold speedup over the default implementation.The STFT only took 13 ms (averaged over 10 runs), which is negligible compared to the total calculation time.Examples of rendered views can be seen on Fig. 10, showing that the overall quality is preserved from multiple viewpoints, even though we calculated only 1.56% if the total amount of coefficients.
For the second 3D scene, a point cloud of the "Train" model was used (cf.Fig. 11), consisting of M = 2 • 10 5 points assigned with random phases.The hologram dimensions were also 2048 × 2048 pixels, with p = 3 µm and λ = 633 nm.In this case, several different values for n were used to compare differences in visual quality.where R is the reference image and S is the reconstruction from the sparsified representation; C is the total pixel count, j is the pixel index and R max is the maximum pixel value, which is 255 for 8-bit images.
The results are reported in Table 1; the calculation times for the reference method is 1099.1 s.The rendered images are displayed in Fig. 12: almost no deterioration can be seen for n ≥ 4, but some noise and loss of detail is visible for n = 2. Taking n = 4 as the baseline, we obtain a speedup factor of 20.6.

Conclusion
This paper presents an algorithm to analytically calculate the STFT coefficients of PSFs, thereby enabling the sparse CGH needing only a small subset of the total amount of coefficients, without requiring a LUT.A speedup factor of 20 can be achieved w.r.t. the reference implementation with limited quality degradation.
Because of the generality of Fresnel diffraction, this algorithm has use beyond holography for other applications requiring near-field numerical diffraction of 3D point clouds.It can be viewed as a generalization of the phase-added stereogram, trading off accuracy and calculation speed.
Furthermore, this approach is orthogonal to many other CGH acceleration algorithms.Future work will involve combining this algorithm with WRPs, other input elements besides PSFs, and further analysis on tuning of the algorithm's parameters for optimizing quality and speed for any given 3D scene.Also, because of the low memory requirements, it is a good candidate for a parallelized implementation on specialized hardware such as FPGAs.

8 Fig. 3 .Fig. 4 .
Fig. 3. Time-frequency charts of PSF curves superimposed on STFT coefficients.The red curves represent the instantaneous frequency of the two PSF instances, whose slope will depend on z and lateral displacement on δ.The (Heisenberg) boxes correspond to individual coefficients, indicating where their energy is concentrated in space (or time) and frequency.The blue boxes are the coefficients that will be computed for this PSF.Both (a) and (b) have the same sparsity rates s, but (b) has a larger block size N trading off more frequency resolution for less spatial resolution, which is better for points with large |z | values.

Fig. 5 .Fig. 6 .
Fig. 5. Various renderings of a PSF placed at distance z = 3 cm using different algorithm settings.The subfigures denoted with "reference" are rendered using the standard PSF expression P(x, y) in the spatial domain, while the others are generated with the proposed approximate STFT algorithm.n denotes the subblock size and s the corresponding sparsity.means the real part is shown, ϕ indicates that the argument or (wrapped) phase is shown.Note how the PSF progressively resembles a phase-added stereogram more as n decreases.

Fig. 8 .Fig. 9 .Fig. 10 .Fig. 11 .
Fig. 8. Calculation time for computing the hologram of the "Ship" point cloud, for various subblock dimensions n (for N = 32) compared to the reference method, shown by the full horizontal line on the graph.

Fig. 12 .
Fig. 12. Reconstructed images of the "Train" hologram, for various values of subblock size n.The front focus is shown by taking the absolute value after backpropagating at z = −10.3cm with the ASM.The back focus uses z = −11.2cm.
The image quality can be measured using the Peak Signal-to-Noise Ratio (PSNR), defined byPSNR(R, S) = 10 log 10 C • R 2 max C j (R j − S j ) 2(17)