Accelerated computer generated holography using sparse bases in the STFT domain

: Computer-generated holography at high resolutions is a computationally intensive task. Eﬃcient algorithms are needed to generate holograms at acceptable speeds, especially for real-time and interactive applications such as holographic displays. We propose a novel technique to generate holograms using a sparse basis representation in the short-time Fourier space combined with a wavefront-recording plane placed in the middle of the 3D object. By computing the point spread functions in the transform domain, we update only a small subset of the precomputed largest-magnitude coeﬃcients to signiﬁcantly accelerate the algorithm over conventional look-up table methods. We implement the algorithm on a GPU, and report a speedup factor of over 30. We show that this transform is superior over wavelet-based approaches, and show quantitative and qualitative improvements over the state-of-the-art WASABI method; we report accuracy gains of 2dB PSNR, as well improved view preservation.


Introduction
Digital holography is a technology that allows for recording and reconstructing both the amplitude and phase of electromagnetic wave fields.It can account for all visual cues, such as continuous parallax, depicts no accommodation-vergence conflict and supports resolutions up to the diffraction limit of the wavelength.Because of these properties, it is often considered as the ultimate display technology.
Acquiring digital holograms with an optical setup for visualization purposes poses several practical limitations: the need for a specialized optical setup, restrictions on camera resolution, pixel pitch and object sizes, contaminations with aberrations and/or speckle noise and the need of a physical object for recording.These problems are not present Computer-Generated Holography (CGH), where holograms are numerically computed by simulating the diffraction of light.
CGH for display systems comes with its own challenges: large space-bandwidth product are needed to get decent viewing angles and hologram dimensions, corresponding to resolutions of at least 10 8 pixels.Furthermore, unlike in conventional computer graphics, every point in the scene can potentially affect every hologram pixel, resulting into a many-to-many relationship.These factors make CGH a very computationally intensive task.
The notion of sparsity is a well-known concept in signal processing literature.It refers to the property that signals drawn from a particular source can be well-approximated by a few coefficients in a certain basis.This property is very useful for many applications, such as: (1) compression, where only a small subset of the coefficients need to be stored to accurately represent the compressed signal; (2) denoising, since the noise is generally uncorrelated with the signal it will be spread out over many low-amplitude coefficients which can be thresholded away; (3) compressed sensing, where prior information on the sparsity enables one to accurately reconstruct a measured signal at sampling rates well below the Nyquist bound.
In particular, sparsity has been leveraged for CGH.Albeit often not expressed in these terms, many CGH methods leverage the sparsity of the holographic signal in some bases where only a fraction of the coefficients need to be updated; in doing so, they substantially accelerate the generation of holograms.The Wavefront Recording Plane method (WRP) [8] will express the signal in a backpropagated hologram, effectively representing the signal in a convolved space using e.g. the Angular Spectrum Method (ASM) [21].There, point-spread functions (PSFs) will only affect a limited amount of pixels in the WRP depending on their distance, rendering them sparse in the spatial domain.This principle can be extended further by using multiple parallel WRPs, and allocating points to their respective nearest WRP [9].Holographic stereograms approximate PSFs by piecewise continuous wavefronts in the Short-Time Fourier Transform (STFT) domain by updating a single coefficient per block corresponding to pieces of planar wavefronts [18].More recently, the WASABI method was proposed where PSFs are expressed in the wavelet domain [13].Here, Shimobaba et.al.pioneered the use of CGH with a sparsifying transform using wavelet shrinkage.Wavelets are a multi-scale representation of a signal, which can successfully sparsify smooth signals where neighbouring pixels are highly correlated.This reduces the amount of needed coefficients to accurately represent PSFs.This method was later combined with the WRP method to further reduce calculation time [14].
In this work, we improve over existing sparse CGH methods by updating the hologram in the STFT domain.Our novel contributions in this manuscript are the following: • We study the properties of sparse CGH systems, and analyze the properties and requirements of transforms in the space-frequency domain to optimize quality and calculation times • We demonstrate the superiority of the STFT over existing transforms by reporting an increase in the objective and subjective quality of the generated hologram • We validate its applicabililty by implementing the algorithm on a GPU using CUDA, and report a 30-fold speedup over a CUDA implementation of the conventional method.
The paper is structured as follows: In section 2, we analyze the properties of holographic PSFs and how the transform choice can be optimized to increase the quality of a transform given a limited budget of coefficients.Then, in section 3, we describe the methodology and implementation of our proposed algorithm, and provide a complexity analysis.In the experimental section 4 we compare our method to others in literature, and assess the objective and subjective quality differences.Finally, we conclude in section 5.

The wavefront recording plane method
Holograms can be computed from a point cloud, describing a collection of luminous points.
Points with complex amplitudes a j and positions (x j , y j , z j ) will create a diffraction pattern on the hologram plane U at e.g.z = 0, given by where k = 2π λ is the wavenumber, λ is the wavelength and r j = (x − x j ) 2 + (y − y j ) 2 + (z − z j ) 2 is the distance.
Directly evaluating this sum is computationally very costly, since it involves updating every pixel for every point in the point cloud.The goal is therefore to compute the hologram in a transformed space, and to maximize the sparsity of the transformed holographic signal while minimizing computation time.Higher sparsity means fewer needed coefficient updates for CGH and better signal approximation leading to higher quality.Moreover, the used transform needs to be efficiently computable.Otherwise, it would defeat the purpose of accelerating CGH in the first place.As an extreme example, one could use an ASM propagation for every point; the ASM expresses diffraction between parallel planes as a convolution, namely: where F and F −1 are the forward and inverse Fourier transform, respectively, z is the propagation distance and (ω x , ω y ) are the coordinates expressed in the frequency domain.This would lead to perfect sparsity (a single coefficient update per point) but would be way too computationally costly.The goal is therefore to optimize that balance between transform calculation time and sparsity.
Optimizing the sparsity/speed balance will require some analysis of the signal properties of holographic PSFs.First, by backpropagating the hologram closer to the object, we will reduce their spatial footprints, as shown in Fig. 1.However, this choice is still suboptimal: we can further reduce the average support size by placing the WRP in the middle of the point cloud.One problem that occurs is that calculating the PSF at small distances from the WRP using Eq.(1) will result in too coarse approximations, and amplitudes tending to infinity.This can be circumvented by representing PSFs as propagated Dirac functions instead (i.e. using an ASM-propagated single point).Affected coefficients are colored blue.The total colored surface area is an indication of the sparsity of the transform for a particular signal.Wavelets outperform the spatial (i.e.conventional WRP method), which are in turn outperformed by the STFT.The wavelets are suboptimal because of (1) the lack of frequency symmetry and (2) the inability of distinguishing postive and negative frequencies, indicated by the green boxes.

The space-frequency representation
Furthermore, a particularly useful method to analyze holograms is to study at them in the space-frequency domain [22] (also referred to as the time-frequency domain, the Wigner space representation, or phase space), where signals are simultaneously depicted in space and (instantaneous) frequency.Diffraction of light can be described as geometric operations in space-frequency domain, such as: shears for planar Fresnel diffraction, rotations describe fractional Fourier transforms modeling diffraction between spherical surfaces, translations can model object tilts, etc.These are very useful for intuitive manipulations of holographic signals.
PSFs are especially simple in this representation: because the instantaneous frequency is well-defined in every point, it can be described by a graph in space-frequency domain.We will restrict our analysis to 1D-signals to simplify the model, as to allow for a visual representation of the corresponding 2D space-frequency domain.In the 1D-case, a PSF is described as a 2D curve η(x) in the space-frequency representation.If we use the Fresnel approximation of a PSF [21]: the curve η(x) will correspond to a straight line with a slope that is inversely proportional to z.
The horizontal extent of the PSF in space-frequency domain will denote the amount of affected hologram pixels, as shown in Fig. 2(a).PSFs at a non-zero distance from the WRP can be sparsified further by compositing other transforms with the WRP method (with e.g.wavelets in the WASABI method).Many transforms can be viewed in the space-frequency domain as well; this can be done in terms of their "Heisenberg boxes": every basis function will have a particular footprint in the space-frequency domain.There is a trade-off between space and frequency resolution which is called the Heisenberg uncertainty principle, expressing that there must be a lower bound on the space-frequency support of a basis function.Since reversible transforms need to capture all the signal's information, they must cover the whole space-frequency domain, which can often be represented as a particular tiling of the space-frequency plane.In Fig. 2, we represent three representations: spatial, wavelets and STFT.The spatial domain representation will have perfect spatial localization but no localization in frequency; therefore, all pixels in the support will be affected, amounting to the WRP method.
Wavelets are a multi-resolution representation, depicted in Fig. 2(b) with three levels.The high-level sub-bands will have better spatial localization, while the low-level ones have better frequency localization.This property is very useful for natural imagery, because they tend to have 1/ f 2 -shaped autocorrelation behavior [23], for which wavelets are highly effective.However, holographic PSFs are highly symmetric in the frequency domain, causing the wavelet sparsity to drop substantially compared to natural imagery.Moreover, wavelets transforms are generally real-valued, meaning that they cannot distinguish the positive from the negative frequency components; thus, the wavelet coefficient will respond to signals as if they are present at locations on the opposite side of the frequency axis, depicted by the green Heisenberg boxes in Fig. 2(b).Finally, when sparsifying the wavelet coefficients, the highest subband coefficients will be discarded first after thresholding because of their lower average magnitudes.This will cause an uneven degradations of the views: this effect has been documented in [24], where wavelet compression of holograms gives rise to a "key-hole" effect, referring to the vanishing of the large viewing angles.This effect is also demonstrated in section 4. That is why we propose to use the block-wise STFT: as seen in Fig. 2(c), PSFs will cover less Heisenberg boxes resulting in better sparsity, it can distinguish positive and negative frequencies and exhibits symmetry in both spatial and frequency domains.Interestingly, the Accurate phase-added stereogram [18] can be viewed as a special case of our proposed method: they use an analytic approximation of a sparsified signal containing a single coefficient per block, without any WRPs.

Sparsity evaluation
We can numerically evaluate and compare the sparsity as well by keeping a small subset of the largest coefficient magnitudes of transformed PSFs.In Fig. 3, we plot the Normalized Mean Square Error (NMSE) of a sparsified PSF as a function of its distance to the hologram plane, namely where R is the original reference PSF and S is a PSF sparsified in some transform domain.The subscript j denotes the pixel index, and the norm is Euclidean.We compare the NMSE in the spatial domain (i.e. the conventional WRP method, using no additional transform), with the orthogonal Daubechies-4 wavelets (used in WASABI) and the STFT for various block sizes.We observe that wavelets perform better than using no transform, followed by STFT when the point distance surpasses a few mm.As the distance increases, larger STFT block sizes will perform better.This can be explained by the fact that when the slope of the PSF in the space-frequency domain will decrease, the optimum shifts and it becomes preferable to trade increased frequency resolution for decreased spatial resolution.In the limit to infinite distance, the slopes will be horizontal; we then get Fourier holography, so using a single FFT on the whole hologram will become optimal.
In the next section, we will explain how the STFT can be used to improve sparse hologram generation.

Methodology
For the experiments, we implement two versions of our framework: a baseline method using precomputed basis functions in the spatial domain (the "spatial version"), based on the conventional approach [6] combined with a WRP, and a sparse version using the STFT (the "STFT version").Diagrams summarizing the pipelines are shown in Fig. 4 and Fig. 5.To maximize sparsity, we place the WRP in the middle of the point cloud in both pipelines.We compute a lookup table by discretizing the z-values of the points into K equidistant quantization levels.The extent (or support) w k of a PSF can be found using the grating equation, determined by the hologram pixel pitch p, and wavelength λ: where θ is the maximum diffraction angle.Given the quantized distance z k , we have using the identity tan(sin Look-up table entries for PSFs close to the hologram will thus be smaller than farther ones.Note that the complex amplitudes PSFs with negative z k will simply be the conjugate of the corresponding ones with positive z k ; this means that only half of those entries need to be stored explicitly. For the spatial version, we directly store the cropped PSFs (according to its width w k ) in the look-up table.For the STFT version, we crop the PSFs to the next multiple of B = 16 and transform them with an STFT using B × B blocks.We only store 4 coefficients per block, amounting to approximately s = 1.6% of the total coefficient count.However, the STFT is not translation-invariant for all pixel shifts, just like the discrete wavelet transform; it is only invariant to translations which are a multiple of the block size.This means we need to store B × B translated versions for every transformed PSF.
The STFT LUT thus has three dimensions, totaling to B × B × K entries.Formally, the index of the needed LUT entry for a point (q x , q y , q z ) is: Here, the look-up table entries only contain the non-zero STFT coefficients.An entry is chosen not only based on the distance z, but also based on its relative (x, y)-translation w.r.t. the STFT blocks.The amount of affected STFT blocks depends on z.Three partially overlapping (color-coded) entries will only update a small amount of STFT coefficients.In this example, the red PSF has the largest support, followed by the blue and then the green one, according to their respective z.Finally, the blocks are inverse-STFT-transformed before propagation.
where z max is the maximum allowed distance of a point to the WRP, z WRP is the distance of the WRP to the hologram plane, • is the floor operation and [•] is the rounding operation.This entry is then added to the WRP in STFT space, offset from the center by an integer number of B × B blocks given by q x Bp , q y Bp .For better memory coalescence, we assume that the highest-amplitude coefficients are grouped together in a b×b subblock, where b = 2 (this is justified because of the well-defined instantaneous frequencies of PSFs as denoted in the Section 2.2).Thus we only need to store one extra byte per block in the look-up tables to denote the index value within a block.This is illustrated in Fig. 6.
In summary, for every possible translation (or shift) and depth quantization level, we store four 2x2-grouped STFT coefficients and their group position index within the B × B block.
We implemented the algorithm for a GPU using CUDA.The WRP (which has the same resolution as the target hologram) is divided into large 2048 × 2048 tiles, which are computed independently.This is not only to limit GPU global memory usage, but also to avoid evaluating PSFs for points lying far from the target tile.This is done by distributing the point cloud over the Fig. 6.Because the STFT is not translation-invariant, we need to store STFT coefficients for B × B shifted PSF instances.For every STFT block, we store the 2 × 2 grouped highest-magnitude coefficients and the associated position index.
tiles according to their (x, y)-position.Note that when points are close to tile edges, they must be allocated to both tiles sharing the edge.The preloaded look-up tables are stored in global GPU memory.For both methods (spatial and STFT), the additions of the PSF entries happen in shared memory for improved speed.This has as an additional advantage that after all the PSF additions, the inverse STFT can be computed immediately in the shared memory as well, which has negligible computational cost (we report average calculation times of roughly 25 ns per STFT block).All computations are done using complex-valued 32-bit floating-point numbers (amounting to 8 bytes per pixel).

Complexity analysis
This subsection will summarize the memory and computation complexities of the proposed algorithm.Aside from the memory requirement of the hologram itself, which can be constrained with the use of tiling, the main memory requirements are imposed by the stored look-up tables.Every LUT entry has a width w k (see Eq. ( 6)), and thus a memory footprint which is proportional to w 2 k .The three properties that affect the total LUT size are the maximum distance z max of a point to the WRP, the quantization level count K and the hologram parameters (p, λ).
In Big-O notation, for a conventional WRP the memory footprint of the LUT is: For the midpoint WRP, z max is halved, while K remains the same.The memory footprint is thus reduced by a factor of 4. If we omit the entries with negative z k because of the complex conjugate property as mentioned before, then the memory requirement is halved once more.
For the STFT method, we must take the block size B and sparsity s into account as well.Hence, the space complexity is given by In our implementation, we use s = 1.6% and B = 16.This will lead to an increase in memory footprint by 4, which is counteracted by the reduction by using a midpoint WRP.This means that the overall memory requirement is about the same as the conventional LUT method with a standard WRP.The Discrete Wavelet Transform lacks translation-invariance as well, so the memory requirement of WASABI will be similar to the STFT version.Concerning calculation time, we have to consider 3 steps: LUT additions, the transform stage (if applicable) and the ASM propagation step.For the STFT method, the respective time complexities are: for P luminous points and N hologram pixels.When P is sufficiently large, the LUT addition will dominate the calculation time (see Table 1), meaning that the sparsity s will become one of the main factors determining speed.The db4 wavelet transform of WASABI can be implemented efficiently with linear time complexity as well by using the lifting scheme and interleaving.

Experiments
For our experiments, we generate a hologram of the "Venus" point cloud consisting of 10 5 points.The hologram resolution is 4096 × 4096, with a pixel pitch p = 4 µm and wavelength λ = 633 nm.The center of the point cloud was placed at 1.7 cm from the hologram plane, with the WRP placed at the center as explained in the previous sections.The look-up tables were quantized in 100 entries, spanning a distance of 1.6 cm as to fully encompass the Venus head.The LUT took 135 MB in total.The hologram was computed on a machine with an Intel Core i7 6700K CPU, 32GB RAM and a NVIDIA GeForce 1080 GTX GPU.The code was compiled for CUDA 5.0 and ran on the OS Windows 10.The ASM propagation was also computed on the GPU, using the FFT libraries of MATLAB R2017b with a precomputed filter.Renderings are shown in Fig. 7.
We compare three configurations: the spatial method, the STFT method, and a version based on the WASABI method using db4 wavelets.The latter was computed by sparsifying the look-up table entries with the weighted 2-level db4 transform as explained in [13], with a sparsity of 1.6%, matching the STFT approach (see Fig. 8).
The total calculation time of the spatial method is 3.97 s, while the STFT method only took 0.13 s (a breakdown for every stage is shown in Table 1).This means we achieved a speedup factor of over 30.We did not implement an optimized CUDA version of the WASABI method for comparison, but we expect very similar calculation times to the STFT method since the coefficient count and overall transform structure is akin to the STFT version.For the quality, we report a Peak Signal-to-Noise Ratio (PSNR) of 29.33dB for the STFT method compared to a PSNR of 27.30dB for the WASABI method, indicating an improvement of about 2dB.The general quality improvement for other b × b subblock sizes for B = 16 is shown in Fig. 9; we can see that the STFT subblock approximation is a better approximation than db4 wavelets, unless the amount of kept coefficients is very high (which would hamper calculation efficiency).
The 2dB improvement can be generalized to other 3D objects: because the CGH is a linear combination of all PSFs, the distortion will be an accumulation of all individual point distortions.Therefore, the quality difference will not be affected by the object's shape, but only by the (average) point distance, as delineated in Section 2.3.
We also included a visual quality evaluation in Fig. 7.We reconstructed a view from the left and right by cropping a smaller aperture from the leftmost and rightmost side of the hologram.We can see that both views are well-preserved in the STFT method w.r.t. to the spatial method.Unfortunately, the WASABI method loses more information at the large viewing angles, resulting in darkening of the sides of Venus.The suppression of the high frequencies is visible in the Fourier magnitudes in Fig. 7.This can also be seen by investigating the sparsified PSF entries as seen in Fig. 8, showing that the high frequencies containing the large viewing angles are suppressed by the wavelets.

Conclusion
We present a framework for accelerating CGH by computing the hologram in the transform domain.
We performed an analysis of the signal properties of holograms, described and implemented a technique for efficiently computing it on GPU and evaluate the performance on a point cloud.We report a speedup with factor of 30 over the conventional approach, and improve with 2dB over the WASABI method, with better preservation of viewing angles.

Fig. 1 .
Fig. 1.Diagram illustrating how distance to a plane will affect the PSF support size.Given the hologram pixel pitch, the corresponding maximum diffraction angle will determine the geometry of the cone of influence shown on the left.The classic WRP method will reduce the support of PSFs w.r.t. the hologram plane, as shown on the right.By placing the WRP in the middle of the point cloud, the average PSF support will be reduced further.

Fig. 2 .
Fig.2.Representation of a 1D PSF as a red curve η(x), and the affected Heisenberg boxes for the (a) spatial, (b) orthogonal wavelets and (c) STFT bases.Affected coefficients are colored blue.The total colored surface area is an indication of the sparsity of the transform for a particular signal.Wavelets outperform the spatial (i.e.conventional WRP method), which are in turn outperformed by the STFT.The wavelets are suboptimal because of (1) the lack of frequency symmetry and (2) the inability of distinguishing postive and negative frequencies, indicated by the green boxes.

1 NMSE
Fig.3.NMSE of various sparsified transforms, keeping the 64 highestmagnitude coefficients.The sparsification is applied on a PSF of a point placed at z going from 0 to 25 mm, sampled with 512 × 512 pixels, on a hologram with a pixel pitch of 6 µm and a wavelength λ = 633 nm.

Fig. 4 .Fig. 5 .
Fig. 4. Schematic representation of the spatial version CGH pipeline.Look-up table entries are chosen based on each point's z-value and are directly added to the WRP.

Fig. 7 .Fig. 8 .Fig. 9 .
Fig. 7. Depiction of views and the Fourier domain of renderings of the Venus CGH using various methods.The top row (a,b,c) display the backpropagated left view, the middle row (d,e,f) the backpropagated right view, and the bottom row (g,h,i) are the Fourier magnitudes.The left column (a,d,g) is the WASABI method, the middle column (b,e,h) is the STFT method, and the right column (c,f,i) is the reference spatial method.

Table 1 .
Computation times of both tested pipelines, shown for every stage.The spatial version does not have an STFT transform.