Dual-frequency pattern scheme for high-speed 3-D shape measurement



Introduction
Structured Light Illumination (SLI) [1,2,3] is an active three-dimensional (3-D) shape measurement technique widely used for scientific and industrial purposes [4]. Various algorithms have been proposed to achieve high-accuracy and/or high-speed measurement [5,6].
Higher-frequency patterns have been studied and employed to improve measurement accuracy at the extra computational cost of phase unwrapping. Most phase unwrapping strategies fall into two categories [9]: spatial [10,5,11,12] and temporal [7,13,14].
Spatial phase unwrapping approaches assume continuity of the scanned surfaces and are likely to fail in regions with depth discontinuities (step edges) [9]. Temporal methods, alternatively, use intermediate phase values typically generated by projecting several additional reference patterns and, therefore, are not efficient for real-time operation. Kim et al. [15] proposed an algorithm for high-frequency SLI that avoided direct phase unwrapping with 4N ≥ 8 patterns. Li et al. [16] proposed a two-frequency grating that integrated both a high frequency and the unit frequency into each pattern with 2N ≥ 6 patterns, where the high frequency had to be equal to N. Su and Liu [17] treated the algorithm of Li et al. as one of two bases in their one-shot pattern strategy.
In this paper, we propose a novel dual-frequency pattern scheme that combines, into a single pattern, a high-frequency SLI pattern with a unit-frequency pattern, where the high-frequency component generates wrapped phase resilient to sensor noise, while the unit-frequency component enables phase unwrapping. Differing from Li et al.'s two-frequency grating [16], the minimum number of our proposed patterns is 5 for any high frequency. In Li et al.'s method, the minimum number of patterns is 6, and the high-frequency term is fixed and tied to the number of patterns. Our proposed pattern is defined by a single equation, whereas Li et al. employed two equations.
With regard to high-speed or real-time 3-D measurement, researchers have proposed one-shot [2,18,19,20] methods that reconstruct depth from a single, continuously projected pattern, while others have proposed employing a high-speed camera and projector pair. One-shot SLI techniques are attractive for their insensitivity to object motion, where pattern strategies are based on spatial neighborhood coding, but they suffer from poor resolution and high computational cost [2]. Therefore, researchers have attempted to achieve high resolution with resilience to motion by driving the camera/projector pair at frame rates high enough to minimize the impact of object motion. A study of particular importance has been led by Zhang et al. [8,21], who employed a new algorithm for phase generation instead of the time-consuming arctangent function [22] and also took advantage of the computational capability of the GPU to generate the absolute 3-D coordinates of the scanned object [6]. For an image of size 532 × 500, their new algorithm was 3.4 times faster than using the arctangent function, while the frame rate of 3-D reconstruction was sped up from 6 fps to 25.56 fps by means of the GPU.
To achieve real-time 3-D video acquisition and reconstruction, this paper also introduces a lossless lookup table (LUT) method for phase, intensity/texture, and depth data that works in combination with a high-speed projector/camera pair widely used in phase measurement profilometry (PMP) [23]. LUT-based methods are known for their low computational cost and have been employed in many applications [24]. We develop a novel method that extracts phase and intensity/texture information from the incoming PMP video by reducing the computational complexity of the arctangent and square-root functions. Furthermore, since no LUT-based method has been introduced for inverting the camera's calibration matrix to produce 3-D point clouds in real time, we introduce such a method, which can also be applied to other triangulation-based 3-D reconstruction techniques.
As will be demonstrated in this paper, our experimental system achieves a processing rate of 714.29 fps to generate phase and texture video and 228.83 fps to produce 3-D point clouds using just one core of an Intel Core 2 Duo Quad Q9650 processor running at 3.0 GHz. No additional cores or GPUs [6] are needed, but using such resources would, of course, increase our frame rate if necessary. Our fast LUT-based algorithms can also be applied to the proposed dual-frequency pattern scheme when the number of patterns is set to 6, as shown in Sec. 4.

Phase measuring profilometry
PMP is a popular method of SLI due to its high reliability and high accuracy [25]. In practice, either vertical or horizontal sinusoid patterns are projected onto a target object so that the vertical or horizontal correspondence information, between the camera and the projector, can be directly derived from the computed phase data. The PMP patterns are described as:

I^p_n(x^p, y^p) = A^p + B^p cos(2π f y^p − 2πn/N),    (1)

where (x^p, y^p) are the coordinates of a pixel in the projector, I^p_n is the intensity of that pixel, A^p and B^p are constants, f is the frequency of the sine wave, n represents the phase-shift index, and N is the total number of phase shifts. Figure 1 shows a group of sine-wave patterns projected with N = 3, f = 1, and A^p = B^p = 127.5 for an 8-bit color depth projector.
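As a minimal sketch, the pattern set of Eq. (1) can be synthesized as follows; the function name is ours, and y^p is normalized by the pattern height so that f counts full sine periods across the projector's vertical axis:

```python
import numpy as np

def pmp_patterns(width, height, N=3, f=1, Ap=127.5, Bp=127.5):
    """Generate the N phase-shifted sinusoid patterns of Eq. (1)."""
    # Normalized vertical projector coordinate, one value per row.
    yp = np.arange(height, dtype=np.float64).reshape(-1, 1) / height
    # Each pattern is constant along x, so tile one column across the width.
    return [np.tile(Ap + Bp * np.cos(2.0 * np.pi * f * yp - 2.0 * np.pi * n / N),
                    (1, width))
            for n in range(N)]
```

With A^p = B^p = 127.5 the generated intensities span [0, 255], matching an 8-bit projector as in Fig. 1.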
For reconstruction, a camera captures each image where the sine-wave pattern is distorted by the scanned surface topology, resulting in the patterned images expressed as:

I^c_n(x^c, y^c) = A^c + B^c cos(φ − 2πn/N),    (2)

where (x^c, y^c) are the coordinates of a pixel in the camera and I^c_n(x^c, y^c) is the intensity of that pixel. To simplify the notation, the coordinate indices in both the camera and projector will be dropped from our equations henceforth. The term A^c is the averaged pixel intensity across the pattern set, which can be derived according to:

A^c = (1/N) Σ_{n=0}^{N-1} I^c_n,    (3)

such that the image A^c is equal to an intensity or texture photograph of the scene. Correspondingly, the term B^c is the intensity modulation of a given pixel and is derived from I^c_n as:

B^c = (2/N) √{ [Σ_{n=0}^{N-1} I^c_n sin(2πn/N)]² + [Σ_{n=0}^{N-1} I^c_n cos(2πn/N)]² },    (4)

such that B^c can be thought of as the amplitude of the sinusoid reflecting off a point on the target surface. If I^c_n is constant or only weakly affected by the projected sinusoid patterns, B^c will be close to zero. Thus B^c is employed as a shadow noise detector/filter [25] such that shadow-noised regions, with small B^c values, are discarded from further processing. Figure 1 shows an example scene with a background that includes a fluorescent ceiling light, which oversaturates the CMOS camera's pixels and thereby erases any signal from the SLI projector. In Fig. 1, A^c looks like a standard video frame absent any indication of the projected pattern sequence I^p_n. In contrast, B^c, shown in Fig. 1, looks very similar to A^c except that it only shows texture in those areas of the scene that significantly reflect the projected sequence I^p_n. Given the significance of B^c as an indicator of projected signal strength, the binarized image in Fig. 1 shows only those pixels greater in magnitude than a user-defined threshold. It is these pixels that will ultimately be used to reconstruct our 3-D surface, with the ignored pixels considered too noisy to relay any reliable depth information.
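Eqs. (3) and (4) can be computed per pixel across the stack of captured frames; a minimal sketch (function name ours):

```python
import numpy as np

def modulation_terms(frames):
    """Compute A^c (Eq. (3)) and B^c (Eq. (4)) from the N captured frames."""
    I = np.stack([np.asarray(f, dtype=np.float64) for f in frames])  # (N, H, W)
    N = I.shape[0]
    n = np.arange(N).reshape(-1, 1, 1)
    Ac = I.mean(axis=0)                                 # texture image, Eq. (3)
    # Quadrature sums over the phase-shift index n.
    s = (I * np.sin(2.0 * np.pi * n / N)).sum(axis=0)
    c = (I * np.cos(2.0 * np.pi * n / N)).sum(axis=0)
    Bc = (2.0 / N) * np.hypot(s, c)                     # sinusoid amplitude, Eq. (4)
    return Ac, Bc
```

A shadow mask then follows as `Bc > threshold`, discarding the shadow-noised pixels before phase computation.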
Of the reliable pixels with sufficiently large B^c, φ represents the phase value of the captured sinusoid pattern, derived as:

φ = tan^{-1}[ Σ_{n=0}^{N-1} I^c_n sin(2πn/N) / Σ_{n=0}^{N-1} I^c_n cos(2πn/N) ].    (5)

In Fig. 1, we show the resulting phase image corresponding to the scene, computed from the sine-wave patterns. For reference, Fig. 1 also shows the phase values for all pixels, including those considered unreliable according to B^c. Having derived the phase image φ, we likewise have derived a unique correspondence value y^p for every camera pixel (x^c, y^c) through the linear equation φ(x^c, y^c) = 2πy^p. The 3-D world coordinates of the scanned object can, therefore, be derived through triangulation with the projector [7], under the assumption that the camera and projector are accurately modeled by the pinhole lens model such that their perspective matrices, M^c and M^p, are given by:

M^c = [ m^c_11 m^c_12 m^c_13 m^c_14 ; m^c_21 m^c_22 m^c_23 m^c_24 ; m^c_31 m^c_32 m^c_33 m^c_34 ],
M^p = [ m^p_11 m^p_12 m^p_13 m^p_14 ; m^p_21 m^p_22 m^p_23 m^p_24 ; m^p_31 m^p_32 m^p_33 m^p_34 ].    (6)

The 3-D world coordinates X^w, Y^w, and Z^w are defined according to [7] as

[ X^w ; Y^w ; Z^w ] =
[ m^c_11 − m^c_31 x^c   m^c_12 − m^c_32 x^c   m^c_13 − m^c_33 x^c ;
  m^c_21 − m^c_31 y^c   m^c_22 − m^c_32 y^c   m^c_23 − m^c_33 y^c ;
  m^p_21 − m^p_31 y^p   m^p_22 − m^p_32 y^p   m^p_23 − m^p_33 y^p ]^{-1}
[ m^c_34 x^c − m^c_14 ; m^c_34 y^c − m^c_24 ; m^p_34 y^p − m^p_24 ].    (7)

It is this matrix inversion, as well as the arctangent computation in Eq. (5), that will later prove to be the bottleneck preventing real-time surface reconstruction, as will be discussed in Sec. 4.3.
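Eq. (7) amounts to solving a 3 × 3 linear system per pixel: two equations from the camera ray through (x^c, y^c) and one from the projector row y^p. A minimal per-pixel sketch, assuming 3 × 4 perspective matrices (the function name and the toy matrices in the test are ours):

```python
import numpy as np

def triangulate(xc, yc, yp, Mc, Mp):
    """Solve Eq. (7) for the world point (Xw, Yw, Zw)."""
    # Rows: camera x-constraint, camera y-constraint, projector y-constraint.
    A = np.array([
        Mc[0, :3] - xc * Mc[2, :3],
        Mc[1, :3] - yc * Mc[2, :3],
        Mp[1, :3] - yp * Mp[2, :3],
    ])
    b = np.array([
        xc * Mc[2, 3] - Mc[0, 3],
        yc * Mc[2, 3] - Mc[1, 3],
        yp * Mp[2, 3] - Mp[1, 3],
    ])
    return np.linalg.solve(A, b)
```

Doing this naively costs one matrix inversion per pixel, which is exactly the bottleneck the LUT-based reconstruction of Sec. 4.3 removes.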

Dual-frequency pattern strategy
For the purpose of minimizing the effects of sensor noise by means of high-frequency PMP while minimizing the computational expense of performing phase unwrapping, we propose a dual-frequency pattern set defined as:

I^p_n = A^p + B^p_1 cos(2π f_h y^p − 2πn/N) + B^p_2 cos(2π f_u y^p − 4πn/N),    (8)

where I^p_n is the intensity of a pixel in the projector; A^p, B^p_1, and B^p_2 are constants chosen to keep I^p_n between 0 and 255 for an 8-bit color depth projector; f_h is the high frequency of the sine wave; f_u is the unit frequency of the sine wave and equals 1; n represents the phase-shift index; and N is the total number of phase shifts, which is greater than or equal to 5. Figure 2 illustrates one proposed pattern along with its profile for N = 5, n = 0, f_h = 16, A^p = 127.5, B^p_1 = 102, and B^p_2 = 25.5 for 8-bit-per-pixel grayscale intensity. Using this novel pattern set, Eq. (2) becomes:

I^c_n = A^c + B^c_1 cos(φ_h − 2πn/N) + B^c_2 cos(φ_u − 4πn/N),    (9)

where I^c_n is the intensity of a pixel in the camera. The term A^c is still the averaged pixel intensity across the pattern set, which can be derived by Eq. (3), such that the image A^c is equal to an intensity or texture photograph of the scene. Correspondingly, the term B^c_1 is the intensity modulation of a given pixel corresponding to φ_h and is derived from I^c_n as:

B^c_m = (2/N) √{ [Σ_{n=0}^{N-1} I^c_n sin(2πmn/N)]² + [Σ_{n=0}^{N-1} I^c_n cos(2πmn/N)]² }    (10)

with m = 1, such that B^c_1 can be thought of as the amplitude of the sinusoid reflecting off a point on the target surface.
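A minimal sketch generating one column profile of the dual-frequency pattern of Eq. (8); the function name is ours, and y^p is normalized by the pattern height. Note the unit-frequency term is phase-shifted at twice the rate (4πn/N), which is what lets the two components be separated later as distinct temporal harmonics:

```python
import numpy as np

def dual_freq_column(height, N=5, fh=16, Ap=127.5, Bp1=102.0, Bp2=25.5):
    """One column of each of the N dual-frequency patterns, Eq. (8)."""
    yp = np.arange(height, dtype=np.float64) / height
    return [Ap
            + Bp1 * np.cos(2.0 * np.pi * fh * yp - 2.0 * np.pi * n / N)
            + Bp2 * np.cos(2.0 * np.pi * 1.0 * yp - 4.0 * np.pi * n / N)
            for n in range(N)]
```

With A^p = 127.5 and B^p_1 + B^p_2 = 127.5, the intensities stay within the 8-bit range [0, 255], as required.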
Similar to traditional PMP, if I^c_n is constant or only weakly affected by the projected sinusoid patterns, then B^c_1 will be close to zero, indicating that B^c_1 is very sensitive to noise in the camera, the projector, and/or ambient light. Thus B^c_1 is employed as an indicator of the signal-to-noise ratio such that excessively noisy regions, with small B^c_1 values, are discarded from further processing [25]. B^c_2 is derived from I^c_n by Eq. (10) with m = 2 and is the intensity modulation corresponding to φ_u. Of the reliable pixels with sufficiently large B^c_1, the phase pair (φ_h, φ_u) is derived as:

φ_h = tan^{-1}[ Σ_{n=0}^{N-1} I^c_n sin(2πn/N) / Σ_{n=0}^{N-1} I^c_n cos(2πn/N) ],
φ_u = tan^{-1}[ Σ_{n=0}^{N-1} I^c_n sin(4πn/N) / Σ_{n=0}^{N-1} I^c_n cos(4πn/N) ],    (11)

where φ_h represents the wrapped phase value of the captured pattern and φ_u represents the base phase used to unwrap φ_h.
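Because the two components of Eq. (9) occupy the first and second temporal harmonics of the N-frame sequence, both phases of Eq. (11) come from one pass over the frames; with N ≥ 5 the harmonics remain separable. A sketch (function name ours):

```python
import numpy as np

def phase_pair(frames):
    """Extract (phi_h, phi_u) of Eq. (11) from the N dual-frequency frames."""
    I = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    N = I.shape[0]
    n = np.arange(N).reshape((-1,) + (1,) * (I.ndim - 1))

    def harmonic(m):
        # m = 1 isolates the high-frequency term, m = 2 the unit-frequency term.
        s = (I * np.sin(2.0 * np.pi * m * n / N)).sum(axis=0)
        c = (I * np.cos(2.0 * np.pi * m * n / N)).sum(axis=0)
        return np.arctan2(s, c)

    return harmonic(1), harmonic(2)
```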

LUT-based processing
LUT-based programming is a well-known and widely used technique [24] for minimizing processing latency. In this section, we implement LUT-based processing for deriving the modulation, B^c, and phase, φ, terms for traditional PMP that is suitable for all triangulation-based 3-D measurement, including our proposed dual-frequency pattern set. The proposed LUT-based processing takes advantage of the fact that real-time systems use as few patterns as possible, using the 8 bits per pixel of the captured pattern set as the indices into the LUT. By having LUTs that account for every possible combination of captured pixel values over the pattern set while storing double-precision results, the proposed scheme is completely lossless compared to traditional processing.

Modulation
As a first step in reducing the computational complexity associated with Eqs. (4), (5), and (7), we simplify the determination of B^c by rewriting Eq. (4) as:

B^c = (1/3) √{ 3(I^c_1 − I^c_2)² + (2I^c_0 − I^c_1 − I^c_2)² }    (12)

for N = 3,

B^c = (1/2) √{ (I^c_1 − I^c_3)² + (I^c_0 − I^c_2)² }    (13)

for N = 4, or

B^c = (1/6) √{ 3(I^c_1 + I^c_2 − I^c_4 − I^c_5)² + (2I^c_0 + I^c_1 − I^c_2 − 2I^c_3 − I^c_4 + I^c_5)² }    (14)

for N = 6. Noting that we need only solve these equations for 8-bit-per-pixel I^c_n images, we can implement a modulation look-up table, MLUT, that, for N = 3, is defined according to

MLUT[V][U] = (1/3) √(3U² + V²),    (15)

where the integer indices V and U are derived from

U = I^c_1 − I^c_2,  V = 2I^c_0 − I^c_1 − I^c_2;    (16)

for N = 4, is defined according to

MLUT[V][U] = (1/2) √(U² + V²),    (17)

where the integer indices V and U are derived from

U = I^c_1 − I^c_3,  V = I^c_0 − I^c_2;    (18)

or, for N = 6, is defined according to

MLUT[V][U] = (1/6) √(3U² + V²),    (19)

where the integer indices V and U are derived from

U = I^c_1 + I^c_2 − I^c_4 − I^c_5,  V = 2I^c_0 + I^c_1 − I^c_2 − 2I^c_3 − I^c_4 + I^c_5.    (20)

The double-precision results of Eq. (15), (17), or (19) are stored in the MLUT. In contrast to Eq. (4), the MLUT reduces the run-time computational cost of modulation without losing accuracy, where the size of the LUT is determined by the number of bits per pixel of the camera sensor and the number of patterns being projected.
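For N = 3, the MLUT of Eqs. (15) and (16) can be sketched as follows; the offsets +255 and +510 (our choice) merely shift the signed indices U ∈ [−255, 255] and V ∈ [−510, 510] into non-negative array positions:

```python
import numpy as np

# Precompute Eq. (15) over every possible (V, U) pair for 8-bit inputs.
U = np.arange(-255, 256, dtype=np.float64)
V = np.arange(-510, 511, dtype=np.float64)
MLUT = np.sqrt(3.0 * U[None, :] ** 2 + V[:, None] ** 2) / 3.0   # shape (1021, 511)

def modulation_n3(I0, I1, I2):
    """Look up B^c for one pixel of an N = 3 pattern set, Eq. (16) indices."""
    u = int(I1) - int(I2)
    v = 2 * int(I0) - int(I1) - int(I2)
    return MLUT[v + 510, u + 255]
```

At run time only two integer subtractions, one doubling, and one table access remain per pixel, with full double-precision accuracy.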
For our proposed dual-frequency pattern scheme, the minimum number of patterns is 5, but to take advantage of our LUT-based processing, we set the number of patterns to 6, where the modulation terms B^c_1 and B^c_2 in Eq. (9) become:

B^c_1 = (1/6) √{ 3(I^c_1 + I^c_2 − I^c_4 − I^c_5)² + (2I^c_0 + I^c_1 − I^c_2 − 2I^c_3 − I^c_4 + I^c_5)² }    (21)

and

B^c_2 = (1/6) √{ 3(I^c_1 − I^c_2 + I^c_4 − I^c_5)² + (2I^c_0 − I^c_1 − I^c_2 + 2I^c_3 − I^c_4 − I^c_5)² }.    (22)

In practice, we need only calculate B^c_1, as a shadow filter and for representing object texture. The MLUT for B^c_1 is the same as Eq. (19).

Phase
As our second step to producing real-time 3-D video with texture, we intend to minimize the computational complexity associated with generating phase video, where the arctangent function has long been considered a significant obstacle for fringe pattern analysis [26]. Previous approaches to this problem include Huang et al.'s cross-ratio algorithm [22] and Guo et al.'s approximation algorithm [26]. However, these methods reduce computational cost at the expense of accuracy and are not faster than our LUT-based algorithms.
As we did with Eq. (4) for B^c, we simplify Eq. (5) according to the number of patterns N such that

φ = tan^{-1}[ √3 (I^c_1 − I^c_2) / (2I^c_0 − I^c_1 − I^c_2) ]    (23)

for N = 3,

φ = tan^{-1}[ (I^c_1 − I^c_3) / (I^c_0 − I^c_2) ]    (24)

for N = 4, or

φ = tan^{-1}[ √3 (I^c_1 + I^c_2 − I^c_4 − I^c_5) / (2I^c_0 + I^c_1 − I^c_2 − 2I^c_3 − I^c_4 + I^c_5) ]    (25)

for N = 6. Again based on the fact that the intensity values of the grabbed images are range-limited integers, we can implement these calculations through a phase LUT (PLUT), for N = 3, defined according to

PLUT[V][U] = tan^{-1}( √3 U / V ),    (26)

where U and V are defined in Eq. (16); for N = 4, defined according to

PLUT[V][U] = tan^{-1}( U / V ),    (27)

where U and V are defined in Eq. (18); or, for N = 6, defined according to

PLUT[V][U] = tan^{-1}( √3 U / V ),    (28)

where U and V are defined in Eq. (20). The double-precision results are stored in the PLUT. Thus, the time-consuming arctangent operation is pre-performed, and phase values are obtained by accessing the pre-computed PLUT whose size is, again, determined by the number of bits per pixel of the sensor as well as the number of patterns projected, with no loss in accuracy. Compared to Eq. (5), the PLUT avoids computing the arctangent function at run time, such that the computational cost of phase is greatly reduced without introducing distortion.
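The N = 3 PLUT of Eq. (26) follows the same indexing as the MLUT sketch above; using a four-quadrant arctangent resolves the sign ambiguity of tan^{-1} (offsets +255/+510 are our choice):

```python
import math
import numpy as np

# Precompute Eq. (26) for every possible (V, U) pair of 8-bit inputs.
U = np.arange(-255, 256, dtype=np.float64)
V = np.arange(-510, 511, dtype=np.float64)
PLUT = np.arctan2(math.sqrt(3.0) * U[None, :], V[:, None])   # shape (1021, 511)

def phase_n3(I0, I1, I2):
    """Look up phi for one pixel of an N = 3 pattern set, Eq. (16) indices."""
    u = int(I1) - int(I2)
    v = 2 * int(I0) - int(I1) - int(I2)
    return PLUT[v + 510, u + 255]
```

The run-time arctangent thus reduces to two integer subtractions and one table access per pixel.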
For our proposed dual-frequency pattern scheme, the more accurate wrapped phase φ_h and the coarse unit-frequency phase φ_u of Eq. (11) are rewritten, for N = 6, as

φ_h = tan^{-1}[ √3 (I^c_1 + I^c_2 − I^c_4 − I^c_5) / (2I^c_0 + I^c_1 − I^c_2 − 2I^c_3 − I^c_4 + I^c_5) ],
φ_u = tan^{-1}[ √3 (I^c_1 − I^c_2 + I^c_4 − I^c_5) / (2I^c_0 − I^c_1 − I^c_2 + 2I^c_3 − I^c_4 − I^c_5) ].    (29)

The PLUT for φ_h is the same as Eq. (28), and the PLUT for φ_u is defined analogously with U = I^c_1 − I^c_2 + I^c_4 − I^c_5 and V = 2I^c_0 − I^c_1 − I^c_2 + 2I^c_3 − I^c_4 − I^c_5. The resulting images are shown in Fig. 4 (top center), (top right), (bottom left), and (bottom right). The image in Fig. 4 (bottom center) shows the binarized B^c_1, used as a shadow noise filter for phase generation, with a threshold of 10.
The visualizations of Eq. (11) are illustrated in Fig. 5 (top-left) and (top-center), where shadow noise was filtered by the binarized B^c_1, while Fig. 5 (bottom-left) and (bottom-center) show the 250th-column profile across the angel. In comparison, the unit phase term is noisy, while the wrapped phase has high quality but needs to be unwrapped. Because we obtain the wrapped phase and the base phase simultaneously, the unwrapping algorithm for multi-frequency PMP can be performed to generate the final, high-quality, and non-ambiguous phase; no neighborhood technique need be employed. The final unwrapped phase image without shadow noise is shown in Fig. 5 (top-right), and the 250th-column profile across the angel is shown in Fig. 5 (bottom-right). Once the unwrapped phase is obtained, the 3-D world coordinates of the scanned object can be computed by triangulation between the camera and projector. Figure 6 shows the reconstructed 3-D point clouds with texture (top) and depth rendering (bottom) in front (left), side (center), and top (right) views. A^c acts as the texture for the 3-D point clouds in this experiment.
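The per-pixel unwrapping step can be sketched as follows: scaling the base phase by f_h predicts the absolute phase, and the wrapped phase supplies the precise fractional part, so the integer fringe order is a simple rounding (function name ours; tolerant of base-phase noise up to π/f_h):

```python
import numpy as np

def unwrap_dual(phi_h, phi_u, fh):
    """Unwrap the high-frequency phase using the unit-frequency base phase."""
    # Integer number of 2-pi fringes separating phi_h from the coarse estimate.
    k = np.round((fh * phi_u - phi_h) / (2.0 * np.pi))
    return phi_h + 2.0 * np.pi * k
```

This is a purely pointwise operation, consistent with the fact that no neighborhood technique is needed.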
For our second experiment, a static object was scanned using traditional PMP with unit frequency and N = 3, 4, and 6 for the purpose of measuring the speed at which our LUT-based algorithms perform without phase unwrapping. Although the processor has four cores, our reported processing rates for each LUT algorithm are based on using just one core. And although B^c could be used as a shadow noise filter, thus reducing the number of pixels that would need to be processed while deriving φ, we computed a phase value for the full set of 640 × 480 pixels such that our observed processing times represent an upper limit. Under these conditions, the average computation times over 1,000 runs for the modulation B^c and phase φ are reported in Table 1. B^c acts as the texture in this experiment.
With the simplified modulation equations, Eqs. (12), (13), and (14), the processing time decreases significantly for N = 4 (2.34 ms) compared with N = 3 (6.72 ms) because, before performing the square-root computation, there are only 2 subtractions, 1 addition, and 2 square computations for N = 4, versus 3 subtractions, 1 addition, 2 multiplications, and 2 square computations for N = 3. Using the simplified phase equations, Eqs. (23), (24), and (25), the processing time is almost the same for different N because, although the time for basic computations such as addition and subtraction still varies with N, it is negligible compared to the time for the processing-hungry arctangent function.
Furthermore, it is found from Table 1 that the performance of the MLUT and PLUT is the same for a given N because the computations of the integer indices, U and V, are the same for both the MLUT and PLUT, as is the time for accessing them. However, the processing time for the MLUT/PLUT increases with increasing N because the time for accessing the image buffer, I^c_n, increases; accessing this buffer is therefore the predominant factor in this case. In practice, when we access the MLUT and PLUT together, the computations of U and V need only be performed once. Thus Table 1 also shows that the processing time of the row marked "MLUT and PLUT, combined" is less than the sum of the rows marked "MLUT: Eqs. (15), (17), and (19)" and "PLUT: Eqs. (26), (27), and (28)", for which U and V were computed separately.
For testing 3-D reconstruction using a single processing core, we implemented reconstruction by means of matrix inversion (Eq. (7)) versus our LUT-based implementation of Sec. 4.3. With our 3-D LUT algorithm, the frame rate of 3-D reconstruction is 228.83 fps at 640 × 480 image resolution, which is 10.3 times faster than matrix inversion by means of Eq. (7). For added comparison, Zhang et al. reported a reconstruction frame rate of 25.56 fps at a resolution of 532 × 500 when performing matrix inversion by means of GPU processing on an nVidia Quadro FX 3450 [6]. For Zhang et al.'s 2 + 1 pattern strategy [21], our LUT analysis can also be applied, and the performance is shown in Table 1.
In our third experiment, we scanned a human subject's hand gestures with our dual-frequency pattern set (N = 6) using LUT-based processing. Unlike our second experiment, the most recent N video frames are captured live and stored in a shared image buffer by a camera thread. At the same time, a second thread performs modulation and phase generation from the shared buffer and stores the computed data into a modulation and phase buffer. A third thread simultaneously performs 3-D reconstruction using data from this buffer, storing the 3-D point cloud results into a 3-D data buffer. Finally, a fourth thread displays the data from the 3-D data buffer using OpenGL. Because the speed of the camera/projector pair is 120 fps, the speed of the final 3-D reconstruction is also 120 fps, while the display speed is limited by the LCD display to 40-50 fps.
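The thread pipeline described above can be sketched in miniature as two chained worker stages handing data through bounded queues (a simplified stand-in for the four-thread design; all names and the sentinel convention are ours):

```python
import queue
import threading

def run_pipeline(frame_sets, make_phase, reconstruct):
    """Chain phase generation and 3-D reconstruction stages via queues."""
    phase_q, cloud_q = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    clouds = []

    def phase_stage():
        for fs in frame_sets:
            phase_q.put(make_phase(fs))
        phase_q.put(None)                       # sentinel: end of stream

    def recon_stage():
        while (p := phase_q.get()) is not None:
            cloud_q.put(reconstruct(p))
        cloud_q.put(None)

    threads = [threading.Thread(target=phase_stage),
               threading.Thread(target=recon_stage)]
    for t in threads:
        t.start()
    # The main thread plays the role of the display stage.
    while (c := cloud_q.get()) is not None:
        clouds.append(c)
    for t in threads:
        t.join()
    return clouds
```

Bounded queues keep the stages overlapped in time while preventing any one stage from running unboundedly ahead of its consumer.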
Figure 7 shows a human subject's hand gestures. In the case of Fig. 7 (left), we show the entire system along with the subject's hand as a way of demonstrating that video is being acquired and displayed on screen in real time, with no noticeable delay between the gestures performed by the subject and those displayed on screen. On the LCD panel, the XYZ axes are labeled as blue, green, and red lines, respectively, while the 3-D point cloud is also reflected off the XZ plane as a way of better illustrating the hand moving toward and away from the camera. In the video clips of Fig. 7 (center) and (right), we show the subject's hand held in front of a white cardboard box. The purpose of these clips is to show that we can track multiple, disjoint objects, since there are no phase ambiguities when using the dual-frequency pattern scheme. For added inspection, the images in Fig. 8 show live frames of hand poses in front and top views where, from visual inspection, one can see the texture/intensity encoding of the points derived from B^c. Points with low modulation strength are not displayed. As a final and quantitative evaluation of our proposed methodology, we scanned a textureless, white table tennis ball, with a radius of 19 mm, swinging like a pendulum on a 20 cm long string. We used our proposed dual-frequency pattern scheme with N = 6, f_h = 4, f_u = 1, A^p = 155, and B^p_1 = B^p_2 = 50. Our PLUT and 3-D LUT were employed for phase generation and 3-D reconstruction. From the reconstructed point clouds of the moving ball, we found the best-fit sphere with variable position and radius and compared the estimated sphere radius with the true radius. The estimated sphere radius is plotted in Fig. 9 (top). The distance of the table tennis ball, plotted in Fig. 9 (bottom), was estimated from the centroid of all points in the cloud.
The exact position of the point cloud centroid is of no quantitative significance since we have no ground-truth measure to compare against, but it does indicate the points in time when the ball was at the extreme points of the pendulum swing. What Fig. 9 shows is that we get an accurate and stable measurement of the ball radius at the extreme points of the swing, where the ball comes to a complete stop before changing direction. The mid-way point has the least stable radius estimate because this is the point in the swing where the ball is moving at its highest velocity. At its farthest point from the sensor, noise in the reconstruction results in a sphere radius that is at its smallest because the points of the cloud are not, in any way, constrained to be side by side in the reconstruction; two neighboring pixels may end up on the front and back of the best-fit sphere. This behavior is evident in the distance estimate, which has a high degree of variability. At its closest point to the sensor, the point cloud is sufficiently flat that the entire point cloud resides on the front side of the best-fit sphere, resulting in a smooth and consistent distance estimate.
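The best-fit sphere with variable position and radius can be found in closed form by linearizing |p − c|² = r² into a least-squares problem in (2c, r² − |c|²); a sketch (function name ours):

```python
import numpy as np

def fit_sphere(points):
    """Linear least-squares sphere fit: returns (center, radius)."""
    P = np.asarray(points, dtype=np.float64)
    # |p|^2 = 2 c . p + (r^2 - |c|^2), linear in the 4 unknowns.
    A = np.c_[2.0 * P, np.ones(len(P))]
    b = (P ** 2).sum(axis=1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = x[:3]
    radius = np.sqrt(x[3] + center @ center)
    return center, radius
```

Because a one-sided scan sees only part of the sphere, the fit degrades gracefully toward a plane for very flat clouds, matching the distance behavior described above.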

Conclusion and future works
In this paper, we presented a novel dual-frequency pattern strategy along with three LUT-based algorithms for calculating the modulation and phase terms and for reconstructing 3-D point clouds based upon a popular method of SLI. The proposed pattern coding technique embeds low- and high-frequency components into a single pattern, where the high-quality, high-frequency phase is unwrapped according to the low-frequency phase. No intermediate patterns are needed, nor are surface discontinuities an issue. With regard to reconstruction quality, we note that scan accuracy is strongly influenced by features of pattern design, which were also considered in this paper. Given that our proposed method requires at least five patterns, future work will look at reducing this pattern number further.
With regard to LUT-based processing, the phase generation and 3-D reconstruction contributions of this paper are of particular importance to real-time 3-D reconstruction. Com-

Fig. 8. Sample point clouds, using the dual-frequency pattern scheme with N = 6, from a live demonstration of various hand poses: (top) front view and (middle) top view.

Table 1. Processing time and rate, in milliseconds and frames per second (in parentheses), respectively, for various stages of PMP processing by means of the equations and LUTs described in this paper.