© 2003 Optical Society of America

Based on recent discoveries, we introduce a method that projects a single structured pattern onto an object and reconstructs the three-dimensional range from the distortions in the reflected, captured image. Traditional structured-light methods require several different patterns to recover depth without ambiguity or albedo sensitivity, and they are corrupted by object movement during the projection/capture process. Our method efficiently combines multiple patterns into a single composite pattern projection, allowing real-time implementations. Because structured-light techniques, unlike time-of-arrival techniques, use standard image capture and projection technology, they are relatively low cost.


Introduction
Structured-light illumination [1] is a commonly used technique for automated inspection and for measuring surface topologies. Classical 3D acquisition devices scan a single laser stripe progressively over the surface of the target object, which requires the object to remain static and requires the data acquisition to capture every stripe image. To reduce the technological burden of scanning and processing each laser-stripe position, many methods have been devised to project and process structured-light patterns, such as multi-stripe [2] and sinusoidal fringe patterns, that illuminate the entire target surface at once. However, these multi-stripe patterns introduce ambiguities in the surface reconstruction around surface discontinuities, can be sensitive to surface reflectance variations (i.e., albedo), and/or suffer from lower lateral resolution caused by the required spacing between stripes [3].
The solution to the ambiguity and albedo problems is to encode the surface repeatedly with multiple striped light patterns [4] of varying spatial frequency [5,6,7]. In a real-time system, however, this requires either temporally multiplexed projection/capture image sequences or color multiplexing with multiple narrow-band color filters. Temporally multiplexed systems are sensitive to object motion. Multi-color techniques [8,9] suffer from lower SNR due to the spectral division and are sensitive to surface color spectra. The missing piece, and in some circles the "holy grail" of structured-light research, has therefore been a structured-light pattern that allows surface topology to be measured from a single image without ambiguity, with high accuracy, and insensitive to albedo variation.
Several one-shot projection patterns have been proposed to recover range data from a single image [10,11,12]. For example, a gradient pattern [10,11] can retrieve phase non-ambiguously, but this approach is typically noisy and highly sensitive to albedo variation. Maruyama and Abe introduced a single-pattern technique that is both insensitive to albedo and non-ambiguous, using binary coding to identify each line in a single frame [12]. Although this line-index approach is sensitive to highly textured surfaces, we believe the strategy is correct. What is still needed, however, is a general approach to the single-pattern problem.
We have discovered a systematic way of generating such patterns by combining multiple patterns into a single Composite Pattern (CP) that can be projected continuously. Structured light is a crowded art, with thousands of custom systems developed for thousands of different applications over the last 70 years. Several structured-light scanners are commercially available, but they are expensive, serve limited markets, and have specialized capabilities. Most structured-light research has been funded by industry and limited to specific applications. We have pursued a general mathematical model [3,13] of the different structured-light techniques, along with a general depth-reconstruction methodology. Our strategy has been to treat structured-light systems as wide-bandwidth parallel communication channels, so that well-known concepts of communications theory can be applied to structured-light technology for optimization, comparative analysis, and standardization of performance metrics. Our modeling efforts also revealed something else: the spatial dimension orthogonal to the depth distortion (the orthogonal dimension), as opposed to the direction of the distortion itself (the phase dimension), was underutilized and could be used to modulate and combine multiple patterns into a single composite pattern [14]. Furthermore, this methodology can be applied to a variety of existing multi-pattern techniques. Unlike the ad hoc single-pattern techniques mentioned above, we introduce a systematic methodology, based on well-known communications theory, for combining multiple patterns into one composite pattern. The individual patterns are spatially modulated along the orthogonal dimension, perpendicular to the phase dimension.
In this way, we can take advantage of the existing procedures for traditional multi-pattern techniques such as Phase Measuring Profilometry (PMP) [5], Linearly Coded Profilometry (LCP) [7], and other multi-frame techniques [15,16] while projecting only one frame onto the target object. In principle, this composite modulation approach suits most sequential-projection patterns. For simplicity of demonstration, however, this paper focuses on the coding and decoding procedures of composite patterns for the PMP technique. In our system, a single frame of the composite PMP pattern is formed and projected onto the target object. The reflected image is decoded to retrieve multiple PMP frames, from which the phase distribution distorted by the object depth is calculated. The depth of the object is then reconstructed from the phase following the traditional PMP method.

Traditional PMP method
The PMP range-finding method has several advantages, including pixel-wise calculation, resistance to ambient light and to reflectance variation, and whole-field depth reconstruction from as few as three frames. Sinusoidal patterns are projected N times, each shifted by 2π/N, as

$$I_n^p(x^p, y^p) = A^p + B^p \cos\left(2\pi f_\phi y^p - \frac{2\pi n}{N}\right), \tag{1}$$

where $A^p$ and $B^p$ are the projection constants and $(x^p, y^p)$ are the projector coordinates. The $y^p$ dimension lies along the direction of the depth distortion and is called the phase dimension. The $x^p$ dimension, perpendicular to the phase dimension, is called the orthogonal dimension. The frequency $f_\phi$ of the sinusoid is along the phase direction. The subscript $n$ is the phase-shift index, $n = 1, 2, \ldots, N$, where $N$ is the total number of phase shifts. The intensity images reflected from the object surface after successive projections are

$$I_n(x, y) = \alpha(x, y)\left[A^p + B^p \cos\left(\phi(x, y) - \frac{2\pi n}{N}\right)\right], \tag{2}$$

where $(x, y)$ are the image coordinates and $\alpha(x, y)$ is the reflectance variation, or albedo. The pixel-wise phase distortion $\phi(x, y)$ of the sinusoid corresponds to the object surface depth. Its value is determined from the captured patterns by

$$\phi(x, y) = \arctan\left[\frac{\sum_{n=1}^{N} I_n(x, y)\sin(2\pi n/N)}{\sum_{n=1}^{N} I_n(x, y)\cos(2\pi n/N)}\right]. \tag{3}$$

The albedo $\alpha(x, y)$ cancels in this ratio; therefore, depth obtained this way is independent of the albedo. When calibrating the range-finding system, the phase map of the reference plane, $\phi_r(x, y)$, is pre-calculated from the projections onto the reference plane. The depth of the object surface with respect to the reference plane is then obtained through simple geometric relations [17]. As shown in Fig. 1, the distance from the projector lens center, $O_p$, to the camera lens center, $O_c$, is $d$. The projector and camera lens centers both lie a distance $L$ from the reference plane.
The height $h$ of the object at point A is calculated by

$$h = \frac{\overline{BC}\,(L/d)}{1 + \overline{BC}/d}, \tag{4}$$

and $\overline{BC}$ is proportional to the difference between the phase at point B, $\phi_B$, and the phase at point C, $\phi_C$, as

$$\overline{BC} = \beta\,(\phi_B - \phi_C). \tag{5}$$

The constant β, as well as the geometric parameters L and d, is determined during calibration. The phase value calculated from Eq. (3) is wrapped to the range (−π, π], regardless of the frequency in the phase direction. A phase-unwrapping procedure retrieves the non-ambiguous phase from the wrapped phase [18,19]. With higher frequencies in the phase direction, the range data have a higher signal-to-noise ratio (SNR) after phase unwrapping [20].
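The per-pixel PMP decoding chain of Eqs. (2) through (5) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses `math.atan2`, the four-quadrant arctangent, so the wrapped phase covers the full (−π, π] range, and it simulates intensities from Eq. (2). The helper names, the constants `a_p` and `b_p`, and the calibration values `beta`, `L`, `d` are all hypothetical. Eq. (4) is the similar-triangle relation $\overline{BC}/d = h/(L-h)$ solved for $h$.

```python
import math


def pmp_phase(intensities):
    """Wrapped phase from N equally shifted PMP samples at one pixel, Eq. (3).

    math.atan2 is the four-quadrant arctangent, so the result is in [-pi, pi].
    """
    n_steps = len(intensities)
    num = sum(i * math.sin(2 * math.pi * n / n_steps)
              for n, i in enumerate(intensities, start=1))
    den = sum(i * math.cos(2 * math.pi * n / n_steps)
              for n, i in enumerate(intensities, start=1))
    return math.atan2(num, den)


def simulate_pixel(phi, albedo, a_p=0.5, b_p=0.4, n_steps=4):
    """Reflected intensities I_n at one pixel, following Eq. (2) (hypothetical constants)."""
    return [albedo * (a_p + b_p * math.cos(phi - 2 * math.pi * n / n_steps))
            for n in range(1, n_steps + 1)]


def height_from_phase(phi_b, phi_c, beta, L, d):
    """Height h of point A above the reference plane, Eqs. (4)-(5)."""
    bc = beta * (phi_b - phi_c)             # Eq. (5): BC from the phase difference
    return (bc * L / d) / (1.0 + bc / d)    # Eq. (4), equal to L * bc / (d + bc)


# Albedo cancels: a bright and a dark pixel with the same phase decode alike.
phi_bright = pmp_phase(simulate_pixel(1.5, albedo=1.0))
phi_dark = pmp_phase(simulate_pixel(1.5, albedo=0.15))
# Hypothetical calibration in millimetres: beta = 50 mm/rad, L = 1000 mm, d = 200 mm.
h = height_from_phase(phi_b=phi_bright, phi_c=0.5, beta=50.0, L=1000.0, d=200.0)
print(phi_bright, phi_dark, h)
```

The constant term $A^p$ drops out because the sines and cosines of the N equally spaced shifts each sum to zero, and the albedo scales numerator and denominator equally, which is why both pixels decode to the same phase.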

Composite PMP Pattern
To combine multiple patterns into one image, each individual pattern is modulated along the orthogonal direction with a distinct carrier frequency, and the modulated patterns are summed, as shown in Fig. 2. Each channel of the composite image along the orthogonal direction therefore carries one of the individual patterns used in PMP for the phase calculation. Similar to the patterns projected in the multi-frame approach of Eq. (1), the image patterns to be modulated are

$$I_n^p(x^p, y^p) = c + \cos\left(2\pi f_\phi y^p - \frac{2\pi n}{N}\right). \tag{6}$$

The constant c offsets $I_n^p$ to non-negative values, which requires $c \ge 1$. Negative signal values would cause incorrect demodulation with our AM-based demodulation method, as discussed later. The signal patterns are then multiplied by cosine waves with distinct carrier frequencies along the orthogonal direction, and the composite pattern accumulates the channels as

$$I_{CP}^p(x^p, y^p) = A^p + B^p \sum_{n=1}^{N} I_n^p(x^p, y^p)\cos(2\pi f_n^p x^p), \tag{7}$$

where $f_n^p$ are the carrier frequencies along the orthogonal direction and n is the shift index from 1 to N. The projection constants $A^p$ and $B^p$ are calculated as

$$A^p = I_{\min} - B^p \cdot \min\left\{\sum_{n=1}^{N} I_n^p(x^p, y^p)\cos(2\pi f_n^p x^p)\right\}, \tag{8}$$

$$B^p = \frac{I_{\max} - I_{\min}}{\max\left\{\sum_{n=1}^{N} I_n^p(x^p, y^p)\cos(2\pi f_n^p x^p)\right\} - \min\left\{\sum_{n=1}^{N} I_n^p(x^p, y^p)\cos(2\pi f_n^p x^p)\right\}}, \tag{9}$$

so that the projected intensity of the composite pattern falls within $[I_{\min}, I_{\max}]$. To increase the SNR, $B^p$ should take the largest value allowed [20]; therefore, $[I_{\min}, I_{\max}]$ should match the intensity capacity of the projector to retrieve optimal depth information.
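Pattern generation per Eqs. (6) through (9) can be sketched as follows, assuming projector coordinates normalized by the pattern size so that every frequency is in cycles per field of view. The function name and the 30-row height are hypothetical (the projector in the experiments is 800 × 600); the carriers are the projector carriers reported in the experiments. The sketch computes the unscaled channel sum first, then derives $B^p$ and $A^p$ from its extremes so the output spans exactly $[I_{\min}, I_{\max}]$.

```python
import math


def composite_pattern(width, height, carriers, f_phi=1.0, c=1.0,
                      i_min=0.0, i_max=255.0):
    """Build a composite PMP pattern (list of rows) following Eqs. (6)-(9).

    carriers: orthogonal carrier frequencies f_n^p, in cycles per field of view
    f_phi:    phase-direction frequency, in cycles per field of view
    c:        offset keeping each channel non-negative (c >= 1)
    """
    n_ch = len(carriers)
    raw = []  # sum_n I_n^p(x^p, y^p) * cos(2 pi f_n^p x^p), before scaling
    for yp in range(height):
        row = []
        for xp in range(width):
            total = 0.0
            for n, fn in enumerate(carriers, start=1):
                i_n = c + math.cos(2 * math.pi * f_phi * yp / height
                                   - 2 * math.pi * n / n_ch)            # Eq. (6)
                total += i_n * math.cos(2 * math.pi * fn * xp / width)  # Eq. (7)
            row.append(total)
        raw.append(row)
    lo = min(min(r) for r in raw)
    hi = max(max(r) for r in raw)
    b_p = (i_max - i_min) / (hi - lo)   # Eq. (9)
    a_p = i_min - b_p * lo              # Eq. (8)
    return [[a_p + b_p * v for v in row] for row in raw]


# Projector carriers from the experiments; the 30-row height is a placeholder.
pattern = composite_pattern(800, 30, [50, 85, 120, 155])
print(min(map(min, pattern)), max(map(max, pattern)))
```

Choosing $B^p$ from the full range of the channel sum is what lets the pattern use the projector's whole intensity capacity, as the text recommends for SNR.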
The orthogonal modulation frequencies $f_n^p$ are designed to be evenly distributed and away from zero frequency. This modulation is analogous to AM modulation. No pattern is modulated in the "DC" or baseband channel. Although losing the baseband channel reduces the usable bandwidth of the composite pattern, the modulated pattern is less sensitive to ambient light. Ideally, the composite pattern image reflected from the target object surface and captured by the camera is

$$I_{CP}(x, y) = \alpha(x, y)\left[A^p + B^p \sum_{n=1}^{N} I_n(x, y)\cos(2\pi f_n x)\right], \tag{10}$$

where

$$I_n(x, y) = c + \cos\left(\phi(x, y) - \frac{2\pi n}{N}\right), \tag{11}$$

$\alpha(x, y)$ is the albedo, and $\phi(x, y)$ is the distorted phase as in Eq. (2). The actual carrier frequencies $f_n$ in the camera view may differ from $f_n^p$ because of perspective distortion between the projector and the camera. To make $f_n$ as independent as possible of the surface topology on each orthogonal line, the camera and projector are carefully aligned to share approximately the same world coordinates in both the orthogonal and depth directions. If the orthogonal and phase axes of the camera and projector fields are rotated relative to each other, the orthogonal carrier modulation of the projector leaks into the phase component captured by the camera.
Because the projector and camera digitally sample the projected pattern and the captured image, detecting the high-frequency carrier and recovering the patterns depend heavily on the intensity and spatial resolution of the projector-camera system. The carrier frequencies $f_n^p$ must therefore be assigned carefully; their selection depends strongly on projector and camera quality as well as on the experimental setup. To minimize channel leakage, adjacent carriers $f_n^p$ should be spread as far apart as possible. However, the limited spatial and intensity resolution confines them to a certain range for reliable depth recovery.
We process the reflected images as 1-D raster signals, treating each line along the orthogonal dimension as an independent signal vector. The received orthogonal spectrum of a typical signal vector for four composite-pattern channels is illustrated in Fig. 3. The four carrier frequencies are evenly distributed and separated from the ambient-light reflection at baseband. The captured image is processed, as a set of 1-D signal vectors, by band-pass filters that separate out each channel. To filter the channels uniformly, the band-pass filters are centered at $f_n$ and are all derived from the same low-pass Butterworth design; in other words, they all have the same passband span and are symmetric about $f_n$. A Butterworth filter is used at this stage for its smooth transition and minimal side-lobe ripple. The filter order, in turn, is selected to reduce crosstalk between channels; a compromise between side-lobe effects and crosstalk is required for acceptable reconstruction performance. The cutoff frequencies of each band are designed in terms of the carrier frequencies $f_n$, where $n = 1, 2, 3, \ldots, N$ and $f_0 = 0$ denotes the baseband channel. After 1-D band-pass filtering, the orthogonal signal vectors are

$$I_n^{BP}(x, y) = I_{CP}(x, y) \ast h_n^{BP}(x) \approx I_n(x, y)\cos(2\pi f_n x), \tag{13}$$

where $\ast$ is the convolution operator, $h_n^{BP}(x)$ is the band-pass filter along the orthogonal direction centered at frequency $f_n$, and the factor $\alpha(x, y)B^p$ has been absorbed into $I_n(x, y)$. The baseband image $I_n(x, y)$ is assumed to be band limited along the orthogonal dimension, with a bandwidth no greater than that of $h_n^{BP}(x)$. The filtered images must then be demodulated to retrieve each individual pattern $I_n(x, y)$. Two practical factors are critical in the demodulation. First, perspective distortion makes the orthogonal carrier frequencies vary with depth. Second, in a practical setup, the cosine carrier on each orthogonal line has an unknown phase shift.
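The channel-separation step can be sketched under two assumptions that the text does not fix. First, the band-pass response is taken to be a Butterworth low-pass magnitude shifted to center on $f_n$, which gives the symmetric, uniform passbands described above. Second, it is applied as a zero-phase mask on a naive DFT of one orthogonal line. Reading the reported passband width of 10 as a full width in cycles per field of view (so a half-width of 5) is likewise an assumption, and the 256-sample line and all function names are hypothetical.

```python
import cmath
import math


def bandpass_gain(f, f_center, half_width, order):
    """Butterworth low-pass magnitude shifted to be symmetric about +/- f_center."""
    offset = abs(f) - f_center            # distance from the channel center
    return 1.0 / math.sqrt(1.0 + (offset / half_width) ** (2 * order))


def bandpass_row(row, f_center, half_width, order):
    """Zero-phase band-pass of one orthogonal line using a naive DFT.

    Frequencies are in cycles per row (cycles per field of view).
    """
    n = len(row)
    spectrum = [sum(row[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        f = k if k <= n // 2 else k - n  # signed frequency of DFT bin k
        spectrum[k] *= bandpass_gain(f, f_center, half_width, order)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def crosstalk_db(spacing, half_width, order):
    """Gain in dB that a channel filter applies to a carrier `spacing` away."""
    return 20.0 * math.log10(bandpass_gain(spacing, 0.0, half_width, order))


# Two received carriers (33 and 56 cycles) plus an ambient DC term on a
# hypothetical 256-sample line; the channel-1 filter keeps only the 33 term.
n = 256
row = [0.5 + math.cos(2 * math.pi * 33 * t / n) + math.cos(2 * math.pi * 56 * t / n)
       for t in range(n)]
channel_1 = bandpass_row(row, f_center=33, half_width=5, order=7)
print(crosstalk_db(23, half_width=5, order=7), crosstalk_db(23, half_width=5, order=2))
```

Comparing the two printed values shows the trade-off discussed above: at a carrier spacing of 23, raising the order from 2 to 7 suppresses the neighboring channel far more strongly, at the cost of a sharper transition band.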
Taking both factors into account, the practical image after band-pass filtering, based on Eq. (13), is

$$I_n^{BP}(x, y) = I_n(x, y)\cos\bigl(2\pi(f_n + \delta f)x + \delta\theta\bigr), \tag{14}$$

where $\delta f$ is the small variation of $f_n$ and $\delta\theta$ is the unknown phase shift. Squaring both sides of Eq. (14) gives

$$\bigl(I_n^{BP}(x, y)\bigr)^2 = \bigl(I_n(x, y)\bigr)^2 \cdot \frac{1 + \cos\bigl(4\pi(f_n + \delta f)x + 2\delta\theta\bigr)}{2}. \tag{15}$$
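This is low-pass filtered by $h^{LP}(x)$ with a cutoff of $f_n$, which removes the term at twice the carrier frequency, such that

$$\bigl(I_n^{BP}(x, y)\bigr)^2 \ast h^{LP}(x) = \frac{\bigl(I_n(x, y)\bigr)^2}{2}. \tag{16}$$

The modulated image is recovered by taking the square root of Eq. (16):

$$I_n^R(x, y) = \sqrt{2\left[\bigl(I_n^{BP}(x, y)\bigr)^2 \ast h^{LP}(x)\right]} = \lvert I_n(x, y)\rvert. \tag{17}$$

Because the demodulation involves squaring, $I_n(x, y)$ must be non-negative for $I_n^R(x, y)$ to equal $I_n(x, y)$; this is why the constant c is added to offset the patterns. The method is effectively AM demodulation that recovers the PMP pattern as the positive envelope. The demodulation procedure is summarized in the diagram of Fig. 4. The recovered images $I_n^R(x, y)$ represent the individual patterns of traditional PMP and are used to retrieve the depth of the measured object with the traditional PMP method.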
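The square, low-pass, square-root demodulation of Eqs. (15) through (17) can be checked numerically on a single line. In this sketch an ideal DFT mask (keep only bins with |f| < f_n) stands in for $h^{LP}(x)$; that substitution, the 256-sample line, and the helper names are assumptions for illustration. The example carrier is offset by δf = 1 with an unknown phase δθ = 0.7, as in Eq. (14).

```python
import cmath
import math


def _lowpass(signal, cutoff):
    """Ideal DFT low-pass (assumed stand-in for h^LP): keep bins with |f| < cutoff."""
    n = len(signal)
    spec = [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        f = k if k <= n // 2 else k - n   # signed frequency of bin k, cycles per row
        if abs(f) >= cutoff:
            spec[k] = 0
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def am_demodulate(bp_row, carrier):
    """Envelope recovery of Eqs. (15)-(17): square, low-pass at f_n, square root."""
    squared = [v * v for v in bp_row]                          # Eq. (15)
    smoothed = _lowpass(squared, carrier)                      # Eq. (16)
    return [math.sqrt(max(2.0 * v, 0.0)) for v in smoothed]    # Eq. (17); clamp rounding


# Hypothetical 256-sample line: envelope c + cos(...) with c = 1.5, carrier
# f_n = 33 shifted by delta_f = 1, and an unknown phase delta_theta = 0.7.
n = 256
envelope = [1.5 + math.cos(2 * math.pi * t / n) for t in range(n)]
bp = [e * math.cos(2 * math.pi * (33 + 1) * t / n + 0.7) for t, e in enumerate(envelope)]
recovered = am_demodulate(bp, carrier=33)
print(max(abs(r - e) for r, e in zip(recovered, envelope)))
```

Neither δf nor δθ needs to be known: squaring moves both into the term at twice the carrier, which the low-pass discards. Feeding an envelope that dips below zero returns its absolute value instead, which is the failure the offset c prevents.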
The range data relative to the reference plane can then be calculated as described in Sec. 2. As Eq. (13) implies, leakage between orthogonal channels occurs when the measured object surface has significant albedo or depth variation along the orthogonal direction, because $I_n(x, y)$ is then no longer band limited within the filter passband. However, because the phase is still computed as in PMP, the reconstructed depth along the phase direction remains resistant to depth discontinuities and albedo variation.

Experiments
We built the range-finding system shown in Fig. 1 based on the CP technique. The projector is a Texas Instruments (TI) Digital Light Processing (DLP) projector with an 800 × 600 micro-mechanical mirror array. A DT3120 frame grabber captures images from a monochrome CCD camera with a spatial resolution of 640 × 480 and 8-bit intensity resolution.
To simplify decoding, the frequency along the phase direction, $f_\phi$, is set to unit frequency, so no phase-unwrapping algorithm needs to be implemented. The number of patterns is N = 4. This choice came from trial and error: the minimum, N = 3, has too much inherent reconstruction noise, and N > 4 reduces the lateral resolution for the given camera resolution. In this experiment, the projector carrier frequencies $f_n^p$ are 50, 85, 120, and 155 cycles per field of view for an orthogonal field of view 800 pixels wide. The corresponding received carrier frequencies are 33, 56, 79, and 103 cycles per field of view over a 640-pixel field of view. The lowest modulation frequency is chosen to be higher than the spacing between adjacent modulation frequencies to minimize the effect of ambient-light reflection. The projector field of view is 475 mm high and 638 mm wide, while the camera field of view is 358 mm high and 463 mm wide. The Butterworth band-pass filter order is set to 7 and the passband width to 10 to reduce crosstalk between adjacent channels. Figure 5(a) shows the projected pattern on the reference plane, and the recovered reference phase map is shown in Fig. 5(b). To test sensitivity to depth variation, a half-circular step 300 mm in diameter and 85 mm thick is placed on top of the reference plane. The reflected image and the corresponding phase map are shown in Figs. 5(c) and 5(d), respectively. The depths of the object scene are calculated pixel-wise following Eq. (4) and are shown in Fig. 5(e). The demodulation procedure produces edge-response effects in the reconstructed depths: because of the band-limited filtering, the sharp edges of the step in world coordinates are reconstructed as gradual transitions between the two depth levels.
The abrupt depth edges act as step edges along the orthogonal direction for all pattern channels. As a result, the impulse response of the filters smooths the edges.
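The carrier frequencies listed above can be put in per-pixel terms with a little arithmetic. This sketch converts cycles per field of view to cycles per pixel and period in pixels, then checks two layout rules: the Nyquist limit of 0.5 cycles per pixel, and the stated rule that the lowest carrier exceeds the adjacent-carrier spacing. The function names are hypothetical; the numbers are those reported above.

```python
def carrier_report(carriers, fov_pixels):
    """Per carrier: (cycles per FOV, cycles per pixel, period in pixels)."""
    return [(f, f / fov_pixels, fov_pixels / f) for f in carriers]


def check_layout(carriers, fov_pixels):
    """Check the carrier-layout rules discussed in the text for one carrier set."""
    spacings = [b - a for a, b in zip(carriers, carriers[1:])]
    # Sampled by pixels, a carrier must stay below 0.5 cycles per pixel (Nyquist).
    below_nyquist = all(f / fov_pixels < 0.5 for f in carriers)
    # Lowest carrier above the adjacent spacing keeps channel 1 clear of baseband.
    clear_of_baseband = carriers[0] > max(spacings)
    return spacings, below_nyquist, clear_of_baseband


projector = [50, 85, 120, 155]  # cycles per 800-pixel field of view
camera = [33, 56, 79, 103]      # cycles per 640-pixel field of view
for name, carriers, fov in (("projector", projector, 800), ("camera", camera, 640)):
    print(name, check_layout(carriers, fov))
    for f, cpp, period in carrier_report(carriers, fov):
        print(f"  {f:3d} cycles/FOV = {cpp:.4f} cycles/pixel, period {period:.1f} px")
```

The highest received carrier, 103 cycles over 640 pixels, has a period of only about 6.2 camera pixels. This illustrates why the spatial resolution of the camera bounds how far apart the carriers can be spread.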
To test the performance of this technique in the presence of abrupt albedo variation, the target is a flat plane at zero depth with gray level 255 and a dark circular area of gray level 40 at its center. The captured image is shown in Fig. 6(a). The 2D range representation of the reconstructed depths is shown in Fig. 6(b), and the 3D depths are shown in Fig. 6(c). The area inside the dark circle is properly reconstructed, independent of albedo. However, the abrupt albedo variation at the edge of the circle causes space-variant blurring, which produces depth errors around the edge. The albedo pattern acts as a window on the orthogonal sinusoidal patterns. Where the window edge intersects a high-intensity area, signal leakage occurs and corrupts the individual patterns differently. The reconstructed phase therefore oscillates in value with a spatial dependency. We characterize this behavior with the eye diagram shown in Fig. 6(d). To construct the eye diagram of these edge responses, the composite pattern is shifted nine times along the orthogonal direction, 4 pixels per step, and the depths are reconstructed at each shift. ... for future comparison.
The key benefit of our composite pattern technique is that video sequences can be captured at the frame rate of the camera, so the technique is well suited to real-time 3D reconstruction. Four movie clips, shown in Figs. 7-10, illustrate its feasibility for real-time 3D reconstruction. In Figs. 7 and 8, each captured frame is shown together with its depth map; shadows appear as black areas in the depth images. Figure 7 shows the subject tossing an icosahedron, and Fig. 8 shows the subject stretching out her hands. Figures 9 and 10 give two human-computer interface examples. In Fig. 9, the depth value at the location of a specified "button" is monitored. When the subject's hand crosses the depth threshold at these locations, the button sequence is activated, allowing the user to operate dynamic button-menu options. Although these movies of the subject in real environments contain some noise, the temporal depth changes are clearly recorded in the range frames.
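The depth "button" of Fig. 9 can be illustrated with a small sketch. The text does not specify the trigger rule, so everything here is a hypothetical reading. A button counts as pressed when at least a fraction of the pixels in its region rise above a height threshold, with heights measured above the reference plane so a hand in front of it reads larger. An event fires only on the transition from released to pressed, so a hand held in place does not retrigger. The region, threshold, and `min_fraction` values are illustrative.

```python
def button_pressed(depth, region, threshold, min_fraction=0.5):
    """True when enough pixels in `region` rise above `threshold` (hypothetical rule).

    depth:  2-D list of heights above the reference plane (larger = nearer the camera)
    region: (row0, row1, col0, col1), half-open pixel bounds of the button area
    """
    r0, r1, c0, c1 = region
    values = [depth[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    above = sum(1 for v in values if v > threshold)
    return above >= min_fraction * len(values)


def button_events(frames, region, threshold, min_fraction=0.5):
    """Frame indices at which the button switches from released to pressed."""
    events, was_pressed = [], False
    for i, depth in enumerate(frames):
        pressed = button_pressed(depth, region, threshold, min_fraction)
        if pressed and not was_pressed:
            events.append(i)
        was_pressed = pressed
    return events


# Toy 6 x 8 depth frames: an empty scene and a hand about 120 mm above the plane.
empty = [[0.0] * 8 for _ in range(6)]
hand = [row[:] for row in empty]
for r in range(1, 3):
    for c in range(2, 5):
        hand[r][c] = 120.0
region = (1, 3, 2, 5)
print(button_events([empty, hand, hand, empty, hand], region, threshold=80.0))
```

Requiring a fraction of the region, rather than a single pixel, gives some tolerance to the depth noise visible in the movies.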
In Fig. 10, a protocol based on the 3D position of the subject's hands is created to control a virtual environment. The upper image is the captured composite-pattern reflection, and the lower image shows the 3D virtual environment that the hands are controlling. Hand detection is facilitated by thresholding the depth imagery to segment the hands from the background and the subject's body. The 3D centroid of each hand is used to estimate its position. The control protocol is defined so that the depth difference between the two hands controls the rotation of the virtual environment, horizontal movement controls