CAMERA-PROJECTOR 3D SCANNING OF A SEMI-SUBMERGED TERRAIN IN A FLUME

The paper presents a home-made 3d scanner consisting of off-the-shelf components: a camera and a projector. It is intended for monitoring the dynamics of riverbed morphology under laboratory conditions in a flume, which is currently under construction. Special attention is paid to satisfying strict requirements on accuracy and precision despite the compact and versatile setup of the system. Preliminary results are shown.


INTRODUCTION
Research conducted so far on the morphological response of rivers has focused on bed level changes, sediment adaptation and width-adjustment dynamics. Laboratory experiments have shown their capability for analyzing the processes that interact in this morphological evolution (e.g. Friedkin, 1945; van Dijk et al., 2012). Traditionally, laboratory measurements have been performed by means of bed profilers. This procedure consists of collecting successive points across selected transects of the channel, using a rod or a laser as point gauge. Such measurements typically imply a slow rate of data acquisition if the number of cross sections is high, which in turn reduces the survey frequency.
Moreover, when infrared lasers are used it is necessary to collect the data when the bed is dry because of refraction phenomena, and only a few attempts have been made to measure the bed level evolution in the laboratory while water is present. Among these attempts is the close range digital photogrammetry technique (CRDP), which has been applied to the survey of submerged surfaces not only in flumes, but also in the field (Chandler et al., 2001; Butler et al., 2002).
By accounting for the effects of refraction at the air/water interface, high-resolution digital surface models (DSMs) have been obtained by means of through-water close range digital photogrammetry. However, this requires control points within the channel, which in turn perturb the flow. More recently, Smith et al. (2012) have applied terrestrial laser scanning (TLS) to the acquisition of DSMs of underwater objects. The use of laser beams with green wavelengths (i.e. around 532 nm) eliminates the complication of water absorption and the laser refraction when infrared beams penetrate the air/water interface, but increases the costs due to the high price of the equipment.
As a new approach, we introduce in this paper the use of the structured light technique for obtaining point clouds from the channel bed in a flume with water flow.

CAMERA-PROJECTOR 3D SCANNERS
A projector-camera 3d scanner is a photogrammetric stereo system where one of the two cameras is replaced by a projector, a device usually applied for Powerpoint presentations and in home cinemas. Instead of two sensor arrays there is only one, in the remaining camera, and an LCD array generating an image in the projector. The challenge of finding correspondences is then reduced to creating patterns at known locations inside the projector array, and recognizing these in the camera image, whereby their positions in the camera image become known as well. By projecting and recording suitable, perhaps multiple, images, arbitrarily many correspondences can be obtained. The idea is by no means new, but is a variant of well-known structured light photogrammetry. The use of LCD projectors for this purpose has also been demonstrated earlier (Lanman and Taubin, 2009; Sansoni and Reaelli, 2005). The current contribution is an effort to optimize the design of such a system in terms of convenience and practicality, as well as resolution and accuracy. We show the result in the application of monitoring a landscape in a flume, but we claim at the same time that we constructed a general-purpose, accurate, low cost 3d data acquisition system.
Figure 1: Projector-camera 3d scanner layout
In the current paper we focus on the geometrical layout of the entire system, i.e. on the locations at which the camera and the projector are mounted w.r.t. the scene to be recorded. We will briefly address the resulting photogrammetric orientation and data processing procedures, but these can be considered standard photogrammetric subjects. The main focus will be on obtaining optimal accuracy and precision within the chosen system geometry. The special requirements resulting from part of the scene being covered by water were our first motivation to start building a system "from scratch". This gives us control over the entire system including the "core", and allows us to build it up starting from the collinearity equations, where we will need to take the refraction at the air-water interface into account. These aspects are not yet covered in this paper, however.
A fundamental choice in the design of a camera-projector system concerns the geometry, which can be wide or narrow. Within a wide geometry the distance between camera and projector is about the same as the expected depths to be measured, i.e. as the distances between the camera (or the projector) and the scene. A wide geometry roughly gives a base to height ratio between 0.5:1 and 1:1, which is generally considered optimal for accurate photogrammetric depth measurement. With a narrow geometry the camera and the projector are much nearer to each other, at a base of less than 5% of the maximum depth to be measured. The camera and projector are mounted side by side and their optical axes are approximately parallel. Not surprisingly this will lead to a lower depth accuracy. The paper investigates whether the narrow-geometry design is still a good choice, i.e. whether it allows certain pre-defined accuracy requirements to be met. We will provide a theoretical argument for this; moreover, preliminary experiments were conducted in a tilting flume 0.40 m wide and 10 m long, with a bed composed of sand with a D50 of 0.5 mm, at the Delft University of Technology (Fig. 2).
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-4/W1, 2013. ISPRS Acquisition and Modelling of Indoor and Enclosed Environments 2013, 11-13 December 2013, Cape Town, South Africa.

CAMERA-PROJECTOR GEOMETRY
Monitoring a landscape partially covered by water in a flume imposes a particular limitation on the measurement geometry, in that we need to prevent the projector beam from being reflected by the water straight into the camera. When using a wide geometry we therefore cannot have the projector and the camera at opposite sides of the flume, whereas with a narrow geometry the camera and projector cannot be mounted directly above the recorded flume area; instead, the system has to 'look' sideways, and therefore the distance between the scene and the camera/projector will vary (beyond the height variations).

Trade-off
An advantage of a narrow geometry is that projector and camera can easily both cover the same area of the scene, provided that suitable lenses are chosen. Also when considering depth, more or less the same parts of the scene will be reached by both. It may happen that parts of the scene are not illuminated by the projector because of occlusion (shadow), but most of these are also not seen by the camera. In a wide geometry, in addition to shadowed areas (some of which may be seen by the camera) there may be illuminated, yet unseen, areas.
In general the compactness of a narrow geometry system can be considered an advantage. As shown in Fig. 3, camera and projector are mounted rigidly with respect to each other and together form a mobile system which, after performing a relative orientation once, can be used at different locations without recalibration. With a wide geometry the camera and the projector will most likely be transported independently, requiring a new orientation procedure to be performed at every measurement location. Microsoft's XBOX game controller Kinect, whose operation is based on recognizing its user's body motion and which is also widely used for other 3d measurements, is probably the best-known example of a narrow geometry camera-projector system.
Figure 3: Prototype camera-projector system
As mentioned, the downside of a narrow geometry is the sub-optimal base to height ratio, which inevitably leads to deterioration of depth accuracy and precision. This will be addressed in Section 5.

Relative Orientation
The general layout of the system is schematically shown in Fig. 1, including the projection centers and the image areas of both projector and camera. The projector defines a local object coordinate system in meters, which has its origin at the projection center, X and Y axes parallel to the edges of the LCD array, and a Z-axis perpendicular to the array. After building up the system, a relative orientation procedure determines the location (X0, Y0, Z0) and rotation (ω, ϕ, κ) of the camera w.r.t. the projector, i.e. the (X, Y, Z)-system (Fig. 3). After that, the system can be transported to different measurement locations without needing re-calibration. We implemented a self-calibrating procedure on the basis of a single checkerboard image projected on a flat surface by the projector and recorded by the camera, and a few measurements of sizes at the surface during the projection. This procedure yields the focal lengths of camera fc and projector fp, as well as camera lens distortion parameters, in addition to the six relative orientation parameters mentioned before. Special attention is paid to the large value of the projector's vertical principal point offset, as illustrated in Fig. 4.
Figure 4: A projector beams "up" (top right) rather than horizontally (top left). Nevertheless the projected image is a rectangle (bottom right) rather than a trapezoid (bottom left). This is achieved by shifting the lens upwards w.r.t. the LCD array, giving a large value for the vertical principal point offset.

Data processing
The general principle of most camera-projector systems is that correspondences between pixel positions in the projector LCD and the camera sensor can be easily established, as projected patterns are recognized in the camera. We will describe this later in detail. We begin by assuming that at every camera image coordinate (x, y) the corresponding projector column col is known; col differs from the usual column numbering by having its origin in the image center. We implicitly assume that correspondences are along epipolar lines, such that we do not need row positions as well, which would be less sensitive anyhow.
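A camera at (X0, Y0, Z0) transforms an object coordinate (X, Y, Z) into a camera coordinate (x, y, −fc):
\[ \begin{pmatrix} x \\ y \\ -f_c \end{pmatrix} = s\,R \begin{pmatrix} X - X_0 \\ Y - Y_0 \\ Z - Z_0 \end{pmatrix} \]
Here s is an unknown factor and (X, Y, Z) can be anywhere along a line through the camera center and the image point. R implements the three orientation angles. We define an (X', Y', Z') system with the origin at the camera center, but having axes parallel to the projector's. In that system the object point is, up to the unknown factor, at:
\[ \begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} = R^{-1} \begin{pmatrix} x \\ y \\ -f_c \end{pmatrix} \]
In the projector system the point would be along a line that connects the camera center and (X', Y', Z'):
\[ \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} X_0 \\ Y_0 \\ Z_0 \end{pmatrix} + \lambda \begin{pmatrix} X' \\ Y' \\ Z' \end{pmatrix} \qquad (1) \]
where λ specifies the (yet unknown) exact location on that line. Knowing the projector column col, and modelling the projector like the camera with its image plane at −fp, we can state:
\[ \frac{col}{-f_p} = \frac{X}{Z} = \frac{X_0 + \lambda X'}{Z_0 + \lambda Z'} \]
which can be re-written as:
\[ col\,(Z_0 + \lambda Z') = -f_p\,(X_0 + \lambda X') \]
and yields:
\[ \lambda = -\frac{f_p X_0 + col\,Z_0}{f_p X' + col\,Z'} \]
which can now be substituted into equation 1 to obtain a point in a 3d point cloud.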
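The intersection of a camera ray with a projector column plane can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes the camera model (x, y, −fc)^T = s R (P − C), that the projector follows the same model with identity rotation and its projection center at the origin (so that col / (−fp) = X / Z), and that R is given directly as a 3x3 matrix rather than built from (ω, ϕ, κ). The function names and all numbers are hypothetical.

```python
import math

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose of a 3x3 matrix; equals the inverse for a rotation matrix."""
    return [list(column) for column in zip(*m)]

def project(point, f, center, R):
    """Forward model (x, y, -f)^T = s R (P - C); returns image coordinates (x, y)."""
    v = mat_vec(R, [p - c for p, c in zip(point, center)])
    s = -f / v[2]  # scale that puts the third component at -f
    return s * v[0], s * v[1]

def triangulate(x, y, col, fc, fp, cam_center, R):
    """Intersect the camera ray through (x, y) with the plane of projector column col."""
    d = mat_vec(transpose(R), [x, y, -fc])  # ray direction in projector axes
    X0, Y0, Z0 = cam_center
    denom = fp * d[0] + col * d[2]
    if abs(denom) < 1e-12:
        raise ValueError("camera ray is parallel to the projector column plane")
    lam = -(fp * X0 + col * Z0) / denom
    return [X0 + lam * d[0], Y0 + lam * d[1], Z0 + lam * d[2]]

# Hypothetical setup: camera 8 cm beside the projector, rotated 2 degrees about Y.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
angle = math.radians(2.0)
R_cam = [[math.cos(angle), 0.0, math.sin(angle)],
         [0.0, 1.0, 0.0],
         [-math.sin(angle), 0.0, math.cos(angle)]]
cam_center = (0.08, 0.0, 0.01)
fc, fp = 4000.0, 1200.0          # focal lengths in camera / projector pixels
true_point = [0.10, -0.20, -1.50]

# Simulate the observations, then recover the point from (x, y, col) only.
x, y = project(true_point, fc, cam_center, R_cam)
col, _row = project(true_point, fp, (0.0, 0.0, 0.0), IDENTITY)
recovered = triangulate(x, y, col, fc, fp, cam_center, R_cam)
print(recovered)
```

The round trip (project, then triangulate) is a convenient self-check: with noise-free observations the recovered point should match the simulated one to numerical precision.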

DEPTH RESOLUTION IN A NARROW GEOMETRY SYSTEM
In a wide geometry system the precision of locations in object space is in the order of the precision at which location and parallax can be measured in the image, times the spatial resolution of the imagery. With a base-to-height ratio between 0.5:1 and 1:1, accuracies in depth and planimetry are about equal and proportional to spatial resolution, and therefore proportional to the distance between the camera(s) and the scene.
With a narrow geometry, assuming parallel optical axes, it is usual to speak about disparity rather than parallax. It is the difference between the locations where a point in the scene is represented in both images, expressed in pixels. The disparity x equals the distance B (for base) between the camera and the projector, divided by the pixel size p(Z) at the distance Z between the camera and the scene point, which in turn depends on the focal length f measured in sensor pixels:
\[ x = \frac{B}{p(Z)} = \frac{f B}{Z}, \qquad p(Z) = \frac{Z}{f} \]
A sensible measure for the depth resolution dZ at depth Z, where the disparity equals x pixels, is obtained by looking how far depth has to increase before a disparity of x − 1 would occur. This links the measurement x and its sensitivity to the wanted parameter Z:
\[ dZ = \frac{fB}{x-1} - \frac{fB}{x} = \frac{fB}{x(x-1)} = \frac{Z^2}{fB - Z} \approx \frac{Z^2}{fB} \]
Hence the depth resolution deteriorates with the square of the distance (Khoshelham and Oude Elberink, 2012), which obviously restricts the maximum distance that a narrow geometry system can measure within given accuracy requirements. We will argue that this is tolerable, as the maximum distance is anyhow restricted by other factors, such as radiometric signal-to-noise ratio and depth of field. However, certain refinements in disparity measurement are necessary, as follows.
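To get a feeling for the numbers, the relation between disparity and depth resolution can be evaluated directly. The sketch below assumes a pinhole model in which the pixel size at distance Z is Z / f, so that the disparity is x = f B / Z; the focal length and base are hypothetical values, not those of the prototype.

```python
def disparity(Z, f, B):
    """Disparity in pixels at depth Z for focal length f (pixels) and base B."""
    return f * B / Z

def depth_step(Z, f, B):
    """Depth increase needed to lower the disparity by exactly one pixel."""
    x = disparity(Z, f, B)
    if x <= 1.0:
        raise ValueError("disparity must exceed one pixel")
    return f * B / (x - 1.0) - Z

def depth_step_closed_form(Z, f, B):
    """Same quantity written as Z^2 / (f B - Z)."""
    return Z * Z / (f * B - Z)

f_pixels = 4000.0   # hypothetical focal length in sensor pixels
base_m = 0.05       # hypothetical base of 5 cm

for Z in (0.5, 1.0, 2.0):
    print(Z, disparity(Z, f_pixels, base_m), depth_step(Z, f_pixels, base_m))
```

Doubling the distance roughly quadruples the depth step, which is the quadratic behaviour discussed above. The step refers to a whole pixel of disparity; measuring disparity with sub-pixel precision reduces it proportionally.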
Binary and Gray Encoding
Computing disparity, i.e. the difference in column position between the camera image and the projector "image" of any object point, requires knowing the projector column number at a camera pixel. This is achieved by coding the projector column number in a series of projected images, having these recorded one by one by the camera, and decoding the recorded sequence at each camera pixel. The straightforward approach is that each image in the sequence is one bit plane in the binary encoding of column numbers. In a projector of 1024 columns, as in the presented example, the encoding of column numbers between 0 and 1023 requires 10 bits: a sequence of ten black-and-white images. The left side of Fig. 6 illustrates this for 5 (instead of 10) bitplanes encoding numbers between 0 and 31. The procedure is to generate the bitplanes in the projector one by one as black-and-white images: 0 = black, 1 = white. The ten projections onto the scene are recorded using the camera. The images are subsequently analysed. At each pixel it is first decided whether it is dark or light, by comparing it with an "average" image A providing the (location dependent) threshold, to be discussed below. This will tell whether a black (0) or a white (1) projection occurred at that spot, independently of local reflection and illumination conditions. The ten bits thus obtained at every camera image position will yield the number of the projector column, whereby correspondence is established.
Our camera has a much higher resolution (3072 x 4608 pixels) than our projector (768 x 1024). As we choose camera and projector positions and orientations, and focal lengths, in such a way that the scene coverages have maximum overlap, each projector column spans a narrow region, with a width of 4-5 pixels, in the camera image. When ignoring any effect of blurring, pixels that are completely 'inside' a stripe would be assigned projector column values reliably. Other camera pixels would be 'intersected' by the boundary between adjacent stripes. For those pixels, one should hope that the resulting bit pattern corresponds to either of the two values.
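The decoding step described above can be sketched as follows. This is an illustration under simplifying assumptions: images are flat lists of grey values, there is exactly one camera pixel per projector column, and the threshold image A is simulated as the mid-point between the dark and the lit response at each pixel. The function names and the reflectance and ambient values are hypothetical.

```python
def bit_planes(n_bits):
    """One 0/1 pattern per bit, most significant bit first; index = projector column."""
    n_cols = 1 << n_bits
    return [[(c >> k) & 1 for c in range(n_cols)]
            for k in range(n_bits - 1, -1, -1)]

def decode_columns(recorded, threshold):
    """Rebuild the column number at each pixel from the recorded bit-plane images.

    recorded  -- list of images (most significant bit first), each a flat list
    threshold -- the average image A, giving a per-pixel dark/light threshold
    """
    columns = []
    for i, a in enumerate(threshold):
        value = 0
        for image in recorded:
            value = (value << 1) | (1 if image[i] > a else 0)
        columns.append(value)
    return columns

# Simulated scene with location-dependent reflectance and ambient light.
n_bits = 5
n_cols = 1 << n_bits
planes = bit_planes(n_bits)
reflectance = [0.3 + 0.02 * c for c in range(n_cols)]
ambient = [10.0 + c for c in range(n_cols)]
recorded = [[ambient[c] + 200.0 * reflectance[c] * bit
             for c, bit in enumerate(plane)] for plane in planes]
threshold = [ambient[c] + 100.0 * reflectance[c] for c in range(n_cols)]

decoded = decode_columns(recorded, threshold)
print(decoded == list(range(n_cols)))
```

Because the threshold is taken per pixel, the decision between 0 and 1 does not depend on how bright or dark the surface is at that location, which is the point made in the text.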
A refinement to support this is the Gray encoding scheme, which is illustrated at the right side of Fig. 6. It was termed 'reflected binary code' by its inventor Frank Gray in a patent application of 1947, and it found many applications in communication technology. The mapping from binary to Gray and back can be implemented via lookup tables. Gray encoding is considered advantageous because any two neighboring values differ in only one bit, i.e. the Hamming distance between any two neighboring values equals 1 (Lanman and Taubin, 2009). Unfortunately, this does not imply that a wrongly-transmitted bit gives a unit difference in the transmitted value, as for any given value ten different bit errors may occur, and only two of these will yield a neighboring value; in the eight remaining cases a (much) larger error will result. A perhaps more relevant advantage of Gray encoding for our application is that none of the bitplanes alternates like the least significant bit of binary encoding does (0101010101). Instead, the plane with the maximum spatial frequency is 0110011001.
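The properties of Gray encoding mentioned here are easy to verify numerically. The following sketch builds the lookup tables for 10 bits using the standard reflected binary code (n XOR n >> 1); the helper names are hypothetical.

```python
def binary_to_gray(n):
    """Standard reflected binary (Gray) code of n."""
    return n ^ (n >> 1)

def gray_to_binary(g):
    """Inverse mapping: XOR of all right-shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

N_BITS = 10
TO_GRAY = [binary_to_gray(c) for c in range(1 << N_BITS)]
FROM_GRAY = [0] * (1 << N_BITS)
for c, g in enumerate(TO_GRAY):
    FROM_GRAY[g] = c

def hamming(a, b):
    """Number of differing bits between a and b."""
    return bin(a ^ b).count("1")

def neighbours_after_single_bit_error(c):
    """Count the single-bit errors in the Gray code of c that decode to c-1 or c+1."""
    g = TO_GRAY[c]
    return sum(1 for k in range(N_BITS)
               if abs(FROM_GRAY[g ^ (1 << k)] - c) == 1)

# Finest bit plane over the first ten columns, Gray versus plain binary.
finest_gray = "".join(str(TO_GRAY[c] & 1) for c in range(10))
finest_binary = "".join(str(c & 1) for c in range(10))
print(finest_gray)    # 0110011001
print(finest_binary)  # 0101010101
print(neighbours_after_single_bit_error(500))  # 2 of the 10 possible bit errors
```

The last line illustrates the caveat in the text: of the ten possible single-bit errors, only two decode to an adjacent column; the other eight produce larger jumps.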
As depth of field of the projector is an issue, the maximum spatial frequency is limited by the modulation transfer function at the near and the far ends of the distance range. Note that the second bitplane of binary encoding has the same frequency as the finest bitplane of Gray encoding, but is shifted one position (0011001100 resp. 0110011001).
Fig. 7 shows the image of decoded projector column numbers in the camera imagery. Values increase from low (blue) at the left side of the image to high (red) at the right side, as expected. The transitions are supposed to be smooth, except at depth jumps in the scene; however, a few more irregularities seem to occur as well.
Figure 7: Projector column numbers after decoding the camera image sequence.

Subpixel estimation
In order to improve resolution at long distances we want to estimate disparity with sub-pixel precision. In the above description we already noticed that the camera resolution is higher than the projector's, and therefore the integer projector column number, as it appears from decoding the Gray patterns, yields a coarse measure of disparity. Therefore, in addition to the projector column, we would like to determine at each camera pixel its position within a projector column, in other words to add decimals to the integer column numbers.
When traversing a horizontal profile in the camera image, projected patterns with high spatial frequencies will appear as sine functions, as the original black/white patterns have been subjected to some blurring, caused by various effects in both camera and projector. Sub-pixel precision is obtained by estimating at each image pixel the phase of the sine function that is represented by the highest-frequency bitplane in the recorded image.
Whereas in the previous paragraph it was considered an advantage of Gray encoding to use a lower maximum spatial frequency than binary encoding needs, we would now like to have the highest possible frequency, where the phase difference is most sensitive to disparity. However, instead of recording the problematic 0101010101 plane, let us call it P0, we add two more bit-shifted variants of the second-highest frequency. In addition to the patterns P1 = 0011001100 and P2 = 0110011001 mentioned above we also use the patterns P3 = 1100110011 and P4 = 1001100110. The four patterns P1, .., P4 can be regarded as sine waves, phase shifted over multiples of 90°. Multiplying the first two, as well as the last two, and adding the results together yields a synthetic sine wave P'0 with double frequency, according to equation (2), which has higher contrast and a better signal to noise ratio than P0:
\[ P'_0 = \mathrm{Norm}(I_1)\,\mathrm{Norm}(I_2) + \mathrm{Norm}(I_3)\,\mathrm{Norm}(I_4) \qquad (2) \]
where I1, .., I4 denote the camera recordings of P1, .., P4.
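The frequency doubling can be checked with simulated recordings. The sketch below models the four blurred patterns as sine waves shifted by multiples of 90°, normalises them by subtracting their average image and scaling, and forms the sum of the two products. The scaling to [−1, 1] is an assumption of this sketch (chosen so that the products keep their sign); the period of 18 camera pixels, the grey-value offset and amplitude, and the function names are hypothetical.

```python
import math

def norm(images):
    """Subtract the average image A from each image and scale the result to [-1, 1]."""
    count = len(images)
    average = [sum(values) / count for values in zip(*images)]
    signed = [[v - a for v, a in zip(image, average)] for image in images]
    peak = max(abs(v) for image in signed for v in image) or 1.0
    return [[v / peak for v in image] for image in signed], average

def synthesize(i1, i2, i3, i4):
    """Synthetic double-frequency pattern: product of the first two plus product of the last two."""
    (n1, n2, n3, n4), _average = norm([i1, i2, i3, i4])
    return [a * b + c * d for a, b, c, d in zip(n1, n2, n3, n4)]

PERIOD = 18  # hypothetical period of the recorded patterns in camera pixels

def recorded_pattern(quarter_shifts, length=36):
    """Simulated blurred recording: a sine wave shifted by a multiple of 90 degrees."""
    return [100.0 + 80.0 * math.sin(2 * math.pi * i / PERIOD
                                    + quarter_shifts * math.pi / 2)
            for i in range(length)]

i1, i2, i3, i4 = (recorded_pattern(k) for k in range(4))
synthetic = synthesize(i1, i2, i3, i4)

# A sine wave of twice the frequency, i.e. with a period of 9 camera pixels.
expected = [math.sin(2 * 2 * math.pi * i / PERIOD) for i in range(36)]
print(max(abs(s - e) for s, e in zip(synthetic, expected)))
```

The average image computed inside `norm` is returned as well, since the same image can serve as the per-pixel threshold during decoding.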
The process is illustrated in Fig. 8.
Phase estimation is based on the recipe in (Büttgen et al., 2005). Although this did not exactly describe what they intended at the time (a more adequate version was given in (Oggier, 2009)), it suits our purposes perfectly. A similar description is hinted at by (Sansoni and Reaelli, 2005). At each pixel in the camera we select neighboring values in a window nine pixels wide; this width corresponds to one period of the highest-frequency pattern, as the resolution of the projector is approximately 4.5 times coarser than that of the camera. Within this window a convolution is performed with a 9x9 kernel that models one period of a cosine function, and another convolution is performed with a 9x9 minus-sine-shaped kernel; both are shown in Fig. 9. Each convolution yields a scalar, and the arctangent of their ratio is the requested phase. At this point it is good to mention that in a narrow geometry system the 'scale' of the observed patterns does not depend on the depth; it is always about nine pixels. Therefore we require only one pair of kernels, and only two convolution runs, to estimate phase at any depth at once.
The mathematical background (in the continuous case) is given by:
\[ c_0 = \int_0^{2\pi} \cos(x + \alpha)\,\cos(x)\,dx = \pi \cos\alpha \]
\[ c_1 = \int_0^{2\pi} \cos(x + \alpha)\,\bigl(-\sin(x)\bigr)\,dx = \pi \sin\alpha \]
in which α, as in cos(x + α), is the phase difference that we are looking for, and the two factors cos(x) resp. −sin(x) represent the two convolution kernels. The formulas yield c0 and c1, whose ratio c1/c0 = tan(α). In the discrete (sampled) case the integrals are replaced by summations. As the integration is from 0 to 2π it is important to have samples over exactly one period.
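In the discrete case the two integrals become sums over the nine samples of one period. The sketch below is a one-dimensional illustration (a single image row instead of a 9x9 kernel) and uses `math.atan2` rather than the arctangent of the ratio, which avoids a division by zero and returns the full phase range; both choices are simplifications of this sketch, and the function names and signal values are hypothetical.

```python
import math

def phase_kernels(period):
    """One period of a cosine kernel and of a minus-sine kernel."""
    cos_kernel = [math.cos(2 * math.pi * j / period) for j in range(period)]
    minus_sin_kernel = [-math.sin(2 * math.pi * j / period) for j in range(period)]
    return cos_kernel, minus_sin_kernel

def estimate_phase(samples, period=9):
    """Phase alpha of cos(x + alpha), estimated from exactly one period of samples."""
    if len(samples) != period:
        raise ValueError("need samples over exactly one period")
    cos_kernel, minus_sin_kernel = phase_kernels(period)
    c0 = sum(s * k for s, k in zip(samples, cos_kernel))
    c1 = sum(s * k for s, k in zip(samples, minus_sin_kernel))
    return math.atan2(c1, c0)

# Simulated window: grey-value offset 120, amplitude 60, phase 0.7 rad.
true_alpha = 0.7
window = [120.0 + 60.0 * math.cos(2 * math.pi * j / 9 + true_alpha)
          for j in range(9)]
estimated = estimate_phase(window)
print(estimated)
```

The constant offset of the signal drops out because both kernels sum to zero over a full period, which is why sampling exactly one period matters.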
In our case the width of the synthetic sine pattern (one dark-light line pair) in the camera image is 9 pixels. The obtained phase difference α will be between −π/2 and π/2, and after scaling this to [-0.5 .. 0.5] it can be used as a correction to the integer column numbers obtained in the Gray decoding step. Care has to be taken at the 'transitions' between neighboring columns: since the two processes (Gray decoding and phase estimation, respectively yielding the integer and fractional parts of the same value) are rather independent, the results can be "out of sync" there: we might add 0.48, for example, whereas in fact 0.52 should be subtracted. Since the reliability of depth measurements is more important than their density (we have more camera pixels than the number of points we want), we only select pixels where the phase correction is small; these are pixels at the centers of the projected stripes (the red pixels in Fig. 5), where Gray decoding is most reliable.
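The combination of the integer and the fractional part, including the selection of reliable pixels, might look as follows. This is only a sketch: the acceptance limit of 0.25 column, the sign convention (the scaled phase is added to the column number) and the function name are assumptions made for illustration.

```python
import math

def refine_columns(columns, phases, max_fraction=0.25):
    """Add the scaled phase to the integer column; reject pixels near stripe borders.

    columns      -- integer projector columns from Gray decoding
    phases       -- phase estimates in [-pi/2, pi/2], one per pixel
    max_fraction -- hypothetical limit on the accepted correction (in columns)
    """
    refined = []
    for column, alpha in zip(columns, phases):
        fraction = alpha / math.pi  # maps [-pi/2, pi/2] to [-0.5, 0.5]
        if abs(fraction) <= max_fraction:
            refined.append(column + fraction)
        else:
            refined.append(None)  # too close to a transition; pixel is skipped
    return refined

columns = [511, 511, 512, 512]
phases = [0.10 * math.pi, 0.48 * math.pi, -0.20 * math.pi, -0.45 * math.pi]
print(refine_columns(columns, phases))
```

Pixels whose correction approaches ±0.5 are exactly those where the integer and fractional parts may be out of sync, so discarding them trades point density for reliability.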

CONCLUSION
We started developing a camera-projector system that uses structured light photogrammetry to monitor the dynamics of a riverbed in a flume.The advantage of an in-house development is to have full control over various aspects, amongst which is the fact that the riverbed will be partly submerged, whereby refraction at the water surface has to be considered.We constructed a versatile and compact system, and preliminary results demonstrate that the loss of accuracy caused by having a 'narrow' geometry can be counteracted by carefully aiming at sub-pixel disparity measurements.
Figure 12: Reconstructed point cloud

Figure 2: Flume used in the experiments

Figure 5: Spatial resolutions of camera and projector. The camera pixels indicated in red, at the centers of stripes in the projected patterns, are selected for further processing into the final point cloud. The corresponding projector column number is computed with sub-pixel precision.

Figure 6: Binary and Gray encoding schemes

Figure 8: Image details to demonstrate synthesis of a finest-striped image (5th from above) from four coarser ones (upper four). At the bottom the finest-striped image is shown as recorded by the camera.

Figure 9: Convolution kernels for phase estimation of the highest-frequency spatial pattern.
Fig. 8 shows the four lower-frequency images recorded by the camera, the synthetic high-frequency image, and, for comparison, the real high-frequency image, which clearly has a lower quality. The operator Norm indicates that the four recorded images I1, .., I4, consisting of grey values between 0 and 255, are first made signed by subtracting their average \( A = \tfrac{1}{4} \sum_i I_i \) from each of them, and then linearly mapped into the range [0 .. 1]. The average image A has also been used as the threshold image to decide at each image pixel between 0 and 1 during the Gray decoding above.

Figure 10: Two highest-frequency patterns of different quality due to depth-of-field influence.