5D hyperspectral imaging: fast and accurate measurement of surface shape and spectral characteristics using structured light

Abstract: Measuring the shape (coordinates x, y, z) and spectral characteristics (wavelength-dependent reflectance R(λ_i)) of macroscopic objects as a function of time (t) is of great interest in areas such as medical imaging, precision agriculture, or optical sorting. Here, we present an approach that allows us to determine all of these quantities with high resolution and accuracy, enabling measurement in five dimensions. We call this approach 5D hyperspectral imaging. We describe the design and implementation of a 5D sensor operating in the visible to near-infrared spectral range, which provides excellent spatial and spectral resolution, great depth accuracy, and high frame rates. The results of various experiments strongly indicate the great benefit of the new technology.


Introduction
Hyperspectral imaging is a powerful technology for capturing our physical environment by detecting an object or scene in n (typically a dozen to several hundred) narrow wavelength ranges over a continuous spectrum [1]. Hyperspectral cameras generate a set of data points which provide the recorded wavelength-dependent radiation intensity I(λ i ) depending on the two-dimensional coordinates x and y. Originally developed for remote sensing and astronomy [2], hyperspectral imaging systems are constantly opening up new application areas. The analysis of artistic and cultural objects [3], quality control of foodstuffs [4], determination of the state of health of plants [5], and a variety of medical studies [4,6,7] are just a few examples.
As the surface topography of the measurement objects has a significant influence on the spectral information obtained, it is very beneficial to combine the spectral information with three-dimensional (3D) surface models [8][9][10], resulting in data points p_k = [x_k, y_k, z_k, I_k(λ_1, . . . , λ_n)]. Such hyperspectral 3D data can be acquired by using a single sensor [11][12][13][14] or two different sensors which generate the 3D surface model and the hypercube separately [9,15,16]. Yet, state-of-the-art systems rely on textured objects for 3D reconstruction and/or do not combine high accuracy, spatial and spectral resolution, and measurement speed. The approach introduced here, for the first time, exhibits all these features, enabling 5D hyperspectral imaging: p_k^5D = [x_k, y_k, z_k, t, I_k(λ_1, . . . , λ_n)].
It is suitable for the reconstruction of close-range objects and is based on two hyperspectral snapshot cameras [10,17,18] and a special broadband pattern projector [19]. As hyperspectral snapshot cameras generate data points p k (see Eq. (1)) within a single acquisition, they allow the detection of dynamic scenes [20][21][22]. In order to use them to measure the 3D shape of even unstructured or untextured surfaces with high accuracy and robustness, we employ the method of active close-range stereophotogrammetry [23][24][25][26][27]. The depth impression of an observed scene is deduced from the so-called disparity between the images of two cameras, i.e., the spatial displacement of image points of the same object point. Such corresponding points are reliably and accurately detected by illuminating the object with a series of patterns. For measurements at high frame rates, this necessitates a broadband high-speed pattern projector.
The method of GOBO projection of aperiodic sinusoidal patterns introduced by us [19] meets exactly these requirements. As a GOBO projector basically consists of a radiation source, a rotating slide (the so-called GOBO = GOes Before Optics), and an imaging lens, pattern generation is almost wavelength independent. Depending on the more or less arbitrarily interchangeable source of radiation, varying aperiodic sinusoidal patterns can be projected in a wide spectral range at frame rates of up to several kilohertz. The combination of GOBO projection and hyperspectral snapshot cameras is therefore the ideal approach for 5D hyperspectral imaging. It allows for the fast and accurate shape reconstruction of even texture-less object surfaces with comparatively low computational effort and feasible equipment. Reflectance spectra are acquired directly, making detailed knowledge of the scene and time-consuming computations unnecessary. In addition, our approach offers the possibility to perform the 3D reconstruction separately in different wavelength ranges.


Hyperspectral snapshot cameras
Figure 1(a) illustrates the basic sensor design of a hyperspectral snapshot camera. Following the approach of a Bayer filter mosaic, pixel-wise Fabry-Pérot interference (FPI) filters are monolithically integrated onto a conventional CMOS sensor. Depending on the number of spectral channels, they form a per-pixel filter mosaic of corresponding size which extends periodically over the entire detector [20,28]. Each filter consists of two semi-transparent mirrors which form an optical resonator (see Fig. 1(b)). The reflectance of the mirrors determines the full width at half maximum, and the distance between the mirrors determines the central transmission wavelengths [29,30]. The transmittance of an FPI surrounded by air (neglecting absorption) is

T_FPI = (1 − R_1)(1 − R_2) / [(1 − √(R_1 R_2))² + 4 √(R_1 R_2) sin²(∆ϕ/2)]

with ∆ϕ = 4πnd cos ϑ / λ and n sin ϑ = sin ϑ_0, where R_1 and R_2 are the reflectivities of the two semi-transparent mirrors, n is the refractive index of the intermediate material, d is the distance between the mirrors, and ϑ_0 is the incidence angle of the electromagnetic radiation (see Fig. 1(b)) [29,30]. Thus, the transmittance maxima of such an FPI are at

λ_k = 2nd cos ϑ / k   with k = 1, 2, 3, . . . .
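The Airy transmittance and its harmonics can be illustrated numerically. The following sketch assumes absorption-free mirrors; the numerical values (d = 400 nm, R_1 = R_2 = 0.9, normal incidence) are illustrative, not taken from the cameras described here:

```python
import numpy as np

def fpi_transmittance(wavelength_nm, n=1.0, d_nm=400.0, r1=0.9, r2=0.9, theta=0.0):
    # Airy transmittance of an absorption-free Fabry-Perot interferometer
    delta_phi = 4.0 * np.pi * n * d_nm * np.cos(theta) / wavelength_nm
    numerator = (1.0 - r1) * (1.0 - r2)
    denominator = ((1.0 - np.sqrt(r1 * r2)) ** 2
                   + 4.0 * np.sqrt(r1 * r2) * np.sin(delta_phi / 2.0) ** 2)
    return numerator / denominator

def transmission_maxima(n, d_nm, theta, k_max):
    # lambda_k = 2 n d cos(theta) / k for k = 1, 2, 3, ...
    return [2.0 * n * d_nm * np.cos(theta) / k for k in range(1, k_max + 1)]
```

For d = 400 nm the first-order maximum lies at 800 nm, and the second-order harmonic at 400 nm is transmitted equally well, which is exactly why the additional blocking filters discussed below are needed.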
Because an FPI filter also transmits light at harmonic wavelengths that disturb the spectral observation, band-pass or edge filters have to be placed in front of the camera chip or in the lens to select a single wavelength per pixel. Each channel i is characterized by its quantum efficiency η_i and the transmittance T_FPI,i of the corresponding FPI. If their product Q_i = η_i T_FPI,i and the transmittance T_filter of the band-pass/edge filter in front of the sensor are known, the detected gray value g_i in channel i is given by (neglecting noise and rounding to the next integer)

g_i = K t_exp ∫_0^∞ Q_i(λ) T_filter(λ) T_lens(λ) E_e(λ) R_obj(λ) dλ    (7)

with the overall system gain K, the exposure time t_exp, the wavelength-dependent reflectance R_obj of the measurement object, the spectral irradiance E_e of the light source, and the transmittance T_lens of the objective lens. Due to crosstalk, channel i does not only cover the object reflectance between λ_a,i and λ_b,i (with λ_a,1 = 0, λ_b,i = λ_a,i+1, and λ_b,n → ∞). Instead, the combined quantum efficiency is Q_i > 0 even outside this wavelength range. In order to determine the object's reflectance spectrum nonetheless, the integral in Eq. (7) is split into the n wavelength ranges covered by the n channels:

g_i = K t_exp Σ_{j=1}^{n} ∫_{λ_a,j}^{λ_b,j} Q_i(λ) T_filter(λ) T_lens(λ) E_e(λ) R_obj(λ) dλ.    (8)

Within each wavelength range, R_obj is assumed to be constant, i.e.,

g_i = K t_exp Σ_{j=1}^{n} R_obj,j ∫_{λ_a,j}^{λ_b,j} Q_i(λ) T_filter(λ) T_lens(λ) E_e(λ) dλ.    (9)

Equation (9) defines a linear system of equations that allows us to determine the n unknowns R_obj,j up to a factor K′. In order to achieve consistency of the hypercube with data sets from other hyperspectral imaging systems, the values R̃_obj = K′ R_obj determined by the sensor need to be transformed into a suitable physical quantity R_obj. For our application of the hyperspectral surface detection of non-emitting objects in the near field, a two-point calibration and reflection transformation by means of a so-called Spectralon panel is suitable.
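Recovering the band-wise reflectance from the linear system defined by Eq. (9) amounts to solving a small matrix equation. A minimal numerical sketch, in which a made-up 5 × 5 response matrix M stands in for the per-band integrals (its off-diagonal entries model crosstalk):

```python
import numpy as np

# Hypothetical 5-channel response matrix: M[i, j] is the energy that
# channel i accumulates within wavelength band j; the values are
# invented for this sketch, not measured camera data.
rng = np.random.default_rng(0)
M = np.eye(5) + 0.05 * rng.random((5, 5))

r_true = np.array([0.2, 0.4, 0.6, 0.8, 1.0])   # band-wise reflectance (up to K')
g = M @ r_true                                  # noise-free gray values

r_est = np.linalg.solve(M, g)                   # recover reflectance band by band
```

With noisy real data, a least-squares solution (np.linalg.lstsq) would be used instead of an exact solve, and the result is still only determined up to the factor K′ removed by the subsequent calibration.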
The plane surface of the calibration standard, which reflects diffusely with a reflectance of more than 98 % over the wavelength range from 400 to 1 000 nm, must be recorded under constant ambient and irradiation conditions. In order to compute the reflectance R_obj of a measurement object, the values R̃_obj of the captured scene, the values R̃_dark corresponding to a dark scene, and a recording R̃_spec of the Spectralon panel are required [31]:

R_obj = R_spec (R̃_obj − R̃_dark) / (R̃_spec − R̃_dark),

where R_spec denotes the known reflectance of the panel.
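The two-point transformation itself is a per-channel affine rescaling. A sketch (the function name and the 0.98 default for the panel reflectance are illustrative, the latter taken from the ">98 %" specification above):

```python
import numpy as np

def two_point_reflectance(raw_obj, raw_dark, raw_spec, panel_reflectance=0.98):
    # Two-point calibration against a Spectralon white reference;
    # inputs are per-channel sensor values (arbitrary linear units)
    return panel_reflectance * (raw_obj - raw_dark) / (raw_spec - raw_dark)
```

Because all three recordings share the unknown factor K′ and the illumination spectrum, both cancel in the ratio, which is what makes the resulting hypercube comparable across instruments.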

Structured light 3D sensors
The 3D coordinates of a point on the object surface are determined by triangulating corresponding camera pixels in the stereo image pair, i.e., image points of the same object point. For the correct matching of corresponding pixels, the processing unit of the measurement system requires pixel-wise unique features. In order to provide these features over a broad spectral range, a GOBO projector can be applied to encode the object surface with aperiodic sinusoidal patterns [19]. The epipolar geometry is used to restrict the search range of the correspondence analysis in the right camera image (see Fig. 2(a)) [25,27]. The epipolar lines l_1 and l_2 represent the intersection lines between the image planes of the cameras and the epipolar plane. The epipolar plane is defined by the observed object point P and the projection centers C_1 and C_2 of both cameras. Thus, the pixel p_2 corresponding to a certain image point p_1 always lies on the epipolar line l_2. Based on this approach, the rectification of the stereo images makes it possible to further simplify the correspondence search. For this purpose, the image planes of both cameras are transformed into a common plane parallel to the baseline so that corresponding epipolar lines lie on the same image row. Thus, the corresponding pixel p_2 needs to be searched for solely on the image row on which point p_1 is located. The GOBO-projected patterns allow the corresponding pixel on the rectified epipolar line to be identified unambiguously even for texture-less objects (see Fig. 2(b)). The rotating GOBO slide is used to project N (typically N = 10) varying aperiodic sinusoidal patterns, oriented as perpendicular as possible to the epipolar lines, onto the measurement object surface:

I_k(x, y) = a_k(x, y) + b_k(x, y) sin[c_k(x, y) x + d_k(x, y)]   with k = 1, . . . , N,

with the parameters a (offset), b (amplitude), c (directly related to the period length), and d (phase shift), which are spatially and temporally variable [32].
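Such aperiodic fringe profiles can be imitated digitally, which is useful for simulating the correspondence search. The sketch below integrates a randomly varying spatial frequency; the parameter names and values are illustrative and do not describe the actual GOBO slide, where the patterns arise optically from radial chromium strips of varying width:

```python
import numpy as np

def aperiodic_sinusoid(width, mean_period=16.0, jitter=0.5,
                       offset=0.5, amplitude=0.5, seed=0):
    # One aperiodic sinusoidal fringe profile: the local period varies
    # randomly around mean_period (hypothetical parameterization)
    rng = np.random.default_rng(seed)
    local_freq = 2.0 * np.pi / (mean_period * (1.0 + jitter * (rng.random(width) - 0.5)))
    phase = np.cumsum(local_freq)          # integrate the varying spatial frequency
    return offset + amplitude * np.sin(phase)

# N = 10 patterns, one image row of a 409-px-wide spectral channel image
patterns = [aperiodic_sinusoid(409, seed=k) for k in range(10)]
```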
In this way, unique gray value sequences can be determined for each pixel of an image line. Corresponding pixels are detected using the normalized cross-correlation. The correlation coefficient ρ between a pixel in camera 1 with successively detected gray values g_1^(1), . . . , g_N^(1) and a pixel in camera 2 with gray values g_1^(2), . . . , g_N^(2) is given by

ρ = Σ_{k=1}^{N} (g_k^(1) − ḡ^(1)) (g_k^(2) − ḡ^(2)) / √[ Σ_{k=1}^{N} (g_k^(1) − ḡ^(1))² Σ_{k=1}^{N} (g_k^(2) − ḡ^(2))² ]

with the temporal mean values ḡ^(1) = (1/N) Σ_{k=1}^{N} g_k^(1) and ḡ^(2) = (1/N) Σ_{k=1}^{N} g_k^(2). Corresponding points have a maximum correlation coefficient. By interpolating the intensity values of adjacent pixels on the epipolar line in each of the N images, corresponding points can be determined with a subpixel accuracy of up to 1/30 px. Using the corresponding points and having knowledge of the parameters of the stereo camera system, the 3D coordinates of the points in the world coordinate system can be calculated [26,27].
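The correlation-based search along a rectified image row can be sketched as follows (integer-pixel version, without the subpixel interpolation; function names are illustrative):

```python
import numpy as np

def ncc(seq1, seq2):
    # normalized cross-correlation of two temporal gray-value sequences
    d1 = seq1 - seq1.mean()
    d2 = seq2 - seq2.mean()
    return float(d1 @ d2 / np.sqrt((d1 @ d1) * (d2 @ d2)))

def find_correspondence(g1, row2):
    # g1: (N,) gray values of one pixel in camera 1; row2: (W, N) gray
    # values of all pixels on the matching rectified row in camera 2
    scores = [ncc(g1, row2[x]) for x in range(row2.shape[0])]
    return int(np.argmax(scores))
```

Because ρ is invariant to offset and gain, the match is robust against differing exposure and object texture between the two cameras, which is what makes the method work on texture-less surfaces.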
The stereo calibration of the described 3D sensor is a necessary condition for the correct generation of 3D surface models. The aim of the calibration is to determine the intrinsic and extrinsic parameters of the cameras using the pinhole camera model. For this purpose, stereoscopic images of a calibration standard at several different positions in the measurement volume are acquired, e.g., of a flat board with a chessboard pattern of known size. The detection of corresponding chessboard intersection points in both cameras allows the determination of the intrinsic and extrinsic camera parameters [26,33].
The extrinsic parameters describe the relative orientation of both camera coordinate systems in terms of a translation between the two projection centers C_1 and C_2 and a rotation. The intrinsic parameters include the camera constant (i.e., the distance between camera center and image plane), the coordinates of the principal point p_0 (i.e., the intersection of the optical axis of the lens with the image plane), and the image distortion (e.g., radial and tangential distortion). The calibration determined by means of the full resolution of the hyperspectral cameras must be converted to the images of the individual spectral channels, as these have both a lower resolution and a channel-specific displacement. A suitable approach is to adapt the rectification maps of the two hyperspectral cameras, which have been determined by the stereo calibration. The rectification map indicates the position in the original image from which the information of a pixel in the rectified image originates. For a hyperspectral camera with a 5 × 5 filter mosaic, the rectification map R^(i,j) of the channel (i, j) (with i, j = 0, . . . , 4) can be calculated from the full-resolution rectification map R_full according to Eq. (14) (see Fig. 3 for a better understanding). Using the modified maps, rectified images in full resolution can be generated from the lower-resolution spectral band images in a common coordinate system. For each channel, the same camera constant κ of the rectified system (in pixel units), coordinates c_x1 and c_y1 of the (rectified) principal point in camera 1, and x-coordinate c_x2 of the (rectified) principal point in camera 2 can be used for reconstruction.
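The map adaptation can be sketched numerically under assumed index conventions (pixel (x, y) of channel (i, j) sits at full-resolution position (5x + j, 5y + i), and map entries store (x, y) source coordinates in pixel units); the exact form of the paper's Eq. (14) may differ:

```python
import numpy as np

def channel_rectification_map(r_full, i, j, mosaic=5):
    # r_full: (H, W, 2) full-resolution rectification map.
    # Pick out this channel's pixels, subtract the channel's offset in
    # the mosaic, and rescale to the channel's lower resolution.
    sub = r_full[i::mosaic, j::mosaic]
    return (sub - np.array([j, i])) / mosaic
```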
An object point P, defined by corresponding rectified image points p_1 = (x_1, y) and p_2 = (x_2, y), is then given in homogeneous coordinates by

P̃ = [x_1 − c_x1, y − c_y1, κ, ((x_1 − c_x1) − (x_2 − c_x2)) / b]^T

with the baseline length b between the two projection centers. By carrying out this calculation for all detected corresponding points, the entire 3D point cloud is reconstructed.
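Dehomogenizing this expression yields the familiar disparity-to-depth relation Z = κ b / d. A sketch of the reconstruction for a single correspondence, using standard stereo sign conventions (not necessarily the paper's exact formulation):

```python
import numpy as np

def triangulate(x1, x2, y, kappa, cx1, cy1, cx2, baseline):
    # Rectified stereo triangulation: disparity is the difference of the
    # principal-point-corrected column coordinates in both cameras
    disparity = (x1 - cx1) - (x2 - cx2)
    z = kappa * baseline / disparity
    return np.array([(x1 - cx1) * z / kappa, (y - cy1) * z / kappa, z])
```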

Experimental setup
Our first 5D sensor (see Fig. 4(a)) consists of a halogen lamp-based GOBO projector and two XIMEA "MQ022HG-IM-SM5X5-NIR" hyperspectral snapshot cameras providing 5 × 5 different spectral channels in the visible to near-infrared range (VIS-NIR). As the cameras have an (effective) resolution of 2045 × 1080 px, each of the 25 images of the individual spectral channels that can be extracted has a resolution of 409 × 216 px. Due to the pixel size of 5.5 µm × 5.5 µm, the pixel pitch of the spectral channels is 27.5 µm. The full width at half maximum of the spectral bands varies between 10 and 15 nm.
As an FPI transmits light of harmonic wavelengths that disturb the spectral observation, the active range of the hyperspectral cameras needs to be constrained by optical filters. A band-pass filter in front of the sensor limits its response to wavelengths between 600 and 1 000 nm. Remaining unwanted harmonic wavelengths of the FPIs are eliminated by either an additional short-pass or long-pass filter in the camera lens. The combination of a band-pass and an edge filter limits the detectable spectral range, resulting in two operation modes: (OM1) 600 to 875 nm and (OM2) 675 to 975 nm. Figure 5 shows the combined transmittance T filter (λ) of the band-pass/edge filter in front of the sensor in the operation mode (OM1) and the combined quantum efficiency Q i (λ) of one of the cameras.
In the experimental setup, both hyperspectral cameras are mounted on a bar at a fixed distance of 350 mm from each other. At a working distance of 550 mm, they capture a measurement field of approximately 170 × 85 mm 2 . Between the two cameras, a GOBO projector [19] containing a halogen lamp as a source of radiation is integrated into the system. Over the entire spectral range between 600 and 975 nm, the halogen lamp produces a continuous spectrum and provides an irradiance in the measurement field that is sufficiently high for the cameras. A light funnel made of mirrors with a silver coating homogeneously guides the emitted electromagnetic radiation to a square cut-out of the rotating slide. This slide consists of a circular heat-resistant borosilicate glass, which has a chromium coating comprising radial strips of varying width (see Fig. 4(b)). The GOBO projector is used to illuminate the scene with a series of aperiodic sinusoidal patterns [32] which allow for a robust and accurate 3D reconstruction as described in Sec. 3. The average strip width must be adjusted to match the magnification of the projection lens and the resolution of the hyperspectral cameras. As 3D reconstruction can either be performed using the full resolution images of the hyperspectral snapshot cameras (2045 × 1080 px) or by extracting low-resolution images from each of the 25 spectral channels (409 × 216 px), two GOBO wheels with different average strip widths have been manufactured. Depending on the desired application, either one of the two is installed in the sensor. It is uniformly rotated at a speed that depends on the frame rate and exposure time of both cameras.
Additionally, a second projector of the same design, but without GOBO slide, is installed in the sensor (see Fig. 4(a)). By using suitable filters in the projection lenses (e.g., a 650-nm short-pass filter and a 650-nm long-pass filter, respectively), it is possible to identify the aperiodic sinusoidal patterns with some spectral channels of the cameras, while the remaining channels detect the homogeneously illuminated measurement object. In this way, both the 3D reconstruction (in one or very few channels) and the recording of the spectral reflection data (in many channels) are possible within a single measurement, similar to the approach of Ozawa et al. [34].
Altogether, the 5D sensor can be operated in four different modes: in the spectral range from 600 to 875 nm, the projected patterns can either be detected in all 25 channels (OM1a) or in only a few channels while the remaining ones record the homogeneously illuminated object (OM1b); the same applies to the spectral range from 675 to 975 nm (OM2a and OM2b, respectively). Figure 6 illustrates the four operation modes.

Results
In order to demonstrate the benefits of our new approach to 5D hyperspectral imaging, we conducted various experiments using the developed prototype.

Characterization of spectral response and 3D performance
We examined the 3D performance of the sensor as well as the spectral behavior of the cameras. As can be seen from Fig. 5, the signal of a single channel can be affected by both the crosstalk between channels and the harmonics of the FPI. Therefore, in order to determine a consistent object reflectance R obj (λ), the impact of the other channels must be eliminated and the transformation into the physical quantity R obj must be calibrated as described in Sec. 2.
In general, the accumulated energy differs significantly from channel to channel (see Fig. 5). Furthermore, channels with λ i < 675 nm detect large amounts of higher-wavelength radiation. For those channels, the ratio of accumulated energy in the wavelength range of interest to the total detected energy is less than 40 %. This means that 60 % or more of the information in these channels originates from a spectral range that is actually of no interest. These channels are of limited suitability for experiments where 3D reconstruction is performed in single channels. For instance, in the operation mode (OM1), the reliable range is between 675 and 850 nm instead of the full range (600 to 875 nm).
The 3D performance of the system has been evaluated according to the guidelines of the VDI/VDE 2634 Part 2 [35]. The cuboid measurement volume was defined as 170 × 85 × 85 mm 3 (width × height × depth). In order to determine the expected measurement inaccuracies, we generated and evaluated 3D surface models of four different test specimens in the spectral range between 600 and 875 nm (operation mode (OM1a)). For each single view of a measurement object, one 3D surface model can be generated from the N = 10 full-resolution images of all spectral bands of the sensor, or up to 25 3D surface models from the low-resolution images of each spectral channel. Depending on the object reflectance and thus the camera exposure time, images can be recorded at the cameras' maximum frame rate of 170 Hz at 8-bit resolution. Therefore, up to 17 hyperspectral 3D models per second can be generated with the current setup. The average distance of the 3D points is about 75 µm (full resolution) or 375 µm (channel resolution) in x and y direction.
One of the parameters used to characterize the 3D performance is the flatness measurement error F = max_k d_k − min_k d_k. When measuring a flat object, it describes the range of the measured points' deviations d_k from a fitted plane. In the entire measurement volume and for each spectral channel, the flatness measurement error was below 0.7 mm. In the reliable range, it was smaller than 0.5 mm. The 3D points' standard deviation from the fitted plane was between 25 and 100 µm for each spectral channel.
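The flatness evaluation reduces to a least-squares plane fit followed by range and spread statistics of the signed residuals. A minimal sketch (the SVD-based fit is a standard technique, not necessarily the exact evaluation software used):

```python
import numpy as np

def flatness_error(points):
    # Least-squares plane fit to an (M, 3) point cloud; returns the
    # flatness error F = max(d_k) - min(d_k) and the standard deviation
    # of the signed point-to-plane distances d_k
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)   # smallest singular vector = normal
    d = (points - centroid) @ vt[-1]
    return d.max() - d.min(), d.std()
```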
Another important quantity we have determined is the 3D point error when measuring a sphere. We used a spherical specimen with a diameter of 38 mm, which we placed at nine different positions evenly distributed in the measurement volume. For each spectral channel, the standard deviation of the measured points from a fitted sphere ranged from 20 to 80 µm. Again, channels in the reliable range (675 to 850 nm) allowed for the highest accuracy with standard deviations between 20 and 40 µm.
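The sphere evaluation can be sketched with the standard algebraic least-squares fit, which rewrites |p|² = 2 p·c + (r² − |c|²) as a linear system in the center c and a radius term (again a generic technique, not necessarily the exact evaluation software used):

```python
import numpy as np

def fit_sphere(points):
    # Algebraic least-squares sphere fit to an (M, 3) point cloud;
    # returns (center, radius)
    A = np.column_stack([2.0 * points, np.ones(len(points))])
    b = (points ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    return center, np.sqrt(w[3] + center @ center)
```

The per-point residuals against the fitted sphere then give the standard deviations quoted above, and repeating the fit per spectral channel reveals wavelength-dependent effects such as the center shift discussed in Sec. "Investigation of translucent objects".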

Measurement of a historical globe
5D hyperspectral imaging offers a promising non-invasive option for the analysis and classification of historical measurement objects [3,36,37]. In order to demonstrate the suitability of our sensor for the digital documentation of art and cultural objects, we measured a historical globe using the two operation modes (OM1b) and (OM2b). The examined globe is a relief globe from 1885, which was created by the geographic-artistic institution of Ludwig Julius Heymann in Berlin (see Fig. 7(a)).
As can be seen in Fig. 7(b), the relief structure of the globe can be documented very well using our 5D sensor. In addition, the transfer of the texture recorded in each spectral channel to the reconstructed models enables a detailed visualization of the surface properties and spectral signatures of the measurement object. As examples, Fig. 7(c) shows four such models in false-color representation, colored according to their wavelength. In general, this allows, e.g., objects made of different materials to be classified.

Measurement of a human hand
By creating 5D models of a human hand, it is possible to demonstrate the simplified detection of veins in the NIR compared to the visible spectral range. In particular, images in the spectral range between 800 and 850 nm are well suited, regardless of skin tone, degree of dehydration, fat content, or body hair. They provide, e.g., a reliable way to determine which veins are suitable for an infusion [38,39]. Using our sensor, we measured the left hand of a 26-year-old male subject and created hyperspectral 3D models in the 600-875-nm and 675-975-nm configuration (operation modes (OM1b) and (OM2b)). Since the sensor had to be converted between the two acquisitions by exchanging the optical filters in the lenses of the hyperspectral cameras, the position and condition of the hand do not match exactly. Figure 8(a) shows the 3D model recorded and reconstructed in the range between 600 and 875 nm, Fig. 8(b) shows the one in the range between 675 and 975 nm. On the left, the 3D surface models (with artificial blue shading) are shown. On the right, assigned to the respective wavelengths, exemplary 3D models with mapped texture of a single spectral channel are shown.
In all the images, the veins of the test person are clearly visible, in Fig. 8(a) even more pronounced than in Fig. 8(b). The reason for this is that the measurement in the 600-875-nm configuration was carried out first, immediately after stimulation of blood flow in the hand by light physical activity. The subsequent measurement in the 675-975-nm configuration was performed after the few minutes required to convert the sensor, resulting in a slight decrease in blood flow. This can also be seen from the 3D surface models on the left.

Determination of leaf water content
Another interesting field of application for the 5D sensor is plant phenotyping. For instance, VIS/NIR hyperspectral images can be used for the evaluation of fruit and vegetable quality [40] or for the determination of leaf water content [41]. Specifically, the water content of leaves of Mediterranean plants can be deduced from the reflection spectrum in the NIR region (between 930 and 980 nm, depending on the specific plant) [42]. In general, a dry plant leaf reflects electromagnetic radiation in the NIR much more strongly than a healthy plant leaf containing a sufficient amount of water. In our experiment, we used our sensor in the spectral range from 675 to 975 nm (operation mode (OM2b)) to measure a citrus plant at regular time intervals during water absorption. In the initial state (t = 0 min), the plant was very dry after two weeks without water supply. Immediately after adding water, a measurement was taken every ten minutes until 210 minutes had passed. Figure 9(a) shows both the condition of the plant at time t = 0 min, 90 min, and 210 min documented by a conventional digital camera and the area observed by the 5D sensor. The corresponding 3D surface models are shown in Fig. 9(b). They illustrate the process of water absorption by the plant, whose leaves unfold. In the orange-marked regions, we have also determined the wavelength-dependent reflectance. In order to obtain a temporally and spatially consistent reflectance spectrum, we used the 3D point cloud to estimate the surface normals and perform shading correction, assuming Lambertian reflection [43,44]. The resulting reflectance spectrum is plotted in Fig. 9(c). It shows characteristics similar to those of comparable investigations [45][46][47]. In particular, the drop in the leaf's reflectance in the range from 950 to 975 nm due to water absorption is confirmed [42].
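The Lambertian shading correction mentioned above divides each measured value by the cosine of the angle between the estimated surface normal and the illumination direction. A minimal sketch (the grazing-angle threshold of 0.1 is a hypothetical choice to avoid amplifying noise, not a value from the paper):

```python
import numpy as np

def shading_correction(intensity, normals, light_dir):
    # intensity: (M,) measured values; normals: (M, 3) unit surface
    # normals estimated from the 3D point cloud; light_dir: (3,) unit
    # vector towards the light source. Dividing by cos(theta) recovers
    # a value proportional to the albedo; grazing angles are masked.
    cos_theta = normals @ np.asarray(light_dir)
    corrected = np.full_like(intensity, np.nan)
    valid = cos_theta > 0.1
    corrected[valid] = intensity[valid] / cos_theta[valid]
    return corrected
```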

Investigation of translucent objects
In addition to the determination of the reflectance spectrum, our 5D sensor is also suitable for the investigation of other wavelength-dependent parameters, e.g., the penetration depth of light into translucent objects. For this purpose, we used our sensor to measure both a translucent and an opaque sphere with a radius of approximately 30 mm each (see Fig. 10(a)) in different spectral channels between 600 and 875 nm (operation mode (OM1a)).
Lutzke et al. [48][49][50] have shown that the measurement of translucent objects results in a deviation between real and measured surface. Figure 10(b) shows this effect (highly exaggerated) when measuring a sphere. Instead of the actual sphere (green line), the orange colored points are detected. The greater the light's penetration depth, the more the position of a best-fit sphere (orange line) differs from the real sphere.
We repeatedly measured a diffusely reflecting white opaque sphere and a translucent synthetic resin sphere with a polished surface. To each of the point clouds generated in the different spectral ranges, a sphere with a (known) fixed radius was fitted. Figure 10(c) shows the resulting displacement of the sphere center (X 0 , Y 0 , Z 0 ) in the z direction with respect to an (unknown) reference value Z ref in the reliable wavelength range between 675 and 850 nm. When measuring the opaque sphere, the position of the sphere center does not change significantly (blue line). But when measuring the translucent sphere, an increasing shift ∆Z with larger wavelengths (orange line) is clearly noticeable. Thus, with the help of our 5D sensor, we have been able to confirm previous results [12,51] that the light's penetration depth into the synthetic resin sphere  increases with increasing wavelength and that our system is suitable for the investigation of such wavelength-dependent properties.

Conclusion
Our new approach of 5D hyperspectral imaging enables us to accurately measure the shape and reflection characteristics of the surface of macroscopic objects at video rate. By using a specially developed broadband high-speed pattern projector and two hyperspectral snapshot cameras, excellent spatial and spectral resolution, great depth accuracy, and high frame rates can be realized by a very compact, cost-effective, and robust sensor. A first system based on the proposed technology comprises two synchronized cameras, the CMOS sensors of which are tessellated with 5 × 5 different Fabry-Pérot interference filters, and a GOBO projector, which projects temporally varying aperiodic sinusoidal patterns into the measurement volume. When projecting appropriate patterns (in terms of average fringe width, fringe width variation, or GOBO wheel rotational speed [52]), the 5D sensor's specifications are determined only by the applied hyperspectral cameras. The more spectral channels the cameras have, the higher the spectral resolution and the lower the spatial resolution and depth accuracy of the 5D sensor, and vice versa. Our prototype features 25 different channels in the visible to near-infrared range, each of them providing a 3D point standard deviation of less than 100 µm (measurement of a test plane) and a radius standard deviation of less than 80 µm (measurement of a test sphere). Measurements can be performed at a frame rate of up to 17 Hz, which is significantly faster than state-of-the-art systems.
A number of different experiments demonstrated potential applications of the developed 5D sensor, such as the investigation of art and cultural objects, plant phenotyping, and medicine. The reconstructed 3D point clouds can be used for shading correction to obtain temporally and spatially consistent spectra. In addition, the developed system makes it possible to investigate the penetration of light into different materials, as demonstrated by the measurement of an opaque and a translucent sphere.
In the future, the 5D sensor should be further optimized, particularly with regard to the applied camera technology. From a more homogeneous signal-to-noise ratio and minimized crosstalk between the individual channels, we expect a more reliable determination of the wavelength dependence of the reflectance. By further increasing the camera frame rate, even dynamically changing object properties can be monitored. Furthermore, the additional use of hyperspectral cameras working in the visible spectral range should be considered. In this way, the 5D sensor could cover an even wider wavelength range and enable more precise conclusions about the properties of the measured objects.