Resolution enhancement for topography measurement of high-dynamic-range surfaces via image fusion.

In this paper, we introduce a method and algorithm for resolution enhancement of low-resolution surface topography data by fusing them with corresponding high-resolution intensity images. This fusion is achieved by linking the three-dimensional topographical map to its intensity image via an intrinsic image-based shape-from-shading algorithm. Through computational simulation and physical experiments, the proposed method's effectiveness and repeatability have been evaluated, and the computational cost has been shown to be less than other state-of-the-art algorithms. This proposed method can be easily integrated with high-speed in-line measurements of high-dynamic-range surfaces.

stochastic optical reconstruction microscopy (STORM), stimulated emission depletion (STED) microscopy, photoactivated localisation microscopy (PALM) and structured illumination microscopy (SIM) techniques have been developed. However, a fundamental limitation of these super-resolution techniques is that they require additional pre-processing of the sample, such as fluorescent staining, and in addition, their measuring ranges are limited, usually to scales of a few or several tens of micrometres [11,12].
To increase the measuring range or field-of-view (FOV), a number of HDR measuring techniques have been proposed based on stitching of a series of local high-resolution measurement data. For example, Yan et al. developed an iterative six degree-of-freedom stitching and weighted fusion algorithm for sub-aperture testing (SAT) of aspheric mirrors [13]. Lei et al. proposed a sampling noise-resistant stitching technique for large-area surface measurement with coherence scanning interferometry (CSI) systems [14]. Liu et al. developed a Gaussian process modelling-based stitching and fusion method for large surface measurement which provides higher stitching accuracy [15]. However, stitching techniques require a number (determined by aperture size and the total measuring range) of repetitive high-resolution measurements over each sub-aperture area. This demands repetitive setup operations for each sub-aperture measurement and a complete stitching process can be time consuming with potentially high data overheads.
A further requirement of HDR surface measurement is to improve measuring efficiency [16]. For example, Preibisch et al. developed a Fourier shift theorem-based fast stitching and fusion technique for biological volume images [17]. Yu and Peng developed a high-speed multi-scale registration-based stitching technique for 3D biological volume data on a gigabytes level [18]. Bria and Lannello developed a free toolbox -TeraStitcher -for highspeed tera-voxel-sized image stitching with low memory costs (less than 8 Gb) [19]. However, high-speed stitching techniques have received limited take up in mechanical manufacturing industry (although they are applied in optics manufacture) as they require complex additional operations, such as repetitive specimen adjustment and surface searching in each local measurement, compared to imaging of biological specimens.
In this paper, we propose a fast resolution-enhancement method which can produce a high-resolution surface height map by fusing a low-resolution height map with its corresponding high-resolution intensity image, based on recently developed shape-fromshading techniques [20][21][22][23]. With this solution, general sub-aperture 3D surface stitching can be replaced by fusing a global large-area low-resolution height map with several sub-aperture intensity images at different local regions, which are separately captured using a 3D sensor and 2D vision cameras. This new approach can be significantly faster than sub-aperture stitching and many recently developed multi-sensor fusion techniques [24][25][26][27][28], which are used with coordinate data only. For review of general coordinate data fusion techniques, readers can refer elsewhere [29,30].

Intrinsic image-based shape-from-shading linkage model
Fusion of the geometrical coordinate data of a 3D surface and its 2D intensity image under a specific illumination condition relies on determining a linkage model connecting the two sets of data. Shape-from-shading (SFS) is one such technique [31]. Many SFS models have been proposed, which are reviewed elsewhere [20,23]. In this paper, an intrinsic image-based lighting model [32] is employed, which can approximate complex shading factors, including natural or remote/near point lighting, Lambertian shading, specular reflectance, shadows, inter-reflections and material-related albedo variations.
Lambertian diffusion is the simplest shading model and widely used in SFS. A beam of light that is incident on a perfectly Lambertian surface is uniformly reflected in all directions. Thus, the shading (reflectance) image observed from any viewpoint is the same, and the reflected intensity is determined by the radiance received at each small surface facet cell and the local albedo (shown in Fig. 2(a)), i.e. cos , where A is the radiance, ρ is the local albedo, θ i is the angle of incidence, S  and n  are respectively the normalised vector of the lighting direction and surface normal. Unfortunately, recovery of the surface geometrical information n  with three unknowns from the single Eq. (1) is ill-posed [23]. Realistic shading models are usually expressed by a hybrid composition of Lambertian diffusion, background illumination, near lighting-induced radiance variations, plus local specular reflection, shadows and inter-reflections, i.e.
where k S  accounts for the k-th remote light source and S R is a local specular term. As presented in Fig. 2 R is usually expressed as a Gaussian (bidirectional) reflectance distribution function [23], i.e.
( ) 2 2 cos exp , where γ is the specular reflectance coefficient, θ r is the angle of emergence, i.e. ( ) where V  accounts for the viewing direction vector, and m corresponds to the local surface roughness [33]. The challenge of an exact recovery of the surface geometry information from such an under-determined least-squares optimization problem, i.e.
is the ambiguity problem of shape-from-shading [34], where I is the corresponding pixel intensity captured from a viewing sensor. Intrinsic image decomposition [32] has recently been applied to SFS by approximating an arbitrary realistic lighting environment using a first-order variation problem to overcome the complexity of Eq. (4), i.e.
where ( ) S n  is a surface normal-related Lambertian shading, ρ is the local surface albedo, β accounts for spatially independent local components, including shadows, inter-reflections and specular reflections, and R is the reflectance intensity at a specific pixel position. With the intrinsic decomposition, an intensity image of an object can be decomposed into an intrinsic part ρS and an extrinsic add-in β.  (7) ions. If α entrywise ithm (see ) he global low-level rs can be this work (8) where m  is t vector and n  where z ∇ is the x and y dir By concat global lightin problem Given the 3(b), the glob 6(a). With the the circular sh , where I and I k respectively represent the local lighting α-corresponded image intensity and the k-th intensity neighbour, σ 2 controls the smoothness of the weighting parameters, and τ represents a threshold controlling the degree of fragmentation in piecewise local lighting segmentation.
The following regularised least-squares minimisation problem is defined, i.e.
( ) where μ α controls the trade-off between the estimation accuracy and smoothness. This regularised least-squares problem can be solved using standard matrix calculus. With the local lighting estimation, a recomposed shading result αS is shown in Fig. 6(b), which shows good similarity to the input intensity image in Fig. 3(b), but with some local blurs. Under controlled lighting conditions, e.g. with simple remote lighting and limited local specularity, shadows or inter-reflections, uniform local lighting with α = 1 can be used for algorithm acceleration. In this case, the fusion algorithm can be simplified as a combination of global lighting estimation and high-resolution surface map refinement. Our experiments in section 4 show that such an acceleration provides significant computing cost reductions. Such a simplification can be effectively used in high-speed in-line measurement, where there are lower accuracy requirements.

High-resolution surface reconstruction
With the global and local lighting conditions determined, refinement of an input surface map Z 0 can be conducted by minimising the fusion objective function in Eq. (6). This is a nonlinear problem because the surface normal estimation in Eqs. (9) and (10) is a non-linear normalisation operation. To solve the minimisation in a linear manner, we rewrite the recomposition term R in Eq. (6) as If the non-linear coefficient ω is fixed, the intensity image prediction in Eq. (16) becomes linear with regard to Z, and Eq. (6) becomes a linear least-squares problem. Therefore, we iteratively calculate the non-linear term ω k and estimate the refined height map z k+1 with a linear least-squares method, as applied elsewhere [22], until a specified tolerance is achieved. A description of the iterative algorithm is presented in Fig. 5(b). Figure 6(c) demonstrates an output refined height map, which shows slightly more details than the input map in Fig. 4(a).

MEMS surface
First, we simulated an ideal remote point source lighting (altitude: π/4, azimuth: π/4) environment to validate the proposed algorithm. In this simulation, the input low-resolution surface map was obtained from a low-pass filtered (Gaussian, with cut-off of 2.5 μm) version of the high-resolution surface map measured by a CSI [36]. The corresponding input highresolution intensity image was simulated using Eq. (2) by integrating with Lambertian ( ap and its corre ion algorithm i wn in Fig. 7(d). a refined surfa oss-sectional p truth map than ere as [25,37]

Star pattern
As a popular resolution artefact for surface topography measuring instrument calibration [10], a further demonstration on a star pattern was carried out. In this test, the source surface topography data was obtained by using a lateral distortion-corrected CSI [38] and the spatial resolution enhancement was measured with the draft ISO standard-defined lateral period limit (LPL) [39,40], i.e. the spatial period at which the height response of an instrument falls to 50% (see [10] for details of how to calculated the LPL using a star pattern). Similar to the case in section 3.1, the input low-resolution star map as shown in Fig. 9(a) was initially filtered using a low-pass Gaussian filter, with cut-off of 0.8 μm. With the fusion parameters assigned as , our experiment under a simulated oblique lighting (altitude: π/4, azimuth: π/4) environment shows that the topography spatial resolution can be reduced from 23.3 μm to 7.4 μm (using LPL as the evaluation metric) via fusing with a high-resolution intensity image. Intermediate and final results of the fusion process are presented in Figs. 9 and 10. Note that the height response curves in Fig. 10 are extracted by subtracting two radial cross-section profiles which are respectively extracted from the lower and higher pedals of the star along X direction ( ± 5° to the X-axis). It should be pointed out that our refined fusion result in Fig. 10 shows conformance with the ground-truth data in the central area (within the range of the red arrows). But in the area off to the centre, e.g. within 45 μm to 70 μm of the abscissa, the fusion result presents less-conformant reconstruction accuracy than that of the low-resolution input. This could be caused by the ambiguity issue of SFS, i.e. SFS leads to ambiguous reconstruction of a step surface because the higher and lower pedals of a step surface have the same slope, and thus they have the same intensity.

Coin surface measurement with a vision-assisted CMM
An experiment with a vision-assisted coordinate-measuring-machine (CMM) was conducted to verify the performance of the proposed algorithm. In this experiment, a one-Chinese-Yuan coin surface was measured with a 1 mm diameter touch probe and a vision camera with partial ring lighting, as shown in Fig. 11. The touch probe provided a low-resolution height map input, while the vision system provided a high-resolution intensity image input for the resolution enhancement computation. The touch probe and vision sensor were initially calibrated in terms of their relative position and orientation accuracy using a calibrated ring gauge [29,41]. In this experiment, a high-resolution intensity image of the coin surface I with 444444 pixels was initially captured, as shown in Fig. 12(b). Regarding the low-resolution height data, a 5050 grid sample data set was captured in the same measurement area as the highresolution image (see the pink points in Fig. 12(a)). To match the data resolution of the lowresolution data to the high-resolution image, a Delaunay linear interpolation was conducted and a low-resolution height map Z 0 with the same sample size as the high-resolution image was obtained, as shown in Fig. 12(a). , and following the intrinsic image decomposition-based lighting estimation as shown in Fig. 12(c) and the fusion computation, a refined high-resolution height map can be obtained and is presented in Fig. 12(d). The accuracy was evaluated by comparing the fusion results with those from a contact stylus surface topography measuring instrument. By matching the CMM data to the stylus data using a scale-invariant feature transform (SIFT)based coarse matching, followed by an iterative-closest-points (ICP) fine matching, i.e. the SIFT-ICP algorithm [42,43], the height difference of corresponding points between the stylus data (in this case assumed as the ground truth) and CMM data was calculated. Their RMS height errors show that the fusion output result reduces the measurement error by about 4.4%.
A further comparison of top view, cross-section view and fast Fourier transform (FFT) analysis of the input low-resolution map and the output high-resolution map are shown in Fig.  12(e), Fig. 12(f) and Fig. 13, from which, we can clearly see that the periodical surface discontinuity of the low-resolution height map caused by low sampling rate and linear interpolation are removed by using the proposed resolution enhancement algorithm. The output high-resolution map shows a uniform distribution of surface continuity and detail refinement. The FFT analysis in Fig. 13 shows that the input low-resolution and output highresolution maps have coincident spatial frequency components in low-spatial frequency areas; but in high-spatial frequency areas, the resolution enhanced height map has more information than that of the low-resolution input map.

Analysis of accuracy and time cost
Accuracy and computational time are important factors for algorithms for in-line measurement [44]. In this section, we compare the fusion accuracy and computational efficiency of our proposed algorithm with a state-of-the-art real-time algorithm proposed by Or-El et al. [22], by using the three specimen surfaces described above. We select Or-El's algorithm for comparison because it has been successfully used in real-time computer vision with a processing rate in approximately ten frames per second (for 640480 images), by using the C language and GPU acceleration.
We have compared the resolution enhancement results of our method (as shown in Figs. 7-13) and Or-El's method, and found that the difference is negligible because the same objective function in Eq. (6) is used in both methods. Therefore, the results of Or-El's method are not presented here.
The difference of our testing environments is that we use an Intel I5 2.4 GHz CPU instead of GPU acceleration, and the coding language is MATLAB. The 444444-pixelated coin surface case was analysed with a serial computation. But, for the MEMS and star cases, which have a million sample points, a twelve workers-parallelised computation with 128128 block partitioning was used for acceleration (Or-El's algorithm was also accelerated in the same way in this test), considering the matrix inverse demands for high complexity [45]. A statistic of the algorithm performance is summarised in Table 1, with ten repetitive numerical runs.
For the accuracy analysis, the RMS error (RMSE), as defined in Eq. (17) [25,27,37], and recently highlighted universal-quality-index (UQI) [46] where k z accounts for the mean height of the k-th map, σ 1 , σ 2 and σ 12 represent respectively the variance of each height map and their co-variance. RMSE is a height value-related parameter; while UQI is a hybrid parameter which is designed to be more sensitive to lateral correlation and distortions. UQI is normalised within [−1, 1] and the best case, i.e. UQI = 1, appears only when Z 1 and Z 2 are exactly equal. In Table 1, we can see that the proposed algorithm can reduce the computational cost by 35% compared to Or-El's algorithm [22], with a similar level of accuracy. Our accelerated version of the algorithm with α = 1 (see last column of Table 1) shows that the computational cost can be reduced by approximately 80% compared to Or-El's algorithm [22]. These statistics comply with the time complexity of each algorithm. Given an input image with N pixels, Or-El's algorithm has the complexity The RMSE and UQI analysis show that for the simulation cases, the proposed algorithm can lead to significantly resolution-enhanced results which are similar to that of Or-El's algorithm: RMSE is reduced by 23% from that of the original data and UQI is improved by 0.2 to 0.5 (range normalised to [−1, 1]). For these simulation cases, the accelerated algorithm with α = 1 shows both reduced time cost and improved fusion accuracy. However, it should be noted that this only happens in simulations or with well-controlled lighting conditions.
With the coin measurement, the results show that the accuracy level of the proposed algorithm is similar to Or-El's algorithm. However, both fusion methods show limited quality enhancement: RMSE is only reduced by 4.4% and UQI is only improved by 0.046. This is probably due to the real lighting condition, as a 1/4 partial ring light has more complex local lighting variations than the simulated cases (a remote point source), and these local complexities degrade the fusion accuracy in local detail refinement. This effect can also be verified with the accelerated algorithm test in which the uniform local lighting, α = 1, significantly reduces the computing time cost, but also degrades the fusion accuracy.

Conclusion
A resolution enhancement solution and algorithm are proposed in this paper by fusing a lowresolution height map with its corresponding high-resolution intensity image under an arbitrary natural lighting condition (but co-axial lighting). The simulations and experiments have successfully demonstrated the performance of the proposed method in terms of fusion accuracy and computational time cost. Compared to state-of-the-art algorithms, the proposed algorithm can be significantly faster (35% to 80% faster), while retaining a similar level of accuracy. More testing, both simulation and experimental, is required to establish the sensitivity of the algorithm to different lighting conditions. In future work, we expect to apply this technique to in-line high-dynamic range measurement of large area micro-structured surfaces and in case study measurement scenarios, e.g. fusion of 3D surface topography data with scanning electron microscope images or differential interference contrast microscope images.