SURFACE COMPLEXITY COMPONENT OF LIDAR POINT CLOUD ERROR CHARACTERIZATION

ABSTRACT: There are several data product characterization methods to describe LiDAR data quality. Typically based on guidelines developed by government agencies or professional societies, these techniques require the statistical analysis of vertical differences at known checkpoints (surface patches) to obtain a measure of the vertical accuracy. More advanced methods also attempt to characterize the horizontal accuracy of the LiDAR point cloud, using measurements at LiDAR-specific targets or other man-made objects that can be distinctly extracted in both the horizontal and vertical representation of the LiDAR point cloud. There are two concerns with these methods. First, the number of checkpoints/features is relatively small with respect to the point cloud size, which is typically measured in millions of points at least. Second, these locations are usually selected in relatively benign areas, such as hard flat surfaces at easily accessible locations. The problem with this characterization is that a statistically representative analysis is unlikely to be obtained from a limited number of points at locations that may not properly represent the overall object space composition. There is an ongoing effort to address these issues, and some of the newer methods to characterize LiDAR data include an average point spacing measure computed from the LiDAR point cloud. This is clearly an important step forward, but it ignores surface complexity. The objective of this study is to elaborate only on the requirements for adequate surface representation, in combination with LiDAR error characterization techniques, to identify the relation between the two surfaces, the measured and the reference (ideal), and thus to support better LiDAR or, more generally, point cloud error characterization as well as data acquisition planning, data storage, and accuracy requirements.


INTRODUCTION
Since the introduction of digital photogrammetric techniques and airborne LiDAR, Digital Elevation Models (DEMs) have become a baseline mapping/geospatial product that is broadly used in almost all mapping and engineering applications, as well as in other fields (Maune, 2007). For example, DEMs are used directly in floodplain mapping or for line-of-sight analysis in telecommunication, and indirectly in orthophoto production or 3D city modeling. The real proliferation of DEMs started with the introduction of powerful computers and softcopy photogrammetric systems, which provided an affordable platform for mass surface point generation from scanned airborne imagery as well as for processing point clouds acquired by LiDAR systems. The rapid acceptance of LiDAR technology made DEM production quick and inexpensive; in fact, image-based surface point generation lost significant market share at that time. This situation has started to change recently, as the improving performance of digital cameras and, more importantly, advancing image matching techniques, including stereo and multi-ray image matching, have led to efficient point cloud creation that is becoming competitive with LiDAR in certain applications. Lastly, several terms are used to describe surface models/data, including DSM, DED, DTED, DTM, and DEM; some of them overlap in definition, while others are unique. In the following, only DEM is used, as a general term.
Since the introduction of LiDAR, the error characterization of LiDAR-derived DEM data has been a challenging task given the extremely large number of points and the various characteristics of data acquisition and processing techniques.
Furthermore, LiDAR produces a point cloud, and thus, in most applications, there is a need for converting the surface data into one of the several data representations, including regularly spaced point data (gridded data or raster), TIN, and contours. Obviously, this conversion may introduce errors.
Standards, guidelines and product qualification methods to characterize LiDAR data have been developed mostly by government agencies to provide for a consistent treatment of data, usually acquired from various sources. The primary, most often used regulations in the US are from USGS, FEMA, NGA, FGDC, ASPRS, etc. All of these standards/guidelines focus mainly on DEM data QA/QC, including accuracy, ground control and statistical evaluation methods, and little or no attention is paid to the actual surface or, in broader terms, to the impact of the object space characteristics on the DEM characterization. The varying surface geometry and condition are typically considered only in deciding on DEM point density, in using breaklines, etc. Within ASPRS, there is an ongoing effort to develop guidelines for point spacing assessment (Naus, 2011).
The objective of this study is to look into the surface/object space condition in terms of spatial sampling of the surface and surface representation, and to analyze its impact on the QA/QC processes of DEM data. Though the original motivation comes from using surfaces extracted from LiDAR data, the discussion makes no distinction with respect to the origin of the point cloud, as the emphasis is currently shifting from LiDAR point clouds to broader point cloud processing, which also includes surface point clouds generated by stereo or multi-ray image matching.

THE THEORY OF SPATIAL SAMPLING
A point cloud produced by LiDAR is a more complex geometrical structure than a regular terrain surface modeled by points, as point clouds have no limitation in the vertical distribution of the points. For example, multiple returns can provide a detailed representation of the vertical composition of wooded areas. In general, a voxel-based representation is required for point clouds. DEMs, in contrast, have a single elevation value at each horizontal location and are consequently often called a 2.5D representation, which is the subject of the following discussion.
Surface elevation data, $S_c$, with respect to a mapping plane can, in general, be considered as a two-dimensional continuous function:
$$S_c = f(x, y) \tag{1}$$
where $S_c$ is the vertical ($z$) coordinate, and $x$ and $y$ are the horizontal coordinates in the mapping plane. For practical reasons, the discrete representation of the surface is considered, which is typically obtained by an evenly spaced two-dimensional sampling of the continuous function and by converting the continuous elevation values to discrete ones:
$$E^d_{ij} = Q_p\big(f(x_i, y_j)\big) \tag{2}$$
where $Q_p$ is the quantization function (typically a regular step function), which maps the continuous input parameter space to $2^p$ discrete levels, $p$ is the number of bits used for quantization, and $x_i$ and $y_j$ are the coordinates of the sampling point $(i, j)$. The fundamental question is how well the second representation ($E^d_{ij}$) describes the first ($S_c$). According to Shannon's information theory (Shannon, 1948), if the sampling distance satisfies certain conditions, then the continuous signal $S_c$ can be fully reconstructed from the samples $E^d_{ij}$. The required sampling distance is defined by the well-known Nyquist frequency (Shannon, 1949). For the two-dimensional case, if $f_x^{max}$ and $f_y^{max}$ are the highest spatial frequencies present in a given surface, then sampling distances $d_x^s$ and $d_y^s$ that satisfy the Nyquist criterion are sufficient for the complete representation of this surface; consequently, in this ideal case, the continuous surface can be restored without any error from the discrete representation. The Nyquist criterion for the two-dimensional case is:
$$d_x^s \le \frac{1}{2 f_x^{max}}, \qquad d_y^s \le \frac{1}{2 f_y^{max}} \tag{3}$$
If the Nyquist criterion is satisfied, then the reconstruction of the continuous surface from the discrete samples, using the required, $d_x^s$ and $d_y^s$, or shorter sampling distances, is described by:
$$S_c(x, y) = \sum_i \sum_j E^d_{ij}\, \mathrm{sinc}\!\left(\frac{x - x_i}{d_x^s}\right) \mathrm{sinc}\!\left(\frac{y - y_j}{d_y^s}\right) \tag{4}$$
where the sinc function is defined as
$$\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t} \tag{5}$$
In this ideal case, the reconstruction introduces no errors, as the discrete representation provides a complete description of the surface.
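As an illustration of the sinc-based reconstruction described above, the following minimal one-dimensional sketch samples a band-limited profile at a distance shorter than the Nyquist limit and rebuilds it between the samples. The test profile, its frequencies, the sampling distance, and the function names are assumptions chosen only for illustration; because the sum is truncated to a finite number of samples, the reconstruction is only approximately exact.

```python
import math

def sinc(t):
    """Normalized sinc, sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)

def reconstruct(samples, d, x):
    """Interpolate a 1D profile at x from samples taken at x_i = i * d."""
    return sum(z * sinc((x - i * d) / d) for i, z in enumerate(samples))

def profile(x):
    """Hypothetical band-limited profile; highest frequency is 0.1 cycles/m."""
    return (2.0 * math.sin(2 * math.pi * 0.05 * x)
            + 0.5 * math.cos(2 * math.pi * 0.1 * x))

f_max = 0.1                          # highest spatial frequency (cycles/m)
d_nyquist = 1.0 / (2.0 * f_max)      # largest admissible sampling distance (m)
d = 2.0                              # chosen sampling distance, shorter than the limit
samples = [profile(i * d) for i in range(2001)]

x_query = 2000.3                     # between two samples, near mid-span
error = abs(reconstruct(samples, d, x_query) - profile(x_query))
print(f"Nyquist distance: {d_nyquist:.1f} m, reconstruction error: {error:.2e} m")
```

The residual error in this sketch comes from truncating the infinite sum, not from the sampling itself; it shrinks as more samples on either side of the query point are included.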
The sampling distances in the x and y directions could be different in some specific cases. Furthermore, the concept can be extended to non-uniform sampling, though this offers no advantage in general practice. Although quantization is a non-linear transformation, its impact can be safely ignored in practice: in modern digital systems, the usual numerical representation provides high precision over wide signal ranges, so the error introduced by converting the continuous signal into a discrete one is negligible (Widrow and Kollar, 2008).
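To give a sense of the magnitude involved, the sketch below applies a uniform quantizer to an elevation value and reports the step size together with the RMS rounding error, using the standard uniform-noise model (step divided by the square root of 12). The 1000 m elevation range, the 16-bit word length, and the function name are assumptions for illustration only.

```python
import math

def quantize(z, z_min, z_max, p):
    """Uniform quantizer mapping [z_min, z_max] onto 2**p discrete levels."""
    step = (z_max - z_min) / (2 ** p - 1)
    k = round((z - z_min) / step)
    k = min(max(k, 0), 2 ** p - 1)       # clamp to the valid level range
    return z_min + k * step

# Hypothetical example: a 1000 m elevation range stored with 16 bits
z_min, z_max, p = 0.0, 1000.0, 16
step = (z_max - z_min) / (2 ** p - 1)    # about 1.5 cm
rms_error = step / math.sqrt(12.0)       # RMS of a uniformly distributed rounding error

z = 123.456
residual = abs(quantize(z, z_min, z_max, p) - z)
print(f"step = {step * 100:.2f} cm, RMS = {rms_error * 100:.2f} cm, "
      f"residual = {residual * 100:.2f} cm")
```

Under these assumed values the RMS quantization error is below half a centimetre, which is small compared with typical LiDAR vertical error levels.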

APPLYING THE THEORY
In practice, it is generally impossible to achieve the ideal surface representation described above, for several reasons. First, the characteristics of real surfaces are rarely known, so the Nyquist criterion can only be estimated from the samples. Second, the measurement system has inherent limitations, which introduce measurement errors. Concerning the Nyquist criterion with respect to LiDAR-derived surfaces extracted from point clouds, such as filtered last returns, the following observations can be made:
• Since the Nyquist criterion is typically not known during flight mission planning, only general consideration is given to point spacing, based on overall surface characteristics; for example, flat areas require less dense sampling than more complex and varying terrain.
• LiDAR data provide an uneven spatial sampling, which can vary over a large range depending on the scanning solution, such as a sinusoidal or saw-tooth pattern; consequently, the Nyquist criterion may be satisfied at some locations, while at most locations it is not.
• In an absolute sense, the LiDAR point spacing in general practice falls short of the Nyquist criterion.
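The uneven sampling noted above can be made concrete by examining the local point spacing. The following sketch uses the nearest-neighbour distance as a simple proxy for local spacing and reports the share of points that meet an assumed required sampling distance. The point pattern, the 0.5 m requirement, and the function names are hypothetical, and the brute-force search is suitable only for small tiles.

```python
import math

def nearest_neighbour_spacing(points):
    """Distance from each (x, y) point to its closest neighbour (brute force)."""
    spacing = []
    for i, (xi, yi) in enumerate(points):
        spacing.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    return spacing

def fraction_meeting(points, d_required):
    """Share of points whose local spacing is no larger than d_required."""
    spacing = nearest_neighbour_spacing(points)
    return sum(1 for s in spacing if s <= d_required) / len(spacing)

# Hypothetical uneven pattern: one densely and one sparsely sampled scan line
dense = [(0.2 * i, 0.0) for i in range(5)]
sparse = [(3.0 * i, 5.0) for i in range(5)]
points = dense + sparse

d_required = 0.5    # assumed sampling distance required by the surface (m)
share = fraction_meeting(points, d_required)
print(f"{share:.0%} of the points meet the {d_required} m requirement")
```

For production-size point clouds a spatial index would replace the brute-force loop; the sketch only illustrates that a single average spacing value can hide locations where the requirement is not met.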
In any geospatial data acquisition and product generation task, there is a target accuracy requirement, which allows for a certain level of error. For airborne sensors, such as direct georeferencing-based LiDAR and large-format digital camera systems, the error budget is formed from navigation or sensor orientation errors, sensor modeling errors, and object space-dependent errors. The last component is frequently overlooked, giving a somewhat optimistic estimate of the overall error budget. One of the most relevant factors of object space-dependent errors is the surface undulation, as the incidence angle has a strong impact on the point positioning accuracy. For example, a small incidence angle will reduce the vertical accuracy of LiDAR (and the effect is similar for optical imagery).
In the context of the error budget, there is clearly no need to satisfy the Nyquist criterion perfectly, as there is an allowance for errors, and the question is rather the relative contribution of the different components of the error budget. Thus, the problem can be rephrased as follows: for a given surface, how can a relationship be formulated between the required surface sampling distance and the specified error level (either from the sensor or from the product specification)? More precisely, there are two basic questions:
• What spacing is required for a given measurement error level to adequately sample (represent) a surface?
• If the sampling requirements for a given surface complexity are not satisfied, what is the error introduced?
To illustrate the problem of surface sampling versus the error introduced, Fig. 1 shows a surface profile; for simplicity, the one-dimensional case is considered, as the generalization to 2D is straightforward. In Fig. 1, the brown line shows the ideal surface profile, and the envelope around that profile, marked by blue boundaries, shows an acceptable error range; the error range can be defined by the measurement error or the product accuracy requirement in the usual statistical terms, such as 1σ or CEP90. Within the error envelope, the ideal reconstruction of the surface profile is not needed; in fact, any curve within the error envelope is an acceptable surface representation, as it meets the accuracy specification. Among the infinite number of such profiles, the one that allows the largest sampling distance while meeting the Nyquist criterion should be selected. In addition, to avoid bias, the smallest distance from the ideal surface profile should be considered as a second constraint. In practice, the first condition can be easily satisfied, while the second typically cannot. The curve marked in red in Fig. 1 shows a simple solution that clearly meets the requirements; note that, compared to the ideal profile, the curve is smooth. Vertical arrows show the required sampling distances for the original surface profile and the selected profile, respectively. The error between the reference profile and the profile reconstructed with error can be estimated by the following expression (Toth, 2011):
where $X(f)$ is the spectrum (PSD), and $f_s$ is the spatial frequency band. The expression simply states that the reconstruction error is due to the spatial frequency components that fall outside of the frequency band defined by the sampling rate.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B1, 2012. XXII ISPRS Congress, 25 August - 01 September 2012, Melbourne, Australia
The variance is of more importance in accuracy assessment, so a similar formula can be derived:
There are several potential approaches to find an optimal or near-optimal surface curve, provided the ideal/reference surface is known. However, this is rarely the case; it is possible only in certain situations, such as when the shape of the surface is known because it is defined by a simple geometry. Therefore, the surface must be estimated, which can be done iteratively. To account for the estimation error, the error envelope should be reduced in this case. Though computationally intensive, simulation could be another approach to obtain the profile that meets the requirements of the accuracy specification.
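The idea that the error stems from spectral content outside the band defined by the sampling rate can be sketched numerically. The code below is not the expression of (Toth, 2011); it is a minimal illustration that sums, via Parseval's relation, the mean-square content of a reference profile above the frequency $1/(2 d_s)$ associated with a sampling distance $d_s$. It assumes ideal band-limiting before sampling, so aliasing is ignored. The test profile, the grid spacing, and the function names are hypothetical.

```python
import cmath
import math

def dft(z):
    """Plain discrete Fourier transform (O(n^2)); adequate for short profiles."""
    n = len(z)
    return [sum(z[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]

def out_of_band_variance(z, d_ref, d_s):
    """Mean-square content of profile z (spacing d_ref) above 1 / (2 * d_s)."""
    n = len(z)
    spectrum = dft(z)
    f_cut = 1.0 / (2.0 * d_s)
    total = 0.0
    for m, value in enumerate(spectrum):
        signed = m if m <= n // 2 else m - n     # signed frequency index
        f = abs(signed) / (n * d_ref)            # spatial frequency (cycles/m)
        if f > f_cut:
            total += abs(value) ** 2
    return total / n ** 2                        # Parseval normalization

# Hypothetical reference profile on a 0.25 m grid: a long-wavelength component
# (0.125 cycles/m) plus a 0.1 m ripple at 1.25 cycles/m
n, d_ref = 128, 0.25
z = [math.sin(2 * math.pi * 4 * k / n) + 0.1 * math.sin(2 * math.pi * 40 * k / n)
     for k in range(n)]

variance = out_of_band_variance(z, d_ref, d_s=1.0)
print(f"out-of-band variance: {variance:.4f} m^2, sigma: {math.sqrt(variance):.3f} m")
```

With a 1 m sampling distance only the ripple falls outside the band, so the estimate equals the ripple's mean-square value (half of its squared amplitude).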
The process described above can be applied in various ways to DEM representation with an accuracy specification. For example, knowing the surface complexity and the DEM accuracy requirements, the optimal sampling rate can be determined, and the scan rate of a LiDAR can be configured accordingly. Similarly, knowing the sampling rate and the surface complexity, the expected DEM accuracy range can be estimated. Another application is the conversion of irregularly spaced point data to a gridded format, where the optimal grid constant is defined by the sampling distance satisfying the Nyquist criterion.
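The first application mentioned above, selecting a sampling distance from the surface complexity and an accuracy requirement, might be sketched as follows. The function walks down from the highest spatial frequency, discarding spectral components as long as their combined contribution stays within the allowed error, and returns the Nyquist distance of the first component that must be retained. The function name, the input format, and the example values are assumptions; aliasing and measurement noise are ignored.

```python
import math

def max_sampling_distance(freqs, power, sigma_max):
    """Largest sampling distance whose discarded spectral content stays
    within sigma_max.

    freqs: spatial frequencies (cycles/m); power[i]: mean-square contribution
    of freqs[i] to the surface (m^2). Zero frequency is skipped because the
    mean level is not affected by the sampling distance.
    """
    components = sorted(((f, p) for f, p in zip(freqs, power) if f > 0.0),
                        reverse=True)
    discarded = 0.0
    for f, p in components:
        if math.sqrt(discarded + p) > sigma_max:
            return 1.0 / (2.0 * f)     # f must stay in band: its Nyquist distance
        discarded += p
    return math.inf                    # every component fits in the error budget

# Hypothetical surface: a strong 0.125 cycles/m component and a weak ripple
freqs = [0.125, 1.25]
power = [0.5, 0.005]

d_loose = max_sampling_distance(freqs, power, sigma_max=0.10)   # ripple may be dropped
d_tight = max_sampling_distance(freqs, power, sigma_max=0.05)   # ripple must be kept
print(f"sigma 0.10 m -> {d_loose} m, sigma 0.05 m -> {d_tight} m")
```

In this example, relaxing the tolerance from 0.05 m to 0.10 m allows the ripple to be dropped, which increases the admissible sampling distance by an order of magnitude.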

EXPERIMENTS
To investigate the applicability of the theoretical results to real scenarios, a LiDAR data set was selected and various tests were performed. First, the area size should be chosen to be practically meaningful, that is, neither too large nor too small. Information theory provides a clear basis for surface sampling, i.e., what the maximum sampling distance should be to fully represent a surface, but the criterion is global in character, meaning that the whole area should satisfy the Nyquist criterion. For larger areas, this condition could be too conservative if the surface variation differs within the area; for example, where a river cuts into smooth rolling terrain, the riverbank would require a higher sampling rate, while a moderate sampling rate would clearly satisfy the requirements for the remaining part of the area. Therefore, larger areas should first be divided into smaller areas with nearly identical sampling requirements. This concept is practically identical to the tiling process used in mapping, image compression, or, to some extent, wavelet transformation. Based on the above consideration, a 32 m by 32 m area of moderately hilly terrain was cut out from a LiDAR data set. The original point density was about 8-10 pts/m². Since the Fourier transformation, the basis for any spectral analysis, requires regularly spaced data, the LiDAR point cloud was resampled to a 25 cm grid, which represented a compromise given the surface complexity and the somewhat coarser original sampling. For our investigation, this data set was assumed to satisfy the Nyquist criterion. Fig. 2 shows a perspective view of the surface. Four representations were derived from the test area reference data set with sampling distances of 0.5, 1, 2, and 4 m; all of them represent an under-sampling of the area. Next, the spectrum (power spectral density) was computed to estimate the error introduced by sampling that does not satisfy the Nyquist criterion, based on Eq. (6).
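The derivation of the coarser representations from the 25 cm reference grid can be sketched as below. How the representations were actually derived is not stated; plain decimation (keeping every n-th row and column) is assumed here, and the placeholder elevations and the function name are hypothetical.

```python
def decimate(grid, step):
    """Keep every step-th row and column of a regular grid (list of rows)."""
    return [row[::step] for row in grid[::step]]

d_ref = 0.25                                   # reference grid spacing (m)
n = int(32 / d_ref)                            # 128 cells per side for a 32 m tile
# Placeholder elevations; a real tile would hold the resampled LiDAR surface
reference = [[0.01 * (i + j) for j in range(n)] for i in range(n)]

representations = {}
for d_s in (0.5, 1.0, 2.0, 4.0):
    step = round(d_s / d_ref)                  # decimation factors 2, 4, 8, 16
    representations[d_s] = decimate(reference, step)

for d_s, grid in representations.items():
    print(f"{d_s} m spacing: {len(grid)} x {len(grid[0])} samples")
```

Decimation without prior low-pass filtering lets the discarded high-frequency content alias into the retained band, which is one reason the choice of derivation method matters for the resulting error figures.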
Finally, the differences between the reference data set and the various resolution representations were computed to numerically estimate the same errors. Note that the interpolation needed to compare the different data sets (different sampling distances) was based on Eq. (4). Table I below shows the comparison between the differences estimated from the spectrum of the reference area and the statistically computed ones. In general, there is good agreement between the two computations, and the minor differences are likely due to numerical errors in the computations. To show the impact of the different sampling distances, Fig. 3 shows the difference surface between the reference and the 2 m sample representation; note the wavy pattern due to the sinc-based reconstruction. Since it is difficult to compare rendered surfaces, Fig. 4 shows two contours, from the original reference and the 1 m samples; note that the contour lines are not smoothed.
Figure 3. Surface difference between reference and reconstructed surface from 2 m samples.
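The numerical comparison described above can be illustrated in one dimension. The sketch below reconstructs a periodic reference profile from every n-th sample with a periodic form of the sinc kernel and computes the RMS difference from the reference. The profile, the decimation factors, and the function names are hypothetical, and the periodic kernel (valid for an even number of samples per period) is a simplifying assumption made so that the finite data set can be treated without edge effects.

```python
import math

def periodic_sinc(t, m):
    """Periodic sinc kernel for m (even) samples per period; t in sample units."""
    if abs(math.sin(math.pi * t / m)) < 1e-12:   # t is a multiple of the period
        return 1.0
    return math.sin(math.pi * t) / (m * math.tan(math.pi * t / m))

def rms_difference(reference, factor):
    """RMS difference between a periodic reference profile and its
    reconstruction from every factor-th sample."""
    samples = reference[::factor]
    m = len(samples)
    total = 0.0
    for k, z in enumerate(reference):
        t = k / factor                            # position in coarse-sample units
        z_hat = sum(s * periodic_sinc(t - j, m) for j, s in enumerate(samples))
        total += (z_hat - z) ** 2
    return math.sqrt(total / len(reference))

# Hypothetical reference profile on a 0.25 m grid: a long-wavelength component
# plus a 0.1 m ripple at 1.25 cycles/m
n, d_ref = 128, 0.25
reference = [math.sin(2 * math.pi * 4 * k / n)
             + 0.1 * math.sin(2 * math.pi * 40 * k / n) for k in range(n)]

rms_1m = rms_difference(reference, factor=4)      # 1 m sampling distance
for factor in (2, 4, 8, 16):
    print(f"{factor * d_ref} m: RMS difference "
          f"{rms_difference(reference, factor):.3f} m")
```

For the 1 m case of this toy profile the RMS difference is 0.1 m: the 0.1 m ripple is not only lost but also aliased into the retained band, so both contributions appear in the difference.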

CONCLUSIONS
As DEM product characterization methods to describe surfaces obtained from LiDAR and, in general, from other point clouds continue to advance, the surface characteristics should be given more consideration to improve QA/QC performance. A better understanding of the relationship between surface complexity and surface sampling distance (representation) is essential to effectively balance data acquisition planning, data storage and accuracy requirements.
This study provides an initial attempt to look into one aspect of surface/object space dependency by analyzing the requirements for optimal surface representation with respect to acceptable error range. Preliminary results clearly demonstrate the benefit of the approach. This research is essential to develop practical metrics to define the relationship between surface complexity and acceptable surface representation for a given DEM error level, by developing categories for both typical surfaces and error levels defined by various guidelines and standards.