Point Distribution as True Quality of LiDAR Point Cloud

A parameter “point density” is often used to evaluate the quality of aerial laser scanning data. It is a parameter simple for understanding and human imagination. However, the true quality of LiDAR point cloud is based on point distribution. There are researches, which mention importance of point distribution and users’ false perception, that higher point density is better quality of LiDAR point cloud. The goal of this study is to define the mathematical model how to measure quality of LiDAR point cloud. This article discusses the point distribution and LiDAR data quality defining the image resolution of point cloud. It can be interesting for experts in civil geospatial intelligence, LiDAR data processing and flight planning.


Introduction
Traditional methods such as field surveying and photogrammetry can yield high accuracy terrain data, but they are time consuming and labour intensive, especially for large area.Airborne Light Detection and Ranging (LiDAR)also referred to as Airborne Laser Scanning (ALS), provides an alternative for high density and high accuracy three dimensional terrain point data acquisition (Liu et al., 2008) measuring shape of surface using laser rangefinder with aircraft platform (see Fig. 1).

Figure 1. Conception of airborne laser scanning
One of the appealing features in the LiDAR output is the direct availability of threedimensional coordinates of points in object space (Habib et al., 2005).The fields of application for LiDAR are very diverse and include generation of digital elevation models (DEMs), 3D-city modelling, forestry management, coastline protection, disaster management, erosion studies, archaeology, monitoring of corridors such as power lines, pipelines, railways and roads.The technology offers short data acquisition time, highly detailed detection of the earth surface and the accuracy fits the needs of many applications (Zhi and Zhong, 2008).
Acquisition costs are related to coverage and LiDAR pulse density (Baltsavias, 1999), (Lovell et al., 2005).LiDAR can already be less expensive or comparable to image data analysis when the entire cost of research (data acquisition, analysis, personnel, software, etc.) is considered, especially for large areas.Since airborne LiDAR acquisition over large areas is costly, LiDAR missions necessarily involve tradeoffs.Geospatial intelligence specialists must decide between cost and coverage; between cost and LiDAR pulse density, between density and coverage; and at the same time maintain accuracy (Jakubowski, 2013).
The LiDAR data with high density is considered as qualitative raw material for analysis, but it is not very correct decision, because, all remote sensing systems only take samples (points in LiDAR case) on the object's surface or sometimes volume, with limited spatial resolution in all dimensions and with finite accuracy.The information content of the acquired point cloud data depends explicitly on the actual distribution of the samples in 3D space (Ullrich, 2013).
Users have quite naturally come to equate higher density imagery with higher accuracies and higher quality.In airborne topographic LiDAR, this is not entirely the case.For many typical LiDAR collections, the maximum accuracy attainable, theoretically, is now limited by physical error budgets.Increasing the density of points does not change those factors.That said, high density LiDAR data are usually of higher quality than low density data, and the increased quality can manifest as apparently higher accuracy (ASPRS, 2015a).Good explanation of misleading point density is provided by Ullrich using modelling and simulation of laser scanning to describe the importance of point distribution (Ullrich, 2013).
Currently, there are developed guidelines for LiDAR data acquisition.For example, ASPRS standard "ASPRS Positional Accuracy Standards for Digital Geospatial Data" specifies horizontal and vertical accuracy for digital data and their verification (ASPRS, 2015a), however, it does not associate product accuracy with the ground sample distance (GSD) and does not cover classification accuracy of thematic maps (ASPRS, 2015b).USGS "LiDAR Base Specification" defines quality level considering two parameters: aggregate nominal pulse spacing (m) and aggregate nominal pulse density (pls/m 2 ), (Heidemann, 2014); and the selection must be completed focusing on minimal acceptable requirements for satisfaction of business uses (WEB).
The dependence between point density and business uses has been researched before.One robust research is completed for forest management (Jakubowski et al., 2013), the authors systematically investigated the relations between pulse density and the ability to predict several commonly used forest measures and metrics at the plot scale.Their research is based on the similar problem with false perception of users: "As technology improves, there has been a trend to acquire data at increasingly higher densities, reflecting the belief that this will improve accuracies" (Jakubowski et al., 2013).The similar researches are completed for other business uses too.For example, researches related with urban classification are depicted in publications (Tomljenovic and Rousell, 2014) and (Kodors and Kangro, 2016), where authors investigate the relations between point density and ability to extract a building from a point cloud.Other research is related with suitability of different LiDAR point densities for channel network extraction (Pirotti and Tarolli, 2010).
The goal of study is to define the mathematical model how to measure quality of LiDAR point cloud.The research is completed by descriptive and monographic method.The main discussion objects are quality of LiDAR point cloud, its impact into research quality, descriptive parameters of LiDAR point cloudpoint density and point spacing.

Geospatial intelligence and research quality
If LIDAR data are used by geospatial intelligence, the life cycle of analysis has next structure (see Fig. 2): 1. Requirements definition and project planning are initiated by decision to obtain new information with a goal to solve some problem or to achieve new opportunities.The outcome of this stage is a requirement specification of LiDAR point cloud for LiDAR survey; 2. Airborne laser scanning and LiDAR data acquisition are completed to obtain LiDAR point cloud (geoinformation and raw material for research).This stage is related with flight planning and LiDAR point cloud production, that includes noise filtering and georeferencing of points; 3. LiDAR data processing is accomplished with a goal to obtain the secondary product for geospatial analysisit can be DEM or thematic maps etc. Speaking exactly about LiDAR data, result can be classified point cloud; 4. Geospatial analysis is the main stage, which applies LiDAR data or its secondary product to obtain new information to achieve the goal of research considering project requirements; 5. Decision-making is the final stage, when experts evaluate obtained information and current geospatial situation to make decisions about next strategy, actions or researches.

Figure 2. Life cycle of geospatial intelligence using LiDAR data
According to the life cycle of geospatial intelligence (see Fig. 2), the final quality depends on the quality obtained in each stage; this relation is depicted in Eq.1: where nthe stage of life cycle;  the final quality in stage n;  the set of possible methods, technologies, etc. in stage n to process the data or products obtained in the previous stage of research.
The described life cycle with five stages can be simplified to three-stage model (see Fig. 3).If previous model is process oriented, three-stage model is process and component oriented (see Table 1).

Processes Components
Requirements and project planning Conception -goal, vision, idea, requirements, project plan, hypothesis, existing problem, opportunities, etc.

Geospatial data acquisition and processing
Resources -data, technologies, methods, devices, drones, software, systems, budget, databases, etc.

Research and decisionmaking
Knowledge workers -experts, researchers, specialists; their knowledge, experience and skills, etc.
Other advantage of three-stage model is clear display, that quality of research is directly dependant on the quality of data, methods and systems used by experts (see Fig. 3), that is defined in Eq.2: where q is quality of research and decision, dused data, pmethods and systems used to process data.
The life cycle of project includes the collaboration of different expert groups and different subtasks.The quality meaning of LiDAR data and the list of measured and controlled features depend on a product type and the goal of experts.

Experts' collaboration in project
Relations among experts and specialists, who work with LiDAR data, are depicted in Fig. 4. Geospatial intelligence specialists analyze geospatial information and provide obtained information for a decision-making community.They take a central role in a projectthey must understand requirements and methods to achieve a defined goal managing and controlling other groups of project.
Traditionally raw LiDAR data are not applicable for geospatial analysis, usually they are initially processed to get the secondary product like DEMs and digital surface models (DSMs), 3D city models, thematical maps etc.; which are used to perform flood modelling, city planning, change detection and other tasks by geospatial intelligence specialists.

Figure 4. Knowledge exchange and collaboration among different experts
For any spatial data acquisition system, it is essential to ensure that quality assurance procedures are completed and a final product meets the end users' needs and accuracy requirements (Saylam, 2009).The secondary data or product are provided by experts of data processing.Depending on task and applied methods, some features are more essential, that must be considered compliting spatial data acquisition.For example, flood-risk maps (Webster et al., 2004) can be generated using DSM, therefore only laser scanning of terrain is sufficient for research, but bathymetric LiDAR (Mandlburge et al., 2015) may be more suitable for river hydrodinamic system modelling and flood simulation.Speaking about object classification and spatial statistics, the research of Jakubowski et al., related with extraction of forest structure metrics, has showed that methods require a threshold density to achieve reasonable accuracy, but that does not benefit significantly from very high LiDAR density (Jakubowski et al., 2013).Therefore, knowledge about used image processing methods and their specifics must be provided to flight planning specialists as a requirement specification with LiDAR data parameters.This means, that a requirement specification and defined parameters of requested LiDAR data construct a basis for a quality management specifying what is accepted as quality parameters, their quantity, methods and tools to measure them.
The flight planning is complex process and research itself.Initial review of a survey area is essential for effective survey planning using existing imagery or maps for a complete assessment (Habib et al., 2009), (Saylam, 2009).Local weather patterns, sudden topography changes, existing water basins, and general terrain cover may impact flight planning and should properly be examined prior to flight planning.Large survey areas (>200 km 2 ) need multiple flight plans (e.g. 4 x 50 km 2 ) that should accommodate 20-30% overlapping to prevent loss of data in case of a system failure/reset and easier data processing/handling (Saylam, 2009), where good engines and software are essential for flight planning experts.All these equipment is provided by technical experts, who develop image processing oriented laser scanners (Schnadt and Katzenbeißer, 2004), simulation and data acquisition models (Habib et al., 2008), special targets and related models to improve accuracy and matching procedures (Zhi and Zhong, 2008), etc.

Quality meaning among different experts
The quality of product is comparative quantity, which depends on the object of analysis and selected features for comparison.Speaking about LiDAR point cloud, the meaning of quality depends on the expert group and problem solved by them, which are changing through the life cycle of geospatial intelligence.
Technical expertstheir goal is to improve the measurement system of laser scanner.The main features of comparison are the vertical and horizontal accuracy of point coordinates obtained by laser scanner.In geodesy, the term quality is mostly used synonymous to accuracy.Accuracy is the degree to which information on a map or in a digital database matches true or accepted values.Accuracy is an issue pertaining to the quality of data and the number of errors contained in a dataset or map (Zhi and Zhong, 2008).Their objects of research are the system of laser scanning, its mathematical models, types and sources of errors, methods and methodologies to improve measurement accuracy (Csanyi and Toth, 2007), (Habib et al., 2008), (Zhi and Zhong, 2008).The technological improvement of laser scanner is developed considering business uses too.For example, Schnadt and Katzenbeißer presented airborne fiber scanner in 2004 year accentuating its advantages for building extraction and for digital elevation model generation of forestlands describing technological solution, which better penetrates vegetation and records facades of buildings (Schnadt and Katzenbeißer, 2004).The high point density and small point spacing would be impossible without all these researches.
Data processing experts are mainly associated with digital elevation and surface modeling, which are applied for flood risk and floodplain estimation (Webster et al., 2004), (Casas et al., 2006), (Mandlburge et al., 2015) or road planning (Pereira and Janssen, 1999).Different experts mention the quality of LiDAR point cloud to generate precise DEMs and DSMs shortly formulating it in high point density (Karel et al., 2006), (Liu et al., 2008), (Zhi and Zhong, 2008).However, some authors more openly discuss the quality of point cloud showing into point distribution factor, which must be considered together with point number increment:  "The final data quality achieved depends on the accuracy of the survey equipment and the density and distribution of the measured points", (Casas et al., 2006);  "Areas where the distance to the nearest data point is larger than a certain threshold may be declared as extrapolation regions", (Karel et al., 2006);  "The number of grid cells should be roughly equivalent to the number of terrain data points in covered area.",(Liu et al., 2008);  "Rule of thumb is to collect data with 1 meter point spacing for each ½ meter contour interval (e.g. 4 meter point spacing is required to create 2 meter contour interval)", (Saylam, 2009);  "Requirements for airborne LiDAR surveys usually specify point density in points (or measurements) per square meter.However, this metric of points per square meter does not provide information about the actual spatial point distribution on the target area which constitutes the true quality of the data", (Ullrich, 2013).
Quality of DEMs and DSMs is measured as height difference between points of generated model and control points measured using more precise method calculating RMSE (root mean square error) to express the amount of quality (Uddin, 2002), (Karel et al., 2006), (Saylam, 2009), (Wang et al., 2010), (Remondinoa et al., 2011).Currently, there is ASPRS standard "ASPRS Positional Accuracy Standards for Digital Geospatial Data" (www.asprs.org/Standards-Activities.html), which specifies horizontal and vertical accuracy for digital data and their verification (ASPRS, 2015a), however, it does not specify the best practices and methodologies needed to meet the accuracy thresholds stated by it and does not cover classification tasks and business uses (ASPRS, 2015b).
Elimination of all elevated features above ground may be considered as classification.Main categories of classification are bare earth and low grass, high grass and crops, brush lands and low trees, forests and urban areas.Various other main or subcategories may be created depending on the project requirements (Saylam, 2009).For example, the error matrix is used to evaluate the precision of ground and non-ground point classification to evaluate generated DEM quality (Wang et al., 2010).
The clear classification tasks are point density sensitive too, for example, echo-based indexes are calculated as proportion between pulses number and their echo number (Chehata et al., 2009); therefore ground samples must have sufficient pulse number to use this method, that can be satisfied only considering steady point distribution.Other examples are segment-based voxel methods (Wang et al., 2016) and grid-based methods (Kodors et al., 2014), which are sensitive to image resolution strongly related with point distribution too.If the distribution of points forms the dense groups, segmentation is not possible due to high number of holes in voxel cube or in 2D grid.The traditional methods to define the classification quality are error matrix/confusion matrix, total accuracy and Cohen's Kappa coefficient (Chehata et al., 2009), (Kodors et al., 2014), (Kodors and Kangro, 2016), (Wang et al., 2016).
Flight planning experts' duty is to collect LiDAR data, which satisfy a requirement specification.Accurate flight planning for airborne LiDAR survey is essential for a total quality assurance experience (Saylam, 2009).The mission (flight and data acquisition) is planned in the lab with dedicated software, starting from the area of interest (AOI), the required ground sample distance (GSD) or footprint, and knowing the intrinsic parameters of the mounted digital camera.Thus fixing the image scale and camera focal length, the flying height is derived (Uddin, 2002).Their one tool is good devices, software and methods developed by technical experts (Schnadt and Katzenbeißer, 2004), (Csanyi and Toth, 2007), (Habib et al., 2008), (Zhi and Zhong, 2008), and another is their personal experience summarized in the rules of thumb, statistical data and methodologies (Saylam, 2009).The quality of product / service is tradeoff among cost, surveyed area and achieved data quality considering a requirement specification (see Fig. 5).

Figure 5. Tradeoff between cost, point density and area surveyed
Geospatial intelligence specialists and decision-making community: the fields of LiDAR application are very diverse.The tasks completed by geospatial intelligence are itself complex researches mainly related with acquisition of statistical informationgeospatial object identification, counting and change detection; or with simulation and propagation of some geospatial processes like optimal location selection for wind and solar generators, flood simulation, etc.; or with geospatial engineering solution development, for example, bridge, road or other man-made construction planning.Decision-making community are managers, their product is economic, politic or strategic solution.Therefore summarized quality meaning related with LiDAR data can be the tradeoff between investments into project and obtained final product.

Point density and point spacing
Historically the term nominal pulse spacing (NPS) has been in use across the industry since its beginnings; the counterpart term, nominal pulse density (NPD), came into use when collection densities began to fall below 1 pls/m 2 (Heidemann, 2014).The nominal sampling frequency actually determines the ability and quality in object detection, surface reconstruction, modeling and much more (Ullrich, 2013).However, there is no standardized method on how NPS and NPD should be derived (Naus (b)).
The LiDAR point cloud is only image of real world, which has infinite fractal structure.The fractal behave of boundary called coastline paradox is also applicable to relief shape constructing its continuous signal shape (see Fig. 6).The rules like "The number of grid cells should be roughly equivalent to the number of terrain data points in covered area."(Liu et al., 2008) or "Rule of thumb is to collect data with 1 meter point spacing for each ½ meter contour interval (e.g. 4 meter point spacing is required to create 2 meter contour interval)" (Saylam, 2009) are strongly correlated with Nyquist-Shannon sampling theorem: "If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart." The clearly reformulated sampling theorem for LiDAR case is mentioned by Ullrich (Ullrich, 2013): "The sampling frequency must be at least twice the highest spatial frequency of the object to be sampled, in order to be able to reconstruct the object or at least to do an object recognition/detection routine." According to fractal behavior, the precise curve using laser scanning can be found, only if distance between two points (samples) approaches to zero (see Eq.3): where Lmeasured length of curve, Nnumber of discrete lines, constant distance between two points (length of discrete line),  real curve of relief.In limit  → 0, the measure L becomes asymptotically equal to the length of the real curve and is independent of  (Ribeiro and Miguelote, 1998).
The similar relation is true for surface and volume (see Eq.4 and Eq.5): = () 3 →0 ⃗⃗⃗⃗⃗⃗⃗⃗⃗   . (5) These three projection types of LiDAR point cloud (see Fig. 7) can be considered to group the image processing algorithms into next categories:  Linear methods, which analyze LiDAR cross-sectionsstrips and point lines (Hu and Ye, 2013), (Hosseini, 2014);  Planar methods (grid-based methods), which construct projection of LiDAR point cloud using either height parameter (Kodors et al., 2014), indexes like echo-rate and variance (Chehata et al., 2009), combination of pixel features extracted from LiDAR and hyperspectral images (Jahan and Awrangjeb, 2017);  Volume methods, which process filled and empty voxels (with and without points), (Wang et al., 2016), voxel features (Plaza-Leiva et al., 2017).The requirement "to satisfy regular point distribution" is mentioned in USGS "LiDAR Base Specification": "The spatial distribution of geometrically usable points will be uniform and regular.Although LiDAR instruments do not produce regularly gridded points, collections shall be planned and executed to produce an aggregate first return point cloud that approaches a regular lattice of points, rather than a collection of widely spaced, high-density profiles of the terrain" (Heidemann, 2014).The use of denser data for complex surface representation improves the accuracy of the derived surface at locations between the LiDAR measurements (as each reach between points is shorter), (ASPRS, 2015a).
To sum up, the LiDAR point cloud has the similar image resolution like orthophoto or satellite images.The smaller geospace unit is sensed, the better resolution is obtained.The example of 2D point cloud projection with resolution 0.5 meter is depicted in Fig. 8, where the black pixels are empty cellswithout points.Therefore, if point density is improved without consideration of point distribution (point spacing), the new level of image resolution is not satisfied due to high number of empty cells (see Fig. 9).Of course, algorithms can interpolate the values of empty cells, but it is category of image processing quality and precision depends on the distance among samples.More ground points will result in less interpolation between points and improved surface definition, because more characteristics of the actual ground surface are being measured, not interpolated.The use of more ground points is more critical in variable or complex surfaces, such as mountainous terrain, where generalized interpolation between points would not accurately model all of the changes in the surface (ASPRS, 2015a).Traditionally, the point density is calculated as number of points per area (see Eq.6), but it does not provide some information about point distribution.
where dpoint density, pnumber of points, Aarea of ground sample.
Point density is related to point spacing and it is therefore logical that the closer a group of points are to one another, the higher the point density (Naus (b)).The parameter "point spacing" is used to express the distribution of points; it is based on Delaunay triangulation of the points on the ground.The draft proposes to attribute to each point a metric by taking the average length of all the edges connecting the specific point to all its neighbouring points (see Fig. 10a, Eq.7), (Ullrich, 2013), (Naus (b)).
where s kpoint spacing of ground point k (DEM point),   ⃗⃗⃗⃗⃗vector from point k to its neighbouring point i, nnumber of neighbouring points.
Actual or real-world data points show variable spacing with values that are smaller or larger than a nominal spacing specified in a LiDAR project (Naus (b)).For the subsequent considerations, Ullrich suggests to use the worst case for nominal point spacing by taking the maximum of all edges (see Eq.8) instead of the average (see Eq.7).This ensures that gaps in the sampling of the ground are accounted for and the metric is not reduced by very dense or even overlapping measurements within a single scan line, while having a wide line spacing (see Fig. 10b), (Ullrich, 2013).Comparing with orthoimages, this maximal point spacing can be associated with image resolution, because it defines the minimal regular step between pixels (samples).
where spoint spacing of ground point k (DEM point), ∆distance between two points, P kset of neighbouring points near p k .
The reviewed point density is strongly related with model of flat surface of earth, while the real shape of surface is rough.The ASPRS "LiDAR Density and Spacing Specification v1.0" (Naus (a)) provides other model for point density calculation considering uneven model, that is based on Voronoi cells (see Fig. 10c, Eq.9), (Naus (b)).
where d'nominal point density, A'polygon area of Voronoi cell.Nominal point density based on Voronoi cells is more correct comparing with projected point density, because mountain relief with steep slopes has greater area than flatlands.Turning to the point spacing, USGS "LiDAR Base Specification" defines nominal pulse spacing as the square root of the average area per point (the area of the polygon divided by the number of points it contains), (Heidemann, 2014), that is inverse of point density (see Eq.10).
Combining nominal pulse spacing (Eq.10) with nominal point density (Eq.9), if laser scanning is completed with regularly gridded points, the Voronoi cell is equal to ground sample area (see Eq.11).
where GSDground sample distance.
Eq.11 clearly shows that nominal pulse spacing is equal to ground sample distance or image resolution.This means that ground sample distance defines image resolution of LiDAR point cloud, where ground sample distance must be associated with flat surface in volume neither with flat earth model (see Fig. 11), but pixelwith Voronoi cell.That corresponds to Ullrich suggestion "use the worst case for nominal point spacing by taking the maximum of all edges" (Ullrich, 2013).Speaking about planar projection, it will have equal or smaller ground sample distance (see Fig. 12).According to fractal behavior, if image resolution is not restricted by technical factor, the user has infinite choice of resolutions.The first rule of choice is simple: "If monitoring object can be detected and depicted in point cloud, there is possibility to extract and to classify it" (WEB), (Kodors and Kangro, 2016).The second rule is "There must be depicted sufficient number features of objects to classify and to extract it", that is hard choice without robust research.The research of Jakubowski et al. has showed, that image processing methods require a threshold density to achieve reasonable accuracy, but that do not benefit significantly from very high LiDAR density (Jakubowski et al., 2013).Considering thinning model with regular grid, this conclusion can have other point of viewthere is minimal image resolution to satisfy reasonable accuracy.For example, another research has showed the clear relationship between classification accuracy and ground sample distance (resolution), while point number per ground sample does not influence into accuracy, only into shape precision (Kodors and Kangro, 2016).

Conclusion
The users of LiDAR data have false perception, that point density is the main parameter of LiDAR point cloud quality.However, LiDAR only provides the samples of the object's surface with the limited spatial resolution, which is directly related with the point spacing (see Eq.11).
There are different mathematical models how to measure point spacing (see Eq.7-8, 10).According to the author, the most suitable and robust method is the method suggested by Ullrich -"use the worst case for nominal point spacing by taking the maximum of all edges" (Eq.8), because it ensures that points are evenly distributed in all directions.
The point density (Eq.6) is the weakest model, because point density can be high, while there are wide holes between lines (see Fig. 10b).The model with Voronoi area (Eq.9) does not solve this problem, because point density can increase, while the distance between lines is constant (see Eq.12).
where ab is the area of rectangle (A), adistance between lines, bpoint spacing in line (see Fig. 10b).
This research has identified the mathematical model how to measure quality of LiDAR point cloud.The future research will entail the analysis of dependence between building detection quality (Kappa coefficient / total accuracy) and point spacing.The analysis will be completed using the simulation of laser scanning, which provides DSM with different point spacing.There is special interest to the case of point density increase with constant point spacing.Speaking about spatial resolution, the accuracy of building borders can be used to measure the relationship between reconstruction precision and point spacing.

Figure 3 .
Figure 3. Three-stage model of geospatial intelligence

Figure 7 .
Figure 7. Illustration of 1D, 2D and 3D projections of LiDAR point cloud based on fractal structure

Figure 10 .
Figure 10.Models for point density and point spacing: a) distance to neighboring point; b) dense point distribution with wide line spacing; c) area of Voronoi cell

Figure 11 .Figure 12 .
Figure 11.Illustration of plane in 2D and 3D space as association with flat map (a) and crumpled map (b)

Table 1 .
Processes and components