
Decomposition of small-footprint full waveform LiDAR data based on generalized Gaussian model and grouping LM optimization


Published 16 February 2017 © 2017 IOP Publishing Ltd
Citation: Hongchao Ma et al 2017 Meas. Sci. Technol. 28 045203. DOI 10.1088/1361-6501/aa59f3


Abstract

Full waveform airborne Light Detection And Ranging (LiDAR) data contain abundant information that may overcome some deficiencies of the discrete point cloud data provided by conventional LiDAR systems. Processing full waveform data to extract more information than coordinate values alone is of great significance for potential applications. The Levenberg–Marquardt (LM) algorithm is the traditional method for estimating the parameters of a Gaussian model when Gaussian decomposition of full waveform LiDAR data is performed. This paper employs the generalized Gaussian mixture function to fit a waveform, and proposes a grouping LM algorithm to optimize the parameters of the function. It is shown that the grouping LM algorithm overcomes common drawbacks of the conventional LM in parameter optimization, such as the final results being influenced by the initial parameters and possible interruption of the algorithm caused by non-numerical elements in the Jacobian matrix. The precision of the point cloud generated by the grouping LM is evaluated by comparing it with the point clouds provided by the LiDAR system and generated by the conventional LM. Results from both simulated and real data show that the proposed algorithm can generate a higher-quality point cloud, in terms of point density and precision, and can extract other information, such as echo location and pulse width, more precisely as well.


1. Introduction

Airborne light detection and ranging (LiDAR) is an active remote sensing technique for acquiring 3D geospatial data over the Earth's surface [1–3]. It integrates the global positioning system (GPS), an inertial navigation system, and a laser scanner to generate a point cloud, a discrete dataset encoding 3D coordinate values under a given geographic coordinate system [4]. The point cloud can be processed for further applications such as topographic mapping, power line detection, forestry parameter retrieval, and many others. The discrete point cloud was the main dataset provided by a LiDAR system until 2004, when the full waveform digitization technique was applied to the small-footprint LiDAR manufactured by RIEGL [5, 6]. The so-called full waveform data generated by such a system are believed to have the following advantages compared to a traditional discrete point cloud [7–12]. Firstly, end users can process and interpret the data in a more controllable manner. Discretization of the reflected laser energy used to be a proprietary technique of the equipment manufacturer, leaving a black box to data users where the discretization process was concerned; this situation did not change until the advent of full waveform data. Secondly, it becomes possible to obtain a denser and more precise point cloud, since end users can adopt more advanced algorithms to decode the point cloud from the full waveform data. Finally, the vertical structure of plants and other surface targets can be sensed by analyzing the full waveform data, which has potential usefulness for further applications, especially in the forestry industry. The recent LAS format (version 1.3 and later) supports simultaneous storage of point cloud and waveform data; therefore, end users not only have the point cloud data generated by the system at hand, but can also regenerate the point cloud from the waveform data using an algorithm of their choosing.

Some parameters, mainly the peak locations of a waveform and the width of a given pulse component, have to be extracted from the waveform before the point cloud is generated from it. Many algorithms have been developed for this purpose in the past decade. They can be classified into three categories: the threshold method, the deconvolution method, and the waveform decomposition method [8, 13]. The center of gravity detection method [14] and the averaged square difference function (ASDF) method [15] belong to the first category. Good results can be achieved by methods in this category if the number of pulses is small, but their ability to detect echo pulses automatically is limited. The deconvolution method [16, 17] assumes that the recorded waveform is the convolution of the transmitted pulse and the response of the illuminated surface, and applies a deconvolution process to extract the attributes of the illuminated surface from knowledge of the transmitted pulse and the received waveform. This method can improve echo detection, but is not able to retrieve parameters describing the waveform such as peak locations, pulse width, etc [18]. In the waveform decomposition method, a waveform is viewed as the summation of several returns from the objects distributed along the path the emitted laser beam traveled. This method aims at extracting the individual returns by decomposing the recorded waveform. Wagner et al [19] proved in theory that the waveform generated by RIEGL systems can be fitted by a Gaussian mixture model. Taking the shape of a waveform into account, and based on Wagner's Gaussian model, Chauve et al [20] proposed the generalized Gaussian model and applied the LM algorithm for parameter optimization. Ma Hongchao et al [21] reconsidered the physical and mathematical meaning of the Expectation-Maximization (EM) algorithm and modified it to optimize the parameters of the Gaussian mixture model. Yuchu et al [22] proposed a gradual progressive decomposition strategy to address the difficulty of determining the number of Gaussian components. Xu et al [23] proposed a full-waveform echo decomposition method in which the sorted echo components are added into the decomposition model one by one according to their order; for each addition, the parameters of all echo components already in the decomposition model are iteratively renewed. In general, the waveform decomposition method can produce more points, and can retrieve additional attributes of the waveform such as peak location, amplitude, and pulse width, which are valuable features for further analysis of waveform data.

The Levenberg–Marquardt (LM) algorithm is one of the most widely used optimization algorithms. It outperforms simple gradient descent and other conjugate gradient methods in a wide variety of problems [24]. It was first applied to waveform decomposition by Hofton et al [9], and was demonstrated on data collected by the airborne laser vegetation imaging sensor (LVIS). A potential issue arising in practice is that the algorithm may be interrupted when non-numerical entries occur in the Jacobian matrix. This can happen when the generalized Gaussian model is applied and certain initial parameter estimation methods are used; see section 2.1 for details. Little research has been conducted to solve this problem, leaving room for improvement. In this paper, a grouping process is introduced into the conventional LM algorithm to improve results when non-numerical entries occur in the Jacobian matrix, and its performance is evaluated by comparing the point cloud generated by the proposed method with those generated by the LiDAR system and by the conventional LM algorithm.

The remainder of the paper is organized as follows: in section 2, the basic principles of the conventional LM algorithm for parameter optimization of the generalized Gaussian model are described. A numerical example is provided to demonstrate the failure of the algorithm when the Jacobian matrix contains non-numerical entries. The principles of the grouping LM are then expounded, together with detailed steps suitable for computer coding. Experimental results and comparisons are given in section 3. Finally, conclusions are drawn in section 4.

2. Gaussian decomposition based on grouping LM algorithm

For a small-footprint LiDAR system, each laser output pulse shape is assumed to be Gaussian with a specific, calibrated width. The recorded waveforms are therefore the convolution of the impulse response of the LiDAR system with that of the reflection surface, and the impulse responses of most LiDAR systems and most reflection surfaces can be approximated as Gaussian functions. It has been demonstrated that more than 98% of the waveforms observed with the RIEGL system can be correctly fitted with a sum of Gaussian functions [25]. Gaussian functions have also been widely applied to model the echoes of large-footprint LiDAR systems such as the geoscience laser altimeter system (GLAS) installed on the ice, cloud, and land elevation satellite (ICESat).

The returned signal, however, is not always Gaussian, especially in complex environments such as forested areas. A generalized Gaussian model was first used in 2007 by Chauve and Mallet [26] with the expectation of improving the fitting of complex waveforms. The model they applied has the following analytical expression:

${{f}_{j}}(x)={{A}_{j}}\exp \left(-\frac{|x-{{\mu}_{j}}{{|}^{\alpha _{j}^{2}}}}{2\sigma _{j}^{2}}\right)$   (1)

where ${{A}_{j}}$ is the pulse amplitude, ${{\sigma}_{j}}$ its width, and ${{\mu}_{j}}$ is the function mode of the $j\text{th}$ echo. ${{\alpha}_{j}}$ is the so-called shape index: when ${{\alpha}_{j}}=\sqrt{2}$ , ${{f}_{j}}(x)$ is the conventional Gaussian function. A given waveform can be modeled with a corresponding ${{\alpha}_{j}}$ .
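As a quick sanity check on the model, the following sketch (our own illustration with hypothetical parameter values, based on the analytical form above) evaluates a single generalized Gaussian component and verifies that setting ${{\alpha}_{j}}=\sqrt{2}$ recovers the conventional Gaussian:

```python
import numpy as np

def generalized_gaussian(x, A, mu, sigma, alpha):
    # Generalized Gaussian component: A * exp(-|x - mu|^(alpha^2) / (2 sigma^2))
    return A * np.exp(-np.abs(x - mu) ** (alpha ** 2) / (2.0 * sigma ** 2))

x = np.linspace(0.0, 20.0, 201)
# With alpha = sqrt(2), the exponent |x - mu|^2 reproduces the
# conventional Gaussian shape.
g_gen = generalized_gaussian(x, A=1.0, mu=10.0, sigma=2.0, alpha=np.sqrt(2.0))
g_std = np.exp(-((x - 10.0) ** 2) / (2.0 * 2.0 ** 2))
print(np.allclose(g_gen, g_std))  # True
```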

When the generalized Gaussian mixture model is adopted, the waveform of a return echo can be expressed as:

$f(x)=\sum\limits_{i=1}^{N}{{{f}_{i}}(x)}=\sum\limits_{i=1}^{N}{{{A}_{i}}\exp \left(-\frac{|x-{{\mu}_{i}}{{|}^{\alpha _{i}^{2}}}}{2\sigma _{i}^{2}}\right)}$   (2)

where subscript $i$ denotes the $i\text{th}$ Gaussian component of the return echo. The main steps of waveform decomposition are: data pre-processing (noise filtering and data smoothing), parameter initialization (estimating the number of Gaussian components and the initial parameters for each of them), and parameter optimization (finding the optimal estimates of the number of components and of the parameters of each component); the latter two are the most important. First, the number of Gaussian components comprising a given waveform and the corresponding parameters (i.e. peak position, width, etc) need to be estimated. Second, an optimization procedure is performed to find the best estimates of the Gaussian parameters [9, 27]. Many existing methods can be applied to estimate the number of components: minimum message length, partition coefficient, Akaike's information criterion, Mindist, etc [28]. Initial parameters are usually estimated by methods such as local maxima, the center of gravity (COG), zero-crossing of the first derivative, etc [19, 27, 29]. The local maxima method is applied for initial peak position estimation in this paper, due to its high speed and ease of coding; the total number of local maxima is taken as the number of Gaussian components. However, the initial peak positions estimated by local maxima are all integers, which causes non-numerical columns to occur in the Jacobian matrix during the LM optimization. In what follows, section 2.1 describes the basic principles of the conventional LM algorithm and its limitations when non-numerical columns of the Jacobian matrix are taken into account; section 2.2 provides a detailed description of the grouping LM and its merits in overcoming these limitations.
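The local-maxima initialization just described can be sketched as follows (the function name and the simple neighbor test are our own illustration, not the authors' code):

```python
import numpy as np

def init_by_local_maxima(w):
    # w: a denoised, smoothed waveform sampled at integer times.
    # Each local maximum is taken as one Gaussian component; its
    # (integer) sample index becomes the initial peak position mu0.
    peaks = [i for i in range(1, len(w) - 1)
             if w[i] > w[i - 1] and w[i] >= w[i + 1]]
    components = []
    for i in peaks:
        A0, mu0 = w[i], float(i)
        # Half width sigma0: half the distance between the two
        # half-amplitude crossings around the peak.
        lo = i
        while lo > 0 and w[lo] > A0 / 2.0:
            lo -= 1
        hi = i
        while hi < len(w) - 1 and w[hi] > A0 / 2.0:
            hi += 1
        components.append({"A": A0, "mu": mu0, "sigma": (hi - lo) / 2.0,
                           "alpha": 1.5})  # initial shape index
    return components
```

Note that every `mu0` produced this way is an integer sampling index, which is exactly the situation that later produces $\log (0)$ entries in the Jacobian matrix.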

2.1. Principles of the classical LM algorithm and its limitation

The classical LM algorithm is a blend of the Gauss–Newton method and the gradient descent method [30]; it has been shown that gradient descent and Gauss–Newton iteration are complementary in the advantages they provide. Assume a set of observations $\boldsymbol{X}\left(\boldsymbol{X}\in {{R}^{n}}\right)$ . A mathematical model $f\left(\,\boldsymbol{p}\right)$ , where $\boldsymbol{p}\in {{R}^{m}}$ is the parameter vector whose entries need to be estimated, is used to fit the data. A common strategy in optimization is to set an initial vector $\boldsymbol{p}$ and then update it by an iterative process which minimizes $\boldsymbol{\varepsilon }^{{T}}\boldsymbol{\varepsilon }\left(\boldsymbol{\varepsilon }=\boldsymbol{X}-\tilde{{\boldsymbol{X}}}\,\right)$ , where $\tilde{{\boldsymbol{X}}}\,\in {{R}^{n}}$ is the vector whose entries are calculated from $f\left(\boldsymbol{p}\right)$ with the currently updated $\boldsymbol{p}$ . In optimization jargon we are solving the following minimization problem:

$\min\limits_{\boldsymbol{p}}\boldsymbol{\varepsilon }^{{T}}\boldsymbol{\varepsilon }=\min\limits_{\boldsymbol{p}}{{\left(\boldsymbol{X}-f\left(\boldsymbol{p}\right)\right)}^{T}}\left(\boldsymbol{X}-f\left(\boldsymbol{p}\right)\right)$   (3)

LM is one of the algorithms that obtains the optimized parameters by iteration. Set an initial parameter vector $\boldsymbol{p}_{{k}}$ as the current parameter vector and substitute it into $\tilde{{\boldsymbol{X}}}\,=f\left(\,\boldsymbol{p}\right)$ to calculate the estimated values; then use the observations to compute the residual ${{\boldsymbol{\varepsilon}}^{T}}\boldsymbol{\varepsilon}$ . The iteration is performed until the residual $\boldsymbol{\varepsilon }^{{T}}\boldsymbol{\varepsilon }$ is less than a preset threshold $\boldsymbol{\varepsilon }_{{0}}$ , at which point the current $\boldsymbol{p}_{{k}}$ is the final parameter vector. During each iteration the first-order Taylor series expansion of $f\left(\,\boldsymbol{p}\right)$ near $\boldsymbol{p}_{{k}}$ is calculated:

$f\left(\boldsymbol{p}_{{k}}+\boldsymbol{\delta }_{{k}}\right)\approx f\left(\boldsymbol{p}_{{k}}\right)+\boldsymbol{J}_{{k}}\boldsymbol{\delta }_{{k}}$   (4)

where $\boldsymbol{J}_{{k}}=\partial f\left(~\boldsymbol{p}_{{k}}\right)/\partial \boldsymbol{p}_{{k}}$ is the Jacobian matrix evaluated at $\boldsymbol{p}_{{k}}$ , and $\boldsymbol{\delta }_{{k}}$ is the vector that minimizes the following:

$\min\limits_{\boldsymbol{\delta }_{{k}}}{{\left\| \boldsymbol{X}-f\left(\boldsymbol{p}_{{k}}\right)-\boldsymbol{J}_{{k}}\boldsymbol{\delta }_{{k}}\right\|}^{2}}$   (5)
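In the standard LM formulation (not spelled out in the text, so this is the textbook damped-normal-equations form), the minimizer of the linearized objective is obtained by solving $\left(\boldsymbol{J}_{{k}}^{T}\boldsymbol{J}_{{k}}+\lambda \boldsymbol{I}\right)\boldsymbol{\delta }_{{k}}=\boldsymbol{J}_{{k}}^{T}\boldsymbol{\varepsilon }_{{k}}$ ; a one-step sketch:

```python
import numpy as np

def lm_step(J, eps, lam):
    # One damped Levenberg-Marquardt update: solve
    # (J^T J + lam * I) delta = J^T eps.  As lam -> 0 this approaches
    # the Gauss-Newton step; large lam approaches scaled gradient descent.
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ eps)

# For a purely linear model f(p) = J p, a single near-undamped step
# recovers the ordinary least-squares solution.
J = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
eps = np.array([1.0, 2.0, 3.5])
delta = lm_step(J, eps, 1e-12)
print(np.allclose(delta, np.linalg.lstsq(J, eps, rcond=None)[0], atol=1e-6))  # True
```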

The precision of the final results and the speed of convergence of the conventional LM algorithm are influenced by the initial parameters [31]. When the initial values are close to the true ones, the Jacobian matrix is non-singular and the LM algorithm converges quickly. Otherwise, non-numerical elements may appear in the Jacobian matrix, leading to an unexpected interruption of the iteration. To demonstrate this effect, let

$f(x)={{A}_{j}}\exp \left(-\frac{\text{abs}{{\left(x-{{\mu}_{j}}\right)}^{\alpha _{j}^{2}}}}{2\sigma _{j}^{2}}\right)$   (6)

be a Gaussian component, where $\boldsymbol{p}=\left\{{{\alpha}_{j}},{{\mu}_{j}},{{\sigma}_{j}},{{A}_{j}}\right\}$ is the parameter vector, $x$ is the sampling time, abs($x-{{\mu}_{j}}$ ) is the absolute value of ($x-{{\mu}_{j}}$ ), and ${{A}_{j}}$ is the amplitude. The partial derivative with respect to ${{\alpha}_{j}}$ is $\frac{{{A}_{j}}\centerdot {{\alpha}_{j}}\centerdot \text{abs}{{\left({{\mu}_{j}}-x\right)}^{\alpha _{j}^{2}}}\centerdot \log \left(\text{abs}\left({{\mu}_{j}}-x\right)\right)}{\sigma _{j}^{2}\centerdot \exp \left(\frac{\text{abs}{{\left({{\mu}_{j}}-x\right)}^{\alpha _{j}^{2}}}}{2\sigma _{j}^{2}}\right)}$ , whose domain by definition is $\left\{{{\mu}_{j}}\ne x\right\}$ . However, in the conventional LM optimization $x$ can take any value, including ${{\mu}_{j}}$ . Therefore an element of $\log (0)$ will occur in the Jacobian matrix when ${{\mu}_{j}}=x$ , causing the algorithm to fail. This failure is caused both by the choice of mathematical model (the phenomenon does not occur if a conventional Gaussian function is adopted, that is, if the shape index is fixed at $\sqrt{2}$ ) and by the initialization of the parameters (when the initial value of ${{\mu}_{j}}$ equals $x$ ). There is no specific treatment for this problem in the existing literature. A common remedy is to modify the initial values of the parameters, but this treats all variables equally and is not expected to be effective; furthermore, if the initial values deviate far from the true values, the algorithm is unlikely to converge. In fact, compared with the other parameters, the temporal difference ${{\mu}_{j}}-x$ is the most likely source of the non-numerical values in the Jacobian matrix when ${{\mu}_{j}}$ is assigned an integer. If the parameters are divided into distinct groups and the group containing ${{\mu}_{j}}$ is optimized first, with the parameters in the other groups treated as constants, then the optimized ${{\mu}_{j}}$ is unlikely to be an integer; the process is then iterated until the convergence criteria are satisfied. This is how the grouping LM works.
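The failure mode is easy to reproduce numerically. The sketch below (our own illustration) evaluates the derivative expression above at an integer sampling time equal to ${{\mu}_{j}}$ , where the factor $0^{\alpha _{j}^{2}}\centerdot \log (0)$ becomes `0 * (-inf) = nan`:

```python
import numpy as np

def df_dalpha(x, A, mu, sigma, alpha):
    # Partial derivative of the generalized Gaussian with respect to alpha,
    # as written in section 2.1; it is undefined at x == mu because of
    # the log(abs(mu - x)) factor.
    d = np.abs(mu - x)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (A * alpha * d ** (alpha ** 2) * np.log(d)
                / (sigma ** 2 * np.exp(d ** (alpha ** 2) / (2 * sigma ** 2))))

# Integer initial peak position mu = 7 coincides with the sample x = 7:
print(df_dalpha(7.0, 1.0, 7.0, 2.0, 1.5))  # nan  (from 0 * -inf)
print(df_dalpha(8.5, 1.0, 7.0, 2.0, 1.5))  # a finite value
```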

2.2. Grouping LM

Bearing the abovementioned problem in mind, this paper proposes an improved LM algorithm: the grouping LM optimization. To employ this algorithm, the full waveform parameters are first grouped according to the definition domains of the elements in the Jacobian matrix. The parameters in each group are then sequentially optimized by the LM algorithm, ordered by the number of constraints on each domain from small to large, rather than optimizing all of the parameters at once. While the parameters in one group are being computed, the remaining parameters are treated as constants; once the parameters in the first group have been optimized, those in the next group are optimized in turn. Non-numerical Jacobian entries can be effectively avoided in this way. To demonstrate, equation (6) in section 2.1 is revisited. It should be noted that a full waveform is not a mathematically continuous curve, but rather a broken line in the 2D coordinate system spanned by sampling time versus amplitude. The parameter $x$ in equation (6) is an integer representing the sampling time. If the group including ${{\mu}_{j}}$ is optimized first, the optimized value of ${{\mu}_{j}}$ will in general be a non-integer. Therefore, in the subsequent steps where the other parameters such as ${{\alpha}_{j}}$ , ${{\sigma}_{j}}$ , etc. are optimized, the situation ${{\mu}_{j}}=x$ cannot occur, since an integer cannot equal a non-integer; hence the non-numerical elements in the Jacobian matrix are avoided. The following describes the detailed process of the grouping LM, taking a full waveform with two Gaussian components for illustration. The mathematical model for this particular situation is expressed as:

$f(x)=\sum\limits_{i=1}^{2}{{{A}_{ji}}\exp \left(-\frac{|x-{{\mu}_{ji}}{{|}^{\alpha _{ji}^{2}}}}{2\sigma _{ji}^{2}}\right)}$   (7)

where $i$ indexes the components and $\boldsymbol{p}$ denotes the parameter vector that needs to be optimized, $\boldsymbol{p}=\left\{{{\alpha}_{ji}},{{\mu}_{ji}},{{\sigma}_{ji}},{{A}_{ji}}\right\}_{i=1}^{k}$ (in the current case k  =  2). ${{A}_{ji}}$ is the amplitude, ${{\mu}_{ji}}$ denotes the location of the component peak, ${{\sigma}_{ji}}$ denotes the half width of the pulse, and ${{\alpha}_{ji}}$ is the shape parameter which fine-tunes the shape of the waveform. The objective function for optimization is:

$\min\limits_{\boldsymbol{p}}\boldsymbol{\varepsilon }^{{T}}\boldsymbol{\varepsilon },\quad \boldsymbol{\varepsilon }=\boldsymbol{X}-f\left(\boldsymbol{p}\right)$   (8)

The optimization process under the umbrella of the grouping LM can be described in detail as follows:

  • (1)  
    Initialization: Determine the initial values of $\left\{\alpha _{ji}^{0},\mu _{ji}^{0},\sigma _{ji}^{0},A_{ji}^{0}\right\}_{i=1}^{2}$ , where the superscript 0 denotes the iteration number. Suppose that the raw waveform data have been pre-processed; the peak amplitudes $A_{ji}^{0}$ and locations $\mu _{ji}^{0}$ are then determined by local maxima as follows. For a given set of pre-processed waveform data, assign each local maximum amplitude to $A_{ji}^{0}$ and its time position plus half a sampling step to $\mu _{ji}^{0}$ . For each peak, scan the waveform along both sides to find the two points whose amplitudes equal half that of the peak; the distance along the $x$ axis between these two points, divided by two, gives $\sigma _{ji}^{0}$ . The initial shape parameter of each echo is set to 1.5.
  • (2)  
    Grouping: the parameters are divided into two groups: $\left\{\mu _{ji}^{0},A_{ji}^{0}\right\}_{i=1}^{2}$ and $\left\{\alpha _{ji}^{0},\sigma _{ji}^{0}\right\}_{i=1}^{2}$ .
  • (3)  
    Optimizing: First, the parameters $\left\{\alpha _{ji}^{0},\sigma _{ji}^{0}\right\}_{i=1}^{2}$ are treated as constants and the conventional LM algorithm is used to optimize the parameters $\left\{\mu _{ji}^{0},A_{ji}^{0}\right\}_{i=1}^{2}$ resulting in $\left\{\mu _{ji}^{\text{opt}},\,A_{ji}^{\text{opt}}\right\}_{i=1}^{2}$ . Now keep $\left\{\mu _{ji}^{\text{opt}},A_{ji}^{\text{opt}}\right\}_{i=1}^{2}$ as constants to optimize $\left\{\alpha _{ji}^{0},\sigma _{ji}^{0}\right\}_{i=1}^{2}$ resulting in $\left\{\alpha _{ji}^{\text{opt}},\sigma _{ji}^{\text{opt}}\right\}_{i=1}^{2}$ .
  • (4)  
    Iterating: repeat step (3) with updated parameters $\left\{\mu _{ji}^{\text{opt}},A_{ji}^{\text{opt}}\right\}_{i=1}^{2}$ and $\left\{\alpha _{ji}^{\text{opt}},\sigma _{ji}^{\text{opt}}\right\}_{i=1}^{2}$ , until the stop criterion mentioned in section 2.1 is satisfied.
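Steps (1)–(4) above can be sketched with an off-the-shelf LM solver handling each group (a simplified illustration using `scipy.optimize.least_squares` with `method="lm"`; the split into {μ, A} and {α, σ} follows step (2), but the function names and the fixed outer-iteration count are our own choices):

```python
import numpy as np
from scipy.optimize import least_squares

def mixture(t, mu_A, alpha_sigma):
    # Two-component generalized Gaussian mixture, equation (7) form.
    mus, amps = mu_A[:2], mu_A[2:]
    alphas, sigmas = alpha_sigma[:2], alpha_sigma[2:]
    y = np.zeros_like(t)
    for mu, A, a, s in zip(mus, amps, alphas, sigmas):
        y += A * np.exp(-np.abs(t - mu) ** (a ** 2) / (2.0 * s ** 2))
    return y

def grouping_lm(t, w, mu_A, alpha_sigma, n_outer=5):
    # Steps (3)/(4): alternately optimize group {mu, A} with {alpha, sigma}
    # frozen, then group {alpha, sigma} with {mu, A} frozen.
    mu_A = np.asarray(mu_A, float)
    alpha_sigma = np.asarray(alpha_sigma, float)
    for _ in range(n_outer):
        mu_A = least_squares(lambda p: mixture(t, p, alpha_sigma) - w,
                             mu_A, method="lm").x
        alpha_sigma = least_squares(lambda p: mixture(t, mu_A, p) - w,
                                    alpha_sigma, method="lm").x
    return mu_A, alpha_sigma
```

Optimizing the {μ, A} group first means the refined μ values are generally non-integers before α and σ are touched, which is the mechanism that keeps $\log \left(\text{abs}\left({{\mu}_{j}}-x\right)\right)$ finite.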

3. Experimental results

3.1. The experimental datasets

Both simulated and real data were used to test our algorithm. 10 000 simulated waveforms were generated in accordance with equation (2), where we assumed ${{A}_{i}}=1$ ; they are therefore noise-free data suitable for testing the proposed algorithm theoretically. 5000 of the simulated waveforms were the superposition of two generalized Gaussian components, while the other 5000 were the superposition of three. The real data were acquired by the RIEGL LMS-Q560 over a mountainous area covered with vegetation. The full waveform data are contained in two separate files with suffixes LWF and LGC: a *.LWF file records the raw full waveform data and a *.LGC file records attributes related to the raw waveform. The average density of the point cloud provided by the system is about 0.8 pts m−2. The dataset contains 5 487 240 waveforms. Each waveform consists of 58 samples from the emitted signal and the same number of samples from the echo.
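The simulated waveforms can be reproduced along the following lines (a sketch; the ranges for μ, σ and α are our assumptions, since the text specifies only the unit amplitudes, the component counts, and the 58-sample length):

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(58.0)  # 58 samples per waveform, matching the real data

def simulate_waveform(n_components):
    # Noise-free superposition of generalized Gaussian components
    # (equation (2)) with unit amplitudes A_i = 1.
    w = np.zeros_like(t)
    for _ in range(n_components):
        mu = rng.uniform(10.0, 48.0)      # assumed peak-position range
        sigma = rng.uniform(1.5, 4.0)     # assumed half-width range
        alpha = rng.uniform(1.2, 1.6)     # assumed shape-index range
        w += np.exp(-np.abs(t - mu) ** (alpha ** 2) / (2.0 * sigma ** 2))
    return w

two_return = [simulate_waveform(2) for _ in range(5000)]
three_return = [simulate_waveform(3) for _ in range(5000)]
print(len(two_return) + len(three_return))  # 10000
```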

3.2. Point cloud generation by waveform decomposition

The Gaussian decomposition of the waveform data is performed according to the steps described in section 2.2. The point cloud is generated by the set of formulae in equation (9) [32] after the parameters of the Gaussian model have been estimated by the grouping LM.

$\left\{ \begin{matrix} {{E}_{i}}={{E}_{0}}+\left(i+\text{WFOFFSET}\right)\text{d}E \\ {{N}_{i}}={{N}_{0}}+\left(i+\text{WFOFFSET}\right)\text{d}N \\ {{H}_{i}}={{H}_{0}}+\left(i+\text{WFOFFSET}\right)\text{d}H \end{matrix} \right.$   (9)

where $i$ denotes the $i\text{th}$ sample of a waveform with $\left\{{{E}_{i}},{{N}_{i}},{{H}_{i}}\right\}$ denoting its geodetic coordinates. $\left\{{{E}_{0}},{{N}_{0}},{{H}_{0}}\right\}$ denote the geodetic coordinates of the first sample of the emission pulse. $\left\{\text{d}E,\text{d}N,\text{d}H\right\}$ denote the coordinate differentials of the waveform sampling unit. $\text{WFOFFSET}$ is the offset between the first sample of the emitted waveform and the received one.
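Assuming equation (9) has the linear form implied by the definitions above (origin plus a per-sample coordinate differential shifted by $\text{WFOFFSET}$ ; the exact sign convention of $\text{WFOFFSET}$ should be checked against the system documentation), a sketch of the per-sample coordinate computation is:

```python
def sample_coordinates(i, origin, diff, wfoffset):
    # Geodetic coordinates (E_i, N_i, H_i) of the i-th waveform sample:
    # origin = (E0, N0, H0), first sample of the emission pulse;
    # diff = (dE, dN, dH), coordinate differentials per sampling unit;
    # wfoffset shifts the received waveform relative to the emitted one.
    E0, N0, H0 = origin
    dE, dN, dH = diff
    s = i + wfoffset
    return (E0 + s * dE, N0 + s * dN, H0 + s * dH)

# Hypothetical values for illustration:
E, N, H = sample_coordinates(12, (500000.0, 3300000.0, 450.0),
                             (0.05, -0.03, -0.12), 3.5)
```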

Precision is evaluated by comparing the point clouds generated by the proposed algorithm with those provided by the system:

${{\sigma}_{x}}=\sqrt{\frac{\sum\limits_{i=1}^{n}{\Delta x_{i}^{2}}}{n}},\quad {{\sigma}_{y}}=\sqrt{\frac{\sum\limits_{i=1}^{n}{\Delta y_{i}^{2}}}{n}},\quad {{\sigma}_{z}}=\sqrt{\frac{\sum\limits_{i=1}^{n}{\Delta z_{i}^{2}}}{n}}$   (10)

where $ \Delta x={{X}_{1}}-{{X}_{2}}, \Delta y={{Y}_{1}}-{{Y}_{2}}, \Delta z={{Z}_{1}}-{{Z}_{2}}$ ; $\left\{{{X}_{1}},{{Y}_{1}},{{Z}_{1}}\right\}$ denote the 3D coordinates of the point cloud generated by the proposed method. $\left\{{{X}_{2}},{{Y}_{2}},{{Z}_{2}}\right\}\,$ denote the 3D coordinates of the point cloud provided by the system. ${{\sigma}_{x}}$ , ${{\sigma}_{y}}$ and ${{\sigma}_{z\,}}$ represent the root mean square errors (RMSE) of deviation along the $X$ , $Y$ and $Z$ axis respectively. $n$ is the number of data points used in the experiment.
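Equation (10) is the per-axis RMSE of the coordinate deviations; a direct sketch:

```python
import numpy as np

def axis_rmse(p1, p2):
    # p1, p2: (n, 3) arrays of corresponding 3D points.
    # Returns (sigma_x, sigma_y, sigma_z) = sqrt(sum(delta^2) / n) per axis.
    d = np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)
    return np.sqrt(np.mean(d ** 2, axis=0))

sx, sy, sz = axis_rmse([[1.0, 0.0, 2.0], [3.0, 0.0, 0.0]],
                       [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(round(float(sx), 4))  # 2.2361  (sqrt((1 + 9) / 2) = sqrt(5))
```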

3.3. Analysis of the experimental results

3.3.1. Results of the simulation data.

We fitted the waveforms using both the conventional LM and the grouping LM, and calculated the RMSEs of the estimated echo (peak) location, half width and shape index, respectively. Table 1 shows the comparison. Generally speaking, the parameters estimated by the grouping LM are more precise than those estimated by the conventional LM in terms of their RMSEs. More specifically, the pulse width and the shape index estimated by the grouping LM are closer to the true values. The two algorithms estimate comparable echo locations for the first return, but the grouping LM clearly outperforms the conventional LM for the later returns. We can conclude that the grouping LM algorithm is superior to the conventional algorithm in all aspects of waveform fitting.

Table 1. Comparison of the RMSEs of the parameters estimated by the conventional LM and the grouping LM.

Return echo | Algorithm | Echo location $\mu$ | Half width $\sigma$ | Shape parameter $\alpha$
The 1st return | Conventional LM | 0.021 | 0.12 | 0.036
The 1st return | Grouping LM | 0.019 | 0.07 | 0.003
The 2nd return | Conventional LM | 0.29 | 0.15 | 0.041
The 2nd return | Grouping LM | 0.11 | 0.09 | 0.007
The 3rd return | Conventional LM | 0.22 | 0.19 | 0.049
The 3rd return | Grouping LM | 0.10 | 0.08 | 0.005

3.3.2. Results of real data.

To evaluate the precision of the point cloud generated by the proposed method, we picked 487 240 waveforms from the whole real dataset and generated point clouds with the grouping LM and the conventional LM algorithm respectively. During the iteration of the conventional LM, no discrete points are generated when non-numerical elements occur in the Jacobian matrix; this is why the grouping LM generates more points than the conventional LM, as listed in table 2.

Table 2. Comparing the number of points provided by the system with those generated by the conventional LM and the grouping LM.

Return echo | The system | The conventional LM | The grouping LM
The 1st return | 445 010 | 441 727 | 391 777
The 2nd return | 104 116 | 110 559 | 140 638
The 3rd return | 28 921 | 29 095 | 52 739
The 4th return | 347 | 466 | 762
The 5th return | 29 | 41 | 64
The 6th return | 0 | 3 | 12
Total number of generated points | 578 423 | 581 891 | 585 992

As shown in the last row of table 2, the total numbers of points generated by the conventional LM and the grouping LM are both larger than the number provided by the system, with the grouping LM largest. Simple calculation shows that the number of first echoes from the grouping LM is about 12% less than that of the other two, while the number of second echoes from the grouping LM is about 26% and 21% more than those of the system and the conventional LM, respectively (percentages relative to the grouping LM count). More marked variations occur in the numbers of third and fourth echoes: the grouping LM generates roughly twice as many as the others. These facts indicate that the grouping LM is likely to generate more multiple echoes in vegetated areas, making it the preferred decomposition algorithm in forest regions.
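The quoted percentages can be reproduced from table 2 if the second-echo surpluses are expressed relative to the grouping LM count (our reading of the arithmetic; a quick check):

```python
# First- and second-return counts from table 2
first = {"system": 445010, "conventional": 441727, "grouping": 391777}
second = {"system": 104116, "conventional": 110559, "grouping": 140638}

# First-echo deficit of the grouping LM relative to the system (~12%)
deficit = (first["system"] - first["grouping"]) / first["system"]

# Second-echo surplus of the grouping LM over the system (~26%) and over
# the conventional LM (~21%), relative to the grouping LM count
surplus_sys = (second["grouping"] - second["system"]) / second["grouping"]
surplus_conv = (second["grouping"] - second["conventional"]) / second["grouping"]

print(round(deficit * 100), round(surplus_sys * 100), round(surplus_conv * 100))
# 12 26 21
```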

Deviations were calculated by subtracting the coordinates provided by the system from the corresponding values calculated by both the conventional LM and the grouping LM. As shown in figure 1, the horizontal axis indicates the serial number of the points and the vertical axis indicates the deviation. Around 12 000 points were picked up from the total dataset to plot in figure 1. The first column of figure 1 shows the deviations between the systematic point cloud and those generated by the LM algorithm, while the second column depicts the deviation between the systematic point cloud and those generated by the grouping LM algorithm.


Figure 1. Precision comparison between the grouping LM algorithm and the conventional LM algorithm. The first row (a) shows the deviations along the $X$ axis between the systematic point cloud and those generated by the conventional LM and the grouping LM. The second and the third row (b) and (c) show the deviations along $Y$ and $Z$ axis respectively.


It is obvious from the figures that the deviation along the $X$ axis is generally within  ±0.05 m, with a small portion between  ±0.05 m and  ±0.4 m. Similarly, the deviation along the $Y$ axis is generally within  ±0.1 m, with a small portion between  ±0.1 m and  ±0.6 m. The deviation along the $Z$ axis is generally between −0.3 m and +0.2 m, with the majority less than zero, showing that the elevation of the generated point cloud is slightly lower than that provided by the system. Though the deviations of the conventional LM and the grouping LM follow the same trend, the deviations of the grouping LM are generally slightly smaller. This conclusion is further supported by the RMSEs of the deviations, calculated using equation (10) and listed in table 3.

Table 3. Comparing the RMSEs of the deviations along different axes.

Algorithm | $X$ axis (m) | $Y$ axis (m) | $Z$ axis (m)
The conventional LM | 0.0353 | 0.0726 | 0.2835
The grouping LM | 0.0255 | 0.0521 | 0.2116

To further determine which algorithm can extract a digital elevation model (DEM) with better accuracy, we collected 154 check points from the test area as ground truth to assess the accuracies of the three datasets: the point clouds provided by the LiDAR system and generated by the conventional LM and the grouping LM. All check points were collected by the real-time kinematic (RTK) technique, which is commonly used for accuracy checking in real LiDAR projects [33]. The RMSEs of the elevations in the three datasets with respect to the check points were calculated, as shown in table 4; in addition, the error distributions of the three DEM datasets are shown as frequency histograms in figure 2. Table 4 and figure 2 show that the grouping LM algorithm has the smallest RMSE and the error distribution most concentrated around zero, indicating that the DEM generated by the grouping LM has better accuracy than the others. We therefore conclude that the grouping LM outperforms the conventional LM in terms of DEM accuracy.

Table 4. Comparing the RMSEs of the elevations in the three datasets with respect to checkpoints.

Generation method | Number of checkpoints | RMSE (m)
The grouping LM | 154 | 0.28
The conventional LM | 154 | 0.31
The system | 154 | 0.32

Figure 2. Error distribution of (a) the system, (b) the conventional LM and (c) the grouping LM.


To evaluate the weak-pulse detectability of the conventional LM and the grouping LM, we examined section profiles of the point clouds generated by the two algorithms, as shown in figure 3. The red and green points highlight points with low amplitudes which the system failed to detect. Clearly, more points are generated by the grouping LM than by the conventional LM. Because weak pulses usually occur at the positions of potential ground points beneath forest canopies, this implies that the grouping LM can recover more ground points in vegetated areas, suggesting its potential usefulness for DEM generation from full waveform data. Furthermore, more points from the canopies themselves can be derived by the grouping algorithm, as shown in figure 3(c). These strengths are of significant value in areas with a low laser energy penetration rate, such as forests with heavy canopy, since in such areas the detection of weak returns from the ground is especially useful for DEM generation and for subsequent estimation of tree heights [34].


Figure 3. Section profile of point cloud (first, intermediate, and last returns) derived from (a) the system (b) the conventional LM, and (c) the grouping LM. Green points and red points in sub-figures (b) and (c) are highlighted as the points with low amplitudes.


4. Conclusions and prospects

This paper proposes the grouping LM algorithm for the Gaussian decomposition of full waveform LiDAR data. The results demonstrate that the proposed algorithm not only avoids the unexpected interruptions caused by non-numerical elements in the Jacobian matrix of the conventional LM, but also yields a point cloud of higher precision and quality. Moreover, the algorithm extracts other information, such as echo location and pulse width, more precisely. The grouping LM algorithm may be of great value for deriving DEMs and tree heights in forest areas. It is simple to implement and ready for engineering use.

The new method has a time cost roughly double that of the conventional LM in our experiment, since two groups of parameters must be optimized in each iteration. This shortcoming might be overcome by performing the grouping process in the first iteration only, and then switching to the conventional LM in the following iterations; however, further experiments are required to validate this idea.

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant number 61378078).
