Application of Iterative Noise-adding Procedures for Evaluation of Moment Distance Index for LiDAR Waveforms

The new Moment Distance (MD) framework uses the backscattering profile captured in waveform LiDAR data to characterize the complicated waveform shape and highlight specific regions within the waveform extent. To assess the strength of the new metric for LiDAR application, we use the full-waveform LVIS data acquired over La Selva, Costa Rica in 1998 and 2005. We illustrate how the Moment Distance Index (MDI) responds to waveform shape changes due to variations in signal noise levels. Our results show that the MDI is robust in the face of three different types of noise—additive, uniform additive, and impulse. In effect, the correspondence of the MDI with canopy quasi-height was maintained, as quantified by the coefficient of determination, when comparing original to noise-affected waveforms. We also compare MDIs from noise-affected waveforms to MDIs from smoothed waveforms and found that windows of 1% to 3% of the total wave counts can effectively smooth irregularities on the waveform without risking of the omission of small but important peaks, especially those located in the waveform extremities. Finally, we find a stronger positive relationship of MDI with canopy quasi-height than with the conventional area under curve (AUC) metric, e.g., r = 0.62 vs. r = 0.35 for the 1998 data and r = 0.38 vs. r = 0.002 for the 2005 data.


Introduction
The active remote sensing using LiDAR has seen rapid developments in the past two decades.With a promise of improved accuracy of biophysical measurements and the spatial analysis done in the third dimension, LiDAR could play an important role in atmospheric and environmental field of studies.In fact, NASA's future launch of the Global Ecosystem Dynamics Investigation (GEDI) LiDAR space mission in 2018 [1] could provide the waveform laser scanning technology more boost.The full-waveform LiDAR system has the ability to record many returns per emitted pulse, as a function of time, within the vertical structure of the illuminated object, therefore showing position of individual targets, and finer details of the signature of intercepted surfaces or the proportion of the canopy complexity.Information associated with the illuminated object can be decoded from the generated backscattered waveform, as key features of the waveform such as the shape, area, and power are directly related to the geometry of the illuminated object [2][3][4].The richness of the LiDAR waveform holds the promise to address the challenge of characterizing in detail the geometric and reflection characteristics of vegetation structure, e.g. the vertical canopy volume distribution [5].
Usually, the raw incoming/received waveform displays system noise [32][33][34].The noise can easily overlap returns especially in complex forested areas where waveform peaks from ground and surface objects can be broadened and mixed caused by different layers of the vegetation, making the recognition of ground surface difficult [10,35].In most cases, noise reduction or elimination may be conducted [36] to extract the waveform intensity and avoid the noise to be detected as signal.This is done, for instance, through smoothing with a Gaussian filter at a specified window size [9].Smoothing, however, may pose major risk in eliminating important snags and understory shrubs features [37] that are usually found at the broadened end of the waveform near the ground return.
Detection of the ground surface return is needed to extract the canopy height using the direct method, which involves the identification of wave signal start (WSS) and wave ground peak (WGP) on the waveform.There is no widely accepted method for estimating the locations of the WSS and WGP.One approach currently utilized sets thresholds [10,36] above the mean background noise in the waveform.Thresholds vary from 3σ [9], 4σ [35], to 4.5σ [8], where σ is the standard deviation of the background noise.In either smoothed or unsmoothed waveforms, thresholding can pose a risk of removing the broadening effects at the extents of the waveform that may carry vital information about understory vegetation or structure.The waveform is also susceptible to extreme values in the background that could cause premature peaking, which eventually could lead to nonsensical estimates of canopy height [34].Therefore, it is necessary to come up with a better threshold limit or a sufficient smoothing process that could improve data extraction and pull out the true signal without jeopardizing the information that may be available at every change of the morphology of the LiDAR waveform.Appropriate parametric functions may be applied to the waveform to reconstruct the shape and retrieve information about the object and characterize the properties.Conventional LiDAR methods include splines [38], the Gaussian mixture models [32,39], and the non-linear least-square approach [40].In many mapping applications, Gaussian approximation has been shown to be satisfactory for signal modeling, although Gaussian fitting is less satisfactory for high amplitude pulses [41].
The preprocessing of full-waveform LiDAR data usually leads to using only part of the return signal.In most cases the full intensity of the LiDAR return is rarely used.Hopkinson & Chasmer [42] emphasized the importance of using the intensity of the LiDAR returns, especially in canopy fractional cover models, since the intensity values provide some quantification of the surface areas interacting with the laser beam.In characterizing the intercepted scenes, it is essential to know the full geometry (shape) and radiometry (power) of the signal as both could explain the geometry and radiometry of the detected object.Muss et al. [43] took advantage of the geometry and radiometry of the waveform to introduce shape-based metrics-centroid (C) and radius of gyration (RG)-for forest structure analysis.While these metrics demonstrated better relationships with estimated aboveground biomass (EAGB) than traditional height-based metrics such as height of median energy (HOME) (e.g., [24,25]) and relative heights (RH) or height percentiles (e.g., [9,44]), the centroid by itself cannot track changes of the waveform shape.Expressed as the root mean square of the sum of the distances from the centroid to all points on the waveform, RG is dependent on the centroid and cannot stand alone.
Here we use full-waveform LVIS datasets from La Selva, Costa Rica, to assess the strength of the new Moment Distance framework, which was first introduced for characterizing fine resolution spectrometer data [45,46], and later on seen to be more useful for detecting plastic greenhouses [47], mapping sparse vegetation [48] and estimating canopy heights using LiDAR waveform [49].To study shapes we use the full unsmoothed LVIS Geolocated Waveform (.lgw) dataset.We assess how different types of noise and smoothing procedures impact the new metric in characterizing the shape of the full waveform extent and waveform subsets.As noise can be expected to alter the shape of the waveform, we examine the sensitivity of the new metric to varying levels of uncertainty and in various subsets of the waveform, using metric obtained from the original waveform as the reference.The performance of our method is investigated by how the new metric performs against the canopy quasi-height.We smooth the waveform using various window sizes to demonstrate the relationship of the MDI against the canopy quasi-height and compared the results to the area under curve (AUC) method [50,51].

Waveform LiDAR datasets
The full-waveform LiDAR datasets were acquired in 1998 and 2005 over the same period in La Selva, Costa Rica using the Laser Vegetation Imaging Sensor (LVIS), which is an airborne NASA laser scanning altimeter.We specifically used the LVIS geolocated waveform (.lgw) files from 1998 and 2005, with both having the same number of 431 wave counts.The LVIS laser device produces Gaussian optical pulses at a wavelength of 1064 nm [17].We paired LiDAR waveforms samples from 1998 and 2005 datasets based on the latitude and longitude coordinates.The pairings served as inputs for generating noise-affected waveforms in our analysis.
LVIS waveform data are easily converted into distance since the signal returns are measured as a function of time.Accounting both the times the laser pulse was emitted and returned could give a measure of the distance from the sensor to the intercepted surface.LVIS has a scan angle of about 12 degrees, and could cover 2 km swaths of surface from an altitude of 10 km, with 10 to 25 m footprint size.We estimated the canopy height (we refer it as quasi-height) from the waveform as the difference from the power of the first increase of return above the mean noise level to the center of the last pulse, which is designated as the ground return.

Moment distance framework
The Moment Distance is a new analytical framework that uses a computationally simple metric to capture the shape of the curve.The approach takes advantage of the multiple returns of the waveform LiDAR to monitor changes in shape and its asymmetry-exploiting the range from first detected signal to last detected signal above the noise threshold.The formulation of the concept revolves around using the raw waveform to retain richness of the data.In addition, it means avoiding Gaussian fittings in our goal to detect changes of the waveform (e.g., widening of peaks, existence of complex extremes) with the change of canopy parameters, such as canopy height.It involves fixing two points as references and has two aspects: the set of equations that generate the MD metrics and the choice of positions within the waveform to highlight.Assume that the waveform is displayed in Cartesian coordinates with the abscissa displaying time lapse t and ordinate displaying backscattered power p.Let the subscript LP denote the left pivot or earlier temporal reference point and subscript RP denote the right pivot or later temporal reference point.Let t LP and t RP be the time value observed at the left and right pivots, respectively.The MD framework is described in the following set of equations: (3) The moment distance from the left pivot (MD LP ) is the sum of the hypotenuses constructed from the left pivot to the power at successively later times (index i from t LP to t RP ): one base of each triangle is difference from the left pivot (i-t LP ) along the abscissa and the other base is simply the backscattered power at i. Similarly, the moment distance from the right pivot (MD RP ) is the sum of the hypotenuses constructed from the right pivot to the power at successively earlier times (index i from t RP to t LP ): one base of each triangle is the difference from the right pivot (t RP -i) along the abscissa and the other base is simply the backscattered power at i.
The MD Index (MDI) is an unbounded metric.It increases or decreases as a nontrivial function of the number of wave counts considered and the shape of the waveform that spans those contiguous wave counts.The number of wave counts is a function of the temporal resolution of the LiDAR (digitization rate) and the length of the waveform (i.e., full extent or subsets) being analyzed.Depending on digitization rate, the matrix resulting from the calculations of the MDs within a range of waveform could be a massive set of numbers.As the MDI is designed to exploit the multiple wave counts and the asymmetry of the waveform, the new metric may lose its capability to detect shape changes or movements of wave morphologies when used improperly.Being resolution-dependent, MDI may ill perform and fail to define the waveform shape when there are only few points between pivots.Table 1 presents some limiting cases when computation of the MDs and MDI are not appropriate.
Figure 1 demonstrates the MDI when applied to lines.Illustrated simply, a curve opening down (Figure 1A) will differ from a curve opening up (Figure 1B) when defined by moment distances with varying pivot ranges.In Figure 1C, for instance, a two-peak, opening-down curve defined by fixing the early time wave count and increasing the range one wave count at a time (going from point 1 to point 2) a slope becomes evident when a second peak is contained within the pivot range.The dip around count 187 defines the largest difference of the change of shape detected by a particular pivot pair.A similar pattern is observed when fixing the late time wave count and increasing the range one count at a time (going from point 2 to 1).As shape dissimilarities are detected by comparing MD behavior from point 1 to MD behavior from point 2, the largest difference in shape is recorded when the earlier peak is covered by the range.Table 1.Simple cases where the MD approach would not be suitable.Note that MDI exploits multiple bins and limiting the number of bins may fail to define the waveform shape.

Cases Equations of MDI Remarks
When the pivots are the only values, then a curve is not described.This defeats the purpose of using MDI to define the shape of a line that can be described by the slope direction.The sign of the magnitude of the MDI changes when p 1 < p 2 .

MDI = 0
Missing to include the backscatter powers p 1 and p 2 , and only used p 3 .No shape has been defined in this case.Missing significant returns of the waveform must be avoided.4 4  In a curve with two dips, such as in Figure 1D, the maximum differences in shape around bins 180 to 185 occur when the dips are included in the range.Minimal differences in shapes are expected for short pivot ranges.It is important to stress that the MDI detects the differences of curvature regardless of where the pivot is being fixed, such as the cases in Figure 1, provided that the important peaks are taken into consideration in the pivot range.Detecting the differences is crucial in the following analysis using the LiDAR waveforms as various peaks and dips may exist from the first signal to the last signal detected.Using the proper range that encompasses the significant peaks of the waveform could lead to the shape difference maxima: the maximum difference of the summation of distances from point 1 and the summation of distances from point 2. The value of the difference tells how the shape of the waveform as viewed from reference point 1 varies from the one viewed from reference point 2.
Figure 1 illustrates that the selection of LP and RP may not necessarily be exactly at the start of signal and end of signal, respectively.In equations 1 and 2, there is no fixed location of pivots; instead, an option is given to pick the range of wave counts.The locations of LP and RP do not have a strict limitation as to where they should be placed exactly along the wave count axis, thus, making MDI exploitable at different locations.The location of the MDI maxima often occurs right before or right after signal peaking.Elimination of few points after the first detected signal and/or points before the last detected signal may not hurt the MDI nor affect the detection of shapes from any of the set fixed pivots.

Generation of noise-affected waveforms
We added three types of random noise to the LiDAR signal: additive (AD), uniform additive (UA), and impulse (IM).The additive Gaussian noise model [52] is the simplest noise model that consists in adding a realization of a zero mean random vector to a clean return signal.The impulse noise [53] is a different type of noise that consists of sparse impulsions, generated by a random distribution with slowly decaying probability.The additive uniform noise [54] is the kind of noise where noise is uniform in a given waveform interval.We define the resulting noisy waveform I n (t) by the sum of the original waveform I o (t) and the added noise n(t) where t indicates time. (4) We individually ran the additive Gaussian, uniform additive, and impulse error algorithms to the set of paired LiDAR waveforms from the 1998 and 2005 La Selva datasets.This process included generating 1000 noise realizations per type of noise model at four levels of uncertainty (5%, 10%, 15%, and 20%), and in two waveform subsets analyzed-leading and trailing subsets.More of these two subsets are discussed below.We observed waveform behaviors such as how it responds to noise and how certain types of waveform morphology changes to different types of noise.We then looked at the MDI as a function of the original quasi-height (calculated height from original waveform) to assess how the introduction of the levels of uncertainty to the waveforms using three different types of noise models changed the initial observed trend of the original MDI against the canopy quasi-height.

Smoothing waveforms
In addition to adding noise, we also looked at the behavior of the MDI when waveforms were smoothed to different degrees.We eliminated the small peaks using forward-moving average, backward-moving average, and centered-moving average window approaches as described by Savitzky & Golay [55] and Madden [56] by calculating the average of power of adjacent bins [12,57].The forward-moving average smoothens the waveform by moving a specified window forward in time, while the backward-moving average smoothens the waveform as it moves backward in time.Since MD is shape-dependent, care was taken not to over-smooth the waveform (i.e., using too large a window or percent wave counts).As much as possible, we preserved the significant peaks, such as the canopy and the ground [58].Moving windows of 1% to 3% of the total wave count were used to smooth irregularities without degrading waveform morphologies.

Calculation of MDI from full waveform extent and waveform subsets
The full extent of the recorded waveform, plus two defined subsets-leading and trailing subsets [8] were used to illustrate the impacts of noise integration to the waveforms and to the value of the MDI (Figure 2).The full extent covers the range starting at the first detected pulse up to the last detected pulse.The leading subset covers the range of time bins from the start of the signal to the location of the maximum peak at early time (above the mean noise level).The trailing subset defines the range from the location of the maximum peak at early time to the maximum peak at a later time (ground response).It is worth noting that a subset can be any specific and narrow range and it is always defined by the locations of the left and right pivots (represented by gray dots in Figure 2) that necessary for the computation of MDI.

Figure 2. Sample of LVIS waveform and subsets used for MDI calculations. A pair of pivots defines the range of a subset. Leading subset is defined by the first pivot (signal start) to the second pivot (maximum peak early time). Trailing subset is defined by first pivot (maximum peak early time) to the second pivot (maximum peak later time). The abscissa is in terms of time (t) and the ordinate is the backscattered power (p).
Between the two subsets, the trailing subset includes two important waveform morphologies (maximum peaks at early and later times) in its range.These two waveform morphologies are important as both could dictate the magnitude of MDI [48].In the later section of this chapter, we made use of canopy quasi-heights to describe how each subset, defined by range at key profile locations of the waveform, relates to our new metric.2a is for the full extent, 2b for the leading subset, and 2c for the trailing subset.Notice that the impulse showed the most variability estimated with CV, especially at high uncertainty levels.

Error effects on waveform shape and the MDI
Statistical results from the leading subset are presented in Table 2b.Among the three error algorithms, impulse showed the most variability, especially at high uncertainty levels.With 15% and 20% uncertainty levels, the coefficient of variations (CV) for 1998 are 10.6% and 13.9%, respectively; for 2005 are 12.1% and 15.7%, respectively.
For the trailing subset ( The mean MDI values of the additive and the uniform additive approaches are comparable at various error levels in both subsets (Table 2).In fact, increasing the level of noise from 5% to 20% showed no big differences of the MDI means between AD and UA in both years.Differences of MDI means were observed in AD vs IM, and UA vs IM, after significance testing for both the leading (p < 0.05 in all tests) and trailing subsets (p < 0.05 for 1998; p < 0.10 for 2005).The full extent of the waveform, however, showed no significant differences of the means in any of the one-to-one comparisons (AD vs UA, AD vs IM, UA vs IM).The MDI increased negatively in the leading and full-extent with every increase of uncertainty.The trailing subset, on the other hand, had decreasing positive MDI at each increase of uncertainty.

Standard Error (SE) and RMSE of MDI
Large MDI values as the noise levels increased is evident on the bar graphs of MDI and standard error of 1000 noise realizations (Figure 4, full extent of the wave).At lower noise levels, the SE is lower relative to higher noise levels.As the noise level increased the average MDI further deviates from the reference MDI, while the SE increased alongside.The impulse approach showed large increases of SE at differing noise levels.These increases can be observed for both the tested 1998 and 2005 samples.
Computed RMSE (Table 3) showed the performance of each noise approach using two pairs of waveform samples both from 1998 and 2005 datasets.The additive and the uniform additive approaches resulted to smaller values of RMSE compared to the impulse.However, when the level of noise is low (e.g., 5%), the three approaches were comparable.Figures 5 and 6 show the average MDI and the equivalent standard errors for the leading and trailing portions of the waveform, respectively.Similar to the results in Figure 4, the impulse model resulted to an average MDI with the highest offset from the reference value with respect to the other two noise models.

Moment distance index vs canopy quasi-heights
A positive relationship is manifested in Figure 7 between the MDI from full extent and the derived canopy quasi-heights from 16 pairs of waveforms.As shown in Table 4, the relationship is true for both the years 1998 and 2005, with the results from 1998 exhibiting stronger linear relationship (averaged r 2 =0.62 vs. averaged r 2 =0.38).The trailing subset shows a similar trend (Figure 8) as the one using the full extent of the waveform, albeit with less explanatory power (averaged r 2 =0.30 vs. averaged r 2 =0.22).However, even at a high noise level (20%), the trend does not changed.From the 16 pairs of samples, we measured the magnitude of the error between the reference MDI and corresponding observed MDI from noise-affected waveforms using RMSE. the results for each approach using the full waveform and the trailing subset.Small RMSE values were related to 5% noise on the waveform while the high RMSE values were from the 20% noise level, regardless of the type of noise approach.The highest RMSE values were found using the IM approach.4 for the coefficients of determination.

Waveform similarities
We compared waveform components between two signals using correlation analysis on the 1998 and 2005 pair of subsets at different noise levels.We tabulated in Table 6 the correlations between two waveforms (original and noise-affected) with varying signal powers, putting emphasis on the Spearman rank-order correlation coefficient, r s values.Comparing between subsets in Table 6, the leading subsets for both years showed higher r s than the trailing subset.For instance, the 2005 leading subset had a minimum r s = 0.88 (Table 6B1), while its equivalent in the trailing subset had r s = 0.72 (Table 6B2).In the 1998 leading subset, the IM showed an r s =0.91 (Table 6A1), while its equivalent trailing subset was only r s =0.67 (Table 6A2).Table 6.Spearman correlation coefficients (r s ) for the leading (A1/B1) and trailing (A2/B2) subsets.The r s measures the strength of the associations between the original waveform segment subset and the generated subsets with noise (significant at p = 0.05).Note that the statistics of the pair of samples are from 1998 (waveform file ID 666) and 2005 (waveform file ID 8983).

Comparison between noise levels
Comparing among the noise levels, waveforms with less noise showed a higher degree of similarity to the original curve than those with high noise levels.From 5% to 20% noise level, the waveforms with 5% additive, uniform additive, and impulse noise resulted in much closer agreements (e.g.trailing subset, Table 6A2-the additive 5% has r s = 0.95; the 10% has r s = 0.87; the 15% has r s = 0.81; and the 20% has r s = 0.75) to the original.

Comparison among additive, uniform additive, and impulse
Between additive and uniform additive, the effects of the two approaches to the waveform are comparable, returning equivalent r s values as shown in Table 6.As shown in the previous section (Tables 5, 6, and 7), the distributions of the MDI for 1000 realizations of the waveforms for the additive and uniform additive were almost always equal.In the case of the impulse model, the appearance of intermittent spikes did not tremendously change the shape of the curve in general.In fact, based on the results of the leading subsets for the 1998 and 2005 datasets (Tables 6A1 and  6B1), the IM showed high r s values (as high as r s = 0.98), comparable to the results of the other two models, AD and UA.

Smoothing waveforms
Table 7 showed the comparison of the performances of the three smoothing window types measured in terms of RMSE.The forward-window approach gave a large RMSE of MDI, 83.93 and 53.44 for 1998 and 2005, respectively, based on a moving window of 3% of the total wave counts.Centered window smoothing appears to be the best way to smooth a waveform.Table 7. Computed RMSE from 16 pairs of samples using the full extent of the smoothed waveforms.Curves were smoothed up to a moving window of 3% of total wave counts (size 15).
Figure 9 showed the relationships of the MDI computed using smoothed waveforms against the quasi-heights.Trends from Figure 9 and Figure 7 are comparable with each other, having observed a stronger 1998 linear relationship between MDI and the canopy quasi-height than the 2005 dataset-averaged r 2 =0.62 (1998) vs averaged r 2 = 0.38 (2005) for noise-affected waveform, while averaged r 2 =0.30 (1998) vs averaged r 2 =0.22 (2005)

Area under the curve (AUC)
Figures 10 and 11 illustrate the MDI as having a far better and stronger relationship with the quasi-height than the AUC (e.g.r 2 = 0.62 vs r 2 = 0.35 for the 1998 AD), especially if the full-extent of the waveform is analyzed.The same pattern holds for the UA and the IM models.Table 8 lists the coefficients of determination (r 2 ) for quasi-height vs the AUC and MDI.MDI shows better fit than AUC in most tests except in the 2005 trailing subset.We computed the RMSE (Table 9) of the AUC for each noise approach using the full waveform extent.Small RMSE values were related to 5% noise on the waveform while the high RMSE values were from the 20% noise level, regardless of the type of noise approach.The highest RMSE values were found using the IM approach.

Temporal movement of MDI
For the final analysis, we looked at the movement of the MDI from the pairing of the 1998 to 2005 samples (Figure 12).Two groupings of paired samples were observed.The pairs from the first group showed decreasing negative MDI as quasi-heights increased, while pairs from the second group showed decreasing negative MDI as quasi-height decreased (Figure 12a).
The same groups of samples were observed in Figure 12b for the trailing subset waveform.The first group of pair samples showed increasing MDI as quasi-heights increased, while pairs from the second group showed decreasing MDI as quasi-height decreased.

Discussion
The new MD approach is hardly affected by noise, as shown in our test results when waveforms corrupted with AD, UD, and IM noise have been fitted with our method.MDI appears to be sensitive to impulse noise, but robust to the other noise forms.Impulse noise tends to exaggerate the waveform by introducing spikes when a longer wave count gap exists between peaks (see the 1998 and 2005 shapes in Figure 3 for impulse).Results from these tests are important since we want the new metric to be robust to random noise in the waveform.
Even with the MDI changing in absolute values at several noise levels (up to 20%), the relationships with canopy quasi-height are minimally affected.Tests on the noise-affected waveforms resulted with AD, UD, and IM having the same averaged r 2 of 0.62 and 0.38, for the 1998 and 2005 data, respectively.Figures 7 and 8 show that good fitting results can be achieved even with waveforms corrupted with noise, regardless of whether the analysis involves the full extent or the subset of the waveform.We attribute this robustness of the MDI algorithm to its ability to define the shape of the wave from two points of perspective, rather than one.Accordingly, the MDI can minimize the effects of noise present between the left (MD LP ) and right (MD RP ) pivots.There were no significant differences of MDI means observed in all trials when we tested using ANOVA at 0.05 significance level (p>0.05 for both 1998 and 2005 datasets).Although, MDI values differ in each trial, the values are so close that the symbols in Figures 7 and 8 look like they overlap.However, these minimal differences did not affect the relationship of MDI to the canopy quasi-height (cf.Table 4).Will the MDI work in the presence of major error spikes on the waveform?
As illustrated by the impulse model results, the MDI can attenuate the effects of spikes, but it is more susceptible to spikes than other noise forms.
The leading subset has very little to no relationship to the quasi-height.This could affirm the need for a 10% elimination of the upper canopy in height analysis [59].In contrast, the trailing subset shows a positive yet low relationship against the quasi-height (Figure 8) with highest averaged r 2 of 0.30 for 1998 and 0.22 for 2005.From 5% to even a high noise of 20%, the pattern of MDI vs quasi-height remained to be similar for the AD, UD, and IM approaches.In Table 5, we computed the RMSE of MDI for each noise model using the full waveform and the trailing subset and found out that the small RMSE values are related to 5% error on the waveform.
In Table 6, the leading subsets for both years come with higher correlation coefficients, r s , compared to the trailing subset.It implies that the waveforms generated with noise levels in the leading subsets exhibit shape similarity to the original waveform.This observation further suggests that the noise within the leading region may not have strong effects to the relationship of MDI with the canopy quasi-height.The above average values of r s observed in the trailing subsets are caused by a much longer pivot range as well as the steep rise of the curves in between pivots.
While the spikes introduced by IM approach have been detected by the MDI approach as shown by the MDI distribution of the 1000 iterations of the impulse noise in the previous section (Figures 4,  5, and 6), the MDI is resistant to a single spike or two.The MDI be significantly affected only when a major spike is of considerable duration and thus be mistaken for a valid signal return or peak.
It should be noted that, in Table 6, a high correlation coefficient, say r s = 0.90, does not necessarily signify that there will be the same MDI values for the original and the noise-affected waveforms.For example, a noise-affected waveform with a major spike that is near a pivot can result to a high correlation coefficient compared to the original waveform; however, the presence of the spike will result to two different values of MDI.Moreover, a low correlation coefficient does not signify a curve shape that is less effective in estimating canopy characteristics.It is shown that with the MDI, the noisy curves are still able to show a pattern of MDI-to-height correlations (Figures 7  and 8).It is important to note that waveform shapes are maintained despite the introduction of noise.
We also have shown the new approach to have an advantage over the area under curve especially for the full waveform.Our results have shown that the MDI has a stronger positive relationship with the canopy quasi-height than does the AUC (r 2 = 0.62 vs. r 2 = 0.35 for the 1998 AD).This improvement is also evident for the 1998 trailing subset.As shown in Table 8, however, AUC exhibited a better fit with the quasi-height for the 2005 trailing subset.It even came out with a comparable r 2 to the 1998 subset, AD approach (r 2 = 0.27 for AUC vs. r 2 = 0.30 for MDI).This better showing of the AUC against the MDI maybe caused by the shorter pivot range of the subset.The shorter the pivot, the fewer the points present to define the curve for MDI.Nonetheless, with a longer pivot such as the full extent, the AUC fails badly (r 2 = 0.002 for the 2005 AD), especially with the presence of noise that can amplify the areas under the curve.
When smoothing the waveforms, small smoothing moving windows of 1% to 3% of the total wave count is recommended.Using moving windows of 5 to 10 counts, even up to a maximum of 15 counts, can effectively smooth waveform irregularities without risking of leveling small important peaks, especially those found near the ground return.Also, centered window smoothing technique may be the best way to smooth a waveform with the least RMSE for 1998 and 2005 datasets, as shown in Table 7. Forward and backward smoothing may fail to work when the waveform consists of multiple peaks or small peaks in between the major peaks that may vanish due to the smoothing.This effect explains the large deviations of the MDI (forward and backward smoothed waveforms) from the reference MDI found in Figure 9 for 2005.
The groupings of the MDI that are seen in Figure 12 are due to the fact that the waveform is complex and can have different shapes.Three canonical waveform shapes include (a) a maximum early peak, when the first canopy peak is maximum, or (b) maximum late peak, when the ground peak is maximum, or (c) roughly equal peaks for both canopy and ground.Our results reveal that the MDI can capture aspects of temporal dynamics of canopy quasi-heights and group them based on the curve shapes.How this information can be used to classify the spatial and vertical heterogeneity of forest structure is the target of a future paper.

Conclusion
The Moment Distance framework we have developed is novel and its application to LiDAR data provides a new, computationally easy approach to characterizing waveform shape.The new approach decomposes LiDAR waveform returns into left moment (with pivot from early time wave count) and right moment (with pivot from late time wave count) components, and then computes the MDI metric.The decomposition allows us to look at how a selected pivot defines the strength of each point on the waveform from a single point of perspective.The summation of each strength defines the structural behavior of the asymmetrical curve from, again, a single standpoint.Having two pivots solidifies the concept of the MD as an approach for shape-characterization by defining the structural behavior of the waveform not only from a single standpoint, but two.
Moreover, the MDI is minimally affected by noise.It has an advantage over AUC with its stronger relationship against the canopy quasi-height.MD calculation is straightforward and thus MDI analysis is easy to replicate.
One important contribution of our metric to waveform analysis is that the customary use of the Gaussian modeling to fit multiple peaks and improve peak detection may be avoided.While current waveform optimization fitting schemes rely strongly on initial parameters, which they usually fail in identifying weak returns, the new approach uses the raw waveform itself to capture shape, without requiring parameter estimation.
In conclusion, MDI is a robust metric.The results shown in this paper allows us to put forth an argument on how the new metric will behave against the existing relative height (RH) metric.Also, the results warrant future tests of MDI using different types of LiDAR waveform shapes-maximum peak observed at an earlier time, maximum peak observed at a later time, and observed peaks are equal (roughly) in return magnitude-against the important key profile landmarks of the waveform.
We plan in a future paper to show how waveform shape and movement of the peaks, dips, and other landmarks respond to changes in canopy quasi-height.It is crucial to explore the behavior of the MDI in relation to the temporal changes of waveform shape and landmarks.In this way, we can illustrate how the new metric can capture temporal change in canopy height and, thus, provide a means to monitor forest growth and development for habitat assessment and carbon monitoring applications.

Figure 1 .
Figure 1.Sample illustrations of MDI applied to simple curves, with: (a) curve opening down, (b) curve opening up, (c) two curves opening down, and (d) two curves opening up.The figures demonstrate the changes of the MDI values with varying pivot ranges, moving from point 1 to 2, and vice versa.Maximum values are observed at maximum shape differences, usually occurring at the inclusion of peaks.

Figure 3
Figure3give insights to a noise generated at 10% for a pair of waveform samples.Note that each pair in the dataset may illustrate different behaviors of the waves.Sample 1998 (LVIS waveform file ID 666, shot number 574836) shows two distinct peaks representing the canopy (maximum peak earlier time) and ground (maximum peak later time) return pulses.The corresponding pair in 2005 (LVIS waveform file ID 8983, shot number 2607680) also manifests the two peaks, with the addition of an earlier shorter peak.Among the three error algorithms, the impulse type differed in distribution of noise because of the appearances of spikes along the waveform.

Figure 3 .
Figure 3.A pair of LVIS waveform samples from (a) 1998 (LVIS waveform file ID 666, shot number 574836) and (b) 2005 (LVIS waveform file ID 8983, shot number 2607680) datasets showing the original and generated waveforms with 10% noise.The noise is randomly simulated and applied throughout the wave extent.AD = Additive Noise; UA = Uniform Noise; IM = Impulse Noise.

Figure 4 .
Figure 4. Average MDI and Error bar plots for 1000 noise realizations analyzed for each noise approach, using the full extent of the waveform: (a) for 1998 (waveform file ID 666) and (b) for 2005 (waveform file ID 8983).Take note of the increasing values of the MDI as the levels of noise are increased.The impulse noise shows abrupt increase of the standard deviation of the mean.The reference MDI is shown as horizontal line.

Figure 5 .
Figure 5.Using the leading subset of the waveform, the plots show the average MDI and the equivalent standard errors for 1000 noise realizations analyzed for each noise types at various levels of uncertainty.The left panel is for the 1998 sample (waveform file ID 666), while the right panel is for its matched pair in 2005 (waveform file ID 8983).All values of MDI are negative, with the impulse approach having the largest SE.The reference MDI is shown as horizontal line.

Figure 6 .
Figure 6.Using the trailing subset of the waveform, the plots show the average MDI and the equivalent standard errors for 1000 noise realizations analyzed for each noise types at various levels of uncertainty.The left panel is for the 1998 sample (waveform file ID 666), while the right panel is for its matched pair in 2005 (waveform file ID 8983).In this subset, all values of MDI are positive, with the impulse approach having the largest SE.The reference MDI is shown as horizontal line.

Table 4 .
Coefficients of determination (r 2 ) of the MDI vs canopy quasi-heights for the 1998 and 2005 data.Values are kept to four significant figures to show differences.Table 5. Computed RMSE of MDI with 16 pairs of samples from the 1998 and 2005 noise-affected waveforms using the full extent and the trailing subset.Notice that the highest RMSE values are found using the IM approach.

Figure 7 .
Figure 7. MDI (from full-extent waveform) as a function of canopy quasi-height for 16 pairs of the 1998 and 2005 La Selva LVIS datasets.First column: 1998 dataset with added noise levels and; Second column: 2005 dataset with added noise levels.A minimal effect of the noise is observed on the relationship of the MDI with the canopy quasi-height.See Table4for the coefficients of determination.

Figure 8 .
Figure 8. MDI (from waveform trailing subset) as a function of canopy quasi-height for 16 pairs of waveforms from the 1998 and 2005 La Selva LVIS datasets.First column: 1998 dataset with added noise levels and; Second column: 2005 dataset with added noise levels.See Table4for the coefficients of determination.

Figure 9 .
Figure 9. MDI (smooth full-extent waveform) as a function of quasi-height for the 1998 and 2005 La Selva LVIS datasets.First column: 1998 dataset at various window sizes and; Second column: 2005 dataset at various window sizes.The relationships shown between MDI and quasi-height in these figures can be compared with those found in Figure 7 from noise-affected waveforms.

Figure 12 .
Figure 12. 1998 and 2005 values of MDI vs quasi-heights plotted to show the shifting of the MDI over the time period.The arrow shows the direction of the sample pairing from earlier year to a later year.Two groupings of paired samples were observed for both the (a) full-extent and (b) trailing subset of the waveform.

Table 2
lists the statistics of the MDI from the averaged 1000 noise-affected waveforms for a sample pair.Using the full extent of the waveform (Table2a), large MDI values for 1998 and 2005 were observed with the impulse error algorithm.Negative MDI values were seen using the full extent and leading subset, while positive values are generated by the trailing subset.

2A. Full Extent: Original MDI (1998) = -742.33 Full Extent: Original MDI (2005) = -259.68
Table 2c), MDIs exhibited positive values.Statistical calculations showed minimum MDI values were generated using impulse at different levels of errors, ranging from 8.17 to 74.0 for the 1998 sample.A large MDI range was manifested in the 2005 sample (139.36 to 217.23).Smaller ranges were observed from the additive (49.49 and 47.52, for 1998 and 2005 respectively)