Expectation Maximization and the retrieval of the atmospheric extinction coefficients by inversion of Raman lidar data

We consider the problem of retrieving the aerosol extinction coefficient from Raman lidar measurements. This is an ill--posed inverse problem that needs regularization, and we propose to use the Expectation--Maximization (EM) algorithm to provide stable solutions. Indeed, EM is an iterative algorithm that imposes a positivity constraint on the solution, and provides regularization if iterations are stopped early enough. We describe the algorithm and propose a stopping criterion inspired by a statistical principle. We then discuss its properties concerning the spatial resolution. Finally, we validate the proposed approach by using both synthetic data and experimental measurements; we compare the reconstructions obtained by EM with those obtained by the Tikhonov method, by the Levenberg-Marquardt method, as well as those obtained by combining data smoothing and numerical derivation.


Introduction
Lidar systems are increasingly used to characterize the temporal and spatial distribution of the aerosol optical characteristics in the atmosphere [1,2]. Further, since lidar data present high spatial and temporal resolution, air motion can be precisely monitored with them [3]. However, lidar signals provide a very indirect measure of the relative concentration and distribution of the aerosols: the energy observed by a lidar is a function of the extinction and backscattering coefficients which in turn are functions of the microphysical properties of the aerosols. Deriving such properties is therefore possible, but only via the numerical solution of two inverse problems: the first one allows the estimation of the extinction coefficients from the experimental data [4][5][6][7] and the second one allows the reconstruction of the size distribution, shape and refractive index of the aerosols [8][9][10][11], starting from the knowledge of the extinction and backscatter coefficient at multiple wavelengths.
The present paper introduces the use of Expectation Maximization (EM) for the solution of the first lidar inverse problem and shows that this iterative scheme is highly competitive, in terms of reconstruction accuracy, for this kind of data analysis. EM provides the unique non-negative solution of the maximum likelihood problem when the model equation is linear and the distance between the experimental and reproduced data is measured by means of a specific topology named the Kullback-Leibler (KL) distance [12]. The lidar problem is of course linear. Further, KL is a convex distance particularly appropriate to describe the data discrepancy in the case of Poisson noise. Although the Poisson distribution is a reasonable approximation for describing the lidar experimental observations (such an approximation is particularly reliable at intermediate altitudes while significant systematic corrections occur at low and high altitudes), the input data actually utilized in this paper is the result of a non-linear processing of the experimental measurements and therefore are not Poisson. One of the main results of our investigation is that EM is nonetheless very effective in reconstructing the optical coefficients from such data.
We finally observe that, due to the presence of noise, the maximum likelihood approach tends to provide excessively oscillating solutions, as the random fluctuations of noise are amplified in the inversion procedure [14]. However, numerical experiments heuristically show that stopping the iterative process at some optimal iteration regularizes the solution, thus preventing the occurrence of artifacts and over-fitting [15]. Being inspired by results obtained in solar spectroscopy using hard X-ray data [16], here we use a stopping criterion imposing that the cumulative normalized data residuals satisfy an inequality derived by the central limit theorem.
The plan of the paper is as follows. Section 2 sets up the inverse problem considered and describes EM together with its statistical stopping rule. Section 3 studies the effective resolution associated with the inversion method in the case of noise-free synthetic data. Section 4 performs an error analysis on the profiles used by the European Aerosol Research LIdar NETwork (EARLINET) to assess the inversion performances of different numerical procedures and as training cases for future studies [5,[17][18][19]. Section 5 presents an application to an experimental dataset. Our conclusions are offered in Section 6.

The lidar inverse problem
In the atmospheric applications of the lidar technique, measurements of the backscattered mean power P (z) at altitude z and at a specific wavelength λ are related to the extinction coefficient α λ (z) and the backscatter coefficient β λ (z) via the lidar signal equation where C λ is a constant accounting for the laser optical power and the overall detection efficiency, and the laser beam has been assumed to be within the field of view of the receiver at any altitude (unitary overlap function). A similar equation can be written for the vibrational Raman lidar signal at wavelength µ where C µ includes laser optical power, detection efficiency and N 2 Raman scattering cross section, and the overlap function has also been assumed to be unitary; ρ(z) is the molecular density of the atmosphere, which is assumed to be known, and α := α µ +α λ represent the total extinction coefficient, due to atmospheric particles and gases, at λ and µ. The molecular contribution can be calculated from the Rayleigh theory of scattering while the relation between the particles' contribution can be derived from Mie scattering theory [20]. The advantage of Raman equation is evident: since ρ can be determined from a known atmospheric model, α can be computed. Advanced lidar systems with elastic and vibrational Raman channels are available nowadays, and deliver at the same time both backscatter and extinction coefficients. Therefore, in this study we used data coming from a multi-wavelength lidar system (elastic and Raman at several wavelengths).
In this study we focus on the problem of the inversion of the Raman data; the subscript µ is omitted but implied. Simple algebra applied to Eq. (2) leads to which can be written in compact form as where: is the input data in Eq. (3); • x, i.e. the extinction coefficient α(z), is the unknown in Eq. (3).
In the following, with a slight abuse of notation, we will be calling y ∈ R N the vector whose i-th entry y i is y(z i ), where {z i , i = 1, . . . , N } is a set of sampled altitudes where the lidar measurements occur; x is 2 defined accordingly, and H is the N × N matrix associated with the integral operator H. Therefore we are interested in the numerical solution of the linear system y = Hx .
The ill-conditioning of Eq. (5) implies that straightforward inversion of H would lead to an uncontrollable amplification of noise in the reconstructed extinction profile; therefore regularization is needed to produce stable and physically reliable results [21]. The concept at the basis of regularization consists of replacing the exact problem described by Eq. (4) with the relaxed problem where D(y, x) is some properly defined measure of the discrepancy between the measured data and the data predicted by the model, and Ω is a subset (not necessarily proper) encoding the a priori information about the solution. Examples of possible choices for Ω are: the convex set of all non-negative functions, when one a priori knows that the solution is non-negative; the subspace of functions with compact support, when one a priori knows that the solution is zero outside a closed and limited domain. In constrained maximum likelihood approaches to regularization, the explicit form of the discrepancy measure depends on the statistic of the noise affecting the experimental data. Indeed, maximum likelihood looks for x that maximizes the likelihood, i.e. the probability to observe y from the model Hx. In the case of Gaussian noise this probability is where A is a normalization constant and · is the Euclidean norm. Therefore, in this case, maximizing the likelihood corresponds to minimizing the negative logarithm of Eq. (7) and D(x, y) becomes the Euclidean discrepancy D(y, x) = y − Hx 2 .
If one also chooses then Eq. (6) and Eq. (8) define the Tikhonov regularization method [22], that can be implemented by solving the minimum problemx = arg min where η is a real, positive regularization parameter to determine via optimization. The Tikhonov method was applied to the first lidar inverse problem in [23,24]. Another technique that has been recently applied to regularize the lidar signal equation is described in [6]; there the authors make use of the Levenberg-Marquardt algorithm, an iterative technique that maximizes a Gaussian likelihood with positivity constraint on the solution; regularization is attained in [6] by stopping the iterations with the L-curve criterion.
On the other hand, if the data vector y is a realization of a Poisson random vector, the likelihood can be written as Again, maximizing this likelihood corresponds to minimizing the discrepancy measure provided by its negative logarithm, known as Kullback-Leibler divergence Expectation Maximization (EM) is the constrained maximum likelihood method that chooses Ω to be R N + * (the subset of the strictly positive elements of R N ) and utilizes the Kullback-Leibler divergence as discrepancy measure; it is therefore particularly appropriate in the case of Poisson noise. The EM algorithm can be derived by using the Karush-Kuhn-Tucker conditions [25] (named KKT conditions, or theorem) that provide first order necessary conditions for optimality and are largely used for constrained optimization problems [26]. In this case, the KKT theorem leads to a fixed point problem that can be turned into the EM iterative algorithm whose standard form is: where H T is the transpose of H (in the context of EM, H and H T are often referred to as the forward and the backward operator, respectively); furthermore, with abuse of notation, 1 is a column vector with all unit entries, and the products and divisions between vectors are intended to be taken component-wise.
There are several reasons why EM can be specifically appropriate for solving the lidar inverse problem. First, given an initial guess with all strictly positive entries, the solution at each iteration will remain strictly positive, which is a desirable property; we notice that this way to implement the positivity constraint is computationally effective and does not require any projection at each iteration as, for example, in the case of the projected Landweber method [28]. Second, the order of magnitude of the initial guess has no impact on the solution: this can be easily seen by looking at the form of the iteration described by Eq. (13), where x k at the right hand side appears both at the numerator and at the denominator. Furthermore, since the problem is a convex optimization problem, in the noise-free case and for an infinite number of iterations, the algorithm will converge to the one and only minimum of the Kullback-Leibler divergence [29]. We finally notice that, as previously observed, EM is based on the use of the Kullback-Leibler divergence which is appropriate in the case of Poisson noise. Lidar measurements obey the Poisson statistic but the data utilized as input in this inverse problem are obtained by processing such measurements (see Eq. (3)) and therefore are not exactly Poisson. However, in order to reduce the numerical instability due to ill-conditioning, in the case of noisy data the EM iteration scheme must be stopped according to an optimal criterion, which smoothens the effects of a non-ideal choice of the discrepancy [15].
The problem of stopping the iterations at a sensible point is known to be a difficult one in the inverse problems literature in general [14], and specifically for the EM method [13,14]. In fact, as the iterations proceed, the EM algorithm produces solutions whose predicted data are closer and closer to the recorded data; at the same time, the solutions become more and more complex, as they attempt to reproduce the finer details in the data; however, as the iterations proceed such fine details are mostly noise. This intuitively explains why it is so important to stop the iterations within a reasonable range. To achieve that goal we applied a statistical criterion first described in [16], which stops the iterative scheme when the residual between the measured and the predicted data is compatible with the noise level. More specifically, let • P i be the measured Raman signal at altitude z i ; •P i be the data generated by the extinction profile x, i.e.P i = C ρ z 2 i exp (−(Hx) i ), at altitude z i ; • σ P i be the uncertainty of the measurements at altitude z i . At each iteration, we compute the cumulative mean of the normalized residuals up to z i : Following [16], we stop the iterations when ∆ i < K √ i for every i; specifically we will be using K = 3. The rationale behind this criterion is as follows. If the reconstructed solution is close enough to the true value, then the predicted dataP is close to the exact (noise-free) data. As a consequence, each term in the sum (14) is the difference between the measured and the exact data, i.e. it is noise. As such, it can be interpreted as an independent variable with zero mean. By the central limit theorem, as i increases, the distribution of ∆ i tends to a Gaussian distribution, with variance 1 √ i . It is well known that 99.7% of the values of a Gaussian fall within 3 standard deviations. In conclusion, if the reconstructed solution is close to the exact value, then the cumulative residuals must fall within the prescribed range; if they don't, then we have not 4 reached a good enough reconstruction, and therefore we proceed with the iterations. We stop at the first iteration in which the data generated by the solution is compatible with the measured data, in statistical sense.
Before proceeding with the numerical studies, we specify that all the computations described in the present paper have been performed on a MacBook Pro, 3.1 GHz, Intel Core i7; therefore the computational costs below are intended to be relative to this processor.

Effective resolution
One of the key points in the inversion of lidar data is the quantification of the spatial resolution achievable by means of a specific inversion method. Indeed, regularization algorithms tend to introduce a blurring effect on the reconstructed backscattering/extinction profiles; such blurring is, at times, due to the smoothing procedure that is applied directly to the data in order to increase the signal-to-noise ratio; in other cases, it is due to the smoothness prior used in the penalty term (e.g. in the case of the Tikhonov). Whatever the reason, the result is that the reconstructed profiles might have significant limitations in distinguishing different peaks when these are closer than a threshold (usually referred to as effective resolution).
Before we proceed with the analysis, we notice that it is not possible to characterize the effective resolution of EM analytically, as one can do, for instance, with the Tikhonov method or Levenberg-Marquardt by means of the averaging kernel procedure. This is due to the non-linear nature of the iterative scheme defined by Eq. (13). Therefore, in order to investigate the effective spatial resolution of the proposed algorithm, in this section we use synthetic data generated by a simulated extinction profile containing multiple equispaced peaks (delta functions). The simulated profile is discretized every 15 meters from ground level to 15,000 meters. Every ten points (i.e., 150 m) there is a peak of intensity 10 −4 m −1 ; all other points are set to zero. We use this profile to generate synthetic data following Eq. (5). True and reconstructed profiles are shown in Figure 1 where zooming on a specific peak profile allows visual inspection of the accuracy with which it is reconstructed. We notice that such accuracy is clearly related to the number of iterations utilized by the scheme. Indeed, pushing the algorithm to convergence (for example, letting it work for 500, 000 iterations -computational time: 37.8 seconds) provides a substantially ideal resolution power. On the other hand, stopping the iterations too early introduces a non-homogeneous blurring that affects the central part of the range more than the extremes. We speculate that the peculiar non-homogeneity of the blurring is due to the nature of the forward and backward operators H and H T , that integrate between zero and z k and between z k and z N , respectively, and both appear in the EM algorithm. Their effect is therefore more intense in the middle of the range, and symmetric with respect to the central altitude. We also observe that, while failing to reconstruct the height of the peaks, the intensity of the peaks is actually spread over a larger range of altitudes and the overall energy of the signal is conserved. This effect is represented in Figure 2, which shows the result of integrating three peaks in the reconstruction corresponding to 10, 000 iterations (computational burden of 0.7 seconds).
We performed many tests, changing the number of peaks and the distance between them. The general behaviour remains the same, with a non-uniform blurring across altitudes, and better and better reconstructions for more iterations. However, we observed that the strength of the blurring depends on these parameters quite substantially: for example, with three peaks with an inter-distance of 45 meters (3 sampling points) the reconstructed profile is close to the true one already at iteration 20,000; with two peaks with an inter-distance of 150 meters (10 sampling points), 10,000 iterations are enough to obtain almost perfect reconstruction.
At a different level, it is worth to observe that the resolution power of the EM algorithm is also affected by the rearrangements applied to Eq. (2). Such a procedure requires to take the log of P (z)z 2 Cρ(z) and therefore constraints over the values of this quantity are mandatory, to make sure the computation of the logarithm itself makes sense. As a consequence, some of the datapoints are discarded, and this automatically changes the number of altitudes for which it is possible to compute a solution. Eventually, our numerical investigations suggest that the problem of quantifying the spatial resolution of EM is a rather complex one, and will be the topic of future work.

Error analysis
We now investigate the impact of noise on the stability of the reconstruction. To do so, we utilize the synthetic extinction profiles used in EARLINET for assessing the performances of different inversion procedures. We consider both wavelengths λ = 355 and λ = 532, and restrict the range of altitudes to the first 9 kilometers. 6 For each λ we construct thirty different realizations of the data, by perturbing with Poisson noise the exact data obtained through Eq. (5); then we average these thirty realizations, mimicking the process of recording lidar data every minute, and then averaging over thirty minutes, as typical in EARLINET. We apply EM and obtain an estimated extinction profile. This whole procedure is repeated thirty times to obtain a Monte Carlo variance of the estimated profile. In Figure 3 we show the mean reconstructed profile (over the thirty repetitions), together with the confidence band and the true profile. In order to stop the iteration we applied the criterion based on the analysis of the cumulative residuals described in Section 2, using K = 3 ( Figure  3). The computational burden, for the entire Monte-Carlo process, is of 21.1 seconds and 19.2 seconds for the λ = 355 and λ = 532 cases, respectively. As a final test, we compared the performances of EM, of the Tikhonov method, of the Levenberg-Marquardt algorithm [6] and of a computation tool based on numerical derivative: the ACR (A Complete Retrieval tool for lidar signal analysis) software package from Advanced Lidar Applications srl (http: //www.alasystems.it/). ACR allows the reconstruction of the extinction profiles by computing the numerical derivative of the Raman signal, making use of several different types of filters.
In this application, both the optimal iteration number for EM and for Levenberg-Marquardt, and the optimal value of the Tikhonov regularization parameter have been selected by means of the criterion of residuals, explained above; for Tikhonov, this is realized by reducing the regularization parameter (rather than proceeding with the iterations) until the criterion is satisfied. On the other hand, the ACR set up requires many parameters to be tuned ad hoc, by an experienced user, for optimization between filtering and sharp structure localization according to the signal-to-noise ratio. The included features are: • spike rejection: for each set of three consecutive points, the pairwise differences are computed: if they have opposite signs (and bigger that a preselected threshold) the central point is replaced with the average of two adjacent points; • binning: the signal can be subjected to a binning of a variable (preselected) number of data points; • Gaussian filtering: the signal is subjected to a Gaussian filter of variable (preselected) variance; • running average: average current of a variable number of points. Figure 4 shows that EM provides a more stable reconstruction and is more accurate in reproducing the signal peaks with respect to the one obtained via Tikhonov; also the solution provided by Levenberg-Marquardt exhibits substantially more oscillations at higher altitudes. In the figure, for each curve, each point corresponds to an altitude at which the signal has been computed. It is possible to observe that EM outperforms ACR in terms of spatial resolution of the reconstructed data.

Application to experimental data
We have used EM to invert two experimental lidar profiles and compared the results with the ones provided by the application of the Tikhonov method, of the Levenberg-Marquardt method, and the application of the numerical derivative method by ACR. For this comparison the lidar signals of two distinct apparatuses used in very different atmospheric conditions are considered. The first case ( Figure 5) refers to measurements made in Naples with the MALIA (Multiwavelength Aerosol LIdar Apparatus) lidar [30] at 355 nm, with a laser pulse energy of 100 mJ, 20 Hz repetition rate and a 30 cm aperture telescope. Data are collected in Naples 2010/04/20 during a volcanic ash event produced by the eruption of Icelandic volcano Eyjafjallajökull. The second case (Figure 6), is related to measurements made in Naples 2015/04/24 with the AMPLE (Aerosol Multiwavelength Polarization Lidar Experiment) lidar [31] at 355 nm during a Saharan dust event. The AMPLE lidar makes use of laser source operating at 1000 kHz repetition rate, 0.6 mJ pulse energy, and a receiving telescope of 25 cm diameter. Figure 5 shows, in the top left panel, a comparison of the aerosol extinction profiles obtained by EM, Tikhonov, Levenberg-Marquardt, and numerical derivative by ACR. In EM, Tikhonov and Levenberg-Marquardt cases, regularization is realized by using the optimality criterion based on the analysis of cumulative residuals. The EM retrieved aerosol extinction profile together with the confidence band is shown in the top right part of the figure. The computational burden for EM, in this case, is of 0.6 seconds, and the selected iteration is 4,601. The Monte-Carlo process employed to obtain the confidence band has a computational burden of 40.9 seconds.
The Figure points out that EM outperforms both the Tikhonov and the Levenberg-Marquardt method as far as numerical stability is concerned. This is particularly true at higher altitudes, where the signal-tonoise ratio of the experimental measurements notably deteriorates. The EM and ACR solutions are in good agreement each other, taking into account the uncertainty bars on the ACR solution and the confidence bar for EM solution reported in the top right panel. The corresponding aerosol backscattering coefficient vertical profile as retrieved with the Raman method [32] is plotted in the bottom left panel. This part of the figure clearly shows the presence of a cloud around 3.2 km of altitude. Taking into account that the spatial resolution of the backscattering profile is 60m, the figure demonstrates the superior performance of EM with respect to ACR. In fact, the spatial resolution of the aerosol extinction profile retrieval results to be 180 m and 300 m for EM and ACR respectively. The better spatial resolution could be of extreme relevance for the accurate evaluation of the aerosol extinction to backscattering ratio (i.e. the lidar ratio) of the narrow atmospheric features. In addition, we notice that the extinction profile reconstructed by EM presents sharp variations at low altitudes that do not appear in the profiles reconstructed by the other methods. We tend to believe that the reconstruction provided by EM is more reliable than the others, for two main reasons. The first one follows from a visual comparison (Fig. 5, bottom left panel) with the estimated backscatter coefficient (obtained independently from the absorption coefficient, as mentioned above); indeed, the bumps in the backscatter coefficient below 1,500 m are in correspondence with the highest peaks in the extinction coefficient reconstructed by EM. The second reason is that the residuals, i.e. the difference between the measured and predicted data, of the EM solution are smaller and less correlated than those of the other two, as shown in Fig. 5 (bottom right panel), thus suggesting that the EM solution explains the data much better than the others. In this case the comparison is limited to Tikhonov and Levenberg-Marquardt because the solution obtained by ACR has a significantly lower spatial resolution, and comparing the residuals would not be meaningful. Profiles in Figure 6 have been obtained from measurements performed during a Saharan dust event. The extinction profiles show the presence of aerosol in the low range up to two kilometers altitude. In EM, Tikhonov and Levenberg-Marquardt cases, regularization is realized by using the optimality criterion based on the analysis of cumulative residuals. Also in this case there is a good agreement between EM and ACR retrieval while instability of Tikhonov results appear above 3 km of altitude. Levenberg-Marquardt appears to be more stable than Tikhonov but still with some fluctuations at higher altitudes. The computational burden for EM, in this case, is of 0.4 seconds, and the selected iteration is 507. The Monte-Carlo process employed to obtain the confidence band has a computational burden of 31.5 seconds.

Conclusions
In this paper we have described the application of the EM iterative algorithm, stopped with a cumulative residuals criterion, for reconstruction of the extinction coefficient from Raman lidar data.
Our choice of EM was mostly due to the presence of some desirable properties, namely: the natural and seemless incorporation of a positivity constraint; the fact that, despite being an iterative algorithm, the solution does not depend on the order of magnitude of the first guess; its simplicity and computational efficiency.
We have used synthetically generated data to investigate the capability of EM in reconstructing atmospheric profiles, and observed that the reconstructions provided by EM are more accurate than those provided by the Tikhonov method and by Levenberg-Marquardt. In particular, reconstructions provided by EM seem to have a better overall behaviour across different altitudes: at the lower altitudes they tend to be less oversmoothed than those provided by Tikhonov, where the penalty term encourages continuity of the solution; at higher altitudes they appear to be less affected by noise.
Application of EM to experimental data has been performed considering the lidar signals acquired from two lidar devices differing for laser pulse energy, repetition rate and receiving optics. Further, these two cases correspond to very different atmospheric situations. Both applications confirm that the algorithm can provide meaningful results with real data; in particular, the estimated profiles were similar, overall, to those provided by Tikhonov, by Levenberg-Marquardt and by ACR. But it should be noted that Tikhonov profiles show a greater instability in correspondence of the lowest values of the signal to noise ratio; some oscillations, although fewer than with Tikhonov, are present also in the profile reconstructed by Levenberg-Marquardt. In real cases EM also seems to provide more detailed solutions at low altitudes. To assess the reliability of these additional details we compared the reconstructed profile with an independently obtained profile of the backscatter coefficient, and we also plotted the residuals obtained by the different reconstructions. Both tests indicate that the EM solution is reliable.
In conclusion, both synthetic and experimental data indicate that the solutions provided by EM are comparatively less noisy at high altitudes and more detailed at low altitudes. We speculate that this pleasant feature is mainly due to the discrepancy metric used by EM: indeed, the Kullbach-Leibler divergence, differently from the plain Euclidean norm used by Tikhonov and Levenberg-Marquardt, implies a signaldependent penalization of the discrepancy between measured and modeled data: stronger signals are allowed to have more variance than small signals. Even though such weighting is optimal for Poisson distributed data (and this is not the case for the log-transformed lidar data), it seems to push the solution to have a better fit at low altitudes while back-projecting less noise at high altitudes. We also speculate that a similar effect might be obtained, in Tikhonov and Levenberg-Marquardt, using a weighted norm in the discrepancy term (corresponding to a Gaussian distribution with non-uniform variance across the altitudes). Future work might be devoted to better investigate this point.

Funding
Advanced Lidar Applications srl.