Nonlinear least squares algorithm for identification of hazards

Given observations of selected concentrations, one wishes to determine unknown intensities and locations of the sources of a hazard. The concentration of the hazard is governed by a steady-state nonlinear diffusion–advection partial differential equation, and the unknowns are determined by a best fit to the data. The discretized version leads to a coupled nonlinear algebraic system and a nonlinear least squares problem. The coefficient matrix is a nonsingular M-matrix and is not symmetric. The iterative methods are compositions of nonnegative least squares and Picard/Newton methods, and a convergence proof is given. The singular values of the associated least squares matrix are important in the convergence proof, in the sensitivity to the parameters of the model, and in the location of the observation sites.


Introduction
The paper is a continuation of the work in White (2011). The first new result is a convergence proof of the algorithm to approximate a solution to a nonlinear least squares problem. A second new result is a sensitivity analysis of the solution as a function of the model parameters, measured data, and observation sites. Here, the singular value decomposition of the least squares matrix plays an important role.
A convergence proof is given for the algorithm to approximate the solution of a coupled nonlinear algebraic system and a nonnegative least squares problem; see (9)-(11) in this paper and in White (2011). These discrete models evolve from the continuous models using the advection-diffusion PDE in Equation (1) (see Pao, 1992). In contrast to the continuous models in Andrle and El Badia (2015), Hamdi (2007), and Mahar and Datta (1997), this analysis focuses on the linear and semilinear algebraic systems. In Hamdi (2007), the steady-state continuous linear model requires observations at all of the downstream portions of the boundary. In Andrle and El Badia (2015), the continuous linear velocity-free model, where the intensities of the source term are time dependent, requires observations at the boundary over a given time interval. In both these papers, the Kohn-Vogelius objective function is used for the symmetric models in the identification problems.
Here and in White (2011) we assume the coefficient matrix associated with the steady-state discretized partial differential equation in Equation (1) is an M-matrix and may not be symmetric, which includes coefficient matrices from upwind finite difference schemes for the advection-diffusion PDE.
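As a concrete illustration (not from the paper), the M-matrix property of an upwind scheme can be checked numerically: the off-diagonal entries are nonpositive and the inverse is entrywise nonnegative. The following Python sketch builds a 1D upwind finite difference matrix; all parameter values are illustrative assumptions.

```python
import numpy as np

def upwind_matrix(n, D=1.0, v=1.0, dr=0.1, L=1.0):
    """Upwind finite difference matrix for -D u'' + v u' + dr*u on (0, L)
    with homogeneous Dirichlet boundary values and n interior nodes."""
    h = L / (n + 1)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2 * D / h**2 + abs(v) / h + dr
        if i > 0:                      # upwind: backward difference when v > 0
            A[i, i - 1] = -D / h**2 - (v / h if v > 0 else 0.0)
        if i < n - 1:                  # upwind: forward difference when v < 0
            A[i, i + 1] = -D / h**2 - (-v / h if v < 0 else 0.0)
    return A

A = upwind_matrix(20)
# Z-matrix: all off-diagonal entries are nonpositive.
assert all(A[i, j] <= 0 for i in range(20) for j in range(20) if i != j)
# Nonsingular M-matrix: the inverse is entrywise nonnegative.
assert np.all(np.linalg.inv(A) >= -1e-12)
```

Strict diagonal dominance (here supplied by the decay rate dr > 0) together with this sign pattern is what makes A a nonsingular, and generally nonsymmetric, M-matrix.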
The least squares matrix C( :, ssites), see Equations (6-8), is a k × l matrix where k > l, and k and l are the number of observation and source sites, respectively. The rank and singular values (see Meyer, 2000, section 5.12) of C( :, ssites) determine whether or not the observation and source sites are located so that the intensities can be accurately computed.
Four examples illustrate these methods. The first two examples in Section 3 are for n = 2 and n = 3 unknowns, and they show some of the difficulties associated with the nonlinear problem.
Sections 4 and 5 contain the convergence proof. Examples three and four in Section 6 are applications of the 1D and 2D versions of the steady-state advection-diffusion PDE. Sensitivity of the computed solutions due to variation of the model parameters, the data and the relative location of the source and observation sites is illustrated. The connection with the sensitivity and the singular values of the least squares matrix is established.

Problem formulation
Consider a hazardous substance with sources at a finite number of locations. The sources are typically located at points in a plane and can be modeled by delta functionals. The hazard is described as a concentration or density by a function of space and time, u(x, y, t). This is governed by the advection-diffusion PDE where S is a source term, D is the diffusion, v is the velocity of the medium, d_r is the decay rate, and f(u) is the nonlinear term such as in logistic growth models. The source term will be a finite linear combination of delta functionals where the coefficients a_i are the intensities and the locations x_i are possibly unknown points in the line or plane. The source coefficients should be nonnegative, which is in contrast with impulsive culling, where the coefficients of the delta functionals are negative and have the form −c_i u(x_i) (see White, 2009).
We will assume the source sites and the observation sites do not intersect. The reason for this is that the ability to measure concentrations at a site most likely allows one to measure the intensity at that same site directly.
Identification problem: Given data for u(osites) at a number of given observation sites, osites, determine the intensities (a_i) and locations (x_i) at source sites, ssites, so that ‖data − u(osites)‖ is suitably small.

Linear least squares
Since the governing PDE is to be discretized, we first consider the corresponding linear discrete problem Au = d. The coefficient matrix A is derived using finite differences with upwind differencing on the velocity term. Therefore, we assume A is a nonsingular M-matrix (see Berman & Plemmons, 1994, chapter 6, or Meyer, 2000). The following approach finds the solution in one least squares step. The coefficient matrix is n × n, and the nodes are partitioned into three ordered sets, whose order represents a reordering of the nodes. The discrete identification problem is: given data at osites, find d(ssites) such that ‖data − u(osites)‖ is suitably small. In general, the reordered matrix has a 2 × 2 block structure with other = ssites ∪ rsites, where a = A(osites, osites), e = A(osites, other), f = A(other, osites), and b = A(other, other).

The three ordered sets of nodes are osites (the observation sites), ssites (the source sites), and rsites (the remaining sites).

Assumptions of this paper:
(1) A is an M-matrix so that A and a have inverses.

Block elimination of the reordered system Au = d then gives a least squares problem for z = d(ssites).
Since a is k × k, e is k × (n − k), and b is (n − k) × (n − k), the matrix C is k × (n − k) and the least squares matrix C( :, ssites) is k × l where k > l. The least squares problem in (7) has a unique solution if the columns of C( :, ssites) are linearly independent (C(:, ssites)z = 0 implies z = 0).
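To make the construction concrete, the following Python sketch builds a small hypothetical 1D upwind matrix, forms the blocks a, e, f, b, and recovers the intensities from exact data by least squares. The elimination C = e b^{-1} used here is one plausible reading of the block transformation in (4)-(8); the sites, parameters, and intensities are illustrative assumptions, not those of the paper's examples.

```python
import numpy as np

# Hypothetical 1D upwind M-matrix for -D u'' + v u' + dr*u (illustrative values).
n = 40
h = 1.0 / (n + 1)
D, v, dr = 1.0, 0.5, 0.1
A = (np.diag((2 * D / h**2 + v / h + dr) * np.ones(n))
     + np.diag((-D / h**2 - v / h) * np.ones(n - 1), -1)
     + np.diag((-D / h**2) * np.ones(n - 1), 1))

osites = [5, 10, 15, 20, 25, 30, 35]          # k = 7 observation sites
ssites = [8, 18, 28]                          # l = 3 source sites
other = [i for i in range(n) if i not in osites]

# Forward problem: nonzero sources only at ssites.
z_true = np.array([2.0, 5.0, 1.0])
d = np.zeros(n)
d[ssites] = z_true
data = np.linalg.solve(A, d)[osites]

# Block elimination: a u(osites) + e u(other) = 0 and
# f u(osites) + b u(other) = d(other) combine into
# C d(other) = (C f - a) data with C = e b^{-1}.
a = A[np.ix_(osites, osites)]
e = A[np.ix_(osites, other)]
f = A[np.ix_(other, osites)]
b = A[np.ix_(other, other)]
C = e @ np.linalg.inv(b)
scols = [other.index(s) for s in ssites]      # columns of C at the source sites
z, *_ = np.linalg.lstsq(C[:, scols], (C @ f - a) @ data, rcond=None)
assert np.allclose(z, z_true)                 # exact data gives exact intensities
assert np.linalg.matrix_rank(C[:, scols]) == len(ssites)   # full column rank
```

With exact data the overdetermined 7 × 3 system is consistent, so the least squares solution reproduces the intensities; with noisy data the accuracy is governed by the singular values of C(:, ssites), as discussed in Section 6.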
Impose the nonnegative condition on the components of the least squares solution. That is, consider finding z ≥ 0 satisfying the least squares problem in (8). If the matrix C( :, ssites) has full column rank, then the nonnegative least squares problem (8) has a unique solution (see Björck, 1996, section 5.2). The nonnegative least squares solution can be computed using the MATLAB command lsqnonneg.m (see MathWorks, 2015).
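In Python, SciPy's scipy.optimize.nnls (a Lawson-Hanson active-set implementation) plays the role of MATLAB's lsqnonneg.m. A minimal sketch with a hypothetical 7 × 3 full-column-rank matrix standing in for C(:, ssites):

```python
import numpy as np
from scipy.optimize import nnls   # Lawson-Hanson, analogous to lsqnonneg.m

rng = np.random.default_rng(1)
# Hypothetical 7 x 3 stand-in for C(:, ssites); full column rank almost surely.
M = rng.uniform(0.1, 1.0, size=(7, 3))
z_true = np.array([2.0, 0.0, 5.0])     # nonnegative intensities, one inactive
z, resid = nnls(M, M @ z_true)
assert np.all(z >= 0)                  # the nonnegativity constraint holds
assert np.allclose(z, z_true, atol=1e-8)
assert resid < 1e-8                    # consistent data: zero residual
```

Since the data here are consistent and the unconstrained minimizer is already nonnegative, the constrained solver reproduces it exactly; full column rank guarantees the solution is unique.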

Least squares and nonlinear term
Use the reordering in (3) and repeat the block elementary matrix transformation in (4) and (5) applied to the semilinear system. Solve for u_1 and then for u_0. This leads to a nonlinear least squares problem for nonnegative z. The solution of the nonnegative least squares problem gives the nonzero components of d, that is, d(ssites) = z. The coupled problem from (9) and (10) is to find z ≥ 0 and u satisfying both equations.

Two algorithms for the coupled problem
If the nonnegative least squares problem has a unique solution z = H(u) for a suitable u, then we can write (11) as a single equation in u, or as a fixed point problem u = G(u). The following algorithm is a composition of nonnegative least squares and a Picard update to form one iteration step. Using additional assumptions, we will show this is contractive and illustrate linear convergence. Example 3.2 is a simple implementation with n = 3; in Section 6, Example 6.1 uses the 1D steady-state version of the model in (1) and Example 6.2 uses the 2D steady-state model in (1).
The following algorithm uses a combination of nonnegative least squares and Newton-like updates. The matrix F_u(z, u) is not the full derivative matrix of F(z, u) because it ignores the dependence of z on u. Choose ε > 0 so that F_u + εI has an inverse.
Variations of the NLS-Newton algorithm include using over-relaxation, w, at the first linear solve and using damping on the second linear solve in the Newton update. Appropriate choices of these parameters can accelerate convergence.
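A minimal sketch of a Newton-type variant on the same kind of hypothetical model: F(z, u) = A u − d(z) − c u(1 − u) with F_u = A − c diag(1 − 2u), the partial derivative that ignores the dependence of z on u. The ε-shift is omitted since F_u is safely invertible here, and the damping parameter is left at 1; all sites and values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical semilinear model: F(z, u) = A u - d(z) - c*u*(1-u).
n, D, v, dr, c = 40, 1.0, 0.5, 0.1, 0.01
h = 1.0 / (n + 1)
A = (np.diag((2 * D / h**2 + v / h + dr) * np.ones(n))
     + np.diag((-D / h**2 - v / h) * np.ones(n - 1), -1)
     + np.diag((-D / h**2) * np.ones(n - 1), 1))
osites = [5, 10, 15, 20, 25, 30, 35]
ssites = [8, 18, 28]
other = [i for i in range(n) if i not in osites]
scols = [other.index(s) for s in ssites]
a, e = A[np.ix_(osites, osites)], A[np.ix_(osites, other)]
f, b = A[np.ix_(other, osites)], A[np.ix_(other, other)]
C = e @ np.linalg.inv(b)

z_true = np.array([2.0, 5.0, 1.0])
d0 = np.zeros(n); d0[ssites] = z_true
u_star = np.zeros(n)
for _ in range(200):                       # synthetic data by Picard iteration
    u_star = np.linalg.solve(A, d0 + c * u_star * (1 - u_star))
data = u_star[osites]

damp = 1.0                                 # damping parameter (1.0 = plain update)
u = np.zeros(n)
for _ in range(50):
    phi = c * u * (1 - u)
    rhs = (C @ f - a) @ data + phi[osites] - C @ phi[other]
    z, _ = nnls(C[:, scols], rhs)                     # NNLS step for z
    d = np.zeros(n); d[ssites] = z
    Fu = A - c * np.diag(1 - 2 * u)                   # ignores dependence of z on u
    u = u - damp * np.linalg.solve(Fu, A @ u - d - phi)

assert np.allclose(z, z_true, atol=1e-6)
```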
The calculations recorded in Table 4 and Figure 4 in White (2011) illustrate the NLS-Newton algorithm for the 1D steady-state problem with a logistic growth term and variable growth rate c. Larger values of the parameter c resulted in larger concentrations. The intensities at the three source sites were identified. However, the number of iterations of the NLS-Newton algorithm increased as the parameter c increased. The algorithm did start to fail for c larger than 0.0100.
The lack of convergence is either because the associated fixed point problem does not have a contractive mapping or because the solutions become negative. The following example with n = 2, nonzero source term and nonsymmetric A illustrates the difficulty of solving a semilinear algebraic problem where the solution is required to be positive.
The next example, with n = 3, given data at the first and second nodes and a source term at the third node, illustrates the coupled problem in (11). This simple example can be solved both by hand calculations and by the NLS-Picard algorithm; more details are given in Appendix 1. The computed solutions are given in Figure 1. The left graphs indicate the five solutions of (10) for fixed z = zz = 10 and increasing c. The two "*" are the given data, which are from the solution of (10) with c = 0.0800 and 5% error. The right graphs are the converged solutions of (11) for c = 0.08:0.04:0.20.

Nonnegative least squares problem
Consider the nonnegative least squares problem in (10). When the nonnegative condition z ≥ 0 is imposed and the matrix B ≡ C(:, ssites)^T C(:, ssites) is SPD, the least squares problem is equivalent to a variational inequality (see Cryer, 1971): find z ≥ 0 such that (y − z)^T (Bz − ĝ) ≥ 0 for all y ≥ 0, where ĝ is the associated right-hand side. Any solution of a variational inequality is unique and depends continuously on the right-hand side. The next theorem is a special case of this.

Theorem 4.1 Let z = H(u) be the solution of (10) for u ∈ S, where S is a bounded set of nonnegative n-vectors. Moreover, if ‖d̂(u) − d̂(v)‖ ≤ K‖u − v‖, then there is a constant C_1 such that

(H(u) − H(v))^T B (H(u) − H(v)) ≤ C_1 ‖u − v‖ ‖H(u) − H(v)‖.
Proof Let z and w be solutions of (10). The variational inequalities for z with y = w, and for w with y = z, are

(w − z)^T (Bz − ĝ(u)) ≥ 0 and (z − w)^T (Bw − ĝ(u)) ≥ 0,

where B ≡ C(:, ssites)^T C(:, ssites) and ĝ(u) is the least squares right-hand side. Add these to get a contradiction for the case z − w is not a zero vector:

0 ≤ −(z − w)^T B (z − w) < 0.

In order to show the continuity, use the variational inequalities with z = H(u) and w = H(v):

(w − z)^T (Bz − ĝ(u)) ≥ 0 and (z − w)^T (Bw − ĝ(v)) ≥ 0.

Add these to obtain, for B ≡ C(:, ssites)^T C(:, ssites),

(H(u) − H(v))^T B (H(u) − H(v)) ≤ (H(u) − H(v))^T (ĝ(u) − ĝ(v)).
Use the Cauchy inequality, the definition of ĝ, and the Lipschitz assumption on d̂ to obtain, for some C_1,

(H(u) − H(v))^T B (H(u) − H(v)) ≤ C_1 ‖u − v‖ ‖H(u) − H(v)‖. □

The solution of the variational problem for fixed u has the form z = H(u). The assumption that C(:, ssites) has full column rank implies there is a positive constant c_0 such that c_0^2 w^T w ≤ w^T C(:, ssites)^T C(:, ssites) w. This and the above theorem give the following bound, which is (12):

‖H(u) − H(v)‖ ≤ (C_1 / c_0^2) ‖u − v‖.

Although in Example 3.2 with n = 3 one can estimate c_0 and C_1, generally these constants are not easily approximated. However, c_0 can be estimated by the smallest singular value of the least squares matrix.
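The claim that c_0 can be taken to be the smallest singular value is easy to check numerically: for any matrix M, σ_min(M)² w^T w ≤ w^T M^T M w, since σ_min(M)² is the smallest eigenvalue of M^T M. A short sketch with a hypothetical 7 × 3 stand-in for C(:, ssites):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(7, 3))            # hypothetical stand-in for C(:, ssites)
B = M.T @ M
c0 = np.linalg.svd(M, compute_uv=False)[-1]   # smallest singular value
# c0^2 w^T w <= w^T B w for every w (c0^2 = smallest eigenvalue of B):
for _ in range(100):
    w = rng.normal(size=3)
    assert c0**2 * (w @ w) <= w @ B @ w + 1e-10
```

Full column rank is exactly the condition c_0 > 0, so a computed σ_min near zero signals that the constant in (12) blows up.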

Convergence proof of the NLS-Picard algorithm
Let the assumptions in the previous theorem hold so that (12) is true for all u, v ∈ S. The mapping G(u) ≡ A^{-1}(d(H(u)) + d̂(u)) may not give G(u) ∈ S! This appears to depend on the particular problem and the properties of d̂(u). If the bounded set S is chosen so that d̂(u) is also nonnegative, then assumption (1) implies A^{-1} ≥ 0 and hence G(u) ≥ 0. The least squares condition requires the solution to be "close" to the given data at the osites, which suggests that G(u) should remain bounded. The following theorem gives general conditions on the problem.
Theorem 5.1 Let the assumptions of Theorem 4.1 hold for u, v ∈ S. Assume there is a solution u* ∈ S to the coupled problem (11). Let c_0 and C_1 be from (12). If these constants combine to give a contraction factor r < 1 for the mapping G on S, then the NLS-Picard iterates converge linearly to u*.

Applications and sensitivity analysis
Both the NLS-Picard and NLS-Newton methods converge linearly, that is, the errors at the next iteration are bounded by a contractive constant, r < 1, times the error from the current iteration.
However, the contractive constant for the NLS-Newton algorithm is significantly smaller than the contractive constant for the NLS-Picard algorithm, and this results in fewer iteration steps for the NLS-Newton algorithm. The calculations in the next two examples can be done by either algorithm. The relevance of the singular values of the least squares matrix to the sensitivity of the calculations with respect to variations in the data and the source and observation sites is demonstrated.
Example 6.1 Consider the steady-state 1D model in (1). The following calculations were done by the MATLAB code in White (2015, poll1dlsnonla2.m) with seven observation sites, three source sites, and n = 160. The differential Equation (1) was discretized by the upwind finite difference approximations, and the source terms were the intensities times approximations of the delta functionals

Figure 2. NLS-Newton algorithm with variable c.
where initially d ≡ zeros(n, 1). In order to check the accuracy of the finite difference model, calculations were done with n = 160, 320, and 480, which gave results similar to those reported below for n = 160. The condition number (the ratio of the largest and smallest singular values) of the least squares matrix controls the relative errors in the computed least squares problem. The condition numbers for n = 160, 320, and 480 are about the same: 17.0170, 16.5666, and 16.4146, respectively. For c = 0.010 the NLS-Picard algorithm did not converge, but the NLS-Newton algorithm converged in 92 steps. The solid lines in Figure 2 are with no random error in the data, and the dotted lines are from data with 5% random error. For c = 0.005 the NLS-Picard algorithm converged in 100 steps with r ≈ .83, and the NLS-Newton algorithm converged in 19 steps with r ≈ .41. If w is changed from 1.0 to 1.1, then the NLS-Newton algorithm converges in 16 steps with r ≈ .28. As c decreases, the iterations required for convergence and r decrease.
The dotted lines indicate some uncertainty in the numerical solutions, which can be a function of the physical parameters and the location of the observation sites relative to the source sites. If there is no error in the data, the intensities of the sources are recovered accurately. Consider 100 computations with c = 0.001 and 5% random errors, and record the means and standard deviations of the computed intensities at the three source sites. The standard deviation of the intensity at the center source in Figure 2 is large relative to its mean value. The large computed least squares errors are a result of some relatively small singular values in the singular value decomposition of the least squares matrix, where the singular values σ_i are the diagonal components of Σ; U is 7 × 7, Σ is 7 × 3, and V is 3 × 3.
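The repeated-computation procedure can be sketched in Python on a hypothetical 7 × 3 least squares matrix deliberately built (through its SVD factors) so that one small singular value is aligned with the second source; plain least squares is used for simplicity in place of the nonnegative solver. The large standard deviation then appears in exactly that component. All numbers below are illustrative assumptions, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical 7 x 3 least squares matrix with one small singular value.
U, _ = np.linalg.qr(rng.normal(size=(7, 7)))
sigmas = np.array([1.0, 0.5, 0.01])                 # sigma_3 deliberately small
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])                     # v_3 points at the 2nd source
M = U[:, :3] @ np.diag(sigmas) @ V.T
z_true = np.array([2.0, 5.0, 1.0])
rhs = M @ z_true

samples = []
for _ in range(100):                                # repeated noisy solves
    noisy = rhs * (1 + 0.05 * rng.standard_normal(7))   # 5% random error
    z, *_ = np.linalg.lstsq(M, noisy, rcond=None)
    samples.append(z)
samples = np.array(samples)
z_mean, z_std = samples.mean(axis=0), samples.std(axis=0)
# The component aligned with the small singular value has a much larger
# standard deviation than the other two.
assert z_std[1] > 3 * max(z_std[0], z_std[2])
```

Errors in the data projected onto the left singular vector u_3 are divided by σ_3 and sent along v_3, which is why the standard deviation concentrates in one intensity.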
In this example, the third singular value σ_3 is small. The least squares solution of (10) is given by the pseudo-inverse of the least squares matrix times data − g(u) ≡ d̂. Because σ_3 is relatively small, any errors in d̂ will be amplified in the corresponding components of the solution. This suggests the second component, the center source, of the least squares solution will have significant errors. The errors can be decreased if the percentage random error is smaller or by alternative location of the observation sites. In the 1D model this is not too difficult to remedy. However, in the 2D model this is more challenging. When the velocity is decreased, the third singular value significantly decreases, which can cause failure of the NLS-Picard algorithm or increased computational errors because of the variation in the data. ΔL = 30 and vely = 0.05: In this case the second and third singular values of the least squares matrix decreased, and the standard deviation of the second source increased.
Changing the location of the third observation site can also cause more uncertainty in the calculations, which is indicated by larger standard deviations and smaller singular values of the least squares matrix. Change ΔL = 30 to ΔL = 20 and ΔL = 10 and keep vely = 0.10. ΔL = 20 and vely = 0.10: Here, the third singular value has significantly decreased, which can cause failure of the NLS-Picard algorithm or increased computational errors because of the variation in the data. The standard deviation of the third source has significantly increased. ΔL = 10 and vely = 0.10: In this case the third singular value of the least squares matrix has dramatically decreased, and the standard deviation of the third source is very large relative to the mean.
The calculation in Figure 4, done with the NLS-Picard algorithm (White, 2015, testlsnonlf2da6.m), has six source sites (near the axes) and nine observation sites (downstream and below the "*"). Here the velocity is larger, equal to (1.00, 0.30), and the random error in the data is smaller, equal to 1%. The source closest to the origin is the most difficult to identify; with one percent error in the data the standard deviation is 0.8512, and with five percent error this goes up to 4.2348. This happens despite the relatively close distribution of the singular values, from 0.0782 to 0.0047, for a condition number equal to 16.7658.

Summary
In order to determine locations and intensities of the source sites from data at the observation sites, there must be more observation sites than source sites, and the observation sites must be chosen so that the least squares matrix C(:, ssites) has full column rank. Furthermore, the singular values in the singular value decomposition of the least squares matrix are important in the convergence proof and in the location of the observation sites.
Measurement errors in the observations as well as uncertain physical parameters can contribute to significant variation in the computed intensities of the sources. Multiple computations varying these should be done, and the means and standard deviations should be noted. Large standard deviations relative to the mean indicate a lack of confidence in the computations. In this case, one must adjust the observation sites, which can be suggested by inspection of the smaller singular values of the least squares matrix and the corresponding columns in V of the singular value decomposition.
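The inspection of the columns of V mentioned above can be sketched as follows: the right singular vector paired with the smallest singular value points at the intensity component that data errors perturb the most. The 7 × 3 matrix below is a hypothetical stand-in for C(:, ssites), constructed so that this vector is aligned with the second source.

```python
import numpy as np

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.normal(size=(7, 7)))
V = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
M = U[:, :3] @ np.diag([1.0, 0.5, 0.01]) @ V.T      # hypothetical C(:, ssites)
Uf, s, Vt = np.linalg.svd(M, full_matrices=False)
# The right singular vector for the smallest singular value identifies the
# intensity component that errors in the data perturb the most.
worst = np.argmax(np.abs(Vt[-1]))
assert worst == 1          # here, the second source is the poorly determined one
assert s[-1] < 0.1 * s[0]
```

In practice one would recompute this after each candidate relocation of the observation sites and keep the arrangement whose smallest singular value is largest.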