Using non-diagonal data covariances in geophysical inversion



INTRODUCTION
Measuring misfit, i.e. the discrepancy between observations and the predictions made by geophysical models, is a key element of data analysis and particularly inversion (e.g., Tarantola 2004). Depending on how we define our measure of misfit, we consider different models as adequate representations of the physical properties within the Earth (e.g., Farquharson & Oldenburg 1998). Defining misfit is closely related to estimates of uncertainty of the observed data and any measure of misfit explicitly or implicitly assumes certain confidence limits on each datum. It is only in the combination of a suitably defined misfit measure with some knowledge of data errors that we can make an informed judgement on the validity of a given model of the Earth (Mosegaard & Hansen 2016).
The workflow for most geophysical inversions is to measure an observable of interest and its uncertainty, e.g. the travel-time of a teleseismic phase, the anomalous gravitational attraction in a certain region or the magnetotelluric impedance at a set of observation sites, and find a model that fits these observations. However, under certain circumstances it can be advantageous to not invert these observations directly, but instead some derived quantity. For example, in gravity gradient inversions the use of rotational invariants derived from the components of the gradient tensor (e.g., Heath 2007;Pilkington 2014) can help to deal with changing instrument orientations in airborne surveys (e.g., Pedersen & Rasmussen 1990). In ocean bottom magnetotellurics, inversion of apparent resistivity and phase instead of impedance circumvents problems with cusps in the data (Worzewski et al. 2011) due to the interaction of the electromagnetic fields with the land-sea interface (Key & Constable 2011). For marine controlled-source electromagnetics Wheelock et al. (2015) show that a logarithmic transform of the observed electric field improves the convergence of the inversion and results in robust resistivity models. On land, inversions of phase-tensor data (e.g. Patro et al. 2012) or apparent resistivity and phase (e.g. Khoza et al. 2013) are sometimes used in magnetotelluric inversions to reduce the influence of static distortions (Chave & Jones 2012).
These examples demonstrate that, albeit not necessarily standard, there are scenarios where inverting transformations of the original data can be useful. Implementing such an inversion in an existing algorithm can be achieved relatively easily by adjusting the objective function and using the chain rule for the calculation of gradients (e.g. Moorkamp et al. 2011), but requires modifications to the source code. Thus practitioners are restricted to the range of transformations envisaged by the algorithm developers, or need access to the source code and have to make modifications in the hope that the code is modular enough to be amenable to these changes. In addition, non-linear transformations of the observables can introduce bias into the solution of the inverse problem, i.e. even under ideal conditions the expected value of the inversion parameters differs from the true values more than indicated by the posterior model covariance (Tellinghuisen 2000). As we will show below, in many cases this is more of a theoretical issue than of practical importance, but it can become significant if we want to make quantitative statements about model uncertainty, for example in the context of Bayesian inversion approaches (e.g., Tarits et al. 1994; Guo et al. 2011; Ray & Key 2012).
Here, we present a methodology that circumvents the two aforementioned issues. It utilizes the non-diagonal elements of the data covariance matrix to emulate the fitting of linear or non-linear data transformations. Once it is implemented in the inversion algorithm, a user can emulate fitting any transformation by specifying an appropriate inverse data covariance. As we are still fitting the original observables and only modify the errors in a way that conforms with the principles of least-squares optimization, we also avoid the issue of bias. We will first present the basic idea of the approach. Then we will use examples based on magnetotellurics to demonstrate the issue of bias and the properties of the approach. These examples are intentionally designed to be as simple as possible in order to focus on the underlying principles rather than complex technicalities. Even though we focus here on magnetotelluric data, the presented methodology is completely general and can be used for any type of inversion without changes to the general formulation.

LEAST-SQUARES MEASURES OF MISFIT
Most current geophysical inversion codes use a least-squares measure of data misfit Φ(m) defined as

Φ(m) = [d − s(m)]^T C_D⁻¹ [d − s(m)].    (1)

Here m is the M-dimensional model vector for which we want to calculate the misfit, s(m) the n-dimensional vector of data predicted by the model, d the n-dimensional vector of observations and C_D the n × n data covariance matrix. A closely related quantity is the root-mean-square misfit

RMS = √(Φ(m)/n),

which indicates how many standard deviations on average the synthetic data differ from the observations. Thus mathematically any model that fits the observed data to an RMS value of 1 can be considered an adequate explanation of the observations from a statistical point of view. In practice, additional soft criteria, e.g. geological plausibility, are used to select a preferred model from the infinite number of possible solutions (e.g. Rao et al. 2014).
For inversion we typically assume mutually independent data, so the covariance matrix has the simple diagonal form

C_D = diag(σ_1², σ_2², …, σ_n²).

This is not only computationally convenient, as the data covariance and its inverse can be readily calculated, but also reflects the difficulties in estimating representative errors. As a consequence, the variances σ_i² are rarely rigorously justified through quantitative analysis but are rather a numerical reflection of the user's confidence in each datum.
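As a concrete illustration, the diagonal case can be sketched in a few lines of Python; the data values and standard deviations below are arbitrary placeholders chosen so that the synthetic data fit the observations to an RMS of 1:

```python
import numpy as np

# Hypothetical observed and synthetic data with per-datum standard deviations.
d = np.array([1.0, 2.0, 3.0, 4.0])       # observations d_i
s = np.array([1.1, 1.8, 3.3, 3.9])       # synthetic data s_i(m)
sigma = np.array([0.1, 0.2, 0.3, 0.1])   # standard deviations sigma_i

# Diagonal inverse covariance: W = C_D^-1 = diag(1/sigma_i^2).
W = np.diag(1.0 / sigma**2)
r = d - s
phi = r @ W @ r                  # least-squares misfit Phi(m)
rms = np.sqrt(phi / d.size)      # normalized RMS misfit
```

Here each residual equals exactly one standard deviation, so phi = 4 and rms = 1, i.e. the model is a statistically adequate explanation of the data.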
We now assume the more general case of a full covariance matrix with off-diagonal elements that express the interdependence of different data,

C_D = ( σ_1²  σ_12  ⋯  σ_1n
        σ_21  σ_2²  ⋯  σ_2n
         ⋮     ⋮    ⋱   ⋮
        σ_n1  σ_n2  ⋯  σ_n² ).

All variances σ_i² are positive and C_D is Hermitian and positive-semi-definite (Tarantola 2004); the individual covariances σ_ij, however, are not necessarily positive. If we assume that we do not have redundant data, i.e. the columns of C_D are linearly independent, the inverse weighting matrix W = C_D⁻¹ is guaranteed to exist and is Hermitian and positive-definite.
To demonstrate how we can use non-diagonal covariance matrices in a practical inversion scheme, we use the example of magnetotelluric inversion. In magnetotellurics we estimate a complex, frequency-dependent impedance tensor that relates the horizontal electric and magnetic field components at each site (e.g., Moorkamp et al. 2011). For complex data the least-squares misfit takes the form

Φ(m) = [d − s(m)]^H C_D⁻¹ [d − s(m)],    (6)

where H denotes the conjugate transpose. In the real-valued vector formulation, the real and imaginary parts of each impedance element are associated with separate vector components, but otherwise the misfit equation has the same structure as Equation 6. Typically the variances for the real and imaginary parts of each impedance element are assumed to be identical, as this is what most processing routines provide.

INVERTING FOR DERIVED QUANTITIES
We will now show how non-diagonal covariance matrices can be used to simulate the inversion of derived quantities. In magnetotellurics, for example, it is common practice to invert for apparent resistivity ρ_a = |Z|²/(2πfµ) and phase ϕ = tan⁻¹(Y/X) instead of the complex impedance elements Z = X + iY. Here f is the frequency at which the impedance is measured and µ the magnetic permeability of the vacuum. In some cases the impedance phase is less affected by noise or static shift effects than apparent resistivity, and thus small error estimates for the phase but larger errors for apparent resistivity can be used in the inversion (e.g. Khoza et al. 2013). In other cases apparent resistivity is down-weighted compared to phase in order to allow inversion of marine MT data that is severely affected by coast effects (Worzewski et al. 2011). A number of inversion codes allow fitting of apparent resistivity and phase. The Gauss-Markov theorem guarantees that the estimates from least-squares procedures are unbiased and consistent as long as the errors on the observations have zero mean and are drawn from a distribution with finite variance (Jennrich 1969; Constable 1988). Deviations from these assumptions, for example due to non-linear transformations of the observables, can lead to biased results (Tellinghuisen 2000). However, for simple models and small errors the problem of bias seems to be negligible (Chave & Lezaeta 2007; Wheelock et al. 2015). Here we show an approach that preserves the statistical properties of the magnetotelluric impedance while simulating the fit to a derived quantity. Another advantage of this approach is that, once the general ability to specify the inverse covariance matrix has been implemented, it allows the users of the inversion code to invert for arbitrary derived quantities without changing the source code of the inversion.
In our inversion algorithm we have chosen to implement the inverse covariance in terms of sparse matrices, as typically we will not have covariances between many combinations of data. Furthermore, the user has to specify the inverse covariance directly, as this is what is needed throughout the algorithm. This simplifies the implementation tremendously, as only a sparse matrix-vector multiplication has to be added to the calculation of the objective function. It does put the burden of calculating the inverse on the user, but this can be readily achieved with open-source tools such as Maxima and Python, and gives the user the ability to invert for quantities that the developers might not have foreseen.
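A minimal sketch of this design in Python with SciPy, assuming hypothetical 2 × 2 inverse-covariance blocks supplied by the user (one per complex impedance element); only a single sparse matrix-vector product enters the misfit evaluation:

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical user-supplied 2x2 inverse-covariance blocks, one per complex
# impedance element (real and imaginary part). The full inverse covariance
# is block diagonal and therefore sparse.
blocks = [np.array([[4.0, -1.0],
                    [-1.0, 4.0]]) for _ in range(3)]
W = sp.block_diag(blocks, format="csr")   # sparse inverse data covariance

r = np.array([0.1, -0.2, 0.0, 0.3, -0.1, 0.2])  # residual d - s(m)
phi = r @ (W @ r)   # misfit: only a sparse matrix-vector product is added
```

The `csr` format keeps the memory footprint proportional to the number of non-zero covariance entries rather than to the square of the number of data.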
For a multi-variate problem we can map the covariance C_ξ, specified with respect to the derived quantities ξ_1, …, ξ_N, to a covariance C_D with respect to the observables z_1, …, z_N through

C_D = J C_ξ J^T

(Tellinghuisen 2001; Booker 2014). Here J is the matrix of partial derivatives J_ij = ∂z_i/∂ξ_j of the functions

z_i = z_i(ξ_1, …, ξ_N)    (7)

that express the observables in terms of the derived quantities. In the following we use magnetotelluric apparent resistivity and phase as a first example to demonstrate how we can use this in practice. Assuming the inversion fits the elements of the impedance tensor, we want it to behave as if it were fitting apparent resistivity and phase, i.e. we want to be able to specify variances for apparent resistivity and phase and map these variances into a covariance matrix for the impedance.
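This mapping is straightforward to evaluate numerically. The following sketch uses an assumed logarithmic transform ξ_i = log z_i purely as an illustration, so that z_i = exp(ξ_i) and the Jacobian is simply diag(z_i):

```python
import numpy as np

# Map a covariance C_xi on derived quantities xi to a covariance C_D on the
# observables z, here for the (assumed) log transform xi_i = log(z_i).
# Then z_i = exp(xi_i) and J = dz/dxi = diag(z_i).
z = np.array([2.0, 5.0])               # observables
C_xi = np.diag([0.01, 0.04])           # independent errors on log(z)
J = np.diag(z)                         # Jacobian dz/dxi
C_D = J @ C_xi @ J.T                   # mapped covariance on z
```

For this diagonal Jacobian C_D stays diagonal; the interesting non-diagonal structure appears as soon as each observable depends on more than one derived quantity, as in the apparent resistivity and phase example below.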
A single element Z of the magnetotelluric impedance tensor can be written in terms of apparent resistivity and phase as

Z = √(2πfµ ρ_a) e^{iϕ}.

In situations where we want to express the variance and covariances of the impedance in terms of the variances of apparent resistivity and phase, we need the expressions for the real and imaginary parts of Z in terms of these quantities, which are simply

X = √(2πfµ ρ_a) cos ϕ,  Y = √(2πfµ ρ_a) sin ϕ.

We now consider only a single element Z of the impedance tensor and split the misfit calculation into a real vector that contains the real and imaginary parts of Z as separate elements, i.e. we examine a misfit functional of the form

Φ = (X_obs − X, Y_obs − Y) C_D⁻¹ (X_obs − X, Y_obs − Y)^T.

This makes the analysis slightly easier and we will show below how to construct a full data covariance matrix for a practical inversion scheme with all impedance elements at a number of frequencies and sites.
In this case we have, with |Z| = √(2πfµ ρ_a),

J = ( (|Z|/2ρ_a) cos ϕ   −|Z| sin ϕ
      (|Z|/2ρ_a) sin ϕ    |Z| cos ϕ ),

so for each complex impedance element we have a covariance matrix of the form

C_D = J ( σ_ρa²  0
          0      σ_ϕ² ) J^T
    = |Z|² ( (σ_ρa²/4ρ_a²) cos²ϕ + σ_ϕ² sin²ϕ      (σ_ρa²/4ρ_a² − σ_ϕ²) sin ϕ cos ϕ
             (σ_ρa²/4ρ_a² − σ_ϕ²) sin ϕ cos ϕ      (σ_ρa²/4ρ_a²) sin²ϕ + σ_ϕ² cos²ϕ ).

At first sight the form of this covariance matrix is confusing, as the expressions for each element contain the variances with respect to both apparent resistivity and phase in a non-trivial way. To better understand the properties of this matrix we calculate the eigenvalues and eigenvectors, which are

λ_1 = (|Z|²/4ρ_a²) σ_ρa²,  e_1 = (cos ϕ, sin ϕ)^T,
λ_2 = |Z|² σ_ϕ²,           e_2 = (−sin ϕ, cos ϕ)^T.    (16)

Figure 1 shows a sketch of the situation in the complex plane. The variances for apparent resistivity and phase get mapped into an ellipse around the impedance estimate Z. In this illustration the eigenvalue for the eigenvector e_1 is larger than that for e_2, i.e. the variation in terms of apparent resistivity is relatively large and the variation in terms of phase is relatively small. From Equation 16 we can see how the magnitudes of these two eigenvalues directly depend on the variances of apparent resistivity and phase; specifying a large variance for phase and a small variance for apparent resistivity will thus turn the elongated axis by 90 degrees.

Figure 1. Mapping different variances for apparent resistivity ρ_a and phase ϕ onto the complex plane. The red ellipse around the impedance estimate Z shows the mapped confidence area as described by the covariance; the eigenvectors and the corresponding eigenvalues are shown in blue.
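The construction of this covariance matrix and its eigenvalues can be checked numerically; the frequency, resistivity, phase and error values below are arbitrary illustrative choices:

```python
import numpy as np

# Build the 2x2 covariance of (Re Z, Im Z) from variances of apparent
# resistivity and phase via C_D = J C_xi J^T, with |Z| = sqrt(2*pi*f*mu*rho_a).
f, mu = 1.0, 4e-7 * np.pi              # frequency and vacuum permeability
rho_a, phi = 100.0, np.deg2rad(45.0)   # illustrative apparent resistivity/phase
sig_rho, sig_phi = 5.0, 0.02           # illustrative standard deviations

absZ = np.sqrt(2 * np.pi * f * mu * rho_a)
# Jacobian of (X, Y) = |Z| (cos phi, sin phi) with respect to (rho_a, phi).
J = np.array([
    [absZ * np.cos(phi) / (2 * rho_a), -absZ * np.sin(phi)],
    [absZ * np.sin(phi) / (2 * rho_a),  absZ * np.cos(phi)],
])
C_D = J @ np.diag([sig_rho**2, sig_phi**2]) @ J.T

# Analytical eigenvalues: |Z|^2 sig_rho^2 / (4 rho_a^2) along Z, and
# |Z|^2 sig_phi^2 perpendicular to Z.
lam1 = absZ**2 * sig_rho**2 / (4 * rho_a**2)
lam2 = absZ**2 * sig_phi**2
```

Comparing `np.linalg.eigvalsh(C_D)` with `lam1` and `lam2` confirms the eigen-decomposition sketched in Figure 1.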
For inversion we need the inverse of the covariance matrix,

C_D⁻¹ = (4ρ_a²/(|Z|² σ_ρa²)) e_1 e_1^T + (1/(|Z|² σ_ϕ²)) e_2 e_2^T.    (17)

The 2 × 2 data covariance derived above describes the inter-relationship between the real and imaginary parts of each impedance element. In this case, however, there is no dependence between different elements of the impedance, and thus the full covariance for an inversion with many elements, frequencies and sites can be built up from individual blocks of the smaller covariance matrices for each element. For magnetotelluric data the maximum size of such blocks would be 8 × 8 for quantities that depend on all elements of the impedance at one frequency. We cannot currently think of practical applications where one would have dependencies between different frequencies and/or sites. As the inverse of a block-diagonal matrix is the inverse of each block, such a scheme only requires the inversion of many 8 × 8 matrices, which is easily achievable on modern computer architectures.
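A sketch of the block-wise inversion in Python, with randomly generated positive-definite 2 × 2 blocks standing in for the per-element covariances:

```python
import numpy as np
import scipy.linalg

# Because C_D is block diagonal (no covariance across frequencies or sites),
# its inverse is obtained by inverting each small block separately.
rng = np.random.default_rng(0)
blocks = []
for _ in range(4):
    A = rng.normal(size=(2, 2))
    blocks.append(A @ A.T + 2 * np.eye(2))  # symmetric positive-definite block

C_D = scipy.linalg.block_diag(*blocks)                      # full covariance
W = scipy.linalg.block_diag(*[np.linalg.inv(B) for B in blocks])  # its inverse
```

The product `W @ C_D` recovers the identity, confirming that inverting the blocks individually inverts the full matrix.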

Implementation in an inversion algorithm and comparison with standard approach
Incorporating this approach into an inversion algorithm is straightforward. For the definition of misfit, all that needs to be done is to replace the inverse data covariance matrix, for example with the form shown in Equation 17 for apparent resistivity and phase. This change propagates to the calculation of the gradient of the objective function in a straightforward manner, viz.

∇_m Φ = −2 G^T C_D⁻¹ [d − s(m)],

where G is the matrix of partial derivatives with elements G_ij = ∂s_i(m)/∂m_j. In most practical algorithms G is not formed explicitly, but the vector-matrix product is calculated using reciprocity (e.g. Avdeeva et al. 2015).
For an inversion algorithm that directly fits a derived quantity, additional changes have to be made that are not particularly difficult, but require modifications to the misfit and gradient calculations. Remembering that in our notation z_i are the original observables and ξ_i the quantities derived from them, the objective function becomes

Φ(m) = [ξ(d) − ξ(s(m))]^T C_ξ⁻¹ [ξ(d) − ξ(s(m))],

where ξ(z_1, …, z_N) is the function that describes the mapping between observables and derived quantities and is the inverse of the function in Equation 7. For the gradient calculation we further need the derivative matrix K with elements K_ij = ∂ξ_i/∂z_j. Then the gradient becomes

∇_m Φ = −2 G^T K^T C_ξ⁻¹ [ξ(d) − ξ(s(m))].    (20)

If the relationship between observables and derived quantities is linear, i.e.
ξ_i = Σ_j K_ij z_j with constant coefficients K_ij, we have K = J⁻¹ and C_ξ = K C_D K^T, and thus Equation 20 can be written as

∇_m Φ = −2 G^T K^T (K C_D K^T)⁻¹ K [d − s(m)] = −2 G^T C_D⁻¹ [d − s(m)].

So for a linear parameter transformation, our approach and the traditional approach of inverting transformed quantities are identical. However, if the parameter transformation is non-linear, there are some important differences. To begin with, the matrices J and K are no longer the inverse of each other.
Furthermore, the derivatives in J are calculated around the observed data and are constant during the inversion, while the derivatives in K are calculated around the synthetic data and change at each iteration.
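The equivalence for linear transformations can be verified numerically; all matrices below are randomly generated stand-ins for the quantities in the equations above:

```python
import numpy as np

# Numerical check: for a linear transform xi = K z with constant K, the
# gradient -2 G^T K^T C_xi^{-1} K (d - s) equals -2 G^T C_D^{-1} (d - s)
# when C_xi = K C_D K^T (equivalently J = K^{-1}).
rng = np.random.default_rng(1)
n, m = 4, 3
K = rng.normal(size=(n, n)) + 4 * np.eye(n)   # invertible linear transform
C_D = np.diag(rng.uniform(0.5, 2.0, n))       # covariance of observables
C_xi = K @ C_D @ K.T                          # mapped covariance of xi
G = rng.normal(size=(n, m))                   # sensitivity matrix ds/dm
r = rng.normal(size=n)                        # residual d - s(m)

grad_orig = -2 * G.T @ np.linalg.inv(C_D) @ r
grad_derived = -2 * G.T @ K.T @ np.linalg.inv(C_xi) @ (K @ r)
```

Both gradients agree to machine precision, which is exactly the identity K^T (K C_D K^T)⁻¹ K = C_D⁻¹ used in the derivation.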

Analysing bias
As alluded to in the introduction, inverting non-linear transformations of the observed data can introduce bias into the solution of the inverse problem (Tellinghuisen 2000). In order to illustrate the issue of bias, we use a simple example of retrieving the resistivity of a homogeneous half-space with magnetotelluric measurements. We calculate synthetic impedances for a 100 Ωm half-space at 20 frequencies from 100 Hz to 0.01 Hz. We then generate 50,000 realizations of noisy data by adding random samples from a Gaussian distribution with zero mean and a standard deviation of 0.25|Z| to the real and imaginary parts of Z at each frequency. For each of these realizations we then find the best-fitting half-space by minimizing the objective function in Equation 1. Figure 2a shows a histogram of apparent resistivity calculated from the noisy realizations of Z at a frequency of 100 Hz. We can identify a small upward bias of the histogram with a mean value of 106.4(2) Ωm, which matches the value predicted by equation 29 in Wheelock et al. (2015). Figure 2b shows a histogram of the half-space values resulting from the minimization of the objective function.
Interestingly, the histogram here is downward biased with a mean retrieved half-space resistivity of 94.05(4) Ωm. In contrast, the mean for the estimate based on impedance is 100.11(4) Ωm. It is surprising and counter-intuitive that the inversion of the upward-biased apparent resistivity results in a downward-biased estimate of the resistivity of the half-space. However, there is a simple explanation for this. When calculating the standard deviations for the noisy apparent resistivities we use the noisy impedances and apply linear error propagation,

σ_ρa = (2|Z|/(2πfµ)) σ_Z = 2 ρ_a σ_Z/|Z|.

Thus even though all samples at a given frequency have been generated with the same variance σ_Z², those realizations that result in a higher than average apparent resistivity will have an increased associated variance in apparent resistivity and vice versa. The net result is that estimates with high apparent resistivity are down-weighted in the objective function compared to estimates with low apparent resistivity, and as a consequence the minimum of the misfit function is shifted towards lower estimates for the resistivity of the half-space. When using our formulation with non-diagonal covariances and propagating the calculated errors for apparent resistivity and phase to impedance, the mean half-space resistivity recovered from all realizations is 100.28(5) Ωm. Strictly speaking, our results indicate that the estimates from impedance and covariance are also biased, as the mean deviates from the true value by more than 3 standard deviations of the mean for impedance and 5 standard deviations of the mean for covariance. However, our precision in the minimization of the objective function is also limited, and the deviation of less than 0.3% from the true value is within the accuracy to which we can find that minimum. In contrast, the mean value based on the inversion of apparent resistivities is biased by more than 100 standard deviations of the mean, which cannot be attributed to this.
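The upward bias of apparent resistivity itself is easy to reproduce in a few lines. Note that the exact size of the bias depends on how the noise level is split between the real and imaginary parts, so the numbers below are illustrative under one convention (standard deviation 0.25|Z| per component) rather than a reproduction of the figures quoted above:

```python
import numpy as np

# Monte Carlo sketch of the upward bias of apparent resistivity under
# Gaussian impedance noise (illustrative values and noise convention).
rng = np.random.default_rng(42)
f, mu = 1.0, 4e-7 * np.pi
rho_true = 100.0
absZ = np.sqrt(2 * np.pi * f * mu * rho_true)
Z = absZ * np.exp(1j * np.pi / 4)

sigma = 0.25 * absZ          # noise std per real/imaginary component
n = 50_000
noisy = Z + sigma * (rng.normal(size=n) + 1j * rng.normal(size=n))
rho_noisy = np.abs(noisy)**2 / (2 * np.pi * f * mu)

# Under this convention E[rho_a] = rho_true * (1 + 2 * 0.25**2) = 112.5,
# i.e. the sample mean lies systematically above the true value.
mean_rho = rho_noisy.mean()
```

The bias arises because |Z + n|² contains the strictly positive noise terms n_Re² + n_Im², so the expectation of apparent resistivity always exceeds its noise-free value.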
Figure 4 shows the misfit as a function of half-space resistivity for one realization and four different measures of misfit. In addition to misfit measures based on impedance, apparent resistivity and our covariance measure, we also plot the dependence for the logarithm of apparent resistivity. As demonstrated by Wheelock et al. (2015), the misfit measure based on apparent resistivity saturates for small half-space resistivities, i.e. the misfit does not change significantly even when the resistivity changes by an order of magnitude. For this reason Wheelock et al. (2015) propose to use the logarithm of apparent resistivity to avoid this phenomenon. When using a measure based on the covariance, the misfit increases even for resistivity values below 10 Ωm, albeit at a lower rate than for the logarithmic measure. Overall, the behaviour of the covariance-based misfit shows characteristics between inverting impedance and inverting the logarithm of apparent resistivity.
Note that for this particular realization of noise, the minima of the objective functions for all four measures of misfit are close to each other and occur at half-space resistivities of less than 100 Ωm. So the fact that the inversion of apparent resistivity is biased does not mean that in all cases the inversion result based on impedance or covariance will be superior. In fact, there will be realizations of noise where the inversion of apparent resistivity will yield the correct half-space resistivity and the inversion of impedance will give a shifted result. On average, though, the results based on apparent resistivity will be too low, while unbiased measures will give the correct answer.

PRACTICAL EXAMPLES
We demonstrate the practical properties of our approach using 1D inversion and two simple scenarios based on synthetic data calculated from a two-layered model ( Figure 5). The true model consists of a 2 km thick layer with a resistivity of 100 Ωm underlain by a 1 Ωm half-space. The 1D inversion code is based on the algorithm described in Avdeeva & Avdeev (2006) modified to accommodate the inverse data covariance matrix.
For one test dataset we shift the apparent resistivities at periods > 10 s (left two panels in Figure 5) and assume large errors for apparent resistivity in the inversion. In the other case (right panels in Figure 5) we set the phase constant at long periods and assume large errors for phase. If we invert the data with shifted apparent resistivity with small errors for both phase and apparent resistivity, the inversion tries to find a compromise between matching the conflicting information from the two quantities (green line), as expected. As our starting model consists of 26 layers with a resistivity of 100 Ωm, we match the high-frequency part of both curves, but have a significant discrepancy at longer periods. If we down-weight the apparent resistivity by prescribing a relative error of 200%, we correctly reproduce the phase curve within the specified error, but the inversion ignores the shift in apparent resistivity at long periods (red line). It is well known that in 1D the impedance phase has no sensitivity to the absolute value of resistivity but permits a large range of resistivity-thickness combinations (e.g. Oldenburg 1979). The good match with the unshifted apparent resistivities (black line) in this case is only due to the fact that we used the correct resistivity for the upper layer in the starting model. Of course, prescribing relative errors of 200% means that we completely ignore the apparent resistivity information, and in practice data with such a severe shift could not be trusted to have reliable phase information. The main purpose of this experiment is to demonstrate the feasibility of our approach in general. How to choose the relative weighting depends on the dataset under investigation and its specific properties.
To demonstrate how to deal with problematic phase data, we use the same original synthetic data and set the phase at periods > 100 s to a constant value (right panels in Figure 5). We now use a relative error of 100% for the phase and 5% for apparent resistivity. We can see that the inversion now honours the apparent resistivity information, but ignores the phase. As these two do not provide independent information, but are related by a dispersion relationship (Yee & Paulson 1988), fitting apparent resistivity results in a phase that also fits the unmodified data (black line).
To complete our analysis we investigate how the specified covariance in impedance space maps back into apparent resistivity and phase. It is clear that because of the non-linear mapping, the confidence ellipse in impedance only approximates the situation of independent errors in apparent resistivity and phase. Figure 7 shows how the acceptable region in impedance space (left) maps into apparent resistivity and phase (right). Here we construct the impedance covariance matrix assuming standard deviations σ_ρa = 0.05ρ_a and σ_ϕ = 0.1ϕ for apparent resistivity and phase, respectively (left), and calculate how points within the elliptical region map back into apparent resistivity and phase (right).

Figure 7. The true mapping of the confidence area specified by the covariance. We use relative errors of σ_ϕ = 0.1ϕ and σ_ρa = 0.05ρ_a to calculate the covariance matrix for impedance. We show all points around the impedance estimate (yellow) that fall within the error bounds (blue region, left). We then map the points inside this region back into apparent resistivity and phase (blue dots, right). Compared to the originally specified error bounds (red box), we approximate the intended region well.
Compared to the red boxed region that describes the originally specified confidence area, we can see some differences. In particular, the remapped confidence area does not allow extreme apparent resistivity and phase combinations. Generally the covariances describe the confidence area well, especially when considering that in practical inversion we will use such confidence intervals for derived quantities to overcome certain deficiencies in the data and not because we have statistically firm reasons for it.
Our approach is not limited to apparent resistivity and phase, but can be applied to virtually any transformation of the impedance tensor. As a final example we show how this can be achieved for the Berdichevsky invariant

B = (Z_xy − Z_yx)/2.

This simple transformation also demonstrates the equivalence for linear transformations discussed above. As we need to express Z_xy and Z_yx in terms of our new parameters, we introduce a second complex variable

F = (Z_xy + Z_yx)/2

as an auxiliary quantity. In this case we can write

Z_xy = F + B,  Z_yx = F − B.

The 4 × 4 inverse of the covariance matrix corresponding to the off-diagonal elements of the impedance then consists of two identical 2 × 2 blocks, one for the real and one for the imaginary parts,

C_D⁻¹ = (1/(4σ_B²σ_F²)) ( σ_B² + σ_F²   σ_B² − σ_F²
                          σ_B² − σ_F²   σ_B² + σ_F² ) ≈ (1/(4σ_B²)) (  1  −1
                                                                      −1   1 ).    (30)

Here the approximation in Equation 30 is made for the case that the variance σ_F² for the auxiliary quantity is much larger than the variance σ_B² for the Berdichevsky invariant. Conversely, if σ_B² = σ_F² = ½σ_Z², we can see that the inverse data covariance matrix becomes diagonal with all elements on the diagonal equal to 1/σ_Z².

The first case (σ_F² ≫ σ_B²) corresponds to the case where we invert for the invariant instead of the individual impedances. As we have shown above, the result is equivalent to an objective function formulated in terms of the invariant. In the second case (σ_B² = σ_F²) we invert two quantities that are linear combinations of the original data, and this is equivalent to inverting the original data with identical variances, as expected.
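Both limiting cases can be checked with a short numerical sketch (the σ values are arbitrary illustrative choices):

```python
import numpy as np

# Covariance mapping for the Berdichevsky invariant B = (Zxy - Zyx)/2 with
# auxiliary quantity F = (Zxy + Zyx)/2, so Zxy = F + B and Zyx = F - B.
# The same 2x2 block applies to the real and to the imaginary parts.
def impedance_cov(sig_B2, sig_F2):
    J = np.array([[1.0, 1.0],    # d(Zxy)/dB, d(Zxy)/dF
                  [-1.0, 1.0]])  # d(Zyx)/dB, d(Zyx)/dF
    return J @ np.diag([sig_B2, sig_F2]) @ J.T

# Case 1: sig_F2 >> sig_B2 emulates inverting the invariant alone.
W_inv = np.linalg.inv(impedance_cov(0.01, 100.0))
# Case 2: sig_B2 = sig_F2 = sig_Z2 / 2 recovers independent impedance errors.
sig_Z2 = 0.04
W_ind = np.linalg.inv(impedance_cov(sig_Z2 / 2, sig_Z2 / 2))
```

In the first case `W_inv` approaches the rank-deficient pattern of Equation 30 scaled by 1/(4σ_B²); in the second case `W_ind` is diagonal with entries 1/σ_Z², confirming the two limits discussed above.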

DISCUSSION AND CONCLUSIONS
We have shown a simple methodology to use non-diagonal covariance matrices to emulate the inversion of quantities calculated from the original data. To illustrate the approach we have used magnetotelluric apparent resistivity and phase and the Berdichevsky invariant. However, the approach is completely general and can easily be applied to other quantities such as phase tensors (Caldwell et al. 2004) in MT or invariants of the gravity gradient tensor. Even though we have to specify a large data covariance matrix and calculate its inverse, the approach is computationally feasible, as in practice the matrix will be sparse and consist of a large number of small independent blocks that can be inverted separately.
Using apparent resistivity as an example, we have shown that inverting quantities calculated from non-linear functions of the observations can result in biased model estimates. This bias is statistically significant and thus relevant from a theoretical point of view. However, it can be argued that in practice the bias is small enough to have little to no effect on the interpretation of the results. The apparent resistivities shown in Figure 3 have error bars of ±20 Ωm and the resulting bias for the estimated half-space resistivity is only 6 Ωm. Given such noisy data, any experienced practitioner would only use the resulting model as an order-of-magnitude estimate and would not try to discern variations of tens of Ωm. Even in these situations, though, it is advantageous to use an unbiased inversion approach, particularly if we want to move towards statistical evaluation of the results and determine model uncertainties. Although we have presented the method as part of a deterministic inversion algorithm, the same formalism can be applied in the context of probabilistic Bayesian inversion approaches where we do not estimate a single model but aim at reconstructing the probability density functions for the model parameters (e.g. Grandis et al. 1999; Buland & Kolbjørnsen 2012; Xiang et al. 2018). These issues become even more prevalent for quantities such as the phase tensor, which is calculated as a ratio of random variables and for which it has been shown that care needs to be taken when calculating uncertainties (Chave 2013; Booker 2014).
Our experiments with inverting shifted data have shown that the emulation of the derived quantities works well and approximates the specified uncertainty bounds sufficiently for practical applications.
Furthermore, using non-diagonal covariances the user is not limited to the quantities envisaged during code development, but can emulate arbitrary derived quantities once the inversion algorithm supports the specification of a covariance. Given how easy it is to implement this scheme in practical inversion methods, we believe that it should become a standard feature.

ACKNOWLEDGMENTS
We would like to thank the editors Kerry Key and Gary Egbert, as well as Alexander Grayver and an anonymous reviewer for helpful comments that improved the quality of the manuscript. This work was funded by the German Research Foundation, DFG under grant MO 2265/4-1.