Structure‐Coupled 3‐D Imaging of Magnetotelluric and Wide‐Angle Seismic Reflection/Refraction Data With Interfaces

Magnetotelluric (MT) and wide‐angle seismic reflection/refraction surveys play a fundamental role in understanding the crustal rheology and lithospheric structure of the Earth. In recent years, the integration of the two methods in order to improve the robustness of the inversion has started to gain attention. We present a new approach for joint 3‐D inversion of MT and wide‐angle seismic reflection/refraction data to accurately determine crustal structures and Moho depth. Based on H‐κ stacking of teleseismic receiver functions, we estimate an initial reference Moho. This is used as input for the subsequent MT/seismic joint inversion, where the Moho interface is updated and crustal structures are added to the model. During the joint inversion process, structural similarity is facilitated through the cross‐gradient constraint. Synthetic model tests show an improvement of the inversion results over separate inversions. In particular, the tests based on two geologically realistic models demonstrate that the crustal structure and even the trade‐off between velocity and Moho interface can be sufficiently resolved by combined MT and seismic data sets when using the estimates from analysis of receiver functions. These results show that the new method can provide useful constraints on crustal structures including their geophysical properties and discontinuities.


Introduction
The magnetotelluric (MT) method provides crucial information on the conductivity of the subsurface of the Earth determined by the presence of interconnected fluids and partial melt and is widely used to image the lithosphere-asthenosphere system (e.g., Jones, 1999;Lin et al., 2017;Unsworth, 2010). In contrast, wideangle seismic surveys image gradual changes in velocity and distinct geological features such as interfaces between layers and unconformities in the crust using controlled or artificial sources (Rawlinson et al., 2001). Based on such measurements, both layer velocity and interface geometry can be retrieved from reflection and/or refraction traveltimes using tomographic approaches (Karplus et al., 2011;Rawlinson et al., 2010;Zelt & Smith, 1992).
The combination of multiple geophysical data types can help reduce the nonuniqueness of inversion results (e.g., Moorkamp, 2017). For example, seismic methods generally have reliable vertical resolution on horizontal or dipping planar layers. They can effectively compensate the diffusive nature of electromagnetic fields that leads to decreased vertical resolution in the deeper earth and generally smooth models (Chave & Jones, 2012). Conversely, blind areas of seismic refraction may exist in some cases but can be improved by MT surveys (Bennington et al., 2015;Stanley et al., 1990). Another benefit of combining MT and wide-angle seismic data is that they sense structures on similar spatial scales, and therefore can be effectively compared or coupled.
In recent times, a number of studies have been performed utilizing joint inversion of MT and seismic traveltime data (e.g., Gallardo, 2007;Gallardo et al., 2012;Heincke et al., 2006;Hu et al., 2009;Moorkamp et al., 2011;Peng et al., 2013). In terms of coupling approach, these studies can be classified into two main categories: direct petrophysical parameter coupling (e.g., Colombo & Stefano, 2007;Heincke et al., 2017) and structural coupling (e.g., Gallardo & Meju, 2003;Haber & Oldenburg, 1997). The former approach focuses on explicit functional relationships between electrical resistivity and seismic velocity. These can be estimated from empirical laws based on rock physics such as Archie and Wyllie equations (Archie, 1942;Wyllie et al., 1956), derived from borehole data and porosity estimations (e.g., Berryman et al., 2002;Jegen et al., 2009), or calculated from composition and temperature of the mantle (Afonso et al., 2013). However, these methods largely depend on a priori petrophysical relationships, and if the relationships are estimated incorrectly, it is likely to lead to artifacts in the jointinversion results (Moorkamp et al., , ).
Structural coupling approaches were first proposed to measure structural similarity using model curvature to enhance common boundaries (Haber & Oldenburg, 1997;Zhang & Morgan, 1997). Gallardo and Meju (2003) introduced the cross-gradient method to consider direction-dependent constraints on different models. This direction-based approach is widely applicable and has been adapted to different geophysical data types (Doetsch et al., 2010;Fregoso & Gallardo, 2009;Gallardo & Meju, 2011;Linde et al., 2006;Moorkamp et al., 2016), since it provides a generalized methodology for evaluating structural similarity between diverse multidimensional models.
In wide-angle seismic surveys, crustal structure is commonly represented by a series of subhorizontal layers separated by continuous interfaces. In comparison with traditional refraction interpretation, using both refracted and reflected phases to image seismic structure offers potentially better resolution (Rawlinson et al., 2001;Rawlinson & Urvoy, 2006). However, although a few recent studies have taken these reflection traveltimes into consideration by jointly inverting multiple geophysical data sets with application to marine subsurface integrated imaging (e.g., Gallardo et al., 2012), examples of hybrid 3-D joint inversion frameworks that determine interface depths and physical properties simultaneously are harder to find. One issue with inverting reflection traveltime is the difficulty to include data that involve seismic discontinuities, such as Moho reflections. Another problem is that there is a trade-off between Moho boundary location and crustal velocity which can be largely attributed to inadequate path coverage near the Moho interface (Rawlinson et al., 2010).
In this paper, we introduce a 3-D cross-gradient joint inversion algorithm that combines MT data with seismic reflection and refraction traveltimes. For the seismic tomography part of the algorithm, we consider a layered medium in a discontinuous velocity model intersected by an undulating Moho interface, allowing both seismic reflection and refraction phases to be tracked. Additional constraints from teleseismic RFs are also included in the new framework in order to overcome the limitation of accurately determining crustal structures and Moho interface simultaneously under geologically realistic conditions.

Forward Modeling in MT Method
We use a 3-D staggered grid finite difference method as the forward algorithm for electromagnetic modeling (Tan et al., 2003). Similar to previous work for computing the MT response of 3-D Earth models (Mackie et al., 1993), the algorithm is based on the integral forms of Maxwell's equations given by: where J is current density, σ is conductivity, ω is angular frequency, dl is the contour of integral closure, and dS is the area closed by the contour.
After spatial discretization on a staggered grid with appropriate boundary conditions, integral equations (1) and (2) can be expressed as a complex system of large-scale linear equations. In this scheme (Tan et al., 2003), the linear equations are solved by using a stabilized biconjugate gradient method and generate high-precision electric or magnetic field component efficiently. The complex impedance tensor (Zxx, Zxy, Zyx, and Zyy) can be computed from the electromagnetic fields along two orthogonal directions in four equations (e.g., Egbert & Kelbert, 2012).
Moreover, a parallel scheme was developed by adding Message Passing Interface approach running on computer clusters in order to reduce computational time (Tan et al., 2006). As the 3-D joint inversion requires a significant amount of computation, this parallel 3-D MT forward modeling scheme is particularly suited for it.

Traveltime Computation in Wide-Angle Seismic Reflection/Refraction Method
We use a multistage fast marching method (FMM; Rawlinson & Sambridge, 2005) as the forward method to compute theoretical traveltimes of seismic waves in a known 3-D velocity model volume, given hypocenter locations and receiving stations on the Earth surface. FMM is a grid-based eikonal solver that can implicitly track the evolution of wavefronts for calculating reflection and refraction phases in layered media (De Kool et al., 2006;Rawlinson & Sambridge, 2004).
For a seismic P wave in an isotropic elastic medium, the eikonal equation can be written as where ∇ represents gradient operator, T = T(x) is a time function (the traveltime), which describes surfaces of constant phase (wavefronts) when T is constant, and v is the seismic wave velocity as a function of the position at a spatial point. This equation describes the time of arrival of the fastest wavefront at a given position.
Recently, an upwind scheme (De Kool et al., 2006) has been implemented for predicting multiarrivals. One advantage of the practical grid-based method is that the propagation grid is defined separately from the inversion grid by specifying discrete sampling of velocity fields and implicit interface boundaries. Because it has no direct relationship with the inversion grid, we can consider both the calculation accuracy of forward modeling and the scale of the inversion grid. Also, adding interface nodes does not significantly increase the complexity of the inversion grid, which is another benefit for the joint inversion framework.
For our inversion we use the FMM algorithm with the upwind scheme implemented in the FMTOMO package developed by Rawlinson and Sambridge (2005). Tests with a variety of velocity structures show that it is both computationally efficient and robust and thus preferable as a forward solver for reflection and refraction data for the joint inversion.

Inversion Algorithms
In addition to well-developed separate forward modeling algorithms, a flexible 3-D joint inversion framework needs large-scale optimization inversion schemes and a coupling approach.
In our implementation, the resistivity model and velocity model parameters are updated in an exchanging pattern that is similar to the approach by Um et al. (2014). In addition, we also consider the determination of an initial Moho interface (H) and its adjustment during the velocity model updates in the algorithm.
The framework (Figure 1) is composed of two main modules (MT inversion module and seismic inversion module). Neither of them is independent of each other as they are linked by cross-gradient coupling at each iteration of a loop. Taking the MT inversion module, for example, we carry out the MT inversion first when the joint inversion starts. The parallel staggered-grid finite difference method forward solver as shown below is used to calculate theoretical electromagnetic responses from an initial resistivity model m 0 ρ . Then, in a cross-gradient coupling module, we calculate the cross-gradient values (τ) and its Jacobian matrix (B) between the initial resistivity model m 0 ρ and the velocity model m 0 s , respectivley. The first updated resistivity model m 1 ρ is generated by the model update expression of the data-space method as presented in equation (12). After that, we need to calculate the misfit (root-mean-square, RMS) between the observed data and the synthetic responses and estimate if the value is lower than the desired level of misfit. We then perform similar computations for the seismic part of the inversion. The iteration loop will not end until both MT RMS and seismic RMS satisfy the requirement of error tolerances or reach the maximum number of iterations.
In the following section, we first introduce the MT and tomographic inversion schemes and then present the model coupling approach and how the two methods are combined. We then discuss our strategy to estimate crustal thickness within the inversion and how it impacts on the tomographic results.

MT Inversion Scheme
The classical Occam's inversion seeks models fitting the data while at the same time having the "smoothest" structure (Constable et al., 1987). The inversion uses a self-adaptive regularization scheme, which is affected little by the initial model and has a stable convergence (De Groot-Hedlin & Constable, 1990); thus, it is an ideal algorithm to solve MT inverse problems. The minimization of the objective function is achieved by finding stationary points of the equation: where m ρ is the vector of resistivity model parameters, m 0 ρ is the vector of priori model parameters, C m is the model covariance matrix, d obs is the vector of observed data, f(m ρ ) is the model response, C d is the data covariance matrix, ξ is the desired level of misfit, and λ −1 is a Lagrange multiplier.
To find a new model, we linearize the above objective function U and obtain a series of iterative solutions: where J k is the corresponding Jacobian matrix, , and the inverse matrix   Siripunvaraporn et al. (2005) reformulate Occam's inversion scheme into data-space in order to avoid the computational cost of inverting a M × M matrix. They transform the above iterative equation (5) mathematically into a new one that contains a N × N system of linear normal equations: Here N is the number of independent data, which for many geophysical surveys is orders of magnitude smaller than the number of model parameters M.
The data-space method is similar to Occam's inversion approach as the model iterations in data-space also follow the principle of minimizing a penalty function with a series of Lagrange multipliers λ. The data-space algorithm has been shown to be efficient in 3-D MT inversion in terms of reducing the matrix dimensions since the number of independent data N in realistic models is normally much less than the number of model parameters M, which results in significant savings of both memory and CPU time (Siripunvaraporn & Egbert, 2009).
Another advantage of the data-space method is that a generalized model covariance C m is directly calculated; thus, it avoids solving the inverse of the full matrix C m , which can result in large computational costs. Note that for model space inversion approaches (Constable et al., 1987;De Groot-Hedlin & Constable, 1990), it is common to utilize a sparse model roughness operator in terms of a "roughness penalty" instead of directly using the model covariance C m . Siripunvaraporn and Egbert (2000) discuss the differences between them in detail and how to deal with the C m in the data-space algorithm.

Tomographic Inversion Scheme
We use a nonlinear subspace inversion approach to implement the tomographic inversion procedure. Similar to the MT inversion scheme, the inverse problem can be solved by specifying an objective function to adjust the model parameter values (i.e., velocity m s and interface depth H) and try to satisfy the data observations (source-receiver traveltimes), subject to imposed regularization constraints. The scheme is implemented by a least squares approximation to a multidimensional subspace of model space using the iterative subspace method (Kennett et al., 1988;Sambridge & Kennett, 1990). The model update is composed of a series of base vectors in terms of the steepest ascent vector in model space and the perturbation of model is finally written as where A is a projection matrix (a j ); {a j } is the spanning set of basis vectors; b γ is the gradient vector; G is the Jacobian matrix of partial derivatives; and the variables A, b γ, and G are updated constantly in successive iteration. Here the vector δm sh includes two classes of model parameters: the velocities and the depths of the interface vertices. For the derivation and simplification of the equation, see Rawlinson et al. (2001).
Once the projection matrix A has been computed and orthonormalized, the model update is obtained by the inversion of a relatively small matrix. The subspace inversion method is stable and efficient for large underdetermined inverse problems and together with FMM forms the basis of a fast and robust tomographic imaging scheme. Singular value decomposition is used to identify and remove unnecessary basis vectors if A does not completely span all dimensions. Usually, the value of the dimension size of A is small; for example, the dimension size used by Rawlinson and Urvoy (2006) is less than 20 when they implemented inversion of observed traveltime data from Tasmania, Australia. In this paper, the dimension of the subspace used in the tomographic inversion was set to 18 in that it offers a suitable compromise between the reduction of the misfit function and the computational burden.

Joint Inversion Algorithm 3.3.1. Coupling Approach
We enforce structural similarity between resistivity and P wave velocity through the cross-gradient approach (Gallardo & Meju, 2003). The cross-gradient function is defined as where m ρ and m s represent the electrical resistivity model vector and the seismic P wave velocity vector, respectively. Note that τ(x,y,z) is a spatial vector in any grid that has three components (τ x , τ y , and τ z ) along the coordinate axis in x, y, and z direction, respectively. Here, the resistivity (ρ) and velocity (v) are transformed into logρ and 1/v when calculating cross-gradient values, to ensure that the variation in regions of high and low value of the model parameters will have a similar scale and can exert identical importance on the structural similarity.
In order to generate more precise results compared with our previous forward difference scheme (Peng et al., 2013), we use a central difference scheme here to discretize the cross-gradient in a 3-D subsurface medium ( Figure 2).
The joint inversion algorithm requires the Jacobian matrix B associated with the cross-gradient function with regard to the model parameters. In our approach, the cross-gradient function for the current iteration is approximately linearized by a first-order Taylor expansion around their a priori models (Gallardo & Meju, 2004). Note that the matrix B is decomposed into three orthogonal components B x , B y , and B z . Each of the components is a sparse matrix in which every row has 10 nonzero elements in the central mesh.
In addition, there are even less than 10 nonzero elements on the boundaries of the mesh depending on which region of the grid we deal with, that is, the boundary surface, the edges or the vertices. To avoid large-scale matrix calculation, we use the compressed sparse row format to store elements of B and utilize the efficient algorithms to manipulate sparse matrices implemented in the SPARSEKIT package (Saad, 1990).

Objective Function
In addition to the data misfit terms and the regularization terms for separate inversion, the cross-gradient term for structural coupling (Gallardo & Meju, 2003) is added in the objective function to enforce structural similarity between resistivity and velocity. The framework of the joint inversion algorithm can be divided into two connected parts as shown in Figure 1: the MT inversion scheme and the seismic tomographic inversion scheme. With the structural coupling constraint the respective objective functions Ψ are then defined as where m ρ (logarithm of resistivity) and m s are the vector of current model parameters, e m ρ , e m s are the vector of the model parameters updated during the latest MT and seismic tomographic inversion respectively. f (m ρ ), g(m s ) denote the model response from the forward calculation for MT and the seismic method respectively, d ρ , d s are the vector of MT and seismic observed data, τ(⋅) is the cross-gradient operator, C τ denotes the cross-gradient covariance matrix that is an identity matrix, ε, λ are the regularization weighting factors, and μ ρ , μ s are the weighting factors of the cross-gradient terms.
In our joint inversion algorithm, the updated models are not used in integrated objective function. Instead, the models are mutually constrained by the results of a previous iteration from the other method. As described above, we use the data-space method and the subspace method as the optimization algorithm, respectively. For the MT module, we solve the inverse problem in the sense of Siripunvaraporn et al. (2005) and derive a new iterative formula to minimize the objective function of Ψ MT in equation (9), which includes the coupling term. The expression (see Appendix A) for a series of model updates from an initial model is Þ is the Jacobian matrix of the cross gradient with respect to the current resistivity model m ρ but only determined by the velocity model e m s that is updated during the latest seismic inversion. The definition of the vector X k in equation (11) is identical to that given by equation (5). In fact, compared with the separate MT inversion in equation (6), the second term in equation (11) is an added cross-gradient constraint that can make the structural features of current resistivity model similar to the previous seismic model.
To adapt the subspace method (Kennett et al., 1988; to our joint inversion optimisation problem for the seismic module, we rewrite the gradient vector and Hessian matrix in a new form by evaluating derivatives of objective function Ψ Seis in equation (10). The solution is given by the following expressions (see Appendix B for more detail): where the Jacobian matrix of the cross-gradient B s is also decomposed into three components and only determined by the resistivity model e m ρ that is updated during the latest MT inversion.
This form of objective function we adopt here is slightly different from the approach introduced by Gallardo and Meju (2003). Their approach minimizes the objective function including the data misfit term and the regularization term above but requires the cross-gradient term to be equal to zero and thereby searches exact structural resemblance. Although such an approach provides the benefit of fully enforcing the sought structural similarity, it poses some limitations on large-scale 3-D joint inverse problems (Meju & Gallardo, 2016). Instead, our approach only seeks partial resemblance, as the cross-gradient term can deviate from zero. In addition, updating the model in an alternating manner can, in theory, lead to less strict coupling between the methods. However, such mutually constrained inversion algorithms have been shown to be effective in previous studies (e.g., Heincke et al., 2017;Um et al., 2014) and offer high flexibility to integrate new methods.

Inverting for Seismic Layer Interfaces
First experiments with our coupled MT/seismic inversion show that it is important to consider the effect of variations in reflector positions in the seismic tomography. The inversion therefore also contains a module to update the position of chosen reflective interfaces.
The inclusion of interfaces can potentially increase the nonuniqueness of the solution because the amount of model parameters is larger if we invert for physical properties and depths of the interface vertices simultaneously. Also, we found that in geologically realistic situations, the estimates of crustal velocity and Moho interface provided by separate tomographic inversions are likely to be inaccurate because of the trade-off between Moho boundary location and crustal velocity. This can be largely attributed to inadequate path coverage near the Moho interface. We propose two strategies in order to overcome such limitations.
The first strategy is to invert for resistivities and velocities with fixed interfaces determined by prior information. We can acquire relatively accurate information on the Moho interface and one-dimensional layered velocity model before starting a 3-D inversion. This can be done by 1-D joint inversion of RFs and surface wave dispersion data (e.g., Moorkamp et al., 2010;Peng et al., 2012). With a good constraint on the location of the Moho interface or the layering of the subsurface, an approach to jointly invert crustal structure with a fixed Moho interface can be feasible. However, the success of such a strategy largely depends on the initial model and the accuracy of the fixed Moho interface and thus is hard to judge.
Another choice is to update the location of interfaces while resistivities and velocities are jointly estimated. The advantage is that this strategy does not need rigorous requirement on the initial Moho boundary that may be not very close to the true Moho. This is therefore our preferred approach.
As RFs are sensitive to velocity discontinuities, they are considered to be an ideal tool to recover the position of the Moho interface. A grid search method (Zhu & Kanamori, 2000), H-κ stacking, is widely used to estimate crustal thickness H and V p /V s ratio (κ). Such an "inaccurate" Moho interface can be estimated from RFs and used as an initial reference Moho for the 3-D joint inversion.
However, if there is a 0.2 km/s deviation on average P wave velocity (V p in the process of H-κ stacking, it can give rise to an average error of 1.8 km in estimating Moho depth (Zhu et al., 2006). Thus, in some study areas with little priori information on V p , the method can lead to controversial results (Wölbern & Rümpker, 2017).
In this paper, we present an effective way of mitigating this problem by using an updated crustal model to calculate more accurate V p from the results of seismic refraction/reflection tomography. We use interface nodes (j) corresponding to different V p because the layer velocity V i p is different beneath each broadband seismic station (j). This V j p can be easily obtained by calculating the average velocity in the model underneath each station viz where H is the Moho depth, Δz is the interlayer spacing in the depth direction, and N(j) is the number of layer of the crust beneath the j th seismic station. At each iteration, the Moho interface is updated by H-κ stacking with the new V p . Then, the updated Moho interface is used as the reference interface for the seismic joint inversion during the next iteration.
We finally note that the rates of convergence between MT and seismic inversion can vary significantly. This was also observed by Heincke et al. (2017) and may result in abnormal coupling in the process. For example, assuming that seismic inversion converges much more slowly than MT inversion, the latter would be constrained by an incomplete velocity model when the MT inversion ends. Therefore, we adjust the cross-gradient coupling flexibly by estimating, which updated model should be used in the cross-gradient module. In this paper, we only have 4 iterations in MT inversion but 12 iterations in seismic inversion, so at each iteration the joint MT inversion is constrained by renewed resistivity model whose estimated iteration is multiple of 3.

Synthetic Examples and Discussions
We now apply the method above to synthetic examples to test the joint inversion algorithm. All computations were performed on the Alice2 cluster at the University of Leicester. Each compute node has a pair of 14-core Intel Xeon Skylake CPUs running at 2.6 GHz and 128GB of RAM. For the first two synthetic examples, the subsurface region for joint inversion is discretized with a 20 × 20 × 20 mesh yielding 8,000 equally sized cells. Cell sizes in the interior of the mesh are 5 × 5 × 4 km.
The wide-angle synthetic data set comprises more than 3,800 crustal reflection (PmP) and refraction (Pg an Pn) traveltimes generated by 9 active sources and recorded by 144 receivers evenly distributed on a grid at the surface ( Figure S1 in the supporting information).
We place 36 MT sites on the surface and calculate the complex impeddance tensor (Zxx, Zxy, Zyx and Zyy) for five frequencies (10, 1, 0.1, 0.01, and 0.001 Hz). Gaussian noise with 2% is added to the synthetic MT data and a standard deviation of 5 ms to the synthetic traveltimes. The data variance is assumed to be 2% of|Zxy Zyx| 1/2 for MT inversion and 5 ms for seismic inversion.

Example I: A Simple Model Inverted With Fixed Moho Interface
As a first test, we design a simple model where the resistivity and velocity of anomalies have exactly similar spatial structure features and proportional parameter values. We apply the workflow to examine how the structure-coupled 3-D joint algorithm behaves when fixing the Moho interface.
The model consists of two prisms within a given background (a 100 Ω⸱m half-space background for MT model and a layered background for seismic model with velocities ranging from 5.25 to 8.15 km/s) and is very similar to the test model of Moorkamp et al. (2011). Also, for the seismic model, a fixed undulated Moho boundary is defined here in order to track multiple reflection phases in the layered media. Figure 3 displays a plot of the true model used to generate the theoretical synthetic data set.
The joint inversion starts from a simple background without two prisms: a 100 Ω⸱m half space and a layered velocity model (the same as the layered background for seismic model). These models were also used for the a priori models in the joint inversion. To seek out the final optimal model, we have performed dozens of joint inversions with different cross-gradient weights. Figure 4a shows the convergence curves for the last three iterations of the joint inversion. We can see that within four iterations, the MT inversion has converged to the desired level of misfit, while the seismic inversion requires 12 iterations to achieve the target misfit. When the value of the cross-gradient weight is small (1.0-2.0), both MT and seismic data misfit change  . Both magnetotelluric (MT) and seismic joint inversion exhibit faster convergence than the separate inversion. The definition of root-men-square (RMS) can be found in the supporting information (Text S1) slightly due to the loose structural coupling. However, we can obtain a reasonable reduction in misfit for the last iteration when choosing somewhat larger weights (e.g., μ ρ = 2.2-3.0 for MT inversion and μ s =4.0~4.6 for seismic inversion). When using very high coupling values the data misfit values strongly increase as the inversion only tries to create structurally similar models and ignores the data. These convergence curves help us to find a couple of corresponding cross-gradient weights for the optimal joint inversion model (shown as asterisk in Figure 4a).
We evaluate the misfit between the true model and the joint inverted model when the data misfit displays little difference. This model perturbation RMS is defined as (Moorkamp et al., 2011) δ The data misfit (χ) for the final joint inversion models are compared with separate inversion models varying with iteration ( Figure 4b). The MT data misfit decrease from an initial value of 2.635 to 0.9996 at iteration 4. Correspondingly, the seismic misfit diminishes monotonically from 80.58 to 1.171, a value close to the noise level. We see a slower convergence rate in separate inversions, overtaken by joint inversion during the second half of the iterative process. Figures 5a and 5c, the result of the single MT inversion differs from that of the single seismic inversion. We see that the inversion of MT data alone can recover the general shape and position of the right conductive anomaly, but the left high-resistive prism fails to be delineated in the box. The separate seismic inversion can recover the shape of the two prisms, whereas the values of the seismic velocity in the boxes cannot be fully recovered, especially at a depth of 15-20 km. Compared to the separate inversions, we can see that the general shape and values of both anomalies are recovered much better in the joint inversions (Figures 5b and 5d). These jointly reconstructed resistivity and velocity models, to some extent, overcome some of the shortcomings of the separate inversions and thus improve the inversion results.

As shown in
We also compare the computed values of the cross-gradients for MT and seismic models from the separate inversion with those for the joint inversion models (the third column in Figure 5). For separate inversion, the cross-gradient map shows zones of structural disparity especially near the boundary of anomalies. In comparison, some of the high values of cross gradients in those zones decrease obviously in the joint inversion. This observation is also confirmed by the comparison of average cross-gradient value for the final models, We obtain the average cross gradients of τ = 4.31 × 10 −4 for the joint inversion, which is smaller than the value (τ = 5.32 × 10 −4 ) for the separate inversion. The results demonstrate that the effective cross-gradient constraint facilitate mutual structural attributes between these models.

Example II: A Complex Model with Moho Perturbations
We also test the joint inversion on a complex model in order to understand whether MT and seismics can help each other to recover more complex common structure. In this example, the true model consists of a plate-like resistive anomaly on top of a conductive and low-velocity block in the crust (Figure 6).
The a priori models comprise a 100 Ω m half-space resistivity model and a layered seismic background model by getting rid of the anomalies. This example is a first step toward a more realistic model, as it is designed based on the typical crustal and upper mantle structure in SE Tibet. The low resistivity anomaly is based on an observed structure which has a typical bulk resistivity of 3 Ω m in most areas of southern Tibet (Unsworth, 2010) and has been interpreted as a partially molten layer. The velocity anomaly amplitude of the reference P wave velocity structure is set to 10% of the background based on results from joint inversion of RFs and Rayleigh wave dispersion (Sun et al., 2014). Moho depth is usually not known a priori or difficult to determine accurately in some cases, and an inaccurate Moho position might have a negative impact on the inversion of the reflection traveltimes. Thus, in this section, the second aim is to examine how the algorithm behaves if we are inverting for velocity parameters and interface depth simultaneously. Note that an electric Moho is not defined here because it is difficult to resolve in most localities (Jones, 2013). Our joint inversion approach can be a contributing way to determine the exact electric Moho, but prior to the use it should be primarily established that there is an acceptable expectation of an electric Moho from the MT data.
Before presenting results from the test with Moho perturbations, we first conduct a test with a flat Moho at 40-km depth in the starting model for the inversion. However, the results show that the crustal seismic structure is not adequately resolved by the joint inversion. Although crustal velocity at shallow depths (e.g., 0-15 km) is recovered successfully (Figures 7b and 7d, bottom), the velocity anomalies in the lower crust are poorly recovered. In addition, the true shape of the Moho interface seems to be difficult to recover (Figure 8c). Similar synthetic tests for this type of joint inversion ( Figure S2) and the comparison to the results of crustal velocity and Moho depth with a fixed starting Moho boundary also confirm this conclusion. The poor performance can be largely attributed to inadequate path coverage near the Moho interface. For separate tomographic inversion, the use of station terms or active and passive source (i.e., teleseismic events and local earthquake) data sets simultaneously can provide additional constraints on the upper-mantle velocity structure and even Moho interface (Rawlinson & Urvoy, 2006). However, the solution not only depends on the completeness of data sets but also suffers from the intrinsic smearing from the regularization, which results in underestimating the true amplitudes of velocity structures in most tomography studies (Rawlinson et al., 2010). Thereby, the estimates of crustal velocity and Moho interface are likely to be conservative just by separate tomographic inversion.
To overcome this shortcoming, we simulate 36 broadband seismic stations evenly distributed on the surface (Figure 6a) and use teleseismic RFs from each station to estimate the Moho interface underneath. We found that the lateral offset of the Moho piercing point can be confined to the interface grid spacing of 12 km with an appropriate ray parameter (e.g., <0.04 s/km). This means each broadband station can determine the crustal thickness of the grid independently using H-κ stacking technique (Zhu et al., 2006).
In the synthetic test, a total of 72 RFs ( Figure S3) are generated with two ray parameters (0.03 and 0.04 s/km) to simulate the observed data. Gaussian noise with 20% is added to the synthetic RFs directly. The initial reference Moho boundary H 0 ( Figure S4) can be estimated V p using the H-κ stacking technique based on an estimated initial V p value of 5.4 km/s. During the inversion, the interface position H gets updated and Figure 6. Sections of true model for a realistic model test in synthetic example II, consisting of a thin layer and an underlying prism buried in a background. The top panel (a) is a plan section on the surface showing magnetotelluric (MT) sites (blue solid circle), broadband seismic stations (black circle), active sources receivers (blue triangles), and the sources (red stars), respectively; the top panel (b) shows a random Moho structure for the synthetic test; the middle panels (c) and (d) and the bottom panels (e) and (f) are cross sections AA′ and BB′ along E-W direction, respectively. gets closer to the true Moho depth while simultaneously recalculating V p for each inversion iteration. Also, the updated Moho interface is used when theoretical traveltimes are computed during the seismic joint inversion scheme (Figure 1).
For the joint inversion in this example, a maximum of 18 iterations for the tomographic inversion scheme are applied to obtain solution models. The cross-gradient weights (μ ρ = 3.7 and μ s = 4.7) are selected for the optimal joint inversion model. We obtain a model misfit of δ ρ = 2.73 and δ s = 7.62 for the joint inversion with the undulating Moho interface, which is smaller than those values (2.95 and 9.88) of the separate inversion. Figure 7 shows the joint inversion and the separate inversion results for the realistic model inverted with Moho perturbations. We can see that a significant improvement is achieved in the joint inversion (especially for the velocity parameter and interface) constrained by the RFs with an initially undulating Moho interface (the middle column), compared to the other joint inversion case with the initial flat Moho (the left column). As both inversions are run with exactly the same weights, we conclude that the additional constraints on the Moho interface from RFs provide a positive contribution to the joint inversion. Although the separate inversion constrained by the initial RFs (the right column in Figures 7b and 7d) better recovers the crustal velocity structure and Moho interface than the joint inversion with a flat Moho interface, the recovery of the lowvelocity anomaly in the lower crust is not successful. In contrast, for the corresponding joint inversion, the modification of V p in the joint inversion framework enables us to obtain more accurate Moho depths and thus adequately resolve crustal structures by the combined data sets. Also, the image of the final Moho geometry from joint inversion including the RF module (Figure 8d) is similar to the true Moho interface ( Figure 8a). This demonstrates that the combined data sets can resolve crustal structure sufficiently and to a large degree even the trade-off between velocity and Moho interface.
We focus much of the particular discussion on the primary interface (i.e., Moho boundary), but our algorithm is applicable to a wide range of possibilities. For example, the method can be well applied in a realistic multilayered crust when we are able to identify more complicated phases, such as P 1 P and P 2 P for reflections associated with intracrustal secondary interfaces ( Figure S5). In practice, the upper crustal refraction (Pg) and the Moho reflection (PmP) phases are generally easier to identify than any other arrivals (e.g., P 1 , P 2 , Pn for refractions and P 1 P, P 2 P for reflections), but the identification of these later phases may be also feasible by careful picking of arrival times in record sections (Karplus et al., 2011;Rawlinson & Urvoy, 2006).

Example III: A Realistic Model in Namche Barwa, SE Tibet
A third test is performed on a new model that is compatible with geological reality. The Namche Barwa region, located in southeastern Tibet (Figure 9), is marked by strong tectonic stress, rapid rock uplift and exhumation, extremely young cooling ages, and intense metamorphism and anatexis (Xu et al., 2012;Zeitler et al., 2014). The formation mechanism is still debated although several plausible geodynamic models (e.g., indenter corner, crustal folding, and channel flow) have been proposed (Burg & Schmalholz, 2008;Jamieson et al., 2004;Koons, 1995). Our previous 3-D teleseismic tomographic P wave model showed a complex Indian subduction style with slab fragmentation beneath the eastern Himalaya on a large-scale image Namche Barwa mountain. These characteristics appear to be prominent, and the anomalous region with a resistivity of approximately 1-5 Ω·m has been interpreted as an accumulation of partial melt (Lin et al., 2017). To highlight the features of these anomalous bodies, the background of the MT model is modified by setting the varying background resistivities (1.8 Ω·m < log(ρ) < 3.2 Ω·m) to a constant value (ρ = 310 Ω·m). We then calculate MT synthetic data from the resulting resistivity model (Figure 9c).
Compared with the resistivity model, the teleseismic tomographic results (Peng et al., 2016) show a similar distribution of low-velocity anomalies in the lower crust, but they have inferior lateral resolution to the shallow structures. Therefore, the resistivity model is firstly translated into a velocity model via an empirical relationship (Moorkamp, 2017), logρ = κv-1, where κ is set to 1/2,500. A reasonable seismic model (Figure 9d) is then designed by mixing the above translated model and the preexisting teleseismic P wave model of Namche Barwa ( Figure S6b). We use a weighted summation method to combine these two models with the same weighting factor (0.5). As a result, resistivity and velocity images show some common structure characteristics, but some details (e.g., a seismic low velocity anomaly near Motuo as shown in Figure 9d) are not matched. The velocity background is adopted from a 1-D reference model that comprises two layers over a half space with the Moho at 64 km (Zhu & Helmberger, 1996). The real data of Moho depth (Table S1) is derived from the previous analysis of teleseismic RFs in Namche Barwa . The synthetic wide-angle seismic data sets were generated from the realistic seismic model with 8 shots and 50 receivers distributed in the realistic acquisition geometry (Figure 9a).
We apply our joint inversion approach to the synthetic data sets from the realistic model. In this joint inversion test, the MT inversion starts from a 310 Ω·m half-space, and all impedance tensor elements are included at 17 periods between 0.05 and 364 s. For the seismic inversion, the initial reference Moho boundary H 0 (Figure 11b) is derived from a 1°-degree global model of the Earth's crust (CRUST1.0; Laske et al., 2013), and the interface H is updated by recalculating V p for each iteration. The joint inversion finally reaches the desired RMS (χ ρ = 1.886 and χ s = 1.236) when we select the optimal cross-gradient weights (μ ρ = 1.8 and μ s = 5.2).
A significant difficulty when performing joint inversion with real data is the selection of appropriate weights for the data sets, constraints, and regularization (e.g., Moorkamp, 2017). By separating the selection procedure into three steps, we have found that we can produce robust results with the joint inversion framework presented here. In the first step, we perform constrained seismic inversions with a fixed conductivity model based on the final individual MT inversion and a series of seismic cross-gradient weights (e.g., μ s = 2.0, 2.4, 2.8, … , 7.2). This allows us to focus on the seismic inversion and the impact of the cross gradient on the velocity models. From these experiments we can choose a reasonable crossgradient weight (e.g., μ s = 5.4) for the optimal seismic joint inversion model. We then repeat the procedure for the MT inversion and use the final velocity model of the constrained inversion from the first step as a constraint to determine the optimal cross-gradient weight for the MT part of the inversion. In the final step, we fixed the MT cross-gradient weight and conduct a mutually coupled MT-seismic inversion with a new series of seismic cross-gradient weights (e.g., μ s = 5.0, 5.1, 5.2, … , 5.6) in order to fine tune the seismic part. The resulting weights provide a good balance between structural similarity and fitting the data. Furthermore, the stepwise selection procedure reduces the mutual influence of the different weights, which makes assessing the result difficult.
The solution models obtained from the separate inversions are shown in the third row of Figure 10. We observe that the separate MT inversion senses the general presence of the conductive anomalous bodies, but it fails to accurately delineate the boundaries of the anomalies. For example, the two separate low-resistivity anomalous bodies near Bomi (i.e., L3 and L4 at 25-km depth in the true model) merge into one anomaly in the separate MT inversion model as shown in Figure 10a. In contrast, the separate seismic inversion is able to delineate the boundaries of some low-velocity anomalies well (e.g., the anomaly L2), but it is almost insensitive to anomaly L4. This appears to be a blind area of seismic refraction where ray coverage is sparse. In general, the seismic and MT results exhibit different imaging sensitivity to subsurface anomalies.
In the joint inversion, we first perform a test with Moho perturbations and a workflow without RFs. Figure 10 (left column) shows the resulting images including at 25-km depth and a cross section. Compared to the separate inversion, the low-resistivity anomalies for the joint inversion are relatively similar in shape, but they are much more intensive in space. Also, the maximum resistivity of the joint inversion model is relatively higher as shown in the cross section. The recovered velocity attributes of some anomalies (e.g., the anomaly L4) are relatively closer to the true value due to the contributions of the MT data. However, the steep zone in the middle crust associated with the low velocity values from profile position 140 km to position 200 km (Figure 10c) is not clearly resolved by the joint inversion. This can be probably attributed to the unsuccessful recovery of the Moho interface.
To produce a more consistent image of the crustal structure, we apply the coupled joint inversion algorithm with the RF module to the same data sets. In the middle column of Figure 10, we present the resulting MT resistivity and seismic velocity images of the joint inversion models. We can see a clear structural resemblance between these two models. This demonstrates that the two types of parameters are effectively coupled by the algorithm. The improvement in resolution of the resistivity image is clear in comparison with the separate inversion. In the cross sections in Figure 10c, the low-resistivity anomalies are difficult to distinguish based on the individual inversion results. For example, conductive anomaly L2 is incapable of separating from anomaly L3. In the joint inversion models, by contrast, the low-resistivity anomalies are much more isolated in space and one of these anomalies (e.g., the anomaly L3) is significantly concentrated.
The seismic shallow structure in the upper crust is also better resolved in the joint inversion results. For example, the improvement of the ball-shaped low-velocity anomaly L4 is visible compared to the separate inversion. In addition, a minor low-velocity anomaly (i.e., anomaly L3) in the upper~20 km emerges in the image as shown in the middle column of Figure 10d. Overall, we achieve a better recovery of the geometry of the low-resistivity anomalies and the distribution of the velocity values.
Because the interface perturbation in this test is smaller than in the previous example, we cannot observe a distinct improvement of the estimate of Moho interface depth from the cross-section images (Figure 10c). For the purpose of comparing the Moho interfaces, we therefore investigate maps of Moho depth variation for the example. We can see that an accurate Moho interface is achieved by the coupled joint inversion algorithm with the RF module ( Figure 11d). This Moho interface retrieved by joint inversion is very similar to the true Moho interface (Figure 11a). Note that although we obtain a seemingly satisfactory result of the Moho structure by only jointly inverting the MT and seismic data ( Figure 11c) with a prior Moho estimate, the recovery of crustal anomalies is impacted. This highlights the importance of the teleseismic RF data sets in the accurate recovery of crustal structure.

Conclusions
We have developed a new algorithm for 3-D structure-coupled joint inversion of MT and wide-angle seismic reflection/refraction data by assembling preexisting MT and seismic refraction traveltime tomography schemes and incorporating reflectors into the framework. Our results show that an accurate estimation of the position of the Moho interface is critical for this type of joint inversion. We therefore present two strategies to solve the problem of accurately determining velocity parameters and Moho depth simultaneously. One strategy is to independently acquire an accurate Moho interface or layered velocity model, for example, by 1-D joint inversion of RFs and surface wave dispersion. However, in these cases the inversion results strongly depend on the quality of this prior information. Thus, our preferred approach is to adjust the average velocity and position of the Moho interface within the joint inversion and utilize teleseismic receiver functions to update these estimates. Analysis of synthetic test results reveals that the joint inversion improves the inversion results in comparison with the separate inversion. Also, the combined data sets can resolve crustal structure sufficiently and even, to a large degree, the trade-off between velocity and Moho interface. For the realistic examples, we can conclude that the inclusion of teleseismic RF data can effectively decrease the nonuniqueness for the 3-D coupled MT/seimic joint inversion. The Moho interfaces need to be updated by H-κ stacking with estimates of average V p from the tomography. Then the new V p that can be used to update the crustal velocity model and the interface during the seismic inversion. This significantly contributes to a more accurate estimation of Moho depth.
The new algorithm can be used as a tool for integrated imaging of crustal structures including the distribution of resistivity and velocity parameters, the Moho interface, and the secondary crustal discontinuities. With suitable experiments in the future, we envisage significant improvements in imaging such structures within the Earth.

Appendix A
To minimize the objective function for MT joint inversion in equation (9), we evaluate the partial derivatives of Ψ MT and yield a series of model updates from an initial model: As presented by Siripunvaraporn et al. (2005) and summarized in equations (5) and (6), the normal equation in the model space is replaced by a system in the data-space: Here, Λ k ¼ λC −1 m þ J T k C −1 d J k is an M × M positive symmetric matrix in the model space, and Γ k ¼ λC d þ J k C m J T k is an N × N positive symmetric matrix in the data-space. In order to apply the data-space approach to joint inversion, we need to solve the solution ofΛ −1 k . This M × M inverse matrix can be transformed into data-space by Woodbury matrix identity and it follows that where I is the identity matrix, and by substituting equations (A2) and (A3) into equation (A1), the (k + 1)th model update can be expressed as

Appendix B
The subspace inversion method requires the partial derivatives of objective function Ψ Seis at a specified point in model space, and it satisfies a basic assumption that Ψ Seis is adequately smooth to validate a locally quadratic approximation about some current model: