Estimating a planetary magnetic field with time-dependent global MHD simulations using an adjoint approach

The interaction of the solar wind with a planetary magnetic field causes electrical currents that modify the magnetic field distribution around the planet. We present an approach to estimating the planetary magnetic field from in situ spacecraft data using a magnetohydrodynamic (MHD) simulation approach. The method is developed with respect to the upcoming BepiColombo mission to planet Mercury aimed at determining the planet’s magnetic field and its interior electrical conductivity distribution. In contrast to the widely used empirical models, global MHD simulations allow the calculation of the strongly time-dependent interaction process of the solar wind with the planet. As a first approach, we use a simple MHD simulation code that includes time-dependent solar wind and magnetic field parameters. The planetary parameters are estimated by minimizing the misfit of spacecraft data and simulation results with a gradient-based optimization. As the calculation of gradients with respect to many parameters is usually very time-consuming, we investigate the application of an adjoint MHD model. This adjoint MHD model is generated by an automatic differentiation tool to compute the gradients efficiently. The computational cost for determining the gradient with an adjoint approach is nearly independent of the number of parameters. Our method is validated by application to THEMIS (Time History of Events and Macroscale Interactions during Substorms) magnetosheath data to estimate Earth’s dipole moment.

Abstract. The interaction of the solar wind with a planetary magnetic field causes electrical currents that modify the magnetic field distribution around the planet. We present an approach to estimating the planetary magnetic field from in situ spacecraft data using a magnetohydrodynamic (MHD) simulation approach. The method is developed with respect to the upcoming BepiColombo mission to planet Mercury aimed at determining the planet's magnetic field and its interior electrical conductivity distribution. In contrast to the widely used empirical models, global MHD simulations allow the calculation of the strongly time-dependent interaction process of the solar wind with the planet. As a first approach, we use a simple MHD simulation code that includes time-dependent solar wind and magnetic field parameters. The planetary parameters are estimated by minimizing the misfit of spacecraft data and simulation results with a gradient-based optimization. As the calculation of gradients with respect to many parameters is usually very time-consuming, we investigate the application of an adjoint MHD model. This adjoint MHD model is generated by an automatic differentiation tool to compute the gradients efficiently. The computational cost for determining the gradient with an adjoint approach is nearly independent of the number of parameters. Our method is validated by application to THEMIS (Time History of Events and Macroscale Interactions during Substorms) magnetosheath data to estimate Earth's dipole moment.

Introduction
Planets with an intrinsically generated magnetic field, such as Earth or Mercury, interact with the solar wind. This causes electrical currents that modify the planetary magnetic field. The properties of the interaction not only depend on the planetary magnetic field but also on the continuously varying solar wind conditions. A spacecraft orbiting a planet in such a highly variable environment measures the modified magnetic field distribution.
In 2025 the BepiColombo mission (Benkhoff et al., 2010) of the ESA and the Japan Aerospace Exploration Agency (JAXA) is expected to reach planet Mercury. In contrast to the previous MESSENGER (Mercury Surface, Space Environment, Geochemistry, and Ranging) mission (Solomon et al., 2001), two spacecraft will simultaneously measure the magnetic field distribution around the planet. The planetary magnetic field at Mercury is about 100 times weaker than the field of Earth. Therefore, the magnetosheath is much closer to the surface of the planet. As a consequence, the magnetic field of the electric currents of the interaction is not negligible, even in the immediate proximity of the planet (e.g., Glassmeier, 2000). Furthermore, electromagnetic induction effects within the planet might be important (e.g., Grosser et al., 2004;Jia et al., 2015). To estimate the planetary magnetic field precisely, the time-dependent interaction needs to be determined. With its two spacecraft, the BepiColombo mission is most suitable for determining the interaction because of simultaneous observations of the magnetic field distribution in the magnetosphere and the solar wind. If both spacecraft are within the interaction region, the solar wind reconstruction method by Nabert et al. (2015) can be used to estimate the time-varying solar wind conditions from the ob-servations of one spacecraft. Then the data of the other spacecraft provide still observations within the interaction region while the solar wind conditions are known.
So far, the planetary magnetic field of Mercury has been determined using empirical models of the interaction between the solar wind and the planetary magnetic field with spacecraft data from MESSENGER or Mariner 10 (e.g., Korth et al., 2004;Alexeev et al., 2010;Johnson et al., 2012). The electrical current density of the interaction in empirical models is parametrized by pre-described functional relations. Typically, the current system is described as a superposition of localized electrical currents such as the magnetopause current, which is parametrized by its subsolar location and ellipsoidal shape. Taking only a few parameters into account, these prescribed functional relations do not include, for example, effects such as magnetic pile-up, which correspond to a distribution of the magnetopause current within the entire magnetosheath. Furthermore, the parameters are distinguished between only a few discrete solar wind scenarios such as strong and weak solar wind pressure. If more parameters or solar wind scenarios are considered to parametrize the current system more accurately, it is not always possible to determine all parameters with small statistical error due to the finite data coverage. This is especially true if strongly time-dependent nonlinear phenomena occur.
Mercury's magnetic field close to the subsolar magnetopause has a strength of about 60 nT (Johnson et al., 2012). Using an average solar wind velocity of 430 km s −1 , this corresponds to a gyroradius of the interaction of about 37.5 km. Compared to global structures of the interaction, such as a subsolar magnetosheath thickness of about 1220 km (Winslow et al., 2013), a magnetohydrodynamic (MHD) approximation seems to be a valid approximation. The inverse gyrofrequency is about 0.5 s, which limits the time resolution for this approximation. In regions dominated by heavy ions, a kinetic approach might be necessary. Here, we restrict our considerations to the MHD approximation. Taking the observations of the two spacecraft of the BepiColombo mission into account, the interaction can be calculated as fully timedependent with a MHD model. We investigate a procedure to estimate the planetary magnetic field in a strongly modified magnetic environment of the planet using a global MHD simulation. In contrast to empirical models, a MHD simulation requires only parameters of the solar wind conditions, planetary magnetic field, and plasma properties. This approach calculates the interaction self-consistently and does not contain parameters to fit electrical currents. Note that such a model also allows taking a conductivity distribution of the planet into account. Then, the parameters of the planet's interior conductivity can be estimated in addition to the planetary magnetic field parameters in a further step.
As a first approach, we consider a simple MHD simulation code based on the MHD code presented by Ogino (1993) to examine our method. A cost function quantifies the misfit of the spacecraft observations in the magnetosphere to the cor-responding MHD simulation results. The cost function needs to be minimized with respect to the planetary magnetic field parameter to estimate these planetary parameters. Different methods can be used to minimize the cost function. Methods such as downhill simplex or Markov chain Monte Carlo algorithms are usually used if derivatives of the cost function cannot be calculated directly. If the gradient can be calculated, gradient-based minimization algorithms can be used, which often offers faster convergence speed. However, these methods are restricted to find a local minimum in parameter space instead of the global minimum. Here, we expect a global minimum, which is not superposed by local minima, so that a gradient-based optimization procedure is considered. The gradient-based methods can provide fast convergence only if the gradient can be determined quickly. However, the calculation of the gradient with respect to several parameters using, for example, finite difference quotients can be very time-consuming. Thus, an adjoint approach is considered, which can theoretically compute gradients nearly independent of the number of parameters (e.g., Jameson, 1988;Giles and Pierce, 2000).
In this paper we investigate the applicability of an adjoint approach to a MHD simulation code using automatic differentiation (Wengert, 1964). Although the adjoint approach can be much faster than using finite differences, it requires larger memory capacities. An adjoint approach using automatic differentiation was successfully applied to a reduced MHD model, the magnetosheath model by Nabert et al. (2013), to estimate the solar wind parameters of the model (Nabert et al., 2015). The reduced MHD model uses series expansions along the bow shock and magnetopause geometry of the MHD quantities. This transfers the stationary partial differential MHD equations into a set of ordinary differential equations. Close to the stagnation streamline, only low-order series expansions are necessary to obtain a valid representation of the interaction. Not only the numerical effort for solving the corresponding ordinary differential equations is significantly lower compared to solving the full MHD system, the required storage capacity is also much lower. Therefore, the automatic differentiation procedure could be applied without regarding memory limitations. Here, an automatic differentiation tool is applied to a full MHD simulation code and thus special emphasis needs to be put on memory consumptions.
Our approach to estimating planetary parameters using data from a multi-spacecraft mission is validated with the THEMIS (Time History of Events and Macroscale Interactions during Substorms) mission (Angelopoulos, 2008) at Earth with its well-known planetary magnetic field. The five spacecraft of the mission (THA, THB, THC, THD, and THE) provide simultaneous observations of the interaction region and the solar wind. However, in contrast to the situation at Mercury, the interaction of the solar wind near the planet's surface is negligible at Earth. Due to the weak magnetic field at Mercury, the interaction region of the solar wind is much closer to the planet (Winslow et al., 2013). In particular, the subsolar bow shock distance to the center of the planet is on average about 1.89 R M at Mercury and 13 R E at Earth (R E = 6371 km). The average distance of the subsolar magnetopause at Mercury is 1.45 R M and 10 R E at Earth. As a consequence, only close to the magnetosheath region, are the modifications of Earth's magnetic field comparable to the strong modifications throughout the magnetosphere of Mercury. To validate our procedure with respect to the future measurements of the BepiColombo mission, THEMIS data from the terrestrial magnetosheath is used. However, in its final application for the BepiColombo mission, spacecraft data of the entire interaction region including the magnetosphere will be taken into account to estimate the planetary magnetic field.

MHD simulation code
The interaction of the planetary magnetic field with the solar wind is computed by a MHD simulation code. The MHD simulation has to be efficient to perform the time-consuming estimation procedure of the planetary parameters. Furthermore, the simulation code should be simple in its numerical implementation structure to simplify the application of the adjoint approach using automatic differentiation. For these reasons, as a first approach, a simple MHD simulation code is developed, which is based on the simulation code described by Ogino (1993). The MHD simulation code described by Ogino (1993) was already used in studies of magnetospheric convection, for example, depending on the solar wind magnetic field (Ogino et al., 1985) or field-aligned currents (Ogino, 1986). The code is modified and extended for the application to the parameter estimation process as explained in the following paragraphs. Furthermore, some details about the numerical implementation of the simulation code are summarized to understand the application of the adjoint method via automatic differentiation, which is explained in the next section.

Planetary magnetic field
The magnetic field in the simulation code by Ogino (1993) is restricted to a dipole along the planet's axis of rotation. Here, a more general representation of the magnetic field is required. The planetary magnetic field can be represented by a multipole expansion using a spherical harmonic analysis (Gauss, 1839;Glassmeier and Tsurutani, 2014). Note that this part of the magnetic field does not contain contributions due to the interaction with the solar wind such as induction or magnetopause currents. As a consequence, the planetary magnetic field outside the planet can be represented by a scalar potential V pot : (1) Thereby, the scalar potential V pot satisfies a Laplace equation. Using spherical coordinates (r, λ, θ ), the solution outside the planet (r > R P ), where R P denotes the planet's radius, is given by with the Gauss coefficients g m l , h m l and the Schmidt seminormalized associated Legendre polynomials P m l (cos(θ )), e.g., P 1 1 = cos(θ ) or P 1 2 = √ 3 cos(θ ) sin(θ ) (e.g., Langel, 1987;Clauser, 2016).
The lowest-order coefficients for l = 1 are associated with the dipole moment corresponding to the magnetic field vector B dipole . The simulation code uses a Cartesian representation of the magnetic field. For the dipole moment, this is Here, m = (m x , m y , m z ) T is the vector of the dipole moment, which is related to the Gauss coefficients via The Gauss coefficients for Earth's magnetic field in 2010 were published using the International Geomagnetic Reference Field (IGRF) by Finlay et al. (2010). Thereby, the geographic coordinate system, a body-fixed coordinate system, is used, with its z axis along the axis of rotation. Magnetic field data of spacecraft close to Earth's surface as well as ground stations were used to determine the coefficients. The influence of external currents due to the interaction of the solar wind with the planetary magnetic field was neglected. The magnetic field of Earth outside the planet is dominated by the dipole coefficients, which are summarized in Table 1. Note that a similar estimation procedure at Mercury leads to large errors because of insufficient data coverage in the southern hemisphere of the planet. Furthermore, the solar wind interaction has a strong influence on the magnetic field distribution. Similar to the dipole, higher-order moments of the planetary magnetic field can be taken into account. Thereby, the tensor structure of the Cartesian representation becomes more complex for higher orders. For example, the quadrupole can be expressed by a symmetric, traceless matrix Q, which is defined by (e.g., Vogt and Glassmeier, 2000;Stadelmann et al., 2010) The magnetic field related to the quadrupole can be expressed as with Q sp defined by The simulation code according to Ogino (1993) uses normalizations for the physical quantities. The normalization constants for the additional magnetic field parameters are m norm = 8.07 × 10 15 T m 3 , for a dipole component, 5.14 × 10 22 T m 4 , for a quadrupole component.
The simulations take the dipole and quadrupole moments into account. Thus, the resulting planetary magnetic field is

MHD equations and boundary conditions
The interaction of the solar wind with the planetary magnetic field is calculated by solving the MHD equations. Thereby, these equations are solved within a box sketched in Fig. 1.
The simulation uses a model solar wind planet (MSP) coordinate system, whereby the origin is in the planet's center. The x axis is along the unperturbed solar wind velocity vector, the z axis is parallel to the rotation axis, and the y axis completes a right-handed coordinate system. The length of the simulation box is in x direction x L , in y direction y L , and in z direction z L . The MHD equations provide solutions for the mass density ρ, the plasma velocity v := (v x , v y , v z ) T , the pressure p, and the magnetic field B := (B x , B y , B z ) T . The solutions are summarized in the vector Numerical grid Figure 1. The simulation box contains the planet with its magnetic field. The origin of the coordinate system is in the planet's center and the x axis is along the unperturbed solar wind velocity.
The following representation of the MHD equations is solved by the MHD simulation code: Here, D ρ , D v , D p , and D B are diffusion coefficients of the density, the velocity, the pressure, and the magnetic field, respectively. The magnetic diffusion coefficient is related to the electrical resistivity η by D B := η/µ 0 , with the vacuum permeability µ 0 := 4 π × 10 −7 . The current density j is calculated with Ampere's law, neglecting the displacement current: According to Ogino (1993), the MHD equations are solved using a two-step Lax-Wendroff method (Lax and Wendroff, 1960), which has an accuracy of second order in space and time. This numerical scheme uses finite difference approximations, which require the solution to be described on a discrete grid. The discretization of the MSP coordinates (x, y, z) is related to the indices (i, j, k), with i for x, j for y, and k for z. Thereby, valid values for the indices are i = 1, . . ., i max +2; j = 1, . . ., j max + 2; and k = 1, . . ., k max + 2. The number of spatial grid points N grid is given by The boundaries of the simulation box along the x direction are located at (i = 1, j, k) and (i = i max + 2, j, k). Along the y direction, the boundaries are located at (i, j = 1, k) and (i, j = j max + 2, k) and along the z direction at (i, j, k = 1) and (i, j, k max +2). The distance between grid points is x = x L /(i max + 1) in x direction, y = y L /(j max + 1) in y direction, and z = z L /(k max + 1) in z direction. Within the grid, the planet is located at (i P , j P , k P ), with i P = (i max + 1)/2, j P = (j max + 1)/2, and k P = (k max + 1)/2. The grid points (i, j, k) are related to a position (x, y, z) by The time t is discretized by the index l with l = 0, . . ., l max , whereby l = 0 is related to t = 0 and l max to t = t E . This corresponds to a constant time step of t = t E /l max . The spatial and time-dependent solution of the MHD equations u(t, x, y, z) defined by Eq. (8) can be represented by u n l,i,j,k , whereby n = 1, . . ., N var refers to a component of the vector u. Here, the number of the MHD variables is N var = 8.
Boundary conditions are required to solve the MHD equations. The inflow boundary conditions at (i = 1, j, k) are determined by the solar wind conditions. The solar wind velocity vector is restricted to the x axis, perpendicular to the planet's rotation axis. In contrast to the more simple inflow boundary conditions of Ogino (1993), we use time-varying solar wind conditions: Instead of using the mass density, the ion density N SW = ρ SW /m P can be used as well, with the proton mass m P = 1.672621898 × 10 −27 kg. In general, the physical properties at grid points at (i = 1, j, k) can be replaced by the solar wind conditions in every time step. The solar wind vector discretized in time is u SW,l := u SW (l t). All other outer boundaries are outflow boundaries according to Ogino (1993). In addition to the boundary conditions, our simulation requires initial conditions. Therefore, at time step l = 0, the physical quantities have to be determined in the entire simulation domain. The velocity v is assumed to be zero, so that u 2 1,i,j,k = u 3 1,i,j,k = u 4 1,i,j,k = 0. The density ρ and pressure p are initialized by their solar wind values, ρ SW (0) and p SW (0), respectively. Thus, u 1 1,i,j,k = ρ SW (0) and u 5 1,i,j,k = p SW (0) are used. The initial values of the magnetic field are determined by the planetary magnetic field. Taking only the dipole and the quadrupole moments into account, the planetary magnetic field can be calculated by Eq. (7) with Eqs. (2) and (5). The initial conditions determine a stationary solution at a certain time step l = l st with 0 < l st < l max . The solar wind conditions for l < l st are set to the values at l st . For l > l st , time-dependent solar wind conditions from spacecraft observations are applied in the simulation and the results are compared to spacecraft observations. The extended magnetic field geometry, especially the arbitrarily aligned dipole moment, can cause a complex motion of the plasma in the magnetosphere due to co-rotation of the plasma or magnetic reconnection, for example. Therefore, different from the simulation described by Ogino (1993), the simulation requires appropriate inner boundary conditions to allow a stable simulation for long time intervals. The planetary surface is approximated by a spherical surface with the distance R P to the planet's center. The distance of a grid point (i, j, k) to the center is defined by The velocity of the plasma inside the planet, i.e., r i,j,k < R P , is u n l,i,j,k = 0 , for r i,j,k < R P , n = 2, 3, 4.
It is also possible to set only the normal component of the velocity to zero. Density and pressure gradients between the planet's interior and the plasma outside must not cause forces on the plasma. However, the MHD equations are solved within the entire simulation domain. This can lead to an interaction as sketched in Fig. 2. The values at grid points inside the planet, which have at least one neighboring grid point outside, are replaced in every time step by average values of the surrounding neighboring grid points outside the planet, i.e., the non-boundary neighbors. Neighboring grid points of a grid point (i P , j P , k P ) are {(i P ± 1, j P ± 1, k P ± 1)}. This procedure suppresses the interaction of density and pressure gradients across the planet.
In contrast to the density and gas pressure, the magnetic field can interact with the planet due to electromagnetic induction, which is additionally implemented. Time-dependent variations in the magnetic field inside the planet are calculated by Eq. (12). We assume the velocity inside the planet to be zero, not considering the detailed time-dependent dynamo action. This is justified due to the very different timescales of magnetospheric and dynamo action. Concerning a possible coupling between magnetosphere and dynamo, see Glassmeier et al. (2007) and Heyner et al. (2011). Thus, the in-duction equation simplifies to the diffusion equation The resistivity distribution in the simulation box is modeled by Thereby, the planet's interior consists of two regions with different resistivity, a core with η Core = 1/σ Core and a mantle with η Mantle = 1/σ Mantle . The resistivity outside the planet is assumed to be constant with η A . As a consequence, the interaction due to diffusion is allowed depending on the electrical resistivity of the planet. This is of particular importance if Mercury is considered; however, it is of minor importance for Earth.

Using spacecraft data
The simulation uses solar wind parameters as boundary conditions. With respect to the two spacecraft in mission Bepi-Colombo, simultaneous observations of the solar wind as well as the magnetic field close to the planet will be available in the future. Thereby, the solar wind conditions can be determined either directly by in situ measurements or by using the reconstruction method by Nabert et al. (2015) from data within the interaction region. This allows a precise determination with a high time resolution of the solar wind conditions. The THEMIS mission provides data from similar spacecraft constellations at Earth. The solar wind conditions observed by a spacecraft need to be transferred to the inflow boundary of the simulation box. Therefore, the solar wind data are shifted by t SC/in , which is given by where n P is the solar wind's phase plane normal vector, r SC/in := r SC −r in is the distance vector between the spacecraft's position r SC , with the center of the inflow boundary r in .
For a comparison between simulation results and spacecraft data, the data need to be transferred into MSP coordinates. Therefore, the THEMIS data are first transferred into geographic (GEO) coordinates. Vectors in these coordinates can be transferred into MSP coordinates by rotation matrices R y (θ K ) and R z (λ K ). These matrices are defined by The rotation angles θ K and λ K are determined by the solar wind velocity vector: Here, v SW,GEO = (v x,SW,GEO , v y,SW,GEO , v z,SW,GEO ) T is the solar wind velocity vector using GEO coordinates and v SW,GEO is defined by Then, a vector in GEO coordinates g GEO can be transferred into a vector in MSP coordinates g MSP by Applying the coordinate transformation, the solar wind velocity becomes parallel to the x axis. For the validation of the code, the known planetary dipole moment components of Earth according to Table 1 To include the rotation of the planetary magnetic field due to the planet's rotation, the magnetic moments of the magnetic field are modified according to Eq. (23). The angles of the transformation will continuously vary along the spacecraft's trajectory. The rotation of the planetary magnetic field is performed every 200 time steps by subtracting the planetary field contribution from the total magnetic field at the time step considered and adding the planetary magnetic field corresponding to the new angles θ K and λ K .

Validation of the simulation code
To validate the modified simulation code, we compare a simulation using the known dipole moment of Earth according to Table 1 with THEMIS magnetosheath data from 24 August 2008 measured by THC (Angelopoulos, 2008). Solar wind conditions are observed by THB during the magnetosheath transition. The size of the simulation box is x L = 50.0 R E , y L = 60.0 R E , and z L = 60.0 R E , with the planet in the center. The simulation uses a grid with i max = 200, j max = 150, and k max = 150. Furthermore, the values of the diffusion coefficients were chosen according to Ogino (1993) for a stable simulation at Earth. The data and corresponding simulation results on the spacecraft's trajectory are presented in Fig. 3.
The bow shock is observed at about 00:30 UT and the magnetopause at about 03:30 UT in accordance with the simulation results. Most physical quantities show a good agreement between actual observations and simulation results. Only the ion density in the magnetosheath is observed to be higher than in the simulation. Furthermore, the magnetic field in the magnetosphere is about 15 nT weaker than measured by the spacecraft. The magnetopause thickness is observed to be smaller than in the simulation, which is related to the diffusion coefficients required for a stable simulation. These differences between simulation result and data are mainly caused by numerical errors. This can impact an estimation of the planetary magnetic field. The lower magnetospheric magnetic field will tend to overestimate the planetary magnetic field strength. However, this overestimation is limited due to the magnetopause location. A much stronger dipole moment will increase the magnetopause distance and the magnetic field will increase in the magnetosheath, which is not in accord with the observations. In general, the MHD simulation results agree well with the observations made. In a future step, the simulation code might be improved to reduce differences between simulation results and observations. An adaptive mesh refinement should be introduced to enhance the accuracy close to the magnetopause and reduce numerical errors.
3 Data assimilation

Cost function and its minimization
In the previous section, spacecraft data were qualitatively compared to the results of the MHD simulation. To quantify the deviations, a cost function is introduced. Therefore, the method of least squares is used. The sum of squared residuals, FQ, of M data -measured values y m at points x m with a model f depending on the parameters s is The parameters s of the MHD model are related to a vector space P. Here, for simplification, we consider only the planetary magnetic field parameters of the dipole and quadrupole. Thus, the parameters are s = (m, Q) T = (m x , m y , m z , Q xx , Q xy , Q xz , Q yy , Q yz ) T , (26) with Q := (Q xx , Q xy , Q xz , Q yy , Q yz ) T . The vector space corresponding to these parameters is named P D,Q . The parameters of the model s are estimated by minimizing the sum of squared residuals FQ. Transferred to the magnetic field observations B data := (B x,data , B y,data , B z,data ) T and MHD simulation results B simu := (B x,simu , B y,simu , B z,simu ) T with the spacecraft's position in the orbit r SC,m , the cost function K is A gradient-based optimization can be used to minimize the cost function with respect to the parameters of the model s. Starting from a point s 0 in parameter space, new points s k = (m k , Q k ) T are determined with every kth gradient calculation. This optimization problem is without constraints and can be solved using a quasi-Newton method. We use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Press et al., 1992) to minimize the cost function.
The algorithm requires the gradient of the cost function K with respect to the parameters of the model at points s k in parameter space. There are different possibilities to compute these gradients. For example, the gradient can be approximated by difference quotients: where e l denotes the lth unit vector and s l is the corresponding step size in parameter space P. The sum of Eq. (28) includes all dimensions in parameter space (N P := dim(P)). Note that the gradient ∂ s K(s) is used as a column vector. The step sizes s l need to be adequately small to approximate the gradient sufficiently well.

Automatic differentiation and adjoint method
Each calculation of the cost function for a certain set of parameters requires a full global MHD simulation of the data along the spacecraft's trajectory. Thus, (N P + 1) simulations have to be performed in the calculation of the gradient according to Eq. (28). In general, the calculation is extremely time-consuming because of the nonlinearity of the MHD equations.
Another possibility to calculate the gradient ∂ s K(s) is the differentiation using analytical expressions, as explained in detail in the following. The solution of the MHD simulation depending on space and time coordinates u(t, x) can be represented by a vector u t,x on a numerical grid. This vector contains the solution at all time steps and discrete positions in space for all physical quantities in its components. Thus, the number of components of the vector u t,x is with the constants as defined in the previous chapter. The simulation code calculates the time-and spatially dependent solution of the MHD quantities u(t, x) iteratively. The iteration is implemented by a time loop in the simulation code, whereby u(t = l t, x) is computed from the results of the previous time step u(t = (l−1) t, x) and the boundary conditions. Equivalently, the time iteration of l can be considered as an iteration that steadily improves the approximation of the final solution vector u t,x , sketched in Fig. 4. Thereby, after the lth iteration step, the vector q l t,x contains the valid solution for all time steps that satisfy t ≤ l t. The final solution q l max t,x = u t,x is obtained after l max iteration steps. At the 0th iteration step, the simulation needs to be initialized. The simulation code calculates the solution in the lth iteration step from the previous approximation by a function F . Thus, the lth iteration step can be expressed by The cost function K(q l max t,x (s)) depends implicitly on the parameters s. With respect to the nested dependences of the solution in Eq. (30), the gradient of the cost function can be expressed by the chain rule: t,x (s) ∂s .
Planetary parameters Updating vector representation Figure 4. The left column sketches the time iteration of the MHD simulation code starting from the initial state at t = 0 determined by the planetary magnetic field parameters s. At a certain time step, a stationary solution u st,x is obtained and time iteration continues using time-dependent solar wind conditions until the simulation ends at u t E ,x . In the middle column, the corresponding interpretation of updating the complete time-and spatially dependent solution vector q t,x is presented. The cost function K is calculated from the final vector. On the right side, the automatic differentiation gradient calculation is presented, starting from the bottom and multiplying each factor according to Eq. (31).
This expression contains the stationary solution at l = l st because 0 < l st < l max . Although the cost function is evaluated only after the stationary state has been obtained, i.e. l > l st , the cost function also depends implicitly on prior time steps because the stationary solution emerges from the initial state. The magnetic field components of the initial state vector u 0 t,x depend on the parameters s because the initial magnetic field distribution in the simulation is created by the dipole and quadrupole parameters. Equation (31) can be written as using the following abbreviations: The function F in Eq. (30) is determined by the Lax-Wendroff scheme of the differential equations, which is related to the MHD equations and the boundary conditions. Therefore, the derivatives of the matrices in Eq. (31) can be determined by analytical expressions. The time iteration of the simulation code starts at l = 0 and ends at l max . After the lth iteration step, the corresponding matrix A −1 l can be calculated. Starting from a unit matrix, the matrix containing the derivatives is multiplied after every time step to the left side. Finally, after l max iterations, the gradient ∂K(s)/∂s is obtained.
This procedure is called forward differentiation because the gradient is calculated parallel to the execution of the time loop in the simulation code. The advantage over the computation of the gradient using difference quotients according to Eq. (28) is that no errors due to finite step sizes occur. Forward differentiation can be applied by hand to the simulation code, or alternatively, by an automatic differentiation (AD) tool (Wengert, 1964). Therefore, the cost function K and its dependent parameters s need to be declared in the code. The AD tool identifies all implicit dependences. The required analytical expressions for the derivations are taken from a library of the AD tool and inserted at the correct positions in the code. Note that the library contains elementary analytical derivations of all important expressions such as ∂ x sin(x) = cos(x). According to Eq. (31), the inserted expressions are related to each other such that the required gradient is computed. Several different AD tools were developed during the last decades. Here, the Transformation of Algorithms in Fortran (TAF) tool (Giering and Kaminski, 2003) from the company FastOpt was used (http://www.FastOpt.com).
An AD tool is able to differentiate a numerical code automatically, i.e., the tool can be applied without considering details of the implementation. However, for complex numerical codes, such as MHD simulation codes, problems might occur. For example, codes using parallel computing function calls by the message-passing interface (MPI), as they are also used for our simulation code, usually need further treatment. The analytical forward differentiation with an AD tool is also called automatic forward differentiation.
The computational costs for the calculation of the gradient with difference quotients or using automatic forward differentiation do not differ much. However, the latter procedure leads to a more efficient approach, the adjoint method. The adjoint method is extensively used for optimization problems in fluid dynamics, e.g., drag minimization by variations in surface geometry (e.g., Jameson, 1988;Othmer, 2008Othmer, , 2014Meader and Martins, 2012) or in seismology (e.g., Fichtner et al., 2006).
The adjoint method can be introduced with systems of linear equations, as described briefly in the following (e.g., Giles and Pierce, 2000;McNamara et al., 2004;Nabert et al., 2015). The symbols used for variables, vectors, and matrices refer to the previous considerations and will be marked by an asterisk as an index for distinction. We consider the following system of equations with the matrices of the coefficients A * , the solution X * , and the inhomogeneity L * . All elements of the matrices are real numbers. The scalar product of a vector g * with the matrix X * should be calculated using the following equation: This scalar product can be computed by solving Eq. (34) first, and then calculating the product of vector g * with the solution X * . This approach is called forward calculation. Another possibility is to use the adjoint method. To deduce the method, the product of a vector y T with both sides of the system of linear Eq. (34) is considered: The vector y * is defined by This equation is transposed, which leads to the adjoint system of equations Using Eqs. (36) and (37), the scalar product (Eq. 35) can be written as If the adjoint system of Eq. (38) is solved, y T * ·L * can be computed, which is nothing other than the scalar product (Eq. 35) as seen in Eq. (39).
The computational costs are mainly determined by the number of multiplications and differ for both possibilities of calculating the scalar product (Eq. 35). Only in case of a column vector inhomogeneity L * , is the number of multiplications equal. If the matrix L * consists of N * ,P column vectors, N * ,P systems of linear equations with a vector inhomogeneity need to be solved in the forward calculation. The adjoint method is independent of N * ,P and only a single system of linear equations needs to be solved. Therefore, the latter approach requires N * ,P times fewer multiplications.
The adjoint approach can be applied to the calculation of the gradient in Eq. (32). If the product of all matrices A −1 : The second product on the right side of Eq. (40) can be substituted by This equation can be related to the system of linear Eq. (34) by identifying the quantities with an asterisk as an index.
During the analytical forward differentiation, the gradient is computed successively using chain rule from the right to the left. This corresponds to a procedure, where, at first, the system of linear Eq. (34) is solved with respect to X * and then, the scalar product (Eq. 35) is calculated. If the matrix products of Eq. (40) are computed from left to right, at first, the product is determined. This corresponds to solving the adjoint Eq. (38) with respect to y * . The scalar product (Eq. 35) is determined by y T * · L * , which is related to the multiplication of y T · L to determine the cost function. Thus, the adjoint method for the gradient calculation of the cost function can be identified with the execution of the multiplications from the left to the right in Eq. (40).
The dimensions of the vectors and matrices involved are The calculation of the gradient by computing the matrices from the right to the left in Eq. (40) requires N rl multiplications of components, whereby If the gradient is calculated from the left to the right in Eq. (40), N lr multiplications of components are performed: The limit N P = 1 leads to N rl = N lr . Usually, one can assume N P N v because the number of grid points exceeds the dimensions of parameter space, which is eight for the dipole and quadrupole parameters. Then, Eq. (44) simplifies to Thus, the multiplication of the matrices in Eq. (40) from the left to the right, the adjoint approach, is more efficient for many parameters and requires about N P times fewer multiplications. The evaluation procedure for the simulation code is depicted in Fig. 4. However, the numerical implementation of the adjoint method is more difficult than the analytical forward differentiation. As described, the calculation of the gradient with the analytical forward differentiation is parallel to the execution of the time loop in the simulation code. In contrast, the solution at the last time iteration q l max t,x has to be known to calculate g T · A −1 l max . Thus, at first, the simulation needs to be performed once, whereby all calculation results that are required for the matrix multiplications are stored temporarily. Then, the gradient can be computed according to the adjoint approach.
There are AD tools that can derive codes not only according to forward differentiation but also according to the adjoint method. However, the available memory on a computer is often too small to store all the required results in the central memory. The memory consumption M Memory can be estimated by multiplying the number of grid points of the simulation box N grid according to Eq. (14) with the number of time steps l max , the number of MHD variables N var , and the size of a MHD variable M var : The number of variables of the MHD simulation is N var = 8 and the size of such a variable is M var = 4 bytes if a float variable is assumed. This gives a memory consumption of about 1600 GB for a simulation grid i max = j max = k max = 100 and l max = 5 × 10 5 . The central memory is often much smaller, so that a certain portion of the variables needs to be stored on the hard disk. However, the seek time of the central memory is much smaller, and thus, the runtime of the algorithm becomes longer by storing data on the hard disk.
To minimize the access to the hard disk, checkpointing can be used. Thereby, the main iteration loop of the algorithm is split at certain checkpoints into smaller loops. Then, the smaller loop iterates over N loop,check iterations instead of the complete time loop with l max iterations. This reduces the memory requirements for such a loop to The variables during a calculation of such a smaller loop can be stored within the central memory. After the execution of the smaller loop, the results are stored to the hard disk to combine all results of the smaller loop. However, using smaller loops, the adjoint method can only be applied within these smaller loops. Thus, checkpointing reduces the seek time of the memory, but the adjoint approach is restricted to a smaller part of the algorithm. In total, this reduces the runtime of the algorithm, but the performance is below the theoretical possible performance of the adjoint approach with unlimited central memory space. Note that instead of using only observations of a single spacecraft, simultaneous measurements from multiple spacecraft at different locations can be calculated in Eq. (27) as well. This can be done without additional computational costs and memory capacity because the solution of the MHD simulation is calculated in the entire simulation domain and stored anyway.

Adjoint MHD simulation code
The AD tool applied for an automatic backward differentiation transfers a numerical code for the calculation of a cost function into an adjoint code, which can compute the gradient according to the adjoint approach. This was done for the MHD simulation code presented in the previous chapter by the TAF tool of the company FastOpt. Thereby, the parameter space of the dipole and quadrupole parameters P D,Q was Rel. error mnorm Figure 5. The relative errors for gradients determined by difference quotients and the adjoint method for the dipole components. Thereby, on the left side, different points in parameter space are considered. On the right side, the dependence of the error on a different number of time iteration steps is shown. t Adj t DQ Figure 6. The runtime of calculating gradients using the adjoint method t Adj and using difference quotients t DQ . On the left side, the dependence on the number of time iteration steps is presented. On the right side, the ratio of the runtimes is shown.
considered. Thus, the adjoint code computes the gradient of the cost function (27) with respect to the parameters (26).
To validate the adjoint MHD simulation code, the gradients produced by the adjoint code are compared to those calculated by difference quotients according to Eq. (28). Therefore, at first, the interaction of the solar wind with the planetary magnetic field is neglected and the planetary magnetic field represented by its dipole and quadrupole moments is only taken into account. Gradients at certain points s 0 = (0, 0, m z , 0, 0, 0, 0, 0) T in parameter space are considered. Thereby, the m z component varies between 0.7 and 1.2 m norm with a step size of 0.1 m norm . The spacecraft data B data on a trajectory r SC , required to calculate the cost function, are generated synthetically along the x axis between 20.2 and 9 R E with a step size of 0.42 R E . The gradient of the cost function is calculated using difference quotients ∂ s 0 K DQ and the adjoint method ∂ s 0 K Adj for different s 0 . The relative error of the ith component of the gradient is defined by Here, The result of the maximum function max(a, b) is the larger value of a and b and e i defines the i unit vector. The relative error of the dipole moment for different s 0 is depicted in Fig. 5. The error is smaller than 10 −4 , i.e., both gradients agree for different values of m z . Now, the interaction of the planet with the solar wind is taken into account. Thereby, the gradients calculated by the adjoint method can be compared to gradients computed by difference quotients for a different number of time iterations. The corresponding relative errors of the dipole components of the gradient are shown in Fig. 5 on the right side. It is seen that the gradients agree very well.
To determine the runtime of the adjoint code, the gradient calculations are performed on a test computer for different numbers of time iteration steps. The test computer uses 64 GB of central memory and has an Intel Xeon E5 processor with 12 cores and 24 threads at 2.5 GHz. The results are shown in Fig. 6 on the left side. The runtime of calculating the gradients increases linearly with the number of iteration steps performed, as expected. The plot on the right side presents the ratio of the runtime of the adjoint code t Adj and the runtime using difference quotients t DQ . On the test computer, the adjoint method calculates the gradient about 33 % faster than using the difference quotients. According to the previous argumentation, eight parameters require nine MHD simulation calls to determine the gradient with difference quotients of Eq. (28). The adjoint method needs to run the simulation once to store all necessary results and another simulation run to calculate the gradient. In Fig. 4, the first simulation run is shown on the left side from top to bottom, storing all results in vector representation presented in the middle. Using these results, the automatic differentiation procedure calculates the gradient as sketched on the right from bottom to top, which corresponds to another simulation run. Thus, in theory, the adjoint method can be up to 78 % faster than the difference quotient calculation. However, our adjoint code uses checkpointing because the central memory is too small, which increases the runtime. Consequently, the test computer configuration is not optimal to achieve the best performance. The performance can be improved by using a computer cluster with distributed memory space. Then, each core can access its own memory space and checkpointing can be avoided. This can increase the performance. Furthermore, it should be noted that without additional computational costs and memory requirements, more parameters can be introduced in the estimation process of the adjoint approach, such as octupole planetary magnetic field parameters.
4 Estimation of planetary magnetic field parameters 4.1 Using synthetic data At first, the results of data assimilation using synthetically produced data are considered, neglecting the interaction of the planetary magnetic field. The simulation box has a length of 60.2 R E in every dimension with the planet in its center. The number of grid points is i max = j max = k max = 300. Synthetic spacecraft data B data are calculated from the magnetic field distributions of certain dipole and quadrupole parameters s Planet along a trajectory r SC (x). The initialization s 0 for the estimation procedure of the planetary parameters differ from these moments. Starting from this initialization, the cost function is minimized.
The first trajectory considered here is r SC (x) = (x, 10.1 R E − x, 0) T , which is diagonal within the xy plane. The spacecraft's magnetic field data B data are generated by a dipole along the z axis, i.e., s Planet = (0, 0, 1, 0, 0, 0, 0, 0) T m norm .
In parameter space, the starting point of the estimation procedure is s 0 = (1, 0, 0, 0, 0, 0, 0, 0) T m norm , which is nothing other than a dipole along the x axis. The BFGS algorithm iteratively computes new gradients in which direction the cost function (27) is minimized. The corresponding dipole parameters during the minimization, depending on the iteration step of calculating new gradients, are presented in Fig. 7 in the top left panel. The vector of the dipole moment s Planet is reconstructed very well after 15 iteration steps.
Next, a different trajectory is considered to produce the synthetic data: r SC = (x, 30.1 R E − x, 0) T . This orbit is farther away from the planet than the previous trajectory. The results of the estimation process are depicted in Fig. 7 in the top right panel. Again, the dipole vector was reconstructed very well, however, about twice as many iterations were required. This is related to the variations in the magnetic field strength, with a smaller percentage ratio of the variations on the trajectory farther out. The plot on the bottom shows the iterative assimilation of the MHD simulation to the THC data (blue) before the first iteration (black) and after the 13th iteration (red).
The simultaneous estimation of dipole and quadrupole parameters is considered as well by using magnetic field data B data generated by s Planet = (0, 0, 1, 1, 0, 0, 0, 0) T m norm . Thereby, the trajectory r SC (x) = (x, 10.1 R E − x, 0) T is used. The reconstruction of the planetary magnetic field, starting from s 0 = (0, 0, 0, 0, 0, 0, 0, 0) T m norm , is shown in Fig. 7 in the bottom left panel. Additionally, the estimation process from a different point in parameter space s 0 = (1, 0, 0, 0, 0, 0, 0, 0) T m norm is realized. The results are presented in Fig. 7 in the bottom right panel. In both situations, the moments s Planet were correctly determined. Thereby, the estimation starting in parameter space farther away from the solution required 15 more iteration steps. Altogether, it is seen that the dipole as well as the quadrupole parameters can be reconstructed from synthetic data, whereby larger magnetic field variations along the trajectory or a starting point s 0 closer to the minimum speed up the estimation process.

Using THEMIS data
After proving the general functionality of the algorithm, it is applied using THEMIS spacecraft data at Earth. Thereby, data from the magnetosheath, a region strongly influenced by the interaction process of the solar wind, is considered. Different to the situation at Earth, spacecraft magnetic field observations in Mercury's magnetosphere are strongly mod-ified due to the magnetosphere's small size. Here, we restrict our approach to Earth's magnetosheath data to consider a strongly disturbed magnetic environment comparable to the situation at Mercury. However, in final application, magnetospheric data will be used as well to reduce errors. Due to the weak components of the quadrupole at Earth, their influence is negligible close to the magnetopause. The largest quadrupole moment corresponds to a magnetic field of < 0.1 nT at 10 R E . This is very small compared to the contribution of the dipole's z component of about 30 nT. Thus, only the dipole moment is considered in the estimation at Earth. The estimation process starts from s 0 = (0.25, 0, −1.2, 0, 0, 0, 0, 0) T m norm in parameter space. Subsequently, the cost function is minimized iteratively, whereby every new calculation of a gradient denotes a new iteration step.
The magnetosheath transition observed by THC on 24 August 2008, presented in Fig. 3, is used as a first estimation of the dipole parameters. Thereby, the solar wind measurements of the THB spacecraft determine the inflow boundary conditions of the MHD simulations. The results of the estimation process are depicted in Fig. 8. Thereby, the values of the dipole moment as well as the cost function are shown. The dipole components vary mainly during the first iterations. The value of the cost function is strongly reduced from iteration steps 0 to 1 and 6 to 7. After the eighth iteration step, the cost function and the components of the dipole moment do not change much. Finally, the solution vector after 13 iteration steps is s 13 = (−0.072, 0.084, −1.078, 0, 0, 0, 0, 0) T m norm . Thereby, all components are closer to the value of the dipole moment of Earth according to Eq. (24) than the initial values. The relative errors for the dipole components are m x = (s 13,1 −m x,E )/m x,E = −0.44, m y = (s 13,2 −m y,E )/m y,E = −0.47, and m z = (s 13,3 − m z,E )/m z,E = 0.14. The relative error of the z component is the smallest because Earth's dipole is mainly along the z direction. Considering the magnitude, the relative error is 0.13. The panels showing B x , B y , and B z in Fig. 8 show the magnetic field distribution of the MHD simulation, which corresponds to s 0 and s 13 .

Conclusions
We introduced an approach to estimating planetary parameters using global MHD simulations of the interaction of the solar wind with a planet. A simple MHD simulation code was introduced and prepared for an automatic differentiation tool to obtain an adjoint MHD simulation code. The differences of spacecraft data and corresponding simulation results are quantified by a cost function, which is minimized by a gradient-based optimization. The adjoint code computes the gradient with lower computational effort compared to a difference quotient calculation.
Our approach is designed to estimate planetary magnetic fields, especially if the field strength is weak so that the interaction strongly modifies the magnetic field of the planet's environment. We used THEMIS data of Earth's magnetosheath to simulate such an environment to test our approach. The results of the estimation process can be affected by statistical and systematic errors. Therefore, statistical errors will not contribute to the mean values of the estimated planetary magnetic field if a sufficiently large number of magnetosheath transitions are considered. For example, the solar wind density can be measured incorrectly due to a spacecraft potential (McFadden et al., 2008). However, the density is usually equally overestimated and underestimated. Considering a single magnetosheath transition, the estimated dipole magnitude of Earth differs about 13 % from the expected value. Based on this approach, further transitions can be considered to minimize the errors. Note that including magnetospheric data at Mercury will further reduce the statistical error. The runtime of the parameter estimation using the test computer is about 1 week using the 5 h magnetosheath data. This fast calculation procedure allows taking a lot more data into account, especially if supercomputers are used.
We used a simple MHD simulation code to investigate the automatic differentiation procedure. As a next step, the simulation code needs to be improved, e.g., by an adaptive mesh refinement, to reduce numerical errors. Also, kinetic or hybrid simulation codes can be considered and treated with an automatic differentiation tool. The limiting factor for applying the automatic differentiation is not the complexity of the code but the memory consumption. Using our test computer, the adjoint approach was about 33 % faster than a finite difference approach. Although the adjoint MHD code does not calculate the gradient very much faster than using difference quotients, it has the advantage that further parameters such as higher-order magnetic field moments or parameters of the planet's conductivity can be included with nearly no additional computational costs. Nonetheless, the performance of the adjoint code is, related to memory limitations of our test computer, much lower than expected from theory. Thus, as a further step, the test computer configuration needs to be modified to increase performance. It is beneficial for the adjoint approach that each core has access to its own memory, which is different from our test computer. Thus, instead of using traditional supercomputers with fewer more powerful computers, a computer cluster using many commoditized computers with their own memory should be considered. These computer configurations recently became very popular in big data analysis using Google's well-known MapReduce technique (Dean and Ghemawat, 2004). A similar configuration might be more suitable for the adjoint code and increase its performance. The ability of our approach to perform on clusters with many cores depends on the parallelization of the MHD simulation code. Although this can be limited to a certain number of cores, another possibility to parallelize the estimation process is to split the data into subsets and perform the calculation of these subsets in parallel. Each data set will provide an individual estimator of the planetary parameters that can be applied in an ensemble averaging technique to reduce errors.
Data availability. Data from the THEMIS mission are publicly available and can be obtained from http://themis.ssl.berkeley.edu/ data/themis from the University of California Berkeley (Angelopoulos, 2008).
Competing interests. The authors declare that they have no conflict of interest.