1 Introduction

This paper focuses on the computational solution of multiscale inverse problems, i.e., where the quantities to be sought are related to an unknown microscopic mathematical model, while measurement data are available only on a much coarser scale. Due to the scale mismatch between given data and unknowns, the direct recovery of microscopic quantities, e.g., in the form of a coefficient of a partial differential equation (PDE), is not only expensive but may also lead to unsatisfactory results. Therefore, the aim of this paper is to introduce a computational framework—inspired by numerical homogenization methods—to reconstruct an effective model, i.e., an alternative quantity on the coarse scale of the available data.

Effective models are the key to bridge the discrepancy between a microscopic coefficient and a coarse scale of interest in the forward setting. They provide models that compute reliable approximations of the solution of a PDE even in the presence of microscopic quantities. If structural assumptions such as (local) periodicity or scale separation hold, classical homogenization methods (see, e.g., [21, 28, 42, 43]) based on analytical homogenization theory can be used. These methods are local in the sense that the communication among the degrees of freedom is only between neighbors. In a more general setting where these structural assumptions do not hold a priori, numerical homogenization methods (see, e.g., [8, 13, 17, 25, 27, 33, 35]) provably provide an alternative. These methods are based on a coarse mesh with a characteristic mesh parameter and compute special problem-adapted basis functions with optimal approximation properties. Compared to the locality of classical homogenization methods, numerical homogenization methods typically involve a slight deviation from local communication between the degrees of freedom which, in turn, leads to somewhat increased sparsity patterns of the corresponding system matrices. Since this non-locality can be controlled, we refer to these methods as quasi-local.

This paper follows the pragmatic approach of reconstructing quasi-local effective models (i.e., their representation in term of quasi-local system matrices) that describe the effective behavior of a medium with microstructures based on coarse (i.e., low-resolution) measurement. On the one hand, this provides a surrogate to a direct reconstruction in the above-mentioned multiscale context. On the other hand, the reconstruction of an effective model allows us to investigate the requirement for a certain quasi-locality in the context of numerical homogenization approaches.

The goal of this work is to promote this idea along with algorithmic aspects and preparatory numerical experiments. To demonstrate the feasibility and the potential of the approach, we investigate a stationary linear elliptic multiscale diffusion problem. Moreover, we consider a worst-case scenario without any structural a priori knowledge on the underlying diffusion coefficient and do not assume that the heterogeneous coefficient can be parameterized by a few unknown parameters that could more easily be identified. The main novelty of our approach is that it aims to recover information about the microscopic scale in the sense of reproducing the effective behavior of corresponding solutions, instead of aiming at identifying the actual microscopic coefficient. In the multiscale setting, in which measurements are given on a coarse scale and without a priori assumption on the structure of the microscopic coefficient, one cannot hope to reasonably identify the actual coefficient and our approach presents an alternative strategy. Our aim is not to compete with classical inverse strategies but to provide a first step towards reasonable surrogates in the case where a direct recovery of the coefficient is either too expensive or not reliable due to multiscale aspects of the problem.

The remaining parts of the paper are organized as follows. We start with introducing the microscopic forward problem and motivating the reconstruction of an effective model, represented by an effective system matrix (Sect. 2). This strategy is inspired by numerical homogenization strategies which provably provide reliable effective models for the given forward model. The rigorous motivation is presented in Sect. 4 and is mainly to emphasize that an effective model indeed exists in the setting of very general coefficients. Based on the considerations for the forward problem, we then tackle the reconstruction of an effective quasi-local model from given measurements. To this end, we prescribe a quasi-local sparsity pattern of the system matrices and rephrase the inverse problem as a non-linear least squares problem for which we apply iterative minimization techniques such as the gradient descent or the Gauß–Newton method (Sect. 3). In a series of numerical experiments (Sect. 5), we show that quasi-local effective models can indeed be reconstructed. In particular, we consider the cases where we are given measurements for all possible (coarse) boundary conditions, and also the setting where solutions are only known for a few boundary conditions. The aim of the experiments is to show that allowing the model to deviate from locality improves the inversion process and, thus, justifies the previous discussion.

2 Microscopic Forward Problem

In this section, we present the forward model and identify an effective discrete model, which is characterized by an appropriate system matrix.

2.1 Problem Setting

We consider the prototypical second-order linear elliptic diffusion problem

$$\begin{aligned} \begin{aligned} -{\text {div}}A \nabla u&= f&\text { in } \varOmega ,\\ u&= u^0&\text { on } \partial \varOmega , \end{aligned} \end{aligned}$$
(1)

where \(\varOmega \subset \mathbb {R}^d\), \(d\in \{1,2,3\}\) is a polyhedral domain and the diffusion coefficient A encodes the microstructure of the medium. We do not make any structural assumptions on the coefficient such as periodicity or scale separation. Admissible coefficients are elements of the following set,

$$\begin{aligned} \mathcal {A}:= \left\{ \begin{aligned}&A \in L^\infty (\varOmega ;\mathbb {R}_\mathrm {sym}^{d\times d})\,:\,\exists \, 0< \alpha \le \beta < \infty \,:\\ {}&\forall \xi \in \mathbb {R}^d,\, \text {a.a. } x \in \varOmega \,:\, \alpha |\xi |^2 \le A(x)\xi \cdot \xi \le \beta |\xi |^2\end{aligned}\right\} , \end{aligned}$$

which only requires minimal assumptions.

Since solutions to problem (1) do not necessarily exist in the classical sense, we are interested in the weak solution of (1) in the Sobolev space \(V:=H^1(\varOmega )\) which is characterized by the following variational formulation. Given \(A\in \mathcal {A}\), \(u^0 \in X:=H^{1/2}(\partial \varOmega )\), and \(f \in L^2(\varOmega )\), we seek \(u \in V\) such that

$$\begin{aligned} \begin{aligned} a(u,v)&= (f,v)&\text {for all}\; v \in V^0:=H^1_0(\varOmega ),\\ \mathrm {tr}\, u&= u^0&\text {on}\; \partial \varOmega , \end{aligned} \end{aligned}$$
(2)

where \(\mathrm {tr}:V\rightarrow X\) is the trace operator, \((w,v) := (w,v)_{L^2(\varOmega )}\) denotes the \(L^2\) inner product, and \(a(w,v) := \int _{\varOmega } A \nabla w \cdot \nabla v \,\text {d}x\). Note that instead of (1), we could as well consider a general second-order linear PDE in divergence form with additional lower-order terms. It is important to emphasize that, in the presence of lower-order terms, one cannot necessarily exploit symmetry and might need additional assumptions to ensure coercivity. Nevertheless, such a generalization is straight-forward and is omitted for simplicity.

2.2 Coarse Discretization

In practice, it is favorable to rewrite problem (2) as a problem with homogeneous Dirichlet boundary conditions in \(V^0\). Let \(E^b:X\rightarrow V\) be a linear extension operator, which also defines the restriction operator \(R:V\rightarrow V^0\) by \(R:=1-E^b\,\mathrm {tr}\). Then, we can decompose \(u = Ru + (1-R)u = Ru + E^bu^0\) and problem (2) reduces to finding \(Ru \in V^0\) such that

$$\begin{aligned} a(Ru,v) = (f,v) - a(E^bu^0,v) \end{aligned}$$
(3)

for all \(v \in V^0\).

Let us now introduce a coarse target scale H (e.g., the resolution of the data available for the inverse problem). We adopt the notation from numerical homogenization where a capital H is used to indicate that the scale is indeed a coarse one. In typical applications, H will be much larger than the microscopic scale, i.e., the scale on which the diffusion coefficient varies.

In order to discretize (3), let \(\mathcal {T}_H\) be a mesh of orthotopes with characteristic mesh size H and denote with \(Q^1(\mathcal {T}_H)\) the corresponding space of piecewise bilinear functions. Further, we define the discrete spaces \(V_H:=Q^1(\mathcal {T}_H) \cap V\), \(V_H^0:= V_H\cap V^0\), and \(X_H:= \mathrm {tr}\,V_H\) with dimensions \(m = \mathrm {dim}\,V_H\), \(m^0 = \mathrm {dim}\,V_H^0\), and \(n = \mathrm {dim}\,X_H\), respectively. The choice of these finite element spaces is not unique and other standard finite element spaces could be used.

In the context of inverse problems, it is reasonable to consider that \(u^0\) is defined as the first-order finite element approximation of coarse experimental boundary data which approximate the real data up to order H in the \(H^{1/2}\) norm. That is, in the following we will assume that \(u^0 \in X_H\). Further, we assume to have a discretized extension operator \(E^b_H:X_H\rightarrow V_H\) that fulfills \(E^b\vert _{X_H} = E^b_H\) and a corresponding restriction operator \(R_H:V_H\rightarrow V_H^0\) defined by \(R_H:= 1 - E^b_H\,\mathrm {tr}\).

Based on the above spaces, we introduce an injective linear operator

$$\begin{aligned} \mathcal {G}:V_H\rightarrow V\quad \text {with}\quad \mathcal {G}V_H^0\subseteq V^0, \end{aligned}$$
(4)

which leads to the following discretization of (3): find \(u_H \in V_H\), \(u_H = R_Hu_H + E^b_Hu^0\), such that \(Ru_H \in V_H^0\) solves

$$\begin{aligned} a(\mathcal {G}R_Hu_H,\mathcal {G}v_H) = (f,v_H) - a(\mathcal {G}E^bu^0,\mathcal {G}v_H) \end{aligned}$$
(5)

for all \(v_H \in V_H^0\). Further, we define the solution operator

$$\begin{aligned} \mathfrak {L}_A:X_H\times L^2(\varOmega )&\rightarrow \quad V,\nonumber \\ (u^{0},f)&\mapsto \quad u,\;\text {where}\,u\,\text {solves}\,\,(2) \end{aligned}$$
(6)

and its discretized version

$$\begin{aligned} \mathfrak {L}_{A,\ell }^\mathcal {G}:X_H\times L^2(\varOmega )&\rightarrow \quad V_H,\nonumber \\ (u^{0},f)&\mapsto \quad u_H,\;\text {where}\;u_H\,\text {solves}\,\, (5). \end{aligned}$$
(7)

The operator \(\mathfrak {L}_A\) (and similarly also \(\mathfrak {L}_{A,\ell }^\mathcal {G}\)) can be written as

$$\begin{aligned} \mathfrak {L}_A(u^0,f) = \mathfrak {L}_A(u^0,0) + \mathfrak {L}_A(0,f) \end{aligned}$$
(8)

with the linear operators \(\mathfrak {L}_A(\cdot ,0):X_H\rightarrow V\) and \(\mathfrak {L}_A(0,\cdot ):L^2(\varOmega )\rightarrow V\). For simplicity, we assume in the following that f is a fixed function and only the boundary conditions may change. The generalization to the case where f is variable as well is conceptually straightforward but slightly more involved. The decomposition (8) motivates the distance function between operators defined by

$$\begin{aligned} \texttt {{dist}}_f(\mathfrak {A},\mathfrak {B}) := \left( \Vert \mathfrak {A}(\cdot ,0)-\mathfrak {B}(\cdot ,0)\Vert ^2_{\mathcal {L}(X_H;L^2(\varOmega ))} + \Vert \mathfrak {A}(0,f)-\mathfrak {B}(0,f)\Vert ^2_{L^2(\varOmega )}\right) ^{1/2} \end{aligned}$$
(9)

for all \(\mathfrak {A,\,B}:X_H\times L^2(\varOmega )\rightarrow V\). Note, however, that also other distance functions could be used.

2.3 Characterization in Terms of a Stiffness Matrix

As a next step, we discuss an alternative representation of the operator \(\mathfrak {L}_{A,\ell }^\mathcal {G}\) using the stiffness matrix corresponding to the discrete formulation (5). Given a coefficient \(A\in \mathcal {A}\) and a mapping \(\mathcal {G}:V_H\rightarrow V\) as above, the stiffness matrix \(S_H=S_H(A,\mathcal {G})\) is defined by

$$\begin{aligned} S_H[i,j] := a(\mathcal {G}\Lambda _{z_j}, \mathcal {G}\Lambda _{z_i}),\quad i,\, j \in \{1,\dots ,m\}, \end{aligned}$$
(10)

where \(i \mapsto z_i\) is a fixed ordering of the m nodes in \(\mathcal {T}_H\) and \(\Lambda _z\) denotes the classical finite element hat function associated with the node \(z \in \mathcal {T}_H\).

Therefore, we may define the operator

$$\begin{aligned} \begin{aligned} \mathfrak {L}_{S_H}:X_H\times L^2(\varOmega )\rightarrow & {} V_H,\\ (u^{0},f)\mapsto & {} u_H,\;\text {where}\;u_H\;\text {solves}\\&\left\{ \begin{aligned}&S_{H,0} R_Hu_H = R_H M_H f_H - R_HS_H E^b_Hu^0,\\&u_H= u^{0}\;\text {on}\;\partial \varOmega ,\end{aligned}\right. \end{aligned} \end{aligned}$$
(11)

with the classical finite element mass matrix \(M_H\), the restriction \(S_{H,0} =R_HS_H R_H^T\) of \(S_H\) to the inner nodes of \(\mathcal {T}_H\), and \(f_H := \varPi _H f\) the \(L^2\) projection of f onto \(V_H\). For better readability, we use the notation \(v_H\) (or \(B_H\)) for both the vector \(v_H \in \mathbb {R}^{m}\) (or the matrix \(B_H \in \mathbb {R}^{m^0\times m}\)) and the corresponding function \(v_H \in V_H\) (or the mapping \(B_H:V_H\rightarrow V_H^0\)).

Remark 1

Note that the extension operator \(E^b_H\), the restriction operator \(R_H\), and the mass matrix \(M_H\) depend on the coarse finite element space, and are therefore fixed, such that the operator \(\mathfrak {L}_{S_H}\) only depends on \(S_H\), \(u^0\) and f.

The following theorem is the basis of the inverse strategy discussed in the next section. It is a direct consequence of Lemma 1 and Corollary 1, which are proven in Sect. 4. It states the existence of an appropriate coarse model, characterized by an operator \(\mathcal {G}\), that is able to capture the effective behavior of the original model (6).

Fig. 1
figure 1

Sparsity patterns of matrices in \(\mathcal {M}(\ell ,\mathcal {T}_H)\) for different values of \(\ell \) on a Cartesian grid (\(H = 2^{-4}\)) with lexicographic ordering in two dimensions

Theorem 1

(Existence of an effective model) There exists an operator \(\mathcal {G}\) as in (4) and a corresponding stiffness matrix \(S_H(A,\mathcal {G})\) such that

$$\begin{aligned} \texttt {{dist}}_f\big (\mathfrak {L}_A,\mathfrak {L}_{S_H(A,\mathcal {G})}\big ) \lesssim H. \end{aligned}$$
(12)

Moreover, there exists a choice of \(\mathcal {G}\) and \(\ell \sim |\log H|\) such that \(S_H(A,\mathcal {G}) \in \mathcal {M}(\ell ,\mathcal {T}_H)\) with

$$\begin{aligned} \mathcal {M}(\ell ,\mathcal {T}_H) := \left\{ S_H \in \mathbb {R}^{m\times m}_\mathrm {sym}\,:\, \forall \, 1 \le i\le j \le m\,: \,z_i \notin \mathrm {N}^\ell ({z_j}) \Rightarrow S_H[i,j] = 0 \right\} \end{aligned}$$
(13)

defined as the set of matrices that may have a non-zero entry at position [ij] only if the corresponding nodes \(z_i\) and \(z_j\) belong to the \(\ell \)-neighborhood of each other (see Fig. 1 for an illustration). The \(\ell \)-neighborhood \(\mathrm {N}^\ell \) is defined by

$$\begin{aligned} \mathrm {N}^\ell (\omega ) := \mathrm {N}(\mathrm {N}^{\ell -1}(\omega )),\,\ell \ge 1,\quad \mathrm {N}^0(\omega ) := \bigcup \bigl \{T \in \mathcal {T}_H\,:\, \omega \cap \overline{T}\subseteq \overline{T} \bigl \} \end{aligned}$$
(14)

for \(\omega \subseteq \varOmega \).

We call the operator \(\mathfrak {L}_{S_H(A,\mathcal {G})}\) the effective model and \(S_H(A,\mathcal {G})\) the effective stiffness matrix.

Theorem 1 justifies the inverse procedure that is presented in the following section. In particular, we may interpret coarse measurements of a solution operator \(\mathfrak {L}_A\) as approximations obtained by an effective model due to the fact that there exists an appropriate effective model which is reasonably close.

2.4 Quasi-Locality and Connection to Numerical Homogenization

The operator \(\mathcal {G}\) defined in (4) and the corresponding effective stiffness matrix defined in (10) are strongly related to numerical homogenization methods. In contrast to analytical homogenization, these approaches provably work beyond structural assumptions such as (local) periodicity or a clear separation of scales which cannot be guaranteed for general microstructures.

One such method is the Localized Orthogonal Decomposition (LOD) approach which provides effective models that provably cope with arbitrary rough coefficients in a large class of model problems including diffusion problems [15, 17, 27], elasticity [2, 16] and wave propagation [1, 10, 11, 26, 37, 41], without requiring periodicity or scale separation. This method allows us to explicitly characterize an operator \(\mathcal {G}\) to prove Theorem 1 in Sect. 4. For linear elliptic problems, there are various other numerical homogenization approaches such as the Generalized Finite Element Methods (GFEM) [3], AL bases [13], Rough Polyharmonic Splines (RPS) [35], the Generalized Multiscale Finite Element Method (GMsFEM) [8], Gamblets [33], CEM-GMsFEM [5], the higher-order multiscale approach described in [25, Ch. 3], and their variants with similar properties as LOD. All these methods compute special problem-adapted basis functions with optimal approximation properties based on underlying Galerkin methods. To achieve optimal accuracy, a moderate price in terms of the computational complexity has to be paid compared to a standard finite element method (fixed order) on the same mesh in order to account for microscopic information. The computational overhead is either characterized by an increase in the number of degrees of freedom per mesh entity (GFEM, GMsFEM), e.g., elements or nodes, or in enlarging the support of the discrete basis functions (LOD, RPS, Gamblets, AL bases). In both cases, the result is a slightly denser sparsity pattern of the corresponding system matrices which is due an increased communication between the degrees of freedom. This quasi-locality distinguishes the above numerical homogenization methods from classical numerical multiscale methods based on homogenization theory such as the Multiscale Finite Element Method (MsFEM) [21], the Two-Scale Finite Element Method [28], or the Heterogeneous Multiscale Method (HMM) [42, 43] that share the communication pattern of standard finite element methods.

Quasi-locality also showed to be viable in connection with the pollution effect in high-frequency time-harmonic wave propagation [4] which cannot be avoided unless the mesh size is coupled to, e.g., the polynomial degree [29,30,31] or the support of the basis functions [37] in a logarithmic way. Promising results using non-local models have also been achieved in the field of peridynamics [7, 24, 40] or in isogeometric analysis [6, 22]. Moreover, quasi-locality also includes the increased communication of higher-order finite element approaches.

3 Inverse Problem: Reconstruction of an Effective Model

Based on the considerations in the forward setting, we are now able to formulate the inverse problem which reconstructs an effective model characterized by a stiffness matrix with a certain sparsity behavior.

3.1 Problem Setting

Let us assume that the diffusion coefficient A is unknown and that structural assumptions such as periodicity, quasi-periodicity, and given parameterization by few degrees of freedom are not satisfied a priori. In an ideal setting, information about solutions to problem (2) in the form of a solution operator

$$\begin{aligned} \tilde{\mathfrak {L}}:=\mathfrak {L}_A(\cdot ,f):X\rightarrow V\end{aligned}$$
(15)

would be given. In practical applications, however, boundary data and information about the corresponding solutions are only available on some (coarse) scale, possibly much larger than the (micro) scale on which the diffusion coefficient and the corresponding solutions vary. A classical formulation of the inverse problem, for a fixed right-hand side f, consists in recovering A in (2) given the mapping

$$\begin{aligned} \tilde{\mathfrak {L}}^{\mathrm {eff}}:=\mathfrak {L}^{\mathrm {eff}}_A(\cdot ,f):X_H\rightarrow V_H\end{aligned}$$
(16)

which comprises coarse measurements of solutions to (2).

However, since the unknown coefficient A includes microscopic features, a direct approach of recovering A by simulations on the microscopic scale is computationally unfeasible and does not provide reasonable reconstructions due to the mismatch between coarse data and a possibly microscopic coefficient. In this section, we present an alternative approach to recover information about the (macroscopic) effective model taking into account the presence of a micro-scale diffusion coefficient. Rather than reconstructing the diffusion coefficient itself, we tackle the reconstruction of an effective stiffness matrix that is able to reproduce the given data related to solutions to (2). Therefore, the alternative formulation of the inverse problem reads:

$$\begin{aligned} \text {given}\,\tilde{\mathfrak {L}}^{\mathrm {eff}}:X_H\rightarrow V_H,\;\text {find the corresponding stiffness matrix}\,{\tilde{S}}_H. \end{aligned}$$
(17)

This alternative inverse problem is built upon the assumption that coarse given measurements on the scale H may be represented by an effective mapping (up to order H) based on Theorem 1.

The effective model corresponding to the reconstructed matrix provides a reliable surrogate that contains effective properties and that may be used for simulations on the coarse data scale. Moreover, knowledge on numerical homogenization methods can, in a second step, provide further information about, e.g., geometric features.

3.2 Medical Background

The considered inverse problems are of internal type, i.e., we assume that measurements are available on the bulk domain. This setting is motivated by the use of medical imaging protocols based on Magnetic Resonance Imaging (MRI), which play a key role in modern diagnostics. Using strong magnetic fields, magnetic field gradients, and radio waves, MRI allows clinicians to obtain, in vivo and non-invasively (and without exposing the body to radiation), both geometrical features of the body, e.g., shapes of the organs, and functional properties, e.g., motion or diffusion processes. As an example, Magnetic Resonance Elastography (MRE) is an imaging technique which is sensitive to mechanical parameters of the tissue. During an MRE examination, the tissue undergoes an external mechanical excitation, imposed at given frequencies by so-called actuators, attached to the surface of human tissues. In parallel, using phase-contrast MRI, i.e., postprocessing the phase of the MRI signal, it is then possible to measure the internal displacement of the tissue and hence to recover how shear and compression waves propagate into the body [20, 32, 39]. In practice, MRE allows to obtain average displacement fields on each element of a three-dimensional Cartesian mesh (a voxel), whose resolution is typically of the order of millimeters, and it is practically limited by the examination time and by the properties of the MRI scanner.

Combining these data with a mechanical tissue model, it is then possible to recover information about the elastic behavior of the tissue (a so-called elastogram). Clinical application of MRE are based on reconstructing tissue properties only on an effective scale, i.e., describing the living tissue as a (visco-)elastic material with mechanical parameters varying only on the coarse data scale. The procedure is currently used to diagnose and monitor tissue diseases such as cancer and fibrosis, that are characterized by different tissue stiffnesses. However, parameters describing the microscopic scales—such as tissue porosity and vascular structures—might play an important role in several applications, especially in the context of poroelastic tissues [18, 19]. In those cases, in order to characterize the microstructural properties from coarse data, mathematical models defined on the effective scale—such as the one proposed in Sect. 3.1—might provide a valuable alternative to efficiently take smaller scales into account.

3.3 The Minimization Problem

We formulate the inverse problem (17) as a minimization problem in the set \(\mathcal {M}(\ell ,\mathcal {T}_H)\), i.e.,

$$\begin{aligned} \text {find } {\tilde{S}}_H^* = \arg \min _{S_H \in \mathcal {M}(\ell ,\mathcal {T}_H)} {\tilde{\mathcal {J}}}_H(S_H), \end{aligned}$$
(18)

where

$$\begin{aligned} {\tilde{\mathcal {J}}}_H(S_H) = \frac{1}{2}\Big (\texttt {{dist}}_f(\tilde{\mathfrak {L}}^{\mathrm {eff}},\mathfrak {L}_{S_H})\Big )^2. \end{aligned}$$
(19)

Based on Theorem 1, we interpret the operator \(\mathfrak {L}_{S_H}(\cdot ,f):X_H\rightarrow V_H\) as an effective model which can be represented by a matrix of size \(m \times n\), i.e.,

$$\begin{aligned} \mathfrak {L}_{S_H}^{\mathrm {eff}} = \mathfrak {L}_{S_H}(\cdot ,f) = \left( \mathbb {1}- R_H^T S_{H,0}^{-1} R_HS_H\right) E^b_H+ R_H^T S_{H,0}^{-1}R_H M_H F_H, \end{aligned}$$

with \(F_H := [f_H,f_H,\dots ,f_H] \in \mathbb {R}^{m \times n}\) and the identity matrix \(\mathbb {1}\in \mathbb {R}^{m \times m}\). The matrix \(\mathfrak {L}_{S_H}^{\mathrm {eff}}\) comprises full information about the forward problem in the sense that it includes the solutions of (11) for a complete set of basis functions of \(X_H\). The operator \(\tilde{\mathfrak {L}}^{\mathrm {eff}}\) may also be interpreted as a matrix, so that the distance between the operators can be measured in general matrix norms. This is especially useful since a splitting of the form (8) is generally not known for \(\tilde{\mathfrak {L}}^{\mathrm {eff}}\).

Let \(\mu := \mathrm {dim}\,\mathcal {M}(\ell ,\mathcal {T}_H)\). Instead of (18), based on the matrix representation introduced above we consider a minimization problem for the functional \(\mathcal {J}_H:\mathbb {R}^{\mu }\rightarrow \mathbb {R}\) defined by

$$\begin{aligned} \mathcal {J}_H(S_H) = \frac{1}{2}\big \Vert \tilde{\mathfrak {L}}^{\mathrm {eff}}\big \Vert _{\mathbb {R}^{{m} \times {n}}}^{-2}\big \Vert \tilde{\mathfrak {L}}^{\mathrm {eff}} - \mathfrak {L}_{S_H}^{\mathrm {eff}}\big \Vert ^2_{\mathbb {R}^{{m} \times {n}}}. \end{aligned}$$
(20)

At this stage, the choice of the norm in \(\mathbb {R}^{{m} \times {n}}\) in (20) is arbitrary. The results that we will show in Sect. 5 have been obtained using the Frobenius norm, which seems to be a natural candidate.

3.4 Iterative Minimization

In order to find a minimizer of (20), we can now apply standard minimization techniques such as the Newton method or the gradient descent method. Here, we adopt a Gauß–Newton method which, in our numerical computations, showed faster convergence in terms of number of iterations.

In order to compute the descent direction, the most important step concerns the computation of the gradient of \(\mathcal {J}_H\) with respect to the relevant entries \(\{s_i\}_{i=1}^\mu \) of \(S_H\) (i.e., the diagonal and the non-zero entries above the diagonal, due to symmetry). Using the chain rule, we obtain

$$\begin{aligned} \frac{\partial }{\partial s_i} \mathcal {J}_H(S_H) = -\big \Vert \tilde{\mathfrak {L}}^{\mathrm {eff}}\big \Vert _{\mathbb {R}^{{m} \times {n}}}^{-2}\Big (\tilde{\mathfrak {L}}^{\mathrm {eff}} - \mathfrak {L}_{S_H}^{\mathrm {eff}}\Big ) : \frac{\partial \mathfrak {L}_{S_H}^{\mathrm {eff}}}{\partial s_i}. \end{aligned}$$
(21)

For the Gauß–Newton method, only the derivatives of \(\mathfrak {L}^{\mathrm {eff}}_{S_H}\) are needed, i.e.,

$$\begin{aligned} \begin{aligned} \frac{\partial \mathfrak {L}^{\mathrm {eff}}_{S_H}}{\partial s_i}&= - R_H^T \bigg (\frac{\partial S_{H,0}^{-1}}{\partial s_i}\bigg ) R_H(S_H E^b_H- M_H F_H) - R_H^T S_{H,0}^{-1} R_H\bigg (\frac{\partial S_H}{\partial s_i}\bigg ) E^b_H\\&= R_H^T S_{H,0}^{-1}\bigg (\frac{\partial S_{H,0}}{\partial s_i}\bigg )S_{H,0}^{-1} R_H(S_H E^b_H- M_H F_H) - R_H^T S_{H,0}^{-1} R_H\bigg (\frac{\partial S_H}{\partial s_i}\Big ) E^b_H. \end{aligned} \end{aligned}$$

Here, the double dot product is defined by \(M:\tilde{M}=\mathrm {trace}({M\tilde{M}^T})\). The derivatives \(\frac{\partial S_H}{\partial s_i}\) and \(\frac{\partial S_{H,0}}{\partial s_i}\) are relatively easy to compute, as they are defined as global matrices that only contain at most two entries equal to 1.

For ease of notation, let us interpret \(\mathfrak {L}_{S_H}^{\mathrm {eff}}\) and \(S_H\) as vectors in \(\mathbb {R}^{mn}\) and \(\mathbb {R}^{m^2}\), respectively. The Gauß–Newton method to minimize the functional \(\mathcal {J}_H\) is then defined by the following steps.

  • Let an initial matrix \(S_H^0 \in \mathcal {M}(\ell ,\mathcal {T}_H)\) be given.

  • For \(k=0,1,\ldots \) (until a certain stopping criterion is satisfied), solve

    $$\begin{aligned} H_k p_k= \left[ D\mathfrak {L}_{S_H^k}^{\mathrm {eff}}\right] ^T \left[ \tilde{\mathfrak {L}}^{\mathrm {eff}}-\mathfrak {L}_{S_H^k}^{\mathrm {eff}}\right] \end{aligned}$$
    (22)

    where D denotes the derivative with respect to the relevant entries of \(S_H\) and

    $$\begin{aligned} H_k = \left[ D\mathfrak {L}_{S_H^k}^{\mathrm {eff}}\right] ^T \left[ D\mathfrak {L}_{S_H^k}^{\mathrm {eff}}\right] . \end{aligned}$$
  • Set \(P_k\in \mathcal {M}(\ell ,\mathcal {T}_H)\) as the matrix whose relevant entries are given by \(p_k\) and define

    $$\begin{aligned} S_H^{k+1} = S_H^{k} + \delta _k P_k \end{aligned}$$
    (23)

    with appropriately chosen step size \(\delta _k\), for example using backtracking line search based on the Armijo–Goldstein condition.

Due to the ill-posedness of the inverse problem, the matrix \(H_k\) might be singular. A possible approach to overcome this issue consists in adding artificially a diagonal block, i.e., replacing (22) by

$$\begin{aligned} \left( H_k + \eta \mathbb {1}\right) p_k= \left[ D\mathfrak {L}_{S_H^k}^{\mathrm {eff}}\right] ^T \left[ \tilde{\mathfrak {L}}^{\mathrm {eff}}-\mathfrak {L}_{S_H^k}^{\mathrm {eff}}\right] \end{aligned}$$
(24)

with a given parameter \(\eta > 0\).

Another possible strategy consists in adding a regularization term to the functional to be minimized, i.e., in replacing (20) by

$$\begin{aligned} \mathcal {J}_H(S_H) = \frac{1}{2}\big \Vert \tilde{\mathfrak {L}}^{\mathrm {eff}}\big \Vert _{\mathbb {R}^{{m} \times {n}}}^{-2}\big \Vert \tilde{\mathfrak {L}}^{\mathrm {eff}} - \mathfrak {L}_{S_H}^{\mathrm {eff}}\big \Vert ^2_{\mathbb {R}^{{m} \times {n}}} +\frac{\gamma }{2} \big \Vert S_{\mathrm {reg}} - S_H\big \Vert ^2_{R^{m \times m}} \end{aligned}$$
(25)

where \(\gamma > 0\) is a given regularization parameter and \(S_{\mathrm {reg}}\) is a regularization (or stabilization) matrix. Additionally, the computations of the gradient in (21) need to be adapted accordingly. In the presence of multiple minimizers, this regularization enforces the solution to be close (depending on the parameter \(\gamma \)) to the matrix \(S_{\mathrm {reg}}\). For example, if the aim of the inverse problem is to find defects in an otherwise homogeneous medium, a suitable choice for \(S_{\mathrm {reg}}\) could be a standard finite element stiffness matrix for a constant diffusion coefficient. In our practical computations, the regularization approach described in (24) is better suited since an appropriate regularization matrix \(S_{\mathrm {reg}}\) is generally not known.

We emphasize that the presented inversion process does not need to resolve any microscopic scales in order to obtain an effective numerical model. The information extracted by this procedure (i.e., the stiffness matrix \({\tilde{S}}_H\)) may be used to simulate other problems subject to the same (unknown) diffusion coefficient. Furthermore, the information gathered can be seen as an intermediate step towards recovering information concerning the original coefficient itself. This additional recovery step will be studied in a future work.

4 Justification of the Proposed Inverse Problem

This section is devoted to proving Theorem 1 in Sect. 2. In particular, we explicitly construct an operator \(\mathcal {G}\) as introduced in (4). We emphasize, however, that the explicit construction is only to justify the reconstruction of an effective model. Apart from its communication behavior, none of the specific properties used below are required for the inverse procedure.

4.1 Effective Forward Approximation Beyond Structural Assumptions

In this subsection, we use the multiscale technique known as Localized Orthogonal Decomposition (LOD) [17, 27] to obtain a coarse forward model on the scale H which produces a forward operator that can be used as a replacement for \(\mathfrak {L}_A\) as given in (6).

To this end, we discretize (3) in a suitable coarse multiscale space. Since the standard space \(V_H\) is not suitable for the approximation of u if H is larger than the spatial scale of the microstructure, we enrich the coarse model with microscopic information about the problem via corrections of classical finite element functions. The construction of these corrections is based on a projective quasi-interpolation operator \(I_H:V^0\rightarrow V_H^0\) with standard approximation and stability properties, i.e., for an element \(T \in \mathcal {T}_H\) with diameter \(H_T\), it holds that

$$\begin{aligned} \Vert H_T^{-1}(v-I_H v)\Vert _{L^2(T)} + \Vert \nabla I_H v\Vert _{L^2(T)} \le C \Vert \nabla v\Vert _{L^2(\mathrm {N}(T))} \end{aligned}$$
(26)

for all \(v\in V^0\), where the constant C is independent of H, and \(\mathrm {N}(T) := \mathrm {N}^1(T)\) is the neighborhood of T as defined in (14). Note that for shape-regular meshes the above estimate also holds globally. For a particular choice of \(I_H\), see [9, 12, 23].

Based on \(I_H\), we define, for any element \(T \in \mathcal {T}_H\) and any function \(v_H\in V_H^0\), the element correction \(\mathcal {C}_T v_H \in W := \mathrm {Ker}I_H\) by

$$\begin{aligned} a(\mathcal {C}_T v_H, w) = \int _T A \nabla v_H \cdot \nabla w =: a_T(v_H, w) \end{aligned}$$
(27)

for all \(w \in W\), and the full correction \(\mathcal {C}:V_H^0\rightarrow W\) by

$$\begin{aligned} \mathcal {C}:= \sum _{T \in \mathcal {T}_H}\mathcal {C}_T. \end{aligned}$$

By construction, it holds that

$$\begin{aligned} a((1-\mathcal {C})v_H,w) = 0 \end{aligned}$$
(28)

for all \(v_H \in V_H^0\) and \(w\in W\). The corrections \(\mathcal {C}_T v_H\) have, in general, global support. However, as shown in [17, 36] (based on [27]) they decay exponentially fast (see also the one-dimensional sketch in Fig. 2).

Fig. 2
figure 2

Illustration of a one-dimensional hat function and its correction for the coefficient \(A(x) = (2+\sin (2^8\pi x))^{-1}\)

Therefore, we use localized element corrections \(\mathcal {C}_{T,\ell }v_H\) which are obtained by solving (27) on local patches \(\mathrm {N}^\ell (T)\) as defined in (14), i.e.,

$$\begin{aligned} a(\mathcal {C}_{T,\ell } v_H, w) = a_T(v_H, w) \end{aligned}$$
(29)

for all \(w \in W\) with \(w\vert _{\varOmega \backslash \mathrm {N}^\ell (T)} = 0\). As above, we define the full correction \(\mathcal {C}_\ell :V_H^0\rightarrow W\) by

$$\begin{aligned} \mathcal {C}_\ell := \sum _{T \in \mathcal {T}_H}\mathcal {C}_{T,\ell }. \end{aligned}$$

As shown in [17], we get, for any \(v_H \in V_H^0\),

$$\begin{aligned} \Vert \nabla (\mathcal {C}-\mathcal {C}_\ell )v_H\Vert _{L^2(\varOmega )} \le Ce^{-c\ell } \Vert \nabla v_H\Vert _{L^2(\varOmega )}. \end{aligned}$$
(30)

The constant c only depends on the contrast \(\beta /\alpha \), although this dependence seems pessimistic in many cases of practical relevance [14, 38]. For \(v_H \in V_H\), we set \(\mathcal {C}v_H := \mathcal {C}Rv_H\) and \(\mathcal {C}_\ell v_H := \mathcal {C}_\ell Rv_H\).

Given a discretized extension operator \(E^b_H:X_H\rightarrow V_H\) and the corresponding restriction operator \(R_H:V_H\rightarrow V_H^0\) as defined in Sect. 2.2, the discretized version of (3) reads: find \(u_H = R_Hu_H + E^b_Hu_H^0 \in V_H\) such that

$$\begin{aligned} a((1-\mathcal {C}_\ell )R_Hu_H, (1-\mathcal {C}_\ell )v_H) = (f_H,v_H) - a(E^b_Hu^0,(1-\mathcal {C}_\ell )v_H) \end{aligned}$$
(31)

for all \(v_H \in V_H^0\). In (31), \(f_H := \varPi _H f\) is the \(L^2\) projection of f onto \(V_H\), and \(u^0 \in X_H\).

4.2 Error Estimates

The following theorem shows that the approximation error of the presented approach scales optimally with H and that it is independent of the variations of the diffusion coefficient.

Theorem 2

(Error of the forward effective model) Let \(u \in V\) be the solution of (2) and \(u_H \in V_H\) the solution of (31), for given boundary data \(u^0 \in X_H\), a right-hand side \(f \in L^2(\varOmega )\), as well as an oversampling parameter \(\ell \).

For \(g \in L^2(\varOmega )\), let \(\hat{u}(g) \in V\) denote the solution of (2) with right-hand side g and boundary condition \(u^0 = 0\), and let us introduce the worst-case best-approximation error

$$\begin{aligned} {{\mathbf {wcba}}}(A,\mathcal {T}_H) := \sup _{g \in L^2(\varOmega )\backslash \{0\}}\,\inf _{v_H \in V_H^0}\frac{\Vert R\hat{u}(g) - v_H\Vert _{L^2(\varOmega )}}{\Vert g\Vert _{L^2(\varOmega )}}. \end{aligned}$$

It holds

$$\begin{aligned} \Vert u - u_H \Vert _{L^2(\varOmega )} \lesssim \left( H^2 + e^{-c\ell } + {\mathbf {wcba}}(A,\mathcal {T}_H)\right) \left( \Vert f\Vert _{L^2(\varOmega )} + \Vert u^0\Vert _X\right) . \end{aligned}$$

Proof

We split the error \(u - u_H = (u - \bar{u}_H) + (\bar{u}_H - \tilde{u}_H) + (\tilde{u}_H - u_H)\) with the solutions \(\bar{u}_H\) and \(\tilde{u}_H\) of the auxiliary problems

$$\begin{aligned} a(R_H\bar{u}_H, (1-\mathcal {C})v_H) = (f,v_H) - a(E^b_Hu^0,(1-\mathcal {C})v_H) \end{aligned}$$

and

$$\begin{aligned} a(R_H\tilde{u}_H, (1-\mathcal {C}_\ell )v_H) = (f_H,v_H) - a(E^b_Hu^0,(1-\mathcal {C}_\ell )v_H). \end{aligned}$$

To bound \(e_H := u_H - \tilde{u}_H\), we observe that

$$\begin{aligned} a((1-\mathcal {C}_\ell )e_H, (1-\mathcal {C}_\ell )v_H) = a(\mathcal {C}_\ell R_H\tilde{u}_H,(1-\mathcal {C}_\ell )v_H) = a(\mathcal {C}_\ell R_H\tilde{u}_H, (\mathcal {C}- \mathcal {C}_\ell )v_H), \end{aligned}$$
(32)

by the orthogonality property (28). For the next steps, we require the identity

$$\begin{aligned} v_H = I_H (1-\mathcal {C}_\ell ) v_H, \end{aligned}$$
(33)

which follows from the fact that \(\mathcal {C}_\ell v_H \in W = \mathrm {Ker}I_H\) and thus \(I_H\mathcal {C}_\ell v_H = 0\).

Testing with \(v_H = e_H\) in (32) and using (30) as well as

$$\begin{aligned} \Vert \nabla e_H\Vert _{L^2(\varOmega )} = \Vert \nabla I_H (1-\mathcal {C}_\ell )e_H\Vert _{L^2(\varOmega )} \lesssim \Vert A^{1/2}\nabla (1-\mathcal {C}_\ell )e_H\Vert _{L^2(\varOmega )}, \end{aligned}$$

we obtain

$$\begin{aligned} \begin{aligned} \Vert A^{1/2}\nabla (1-\mathcal {C}_\ell )e_H\Vert ^2_{L^2(\varOmega )}&= a((1-\mathcal {C}_\ell )e_H,(1-\mathcal {C}_\ell )e_H) \\&= a(\mathcal {C}_\ell R_H\tilde{u}_H, (\mathcal {C}- \mathcal {C}_\ell )e_H) \\&\lesssim e^{-c\ell }\Vert \nabla \mathcal {C}_\ell R_H\tilde{u}_H\Vert _{L^2(\varOmega )} \Vert A^{1/2}\nabla (1-\mathcal {C}_\ell )e_H\Vert _{L^2(\varOmega )}. \end{aligned} \end{aligned}$$

Further, it follows that

$$\begin{aligned} \Vert e_H\Vert _{L^2(\varOmega )} \lesssim \Vert A^{1/2}\nabla (1-\mathcal {C}_\ell )e_H\Vert _{L^2(\varOmega )} \lesssim e^{-c\ell }\left( \Vert f\Vert _{L^2(\varOmega )} + \Vert u^0\Vert _X\right) \end{aligned}$$
(34)

where we use (33) and (26). As a next step, we bound \(\bar{e}_H := \tilde{u}_H - \bar{u}_H\). We note that

$$\begin{aligned} a(\bar{e}_H, (1-\mathcal {C})v_H) = a(R_H\tilde{u}_H + E^b_Hu^0, (\mathcal {C}_\ell - \mathcal {C})v_H) \end{aligned}$$

for any \(v_H \in V_H^0\). With \(v_H = \bar{e}_H\) and similar arguments as above, we obtain

$$\begin{aligned} \Vert \bar{e}_H\Vert _{L^2(\varOmega )} \lesssim \Vert A^{1/2}\nabla (1-\mathcal {C})\bar{e}_H\Vert _{L^2(\varOmega )} \lesssim e^{-c\ell }\left( \Vert f\Vert _{L^2(\varOmega )} + \Vert u^0\Vert _X\right) . \end{aligned}$$
(35)

The error \(u - \bar{u}_H\) can be estimated using [12, Prop. 1] which also holds for inhomogeneous Dirichlet boundary conditions, i.e.,

$$\begin{aligned} \Vert u - \bar{u}_H\Vert _{L^2(\varOmega )} \lesssim \left( H^2 + {\mathbf {wcba}}(A,\mathcal {T}_H)\right) \left( \Vert f\Vert _{L^2(\varOmega )} + \Vert u^0\Vert _X\right) . \end{aligned}$$
(36)

The triangle inequality, (34), (35), and (36) yield the desired estimate.\(\square \)

To illustrate the advantage of the LOD, Fig. 3 shows the error between the numerical solution on a microscopic scale and the numerical solutions using the LOD and a classical finite element approximation on a coarse scale, respectively. The finite element method suffers from pre-asymptotic effects when the micro scale is not resolved, while the LOD produces a finite element function with much better approximation properties.

Fig. 3
figure 3

Left: An example of a microscopic coefficient. Right: Comparison of the finite element method and the LOD on \({\varOmega =[0,1]^2}\) for \(f=1\), \(u^0=0\), and \(\ell =2\) for the solution of the diffusion problem corresponding to the depicted scalar coefficient. The dashed line indicates linear convergence

We emphasize that, choosing \(\ell \) large enough (i.e., \(\ell \gtrsim |\log H|\)), it holds \(e^{-c\ell } \lesssim H\) or even \(e^{-c\ell } \lesssim H^2\). As discussed in [12], the worst-case best-approximation error is at least \(\mathcal {O}(H)\), and it scales possibly even better with H for certain pre-asymptotic regimes. In this work, we are mainly interested in solving the inverse problem and do not focus on optimizing the error estimates derived above.

We can now define the discretized operator corresponding to (31) by

$$\begin{aligned} \mathfrak {L}_{A,\ell }^{\mathrm {eff}}:X_H\times L^2(\varOmega )&\rightarrow \quad V_H,\nonumber \\ (u^{0},f)&\mapsto \quad u_H,\;\text {where}\,u_H\,\text {solves}\,(31). \end{aligned}$$
(37)

Using Theorem 2 and the distance function defined in (9), we obtain the following result.

Corollary 1

(Error of the effective forward operator) Let \(\ell \gtrsim |\log H|\). Then it holds

$$\begin{aligned} \texttt {{dist}}_f(\mathfrak {L}_A,\mathfrak {L}_{A,\ell }^{\mathrm {eff}})\lesssim H. \end{aligned}$$

4.3 Reformulation Using the Effective Stiffness Matrix

As described in Sect. 2.3, we can alternatively represent the operator \(\mathfrak {L}_{A,\ell }^{\mathrm {eff}}\) using the stiffness matrix corresponding to the discrete formulation (31). Given a coefficient \(A\in \mathcal {A}\), the corresponding LOD stiffness matrix \(S_H^\ell (A)\) is defined by

$$\begin{aligned} S_H^\ell (A)[i,j] := a((1-\mathcal {C}_\ell )\Lambda _{z_j}, (1-\mathcal {C}_\ell )\Lambda _{z_i}),\; \text{ for }\; i,\, j \in \{1,\dots ,m\}, \end{aligned}$$
(38)

with the ordering \(i \mapsto z_i\) as in (10). Further, the set of LOD stiffness matrices with oversampling parameter \(\ell \) based on admissible coefficients is given by

$$\begin{aligned} \mathcal {S}(\ell ,\mathcal {T}_H) := \left\{ S_H^\ell (A)\in \mathbb {R}^{m\times m}_{\mathrm {sym}}\,:\,A \in \mathcal {A}\right\} . \end{aligned}$$
(39)

By construction of the correctors \(\mathcal {C}_\ell \) in (29), it holds \(\mathcal {S}(\ell ,\mathcal {T}_H) \subseteq \mathcal {M}(\ell ,\mathcal {T}_H)\) with the set \(\mathcal {M}(\ell ,\mathcal {T}_H)\) defined in Sect. 2.

We can now prove the following lemma using the operator defined in (11).

Lemma 1

(Alternative representation of the effective forward operator) Let \(S_H^\ell (A) \in \mathcal {S}(\ell ,\mathcal {T}_H)\) be the LOD stiffness matrix corresponding to (31). Assume that \(E^b\) fulfills \(\mathcal {C}_\ell E^b_Hv^0 = \mathcal {C}_\ell E^b\vert _{X_H} v^0 = 0\) for any \(v^0 \in X_H\). Then it holds

$$\begin{aligned} \mathfrak {L}_{S_H^\ell (A)}(u^0,f) = \mathfrak {L}_{A,\ell }^{\mathrm {eff}}(u^0,f) \end{aligned}$$
(40)

for all \(u^0\in X_H,\,f\in L^2(\varOmega )\).

Proof

Write \(u_H = \sum _{j = 1}^{m} u_j \Lambda _{z_j}\) and observe that (31) is equivalent to

$$\begin{aligned} \sum _{j\,:\,z_j \not \subset \partial \varOmega }u_j\, a((1-\mathcal {C}_\ell )R_H\Lambda _{z_j}, (1-\mathcal {C}_\ell )\Lambda _{z_i}) = (f_H,\Lambda _{z_i}) - a(E^b_Hu^0,(1-\mathcal {C}_\ell )\Lambda _{z_i}) \end{aligned}$$
(41)

for all \(i \in \{k\,:\,z_k \not \subset \partial \varOmega \}\). Inserting \(f_H = \sum _{j = 1}^{m} f_j \Lambda _{z_i}\), using the fact that

$$\begin{aligned} a(E^b_Hu^0,(1-\mathcal {C}_\ell )v_H)=a((1-\mathcal {C}_\ell )E^b_Hu^0,(1-\mathcal {C}_\ell )v_H) \end{aligned}$$

for any \(v_H \in V_H^0\), and the definition (38), we can write Eq. (41) as

$$\begin{aligned} S_{H,0}^\ell (A) R_Hu_H = R_H M_H f_H - R_HS_H^\ell (A) E^b_Hu^0, \end{aligned}$$

which shows (40). \(\square \)

Remark 2

Natural choices for the extension operator \(E^b\), in order to fulfill the assumptions of Lemma 1, are those that extend functions in \(X_H\) to functions in \(V_H\) that are only supported on one layer of elements away from the boundary. In the numerical study presented in Sect. 5, we use the discrete extension operator \(E^b_H\) that maps a nodal basis function on the boundary to the corresponding nodal basis function in the whole domain.

Lemma 1 and Corollary 1 show that the operators \(\mathfrak {L}_A(\cdot ,f)\) and \(\mathfrak {L}_{S_H^\ell (A)}(\cdot ,f)\) are close as operators from \(X_H\) to \(V\) if \(\ell \) is chosen large enough. This property is essentially the message of Theorem 1 in Sect. 2. In particular, the theorem holds with the explicit choice \(\mathcal {G}:= 1-\mathcal {C}_\ell \).

5 Numerical Experiments

In this section, we present some numerical experiments that illustrate the capability of the proposed method. The inverse problem is based on synthetic data, i.e., the coarse measurements used to feed the inversion algorithm are obtained from finite element solutions of (2) in \(V_h\), defined on a mesh with mesh size \(h = \sqrt{2}\cdot 2^{-9}\), that resolve the micro-scale features of the diffusion coefficient. Furthermore, the data are perturbed by multiplicative, independent and uniformly distributed random noise with intensity up to 5%. For the iterative optimization in the following study, we employ the regularization strategy defined in (24) with \(\eta = 0.001\). The remaining parameters used to run the inversion algorithm are included below.

5.1 Example 1: Full Boundary Data

In the first experiment, we assume to have full information on the operator (matrix) \(\tilde{\mathfrak {L}}^{\mathrm {eff}}\), i.e., we assume that measurements in \(\varOmega \) on the coarse scale \(H = \sqrt{2}\cdot 2^{-5}\) for a complete basis of \(X_H\) are available. The scalar coefficient A for which the effective behavior should be recovered is constant on a mesh \(\mathcal {T}_\varepsilon \) with \(\varepsilon = \sqrt{2}\cdot 2^{-7}\) and the value on each element is independently obtained as a uniformly distributed random number between 1 and 50, i.e., for any \(T \in \mathcal {T}_\varepsilon \) we have \(A|_{T} \sim U(1,50)\) (see Fig. 4, left, for the explicit sample used here). We set \(f=1\) and start the inverse iteration with the finite element stiffness matrix \(S_H^0\) based on the constant coefficient with value 1.

Fig. 4
figure 4

Left: Diffusion coefficient in Example 1. Right: Values of \(\mathcal {J}_H\) in the first 20 iterations of the inversion algorithm, using sparsity patterns based on local matrices (

figure a
, dotted) and quasi-local matrices with \(\ell = 1\) (
figure b
), \(\ell = 2\) (
figure c
), \(\ell = 3\) (
figure d
) (Color figure online)

The values of the functional \(\mathcal {J}_H\) in the first 20 iterations of the inversion algorithm are given in Fig.  4 (right). In particular, we compare the performance of a local approach based on matrices in \(\mathcal {M}(0,\mathcal {T}_H)\) with the sparsity pattern of, e.g., a standard first-order finite element method, the HMM, or the Two-Scale Finite Element Method, with a quasi-local approach based on matrices in \(\mathcal {M}(\ell ,\mathcal {T}_H)\) for \(\ell \in \{1,2,3\}\). One clearly sees that slightly deviating from locality leads to better results in terms of decrease and value of the error functional \(\mathcal {J}_H\). In particular, for \(\ell = 0\) the functional seems to reach a stagnation relatively quickly, while the results significantly improve when increasing the value of \(\ell \).

A necessary validation step, in order to further investigate the behavior dependent on \(\ell \), consists in solving a diffusion problem using the stiffness matrices reconstructed with using different sparsity patterns, and comparing the resulting numerical solutions with the finite element functions from which the measurements were taken to feed the inversion algorithm. The outcome of this assessment is shown in Fig. 5, focusing on the cross sections at \(x_2 = 0.5\) (left) and at \(x_1 = 0.5\) (right) of the numerical approximations corresponding to the boundary condition \(u^0(x_1,x_2) = x_1\). Figure 6 depicts the same cross sections when a random boundary condition \(u^0\in X_H\) is considered.

Fig. 5
figure 5

Cross sections of reconstructed functions with boundary condition \(u^0(x_1,x_2) = x_1\) based on local stiffness matrices (

figure e
, dotted) and quasi-local ones with \(\ell = 1\) (
figure f
), \(\ell = 2\) (
figure g
), \(\ell = 3\) (
figure h
) for Example 1 obtained from full boundary data. The corresponding microscopic FE function (
figure i
, dashed) is depicted as a reference. Left: Cross section at \(x_2 = 0.5\). Right: Cross section at \(x_1 = 0.5\) (Color figure online)

Fig. 6
figure 6

Cross sections of reconstructed functions with random boundary condition \(u^0\in X_H\) based on local stiffness matrices (

figure j
, dotted) and quasi-local ones with \(\ell = 1\) (
figure k
), \(\ell = 2\) (
figure l
), \(\ell = 3\) (
figure m
) for Example 1 obtained from full boundary data. The corresponding microscopic FE function (
figure n
, dashed) is depicted as reference. Left: Cross section at \(x_2 = 0.5\). Right: Cross section at \(x_1 = 0.5\) (Color figure online)

Besides the accuracy of the numerical approximations computed based on the recovered stiffness matrices, it is also important to assess the robustness of the reconstructed effective model, i.e., to investigate to which extent the coarsened information about the diffusion coefficient encoded in the stiffness matrix can be used to simulate other scenarios.

For this purpose, we employ the reconstructed stiffness matrices to simulate a diffusion problem with two different right-hand sides, i.e.,

$$\begin{aligned} g_1(x_1,x_2) = 20\,(\mathbb {1}_{\{x_1< 0.5\}}\,x_1 + \mathbb {1}_{\{x_1 \ge 0.5\}}\,(1-x_1))(\mathbb {1}_{\{x_2 < 0.5\}}\,x_2 + \mathbb {1}_{\{x_2 \ge 0.5\}}\,(1-x_2)) \end{aligned}$$

and

$$\begin{aligned} g_2(x_1,x_2) = 10\,\mathbb {1}_{\{x_1 \ge 0.5\}}, \end{aligned}$$

and compare the numerical results with the corresponding microscopic solution using the diffusion coefficient depicted in Fig. 4 (left). In both cases, homogeneous Dirichlet boundary conditions are imposed on the outer boundaries.

Representative cross sections of the numerical approximations obtained based on the reconstructed stiffness matrices, compared to the corresponding microscopic solutions, are shown in Fig. 7. The numerical results indicate that robustness can be assured only with some moderate quasi-locality. Moreover, as in the previous experiments, the quality of the results improves if \(\ell \) is increased.

Fig. 7
figure 7

Cross sections at \(x_2 = 0.5\) of reconstructed functions with homogeneous Dirichlet boundary conditions based on local stiffness matrices (

figure o
, dotted) and quasi-local ones with \(\ell = 1\) (
figure p
), \(\ell = 2\) (
figure q
), \(\ell = 3\) (
figure r
). The corresponding microscopic FE functions (
figure s
, dashed) are given as a reference but were not part of the input data. Left: Right-hand side \(g_1\). Right: Right-hand side \(g_2\) (Color figure online)

5.2 Example 2: Incomplete Boundary Data

Next, we consider a more realistic case where the operator \(\tilde{\mathfrak {L}}^{\mathrm {eff}}\) is only partially known. In practice, this means that coarse measurements in \(\varOmega \) are available only for q distinct boundary conditions in \(X_H\) (\(q < \mathrm {dim}\, X_H\)). In our numerical examples, this lack of information is modeled choosing q distinct functions in \(X_H\) with independent and uniformly distributed random values between 0 and 1. In this setting, the aim is to find an effective model that not only fits the given data, but that is also able to reproduce the coarse behavior for other boundary conditions not considered as input data.

The scalar coefficient A whose corresponding stiffness matrix should be recovered is shown in Fig. 8 (left). We set \(H=\sqrt{2}\cdot 2^{-5}\), \(f=1\), \(q = 40\), and the initial matrix \(S_H^0\) is defined as the finite element stiffness matrix based on an independent and uniformly distributed random coefficient on the coarse scale H with values between 0.1 and 10.

We adapt the randomized approach used in [34] in the context of deep learning. Namely, in each iteration step, we randomly choose half of the available data to compute the new search direction, whereas we use all available data for the line search and for the evaluation of the functional \(\mathcal {J}_H\). The values of the error functional \(\mathcal {J}_H\) in the first 20 iterations of the inversion algorithm are shown in Fig. 8 (right). One can observe that classical local stiffness matrices and even the quasi-local approach with \(\ell =1\) cannot significantly improve the results obtained with the initial guess, while quasi-local matrices with \(\ell \ge 2\) are able to reduce the values of the functional up to a certain degree.

As in the previous subsection, we validate the outcome of the inversion algorithm by solving a diffusion problem using the reconstructed stiffness matrices and comparing the numerical results with the corresponding microscopic finite element solutions. The cross sections at \(x_2 = 0.5\) and \(x_1 = 0.5\) of the numerical approximations using the different stiffness matrices are shown in Fig. 9, for the case with boundary condition \(u^0(x_1,x_2) = x_1\). We emphasize that, in this setting, neither the reference finite element function (black dotted line in Fig. 9) nor a coarse measurement from it were part of the input data.

Fig. 8
figure 8

Left: Diffusion coefficient in Example 2. Right: Values of \(\mathcal {J}_H\) in the first 20 iterations of the inversion algorithm with the randomized approach based on local matrices (

figure t
, dotted) and quasi-local matrices with \(\ell = 1\) (
figure u
), \(\ell = 2\) (
figure v
), \(\ell = 3\) (
figure w
) (Color figure online)

Fig. 9
figure 9

Cross sections of reconstructed functions with boundary condition \(u^0(x_1,x_2) = x_1\) based on local stiffness matrices (

figure x
, dotted) and quasi-local ones with \(\ell = 1\) (
figure y
), \(\ell = 2\) (
figure z
), \(\ell = 3\) (
figure aa
) for Example 2 obtained from incomplete boundary data and the randomized approach. The corresponding microscopic FE function (
figure ab
, dashed) is depicted as a reference but was not part of the input data. Left: Cross section at \(x_2 = 0.5\). Right: Cross section at \(x_1 = 0.5\) (Color figure online)

For a further comparison, we also present in Fig. 10 the same cross sections of the numerical solutions obtained from the stiffness matrices using a full-data approach, i.e., when all available data (40 measurements) are used in every step to compute the new search direction. Moreover, a comparison of the values of the error functional is shown in Fig. 11. The reconstructed matrices behave similarly to the ones obtained with the randomized approach. However, even for \(\ell \ge 2\), the full-data approach leads to a stagnation relatively quickly, whereas the randomized strategy is generally more robust in the case of incomplete boundary data and the included randomness results in smaller values of the error functional. Further, the randomized strategy requires less computational effort. In the presented example, we observed—employing the randomized strategy—a speed-up by a factor of about 1.7. For further numerical experiments, see also [25, Ch. 4].

5.3 Discussion

The presented inversion results demonstrate that the reconstruction of the stiffness matrix assuming a local sparsity pattern with communication only between neighboring degrees of freedom does not allow us to capture effective features of the microscopic problem, while the reconstruction based on a quasi-local approach, especially with \(\ell \ge 2\), is able to mimic the effective behavior.

Furthermore, some quasi-locality appears to allow for robustness with respect to different right-hand sides, a property which allows us to employ the reconstructed effective model for the simulation of other scenarios, assuming that the microscopic properties remain unchanged.

Our experiments also indicate that a lower bound on \(\ell \) seems to be necessary similar to the forward setting, where \(\ell \) needs to be increased for smaller values of H (\(\ell \gtrsim |\log H|\)) to obtain improvements in the first place. In that sense, our findings also deviate from the numerical results in [13] which indicate that truly local numerical homogenization might always be possible.

Fig. 10
figure 10

Cross sections of reconstructed functions with boundary condition \(u^0(x_1,x_2) = x_1\) based on local stiffness matrices (

figure ac
, dotted) and quasi-local ones with \(\ell = 1\) (
figure ad
), \(\ell = 2\) (
figure ae
), \(\ell = 3\) (
figure af
) for Example 2 obtained from incomplete boundary data and the full-data approach. The corresponding microscopic FE function (
figure ag
, dashed) is depicted as a reference but was not part of the input data. Left: Cross section at \(x_2 = 0.5\). Right: Cross section at \(x_1 = 0.5\) (Color figure online)

Fig. 11
figure 11

Left: Values of \(\mathcal {J}_H\) in Example 2 in the first 20 iterations of the inversion algorithm with the full-data approach based on local matrices (

figure ah
, dotted) and quasi-local matrices with \(\ell = 1\) (
figure ai
), \(\ell = 2\) (
figure aj
), \(\ell = 3\) (
figure ak
). Right: Comparison of the values of \(\mathcal {J}_H\) for the full-data approach (dotted line) and the randomized approach (solid line) for \(\ell = 2\) (
figure al
) and \(\ell = 3\) (
figure am
) (Color figure online)

6 Conclusion

We proposed a strategy to reconstruct the effective behavior of solutions of a multiscale PDE model which involves a coefficient varying on a microscopic scale. The approach is motivated by the effective models (represented by effective stiffness matrices) obtained by numerical homogenization. The aim is to provide a first step towards a reasonable surrogate in the inverse multiscale setting, which is characterized by a mismatch between coarse data scale and microscopic quantities. The method relies on a quasi-local behavior in the sense that the reconstructed system matrices have a slightly denser sparsity pattern than standard finite element matrices, and this allows us to recover the behavior related to characteristic microscopic features of the solutions without requiring numerical computations on the microscopic scale. The method has been numerically validated on a prototypical model problem, considering a stationary linear elliptic diffusion problem with inhomogeneous boundary conditions. Further, even the case of incomplete boundary data can be handled and ideas from learning-type methods may be adopted.

A possible future extension of the approach includes the identification of further information about the underlying coefficient, e.g., geometric features, based on the reconstructed effective models. Further, more involved combinations with learning-type techniques to deal with incomplete data could be studied, as well as adaptivity with respect to the parameter \(\ell \).