Representation and reconstruction of covariance operators in linear inverse problems

We introduce a framework for the reconstruction and representation of functions in a setting where these objects cannot be directly observed, but only indirect and noisy measurements are available, namely an inverse problem setting. The proposed methodology can be applied either to the analysis of indirectly observed functional images or to the associated covariance operators, representing second-order information, and thus lying on a non-Euclidean space. To deal with the ill-posedness of the inverse problem, we exploit the spatial structure of the sample data by introducing a flexible regularizing term embedded in the model. Thanks to its efficiency, the proposed model is applied to MEG data, leading to a novel approach to the investigation of functional connectivity.


Introduction
An inverse problem is the process of recovering missing information from indirect and noisy observations. Not surprisingly, inverse problems play a central role in numerous fields such as, to name a few, geophysics (Zhdanov 2002), computer vision (Hartley and Zisserman 2004), medical imaging (Arridge 1999, Lustig et al 2008) and machine learning (De Vito et al 2005).
Solving a linear inverse problem means finding an unknown x, for instance a function or a surface, from a noisy observation y, which is a solution to the model

y = Kx + ε,  (1)

where y and ε belong to an either finite or infinite dimensional Banach space. The map K is called a forward operator and is generally assumed to be known, although its uncertainty has also been taken into account in the literature (Arridge et al 2006, Golub and van Loan 1980, Gutta et al 2019, Kluth and Maass 2017, Lehikoinen et al 2007, Nissinen et al 2009, Zhu et al 2011). The term ε represents observational error. Problem (1) is a well-studied problem within applied mathematics (for early works in the field, see Adorf 1995, Calderón 1980, Geman 1990). Its main difficulties arise from the fact that, in practical situations, an inverse of the forward operator does not exist, or if it does, it amplifies the noise term. For this reason such a problem is called ill-posed. Consequently, the estimation of the function x in (1) is generally tackled by minimizing a functional which is the sum of a data (fidelity) term and a regularizing term encoding prior information on the function to be recovered (see, among others, Cavalier 2008, Hu and Jacob 2012, Lefkimmiatis et al 2012, Mathé and Pereverzev 2006, Tenorio 2001). For convex optimization functionals, modern efficient optimization methods can be applied (Beck and Teboulle 2009, Boyd et al 2010, Burger et al 2016, Chambolle and Pock 2011, Chambolle and Pock 2016). Alternatively, when it is important to assess the uncertainty associated with the estimates, a Bayesian approach could be adopted (Calvetti and Somersalo 2007, Kaipio and Somersalo 2005, Repetti et al 2019, Stuart 2010). Deep convolutional neural network approaches have also been applied to this setting.
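As a concrete toy illustration of this variational approach, the following sketch (our own minimal example; the blur operator, grid size and value of λ are assumptions of the illustration, not taken from the text) solves a discretized version of problem (1) with a Tikhonov, i.e. squared-norm, regularizing term:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete, severely ill-conditioned forward operator: a Gaussian blur.
p = 60
t = np.arange(p)
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 2.0) ** 2)
K /= K.sum(axis=1, keepdims=True)

x_true = np.sin(2.0 * np.pi * t / p)               # smooth unknown signal
y = K @ x_true + 0.01 * rng.standard_normal(p)     # noisy indirect data

# Tikhonov-regularized estimate: argmin ||y - Kx||^2 + lam ||x||^2,
# solved via the normal equations (K^T K + lam I) x = K^T y.
lam = 1e-3
x_hat = np.linalg.solve(K.T @ K + lam * np.eye(p), K.T @ y)

# The naive inverse, in contrast, amplifies the noise enormously.
x_naive = np.linalg.solve(K, y)
err_reg = np.linalg.norm(x_hat - x_true)
err_naive = np.linalg.norm(x_naive - x_true)
print(err_reg < err_naive)
```

Even this crude squared-norm penalty stabilizes the estimate; the models discussed later replace it with a spatial roughness penalty adapted to functions on a manifold.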
In imaging sciences, it is sometimes of interest to find an optimal representation and perform statistics on the second order information associated with the functional samples, i.e. the covariance operators describing the variability of the underlying functional images. This is, for instance, the case in a number of areas of neuroimaging, particularly those investigating functional connectivity. In this work, we establish a framework for reconstructing and optimally representing indirectly observed samples C_1, . . . , C_n, that are covariance operators, expressing the second order properties of the underlying unobserved functions. The indirect observations are covariance operators generated by the model

S_i = K_i • C_i • K_i^* + E_i,  (2)

where K_i^* denotes the adjoint operator and the term E_i models observational error. The term K_i • C_i • K_i^* represents the covariance operator of K_i X^(i), with X^(i) an underlying random function whose covariance operator is C_i.
As opposed to more classical linear inverse problem formulations, problem (2) introduces the following additional difficulties:
• We are in a setting where each sample is a high-dimensional object, namely a covariance operator; it is important to take advantage of the information from all the samples to reconstruct and represent each of them.
• The elements {C_i} and {S_i} live on non-Euclidean spaces, as they belong to the positive semidefinite cone, and it is important to account for this manifold structure in the formulation of the associated estimators.
• In an inverse problem setting it is fundamental to be able to introduce spatial regularization; however, it is not obvious how to feasibly construct a regularizing term for covariance operators reflecting, for instance, smoothness assumptions on the underlying functional images.
More general non-Euclidean settings could also be accommodated. Specifically, the error term could be defined on a tangent space and mapped to the original space through the exponential mapping. Another setting of interest is the case of error terms that push the observables out of the original space. In our applications this is not an issue, as the contaminated observations are themselves empirical covariance matrices, which belong to the non-Euclidean space of positive semidefinite matrices.
We tackle problem 2 by generalizing the concept of principal component analysis (PCA) to optimally represent and understand the variation associated with samples that are indirectly observed covariance operators. The proposed model is also able to deal with the simpler case of samples that are indirectly observed functional images belonging to a linear functional space.

Motivating application-functional connectivity
In recent years, statistical analysis of covariance matrices has gained a predominant role in medical imaging and in particular in functional neuroimaging. In fact, covariance matrices are the natural objects to represent the brain's functional connectivity, which can be defined as a measure of covariation, in time, of the cerebral activity among brain regions. While many techniques have been proposed to describe functional connectivity, almost all can be described in terms of a function of a covariance or related matrix.
Covariance matrices representing functional connectivity can be computed from the signals arising from functional imaging modalities. The choice of a specific functional imaging modality is generally driven by the preference for high spatial resolution signals, and thus high spatial resolution covariance matrices, versus high temporal resolution, and thus the possibility to study the temporal dynamics of the covariance matrices. Functional magnetic resonance imaging falls in the first category, while electroencephalography (EEG) and magnetoencephalography (MEG) fall in the second. However, high temporal resolution does generally come at the price of indirect measurements and, as shown in figure 1 in the case of MEG data, the signals are in practice detected on the sensors space. It is however of interest to produce results on the associated signals on the cerebral cortex, which we will refer to as brain space. The signals on the brain space are functional images whose domain is the geometric representation of the brain and are associated with the neuronal activity on the cerebral cortex. We borrow here the notions of brain space and sensors space from Johnstone and Silverman (1990) and we use them throughout the paper for convenience; however, it is important to highlight that the formulation of the problem is much more general than the setting of this specific application.
The signals on the brain space are related to the signals on the sensors space by a forward operator, derived from the physical modeling of the electrical/magnetic propagation from the cerebral cortex to the sensors. This is generally referred to as the forward problem. For soft-field methods like EEG, MEG and functional near-infrared spectroscopy (Eggebrecht et al 2014, Ferrari and Quaresima 2012, Mosher et al 1999, Singh et al 2014, Ye et al 2009), the forward operator is defined through the solution to a partial differential equation of diffusion type. Such a mapping induces a strong degree of smoothing and consequently the corresponding inverse problem, i.e. the reconstruction of a signal on the brain space from observations in the sensors space, is strongly ill-posed. In fact, signals with fairly different intensities on the brain space result, due to the diffusion effect, in signals with similar intensities in the sensors space. In figure 1, we show an example of a signal on the brain space and the associated signal on the sensors space. From a practical perspective, it is crucial to understand how the different parts of the brain interact, which is sometimes known as functional connectivity. A possible way to understand these interactions is by analyzing the covariance function associated with the signals describing the cerebral activity of an individual on the brain space (Fransson et al 2011, Lee et al 2013, Li et al 2009). More recently, the interest has shifted from this static approach to a dynamic approach. In particular, for a single individual, it is of interest to understand how these covariance functions vary in time. This is a particularly active field, known as dynamic functional connectivity (Hutchison et al 2013). Another element of interest is understanding how these covariance functions vary among individuals. In figure 2, we show the covariance matrices, on the sensors space, computed from the MEG signals of three different subjects.

Figure 1. On the top left, head model of a subject and superimposition of the 248 MEG sensors positioned around the head, called 'sensors space'. On the top right, brain model of the same subject represented by a triangular mesh of 8k nodes, which represents the 'brain space'. On the bottom left, an example of a synthetic signal detected by the MEG sensors. The dots represent the sensors, the color map represents the signal detected by the sensors. On the bottom right, intensity of the reconstructed signal on the triangular mesh of the cerebral cortex.

Figure 2. Covariance matrices of the signal detected by the MEG sensors from three different subjects of the human connectome project. The size of the matrices is 248 × 248. The dark blue bands represent missing data, which are due to the exclusion of some channels after a quality check of the signal.
The remainder of this paper is organized as follows. In section 2 we give a formal description of the problem. We then introduce a model for indirectly observed smooth functional images in section 3 and present the more general model associated with problem (2) in section 4. In section 5, we perform simulations to assess the validity of the estimation framework. In section 6 we apply the proposed models to MEG data and we finally give some concluding remarks in section 7.

Mathematical description of the problem
We now introduce the problem using our driving application as an example. To this purpose, let M be a closed smooth two-dimensional manifold embedded in R^3, which in our application represents the geometry of the cerebral cortex. An example of such a surface is shown on the top right of figure 1. We denote with L^2(M) the space of square integrable functions on M.
Define X to be a random function with values in a Hilbert functional space F ⊂ L^2(M), with mean µ = E[X] and finite second moment, and assume the continuity and square integrability of its covariance function C_X(v, v') = E[{X(v) − µ(v)}{X(v') − µ(v')}]. The associated covariance operator, also denoted by C_X, is defined by (C_X g)(v) = ∫_M C_X(v, v') g(v') dv', for all g ∈ L^2(M). Mercer's lemma (Riesz and Szokefalvi-Nagy 1955) guarantees the existence of a non-increasing sequence {γ_r} of eigenvalues of C_X and an orthonormal sequence of corresponding eigenfunctions {ψ_r}, such that C_X ψ_r = γ_r ψ_r. As a direct consequence, X can be expanded as X = µ + Σ_{r=1}^∞ ζ_r ψ_r, where the series converges uniformly in mean-square and the random variables {ζ_r} are uncorrelated and are given by ζ_r = ∫_M {X(v) − µ(v)} ψ_r(v) dv. The collection {ψ_r} defines the modes of variation of the random function X, in descending order of strength, and these are called principal component (PC) functions. The associated random variables {ζ_r} are called PC scores. Moreover, the defined PC functions give the best finite basis approximation in the L^2-sense: for any fixed R ∈ N, the first R PC functions of X minimize the reconstruction error, i.e. they minimize E‖X − µ − Σ_{r=1}^R ⟨X − µ, φ_r⟩ φ_r‖^2_{L^2(M)} over all sets of R orthonormal functions {φ_r} ⊂ L^2(M).

The case of indirectly observed functions
In the case of indirect observations, the signal is detectable only through s sensors on the sensors space. Let {K_l : l = 1, . . . , m} be a collection of s × p real matrices, representing the potentially sample-specific forward operators relating the signal at p pre-defined points {v_j : j = 1, . . . , p} on the cortical surface M with the signal captured by the s sensors. The matrices {K_l} are discrete versions of the forward operator K introduced in section 1. Moreover, define the evaluation operator Ψ : F → R^p to be a vector-valued functional that evaluates a function f ∈ F at the p pre-specified points {v_j} ⊂ M, returning the p-dimensional vector (f(v_1), . . . , f(v_p))^T. The operators Ψ and {K_l} are known. However, in the described problem the random function X can be observed only through indirect measurements {y_l ∈ R^s : l = 1, . . . , m} generated from the model

y_l = K_l Ψ x_l + ε_l,  (5)

where {x_l} are m independent realizations of X, and thus expandable in terms of the PC functions {ψ_r} and the coefficients {ζ_{l,r}} given by ζ_{l,r} = ∫_M {x_l(v) − µ(v)} ψ_r(v) dv. The terms {ε_l} represent observational errors and are independent realizations of an s-dimensional normal random vector with mean the zero vector and variance σ^2 I_s, where I_s denotes the s × s identity matrix.
We consider the problem of estimating the PC functions {ψ_r} in (5), and associated scores {ζ_{l,r}}, from the observations {y_l}. In figure 3 we give an illustration of the introduced setting. Note that it would not be necessary to define the evaluation operator if the forward operators were defined to be functionals {K_l : F → R^s}, relating directly the functional objects on the brain space to the real vectors on the sensors space. It is however the case that the operators {K_l} are computed in a matrix form by third party software (see section 6 for details) for a pre-specified set of points {v_j} ⊂ M, and it is thus convenient to take this into account in the model through the introduction of an evaluation operator Ψ.
In the case of single subject studies, the surface M is the subject's reconstructed cortical surface, an example of which is shown on the right panel of figure 1. In this case, it is natural to assume that there is one common forward operator K for all the detected signals. In the more general case of multi-subject studies, M is assumed to be a template cortical surface. We are thus assuming that the individual cortical surfaces have been registered to the template M, which means that there is a smooth and one-to-one correspondence between the points on each individual brain surface and the template surface M, where the PC functions are defined.
However, notice that when it comes to the computation of the forward operators, we are not assuming the brain geometries of the single subjects to be all equal to a geometric template, as the model in (5) allows for sample-specific forward operators {K_l}. The individual cortical surfaces could also have different numbers of mesh points; in that case the subject-specific 'resampling' operator could be absorbed into the definition of sample-specific evaluation operators {Ψ_l}.
The estimation of the PC functions in (5) has been classically dealt with by reconstructing each observation x l independently and subsequently performing PCA. However, such an approach can be sub-optimal in particular in a low signal-to-noise setting, as when estimating one signal, the information from all the other sampled signals is systematically ignored. The statistical analysis of data samples that are random functions or surfaces, i.e. functional data, has also been explored in the functional data analysis (FDA) literature (Ramsay and Silverman 2005), however, most of those works focus on the setting of fully observed functions. An exception to this is the sparse FDA literature (see e.g. Yao et al 2005), where instead the functional samples are assumed to be observable only through irregular and noisy evaluations.
In the case of direct but noisy observations of a signal, the statistical estimation of the covariance function, and associated eigenfunctions, has been studied, for instance, in Bunea and Xiao (2015) for regularly sampled functions and in Huang et al (2008) and Yao et al (2005) for sparsely sampled functions. A generalization to functions whose domain is a manifold is proposed in Lila et al (2016), where appropriate spatial coherence is introduced by penalizing directly the eigenfunctions of the covariance operator to be estimated, i.e. the PC functions. In the indirect observations setting, Tian et al (2012) propose a separable model in time and space for source localization. The estimation of PC functions of functional data in a linear space and on linear domains, from indirect and noisy samples, has been previously covered in Amini and Wainwright (2012). They propose a regularized M-estimator in a reproducing kernel Hilbert space (RKHS) framework. Due to the fact that in practice the introduction of an RKHS relies on the definition of a kernel, i.e. a covariance function on the domain, this approach cannot be easily extended to non-linear domains. In Katsevich et al (2015), driven by an application to cryo-electron microscopy, the authors propose an unregularized estimator for the covariance matrix of indirectly observed functions. However, a regularized approach is crucial in our setting, due to the strong ill-posedness of the inverse problem considered. In the discrete setting, other forms of regularization have also been adopted, e.g. sparsity on the inverse covariance matrix (Friedman et al 2008, Liu and Zhang 2019).

The case of indirectly observed covariance operators
A natural generalization of the setting introduced in the previous section is to consider observations that have group-specific covariance operators. In detail, suppose now we are given a set of n covariance functions {C_i : i = 1, . . . , n}, representing the underlying covariance operators {C_i : i = 1, . . . , n} on the brain space. In our driving application, each covariance function C_i : M × M → R describes the functional connectivity of the ith individual or the functional connectivity of the same individual at the ith time-point.
We consider the problem of defining and estimating a set of covariance functions, which we call PC covariance functions, that enable the description of {C_i} through 'linear combinations' of a few components. Such a reduced order description is of interest, for example, in understanding how functional connectivity varies among individuals or over time.
We define a model for the PC covariance functions of {C_i} from the set of indirectly observed covariance matrices, computed from the signals on the sensors space, and thus given by

S_i = K_i (C_i(v_j, v_{j'}))_{j,j'=1}^p K_i^T + E_i^T E_i,  (6)

where {v_j : j = 1, . . . , p} are the sampling points associated with the operator Ψ. The forward operators {K_i} act on both sides of the covariance functions {C_i}, due to the linear transformation K_i Ψ applied to the signals on the brain space before being detected on the sensors space. The term E_i^T E_i is an error term, where E_i is an s × s matrix such that each entry is an independent sample of a Gaussian distribution with mean zero and standard deviation σ. Model (6) could be regarded as an implementation of the idealized problem (2), where the covariance operators are represented by the associated covariance functions. An illustration of the introduced setting can be found in figure 4.
The problem introduced in this section has not been extensively covered in the literature. In the discrete case, Dryden et al (2009) introduce a tangent PCA model for directly observed covariance matrices. An extension to directly observed covariance operators has been proposed in Pigoli et al (2014). Also related to our work is the setting considered in Petersen and Müller (2019), where the authors propose a regression framework for responses that are random objects (e.g. covariance matrices) with Euclidean predictors. The proposed regression model is applied to study associations between age and low-dimensional correlation matrices, representing functional connectivity, which have been computed from a parcellation of the brain. In section 4, we propose a novel PCA approach for indirectly observed high-dimensional covariance matrices.

Principal components of indirectly observed functions
The aim of this section is to define a model for the estimation of the PC functions {ψ_r} from the observations {y_l}, defined in (5). Although the model proposed in this section is not the main contribution of this work, it allows us to introduce some of the concepts necessary to the definition of the more general model for indirectly observed covariance functions in section 4.

Model
Now let z = (z_1, . . . , z_m)^T be an m-dimensional real column vector and H^2(M) be the Sobolev space of functions in L^2(M) with first and second distributional derivatives in L^2(M). From now on F is instantiated with H^2(M). We propose to estimate f̂ ∈ H^2(M), the first PC function of X, and the associated PC scores vector ẑ, by solving

(ẑ, f̂) = argmin_{z, f} Σ_{l=1}^m ‖y_l − z_l K_l Ψ f‖^2 + λ z^T z ∫_M (Δ_M f)^2,  (7)

where ‖·‖ is the Euclidean norm and Δ_M is the Laplace-Beltrami operator, which enables a smoothing regularizing effect on the PC function f̂. The data fit term encourages K_l Ψ f̂ to capture the strongest mode of variation of {y_l}. The parameter λ controls the trade-off between the data fit term of the objective function and the regularizing term. The second PC function can be estimated by classical deflation methods, i.e. by applying model (7) to the residuals {y_l − ẑ_l K_l Ψ f̂}, and so on for the subsequent PCs. The proposed model can be interpreted as a regularized least squares estimation of the first PC function ψ_1 in (5), with the terms {ẑ_l} playing the role of estimates of the variables {ζ_{l,1}}.
In the simplified case of a single forward operator K := K_1 = · · · = K_m, the minimization problem (7) can be reformulated in a more classical form. In fact, fixing f in (7) and minimizing over z gives

ẑ = Y KΨf / (‖KΨf‖^2 + λ ∫_M (Δ_M f)^2),  (8)

which can then be used to show that the minimization problem (7) is equivalent to maximizing

(KΨf)^T (1/m) Y^T Y (KΨf) / (‖KΨf‖^2 + λ ∫_M (Δ_M f)^2),  (9)

with Y an m × s real matrix, where the lth row of Y is the observation y_l^T. This reformulation gives further insight into the interpretation of f̂ in (7). In fact, f̂ is such that KΨf̂ maximizes (KΨf)^T (1/m) Y^T Y (KΨf) subject to a norm constraint. The term (1/m) Y^T Y is the empirical covariance matrix in the sensors space. The term z^T z in (7) places the regularization term λ ∫_M (Δ_M f)^2 in the denominator of the equivalent formulation (9). Thus, f̂ is regularized by the choice of norm in the denominator of (9), in a similar fashion to the classic functional principal component formulation of Silverman (1996). Ignoring the spatial regularization, the point-wise evaluation of the PC function Ψf̂ in (9) can be interpreted as the first PC vector computed from the dataset of backprojected data [K^T y_1, . . . , K^T y_m]^T, similarly to what is proposed in Dobriban et al (2017) in the context of optimal prediction.
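A quotient of this Rayleigh type can be maximized numerically as a generalized symmetric eigenproblem. The sketch below is our own illustration, not the paper's implementation: the random forward operator and the squared second-difference matrix standing in for the Laplacian-based penalty are assumptions of the example.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
s, p, m, lam = 20, 40, 200, 1e-2

K = rng.standard_normal((s, p)) / np.sqrt(p)   # stand-in forward operator
Y = rng.standard_normal((m, s))                # one observation per row

# Squared second-difference matrix: a simple stand-in for the
# discretized roughness penalty.
Dmat = np.diff(np.eye(p), n=2, axis=0)
P = Dmat.T @ Dmat

# Maximize  c^T K^T (Y^T Y / m) K c  /  c^T (K^T K + lam P) c
# as the generalized symmetric eigenproblem  A c = mu B c.
A = K.T @ (Y.T @ Y / m) @ K
B = K.T @ K + lam * P
mu, V = eigh(A, B)                 # eigenvalues in ascending order
c_hat = V[:, -1]                   # maximizer of the quotient

quot = (c_hat @ A @ c_hat) / (c_hat @ B @ c_hat)
print(np.isclose(quot, mu[-1]))
```

The numerator matrix involves the empirical sensors-space covariance, while the denominator matrix carries the regularization, exactly as in the quotient discussed above.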

Algorithm
Here we propose a minimization approach for the objective function in (7), which we tackle by alternating the minimization over z and f in an iterative algorithm. In (7), a normalization constraint must be considered to make the representation unique, as multiplying z by a constant and dividing f by the same constant does not change the objective function. We optimize over z under the constraint ‖z‖ = 1, which leads to a normalized version of the estimator (8):

ẑ = Y KΨf / ‖Y KΨf‖.  (10)

For a given z, solving (7) with respect to f turns out to be equivalent to solving an inverse problem, which we discretize adopting a mixed finite elements approach (Azzimonti et al 2014). Specifically, consider now a triangulated surface M_T, union of the finite set of triangles T, giving an approximated representation of the manifold M. We then consider the linear finite element space V consisting of the set of globally continuous functions over M_T that are affine when restricted to any triangle τ in T, i.e.

V = {v ∈ C^0(M_T) : v|_τ is affine for each τ ∈ T}.
This space is spanned by the nodal basis φ_1, . . . , φ_κ associated with the nodes ξ_1, . . . , ξ_κ, corresponding to the vertices of the triangulation M_T. Such basis functions are Lagrangian, i.e. φ_j(ξ_{j'}) = δ_{jj'}, so that any f ∈ V can be written as f(v) = Σ_{j=1}^κ c_j φ_j(v), for all v ∈ M_T, with c = (c_1, . . . , c_κ)^T the vector of coefficients. To ease the notation, we assume that the p points {v_j} associated with the evaluation operator Ψ coincide with the nodes of the triangular mesh ξ_1, . . . , ξ_κ, and thus the coefficients c are such that c = Ψf for any f ∈ V. Consequently, we are assuming the forward operators {K_l} to be s × κ matrices, relating the κ points on the cortical surface, in one-to-one correspondence with ξ_1, . . . , ξ_κ, to the s-dimensional signal detected on the sensors for the lth sample.
Let now M and A be the mass and stiffness κ × κ matrices defined as

M_{jj'} = ∫_{M_T} φ_j φ_{j'},   A_{jj'} = ∫_{M_T} ∇_{M_T} φ_j · ∇_{M_T} φ_{j'}.

Practically, ∇_{M_T} φ_j is a constant function on each triangle of M_T, and can take an arbitrary value on the edges.
Let h = max_{τ∈T} diam(τ) denote the maximum diameter of the triangles forming M_T; then the solution f̂_h of (7), in the discrete space V, is given by the following proposition.
Proposition 1. The surface finite element solution f̂_h = Σ_{j=1}^κ ĉ_j φ_j ∈ V of model (7), for a given unitary z, has coefficients ĉ solving the linear system

(Σ_{l=1}^m z_l^2 K_l^T K_l + λ A M^{−1} A) ĉ = Σ_{l=1}^m z_l K_l^T y_l.  (12)

Equation (12) has the form of a penalized regression, where the discretized version of the penalty term is A M^{−1} A.
The sparsity of the linear system (12), namely the number of zeros, depends on the sparsity of its components. The matrices M and A are very sparse; however, M^{−1} is not, in general. To overcome this problem, in the numerical analysis of partial differential equations literature, the matrix M^{−1} is generally replaced with the sparse matrix M̃^{−1}, where M̃ is the diagonal matrix such that M̃_{jj} = Σ_{j'} M_{jj'} (Fried and Malkus 1975, Zienkiewicz et al 2013). The penalty operator A M̃^{−1} A approximates very well the behavior of A M^{−1} A.
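The effect of this mass-lumping trick can be sketched on a toy matrix (our own illustration; the tridiagonal "mass" matrix below merely stands in for a FEM mass matrix):

```python
import numpy as np
import scipy.sparse as sp

# Toy sparse symmetric "mass" matrix, standing in for the FEM mass matrix M.
n = 8
M = sp.diags([np.full(n - 1, 1.0), np.full(n, 4.0), np.full(n - 1, 1.0)],
             offsets=[-1, 0, 1], format="csr") / 6.0

# Lumped mass matrix: diagonal, with entries the row sums of M.
row_sums = np.asarray(M.sum(axis=1)).ravel()
M_lumped_inv = sp.diags(1.0 / row_sums)   # its inverse stays diagonal/sparse

# The exact inverse of M, in contrast, is dense.
dense_inv = np.linalg.inv(M.toarray())
print(M_lumped_inv.nnz, np.count_nonzero(dense_inv))
```

The lumped inverse has exactly n non-zeros, so products such as A M̃^{−1} A remain sparse, which is what makes the linear system (12) cheap to assemble and solve.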
Moreover, in the case of longitudinal studies that involve only one subject, we have a single forward operator K := K_1 = · · · = K_m common to all the observed signals, and consequently equation (12) can be rewritten as the sparse overdetermined system

[ K ; √λ M̃^{−1/2} A ] ĉ = [ Y^T z ; 0 ],  (13)

with the two blocks stacked vertically, to be interpreted in a least-squares sense. A sparse QR solver can finally be applied to efficiently solve the linear system (13).

Algorithm 1. Inverse problems-PCA algorithm
1: Initialization:
   (a) Computation of M and A
   (b) Initialize z, the scores vector associated with the first PC function
2: PC function's estimation: compute c by solving (12)
3: PC scores' estimation: update z given the current PC function estimate, normalized so that ‖z‖ = 1
4: Repeat steps 2 and 3 until convergence
In algorithm 1 we summarize the main algorithmic steps to compute the PC functions and associated PC scores for indirectly observed functions. The initializing scores z can be chosen either at random or, when there is a correspondence between the detectors of different samples (e.g. K 1 = · · · = K m ), with the scores obtained by performing PCA on the observations in the sensors space.
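The main loop of algorithm 1 can be sketched as follows (a dense toy implementation of our own: a single forward operator, a generic roughness penalty P standing in for A M̃^{−1} A, and all problem sizes are assumptions of the illustration):

```python
import numpy as np

def ip_pca_first_component(Y, K, P, lam, n_iter=100):
    """Alternate between the PC-function update (penalized regression,
    step 2) and the unit-norm scores update (step 3), for a single
    forward operator K."""
    rng = np.random.default_rng(0)
    z = rng.standard_normal(Y.shape[0])
    z /= np.linalg.norm(z)          # random initialization of the scores
    for _ in range(n_iter):
        # Step 2: solve (K^T K + lam P) c = K^T Y^T z for the coefficients.
        c = np.linalg.solve(K.T @ K + lam * P, K.T @ (Y.T @ z))
        # Step 3: update the scores and normalize to unit length.
        z = Y @ (K @ c)
        z /= np.linalg.norm(z)
    return c, z

# Tiny synthetic check: data dominated by a single smooth mode f_true.
rng = np.random.default_rng(4)
s, p, m, lam = 15, 25, 80, 1e-2
K = rng.standard_normal((s, p)) / np.sqrt(p)
f_true = np.sin(np.linspace(0.0, np.pi, p))
Y = np.outer(rng.standard_normal(m), K @ f_true) \
    + 0.01 * rng.standard_normal((m, s))
D = np.diff(np.eye(p), n=2, axis=0)          # second differences
c, z = ip_pca_first_component(Y, K, D.T @ D, lam)
cos = abs(c @ f_true) / (np.linalg.norm(c) * np.linalg.norm(f_true))
print(cos)  # cosine similarity between the estimate and the true mode
```

Random initialization of z suffices here; as noted above, initializing with sensors-space PCA scores is an alternative when the detectors are in correspondence across samples.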

Eigenfunctions of indirectly observed covariance operators
Suppose now we are in the case of a single forward operator K. Combining steps 2 and 3 of algorithm 1, and moving the normalization step from z to f_h, we obtain the iterations

c ← (K^T K + λ A M̃^{−1} A)^{−1} K^T (Σ_{l=1}^m y_l y_l^T) K c,   c ← c / ‖c‖.

The obtained algorithm depends on the data only through Σ_{l=1}^m y_l y_l^T, which, up to a constant, is the covariance matrix computed on the sensors space. The proposed algorithm can thus be applied in situations where the observations {y_l} are not available, but we are given only the associated s × s covariance matrix on the sensors space, computed from {y_l}. This can be of interest in situations where the temporal resolution is very high and the spatial resolution is low, so that it is convenient to store the covariance matrix rather than the entire set of observations.

Reconstruction and representation of indirectly observed covariance operators
Consider now n sample covariance matrices S_1, . . . , S_n, each of size s × s, representing n different connectivity maps on the sensors space. Three such covariance matrices, associated with three different individuals, are shown in figure 2. Recall moreover that we denote with M the brain surface template and with {K_i ∈ R^{s×p}} the set of subject-specific forward operators, relating the signal at the p pre-specified points {v_j} on the cortical surface M with the signal detected on the s sensors.
The aim of this section is to introduce a model for the reconstruction and representation of the covariance functions {C_i}, on the brain space, associated with the actually observed covariance matrices {S_i}, on the sensors space. The matrices {S_i} are related to the covariance functions {C_i} through formula (6), which we recall here:

S_i = K_i (C_i(v_j, v_{j'}))_{j,j'=1}^p K_i^T + E_i^T E_i.

Define the square-root decomposition S_i^{1/2} = D_i^{1/2} V_i^T, where S_i = V_i D_i V_i^T is the spectral decomposition of S_i and D_i^{1/2} denotes the diagonal matrix whose entries are the square-roots of the (non-negative) entries of D_i. Each square-root decomposition S_i^{1/2} can be interpreted as a data-matrix whose empirical covariance is S_i. Another possible choice for the square-root decomposition is the symmetric square-root S_i^{1/2} = V_i D_i^{1/2} V_i^T. The output of the proposed algorithms will not depend on the specific choice of the square-root decompositions.
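The square-root decompositions just defined can be computed in a few lines (a minimal sketch of our own, on a randomly generated sample covariance):

```python
import numpy as np

rng = np.random.default_rng(5)

# A sample covariance matrix S (symmetric positive semidefinite).
X = rng.standard_normal((40, 10))
S = X.T @ X / 40

# Square-root decomposition from the spectral decomposition S = V D V^T.
D, V = np.linalg.eigh(S)
S_half = np.sqrt(np.clip(D, 0.0, None))[:, None] * V.T  # S^{1/2} = D^{1/2} V^T

# S^{1/2} behaves like a data matrix whose empirical covariance is S.
print(np.allclose(S_half.T @ S_half, S))

# Symmetric alternative: V D^{1/2} V^T yields the same product.
S_sym = V @ np.diag(np.sqrt(np.clip(D, 0.0, None))) @ V.T
print(np.allclose(S_sym.T @ S_sym, S))
```

Both choices reproduce S exactly, illustrating why the algorithms' output does not depend on which square root is used.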
In the most general setting, each covariance matrix S_i is an indirect observation of an underlying covariance function C_i, which can be expressed in terms of its spectral decomposition as

C_i = Σ_{r=1}^∞ γ_{ir} ψ_{ir} ⊗ ψ_{ir},

where, for each i, γ_{i1} ≥ γ_{i2} ≥ · · · ≥ 0 is a sequence of non-increasing variances and {ψ_{ir}}_r a set of orthonormal eigenfunctions. Introduce now {f̂_i ∈ H^2(M)} and {ẑ_i ∈ R^s}, obtained by applying model (7) to each sample independently, i.e.

(ẑ_i, f̂_i) = argmin_{z_i, f_i} ‖S_i^{1/2} − z_i (K_i Ψ f_i)^T‖^2_F + λ z_i^T z_i ∫_M (Δ_M f_i)^2,  (14)
with ‖·‖_F denoting the Frobenius matrix norm. Each estimate f̂_i, from model (14), can be interpreted as a regularized estimate of the leading PC function of S_i^{1/2}, and thus of the eigenfunction ψ_{i1}. The subsequent eigenfunctions can be estimated by deflation methods, i.e. by removing the estimated components ẑ_i (K_i Ψ f̂_i)^T from S_i^{1/2} and reapplying model (14). This leads to a set of estimates {f̂_{ir}} and {ẑ_{ir}}.
The unregularized version of model (14) is equivalent to a singular value decomposition applied to each matrix S_i^{1/2} independently, which would lead to a set of orthogonal estimates {ẑ_{ir}}_r ⊂ R^s, for each i = 1, . . . , n. In the regularized model orthogonality is not enforced; however, the estimated PC components can be orthogonalized post-estimation by means of a QR decomposition.
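This post-estimation orthogonalization step can be sketched as follows (our own minimal example; the perturbed score matrix simply mimics the output of a regularized fit):

```python
import numpy as np

rng = np.random.default_rng(6)

# Nearly-orthogonal estimated score vectors (s x R), as produced by a
# regularized fit in which orthogonality is not enforced.
s, R = 30, 3
Z_hat = np.linalg.qr(rng.standard_normal((s, R)))[0] \
        + 0.05 * rng.standard_normal((s, R))

# Post-estimation orthogonalization via a thin QR decomposition.
Q, Rmat = np.linalg.qr(Z_hat)
print(np.allclose(Q.T @ Q, np.eye(R)))   # orthonormalized score vectors
```

The factor Q spans the same column space as the estimated components, so the representation is unchanged up to the invertible triangular factor.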
Define now the empirical variances γ̂_{ir} = ‖f̂_{ir}‖^2_{L^2(M)} and consider the L^2(M)-normalized versions of {f̂_{ir}}. An approximate representation of S_i = (S_i^{1/2})^T S_i^{1/2} is thus given by

S_i ≈ Σ_{r=1}^R γ̂_{ir} (K_i Ψ f̂_{ir})(K_i Ψ f̂_{ir})^T,

and the associated approximate representation of C_i, in terms of {γ̂_{ir}} and {f̂_{ir}}, is

Ĉ_i = Σ_{r=1}^R γ̂_{ir} f̂_{ir} ⊗ f̂_{ir},

where γ̂_{ir} is an estimate of the variance γ_{ir} and f̂_{ir} is an estimate of ψ_{ir}. The tensor product f ⊗ f denotes the covariance function (f ⊗ f)(v, v') = f(v)f(v'). The regularizing terms in (14) introduce spatial coherence on the estimated {f̂_{ir}}, and thus on the estimated eigenfunctions of {C_i}, which is fundamental in an inverse problem setting. The reconstructed covariance functions {Ĉ_i} could be discretized on a dense grid, leading to a collection of covariance matrices (Ĉ_i(v_j, v_l))_{jl}. Following the approach in Dryden et al (2009), a Riemannian metric could be defined on the space of covariance matrices, followed by projection of (Ĉ_i(v_j, v_l))_{jl} on the tangent space centered at the sample Fréchet mean. PCA could then be carried out on vectorizations of the tangent space representations. A related approach, for covariance functions, has been adopted in Pigoli et al (2014).
However, the aforementioned approaches could be prohibitive in our setting. In fact, performing PCA on tangent space projections produces modes of variation that are geodesics passing through the mean, whose interpretation in a high-dimensional setting is often challenging. Therefore, in the next section, we propose an alternative model that enables joint reconstruction, and representation on a 'common basis', of indirectly observed covariance functions.

A population model
Let {ẑ_i}_{i=1}^n ⊂ R^s and f̂ ∈ H²(M) be given by the following model:

(ẑ_1, …, ẑ_n, f̂) = argmin_{z_1,…,z_n, f} Σ_{i=1}^n ‖S_i^{1/2} − z_i (K_i Ψ f)^T‖²_F + λ ‖Δ_M f‖².   (16)

The newly defined model, as opposed to model (14), has a subject-specific s-dimensional vector z_i and a term f that is common to all samples. As in the previous model, the subsequent components can be estimated by deflation methods, leading to a set of estimates {f̂_r} and {ẑ_ir}.
Define now the empirical variances to be γ̂_ir = ‖ẑ_ir‖² ‖f̂_r‖²_{L²(M)} and consider the L²(M)-normalized version of {f̂_r}. The empirical term in model (16) suggests an approximate representation of S_i; that is, each underlying covariance function C_i is approximated as

Ĉ_i = Σ_r γ̂_ir f̂_r ⊗ f̂_r,   (17)

i.e. by the sum of the products between a subject-specific constant γ̂_ir and a component f̂_r ⊗ f̂_r common to all the observations. The regularizing term in (16) introduces spatial coherence on the estimated functions {f̂_r}. The covariance operators {C_i} are said to be commuting if C_i C_i′ = C_i′ C_i for all i, i′ = 1, …, n. This property can be equivalently characterized as

C_i = Σ_r γ_ir ψ_r ⊗ ψ_r,

with {γ_ir}_r subject-specific variances and {ψ_r} a set of common orthonormal functions. Thus, a collection of commuting covariance operators can be simultaneously diagonalized by a basis {ψ_r}. In this case, the functions {f̂_r} can be regarded as estimates of {ψ_r} and {γ̂_ir} as estimates of {γ_ir}.
On the one hand, model (16) constrains the estimated covariances to be of the form Ĉ_i = Σ_r γ̂_ir f̂_r ⊗ f̂_r and not of the more general form Ĉ_i = Σ_r γ̂_ir f̂_ir ⊗ f̂_ir. On the other hand, such a model takes advantage of all the n samples to estimate the components {f̂_r ⊗ f̂_r}. Moreover, the associated variables {γ̂_ir} give a convenient approximate description of the ith covariance, as they are comparable across samples, as opposed to those computed from model (14). In fact, the ith covariance function can be represented by the variance vector (γ̂_i1, …, γ̂_iR)^T, for a suitable truncation level R, where each entry is associated with the rank-one component f̂_r ⊗ f̂_r. For each r, a scatter plot of the variances {γ̂_ir}_i, such as the one in figure 14, helps understand what the average contribution of the rth component is and what its variability across samples is. Model (17) can also be interpreted as a common PCA model (Benko et al 2009, Flury 1984), as {f̂_r} are the estimated regularized eigenfunctions of the pooled covariance C̄ = (1/n) Σ_{i=1}^n Ĉ_i. Potentially, PCA could be performed on the descriptors (γ̂_i1, …, γ̂_iR)^T to find rank-R components that maximize the variance of linear combinations of {γ̂_ir} (i.e. the variance of the variances). However, the results would be more difficult to interpret, as they would involve variations that are rank-R covariance functions around the rank-R mean covariance function.
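The common-PCA reading of model (17) can be illustrated in the commuting, fully observed, unregularized case, where the common components are exactly the eigenvectors of the pooled covariance and the descriptors γ_ir are recovered as quadratic forms. The following numpy sketch uses synthetic commuting covariances; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 40, 8

# Commuting covariances: common eigenvectors, subject-specific variances.
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
psi = Q[:, :3]                                    # common orthonormal basis
base = np.array([4.0, 2.0, 1.0])                  # well-separated mean levels
gammas = base * rng.uniform(0.8, 1.2, size=(n, 3))
covs = [(psi * g) @ psi.T for g in gammas]        # C_i = sum_r gamma_ir psi_r psi_r'

# Common components from the pooled covariance C_bar = (1/n) sum_i C_i.
C_bar = sum(covs) / n
w, V = np.linalg.eigh(C_bar)
f = V[:, ::-1][:, :3]                             # leading common eigenvectors

# Subject-specific descriptors gamma_ir = f_r' C_i f_r.
desc = np.array([[fr @ Ci @ fr for fr in f.T] for Ci in covs])
```

Each row of `desc` is the variance vector (γ_i1, …, γ_iR)^T describing the ith covariance on the common basis; in the indirect-observation setting the eigendecomposition is replaced by the regularized fit of model (16).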

Algorithm
The minimization in (14), for each fixed i, is a particular case of the one in (7) (see section 3.2), so we focus on the minimization problem in (16), which is also approached in an iterative fashion. We set Σ_{i=1}^n ‖z_i‖² = 1 in the estimation procedure. This leads to closed-form estimates of {z_i} given f. The estimate of f given {z_i}, in the discrete space V introduced in section 3.2, is given by the following proposition.
Algorithm 2 contains a summary of the estimation procedure: initialization of the scores of the first PC, estimation of the PC function from model (14), and estimation of the scores from model (14). From a practical point of view, the choice to define the representation basis as a collection of rank-one (i.e. separable) covariance functions, of the type F_r = f̂_r ⊗ f̂_r, is mainly driven by the following reasons. Firstly, rank-one covariance functions are easier to interpret due to their limited degrees of freedom. Secondly, on a rank-one covariance function F_r = f̂_r ⊗ f̂_r spatial coherence can be imposed by regularizing f̂_r, as in fact done for model (14), and this is fundamental in the setting of indirectly observed covariance functions. Finally, due to their size, it might not be possible to store the full reconstructions of the covariance functions {Ĉ_i} on the brain space; instead, the representation model in (17) allows for an efficient joint representation of such covariance functions in terms of rank-one components.
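The alternating structure of the estimation, scores given the common function and vice versa, can be sketched as follows. This is not the authors' implementation: the cortical mesh, the sampling operator Ψ, and the finite-element Laplacian penalty are replaced by a one-dimensional second-difference penalty, and λ is fixed rather than cross-validated.

```python
import numpy as np

def population_pc(S_halves, K, lam, n_iter=20):
    """Alternating sketch of model (16): a common component f and
    subject-specific scores z_i, with sum_i ||z_i||^2 = 1 enforced.
    P is a 1-D second-difference penalty standing in for ||Delta_M f||^2."""
    s, p = K.shape
    D = np.diff(np.eye(p), n=2, axis=0)
    P = D.T @ D
    f = np.ones(p) / np.sqrt(p)
    for _ in range(n_iter):
        Kf = K @ f
        zs = [Sh @ Kf / (Kf @ Kf) for Sh in S_halves]      # z_i given f
        norm = np.sqrt(sum(z @ z for z in zs))
        zs = [z / norm for z in zs]                        # sum ||z_i||^2 = 1
        rhs = K.T @ sum(Sh.T @ z for Sh, z in zip(S_halves, zs))
        f = np.linalg.solve(K.T @ K + lam * P, rhs)        # f given {z_i}
    return f, zs

# Synthetic check: n = 3 rank-one "square roots" sharing one component.
rng = np.random.default_rng(0)
s, p = 10, 30
K = rng.standard_normal((s, p))
f_true = np.sin(np.linspace(0, np.pi, p))
g = K @ f_true
S_halves = [np.outer(rng.standard_normal(s), g) for _ in range(3)]
f_hat, zs = population_pc(S_halves, K, lam=1e-2)
```

The z-step is the closed-form least-squares score given f, followed by the joint normalization; the f-step solves the penalized normal equations, mirroring the proposition referenced above.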

Simulations
In this section, we perform simulations to assess the performance of the proposed algorithms. To reproduce the application setting as closely as possible, the cortical surfaces and the forward operators are taken from the MEG application described in section 6. The details on the extraction and computation of these objects are left to the same section. For the same reason, the signals on the brain space considered here are vector-valued functions, specifically functions from the brain space M to R³, as is the case in the MEG application. The proposed methodology extends naturally to this case, as shown in the following simulations.

Indirectly observed functions
We consider M_T to be a triangular mesh, with 8k nodes, representing the cortical surface geometry of a subject, as shown in the left panel of figure 1. Each of the 8k nodes represents a location v_j associated with the sampling operator Ψ. The locations of the nodes {v_j} on the brain space, the locations of the 241 detectors on the sensors space and a model of the subject's head enable the computation of a forward operator K describing the relation between the signal generated at the locations {v_j}, on the brain space, and the signal detected by the 241 sensors in the sensors space. In practice, the signal at each node v_j is described by a three-dimensional vector, characterized by an intensity and a direction, while the signal detected on the sensors space is a scalar signal. Thus, the forward operator is a 241 × 24k matrix.
We first want to assess the performance of the proposed model in the case of indirect functional observations belonging to a linear space. To this purpose, we produce synthetic data following the generative model (5). Specifically, on M_T, we construct the four L²(M_T)-orthonormal vector-valued functions {ψ_r = (ψ_r,1, ψ_r,2, ψ_r,3): r = 1, …, 4}, with ψ_r: M_T → R³. These represent the PC functions to be estimated. In figure 5 we show the four components of {ψ_r} and the associated energy maps {‖ψ_r(v)‖²: v ∈ M_T}, with ‖·‖ denoting the Euclidean norm in R³. We then generate m = 50 smooth vector-valued functions {x_l} on M_T by

x_l = Σ_{r=1}^4 z_lr ψ_r,

where {z_lr} are i.i.d. realizations of the four independent random variables {z_r ∼ N(0, γ_r): r = 1, …, 4}, with γ_1 = 3², γ_2 = 2.5², γ_3 = 2² and γ_4 = 1.
The functions {x_l} are sampled at the 8k nodes, and the forward operator is applied to the sampled values, producing a collection of vectors {y_l}, each of dimension 241, the number of active sensors. Moreover, to each entry of the vectors {y_l}, we add Gaussian noise with mean zero and standard deviation σ, for different choices of σ, to reproduce different signal-to-noise ratio regimes.
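The generative scheme of this simulation can be sketched as follows. For brevity we use a one-dimensional stand-in for the cortical mesh, scalar-valued (rather than vector-valued) functions, and a random matrix in place of the MEG forward operator; only the dimensions 241, m = 50 and the variances γ_r are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s, m = 200, 241, 50            # brain nodes (1-D stand-in), sensors, samples

# Four smooth orthonormal "PC functions" on a 1-D stand-in for the cortex.
t = np.linspace(0, 1, p)
psi = np.array([np.sin((r + 1) * np.pi * t) for r in range(4)])
psi /= np.linalg.norm(psi, axis=1, keepdims=True)

gammas = np.array([3.0, 2.5, 2.0, 1.0]) ** 2
Z = rng.normal(scale=np.sqrt(gammas), size=(m, 4))   # scores z_lr ~ N(0, gamma_r)
X = Z @ psi                                          # x_l = sum_r z_lr psi_r

K = rng.standard_normal((s, p)) / np.sqrt(p)         # stand-in forward operator
sigma = 5.0                                          # noise level, as in the text
Y = X @ K.T + rng.normal(scale=sigma, size=(m, s))   # y_l = K x_l + noise
```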
In the following, we compare the PC model (7) to an alternative approach that we call the naive approach. In fact, the individual functions {x_l} could be estimated from {y_l} by use of classical inverse problem estimators. Here, we adopt the estimates {x̂_l} defined as

x̂_l = argmin_x ‖y_l − KΨx‖² + λ ‖Δ_M x‖²,   (20)

where each x̂_l is defined in such a way that it balances the fitting term and the regularization term in (20). Due to the fact that f is vector-valued, ‖Δ_M f‖² is defined as Σ_{q=1}^3 ‖Δ_M f_q‖², with f_1, f_2, f_3 denoting the components of f. The same penalty operator is also adopted to generalize to vector-valued functions the PC models introduced in sections 3 and 4. In this approach, the constant λ is chosen independently for each of the m functions by partitioning the 241 detectors into roughly equally sized K = 2 groups and applying K-fold cross-validation. The criterion for the optimal λ is the average reconstruction error, on the sensors space, computed on the validation groups. Once we obtain the estimates {x̂_l}, we can compute the estimated PC functions {ψ̂_r} by applying classical multivariate PC analysis to the reconstructed objects {x̂_l}.
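A minimal sketch of this naive estimator and its sensor-space cross-validation follows: a Tikhonov-type solve with a second-difference roughness penalty standing in for ‖Δ_M x‖², and a fold split over the detectors. The function names and dimensions are ours.

```python
import numpy as np

def naive_estimate(y, K, lam, P):
    """Tikhonov-type estimate: argmin_x ||y - K x||^2 + lam * x' P x."""
    return np.linalg.solve(K.T @ K + lam * P, K.T @ y)

def cv_lambda(y, K, P, grid, n_folds=2, seed=0):
    """Choose lam by K-fold CV over the detectors, scoring the
    sensor-space reconstruction error on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    errs = []
    for lam in grid:
        err = 0.0
        for k in range(n_folds):
            tr, va = folds != k, folds == k
            x_hat = naive_estimate(y[tr], K[tr], lam, P)
            err += np.sum((y[va] - K[va] @ x_hat) ** 2)
        errs.append(err)
    return grid[int(np.argmin(errs))]

rng = np.random.default_rng(1)
s, p = 60, 120
K = rng.standard_normal((s, p)) / np.sqrt(p)
D = np.diff(np.eye(p), n=2, axis=0)
P = D.T @ D                                   # roughness penalty matrix
x = np.sin(np.linspace(0, 2 * np.pi, p))
y = K @ x + rng.normal(scale=0.1, size=s)
lam_star = cv_lambda(y, K, P, np.logspace(-4, 1, 6))
x_hat = naive_estimate(y, K, lam_star, P)
```

In the comparison above this solve is repeated independently for each of the m observations, and the PC functions are then extracted from the reconstructions by ordinary multivariate PCA.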
The estimates are compared to those of the proposed PC function model, as described in algorithm 1, with 15 iterations. Note that a tolerance could instead be fixed to test whether the algorithm has converged. However, 15 iterations give satisfactory convergence levels in our simulations and application studies. We partition the m observations into equally sized K = 2 groups and perform K-fold cross-validation for the choice of the penalty. Specifically, we choose the coefficient λ that minimizes the sensors space reconstruction error on the validation groups.
To evaluate the performance of the two approaches, we generate 100 datasets as previously detailed. The quality of the estimated rth PC function is then measured with E(ψ_r, ψ̂_r) = Σ_{q=1}^3 ‖∇_M(ψ_r,q − ψ̂_r,q)‖². The results are summarized in the boxplots in figure 6, for two different signal-to-noise ratios, where the Gaussian noise has standard deviation σ = 5 and σ = 10. In figure 7 we show an example of a signal on the brain space corrupted with the specified noise levels.
The boxplots highlight the fact that the proposed approach provides better estimates of the PC functions (i.e. lower estimation errors E(ψ_r, ψ̂_r)) when compared to the naive approach. Differences in the estimation error are larger in a low signal-to-noise regime, as for the estimation of the fourth PC function, where, intuitively, the low variance associated with the PC function makes it more difficult to distinguish this structured signal from the noise component. Also noteworthy is the stability of the estimates of the proposed algorithm across the generated datasets, as opposed to the naive approach of reconstructing the functional observations independently, which instead returns multiple particularly unsatisfactory reconstructions. An example of such reconstructions is shown in figure 8.

Indirectly observed covariance functions
In this section, we consider M_T to be an 8k-node triangular mesh, this time representing a template geometry of the cortical surface, which is shown in figure 10. This contains only the geometric features common to all subjects. Moreover, each subject's cortical surface is also represented by an 8k-node triangular surface, which is used, together with the locations of the 241 detectors on the sensors space and the head model, to compute a forward operator K_i for the ith subject. The 8k nodes of each subject's triangular mesh are in correspondence with the 8k nodes of the template mesh M_T. This allows the model to be defined on the template M_T.

Figure 6. On the left, a summary of the results in a medium signal-to-noise ratio regime. On the right, a summary of the results in a low signal-to-noise ratio regime. Each boxplot displays the paired differences of the estimation errors E(ψ_r, ψ̂_r) between the estimates of the two-step naive method and those obtained by applying algorithm 1. A paired difference greater than 0 indicates that, for the dataset in question, algorithm 1 has performed better than the two-step naive approach.
As in the previous section, we construct four L²(M_T)-orthonormal functions {ψ_r = (ψ_r,1, ψ_r,2, ψ_r,3): r = 1, …, 4}. The energy maps of {ψ_r} are shown in figure 9. We generate synthetic data from model (6), where z_i1, …, z_i4 are i.i.d. realizations of the four independent random variables {z_r ∼ N(0, γ_r): r = 1, …, 4}, with γ_1 = 3², γ_2 = 2.5², γ_3 = 2² and γ_4 = 1. The matrix-valued form of the covariance functions arises from the fact that the observed functions on the brain space are vector-valued. Subsequently, we construct the point-wise evaluation matrices C_i ∈ R^{24k×24k}, from which the corresponding covariance matrices on the sensors space are defined as

S_i = K_i C_i K_i^T + E_i^T E_i.

The term E_i^T E_i is an error term, where E_i is an s × s matrix whose entries are independent samples from a Gaussian distribution with mean zero and standard deviation 5. We then apply algorithm 2 with 15 iterations, feeding {S_i} in input. The results are shown in figure 9, in terms of energy maps of the reconstructed functions ψ̂_r. These are a close approximation of the underlying functions {ψ_r}. The fidelity measure Σ_{q=1}^3 ‖∇_M(ψ_r,q − ψ̂_r,q)‖² of such estimates is 6.8 × 10⁻², 6.1 × 10⁻¹, 6.8 × 10⁻¹ and 7.4 × 10⁻¹, for ψ_1, …, ψ_4 respectively, which is comparable in terms of order of magnitude to the results obtained in the case of PCs of indirectly observed functions. Across the generation of multiple datasets, results are stable, with the exception of a few situations where the cross-validation approach suggests a penalization coefficient λ that under-smooths the solution, due to very similar associated signals on the sensors space of the under-smoothed solution and the real solution. However, cross-validation is only one possible approach to the choice of the penalization constant, and many other options have been proposed in the inverse problems literature (Vogel 2002). Some of these, however, involve visual inspection.
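The construction of the synthetic sensor-space covariances can be sketched as follows, with scalar-valued functions on a 1-D stand-in domain, random stand-ins for the subject-specific forward operators, and a smaller error scale chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
p, s, n = 100, 60, 10

# Orthonormal components on a 1-D stand-in domain.
t = np.linspace(0, 1, p)
psi = np.array([np.sin((r + 1) * np.pi * t) for r in range(4)])
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
gammas = np.array([3.0, 2.5, 2.0, 1.0]) ** 2

S = []
for i in range(n):
    K_i = rng.standard_normal((s, p)) / np.sqrt(p)  # subject-specific forward operator
    z = rng.normal(scale=np.sqrt(gammas))           # subject-specific scores
    C_i = (psi.T * z ** 2) @ psi                    # rank-4 brain-space covariance
    E_i = rng.normal(scale=0.05, size=(s, s))       # illustrative error scale
    S.append(K_i @ C_i @ K_i.T + E_i.T @ E_i)       # S_i = K_i C_i K_i' + E_i' E_i
```

Each S_i is symmetric positive semidefinite by construction, as in the paper's generative scheme, and would then be fed to algorithm 2.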

Application
In this section, we apply the developed models to the publicly available Human Connectome Project (HCP) young adult dataset (Van Essen et al 2012). This dataset comprises multi-modal neuroimaging data, such as structural scans, resting-state and task-based functional MRI scans, and resting-state and task-based MEG scans, from a large number of healthy volunteers. In the following, we briefly review the pre-processing pipeline applied to these data by the HCP to ultimately facilitate their use.

Pre-processing
For each individual, a high-resolution 3D structural MRI scan has been acquired. This returns a 3D image describing the structure of the gray and white matter in the brain. Gray matter is the source of large parts of our neuronal activity. White matter is made of axons connecting the different parts of the gray matter. If we exclude the sub-cortical structures, gray matter is mostly distributed at the outer surface of the cerebral hemispheres, also known as the cerebral cortex. By segmentation of the 3D structural MRI, it is possible to separate gray matter from white matter, in order to extract the cerebral cortex structure. Subsequently, a mid-thickness surface, interpolating the mid-points of the cerebral cortex, can be estimated, resulting in a 2D surface embedded in a 3D space that represents the geometry of the cerebral cortex. In practice, such a surface, sometimes referred to as the cortical surface, is a triangulated surface. Moreover, from the 3D structural MRI, a surface describing the individual's head can be extracted. The latter plays a role in the derivation of the model for the electrical/magnetic propagation of the signal from the cerebral cortex to the sensors. An example of the cortical surface of a single subject is shown in the right panel of figure 1, while the associated head surface and MEG sensor positions are shown in the left panel of the same figure.
Moreover, a surface-based registration algorithm has been applied to register each of the extracted cortical surfaces to a triangulated template cortical surface, which is shown in figure 10. Post registration, the triangulated template cortical surface is sub-sampled to an 8k-node surface. Moreover, the nodes on the cortical surface of each subject are also sub-sampled to a set of 8k nodes in correspondence with the 8k nodes of the template. For each subject, a 248 × 24k matrix, representing the forward operator, has been computed with FieldTrip (Oostenveld et al 2011) from the subject's head surface, cortical surface and sensor positions. Such a matrix relates the vector-valued signals in R³, on the nodes of the triangulation of the cerebral cortex, to those detected by the sensors, consisting of 248 magnetometer channels.
With the aim of studying the functional connectivity of the brain, for each subject, three 6 min resting-state MEG scans have been performed, of which one session is used in our analysis. During the 6 min, data are collected from the sensors at 600k uniformly distributed time-points. Using FieldTrip, classical pre-processing is applied to the detected signals, such as the removal of low-quality channels and low-quality segments. Details of this procedure can be found in the HCP MEG reference manual. Moreover, we apply a band-pass filter, limiting the spectrum of the signal to the [12.5, 29] Hz band, also known as beta waves. For the signal of each channel we compute its amplitude envelope (see figure B.1), which describes the evolution of the signal amplitude. The measure of connectivity between channels that we adopt in this work is the covariance of the amplitude envelopes. Other connectivity metrics, such as phase-based metrics, have been proposed in the literature (see Colclough et al 2016, and references therein).
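The amplitude envelope of a band-limited channel is commonly computed from the analytic signal; the following FFT-based Hilbert-transform sketch illustrates the idea on a synthetic beta-band signal. The exact HCP/FieldTrip pipeline may differ in its details.

```python
import numpy as np

def amplitude_envelope(x):
    """Amplitude envelope via the analytic signal: zero out negative
    frequencies in the FFT, double the positive ones, take the modulus."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.abs(np.fft.ifft(X * h))

# Band-limited test signal: a 20 Hz carrier (inside the 12.5-29 Hz beta
# band) with a slowly varying amplitude.
fs = 600.0
t = np.arange(0, 2, 1 / fs)
amp = 1.0 + 0.5 * np.sin(2 * np.pi * 0.5 * t)
x = amp * np.cos(2 * np.pi * 20.0 * t)
env = amplitude_envelope(x)
```

The envelopes of all channels, computed in this fashion, are the time series whose covariance defines the connectivity measure used in this work.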

Analysis
Here we apply the population model introduced in section 4.2 to the HCP MEG data. The first part of the analysis focuses on studying the dynamic functional connectivity of a specific subject. For this purpose, we subdivide the 6 min session into n = 40 consecutive intervals. Each of these segments is used to compute a covariance matrix in the sensors space, resulting in n covariance matrices S_1, …, S_n. In this setting, we have one forward operator K = K_1 = ··· = K_n. The aim is to understand the main modes of variation of the functional connectivity on the brain space of the subject. Thus, algorithm 2, with 20 iterations, is applied to S_1, …, S_n to find the PC covariance functions.
A regularization parameter λ common to all the PC components is chosen by inspecting the plot of the regularity of the first R = 10 PC covariance functions (Σ_{r=1}^R ∫_M ‖∇ψ̂_r‖²) versus the residual norm, for different choices of the parameter. This is a version of the L-curve plot (Hansen 2000) and is shown in the left panel of figure B.2. Here we show the results for λ = 10²; in the appendices we show the results for λ = 10. The energy maps of the estimated ψ̂_1, ψ̂_2 and ψ̂_3 resulting from the analysis are shown in figure 11. These are associated with the first three PC covariance functions ψ̂_1 ⊗ ψ̂_1, ψ̂_2 ⊗ ψ̂_2 and ψ̂_3 ⊗ ψ̂_3. High-intensity areas, in yellow, indicate which areas present high average interconnectivity, by means of either positive or negative correlation in time.
In figure 12, we show the plot of the variances associated with each time segment, describing the variation in time of the PC covariance functions, and hence the variation in interconnectivity. The variance can be defined either on the sensors space, by normalizing the PC covariance functions {Kψ̂_r}, with K the forward operator, or on the brain space, by normalizing the PC covariance functions on the brain space {ψ̂_r}. Due to the presence of invisible dipoles, which are dipoles that display zero magnetic field on the sensors space, the two norms can be quite different, leading to different average variances for each PC covariance function. Due to the high sensitivity of the source space variances to the choice of the regularization parameter, we focus on the estimated variances on the sensors space.
We have also applied our model to the covariances obtained by subdividing the MEG session into n = 80 segments. As expected, the PC covariance functions, shown in figure B.5, are very similar. However, the variances, in figure B.4, show higher variability in time, which can be partially explained by the fact that shorter time segments lead to covariance estimates that have higher variability.

Figure 11. Top side and bottom side views of the estimated energy maps ψ̂_1, ψ̂_2 and ψ̂_3 obtained by applying algorithm 2 to the covariance matrices computed from the MEG resting-state data of a single subject on n = 40 consecutive time intervals. On the right panel, the covariance functions associated with these energy maps. On the top right panel we highlight with red circles the areas with high average interconnectivity, which correspond to the neighborhoods of the red-crossed vertices in the plot of the energy map of ψ̂_1.
The second part of the analysis focuses on applying the proposed methodology to a multi-subject setting. Specifically, n = 40 different subjects are considered. For each subject, the 6 min scan is used to compute a covariance matrix, resulting in n covariance matrices S_1, …, S_n. The template geometry in figure 10 is used as a model of the brain space. Algorithm 2 is then applied to find the PC covariance functions on the template brain, associated with S_1, …, S_n. We run the algorithm for 20 iterations, and choose the regularizing parameter to be λ = 10² by inspecting the L-curve plot in the right panel of figure B.2. The results for λ = 10 are shown in the appendices. The energy maps of the estimated functions ψ̂_1, ψ̂_2 and ψ̂_3, and the associated first three covariance functions ψ̂_1 ⊗ ψ̂_1, ψ̂_2 ⊗ ψ̂_2 and ψ̂_3 ⊗ ψ̂_3, are shown in figure 13. High-intensity areas, in yellow, indicate which areas present high average connectivity. In figure 14, we show the associated subject-specific variances, both in the sensors space and the brain space.
The presented methodology opens up the possibility of understanding population-level variation in functional connectivity and, indeed, of asking whether, just as we need different forward operators for individuals (due to anatomical differences), we should also be considering both population and subject-specific connectivity maps when analyzing connectivity networks. In fact, it is of interest to note that in both the single- and multi-subject settings, the areas with high interconnectivity, displayed in yellow in figures 11 and 13, seem to be at least partially overlapping with the brain's default network (Buckner et al 2008, Yeo et al 2011). The brain's default network consists of the brain regions known to have highly correlated hemodynamic activity (i.e. the highest functional connectivity levels), and to be most active, when the subject is not performing any specific task. An image of the spatial configuration of the default network can be found, for instance, in figure 2 of Buckner et al (2008). From the plots of the associated variances in the sensors space (left panels of figures 12 and 14) we can see that these areas are also the ones that show high variability in connectivity across time or across subjects. This might suggest that the brain's default network is also the brain region that shows among the highest levels of spontaneous variability in connectivity.

Figure 12. Plots of the segment-specific variances of the first R = 10 PC covariance functions. On the left, the estimated variances on the sensors space; on the right, the estimated variances on the brain space.

Figure 13. Top side and bottom side views of the estimated energy maps ψ̂_1, ψ̂_2 and ψ̂_3 obtained by applying algorithm 2 to the covariance matrices computed from the MEG resting-state data of n = 40 different subjects. On the right panel, the covariance functions associated with these energy maps.

Figure 14. Plots of the subject-specific variances associated with the first R = 10 PC covariance functions. On the left, the estimated variances on the sensors space; on the right, the estimated variances on the brain space.
The plots of the variances on the brain space (right panel of figure 14), when compared to those on the sensors space (left panel of figure 14), demonstrate that these types of studies are highly sensitive to the choice of the regularization, not only in terms of the spatial configuration of the results, but also in terms of the estimated variances on the brain space. With a naive 'first reconstruct and then analyze' approach, where the reconstructed data on the brain space replace those observed on the sensors space, this issue could go unnoticed, as the variability that does not fit the chosen model is implicitly discarded in the reconstruction step and does not appear in the subsequent analysis. Also, importantly, our analysis deals with statistical samples that are entire covariances, overcoming the limitations of seed-based approaches, where prior spatial information is required to choose the seed. Seed locations are usually informed by fMRI studies, and this comes with the risk of biasing the analysis when comparing electrophysiological networks (MEG) and hemodynamic networks (fMRI).
In general, care should be taken when drawing conclusions from MEG studies. Establishing static and dynamic functional connectivity from MEG data remains challenging, due to the strong ill-posedness of the inverse problem. It is known that other variables, such as the choice of the frequency band or the choice of the connectivity metric, can influence the analysis. While the choice of the neural oscillatory frequency band could be seen as an additional parameter in MEG functional connectivity studies, there is no general agreement on the choice of the connectivity metric (Gross et al 2013). It is important to highlight that in this paper we focus on methodological contributions to the specific problem of reconstructing and representing indirectly observed functional images and covariance functions.

Discussion
In this work we introduce a general framework for the reconstruction and representation of covariance operators in an inverse problem context. We first introduce a model for indirectly observed functional images in an unconstrained space, which outperforms the naive approach of solving the inverse problem individually for each sample. This model plays an important role in the case of samples that are indirectly observed covariance functions, and thus constrained to be positive semidefinite. We deal with the non-linearity introduced by this constraint by working with unconstrained representations, yet incorporating spatial information in their estimation. The proposed methodology is finally applied to the study of brain connectivity from the signals arising from MEG scans.
The models proposed here can be extended in many interesting directions. From an applied perspective, it is of interest to apply them to different settings, not necessarily involving neuroimaging, where studying second-order information has so far been prohibitive. Direct examples are second-order analyses of the dynamics of meteorological observations, such as temperature. Another possible application is the study of the dynamics of ocean currents, where the irregularity of the spatial domain, and its complex boundaries, can easily be accounted for thanks to the manifold representation approach in our models.
From a modeling point of view, it is of interest to take a step further toward the integration of the inverse problems literature with the approach we adopt in this paper. For instance, penalization terms that have been shown to be successful in the inverse problems literature, e.g. total variation penalization, could be introduced in our models.

Appendix B. Application-additional material
Here we present further material complementing the analysis in section 6. In figure B.1 we show the amplitude envelope computed from a filtered version of a signal detected by an MEG sensor. The covariance of the amplitude envelopes across different sensors is the measure of connectivity used in this work.
In figure B.2 we show the L-curve plots associated with the PC covariance models applied to the dynamic and multi-subject functional connectivity studies.
In figures B.3 and B.4 we show, respectively, the plots of the estimated PC covariance functions and the associated variances from the dynamic functional connectivity study on n = 40 segments, with regularization parameter λ = 10.
In figures B.5 and B.6 we show the estimated PC covariance functions and associated variances from the dynamic functional connectivity study on n = 80 time segments, with regularization parameter λ = 10².
In figures B.7 and B.8 we show the estimated PC covariance functions and associated variances from the multi-subject functional connectivity study on n = 40 subjects, with regularization parameter λ = 10.

Figure B.2. Plots of the regularity of the first R = 10 PC covariance functions, measured as Σ_{r=1}^{10} ∫_M ‖∇ψ̂_r‖², versus the residual norm in the data, for different choices of log(λ). The left panel refers to the dynamic connectivity study; the right panel to the multi-subject connectivity study.

Figure B.3. Energy maps of the estimated ψ̂_1, ψ̂_2 and ψ̂_3 obtained by applying algorithm 2, with lower regularization (λ = 10), to the covariance matrices computed from the MEG resting-state data of a single subject on n = 40 consecutive time intervals. On the right panel, the covariance functions associated with these energy maps.

Figure B.5. Energy maps of the estimated ψ̂_1, ψ̂_2 and ψ̂_3 obtained by applying algorithm 2, with λ = 10², to the covariance matrices computed from the MEG resting-state data of a single subject on n = 80 consecutive time intervals. On the right panel, the covariance functions associated with these energy maps.

Figure B.7. Energy maps of the estimated ψ̂_1, ψ̂_2 and ψ̂_3 obtained by applying algorithm 2, with lower regularization (λ = 10), to the covariance matrices computed from the MEG resting-state data of n = 40 different subjects. On the right panel, the covariance functions associated with these energy maps.