Re-visiting Riemannian geometry of symmetric positive definite matrices for the analysis of functional connectivity

Common representations of functional networks derived from resting state fMRI time series, including covariance, precision, and cross-correlation matrices, belong to the family of symmetric positive definite (SPD) matrices, which form a special mathematical structure called a Riemannian manifold. Owing to this geometric structure, analyses and operations on functional connectivity matrices may well be performed on the Riemannian manifold of the SPD space. Analysis of functional networks on the SPD space takes all the pairwise interactions (edges) into account as a whole, which differs from the conventional rationale of treating edges as independent of each other. Despite this geometric characteristic, only a few studies have conducted functional network analysis on the SPD manifold, and inference methods specialized for connectivity analysis on the SPD manifold are rarely found. The current study aims to show the importance of connectivity analysis on the SPD space and to introduce inference algorithms on the SPD manifold, such as regression analysis of functional networks in association with behaviors, principal geodesic analysis, clustering, state transition analysis of dynamic functional networks, and statistical tests for network equality on the SPD manifold. We applied the proposed methods to both simulated data and experimental resting state fMRI data from the human connectome project and argue for the importance of analyzing functional networks under the SPD geometry. All the algorithms for numerical operations and inferences on the SPD manifold are implemented as a MATLAB library, called SPDtoolbox, for public use to expedite functional network analysis on the right geometry.


Introduction
The prevailing view of the brain is that of a highly organized network composed of interactions among distributed brain regions. These interactions are often characterized by the cross-correlation matrix of spontaneous fluctuations observed in resting state functional magnetic resonance imaging (rs-fMRI) (Biswal et al., 1995). The correlation matrix, a representation of the functional brain network, has served as a basis for diverse fields of brain research. Exemplary areas include the exploration of brain diseases using functional connectivity (Drysdale et al., 2017; Jang et al., 2016; Lee et al., 2017; Yahata et al., 2016), behaviorally associated brain networks (Kyeong et al., 2014; Smith et al., 2015), dynamic connectivity analysis (Calhoun et al., 2014; Chang and Glover, 2010; Cribben et al., 2012; Handwerker et al., 2012; Hutchison et al., 2013; Jeong et al., 2016; Monti et al., 2014; Preti et al., 2017), identification of individuals (Finn et al., 2015), inter-individual variability (Jang et al., 2017), and biomarkers in machine learning (Dosenbach et al., 2010; Drysdale et al., 2017; Finn et al., 2015; Yahata et al., 2016). Partial correlation analysis (inverse covariance matrix) has also been used (Lee et al., 2017; Qiu et al., 2015) to exclude pseudo-connectivity (spurious connections due to polysynaptic or modulatory effects). In those studies, researchers have analyzed the correlation matrices edge by edge (element of the correlation matrix) by considering each edge independently of other edges (Dosenbach et al., 2010; Siman-Tov et al., 2016) or by considering multiple edges together to explain certain dependent variables (Leonardi et al., 2013; Park et al., 2014), particularly in a machine learning framework (Drysdale et al., 2017; Yahata et al., 2016). All these directions do not differ much in disregarding the topological dependency across elements, a typical property of the correlation matrix.

Fig. 1. General SPD manifold. The geodesic distance between A and B is shorter than that between A and C, while the reverse holds for the Euclidean distance.
Mathematically speaking, well-known representations of functional connectivity, such as covariance and precision along with correlation, are all symmetric positive definite (SPD) matrices. It is well known that the collection of SPD matrices forms a cone-shaped Riemannian manifold (Fig. 1). Accordingly, operations on the SPD manifold can better be performed based on the corresponding geometric structure rather than the Euclidean geometry. The geometric properties of the SPD manifold have not yet been fully recognized in the neuroimaging community. A relatively small but increasing number of studies on functional connectivity have been conducted on the space of SPD matrices, including evaluating the average and variability of group-level covariance matrices (Yamin et al., 2019), statistical testing (Ginestet et al., 2017; Varoquaux et al., 2010), change point detection (Dai et al., 2016), estimation of individual covariance matrices (Rahim et al., 2019), regression of functional connectivity in estimating structural connectivity (Deligianni et al., 2011), and dimension reduction for machine learning (Davoudi et al., 2017; Ng et al., 2014; Qiu et al., 2015; Xie et al., 2017), to name a few.
Despite increasing interest in SPD-based analysis, we still lack access to diverse methods for connectivity analysis on the SPD manifold, such as smoothing and regression, clustering, principal component analysis, state transition analysis of dynamic functional networks, and statistical tests. The purpose of the current study is to provide methods for analyzing correlation (or covariance and inverse covariance) matrices on the SPD manifold. We would like to mention one notable characteristic of correlation matrices. The correlation matrix, which is also symmetric and positive definite, is a normalized version of the covariance matrix and constitutes a strict subset, or submanifold, of the SPD space. To study the distinct structure of this constrained set, the geometry of the elliptope, rather than that of SPD, could be a suitable candidate for studying correlation matrices. However, it mainly deals with low-rank structure and has been little investigated (Grubišić and Pietersz, 2007). In this study, we consider the correlation matrix as a constrained type of SPD matrix that forms a submanifold, inheriting the geometric properties of the general SPD manifold. This does not diminish the importance of the SPD manifold or our contribution to the study of functional connectivity, since many representations of functional connectivity are equipped with SPD properties and the proposed methods are presented for the broader class of SPD matrices.
The current paper is composed of four main parts. First, we provide a brief overview and description of the manifold of SPD matrices with a focus on the correlation matrix. Second, we introduce functions for operations and learning algorithms for correlation matrices on the SPD manifold in consideration of practical usage in functional network analysis. The basic operations start with the definition of geodesic distance between SPD matrices, which forms the basis for evaluating group means, variation, and statistical tests on the SPD manifold. We also introduce methods for smoothing and regression, principal geodesic analysis, and clustering. Third, we demonstrate the need to analyze functional connectivity matrices on the SPD space using both simulations and experimental data. All the algorithms are implemented as a MATLAB (MathWorks, Inc., USA) toolbox, called SPDtoolbox, which is freely available on GitHub (https://github.com/kyoustat/papers) for public use to expedite functional network analysis on the right geometry.

Elements of Riemannian manifold
In brief, a manifold M is a topological space that locally resembles Euclidean space at each point. A manifold is called smooth when its transition maps are smooth. A Riemannian manifold (M, g) is a smooth manifold whose inner product g on the tangent space at each point p varies smoothly with p. From a practitioner's perspective, it is important to keep in mind that a Riemannian manifold is not a vector space. That is, we are endowed with a concept of distance that satisfies the metric properties on a space that is locally similar to Euclidean space, but the space itself is not Euclidean.
For example, consider operations on the sphere S² ⊂ ℝ³ and take the north and south poles with coordinates (0, 0, 1) and (0, 0, −1). If we simply treat data residing on the surface with a Euclidean structure, averaging the coordinates of the north and south poles would return (0, 0, 0), which corresponds to the inner core, or origin, of the sphere. However, if we want mathematical operations to abide by the spherical geometry, simply 'adding' the coordinates of the two poles is an invalid concept. We introduce some concepts and definitions without rigorous mathematical exposition in the following.
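This failure of naive averaging can be checked numerically; below is a minimal NumPy sketch (not part of SPDtoolbox):

```python
import numpy as np

north = np.array([0.0, 0.0, 1.0])   # north pole on the unit sphere
south = np.array([0.0, 0.0, -1.0])  # south pole

# Naive Euclidean averaging of coordinates leaves the sphere entirely:
mean = (north + south) / 2.0
print(mean)                  # [0. 0. 0.] -- the origin, not a point on the sphere
print(np.linalg.norm(mean))  # 0.0, far from the unit radius
```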
A tangent space T_xM at a point x ∈ M is the set of tangent vectors (Fig. 2), which are derivatives of curves passing through x. The statement that a manifold is locally similar to Euclidean space generally refers to the tangent space and its properties as a vector space. Two important mathematical operations are the exponential and logarithm maps. Let v ∈ T_xM be a tangent vector at x. The exponential map Exp_x(v) ∈ M traces the unique shortest curve (geodesic) from the point x in the direction of v, so that the operation results in a point of M (Fig. 2). The logarithm map is the inverse of the exponential map in that, for x, y ∈ M, Log_x(y) ∈ T_xM is the tangent vector that corresponds to y when sent back via the exponential map. Finally, the geodesic distance is the length of the shortest curve that connects two points x and y.

Fréchet mean and variation
Given the data X_1, X_2, …, X_n ∈ M, the sample Fréchet mean is defined as a minimizer of the sum of squared distances, μ̂ = argmin_{μ∈M} Σ_{i=1}^n ρ²(X_i, μ), where ρ: M × M → ℝ_+ is the geodesic distance for x, y ∈ M, the length of the shortest curve on M connecting the two points. If the Fréchet mean exists, the Fréchet variation is the minimum value attained at μ̂.

Riemannian metric and geodesic distance
A Riemannian metric g is a symmetric (0,2)-tensor field that is positive-definite at every point. Formally speaking, for all differentiable vector fields X, Y on M, the map p ↦ g_p(X|_p, Y|_p) is a smooth function, g(X, X)|_p := g_p(X|_p, X|_p) ≥ 0, and g(X, X)|_p = 0 if and only if X|_p = 0. Locally at a point p, it endows the tangent space with an inner product structure. One observation (Pennec, 2006) is that the geodesic distance is closely related to the Riemannian metric in that, for x, y ∈ M, ρ²(x, y) = g_x(Log_x(y), Log_x(y)), where the geodesic realizing ρ is the curve minimizing the length and Log_x(·): M → T_xM is the logarithm map.

SPD manifold
Our special interest is the space of SPD matrices, to which most representations of functional connectivity belong. Conventional measures such as covariance, precision, and correlation matrices all share the SPD property.
It is well known that the SPD space is, in fact, a Riemannian manifold. That means there exist certain geometric structures not necessarily similar to those of Euclidean space. In the following sections, we provide a mathematical characterization and discuss computational issues from a pedagogical perspective.

Matrix exponential and logarithm
For a square matrix X, the matrix exponential is defined analogously to the real-number case as exp(X) = Σ_{k=0}^∞ X^k / k!. The matrix (principal) logarithm log(Y) can then be defined as the inverse operation, i.e., the matrix X satisfying exp(X) = Y. For more details on the matrix exponential, the logarithm, and their roles in what is known as the Lie group structure connected with the SPD manifold, we refer interested readers to Moakher (2005).
Due to its pervasive nature, the space of SPD matrices has been extensively studied (Bhatia, 2015). Many attempts have been made to formalize non-Euclidean geometry on SPD, where the distance between two SPD objects is defined by Cholesky decomposition (Dryden et al., 2009), symmetrized Kullback-Leibler divergence (Wang and Vemuri, 2005), Jensen-Bregman LogDet divergence (Cherian et al., 2013), and Wasserstein distance (Takatsu, 2011), to name a few. Among the many frameworks, the most popular metrics on the SPD space are the affine-invariant Riemannian metric (AIRM) and the log-Euclidean Riemannian metric (LERM).

Intrinsic geometry with AIRM
The affine-invariant Riemannian metric (AIRM) is one of the most popular geometric structures on the space of SPD matrices. As its name suggests, the metric, which takes two elements of a tangent space and returns a number, characterizes the corresponding geometry of the space. For SPD equipped with AIRM, given two tangent vectors u, v at a point P ∈ M = SPD(p), the metric is ⟨u, v⟩_P = Tr(P^{-1} u P^{-1} v), where P^{-1} is the standard matrix inverse and Tr a trace operator. Two important operations that connect the manifold M and the tangent plane T_PM at a point P ∈ M are the exponential and logarithmic maps, for which the SPD manifold is endowed with analytical formulas. For a tangent vector u at P, the exponential map Exp_P: T_PM → M is defined as Exp_P(u) = P^{1/2} exp(P^{-1/2} u P^{-1/2}) P^{1/2}, and the logarithmic map at P as Log_P(Q) = P^{1/2} log(P^{-1/2} Q P^{-1/2}) P^{1/2}, so that the geodesic distance ρ: M × M → ℝ_+ for two points P, Q ∈ M can be computed as ρ(P, Q) = ‖log(P^{-1/2} Q P^{-1/2})‖_F.
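The maps above admit a direct implementation. The following is an illustrative Python/SciPy sketch (function names are ours; the SPDtoolbox routines themselves are MATLAB):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def airm_exp(P, U):
    """Exponential map Exp_P(U) under AIRM."""
    Ph = sqrtm(P)   # P^{1/2}
    Pih = inv(Ph)   # P^{-1/2}
    return (Ph @ expm(Pih @ U @ Pih) @ Ph).real  # .real drops tiny numerical imaginary parts

def airm_log(P, Q):
    """Logarithm map Log_P(Q) under AIRM."""
    Ph = sqrtm(P)
    Pih = inv(Ph)
    return (Ph @ logm(Pih @ Q @ Pih) @ Ph).real

def airm_dist(P, Q):
    """Geodesic distance: Frobenius norm of log(P^{-1/2} Q P^{-1/2})."""
    Pih = inv(sqrtm(P))
    return np.linalg.norm(logm(Pih @ Q @ Pih), 'fro')

P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
print(np.allclose(airm_exp(P, airm_log(P, Q)), Q))            # True: Exp and Log are inverses
print(np.isclose(airm_dist(P, Q), airm_dist(inv(P), inv(Q)))) # True: invariance under inversion
```

The second check illustrates the inversion invariance of AIRM, which is used later in Simulation 1.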

Extrinsic geometry under LERM
The log-Euclidean Riemannian metric (LERM) is another popular machinery along with AIRM (Arsigny et al., 2007). Briefly, the LERM framework can be considered an analysis procedure that embeds points of the SPD manifold into Euclidean space via the matrix logarithm. For example, the geodesic distance for two points P, Q ∈ M under LERM is defined as ρ_LE(P, Q) = ‖log(P) − log(Q)‖_F, with ‖·‖_F the Frobenius norm. As we noted before, LERM amounts to Euclidean geometry after the matrix logarithm at the identity I. We can check this statement by the fact that Log_I(P) = log(P), so the metric at the identity reduces to the conventional inner product of the Euclidean space.
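A minimal sketch of the LERM distance (illustrative Python, names ours); for commuting matrices such as diagonal ones, the distance reduces to the Euclidean distance between vectors of log-eigenvalues:

```python
import numpy as np
from scipy.linalg import logm

def lerm_dist(P, Q):
    """Log-Euclidean geodesic distance ||log(P) - log(Q)||_F."""
    return np.linalg.norm(logm(P) - logm(Q), 'fro')

# For diagonal SPD matrices, only the log-eigenvalues matter:
P = np.diag([1.0, 2.0])
Q = np.diag([3.0, 0.5])
d = lerm_dist(P, Q)
# log-eigenvalue differences are (log 1 - log 3, log 2 - log 0.5) = (-log 3, log 4)
print(np.isclose(d, np.hypot(np.log(3.0), np.log(4.0))))  # True
```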
We close this section by noting that AIRM and LERM represent two principled approaches known as intrinsic and extrinsic analysis of manifold-valued data. While the intrinsic approach is strictly bound by the innate geometric conditions and makes all procedures more principled, the extrinsic approach depends on an embedding into Euclidean space that preserves a considerable amount of geometric information, which eases computation and theoretical analysis thereafter. This dichotomy is common in studies of statistics on manifolds, and we suggest Bhattacharya and Bhattacharya (2012) to interested readers for a detailed description of the two principled approaches.

Algorithms on the SPD manifold
In this section, several learning and inference functions are described with algorithmic details. Across all descriptions in this section, we use the following notation: {X_i}_{i=1}^n ⊂ M for data residing on the SPD manifold; μ and μ_j for the means of all data and of class j; S_i for the index set of data belonging to cluster i; superscript (t) for iteration t; and ρ: M × M → ℝ_+ for the geodesic distance function as defined in the previous section. Algorithms introduced in this section are largely governed by the intrinsic geometry of the AIRM framework. For the extrinsic log-Euclidean geometry, most standard algorithms for data in Euclidean space are directly applicable after taking the matrix logarithm of all data points, as long as the final output matrix in the Euclidean space is mapped back via the matrix exponential.

Fréchet mean and variation (spd_mean.m)
Given data points X_i ∈ M, i = 1, …, n, the empirical intrinsic Fréchet mean is a minimizer μ̂ of the average squared geodesic distance F(μ) = (1/n) Σ_{i=1}^n ρ²(X_i, μ), and the corresponding variation is the attained minimum F(μ̂). One standard method to obtain μ̂ is the Riemannian gradient descent algorithm proposed in Pennec et al. (2006), in which conditions for existence and uniqueness are discussed along with the algorithm. At each iteration, the descent direction, which amounts to the average log-mapped direction (1/n) Σ_{i=1}^n Log_{μ^{(t)}}(X_i), is computed in the tangent space and then mapped back onto the manifold by the exponential map, so that we repeat the update μ^{(t+1)} = Exp_{μ^{(t)}}((1/n) Σ_{i=1}^n Log_{μ^{(t)}}(X_i)) until convergence; the obtained mean is then plugged in to obtain the corresponding variation of our data.
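The iteration can be sketched as follows (illustrative Python; the unit step size and the simple convergence check are simplifications of the Pennec et al. scheme):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def airm_exp(P, U):
    Ph = sqrtm(P); Pih = inv(Ph)
    return (Ph @ expm(Pih @ U @ Pih) @ Ph).real

def airm_log(P, Q):
    Ph = sqrtm(P); Pih = inv(Ph)
    return (Ph @ logm(Pih @ Q @ Pih) @ Ph).real

def frechet_mean_airm(mats, tol=1e-10, max_iter=100):
    """Riemannian gradient descent: step along the average log-map direction."""
    mu = mats[0].copy()
    for _ in range(max_iter):
        step = sum(airm_log(mu, X) for X in mats) / len(mats)
        mu = airm_exp(mu, step)
        if np.linalg.norm(step, 'fro') < tol:
            break
    return mu

# The AIRM mean of commuting matrices is their elementwise geometric mean:
mu = frechet_mean_airm([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])])
print(np.allclose(mu, 2.0 * np.eye(2)))  # True
```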
On the other hand, computation of the extrinsic Fréchet mean and variation under the log-Euclidean geometry is rather straightforward. As mentioned before, under the LERM framework, data are embedded into a Euclidean space by the matrix logarithm, so that we have the following one-liner expressions for the objects of our interest: μ̂ = exp((1/n) Σ_{i=1}^n log(X_i)) and V̂ = (1/n) Σ_{i=1}^n ‖log(X_i) − log(μ̂)‖², where ‖·‖ is the conventional Euclidean (Frobenius) norm.
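Under LERM the computation is literally an average in the log-domain (a minimal sketch, names ours):

```python
import numpy as np
from scipy.linalg import expm, logm

def lerm_mean_and_variation(mats):
    """Extrinsic Frechet mean and variation: average in the log-domain."""
    logs = [logm(X) for X in mats]
    mean_log = sum(logs) / len(logs)
    mu = expm(mean_log)
    variation = sum(np.linalg.norm(L - mean_log, 'fro') ** 2 for L in logs) / len(logs)
    return mu, variation

mu, v = lerm_mean_and_variation([np.diag([1.0, 4.0]), np.diag([4.0, 1.0])])
print(np.allclose(mu, 2.0 * np.eye(2)))  # True: the geometric mean, for commuting inputs
```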

Test for equality of two distributions (spd_eqtest.m)
It is often a prerequisite for further analysis to check whether two groups of objects are identically distributed. In statistics, this is usually carried out with hypothesis testing procedures that determine whether there exists a difference in certain measures of interest, such as mean or variance. One general approach is two-sample testing for the equality of two distributions. Given two sets of independent observations X_1, …, X_m ~ F and Y_1, …, Y_n ~ G, we test the null hypothesis H_0: F = G against the alternative H_1: F ≠ G. Among the many classes of distribution tests developed for Euclidean data, we employ the nonparametric method of Biswas and Ghosh (2014), which relies solely on inter-point distances between data points for inference. Maa et al. (1996) showed that testing F = G is equivalent to testing whether three distributions D_FF, D_FG, D_GG are identical, where D_FF, D_FG, D_GG denote the distributions of ‖X − X*‖, ‖X − Y‖, and ‖Y − Y*‖, with X* and Y* being identically distributed as X and Y. The proof from Maa et al. (1996) is directly applicable to Riemannian manifolds embedded in Euclidean space since, in the limit, the volume form locally coincides with that of a Euclidean ball. This leads to a convenient adaptation of the method by simply replacing the Euclidean norm with the geodesic distance on SPD. To briefly introduce the adaptation, define μ̂_F = (μ̂_FF, μ̂_FG) and μ̂_G = (μ̂_FG, μ̂_GG), the vectors of average within-group and between-group distances, and reject the null hypothesis H_0: F = G for large values of the test statistic T_{m,n} = ‖μ̂_F − μ̂_G‖²_2. Since an analytical expression for the limiting sampling distribution of T_{m,n} is not available, we use a permutation test that rearranges the labels of the given data to obtain a suitable threshold defining 'largeness' of the statistic (Pitman, 1937). From a computational standpoint, once we have a precomputed distance matrix D ∈ ℝ^{(m+n)×(m+n)} for all observations, the permutation testing procedure described above does not demand intensive computing resources, since acquiring T_{m,n} consists of scalar additions and multiplications only.
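The permutation scheme on a precomputed distance matrix can be sketched as follows (illustrative Python; the statistic follows the Biswas-Ghosh construction from average within- and between-group distances, and the function names are ours):

```python
import numpy as np

def bg_statistic(D, idx_f, idx_g):
    """T = ||mu_F - mu_G||^2 from average within- and between-group distances."""
    m, n = len(idx_f), len(idx_g)
    mu_ff = D[np.ix_(idx_f, idx_f)][np.triu_indices(m, 1)].mean()
    mu_gg = D[np.ix_(idx_g, idx_g)][np.triu_indices(n, 1)].mean()
    mu_fg = D[np.ix_(idx_f, idx_g)].mean()
    return (mu_ff - mu_fg) ** 2 + (mu_gg - mu_fg) ** 2

def perm_test(D, m, n, n_perm=999, seed=0):
    """Permutation p-value; D is the (m+n)x(m+n) pairwise geodesic distance matrix."""
    rng = np.random.default_rng(seed)
    labels = np.arange(m + n)
    t_obs = bg_statistic(D, labels[:m], labels[m:])
    count = sum(
        bg_statistic(D, perm[:m], perm[m:]) >= t_obs
        for perm in (rng.permutation(m + n) for _ in range(n_perm))
    )
    return t_obs, (count + 1) / (n_perm + 1)

# Toy check with two well-separated 1-D point clouds:
x = np.concatenate([0.1 * np.arange(5), 10 + 0.1 * np.arange(5)])
D = np.abs(x[:, None] - x[None, :])
t_obs, p = perm_test(D, 5, 5)
print(p < 0.05)  # True: the two groups are clearly different
```

For SPD data, D would be filled with pairwise AIRM (or LERM) geodesic distances instead of absolute differences.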

Pairwise coefficient test of equal distribution (spd_eqtestelem.m)
In functional connectivity analysis, it is often of interest to locate which connections show a significant group-level difference. One common approach is to perform a univariate two-sample test for each entry and apply multiple testing correction. Kim et al. (2014) compared the performance of several statistical hypothesis tests in detecting group differences in brain functional networks. The aforementioned approach of employing multiple univariate tests rests on the implicit assumption that each test of a pairwise coefficient is independent of the others. Furthermore, asymptotic tests depend on additional assumptions, including those on the underlying distribution, so that in the absence of sufficient validation a nonparametric procedure is necessitated (Wasserman, 2006).
In this study, we introduce a novel procedure for comparing pairwise coefficients under the geometric constraints imposed by the SPD manifold. Hutchison et al. (2010) proposed a bootstrap procedure for comparing coefficients in the vicinity of Fréchet means via logarithmic mapping to assure independence of parameters. Instead of such a local comparison, we take an alternative approach that uses the group-level information represented by the Fréchet means and compares their elements within a permutation testing framework: entrywise differences between the two group Fréchet means are computed, and their null distribution is obtained by permuting group labels and recomputing the means. Given data of two groups X_1, …, X_m and Y_1, …, Y_n, all on SPD(p), the procedure proceeds in this manner.

k-means clustering (spd_kmeans.m)
The k-means algorithm (MacQueen, 1967) is one of the most famous clustering algorithms for data analysis. As pointed out in Goh and Vidal (2008), the method is easily extensible to non-Euclidean data since it depends solely on the distance measure in determining class memberships. We implemented the standard Lloyd's algorithm (Lloyd, 1982) modified to our context:
1 Randomly choose k data points as initial cluster means μ_1^{(1)}, …, μ_k^{(1)}.
2 Repeat the following steps until convergence:
• Assign each observation to the cluster whose center is closest, S_i^{(t)} = {X_j : ρ(X_j, μ_i^{(t)}) ≤ ρ(X_j, μ_l^{(t)}) for all l, 1 ≤ l ≤ k}; when an observation could belong to more than one cluster, assign it to one of them at random.
• Update the cluster centroids by the Fréchet means of each class, μ_i^{(t+1)} = argmin_μ (1/|S_i^{(t)}|) Σ_{X_j ∈ S_i^{(t)}} ρ²(X_j, μ), where |S_i| refers to the number of observations in class i.
To guide the selection of the number of clusters k, we implement an adaptation of the celebrated Silhouette method (Rousseeuw, 1987) as a cluster quality score, since it relies only on a distance metric (spd_clustval.m). The silhouette measures the proximity of observations in a cluster to the points of its neighboring clusters. Each observation receives a score in [−1, 1], where a large positive number indicates that the observation is well separated by the current clustering. For each observation i, the silhouette value is s(i) = (b(i) − a(i)) / max{a(i), b(i)}, where a(i) is the average distance between observation i and the rest of its own cluster, and b(i) is the smallest average distance from the i-th observation to all points of any other cluster. When a cluster is a singleton consisting of a single observation, we set s(i) = 0. Once silhouette values are computed for all data points, a global score is computed by averaging the individual values, and the partition with the largest average score is said to be superior to the others.
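As noted earlier, the extrinsic variant is particularly simple: embed via the matrix logarithm, run standard Lloyd iterations, and map the centroids back. A hedged Python sketch (names ours; the toolbox routine spd_kmeans.m follows the intrinsic scheme described above):

```python
import numpy as np
from scipy.linalg import expm, logm

def spd_kmeans_lerm(mats, k, n_iter=100, seed=0):
    """Extrinsic k-means: Lloyd's algorithm in the matrix-log domain."""
    rng = np.random.default_rng(seed)
    p = mats[0].shape[0]
    X = np.stack([logm(M).real.ravel() for M in mats])      # embed into R^{p*p}
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    centroids = [expm(c.reshape(p, p)) for c in centers]    # map back to SPD
    return labels, centroids

mats = [np.eye(2), 1.1 * np.eye(2), 10 * np.eye(2), 11 * np.eye(2)]
labels, _ = spd_kmeans_lerm(mats, k=2)
print(labels[0] == labels[1] and labels[2] == labels[3] and labels[0] != labels[2])  # True
```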

Modularity-based clustering
It should be noted that every execution of k-means clustering shows variation in its result due to initialization. To obtain a consistent result across multiple k-means runs, and thus make the clustering robust to initial values, we adopt modularity analysis of multiple k-means clusterings, which was applied to sort neural time courses in our previous study (Jung et al., 2019). The algorithm repeats k-means clustering N times for the sample covariance matrices with an empirically chosen K or one selected by the Silhouette score. It then evaluates, for each pair of covariances, the frequency with which they fall into the same cluster among the N runs of k-means clustering. This composes a frequency adjacency matrix over all pairs of covariance matrices: each element designates how frequently two nodes (covariances) are clustered together. The modularity optimization algorithm (Newman and Girvan, 2004), implemented in the Brain Connectivity Toolbox (Rubinov and Sporns, 2010), identifies optimal clusters from the frequency adjacency matrix by maximizing the modularity index Q. The modularity index Q for an adjacency matrix A is defined as Q = (1/V) Σ_{ij} [A_ij − (k_i k_j / V)] δ(M_i, M_j), where A_ij is the edge weight between the i-th and j-th nodes, k_i = Σ_j A_ij is the total weight attached to node i, V is the total sum of weights, and δ(M_i, M_j) is a Kronecker delta that equals 1 if the i-th and j-th nodes are in the same module (M_i = M_j) and 0 otherwise.
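Forming the frequency adjacency matrix from repeated clustering labels is straightforward (a minimal sketch; modularity optimization itself is delegated to the Brain Connectivity Toolbox in SPDtoolbox):

```python
import numpy as np

def frequency_adjacency(all_labels):
    """Fraction of runs in which each pair of observations shares a cluster.

    all_labels: (n_runs, n_obs) array of k-means label vectors, one row per run.
    """
    all_labels = np.asarray(all_labels)
    n_runs, n_obs = all_labels.shape
    F = np.zeros((n_obs, n_obs))
    for lab in all_labels:
        F += (lab[:, None] == lab[None, :])  # 1 where the pair shares a label
    return F / n_runs

# Three runs over three observations; the first two observations always co-cluster:
runs = [[0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(frequency_adjacency(runs))
```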
In addition to k-means clustering, the SPDtoolbox provides clustering of sample covariance matrices by evaluating an adjacency matrix of negative geometric distances among the samples. The modularity optimization algorithm finds clusters that maximize the modularity index.

Smoothing and regression (spd_smooth.m)
Suppose we have paired data (x_i, Y_i), i = 1, …, n, where x_i ∈ ℝ represents a one-dimensional covariate such as age or time and Y_i ∈ M is manifold-valued. One of the most fundamental questions is to model the relationship between x and Y, i.e., Y = f(x) + ε. This regression framework, however, is not easily defined even for a parametric family of f(·), since the concept of subtraction on a Riemannian manifold may not be straightforward. Nevertheless, the importance of this problem is hard to overstate, since the learned regression function may be used for prediction, smoothing, and so on.
A fast yet powerful method was recently proposed by Lin et al. (2017) for a manifold-valued response Y ∈ M and a vector-valued explanatory variable x ∈ ℝ^p. We focus on the special case p = 1 for simplicity, along with its usage as a nonparametric regression framework against a one-dimensional covariate. The key idea is to graft kernel regression (Nadaraya, 1964; Watson, 1964) onto the log-Euclidean geometric structure.
Let J: M → ℝ^D, D ≥ dim(M), be an embedding of our SPD manifold M into a high-dimensional Euclidean space (Fig. 3). A convenient choice for J is an equivariant embedding, which is known to preserve a substantial amount of geometric structure (Bhattacharya and Bhattacharya, 2012). For the SPD manifold, the embedding is J(X) = log(X), the logarithmic map of X at the identity I, whose inverse J^{-1} is therefore the exponential map. Given the two mappings, the extrinsic kernel estimate of the regression function is defined as f̂(x) = J^{-1}(π(Σ_{i=1}^n K_h(x − x_i) J(Y_i) / Σ_{i=1}^n K_h(x − x_i))), for a kernel function K_h(x) = K(x/h)/h with a positive bandwidth h > 0 that controls the degree of mass concentration, and a projection π: ℝ^D → J(M). The kernel bandwidth h is either fixed or determined via cross-validation; SPDtoolbox provides k-fold cross-validation to determine the optimal h (spd_smoothcv.m).
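A minimal sketch of the extrinsic kernel estimate with a Gaussian kernel (illustrative Python, names ours); the projection π is implicit here because a weighted average of symmetric matrices is again symmetric, i.e., it already lies in the log-Euclidean image of SPD:

```python
import numpy as np
from scipy.linalg import expm, logm

def spd_kernel_smooth(x_train, Y_train, x_eval, h):
    """Nadaraya-Watson estimate in the log-domain, mapped back via expm."""
    logs = np.stack([logm(Y).real for Y in Y_train])         # embed responses
    w = np.exp(-0.5 * ((x_eval - np.asarray(x_train)) / h) ** 2)  # Gaussian kernel
    w = w / w.sum()                                          # normalize weights
    return expm(np.tensordot(w, logs, axes=1))               # weighted log-average, back to SPD

# Smoothing constant data returns the same matrix at any evaluation point:
x = np.linspace(0.0, 1.0, 5)
Y = [np.diag([2.0, 3.0]) for _ in x]
print(np.allclose(spd_kernel_smooth(x, Y, 0.37, h=0.2), np.diag([2.0, 3.0])))  # True
```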

Principal geodesic analysis (spd_pga.m)
Principal component analysis (PCA) is a well-known method for exploratory data analysis (Hotelling, 1936; Pearson, 1901). To review briefly, it aims at finding a basis that captures the largest variance of the data projected onto it. This nature allows us to exploit the method for dimension reduction and feature extraction. Given a data matrix X = [x_1 | ⋯ | x_n] ∈ ℝ^{p×n} whose rows have zero mean, the top k eigenvectors v_j, j = 1, …, k, such that Σ̂v_j = λ_j v_j with λ_1 ≥ λ_2 ≥ … ≥ λ_k, are taken as the basis, where Σ̂ = XX^⊤/n is the empirical covariance matrix; these eigenvectors are also called principal components or loadings. Principal geodesic analysis grafted this very idea onto the setting where data lie on a manifold. Since there is no direct counterpart of the empirical covariance matrix and its eigen-decomposition on a Riemannian manifold, a local approach was taken. After obtaining the Fréchet mean μ for given data X_1, …, X_n ∈ M, every observation is pulled back onto the tangent space T_μM by u_i = Log_μ(X_i) (Fig. 4). Recalling that T_μM is locally a vector space, an empirical covariance of the tangent vectors can be obtained, whose eigen-decomposition returns eigenvectors v_1, …, v_k. This gives local coordinates z_i = [⟨u_i, v_1⟩, …, ⟨u_i, v_k⟩]^⊤ ∈ ℝ^k for each observation, projected variances on the tangent space, and loadings γ_j = Exp_μ(v_j) ∈ M for j = 1, …, k. The name principal geodesic comes from the fact that the loadings, or principal components, are not vectors in T_μM but rather eigenvectors projected by the exponential map along geodesics. That means the j-th element of the local coordinate vector z_i = [z_{i1}, …, z_{ik}] of an observation indicates the approximate weight/significance of the i-th observation along the j-th geodesic direction. Please note that the principal geodesics {γ_j}_{j=1}^k are not orthogonal to each other as in conventional PCA, so they should be understood as an approximate basis in a local sense around the Fréchet mean μ.
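The tangent-space PCA step can be sketched as follows (illustrative Python; assumes the Fréchet mean mu has been computed already):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def pga(mats, mu):
    """Principal geodesic analysis: PCA of log-mapped data at the mean mu."""
    Ph = sqrtm(mu); Pih = inv(Ph)
    # pull every observation back to the tangent space at mu and vectorize
    U = np.stack([(Ph @ logm(Pih @ X @ Pih) @ Ph).real.ravel() for X in mats])
    U = U - U.mean(axis=0)
    evals, evecs = np.linalg.eigh(U.T @ U / len(mats))
    order = np.argsort(evals)[::-1]
    # variances along geodesic directions, tangent directions, and local coordinates
    return evals[order], evecs[:, order], U @ evecs[:, order]

# Data varying along a single geodesic through the identity:
V = np.diag([1.0, -1.0])
mats = [expm(t * V) for t in (-1.0, 0.0, 1.0)]
variances, _, scores = pga(mats, np.eye(2))
print(np.isclose(variances[0], 4.0 / 3.0), np.allclose(variances[1:], 0.0))  # True True
```

Here all the variance concentrates in the first geodesic direction, as expected for data lying on a single geodesic.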

Simulation 1. Why should the SPD geometry be used?
Beyond the mathematical characterization of SPD, which is clearly different from that of a vector space, we describe why the geometry of SPD endows a natural interpretation with respect to the analysis of representations of functional connectivity. Assume we have a population of covariance connectivity matrices {Σ_i} = {Σ_1, …, Σ_n}. It is often of interest to investigate partial associations among different ROIs using precision matrices {Λ_i} = {Λ_1, …, Λ_n}, where each Λ_i is an inverse covariance matrix Σ_i^{-1}. That means we must have Σ_i Λ_i = Λ_i Σ_i = I for all i, where I is an identity matrix. It is then natural to ask whether the averages Σ̄ and Λ̄ also satisfy the relationship Λ̄Σ̄ = Σ̄Λ̄ = I. This idea is represented in Fig. 5, where Λ̄ is an inverse of Σ̄.
If we consider Euclidean geometry, this is not necessarily true. Considering {Σ_i} and {Λ_i} as elements of Euclidean space, we can define the averages Σ̄ = (1/n) Σ_{i=1}^n Σ_i and Λ̄ = (1/n) Σ_{i=1}^n Σ_i^{-1}. It is therefore not guaranteed that the Euclidean averages of covariance and precision matrices are inverses of each other unless Σ_i Σ_j^{-1} = I for all i ≠ j.
On the other hand, averages defined as Fréchet means over {Σ_i} and {Λ_i} under the geometry induced by AIRM satisfy the above statement under mild conditions. Since matrix inversion is a bijective map under which the AIRM geodesic distance is preserved, the Fréchet mean of the precision matrices is the inverse of the Fréchet mean of the covariance matrices whenever the mean is unique.
We empirically show how the choice of geometry and the corresponding definition of average/mean leads to consistency with this intuition. We generated 100 covariance matrices Σ_i, i = 1, …, 100, identically and independently from a Wishart distribution W_p(V, ν) with dimension p = 5 (network size), ν = 20 degrees of freedom (the length of the time series) for strict positive definiteness, and scale matrix V = I_5. After computing the precision matrices Λ_i = Σ_i^{-1}, we computed both Euclidean means and Fréchet means, under both the AIRM and LERM frameworks, of the covariance and precision matrices respectively. For each run, the gap is obtained as ‖Λ̄Σ̄ − I_5‖_F to see how close Λ̄ is to Σ̄^{-1}, where ‖A‖_F = √Tr(A^⊤A) is the Frobenius norm. If an average precision matrix is truly the inverse of a mean covariance matrix, the gap should be close to 0. Fig. 6 shows the distributions of the Euclidean and Fréchet gaps over 1000 repeated experiments. The empirical observation is that the geometry-induced concept of the Fréchet mean aligns with our intuition, while simply taking Euclidean averages of covariance and precision matrices does not preserve the inverse relationship between the two averages.
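The experiment can be reproduced in a few lines. Here we illustrate with the log-Euclidean mean, for which logm(Σ^{-1}) = −logm(Σ) makes the inverse relationship exact (a sketch, not the paper's exact AIRM computation):

```python
import numpy as np
from scipy.linalg import expm, logm
from scipy.stats import wishart

p, df, n = 5, 20, 100
covs = wishart.rvs(df=df, scale=np.eye(p), size=n, random_state=0)
precs = [np.linalg.inv(S) for S in covs]

# Euclidean averages lose the inverse relationship...
gap_euc = np.linalg.norm(np.mean(precs, axis=0) @ np.mean(covs, axis=0) - np.eye(p), 'fro')

# ...while log-Euclidean (geometry-aware) means preserve it.
S_le = expm(np.mean([logm(S) for S in covs], axis=0))
L_le = expm(np.mean([logm(P) for P in precs], axis=0))
gap_le = np.linalg.norm(L_le @ S_le - np.eye(p), 'fro')
print(gap_euc > 0.1, gap_le < 1e-8)  # True True
```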

Simulation 2. Modularity-optimization of multiple k-means clustering
As an example of clustering covariance matrices, we generated three types of covariance matrices of size five. We generated 30 covariance matrices from three model matrices C1, C2, and C3 (10 for each). The three model covariance matrices are shown in Fig. 7a. We then added a zero-mean Gaussian noise matrix, weighted by 0.15, to each covariance matrix. The generated covariance matrices are shown in Fig. 7b. For the 30 covariance matrices, we executed 100 runs of k-means clustering with k = 3 and constructed a frequency adjacency matrix (Fig. 7c). From the frequency adjacency matrix, modularity optimization identifies three modules and their cluster indices. We extracted a centroid for each cluster by taking the geodesic average of all its members (Fig. 7d).

Simulation 3. Dynamic connectivity and state transition analysis
As an extension of the clustering example, we conducted a state-transition analysis of functional connectivity dynamics (Damaraju et al., 2014) on the SPD manifold. Using a fixed transition matrix, we simulated a state sequence (length = 30) consisting of three heterogeneous states corresponding to the initial covariance matrices C1, C2, and C3 (Fig. 8a). For the 30 correlation matrices (Fig. 8b), we conducted modularity optimization of multiple k-means clusterings. The method generates a frequency adjacency matrix from 100 repetitions of k-means clustering (Fig. 8c), which was then used for modularity optimization. The cluster centroids are displayed in Fig. 8d, and the original and estimated state sequences are displayed in Fig. 8e.
Fig. 7. An example of clustering covariance matrices. Three initial covariance matrices (a) were used to generate 30 random variations around them (10 each for C1, C2, and C3) (b). (c) displays the frequency adjacency matrix of the 30 covariance matrices used for modularity optimization; it indicates, for each pair of covariances, the frequency of being assigned to the same cluster over 100 repeated k-means clusterings. (d) displays the centroids of the three covariance groups.
Fig. 8. State transition analysis of dynamic connectivity. Based on three main correlation matrices C1, C2, and C3 (a), a total of 30 covariance samples were generated according to a state transition sequence (b). Modularity-based k-means clustering was performed after composing a frequency adjacency matrix (c) over 100 iterations of k-means clustering. The resulting centroids are displayed in (d), and the original (red) and estimated (blue) sequences are displayed in (e). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Simulation 4. Smoothing and regression
We conducted smoothing of simulated noisy covariance matrices. The simulated covariance matrices were geometrically positioned between the two covariance matrices C1 and C2, with an increasing distance weight w (w = t² for t = [0, 0.1, 0.2, …, 1]) for C2 and 1 − w for C1 on the tangent plane (Fig. 9a). We then added a zero-mean Gaussian noise matrix, weighted by 0.1, to each covariance matrix and projected the noisy covariance matrices back onto the manifold so that they become perturbed observations along the geodesic. The generated covariances and their noisy versions are shown in Fig. 9b. We conducted 5-fold cross-validation to determine the kernel bandwidth h from 10 exponentially equi-spaced values between exp(−1) and exp(1). For each level of h, we applied the smoothing function to the noisy covariance matrices. The results of smoothing and bandwidth selection are presented in Fig. 9b and 9d.
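The smoothing step can be sketched as a Nadaraya-Watson weighted average computed in the tangent (log) space, under a log-Euclidean reading of the procedure; the helper names are ours, not the toolbox API:

```python
import numpy as np

def spd_log(M):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    """Matrix exponential of a symmetric matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def nw_smooth(ts, mats, t0, h):
    """Nadaraya-Watson smoother: Gaussian-kernel weights over the
    covariate, averaged in the log space and mapped back to SPD."""
    w = np.exp(-0.5 * ((np.asarray(ts) - t0) / h) ** 2)
    w = w / w.sum()
    S = sum(wi * spd_log(M) for wi, M in zip(w, mats))
    return spd_exp(S)

# Smoothing constant data returns the same matrix at any t0 and h.
A = np.array([[1.5, 0.3],
              [0.3, 1.0]])
out = nw_smooth([0.0, 0.5, 1.0], [A, A, A], t0=0.25, h=0.5)
```

The bandwidth h would then be chosen by the cross-validation loop described above.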
We then applied this approach to a regression problem (Fig. 10). We squared 50 uniformly distributed weights and used them as interpolation weights w for geometric locations between the two covariance matrices C1 and C2. We then added Gaussian noise in the tangent space and projected the noisy matrices back to satisfy the SPD condition. Using the bandwidth h with the minimal k-fold cross-validation error, we estimated an interpolation function that predicts the covariance matrix at any geometric location (corresponding to a weight w).

Simulation 5. Principal geodesic analysis
We generated 30 covariance samples by interpolating three target covariances (C1, C2 and C3) with random weights (Fig. 11a). The weights were drawn from a uniform distribution, but the weights for C1 were multiplied by 1.2 to emphasize variation along C1. The weighted sum was computed in the tangent space. The resulting covariance samples are presented in Fig. 11b. We then applied principal geodesic analysis to the covariance matrices to find the principal geodesics and their weights for the samples (Fig. 11c).
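Under the log-Euclidean metric, principal geodesic analysis reduces to ordinary PCA of the log-mapped matrices. A Python sketch of this simplification (our reading, not necessarily the toolbox implementation):

```python
import numpy as np

def spd_log(M):
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def pga(mats, n_comp=3):
    """Tangent-space PGA: vectorize matrix logs, center, run PCA via SVD.
    Returns explained variances, components, and per-sample scores."""
    X = np.array([spd_log(M).ravel() for M in mats])
    X = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2 / (len(mats) - 1)
    return var[:n_comp], Vt[:n_comp], X @ Vt[:n_comp].T

# Samples varying along a single tangent direction D: all variance
# should load on the first principal geodesic.
D = np.array([[0.0, 0.2],
              [0.2, 0.1]])
mats = [spd_exp(t * D) for t in np.linspace(-1.0, 1.0, 10)]
var, comps, scores = pga(mats, n_comp=2)
```

With samples on a single geodesic, the first component captures essentially all the variance, mirroring the emphasized C1 direction in the simulation.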

Simulation 6. Test of equal distributions
In the statistical literature on hypothesis testing, simulation studies are often adopted to demonstrate the power of a proposed procedure (Lehmann and Romano, 2010). Among several indices, we report the empirical type 1 error, which measures the proportion of false rejections when the null hypothesis H₀ of equal distributions is true. In our simulation, we use two exemplary tests with samples drawn from the correlation matrices C2 and C3. For each group, we generate samples from a Wishart distribution, using the designated correlation matrix as the scale parameter and the time series length as the degrees of freedom ν, and normalize the matrices to have unit diagonals. We used ν = 50 to guarantee strict positive definiteness. We varied the number of observations (corresponding to subjects in a group study) in each set from n = 5 to 50 to check the impact of sample size on the performance. For T = 500 iterations, two sets of correlation matrices with sample sizes from n = 5 to 50 were randomly generated according to the procedure described above, and p-values were recorded. The empirical type 1 error at the pre-defined significance level α is then obtained as the fraction of iterations whose p-value falls at or below α, i.e., (1/T) Σₜ 1{pₜ ≤ α}, and we conclude the test is valid if the empirical type 1 error is close to α. As shown in Fig. 12, both examples show type 1 errors close to α = 0.05 for different numbers of observations.
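The sampling and error-counting steps can be sketched as follows: draw a Wishart-distributed matrix by accumulating Gaussian outer products with the designated correlation matrix as scale, normalize to unit diagonal, and tally p-values. A Python sketch under our reading of the procedure (the p-values below are placeholders, not output of the actual test):

```python
import numpy as np

def wishart_corr(C, nu, rng):
    """Sample W ~ Wishart(C, nu) as X'X with X ~ N(0, C), then
    normalize W to a unit-diagonal correlation matrix."""
    X = rng.multivariate_normal(np.zeros(len(C)), C, size=nu)
    W = X.T @ X
    d = np.sqrt(np.diag(W))
    return W / np.outer(d, d)

rng = np.random.default_rng(0)
R = wishart_corr(np.eye(4), nu=50, rng=rng)

# Empirical type 1 error from T recorded p-values p_t at level alpha:
# alpha_hat = (1/T) * #{t : p_t <= alpha}.
pvals = rng.uniform(size=500)   # placeholder p-values under H0
alpha_hat = np.mean(pvals <= 0.05)
```

With ν greater than the matrix dimension, the sampled matrix is strictly positive definite almost surely, matching the ν = 50 choice above.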

Experimental data analysis
For the experimental data analysis, we used 734 rs-fMRI datasets (2016 release, male: N = 328, female: N = 405, age: 22-37, mean: 28.7, std: 3.7) from the Human Connectome Project (HCP) database (Van Essen et al., 2012). All data were acquired over four sessions with 1200 time points per session and a sampling interval of 0.72 s per volume. rs-fMRI data were preprocessed according to the HCP minimal preprocessing pipeline and mapped onto cortical surfaces (Glasser et al., 2013). For the analysis of brain networks, we extracted time series at ten brain regions in the cortical surface-based atlas (Desikan et al., 2006) corresponding to the default mode network (DMN). They are the superior frontal gyrus (SFG, also known as the dorsomedial prefrontal cortex), medial orbitofrontal cortex (MOF, corresponding to the ventromedial prefrontal cortex), parahippocampus (as a part of the hippocampal formation), precuneus and inferior parietal lobe in each hemisphere (Fig. 13a), which are commonly found in previous studies of the DMN (Buckner et al., 2008; Power et al., 2011; Yeo et al., 2011). The first eigenvariate derived from principal component analysis of the fMRI time series at the surface nodes of each region was used as the regional BOLD signal summary.

Fig. 9. An example of smoothing continuously varying correlation matrices with Gaussian noise. A set of noisy continuous covariance matrices (the first row in (b)) was generated by weighted summation of two covariance matrices (C1 and C2 in Fig. 7) with Gaussian noise in the tangent plane. The weights are presented in (a). For ten different bandwidths, the distances between samples and their corresponding regressed versions are presented in (c). The 5-fold cross-validation (CV) errors for all bandwidths are presented in (d). The original covariance samples are displayed in the first row and their noisy versions in the second row of (b). The lowest CV error was found at the bandwidth of 1.396, the 9th row in (b).
Pearson correlation of the rs-fMRI time series was used as the measure of functional connectivity. The geodesic average over all 734 datasets and exemplary functional connectivity (correlation) matrices of 30 subjects are presented in Fig. 13b and 13c.

Experiment 1. Regression analysis
We conducted a regression analysis of connectivity matrices according to behavior scores. Among the many behavior scores in the HCP database, we chose the Penn Matrix Test (PMAT) and the Pittsburgh Sleep Quality Index (PSQI) to test the feasibility of geodesic regression analysis for SPD matrices. The PMAT24, developed by Gur and colleagues (Bilker et al., 2012), is a shortened version of Raven's Progressive Matrices that measures fluid intelligence. To represent fluid intelligence, the number of correct responses (PMAT24_A_CR) and the total number of skipped items (PMAT24_A_SI) were used.
The PSQI is a widely used 19-item self-rated questionnaire that assesses sleep quality and disturbances over a one-month interval (Buysse et al., 1989). In the SPD regression analysis, we determined the kernel bandwidth by 5-fold cross-validation over 20 exponentially equidistant bandwidths (Fig. 14a, c and e). With the kernel bandwidth showing the minimal CV error (for concave CV-error curves, e.g., PMAT24_A_CR and PMAT24_A_SI) or the knee of the CV-error curve (for curves without any local minimum, e.g., the PSQI scores), the 734 connectivity matrices at different scores for each measure were used to estimate the regression function. The representative connectivity matrix at each score according to the learned regression function is presented in Fig. 14b, d and f.

Experiment 2. Principal geodesic analysis (PGA)
Principal geodesic analysis was applied to the 734 connectivity matrices, from which nine principal geodesic components were derived. The explained variances of all nine components, the top three principal geodesic components and their coefficients (the weights of the connectivity matrices projected onto each component) are displayed in Fig. 15a-c. Component 3 differs from the other components in that the module composed of the bilateral inferior parietal lobe and the precuneus is differentiated from the other brain regions. Component 2 differs from component 1 in the isolated medial orbitofrontal cortex. Geodesic component 1 explains most of the variability of the connectivity matrices, as shown in the distribution of coefficients (Fig. 15c). We also evaluated Euclidean principal component analysis of the covariance matrices treated as a set of vectors (one per subject) without considering the covariance structure. The results are presented in Fig. 15d-f. The distributions of components and their coefficients in the Euclidean analysis differ from those of the geodesic analysis.

Fig. 10. An example of regression analysis of covariance samples. By weighted summation of the C1 and C2 covariance matrices (a) using a squared weight function w (the third panel of a), followed by adding Gaussian noise, we generated 30 samples of correlation matrices (b). The evaluation of regression performance was done at the weight points shown in (c). For ten different bandwidths, the distances between samples and their corresponding regressed versions are presented in the middle panel of (c). The 5-fold cross-validation (CV) error for each bandwidth is presented in the bottom panel of (c). The original covariance samples and the regressed correlation matrices (with the optimal bandwidth) at ten uniformly spaced weight points are displayed in (d).

Experiment 3. Clustering
Modularity optimization of multiple k-means clustering was applied to the 734 connectivity matrices. A total of 100 runs of k-means clustering constituted a frequency adjacency matrix (Fig. 16a), which was sorted by modularity optimization (Fig. 16b). The frequency adjacency matrix indicates how often each pair of connectivity matrices is assigned to the same cluster across the k-means partitions. The connectivity matrices of the 734 subjects were categorized into three clusters by this approach. The geodesic mean and Euclidean mean of each cluster and their differences are presented in Fig. 16c-e. The cluster centers 1, 2 and 3 correspond closely to the geodesic principal components 1, 3 and 2 in Fig. 15.
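The frequency adjacency matrix itself is metric-agnostic; given the label vectors from repeated k-means runs, it can be built as in this sketch (our naming, not the toolbox API):

```python
import numpy as np

def frequency_adjacency(label_runs):
    """Fraction of runs in which each pair of samples falls in the
    same cluster, from a (n_runs x n_samples) array of labels."""
    L = np.asarray(label_runs)
    F = np.zeros((L.shape[1], L.shape[1]))
    for run in L:
        F += (run[:, None] == run[None, :])
    return F / len(L)

# Two toy runs over three samples: samples 0-1 agree in the first run,
# samples 1-2 in the second, so each pair co-occurs half the time.
F = frequency_adjacency([[0, 0, 1],
                         [0, 1, 1]])
```

Modularity optimization is then applied to F as a weighted graph, which is what makes the final partition robust to individual k-means runs.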

Experiment 4. Dynamic connectivity
Dynamics of functional connectivity were evaluated by clustering sliding-window connectivity matrices and then calculating a transition matrix among the clusters. The rs-fMRI time series (1200 temporal scans) from a subject was divided into 83 epochs by a sliding-window approach with a window size of 42 samples (30 s at 0.72 s/sample) and an overlap of 28 samples (20 s) (Fig. 17a). We calculated correlation matrices for all the epochs. A total of 100 runs of k-means clustering were applied to the 83 correlation matrices in the SPD space to compose a frequency adjacency matrix (Fig. 17b), which was used for modularity optimization (Fig. 17c). We present the three geodesic cluster centers (Fig. 17d) and the transition matrix that counts the transitions among the clusters (Fig. 17e).
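The windowing arithmetic above (1200 scans, window 42, overlap 28, hence step 14 and 83 epochs) can be verified with a short sketch on surrogate data:

```python
import numpy as np

def sliding_corr(ts, win, overlap):
    """Correlation matrices over sliding windows of a (time x regions)
    array; the window advances by step = win - overlap samples."""
    step = win - overlap
    starts = range(0, ts.shape[0] - win + 1, step)
    return [np.corrcoef(ts[s:s + win].T) for s in starts]

rng = np.random.default_rng(0)
ts = rng.standard_normal((1200, 10))   # surrogate for 10-region DMN series
mats = sliding_corr(ts, win=42, overlap=28)
```

Each windowed correlation matrix is itself SPD (for windows longer than the number of regions), so the clustering machinery above applies unchanged.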

Experiment 5. Performance comparison between AIRM and LERM
We present the reliability and computational burden of intrinsic Fréchet mean computation using AIRM against its log-Euclidean counterpart (LERM) in Fig. 18. It is common practice to take the extrinsic mean as a starting point for the gradient descent algorithm because of its asymptotic convergence (Bhattacharya and Bhattacharya, 2012). This means that the strategy we take inevitably requires additional computing time for an iterative scheme, but the overhead may be negligibly small given a good starting point. We compare the performance of the two aforementioned frameworks in computing the Fréchet mean of 1000 covariance matrices Σᵢ, i = 1, …, 1000, independently generated from a Wishart distribution W(V, ν) with a scale matrix V and ν = 10000 degrees of freedom, for a range of dimensions (network sizes) p = 50, 100, …, 500; this ν is sufficient to guarantee the positive definiteness of the generated samples. The mean squared difference between the mean covariances estimated using AIRM and LERM is presented in the left panel of Fig. 18, and the CPU times in seconds are displayed in the right panel of Fig. 18. The experiment was conducted on a personal computer with a quad-core Intel i7 CPU running at 4 GHz and 24 GB RAM (Apple iMac, 2014) using MATLAB R2019b without any explicit parallel computing.

Fig. 11. An example of principal geodesic analysis (PGA) of correlation matrices. The 30 correlation matrices in (b) were generated by randomly weighting the C1, C2 and C3 correlation matrices in (a). The weights for C1 were multiplied by 1.2. The explained variances of the principal components are displayed in the first panel of (c). The sample distributions along the principal component axes are shown in the second panel of (c). The initial weights for C1 and the weights for the first principal component (PC1) are displayed in the third panel of (c).
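The warm-started AIRM Fréchet-mean iteration described in Experiment 5 can be sketched in Python as a simplified fixed-point scheme, initialized at the log-Euclidean (extrinsic) mean; the names and loop details are ours, not the toolbox implementation:

```python
import numpy as np

def sym_fun(M, f):
    """Apply a scalar function to the eigenvalues of a symmetric M."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T

def airm_mean(mats, n_iter=100, tol=1e-12):
    """Riemannian gradient descent for the AIRM Frechet mean,
    initialized at the log-Euclidean (extrinsic) mean."""
    G = sym_fun(np.mean([sym_fun(M, np.log) for M in mats], axis=0), np.exp)
    for _ in range(n_iter):
        Gih = sym_fun(G, lambda w: 1.0 / np.sqrt(w))   # G^{-1/2}
        # mean of the log-maps of the samples, expressed at G
        S = np.mean([sym_fun(Gih @ M @ Gih, np.log) for M in mats], axis=0)
        Gh = sym_fun(G, np.sqrt)                        # G^{1/2}
        G = Gh @ sym_fun(S, np.exp) @ Gh
        if np.linalg.norm(S) < tol:
            break
    return G

# The AIRM mean of A and A^{-1} is the identity (here the warm start
# already lands on the solution, so the loop exits immediately).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
G = airm_mean([A, np.linalg.inv(A)])
```

Each iteration costs a handful of eigendecompositions per sample, which is why a good extrinsic starting point keeps the AIRM overhead small in Fig. 18.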

Discussion
Use of the correlation matrix for resting-state or task-dependent fMRI has been a basic tool for functional brain network analysis. The implicit assumption of conventional functional network analysis is that each edge is constructed (and can be dealt with) separately from the other edges by repeated pairwise correlation analysis. Such analyses have seldom considered the correlation matrix as a whole and often neglect the correlated nature of edges. In some applications, multiple edges are modeled as jointly involved in brain diseases or cognition. However, the vectorization-based multivariate approach does not acknowledge that each edge is itself a result of multi-nodal activities and is consequently interrelated with the others. It is not difficult to find neurobiological examples where this basic assumption does not hold. For example, some nodes (such as the ventral tegmental area and the substantia nigra pars compacta in the dopamine pathway, or the raphe nucleus in the serotonin pathway) play modulatory roles for multiple edges (thus, the edges are interrelated). Therefore, a single edge may well be considered in association with other edges. This is the rationale for dealing with the correlation matrix as a whole.

Fig. 13. The brain regions used in the evaluation of the default mode network. a) The brain network includes the superior frontal gyrus (SFG), medial orbitofrontal cortex (MOF), parahippocampus (PHC), precuneus (PrC) and inferior parietal lobe (IPL) according to the Desikan-Killiany atlas (Desikan et al., 2006). The brain regions in the left hemisphere are presented. The first letters "l" and "r" in each region's name indicate the left and right hemisphere. b) The geometric average of the group of 734 functional connectivity matrices (FC) is presented. Each square pixel in the connectivity matrix indicates an edge between brain regions. Dots in (b) indicate edges that show 5% differences between the Euclidean average and the geometric average of the group connectivity matrices. c) Functional connectivity matrices of 30 exemplary subjects are displayed.
Since the correlation matrix is an SPD object, and SPD matrices form a cone-shaped Riemannian manifold, it is advantageous to treat correlation matrices as manifold-valued objects with the corresponding geometric structure. The persistent inversion of covariance matrices for groups in Figs. 5 and 6 is one such example. Several examples of functional connectivity analysis using correlation matrices on the SPD manifold can also be found (Dai et al., 2016; Dai et al., 2020; Deligianni et al., 2011; Ginestet et al., 2017; Ng et al., 2016; Qiu et al., 2015; Rahim et al., 2019; Varoquaux et al., 2010; Yamin et al., 2019). Varoquaux et al. (2010) introduced the Fréchet mean and variation of correlation matrices to detect functional connectivity differences in post-stroke patients using group-level covariance averages and demonstrated increased statistical power in the detection compared to the Euclidean space. They also proposed a bootstrap procedure for pairwise coefficient analysis in the tangent space to ensure mutual independence among coordinates. Compared to their study, we propose bootstrapping directly on intra-group differences to infer the statistical significance of edge differences. Rahim et al. (2019) utilized SPD geometry in estimating the functional connectivity of individuals by pulling all covariance matrices to the tangent space around the Fréchet mean and constructed an empirical prior from the tangentialized data to be used as a group-level shrinkage mechanism for individual covariances. In estimating behaviors from covariance matrices, Ng et al. (2016) utilized a whitening transport approach, which applies the log mapping at the identity matrix after removing the reference covariance effect, instead of the conventional parallel transport approach. Ginestet et al. (2017) proposed a hypothesis testing method for binarized connectivity matrices in the SPD space using the graph Laplacian matrix, which inherits the SPD structure by its nature. Yamin et al. (2019) compared brain networks between twins using geodesic distance on the SPD manifold. Deligianni et al. (2011) proposed a regression analysis on the SPD manifold for inferring brain functional connectivity from anatomical connections. Change point detection has also been conducted in the SPD space by Dai et al. (2016), who tested the stationarity of functional connectivity obtained via the minimal spanning tree in conjunction with resampling-based decision rules. Dai et al. (2020) also proposed a metric for trajectories on the SPD manifold to learn individuals' dynamic functional connectivity at the population level with distance-based learning methods; this differs from the smoothing approach of Lin et al. (2017) adopted here in that we pursue nonparametric modeling of the smoothed trajectory itself. Beyond fMRI, Sadatnejad et al. (2019) introduced a clustering method on the SPD manifold to represent EEG signals. Several studies have conducted dimension reduction for machine learning in the SPD space. Qiu et al. (2015) conducted manifold learning on functional networks using regression on age after transforming covariances to the Euclidean space by applying the logarithmic map at the identity matrix. Davoudi et al. (2017) suggested a dimensionality reduction based on distance preservation to local means for symmetric positive definite matrices. Xie et al. (2017) also conducted classification after dimension reduction based on bilinear sub-manifold learning of SPD matrices.
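Several of the studies above (Varoquaux, Rahim, Ng, Qiu) share one primitive: projecting covariance matrices into a common tangent space by whitening with a reference matrix (often the Fréchet mean) and taking the matrix logarithm. A minimal sketch under that convention; the function names are ours:

```python
import numpy as np

def sym_fun(M, f):
    """Apply a scalar function to the eigenvalues of a symmetric M."""
    w, V = np.linalg.eigh(M)
    return (V * f(w)) @ V.T

def tangent_embed(mats, ref):
    """Whitening transport: S_i = log(ref^{-1/2} M_i ref^{-1/2}),
    mapping each SPD matrix into the tangent space at ref."""
    Rih = sym_fun(ref, lambda w: 1.0 / np.sqrt(w))
    return [sym_fun(Rih @ M @ Rih, np.log) for M in mats]

# The reference itself maps to the zero tangent vector.
ref = np.array([[2.0, 0.3],
                [0.3, 1.0]])
S = tangent_embed([ref], ref)[0]
```

The resulting symmetric tangent vectors live in a flat space, which is what makes Euclidean tools (priors, regression, bootstrap) applicable downstream.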
The current study is in line with, and extends, those previous studies by introducing additional algorithms for functional brain network analysis that were previously used or conceptualized in the SPD research fields. The inference algorithms we newly introduce to functional connectivity analysis are smoothing and regression, principal geodesic analysis, clustering and statistical tests for correlation-based functional connectivity.
Regression on the SPD manifold aims to find a relationship between a covariate and a covariance matrix, generally approximated by minimizing errors (measured data minus values predicted by a learned regression function). This regression framework is not trivial for SPD data, since the concept of subtraction is not straightforward on a manifold. Lin et al. (2017) proposed a powerful regression method associating manifold-valued responses with vector-valued explanatory variables by applying kernel regression to transformed responses under an embedding that preserves geometric information (Nadaraya, 1964; Watson, 1964). We utilized this nonparametric regression framework to associate functional connectivity with a one-dimensional covariate.

Fig. 14. To optimize the regression kernel bandwidth, 5-fold cross-validation (CV) errors were calculated for every bandwidth h (a, c and e). The bandwidths were set to increase exponentially from exp(−3) to exp(1) in 20 steps. The bandwidth with the minimal CV error or at the knee point of the curve was chosen as optimal (green rectangle). The regression results are presented in the form of connectivity matrices for 10 equispaced scores x from the minimum to the maximum score (b, d and f). To highlight changes in the edges for visualization, we marked edges with 20%, 30% and 40% change ratios between the connectivity matrices of the lowest score (the regressed connectivity C1 for the first score) and the highest score (the regressed connectivity Cn for the last score). The ratio r was defined as the absolute difference between the connectivity for the first and last scores, normalized by the mean connectivity across all scores, r = |Cn − C1| / ((1/n) Σᵢ Cᵢ), where Cᵢ is the connectivity matrix at score i. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Although we focused on a one-dimensional covariate for simplicity, this regression can easily be extended to multiple covariates. This approach differs from most previous regression analyses of functional networks, which have dealt with correlation matrices edge by edge, treating each edge separately (Dosenbach et al., 2010; Siman-Tov et al., 2016). The introduced regression also differs from Qiu et al. (2015), who performed Locally Linear Embedding to embed manifold-valued covariates into Euclidean space so that conventional regression models can be used directly. However, the method of Qiu et al. (2015) does not guarantee a straightforward interpretation of the acquired regression coefficients.
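Under a log-Euclidean embedding, the kernel regression estimator described above takes the familiar Nadaraya-Watson form (a schematic reading on our part, not necessarily the exact formulation of Lin et al., 2017):

```latex
\hat{\Sigma}(x) \;=\; \exp\!\left(
  \frac{\sum_{i=1}^{n} K_h(x - x_i)\,\log \Sigma_i}
       {\sum_{i=1}^{n} K_h(x - x_i)}
\right)
```

where $K_h$ is a kernel with bandwidth $h$, the $x_i$ are the observed covariates (e.g., behavior scores), the $\Sigma_i$ are the observed connectivity matrices, and $\log$ and $\exp$ denote the matrix logarithm and exponential.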
In SPD smoothing and regression analysis, the main issue is how to determine the kernel bandwidth (or smoothness) appropriate to the SPD data. Since the current geodesic smoothing is a data-driven approach, we opted for a cross-validation scheme to determine the kernel bandwidth h. When we applied the proposed regression analysis to functional networks as a function of behavior scores, i.e., the PSQI or PMAT scores, it revealed meaningful architecture in the functional network. The results suggest the advantage of SPD regression for exploring how the functional network changes along a covariate as a whole, not element by element. In the current study, we evaluated the performance of the smoothing and regression analysis at a prespecified noise level. Further evaluation over a more extensive range of noise levels would be worthwhile for understanding the performance of the proposed method.

Fig. 15. Principal geodesic analysis and Euclidean principal component analysis of connectivity matrices. The explained variance for 9 components is presented in (a). The top three principal geodesic components (gPC) derived from the 734 connectivity matrices are presented in (b). The distributions of the coefficients for all connectivity matrices projected onto the top three components are presented in (c). Euclidean principal component analysis shows a different explained-variance curve (d) and components (e) compared to those of the geodesic analysis. Distributions of the 734 coefficients for the three principal components, ePC1, ePC2 and ePC3, are displayed in (f).
As a multivariate analysis of covariance matrices, principal component analysis (Leonardi et al., 2013) and independent component analysis have been used in the Euclidean space. Principal geodesic analysis extended this setting to manifold-valued data. Since there is no direct counterpart of the empirical covariance matrix and its eigendecomposition on a Riemannian manifold, a local approach was taken first. So far, principal geodesic analysis has been applied to shape analysis (Aljabar et al., 2010) and diffusion tensor analysis. In this study, we introduce principal geodesic analysis to the analysis of functional networks in the form of correlation matrices. As shown in Fig. 15, there are significant differences in the principal components and their coefficients between the two approaches. As a limitation of principal geodesic analysis for network analysis, little is known about the effects of noise (in the covariance matrix) on the results. We also cannot disregard the computational cost for large-sample, large-scale network analysis.
k-means, though powerful, is known to suffer from two major drawbacks. First, the problem is NP-hard in general, so finding the optimal solution is troublesome (Aloise et al., 2009). Lloyd's algorithm provides a heuristic for obtaining suboptimal solutions given an initialization, but the variability of outcomes across different initial partitionings cannot be removed (Celebi et al., 2013). Although many variants have been proposed to overcome this instability, including the popular initialization method k-means++ (Arthur and Vassilvitskii, 2007), we chose a random initialization scheme, since the extra computational cost of such methods can be high in our context and some of them cannot be transferred directly to our setting, where Euclidean structure is not available. Good-enough solutions can still be achieved via parallel computation over multiple initializations. The second drawback is the selection of the number of clusters k. Unlike probabilistic and Bayesian methods, where statistical inference on k is achievable within the modeling framework, the family of k-partitioning algorithms (k-means and its variants) requires post hoc model selection to determine the number of clusters. Among many cluster quality scores, we implemented an adaptation of the celebrated Silhouette method (Rousseeuw, 1987) for our context, since it relies only on a distance metric.
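Because the Silhouette score needs only pairwise distances, it works unchanged with AIRM or log-Euclidean distances. A sketch of such an adaptation (our implementation, not the toolbox's):

```python
import numpy as np

def silhouette(D, labels):
    """Mean silhouette score from a precomputed distance matrix D
    (any metric) and cluster labels (at least two clusters)."""
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue                      # skip singleton clusters
        a = D[i, same].mean()             # mean intra-cluster distance
        b = min(D[i, labels == k].mean()  # nearest other cluster
                for k in np.unique(labels) if k != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two well-separated 1D clusters give a score near 1.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(x[:, None] - x[None, :])
score = silhouette(D, [0, 0, 0, 1, 1, 1])
```

For SPD data, D would simply be filled with geodesic distances between connectivity matrices; nothing else changes.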
To overcome the problems inherent in k-means clustering, such as the dependency on initial values and the choice of cluster number, we grafted a modularity-based clustering method (Jung et al., 2019) onto the SPD space. This approach constructs a frequency adjacency matrix that counts partition co-occurrences for each pair of samples over multiple runs of k-means clustering. Modularity optimization of the frequency adjacency matrix leads to reliable clustering of correlation matrices, robust to the selection of initial values and cluster numbers.
Of note, cluster validity indices form a thematic program for measuring the effectiveness of a given clustering, though they have not been fully extended to Riemannian manifolds. Extending the popular evaluation indices from the literature to large-scale clustering of functional connectivity matrices remains a topic for further research.
Any SPD clustering algorithm can readily be applied to the analysis of dynamic functional networks, as shown in Fig. 17. The dynamics of functional connectivity have been summarized by a transition matrix among several representative states estimated by k-means clustering in the Euclidean space (Damaraju et al., 2014). We showed that this framework for dynamic connectivity analysis can also be applied in the SPD space.
As mentioned earlier, the collection of correlation matrices constitutes a submanifold of SPD matrices with a unit-diagonal constraint. As a consequence, some operations on correlation matrices do not necessarily yield another correlation matrix. In this study, we normalized the resulting matrices back into correlation matrices whenever necessary, in the spirit of iterative projection onto the submanifold, a constrained set within the SPD manifold. The common convention of normalizing BOLD signals to remove the effect of signal magnitude inevitably converts covariance matrices into correlation matrices, yet the literature has paid little attention to the unintended consequences of this conversion. At this stage, we resort to an ad-hoc approach to handle correlation matrices within the SPD geometric structure. One candidate for overcoming this inconvenience is the elliptope structure, although the current literature largely deals with rank-deficient correlation matrices and concrete theoretical studies are lacking at the moment. For the full-rank scenario, one could consider a Riemannian submersion induced by the isometric action of the set of positive diagonal matrices, from Lie group theory, to extend the affine-invariant framework. However, this greatly increases the computational burden while the gain over our ad-hoc approach is thin. Extensive further investigation is needed to determine what would be gained by adopting such a structure.
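The normalization used whenever an operation leaves the correlation submanifold is the standard diagonal rescaling, which preserves positive definiteness:

```python
import numpy as np

def cov_to_corr(S):
    """Project an SPD matrix onto the unit-diagonal (correlation)
    submanifold by congruence with D^{-1/2}, where D = diag(S)."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

S = np.array([[4.0, 1.0],
              [1.0, 9.0]])
R = cov_to_corr(S)
```

Because this is a congruence transformation with a positive diagonal matrix, the output remains SPD whenever the input is.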
Statistical analysis or classification of functional connectivity on the SPD manifold is recommended over directly applying those methods to vectorized correlation matrices, because the interrelatedness among elements of covariance matrices violates the "uncorrelated feature assumption" implicit in most statistical analysis and classification algorithms (Ng et al., 2016). In practice, however, analyzing functional connectivity under the SPD framework may not always be beneficial compared to the Euclidean treatment, particularly for high-dimensional networks, which incur high computational cost. One bottleneck is that, in both the AIRM and LERM geometries, the atomic operations are the matrix exponential and logarithm, whose complexity is comparable to that of the eigendecomposition of a symmetric matrix. Still, this does not necessarily hamper the application of geometric approaches, as long as one avoids analyses at scales larger than tens of thousands of variables with modern computing capabilities. Such scales are unusual in neuroimaging research (mostly fewer than 500 ROIs; it takes about 300 s to calculate the geometric mean of 1000 covariance matrices of size 500, as shown in Fig. 18), but beyond them, executing the aforementioned operations requires heavy exploitation of computing hardware, efficient memory design and specialized libraries for scalable computation, which is an interesting research direction of its own for the future.
Fig. 17. State transition analysis of dynamic functional connectivity. A rs-fMRI time series of 1200 samples was divided into 83 consecutive epochs (a window size of 42 samples with 28 samples of overlap) (a), and correlation matrices were calculated from each epoch. The vertical lines indicate the starting point of each epoch. The 83 correlation matrices were clustered into three clusters using modularity optimization of the frequency adjacency matrix (b) from 100 runs of k-means clustering. The sorted frequency adjacency matrix is presented in (c). The centers of the three modules were evaluated by the geodesic average of the connectivity matrices in each module (d). The state transition matrix among the three states (connectivity matrices) is presented in (e).

Fig. 18. Comparison between AIRM and LERM. The mean squared difference between the Fréchet means of 1000 generated covariances estimated using AIRM and LERM for different matrix sizes (left) and the average computational time (in seconds) to compute the Fréchet mean of 1000 covariance matrices for different matrix sizes (right).

We would also like to point out one requirement for the application of the introduced framework. When the number of scans is not sufficiently large compared to the number of variables, the empirical covariance, precision and correlation matrices conventionally used for fMRI connectivity analysis are likely to be rank-deficient, whereas the two geometries we revisited require these matrices to be strictly positive definite. To mitigate this issue, frameworks such as the Wasserstein, Procrustes, Cholesky and other geometries that do not require operations like the matrix exponential and logarithm can serve as alternatives (Dryden et al., 2009; Takatsu, 2011). It is also natural to consider more elaborate estimators of such matrices under assumptions like sparsity or strict non-singularity. Estimation of high-dimensional covariance and precision matrices has long been a major problem in statistics and related communities; we refer interested readers to Fan et al. (2016), Hong and Kim (2018) and Lam (2019) for reviews of this deep and expansive line of research.
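As a minimal illustration of the rank-deficiency issue and its simplest remedy, consider shrinking a singular empirical covariance toward a scaled identity (a basic stand-in for the more elaborate estimators cited above, not a method used in this study):

```python
import numpy as np

def shrink_to_spd(S, gamma=0.1):
    """Convex shrinkage toward (tr(S)/p) * I; any gamma in (0, 1]
    makes a positive semidefinite S strictly positive definite."""
    p = S.shape[0]
    return (1.0 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))   # 5 scans, 20 variables
S = np.cov(X, rowvar=False)        # rank-deficient (rank <= 4)
S_pd = shrink_to_spd(S)
```

Since adding a multiple of the identity shifts every eigenvalue upward by the same amount, the shrunken matrix is strictly positive definite and the AIRM/LERM machinery becomes applicable again.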
The geometric approach to functional connectivity analysis is still at an early stage in the neuroimaging field, and we expect more examples to emerge showing the advantages of handling correlation matrices in the SPD space. We also expect that a MATLAB-based toolbox for analyzing connectivity matrices will expedite geometric analysis of functional networks on the SPD manifold.