Taking the 4D Nature into Account Promises Significant Gains in fMRI Data Completion

Functional magnetic resonance imaging (fMRI) is a powerful, noninvasive tool that has significantly contributed to the understanding of the human brain. fMRI data provide a sequence of whole-brain volumes over time and hence are inherently four-dimensional (4D). Missing data in fMRI experiments arise from image acquisition limits, susceptibility and motion artifacts, or confounding noise removal. As a result, significant brain regions may be excluded from the data, which can seriously undermine the quality of subsequent analyses because of the large number of missing voxels. We take advantage of the 4D nature of fMRI data through a tensor representation and introduce an effective algorithm to estimate missing samples in fMRI data. The proposed Riemannian nonlinear spectral conjugate gradient (RSCG) optimization method uses tensor train (TT) decomposition, which enables compact representations and provides efficient linear algebra operations. Exploiting the Riemannian structure boosts algorithm performance significantly, as evidenced by the comparison of RSCG-TT with state-of-the-art stochastic gradient methods, which are developed in Euclidean space. We thus provide an effective method for estimating missing brain voxels and, more importantly, clearly show that taking the full 4D structure of fMRI data into account yields important gains over three-dimensional (3D) and the most commonly used two-dimensional (2D) representations of fMRI data.


I. INTRODUCTION
Functional magnetic resonance imaging (fMRI) provides a noninvasive and indirect measure of brain activity and is a vital methodology in basic and applied neuroscience research. One of the common problems in fMRI analysis is missing data. The data collected during fMRI experiments often suffer from scanner instabilities, magnetic susceptibility artifacts [1], subject motion, and task-related noise [2]. An additional challenge arises when the fMRI data are processed in their native, complex domain, as the phase images exhibit a challenging noise structure [3]. Denoising is often applied to complex-valued fMRI as a preprocessing step to increase the sensitivity of analyses [4], [5]. However, the denoising process results in the exclusion of a significant number of voxels from neuroimaging studies [3], [4]. The excluded brain regions at the cortical boundaries increase the risk of false negatives.
The removal of voxels with missing data may also increase the probability of false positives when spatial extent thresholds are applied to establish significance, because fewer voxels generate smaller cluster sizes to trigger significance [2]. Hence, missing data present an important challenge for fMRI analysis: they can result in a loss of statistical power and affect the consequent inferences, motivating the use of data completion techniques to recover or estimate the excluded or missing voxels. The missing data in fMRI studies can be either missing at random (MAR) or missing not at random (MNAR), which implies that random groups or contiguous groups of voxels are missing, respectively [6]. Magnetic field inhomogeneity is the main source of MAR values due to the variability in the response of various tissues to magnetization [7]. The variability in tissue magnetization may result in geometric distortions, which are especially pronounced at tissue/border edges, the frontal lobe proximal to the sinuses, and the temporal lobe proximal to the mastoid air cells and air canals [8]. The MNAR pattern is the more challenging missing value pattern in fMRI data. For example, task-related or resting-state fMRI (rs-fMRI) experiments may lead to the nonrandom exclusion of voxels that are associated with brain activity or the effect of interest due to standard fMRI preprocessing and motion censoring techniques [9], [10]. Another example of the MNAR pattern can arise as a result of complex-valued fMRI data quality control, when noisy regions are eliminated from the phase images [4].
One approach for solving the incomplete fMRI data problem is to use robust estimation techniques to infer missing voxels. Data completion methods are suitable for this purpose [11]–[14]. Data completion methods infer the missing values from partially observed entries and structural properties of the data. Real-world fMRI datasets exhibit correlations among their samples and hence highly correlated latent factors, which makes data completion a feasible task. The completion methods used to estimate missing values fall into three broad categories: methods based on expectation maximization (EM) [13], nonlinear optimization [15], and nuclear norm minimization [11], [16], [17]. These approaches are further categorized into low-rank matrix and tensor completion methods. The algorithms used in [18], [19], [20], [21] utilize low-rank matrix completion, in which the fMRI dataset is transformed into a two-dimensional (2D) matrix Y ∈ R^{T×V}, where T denotes the number of timepoints and V denotes the number of voxels in each brain volume.
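As a concrete illustration of this 2D view, the reshaping of a 4D array into a time-by-voxel matrix can be sketched in NumPy (the toy sizes and row-major voxel ordering are illustrative assumptions, not the cited works' exact conventions):

```python
import numpy as np

# Toy illustration of the 2D "time x voxels" view: a 4D (x, y, z, time) array
# is flattened so each brain volume becomes one row of Y (sizes are made up).
X4d = np.arange(3 * 4 * 5 * 2).reshape(3, 4, 5, 2)   # (x, y, z, time)
T_, V = X4d.shape[-1], int(np.prod(X4d.shape[:3]))   # T timepoints, V voxels
Y = X4d.reshape(V, T_).T                             # Y in R^{T x V}
assert Y.shape == (2, 60)
assert Y[1, 0] == X4d[0, 0, 0, 1]   # timepoint 1 of the first voxel
```

This flattening is exactly what discards the spatial adjacency structure that the 4D tensor view preserves.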
The majority of tensor completion methods developed for medical imaging focus mainly on three-dimensional (3D) fMRI data [22], [23], [24]. In [22], the authors made use of tensor completion for fMRI data to recover continuously missing 3D scans at select timepoints with the multiway delay embedding transform (MDT) and an alternating least squares (ALS) approach based on Tucker decomposition. However, the fMRI dataset used in [22] was transformed into a 3D representation for data completion purposes. In [23] and [24], the authors formulated the task of sampling and reconstruction of fMRI data as a 3D tensor completion problem. Hence, the advantages of the four-dimensional (4D) structure of fMRI data have not been widely exploited for data completion purposes. The state-of-the-art tensor-based methods that focus on a higher-order fMRI data representation for brain imaging analyses include [25], [26], [27].
In [25], block term decomposition (BTD) is used to present a 4D analysis approach based on blind source separation (BSS), demonstrating the potential of such a representation. There has been considerable interest in tensor-based methods for analysis [25], [28] and functional connectivity estimation [29] that use higher-order models in the medical imaging context. However, less attention has been paid to the full utilization of all available spatial and temporal modes of fMRI data for the data completion task, which could directly highlight the importance of taking the full 4D information into account.
Recently, a nonconvex TT-rank minimization method for 4D fMRI tensor completion has been developed in [30] with promising results reported for the MAR voxels pattern. Similar to this work, we view 4D fMRI data as a sequence of 3D whole-brain volumes observed at a number of consecutive time points for data completion. Mathematically speaking, an fMRI dataset is a 4D tensor with 3D space × time. Tensors capture correlation information across multiple dimensions via a high-order decomposition. Since fMRI data are inherently 4D, the 4-way representation allows us to naturally preserve the global structure of the data and maximize the use of both the spatial and temporal structures, i.e., the correlations, in the data. In addition, we make use of tensor train (TT) decomposition, which provides scalable numerical algebra operations and hierarchical compressed representations of structurally rich data sets; hence, it results in both reduced computational complexity and the efficient utilization of the covariance information.
In addition to not considering the time dimension, the majority of the algorithms for estimating missing data for medical imaging modalities were derived from schemes originally developed for Euclidean spaces. However, representing the geometry of medical images in Euclidean spaces has been shown to undermine the ability to account for joint variability either at the subject or group level [31], [32]. It was shown in [32] that brain images belong to a low-dimensional manifold that can provide a better approximation for the magnetic resonance imaging (MRI) data.
To overcome this problem, we propose a Riemannian framework for statistical learning in the context of fMRI tensor completion directly on a manifold equipped with a Riemannian affine-invariant metric [33]. In [33], the authors show that when tensors are endowed with the affine-invariant Riemannian distance, they provide a powerful framework for robust interpolation and restoration of missing data by using generalized geometrical statistics. Therefore, the estimation error can be substantially reduced by learning tensor-valued fMRI data using Riemannian methods. As demonstrated in [34], [35], [36], [32], the use of Riemannian methods yields high discriminative power in fMRI/MRI data analyses. Riemannian geometry offers a principled mathematical framework for adapting standard optimization algorithms originally designed for Euclidean geometry. The proposed approach involves a projection of the quantities expressed in Euclidean space onto the tangent space, which is equipped with an invariant, information-preserving distance. Our study seeks to adapt classical optimization algorithms into their Riemannian counterparts by substituting the Euclidean distance with the Riemannian distance and optimizing the cost function over the Riemannian space.
In recent years, much progress has been made in the research of various conjugate gradient methods that use Riemannian geometry [37], [38]. The extension of the conjugate gradient algorithm to Riemannian manifolds is accomplished by replacing the classical gradient with the natural gradient [39]. This extension is relatively simple, as it is sufficient that all the vector operations take into account the Riemannian nature of the problem space. Yao et al. [40] proposed the Riemannian Fletcher–Reeves conjugate gradient method to solve the stochastic eigenvalue problem, and Steinlechner [41] proposed a Riemannian nonlinear conjugate gradient scheme for the approximation of high-dimensional multivariate functions. To account for the nonlinearity of the search space of the original data, in this paper, we propose a novel Riemannian approach for 4D tensor completion of fMRI data. We combine the strengths of two ideas: a natural 4D tensor representation of fMRI data and a model parametrization that best fits the fMRI data. TT decomposition guarantees numerical stability, and the affine-invariant Riemannian distance allows volume-preserving transformations of the shape of the data. We show the validity of this approach and its desirable properties with fMRI data from the Center for Biomedical Research Excellence [42].
In addition to the successful demonstration of the great potential of considering the full 4D information in fMRI data, our contributions in this paper are summarized as follows:
• Viewing fMRI data as a 4D hypervolume allows us to take both temporal and spatial correlations into consideration and, therefore, to establish a joint 4D learning scheme. The joint 4D learning scheme fully captures the intrinsic relationship between the spatial and temporal modes, thus guaranteeing superior performance compared to 2D and 3D fMRI completion methods in recovering missing voxels.
• We cast the problem of estimating missing fMRI data as a smooth optimization problem on Riemannian manifolds due to the nonlinear structure of the parameter space. To the best of our knowledge, this is the first time Riemannian geometry has been used for the estimation of missing fMRI data in the form of a 4D tensor in a practical setting.
• We develop a new geometric version of the spectral-scaling conjugate gradient method. Our method exploits second-order information and offers enhanced numerical stability.
• We extend our Riemannian optimization method to TT format, which allows efficient numerical algebra operations without exponential scaling of the memory consumption and the computational complexity with the number of dimensions [43]. It has been shown in [17] that TT decomposition captures the global correlation of the tensor entries, and thus, it is an efficient tool for tensor completion methods.
• We demonstrate that the proposed Riemannian nonlinear spectral conjugate gradient with TT (RSCG-TT) method provides a robust estimate of the original data for both the MAR and MNAR missing value patterns encountered in real fMRI datasets and has performance superior to that of the state-of-the-art tensor completion methods across different rates of missing values.
We show that the proposed method outperforms state-of-the-art gradient algorithms based on canonical polyadic decomposition (CPD) and Tucker decomposition formulated in Euclidean space. The remainder of the paper is organized as follows. Section II-A introduces notations and preliminaries for TT decomposition. Appendix B-A describes the Riemannian manifold learning framework with TT decomposition, followed by the formulation of the tensor completion problem on TT manifolds in Section II-B. In Section II-C, we propose a new spectral conjugate gradient scheme in conventional Euclidean space. In Section III, we present the algorithmic development of the proposed RSCG-TT method in full. We provide a description of the experimental setup in Section IV. The efficiency and robustness of the proposed algorithm on real fMRI data with missing data patterns that reflect simulated and real-world scenarios are demonstrated in Section V. We discuss the experimental results in Section VI. Conclusions and future work are given in Section VII.

II. THEORY
In this work, we consider TT tensors as our structured tensor format of interest [44], [45]. As noted in Appendix B-A2, the set of all such tensors of a fixed multilinear rank r, which we equip with an affine-invariant Riemannian metric [46], forms a smooth embedded submanifold of R^{I_1×···×I_N} and was originally studied in [47]. With this differentiable Riemannian structure, we can construct optimization algorithms to solve the tensor completion problem for N-dimensional tensors. In Section II-B, we formulate high-dimensional data completion as a Riemannian tensor completion problem on TT manifolds. We provide the preliminaries on Riemannian manifold learning for tensor decomposition in TT format in Appendix B, and the notations and definitions of the necessary ingredients used in Riemannian TT learning are summarized in Table 2.
To provide the reader with some context on TT decomposition, we present the basic concepts and results in Section II-A. For an overview of the most common choices of tensor formats, we refer the reader to [48]. Although our presentation is self-contained, we recommend the original sources [47], [44], [49] for further clarification. The mathematical notations and definitions are adopted from [50] and [43], [51]. In this paper, we denote vectors with boldface lowercase letters (x, y, z, ...), matrices with boldface capital letters (X, Y, Z, ...), and tensors with bold calligraphic letters (X, Y, Z, ...); see Table 1.

A. TT DECOMPOSITION
A TT decomposition [44], shown in Fig. 1, represents an N th-order tensor X ∈ R^{I_1×I_2×···×I_N} entrywise as

X(i_1, i_2, ..., i_N) = G^{(1)}(i_1) G^{(2)}(i_2) ··· G^{(N)}(i_N),

where G^{(n)} ∈ R^{R_{n−1}×I_n×R_n} is the nth 3D TT core and G^{(n)}(i_n) ∈ R^{R_{n−1}×R_n} denotes its i_nth lateral slice.

TABLE 1. Notation.
⊙, ⊗ : Khatri–Rao and Kronecker products
‖·‖_F : Frobenius norm of the argument
Ω : the sampling set (the set of indices of the observed entries of X)
Ω̄ : the missing values set (the set of indices of unknown entries of X)
r = (R_0, R_1, ···, R_N) : TT rank
tr(·) : trace of the argument
⟨·, ·⟩ : inner product of the arguments

FIGURE 1. TT decomposition of an N th-order tensor X ∈ R^{I_1×I_2×···×I_N} with TT rank r = (R_0, R_1, R_2, ···, R_n, ···, R_{N−1}, R_N), where R_0 = R_N = 1. Conceptually, TT decomposition is a collection of TT cores connected in a cascade, where G^{(n)} ∈ R^{R_{n−1}×I_n×R_n} are 3D tensors and G^{(1)} ∈ R^{I_1×R_1} and G^{(N)} ∈ R^{R_{N−1}×I_N} are matrices [43].
The rank of a TT decomposition (TT rank) [44] is an (N + 1)-tuple of ranks

r = (R_0, R_1, ···, R_N), with R_0 = R_N = 1,

where R_n is the nth component of the TT rank, which controls the complexity of TT decomposition. The rank R_n is the rank of the matrix X^{<n>} ∈ R^{I_1 I_2 ··· I_n × I_{n+1} ··· I_N} obtained by the nth splitting operator [44] of tensor X as follows:

R_n = rank(X^{<n>}), n = 1, ···, N.

Fig. 1 shows that the TT cores G^{(n)} are connected in a cascade, and the leaf nodes are identities. The entries of TT decomposition for an N th-order tensor can be computed as a multilayer multiplication of slices of TT cores, where the internal edges represent the TT rank [43]. The inner product of TT tensors can be defined in terms of the standard Euclidean inner product for vectors:

⟨X, Y⟩ = vec(X)^T vec(Y),

where vec(·) is the vectorization operator.
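The entrywise evaluation of a TT tensor as a product of core slices, and the bound R_n on the rank of the nth unfolding X^{<n>}, can be sketched as follows (the sizes and random cores are illustrative assumptions, not the paper's data):

```python
import numpy as np

# Sketch of TT evaluation: X(i1,...,iN) is a product of core slices, and the
# TT rank R_n bounds the rank of the nth unfolding X^<n> (toy random cores).
I, r = [4, 5, 6, 7], [1, 3, 3, 3, 1]          # mode sizes and TT ranks
rng = np.random.default_rng(0)
cores = [rng.standard_normal((r[n], I[n], r[n + 1])) for n in range(4)]

def tt_entry(cores, idx):
    """Multiply the lateral slices G^(n)(i_n); R_0 = R_N = 1 gives a scalar."""
    out = np.eye(1)
    for G, i in zip(cores, idx):
        out = out @ G[:, i, :]
    return out.item()

def tt_full(cores):
    """Contract all cores into the full tensor (feasible for small sizes)."""
    X = cores[0]
    for G in cores[1:]:
        X = np.tensordot(X, G, axes=([-1], [0]))
    return X.squeeze(axis=(0, -1))

X = tt_full(cores)
assert np.isclose(X[1, 2, 3, 4], tt_entry(cores, (1, 2, 3, 4)))
for n in range(1, 4):   # rank of the nth unfolding is at most R_n
    assert np.linalg.matrix_rank(X.reshape(int(np.prod(I[:n])), -1)) <= r[n]
```

Note how a single entry costs only a handful of small matrix products, which is what makes TT storage and arithmetic scale gracefully with N.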

B. TENSOR COMPLETION ON TT MANIFOLDS
We consider the completion of an N-dimensional (N ≥ 3) tensor T ∈ R^{I_1×···×I_N} with partially observed entries defined in TT format. We assume the data are low-rank, and we present tensor completion as an optimization problem on a Riemannian manifold with a fixed TT rank as follows:

min_{X ∈ M_r} f(X) := (1/2) ‖P_Ω X − P_Ω T‖_F^2, (5)

where P_Ω is the projection onto the sampling set Ω ⊂ {1, 2, ···, I_1} × ··· × {1, 2, ···, I_N}, which defines the indices of the known entries of T:

(P_Ω X)(i_1, ···, i_N) = X(i_1, ···, i_N) if (i_1, ···, i_N) ∈ Ω, and 0 otherwise.

We choose TT decomposition as our structured tensor representation to solve the problem in (5). The choice of the TT format is driven by the requirement to address tensor completion for very high-dimensional (N ≥ 4) problems and the property of TT decomposition, which captures the global correlation of the tensor entries [17] due to its well-balanced matricization scheme. The TT format allows us to develop an algorithm that scales linearly in the number of dimensions N when the tensor size is I and polynomially in the TT rank r, yielding a total asymptotic complexity of O(N|Ω|r^2). We exploit the Riemannian structure of the TT manifold [47] by viewing (5) as an unconstrained optimization problem on M_r, which allows for the use of Riemannian optimization techniques [37], [49]. The Euclidean gradient for problem (5) with respect to (w.r.t.) X is given by

∇f_X(X) = P_Ω X − P_Ω T, (7)

where ∇f_X(X) ∈ R^{I_1×···×I_N}, and the Riemannian gradient is obtained by projecting (7) onto the tangent space T_X M_r:

grad f(X) = P_{T_X M_r}(P_Ω X − P_Ω T). (8)
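The objective (5) and its sparse Euclidean gradient (7) can be sketched with a boolean mask standing in for P_Ω (a minimal dense NumPy sketch with toy sizes; the actual method operates on TT-format tensors):

```python
import numpy as np

# Dense sketch of the completion objective (5) and its Euclidean gradient (7);
# the boolean mask W plays the role of P_Omega (toy sizes, not real fMRI data).
rng = np.random.default_rng(1)
T = rng.standard_normal((4, 4, 4, 4))      # fully observed reference tensor
W = rng.random(T.shape) < 0.3              # sampling set Omega as a mask
X = rng.standard_normal(T.shape)           # current iterate

def f(X):
    return 0.5 * np.sum((W * (X - T)) ** 2)    # (1/2)||P_Om X - P_Om T||_F^2

def euclidean_grad(X):
    return W * (X - T)                         # sparse residual, cf. (7)

# central finite-difference check of one gradient entry
G = euclidean_grad(X)
E = np.zeros_like(X); E[0, 1, 2, 3] = 1e-6
fd = (f(X + E) - f(X - E)) / 2e-6
assert np.isclose(fd, G[0, 1, 2, 3], atol=1e-5)
```

The gradient is nonzero only on the observed entries, which is what keeps the per-iteration cost proportional to |Ω|.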

C. A NONLINEAR SPECTRAL CONJUGATE GRADIENT (SCG) SCHEME
To solve the tensor completion problem in (5), we use a nonlinear SCG scheme as our method of interest due to its global superlinear convergence property, low memory requirements, and suitability for solving large-scale unconstrained optimization problems.
We develop a nonlinear SCG procedure that seeks the conjugate gradient direction N_k closest to the direction of the double parameter self-scaled preconditioned Broyden–Fletcher–Goldfarb–Shanno (BFGS) update. To improve the numerical stability of our method, we use the spectral-scaling secant condition [52] while deriving the search direction. The values of the spectral parameters in the search direction are determined to minimize the spectral condition number of the BFGS approximation to the inverse Hessian of the objective function. To achieve global convergence for general functions, we use a modified BFGS update (MBFGS) [53], which possesses the global convergence property without the convexity assumption on the objective function. Hence, by incorporating the spectral-scaling preconditioner in the MBFGS update, we simultaneously use second-order information and improve the numerical stability of our method. Next, we introduce the key ingredients of our nonlinear SCG method in Euclidean space and provide a definition of our nonlinear SCG method in Riemannian space in Section III.

TABLE 2. Riemannian TT learning: symbols, definitions, and computations.
γ : geodesic curve
g : Riemannian metric
M_r : Riemannian manifold of fixed TT rank r equipped with metric g
ξ : tangent vector at point X
T_X M_r : tangent space of M_r at point X ∈ M_r
P_{T_X M_r}(·) : the orthogonal projection of the argument onto T_X M_r
grad f(X) : Riemannian gradient at X ∈ M_r, grad f(X) ≡ ξ ∈ T_X M_r; computed as grad f(X) = P_{T_X M_r}(∇f_X(X))
τ_{X→Y} : vector transport of ξ ∈ T_X M_r
P : projection operator by TT rank truncation; computed by TT-SVD
Starting from the initial guess X_0 ∈ R^{I_1×···×I_N}, the nonlinear conjugate gradient method generates a sequence X_k as follows:

X_{k+1} = X_k + α_k N_k, (9)

where α_k > 0 is the step length determined by a suitable line search strategy. The spectral search direction N_k [54] is given by

N_k = −θ_k G_k + β_k N_{k−1}, N_0 = −θ_0 G_0, (10)

where θ_k and β_k are the spectral and conjugate gradient parameters, respectively, and G_k denotes the Euclidean gradient ∇f(X_k). We denote S_{k−1} = X_k − X_{k−1} and ζ_{k−1} = G_k − G_{k−1}. A well-known drawback of spectral conjugate gradient methods with the search direction (10) is that they cannot always satisfy the sufficient descent condition [55], although they perform very well in practice for solving large-scale unconstrained optimization problems. Here, the sufficient descent condition means that there exists a positive constant c such that

G_k^T N_k ≤ −c ‖G_k‖^2. (11)

The condition in (11) is crucially important in establishing the global convergence of our method. To take advantage of the attractive properties of the spectral gradient methods and enforce the generation of search directions that satisfy (11), we require that the step length α_k satisfies the improved Wolfe conditions [56], which ensure that condition (11) always holds. To support this requirement, we employ a nonmonotone line search strategy that produces a step length α_k satisfying the improved Wolfe conditions [56]. In Section II-C1, we derive the search direction (10) in Euclidean space and propose an SCG method based on the revised MBFGS update [53]. In Section III, we generalize the proposed SCG method to Riemannian space and present our RSCG-TT method using the Riemannian optimization scheme on the TT manifold.

1) Derivation of the spectral search direction
It was shown in [57], [58] that the BFGS variable metric method can be regarded as a conjugate gradient algorithm by incorporating the BFGS updating scheme of the inverse Hessian approximation within the framework of a memoryless quasi-Newton method. We propose a new efficient spectral conjugate gradient algorithm based on the MBFGS update [53]. We require that the search direction (10) be close to a quasi-Newton direction; that is,

N_k = −H_k G_k, (12)

where H_k is the symmetric approximation to the inverse Hessian ∇^2 f(X_k)^{−1}, expressed as a memoryless BFGS update [59]. During each iteration, the quantity H_k ≈ ∇^2 f(X_k)^{−1} is replaced with a diagonal preconditioner θ_k I such that the spectral-scaling secant condition [52] is satisfied:

θ_k Z_{k−1} = S_{k−1}, (13)

where

Z_{k−1} = ζ_{k−1} + h_{k−1} S_{k−1}, (14)
h_{k−1} = q ‖G_k‖^p + max{0, −ζ_{k−1}^T S_{k−1} / ‖S_{k−1}‖^2}, (15)

q > 0, p > 0, and the related variables are defined after (10). The spectral-scaling secant condition (13) takes into account the second-order information of the objective function and, by extension, incorporates the proven properties of the MBFGS update (14)–(15) [53]. Furthermore, since we require our line search strategy to fulfill the improved Wolfe conditions [56], the curvature condition holds; that is,

ζ_{k−1}^T S_{k−1} > 0. (16)

As a result, the expression for Z_{k−1} in (14) can be simplified as follows:

Z_{k−1} = ζ_{k−1} + q ‖G_k‖^p S_{k−1}.

To further enhance the numerical stability of our method, we propose to employ a double parameter variable BFGS metric update [60], which is given by

H_k^θ = θ_k I − θ_k (S_{k−1} Z_{k−1}^T + Z_{k−1} S_{k−1}^T) / (Z_{k−1}^T S_{k−1}) + (1 + τ_k (Z_{k−1}^T Z_{k−1}) / (Z_{k−1}^T S_{k−1})) (S_{k−1} S_{k−1}^T) / (Z_{k−1}^T S_{k−1}), (17)

where τ_k > 0 is set to satisfy the bound τ_k = γθ_k, 1 ≤ γ < 2. Alternatively, (17) can be written as

H_k^θ = θ_k (I − S_{k−1} Z_{k−1}^T / (Z_{k−1}^T S_{k−1})) (I − Z_{k−1} S_{k−1}^T / (Z_{k−1}^T S_{k−1})) + (1 + (τ_k − θ_k) (Z_{k−1}^T Z_{k−1}) / (Z_{k−1}^T S_{k−1})) (S_{k−1} S_{k−1}^T) / (Z_{k−1}^T S_{k−1}). (18)

It is easy to see that if H_k^θ is symmetric positive definite, the choice of θ_k > 0, τ_k ≥ θ_k results in the sufficient descent condition for the search direction (12). The choice of the spectral parameter θ_k > 0 is driven by the requirement to minimize the condition number of the approximation to ∇^2 f(X_k)^{−1}, which can enhance the numerical stability of the tensor-based computational process. Therefore, the quantity θ_k I should be an approximation of ∇^2 f(X_k)^{−1}.
We obtain the parameter θ_k as suggested in [61], [62] by minimizing ‖θ_k Z_{k−1} − S_{k−1}‖_F, which yields

θ_k = (S_{k−1}^T Z_{k−1}) / (Z_{k−1}^T Z_{k−1}). (19)

By its derivation, (19) minimizes the spectral condition number of (17) and thereby improves the numerical stability of our method. The optimal upper bound of the parameters θ_k and τ_k was derived in [60], [63], and the parameters θ_k and τ_k are computed by

τ_k = γ θ_k, 1 ≤ γ < 2. (20)

To ensure the sufficient descent condition for the search direction (12), we restrict θ_k as follows:

θ_k = max{θ_k, m_1}, (21)

where m_1 is a small positive constant. By multiplying (17) by the gradient G_k and performing some algebraic transformations, our spectral search direction (12) is given by

N_k = −θ_k G_k + β_k S_{k−1} + θ_k (S_{k−1}^T G_k) / (Z_{k−1}^T S_{k−1}) Z_{k−1}, (22)

FIGURE 2. (a) The geometrical version of the RSCG-TT algorithm on the manifold M_r. (b) The step length α is obtained by minimizing along the search curve passing through X ∈ M_r with direction N_k ∈ T_{X_k} M_r.
where the spectral parameter θ_k is given in (21) and the conjugate gradient parameter β_k is computed as follows:

β_k = θ_k (Z_{k−1}^T G_k) / (Z_{k−1}^T S_{k−1}) − (1 + τ_k (Z_{k−1}^T Z_{k−1}) / (Z_{k−1}^T S_{k−1})) (S_{k−1}^T G_k) / (Z_{k−1}^T S_{k−1}). (23)

Here, we discuss the global convergence properties of the SCG method inherited from the MBFGS update. Notably, the MBFGS update formula in (14) has the favorable property that Z_{k−1}^T S_{k−1} > 0 always holds for each k, which ensures that H_k inherits the positive definiteness of H_{k−1} [64] as long as we start with an initial positive definite quantity H_0 = θ_0 I. It was shown in [53, Theorem 2.1] and [64, Theorem 3.3] that if the update for H_k is given by (13)–(15), then the global convergence property holds. Therefore, as previously argued [53], [64], the SCG method possesses the global convergence property. For more details regarding the MBFGS method, we refer the reader to the original articles [53], [64], and [65].
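A minimal Euclidean sketch of a spectral CG iteration on a toy quadratic is given below. The spectral parameter follows the least-squares choice in (19); the conjugate gradient parameter here is the standard PR+ formula, used only as an illustrative stand-in for the update in (23), and an exact quadratic step replaces the Wolfe line search:

```python
import numpy as np

# Hedged Euclidean sketch of a spectral CG loop on a strongly convex quadratic.
# theta follows the least-squares spectral choice (19); beta is the standard
# PR+ formula, an illustrative substitute for the paper's (23); the exact
# quadratic step stands in for the improved Wolfe line search.
rng = np.random.default_rng(2)
A = rng.standard_normal((20, 20)); A = A @ A.T + 20 * np.eye(20)
b = rng.standard_normal(20)
grad = lambda x: A @ x - b

x = np.zeros(20)
g = grad(x); d = -g
for _ in range(200):
    alpha = -(g @ d) / (d @ A @ d)           # exact minimizer along d
    x_new = x + alpha * d
    g_new = grad(x_new)
    s, z = x_new - x, g_new - g
    theta = (s @ z) / (z @ z)                # spectral parameter, cf. (19)
    beta = max(0.0, (g_new @ z) / (g @ g))   # PR+ (illustrative substitute)
    d = -theta * g_new + beta * d            # spectral direction, cf. (10)
    x, g = x_new, g_new
    if np.linalg.norm(g) < 1e-10:
        break
assert np.linalg.norm(A @ x - b) < 1e-8
```

With exact line search the new gradient is orthogonal to the old direction, so the spectral direction remains a descent direction at every iteration.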

III. A RIEMANNIAN SCG METHOD FOR TENSOR COMPLETION
This section is devoted to the algorithmic development for solving the tensor completion problem (5).

A. RSCG-TT ITERATIVE ALGORITHM
With the concepts introduced in Sections II-B and II-C, we have all the requirements for performing Riemannian optimization on the TT manifold M_r. The SCG algorithmic framework developed in Section II-C yields Algorithm 1. It can be seen as an extension of the SCG scheme introduced in Section II-C, with the Euclidean gradient replaced by the Riemannian gradient. Here, we generalize the quantities derived in Section II-C to the Riemannian setting. When the Euclidean space R^{I_1×···×I_N} is replaced by the Riemannian TT manifold (M_r, g), the components of the spectral search direction (22) belong to different tangent spaces; hence, the Euclidean vector addition on (M_r, g) makes no sense. To modify the vector addition into a suitable operation, we use the vector transport τ_{X_{k−1}→X_k} := P_{T_{X_k} M_r}(X_{k−1}) (see Table 2). Thus, the Riemannian SCG update rule is as follows:

X_{k+1} = R_{X_k}(α_k N_k),

with the search direction given by the Riemannian counterpart of (22) and the step length governed by the line search rules (38) and (39). The resulting optimization scheme is given in Algorithm 1. We call it the Riemannian SCG algorithm for tensor completion via TT. Algorithm 1 represents a geometrical version of the nonlinear SCG algorithm proposed in Section II-C. The major difference is that we must ensure that every point X_k in the optimization sequence stays on the Riemannian manifold M_r. To move along the direction of steepest descent of f(X_k), we use the negative of the Riemannian gradient ξ_k ∈ T_{X_k} M_r, which points in the direction of the greatest increase within the tangent space T_{X_k} M_r. The geometrical version of the algorithm is visualized in Fig. 2a and consists of the following core concepts.
In the first step, we initialize tensor X_0 with random entries and move in the direction of the negative Riemannian gradient N_0 = −θ_0 ξ_0. This produces a new iterate that is not an element of M_r. We bring the new iterate X_0 + α_0 N_0 back to the manifold M_r using retraction (33). The spectral search direction N_k ∈ T_{X_k} M_r is computed by using the updating rule in (32). It requires taking a linear combination of the Riemannian gradient ξ_k and the quantities S_{k−1} and ζ_{k−1}, which belong to a different tangent space T_{X_{k−1}} M_r. We map S_{k−1} and ζ_{k−1} to the current tangent space with the vector transport τ_{X_{k−1}→X_k}. To generate the next iterate in a direction of sufficient descent, we compute the step length α_k using a nonmonotone line search strategy based on the improved Wolfe condition proposed in [56]. Next, with the spectral conjugate direction N_k, we perform a step along the search curve γ and obtain the next iterate by retraction (33) so that X_k ∈ M_r. Evaluating cost function (5) reduces to the computation of the entries of the sparse tensor P_Ω X, since the observed tensor P_Ω T is provided as input. The convergence condition is reached when the relative error between the observed tensor P_Ω T and its approximation P_Ω X is smaller than a prescribed tolerance ε.

Algorithm 1 RSCG-TT Algorithm.
Require: The observed tensor T ∈ M_r, sampling set Ω, multilinear TT rank r, prescribed tolerance ε, and positive parameters p, m_1, γ, q.
Ensure: The completed tensor X is an approximation of T.
1: Initialize: X_0 ∈ M_r randomly
2: Compute the initial gradient ξ_0 ← grad f(X_0)
3: Compute the initial search direction as the steepest descent: θ_0 ← 1, N_0 ← −θ_0 ξ_0
4: Compute the initial step size by the exact minimizer (37): α_0 ← argmin_α f(X_0 + αN_0)
5: Obtain the next iterate by retraction (33)
6: while the convergence condition is not met do
7:   Compute the Riemannian gradient ξ_k and transport S_{k−1}, ζ_{k−1} by τ_{X_{k−1}→X_k}
8:   Compute the search direction N_k by the updating rule (32)
9:   Compute the step length by the updating rules (30) and (31)
10:  Compute the next iterate by retraction (33)
11: end while
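The project-and-retract pattern of Algorithm 1 can be illustrated on the simpler manifold of fixed-rank matrices, where truncated SVD serves as the retraction (an analogue of (33)); this is an illustrative stand-in for the TT-manifold machinery, not the paper's implementation:

```python
import numpy as np

# Illustrative analogue of Algorithm 1 on the manifold of fixed-rank matrices:
# gradient step in the ambient space followed by a truncated-SVD "retraction"
# back to rank r (a stand-in for the TT retraction (33), not the paper's code).
rng = np.random.default_rng(5)
r = 2
A = rng.standard_normal((8, r)) @ rng.standard_normal((r, 8))  # rank-r truth
W = rng.random(A.shape) < 0.7                                  # sampling mask
p = 0.7                                                        # sampling rate

def retract(Y, r):
    """Project Y to the nearest rank-r matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

X = np.zeros_like(A)
for _ in range(2000):
    G = W * (X - A)               # Euclidean gradient of the masked fit term
    X = retract(X - G / p, r)     # gradient step scaled by 1/p, then retract
assert np.linalg.norm(W * (X - A)) < 0.1   # observed entries are well fit
```

The step-then-retract structure is the same in the TT setting; only the manifold, the projection, and the transport of past directions change.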
We find the step length α as a minimizer of the cost function defined on the geodesic search curve γ(α) := R_{X_k}(αN_k), where R_{X_k}(αN_k) is the retraction of αN_k at point X_k. The cost function for the step length α is given as

φ(α) := f(R_{X_k}(αN_k)). (34)

We determine the step length α* such that the value φ(α*) provides a sufficient reduction in cost function (5) and generates a descent direction such that N_k^T ξ_k < 0. The one-dimensional minimizer for (34) is given in [37]:

α*_k := argmin_{α>0} φ(α) := argmin_{α>0} f(R_{X_k}(αN_k)). (35)

We minimize along the search curve to obtain the optimal value of the step length. The concept is illustrated in Fig. 2b. Our tensor completion problem (5) is a smooth quadratic function. Therefore, a good initial guess for the step length can be obtained by exact minimization on the tangent space without retraction:

α* = argmin_α ‖P_Ω(X_k + αN_k) − P_Ω T‖_F^2. (36)
The closed-form solution of (36) is derived by setting the derivative with respect to α to 0 and is given by

α* = ⟨P_Ω N_k, P_Ω(T − X_k)⟩ / ⟨P_Ω N_k, P_Ω N_k⟩. (37)

To guarantee the global convergence of our RSCG-TT method, we use the nonmonotone line search strategy based on the improved Wolfe condition [56] with a combination of curvature conditions as follows:

g(ξ_{k+1}, τ_{X_k→X_{k+1}}(N_k)) ≥ σ g(ξ_k, N_k), (38)
g(ξ_{k+1}, τ_{X_k→X_{k+1}}(N_k)) ≤ −c g(ξ_k, N_k), (39)
f(R_{X_k}(α_k N_k)) ≤ f(X_k) + δ α_k g(ξ_k, N_k) + l_k, (40)

where c > 0, 0 < δ < σ < 1, and l_k is a positive term such that the positive summable sequence {l_k} satisfies Σ_{k≥1} l_k < +∞. As noted in [56], the positive term in (40) allows a slight increase in the function value, which helps to avoid the numerical drawback of the standard Wolfe condition [66] in the presence of numerical round-off errors or when the tolerance error is very low [56].
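The linearized initial step length described around (36)-(37) (exact minimization without retraction) can be sketched with a boolean mask for Ω; the steepest-descent direction used here is only a convenient illustration:

```python
import numpy as np

# Sketch of the linearized initial step length, cf. (36)-(37): minimize
# ||P_Om(X + alpha*N) - P_Om(T)||^2 over alpha, with no retraction involved.
rng = np.random.default_rng(3)
T = rng.standard_normal((4, 4, 4, 4))
W = rng.random(T.shape) < 0.4              # sampling mask for Omega
X = rng.standard_normal(T.shape)           # current iterate
N = -(W * (X - T))                         # steepest-descent direction (demo)

alpha = np.sum((W * N) * (W * (T - X))) / np.sum((W * N) ** 2)

cost = lambda a: np.sum((W * (X + a * N - T)) ** 2)
# alpha is the exact 1-D minimizer: nearby step lengths cannot do better
assert cost(alpha) <= cost(0.9 * alpha) + 1e-12
assert cost(alpha) <= cost(1.1 * alpha) + 1e-12
```

Because the masked objective is quadratic in α, this single inner-product ratio gives the exact one-dimensional minimizer on the tangent space.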

B. COMPLEXITY ANALYSIS OF THE RSCG-TT METHOD
We analyze the time complexity of RSCG-TT for one iteration of Algorithm 1. The computational complexity of RSCG-TT is influenced by the following computations: the Riemannian gradient grad f(X_k) in (8), the search direction N_k in (32), the step length α_k in (37), and the next iterate X_{k+1} obtained by the retraction R_{X_k}(α_k N_k) in (33). The computational complexities of TT decomposition and the Riemannian operators are given in [44] and [67], respectively. For simplicity, we assume I = max{I_1, I_2, ···, I_N} and the upper bound on the TT rank is R_max = max{R_0, R_1, ···, R_N}. The Riemannian gradient grad f(X_k) is computed as a projection of the sparse Euclidean gradient in (7) onto the tangent space of X, which results in a cost of O(N I R^3_max + N |Ω| R^2_max). The estimates for the search direction N_k and the next iterate X_{k+1} can be calculated in O(N I R^3_max), and the cost for the step length α_k is O(|Ω| (N − 1) R^2_max). Hence, for one iteration of RSCG-TT, the total computational cost is dominated by the computation of the Riemannian gradient and is given by O(N I R^3_max + N |Ω| R^2_max).

IV. NUMERICAL EXPERIMENTS
We assess the numerical performance of the RSCG-TT method with real fMRI data from the Center of Biomedical Research Excellence (COBRE), which is available from the COllaborative Informatics and Neuroimaging Suite (COINS) data exchange repository (https://coins.trendscenter.org/#exchange) [68]. We compare RSCG-TT with two other tensor completion methods based on stochastic gradient descent (SGD), namely, Tucker-SGD and CPD-SGD.

A. IMAGE ACQUISITION AND PROCESSING
We employed data from the COBRE study of the Mind Research Network (MRN). The rs-fMRI data consist of 149 volumes of T2*-weighted functional images per subject, acquired through a gradient-echo EPI sequence: TR = 2 s, TE = 29 ms, flip angle = 75°, number of axial slices = 33 in sequential ascending order, slice thickness = 3.5 mm, slice gap = 1.05 mm, field of view = 240 mm, matrix size = 64 × 64, and voxel size = 3.75 mm × 3.75 mm × 4.55 mm. To remove T1 equilibration effects, the first four volumes were discarded. The images were then slice-timing corrected and realigned. The data were spatially normalized to the standard Montreal Neurological Institute space, resampled to 3 mm × 3 mm × 3 mm voxels, and smoothed using a Gaussian kernel with a full width at half maximum of 5 mm.

B. EXECUTION DETAILS
All experiments were performed with Python in TensorFlow on a Linux workstation with 4 Quad-Core Intel Xeon 3.1 GHz processors and 16 GB RAM. For all experiments, the input is the tensor projected onto the sampling set Ω that reflects the positions of the observed entries,

P_Ω(T) = Ω ∗ T, (41)

where T is the fully observed true tensor, and ∗ is the Hadamard (elementwise) product. The completed tensor X̂ is computed as

X̂ = Ω ∗ T + (I − Ω) ∗ X, (42)

where I is the tensor of all ones of the same size as T, and X is the output tensor estimated by each algorithm. RSCG-TT (Algorithm 1) was implemented using TT decomposition in the TensorFlow (T3F) [67] toolbox with its inbuilt operations to manipulate tensors in TT format as well as its Riemannian operations. To find the step length α, we used the nonmonotone Wolfe line-search algorithm in [56].
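A minimal NumPy sketch of this assembly rule, keeping the observed entries of T and filling the missing ones from the algorithm's estimate X; the function name is ours, not the toolbox's.

```python
import numpy as np

def complete_tensor(T, X, omega):
    """Assemble the completed tensor: observed entries come from T,
    missing entries from the estimate X, via the Hadamard-product rule
    X_hat = Omega * T + (I - Omega) * X with I the all-ones tensor."""
    I = np.ones_like(T)
    return omega * T + (I - omega) * X

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 4, 5, 6))            # "true" 4D tensor
omega = (rng.random(T.shape) < 0.7).astype(float)  # 1 = observed
X = rng.standard_normal(T.shape)                 # solver output
X_hat = complete_tensor(T, X, omega)
assert np.allclose(omega * X_hat, omega * T)     # observed entries untouched
assert np.allclose((1 - omega) * X_hat, (1 - omega) * X)
```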
The following parameters were used in the implementation of the proposed method: ε = 10^−8, p = 0.001, m_1 = 10^−8, γ = 1.2, q = 3, δ = 0.0001, and σ = 0.9. Here, the values of p, m_1, γ, and q are those recommended in [60], and the values of δ and σ were selected based on the typical values used in nonmonotone Wolfe line-search algorithms [55], [56], [69]. We further fine-tuned the optimization hyperparameters within the ranges recommended in the literature using a grid search with cross validation on the validation set Ω_Γ. We created a training set based on the sampling set Ω by randomly selecting the indices i_k for each tensor dimension I_k from a uniform distribution on {1, · · · , I_k}. We generated a test set Ω_T and a validation set Ω_Γ by random sampling from the missing value set Ω̄, where 80% of the set was allocated to the test set and 20% to the validation set. The stopping criteria for RSCG-TT were as follows: reaching a maximum of 500 iterations or achieving the convergence tolerance ε ≤ 10^−8.

C. EXPERIMENTAL DESIGN
Our goal is to study the performance of the proposed method in terms of its ability to recover missing voxels with different missing patterns and proportions in real fMRI data. We conduct hybrid experiments in different tensor dimensions and study the effectiveness of exploiting the multidimensional structure through 3D and 4D representations of fMRI data in terms of reconstruction errors. In addition, we compare the numerical performance and convergence rate of the proposed algorithm with the state-of-the-art SGD optimization [70], [71] for tensor completion in the form of Tucker and CPD decompositions. The experiments were carried out considering the percentage and pattern of missing values and tensor dimensionality. The design conditions are summarized in Table 3. We executed 50 runs for each setup and reported the averages.

D. APPLICATION TO REAL FMRI DATA
As we stated in Section II-B, tensor completion performs best on low-rank datasets. We exploit the low-rank property of fMRI data [19] and cast fMRI tensor completion as a specific case of the tensor completion problem on the TT manifold (5). TT decomposition applied to fMRI data captures the global correlation between the spatial and temporal modes and thereby allows the optimal estimation of missing voxels. We evaluate our method on two types of missing value patterns: a simple random missing value pattern and a more realistic structural missing value pattern based on 3D spatial ellipsoids. We briefly describe the background and the experimental setup of these test cases in Sections IV-E–IV-G.

E. MAGNETIC SUSCEPTIBILITIES IN FMRI DATA
FMRI is inherently susceptible to geometric distortions caused by magnetic field inhomogeneity [8] given its relatively long total readout times. This distortion is often most pronounced in the phase encoding direction. The primary source of field inhomogeneity is differences in the susceptibilities of neighboring tissues [72]; in particular, there are significant spatial variations at tissue borders/edges. The areas of brain MRI most vulnerable to geometric distortions are the frontal lobe proximal to the paranasal sinuses and the temporal lobe proximal to the mastoid air cells and ear canals [72], [73]. Geometric distortion is often most severe in regions where air-filled sinuses border bone or tissue, including the frontal and temporal lobes, since these spatial variations produce rapid changes in the magnetic field gradients. Different techniques for correcting fMRI image distortion have been reported [3], [72], [74]-[77]. Most of these approaches exclude brain regions with significant variations in the magnetic field gradients in the phase encoding or slice selection direction based on a field quality map. The field quality map is computed from the magnitude or phase derivative variance [3], [72], [74], [77]. Field quality maps are robust in identifying noisy areas in fMRI images and are used to address geometric distortions in fMRI studies. The acquired quality maps are used to develop binary quality masks to threshold the brain images. One such method is the quality map phase denoising (QPMD) method for fMRI data [3]; this method identifies noisy regions based on the partial derivatives ∇_{x,y,z} of the phase image, where high gradient values correspond to low signal quality, and the corresponding voxels are therefore excluded from the analysis. This results in missing-not-at-random (MNAR) voxels in areas susceptible to gradient changes, including the sinuses and the ventral/medial prefrontal areas.
In this study, we propose to address this issue through 4D fMRI tensor completion, where the missing voxels are represented by a 3D brain mask.

F. RANDOM MISSING VALUE (RMV) PATTERN
This problem is identical to the one described in [7], [8] and corresponds to the MAR (missing at random) pattern [6] observed in fMRI studies. To simulate an RMV pattern, we generated a sampling set Ω w.r.t. different missing value rates (MR), defined as

MR = m / (I_1 I_2 · · · I_N) × 100%, (43)

where m is the number of missing entries. The indices of the observed entries in the sampling set Ω are chosen by randomly sampling each dimension index i_n from a uniform distribution on {1, · · · , I_n}.
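The RMV sampling mask can be sketched in NumPy as follows; the function name and shapes are illustrative, with mask entry 1 marking an observed voxel and 0 a missing one.

```python
import numpy as np

def rmv_mask(shape, mr, rng):
    """Binary sampling mask for a random missing value (RMV) pattern.
    `mr` is the missing value rate MR = m / (I_1 * ... * I_N), with
    missing entries drawn uniformly at random without replacement."""
    n_total = int(np.prod(shape))
    n_missing = int(round(mr * n_total))
    flat = np.ones(n_total)
    missing_idx = rng.choice(n_total, size=n_missing, replace=False)
    flat[missing_idx] = 0.0
    return flat.reshape(shape)

rng = np.random.default_rng(2)
# 4D fMRI-sized tensor with 60% of the voxels missing at random
mask = rmv_mask((53, 63, 46, 144), 0.60, rng)
mr_actual = 1.0 - mask.mean()
assert abs(mr_actual - 0.60) < 1e-6
```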

G. STRUCTURAL MISSING VALUES (SMV) PATTERN
This test case is equivalent to the MNAR fMRI pattern described in Section IV-E. To evaluate the proposed algorithm on a realistic MNAR test case, we generate analytical ellipsoids with missing voxels that satisfy the 3D ellipsoid equation

(x − x_0)²/r_x² + (y − y_0)²/r_y² + (z − z_0)²/r_z² ≤ 1, (44)

where (x_0, y_0, z_0) is the origin of the ellipsoid and (r_x, r_y, r_z) are the radii in the direction of each spatial dimension. A spatial location (x_0, y_0, z_0) was manually selected in a brain volume, and ellipsoid (44) with missing voxels was injected into this region. Furthermore, to simulate the case where several 3D fMRI brain volumes are impacted by spatial ellipsoid (44), we place the spatial ellipsoid at multiple randomly selected timepoints in the 4D fMRI brain scan. The number of affected timepoints is selected to simulate the percentage of missing values in the temporal sense. In addition, we study the effect of the size of the spatial ellipsoid to evaluate the estimation capabilities of the proposed method in settings close to real-world scenarios. Next, we define the quantities used to measure the performance of the algorithms completing data affected by SMV patterns as follows:

V_e3D = (4/3) π r_x r_y r_z, (45)
V_e4D = M_T V_e3D, (46)
V_3D = X Y Z, (47)

where V_e3D and V_e4D are the 3D spatial and 4D hypervolumes of ellipsoid (44), V_3D is the total volume of the fMRI brain scan, X, Y, Z are the numbers of voxels in each spatial dimension, and T is the number of timepoints comprising the fMRI brain scan. The metadata of the generated ellipsoids are given in Table 4. The missing value rate metrics for the SMV pattern are given as

MR_Ve3D = V_e3D / V_3D × 100%, (48)
MR_T = M_T / T × 100%, (49)

where MR_Ve3D is the spatial missing value rate, measured as the ratio of the spatial volume of the ellipsoid with missing voxels in (45) to the total spatial volume in (47), and MR_T is the temporal missing value rate across the entire 4D fMRI brain scan, where M_T is the number of timepoints affected by artifact (44).
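Injecting an ellipsoidal SMV artifact into a 4D sampling mask can be sketched as follows; the function name, center, and radii are illustrative choices, not the paper's actual parameters.

```python
import numpy as np

def inject_smv(mask, center, radii, timepoints):
    """Mark voxels inside a 3D ellipsoid as missing (0) at the given
    timepoints of a 4D sampling mask of shape (X, Y, Z, T). The missing
    region is (x-x0)^2/rx^2 + (y-y0)^2/ry^2 + (z-z0)^2/rz^2 <= 1."""
    X, Y, Z, _ = mask.shape
    x, y, z = np.ogrid[:X, :Y, :Z]
    x0, y0, z0 = center
    rx, ry, rz = radii
    inside = (((x - x0) / rx) ** 2 + ((y - y0) / ry) ** 2
              + ((z - z0) / rz) ** 2) <= 1.0
    for t in timepoints:
        mask[..., t][inside] = 0.0
    return mask

mask = np.ones((53, 63, 46, 144))
rng = np.random.default_rng(3)
affected = rng.choice(144, size=int(0.15 * 144), replace=False)  # MR_T = 15%
mask = inject_smv(mask, center=(26, 31, 22), radii=(6, 8, 5),
                  timepoints=affected)
assert mask[26, 31, 22, affected[0]] == 0.0   # ellipsoid origin is missing
assert mask[0, 0, 0, affected[0]] == 1.0      # distant voxel stays observed
```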
In the SMV experiment, we use the discrete temporal missing ratio set MR_T(%) ∈ {5, 10, 15, 20}%, and for each value of MR_T, we vary the volume of the ellipsoid artifact according to the values in Table 4 and the tensor dimension values with levels as per Table 3. Conceptually, the 4D fMRI tensor completion process for the SMV pattern is visualized in Fig. 3. The 4D fMRI hypervolume is corrupted by adding spatial ellipsoids with missing voxels into a 3D whole-brain volume at randomly selected timeframes. We feed the entire 4D fMRI tensor with corrupted 3D whole-brain volumes at timepoints T_i into the proposed algorithm, which completes the 4D tensor simultaneously for all corrupted 3D spatial brain volumes.

H. RANK ESTIMATION
The tensor completion problem in (5) requires that the rank of a high-dimensional dataset be known in advance. Specifically, for TT decomposition, we have to determine the multilinear TT rank in (2). The TT ranks R_n have an upper bound determined by a CPD approximation with tensor rank R and accuracy ε_TT, such that R_n ≤ R, and the TT decomposition can be computed with a relative accuracy of √(N − 1) ε_TT [44], where N is the tensor dimension. We estimate the CPD rank R and the multilinear Tucker rank with the aid of automatic relevance determination (ARD) [78] and the L-curve algorithm [79], [80]. The ARD algorithm is stopped when the relative tolerance of the negative log-likelihood drops below 10^−8, which results in an explained variance in the range of 0.9998–0.9999. The ranks computed by the ARD algorithm are R_CPD4D = 81 and R_CPD3D = 46 for CPD decomposition and r_Tucker4D = (53, 63, 46, 91) and r_Tucker3D = (1630, 43, 144) for Tucker decomposition.
We use the computed CPD rank R as the maximal rank constraint R_max and as the input to the TT-SVD algorithm [44], and the values of the TT rank are determined by a quasi-optimal TT approximation. The multilinear r_4D and r_3D TT ranks are estimated by the TT-rounding algorithm with the imposed maximal rank constraints R_4Dmax = R_CPD4D = 81, R_3Dmax = R_CPD3D = 46, and R_2Dmax = 144 for the 4D, 3D, and 2D tensor representations, respectively. The maximal rank for the 2D representation is selected as the column rank of the reshaped 2D matrix of the input data tensor. The estimated TT ranks include r_4DTT = (1, 81, 81, 81, 1) for the 4D representation and (1, 144, 1) for the 2D representation. The ranks R_0 and R_N are restricted to one by the definition of TT decomposition, whereas the ranks R_n are selected as the minima of the column ranks of the unfolding matrices X_<n> ∈ R^{I_1 I_2 ···I_n × I_{n+1} ···I_N} [44] and the maximal rank R_max.
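The rank-selection rule can be illustrated with a simplified stand-in for TT-SVD: estimate each R_n as the numerical rank of the n-th unfolding, capped at R_max. This is a sketch of the principle only; the actual TT-rounding in [44] truncates SVDs of the TT cores, and `tt_ranks` is our hypothetical helper.

```python
import numpy as np

def tt_ranks(tensor, r_max, tol=1e-8):
    """Estimate TT ranks: R_n is the numerical rank of the n-th
    unfolding X_<n> (rows indexed by I_1...I_n, columns by
    I_{n+1}...I_N), capped at the maximal rank constraint r_max."""
    shape = tensor.shape
    ranks = [1]                       # R_0 = 1 by definition
    for n in range(1, len(shape)):
        unfolding = tensor.reshape(int(np.prod(shape[:n])), -1)
        s = np.linalg.svd(unfolding, compute_uv=False)
        numerical_rank = int(np.sum(s > tol * s[0]))
        ranks.append(min(numerical_rank, r_max))
    ranks.append(1)                   # R_N = 1 by definition
    return ranks

# a rank-1 4D tensor has all intermediate TT ranks equal to 1
a, b, c, d = (np.arange(k, dtype=float) + 1 for k in (3, 4, 5, 6))
T = np.einsum('i,j,k,l->ijkl', a, b, c, d)
assert tt_ranks(T, r_max=81) == [1, 1, 1, 1, 1]
```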

I. QUANTITATIVE EVALUATION
We compare RSCG-TT with two other methods: Tucker-SGD and CPD-SGD. The compared algorithms employ the SGD optimization method called adaptive moment estimation (Adam) [70], which is known for its strong empirical performance. The Adam update is given as follows:

m_k = β_1 m_{k−1} + (1 − β_1) ∇f_k(x_k),
v_k = β_2 v_{k−1} + (1 − β_2) (∇f_k(x_k))²,
x_{k+1} = x_k − d m̂_k / (√v̂_k + ε),

where d, ε, β_1, and β_2 are hyperparameters, m_k and v_k are the first- and second-moment estimates of the gradient ∇f_k(x_k), and m̂_k and v̂_k are their bias-corrected counterparts. We selected the default values for the hyperparameters and the learning rate: ε = 10^−8, β_1 = 0.9, β_2 = 0.999, and d = 0.001, as suggested in [70]. Both comparison algorithms are implemented in the TensorLy [81] framework.
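A minimal NumPy sketch of one Adam step with bias-corrected moments, applied to a toy quadratic; the function name and the toy objective are ours, not TensorLy's API.

```python
import numpy as np

def adam_step(x, grad, m, v, k, d=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), bias correction, then a scaled step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** k)      # bias correction, k >= 1
    v_hat = v / (1 - beta2 ** k)
    x = x - d * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x
rng = np.random.default_rng(4)
x = rng.standard_normal(10)
m = v = np.zeros_like(x)
for k in range(1, 5001):
    x, m, v = adam_step(x, x, m, v, k)
assert np.linalg.norm(x) < 0.1        # converged near the minimizer
```

Note that Adam's effective step size is roughly the learning rate d regardless of gradient scale, which is why many iterations are needed even on this easy quadratic, consistent with the slower convergence of the SGD baselines reported below.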
To measure the numerical performance of the algorithms, we report the relative square error (RSE) between the completed tensor X̂ and the true tensor T, which is defined as

RSE = ||X̂ − T||_F / ||T||_F. (50)

To assess the reconstruction quality of the missing values, we use the tensor completion score (TCS) [14], which is given as

TCS = ||P_Ω̄(X̂ − T)||_F / ||P_Ω̄(T)||_F, (51)

where Ω̄ is the complement of the sampling set Ω, i.e., the set of missing entries. The RSE metric in (50) is used to evaluate the overall global numerical performance, whereas the TCS in (51) is used to assess the RSE in the missing values [14]. To accommodate the specific needs of fMRI data, we define a new threshold metric based on (51), which measures the recovery of missing voxels in the data regions after thresholding Z-scores at two times the standard deviation:

TCS_Z = ||P_Ω̄(X̂_Z − T_Z)||_F / ||P_Ω̄(T_Z)||_F, (52)

where X̂_Z and T_Z retain only the voxels with |Z| > 2. We detect stagnation and stop the algorithms when the relative residual of the objective function on Ω between successive iterations drops below the prescribed tolerance ε:

|f(X_k) − f(X_{k−1})| / f(X_{k−1}) ≤ ε, (53)

where ε = 10^−8 and the maximum number of iterations is max_iter = 500. Furthermore, we evaluate whether there is a significant effect between study groups using one-way analysis of variance (ANOVA) and report the F-values and p-values. Post hoc analyses with two-tailed parametric t-tests, corrected for multiple comparisons using the false discovery rate (FDR), are performed to determine whether the difference between the contrasts of interest is significant and are described by the p-value, the T-value, and the H_0 value. The latter reports the result of evaluating the null hypothesis (i.e., H_0 = 1 indicates that the null hypothesis is rejected and the difference between the two contrasts is significant, whereas H_0 = 0 indicates that the null hypothesis is not rejected). We also provide the mean (M) of the measure of interest and the standard deviation (SD). In this paper, we use the 95% confidence level (p < 0.05) to reject the null hypothesis.
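The RSE and TCS metrics can be computed in a few lines of NumPy; the function names are ours, and the mask convention (1 = observed) matches the sampling-set definition above.

```python
import numpy as np

def rse(X_hat, T):
    """Relative square error over the whole tensor."""
    return np.linalg.norm(X_hat - T) / np.linalg.norm(T)

def tcs(X_hat, T, omega):
    """Tensor completion score: the same relative error restricted to
    the missing entries (the complement of the sampling set omega)."""
    miss = 1.0 - omega
    return np.linalg.norm(miss * (X_hat - T)) / np.linalg.norm(miss * T)

rng = np.random.default_rng(5)
T = rng.standard_normal((4, 5, 6))
omega = (rng.random(T.shape) < 0.5).astype(float)
perfect = T.copy()
assert rse(perfect, T) == 0.0 and tcs(perfect, T, omega) == 0.0
zeros = np.zeros_like(T)
assert np.isclose(rse(zeros, T), 1.0)   # predicting zero gives RSE = 1
```

The thresholded variant TCS_Z follows the same pattern after zeroing out all voxels with |Z| ≤ 2 in both tensors.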
Of note, the relevant statistical results of group fMRI analyses are corrected for multiple comparisons with the familywise error (FWE), where p < 0.05.
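The statistical pipeline (one-way ANOVA followed by FDR-corrected pairwise t-tests) can be sketched with SciPy on synthetic group data; the group values below are invented for illustration and do not reproduce the paper's results.

```python
import numpy as np
from scipy import stats

def compare_groups(groups, alpha=0.05):
    """One-way ANOVA across groups, then pairwise two-tailed t-tests
    with Benjamini-Hochberg FDR correction (a sketch of the pipeline)."""
    F, p = stats.f_oneway(*groups)
    pairs, pvals = [], []
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            t, pt = stats.ttest_ind(groups[i], groups[j])
            pairs.append((i, j, t))
            pvals.append(pt)
    # Benjamini-Hochberg step-up: largest rank with p <= rank/m * alpha
    order = np.argsort(pvals)
    m = len(pvals)
    thresh = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            thresh = rank
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= thresh
    return F, p, pairs, reject

rng = np.random.default_rng(6)
g2d = rng.normal(0.3, 0.02, 50)       # hypothetical TCS values, 2D format
g3d = rng.normal(0.03, 0.005, 50)     # 3D format
g4d = rng.normal(0.001, 0.0002, 50)   # 4D format
F, p, pairs, reject = compare_groups([g2d, g3d, g4d])
assert p < 0.05 and reject.all()      # all pairwise contrasts significant
```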

V. RESULTS
This section is divided into three subsections. The first subsection describes the data completion results with the RMV pattern (see Section IV-F) as a performance baseline for RSCG-TT. The second and third subsections are used to evaluate the robustness of RSCG-TT on more challenging tasks such as SMV pattern completion (see Section IV-G) and resting network estimation. Throughout the paper, we provide the results averaged across N = 50 subjects from the COBRE dataset described in Section IV-A. In this paper, the individual subject datasets are normalized to have unit variance and a zero mean before they are input into each tensor completion algorithm. Fig. 4 presents examples of three randomly selected subjects with fMRI scans corrupted by the RMV pattern and subsequently recovered by the RSCG-TT algorithm as well as other algorithms for comparison.

A. RMV PATTERN
In this section, RSCG-TT is evaluated for RMV pattern completion using 4D, 3D, and 2D representations. We compare the accuracy and convergence of RSCG-TT with those of the state-of-the-art CPD-SGD and Tucker-SGD methods for different problem sizes and tensor dimensionalities.

1) Cost comparison by missing value rates
We investigate the performance of RSCG-TT using the 4D spatiotemporal structure of fMRI scans from the COBRE dataset with 10–90% of the voxels missing at random; see Section IV-F for a detailed description of the setup. The size of a 4D fMRI tensor is X ∈ R^{53×63×46×144}. We show an example of a completed fMRI brain volume with MR_RMV(%) = 60% at specific timepoints in Fig. 5. We use the TCS in (51) and its threshold counterpart TCS_Z in (52) to evaluate the quality of the data completion. We summarize the results of 4D tensor completion for the RMV pattern using the RSCG-TT algorithm with different MR_RMV(%) in Fig. 6. As shown in Fig. 5, RSCG-TT yields very low TCS values, and the recovered image in Fig. 5 is almost identical to the original image. The qualitative assessment of the completed images with MR_RMV(%) ∈ {25%, 50%, 75%, 90%}, shown in Fig. 6a-d, suggests that RSCG-TT can recover the 4D fMRI tensor with high accuracy even when a significant proportion of the values is missing. We present the convergence curves of the mean RSE and the objective function w.r.t. the number of iterations of RSCG-TT in Fig. 6e-f. The convergence curves in Fig. 6e-f illustrate the monotonic convergence of cost function (5). Furthermore, for the RMV pattern, the RSCG-TT algorithm exhibits rapid convergence even for larger MR_RMV(%). The loss curve in Fig. 6e demonstrates that a stationary point is reached in fewer than fifty iterations.

2) Effect of the tensor dimensionality on tensor completion for the RMV pattern
We summarize the effect of tensor dimensionality for the RMV pattern in Figs. 7 and 8 and Tables 6 and 7. Fig. 8 shows that when the fMRI scan is represented by a 4D tensor, the RSCG-TT algorithm has the lowest TCS for each MR_RMV(%) and numerically outperforms the 2D and 3D representations. Furthermore, it is clearly observed that the performance of the RSCG-TT method degrades as the dimension of the data decreases from 4D to 3D and 2D.
The superior performance of 4D tensor completion can be explained by the ability of the 4D TT manifold to take full advantage of the spatiotemporal correlations among the modes of the fMRI tensor. Fig. 8b shows that the TCS_Z metric is relatively higher than the TCS. This observation indicates that RSCG-TT performs better in the areas of the fMRI scan with high activation, since the TCS_Z metric takes only higher values (|Z| > 2) into consideration, whereas the TCS takes all values into consideration. Both the 4D and 3D representations still yield better numerical performance, in the range of 0.639 × 10^−3 to 1.162 × 10^−2, compared with the 2D representation.
We performed ANOVA and showed that the main effect of tensor dimensionality (2D vs 3D vs 4D) was significant (see Table 6 and Fig. 7). We further studied the ability of RSCG-TT to fully recover original images corrupted with varying MR_RMV(%) levels for different tensor dimensions in the RMV pattern. We illustrate the results of this experiment in Fig. 9 and Table 7. We quantitatively evaluated the statistical significance between the mean voxel intensity values of the original fMRI images and the images after data completion for tensor dimensions D = 2, 3, 4 and MR_RMV(%) ∈ {10–90%}. Post hoc analyses with two-tailed t-tests (FDR corrected) and a significance level of p < 0.05 showed that RSCG-TT can recover the original image with MR_RMV(%) values in the range of {10–60%} for the 3D tensor, {10–80%} for the 4D tensor, and only 10% for the 2D tensor format. As shown in Table 7 and Fig. 9, the mean voxel intensity values of the original and completed images were statistically different for the 2D format with MR_RMV ∈ {20–90%}. In contrast, the high-order tensor formats failed to fully recover the original data only for extremely high missing value rates: MR_RMV(%) ∈ {70–90%} for 3D and MR_RMV(%) = 90% for 4D. Together, the results presented in this section suggest that tensor dimensionality does have an effect on fMRI data completion. Specifically, our results demonstrate that the original 4D spatiotemporal representation of fMRI data yields higher accuracy than the 2D and 3D representations. In summary, our experiments show that RSCG-TT can recover corrupted fMRI data in the presence of high missing value rates of up to 80% when the data are kept in their natural 4D format.

3) Comparison of RSCG-TT with the state-of-the-art methods
In this section, we compare the performance of RSCG-TT and the state-of-the-art methods on the reconstruction of 4D fMRI data corrupted by the RMV pattern, with MR_RMV(%) ranging from 10% to 90% in increments of 10%. We evaluated two aspects of performance: the numerical accuracy in terms of the RSE and TCS, and the computational cost w.r.t. the convergence rate and the CPU time. We summarize the computational cost results of all algorithms in Fig. 10 and present the mean metrics of the numerical reconstruction performance in Fig. 11 and Table 8. The main effect of the tensor completion method (CPD-SGD vs Tucker-SGD vs RSCG-TT) after post hoc analyses is shown in Table 9 and Fig. 12.
We show the convergence rate of the objective function for all methods w.r.t. the iteration number in Fig. 10a and present the convergence curves of the objective function w.r.t. the CPU time in Fig. 10b. The convergence curves presented in Fig. 10 and the numerical results in Fig. 11 demonstrate that RSCG-TT outperforms both competitor algorithms in terms of the numerical accuracy measures (the RSE and TCS) and the computational cost. In Fig. 10b, we compare the computational efficiency of RSCG-TT and the state-of-the-art algorithms as a function of the number of iterations and the elapsed time. As shown in Fig. 10a, RSCG-TT converges rapidly compared with both the CPD-SGD and Tucker-SGD algorithms, which require a much larger number of iterations to converge. As shown in Fig. 10, while the total elapsed time to converge is similar, the cost function accuracy of RSCG-TT is superior to that of the state-of-the-art algorithms by two- to threefold. We present the mean (N = 50) TCS and RSE results of 4D tensor completion with RSCG-TT and compare them with the results of the state-of-the-art methods, CPD-SGD and Tucker-SGD, in Fig. 11f-h. For the fixed rank model in the 4D experiments, we estimate the fixed rank parameters as r_4DTT = (1, 81, 81, 81, 1) for the TT rank, r_Tucker4D = (53, 63, 46, 91) for the Tucker rank, and R_CPD4D = 81 for the CPD rank, as described in Section IV-H. In terms of the RSE, both RSCG-TT and Tucker-SGD demonstrate good numerical performance, with the RSE for RSCG-TT (RSE_RSCG-TT) in the range of 0.391 × 10^−3 to 0.857 × 10^−3 and the RSE for Tucker-SGD (RSE_Tucker-SGD) in the range of 5.529 × 10^−3 to 1.709 × 10^−2. The RSE for CPD-SGD (RSE_CPD-SGD) is in the range of 2.782 × 10^−2 to 5.261 × 10^−2, which is higher by an order of magnitude than that of the other methods (see Figs. 11g, 11h, and Table 8). As seen in Fig. 10, CPD-SGD and Tucker-SGD demonstrate a lower computational efficiency than RSCG-TT in terms of both the convergence rate and the numerical accuracy of the objective function w.r.t. the CPU time (see Figs. 10a and 10b). Moreover, it is worth noting that RSCG-TT yields consistently smaller TCS_Z values than the state-of-the-art methods, as demonstrated in Fig. 11h. This can be interpreted as RSCG-TT having a higher sensitivity when completing data in areas with very low activation values. Fig. 11a-e presents the completed 4D fMRI data for MR_RMV(%) = 75% with the state-of-the-art methods and RSCG-TT. A visual inspection of the completed images clearly shows that CPD-SGD (Fig. 11c) produces blurred areas and suffers from poor reconstruction, Tucker-SGD (Fig. 11d) performs better, and the images completed by RSCG-TT are almost indistinguishable from the original images shown in Fig. 11a.
Based on the ANOVA results, we identified that the main effect of the tensor completion method (CPD-SGD vs Tucker-SGD vs RSCG-TT) was significant; the subsequent post hoc analyses confirmed significant differences between the tensor completion methods (see Table 9 and Fig. 12).

B. SMV PATTERN
In this section, we evaluate RSCG-TT with the SMV pattern described in Section IV-G. We investigate the joint effect of the spatiotemporal missing value rate and the effect of the tensor dimensionality on the estimation of missing voxels originating from the SMV pattern.

1) Joint effect of the spatiotemporal missing value rates
We show an example of SMV pattern completion in Fig. 13, where spatial ellipsoids with missing voxels are injected at randomly selected timepoints, including T_i ∈ {113, 120, 122, 129, 135, 136}, with a temporal missing value rate of MR_T(%) = 15%. Fig. 13b shows an example in which a spatial ellipsoid with missing voxels of size 1 (see Table 4) is placed in the brain volume at spatial location (x, y, z) = (2, 32, 22) and timepoint T_i = 20. The completed fMRI image at timepoint T_i = 20 is shown in Fig. 13c. Fig. 14 illustrates the joint effect of both spatiotemporal factors, MR_T(%) and MR_Ve3D(%), on the quality of the data completion. As we can observe from Fig. 14, a simultaneous increase in both MR_T(%) and MR_Ve3D(%) degrades the numerical accuracy of the data completion. However, as shown in Fig. 14, RSCG-TT obtains good quality data completion in terms of the TCS, in the range of 1 × 10^−4 to 2 × 10^−3, for spatial ellipsoid sizes ∈ {1, 2, 3, 4} and MR_T(%) of up to 20%. The observed results for the SMV pattern are in agreement with those of previous works [13] and [14]: the effect size of the SMV pattern proportionally decreases the number of neighborhood voxels and, as a result, reduces the inferential power of the tensor completion methods.

2) Effect of the tensor dimensionality on tensor completion for the SMV pattern
In this section, we study the effect of tensor dimensionality on the estimation of missing voxels with RSCG-TT when the fMRI scans are affected by the SMV pattern. We summarize the effect of tensor dimensionality for the SMV pattern in Figs. 15, 16, and 17 and Tables 10 and 11. Fig. 15 presents a visual comparison of the completed images obtained with RSCG-TT for the 2D, 3D, and 4D tensor formats corrupted by the SMV pattern with different MR_T(%). Despite the challenging SMV missing pattern, with simultaneous temporal and volumetric corruption, successful reconstruction was obtained for the 4D and 3D formats but not for the 2D tensor format. In particular, the completed 4D and 3D fMRI images appear visually almost indistinguishable from the original ground-truth 4D fMRI tensor. RSCG-TT obtained good quality data completion for MR_T(%) in the range of {5–15%}. However, RSCG-TT was unable to successfully recover the fMRI data in the 2D representation for any MR_T(%). The poor performance of RSCG-TT for the 2D tensor format can be explained by the lack of spatiotemporal structure when the original 4D fMRI data are flattened to 2D form, in which the natural strength of the relationships is lost. Table 10 and Fig. 16 show quantitative evaluations of RSCG-TT for the 2D, 3D, and 4D formats in the SMV pattern with different MR_T(%). We calculated the mean TCS (N = 50) between the data completion results and the ground-truth original 4D fMRI data. In terms of the TCS, the reconstruction of the 2D representation demonstrated suboptimal performance, with a mean TCS in the range from 0.159 to 0.396. Conversely, RSCG-TT obtained good numerical performance for both the 4D and 3D formats, with the mean TCS for 4D in the range from 0.3289 × 10^−3 to 0.0146 × 10^−3 and the TCS for the 3D format in the range from 0.005 to 0.057.
To evaluate the statistical significance between tensor formats using RSCG-TT in the SMV pattern, we performed ANOVA followed by post hoc analyses with two-tailed t-tests corrected for multiple comparisons with p < 0.05 (see Table 11 and Fig. 15). The boxplots shown in Fig. 17 summarize the distribution of the mean TCS metric for the 2D, 3D, and 4D tensor dimensions. The results underline the fact that spatiotemporal data completion leads to statistically significant gains over the lower-dimensional representations. We present the main effect of tensor dimensionality on the results for the SMV pattern in Fig. 17a and Table 11 and the distribution of the mean TCS in Fig. 17b. The main effect of tensor dimensionality (2D vs 3D vs 4D) in the presence of the SMV pattern was significant; i.e., F(2, 71997) = 40362.579, p < 0.0001. The subsequent post hoc analyses confirmed significant differences between the tensor formats (see Figs. 15 and 16). These results suggest that tensor dimensionality does have an effect on fMRI data completion when the original data are affected by the SMV pattern.

C. EVALUATION OF THE QUALITY OF THE TENSOR COMPLETION METHODS ON DOWNSTREAM FMRI ANALYSES
In this section, we evaluate the effect of each tensor completion method on the sensitivity of fMRI analyses to estimate the resting-state network (RSN) components. We decomposed the rs-fMRI datasets (see Section IV-A) into functional networks using spatial independent component analysis (ICA) to derive maximally independent RSNs. We conducted group ICA (GICA) [82], [83] to evaluate the sensitivity of RSCG-TT and the state-of-the-art tensor completion methods CPD-SGD and Tucker-SGD to voxel activation. We generated four datasets for each of the N = 50 subjects: one original dataset and three datasets from the completion algorithms with MR_RMV(%) = 50%. We used GICA to decompose the datasets and direct GICA3 [83], [84] back-reconstruction to obtain subject-specific RSN components. We used the Infomax algorithm [85] and estimated C = 50 components, as suggested by [86]. The whole analysis was implemented with the GIFT toolbox. The Infomax algorithm was repeated ten times, and the final decomposition was chosen based on the most consistent run selection scheme [87]. We identified 16 independent components (ICs) as RSNs (as opposed to noise artifacts) by inspecting the aggregate ICs and the power spectra [88]. The RSNs were divided into groups according to their functional and anatomical properties [88] and categorized as auditory (AUD), visual (VIS), sensorimotor (SM), cognitive control (CC), frontoparietal (FP), and default-mode networks (DMN).

1) Group-level sensitivity comparison by tensor completion method
To compare the group-level sensitivity of the three completion methods, we computed voxel-wise t-statistics from a one-sample t-test of the subject-level RSNs and plotted the t-values as a statistical image called a T-map. The T-maps are thresholded at p < 0.05 and corrected for multiple comparisons with FWE. We evaluated the group-level sensitivity of each method by studying its ability to recover independent RSNs across six functional RSNs. We present the T-maps of the RSNs comprising six functional networks, DMN, AUD, VIS, SM, CC, and FP, in Fig. 18 and Supplementary Fig. S.1. We focus on the reliable estimation of major RSNs, as they are important biomarkers in a number of disorders; see, e.g., [88]-[90].
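The voxel-wise one-sample t-test can be sketched with SciPy on synthetic subject maps; the helper name and the toy 8×8×8 volumes are ours, and thresholding plus FWE correction would follow in a real analysis.

```python
import numpy as np
from scipy import stats

def group_t_map(subject_maps):
    """Voxel-wise one-sample t-test across subjects: subject_maps has
    shape (n_subjects, X, Y, Z); the returned T-map tests whether each
    voxel's mean loading differs from zero."""
    t, p = stats.ttest_1samp(subject_maps, popmean=0.0, axis=0)
    return t, p

rng = np.random.default_rng(7)
maps = rng.normal(0.0, 1.0, size=(50, 8, 8, 8))  # 50 subject-level maps
maps[:, 2, 2, 2] += 5.0                          # one strongly "active" voxel
t_map, p_map = group_t_map(maps)
assert t_map.shape == (8, 8, 8)
assert p_map[2, 2, 2] < 1e-6                     # active voxel is significant
```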
The T-maps following the different tensor completion methods for the canonical DMN component are shown in Fig. 18, and the T-maps for the other RSNs are shown in Supplementary Fig. S.1. All six RSNs resulting from RSCG-TT are matched with the reference RSNs (see Fig. 18 and Supplementary Fig. S.1). The results illustrate that the RSN components estimated from the datasets completed by RSCG-TT are the most similar to those estimated from the original datasets. In contrast, it is evident from Fig. 18 and Supplementary Fig. S.1 that the spatial clusters comprising the RSNs are generally smaller following the CPD-SGD and Tucker-SGD methods. We characterize the performance of each tensor completion method w.r.t. the proportion of significant activations in Table 12 and Supplementary Fig. S.3. In summary, the results presented in Table 12, Fig. 18, Supplementary Fig. S.1, and Supplementary Fig. S.3 demonstrate that the t-statistics, the proportion of significant voxels, and the spatial cluster extents are higher for RSCG-TT, indicating better sensitivity.
To quantitatively compare the RSN components across all tensor completion methods, we performed a two-sample t-test on the spatial maps between the reference RSN component estimated from the original datasets and the RSN component generated after each tensor completion method. Fig. 19 and Supplementary Fig. S.2 illustrate the T-map contrasts yielded by the two-sample t-tests: reference RSN − CPD-SGD RSN, reference RSN − Tucker-SGD RSN, and reference RSN − RSCG-TT RSN. Peak activations of the RSN components after each tensor completion algorithm are summarized per method and network (Cluster ID: the enumerated cluster ID of peak activations within the RSN component; tmax: the maximum t-statistic in each cluster; Coordinate: x, y, z (in mm) of tmax in MNI space; Cluster size: the cluster size in mm³; Vl: the number of voxels in each cluster; Vl(%): the percentage of activated voxels).

RSCG-TT has advantages over the state-of-the-art methods because the TT manifold allows the algorithm to take full advantage of the global correlation structure between the modes of an fMRI tensor. Our study demonstrates that the RSCG-TT method is a valuable approach for fMRI data completion that can increase the sensitivity of downstream analyses in identifying important biomarkers such as RSNs. Detecting statistically significant effects in human neuroscience remains an active field of research, and the empirical validity of inferential methods in fMRI is still being studied [91]. To evaluate the significance of the RSNs, we employed voxel-based inference, a parametric statistical approach with a principled correction for multiple comparisons to control the FWE rate. However, due to its principled protection from false positives (Type I errors), voxel-based statistics may exhibit lower sensitivity to voxel activation [92]. Additional methods to control the FWE rate make use of cluster-level statistics, namely cluster-based thresholding [93] and threshold-free cluster enhancement (TFCE) [94]; however, these have their own drawbacks, as studied in [91], [95]. The voxel-based inference used in this work is widely adopted and is desirable due to its rigorous control of false positives in fMRI [96].

A. NUMERICAL PERFORMANCE OF RSCG-TT
We investigated the performance of the RSCG-TT algorithm for the completion of different missing value patterns and as a function of tensor dimensionality. Unlike most works on tensor completion, which primarily utilize the RMV pattern, we applied the proposed method to realistic 4D representations in which selected voxels were missing in spatial ellipsoids across multiple timepoints. To the best of our knowledge, this is the first work applying a tensor completion algorithm to the realistic scenario of missing voxels using the SMV pattern. The experimental results demonstrate that the proposed RSCG-TT method exhibits a rapid convergence rate, with order-of-magnitude differences compared with the performance of state-of-the-art methods based on tensor decompositions such as CPD-SGD and Tucker-SGD. The fast convergence rate can be attributed to the theoretical foundations of our RSCG-TT algorithm (see Section II-C1): we enforce the spectral-scaling condition, which implies a lower condition number and thus reduces the number of iterations required for the algorithm to converge. The results demonstrate the superior performance of RSCG-TT in terms of two accuracy measures compared with state-of-the-art SGD-based algorithms, namely Adaptive Moment Estimation using the CPD and Tucker tensor decompositions. We observe that the mean TCS (see Fig. 14 and Table 10) is in the range of 3.289 × 10⁻⁴ to 9.407 × 10⁻⁴ for the 4D fMRI tensor impacted by the SMV pattern with a realistic temporal missing value rate MR_T(%) ∈ {5, 10}%. The desirable numerical performance of RSCG-TT can be attributed to several factors: the 4D data structure is fully utilized, TT decomposition is employed, and the optimization exploits the Riemannian manifold structure. TT decomposition improves the accuracy by taking into account the global correlations across tensor modes.
RSCG-TT exhibits the best numerical results in terms of both the TCS and RSE, followed by Tucker-SGD, whereas CPD-SGD has the worst numerical accuracy. The poor performance of CPD-SGD can be explained by the fact that the CPD decomposition is better suited for the recovery of unique factors, whereas the TT and Tucker decompositions perform best in data completion scenarios.

B. EFFECT OF TENSOR DIMENSIONALITY ON FMRI DATA COMPLETION
We studied the numerical performance of the proposed algorithm with fMRI data represented as 4D, 3D, and 2D tensors. The experimental results suggest that the proposed algorithm performs best when we take the full 4D nature of fMRI data into account with a 4D tensor. Our experiments indicate that completion quality is acceptable, as confirmed by both quantitative and visual assessments, when the TCS is below 10⁻². The 3D tensor representation performs worse than the 4D tensor representation (see Fig. 8 and Fig. 14). The 2D tensor representation results in a TCS above 10⁻² for the majority of missing value rates for both the RMV and SMV patterns (see Fig. 8 and Fig. 14). Our results suggest that the 4D tensor is the natural structure of fMRI data, represented as 3D space × time, and allows us to utilize the richness of both spatial and temporal information to its full potential.
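The three representations compared above differ only in how the modes of the same data are grouped. A minimal sketch, with illustrative (hypothetical) dimensions:

```python
import numpy as np

# Hypothetical fMRI scan: 3D space x time (sizes are illustrative only).
X4 = np.random.default_rng(2).standard_normal((32, 32, 16, 50))

# 4D tensor: the native representation, 3D space x time.
# 3D tensor: merge two spatial modes, keeping the third and time separate.
X3 = X4.reshape(32 * 32, 16, 50)

# 2D representation: flatten all space into one mode (voxels x time),
# the most common matrix view in fMRI analysis pipelines.
X2 = X4.reshape(-1, 50)

# All views share the same entries; only the mode structure (and hence the
# inter-mode correlations a low-rank model can exploit) differs.
print(X4.shape, X3.shape, X2.shape)
```

A low-rank model fitted to `X2` can only exploit voxel-time correlations, whereas the 4D view exposes correlations along each spatial axis separately, which is the intuition behind the observed 4D advantage.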

C. ROLE OF A MISSING VALUE PATTERN ON THE CAPABILITY OF A FULL DATA RECOVERY
Based on the experimental results presented in Section V-B, we conclude that the RMV pattern is relatively easy to complete, which allows for a small TCS and, consequently, better numerical performance. RSCG-TT is able to successfully recover data corrupted by the RMV pattern using the 4D tensor format even with a very high missing ratio of 80% of the total number of voxels covered by a brain mask (see Fig. 8 and Fig. 9). As the missing voxels are random, the neighboring voxels contain enough information to infer them, which enables the successful completion of the data tensor. On the other hand, as confirmed by earlier studies [13], [14], data corrupted by the SMV pattern are more challenging to complete than those affected by the RMV pattern. The experiments indicate that a simultaneous increase in both MR_T(%) and MR_Ve3D(%) results in a reduction in the numerical performance of RSCG-TT (see Fig. 14). In this case, since the missing voxels follow the SMV pattern, the neighboring voxels do not have sufficient information for the inference, and therefore the missing voxels are more difficult to recover.
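The two missing-value patterns can be generated as boolean masks over a 4D tensor. The following is a hedged sketch with illustrative dimensions, center, radii, and rates; the paper's exact mask-generation procedure is described in its experimental section.

```python
import numpy as np

rng = np.random.default_rng(3)
nx, ny, nz, nt = 32, 32, 16, 50           # illustrative dimensions

# RMV: each voxel-timepoint entry is dropped independently at random.
mr = 0.5                                   # missing ratio, e.g. 50%
rmv_mask = rng.random((nx, ny, nz, nt)) < mr    # True = missing

# SMV: a spatial ellipsoid of voxels is missing across several timepoints,
# mimicking structured exclusions such as artifact-driven region removal.
xx, yy, zz = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                         indexing="ij")
center, radii = (16, 16, 8), (6, 4, 3)     # hypothetical ellipsoid
ellipsoid = (((xx - center[0]) / radii[0]) ** 2 +
             ((yy - center[1]) / radii[1]) ** 2 +
             ((zz - center[2]) / radii[2]) ** 2) <= 1.0

smv_mask = np.zeros((nx, ny, nz, nt), dtype=bool)
t_missing = rng.choice(nt, size=5, replace=False)   # e.g. 10% of timepoints
smv_mask[..., t_missing] = ellipsoid[..., None]

print(round(float(rmv_mask.mean()), 2), int(smv_mask.sum()))
```

Under RMV every missing entry has observed spatial and temporal neighbors, while under SMV a whole contiguous region is gone at the affected timepoints, which is exactly why SMV is the harder recovery problem.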

D. COMPARISON OF THE COMPUTATIONAL COMPLEXITIES OF RSCG-TT AND THE STATE-OF-THE-ART METHODS
In this section, we compare the computational complexities of RSCG-TT and the state-of-the-art methods, as summarized in Table 13. Here, similar to Section III-B, we assume that R_max is the upper bound on the ranks of the tensor decompositions under consideration. The CPD-SGD and Tucker-SGD methods are based on distributed Adam SGD, which averages the parameters after each iteration. The update for one entry in the SGD scheme requires O(1) operations, and the number of observed tensor entries for the CPD and Tucker models is |Ω|. Therefore, the total computational complexity for one iteration w.r.t. the free parameters and the number of observed entries is O(|Ω| N R_max) for the CPD model, and the estimate of the Tucker model can be calculated in O(|Ω| N R_max + R_max^N) [43], [97].
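The storage behavior behind these complexity expressions follows directly from counting the free parameters of each decomposition, which is standard for CPD, Tucker, and TT. A small sketch with illustrative dimensions and a uniform rank (the paper's actual ranks differ):

```python
def cpd_params(dims, R):
    # CPD: one I_n x R factor matrix per mode -> linear in N.
    return sum(I * R for I in dims)

def tucker_params(dims, R):
    # Tucker: factor matrices plus an R^N core -> exponential in N.
    return sum(I * R for I in dims) + R ** len(dims)

def tt_params(dims, R):
    # TT: cores of size R_{n-1} x I_n x R_n with boundary ranks 1;
    # all interior ranks are set to R here for simplicity.
    ranks = [1] + [R] * (len(dims) - 1) + [1]
    return sum(ranks[n] * dims[n] * ranks[n + 1] for n in range(len(dims)))

dims, R = (64, 64, 32, 400), 20    # illustrative 4D fMRI-sized tensor
for name, fn in [("CPD", cpd_params), ("Tucker", tucker_params), ("TT", tt_params)]:
    print(f"{name:6s} {fn(dims, R):>10,d} parameters")
```

For these illustrative sizes the Tucker core term R^N already dominates at N = 4, which is the curse-of-dimensionality argument made below for Tucker-based methods, while TT storage stays linear in N.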
To complete the entire 4D fMRI tensor, RSCG-TT requires, on average, 3 × 10³ ms of CPU time (see Fig. 10b) on a 2.7 GHz Xeon processor with 8-16 GB of memory. In Fig. 10b, we compare RSCG-TT and the state-of-the-art completion algorithms w.r.t. computational time, which is another key differentiator between the methods. RSCG-TT is at least two orders of magnitude faster than CPD-SGD and Tucker-SGD per unit of time. Moreover, RSCG-TT demonstrates a sharp decrease in the value of the objective function within the same time interval compared with the state-of-the-art methods. The presented results demonstrate the ability of RSCG-TT to approach the solution more quickly in terms of both computational measures, i.e., the complexity per iteration and the complexity per unit of time, whereas the state-of-the-art methods require a significantly higher computational time to achieve a comparable reduction in the value of the model cost function. The results shown in Fig. 10 can be explained by the properties of the algorithms listed in Table 13. The rate of change in the objective function value of the RSCG-TT method is superlinear, since it belongs to the family of nonlinear conjugate gradient algorithms [55], [59]. The state-of-the-art methods exhibit a linear rate due to their design inherited from the gradient descent scheme [55], [59]. With respect to computational time, the CPD-SGD method is clearly the least efficient among the three methods, requiring the largest computational time, whereas RSCG-TT performs best. Other notable state-of-the-art tensor completion algorithms are TT-ALS [98], [99] and TMac-TT [17]. The former is based on the alternating least squares (ALS) scheme, and the latter is based on parallel matrix factorization using adaptive rank estimation.
In each TT-ALS step, all TT-cores but one are kept fixed, and the overall optimization problem is reduced to a small optimization problem over a single core. Compared with the per-iteration complexity of RSCG-TT, the TT-ALS procedure scales with the fourth power of the rank instead of the square. While its space complexity is similar to that of RSCG-TT, TT-ALS exhibits slower convergence both theoretically [100] and in practice [99].
The advantages of TMac-TT over RSCG-TT include adaptive rank estimation and bypassing the computationally expensive SVD. Despite these advantages, the computational complexity O(3(N − 1)I^N R_max), which is exponential in the tensor dimension, makes the TMac-TT method impractical in large-scale tensor completion scenarios. Although, as shown in Table 13, the CPD-SGD method possesses better computational and space complexity (the total space taken by the algorithm with respect to the input size), the determination of the canonical tensor rank is an NP-hard problem [101]. Moreover, the best rank-R CPD decomposition might not exist, since the space of low-rank CPD tensors is not closed [102]. The Tucker-SGD model offers a more flexible tensor model; however, the space complexity of the Tucker format scales exponentially in the tensor dimension N, which makes Tucker-based methods impractical for tensor dimensions N ≥ 4 due to the curse of dimensionality.
From this perspective, our assessment suggests that TT-ALS, TMac-TT, CPD-SGD, and Tucker-SGD either provide suboptimal computational complexity or may not be able to deal with large-scale high-order tensors. Finally, as illustrated by Table 13, Fig. 10, and the considerations above, RSCG-TT is theoretically and practically superior to the state-of-the-art algorithms and is computationally preferable for data completion tasks.

E. SMV PATTERN IN PRACTICAL SCENARIOS
In this section, we discuss a practical application of the RSCG-TT method when fMRI data are analyzed in the complex-valued domain. fMRI data are natively acquired as complex-valued spatiotemporal images; however, most studies have focused only on the magnitude data. Several studies have shown that employing fMRI data in their native form, with both the magnitude and phase, improves the sensitivity of analyses [3], [72], [74]-[77]. Since phase images pose a unique challenge, a typical solution is to remove noisy regions from fMRI data prior to analyses. Consequently, regions are eliminated through thresholding and edge-reducing techniques or by using QMPD methods [3], [72], [74]-[77]. The QMPD maps are used to generate quality masks to exclude voxels that should not be further analyzed, as described in Section IV-E. The SMV pattern studied in this paper demonstrates one of the applications of the RSCG-TT method when 3D brain regions are excluded from the analysis. Here, we highlight the importance of our findings demonstrating the viability of SMV-pattern fMRI data completion, which can be used in the broader context of the MNAR voxel pattern. Finally, we conclude that taking the 4D structure of fMRI data into account indeed provides significant performance gains. We hope this will help encourage and accelerate research in the promising area of tensors for fMRI data, not only for data completion but also for the development of other tools for analyzing fMRI data, where the computational challenges might be even more significant.

TABLE 13. Summary of the computational complexities of RSCG-TT and the state-of-the-art tensor factorization algorithms. For simplicity, we assume that the length of every mode is equal to I and that R_max is the upper bound on the ranks of the considered tensor decompositions. (Columns: Algorithm; Computational complexity; Space complexity; Convergence rate; Rank computation complexity; Space of low-rank tensors.)

VII. CONCLUSIONS
We introduce a novel approach to the fMRI data completion problem based on TT decomposition, along with a corresponding algorithm for its solution. We develop a Riemannian nonlinear spectral conjugate gradient method, called RSCG-TT, to solve the higher-order fMRI data completion problem in the form of a 4D tensor. The RSCG-TT algorithm exploits second-order information and offers additional safeguards to ensure that computational performance is coupled with numerical stability. Since we deal with high-order datasets, we formulate our method in the TT format, which provides scalable numerical linear algebra operations and hierarchical compressed representations of structurally diverse datasets. Exploiting the Riemannian structure and the efficient utilization of the covariance information by TT decomposition boosts the numerical performance significantly when the proposed method is compared with state-of-the-art SGD-based methods, such as Tucker-SGD and CPD-SGD. One of the significant advantages of our approach is that we suggest viewing fMRI data as 4D spatiotemporal hypervolumes observed during a scanning session. This view takes into account the natural representation of fMRI data in the form of a fourth-order tensor represented as 3D space × time. As demonstrated by the experiments in Section V, representing a 4D tensor as a TT tensor allows us to fully exploit the global structure of fMRI data and introduces a joint learning scheme that captures the intrinsic relationship between the spatial and temporal modes. We consider the practical use of the RSCG-TT algorithm when select brain volumes of a 4D fMRI scan are affected by the exclusion of brain regions that resembles the MNAR missing voxel pattern. Our experiments suggest that the RSCG-TT algorithm can estimate the missing brain voxels affected by the MNAR pattern and provides a reliable recovery even in areas with very low activation thresholds and Z-scores below two standard deviations.
Studies with other lower-dimensional representations demonstrate that taking the full 4D structure of fMRI data into account provides significant gains compared to when 3D and 2D representations of the data are considered. We evaluated the ability of RSCG-TT to recover the RSN components after data completion. Our results suggest that RSCG-TT provides advantages in retaining the RSN components compared to the state-of-the-art tensor completion methods. Another promising direction in data completion research is fMRI data completion in the native k-space. The k-space intrinsically possesses very desirable properties, such as fMRI image sparsity in wavelet bases and the incoherence between the Fourier and inverse wavelet bases. These properties might allow one to significantly reduce the order of a model, i.e., the TT-rank, and to employ nonlinear data completion through a nonsmooth formulation of the optimization problem. Potential future studies include a number of exciting directions, such as the simultaneous estimation of an optimal multilinear TT-rank and robust tensor completion, complex-valued tensor completion, incorporating nonparametric cluster-level statistics, and working directly in the k-space with a nonsmooth objective function.

APPENDIX A LIST OF ABBREVIATIONS
2D two-dimensional
3D three-dimensional
4D four-dimensional
ARD Automatic Relevance Determination

APPENDIX B

It was shown in [37] that, for an embedded submanifold of R^(I1×···×IN), the vector transport is given by the orthogonal projection of a tangent vector ξ onto the tangent space T_Y M:

τ_(X→Y) ξ = P_(T_Y M)(ξ). (B.11)

The concept of the vector transport τ_(X→Y) is depicted in Fig. B.2(b).

Since the TT manifold M_r is embedded in the Euclidean space R^(I1×···×IN) [47], the Riemannian gradient for TT decomposition is computed as the orthogonal projection of the Euclidean gradient onto the tangent space (B.3) [37]:

grad f(X) = P_(T_X M_r)(∇f(X)), (B.13)

where f : R^(I1×···×IN) → R is the cost function with Euclidean gradient ∇f(X) and X ∈ M_r.

A. RIEMANNIAN MANIFOLD LEARNING FOR TENSOR DECOMPOSITION IN A TT FORMAT
Riemannian optimization on tensor manifolds is a growing field that generalizes the theory and algorithms of unconstrained Euclidean optimization to problems on smooth manifolds [37], [104]. It consists of optimizing a tensor-valued function on a curved manifold instead of in Euclidean space. In the Riemannian setting, the Euclidean notions of straight lines and ordinary differentiation are replaced with geodesics and covariant differentiation. Mathematically speaking, Riemannian tensor optimization considers finding an optimum of a scalar cost function f(X) of a tensor variable defined on a smooth differentiable manifold M in (B.1), equipped with a Riemannian metric structure g, given in (B.2), that is both symmetric and positive definite [105], [106]:

min_(X∈M) f(X). (B.15)

1) Riemannian optimization concepts
The continuous optimization on Riemannian manifolds requires us to construct a descent search curve γ such that

γ′(α) = −grad f(γ(α)) ∀α, (B.16)
γ(α) := R_X(αη), (B.17)

where X ∈ M, η ∈ T_X M is a search direction, T_X M is the tangent space (B.3) to the manifold at the point X, and α is the scalar step size following the geodesics (B.14) along the curve γ. Moving along the curve γ can be interpreted as moving in the search direction η while every iterate X_k is constrained to the manifold M.
In practice, since it is not feasible to compute geodesics in closed form, the computation of geodesics is replaced with a smooth mapping called a retraction, R_X : T_X M → M (B.7). The concept of a retraction is illustrated in Fig. B.2(a). Additionally, to compute the new search direction, we linearly combine the current gradient vector and the previous search direction. Technically, we need to transport information from one tangent space T_X M to another tangent space T_Y M using the vector transport τ_(X→Y) in (B.9). An intuitive drawing of a vector transport is depicted in Fig. B.2(b). Finally, with the retraction R, the updating formula for the curvilinear search is given as

X_(k+1) = R_(X_k)(α_k η_k),

where the step size α_k is computed to satisfy the Wolfe conditions [59] and η_k is the search direction satisfying the descent condition [55].
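The projection-retraction machinery described above can be demonstrated on the simplest embedded manifold, the unit sphere, rather than the TT manifold. This is an illustrative sketch only: the cost function, step size, and matrix are hypothetical, and the TT manifold uses the same template with its own tangent projection and retraction.

```python
import numpy as np

def project_tangent(x, v):
    # Orthogonal projection onto the tangent space T_x S of the unit
    # sphere: remove the component of v normal to the sphere at x.
    return v - (x @ v) * x

def retract(x, v):
    # Metric-projection retraction: step in the embedding space,
    # then map back onto the manifold by renormalizing.
    y = x + v
    return y / np.linalg.norm(y)

# Cost: Rayleigh quotient f(x) = x'Ax on the sphere; its minimizer is
# the eigenvector associated with the smallest eigenvalue of A.
A = np.diag([3.0, 2.0, 0.5])
x = np.ones(3) / np.sqrt(3.0)
for _ in range(500):
    egrad = 2.0 * A @ x                   # Euclidean gradient
    rgrad = project_tangent(x, egrad)     # Riemannian gradient, as in (B.13)
    x = retract(x, -0.1 * rgrad)          # curvilinear update X_{k+1} = R_X(a*eta)

print(np.round(np.abs(x), 3))             # converges to the e3 eigendirection
```

Swapping in the TT tangent-space projector and a TT retraction (e.g. TT-rounding back to rank r), plus a vector transport to combine successive search directions, turns this loop into the skeleton of a Riemannian method such as RSCG-TT.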

2) Manifold structure for TT decomposition
In [47], the authors show that the set M_r of tensors with fixed multilinear rank r = (R_0, R_1, R_2, ..., R_N) in the TT format forms a smooth embedded submanifold of R^(I1×I2×···×IN), where the dimension of M_r is given by

dim M_r = Σ_(n=1)^N R_(n−1) I_n R_n − Σ_(n=1)^(N−1) R_n², (B.20)

and the tensor X admits a TT decomposition (1) with multilinear TT-rank r in (2). We use the inner product in (4), defined for the TT decomposition, as a metric on M_r. Therefore, when we equip M_r with this metric, M_r becomes a Riemannian manifold (M_r, g). As a result, the Riemannian structure of the TT manifold allows us to use any method from optimization on Riemannian manifolds with the Riemannian gradient defined in (B.13).
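The known dimension formula for the fixed-TT-rank manifold (core parameters minus the gauge degrees of freedom from the non-uniqueness of the cores) is simple to evaluate; the sizes and ranks below are illustrative, not taken from the paper's experiments.

```python
def tt_manifold_dim(dims, ranks):
    """Dimension of the fixed TT-rank manifold M_r.

    dims  = (I_1, ..., I_N); ranks = (R_0, R_1, ..., R_N) with R_0 = R_N = 1.
    dim M_r = sum_n R_{n-1} I_n R_n - sum_{n=1}^{N-1} R_n^2
    """
    core_params = sum(ranks[n] * dims[n] * ranks[n + 1]
                      for n in range(len(dims)))
    gauge = sum(r * r for r in ranks[1:-1])   # gauge (non-uniqueness) terms
    return core_params - gauge

# Illustrative 4D tensor with TT-ranks (1, 5, 5, 5, 1):
print(tt_manifold_dim((32, 32, 16, 50), (1, 5, 5, 5, 1)))
```

This count is what makes Riemannian methods on M_r attractive: the iterates live in a space whose dimension grows linearly with N, far smaller than the ambient tensor space.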
IRINA BELYAEVA received the M.S. degree in computer science from Johns Hopkins University, Maryland, in 2013. She is currently pursuing the Ph.D. degree in computer science at the University of Maryland, Baltimore County, MD, USA. Her main research interests lie in the fields of matrix and tensor factorizations with applications in medical imaging, specifically working with fMRI and MEG imaging modalities.
SUCHITA BHINGE received the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, Baltimore County, MD, USA, in 2015 and 2020, respectively, and the B.S. degree in electronics and telecommunication from the University of Pune, India, in 2013. Her research focuses on developing unsupervised machine learning models by integrating concepts from matrix and tensor factorization, information theory, and optimization methods. Her interests include matrix decomposition, machine learning, deep learning, medical imaging analysis, data fusion, and video processing.