Independent vector analysis for common subspace analysis: Application to multi-subject fMRI data yields meaningful subgroups of schizophrenia

The extraction of common and distinct biomedical signatures among different populations allows for a more detailed study of the group-speci ﬁ c as well as distinct information of different populations. A number of subspace analysis algorithms have been developed and successfully applied to data fusion, however they are limited to joint analysis of only a couple of datasets. Since subspace analysis is very promising for analysis of multi-subject medical imaging data as well, we focus on this problem and propose a new method based on independent vector analysis (IVA) for common subspace extraction (IVA-CS) for multi-subject data analysis. IVA-CS leverages the strength of IVA in identi ﬁ cation of a complete subspace structure across multiple datasets along with an ef ﬁ cient solution that uses only second-order statistics. We propose a subset analysis approach within IVA-CS to mitigate issues in estimation in IVA due to high dimensionality, both in terms of components estimated and the number of datasets. We introduce a scheme to determine a desirable size for the subset that is high enough to exploit the dependence across datasets and is not affected by the high dimensionality issue. We demonstrate the success of IVA-CS in extracting complex subset structures and apply the method to analysis of functional magnetic resonance imaging data from 179 subjects and show that it successfully identi ﬁ es shared and complementary brain patterns from patients with schizophrenia (SZ) and healthy controls group. Two components with linked resting-state networks are identi ﬁ ed to be unique to the SZ group providing evidence of functional dysconnectivity. IVA-CS also identi ﬁ es subgroups of SZs that show signi ﬁ cant differences in terms of their brain networks and clinical symptoms.


Introduction
The study of neuroimaging data from patients and healthy controls is prevalent in the neuroimaging field with a goal to identify differences in the brain function of these two groups (Shenton et al. (1992); Pascual--Marqui et al. (1999); Gur et al. (2000)) and data-driven techniques based on matrix decompositions are being increasingly used for the task (Ma et al. (2014); Abolghasemi et al. (2015)). Identification of common and distinct subspaces from multiple datasets transforms the high dimensional datasets into lower dimensional joint and disjoint subspaces, and allows for a more detailed analysis of the group-specific as well as distinct information. In these models, the assumption is that each observed dataset is explained by a sum of linearly mixed latent variable models.
The common subspace is defined as a subset of latent variables that are highly correlated across the given datasets. The distinct subspace is a subset of latent variables that have very low correlation to each other. The joint subspaces bring the datasets to a common ground, thus allowing for a fair and reliable comparison among different population groups. Meanwhile the distinct components can be used to study individual differences such as the unique connection pattern of a patient with mental disease.
Along with identifying a common subspace comprised of components correlated across all subjects, the extraction of common components across a subgroup of subjects is also of interest. Clinical heterogeneity of patients with mental disorders, especially in schizophrenia has been recognized (Jablensky (2006(Jablensky ( , 2010; Dwyer et al. (2018)), and there has been significant interest in studying their subtypes (Hallmayer et al. (2005); Geisler et al. (2015); Morar et al. (2018)). The study of subtypes can be made possible by identifying subgroups of patients that share specific common information and can help better understand the uncertainty in the need of precision medicine (Senn (2018)) during clinical diagnosis and treatment. Subtypes of schizophrenia have been well studied using genetic information (Hallmayer et al. (2005); Morar et al. (2018)) but not yet using other neuroimaging modalities such as functional magnetic resonance imaging (fMRI) data, which has been successfully used in the study of schizophrenia (Geisler et al. (2015); Calhoun and Adalı (2009); Ma et al. (2012)). The common subspace analysis motivates us to find a way to identify subgroups of patients with subtypes of schizophrenia by summarizing their shared information.
Given the importance of common and distinct subspace analysis in medical image analysis, a number of recent studies had a focus on this aspect, in particular for fusion of different modalities such as fMRI, structural MRI and electroencephalograph, or of fMRI data from different tasks (Sui et al. (2012); Adalı et al. (2015a); Ponnapalli et al. (2011);Van Deun et al. (2012); Lock et al. (2013); Klami et al. (2013); Levin-Schwartz et al. (2017); Dontaraju et al. (2018)). However these cases have only been demonstrated for joint decomposition of a small number of datasets. As we have discussed, distinct and common subspace analysis also promises to be attractive for multi-subject analyses. The models used for identification of common and distinct subspaces in fusion study have not been well-studied in the context of the joint analysis of more than a couple of datasets. Multi-subject data analysis involves joint analysis of at least tens, or more typically hundreds of subjects. A recently proposed method Shared and Subject-Specific Dictionary Learning (ShSSDL) (Iqbal et al. (2018)) targets multi-subject task fMRI analysis and identifies shared components across subjects as well as subject-specific components. However, ShSSDL assumes common time courses across datasets and is not able to identify components that are common across subgroups of subjects. Another ICA-based algorithm, hierarchical ICA (Guo and Tang (2013)), simultaneously estimates the population-level and subject-specific sources. However, the complexity of the density model used in hierarchical ICA grows when the number of datasets increases and it does not account for the dependence structure of these sources.
In this work, we propose a new method, which we call, independent vector analysis (IVA) for common subspace extraction (IVA-CS) to extract subspaces from large-scale datasets and demonstrate its successful application to the analysis of fMRI data collected from 179 subjects. IVA (Kim et al. (2006b,a)) has been successfully used for multi-subject medical imaging data analysis such as fMRI data (Lee et al. (2008); Adalı et al. (2015b); Engberg et al. (2016)) and has been shown to effectively capture subject variability compared to the group independent component analysis (ICA) approach (Michael et al. (2014); Laney et al. (2015)). IVA extends ICA to multiple datasets and makes effective use of the dependence across the datasets through the definition of a source component vector (SCV), making it an attractive choice for subspace analysis. Through the selection of an effective density model for the SCV, with IVA, one can model and estimate the statistical dependence across datasets. Additionally, the strong identification condition of IVA-i.e., the ability to uniquely identify the underlying latent variables-enables the preservation of subspace structure even using only second-order statistics (SOS) as we demonstrate by simulation results in Section IV. IVA with multivariate Gaussian distribution (IVA-G) (Anderson et al. (2012)) is an IVA algorithm that takes only SOS into consideration by assuming a multivariate Gaussian distribution for each SCV and provides efficient estimation with reliable convergence due to its desirable analytical properties. We show that IVA-CS using IVA-G (IVA-G-CS) is powerful in discovering the subspace structure and estimating the subspaces through a careful study of the correlation structure of SCVs.
Furthermore, IVA-CS helps mitigate the high dimensionality issue of IVA and enables a reliable estimation of the latent sources from multisubject data. The curse of dimensionality of IVA notes that the performance of IVA degrades with increase in the number of sources, i.e., the IVA model order, and datasets for a fixed number of samples ). Hence for a high model order and relatively large number of datasets, IVA requires a proportionally large number of data samples for efficient estimation of the demixing matrices. However, the number of samples is fixed in many real-world applications such as fMRI data analysis since the resolution of the data is predetermined. On the other hand, the dependent information across datasets is not sufficient for IVA to exploit in the case with a relatively small number of datasets. Thus, in order to reliably estimate the sources with a given number of samples, we need to determine the desirable number of datasets that is high enough to exploit the dependence across datasets and is not affected by issues regarding high dimensionality. For the purpose of finding the optimal number of datasets to be used in a single IVA decomposition, we estimate and fix the model order for the data, and explore for the number of datasets, i.e., number of subjects. We then divide the entire data into subsets of subjects and perform IVA on each subset to identify common subspace. This defines the first stage of IVA-CS, subset analysis. The common components from each subset are further compared to find the consistent common components for all subjects as well as subgroups of subjects in a group, which then defines the second stage of IVA-CS, i.e., common subspace identification.
We study the ability of IVA to preserve the subspace structure using simulated data and demonstrate that IVA identification condition enables successful recovery of the structure of all subspaces using only SOS. We compare the results to the commonly used multiset canonical correlation analysis (MCCA) (Kettenring (1971)) for joint data analysis that also uses SOS (Li et al. (2009b); Deleus and Van Hulle (2011)). Then, we apply IVA-CS to real fMRI data collected from 88 patients with schizophrenia (SZ) and 91 healthy controls (HC) and show that IVA-CS extracts interpretable common components for the SZ and HC groups separately. These common components are typical resting-state network (RSN) components such as sensorimotor, frontoparietal and default mode network, which are also found in previous studies (Kiviniemi et al. (2009);Allen et al. (2011)). The results show that two components unique to the SZ group include two different interesting RSNs. The first one merges the motor cortex (precentral and postcentral gyrus) and the auditory cortex (superior temporal gyrus) and the other one merges the precuneus gyrus and the right frontoparietal network. Analysis of the correlation matrices of group-specific components helps us identify a number of subgroups of patients that show significant difference in terms of spatial activation of different brain networks such as cingulate gyrus, secondary visual gyrus, primary somatosensory and motor cortex, and inferior frontal gyrus. This provides neurobiological support for the heterogeneity of schizophrenia. The subgroups also demonstrate significant differences in clinical symptoms that are measured by the positive and negative syndrome scale (PANSS) scores (Kay et al. (1987)) and possess unique dominant and absent symptoms. PANSS scores are analyzed synthetically, rather than individually since the self-reported symptom scores are, in general, subjective and noisy, and hence not effective in terms of categorization of disease. These findings emphasize the importance of interpreting subtypes of schizophrenia in terms of both the neuroimaging data analysis and the clinically diagnostic data.
The rest of the paper is organized as follows. Section 2 presents the background of IVA. Section 3 introduces the details of the subset analysis and common subspace identification of the proposed IVA-CS method. Section 4.1 and Section 4.2 show the simulated results and the application to real fMRI data analysis separately. Section 5 gives the discussion and Section 6 summarizes the work and points out future directions.

Independent vector analysis
In most real-world applications the observed data consists of multiple datasets such as the fMRI data that is collected from multiple subjects.
Enabling an analysis of multiset data to leverage its rich information especially across the datasets is important. ICA is a data-driven blind source separation technique that is designed for a single dataset with the assumption that the observed data is a linear mixture of latent (statistically) independent sources (Hyv€ arinen et al. (2001)). It has proven powerful in recovering the independent brain networks from fMRI data ; Calhoun et al. (2001); Li et al. (2009a)). IVA extends ICA to the joint analysis of multiple datasets by additionally taking into account the dependence across datasets ; Kim et al. (2006b,a)).
Suppose there are K datasets collected from K subjects each containing V samples. IVA assumes that each dataset is a linear mixture of N independent sources, (1) where X ½k ¼ ½x ½k ð1Þ; x ½k ð2Þ; ⋯; x ½k ðVÞ 2 R NÂV , S ½k ¼ ½s ½k ð1Þ; s ½k ð2Þ; ⋯; s ½k ðVÞ 2 R NÂV and A ½k 2 R NÂN denote the observed dataset, the set of independent sources, and the invertible mixing matrix respectively. A general model of IVA is shown in Fig. 1 ½k n 2 R VÂ1 is the n th source from the k th dataset. ICA can be achieved by minimizing the mutual information among the individual latent sources. After extending to multiple datasets, an IVA solution finds K demixing matrices by minimizing the mutual information among the SCVs, which results in the following cost function Eflogp n ðy n Þg À such that the estimated sources of each dataset are obtained as y ½k ðvÞ ¼ W ½k x ½k ðvÞ for k ¼ 1; ⋅ ⋅ ⋅; K, where W ¼ fW ½1 ; W ½2 ; ⋯W ½K g denotes the demixing matrices, y n denotes the estimated SCV, and H ð ÁÞ denotes the (differential) entropy. Minimization of (2) is equivalent to maximization of likelihood through the general asymptotic equipartition property ).
where Y n ¼ ½y n ð1Þ; y n ð2Þ; ⋯y n ðVÞ, and p n ð ÁÞ denotes the multivariate probability distribution of nth SCV. In both (2) and (3) the term that is constant with respect to W associated with the observed data X is ignored. For simplicity, in the rest of the paper we consider the simpler independent and identical distribution case and do not take sample dependence into account.

Identification condition for IVA
Identification condition of IVA is studied by analyzing the Fisher information matrix of the objective function (3) with respect to the demixing matrices W ; Anderson et al. (2014)). Compared with the identification condition of ICA that is associated with two individual sources, the identification condition of IVA is introduced for two SCVs. It is shown that the identification condition of IVA depends on the second-order statistics (SOS) of subsets of sources in an SCV. If the SOS defined through covariance matrices provide the required diversity across SCVs, the SCVs are separable even when they are multivariate Gaussian distributed since their SOS can be accurately captured by any type of multivariate distribution prior. We define an α-SCV as an SCV with a particular subset of source components that are K α -dimensional multivariate distributed and independent from the complementing subset in the same SCV. Note that all sources are assumed to have unit variance and zero mean for simplicity. Let α 2 N Kα be a subset of source indices within an SCV, where 0 K α K. The complementing subset of α in 1; ⋯; K is denoted as α c 2 N KÀKα . In two arbitrary α-SCVs, the subsets of sources, S n;α 2 R KαÂV and S m;α 2 R KαÂV , cannot be identified if and only if there exists a full rank diagonal matrix D 2 R KαÂKα such that R m;α ¼ DR n;α D; (4) where R n;α ≜ EfS n;α S T n;α g 2 R KαÂKα refers to the correlation matrix of the subset of sources ). This suggests that the correlation matrices of the subsets of sources in two unseparable α-SCVs have the same structure but different scaling. If an IVA algorithm only takes SOS into consideration, any subset of sources within an SCV that possess this property will not be separated into individual ones. An example of two unseparable α-SCVs is shown in Fig. 1. The change in value of K α indicates that two SCVs can be unseparable either in whole or in subsets of components. However, this is no longer a problem when other types of diversity, such as higher than second-order statistics (HOS) and sample dependence, of the data are taken into consideration.
The most commonly used blind source separation algorithm for multiple datasets is multiset canonical correlation analysis (MCCA) (Kettenring (1971); Li et al. (2009b)) which is based on SOS only. Five cost functions, i.e., GENVAR, MINVAR, MAXVAR, SUMVAR, and SSQCOR, are introduced in (Kettenring (1971)) for maximizing correlation among linearly transformed multiple datasets, which in IVA formulation are the SCVs. MCCA adopts a deflationary approach to estimate one SCV at each time hence its cost function is associated with the correlation matrix of a single SCV. Since the correlation among the sources within each SCV is maximized, the goal for each cost-mostly ad-hoc in nature-is trying to make the correlation matrix as ill-conditioned as possible. MCCA with GENVAR cost function (MCCA--GENVAR) can be shown to have the same cost function as IVA-G (Anderson et al. (2012)) if the demixing matrices are assumed to be orthogonal ). Both MCCA and IVA-G make use of only the SOS of the data. The identification condition for MCCA is given by where r ½n k;l is the element of R n (Li et al. (2009b)). Hence MCCA cannot preserve the structure of SCVs. In contrast, IVA-G has a more general identification condition given in (4) that is synchronized for all SCVs, which enables one to discover subspace structures as we demonstrate by simulation results in Section 4.1.

IVA algorithms
To maximize the likelihood, besides W, we need to estimate the multivariate density function of SCV. The selection of the multivariate distribution, p n ðy n Þ, determines whether SOS and/or HOS of the data are taken into consideration. IVA-G assumes that the sources in an SCV are multivariate Gaussian distributed and only takes SOS into consideration (Anderson et al. (2012)). This assumption guarantees a positive definite Hessian matrix of the cost at the global optimum hence providing a reliable estimation by using second-order optimization techniques such as using Newton updates. IVA with multivariate Laplacian distribution (IVA-L) assumes each SCV is modeled by the multivariate Laplacian distribution (Kim et al. (2006b,a)). The assumption is that the correlation matrix of an SCV is identity thus taking only HOS into consideration. IVA with multivariate generalized Gaussian distribution (IVA-MGGD) on the other hand assumes that an SCV is multivariate generalized Gaussian distributed (Anderson et al. (2013; Boukouvalas et al. (2015)). IVA-MGGD calculates the whole correlation matrix and estimates the shape parameter of the MGGD that models each SCV, making it possible to take both SOS and HOS into consideration. As a consequence, this algorithm is computationally complex but provides good performance. IVA-L with SOS (IVA-L-SOS) ) calculates the whole correlation matrix of each SCV as in IVA-MGGD but fixes the shape parameter to 0.5 to model a multivariate Laplacian distribution which is a good match for fMRI sources. Hence it takes both SOS and HOS into consideration and simplifies the computational complexity compared with IVA-MGGD. Both IVA-G and IVA-L-SOS have proven powerful in extracting interpretable source components when applied to medical imaging data analysis (Anderson et al. (2012); Adalı et al. (2015a); Bhinge et al. (2019)).

IVA for subspace analysis
Common and distinct subspace analysis has proven useful in identifying distinct biomedical signatures of different populations in order to better understand the unique features of different brain disorders. Most subspace analysis algorithms introduced to date that have shown superior performance are designed specially for fusion study where only a few datasets are analyzed (Sui et Klami et al. (2013)). However, these models become extraordinarily complex as the data size increases to tens or even hundreds of datasets. Most medical imaging data like fMRI data is collected from tens or hundreds of subjects and a joint analysis of the multiset data enables one to leverage its rich information especially across the datasets. This motivates the exploration of an algorithm which can identify common and distinct subspaces from relatively large-scale datasets.
We assume that the source space of the observed data consists of three sets of SCVs. The first set of SCVs fY Cn ¼ ½y ½1 Cn ; ⋯; y ½K Cn T g; 1 n N C , define the common subspace where the sources within each SCV are highly correlated (across all datasets) and the third set of SCVs, fY Dn ¼ ½y ½1 Dn ; ⋯; y ½K Dn T g;1 n N D , consist of low correlated sources (correlation values less than σ 2 t ) as shown in Fig. 2(a). Another set of group-specific SCVs, fY Gn ¼ ½y ½1 Gn ; ⋯; y ½K Gn T g; 1 n N G , can be used to determine subgroups of subjects that have more highly correlated RSNs in an unsupervised manner. Therefore, the observed dataset for the k th subject is a mixture of three types of sources: In this work, we propose a new method called IVA-CS that is able to extract subspaces from at least couple of hundreds of datasets. IVA-CS leverages the strength of IVA that its identification condition enables successful preservation of the SCV structure, which makes it highly effective for subspace analysis. However, IVA suffers from the curse of dimensionality when applied to real-world applications with a high number, e.g., hundreds, of datasets such as in multi-subject fMRI data analysis. In this scenario since the number of samples (voxels) is fixed, as the number of datasets and the number of sources increases, it does not guarantee an accurate estimation of the demixing matrices W and calculation of SCV correlation matrices any more. Therefore, our IVA-CS method is composed of two stages, (i) subset analysis and (ii) common subspace identification. The subset analysis stage overcomes the challenge of dimensionality issue by first performing multiple individual IVA decompositions on subsets of subjects that are randomly sampled from the population. The exploration of subset size, K, will be introduced in detail in Section 4.2.2. The common subspace identification stage determines a consistent common subspace for the whole population as well as a set of the group-specific SCVs. The details of IVA-CS method are given next.

IVA-CS: subset analysis
The flowchart of IVA-CS is shown in Fig. 2(b). In subset analysis, R subsets of K subjects are randomly selected from all subjects of a group. IVA is performed on each subset yielding N SCVs. We define the whole signal space as Y that includes all N estimated sources, where y ½k n denotes the n th source of the k th dataset. For each subset, its signal space is denoted as Y r ; 1 r R. Note that all source components are normalized to have unit variance and zero mean hence their covariance values and correlation values coincide. For each SCV we compute a KÂ K correlation matrix with KðK À1Þ=2 distinct correlation values. Using these correlation values, we can study how close these components are to each other and determine whether the corresponding source component is common across all the datasets or not. For an SCV corresponding to the common component, we expect that all the correlation values are significantly high. We measure the "commonality" of an SCV by computing the ratio of the number of correlation values that are greater than σ 2 t , an empirically determined threshold, to the total number of correlation values for each SCV as follows, where N σ 2 t denotes the number of correlation values that are greater than σ 2 t in the correlation matrix of an SCV. An SCV corresponding to a common component is expected to have high commonality hence the value of q is close to 1. Another relevant metric is "dissimilarity" which is defined as the ratio of the number of low correlation values to the number of high correlation values The dissimilarity, the value ofq, for a common SCV is expected to be close to 0.
For each subset r; r ¼ 1;⋯;R, N SCVs are sorted in descending order by the mean value of the correlation, which roughly arranges the SCVs from those with high source correlation to those with low source correlation. The number of common SCVs in each subset, M r , is determined as the largest number that allows for most of the first M r SCVs having q ! δ 1 andq δ 2 . Here not all the M r SCVs are required to satisfy the criteria, seeking to allow flexibility for a further examination of the commonness of these components. After determining the first subspace Y rI that is spanned by the M r common SCVs for each subset, P percent of the remaining SCVs that have a mix of both high and low correlation values forms the second subspace Y rII . The other SCVs form the third subspace Y rIII , which is the distinct subspace.

IVA-CS: common subspace identification
The second stage, common subspace identification has the goal to find a consistent common subspace Y C for all subjects of a group. To enable a comprehensive comparison across R subsets, we adjust the number of components in the subspace Y rI to be common as M across all subsets. The M SCVs are candidates for the common subspace identification. The value of M is determined as the largest value of M r , seeking to select as many candidates as possible. The mean component, y mr ; m ¼ 1; ⋯; M; r ¼ 1; ⋯; R , is calculated by averaging the K components in the m th SCV for each subset r. The cross-correlation of each mean component is defined as the average correlation with its corresponding components in the other R À 1 subsets: where ρ mrl ; 1 l R; l 6 ¼ r; is the Pearson correlation coefficient between the mean components y mr and y ml . For each subset, its crosscorrelation is defined as the average cross-correlation of its M mean components The subset r 0 with the most consistent components is selected as the one with the largest cross-correlation ρ r and is used to identify the common subspace for all the subjects. The identified common components should be not only common across the subjects within subset r 0 but also consistent across all subsets. To achieve this goal, we use a datadriven decision tree to determine whether a component is common or not as shown in Fig. 2(b). A threshold t is determined from the MðR À1Þ cross-correlation values. For each candidate, if more than a given percentage of its cross-correlation values are higher than t, it is determined to be a common component. First, the determined common component comes from the first subspace of subset r 0 which means it is common across all the subjects within subset r 0 . Second, this decision tree ensures that this component is correlated with its corresponding components from most of all the subsets hence it is consistent across subsets. The identified common subspace of all subjects therefore consists of all the determined common components. The components that are filtered out as not common are merged with subspace Y II to form a new set of SCVs to identify subgroups of datasets that have more highly interfered components through an unsupervised analysis.

Data and code availability
The data that supports the findings of this study is openly available on the collaborative informatics and neuroimaging suite (COINS) data exchange repository (https://coins.trendscenter.org/). The codes used in this work are available upon direct request from the corresponding author. The data and code sharing adopted by the authors comply with the requirements of the funding bodies.

Simulations
We use IVA-G with a block Newton update that provides desirable convergence properties (Anderson et al. (2012)), and in what follows, demonstrate its effectiveness in subspace extraction.

Simulation setup
To study the ability of IVA to maintain the structure of SCVs to effectively identify the subspaces, we design a set of simulations. Our application is fMRI data analysis and the latent fMRI sources are likely to be super-Gaussian distributed (Calhoun and Adalı (2006)). Therefore, all the SCVs are generated from an MGGD with the shape parameter β randomly selected from the interval [0.1, 0.8], which generates super-Gaussian marginals. The SCVs are mixed by randomly generated mixing matrices, A ½k , from a standard uniform distribution to form the datasets using (1). A total of N ¼ 30 SCVs are generated with V ¼ 10000 voxels and K ¼ 20 datasets. Note that all the MGGD sources are normalized to zero mean and unit variance hence the covariance values and correlation (coefficients) coincide. These SCVs are grouped to imitate three subspaces and the details are as follows: The first group of 14 SCVs are simulated as common sources with high correlation value ρ c . The correlation matrix of a common SCV is shown in Fig. 3(a).
The second group of 6 SCVs have structured correlation matrices with some higher value of ρ c and some lower value of ρ d . This indicates that the sources with higher correlation values are common within a subgroup of datasets. The structured correlation matrices are shown in Fig. 3(b)-(g).
The last group of 10 SCVs are simulated as distinct sources with low correlation value ρ d . Fig. 3(h) Fig. 3 shows the correlation matrices of the true SCVs and the estimated SCVs for a random run for Case 1. To demonstrate that IVA identification condition enables preserving the structure of SCVs compared to MCCA due to its identification condition, we perform IVA-G and three versions of MCCA, i.e., MCCA-GENVAR, MCCA with MAXVAR cost function (MCCA-MAXVAR), and MCCA with SSQCOR cost function (MCCA-SSQCOR), on the mixtures of sources, X. MCCA-MAXVAR and MCCA-SSQCOR behave similar to MCCA-GENVAR and Fig. 3 shows the results of IVA-G and MCCA-GENVAR. Both IVA-G and MCCA-GENVAR successfully extract the common subspace spanned by SCVs 1-14. But for the SCVs 15-20, as we can see there are only a subset of components highly correlated with each other. It is possible for some components in one SCV to have a certain level of correlation with the components in another SCV. MCCA-GENVAR estimates one SCV by maximizing the correlation among the sources across datasets. MCCA tends to group as many correlated sources as possible in the SCVs that are estimated first, due to its deflationary nature. Hence we observe some SCVs that are merged together, as shown in Fig. 3(r)-(u), and that the SCV15 and SCV16 contain more correlated sources than SCV17 and SCV18. This breaks the structure of true SCVs and makes it impossible to identify the subgroups of sources within an SCV, i.e., to identify subspaces. As a result, MCCA yields fewer SCVs with structured correlation matrix and more distinct SCVs compared with the ground truth. In contrast, IVA-G successfully preserves the structure of all SCVs. Note that these SCVs are estimated subject to permutation ambiguity due to the nature of all blind source separation algorithms. This illustrates the desirable use of IVA-G to extract subspaces to identify subgroups.

IVA-G and IVA-L-SOS
Identification condition of IVA-G implies that IVA-G is not able to separate α-SCVs that have proportional correlation matrices no matter which distribution the latent sources are drawn from. The unseparable sources can be the whole SCVs or a subset of sources in the SCVs since 0 K α K. In our simulation SCVs 1-14 have the same correlation matrix hence IVA-G identifies the whole subspace successfully while not the individual SCVs. The same situation for SCVs 21-30. As noted in Section 2.2, an IVA algorithm that takes both SOS and HOS into consideration like IVA-L-SOS can solve this problem. We can either perform IVA-L-SOS on the original datasets or perform a secondary IVA-L-SOS decomposition on the subspace identified by IVA-G. Reliably identifying subspaces of interest by IVA-G followed by a secondary decomposition of an IVA algorithm that takes both SOS and HOS (and hence is computationally more complex) into consideration on the subspaces helps save considerable computation time. This two-step procedure guarantees better performance by first identifying a desirable starting point. This strategy is favorable in the applications of blind source separation and is the core in the common practice of performing principal component analysis first to provide an orthogonal initialization for an ICA decomposition.
The correlation between the estimates and the ground truth is calculated to evaluate the performance. IVA-G yields good estimation of SCV15-20 and their average correlation is 0.996 AE 0.014 across all four cases. To explore the ability of IVA-L-SOS to separate α-SCVs with proportional correlation matrices, a secondary IVA-L-SOS decomposition is performed on the common and distinct subspace separately. The common and distinct subspaces are constructed separately by multiplying the common and distinct SCVs from IVA-G with their associated mixing matrices calculated as the inverse of the estimated demixing matrices. Fig. 4 shows that IVA-G yields common components that are not highly correlated with the ground truth. However, these components are very reliably estimated by the secondary IVA-L-SOS with correlation values close to 1. This verifies that the identification issue of IVA is not a problem anymore when HOS is taken into consideration.
4.2. Application to multi-subject resting state fMRI data

Data acquisition and preprocessing
The data used in this study is a resting state fMRI data from the Center of Biomedical Research Excellence (COBRE), which is available on the collaborative informatics and neuroimaging suite data exchange repository (https://coins.trendscenter.org/) (Scott et al. (2011);Çetin et al. (2014); Aine et al. (2017)). The data includes 88 SZs (average age: 37 AE 14) and 91 HCs (average age: 38 AE 12). For this study, the participants were asked to keep their eyes open during the entire scanning period. All images were collected on a single 3-T Siemens Trio scanner with a 12-channel radio frequency coil using the following parameters: TE ¼ 29 ms, TR ¼ 2 s, flip angle ¼ 75 , slice thickness ¼ 3.5 mm, slice gap ¼ 1.05 mm, voxel size 3.75 Â 3.75 Â 4.55 mm 3 . Participants were instructed to keep their eyes open during the scan and stare passively at a central fixation cross. Each resting state scan consists of 150 volumes. To eliminate the T1-related signal fluctuations (T1 effect) (Shin et al. (2013)), the first 6 volumes are removed in this study, thus 144 volumes remain for each subject. The fMRI data are realigned with INRIalign algorithm (Freire et al. (2002)), slice-timing correction is applied using the middle slice as the reference frame in the functional data pipeline and spatially normalized to the standard Montreal Neurologic Institute (MNI) space (Friston et al. (1994)) and resampled to 3 Â 3 Â 3 mm 3 , resulting in 53 Â 63 Â 46 voxels. Afterwards, the fMRI data are smoothed using a Gaussian kernel with a full-width at half-maximum of 5 mm.

Parameter selection
IVA procedure we employ does not require the selection of any parameter except the model order, i.e., the dimension of signal space N. However, for fMRI data, classical order estimation techniques based on information theoretic criteria may overestimate the order due to the inherent sample dependence of fMRI data (Li et al. (2007(Li et al. ( , 2011). A common way to overcome this issue is by using downsampling to obtain effectively independent and identically distributed samples (Li et al. (2007(Li et al. ( , 2011). However, methods based on downsampling suffer from a loss of information associated with it. More recently, two entropy rate (ER)-based order estimation techniques are proposed that account for sample dependence without the use of downsampling: ER using a finite memory length model (ER-FM) and ER using an autoregressive model (ER-AR) ). Since the sample correlation structure in ER-FM is a better match to that in fMRI data due to the finite span of correlation in the point spread function, ER-FM is used in this paper to estimate the order of signal space for each subject. The final value of N is selected as the mean plus one standard deviation of the orders computed across all the subjects and is fixed for the IVA decompositions on R subsets. The mean and standard deviation of the order across subjects are 72.86 AE 10.40. We use an order equal to the mean plus one standard deviation, which is rounded up to 85 to retain a significant level of the variability across multiple subjects while introducing minimal noise. The use of this high model order is also well motivated in the literature for achieving a more useful functional segmentation of the brain, see e.g., (Kiviniemi et al. (2009);Allen et al. (2011)). The dimension of each dataset is reduced to 85 by performing a principal component analysis.
To achieve a reliable IVA decomposition, we need sufficient number of samples per estimated parameter so that IVA can effectively take dataset dependence into account. The performance of IVA degrades as the number of datasets, K, or the number of sources, N, increases beyond a certain point, when number of samples, V, is fixed. The estimation of the source covariance matrix that determines the multivariate Gaussian distribution of each SCV is required in IVA-G for each update of demixing matrices W, which in turn is used to estimate the SCVs. The total number of samples in a dataset is NV Â K. The number of free parameters to be estimated in the covariance matrix for each of K SCVs isKðK À 1Þ=2. Hence the number of samples per free parameter is NVÂK which is inversely proportional to K. When updating the demixing matrices W k , there are KN 2 free parameters hence the number of samples per free parameter is NVÂK KN 2 ¼ V=N, which is inversely proportional to N. To explore the optimal value of K that balances the effect of high dimensionality and maximal subject information when the N is fixed-i.e., determined using a data-driven approach-we design a hybrid simulation. The estimated 85 COBRE sources as presented in our previous study (Long et al. (2019)) are used as the sources and mixed by the randomly generated mixing matrices to produce the hybrid data. With a fixed model order and a fixed number of voxels, the number of subjects K is changed from 30 to 80 with increments of 10. The performance is measured by joint-ISI which is defined as where and G ½k ¼ W ½k A ½k with elements denoted as g nm (Amari (2000)). A ½k is the true mixing matrix and W ½k is the estimated demixing matrix. If W ½k is perfectly estimated, G ½k is identity subject to permutation and scaling ambiguities, thus yielding zero ISI that indicates a perfect separation. Therefore, the smaller the joint-ISI, the closer the estimates are to the ground truth. From Fig. 5 we see that the performance of IVA-G improves with increase in the number of datasets until a certain value of K, after which the performance degrades. This illustrates that when the number of datasets used in an IVA decomposition is too small, the interaction information across datasets is not sufficient for an accurate estimation of the sources. Hence the performance improves with more number of datasets. However, for a larger number of datasets, IVA suffers from the curse of dimensionality. Thus, the performance degrades. The experimental results show that there indeed exists an optimal value of the number of datasets for an IVA decomposition that balances the requirement of interaction information and the dimensionality issue. The optimal number of subjects in this application is determined as K ¼ 50, where IVA yields the best performance, i.e., the lowest joint-ISI as shown in Fig. 5. This value is then used as the subset size in subset analysis of IVA-G-CS. We randomly selected five subsets of subjects for each group to ensure that each subject is included at least once in the decompositions.
Note that in this work the values of all dimension parameters are determined in a data-driven manner.

Common subspace identification using IVA-G-CS
IVA-G-CS is applied to SZ group and HC group separately to identify their subspaces. Five subsets of 50 subjects are randomly selected for each group and 85 SCVs are estimated from each subset. The value at each voxel of the estimated source is transformed into Z-scores before any calculation of metrics hence the covariance and correlation coincide. For each SCV, there are 50 Â (50 À 1)/2 ¼ 1225 distinct correlation values. The smoothed distribution of correlation values is plotted as a function of the index of SCVs for the fifth subset of SZs and the second subset of HCs as shown in Fig. 6. The similar plots of the other subsets are provided as supplementary materials. The SCVs are sorted by the mean value of correlation to roughly order them from high correlation to low correlation. From Fig. 6(a) we can see there is a group of SCVs with all their correlation values higher than 0.2. Using σ 2 t ¼ 0:2, the commonality q and dissimilarityq are calculated and plotted in yellow and purple, respectively. In the application to this dataset, we chose δ 1 ¼ 0:98 and δ 2 ¼ 0:02. More than 90% of the first 28 SCVs have q ! 0:98 andq 0:02 in the fifth subset of SZs. This suggests that the first 28 SCVs are common across the subjects in this subset. Finally, the numbers of common SCVs M r for all five subsets of SZ group are determined as 23, 30, 33, 35, 28, respectively. For HC group, the numbers are 31, 32, 28, 34, 34, respectively. The number of candidates, M, in the subspace Y rI is determined as 35, which is the largest one among the ten values. The 25th percentile of the cross-correlation values of the M candidates is used as threshold t for the identification of common components. Each identified common component is consistently estimated in at least 80% of the subsets. Note that the values of σ 2 t , δ 1 , and δ 2 are selected such that the average correlation values in the subspaces are as different as possible.
Determining the value of M as the largest one across multiple subsets mitigates the sensitivity of the choice of parameters by allowing more candidates available for common component identification.

Artifact removal.
Using the common component identification method presented in Section 3.2, 25 and 24 common components are determined for SZ and HC groups separately. Among the determined common components, some of them have high ventricle effects and hence should be removed from further analysis. We utilize the grey matter (GM) and cerebrospinal fluid (CSF) MNI templates included in SPM 12 to distinguish the components (Sui et al. (2009);Bhaganagarapu et al. (2013)). The correlation between the common components and the two templates, C GM and C CSF , are calculated. Each component is normalized to Z-score and thresholded by Z ¼ 2 which means that the voxels with Z < 2 are set to zero. The components are divided into two groups with respect to the median value of C GM À C CSF . The components in the first group are with C GM À C CSF higher than the median hence they are more likely to be RSNs and those in the second group are more likely to be ventricle effects. To ensure all the RSNs are retained and the ventricle effects are removed, we further do a visual check. The grouping by the correlation reduces the burden of a visual check. The median value of C GM À C CSF is 0.18 for SZs and 0.22 for HCs. Finally, 14 common RSNs are obtained for SZs and 16 for HCs after artifact removal. The average correlation values of the 14 common RSNs of SZs is 0.66 AE 0.15 and that of the 16 common RSNs of HCs is 0.68 AE 0.10.
The identified common components of each group are typical RSN components like those found in previous studies (Kiviniemi et al. (2009);Allen et al. (2011)). They are grouped into six domains, motor, cognitive control (COG), default mode (DM), auditory (AUD), visual (VIS) and cerebellum (CB), according to their anatomical and presumed functional properties as in (Allen et al. (2011)). Fig. 7 shows the composite spatial maps for each cluster. The results show that two common components of SZs include two different interesting RSNs, the first one merges the motor cortex (precentral and postcentral gyrus) and the auditory cortex (superior temporal gyrus) and the other one merges the precuneus gyrus and the right frontoparietal network. The merging of two different RSNs into a single source component does not occur in the HC group. This observation suggests a high correlation between these RSN pairs might result from decreases in connectivity in the brain of individuals with schizophrenia (Bullmore et al. (1997); Stephan et al. (2009);Friston (1999)) hence more networks are involved in its functioning.

Analysis of group-specific SCVs
From the violin plots in Fig. 6, we can identify two more subspaces, Y II and Y III (distinct subspace). The subspace Y II has a mix of both high and low correlation values and Y III has very low correlation values. The value of P that is used to determine the subspace Y II is set as 40. Further analysis of these two groups is of interest as well. The SCVs in subspace Y II are called group-specific SCVs since they suggest that the components from some subgroups of subjects have higher correlation hence can be used to identify those subgroups. As we know that the brain functions differently in patients of schizophrenia compared with HCs. Using the group-specific SCVs, we seek to identify subgroups of patients that have RSNs with significant correlation which may result from similar functional patterns. The results indeed show that the subgroups identified using group-specific SCVs reflect similarity within each subgroup and significant differences across subgroups in terms of the spatial activation patterns of their RSNs. We also conduct statistic test on their clinical symptoms that are scored by PANSS and discover differences with certain significance level.
From the distribution plots of correlation values shown in Fig. 6(a), the subspace Y II is determined as that located within two vertical red lines. Meanwhile, those SCVs that are filtered out by common subspace identification are treated as group-specific SCVs as well. Consequently, N G ¼ 30 group-specific SCVs are identified for SZs. A k-means clustering is performed on the correlation matrices of the group-specific SCVs to find out clusters that have similar correlation matrices to help identify source components that are common within the same subgroup of subjects. The mean correlation matrix of a cluster is used to identify the subgroups of subjects. As shown in Fig. 8(c) and (d), the mean correlation matrix is rearranged to assemble the subgroup modules by maximizing the modularity of the matrix, which is called modularization. Modularity is a measure that quantifies the community structure of a network that is summarized in a matrix (Newman (2006)). As shown in Fig. 8(c), two clusters with higher modularity, 0.39 and 0.21, each yields three separate clear subgroups. There is no significant difference associated with age among the subgroups (p ! 0.1541).
To compare the spatial activation patterns of the RSNs across subgroups in each cluster, we perform a two-sample t-test on the activation value at each voxel of the spatial map across the subjects within each subgroup. False discovery rate (FDR) correction is conducted throughout all comparisons and the associated confidence interval after FDR correction is reported. Cluster I includes three components and two of them show significant differences in spatial activation patterns, as shown in Fig. 8(d). Subgroup 1 yields higher activation in the posterior cingulate gyrus and Brodmann area (BA) 31, and lower activation in the secondary visual cortex compared with subgroups 2 and 3. Subgroup 2 has lower activation than the other two subgroups in the anterior and posterior cingulate gyrus. Cluster II includes five components and three of them show significant differences in spatial activation patterns as shown in Fig. 8(e). Subgroup 3 has lower activation in the primary somatosensory and motor cortex, and higher activation in the secondary visual cortex. Subgroup 2 has lower activation in angular gyrus and higher activation in the inferior frontal gyrus. A particular case is the very small area of activation in BA30 that shows significant differences with 100% confidence interval after FDR correction between subgroups 1 and 3.
A multivariate analysis of variance (MANOVA) is conducted on five statistics-mean, standard deviation, median, minimum, and maximum-of all the thirty PANSS scores (Kay et al. (1987)) including seven positive, seven negative, and sixteen general scales. The MANOVA yields in F-score ¼ 3.978 (p ¼ 6.816 Â 10 À5 ) that demonstrates significant differences across the three subgroups in Cluster I (Cluster II was not significant). Fig. 9 summarizes the dominant and absent symptoms of each subgroup. The dominant symptoms of a subgroup refer to those that have median value greater than 2 or the median value is 2 for one subgroup while is 1 (which means absent) for the other two subgroups. In Cluster I, as shown in Fig. 9(b), subgroup 3 has more dominant symptoms, subgroup 2 has more absent symptoms, and subgroup 1 has obvious broader range for a number of symptoms such as delusions, tension, lack of judgment and insight, and active social avoidance. In Cluster II, as shown in Fig. 9(d), all three subgroups have several dominant symptoms. Subgroup 2 has more absent symptoms and subgroup 3 has broader range for symptoms such as stereotyped thinking, anxiety, and tension. All subgroups possess their unique dominant and absent symptoms. We also conduct a MANOVA on the symptoms present in all three subgroups. The MANOVA detects significant differences among the subgroups in Cluster I with F-score ¼ 4.1367 (p ¼ 8.302 Â 10 À5 ). In Cluster II, only the standard deviation demonstrates significant differences among subgroups with F-score ¼ 3.3846 (p ¼ 0.0404).

Discussion
Through the investigation of spatial activation patterns of the networks across subgroups, we find several interesting networks that show significant subgroup differences. Most of the networks such as the cingulate gyrus (Mucci et al. (2007)), somatosensory and motor cortex (Arango et al. (2000)), angular gyrus, inferior frontal gyrus (Kanahara et al. (2013)), and secondary visual cortex (Silverstein et al. (2009)) are reported to be related to schizophrenia (Takahashi et al. (2004); Whalley et al. (2007)). One particular case is the third component in Cluster II because the activated region that shows significant differences between two subgroups is very small yet very interesting. This network is BA30 which does not have a specific name and only the function of its left part, where the activation pattern shows significant subgroup differences, is reported in (Vorobyev et al. (2004); Patel et al. (2006)). Its function is related to attending to speech and listening to sentences. Hearing voices is the most common type of hallucination-one of the typical symptoms-in people with schizophrenia.
In addition to studying the differences across the identified subgroups using the spatial maps of the networks extracted from fMRI data, we also investigate the differences in terms of their clinical data-PANSS scores. PANSS scores-1 means absent and 7 means extreme-are medical scales used for measuring symptom severity of patients with schizophrenia (Kay et al. (1987)). The self-reported symptom scores are, in general, subjective and noisy, and hence not effective in terms of categorization of disease. Focusing the analysis on individual PANSS scores is not sufficient to describe the subgroups. By investigating the PANSS scores via a MANOVA, we find significant differences across the subgroups and identify unique dominant and absent symptoms for each subgroup. The high significance level provides more confidence in the identified subgroups in Cluster I. Extracting reliable information of the subgroups of schizophrenia from both the neuroimaging data and clinically diagnostic data can potentially lead to a better understanding of the underlying heterogeneity of the disorder and in the future may lead to improved categorization and treatment strategies. Along with the model order and the number of datasets to be used in a single IVA decomposition, which are determined in a data-driven manner, there are other parameters in the proposed procedure that are practically determined. However, these parameters are easy to determine and can be categorized into two classes. The first class includes the thresholds, the correlation threshold t and σ 2 t . These can vary across different datasets but are easy to determine. For the value of t the suggestion is to use a simple statistic, such as the 25th percentile used in this work, of all the correlation values. Two examples of the correlation structure analysis are shown in Fig. 6, where we clearly see the three distinguishable subspaces. This demonstrates the strength of IVA-CS to identify subspaces. Hence in cases where there are distinguishable subspaces the threshold σ 2 t needed for further division is also easy to determine. Thresholding method might not be effective in cases where the subspaces are not clearly delineated. A better choice of the division of subspaces in these scenarios can be achieved by using additional meaningful prior information such as sparsity that is used in (Peng et al. (2016)). However, in most cases, the groups are heterogeneous and we would expect to see such subspace clusters. The other class of parameters includes parameters such as, 80% of the subsets, δ 1 , δ 2 , and the percentage of group-specified components, P. These are ratios that are practically determined to be large or small enough to avoid missing useful information or including unreliable, noisy information. Their determination depends on users' choice. Small changes in these ratios are not expected to change the conclusions. Furthermore, the parameters included in the second stage and post analysis do not influence the use of IVA-CS for subspace analysis in other applications since the proposed flexible framework allows researchers to design their own way of extracting information from the subspaces. Hence the parameters can be either accordingly reserved for easy selection or fully omitted due to the design of experiment.

Conclusion and future directions
Given the importance of common and distinct subspace analysis in medical imaging data analysis, a number of recent studies had a focus on this aspect, in particular for fusion of different modalities or tasks where only a couple of datasets are jointly analyzed. However, joint analysis of large-scale medical imaging data such as multi-subject fMRI data collected from tens, or typically hundreds of subjects enables one to leverage the rich information across the datasets. Here, we introduced a new method called IVA-CS to extract subspaces from at least a hundred of datasets by leveraging the strength of IVA in identification through Fig. 8. The clustering and modularization processes of the correlation matrices of group-specific SCVs and the two clusters that yield clear subgroups of patients (a-c). Subgroup 1, 2, and 3 are labeled by orange, magenta, and red squares separately. The differences of spatial activation patterns across subgroups in Cluster I (d) and Cluster II (e) are represented by the t-statistic from a two sample t-test with FDR ( 0.05) control and the associated confidence interval (CI) is reported. The brain regions that show significant subgroup (SG) differences including: posterior cingulate gyrus (PCG), BA31, secondary visual cortex (SVC), anterior cingulate gyrus (ACG), primary somatosensory and motor cortex (SSM), angular gyrus (Ang), inferior frontal gyrus (IFG), and BA30. successful preservation of the complete SCV structure. This allows for efficient identification and estimation of subspaces by carefully studying the dependence structure of SCVs. IVA-CS also mitigates the high dimensionality issue of IVA by introducing subset analysis to determine a desirable number of datasets that is high enough to exploit the dependence across datasets and is not affected by issues regarding high dimensionality. The simulation study verifies the ability of IVA to preserve subspace structure and its application to real fMRI data demonstrates its effectiveness. The identified common components with two linked networks provide evidence of the functional dysconnectivity in the brain of SZs. A subspace of group-specific SCVs is identified by IVA-CS for the SZ group as well. The subgroups of SZ recognized using the group-specific SCVs exhibit significant differences in terms of their brain networks as well as their clinical symptoms that are measured by PANSS. These findings emphasize the importance of interpreting subtypes of schizophrenia in terms of both the neuroimaging data analysis and the clinically diagnostic data. A better understanding of the underlying heterogeneity of the disorder in the future may lead to improved categorization and treatment strategies.
IVA-CS successfully identifies subspaces for the COBRE dataset where we demonstrated its robustness through a subset analysis. It will be desirable to apply IVA-CS to different datasets to provide further evidence that the proposed framework is useful. Other interesting future directions include the use of common components estimated by IVA-CS. They represent the global information of the population of a group hence allowing for the identification of more reliable and robust brain patterns of the patients with mental disorder. In the application to the analysis of fMRI data that is collected from at least a hundred of subjects, we demonstrate that IVA-G-an IVA algorithm that uses only SOS-reliably extracts interpretable common RSNs that are consistent with previous work. In this work, we use subset analysis to mitigate the high dimensionality issue. As discussed in Bhinge et al. (2019), another way to deal with the high dimensionality issue of IVA is to use constrained IVA (Bhinge et al. (2017))-a semi-blind source separation technique-to shrink the solution space. The common components from IVA-G-CS hence can be used as constraints to mitigate the high dimensionality issue and to guide the analysis with an IVA algorithm that uses both SOS and HOS. One other possible future application is the dynamics study of brain function using fMRI data, where each dataset is further divided into multiple datasets thus resulting in hundreds even thousands of datasets. The hardware used in the computational studies is part of the UMBC High Performance Computing Facility (HPCF). The facility is supported by the U.S. National Science Foundation through the MRI program (grant nos. CNS-0821258, CNS-1228778, and OAC-1726023) and the SCREMS program (grant no. DMS-0821311), with additional substantial support from the University of Maryland, Baltimore County (UMBC). See hpcf .umbc.edu for more information on HPCF and the projects using its resources. Fig. 9. The median (connected by solid lines), minimum, and maximum (connected by dash lines) values of each PANSS score of subgroups in Cluster I (a) and Cluster II (c), and the dominant (bright orange, magenta, and cyan) and absent (fade orange, magenta, and cyan) symptoms of each subgroup in Cluster I (b) and Cluster II (d). In (b) and (d), the bold symptoms refer to those that are unique for a subgroup and the translucent symptoms refer to those that are absent from all three subgroups.