On the brain structure heterogeneity of autism: Parsing out acquisition site effects with significance‐weighted principal component analysis

Abstract Neuroimaging studies have reported structural and physiological differences that could help understand the causes and development of Autism Spectrum Disorder (ASD). Many of them rely on multisite designs, with the recruitment of larger samples increasing statistical power. However, recent large‐scale studies have put some findings into question, considering the results to be strongly dependent on the database used, and demonstrating the substantial heterogeneity within this clinically defined category. One major source of variance may be the acquisition of the data in multiple centres. In this work we analysed the differences found in the multisite, multi‐modal neuroimaging database from the UK Medical Research Council Autism Imaging Multicentre Study (MRC AIMS) in terms of both diagnosis and acquisition sites. Since the dissimilarities between sites were higher than between diagnostic groups, we developed a technique called Significance Weighted Principal Component Analysis (SWPCA) to reduce the undesired intensity variance due to acquisition site and to increase the statistical power in detecting group differences. After eliminating site‐related variance, statistically significant group differences were found, including Broca's area and the temporo‐parietal junction. However, discriminative power was not sufficient to classify diagnostic groups, yielding accuracies results close to random. Our work supports recent claims that ASD is a highly heterogeneous condition that is difficult to globally characterize by neuroimaging, and therefore different (and more homogenous) subgroups should be defined to obtain a deeper understanding of ASD. Hum Brain Mapp 38:1208–1223, 2017. © 2016 Wiley Periodicals, Inc.

Abstract: Neuroimaging studies have reported structural and physiological differences that could help understand the causes and development of Autism Spectrum Disorder (ASD). Many of them rely on multisite designs, with the recruitment of larger samples increasing statistical power. However, recent large-scale studies have put some findings into question, considering the results to be strongly dependent on the database used, and demonstrating the substantial heterogeneity within this clinically defined category. One major source of variance may be the acquisition of the data in multiple centres.
In this work we analysed the differences found in the multisite, multi-modal neuroimaging database from the UK Medical Research Council Autism Imaging Multicentre Study (MRC AIMS) in terms of both diagnosis and acquisition sites. Since the dissimilarities between sites were higher than between diagnostic groups, we developed a technique called Significance Weighted Principal Component Analysis (SWPCA) to reduce the undesired intensity variance due to acquisition site and to increase the statistical power in detecting group differences. After eliminating site-related variance, statistically significant group differences were found, including Broca's area and the temporo-parietal junction. However, discriminative power was not sufficient to classify diagnostic groups, yielding accuracies results close to random. Our work supports recent claims that ASD is a highly heterogeneous condition that is difficult to globally characterize by neuroimaging, and therefore different (and more homogenous) subgroups should be defined to obtain a deeper understanding of ASD. Hum Brain Mapp 38:1208-1223, 2017.

INTRODUCTION
Autism Spectrum Disorder (ASD) is a neurodevelopmental syndrome characterized by social and communication impairment as well as restricted, repetitive patterns of behaviour, interests or activities. The delimitation of both functionally and structurally affected areas in the brain in such an etiologically and neurobiologically heterogeneous condition has long been a major concern Lai et al., 2013;Lenroot and Yeung, 2013]. With this context, the use of large samples is of fundamental importance, and has been addressed by establishing multicentre collaborations such as the UK Medical Research Council Autism Imaging Multi-centre Study (MRC AIMS) [Ecker et al., , 2013[Ecker et al., , 2015 and the Autism Brain Imaging Data Exchange (ABIDE) [Di Martino et al., 2014] Multicentre studies with structural (sMRI) and functional magnetic resonance imaging (fMRI) are increasingly common, allowing for recruitment of larger samples in shorter periods of time. However, the use of images acquired at different sites still poses a major challenge. In addition to logistical difficulties, such as regulatory approvals and data protection, a number of technical and methodological issues can potentially affect the resulting maps, introducing undesired intensity and geometric variance.  [Jovicich et al., 2006;Stonnington et al., 2008], where group differences are well known, and demonstrating that the impact of a correction for site on the resulting neurobiological differences is relatively small. However, these effects have a stronger impact in psychiatric conditions where the atypical radiological signs on MRI are often subtle and require large samples of patients to observe on-average differences relative to control samples. Recent meta-analyses point to differences being inconsistently reported in schizophrenia [Friedman and Glover, 2006;Turner et al., 2013], psychosis [Clementz et al., 2016;Wang et al., 2015], and ASD (using the multicentre ABIDE database) [Haar et al., 2014]. These inconsistencies can arise from a variety of variance sources, ranging from the multi-level (phenotypic, neurobiological, and etiological) heterogeneities of the conditions to technical issues that include differences in scanner make, model, manufacturer, static field strength, field inhomogeneities, slew rates and image reconstruction [Van Horn and Toga, 2009], as well as acquisition problems such as within-acquisition participant head motion. Field inhomogeneities are a source of misinterpretation of the data even when the same MRI system manufacturer and model are used [Van Horn and Toga, 2009]. Furthermore, results in [Pearlson, 2009] demonstrate that a single scanner can change with time, which makes some widely used strategies, for example collecting controls first and patients later, a flawed approach. Recent neuroimaging research on ASD [Haar et al., 2014] has shown that, while analyses performed on a particular database (acquired on a single platform) could yield coherent regions, the atypical structures are often inconsistent across the wider literature using different databases. Therefore, new methodologies focused on reducing multi-site variance may be potentially helpful in increasing the power to identify the characteristic neurobiological signature of autism, should there be one.
A number of approaches have been proposed to reduce between-site variance. Geometric distortions caused by magnetic fields inhomogeneities have been widely studied [Jovicich et al., 2006;Stonnington et al., 2008]. Furthermore, there exist a number of diffeomorphic registration algorithms, such as DARTEL [Ashburner, 2007] or ANTS [Avants et al., 2010], intended to reduce inter-subject-and by extension, inter-site-variations. In the case of intensity variance, the issue has already been addressed in the MRC AIMS database by leveraging sMRI sequences that yield quantitative estimates of relaxation times [Deoni et al., 2008] that have been demonstrated to reduce single-site effects compared to weighted sequences. However, images acquired using this technique still yield between-site differences .
To address the problem of intensity variance and improve the homogeneity of the images across different sites, we have developed a new post-acquisition methodology that enhances derived maps (e.g. grey or white matter volumes) by means of ameliorating site effects. This method, called significance-weighted principal component analysis (SWPCA), can be applied as part of pre-processing before computing whole-brain or regional statistical analysis. The algorithm proceeds by performing a principal component analysis (PCA) over the whole database of images and later computing the statistical significance of each component in relation to a categorical variable, in this case the acquisition site. This information is used to reconstruct the datasets using a weighted strategy that effectively reduces intensity inhomogeneities due to site effects.
This article is organised as follows: First, in "Material and Methods" section, the methodology is presented. It comprises the presentation of the MRC AIMS image database and its pre-processing, PCA, one-way analysis of variance (ANOVA), and how these two statistical procedures are combined in a weighted approach to create the corrected maps. In "Results" section, the algorithm is applied to maps of grey and white matters and estimates of relaxation times, with qualitative and quantitative results presented before and after applying SWPCA to the images. "Discussion" section is used to discuss the relevance of the SWPCA algorithm and the results of the case-control comparison, and finally in "Conclusions" section, we draw conclusions concerning multi-site studies and the neurobiology of autism.

Image Database
Structural MRI were analysed from 136 adult, righthanded males (68 with ASD and 68 matched controls) with no significant mean differences in age and full-scale IQ, acquired from the centres contributing to the UK Medical Research Council Autism Imaging Multi-centre Study (MRC AIMS) [Ecker et al., , 2013[Ecker et al., , 2015 and recruited by advertisement. In this work, only participants recruited at the Institute of Psychiatry, King's College London (LON) and the Autism Research Centre, University of Cambridge (CAM) were included where an equivalent set of images were acquired from each participant.
Participants were excluded from the study if they had a history of major psychiatric disorder or medical illness affecting brain function (e.g. psychosis or epilepsy), or current drug misuse (including alcohol), or were taking antipsychotic medication, mood stabilizers or benzodiazepines.
All participants with ASD were diagnosed according to International Classification of Diseases, 10th Revision (ICD-10) research criteria, and confirmed using the Autism Diagnostic Interview-Revised (ADI-R) [Lord et al., 1994]. Autism Diagnostic Observation Schedule (ADOS) [Lord et al., 2000] was performed, but the score was not considered as an inclusion criteria. ASD participants, to be r Martinez-Murcia et al. r r 1210 r included, must have scored above the ADI-R cut-off in the three domains of impaired reciprocal social interaction, communication and repetitive behaviours and stereotyped patterns, although failure to reach cut-off in one of the domains by one point was permitted. Intellectual ability was assessed using the Wechsler Abbreviated Scale of Intelligence (WASI) [Wechsler, 1999], ensuring the participants fell within the high-functioning range on the spectrum defined by a full-scale IQ > 70. The demographics of the participants are shown in detail in Table 1.
Structural MRI were obtained using Driven Equilibrium Single Pulse Observation of T1 and T2 (DESPOT1, DES-POT2) [Deoni et al., 2008] at King's College London and University of Cambridge, both with 3T GE Medical Systems HDx scanners. Using multiple spoilt gradient recall (SPGR) acquisitions in the DESPOT1 sequence and steady state free procession (SSPF) acquisitions in the DESPOT2 sequence, with different flip angles and repetition times, quantitative T1 and T2 maps (qT1 and qT2) were calculated with a custom ImageJ plug-in package. Correction of main and transmit magnetic field (B0 and B1) inhomogeneity effects was performed during the estimation of T1 and T2.
For accurate registration to the standard stereotatic space of the Montreal Neurological Institute (MNI), a simulated T1-weighted Inversion Recovery (IR) images (synT1) were created based on the qT1 maps (Ecker et al., , 2013Lai et al., 2012]. The synT1 images were then segmented using New Segment into grey (GM) and white matter (WM) maps, and normalized to the MNI space using DARTEL in SPM8 [Friston et al., 2007], with modulation (preserve volume) to retain information of regional/ local GM and WM volumes, and smoothed with a 3mm FWHM Gaussian Kernel to account for inter-subject misregistration. The synT1, qT1 and qT2 maps were also registered to the standard MNI space using the same DAR-TEL flow fields, but without modulation (preserve concentration) to retain information of regional/local T1 contrast, T1 relaxation time, and T2 relaxation times, and smoothed with a 3mm FWHM Gaussian kernel. Therefore, there were five different modalities: qT1, qT2, synT1 map, GM and WM maps, for each every participant, which allows us to observe the impact of our SWPCA correction of site-related undesired variance on quantitative (qT1 and qT2), simulated (synT1) images and probability maps (GM and WM).
During the pre-processing of the images, several procedures targeted the reduction of inter-subject and inter-site geometric distortion, amongst them the correction of B0 and B1 field inhomogeneity effects and the registration to MNI space. Many other algorithms have been proposed to help in this task. However, the study of their relative performance lies beyond the scope of this article. Following image registration, it was assumed that only the intensity of the maps was affected between sites.

Intracerebral Mask
Prior to any processing of the brain images, masks were applied that restricted the analysis to the brain parenchyma. These binary masks comprised voxels in the DARTEL study-specific template with GM (or WM) probabilities >0.3, whilst excluding voxels with a cerebrospinal fluid (CSF) tissue probability >0.3, to avoid including CSF regions where T1 and T2 values are extremely high. Throughout this work three masks were used: selectively GM (excluding WM and CSF); selectively WM (excluding GM and CSF); both GM and WM (excluding CSF).

Significance Weighted Principal Component Analysis
The SWPCA is an algorithm to reduce, in this case, undesired intensity variance introduced by multi-site image acquisition. SWPCA takes any dataset of preprocessed images, spatially normalized, and decomposes them into their variance components to then provide a corrected dataset where these undesired variance components have been reduced. To do so, PCA was applied to each modality in turn to obtain the component scores and component loadings. Since PCA is a data-driven approach, it was only used to decompose the source images, and after this procedure, a one-way ANOVA estimated the relation between each variance component and a given categorical variable, in our case, the acquisition site. The between-site variability in the variance component was then identified by its corresponding P-value. Finally, these P-values were transformed into a weighting matrix K that weighted the influence of each variance component in a final PCA reconstruction of the corrected maps. The procedure is summarized in Figure 1.

Principal component analysis
The first step in the SWPCA algorithm was to perform a PCA decomposition of the dataset into a set of orthogonal components that model the variance present in the images, in an analogous way to the paradigm described in Active Appearance Models (AAM) [Cootes et al., 2001]. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations X of possibly correlated variables, where X is a K 3 N matrix, with K participants (in this case, with one image per participant) and N the number of voxels, into a set of N linearly uncorrelated variables called Principal Components (PC, also known as component loadings or the mixing matrix) W of size N 3 N whose linear combination using a vector of component scores s K can perfectly recompose each image. The set of these component scores S (size K 3 N) was estimated as:

S5XW
(1) This transformation computes a sequence of PCs, maximally explaining the variability of the data while maintaining orthogonality between components. PCA was computed using singular value decomposition (SVD): where U is an K 3 K orthogonal matrix, R is an K 3 N diagonal matrix with non-negative real numbers on the diagonal, and the N 3 N unitary matrix V * denotes the conjugate transpose of the N 3 N unitary matrix V. With this decomposition both the component scores and estimates of the set of components loadings W were obtained. In this work the truncated form of SVD was used such that only the first C components were considered, where most of the variability of the data was concentrated: where S C is the set of component scores using the first C components (size K 3 C). To achieve reasonable performance with minimal information loss, it was assumed that the number of components was the same as the number of images, C 5 K. Thus, a partial reconstruction of the original signal could be undertaken: where A C is the pseudoinverse of the truncated matrix of component loadings W C , andX is the reconstructed set of images.

One-way analysis of variance
The estimated PCs effectively model the variability of the image dataset. The next step was to assess each PC as a source of inter-site variance with ANOVA. ANOVA estimates the F-statistic, defined as the ratio between the estimated variance within groups and the variance between groups: Where MS within and MS between are the mean squares withinand between-groups respectively, G is the number of separate groups (in our case, two), Y is the sample mean of a certain feature (in our case, the sample mean of all K values of a given component score), Y i is the sample mean of the features belonging to group i 5 1. . .G, Y ij is the j th observation of a feature belonging to group i and n i is the number of participants in the i th group. The F-distribution allows an easy computation of P-values, given the number of groups and degrees of freedom. The F-statistic and Pvalues were computed independently for each component

Weighting function
To obtain a set of corrected maps, a new signal matrix of all maps of the same modality,X , was estimated with the influence of the PCs with variance related to acquisition site, assessed via the P-values, reduced. To do so, Eq. (4) was modified to include a square matrix K (dimension C 3 C) whose diagonal contains a weight k c for each component that depends on its P-value; that is,

X5SKA
( The computation of each k c , for each component, was performed using the Laplace distribution, modified so that the weights were on the interval [0, 1]: where p c is the statistical significance of the c-th component with respect to the acquisition site and p th is the statistical threshold for significance; that is, p th 50.05. This weighting ensured that most of the components of variance that are not related to the acquisition site are kept unchanged, while at the same time it strongly reduces the influence of components with P-values less than the threshold. This procedure is illustrated in Figure 2, where a boxplot of the distribution of the first four principal component scores is shown. Since we have assumed that substantial differences imply a bigger influence of the acquisition site on the portion of variance modelled by that component, the resulting weight is reduced, and the contribution of that component to the reconstructed signal will be smaller. After computing all weights, most of the sources that are related to the acquisition site (for example, the second and third components) have been parsed out while keeping all other sources of variance.

Experimental Settings and Validation
To validate the effects of the SWPCA algorithm on the inter-site variance, experiments were undertaken to assess the reduction of the undesired site variance in the original datasets, and its impact on the between-group signal. Two kind of analysis were performed: a characterization of voxel-wise differences, and a classification analysis.
Voxel-wise differences between groups were characterized using voxel-based-morphometry (VBM) [Ashburner and Friston, 2000], comprising preprocessing (registration, smoothing) and mass-univariate t-test on the smoothed maps from each modality. SWPCA is included (when needed) in this pipeline as a plug in, after the smoothing and before the computation of the test. Permutation testing assessed the significance of the relationship between the tested and target variables. A max-type procedure was used to obtain family-wise, whole-brain corrected P-values [Freedman and Lane, 1983]. Additionally, a componentbased morphometry (CBM), based on source based morphometry (SBM) [Xu et al., 2009] was used. This procedure provided Z-maps for visual inspection comparable to those obtained in VBM, by selecting component loadings W, scaling them to unit standard deviation and weighting their contribution to the final map with their statistical significance, computed using the same permutation inference as in VBM.
A classification analysis was undertaken using a common classification pipeline [Khedher et al., 2015;L opez et al., 2009] consisting of preprocessing, feature extraction and classification. SWPCA is used as a plug-in here as well, after the preprocessing and before the feature Box-plot of the distribution of the component scores at each site in the four first components. We assume that bigger differences between distributions imply a bigger influence of the acquisition site on the portion of variance modelled by that component and therefore, to parse out those differences, the resulting weight will be smaller. [Color figure can be viewed at wileyonlinelibrary.com] r On the Brain Structure Heterogeneity of Autism r r 1213 r extraction step. We used PCA on the images for feature reduction and a Support Vector Classifier (SVC) with linear kernel, as implemented in LIBSVM [Chang and Lin, 2011], to classify the component scores in both corrected and uncorrected datasets (i.e. with and without SWPCA).
The classification was validated using stratified 10-fold cross-validation [Kohavi, 1995]. In brief, 9 subsets of the dataset were used for extraction of the PCs and training of the classifier with the remaining subset used for testing. This procedure was repeated for each subset, repeated 10 times to avoid possible bias and random effects of the partitions. The average and standard deviation of the accuracy (acc), sensitivity (sens) and specificity (spec) values for each repetition were recorded.
For each modality independently, the following experiments were performed: Experiment 1: To demonstrate the ability of the SWPCA algorithm to reduce undesired effects due to acquisition site, the PCA 1 SVC pipeline was applied to the datasets labelled by acquisition site. Classification accuracy was compared to datasets with and without SWPCA. VBM was then applied to identify the spatial location of the between-site differences. This was undertaken on the whole database (ALL), and subgroups containing only ASD or CTL participants.

Experiment 2:
The discrimination ability of each modality, acquired at different sites was assessed by classification performance of individuals from London (LON) and Cambridge (CAM) was separately assessed, using group (ASD and CTL) as the labels.
Experiment 3: To assess the impact of SWPCA on the datasets when characterizing the differences between ASD and CTL groups, the classification pipeline comprising PCA 1 SVC, as well as VBM and CBM, were applied to all participants with group as the labels.

Experiment 1: Effect of Acquisition Site
The first experiment was to demonstrate the ability of SWPCA to reduce the intensity variance related to acquisition site. To do so, we first performed a VBM analysis in all five modalities (qT1, qT2, synT1, GM and WM) separately, with the uncorrected (without applying SWPCA) and the corrected (after applying SWPCA) maps, using the acquisition site as labels.
To illustrate where the sources of variance of the acquisition sites are located, Figure 3 shows a brain t-map of significant (P < 0.01, |t|>2.57) GM and WM between-site differences. The biggest reductions in variance were found in qT1 and synT1 maps, where high variability between acquisition sites, especially in the right hemisphere, was substantially reduced after the application of SWPCA. The reduction in the qT2, GM and WM maps was smaller, although noticeable.
To quantify the impact of this variance reduction on the between-groups effects, the classification analysis was undertaken. Higher accuracy values imply that the maps contain site-related patterns that were significant, whereas accuracy close to 0.5 indicates that the site-related variance was low. The test was applied to ALL, and also to the ASD and CTL subgroups. The classification results are presented in Table 2.
Performance results indicate clear advantages of using SWPCA, in particular in the case of qT1 and synT1 which were associated with strong site-dependent variance. These results are also consistent with the reduction of significant between-group areas observed in Figure 3.
The between-site differences were smaller for GM and WM maps, possibly due their reduced sensitivity. Since fractional occupancy values are abstract, unitless values derived from each image they are less influenced by the acquisition site effects. For qT2 maps, the site-related differences were greater for the CTL participants than ASD where, according to the classification accuracy, they were nearly indistinguishable. Acquisition site differences were therefore noticeably reduced in the CTL and ALL databases, but not in the ASD.

Experiment 2: Within-Site between-Group Differences
In this second experiment, accuracy, sensitivity and specificity in the between-group comparison were recorded for images acquired from each site. This is an estimation of the discrimination ability of the different modalities without the influence of the site effects; Table 3. For all modalities, most of the values are close to a random classifier ($50%), indicative of having either no significant differences between groups, or having spatially heterogeneous patterns of sMRI measures across individuals where mass-univariate approaches are sub-optimal in detecting group differences. It is interesting to note that the London sample contained more between-group differences that those acquired in Cambridge.

Experiment 3: Effect of SWPCA on Group Differences
Finally, group differences were characterised with and without applying site-effects reduction via SWPCA to the five modalities.
Whole-brain VBM analysis was performed on the corrected and uncorrected maps from each modality. Figure 4 depicts the brain t-maps of significant (P < 0.01, |t|>2.57) qT1, qT2, synT1, GM and WM between-group differences, using ALL, with the GM 1 WM mask, before and after applying SWPCA, so that the reduction of site-related r Martinez-Murcia et al. r r 1214 r variability can be observed. Some of the highlighted areas after applying SWPCA are inconsistent across modalities, with spurious peaks and noise, including a large area around the ventricles in the qT1 and synT1 modalities related to some abnormal participants that will be discussed later. However, there were some areas that were consistent across modalities. The complementary CBM (Section 2.4) analysis was performed on the most significant components. The resulting regions, statistically thresholded with |Z|>2.57 Figure 3. Brain t-map (voxel-based morphometry) of significant (P < 0.01, |t|>2.57) GM and WM betweengroup differences using qT1, qT2, synT1, GM and WM modalities after applying SWPCA to remove site effects. [Color figure can be viewed at wileyonlinelibrary.com] r On the Brain Structure Heterogeneity of Autism r r 1215 r (corresponding to P < 0.01), were superimposed on the MNI template, and are depicted in Figure 5. A reduction of significant between-group areas after applying SWPCA is evident in most modalities, but particularly noticeable in the qT1 and qT2. In WM no significant regions were observed, neither before nor after SWPCA. The significant regions identified in any modality corresponded to the AAL areas of the CSF filled areas around the ventricles (planes z 5 26, 4, 14, 24), the right middle temporal gyrus (plane z 5 14) and the left crus I of cerebellar hemisphere (plane z 5 226). However, none of these regions were repeated over more than two of the modalities, except for the large areas around ventricles that were caused by abnormalities in three participants, which will be discussed later.
Performance results for the classification analysis applied to ALL are shown in Table 4. Between-group results were quite similar before or after applying SWPCA, although reducing between-site variance generally reduced the performance towards a random classifier. The results in this table match the overall effects that were found in Figure 4, where most spurious significance peaks disappeared after applying SWPCA, but some regions were highlighted. These regions, where SWPCA did not seem to eliminate the significant areas but enhanced them, could be responsible for the accuracy increment in the analysis of the qT2 modality, and the GM with GM mask.

DISCUSSION
Brain anatomical and functional differences between ASD participants and controls have been explored by a number of previous studies [Di Martino et al., 2014;Ecker et al., 2015;Hernandez et al., 2015;Lenroot and Yeung, 2013;Z€ urcher et al., 2015]. Many affected structures have  been proposed in each of these studies, however as a recent large-scale study points out [Haar et al., 2014], these are frequently inconsistent throughout the literature. Researchers argue that most of these structures are database-dependent, and since many studies use multi-site acquisition procedures, the variance introduced by each acquisition site is a probable source of Type I errors.
The technical and logistical drawbacks of multicentre studies are widely documented, including participant recruitment procedures [Pearlson, 2009] and technical effects that range from the usage of different equipment or acquisition parameters [Van Horn and Toga, 2009] to physical changes that affect the performance of MRI scanners across time [Pearlson, 2009]. There is general recognition that standardization is needed to ensure the uniformity of the acquired maps. Different approaches have been used in large-scale studies, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) where Brain t-map (voxel-based morphometry) of significant (P < 0.01, |t|>2.57) grey and white matter differences in ASD using qT1, qT2, synT1, GM and WM images before and after applying SWPCA to remove site effects. [Color figure can be viewed at wileyonlinelibrary.com] r On the Brain Structure Heterogeneity of Autism r r 1217 r human "phantoms" were used to perform a preparatory optimisation of MRI scanning platforms [Friedman and Glover, 2006].
There are two major types of site effects, regardless of their source: geometric distortions and intensity inhomogeneities. In this work, we focused on the latter, since much of the geometric distortion has been eliminated during acquisition (see section "Image Database"), and the DAR-TEL normalization and registration acts as a homogenizing step, reducing both between-site and between-subject geometric differences, substantially reducing the impact of the site-related geometric differences.
Regarding intensity correction, in the MRC AIMS database used in this study [Ecker et al., , 2013[Ecker et al., , 2015, a standardization procedure based on quantitative imaging [Deoni et al., 2008] was used to minimize inter-site variance and improve the signal-to-noise contrast. However, as the between-site analysis in "Experiment 1: Effect of Acquisition Site" section suggests, this strategy still results in variance that makes it easier to distinguish scanning sites than diagnostic groups. For example, when using qT1 the accuracy for LON vs. CAM classification was >80%, whilst when classifying ASD vs. CTL it was 52%. This marks the substantial effect of site variance on the maps' intensity distribution, even when the multi-site study employs quantitative imaging protocol on the same model of scanner platform across sites. However, with the inclusion of GM and WM maps, we can observe that the inhomogeneities found on qT1 or synT1 barely affected the segmentation procedure.
In this work, the approach we have taken is to perform a multivariate decomposition of each dataset into a number of components that explain different portions of variance. The following step was to identify the components of variance that are due to multi-site acquisition and reduce them. Decomposition was completed using PCA and then, to identify which of the components were linked to acquisition site, we performed an ANOVA on the component scores. Finally, using the weighting function defined in Sec. 2.3.3, we reconstructed the original signal reducing the undesired variance, in what we called Significance Weighted PCA (SWPCA). The method has proven its ability in reducing undesired variance, quantifiable by means of the accuracy obtained in a site vs. site classification. In this case, SWPCA reduced the accuracy from >0.8 to approximately $0.5, a random classifier, suggesting that most site-related variance was eliminated.
A simpler approach such as applying a voxel-by-voxel ANOVA would also be useful to reduce the acquisition site effects . However, SWPCA is a multivariate approach that still offers major advantages over this voxel-wise algorithm, and similar algorithms have found utility in text document searches [Kriegel et al., 2008;Tavoli et al., 2013;Zhang and Nguyen, 2005]. First, PCA models the different sources of variance of the dataset, whereas a simple voxel-wise ANOVA only removes mean site differences, which might result in less statistical power. Secondly, SWPCA is multivariate in nature, where each component contains information that potentially affects all voxels. Together, these two features allow SWPCA to identify the components linked to the undesired effects, and reduce their impact with a weighted reconstruction approach, reducing the general variance related to the acquisition site. However, this increased power reveals a major drawback: SWPCA needs at least a moderate number of participants to work properly. That is the reason why we cannot apply SWPCA to databases such as ADNI [Friedman and Glover, 2006] or ABIDE [Di Martino et al., 2014], where the number of participants acquired at each site is small, or to the six travelling phantoms used in the calibration of the MRC AIMS study.
There exist a number of similar multivariate methods that model the influence of categorical variables, such as the well-known Partial Least Squares (PLS) algorithm [Vinzi et al., 2010] or Surrogate Variable Analysis (SVA) [Leek and Storey, 2007]. In the first case, both PLS and SWPCA take categorical variables Y along with the data X as inputs to partition the influence of these into components. However, the most significant difference is the underlying model. Whilst SWPCA estimates the principal components blindly using their variance, which is what we aim to reduce, and performs an ANOVA afterwards, PLS uses the categorical variable in the computation of the covariance matrix and then estimates the components.
On the other hand, SVA, used for gene expression studies [Leek and Storey, 2007], is more comparable to SWPCA. The SVA algorithm uses a number of decomposition and significance estimation steps to construct a set of surrogate variables; that is, variables that account for the unmodeled variance and expression heterogeneity. While Focusing on the VBM results, after performing the siteeffects removal by SWPCA significant between-group differences were noted in five areas: (A) the right superior frontal gyrus; (B) the pars opercularis of the left inferior frontal gyrus; (C) the pars triangularis of the left inferior frontal gyrus; (D) the posterior part of the left middle temporal gyrus; and (E) the left crus I of cerebellar hemisphere. The first three regions are within Brodmann areas 6, 44 and 45. However, when examining the projection of the region D onto the MNI template (see Fig. 6), it is also located in the posterior part of the left superior temporal gyrus. Therefore, D corresponds closely with the region between Brodmann areas 22 and 39, the Temporo-Parietal Junction (TPJ), with negative t-value at the left side (containing Wernicke's area) and positive t-value at the right side.
The role of these regions in autism has received much attention. Brodmann areas 44 and 45, that together make the Broca's Area (of importance in speech production and a proposed part of the human mirror neuron system (Nishitani et al., 2005)], is a region where mirror neuron dysfunction has been consistently reported in ASDaffected children [Dapretto et al., 2006] and adults [Hadjikhani et al., 2006;Lopez-Hurtado and Prieto, 2008;Verly et al., 2014]. Wernicke's area, contained in the left TPJ, is also linked to language, and has been associated with ASD in several works [Hadjikhani et al., 2006;Kriegel et al., 2008;Verly et al., 2014]. Additionally, the right TPJ has been proposed as related to mentalizing and has been repeatedly implicated in autism [Barnea-Goraly et al., 2004], including a fMRI study of a subsample of this same AIMS dataset [Lombardo et al., 2011]. The right superior  The template used in this work compared to two of the participants with abnormal ventricle size (21016 and 21018). Atrophy of the cerebellum in participant 21016 can also be appreciated, responsible for some of the 'highlighted' areas in qT1, qT2 and synT1 t-maps (see Fig. 4). r Martinez-Murcia et al. r r 1220 r frontal gyrus (region A) is more equivocal, with some studies [Ecker et al., 2010 reporting abnormalities in this area, while others [Hadjikhani et al., 2006;Segovia et al., 2014] report no significant differences. Our analyses reveal no differences in the insula and amygdala, brain structures frequently linked to autism.
Some regions, particularly in qT2, synT1 and segmented GM maps show potentially spurious significance peaks around the ventricles and especially in the left crus I of cerebellar hemisphere (region E). After examining the database, two individuals had appreciable structural abnormalities in the form of abnormal ventricle size and cerebellar atrophy, as can be seen in Figure 7. It is possible that these participants influenced the computation of the tmaps, and therefore are responsible for the significance in region E and areas surrounding the ventricles and, since they are part of the LON subdataset, could also be responsible for the increased classification accuracy of the quantitative T1 and T2, and the synthetic T1 maps in this subdataset.
After observing the influence of these participants on the computation of the t-maps, we can assume that most of the structural differences in ASD are so subtle that the influence of just one or two images can impact on the final results. This, along with the poor performance of the classification pipeline presented in Section 3, dramatically reduces the significance of the aforementioned t-maps. Therefore, the existing evidence leads to the conclusion that ASD presents as either undetectable structural differences or, more likely, with such heterogeneous differences that are difficult to establish a common pattern even after reducing the variance introduced by acquisition site.
It may be the case that cohorts of individuals examined at different sites are somehow systematically biased towards a specific type of patient (in ways that we cannot see simply based on phenotypic information), then siterelated intensity variability is also enriched with important variability about nested autism subgroups. So with any technique trying to remove the site-related inhomogeneity, the subgroup information could also be removed. Together, the evidence supports the claim that defining meaningful subgroups based on different measures, such as genetic profiling, clinical co-morbidities or sensory sensitivities, is the most urgent next step for ASD research [Haar et al., 2014].

CONCLUSIONS
In this work, a novel method called Significance Weighted PCA (SWPCA) is proposed. The method comprises a principal component extraction, the characterization of their statistical significance according to a categorical variable (in this case, acquisition site) and the reconstruction of the images using a weighted approach. This approach was tested with an ASD database and demonstrated its ability for reducing inter-site variance, which we have characterized with a site vs. site classification of the images. A priori, the images yielded >0.8 of accuracy, in contrast to a $0.5 accuracy between groups, suggesting that the images contained many differences depending on acquisition site. After minimizing site-related variance, statistically significant group differences were found for example in Broca's area and the temporoparietal junction. However, their discriminative power was not sufficient to classify diagnostic groups, yielding accuracy results close to random. Our work supports recent claims that ASD is a highly heterogeneous syndrome/diagnostic category that is difficult to characterize globally using neuroimaging and therefore different (and more homogeneous) subgroups should be defined using imaging and perhaps other biological measures to obtain a deeper understanding of ASD.