Increased power by harmonizing structural MRI site differences with the ComBat batch adjustment method in ENIGMA

A common limitation of neuroimaging studies is their small sample sizes. To overcome this hurdle, the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) Consortium combines neuroimaging data from many institutions worldwide. However, this introduces heterogeneity due to different scanning devices and sequences. ENIGMA projects commonly address this heterogeneity with random-effects meta-analysis or mixed-effects mega-analysis. Here we tested whether the batch adjustment method, ComBat, can further reduce site-related heterogeneity and thus increase statistical power. We conducted random-effects meta-analyses, mixed-effects mega-analyses and ComBat mega-analyses to compare cortical thickness, surface area and subcortical volumes between 2897 individuals with a diagnosis of schizophrenia and 3141 healthy controls from 33 sites. Specifically, we compared the imaging data between individuals with schizophrenia and healthy controls, covarying for age and sex. The use of ComBat substantially increased the statistical significance of the findings as compared to random-effects meta-analyses. The findings were more similar when comparing ComBat with mixed-effects mega-analysis, although ComBat still slightly increased the statistical significance. ComBat also showed increased statistical power when we repeated the analyses with fewer sites. Results were nearly identical when we applied the ComBat harmonization separately for cortical thickness, cortical surface area and subcortical volumes. Therefore, we recommend applying the ComBat function to attenuate potential effects of site in ENIGMA projects and other multi-site structural imaging work. We provide easy-to-use functions in R that work even if imaging data are partially missing in some brain regions, and they can be trained with one data set and then applied to another (a requirement for some analyses such as machine learning).


Introduction
After the early reporting of ventricular enlargement in patients with schizophrenia (SCZ) using pneumoencephalography (Huber, 1957), there has been an exponential increase in the number of studies that use imaging techniques to detect brain differences in people with psychiatric disorders. This increase is most evident for studies using magnetic resonance imaging (MRI), probably due to its high resolution and its wide availability around the globe. However, most MRI studies have examined relatively small sample sizes, a limitation that may prevent the detection of true differences (type II errors) and, because of the use of liberal thresholds, may even lead to increased detection of false differences (type I errors). Consequently, reports of unreliable, inconsistent and even contradictory results are not uncommon.
Collaborative multi-site initiatives provide an opportunity to assemble larger and more diverse groups of subjects, leading to increased power and findings that may be more representative of the general population. Among these initiatives, the ENIGMA (Enhancing Neuro Imaging Genetics through Meta-Analysis; http://enigma.ini.usc.edu) Consortium (Thompson et al., 2014) stands out for including hundreds of groups worldwide and facilitating the sharing of tens of thousands of neuroimages. One great advantage of this consortium is the harmonization of the protocols to pre-process the MRI data, which has decreased the heterogeneity between the sites related to methodological factors. All sites apply the same pre-processing pipelines to obtain thickness and surface area estimates for cortical regions of interest (ROI) and volume estimates for subcortical ROIs; similar harmonized protocols are in use for standardized analysis of diffusion MRI, resting state fMRI and EEG data, as well as various kinds of omics data (GWAS and epigenetic data).
However, even though all sites participating in an ENIGMA project apply the same preprocessing protocol, data from different sites still show relevant methodological heterogeneity due to systematic differences in MRI scanning devices and acquisition sequences. Also, prior studies have reported that the results of the FreeSurfer segmentation process, for morphometric analysis of MRI, can be affected even by using different FreeSurfer versions, workstations or operating systems (Chepkoech et al., 2016; Gronenschild et al., 2012). Most ENIGMA projects address this residual heterogeneity by random-effects meta-analysis (RE-Meta), but estimation and control of heterogeneity in site-aggregated meta-analyses may be suboptimal (Chen and Benedetti, 2017). It is worth noting that a few ENIGMA studies have analyzed shared individual data (rather than site-aggregated statistical data). These "mega-analyses" of individual data considered the "site"

2.1.1. The RE-Meta approach

In the random-effects meta-analysis (RE-Meta), a linear model estimates the difference in cortical thickness between SCZ and CON for each ROI at each site, covarying for age and sex:

$$y_{r,i,j} = \alpha_{r,i} + X_{i,j} \cdot \beta_{r,i} + \varepsilon_{r,i,j}$$

where $y_{r,i,j}$ is the measurement of cortical thickness of the rth ROI from the jth individual of the ith site, $\alpha_{r,i}$ is the estimated overall cortical thickness of the rth ROI from individuals of the ith site, $X_{i,j}$ are the values of the variables (disorder, age, and sex) of the jth individual of the ith site, $\beta_{r,i}$ are the estimates of the coefficients of these variables for the rth ROI from individuals of the ith site, and $\varepsilon_{r,i,j}$ is the error term for the rth ROI in the jth individual of the ith site.
Estimates of the coefficients of interest (e.g., $\beta_{r,i,1}$, the difference between SCZ and CON) are then pooled to obtain a single estimate for each ROI ($\beta_{r,\mathrm{meta},1}$). A typical method to pool the coefficients is the weighted mean of the coefficients of the sites:

$$\beta_{r,\mathrm{meta},1} = \frac{\sum_i w_{r,i} \, \beta_{r,i,1}}{\sum_i w_{r,i}}$$

where $w_{r,i}$ is the weight of the ith site for the rth ROI, calculated as the inverse of the sum of the variance of $\beta_{r,i,1}$ and the heterogeneity for the rth ROI ($\tau_r^2$):

$$w_{r,i} = \frac{1}{\mathrm{var}(\beta_{r,i,1}) + \tau_r^2}$$

Frequently, the analyst does not use the coefficients but effect sizes, such as Hedges' g, but the concept is similar.
Some problems of RE-Meta are that $\beta_{r,i}$ may be poorly estimated in sites with small sample sizes, or that $\tau_r^2$ may be poorly estimated in some scenarios (Chen and Benedetti, 2017).
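The pooling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pool_random_effects` and the example numbers are hypothetical, and the heterogeneity is estimated with the DerSimonian-Laird moment estimator, which is a common choice but not necessarily the estimator used in the analyses reported here.

```python
def pool_random_effects(estimates, variances):
    """Pool per-site estimates with random-effects inverse-variance weights.

    Assumption: tau^2 is estimated with the DerSimonian-Laird moment
    estimator; other estimators (e.g., REML) would give different weights.
    """
    k = len(estimates)
    # Fixed-effect weights, used only to estimate the heterogeneity tau^2.
    w_fixed = [1.0 / v for v in variances]
    sum_w = sum(w_fixed)
    mean_fixed = sum(w * b for w, b in zip(w_fixed, estimates)) / sum_w
    # Cochran's Q statistic and the scaling constant of the estimator.
    q = sum(w * (b - mean_fixed) ** 2 for w, b in zip(w_fixed, estimates))
    c = sum_w - sum(w ** 2 for w in w_fixed) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights: w_i = 1 / (var_i + tau^2).
    weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_variance = 1.0 / sum(weights)
    return pooled, pooled_variance, tau2


# Hypothetical SCZ-minus-CON coefficients for one ROI at three sites.
betas = [-0.10, -0.05, -0.12]
variances = [0.0004, 0.0009, 0.0016]
pooled, pooled_variance, tau2 = pool_random_effects(betas, variances)
```

With few or small sites, both the per-site variances and the estimate of the heterogeneity become unstable, which is the weakness of RE-Meta noted above.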
The mixed-effects mega-analysis (ME-Mega) approach benefits from a more robust estimation of $\alpha_r$ and $\beta_r$, as it is based on the data from all sites, as well as from a more precise estimation of the heterogeneity. However, it still may have some minor issues. It assumes that the error terms follow the same normal distribution at all sites, which may seldom be the case. We acknowledge that it is possible to create linear mixed-effects models that consider a different variance for each site, but they involve the specification of variance structures for each statistical test, which may substantially complicate the analyses. In addition, the effects of site are estimated independently for each ROI, which may be suboptimal because the effects of site, even if different for each ROI, may still share some commonalities (e.g., an MRI device may yield a better signal contrast than another across the brain).
2.1.3. The ComBat-Mega approach

As compared to ME-Mega, the ComBat mega-analysis (ComBat-Mega) assumes that the error terms may follow varying normal distributions at different sites:

$$y_{r,i,j} = \alpha_r + X_{i,j} \cdot \beta_r + \gamma_{r,i} + \delta_{r,i} \cdot \varepsilon_{r,i,j}$$

where $\gamma_{r,i}$ are the additive effects and $\delta_{r,i}$ are the multiplicative effects of the ith site in the rth ROI.
In addition, it assumes that the additive and multiplicative effects of the sites are not completely independent across ROIs but, rather, they share a common distribution. Such considerations prevent the use of standard linear models, but ComBat uses an empirical Bayes framework to estimate the distribution of the effects of site (Johnson et al., 2007). Once estimated, it derives the additive error terms:

$$\varepsilon_{r,i,j} = \frac{y_{r,i,j} - \alpha_r - X_{i,j} \cdot \beta_r - \gamma_{r,i}}{\delta_{r,i}}$$

These terms allow the derivation of harmonized data:

$$y_{r,i,j}^{\mathrm{ComBat}} = \alpha_r + X_{i,j} \cdot \beta_r + \varepsilon_{r,i,j}$$

These simpler data can then be analyzed with standard linear models to estimate the overall difference in cortical thickness between SCZ and CON groups, for each ROI.
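To make the roles of the additive and multiplicative site effects concrete, the following Python sketch removes them for a single ROI. It is a deliberately simplified illustration and not the ComBat algorithm: it ignores the covariates, and it estimates the site effects directly from site means and standard deviations instead of using the empirical Bayes framework that shares information across ROIs. The function name `harmonize_location_scale` and the data are hypothetical.

```python
from statistics import mean, pstdev


def harmonize_location_scale(values, sites):
    """Remove per-site shifts and scalings for one ROI.

    Simplifications (assumptions of this sketch): no covariates and no
    empirical Bayes shrinkage. Every site needs at least two distinct values.
    """
    alpha = mean(values)  # overall mean, plays the role of alpha_r
    by_site = {}
    for v, s in zip(values, sites):
        by_site.setdefault(s, []).append(v)
    site_mean = {s: mean(v) for s, v in by_site.items()}
    # Pooled residual SD after removing the site means.
    residuals = [v - site_mean[s] for v, s in zip(values, sites)]
    pooled_sd = pstdev(residuals)
    # Additive effect gamma: shift of the site mean from the overall mean.
    gamma = {s: site_mean[s] - alpha for s in by_site}
    # Multiplicative effect delta: site SD relative to the pooled SD.
    delta = {s: pstdev(v) / pooled_sd for s, v in by_site.items()}
    # Standardized errors are put back around the overall mean.
    return [alpha + (v - alpha - gamma[s]) / delta[s]
            for v, s in zip(values, sites)]


# Hypothetical thickness values: site B is shifted and more variable.
values = [2.0, 2.2, 2.4, 3.0, 3.4, 3.8]
sites = ["A", "A", "A", "B", "B", "B"]
harmonized = harmonize_location_scale(values, sites)
```

After this step, both sites have the same mean and the same spread, so a standard linear model can be fitted to the pooled values without a site term.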

Modifications of the ComBat function
Fortin and colleagues modified the original "combat" function, in the "sva" package for R (Leek et al., 2019), so that it could be applied to imaging data (Fortin et al., 2017). However, Fortin's "combat" function may not be easily applicable to ENIGMA projects, as it requires that the dataset has no missing data, which is seldom the case. In addition, it finds the harmonization parameters and applies them to the data within the same function, while some analyses, such as machine learning, require that the parameters are found in a training set and later applied to an independent test set (this is not the case here, but it might be the case in future studies). We further modified the "combat" function to allow for missing data and to separate the fitting and the application of the harmonization.
First, we divided the function into two subfunctions: "combat_fit", which finds the harmonization parameters, and "combat_apply", which applies them to the same or to another set. The "combat_fit" function automatically imputes missing data so that the function can find the harmonization parameters without errors. These imputations are predictions based on linear models of the ROI values by the covariates, separately for each ROI and each site. The covariates are the variables introduced into the "combat_fit" function, which in the present study were the diagnosis, age, and sex. The "combat_fit" function also discards ROIs with no variance, which returned errors in the previous "combat" function. Importantly, these imputations are temporary and are only intended to avoid errors during the fitting of the parameters; they are not saved. To apply the parameters, the user must use the "combat_apply" function with the original data, in which missing values are not imputed.
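The idea of temporary, model-based imputation can be illustrated with a short Python sketch. It is not the code of "combat_fit": the function name `impute_for_fitting` is hypothetical, and the sketch uses a single covariate (age) in a simple per-site regression, whereas the procedure described above uses all covariates entered into the fit.

```python
import math


def impute_for_fitting(roi, age, site):
    """Return a copy of `roi` in which missing values (NaN) are replaced by
    predictions from a per-site linear regression of the ROI on age.

    Assumption of this sketch: one covariate only. The input list is left
    untouched, mirroring the fact that the imputations are not saved.
    """
    filled = list(roi)
    for s in set(site):
        idx = [k for k in range(len(roi)) if site[k] == s]
        observed = [k for k in idx if not math.isnan(roi[k])]
        mean_age = sum(age[k] for k in observed) / len(observed)
        mean_roi = sum(roi[k] for k in observed) / len(observed)
        sxx = sum((age[k] - mean_age) ** 2 for k in observed)
        sxy = sum((age[k] - mean_age) * (roi[k] - mean_roi) for k in observed)
        slope = sxy / sxx if sxx > 0 else 0.0
        for k in idx:
            if math.isnan(roi[k]):
                filled[k] = mean_roi + slope * (age[k] - mean_age)
    return filled


# Hypothetical data for one ROI at one site, with one missing value.
roi = [2.5, 2.4, float("nan"), 2.2]
age = [20.0, 30.0, 40.0, 50.0]
site = ["A", "A", "A", "A"]
filled = impute_for_fitting(roi, age, site)
```

Only the filled copy would be used to estimate the harmonization parameters; the original data, still containing the missing value, would be the input of the application step.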

Collection of data
The data for this paper include the cortical thickness, surface area and subcortical volumes from 33 sites of the ENIGMA Schizophrenia Working Group (van Erp et al., 2016, 2018; Wong et al., 2019), who shared individual subject level FreeSurfer data for this project. The overall sample included 2897 individuals with a diagnosis of SCZ (mean age 34 years, 34% females) and 3141 CON (mean age 33 years, 49% females). For SCZ, the mean age of onset was 23 years and their Positive and Negative Syndrome Scale (PANSS) (Kay et al., 1987) scores for total/positive/negative symptoms were 61/16/17, respectively. The researchers at each of the sites had collected the data after obtaining participants' written informed consent, with protocols that had been approved by local institutional review boards. We provide a description of the overall sample in Table 2 and a description of the sample from each site in Supplementary Table S1.
All sites had processed the data with FreeSurfer (Fischl, 2012) versions 4.0 to 5.3, except for version 5.2, which was found to produce low intra-class correlations compared to the other versions; within each site, all patients and controls were processed using the same FreeSurfer version (van Erp et al., 2016, 2018), according to the ENIGMA protocols, which are available at http://enigma.usc.edu/protocols/imaging-protocols. For cortical ROIs, the protocols involved the estimation of cortical vertex-wise statistics, the extraction of cortical thickness and surface area for 70 Desikan-Killiany (DK) atlas regions (Desikan et al., 2006), and quality checks (van Erp et al., 2018). For subcortical ROIs, they involved the estimation of subcortical volumes and quality checks (van Erp et al., 2016).

Statistical analyses
We conducted comparisons of MRI data between individuals with SCZ and CON to assess the statistical significance, power and familywise error rate (FWER) using RE-Meta, ME-Mega and ComBat-Mega. We formally tested whether ComBat-Mega increases the statistical significance and power of the differences between individuals with SCZ and CON by attenuating site-effects, using a permutation test and a small-subset strategy respectively. We also used the data of the permutation test to check the FWER.

Comparisons of MRI data between individuals with SCZ and CON
We conducted the RE-Meta in two steps. In the first step, we compared the values of each ROI between SCZ and CON via a standard linear model, with age and sex as covariates, separately for each site. We then converted the difference to a Hedges' g and its variance for each site and ROI. In the second step, we conducted a random-effects meta-analysis of the Hedges' g of each ROI with the "metafor" package for R (Viechtbauer, 2010), and we corrected the p-values for multiple comparisons with the Holm method.
For ME-Mega, we compared the values of each ROI between SCZ and CON via a linear mixed-effects model, with age and sex as covariates and site as a random factor, with the "lme4" and "lmerTest" packages for R (Bates et al., 2015;Kuznetsova et al., 2017). We then divided the difference by the standard deviation (derived from the model) and corrected it for small-sample bias to obtain a Hedges' g and its variance, and we corrected the p-values for multiple comparisons using the Holm method (Holm, 1979).
Finally, for ComBat-Mega, we first removed the effects of site using the ComBat functions (modelling the effects of diagnosis, age, and sex), and then compared the values of each ROI (e.g., cortical thickness of the frontal pole) between SCZ and CON via a standard linear model, with age and sex as covariates. Note that the ComBat functions use covariates (e.g., age and sex) to better estimate the effects of site, but they do not remove the effects of these covariates; for this reason, we included these covariates in the subsequent linear model. As for ME-Mega, we converted the difference to a Hedges' g and its variance, and we corrected the p-values for multiple comparisons with the Holm method. Note that we applied a single ComBat harmonization for different types of data (cortical thickness, cortical surface area, and subcortical volume) because we considered that they were related. We also conducted an alternative analysis with a separate harmonization for each type of data.
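All three approaches end with a Holm correction of the ROI-wise p-values. As a reminder of what this correction does, the following Python sketch implements the standard step-down procedure; the function name `holm_adjust` and the example p-values are illustrative only, and the analyses themselves were run in R.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and the adjusted values are forced to be non-decreasing.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, k in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[k])
        running_max = max(running_max, candidate)
        adjusted[k] = running_max
    return adjusted


# Hypothetical uncorrected p-values for three ROIs.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

For these three hypothetical p-values the adjusted values are approximately 0.03, 0.06 and 0.06, so only the first ROI would remain significant at the 0.05 level.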

Comparison of the statistical significance
To test whether ComBat-Mega had improved the statistical significance, we used a permutation approach. We followed the Draper-Stoneman procedure, which, according to a study comparing different algorithms (Winkler et al., 2014), is one of the procedures that best controls the FWER and that can be safely applied here. Note that other algorithms such as Freedman-Lane would produce different permuted data for RE-Meta, ME-Mega and ComBat-Mega, which would be problematic in our study because these unwanted differences could confound other potential differences between the methods. Specifically, we randomly permuted the diagnosis among the individuals within each site and repeated all comparison analyses 1000 times.
To show the differences in statistical significance between methods expected by chance, we plotted the histogram of the median difference in the logit-transformed p-values between the methods across the permutations (Fig. 1). For example, in one permutation we randomly assigned study participants to patient or control status. We then compared these randomly assigned patients and controls using RE-Meta, ME-Mega and ComBat-Mega. We then calculated differences between logit-transformed p-values of the ComBat-Mega comparison and logit-transformed p-values of the RE-Meta (or ME-Mega) comparisons for each ROI. From these, we saved only the median of the logit-transformed p-value differences. Note that this median difference should be very close to zero, given that participant assignment was random, and there should therefore be no patient-control group differences other than by chance. By conducting many of these permutations, we were able to plot the histogram of the median differences expected by chance alone. Finally, we compared the median difference of the original analysis (with correctly assigned patient and control status) with the histogram of the median differences expected by chance. Only median differences were used in this analysis to simplify the test, as doing so avoids the need to correct for multiple comparisons.
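The permutation step, in which the diagnosis is shuffled among the individuals of the same site, can be sketched as follows in Python. The function name `permute_within_site` and the toy labels are hypothetical; the sketch only shows the shuffling, not the three analyses that are rerun on each permuted data set.

```python
import random


def permute_within_site(diagnosis, site, rng):
    """Shuffle diagnosis labels among the individuals of each site.

    The number of SCZ and CON labels at every site is preserved, so that
    site composition is not altered by the permutation.
    """
    permuted = list(diagnosis)
    by_site = {}
    for k, s in enumerate(site):
        by_site.setdefault(s, []).append(k)
    for idx in by_site.values():
        labels = [diagnosis[k] for k in idx]
        rng.shuffle(labels)
        for k, label in zip(idx, labels):
            permuted[k] = label
    return permuted


# Hypothetical labels for six individuals from two sites.
diagnosis = ["SCZ", "CON", "SCZ", "CON", "CON", "SCZ"]
site = ["A", "A", "A", "B", "B", "B"]
permuted = permute_within_site(diagnosis, site, random.Random(1))
```

Because the same permuted labels are then passed to RE-Meta, ME-Mega and ComBat-Mega, any difference between the methods in a given permutation cannot be attributed to differences in the permuted data.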
We must note that without the logit (or another) transform, the detection of differences in statistical significance would be too sensitive for large p-values and not sensitive enough for small p-values. For example, if the (non-transformed) p-value using one approach was 0.6 and the (non-transformed) p-value using another approach was 0.4, the difference in p-values would be very large (0.6-0.4 = 0.2) even if the two p-values might be considered conceptually very similar, whereas if the (non-transformed) p-value using one approach was 0.003 and the (non-transformed) p-value using another approach was 0.001, the difference in p-values would be very small (0.003-0.001 = 0.002) even if one p-value is three times the size of the other. With the logit transform, the logit-transformed p-values of the first example would be 0.4 and −0.4, with a difference of 0.8, and those of the second example would be −5.8 and −6.9, with a difference of 1.1.
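The two numerical examples above can be reproduced with a few lines of Python (shown only as a check of the arithmetic; the helper name `logit` is ours):

```python
import math


def logit(p):
    """Logit transform of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# First example: p-values of 0.6 and 0.4.
diff_large_p = logit(0.6) - logit(0.4)
# Second example: p-values of 0.003 and 0.001.
diff_small_p = logit(0.003) - logit(0.001)
```

The first difference is approximately 0.8 and the second approximately 1.1, so after the transform the pair of small p-values is treated as at least as different as the pair of large ones.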
The use of a permutation test implied that the estimated probability of obtaining the observed median difference in (logit-transformed) p-values was discrete, i.e., it could only be 0.001, or 0.002, or 0.003, etcetera. However, we were only interested in assessing whether this estimate was <0.05, for which this level of precision should not pose any problems.

Evaluation of the statistical power
We also tested whether ComBat-Mega increases the statistical power using a small-subset strategy. Specifically, we repeated the analyses 500 times, each time including only a random sample of 10 sites. We then counted the number of times that these analyses using only 10 sites were able to detect differences between SCZ and CON. We only used ROIs in which the differences between SCZ and CON were strongly statistically significant in the main analyses using the 33 sites (FWER<0.001 for RE-Meta, for ME-Mega, and for ComBat-Mega), as we assumed that they have true differences. Finally, we conducted a Wilcoxon signed-rank test to compare the statistical power across ROIs between ComBat-Mega and RE-Meta, as well as between ComBat-Mega and ME-Mega.
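The small-subset strategy amounts to a resampling loop, sketched below in Python. The function `estimate_power` and its `detects` argument are hypothetical: `detects` stands for whatever analysis (RE-Meta, ME-Mega or ComBat-Mega) is run on the selected sites for one ROI and returns whether the difference was statistically significant. The toy rule used in the example is for illustration only.

```python
import random


def estimate_power(all_sites, detects, n_sites=10, n_repeats=500, seed=0):
    """Proportion of random site subsets in which `detects` returns True.

    `detects` is a placeholder (an assumption of this sketch) for a full
    analysis of one ROI restricted to the sampled sites.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_repeats):
        subset = rng.sample(all_sites, n_sites)  # sampling without replacement
        if detects(subset):
            hits += 1
    return hits / n_repeats


# Toy example: pretend the effect is detected only when site 0 is included.
sites = list(range(33))
toy_power = estimate_power(sites, lambda subset: 0 in subset)
```

In this toy example the expected proportion is 10/33, i.e., about 0.30. In the actual analysis, one such proportion per ROI and per method would be obtained, and the proportions would then be compared between methods across ROIs.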

Determination of the empirical FWER
We also used the permutation data created above to check whether the FWER for the three methods was appropriate, i.e., we counted the proportion of permutations in which at least one ROI had a Holm-corrected p-value < 0.05. Again, the use of a permutation test implied that the estimated FWER was discrete, but we were only interested in assessing whether it was <0.05.
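The counting rule above can be written compactly, as in the following Python sketch. It relies on the fact that the smallest Holm-corrected p-value equals the smallest uncorrected p-value multiplied by the number of tests (capped at 1), so at least one ROI is significant after correction exactly when this product is below the threshold. The function name `empirical_fwer` and the example values are hypothetical.

```python
def empirical_fwer(permutation_p_values, alpha=0.05):
    """Proportion of permutations with at least one Holm-significant ROI.

    `permutation_p_values` holds one list of uncorrected p-values
    (one per ROI) for each permutation.
    """
    hits = 0
    for p_values in permutation_p_values:
        m = len(p_values)
        # Smallest Holm-adjusted p-value: m times the smallest raw p-value.
        if min(1.0, m * min(p_values)) < alpha:
            hits += 1
    return hits / len(permutation_p_values)


# Hypothetical p-values from four permutations with two ROIs each.
example = [[0.01, 0.5], [0.03, 0.5], [0.2, 0.9], [0.001, 0.04]]
fwer = empirical_fwer(example)
```

In this hypothetical example, two of the four permutations contain a Holm-significant ROI, so the function returns 0.5.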

Results
With ComBat-Mega, on average, individuals with a diagnosis of SCZ showed thinner cortex and smaller surface area in nearly all cortical ROIs (Table 3). The only exceptions were the bilateral pericalcarine fissures and right entorhinal cortex (where between-group differences in thickness did not reach statistical significance after correction for multiple comparisons) and the left isthmus of the cingulate and right temporal pole (where between-group differences in surface area did not reach statistical significance after correction for multiple comparisons). The SCZ group also showed, on average, smaller bilateral thalamus, hippocampus, amygdala, and right accumbens volumes, and larger bilateral lateral ventricle, putamen, and pallidum volumes. Smaller left accumbens and larger bilateral caudate volumes were not statistically significant after correction for multiple comparisons.
Results were in the same direction for the RE-Meta and ME-Mega, though RE-Meta did not detect thinner cortex in three ROIs (bilateral rostral anterior cingulate and left caudal anterior cingulate) and smaller surface area in six ROIs (bilateral pericalcarine fissure, left posterior cingulate and temporal pole, and right isthmus cingulate and insula).
The Hedges' g estimates for the differences were similar across the different analytic methods, but their statistical significance was greater in ComBat-Mega as compared to RE-Meta and ME-Mega (Figs. 2 and 3). The difference in statistical significance was relatively minor when comparing ComBat-Mega to ME-Mega, but particularly large when comparing ComBat-Mega to RE-Meta (Fig. 3).
The median difference between logit-transformed ComBat-Mega p-values and logit-transformed RE-Meta p-values in the original data was 13.9. This was substantially larger than any of the median differences in the permuted data (all < 0.61), indicating that the higher statistical significance of ComBat-Mega findings was unlikely to be due to chance (probability 0.001) (Fig. 4). For the comparison between ComBat-Mega and ME-Mega, the median difference was smaller (3.2), but still unlikely to be due to chance (all median differences in the permuted data <0.52, probability 0.001).
Interestingly, a plot of the ComBat-Mega-related increase in statistical significance as a function of the intra-site variance/total variance ratio, showed that the increase in statistical significance was larger in those ROIs in which intra-site variance was only ~50-70% of total variance compared to those ROIs in which intra-site variance was ~90-100% of total variance (p < 0.001, Fig. 5).
In the evaluation of statistical power using the small-subset strategy, the statistical power was higher for ComBat-Mega (statistical power = 83.5%) than for RE-Meta (statistical power = 53.7%; Wilcoxon p-value < 0.001) or ME-Mega (statistical power = 80.4%; Wilcoxon p-value < 0.001).
When we applied the ComBat harmonization separately for cortical thickness data, cortical surface area data and subcortical volume data, we found the same differences with nearly identical Hedges' g (Supplementary Figure S1). The statistical significance was minimally lower (median difference between single ComBat logit-transformed p-values and separate ComBat logit-transformed p-values was 0.1), the statistical power in the small-subset strategy was 83.5%, and the empirical FWER was 0.026.
When we covaried ComBat-Mega by age, sex and ICV, results were similar: The only differences were that the right frontal pole, isthmus of the cingulate and pericalcarine and left parahippocampal and temporal pole decreases in surface area were no longer statistically significant, whereas the left pericalcarine decrease in surface area and the bilateral caudate increases in volume reached statistical significance. Results were again in the same direction for the RE-Meta and ME-Mega, though RE-Meta did not detect statistically significant differences in 36 of the ROIs showing differences with ComBat-Mega, and ME-Mega did not detect smaller right accumbens volume (and detected smaller surface area in left parahippocampal and right pericalcarine but not in left paracentral and right entorhinal). The Hedges' g estimates for the differences were again similar across the different analytic methods, but their statistical significance was again greater in ComBat-Mega as compared to RE-Meta and ME-Mega (Supplementary Figure S2).

Discussion
In this study, we analyzed individual subject level data pooled by the ENIGMA Schizophrenia Working Group using three methods to account for the effects of site: random-effects meta-analysis (RE-Meta), linear mixed-effects models (ME-Mega), and ComBat harmonization followed by standard linear models (ComBat-Mega). The results of the comparison between SCZ and CON using ComBat-Mega were similar to the studies already published by the ENIGMA Schizophrenia Working Group: SCZ showed a widespread thinner cortex and smaller surface area (van Erp et al., 2018), smaller hippocampus, amygdala, thalamus and accumbens, and larger lateral ventricles, putamen and pallidum (van Erp et al., 2016) than CON. The results of the same comparison using RE-Meta and ME-Mega were in the same direction and had similar effect sizes, although with a lower statistical significance (i.e., wider confidence intervals, larger p-values), especially for RE-Meta. In other words, the use of ComBat increased the statistical significance (i.e., narrower confidence intervals, smaller p-values) of the differences between SCZ and CON. This was especially apparent in those ROIs in which intra-site variance was only ~50-70% of total variance. ComBat-Mega also showed increased statistical power when we repeated the analyses with fewer sites. All approaches controlled the FWER well, perhaps even too strictly, probably due to the use of the Holm method, which is more powerful than the Bonferroni method but still conservative (Blakesley et al., 2009). Findings were similar when covarying by ICV.
Based on these findings, we recommend that ENIGMA mega-analysis projects consider applying the ComBat function to reduce the effects of site, followed by standard statistical analysis without including site as a fixed or random effect in the statistical model. To apply ComBat harmonization, we provide easy-to-use functions for R that work even if there are missing data and they can be trained with data from one set and then applied to data from another.
We must note that we conducted these analyses with the three main types of data used in ENIGMA projects: thickness of cortical ROIs, surface area of cortical ROIs, and volumes of subcortical nuclei. However, some ENIGMA projects use other types of data, such as mean fractional anisotropy of white matter tracts, and we have not explored whether the application of ComBat would be beneficial for these projects. Two notions suggest that ComBat should be broadly beneficial. First, the ComBat algorithm is not specific to a given type of imaging data: while it was developed for genomics data (Johnson et al., 2007), we here successfully applied it to three types of ENIGMA imaging data. Second, Fortin and colleagues found that ComBat outperforms other harmonization methods for voxel-based fractional anisotropy and mean diffusivity (Fortin et al., 2017), and Yu et al. found similar results for resting-state functional connectivity and network measures (Yu et al., 2018).
While our findings suggest that ComBat harmonization will be useful for most ENIGMA mega-analyses and other multi-site structural imaging work, we suggest caution when combining different types of data. We conducted a single ComBat harmonization for different types of MRI data because we considered that thickness, area, and volume are related, as they are obtained from the same FreeSurfer output of the T1-weighted image and all measure amounts of gray matter. Indeed, an alternative analysis with separate ComBat harmonization for each type of data yielded nearly identical results. However, we do not know whether the application of a single ComBat harmonization on other combinations of data would behave similarly.
Other popular approaches for pooling neuroimaging data are the voxel-based meta-analytic methods, such as Seed-based d Mapping (SDM) or Activation Likelihood Estimation (ALE) (Eickhoff et al., 2009, 2012). These methods can include imaging studies even if they only report the coordinates of the peaks of the clusters of statistical significance. Therefore, a great advantage of these methods is the exhaustive inclusion of studies. In addition, the analyses are conducted at the voxel level (rather than using ROIs). These methods traditionally tested whether the reported findings tended to converge in a few brain voxels (Albajes-Eizagirre and Radua, 2018), but novel methods are able to directly test whether there are differences, even if they are widespread and do not converge (Albajes-Eizagirre et al., 2019). In view of the results of the present study, one could wonder whether these voxel-based methods should also conduct ComBat mega-analysis instead of meta-analysis. However, to use ComBat they would need access to individual subject level data, which at present are often not available. Another aspect to consider is whether we need SDM or ALE meta-analyses after an ENIGMA ComBat mega-analysis is published. Here, we must remember that SDM and ALE are voxel-based and include virtually all published studies, whereas most ENIGMA studies are ROI-based and include only the data that authors agree to share. Therefore, these different approaches present interesting complementary information.
Our study has some limitations. First, we already stated that we have not explored whether the application of ComBat would be beneficial for projects using other types of data, although several facts suggest that ComBat should be broadly beneficial. Second, we also acknowledged that we do not know whether the application of a single ComBat harmonization on other combinations of data would behave similarly. Third, our analysis is focused on the differences between SCZ and CON, whose distribution is roughly similar across sites. The effects of site and thus the importance of their removal might be larger for conditions with few cases in each site, where pooling data is more beneficial. Fourth, ComBat-Mega addresses some issues but not others, which still need to be investigated, such as site by nuisance confounds. For example, a site with poor quality data may also be a site with a mean age higher than other sites. Future studies addressing these questions could point to methods other than ComBat. Finally, there is a conceptual difference in the effects of site that are modeled in ComBat/ME-Mega and the effects of site that are modeled in RE-Meta. The former effects are in (individual) raw data and refer to site-specific constants that are added to or that multiply the measurements. The latter effects, conversely, are in (group) effect sizes, and are probably a mix of several factors such as site-specific constants that multiply the measurements, heterogeneity in the differences between SCZ and CON, or differences in precision between studies.
To conclude, this paper provides evidence of the superiority of ComBat harmonization over standard mega-analyses and meta-analyses in reducing site-related heterogeneity and thus increasing statistical power. We therefore recommend that ENIGMA mega-analysis projects and other multi-site structural imaging work consider applying the ComBat function, for which we provide easy-to-use functions for R. The provided code works with missing data and allows for harmonization of a test set based on the training set (a requirement for machine learning and possibly replication studies). We hope that future ENIGMA mega-analysis projects will improve between-site harmonization using ComBat.

Fig. 1.
Steps of each iteration of the permutation test used to compare the statistical significance between random-effects meta-analysis, mixed-effects mega-analysis and ComBat mega-analysis. Footnote: ComBat-Mega: ComBat mega-analysis; ME-Mega: mixed-effects mega-analysis; RE-Meta: random-effects meta-analysis.

Fig. 2.
Forest plot for random-effects meta-analysis (light red), mixed-effects mega-analysis (blue) and ComBat mega-analysis (dark green). Footnote: The width of the confidence intervals in the legend corresponds to the mean width of the confidence intervals across the brain. ComBat-Mega: ComBat mega-analysis; ME-Mega: mixed-effects mega-analysis; RE-Meta: random-effects meta-analysis.

Fig. 3.
Hedges' g and p-values of random-effects meta-analysis, mixed-effects mega-analysis and ComBat mega-analysis in the comparison of ENIGMA brain data between 2897 patients with schizophrenia and 3141 healthy controls. Footnote: Each cross represents an ROI. ComBat-Mega: ComBat mega-analysis; ME-Mega: mixed-effects mega-analysis; RE-Meta: random-effects meta-analysis. The top plots show that ComBat-Mega effect sizes are similar to RE-Meta and ME-Mega effect sizes, as crosses are mostly distributed around the diagonal lines. The bottom plots show that ComBat-Mega p-values are substantially smaller than RE-Meta p-values (crosses are clearly above the diagonal line), and slightly smaller than ME-Mega p-values (crosses tend to be slightly above the diagonal line).

Fig. 4.
Median difference between logit-transformed p-values derived from ComBat mega-analysis and logit-transformed p-values derived from random-effects meta-analysis and mixed-effects mega-analysis in the original data (red) and in the permuted data (histograms). Footnote: ComBat-Mega: ComBat mega-analysis; ME-Mega: mixed-effects mega-analysis; RE-Meta: random-effects meta-analysis. The histograms (in gray) show the expected ComBat-Mega-related increase of statistical significance by chance, and the arrows (in red) show the actual increase. The latter is clearly larger than the former (negative values mean that ComBat-Mega increases statistical significance).

Fig. 5.
Relationship between the intra-site variance/total variance ratio and ComBat mega-analysis-related increase of statistical significance. Footnote: ComBat-Mega: ComBat mega-analysis; ME-Mega: mixed-effects mega-analysis; RE-Meta: random-effects meta-analysis. The ComBat-Mega-related increase of statistical significance (negative values in the Y axis) is larger in regions with lower intra-site variance/total variance ratio (around 50-70%).

Radua et al. Page 36

Table 2
Description of the overall sample.

Table 3
Effect sizes and confidence intervals derived from the ComBat mega-analysis.