Resting state fMRI scanner instabilities revealed by longitudinal phantom scans in a multi-center study

Quality assurance (QA) is crucial in longitudinal and/or multi-site studies, which involve the collection of data from a group of subjects over time and/or at different locations. It is important to regularly monitor the performance of the scanners over time and at different locations to detect and control for intrinsic differences (e.g., due to manufacturers) and changes in scanner performance (e.g., due to gradual component aging, software and/or hardware upgrades, etc.). As part of the Ontario Neurodegenerative Disease Research Initiative (ONDRI) and the Canadian Biomarker Integration Network in Depression (CAN-BIND), QA phantom scans were conducted approximately monthly for three to four years at 13 sites across Canada with 3T research MRI scanners. QA parameters were calculated for each scan using the functional Biomarker Imaging Research Network's (fBIRN) QA phantom and pipeline to capture between- and within-scanner variability. We also describe a QA protocol to measure the full-width-at-half-maximum (FWHM) of slice-wise point spread functions (PSF), used in conjunction with the fBIRN QA parameters. Variations in image resolution measured by the FWHM are a primary source of variance over time for many sites, as well as between sites and between manufacturers. We also identify an unexpected range of instabilities affecting individual slices in a number of scanners, which may amount to a substantial contribution of unexplained signal variance to their data. Finally, we identify a preliminary preprocessing approach to reduce this variance and/or alleviate the slice anomalies, and in a small human data set show that this change in preprocessing can have a significant impact on seed-based connectivity measurements for some individual subjects. We expect that other fMRI centres will find this approach to identifying and controlling scanner instabilities useful in similar studies.


Table 1
Description of the thirteen sites with 3T research MRI scanners participating in this study. These sites included 5 GE Discovery, 3 Siemens Trio, 2 Siemens Prisma, 1 Siemens Skyra, and 2 Philips Achieva scanners.

Introduction
Changes in the blood oxygenation-level dependent (BOLD) signal due to neuronal activity constitute only a fraction of the changes in the raw fMRI signal intensity. Other major sources of variance in the fMRI signal are thermal noise, head motion, physiological noise, and temporal instabilities in the MRI scanner hardware (also known as system noise). Consequently, to accurately measure such small changes due to neuronal activity, it is crucial to measure, understand, and where possible control the other sources of variance. Thermal noise is dominated by the SNR-resolution-acquisition-time trade-off, and several studies have attempted to improve this trade-off since the early days of MRI (Griswold et al., 2002; Haase et al., 1986; Larkman and Nunes, 2007; Lustig et al., 2007; Mansfield, 1977; Margosian et al., 1986; Pruessmann et al., 1999; Samsonov et al., 2004; Sodickson and Manning, 1997). There exists a rich literature on controlling head motion and physiological noise (Caballero-Gaudes and Reynolds, 2017; Power et al., 2015; Strother, 2006). Equally important is monitoring the temporal stability of the MRI scanner hardware and, where possible, making corrections for the detected instabilities. A number of quality assurance (QA) protocols have been developed to monitor scanner performance (Glover et al., 2012; Jovicich et al., 2016; Price et al., 1990; Yan et al., 2013).
Quality assurance becomes particularly important in longitudinal and/or multi-site studies, which involve collecting data from a group of subjects over time and/or at different locations. It is crucial to regularly monitor the performance of the scanners over time and at different locations to detect and, where possible, control for intrinsic differences (e.g., due to manufacturers) and changes in scanner performance (e.g., due to gradual component aging, software and/or hardware upgrades, etc.). If such differences and changes are not accounted for, they can add unexplained variance to the data.
As part of the Ontario Neurodegenerative Disease Research Initiative (ONDRI) (Farhan et al., 2017; https://ondri.ca/publications/) and the Canadian Biomarker Integration Network in Depression (CAN-BIND) (Lam et al., 2016; https://www.canbind.ca/about-can-bind/our-team/executive-committee/), QA phantom scans were conducted approximately monthly for three to four years at 13 sites across Canada with 3T research MRI scanners (see Table 1). We found considerable variance in the QA parameters over time for many sites as well as substantial variance across sites. We also identified an unexpected range of instabilities affecting individual slices in a number of scanners. These slice anomalies may amount to a substantial contribution to the signal variance and, to the best of our knowledge, have not been reported before. The main objectives of this paper are as follows: (1) to assess the range of within- and between-scanner variability revealed by the fBIRN QA pipeline and parameters, (2) to identify primary factors contributing to the scanner-dependent signal variance in fMRI studies, and (3) to identify preprocessing approaches to reduce this variance.

Sites
Thirteen sites participated in this study across Canada, including MR scanners from different vendors: 5 GE Discovery, 3 Siemens Trio, 2 Siemens Prisma, 1 Siemens Skyra, and 2 Philips Achieva ( Table 1 ).

Phantom scan protocol
The present study uses QA data acquired as part of the ONDRI and CAN-BIND research initiatives, where phantom QA scans were performed contemporaneously with the ONDRI and CAN-BIND human scan protocols for quality control. The functional Biomarker Imaging Research Network (fBIRN) phantom was scanned approximately monthly at each site, amounting to a total of 629 scans across all sites. The fBIRN phantom is a spherical plastic vessel 17 cm in diameter filled with a doped agar gel chosen to reflect the T1, T2, magnetization transfer, and RF conductivity characteristics of human brain tissue. Each site had its own copy of the fBIRN phantom to enable regular QA scans. While there may be minor manufacturing differences between different copies, the effect of these differences is assumed to be negligible for the present study.
The data from these 629 scans have been made open source and are available online at https://www.braincode.ca/content/open-datareleases .
The aim of the QA protocol is to measure scanner stability under conditions that match those of human resting state experiments (Glover et al., 2012). Thus, we employed the rs-fMRI protocol of the Canadian Dementia Imaging Protocol (CDIP; https://www.cdip-pcid.ca) for the phantom QA scans (Duchesne et al., 2019) (Appendix Table A1). At a few study sites, the initial scan parameters did not match those of CDIP and were adjusted during the study to more closely match the CDIP parameters (Table A2).
Table 2
Description of the fBIRN QA parameters.

• Region of interest (ROI): by default a 15 × 15 voxel square centered on the middle slice through the phantom.
• Signal image: the mean intensity across time, by voxel.
• Static spatial noise image: the sum of all even-numbered volumes minus the sum of all odd-numbered volumes.
• Temporal fluctuation noise image: each voxel time series is first detrended with a 2nd-order polynomial; the fluctuation noise image is the standard deviation (SD) of the residuals, by voxel.
• CV (coefficient of variation): the SD of a time series divided by its mean.
• mean: average value of the signal image within the ROI.
• SNR (signal-to-noise ratio): Noise = variance of the static spatial noise image over the ROI; Signal = average of the signal image over the ROI; SNR = Signal / √(Noise / (length of time series)).
• SFNR (signal-fluctuation-to-noise ratio): the signal image divided by the temporal fluctuation noise image, by voxel; the summary SFNR value is the average of this within the ROI.

For the next five variables, a time series composed of the mean intensity within the ROI of each volume (i.e., the 15 × 15 square centered on the middle slice of each volume) is calculated ("raw signal"), and a 2nd-order polynomial trend is fit to this data ("fit"):
• msi: mean signal intensity of the raw signal.
• std: SD of the residuals after detrending.
• percentFluc: 100 × std / msi.
• drift: 100 × (max raw signal − min raw signal) / msi.
• driftfit: 100 × (max fit − min fit) / msi.
• rdc (radius of decorrelation): with CV(N) the CV of the raw signal for a square N × N voxel ROI, rdc = CV(1) / CV(Nmax), where Nmax = 15.
• FWHM (full width at half maximum): the fBIRN QA pipeline uses an old version of AFNI's 3dFWHMx (AFNI_2011_12_21_1014), in which FWHM is calculated using the "classic" estimate as a function of the ratio of the variance of first differences to the data variance (Forman et al., 1995). minFWHM, maxFWHM, and meanFWHM are the minimum, maximum, and mean values over all volumes in each direction.

AFNI_2011_12_21_1014 was used for the calculation of the next two variables, with the following steps: 3dvolreg (motion correction); 3dDetrend (voxel-wise detrending using Legendre polynomials of order up to and including 2); 3dTstat (mean of voxels); 3dAutomask (mask of brain-only, i.e., high-signal, voxels, dilated 4 times). The mask is shifted by N/2 voxels in the phase-encode direction (Nyquist ghosting) to create the "ghost mask".
• MeanGhost: percentage of the mean intensity of the ghost voxels relative to the non-ghost voxels.
• MeanBrightGhost: mean intensity of the top 10 percent of ghost-only voxels.

The QA parameters used in the subsequent analyses are described in Table 2.
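Several of the Table 2 definitions can be expressed compactly in code. The following is a minimal numpy sketch of a few of the fBIRN metrics (signal image, static spatial noise, SNR, SFNR, and the raw-signal statistics), assuming a 4-D phantom array; it illustrates the definitions above and is not the fBIRN pipeline itself:

```python
import numpy as np

def fbirn_qa(data, roi=15):
    """Minimal sketch of a few fBIRN QA metrics for a 4-D array (x, y, z, t)."""
    nx, ny, nz, nt = data.shape
    x0, y0 = (nx - roi) // 2, (ny - roi) // 2
    cube = data[x0:x0 + roi, y0:y0 + roi, nz // 2, :]   # ROI on the middle slice

    signal_img = cube.mean(axis=-1)                     # signal image
    # static spatial noise image: sum of even-numbered volumes minus sum of odd
    noise_img = cube[..., ::2].sum(-1) - cube[..., 1::2].sum(-1)
    snr = signal_img.mean() / np.sqrt(noise_img.var() / nt)

    # temporal fluctuation noise: SD of residuals after 2nd-order detrending
    t = np.arange(nt)
    flat = cube.reshape(-1, nt)
    trend = np.polynomial.polynomial.polyval(
        t, np.polynomial.polynomial.polyfit(t, flat.T, 2))
    sfnr = (signal_img.ravel() / (flat - trend).std(axis=1)).mean()

    # raw-signal statistics: mean ROI intensity of each volume
    raw = flat.mean(axis=0)
    fit = np.polynomial.polynomial.polyval(
        t, np.polynomial.polynomial.polyfit(t, raw, 2))
    msi = raw.mean()
    std = (raw - fit).std()
    return dict(snr=snr, sfnr=sfnr, msi=msi,
                percentFluc=100 * std / msi,
                drift=100 * (raw.max() - raw.min()) / msi,
                driftfit=100 * (fit.max() - fit.min()) / msi)
```

For a well-behaved phantom time series with high mean intensity and small temporal fluctuations, SNR and SFNR are large while percentFluc and drift stay well below 1%.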

Autocorrelation function (ACF) analysis
Subsequent analyses of the fBIRN QA parameters revealed that FWHM is a prominent factor driving the variance between imaging sessions and between different sites (as outlined in the Results section and shown previously). Consequently, we developed an independent approach to measuring FWHM to verify and investigate these findings, as follows. The (spatial) autocorrelation function (ACF) of a detrended image can be used to estimate the FWHM measure of resolution. We calculate the ACF for each slice by first detrending the time series with a 2nd-order Legendre polynomial to subtract out the spatial structure. Detrending is performed using AFNI's 3dDetrend (AFNI_17.3.05). The ACF is then calculated for each slice using the Wiener-Khinchin theorem:

ACF(x, y) = F⁻¹_{u,v}[PSD(u, v)],

where F⁻¹_{u,v} denotes the inverse Fourier transformation and PSD(u, v) is the power spectral density of the image, which is estimated as the squared magnitude of the image FFT.
The FWHM in each direction (i.e., x and y) is subsequently estimated by fitting a Gaussian-plus-exponential mixture model to the ACF profile in that direction:

ACF(r) = a·exp(−r²/(2b²)) + (1 − a)·exp(−r/c),

where a, b, and c are fit parameters. The fit is identical to that used by the current version of AFNI's 3dFWHMx (AFNI_17.3.05) and is used for consistency with AFNI's current approach. Unlike the reported fBIRN FWHM measures, the slice ACF approach calculates FWHM values for each slice, which allows the identification of possible slice effects. FWHM values calculated using this approach will be referred to as "ACF FWHM" in the current paper, to distinguish them from the fBIRN QA FWHM values. Any slice with FWHMx > τ × minFWHMx or FWHMy > τ × minFWHMy is marked as an anomaly, where minFWHMx and minFWHMy are the minimum FWHM values in the x and y directions in that scan session and τ is an anomaly threshold.
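The per-slice ACF FWHM computation can be sketched as follows, assuming a detrended 2-D residual slice as input; the function names and the numerical half-maximum search are our own simplifications, not AFNI's implementation:

```python
import numpy as np
from scipy.optimize import curve_fit

def acf_profile(img):
    """Spatial ACF of a 2-D residual slice via the Wiener-Khinchin theorem."""
    psd = np.abs(np.fft.fft2(img)) ** 2        # power spectral density
    acf = np.real(np.fft.ifft2(psd))           # inverse FFT of the PSD
    return np.fft.fftshift(acf / acf[0, 0])    # normalise so ACF(0) = 1

def acf_model(r, a, b, c):
    """Gaussian-plus-exponential mixture, as in the fit described above."""
    return a * np.exp(-r**2 / (2 * b**2)) + (1 - a) * np.exp(-r / c)

def fwhm_from_acf(acf, dx=1.0):
    """Fit the mixture model along one direction and return the FWHM (units of dx)."""
    n = acf.shape[0]
    r = np.arange(n // 2) * dx                 # profile from the centre outwards
    y = acf[n // 2, n // 2:]
    (a, b, c), _ = curve_fit(acf_model, r, y, p0=(0.5, 2.0, 2.0), maxfev=5000)
    rr = np.linspace(0, r[-1], 10000)          # locate the half-maximum radius
    return 2 * rr[np.argmin(np.abs(acf_model(rr, a, b, c) - 0.5))]
```

Slices whose fitted FWHM exceeds τ times the session minimum would then be flagged as anomalous, per the rule above.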

Spatial smoothing
Following a previously proposed approach to reducing the FWHM variance across imaging sessions using different scanners, all imaging data were spatially smoothed to the largest average FWHM across all sites, set at 7 mm, using AFNI's 3dBlurToFWHM (Version AFNI_17.3.05).
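Matching smoothness across scanners can be reasoned about with the usual quadrature rule for Gaussian point spread functions: smoothing an image of intrinsic FWHM f_i with a kernel of FWHM f_k yields an effective FWHM of √(f_i² + f_k²). The helper below computes the extra kernel needed to reach a target; it is a back-of-the-envelope stand-in for AFNI's 3dBlurToFWHM, which estimates and applies the blur itself:

```python
import numpy as np

def additional_smoothing(fwhm_intrinsic, fwhm_target=7.0):
    """FWHM (mm) of the extra Gaussian kernel needed to bring an image of the
    given intrinsic smoothness up to the target, assuming Gaussian PSFs
    combine in quadrature: target^2 = intrinsic^2 + kernel^2."""
    fwhm_intrinsic = np.asarray(fwhm_intrinsic, dtype=float)
    if np.any(fwhm_intrinsic > fwhm_target):
        raise ValueError("intrinsic smoothness already exceeds the target")
    return np.sqrt(fwhm_target**2 - fwhm_intrinsic**2)
```

For example, a session at roughly 2 mm intrinsic FWHM needs about a 6.7 mm kernel to reach 7 mm, while one at roughly 3 mm needs about 6.3 mm, so scanners with different intrinsic resolutions receive different amounts of additional blur.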

Correction of slice anomalies -Censoring
Subsequent investigation revealed a number of anomalous slices in some imaging sessions despite the additional spatial smoothing. Where anomalies occur at discontiguous time frames in the time series, they might be mitigated by censoring the anomalous slices, i.e., by removing them and replacing them with temporally interpolated values. Moreover, since the anomalies occur at individual slices, a slice-wise correction approach may best suit this problem. SpikeCor (Campbell et al., 2013) is a PCA-based spike correction technique that can identify volume and/or slice anomalies. While the original version of SpikeCor detected volume anomalies only (Campbell et al., 2013), a slice version has been made available by the author in which, in addition to volume defects, individual slice defects can also be detected (https://www.nitrc.org/projects/spikecor_fmri/). We employed this version of SpikeCor to investigate the possibility of mitigating these slice anomalies in preprocessing. SpikeCor uses a statistical p-value threshold for outlier detection, which we set at the default value of 0.05.
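The censoring idea, replacing flagged slices with values interpolated from the nearest clean time frames of the same slice, can be sketched as follows; this is an illustration of the approach, not the SpikeCor implementation, and the array layout is an assumption:

```python
import numpy as np

def censor_slices(data, bad):
    """Replace flagged slices by linear temporal interpolation from the
    nearest clean time frames of the same slice.

    data : 4-D float array (x, y, z, t); bad : boolean (z, t), True = anomalous.
    """
    out = data.copy()
    nz, nt = bad.shape
    for z in range(nz):
        good = np.flatnonzero(~bad[z])
        if good.size in (0, nt):
            continue                               # nothing usable / nothing to fix
        for t in np.flatnonzero(bad[z]):
            lo, hi = good[good < t], good[good > t]
            if lo.size and hi.size:                # interpolate between neighbours
                t0, t1 = lo[-1], hi[0]
                w = (t - t0) / (t1 - t0)
                out[:, :, z, t] = (1 - w) * data[:, :, z, t0] + w * data[:, :, z, t1]
            else:                                  # at the ends: nearest clean frame
                out[:, :, z, t] = data[:, :, z, lo[-1] if lo.size else hi[0]]
    return out
```

Because only the flagged (z, t) slices are touched, clean slices in the same volume are left unchanged, which is the key difference from volume-level censoring.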

Principal component analysis
Principal component analysis (PCA) was used to gain an overview of the within- and between-scanner normalised variance and correlation structure of the 14 fBIRN QA variables for the 13 MRI scanners over time. PCA was performed by singular value decomposition of the data matrix, where each row corresponds to one scan session and the columns correspond to the QA variables. The variables were shifted to be zero-centered and scaled to have unit variance prior to the analysis. No statistical inference was drawn based on the PCA.
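The PCA described above can be sketched directly with an SVD of the standardised session-by-variable matrix (a generic sketch; the variable names are ours):

```python
import numpy as np

def qa_pca(X):
    """PCA of a (sessions x variables) QA matrix via SVD of the
    standardised data."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # zero-centre, unit variance
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                   # session coordinates on the components
    loadings = Vt.T                  # variable loadings (columns = components)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, loadings, explained
```

The session scores give the points plotted on the PC1-PC2 plane, and the variable loadings give the red-arrow directions described in the Results.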

Statistical analysis
One-way multivariate analysis of variance (MANOVA) was used to test for significant differences in QA parameters between sites and, separately, between scanner manufacturers, and the effect size was quantified using partial associations (η²). Linear mixed effects models were fit to the QA parameters, allowing the intercept of the model to vary randomly between scanners (i.e., sites) or scanner manufacturers. Significance of the estimates and the random effects was quantified using likelihood ratio tests. The effect size was quantified based on the correlation between the fitted and the observed values. The intra-class correlation (ICC) was calculated by dividing the random effect variance by the total variance, i.e., the sum of the random effect variance and the residual variance.
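The ICC computation can be illustrated with a simple method-of-moments estimator for a one-way random-effects design; the likelihood-based mixed-model estimates used in the paper would differ slightly, so this is an illustrative stand-in rather than the actual analysis code:

```python
import numpy as np

def icc_random_intercept(values, groups):
    """ICC for a one-way random-effects design: between-group variance over
    total (between-group + residual) variance, by method of moments."""
    groups = np.asarray(groups)
    values = np.asarray(values, dtype=float)
    labels = np.unique(groups)
    k = labels.size
    n = np.array([np.sum(groups == g) for g in labels])
    means = np.array([values[groups == g].mean() for g in labels])
    grand = values.mean()
    # mean squares between and within groups
    ms_b = np.sum(n * (means - grand) ** 2) / (k - 1)
    ms_w = sum(((values[groups == g] - m) ** 2).sum()
               for g, m in zip(labels, means)) / (values.size - k)
    # effective group size for (possibly) unbalanced designs
    n0 = (values.size - np.sum(n ** 2) / values.size) / (k - 1)
    var_b = max((ms_b - ms_w) / n0, 0.0)
    return var_b / (var_b + ms_w)
```

When between-site differences dominate (as for mean, minFWHMX, and minFWHMY in the Results), the ICC approaches 1; when a parameter varies mostly within sites, it approaches 0.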

Preliminary human experiments
We used a small human data set to demonstrate the impact of slice versus volume censoring on human data, as follows: resting state fMRI (rs-fMRI) and MRI data were acquired from 64 baseline participants (34 normal controls and 30 participants with major depressive disorder) at UCA. The participant and acquisition details are described in (Lam et al., 2016; Wijk et al., 2021).
The resting state fMRI data were preprocessed with the OPPNI pipeline (Churchill et al., 2015, 2017; software available at https://github.com/strotherlab/oppni) using the following steps: (1) the volume with the least amount of head displacement was determined using a principal component analysis (PCA), and all volumes were registered to this volume with rigid-body motion correction via AFNI's 3dvolreg; (2) significant outlier volumes or slices were identified, removed, and replaced by values interpolated from neighbouring volumes or slices, respectively, through censoring as implemented in SpikeCor (Campbell et al., 2013; software available at nitrc.org/projects/spikecor_fmri); (3) slice-timing correction was performed with Fourier interpolation via AFNI's 3dTshift; (4) spatial smoothness across MRI scanners at different sites was matched using AFNI's 3dBlurToFWHM module to smooth the fMRI images to FWHM = 6 mm in three directions (x, y, z); (5) AFNI's 3dAutomask algorithm was used to obtain a binary mask excluding non-brain voxels using default parameter settings, and the resultant mask was applied to all EPI volumes prior to subsequent pipeline steps; (6) neuronal tissue masking was performed by estimating a probabilistic mask to reduce the variance contribution of non-neuronal tissues in the brain (macro-vasculature, ventricles), using the first part of the PHYCAA+ algorithm to estimate task-run- and subject-specific neural tissue masks (Churchill and Strother, 2013; software available at nitrc.org/projects/phycaa_plus); (7) global signal regressors were estimated using PCA performed on each session's fMRI data, as the PC1 time series tends to be highly correlated with global signal effects but is orthogonal to the PC2+ signal subspace (Carbonell et al., 2011); (8) several nuisance regressors (low-frequency temporal trends, head motion effects, and global PC1) were calculated and then regressed out from the data concurrently via multiple linear regression (Churchill et al., 2012a, 2012b); (9) physiological noise components were estimated and removed through data-driven physiological correction using the second part of the PHYCAA+ algorithm (Churchill and Strother, 2013; software at nitrc.org/projects/phycaa_plus); (10) lowpass filtering was carried out using a linear filter to remove BOLD frequencies above 0.10 Hz; (11) spatial normalization to a structural template (sNORM) was carried out with all scans aligned to the MNI152 template (MNI-normalized, 4 mm resolution) using two transformations (fMRISubj → MRISubj and MRISubj → MNITemp, combined into one aggregated transform) via FSL's FMRIB Linear Image Registration Tool (FLIRT).
For each rs-fMRI scanning session we generated two preprocessed data sets: one with outlier volumes censored (VC) and one with outlier slices censored (VS). As a preliminary test of the impact of these different censoring choices, we examined changes between the two preprocessed data sets per session for the posterior cingulate cortex connectivity seed (PCC; MNI: [0, −60, 28]) used by (Wijk et al., 2021). The mean time series within the ROI was extracted using AFNI's 3dROIstats for all of the 64 MNI-normalized preprocessed scans. AFNI's 3dTcorr1D was used to generate voxel-wise Pearson correlations between the mean ROI time series and every other voxel of each scan. Finally, correlation values were converted to z-scores using the variance-stabilizing Fisher transformation via AFNI's 3dcalc.
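The seed-connectivity steps (mean seed time series, voxel-wise Pearson correlation, Fisher z-transform) can be sketched in numpy as follows; this mirrors the 3dROIstats / 3dTcorr1D / 3dcalc sequence in spirit only, and the flattened (voxels, time) layout is an assumption:

```python
import numpy as np

def seed_connectivity_z(data, seed_mask):
    """Fisher z-transformed Pearson correlation of every voxel's time series
    with the mean seed time series.

    data : (voxels, t) array; seed_mask : boolean (voxels,) marking the seed ROI.
    """
    seed = data[seed_mask].mean(axis=0)              # mean seed time series
    d = data - data.mean(axis=1, keepdims=True)      # centre each voxel
    s = seed - seed.mean()
    r = (d @ s) / (np.linalg.norm(d, axis=1) * np.linalg.norm(s))
    r = np.clip(r, -0.999999, 0.999999)              # guard arctanh at |r| = 1
    return np.arctanh(r)                             # variance-stabilising z
```

Comparing the z-maps produced from the VC and VS data sets, session by session, gives the per-subject connectivity differences examined in the Results.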

Split-half reproducible PCA
For a set of brain maps, PCA produces eigenimages that correspond to multivariate patterns explaining the greatest variance within the dataset. The reproducibility of these eigenimages can be quantified using a split-half resampling procedure, previously published in (Churchill et al., 2015). In this approach, a matrix of brain images X, of dimensions (V voxels × S samples), is randomly split in half, producing matrices X1 and X2, of dimensions (V × S/2). Applying PCA obtains from X1 a set of eigenimages v1(k), and from X2 a set of eigenimages v2(k), where k = 1…S/2 indexes over eigenimages, ordered by decreasing amount of variance explained. We then obtain a z-scored map of voxel-wise reproducibility for each component pair k, using the procedure described in (Strother et al., 2002). This involves z-transforming v1(k) and v2(k) to obtain z1(k) and z2(k), before calculating the signal-axis projection s(k) = (z1(k) + z2(k))/√2 and the noise-axis projection n(k) = (z1(k) − z2(k))/√2, then obtaining the reproducible map rSPM(k) by scaling the signal-axis projection by the standard deviation of the noise-axis projection. This process is repeated for 100 random split-half partitions, and the mean rSPM(k) maps are retained.
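The split-half procedure can be sketched as follows; the eigenvector sign alignment against the full-data eigenimage and the exact scaling of the z-scored map are our assumptions about conventions not fully specified above:

```python
import numpy as np

def split_half_rspm(X, k=0, n_splits=100, seed=None):
    """Mean z-scored reproducibility map for PCA eigenimage k by split-half
    resampling. X : (voxels, samples) matrix of brain maps."""
    rng = np.random.default_rng(seed)
    V, S = X.shape
    ref = np.linalg.svd(X, full_matrices=False)[0][:, k]   # sign reference
    acc = np.zeros(V)
    for _ in range(n_splits):
        idx = rng.permutation(S)
        zmaps = []
        for part in (idx[: S // 2], idx[S // 2:]):
            v = np.linalg.svd(X[:, part], full_matrices=False)[0][:, k]
            v = v if v @ ref >= 0 else -v          # eigenvector sign is arbitrary
            zmaps.append((v - v.mean()) / v.std()) # z-transform the eigenimage
        z1, z2 = zmaps
        sig = (z1 + z2) / np.sqrt(2)               # signal-axis projection
        noi = (z1 - z2) / np.sqrt(2)               # noise-axis projection
        acc += sig / noi.std()                     # scale signal by noise SD
    return acc / n_splits
```

For data with a strong reproducible component, the mean map recovers the underlying spatial pattern with high voxel-wise correlation.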

fBIRN QA measurements
To give a sense of the range of the values produced by the fBIRN QA pipeline at each site, the fourteen QA parameters are plotted for all sites in Fig. 1. The sites are labeled using three-letter codes defined in Table 1. Multiple potential site differences are seen in Fig. 1, and manufacturer differences in mean, SNR, and maxFWHM and minFWHM values are clearly visible between the first five GE sites (CAM, MCM, SBH, TWH & UCA), the next six Siemens sites (BYC, WEU, QNS, TOH, UTO, SMH), and the remaining two Philips sites (TBR, UBC). The differences are significant between sites (one-way MANOVA, p < 0.001, η² = 0.43) and between manufacturers (one-way MANOVA, p < 0.001, η² = 0.90).
Principal component analysis (PCA) provides a visualization of the normalised variance and correlation of the QA variables within imaging sites as well as between different sites and manufacturers (Fig. 2). 66% of the total variance is captured by the first two components, and a substantial part of that is between-manufacturer variance, in agreement with the relatively large partial association effect size (η² = 0.90) for the between-manufacturer MANOVA. For instance, in Fig. 2, Siemens scanners are mostly in the first (top right) quadrant, while GE scanners are mostly located in the third (bottom left) and Philips scanners in the second (top left) quadrant of the PCA plane. Furthermore, the within-manufacturer variance varies substantially between different manufacturers and sites. For example, for PC1, Siemens has the smallest within-manufacturer variance and GE has the largest (Appendix Table A4). Moreover, we observe prominent anomalous scans in a number of sites (most strikingly in UCA, TWH, SMH, UBC, and TBR), as shown in Fig. 2, where the imaging sessions within each site are connected in temporal order of acquisition.

Fig. 1
The fourteen fBIRN QA parameters plotted for all sites, labeled using the three-letter codes defined in Table 1. Note that FWHM values are plotted in mm on log scales and are based on the fBIRN use of AFNI's "classic" FWHM estimate as a function of the ratio of the variance of first differences to the data variance.
Finally, the 14 QA variables, shown as red arrows with lengths proportional to their variable loadings in Fig. 2, suggest two primary clusters of variable effects over time, site, and manufacturer. The first cluster of variables is roughly arrayed around the 45° diagonal (i.e., correlated; see Fig. A1 in the Appendix) within the bottom left quadrant and includes minFWHMy, minFWHMx, mean, SNR, and SFNR, with negative variable loadings on PC1 and PC2. Therefore the individual scan sessions from GE and Philips scanners, with their negative factor/observation loadings times the negative variable loadings for these five variables, all have incrementally positively larger values than the Siemens scanner sessions in the upper right quadrant (i.e., positive factor/observation loadings times negative variable loadings). This shows that the manufacturer-dependent blocks seen in Fig. 1, with GE and Philips FWHM greater than Siemens, are quantitatively dominant effects in the overall data set. We also note that, in terms of these main effects, multiple QA variables appear to be somewhat redundant, with meanGhost, rdc, and the weaker meanBrightGhost (shorter red vector) inversely correlated with the positively correlated minFWHMy, minFWHMx, mean, SNR, and SFNR. The second cluster of variables occurs around the 45° diagonal (i.e., correlated) within the upper left quadrant and includes percentFluc, std, drift, maxFWHMx, maxFWHMy, and the weaker driftfit; all record temporal variation within and between scan sessions within a site, likely driven by the multiple outlying scan sessions noted in the GE and Philips scanners above.

Fig. 2
Shown is the scatter plot of the first two principal components (PC1 and PC2) for all the imaging sessions (including outliers). Each imaging session is color-coded by scan site and shape-coded by scanner manufacturer. The imaging sessions within each site are connected in temporal order of acquisition. The ellipses show 95% confidence bounds assuming a multivariate t distribution. A few examples of prominent anomalous sessions are highlighted with labels for UCA. The fBIRN QA variables are also plotted using red arrows on the PCA plane with lengths proportional to the variable loadings. An interactive version of this plot can be viewed at https://kayvanrad.github.io/phantomQA/#fbirn_qa_pca .

Table 3
Linear mixed effects fit of the QA parameters to the scanners (i.e., sites) as a random effect.

Main drivers of variance and anomalous scan sessions
To further quantitatively understand the structure of these effects, we used linear mixed effects models fitting the QA parameters to scanners (i.e., sites) and, separately, to scanner manufacturers as random effects. These results are summarized in Tables 3 and 4, respectively. Substantial variance in mean, SNR, SFNR, meanGhost, minFWHMX, and minFWHMY is explained by scanner (ICC ≥ 0.85, p < 0.05 for SFNR and p < 0.001 for the others), and substantial variance in mean, minFWHMX, and minFWHMY is explained by scanner manufacturer random effects (ICC ≥ 0.85, p < 0.001). This is consistent with the PCA factor loadings in Fig. 2, where minFWHMX, minFWHMY, and mean drive the between-manufacturer difference between the GE/Philips versus Siemens clusters. Notably, however, limited variance is explained in measures of max FWHM, i.e., maxFWHMX and maxFWHMY, by scanner and manufacturer (ICC < 0.40, p < 0.001). Consistent with these low ICC values, measures of maxFWHM, which show weaker correlations with minFWHM (Fig. A1) and are almost orthogonal to minFWHM on the PCA plane (Fig. 2), do not appear to drive substantial between-manufacturer variance. Nevertheless, a large number of the anomalous scan sessions appear to be associated with maxFWHM (Fig. 2). This further indicates that min and max FWHM are potentially driven by different factors.

Spatial smoothing reduces variance in minFWHM
We tested the extent to which scanner differences in imaging resolution can be alleviated by smoothing the images to the greatest mean FWHM of all sites, as advocated previously. Our data show that after spatial smoothing to 7 mm by AFNI's 3dBlurToFWHM, scanner effects contribute to a lesser extent to the variance in minFWHM in the x (LMM, ICC = 0.81, p < 0.001) and y (LMM, ICC = 0.72, p < 0.001) directions, amounting to 12% and 20% reductions in ICC, respectively (Appendix Table A6), and the variance of minFWHMX and minFWHMY is reduced by 81% and 71%, respectively. Nevertheless, the scanner effect is far from eliminated. Moreover, smoothing does not appear to remove and/or alleviate the anomalous sessions, indicating that these anomalies are potentially driven by factors other than differences in reconstruction resolution (Appendix Fig. A4).

Table 5
Percentage of scan sessions with anomalies for each site at different anomaly thresholds τ. A slice is identified as anomalous if FWHMx > τ × minFWHMx or FWHMy > τ × minFWHMy, where minFWHMx and minFWHMy are the minimum FWHM values in the x and y directions in that scan session and τ is the anomaly threshold. U: percentage of anomalous scan sessions for the unprocessed raw data; S: percentage of anomalous scan sessions for data processed using slice SpikeCor; V: percentage of anomalous scan sessions for volume SpikeCor processed data.

ACF Analysis
As noted, imaging resolution (measured in terms of FWHM) is one of the main factors driving the within- and between-site variance and temporal anomalies. To further investigate imaging resolution, we examined an independent slice-wise measure of resolution using the ACF. This was motivated by two factors: (1) to provide a measure of FWHM independent of the fBIRN QA pipeline FWHM measures, and (2) to calculate slice-wise measures of FWHM, as our preliminary investigation of the anomalous images showed possible slice effects (see, for example, Fig. 6) and the fBIRN QA pipeline does not provide slice measurements of FWHM; as noted, the fBIRN QA pipeline uses the "classic" estimate of FWHM, defined as a function of the ratio of the variance of first differences to the data variance (Forman et al., 1995). Fig. 3(a) shows the min and max slice ACF FWHM values for the 13 participating sites. ACF FWHM values reveal relatively stable but distinct manufacturer differences in minimum imaging resolution in the x (LMM, ICC = 0.90, p < 0.001) and y (LMM, ICC = 0.92, p < 0.001) directions, where minFWHM ranges around 2 mm for Siemens and around 3 mm for GE and Philips scanners (Appendix Table A6). Scanner manufacturer explains the variance in maximum slice ACF FWHM to a lesser extent in the x (LMM, ICC = 0.60, p < 0.001) and y (LMM, ICC = 0.72, p < 0.001) directions. Nonetheless, these measures reveal multiple anomalous scan sessions with suspiciously high maxFWHM values. While some of these anomalies are observed in the fBIRN QA measures (e.g., those of UCA), they are more prominently revealed by the slice-based ACF measures; for example, for BYC0098, compare the max ACF FWHM values in Fig. 3(a) with the max fBIRN QA (volume-based) FWHM values in Fig. 1. Table 5 summarizes the number of anomalous scan sessions detected using the slice ACF FWHM values at each site for different thresholds τ.
The number of anomalous sessions incrementally decreases with increasing threshold τ. While the number does not decrease at a uniform rate, with sharper decreases between some threshold values than others, our data do not show any consistent pattern across sites. For example, for WEU the number of anomalies drops from 4 to 0 between τ = 2 and τ = 3, but for BYC the number drops from 58 to 8 between τ = 3 and τ = 5. Appendix Table A3(a) summarizes the anomalous scan sessions detected using a conservative threshold of τ = 10. Presuming the minimum FWHM value reflects the manufacturer's intended resolution, any slice with a FWHM value one order of magnitude larger than the intended resolution is considered anomalous, as such a large discrepancy is undesirable and can have detrimental effects on the data.

Fig. 3
Sites are labeled using the three-letter codes defined in Table 1, and a few anomalous sessions are labeled on the plot. Note that FWHM values are plotted in mm on log scales. While SpikeCor effectively removed anomalies in some sessions (e.g., BYC0014, BYC0046, BYC0098, and QNS9025), it was not able to ameliorate the anomalies in some other sessions (e.g., SBH0010, SMH0020, UCA0009, and UCA0037).

Scan sessions with anomalies
Principal component analysis of the fBIRN QA parameters of the data excluding the τ = 10 anomalous sessions verifies the impact of anomaly detection based on slice ACF FWHM on the fBIRN QA variance (Fig. 4). With the anomalies removed, maxFWHM and minFWHM, which were previously almost orthogonal to each other (Figs. 2 and A1(a)), become strongly correlated (Figs. 4 and A1(b)), together driving the between-manufacturer variance, which is now more clearly reflected by PC1 than in Fig. 2. Consistently, with anomalies excluded, substantial variance is now explained by the scanner manufacturer effect in maxFWHMX (LMM, ICC = 0.98, p < 0.001) and maxFWHMY (LMM, ICC = 0.98, p < 0.001), in addition to minFWHMX (LMM, ICC = 0.98, p < 0.001) and minFWHMY (LMM, ICC = 0.96, p < 0.001) (compare with Table 4).

BYC0014 shows one prominent spike in maxFWHM in only one time frame (frame #6) (Fig. 5(a)). A visual investigation of this acquisition reveals only one anomalous slice in that volume (Fig. 6(a)), which is manifest in the maxFWHM values.
QNS9025 shows multiple spikes in maxFWHM over several time frames ( Fig. 5 (b)). Nevertheless, these spikes do not manifest in the mean, median, or the quartiles, indicating that these effects are due to one or a few individual slices but not the majority of slices in the corresponding volumes. Fig. 7 (a) shows an example of one such volume with a few anomalous slices. SMH0020 shows blocks of defective volumes with unusually high measures of FWHM ( Fig. 5 (c)). The anomalies are manifest not only in maxFWHM values, but also in other measures of center and spread (mean, median, quartiles). These anomalies are due to multiple, and sometimes many, anomalous slices in the corresponding volumes. Fig. 8 (a) displays these anomalies in one such volume. For comparison, a normal volume in the same scan session is shown in Fig. 8 (c). Fig. A2 displays slice FWHM summary values for SMH0019 (the monthly scan session immediately prior to SMH0020), which has normal FWHM values.
( Fig. 4 caption: Scatter plot of the first two principal components (PC1 and PC2) of the fBIRN QA variables for all imaging sessions, color-coded by scan site and shape-coded by scanner manufacturer; sessions within each site are connected in temporal order of acquisition; the ellipses show 95% normal probabilities; the original fBIRN QA variables are plotted as red arrows with lengths proportional to the variable loadings. An interactive version of this plot can be viewed at https://kayvanrad.github.io/phantomQA/#anomalies_removed_pca .)
Most of the frames (i.e., volumes) in UCA0009 are defective with slice instabilities ( Fig. 5 (d)). Moreover, these anomalies are manifest in the median and the quartiles in addition to the max FWHM values, indicating that a large portion of the slices are defective in each volume. Fig. A3 (a) summarizes the total number of slice anomalies and the median number of slice anomalies per volume for all the scan sessions. As noted, the distribution of slice anomalies varies greatly between scan sessions. For example, while there are a number of anomalous slices in QNS9025 and SMH0020, the median number of anomalies per volume is zero for those scan sessions, indicating that fewer than half of the frames (i.e., volumes) in the acquisition were affected by these instabilities. On the other hand, UCA0009 has a large total number of anomalies and a large median number of anomalies per volume, indicating that a substantial number of slices in a substantial number of volumes (more than half) were affected by these anomalies. These observations are consistent with those of Fig. 5 .

Correction of anomalous slices
Our data show that slice-based SpikeCor can effectively identify anomalous slices under certain circumstances, namely when only a fraction of the slices are defective, but fails when defective slices constitute a large fraction of the total. For example, slice-based SpikeCor succeeds in removing or alleviating the anomalies in BYC0014 ( Figs. 9 (a) and 6 (b)) and QNS9025 ( Figs. 9 (b) and 7 (b)). On the other hand, SpikeCor does not appear to be effective in removing the anomalies for SMH0020 ( Figs. 9 (c) and 8 (b)) and for UCA0009 ( Fig. 9 (d)). Furthermore, SpikeCor does not affect normal acquisitions, e.g., SMH0019 ( Fig. A2 ). Our data also indicate that slice-based SpikeCor is more effective than volume-based SpikeCor in removing the anomalies, regardless of the threshold used to detect anomalies ( Table 5 ). Appendix Table A3 (b) lists the anomalous scan sessions, identified using a conservative threshold of 10, for the unprocessed data as well as for volume-based and slice-based SpikeCor preprocessing for each center. With the exception of volume-based SpikeCor for UBC, the anomalies remaining after SpikeCor (both slice- and volume-based) processing are a subset of those of the raw data. That is, SpikeCor does not produce additional synthetic anomalies.

Preliminary human experiments
For the PCC seed and the preprocessing pipelines using volume-based (VC) or slice-based (SC) censoring, z-mapped session connectivity volumes (CONNz) were analysed using split-half PCA to verify that a significant, stable, and reproducible reinforcing connectivity map exists across the 64 CONNz volumes. Individual session variance was calculated for the VC CONNz (VCz) and SC CONNz (SCz) volumes using all PC components from the split-half PCA.
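The split-half logic can be illustrated with a toy sketch, under the simplifying assumption that reproducibility is measured as the spatial correlation of the leading principal pattern in each half; the study's actual split-half PCA procedure is more elaborate, and all names here are hypothetical:

```python
# Toy sketch of split-half reproducibility for connectivity maps
# (a simplification, not the study's split-half PCA implementation).
import numpy as np

def split_half_reproducibility(maps, rng):
    """maps: (n_sessions, n_voxels) z-scored connectivity maps.

    Randomly splits sessions into halves, extracts the leading spatial
    pattern in each half, and returns their absolute spatial correlation R.
    """
    idx = rng.permutation(len(maps))
    half1 = maps[idx[: len(maps) // 2]]
    half2 = maps[idx[len(maps) // 2:]]

    def first_pattern(m):
        # leading right-singular vector = first principal spatial pattern
        return np.linalg.svd(m - m.mean(axis=0), full_matrices=False)[2][0]

    p1, p2 = first_pattern(half1), first_pattern(half2)
    # the sign of an SVD component is arbitrary, so take |correlation|
    return abs(np.corrcoef(p1, p2)[0, 1])
```

Repeating the split many times and averaging R gives a stability estimate of the dominant connectivity pattern across sessions.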
Split-half PCA shows one highly significant (VC: p < 0.01; SC: p < 0.01), highly reproducible (VC: R = 0.95; SC: R = 0.95) connectivity pattern accounting for (VC: 44%; SC: 45%) of total variance ( Fig. 10 ). This pattern reflects some of the classical default-mode regions seen when using a PCC seed in resting-state data; the large Z-values and associated high pattern reproducibilities indicate a strong, stable connectivity pattern being reinforced across most sessions.
Using either a paired t -test ( p = 0.56) or a Wilcoxon non-parametric rank test ( p = 0.64), there is no significant difference between the VCz and SCz session variance distributions.
To explore possible changes in seed connectivity values per voxel, we regressed the brain-masked SCz voxel values against the VCz voxel values separately for each session. Fig. 11 shows these regression slope values plotted as a function of the 64 session numbers, so that scan date progresses from left (Session 1) to right (Session 77), from October 2013 through August 2016. Note that there are missing sessions, with a total of only 64 rs-fMRI data sets, because session numbers were taken from CAN-BIND study enrollment numbers and, for a variety of reasons, not all participants had an MRI scan. There is no significant trend of slope with session number ( p = 0.41; F = 0.70). However, the 32 sessions up to and including session 42 (large red disk) have lower slope variance than the sessions after session 42: the first 32 slope values have 0.425 times the variance of the slope values for the 32 sessions after session 42 (two-tailed F-test: F = 0.425, p < 0.05, 95%-CI = [0.21, 0.87]). This fits with our monthly fBIRN phantom measurements from Calgary showing that the scanner became more unstable in the later period of human data collection for CAN-BIND.
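The variance-ratio comparison can be sketched as follows, assuming scipy is available; the two input arrays stand in for hypothetical per-session regression slopes from the early and late periods:

```python
# Sketch of a two-tailed F-test for equality of variances between two
# groups of regression slopes (hypothetical helper, not the study code).
import numpy as np
from scipy import stats

def variance_ratio_test(a, b):
    """Returns (F, two-tailed p, 95% CI for the variance ratio var(a)/var(b))."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    F = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfa, dfb = len(a) - 1, len(b) - 1
    p_one = stats.f.cdf(F, dfa, dfb)
    p = 2 * min(p_one, 1 - p_one)             # two-tailed p-value
    ci = (F / stats.f.ppf(0.975, dfa, dfb),   # 95% CI for the true ratio
          F / stats.f.ppf(0.025, dfa, dfb))
    return F, p, ci
```

With two groups of 32 slopes each, a ratio well below 1 with p < 0.05 would indicate, as reported above, that the early sessions had significantly lower slope variance.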
In Fig. 11 we see both large increases and decreases in overall z-scores of more than 20%. Fig. 12 (a) shows session 64 with a large decrease (red disk in Fig. 11 ), and Fig. 12 (b) shows session 42 with a large increase (purple disk in Fig. 11 ). Overall, 46 of the 64 (72%) sessions have slopes less than or equal to 1.0, showing that the primary impact of slice censoring versus volume censoring is to reduce z-scores.

Discussion
Scanner stability is an important concern for fMRI experiments. Instabilities can add unexplained variance to the data, which is particularly relevant to multi-site longitudinal studies. Not only is it important to ensure the scanner remains stable throughout a run (a set of volumes acquired using a prescribed pulse sequence), it is also very important to ensure stability within each site over time (i.e., between different imaging sessions at one site), and between sites. Our data highlight the importance and efficacy of phantom scans for monitoring the stability of MRI scanners and identify an additional important stability dimension: across slices within a volume. Routine QA phantom scans at all sites effectively detected instabilities along all of these dimensions across multiple scanners.
As noted above, we used the fBIRN QA pipeline to calculate 14 QA parameters, the utility of which has been described in prior work ( Glover et al., 2012 ). It was unknown to what extent these 14 variables capture different variance/noise sources, and one of the goals of the current study was to evaluate the utility of the published fBIRN parameters for monitoring scanner stability and precision, and to identify the most important QA variables to monitor.

Between-manufacturer variance
Our data indicate that between-manufacturer differences in imaging resolution are a major source of variance in multisite studies. In particular, we identified two distinct mechanisms by which these differences can contribute to the variance: (1) differences in minFWHM values, attributed to differences in the intrinsic resolution of the scanners, and (2) differences in maxFWHM values, attributed to slice anomalies.
We identified two primary clusters of variable effects over time, site, and manufacturer ( Fig. 2 ). The first cluster of variables includes minFWHMy, minFWHMx, mean, SNR, and SFNR. The positive correlation between these variables makes sense, since the higher FWHM in GE and Philips scanners leads to higher SNR/SFNR values. We also noted that meanGhost, meanBrightGhost, and rdc are inversely correlated with the positively correlated minFWHMy, minFWHMx, mean, SNR, and SFNR. While minFWHMX, minFWHMY, and mean may be considered manufacturer traits, SNR, SFNR, rdc, and meanGhost are more site-dependent and capture significant scanner differences within manufacturers. Furthermore, these variables might be expected to be relatively stable within sessions, whereas the second cluster of variables, including percentFluc, std, drift, maxFWHMx, maxFWHMy, and driftfit, which are located at approximately 90° to the first cluster on the PCA plane, largely reflects within-session temporal fluctuations. The correlation (positive or negative) between the QA variables within each cluster indicates that multiple QA variables appear to be somewhat redundant.
We note that while there exists a nested dependence among the observations due to site effects, the main objective of the PCA illustrations is descriptive rather than inferential, and the description is not affected by complications due to non-independence of the observations. As noted above, the PCA illustrations provide insight into the structure of the data, including the between- and within-site effects.
We observe significant contributions of scanner manufacturer effects to the variance in minFWHM (LMM, ICC ≥ 0.90, p < 0.001), which presumably reflects the manufacturer's intended imaging resolution. Our data show clear differences in minFWHM between manufacturers ( Figs. 1 , 3 , and Appendix Table A6 ), which are attributed to differences in the reconstruction techniques used by different vendors (e.g., differences in apodization, etc.). Moreover, we noted that preprocessing by blurring all images to a common spatial resolution (e.g., the coarsest spatial resolution of all scanners) can reduce scanner-related variance in imaging resolution, reflected in minFWHM values ( Table A6 ). However, mean differences of > 1 mm remain, suggesting that an algorithm other than the current AFNI 3dBlurToFWHM (Version AFNI_17.3.05) is needed to further reduce this major inter-scanner difference and its impact on SNR.
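Assuming AFNI is installed, the blurring step might be invoked along the following lines; the file names and target FWHM are placeholders, and `3dBlurToFWHM` is the AFNI program named above:

```python
# Sketch of smoothing one site's EPI data to a common target resolution
# with AFNI's 3dBlurToFWHM (file names and FWHM value are hypothetical).
import subprocess

def blur_to_common_fwhm(in_file, out_prefix, target_fwhm_mm, run=True):
    """Builds (and optionally runs) the 3dBlurToFWHM command.

    target_fwhm_mm would typically be the coarsest FWHM observed
    across all scanners in the study.
    """
    cmd = ["3dBlurToFWHM",
           "-input", in_file,
           "-prefix", out_prefix,
           "-FWHM", str(target_fwhm_mm)]
    if run:
        subprocess.run(cmd, check=True)  # raises if AFNI reports an error
    return cmd
```

Separating command construction from execution allows a dry run (`run=False`) to inspect the command before applying it to every session.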
The anomalous scan sessions, on the other hand, are reflected in maxFWHM, which is almost orthogonal to the minFWHM ( Fig. 2 ). This indicates that maxFWHM, which is more than one order of magnitude larger than the minFWHM for some scan sessions, is driven by factors other than the vendor-specific reconstruction resolution. Nevertheless, min and max FWHM become more strongly correlated if the anomalous sessions are excluded ( Fig. 4 ), and scanner manufacturer becomes a substantial driver of variance in maxFWHM in addition to minFWHM (LMM, ICC ≥ 0.90, p < 0.001). Furthermore, with these sessions excluded, measures of min and max FWHM show stronger correlation with center and spread measures of FWHM, i.e., median and quartiles ( Fig. A1 ). In addition, with the anomalies excluded, slice ACF measures of FWHM also show stronger correlation with fBIRN QA measures of (volume) FWHM ( Fig. A1 ). These observations confirm that these anomalies are caused by instabilities affecting individual slices rather than by global instabilities affecting entire volumes and/or the entire acquisition.

Slice anomalies
Our data provide evidence of a potentially significant new source of variance due to anomalous individual slices without subject motion or any sources of physiological noise. These anomalies manifest in abnormally high values of slice ACF FWHM. They can go unnoticed with the fBIRN QA measurements but can be more effectively revealed using slice ACF FWHM measures of imaging resolution. Since these anomalies are caused by instabilities affecting individual slices rather than an entire volume, they are best captured by slice measures of FWHM, rather than volume FWHM where a single FWHM value (in each direction) is calculated for each volume using a volume ACF or by averaging over slices.
We presented a simple anomaly detection criterion based on the slice ACF FWHM, where the minFWHM is used as a reference and anomalies are detected using a multiplicative threshold. We also noted that the number of scan sessions with slice anomalies decreases incrementally as the threshold increases. The value of the threshold can be chosen empirically, depending on how strictly the user wishes to identify anomalies for a given application.
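The criterion can be sketched in a few lines; this is a hypothetical helper, with `fwhm` standing in for the per-slice FWHM values of one session:

```python
# Minimal sketch of the slice-ACF anomaly criterion: a slice is flagged
# when its FWHM exceeds a multiplicative threshold times the session's
# minimum FWHM, which is taken as the scanner's intended resolution.
import numpy as np

def flag_anomalous_slices(fwhm, threshold=10.0):
    """fwhm: slice FWHM values in mm, shape (n_volumes, n_slices).

    Returns a boolean mask of anomalous slices and the reference minFWHM.
    The conservative default threshold of 10 corresponds to flagging
    slices one order of magnitude above the intended resolution.
    """
    fwhm = np.asarray(fwhm, float)
    min_fwhm = fwhm.min()                 # reference (intended) resolution
    mask = fwhm > threshold * min_fwhm
    return mask, min_fwhm
```

Lowering `threshold` flags more sessions, consistent with the incremental decrease in detected anomalies as the threshold increases.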
Slice anomalies are observed more frequently in older scanners. We almost never observed slice anomalies in WEU and UTO scans, which have relatively new Prisma scanners. We speculate these anomalies are related to aging hardware, resulting in spontaneous instabilities in the scanners' acquisition performance.

Correcting anomalous slices
We noted a preprocessing approach by which these slice anomalies can be corrected under certain circumstances. Since these scanner instabilities affect individual slices (rather than entire volumes), a slice-based censoring scheme is more effective in removing these anomalies than a volume-based censoring scheme. Among the available censoring techniques, SpikeCor provides the option for slice-based censoring. Our results confirm that slice-based SpikeCor more effectively removes the anomalies than the volume-based SpikeCor ( Table 5 ).
For correction to be effective in removing slice anomalies, (1) the anomalies should occur in discontiguous time frames or discontiguous blocks of at most a few frames in length (hence enabling interpolation), and (2) the censoring method should succeed in correctly identifying the anomalous slices. SpikeCor fails to correctly identify the anomalous slices when they constitute a large fraction of the total number of slices in the volume. This is most prominently evident in several scan sessions at SBH, UBC, and UCA, where SpikeCor has limited efficacy in detecting anomalous slices within the scans.
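As a proof-of-concept, a generic slice-censoring step (not the SpikeCor implementation itself) might replace each flagged slice with the average of the same slice from the nearest clean neighboring volumes; all names here are hypothetical:

```python
# Generic slice-censoring sketch: interpolate each flagged slice from the
# nearest unflagged neighbors in time (not the SpikeCor implementation).
import numpy as np

def censor_slices(data, mask):
    """data: (n_volumes, n_slices, ny, nx); mask: (n_volumes, n_slices) bool.

    Each flagged slice is replaced by the mean of the same slice in the
    nearest unflagged preceding and following volumes; edge cases fall
    back to the single available neighbor.
    """
    out = data.astype(float).copy()
    n_vol = data.shape[0]
    for t, z in zip(*np.nonzero(mask)):
        prev = next((u for u in range(t - 1, -1, -1) if not mask[u, z]), None)
        nxt = next((u for u in range(t + 1, n_vol) if not mask[u, z]), None)
        neighbors = [out[u, z] for u in (prev, nxt) if u is not None]
        if neighbors:  # leave the slice untouched if no clean neighbor exists
            out[t, z] = np.mean(neighbors, axis=0)
    return out
```

This illustrates requirement (1) above: when anomalies occupy long contiguous blocks, no clean neighbors remain nearby and interpolation degrades or fails.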
It should be noted that different independent metrics are involved in SpikeCor and in ACF anomaly detection. The ACF anomaly detection is based on the physical measurements of ACF FWHM, where anomalies are detected using a threshold with respect to the minimum FWHM value. SpikeCor, on the other hand, uses a statistical threshold for anomaly detection. While one could also use the slice ACF threshold for anomaly detection for censoring, the goal of the current study is not to propose yet another censoring technique, nor is it to compare the efficacy of different censoring techniques. The current study demonstrates slice anomalies on phantom data and provides a proof-of-concept approach to removing them under certain circumstances.
For the next step of testing the impact of slice- versus volume-based censoring in resting-state human data, we note that it may be difficult to separate and detect slice anomalies of the sort demonstrated here from motion and physiological effects. As a result, it is unclear exactly what metric should be used to show improved performance with slice-based censoring in resting-state human scans. This is particularly the case since temporal variance itself is considered a cognitive signal in some studies ( Garrett et al., 2013 ), and while presumably it should decrease in cases of extreme anomalies, it is unclear if it should generally go up or down after slice-based censoring. We have presented some preliminary human data to illustrate these issues and discuss this further below.

Hardware and/or software changes
Hardware and/or software changes are reflected in and can be identified by the QA parameters. Our data show an abrupt step change in several QA parameters between SBH0010 and SBH0011, including in FWHM (both fBIRN QA measures and slice ACF measures), percentFluc, and std ( Figs. 1 and 3 ). A follow-up with the imaging site revealed that the RF coil was replaced between those scans. QNS updated their scan parameters by using a lower pixel bandwidth ( Table A2 ) and switching from MSENSE to GRAPPA to resolve ghosting issues that were occurring in their images. These changes are reflected as prominent step changes in measures of meanGhost and meanBrightGhost in Fig. 1 . This change is also reflected in slice ACF measures of maxFWHM, as a step increase after the change in parameters ( Fig. 3 ). We also observed step changes in ghosting and FWHM parameters for UBC ( Fig. 1 ). An investigation of the scan parameters revealed that these parameters were updated to match those of the CDIP protocol ( Table A2 ), which in turn resulted in step changes in the QA parameters of ghosting and FWHM (these measures improved, i.e., their values decreased, once the scan parameters were set to those of the CDIP protocol). The crop of large anomalies in UCA prior to and including UCA0037 was due to a failing head coil that was eventually replaced after the UCA0037 scan in December 2016. As a result, the UCA scans following UCA0037 have normal FWHM values. Note that the last human scan, session 77, was acquired in August 2016, four months before the failing head coil was replaced. These results indicate that the fBIRN QA parameters, and the FWHM measures in particular, may provide a sensitive means of monitoring MRI scanners for early signs of head coil failure.
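A simple way to screen a QA time series for such step changes is to scan candidate change points and compare the segment means on either side. The sketch below is a hypothetical helper for illustration, not part of the fBIRN pipeline:

```python
# Sketch of a simple step-change screen for a QA time series
# (hypothetical helper; not part of the fBIRN QA pipeline).
import numpy as np

def strongest_step(series, min_seg=3):
    """Returns (change_point_index, t_statistic) for the most prominent
    mean shift, using a Welch-style t statistic at each candidate split."""
    series = np.asarray(series, float)
    best_k, best_t = None, 0.0
    for k in range(min_seg, len(series) - min_seg + 1):
        a, b = series[:k], series[k:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        t = abs(a.mean() - b.mean()) / se if se > 0 else np.inf
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t
```

Applied to a monthly series of, e.g., meanGhost or percentFluc values, a large t at one split would flag a session pair (such as SBH0010/SBH0011) for follow-up with the site.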

Translation to human studies
The ability to detect slice anomalies and the efficacy of the preprocessing strategies for controlling them need to be more comprehensively verified on human data. Our preliminary data show that there is an impact of slice despiking versus volume despiking for PCC seed connectivity. While 46 of the 64 (72%) sessions have slopes less than or equal to 1.0 ( Fig. 11 ), these regression plots of individual session values show both highly reduced and highly increased voxel connectivity values of more than 30%. Therefore, while on average the SC CONNz map connectivity values are reduced relative to VC CONNz, there are individual sessions that have strongly increased SC CONNz map connectivity values. We speculate that the slice censoring may be overcorrecting in the majority of sessions with reduced connectivity values, as they may not contain significant slice anomalies, but testing this hypothesis is beyond the scope of this paper. While the group connectivity maps are almost identical (i.e., Fig. 10 ), our results indicate that slice- versus volume-censoring may significantly change inter-individual differences in connectivity and is therefore an important issue to be further understood as the field focuses more and more on individual effects, and not only group results. The presence of physiological processes and head motion adds substantial variance in human data, which can consequently induce challenges for accurate detection and removal of the anomalous slices, and the effects we observe may be the result of interactions between multiple preprocessing steps. Additionally, previous literature has investigated the effect of a number of environmental and physiological variables on within-subject variance in longitudinal studies ( Karch et al., 2019 ). Notably, it has been shown that days since the first scan and time of day are predictors of within-subject variance in structural MRI ( Karch et al., 2019 ; Nakamura et al., 2015 ).
The present human data provide a preliminary investigation of the impact of slice versus volume censoring on human resting-state fMRI scans. Our phantom data identify unexpected slice anomalies and provide proof-of-concept preprocessing approaches for controlling these anomalies. We speculate these scanner-dependent differences and fluctuations are equally present in human scans. Nevertheless, their relative magnitude and impact compared to other variables affecting in vivo scans remain to be investigated. Subsequent studies need to focus on translating the findings to human scans, taking the effect of the present scanner-related differences and fluctuations into account in relation to subject-specific variables.

Concluding remarks
In conclusion, the present study provides evidence of slice instabilities in fMRI scans, which can result in unexplained variance in the data. These instabilities can be detected using slice ACF measurements of FWHM and can, under certain circumstances, be controlled by preprocessing. When preprocessing is not effective in controlling these instabilities, the affected scan sessions can be detected using slice ACF FWHM values and excluded from subsequent analyses. Between-manufacturer differences in spatial resolution are another main factor driving variance in multisite studies, which can also be at least partially controlled by preprocessing. The utilization of regular QA scan protocols can reveal problems with the performance of the scanners, some of which may be addressed by hardware and/or software modifications, as was the case for at least two of the participating sites, where increasing slice instabilities detected with slice-based FWHM measurements eventually led to head coils being replaced. Furthermore, our phantom results have led to preliminary human studies that have indicated the existence of quite large inter-individual connectivity differences as a result of applying the same slice censoring technique in humans that removes some of the slice instabilities detected in phantoms.

Authors' note
The opinions, results, and conclusions are those of the authors, and no endorsement by the Ontario Brain Institute is intended or should be inferred.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was conducted with the support of the Ontario Brain Institute, an independent nonprofit corporation, funded partially by the Ontario government. Matching funds were provided by participant hospital and research institute foundations, including the Baycrest Foundation, Bruyère Research Institute, Centre for Addiction and Mental Health Foundation, London Health Sciences Foundation, McMaster University Faculty of Health Sciences, Ottawa Brain and Mind Research Institute, Queen's University Faculty of Health Sciences, Sunnybrook Health Sciences Foundation, the Thunder Bay Regional Health Sciences Centre, the University of Ottawa Faculty of Medicine, and the Windsor/Essex County ALS Association. The Temerty Family Foundation provided the major infrastructure matching funds. AK, AC, and MZ were partially supported by a Canadian Institutes of Health Research (CIHR) grant (MOP201403).

Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: S. Strother is the Chief Scientific Officer of ADMdx, Inc., which receives NIH funding, and he currently has research grants from Brain Canada, Canada Foundation for Innovation (CFI), Canadian Institutes of Health Research (CIHR), and the Ontario Brain Institute in Canada.
Data and code availability statement
The data used in the current study are available online at https://www.braincode.ca/content/opendata-releases . A dashboard containing interactive figures and links to GitHub repositories containing the code used for the analysis is freely accessible online at https://kayvanrad.github.io/phantomQA/ .

Scan parameters

Correlation analysis
The correlation between fBIRN QA and ACF FWHM variables is displayed in Fig. A1 (a). fBIRN QA measures of min FWHM are highly correlated with each other (minFWHMX and minFWHMY), and fBIRN measures of max FWHM also highly correlate with each other (maxFWHMX and maxFWHMY). However, fBIRN measures of min FWHM do not show strong correlation with those of max FWHM. Moreover, as expected, SNR and SFNR are highly correlated, and so are measures of drift and driftfit. Similar to the fBIRN measures, although ACF measures of min FWHM are highly correlated with each other (minFWHMx and minFWHMy) and ACF measures of max FWHM are also highly correlated with each other (maxFWHMx and maxFWHMy), we did not observe strong correlation between ACF measures of min and max FWHM. Moreover, while "center and spread" measures of ACF FWHM (i.e., mean, std, med, Q1, and Q3) are highly correlated, they do not show strong correlation with min and max ACF FWHM measures. This indicates that measures of max and min FWHM are potentially driven by different factors.
The correlation between fBIRN QA measures and slice ACF FWHM measures with the anomalous sessions removed is summarized in Fig. A1 (b). After the removal of the anomalous sessions, fBIRN measures of min and max FWHM now show strong correlations with each other. Also, min and max slice ACF FWHM now strongly correlate with each other and with center and spread measures. Furthermore, slice ACF measures of FWHM now show stronger correlation with fBIRN QA measures of FWHM.
Scan sessions with slice anomalies

Spatial smoothing

Variance within different sites and different manufacturers ( Table A5 )

Table A2
Summary of the sites that updated their rs-fMRI scan parameters during the study to match those of the CDIP ( Table A1 ). The echo time (TE), repetition time (TR), flip angle (FA), voxel size, and pixel bandwidth (BW), are shown before (b) and after (a) the update.

Table A6
Mean and 95%-confidence intervals for the slice ACF minFWHM values for all the imaging sessions at each site (including anomalous sessions).