Does higher sampling rate (multiband + SENSE) improve group statistics - An example from social neuroscience block design at 3T

Multiband (MB) or Simultaneous multi-slice (SMS) acquisition schemes allow the acquisition of MRI signals from more than one spatial coordinate at a time. Commercial availability has brought this technique within the reach of many neuroscientists and psychologists. Most early evaluation of the performance of MB acquisition employed resting state fMRI or the most basic tasks. In this study, we tested whether the advantages of using MB acquisition schemes generalize to group analyses using a cognitive task more representative of typical cognitive neuroscience applications. Twenty-three subjects were scanned on a Philips 3 T scanner using five sequences up to eight-fold acceleration with MB-factors 1 to 4, SENSE factors up to 2 and corresponding TRs of 2.45s down to 0.63s, while they viewed (i) movie blocks showing complex actions with hand object interactions and (ii) control movie blocks without hand object interaction. Data were processed using a widely used analysis pipeline implemented in SPM12 including the unified segmentation and canonical HRF modelling. Using random effects group-level, voxel-wise analysis we found that all sequences were able to detect the basic action observation network known to be recruited by our task. The highest t-values were found for sequences with MB4 acceleration. For the MB1 sequence, a 50% bigger voxel volume was needed to reach comparable t-statistics. The group-level t-values for resting state networks (RSNs) were also highest for MB4 sequences. Here the MB1 sequence with larger voxel size did not perform comparable to the MB4 sequence. Altogether, we can thus recommend the use of MB4 (and SENSE 1.5 or 2) on a Philips scanner when aiming to perform group-level analyses using cognitive block design fMRI tasks and voxel sizes in the range of cortical thickness (e.g. 2.7 mm isotropic). While results will not be dramatically changed by the use of multiband, our results suggest that MB will bring a moderate but significant benefit.


Introduction
Multiband (MB) or Simultaneous multi-slice (SMS) acquisition schemes allow the acquisition of magnetic resonance imaging (MRI) signals from more than one spatial coordinate at a time. Apart from the obvious advantage in the reduction of per volume acquisition times , these sequences have been postulated to have several advantages for functional MRI (fMRI), especially, higher signal to noise per time unit, higher sampling rate resulting in higher statistical power and a better estimation of physiological noise (Barth et al., 2016;Feinberg and Setsompop, 2013;Olafsson et al., 2015). Most of the early testing of performance of MB acquisition employed resting state fMRI and showed decreased total scan times (X.-H. Liao et al., 2013), better noise estimation and spectral and spatial de-aliasing (Griffanti et al., 2014;Kalcher et al., 2014;Frederick, 2014a, 2014b;Tong et al., 2014) as well as better localization and estimation of functional networks (Koopmans et al., 2012). However, much less information is available about the benefits of MB acquisition for task based fMRI (Feinberg and Yacoub, 2012). Particularly, the utility of MB acquisition for group level statistics has not been evaluated yet. In this study we therefore performed comprehensive analyses to test the advantages of using MB acquisition schemes over single band EPI acquisition schemes in detecting task-based Blood Oxygenation Level Dependent (BOLD) responses.
The idea of acquiring data from different spatial locations simultaneously was first proposed in the 1980s (see Barth et al., 2016 for a historical review). However, the studies that brought widespread attention to the use of MB technique in fMRI were published in 2010 as part of the Human Connectome Project (Smith et al., 2013;U gurbil et al., 2013). These two studies used multiplexed EPI combining simultaneous echo refocusing (SIR) and multiband radio frequency pulses and showed increased sensitivity to detect resting state networks (RSN) with 6 fold higher sampling rate at 3 T (Feinberg et al., 2010), and comparable activation between single band and MB acquisition using simple sensory tasks at 7 T . Since then, MB sequences were improved to include multiple modalities (Cohen et al., 2017a), to reduce radiofrequency power deposition Koopmans et al., 2012;Norris et al., 2011;Wu et al., 2013) and g-factor penalty due to the suboptimal reconstruction of the images (Setsompop et al., 2012), to ameliorate image reconstruction using controlled aliasing methods (Blaimer et al., 2013) such as radial CAIPIRINHA (Yutzy et al., 2011) and blipped CAIPI techniques (Setsompop et al., 2012), and by optimizing coil designs (Poser et al., 2014). Together with these technical advancements in MR pulse sequence development and image quality improvements, the commercial availability of compatible hardware and tailored acquisition and reconstruction software has brought the technique within the reach of many neuroscientists and psychologists, paving the way to systematic testing, validation and application to basic and clinical research settings (Chekroud et al., 2017;Cohen-Gilbert et al., 2017;Gabrielsen et al., 2018;Harenski et al., 2018;Kim et al., 2016;King et al., 2018;Kyathanahally et al., 2017;Provencher et al., 2018;Shah et al., 2016;Suri et al., 2017).
Since the publication of the first fMRI studies using MB acquisition in 2010 (Feinberg et al., 2010;Moeller et al., 2010), several studies have tested its benefits on resting and task based functional data. Using the keywords 'Multiband EPI', 'Multiband fMRI', 'Simultaneous Multislice EPI' and 'Simultaneous Multislice fMRI' in Pubmed on July 15, 2019, we identified 41 publications (including the two 2010 publications) that assessed the advantages of multiband acquisition for studying resting state and/or task based fMRI (Table 1). The data presented in these studies were collected on either 3 T or 7 T field strength scanners, generally using small to moderate sample sizes (median ¼ 10; range ¼ 3 to 476), and different voxel sizes (ranging between 1.5 mm isotropic and 3.75 mm isotropic). Of these 40 studies, 20 studies used resting state acquisition not only to assess the performance of short repetition times (TRs) (Feinberg et al., 2010;Koopmans et al., 2012;X.-H. Liao et al., 2013;Preibisch et al., 2015) but also to harness the advantages of the short TRs in spectral de-aliasing of high frequency bands to study the BOLD specific as well as noise specific components in different frequency bands, and for ICA based de-noising methods (Boubela et al., 2013;Gohel and Biswal, 2015;Griffanti et al., 2014;Kalcher et al., 2014;Tong and Frederick, 2014a).
The main aim of the current study is to perform a detailed assessment of the effects of MB acceleration on random effects group level statistics for task-based fMRI, and identify the optimal acceleration for task-based fMRI. We therefore analysed the 21 studies that employed task based BOLD, in greater detail. Boyacio glu et al. (2014) compared the performance of Gradient echo (GE) and Spin echo (SE) sequences in combination with MB factor of 3 and using a random effect analysis concluded that the performance of GE was superior in terms of the BOLD compared to SE. De Martino et al. (2015), ingeniously used longer silent periods between two TRs to present their auditory stimuli and showed a higher BOLD contrast with MB2 compared to single band, concluding that reducing the length of the scanner noise results in stronger auditory responses. Similarly, other studies that used MB accelerated sequences, but did not directly compare their performances for task related statistics were not analysed further (Shah et al., 2016;Vu et al., 2016). We identified three other studies that directly compared MB sequences for noise amplification, slice leakage and serial autocorrelation matrices. These studies unequivocally showed that the noise is amplified as MB factor increases, leakage might occur at very high acceleration (>MB8) and that the conventional auto-correlation models might not be sufficient for data acquired with faster TRs (Bollmann et al., 2018;Boubela et al., 2014;Risk et al., 2018). Since group-level task-based BOLD was not the focus of these studies, we did not analyse them further. This left us with 13 studies which directly tested MB acquisition compared to the conventional single band acquisition or standard datasets to assess the benefits of higher sampling rate with short TRs on task-based fMRI. Collectively, these studies showed that using MB acceleration of 2-3 times might yield comparable or better statistics for task-based BOLD (Table 1). While these studies are valuable in enhancing our understanding of the effects of MB, there are several criteria, typical of modern cognitive neuroscience applications, not fulfilled in these studies and therefore warrants further investigation.
Up until now, the effects of MB acceleration have mostly been studied in basic functional designs using finger tapping and visual checkerboard (L. Chen et al., 2015;Chu and Noll, 2016;McDowell and Carmichael, 2018;Moeller et al., 2010;Sahib et al., 2018Sahib et al., , 2016Su et al., 2018;Todd et al., 2016). These tasks massively activate motor and visual cortices with strong contrasts between ON and OFF conditions. Since these are predominantly positioned in the cortex of the brain and are close to the MR receiver coils, higher acceleration factors are easily achieved. In contrast, modern cognitive neuroscience tasks often contrast tightly controlled conditions resulting in substantially smaller effect sizes than has been explored in past multiband studies. Further, cognitive tasks may accrue more within-and between-subject variability. Within-subject variability may increase because higher cognitive processing is less stimulus locked and may habituate more than early sensory or motor tasks. Between-subject variance may increase because complex cognitive tasks allow more individual variability in interpretation and depend more on past experiences. This will further reduce effect sizes. Whether MB provides advantages in the context of such smaller effects sizes and increased individual variability is difficult to predict. We thus feel that findings from MB-studies with basic sensory stimuli cannot be directly generalized to cognitive neuroscience applications. Moreover, while testing complex cognitive neuroscience applications, larger sample sizes are needed to reliably detect the BOLD as compared to basic stimuli where a sample size of <15 is enough for a reliable activation. A simple power calculation showed that with a sample of size 3 (smallest sample size reported in the literature reviewed in Table 1) one can expect detection of an effect with an effect size of 2.3, whereas using 14 subjects would allow us to detect an effect with effect size of 0.7. We included a sample of 23 subjects as recommended for cognitive tasks in fMRI by Thirion et al. (2007) which would allow us to observe the effects with effect size of~0.5.
Furthermore, the majority of these 13 studies that tested the effects of higher sampling rate only report summary statistics such as mean t-or top 10% t-values within selected ROIs from subject level analysis (L. Chen et al., 2015;Chu and Noll, 2016;Kiss et al., 2018;McDowell and Carmichael, 2018;Sahib et al., 2018Sahib et al., , 2016Todd et al., 2017Todd et al., , 2016. Some studies additionally presented activation maps from representative subjects for visual comparisons Su et al., 2018), but only three studies presented voxel-wise, random-effects group-level, task-related BOLD statistics followed by a visual assessment of the performance of fast scanning.
In the current study, we add to our understanding of the effect of accelerated image acquisition and therefore higher sampling on task related BOLD statistics by addressing these gaps. We took advantage of a paradigm we have often used in our laboratory (Arnstein et al., 2011;Gazzola and Keysers, 2009;Gazzola et al., 2007;Valchev et al., 2016) and thus have reference data to systematically test the effect of MB acquisition. This paradigm aims to identify brain regions involved in the action observation network (Gazzola and Keysers, 2009) and involves observing complex goal directed actions (CA) and contrasting that activation  After noise correction, the detection of rs-networks improves with more non-artefactual independent components being observed. Additional activation clusters for task data are discovered for MBME data (increased sensitivity) whereas existing rs-networks become more localized (improved spatial specificity Longer scan times are needed to acquire data on single subjects for information on connections between specific ROIs. Longer scans may be facilitated by acquisition during task paradigms, which will systematically affect functional connectivity but may preserve individual differences in connectivity on top of task modulations.
3 T Not specified 2 Â 2 Â 2 0.7s Demetriou et al. (2018) MB, in plane acceleration factor ¼ 2 10 þ 14 Rest (6 min) & Task Strong benefits of the multiband protocols on results derived from resting-state data, but more varied effects on results from the task paradigms. Multiband protocols were superior when Multi-Voxel Pattern Analysis was used to interrogate the faces/places data, but showed less benefit in conventional General Linear Model analyses of the same data. In general, ROI-derived measures of statistical effects benefitted only modestly from higher sampling resolution. Imaging protocols using an acceleration factor of MB 2 Â GRAPPA 2 can be confidently used for highresolution whole-brain imaging to improve BOLD sensitivity with very low probability for false-positive activation due to slice leakage. Imaging protocols using higher acceleration factors (MB 3 or MB 4 Â 3 T 1, 2, 4, 6 1.5 Â 1.5 Â 1.5 6.6s, 3.3s, 1.6s, 1.1s (continued on next page) R. Bhandari et al. NeuroImage 213 (2020) 116731 against complex control stimuli (CC) that include the same objects and the same hand, but without the hand manipulating the objects. This contrast is motivated by the fact that viewing objects and hands in motion would already activate a very broad network including early visual areas, attentional brain circuits, spatial representations in addition to regions specifically representing how the hand manipulates the object.
To focus on the latter, we thus subtract stimuli that include the same basic visual elements and their spatial distribution and also include movement, but without the actual hand-object interaction (Gazzola and Keysers, 2009). This contrast is known to identify a broad network of brain regions (often referred to as the action observation network) encoding hand-object interactions in all participants via the CA-CC contrast including premotor, somatosensory, insula, inferior parietal, visual, and cerebellar regions (Gazzola and Keysers, 2009). This allowed us to assess the benefits of MB across a wide range of brain regions and their functional connectivity. Unlike most of the previous studies, we present voxel-wise group-level analysis with detailed post-hoc analyses to quantify the differences in the data acquired with different acceleration using a sample size representative of modern task-based designs (n ¼ 23 subjects).  (2017) MB, 12% in phase acceleration 10 Task Lower g-factor noise area of V1 shows significant improvements at higher SMS factors; the moderate-level gfactor noise area of the para-hippocampal place area shows only a trend of improvement; and the high gfactor noise area of the ventral-medial pre-frontal cortex shows a trend of declining t-scores at higher SMS factors. This spatial variability suggests that the optimal SMS factor for fMRI studies is region dependent. SMS accelerations of 4x (conservative) to 8x (aggressive) for most studies and a more conservative acceleration of 2x for studies interested in anterior midline regions is recommended. Task Accelerated gradient echo (GRE) sequence combining simultaneous multislice excitation (SMS) with echoshifting technique for high spatial resolution BOLD fMRI has potential for high spatial resolution fMRI at ultra-high field because of its sufficient BOLD sensitivity as well as improved acquisition speed over conventional GRE-based techniques. When data were smoothed, we found evidence of slice leakage in some, but not all, subjects. We also found evidence of SMS noise amplification in unprocessed task and processed resting-state HCP data. 3 T 8 2 Â 2 Â 2 0.7s Corbin et al. (2018) MBþ12% phase-over sampling 10 Task The "FAST" model implemented in SPM is used with a well-controlled number of parameters, it can successfully prewhiten 80% of grey matter voxels even with volume repetition times as short as 0.35 s. Temporal signal-to-noise ratio can be augmented to account for the temporal correlations in the time series. To derive a hypothesis for the expected outcomes, we looked at the underlying mechanisms that are inherent to the MB technology. On the one hand, we might expect multiband to improve second level randomeffect statistics. This is because multiband increases the number of samples that can be acquired in a given experimental time-window. More samples should increase the precision of the parameter estimates at the first level (Constable and Spencer, 2001). Because the second level residual variance is composed of the sum of 'true' between subject variance and first level errors in parameter estimates, this could directly improve statistics by reducing residual errors. On the other hand, other factors may curtail these benefits. First, measurement noise might increase due to imperfect separation of the aliased voxels (Golestani et al., 2017). Second, more samples only improve parameter estimates if the additional samples are independent, however the temporal auto-correlation increases as the samples are acquired closer in time (Bollmann et al., 2018;Corbin et al., 2018), limiting the independence of the samples and hence the benefits from acquiring more samples. Whether multiband acquisition leads to a net improvement of group-level statistics thus depends on the trade-off between these factors and is difficult to predict a priori. We therefore based it on the previous literature and hypothesized that the use of MB acquisition may show comparable if not higher t-values at a group level in response to task stimuli. Moreover, we look at individual factors such as subsampling the acquired volumes, tSNR and within and between subject variances to tease apart the contribution of these factors on our group level statistics. Finally, we performed a dual-regression analysis with the pseudo resting state data obtained by regressing out task-correlated activity to approximate the findings of the previous studies that showed an improvement of resting state statistics with MB acceleration. We hypothesized that on a group level, MB would outperform single-band acquisition in terms of the voxel-wise t-values.

Subjects
Twenty-four (13 males, 11 females) right-handed (self-reported) healthy volunteers (M age ¼ 25.5 years, SD age ¼ 3.6, Range ¼ 21 to 33) with no contraindications for MRI, and normal or corrected-to-normal vision participated in the study. After the screening procedure, subjects were familiarized with the MRI environment and were verbally informed about the observation task that they will watch during the session. The research was approved by the Amsterdam Medical Centre Ethics Committee Review Board, and the volunteers provided written consent at the start of the study. In what follows, sequences will be shorthanded as MBxSy TR res , where TR is the acquisition time of one volume, res the resolution in mm, and x and y, the multiband and SENSE factors respectively. One subject (all sessions) was removed from the analysis due to excessive head motion and the MB2S2 1:22 2:7iso session of another subject was removed due to technical error during the acquisition, resulting in a final sample size of N ¼ 23 for MB1S2 2:45 2:7iso , MB1S2 2:00 3x3x3:3 , MB4S1:5 0:70 2:7iso , MB4S2 0:63 2:7iso and of N ¼ 22 subjects for MB2S2 1:22 2:7iso session.

Task and experimental procedure
Subjects were shown movie stimuli previously used in our lab to trigger robust activity in the action observation network (Gazzola and Keysers, 2009;Valchev et al., 2016). These movies show a human hand interacting with everyday objects. Examples of such interactions are shown in Fig. 1 Valchev et al., 2016). Two types of stimuli were used: Complex actions (CA) showed the hand interacting with the object in typical, goal directed actions. For example, the hand of the actor reached for a lighter placed on the table, grasped it, and lit a candle with it. Complex controls (CC) stimuli had the exact same setting as the CA but the actor's hand did not interact with or manipulate the object on the table, instead, made aimless hand movements. A block was composed of three movies of the same category (CA or CC) and lasted 7s. Each fMRI session was composed of 13 blocks per stimulus category for a total of 26 blocks, presented in a randomized order. The inter-block-interval lasted between 8 and 12s and consisted of a fixation cross on a grey and blue background similar to the stimuli background. Five of these sessions were presented to each subject, showing the same blocks but in different order, with each session acquired with a different acquisition scheme, varying in MB factor, in-plane SENSE (S) acceleration and/or spatial resolution ( Table 2). The order of acquisition was randomized between subjects. Importantly, the duration of the five sessions was similar (~8 min), but more functional volumes were acquired during sessions where the acquisition parameters lead to shorter TRs.

Image acquisition
Data were acquired on a 3 T Philips scanner, using a commercial version of Philips' MB implementation (based on software release version R5.4). A 32-channel head coil was used. The total length of the scanning session per subject was around 50 minutes and included the functional and anatomical scans. Functional data were acquired using five different acquisition sequences (Table 2). These sequences were chosen based on the following considerations. First, we wanted to acquire one of the most typically used non-multiband sequences as a meter of comparison, and therefore acquired a sequence with a TR ¼ 2.00s and close to 3 mm isotropic resolution (MB1S2 2:00 3x3x3:3 ). Because multiband is typically used with slightly smaller voxels, we acquired all other sequences at a resolution of 2.7 mm isotropic, which is closer to the average grey matter thickness. At that resolution, we then measured an acquisition at MB1S2 (MB1S2 2:45 2:7iso ), and increased MB to 2 and 4. At MB4, we additionally reduced SENSE acceleration to 1.5 as recommended by Philips, to mitigate potentially higher noise amplification. The SENSE reconstruction method has been characterized as an image domain "unfolding" algorithm. An acceleration (with reduction factor R) would result in a reduced FOV in every component coil image. Each pixel in the individual reduced FOV coil image will contain information from multiple (R), equidistantly distributed pixels in the desired full FOV image. Additionally, these pixels will be weighted with the coil sensitivity at the corresponding location in the full FOV (Blaimer et al., 2004). In our case, . Thirteen bocks per condition were presented and each block comprised of three clips of the same condition. The inter block interval was randomized between 8 and 12 s. The blocks and conditions were randomized between the five acquisition sequences per subject and between subjects. The top row schematically illustrates the structure of each run; the bottom two rows illustrate, for a randomly chosen clip, how it differs in the two conditions: a goal directed action in CA and a random movement in CC.
the reduction factor of 1.5 would result in reduction of the FOV in the phase-encoding direction by this factor. The resulting TRs were 2.45s, 2.00s, 1.22s, 0.70s and 0.63s for the sequences MB1S2 2:45 2:7iso , MB1S2 2:00 3x3x3:3 , MB2S2 1:22 2:7iso , MB4S1:5 0:70 2:7iso and MB4S2 0:63 2:7iso respectively (Table 2). Slices were acquired in ascending order in transverse direction. TE was kept constant for all sequences at 30 ms. A default shift of ½ a FOV was used while acquiring with MB 2 and 4 to improve reconstruction quality. The flip angle was optimized to the Ernst angle of each TR. All functional images were reconstructed using the SENSE reconstruction algorithm (Setsompop et al., 2012) and this was done simultaneously with acquisition. Note that the sequence MB1S2 2:00 3x3x3:3 has a coarser spatial resolution compared to the other sequences, which would lead to differences that are not related to the total acceleration in acquisition. We therefore present the results of this sequence separately on figures rather than in the same line as the other sequences. In addition, the sequence MB4S1:5 0:70 2:7iso differs from MB4S2 0:63 2:7iso in the SENSE acceleration but not in MB factor. Any differences in these two sequences therefore will not reflect the effect of MB acceleration. A T1-weighted image was acquired at the end of the functional runs with a field of view of 240 Â 256 Â 250 mm and the voxel size was 1 mm isotropic. Geometry factor (g-factor) maps (see supplementary analysis 1) that quantify the aliasing noise per voxel (Pruessmann, 1999) were obtained for sequences MB1S2 2:45 2:7iso , MB2S2 1:22 2:7iso , MB4S1:5 0:70 2:7iso . Here the g-factor maps were computed on-scanner using the vendor's implementation and represent additional SNR penalty of multiband and in-plane SENSE acceleration. Note that the g-factor is computed from the coil sensitivity maps and the noise correlation matrix, both of which are measured in the preparation phase ahead of scanning (Pruessmann, 1999).

Reference study
As a reference for some of the analyses performed to test the effect of accelerated acquisition, we used a previously acquired dataset. Data from this previous study (Abdelgabar et al., 2018) consisted of fMRI data from an independent group of 31 subjects. Images included an anatomical scan and a functional run where the subjects viewed the same action observation paradigm used in this study (CA and CC blocks of 7s). In addition, they also saw thirteen 7s blocks of Static Control (SC) movies, which had the same elements as the CA and CC movies, but the hand lay still next to the objects without any movement. Data were acquired using a Philips Intera 3 T scanner (University Medical Centre Groningen, University of Groningen. Groningen, The Netherlands), using a 32-channel coil. First, a high-resolution, structural image (170 slices; scan resolution ¼ 256 Â 256; field of view ¼ 232 Â 232 mm; voxel size: 1 mm isotropic) was acquired. Functional images were acquired using an echo planar T2*-weighted gradient sequence (See Table 2 for additional parameters). The aim of choosing this independent dataset as a reference instead of the average data acquired in the current study was to avoid circularity, which may result in inflated statistical estimates, and to use a study with an even larger sample size to further approximate the true population activity that each multiband experiments aim to estimate. These data were acquired without any MB acceleration.

Preprocessing
Reconstructed whole brain functional data were preprocessed using SPM12 (Wellcome Trust Centre for Neuroimaging, UCL, UK) with Matlab version 8.4 (The MathWorks Inc., Natick, USA). Briefly, functional images were slice-time corrected and then realigned to the estimated average. Anatomical images were co-registered to the mean functional image (rigid body transformation, DOF ¼ 6), and segmented. The normalization parameters that were generated during segmentation (unified segmentation) were used to bring all the images to the MNI space. The resampled voxel size for the functional images was 2 Â 2 Â 2 mm and 1 Â 1 Â 1 for the anatomical scans. We adjusted the SPM12 bounding box settings to [-90 -126 -72; 90 90 108] in order to include the cerebellum completely, as the default settings of [-78 -112 -70; 78 76 85] may result in omission of some of the cerebellar voxels (Abdelgabar et al., 2018;Gazzola and Keysers, 2009). Smoothing of 6 mm FWHM Gaussian kernel was applied to the functional data. The reference dataset was preprocessed similarly.

Subject level general linear models (GLM)
GLM with task predictors: Subject level GLM included CA and CC as two separate task predictors with each predictor having 13 blocks of 7s. The reference dataset additionally included SC as a predictor with 13 blocks of 7s. Boxcar functions were convolved with canonical haemodynamic response function implemented in SPM. Regressors included the six motion parameters estimated during realignment, first five principal components of cerebrospinal fluid (CSF) and five principal components of white matter (WM) (total 16 regressors). The CSF and WM principal components were extracted from the normalized but unsmoothed functional data for each subject using the average WM and CSF segments of the subjects' segmented and normalized anatomy, thresholded at 0.7 (arbitrary units) as masks (Behzadi et al., 2007). We used the principal component analysis based regressors as they have been shown to yield better results for physiological de-noising as compared to regressors generated based on cardiac and respiratory traces recorded with the respiration belt and pulse oximeter (Kirilina et al., 2016). FAST autocorrelation algorithm implemented in SPM12 was used to correct for temporal autocorrelations in the data which has been suggested to be superior to the default AR(1) models specially for the shorter TRs (Bollmann et al., 2018;Todd et al., 2016). All the other parameters, including the high-pass filter (128 s), were left as the default in SPM12. The residuals were saved and were used to calculate the effective tSNR (see supplementary analysis 2).
GLM for resting state: Previous studies with MB show that accelerating image acquisition may have benefits for detecting RSNs (Demetriou et al., 2018;Griffanti et al., 2014;Preibisch et al., 2015;Reynaud et al., 2017). To replicate these positive findings of MB acceleration on RSNs, in Table 2 Overview of the scanning parameters used for the five acquisition sequences and the reference study. Note (in red) that the sequence MB1S2 2:00 3x3x3:3 has a coarser spatial resolution compared to the other sequences. The sequence MB4S1:5 0:70 2:7iso differs from MB4S2 0:63 2:7iso in the SENSE acceleration and not in MB factor. Reference study was collected with a different set of subjects and a different scanner. It is used to compare some results from this study. this study we created a pseudo resting state dataset by regressing out the BOLD associated with task predictors CA, CC and the 16 nuisance regressors. Briefly, a subject level GLM was performed on unsmoothed functional data with the GLM described above (i.e. two task predictors, CA and CC, and the 16 regressors). The residuals were saved which were devoid of the BOLD that was detected as correlated to our action observation task. These residuals were smoothed with 6 mm FWHM Gaussian kernel and used in a spatial GLM in FSL (http://www.fmrib.ox.a c.uk/fsl/index.html) with 20 resting state networks (RSN20) as predictors (Smith et al., 2009). FSL was used as it has a well-implemented spatial GLM algorithm as a first step of the dual regression approach. The RSN20 maps classified by Smith et al. (2009) consist of 10 primary cortical networks, 3 cortical networks with partial spatial/functional overlap with primary cortical network(s), 3 networks with spatial location corresponding to subcortical or deep cortical areas and 4 artefactual components. This spatial GLM step generated 20 time-courses per subject per acquisition sequence, one for each RSN. These time courses were then used as predictors in a subject level GLM along with the 16 nuisance regressors for the normalized and smoothed data. Note that this analysis is very similar to the dual regression analysis implemented in FSL. We decided to perform the second step in SPM to keep the default processing (e.g. auto-regression) similar to those used in the task-based GLM which was performed in SPM, so as to avoid effects of using different packages and to enable qualitative comparisons between task based and resting state analyses.

Group statistics
Random effects group t-tests were performed separately for each acquisition sequence using the parameter estimates from the subject level GLMs (CA-CC contrast or RSNs). The significance of all group analyses was evaluated at q fdr <0.05 at the voxel level and a cluster-size threshold of 50 voxels. We use FDR correction rather than FWE correction here to improve sensitivity (Lieberman and Cunningham, 2009;Lindquist and Mejia, 2015), but also provide histograms of t-values to enable scientists to appreciate the effect of MB at any threshold. When looking at the data at uncorrected p < 0.001 (see Supplementary Analysis 7), the FDRc cluster thresholds for different sequences ranged from 49 to 85. FDRc represents the minimum cluster size that limits the false discovery rate to 0.05. For visualization purposes, we therefore decided to use a common cluster threshold of 50 while evaluating the FDR corrected results.
Basic signal and noise characteristics of different acquisition sequence, which include g-factor, tSNR (predicted, effective and raw) and CNR are presented in supplementary materials (see Supplementary analysis). In the result section 3.1, we focus on the random-effect grouplevel task based BOLD outcomes for different acquisition sequences. We look at: (a) Voxel-wise t-statistics of individual sequences to see which sequences perform the best in terms of t-statistics. Next, using Receiver Operating Characteristics (ROC) on the resulting voxelby-voxel t-values we analyse how similar/different the activation maps are between different acquisition sequences and the reference study. The ROC analysis allows us to look at the hit rate and false alarm rate in a non-threshold dependent manner. The intent of using the ROC measures is not to find a "winner sequence" but to see concordance between sequences. If a sequence would show poor concordance with all other sequences, it might warrant particular prudence. However, if all sequences show high concordance, this will point to the fact that although the t-statistics might differ between sequences, the sequence that one chooses would not greatly alter functional interpretation about the brain networks involved in the cognition of interest. (b) We then label the activated brain regions using the AAL atlas (Automated Anatomical Labeling, Tzourio-Mazoyer et al., 2002), and explore the similarity of conclusions at the brain-region level using an ROC analysis (i.e. which AAL region are detected as active and which not). (c) To explain the small differences in the outcomes of different acquisition sequences we performed within and between subject variance analyses to see if there are systematic differences in the variance patterns between different sequences as that would directly impact the observed t-statistics. (d) Previous studies with resting state show that MB acceleration may enable us to shorten total acquisition times (Smitha et al., 2018).
To test this theory for task-based studies, we truncated the total acquired data and looked if it would be possible to reduce the number of task trials/total acquisition time with less loss in accuracy at higher MB acceleration.
Section 3.2 reports the group statistics of the pseudo resting state data as replication for the widely reported positive effects of MB in the resting state literature.

Random effect group level analysis
(a) Voxel-wise analyses: Fig. 2A and B presents the random-effect group activation t-maps for each sequence separately. As a validation, we also present the group maps of the reference study (Fig. 2C). Visual comparison of the group voxel-wise results look very similar across sequences, and consistent with what has been reported in Gazzola and Keysers (2009) . Fig. 2D presents the histograms of the significant t-values for the sequences MB1S2 2:00 3x3x3:3 MB1S2 2:45 2:7iso , MB2S2 1:22 2:7iso , MB4S1:5 0:70 2:7iso and MB4S2 0:63 2:7iso . Both sequences with MB4 have higher t-values than sequences with MB 1 and 2 with same spatial resolution of 2.7 mm isotropic. Interestingly, this is the case despite the effective smoothness (especially average subject FWHM) being lower in the MB4 sequences compared to sequences with MB1 and 2 at the same acquired spatial resolution ( Fig. 2A).
One notable difference in the visual inspection of the maps is that all of the sequences tested here activate fewer voxels than the independent study when visualized at the same threshold (q fdr < 0.05). Fig. 2E shows that the independent study (black line) has much higher t-values than the two best sequences (pink and red lines in Fig. 2D) of the current study. This could potentially be due to three reasons: i) the independent study was conducted on 31 participants instead of the 23 participants included here, affording it more statistical power. ii) While in the reference study subjects viewed the stimuli only once, they saw them 5 times in our within-subject MB comparison, potentially leading to some form of repetition suppression. iii) The acquired voxel size in the reference study (3.5 mm isotropic) was larger than the resolution used here (2.7 mm isotropic or 3 Â 3 Â 3.3 mm 3 ). We therefore compared two more GLMs, one including just the first 23 subjects of the reference study and one with just the first view of the stimuli in the current study (Fig. 2C). Note that the "first view" GLM consists of data coming from different sequences. To assess the impact of voxel size, we compared MB1S2 2:45 2:7iso and MB1S2 2:00 3x3x3:3 , as these differ only in the acquired voxel sizes. Visual inspection of the maps confirmed the intuition that the sample size (N ¼ 31 vs. N ¼ 23) has an impact on the group BOLD levels, as although the networks look similar, the size of the clusters appears a bit smaller when fewer participants are included (see green arrows in Fig. 2C, and grey and black lines in Fig. 2E). Next, we considered the first view in the current study regardless of the acceleration with which it was acquired. Some clusters start to appear as significant (green boxes in Fig. 2C compared to maps in Fig. 2A and B), which were either not present or were smaller, when group analyses were done per sequence and therefore contained first to fifth view of the task. Fig. 2E shows that reference study (N ¼ 23) vs. the first view in this study are very similar with slightly higher t-values for the reference study. Finally, the impact of higher voxel size can be seen in sequences MB1S2 2:45 2:7iso and MB1S2 2:00 3x3x3:3 , which differ in the voxel size but not in the acceleration: white arrows in Fig. 2A and B show the areas with better BOLD for sequence with 3 Â 3 Â 3.3 mm 3 voxels compared to sequence with 2.7 mm isotropic voxels ( Fig. 2D and E). Although it has been argued that temporal SNR no longer depends on voxel volume beyond a voxel-size of 1.5 mm isotropic while using 32 channel coils (Triantafyllou et al., 2011), our results show that going from 2.7 mm isotropic to 3 Â 3 Â 3.3 mm 3 improves the tSNR (Supplementary analysis 2) as well as the t-values for task based BOLD (Fig. 2D). Taken together the findings of the random effect analysis suggest that in otherwise identical scanning parameters, MB4 shows higher t-values than lower acceleration with MB2 or no MB. Larger voxel sizes, larger group sizes and/or novelty of the task seems to show higher t-values. In our sample, MB1 sequence with larger voxels show t-values most comparable to MB4 sequences in terms of t-values, suggesting that MB4 can be used to obtain finer spatial resolution while keeping the t-sensitivity similar to a non-accelerated sequence with larger voxel sizes.
Since the GLM measures reported above are very threshold dependent, we used ROC measures to compare the conclusions one would reach from different acceleration factors. Table 3 presents signal detection metrics (ROC, hit rate and false alarm rate) comparing the outcomes of these GLMs. The area under the ROC curve, which is less threshold dependent, is very high in all pairwise comparisons. It ranges from 85% to 90% when comparing the sequences against the reference study, and from 94% to 98% when comparing different sequences acquired here. The hit-rate was low when assessing how many of the voxels in the reference study were activated in each sequence (40% on average), and there was no systematic trend of this hit rate increasing or decreasing with the total acceleration. The hit rate was much higher when examining how much of the voxels from the current sequences were activated in the highly powered reference study (83-90%), and when comparing different accelerated sequences with each other (72% on average). The false alarm rate was consistently low, with on average only 2% of the voxels not activated in the reference study activated at any sequences, and only 3% of voxels not activated in one sequence activated in another. The sub-sample (N ¼ 23) of the reference study showed high correspondence with the full sample (99%), had a high hit rate (81%) and a small false alarm rate (1%). The hit rate for the sequences against the subsample of the reference study was slightly higher (average 45%). The Fig. 2. All maps are overlaid on the mean grey matter segment of the group. q fdr <0.05, cluster threshold 50 voxels. (A) Group maps showing the task correlated activity detected using the GLM predictors for the acquisition sequences MB1S2 2:45 2:7iso , MB2S2 1:22 2:7iso , MB4S1:5 0:70 2:7iso and MB4S2 0:63 2:7iso . At the group level the effective smoothing is~0.5 mm more in MB1S2 2:45 2:7iso compared to MB4S2 0:63 2:7iso and the average smoothness at the subject level is~1.4 mm more for MB1S2 2:45 2:7iso compared to MB4S2 0:63 2:7iso . (B) Group maps for the acquisition sequence MB1S2 2:00 3x3x3:3 . White arrows represent the effect of voxel size on the BOLD outcomes. (C) Group maps from the reference study using the same task (N ¼ 31 subjects), maps from the reference study with a smaller sample of N ¼ 23 subjects and from the current study looking at the first view of the task. Green arrows show how results change with the number of subjects. Green boxes represent the clusters that become bigger or more significant if we only consider the first view. (D) Histogram of the group t values for the CA-CC contrast in Fig. 2A and B. (E) Histogram of the group t values for the CA-CC contrast in Fig. 2C. hit rate was also higher when only the first view (vs. the reference study) was considered. These matrices suggest that on average, the maps from different acquisition sequences have a high concordance.
(b) Region-wise analyses: Next, we labelled the brain regions using AAL (thresholded at q fdr <0.05) to explore whether different sequences would lead to different conclusions about the brain network recruited by our task (Table 4). What is apparent from the table is that the core regions typically conceived of as part of the action observation network are found to be activated whatever parameters one uses (e.g. Postcentral, Precentral, Rolandic, and Supramarginal Gyrus). Such perfect agreement on the fact that a region is activated was true for 14 out of 62 brain regions in at least one hemisphere. There is also broad agreement on the fact that many regions are not activated (i.e. activated to less than 5%) in any of the sequences. This was true for 39 of the 62 brain regions in at least one hemisphere. In total, for 53 out of the tested 62 brain regions, scientists would thus arrive at the same conclusion whatever MB choice they would use. There was however a number of regions on which the different parameter choices did show substantial disagreement (highlighted in blue). It is notable, that regions of disagreement are often subcortical, including the cerebellum and basal ganglia.
While this provides a qualitative overview, to get a quantitative summary measure of how similar a conclusion would be drawn from such activation tables using different sequences, we examined ROC measures and correlation measured across the regions obtained using different sequences ( Table 5). The ROC analysis revealed that the concordance has an area under the curve of at least 0.87 for the worst of the comparisons, and is very close to perfect (0.99) when comparing the highest acceleration (MB4S2 0:63 2:7iso ) against the gold standard sequence without MB (MB1S2 2:00 3x3x3:3 ) that had voxels with 50% larger volume. Correlation measures on un-thresholded tables (in terms of percent activated) was at least 0.77 and on average 0.88, further supporting how similar a conclusion would be reached. Binarizing the table in those regions activated (i.e. with at least 5% of the voxels in that region showing significant activation) or not, reduced the correlation to 0.74 on average.
Taken together these findings suggest that on a group level, with the exception of a few regions, one would reach very similar conclusions about the cortical networks that are involved in action observation, regardless of which acquisition sequence is used, with the highest acceleration allowing a voxel-volume reduction of 50% without changing conclusions.
(c) Between and within subject variances: The measurements with MB acceleration are noisier due to the imperfect separation of the simultaneously acquired voxels (Golestani et al., 2017). However, it also allows acquisition of many more samples which are thought to increase the temporal SNR per time unit (see supplementary analysis 1, 2). As the noise levels have a direct impact on the t-value estimation, we looked at the between and within subject variances to see if the marginally higher t-values in sequences with MB4 can be explained by systematic differences in the variances. One might expect that MB acceleration has benefits on within subject variance by providing more samples, but not on between subject variance, because we did not measure more participants at higher MB. We therefore performed a variance analysis as explained in Kirilina et al. (2016). The maps of the intra-subject variance (σ 2 ) were calculated using the following equation. (1) Here, N ¼ number of subjects, c ¼ contrast of interest (CA-CC), X (1)is the generalized inverse of the design matrix of the fixed effect analysis and C (1) is the residual variance maps (ResMS) generated while estimating the fixed effect ANOVA. Superscripted T signifies the transposition of the matrix. Since the random effect variance maps are a combination of inter subject and intra subject variances, inter-subject variance maps (Σ 2 ) were calculated by subtracting σ 2 from the ResMS maps estimated during the random effect second level ANOVA. Five voxels with peak activities in the CA-CC contrast of the reference study (Fig. 2C) were selected as centers of 6 mm radius spheres (ROI1: À50, À24, 34 area PF of the inferior parietal lobe; ROI2: 40, À32, 44 Area 2 of the primary somatosensory cortex; ROI3: 30, À12, 60 superior frontal gyrus; ROI4: À40, À6, 4 insula lobe; ROI5: 38, À4, 10 area OP3 of the secondary somatosensory cortex; See inset in Fig. 3 for the location of each ROI). Fig. 3 presents the mean t-values for the fixed and random effect models, and within-and between-subject variances. As can be seen, some ROIs were dominated by within-(ROI4, 5) and some by betweensubject (ROI1, 2) variance, but in all cases, the MB4S1:5 0:70 2:7iso sequence outperforms the MB1S2 2:45 2:7iso sequence in terms of t-values. Also, within subject variance was often but not always reduced in MB4S1:5 0:70 2:7iso vs. MB1S2 2:45 2:7iso sequences. There was thus no systematic relationship between benefits of MB against the dominant source of variance.
The inter-subject variance is a positively defined value. However, its estimation is obtained as a difference between the total variance and intra-subject variance in the second level analysis. In cases when intersubject variance is much smaller than the intra-subject contribution, the difference might by chance become negative as can be seen in the case of ROI 5 and 6 where intra-subject variance dominates.
(d) GLM on first third of the data: With a specific total acquisition time, higher MB allows the acquisition of up to 4 times the number of volumes acquired with no MB. We thus explored whether the additional volumes per unit time at high MB can help when experimental time is limited. To test this, we repeated the subject level analysis using only the first third of each session for all five sequences. Group level t-maps were calculated using the CA-CC contrast image of each sequence separately and the resulting group t-map was correlated to the group t-map of the same contrast from the reference study. A similar correlation was also computed between the t-maps of the full GLM (see section 3.2) and the reference study.
As can be seen in the t-maps of the truncated dataset (one-third of the total scan) presented in Fig. 4A, many nodes that are a part of the action observation network can already be detected (see Fig. 2A-C for comparison). Moreover, looking at the histogram of these images, the overall performance pattern is similar to the results of the full dataset: higher acceleration result in higher t-values (Fig. 4B). MB1S2 2:00 3x3x3:3 which have bigger voxel size also shows higher t-values than the sequence with same acceleration but smaller voxel size (MB1S2 2:45 2:7iso ). Looking at the correlation levels, we observed two phenomena. First, all of the truncated dataset were worse than the full dataset of the same sequence (Fig. 4C), suggesting that longer tasks are beneficial even at high multiband. Second, even high multiband acceleration failed to perform quite as well as the full dataset without MB acceleration (Fig. 4C comparing the solid blue point against the dashed green, cyan or red). Given that the dashed red point (i.e. 1/3 at MB4S2 0:63 2:7iso ) actually has 1.33x as many functional volumes as the solid blue dot (MB1S2 2:45 2:7iso ), this highlights that the number of dynamic volumes alone does not determine the reliability of activations. In summary, multiband is not a surrogate for total acquisition time.

GLM for resting state networks
Multiband has so far been mainly used and validated for resting state studies, where it has been suggested to improve the ability to detect RSNs. To explore whether we can replicate that finding in our data, we performed a pseudo resting state analysis with our data. The detailed description of the method can be found in section 2.6. Fig. 5 shows a representative network (rsn06 in Smith et al., 2009) separately for each sequence. Overall, at the threshold of q fdr <0.05, the Table 4 AAL labelling of the brain regions activated for the CA-CC contrast at q < 0.05 level. Red colours are used to colour-code the level of activation in a region in terms of percentage activated. Blue cells are the brain region where there is a disagreement between different sequences, with some sequences showing more and some less than 5% activation (in both hemispheres). Note that for the vermis we show the joint activity of right and left hemisphere on the left side of the table. Values are % brain area active. Minimum red intensity is 5 and maximum is 50. network looks very similar for all the five sequences. Careful visual examination shows a small decrease in the cluster sizes as the TR increases (see white arrows and white boxes in Fig. 5 for examples).
To explore this phenomenon in a way that is less threshold dependent, and separately for each rsn, we plotted normalized cumulative histograms in Fig. 7. Using q fdr <0.05 for each rsn for MB1S2 2:45 2:7iso , we counted the number of significant voxels, N ref . For any higher point along the x axis, we plot N(t ! x)/N ref . Accordingly, for MB1S2 2:45 2:7iso , the first point along the x axis has a value of 1, which then decreases as a higher threshold is used while moving to the right. Values of other x-coordinates of sequences can then be directly understood as the number of voxels surviving that threshold relative to N ref . Plots framed in the orange outline show the component for which MB2S2 1:22 2:7iso performs best, and plots framed in red are the ones where sequences with no MB show the highest number of supra-threshold voxels. All the others components showed that MB4S1:5 0:70 2:7iso or MB4S2 0:63 2:7iso indeed identify more voxels across a wide range of thresholds. The two components (in red) which do Table 5 ROC and correlations between the levels of activation labelled using AAL atlas. No threshold in Yellow and with 5% threshold in green.

Fig. 3.
Mean parameter values for the CA-CC contrast from five ROIs (radius 6 mm) centered on the first five voxels showing highest t values (at correction q fdr <0.05) for the CA-CC contrast in the reference study. Figure A and B show the mean t-values from the random effect model and the fixed effect model, respectively. Figure C and D show the between-subject variance and the within-subject variance in these ROIs, respectively. Inset shows the location of the five ROIs. Fig. 4. (A) Group maps showing the task correlated activity detected using the task GLM predictors, but using only the first one third of the total acquisition per sequence. Overlaid on the mean grey matter segment of the group. q fdr <0.05, cluster threshold 50 voxels. (B) Histogram of the t values for the CA-CC contrast for the one third of the total acquisition per sequence. (C) Correlation between t-maps of the CA-CC contrast per sequence and t-maps of the same contrast from the 31 subjects of the reference study. Fig. 5. Group maps per acquired sequence showing a representative RSN: network number 6, as described in Smith et al. (2009). White arrows and rectangles evidence areas with visually different cluster extension across different sequences. Maps overlaid on the mean grey matter segment of the group, and thresholded at q fdr <0.05 with a minimum cluster size of 50 voxels. not show an advantage of MB are classified by Smith et al. (2009) as belonging either to deep brain regions, or are artefactual components. However, since 17 out of 20 components show that using MB4 would yield higher t-values, this analysis recommends the use of MB 4 while studying RSNs at the group level, if the aim is to be inclusive. One unexpected finding of this analysis was that the MB1S2 2:00 3x3x3:3 , which was excellent for task based fMRI (performing better than MB1S2 2:45 2:7iso and similar to the MB4S1:5 0:70 2:7iso and MB4S2 0:63 2:7iso ), was one of the least sensitive sequence for the RSNs. The potential reason for this curious finding is discussed in the discussion section.

Discussion
We tested if acquisition with a shorter TR and therefore higher sampling rateas afforded by MBmay improve our power to identify the neural substrates of cognitive functions. Based on our analysis of taskbased fMRI data, for group level analyses, the use of MB acceleration is beneficial to improve voxel-wise statistics. In our Philips 2:7iso sequence for resting state analysis. The x-axis represents the voxel wise t-values, and the y-axis the number of voxels surviving that threshold relative to the number of voxels surviving t ! 2 at MB1S2 2:45 2:7iso . MB4 sequences are most inclusive, except for the plot framed in orange and red, where MB2S2 1:22 2:7iso and MB1S2 2:45 2:7iso sequences include the largest number of voxels.
implementation, sequences with MB4 acceleration (both S1.5 and S2) and 2.7 mmisotropic voxels, showed the highest t-statistics in our taskbased fMRI analysis. Interestingly, this was the case despite the MB4 sequence generating data with less effective smoothing than the sequences with lower MB acceleration. This is surprising, because more smoothing normally benefits t-values and reduces the correction for multiple comparisons, and would thus be expected to put our MB4 sequences at a disadvantage.
Other than the effect of MB acceleration, there were three main factors that influenced the group statistics. First, the impact of voxel size is well known (Robinson et al., 2008) and with this data we show that while for tSNR, MB2 is enough to compensate for 50% reduction in voxel volume (see Supplementary analysis, tSNR section, Supplementary Fig. 3), when it comes to the group level t-values, MB4 had t-values that were similar to the t-values seen with MB1 sequence with 3 Â 3 Â3.3 mm 3 voxel size (Fig. 2B,  E). Furthermore, higher t-values were observed when the group sizes were bigger (Turner et al., 2018) and the subjects viewed the stimuli only once (Larsson and Smith, 2011). While these are indirect observations from the data presented here, it can be clearly seen from Fig. 2 C-E that MB does not compensate for factors such as larger sample sizes and repetition suppression effect. While these contributions of voxel size, group size and multiple view is not a novel discovery, the use of the higher-powered reference study helps visualize the magnitude of t-value gains with MB acceleration in a new light. It suggests that the effect sizes of the gains are not big enough to nullify the effects of these factors.
It should be noted that all the sequences used in the current acquisition were able to identify the core regions that have been shown to be a part of the action observation network (Abdelgabar et al., 2018;Gazzola and Keysers, 2009). ROC analyses suggest a strong concordance between the outcomes of different sequences (Tables 3 and 5). So if one selects any of the sequences presented here, they would reach more or less similar conclusion about the brain regions implicated in the cognition of interest. Looking at the areas where we found disagreement between the sequences, we notice that they particularly include subcortical regions including the cerebellum and basal ganglia, which are detected more often in sequences with MB acceleration or with the sequences with bigger voxel sizes (see blue box in Table 4). This finding further supports the use of MB acceleration for detecting task correlated BOLD particularly if one is interested in these regions. Moreover, using dual regression analysis we show similar benefits of MB acceleration on  supporting the evidence present in the literature that tests MB acquisition using resting state fMRI (Feinberg et al., 2010;Griffanti et al., 2014;X.-H. Liao et al., 2013).
We looked at inter and intra-subject variability in an attempt to shed light on how MB benefits group analyses. In our previous study where we tested standard single-shot 2D echo planar imaging (EPI) to three advanced EPI sequences, i.e., 2D multi-echo EPI, 3D high resolution EPI and 3D dual-echo fast EPI, inter-subject variability had a major impact in determining the sensitivity of the group-level analyses (Kirilina et al., 2016) in almost all brain regions except of prefrontal cortex, thereby limiting the potential for improving results by improving the measurement of each subject. However, in the current study, we found that the contribution of intra-and inter-subject variance to the random effect group level analysis was comparable. Therefore the dominant form of variance varies across regions, making the distinction of between-and within-subject variance less fruitful in understanding how MB benefits the group analysis (Fig. 3).
Looking at other image statistics such as G-factor and tSNR, we see that g-factor values increase with higher MB suggesting increase in noise due to poorer separation of the aliased voxels ( Supplementary Fig. 1). However, correcting for the number of acquired volumes, the predicted tSNR and the measured effective tSNR (Supplementary Table 2, Fig. 2) both improve with higher MB factor. Since the variance shows no systematic differences (Fig. 3) and the noise measures increase with higher MB (Supplementary Analysis 1, 3), increasing the sampling rate is the most likely explanation for the better t-statistics with higher MB, and appears to have outweighed the disadvantages of MB (e.g. reduction in raw tSNR and increased autocorrelation of samples).
To explore whether the additional volumes acquired with higher MB can compensate for performing fewer repetitions of the task, we truncated the data and examined the first third of each experiment. With one third of the MB4 acquisition containing as many samples as the full MB1 acquisition, one might have expected the former to generate maps quite similar to the latter. However, we see that higher multiband did not show benefits in terms of localizing activations more reliably when fewer repetitions of a condition are available. This is contrary to what has been claimed for rs-fMRI (X.-H. Liao et al., 2013). This inability of the additional data points obtained using multiband to replace those obtained by a longer task is similar to the recent findings showing that temporal down sampling by randomly removing up to 50% of time points has little effects on BOLD reliability, while truncating the datasets is associated with decreased reliability (Shah et al., 2016), and may reflect the fact that temporal autocorrelation in fMRI data may make adjacent data-points acquired with short TR comparatively redundant.
Taking these factors together, our data suggests that on our Philips 3 T scanner, and implementation of multiband (based on scanner software release version R5.4), faster scanning offers modest but significant benefits for group-level voxel-wise task-correlated statistics and can be used as a better alternative to the single-band EPI, if other variables such as voxel size, scan duration and sample size are identical. Using MB factor of 4 with in plane SENSE factor of 1.5 or 2 and spatial resolution of 2.7 mm isotropic seems superior to the other sequences used here. However, when deciding whether to invest into multiband technology for cognitive neuroscience paradigms, our analyses suggests that for task-based studies, results similar to MB acceleration of factor 4 can be achieved by increasing the voxel size. This however might not be true for resting state studies. In our resting state analysis, sequence with MB1 and 3 Â 3Â3.3 mm 3 voxel size showed the lowest t-values. The reason for the observed better connectivity estimates for smaller voxel sizes at MB1 is unclear. One potential reason could be a reduced intra-voxel dephasing due to reduced field inhomogeneity for smaller voxel sizes.
One important thing to note here is that our study focuses on comparing group-level statistics. This was chosen, because most cognitive neuroscience studies draw their conclusions from such group studies with~20 participants. An interesting additional question is whether MB would alter the t-values obtained at the single subject level. This might be particularly interesting when studying rare patients, for instance, or performing pre-surgical scanning. Even though each of our subjects was scanned with all sequences, we found it difficult to answer this seemingly simple question. This is due to two reasons. First, the difference in TR across sequences substantially changes the white-matter/grey-matter contrast ( Supplementary Figs. 6 and 7). As a result, the global mean normalization that is part of the traditional SPM pipeline applies different normalizing factors to images from different TR, and hence makes the beta maps non-comparable. Second, sequences with shorter TR acquire more samples and their single subject t-values thus need to be compared against different critical t-values due to the difference in degrees of freedom. This makes comparing the single subject t-values across sequences meaningless. We tried to compensate for this effect by z-transforming the p values (invnorm(p)) or comparing effect sizes (t/sqrt(df)), but found that doing so leads to counterintuitive results that suggest potential issues with the calculation of degrees of freedom. For a detailed analysis on this topic using the same data presented here, see (Bhandari et al., 2019).
To assess the task correlated BOLD, we convolved the boxcar with the canonical haemodynamic function. It assumes that the BOLD response peaks approximately 5 s after stimulation, and is followed by an undershoot. While this may be true in the primary sensory areas such as VI and M1, this pattern may vary in other brain regions and between subjects. Several alternative measures have been proposed for modelling the HRF (Badillo et al., 2013;Kumar and Penny, 2014;Quir os et al., 2010). A systematic study of some of the commonly used algorithms and alternatives for modelling the haemodynamic response concluded that it is surprisingly difficult to accurately recover true task-evoked changes in BOLD signal and that there are substantial differences among models in terms of power, bias and parameter confusability (Lindquist et al., 2009). One should bear in mind the settings used here while interpreting the findings.
To briefly elaborate on this issue with our own data, we re-analysed our data using the additional time and dispersion derivatives that allow us to capture differences in the latency and the duration of the peak response, respectively. The results (Supplementary Analysis 6) show that the network recruited by the task remains largely unchanged. Moreover, the histogram of the t-values shows that overall, MB4 accelerated sequences still performs better than the sequences with no or lower MB accelerated sequences as seen while modelling only with the canonical HRF.
Our study has a number of strengths and limitations that should be considered. Limitations: Because we only used a block design task, our study cannot address whether event-related designs may benefit more from increased acquisition speed, and this should be investigated in future studies. Theoretically, if the contribution of the white noise (dependent on the subject and the MRI system electronics) in the data is not substantial, then one may expect a higher gain in GLM activation with increasing task frequency (characteristic of event related designs) with shorter TRs (Chen et al., 2019). Several studies have investigated the effect of MB acceleration on event-related designs (Demetriou et al., 2018;McDowell and Carmichael, 2018;Sahib et al., 2018Sahib et al., , 2016. Together these studies showed that there might be benefits of using MB acceleration for task based paradigms while analysing functional connectivity and task-based BOLD contrasts. However, these effects might be more nuanced and may depend on more factors than just the task design, as Demetriou et al. (2018) specifically showed that while the ROI analysis were inconsistent, the benefits of using MB became apparent only when using multivariate pattern analysis. Also, whether cognitive neuroscience questions which can use MB acquisition without decreasing the TR and thus using the silent periods for presenting auditory stimuli (De Martino et al., 2015) or performing simultaneous EEG recordings (Uji et al., 2018) or other electrical recordings that may benefit from reduced interference from the radio-frequency pulses from fMRI, have not been tested here. Next, higher sampling with MB acceleration allows acquisition of high frequency components. Here we do not explore whether these higher frequencies, which are absent in the HRF convolved predictor, contain task-induced neural information. Moreover, shorter TRs offer advantages for spectral de-aliasing (Tong and Frederick, 2014a;Tong et al., 2014). Here we decided not to perform low-pass filtering because studies have shown the presence of BOLD like components in high frequency bands (Boubela et al., 2013). Here we employ 16 regressors including the six motion and ten principal components from CSF and WM (CompCorr, Behzadi et al., 2007) to de-noise the data. However, the effect of other noise cleaning procedures that have been proposed for cleaning MB data such as FIX (Boyacio glu et al., 2015), have not been tested on this data and may be considered in the future exploration using this data. We performed our analysis using the algorithms implemented in SPM that are most commonly used while analysing the data in cognitive and social neurosciences. For instance to normalize the data, the unified segmentation was used. Alternative algorithms implemented in other software, such as AIR, ANIMAL, ART, Diffeomorphic Demons, FNIRT, IRTK, JRD-fluid, ROMEO, SICLE, SyN, DARTEL, or using surfaces (Coalson et al., 2018;Klein et al., 2009) might provide alternate ways to perform normalization. Future studies could use the data we make available at https://osf.io/ncm25/?view_only¼1d15a45aa3444 a7caed90f3842d9f4e5 to examine the impact of these methods on the effect of MB in our dataset. An additional limitation of our study is that for the MB2 condition, we had 22 instead of 23 participants. Because t-values scale with the square route of the degrees of freedom, this means that we have underestimated the t-values for the MB2 by about 2%. The differences between MB2 and MB4 we report here are however larger than 2%, and we feel that our data therefore nevertheless supports benefits of MB4 over MB2, and performing all analysis on N ¼ 22 participants lead to similar results (See Supplementary Fig. 8). Finally, we report that the image grey-white CNR decreases as the TR becomes shorter ( Supplementary Figs. 6 and 7). In this data where the volunteers were healthy adults and there was minimal subject movement, we did not encounter any issues with the realignment steps of pre-processing pipeline. However, in patient population where motion might be an issue, an additional loss of CNR may result in sub-optimal realignment, thereby affecting the final statistics. This concept can be explored using controlled head movements in the scanner and may be addressed in future studies. Strengths: While most of the previous studies looked the effect of MB on the summary statistics from the subject level analyses (Demetriou et al., 2018;Kiss et al., 2018;Sahib et al., 2016;Todd et al., 2016), the current study is one of the only 2 studies that look at the effect of MB acceleration on voxel-wise group-level task-related statistics (Boyacio glu et al., 2015). While the subject-level statics are good indicators of the performance of MB, one should be careful while interpreting the findings coming from sequences that result in variable image CNR as well as have different degrees of freedom per sequence (Bhandari et al., 2019). While most previous studies looked at the effect of multiband on very simple sensory tasks and lenient contrasts, using very small sample sizes (median N ¼ 10), our study used a task and sample size more representative of contemporary cognitive neuroscience studies to have the power to detect even moderate benefits of multiband and ensure that our results are reproducible and representative. That our group results are so similar at different MB and so similar to an independent study (Fig. 2) provides evidence for the robustness of our cognitive neuroscience task and results. Using the same participants on the same scanner at different multiband levels ensures that differences in multiband performance are not the result of differences across subject pools or scanners.
Finally, there may be additional considerations for a potential user. For example, the head coil must have a sufficient number of coil elements, with a minimum of 32 elements as used in this study. Commercial solutions require appropriate licensing that needs to be acquired. The size of the data increases linearly with the MB factor, requiring additional storage space and processing timings. Moreover, different noise sources may affect the data differently because of the reduced signal to-noise ratio (SNR) per time frame due to reduced longitudinal magnetization recovery (for a detailed analysis of this topic see J. E. Chen et al., 2019). For instance, the impact of head movement on respiratory artifacts is much more pronounced with short TRs than longer TRs. Therefore, with special populations where head movement is an issue, use of shorter TRs may not be ideal or would require implementation of special preprocessing steps (J. E. Chen et al., 2019).
In summary, we can thus recommend the use of MB4 on a Philips scanner when aiming to perform group-level analyses using cognitive block design fMRI tasks using voxel sizes in the range of cortical thickness (e.g. 2.7 mm isotropic). While results will not be dramatically changed by the use of multiband, our results suggest that MB will bring a moderate but significant benefit.

Declaration of competing interest
The authors report that M.W.A. Caan is shareholder of Nicolab Inc.