Investigating the field-dependence of the Davis model: Calibrated fMRI at 1.5, 3 and 7T

Gas calibrated fMRI in its most common form uses hypercapnia in conjunction with the Davis model to quantify relative changes in the cerebral rate of oxygen consumption (CMRO2) in response to a functional stimulus. It is most commonly carried out at 3 T but, as 7 T research scanners are becoming more widespread and the majority of clinical scanners are still 1.5 T systems, it is important to investigate whether the model used remains accurate across this range of field strengths. Ten subjects were scanned at 1.5, 3 and 7 T whilst performing a bilateral finger-tapping task as part of a calibrated fMRI protocol, and the results were compared to a detailed signal model. Simulations predicted an increase in value and variation in the calibration parameter M with field strength. Two methods of defining experimental regions of interest (ROIs) were investigated, based on (a) BOLD signal and (b) BOLD responses within grey matter only. M values from the latter ROI were in closer agreement with theoretical predictions; however, reassuringly, ROI choice had less impact on CMRO2 than on M estimates. Relative changes in CMRO2 during motor tasks at 3 and 7 T were in good agreement but were over-estimated at 1.5 T as a result of the lower signal to noise ratio. This result is encouraging for future studies at 7 T, but also highlights the impact of imaging and analysis choices (such as ASL sequence and ROI definition) on the calibration parameter M and on the calculation of CMRO2.


Introduction
Gas calibrated functional magnetic resonance imaging (fMRI) has emerged as a promising tool to non-invasively measure stimulus evoked changes in the cerebral metabolic rate of oxygen consumption (CMRO 2 ) (Davis et al., 1998;Hoge, 2012). Not only is this more directly physiologically relevant than measuring only the blood oxygen leveldependent (BOLD) signal, but it has also been shown to be more consistent between subjects and between scanning sessions (Krieger et al., 2014).
Calibrated fMRI is most commonly performed at 3 T, but with the emergence of 7 T research systems there is interest in translating the method to higher field strengths. It is clear that as a physiological parameter CMRO 2 should not be affected by the field strength at which it is measured. However, calibrated fMRI relies on a simple model of the BOLD signal known as the Davis model (Davis et al., 1998) and it is unclear how translation to different field strengths affects its accuracy. Due to the continuing clinical dominance of 1.5 T scanners, we were also interested in revisiting the use of this lower field strength, enabling a three-way comparison to be performed. Simulations of the calibrated fMRI experiment suggest that the method is feasible at these field strengths, so long as the Davis model parameters are adjusted to reflect the altered BOLD sensitivity Griffeth et al., 2013). It has also been predicted that systematic error may be increased at 7 T compared with lower field strengths . Experimental confirmation of these findings has so far not been performed.
The aim of this study was to carry out a direct comparison of the hypercapnia calibrated fMRI method at 1.5, 3 and 7 T, in order to test the level of agreement in the measured estimates of relative CMRO 2 across field strengths. Detailed simulations of the BOLD signal were performed to predict general trends in the calibration parameter M across field strengths, and to provide a basis for interpreting the experimental results. Implicitly this study is also an investigation into the robustness of the Davis model itself, which was originally developed for use at 1.5 T, and how well it translates across field strengths.

Theory
The Davis model (Eq. (1)) describes the extravascular BOLD signal as a function of changes in cerebral blood flow (CBF) and CMRO 2 (Davis et al., 1998;Hoge et al., 1999). Subscript 0 variables represent baseline values. Changes in CBF are related to changes in cerebral blood volume (CBV) via the Grubb exponent α (Grubb et al., 1974). This physiological interpretation of α was retained and was therefore set independent of field strength as 0.2 Pike, 2009, 2010). The BOLD signal is related to underlying changes in susceptibility via the exponent β, which varies with vessel size and also with field strength.
In the Davis model the calibration parameter M is defined as TE × A × CBV 0 × [dHb] v0 β , where TE is the BOLD-weighted echo time, A is a proportionality constant which depends on field strength and tissue properties, CBV 0 is the baseline CBV and [dHb] v0 is the baseline venous deoxyhemoglobin concentration of the blood (Hoge et al., 1999). Therefore, the calibration parameter is expected to vary with inter-subject or inter-session variations in physiology and tissue type. Hence it is necessary to calculate M during each scan session. This is most commonly achieved by applying an isometabolic stimulus, such as mild hypercapnia, during which CMRO 2 is assumed to remain constant (Hoge, 2012). Eq.
(1) can then be rearranged to estimate M given measurements of the changes in CBF and BOLD signal acquired during the hypercapnic challenge.
The definition of M also describes a dependence on the BOLD weighted TE. Since many different values of TE have been used throughout the literature it is often instructive to be able to scale the calibration parameter to the optimal BOLD echo time for each field strength, which in this study were considered to be 50/35/25 ms at 1.5/3/7 T, respectively (van der Zwaag et al., 2009). To do this we assume that A has no dependence on TE, and multiply the experimental M by the ratio (optimal TE/actual TE).

BOLD signal simulations
A detailed model of the BOLD signal (Griffeth and Buxton, 2011) was used to simulate the hypercapnia calibration experiment, with the aim of predicting general trends in M across field strengths and to aid in interpreting experimental results. The original detailed signal model was designed to simulate the BOLD signal at 3 T. In order to extend the model to 1.5 and 7 T a few modifications were made, which are detailed below. In brief, the basic model consists of a volume-weighted sum of arterial (a), capillary (c) and venous (v) intravascular compartments and a single extravascular (e) compartment.
The signal contributions were summed according to Eq. (2) under baseline and hypercapnic conditions and combined to simulate the BOLD signal change (see appendix of Griffeth and Buxton (2011)). Extravascular signals (S e ) were modelled using the results of numerical simulations for two vessel scales to reflect the different signal characteristics of capillaries (β = 2) compared with arteries and veins (β = 1) (Ogawa et al., 1993). These signals were added to the baseline R 2 ⁎ of the extravascular tissue space (R 2E *(0)). Intravascular signal contributions (S a , S c , S v ) have been described by empirical measurements of the blood transverse relaxation rate R 2 ⁎ as a function of oxygenation and haematocrit (Zhao et al., 2007). Models of both extravascular and intravascular signal enabled the effects of blood oxygenation and haematocrit to be simulated. Blood was distributed to each of the compartments as a fraction (Ω) of the total CBV (V T ), e.g. V a = Ω a V T . Relative volume fractions were set as Ω a = 0.2, Ω c = 0.4, and Ω v = 0.4. However, the model described thus far is only applicable at 3 T.
The following modifications were made to enable simulations to be performed at all three field strengths used in this study. The existing empirical measurements of blood R 2 ⁎ were replaced by the results of a multi-field relaxometry experiment (Blockley et al., 2008). Due to the limited 7 T relaxometry literature it was only possible to generalise R 2 * for oxygenation and not haematocrit. The quadratic relationship of R 2 ⁎ on oxygenation was retained, i.e. R 2 ⁎ = C 1 + C 2 × (1 − Y) 2 , where C 1 and C 2 are field strength specific constants and Y is the fractional blood oxygenation. Therefore, unlike the previous model where R 2 * was modelled as a function of oxygenation and haematocrit, the intravascular signal dependence on oxygenation in the present study was described for a fixed haematocrit (Hct = 0.44) at all field strengths. Whilst this limits the ability of the model to test the effect of intersubject physiological variability, it enables general trends in M to be investigated across a range of field strengths. Baseline extravascular tissue R 2 ⁎ values (R 2E *(0)) were assigned field-specific values based on relaxometric measurements made at each field strength (van der Zwaag et al., 2009). BOLD echo times were set to match the experimental acquisition. These parameters are summarised in Table 1.
Simulations were initially performed for the baseline physiological state of a hypothetical average healthy individual. Haematocrit was set to be 0.44 (McPhee and Hammer, 2009), oxygen extraction fraction (OEF) was set at 0.4 (Hatazawa et al., 1995) and total CBV for grey matter was set as 0.05 (Perlmutter et al., 1987). Hypercapnia calibration was simulated as a 30% increase in CBF with no change in CMRO 2 . The BOLD signal response to this challenge was simulated using the detailed BOLD signal model. The BOLD and CBF changes were then combined with Eq. (1) to calculate M at each field strength using an α of 0.2 Pike, 2009, 2010) and β values of 1.5/1.3/1.0 at 1.5/3/7 T respectively, consistent with the literature (Bulte et al., 2009;Davis et al., 1998;Driver et al., 2012). Estimates of M are dependent on the baseline physiological state of the individual and it is therefore expected to vary across the population. To test how this variability is affected by different field strengths, M was estimated for BOLD signals generated with several values for baseline CBV and OEF. Perlmutter et al. (1987) recorded a mean CBV of 0.05 ± 0.01. Extending the range to 2 standard deviations from the mean, limits of 0.03 and 0.07 were also simulated. Similarly, resting OEF values of 0.3, 0.4 and 0.5 were investigated (Hatazawa et al., 1995) whilst keeping CBV constant at 0.05. Simulated M values were linearly scaled to the optimal TE values for each field strength for comparison purposes.

MRI parameters
Subjects were scanned on 1.5 T Avanto, 3 T Verio and 7 T systems (Siemens Healthcare, Erlangen, Germany) with 12-channel (1.5 T) and 32-channel (3 and 7 T) head coils. Scans were carried out on separate days to minimize the effects of fatigue and habituation. Because specific absorption rate (SAR) was anticipated to be a limiting factor on sequence design at 7 T, a pulsed (rather than pseudo-continuous) ASL sequence was implemented to measure CBF (Alsop et al., 2015). Flowsensitive alternating inversion recovery (FAIR) (Kim, 1995) was chosen to minimize the effects of B 1 inhomogeneity (Gardener et al., 2009), and the QUIPSS II scheme (Wong et al., 1998) was used to improve quantification of perfusion.
A single echo at 17 ms provided sufficient signal to noise ratio (SNR) for both CBF and BOLD analysis at 3 and 7 T; a dual echo version of the same sequence was implemented at 1.5 T with echoes at 17 and 50 ms to ensure sufficient BOLD contrast. All other imaging parameters were kept constant across scanners. Six slices were acquired (limited by SAR at 7 T) with an echo-planar imaging (EPI) readout and were placed axially to cover the motor cortex. For consistency across scanners, no acceleration methods (such as parallel imaging or partial Fourier) were used. Bandwidth was set to 3004 Hz/Px, inversion times were TI 1 = 700 and TI 2 = 1800 ms, and repetition time was 3 s. Note that in pulsed ASL the bolus duration is fixed by TI 1 and the effective post-labelling delay (PLD) is given by TI 2 -TI 1 (Alsop et al., 2015), so 1100 ms in this experiment. It is common to use shorter PLDs in pulsed ASL compared to (pseudo-)continuous implementations to compensate for the reduced SNR inherent in pulsed ASL. Although this may result in incomplete delivery of tagged blood to the imaging slices, it is also more suited to gascalibrated experiments, where arterial arrival times are shortened during hypercapnia. Voxel size was 4.1 × 4.1 × 5.0 mm 3 with a 1 mm slice gap in order that SNR of ASL data at 1.5 T did not become prohibitively low. FAIR labelling used a 60 mm selective and a 260 mm nonselective slab at all field strengths.

Functional and respiratory tasks
A bilateral finger tapping motor task was chosen to easily allow a consistent implementation across scanner suites. Subjects were given audio cues over the intercom systems and were instructed to perform 4 blocks of self-paced finger tapping (48 s ON, 48 s OFF). This was followed by 2 blocks of hypercapnia (3 min duration, each followed by 2 min of air), as shown in Fig. 1. Subjects were instructed to perform the motor task at a fast but comfortable rate, and these instructions were repeated prior to each scan session. Performance of the task was monitored from the control room and all subjects were observed to fully cooperate.
Gas delivery and sampling was achieved through a nasal cannula (dual Nare, Flexicare, Mountain Ash, UK) in conjunction with a CO 2 gas analyzer (CO2 100C Biopac Systems, Goleta, CA, USA). A 10% CO 2 / 21% O 2 gas mixture with balance nitrogen was delivered, which mixed with room air at a ratio of approximately 1:1, resulting in an~5% CO 2 stimulus. In order to minimize subject awareness of the hypercapnia stimulus, medical air (21% O 2 with balance nitrogen) was delivered during periods of air breathing, including during the motor tasks.

Analysis
Data were analyzed using the FMRIB Software Library (FSL) and the fMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2009). Preprocessing consisted of motion correction (Jenkinson et al., 2002), brain extraction (Smith, 2002) and fieldmap correction (Jenkinson et al., 2012). Highpass filters of 10 and 300 s were applied for ASL and BOLD respectively, and no spatial smoothing was applied. The following contrasts were input as part of a general linear model within FEAT: (1) tagcontrol signal, modelled as a square on/off shape, which describes baseline perfusion; (2) BOLD response to motor tasks A + B, with a haemodynamic response function modelled as a gamma convolution with mean lag 6 s, standard deviation 3 s; (3) BOLD response to motor tasks C + D, as above; (4) BOLD response to hypercapnia, modelled with a gamma convolution with mean lag 42 s, standard deviation 30 s to account for the delay between switching to the CO 2 mixture and the cerebrovascular response to increased arterial partial pressure of CO 2 in the brain; (5-7) blood flow responses to motor tasks A + B, C + D and hypercapnia, modelled as interactions between contrasts (1) and the three BOLD responses, respectively. In this context, the term "interaction" refers to the fitting of the product of two existing contrasts, and may be modelled in FEAT simply by checking the relevant tick box. To avoid circular analysis (Kriegeskorte et al., 2009), data from motor tasks A + B were used to create a region of interest (ROI), and only data acquired during motor tasks C + D were analyzed to quantify response to motor stimuli (see Fig. 1). Other combinations of block-wise analysis were investigated (e.g. A + D for ROI definition, B + C for data analysis), and this choice was found to have no substantial impact on the results.
Two methods were used to create motor ROIs. First, a "BOLD only" ROI was defined as the 40% of voxels with the highest uncorrected voxel-wise z-stats in response to motor tasks (A + B only, see Fig. 1). This cutoff was used in place of setting a z-stat threshold directly as it was more consistent between subjects and particularly across field strengths. Similarly, a grey matter (GM) mask was created from the 40% of voxels with the most significant baseline tag-control signal difference. Multiplication of the BOLD ROI and GM mask produced a "BOLD/GM" motor ROI. The creation of "CBF only" and "BOLD/CBF overlap" ROIs was not possible due to the low SNR of ASL data, particularly at 1.5 T.
Further analysis was performed on coefficient of parameter estimate (COPE) outputs of the FEAT analysis in MATLAB (MathWorks, Natick, MA, USA). BOLD signal responses were normalized with respect to the mean signal over the entire time course, and CBF responses were normalized to the baseline perfusion signal, as output by the tag-control COPE. Voxels with a BOLD response to either motor or hypercapnia stimuli b 0 or N 0.10, or a CBF response b1 (no change) or N3 (+200% relative to baseline), were assumed to be noise or to contain significant fractions of white matter or cerebrospinal fluid, and were excluded from further analysis. Mean values within the remaining voxels for BOLD and CBF responses to motor tasks (C + D) and to hypercapnia were used to calculate M and relative CMRO 2 on a per subject basis, according to Eq. (1), with the assumption that mild hypercapnia does not alter CMRO 2 . The Grubb exponent α was set at 0.2 for all field strengths Pike, 2009, 2010), and β values of 1.5/1.3/1.0 were used at 1.5/3/7 T respectively (Bulte et al., 2009;Davis et al., 1998;Driver et al., 2012). Finally, M values at 3 and 7 T were scaled to optimal echo times for comparison purposes.
Relative changes in CMRO 2 induced by the motor tasks were compared across field strengths using the one sample paired t-test, where the null hypothesis that measured CMRO 2 changes are independent of field strength was rejected if p b 0.05. Bland-Altman diagrams were used to investigate any systematic biases between field strengths. ASL image SNR was calculated from a single resting tag-control subtracted time point, as the mean ASL signal within the BOLD/GM ROI divided by the standard deviation in a non-brain ROI. In order to avoid unfair bias, noisy voxels were not excluded from the BOLD/GM ROI for this calculation.

BOLD signal simulations
Simulations of hypercapnia calibration using a CBV of 0.05 and the actual experimental TE's used predicted M values of 0.085 at 1.5 T, 0.076 at 3 T and 0.191 at 7 T. When scaled to the optimal TE values, M was predicted to be 0.085 at 1.5 T, 0.156 at 3 T and 0.281 at 7 T. These optimal values are plotted in Fig. 2 as crosses; circles and triangles represent the predicted M values for ± 2 standard deviations for physiological CBV values respectively. The simulations predict that variation in resting CBV leads to a greater range of M values at higher magnetic field strengths. Similar increases in the range of M values with field strength were observed when the range of baseline total OEF was investigated (data not shown).

Experiment
Ten consenting subjects (3 females, mean age 29 ± 6 years) were successfully scanned on 1.5, 3 and 7 T systems. Examples of fractional BOLD and CBF responses to motor tasks are shown in Fig. 3. Group average responses to hypercapnia and motor tasks are summarised in Table 2. End-tidal CO 2 levels were monitored throughout the course of the experiments; however, the equipment performed poorly and the resultant traces were deemed unreliable, and thus are not reported here. Although unconfirmed, the authors have no reason to believe that end-tidal CO 2 modulations were inconsistent between the three scan suites. The BOLD response appears smallest at 3 T because of the shorter than usual echo time (TE = 17 ms), whereas the echo times used at 1.5 and 7 T were close to the field-optimized values. Nevertheless, the BOLD signal at this shorter echo time of 17 ms at 3 T was easily sufficient for this analysis; hence a second echo was not acquired. Changes in CBF are consistent between 3 and 7 T but are substantially higher at 1.5 T for both stimuli and both ROIs. Fig. 4 shows how M and relative CMRO 2 to motor tasks vary with field strength, where each blue cross represents an individual subject, and group means are indicated by red circles. The BOLD/GM ROI produces consistently higher M estimates than the BOLD only ROI. There is a much larger standard deviation in M values at 7 T than at 1.5 or 3 T; however this variability does not propagate to CMRO 2 , where the largest inter-subject standard deviation was always seen at 1.5 T.
One sample paired t-tests were carried out to determine whether relative CMRO 2 changes were consistent between field strengths. For the BOLD/GM ROI analysis, 1.5 T and 3 T results were shown to be inconsistent (p = 0.022). Similarly, the null hypothesis was rejected when comparing 1.5 T and 7 T (p b 0.001). However, 3 T and 7 T did not yield significantly different results (p = 0.166). When considering relative CMRO 2 in the BOLD only ROI, only 1.5 T vs. 7 T reached statistical significance (p = 0.001).
The Bland-Altman diagrams in Fig. 5 show graphically that the relative CMRO 2 to motor tasks as estimated by the Davis model at 3 and 7 T are in good agreement. However, they are consistently higher when carried out at 1.5 T, compared with either 3 or 7 T, by an average of~10% (using the BOLD/GM ROI).
CBF responses to motor tasks (Fig. 6) and hypercapnia (data not shown) had a significantly broader distribution at 1.5 T than at 3 or 7 T. After applying voxel exclusion criteria (illustrated by solid red lines in Fig. 6), the mean of the remaining voxels (indicated by the dashed lines) was larger at 1.5 T than at higher fields. For illustrative purposes Fig. 6 includes data from all 10 subjects, but the same trends were seen at the single subject level.
Finally, Fig. 7 shows how the SNR of resting ASL data increases with field strength. This is generally considered to be the limiting factor on the accuracy of calibrated BOLD methods, as the ASL signalthe difference between tag and control imagesis always substantially smaller than the directly measured BOLD signal.

Discussion
Measurements of stimulus evoked changes in CMRO 2 using gas calibrated fMRI provide a more direct physiological measurement of underlying neural activity than standard BOLD fMRI. The technique was developed at 1.5 T and has increased in popularity with the more widespread availability of 3 T systems. However, very little research has been published to date regarding the translation of the technique to 7 T. Simulations predict that the Davis model parameters must be adjusted for this increase in field strength by using a β value of 1, compared with 1.5 and 1.3 at field strengths of 1.5 T and 3 T Griffeth et al., 2013). The small number of published experimental 7 T studies have universally used β = 1, consistent with these simulations Hall et al., 2012;Krieger et al., 2014). The potential advantages of performing calibrated fMRI at 7 T come from improved image SNR for both BOLD and ASL (Triantafyllou et al., 2005) and an increase in longitudinal relaxation times for ASL resulting in higher SNR CBF estimates (Franke et al., 2000). This increased SNR has so far been used to produce maps of M  and to incorporate persubject measurements of changes in CBV, rather than assume a fixed CBF-CBV coupling . However, the accuracy of CMRO 2 measurements made at 7 T has until now been untested. In this study the validity of the calibrated fMRI technique across field strengths was investigated by examining the consistency of measurements acquired at 1.5 T, 3 T and 7 T.

Main findings
The relative changes in CBF and CMRO 2 in response to a motor task observed in this study are broadly in line with those reported in the literature. For example, Donahue et al. (2009) reported a CBF increase of 46 ± 11% and CMRO 2 increase of 12 ± 13% at 3 T; Kastrup et al. (2002) reported a CBF increase of 71 ± 9% and a CMRO 2 increase of 16 ± 9% at 1.5 T; Petr et al. (2014) reported a group average CBF increase of 47.2% at 3 T; Stefanovic et al. (2006) reported a CBF change of 45.6 ± 0.57% at 1.5 T; and Vilela et al. (2011) reported a CBF change of 62 ± 7% at 3 T. All of these studies also used pulsed ASL techniques. Estimates of M were observed to increase roughly linearly with field strength (Fig. 4), consistent with the results of the detailed BOLD signal model simulations (Fig. 2). M values at 7 T were found to have the largest standard deviation, despite the predicted improvements in SNR (Table 2). Simulations indicate that variations in the underlying cerebral physiology (such as CBV and OEF) will result in a broader range of M values as magnetic field increases. This suggests that the increase in the standard deviation of M at higher fields is due to differences in physiology across the subject group, rather than an increase in random noise. This ability of M to control for physiological variability is an important characteristic of the Davis model, meaning that this variability is not propagated through to estimates of CMRO 2 change.
Intra-subject CMRO 2 was consistent between 3 and 7 T, which is encouraging for research centres investing in ultra-high field scanners. Future studies at these high fields could make use of the improved SNR to image at a higher resolution than has been possible until now. Changes in CMRO 2 measured at 1.5 T were consistently greater than those at 3 and 7 T (Figs. 4 and 5). We hypothesize that this is an artefact of the lower SNR of 1.5 T scanners (see Fig. 7) combined with our voxel inclusion criteria for ROI analysis, which together artificially increase the average CBF response of the remaining voxels for analysis at 1.5 T as compared to 3 and 7 T. As Fig. 6 shows, the distribution of motor responses is much broader at 1.5 T than at higher field strengths, elevating the mean of remaining voxels.
The exclusion of approximately 50% of voxels from further analysis within the BOLD/GM ROI at 1.5 T is a cause for concern. In experiments carried out at either 3 or 7 T, ROIs are typically defined from voxels that exhibit positive BOLD and ASL responses (either in absolute terms or those with the highest z-stats) to tasks, and there is no need to define an additional "GM" criterion. Unfortunately due to the poor SNR of pulsed ASL data, especially at 1.5 T, this was not viable in the current study and it was necessary to rely only on BOLD and resting ASL data for ROI determination. Fig. 6 highlights the difficulties when using pulsed ASL at 1.5 T, demonstrating that the high level of noise can introduce significant biases during analysis, and that these results should be interpreted with caution.

ROI selection
The large physiological and inherent scanner noise present in functional data, especially in ASL, presents a challenge. Increasing voxel size helps to increase SNR, but at the cost of lower specificity and greater partial voluming within voxels. This can make it difficult to create accurate GM masks, as many voxels will contain a mix of tissue types. In addition, the small number of slices and large voxels acquired in the current study made accurate co-registration of functional to structural data impossible. As a surrogate, good tag-control contrast in resting ASL data (i.e. high baseline perfusion) was used to define GM voxels (Gauthier and Hoge, 2013).
The choice of ROIin this case, BOLD only versus BOLD/GMhad a significant impact on the ensuing estimates of M. Values calculated within the BOLD only motor ROI were comparable with results from previous studies at 3 T, at which the majority of calibrated fMRI studies have been carried out (see Table 3). M values in the BOLD/GM ROI were larger, but were in closer agreement with the predictions of the simulations. This is likely a reflection on the assumptions of the detailed BOLD signal model used for simulations. In contrast the method used to create the BOLD only ROI intentionally followed the analysis procedure from past studies rather than matching modelling assumptions. Recent studies at 7 T have taken greater care to extract only GM voxels for analysis, with methods and resulting M values more similar to our BOLD/GM ROI. Studies at 3 T have also recommended defining ROIs based on mapping stimulus evoked changes in CBF, which have been shown to result in greater reproducibility of CMRO 2 estimates across sessions (Leontiev and Buxton, 2007). Defining the ROI in this manner is also likely to more closely align with the assumptions of the detailed BOLD signal Fig. 3. ASL and BOLD responses to motor tasks for one representative subject. ASL units are normalized to baseline, as are BOLD signal increases. Note that no smoothing has been applied. model. However, our ability to apply this technique in this study was limited by the SNR of the ASL measurements, particularly at 1.5 T.
Because of the wide distribution of CBF changes within both ROIs, it was necessary to exclude some further voxels from later analysis. Thresholds were applied to include only those voxels with relative changes in CBF between 1 and 3 (in response to motor or hypercapnia stimuli) in order to remove voxels that were deemed unacceptably noisy or contained an insufficient fraction of grey matter. However, as the remaining voxels do not even approximate a normal distribution (see Fig. 6), neither the mean nor median values truly capture the overall flow response in the ROI. Nonetheless we followed convention and used mean values for further analysis, but this caused a noticeably higher apparent blood flow responses at 1.5 T as compared to 3 and 7 T (see Table 2).
The results of this study suggest that the Davis model is reassuringly insensitive to field strength, provided that the value for β is adjusted appropriately (Bulte et al., 2009;Davis et al., 1998;Driver et al., 2012). The observation that choice of ROI has a large effect on M but only a minimal impact on CMRO 2 is a reflection of the power of the model in regressing out potentially confounding parameters. By combining all auxiliary parameters into one calibration constant, M, the model becomes very wide-ranging and remains relatively insensitive to residual factors, at least within healthy tissue.

Limitations
The primary question that this study sought to answer was whether the translation of calibrated fMRI from 3 to 7 T would alter the estimated changes in CMRO 2 during a functional task. In an attempt to remove confounding factors from this comparison, scan parameters were kept constant wherever possible. This included implementing the same pulse sequence (FAIR) with the same readout (single echo EPI at 17 ms). Unfortunately the use of a single echo was not feasible at 1.5 T, where the optimal BOLD echo time is much longer (50 ms or more), so it was necessary to modify the sequence and add a second echo at this lower field strength. As a result of these choices, readouts were acquired at close to optimal BOLD echo times at 1.5 T (50 ms, optimal is 50 ms) and 7 T (17 ms, optimal is 25 ms), whereas the 3 T BOLD signal was extracted from the 17 ms echo despite a longer optimal TE of 35 ms. This discrepancy led to a lower experimental BOLD response at 3 T (see Table 2), and may have negatively impacted the BOLD SNR at this field strength. The lower BOLD value was accounted for by the Fig. 4. M and relative CMRO 2 as a function of field strength. Individual subjects are marked by blue crosses, group means by red circles. Fig. 5. Bland-Altman plots comparing relative CMRO 2 to motor tasks (BOLD/GM ROI) at different fields. In all diagrams, mean differences are shown as solid lines and 95% confidence intervals as dashed lines.
linear scaling of M value to the optimal echo time (see Methods section), although of course this cannot recover the lost SNR.
SNR is inherently lower at 1.5 T compared with higher field strengths. This problem has been amplified in this study by the use of a pulsed ASL sequence and a 12-channel head coil. In comparison to (pseudo-)continuous ASL, pulsed ASL is more sensitive to changes in flow velocity during motor tasks and gas challenges, as shortened arterial arrival times during stimulation may lead to differing volumes of tagged blood arriving at the imaging planes. Furthermore, the increased intravascular contribution to the BOLD signal at lower field strengths may further contribute to the CMRO 2 discrepancy; in the absence of large vessels,~57% of all signal at 1.5 T is of intravascular origin, as compared to~36% at 3 T and a negligible contribution at 7 T (Uludag et al., 2009). However, it has been shown that the Davis model can account for this signal contribution via the parameters α and β (Griffeth and Buxton, 2011).
Another consequence of the use of a pulsed ASL sequence is the potential for underestimating flow responses during hypercapnia. Tancredi et al. (2012) have reported that although pulsed and pseudocontinuous methods show good agreement in baseline perfusion and focal activation responses, the global nature of the CBF increase during hypercapnia appears to lead to an underestimation in flow response when pulsed ASL is used. This may have led to an overestimation in M values and thus in relative CMRO 2 estimates during motor tasks. However, this issue would have impacted all fields equally, and thus is not expected to affect the conclusions of this study.
In order to implement the same protocol on all three scanners, it was necessary to make several compromises in terms of sequence design and choice of parameters. It is important to bear in mind that SNR at 1.5 and 3 T could be improved by implementing a pseudo-continuous ASL sequence; similarly the intrinsically greater SNR at 7 T could be utilized by increasing the resolution, which would also reduce physiological noise (Triantafyllou et al., 2005). However, the primary aim of this work was to fairly compare the outcomes of the Davis model as a function of field strength only, by keeping as many other variables constant as possible.

Conclusion
Changes in CMRO 2 during a motor task, as calculated by the Davis model, were consistent between 3 and 7 T and were also in close agreement with the results of theoretical simulations. This is encouraging for future studies of calibrated fMRI at ultra-high fields, and supports the Fig. 6. Histograms of voxel-wise blood flow response to motor tasks (C + D only), pooled from all 10 subjects' BOLD/GM ROIs. Solid red lines indicate cutoff conditions beyond which voxels were excluded from further analysis; dashed red lines indicate means of remaining voxels. Fig. 7. Signal to noise ratios of subtracted ASL images (tag-control) at different field strengths during the baseline condition. Signal was measured in motor regions, defined by BOLD/GM ROIs (prior to the removal of noisy voxels). Error bars show ± one standard deviation. continued use of this simple signal model. The lower SNR at 1.5 T may present problems for the calibrated fMRI method, which relies heavily on ASL data. In this study the CMRO 2 results were consistently overestimated at 1.5 T, although this may be a result of the sub-optimal pulsed ASL sequence used. Voxel exclusion criteria and methods for ROI creation for post-processing are important and should be carefully considered and clearly stated in future studies, as they can have a significant impact on results.