Measurement of oxygen extraction fraction (OEF): An optimized BOLD signal model for use with hypercapnic and hyperoxic calibration

Several techniques have been proposed to estimate relative changes in cerebral metabolic rate of oxygen consumption (CMRO2) by exploiting combined BOLD fMRI and cerebral blood flow data in conjunction with hypercapnic or hyperoxic respiratory challenges. More recently, methods based on respiratory challenges that include both hypercapnia and hyperoxia have been developed to assess absolute CMRO2, an important parameter for understanding brain energetics. In this paper, we empirically optimize a previously presented "original calibration model" relating BOLD and blood flow signals specifically for the estimation of oxygen extraction fraction (OEF) and absolute CMRO2. To do so, we have created a set of synthetic BOLD signals using a detailed BOLD signal model to reproduce experiments incorporating hypercapnic and hyperoxic respiratory challenges at 3T. A wide range of physiological conditions was simulated by varying input parameter values (baseline cerebral blood volume (CBV0), baseline cerebral blood flow (CBF0), baseline oxygen extraction fraction (OEF0) and hematocrit (Hct)). From the optimization of the calibration model for estimation of OEF and practical considerations of hypercapnic and hyperoxic respiratory challenges, a new "simplified calibration model" is established which reduces the complexity of the original calibration model by substituting the standard parameters α and β with a single parameter θ. The optimal value of θ is determined (θ=0.06) across a range of experimental respiratory challenges. The simplified calibration model gives estimates of OEF0 and absolute CMRO2 closer to the true values used to simulate the experimental data compared to those estimated using the original model incorporating literature values of α and β. Finally, an error propagation analysis demonstrates the susceptibility of the original and simplified calibration models to measurement errors and potential violations in the underlying assumptions of isometabolism. We conclude that using the simplified calibration model results in a reduced bias in OEF0 estimates across a wide range of potential respiratory challenge experimental designs.

Several techniques have been proposed to estimate relative changes in cerebral metabolic rate of oxygen consumption (CMRO 2 ) by exploiting combined BOLD fMRI and cerebral blood flow data in conjunction with hypercapnic or hyperoxic respiratory challenges. More recently, methods based on respiratory challenges that include both hypercapnia and hyperoxia have been developed to assess absolute CMRO 2 , an important parameter for understanding brain energetics. In this paper, we empirically optimize a previously presented "original calibration model" relating BOLD and blood flow signals specifically for the estimation of oxygen extraction fraction (OEF) and absolute CMRO 2 . To do so, we have created a set of synthetic BOLD signals using a detailed BOLD signal model to reproduce experiments incorporating hypercapnic and hyperoxic respiratory challenges at 3 T. A wide range of physiological conditions was simulated by varying input parameter values (baseline cerebral blood volume (CBV 0 ), baseline cerebral blood flow (CBF 0 ), baseline oxygen extraction fraction (OEF 0 ) and hematocrit (Hct)). From the optimization of the calibration model for estimation of OEF and practical considerations of hypercapnic and hyperoxic respiratory challenges, a new "simplified calibration model" is established which reduces the complexity of the original calibration model by substituting the standard parameters α and β with a single parameter θ. The optimal value of θ is determined (θ = 0.06) across a range of experimental respiratory challenges. The simplified calibration model gives estimates of OEF 0 and absolute CMRO 2 closer to the true values used to simulate the experimental data compared to those estimated using the original model incorporating literature values of α and β. Finally, an error propagation analysis demonstrates the susceptibility of the original and simplified calibration models to measurement errors and potential violations in the underlying assumptions of isometabolism. We conclude that using the simplified calibration model results in a reduced bias in OEF 0 estimates across a wide range of potential respiratory challenge experimental designs.

Introduction
Blood oxygenation level dependent (BOLD) fMRI is a common tool for basic neuroscientific and clinical research. However, relating changes in BOLD signal to local brain activity is complicated by its dependence on various physiological parameters, in particular cerebral blood flow (CBF), rate of cerebral metabolic oxygen consumption (CMRO 2 ) and cerebral blood volume (CBV). Mathematical models have been proposed to describe this dependence (Davis et al., 1998;Hoge et al., 1999a;Wise et al., 2013).
The study of CMRO 2 is of particular interest because the metabolism in the brain is mostly oxidative and CMRO 2 potentially offers a marker of the (patho)physiological state of brain tissue (Lin et al., 2010). An MRIbased method for measuring relative CMRO 2 changes was originally introduced by Davis et al. (1998)) exploiting the effects of hypercapnic vasodilatation. Later, similar approaches were proposed using hyperoxic stimuli . More recently, new methods for absolute, NeuroImage 129 (2016) 159-174 rather than relative, CMRO 2 measurement have been introduced based on quantifying venous oxygen saturation via the T 2 of venous blood (Lu and Ge, 2008;Xu et al., 2009), exploiting the pattern of magnetic field distortions around major veins (Fan et al., 2012) or T 2 -oxygenation calibration curves refined with velocity selective techniques (Bolar and Rosen, 2011;Guo and Wong, 2012). Recently, extensions of the approaches of Davis et al. (1998) and Hoge et al. (1999b) have also been developed allowing the use of both hypercapnia and hyperoxia induced CBF and BOLD signal changes within the same experiment, to estimate venous deoxyhemoglobin concentration and thus OEF and absolute CMRO 2 (Bulte et al., 2012;Gauthier and Hoge, 2012;Wise et al., 2013).
It is this last approach that we investigate in this paper, focusing on improving the calibration model previously defined for describing BOLD signal behavior by Wise et al. (2013). This model permits simultaneous application of hypercapnic and hyperoxic stimuli and provides an expression for the baseline deoxyhemoglobin concentration ([dHb] 0 , see models summary, Fig. 1). It extends the models proposed by (Davis et al. (1998)) and Hoge et al. (1999b) by exploiting the information given by both oxygen (O 2 ) and carbon dioxide (CO 2 ), leading to estimates of absolute CMRO 2 under the assumption of isometabolic conditions. Our aim in this paper is to empirically modify the original calibration model to optimize the integration of information carried by BOLD and CBF signals, modulated through hypercapnic (elevated CO 2 ) and hyperoxic (elevated O 2 ) respiratory challenges, in order to provide the best estimates of OEF 0 and therefore absolute CMRO 2 from the analysis of a set of synthetic BOLD signals generated with a detailed BOLD signal model (Griffeth and Buxton, 2011) in the ideal noiseless condition. Accurate estimates of OEF are crucial for the assessment of absolute CMRO 2 , as it represents the fraction of arterial oxygen extracted by brain tissue (by definition CMRO 2 = CaO 2 ⋅ OEF ⋅ CBF, where CaO 2 is Fig. 1. Models summary. From the top to the bottom, in the green box, the detailed model proposed by Griffeth and Buxton (2011), used for the data simulation; in red, the calibration model defined in the previous paper (Wise et al., 2013), used in the estimates and for the calculation of RSS index and OEF 0 discrepancy; in yellow, the newly proposed simplified model. For details please refer to the respective papers. the arterial O 2 content (see models summary, Fig. 1); therefore it is necessary to optimize the calibration model (Wise et al., 2013) to best explain BOLD signal behavior across a range of potential underlying physiological states to apply the calibration model in practice in the healthy and diseased brain.
In this simulation study, we focus on optimizing the values of two parameters of the calibration model: α and β (see models summary, Fig. 1). In the original implementations, these parameters represented the exponent in a power law relationship between blood flow and venous blood volume (Grubb's parameter, Grubb et al., 1974) and the exponent of a nonlinear dependence of the MR signal on venous oxygenation (Boxerman et al., 1995) respectively. Here instead, following the scheme adopted by Griffeth and Buxton (2011), they are simply recast as fitting factors, removing the previous strict connection to the biophysical origin of the signal.
However, other assumptions underlying the original calibration model are still made, in particular that mild hypercapnia and hyperoxia change respectively CBF and arterial O 2 content (CaO 2 ), but not CMRO 2 (Jain et al., 2011). The validity of physiological assumptions is still controversial and is examined in detail in a recent paper by Blockley and colleagues (2015) that investigates sources of systematic error in dual calibrated BOLD approaches to CMRO 2 estimation.
The simulations in this study (a flowchart of the analysis framework is illustrated in Fig. 2) provide a detailed analysis of the biases present in estimating OEF from the original calibration model assuming previously reported values of α and β. The simulations allow us to define a simplified calibration model, with fewer parameters. This model is similar to others recently proposed to simplify the original Davis model (Blockley et al., 2015;Griffeth et al., 2013) by linearizing the relationship between BOLD signal and changes in deoxyhemoglobin. What distinguishes the simplified calibration model is the subsequent process of optimization of the parameters, which eventually leads to improved performance in estimating OEF 0 .
Moreover, an analysis on the effects of input errors was carried out comparing the simplified calibration model with the original one. This provided a first evaluation of the behavior of this model when dealing with errors in measurements and also a further understanding of its limits.
Our simulated results suggest that this approach, and the simplified calibration model that it yields, can produce unbiased results in estimating OEF 0 across a wide range of physiological states potentially offering an effective practical tool for calibrated BOLD-fMRI experiments aimed at measuring OEF and CMRO 2 .

Methods
The approach for estimating baseline CMRO 2 from hypercapnia and hyperoxia experiments The following is a brief summary of the methodology developed and described in more detail by Wise et al. (2013). More details on the notation are reported in the models summary of Fig. 1. The signal model used in Wise et al. (2013) was: With the assumption that oxygen metabolism does not change with either hypercapnia or hyperoxia, the deoxyhemoglobin ratio [dHb]/ [dHb] 0 is modeled as: where φ represents the O 2 carrying capacity of hemoglobin [Hb]. The baseline deoxyhemoglobin [dHb] 0 is then related to the venous saturation S v O 2 , arterial and venous oxygen content (respectively C a O 2 and C v O 2 ) and oxygen extraction fraction OEF by: Finally, CMRO 2 is determined from the OEF and CBF: Experimentally, the BOLD signal and CBF are measured, along with the end tidal partial pressure of oxygen from which the arterial content of oxygen CaO 2 is calculated. There are then just two unknown parameters in this set of equations: M and OEF. During experimentation, hypercapnia is used predominantly to alter CBF and hyperoxia predominantly to alter CaO 2 in order to provide sufficient information to estimate these two unknown parameters.
In the current work, our focus is on Eq.
(1) (the original calibration model), optimizing the choice of the parameters α and β, and developing a new simplified calibration model to replace Eq. (1). In the following numerical studies, OEF is estimated from simulated BOLD signals for particular combinations of hypercapnia and hyperoxia that could be delivered experimentally. The simulated BOLD signals are generated with a more detailed BOLD signal model as described in the next section. Note that the parameter M does not appear in the detailed BOLD signal model, and comes into the approximate expression (Eq. (1) as a lumped parameter that captures several effects that are modeled explicitly in the detailed model. In our simulations to optimize our choice of the parameters α and β we aim to improve the estimation of OEF and not the lumped parameter M.

Generation of physiological states
A set of BOLD signals was created to simulate experiments at 3 T, the most common MRI field strength for neuroimaging research, using the detailed model employed by Griffeth and Buxton (2011), developing an approach taken by Uludaǧ et al. (2009) and adapted here to simulate experiments in which arterial oxygen and carbon dioxide tensions are modulated for the measurement of baseline oxygen extraction fraction (OEF 0 ) and CMRO 2 . The model relates the BOLD signal (change in MR signal relaxation rate, ΔR2*, Hoge et al., 1999b) to the echo time TE, taking into account the contribution of four different compartments: one extravascular and three intravascular, i.e. arterial, venous and capillary. The signal is therefore computed as a sum of the different sources weighted for their respective volumes (see models summary, Fig. 1).
This model was chosen because of its sensitivity to different aspects of the signal, in particular the introduction of the capillary compartment, which represents an improvement in accuracy of signal description compared to previous models. Moreover it allows the variation of the underlying physical and physiological factors (for which we refer to the paper by Griffeth and Buxton, 2011), allowing us to simulate a wide set of different experimental conditions. Our purpose is in fact to propose an empirical model able to cope with such a heterogeneous dataset and robust enough to provide accurate results for as large a range of realistic physiological conditions as possible.

BOLD signal generation
Different sets of BOLD signals were created from the detailed model by combining these physiological inputs with the expected CBF changes produced by different combinations of respiratory gas challenges. These are tasks in which the subject is asked to breathe particular gas mixtures within which the partial pressures of O 2 and CO 2 vary over time. Here, elevated levels of O 2 (hyperoxic condition) and CO 2 (hypercapnic condition) were simulated and considered (see gas designs summary, Fig. 3) following our practical work using similar designs (Wise et al., 2013).
The effects of hypercapnic stimuli were directly related to CBF changes through an assumed linear cerebrovascular reactivity to CO 2 (fixed to 3% ΔCBF/mmHg accordingly to reports from Bulte et al., 2012 andMark et al., 2010). It is noteworthy that for the values of CBF 0 and the mild levels of hypercapnia considered we expect an approximately linear relationship between end tidal CO 2 and CBF (as per Tancredi andHoge, 2013 andReivich, 1964).
In order to reflect BOLD signal changes with hyperoxia, the detailed model has been integrated with well-known physiological descriptions of carriage of oxygen in the blood, summarized in the appendix to Wise et al. (2013), under the assumption of isometabolism in hyperoxia. Then a modified version of eq. A7 in Griffeth and Buxton (2011) has been adopted to take account of the arterial oxygen concentration CaO 2 , incorporating in hyperoxia the important component of oxygen carried in solution in the blood plasma. Therefore venous oxygen saturation has been calculated as S v O 2 = (CaO 2 − OEF·CaO 2 )/(φ·[Hb]), where the O 2 carrying capacity of hemoglobin φ equals 1.34 ml O2 /g Hb . In the last equation O 2 dissolved in venous plasma has been considered negligible as in Chiarelli et al. (2007).
Each simulated respiratory experiment and therefore each BOLD signal simulated, consisted of 13 values, corresponding to 13 equally spaced samples each representing a block of experimental data. This approach aimed to simulate studies in which data time series (BOLD, CBF, PO 2 , PCO 2 ) are averaged over blocks of time (as in Bulte et al., 2007;Chiarelli et al., 2007;Wise et al., 2010). Levels of end-tidal CO 2 and O 2 partial pressure modulation were chosen to represent those typically used in previous calibrated fMRI studies.
The simulated respiratory experiments (see gas designs summary, Fig. 3) were chosen to be of two main types: simultaneous or interleaved modulation of the O 2 and CO 2 supply. For the simultaneous experiment, a single instance being considered (design A), hypercapnia and hyperoxia were applied at the same time. A challenge of this kind was employed in our previous paper to demonstrate that it potentially allows the extraction of more information than the interleaved design when studied with the original calibration model (Wise et al., 2013). For the interleaved experiments either hypercapnia or hyperoxia is applied, but not both together. Typically a single level of hypercapnia and hyperoxia is chosen, alternated with normocapnia and normoxia as used by Bulte et al. (2012) for their CMRO 2 -calibration study and also in our previously presented work (Wise et al., 2013).
In addition we propose three new interleaved designs for which more than one level of hypercapnia ("interleaved modulated in CO 2 "), hyperoxia ("interleaved modulated in O 2 ") or hypercapnia and hyperoxia ("interleaved modulated") is employed (Fig. 3). We include these additional designs to explore the information that they can yield for the original calibration model, discussed further below. These new designs are aimed at simplifying the experiment compared to the simultaneous design while possibly allowing the extraction of the same amount of information as available from that design.

Criteria for optimizing α and β values
Estimates of OEF 0 using the original calibration model (Fig. 2) were obtained by setting literature values of the α and β parameters: (0.2,1.3) as used by Bulte et al. (2012), and (0.14,0.91) as suggested by Griffeth and Buxton (2011). The resulting values, for both the simultaneous and interleaved gas challenge designs, were considered as "standard" results against which others are referenced. The aim of the optimization process was to establish values of α and β that minimize the error and bias in OEF 0 estimated across the range of physiological states using the original calibration model.
The first part of the optimization process fitted the original calibration model to the generated BOLD signals. We estimated the values of the parameters M and [dHb 0 ], defined in Wise et al. (2013), with Matlab (MathWorks, Natick, MA) function lsqnonlin for different pairs of the α and β parameters: 2500 combinations of parameters in which α ranged from 0 to 1 in steps of 0.0204 and β from 0.5 to 3 in steps of 0.051. Two different approaches were chosen and compared for assessing the best (α,β) pairs: (1) the (α,β) parameters were chosen to minimize the residual sum of squares (RSS) fit to the BOLD signal, and (2) they were chosen to minimize the difference between the OEF 0 estimate from the original calibration model and the true OEF 0 entered into the detailed model (dOEF).
The first approach optimized α and β by minimizing the RSS index among all combinations of physiological states. This was obtained as the sum of the squared differences between the fitted and simulated BOLD signal. However it must be remembered that a good fit of the model to the signal is not a guarantee for good OEF 0 estimate; in fact the relationship between the BOLD signal and OEF 0 is non-linearly regulated through the original calibration model. α and β are regarded simply as parameters to be fit: the search space is extended broadly beyond the literature values and no constraint is imposed on their value.
Computing the RSS over the entire search space leads to the definition of characteristic surfaces, whose minimum points represent the pairs of (α,β) which give the best fit for the specific physiological states considered. The surfaces of all the physiological states were studied and the respective minima found. Also the median surface was calculated from all the physiological states and the results were compared to those of the single states in order to understand if common and representative patterns of minima could be detected across all physiological states. These common surface shapes would suggest that a single (α,β) combination would optimize the fit of all the physiological states. Otherwise, if the single surfaces present patterns so greatly different among each other that the median surface is not representative of the whole set, the choice of an optimum (α,β) pair may not be possible.
The second approach optimized α and β by minimizing the absolute difference between the estimated OEF 0 from the original calibration model and the true OEF 0 used as an input to the detailed model (dOEF). The analysis was carried out calculating difference index surfaces for all physiological states and also the median surface. This is of course a theoretical exercise given that experimentally the information about the real value of OEF 0 is not available as it is the goal of the measurement. Nevertheless, as in the case of the signal RSS indices, it is useful in the context of simulations to assess whether a common representative pattern can be found in the calculated surfaces and what kind of information may be extracted in experiments. In contrast to the first approach, this second approach was introduced to specifically address the minimization of errors in OEF 0 estimate without aiming at a good fit to the simulated BOLD data.
We further addressed the need to find a single representative value of (α,β) that optimizes OEF 0 estimates when only the BOLD and CBF information are available and OEF 0 is unknown, reflecting the real world situation. The search for the best combination was performed by combining, for our simulations, the information given by both the RSS and dOEF indices. The resulting values of (α,β) would be a trade-off between a good fit to the real data and accurate estimates of OEF 0 , therefore matching the criteria for an optimum model which provides reliable estimates of CMRO 2 consumption in a real world case.

Development of the original calibration model and proposal of the simplified model
We also investigated the nature of the relationship between α and β and its dependence on the respiratory challenge design. In fact, the results from the analysis above may be influenced by the interaction of the two parameters for the particular task considered rather than reflect their general behavior. We aimed, therefore, to find reliable optimum α and β values with reduced dependence on the choice of experimental design, making the model applicable over a wide range of potential experiments.
In particular, we studied the RSS and dOEF indices having fixed β, so that its relationship with α is disentangled. We tested different literature values of β and the results were compared to investigate the relationship between these two parameters. On the basis of the results of these analyses we introduced a new optimal model, the simplified calibration model (Eq. (6), also see models summary, Fig. 1), that aims to offer a single estimation framework for all different respiratory designs.
This new model removes the β parameter, reducing the complexity from 4 to 3 parameters: M, dHb 0 and θ (Eq. (6); see models summary, Fig. 1). For M and dHb 0 we refer to the previous paper (Wise et al., 2013), while θ may be considered an empirical parameter lumping together different sources of physiological information expressed by α and β in the original calibration model. The single optimum value of θ was selected as a representative value across all the designs considered.
The choice to remove the β parameter, equivalent to setting β = 1, rather than fixing α, was a pragmatic one in order to reduce the model complexity. In addition there is a historical trend to increasing field strength in MRI. At higher field β tends closer to 1 as a larger proportion of the BOLD signal contribution is extravascular. The form of this simplified model therefore becomes physically more representative at higher field strengths.

Approaching the real-world case
Having proposed the simplified model we returned our focus to the initial practical goal of the study, namely obtaining the best estimates of OEF 0 from the analysis of a set of BOLD signals, with no information of the real OEF 0 itself, that is, without calculation of the dOEF index. At the same time, the respiratory challenge designs showing the lowest bias and variability in estimated OEF 0 were identified. This was done with the intention of providing practical advice for experimental designs.
We further investigated the situation in which θ was fixed to the optimum value found. We considered both the whole dataset and a subset including only the physiological states for which the value of each input parameter (i.e. CBF 0 , CBV 0 , Hct and OEF 0 ) lay within the μ ± σ range. This set (270 cases) is more similar to an ideal "average dataset", excluding outliers while including physiological states which should be more frequently found in real data, at least in the healthy brain. The results for all designs are reported and compared to those previously found with the original calibrated model and literature values of α and β.

Error propagation analysis
We performed a targeted error analysis of the original and simplified calibration models to demonstrate the effect, on OEF estimation, of errors in the measurement of key elements of the brain's responses to hypercapnia and hyperoxia. Only the interleaved design (des. B, Fig. 3) was considered at this stage, as one of the more common types of respiratory experimental designs employed in calibrated fMRI studies. In particular, five sources of error were tested independently of one another: (1) error in the measurement of the CBF response to CO 2 , which we express as error in cerebrovascular reactivity (change in CBF per mmHg change in partial pressure of end-tidal CO 2 ), (2) error in the measurement of the BOLD signal response to CO 2 , (% BOLD signal change per mmHg change in partial pressure of end-tidal CO 2 ), (3) error in the measurement of the BOLD signal response to O 2 , (% BOLD signal change per mmHg change in partial pressure of end-tidal O 2 ), (4) change in CMRO 2 with hypercapnia and (5) change in CMRO 2 with hyperoxia.
Error in the measurement of the CBF response to CO 2 was introduced as an additive factor on the assumed value of cerebrovascular reactivity in the model (fixed to 3% ΔCBF/mmHg, see Section 2.3 in Methods). The simulated measured values of cerebrovascular reactivity considered were, therefore, between 2 and 4% ΔCBF/mmHg (an error from − 33 to +33%). The same rationale was applied to introduce error in BOLD signal measurement during hypercapnia and hyperoxia, with measurement errors ranging from −33 to +33% of the true value of BOLD signal response. In these three cases the resulting erroneous simulated traces of CBF and BOLD signal were input to the nonlinear estimate framework for OEF 0 estimation.
The final two sources of error introduce the effects of nonisometabolism, violation of the assumption of isometabolism, during hypercapnia and hyperoxia that have been suggested in some previous investigations. The CMRO 2 changes with hypercapnia and hyperoxia were introduced during the BOLD signal generation with the detailed model. The ranges of changes in oxygen consumption were fixed below extreme values found in literature: 1% change in CMRO 2 for 1 mmHg change in end-tidal CO 2 (against 1.5%/mmHg found by Xu et al. (2012)) and 1% change in CMRO 2 for 40 mmHg change in end-tidal O 2 (against between 1.86%/40 mmHg and 1.16%/40 mmHg found by Xu et al. (2011) for hyperoxia with a fraction of inspired O 2 of respectively 50 and 98%). These values therefore led to errors spanning between − 7 and + 7% change in CMRO 2 during hypercapnia and − 5 and + 5% change in CMRO 2 during hyperoxia, for the chosen respiratory design (des. B, Fig. 3). The analysis incorporating changes in oxygen metabolism was carried out over a full range of both positive and negative changes. This is because in the literature contrasting directions of altered metabolism have been reported. For example, hypercapnia has been reported to increase (Jones et al., 2005), decrease (Xu et al., 2011) and have no significant effect (Jain et al., 2011) on CMRO 2 .
Performance in estimating OEF 0 in the simplified calibration model was compared to the original calibration model. The nature of the difference between these models was further investigated by analysing the role of the balance between the fractions of capillary and venous baseline CBV (respectively ω c and ω v ) and the exponents relating CBF to total, capillary and venous CBV (respectively ϕ t , ϕ c and φ v , in (Griffeth and Buxton, 2011), see Appendix). In particular, estimates of OEF 0 were calculated for the interleaved design (des. B, Fig. 3 Finally the robustness of the simplified model has been tested against variations of two experimental parameters: the echo time TE and the static magnetic field B 0 . This was done to assess the sensitivity of the optimization process to such parameters and the degree of error introduced in the estimates of OEF 0 when using the simplified model in applications other than the one it has been optimized for (i.e. TE = 32 ms and B 0 = 3 T).
We firstly repeated the process of optimization of α and β on datasets of synthetic BOLD signals created for TE of 22, 27, 32, 37 and 42 ms at both 3 and 7 T. Only the simultaneous design (des. A, Fig. 3) was considered in this case, as the interleaved design (des. B) is shown not to carry enough information for the optimization both of α and β. Then, exploiting the same approach adopted for the rest of the error propagation analysis, we estimated OEF 0 with the simplified calibration model considering the different values of TE and B 0 as sources of systematic error. In this case only the interleaved design (des. B, Fig. 3) was considered for consistency with the error propagation analysis. For the application at 7 T, the detailed model, optimized for 3 T experiments, had to be modified in order to take account of the field dependency of the baseline extravascular signal decay rate and of the baseline intravascular signal decay rate. The former was fixed at 35 ms (Griffeth et al., 2013), while for the latter a quadratic model has been used to extrapolate its dependency on hematocrit, based on data available in literature for experiments at 1.5 and 4.7 T (Silvennoinen et al., 2003), 3 T (Zhao et al., 2007) and 7 T (Blockley et al., 2008).
The simulated data and optimization code used in this article are openly available from the Cardiff University data Catalogue at http:// dx.doi.org/10.17035/d.2015.100127. The simulation code (detailed model) remains with the original authors of that work.

The original calibration model
The results of using literature values for the parameters (α,β) and fitting the original calibration model to the simulated BOLD signal at 3 T using the nonlinear RSS estimates are shown in Fig. 4. The percentage errors in OEF 0 estimates for values of (0.14, 0.91) and (0.2, 1.3) in both the simultaneous and interleaved gas challenge designs are shown as compared to OEF 0 , the input into the detailed model. An overestimate of OEF 0 is observed in all cases, with mean and median values of 11.63% and 11.17% respectively for the simultaneous design with (α,β) = (0.2, 1.3) (Fig. 4-1); 10.46% and 9.89% for the simultaneous design with (α,β) = (0.14, 0.91) (Fig. 4-2); 10.54% and 10.11% for the interleaved design with (α,β) = (0.2, 1.3) (Fig. 4-3); 10.66% and 10.14% for the interleaved design with (α,β) = (0.14, 0.91) (Fig. 4-4). These results provided us with the motivation to optimize the α and β parameters in order to minimize this bias. Fig. 5 shows the RSS surfaces in the α, β space for a single mean state (OEF 0 = 0.4, CBF 0 = 55 ml/100 g/min, CBV 0 = 5 ml/100 g and Hct = 0.44top row) and the median across all physiological states (bottom row). A reliable and representative minimum point cannot be found for either type of experimental design; a minimum for each physiological state can be found but not consistently across states, i.e. it is not possible to select a single pair of (α, β) values which minimizes the discrepancy between the data generated with the detailed model and the signal fitted with the original calibration model. In the interleaved case this is due to the extreme irregularity of the surfaces in a single physiological state (Fig. 5-2). Then in the median over all physiological states and in the simultaneous cases those surfaces assume a characteristic "valley shape", whose minima are ill-defined ( Fig. 5-1,3,4). The minima lie in a wide region at the bottom of the valley which appears to broaden once the boundaries of the α, β space are extended (data not shown). This suggests a difficulty with fitting the original calibration model, namely, the collinear nature of the relationship between the two parameters α and β. Fig. 6 plots the surfaces for the discrepancy between true OEF 0 and that estimated from the original calibration model over (α,β) space. In contrast with the RSS surfaces, similar patterns are observed for the two respiratory challenge designs, but large differences exist between the single mean physiological state and the average one. In all cases, similarly to what has been shown for the RSS analysis, it is possible to find a minimum for each physiological state, but not one representative for all states, demonstrating that an optimal α and β combination cannot be prescribed when considering only the minimization of the error in measured OEF 0 .
Collinearity in the original calibration model Fig. 7 shows the lines extrapolated from the points of minimum of the RSS and dOEF median curves for the five different gas challenge designs considered. The intersection for the simultaneous design (des. A) is (0.07,1), whereas for the interleaved modulated (des. C) is (0.06,0.7) and for the interleaved modulated in CO 2 (des. D) is (0.06,0.85). An optimum combination was found also for the interleaved design modulated in CO 2 design (des. E) but laying outside the space of the (α,β) considered for our analyses (0.08,0.4). For the interleaved design (des. B), the distribution of the minima is such that it is not informative to calculate a line of best fit. Fig. 7 also shows how the intersection for designs A, C and D differs from the optimum values of (α,β) previously proposed in literature and here marked with a black circle and a green star (respectively (0.2,1.3) and (0.14,0.91)). Fig. 8 illustrates statistics on the logarithm values of the two indices, RSS and dOEF. Three values of fixed β were tested in order to remove the effect of collinearity between parameters α and β. These values were 0.91, 1.3 and 1: the first two taken from the literature, while the last chosen as the value that effectively removes the effect of β from the model. The minima of the median curves reflect the behavior of the whole set of physiological states for the simultaneous design (des. A) and for respiratory designs when CO 2 delivery modulation is exploited in interleaved designs (des. C, D). By comparison, in the interleaved design (des. B) there is more variability across physiological states, suggesting that the choice of a minimum would be far less representative of the general behavior of the curves. Also the magnitude of the RSS index is so low that it becomes uninformative (note the change of scale in Fig. 8 des. B compared to the others). Finally the result for the interleaved design modulated in O 2 (des. E) is noteworthy as the calculated RSS index seems to be insensitive to the considered values of α and therefore a significant minimum cannot be identified.

The simplified model
For the simplified model, i.e. the original calibration model in which the β parameter is fixed to 1 and the new parameter θ substitutes for α, the single optimum value of θ has been selected that gives the minimum of the median dOEF curves across the dataset. This value is θ = 0.06 for each design (apart from the interleaved one, des. B) as reported in the middle column of Fig. 8. Strictly, this is a suboptimal solution, given that the minima of RSS and dOEF median curves do not coincide. Nevertheless this choice matches the value of α for the intersections found in Fig. 7. Also, the consistency across designs of selecting this single representative value of the θ parameter for the simplified model offers a practical benefit justified by the results below.
The percentage errors in OEF 0 estimates fitting the simulated signal through minimization of RSS only using the simplified model are reported in Fig. 9. The biases in the distributions of estimates from the set of underlying physiological states are only slight, with mean and median values lower that those reported in Fig. 4: 2.45% and −0.35% respectively for the simultaneous design ( Fig. 9-1, des. A); 2.83% and 0.05% for the interleaved design ( Fig. 9-1, des. B); 2.44% and − 0.39% for the interleaved modulated design (Fig. 9-1, des. C); 2.15% and −0.69% for the interleaved modulated in CO 2 design ( Fig. 9-1, des. D); and 3.13% and 0.33% for the interleaved modulated in O 2 design des. E). In summary all respiratory designs show similar results with this simplified calibration model. The small value of θ found led us to consider the effect of fixing its value to 0. Results (not reported in figures) showed mean errors in OEF 0 estimates about − 6%, with distributions mainly within the range − 10 to 5%. Finally in Fig. 9-2 the percentage errors in OEF 0 estimates are shown for the simplified model but considering only the physiological states in which the value of each input variable is included in the respective μ ± σ range. For all, as expected, the boxplots are tighter than those comprising the wider range of physiological states.  1) and (2) show surfaces for a single mean physiological state (OEF 0 = 0.4, CBF 0 = 55 ml/100 g/min, CBV 0 = 5 ml/100 g and Hct = 0.44). (3) and (4) show surfaces for the median calculated on all 1000 states. In both cases a reliable minimum cannot be found and the collinearity between the two parameters α and β is highlighted. (1) and (2) show surfaces for a single mean physiological state (OEF 0 = 0.4, CBF 0 = 55 ml/100 mg/min, CBV 0 = 5 ml/100 g and Hct = 0.44). (3) and (4) show surfaces for the median calculated on all 1000 states. In both cases reliable minima cannot be found. For both models and for all sources of error considered, the performance in estimating OEF 0 worsens as the error increases, and worsens at higher values of true underlying OEF 0 . Furthermore, the degree of error affects both the offset and the slope of the scatterplot lines in each of the five cases of error. Of particular note is the effect of altered oxygen metabolism with hyperoxia ( Fig. 10-5, A and B). The relationship between true and estimated values of OEF 0 is non-linear and larger changes in oxygen metabolism lead to a huge increase of error in OEF 0 estimates (some of the results not shown in the figure).

Error propagation
The direction of the effect on OEF 0 is different depending on the error. For both models a positive error in CBF estimation due to CO 2 ( Fig. 10-1, A and B), BOLD estimation due to O 2 (Fig. 10-3, A and B) and CMRO 2 increases due to CO 2 (Fig. 10-4, A and B), causes an underestimate of OEF 0 while a negative error causes an overestimate of OEF 0 compared to the noiseless condition. In the other cases (error in BOLD estimation due to CO 2 , Fig. 10-2, A and B, and CMRO 2 increase due to O 2 , Fig. 10-5, A and B) the opposite is true.
The principal difference between the original calibration and the simplified model, highlighted in Fig. 10, is the offset and slope of the lines, most clear in the error-free condition (blue crosses, corresponding to noiseless condition in all panels). The original calibration model shows a slope very close to the unity but also a significant offset that shifts the relationship between true and estimated OEF 0 away from the identity line (plotted as a black line), leading to consistent overestimation of OEF 0 . In comparison, for the simplified model, the offset is minimized and the slope is close to unity such that the discrepancy between true and estimated OEF 0 is particularly small over the middle range of OEF 0 values. Fig. 11 illustrates the influence of the balance between the fractions of capillary and venous baseline CBV and the effects of altered exponents relating CBF to total, capillary and venous CBV. The estimates of OEF 0 show relatively little sensitivity to changes in all parameters Fig. 7. Lines extrapolated from the minimal points. Lines extrapolated from the minimal points of the median curves for the residual sum of square (RSS) and absolute discrepancy in OEF 0 (dOEF) indices in all different designs. The combinations of (α,β) for which minima coincide are (0.07,1.00) in design A, (0.06,0.07) in design C and (0.06,0.85) in design D. The combination is outside the search space for design E (0.08,0.4), whereas a line of best fit was not calculated for the RSS index in design B. Literature combinations of (α,β) are shown with a black circle (0.2,1.3) and a green star (0.14,0.91) for comparison. apart from φ v (the exponent relating fractional changes in venous blood volume to blood flow, Figs. 11-4, A and B) and venous blood volume fraction ω v at lower values of OEF 0 . Similarly to Fig. 10, the main difference between the two models is the offset in the estimate of OEF 0 . The biases in the results introduced by altering the blood volume parameters change little between the models and appear independent of offset between the two models. Fig. 12 illustrates the sensitivity to changes in TE and B 0 of the optimization approach presented and of the simplified calibration model proposed. In Fig. 12-A are shown lines extrapolated from the points of minimum of the RSS and dOEF median curves for the simultaneous design (des. A) at different combinations of TE and B 0 . This figure is analogous to Fig. 7, where only the case of TE = 32 ms and B 0 = 3 T is reported. Results show that the optimum combinations found are sensitive to both parameters, TE and B 0 . However, while for 3 T (Fig. 12-A, solid lines) the combinations gravitate around the optimum one selected for the simplified model, for 7 T (Fig. 12-A, broken lines) they are shifted towards values of approximately (0.19,1.2). Furthermore, while in the first case TE mostly affects the optimal value of β, at 7 T it affects more the optimum α. The scatterplot in Fig. 12-B reports the error introduced estimating values of OEF 0 with the simplified model from the BOLD signals created at different values of TE and B 0 . Similar values are obtained for the other respiratory experiments considered (data not shown). Results highlight that the estimates are only slightly sensitive to variations in TE, with the introduced error being negligibly small when considering 3 T experiments, but showing a larger offset when using the 3 T optimized simplified model at 7 T.

Discussion
The process that led to the optimization of the original calibration model is summarized in Fig. 2. Our simulated data was a set of BOLD signals created using the detailed signal model (Griffeth and Buxton, 2011) with Gaussian distributed values of physiological parameters (CBV 0 , CBF 0 , OEF 0 and Hct) and five different specific respiratory experimental designs employing hypercapnia and hyperoxia, some previously published and some new. Estimates of OEF 0 obtained using the original calibration model (Wise et al., 2013) and literature combinations of α and β using residual sum of squares (RSS) proved unsatisfactory. We therefore considered an index based on the magnitude of the discrepancy between the real and estimated OEF 0 (dOEF) in addition to the RSS. The consequent minimization led to values of α and β that are a good trade-off between the experimental necessity to fit to the BOLD signal and the capacity to estimate OEF 0 in an unbiased manner.
We demonstrate substantial collinearity between the parameters of the original calibration model when estimating OEF 0 , a feature that leads to erroneous results illustrated by the valleys in Figs. 5 and 6, in which we have almost the same value of the RSS and dOEF indices for very different combinations of α and β which therefore produce broad distributions of potential OEF 0 estimates. The collinearity suggests that for practical applications using the respiratory challenges here analyzed, values of α and β need to be fixed. We overcame the collinearity, in a practical sense, by the analysis of the relationship between the two indices for fixed values of β. Fixing the α and β parameters to those values where minima of both RSS and absolute discrepancy in OEF 0 estimate curves coincided, providing optimized OEF 0 estimates and BOLD signal Fig. 9. OEF 0 estimate errors for the simplified calibration model. Percentage errors resulting from the estimate of OEF 0 using only a nonlinear RSS minimization, simulating the experimental situation. (1) shows results achieved from implementing the simplified model (θ = 0.06) on all physiological states. (2) presents results using the simplified model (θ = 0.06) with ranges of physiological states considered narrowed to μ ± σ. (1) percentage error in measured CBF response to CO 2 , (2) percentage error in measured BOLD response to CO 2 , (3) percentage error in measured BOLD response to O 2 , (4) change in oxygen metabolism due to +7 mmHg change in end-tidal CO 2 and (5) change in oxygen metabolism due to +200 mmHg change in end-tidal O 2 .
fits. Fig. 8 shows that this is not achieved using the literature values of α and β and explains the bias in the results in Fig. 4.
The consequence of our practical optimization in this work is that we take α and β parameters to have no specific physiological meaning compared to those introduced in the original models by Davis (Davis et al., 1998) and Hoge (Hoge et al., 1999a). This is similar to the approach previously adopted by Griffeth and Buxton (2011). Fig. 8 also shows that once alpha is fixed to the value minimizing the dOEF index, the sensitivity of the model to the chosen β is only small. In fact the values of β considered mostly affect the behavior of the RSS index, which is orders of magnitude lower than the dOEF one. This therefore allows us to fix β to 1 and thus to introduce the simplified model. Note that the approach we take is similar to others recently taken (Blockley et al., 2015;Griffeth et al., 2013), in which β has been fixed to 1. Differences arise when considering that in those cases α has been chosen as Grubb's parameter (Grubb et al., 1974), instead of an optimized fitting parameter, as in the present case. The one we propose is of course, strictly speaking, a suboptimal model, in which, for the purpose of simplicity and generality of the model, the best possible fit to the data is compromised in favour of a better estimate of OEF. This model overcomes the difficulty of defining an optimum value for α and β in the interleaved gas challenge designs (such as design B of Fig. 3) and of needing to consider a different pair of parameters, α and β, for each different respiratory experiment. For all the designs considered, the distributions of percentage errors in OEF 0 estimates lay almost exclusively within the ±5% range, with rare outliers. This same accuracy, assuming good estimates of CBF, would be directly translated into CMRO 2 estimates through the defining relationship, CMRO 2 = CBF⋅OEF⋅CaO 2 , in which CaO 2 represents the concentration of oxygen carried in arterial blood. The simplified model shows good performance for all interleaved modulated designs, which mimic more closely the behavior of a real experiment in which end tidal values will vary over time and between hypercapnic and hyperoxic blocks. The simplified model offers, therefore, a reliable and adaptable tool for estimating OEF 0 , and then absolute CMRO 2 , across a range of respiratory experiment designs. Indeed, the simulation framework that we have presented could be applied to any arbitrary respiratory experimental design.  Given the substantial equivalence between experimental designs in their ability to estimate OEF 0 once the model parameters are fixed, we may substitute the more sophisticated experimental designs, in which both CO 2 and O 2 are elevated simultaneously, with simpler interleaved designs (as in Bulte et al., 2012). The simplification achieved would be significant firstly in terms of instrumentation and control of the experiment, as this requires only one gas to be delivered at any given time. Moreover the BOLD signal behavior itself is likely to be simplified in an interleaved block sequence, less affected for instance by transients between different levels of hypercapnia and hyperoxia. Both of these improvements may also lead to more rapid data acquisition.
Our simulated results also help to explain the estimates of OEF 0 in our previous study (Wise et al., 2013). Fig. 4 reports overestimates of OEF 0 when using the original calibration model with two pairs of literature (α,β) parameters, leading directly to overestimates in CMRO 2 . In our previous study (Wise et al., 2013) experimental data were fitted with literature values of α and β. This may help to explain the higher values of CMRO 2 reported in Table 1 of the Wise and colleagues study (Wise et al., 2013), compared to previous MRI and PET studies (Bulte et al., 2012;Gauthier and Hoge, 2012;Ito et al., 2004). From the current work, similar biases in OEF 0 estimates from other groups' work using similar respiratory challenges (Bulte et al., 2012) and literature (α,β) values would also be expected.
The limitations of the present study lie mainly in the synthetic nature of the data. In the BOLD signal generation process no noise was added to the resulting time series. We made this choice because our focus was on the analysis of the model properties and in particular on its bias in OEF o estimation, with an approach similar to that of Griffeth and Buxton (2011). The simulation environment was necessary to study and optimize the behavior of the original calibration model, in particular for the analysis of the discrepancy in OEF 0 estimates, which leads to the identification of the optimum parameters α and β and to the definition of a simplified model with an optimum parameter θ. Even though synthetic, the set of simulated BOLD signals was designed to span a wide range of physiological states defined by CBF 0 , CBV 0 , OEF 0 and Hct. The results show the distributions of errors in OEF 0 estimates to be narrow with median values close to 0, demonstrating a substantial insensitivity to differences in the underlying physiological state for both the original calibrated model and the simplified one. In particular these models to estimate OEF are robust to variations in hematocrit that is considered to be an unknown variable, and therefore a source of uncertainty.
Using the simplified model, the analysis of a subset of physiological states in which the value of each input parameter was included in the narrower mean ± one standard deviation range, that is, a more realistic range for the healthy brain, led to even more accurate final results in OEF 0 estimates (Figs. 9-2). This suggests that the outliers found using the wider range of physiological states may be due to particular combinations of unusual physiological states rather than possible critical issues with the simplified model.
The error propagation analysis illustrated the differences between the original and simplified calibration models and their likely practical limitations. Fig. 10 shows the offset in OEF 0 estimates in the application of the original model, eliminated by the new simplified model. Both models are susceptible to errors in the measurement of the CBF and BOLD responses to the respiratory challenges. Performance in terms of absolute error in estimated OEF 0 tended to be worse at higher OEF values, although fractional errors were largely independent of true OEF 0 . In practice, the accurate estimation of CBF responses to CO 2 is more challenging, and carries greater uncertainty, than the estimation of BOLD signal responses to CO 2 and O 2 , largely as a result of the low contrast-to-noise of arterial spin labelling. The errors represented in Fig. 10-1 (CBF response to CO 2 ) are likely to be of a realistic order of magnitude, depending on a number of experimental factors, while the uncertainty of BOLD responses is likely to be smaller and contribute less to OEF 0 estimate error in practice.
Changes in CMRO 2 during hypercapnia and even more during hyperoxia would strongly affect the estimated OEF 0 . This is to be expected, as the assumptions of isometabolism in hypercapnia and hyperoxia, although still object of discussion, are commonly assumed as hypotheses for BOLD calibrated methods. Dependence of CMRO 2 on altered arterial CO 2 and O 2 levels has been reported with variable results. Hyperoxia has been found to cause increase (Rockswold et al., 2010), decrease (Richards et al., 2007;Xu et al., 2012) and no change (Diringer et al., 2007) in oxygen metabolism. Similarly hypercapnia has been observed to cause increases (Horvath et al., 1994;Jones et al., 2005;Yang and Krasney, 1995), decreases (Sicard and Duong, 2005;Xu et al., 2011) and no change (Chen and Pike, 2010;Barzilay et al., 1985;Kety and Schmidt, 1948;Novack et al., 1953) in oxygen metabolism.
The most relevant studies suggest hypometabolism resulting from both hypercapnia and hyperoxia with relative decrease in CMRO 2 respectively up to 13% (for a + 10 mmHg increase in end tidal CO 2 (Xu et al., 2011)) and 10% (for a 50% fraction of inspired O 2 (Xu et al., 2012)). These, according to our results, would translate in very severe errors in OEF 0 estimates even for average values of OEF 0 : 50% overestimate in hypercapnia and 42% underestimate in hyperoxia (for the simplified calibrated model with OEF 0 = 0.4 and CMRO 2 changes scaled to match the variations in end tidal CO 2 and O 2 of our interleaved design, des. B, Fig. 3). This indicates the failure of both the original and the newly proposed model in accordance with results from Blockley and colleagues (2015) when testing the effect of changes in metabolism during hypercapnia on OEF estimates for calibrated BOLD methods.
Further investigation of the circumstances in which CMRO 2 is altered needs to be performed. It is possible that, with mild respiratory challenges that limit the violation of the assumption of isometabolism, the errors in OEF 0 suggested by Fig. 10-4 and 10-5 can be maintained within practically acceptable levels. Given the successful literature reports of the calibrated fMRI approaches to estimating OEF and CMRO 2 , it seems unlikely that they are strongly affected by violation of assumptions of isometabolism to the worst degrees simulated in Fig. 10-4 and 10-5. Were they affected in such a way by the published respiratory challenges more bias in reported OEF in the literature would be expected. However, we must be cautious in future assumptions of isometabolism in studies of cerebral pathology. For example, it may be hypothesized that in conditions of reduced blood flow where OEF is elevated, CMRO 2 may be oxygen limited and therefore an increase of CBF or arterial oxygen content in hyperoxia may itself stimulate an increase in CMRO 2 .
Finally it has been demonstrated that the better estimates of OEF 0 obtained with the simplified model, and more specifically the absence of the offset affecting the estimates of the calibration model, are not dependent on the particular balance between the fractions of capillary and venous baseline CBV or the exponents relating CBF to CBV used in the detailed model for creating the synthetic BOLD signal. This suggests that the optimization carried out and the performance of the simplified model is not significantly dependent on the characteristic features of the detailed model when considering the blood volumes of the venous and capillary compartments. The effects of different blood volumes will be partly absorbed into the parameter M in the original and simplified calibration models. While this is often considered in studies using the original calibration model as the maximum achievable BOLD response, in the current framework where the focus is the unbiased estimation of OEF, we do not interpret its biological significance and consider it simply as a fitting parameter of no particular interest. In fact, the role of M in our approach is different from earlier approaches to calibrated fMRI. In our approach, thanks to a model in which hypercapnia-and hyperoxia-induced changes are both accounted for simultaneously, M, although still estimated, is just a by-product of an estimation of OEF 0 and has therefore been put into the background. Given the different form of the simplified calibration model, we expect the estimate of M obtained to be different to those reported in literature as they are representative of different information.
Effects of considering different values of TE and B 0 are investigated showing that the optimal combination of α and β is sensitive to TE at both 3 and 7 T. However, at 3 T the error introduced in the estimates of OEF 0 over a practical range of TE is negligible. Conversely, the effect of B 0 cannot be ignored as results at 7 T show that using the simplified calibration model would lead to a significant bias in the estimates of OEF 0 and therefore another (α,β) combination should be used. Considering a characteristic TE of 25 ms for a GRE experiment at 7 T, the optimal combination found is about (0.21,1.245). Our analysis, therefore, not only supplies a complete picture of the influence of TE and B 0 on the newly proposed model, but also demonstrates how our optimization approach can conveniently be adapted by other groups to different research settings, such as 7 T experiments.
In conclusion, we have demonstrated our simulation framework as a useful means of testing calibration models for respiratory experiments aimed at measuring OEF 0 . We have shown that the simplified model is a potentially valuable tool for the unbiased evaluation of OEF 0 and therefore absolute CMRO 2 in studies using respiratory challenges. In particular, we would recommend the simplified calibration model as it offers accurate results along with reduced complexity and enhanced flexibility with respect to the respiratory design of the experiment. As the model has been found to be particularly affected by errors in measurements for high values of OEF 0 , its application for absolute estimates of CMRO 2 may not be optimal in those pathological conditions where extreme values of OEF 0 might be expected. In considering practical experimentation, since similar accuracy is achieved across different respiratory challenge designs when θ is fixed to its optimal value in the simplified model, the least complex interleaved designs may be used with that model with no detriment to the OEF 0 estimates.