Prediction of an oxygen extraction fraction map by convolutional neural network: validation of input data among MR and PET images

Purpose Oxygen extraction fraction (OEF) is a biomarker for the viability of brain tissue in ischemic stroke. However, acquisition of the OEF map using positron emission tomography (PET) with oxygen-15 gas is uncomfortable for patients because of the long fixation time, invasive arterial sampling, and radiation exposure. We aimed to predict the OEF map from magnetic resonance (MR) and PET images using a deep convolutional neural network (CNN) and to demonstrate which PET and MR images are optimal as inputs for the prediction of OEF maps. Methods Maps of cerebral blood flow at rest (CBF) and during stress (sCBF) and of cerebral blood volume (CBV) acquired with oxygen-15 PET, together with routine MR images (T1-, T2-, and T2*-weighted images), for 113 patients with steno-occlusive disease were used to train a U-Net. MR and PET images acquired from another 25 patients were used as test data. We compared the predicted OEF maps with the real OEF values using the intraclass correlation (ICC) among combinations of MRI, CBF, CBV, and sCBF. Results Among the combinations of input images, the OEF maps predicted by the model trained with the MRI, CBF, CBV, and sCBF maps were the most similar to the real OEF maps (ICC: 0.597 ± 0.082). However, the contrast of the predicted OEF maps was lower than that of the real OEF maps. Conclusion These results suggest that the deep CNN learned useful features from the CBF, sCBF, CBV, and MR images and can predict qualitatively realistic OEF maps. These findings suggest that the deep CNN model can shorten the fixation time for ¹⁵O PET by skipping the ¹⁵O₂ scan. Further training with a larger data set is required to predict accurate OEF maps quantitatively. Supplementary Information The online version contains supplementary material available at 10.1007/s11548-021-02356-7.


Introduction
Oxygen extraction fraction (OEF) is a biomarker of the viability of brain tissue in ischemic stroke [1][2][3][4]. Positron emission tomography (PET) with oxygen-15 gases (¹⁵O PET) is the gold standard method for quantifying OEF maps [5,6]. Calculating the OEF map requires PET scans for cerebral blood flow (CBF) with C¹⁵O₂ or H₂¹⁵O and cerebral blood volume (CBV) with C¹⁵O, as well as a ¹⁵O₂ scan. Arterial blood sampling is also required to quantify OEF, CBF, and CBV in ¹⁵O PET. The long fixation time of the ¹⁵O PET scans, which comprises preparation for arterial blood sampling and three PET scans (1-2 h), places a burden on patients. These issues prevent the widespread use of ¹⁵O PET in clinical settings. Kudomi and colleagues proposed a method of shortening the total fixation time by continuous inhalation of C¹⁵O₂ and ¹⁵O₂ gases [7]. However, the continuous-inhalation protocol has not been widely used. Various methods to acquire OEF maps using only magnetic resonance (MR) imaging data have been proposed previously [8][9][10][11][12]. These methods have also not been widely used clinically because they require special calculations and sequences. Deep learning, a type of machine learning with a neural network consisting of numerous layers [13,14], has recently been widely used in computer vision.
We hypothesized that a deep CNN could predict OEF maps from the other PET and MR images, without the ¹⁵O₂ scan. To verify this hypothesis, we trained a U-shaped CNN with skip connections (U-Net) [27] using structural MR images, CBV maps, and CBF maps at rest and under stress with acetazolamide as inputs and OEF maps as targets. To demonstrate which combination of MR images, CBF, and CBV maps is optimal for predicting OEF maps, we compared models trained with various combinations of these images as inputs. Finally, we performed the test using the model trained with the best combination.

Data
We retrospectively analyzed data from patients with unilateral cerebrovascular steno-occlusive disease who underwent both MR imaging and ¹⁵O PET.

Scan procedures and image processing
PET data were acquired using an SET-3000GCT/M scanner (Shimadzu Corp., Kyoto, Japan) dedicated to the three-dimensional (3D) acquisition mode [28]. The details of the ¹⁵O PET scans have been described elsewhere [29]. Motion correction for the ¹⁵O PET scans was performed using a previously described software-based method [30]. The OEF maps were estimated using the autoradiographic method [6] based on images acquired through inhaled ¹⁵O₂ and CBF estimated from H₂¹⁵O PET images [31]. The blood-volume contribution was subtracted from the OEF maps using the CBV estimated from C¹⁵O PET images. The stressed CBF maps were estimated using data acquired with H₂¹⁵O and acetazolamide, as described in previous reports [30,32].
The T2- and T2*-weighted images and the CBF map at rest were registered to the T1-weighted images. The same transformation matrix was applied to realign the OEF and CBV maps to the T1-weighted images, so that all images were aligned to the space of the T1-weighted images. These registration processes were performed using the FreeSurfer software package (https://surfer.nmr.mgh.harvard.edu/). Each slice of the realigned images was down-sampled to 256 × 256 for the input and target data for the deep CNN model. All images were standardized by the average of the individual image.
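The slice pre-processing described above could be sketched as follows. The interpolation scheme and the exact form of the standardization are not specified in the text, so nearest-neighbour down-sampling and division by the image mean are assumptions here:

```python
import numpy as np

def preprocess_slice(img, size=256):
    """Down-sample a 2-D slice to size x size and standardize it by its average.

    Nearest-neighbour index selection is used for simplicity; the actual
    interpolation used in the study is not reported.
    """
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    small = img[np.ix_(rows, cols)]
    # standardize by the average of the individual image (assumed to mean division)
    return small / small.mean()

# example: a synthetic 512 x 512 slice with strictly positive intensities
slice_in = np.random.rand(512, 512) + 1.0
slice_out = preprocess_slice(slice_in)
```

After this step the slice has the target in-plane matrix size and a mean of 1, so intensities are comparable across subjects and modalities.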
To extract the brain mask and perform the region-of-interest analysis described below, spatial normalization of the T1-weighted images was performed using the unified segmentation algorithm [33]. The deformation estimated for the T1-weighted images was applied to the other realigned MR images and the CBF, CBV, and OEF maps. These spatial normalization processes were performed using the SPM12 software package (https://www.fil.ion.ucl.ac.uk/spm/).
A flowchart of the image pre-processing is shown in Fig. S1 in the Supplementary Materials. Figure 1 illustrates the U-Net used in this study. The U-Net was trained with the training data set for the 113 subjects. Briefly, the U-Net contains an encoder part that compresses the data to extract robust image features and a decoder part that restores the desired image from the extracted features. The decoder part mirrors the structure of the encoder part. Each level of the encoder and decoder parts contains two convolutional layer blocks. Each block contains a convolutional layer, a batch normalization layer to avoid internal covariate shift [34], and an activation layer with a rectified linear unit [35]. Up-sampling in the decoder part was implemented with a transposed convolutional layer with a stride of 2. Down-sampling in the encoder was implemented with a convolutional layer with a stride of 2 instead of a pooling layer, to improve the expressive ability of the network [36]. The number of down- and up-sampling levels was set to 3 empirically. To avoid loss of spatial information, skip connections were added at each level. Finally, the output images were recovered from the final image features using a convolutional layer with a 1 × 1 kernel.
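A minimal PyTorch sketch of such a 3-level U-Net is given below. The channel counts (a base width of 16 here) and the 3 × 3 kernel size of the ordinary convolutions are assumptions, since they are not reported; the layout follows the description above: two convolution + batch-normalization + ReLU blocks per level, strided-convolution down-sampling, transposed-convolution up-sampling, skip connections at each level, and a final 1 × 1 convolution.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two convolutional layers, each followed by batch normalization and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class UNet(nn.Module):
    """3-level U-Net: strided-conv down-sampling, transposed-conv up-sampling."""
    def __init__(self, in_ch, base=16):
        super().__init__()
        self.enc1 = ConvBlock(in_ch, base)
        self.down1 = nn.Conv2d(base, base, 3, stride=2, padding=1)
        self.enc2 = ConvBlock(base, base * 2)
        self.down2 = nn.Conv2d(base * 2, base * 2, 3, stride=2, padding=1)
        self.enc3 = ConvBlock(base * 2, base * 4)
        self.down3 = nn.Conv2d(base * 4, base * 4, 3, stride=2, padding=1)
        self.bottom = ConvBlock(base * 4, base * 8)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.dec3 = ConvBlock(base * 8, base * 4)   # in = up + skip channels
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = ConvBlock(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = ConvBlock(base * 2, base)
        self.out = nn.Conv2d(base, 1, 1)  # 1 x 1 kernel recovers the OEF map
    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.down1(e1))
        e3 = self.enc3(self.down2(e2))
        b = self.bottom(self.down3(e3))
        # skip connections: concatenate encoder features at each decoder level
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)

# full-model input: MRI (3 channels) + CBF + CBV + sCBF = 6 channels
net = UNet(in_ch=6)
y = net(torch.zeros(1, 6, 256, 256))
```

With a 256 × 256 input and three down-sampling levels, the bottleneck operates at 32 × 32, and the skip connections guarantee that the 256 × 256 output is recovered exactly.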

Training
The weights of the network were optimized by minimizing the mean squared error between the real and predicted OEF maps. The optimization was performed using the Adam algorithm [37]. We used the default hyperparameter values for Adam, except for β1, which was set to 0.5 empirically. The weights were updated with mini-batches of eight image data sets, iterated over 100 epochs. The initial learning rate was set to 0.001 and was linearly decayed from the 50th epoch to the end of learning. We performed data augmentation of the training data with rotation and horizontal flipping. The training processes were implemented using the PyTorch library (https://pytorch.org) [38].
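These optimization settings can be reproduced roughly as below. A toy stand-in model and random tensors replace the real U-Net and data, the rotation augmentation is omitted, and the exact shape of the linear decay is an assumption:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(6, 1, 3, padding=1)  # stand-in for the U-Net
# Adam with default hyperparameters except beta_1 = 0.5
opt = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.5, 0.999))
n_epochs, decay_start = 100, 50
# linear decay of the learning rate from the 50th epoch to the end of learning
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda e: 1.0 if e < decay_start
    else max(0.0, 1.0 - (e - decay_start) / (n_epochs - decay_start)))
loss_fn = nn.MSELoss()  # mean squared error between real and predicted maps

inputs = torch.randn(8, 6, 64, 64)    # one mini-batch of eight image data sets
targets = torch.randn(8, 1, 64, 64)
for epoch in range(n_epochs):
    # augmentation: random horizontal flip applied to inputs and targets together
    if torch.rand(1).item() < 0.5:
        x, t = torch.flip(inputs, dims=[3]), torch.flip(targets, dims=[3])
    else:
        x, t = inputs, targets
    opt.zero_grad()
    loss = loss_fn(model(x), t)
    loss.backward()
    opt.step()
    sched.step()
```

Note that any spatial augmentation (flip or rotation) must be applied identically to inputs and targets, since the OEF map is predicted voxel-wise.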

Validation for the combination of input images
To validate which combination of the structural MR images, resting CBF, stressed CBF, and CBV maps was optimal for predicting the OEF maps, we performed the training using various combinations of input images, as presented in Table 1. To simplify the validation, we regarded the structural MR image data set, including the T1-, T2-, and T2*-weighted images, as one image data type with three channels, termed "MRI." Stressed CBF is denoted "sCBF" hereafter. The models were validated with five-fold cross-validation. Briefly, we split the 113 training/validation data into five subsets, regarded four subsets as training data and one as validation data, and then repeated training and evaluation of the trained model five times such that every subset was validated once.
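The five-fold split could be sketched as follows; the shuffling and random seed are assumptions, since the splitting procedure is not detailed:

```python
import numpy as np

def five_fold_splits(n_subjects, n_folds=5, seed=0):
    """Split subject indices into folds; each fold serves once as validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_subjects)
    folds = np.array_split(idx, n_folds)
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, val

# 113 training/validation subjects -> five (train, validation) pairs
splits = list(five_fold_splits(113))
```

Each of the five pairs trains on roughly 90 subjects and validates on the remaining ~23, and every subject appears in exactly one validation fold.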
We calculated the intraclass correlation for agreement between the predicted and real OEF values on brain voxels (ICC(2, 1)) as an index of performance in predicting the OEF map. The real and predicted OEF maps were down-sampled by a factor of 4 for rapid calculation of the ICC. We compared the ICC among the models trained with the different combinations of input images. Individual brain masks were calculated using spatial normalization to the Montreal Neurological Institute (MNI) template and the inverse transformation to the individual brain. We performed Dunnett's test to assess differences from the model with the best ICC. We also calculated the effect sizes of the differences in ICC.
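ICC(2, 1) is the two-way random-effects, absolute-agreement, single-measurement intraclass correlation, here with the real and predicted maps as the two "raters" and the brain voxels as the targets. The study computed it with the Pingouin library; the explicit mean-squares form below is a sketch of the definition, with synthetic data standing in for real OEF values:

```python
import numpy as np

def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    x, y: paired 1-D arrays (here, real and predicted OEF values on voxels).
    """
    data = np.stack([x, y], axis=1)            # shape (n_targets, k_raters)
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)              # per-voxel means
    col_means = data.mean(axis=0)              # per-rater means
    ss_total = ((data - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                    # mean square for rows (targets)
    msc = ss_cols / (k - 1)                    # mean square for columns (raters)
    mse = ss_err / ((n - 1) * (k - 1))         # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

rng = np.random.default_rng(0)
real = rng.normal(0.45, 0.05, size=1000)            # synthetic "real" OEF voxels
pred = real + rng.normal(0.0, 0.03, size=1000)      # synthetic "predicted" voxels
icc = icc_2_1(real, pred)
```

Unlike a plain Pearson correlation, this absolute-agreement form penalizes systematic offsets between the two maps, which matters here because global under- or overestimation of OEF should lower the score.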
A four-way repeated-measures analysis of variance (ANOVA) was performed to demonstrate which images contributed to the prediction performance. Binary variables indicating whether each image (MRI, CBF, CBV, and sCBF) was used as an input were regarded as independent variables, and the ICC values were regarded as dependent variables. For each independent variable, we calculated the partial eta-squared (ηp²) as an effect size for its contribution to the ICC values. The ICC was calculated with the Pingouin library (https://pingouin-stats.org/index.html) [39]. The ANOVA was performed with the R programming language (https://www.r-project.org/).
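As a rough illustration of the effect-size measure only (not of the authors' four-way repeated-measures ANOVA itself), partial eta-squared is the ratio of an effect's sum of squares to the sum of that effect's and the error sums of squares. The toy two-group data below are hypothetical:

```python
import numpy as np

def partial_eta_squared(ss_effect, ss_error):
    """eta_p^2 = SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

# toy example: ICC values with and without a given input image (hypothetical)
groups = [np.array([0.50, 0.55, 0.52]), np.array([0.60, 0.62, 0.58])]
grand = np.concatenate(groups).mean()
ss_effect = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
eta_p2 = partial_eta_squared(ss_effect, ss_error)
```

Because the denominator excludes variance explained by the other factors, ηp² values from a multi-factor ANOVA (such as the four factors here) need not sum to 1.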
To extract cortical OEF values, a volumes-of-interest (VOI) template, based on labeled data provided by Neuromorphometrics, Inc. (http://Neuromorphometrics.com) under academic subscription, was applied to the spatially normalized OEF maps. The template VOIs were masked with a gray matter mask, determined by thresholding the tissue probability map of the MNI template at 0.5. OEF values for the cerebral cortex in each hemisphere were calculated as the average values over nine cortical regions: the frontal, parietal, occipital, temporal, central operculum, anterior cingulate, middle cingulate, posterior cingulate, and insula cortices. The ratio of OEF between the ipsilateral and contralateral VOIs for the cortical regions (rOEF) was calculated.
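The hemispheric ratio could be computed as below; the per-region OEF values are hypothetical numbers for illustration, not data from the study:

```python
import numpy as np

# the nine cortical regions averaged per hemisphere
regions = ["frontal", "parietal", "occipital", "temporal", "central_operculum",
           "anterior_cingulate", "middle_cingulate", "posterior_cingulate", "insula"]

# hypothetical mean OEF per VOI on the affected (ipsilateral) and
# unaffected (contralateral) hemispheres
ipsi = dict(zip(regions, [0.52, 0.50, 0.47, 0.51, 0.49, 0.48, 0.50, 0.49, 0.53]))
contra = dict(zip(regions, [0.44, 0.43, 0.42, 0.45, 0.43, 0.42, 0.44, 0.43, 0.45]))

oef_ipsi = np.mean([ipsi[r] for r in regions])      # hemispheric cortical OEF
oef_contra = np.mean([contra[r] for r in regions])
r_oef = oef_ipsi / oef_contra                        # ipsi/contra ratio (rOEF)
```

An rOEF above 1.0 indicates elevated OEF on the affected hemisphere, as in the toy numbers here; values below 1.0 occur in the infarction cases discussed later.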

Test
To assess the generalization performance, the model with the best prediction performance in the validation, trained with all training data for the 113 subjects, was tested using the test data for the most recent 25 subjects. The ICC for the predicted OEF values on the brain voxels was calculated in the same manner as in the validation.

Validation for the combination of input images
As presented in Fig. 2 and Table 2, the highest ICC with the real OEF values (0.597 ± 0.082) was observed for the model trained with the MRI, CBF, CBV, and sCBF images (full model). The OEF maps predicted by the full model were similar to the real OEF maps, as illustrated in Fig. 3. The ICC value for the full model (0.597) indicates moderate agreement between the real and predicted OEF maps. No significant difference in ICC was observed among the models with the top six mean ICCs. In the case illustrated in Fig. 3, we observed no notable differences among the OEF maps predicted by the models with the top three ICCs. The ICCs for the models outside the top six were significantly lower than the ICC for the full model. Large effect sizes (> 1.6) for the differences in ICC from the full model were observed for the models trained without resting CBF. The model trained only with MRI resulted in the worst mean ICC and flat OEF maps, as illustrated in Fig. 3.
In the case illustrated in Fig. 4, we observed lower contrast on the predicted OEF maps, even with the full model, than on the real OEF maps. This case had the lowest ICC with the full model among the validation data set and showed high laterality of OEF. Similar trends were observed in other cases with high laterality of OEF, as shown in Fig. S2 in the Supplementary Materials. Predicted rOEF values lower than the real values were observed in the cases with higher real rOEF, as shown in Figs. 5 and S3. In the cases with low real rOEF due to cerebral infarction, as illustrated in Fig. S4, the predicted rOEF values were higher than the real rOEF values.

Table 3 indicates that all binary variables for the input images had significant effects on the ICC. We observed a much stronger effect of CBF on the ICC (ηp² = 0.576) than of the other variables. The effects of CBV (ηp² = 0.096) and MRI (ηp² = 0.065) were moderate, and the effect of stressed CBF (ηp² = 0.021) was the weakest. For the test below, we applied the full model because it had the best ICC and all of its input images had significant effects on the ICC.

Test
The mean ICC value for the test data sets was 0.591 ± 0.081. We observed no significant difference in ICC between the validation and test data sets (Welch's t test: t = 0.342; p = 0.734; effect size = 0.075). As illustrated in Fig. 6, we observed textures similar to the real OEF map in the predicted OEF map for the case with the highest ICC (0.703). However, the predicted OEF values in this case were apparently lower across the whole brain than the real OEF values. This underestimation of OEF values was observed in 12 cases in the test data sets, as illustrated in Fig. 6. In the case illustrated in Fig. 7, lower contrast was observed on the predicted OEF maps than on the real OEF maps; this case had high laterality of OEF, similar to the validation case illustrated in Fig. 4. Trends in the predicted cortical rOEF values similar to those in the validation, namely underestimation in cases with high real rOEF, were observed in the test data set, as illustrated in Fig. 8.


Discussion
The best ICC for the full model and the significant effects of all binary variables for the input images suggest that the MRI, CBF, CBV, and stressed CBF maps all contribute to the prediction of the OEF map. The very strong effect of the resting CBF on the prediction was as we expected, because the target OEF maps were calculated with the resting CBF maps using the autoradiographic method. The contribution of the CBV maps indicates that the deep CNN model learned an association between the dilation of blood vessels and oxygen supply to the brain tissue in stroke. The contribution of stressed CBF indicates that the deep CNN model can learn the relation between cerebrovascular reactivity and OEF: cerebrovascular reactivity, measured by the difference between resting and stressed CBF, decreases in advance of the elevation of OEF in stroke.
The contribution of the MRI indicates that the CNN model learned two pieces of information from the MRI: one is anatomical information about the individual brain, and the other is information on changes in susceptibility with deoxygenation of hemoglobin, as elevation of OEF changes the signal intensities of vessels in T2*-weighted images [40]. These findings suggest that the MRI, CBF, CBV, and stressed CBF maps are all useful as input images for the prediction of OEF maps using the deep CNN model. Therefore, we selected the full model trained with the MRI, CBF, CBV, and sCBF maps for the test.
The moderate ICC in the test data set, comparable to that in the validation data set, suggests that the trained CNN model generalized successfully, except for cases with high laterality of OEF. The trained model failed to predict OEF maps with high laterality, resulting in underestimation of rOEF values in both the validation and test data sets. Overestimation of rOEF was observed in the cases with real rOEF lower than 1.0 due to cerebral infarction. These results reflect the lack of training data, as only a few cases with high laterality or low rOEF were included in the training data set in this study. To predict accurate OEF maps quantitatively, further training with a larger number of cases with high laterality of OEF and low rOEF is required.
Another limitation of this study is that, because the features learned by the deep CNN model are too complicated for humans to interpret, we cannot fully understand what the model has learned. However, the similarity of the predicted OEF maps to the real maps and the moderate ICC suggest that the deep CNN model learned features useful for the prediction of OEF maps from the MRI, CBF, CBV, and sCBF maps. Further studies are required to interpret the model using techniques such as attention networks [41,42].
The findings in this study suggest that the trained deep CNN can shorten the fixation time for ¹⁵O PET. However, ¹⁵O-water or C¹⁵O₂ gases, arterial blood sampling, and an in-house cyclotron are still required even if the C¹⁵O and ¹⁵O₂ scans can be skipped. The contribution of the CBF maps to the prediction of OEF maps also implies that CBF maps acquired by perfusion imaging with MR, such as the arterial spin labeling (ASL) method, or by single photon emission computed tomography could be used as alternative training data for the prediction of OEF maps. Further studies are required to demonstrate the validity of maps acquired by methods other than ¹⁵O PET.
In conclusion, the results of this study suggest that the trained deep CNN model can qualitatively predict OEF maps and that, to do so, it can learn useful features from the MRI, resting CBF, CBV, and stressed CBF maps. These findings suggest that, by skipping the ¹⁵O₂ scan, the trained deep CNN model can shorten the fixation time for ¹⁵O PET. However, training with a larger data set is required for the prediction of quantitative OEF maps.