Deep-learning based image reconstruction for MRI-guided near-infrared spectral tomography

Non-invasive near-infrared spectral tomography (NIRST) can incorporate the structural information provided by simultaneous magnetic resonance imaging (MRI), and this has significantly improved the images obtained of tissue function. However, the process of MRI guidance in NIRST has been time consuming because of the need for tissue-type segmentation and forward diffuse modeling of light propagation. To overcome these problems, a reconstruction algorithm for MRI-guided NIRST based on deep learning is proposed and validated with simulation and real patient imaging data for breast cancer characterization. In this approach, diffuse optical signals and MRI images are both used as inputs to the neural network, which simultaneously recovers the concentrations of oxy-hemoglobin, deoxy-hemoglobin, and water via end-to-end training on 20,000 sets of computer-generated simulation phantoms. The simulation phantom studies showed that the quality of the reconstructed images was improved compared to that obtained by other existing reconstruction methods. Reconstructed patient images show that the neural network, trained only on simulated data sets, can be used directly to differentiate malignant from benign breast tumors.

fitting terms together with regularizers (L2, L1, total variation norm, etc.) to stabilize the solution against measurement noise and modeling errors [5]. Among these approaches, Tikhonov regularization, which uses the L2 norm as the regularizer, is a common and very effective method [6]. However, it tends to over-smooth reconstructed images and reduces the contrast between tumor and surrounding tissue. To enhance the quality of reconstructed images, other imaging modalities can be used to provide structural information to guide the reconstruction [7].
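For reference, a Tikhonov-regularized reconstruction is commonly written as the minimization below. The notation is assumed here and is not taken from the text: y is the measured boundary data, F the forward model of light propagation, μ the unknown parameters (optical properties or chromophore concentrations), μ0 an initial estimate, and λ the regularization weight.

```latex
\hat{\mu} = \arg\min_{\mu} \left\{ \left\| y - F(\mu) \right\|_2^2 + \lambda \left\| \mu - \mu_0 \right\|_2^2 \right\}
```

The squared L2 penalty spreads corrections over all unknowns, which is consistent with the over-smoothing noted above.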
Two major classes of constraint-based image guidance in NIRST reconstruction involve algorithms that introduce hard [7,8] or soft priors [9] or direct regularization imaging (DRI) [10,11]. Soft/hard priors can enhance accuracy significantly within localized regions by reducing the ill-posedness of NIRST image reconstruction, but they usually require manual segmentation to identify regions of interest. Indeed, manual segmentation can introduce errors into the reconstruction process, and the accuracy of estimated chromophores is then dependent on the accuracy of image segmentation. Additionally, the segmentation step can be time consuming and requires sufficient experience to avoid bias or error. In contrast, DRI does not need to segment anatomical images; however, it still needs to model light propagation in tissue, and model errors due to mesh discretization, imperfect boundary conditions, and approximate governing equations are inevitable in NIRST image reconstruction.
Deep learning (DL) has been investigated and shown to improve certain image reconstruction problems [12][13][14][15][16][17][18][19]. In particular, Lan et al. developed an image reconstruction algorithm for photoacoustic (PA) tomography to recover initial pressure distributions based on the Y-net architecture [14] in which network inputs were measured PA signals and poor quality images recovered by conventional reconstruction algorithms. Accordingly, the approach models PA propagation and requires mesh discretization. A multilayer perceptron based inverse problem method has been developed to improve the accuracy of source location in bioluminescence tomography [15]. More recently, several groups [16][17][18][19] have reported DL based approaches to estimate optical properties in diffuse optical tomography (DOT) [16][17][18] and validated these algorithms with phantoms [19]. These studies have focused on using DL with a single optical input, whereas the method described here incorporates network inputs from multiple imaging modalities to achieve image reconstruction.
Inspired by these developments and with the unique opportunity to incorporate anatomical images into these networks that can further improve NIRST image quality, we developed a DL based algorithm (Z-Net) for MRI-guided NIRST image reconstruction. In our approach, segmentation of MRI images and modeling of light propagation are avoided, and the concentrations of chromophores of oxy-hemoglobin (HbO), deoxy-hemoglobin (Hb), and water are recovered from acquired NIRST signals guided by MRI images through end-to-end training with simulated datasets. Figure 1 shows the Z-Net architecture for 2D experiments. Optical signals at nine wavelengths (661, 735, 785, 808, 826, 852, 903, 912, and 948 nm) and MRI images provide the input to the network. The Z-Net based reconstruction algorithm is described in the following steps:
Step 1. Measured NIRST signals, $s \in \mathbb{R}^{N_s}$, are input into the network and mapped into feature space, $\varphi_0$, with size 6 × 6 × 256 through a resizing operation described as
$$\varphi_0 = L\left(p\left(\sigma\left(\sigma\left(s * k_{3\times3}\right) * k_{3\times3}\right)\right)\right), \quad (1)$$
where $k_{3\times3}$ is a convolution kernel, $*$ represents the convolution operation, $\sigma(\cdot)$ denotes the batch normalization (BN) and rectified linear unit (ReLU) operation, $p$ is the pooling operation, $L$ denotes the double sampling linear interpolation operation, $\varphi_0$ is the output of the resize operation, and $N_s$ is the number of measurements.
Step 2. MRI images, $m$, are the second Z-net input. They are mapped to feature space, $\psi_0$, by down-sampling layers described by where $k_{1\times1}$ is a convolution kernel, which is used to change the number of channels. Next, MRI image features, $\psi_n$, of the nth layer are obtained through four down-sampling layers: where $P_{\max}$ denotes the max pooling operation.
Step 3. The features obtained in steps 1 and 2 are input to the deconvolution layers after concatenation. Each deconvolution layer concatenates the features from both its previous layer and two other paths. The output of the first layer can be described as where ⊕ denotes the concatenation operation. After concatenating features from the previous layer, features in the nth (n = 1, 2, 3, 4) concatenation layer are expressed as Finally, images of chromophore concentrations, ℜ, are output through the fourth convolution layer:
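As a rough illustration of the down-sampling path in Step 2, the sketch below applies 2 × 2 max pooling four times and tracks the spatial size of the feature map. The 100 × 100 input size is an assumption based on the "100 × 100-sized dataset" mentioned in the training description; the pooling window, the stride of 2, and the helper names `max_pool_2x2` and `track_sizes` are also assumptions. Convolutions are left out because they are assumed to preserve spatial size.

```python
def max_pool_2x2(image):
    """2 x 2 max pooling with stride 2; a trailing odd row or column is dropped."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            max(image[2 * r][2 * c], image[2 * r][2 * c + 1],
                image[2 * r + 1][2 * c], image[2 * r + 1][2 * c + 1])
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def track_sizes(image, n_layers=4):
    """Return the spatial size before pooling and after each of n_layers steps."""
    sizes = [(len(image), len(image[0]))]
    for _ in range(n_layers):
        image = max_pool_2x2(image)
        sizes.append((len(image), len(image[0])))
    return sizes


# Hypothetical 100 x 100 input with a simple ramp as pixel values.
mri = [[r + c for c in range(100)] for r in range(100)]
sizes = track_sizes(mri)
print(sizes)
```

Under these assumptions the path ends at 6 × 6, which matches the spatial size of the 6 × 6 × 256 feature space produced from the optical signals in Step 1, so the two feature sets can be concatenated.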

A series of 2D circular phantoms with a diameter of 82 mm was used to create simulation datasets in which 16 light source and detector pairs were uniformly distributed around the circumference of each phantom. One or two circular inclusions with varied inclusion-to-background contrasts were placed randomly at locations inside the phantoms. Chromophore concentrations of HbO, Hb, and water used for training are listed in Supplement 1, Table S1. The diameter of the single inclusion was set to be 12, 16, or 20 mm. For phantoms with two inclusions, diameters were fixed at 16 mm, but with varied edge-to-edge distances (from 4 to 42 mm). Chromophore concentrations listed in Table S1 were assigned randomly to phantoms with one or two inclusions of different sizes. A total of 20,000 phantoms were created to generate the simulation data. When one position operated as the source, data were collected at the remaining 15 detector locations for each wavelength. Thus, a total of 2160 (16 × 15 × 9) data points were collected for each phantom. Open source software, Nirfast, was used to generate boundary measurements by solving the diffusion equation [20], and 2% Gaussian noise (twice the amplitude noise level of our existing NIRST system, which is <1% [21]) was added randomly to the measurements, to evaluate the performance of the proposed algorithm.
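The measurement count above can be checked by enumerating every source, detector, and wavelength combination. The sketch below does that and also shows one possible way to add 2% Gaussian noise. Here the noise is assumed to be zero-mean with a standard deviation equal to 2% of each amplitude, which the text does not specify. The names `measurement_index` and `add_gaussian_noise` are hypothetical, and the amplitudes are placeholders, not Nirfast output.

```python
import random

N_OPTODES = 16
WAVELENGTHS_NM = [661, 735, 785, 808, 826, 852, 903, 912, 948]


def measurement_index():
    """List every (wavelength, source, detector) triple with source != detector."""
    return [
        (wl, src, det)
        for wl in WAVELENGTHS_NM
        for src in range(N_OPTODES)
        for det in range(N_OPTODES)
        if det != src
    ]


def add_gaussian_noise(values, level=0.02, seed=0):
    """Perturb each value by zero-mean Gaussian noise with std = level * |value|."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, level * abs(v)) for v in values]


index = measurement_index()
clean = [1.0] * len(index)   # placeholder amplitudes, not simulated data
noisy = add_gaussian_noise(clean)
print(len(index))            # number of data points per phantom: 16 * 15 * 9
```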
MRI images corresponding to each phantom were also generated. Specifically, gray values of inclusions in MRI images were set to 80, and gray values of background were assigned as 50, according to the dynamic contrast enhanced (DCE)-MRI contrast commonly observed.
In addition, 4% Gaussian noise was added to the MRI images.
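A synthetic MRI image of this kind could be generated along the following lines. This is a minimal sketch under several assumptions: a 100 × 100 pixel grid spanning the 82 mm phantom, zero gray value outside the phantom, and noise with a standard deviation equal to 4% of each pixel value. The function name `make_mri_phantom` and the inclusion position are hypothetical.

```python
import random


def make_mri_phantom(size=100, diameter_mm=82.0, inclusions=((10.0, 0.0, 16.0),),
                     background=50.0, inclusion_value=80.0, noise=0.04, seed=0):
    """Build a size x size gray-value image of a circular phantom.

    Each inclusion is (x_mm, y_mm, diameter_mm) relative to the phantom centre.
    Pixels outside the phantom are 0. Noise std is noise * pixel value (assumed).
    """
    rng = random.Random(seed)
    pixel_mm = diameter_mm / size
    radius = diameter_mm / 2.0
    image = []
    for row in range(size):
        y = (row + 0.5) * pixel_mm - radius
        line = []
        for col in range(size):
            x = (col + 0.5) * pixel_mm - radius
            if x * x + y * y > radius * radius:
                line.append(0.0)  # outside the circular phantom
                continue
            value = background
            for cx, cy, d in inclusions:
                if (x - cx) ** 2 + (y - cy) ** 2 <= (d / 2.0) ** 2:
                    value = inclusion_value
            line.append(value + rng.gauss(0.0, noise * value))
        image.append(line)
    return image


clean = make_mri_phantom(noise=0.0)  # noise-free reference
noisy = make_mri_phantom()           # with 4% Gaussian noise
```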
We used 70% of these datasets for training, 20% for validation, and 10% for testing. The Z-net algorithm was implemented in Python 3.7 with PyTorch [22] and trained with the Adam optimizer [23] using a learning rate of 0.005, a batch size of 128, and a mean square error (MSE) loss function for backpropagation. Training and validation were run on a workstation with an Intel Xeon CPU at 2.20 GHz and two NVIDIA GeForce RTX 2080 graphics cards with 8 GB memory. Training for 200 epochs took 3.9 h. Table 1 shows the number of trained parameters and training times for the two DL based algorithms. Our Z-Net has only 3.48M parameters, and it took approximately 3.9 h for training from a 100 × 100-sized dataset. Relative to Y-Net, our method used 43% fewer parameters and 46% less training time without reducing reconstruction performance.
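The split sizes and the savings relative to Y-Net follow directly from the numbers in the text and in Table 1. The short check below reproduces them; the helper names `split_counts` and `relative_saving` are hypothetical.

```python
def split_counts(n_total, percents=(70, 20, 10)):
    """Return (train, validation, test) counts; any remainder goes to the test set."""
    n_train = n_total * percents[0] // 100
    n_val = n_total * percents[1] // 100
    return n_train, n_val, n_total - n_train - n_val


def relative_saving(baseline, value):
    """Fractional reduction of value relative to baseline."""
    return (baseline - value) / baseline


counts = split_counts(20000)
param_saving = round(100 * relative_saving(6.09, 3.48))  # parameters (M), Table 1
time_saving = round(100 * relative_saving(7.2, 3.9))     # training time (h), Table 1
print(counts, param_saving, time_saving)
```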
Three evaluation metrics were used to validate Z-net performance: MSE [24], peak signal-to-noise ratio (PSNR) [25], and structural similarity index (SSIM) [26]. For performance assessment, we compared our algorithm against two reconstruction methods including DRI [10] and Y-Net (network architecture shown in Supplement 1, Fig. S1) [14].  Fig. S2). These phantoms were not used for training. To test further generalization of a well-trained Z-Net, the number of source-detector pairs in the testing phantom data was reduced from 16 to eight. The corresponding reconstructed images with different algorithms are shown in Supplement 1, Fig. S4. Compared to DRI or Y-net, images reconstructed by Z-net are higher in quality, and estimated chromophore values are closer to the ground truths.
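Two of these metrics are simple enough to sketch directly. The example below computes MSE and PSNR for small 2D images stored as lists of rows. It assumes that the peak value in PSNR is the maximum of the ground-truth image, which the text does not state. SSIM is omitted because it requires local windowed statistics. The toy values are illustrative only.

```python
import math


def mse(reference, estimate):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    diffs = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    return sum(diffs) / len(diffs)


def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB; peak defaults to the reference maximum."""
    if peak is None:
        peak = max(max(row) for row in reference)
    error = mse(reference, estimate)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


# Toy 2 x 2 example: ground truth versus a slightly wrong reconstruction.
truth = [[10.0, 10.0], [10.0, 20.0]]
recon = [[10.0, 12.0], [10.0, 18.0]]
print(mse(truth, recon), psnr(truth, recon))
```

Lower MSE and higher PSNR both indicate a reconstruction closer to the ground truth.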
Finally, as an example of clinical relevance, we applied the Z-net approach to image reconstruction of patient data obtained by our MRI-guided NIRST system [10,11]. The MRI exam and NIRST data acquisition were carried out simultaneously for women with undiagnosed abnormalities at the time of the imaging exam. A triangular interface with 16 fiber bundles serving as sources and detectors was used to acquire NIRST data at each of nine wavelengths in the range of 660 nm to 1064 nm (which are the same as those used in the previous simulation experiments). MRI acquisition consisted of standard (T1, T2, diffusion weighted imaging) and DCE sequences. Amplitude data at each of nine wavelengths and MRI DCE images were input to the trained Z-net. Figure 3 illustrates results obtained from a 61-year-old woman with invasive ductal carcinoma in her right breast. Figure 3(a) shows a 3D image rendering from the T1-MRI data. The NIRST imaging plane is marked by the red rectangle in Fig. 3(b), and dynamic contrast MR images are shown in Fig. 3(c). Breast density was fatty, and the patient's BIRADS score was 5. Figures 3(d)-3(f) present reconstructed HbO, Hb, and water images, respectively, from acquired CW data. The tumor is located accurately, and the total hemoglobin (HbT) contrast between tumor and surrounding normal tissue was 1.47; high values indicate that the abnormality was malignant, which was confirmed later by pathology. Figure 4 illustrates results obtained from a 28-year-old woman with a suspicious mass in her left breast. The panels are arranged as in Fig. 3. Breast density was heterogeneously dense, and the BIRADS score was 3. In this case, HbT contrast between the suspicious mass and the surrounding normal tissue was 1.05, suggesting the lesion is benign. Pathological analysis confirmed later that the abnormality was a fibroadenoma.
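One way an HbT contrast of this kind could be computed from the reconstructed images is sketched below. HbT is taken as the pixel-wise sum of HbO and Hb, and the contrast is assumed to be the ratio of the mean HbT inside the lesion to the mean HbT outside it. The text does not state how the regions or the ratio were defined, so the mask, the helper names, and the toy values are all hypothetical.

```python
def region_mean(image, mask, inside=True):
    """Mean of image pixels where the mask is set (inside=True) or unset."""
    values = [
        v
        for img_row, mask_row in zip(image, mask)
        for v, m in zip(img_row, mask_row)
        if bool(m) == inside
    ]
    return sum(values) / len(values)


def hbt_contrast(hbo, hb, lesion_mask):
    """Ratio of mean total hemoglobin inside the lesion to that outside it."""
    hbt = [[a + b for a, b in zip(row_o, row_d)] for row_o, row_d in zip(hbo, hb)]
    return region_mean(hbt, lesion_mask, True) / region_mean(hbt, lesion_mask, False)


# Toy 2 x 2 images (arbitrary units) with a one-pixel "lesion".
hbo = [[10.0, 10.0], [10.0, 16.0]]
hb = [[10.0, 10.0], [10.0, 14.0]]
mask = [[0, 0], [0, 1]]
contrast = hbt_contrast(hbo, hb, mask)
print(contrast)
```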
Although DL has been adapted for optical image reconstruction [12][13][14][15][16][17][18][19], the algorithm developed here is the first to use DL for combined multimodality image reconstruction. The structural information obtained from DCE-MRI was combined with NIRST through DL without segmenting the MRI or modeling the NIRST light propagation in tissue. Simulation results show that the quantitative accuracy of NIRST is improved relative to DRI or other DL-based reconstruction algorithms. Patient results also suggest that Z-net, when trained with only computer generated simulation data from simple and regular-shaped phantoms, has potential to differentiate malignant from benign breast abnormalities. Since Z-net was trained successfully with simulated phantom data, unlimited training sets can be generated to enhance further the generalization of Z-net for MRI-guided NIRST image reconstruction. While the MRI images used in training had two regions, one of which mimicked fatty tissue (the background) and the other mimicked tumor, and gray-scale contrasts between tumor and surrounding background regions were assumed to be constant (at 1.5), patient images were generated with Z-net that had different chromophore contrasts in malignant and benign cases. These results indicate the robustness of the approach, and the possibility that it can be applied to other combined multimodality image reconstructions.
We found Z-net reconstructions generated images with better quantitative accuracy relative to Y-net results (Table S2). Z-net also reduced the number of trained parameters to about half those needed in Y-net (Table 1). Training time was also reduced with Z-net (from 7.2 h for Y-net to 3.9 h for Z-net). Finally, Z-net proved to offer an end-to-end reconstruction that takes only a few seconds after successful training and leads to near real-time image reconstruction that could be applied in clinical settings where more dynamic results are needed.
In this study, only tissue hemoglobin concentration (HbO and Hb) and water images were used in Z-net to differentiate malignant from benign breast abnormalities. Since Z-net can be expanded by adding other parameters, such as oxygen saturation, lipids, and scattering properties into the network, the diagnostic power for breast cancer detection may be increased even further as multi-spectral systems for tissue spectroscopy are advanced. Supplement 1, Fig. S5 confirms the importance of using MRI images to guide NIRST reconstruction. The phantom used to generate the results in Fig. S5 is the same as the one used in Fig. S3. Figures S5(a) and S5(b) present images reconstructed with a traditional reconstruction algorithm [20] that uses only NIRST signals as network input. Image quality of the reconstructions in Fig. S5 is inferior to that with MRI guidance (in Fig. S3). Indeed, inclusion contrast relative to the surrounding background in Fig. S5(b) has nearly disappeared. This result demonstrates the value of combining MRI images with NIRST reconstruction, especially for DL based image recovery.
In summary, we developed a new tomographic reconstruction algorithm based on Z-Net that recovers concentrations of chromophores in NIRST guided by MRI without modeling light propagation in tissue or segmenting MRI. We demonstrated that Z-Net yielded superior performance after being trained with computer-generated synthetic phantom data. Future work will expand Z-Net to incorporate 3D patient data and test its performance in a larger clinical trial.

Table 1.

Method                      Y-Net   Z-Net
Number of parameters (M)    6.09    3.48
Training time (hours)       7.2     3.9