Transfer Learning Enhanced Generative Adversarial Networks for Multi-Channel MRI Reconstruction

Deep-learning-based generative adversarial networks (GANs) can effectively perform image reconstruction with under-sampled MR data. In general, a large number of training samples is required to improve the reconstruction performance of a given model. However, in real clinical applications, it is difficult to obtain tens of thousands of raw patient datasets to train the model, since saving k-space data is not part of the routine clinical flow. Therefore, enhancing the generalizability of a network trained on small samples is urgently needed. In this study, three novel applications were explored based on parallel imaging combined with the GAN model (PI-GAN) and transfer learning. The model was pre-trained with public Calgary brain images and then fine-tuned for use in (1) patients with tumors in our center; (2) different anatomies, including knee and liver; (3) different k-space sampling masks with acceleration factors (AFs) of 2 and 6. For the brain tumor dataset, transfer learning removed the artifacts found in PI-GAN results and yielded smoother brain edges. The transfer learning results for the knee and liver were superior to those of the PI-GAN model trained with its own dataset using a smaller number of training cases. However, the learning procedure converged more slowly for the knee datasets than for the brain tumor datasets. The reconstruction performance was improved by transfer learning in the models with both AF=2 and AF=6; of these two, the model with AF=2 showed better results. The results also showed that transfer learning with a pre-trained model could resolve the inconsistency between the training and test datasets and facilitate generalization to unseen data.


Introduction
Magnetic Resonance Imaging (MRI) is widely used for the diagnosis of diseases due to its superior soft-tissue contrast and non-invasiveness. However, a major drawback of MRI is its low imaging speed, since full data acquisition in k-space is required. To address this problem, accelerated imaging techniques based on under-sampling k-space have been proposed. Among them, parallel imaging (PI) [1] and compressed sensing (CS) [2] are two typical reconstruction approaches for acquiring artifact-free images.
Reconstruction methods for PI [1] are divided into image-domain and k-space-domain algorithms. The sensitivity encoding (SENSE) [3] algorithm removes aliasing artifacts by solving an inverse problem in the image domain. Generalized auto-calibrating partially parallel acquisition (GRAPPA) [4] interpolates non-sampled k-space data using auto-calibration signals (ACS) sampled in the central k-space. To further accelerate parallel imaging, CS reconstruction methods have been proposed. To apply CS theory to MRI reconstruction, a suitable transform domain is needed that makes the signal sparse, e.g., the wavelet transform [5], total variation (TV) [6], or low rank [7].
The resulting L1-minimization problem can then be solved with regularization terms that encode prior information about the image. However, CS reconstructions are limited by the requirement that the under-sampling mask be incoherent. Moreover, since both reconstruction methods require iterative computation, reconstruction times are long. In addition, hand-crafted regularization terms are usually too simple, and several hyper-parameters must be tuned before application. Thus, conventional CS-MRI techniques are limited to acceleration factors of 2~3 [8,9].
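To make the iterative character of CS reconstruction concrete, the sketch below runs a minimal ISTA loop on a synthetic sparse 1-D recovery problem. This is an illustration only, not the paper's setup: the problem size, sparsity level, regularization weight, and step size are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 5-sparse ground truth and a random under-sampling measurement matrix.
n, m = 128, 64
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (the 'shrinkage' step)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA: gradient step on the data-fidelity term, then L1 shrinkage.
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(500):
    x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Each iteration alternates a data-fidelity update with a sparsity-promoting shrinkage, which is why hundreds of iterations (and hence long reconstruction times) are typical for CS-MRI.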
It is of note that the advantage of deep learning reconstructions over conventional CS is the greatly reduced reconstruction time while maintaining superior image quality [10,11]. For single-channel imaging, Wang et al. [12] developed a convolutional neural network (CNN) to identify the mapping relationship between zero-filled (ZF) images and the corresponding fully-sampled data. Yang et al. [13] developed a novel deep architecture that incorporates the iterative steps of the Alternating Direction Method of Multipliers (ADMM) algorithm into the optimization of a CS-based MRI model. Schlemper et al. [14] used a deep cascade of CNNs to reconstruct under-sampled 2D cardiac MR images; this method outperformed CS approaches in terms of reconstruction error and speed. Yang et al. [15] proposed deep de-aliasing generative adversarial networks (DAGAN) for fast CS-MRI reconstruction. They adopted a U-net architecture as the generator network and coupled an adversarial loss with a novel content loss that preserves perceptual image details. Quan et al. [10] developed a GAN with a cyclic loss for MRI de-aliasing; this network is composed of two cascaded residual U-Nets, the first performing the reconstruction and the second refining it. Mardani et al. [16] trained a deep residual network with skip connections as a generator, with a mixed cost of a least-squares (LS) loss and an L1/L2 norm, to reconstruct high-quality MR images. Shaul et al. [17] proposed a two-stage GAN to estimate missing k-space samples and remove aliasing artifacts in the image space simultaneously. Wu et al. [18] integrated the self-attention mechanism into a hierarchical deep residual convolutional neural network (SAT-Net) to improve sparsely sampled MRI reconstruction. Yuan et al. [19] proposed a network that uses the self-attention mechanism and the relative average discriminator (SARA-GAN), in which half of the input data to the discriminator are true and half are false.
All the approaches above are applicable to single-channel MRI reconstruction.
Nevertheless, multi-channel PI is a classic solution of physics-based acceleration. It can not only improve the speed of acquisition but also yield better image quality. In 2018, Hammernik et al. [20] introduced a variational network (VN) to reconstruct complex multi-channel MR data. Aggarwal et al. [21] proposed a model-based deep learning architecture to address the multi-channel MRI reconstruction problem, called MoDL.
Zhou et al. [22] combined parallel imaging with CNN, named PI-CNN, for high-quality real-time MRI reconstruction. Wang et al. [23] proposed a multi-channel image reconstruction algorithm based on residual complex convolutional neural networks to accelerate parallel MR imaging (DeepcomplexMRI). Liu et al. [24] developed a novel deep learning-based reconstruction framework called SANTIS for efficient MR image reconstruction. Duan et al. [25] developed a variable splitting network (VS-Net) to effectively achieve a high-quality reconstruction of under-sampled multi-coil MR data.
Lv et al. [26] combined sensitivity encoding and generative adversarial networks for accelerated multi-channel MRI reconstruction, developing SENSE-GAN. Sriram et al. [27] proposed GrappaNet architecture for multi-coil MRI reconstruction. The GrappaNet combined traditional parallel imaging methods with neural networks and trained the model end-to-end. Souza et al. [28] proposed dual-domain cascade U-nets for multi-channel MRI reconstruction. They demonstrated that dual-domain methods are better when simultaneously reconstructing all channels of multi-channel data.
All the above methods need a large number of training samples to train the network parameters and to achieve robust generalization performances. Most previous studies have validated their reconstruction performances on publicly available datasets.
However, in clinical applications, it is difficult to obtain tens of thousands of multi-channel datasets for model training since saving the raw k-space data is not included in the routine clinical flow. Thus, it is crucial to improve the generalization of learned image reconstruction networks trained on public datasets. Several transfer learning studies have been performed to address this problem. Han et al. [29] developed a novel deep learning approach with domain adaptation to reconstruct high-quality images from under-sampled k-space data in MRI. The proposed network employed a network pre-trained on CT datasets or synthetic radial MR data, with fine-tuning using a small number of radial MR datasets. Knoll et al. [30] investigated the effects of image contrast, signal-to-noise ratio (SNR), sampling pattern, and image content on the generalizability of a pre-trained model and demonstrated the potential for transfer learning with the VN architecture. Dar et al. [31] proposed a transfer-learning approach to examine the generalization capability of networks trained on natural images to T1-weighted and T2-weighted brain images. Arshad et al. [32] transferred a U-net model pre-trained on brain images to cardiac data.

Problem Formulation
For parallel imaging, the multi-channel image reconstruction problem can be formulated as

    y = MFSx + n,    (1)

in which M is the under-sampling mask, F represents the Fourier transform, S denotes the coil sensitivity maps, n is the noise, x is the desired image to be solved for, and y represents the acquired k-space measurements.
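A minimal NumPy sketch of this multi-coil forward model follows; the coil count, image size, and random mask pattern are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_coils, H, W = 8, 64, 64

# Desired complex image x, coil sensitivity maps S, and under-sampling mask M.
x = rng.standard_normal((H, W)) + 1j * rng.standard_normal((H, W))
S = rng.standard_normal((n_coils, H, W)) + 1j * rng.standard_normal((n_coils, H, W))
M = (rng.random((H, W)) < 0.3).astype(float)  # keep ~30% of k-space

def forward(x, S, M):
    """y = M F S x : coil-wise sensitivity weighting, 2-D FFT, then masking."""
    return M * np.fft.fft2(S * x, axes=(-2, -1))

y = forward(x, S, M)  # acquired multi-channel k-space measurements
print(y.shape)        # (8, 64, 64)
```

Each coil contributes its own k-space, and the mask M zeroes the same non-sampled locations in every coil, which is what produces the aliasing artifacts of the ZF reconstruction.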
To address the inverse problem of Equation (1), CS-MRI constrains the solution space by introducing a priori knowledge. The optimization problem can be expressed as

    \hat{x} = \arg\min_x \| MFSx - y \|_2^2 + \lambda \mathcal{R}(x),    (2)

where the first term represents data fidelity in the k-space domain, which guarantees the consistency of the reconstruction with the original under-sampled k-space data; \mathcal{R}(x) denotes the prior regularization term; and \lambda is a balance parameter, which determines the trade-off between the prior information and the data fidelity term. In particular, \mathcal{R}(x) is usually an L0 or L1 norm in a certain sparsity transform domain, and an iterative method is usually required to solve the resulting problem. With the introduction of deep learning, \mathcal{R}(x) can be replaced by a CNN-based regularization term, that is,

    \hat{x} = \arg\min_x \| MFSx - y \|_2^2 + \lambda \| x - f_{CNN}(x_u) \|_2^2,    (3)

where f_{CNN} denotes the network mapping applied to the ZF image x_u. We therefore introduced the conditional GAN loss into the MRI reconstruction:

    \min_G \max_D \mathcal{L}(D, G) = \mathbb{E}_{x_t}[\log D(x_t)] + \mathbb{E}_{x_u}[\log(1 - D(\hat{x}_u))],    (4)

in which x_u is the ZF image serving as the input of the generator, \hat{x}_u is the corresponding reconstructed image yielded by the generator, and x_t is the fully-sampled ground truth.
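The CNN-regularized data-fidelity problem has a well-known closed-form solution in k-space for the single-coil simplification (coil sensitivities omitted): sampled locations blend the measurement with the CNN estimate, and non-sampled locations keep the CNN estimate. The sketch below is an illustration of that data-consistency step, not the paper's implementation; all names are assumptions.

```python
import numpy as np

def data_consistency(x_cnn, y, M, lam):
    """Closed-form minimizer of ||M F x - y||^2 + lam ||x - x_cnn||^2 (single coil).

    Sampled k-space locations blend the measurement y with the CNN estimate;
    non-sampled locations keep the CNN estimate unchanged.
    """
    k_cnn = np.fft.fft2(x_cnn)
    k_hat = np.where(M == 1, (y + lam * k_cnn) / (1 + lam), k_cnn)
    return np.fft.ifft2(k_hat)

# Illustrative usage: with lam -> 0 the sampled measurements are enforced exactly.
rng = np.random.default_rng(0)
x_true = rng.standard_normal((32, 32))
M = (rng.random((32, 32)) < 0.4).astype(float)
y = M * np.fft.fft2(x_true)
x_rec = data_consistency(rng.standard_normal((32, 32)), y, M, lam=0.0)
```

Choosing lam trades off trust in the acquired data against trust in the network output; lam = 0 corresponds to hard data consistency.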

Fig. 1. Overview of the PI-GAN architecture. The input of generator G is the ZF image x_u combined with the sensitivity maps S, and the output of G is the reconstructed image x̂_u. G is composed of a residual U-net with 4 encoder (green box) and 4 decoder (lavender box) blocks.
Briefly, the PI-GAN architecture integrates data fidelity and regularization terms into the generator. Thus, this approach can not only provide "end-to-end" under-sampled multi-channel image reconstruction but also make full use of the acquired multi-channel coil information. Moreover, to better preserve image details in the reconstruction process, the adversarial loss function is combined with pixel-wise losses in the image and frequency domains. The pixel-wise loss is divided into three parts: the mean absolute error (MAE) in the image domain (L_iMAE) and the MAEs in the frequency domain over the sampled and non-sampled k-space locations, L_fMAE,M and L_fMAE,1−M, respectively. We chose MAE as the loss function because it provides better convergence than the widely used MSE loss [10,33,34]. The input and output of the generator were x_u and x̂_u, respectively.
The generator is trained by minimizing the following loss:

    L_TOTAL = L_GAN + α L_iMAE + β L_fMAE,M + γ L_fMAE,1−M.

Here α, β and γ are hyperparameters that control the trade-off between the loss terms. The four loss terms can be denoted as

    L_GAN = \mathbb{E}_{x_u}[\log(1 - D(\hat{x}_u))],
    L_iMAE = \| x_t - \hat{x}_u \|_1,
    L_fMAE,M = \| M \odot (F x_t - F \hat{x}_u) \|_1,
    L_fMAE,1−M = \| (1 - M) \odot (F x_t - F \hat{x}_u) \|_1.

GANs are difficult to train successfully because the generator and discriminator must be trained alternately.
Thus, to stabilize the training of the GAN, we introduce refinement learning, in which x̂_u = G(x_u) + x_u. This means the generator only needs to learn the part of the data that was not acquired, which significantly reduces the model complexity.
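The refinement step and the three pixel-wise loss terms above can be sketched as follows. The adversarial term, which requires a discriminator, is omitted, and the weights alpha/beta/gamma are placeholders, not the paper's values; variable names are illustrative.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two arrays."""
    return np.mean(np.abs(a - b))

def generator_loss(x_u, g_out, x_t, M, alpha=1.0, beta=1.0, gamma=1.0):
    """Refinement learning plus the three pixel-wise PI-GAN loss terms.

    x_u:   zero-filled input image
    g_out: raw generator output (the learned residual)
    x_t:   fully-sampled ground truth
    M:     k-space under-sampling mask
    """
    x_hat = g_out + x_u                              # refinement: x̂_u = G(x_u) + x_u
    k_hat, k_t = np.fft.fft2(x_hat), np.fft.fft2(x_t)
    l_imae = mae(x_hat, x_t)                         # image-domain MAE
    l_fmae_m = mae(M * k_hat, M * k_t)               # MAE on sampled k-space locations
    l_fmae_1m = mae((1 - M) * k_hat, (1 - M) * k_t)  # MAE on non-sampled locations
    return alpha * l_imae + beta * l_fmae_m + gamma * l_fmae_1m
```

Because the generator predicts only a residual on top of the ZF input, a perfect residual (g_out = x_t − x_u) drives all three terms to zero.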

Generator Architecture
As shown in Fig. 1, the generator G is a residual U-net composed of four encoder and four decoder blocks.

Discriminator Architecture
The discriminator model is a 7-layer CNN. The first and second layers are convolutional layers with a filter size of 4×4 using Leaky ReLU as the activation function. The third to sixth layers use the same structure as the generator encoder block. The last layer is a convolutional layer with a stride of 1.
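A PyTorch sketch of such a discriminator is given below. Since the text does not spell out the internals of the generator encoder block, layers 3–6 are assumed here to be 4×4 strided convolutions with batch normalization and Leaky ReLU; the channel widths, strides, and padding are likewise assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

def block(c_in, c_out, norm=True):
    """Assumed encoder-style block: 4x4 strided conv (+ batch norm) + LeakyReLU."""
    layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(c_out))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class Discriminator(nn.Module):
    def __init__(self, in_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            *block(in_ch, 64, norm=False),  # layer 1: 4x4 conv + LeakyReLU
            *block(64, 128, norm=False),    # layer 2: 4x4 conv + LeakyReLU
            *block(128, 256),               # layers 3-6: assumed to mirror the
            *block(256, 512),               #   generator encoder block
            *block(512, 512),
            *block(512, 512),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # layer 7, stride 1
        )

    def forward(self, x):
        return self.net(x)

d = Discriminator()
out = d(torch.zeros(1, 1, 256, 256))
print(out.shape)  # torch.Size([1, 1, 3, 3])
```

The stride-1 final convolution produces a small map of patch-wise real/fake scores rather than a single scalar, a common choice for image-to-image GANs.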

Public MRI dataset
Datasets of healthy subjects were obtained from the Calgary-Campinas brain MR dataset.

Quantitative Evaluation
The obtained reconstruction results were evaluated using three metrics: peak signal to noise ratio (PSNR), structural similarity index measure (SSIM) and normalized root mean square error (RMSE). Besides, histograms of reconstruction results were generated from a region of interest (ROI) covering the brain tumor, and histogram parameters (kurtosis [k] and skewness [s]) were obtained.
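The metrics above can be sketched in NumPy as follows. SSIM requires windowed local statistics and is typically computed with `skimage.metrics.structural_similarity`, so it is omitted here; the kurtosis below uses the non-excess definition (a normal distribution gives 3), which is an assumption since the paper does not state which convention it used.

```python
import numpy as np

def psnr(ref, img):
    """Peak signal-to-noise ratio in dB, using the reference maximum as the peak."""
    mse = np.mean((ref - img) ** 2)
    return 10 * np.log10(ref.max() ** 2 / mse)

def nrmse(ref, img):
    """Root mean square error normalized by the reference RMS value."""
    return np.sqrt(np.mean((ref - img) ** 2) / np.mean(ref ** 2))

def skewness_kurtosis(roi):
    """Histogram shape parameters (skewness s, non-excess kurtosis k) of an ROI."""
    z = (roi - roi.mean()) / roi.std()
    return np.mean(z ** 3), np.mean(z ** 4)
```

Applied to the tumor ROI, the skewness and kurtosis summarize how closely the reconstructed intensity distribution matches that of the fully sampled image.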

Results
The "Calgary Model" was pre-trained using a large number of images of healthy subjects from the public Calgary dataset. As indicated by red arrowheads, transfer learning removed the artifacts found in the directly trained images, and the brain edges were smoother. We used a total of 1000 epochs for fine-tuning, and the results showed that TL200 (PSNR, 36.76; SSIM, 0.96) had almost reached the optimum, which may be due to fast convergence because the training and test data were both brain images. The results also showed that L1-ESPIRiT did not remove the artifacts completely.
As indicated by red arrowheads, vessels were blurry in the directly trained and Calgary Model images but were well preserved in the transfer learning results. The Calgary Model results were worse than the directly trained results because of residual artifacts (green arrowheads), indicating that it may be better to train an independent model for the knee data even with only a small amount of data. Transfer learning clearly yielded the best results among all methods, and all quantitative results showed that transfer learning outperformed the Calgary Model.
Meanwhile, the brain results (blue line) showed that TL200 had nearly converged. As shown in Fig. 11, all the quantitative results showed that transfer learning had the best performance for both T1W and FLAIR images.

Discussion
The main contribution of this study was to develop a transfer learning enhanced GAN approach to reconstruct several unseen multi-channel MR datasets. The results demonstrated that transfer learning from a pre-trained model could reduce the variation between the training and test datasets in terms of the variations in image contrast, anatomy and AF.
For the brain tumor dataset, T1W reconstructions showed better performance than FLAIR images. This suggests that the optimal strategy may be to use training and test data with the same contrast, because the Calgary model was initially trained with T1W data. When the training and test datasets follow similar distributions, the reconstruction performance is good.
Similarly, the larger the difference between the distributions of the training and test datasets, the worse the reconstruction performance. Notably, after transfer learning, FLAIR images showed greater improvements in PSNR and SSIM and a larger reduction in RMSE than T1W images. This indicates that fine-tuning is more effective for data reconstructed across domains, due to the additional information provided by T1W data for FLAIR images. Meanwhile, the above quantitative results support the notion that transfer learning can address the deviation between healthy and diseased subjects.
Subhas et al. [38] reported that training a model using images with or without pathology does not affect performance. These observations can be explained by the fact that, when the same under-sampling trajectory is used, the PI-GAN model performs artifact removal that is largely independent of the image content. However, the present study differs from theirs in that they only included data with pathologies in the training dataset and did not fine-tune the network model; moreover, their trained model was only applicable to combined single-channel images. In addition, histogram analysis is linked to the assessment of tumor heterogeneity [39], which is important for treatment planning [40]. Our results show that the intensity distribution of the image reconstructed with transfer learning is closer to that of the fully sampled image than with the other methods, which can further facilitate segmentation and the diagnosis of tumor malignancy.
We also successfully transferred the model pre-trained on Calgary data to different anatomies. We found that, compared with the knee and liver datasets, brain tumor samples converged faster. This may be because the brain tumor data shared the anatomical location of the training data, so a small number of transfer learning steps sufficed to achieve optimal results. Arshad et al. [32] also transferred a U-net model pre-trained on brain samples to heart data. However, their study was only applicable to single-channel images, and it explored the reconstruction performance of the model after fine-tuning with different datasets. In contrast, we explored the reconstruction performance after fine-tuning with a fixed training set and different numbers of iterations. It is reasonable that they concluded that the larger the dataset, the higher the performance. We believe the present investigation is more realistic because the amount of data that can be collected (e.g., liver, kidney, and heart) is inherently small. The results showed that, as long as fine-tuning is performed, reconstruction results are better than those obtained by training with only a small amount of the target data, regardless of whether the acceleration factor is higher or lower than that used for pre-training. However, both the T1W and FLAIR data showed that fine-tuning worked best for the model with AF=2, which suggests that a model with a low acceleration factor should be selected for transfer learning.
This study had some limitations. First, the MRI data were under-sampled retrospectively, which may not fully reflect prospective under-sampling of k-space; additional studies using prospective under-sampling are needed to validate these results. Secondly, several unsupervised learning algorithms [41-43] have been proposed to address the problem of insufficient sample size. In the future, we will compare the reconstruction performance of our transfer learning method with existing unsupervised learning strategies. Finally, the PI-GAN model requires accurate sensitivity maps during training; however, we did not explicitly validate the quality of the sensitivity maps before performing the reconstruction. The optimization of sensitivity maps has been addressed in other studies [44,45] and is beyond the scope of this work.

Conclusion
This study provides insights into the generalization ability of a learned PI-GAN model for under-sampled multi-channel MR images with respect to deviations between the training and test datasets. Our results indicate that the PI-GAN model pre-trained on public Calgary brain images can be applied to brain tumor patients with T1W and FLAIR images, knee and liver images, and images with different acceleration factors through transfer learning with a small tuning dataset. Thus, this study reveals the potential of transfer learning in multi-channel MRI reconstruction, where no sufficient data are available for complete training.