DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction

Compressed sensing magnetic resonance imaging (CS-MRI) enables fast acquisition, which is highly desirable for numerous clinical applications. This can not only reduce the scanning cost and ease patient burden, but also potentially reduce motion artefacts and the effect of contrast washout, thus yielding better image quality. Different from parallel imaging-based fast MRI, which utilizes multiple coils to simultaneously receive MR signals, CS-MRI breaks the Nyquist-Shannon sampling barrier to reconstruct MRI images with much less required raw data. This paper provides a deep learning-based strategy for reconstruction of CS-MRI, and bridges a substantial gap between conventional non-learning methods working only on data from a single image, and prior knowledge from large training data sets. In particular, a novel conditional Generative Adversarial Networks-based model (DAGAN) is proposed to reconstruct CS-MRI. In our DAGAN architecture, we have designed a refinement learning method to stabilize our U-Net based generator, which provides an end-to-end network to reduce aliasing artefacts. To better preserve texture and edges in the reconstruction, we have coupled the adversarial loss with an innovative content loss. In addition, we incorporate frequency-domain information to enforce similarity in both the image and frequency domains. We have performed comprehensive comparison studies with both conventional CS-MRI reconstruction methods and newly investigated deep learning approaches. Compared with these methods, our DAGAN method provides superior reconstruction with preserved perceptual image details. Furthermore, each image is reconstructed in about 5 ms, which is suitable for real-time processing.


I. INTRODUCTION
Magnetic Resonance Imaging (MRI) is a widely applied medical imaging modality for numerous clinical applications. MRI can provide reproducible, non-invasive, and quantitative measurements of tissue, including structural, anatomical and functional information. However, one major drawback of MRI is the prolonged acquisition time. MRI is associated with an inherently slow acquisition speed because data samples are not collected directly in the image space but rather in k-space. K-space contains spatial-frequency information that is acquired line-by-line, and anywhere from 64 to 512 lines of data are needed for a high quality reconstruction. This relatively slow acquisition could result in significant artefacts due to patient movement and physiological motion, e.g., cardiac pulsation, respiratory excursion, and gastrointestinal peristalsis. Prolonged acquisition times also limit the usage of MRI due to high cost and considerations of patient comfort and compliance [1]. Moreover, for protocols that require contrast agent injection, lengthy acquisition can result in contrast washout that may lead to poor quality or nondiagnostic images. Due to limitations of the scanning speed, patient throughput using MRI is slow compared with other medical imaging modalities.
The MRI raw data samples are acquired sequentially in k-space and the speed at which k-space can be traversed is limited by physiological and hardware constraints [2]. Once the desired field-of-view and spatial resolution of the MRI images are prescribed, the required k-space raw data is conventionally determined by the Nyquist-Shannon sampling criteria [3]. Some early research on fast MRI proposed acquiring several lines in k-space from a single radio frequency (RF) excitation by implementing multiple RF [4] or gradient [5] refocussings. Since these acceleration techniques acquire the full k-space coverage demanded by the Nyquist-Shannon sampling criteria, they are still categorised as fully sampled methods [1].
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
One possible fast MRI approach is to undersample k-space, resulting in an acceleration rate proportional to the undersampling ratio. Partial Fourier imaging (PFI) [6] is an undersampled technique based on the principle that in theory, only half of the k-space in the phase encoding direction is required according to the property that the Fourier transformation of a purely real function has complex conjugate symmetry in k-space [7]. However, in practice, more than half of the phase encoding is acquired to provide a robust phase correction [1]; therefore, the acceleration factor using PFI is limited to <2 and it is associated with a drop of signal to noise ratio (SNR). Alternatively, parallel imaging is a fast MRI method using multiple independent receiver channels. Each independent channel is most sensitive to the tissue nearest to that coil. The raw data acquired from these independent channels can be combined using either a sensitivity encoding (SENSE) technique [8] or a generalised autocalibrating (GRAPPA) method [9]. The acceleration factor of parallel imaging is limited by the number and arrangement of the receiver coils, which potentially introduce some imaging artefacts [10] and increase the manufacturing cost of the MRI scanner.
On the other hand, Compressed Sensing [11] based MRI (CS-MRI) allows fast acquisition that bypasses the Nyquist-Shannon sampling criteria with more aggressive undersampling. In theory, it can achieve a reconstruction without deterioration of image quality by performing nonlinear optimisation on randomly undersampled raw data, assuming the data is compressible. The main challenge for CS-MRI is to find an algorithm that can reconstruct an uncorrupted or de-aliased image from randomly highly undersampled k-space data.

II. RELATED WORK AND OUR CONTRIBUTIONS
A. Classic Model-Based CS-MRI
CS-MRI research has been focused on three major directions. First, investigations have sought the best undersampling scheme, which should be as random as possible to create incoherent undersampling artefacts so that a proper nonlinear reconstruction can be applied to suppress noise-like artefacts without degrading image quality of the reconstruction [12]-[14]. The sparsity of the random undersampling determines the acceleration rate. More importantly, the random undersampling scheme must be feasible to implement on the MRI scanner and compatible with particular scanning sequences. Currently, most studies using 2D MRI have relied on 1D random undersampling to produce a sampling pattern that follows a 1D Gaussian distribution. This gives a higher level of undersampling in the high frequency regions while low frequencies are retained to preserve overall image structure.
Only 1D undersampling is used because the sampling along the frequency encoding direction is fast, and only the phase encoding direction limits the acquisition time [2]. Considering 3D reconstruction, 2D Gaussian and Poisson disc masks are commonly used to accelerate phase and slice encoding [1].
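The 1D Gaussian undersampling pattern described above can be illustrated with a minimal sketch. It draws phase-encoding line indices from a Gaussian centred on the middle of k-space, so low frequencies are retained more densely than high frequencies. The function name `gaussian_1d_mask`, the `sigma_fraction` width and the choice to always keep the central line are illustrative assumptions, not settings taken from the paper.

```python
import random

def gaussian_1d_mask(n_lines, keep_fraction, sigma_fraction=0.15, seed=0):
    """Return a list of 0/1 flags, one per phase-encoding line (hypothetical helper)."""
    rng = random.Random(seed)
    n_keep = max(1, round(n_lines * keep_fraction))
    centre = n_lines // 2
    kept = {centre}  # assumption: always retain the central (lowest-frequency) line
    while len(kept) < n_keep:
        # Draw a candidate line index from a Gaussian centred on the k-space centre.
        idx = round(rng.gauss(centre, sigma_fraction * n_lines))
        if 0 <= idx < n_lines:
            kept.add(idx)
    return [1 if i in kept else 0 for i in range(n_lines)]

mask = gaussian_1d_mask(n_lines=256, keep_fraction=0.3)
print(sum(mask), "of", len(mask), "phase-encoding lines retained")
```

In a 2D acquisition each retained entry would correspond to a fully sampled line along the frequency encoding direction, which is why only the phase encoding direction is undersampled.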
Second, in general, medical imagery acquired by MRI is naturally compressible. CS-MRI utilises the implicit sparsity to reconstruct accelerated acquisitions [15]. Here the term sparsity describes a matrix of image pixels or raw data points which are predominately zero valued or namely compressible. Such sparseness may exist either in the image domain or more commonly via a suitable mathematical representation in a transform domain. Sparse representation can be explored in a specific transform domain or generally in a dictionary-based subspace [16]. Classic fast CS-MRI uses predefined and fixed sparsifying transforms, e.g., total variation (TV) [17]- [19], discrete cosine transforms [20]- [22] and discrete wavelet transforms [23]- [25]. In addition, this has been extended to a more flexible sparse representation learnt directly from data using dictionary learning [26]- [28].
Although there are promising studies applying fast CS-MRI in clinical environments [31]- [33], most routine clinical MRI scanning is still based on standard fully-sampled Cartesian sequences or is accelerated only using parallel imaging. The main challenges are: (1) satisfying the incoherence criteria required by CS-MRI [1]; (2) the widely applied sparsifying transforms might be too simple to capture complex image details associated with subtle differences of biological tissues, e.g., TV based sparsifying transform penalises local variation in the reconstructed images but can introduce staircase artefacts and the wavelet transform enforces point singularities and isotropic features but orthogonal wavelets may lead to blocky artefacts [34]- [36]; (3) nonlinear optimisation solvers usually involve iterative computation that may result in relatively long reconstruction time [1]; (4) inappropriate hyper-parameters predicted in current CS-MRI methods can cause over-regularisation that will yield overly smooth and unnatural looking reconstructions or images with residual undersampling artefacts [1]. Due to these challenges and limitations, the acceleration rate using CS-MRI alone is still limited (2× to 6× acceleration).

B. Deep Learning-Based CS-MRI
Recently, deep learning has received great attention in computer vision studies and has generally returned dividends in performance. Shen et al. [37] surveyed the most recent research in deep learning for medical image analysis and Wang [38] provided an insightful perspective on deep imaging that proposed to incorporate deep learning into tomographic image reconstruction. Essentially, CS-MRI reconstruction solves a generalised inverse problem that is analogous to image super-resolution (SR) [39], de-noising and inpainting [40], [41] that have been successfully solved using deep neural network architectures, e.g., using convolutional neural networks (CNN).
Currently, there are only preliminary studies on deep learning based CS-MRI reconstruction. Wang et al. [42] introduced a CNN-based CS-MRI method, in which the learnt network was used to initialise the classic CS-MRI in a two-phase reconstruction, or integrated into the CS-MRI directly as an additional regularisation term. Despite preliminary qualitative visualisation that showed some promise, the applicability of this method for CS-MRI reconstruction is yet to be quantitatively assessed in detail. In [36], a deep network was trained for solving CS-MRI using an Alternating Direction Method of Multipliers [18] framework, i.e., ADMM-Net. The reconstruction, de-noising and Lagrangian multiplier updates were implemented in a data flow graph and optimised through cascaded deep network layers. The method achieved similar reconstruction results as classic CS-MRI reconstruction methods but dramatically reduced reconstruction time. There are also three preprints describing deep learning based CS-MRI: Schlemper et al. [43] proposed cascaded CNN incorporating a data consistency layer, Hammernik et al. [44] trained a variational network to solve CS-MRI and Lee et al. [45] combined CNN based CS-MRI with parallel imaging to estimate and remove the aliasing artefacts. Although deep learning has shown great potential in solving CS-MRI with much faster reconstruction, to date improvement was not found significantly different from what classic CS-MRI can achieve. Moreover, similar to other deep learning applications, it is not trivial to define the network architecture and convergence of the deep network training might be difficult to achieve unless comprehensive parameter tuning is performed.

C. Our Contributions
In this study, we propose a novel conditional Generative Adversarial Networks (GAN) based deep learning architecture (dubbed DAGAN) for de-aliasing and fast CS-MRI by a comprehensive extension of our initial proof-of-concept study [46] in both method and simulation settings. Our main contributions are:
• We propose a U-Net architecture [47], [48] with skip connections for the generator network;
• A refinement learning approach is designed to stabilise the training of GAN for fast convergence and less parameter tuning;
• The adversarial loss is coupled with a novel content loss considering both pixel-wise mean square error (MSE) and perceptual loss defined by pretrained deep convolutional networks from the Visual Geometry Group at Oxford University (in short VGG networks [49]) to achieve better reconstruction details;
• Frequency domain information of the CS-MRI has been incorporated as additional constraints for the data consistency, which is formed as an extra loss term;
• We perform comprehensive experiments and compare our proposed models with both classic CS-MRI and newly developed deep learning based methods.
Compared to the state-of-the-art CS-MRI methods, we can achieve high acceleration factors with superior results and faster processing time.

III. METHOD
A. General CS-MRI
1) Forward Model: The observation or data acquisition forward model of image restoration or reconstruction can be approximated as a discretised linear system [50],

$$ y = F x + \varepsilon, \tag{1} $$

in which $x \in \mathbb{C}^{N}$ represents the desired image to be reconstructed, which consists of $\sqrt{N} \times \sqrt{N}$ pixels formatted as a column vector. The observation is denoted as $y \in \mathbb{C}^{M}$ and $\varepsilon$ is the noise. The forward model of image acquisition can be described using a linear operator $F : \mathbb{C}^{N} \to \mathbb{C}^{M}$, and different matrices $F \in \mathbb{C}^{M \times N}$ represent various image restoration or reconstruction problems, e.g., an identity operator for image de-noising, convolution operators for de-blurring, filtered subsampling operators for SR and k-space random undersampling operators for CS-MRI reconstruction [50].
2) Inverse Model: The inverse estimation of Eq. 1 is usually ill-posed because the problem is normally underdetermined with $M \ll N$. Moreover, the inverse model is unstable due to a numerically ill-conditioned operator $F$ and the presence of noise ($\varepsilon$ in Eq. 1) [50].
3) Classic Model-Based CS-MRI: In order to solve this underdetermined and ill-posed system of CS-MRI, one must exploit a-priori knowledge of $x$ that can be formulated as an unconstrained optimisation problem, that is

$$ \min_{x} \; \frac{1}{2} \| F_u x - y \|_2^2 + \lambda R(x), $$

where $\frac{1}{2} \| F_u x - y \|_2^2$ is the data fidelity term and $F_u \in \mathbb{C}^{M \times N}$ is the undersampled Fourier encoding matrix. $R$ expresses regularisation terms on $x$ and $\lambda$ is a regularisation parameter. The regularisation terms $R$ typically involve $l_q$-norms ($0 \le q \le 1$) in the sparsifying domain of $x$ [2].
4) Deep Learning-Based CS-MRI: Deep learning based studies [42], [43] propose to incorporate a CNN into CS-MRI reconstruction, that is

$$ \min_{x} \; \frac{1}{2} \| F_u x - y \|_2^2 + \lambda R(x) + \zeta \, \| x - f_{cnn}(x_u \mid \hat{\theta}) \|_2^2, $$

in which $f_{cnn}$ is the forward propagation of data through the CNN parametrised by $\theta$, and $\zeta$ is another regularisation parameter. The image generated by the CNN (i.e., $f_{cnn}(x_u \mid \hat{\theta})$) is used as a reference image and as an additional regularisation term, in which $\hat{\theta}$ represents the optimised parameters of the trained CNN. In addition, $x_u = F_u^{H} y$ is the reconstruction from the zero-filled undersampled k-space measurements, where $H$ represents the Hermitian transpose operation. MRI data naturally encodes magnitude and phase information in complex number format. There are at least two strategies for a deep learning based method to handle complex numbers: (1) real-valued information can be embedded into the complex space using an operator $\mathrm{Re}^{*} : \mathbb{R}^{N} \to \mathbb{C}^{N}$ such that $\mathrm{Re}^{*}(x) = x + 0i$, and therefore the MRI forward operator can be expressed as

$$ F_u : \mathbb{R}^{N} \to \mathbb{C}^{M}, \qquad F_u = U F \, \mathrm{Re}^{*}, $$

which combines the Fourier transform $F$ and random undersampling $U$ operators [51]; (2) the real and imaginary data can be learnt as separate channels in the training of the CNN, in other words, $\mathbb{C}^{N}$ is replaced by $\mathbb{R}^{2N}$ [43]. In our simulation based study, the first strategy was employed to avoid extra computational burden.
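The zero-filled reconstruction described above, i.e., applying the Hermitian transpose of the undersampled Fourier operator to the measurements, can be sketched on a toy 1D signal. This is only an illustration under stated assumptions: it uses a unitary discrete Fourier transform written with the Python standard library (so the inverse transform equals the Hermitian transpose), a hand-picked mask, and hypothetical helper names `dft` and `zero_filled_recon`. A practical implementation would use a 2D FFT library routine instead.

```python
import cmath

def dft(x, inverse=False):
    """Unitary 1D discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = n ** -0.5
    return [scale * sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                        for k in range(n))
            for j in range(n)]

def zero_filled_recon(y_full, mask):
    """Zero out unsampled k-space entries (operator U), then apply the inverse transform."""
    y_zero_filled = [v if m else 0 for v, m in zip(y_full, mask)]
    return dft(y_zero_filled, inverse=True)

x = [0, 0, 1, 1, 1, 1, 0, 0]        # toy real-valued 1D "image"
y = dft(x)                          # fully sampled k-space
mask = [1, 1, 0, 0, 0, 0, 0, 1]     # keep DC and the lowest +/- frequency only
x_u = zero_filled_recon(y, mask)    # aliased, zero-filled reconstruction
x_full = zero_filled_recon(y, [1] * len(x))  # full sampling recovers x

error = sum(abs(a - b) ** 2 for a, b in zip(x_u, x))
print("squared error of zero-filled reconstruction:", round(error, 3))
```

With the full mask the signal is recovered up to floating-point error, whereas the undersampled mask leaves a residual error; suppressing that residual is what the learnt de-aliasing model is trained to do.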

B. General GAN
Generative Adversarial Networks [52] consist of a generator network $G$ and a discriminator network $D$. The goal of the generator $G$ is to map a latent variable $z$, e.g., an input vector of random numbers, to the distribution of the given true data $x$ that we are interested in imitating in order to fool the discriminator $D$. The discriminator aims to distinguish the true data $x$ from the synthesized fake data $G_{\theta_G}(z)$. GAN can be formulated mathematically as a minimax game between the generator $G_{\theta_G}(z) : z \to x$ and the discriminator $D_{\theta_D}(x) : x \to [0, 1]$, and this training process is parameterised by $\theta_G$ and $\theta_D$ as follows:

$$ \min_{\theta_G} \max_{\theta_D} L(\theta_D, \theta_G) = \mathbb{E}_{x \sim p_{data}(x)} \left[ \log D_{\theta_D}(x) \right] + \mathbb{E}_{z \sim p_z(z)} \left[ \log \left( 1 - D_{\theta_D}(G_{\theta_G}(z)) \right) \right], $$

where the latent variable $z$ is sampled from a fixed latent distribution $p_z(z)$ and real samples $x$ come from a real data distribution $p_{data}(x)$. The GAN model can be solved by alternating gradient optimisation between the discriminator and the generator. Let $p_G(x)$ be the distribution induced by the generator. For a fixed $G$, the optimal discriminator is

$$ D_{\theta_D^{*}}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_G(x)}, $$

and the global optimum is reached when the generator distribution exactly matches the data distribution, $p_{data}(x) = p_G(x)$ [52]. To avoid the vanishing gradient problem, when initially the discriminator is very confident and almost always outputs 0, in practice the gradient step for the generator is replaced by

$$ \max_{\theta_G} \; \mathbb{E}_{z \sim p_z(z)} \left[ \log D_{\theta_D}(G_{\theta_G}(z)) \right]. $$

In so doing, the gradient signals are enhanced, but this is no longer a zero-sum game [53]. When the discriminator is optimal, the minimax game is reduced into a minimisation over the generator only and is equal to

$$ \min_{\theta_G} L(\theta_D^{*}, \theta_G) = 2 \, \mathrm{JSD}(p_{data} \,\|\, p_G) - 2 \log 2, $$

using the Jensen-Shannon divergence (JSD) [53]. In practice, the optimal $\theta_D^{*}$ is rarely known; thus, minimisation of $L(\theta_D, \theta_G)$ yields only a lower bound [52], [53].
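The relation between the optimal discriminator and the Jensen-Shannon divergence mentioned above can be checked numerically on small discrete distributions. The sketch below is an illustration only, assuming strictly positive probabilities and the standard result from the GAN literature that the value of the game at the optimal discriminator equals twice the Jensen-Shannon divergence minus log 4; the distributions and function names are made up for the example.

```python
import math

def optimal_discriminator(p_data, p_g):
    """Pointwise optimal discriminator for a fixed generator distribution."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

def gan_value(p_data, p_g, d):
    """E_{x~p_data}[log D(x)] + E_{x~p_G}[log(1 - D(x))] for discrete distributions."""
    return sum(pd * math.log(dx) + pg * math.log(1 - dx)
               for pd, pg, dx in zip(p_data, p_g, d))

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence (natural logarithm)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p_data = [0.5, 0.3, 0.2]   # toy "real data" distribution
p_g = [0.2, 0.3, 0.5]      # toy generator distribution
d_star = optimal_discriminator(p_data, p_g)
value = gan_value(p_data, p_g, d_star)

print(value, 2 * jsd(p_data, p_g) - math.log(4))
```

When the two distributions coincide, the optimal discriminator outputs 0.5 everywhere and the value reduces to minus log 4, which is the global optimum of the game.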

C. Proposed Method
A GAN can be extended to a conditional model if extra prior information is included to constrain the generator and discriminator; this is known as a conditional GAN [54]. Additional prior information can be discrete labels, text and images [55], [56]. In this study, a GAN conditioned on images was used and Figure 1 shows the overall framework of our conditional GAN-based CS-MRI architecture.
1) Conditional GAN Loss: First, instead of using a CNN, we incorporated the conditional GAN loss into our CS-MRI reconstruction, that is

$$ \min_{\theta_G} \max_{\theta_D} L_{cGAN}(\theta_D, \theta_G) = \mathbb{E}_{x_t \sim p_{train}(x_t)} \left[ \log D_{\theta_D}(x_t) \right] + \mathbb{E}_{x_u \sim p_G(x_u)} \left[ \log \left( 1 - D_{\theta_D}(G_{\theta_G}(x_u)) \right) \right], \tag{8} $$

in which there is one input for the generator, i.e., the zero-filling reconstruction $x_u$ with aliasing artefacts. After learning, the generator yielded the corresponding de-aliased reconstruction $\hat{x}_u$, which was fed to the discriminator. The aim is to keep training until the discriminator cannot distinguish a de-aliased reconstruction $\hat{x}_u$ from the fully-sampled ground truth reconstruction $x_t$. Here $x_t$ and $x_u$ are our input training data or in other words we input $x_u$ conditioned on the given $x_t$, and output the de-aliased reconstruction $\hat{x}_u$. Compared to the original conditional GAN [54], in which both the generator and discriminator are conditioned on some extra information, in our DAGAN model only the generator receives the undersampled image input as the conditional information.
2) DAGAN Architecture: Our DAGAN architecture was loosely inspired by [55] and [57]. We proposed to use a U-Net based architecture [47] to construct the generator $G$ that consisted of 8 convolutional layers (encoder layers) and corresponding 8 deconvolutional layers (decoder layers), and each was followed by batch normalisation [58] and leaky ReLU layers. In addition, skip connections were applied to connect mirrored layers between the encoder and decoder paths in order to feed different levels of features to the decoder to gain superior reconstruction details. A hyperbolic tangent function was used as the output activation function for the generator. On the other hand, the discriminator $D$ undertook a classification task to differentiate the de-aliased reconstruction $\hat{x}_u$ from the fully-sampled ground truth reconstruction $x_t$. It was formed using a standard CNN architecture with 11 convolutional layers, and each was also followed by batch normalisation and leaky ReLU layers. Finally a dense convolutional layer was cascaded and the sigmoid activation function output the classification results (more detail in Supplementary Material).
3) Content Loss: In order to improve the perceptual quality of our reconstruction, a content loss was designed for the training of the generator. This loss consisted of three parts, i.e., a pixel-wise image domain mean square error (MSE) loss, a frequency domain MSE loss and a perceptual VGG loss. First, the MSE based loss functions can be represented as

$$ L_{iMSE}(\theta_G) = \frac{1}{2} \| x_t - \hat{x}_u \|_2^2, \qquad L_{fMSE}(\theta_G) = \frac{1}{2} \| y_t - \hat{y}_u \|_2^2, $$

in which $y_t$ and $\hat{y}_u$ are the corresponding frequency domain data of $x_t$ and $\hat{x}_u$. The VGG loss is defined as

$$ L_{VGG}(\theta_G) = \frac{1}{2} \| f_{vgg}(x_t) - f_{vgg}(\hat{x}_u) \|_2^2. $$

Together with the adversarial loss of the generator,

$$ L_{GEN}(\theta_G) = - \log D_{\theta_D}(G_{\theta_G}(x_u)), $$

the total loss function can be denoted as

$$ L_{TOTAL} = \alpha L_{iMSE} + \beta L_{fMSE} + \gamma L_{VGG} + L_{GEN}. $$

In this study, normalised MSE (NMSE) was used as the optimisation cost function for the fast CS-MRI reconstruction. However, the solution solely based on the optimisation of the NMSE, which is defined on pixel-wise image difference ($L_{iMSE}$), could result in perceptually nonsmooth reconstructions that often lack coherent image details. Therefore, we added NMSE of the frequency domain data as additional constraints ($L_{fMSE}$) and also an additional VGG loss ($L_{VGG}$) to take the perceptual similarity into account [59]. Once the generator has been trained based on $L_{TOTAL}$, we can apply it to any new inputs (i.e., initial aliased reconstructions after filling zeros into the undersampled k-space), and it will result in the de-aliased reconstruction.
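How the three content-loss parts and the adversarial term might be combined into a single generator objective can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the half squared-error scaling, the assignment of the weights α, β and γ to the image, frequency and VGG terms (in the order the text lists them), and the function names are assumptions. The feature vectors stand in for VGG embeddings, and `d_of_x_hat` stands in for the discriminator output on the de-aliased image.

```python
import math

def half_sq_error(a, b):
    """0.5 * squared L2 distance between two equally long sequences."""
    return 0.5 * sum(abs(u - v) ** 2 for u, v in zip(a, b))

def total_generator_loss(x_t, x_hat, y_t, y_hat, feat_t, feat_hat, d_of_x_hat,
                         alpha=15.0, beta=0.1, gamma=0.0025):
    """Weighted sum of image, frequency, perceptual and adversarial terms (assumed form)."""
    l_imse = half_sq_error(x_t, x_hat)        # pixel-wise image domain term
    l_fmse = half_sq_error(y_t, y_hat)        # frequency domain term
    l_vgg = half_sq_error(feat_t, feat_hat)   # perceptual term on feature embeddings
    l_gen = -math.log(d_of_x_hat)             # adversarial term for the generator
    return alpha * l_imse + beta * l_fmse + gamma * l_vgg + l_gen

# Toy example: only the image-domain term differs, and the discriminator is fooled (D = 1).
loss = total_generator_loss(x_t=[1.0, 0.0], x_hat=[0.0, 0.0],
                            y_t=[1.0, 1.0], y_hat=[1.0, 1.0],
                            feat_t=[0.3], feat_hat=[0.3],
                            d_of_x_hat=1.0)
print(loss)
```

The default weights mirror the values reported in the training settings; they are chosen so that the individual terms contribute at similar magnitudes.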
4) Refinement Learning: It is well known that a deep learning based method might be hard to train due to vanishing or exploding gradient problems. Comprehensive parameter tuning may alleviate these problems, but performance can then vary widely across different parameter settings. The original GAN model is also difficult to train [57] due to the alternating training on the adversarial components. In this study, we proposed a refinement learning to stabilise the training of our DAGAN model, which can yield faster convergence. Essentially, we proposed to use $\hat{x}_u = G_{\theta_G}(x_u) + x_u$. In so doing, we transferred the generator from a conditional generative function to a refinement function, i.e., it only generates the missing information, which can dramatically reduce the complexity of the model learning. In addition, in order to ensure that the de-aliased reconstruction $\hat{x}_u$ is in an appropriate intensity scale as the ground truth, we applied a simple ramp function to rescale the image.
5) Networks and Training Settings: The VGG network [49] in this work was pretrained on ImageNet [60]. In particular, we used the conv4 output of the VGG16 as the encoded embedding of the de-aliased output and the ground truth, and computed the MSE between them. We trained separate networks for different undersampling ratios with the following fixed shared hyperparameters: α = 15, β = 0.1, γ = 0.0025, initial learning rate of 0.0001, batch size of 25. It is of note that the hyperparameters α, β and γ are the weights associated with different loss terms. According to previous research [61] and in practice, we found that it is adequate to set these weights such that the magnitude of different loss terms is balanced into similar scales. We adopted Adam optimisation [62] with momentum of 0.5. Each model was learnt by employing early stopping and the learning rate was halved every 5 epochs.
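The refinement idea described above, in which the generator only produces the missing information that is then added to the zero-filled input, can be sketched as a residual connection followed by a ramp (clipping) function. The intensity range of -1 to 1 is an assumption suggested by the hyperbolic tangent output activation, and `refine`, `ramp` and the toy generator are hypothetical names used only for illustration.

```python
def ramp(value, low=-1.0, high=1.0):
    """Clip a value to [low, high]; the range is an assumption, not taken from the paper."""
    return max(low, min(high, value))

def refine(generator, x_u):
    """De-aliased estimate: generator output (the residual) plus the zero-filled input."""
    residual = generator(x_u)
    return [ramp(r + x) for r, x in zip(residual, x_u)]

def toy_generator(x_u):
    """Stand-in for a trained network: predicts a constant correction of 0.5."""
    return [0.5 for _ in x_u]

x_hat = refine(toy_generator, [-1.0, 0.2, 0.9])
print(x_hat)
```

Because the network only has to model the difference between the aliased input and the target, an untrained generator that outputs values near zero already returns something close to the zero-filled image, which is one intuition for the more stable training.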
Here we want to emphasise that our DAGAN models are robust with minimal parameter tuning, so the same set of hyperparameters was used for all of the following experiments, which cover different undersampling ratios, various undersampling masks, and data with and without noise.
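The training schedule among these fixed settings (initial learning rate of 0.0001, halved every 5 epochs, with early stopping) can be sketched as below. Zero-based epoch counting and the `patience` value are assumptions, since the text does not state how early stopping was triggered; the function names are illustrative.

```python
def learning_rate(epoch, initial=1e-4, halve_every=5):
    """Step schedule: halve the learning rate every `halve_every` epochs (epoch starts at 0)."""
    return initial * 0.5 ** (epoch // halve_every)

def should_stop(val_losses, patience=3):
    """Early stopping: stop once the best validation loss is `patience` epochs old.

    `patience` is a hypothetical setting, not a value reported in the paper.
    """
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

for epoch in (0, 4, 5, 10):
    print(epoch, learning_rate(epoch))
```

A training loop would query `learning_rate(epoch)` at the start of each epoch and call `should_stop` on the running list of validation losses after each epoch.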
6) Data Augmentation: In order to boost the network performance, in addition to conventional data augmentation (e.g., image flipping, rotating, shifting, brightness adjustment and zooming), elastic distortion [63] was also applied to account for non-rigid deformation of the imaged organs.

1) Implementation: Our DAGAN models were implemented using TensorLayer [64], a high-level Python wrapper of the TensorFlow library.
2) Evaluation Methods: We report the NMSE, the Peak Signal-to-Noise Ratio (PSNR in dB) and the Structural Similarity Index (SSIM) [59]. The reconstructed fully sampled k-space data was used as ground truth (GT) for validation. In addition to quantitative metrics, we also evaluated our method using qualitative visualisation of the reconstructed MRI images and the error with respect to the GT.
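Two of the reported metrics can be written down compactly. The sketch below assumes a common definition of NMSE (squared error normalised by the energy of the ground truth) and of PSNR (computed from the mean squared error and a peak value of 1 for normalised images); the exact normalisation used in the paper is not stated in this section, and SSIM is omitted because it requires windowed local statistics.

```python
import math

def nmse(gt, rec):
    """Normalised mean square error: ||gt - rec||^2 / ||gt||^2 (assumed definition)."""
    return sum((g - r) ** 2 for g, r in zip(gt, rec)) / sum(g ** 2 for g in gt)

def psnr(gt, rec, peak=1.0):
    """Peak signal-to-noise ratio in dB for intensities with maximum value `peak`."""
    mse = sum((g - r) ** 2 for g, r in zip(gt, rec)) / len(gt)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

gt = [1.0, 0.0, 1.0, 0.0]     # toy ground truth (flattened image)
rec = [0.9, 0.1, 1.0, 0.0]    # toy reconstruction
print(round(nmse(gt, rec), 4), round(psnr(gt, rec), 2))
```

For 2D images the same functions apply after flattening the pixel arrays into one sequence.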
3) DAGAN Variations: In order to test the effectiveness of different loss components in our cost function, we compared the following DAGAN variations: (1) Pixel-Frequency-Perceptual-GAN-Refinement (PFPGR): the full model using the GAN architecture with pixel-wise MSE, frequency domain data MSE, VGG loss, and refinement learning;

4) Comparison Methods: We compared our DAGAN model with conventional CS-MRI methods including TV [16], SIDWT [65] and RecPF [18], and the state-of-the-art approaches including DLMRI [26], PBDW [23], PANO [66], Noiselet [67], BM3D [50] and DeepADMM [36]. It is worth noting that all the comparison methods and our DAGAN models were initialised using the baseline zero-filling (ZF) reconstruction to achieve a fair comparison study. The initialisation using a prior reconstructed image (e.g., using SIDWT) may boost the performance of some methods, but comes at the cost of a much longer reconstruction. In addition, we only performed minimum parameter tuning for all the methods. For most of these comparison algorithms (TV, SIDWT, RecPF, PBDW and Noiselet), we used two generic stopping criteria: the number of maximum iterations (800) and the improvement tolerance (0.00001), and the reconstruction stops when either criterion is satisfied. However, the fundamental mechanisms of these algorithms are different; therefore, they may have different definitions for the number of iterations. For the DLMRI method, we used the recommended setting, i.e., 40 outer loop iterations with 20 iterations for the K-singular value decomposition algorithm. Similarly, for the BM3D method, we used the recommended number of iterations, e.g., 100. For the PANO method, there is no open source implementation; therefore, we only used the provided executable file to perform the reconstruction (the pre-defined inner loop improvement tolerance is 0.005 [66]). For the DeepADMM method, we performed 500 iterations for the training procedure to avoid possible overfitting and also considered the relatively prolonged training time per iteration. For our DAGAN method we applied the early stopping strategy, which can be considered as an additional and efficient regularisation technique to avoid overfitting [68].
Footnote 5: http://www.brainlesion-workshop.org/
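The two generic stopping criteria used for the iterative comparison methods (a maximum of 800 iterations and an improvement tolerance of 0.00001, whichever is met first) can be sketched with a generic fixed-point loop. The scalar update and the definition of "improvement" as the absolute change between successive iterates are simplifying assumptions; each compared algorithm defines its own update step and convergence measure.

```python
def iterate(step, x0, max_iter=800, tol=1e-5):
    """Run `step` until the change between iterates drops below `tol` or `max_iter` is hit."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = step(x)
        if abs(x_new - x) < tol:   # improvement tolerance reached
            return x_new, n
        x = x_new
    return x, max_iter             # iteration budget exhausted

# Toy update with fixed point 2.0; the change halves at every iteration.
solution, n_iter = iterate(lambda x: 0.5 * x + 1.0, x0=0.0)
print(solution, n_iter)
```

In this toy case the tolerance criterion triggers long before the iteration cap; for a slowly converging update the cap would stop the loop instead.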
To test the noise tolerance of the CS-MRI methods, we synthesised additive white Gaussian noise, which was added to the k-space before applying the undersampling. Although we assumed that the MRI images before adding noise were clean, the actual data were acquired from an MRI scanner and may contain a certain amount of noise. In this study, the baseline noise level (5.5% ± 13%) was calculated using the method described in [69]. It is of note that although the noise model of magnitude MRI images should follow a Rician distribution, the additive Gaussian white noise assumption still holds for the k-space components [70].

1) Comparison of DAGAN Variations: Table I tabulates the quantitative comparison results of DAGAN variations (i.e., PG, PPG, PPGR and PFPGR). Overall, the results using 1D Gaussian masks presented in Table I show that adding the refinement learning and frequency domain constraint (PFPGR) improved the average NMSE and PSNR. For all our DAGAN models, we obtained compelling de-aliasing results compared to the ZF reconstruction that contained significant aliasing artefacts. Figure 2 shows example PFPGR reconstructions by different undersampling ratios (1D Gaussian mask). Qualitatively, there is little difference between the reconstructed images and the GT when the undersampling ratio is ≥20%. For 10% retained data, most of the aliasing artefacts have still been suppressed effectively and we can still obtain an average PSNR > 31dB, but there is obvious loss of structural details (e.g., organ edges). This is because the k-space is highly undersampled and there is significant information loss in the low frequency regions.
Compared to using pixel-wise MSE only (PG), adding the perceptual loss (PPG) produced similar quantitative results; however, qualitative visualisation showed finer reconstruction details without unrealistic jagged artefacts (Figure 3 (l) vs. (m), (n) and (o)). In addition, a horizontal line profile across a randomly selected case (Figure 4) showed that ZF reconstruction still contained a significant amount of artefacts. PG and PPG clearly reduced the aliasing artefacts, and with refinement learning both PPGR and PFPGR achieve more accurate reconstructed line profile compared to the GT (Figure 4).
2) Comparison With Other Methods: Table I also summarises the comparison results of conventional CS-MRI and some representative state-of-the-art methods. Conventional CS-MRI approaches (TV, SIDWT and RecPF) reconstructed images with limited de-aliasing effect, for example, significant remaining aliasing artefacts can be seen in Figure 3 (c), (d) and (e). Dictionary learning (DLMRI) and patch based methods (PBDW and PANO) obtained better de-aliasing, but with clearly over-smoothed reconstruction details (Figure 3 (f), (g) and (h)). Moreover, there are visible aliasing artefacts in the reconstructed images using the Noiselet method (e.g., in Figure 3 (i)). Although both BM3D and DeepADMM worked quite well (Figure 3 (j) and (k)), all our DAGAN models produced visually more convincing reconstructions with much higher SSIM (Figure 3 (l), (m), (n) and (o)).
3) Study on Noise and Masks: Figures 5 and 6 show the PSNR with respect to different noise levels and various undersampling patterns. Our DAGAN models demonstrated certain tolerance to the noise, e.g., our DAGAN models achieved > 30dB PSNR when the noise level is ≤20%. In contrast, other methods were dramatically affected by the noise (Figure 5). For various sampling patterns, our DAGAN models performed better with 2D undersampling masks, and produced superior or comparable reconstruction results with other CS-MRI methods using the same sampling pattern (Figure 6).
4) Zero-Shot Inference on Pathological Cases: Figures 7 and 8 show the reconstruction results using our PFPGR model on example pathological MRI images. It is of note that these results were obtained by using the PFPGR model trained on normal brain MRI images (i.e., randomly selected T1-weighted MRI data), and there was no pathological MRI image used for training. This can also be referred to as a zero-shot inference problem [71]. In general, our DAGAN (PFPGR) model achieved faithful reconstruction with pathological patterns clearly preserved. For example, compared to the ZF reconstruction, our PFPGR model demonstrated superior reconstructed details and better defined brain tumour textures and boundaries (green arrows in Figure 7 (a-c)).

V. DISCUSSION
In this study, we developed a novel conditional GAN based method for fast CS-MRI reconstruction. To the best of our knowledge, the proposed DAGAN model is the first work that incorporates GAN based deep learning for CS-MRI [46]. Overall, our results suggest that the DAGAN model can outperform conventional CS-MRI methods (TV, SIDWT and RecPF) in both qualitative visualisation and quantitative validation. Compared to other state-of-the-art methods (e.g., PANO and BM3D), our DAGAN model can also obtain comparable results. More importantly, reconstruction using DAGAN is much faster than with the other methods (about 0.2 sec per 2D image on a CPU or 5 ms on a dedicated GPU), which makes real-time reconstruction on the MRI scanner feasible.
To ensure a fair comparison, we initialised all the comparison algorithms with the ZF reconstruction, performed minimum parameter tuning and used generic stopping criteria, e.g., the maximum number of iterations and the improvement tolerance. It should be stressed here that the major purpose of this study is to present our DAGAN model for CS-MRI reconstruction, not to benchmark various reconstruction methods; therefore, a comprehensive comparison of different parameter settings of the compared methods is beyond the scope of the current study. In addition, previous studies have demonstrated that different initialisation could affect the final reconstruction, e.g., the PANO algorithm can perform better with an initialisation using the SIDWT results instead of using ZF [66], but this clearly sacrifices the reconstruction efficiency. Schlemper et al. [43] have also shown that a cascade of alternating CNN and a data consistency layer can achieve superior performance. Such an alternating scheme can also be applied to combine our DAGAN model with a data consistency layer or a conventional CS-MRI method; however, this will dramatically reduce the reconstruction efficiency.
Classic fast CS-MRI methods try to solve the image reconstruction using nonlinear optimisation techniques assuming the data is compressible, but normally without considering the prior information of the expected appearance of the anatomy or the possible structure of the undersampling artefacts. This is significantly different from how human radiologists learn to read and interpret MRI images. Radiologists have been trained by reading thousands of MRI images to develop remarkable skills to recognise certain reproducible anatomical and contextual patterns in the images even when known artefacts are present [1], [44]. Our deep learning based DAGAN method aims to imitate this human learning procedure, and therefore shifts the conventional online nonlinear optimisation into an offline training procedure.
In other words, our DAGAN method bridges a substantial gap between conventional non-learning methods solving the inverse problem using information from only a single input, and abundant prior knowledge from large training datasets. Compared to our DAGAN method, dictionary learning based methods usually either utilise a fixed over-complete set of basis functions or form a dictionary learnt directly from data. For the former learning scheme, there is a lack of adaptivity, and for the latter one, the resulting dictionary in sparse coding is not hierarchical as in the deep learning based methods, which in general could provide superior results. In addition, the performance of our DAGAN method is also improved by enriching the training datasets with a comprehensive data augmentation that has not been considered in previous dictionary learning or deep learning based methods [26], [36], [42]-[45]. Once a DAGAN model has been trained, it can be used to infer any new input raw data with the same undersampling ratio. The advantages of the DAGAN model are two-fold: (1) a more complex nonlinear mapping can be learnt through a comprehensive feature extraction by deep learning, and therefore superior reconstruction details can be obtained; (2) the offline training procedure completes the laborious optimisation, and the inference avoids any online iterative updating of the reconstruction, which is therefore much more efficient.
In addition to the proposed application of a conditional GAN architecture, a perceptual loss is incorporated to improve the reconstructed image quality in terms of visually more convincing anatomical or pathological details. The idea of the perceptual loss is loosely inspired by GAN based super-resolution (SR) [59], which has demonstrated that adding the perceptual loss can achieve better qualitative performance. Our simulation results have also confirmed this, which can be attributed to the fact that when reconstructing highly undersampled k-space data, the PG method without perceptual loss can only find an optimal solution that satisfies the MSE criterion but may not perceptually resemble the real data. In addition, our perceptual loss is formed using a pretrained VGG network, which is used to extract high-level features of MRI images; therefore, no natural-looking structures have been hallucinated in the reconstructed MRI images. Compared to the SR application, CS-MRI solves a more general inverse problem to recover data from undersampled measurements, in which the undersampling pattern is random and the propagation of noise and artefacts is global due to the frequency domain operation (compared to the regular downsampling pattern and local artefacts in SR). Therefore, CS-MRI is a more challenging problem to solve. Furthermore, our DAGAN model can be generalised to solve SR, and it is also applicable to tomographic reconstruction for other imaging modalities, e.g., Computed Tomography and Positron-Emission Tomography.
Together with the perceptual loss, the content loss of our DAGAN model also incorporates MSE loss terms that consider both pixel and frequency domain information. In a preliminary study (results shown in Supplementary Material Section 3), the DAGAN model without content loss (using only the adversarial loss) could not achieve acceptable reconstruction. Moreover, the training using only the adversarial loss with refinement learning could not converge. This may be due to the fact that the content loss provides effective constraints that regularise the generator to synthesise a reasonable reconstruction instead of arbitrary image features.
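As a rough illustration of how the pixel-domain and frequency-domain MSE terms can be combined with a perceptual term, consider the following sketch. The function names, the weight names `alpha`, `beta` and `gamma`, and their default values are assumptions made for this example; the perceptual term is passed in as a precomputed number because VGG feature extraction is outside the scope of the sketch, and the adversarial loss is omitted.

```python
import numpy as np

def pixel_mse(recon, target):
    """Mean squared error in the image domain."""
    return float(np.mean(np.abs(recon - target) ** 2))

def frequency_mse(recon, target):
    """Mean squared error between the 2D Fourier transforms of both images."""
    k_recon = np.fft.fft2(recon)
    k_target = np.fft.fft2(target)
    return float(np.mean(np.abs(k_recon - k_target) ** 2))

def content_loss(recon, target, perceptual_term=0.0,
                 alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three content terms (weights are hypothetical)."""
    return (alpha * pixel_mse(recon, target)
            + beta * frequency_mse(recon, target)
            + gamma * perceptual_term)

# Toy usage with synthetic images (illustrative only).
rng = np.random.default_rng(1)
target = rng.standard_normal((16, 16))
recon = target + 0.05 * rng.standard_normal((16, 16))
loss = content_loss(recon, target, perceptual_term=0.0)
```

In a training loop, a term of this kind would be added to the generator's adversarial loss, so that the generator is penalised for departing from the ground truth in either domain.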
In the context of CS-MRI, one immediate question is whether the conditional GAN architecture would synthesise any unrealistic image details in the reconstruction. We studied this by visually scrutinising our reconstruction results. On thorough inspection, we observed only residual aliasing artefacts when the undersampling ratio is high (≤20%); there were no unnatural synthesised image details in the intermediate results or the final reconstruction (Figure 9). This may be due to the fact that the input of our DAGAN is not totally random, and the ZF reconstruction provided a reasonable initialisation for DAGAN to perform the de-aliasing. In addition, the proposed refinement learning can substantially stabilise the training of the GAN (Figure 10), which is known to be difficult. We noticed that in a previous study on CS based Computed Tomography reconstruction, a similar residual learning method was proposed [72]. It was used in a U-Net based architecture to learn the residual image instead of the full-view reconstruction of the Computed Tomography image. Furthermore, their persistent homology analysis demonstrated that the manifold of the full-view reconstruction is topologically more complex than that of the residual image, and therefore the residual learning performed better. In our study, a similar refinement learning technique has been applied for training the conditional GAN model. In particular, refinement learning can constrain the generator to reconstruct only the missing details, and prevent it from generating arbitrary features that may not be present in real MRI images. In fact, due to limited network capacity and uneven real data distribution, the discriminator may struggle to identify these unrealistic features, and in turn could incorrectly encourage the generator to reconstruct such arbitrary features.
Our convergence analysis (Figure 10) has demonstrated that the proposed refinement learning can dramatically reduce the training complexity of GAN with stabilised and fast-converging training.
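The idea behind refinement learning, namely that the generator predicts only the missing details and the ZF reconstruction is added back, can be sketched as follows. The names `refine` and `toy_generator` are hypothetical, and the toy generator is merely a stand-in for the U-Net; any output clipping or rescaling used in practice is not shown.

```python
import numpy as np

def refine(zero_filled, generator):
    """Return the ZF image plus the residual predicted by `generator`."""
    residual = generator(zero_filled)
    return zero_filled + residual

def toy_generator(image):
    """Hypothetical stand-in for the U-Net: predicts a small correction."""
    return -0.1 * (image - image.mean())

# Toy usage with a synthetic input (illustrative only).
rng = np.random.default_rng(2)
zero_filled = rng.standard_normal((16, 16))
refined = refine(zero_filled, toy_generator)
```

With this formulation, a generator that outputs zeros simply returns the ZF reconstruction, so training starts from a reasonable image rather than from an arbitrary one.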
In order to demonstrate that hyperparameter tuning does not significantly affect the DAGAN reconstruction results, we performed a robustness analysis by scaling one hyperparameter to 0.5×, 5× and 10× of the setting used while fixing the other two. For comparison of different hyperparameter settings, statistical significance was assessed using a two-sample Wilcoxon rank-sum test. Figure 11 shows that only when we set β to 5×β did we obtain clearly worse results (p = 0.012); with the other parameter settings, there were no significant differences in the reconstruction results. In addition, our DAGAN method has shown improved tolerance to additive noise.
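For readers unfamiliar with the test, a two-sample Wilcoxon rank-sum comparison of two sets of per-image scores can be sketched as below. This is a minimal standard-library version using the normal approximation without tie or continuity correction; the function names are hypothetical, and the score lists are made-up toy numbers, not results from this study.

```python
import math

def midranks(values):
    """Return 1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (z statistic, p-value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    return z, p

# Toy scores for two hypothetical hyperparameter settings.
setting_a = [1.0, 2.0, 3.0, 4.0, 5.0]
setting_b = [6.0, 7.0, 8.0, 9.0, 10.0]
z, p = rank_sum_test(setting_a, setting_b)
```

A small p-value indicates that the scores from one setting tend to rank systematically higher or lower than those from the other.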
Interestingly, the zero-shot inference performed well on pathological cases. The visualisation results showed various brain lesions clearly without any distortion of the lesions or any new lesions being synthesised. It is of note that the training data used for our DAGAN model were normal brain images acquired with a T1-weighted MRI sequence while the pathological cases were acquired using different T1-weighted or FLAIR sequences. We have shown that our DAGAN model can still reconstruct these images (Figure 7(c), (f), (i)). For cardiac MRI images (Figure 8), the main features of cardiac anatomy were reconstructed reasonably well, although there were some artefacts introduced in the blood pool regions and some loss of fine structural detail. The obvious information loss around the peripheral regions (pink boxes in Figure 8) may be due to the fact that our DAGAN model has been trained using normal brain MRI, and during the inference of cardiac MRI images, the DAGAN model enforces the peripheral regions to be zero as for the brain MRI images. For cardiac MRI, this peripheral information loss is clinically unimportant.

VI. CONCLUSION
The presented study proposes a conditional GAN-based deep learning method for fast CS-MRI reconstruction. The proposed DAGAN method has outperformed conventional CS-MRI approaches and achieved reconstruction quality comparable to that of newly developed methods, while remarkably reducing the processing time (from seconds to milliseconds per 2D slice), which enables possible real-time application. By combining it with existing MRI scanning sequences and parallel imaging, we envisage that this simulation based study can be translated to the real clinical environment.