Improved generative adversarial networks using the total gradient loss for the resolution enhancement of fluorescence images

Because of the optical properties of medical fluorescence images (FIs) and hardware limitations, light scattering and diffraction constrain the image quality and resolution. In contrast to device-based approaches, we developed a post-processing method for FI resolution enhancement by employing improved generative adversarial networks. To overcome the drawback of fake texture generation, we propose a total gradient loss for network training. A fine-tuning training procedure was applied to further improve the network. Finally, the more suitable network for resolution enhancement was applied to actual FIs to produce sharper and clearer boundaries than in the original images. © 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement


Introduction
Intraoperative near-infrared (NIR) fluorescence imaging is an emerging clinical imaging modality that can effectively assist various kinds of surgical treatments and is attracting increasing attention from both the imaging and surgical fields [1]. As it utilizes NIR fluorescence probes and specially designed optical imaging systems for real-time visualization during surgery, it is non-radioactive, portable, and relatively cost-effective [2][3][4]. Typical applications include sentinel lymph node detection [5,6], tumor visualization [4], and the identification of other vital structures [7,8]. Clinical applications based on this technique mainly depend on the fluorescence contrast between the target area and surrounding tissues. Such differences can be caused by delivering a contrast agent with a spatially varying concentration in the tissues. The contrast agent can also be designed with a targeting probe for both diagnostic and therapeutic purposes in specific biochemical environments [9]. When illuminated by an excitation light, the contrast agent with various concentrations in tissues emits a fluorescent signal, which is received by a charge-coupled device (CCD) camera for imaging. The migration of the excitation and emission photons through the tissues is likely to cause the fluorescence signal to disperse and be lost in space [10]. Because of the optical limitations of light scattering and diffraction as well as hardware restrictions, medical fluorescence images (FIs) suffer from a relatively low contrast and reduced spatial resolution at the boundaries. This is problematic in cases where a fine analysis of the fluorescence concentration is required, such as photodynamic therapy with photosensitizer measurements in tissue [11] and recognizing vessels or nerves in vivo [12,13].
For the resolution enhancement of fluorescence images, many previous studies focused on the processing of microscopy fluorescence images [14,15], but for clinical fluorescence images, most approaches still rely on improving the hardware performance. Even though many groups have developed hardware-based methods over the last 5-10 years to improve the FI resolution [16][17][18], post-processing techniques are still an appealing approach to alleviate the limitations of optical properties and hardware. For natural image resolution enhancement, deep learning methods have achieved strong image recovery performance with high quality and relatively sharp edges [19][20][21][22][23][24]. SRGAN was proposed by Ledig et al. and implements a deep residual network to recover more realistic photos from heavily down-sampled natural images [25]. SRGAN can reconstruct more perceptually convincing images than the other state-of-the-art deep learning methods that are not based on GAN [25]. However, SRGAN also generates fake textures to sharpen images, which should be minimized for medical applications. More suitable networks based on deep learning methods should be developed that provide fewer fake textures and high-contrast boundaries for FI resolution enhancement.
We propose a novel FI resolution enhancement method that uses the total gradient loss to improve generative adversarial networks (GANs) and produce both sharpened edges and fewer artifact textures. To simulate low-resolution FI reconstruction, we first down-sampled the images and then trained our network with pairs of original and re-up-scaled images with a 4× scale factor. Compared to SRGAN, our proposed method performed better with both the down-sampled FI dataset and the original resolution plate. Experiments on a noise-affected resolution plate further illustrated the effectiveness and robustness of the network for image enhancement. Furthermore, we tested our method on a real FI of mouse tail blood vessels and a video of intraoperative fluorescence imaging acquired from a breast cancer surgery. The results showed image resolution enhancement with sharpened edges.

Methods
In this section, we describe the principle of fluorescence imaging and simplify the problem of low resolution with a down-sampled and re-up-scaled function model. Then, the proposed FI resolution enhancement method based on GAN is presented. To address the problem of fake textures, we propose total gradient loss to train the network. We then present a fine-tuning training procedure for the network architecture improvement based on the microscopy FI training dataset.

Problem formulation
The purpose of this study was to recover low-resolution FIs with sharpened boundaries and high-resolution quality. A low-resolution FI is caused by many factors, which we divided into three main groups.
1) Fig. 1 shows the basic principle for intraoperative NIR fluorescence imaging. After a short excitation light pulse, light photons pass through the thick tissue and reach the target area where the fluorescence contrast agent has accumulated. When the excitation light photon is absorbed by a fluorophore, a new fluorescence photon is launched at a longer wavelength and is eventually absorbed or emitted at the surface to be received by a CCD camera with an appropriate filter. To simplify the photon propagation process in the tissue, we omitted other factors that introduce complexity, such as the reabsorption of fluorescence photons, diffuse reflection of the excitation light on the tissue surface, and fluorescence quenching of the contrast agent over time [10]. The simplified optical propagation model is usually described as a point spread function (PSF) [26]. The PSF of an isotropic point source is often approximated as a Gaussian function [26]:
$$I(x, y) = I_0 \exp\left(-\frac{(x - x_0)^2 + (y - y_0)^2}{2\sigma^2}\right)$$
where $\sigma$, which is determined by the fluorophore in the specimen, specifies the width of the PSF, $I_0$ is the peak intensity, which is proportional to the photon emission rate and decreases because of the photon absorption effect, and $(x_0, y_0)$ is the location of the fluorophore. Optical properties (e.g., light absorption, scattering, and diffraction) change the photon propagation path and absorb part of the fluorescent emission photons, which leads to blurring and low resolution of the FI [27]. In addition, different tissues have different absorption and scattering properties, and these properties vary even between different parts of the same tissue [28]. Thus, optical scattering and diffraction reduce the signal-to-noise ratio (SNR), which limits the resolution of fluorescence images. Moreover, the complexity and inconsistency of the light propagation procedure make it difficult to describe these phenomena with precise mathematical models.
2) Hardware limitations are another factor that limits the quality of FIs. The resolution of an FI is mainly determined by the sampling rate of the camera, the numerical aperture (NA) of the lens, and the SNR of the overall system. Thus, a low resolution can be directly improved by using a better imaging system with higher specifications, but this often increases the cost [29]. Furthermore, a high-quality imaging system also increases the device volume and weight, which inevitably reduces the ergonomics and portability. There is a trade-off between the imaging resolution, ergonomics, and cost when designing an intraoperative NIR fluorescence imaging system [30]. Therefore, it is worthwhile to improve the imaging quality through image post-processing methods rather than only upgrading the hardware setups.
3) Adverse ambient light conditions and low contrast agent accumulation normally lead to a low SNR during intraoperative NIR fluorescence imaging and blur the imaging contrast [31]. In such situations, operators frequently enhance the image contrast to elucidate object details by manually extending the exposure bar so that the dynamic range of the camera can be fully utilized. However, this excessive enhancement actually damages the overall resolution and visual effects, as even more noise is introduced into the image [32].
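The Gaussian PSF approximation mentioned in item 1) can be illustrated with a minimal sketch. The function names, the grid size, and the choice of centering the fluorophore on the grid are assumptions made for illustration only; the sketch evaluates I(x, y) = I0 · exp(−((x − x0)² + (y − y0)²) / (2σ²)) on a small pixel grid and is not the authors' imaging model.

```python
import math

def gaussian_psf(size, sigma, i0=1.0, x0=None, y0=None):
    """Sample a Gaussian PSF on a size x size pixel grid.

    i0 is the peak intensity and (x0, y0) the fluorophore location;
    by default (an assumption) the fluorophore sits at the grid center.
    """
    if x0 is None:
        x0 = (size - 1) / 2
    if y0 is None:
        y0 = (size - 1) / 2
    return [
        [i0 * math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]

def normalize(kernel):
    """Scale the kernel so its entries sum to 1 (useful as a blur kernel)."""
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]

psf = gaussian_psf(size=5, sigma=1.0)
blur_kernel = normalize(psf)
```

A larger σ spreads the same point source over more pixels, which is the blurring effect the text attributes to scattering and diffraction.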
Because the causes of the limited resolution are many and complex, developing a single mathematical expression is difficult. However, the common features among the factors mentioned above are a loss of photon signals and diffraction effects. Therefore, we can simulate this low-resolution problem as a down-sampling and then re-upscaling procedure:
$$Y = U(D(X))$$
where X is the high-resolution FI, D is the down-sampling operator, U is the up-scaling operator based on the bicubic interpolation method, and Y is the low-resolution FI. We applied the deep learning method to fit the inverse process from Y to $\hat{X}$. The objective is to minimize the differences between X and $\hat{X}$.
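The degradation model described above (down-sampling followed by bicubic re-upscaling to the original size) can be sketched in plain Python. Two choices here are assumptions, not details taken from the paper: the down-sampling operator is implemented as block averaging, and the bicubic interpolation uses the common Keys cubic convolution kernel with a = −0.5 and edge clamping.

```python
import math

def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an assumed, common choice)."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0

def upscale_1d(row, s):
    """Bicubic-style interpolation of a 1-D signal by an integer factor s."""
    n = len(row)
    out = []
    for o in range(n * s):
        x = (o + 0.5) / s - 0.5          # position in source coordinates
        base = math.floor(x)
        acc = 0.0
        for m in range(-1, 3):           # four nearest source samples
            idx = min(max(base + m, 0), n - 1)   # clamp at the borders
            acc += cubic_weight(x - (base + m)) * row[idx]
        out.append(acc)
    return out

def upscale(img, s):
    """Operator U: separable interpolation along rows, then along columns."""
    rows = [upscale_1d(r, s) for r in img]
    cols = [upscale_1d(list(c), s) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def downsample(img, s):
    """Operator D: average each s x s block (assumed form of down-sampling)."""
    h, w = len(img) // s, len(img[0]) // s
    return [
        [sum(img[i * s + di][j * s + dj] for di in range(s) for dj in range(s)) / (s * s)
         for j in range(w)]
        for i in range(h)
    ]

def degrade(x, s=4):
    """Simulated low-resolution image Y = U(D(X)) with scale factor s."""
    return upscale(downsample(x, s), s)

# A sharp vertical edge becomes a blurred edge of the same image size.
hr = [[1.0 if j >= 4 else 0.0 for j in range(8)] for _ in range(8)]
lr = degrade(hr, s=4)
```

The output has the same pixel dimensions as the input, so HR and LR images can be paired pixel by pixel for training.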

General GAN for resolution enhancement
Goodfellow et al. [33] proposed the GAN, which can be used to generate higher-quality samples (x = G(y)) from a distribution of a low-resolution dataset y. GAN can be formulated as a minimax game between a generator network G and a discriminator network D [33]:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{y \sim P_y(y)}[\log(1 - D(G(y)))]$$
where y is sampled from the distribution $P_y(y)$ of the low-resolution images and x is from the distribution $P_{data}(x)$ of the real dataset with high-resolution images. The discriminator is responsible for judging the gap between the synthesized fake sample G(y) and real sample x. Meanwhile, the results are presented in the form of scores that are fed back to the generator network. Thus, the output of G is continuously optimized through the minimax game. GAN has the advantage of solving the regression problem of image recovery. SRGAN is a state-of-the-art method for natural image resolution enhancement; it is a GAN-based network optimized for a perceptual loss that is calculated on feature maps of the Visual Geometry Group (VGG) network, as described by Simonyan and Zisserman [34]. The perceptual loss is defined as the Euclidean distance between the feature representations of a reconstructed image $G(I^{LR})$ and reference image $I^{HR}$:
$$l_{VGG} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \left(\varphi(I^{HR})_{i,j} - \varphi(G(I^{LR}))_{i,j}\right)^2$$
where $\varphi$ is the feature map obtained by the VGG network and W and H are the dimensions of the respective feature maps. A deep residual network is applied in the generator of SRGAN to decrease the losses, including the content loss and adversarial loss. Considering the unstable characteristics of the GAN training procedure, many improvements have been proposed, such as the Wasserstein GAN [35], improved Wasserstein GAN with gradient penalty [36], least-squares GAN [37], deep convolutional GAN [38], and loss-sensitive GAN with the Lipschitz density [39]. Many have been proven to significantly enhance the performance in certain applications.
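The perceptual loss described above (a squared Euclidean distance between feature maps, averaged over the W × H feature positions) can be sketched as follows. The feature extractor here is a hypothetical stand-in, a horizontal difference filter named `toy_features`; in SRGAN the features come from a pretrained VGG network, which is not reproduced in this sketch.

```python
def toy_features(img):
    """Hypothetical stand-in for a VGG feature map: horizontal differences."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in img]

def feature_mse(phi_hr, phi_sr):
    """Mean squared distance between two feature maps of equal size."""
    h, w = len(phi_hr), len(phi_hr[0])
    if len(phi_sr) != h or any(len(r) != w for r in phi_sr):
        raise ValueError("feature maps must have the same dimensions")
    total = sum(
        (a - b) ** 2
        for row_a, row_b in zip(phi_hr, phi_sr)
        for a, b in zip(row_a, row_b)
    )
    return total / (w * h)

def perceptual_loss(hr_img, sr_img, features=toy_features):
    """Compare images in feature space rather than pixel space."""
    return feature_mse(features(hr_img), features(sr_img))

reference = [[1, 2, 4], [0, 0, 3]]
reconstruction = [[1, 2, 4], [0, 0, 5]]
loss = perceptual_loss(reference, reconstruction)
```

Because the comparison happens on features, two images that differ only by a constant brightness offset give zero loss with this toy extractor, which illustrates why perceptual losses are usually combined with pixel-wise terms.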

Network architecture and loss functions
The overall framework of our FI resolution enhancement method is shown in Fig. 2. We simulated low-resolution (LR) images by first down-sampling the high-resolution (HR) images and then re-upscaling them to the original size through bicubic interpolation with a 4× scale factor. Then, pairs of HR and LR images were imported to the networks for training. We used a least-squares GAN model [37] with seven dense residual blocks [40] and spectral normalization [41] in the generator to overcome the low-resolution problem. The primary objective functions of the GAN used in this study are the discriminator loss $L_{FI\_GAN}(D)$ and the generative loss $L_{FI\_GAN}(G)$, where D(x) is the discriminator score of the real image with high resolution, D(G(y)) is the discriminator score of the fake image produced by the generator network, and $L_{Adv}(G)$ is the adversarial loss, which is one part of the generative loss $L_{FI\_GAN}(G)$.
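For orientation, the standard least-squares GAN objectives [37] with target labels 0 (fake) and 1 (real) can be written with the symbols used in this section. This is the textbook form and is given here as an assumption; the exact weighting and label choices used in this study may differ.

```latex
\begin{aligned}
L_{FI\_GAN}(D) &= \tfrac{1}{2}\,\mathbb{E}_{x \sim P_{data}(x)}\big[(D(x) - 1)^2\big]
               + \tfrac{1}{2}\,\mathbb{E}_{y \sim P_y(y)}\big[D(G(y))^2\big], \\
L_{Adv}(G)     &= \tfrac{1}{2}\,\mathbb{E}_{y \sim P_y(y)}\big[(D(G(y)) - 1)^2\big].
\end{aligned}
```

In this form the discriminator is pushed to score real images near 1 and generated images near 0, while the generator is pushed to make D(G(y)) approach 1.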

Research Article
Vol. 10, No. 9 / 1 September 2019 / Biomedical Optics Express 4746

The generative loss has three parts: the content loss $L_X(G)$, the adversarial loss mentioned above, and the total gradient loss. The content loss includes the MSE loss, L1Smooth loss, and perception loss, which further includes VGG_loss and ResNet_loss.
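The pixel-wise parts of the content loss can be sketched as below. The smooth L1 definition is the commonly used one (quadratic below a threshold `beta`, linear above it). The weight names `w_mse`, `w_l1`, and `w_perc` are hypothetical placeholders: this section does not state how the paper's weighting parameters map onto the individual terms, so no such mapping is implied.

```python
def mse_loss(pred, target):
    """Mean squared error over flattened pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def smooth_l1_loss(pred, target, beta=1.0):
    """Smooth L1: 0.5*d^2/beta for |d| < beta, otherwise |d| - 0.5*beta."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def content_loss(pred, target, perceptual=0.0, w_mse=1.0, w_l1=1.0, w_perc=1.0):
    """Weighted sum of the content terms (weights are illustrative only).

    `perceptual` is a precomputed feature-space loss value, e.g. from a
    pretrained network, passed in as a number.
    """
    return (w_mse * mse_loss(pred, target)
            + w_l1 * smooth_l1_loss(pred, target)
            + w_perc * perceptual)

value = content_loss([0.0, 0.0], [1.0, 3.0])
```

The smooth L1 term grows linearly for large errors, so it is less dominated by outlier pixels than the MSE term.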

Total gradient loss
The total gradient (TG) loss was proposed to mitigate the problem of fake textures produced by the SRGAN method. This approach was inspired by the total variation [42], a denoising algorithm that minimizes the gradients between each pixel and its adjacent pixels in an image. In this study, we compared the pre-defined gradients (Eq. 9) of the generated and real images to develop a loss function (named the TG loss) that provides feedback for network training. In the mathematical expression of the TG loss $L_{TG}$, $u^{x}_{i,j}$ and $u^{y}_{i,j}$ are the pixel values of the original high-resolution image X and low-resolution image Y, respectively; $\Delta U_w$ is the gradient of the image along the width, and $\Delta U_h$ is the gradient of the image along the height; $\Delta U^{XY}_w$ is the total difference between the gradients of X and Y along the width, and $\Delta U^{XY}_h$ is that along the height; $N_h$ is the total pixel number along the height, and $N_w$ is the total pixel number along the width; k is the pixel offset in each loop, ranging from 1 to kmax, and kmax is half of the number of pixels along the width (or the height) of the image.
We designed the TG loss to train our network because it compares not only the gradients between adjacent pixels in the generated and original images but also the gradients between distant pixels in these two images (Fig. 3). For the detailed texture of an image, we assume that each pixel is related to its surrounding neighbors as well as to more distant pixels. Comparing the long-distance gradients of corresponding pixels between the two images helps to reduce their differences. The TG loss takes this assumption into account, so that fake textures can be minimized in the generated images.
Setting the maximum offset kmax to half of the total pixel number N ($N_w$ or $N_h$) includes all pixels of the image in the calculation process. When kmax is greater than N/2, some pixels are excluded from the loss calculation, and the proportion of the other pixels in the loss calculation increases. This imbalance introduces inhomogeneities. When kmax is equal to 1, only adjacent pixel variations are included in the loss calculation between the two images. Furthermore, controlled experiments were conducted to evaluate the optimal value of kmax. For these reasons, in order to generate an image with enhanced resolution and suppressed fake texture, we employed the difference in TG between the original and generated images as the training loss.
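The idea behind the TG loss can be illustrated with a small sketch based only on the verbal description above: for every offset k from 1 to kmax, gradients at distance k are computed along the width and the height in both images, and the differences between the two sets of gradients are accumulated. This is one plausible reading, not the authors' exact formula. In particular, the use of absolute differences and the normalization by the number of compared pixel pairs are assumptions.

```python
def tg_loss(x, y, kmax_w=None, kmax_h=None):
    """Sketch of a total-gradient-style loss between images x and y.

    x, y: 2-D lists of equal size (reference and generated image).
    kmax_w, kmax_h: largest pixel offset along width and height;
    by default half of the image size, as described in the text.
    """
    h, w = len(x), len(x[0])
    if kmax_w is None:
        kmax_w = max(1, w // 2)
    if kmax_h is None:
        kmax_h = max(1, h // 2)

    total, count = 0.0, 0
    # Gradients along the width: pixel (i, j + k) minus pixel (i, j).
    for k in range(1, kmax_w + 1):
        for i in range(h):
            for j in range(w - k):
                gx = x[i][j + k] - x[i][j]
                gy = y[i][j + k] - y[i][j]
                total += abs(gx - gy)   # assumed L1 comparison
                count += 1
    # Gradients along the height: pixel (i + k, j) minus pixel (i, j).
    for k in range(1, kmax_h + 1):
        for i in range(h - k):
            for j in range(w):
                gx = x[i + k][j] - x[i][j]
                gy = y[i + k][j] - y[i][j]
                total += abs(gx - gy)
                count += 1
    return total / count if count else 0.0

# A smooth ramp versus the same ramp with a stripe artifact on odd columns.
ramp = [[float(j) for j in range(4)] for _ in range(4)]
striped = [[j + (0.5 if j % 2 else 0.0) for j in range(4)] for _ in range(4)]
loss_striped = tg_loss(ramp, striped)
```

With kmax = 1 the sketch reduces to a comparison of adjacent-pixel gradients only, which corresponds to the FI-GAN-TG-1 setting discussed in the experiments. The nested loops are written for readability; a practical implementation would use vectorized tensor slicing.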

Networks and training settings
In this study, we used the pretrained VGG16 [34] and ResNet152 [43] models to calculate the perception losses. In the first training procedure, the network was trained with a loss function having the following hyperparameters: $\varepsilon = 10^{-3}$, $\alpha = 1$, $\beta = 1$, $\gamma = 8 \times 10^{-3}$, $\eta = 10^{-2}$, $\theta = 1$, which are the parameters in Eqs. (7) and (8). These parameters were chosen according to the loss changes and the validation dataset performance, including the generator loss and the PSNR (dB) calculated at the end of each epoch during the training procedure. We adopted Adam optimization with a learning rate of $1 \times 10^{-3}$ for both the generative and adversarial networks. To speed up the training process and image generation procedure as well as to simulate the photon propagation properties in thick tissues, we adopted the network architecture shown in Fig. 2. In the network, we first down-sampled the low-resolution images by strided convolution [38] to reduce the image size and speed up the calculation. Meanwhile, we extracted the main features by learning the parameters of the strided convolution kernel. Then, ResNet and sub-pixel convolution layer [44] structures were applied to upscale the images. Considering the structural similarity between low-resolution FIs and the simulated low-resolution images, a scale factor of 4× was selected for this study. We first trained the network by using natural image datasets with a batch size of 64 and a random crop size of 64. A validation dataset was used to evaluate the network performance at every epoch. The first training step was run for 200 epochs, and the epoch with the highest peak signal-to-noise ratio (PSNR) [45] calculated for the validation set was selected as the starting epoch for the subsequent fine-tuning procedure. Both the down-sampling (strided convolution) and up-sampling (sub-pixel convolution layer) procedures were performed within the networks, and all were trained together during the training process.
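The rearrangement step of a sub-pixel convolution layer [44] can be sketched in plain Python. The function below only performs the channel-to-space shuffle that follows the convolution; it uses the same channel ordering as the `PixelShuffle` layer in PyTorch. How the 4× factor is split across layers in the authors' network is not stated here, so the upscale factor `r` is left as a parameter.

```python
def pixel_shuffle(t, r):
    """Rearrange C*r*r channels of size H x W into C channels of size rH x rW.

    t: nested list indexed as t[channel][row][col].
    r: integer upscale factor.
    """
    c_in, h, w = len(t), len(t[0]), len(t[0][0])
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # sub-pixel row offset
            for j in range(r):      # sub-pixel column offset
                src = t[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out

# Four 1x1 channels become one 2x2 channel.
features = [[[0]], [[1]], [[2]], [[3]]]
upscaled = pixel_shuffle(features, 2)
```

Because the up-scaling is a pure rearrangement of learned feature channels, the preceding convolution can be trained end to end together with the strided down-sampling layers.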

Fine-tuning learning
To obtain more suitable networks for medical FIs, a fine-tuning procedure was performed during the second training process. For the fine-tuning procedure, a microscopic FI dataset including images with sharp edges was used for training. Based on experimental results, we adjusted the parameters of the loss function in Eqs. (7) and (8) as follows: $\varepsilon = 1$, $\alpha = 10$, $\beta = 10$, $\gamma = 8$, $\eta = 1$, $\theta = 10$. Adam optimization with a learning rate of $1 \times 10^{-6}$ was performed during the fine-tuning training procedure. The batch size was 64, and the crop size was 32. To avoid overfitting to the fine-tuning training set, we used a validation set to monitor the PSNR (dB) at every epoch and obtained better networks by stopping early.
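The early-stopping rule described above (monitor the validation PSNR at every epoch and keep the best network) can be sketched as follows. The `patience` parameter and the PSNR values in the example are hypothetical and only serve to illustrate the selection logic.

```python
def select_best_epoch(val_psnr, patience=None):
    """Return (best_epoch, best_psnr) from a per-epoch validation PSNR list.

    If patience is given, scanning stops once the PSNR has not improved
    for that many consecutive epochs (a common early-stopping criterion).
    """
    best_epoch, best = 0, float("-inf")
    for epoch, value in enumerate(val_psnr):
        if value > best:
            best, best_epoch = value, epoch
        elif patience is not None and epoch - best_epoch >= patience:
            break
    return best_epoch, best

# Illustrative values only: the PSNR rises, peaks, and then declines.
history = [30.1, 30.8, 31.2, 31.0, 30.7]
epoch, psnr_db = select_best_epoch(history, patience=2)
```

In practice the network weights of the selected epoch would be saved as a checkpoint so that the peak network can be restored after training stops.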

Implementation
We implemented our FI resolution enhancement method with PyTorch 0.4.1 on Python 3.6. The code was run on a GPU (NVIDIA GeForce RTX 2070, 8 GB) and CPU (Intel Core i7-6700 @ 3.40 GHz).

Evaluation methods
The root mean square error (RMSE) [46], PSNR (dB), and structural similarity index (SSIM) [47] were used to compare SRGAN and the improved FI resolution enhancement method. SRGAN was implemented in PyTorch [48]. FIs were divided into validation and test datasets for the training and evaluation of the networks. We adopted resolution plates, resolution plates with Poisson noise, an original FI of blood vessels in a mouse tail, and an intraoperative NIR fluorescence imaging video to test the performance of our FI resolution enhancement method.
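Two of the evaluation metrics named above, RMSE and PSNR, follow directly from their standard definitions and can be sketched as below. The peak value of 255 assumes 8-bit images, which is an assumption about the data range rather than a detail given in the text. SSIM is omitted because its windowed statistics make it considerably longer.

```python
import math

def rmse(a, b):
    """Root mean square error between two equally sized 2-D images."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB: 20 * log10(peak / RMSE)."""
    err = rmse(a, b)
    if err == 0:
        return float("inf")   # identical images
    return 20 * math.log10(peak / err)

truth = [[0, 0], [0, 0]]
estimate = [[0, 0], [0, 10]]
error = rmse(truth, estimate)
quality_db = psnr(truth, estimate)
```

A lower RMSE and a higher PSNR both indicate that the reconstructed image is closer to the ground truth.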

Experiments and results
Experiments were performed as explained above. We evaluated the network performance with a macro FI test dataset. Note that the test images were down-sampled, and the original images were used as the ground truth for comparison. The experimental results are presented below.

Dataset preparation and experimental settings
The first training procedure was performed on natural image datasets (VOC2012 and DIV2K). The corresponding validation dataset consisted of four images that were randomly picked from CImageNET400 and constrained by the memory storage of the GPU. After the first training procedure using natural images, we also employed the second training (the fine-tuning procedure). Because our method was aimed to enhance the resolution of macro FIs, we needed to establish the fine-tuning training dataset with higher resolution FIs. Thus, we chose microscopy FIs with good image quality, along with the 4× down-sampled and re-up-scaled processing, as the second fine-tuning training dataset. Data augmentation strategies (e.g., cropping, cutting, adjusting the brightness) were used to expand the volume of the fine-tuning training dataset. The parameters were set according to the training procedure described above. The fluorescence microscopy images for fine-tuning training were taken from https://storage.googleapis.com/insilico-labeling/data_sample.zip.
To assist the fine-tuning procedure, we further established a fine-tuning validation dataset, which consisted of 10 images from four intraoperative NIR fluorescence imaging videos. Furthermore, a test dataset consisting of 60 images from the other four NIR fluorescence imaging videos was used to evaluate the overall performance of the proposed method. To avoid overfitting caused by FI similarity, all images extracted from the NIR videos were sampled so that they differed obviously from one another. The intraoperative NIR fluorescence imaging videos were acquired by the Peking University People's Hospital (clinical trial number: NCT02611245 in ClinicalTrials.gov).

Fluorescence image dataset test results
We first applied our training procedure to a set of down-sampled and re-up-scaled natural images and made the networks learn this transform to recover the original high-resolution images. To evaluate the effects of the total gradient loss, the TG loss parameters, and the fine-tuning procedure on the network training results, we trained the networks separately: without the TG loss (FI-GAN-NOTG) and with the TG loss for kmax = 1 (FI-GAN-TG-1), kmax = N/2 (FI-GAN-TG), and kmax = N (FI-GAN-TG-N) (Fig. 4(A) and 4(B)). These results showed that the TG loss with kmax = N/2 achieved the best performance. Note that in these controlled experiments, we fixed the weight parameter of the TG loss (θ = 1) to evaluate the performance with different kmax values. Then, the epoch of FI-GAN-TG (kmax = N/2) with the best PSNR value was selected as the preliminary trained network. We then switched the training procedure to the fine-tuning process (FI-GAN-TGFT). During this process, the validation indicators were also computed on an FI validation dataset. The second training loss and the resulting PSNR of the validation dataset are plotted in Fig. 4(C) and (D), respectively. The PSNR of the fine-tuning procedure first increased and then decreased (Fig. 4(D)); we selected the peak as the final network.
We used 60 macro FIs as the test dataset to compare the performance of our method under different parameters with that of SRGAN. Table 1 presents the statistical results in terms of RMSE, PSNR, and SSIM. The results showed that in the first training procedure, FI-GAN-TG (kmax = N/2) provided a better performance than the other two networks (kmax = 1 and N). After the second training procedure (the fine-tuning), the obtained FI-GAN-TGFT further improved on the performance of FI-GAN-TG. Figure 5 shows the results for three examples from the test dataset. These FIs included the lung tissue, the indocyanine green (ICG) injection point for lung lymph node mapping, and the lung tissue incision. The first column shows the merged and color images of the corresponding FIs. The second and third columns show the original high-resolution (HR) images and the preprocessed low-resolution (LR) images.
To clarify the detail variations, we magnified the parts of the images marked with red frames, which are shown in every second row. It was difficult to discern the differences between the post-processing methods at the small scale, but the differences between the processed images can clearly be seen when magnified. False textures were observed with SRGAN and diminished with the proposed method. The quantitative analysis of Fig. 5(A) is shown in Fig. 5(D). The change in the fluorescence intensity along the red line indicates that our method fit the original image well, whereas SRGAN showed a jittery curve because of fake textures (stripe artifacts). All of the preprocessed LR images were obtained after the 4× down-sampling. The difference between our method and SRGAN was whether the image input into the network was re-amplified to the original size by bicubic interpolation.

Resolution plate test results
As a practical comparison with SRGAN, we tested the performance of the proposed method with a fluorescence resolution plate made of serum-soluble ICG and a negative optical resolution plate. A solution was prepared by dissolving 2 mg of ICG in 10 ml of serum. A thin layer of the ICG solution was placed over a Petri dish, and the negative of the optical resolution plate was slowly placed on it to avoid air bubbles. The resolution plate was imaged with fluorescence imaging equipment developed by the Laboratory of Molecular Imaging at the Chinese Academy of Sciences to detect fluorescence signals. Figure 6(A) shows the processed images of the fluorescence resolution plate. The first column shows the original image of the fluorescence resolution plate magnified four times and the partially enlarged areas indicated by red boxes. The last two columns are the post-processed images. The image processed with the proposed resolution enhancement method showed a clear improvement in the imaging resolution, but the one processed by SRGAN still had the problem of fake textures. The fluorescence intensity curve further shows that our method reduced the fake textures, achieved a balance between the fake details and sharpening, and performed better overall (Fig. 6(B)).
To evaluate the robustness of our method under noisy conditions, Poisson noise was deliberately added to the four-times-enlarged white light image and the corresponding NIR fluorescence image of the resolution plate after the bicubic interpolation. Then, the proposed method was applied to the Poisson noise images to enhance the resolution (Fig. 6(C)). The line pairs in each orange frame were further enlarged for comparison (red, green, and blue frames for the original, Poisson noise, and processed images, respectively). The originally distinguishable line pairs became unrecognizable after adding Poisson noise. However, our method successfully enhanced the image resolution with sharper edges, and the three lines became recognizable again. Quantitative comparisons of intensity curves were also plotted for the white light and fluorescence images (Fig. 6(D)), which indicate that our method is robust to the interference of Poisson noise and enhances the image resolution.
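Adding Poisson noise to an image, as in the robustness test above, can be sketched with the Python standard library. The sketch treats each pixel value as the expected photon count and clips the result to an 8-bit range; both choices are assumptions, since the noise scaling used in the experiment is not specified here. The sampler uses Knuth's multiplication method, which is adequate for 8-bit intensities.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson-distributed integer with mean lam (Knuth's method)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def add_poisson_noise(img, rng, peak=255):
    """Replace each pixel by a Poisson sample whose mean is the pixel value."""
    return [[min(peak, poisson_sample(v, rng)) for v in row] for row in img]

rng = random.Random(0)          # fixed seed for a repeatable example
clean = [[0, 100], [200, 50]]
noisy = add_poisson_noise(clean, rng)
```

Because the variance of a Poisson variable equals its mean, brighter pixels receive larger absolute fluctuations, while dark pixels with a value of zero remain zero.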

Practical application tests of resolution enhancement
Considering the advantages of our method for resolution enhancement and edge sharpness, we believe that it will be beneficial for intraoperative NIR fluorescence imaging. Therefore, we applied this method to the in vivo fluorescence imaging of blood vessels in a mouse tail to further evaluate its performance. A 7-week-old nude mouse was used, and the experiment was conducted under the guidelines approved by the Institutional Animal Care and Use Committee at Peking University. The mouse was injected with ICG at a concentration of 0.1 mg/ml through the tail vein immediately before fluorescence imaging. The obtained NIR FI was then processed with our proposed method. Because SRGAN is designed for processing small images, it offers little benefit for such a divergent fluorescence image beyond magnifying it. Unlike SRGAN, the proposed method has a network structure that first down-samples the image and then re-up-scales it to the original size. Therefore, it fitted the optical scattering characteristics and provided good results, as shown in Fig. 7. The contours of the three blood vessels in the mouse tail were blurred and merged into each other in the original NIR FI (Fig. 7(A)). It was not easy for observers to distinguish these vessels with the naked eye. However, our method improved the contrast and produced much sharper contours of these vessels (Fig. 7(B)). Such resolution enhancement made all three vessels more recognizable, and their anatomical structure was consistent with the ground truth white light image (Fig. 7(C)). Quantitative comparisons between the original FI and the processed FI also showed that the contrast of the three vessels was improved (Fig. 7(D)).
Fig. 7. (C) White light image of the mouse tail. The three blood vessels can be clearly seen as they are close to the skin surface. (D) Comparison of the fluorescence intensity curves at the yellow line indicated at the bottom right of (C).
Furthermore, we applied the proposed method to a short video of intraoperative NIR fluorescence imaging acquired during a breast cancer surgery for sentinel lymph node mapping. The results showed that our method successfully processed the 640 × 360 pixel video in real time, and the overall resolution of the lymph vessels was enhanced with sharper contours (supplementary video, Fig. 8).

Conclusions
We presented a GAN-based method that uses the total gradient loss for FI resolution enhancement. This is a post-processing method based on enhancing the resolution of a single image. The total gradient loss acts as a constraint that brings the gradient of the generated image close to that of the original image. Our results suggest that our method provides a better performance with fewer fake textures than SRGAN. However, the problem of generating false features still exists, which is caused by the hallucinations of the networks. New methods need to be investigated to further minimize false features after such resolution enhancement, which we will pursue in future studies. In contrast with SRGAN, which directly processes the down-sampled images, our method re-up-scales images to simulate photon scattering in thick tissues. The results for the resolution plates, the original FI of the blood vessels in a mouse tail, and the original video of intraoperative NIR fluorescence imaging further illustrated the applicability of the proposed method to resolution enhancement in actual fluorescence imaging. For a 1280 × 720 pixel video, the output frame rate of the proposed method is 5 fps under the experimental environment in this study. In this study, we only used the 4× scaling factor, which limited the best performance that could be obtained in various imaging situations. Future research can focus on adding a scaling factor estimation procedure based on the actual imaging situation to achieve an adaptive resolution enhancement effect.