High-throughput, high-resolution deep learning microscopy based on registration-free generative adversarial network

Abstract: We combine a generative adversarial network (GAN) with light microscopy to achieve deep learning super-resolution under a large field of view (FOV). By appropriately adopting prior microscopy data in adversarial training, the neural network can recover a high-resolution, accurate image of a new specimen from a single low-resolution measurement. Its capacity has been broadly demonstrated by imaging various types of samples, such as a USAF resolution target, human pathological slides, fluorescence-labelled fibroblast cells, and deep tissues in transgenic mouse brain, with both wide-field and light-sheet microscopes. The gigapixel, multi-color reconstruction of these samples verifies a successful GAN-based single image super-resolution procedure. We also propose an image degrading model to generate low-resolution images for training, making our approach free from complex image registration during training data set preparation. Once a well-trained network has been created, this deep learning-based imaging approach is capable of recovering a large-FOV (~95 mm²) image with an enhanced resolution of ~1.7 μm at high speed (within 1 second), without necessarily introducing any changes to the setup of existing microscopes.

Super-resolution (SR) techniques present a computational way to increase the space-bandwidth product (SBP) of a microscope platform [1,7-19]. For instance, pixel super resolution (PSR) represents a class of spatial domain techniques that can fuse multiple large-FOV, low-resolution measurements with sub-pixel shifts into a high-resolution image [17,18]. On the other hand, several frequency domain methods, e.g., Fourier ptychographic microscopy (FPM) [1], synthetic aperture microscopy [7-10] and structured-illumination microscopy [20,21], produce a resolution-enhanced image by stitching together a number of variably illuminated, low-resolution images in the Fourier domain. Despite offering unique imaging capabilities with scalable SBP, these methods all require special hardware setups and complex computation on multiple frames. In contrast, another type of technique, named single image super resolution (SISR), has been widely applied in microscopy without these constraints. It aims at the reconstruction of a high-resolution (HR) image with rich details from a single low-resolution (LR) image. For this technique, the conventional, widely used method is the example-based approach [22,23], which works by replacing the LR information with HR patches retrieved from an example dictionary. Although SISR requires neither high-resolution imaging hardware nor intensive computation resources, the quality of the reconstructed images remains suboptimal compared to the multi-frame methods. The recent advent of deep neural networks provides another way to realize more effective SISR. Apart from its success in medical diagnosis, such as carcinoma detection, glioma grading, and histopathological segmentation and classification [24-26], deep learning has been used for super-resolution in bright-field microscopy [27,28] as well as fluorescence microscopy [29-32].
The most recent models, which utilize the generative adversarial network (GAN) for better enhancement of visual details, have reached remarkable resolution enhancement [29,32]. However, these methods require an extra image registration between high-resolution and low-resolution training pairs captured under different magnifications. Since a pixel-wise error function is the most common practice in super resolution, the accuracy of registration can affect the performance of the neural network.
Here we present a deep learning-based super-resolution approach that is free from registration during the training process, while still providing significant resolution enhancement for conventional microscopy without the need to acquire a plurality of frames or retrofit existing optical systems [33]. This imaging method uses data sets that consist of high-resolution measurements and their low-resolution simulations to train a GAN model. We carefully model the image degradation of the microscope system to generate low-resolution trial images from measured high-resolution source images, thereby eliminating the need for complicated alignment between the high- and low-resolution pairs. Once the network training is accomplished, the network is capable of using a single low-resolution measurement of a new specimen to recover its high-resolution, large-FOV image. We demonstrate the efficiency of this registration-free GAN microscopy (RFGANM) method with a bright-field image of a USAF resolution target, color images of whole pathological slides, dual-channel fluorescence images of fibroblast cells, and light-sheet fluorescence images of a whole mouse brain, verifying that it is widely applicable to various microscopy data. By taking a few example images as references and applying a GAN deep-learning procedure, we can transform a conventional optical microscope into a high-resolution (~1.7 μm), wide-FOV (~95 mm²) microscope with a final effective SBP of 0.13 gigapixels. Furthermore, unlike the training stage, which must be performed on GPUs to keep the time cost manageable, the reconstruction procedure can run readily on an ordinary CPU device in a still acceptable time of several minutes per image. This advantage renders RFGANM a robust platform that supports multiple applications once a well-trained SR system is established.
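The quoted SBP can be checked with a short calculation. The sketch below assumes Nyquist sampling, i.e. a pixel pitch of half the resolvable distance; this convention is not stated in the text, and the function name is only illustrative.

```python
def space_bandwidth_product(fov_mm2: float, resolution_um: float) -> float:
    """Number of pixels needed to sample a field of view at the Nyquist rate.

    Assumption: pixel pitch = resolution / 2 (two pixels per resolvable distance).
    """
    pixel_um = resolution_um / 2.0      # Nyquist pixel pitch in micrometers
    fov_um2 = fov_mm2 * 1e6             # 1 mm^2 = 1e6 um^2
    return fov_um2 / pixel_um ** 2


sbp = space_bandwidth_product(fov_mm2=95.0, resolution_um=1.7)
print(f"{sbp / 1e9:.2f} gigapixels")    # 0.13 gigapixels
```

Under this assumption, a ~95 mm² FOV at ~1.7 μm resolution corresponds to about 1.3 × 10⁸ pixels, which agrees with the 0.13 gigapixels stated above.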
In the following, we will briefly describe the RFGANM operation and experimental set-up, discuss how to perform the network training and inference process, discuss its imaging applications in a variety of biomedical samples, and demonstrate how RFGANM can benefit bio-medical analysis such as cancer diagnosis, cell counting in pathological section images and neuron profiling in light sheet image of mouse brain.

Deep learning based image super resolution reconstruction
A classic GAN model [34], which consists of a generator and a discriminator, is used to "learn" the various types of microscopy data from scratch. Figure 1 illustrates the network training and inference process. We establish its capability of mapping from an LR image to an HR reconstruction as shown in Fig. 1(a). First, multiple HR images of the example specimen are captured under a high-magnification objective (Fig. 1(a), step 1). Then, by accurately modeling the transfer function of the microscope system, we obtain the down-sampled, blurred images of the example specimen directly via simulation (Fig. 1(a), step 2). Based on its currently learned parameters, the generator creates resolution-enhanced reconstructions of the LR simulations in each training iteration (Fig. 1(a), step 3). The differences between the generator outputs and the real HR images are calculated using the mean squared error (MSE), denoted as the content loss function of the generator (Fig. 1(a), step 4). Besides the generator, the GAN includes an additional discriminator that aims to evaluate the reliability of the generator. This discriminator judges whether an image, input at random, is a reconstruction by the generator or a real high-resolution measurement (Fig. 1(a), step 5). An adversarial loss is created to estimate the accuracy of the discriminator's judgement. It iteratively optimizes the discriminator, aiming at an enhanced capability of making correct decisions. The adversarial loss together with the content loss is also used to optimize the generator, pushing it towards generating more perceptually realistic outputs that can further fool the discriminator. This adversarial training process thereby promotes the accuracy of both the generator and the discriminator. The training process can be terminated when the generator produces results that the discriminator can hardly tell from the real HR images.
Then, in the inference phase, an LR measurement of a sample that is excluded from the training data set is divided into several patches and fed into the well-trained generator (Fig. 1(b), step 6). The generator is capable of recovering high-frequency information for each patch, based on the prior GAN training. These quality-improved patches are finally stitched into one gigapixel image of the sample that encompasses high-resolution details as well as a large FOV (Fig. 1(b), step 7). The aforementioned image reconstruction process is illustrated in Fig. 1(b), and the overall implementation process of our approach is illustrated in Fig. 5(a). It is noteworthy that the GAN training is usually required only once, and the trained network is then applicable to the recovery of multiple samples with similar types of signals.

Image degrading model
It is widely accepted that the performance of a neural network relies heavily on the training data set, which for a super-resolution task consists of LR images as inputs and HR images as targets. These LR and HR image pairs for training can be obtained in two ways. Most intuitively, both LR and HR images are experimentally captured with the microscope. However, since the LR and HR images are taken under different magnifications, image cropping and registration techniques must be used to match the FOV and remove the unavoidable distortion. The quality of the training data therefore hinges on the performance of image registration, which is mainly based on feature detection and matching. Unfortunately, in cases of cell, tissue and brain imaging, a great deal of feature detail is lost in the LR images compared with the corresponding HR images due to the down-sampling process, leading to a high failure rate of image registration. Even though we used a standard image registration procedure, mismatches between LR and HR images occurred frequently, which significantly deteriorated the quality of the training data set.
Instead of capturing LR and HR images under different magnifications and then aligning them, we can apply an image degrading model to the captured HR images to generate the simulated LR images. In a nutshell, the LR images for training are directly down-sampled from the HR images, so we can guarantee that the two images share the same FOV. To make sure that our model trained on the simulated LR images can still super-resolve the experimentally captured LR images well, the image degrading model we used should be able to produce a sim[...] degrading pro[...]
where I is th[...] point spread [...] operator * is t[...] the discretizat[...] contributed b[...] measurement [...]
In practic[...] obtained und[...] sampling on t[...] kernel in the [...] procedure is v[...] we de-noise [...] measurement [...] sigma value i[...] result with th[...] model being [...] and the meas[...] 2(g). Except [...] differences ar[...] degrading mo[...]
Besides using the GAN framework to encourage perceptual similarity, we further used the feature reconstruction loss function proposed by Johnson et al. [37]. Let [...] be the activations of the jth convolutional layer of the VGG19 network described in Simonyan and Zisserman [38], where j in our experiments was set to 12.
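The image degrading model introduced in this section (blurring by the point spread function, discretization onto a coarser pixel grid, and measurement noise) can be illustrated with a minimal NumPy sketch. The Gaussian kernel, the block-averaging down-sampler, the additive Gaussian noise, and all function and parameter names are assumptions made for illustration; the exact kernel and noise model of the paper are not recoverable from the text above.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian, truncated at 3 sigma (assumed PSF shape)."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def degrade(hr: np.ndarray, factor: int, sigma: float,
            noise_std: float = 0.0, seed: int = 0) -> np.ndarray:
    """Simulate an LR image from a 2-D HR image: blur, down-sample, add noise."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    # Reflect-pad so the blurred image keeps the HR size.
    padded = np.pad(hr.astype(np.float64), radius, mode="reflect")
    # Separable blur: convolve along rows, then along columns.
    blurred = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, kernel, mode="valid")
    # Discretization: average each factor x factor block into one LR pixel.
    h = (blurred.shape[0] // factor) * factor
    w = (blurred.shape[1] // factor) * factor
    lr = blurred[:h, :w].reshape(h // factor, factor,
                                 w // factor, factor).mean(axis=(1, 3))
    # Additive Gaussian noise standing in for measurement noise.
    rng = np.random.default_rng(seed)
    return lr + rng.normal(0.0, noise_std, size=lr.shape)


hr_image = np.random.default_rng(1).random((400, 400))   # stand-in for an HR measurement
lr_image = degrade(hr_image, factor=4, sigma=2.0, noise_std=0.01)
print(lr_image.shape)   # (100, 100)
```

Because the LR image is computed from the HR image itself, the two share the same FOV by construction, which is what removes the registration step.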
In addition to the losses described so far, we also need to add the adversarial component of our GAN for the generative side to the perceptual loss. It is defined based on the probabilities of the discriminator over the reconstructed samples as: [...] For better gradient computation stability [...]
As for the discriminator, it first contains 8 convolutional layers with 4 × 4 kernels followed by batch normalization (BN) layers and LeakyReLU (α = 0.2) activation (except that the first convolutional layer does not come with BN). Through these 8 layers, the number of feature maps first increases gradually by a factor of 2 from 64 to 2048, then decreases by the same factor to 512. Strided convolutions with a stride of 2 are used to reduce the image resolution each time the number of features is doubled. Afterwards, the network is followed by a residual block that contains three convolutional layers followed by BN and LeakyReLU activation. Finally, the resulting 512 feature maps are flattened and connected by one dense layer and a sigmoid activation function to obtain the final probability of whether the input image is natural or not. The network is trained by minimizing the following loss function: [...]
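Since the loss equations themselves did not survive in the text, the following NumPy sketch only illustrates how such losses are commonly assembled in GAN-based super-resolution. It assumes the widely used formulation in which the generator minimizes -log D(G(x)) and the discriminator minimizes a binary cross-entropy; the weights `w_feature` and `w_adv` are placeholders, not values from the paper, and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # avoids log(0)


def content_loss(sr, hr):
    """Pixel-wise mean squared error between reconstruction and HR target."""
    return float(np.mean((np.asarray(sr) - np.asarray(hr)) ** 2))


def generator_adversarial_loss(d_fake):
    """-log D(G(x)) averaged over the batch (assumed non-saturating form)."""
    return float(-np.mean(np.log(np.asarray(d_fake) + EPS)))


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real HR images labelled 1, reconstructions 0."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return float(-np.mean(np.log(d_real + EPS))
                 - np.mean(np.log(1.0 - d_fake + EPS)))


def generator_loss(sr, hr, d_fake, feat_sr=None, feat_hr=None,
                   w_feature=1.0, w_adv=1e-3):
    """Weighted sum of content, optional feature, and adversarial terms.

    w_feature and w_adv are placeholder weights, not values from the paper.
    feat_sr / feat_hr stand for network activations (e.g. from VGG19).
    """
    loss = content_loss(sr, hr) + w_adv * generator_adversarial_loss(d_fake)
    if feat_sr is not None and feat_hr is not None:
        loss += w_feature * content_loss(feat_sr, feat_hr)  # MSE in feature space
    return loss


# Toy batch: discriminator outputs are probabilities in (0, 1).
sr_batch = np.full((2, 8, 8), 0.5)
hr_batch = np.full((2, 8, 8), 0.6)
print(generator_loss(sr_batch, hr_batch, d_fake=[0.3, 0.4]))
print(discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.3, 0.4]))
```

The generator term shrinks as the discriminator assigns higher probabilities to reconstructions, while the discriminator term shrinks as it separates real HR images from reconstructions, which is the adversarial interplay described in the training procedure.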

Characterization of super-resolution GAN
The capability of the GAN is first characterized by imaging a negative USAF resolution target (Thorlabs R3L3S1N) with a highest resolution of 228 line pairs per mm (lpm). We captured HR and corresponding LR images under a macro microscope (Olympus MVX10) with 10 × and 2 × total magnifications, respectively. Owing to the simple pattern of the test target, image registration could be applied to match their corresponding FOVs, forming strictly aligned HR and LR pairs for the GAN training. Considering the limited number of experimentally obtained samples, we applied geometric transformations, such as translation and rotation, to these paired images to further expand the data set. Finally, 1008 groups of HR and LR pairs were imported into the GAN for training. Another large-FOV LR measurement was used to validate the converged network (Fig. 5(b)). As shown in Fig. 5(c), (d), the 5×-enhanced reconstructions show a significant improvement compared to the raw images. Due to the small magnification factor as well as the limited numerical aperture, the raw 2 × image can hardly discern the high-frequency stripes in the USAF target (Fig. 5(b), cyan box for 114 lpm, and orange box for 228 lpm). The RFGANM reconstruction, in contrast, has resolved the finest part of the USAF target (Fig. 5(d2), 228 lpm). The GAN reconstruction results are further compared with the real measurement under 10 × magnification (Figs. 5(c3) and 5(d3)), showing a good structural similarity (SSIM) to the high-resolution ground truth. The linecuts through the resolved line pairs (Fig. 5(d)) by each method are quantified in Fig. 5(e), revealing a substantially improved resolution by the GAN.
[...] zed threed to the M has the d [40]. As [...] important alternative to conventional histology imaging approaches.
However, even for current LSFM, the optical throughput of the system optics remains insufficient to map in toto the cellular information throughout a specimen of large volume, for example, to visualize the fine neuronal networks across a mouse brain. Tile imaging is the commonly used approach to artificially increase the SBP and realize high-resolution imaging of large specimens [41,42]. Besides the speed compromised by repetitive mechanical stitching, the high illumination/detection NA configuration in tile imaging greatly limits the fluorescence extraction from deep tissue of thick specimens. We demonstrate that, instead of the commonly used tile stitching, which significantly sacrifices the throughput and limits the signal extraction from deep tissue, the integration of RFGANM with light-sheet imaging can achieve high-resolution imaging of selected sectional planes in a whole adult mouse brain. We first constructed a macro-view light-sheet geometry with wide laser-sheet illumination and large-FOV detection (Fig. 8(a)), which can fully cover an optically cleared P30 mouse brain (Tg: Thy1-GFP-M). 200 consecutively illuminated transverse planes in the middle of the brain (depth 2 to 3 mm) were recorded (Fig. 8(b)), with their maximum-intensity projections (MIPs) showing the global distribution of the neurons. The raw plane images inherit the limited resolution of the macro-view LSFM system optics, hence the densely packed neuronal fibers remain dim. The super-resolved image is then instantly obtained by RFGANM, with a reconstructed pixel size of 0.53 μm (Figs. 8(c)-8(h)). The result is furthermore compared to higher-magnification light-sheet measurements (6.4 × detection) to confirm the authenticity of the computation. In Fig. 8(d3), the neuronal dendrites identified by each method reveal a substantially improved resolution from RFGANM.
Therefore, RFGANM is shown to be as efficient for LSFM imaging as it is for conventional epifluorescence methods, and together they are capable of rapidly obtaining high-resolution signals from arbitrary planes of intact large tissues. Furthermore, in light of the strong 3-D imaging capability of LSFM, RFGAN-LSFM could possibly be extended to the third dimension in the future, to achieve high-throughput, high-resolution volumetric mapping of whole specimens, such as intact organs and whole embryos.

Quantita
We calculate [...] [43] [...] and HR counterparts. Theoretically, a network trained on one type of sample image should also be applicable to similar types of samples. For example, we can reasonably speculate that a GAN generator well trained on healthy prostate data can work with prostate cancer tissue as well. To test this underlying robustness, we blindly apply the network trained on healthy prostate tissue images to reconstruct a low-resolution image of prostate cancer tissue. Its outputs are compared with those from a network trained on real prostate cancer data, as shown in Fig. 9. Both networks recover highly similar structures of similar quality, and both are capable of resolving high-resolution cellular details, such as the nuclei and textures. This strongly suggests that the GAN could be highly robust, implying that RFGANM can be applied to the reconstruction of a variety of samples with training on only a single type of data.
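For the quantitative comparison between reconstructions and their HR counterparts, one metric named in the Conclusion is the peak signal-to-noise ratio (PSNR). The sketch below shows the standard PSNR definition; whether the paper used exactly this normalization is an assumption, and the `data_range` parameter (maximum possible pixel value) is introduced here only for illustration.

```python
import numpy as np


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


hr_ref = np.zeros((64, 64))
uniform_error = hr_ref + 0.1         # every pixel off by 0.1 on a [0, 1] scale
print(round(psnr(hr_ref, uniform_error), 2))   # 20.0
```

A higher PSNR against the HR measurement indicates a smaller pixel-wise error; it complements structural measures such as SSIM rather than replacing them.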

Histopathological diagnosis and cell counting by RFGANM
Large-scale quantitative analyses are further enabled by RFGANM imaging, as shown in Fig. 10. In the four segmented encephalic regions of a whole mouse brain section (Fig. 10(a)), the neurons are identified and their populations counted (Imaris visualization software, Fig. 10(b)) using the 1.6 × LR, 6.4 × HR and RFGANM images, respectively. The counting results are plotted in Fig. 10(c). Due to a severe loss of structural detail, the 1.6 × LR results fail to give a precise count of the neurons. Even after denoising with the widely accepted BM3D algorithm [44,45], the 1.6 × results are still far from the HR ground truth. In contrast, the counted numbers of neurons in the RFGANM images are very close to those of the 6.4 × HR measurements. Figures 10(d)-10(i) further show the cell nuclei counting of the healthy prostate tissue and the prostate cancer tissue images. Figure 10(d) shows the normal prostate gland, which has a lobular architecture. The glands are grouped and often have a folded contour. Figure 10(f) shows the glands of prostate cancer invasion, which has predominantly cribriform glands and lacks well-formed glands. Unlike the raw low-resolution measurement, which cannot discern single cell nuclei in detail, the RFGANM image here enables fairly accurate cell counting within the microtumor, which helps the doctor rate the cancer invasion more specifically. By integrating RFGANM, which combines single-cell resolution with a centimeter-scale field of view at a throughput of seconds, into biomedical applications, the efficiency of biological research or clinical work can be considerably improved for the quantitative examination of large surgical specimens.

Sample preparation
The fluoresc[...] endothelial c[...] preparation, a[...] Fluor 488 pha[...] 10[...]

Training implementations
Our model is implemented with Google's deep learning framework, TensorFlow (version r1.8), and trained on an Inspur server with two NVIDIA Tesla P100 GPUs. The source code together with a small example data set is available on GitHub (https://github.com/xinDW/RFGANM). Starting with a batch size of 16 and a fixed learning rate of 10⁻⁴, we trained the network for 200 epochs, which took about 48 hours.

Inference process
In the inference phase after network training, the experimentally captured LR images for validation are cut into many small, mutually overlapping pieces, which are then input into the network for super-resolved reconstruction piece by piece. Afterwards, all these output pieces are stitched into one whole image that possesses both a large FOV and high resolution. The stitching is achieved by matching the overlapped regions, which is robust and accurate. The inference process is fast: an image piece of 100 × 100 pixels takes less than 0.01 second to be super-resolved into a 400 × 400 pixel image, even on an ordinary Windows laptop with an Intel Core i5 CPU.
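The patch-wise inference described above can be sketched as follows. This is a minimal illustration, not the released implementation: the overlap width, the averaging of overlapped regions (used here in place of the matching-based stitching), and the `nearest_upsample` stand-in for the trained generator are all assumptions. The overlap must be smaller than the patch size.

```python
import numpy as np


def tile_starts(length: int, patch: int, overlap: int) -> list:
    """Start indices so that patches of size `patch` cover [0, length)."""
    if length <= patch:
        return [0]
    step = patch - overlap
    starts = list(range(0, length - patch, step))
    starts.append(length - patch)    # last patch is flush with the border
    return starts


def super_resolve_tiled(lr, model, scale=4, patch=100, overlap=20):
    """Run `model` patch by patch and blend the outputs by averaging overlaps."""
    h, w = lr.shape
    out = np.zeros((h * scale, w * scale), dtype=np.float64)
    weight = np.zeros_like(out)      # how many patches contributed to each pixel
    for y in tile_starts(h, patch, overlap):
        for x in tile_starts(w, patch, overlap):
            sr = model(lr[y:y + patch, x:x + patch])
            ys, xs = y * scale, x * scale
            out[ys:ys + sr.shape[0], xs:xs + sr.shape[1]] += sr
            weight[ys:ys + sr.shape[0], xs:xs + sr.shape[1]] += 1.0
    return out / weight


def nearest_upsample(patch_img, scale=4):
    """Stand-in for the trained generator: plain nearest-neighbour upsampling."""
    return np.kron(patch_img, np.ones((scale, scale)))


lr_full = np.random.default_rng(0).random((250, 180))
sr_full = super_resolve_tiled(lr_full, nearest_upsample)
print(sr_full.shape)   # (1000, 720)
```

Replacing `nearest_upsample` with a call to the trained generator would give the actual reconstruction; the tiling logic keeps memory use bounded regardless of the size of the full image.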

Imaging setups
There are several kinds of microscopy images in our experiment: the bright-field grayscale resolution test target images, the dual-color fluorescent BPAE images, the bright-field color images of two types of tissue slides, and the light-sheet images of mouse brain. Images of the resolution target were recorded by a Photometrics IRIS15 camera (pitch size 4.25 μm), with the HR and LR images taken under 10 × and 2 × magnifications of an Olympus MVX10 microscope, respectively. The BPAE cells were imaged under an Olympus IX73 microscope equipped with a Hamamatsu ORCA-Flash 4.0-V2 camera (pitch size 4.25 μm). In both fluorescent channels, the HR training images and LR validation images were taken under a 2 × 20 × /0.45 and 4 × /0.1 objective, respectively. For pathology slide imaging, a QHY247C color camera (pitch size 3.9 μm) was used on the Olympus MVX10 microscope to capture the healthy prostate and prostate cancer tissues stained with hematoxylin-eosin. The HR training and LR validation images were then taken under 10 × and 2.5 × magnifications, respectively. The sectional images of the intact mouse brain were obtained by a macro-view light-sheet system, which comprised a self-made dual-side laser-sheet illumination and large-FOV wide-field detection by an Olympus MVX10 microscope body. The images were sequentially recorded using a Photometrics IRIS15 camera under 1.6 × and 6.4 × magnifications.

Conclusion
We have demonstrated a deep learning-based microscopy method that requires no extra registration procedure during training, and that can significantly improve the resolution of conventional wide-field and cutting-edge light-sheet fluorescence microscopes and greatly increase the imaging throughput for whole biomedical specimens. We apply a state-of-the-art GAN to learn how to map from low-resolution microscopy images to their high-resolution counterparts. For cell and tissue images that contain complicated patterns, the low-resolution training data are artificially generated and intrinsically registered to the high-resolution training images via a degradation model. This step has simplified the data preprocessing and improved the robustness of the GAN. Once the model training is accomplished, the trained network is capable of quickly reconstructing a large-FOV, super-resolution image of a new sample based on a single low-resolution snapshot taken by an ordinary optical microscope. Besides the improved resolution, which has been verified by imaging of the resolution target and by PSNR analysis, the structural similarity to the sample ground truth has also been quantified, at a high level of over 90%. So far we have shown that the RFGANM method can be beneficial to biomedical applications such as cell counting and histopathological diagnosis. At the same time, the artifacts that exist in some minor regions still need to be reduced in future work, by upgrading the network structure as well as the training algorithm. We have also shown that the RFGANM method is robust and readily applicable to most forms of microscopy data, such as bright-field images, epifluorescence images, and light-sheet fluorescence images. It significantly extends the SBP of these microscope systems without acquiring multiple frames and without retrofitting the conventional microscope system.
Therefore, RFGANM retains high temporal performance while delivering an image quality comparable to that of multi-frame SR methods. As a reference point, it produces a 0.38 gigapixel digital pathology slide at 1 μm resolution, with an acquisition time of 0.01 second and a computation time of less than 1 second. This combination of high resolution and high throughput renders RFGANM a valuable tool for many applications, such as tissue pathology and neuroanatomy. Furthermore, although we currently demonstrate the combination of deep convolutional neural networks with optical microscopy in the form of 2-D imaging of ex vivo samples, we can reasonably expect that, given its superior spatial-temporal performance, this methodology will also be applicable to both 3-D microscopy and highly dynamic processes.