Efficient high-resolution fluorescence projection imaging over an extended depth of field through optical hardware and deep learning optimizations

Optical microscopy has witnessed notable advancements but has also become more costly and complex. Conventional wide field microscopy (WFM) has low resolution and a shallow depth of field (DOF), which limits its applications in practical biological experiments. Recently, confocal and light sheet microscopy have become major workhorses for biology; they incorporate high-precision scanning to image within an extended DOF, but at the cost of expense, complexity, and imaging speed. Here, we propose deep focus microscopy, an efficient framework optimized in both hardware and algorithm to address the tradeoff between resolution and DOF. Our deep focus microscopy achieves large-DOF, high-resolution projection imaging by integrating a deep focus network (DFnet) into light field microscopy (LFM) setups. Based on our constructed dataset, deep focus microscopy features a significantly enhanced spatial resolution of ∼260 nm, an extended DOF of over 30 µm, and broad generalization across diverse sample structures. It also reduces computational costs by four orders of magnitude compared to conventional LFM technologies. We demonstrate the excellent performance of deep focus microscopy in vivo, including long-term, high-resolution observations of cell division and migrasome formation in zebrafish embryos and mouse livers without background contamination.


Introduction
Wide field fluorescence microscopy (WFM) is a common and cost-effective commercial tool for biologists to observe dynamics in cultured cells and living animals [1,2]. However, it has been employed less frequently in recent years because its shallow depth of field (DOF) causes severe background contamination, making it challenging to detect high-resolution details. High imaging contrast and resolution are essential for reliable observation and analysis in many applications, such as developmental biology and immunology, but WFM cannot meet these requirements due to its inherent limitations [3,4]. Consequently, WFM has been gradually replaced by more complicated and higher-cost instruments, such as confocal [5,6] and light sheet microscopy [7,8]. However, these technologies require complex scanning systems to acquire multiple axial planes for 3D imaging, which increases the cost and reduces the speed of imaging. Yet for most biological observations [3,9-12], researchers project 3D volumes along the z axis for visualization. Acquiring full 3D stacks for this purpose is indirect and inefficient: it lowers the imaging speed and raises challenges for data storage. Moreover, it induces stronger phototoxicity in biological samples due to the increased dwell time of excitation light [13,14]. Therefore, capturing a large DOF within a single image without axial scanning is a more efficient approach for biological research [15].
Light field microscopy (LFM) is a novel technology that can capture signals from a large DOF in a single snapshot. LFM efficiently retains photons along the axial dimension by using a microlens array (MLA) to separate spatial and angular information [16-21]. While LFM is primarily designed for 3D imaging, it compromises spatial resolution to retrieve more volumetric voxels. To reconstruct a 3D volume from the 2D measurement, researchers have additionally developed light-field deconvolution (LFD) algorithms [22,23] and deep learning-based methods [24-27]. However, these suffer from large computational costs, severe artifacts, and poor generalization due to the hundredfold pixel gap between the input and output data. Furthermore, massive disk arrays are required for data storage, loading, and analysis, resulting in tedious processing times that are inefficient and impractical in the biological laboratory.
To address these problems, we propose an efficient imaging framework that combines the LFM setup with a lightweight deep focus network (DFnet) to extract information from a large axial coverage and generate a 2D high-resolution image. The whole pipeline is called deep focus microscopy, wherein deep learning technologies are integrated with the optical hardware of LFM to achieve an extended axial focus range. Since macro-pixel images embed information from multiple axial planes into the spatial and angular dimensions, deep focus microscopy can handle this learning task by elaborately exploiting spatial-angular correlations. By leveraging data priors during network training, deep focus microscopy achieves incoherent aperture synthesis and exhibits significantly enhanced spatial resolution, extended DOF, and superior generalization across diverse sample structures compared to previous methods. We demonstrate that deep focus microscopy achieves a lateral resolution of ∼260 nm and a DOF of over 30 µm within a wide FOV of 218 × 218 µm² based on a 63×/1.4NA objective lens, surpassing WFM by more than tenfold. Experiments on fluorescent beads, resolution charts, and biological samples confirm that our method enhances contrast with far fewer artifacts and ∼300 times lower computational cost. Furthermore, deep focus microscopy can observe elaborate subcellular dynamics, including cell division and migrasome generation in zebrafish embryos and mouse livers in vivo, with high fidelity. To benefit the community, we also release an open-source dataset consisting of 1,400 pairs of light-field raw macro-pixel images and their corresponding high-resolution 2D counterparts, spanning both experimental and synthetic data.

Principle of deep focus microscopy
The DOF of an optical microscope is inversely proportional to the numerical aperture (NA) of its objective lens. If the emission light rays can be separated by angle at the pupil plane, an extended DOF is obtained because each angular component has a low effective NA. Inspired by LFM [16], we adopt a simple yet cost-effective approach to separating angles by inserting an MLA at the image plane of a commercial microscope. Compared to WFM with the full NA, our setup has a larger DOF with reduced effects from out-of-focus light (Fig. 1(a)). However, traditional LFM relies on computationally expensive reconstruction of a 3D volume from the raw LFM measurements. The spatial-angular tradeoff also sacrifices spatial resolution and prevents the system from capturing fine structures at the subcellular level [22,24-26]. To address the tradeoff between resolution and DOF, we devise an efficient strategy in our deep focus microscopy. Instead of retrieving 3D volumes from the limited measured pixels, we directly reconstruct a high-resolution 2D image that contains signals from an extended DOF without background contamination, enhancing image contrast and resolution while reducing the processing time by 4 orders of magnitude (Fig. 1(b)). Furthermore, deep focus microscopy achieves performance almost matching that of confocal microscopy while enabling up to a 100-fold reduction in the number of axial layers (Fig. 1(c)).
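For background, a widely quoted textbook expression for the total DOF of a microscope (as given, e.g., in microscope vendors' handbooks; it is not taken from this paper) combines a wave-optical and a geometric term:

```latex
d_{\mathrm{tot}} \;=\; \frac{\lambda\, n}{\mathrm{NA}^{2}} \;+\; \frac{n}{M\,\mathrm{NA}}\, e ,
```

where $\lambda$ is the emission wavelength, $n$ the refractive index of the immersion medium, $M$ the total magnification, and $e$ the detector pixel size. Both terms grow rapidly as NA decreases, which is why splitting the full pupil into low-NA angular components extends the DOF of each component.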

Overview of DFnet
To better exploit the physical priors offered by spatial-angular correlations, we devised DFnet, which organizes angular information along the channel dimension and performs feature extraction across multiple spatial scales, producing effective representations with rich spatial details and informative context. Our DFnet adopts a symmetrical encoder-decoder structure and incorporates an attention mechanism within the skip connections to enhance fine-grained textures. Cascaded up-sampling layers generate the high-resolution image output of DFnet (Fig. 2(a)). Specifically, the raw light-field measurements (1989 × 1989 pixels) were rearranged into an image stack (153 × 153 × 169 pixels, height × width × angle) and fed into DFnet, which outputs a 2D tensor (1989 × 1989 pixels). To construct the training dataset, we added a galvo system before the MLA to acquire high-resolution scanning LFM data. Specifically, the scanning LFM (sLFM) system [3,28] periodically scans the image plane in a 3 × 3 grid, stepping by 1/3 of the microlens pitch. Pixels that have the same position relative to the center of each microlens capture the same angular component of the light field. This process improves the spatial sampling rate that limits the resolution of LFM. By realigning these angular components from different microlenses, high-resolution scanning LFM images (5967 × 5967 pixels) can be obtained, from which an LFM image (1989 × 1989 pixels) can be extracted as input. The training targets (1989 × 1989 pixels) were the corresponding xy-MIPs reconstructed from the scanning light-field measurements captured by sLFM. We formed training pairs from the LFM images and the 2D high-resolution images, which were well matched without registration.
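As a concrete illustration, the rearrangement from the raw sensor image to the spatial-angular stack can be sketched in NumPy. The helper name `lfm_to_spatial_angular` is ours, not from the paper, and the toy example uses a 2 × 2 microlens grid rather than the full 153 × 153:

```python
import numpy as np

def lfm_to_spatial_angular(raw, n_ang=13):
    """Rearrange a raw LFM image into a (lens_y, lens_x, angle) stack.

    Each microlens covers an n_ang x n_ang block of sensor pixels; pixels at
    the same position under every microlens form one angular view.
    """
    n_lens = raw.shape[0] // n_ang  # microlenses per side (153 in the paper)
    x = raw.reshape(n_lens, n_ang, n_lens, n_ang)
    # group the n_ang x n_ang sub-lens pixels into the angle (channel) dimension
    return x.transpose(0, 2, 1, 3).reshape(n_lens, n_lens, n_ang * n_ang)

# Toy input: 2 x 2 microlenses, 13 x 13 pixels each (26 x 26 sensor pixels)
raw = np.arange(26 * 26, dtype=float).reshape(26, 26)
stack = lfm_to_spatial_angular(raw)
print(stack.shape)  # (2, 2, 169)
```

For the full 1989 × 1989 measurement, the same reshape yields the 153 × 153 × 169 stack that is fed to DFnet.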
The widely adopted mean squared error (MSE) was used as the pixel-wise loss function to guide network training. For data pre-processing, the input data (153 × 153 × 169 pixels) was normalized by its maximum intensity, while the corresponding target (1989 × 1989 pixels) was normalized by its 99.8th-percentile intensity. During training, the dataset was randomly cropped into small patches: the input patches were 64 × 64 × 169 pixels and the target patches 832 × 832 pixels. We typically used 4,500 paired patches for training. The whole training process took approximately 10 hours (about 2,000 epochs) to converge. We set the initial learning rate to 10⁻⁴ and halved it every 200 epochs. The ADAM algorithm [29] was used to optimize the network with a batch size of 5. During inference, we directly generated predictions for whole-FOV images owing to the low computational cost of DFnet.
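A minimal sketch of this training recipe in PyTorch (the paper's stated platform); the one-layer model is only a placeholder for DFnet, the random tensors stand in for real normalized patches, and the two epochs stand in for the ∼2,000 used in practice:

```python
import torch

model = torch.nn.Conv2d(169, 1, 3, padding=1)  # placeholder for DFnet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # ADAM, initial lr 1e-4
# halve the learning rate every 200 epochs, as described above
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
loss_fn = torch.nn.MSELoss()  # pixel-wise MSE loss

for epoch in range(2):  # ~2,000 epochs (~10 h) in the paper
    x = torch.rand(5, 169, 64, 64)  # batch of 5 input patches (max-normalized)
    y = torch.rand(5, 1, 64, 64)    # stand-in targets (832 x 832, percentile-normalized, in the paper)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()
```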
The network structure of DFnet is designed to fully retain the high-resolution details embedded in the multiple spatial-angular views. It employs a multi-scale symmetrical encoder-decoder architecture [30] with attention mechanisms [31], followed by an up-sampling block. The input (64 × 64 × 169 pixels) is first fed into a convolution block consisting of two basic layers, each of which includes a 3 × 3 convolution, batch normalization, and a rectified linear unit (ReLU). After an initial feature (64 × 64 × C pixels, height × width × channel) is obtained, where the channel number C is usually set to 64, three down-sampling operations broaden the receptive field. Each down-sampling layer comprises a max-pooling layer followed by a convolution block with doubled channels and halved size, yielding a spatial feature of 8 × 8 × 512 pixels. Next, three symmetrical up-sampling operations, each consisting of a deconvolution and a convolution block, produce an up-sampled feature of 64 × 64 × 64 pixels. Note that skip connections are performed via concatenation in the decoder to incorporate high-level semantic information. Furthermore, we introduce channel and spatial attention mechanisms in the skip connections to enhance the feature representations, which are then fed into the up-sampling block to fit the output size. This block contains four basic blocks, each consisting of an up-sampling layer, a 3 × 3 convolution layer, and a ReLU layer. The size is doubled and the channel count halved in each basic block, yielding a feature of 1024 × 1024 × 4 pixels. Finally, a 1 × 1 convolution layer and a resizing operation produce the output of 832 × 832 pixels.
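The layer-by-layer description above can be condensed into a structural sketch in PyTorch. This is our reading of the architecture, not the authors' code, and the CASA attention modules in the skip connections are omitted for brevity:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # basic block: two (3x3 conv + BN + ReLU) layers
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DFnetSketch(nn.Module):
    """Structural sketch of the described architecture (attention omitted)."""
    def __init__(self, n_ang=169, c=64):
        super().__init__()
        self.inc = conv_block(n_ang, c)                       # 64x64x169 -> 64x64x64
        self.down = nn.ModuleList(
            nn.Sequential(nn.MaxPool2d(2), conv_block(c * 2**i, c * 2**(i + 1)))
            for i in range(3))                                # -> 8x8x512
        self.up = nn.ModuleList(
            nn.ConvTranspose2d(c * 2**(i + 1), c * 2**i, 2, stride=2)
            for i in reversed(range(3)))
        self.dec = nn.ModuleList(
            conv_block(c * 2**(i + 1), c * 2**i) for i in reversed(range(3)))
        head, ch = [], c                                      # cascaded up-sampling block
        for _ in range(4):                                    # double size, halve channels, 4 times
            head += [nn.Upsample(scale_factor=2),
                     nn.Conv2d(ch, ch // 2, 3, padding=1), nn.ReLU(inplace=True)]
            ch //= 2
        head.append(nn.Conv2d(ch, 1, 1))                      # 1024x1024x4 -> 1024x1024x1
        self.head = nn.Sequential(*head)

    def forward(self, x):
        skips, h = [], self.inc(x)
        for d in self.down:
            skips.append(h)
            h = d(h)
        for up, dec, s in zip(self.up, self.dec, reversed(skips)):
            h = dec(torch.cat([up(h), s], dim=1))             # skip connection by concatenation
        y = self.head(h)
        # resize the 1024x1024 feature to the 832x832 output described above
        return nn.functional.interpolate(y, size=(x.shape[-2] * 13, x.shape[-1] * 13))

net = DFnetSketch().eval()
with torch.no_grad():
    out = net(torch.rand(1, 169, 64, 64))
print(tuple(out.shape))  # (1, 1, 832, 832)
```

The 64 → 8 → 64 → 1024 → 832 pixel path mirrors the feature sizes quoted in the text.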
Based on this elaborate feature interaction, our framework establishes a mapping from the input LFM measurements to the extended-DOF images. The input and output of the mapping have equal pixel numbers, which mitigates the ill-posedness of the optimization compared to previous 3D LFM reconstruction methods. By simplifying the task with a well-designed network, DFnet gradually converges to a high-resolution solution (Fig. 2(b)). Therefore, our deep focus microscopy achieves high-resolution imaging with reduced artifacts, enabling observation of subcellular structures in living animals.

Details of the use of state-of-the-art methods for comparison
We compared DFnet with LFD and VCD-Net. For LFD, we downloaded the source code of phase-space deconvolution [22] and ran it on our data. Light-field images of 153 × 153 × 169 pixels were used as inputs, and the output reconstructed volumes have a size of 1989 × 1989 × 101 pixels. We then took the maximum-intensity projection of the reconstructed volumes along the axial dimension to obtain the LFD MIPs for comparison. Processing one frame required ∼342 s on an NVIDIA RTX 3090 GPU. For VCD-Net, we ran the original training code described previously [24] on our dataset. To train VCD-Net, we used light-field images as inputs and the corresponding high-resolution volumes reconstructed by iterative tomography [3] as targets, ensuring a fair comparison with DFnet. To save GPU memory, the training inputs were randomly cropped to 64 × 64 × 169 pixels, with corresponding reconstructed volumes of 832 × 832 × 101 pixels. The whole training process took about 2 days (∼100 epochs) on an NVIDIA RTX 3090 GPU to converge. During inference, the prediction for one frame (output size 1989 × 1989 × 101 pixels) took about 1.12 s. To compare these methods with DFnet, we computed the MIPs of the volumes reconstructed by LFD or VCD-Net along the axial dimension.

Experimental setup
The hardware system was built on a commercial inverted microscope (Zeiss Observer Z1) equipped with a 63×/1.4NA oil-immersion objective (Zeiss Plan-Apochromat 63×/1.4 Oil M27). A customized MLA with a pitch of 84.5 µm and a focal length of 1800 µm was inserted at the image plane before the camera (Andor Zyla 4.2); each microlens covered exactly 13 × 13 sensor pixels. Lasers (Coherent OBIS 405/488/561/640) were used for sample excitation in a time-division-multiplexing manner [32]. Another camera (Andor Zyla 4.2) was placed at a second exit port of the microscope for wide-field detection as a comparison, with a time-division-multiplexing strategy used for simultaneous imaging. All devices were synchronized using LabVIEW (2019). MATLAB (2020) and OriginPro (2019) were used for data analysis and visualization. We implemented DFnet in PyTorch on an NVIDIA RTX 3090 GPU.

Robustness of deep focus microscopy with different optical parameters
To demonstrate the robustness of our framework to different hardware parameters, we compared its performance under four MLA settings on synthetic 1-µm-diameter tubulins randomly distributed in 3D space: MLA #1 (54-µm pitch, 1100-µm focal length, 7 × 7 angles), MLA #2 (100-µm pitch, 2100-µm focal length, 13 × 13 angles), MLA #3 (115-µm pitch, 2400-µm focal length, 15 × 15 angles), and MLA #4 (160-µm pitch, 3400-µm focal length, 21 × 21 angles). For each setting, the DFnet model was pretrained and validated on a simulated dataset under the same setting. The results indicate that the performance remained stable as the angular resolution changed (Fig. 3); for example, increasing the angular resolution from 7 × 7 to 13 × 13 changed the PSNR by only 0.7 dB. For convenience, we selected MLA #2 with 13 × 13 angular resolution in our system implementation.
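The PSNR used for this comparison follows the standard definition; the helper below is our own sketch for completeness, not the paper's evaluation code:

```python
import numpy as np

def psnr(pred, gt, data_range=1.0):
    # peak signal-to-noise ratio in dB for images normalized to [0, data_range]
    mse = np.mean((np.asarray(pred, float) - np.asarray(gt, float)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

gt = np.zeros((4, 4))
pred = gt + 0.01  # uniform 1% error
print(round(psnr(pred, gt), 6))  # 40.0
```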

Experimental resolution characterization
Resolution and contrast are two critical factors for observing the dynamics of cells and organelles in vivo, but existing cost-effective methods have bottlenecks. On the one hand, WFM suffers from severe image blur due to its shallow DOF, which limits its ability to capture clear details of biological specimens. On the other hand, LFM processing methods aim to generate a 3D volume as their output, using either LFD [22,23] or deep learning-based technologies such as VCD-Net [24]. The raw LFM measurements and the associated reconstructions differ by nearly two orders of magnitude in pixel number, resulting in severe artifacts, low resolution, and low contrast. Unlike previous methods, deep focus microscopy couples LFM with DFnet to extend the DOF at each angle and increase the spatial resolution of the output 2D image. By keeping the pixel numbers of the input and output equal, our method alleviates the ill-conditioning of the task, and thus enhances resolution and contrast while suppressing reconstruction artifacts with the well-designed network.
To demonstrate the resolution enhancement of deep focus microscopy, we experimentally imaged fluorescent beads with sub-diffraction-limit diameters randomly distributed in 3D space. To prepare the sample, 1 mL of 100-nm-diameter beads (ThermoFisher TetraSpeck Microspheres, T7279) was diluted in 100 mL of pure water. At 40 °C, 1 mL of 1% agarose and 1 µL of the diluted fluorescent beads were thoroughly mixed and placed in a 35-mm dish for solidification and imaging. A 63×/1.4NA oil-immersion objective was used to assess the high-resolution performance of deep focus microscopy. WFM, LFD, and VCD-Net were also compared; the latter is a typical high-speed reconstruction method widely used to enhance the resolution, contrast, and sectioning capability of LFM. We observed that WFM failed to distinguish two closely positioned beads, rendering most of them as out-of-focus, bubble-like patterns, whereas the two beads were clearly resolved by our deep focus microscopy (Fig. 4(a)). For quantitative analysis, the FWHM was obtained from Gaussian fits to the lateral intensity profiles of the beads. Although LFD and VCD-Net were also able to detect the two beads, they required more computation time and exhibited widened profiles, indicating resolution loss during their 3D reconstruction (Figs. 4(b) and 4(c)). Deep focus microscopy trained on a broader dataset achieved performance comparable to the previous conditions, demonstrating the stability of our DFnet (Fig. 4(d)). The quantitative results showed that deep focus microscopy improved the resolution more than tenfold over WFM, fivefold over LFD, and twofold over VCD-Net (Fig. 4(e)). WFM focuses on only one depth, so beads at out-of-focus depths appear blurred, causing a notable decrease in resolution. The stable results of deep focus microscopy showed that our approach achieved resolution close to the diffraction limit throughout the entire FOV of 218 × 218 µm², independent of the distance between the measured beads and the native focal plane.
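The FWHM-from-Gaussian-fit procedure can be sketched with SciPy; the helper and the synthetic bead profile below are our illustration (a noiseless Gaussian with σ = 110 nm, which corresponds to an FWHM near the paper's ∼260 nm figure):

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, a, mu, sigma):
    return a * np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def fwhm_from_profile(x, intensity):
    # fit a Gaussian to a line profile, then convert sigma to FWHM
    p0 = [intensity.max(), x[np.argmax(intensity)], (x[-1] - x[0]) / 4]
    (a, mu, sigma), _ = curve_fit(gaussian, x, intensity, p0=p0)
    return 2 * np.sqrt(2 * np.log(2)) * abs(sigma)  # FWHM = 2.355 * sigma

x = np.linspace(-1, 1, 201)                 # positions in micrometers
profile = gaussian(x, 1.0, 0.0, 0.11)       # synthetic bead profile, sigma = 110 nm
print(round(fwhm_from_profile(x, profile), 3))  # 0.259, i.e. ~260 nm
```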

DOF improvement over conventional methods
To further validate the extended DOF achieved by deep focus microscopy, we imaged a USAF-1951 resolution chart at different axial positions away from the native focal plane (z = 0 µm). Results obtained by confocal microscopy with axial scanning were considered the ground truth for comparison. At the focal plane, WFM performed comparably to deep focus microscopy for the thin chart, which minimized the out-of-focus light that could otherwise blur the image (Fig. 5(a)); however, as the defocus distance increased, the WFM results blurred rapidly. Existing DOF-enhancing methods, such as digital refocusing [33], still suffered from a relatively small DOF and severe blurring away from the focal plane. LFD and VCD-Net, based on LFM setups, stayed in focus over an extended axial range but showed blurring and loss of detail: they could not correctly resolve the line pairs in group 1. Only deep focus microscopy simultaneously achieved high resolution and large DOF in this challenging case, distinguishing the line pairs (down to group 1, line 6) with high contrast, as indicated by the yellow marked profiles. To quantitatively confirm the DOF extension, we calculated the modulation transfer function (MTF) of a selected line pair at different axial positions (Fig. 5(b)). Among all five methods, deep focus microscopy exhibited the highest image contrast at every axial layer. We fitted the FWHMs of these MTF curves, and the results demonstrated that deep focus microscopy achieved a DOF of over 30 µm, a significant enhancement over WFM (Fig. 5(c)). Similar resolution performance was obtained using the dataset with multiple sample types (Fig. 5(d)), demonstrating the network's stability.
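MTF values at a bar target's spatial frequency are commonly computed from the Michelson contrast of the measured line-pair profile; the sketch below illustrates that common approach and is not necessarily the paper's exact procedure:

```python
import numpy as np

def line_pair_contrast(profile):
    # Michelson contrast of a bar-target profile; MTF at a given spatial
    # frequency is this contrast normalized by the contrast of the target itself
    i_max, i_min = profile.max(), profile.min()
    return (i_max - i_min) / (i_max + i_min)

x = np.linspace(0, 3, 300)                               # positions in micrometers
bars = 0.5 + 0.3 * np.sign(np.sin(2 * np.pi * 1.1 * x))  # ideal 1.1 cycles/um square wave
print(round(line_pair_contrast(bars), 2))  # 0.6
```

A defocused image of the same bars would have a smaller peak-to-trough swing and hence a lower contrast, which is how the axial MTF curves in Fig. 5(b) fall off.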

Generalization and efficiency based on a lightweight network
Biological samples are diverse in structure and function, which challenges the generalization ability of deep learning methods. In biological experiments, retraining on newly captured data is time-consuming and forfeits the speed advantage of deep learning processing. One solution is to train the network on a dataset and generalize it to unseen data. However, existing methods, which rely on intricate network structures that overfit sample textures, have limited generalization ability. By contrast, DFnet embraces a lightweight encoder-decoder architecture that integrates data priors. With a reduced number of network parameters and FLOPs, the network cannot simply memorize the dataset; instead, it learns the universal physical mapping underlying it.
To demonstrate its generalization capability, we conducted a cross-sample analysis of deep focus microscopy and VCD-Net. L929 cells with membrane and mitochondria labelling were used as the sample. The cells were cultured in DMEM (Gibco) with 2 mM GlutaMAX, 10% FBS (Biological Industries), and 100 U/mL penicillin-streptomycin at 37 °C under 5% CO₂. We used the PiggyBac transposon vector system and Vigofect transfection to generate a stable L929 cell line expressing TSPAN4-mCherry and TOM20-GFP [34]. The cells were fixed in a 35-mm dish for imaging. In this experiment, DFnet and VCD-Net were trained on the membrane channel and the mitochondria channel of L929 cells, respectively. We then used the pre-trained models to make predictions on the membrane channel (Figs. 6(a) and 6(b)). Compared to VCD-Net, deep focus microscopy showed higher image correlation and a wider frequency range in both cases (Figs. 6(c) and 6(d)). Nevertheless, deep focus microscopy also suffered a slight performance drop when the training and test data were completely different. To improve the stability of our approach, we built a large dataset of various real and synthetic biological samples, in which light-field raw macro-pixel images and their high-resolution 2D pairs were used for network training (Fig. 6(e)). With this dataset, the generalization performance improved, which can be attributed to the data priors derived from diverse structures. VCD-Net trained on the same dataset was still inferior to our method, since our lightweight network is less prone to overfitting. Together, DFnet achieves significant improvements in efficiency and performance over VCD-Net, with 345 times fewer floating-point operations, half as many network parameters, and a hundredfold faster inference speed (Figs. 6(f)-6(h)).

High-fidelity dynamic observations in vivo
High-fidelity imaging within living organisms is crucial for investigating complex interactions among multiple cells and organelles. However, intravital microenvironments are difficult to monitor due to irregular morphology and tissue heterogeneity. For example, a zebrafish embryo grows as a sphere several hundred micrometers in diameter, densely packed with embryonic cells [35]. These cells crowd against each other in 3D space, producing considerable background fluorescence that blurs images and degrades fidelity. Deep focus microscopy fills this niche with high resolution and large DOF in a compact, cost-effective system. We used WFM, VCD-Net, and deep focus microscopy to simultaneously observe the development of zebrafish embryos with membrane labeling. To prepare the zebrafish embryos, cultured embryos were injected with 300 pg of Tspan4a-EGFP mRNA (synthesized in vitro with the mMessage mMachine T7 kit, Ambion, AM1344) into one cell at the 16-cell stage. The embryos were then mounted in 1% low-melting-point agarose in a 35-mm dish for imaging, during which the environmental temperature was kept at around 27 °C. Owing to its decent generalization ability, the pre-trained model could be applied directly to the intravital data. Compared to WFM, deep focus microscopy exhibited a notable reduction in image blur, presenting sharper and more detailed results (Fig. 7).

Fluorescence microscopy is widely used for immune observations [36-38], but it faces challenges from the heterogeneity and autofluorescence of thick organs. To overcome these challenges, deep focus microscopy builds on the LFM setup, whose spatial-angular measurements separate the signal and background from different angles, while the spatial-channel attention mechanisms applied during DFnet training selectively enhance the signal. To demonstrate the applicability of deep focus microscopy for immune observations, we imaged a living
mouse liver with neutrophil and vessel labeling. In this experiment, male C57BL/6J mice (7-8 weeks old) were used, with Avertin (350 mg/kg, i.p.) for anesthesia. For fluorescent labeling, 2 µg of Ly6G/Ly6C antibody (PE-Cyanine7, eBioscience, 25-4317-82), 5 µg of AF647-WGA dye (Alexa Fluor 647 conjugate, Thermo Fisher, P21462), and 90 µL of PBS were injected intravenously. We dissected the deeply anesthetized mouse to expose its liver on a glass holder for imaging. After data acquisition, the same pre-trained model was used for DFnet processing. The WFM results were covered by a layer of haze-like artifacts, while our method exhibited high contrast and distinct cell structures in the microenvironment. Although the VCD-Net counterparts showed significant improvement over WFM, they were still inferior to deep focus microscopy (Fig. 8(a)). We quantified the signal-to-background ratio (SBR) of the images and found that deep focus microscopy achieved an SBR ∼13 dB higher than WFM and ∼6 dB higher than VCD-Net (Fig. 8(b)). Moreover, during the time-lapse recordings, we could clearly observe a neutrophil that migrated in the vessels, elongated a retraction fiber, and finally produced a migrasome from the broken-off fiber (Figs. 8(c)-8(e); Visualization 2). The cross profiles also verified the submicron resolution of deep focus microscopy in vivo, an over twofold enhancement relative to WFM and VCD-Net (Fig. 8(f)). Together, with its extended DOF and high resolution, deep focus microscopy is a universal and cost-effective tool for high-fidelity intravital subcellular observations in both ovipara and mammals.
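An SBR in dB can be computed as the ratio of mean signal to mean background intensity; the mask-based helper below is our illustrative sketch, not the paper's analysis code:

```python
import numpy as np

def sbr_db(image, signal_mask):
    # signal-to-background ratio in dB: mean intensity inside the signal mask
    # over mean intensity outside it
    signal = image[signal_mask].mean()
    background = image[~signal_mask].mean()
    return 10.0 * np.log10(signal / background)

img = np.full((8, 8), 2.0)          # uniform background
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True               # hypothetical segmented "cell" region
img[mask] = 40.0                    # 20x brighter signal
print(round(sbr_db(img, mask), 1))  # 13.0
```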

Conclusions
In this paper, we have developed deep focus microscopy to efficiently address the tradeoff between resolution and DOF, enabling rapid 2D imaging with reduced background for high-fidelity dynamic observations within living organisms. Deep focus microscopy employs an MLA to acquire macro-pixel images with an extended DOF and introduces a lightweight DFnet to efficiently exploit the abundant spatial-angular information, achieving resolution tenfold higher than WFM and speed at least 100-fold faster than LFD. DFnet is based on the U-Net architecture and optimized to preserve high-resolution details across multiple spatial-angular views [39], featuring 345 times fewer floating-point operations and a hundredfold faster inference speed. In addition, with significantly fewer network parameters, DFnet learns the physical prior behind the dataset better. By integrating DFnet, deep focus microscopy exhibits improved resolution and generalization: narrowing the pixel gap during reconstruction reduces the ill-conditioning of the inverse problem. Although confocal microscopy can match deep focus microscopy in resolution and DOF, it suffers from reduced imaging speed, increased optical complexity, and higher hardware cost. To help others implement deep focus microscopy, we have also released a large dataset of 1,400 pairs of light-field raw macro-pixel images and their corresponding high-resolution 2D counterparts, covering both experimental and synthetic data. Based on these data priors and the well-pretrained network, deep focus microscopy is suitable for diverse observations in complex intravital environments, recording processes such as cell division and migrasome generation with high fidelity.
There are several potential extensions to deep focus microscopy. First, it could be combined with other super-resolution techniques to surpass the diffraction limit [11,40,41]. Second, the imaging speed of deep focus microscopy is limited by the camera frame rate and could be accelerated using high-speed cameras [42]. Third, its deep-tissue imaging capability could be improved by combining it with pupil modulation [43] or multi-photon techniques [44]. Lastly, introducing self-supervised strategies holds the potential to reduce the requirement for paired data and increase applicability [45,46]. With its superior performance and extensibility, we anticipate that deep focus microscopy will facilitate the observation and study of long-term subcellular behaviors and functions in biological research.

Fig. 1 .
Fig. 1. Principle of deep focus microscopy. (a) Schematics of WFM and deep focus microscopy. WFM collects the intensity of all photons, so it suffers from a shallow DOF and limited optical sectioning, while deep focus microscopy, which relies on a microlens array (MLA) in hardware and a neural network in algorithm, obtains an extended DOF within a single 2D image. (b) Comparison of a fixed L929 cell in the membrane channel, obtained by WFM, light-field deconvolution (LFD), and deep focus microscopy. The corresponding Fourier components are shown on the right with resolution estimated by Fourier ring correlation (FRC). The processing time of the latter two methods is also marked, measured on data with an output size of 1989 × 1989 pixels. (c) Comparison of the imaging modes of confocal microscopy and deep focus microscopy. In confocal microscopy, different axial planes are acquired individually and then projected into a plane for visualization, whereas deep focus microscopy captures volumetric information within a single image at comparable resolution. The right panel shows the reduction in the number of axial layers using deep focus microscopy. Scale bars, 50 µm and 1 µm⁻¹ (b, c).

Fig. 2 .
Fig. 2. Overview of the DFnet architecture. (a) The structure of DFnet, which employs a multi-scale symmetrical encoder-decoder architecture with an attention mechanism followed by an up-sampling block. The pixel size of each layer is indicated. The detailed architectures of the channel and spatial attention (CASA) module and the cascaded up-sampling layers are shown below with symbol explanations. (b) Example loss-versus-iteration curve trained on 100-nm-diameter fluorescent beads. Mean squared error (MSE) is used as the loss function for network training. The network usually converges within 10 k iterations. The outputs of DFnet at different training phases are illustrated on the right. Scale bars, 10 µm.

Fig. 3. Robustness test of deep focus microscopy with different MLAs. (a) Simulation of 1-µm-diameter tubulins under different MLA settings imaged by our deep focus microscopy. The network was pretrained and validated on a simulated dataset under the same setting. Zoomed-in patches are shown in the second row. (b)-(c) Boxplots of PSNR and SSIM for deep focus microscopy with different MLAs (n = 10 samples per MLA setting). The central line indicates the median, the box boundaries correspond to the lower and upper quartiles, and the whiskers extend to 1.5 times the interquartile range from the quartiles. Scale bars, 10 µm (a).
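The PSNR and SSIM metrics in (b)-(c) can be computed as below. The SSIM here is a single-window global variant (the standard metric averages local windows), so this is a simplified sketch rather than the evaluation code used in the paper:

```python
import numpy as np

def psnr(ref, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB relative to the stated data range."""
    err = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return float('inf') if err == 0 else float(10 * np.log10(data_range ** 2 / err))

def global_ssim(ref, test, data_range=1.0):
    """Single-window SSIM over the whole image (simplified global variant)."""
    x, y = np.asarray(ref, float), np.asarray(test, float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

For published numbers one would typically use a windowed SSIM implementation (e.g. scikit-image's `structural_similarity`) with the window size and data range reported.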

Fig. 4. Resolution characterization of deep focus microscopy. (a) Images of 100-nm fluorescent beads obtained by WFM and deep focus microscopy with a 63×/1.4NA oil-immersion objective. (b) Four zoomed-in regions for the different methods, corresponding to the subregions marked in (a). (c) Normalized intensity profiles along the white dashed lines marked in (b). (d) The four zoomed-in regions corresponding to the subregions marked in (a), obtained by deep focus microscopy trained with the multi-sample-type dataset described in Fig. 6. (e) Bar chart of the lateral resolution of four different methods (n = 10 beads per method), calculated by measuring FWHMs after Gaussian fitting. The dashed blue line indicates the lateral diffraction-limited resolution. Data are shown as mean ± standard deviation. Scale bars, 10 µm (a), 2 µm (b, d).
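The resolution values in (e) come from FWHMs of bead profiles after Gaussian fitting. A moment-based stand-in for the fit, assuming a background-subtracted and approximately Gaussian peak, conveys the idea; the function name and the example sigma below are illustrative:

```python
import numpy as np

FWHM_FACTOR = 2 * np.sqrt(2 * np.log(2))  # FWHM = 2*sqrt(2 ln 2) * sigma ~ 2.355 sigma

def fwhm_from_profile(x, y):
    """Estimate the FWHM of a bead intensity profile from its second moment.

    A stand-in for a least-squares Gaussian fit: after a crude background
    subtraction, the weighted standard deviation of the profile is converted
    to a FWHM via the Gaussian relation above.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    y = y - y.min()                  # crude background subtraction
    w = y / y.sum()
    mu = np.sum(w * x)
    sigma = np.sqrt(np.sum(w * (x - mu) ** 2))
    return FWHM_FACTOR * sigma
```

A full Gaussian fit (e.g. `scipy.optimize.curve_fit` with amplitude, center, sigma, and offset as free parameters) is more robust to noise and asymmetric backgrounds than this moment estimate.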

Fig. 5. Extended DOF of deep focus microscopy. (a) Simulation of a USAF-1951 resolution chart at different axial positions, obtained by WFM, digital refocusing, LFD, VCD-Net, deep focus microscopy, and confocal microscopy (as the ground truth). The intensity profile along the yellow line is shown beside each sub-image. Arrows point to blur and artifacts. (b) The modulation transfer function (MTF) of the line pairs (group 1, line 6; 1.1 cycles/µm in our fabricated chart) at different axial positions. (c) DOF comparisons among the five methods. DOF was calculated by measuring the FWHMs of the MTF curves after Gaussian fitting. VCD-Net and deep focus microscopy were trained on the chart dataset. (d) Simulation of a USAF-1951 resolution chart at the axial position z = 0 µm, obtained by deep focus microscopy trained with the multi-sample-type dataset described in Fig. 6. Scale bars, 5 µm (a, d).
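The MTF value at a given line-pair frequency in (b) is the Michelson contrast of the intensity profile across the line pairs, and the DOF in (c) is then read off as the FWHM of that contrast as a function of axial position. A minimal sketch of the contrast computation (illustrative, not the authors' code):

```python
import numpy as np

def line_pair_contrast(profile):
    """Michelson contrast (Imax - Imin) / (Imax + Imin) of a line-pair profile.

    Evaluated at one spatial frequency, this serves as the MTF value at that
    frequency; sweeping the axial position traces out the MTF-versus-z curve
    whose FWHM defines the depth of field.
    """
    p = np.asarray(profile, float)
    return float((p.max() - p.min()) / (p.max() + p.min()))
```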

Fig. 6. Evaluation of generalization ability and efficiency. (a)-(b) Cross-sample test of VCD-Net (a) and DFnet (b) on fixed L929 cells labeled for membrane and mitochondria. From left to right: outputs from networks trained with membrane data, mitochondrial data, and our constructed dataset, all applied to the same input membrane data. The enlarged regions and corresponding Fourier spectra are shown below. (c) The ground-truth data were captured by sLFM after reconstruction. (d) Boxplots of Pearson correlations between VCD-Net and DFnet trained with different datasets. The central line indicates the median, the box boundaries correspond to the lower and upper quartiles, and the whiskers extend to 1.5 times the interquartile range. P values were calculated using the two-sided paired t-test: P = 1.32 × 10−15 for training on the membrane channel, P = 3.98 × 10−18 for training on the mitochondria channel, and P = 4.41 × 10−17 for training on our constructed dataset. (e) Illustration of our constructed dataset, which includes both experimental and synthetic data, consisting of 1,400 pairs spanning multiple structures and species such as tubulins, cells, zebrafish, and mice. The macro-pixel images serve as inputs, and the maximum intensity projections of the volumes over the z axis serve as targets. (f)-(h) Comparisons between VCD-Net and DFnet in terms of network parameters, floating-point operations (FLOPs), and processing time, calculated for an input of size 100 × 100 × 169 pixels. Scale bars, 10 µm and 2 µm−1 (a-c).
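The Pearson correlations in (d) measure the linear agreement between each network output and the ground truth, computed over flattened pixel intensities. A minimal sketch with an illustrative function name:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two images (flattened)."""
    a = np.asarray(a, float).ravel()
    b = np.asarray(b, float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))
```

The paired t-test in the caption would then be applied to the per-sample correlation values of the two methods.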

Fig. 7. In-vivo high-resolution recordings of zebrafish embryo development. (a)-(b) Images of a fixed zebrafish embryo by WFM (a) and deep focus microscopy (b) with a 63×/1.4NA oil-immersion objective. The enlarged regions and Fourier transforms are shown on the right, corresponding to the four marked boxes. The resolution estimated by Fourier ring correlation (FRC) is shown below. The DFnet used here was the model pre-trained with our constructed dataset. (c)-(e) Time stamps showing the cell division process clearly visualized by deep focus microscopy within the enlarged regions of the red marked box. In contrast, WFM yields low resolution and intense background contamination, while the results of VCD-Net exhibit blur and artifacts. Scale bars, 10 µm (spatial domain), 1 µm−1 (Fourier domain).
7(a) and 7(b)). The resolution enhancement was also verified by the additional high-frequency components in the Fourier spectrum. Fig. 7(d) displays the projections produced by VCD-Net; compared to deep focus microscopy, they exhibit blur and artifacts, as indicated by the yellow arrows. With this superior performance, deep focus microscopy effectively captured the cell division process within a series of 2D images, enabling clear observation of cells at different axial planes (Figs. 7(c)-7(e); Visualization 1).

Fig. 8. Intravital high-contrast imaging of neutrophils in living mouse livers. (a) Example images of a mouse liver labeled for neutrophils (cyan) and vessels (magenta), obtained by WFM, VCD-Net, and deep focus microscopy with a 63×/1.4NA oil-immersion objective. The DFnet used here was the model pre-trained with our constructed dataset. (b) Boxplots of the signal-to-background ratio (SBR) of WFM, VCD-Net, and deep focus microscopy in the neutrophil channel (upper row) and vessel channel (lower row). The SBR was calculated as the ratio between signal (maximum intensity of selected ROIs) and background (average intensity of regions without cells or vessels). P values were calculated using the two-sided paired t-test: P = 5.71 × 10−46 for the neutrophil channel and P = 6.69 × 10−33 for the vessel channel for WFM; P = 4.94 × 10−17 for the neutrophil channel and P = 2.04 × 10−25 for the vessel channel for VCD-Net. (c)-(e) Enlarged regions of the red dashed box in (a) at different time stamps, obtained by WFM (c), VCD-Net (d), and deep focus microscopy (e). The arrows indicate that deep focus microscopy captures the process of migrasome formation. (f) Normalized intensity profiles along the migrasome for the different methods. Scale bars, 5 µm (a, c-e).
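The SBR defined in (b), the ratio of the peak intensity within a selected ROI to the mean intensity of a cell- and vessel-free background region, can be sketched as follows; the masks and function name are illustrative:

```python
import numpy as np

def signal_to_background_ratio(image, roi_mask, background_mask):
    """SBR as described in Fig. 8(b).

    signal:     maximum intensity inside the selected ROI
    background: mean intensity of a region free of cells and vessels
    """
    img = np.asarray(image, float)
    signal = img[roi_mask].max()
    background = img[background_mask].mean()
    return float(signal / background)
```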