Emergent physics-informed design of deep learning for microscopy

Deep learning has revolutionised microscopy, enabling automated means for image classification, tracking and transformation. Beyond machine vision, deep learning has recently emerged as a universal and powerful tool to address challenging and previously intractable inverse image recovery problems. In seeking accurate, learned means of inversion, these advances have transformed conventional deep learning methods to those cognisant of the underlying physics of image formation, enabling robust, efficient and accurate recovery even in severely ill-posed conditions. In this perspective, we explore the emergence of physics-informed deep learning that will enable universal and accessible computational microscopy.


Introduction
Over the past decade, deep learning has taken a leading role in technological innovation, offering a learned means of classification, synthesis, transformation and tracking of images and big data [1]. Among its many established uses, deep learning has found utility in image transformation [2], in which new meaningful representations of data are generated for human perception, or as part of artificial intelligence. These same approaches have recently found application in microscopy, particularly in super-resolution, denoising and phase retrieval [3,4]. Microscopy, however, has shifted the goal posts from perceptual quality, often favoured in many computer vision applications, to the accurate reconstruction of underlying physical properties. Coupled with the 'black-box' nature of trained networks, early deep learning demonstrations have been met with understandable apprehension in the scientific community [4]. The overarching question is as follows: Are learned networks reconstructing an accurate image of biological tissues, or are they producing outputs that simply offer visual appeal? Recent studies have helped demystify this uncertainty, as we explore in this perspective. It is important to note, however, that contemporary deep learning methods are founded on mathematics and physics, sharing among them the capacity to accurately approximate a breadth of non-linear continuous functions given careful training [5,6]. This has positioned deep learning as a powerful, novel means to solve a broad range of inverse problems, driving a recent flurry of exciting demonstrations in computational imaging [3,7].
In microscopy, inverse problems have received much attention. They aim to recover spatial, spectral and phase information from incomplete, scattered or diffracted light, particularly at the microscale and at depths needed to observe cellular function [3,4]. Typically, when solving inverse problems, there are issues with the problem itself being ill-posed, meaning the recovered parameter is sensitive to minute noise. Further, solutions that overcome this often present a high computational demand. Many strides have been made in designing robust and efficient objective functions and regularisation schemes, and underpin methods such as deconvolution, quantitative phase retrieval and tomography [3,7]. However, many approaches still struggle to account for non-linear effects, which often prevent convergence onto closed-form or efficient solutions. The introduction of deep learning has reinvigorated the approach to inverse problems due to its capacity to learn non-trivial inversions directly from data [3]. Further, the feed forward property of deep learning allows for exceptionally fast inference (passing data through a network) once the initial training phase has been completed. As a result, deep learning has established a new paradigm, shifting the focus from prescribing careful and optimal inversion and regularisation schemes, to a careful selection of losses, training and testing with universal cross-applicable models.
Early applications of deep learning in microscopy have relied on careful acquisition of extensive paired training data [8][9][10][11][12]. The onus on data preparation has made these methods inaccessible to many, beyond using existing pre-trained models. Further, the capacity to over-train and potentially generate artefactual images in purely data-driven models has necessitated a close comparison to ground truths, which may not be readily available [4]. Recently, a new class of physics-informed models has emerged that exploits sparsity and known physics priors (that is, a priori information) as a means to constrain the training. Most images are sparse, meaning that they can be represented in a lower-dimensional form with little-to-no loss in information. The use of sparsity and other constraints that are specific to the imaging system can help the learned model arrive at a likely solution even when the problem itself may be underdetermined or undersampled. These approaches reduce or even remove the requirements for extensive experimental training data [13][14][15][16][17][18][19], and have enabled even severely ill-posed problems to be solved, such as low-light and lensless imaging [15]. The reduction in the need for experimental ground truths has further lowered a barrier to entry in the experimental equipment. In this space, much of the recent work has been devoted to answering a few very important questions. For example, when can networks generalise, i.e. learn the underlying physics agnostic of the sample contents; and how can priors guide generalisation, even when recovery is exceptionally ill-posed?
In this perspective, we explore the recent impact of physics-informed design for robust, universal and accessible computational microscopy, and comment on the future opportunities and challenges. We note that a detailed and expansive review of deep learning in the broader context of computational imaging is provided by Barbastathis et al [3]. Specific applications of classification and image enhancement in microscopy are reviewed by Belthangady et al [4]. A focus on tomographic reconstruction, including non-optical methods, is provided by Wang et al [20]. The tremendous pace in this field has seen a near doubling in published papers over the past year alone. Here, we discuss the impact of these recent works with a specific focus on inverse problems in microscopy. We explore these advances in the context of network design and learning strategy, across a spectrum bridging classical (non-learned), physics-informed and purely deep-learning methods.

Deep learning
Deep learning methods, and machine learning (ML) methods more broadly, seek 'generalisation' by learning from experience [21]. ML using a neural network (NN) was conceived as early as the 1940s and 1950s (see [22] for a historical perspective) with the concept of the perceptron [23], an architecture that aimed to mimic perception in the brain. Perceptrons comprising multiple layers have demonstrated the important capacity to approximate a wide range of non-linear functions [24]. This property was later codified by the universal approximation theorem, stating that NNs with arbitrary widths and depths can approximate any functional transformation of data if such a function exists [25]. This important property is attributed to the NN architecture, where each layer comprises sums of the previous layer's outputs scaled by learned coefficients (termed weights), followed by non-linear activation units. Recently, the concept of a deep NN (DNN) comprising many layers and many weights per layer has become tractable. DNNs have revolutionised many ML applications by exceeding the performance of user-designed algorithms by over tenfold [4].
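The layer structure described above (weighted sums followed by non-linear activation units) can be illustrated with a minimal sketch. The weights below are set by hand rather than learned, and the choice of a rectified linear activation and of the XOR function as the target are assumptions made purely for illustration. XOR cannot be reproduced by a single weighted sum, so the example shows, in the smallest possible case, what the non-linear hidden layer adds.

```python
import numpy as np

def relu(v):
    """Non-linear activation unit: element-wise max(v, 0)."""
    return np.maximum(v, 0.0)

# Hand-set (not learned) weights, chosen purely for illustration.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # hidden layer: two weighted sums of the inputs
b1 = np.array([0.0, -1.0])    # hidden-layer offsets
W2 = np.array([1.0, -2.0])    # output layer: weighted sum of the hidden units
b2 = 0.0

def two_layer_network(x):
    """Weighted sum -> activation -> weighted sum."""
    hidden = relu(W1 @ x + b1)
    return W2 @ hidden + b2

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = np.array([two_layer_network(x) for x in inputs])
print(outputs)  # XOR of the two inputs: 0, 1, 1, 0
```

In a trained network, the entries of `W1`, `b1`, `W2` and `b2` would be found by back-propagation rather than written down by hand.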
A DNN learns by back-propagating the error from a training loss function (TLF) to the individual weights. Specifically, the TLF quantifies the parameter to be optimised during training. The design of such a DNN revolves around the careful selection of: (1) the NN architecture, which may be feed-forward, recurrent, among others; (2) the TLFs that quantify performance in a supervised, i.e. with respect to some ground truth, or unsupervised, i.e. with respect to some internal metric, structure or sparsity, fashion; and, (3) the training data, ground truths or labels. The capacity of DNNs to be universal approximators, and the breadth of architectures, have found many uses in solving inverse problems in computational imaging [26].
The imaging process can be described, quite broadly, by an operator $H : X \to Y$ that maps a true image $x \in X$ to the measurement space $y \in Y$ via $y = H(x) + \epsilon$, where $\epsilon$ is noise [26]. The form of $H$ varies between applications, but is often complicated by diffraction, scattering and aberrations of light fields, and it may feature computational detection schemes, such as single-pixel or tomographic detection. The goal of the reconstruction is to solve the inverse problem posed by $x = H^{-1}(y)$ in a manner that is robust to noise. Conventional inverse solvers typically make some linear or first-order assumptions on $H$ and the statistics of $\epsilon$, and recover $x$ via an objective: $\arg\min_{x} \{ f(H(x), y) + \lambda(x) \}$, where $f$ is an error function and $\lambda$ is a regulariser. Note that this optimisation must be performed for every measurement $y$. DNNs do not define $H$. Instead they optimise the following metric: $\arg\min_{\mathrm{DNN}} \{ \mathrm{TLF}(x, \mathrm{DNN}(y)) \}$, which can be trained with sufficient pairs of $x$ and $y$, or even with their general statistical distributions [27]. A trained DNN can then infer $x$ from any subsequent measurement y.
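As a concrete instance of the conventional objective, the sketch below assumes a linear forward operator (a one-dimensional circular Gaussian blur written as a matrix), a squared-error function f and a Tikhonov (L2) regulariser. None of these choices come from the text; they are picked because the minimiser then has a closed form. The blur width, noise level and regularisation weight `lam` are hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# Assumed forward operator H: circular Gaussian blur written as an n x n matrix.
offsets = np.arange(n)
dist = np.minimum(offsets, n - offsets)           # circular distance from pixel 0
kernel = np.exp(-0.5 * (dist / 2.0) ** 2)
kernel /= kernel.sum()
H = np.stack([np.roll(kernel, i) for i in range(n)], axis=0)

# True image x and noisy measurement y = H(x) + noise.
x_true = np.zeros(n)
x_true[[10, 30, 31, 50]] = [1.0, 0.5, 0.8, 0.3]
y = H @ x_true + 0.01 * rng.standard_normal(n)

def objective(x, lam):
    """Squared-error data term plus Tikhonov regulariser."""
    return np.sum((H @ x - y) ** 2) + lam * np.sum(x ** 2)

lam = 1e-2  # hypothetical regularisation weight
# Closed-form minimiser of the objective: (H^T H + lam I) x = H^T y.
x_hat = np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)

print(objective(x_hat, lam), objective(x_true, lam))
```

The solve has to be repeated for each new measurement `y`, which is the per-measurement cost noted above. A trained DNN replaces it with a single forward pass.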

Progress, opportunities and challenges
The use of DNNs for computational microscopy is very recent, with the majority of developments taking place within the past four years. Early demonstrations have used data-driven end-to-end approaches, whereby a network learns the transformation from raw measurement Y to desired output X directly. In 2017, Rivenson et al [8] overcame the diffraction limit, achieving super-resolution by means of learning the deconvolution process. This was achieved by training a network with co-registered low- and high-resolution microscopy images taken, respectively, with low and high numerical aperture objective lenses. Network-based super-resolution using single-molecule localisation from multiple frames followed shortly thereafter [28,29]. The capacity to image through scattering media with deep learning was preceded by Horisaki et al [30] in 2016 employing ML, and finally demonstrated by Lyu et al [9] in 2017 with fully connected NNs. In 2017, Sinha et al [10] demonstrated that the phase of an object can be recovered from far-field diffraction without the use of a lens. This lensless imaging marked the first use of DNNs in strictly computational imaging [3], namely the concomitant design of both the imaging process and its recovery. In holography, similar paired supervised training enabled quantitative amplitude and phase recovery from interference by Rivenson et al [11] and by Wang et al [12] in 2018. More recently, Wang et al [31] and Spoorthi et al [32] have approached the broader challenge of phase unwrapping with DNNs, showing improved performance over conventional algorithms. These early demonstrations have unveiled the utility of deep learning across a broad spectrum of challenges in optics, and have motivated much of the rapid pace and versatility of contemporary deep learning research.
In these methods, the measurements, y, were passed through a DNN under training and compared in a supervised fashion against known ground truths, x, with the aim that the network would learn the intrinsic transformation of data associated with the physics of the imaging process. This data-driven approach is powerful in learning transformations with no priors; however, it is challenged by the need for extensive, co-registered, experimental data. Further, an important consideration has to be made: is the network learning the underlying physics (is it generalisable) or is it memorising the examples given (overfitting)? This can be evidenced in the performance of a network when fed with data different to that with which it was trained, for instance, by comparing inferences of various fluorescent markers or biological specimens, or even by comparing images taken with different imaging instruments. Deng et al [16] have recently observed that the selection of training and priors can significantly affect generalisation, and offer strategies to guide learning. In another interesting approach, Xue et al [33] have quantified the uncertainty of DNN estimates for use as a visual metric. Such a metric can inform the viewer on how accurate certain spatial areas of the reconstruction are, aiding interpretation.
Despite these challenges, supervised paired training has demonstrated a wealth of capacities. DNN methods have extended depth-of-field [34] and resolution [35], and enabled automated Fourier filtering [36] in holography. They have facilitated phase tomography [37] and quantitative phase imaging [38], and have even enabled mobile-phone microscopy to match benchtop performance [39]. In fluorescence microscopy, Weigert et al [40] have demonstrated that deep learning can reconstruct images and volumes with low-light illumination by training against their high-exposure counterparts. The concept of pair-wise training was pushed further by Wang et al [41] with their use of a generative adversarial network (GAN) [42], showing that DNNs can learn transformations across imaging modalities, including widefield, stimulated emission depletion and total internal reflection fluorescence microscopes, from single images. DNNs have further learned to image through scattering media in the form of multimode fibres [43], through optically thick media [44], and at low photon counts [45]. Intriguingly, Li et al [46] demonstrated that DNNs can even learn beyond the memory effect [47]. The memory effect dictates that there is a small range of tilts or shifts that can be applied to light passing through a scattering medium such that the scattered wavefront (or speckle) remains correlated at different points. Adaptive optics-corrected wavefronts typically operate within this limit, requiring recalibration only when exceeding the memory effect distance [47]. Learning beyond this range indicates not only that the network generalises broad statistics of the scattering media rather than those of an individual instance and orientation, but also that there may exist means to correct for scattering universally over large fields of view.
Recently, the capacity to overcome scattering was demonstrated in microscopy by Xiao et al [48] by training on images with and without introduced scattering, and in a multiview detection scheme.
The burden of data collection for supervised paired training can be alleviated by several means. Paired data may be directly simulated from a best guess of the forward imaging operator. This was demonstrated in both fluorescence [49] and light-field [50] microscopy. It is also preferred in situations where accurate ground truths are unavailable, such as for phase unwrapping [51,52]. Ground truths may also be generated using existing algorithms. For instance, this was performed in unwrapping quantitative phase images [53]. Whilst DNNs trained using this approach may offer faster processing, they are unlikely to outperform the algorithms used for training. Alternatively, experimental data can be transformed into a form usable by existing pre-trained networks. This was achieved for super-resolution light-sheet microscopy with a network trained on widefield fluorescence [54]. Unsupervised training has also been demonstrated for microscopy [55] by using a cycle-consistent GAN [27]. Such networks learn the transformation between image domains of unpaired datasets rather than focusing on paired pixel-wise loss. This approach is promising as unpaired images can readily be acquired. However, unsupervised GANs are challenging to train. This is because instead of minimising losses with robust gradients, such as the mean error between pixels, these approaches implement multiple competing networks that simultaneously solve multiple ill-posed inverse and forward transformations. As such, they require exceptionally high volumes of data and have a greater capacity to overfit.
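The first of these strategies, simulating paired data from a best guess of the forward operator, can be sketched as follows. The Gaussian blur, the sparse point-emitter ground truths and the additive Gaussian noise are all assumptions standing in for a real imaging model, and the function names and parameter values are hypothetical.

```python
import numpy as np

def gaussian_otf(size, sigma):
    """Transfer function of an assumed Gaussian blur with spatial width sigma (pixels)."""
    f = np.fft.fftfreq(size)
    fx, fy = np.meshgrid(f, f, indexing="ij")
    return np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))

def simulate_pairs(n_pairs, size=32, sigma=1.5, noise_std=0.02, seed=0):
    """Return simulated ground truths xs and measurements ys = H(xs) + noise."""
    rng = np.random.default_rng(seed)
    otf = gaussian_otf(size, sigma)
    # Sparse random point emitters as stand-in ground truths.
    xs = (rng.random((n_pairs, size, size)) > 0.98).astype(float)
    # Apply the assumed forward operator in the Fourier domain (last two axes).
    blurred = np.fft.ifft2(np.fft.fft2(xs) * otf).real
    ys = blurred + noise_std * rng.standard_normal(xs.shape)
    return xs, ys

xs, ys = simulate_pairs(4)
print(xs.shape, ys.shape)
```

Any mismatch between this simulated operator and the real instrument is inherited by a network trained on such pairs, so the forward model should be checked against calibration data where possible.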
Recent years have marked several approaches to regularise or constrain the training with known physics priors, moving away from purely data-driven training. Sparsity is one such prior that has received considerable attention, and has enabled accurate reconstructions from severely undersampled data, for instance in compressive sensing [56,57]. By using a DNN with a sparsity constraint, Ouyang et al [58] demonstrated that photoactivated localisation microscopy images may be reconstructed from up to 100× fewer frames than conventional approaches, with an equivalent improvement to speed and photodamage. In imaging through scattering media, Li et al [59] have demonstrated that a DNN can achieve a better and more generalised performance in noisy sparse images by using a negative Pearson correlation coefficient (NPCC) and L2 weight regularisation in the TLF. Zhang et al [60] have demonstrated that using an NPCC TLF enabled the restoration of holograms captured under low-light conditions. In fact, sparsity features in most DNNs as typical image-to-image architectures employ autoencoders that first encode, or reduce the spatial dimensionality of data while expanding the feature space, resulting in a latent or feature representation of data that contains the most 'important' information. This latent representation is then decoded, in a reverse fashion, such that this important information is the basis for a new transformed representation. In fact, unsupervised autoencoders have the ability to filter out unnecessary data, and remove noise [61]. This is because the latent space is often smaller than the spatial dimensionality, leading to the network prioritising information that carries the most 'weight' in reconstructing a faithful image, which naturally excludes random noise when the signal-to-noise ratio is high.
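For reference, an NPCC loss of the kind mentioned above can be written in a few lines. This is a minimal sketch using the standard Pearson correlation formula; the `weight_decay` coefficient and the way the L2 weight penalty is added are assumptions, not the settings used in [59] or [60].

```python
import numpy as np

def npcc(prediction, target):
    """Negative Pearson correlation coefficient between two images."""
    p = prediction - prediction.mean()
    t = target - target.mean()
    return -np.sum(p * t) / (np.sqrt(np.sum(p ** 2)) * np.sqrt(np.sum(t ** 2)))

def npcc_with_l2(prediction, target, weights, weight_decay=1e-4):
    """NPCC plus an L2 penalty on a list of weight arrays (hypothetical coefficient)."""
    penalty = weight_decay * sum(np.sum(w ** 2) for w in weights)
    return npcc(prediction, target) + penalty

rng = np.random.default_rng(0)
target = rng.random((16, 16))
print(npcc(target, target))              # approximately -1: perfect correlation
print(npcc(2.0 * target + 3.0, target))  # approximately -1: scale and offset are ignored
print(npcc(-target, target))             # approximately +1: perfect anti-correlation
```

Because the loss ignores overall scale and offset, a network trained with it is rewarded for recovering structure rather than absolute intensity, so quantitative values may need to be restored by a separate calibration.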
Exploiting sparsity enables a wealth of retrievable information beyond the space-bandwidth or the dynamic-range limits. In severely underdetermined inversions, sparse learning should be guided towards likely estimates using additional constraints. In optical imaging, the forward operator H typically attenuates higher spatial frequencies, as is evident in many optical transfer functions. This leads to the optimisation favouring lower spectral components and losing high-resolution fidelity. To address this issue, Deng et al [15] have demonstrated that splitting and recombining high and low frequencies via two DNNs can substantially improve reconstruction for super-resolution and phase recovery. Similar improved reconstructions can also be achieved by pre-modulating the power spectral density [62]. Ultimately, learning could be guided to optimise a spectral density consistent with that expected for a particular general subset of samples and the imaging system in question.
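The band-splitting step can be sketched with a pair of complementary Fourier masks. This shows only the split and the recombination; in the approach of [15] each band would be handled by its own DNN and the recombination would itself be learned, none of which is reproduced here. The hard circular mask and the `cutoff` value (in cycles per pixel) are assumptions.

```python
import numpy as np

def split_bands(image, cutoff):
    """Split an image into low- and high-frequency parts with complementary masks."""
    f_rows = np.fft.fftfreq(image.shape[0])
    f_cols = np.fft.fftfreq(image.shape[1])
    fx, fy = np.meshgrid(f_rows, f_cols, indexing="ij")
    low_mask = (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(float)
    spectrum = np.fft.fft2(image)
    low = np.fft.ifft2(spectrum * low_mask).real
    high = np.fft.ifft2(spectrum * (1.0 - low_mask)).real
    return low, high

rng = np.random.default_rng(0)
image = rng.random((32, 32))
low, high = split_bands(image, cutoff=0.1)

# The two bands sum back to the original image, so nothing is lost by the split.
print(np.max(np.abs(low + high - image)))
```

Training on the high band separately prevents the stronger low-frequency content from dominating the loss, which is the motivation given above.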
For phase retrieval at very low photon counts, Goy et al [13] demonstrated that using a first-order estimate (approximant) as an entry point to a DNN leads to improved results compared to an end-to-end reconstruction (figure 1). As with any optimisation, deep learning requires good loss gradients. The approximant, acting as an initial guess, transforms the input closer to the form of the output, providing the DNN stronger loss gradients for back-propagation, especially in low SNR conditions. Further, it provides an initial guess of high spatial frequencies with the DNN learning effectively the statistics of strong shot-noise, leading to reconstructions from as low as a single photon per pixel [13] (figure 1). Since then, phase retrieval has been substantially improved in low-light conditions with the combined use of an approximant and frequency-split DNN [63], the use of coherent pre-modulation of phase [64], and by utilising an added perceptual loss in the TLF [65]. Perceptual loss based on a pre-trained VGG network (Visual Geometry Group) [66] has helped identify and match fine-detail salient features between the training pairs. This move away from end-to-end networks came with the realisation that learning direct data transformations may not be the most efficient means of training a DNN. Multi-step training using a combination of specialised networks and conventional algorithms can break up a complex transformation into several simpler problems. This is particularly useful in situations where the measurement has to undergo basis transformations, such as the Fourier transform or phase wrapping. For instance, phase unwrapping can be performed by first using a DNN to classify phase fringes into their integer multiples, and then recombining them with the wrapped phase to recover absolute phase [67], which can be further augmented with a denoising network [68]. 
In fringe projection profilometry, multiple DNNs were used to estimate the individual amplitude and phase quadrature components, outperforming a single end-to-end network [69]. Further, arbitrary fringe spacing may be simulated by a multi-network architecture from one or two acquisitions, enabling rapid 3D profilometry [70].
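The two-step phase unwrapping described above, in which fringes are classified into integer multiples of 2π and then recombined with the wrapped phase, reduces to a single line once the integer map is available. In the sketch below the classifier is replaced by an oracle computed from the known true phase, which is an assumption made so that the recombination step can be shown on its own; the test phase profile is hypothetical.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 256)
true_phase = 12.0 * np.exp(-4.0 * x ** 2)      # smooth phase spanning several 2*pi
wrapped = np.angle(np.exp(1j * true_phase))    # measured phase, wrapped to (-pi, pi]

# Stand-in for the DNN classifier: the integer fringe order at each pixel.
# Here it is an oracle derived from the known true phase (illustration only).
fringe_order = np.round((true_phase - wrapped) / (2 * np.pi)).astype(int)

# Recombination: absolute phase = wrapped phase + 2*pi * integer order.
recovered = wrapped + 2 * np.pi * fringe_order

print(np.max(np.abs(recovered - true_phase)))
```

Casting the problem as per-pixel classification means the network predicts a small set of integer labels rather than a continuous, unbounded phase value.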
Interestingly, back-propagation to weights in a NN shares mathematical foundations with beam propagation through samples with varying refractive indices (RIs) [3]. In early intriguing demonstrations of ML, sample RIs were stored as weights in a layered architecture, and their values were estimated using stochastic gradient descent [71,72]. This enabled learned tomographic recovery through highly scattering objects, a challenging ill-posed problem. Sun et al [73] have demonstrated that scattering can be decoded using a DNN in diffraction tomography. More recently, this issue was tackled by a DNN using an approximant input, enabling impressive 3D phase recovery from limited-angle tomography of optically dense objects [14]. As an alternative to tackling the problem of tomography directly, a DNN was instead used as a learned regulariser, specifically, a projector in a projected gradient descent [74]. In a different approach, a recurrent NN (RNN) transformed the static problem of scattering inversion to a dynamic problem of moving illumination [75]. RNNs incorporate previous internal hidden states in their evaluation. Here, multiple acquisitions share features in their forward, scattering operator, and subsequent acquisitions are dependent variables; this 'memory' can be exploited by the RNN for improved reconstruction.
Zhou et al [76] have demonstrated an interesting implementation of DNNs that has utilised a deep image prior [77] to compensate for the 'missing-cone' problem in diffraction tomography. Ulyanov et al [77] have recognised that the structure of the CNN naturally conveys content that is biased towards natural images, even when that CNN is randomly initialised and untrained. The use of a CNN as a prior (or estimate) to infill the information in the missing cone, thus, estimates content that is biased towards natural images over noise. This has led to superior tomographic recovery over state-of-art classical and end-to-end methods [76].
In an interesting approach of physics-constrained learning, conventional iterative inversion methods can be implemented as an unrolled NN [78,79]. There, the calculation of each iteration is treated as a single layer of a network. The combination of multiple iterations together comprises a DNN. In this architecture, system physics and sparsity priors can be prescribed explicitly, with the DNN left to learn parameters. Monakhova et al [78] demonstrated lensless imaging recovery using a spectrum of inverse methods, from purely data-driven DNN to learned and classic iterative methods (figure 2). In figure 2, we observe that classical methods suffer from incomplete parameterisation of the imaging process (model mismatch), while the deep learning method suffers from overfitting the training data. Likely, optimal learned inversions will comprise some middle ground between learned and prescribed physics. Kellman et al [79] further used an unrolled network to learn optimal LED illumination bases for quantitative phase imaging, in a truly physics-learned form of computational imaging, whereby the physical imaging process was learned to enable improved reconstruction.
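The idea of treating each iteration as a layer can be sketched with the iterative shrinkage-thresholding algorithm (ISTA) for sparse recovery. ISTA is chosen here only because it is compact; it is an assumption and not necessarily the iteration unrolled in [78] or [79]. Each layer has its own step size and threshold, which an unrolled network would learn. Below they are fixed to their classical values, so the sketch behaves exactly like ten iterations of ordinary ISTA. The random forward operator and sparse test signal are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 norm (the sparsity prior)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def unrolled_ista(y, H, step_sizes, thresholds):
    """One loop pass per 'layer': a gradient step on the data term, then shrinkage."""
    x = np.zeros(H.shape[1])
    for step, thr in zip(step_sizes, thresholds):
        x = soft_threshold(x - step * (H.T @ (H @ x - y)), thr)
    return x

def sparse_objective(x, y, H, lam):
    return 0.5 * np.sum((H @ x - y) ** 2) + lam * np.sum(np.abs(x))

rng = np.random.default_rng(1)
H = rng.standard_normal((20, 50)) / np.sqrt(20)   # underdetermined forward operator
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.0, -0.7, 0.5]
y = H @ x_true

lam = 0.05
n_layers = 10
step = 1.0 / np.linalg.norm(H, 2) ** 2            # classical step: 1 / Lipschitz constant
x_hat = unrolled_ista(y, H, [step] * n_layers, [lam * step] * n_layers)

print(sparse_objective(np.zeros(50), y, H, lam), sparse_objective(x_hat, y, H, lam))
```

The forward operator `H` and the sparsity prior are written into every layer explicitly, so only the short lists `step_sizes` and `thresholds` would need to be trained, far fewer parameters than in an end-to-end network.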
Physics-constrained models have even enabled inverse recovery using untrained DNNs. In one demonstration [19], the output of a DNN, constituting a ground truth estimate, was passed through a simulation of image formation, to form an estimated measurement (figure 3(a)). The error between the actual and estimated measurements was used as the TLF for the DNN. Here, the optimisation was used directly for reconstruction, rather than the traditional training followed by inference. In another demonstration [18], a generator network that predicts the ground truth and a fully connected layer that synthesises Zernike polynomials were combined through an ideal forward operator, and trained to match the raw measurements (figure 3(b)). However, for these methods, each recovery has to optimise the network, which loses the major advantage of deep learning, namely, the fast feed-forward inference. Despite this, these approaches may be tractable on the timescales of the order of minutes to hours, which is acceptable for many microscopy reconstructions.
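The optimisation loop behind such untrained approaches can be sketched in a deliberately reduced form. The 'network' below is a single linear layer acting on a fixed random input, and the forward model is a random matrix; both are toy assumptions and bear no resemblance to the architectures of [18] or [19]. What the sketch retains is the structure of the loss: the network output is passed through the forward model and compared with the one raw measurement, with no ground truth involved.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 32, 24, 8                              # image size, measurement size, input size

H = rng.standard_normal((m, n)) / np.sqrt(m)     # assumed known forward model
x_true = rng.standard_normal(n)
y = H @ x_true                                   # the single raw measurement

z = rng.standard_normal(k)                       # fixed random input to the generator
W = np.zeros((n, k))                             # untrained weights of a one-layer generator

def measurement_loss(W):
    """Error between the actual and the estimated measurement."""
    return 0.5 * np.sum((H @ (W @ z) - y) ** 2)

# Step size of 1 / Lipschitz constant guarantees the loss does not increase.
step = 1.0 / (np.linalg.norm(H, 2) ** 2 * np.sum(z ** 2))
initial_loss = measurement_loss(W)
for _ in range(200):
    residual = H @ (W @ z) - y
    W -= step * np.outer(H.T @ residual, z)      # gradient of the loss w.r.t. W
final_loss = measurement_loss(W)
x_estimate = W @ z                               # reconstruction = network output

print(initial_loss, final_loss)
```

Because this toy problem is underdetermined, agreement with the measurement alone does not single out `x_true`. In the cited demonstrations it is the structure of the network and the physical model that constrain the estimate, which a linear layer does not capture.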
We have seen a tremendous push of physics-informed deep learning in computational microscopy, stimulated by two important aspects. First, data gathering for paired training of end-to-end approaches is onerous, and inaccessible to many. Second, the addition of physics constraints guides learning towards an accurate estimation and generalisation even in exceptionally ill-posed inversions. The goal is the efficient learning of ill-posed inversions with few, if any, training examples.
Physics-constrained learning schemes, however, suffer from a major challenge, namely, the impact of model mismatch on accurate reconstruction. Incorrect priors, for instance from forward operators in simulated data [49] or from approximants [13], will prevent accurate reconstruction. In exceptionally ill-posed situations, small discrepancies in the priors may lead to poor network convergence and poor reconstruction quality. Many proof-of-principle demonstrations have used idealised set-ups, for instance with well-calibrated light modulators or simulations with known noise. In these instances, the forward operator is known well. Whilst recent effort has been made to understand the role of training, priors and their robustness [16], the remaining step is now to demonstrate how these methods translate to broader experimental use with biological samples and more complex optical geometries.
Despite the experimental challenges of end-to-end methods, they have demonstrated exceptional promise in microscopy [40,41,48]. Standardisation of these networks will likely lead to a few pre-trained and generalised models being used as an entry point, or a sub-network, in bespoke training, much in the same way generic classification networks, such as the VGG [66], are used across a wide spectrum of applications. For this to be effective, a strong emphasis must be placed on generalisation. Specifically, it is important to ensure that networks are trained on, and can support, a broad spectrum of imaging targets. For biological applications, this may be especially challenging. Towards this, standardised training data spanning the breadth of microscopy modalities would be of immense utility [80], mirroring standardised classification datasets, such as MNIST or CIFAR-10.
One may consider deep learning as a means to expand the dynamic range of an imaging process, namely, to increase the bandwidth product between common trade-offs, such as speed and sampling, or resolution and field-of-view, among others. Recovery beyond these limits requires solving an underdetermined problem, i.e. selecting one of infinitely many possible solutions given a priori knowledge of the data. Deep learning does so by leveraging statistics of the training data and any explicitly added priors. Ultimately, the more constrained this recovery (given it is consistent with the physics), the better the outcome. Thus, it is not surprising that end-to-end approaches perform exceptionally well when trained on very specific modalities and samples [40,41,58]. There, both physics and priors linked to a specific class of samples are learned. However, the utility of such trained networks diminishes when applied to other classes of samples, stipulating that new sample-specific training should be performed. Broader training with many classes of samples will likely improve generalisation [16], however, at the cost of reconstruction quality.
Figure 3. (a) … [20]. (b) Deep phase decoder predicts sample phase and aberrations that, when passed through a model of image formation, generate measurements consistent with raw data. Adapted from [18]. CC BY 4.0.
The recent, rapid flurry of demonstrations across a breadth of computational challenges suggests an emerging boom of deep learning in microscopy. Each avenue of approach, from approximants and multi-step training to unrolled networks, has demonstrated creativity and powerful performance. It is challenging to predict which approach and network architecture will ultimately provide the best performance. In following the broader achievements of deep learning, it will likely become necessary to quantify performance of leading networks for each imaging modality or application. Thus, it is important that this emerging field strives towards two goals. First, that new works are open and reproducible, revealing data and the source code behind DNN architectures (not just providing a trained set of weights). Second, that standardised validation datasets are provided across various modalities. How these challenges are met in the coming years remains to be seen, but what is certain is that we will see many more novel learned solutions to computational microscopy.
An intriguing and speculative idea is that DNNs are able to learn underlying physics-consistent transformations of data, if such transformations exist in the training data. It may be that a DNN, beyond offering improved noise performance over existing algorithms, can capture physical phenomena that are not yet parameterised or defined by the field. Already we have seen exceptional performance of DNNs in overcoming multiple scattering and low-angle, low-light illumination in tomography [14,74,75] and in breaching the memory effect in scattering media [46]. It is possible that physics may not only inform deep learning, but that deep learning may inform physics, for instance in discovering new illumination schemes [79], propagation-invariant beams or optical encoding methods.

Conclusion
The merger of physics-informed constraints with deep learning has bridged the gap between deep and classical means of solving inverse problems in imaging. This has precipitated many powerful advances, and has made steps to demystify deep learning from a 'black-box' approach towards a rigorous computational tool. These advances are revolutionising microscopy, from fluorescence microscopy to quantitative phase imaging, and enabling new capacities for robust imaging that can overcome scattering and diffraction. We have seen that the purely deep learned approaches perform exceptionally well when trained on specific modalities and samples with a wealth of experimental data. On the other end of the spectrum, physics-informed approaches have enabled broader generalisation of trained networks, robust solutions of exceptionally ill-posed problems and clever integration of both imaging and recovery schemes. Recent demonstrations have laid out a canvas of tools to create bespoke networks that can strike a balance between deep and classical approaches. These can be tailored to each modality and application, and the ill-posed nature of the inverse problem at hand. The rapid pace seen over the past several years indicates a bright future, with learned methods well-poised to feature in many upcoming microscopy innovations.

Funding
We acknowledge funding from the UK Engineering and Physical Sciences Research Council through Grant EP/P030017/1.