Differentiable model-based adaptive optics with transmitted and reflected light

Aberrations limit optical systems in many situations, for example when imaging in biological tissue. Machine learning offers novel ways to improve imaging under such conditions by learning inverse models of aberrations. Learning requires datasets that cover a wide range of possible aberrations; this, however, becomes limiting for more strongly scattering samples, and it does not take advantage of prior information about the imaging process. Here, we show that combining model-based adaptive optics with the optimization techniques of machine learning frameworks can find aberration corrections with a small number of measurements. Corrections are determined in a transmission configuration through a single aberrating layer, and in a reflection configuration through two different layers at the same time. Additionally, corrections are not limited by a predetermined model of aberrations (such as combinations of Zernike modes). Focusing in transmission can be achieved based only on reflected light, compatible with an epidetection imaging configuration.


Introduction
Machine learning offers novel approaches to correct for aberrations encountered when imaging through scattering materials, [1,2,3,4] from astronomy [5,6,7,8] to microscopy with transmitted (for example [9,10,11]) and reflected light [12]. To find aberration corrections in these situations, machine learning typically relies on large synthetic datasets. Large datasets are required, first, because the many parameters of deep neural networks need to be adjusted to work under a wide range of conditions, and all these conditions need to be covered in the training data. Secondly, machine learning models are typically agnostic about the underlying image generating process. Therefore, even a priori known information, for example the transformations inside the optical system, needs to be learned from data.
In practice, training datasets are often based on combinations of Zernike polynomials [5,6,7,8,9,10,11,12] which might however not accurately capture all aspects of experimentally encountered aberrations. Additionally, for more strongly scattering samples, which require increasingly higher orders of Zernike modes, covering all potential scattering situations by sampling a sufficient number of different mode combinations eventually results in very large datasets. This is in particular the case if aberrations in multiple layers are combined, for example when using reflected light in an epidetection configuration [12].
While finding inverse models through such data-driven strategies is well suited for situations where the underlying physical model is undetermined, the image formation process in an optical system is typically at least partially known. This is the basis of model-based adaptive optics, where optical systems modeling is combined with optimization to find an unknown phase aberration [13,14,15,16]. Similar situations, where models based on a well-known underlying physical process are learned from data, are also encountered in other imaging modalities [17], and more broadly in many areas of engineering and physics (for example [18,19,20,21,22,23,24,25]). To take advantage of such prior information, methods have been developed that combine physical process models with machine learning optimization.

Fig. 1 (caption): For experiments in an epidetection configuration, light reflected off the mirror M at the sample plane is additionally recorded with a second camera (reflection camera). BS = beam splitter, L = lens (see main text and Methods for details).
For such optimization, first a model is described as a differentiable function mapping input to output. Since the function is differentiable, one can take advantage of automatic differentiation, which is more accurate and computationally efficient than finite differences [26,27,28], and is one of the cornerstones of machine learning frameworks such as Tensorflow. Automatic differentiation is used in these frameworks to compute gradients for optimization of a loss function with respect to parameters of interest. The loss function compares model output to a target output and the discrepancy is minimized by adjusting model parameters ( [18,19,20,21,22,23,24,25]).
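As a toy illustration (not code from this work), the forward-mode flavor of automatic differentiation can be implemented with dual numbers in a few lines; the derivative it returns is exact to machine precision, in contrast to finite differences:

```python
import math

class Dual:
    """Minimal forward-mode autodiff value: carries f(x) and f'(x) together."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def sin_d(x):
    # sine lifted to dual numbers via the chain rule
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def f(x):
    # example "model output" as a function of one parameter
    return sin_d(x * x) + 3.0 * x

x0 = 0.7
ad = f(Dual(x0, 1.0)).dot                      # exact derivative via autodiff
exact = 2 * x0 * math.cos(x0 ** 2) + 3.0       # analytic reference
fd = (f(Dual(x0 + 1e-6)).val - f(Dual(x0 - 1e-6)).val) / 2e-6  # finite differences
```

Frameworks such as Tensorflow apply reverse-mode variants of this idea automatically to every operation in a model, which is what makes gradient-based optimization of the loss practical.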
Here, we employ this model optimization strategy for adaptive optics: we describe light propagation through the optical system, including unknown aberrations represented as parameters, with a differentiable model (Fig. 1). To match the input-output relationship of the computational model to the experimental setup, we record a number of output images resulting from corresponding input phase modulations and optimize model parameters using Tensorflow. We show that this allows extracting an accurate description of the introduced aberrating layer(s), as verified by focusing in transmission through a single layer, as well as in reflection through two layers. In the latter epidetection configuration, only reflected light is used for optimization and transmission focusing.

Results
The experimental setup is shown schematically in Fig. 1. An expanded and collimated laser beam is reflected off a spatial light modulator (SLM) via a beam splitter cube (BS 1 ) and a part of the beam is imaged onto a camera (transmission camera) via a beam splitter (BS 3 ). For experiments with reflected light, the remaining part of the beam is additionally sent to a mirror at the sample plane (the same focal plane as the transmission camera), which serves as a proxy for a reflecting sample. Light reflected by the mirror is imaged onto a second camera (reflection camera) via a beam splitter (BS 2 ). Aberrations (a layer of nail polish on a microscope slide, see Methods) are introduced between beam splitters BS 2 and BS 3 . The beam undergoes aberrations once on the way to the transmission camera, and twice on the way to the reflection camera.
A single transmission pass through the setup is described by the function (see Methods for details)

S(φ_SLM, φ_aberration) = |P_{f_1}(exp[i(φ_lens + φ_aberration)] · P_d(U_0 · exp[iφ_SLM]))|²,    (1)

where P_d is a propagation operator over the distance d, U_0 is the complex amplitude of the unmodulated beam at the SLM, φ_lens is the phase representation of the lens L_1, and f_1 is its focal length; φ_SLM is the (known) SLM phase modulation, and φ_aberration is the (unknown) introduced aberration.
For computational efficiency, the aberration is simulated at the same plane as lens L_1 (see Methods). Finding the unknown aberration which maximizes the similarity, measured with Pearson's correlation coefficient r, of the simulated camera images S(φ_SLM, φ_aberration) and experimentally recorded images I was solved in Tensorflow using automatic differentiation and gradient-based optimization (see Methods):

φ̂_aberration = arg min over φ_aberration of [1 − r(S(φ_SLM, φ_aberration), I)].    (2)

To further refine the focus after a first optimization step, a second step was performed with a new set of modulations and corresponding images. In this second step, the correction obtained in the first optimization step was added to all modulations displayed, φ_SLM + φ_correction1 . The final correction was the sum of the first and second step corrections, φ_correction1 + φ_correction2 . We used 180 modulations for transmission and 540 for reflection experiments in each of the two iteration steps.
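The similarity measure can be sketched in a few lines; this is a minimal numpy version of a Pearson-correlation loss 1 − r between a simulated and a recorded image (in this work the equivalent computation runs inside Tensorflow so that it can be differentiated):

```python
import numpy as np

def pearson_loss(simulated, measured):
    """Loss 1 - r between a simulated and a measured camera image.
    r is Pearson's correlation coefficient over all pixels; the loss is 0
    for perfectly matching images (up to affine intensity scaling)."""
    s = simulated.ravel() - simulated.mean()
    m = measured.ravel() - measured.mean()
    r = (s @ m) / (np.linalg.norm(s) * np.linalg.norm(m) + 1e-12)
    return 1.0 - r

rng = np.random.default_rng(0)
img = rng.random((64, 64))
```

Because r is invariant to intensity offset and gain, the loss is insensitive to overall brightness differences between model and camera.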
As seen for two representative examples in Fig. 2 a, b and d, e, respectively, optimization (which results in the corresponding phase profiles in Fig. 2 c, f) leads to closely matching measured and predicted light distributions at the sample (transmission camera) after applying the correction at the SLM (the correlation coefficient r is indicated in Fig. 2 b, e). The similarity is quantified with the loss function 1 − r in Fig. 2 o. To verify the achieved correction, the optimized aberration found at the plane of lens L 1 was propagated back to the SLM (see Methods) and the corresponding correction was displayed on the SLM.

In an epidetection configuration, as typical for imaging in biological samples, only reflected light can be used for finding a correction. Reflected light however accumulates a first aberration encountered in the excitation pass and a (generally different) second aberration in the reflection pass [4,12]. These need to be disentangled, for example to recover the transmission pass correction required for generating a focus inside a sample. For focusing in transmission using only reflected light, the function S (equation 1) is therefore extended (see Methods) to include the reflection pass from the mirror at the sample plane through the aberration to the second camera (reflection camera), now including an aberration in the transmission as well as in the reflection pass. This model is fitted to match observed reflected light distributions by optimizing two independent aberrations at the same time. Optimization is performed as before (see Methods).
Two representative examples of predicted and measured light distributions at the sample plane (transmission camera) are shown in Fig. 2 g, h and k, l, respectively, and the loss function quantifying the similarity is shown in Fig. 2 o (the correlation coefficient r between predicted and measured distributions is indicated in Fig. 2 h, l). The corresponding transmission and reflection phase aberrations at the planes of lenses L 1 and L 2 are shown in Fig. 2 i, j and m, n, respectively. To verify the correction, we generated a focus at the sample plane by displaying the corresponding transmission correction on the SLM (see Methods). Fig. 3 shows two representative examples (g-i and j-l) of the aberrated focus, the corrected focus, and the corresponding correction (resulting in increases in enhancement by factors of 10.4 and 8.7, respectively, see Methods). In reflection-based transmission control, the obtained focus was not necessarily centered in the field of view (as for example seen in Fig. 3 j, k), due to tilt introduced by the sample that was not corrected. Importantly, in these experiments focusing in transmission is achieved using only reflected light, compatible with an epidetection configuration.
For reflection-based transmission control, aberrations in two different focal planes are independently computed at the same time. This is similar to multiconjugate adaptive optics, where typically however an additional SLM is used to correct for a second focal plane [29,30,31]. Additionally, while we generated single focal spots, arbitrary other focal distributions could be generated as well (for example for applications in optogenetics).

Experimental setup and data acquisition
The laser was from Toptica (iBeam smart, 640 nm), the spatial light modulator from Meadowlark (SLM, ODP512-1064-P8), and the cameras from Basler (acA2040-55um). All optical parts were from Thorlabs: BS 1 in Fig. 1 was BS016, BS 2 and BS 3 were EBS2. Lenses were visible achromats, L 1 with focal length 300 mm (AC254-300-A) and L 2 with 150 mm (AC254-150-A). Data were collected by placing an aberrating sample in the optical path between BS 2 and BS 3 (see Fig. 1), displaying random SLM phase modulations, and recording the resulting 512 × 512 images with the transmission camera, and additionally with the reflection camera for reflection experiments. Random SLM phase modulations were generated by summing the first 78 Zernike modes with random coefficients drawn from a normal distribution with standard deviation π, and were displayed at a resolution of 512 × 512 pixels on the SLM.
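The generation of random modulations can be sketched as follows. The Zernike implementation and the mode ordering are assumptions for illustration (enumerating radial orders n = 0..11 yields exactly 78 (n, m) modes); the grid size is reduced from 512 for brevity:

```python
import numpy as np
from math import factorial

def zernike(n, m, rho, theta):
    """Zernike polynomial Z_n^m on the unit disk (no normalization),
    via the standard radial-polynomial factorial formula."""
    R = np.zeros_like(rho)
    for k in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + abs(m)) // 2 - k)
                * factorial((n - abs(m)) // 2 - k)))
        R += c * rho ** (n - 2 * k)
    ang = np.cos(m * theta) if m >= 0 else np.sin(-m * theta)
    return np.where(rho <= 1.0, R * ang, 0.0)

def random_modulation(size=64, seed=0):
    """Sum of the first 78 Zernike modes (radial orders n = 0..11) with
    coefficients drawn from a normal distribution with std pi, as for
    the random SLM patterns described in the text."""
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    rng = np.random.default_rng(seed)
    phase = np.zeros((size, size))
    for n in range(12):                      # n = 0..11 -> 78 (n, m) modes
        for m in range(-n, n + 1, 2):
            phase += rng.normal(0.0, np.pi) * zernike(n, m, rho, theta)
    return phase
```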
The light intensity in transmission and reflection can vary by several orders of magnitude, exceeding the dynamic range of the cameras. Therefore, in order to capture the full range of intensities, multiple frames with different exposure times were recorded (each frame with 12 bits per pixel). In transmission, frames were recorded with exposures of 60, 120, and 250 ms, and the resulting image was the sum of the recorded frames weighted by the inverse of the exposure time. Saturated pixels as well as pixels below the noise threshold were discarded. For the reflection camera, images were taken with exposures of 60, 120, 250, 500, and 1000 ms. Additionally, transmission light intensity was reduced with a neutral density filter wheel (NDM2/M, Thorlabs).
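A minimal sketch of this exposure fusion; the 12-bit saturation level and the noise threshold are assumed values for illustration:

```python
import numpy as np

def fuse_exposures(frames, exposures, saturation=4095, noise_floor=4):
    """Combine 12-bit frames taken at different exposure times into one
    high-dynamic-range image: each valid pixel contributes its count
    weighted by the inverse exposure time; saturated pixels and pixels
    at or below the noise threshold are discarded."""
    fused = np.zeros(frames[0].shape, dtype=float)
    for frame, t in zip(frames, exposures):
        valid = (frame < saturation) & (frame > noise_floor)
        fused += np.where(valid, frame / t, 0.0)
    return fused
```

Dividing by the exposure time puts all frames on a common counts-per-millisecond scale before summing, so short exposures supply the bright pixels and long exposures the dim ones.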

Computational model
Light travelling through the setup is modeled as a complex amplitude U(x, y, z), initialized with U_0 = U(x, y, 0), and propagating through a sequence of planar phase objects and intermittent free space along the optical axis (z-axis; x, y, z are spatial coordinates). A wavefront U(x, y, d) interacting with a phase object φ(x, y, d) at plane d is described as a multiplication:

U(x, y, d) → U(x, y, d) · exp[iφ(x, y, d)].

Free space propagation of the wavefront over a distance d is calculated using the angular spectrum method with the following operator [32]:

P_d{U}(x, y, z + d) = F⁻¹{A(f_X, f_Y; z) · circ(√((λf_X)² + (λf_Y)²)) · H(f_X, f_Y)}.

Here, A(f_X, f_Y; z) is the Fourier transform of U(x, y, z), f_X, f_Y are spatial frequencies, the circ function is 1 inside the circle with the radius in the argument and 0 outside [32], and

H(f_X, f_Y) = exp[i(2πd/λ)√(1 − (λf_X)² − (λf_Y)²)]

is the optical transfer function. The intensity measured by the camera is given by

I(x, y) = |U(x, y, z_camera)|².
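A sampled version of the propagation operator P_d can be sketched with numpy FFTs; the grid spacing and wavelength below are illustrative parameters, not the setup's values:

```python
import numpy as np

def propagate(U, d, wavelength, dx):
    """Angular spectrum propagation P_d of a sampled complex amplitude U
    over a distance d (square grid, spacing dx in x and y). Evanescent
    components outside the circle (lambda*fX)^2 + (lambda*fY)^2 < 1 are
    discarded, playing the role of the circ factor."""
    n = U.shape[0]
    fx = np.fft.fftfreq(n, d=dx)                    # spatial frequencies
    fX, fY = np.meshgrid(fx, fx, indexing="ij")
    under = (wavelength * fX) ** 2 + (wavelength * fY) ** 2
    circ = under < 1.0                              # propagating waves only
    # optical transfer function H(fX, fY)
    H = np.exp(1j * 2 * np.pi * d / wavelength
               * np.sqrt(np.where(circ, 1.0 - under, 0.0)))
    A = np.fft.fft2(U)                              # angular spectrum A(fX, fY)
    return np.fft.ifft2(A * H * circ)
```

Since |H| = 1 on the propagating band, propagating forward by d and back by −d recovers the original bandlimited field, which gives a convenient self-check.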

Model optimization
We used a Python library for diffractive optics [33] to calculate the known factors of (1). By providing the focal lengths and setup dimensions, discretized versions of the optical transfer functions for the propagation operators and the phase representations of the lenses were determined. The resulting function, which relates displayed, known SLM phase modulations and unknown sample aberrations to camera images, was then transferred to Tensorflow. The position of the scatterer, as seen in expression (1), was simulated at the plane of lens L 1 . This saves computations and memory, since each intermediate plane requires additional wavefront propagation calculations. Similarly, for computational efficiency, a single lens L 2 with focal length f_2 = f_1/2 was used to focus reflected light onto the reflection camera. While the parameters of the optical model for transmission and reflection were adjusted manually to match the setup, they could equally be tuned using the optimization approach described below, for example to obtain a systems correction.
The model of light propagation in the setup, equation (1), was incorporated in the loss function according to equation (2):

L(φ_aberration) = 1 − r(S(φ_SLM, φ_aberration), I),

where φ_SLM and φ_aberration are the phase modulations at the SLM and due to the introduced aberration, respectively. All variables are 512 × 512 real-valued tensors, and φ_aberration is the optimization variable.
Similar to the training of neural networks, we split the data into training and validation sets and used batches. We used the Adam optimizer with a learning rate of 0.1 and a batch size of 30. Optimization with the loss function resulted in matching simulated and experimentally recorded images and yielded the phase profile of the aberration in the setup. The quality of the solution was quantified with the correlation between modelled and recorded images in the validation part of the dataset, which served as the criterion for stopping the optimization. Convergence of the optimization process depended on the magnitude and spatial frequencies of the aberration and on the number of samples. Typically a solution with r > 0.6 was sufficient for focusing.
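A toy version of this optimization loop, with the optical forward model replaced by an identity mapping so that the gradient of the 1 − r loss can be written analytically (in the actual work, Tensorflow's automatic differentiation supplies this gradient through the full propagation model; the step count is illustrative):

```python
import numpy as np

def pearson(a, b):
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def grad_loss(phi, target):
    """Analytic gradient of 1 - r(phi, target) w.r.t. phi, for the toy
    case where the 'simulated image' is phi itself."""
    s = phi.ravel() - phi.mean()
    m = target.ravel() - target.mean()
    ns, nm = np.linalg.norm(s), np.linalg.norm(m)
    r = (s @ m) / (ns * nm + 1e-12)
    g = -(m / (ns * nm) - r * s / ns**2)
    g -= g.mean()                  # chain rule through the mean subtraction
    return g.reshape(phi.shape)

def adam_fit(target, lr=0.1, steps=500, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam updates (learning rate 0.1, as in the text) on the 1 - r loss."""
    rng = np.random.default_rng(1)
    phi = rng.normal(size=target.shape)
    m = np.zeros_like(phi)
    v = np.zeros_like(phi)
    for t in range(1, steps + 1):
        g = grad_loss(phi, target)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        mhat = m / (1 - beta1 ** t)
        vhat = v / (1 - beta2 ** t)
        phi -= lr * mhat / (np.sqrt(vhat) + eps)
    return phi
```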
For experiments with reflected light, the simulation of the setup was extended to include the reflected light pass, and the loss function (6) was optimized with respect to the two variables φ_trans and φ_ref.

Evaluation
After φ_aberration is found through optimization at the lens plane, the corresponding correction at the SLM is found by propagating the conjugate phase of the aberration backwards to the SLM plane, φ_correction = arg(P_{−f_1}(exp[−iφ_aberration])). Additionally, we smooth the found correction with a low-pass spatial frequency filter: a discrete Fourier transform is applied to exp[iφ_correction] and frequencies exceeding 0.1 of the pattern resolution are discarded. Displaying φ_correction on the SLM then compensates the aberration.
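The smoothing step can be sketched as follows; the square frequency mask is an assumed interpretation of "frequencies exceeding 0.1 of the pattern resolution are discarded":

```python
import numpy as np

def smooth_phase(phase, cutoff=0.1):
    """Low-pass filter a phase pattern: transform exp(i*phase), discard
    spatial frequencies above `cutoff` (in cycles per pixel), and take
    the argument of the filtered field. Filtering the complex field
    rather than the raw phase avoids artifacts at 2*pi wraps."""
    n = phase.shape[0]
    field = np.fft.fft2(np.exp(1j * phase))
    f = np.abs(np.fft.fftfreq(n))                  # frequencies in [0, 0.5]
    mask = (f[:, None] <= cutoff) & (f[None, :] <= cutoff)
    return np.angle(np.fft.ifft2(field * mask))
```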
Aberrations were introduced with a thin layer of transparent nail polish distributed on a microscope slide (inserted between BS 2 and BS 3 ). Two different aberrating samples were used for transmission and reflection experiments. Generally, the strength of aberrations varies depending on sample positioning. Optimization parameters (such as the number of phase modulations or the learning rate) were only adjusted once for transmission experiments, and once for reflection experiments. As a simple measure for quantifying the shape of the uncorrected light distribution, we use its maximum extension σ, as measured by the length of the first principal component of the pixels above a 30 % intensity threshold. To quantify the change in the distribution before and after correction we compared the uncorrected and corrected distributions, σ_rel = σ_u/σ_c. To additionally quantify the quality of the aberration correction, we also used an enhancement metric defined as the ratio of maximum intensity to mean intensity in the frame, η = max(I)/mean(I), comparing it before and after correction, η_rel = η_c/η_u. The distribution of these values (η_rel, σ_rel) for a series of
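The two metrics can be sketched as follows; interpreting the "length of the first principal component" as the extent of the thresholded pixel coordinates projected onto the leading principal axis is an assumption:

```python
import numpy as np

def enhancement(image):
    """Focus quality metric eta: ratio of peak to mean intensity in the frame."""
    return image.max() / image.mean()

def extension(image, threshold=0.3):
    """Maximum extension sigma of the light distribution: extent of the
    pixels above `threshold` * peak intensity, projected onto the first
    principal axis of their coordinates."""
    ys, xs = np.nonzero(image > threshold * image.max())
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    # leading eigenvector of the coordinate scatter matrix = principal axis
    w, v = np.linalg.eigh(pts.T @ pts)
    proj = pts @ v[:, -1]
    return proj.max() - proj.min()
```

The relative values then follow directly, e.g. η_rel = enhancement(corrected) / enhancement(uncorrected).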

Discussion
Differentiable model-based approaches for image reconstruction have been introduced in several domains of imaging [17], for example in ptychography [22]. Instead of directly optimizing model parameters, an additional deep neural network (a deep image prior, DIP [34,35]) has also been introduced, for example for phase imaging [23,24] or ptychography [25]. Even without such additional regularization, the optimization converged reliably to smooth phase patterns (Fig. 2 c, f, i, m, j, n and Fig. 3 c, f, i, l), but a DIP could, for example, also be combined with the introduced method to further reduce the number of samples used for optimization.
The number of required samples depends on the magnitude and spatial frequencies of the aberrations, with stronger aberrations requiring more samples. This parallels the training of deep neural networks, where the number of required training samples similarly increases with increasing aberrations. Compared to deep neural networks, including a physical model of the imaging process allows finding aberration corrections with a small number of samples (albeit only for a single field of view at a time). Unlike neural networks, which are trained on a predetermined distribution of aberrations (for example based on Zernike polynomials), the optimization here proceeds independently in each pixel without prior assumptions about the aberrations.
Similar to other techniques that require multiple measurements for finding a correction [4,13], a limitation of the presented approach for dynamic samples is the time it takes to find a correction. The optimization time, several minutes on a single GPU, could be reduced by using multiple GPUs. Generally, the gap between optimization and control (corresponding to rapidly changing corrections in response to aberrations) is expected to narrow with increasing computational power [36].
Thanks to advanced computational frameworks [33,37], the introduced model-based optimization can easily be combined with any optical setup equipped with a spatial light modulator and a camera, without requiring additional hardware such as wavefront sensors or interferometers. For example, the described technique could be combined with imaging through scattering materials in a microscope with a high numerical aperture objective in an epidetection configuration [12]. In summary, we expect that the developed method will be useful in many situations that can benefit from correcting aberrations through single and multiple layers.

Funding
This work was supported by the Max Planck Society and the research center caesar.

Disclosures
The authors declare no conflicts of interest.