Computational structured illumination for high-content fluorescent and phase microscopy

High-content biological microscopy targets high-resolution imaging across large fields-of-view (FOVs). Recent works have demonstrated that computational imaging can provide efficient solutions for high-content microscopy. Here, we use speckle structured illumination microscopy (SIM) as a robust and cost-effective solution for high-content fluorescence microscopy with simultaneous high-content quantitative phase (QP). This multi-modal compatibility is essential for studies requiring cross-correlative biological analysis. Our method uses laterally-translated Scotch tape to generate high-resolution speckle illumination patterns across a large FOV. Custom optimization algorithms then jointly reconstruct the sample's super-resolution fluorescent (incoherent) and QP (coherent) distributions, while digitally correcting for system imperfections such as unknown speckle illumination patterns, system aberrations and pattern translations. Beyond previous linear SIM works, we achieve resolution gains of 4× the objective's diffraction-limited native resolution, resulting in 700 nm fluorescence and 1.2 µm QP resolution, across a FOV of 2×2.7 mm², giving a space-bandwidth product (SBP) of 60 megapixels.


Introduction
The space-bandwidth product (SBP) metric characterizes information content transmitted through an optical system; it can be thought of as the number of resolvable points in an image (i.e. the system's field-of-view (FOV) divided by the size of its point spread function (PSF) [1,2]). Typical microscopes collect images with SBPs of <20 megapixels, a practical limit set by the systems' optical design and camera pixel count. For large-scale biological studies in systems biology and drug discovery, fast high-SBP imaging is desired [3][4][5][6][7][8][9][10]. The traditional solution for increasing SBP is to use an automated translation stage to scan the sample laterally, then stitch together high-content images. However, such capabilities are costly, have long acquisition times and require careful auto-focusing, due to small depth-of-field (DOF) and axial drift of the sample over large scan ranges [11].
Instead of using high-resolution optics and mechanically scanning the FOV, new approaches for high-content imaging use a low-NA objective (with a large FOV) and build up higher resolution by computationally combining a sequence of low-resolution measurements [12][13][14][15][16][17][18][19][20][21][22][23][24][25]. Such approaches typically illuminate the sample with customized patterns that encode high-resolution sample information into low-resolution features, which can then be measured. These methods reconstruct features smaller than the diffraction limit of the objective, using concepts from synthetic aperture [26][27][28] and super-resolution (SR) [29][30][31][32][33][34]. Though the original intent was to maximize resolution, it is important to note that by increasing resolution, SR techniques also increase SBP, and therefore have application in high-content microscopy. Eliminating the requirement for long-distance mechanical scanning means that acquisition is faster and less expensive, while focus requirements are also relaxed by the larger DOF of low-NA objectives.
Though most SIM implementations have focused on super-resolution, some previous works have recognized its suitability for high-content imaging [18][19][20][21][22][23][24]. However, these predominantly relied on fluorescence imaging with calibrated illumination patterns, which are difficult to realize in practice because lens-based illumination has finite SBP. Here, we use random speckle illumination, generated by scattering through Scotch tape, in order to achieve both high-NA and large FOV illumination.
Our method is related to blind SIM [46]; however, instead of using many random speckle patterns (which restricts resolution gain to ∼1.8×), we translate the speckle laterally, enabling resolution gains beyond that of previous methods [46][47][48][49][50][51][52] (see Appendix D). Previous works also use high-cost spatial light modulators (SLMs) [53] or galvanometer/MEMS mirrors [41,54] for precise illumination, as well as expensive objective lenses for aberration correction. We eliminate both of these requirements by performing computational self-calibration, solving for the translation trajectory and the field-dependent aberrations of the system.
Our proposed framework enables three key advantages over existing methods:
• resolution gains of 4× the native resolution of the objective (linear SIM is usually restricted to 2×) [46-52, 55, 56],
• synergistic use of both the fluorescent (incoherent) and quantitative-phase (coherent) signal from the sample to enable multi-modal imaging,
• algorithmic self-calibration to significantly relax hardware requirements, enabling low-cost and robust imaging.
In our experimental setup, the Scotch tape is placed just before the sample and mounted on a translation stage (Fig. 1). This generates disordered speckles at the sample that are much smaller than the PSF of the imaging optics, encoding SR information. Nonlinear optimization methods are then used to jointly reconstruct multiple calibration quantities: the unknown speckle illumination pattern, the translation trajectory of the pattern, and the field-dependent system aberrations (on a patch-by-patch basis). These are subsequently used to decode the SR information of both fluorescence and phase. Compared to traditional SIM systems that use high-NA objective lenses, our system utilizes a low-NA low-cost lens to ensure large FOV. The Scotch tape generated speckle illumination is not resolution-bound by any imaging lens; this is what allows us to achieve 4× resolution gains. The result is high-content imaging at sub-micron resolutions across millimeter scale regions. Various previous works have achieved cost-effectiveness, high-content (large SBP), or multiple modalities, but we believe this to be the first to simultaneously encompass all three.

Theory
SIM generally achieves super-resolution by illuminating the sample with a high spatial-frequency pattern that mixes with the sample's information content to form low-resolution "beat" patterns (i.e. moiré fringes). Measurements of these "beat" patterns allow elucidation of sample features beyond the diffraction-limited resolution of the imaging system. Maximum achievable resolution in SIM is set by the sum of the numerical apertures (NAs) of the illumination pattern, NA illum , and the imaging system, NA sys . Thus, SIM enables a resolution gain factor (over the system's native resolution) of (NA illum + NA sys )/NA sys [33]. The minimum resolvable feature size is inversely related to this bound, d ∝ 1/(NA illum + NA sys ).

Figure 1: Structured illumination microscopy (SIM) with laterally-translated Scotch tape as the patterning element, achieving 4× resolution gain. Our imaging system has both an incoherent arm, where Sensor-F captures raw fluorescence images (at the emission wavelength, λ em = 605 nm) for fluorescence super-resolution, and a coherent arm, where Sensor-C1 and Sensor-C2 capture images with different defocus (at the laser illumination wavelength, λ ex = 532 nm) for both super-resolution phase reconstruction and speckle trajectory calibration. OBJ: objective, AP: adjustable iris-aperture, DM: dichroic mirror, SF: spectral filter, ND-F: neutral-density filter.
Linear SIM typically maximizes resolution by using either: 1) a high-NA objective in epi-illumination configuration, or 2) two identical high-NA objectives in transmission geometry [33,35]. Both result in a maximum of 2× resolution gain because NA illum = NA sys , which corresponds to an SBP increase by a factor of 4×. Given the relatively low native SBP of high-NA imaging lenses, such increases are not sufficient to qualify as high-content imaging. Though nonlinear SIM techniques can enable higher resolution gains [34], they require either fluorophore photo-switching or saturation capabilities, which can be associated with photobleaching and low SNR, and are not compatible with coherent QP techniques.
In this work, we aim for > 2× resolution gain; hence, we need the illumination NA to be larger than the detection NA, without using a high-resolution illumination lens (that would restrict the illumination FOV). To achieve this, we use a wide-area high-angle scattering element, layered Scotch tape, on the illumination side of the sample (Fig. 1). Multiple scattering within the tape creates a speckle pattern with finer features than the PSF of the imaging system, i.e. NA illum > NA sys . This means that spatial frequencies beyond 2× the objective's cutoff are mixed into the measurements, which makes resolution gains greater than two achievable.
The following sections outline the algorithm that we use to reconstruct large SBP fluorescence and QP images from low-resolution acquisitions of a sample illuminated by a laterally-translating speckle pattern. Unlike conventional SIM reconstruction methods that use analytic linear inversion, our strategy relies instead on joint-variable iterative optimization, where both the sample and illumination speckle (which is unknown) are reconstructed [25,55,56].

Super-resolution fluorescence imaging
Fluorescence imaging requires an incoherent imaging model. The intensity at the sensor is a low-resolution image of the sample's fluorescent distribution, obeying the system's incoherent resolution limit, d sys = λ em /2NA sys , where λ em is the emission wavelength. The speckle pattern generated through the Scotch tape excites the fluorescent sample with features of minimum size d illum = λ ex /2NA illum , where λ ex is the excitation wavelength and NA illum is set by the scattering angles exiting the Scotch tape. Approximating the excitation and emission wavelengths as similar (λ = λ ex ≈ λ em ), the resolution limit of the SIM reconstruction is d SIM ≈ λ/2(NA sys + NA illum ), with a resolution gain factor of d sys /d SIM . This factor is mathematically unbounded; however, it will be practically limited by the illumination NA and SNR (see Appendix D).
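The resolution expressions above can be evaluated directly. The following is a minimal sketch, assuming λ ex ≈ λ em = 605 nm, NA sys = 0.1 and NA illum = 0.3 as an illustrative operating point; the function names are hypothetical and are not taken from the authors' code.

```python
def incoherent_limit_um(wavelength_um: float, na: float) -> float:
    """Incoherent diffraction limit d = lambda / (2 NA), in micrometers."""
    return wavelength_um / (2.0 * na)


def sim_limit_um(wavelength_um: float, na_sys: float, na_illum: float) -> float:
    """SIM resolution limit d_SIM ~ lambda / (2 (NA_sys + NA_illum))."""
    return wavelength_um / (2.0 * (na_sys + na_illum))


# Illustrative inputs (an assumed operating point, not measured values).
wavelength_um = 0.605
na_sys, na_illum = 0.1, 0.3

d_sys = incoherent_limit_um(wavelength_um, na_sys)
d_sim = sim_limit_um(wavelength_um, na_sys, na_illum)
gain = d_sys / d_sim  # algebraically equal to (na_sys + na_illum) / na_sys

print(f"d_sys = {d_sys:.3f} um, d_SIM = {d_sim:.3f} um, gain = {gain:.2f}x")
```

Under these assumptions the native limit is about 3 µm and the SIM limit is about 0.76 µm, i.e. a gain factor of 4.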

Incoherent forward model for fluorescence imaging
Plane-wave illumination of the Scotch tape, positioned at the $\ell$-th scan point, $\mathbf{r}_\ell$, creates a speckle illumination pattern, $p_f(\mathbf{r} - \mathbf{r}_\ell)$, at the plane of the fluorescent sample, $o_f(\mathbf{r})$, where subscript $f$ identifies variables in the fluorescence channel. The fluorescent signal is imaged through the system to give an intensity image at the camera plane:

$$I_{f,\ell}(\mathbf{r}) = \left[ o_f(\mathbf{r}) \cdot \mathcal{C}\{ p_f(\mathbf{r} - \mathbf{r}_\ell) \} \right] \otimes h_f(\mathbf{r}), \qquad \ell = 1, \ldots, N_{\mathrm{img}}, \tag{1}$$

where $\otimes$ denotes 2D convolution, $\mathbf{r}$ is the 2D spatial coordinate $(x, y)$, $h_f(\mathbf{r})$ is the system PSF, and $N_{\mathrm{img}}$ is the total number of images captured. The subscript $\ell$ is the acquisition index. In this formulation, $o_f(\mathbf{r})$, $h_f(\mathbf{r})$, and $I_{f,\ell}(\mathbf{r})$ are 2D $M \times M$-pixel distributions. To accurately model different regions of the pattern translating into the object's $M \times M$ FOV with incrementing $\mathbf{r}_\ell$, we initialize $p_f(\mathbf{r})$ as an $N \times N$-pixel 2D distribution, with $N > M$, and introduce a cropping operator $\mathcal{C}$ to select the $M \times M$ region of the scanning pattern that illuminates the sample.
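The incoherent forward model can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the pattern is translated with a Fourier phase ramp, the cropping operator takes the central M × M region, the OTF comes from an ideal circular pupil, and all function names (`shift_and_crop`, `incoherent_otf`, `fluorescence_forward`) are hypothetical.

```python
import numpy as np


def shift_and_crop(pattern, shift, m):
    """Return the central m x m crop of pattern(r - shift); shift is in pixels."""
    n = pattern.shape[0]
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    ramp = np.exp(-2j * np.pi * (fy * shift[0] + fx * shift[1]))
    shifted = np.fft.ifft2(np.fft.fft2(pattern) * ramp).real
    start = (n - m) // 2
    return shifted[start:start + m, start:start + m]


def incoherent_otf(m, cutoff):
    """OTF of an ideal circular pupil; cutoff is the incoherent cutoff in cycles/pixel."""
    fy = np.fft.fftfreq(m)[:, None]
    fx = np.fft.fftfreq(m)[None, :]
    pupil = (np.hypot(fy, fx) <= cutoff / 2).astype(float)
    psf = np.abs(np.fft.ifft2(pupil)) ** 2
    return np.fft.fft2(psf) / psf.sum()


def fluorescence_forward(obj, pattern, otf, shift):
    """Low-resolution image: (sample x cropped, shifted pattern) blurred by the PSF."""
    excited = obj * shift_and_crop(pattern, shift, obj.shape[0])
    return np.fft.ifft2(np.fft.fft2(excited) * otf).real


rng = np.random.default_rng(0)
obj = rng.random((32, 32))          # M x M sample (random stand-in)
pattern = rng.random((48, 48))      # N x N speckle stand-in, N > M
otf = incoherent_otf(32, cutoff=0.2)
image = fluorescence_forward(obj, pattern, otf, shift=(1.5, -2.0))
```

Using a phase ramp for the translation allows sub-pixel scan positions, which matters because the scan steps are smaller than the camera-limited resolution.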

Inverse problem for fluorescence imaging
We next formulate a joint-variable optimization problem to extract SR estimates of the sample, $o_f(\mathbf{r})$, and illumination distributions, $p_f(\mathbf{r})$, from the raw fluorescence measurements, $I_{f,\ell}(\mathbf{r})$, as well as refine the estimate of the system's PSF [25] (aberrations) and speckle translation trajectory, $\mathbf{r}_\ell$. We start with a crude initialization from raw observations of the speckle made using the coherent imaging arm (more details in Sec. 2.3). Defining $f_f(o_f, p_f, h_f, \mathbf{r}_1, \ldots, \mathbf{r}_{N_{\mathrm{img}}})$ as a joint-variable cost function that measures the difference between the raw intensity acquisitions and the expected intensities from estimated variables via the forward model, we have:

$$\min_{o_f,\, p_f,\, h_f,\, \mathbf{r}_1, \ldots, \mathbf{r}_{N_{\mathrm{img}}}} f_f(o_f, p_f, h_f, \mathbf{r}_1, \ldots, \mathbf{r}_{N_{\mathrm{img}}}) = \sum_{\ell=1}^{N_{\mathrm{img}}} \sum_{\mathbf{r}} \left| I_{f,\ell}(\mathbf{r}) - \left[ o_f(\mathbf{r}) \cdot \mathcal{C}\{ p_f(\mathbf{r} - \mathbf{r}_\ell) \} \right] \otimes h_f(\mathbf{r}) \right|^2. \tag{2}$$

To solve, a sequential gradient descent [57,58] algorithm is used, where the gradient is updated once for each measurement. The sample, speckle pattern, system's PSF and scanning positions are updated by sequentially running through the $N_{\mathrm{img}}$ measurements within one iteration. After the sequential update, an extra Nesterov accelerated update [59] is included for both the sample and pattern estimates, to speed up convergence. Appendix A contains a detailed derivation of the gradients with respect to the sample, structured pattern, system's PSF and scanning position, based on linear-algebra vectorial notation. The algorithm is described in Appendix B.
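The sequential update can be illustrated on a reduced problem. The sketch below is a simplification, not the full algorithm: it updates only the sample, and it holds the illumination patterns and the OTF fixed at their true values, whereas the method described above also updates the pattern, PSF and scan positions and adds a Nesterov step. The synthetic data, step size and function names are assumptions made for illustration.

```python
import numpy as np


def blur(img, otf):
    """Circular convolution with the PSF, applied through its OTF."""
    return np.fft.ifft2(np.fft.fft2(img) * otf).real


def object_gradient(obj, illum, otf, meas):
    """Gradient of sum((meas - blur(obj * illum))**2) with respect to obj."""
    residual = meas - blur(obj * illum, otf)
    return -2.0 * illum * blur(residual, np.conj(otf))


def sequential_gradient_descent(meas_list, illum_list, otf, n_iter=20, step=0.5):
    obj = np.mean(meas_list, axis=0)  # widefield-like initialization
    for _ in range(n_iter):
        # One gradient step per measurement, cycling through all frames.
        for meas, illum in zip(meas_list, illum_list):
            obj = obj - step * object_gradient(obj, illum, otf, meas)
    return obj


# Synthetic, noise-free demo with integer-pixel pattern shifts.
rng = np.random.default_rng(1)
m, n = 32, 48
fy = np.fft.fftfreq(m)[:, None]
fx = np.fft.fftfreq(m)[None, :]
pupil = (np.hypot(fy, fx) <= 0.1).astype(float)
psf = np.abs(np.fft.ifft2(pupil)) ** 2
otf = np.fft.fft2(psf) / psf.sum()

truth = rng.random((m, m))
speckle = rng.random((n, n))
illum_list = [np.roll(speckle, (dy, dx), axis=(0, 1))[8:8 + m, 8:8 + m]
              for dy in range(0, 8, 2) for dx in range(0, 8, 2)]
meas_list = [blur(truth * illum, otf) for illum in illum_list]

estimate0 = np.mean(meas_list, axis=0)
estimate = sequential_gradient_descent(meas_list, illum_list, otf)
err0 = np.linalg.norm(estimate0 - truth)
err = np.linalg.norm(estimate - truth)
print(f"error before: {err0:.3f}, after: {err:.3f}")
```

With patterns and OTF values bounded by 1, a step of 0.5 keeps each per-frame update non-expansive, so the error with respect to the true sample cannot grow in this noise-free setting.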

Super-resolution quantitative-phase imaging
In this section, we present our coherent model for SR quantitative-phase (QP) imaging. A key difference between the QP and fluorescence imaging processes is that the detected intensity at the image plane for coherent imaging is nonlinearly related to the sample's QP [1,38]. Thus, solving for a sample's QP from a single intensity measurement is a nonlinear and ill-posed problem. To circumvent this, we use intensity measurements from two planes, one in-focus and one out-of-focus, to introduce a complex-valued operator that couples QP variations into measurable intensity fluctuations, making the reconstruction well-posed [60,61]. The defocused measurements are denoted by a new subscript variable z. Figure 1 shows our implementation, where two defocused sensors are positioned at z 0 and z 1 in the coherent imaging arm. Generally, the resolution for coherent imaging is roughly half that of its incoherent counterpart [1]. For our QP reconstruction, the resolution limit is d SIM = λ ex /(NA sys + NA illum ), where the coherent resolution of the native system and the speckle are d sys = λ ex /NA sys and d illum = λ ex /NA illum , respectively.

Coherent forward model for phase imaging
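Assuming an object with 2D complex transmittance function $o_c(\mathbf{r})$ is illuminated by a speckle field, $p_c(\mathbf{r})$, positioned at the $\ell$-th scanning position $\mathbf{r}_\ell$, where subscript $c$ refers to the coherent imaging channel, we can represent the intensity image formed via coherent diffraction as:

$$I_{c,\ell z}(\mathbf{r}) = \left| \left[ o_c(\mathbf{r}) \cdot \mathcal{C}\{ p_c(\mathbf{r} - \mathbf{r}_\ell) \} \right] \otimes h_{c,z}(\mathbf{r}) \right|^2 = \left| g_{c,\ell z}(\mathbf{r}) \right|^2, \qquad \ell = 1, \ldots, N_{\mathrm{img}}, \quad z = z_0, z_1, \tag{3}$$

where $g_{c,\ell z}(\mathbf{r})$ and $h_{c,z}(\mathbf{r})$ are the complex electric field at the imaging plane and the system's coherent PSF at defocus distance $z$, respectively. The comma in the subscript separates the channel index, $c$ or $f$, from the scanning-position and acquisition-number indices, $\ell$ and $z$. $N_{\mathrm{img}}$ here indicates the total number of translations of the Scotch tape. The defocused PSF can be further broken down into $h_{c,z}(\mathbf{r}) = h_c(\mathbf{r}) \otimes h_z(\mathbf{r})$, where $h_c(\mathbf{r})$ is the in-focus coherent PSF and $h_z(\mathbf{r})$ is the defocus kernel. As in the incoherent forward model, $\mathcal{C}$ is a cropping operator that selects the sub-region of the pattern that interacts with the sample. The sample's QP distribution is simply the phase of the object's complex transmittance, $\angle o_c(\mathbf{r})$.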
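The role of the defocused measurement can be seen in a small simulation of the coherent forward model. This is a sketch under assumptions: the defocus kernel is modeled with the angular-spectrum transfer function (the text only names it a "defocus kernel"), the pupil is all-pass, the illumination is a plane wave rather than speckle, and the pixel size, defocus distance and function names are hypothetical.

```python
import numpy as np


def defocus_kernel(m, pixel_um, wavelength_um, z_um):
    """Angular-spectrum transfer function for propagation by z_um (assumed model)."""
    fy = np.fft.fftfreq(m, d=pixel_um)[:, None]
    fx = np.fft.fftfreq(m, d=pixel_um)[None, :]
    arg = 1.0 / wavelength_um ** 2 - fx ** 2 - fy ** 2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    return np.exp(1j * kz * z_um) * (arg > 0)  # drop evanescent components


def coherent_forward(obj_c, illum_field, pupil, defocus):
    """Intensity of (object x illumination field) filtered by pupil and defocus."""
    field = np.fft.ifft2(np.fft.fft2(obj_c * illum_field) * pupil * defocus)
    return np.abs(field) ** 2


m, pixel_um, wavelength_um = 64, 1.0, 0.532
x = np.arange(m)[None, :] * np.ones((m, 1))
obj_c = np.exp(1j * 0.5 * np.cos(2 * np.pi * 4 * x / m))  # pure-phase object
illum = np.ones((m, m), dtype=complex)                     # plane wave, no speckle
pupil = np.ones((m, m))                                    # all-pass pupil for clarity

in_focus = coherent_forward(
    obj_c, illum, pupil, defocus_kernel(m, pixel_um, wavelength_um, 0.0))
defocused = coherent_forward(
    obj_c, illum, pupil, defocus_kernel(m, pixel_um, wavelength_um, 20.0))

print("in-focus contrast:", in_focus.max() - in_focus.min())
print("defocused contrast:", defocused.max() - defocused.min())
```

In this idealized case the in-focus intensity of a pure-phase object is flat, while the defocused intensity carries phase-dependent contrast, which is why a second plane helps make the phase recoverable.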

Inverse problem for phase imaging
We now take the raw coherent intensity measurements, $I_{c,\ell z}(\mathbf{r})$, and the registered trajectory, $\mathbf{r}_{\ell z}$, from both of the defocused coherent sensors (more details in Sec. 2.3) as input to jointly estimate the sample's SR complex-transmittance function, $o_c(\mathbf{r})$, and illumination complex field, $p_c(\mathbf{r})$, as well as the aberrations inherent in the system's PSF, $h_c(\mathbf{r})$. The optimization also further refines the scanning trajectory, $\mathbf{r}_{\ell z}$. Based on the forward model, we formulate the joint inverse problem:

$$\min_{o_c,\, p_c,\, h_c,\, \{\mathbf{r}_{\ell z_0}\},\, \{\mathbf{r}_{\ell z_1}\}} f_c = \sum_{\ell, z} \sum_{\mathbf{r}} \left| \sqrt{I_{c,\ell z}(\mathbf{r})} - \left| \left[ o_c(\mathbf{r}) \cdot \mathcal{C}\{ p_c(\mathbf{r} - \mathbf{r}_{\ell z}) \} \right] \otimes h_{c,z}(\mathbf{r}) \right| \right|^2. \tag{4}$$

Here, we adopt an amplitude-based cost function, $f_c$, which robustly minimizes the distance between the estimated and measured amplitudes in the presence of noise [57,61,62]. We optimize the pattern trajectories, $\mathbf{r}_{\ell z_0}$ and $\mathbf{r}_{\ell z_1}$, separately for each coherent sensor, in order to account for any residual misalignment or timing mismatch (see Sec. 2.3). As in the fluorescence case, sequential gradient descent [57,58] was used to solve this inverse problem.
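A minimal sketch of the amplitude-based cost for a single coherent measurement is given below; the function name and the random test field are illustrative assumptions.

```python
import numpy as np


def amplitude_cost(measured_intensity, estimated_field):
    """Sum of squared differences between measured and estimated amplitudes."""
    residual = np.sqrt(measured_intensity) - np.abs(estimated_field)
    return float(np.sum(residual ** 2))


rng = np.random.default_rng(2)
field = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
intensity = np.abs(field) ** 2

cost_exact = amplitude_cost(intensity, field)
cost_phase_shifted = amplitude_cost(intensity, field * np.exp(1j * 0.7))
cost_scaled = amplitude_cost(intensity, 1.1 * field)
```

The cost vanishes for the true field and also for any globally phase-shifted copy of it, so a single intensity image cannot fix the absolute phase; only amplitude mismatches, such as the scaled field here, are penalized.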

Registration of coherent images
Knowledge of the Scotch tape scanning position, $\mathbf{r}_\ell$, reduces the complexity of the joint sample and pattern estimation problem and is necessary to achieve SR reconstructions with greater than 2× resolution gain. Because our fluorescent sample is mostly transparent, the main scattering component in the acquired raw data originates from the Scotch tape. Thus, applying a sub-pixel registration algorithm [63], denoted by the registration operator $\mathcal{R}$, between successive coherent-camera acquisitions, which are dominated by the scattered speckle signal, is sufficient to initialize the scanning trajectory of the Scotch tape, $\mathbf{r}_{\ell z}$. These initial estimates of $\mathbf{r}_{\ell z}$ are then updated, alongside $o_f(\mathbf{r})$, $o_c(\mathbf{r})$, $p_f(\mathbf{r})$, and $p_c(\mathbf{r})$, using the inverse models described in Sec. 2.1.2 and 2.2.2. In the fluorescence problem described in Sec. 2.1.2, we only use the trajectory from the in-focus coherent sensor at $z = 0$ for initialization, so we omit the subscript $z$ in $\mathbf{r}_{\ell z}$.
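Registration of speckle-dominated frames can be sketched with phase correlation. Note the simplification: this version recovers only integer-pixel shifts, whereas the text relies on a sub-pixel algorithm [63], which would additionally refine the correlation peak. The function name and the synthetic frames are assumptions.

```python
import numpy as np


def register_translation(reference, moved):
    """Integer-pixel shift s such that moved(r) ~ reference(r - s), by phase correlation."""
    cross = np.fft.fft2(moved) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12  # keep only the phase of the cross-spectrum
    corr = np.fft.ifft2(cross).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    size = np.array(corr.shape)
    wrap = peak > size // 2
    peak[wrap] -= size[wrap]        # map wrapped indices to signed shifts
    return peak


rng = np.random.default_rng(3)
speckle = rng.random((64, 64))                   # stand-in for a speckle frame
frame = np.roll(speckle, (3, -5), axis=(0, 1))   # same speckle, translated
shift = register_translation(speckle, frame)
print(shift)
```

Registering every frame against a reference in this way gives an initial trajectory that the joint optimization can then refine.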

Fluorescence super-resolution verification
We start with a proof-of-concept experiment to verify that our method accurately reconstructs a fluorescent sample at resolutions greater than twice the imaging system's diffraction limit. To do so, we use the higher-resolution objective (40×, NA 0.65) and a tunable Fourier-space iris-aperture (AP) that allows us to artificially reduce the system's NA (NA sys ), and therefore, resolution. With the aperture mostly closed (to NA sys = 0.1), we acquire a low-resolution SIM dataset, which is then used to computationally reconstruct a super-resolved image of the sample with resolution corresponding to an effective NA = 0.4. This reconstruction is then compared to the widefield image of the sample acquired with the aperture open to NA sys = 0.4, for validation. Figure 2(d) shows the final SR reconstruction of the fluorescent sample in real space, along with the amplitude of its Fourier spectrum. Individual microspheres can be clearly resolved, and results match well with the 0.4 NA deconvolved widefield image (Fig. 2(e)). Fourier-space analysis confirms our resolution improvement factor to be 4×, which suggests that the Scotch tape produces NA illum ≈ 0.3. To verify, we fully open the aperture and observe that the speckle pattern contains spatial frequencies up to NA illum ≈ 0.35 (Fig. 2(b)).

Coherent super-resolution verification
To quantify super-resolution in the coherent imaging channel, we use the low-resolution objective (4×, NA 0.1) to image a USAF1951 resolution chart (Benchmark Technologies). This phase target provides different feature sizes with known phase values, so is a suitable calibration target to quantify both the coherent resolution and the phase sensitivity of our technique.
Results are shown in Fig. 3. The coherent intensity image (Fig. 3(a)) acquired with 0.1 NA (no tape) has low resolution (∼5.32 µm), so hardly any features can be resolved. In Fig. 3(b), we show the "ground truth" QP distribution at 0.4 NA, as provided by the manufacturer. After inserting the Scotch tape, it was translated in 400 nm increments on a 36 × 36 rectangular grid, giving N img = 1296 total acquisitions (details in Sec. 4) at each of the two defocused coherent sensors (Fig. 3(c)). Figure 3(d,e) shows the SR reconstruction for the amplitude and phase of this sample, resolving features up to group 9 element 5 (1.23 µm separation). Thus, our coherent reconstruction has a ∼4× resolution gain compared to the brightfield intensity image.

High-content multi-modal microscopy
Of course, artificially reducing resolution in order to validate our method required using a moderate-NA objective, which precluded imaging over the large FOVs allowed by low-NA objectives. In this section, we demonstrate high-content fluorescence imaging with the low-resolution, large FOV objective (4×, NA 0.1) to visualize a 2.7×3.3 mm² FOV (see Fig. 4(a)). We note that this FOV is more than 100× larger than that allowed by the 40× objective used in the verification experiments, so is suitable for large SBP imaging.
Within the imaged FOV for our 1 µm diameter microsphere monolayer sample, we zoom in to four regions-of-interest (ROI), labeled 1, 2, 3, and 4. Widefield fluorescence imaging cannot resolve individual microspheres, as expected. Using our method, however, gives a factor 4× resolution gain across the whole FOV and enables resolution of individual microspheres. Thus, the SBP of the system, natively ∼5.3 megapixels of content, was increased to ∼85 megapixels, a factor of 4² = 16×. Though this is still not in the gigapixel range, this technique is scalable and could reach that range with a higher-SBP objective and sensors.
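The SBP arithmetic above follows from the fact that an isotropic resolution gain in 2D multiplies the number of resolvable points by the square of the gain. A short sketch, with a hypothetical helper name:

```python
def sbp_after_sim(native_sbp_megapixels: float, resolution_gain: float) -> float:
    """SBP scales with the square of an isotropic 2D resolution gain."""
    return native_sbp_megapixels * resolution_gain ** 2


sbp = sbp_after_sim(5.3, 4.0)
print(f"{sbp:.1f} megapixels")
```

For the native ∼5.3 megapixels and a 4× gain this gives roughly 85 megapixels, matching the figure quoted in the text.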
We next include the QP imaging channel to demonstrate high-content multimodal imaging, as shown in Fig. 5. The multimodal FOV is smaller (2×2.7 mm² FOV) than that presented in Fig. 4 because our coherent detection sensors have a lower pixel count than our fluorescence detection sensor. Figure 5 includes zoom-ins of three ROIs to visualize the multimodal SR.
As expected, the widefield fluorescence image and the on-axis coherent intensity image do not allow resolution of individual 2 µm microspheres, since the theoretical resolution for fluorescence imaging is λ em /2NA sys ≈ 3 µm and for QP imaging is λ ex /NA sys ≈ 5 µm. However, our SIM reconstruction with 4× resolution gain enables clear separation of the microspheres in both channels. Our fluorescence and QP reconstructions match well, which is expected since the fluorescent and QP signal originate from identical physical structures in this particular sample.
The full-FOV reconstructions (Fig. 4 and 5) are obtained by dividing the FOV into small patches, reconstructing each patch, then stitching together the high-content images. Patch-wise reconstruction is computationally favorable because of its low memory requirement, but also allows us to correct field-dependent aberrations. Since we process each patch separately using our self-calibration algorithm, we solve for each patch's PSF independently and correct the local aberrations digitally. The reconstruction takes approximately 15 minutes for each channel on a high-end GPU (NVIDIA, TITAN Xp) for a patch with FOV of 110 × 110 µm².
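The patch-wise workflow can be outlined as below. This is only a structural sketch: the tiles here do not overlap (the text does not specify whether patches overlap or how seams are blended), and `reconstruct_patch` is a placeholder standing in for the per-patch self-calibrating solver.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split a 2D array into a row-major list of non-overlapping patch x patch tiles."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    tiles = [image[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
             for i in range(rows) for j in range(cols)]
    return tiles, (rows, cols)


def stitch(tiles, grid):
    """Reassemble row-major tiles into a single image."""
    rows, cols = grid
    return np.block([[tiles[i * cols + j] for j in range(cols)]
                     for i in range(rows)])


def reconstruct_patch(tile):
    """Placeholder for the per-patch solver; here it simply returns a copy."""
    return tile.copy()


fov = np.arange(8 * 12, dtype=float).reshape(8, 12)  # stand-in for the full FOV
tiles, grid = split_into_patches(fov, patch=4)
full = stitch([reconstruct_patch(t) for t in tiles], grid)
```

Because each tile is processed independently, a separate PSF estimate per tile comes for free, and the memory footprint is set by the tile size rather than the full FOV.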

Discussion
Unlike many existing high-content imaging techniques, our method is readily compatible with simultaneous QP and fluorescence imaging. This arises from SIM's unique ability to multiplex both coherent and incoherent signals into the system aperture [35]. Furthermore, existing high-content fluorescence imaging techniques that use micro-lens arrays [18][19][20][21][22][23] are resolution-limited by the physical size of the lenslets, which typically have NA illum < 0.3. Recent work [24] has introduced a framework in which gratings with sub-diffraction slits allow sub-micron resolution across large FOVs; however, that approach is heavily limited by SNR, due to the primarily opaque grating, and requires tight axial alignment. Though the Scotch tape used in our proof-of-concept prototype induced illumination angles within a similar range as micro-lens arrays (NA illum ≈ 0.35), a more strongly scattering medium could in future achieve NA illum ≈ 1.0, enabling further SR and thus larger SBP. The main drawback of our technique is that we use ∼1200 translations of the Scotch tape for each reconstruction, which results in long acquisition times (∼180 seconds for shifting, pausing, and capturing) and higher photon requirements. Heuristically, for both fluorescence and QP imaging, we found that a sufficiently large scanning range (larger than ∼2 low-NA diffraction-limited spot sizes) and fine scan steps (smaller than the targeted resolution) reduce distortions in the reconstruction. Tuning such parameters to minimize the number of acquisitions without degrading reconstruction quality is thus an important subject for future work.

Conclusion
We have presented a large-FOV multimodal SIM fluorescence and QP imaging technique. We use Scotch tape to efficiently generate high-resolution features over a large FOV, which can then be measured with both fluorescent and coherent contrast using a low-NA objective. A computational optimization-based self-calibration algorithm corrected for experimental uncertainties (scanning-position, aberrations, and random speckle pattern) and enabled super-resolution fluorescence and quantitative phase reconstruction with factor 4× resolution gain.

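A.1.1. Fluorescence imaging vectorial model
In order to solve the multivariate optimization problems in Eqs. (2) and (4) and derive the gradients of the cost functions, it is more convenient to use linear-algebra vectorial notation for the forward models. The fluorescence SIM forward model in Eq. (1) can be alternatively expressed as

$$\mathbf{I}_{f,\ell} = \mathbf{H}_f\, \mathrm{diag}\!\left( \mathbf{S}(\mathbf{r}_\ell)\, \mathbf{p}_f \right) \mathbf{o}_f,$$

where $\mathbf{I}_{f,\ell}$, $\mathbf{H}_f$, $\mathbf{S}(\mathbf{r}_\ell)$, $\mathbf{p}_f$, and $\mathbf{o}_f$ designate the raw fluorescent intensity vector, diffraction-limit low-pass filtering operation, pattern translation/cropping operation, $N^2 \times 1$ speckle pattern vector, and $M^2 \times 1$ sample's fluorescent distribution vector, respectively. The 2D-array variables described in Eq. (1) are all reshaped into column vectors here. $\mathbf{H}_f$ and $\mathbf{S}(\mathbf{r}_\ell)$ can be further broken down into their individual vectorial components:

$$\mathbf{H}_f = \mathbf{F}_M^{-1}\, \mathrm{diag}(\tilde{\mathbf{h}}_f)\, \mathbf{F}_M, \qquad \mathbf{S}(\mathbf{r}_\ell) = \mathbf{Q}\, \mathbf{F}_N^{-1}\, \mathrm{diag}(\mathbf{e}(\mathbf{r}_\ell))\, \mathbf{F}_N,$$

where $\tilde{\mathbf{h}}_f$ is the OTF vector and $\mathbf{e}(\mathbf{r}_\ell)$ is the vectorization of the $\exp(-j2\pi \mathbf{u} \cdot \mathbf{r}_\ell)$ function, where $\mathbf{u}$ is spatial frequency. The notation $\mathrm{diag}(\mathbf{a})$ turns an $n \times 1$ vector, $\mathbf{a}$, into an $n \times n$ diagonal matrix with diagonal entries from the vector entries. $\mathbf{F}_N$ and $\mathbf{F}_M$ denote the $N \times N$-point and $M \times M$-point 2D discrete Fourier transform matrices, respectively, and $\mathbf{Q}$ is the $M^2 \times N^2$ cropping matrix. With this vectorial notation, the cost function for a single fluorescence measurement is

$$f_{f,\ell}(\mathbf{o}_f, \mathbf{p}_f, \tilde{\mathbf{h}}_f, \mathbf{r}_\ell) = \mathbf{f}_{f,\ell}^{T}\, \mathbf{f}_{f,\ell},$$

where $\mathbf{f}_{f,\ell} = \mathbf{I}_{f,\ell} - \mathbf{H}_f\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f)\, \mathbf{o}_f$ is the cost vector and $T$ denotes the transpose operation.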

A.1.2. Coherent imaging vectorial model
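As with the fluorescence vectorial model, we can rewrite Eq. (3) using vectorial notation:

$$\mathbf{I}_{c,\ell z} = \left| \mathbf{g}_{c,\ell z} \right|^2 = \left| \mathbf{H}_{c,z}\, \mathrm{diag}\!\left( \mathbf{S}(\mathbf{r}_{\ell z})\, \mathbf{p}_c \right) \mathbf{o}_c \right|^2, \qquad \mathbf{H}_{c,z} = \mathbf{F}_M^{-1}\, \mathrm{diag}(\tilde{\mathbf{h}}_c)\, \mathrm{diag}(\tilde{\mathbf{h}}_z)\, \mathbf{F}_M,$$

where $\mathbf{o}_c$ and $\mathbf{p}_c$ are the $M^2 \times 1$ sample transmittance function vector and $N^2 \times 1$ structured field vector, respectively. $\tilde{\mathbf{h}}_c$ and $\tilde{\mathbf{h}}_z$ are the system pupil function and the deliberate defocus pupil function, respectively. With this vectorial notation, we can then express the cost function for a single coherent intensity measurement as

$$f_{c,\ell z}(\mathbf{o}_c, \mathbf{p}_c, \tilde{\mathbf{h}}_c, \mathbf{r}_{\ell z}) = \mathbf{f}_{c,\ell z}^{T}\, \mathbf{f}_{c,\ell z}, \tag{11}$$

where $\mathbf{f}_{c,\ell z} = \sqrt{\mathbf{I}_{c,\ell z}} - |\mathbf{g}_{c,\ell z}|$ is the cost vector for the coherent intensity measurement, with the square root and modulus taken entry-wise.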

A.2. Gradient derivation
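A.2.1. Gradient derivation for fluorescence imaging
Starting from the cost vector $\mathbf{f}_{f,\ell} = \mathbf{I}_{f,\ell} - \mathbf{H}_f\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f)\, \mathbf{o}_f$ and the cost function $f_{f,\ell} = \mathbf{f}_{f,\ell}^T \mathbf{f}_{f,\ell}$, the chain rule gives the gradient with respect to the sample vector in row-vector form:

$$\frac{\partial f_{f,\ell}}{\partial \mathbf{o}_f} = \frac{\partial f_{f,\ell}}{\partial \mathbf{f}_{f,\ell}} \frac{\partial \mathbf{f}_{f,\ell}}{\partial \mathbf{o}_f} = -2\, \mathbf{f}_{f,\ell}^{T}\, \mathbf{H}_f\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f).$$

Turning the row gradient vector into an $M^2 \times 1$ column vector in order to update the object vector in the right dimension, the final gradient becomes

$$\nabla_{\mathbf{o}_f} f_{f,\ell} = \left( \frac{\partial f_{f,\ell}}{\partial \mathbf{o}_f} \right)^{T} = -2\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f)\, \mathbf{H}_f^{T}\, \mathbf{f}_{f,\ell}.$$

To compute the gradient with respect to $\mathbf{p}_f$, we first rewrite the cost vector $\mathbf{f}_{f,\ell}$ as

$$\mathbf{f}_{f,\ell} = \mathbf{I}_{f,\ell} - \mathbf{H}_f\, \mathrm{diag}(\mathbf{o}_f)\, \mathbf{S}(\mathbf{r}_\ell)\, \mathbf{p}_f.$$

Now, we can write the gradient of the cost function with respect to the pattern vector in row and column vector form as

$$\frac{\partial f_{f,\ell}}{\partial \mathbf{p}_f} = -2\, \mathbf{f}_{f,\ell}^{T}\, \mathbf{H}_f\, \mathrm{diag}(\mathbf{o}_f)\, \mathbf{S}(\mathbf{r}_\ell), \qquad \nabla_{\mathbf{p}_f} f_{f,\ell} = -2\, \mathbf{S}(\mathbf{r}_\ell)^{T}\, \mathrm{diag}(\mathbf{o}_f)\, \mathbf{H}_f^{T}\, \mathbf{f}_{f,\ell}.$$

Similar to the derivation of the pattern gradient, for the OTF it is easier to work with the rewritten form of the cost vector,

$$\mathbf{f}_{f,\ell} = \mathbf{I}_{f,\ell} - \mathbf{F}_M^{-1}\, \mathrm{diag}\!\left( \mathbf{F}_M\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f)\, \mathbf{o}_f \right) \tilde{\mathbf{h}}_f.$$

The gradients of the cost function with respect to the OTF vector in row and column vector form are, respectively (taking the DFT matrices to be unitary, so that $(\mathbf{F}_M^{-1})^{\dagger} = \mathbf{F}_M$),

$$\frac{\partial f_{f,\ell}}{\partial \tilde{\mathbf{h}}_f} = -2\, \mathbf{f}_{f,\ell}^{T}\, \mathbf{F}_M^{-1}\, \mathrm{diag}\!\left( \mathbf{F}_M\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f)\, \mathbf{o}_f \right), \qquad \nabla_{\tilde{\mathbf{h}}_f} f_{f,\ell} = \left( \frac{\partial f_{f,\ell}}{\partial \tilde{\mathbf{h}}_f} \right)^{\dagger} = -2\, \mathrm{diag}\!\left( \overline{\mathbf{F}_M\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_\ell)\mathbf{p}_f)\, \mathbf{o}_f} \right) \mathbf{F}_M\, \mathbf{f}_{f,\ell},$$

where $\overline{\mathbf{a}}$ denotes the entry-wise complex conjugate of any general vector $\mathbf{a}$. One difference between this gradient and the previous ones is that the variable to solve for, $\tilde{\mathbf{h}}_f$, is now a complex vector. When turning the gradient row vector of a complex vector into a column vector, we have to take a Hermitian operation, $\dagger$, on the row vector, following the conventions in [64]. More examples of complex variables appear in the coherent-model gradient derivation.

For the gradient with respect to the scanning position, we again rewrite the cost vector $\mathbf{f}_{f,\ell}$:

$$\mathbf{f}_{f,\ell} = \mathbf{I}_{f,\ell} - \mathbf{H}_f\, \mathrm{diag}(\mathbf{o}_f)\, \mathbf{Q}\, \mathbf{F}_N^{-1}\, \mathrm{diag}(\mathbf{F}_N \mathbf{p}_f)\, \mathbf{e}(\mathbf{r}_\ell).$$

We can then write the gradient of the cost function with respect to the scanning position as

$$\frac{\partial f_{f,\ell}}{\partial q_\ell} = \frac{\partial f_{f,\ell}}{\partial \mathbf{f}_{f,\ell}} \frac{\partial \mathbf{f}_{f,\ell}}{\partial \mathbf{e}(\mathbf{r}_\ell)} \frac{\partial \mathbf{e}(\mathbf{r}_\ell)}{\partial q_\ell} = -2\, \mathbf{f}_{f,\ell}^{T}\, \mathbf{H}_f\, \mathrm{diag}(\mathbf{o}_f)\, \mathbf{Q}\, \mathbf{F}_N^{-1}\, \mathrm{diag}(\mathbf{F}_N \mathbf{p}_f)\, \mathrm{diag}(-j2\pi \mathbf{u}_q)\, \mathbf{e}(\mathbf{r}_\ell), \tag{19}$$

where $q$ is either the $x$ or $y$ spatial coordinate component of $\mathbf{r}_\ell$, and $\mathbf{u}_q$ is the $N^2 \times 1$ vectorial notation of the spatial frequency function in the $q$ direction.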
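To numerically evaluate these gradients, we represent them in functional form. Writing the residual image as $\Delta_{f,\ell}(\mathbf{r}) = I_{f,\ell}(\mathbf{r}) - \left[ o_f(\mathbf{r}) \cdot \mathcal{C}\{ p_f(\mathbf{r} - \mathbf{r}_\ell) \} \right] \otimes h_f(\mathbf{r})$, and using the fact that the blurring and translation operators are real-valued, the gradients are:

$$\nabla_{o_f} f_{f,\ell} = -2\, \mathcal{C}\{ p_f(\mathbf{r} - \mathbf{r}_\ell) \} \cdot \mathcal{F}^{-1}\!\left\{ \tilde{h}_f^{*}(\mathbf{u}) \cdot \mathcal{F}\{ \Delta_{f,\ell}(\mathbf{r}) \} \right\},$$

$$\nabla_{p_f} f_{f,\ell} = -2\, \mathcal{F}^{-1}\!\left\{ e^{j2\pi \mathbf{u} \cdot \mathbf{r}_\ell} \cdot \mathcal{F}\!\left[ \mathcal{P}\!\left\{ o_f(\mathbf{r}) \cdot \mathcal{F}^{-1}\!\left\{ \tilde{h}_f^{*}(\mathbf{u}) \cdot \mathcal{F}\{ \Delta_{f,\ell}(\mathbf{r}) \} \right\} \right\} \right] \right\},$$

$$\nabla_{\tilde{h}_f} f_{f,\ell} = -2 \left[ \mathcal{F}\{ o_f(\mathbf{r}) \cdot \mathcal{C}\{ p_f(\mathbf{r} - \mathbf{r}_\ell) \} \} \right]^{*} \cdot \mathcal{F}\{ \Delta_{f,\ell}(\mathbf{r}) \},$$

$$\nabla_{q_\ell} f_{f,\ell} = -2 \sum_{\mathbf{r}} \Delta_{f,\ell}(\mathbf{r}) \cdot \left( \left[ o_f(\mathbf{r}) \cdot \mathcal{C}\!\left\{ \mathcal{F}^{-1}\!\left\{ (-j2\pi u_q)\, e^{-j2\pi \mathbf{u} \cdot \mathbf{r}_\ell}\, \mathcal{F}\{ p_f(\mathbf{r}) \} \right\} \right\} \right] \otimes h_f(\mathbf{r}) \right),$$

where $a^{*}$ stands for the complex conjugate of any general function $a$, $\mathcal{F}$ is the Fourier transform operator, and $\mathcal{P}$ is a zero-padding operator that pads an $M \times M$ image to size $N \times N$ pixels. In this form, the gradients for the sample and the structured pattern are of the same size as $o_f(\mathbf{r})$ and $p_f(\mathbf{r})$, respectively. Ideally, the gradient of the scanning position in each direction is a real number. However, due to imperfect implementation of the discrete differentiation in each direction, the gradient will have a small imaginary part, which is dropped in the update of the scanning position.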
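As a sanity check on the functional form, the gradient with respect to the structured pattern can be compared against a finite difference. The sketch below is illustrative rather than the authors' code: it assumes a Fourier-ramp translation, a central crop with matching zero-padding, an ideal circular-pupil OTF and random test data, and all function names are hypothetical.

```python
import numpy as np


def shift(img, s):
    """img(r - s) via a Fourier phase ramp (s in pixels, may be fractional)."""
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    ramp = np.exp(-2j * np.pi * (fy * s[0] + fx * s[1]))
    return np.fft.ifft2(np.fft.fft2(img) * ramp).real


def crop(img, m):
    a = (img.shape[0] - m) // 2
    return img[a:a + m, a:a + m]


def pad(img, n):
    """Zero-pad an M x M image to N x N (the adjoint of crop)."""
    out = np.zeros((n, n))
    a = (n - img.shape[0]) // 2
    out[a:a + img.shape[0], a:a + img.shape[0]] = img
    return out


def blur(img, otf):
    return np.fft.ifft2(np.fft.fft2(img) * otf).real


def cost(obj, pattern, otf, s, meas):
    model = blur(obj * crop(shift(pattern, s), obj.shape[0]), otf)
    return float(np.sum((meas - model) ** 2))


def pattern_gradient(obj, pattern, otf, s, meas):
    m, n = obj.shape[0], pattern.shape[0]
    residual = meas - blur(obj * crop(shift(pattern, s), m), otf)
    back = obj * blur(residual, np.conj(otf))           # adjoint of blur, then of diag(o)
    return -2.0 * shift(pad(back, n), (-s[0], -s[1]))   # adjoint of crop, then of shift


rng = np.random.default_rng(4)
m, n, s = 16, 24, (1.3, -0.7)
obj, pattern, meas = rng.random((m, m)), rng.random((n, n)), rng.random((m, m))
fy, fx = np.fft.fftfreq(m)[:, None], np.fft.fftfreq(m)[None, :]
psf = np.abs(np.fft.ifft2((np.hypot(fy, fx) <= 0.15).astype(float))) ** 2
otf = np.fft.fft2(psf) / psf.sum()

grad = pattern_gradient(obj, pattern, otf, s, meas)

# Central finite difference on one (arbitrarily chosen) pattern pixel.
eps, (i, j) = 1e-4, (10, 7)
bump = np.zeros((n, n))
bump[i, j] = eps
numeric = (cost(obj, pattern + bump, otf, s, meas)
           - cost(obj, pattern - bump, otf, s, meas)) / (2 * eps)
print(grad[i, j], numeric)
```

Because the cost is quadratic in the pattern, the central difference agrees with the analytic value up to floating-point rounding, which makes this a cheap test for sign or adjoint mistakes.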

A.2.2. Gradient derivation for coherent imaging
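For the coherent imaging case, we derive the gradients of the cost function in Eq. (11) with respect to the sample transmittance function $\mathbf{o}_c$, speckle field $\mathbf{p}_c$, pupil function $\tilde{\mathbf{h}}_c$, and the scanning position $\mathbf{r}_{\ell z}$. Throughout, $\mathbf{g}_{c,\ell z} = \mathbf{H}_{c,z}\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_{\ell z})\mathbf{p}_c)\, \mathbf{o}_c$ with $\mathbf{H}_{c,z} = \mathbf{F}_M^{-1}\, \mathrm{diag}(\tilde{\mathbf{h}}_c)\, \mathrm{diag}(\tilde{\mathbf{h}}_z)\, \mathbf{F}_M$, the cost vector is $\mathbf{f}_{c,\ell z} = \sqrt{\mathbf{I}_{c,\ell z}} - |\mathbf{g}_{c,\ell z}|$, and the DFT matrices are taken to be unitary. First, taking the gradient of $f_{c,\ell z}$ with respect to $\mathbf{o}_c$, we have the row and column vector forms

$$\frac{\partial f_{c,\ell z}}{\partial \mathbf{o}_c} = \frac{\partial f_{c,\ell z}}{\partial \mathbf{g}_{c,\ell z}} \frac{\partial \mathbf{g}_{c,\ell z}}{\partial \mathbf{o}_c} = -\left[ \mathrm{diag}\!\left( \frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|} \right) \mathbf{f}_{c,\ell z} \right]^{\dagger} \mathbf{H}_{c,z}\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_{\ell z})\mathbf{p}_c),$$

$$\nabla_{\mathbf{o}_c} f_{c,\ell z} = \left( \frac{\partial f_{c,\ell z}}{\partial \mathbf{o}_c} \right)^{\dagger} = -\mathrm{diag}\!\left( \overline{\mathbf{S}(\mathbf{r}_{\ell z})\mathbf{p}_c} \right) \mathbf{H}_{c,z}^{\dagger}\, \mathrm{diag}\!\left( \frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|} \right) \mathbf{f}_{c,\ell z},$$

where the $\frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|}$ operation denotes entry-wise division between the two vectors, $\mathbf{g}_{c,\ell z}$ and $|\mathbf{g}_{c,\ell z}|$. The detailed calculation of $\frac{\partial f_{c,\ell z}}{\partial \mathbf{g}_{c,\ell z}}$ can be found in the Appendix of [57]. Next, we take the gradient with respect to the pattern field vector, $\mathbf{p}_c$. In order to calculate $\frac{\partial \mathbf{g}_{c,\ell z}}{\partial \mathbf{p}_c}$, we reorder the entry-wise multiplication of $\mathbf{o}_c$ and $\mathbf{S}(\mathbf{r}_{\ell z})\mathbf{p}_c$, as we did in deriving the gradient of the pattern for fluorescence imaging, so that $\mathbf{g}_{c,\ell z} = \mathbf{H}_{c,z}\, \mathrm{diag}(\mathbf{o}_c)\, \mathbf{S}(\mathbf{r}_{\ell z})\, \mathbf{p}_c$. The corresponding row and column vectors are

$$\frac{\partial f_{c,\ell z}}{\partial \mathbf{p}_c} = -\left[ \mathrm{diag}\!\left( \frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|} \right) \mathbf{f}_{c,\ell z} \right]^{\dagger} \mathbf{H}_{c,z}\, \mathrm{diag}(\mathbf{o}_c)\, \mathbf{S}(\mathbf{r}_{\ell z}), \qquad \nabla_{\mathbf{p}_c} f_{c,\ell z} = -\mathbf{S}(\mathbf{r}_{\ell z})^{\dagger}\, \mathrm{diag}(\overline{\mathbf{o}_c})\, \mathbf{H}_{c,z}^{\dagger}\, \mathrm{diag}\!\left( \frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|} \right) \mathbf{f}_{c,\ell z}.$$

In order to perform aberration correction, we need to estimate the system pupil function, $\tilde{\mathbf{h}}_c$. Writing $\mathbf{g}_{c,\ell z} = \mathbf{F}_M^{-1}\, \mathrm{diag}(\tilde{\mathbf{h}}_z)\, \mathrm{diag}\!\left( \mathbf{F}_M\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_{\ell z})\mathbf{p}_c)\, \mathbf{o}_c \right) \tilde{\mathbf{h}}_c$, the gradient with respect to the pupil function can be derived as

$$\nabla_{\tilde{\mathbf{h}}_c} f_{c,\ell z} = -\mathrm{diag}(\overline{\tilde{\mathbf{h}}_z})\, \mathrm{diag}\!\left( \overline{\mathbf{F}_M\, \mathrm{diag}(\mathbf{S}(\mathbf{r}_{\ell z})\mathbf{p}_c)\, \mathbf{o}_c} \right) \mathbf{F}_M\, \mathrm{diag}\!\left( \frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|} \right) \mathbf{f}_{c,\ell z}.$$

Finally, the gradient with respect to the scanning position, used for trajectory refinement, can be derived as

$$\frac{\partial f_{c,\ell z}}{\partial q_{\ell z}} = -2\, \mathrm{Re}\!\left\{ \left[ \mathrm{diag}\!\left( \frac{\mathbf{g}_{c,\ell z}}{|\mathbf{g}_{c,\ell z}|} \right) \mathbf{f}_{c,\ell z} \right]^{\dagger} \mathbf{H}_{c,z}\, \mathrm{diag}(\mathbf{o}_c)\, \mathbf{Q}\, \mathbf{F}_N^{-1}\, \mathrm{diag}(\mathbf{F}_N \mathbf{p}_c)\, \mathrm{diag}(-j2\pi \mathbf{u}_q)\, \mathbf{e}(\mathbf{r}_{\ell z}) \right\},$$

where $q$ is either the $x$ or $y$ spatial coordinate component of $\mathbf{r}_{\ell z}$.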

B.1. Algorithm for fluorescence imaging
First, we initialize the sample, $o_f(\mathbf{r})$, with the mean image of all the structured-illumination images, $I_{f,\ell}(\mathbf{r})$, which is approximately a widefield diffraction-limited image. We initialize the structured pattern, $p_f(\mathbf{r})$, with an all-ones image. The initial OTF, $\tilde{h}_f(\mathbf{u})$, is set as a non-aberrated incoherent OTF. Initial scanning positions are taken from the registration of the in-focus coherent speckle images, $I_{c,\ell z}(\mathbf{r})$ ($z = 0$). In the algorithm, $K_f$ is the total number of iterations ($K_f = 100$ is generally enough for convergence). At every iteration, we sequentially update the sample, structured pattern, system's OTF and the scanning position using each single frame from $\ell = 1$ to $\ell = N_{\mathrm{img}}$. A Nesterov acceleration step is applied to the sample and the structured pattern at the end of each iteration. The detailed algorithm is summarized in Algorithm 1.
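The acceleration step applied at the end of each iteration can be sketched as follows. The exact update used in Algorithm 1 is not reproduced in this text, so the code below assumes the commonly used momentum schedule $t_{k+1} = (1 + \sqrt{1 + 4 t_k^2})/2$; the function name and the toy vectors are hypothetical.

```python
import numpy as np


def nesterov_step(x_new, x_prev, t):
    """Extrapolate along (x_new - x_prev); returns (x_accelerated, t_next)."""
    t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    x_accel = x_new + ((t - 1.0) / t_next) * (x_new - x_prev)
    return x_accel, t_next


x_prev = np.zeros(3)
x_new = np.array([1.0, 2.0, 3.0])

# First call: t = 1 gives zero momentum, so the estimate is unchanged.
x_accel, t1 = nesterov_step(x_new, x_prev, t=1.0)

# Later calls extrapolate beyond the newest estimate.
x_accel2, t2 = nesterov_step(2 * x_new, x_new, t=t1)
```

In the fluorescence reconstruction, such a step would be applied separately to the sample estimate and the pattern estimate after each full pass over the frames.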
For the coherent imaging reconstruction, we use a total of $K_c \approx 30$ iterations to converge. We sequentially update $o_c(\mathbf{r})$, $p_c(\mathbf{r})$, $h_c(\mathbf{r})$, and $\mathbf{r}_{\ell z}$ ($\ell = 1, \ldots, N_{\mathrm{img}}$) for each defocused plane (the total number of defocused planes is $N_z$) per iteration. Unlike for our fluorescence reconstructions, we do not use the extra Nesterov acceleration step in the QP reconstruction.

Samples were prepared by placing microsphere dilutions (60 µL stock solution/500 µL isopropyl alcohol) onto #1.5 coverslips and then allowing them to air-dry. High-index oil ($n_m(\lambda) = 1.52$ at $\lambda = 532$ nm) was subsequently placed on the coverslip to index-match the microspheres. An adhesive spacer followed by another #1.5 coverslip was placed on top of the original coverslip to ensure a uniform sample layer for imaging.

... without self-calibration. The SR reconstruction with no self-calibration contains severe artifacts in reconstructions of both the speckle illumination pattern and the sample's fluorescent distribution. With OTF correction, dramatic improvements in the fluorescence SR image are evident. OTF correction is especially important when imaging across a large FOV (Fig. 4 and 5) due to space-varying aberrations.
Further self-calibration to correct for errors in the initial estimate of the illumination pattern's trajectory enables further refinement of the SR reconstruction. We see that this illumination trajectory is smoother after undergoing self-calibration. We expect this calibration step to have important ramifications in cases where the physical translation stage is less stable or its incremental translation is less accurate. We also test how the self-calibration affects our phase reconstruction, using the same dataset as in Fig. 3. Similar to the conclusion from the fluorescence self-calibration demonstration, pupil correction (coherent OTF) plays an important role in reducing SR reconstruction artifacts, as shown in Fig. 7. The reconstructed pupil phase suggests that our system aberration is mainly caused by astigmatism. Further refinement of the trajectory of the illumination pattern improves the SR resolution by resolving one more element (group 9 element 6) of the USAF chart. Comparing the uncorrected and corrected illumination trajectories, we find that self-calibration tends to align the trajectories from the two coherent cameras. We also notice that the trajectory from the quantitative-phase channels seems to jitter more compared to the fluorescence channel. We hypothesize that this is due to the longer exposure time for each fluorescence acquisition, which would average out the jitter.

Figure 7: Algorithmic self-calibration significantly improves coherent super-resolution reconstructions. We show a comparison of reconstructed amplitude, phase, speckle amplitude, and phase of the pupil function with no correction, pupil correction, and both pupil correction and scanning position correction. The right panel shows the overlay of scanning position trajectory for the in-focus and defocused cameras before and after correction.