ML-SIM: universal reconstruction of structured illumination microscopy images using transfer learning

Structured illumination microscopy (SIM) has become an important technique for optical super-resolution imaging because it allows a doubling of image resolution at speeds compatible with live-cell imaging. However, the reconstruction of SIM images is often slow, prone to artefacts, and requires multiple parameter adjustments to reflect different hardware or experimental conditions. Here, we introduce a versatile reconstruction method, ML-SIM, which makes use of transfer learning to obtain a parameter-free model that generalises beyond the task of reconstructing data recorded by a specific imaging system for a specific sample type. We demonstrate the generality of the model and the high quality of the obtained reconstructions by application of ML-SIM on raw data obtained for multiple sample types acquired on distinct SIM microscopes. ML-SIM is an end-to-end deep residual neural network that is trained on an auxiliary domain consisting of simulated images, but is transferable to the target task of reconstructing experimental SIM images. By generating the training data to reflect challenging imaging conditions encountered in real systems, ML-SIM becomes robust to noise and irregularities in the illumination patterns of the raw SIM input frames. Since ML-SIM does not require the acquisition of experimental training data, the method can be efficiently adapted to any specific experimental SIM implementation. We compare the reconstruction quality enabled by ML-SIM with current state-of-the-art SIM reconstruction methods and demonstrate advantages in terms of generality and robustness to noise for both simulated and experimental inputs, thus making ML-SIM a useful alternative to traditional methods for challenging imaging conditions. Additionally, reconstruction of a SIM stack is accomplished in less than 200 ms on a modern graphics processing unit, enabling future applications for real-time imaging. 
Source code and ready-to-use software for the method are available at http://ML-SIM.github.io.


PERFORMANCE ASSESSMENT ON TEST IMAGE SET
Performance was assessed on two different image sets, DIV2K and Kodak 24. The sets consist of 10 and 24 images, respectively, all of which are distinct from the original images used for the training data.

Table S1. Test scores on simulated raw SIM data generated from image sets DIV2K and Kodak 24 for commonly used reconstruction methods and for ML-SIM. [Table body not recovered; column groups: DIV2K Test Set, Kodak 24.]

RESIDUAL NEURAL NETWORK ARCHITECTURE OF ML-SIM
The model used in ML-SIM is a deep residual neural network that is largely based on the ResNet architecture and its extensions to single-image super-resolution, EDSR and RCAN. A diagram is shown in Figure S2.
Fig. S2. The architecture of ML-SIM is inspired by state-of-the-art single-image super-resolution architectures. Here the architecture of EDSR is shown, but the same structure applies to RCAN, only with a more complex block called a channel attention block. ML-SIM has an RCAN architecture without an upsampling module and with a larger input layer that handles 9 frames.
Fourier transform of A. The resolution limit can be visualised as a cutoff frequency k_d beyond which no spatial frequency information from the sample is collected. The frequency components of the striped illumination pattern are visible as bright peaks close to the cutoff frequency. (C) The frequency components of the excitation pattern, k_0, are chosen to be as close to the diffraction limit as possible, to maximise the resolution increase. The interference of the patterned illumination with the sample pattern means the observed region of frequency space now contains frequency components from outside the supported region, shifted by ±k_0. (D) By shifting the phase of the pattern, the regions of frequency space can be isolated and moved to the correct location in frequency space. The maximum spatial frequency recovered is now k_d + k_0.
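The frequency shifting described in panels (C) and (D) follows from the Fourier shift theorem. The block below is a textbook one-dimensional sketch, assuming a normalised sinusoidal illumination I(x) = 1 + m cos(2π k_0 x + φ); the symbols S (sample), H (OTF) and D (detected image) are introduced here for illustration and are not taken from the main paper.

```latex
% Detected image: sample times illumination, blurred by the PSF
% (assumed illumination form: I(x) = 1 + m\cos(2\pi k_0 x + \varphi))
\begin{align}
  \tilde{D}_{\varphi}(k) &= \tilde{H}(k)\left[\tilde{S}(k)
      + \frac{m}{2}\,e^{i\varphi}\,\tilde{S}(k - k_0)
      + \frac{m}{2}\,e^{-i\varphi}\,\tilde{S}(k + k_0)\right],
  \qquad \tilde{H}(k) = 0 \ \text{for}\ |k| > k_d .
\end{align}
% The shifted copies carry sample frequencies up to |k| = k_d + k_0 into the
% passband. Recording three distinct phases \varphi gives three linear
% equations in the three unknown bands, which can then be separated and
% moved back to their true positions.
```

With k_0 close to k_d, the recoverable band approaches 2 k_d, which is the doubling of resolution usually quoted for linear SIM.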

POISSON NOISE FOR DATA GENERATION
By default, the ML-SIM model uses a Gaussian noise source for data generation. The parameters of the underlying Gaussian distribution are randomised from image to image to make the model more general. In microscopy, however, Poisson noise is often the predominant noise source [2]. We therefore tested whether the performance of ML-SIM is significantly affected by the noise model used to generate the test data by reconstructing images corrupted by Poisson noise. The results are shown in Figure S4. We did not find a strong sensitivity to the type of noise source used for data generation; other factors, such as blur caused by the PSF, out-of-focus light and errors in the SIM illumination pattern (i.e. errors in phase shifts or stripe orientations), were found to have a more significant effect. On the other hand, high levels of synthetic noise used for data generation may be detrimental to the final performance of a model.
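The two noise models can be contrasted with a minimal standard-library sketch. The function names, the sigma range and the photon budget are hypothetical choices made for illustration; they are not the values used to train ML-SIM.

```python
import math
import random


def add_gaussian_noise(pixels, rng, sigma_range=(0.01, 0.1)):
    """Add zero-mean Gaussian noise; sigma is redrawn for every image.

    sigma_range is an assumed interval, not the one used by ML-SIM.
    """
    sigma = rng.uniform(*sigma_range)
    # Clip to the [0, 1] intensity range after adding noise.
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def sample_poisson(lam, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def add_poisson_noise(pixels, rng, photons=50):
    """Treat intensities as expected photon counts, sample, and rescale.

    photons is an assumed photon budget for a pixel of intensity 1.0.
    """
    return [sample_poisson(p * photons, rng) / photons for p in pixels]


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 1.0]  # a tiny flattened "image" in [0, 1]
gaussian_version = add_gaussian_noise(image, rng)
poisson_version = add_poisson_noise(image, rng)
```

Unlike the Gaussian case, the Poisson variance grows with the signal, so dark pixels stay nearly noise-free while bright pixels fluctuate more in absolute terms.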

INFLUENCE OF SIM STACK SIZE
Almost all the reconstruction outputs presented in the main paper are based on ML-SIM models trained for a SIM configuration with illumination patterns consisting of three orientations and three phase shifts, i.e. a 3x3 configuration. However, the ML-SIM pipeline supports any SIM configuration, and the usual benefits of using larger SIM stacks also apply here. One benefit is noise robustness and consequently improved reconstruction quality, but at the cost of a higher risk of photo-damage to the sample and lower imaging speed. The improvement in reconstruction quality on simulated test images is shown in Figure S5, where models for 3x3 (default), 3x5 and 5x5 SIM configurations are compared. The mean values of the respective structural similarity index measures are obtained by averaging over a total of 1000 test images that have been reconstructed with each method. Each test image exists in three versions according to the different SIM configurations, but the underlying point spread function as well as the noise and error characteristics are similar.
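To make the stack sizes concrete, the sketch below enumerates the nominal (orientation, phase) pair of every raw frame for a given configuration. The function name and the orientation-major frame order are assumptions for illustration; evenly spaced angles follow the 0°, 120°, 240° example used elsewhere in this document.

```python
def sim_stack_plan(n_orientations=3, n_phases=3):
    """Return the nominal (orientation_deg, phase_deg) of each raw frame.

    Frames are listed orientation-major: all phases of the first
    orientation, then all phases of the second, and so on (assumed order).
    """
    plan = []
    for i in range(n_orientations):
        orientation = i * 360.0 / n_orientations
        for j in range(n_phases):
            phase = j * 360.0 / n_phases  # evenly spaced over one period
            plan.append((orientation, phase))
    return plan


# Frame counts for the three configurations compared in the text.
for n_or, n_ph in [(3, 3), (3, 5), (5, 5)]:
    print(f"{n_or}x{n_ph}: {len(sim_stack_plan(n_or, n_ph))} raw frames")
```

The frame count, and with it the light dose and acquisition time per reconstructed image, grows from 9 to 15 to 25 across the three configurations.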

MODULATION DEPTH, FREQUENCY, PHASE ERRORS AND ORIENTATION ANGLES
As described in Section 2.2, the illumination stripe patterns are calculated from their spatial frequency k_0, a phase φ and a modulation depth m, where [k_x, k_y] = [k_0 cos θ, k_0 sin θ] for a pattern orientation θ. The training data for ML-SIM is generated with randomised values for k_0 and m by sampling uniformly from the intervals k_0 ∈ [0.22, 0.28] cycles/px and m ∈ [0.65, 0.95], respectively. In a standard SIM implementation, a number of illumination phase shifts, φ, evenly spaced over the 2π period, are used at each orientation. For a typical configuration of three orientations and three phase shifts (3x3), the phase shift values might therefore be 0, 1/3 × 2π and 2/3 × 2π. Depending on the SIM instrumentation that produces the illumination patterns, these phase shifts will be offset by some error, and they may not be consistent from image stack to image stack. It is therefore important to include an approximation of phase errors in the training data generation for ML-SIM to obtain a model that is robust to such errors. In the most extreme case, the phase shifts could be completely random, with no constraint on whether the values are too similar or insufficiently spaced across the 2π period. This is how the default ML-SIM model presented in the main paper has been trained, with the aim of improving the generality of the model. It is referred to as a model with high (phase) error tolerance. A corresponding model with everything kept the same but with consistent phase steps, i.e. each phase deviating from its ideal value by only a few percent, is referred to as a model with low error tolerance.
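The parameter randomisation described above can be sketched as follows. The sampling intervals for k_0 and m are those given in the text; everything else is an assumption made for illustration: the function names, the normalised sinusoidal form 1 + m cos(2π(k_x x + k_y y) + φ) (the exact expression is given in Section 2.2 of the main paper), and the reading of "a few percent" as ±3% of one phase step.

```python
import math
import random


def sample_pattern_params(rng, n_phases=3, high_error_tolerance=True):
    """Draw k0, m and the phase shifts for one orientation."""
    k0 = rng.uniform(0.22, 0.28)  # cycles/px, interval from the text
    m = rng.uniform(0.65, 0.95)   # modulation depth, interval from the text
    if high_error_tolerance:
        # Fully random phases with no spacing constraint.
        phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_phases)]
    else:
        # Evenly spaced phases with a small deviation (assumed +/-3% of a step).
        step = 2 * math.pi / n_phases
        phases = [(j + rng.uniform(-0.03, 0.03)) * step for j in range(n_phases)]
    return k0, m, phases


def stripe_pattern(size, k0, m, theta, phi):
    """Return a size x size stripe pattern as nested lists (assumed form)."""
    kx, ky = k0 * math.cos(theta), k0 * math.sin(theta)
    return [
        [1.0 + m * math.cos(2 * math.pi * (kx * x + ky * y) + phi)
         for x in range(size)]
        for y in range(size)
    ]


rng = random.Random(42)
k0, m, phases = sample_pattern_params(rng)
frames = [stripe_pattern(8, k0, m, 0.0, phi) for phi in phases]
```

Setting high_error_tolerance=False gives the nearly ideal phase steps of the low-error-tolerance model, so the two training regimes differ only in how the phases are drawn.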
Another parameter that varies across real SIM systems is the order of the illumination stripe orientations. The 9 frames in a 3x3 SIM stack might be ordered according to orientation angles of 0°, 0°, 0°, 120°, 120°, 120°, 240°, 240°, 240° from the first axis, e.g. the x-axis of the image frame. However, there is no standard across different systems, so the order of the frames could equally correspond to orientation angles of 240°, 240°, 240°, 0°, 0°, 0°, 120°, 120°, 120°. In addition, there are offsets and errors in the actual angles. To make ML-SIM work well despite uncertainty about the particular ordering, and in the presence of errors and other offsets to the orientations, the ordering is randomised in the simulated SIM images of the training data so that all permutations are covered. A model that is not trained with these different permutations is referred to as an orientation dependent model, i.e. the ordering of orientations is fixed in all of the training data samples.
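A minimal sketch of the orientation randomisation is given below. The function name and the ±2° jitter are hypothetical; the text states only that the ordering is randomised and that offsets and errors in the angles are present.

```python
import random


def randomised_orientations(rng, n_orientations=3, n_phases=3, jitter_deg=2.0):
    """Return one orientation angle (degrees) per frame of a SIM stack.

    The nominal angles are shuffled so that any ordering can occur, and
    each orientation receives a small random offset (assumed +/-2 degrees)
    shared by all of its phase-shifted frames.
    """
    nominal = [i * 360.0 / n_orientations for i in range(n_orientations)]
    rng.shuffle(nominal)
    frames = []
    for angle in nominal:
        actual = angle + rng.uniform(-jitter_deg, jitter_deg)
        frames.extend([actual] * n_phases)
    return frames


rng = random.Random(7)
print([round(a, 1) for a in randomised_orientations(rng)])
```

An orientation dependent model, in the terminology above, corresponds to skipping the shuffle so that every training stack uses the same ordering.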
The above-mentioned ML-SIM models have been tested on experimentally acquired SIM images and on a simulated test dataset of 100 images with a high presence of phase errors and random orientation ordering. Figure S6 shows example reconstruction outputs for a SIM image of beads acquired on Microscope 2, as defined in the main paper, together with the mean structural similarity index measures across the simulated test image set. The outputs from the two models with low and high error tolerance appear similar on experimental data and differ significantly only on the simulated images that are known to have a high level of phase shift errors. The orientation dependent model appears to lose both resolution and contrast on experimentally acquired SIM images, as indicated by the example in Figure S6, but performs at a similar quality to the model with low error tolerance on the test images with high phase shift errors.
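For readers unfamiliar with the metric, the sketch below computes a simplified structural similarity index and its mean over a set of image pairs. It is an illustration only: it evaluates the index once over the whole image rather than over local windows as in the standard definition, it is not the evaluation code used for Figure S6, and the function names are hypothetical.

```python
def global_ssim(x, y, dynamic_range=1.0):
    """Single-window SSIM between two flattened images of equal length."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Conventional stabilising constants for SSIM.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_ssim(pairs):
    """Average the index over (reconstruction, target) pairs."""
    scores = [global_ssim(recon, target) for recon, target in pairs]
    return sum(scores) / len(scores)


target = [0.1, 0.4, 0.7, 0.9, 0.3, 0.6]
blurred = [0.3, 0.4, 0.6, 0.7, 0.4, 0.5]  # same structure, reduced contrast
print(mean_ssim([(target, target), (blurred, target)]))
```

A reconstruction identical to its target scores 1, and loss of contrast or structure lowers the score.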

INSPECTION OF FREQUENCY SUPPORT
The resolution improvement provided by ML-SIM can also be visualised in frequency space as an extension of the spatial frequency pass band (i.e. the presence of high spatial frequencies in the Fourier transform of the reconstructions). Figure S7 shows a comparison of reconstruction techniques in frequency space. The raw data was acquired by imaging microtubules labelled with Alexa-647 on the spatial light modulator based SIM microscope with a 647 nm excitation laser and a 1.2 numerical aperture water immersion objective. Both ML-SIM and FairSIM extend the range of supported frequencies, indicating that high-resolution information is present in the reconstructions. FSA was performed for a reconstruction of SIM data acquired on Microscope 1 of microtubules labelled with Alexa-647. Note that the cut-off frequency for the wide-field image is lower than that predicted from the Abbe limit, as spherical aberrations inevitably degrade frequency support.
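As a rough worked example of the Abbe limit mentioned above, the sketch below evaluates d = λ / (2 NA) and the corresponding cutoff frequency 2 NA / λ. The function names are hypothetical. The text states only the 647 nm excitation wavelength; the detection cutoff is governed by the emission wavelength, which is somewhat longer, so the numbers are illustrative rather than the values behind Figure S7.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Smallest resolvable period according to Abbe: lambda / (2 NA)."""
    return wavelength_nm / (2.0 * numerical_aperture)


def cutoff_cycles_per_um(wavelength_nm, numerical_aperture):
    """Highest transmitted spatial frequency, 2 NA / lambda, in cycles/um."""
    return 2.0 * numerical_aperture / wavelength_nm * 1000.0


wavelength = 647.0  # nm; excitation value from the text, used as a stand-in
na = 1.2

d_widefield = abbe_limit_nm(wavelength, na)
k_d = cutoff_cycles_per_um(wavelength, na)
print(f"wide-field limit: {d_widefield:.0f} nm, cutoff: {k_d:.2f} cycles/um")

# Linear SIM can at most double the cutoff (k_d + k_0 with k_0 close to k_d).
print(f"ideal SIM limit: {d_widefield / 2:.0f} nm")
```

Under these assumptions the wide-field limit is roughly 270 nm and the ideal SIM limit roughly 135 nm; aberrations push the measured wide-field cutoff below the theoretical value, as noted in the text.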

TRAINING ML-SIM WITH IDEAL SIM TARGETS
The standard ML-SIM model used throughout the paper is trained with clean and unmodified images as targets in a supervised learning approach. However, the targets could instead have been limited to the resolution corresponding to the theoretical optimum of standard SIM reconstruction, i.e. a resolution increase of a factor of 2 over a wide-field image. Such targets are obtained by restricting the clean images to the frequency support of a modified optical transfer function (OTF) with twice the radius of the equivalent wide-field OTF. A more conservative model could be obtained in this way at the expense of resolution. This is illustrated in Figure S8, where an ML-SIM model trained in such a way (ML-SIM 2x OTF) provides reconstruction output of lower resolution than the default ML-SIM model (ML-SIM GT). While other studies on applying deep learning to microscopy have reported on content-aware approaches [3,4], ML-SIM is trained on diverse data to avoid sample-specific models, which in principle prevents reconstructions from exceeding the theoretical SIM resolution optimum. Yet, basic features such as simple curves, lines, edges and corners are arguably similar between natural objects across different length scales. Imposing this resolution constraint during training may thus cause the reconstruction quality to suffer, as indicated by the corresponding line profile in Figure S8.
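The idea of a band-limited training target can be illustrated in one dimension. The sketch below is a deliberate simplification: it uses a naive DFT and a hard frequency cutoff, whereas a real OTF attenuates frequencies progressively; the cutoff value k_d and the function names are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def idft(spectrum):
    """Inverse of dft, returning the real part of the signal."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]


def band_limit(signal, cutoff):
    """Remove all components above `cutoff` (in cycles/px)."""
    n = len(signal)
    spectrum = dft(signal)
    for f in range(n):
        frequency = min(f, n - f) / n  # absolute frequency of bin f
        if frequency > cutoff:
            spectrum[f] = 0
    return idft(spectrum)


n = 64
k_d = 0.125  # hypothetical wide-field cutoff in cycles/px
# Ground truth with one component inside 2*k_d (0.1875) and one beyond it (0.375).
ground_truth = [math.cos(2 * math.pi * 12 * t / n) + math.cos(2 * math.pi * 24 * t / n)
                for t in range(n)]
target_2x_otf = band_limit(ground_truth, 2 * k_d)  # keeps only the 0.1875 component
```

Training on target_2x_otf instead of ground_truth corresponds to the more conservative ML-SIM 2x OTF variant: detail beyond twice the wide-field cutoff is never presented to the network as a target.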

APPLYING ML-SIM TO TIRF-SIM DATA
ML-SIM is also tested on a third SIM system, which is distinct from Microscopes 1 and 2 described in the main paper in that it uses total internal reflection fluorescence structured illumination microscopy (TIRF-SIM) and produces raw SIM images at a resolution of 256x256 pixels per frame, while ML-SIM is trained on images with 512x512 pixels per frame. Rather than training a separate model specifically for this system, the TIRF-SIM data is reconstructed with the same ML-SIM model used throughout the main paper to further demonstrate its generality. The TIRF-SIM image used here is a test image of tubulin from the open source FairSIM repository [5]. Reconstruction outputs and line profiles across the tubulin structures for the respective methods are shown in Figure S9.