Fluorescence fluctuation-based super-resolution microscopy using multimodal waveguided illumination.

Photonic chip-based total internal reflection fluorescence microscopy (c-TIRFM) is an emerging technology enabling a large TIRF excitation area decoupled from the detection objective. Additionally, due to the inherent multimodal nature of wide waveguides, it is a convenient platform for introducing temporal fluctuations in the illumination pattern. The fluorescence fluctuation-based nanoscopy technique multiple signal classification algorithm (MUSICAL) does not assume stochastic independence of the emitter emission and can therefore exploit fluctuations arising from other sources, as such multimodal illumination patterns. In this work, we demonstrate and verify the utilization of fluctuations in the illumination for super-resolution imaging using MUSICAL on actin in salmon keratocytes. The resolution improvement was measured to be 2.2-3.6-fold compared to the corresponding conventional images.


A. Imaging System
The imaging system, waveguide coupling and formation of illumination modes are outlined in Figure S1. The imaging system is based on an upright Olympus microscope, modified to accept a four-camera system for either simultaneous multi-channel imaging, or fast multiplexed imaging. The illumination system is a custom configuration made of mostly commercially available components, but with some parts designed and fabricated in-house. A list of components is included in Table S1.

Imaging Module
The collection module is based on an upright Olympus microscope. It has been modified by replacing the tube lens and adding a set of four cameras. The excitation light is filtered out using a quad band emission filter. After the tube lens, the image beam is split into four using dichroic mirrors. The images are then digitized using compact and low cost industrial cameras. Adjustable length mechanical adapters make sure each camera is focused correctly.

Illumination and Modulation
During image acquisition, the photonic chip is securely mounted using a vacuum chuck. A linear stage is used to move the photonic chip relative to the coupling objective allowing selection of a desired waveguide. The waveguide is excited by directly focusing a laser onto the input facet with a long working distance objective. Coupling efficiency is optimized by precisely aligning the excitation beam with the input of a waveguide along two axes using a flexure stage.
A mirror on the galvanometer allows for changing the input angle of the laser beam, the focused beam can then be scanned across the field of view of the coupling objective along the input facet of the waveguide. Fast settling time of the galvanometer (800 µs) allows for switching the illumination pattern in the waveguide without significantly compromising acquisition rate.

System Control and User Interface
High level control and user interface for the system is provided through a custom LabVIEW application. Timing critical parts of the system, such as laser modulation and galvanometer position update are handled using trigger signals from the cameras, and a custom DAQ board monitoring these signals and updating the galvo position by means of a control voltage.

B. Waveguide fabrication
A 250 nm thick Ta 2 O 5 film was deposited using a magnetron sputtering system on Si substrate with 2.5 µm thick SiO 2 bottom cladding layer. The base pressure of the deposition chamber was evacuated up to 1 · 10 −6 Torr, with sputtering gases being argon and oxygen (Ar:O 2 flow ratios of 20 SCCM: 5 SCCM). Substrate heating was applied to maintain a substrate temperature of 200°C throughout the deposition time [1] [2]. The waveguide fabrication process was carried out using a conventional optical projection lithography method. First, 1 µm thick positive resist (Shipley S1813) is coated on top of a 250 nm Ta 2 O 5 film and then prebaked at 90°C in the oven for a half-hour. Then, the wafer was mounted at the mask aligner (MA6), which illuminated the photoresist with the waveguide pattern on the mask. The masking process was optimized for the transparent wafer by varying the exposure time and development time (developed in Shipley MF319). Ion beam etching of the Ta 2 O 5 layer using argon gas was done to fabricate the waveguide. The etching was performed in the ion beam system (Ionfab 300+ from Oxford Instrument) using an argon gas flow rate of 6 SSCM. The process pressure (2.3 · 10 −4 Torr), beam voltage (500 V), beam current (100 mA), RF power (500 W), and substrate temperature (15°C) were kept constant. The etch rate goes down by reducing the crystal period. In the ion beam milling process, the substrate was placed at an angle of 45 degrees with respect to the incident ion beam to achieve low sidewall roughness. Finally, wafers are placed in a 3-zone semiconductor furnace at 600°C in an oxygen environment for 3 hours to reduce the stress and supplement the oxygen deficiency created in Ta 2 O 5 during the fabrication process.

C. Sample preparation
The keratocytes were obtained from scales harvested from farmed Atlantic salmon. The fish were killed by a blow on the head which is allowed according to Norwegian Regulations for use of animals in experimentation (https://lovdata.no/dokument/SF/forskrift/2015-06-18-761#KAPITTEL_ 10). This method is also in compliance with corresponding EU legislation -Directive 2010/63/EU (http://data.europa.eu/eli/dir/2010/63/oj). The scales were placed on ethanol cleaned photonic chips Fig. S1. Chip imaging system with galvo mode scanning. (a) Acquisition system: a: xy stage, b: coupling alignment stage, c: adaptor plate, d: fiber collimator, e: custom mount, f: galvo, g: coupling objective, h: linear stage, i: vacuum chuck, j: collection objective, k: upright microscope, l: tube lens, m: dichroic mirror mounts, n: cameras. (b) Overview of waveguide coupling, mode scanning and the formation of different TIRF illumination pattern from waveguide mode interference. (c) An illustration of image acquisition protocol for minimum distance of probing input facet (∆x) and scan start location x 0 , a deinterleaving parameter d for acquiring fewer frames, a stack offset parameter n. These notations are also used in supplementary note 2. Frame number is represented using k, which also indexes the probing points in the corresponding image stack. x SM1 to C-mount adapters Thorlabs 4x SM1A39 Table S1. Components of the imaging system. The letters in the 'Ref.' column refer to the letters in Figure S1a. Further, the letter 'x' denotes the components not directly shown in Figure  S1a.
combined with a polydimethylsiloxane (PDMS) camber of height ≈ 0.5 cm and volume ≈ 1 mL (see Figure 1c). For the scales to attach to the waveguide surface, they were left to dry for about 2 min before adding antibiotic antimycotic buffer solution (Hank's Balanced Salt Solution (HBSS, Corning, #21-023-CM) with 100 IU/mL Penicillin (Alfa Aesar), 100 µg/mL Streptomycin (VWR) (P/S) and 1 µg/mL Amphotericin B (PanReac AppliChem, #A7009. The cells were kept at 4°C in an air tight box, replacing the old solution with fresh medium every 2-3 days. After day 4, the cells were kept in Leibovitz's L-15 medium (Gibco) with P/S.The cells and scales were fixed on chip (11 days after harvesting) in freshly thawed paraformaldehyde (4% in phosphate buffered saline (PBS)) for 1.5 hours. The cells were permeabilized using 0.1% trypsin for 4 min and washed using PBS before F-actin labelling using phalloidin-ATTO647N (ATTO-TEC) in a dilution from stock 3:100 in PBS for 1 hour while kept protected from light and at room temperature. The cells were washed in PBS and kept in the HBSS antibiotic antimycotic buffer solution in a sealed container in the fridge until imaging.

D. Image acquisition
Different waveguide illumination patterns were used to illuminate the sample by moving the excitation spot along the input facet of the waveguide as outlined in section S1.1. The illumination and image acquisition were synchronized such that the camera would start its exposure when the galvo mirror was stationary, and the laser would be turned on only while exposing. This approach was chosen to create a high contrast between the mode patterns captured in different image frames and to minimize photobleaching. The acquisition time per frame is the sum of the camera exposure and readout time, plus the time of mode switching, of which the latter one is negligible compared to the former ones for our implementation with galvo scanning. In effect, the imaging rate is primarily determined by the camera technology and the fluorophore brightness.
In the experiments presented in the main article, the modes of a 600 µm wide waveguide were excited using a 650 nm laser of power 150 mW. The distance between each coupling point along the input facet for each consecutive frame was 400 nm. The fluorescence images were collected through a 670 nm longpass emission filter. The choice of laser coupling step size and exposure times were made from a compromise on getting a high enough number of frames with a sufficient signal-to-noise ratio without deteriorating the image quality from photobleaching too much towards the end of the time sequence.

E. Data analysis
Image visualization, analysis and processing were conducted using Fiji [3] with ImageJ 1.53c. The MUSICAL image reconstructions were accomplished using an implementation of MUSICAL for Python, with the following parameters: 5 subpixels, lambda 660 nm, and alpha 2. The threshold was picked automatically using the work presented in [4]. More details are described in 5.
Resolution estimation based on decorrelation analysis were performed using the ImageJ plugin from [5] with default settings. SSIM calculations were performed using MATLAB (The MathWorks, Inc.,version R2020a) calculating the SSIM from [6].

MATHEMATICAL DERIVATION OF CHIP-BASED MUSICAL AND MEASUREMENT OPTIMALITY
This section contains the mathematical analysis needed for optimizing the illumination design for chip-based MUSICAL. It is broken into the following sub-sections: description of the design problem needed for identifying the optimal measurement setup, description of the imaging model, the criteria designed for identifying the optimal illumination approach (and consequently the measurement approach), the protocol for designing the optimal measurement strategy for a waveguide prior to bioimaging, and lastly the resolution limit derived from the mathematical analysis. The complete list of notations are listed in Table S2.

A. The design problem
We first define the design problem related to the illumination approach. The input facet of a multimodal waveguide is a wide facet, and when using a focused laser beam, only a small width of the facet is sampled. Let the position of the center of the beam spot on the input facet with respect to the left edge of the waveguide facet be denoted as x. The beam is scanned over the input facet, and each illumination point (also called illumination probe) introduces a particular

Meaning
Indices p Pixel number for pixels in camera. The pixels in 2D camera space may be numbered or indexed linearly using, for example, raster scan. There may be a separate look-up table for converting the 2D pixel coordinates into p.
m Pixel number for pixels in a super-resolved (SR) grid in the sample space. The pixels in the 2D plane illuminated by TIRF (z-axis neglected as small due to evanescent waves) may be numbered of indexed linearly using, for example, raster scan. There may be a separate look-up table for converting the 2D SR pixel coordinates into m.
k Frame number. This corresponds also to (a) the time coordinate as t = k/F and the probing location x on the input facet of the waveguide. q The index representing a propagating mode in the waveguide.

Imaging setup related ∆x
The smallest scan step for the input facet of the waveguide used for designing the optimal measurement strategy.

F
The rate of image acquisition of the camera. In the case of the illumination engineering system used in this article, it is also the rate of switching of the scanning point on the input facet of the waveguide. t The time at which an image is taken.
x 0 The left edge of the waveguide (or a small practical offset point) which serves as the reference for the input facet scanning by the illumination spot. x The distance between x 0 and the illumination beam's spot (also referred to as illumination probe) d It is an integer, referred to as the 'deinterleaving parameter' that represents the distance between two consecutive positions of the illumination probes. Thus, if x k and x k+1 represent two consecutive positions, then It is an integer referred to as the 'offset parameter'. It indicates the offset from the x 0 in terms of ∆x for a given input facet scan. Therefore x k is completely specified as  L m,q Evanescent field intensity of qth propagating mode at the SR pixel m in the sample region. These intensities are normalized in the sense that the maximum electric field amplitude is 1.
The coefficient of the qth propagating mode that indicates the strength of mode in the illumination pattern in the kth frame.
f Spatial frequency (magnitude) in the Fourier domain f max The maximum spatial frequency of the propagating modes in the waveguide, which corresponds also to the maximum spatial frequency supported in the illumination patterns generated by the chip-based microscope f obj The bandwidth of the optical transfer function of the collection system comprising of the conventional objective lens system.

Statistical quantities defined for analysis
The variance of intensity I(p, k) across the frames (i.e. ∀k) at a given pixel p when an imaging setup characterized by a particular combination of n and d is employed.
The average of intensity I(p, k) across the frames (i.e. ∀k) at a given pixel p when an imaging setup characterized by a particular combination of n and d is employed.
For a given frame k and the deinterleaving parameter d, the spatial correlation between the kth frame I(p, k) and its adjacent frame I(p, k + 1). Table S2(contd.) : Description of symbols and notations combination of the waveguide modes supported by the waveguide. Among these combinations, the excited modes propagate for long distances and produce evanescent fields at different regions of interest (ROIs) along the length of the waveguide. Therefore, by varying x, one can create illumination fluctuations in the ROIs which are to be exploited for super-resolution. Hence, the design of the optimal measurement approach corresponds to identification of a suitable set of illumination probing points {x}. The criteria of suitability will be derived later in this section.
From experimental practicality, we consider the constraint that the illumination facet can be probed in regular intervals via some form of scanning mechanism (employing e.g. piezo stage or galvo scanner) with the minimum scan distance between two consecutive probes being ∆x. In other words, the input facet can be scanned in discrete multiples of ∆x. Let the starting point of the scan be given by x 0 , which is close to the left edge of the input facet. The starting point need not be particularly known as long as it is practically close to the facet edge, enabling the entire waveguide facet to be scanned. Then, the illumination probing distances can be specified as where d relates to the physical scan distance as d∆x and n relates to additional offset from the starting point creating the possibility of scanning starting at multiple locations. Therefore, the design parameters are n and d. We further incorporate in our design a choice of d such that the sensitivity of n is low, such that the offset from the left edge is not a factor of concern. In essence, our design parameter is d or d∆x for a given waveguide, illumination wavelength, and illumination probing setup.
In the experimental results presented in section 3.2 of the main manuscript, we start with an oversampled stack with ∆x = 400 nm and d = 1 acquired at NA 0.3. In order to study the optimal stack size (i.e. number of illumination points) and sensitivity to the starting point, we create different stacks using different values of d and n ∈ [1, d]. However, if the stacks were created using optimal selection of d, the number of actual measurements would be significantly smaller. As noted in section 3.2, when we use d = 5, the structural similarity and resolution of MUSICAL images match well with the reference d = 1. In this case, d = 5, i.e. sampling distance d∆x = 2000nm between two consecutive frames is a good experimental design.

B. The imaging model and the role of illumination
Consider a set of photostable emitters (fluorescent molecules) that have negligibly small standard deviation in the number of photons emitted over time. Let their average photoemission constant be e photons per unit illumination per frame. This assumption is made so that the analysis can focus on the fluctuations arising solely from variations in the illumination resulting from scanning the excitation laser along the input facet of the multimodal waveguide. Further, let us consider a hypothetical and sufficiently fine, super-resolved (SR) grid in the sample region. It is represented in the sample through the number of emitters E(m) in the mth pixel. The SR sample grid is significantly finer than the diffraction limit as well as at least two times smaller than the smallest feature of interest. Let the point spread function that maps the photons emitted at an SR pixel m in the sample to the pth camera pixel be G(p, m). Let the illumination pattern in the sample region in the kth frame be L(m, k). Then, the intensity image on the camera is given by: In these notations, the following assumptions are made. The notation t denotes a particular frame number during which the camera sensor is exposed for a time ∆t. The illumination L(m, k) is the sum of all illumination profiles delivered during the exposure of the camera. If the illumination point on the input facet is stationary during the image acquisition (such as used in this article), this consideration does not apply since the illumination is constant during the exposure time. However, the consideration applies when a continuous scan of the input facet is employed instead of discrete jumps characterized by ∆x. We assume that the exposure time of the camera is large enough (> milliseconds), such that the photons' emitter-camera travel time is negligible. The observed intensity at a pixel p in a frame k indicates the total number of photons collected in the pixel region of the pth pixel during the exposure of the camera in that frame.
Relegating the measurements into matrix notations, eq. (S1) can be expressed as: where the (p, k)th element of I corresponds to I(p, k), the (p, m)th element of G corresponds to G(p, m), the (m, k)th element of L corresponds to L(m, k), and E is a diagonal matrix in which the mth element contains E(m).
MUSICAL involves eigenanalysis of J = II T , which can be written as Therefore, the illumination variation achieved in the waveguides contributes to MUSICAL through the mathematical support of LL T . The (m 1 , m 2 )th element of the matrix LL T , being denoted as l(m 1 , m 2 ) for the ease of further reference, is specified as This means that each element consists of correlation in time with lag zero but for different points in the sample space. The exceptions are the diagonal elements, as they represent autocorrelation in time at a given point in the sample space. For these cases, the diagonal elements comprise the illumination's second order temporal raw moment (explained below).

C. Criteria of optimal measurements for waveguide based illumination engineering
We first consider the diagonal elements, which correspond to the second order temporal raw moment of illumination at a given point. Mathematically, this term is the sum of the temporal variance and the square of the average signal strength of the illumination for each point in the sample space independently. In order to achieve a diagonal dominant matrix, it is important to have high variance as well as high signal strength across all the points in the sample space. In order to assess this property over all the sample points in a ROI, we can first compute the temporal variance and average signal strength at each individual point, and then form a histogram each for the distribution of variance and signal strength over all the points. Then the optimality in the sense of temporal variance is given by the input facet sampling distance such that the median value of the spatial distribution of the temporal variance is large but the spread of the distribution is small.
Similarly, the optimality in the sense of temporal average intensity is given by the input facet sampling distance such that the median value of the spatial distribution of this average intensity is large but the spread of the distribution is small.
Furthermore, the rank of the matrix LL T is quite insightful. It indicates the number of independent illumination patterns in L. The number of electromagnetic modes inside the waveguide is finite, say Q, even though it might be difficult in practice to determine the value of Q. As a consequence of the finite value of Q, there is a bound to the rank of this matrix irrespective of how fine sampling is taken on the waveguide input facet. Therefore, while one may think that sampling the input facet with a very small step size between two consecutive frames provides the best measurement strategy if photobleaching of the fluorophores is not a problem, this is actually not the case. In fact, taking very many such measurements provide (a) different linear combinations of the same waveguide modes, and (b) generate independent noise patterns for every single measurement. Therefore, taking too many such measurements with small facet size does not increase the cardinality of illumination patterns but does increase the sampling of noise. This in turn leads to noisier eigenimages and therefore also to a poorer image reconstruction. On the other hand, using too few sampling points also leads to a rank below the maximum number of modes supported by the waveguide.
Therefore, the optimal number of measurements is such that taking a higher number of measurements does not alter the rank of LL T , while fewer measurements surely and systematically reduces the rank.
An alternative way of assessing the optimal number of measurements is described next. Consider two consecutive frames that are acquired by two consecutive samplings of the input facet. If the sampling points are too close, they will result in similar or highly correlated illumination patterns. If the difference between the illumination patterns across a chosen ROI is comparable to the noise, then the two patterns do not necessarily provide exploitable information. An eigenvalue decomposition of a matrix L comprising of just these two illumination patterns results in one significant eigenvalue associated with an eigenvector which is proportional to the average of the two illumination and the other eigenvalue is comparable to noise. In essence, the rank of the matrix is 1 for this noisy scenario even though two illumination patterns were used. On the other hand, as the distance between the consecutive input facet sampling points increases, the illumination patterns from such points become less correlated, providing more exploitable information. As the distance between the sampling points increases, the eigenvalue decomposition results in more and more prominent value of the second eigenvector, contributing to full rankedness of such matrix.
In other words, the optimal sampling distance between the input facet is such that the correlation between two consecutive measurements decreases when the distance is further increased and increases consistently when the distance is decreased.
This alternate approach for assessing the optimal sampling distance is more computationally efficient than assessing the ranks for all combinations of d and n, as required for the previously described optimization procedure.

D. Strategy for identifying the optimal illumination approach
In order to identify the optimal illumination approach prior to bioimaging, we performed some measurements as outlined next and processed the acquired data stacks using the criteria discussed in the previous subsection and outlined more explicitly in this section as a procedure or protocol.

Acquiring images of illumination patterns under uniform sample conditions and over-sampling of the input facet
Since the illumination strategy needs to be designed a priori and irrespective of the sample, we consider uniform sample conditions for this process. This may be achieved in waveguidebased systems through one of the following two strategies: (a) Form a thin and uniform monolayer of fluorophores using standard protocols such as in [], or (b) autofluorescence spectrum of waveguide material may be exploited to image the illumination at the autofluorescence peak. The second strategy is more suited for the conventional homogeneous waveguide materials, but is useful only if the excitation frequency of the fluorophores of interest generates sufficient autofluorescence in the waveguide material. On the other hand, the fluorophores of interest can be directly used for the characterization by the former strategy if uniform layers can be created effectively on the waveguide surface.
Beside this, the normal microscopy setup for chip-based imaging such as described in Suppl. section 1A or other multimodal waveguide-based imaging setups described elsewhere [7] may be used. The only condition is that the same setup is used for applying MUSICAL on actual samples later. The waveguide input facet may be illuminated using optics such as described in the main paper with the only condition that as fine sampling of the input facet as possible (i.e. ∆x) is used and one illumination pattern is imaged for one sampling.

Data analysis to determine the optimal d
We first define the quantities that need to be computed using the stack described above for determining d opt .
Consider a pair of images I(p, k) and I(p, k + 1). There can be K − d such image pairs for a given value of d, where K is the maximum number of frames when ∆x is used for sampling the input facet. Let the spatial correlation between each pair be denoted as C(d, k).
Also, let a stack with deinterleaving d, and offset n be defined as I(n, d) that contains the image frames starting at frame number n and sampled every d frames thereafter for the experimental stack measured using the aforementioned experiment. Let V(p, n, d) represent the variance of intensity across the frames and A(p, n, d) represent the average intensity across the frames.
Based on the analyses presented in section C, we outline the criteria that imply a good measurement strategy for waveguide-based illumination engineering for MUSICAL.
• The univariate histogram of V(p, n, d) with respect to d is computed. In the optimal case, the median of such distribution is high but the spread is small.
• The univariate histogram of A(p, n, d) with respect to d is computed. In the optimal case, the median of such distribution is high but the spread is small.
• The univariate histogram of C(d, k) with respect to d is computed. In the optimal case, the median of such a distribution is small.
We note that the above analysis is easily generalizable to include images acquired over many ROIs, and the conclusions on d opt can be derived through the above mentioned histograms by including evidences over multiple ROIs in the same histograms. The above criteria are described qualitatively and also used for illustration purposes in the following sub-sections. We recommend the above qualitative measures as more intuitive and amenable to human-based heuristics. Nonetheless, one is free to compute the medians and standard deviations of V(p, n, d), A(p, n, d), C(d, k) as functions of d for more quantitatively rigorous and automated conclusions. This aspect and also design of other single valued functions using a combination of the above mentioned histograms are out of the scope of the current work.

E. Resolution limit and relationship with the spatial frequencies supported by the waveguide
The wide-field resolution of chip-based TIRFM and the potential of super-resolution through both single molecule localization and fluctuations based techniques have already been investigated [7][8][9]. At the same time, resolution and reconstruction quality of MUSICAL have also been studied previously [10,11]. The scope of utilizing the chip-based illumination for super-resolution using MUSICAL is new and the focus of the current article.
In the above discussion, so far, we have discussed the optimality of the distance between consecutive sampling points of the waveguide input facet. Here, we discuss how the spatial frequency support of the waveguide illumination patterns in the eigenvalue decomposition contributes to resolution. Let the electromagnetic modes in the waveguides be represented as L(m, q), q ∈ 1, Q, and each characterizing one spatial frequency. Let the coefficients of the linear combinations of these modes be denoted as a(k, q) such that L(m, k) = ∑ Q q=1 a(k, q)L(q, m). Therefore eq. (S4) can be rewritten as This indicates that the maximum spatial frequency support of the eigenimages derived using J defined in eq. (S3) is given by the multiplications of all of the spatial frequency components supported by the waveguide with each other. In the frequency domain, this corresponds to the convolutions of all frequency components with each other. Therefore, if the maximum spatial frequency in the waveguide has magnitude k max , then the maximum frequency of LL T is 2k max . Further, multiplication with the G on either side in the definition of J defined in eq. (S3) contributes additional 2k obj where k obj is the maximum frequency supported by the optical transfer function of the collection objective. This results in the net resolution limit of MUSICAL for waveguide based illumination engineering to be determined by 2(k max + k obj ).

F. Resolution in the context of structured illumination approaches, including blind and speckle illuminations
There is a certain degree of overlap between the structured illumination approaches with random or pseudo-random illuminations patterns [12][13][14] and the proposed chip-based MUSICAL. It is notable that the resolution of the proposed photonic-chip based nanoscope is limited in the sense of the hardware through the numerical apertures of the illumination and collection, similar to other structured illumination systems [7,15]. However, we clarify an important difference with respect to the illumination numerical aperture between conventional optics-based systems versus chip-based imaging systems. We note that the objective lens used for coupling the light into the waveguide does not determine the spatial frequency support or the effective numerical aperture contributed by the waveguide-based illumination, as also discussed in detail in earlier works [15]. In the waveguide-based illumination system, the numerical aperture is indicated indirectly by the effective refractive index of the waveguide modes supported by the waveguide. Simulations provide an effective refractive index range from 1.9329 to 1.9317 for transverse electric polarization or 1.8469 to 1.8457 for transverse magnetic polarization at the geometry utilized for this work [7].
It is also interesting to consider the differences between MUSICAL and blind SIM given that both MUSICAL and blind SIM involve correlations. In blind SIM [13] and related algorithms, the sample's image is correlated with the illumination pattern, i.e. pattern-image correlation is performed. In MUSICAL, images taken under different illumination patterns are correlated, and therefore indirectly different illumination patterns are cross-correlated. We note that the patternimage correlation, in the ideal case support the spectral span corresponding to 2k illum + k obj . On the other hand, the cross-correlation of illumination patterns, such as indirectly performed in MUSICAL, supports spectral span corresponding to 2(k illum + k obj ). This is a direct consequence of the mathematical formulation presented in [16] although only fluctuations from photokinetics is considered there.

CROSS-CORRELATION OF ILLUMINATION PATTERNS
In this section we discuss the cross-correlation between frames located at different positions in time. The objective of this comparison is to analyse the difference between frames and the random The diagonal is full of ones as the correlation of a frame by itself is one. The inset shows a magnified-view of the top left corner of 10 by 10 frames.
patterns generated by light reflection in the chip. We captured the patterns by measuring images of the autofluorescence at discrete positions and measured the similarity between every pair of frames by obtaining the Pearson correlation coefficient between frames. The coefficient for frame number f 1 and f 2 where the operator · represents the mean of values in the vector is defined as follows: To represent a column or row we use the colon sign (:) to indicate that the whole column or raw is considered. For instance I :,1 corresponds to the first column, while I 1,: is the first row.
When the correlation for every pair is obtained, they can be ordered in the correlation matrix, which is shown in Fig. S2 for the frames taken at NA 0.3 (6000 frames). As expected, the image is symmetrical since it contains every pair of frames twice, with the maximum value located across the main diagonal as it shows the correlation of each frame by itself. Due to the symmetry of the correlation matrix, we can analyse only half of the values in this similarity matrix.
Our interest lies in the off-diagonals as they represent different deinterleaving distances, starting from zero in the main diagonal. From this representation, it is also clear that for larger values of deinterleaving, fewer samples are available as the diagonals contain less pairs. Then, to make them comparable, we obtained the normalized histogram using a fixed number of bins (100) for each deinterleaving (or each off-diagonal) and collect them all in a single image as columns. The results are shown in Fig S3, where we added two curves. The mean curve is computed by getting the weighted mean of the histogram (continuous line), while the maximum is just the maximum value of each histogram.
From the plot, we observe that correlation decreases as the distance increases, reaching a plateau at around 20 frames for both of the 0.3 NA and 1.0 Na case. It is also interesting to observe that for the higher resolution case (i.e. 1.0 NA), the cross-correlation goes closer to zero, as can be clearly seen from Fig S3c. Therefore, we conclude that using an deinterleaving of 20 frames would be enough for breaking the correlation between patterns.

A. Analysis of the mean and variance across multiple substacks
In this section, we compare the fluctuation properties captured in the mode pattern images acquired using the 0.3 NA and 1.0 NA objectives. We performed this analysis by getting V and A as defined in Section 2.F of this supplementary document for multiple substacks of the original image I that depend on the deinterleaving d and also in the offset n as: From these expressions we get the ratio R(p, n, d) = V(p, n, d)/A(p, n, d) to observe variations in the fluctuations for every case. This expression allows to characterize every pixel at single stack in terms of its intensity values, which is normalized by the mean intensity in the stack. We note that instead of variance to mean ratio, standard deviation to mean ratio could have been used and would lead to similar interpretations. Since the interpretations are similar, we have chosen to present only variance to mean ratio.
As a result, for every choice of deinterleaving d we can obtain a distribution of R as there exist several substacks for each value. An important problem is that as the number d increases, so does it the number of substacks available (with each of them containing less frames as the total number of frames is limited).
To increase the number of samples, we extracted 10 small patches of 100 by 100 pixels from the original autofluorescence acquisition. The results can be visualized as living in a threedimensional array whose axis correspond to d, substack index n, and pixel position p. Then, we compute R as an element-wise ratio and then compress the result to be represented as a function of d. Finally, we present the results in Figure S4 which shows the histogram across every value of d. In addition, we have added a line with the mean of the curve in red with an envelope showing the standard deviation for each d.
In this figure, it is possible to observe a change in the histograms and curves for deinterleaving values above 20 frames. For values below that, the standard deviation (blue curve) varies slowly suggesting that the variation of R (and therefore the variance across pixel intensities) does not change considerably in the range between 1 and 20. This represents the effect of picking different values of offset n. A large spread indicates large sensitivity to the offset, and vice versa. As d increases, indeed the number of candidate values for n increases, and indeed some amount of sensitivity to the value of n is expected. However, the fact that the spread becomes almost constant after a certain value of d indicates robustness to the value of n and therefore less critical need of controlling the offset during an experiment.

B. Interpretation of eigenimages in each pixel window
MUSICAL is based on the decomposition of small sub-sections into what is referred to as the eigenimages in related literature. In the original MUSICAL, where widefield illumination and a blinking behavior rooted in the fluorophores' photokinetics is expected, these eigenimages contain the most prominent spatial structures in the sample. Later, these are divided into two subsets called the signal subset and the noise subset depending on their statistical significance measured through their respective eigenvalues. On this new application of the MUSICAL algorithm, the fluorescence fluctuations is attributed largely to the illumination patterns generated in the chip. Therefore, in this new scheme, the eigenimages contain the different illumination modes (which encode the structural information) and the division into signal and noise subspaces is equivalent to selecting which modes represent statistically prominent illumination patterns within the pixel sub-window and which do not. This may be interpreted also as selecting fewer modes than pixels, namely the most relevant ones for super-resolution.

FOURIER ANALYSIS OF ILLUMINATION PATTERNS
The relevant waveguide mode patterns for this study, can in general be described as lines. While the patterns are random, due to the geometry of the waveguide and excitation laser, these tend to be horizontal or deviating with only a small angle compared to the waveguide optical axis. We applied a Fast Fourier transform to each frame in the stack in order to observe this effect. As the lines' position can be modulated by the phase in the Fourier domain, we are interested only in comparing the magnitudes. The results of the mean Fourier magnitude are shown in Figure S5.
The circular distribution shows that the illumination covers frequencies in both the horizontal and vertical axes. However, the strong vertical line indicates that the patterns are indeed emphasised in the horizontal direction. As both directions are not equalized, it is likely that resolution-limited structures along the vertical axis are better resolved than the ones along the horizontal axis.

MUSICAL THRESHOLD SELECTION
The most difficult parameter to choose-and a potential source of subjectivity-for the MUSICAL reconstruction is the threshold parameter. This determines which fluctuations are regarded as signal and which are regarded as noise or background fluctuations.
To overcome this challenge, various automatic thresholding schemes were developed by Acuña et al. in [4] and tested on both simulated and real microscopy data, but only for the case of homogeneous illumination and epifluorescence data. Here, we tested the different automatic thresholding schemes for the inhomogeneous multimodal waveguide (TIRF) illumination. We used the same deinterleaved image stacks as presented in the main article.
We found that for all cases of deinterleaving, the soft thresholding scheme MUS-S gave a satisfactory and the best performance compared to the other thresholding schemes. The reconstructions were in accordance with the 1.0 NA ground truth reference, low to no appearance of reconstruction artifacts and with an intensity scaling appearing in accordance with the actual sample brightness. The reconstruction procedures MUS-A, MUS-B and EV-B were seen to reject too many sample details for all cases of deinterleaving. The eigenvalue-scaled methods EV-A and EV-S appeared blurry and with a high presence of reconstruction artifacts for small stack sizes. Similar results as for the 0.3 NA data were also achieved for the 1.0 NA data, although the ground truth is not available for these images. The results for the 0.3 NA data are summarized in Figure S6.

EXTREME FOVS SUPER-RESOLUTION MICROSCOPY
For application scientists to efficiently harvest the advantages of the huge super-resolved areas provided by chip-based MUSICAL, the MUSICAL image computation must also be within reach of normal computers, e.g., via the open-source ImageJ plugin MusiJ [17]. Below we consider a practical scenario based on the reconstruction displayed in Figure S7.
Assuming that the image coming from a microscope is stored using 16 bits and with a grid of sensors of 2048 by 2048 elements, then each frame has a size of 8 MB. This means that a stack of 1500 frames requires approximately 12 GB in memory which is already larger than the common computer configuration of 8 GB. This can be solved partially by using ImageJ's virtual stacks where the images are not stored completely in the RAM but on the hard drive (disk resident). However, MusiJ requires the complete image stack to be loaded into the RAM in order to split the data across different processors in a way compatible with the MUSICAL computation. Moreover, we observed heap memory issues while processing larger datasets, obstructing the attainment of a best-quality MUSICAL image (processing all the available information) via the MusiJ plugin. We found the deinterleaved reconstruction strategy as a simple and elegant solution to this bottleneck of computational resources, namely a method for reconstructing all the available information which was also greatly sped-up compared to the previously applied "all-frames simultaneous" reconstruction strategy. Figure S7 is an example of a large MUSICAL image resulting from such a deinterleaved processing with d=10, meaning a reduction of 10 times the required memory for processing. In terms of size, this sample contains originally 1500 frames of 2448 by 2048 pixels and requires approximately 14 GB in memory, impossible to process using MusiJ and standard laptop configurations. The deinterleaved stack of 150 frames (1.4 GB) on the other hand, is possible to process. Hence, the deinterleaved reconstruction approach with sum of sub-stack MUSICAL images outlined in the main manuscript, enables the obtainment of larger MUSICAL images of superior quality than what would have been obtainable without the deinterleaved approach.
The MUSICAL computations presented in this work were achieved via a Python implementation of MUS-S [4]. The computer used was equipped with an Intel Xeon Gold 5118 processor with 12 cores and 128 GB DDR4 memory. For this configuration (and splitting the task into 8 parallel processes), the particular processing of the large FOV displayed in Figure S7 took 12 min 44 s for 150 frames, and 23 min 32 s for 1500 frames. Thus, the main advantage of the deinterleaved reconstruction approach is not saving reconstruction time (although this greatly depends on the particular hardware and software implementation), but to enable the computations to be performed on more memory-restricted systems.

COMPARISON OF MUSICAL-ON-CHIP WITH BSOFI AND SRRF
Although the multimodal chip illumination is expected to cause reconstruction artifact for other fluctuation imaging methods than MUSICAL, we have for completeness included a comparison of the MUSCIAL results with the corresponding SRRF and bSOFI images in Figure S8.

LIVE-CELL IMAGING
As the photonic chips are bio-compatible, the gentle TIRF illumination can also be exploited for live-cell imaging of a great variety of cell types. Figure S9 demonstrates imaging (conventional and MUSICAL) of actin in living cardiomyoblasts (labelled using SiR-Actin). The color coded panel (projection of three MUSICAL images at different time-points) highlights minute cytoskeletal changes over time. The optimization for high-speed super-resolution imaging of living cells is the scope of future work.  Figure 2a, but instead of computing the mean of modes for a conventional TIRF image, the resolution is enhanced by computing the MUSICAL image. To speed-up and ease the computational burden, the SUM of 10 MUSICAL images from the deinterleaved image stack (d=10) was used instead of all frames simultaneously. The superresolved area is 507 000 µm 2 ≈ 0.51 mm 2 (here limited vertically by the 600 µm wide waveguide, and horizontally by the camera FOV, about 845 µm). The color coded ROIs indicated in the upper panel are displayed magnified below. The large super-resolved image area is ideal for capturing the great variety of morphological features exhibited by keratocyte skin cells such as their extruding pseudopods.