Multi-line fluorescence scanning microscope for multi-focal imaging with unlimited field of view

: Confocal scanning microscopy is the de facto standard modality for ﬂuorescence imaging. Point scanning, however, leads to a limited throughput and makes the technique unsuitable for fast multi-focal scanning over large areas. We propose an architecture for multifocal ﬂuorescence imaging that is scalable to large area imaging. The design is based on the concept of line scanning with continuous ‘push broom’ scanning. Instead of a line sensor, we use an area sensor that is tilted with respect to the optical axis to acquire image data from multiple depths inside the sample simultaneously. A multi-line illumination where the lines span a plane conjugate to the tilted sensor is created by means of a diﬀractive optics design, implemented on a spatial light modulator. In particular, we describe a design that uses higher order astigmatism to generate focal lines of substantially constant peak intensity along the lines. The proposed method is suitable for fast 3D image acquisition with unlimited ﬁeld of view, it requires no moving components except for the sample scanning stage, and provides intrinsic alignment of the simultaneously scanned focal slices. As proof of concept, we have scanned 9 focal slices simultaneously over an area of 36 mm 2 at 0.29 µm pixel size in object space. The projected ultimate throughput that can be realized with the proposed architecture is in excess of 100 Mpixel/s.


Introduction
In the current era of big data analysis, instrumentation to generate massive amounts of image data is in high demand. For high throughput screening in biology, or for novel computer aided medical diagnoses in the field of digital pathology, there is a need for fluorescence imaging of tissues over large fields of view (∼few cm), in 3D (up to hundred layers of µm thickness), and at cellular resolution (∼1 µm) [1][2][3]. This can be used, for example, in immunofluorescence or fluorescence in situ hybridization (FISH) studies.
The de facto standard modality for fluorescence imaging is scanning confocal microscopy [4], because the optical sectioning capability enables high contrast. The underlying point scanning technique has a limited throughput, and the imaged area is limited by the Field Of View (FOV) of the microscope objective. Parallelization is a strategy to increase throughput, as then the space-bandwidth-time product is increased [5]. For example, in spinning disk microscopy, a large number of points is scanned in parallel [6,7]. Wide field structured illumination has also been proposed as a technique for high throughput imaging [8,9], where the loss in resolution of lower Numerical Aperture (NA) objectives for increasing the FOV, is compensated by the use of structured illumination. Throughput can also be increased by scanning multiple depth layers in parallel using an illumination with multiple foci [10,11]. These techniques require the distribution of the emitted light over several detector arms to apply a pinhole conjugate to the foci that scan the specimen at different depths. The necessary beam splitters used in these approaches result in a loss in collected fluorescence signal strength by a factor equal to the number of scanned layers. Another throughput enhancing technique is the use of a line illumination instead of a spot illumination in combination with a line sensor [12][13][14]. The pinhole for achieving optical sectioning must then be replaced by a slit, which goes at the expense of a small loss in the optical sectioning capability [13,15]. Line scanning can be combined with multi-focus scanning [16], but the proposed method suffers from the same signal losses induced by splitting the beam in the detection path as the point scanning based multi-focus systems.
Scanning large, cm 2 sized, areas with all mentioned systems can be accomplished by using 'mosaic' or 'step-and-stitch' scanning. The most favorable method for scanning such large areas, however, is continuous 'push broom' scanning with a line sensor, because of its mechanical simplicity and reduced need for stitching [17,18]. This scanning approach is naturally compatible with confocal line illumination [19]. Stage-scanning instead of beam scanning makes for a system with a minimum number of moving optical components. A line scanning system can be extended by replacing the line sensor with an area sensor. This gives additional freedom for hyper spectral scanning approaches [20,21].
In this paper we propose a multi-focal multi-line scanning fluorescence microscope for efficient 3D imaging over large, cm 2 sized, scanning areas. Figure 1 shows the essentials of the scanner concept. The architecture is based on the use of an area image sensor that is tilted with respect to the optical axis, inspired by [22]. The fluorescent specimen is scanned with a set of illumination lines, that are oriented perpendicular to the plane spanned by the optical axis and the scan direction, are focused at different depths inside the specimen, and that are optically conjugate to rows of pixels of the image sensor. This makes it possible to apply confocal slit apertures digitally, post-acquisition. Another advantage is that the line foci used for scanning are laterally separated, avoiding losses in the collected fluorescence signal by beam splitters in existing approaches [10,11]. Schematic side view of the architecture for a multi-line fluorescence scanning microscope for multi-focal imaging with an unlimited field of view. For clarity we illustrate the concept using only two focal layers. The imaging path is shown in green, the illumination path is shown in blue. A diffractive optical element (DOE) is used for generating a set of parallel scan lines, focused at equidistant planes. Note that in this side view, the lines are orthogonal to the drawing. (a) The tube lens and objective lens form a telecentric optical system such that the tilted sensor has a tilted conjugate plane in object space. The rows on the area sensor conjugate to the line foci are used to capture the image data from the different focal slices scanned by the line foci simultaneously. (b) The sample is scanned in a continuous 'push broom' scanning fashion to obtain a multi-focal image with a field of view that is in principle unlimited.
The major challenge in realizing this new scanning concept lies in the area of PSF engineering, in which many approaches targeting the imaging light path, often in the context of single-molecule imaging [23][24][25][26][27][28], as well as the illumination light path, in particular light sheet based techniques [29,30], have been proposed. In the current case the single laser beam must be transformed into a set of parallel scan lines projected at different depths inside the sample. Splitting a beam with a diffraction grating into a set of sub-beams with near equal intensity that are focused by the objective lens into a set of equidistant scanning spots in the scan (y) direction is an obvious possibility. In this paper we show how to extend the use of diffractive optical elements to first add defocus such that the set of spots scan the sample at equidistant focal layers, and to second modify the spots to lines orthogonal to the plane spanned by the optical (z) axis and scan (y) axis. Moreover, we introduce a design method for such lines to have uniform intensity along the lines. Secondary challenges in realizing the scanner concept lie in calibration and alignment of the diffractive optical element with the system, in particular with the pixel rows of the image sensor, and with the need to correct for aberrations in the illumination light path. Minimizing the spherical aberration arising from the finite conjugate imaging at different depths is an issue that also must be addressed.
This article has the following structure. In the theory section, the illumination PSF engineering approach to generating the set of equidistant multi-focal scan lines will be described in detail. The experimental methods section starts with a description of our experimental setup. After this, the use of a Spatial Light Modulator (SLM) as diffractive optical element in the setup will be discussed, and, finally, the image metric based aberration correction that is implemented, will be described. In the experimental results section, a quantitative characterization of the realized illumination PSF is presented; images of immunofluorescently stained cells and tissues and of cells stained with Fluorescence in situ Hybridization (FISH) are shown; and a quantitative description of the photo-electron yield of the system, needed to assess the potential optical throughput, is provided. The article is concluded with a discussion of further extension of this scanning platform to higher resolutions and to multi-color imaging.

Design for generating a set of defocused points
The first step in engineering the desired illumination pattern is to split the illumination laser beam into a set of sub-beams of near equal power, such that the sub-beams form a set of distinct foci after focusing by the lens. Each focus must be a line and the total set of foci must have an equidistant spacing in the scan (y) direction and in the focus (z) direction. Our design for this is a diffractive structure, often referred to as a 'diffractive optical element'.
Our design approach is based on the method used earlier by one of us [28]. For the sake of completeness, the essentials of the method are repeated here. The diffractive structure is assumed to be placed in a plane conjugate to the pupil plane and is described as a thin surface, locally altering the phase of the incoming light, independent of the angle of incidence. Then the complex amplitude of the incoming beam is modified by the transmission function T ( ì ρ) = exp (2πiW( ì ρ)/λ), where ì ρ is the normalized pupil coordinate and W ( ì ρ) is the phase profile added to the incident beam. This phase profile is described by: where λ is the illumination wavelength, f (t) the so-called profile function with t ∈ [0, 1), and K ( ì ρ) the so-called zone function. It appears that the incident beam is split into diffraction orders with integer index m with amplitude: depending only on the profile function, and phase: depending only on the zone function. For a diffraction grating, the zone function is a linear function of the pupil position, leading to an added aberration that only consists of wavefront tilt. This is sufficient for splitting the incoming beam into a set of diffraction orders (the required set of sub-beams), that will result in a set of foci with an equidistant spacing in the lateral (scan, y) direction. In our case, however, it is desired to have an equidistant spacing of the diffraction orders in the axial (focus, z) direction as well. We therefore choose a zone function that is a sum of tilt and defocus: with ∆y the lateral spacing of the foci, ∆z the axial spacing of the foci, n the refractive index of the sample medium, and NA the microscope objective Numerical Aperture. The next step in the design is to find a profile function f (t) that gives a power distribution over the diffraction orders |C m | 2 that is as uniform as possible. This problem is in general related to the problem of finding a phase function that leads to a desired intensity distribution, and can for example be solved using the Gerchberg-Saxton algorithm [31]. We use a parameterization of the profile function in terms of a finite Fourier series: which advantageously results in a smooth and band-limited profile function, and is characterized by the N free parameters a n . The cost function is defined as where an integer M ≥ N is used, and where η m represents the desired set of diffraction efficiencies. An even distribution of power over the first 2K + 1 diffraction orders is obtained when η m = 1/(2K + 1) for m ≤ K and zero otherwise. A generic search algorithm can be used for the minimization problem, such as implemented in the function fminsearch in MATLAB.
In our experiments we have implemented a solution for K = 4, N = 5, M = 10, which results in a profile function that gives 97 % of the power in the 9 diffraction orders, where the power in the individual orders varies by less than 3 %, see Fig. 2.

Design for lines with uniform intensity
So far, the diffractive structure is capable of producing a set of point foci, equidistantly spaced in the scan (y) direction as well as in the focus (z) direction. The next step is to modify the set of point foci into a set of line foci, where the direction of the lines is orthogonal to the yz-plane of the original point foci. This can be accomplished by adding astigmatism to the aberration profile, giving a total aberration profile: where W yz ρ x , ρ y is the aberration profile of the grating structure, according to Eqs. (1), (4), and (5), and where W x (ρ x ) represents astigmatism. This aberration function depends only on the normalized pupil coordinate ρ x as the intended line foci are oriented in the lateral plane orthogonal to the scan (y) direction. The astigmatic aberration can be generated by a refractive optical element such as a cylindrical lens or by the same diffractive optical element that produces the grating structure. It appears that a simple parabolic astigmatic aberration profile W x (ρ x ) = 1 2 aρ x 2 with a a coefficient, results in a line focus where the peak intensity and line width vary along the line. Here, we add higher order astigmatism in order to overcome these issues, generating focal lines with substantially constant peak intensity and line width along the line. The optical design for these improved astigmatic focal lines starts with a simplified 1D diffraction model for computing the impact of a general aberration profile W x (ρ x ) on the focal line shape. Consider an infinitesimal strip in the back focal plane at location ρ x with width Rdρ x and height 2R 1 − ρ x 2 , with R the pupil radius as indicated in Fig. 3(a). This strip generates a plane wave that makes an angle α with the optical axis as indicated in Fig. 3(b), given by: This plane wave is focused to a point in the field of view a distance x from the optical axis given by: with F the focal length of the objective lens, which we assume to be aplanatic and where we have used that NA = R/F . The strip width in the front focal plane as illustrated in Fig. 3(c) is given by: Equation (9) provides a unique mapping of pupil coordinate ρ x to field position x if the second order derivative of W x as a function of ρ x does not change sign. Now, the intensity in the front focal plane according to the 1D diffraction model is: where the first factor is the ratio between the width of the strip in the back focal plane and the width of the strip in the front focal plane. The second factor is the intensity resulting from diffraction in the ρ y -direction: where the mapping from pupil coordinate ρ x to field position x is implicitly taken into account, and where A(ρ x , ρ y ) is the amplitude profile of the beam incident on the diffractive structure. It is mentioned that the 1D diffraction model can also be derived from the standard 2D diffraction integral over the pupil plane if we apply the stationary phase approximation to the integral over the pupil coordinate ρ x . Assuming that the incident amplitude profile is flat, A ρ x , ρ y = A, we find that: with sinc (t) = sin (t) /t. The focal line intensity according to Eq. (11) is: For the simple parabolic phase profile W x (ρ x ) = 1 2 aρ x 2 , corresponding to a cylindrical lens in the paraxial approximation, the pupil coordinate ρ x and field position x are related by ρ x = NAx/a, and we find a focal line intensity profile: The peak intensity decays as 1 − (NAx/a) 2 along the line, and the line width increases as 1/ 1 − (NAx/a) 2 along the line.
A peak intensity along the line that is independent of x can be achieved if: This results in an aberration profile W x (ρ x ) that is of fourth order in ρ x . Ignoring constant phase offsets, and requiring a symmetric line structure centered at the optical axis (no odd terms in ρ x ), we arrive at: with l an integration constant. The relation between pupil coordinate ρ x and field position x according to Eq. (9) is: At the pupil rim ρ x = 1 we find that x = l, implying that the total line length is 2l. Equation (18) may be inverted to: It is mentioned that, although the peak intensity is constant along the line, the line width still depends somewhat on field position x. An alternative requirement that might be pursued is the requirement that the total power is distributed equally along the line, i.e. ∫ dyI f (x, y) is constant. It then follows that: which can be solved in a similar way as before, leading to: In our experiments we have opted for keeping the peak intensity along the line as constant as possible, which is optimal in the limit of a small confocal detection slit. We have tested the theoretical analysis with our 1D diffraction model by computing the shapes of line foci computed numerically with standard 2D diffraction by a circular aperture. Figure 4 shows an evaluation of the obtained results for a design line length 2l = 500λ/NA. Figure 4(a) shows the result for an aberration function that consists of lowest order astigmatism only, as e.g. created by the use of a cylindrical lens. The peak intensity over the line decreases to zero and the line broadens towards the edges. The full width at half maximum of this line is 1.42l, i.e. just 71 % of the total intended line length 2l. The practically usable part of the line is even less than this as a drop of 50 % in peak intensity is already quite high. Figure 4(b) shows the result for the aberration function including higher order astigmatism, optimized for uniform peak intensity according to Eq. (17). The peak intensity turns out to be uniform over the full line, as predicted by the 1D diffraction model, except for a small amount of ringing close to the line edges as can be seen in Fig. 4(c). This ringing originates from diffraction at the aperture edges in the ρ x -direction, which is not accounted for in the 1D diffraction model. At x = ±220λ/NA ringing is no longer present and the line width is within one Airy unit, as shown in Fig. 4(d). We conclude that about 85 % of the design line length can be used, without a substantial decrease in peak intensity or a substantial increase in line width.

Image formation theory
Several aspects of the envisioned imaging system have impact on the formal image formation model. First of all, the illumination is with a set of line foci, making the illumination intensity as a function of position in object space: where H ex (x, z) is the intensity profile of the line foci, ∆y is the lateral separation of the line foci, ∆z is the axial separation of the line foci, and the sum is over all participating diffraction orders. Second, the image data is captured at a tilted image sensor, which mixes the lateral and axial coordinates. For a fluorescent object F (x, y, z) recorded for a scan coordinate y s , the intensity recorded at the image sensor is: where H em (x, y, z) is the emission Point Spread Function (PSF), and M is the lateral magnification. This may be written as a sum over line images: with (substitute y = y − m∆y): Taking into account that only a small range of y values around the central line position at y = 0 give rise to substantially non-zero intensities we may approximate this as: The procedure of digital spatial filtering following a slit shape, possibly, including the collection of extra signal along the scan with a TDI operation can be incorporated by setting y s = y i − y and integrating over y , where the integration range is K rows of pixels (we take K = 4 or K = 1).
Here y i is the image coordinate in the scan direction. The image coordinate in the field direction is simply found by x = x i . Then we find for the image data at layer m: where the total PSF: is seen to be the product of the emission PSF with the average of the line excitation PSF averaged over a lateral distance corresponding to K rows of pixels. Close to focus, this integration range is sufficient to capture the entire excitation signal, maximizing signal, and reducing the overall PSF to the emission PSF of the system. This stands in contrast to conventional confocal point scanning microscopes, where the imaging in focus is determined by the excitation PSF. Further away from focus, the integration range is substantially less than the width of the defocused lines. Consequently, the captured signal is reduced and optical sectioning is achieved. The complete sectioning behavior is described by the background function: which gives the contribution to the background of a layer a distance z away from the imaged plane. For non-sectioning widefield imaging B (z) = 1, for sectioning modalities B (z) is peaked around z = 0, decaying to zero as |z| → ∞. The illumination lines are characterized experimentally by imaging thin uniform fluorescent layers, for which F (x, y, z) ≈ δ (z − z ), with δ (z) the delta-function, in a static configuration without any scanning (y s = 0) but with a tunable defocus z . Then the expected line image is a function of the image coordinate in the scan direction y and of the defocus z only: which leads to an image that is the average of the emission PSF in the field (x) direction, and the convolution of the line excitation PSF and the emission PSF in the scan (y) direction. This image function depends is dominated by the broadest of the two functions involved in the convolution, either the line excitation profile or the average of the emission PSF in the field direction. Integrating the obtained images over the lateral coordinate provides a measurement of the background function B (z) needed to assess the degree of confocality. The illumination lines are further characterized experimentally by imaging thick fluorescent layers for which F (x, y, z) is substantially constant, in a static configuration without any scanning (y s = 0), and with an additional defocus z applied to the excitation PSF. In that case the expected line image as a function of y and z is given by: where a change of integration variable is used for the integration over the axial coordinate. This through-focus line PSF like function is thus the 2D convolution of the through-focus line excitation PSF and the average of the through-focus emission PSF over the field coordinate. The measured through-focus line shape is determined by the broadest of these two functions.

Experimental setup
The experimental setup is illustrated in Fig. 5. A Nikon 20× NA 0.75 Plan Apochromat VC objective lens with a focal length of F ob = 10 mm and a custom Nikon tube lens (effective focal length of F tube = 222.4 ± 2.2 mm) are used, giving a lateral magnification of the imaging light path M = 22.24. The axial magnification of the imaging path is M ax = χM 2 /n = 352, where a sample refractive index n = 1.5 is assumed and χ = 1.07 is a non-paraxial correction factor [32,33]. The specimen (slide) is scanned using two stages: a PI M-505 low profile translation stage for positioning of the slide in the field direction and a Newport XM1000 ultra precision linear motor stage for the continuous scanning motion of the sample. For focusing two stages are used to axially translate the objective: a PI M-111 compact micro-translation stage for coarse positioning, and a PI P-721.CL0 piezo nanopositioner for fine positioning. A Hamamatsu Orca Flash 4.0 v2 camera is used with a pixel pitch of 6.5 µm corresponding to p = 0.29 µm in object space, which somewhat undersamples the imaging PSF (Nyquist sampling is at λ/4NA = 0.20 µm for λ = 590 nm). The gain of the camera is measured to be 0.39 e − /ADU. The camera is tilted over an angle of β = 20°with respect to the optical axis. The image sensor size in the direction perpendicular to the line sensors is 2048×6.5 µm= 13.3 mm, such that the axial range in image space is d = 4.4 mm and the axial range in object space is d/M ax = 12.5 µm. The Orca Flash camera has a micro-lens array for higher photon detection efficiency, which makes the sensor less efficient for light that is incident at an angle. The measured loss in light collection efficiency is shown in Fig. 5(b). It appears that tilting the detector 20°reduces the light collection efficiency to 53 % of its nominal value. The camera supports exposure times down to 1 ms. The frame rate, however, is limited by the time required for the rolling shutter to read out the sensor area, leading to a maximum frame rate of 100 fps for a full frame acquisition.
We found the camera to skip some frames when operated at high frame rates, which was solved by reducing the frame rate to 21 fps.
The light source is an Omicron LightHUB-4 equipped with a PhoXX+ 488-100 laser with 100 mW output power and 488 nm wavelength. A polarization maintaining broadband fiber with an achromatic collimator is used to obtain a collimated beam in free space. A beam expander is built using a Thorlabs AC254-50-A, and AC254-200-A achromats for a 4× magnification. A Liquid Crystal on Silicon (LCoS) Spatial Light Modulator (SLM, Boulder Nonlinear Systems XY Phase 512) is used for the required illumination PSF engineering. The SLM has N 2 = 512 × 512 pixels and a square aperture of 7.68 mm, resulting in a pixel pitch of 15 µm. A Thorlabs AHWP10M-600 achromatic half wave plate is used to align the polarization of the beam to The polarization of the laser beam is aligned to the SLM using a half wave plate (HWP) and filtered using a polarizer (POL). The beam is expanded and projected onto the SLM. The diffracted light is coupled into the objective lens using a filter cube consisting of an excitation filter (EX), emission filter (EM) and a dichroic mirror (DIC) and imaged in the back focal plane of the objective. The fluorescence light is imaged onto the CMOS sensor using the tube lens. Focal lengths F are given in mm. (b). The measured impact on the photon detection efficiency of tilting the camera with respect to the optical axis. For a tilt of 20°the detection efficiency is reduced with 53 % with respect to orthogonal incidence. the SLM and a Thorlabs LPVISE100-A linear polarizer is used to suppress the orthogonal polarization. A relay path from SLM to the back focal plane of the objective lens is made using a Thorlabs AC254-150-A achromat (F = 150 mm), and an AC254-100-A achromat (F = 100 mm giving a magnification of M = 0.67). A Semrock LF405/488/532/635-B-000 quad-band dichroic mirror is used to couple in the illumination light into the objective and direct the emission light towards the camera. A National Instruments NI PXIe-6363 data acquisition card (DAQ) is used for synchronization of the hardware. The DAQ gives a start trigger to the camera for each frame, and it provides a high signal to the laser for digital modulation, such that the laser is only switched on during the camera integration period. The acquisition card is synchronized to the output trigger of the scanning stage.

Fluorescent test samples
Two test slides are used to calibrate and test the scanner setup. First, we use a 1.4 mm thick green plexiglas slide of Chroma that shows strong autofluorescence when excited with blue light. A glass coverslip was attached onto the slide for proper spherical aberration matching. This removes the need for imaging deep into the material, and so prevents scattering and reduces out-of-focus fluorescence. This slide has high fluorescence, is relatively homogeneous and has low bleaching. Second, we used a slide with a spin coated fluorescent layer of Rhodamine on a coverslip. This slide has the advantage of being very thin ( 1 µm), but is relatively inhomogeneous and suffers from bleaching.
Two test samples are used to demonstrate the scanner system. The first slide contains a 5 µm thick human prostate tissue micro array (TMA) section labeled using the Kreatech ERBB2 (17q12) / SE 17 FISH probe (product number KBI-10701). This labeling is used for the detection of amplification of the ERBB2 (also known as HER-2/neu) gene via Fluorescence in situ Hybridization (FISH). As a reference, also the chromosome 17 centromere is labeled using a Satellite Enumeration (SE) probe. The ERBB2 specific FISH probe is direct-labeled with PlatinumBright550. The SE 17 specific FISH probe gene region is direct-labeled with PlatinumBright495. The SE 17 probe is imaged with our setup. The average wavelength of the fluorophore at the detector (taking into account the emission spectrum and the dichroic filter) is 563 nm. The second slide is a stage 3 human rectum cancer sample with immunofluorescence staining. Three different antibodies label Desmin, CD31, and D2-40. In this work, the Desmin protein (labeled with IgG1 M Alexa Fluor 488) is imaged, which is highly expressed in muscle cells. The average wavelength of the fluorophore at the detector (taking into account the emission spectrum and the dichroic filter) is 539 nm.

Data acquisition and processing
Image data is acquired in a 'push broom' scanning fashion [17,18]. The speed of the scan movement is v = p cos β/t l , where β is the sensor tilt angle, t l is the line period and p is the pixel pitch in object (sample) space. Every line period the camera captures a full frame which is transferred to the host computer. A number of regions of interest (ROIs) corresponding to the number of illumination lines are selected, the rest of the image data is deleted. The ROIs have an equal size, span the width of the sensor, and have an equal spacing s r = Ms l /cos β where s l is the tangential illumination line spacing in object space. The selected ROIs consist of either one or four rows of pixels. This functions as a confocal slit detector of width p cos β or 4p cos β. The data is added in a Time Delayed Integration (TDI) fashion for the four-line acquisition mode, in order to increase signal-to-noise ratio without sacrificing resolution. [34]. The data obtained in this way for the different ROIs correspond to the layers of the simultaneously acquired focal stack. Storage of the image data to the computer memory of layer m is delayed with a time m = s r t l m in order to guarantee lateral alignment of the layers of the focal stack. The data is saved to hard disk after completion of a lane scan.
The following steps are taken in post-processing. First, a fine alignment between the layers is done to compensate for residual misalignments by optimizing the product of all layers on a 1024 × 1024 pixels patch with distinct point sources. Second, the image is resampled with a factor cos β in the scan direction to have an image with square pixels. Third, a constant value is subtracted to correct for detector offset. Finally, flat fielding is applied to compensate for residual intensity variations (down to 50 % at the edge of the region of interest) mainly arising from the Gaussian illumination profile of the back aperture of the objective.
A maximum intensity projection along the z direction is used for a 2D visualization of the 3D dataset. For improved dynamic range, a gamma correction is applied. The gray scale image is mapped to RGB by multiplying the gray level with an RGB triplet based on the calculated filtered fluorophore emission spectrum: (0.55, 1, 0.08) for Platinum495 and (0.28, 1.0, 0,15) for Alexa488.

Spatial light modulator
The so-called space-bandwidth product of the illumination light path (product of illuminated FOV and spatial frequency bandwidth of the illumination pattern) is limited by the number of pixels of the SLM N. Suppose the SLM is imaged by the relay lenses to an aperture of width w in the back focal plane of the objective lens. This provides an illumination NA given by NA ill = w/2F 0 with F 0 the focal length of the objective. The maximum spatial frequency that can be generated by the SLM is N/2w, which gives rise to field angles η up to sin η = λN/2w, resulting in a field of view: In other words, the product between FOV ill and spatial frequency bandwidth 2NA/λ is fixed by the number of SLM pixels N. In the resulting trade-off between FOV ill and NA ill , we have chosen for NA ill = 0.26 and FOV ill = 488 µm.
An illumination line pitch of s l = 54.5 µm is used, corresponding to 200 pixel rows on the sensor. The 9 lines span a total distance 8s l = 436 µm, which conveniently fits in the FOV ill . Intentionally some room was left at the edges of the FOV because this area is most sensitive for ghost lines. These lines correspond to spatial frequency content of the aberration profiles higher than the bandwidth defined by the sampling density of the pixel grid of the SLM. These spatial frequencies are folded back into the central field of view.
Using a multi-line imaging approach can potentially lead to optical cross-talk, i.e. fluorescent signal excited by a line beamlet is detected at the pixel rows that are conjugate to the neighboring line beamlet. The sensitivity to cross-talk can be estimated using a geometrical optics based argument. Suppose a line of interest has its focus at the origin (y, z) = (0, 0). Light is collected from within a cone with a half angle α det = arcsin(NA det /n), where NA det is the detection NA and n the sample refractive index. The neighboring line is illuminated with a beam focused at (y, z) = (s l , −M tan βs l /M ax ), where β is the sensor tilt, and M and M ax the lateral and axial magnification of the imaging path. This beam has a half angle of α ill = arcsin(NA ill /n), where NA ill is the illumination NA. The z position at which a fluorophore excited by the neighboring illumination beam first enters the cone of the detection NA is given by: Filling in the relevant parameters results in z c = 72 µm. Light from such a distance away from focus will in practice be completely suppressed by the confocal detection. An SLM has a limited diffraction efficiency, which leaves a part of the incoming beam unmodulated, resulting in a focused 'zero-order' spot in the sample. The diffraction efficiency is affected by several factors [35]. First, the SLM only works for one polarization direction. A half wave plate and a polarizer were included in the beam expander for polarization alignment and suppression of residual orthogonally polarized light. Additionally, the top of the glass surface will contribute to a plane wave reflection, even though it is coated with an anti-reflection layer. Finally, the limited fill factor of the SLM pixels leave parts of the aperture unmodulated. The line illumination pattern is aligned such that the zero-order spot is centered between two lines to enable spatial filtering at the image sensor, and will therefore have no impact on the final 3D image.
The SLM response curve is described by the phase change φ( ) as a function of the digital input gray level ∈ [0, 1). A lookup table is required to find the gray level given the desired phase shift. A single lookup table is used across the field of the SLM, implying that non-uniformity of the response function is not taken into account. Coarse pixel-to-pixel variations result in low order aberrations that are corrected later on, fine pixel-to-pixel variations result in a small scatter background that has little impact on the final image.
An image based method is developed to obtain the response curve of the SLM. A binary grating with a period much larger than a single SLM pixel and gray levels 1 and 2 is created, for a range of values ( 1 , 2 ) . The power in the diffracted orders is obtained by imaging a diffraction pattern that is projected onto the uniformly fluorescent calibration slide. The space ( 1 , 2 ) is sampled randomly instead of linearly to prevent biases arising from bleaching during data acquisition. Then, the ratio between the power in the first and second diffraction order is measured and fitted with the theoretically expected value: where an extra degree of freedom C is introduced to account for the above mentioned zero order spot and possible background and detector offset. The response function φ ( ) is parameterized by a polynomial series of order L, and MATLABs fminsearch algorithm is used to fit the polynome coefficients. Figure 6(a) shows the measured values, Fig. 6(b) shows the fit model for the obtained fit parameters, and Fig. 6(c) shows the corresponding SLM response curve φ ( ). We found L = 3 to provide enough degrees of freedom to fit the data. The total phase modulation of the SLM exceeds 2π for the used wavelength, which leaves the freedom to choose a subset of values to create the lookup table. We choose the range between = 0.31 and = 0.90, as indicated with the red boxes.

Alignment of the spatial light modulator
A careful alignment of the illumination PSF, consisting of the set of multi-focal scan lines, with the pixel grid of the image sensor is required for implementing digital slit pinholes at the imaged line positions in the sensor plane. The reference coordinate system to align the scan lines is given by the optical axis of the objective lens (z-axis) and the two principal directions of the pixel grid of the sensor, which correspond to the field axis (x-axis) and to the scan axis (y-axis). The phase pattern on the SLM needs to be aligned to this coordinate system carefully. A final alignment step is implemented, after mechanical alignment of the SLM in the illumination light path, by means of an affine coordinate transformation between the coordinates of the SLM pixel grid and the pupil coordinates in the back focal plane of the objective lens. In this way three types of misalignment can be corrected. First, the in-plane rotation can be corrected, so that the scan lines are imaged on the sensor parallel to the field axis defined by the pixel grid. Second, the SLM is slightly tilted with respect to the optical axis to separate the incoming and outgoing beam by 11 • , see Fig. 5. Projection of the tilted sensor plane to the plane perpendicular to the optical axis, distorts the circular beam profile to an elliptical one. This effect may be compensated by an anisotropic stretch in the affine correction transformation. Third, a slight misalignment of the origin of the SLM can be compensated for by a shift of the grating pattern. A small misalignment in the axial position of the SLM can be corrected by adding a small amount of defocus to the induced aberration. The affine transformation between the coordinates in the SLM plane ì σ and the pupil coordinates ì ρ is given by: where the first matrix describes the rotation over an angle θ around the z-axis, the second matrix represents the (anisotropic) scaling due to the tilt of the SLM in the light path and due to the overall magnification from the SLM plane to the back focal plane of the objective, and ì σ 0 is the position offset with respect to the optical axis.
The following aberration profile is fed to the SLM: where g is a pattern function, λ is the wavelength, R s is the radius of the illuminated area on the SLM, and A and B are freely configurable parameters in units of λ. The first two terms will diffract the light along the horizontal and vertical directions, respectively. The pattern function g is chosen such that light is uniformly distributed over the first 9 diffraction orders (−4 th to +4 th order). The third term will introduce a defocus, but for nonzero ì σ 0 it also introduces a lateral shift [27].
where ì x 0 is an overall offset with respect to the optical axis. These values are small, as is expected of a mechanically well aligned system, but in need of correction nevertheless.

Correction of aberrations in the illumination light path
The designed illumination PSF, comprising the set of multi-focal scan lines, will be distorted by unintended aberrations in the illumination light path. These can arise from a (coarse) non-uniformity of the SLM pixel response function, or by the components and/or misalignment of the beam expander and the relay path between the SLM and the objective back focal plane. Fortunately, the SLM that is used to introduce intentional aberrations, can also be used to remove the unintentional aberrations. To this end, an image based method is used, without the need for any additional optical components in the setup [36]. The optimization metric that is used is the peak intensity of a single imaged spot (Strehl ratio optimization). A focused spot is imaged using the thick uniformly fluorescent calibration slide, which is aligned such that the object plane lies about 10 µm deep in the material. In this way a depth averaged illumination PSF is imaged, but this is not problematic for the intended aberration retrieval. The actual peak intensity is determined from a two dimensional parabolic fit of the pixel values around the peak. The to-be corrected aberration profile is parameterized as a sum of orthogonal Zernike polynomials Z nm with radial index n and azimuthal index m, where all modes with n + |m| ≤ 6 are taken into account, except for piston, tip and tilt. This takes primary and secondary astigmatism, coma, and spherical aberration into account, as well as primary trefoil. Figure 8 illustrates the search algorithm used to find the optimal aberration. For each Zernike mode a set of typically three to five spots are recorded with an aberration coefficient that is sequentially changed by an amount ±∆A, where the sign is set such that a local optimum in the fitted peak intensity is within the range of recorded aberration settings. A value ∆A = 50 mλ is found to properly sample the optimization metric, while only a small number of measurements is required to find the optimum. The optimum aberration value is found by parabolic fitting of the set of fitted peak intensities, where only the maximum value and its two neighbors are included. Figure 8(a) illustrates this search method for primary vertical astigmatism. In this case, four measurements where needed to find the optimum. The corresponding spots are displayed in Fig. 8(c). This procedure is repeated for all Zernike modes. The result is shown in Fig. 8(b). Confidence intervals were obtained by repeating the whole procedure ten times on different parts of the sample. Apart from the contribution of Z 20 (defocus), the total root mean squared (rms) aberration was found to be 168 ± 20 mλ, composed mainly of astigmatism, and significantly higher than the diffraction limit of 72 mλ. Figure 8(d) show the illumination spot without any correction, after minimizing the contribution of Z 20 , and after aberration correction. Clearly, correction of unintended aberrations is a necessity in our setup.

Characterization of the illumination pattern
The line profile of the illumination PSF is tested experimentally using the thin uniformly fluorescent test slide. The thin fluorescent layer is aligned such that the central illumination line is in focus. The sample translation stage is then used to slowly move the slide during image acquisition in order to create a small 'motion blur' for smoothing out sample irregularities and for spreading the impact of photo-bleaching. The measurement is repeated ten times and the resulting images are averaged for further reduction of noise and sample irregularities. According to the image formation model the expected line shapes are the convolution of the line excitation PSF at focus m∆z and the average of the emission PSF in the field direction, at focus −m∆z. The difference between the imaging NA (equal to 0.75) and the line illumination NA (equal to 0.26) implies that near focus the width of the line excitation PSF is much larger than the width of the emission PSF, whereas further away from focus this is the opposite. The reason is that the relatively small depth of focus of the emission PSF results in a steep increase in the lateral width with defocus. Figure 9(a) shows the full FOV of the image sensor with the illumination PSF, showing the successfully creation of 9 equally spaced focal lines (labeled with their diffraction order number m). Figure 9(b) shows the intensities of the focal lines along the field direction. The higher order astigmatism based design ensures a substantial illumination intensity over the whole line length. The remaining decrease of power towards the edges is caused by the Gaussian beam profile at the objective pupil. In practice, a decrease of intensity at the edges of the lines to about 30 % is observed. The lines show irregularity on a small length scale resulting from laser speckle. Further, a significant decrease in intensity is visible around x = 0 µm for all lines, resulting from an uneven illumination of the SLM. Figure 9(c) displays the FWHM of the lines. The center lines are in focus and approach the theoretical limit of FWHM = 0.89λ/NA. The FWHM of the outer lines (m = −4, −3, 3, 4) are substantially larger, as they are recorded out of focus, in agreement with the image formation model. A slight tilt of the sample in the field direction is apparent from the fact that the lines with m > 0 have a decreasing FWHM for larger x and vice versa for the lines with m < 0. The sectioning capability of the system is tested by measuring the background function as defined in Eq. (29). This is done by repeating the previous measurement for a series of axial sample stage positions. The 10 times averaging is omitted for this measurement to limit the illumination photon dose and so avoid bleaching of the sample. The measured images are integrated over the digital slit in the y (scan) direction (up to 4 pixel rows) and over 50 pixels about halfway the line center and the line edge (at around x = w/4) in the x (lateral) direction, for noise suppression. As a result, we obtain a background signal as a function of the stage position z for all 9 lines, see Fig. 10. For the first result, in Fig. 10(a), a single row on the sensor is used as confocal detector. The background function has an average full width at half maximum FWHM z = 8.4 µm. In Fig. 10(b), the result is given the case of a four row confocal detector, which slightly increases the FWHM z to 9.0 µm on average. The theoretical values for the response to a uniformly fluorescent thin slide are given in Fig. 10(e). Qualitatively similar curves are found, but less wide than the experimental curves. The FWHM z is found to be 4.2 µm. The background measured at the bottom scan plane from a source located at the top scan plane is about 20 %, where from the simulation a reduction to about 10 % would have been expected. This result is based on a numerical simulation that computes the illumination PSF H ill , given a defocus following from a stage position that is z s out-of-focus, according to Eq. (22). Then, the intensity at the sensor I is calculated by convolution of H ill with the imaging PSF corrected for a stage defocus of z s , see Eq. (23). The simulated intensity I is saved to disk as an image file and analyzed in the same way as the experimental results. The discrepancy between the width of the measured peak in the background function and the theoretical expectation may be attributed to misalignment, the finite thickness of the layer, and residual aberrations, mainly spherical aberration. The use of scalar diffraction for computing the PSFs instead of more realistic vector PSF models could also lead to a predicted background function peak width that is too optimistic. The illumination PSF is further tested for uniform spacing and for equidistant foci. To this end, the thick uniformly fluorescent test slide is aligned such that the focal point of the objective is about 10 µm deep in the material. A through-focus stack is created by repeatedly adding a defocus aberration W to the SLM and capturing an image. According to the image formation model the expected line shape is the 2D convolution of the through-focus line excitation PSF and the average of the through-focus emission PSF. Because of the much smaller illumination NA this implies that the measured through-focus line profiles are largely determined by the through-focus line excitation profiles, and hardly by the through-focus emission PSF. Consequently, by using an aberration W = z n 2 − ρ 2 NA 2 , the illumination PSF is sampled at a depth z. Figure 11(a) shows the measured through focus stacks for an illumination PSF of the 9 beamlets, averaged across the x (field) direction. To find the focus, for every beamlet the position of the maximum intensity is determined. Figure 11(b) displays these positions in the (y, z) plane. The positions have an equal spacing in the y direction of 54.5 µm or 200 pixels on the sensor with a deviation less than 0.14 µm (0.5 pixel). In the z direction the sub beams do have the desired tilted focus. The graph includes a linear fit through the focal points, which overlaps with the tilted object plane with a deviation less than 0.2 µm. However, on top of the linear focus shift, we observe an apparent field curvature leading to a defocus of up to 1.8 µm at the edges. This leads to a slight axial misalignment of the illumination PSF compared to the reference axial imaging depth defined by the imaging PSF. The imaging depth will, however, only be slightly biased as the NA of the illumination is lower than the NA of the imaging path, and therefore the latter will dominate the imaging depth.

Scan results
We present three scan result as proof of concept. First, a 9 layer multi-focal whole slide scan is created of the FISH labeled TMA tissue section. Figure 12(a) shows a maximum intensity projection of the whole slide scan. The image is composed of thirteen scan lanes of 20,000 lines with 1536 pixels width. After resampling this results in a 3600 Mpixel volumetric image (19, 968 × 18, 794 × 9 pixels, size 5, 790 × 5, 450 × 11 µm). For this acquisition, an exposure time of 1 ms and four TDI lines are used. Figures 12(b)-12(f) provide closer details of the scan and show that zooming in to the cellular level for inspecting tissue structure in Fig. 12(c) and resolving individual FISH spots in Fig. 12(f) is possible.
The second scan result displayed in Fig. 13 shows the multi-focal imaging capability of the system. The same FISH labeled tissue is scanned using a single line detector (i.e. no TDI) for better confocality. Here, an exposure time of 10 ms is used. Figure 13(a) shows a frame of a 3D visualization of the z-stack, created using the image analysis software Icy [37]. FISH spots can be observed to have different z-positions. This is further illustrated by the color coded maximum intensity projection shown in Fig. 13(b). In this image, the luminescence of each pixel corresponds to the maximum intensity throughout the layers, while the color corresponds to the layer of maximum intensity. Most spots color orange, indicating a depth of about 3 µm. Some are observed to lay deeper in the sample, ranging from −1 µm (green) to −3 µm.
The third result is a scan of the stage 3 human rectum cancer sample with immunofluorescence staining (10 ms exposure time and using a single pixel row as digital slit). Figure 14 shows the central layer of a single lane scan. Fig. 14(b) gives a detailed view which provides a good benchmark for the resolution and contrast delivered by the system. For all scanned samples the reduction of out-of-focus background light by the digital slit pinhole is moderate, in agreement with the measured background function with FWHM z of around 9 µm, and the thickness of the samples of around 5 µm.

Throughput and resolution
The throughput of the current test setup is T = N z N x f , with N z = 9 the number of focus layers, N x = 1536 the line length in pixels, and f = 21 lines/s the line rate, leading to a modest T = 0.29 Mpixels/s. This number is primarily limited by the practical constraints of the image sensor, as the actual exposure time in TDI-mode was just 1 ms, implying a potential gain in throughput with a factor of about 50. A second practical limitation is in the illumination light path efficiency. The main losses were measured to be the coupling into the fiber (35 % loss), the polarizer (35 % loss),   the iris that cuts the beam to about one sigma (39 % loss) and the SLM (48 % loss), accumulating to a total illumination light path efficiency of 14 %. These losses are not fundamental and can be improved upon, for example by better alignment of the half wave plate to the polarizer; using free space optics instead of a fiber; a beam shaping optical element to reshape the Gaussian beam into a top hat beam [38]; and by using a dedicated diffractive structure etched in quartz or embossed in a polymer instead of the SLM. A third limitation is the reduced detection efficiency of the camera for light incident at an angle due to the micro lens array (47 % loss). Overcoming these practical hurdles can improve throughput with a factor ranging between 200 and 600 without affecting the SNR. The exposure time could be reduced by an increase in laser power to a decent but not excessive 500 mW. This would further enhance the throughput with a factor of 5. As a result, the proposed optical architecture is estimated to support image acquisitions with a throughput on the order of several hundred Mpixel/s. In the current setup, that would correspond to a scanning speed on the order of 10 mm/s. Note that mechanics do not pose practical constrains on the throughput as the only movement in the system is the linear translation of the sample.
The fundamental limitation to throughput of the proposed scanning architecture is the minimum SNR that is required [39]. The SNR is dominated by shot noise, and therefore the achievable throughput is governed by the minimum amount of photons that must be captured. The maximum line rate f max is proportional to ηKP/N min , with η the overall photon efficiency of the illumination and imaging light path and detector, K the number of TDI-lines, P the laser power, and N min the minimum photon count for a reference fluorescent object.
The relation between the throughput and SNR that is currently achieved is further assessed on the basis of the scan result of the whole slide scan of the FISH slide, see Fig. 12. This 5 × 5 mm tissue section is scanned in 12 lanes of about 17,000 lines, which gives a total camera time (exposure time times number of lines) of less than 3.5 min. We have measured the peak signal to noise ratio (pSNR) and the peak signal to background ratio (pSBR) in a region of 445 µm by 558 µm with a dense distribution of FISH dots. A maximum intensity projection of this region is shown in Fig. 15(a). A zoomed-in view of three FISH spots are shown in Figs. 15 where we selected the layer with best focus. A classification is performed in which each pixel is classified as being part of the background, part of a spot or neither of both. The classification is done on the 2D maximum intensity projection. The 20 th percentile lowest values displayed in Fig. 15(e) are classified as background. An average background level of 139±7 e − is found. The pixels with the 0.5 th percentile highest are classified as part of a spot or cluster of spots. These pixels are segmented based on 4-connectivity. Small segments are rejected by requiring the shape to span at least three columns and three rows. The peak intensity in every segment was determined by first selecting the layer with best focus and then by interpolating the peak intensity using a two dimensional parabolic fit of the values surrounding the pixel with the maximum intensity to the the model I(x, y) = w x (x − x c ) 2 + w y (y − y c ) 2 + I p , where I is the image intensity, w x , w y , x c and y c are fit parameters and I P provides the peak intensity. In total, 109 segments are found with peak values ranging from 908 to 1739 e − . For every segment the pSBR is calculated by dividing the peak intensity by the average background level. Figure 15(f) shows a histogram of the obtained values. An average pSBR of 8.4 (6.5 -12.0) is found. The pSNR for every segment is approximated as the square root of the peak intensity, as the noise is dominated by shot noise. Figure 15(g) shows a histogram of the obtained pSNR values. An average of 34 (27 -38) is found. The pSNR and pSBR values that are found are sufficient for state-of-the-art spot segmentation algorithms, which require an SNR level around 5 [40]. This implies that in principle the scan speed could have been 7 times faster.
The lateral and axial resolution of the system are assessed based on the same dataset. Close to the maximum, the spot intensity is well approximated by a two dimensional parabola. Consequently, the lateral full width at half maximum (FWHM) is estimated from the fit parameters w x , w y according to FWHM (x, y) = 2 I p /2w (x, y) . Histograms summarizing the results are shown in Figs. 15(h)-15(j). The lower bounds of the distributions provide the best indication of the resolution, as the segments include both single spots and spot clusters. The 5 th -percentile is 0.50±0.13 µm in the x direction (along the illumination line). In the y direction (orthogonal to the illumination line i.e. the scan direction) a value of 0.53±0.14 µm is found. The FWHM of the emission PSF according to the scalar diffraction based Airy distribution convolved with the pixel size is given by 0.57λ/NA = 0.43 µm for an emission wavelength λ = 563 nm, and is included in Figs. 15(h)-15(j) as black lines for comparison. The obtained values agree reasonably well with this theoretical value. The axial resolution was assessed in a similar fashion. For every segment, the maximum intensity in every layer I z (z) is determined. The values around the maximum are fitted with the parabolic model I z (z) = w z (z − z c ) 2 + I pz . Then, FWHM z = 2 I pz /2w z . Figure 15(j) shows a histogram of the obtained result. The FWHM of the emission PSF is given by 1.61λ/(n − √ n 2 − NA 2 ) = 4.52 µm and is included as a black line. The 5 th -percentile of the found distribution is 3.35±0.10 µm, which exceeds the theoretical value. Factors impacting this analysis are the limited number of focal layers and the residual non-uniformity of the illumination intensity between the the focal layers.

Discussion
In summary, we presented a multi-line scanner for multi-focal fluorescence image acquisitions in a single scan. The sample is scanned with respect to a mechanically fixed optical setup, thereby providing simple means for an unlimited field of view. The optical architecture of the scanner is based on illumination PSF engineering. We used a design with diffractive optics for generating a set of parallel scan lines, focused at equidistant focal planes, in combination with a tilted sensor, for capturing the emitted fluorescence from the scan lines in parallel. An important new element in the design is the use of higher order astigmatism to improve the uniformity of peak intensity and line width along the scan lines. The approach is suitable for scanning large area tissue samples, as needed for imaging e.g. immunofluorescently labeled tissue samples. The ultimate goal for large area FISH imaging is to improve the statistical reliability in diagnosis, by increasing the number of cells tested and by correlating the resulting gene counts to the heterogeneous tissue structure.
The current setup is limited in the overall axial range of the 3D scan to a value between 10 and 15 µm. A larger axial range could for example be obtained by replacing the 20× objective with a 10× objective, leading to a fourfold increase in axial range. This, however, comes at the expense of lateral resolution, as the decrease in magnification is usually attended by a decrease in NA as well. Such an intended increase in axial range would also entail producing more than the current 9 parallel scan lines.
A major technical improvement of the current setup would be the use of an SLM with more pixels, e.g. 1920 × 1152 instead of the current 512 × 512. The enhanced space-bandwidth product could enable focal lines with an illumination NA matching the detection NA, equal to 0.75, instead of the current value 0.26, while maintaining or increasing the length of the scan lines of close to 0.5 mm. The main impact would be the improved sectioning capability, where our PSF simulations indicate that a peak in the background function with a width equal to 1.5 µm can be realized. A next step for the proposed multi-line confocal scanner is the extension to multi-colour imaging. This can be realized using time multiplexing or space multiplexing. For time multiplexing the system cycles through the different excitation wavelengths. For each excitation wavelength a set of multi-focal scan lines must be generated, either via a fast switching SLM or by a setup with a modulator to select the wavelength and a dichroic beam splitter architecture wherein each color branch has a separate diffractive optical element, designed for the particular wavelength. For space multiplexing, the different sets of multi-focal scan lines for the different excitation wavelengths are slightly displaced with respect to each other in the direction of the tilted plane imaged onto the sensor. As a consequence, the different color channels can be imaged in parallel, making more economical use of the sensor area. This approach does not require switching of the illumination and is compatible with a rolling shutter read-out.
Another extension that is envisioned is the application of techniques to improve the resolution. In particular, the Image Scanning Microscopy (ISM) method and the closely related pixel reassignment and rescan methods [41][42][43][44]. In fact, a modification of the TDI-approach we use can already deliver an ISM type of resolution improvement in the scan direction. This would require oversampling in the scan direction in combination with a digital stretch of the captured images in the scan direction before the step of adding the images acquired in subsequent frames. Although relatively straightforward, this would have the drawback of an anisotropic resolution, as the imaging in the field direction, along the scan lines, remains unaffected. A more isotropic improvement in resolution would require a modification of the multi-focal line illumination to a multi-focal multi-spot illumination, i.e. an illumination PSF which is spatially varying in the direction orthogonal to the scan direction as well. Such methods could overcome the drawback of the current scanning approach that it is not easily compatible with immersion objective lenses for high-resolution imaging.