Multicolor fluorescent imaging by space-constrained computational spectral imaging

: Spectral imaging is a powerful technique used to simultaneously study multiple fluorophore labels with overlapping emissions. Here, we present a computational spectral imaging method, which uses sample spatial fluorescence information as a reconstruction constraint. Our method addresses both the under-sampling issue of compressive spectral imaging and the low throughput issue of scanning spectral imaging. With simulated and experimental data, we have demonstrated the reconstruction precision of our method in two and three-color imaging. We have experimentally validated this method for differentiating cellular structures labeled with two red-colored fluorescent proteins, tdTomato and mCherry, which have highly overlapping emission spectra. Our method has the advantage of totally free wavelength choice and can also be combined with conventional filter-based sequential multi-color imaging to further improve multiplexing capability.


Introduction
In fluorescence microscopy, it is desirable to selectively label sample structures with differently colored fluorophores to study interactions [1]. Commonly, multiple colors (namely channels) are either imaged sequentially by using different filter sets or simultaneously by splitting signals onto different regions of the camera or onto several cameras. Both approaches rely on filters and are thus ultimately limited by the spectral overlap of fluorophores, which makes it difficult in practice to distinguish more than four colors within the visible spectrum without having substantial crosstalk among channels. Sequential imaging of more than four targets is particularly challenging for live samples.
As an alternative, spectral imaging is a powerful tool to simultaneously study multiple labels in biological samples at the subcellular, cellular and tissue level [1][2][3]. Traditionally, in spectral imaging, a three-dimensional (3D) data cube (2D spatial, 1D spectral) is generated via spatial or wavelength scanning (hereafter called scanning spectral imaging), where a diffraction grating or prism is used to acquire the full spectrum at each spatial point of the image [4]. With the spectral information, spectral imaging approaches are capable of unambiguously identifying fluorophores with overlapping spectra and permitting high levels of signal multiplexing [5]. However, a major drawback of scanning spectral imaging is the relatively long acquisition time. To solve this problem, snapshot spectral imaging methods that based on tomographic multiplexing [6,7] or compressive hyperspectral imaging [8][9][10] have been developed. In the latter case, the entire data cube are either coded and captured in a single 2D camera integration with different wavelength signal dispersed to different spatial locations [8,9], or coded by a series of Hadamard patterns to generate single pixel signals [10]. The whole data cube are then reconstructed using compressed sensing theory.
Unfortunately, these methods are difficult to apply in fluorescence microscopy that uses high numerical aperture objectives to achieve diffraction-limited resolution [4]. Only multicolor fluorescent beads images have been demonstrated [9,10].
In this paper, we demonstrate a computational spectral imaging method, which aims to address both the under-sampling issue of compressive spectral imaging and the long acquisition issue of scanning spectral imaging. To do so, we employ two strategies: (1) using a dual-view imaging system to generate both an undispersed spatial image capturing the superposition of multiple labels and a spectral image where signals are dispersed by a wedge prism to different spatial locations according to wavelength (Figs. 1(a) and 1(b)); (2) using Digital Micromirror Device (DMD) [8,9] to generate multiple randomly coded illumination patterns on the sample. In our approach, the undispersed spatial image acts as a spatial constraint in data reconstruction to guarantee correct reconstruction and enhance accuracy. This strategy is similar to what was used in compressive hyperspectral imaging [11] and compressed ultrafast photography [12,13]. With both simulation and experimental data, we demonstrate that for two-color imaging, single snapshot imaging using strategy (1) alone ensures a high reconstruction accuracy in resolving labels with emission peaks separated by ~20 nm. Using simulation, we also show that for three-color imaging, the use of five randomly coded snapshots is sufficient for an accurate reconstruction.

Space-constrained computational spectral imaging
We describe our sample as a three-dimensional non-negative matrix, x m = {x(i, j, k); i, j = 1,…, N, k = 1, …, K}. x(i, j, k) represents the fluorescence signal from the k th type of probes at spatial pixel localization (i, j), N is the number of pixels along one spatial dimension (for simplicity, we assumed the filed-of-view to be square), and K is the number of probes used to label the sample. The subscript m indicates the matrix representation of the variables.
To achieve multi-color imaging of the sample, we simultaneously measured its spatial and spectral information. Specifically, we employed a dual-view detection scheme ( Fig. 1(a)), which is similar to that used in spectral-resolved super-resolution localization microscopy and spectral single molecule tracking [14][15][16][17]. In the detection path, the emitted fluorescence was split by a 50:50 beam splitter cube (BS013, Thorlabs). The reflected fluorescence was directed to the camera by a mirror (Path 1, referred to as the spatial path) to generate a spatial image y spatial_m . The transmitted fluorescence passed through a dispersive wedge prism (SM1W189, Thorlabs), which would shift blue to red emissions from left to right along the lateral direction, and two pairs of mirrors (Path 2, referred to as spectral path) to produce a spectral image y spectral_m . With spectral calibration results (Section 2.2), we sought to reconstruct x m through solving an optimization problem in which y spatial_m acted as a spatial constraint in restoring the spectral information contained in y spectral_m ( Fig. 1(b)).
Here, we first describe the forward imaging formation model of our system. In the spatial path, y spatial_m is the superposition integral of all probe signals at each spatial location, which is expressed as y spatial_m = Σ k = 1 to K x(i, j, k).
The formation of y spatial_m can be described by matrix multiplication: , , , , spatial spatial K (1) where y spatial and x are one dimensional vector forms of y spatial_m and x m , respectively, and I is an identical matrix with a size of N × N. On the other hand, in the spectral path, because we model the third dimension of x m as the choice of probes instead of wavelength, the formation of y spectral_m can be described as a convolution problem, in which the emission spectrum of each probe and the dispersion by the prism determine the corresponding convolution kernel function. Assuming h(k) is the dispersed convolution kernel function for the k th channel (obtained through spectral calibration), then y spectral_m is the summation of convolved signals: . This convolution can also be expressed in matrix multiplication: _ . spectral m spectral = y A x (2) Each column of the measurement matrix, col i (A spectral ), i = 1, …, N × N × M, corresponds to the convolved image assuming a single probe is placed at the i th position of x. Combining Eq. (1) and Eq. (2), we have: , = y Ax (3) with y = [ y spatial_m; y spectral_m ] and A = [A spatial; A spectral ].
To reconstruct x under the forward imaging model ( Fig. 1(b)), two scenarios need to be considered. First, when the sample contains only two probes, Eq. (3) is a determined equation. Estimating x is a deconvolution problem. However, because of the ill-conditioned nature of the deconvolution problem, general inverse matrix method cannot be used. Therefore, we need to solve the inversion problem with a regularization term to stabilize the solution: (4) while x is a is also subject to the non-negativity constraint. In our implementation, we adopted the two-step iterative shrinkage/thresholding (TwIST) [18] algorithm, and chose ψ(x) in the form of summation of 2D total variation (TV) of each channel: Δi and Δj are the horizontal and vertical first-order local difference operators. In Eq. (4), the first term, ||y -Ax|| 2 , is the fidelity term which encourages the actual measurement y to closely match the estimated measurement Ax. The second term, βψ(x), is the regularization term, which encourages x to be piecewise constant (i.e., utilizing a sparse prior in the gradient domain, which is generally valid for cellular fluorescent images [19,20]). The regularization parameter, β, is adjusted empirically to lead to results that are consistent with the physical reality. In both our simulation and experiments, we found a β value between 1 and 5 gave good reconstruction results. In a second scenario, when the sample is multi-color labeled (number of probes K > 2) and only one snapshot is taken, the number of variables in x becomes larger than the number of measurements. In this case, solving Eq. (3) becomes a compressive sensing problem [21]. Due to the overall similarity of emission spectrum of fluorescence proteins, the measurement matrix A would generally lack the incoherence required for successful compressive sensing reconstruction. This problem can be solved by acquiring multiple snapshots of the sample using random illumination patterns generated by a DMD. Under this strategy, the forward imaging model becomes: , = y ADx (6) in which D is a binary matrix describing the random illumination pattern. y becomes the cascade of multiple measurements, and A is also a cascade of the original measurement matrix with certain columns setting to zero according to the corresponding illumination pattern. This strategy guarantees a successful reconstruction in two ways: (1) the incoherence of the measurement matrix A increases because of the randomized illumination; (2) with more measurements, Eq. (3) could become an over-determined problem. Thus, x can be obtained by optimizing the same function of Eq. (4) with AD being the new measurement matrix. We will illustrate the setup in Section 3.3. Fig. 1. Space-constrained computational spectral imaging detection system. (a) Schematic of the setup. The emitting light from the sample is split evenly by a beam splitter to generate an undispersed spatial image (Path 1) and a dispersed spectral image (Path 2) respectively on the left and right parts of the same camera. L, lens. (b) Illustration of overall space-constrained computational spectral imaging approach using green and red fluorescent beads. (c) Registered spatial (blue) and spectral (red) images of fluorescent beads at different wavelength. Scale bar: 1 µm. (d) Measured lateral spatial shift and its corresponding polynomial fitting. The measurement is averaged from 5 independent experiments, with a negligible standard deviation.

Spectral calibration
Calibration between the spectral and spatial paths was performed using fluorescent beads in different colors (yellow-green, orange, red, and dark red, F10720, Life Technologies). Each kind of beads was deposited onto a coverslip at a low density. Upon imaging, individual beads appeared as well-resolved, diffraction-limited spots ( Fig. 1(b)). Narrow bandpass filters (FB500-10, FB550-10, FB600-10, FB700-10, Thorlabs) were placed between the beam splitter and the wedge prism to determine the spectral positions of the corresponding wavelength in the spectral image relative to the image position in the spatial image. Then, the paired images were registered (described in 2.3 below, Fig. 1(b)) to estimate the lateral spatial shift of different wavelengths using cross-correlation. The measured lateral shift curve was further fitted with a second-order polynomial function to generate the prism dispersion function (Fig. 1(c)). This calibration only needed to be performed once for all experiments performed on the setup.
In order to generate the measurement matrix A, we need to measure the convolution kernel functions in A, which is determined by the probe emission spectrum, spectral transmission of the optical system (especially the dichroic mirror and the emission filter) and the prism dispersion function. For this purpose, we directly calibrated the convolution kernel functions using samples labeled with probes in a single color. In this single-color imaging scenario, image formation still can be described as the format of Eq. (3): y c = A c x probe_c , in which y c is the vectorized form of spectral measurement from single-color labeled sample, x probe_c is the one-dimensional convolution kernel function, A c is a matrix with each of its column representing the corresponding lateral shifted spatial measurement. Because the size of x probe_c is much smaller than y c , the overdetermined problem can be solved by the generalized inverse matrix method

Dual-view image registration
Image registration was performed before calibration and reconstruction to avoid dual-view mismatch and the effect of potential optical aberrations. We used control point-based image registration method. First, yellow-green fluorescent beads were imaged at 500 nm. Wellseparated spots, which acted as control-points, were localized [22]. Then the localized beads positions were manually checked to identify four pairs of matched spots from the four corners of the spatial and spectral images. The positions of the four pairs of spots were used to calculate the initial registration parameters from the spectral channel to the spatial channel. After applying the initial transformation, the program automatically identified all matched spots and determined a third-order polynomial coordinate transform function between the two images with a least-square fitting to the coordinates of these matched spots. The polynomial function was: , where x t and y t were the transformed spectral coordinates, x and y were the initial spectral coordinates, and A n and B n were coefficients obtained by least-square fitting. The described procedure was repeated on five independently measured data sets to generate an averaged registration function. Finally, backward registration [23] from spatial channel to spectral channel was performed to eliminate registration artifacts.

Simulation
Simulated data sets were used to quantitatively evaluate the performance of the image reconstruction method. The data sets were generated by the following steps using experimental wide-field fluorescent images acquired with high signal-to-noise ratio (SNR), which we referred to as ground truth images: (1) ground truth images were normalized to the mean photon counts per pixel and then rescaled up by the desired signal level (average photon counts per pixel); (2) a homogenous background was added to the rescaled images according to desired signal-to-background ratio (SBR); (3) for each ground truth image, we assumed that it was generated by a specific fluorophore with known emission spectrum (downloaded from Fluorescence SpectraViewer (ThermoFisher)) masked by the dichroic mirror and emission filter transmission window (500 to 550 nm for green-yellow fluorophores and 575 to 625 nm for orange-red fluorophores); (4) the emission spectra and prism dispersion function were used to generate the convolution kernel function for each probe, and then these kernel functions were incorporated to build the measurement matrix A; (5) the spatial and spectral images were generated according to Eqs. (1) and (2); (6) Poisson noise and sCMOS readout noise map [24] were added to both images to generate the synthesized image data sets.

BSC-1 cells (African green monkey kidney cells, from UCSF Cell Culture Facility) were maintained in Dulbecco's modified Eagle medium (DMEM) with high glucose (UCSF Cell
Culture Facility), supplemented with 10% (vol/vol) FBS and 100 µg/ml penicillin/streptomycin (UCSF Cell Culture Facility). All cells were grown at 37°C and 5% CO2 in a humidified incubator. The plasmids encoding mCherry-Vimentin-C-18 and tdTomato-Clathrin-15 (the last number indicating the number of amino acid residues of the linker between the fluorescent protein and the target protein) were purchased from the Michael Davidson Fluorescent Protein Collection at the UCSF Nikon Imaging Center. We transfected BSC-1 cells grown in an 8-well glass bottom chamber (Thermo Fisher Scientific) using FuGene HD (Promega). For better cell attachment, the 8-well chambers was coated with fibronectin solution (Sigma-Aldrich, F0895-1MG) for 45 min before seeding cells. Cells were seeded at the density of 2000 ~4000 per well one day before transfection. Plasmid encoding clathrin and vimentin were mixed at a ratio of 1:4, and a total amount of 200 ng DNA was added to each well. Forty-eight hours after transfection, cells were fixed with 4% paraformaldehyde for 15 mins, followed by three times of PBS washes. In order to minimize the background fluorescence, our spectral detection setup was built around a wide-field inverted fluorescence microscope (Nikon Ti-U) with an oil immersion objectives (Olympus 100x 1.4 NA UPlanSApo). Excitation light beams (488 nm and 561 nm, Coherent OBIS) were combined, expanded, and then passed through a home-built total internal reflection fluorescence (TIRF) illuminator before entering the microscope. The incident angle of the excitation light was adjusted to be just smaller than the critical angle. Fluorescence was filtered using a quad-band dichroic mirror (z405/488/561/640rpc, Chroma) before exiting the microscope body. An emission filter (ET525/50m or ET600/50m Chroma) was inserted in the detection path to block the excitation light. A pair of lenses ( Fig. 1(a), L1: f = 150 mm, L2: f = 100 mm) relayed the intermediate image plane to the camera, with the optical elements for spatial-spectral dual-view imaging placed in between. Images were recorded by a sCMOS camera (ORCA Flash 4.0 sCMOS, Hamamatsu). The final pixel size at the image plane was 85 nm. During imaging, the camera frame rate was set to be 5 Hz.

Evaluation using simulated two-color data
We first quantified the performance of the reconstruction algorithm by analyzing a simulated data set (see Section 2.4) (Fig. 2). Experimentally acquired images of microtubule ( Fig. 2(a)) and mitochondria outer membrane protein Tom20 (Fig. 2(b)) were assumed to be labeled by EGFP and EYFP. Conventionally, EGFP and EYFP cannot be easily distinguished using filters, because their emission peaks are separated by only 20 nm (Fig. 2(c)). Here we used this probe pair to demonstrate the reconstruction accuracy and spectral resolution of our computational method. For this purpose, simulated images (Figs. 2(d)-2(e)) with various signal levels and SBR were generated. We set the average signal level to be between 100 and 1500 photons per pixel and SBR to be between 1 and 5, which is typical for wide-field fluorescence microscopy of biological samples. Under this condition and with sensitive sCMOS or EMCCD cameras, Poisson shot noise instead of camera readout noise is the most significant noise source. Therefore, the average signal-to-noise ratios (SNR) of the simulated data set is approximately the square root of average photon counts per pixel. We purposely chose relatively low signal levels to examine the effect of such noise on the reconstruction algorithm.
We compared the reconstructed results (Figs. 2(f)-2(g)) with the ground truth images (Figs. 2(a)-2(b)) and quantified the accuracy by the root mean squared error (RMSE) between them. In order to measure RMSE independent of the absolute signal level, we used the relative RMSE, which was calculated after the ground truth and reconstructed images are normalized to the maxima. The resulted relative RMSE was smaller than 0.1 even when signal level in extremely low (Fig. 2(h)). For signal level higher than 500, our method steadily presented a RMSE of ~0.04. Besides, our method had robust performance (0.03 ~0.05 relative RMSE) under a wide range of background signal level. We also simulated orange-red fluorescent protein pair tdTomato and mCherry (Fig. 2(j)). The results (Fig. 2(k)-2(l)) still achieved high precision reconstruction for red fluorescent proteins. The dependence of tdTomato-mCherry reconstruction RMSE on signal level and SBR followed the same trend as that of EGFR and EYGP, with the difference that tdTomato-mCherry reconstruction was slightly more sensitive to noise and background signal. This difference could be explained by the fact that even though the emission peak separation between tdTomato and mCherry is comparable to that between EGFP and EYFP (Fig. 2(c)), the emission spectrum of tdTomato and mCherry are much wider than those of EGFP and EYFP (Fig. 2(c) and 2(j)). With these performances, we conclude that our method can accurately separate probes with emission peaks separated by ~20 nm. We note that the main contributor to this performance is the space-constraint from the spatial image. Without space-constraint, the algorithm failed to correctly assign signals between channels.

Analysis of two-color experimental data
We then validated our method in analyzing real experimental images. Here we demonstrated two-color imaging by tdTomato-clathrin and mCherry-vimentin. This sample was chosen for two reasons. First, because the ground truth was unknown, two distinct structures were chosen to facilitate visual evaluation of reconstruction performance. As shown in Figs. 3(a) and 3(b), in the acquired spatial and spectral snapshot of the sample, the bright dots corresponded to clathrin pits and the network structure was vimentin. By comparing the reconstruct results with the structural prior knowledge, we could better estimate the performance of our method in analyzing real experimental data. Second, tdTomato and mCherry were used because their emission peaks are separated by just 20 nm (Fig. 3(c)).
Using dual-view registration function (obtained as described in section 2.4) and dispersed convolution kernel functions (Fig. 3(d)) calibrated from single-color labeled sample, we successfully reconstructed clathrin (Fig. 3(e)) and vimentin structure (Fig. 3(f)) with low cross talk. Vimentin network was well resolved despite that their intensity varied a lot in the image. Most of the clathrin pits were also well resolved except the cross-talk at upper left corner and center of the image where vimentin signal was strong. The image resolution was not sacrificed during reconstruction, as supported by the fact that the peak widths of intensity profiles in both reconstructed channels were similar to those from the same structures in the non-dispersed spatial image (Fig. 3(g)).

Evaluation using simulated three-color data
As mentioned in section 2.1, when the sample is multi-color labeled, multiple snapshots generated by DMD random illumination can be used to avoid under-sampling of the 3D data. We illustrated the proposed setup diagram (Fig. 4(a)), in which a DMD was placed in the excitation path at the conjugate plane of the sample plane. Although placing DMD in the detection path can serve the same purpose, it would result in significant loss of emitted photons. As a proof-of-principle test, we used simulations to demonstrate the effectiveness of our computational spectral imaging method in three-color imaging. Based on the simulation of Section 3.1, we added another clathrin-tagRFP channel. Random illumination patterns were simulated by generating random binary patterns of the same size with the ground truth images ( Fig. 4(b)). These patterns were then multiplied to ground truth and measurement matrix before generating simulated spatial and spectral images ( Fig. 4(a)). Figure 4(c) displays clean separation of all three channels, with the cross-talk slightly higher than twocolor imaging. The reconstruction performance depended on the number of snapshots. We found that for three-color imaging, relative RMSE decreased from ~0.1 to ~0.04 with the increasing of the number of snapshots from 1 to 10 ( Fig. 4(d), equivalent to an oversampling ratio from 0.7 to 7). This result means that more measurements ensure higher reconstruction accuracy. However, more measurements also lowers the temporal resolution of computational spectral imaging and makes this method less advantageous compared to scanning spectral imaging. For three-color imaging, a snapshot number of five is a good tradeoff between reconstruction accuracy and imaging speed (Fig. 4(d)).

Conclusion
We present a computational spectral imaging method for wide-field multi-color fluorescence imaging. The method is based on simultaneous spatial and spectral data acquisition, in which the spatial information acts as a constraint in the image reconstruction to achieve superior accuracy. With both simulation and experimental data, we have shown that our method has high reconstruction accuracy in separating probes with emission peak separation of 20 nm for two-color imaging. As a proof-of-principle, we have also shown with simulation that our method is capable of multi-color imaging given multiple snapshots taken. Our method provides free wavelength choice for multi-color imaging that is not limited by fluorescence band-pass filter choices, for example, when using notch filters to block excitation lasers and hence providing wide-open passing bands. It can also be easily combined with conventional filter-based sequential multi-color imaging to further expand the probe choices of fluorescent imaging, because multiple fluorophores within the filter windows can be distinguished.
The same method can also be combined with other imaging modalities besides TIRF, for example spinning disk confocal and light sheet microscopy [25]. We note that the main limitation of our method is that the application is confined to imaging distinct structures. Relatively homogeneous distributed protein over the FOV, for example diffuse membrane protein, would result in homogeneous spectral image, additional spatial coding such as using DMD-generated excitation patterns would then be required to extract the color information.