Single molecule mRNA fluorescent in situ hybridization combined with immunofluorescence in S. cerevisiae: Dataset and quantification

Single-molecule fluorescent in situ hybridization (smFISH) has emerged as a powerful technique that allows one to localize and quantify the absolute number of mRNAs in single cells. In combination with immunofluorescence (IF), smFISH can be used to correlate the expression of an mRNA and a protein of interest in single cells. Here, we provide and quantify an smFISH-IF dataset in S. cerevisiae. We measured the expression of the cell cycle-controlled mRNA CLN2 and the cell cycle marker alpha-tubulin. The smFISH-IF protocol describing the dataset generation is published in the accompanying article “Simultaneous detection of mRNA and protein in S. cerevisiae by single-molecule FISH and Immunofluorescence” [1]. Here, we analyze the smFISH data using the freely available software FISH-quant [2]. The provided datasets are intended to assist scientists interested in setting up smFISH-IF protocol in their laboratory. Furthermore, scientists interested in the generation of imaging analysis tools for single-cell approaches may find the provided dataset useful. To this end, we provide the differential interference contrast (DIC) channel, as well as multicolor, raw Z-stacks for smFISH, IF and DAPI.


a b s t r a c t
Single-molecule fluorescent in situ hybridization (smFISH) has emerged as a powerful technique that allows one to localize and quantify the absolute number of mRNAs in single cells. In combination with immunofluorescence (IF), smFISH can be used to correlate the expression of an mRNA and a protein of interest in single cells. Here, we provide and quantify an smFISH-IF dataset in S. cerevisiae. We measured the expression of the cell cycle-controlled mRNA CLN2 and the cell cycle marker alpha-tubulin. The smFISH-IF protocol describing the dataset generation is published in the accompanying article "Simultaneous detection of mRNA and protein in S. cerevisiae by single-molecule FISH and Immunofluorescence" [1] . Here, we analyze the smFISH data using the freely available software FISH-quant [2] . The provided datasets are intended to assist scientists interested in setting up smFISH-IF protocol in their laboratory. Furthermore, scientists interested in the generation of imaging analysis tools for singlecell approaches may find the provided dataset useful. To this end, we provide the differential interference contrast (DIC) channel, as well as multicolor, raw Z-stacks for smFISH, IF and DAPI.
© 2020 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.
( http://creativecommons.org/licenses/by/4.0/ ) Specifications Table   Subject Biology, Cell Biology, Biophysics Specific subject area Gene expression analysis, Single molecule mRNA imaging, Fluorescence imaging analysis Type of data Table  Image Graph Figure  Analysis files How data were acquired Data were acquired using a wide-field fluorescence microscope coupled to a CCD camera, a motorized stage driven by ultrasonic piezo technology for precise Z-positioning and a 120 W mercury short arc lamp as fluorescence illumination system. Model and make of the instruments used:

Value of the data
• smFISH-IF data are useful to correlate the expression of an mRNA of interest to the expression of a protein in single cells. This approach has the potential to reveal the subcellular localization of mRNAs, the cell-to-cell variability in gene expression, and to precisely quantify the absolute number of mRNAs in the cytoplasm and at the transcription site. • Scientists interested in setting up the smFISH-IF protocol in their lab could use this dataset as a reference to compare their dataset to an established protocol. • The presented data were collected using S. cerevisiae as a model system. We believe that the approach is relevant to other organisms. Here, we provide an overview of the smFISH-IF protocol and imaging analysis process by making available the raw data and the smFISH quantification performed with the software FISH-quant [2] . • The increasing use of microscopy in life sciences is rising the need for imaging analysis tools.
Further automation of the quantification process (e.g. by using machine learning algorithms) will expand the application of life science microscopy data. Thus, software developers involved in the generation of imaging analysis tools for gene expression analysis and cell physiology may find useful the provided dataset.

Data
smFISH has emerged as a powerful technique that allows one to localize and quantify the absolute number of mRNAs in single cells [3] . In combination with immunofluorescence, smFISH can now be used to correlate the expression of an mRNA of interest to that of a protein in single cells [4][5][6] . Because the quantifications are strictly dependent on the quality of the primary data, it is important to achieve high-quality smFISH datasets. Here, we provide three smFISH-IF datasets that we deposited in the public repository http://dx.doi.org/10.17632/bcmn9cxyzs.4 [7] . The smFISH was done to detect the cyclin 2 mRNA ( CLN2 ), which is a cell cycle-regulated mRNA periodically expressed during the G1 phase [8] . The immunofluorescence was done to detect the protein tubulin 1 that was used to monitor the cell cycle phase ( Fig. 1 a).
For each experiment, we provide multiple stage positions using the following filter sets: a) CY5 or FITC: to detect the IF, revealed using a secondary antibody conjugated to the fluorophore Alexa647 or Alexa488, respectively. b) CY3: to detect the smFISH, revealed using probes conjugated with the fluorophore Quasar 570. c) DAPI: to detect the nuclear staining performed using DAPI. d) DIC: differential interf erence contrast image. For each stage position and wavelength, we collected 41 z-stacks every 200 nm in the following order: CY5 or FITC, CY3 and DAPI channels. The files are saved in TIFF format.
Representative smFISH-IF results are shown in Fig. 1 b, whereby the different fluorescent channels are merged to the DIC channel. The different fluorescent channel images are the result of maximal projections of the Z-stacks where the cells are located (usually 20-25 Z-planes), generated using Image J [9] . The CY3 channel has been filtered using the Gaussian filter in FISHquant. The DIC images have a systematic drift compared to the fluorescence images due to our microscope characteristics. This drift is corrected using the x, y "translate" function in Fiji.
The three datasets are representative of typical smFISH-IF results. Datasets 1 and 2 have a few imperfections that can be found in this type of experimental setting and are discussed in the following paragraphs.
In dataset 1, positions 6 and 7 show a significant drift in the Z directions (about 1 pixel per Z-plane). This problem can be detected using, for instance, the Fiji function: TS are identified by the higher brightness compared to the single cytoplasmic mRNAs. Tubulin (purple) co-localizes with the spindle pole body which is duplicated during S phase. The bud emergence starts during S phase and ends with the formation of the daughter cell. During anaphase, the microtubules stretch between the mother and the daughter cell [11] . The CLN2 mRNA is transcribed during late G1 and is homogeneously dispersed in the cytoplasm. Image > Stacks > Orthogonal views , and it can be caused by a small bubble in the oil or if the objective is not clean. A small bubble is likely the cause for positions 6 and 7 since the positions of EXP1 were collected consecutively, but only at these two positions, we found the drift. These images can still be used to quantify the number of mRNAs per cell using FISH-quant (see output results). The filtered images still allow the identification of mRNA spots. However, if one requires to localize mRNAs with high precision (e.g. for co-localization experiments), we recommend discarding these types of images.
In dataset 2, we observe some debris in the IF channel (CY5) at the bottom of the slide. This can happen if the excess of fluorescently labeled secondary antibodies used for IF detection is not completely removed during the washes. This problem can be solved by increasing the number and the stringency of washes during the IF protocol, or by increasing the concentration of bovine serum albumin (BSA) used to block the non-specific binding of the antibodies. In the case of tubulin, used in this experiment as a cell cycle marker, we believe the data can still be used for cell cycle scoring based on the reasons described in the following paragraph.

Cell cycle phase scoring of cells
The cell cycle phase of each cell was determined using the morphology of the cell and the tubulin IF signal intensity. The software Cellprofiler was used to quantify the integrated intensity of the IF signal within the cell outlines and for the maximal projection of the Z-sections containing the cells, excluding the bottom of the slide. Thus, most of the debris is not quantified. This allows one to distinguish cells in the G1 phase from cells in the following phases of the cell cycle (S to M phase) due to an increase in the tubulin signal that is caused by a spindle duplication during S phase [10] . The CellProfiler pipeline used in this study was deposited in the public repository http://dx.doi.org/10.17632/bcmn9cxyzs.4 .
After quantifying the tubulin signal, we combine these measurements with the morphology of the tubulin and cell (based on DIC) to manually score the cells according to their cell cycle phase. This scoring is done in Fiji, using the plugin Cell Counter [9] . G1 phase cells can be identified by the single, nuclear localized, spindle pole body and round morphology of the cells as well as a lower tubulin intensity. The emergence of a small bud identifies S-phase cells. G2 phase cells can be identified by their larger bud and duplicated spindle pole body with tubulin stretching into the emerging bud. M-phase cells can be identified by the extended spindle and dividing nucleus ( Fig. 1 a). A CSV file is created from Fiji, containing the information on the position and scoring group of each cell. This file can then be imported into the outline designer in FISH-quant using the "Load Groups" menu option. After the FISH-quant analysis, described in detail below, one has the possibility of sorting the mRNA expression results according to the cell cycle group ( Fig. 1 c).

Cell outlines
Together with the raw images, we deposited the smFISH analysis files that were generated using FISH-quant [2] (see Experimental Design, Materials, and Methods for more details). The outlines for the ten stage positions were generated using Cellprofiler (the pipeline is provided with V3 of FISH-quant) and imported using the built-in FISH-quant option.
The text files (e.g., EXP1wtTub647CLN2Q570_01_CY3__outline) describe the coordinates of nuclear and cell outlines for a total of 1018 cells. The files are imported in FISH-quant main interface ( Fig. 2 a) after loading the corresponding raw smFISH stage position (41 Z-stacks) and loading the corresponding outline file ( Fig. 2 b). At this stage, the quality of the outlines is verified and corrected if necessary. Furthermore, at this stage, transcription sites (TS) auto-detection settings are defined (e.g. threshold for TS brightness, and DAPI brightness).

FISH-quant batch analysis files
Once all cell outlines and transcription sites are defined, we set the parameters of single mRNA detection. Using FISH-quant main interface, the image is filtered using the parameters shown in Fig. 2 a. A single cell expressing the mRNA of interest, is chosen to set the initial parameters using the pre-detection tools. Once the parameters best identifying the mR-NAs in the chosen cell are found, they are applied to the entire image and then saved as a text file. We deposited the file describing the mature mRNA settings used to analyze EXP1 (EXP1wtTub647CLN2Q570_01_CY3__settings_MATURE).
Next all the smFISH images can be analysed using the same setting in the Batch processing mode that can be selected from the FISH-quant main interface. The setting and all the outlines are loaded and processed. The analysis for EXP1 is shown in Fig. 3 a. The analysis file is saved using the Matlab format. This file was deposited (_FQ_batch_ANALYSIS_191205) and can be utilized to verify the mRNA detection parameters.
Following the batch processing, the intensity of single mRNAs is averaged ( Fig. 3 b-c) and it is used to set the parameters for measuring TS intensities ( Fig. 3 d). The file describing the averaged single mRNA intensity was deposited as well (_mRNA_AVG_ns). Once the settings tested for a single image are found, they are saved as a text file that can be used in the batch mode ( Fig. 3 a) to automatically detect TS intensities for all images. We deposited the file describing the nascent mRNA settings used to analyze EXP1 (_FQ__settings_NASCENT).

FISH-quant output results file
After the batch identification of both mRNAs and TS, several text files can be saved reporting the analysis output (e.g. the number of nascent and mature mRNAs per cell, the intensities and localization of the mRNAs). We deposited the text file reporting the number of nascent and mature CLN2 mRNAs for the ten stage positions of EXP1 (__FQ_batch_summary_ALL_191204). This information can be further plotted to show, for instance, the number of mRNAs present in expressing cells ( Fig. 4 a), or the distribution of the mRNA in a population of cells (expressing and non-expressing) ( Fig. 4 b). Furthermore, the number of nascent mRNAs per transcription site can be plotted as individual values ( Fig. 4 c) or as a frequency distribution ( Fig. 4 d).

Single molecule fish probes design
CLN2 probes conjugated to Quasar 570 were designed using the Stellaris TM Probe Designer by LGC Biosearch Technologies and purchased from Biosearch Technologies ( https://www. biosearchtech.com/support/tools/design-software/stellaris-probe-designer ).

Single molecule FISH combined with immunofluorescence
The detailed protocol to perform single molecule FISH combined with Immunofluorescence (smFISH-IF) is described in the accompanying article [1] . In brief, it was performed as follows. Yeast strains were grown overnight at 26 °C in SC-complete medium with 2% glucose. In the morning cells were diluted to OD 600 0.1 and allowed to grow until OD 600 0.3-0.4. Cells were fixed by adding paraformaldehyde (32% solution, EM grade; Electron Microscopy Science #15714) to a final concentration of 4% and gently shacked at room temperature for 45 min.
Cells were then washed 3 times with buffer B (1.2 M sorbitol and 100 mM potassium phosphate buffer pH = 7.5) and resuspended in 500 μL of spheroplast buffer (buffer B containing 20 mM VRC (Ribonucleoside-vanadyl complex NEB #S1402S), and 25 U of Lyticase enzyme (Sigma #L2524) per OD of cells for about 8 min at 30 °C. Digested cells were washed once with buffer B and resuspended in 1 mL of buffer B. 150 μL of cells were seeded on 18 mm (#1.5) poly-lysine treated coverslips and incubated at 4 °C for 30 min. Coverslips were washed once with buffer B, gently covered with ice-cold 70% ethanol and stored at −20 °C. For hybridization, coverslips were rehydrated by adding 2xSSC at room temperature twice for 5 min. Coverslips were pre-hybridized with a mix containing 10% formamide (ACROS organics #2058210 0 0)/2xSSC, at room temperature for 30 min. For each coverslip the probe mix (to obtain a final concentration in the hybridization mix of 125 nM) was added to 5 μL of 10 mg/μL E. coli tRNA/ ssDNA (1:1) mix and dried with a speed-vac. The dried mix was resuspended in 25 μL of hybridization mix (10% formamide, 2 × SSC, 1 mg/ml BSA, 10 mM VRC, 5 mM NaHPO4 pH 7.5) and heated at 95 °C for 3 min. Cells were then hybridized at 37 °C for 3 h in the dark. Upon hybridization coverslips were washed twice with the pre-hybridization mix for 30 min at 37 °C, once with 0.1% Triton X-100 in 2xSSC for 10 min at room temperature, once with 1 ×SSC for 10 min at room temperature.
Before the IF, the smFISH was fixed with 4% PFA-in PBS for 10 min at room temperature (RT). Coverslips were washed once with PBS 1 × for 5 min at RT. The coverslips were pre-incubated with the IF solution (1 ×PBS, 0.1% RNAse-free Bovine Serum Albumin) for 30 min at RT. Next, the coverslips were incubated with the primary antibody (mouse anti-tubulin, 1:10 0 0) in IF solution for 45 min at RT. The coverslips were washed with 1 ×PBS for 5 min at RT before incubation with the secondary antibody (goat anti-mouse Alexa 647, 1:1500, or goat anti-mouse Alexa 488, 1:1500) in PBS 1 ×, BSA 0.1%, for 45 min at RT. Excess antibody was removed by washing the coverslips three times with PBS1x for 5 min at RT. Before mounting, we dipped the coverslip in 100% ethanol and let them dry completely at RT, protected from the light. We mounted the coverslips face down onto a drop of ProLong Gold anti-fade mounting solution with DAPI placed on a glass slide. We allowed the mounting solution to polymerize at RT, overnight and in the dark.
To note, both secondary antibodies, conjugated with Alexa 647 or Alexa 488, can be used in combination with smFISH probes labelled with Quasar 570. Probes labelled with Quasar 670 can be combined with IF detected with secondary antibodies labelled with Alexa 555 or Alexa 488. Immunofluorescence can be also combined with two-color smFISH (protocols are provided in [ 4 , 11 , 12 ]).

Image acquisition
smFISH-IF data were collected by using a wide-field fluorescence microscope, combined with a CCD camera and a high numerical aperture objective corrected for chromatic aberration. Yeast cells attached on a glass coverslips were imaged at different wavelengths using the following conditions and order: Alexa 647 or Alexa 488 (IF): exposure 750 ms (100% lamp power); Quasar 570 (smFISH): 750 ms (100% lamp power); DAPI (nuclear staining): 50 ms (25% lamp power); DIC (cell outline): 150 ms. Ten to twenty different stage positions were acquired per experiment (to reach ~10 0 0 cells for quantification). At each stage position, and for each fluorescent channel, we collected 41 Z-stacks every 200 nm. For DIC, one single Z-stack was acquired at the focal plane.

smFISH image analysis
smFISH images were analyzed using the Matlab written software FISH-quant [2] . The software is freely available, provided by Florian Mueller ( https://research.pasteur.fr/en/member/ florian-muller/ ) on the Bitbucket repository ( https://bitbucket.org/muellerflorian/fish _ quant/src/ master/ ). The scripts are accompanied by an in depth-manual that explains the theory and how to perform single mRNA detection. The current version (V3) provides the scripts to perform automatic cell outline recognition using the freely available software Cellprofiler [13] as well as scripts to perform co-localization of spots in two-colors [14] . After background subtraction, the integrated fluorescence intensities of all detected cytoplasmic smFISH spots were fitted to a three-dimensional (3D) Gaussian to determine the average intensity and the coordinates of the mRNAs ( Fig. 2 ). The peak intensity of the 3D Gaussian represents the intensity of a single CLN2 mRNA hybridized with 48 single-labelled Quasar 570 probes. The average intensity of all the mRNAs was used to determine the number of transcripts at each transcription site ( Fig. 3 ). sm-FISH performed with 48 20-nucleotide singly-labelled probes is likely to permit the detection of the majority of cellular mRNAs. This assumption is based on previous reports showing that two probe sets targeting different regions of the same mRNA and labelled with spectrally distinct fluorophores (e.g. Quasar 670 and Quasar 570) co-localize more than 80% of the time [ 4 , 12 , 15 ]. Furthermore, in an experiment performed in the model organism Drosophila melanogaster , it was shown that increasing the number of probes (e.g. from 48 to 63 or 91) did not significantly increase the number of mRNAs detected per cell by smFISH [16] .