MR-seq: measuring a single cell’s transcriptome repeatedly by RNA-seq

We described a novel single-cell RNA-seq technique called MR-seq (measure a single-cell transcriptome repeatedly), which permits statistically assessing the technical variation and identifying the differentially expressed genes between just two single cells by measuring each single cell twice. We demonstrated that MR-seq gave sensitivity and reproducibility similar to the standard single-cell RNA-seq and increased the positive predicate value. Application of MR-seq to early mouse embryos identiﬁed hundreds of candidate intra-embryonic heterogeneous genes among mouse 2-, 4- and 8-cell stage embryos. MR-seq should be useful for detecting differentially expressed genes among a small number of cells. (cid:1) 2017 Science China Press.


Introduction
The single cell RNA-seq techniques developed rapidly in recent years [1][2][3][4][5]. They have been applied for analyzing precious and rare cell types [6][7][8][9][10][11][12] such as human early embryos, and for dissection of gene expression heterogeneity within a population of cells [13][14][15][16][17][18][19][20]. The technical noise of the single-cell RNA-seq techniques is much larger than the bulk RNA-seq, and separating the technical variation from the biological variation has been a research focus [21][22][23][24]. Particularly, when there are only several cells different from each other, e.g. blastomeres in early mammalian embryos, it is not possible to statistically identify the differentially expressed genes (DEGs) due to no replicates and high technical noise in the single cell RNA-seq dataset.
Here, we presented a novel method named MR-seq (measure a single-cell transcriptome repeatedly). This technique, which splits the lysate of a single cell into 2 aliquots and perform single-cell RNA-seq analysis separately, allows for measuring each single cell's transcriptome twice, then statistically assessing the technical variation and identifying the DEGs between just two single cells.

Isolation of zygotes; 2-, 4-and 8-cell embryos; inner cell mass (ICM) and trophectoderm (TE)
Six-to eight-week old ICR female mice were injected with 7.5 IU human chorionic gonadotropin (hCG) 46-48 h after injection with 7.5 IU pregnant mare serum gonadotrophin (PMSG). The 2-, 4-and 8-cell stage embryos were collected from the oviduct of mated mice at 48, 56 and 70 h post-hCG injection, respectively. The embryos were treated with Acid Tyrode's solution to remove the zona pellucida, transferred to trypsin-EDTA, and incubated at 37°C for 5 min. Then, the embryos were gently washed with PBS-BSA buffer to dissociate the embryos into individual blastomeres. The ICM and TE were isolated from mouse E3.5 blastocysts [8]. The trophectoderm was isolated from the blastocysts with zona pellucida removed under the microscope by mechanically cutting using a 30 G needle. The exposed ICM was gently scraped from the trophectoderm. These two cell types were then treated with accutase or 0.005% trypsin at 37°C to dissociate single cells.

Cell lines
MEFs were derived from ICR strain E13.5 embryos. Mouse embryonic stem cells (mESCs) were cultured on 0.1% gelatinized tissue culture plates and passaged every two days.

Preparation of single-cell cDNAs for MR-seq
In this work, we measured 132 samples' transcriptome (Table S1, online), and sequenced the libraries on Illumina and Ion platforms. The MR-seq protocol was developed based on the single-cell RNA-seq technique we previously developed [2]. Briefly, single cells were lysed with a doubled volume (9.1 lL) of lysis buffer, and the sample was incubated at 70°C for 90 s. Then, a mixture of the reverse transcription enzyme solution and cell lysate was divided into two aliquots. After the reverse transcription, primer remove, A-tailing, and second strand synthesis, one round of no more than 24 cycles of PCR amplification was used to amplify the cDNAs for sequencing library preparation.

Construction of the cDNA libraries for Illumina sequencing
The 5-100 ng amplified cDNAs were frgmented to approximately 200 bp fragments using Covaris S2 and purified with DNA Clean & Concentrator TM -5. Then, NEBNext Ò Ultra TM DNA Library Prep Kit for Illumina Ò (NEB, Catalog #E7370L) was used to construct the cDNA library with half reagents. The cDNA libraries were sequenced on Illumina HiSeq 2000 and 2500 instruments with 100 bp reads and paired-end parameters.

Construction of the cDNA libraries for Ion Proton sequencing
First, 5-100 ng amplified cDNA were fragmented to approximately 250 bp fragments using Covaris S2 and purified with DNA Clean & Concentrator TM -5. The Ion Xpress TM Plus Fragment Library Kit (ThermoFisher, cat. no. 4471269) and Ion Xpress TM Barcode Adapters 1-16 Kit (Thermo Fisher, cat. No. 4471250) were used to construct the library with half reagents. The CV is plotted against the log10-transformed geometric mean of expression for all genes detected in 8-cell single blastomeres (merged from MR-seq analysis dataset) and 8-cell technical replicates. The pink points showed the CV of genes from single blastomeres in the ''pool/split" analysis, and the light green points showed the CV of genes from technical replicates in the ''pool/split" analysis.

Ion PI template preparation, enrichment and sequencing on the Thermo Fisher Proton
Emulsion PCR was performed using the One Touch 2 (OT2) system following the Ion PI Template OT2 200 V3 protocol (Life Tech MAN0009133 RevB.0) with480 million DNA templates. Enriched Ion spheres were quantified, and approximately 400-600 million Ion spheres were recovered. PI chips were loaded according to the spinning protocol, and sequencing was performed using the Ion PI Sequencing 200 Kit V3 (Life Tech MAN0009136 Rev B.0) on an Ion PI Proton system. Base calls were collected using the Ion Torrent Suite v4.3 software.

MR-seq data alignment, analysis and graph drawings
The Illumina HiSeq 2500 and 2000 (Illumina, CA, USA) and Ion Proton (Life Technologies, CA, U.S.) sequencing systems were used for sequencing. Adaptor contamination and low-quality reads were discarded from the raw data. For the Illumina paired-end reads, TopHat (version 2.0.10) was used for sequence alignment, and FPKM values were generated by Cufflinks (version 2.1.1). For the Ion Proton RNA-seq reads, a 2-step alignment method that combined STAR (version 2.3.0) and Bowtie2 (version 2.1.0) was applied. Unmapped reads produced by STAR were re-mapped with Bowtie2 in the ''verysensitive-local" mode, and the merged bam files were used by Cufflinks as the input. Reference sequence and transcript annotation stages. PC1 and PC2 represented the top two dimensions of the genes showing differential expression among the embryos, which accounted for 71.5% and 8.9% of the expressed genes, respectively. c Number of intra-embryonic heterogeneous genes from 2-to 8-cell stages. The intra-embryonic heterogeneous genes were identified according following cutoff: |FC| > 2, adjusted P < 0.05. FC, fold change. d Scatter plot showed how the log2FoldChange of intra-embryonic heterogeneous blastomeres' genes change with the mean expression (nCount). files were from the UCSC genome assembly mm10. DEGs were found using the R package ''Deseq2" (version 1.6.1) [25], and the counts required for in Deseq2 were achieved by HTSeq (version 0.6.0). The entire correlation or distance matrixes were scaled and the unsupervised hierarchical clustering was performed using the 'heatmap.2' function from the ''gplots" package in R. For the CV scatterplot, average CV and standard deviation of FPKM values from the technical replicates and 8-cell blastomeres were calculated within each bin (0.15) and plotted in log 10 scale. The 2D PCA plot was generated by R with the ''ggbiplot" package. The Venn graphs are pictured on the http://www.cmbi.ru.nl/cdd/biovenn/index.php [26].

Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institu-tional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. For studies with animals, all applicable institutional and/or national guidelines for the care and use of animals were followed.

Performance of MR-seq
Firstly, we determined the sensitivity of MR-seq in comparison with the standard single-cell RNA-seq method. We split the lysate of single mouse embryonic stem cells (mESCs) into 2, 4 or 8 aliquots and constructed their libraries separately. The results showed that the number of genes (FPKM > 1) detected in half-cell lysate was comparable to that in a whole single cell (average  Fig. 1a). The gene expression correlation between half-cell lysate and the bulk mESCs was also comparable to that between the whole single cell and the bulk (the Spearman's rank correlation coefficient, median 0.71 vs 0.72 for bulk & half-cell vs bulk & single-cell, Figs. 1b and S1). The fraction of overlapping genes of two half-cell lysates was also similar to that of two single cells (Jaccard index, 62% vs 65% for half-cell lysates vs single cells, Fig. 1c). On the contrast, the detection sensitivity and reproducibility greatly decreased when divid-ing the lysate into 4 or 8 equal aliquots ( Fig. 1a and b). These results indicated that MR-seq by 2-split, but not more splits, gives a technical sensitivity comparable to the standard single-cell RNA-seq.
To assess the ability of MR-seq to identify DEGs between only two single cells, we examined the mESCs and the mouse embryonic fibroblasts (MEFs). In condition of only one sample in each group, no statistical analysis can be performed, while measuring each sample twice allows for statistical analysis by adding technical replicates and is expected to reduce the technique variation. The results showed that two measurements conducted by MR-seq indeed significantly (P < 0.001) increased the positive predicate value in comparison with only one measurement (average 50% vs 42% for MR-seq vs single-cell, Fig. 1d). Also, the MRseq successfully identified known DEGs between mESCs and the MEFs [27] (Fig. 1e).
Further, we compared MR-seq with the ''pool/split" experiment for identification of highly variable genes. We analyzed each blastomere of an 8-cell embryo by MR-seq for identification of the DEGs. For the ''pool/split" experiment, the dataset of halfblastomere lysate from the same blastomere was merged to identify the highly variable genes by comparing with the control of the technical variation obtained from another 8-cell embryo. The MRseq method identified 2,092 DEGs and the ''pool/split" approaches identified 1,014 highly variable genes, with 225 genes overlapped (Fig. 2a). The 225 overlapping genes included embryonic development-related genes such as Nanog and Krt18 (Figs. 2b and S2b). Interestingly, of the 1,867 MR-seq only genes, gene ontology (GO) terms strongly enriched in functions of cell cycle, chromatin modification, RNA processing, in utero embryonic development (Fig. S2a), as well as many transcriptional factors (Fig. 2b). In contrast, of the 789 highly variable genes identified by the ''pool/split" analysis slightly enriched in functions of translation and protein localization (Fig. S2c). These results implied that MR-seq method identified many potential intra-embryonic heterogeneous genes that could be missed by the 'pool/split' approach (Figs. 2 and S2).

Analysis of early mouse preimplantation embryos by MR-seq
We next applied MR-seq to detect the intra-embryonic heterogeneous genes in early mouse preimplantation embryos from zygote to 8-cell stages. We analyzed a total of 11 embryos of 2 zygotes and 3 embryos each for 2-, 4-, and 8-cell stages. Unsupervised hierarchical clustering and PCA analysis clearly distinguished embryos of different stages, confirming the technical sensitivity of MR-seq ( Fig. 3a and b). We identified 4 genes which showed potential intra-embryonic heterogeneity overlapping between two embryos in the 2-cell stage, yet no genes overlapped among all three embryos could be identified (Fig. 3c). In the 4-cell stage, we identified 17 genes showing intra-embryonic heterogeneity overlapped among three embryos. The gene number increased to 519 in the 8-cell stage (Fig. 3c, Tables S2 and S3). These results indicated that the intra-embryonic heterogeneity prominently increased from 2-to 8-cell stage. The gene expression levels of all these identified intra-embryonic heterogeneous genes were relatively high in the blastomere of the identified heterogeneous stages, suggesting they were not caused by technical variation (Fig. 3d).

Discussion and conclusion
We present here a novel single-cell RNA-seq method allowing for identification of DEGs between just two single cells. We applied this method for identifying the intra-embryonic heterogeneous genes among mouse 2-, 4-and 8-cell stage embryos.
We have determined that the sensitivity of MR-seq is comparable with the standard single-cell RNA-Seq method. This indicates that although dividing the cell lysate into 2-splits reduces the amount of starting material, the loss of transcripts during the dividing step does not significantly reduce the performance of the MR-seq method. Our results suggest that MR-seq can be used for cells as small as mESCs, MEF and mouse blastomeres. Since the technical noise should increase as the size of the cell decreases, whether MR-seq also works for smaller cells needs further study.
We applied MR-seq for early mouse preimplantation embryos and identified hundreds of intra-embryonic heterogeneous genes among mouse 2-, 4-and 8-cell stage embryos. We found that more than half of these heterogeneous genes were phosphoproteins. These suggest that protein phosphorylation plays an important role in early cell fate decision, i.e. the separation of the inner cell mass (ICM) and the trophectoderm (TE), and further separation of ICM to epiblast (EPI) and primitive endoderm (PE), of mammalian blastomeres. For example, the phosphorylation of Yesassociated protein (Yap), which is encoded by a heterogeneous gene Yap1 identified in 8-cell stage, keeps the Yap protein in the plasma and thus prevents its transcription activation of Cdx2, a master gene for the TE lineage [28]. Our intra-embryonic heterogeneous gene list also includes some known genes that play essential roles in early mammalian embryo development. We found that Nanog, which is a key transcription factor for segregation of PE vs EPI in the ICM, already showed intra-embryonic heterogeneity in the 8-cell stage [29,30]. Whether this intra-embryonic heterogeneity is related to the later cell fate decision requires further investigation. Also, Pard6a and Pard6b, which are cell polarity regulators, have been associated with the first cleavage of the mouse embryo [31]. Our finding of their intra-embryonic heterogeneous expression in the 8-cell stage may point to their roles in later cleavage stage.