Single-cell transcriptional profiling reveals the heterogenicity in colorectal cancer

Supplemental Digital Content is available in the text


Introduction
Tumor heterogeneity poses a serious challenge to cancer treatment and patient survival. It can be divided into intertumoral and intra-tumoral heterogeneity. The former is observed that variability occur between patients with the same histologic type, and forms the cornerstone of targeted cancer medicine. [1,2] The latter is further present within a single tumor. Cells from the same tumor may harbor different genetic mutations or exhibit distinct phenotypic states. [3][4][5] Such intra-tumoral heterogeneity is increasingly appreciated as a key determinant of treatment failure, drug resistance, and disease recurrence. [2] It is therefore All study procedures were conducted with Internal Review Board approval (Shenzhen People Hospital, China). All participants received information on the aim and procedures of the study and written informed consent was obtained from all subjects.
indispensable to fully characterize the phenotypes and interactions of diverse cell types present within tumors.
Bulk cell RNA-sequencing (RNA-Seq) technology has been widely used for transcriptome profiling to study transcriptional structures, splicing patterns, and gene and transcript expression levels. [6,7] However, such ways of sequencing can only target a cell colony and obscure the signatures of distinct cell populations. Alternatively, recently developed single-cell RNA sequencing (scRNA-seq) technologies provide comprehensive, unbiased analysis of all cell types and states based on gene activity within tumor masses. [8,9] It opens the possibility to map cellular heterogeneity unbiasedly, recovers cellular identities independently of a priori defined labeling strategies and can furthermore uncover either their previously unrecognized disease-associated cell populations or functional states, novel biomarkers and potential molecular regulators. [10] Multiple studies have used scRNA-seq to investigate the intra-tumoral heterogeneity in several cancer types. [11,12] Patel et al used scRNA-seq to profile 430 cells from 5 primary glioblastomas, and found inherently variable in their expression of diverse transcriptional programs related to oncogenic signaling, proliferation, complement/ immune response and hypoxia. [11] Min et al analyzed transcriptomic data obtained from 34 single cells from human lung adenocarcinoma (LADC) patient-derived xenografts (PDXs) and revealed 2 distinct intra-tumoral subgroups that were primarily distinguished by the gene module G64. [12] The above results indicate that scRNA-seq may have the ability to characterize the phenotypes and interactions of diverse cell types present within tumors.
Colorectal cancer (CRC) is the third most common cancer in the world, and about 1.4 million cases were diagnosed worldwide in 2012. [13] It was characterized by heterogeneous features of histopathologic, genomic, epigenomic, and transcriptomic alteration. Bulk transcriptomic analysis shows that stromal cell signatures correlate with risk of recurrence and predict patient survival, highlighting the importance of diverse cell populations in CRC. [14,15] In this study, scRNA-seq was performed to generate a molecular census of cell types and stated with the cancer tissue from a CRC patient. Combining this technology with clustering analysis to characterize gene expression at the single cell level and resolve differential responses by cell type can help us understand how activated and quiescent, abnormal cellular subpopulations contribute to CRC disease initiation, maintenance, and progression.t

Study subject
This study involved a 73-years old female Stage III right Colorectal cancer patient. Clear pathologic diagnosis and clinical evidence were confirming as CRC. Cancer tissue was dissociated from the patient and used for scRNA-seq analysis. This study was conducted by the Declaration of Helsinki and was approved by the Ethics Committee of the Shenzhen People's Hospital, China (Ref No. 2015-313); the sole participant also provided her written informed consent.

Dissociation of tumor
To ensure accurate and satisfactory results, fresh tissue was required for scRNA-sequencing. Cancer tissue from the CRC patient was obtained and immediately dissociated by Tumor Dissociation Kit, human"" (Miltenyi Biotec, Germany) as the vendor recommended. In detail, an enzyme mix was firstly prepared by combing various types of enzymes. Then, the cut 2 to 4 mm tumor was added to a gentle MACS C Tube along with the enzyme mix. The tube was attached upside down to the genleMACS Dissociator and process by the gentleMACS Program. Afterward, the C Tube was detached and incubated for 30 minutes at 37°C while rotating continuously using the MACSmix Tube Rotator. Later, the sample was resuspended and applied to a MACS * SmartStrainer placed on a 50 mL tube. The cell was washed with 20 mL of RPMI 1640. Lastly, the cell was centrifuged at 300 Â g for 7 minutes. Both the cancer tissue and adjacent tissue were prepared and used for sc-RNA sequencing analysis.

Single-cell capture, library preparation, and RNA-seq
The cell suspension was diluted to a concentration of 11 Â 0 6 cells/ml in PBS plus 0.04% bovine serum albumin (BSA). The cell viability was at least 80%. Cell count and viability were determined using trypan blue on a Countess FL II. Then, scRNA-seq libraries were prepared according to the Single Cell 3 0 Reagent Kit User Guide v2 (10Â Genomics). [16] Briefly, cellular suspensions were loaded on a Chromium Controller instrument (10Â Genomics) to generate single-cell gel bead-inemulsions (GEMs). GEM-reverse transcriptions (GEM-RTs) were performed in 96-deep well reaction module: 55°C for 45 minutes, 85°C for 5 minutes; end at 4°C. After the RT step, GEMs were broken, and barcoded-cDNA was purified with DynaBeads MyOne Silane Beads (Thermo Fisher Scientific, 37002D). Subsequently, cDNAs were amplified with 96-Deep Well Reaction Module (98°C for 3 minutes; cycled 12 times: 98°C for 15 second, 67°C for 20 second, and 72°C for 1 minute; 72°C for 1 minute; end at 4°C) and cleaned up with the SPRIselect Reagent Kit (Beckman Coulter, USA). Indexed sequencing libraries were constructed using the GemCode Single-Cell 3 0 Library Kit for enzymatic fragmentation, end-repair, A-tailing, adaptor ligation, ligation cleanup, sample index PCR, and PCR cleanup. [16] Quantification of the constructed libraries was evaluated using QubitdsDNA HS Assay Kit (Thermo Fisher), Agilent cDNA High Sensitivity Kit, and Kapa DNA Quantification Kit for Illumina platforms, following the manufacturer's instructions. The library was sequenced on Illumina HiSeq2500, using the paired-end 2 Â 125 bp sequencing protocol. Sequencing run parameters were set up according to version 2 chemistry, the number of cycles for each read as follows: Read 1: 26 cycles, i7 index: 8 cycles, i5 index: 0 cycles and Read 2: 98 cycles. [17]

Processing of sc-RNAseq data
CellRanger software was utilized to analyze Single-cell expression (version 2.1.1); it was also used to perform quality control, sample de-multiplexing, barcode processing, and single-cell 3'gene counting. Sequencing reads were aligned to the UCSC hg38 transcriptome using the default parameters in Cell Ranger suite. For Quality Control (QC), we filtered out genes detected in less than 3 cells and cells where < 200 genes had nonzero counts. Also, we removed cells having a percentage of total unique molecular identifiers (UMIs) that derived from mitochondrial genome greater than 10%, as such data may result from conditions such as cells undergoing stress and cell death. Filtered data were normalized and log-transformed; later, the logtransformed matrix was used for all downstream analysis. The cluster identities defined by K-means clustering in Cell Ranger Based on Louvain Modularity optimization algorithm. DDRTree method was adopted for pseudo time analysis; generally, genes with the top 1500 highest standard deviations were obtained as highly variable genes. These genes were used for dimensionality reduction utilizing Principle component analysis (PCA). Using the first 10 principal components as input, t-distributed stochastic neighbor embedding (t-SNE) was used for data visualization in 2 dimensions.

Cluster cell type annotation and marker genes identification
We used the feature plot function to highlight the expression of known marker genes to identify clusters. Violin plots for the given genes were generated using the Seurat toolkit VlnPlot function. Also, Gene ontologies of DEGs were assessed using the PANTHER Overrepresentation Test (Gene Ontology Consortium, http://www.geneontology.org/), with the reported P values corrected by the Bonferroni adjustment for multiple comparisons.

Statistical analysis
Major cell type-specific markers were screened by Seurat's Bimod likelihood ratio statistical test, satisfying P-value .05 and greater than or equal to 2 times the differential expression range screens the differential gene between the specified cell population and the remaining cell population. These markers are further subjected to bidirectional hierarchical clustering of genes and samples. According to log2FC, A heat map of the top 10 genes that were unique to each cluster was displayed.

Results
A deep transcriptional map of cancer cell states and gene expression in CRC patient at single-cell level was generated after the performance of scRNA-seq on cancer tissue from a CRC patient. The 2824 cells were obtained for further analysis. The median number of UMIs and genes detected per cell were 11,731 and 21,111. 69.6% of sequencing saturation rate indicated comprehensive sampling of the available transcripts. Above data can be seen from Figure 1.

Cell markers applying to each cluster
A total of 5 clusters were recognized after scRNA-seq (Fig. 2). Upon analyzing specific genes in each cluster, cell markers can be placed to each one of them. After careful research, it was determined that cluster 1 mainly consists of immune cells, since IGL@ and IGH@ both recognize foreign antigens and initiate immune responses (Fig. 3). cluster 2 contains specific genes related to the major histocompatibility complex (see

Major cell type-specific markers
To further differentiate between each cluster, analysis were performed on differentially expressed genes between cell types based on mean expression and covariance patterns; we discovered marker gene sets were sufficient to identify cell type with high probability uniquely. A heat map of the top 10 genes that were unique to each cluster based on average log fold-change showed a high degree of heterogeneity between the clusters (Fig. 4). Cluster 1 was distinguished by having a unique set of genes such as IGLC7, IGLC2, IGLC3. Cluster 2 possesses unique  genes such as HLA-DRA, IGHM, IGHG2. Other 3 clusters also define themselves with unique genes as well. It can also be witnessed that these top 10 genes unique to each cluster have different UMI counts. Moreover, the top 10 most distinct signature genes in each cell type were presented as violin plots and compared for the differences between each cluster within the Cancer tissue.

Differential expression genes-GO/KEGG analysis
To further investigate CRC disease-associated cell functional states and potential molecular regulators, gene ontology (GO) analysis of differentially expressed genes (DEG) in cancer tissue. Cluster 1 revealed a higher proportion of biological processes genes, including, ATP synthesis, cellular respiration, oxidative phosphorylation, and mitochondrion organization (Fig. 5). . KEGG pathway analysis also demonstrated disease enrichment in each cluster. It revealed that cancer-related disease present in all 5 clusters, such as neoplasm metastasis, colorectal cancer, malignant neoplasm of breast, etc; This confirmed the reliability of the experimental results and indicated that CRC is a highly complex disease.

Cell states displayed heterogeneity in CRC
We generated trajectory plots to determine the relatedness of cells types and states. Then, several trajectories were constructed in pseudo temporal order; this can provide us with a clear view of both branched and linear differentiation. Theoretically, cells resided on the same or adjacent branches are believed to be more hierarchically related than cells on neighboring branches in a specific trajectory plot. Thus, it was revealed that there were 5 distinct cell clusters in cancer tissue. As shown in Figure 6, cluster 2 to 5 presents in state 1; cluster 1, 2, and 4 presents in state 2; and cluster 2, 4, and 5 presents in state 3. It revealed that some clusters overlapped in different states.

Discussion
CRC is one of the most concerned types of cancer in the world. Pathogenesis is still not clearly understood; many believe genetic and environmental factors contribute to this disease. However, the role of each type of signaling molecules and immune cells plays in this disease remains obscure. Sc-RNAseq is the new technology that can prevent aberrant signals from overlooking and provide more insight into cells itself for more synthesized observation and analysis. [18] Here, we report comprehensive single-cell expression profiling of cancer tissue of CRC patient using this technology. The 2824 cells were obtained and further classified into 5 distinct cell clusters whereas each cluster was made up of a different number of cells. We suspect that the tumor of CRC may display heterogeneity. This point of view becomes more evident as the analysis went on. Initially, the analysis revealed that cell markers can be applied to each cluster; for example, Cluster 2 mainly consist genes related to Major Histocompatibility Complex (MHC), while the remaining 4 clusters possess cell markers uniquely to themselves. Also, the generated heatmap displayed heterogeneity between each cluster, due to each distinct cluster defined themselves with a unique set of genes; This supported the hypothesis of heterogeneity in the tumor of CRC. Another evidence that supports the above viewpoint was the results from GO term analysis and KEGG pathway analysis. GO term analysis demonstrates that each cluster serves different functions. GO term analysis of cluster 1 revealed that there were genes responsible for biological processes including, ATP synthesis, cellular respiration, energy derivation and so on; other genes are cellular components; remaining part of those genes were responsible for molecular function, serving to bind protein, RNA, dehydrogenate NADH, adhesion of cells. Other 4 clusters that underwent GO term analysis indicated similar pattern whereas genes in the same cluster served different functions. Moreover, GO term analysis demonstrated the relatedness towards the tumor differs for each cluster. Apart from Cluster 1 written above, Cluster 3 and 4's genes mainly served to support cells (providing energy, generating extracellular matrix) Genes of cluster 2 and 5 emphasized more on immunity; functions included immune response, regulation of lymphocyte, leukocyte, T-cell activation and so on indicating these 2 clusters were more related to the tumor. However, KEGG pathway analysis of disease enrichment revealed that carcinoma diseases presented in all 5 clusters. These discoveries proved that, apart from diversities of genes within a cluster, each cluster's relatedness to the tumor also differs. Again, such pieces of evidence enforced the idea that tumor of CRC displayed numerous heterogeneity. Trajectory plot analysis was another piece of evidence demonstrating CRC's heterogeneity. As stated above, trajectory plots were analyzed based on clusters, states, and pseudo-time. Results showed several clusters were present in the same state, while a specific cluster spans over several states. Also, clusters were observed overlapping in several pseudo time regions. For example, Cluster 2 exists in all 3 states; Cluster 1, 2, and 4 all present in state 3; This lead to the conclusion that tumor of CRC was indeed heterogeneous, and different clusters participated disproportionally within a tumor.
In conclusion, the sc-RNAseq of cancer tissue from CRC patient allowed for unbiased, readjusting identification of distinct cell types and states and can provide us with a detailed insight. By utilizing this technology, analysis towards clusters, GO term, KEGG and Trajectory plot supported the viewpoint that CRC is indeed heterogeneous in numerous aspects. During the research, it was somehow regrettable that only one case of Stage III CRC can be collected. In addition, further analysis such as qPCR was not able to perform. Combining further analysis with comparison between the tumor and cancer peripheral tissue will probably yield more promising evidence. Thought the fact, existing data can still present a comprehensive view and insight towards CRC in a whole; others can utilize the data to develop more strategies against CRC.