Characterization of cell-type specific circular RNAs associated with colorectal cancer metastasis

Abstract Colorectal cancer (CRC) is the most common gastrointestinal malignancy and a leading cause of cancer deaths in the United States. More than half of CRC patients develop metastatic disease (mCRC) with an average 5-year survival rate of 13%. Circular RNAs (circRNAs) have recently emerged as important tumorigenesis regulators; however, their role in mCRC progression remains poorly characterized. Further, little is known about their cell-type specificity to elucidate their functions in the tumor microenvironment (TME). To address this, we performed total RNA sequencing (RNA-seq) on 30 matched normal, primary and metastatic samples from 14 mCRC patients. Additionally, five CRC cell lines were sequenced to construct a circRNA catalog in CRC. We detected 47 869 circRNAs, with 51% previously unannotated in CRC and 14% novel candidates when compared to existing circRNA databases. We identified 362 circRNAs differentially expressed in primary and/or metastatic tissues, termed circular RNAs associated with metastasis (CRAMS). We performed cell-type deconvolution using published single-cell RNA-seq datasets and applied a non-negative least squares statistical model to estimate cell-type specific circRNA expression. This predicted 667 circRNAs as exclusively expressed in a single cell type. Collectively, this serves as a valuable resource, TMECircDB (accessible at https://www.maherlab.com/tmecircdb-overview), for functional characterization of circRNAs in mCRC, specifically in the TME.


INTRODUCTION
Tumor growth in the colon and rectum, classified as colorectal cancer (CRC), currently ranks third in terms of incidence and second in terms of mortality globally (1,2). In the United States, CRC is the most common gastrointestinal malignancy, constituting almost 10% of all cancer cases and deaths. Approximately 60% of patients with CRC develop metastatic disease (mCRC), most commonly in the liver (2)(3)(4). Although early-stage CRC is curable with surgery and adjuvant therapy, mCRC is usually lethal, with a 5-year survival rate of 13% (5).
To date, CRC research has primarily focused on the dysregulation of protein-coding genes to identify oncogenes and tumor suppressors as potential diagnostic and therapeutic targets (6). One emerging class of noncoding RNAs, circular RNAs (circRNAs), has been shown to be functionally implicated in CRC tumorigenesis. CircRNAs are single-stranded circular molecules averaging 500 nucleotides in length, with a covalent 3 -5 bond formed in the process of backsplicing, where a downstream splice donor on a pre-mRNA pairs with an unspliced upstream splice acceptor (7)(8)(9). Though intron retention can happen in a small percentage of circRNAs, exonic circRNAs make up the majority of annotated circRNAs (10,11). The lack of an open 3 end makes circRNAs particularly stable compared to their linear counterparts because they are resistant to exonucleolytic decay by the cellular exosome ribonuclease complex (11). Moreover, the higher stability and longer half-life of circRNAs relative to mRNAs allow them to accumulate in exosomes and blood, thereby making them promising potential biomarkers for noninvasive detection of cancer metastasis (12)(13)(14)(15)(16). Therefore, characterizing cir-cRNAs, elucidating their function and assessing their clinical applicability could significantly impact cancer diagnosis and treatment.
CircRNAs have been found to affect a diverse range of molecular events, with their most heavily theorized function being microRNA (miRNA) sponges that counteract their target miRNA's inhibitive effects (11). However, most reported miRNA sponges were derived from bulk RNA sequencing (RNA-seq) data, without accounting for the celltype specificity of the circRNA and miRNA involved, i.e. the spatial origins of a circRNA in the tumor microenvironment (TME). The TME is described as the 'soil' for metastatic tumor cell growth, as the intricate crosstalk between cancer cells and their surrounding structure contributes greatly to tumor initiation, development and therapeutic response (17). CiRS-7 is one of the most welldocumented miRNA sponges, but, strikingly, Kristensen et al. discovered that it may be able to function through alternative mechanisms depending on its specific location within the TME. They reported that ciRS-7 was completely absent in CRC cancer cells, but highly expressed in stromal cells. The putative 'sponging' target miR-7, on the other hand, was only found to be expressed in cancer cells, rendering the miRNA sponge mechanism implausible (18). As this discovery poses significant challenge to the prevailing miRNA sponge hypothesis, it is evident that the spatial origin of circRNAs should be determined first before any functional investigation.
Many recent studies have shown that many circRNAs are implicated in crucial hallmarks of CRC tumorigenesis, such as sustaining proliferative signaling, avoiding immune destruction, activating invasion and metastasis, and evading cell death and senescence (4,14,. While transcriptome sequencing has provided an unbiased method for discovering circRNAs, existing large-scale sequencing projects such as The Cancer Genome Atlas Network predominantly consist of primary tumors, thereby lacking matched metastatic samples (59). This represents a critical barrier to discovering circRNAs altered throughout the progression of primary to metastatic disease.
Furthermore, multiple obstacles have yet to be overcome in the study of cell-type specific expression of circRNAs. Most known single-cell RNA-seq (scRNA-seq) protocols require poly(A) selection, without which the sequencing data tend to be very noisy (60). As circRNAs lack open ends, poly(A) selection renders capturing circRNAs difficult. Existing digital flow cytometry tools that enumerate cell-type composition or infer cell-type specific gene expression such as CIBERSORT are typically limited to comparing bulk gene expression profiles with gene signatures established from the linear transcriptome, without representing circRNAs. Hence, the scarcity of scRNA-seq data that include circRNA cell-type markers has contributed to a lack of ground truths of cell-type specific circRNA expression (61,62).
To address this, we performed RNA-seq of matched normal, primary and distant metastatic tissues from CRC patients and conducted a meta-analysis to discover differentially expressed (DE) circRNAs in metastatic tumors compared to primary tumors, termed circular RNAs associated with metastasis (CRAMS). Further, to overcome the limitations of determining cell-type specific expression of cir-cRNAs in the TME, we performed cell-type deconvolution followed by a non-negative least squares (NNLS) model to estimate cell-type specific expression of circRNAs (62,63). From the integrative analysis results, we highlighted cir-cRNAs that showed significant dysregulation and/or celltype specificity in mCRC progression. The cell-type specific expression data of identified circRNAs are available at the TMECircDB resource: https://www.maherlab.com/ tmecircdb-overview.

Patient samples and total RNA-seq
To evaluate circRNAs in CRC metastatic progression, we collected specimens from mCRC patients and performed RNA-seq. Patients (N = 14) were enrolled at Washington University School of Medicine in St Louis (WUSTL) and informed consent was obtained under an Institutional Review Board (IRB)-approved protocol. Adjacent normal colon [normal (N); n = 8], primary CRC [primary (P); n = 6], liver metastasis [metastasis (M); n = 13] and normal liver (NL; n = 3) tissues were resected from 14 stage IV mCRC patients and fresh frozen prior to RNA extraction. Input RNA quantity was 1 g except for samples 15 (0.5 g) and 27 (0.24 g) due to low volume (Supplementary  Table S1). RNA-seq libraries were constructed using the Illumina TruSeq Stranded Total RNA Library Prep kit with IDT xGen Dual Index UMI Adapters. Paired-end sequencing was performed on Illumina HiSeq X.

Cell culture and cDNA-capture RNA-seq
To evaluate circRNA expression in cell lines, we performed RNA-seq of five cell lines commonly used in CRC research. HCT116 and HT29 cells were grown in McCoy's medium (Invitrogen, Carlsbad, CA) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin. SW480 and SW620 cells were grown in DMEM (Invitrogen) supplemented with 10% fetal bovine serum (Sigma, St Louis, MO) and 1% penicillin/streptomycin. LoVo cells were grown in DMEM/F12 (Invitrogen) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin. Total RNA inputs of 1 g were used to generate libraries using the New England Biolabs NEB Ultra II Directional RNA Library  Prep for Illumina with rRNA Depletion kits. To construct  the cell line reference database, 200 ng aliquots of each library were pooled and used as input for cDNA capture. The  IDT xGen Exome Research Panel v2 including 415 115 exon  probes was used for exome capture and paired-end sequencing was performed on Illumina HiSeq X. For each cell line sample, uniquely mapped reads and spliced reads were compared with exon probes from IDT xGen Exome Research Panel v2 using BEDtools v2. 29.0 overlap to identify on-and off-target reads that were subsequently used to calculate cDNA-capture sequencing ontarget rates (64).

RNA-seq read alignment and circRNA quantification
RNA-seq reads were aligned to the human genome reference assembly version GRCh38. Gene annotations were combined from Gencode v23, RefSeq downloaded from the UCSC Genome Browser and the Broad long noncoding RNA (lncRNA) catalog as described in Silva-Fisher et al. (65). Redundant transcripts were removed and overlapping transcripts were assigned to the same gene.

DE analysis
To identify circRNAs associated with CRC tumor and metastatic progression, DE analysis was carried out comparing normal, primary and metastatic tissues via edgeR v3.30.3, using raw counts of linear genes and circRNAs (78). All linear and circular transcripts were combined and prefiltered [≥5 transcript per million (TPM) in 25% of all samples for linear genes; ≥1 backspliced read in 15% of all samples for circRNAs]. All transcripts were compared between primary versus normal (PvN), metastasis versus primary (MvP) and metastasis versus normal samples (MvN). A circRNA with P < 0.05, |log fold change (FC)| > 0 and FDR adjusted P ≤ 0. 15 (79,80). Cir-cRNAs characterized by CIRIquant to be differentially expressed (P < 0.05) in the same comparison group and in the same direction (up-or downregulated) and/or circRNAs found by CircTest to be differentially expressed (P < 0.05) in the same comparison group were considered validated in parallel.

Cell-type deconvolution
To estimate the cell-type composition of the WUSTL patient cohort, we built scRNA-seq-based signature matrices for deconvolution with CIBERSORT's 'Create Signature Matrix' module using scRNA-seq datasets of six Belgian patients downloaded from Lee et al.  Table S4). The Lee 2020 scRNA-seq cohort encompassed six major cell types in the CRC TME, Figure S2A Table S5) (62,81,82).

Cell-type expression specificity analysis
Cell-type proportion estimates from CIBERSORT were used to build an NNLS statistical model to estimate the celltype specific expression value of circRNAs based on their bulk expression (in normalized backspliced reads) values ( Figure 5A, step 6). The Lawson-Hanson algorithm for the NNLS model using R package nnls v1.4 was applied for each sample to produce the estimated coefficients of celltype specific expression for each circRNA as well as each linear gene ( Figure 5A and Supplementary Figure S3A). In short, each transcript's bulk expression was modeled as a constrained linear least squares regression of the celltype proportions where all estimated coefficients in the regression must be non-negative (Supplementary Figure S3A) (63). Two models were constructed for both scRNA-seq references, termed the Lee 2020 model and the Li 2017 model. For benchmarking purposes, both models were also run to estimate cell-type specific linear gene expression values.

Computational model benchmarking
To assess the performance of the NNLS models, we reconstructed 'bulk' expression from each model's estimated celltype specific expression of both linear genes and circRNAs, were enriched for exonic circRNAs through cDNA capture before RNA-seq, followed by computational analysis to quantify circRNAs. CircRNAs catalogued in cell lines will be used for future validation of highlight candidates discovered in patient samples. Thirty matched CRC patient samples from 14 patients underwent total RNA-seq, whereby both circRNAs and linear genes were quantified and normalized to construct expression matrices. DE analysis was performed to identify circRNAs with altered expression in mCRC. Six patient samples from the Lee 2020 scRNA-seq cohort and 11 patient samples from the Li 2017 scRNA-seq cohort were used to build two separate signature matrices to train CIBERSORT, thereby estimating cell-type proportions in TME and cell-type specific linear gene expression, respectively. The circRNA expression matrix and the TME cell-type proportions were used to estimate cell-type specific circRNA expression via NNLS statistical models. Likewise, cell-type specific linear gene expression was estimated using the linear gene expression matrix and TME proportions. The NNLS model estimations were benchmarked by comparing reconstructed 'bulk' expression of circRNAs and linear genes with the ground truth values. Collectively, DE and cell-type specific circRNAs were identified as CRAMS.
which was then compared with the ground truth to evaluate how accurately the model recaptures the real expression values ( Figure 5B-E, Supplementary Figure S3B and C, and Supplementary Tables S7 and S8).
For linear genes, modeled 'bulk' expression (matrix L) of every gene in every sample was calculated through matrix multiplication of NNLS estimated coefficients (matrix B) and CIBERSORT estimated cell-type proportions (matrix p). The modeled 'bulk' expression (matrix L) was compared with the ground truth bulk expression (matrix l). A linear regression model was used (R function lm) to evaluate the correlation of expression between modeled and ground truth bulk data for each gene (resulting in P-value and R 2 ). Pearson correlation coefficients were calculated using R function cor.test for each sample (Supplementary Figure S3B and Supplementary Table S8).
For circRNAs, modeled 'bulk' expression (matrix N) of every circRNA in every sample was calculated through matrix multiplication of NNLS estimated coefficients (matrix C) and CIBERSORT estimated cell-type proportions (matrix p). Modeled 'bulk' expression (matrix N) was then compared with the ground truth bulk expression (matrix n). A linear regression model was used (R function lm) to evaluate the correlation of expression between modeled and ground truth bulk data for each circRNA (resulting in P-value and R 2 ). Pearson correlation coefficients were calculated using R function cor.test for each sample (Supplementary Figure  S3C and Supplementary Table S7).
Cell-type specific circRNAs with supporting evidence through scRNA-seq studies were downloaded from circSC (83)(84)(85). A total of 3843 circRNAs from cancer cell lines and colon tissue from two studies were used in comparison with cell-type specificity prediction from our analyses (Ar-rayExpress accession code E-MTAB-6072 and GEO accession code GSE51254).

RNA-seq of matched patient cohort identifies abundance of circRNAs throughout mCRC progression
We performed total RNA-seq using 30 samples from 14 stage IV mCRC patients from Washington University in St Louis, termed the WUSTL patient cohort (Figure 1, central column, Figure 2A and Supplementary Table S1 Table S1).
A total of 28 670 unique circRNAs were detected in all samples, with 13 238 detected in normal, 10 226 in primary, 14 831 in metastasis and 5576 in normal liver. On average, 2291 unique circRNAs were detected in each sample; within each tissue type, the mean numbers of unique circRNAs detected were 2700 in normal, 2433 in primary, 1965 in metastasis and 2323 in normal liver. A total of 4820 unique cir-cRNAs were shared by tumor samples (P, M); 3360 were shared by normal and tumor samples (N, P, M). Overall, 1644 unique circRNAs were shared between all tissue types ( Figure 2B).
For downstream analyses of DE and cell-type specificity, RNA-seq reads mapping to linear genes were also quantified (68). A total of 97 847 annotated linear genes were detected, whereby 7176 (7% of expressed genes) were found to have at least one circular isoform. These genes were considered as the parental genes of their corresponding cir-cRNA(s). In general, most detected circRNAs had low expression compared to linear genes. In linear genes, the upper quartile value of mean normalized reads was 1.68 TPM in all samples. On the contrary, the upper quartile value of mean normalized backspliced reads of circRNAs was 0.004 ( Figure 2E).

CircRNA catalog in CRC cell lines
We established a cell line reference catalog by sequencing five common CRC cell lines, including three derived from primary tumors (HCT116, HT29 and SW480) and two from metastatic tumors (SW620 and LoVo), termed WUSTL cell lines ( Figure 1, left column). All cell lines underwent cDNA-capture sequencing, as previously described, followed by circRNA quantification (86). The targeted sequencing achieved high on-target rates (97% for all mapped reads and 92% for spliced reads) (64) (Supplementary Table S9).
A total of 32 469 unique circRNAs were detected in all cell lines, originating from 6666 parental genes. A total of 1325 circRNAs were found in common in all cell lines (Supplementary Figure S1A). Out of the 28 670 unique circR-NAs from the WUSTL patient cohort, 13 270 (46%) were expressed in at least one cell line sample, while 1289 (4%) were expressed in all cell line samples ( Supplementary Figure S1B). All circRNAs detected in cell lines were consolidated into a reference database to serve as a roadmap for future mechanistic and phenotypic validation using cell culture.

Discovery of novel circRNAs beyond existing annotation
To identify circRNAs that have yet to be characterized in CRC patient samples, we compared our results with the CRC cohort from the pan-cancer circRNA consortium MiOncoCirc ( Figure 2C It is worth noting that many circRNA databases only characterize unique circRNAs by their backsplice junction coordinates, without taking into consideration possible different isoforms arising from the same backsplice junction. Hence, the true percentage of novel, unannotated circRNAs in our datasets may be >14%.

DE analysis highlights circRNAs associated with metastasis
To characterize circRNAs altered in the mCRC progression, we performed DE analysis using edgeR comparing normal, primary and metastatic tissues from the WUSTL patient cohort ( Figure 3A). All expressed linear genes and circRNAs were pooled to identify DE circRNAs in CRC patients (78). We identified 362 DE circRNAs in total, of which 55 showed upregulation in at least one of three comparison groups (PvN, MvN and MvP) and 308 showed downregulation ( Figure 3B and Supplementary Tables S2 and S3). A total of 69 circRNAs were consistently dysregulated in two or more comparison groups in the same direction, including 11 upregulated and 58 downregulated (Figure 3B and Supplementary Tables S2 and S3). In addition, 319 (88%) DE circRNAs were expressed in at least one cell line sample (Supplementary Table S3).
To provide additional evidence for the DE status of these candidates, we performed two parallel DE tests using CIRIquant and CircTest on the same cohort of circRNAs detected from the WUSTL patient cohort (79,80). Among the 362 DE circRNAs, 276 (76%) showed at least one concurring test, with 129 (36%) validated by both tests ( Figure 3C and Supplementary Table S3).
Several previously characterized circRNAs that are associated with mCRC or other types of metastatic cancer were shown to be differentially expressed with supporting evidence from at least one other DE test (Figures 3A and C and 4A). This included downregulated candidates cir-cBNC2, circDDX17, circSETD3 and circFBXW7, which have been reported to have tumor suppressing roles by inhibiting migration, invasion and proliferation of metastatic cancers (24,48,(87)(88)(89)(90). Among the upregulated candidates, circFIRRE and circCEP128 have been shown to promote cancer growth in osteosarcoma and bladder cancer, respectively (91,92). On the contrary, circC3P1, found to be upregulated in metastatic samples, has been reported to be downregulated in hepatocellular carcinoma, warranting further investigation (93).
In general, more circRNAs showed lowered expression in primary or metastatic tumors compared with normal tissues ( Figure 4A). For example, DE circRNAs in metastasis compared with normal tissues had a negative mean log FC (−0.532), indicating a global reduction of abundance ( Figure 4B, middle panel). A significant portion (29%) of DE circRNAs also had their parental linear genes found to be dysregulated in the same direction (3% both upregulated and 26% both downregulated). Interestingly, a large portion of circRNAs (57%) were found to be dysregulated while their parental linear gene expression was unchanged (53%) or dysregulated in the opposite direction (4%), suggesting alternative mechanisms of circRNA dysregulation ( Figure 4B and Supplementary Table S10). A similar trend of reduced abundance was observed when comparing primary tumors with normal tissues, and when comparing metastatic tumors with primary tumors (Figure 4A and B, and Supplementary  Table S10).

Integrative deconvolution models to estimate cell-type specific circRNA expression
To better understand circRNAs dysregulated in CRC in the context of TME, we evaluated their cell-type specific expression in bulk patient RNA-seq. To do so, we leveraged existing patient scRNA-seq data from two independent cohorts, termed the Lee 2020 scRNA-seq cohort (six patients, six cell types, GEO accession code GSE144735) and the Li 2017 scRNA-seq cohort (11 patients, seven cell types, GEO accession code GSE81861), to build signature matrices (Figure 1, right column, Supplementary Figure S2A and B, and Supplementary Table S4) (81,82). Using both datasets in parallel, the bulk linear gene expression of the WUSTL patient samples was deconvolved via CIBERSORT using the signature matrices constructed with the scRNA-seq references ( Figure 5A, steps 3-5, Supplementary Figure S2A and B, and Supplementary Table S5) (62).
The resulting cell-type proportion estimates from CIBERSORT were then used to train an NNLS statistical model to estimate the cell-type specific expression value of circRNAs based on their bulk expression (in normalized backspliced reads) values ( Figure 5A, step 6). Two models were constructed for both scRNA-seq references, termed the Lee 2020 model and the Li 2017 model. For benchmarking purposes, both models were also run to estimate cell-type specific linear gene expression values.
Both the Lee 2020 model and the Li 2017 model achieved comparable accuracy in recapturing the ground truth expression per transcript, as well as per sample. In general, both models performed better in estimating linear gene expression than circRNA expression. When comparing between transcripts, a higher percentage of linear genes (P < 0.05) was predicted by the Lee 2020 model (93%) than by the Li 2017 model (90%). The percentage of circRNAs (P < 0.05) from the Lee 2020 model (55%) was also greater than that from the Li 2017 model (46%). There was an overall increase in the linear regression R 2 values in the Lee 2020 model as compared to the Li 2017 model in linear genes with significant P-values, with R 2 ranging from 0.126 to 0.997 with a mean of 0.592 in the Lee 2020 model and R 2 ranging from 0.126 to 0.997 with a mean of 0.586 in the Li 2017 model ( Figure 5B and Supplementary Table S6). In circR-NAs, however, the Li 2017 model performed better in recapturing bulk circRNA expression, with Lee 2020's R 2 ranging from 0.126 to 0.859 with a mean of 0.226 and Li 2017's R 2 ranging from 0.126 to 0.916 with a mean of 0.266 for circRNAs with significant P-values ( Figure 5C and Supplementary Table S6).
When comparing between samples, both models yielded robust real bulk expression estimations of both linear genes and circRNAs, exemplified by all samples having P < 0.05 in the association test. In linear genes, the two models performed nearly identically, with the Pearson  Table S7).
We also investigated the cell-type specific expression of ciRS-7 as a positive control and found evidence corroborating Kristensen et al.'s conclusion that ciRS-7 is only found in stromal cells and is absent in CRC epithelial cells (18). The NNLS model trained using the Lee 2020 data estimated ciRS-7 to have the highest expression in stromal cells and very low expression in epithelial cells, whereas the model trained using the Li 2017 data estimated ciRS-7 to have high abundance in endothelial cells, a subtype of stromal cells, and zero expression in epithelial cells ( Figure 5F). Interestingly, only 17% of all detected circRNAs were estimated to be expressed in epithelial cells, with 7% shared between both models, 6% in the Lee 2020 model only and 4% in the Li 2017 model only. Only 167 DE (46%) candidates were predicted to be expressed in epithelial cells, warranting further study of the exact mechanisms of their dysregulation in cancer (Supplementary Table S6).

NNLS model results reveal cell-type specific circRNAs in CRC TME
Based on the Lee 2020 model predictions, 7391 out of 28 670 (26%) unique circRNAs only showed expression in one cell type, while the Li 2017 model predicted 7458 (26%). The consensus between both models showed 4123 (14%) circR-NAs with exclusive expression in a single cell type. To select those with significant cell-type specificity, we required the following criteria: P < 0.05 and R 2 above the median R 2 of all circRNAs in NNLS estimates. This narrowed the number of cell-type specific circRNAs to 3652 (13%) in the Lee 2020 model and 1725 (6%) in the Li 2017 model, and a consensus of 1217 (4%) between models ( Figure 6A). To further consolidate the cell-type specific circRNAs, we required a circRNA to show cell-type specificity in the same cell type in both models. Given the differences in cell types annotated by the Lee 2020 and Li 2017 models, we considered macrophages as a subtype of myeloid cells, and fibroblasts and endothelial cells as subtypes of stromal cells. A total of 667 candidates fulfilled the criteria as consensus cell-type specific circRNAs ( Figure 6A). While our cell-type specificity results were derived computationally from bulk RNA-seq data, we compared our predictions with cell-type specific circRNAs reported in the recently published database circSC (83). Sixteen circR-NAs predicted by our pipeline to be cell-type specific were present in circSC, and four (25%) were found to be specific in the same cell types as shown by scRNA-seq studies. For example, circSRFBP1, the candidate with the highest specificity to epithelial cells based on NNLS P-values, was reported to be specific to CRC cells ( Figure 5G and Figure 6C, Supplementary Figure 4D and Supplementary Table S11).
When compared with DE circRNAs, two upregulated cir-cRNAs, CRAMS1 and circCDKAL1, were estimated to be expressed in one cell type only ( Figure 6A-C and Supplementary Figure S4A-C). CircCDKAL1 was among the consensus of both NNLS models to be exclusively expressed in myeloid cells, while CRAMS1 expression was exclusive to T cells ( Figure 6A, Supplementary Figure S4A and B, and Supplementary Table S6). Markedly, CRAMS1 was predicted to have the highest specificity to T cells based on NNLS P-values ( Figure 6C).
Many DE circRNAs have ample literature evidence supporting their dysregulation in CRC or other cancers (Figure 3A). However, through our cell-type specificity analysis, neither circCDKAL1 nor CRAMS1 has been reported to be associated with metastasis. When examining the linear parental genes, CDKAL1 is known to be implicated in endometrial cancer tumorigenesis (94). On the other hand, CRAMS1's parental transcript CRCAT 00081448 is a novel lncRNA characterized by our lab's de novo assembly in mCRC patient samples, however, without previous report on its role in mCRC progression ( Figure 6B) (65). The scarcity of existing evidence necessitates the need for future studies of these dysregulated, TME cell-type specific candidates in CRC.

DISCUSSION
Our study has addressed many limitations of existing CRC research. To date, there has not been a comprehensive, spatial expression profile for circRNAs in mCRC. For example, the pan-cancer database of expressed circRNAs across cancers, MiOncoCirc, consists of 868 patient samples, of which 217 (25%) are from prostate cancer, whereas only 13 (1.5%) samples are from CRC. Further, since these CRC samples were not curated from adjacent sites of normal, primary and metastatic tissues from the same patients, it proved challenging to systematically identify dysregulated circRNAs throughout mCRC progression (74). Our unique matched normal, primary and metastasis samples from mCRC patient cohorts allowed us to discover circRNAs altered throughout mCRC progression. With our integrated computational approach, we not only discovered circRNAs altered in mCRC, but also elucidated their potential celltype specificity. We envision that these novel findings will serve as a resource for future mechanistic and functional studies in circRNAs in relation to the TME of CRC. Our computational pipeline can be adapted to study circRNAs in other tissues and diseases as well, serving as a versatile tool. In addition, we established an extensive collection of circRNAs in CRC cell lines that can be used to delve into the oncogenic roles of candidate circRNAs identified above (Supplementary Figure S1A and B, and Supplementary Table S9).
We performed transcriptome profiling of a matched cohort of mCRC patients and characterized CRAMS. We detected 28 670 unique circRNAs, 51% of which were isoforms unannotated in MiOncoCirc ( Figure 2C). Through DE analysis, we highlighted 362 CRAMS that are DE cir-cRNAs throughout mCRC progression ( Figure 3A and B, and Supplementary Tables S2 and S3). Three-quarters of CRAMS were supported by orthogonal validation in spite of the intrinsic differences between CIRIquant, CircTest and our DE analysis approach. Despite providing the CIRCexplorer2 detection output as guidance, CIRIquant is based on different alignment and read assembly software than that in our pipeline, resulting in slightly different read counts for the downstream DE analysis (79). On the other hand, CircTest solely relies on the parental linear transcripts that have corresponding circRNAs, thereby omitting a large number of linear genes (90 671 genes, 93% of all expressed genes) and affecting the scaling and normalization of the read counts during the DE analysis (80). We wish to use these supporting test results as robust evidence for the CRAMS identified and emphasize candidates 12 NAR Cancer, 2023, Vol. 5, No. 2 A B C that have high confidence when validated by more than one pipeline. A vast majority of DE circRNAs were downregulated in primary or metastatic tumors, which could be attributed to the faster cell proliferation diluting their global abundance as the cancer progresses, or increased export to exosomes due to their high stability (14,74). Comparing circRNAs and their parental linear genes showed that only approximately one-third of the DE circRNAs were co-dysregulated with their parental genes, whereas the rest of them had parental genes that were unchanged or even dysregulated in the opposite direction ( Figure 4B). This suggests while cir-cRNA expression could be attributed to parental gene regulation, it could also be independent of their parental gene in some cases. Therefore, further investigation is needed to precisely define how circRNAs are regulated in mCRC.
Identifying the spatial origin and cell-type specific expression of circRNAs is paramount to understanding their biological roles as potential TME regulators. To address the lack of cell-type deconvolution methods involving circR-NAs, we devised a unique computational approach to identify cell-type specific circRNAs associated with the TME of CRC. By combining deconvolution analysis using two published scRNA-seq datasets and building an NNLS statistical model, we discovered that 667 circRNAs were exclusively expressed in one cell type only. Our results were congruent with earlier findings by highlighting known celltype specific circRNAs such as ciRS-7 being exclusive to stromal cells ( Figure 5F) (18). In particular, four predicted cell-type specific candidates, such as the epithelial cellspecific circSRFBP1, had supporting scRNA-seq evidence from circSC despite the highly heterogeneous nature of the datasets included in circSC ( Figure 5G). Many studies in circSC focus on other cancers or diseases and are heavily skewed toward hematopoietic cell types that are absent in solid tumors like CRC, and only two out of 171 datasets from circSC contained cell types relevant to this comparison. Moreover, there is a lack of stromal and myeloid cell presence in these scRNA-seq datasets. Hence, we believe that our findings reported in TMECircDB will serve as valuable knowledge for cell-type specific circRNAs that are unique to the TME of metastatic CRC.
Through benchmarking, we showed that the NNLS models derived from both the Lee 2020 and Li 2017 scRNA-seq cohorts achieved highly robust performance in predicting cell-type specific expression by accurately recapturing bulk expression of linear genes. When applied to circRNAs, both models performed comparably, but only successfully recaptured the bulk expression in approximately half of the circR-NAs. In the circRNAs that yielded statistically significant predictions, the overall R 2 range was similar to that of the linear genes but with lower maximum R 2 values ( Figure 5B and C). However, the distribution of R 2 values in circRNAs was clustered below 0.2 with a large number of outliers in the upper end of the R 2 spectra ( Figure 5C). In linear genes, most R 2 values were clustered above 0.6 and there were no outliers in either model ( Figure 5B). This is likely due to the overall low abundance of circRNAs as compared to linear genes, hence providing less information about their expression profiles for the statistical models to estimate cell-type specificity ( Figure 2E). Despite this caveat, our analysis pre-dicted 1217 consensus circRNAs with statistically significant cell-type specific expression, 667 of which were celltype specific in the same cell type as agreed upon by both models. This suggests that these consensus candidates were detected with high confidence even when the majority of R 2 values were low. Moreover, when comparing between samples, the Pearson correlation coefficients in linear genes were close to 1 in both models, showing nearly perfect correlation ( Figure 5D). In circRNAs, the median Pearson correlation coefficients also showed high correlation per sample, albeit lower than that in linear genes ( Figure 5E). This shows that both models performed robustly in recapturing the ground truth expression of circRNAs in each sample.
Among the DE and cell-type specific circRNAs, we highlighted several candidates that showed promising association with CRC metastasis (Figure 3A). With a global downregulation trend of circRNAs in cancer, the upregulated cir-cFIRRE and CRAMS1 are particularly worth investigating further. While circFIRRE's parental transcript FIRRE was also upregulated in primary and metastasis samples, suggesting parental gene regulation contribution, CRAMS1 was upregulated independent of its parental transcript CR-CAT 00081448, which was unchanged in mCRC ( Figure  6B). Furthermore, CRAMS1 displayed cell-type specificity in T cells, which could be a result of increased immune response in cancer (Supplementary Figure S4B). In addition, certain circRNAs could be involved in the infiltration of certain myeloid populations such as tumor-associated macrophages, which are known to promote metastasis, as suggested by Song et al. (17). Hence, we speculate that the myeloid cell-specific upregulated circRNA circCDKAL1 could be involved in facilitating the crosstalk of the CRC TME (Supplementary Figure S4A and C). As these candidates have little to no literature support in CRC research, there is substantial room for more experimental evidence to emerge. While our study provides an insightful resource, further experiment validation is required to show the mechanism and exact function of the CRAMS we have reported.

CONCLUSION
This research will have a broad overall impact on the field of RNA tumor biology by using human patients to provide a comprehensive resource for defining the landscape of circRNAs altered during metastatic CRC progression. Our computational methodology also enabled further deconvolution of circRNA expression into tumor microenvironment cell types that can guide future mechanistic studies of circRNAs contributing to mCRC.

DATA AVAILABILITY
The cell-type specific expression data of identified circR-NAs are available at the TMECircDB resource: https:// www.maherlab.com/tmecircdb-overview. The data analysis pipeline and related scripts are available at Zenodo (https: //doi.org/10.5281/zenodo.7884187) and GitHub (https:// github.com/ChrisMaherLab/TMECircDB). The RNA-seq data generated in this study (WUSTL patient cohort and WUSTL cell lines) have been deposited in the NCBI Gene Expression Omnibus database under the accession code GSE221240. The Lee 2020 scRNA-seq data referenced during the study are available in a public repository from the NCBI Gene Expression Omnibus under the accession code GSE144735. The Li 2017 scRNA-seq data referenced during the study are available in a public repository from the NCBI Gene Expression Omnibus under the accession code GSE81861. All the other data supporting the findings of this study are available within the article and its Supplementary Data and from the corresponding author upon reasonable request.