Transcriptomic analyses reveal three distinct molecular subgroups of Merkel cell carcinoma with differing prognoses

Merkel cell carcinoma (MCC) is a cutaneous neuroendocrine malignancy with a poor prognosis and an unknown cell of origin. Proffered cells of origin include epithelial stem cells of the hair follicle or interfollicular epidermis, dermal stem cells and pro/pre‐ or pre‐B cells. MCC has also been proposed to have more than one cell of origin and indeed to represent more than one type of carcinoma, currently grouped together due to phenotypic similarities. We explored the heterogeneous nature of MCC by studying the most variably expressed genes with the goal of identifying gene expression patterns that are either clinically relevant or have implications regarding the cell(s) of origin. We performed RNA sequencing on primary tumor samples from 102 patients and identified the top 200 most variably expressed genes. These genes and the tumor samples were hierarchically clustered based on their expression. The functions of three gene clusters exhibiting clearly divergent expression between samples were studied by cross‐referencing the lists of genes with online databases. High expression of a gene cluster related to embryonic developmental processes and low expression of a gene cluster related to neuroendocrine processes distinguished Merkel cell polyomavirus (MCPyV)‐negative tumors from MCPyV‐positive tumors. Furthermore, two prognostically relevant subgroups of MCPyV‐positive MCC were identified based on dichotomic expression of genes related to epidermal structures and processes. We identified three distinct molecular subgroups of MCC with prognostic relevance. We propose that the dichotomic expression of epidermis‐related genes might reflect both an epidermal and a nonepidermal origin for MCPyV‐positive MCC.

MCC, including two Merkel cell polyomavirus (MCPyV)-positive MCC subgroups that differed in prognosis, tumor size and expression of genes associated with epidermal structures and activities.
The latter potentially reflects both epidermal and nonepidermal origins of MCPyV-positive MCC.
Meanwhile, MCPyV-negative MCC exhibited elevated expression of genes related to developmental processes and reduced expression of genes associated with neuroendocrine functions.

| INTRODUCTION
Merkel cell carcinoma (MCC) is a rare, yet highly aggressive neuroendocrine cutaneous carcinoma. In approximately 80% of cases, the main oncogenic event is clonal integration of the DNA genome of Merkel cell polyomavirus (MCPyV) and subsequent expression of the viral small and large T oncoproteins. [1][2][3] In MCPyV-negative MCC, the main oncogenic mechanism is UV-radiation and the prognosis is significantly worse than that of MCPyV-positive MCC. 2,4 MCPyV-negative MCC has a considerably higher mutational burden than MCPyV-positive MCC. 4 Furthermore, epigenetic dysregulations, such as improper histone posttranslational modifications and DNA methylation, are implicated in the etiology of both MCPyV-positive and -negative MCC. [5][6][7] The cell of origin of MCC is yet unknown, with different proffered cells of origin including epithelial stem cells of the hair follicle or interfollicular epidermis, dermal stem cells and a pro/pre-or pre-B cell. 8 An increasingly more popular theory involves MCC having more than one cell of origin, such as MCPyV-negative MCC originating from the epidermis and MCPyV-positive MCC arising from dermal fibroblasts. 9 This would entail that the term MCC in fact describes more than one type of carcinoma, which are currently grouped together due to phenotypic similarities but should perhaps receive different treatments.
The argument for an epidermal origin of MCPyV-negative MCC is supported by the presence of a UV-signature as well as the occurrence of combined MCC, wherein MCPyV-negative MCC occurs in conjunction with another carcinoma component, commonly squamous cell carcinoma (SCC). 4 Evidence of clonality between the MCC and the SCC in situ components in combined MCC strongly suggests the ability of MCPyV-negative MCC to arise from SCC in situ. 10,11 The argument for a dermal fibroblast as the cell of origin of MCPyVpositive MCC is supported by the capability of active replication of MCPyV in these cells, the absence of a UV-signature and the lack of an epidermal connection in most cases of MCC. 12 The theory of MCC arising from pro/pre-or pre-B cells is based on the expression of B-lymphoid lineage markers, such as PAX5 and TdT, in both MCPyV-positive and -negative cases, as well as the expression of immunoglobulins and occasional immunoglobulin gene rearrangement in MCPyV-positive MCC. 13,14 In our study, we aimed to explore the heterogeneous nature of MCC by investigating the most significant differences in gene expres-  15 The study cohort comprised 24 male and 78 female patients and the median age at diagnosis was 79.5 years (range 46-100).
MCPyV detection from paraffinized tumor blocks was performed previously and is described in detail elsewhere. 2 Briefly, the presence of MCPyV DNA was analyzed from DNA extracted from representative deparaffinized tumor sections. Quantitation of MCPyV DNA was performed using real-time polymerase chain reaction. The relative DNA sequence copy number for each tissue sample was expressed as a ratio of MCPyV DNA to protein tyrosine phosphatase gamma receptor gene (reference gene) DNA. The sample was considered positive when MCPyV DNA copy number per reference gene was greater than 0.1.

| RNA extraction, sequencing and processing of sequencing data
The extraction and sequencing of RNA and the processing of sequencing data were performed previously and are described in detail elsewhere. 16

| Gene clusters
The genes in gene clusters A-C are listed in Table S1. The top 10 GO BP terms most significantly enriched by the genes in gene clusters A-C are presented arranged according to their P values in Tables 1-3 along with their corresponding q values and a list of the genes causing the enrichment. The q value is an adjusted P value calculated using the Benjamini-Hochberg method for correction for multiple hypotheses testing. Complete lists of all significantly enriched (P < .05) terms are presented in Tables S2-S4. Gene cluster A, hereafter referred to as the developmental gene cluster, comprised 19 genes, including the seven HOX genes HOXA9, A10, C4, C6, C8, C9 and C10. Among the top 10 most significantly enriched GO BP terms were such developmental processes as skeletal system development, anterior/posterior pattern specification, embryonic skeletal system morphogenesis, embryonic skeletal system KRT14  CDSN  KPRP  TRIM29  PKP1  SFN  DSP  LY6D  S100A14  KLK5  PERP  LYPD3  SLC15A1  ELOVL3  SEC14L6  SAA1  KRT15  KRT77  IFFO2  TFAP2C  LRATD1  KRT6A  KRT6B  KRT16  KRT17  GJB2  KRT6C  S100A9  S100A8  S100A7  S100A2  CSTA  SPRR2G  CXCL14  PI16  KRT2  DCD  COMP  LRRC15  COL11A1  IFI44L  TIGIT  CCL19  GNLY

| Clinicopathological characteristics of patient groups
Detailed clinicopathological data are presented according to patient group in Table 4. The study cohort comprised 102 patients. The Kruskal-Wallis analysis revealed a significant association between patient group and primary tumor size (P = .001), and pairwise analyses revealed that a statistically relevant difference was present only between patient groups 2 and 3.

| Survival analysis
In Kaplan-Meier analysis including all three patient groups, patient group 1 had a significantly poorer MCC-specific survival than patient It is noteworthy that the average quality of the extracted RNA was relatively poor. The average RIN was 2.1 and there was no significant correlation between fixation year and RIN. This is a common challenge when extracting RNA from FFPE samples. However, 3 0 tag counting, which was used during RNA sequencing in our study, significantly improves specificity when examining differential gene expression of samples with low RIN. 29 Previously, we found a strong correlation between the logFC values of genes based on MCPyV status when comparing the transcriptomic data used in our study with transcriptomic data obtained from fresh frozen tissues provided by Harms et al (GSE39612), suggesting a high specificity of our transcriptomic data when studying differentially expressed genes, as compared with data from fresh frozen tissues. 16,30 Our findings could be further validated by studying an independent MCC cohort as well as validating RNA sequencing data using a different methodology, for example, quantitative PCR.
In conclusion, we identified three distinct molecular subgroups of