Genomic signatures of pediatric advanced-stage Burkitt lymphoma/leukemia in a Chinese population revealed by next-generation sequencing

Background: Burkitt lymphoma/leukemia (BL/BAL) is the most common lymphoma in children, and the sporadic subtype is dominant in the Chinese population. MYC gene translocation is essential for sporadic BL/BAL (sBL/BAL), whereas other gene mutations also play important roles in its development. Methods: The clinical data of ten Chinese children with sBL/BAL were collected, and next-generation sequencing of tumor tissues was performed. BL and diffuse large B-cell lymphoma (DLBCL) datasets were also collected and analyzed bioinformatically. Results: Nine males and one female were enrolled in the study, including six BL patients (stage III) and four BAL patients (stage IV). The average age at diagnosis was 100.10±13.39 months; [...] and sBL/BAL may be inhibited through this signaling pathway. The EGFR-TKI resistance pathway was also identified in the analysis of sBL/BAL patients, suggesting that EGFR-TKI treatment may be ineffective. Further research is needed to test this hypothesis and clarify the possible mechanisms.


Introduction
Burkitt lymphoma/leukemia (BL/BAL) is the most common subtype of non-Hodgkin lymphoma (NHL) in children and adolescents, accounting for 30-50% of pediatric NHL cases. Three variants of BL/BAL have been reported: endemic, sporadic and immunodeficiency-related BL/BAL [1]. Sporadic BL/BAL (sBL/BAL) occurs throughout the world and is more common in the Chinese population than the endemic or immunodeficiency-associated variants [2].
The Murphy staging system and the revised International Pediatric Non-Hodgkin Lymphoma Staging System (IPNHLSS) have been proposed for pediatric BL; cases are classified as localized (stage I or II) or advanced (stage III or IV) [3]. With intensive combination chemotherapy, the prognosis of sBL/BAL in children and adolescents has improved dramatically. Currently, the 5-year event-free survival (EFS) of localized and advanced-stage BL/BAL has reached ~100% and 85-90%, respectively [1,2].
It is well known that t(8;14)(q24;q32) or its variants play a key role in BL/BAL; translocation of the MYC gene, which is located at band 8q24, is detectable in over 95% of cases. Epstein-Barr virus (EBV) is also common in BL/BAL. The MYC translocation is the essential driver of MYC overexpression; activation of MYC leads to cell cycle progression, inhibition of differentiation, and effects that promote cell proliferation and/or genomic instability [4]. Additional chromosomal abnormalities, recurrent transcripts and/or gene mutations have also been detected in BL/BAL patients and play roles in the onset, progression and aggressiveness of the disease [5]. For instance, somatic mutation of TCF3 can activate the PI3K/MAPK/MTOR pathway, in part by increasing a tonic form of B-cell receptor signaling; patients with germline mutations in the SH2 domain protein 1A gene (SH2D1A) have an increased risk of BL/BAL [6].
Next-generation sequencing (NGS) studies using whole-exome sequencing (WES) and/or RNA sequencing (RNAseq) have provided valuable insight into the landscape of genomic alterations in malignancies, helping researchers to elucidate the origin, pathogenesis and mechanisms of these diseases [7]. Bioinformatics analysis has been widely used in cancer research. However, WES or RNAseq of sBL/BAL in Chinese children has not been reported. In this study, we performed WES and/or RNAseq in ten sBL/BAL patients, investigated the role of individual ID family members in BL/BAL and diffuse large B-cell lymphoma (DLBCL) using large databases, and sought to reveal the signaling pathways involved in pathogenesis and the relationship between clinical characteristics and gene mutations.

Patients
Ten patients with newly diagnosed sBL/BAL at the Children's Hospital of Chongqing Medical University (CHCMU) between February 2018 and October 2019 were enrolled in the study. Diagnoses were made in accordance with the 2016 World Health Organization criteria, and patients were staged according to the revised IPNHLSS [3].
Patients who were ≥18 years old at diagnosis, who were diagnosed with Burkitt-like lymphoma with 11q aberration or secondary lymphoma, or who had human immunodeficiency virus infection were excluded; patients with localized disease (stage I or II) and those who had received chemotherapy before hospitalization were also excluded. The patients received chemotherapy according to a modified non-Hodgkin lymphoma Berlin-Frankfurt-Münster 1995 (BFM-95) protocol [8,9]. Intrathecal injection (IT) was administered as required by the protocol, and cranial radiotherapy was given to patients with central nervous system (CNS) involvement. Details of the risk groups, treatment courses and drug dosages of the modified BFM-95 protocol are listed in the Supplementary materials (S1-S4).
The Ethics Administration Office of CHCMU granted ethical approval for the research, and informed consent was obtained from the patients or their guardians.
Clinical data, laboratory findings and prognostic data of the enrolled patients were collected retrospectively from the medical record system and analyzed.

Pathologic diagnosis
The pathologic diagnosis of BL patients was confirmed by lymph node (LN) biopsy. [...] which were detected by multiplex RT-PCR as previously reported [11].

DNA, RNA isolation and sequencing
Tumor DNA samples of BL or BAL patients were obtained from formalin-fixed specimens or bone marrow (BM) samples at diagnosis; germline samples were collected from the oral mucosa of patients and from their parents' peripheral blood (PB). Genomic DNA was [...] Discovered variants were divided into the following four categories according to literature reports [12] and software analysis: 1) pathogenic genotypes that were confirmed by literature reports; 2) likely pathogenic genotypes that were reported in the literature and/or predicted to affect protein function; 3 [...]
Pathogenic and likely pathogenic genotypes were recorded as causal gene mutations; causal gene mutations in tumor samples were confirmed by Sanger sequencing. Control samples were cross-checked and also analyzed by Sanger sequencing to identify somatic or germline causal gene mutations.

PPI network construction and GO, KEGG pathway enrichment analysis
The mutations detected by WES were collected, and the R package wordcloud2 was used to visualize the frequency of mutations in these cases. We used STRING (http://string-db.org, RRID: SCR_005223) to identify protein-protein interactions (PPI) and visualized the interactions using Cytoscape (version 3.7.1).
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the R package clusterProfiler and the Biological Networks Gene Ontology tool (BiNGO, RRID:SCR_005736).
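Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes in a gene list by chance. The following is a minimal sketch of that test in Python, not the code of clusterProfiler or BiNGO; the function name and the toy counts are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, term_size, query_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of background genes
    term_size:     background genes annotated to the GO term or pathway
    query_size:    genes in the list being tested (e.g. mutated genes)
    overlap:       genes in the list that carry the annotation
    """
    max_overlap = min(term_size, query_size)
    total = comb(universe_size, query_size)
    # Count the ways to draw k annotated genes and (query_size - k) others,
    # summed over every k at least as large as the observed overlap.
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy example (hypothetical counts): 10 background genes, 4 annotated,
# 3 genes in the query list, all 3 annotated.
p_value = hypergeom_enrichment_p(10, 4, 3, 3)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the resulting p-values are computed for every term and then corrected for multiple testing before a cutoff such as 0.05 is applied.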

Identification of DEGs between BL and DLBCL dataset
The gene chip datasets GSE4475, GSE10172, GSE43677 and GSE48435 were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). We selected the BL and DLBCL samples for downstream analysis. After quality control (QC), 41 BL samples and 178 DLBCL samples remained. Differential expression analysis was performed using the R package limma.
Differentially expressed genes (DEGs) were screened using a Benjamini-Hochberg false discovery rate-adjusted p-value cutoff of 0.05 and an absolute fold change greater than 1. A volcano plot of the DEGs was drawn using the R package ggplot2.
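The screening rule described here can be sketched as follows. This is a minimal illustration in Python rather than the limma workflow used in the study; the gene labels, p-values and fold changes are toy values, and treating the fold change as a log2 value is an assumption that the text does not state.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # stay monotone (each is the minimum of itself and all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log_fc, pvalues, alpha=0.05, fc_cutoff=1.0):
    """Keep genes with adjusted p < alpha and |fold change| > fc_cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, fc, q in zip(genes, log_fc, adjusted)
        if q < alpha and abs(fc) > fc_cutoff
    ]


# Toy input (hypothetical values, assumed to be log2 fold changes).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [2.0, -1.5, 0.5, -3.0]
pvalues = [0.01, 0.04, 0.03, 0.005]
degs = select_degs(genes, log_fc, pvalues)
print(degs)
```

Genes passing both thresholds are the ones that would be highlighted in a volcano plot, with the sign of the fold change separating up-regulated from down-regulated genes.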

WGCNA analysis of the BL and DLBCL dataset
We used the R package for weighted gene co-expression network analysis (WGCNA) to construct co-expression modules. A total of 219 samples were used to calculate the Pearson correlation matrices. A soft-thresholding power of 6 was selected. An unsigned hybrid co-expression network was then calculated using standard settings. We selected 5000 genes to construct the topological heatmap. We calculated Pearson correlations between the module eigengenes and the trait data to identify module-trait relationships. Finally, we selected the genes of the blue module (related to BL) and the red module (related to DLBCL) to construct gene regulatory networks and performed GO enrichment analysis.
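The core quantities behind this step can be illustrated with a small sketch. It computes pairwise Pearson correlations, raises their absolute values to the power 6 to obtain an unsigned adjacency, and derives the topological overlap from that adjacency using the standard unsigned formula. This is a plain Python illustration under stated assumptions, not the WGCNA package itself; the function names and the toy expression matrix are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def unsigned_adjacency(expr, power=6):
    """Adjacency a_ij = |cor(i, j)| ** power, with a zero diagonal."""
    n_genes = len(expr)
    adj = [[0.0] * n_genes for _ in range(n_genes)]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            a = abs(pearson(expr[i], expr[j])) ** power
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n_genes = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n_genes)]
           for i in range(n_genes)]
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n_genes) if u != i and u != j)
            denom = min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / denom
    return tom


# Toy expression matrix: rows are genes, columns are samples (hypothetical).
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 4.0],
]
adjacency = unsigned_adjacency(expr, power=6)
tom = topological_overlap(adjacency)
print(round(adjacency[0][3], 6), round(tom[0][1], 4))
```

Because the adjacency is unsigned, the perfectly anti-correlated third gene is connected to the first two as strongly as they are to each other; modules are then typically found by hierarchical clustering on 1 - TOM.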

Statistical Analysis
An event was defined as any of the following: refractory or relapsed disease, [...] Proportional differences between patient groups were analyzed by Pearson chi-squared (χ²) tests or Fisher's exact tests. A P value <0.05 was considered statistically significant.
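With groups as small as those in this cohort, Fisher's exact test is usually the more appropriate of the two tests. A minimal sketch of the two-sided test for a 2x2 table is given below; it is an illustration in Python rather than the statistical software used in the study, and the function name and the example table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def weight(x):
        # Number of ways to obtain x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    tail = sum(weight(x) for x in range(lowest, highest + 1)
               if weight(x) <= observed)
    return tail / total_tables


# Hypothetical table: a feature present in 3 of 3 patients in one group
# and 0 of 3 patients in the other.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(f"two-sided p = {p_value:.2f}")
```

The weights are kept as exact integers until the final division, so the comparison against the observed table is not affected by floating-point rounding.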

Clinical, laboratory and pathologic characteristics
Nine males and one female were enrolled in the study. The six BL patients were [...] The four BAL patients were diagnosed with ALL-L3 by FAB morphology; immunophenotyping showed a mature B-ALL phenotype with detectable overexpression of CD20, and restricted expression of kappa or lambda light chains was detected in two patients each; t(8;14) and/or MYC rearrangement was confirmed by chromosomal karyotyping and/or FISH; common fusion genes were undetectable. The six BL patients were confirmed by LN biopsy. Overexpression of CD20, Ki-67 and MYC was detected by IHC staining; FISH for MYC and EBER was positive, and FISH for BCL-2, BCL-6 and MLL was negative.

Treatment and prognosis
Six BL patients (five classified as the R3 group and one as the R2 group) and four BAL patients (R4 group) received chemotherapy according to the modified BFM-95 protocol. Treatment response was evaluated after one course of chemotherapy for BAL patients and after two courses for BL patients; all of them achieved complete remission (CR). Chemotherapy was completed within 5-6 months. Afterwards, the patient with CNS involvement received cranial radiotherapy (total dose of 18 Gy in 10 fractions). At follow-up through May 2020, all 10 patients were alive without events; treatment-related mortality (TRM) was 0% and the EFS rate was 100%.

Results of NGS
WES was performed in all ten patients. Germline causal gene mutations were detected in four (40%) patients, and several somatic causal gene mutations were identified in the ten patients.
The WES data were used to visualize the frequency of candidate gene mutations with the R package wordcloud2 and to perform PPI network, GO enrichment and KEGG pathway analyses. (1) The identified gene mutations are shown as a tag cloud in which word size reflects gene frequency (Fig-1a); apart from MYC, mutations in ID3, BRCA2, ARID1A and SMARCA4 were the most common. The identified gene mutations were also mapped onto a PPI network (Fig-1b), which showed that the proteins encoded by the MYC, HRAS, TP53 and NOTCH1 genes were the key proteins. The figure indicates that these gene mutations play important roles in the pathogenesis of BL/BAL. (2) The identified genes were subjected to GO enrichment analysis (Fig-2a). With a P value cutoff of 0.05, the top gene functions were leukocyte differentiation and regulation of hemopoiesis in biological process (BP), nuclear chromosomal part and chromatin in cellular component (CC), and chromatin binding and transcription coregulator activity in molecular function (MF). These genes were also enriched and connected by KEGG pathway analysis (Fig-2b), in which the PI3K-Akt signaling pathway appeared to play a key role in the pathogenesis of BL/BAL.
RNAseq was performed and gene transcripts were analyzed for the four BAL patients; the results are listed in the Supplementary materials.

Identification of DEGs and WGCNA analysis
To further understand the pathogenesis of BL/BAL, we identified DEGs between the BL and DLBCL datasets and performed WGCNA. The volcano plot showed that gene expression differed between the BL and DLBCL microarray data (Fig-3). Genes up-regulated in BL or in DLBCL were subjected to GO and KEGG pathway enrichment analysis, respectively (Fig-4 and Fig-5), again with a P value cutoff of 0.05; the enriched gene functions differed completely between BL and DLBCL.
The topological overlap matrix (TOM) of co-expressed genes in different modules among the top 5000 genes is shown as a heatmap (Fig-6); the eigengene adjacency heatmap for the different modules and the module-trait relationships are also presented. The co-expression networks of the significant genes related to BL (Fig-7a) and DLBCL (Fig-7b) are presented, as is the GO enrichment of the significant genes related to BL (Fig-8a) and DLBCL (Fig-8b). The co-expression networks and enriched GO terms also differed between BL and DLBCL.

Discussion
BL/BAL is an aggressive non-Hodgkin lymphoma derived from germinal center B cells, and BAL is regarded as the leukemic phase of BL [1]. The peak age of sBL/BAL is 11 years, and males are affected more often than females. The ten enrolled patients included nine males and one female, with an average age of 100.10±13.39 months; the clinical data in this study are similar to those reported in the literature [1,2,10].
The prognosis of sBL/BAL was poor in past decades; with short, intensive chemotherapy, the survival rate has improved steadily. Overexpression of CD20 and restricted expression of the kappa or lambda chain are characteristic of BL/BAL. With the combination of intensive chemotherapy and the CD20 monoclonal antibody rituximab, the survival rate of pediatric sBL/BAL has exceeded 90%. The EFS of the ten patients in this study was 100%, and their diagnosis and treatment were consistent with the literature [13,14].
The molecular hallmark of BL/BAL is translocation of the MYC oncogene; similar translocations are also found in other NHL subtypes [10]. Although MYC translocation is detectable in these subtypes of NHL, clinical manifestations and prognosis vary, and sBL/BAL also differs between children and adults [1,4,15]. This suggests that pediatric sBL/BAL can be distinguished from other entities by gene expression profiling, and that differences in gene expression profiles potentially reflect distinct pathogenetic mechanisms.
BL and DLBCL datasets were collected and subjected to differential expression analysis, which yielded statistically significant differences; this reveals distinct gene expression profiles in BL and DLBCL and indicates that their pathogenesis is distinct.
WES data from the ten Chinese pediatric sBL/BAL patients were obtained and analyzed. Apart from MYC, mutations in ID3, BRCA2, ARID1A and SMARCA4 were common (Fig-1). According to the literature [10,15-18], genetically susceptible individuals, such as those with SH2D1A mutations, are at greatly increased risk of developing BL. Interestingly, germline causal gene mutations were detected in four of the ten patients, but larger, multicenter studies are needed to verify the exact detection rate.
These mutated genes were enriched and connected by GO and KEGG pathway analysis (Fig-2); the PI3K-Akt signaling pathway appeared to play a key role in the pathogenesis of BL/BAL. Similar findings were obtained from the dataset analysis and have been reported in the literature [16,17], and BL/BAL may be inhibited by targeting this signaling pathway [18].
Epidermal growth factor receptor (EGFR) is a tyrosine kinase; EGFR gene mutations and overexpression of its protein are associated with cancer growth [19].
Tyrosine kinase inhibitors (TKIs) against EGFR (EGFR-TKIs) have been administered to cancer patients with EGFR mutations, for example in lung adenocarcinoma [20]. However, the EGFR-TKI resistance pathway was found by KEGG enrichment of the genes identified in BL/BAL patients, suggesting that EGFR-TKI treatment may be ineffective in BL/BAL patients; further research is needed to test this hypothesis and clarify the possible mechanisms.

Conclusion
BL/BAL is a highly aggressive but curable subtype of lymphoma. With the combination of intensive chemotherapy and rituximab, the survival rate of BL/BAL has improved steadily, but its pathogenesis is still under investigation. The molecular hallmark of sBL/BAL is MYC translocation, whereas additional chromosomal abnormalities and gene mutations also occur and play roles in the progression of the disease.
In this study, NGS was performed and analyzed in pediatric sBL/BAL patients from a Chinese population. Recurrent mutations other than MYC were detected, and possible signaling pathways were identified; further laboratory studies will be carried out to verify these findings.