The LL-100 panel: 100 cell lines for blood cancer studies

For many years, immortalized cell lines have been used as model systems for cancer research. Cell line panels were established for basic research and drug development, but did not cover the full spectrum of leukemia and lymphoma. Therefore, we have now developed a novel panel (LL-100) of 100 cell lines covering 22 entities of human leukemia and lymphoma, including T-cell, B-cell and myeloid malignancies. Importantly, all cell lines are unequivocally authenticated and assigned to the correct tissue. Cell line samples were shown to be free of mycoplasma and non-inherent virus contamination. Whole exome sequencing and RNA-sequencing of the 100 cell lines were conducted with a uniform methodology to complement existing data on these publicly available cell lines. We show that such comprehensive sequencing data can be used to find lymphoma-subtype-characteristic copy number aberrations, mRNA isoforms, transcription factor activities and expression patterns of NKL homeobox genes. These exemplary studies confirm that the novel LL-100 panel will be useful for understanding the function of oncogenes and tumor suppressor genes and for developing targeted therapies.

Human cancer cell lines form a renewable resource and are vital models for studying the cellular and molecular mechanisms underlying tumorigenesis as well as for anti-cancer drug screening 1,2 . In particular, leukemia-lymphoma (LL) cell lines serve as a convenient in vitro tool owing to their worldwide accessibility, straightforward manipulability and low culture costs, providing experimental models to address a multitude of questions in the field of LL biology 3 . Indeed, the use of LL cell lines has advanced our knowledge of many aspects of these diseases 4 . Importantly, many studies have established the suitability of LL cell lines as model systems that faithfully replicate most features of the primary cells 5,6 .
The National Cancer Institute (NCI) tumor cell line panel (known as NCI-60 because it comprises 60 cancer cell lines) was developed in the 1980s as an in vitro drug discovery tool intended to supplant animal studies in drug screening (reviewed in 7 ). This screening tool was quickly appreciated as an invaluable source of information about the mechanisms of growth inhibition and tumor cell cytotoxicity 7 . Later, in the 2000s, the NCI-60 panel transitioned from a drug-discovery pipeline to a more general research tool in support of the cancer research community 7,8 . Another panel, comprising a smaller number of cell lines of particular interest derived from several solid tumor types, was established in Japan 9 . These two cell line panels did not target a single cancer category but were designed to represent a variety of tumor entities. Nevertheless, these sets have provided the framework for the use of defined panels of cell lines while retaining the information-rich character of screens 7 .
The majority of studies in the field of LL focus on a small number of cell lines. We recognized the need for a reference panel specialized in LL cell lines to facilitate hypothesis-driven research efforts 10 . We have assembled a panel of 100 authenticated LL cell lines that reflects the heterogeneity of the entities under the umbrella category of LL. In addition to well-known and commonly analyzed cell lines, this publicly available platform includes further cell lines that are unequivocally assigned to the various entities but have specific characteristics. We hope that this focused LL-100 cell line panel will enhance the current scientific momentum, helping to elucidate the underlying pathology of these LL malignancies and providing an important resource for the testing of novel therapeutic agents.
Based on data of the human genome project, high-throughput methods have boosted the knowledge of processes in normal and malignant cells. The microarray technology showed for the first time simultaneous activities of thousands of genes and allowed the classification of tissues and diseases 11 . This approach is being steadily replaced by next generation sequencing technologies which comprise the sequencing of complete transcriptomes, exomes and whole genomes. These applications are used in cancer research to identify aberrations in the genome, deregulated and mutated genes, and alternative splicing. The obtained data are helpful to classify malignancies, to improve existing therapies, and to identify new targets for novel therapeutic approaches 12 . Here, we present transcriptome and exome sequencing data of a panel of 100 authenticated LL cell lines (LL-100) and selected examples of their utilization.

Results and Discussion
Sequencing of exomes and transcriptomes of the LL-100 panel. We performed whole exome sequencing (WES) and mRNA-sequencing (RNA-seq) on a panel of 100 LL cell lines representing 22 subtypes (Table 1). For exome analyses, over 10 million reads (2 × 151 bases) per sample were sequenced, resulting in at least 50x coverage of a 60 Mb exome. RNA-seq yielded over 29 million reads (2 × 151 bases) per sample. Sequencing data have been deposited at ENA under the accession numbers PRJEB30297 (WES) and PRJEB30312 (RNA-seq).
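The relation between read number and coverage quoted above can be checked with simple arithmetic. The following is a minimal sketch, assuming that "reads" refers to read pairs and ignoring duplicates and off-target reads; the function name is hypothetical and not part of any pipeline used in this study.

```python
def nominal_coverage(read_pairs: int, read_length: int, target_size_bp: int) -> float:
    """Nominal depth = sequenced bases / target size.

    Ignores duplicate and off-target reads, so real on-target depth is lower.
    """
    sequenced_bases = read_pairs * 2 * read_length
    return sequenced_bases / target_size_bp


# Figures quoted in the text; reading "reads" as read pairs is an assumption.
depth = nominal_coverage(read_pairs=10_000_000, read_length=151,
                         target_size_bp=60_000_000)
print(f"{depth:.1f}x")
```

Under these assumptions, 10 million read pairs correspond to a nominal depth of about 50x on a 60 Mb target, consistent with the stated minimum coverage.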
Based on the analysis of WES and RNA-seq data we show the usefulness of the LL-100 panel for LL research in five exemplary studies.
PEL and HL cell lines cluster separately from cell lines of other B-NHL entities. For many years, expression profiling has been applied to classify tumors including LL 11 . RNA-seq and microarray analyses show highly reproducible results, with expression profiles correlating between the two platforms 13 . We performed cluster analysis to test whether the two techniques also yield comparable results in the LL-100 panel. We analyzed gene expression of primary effusion lymphoma (PEL) cell lines and of cell lines from various other B-cell non-Hodgkin lymphoma (B-NHL) entities as well as from Hodgkin lymphoma (HL).
Unsupervised cluster analysis showed that all PEL cell lines grouped together, separate from cell lines derived from activated-B-cell-like (ABC) and germinal center (GC) diffuse large B-cell lymphoma (DLBCL), mantle cell lymphoma (MCL), primary mediastinal B-cell lymphoma (PMBL) and from cell lines derived from HL (Fig. 1a). Notably, PEL and HL cell lines clustered on one arm, separate from all cell lines representing the other B-NHL entities (Fig. 1a). Microarray and RNA-seq data yielded identical results, confirming the suitability of both techniques (Figs 1a, S1).
PEL and HL cell lines are characterized by a set of common up- and downregulated genes (Fig. S2). Prominent features were the expression of CCND2 and the absence of B-cell markers in PEL and HL cell lines. CD19, CD20 (MS4A1), CD24, CD79A and CD79B were expressed in all tested lymphoma entities except PEL and HL (Fig. S2). Absent, low or rare expression of these "early" B-cell markers in PEL and HL has been described for both primary lymphoma cells and cell lines [14][15][16] .
PRDM1/BLIMP1 is a master regulator of terminal B-cell differentiation. Originally described as repressor 20 , BLIMP1 can also enhance transcription of SLAMF7 in multiple myeloma 21 and of IL-10 in type 1 regulatory T-cells 22 . Thus, coexpression of the three genes in PEL suggests a causal relationship between transcriptionally active PRDM1 and the targets SLAMF7 and IL-10 also in this B-NHL entity. Independent of its regulation, the expression of SLAMF7 in PEL is remarkable because a monoclonal antibody targeting SLAMF7 (elotuzumab) has recently been approved for treatment of patients with multiple myeloma 23 . RQ-PCR analysis showed that SLAMF7 is comparably expressed in PEL and multiple myeloma cell lines (Fig. S4).
PEL is a rare, aggressive form of NHL, cells typically being infected with HHV-8 14 . With a median survival time of six months the prognosis for PEL patients is poor 24 . If our cell line results can be translated to primary tumor cells, PEL patients might benefit from targeted therapy with elotuzumab.

Activities of hematopoietic transcription factors in leukemia-lymphoma cell lines. Numerous transcription factors (TFs) regulate normal hematopoiesis, and their activities are precisely controlled during hematopoietic stem cell self-renewal and differentiation into the diverse blood cell lineages. Consequently, many of these TFs have emerged as proto-oncogenes or tumor suppressors, because their deregulation alters the cellular transcriptional program, eventually impairing differentiation and thus fostering malignant transformation. Aberrant TF activities, which can be caused by a variety of direct or indirect mutations and epigenetic alterations, are a hallmark of cancer, including hematological malignancies 25,26 .
We aimed to analyze the activities of TFs relevant for hematopoiesis across the LL-100 panel. Because the expression level of a TF itself gives little information about its downstream activity 27 , TF activities were predicted from the expression levels of their direct target genes. TF activities were estimated via so-called consensus TF regulons (CTFRs), which were recently defined by Garcia-Alonso et al. on the basis of diverse sources of human TF-target interactions 28,29 . For each cell line of the LL-100 panel, relative TF activities were computed from RNA-seq data by applying DoRothEA (Discriminant Regulon Expression Analysis) to the CTFRs of 289 individual TFs (Table S1). From these 289 TFs we selected 20 based on their known role in hematopoiesis. The activities of the respective CTFRs in the LL-100 cell lines are shown in Fig. 2.
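The principle of inferring TF activity from target gene expression can be illustrated with a deliberately simplified proxy: the signed mean of per-gene z-scores over the targets of a regulon. This sketch is not the DoRothEA algorithm and does not reproduce its statistics; the function names, the +1/-1 encoding of activation and repression, the target names and all expression values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one gene's expression across samples (0.0 if constant)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def regulon_activity(expression, regulon):
    """Signed mean z-score of a regulon's targets, one score per sample.

    expression: gene -> list of per-sample expression values
    regulon:    target gene -> +1 (activated) or -1 (repressed)
    """
    targets = [g for g in regulon if g in expression]
    if not targets:
        raise ValueError("no regulon targets found in expression data")
    z = {g: zscores(expression[g]) for g in targets}
    n_samples = len(expression[targets[0]])
    return [mean(regulon[g] * z[g][i] for g in targets) for i in range(n_samples)]


# Made-up values: samples 1-2 mimic a lineage in which the TF is active.
toy_expression = {
    "TARGET_A": [10.0, 12.0, 1.0, 2.0],
    "TARGET_B": [8.0, 9.0, 2.0, 1.0],
    "TARGET_C": [1.0, 2.0, 9.0, 10.0],
}
toy_regulon = {"TARGET_A": 1, "TARGET_B": 1, "TARGET_C": -1}
activity = regulon_activity(toy_expression, toy_regulon)
print([round(a, 2) for a in activity])
```

Because the score depends only on the targets, a TF can receive a high activity score even when its own transcript level is low, which is the reason for preferring regulon-based estimates over TF expression alone.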
Obviously, activity patterns of several TFs within the cell lines mirror their cell of origin: PAX5 and OCT-2 (encoded by POU2F2) are critical for B-cell development 30 . Accordingly, the CTFRs of these TFs showed strong activity in cell lines from B-cell derived malignancies but were inactive in myeloid-derived leukemias (Fig. 2). Other TF activities reflect the differentiation status of their respective normal counterparts: the strong activity of the CTFRs from GATA1 and GATA2 was highly specific for the cell lines from erythroid and megakaryocytic AML, CML and in cell line SET-2 (myeloproliferative neoplasm) (Fig. 2), which is in line with the role of GATA1 www.nature.com/scientificreports www.nature.com/scientificreports/ and GATA2 in the differentiation of erythroid-megakaryocytic progenitors where alterations in their dosages are related to transformation 31 .
Other CTFR activities indicate the mutation status of hematopoietic TFs in specific entities: C/EBPα is a TF relevant for granulopoiesis and AML patients frequently show inactivating mutations of C/EBPα impairing final differentiation of the cells 32 . Accordingly, the activity of the CEBPA-CTFR was diminished in erythroid and megakaryocytic AML cell lines compared to myelocytic and monocytic AML cell lines (Fig. 2).
Another subset of TF activities is characteristic for specific leukemia or lymphoma entities: TAL1 impairs T-cell differentiation and is a master oncogene in T-ALL 33 . Accordingly, T-ALL cell lines showed the strongest activity of the TAL1-CTFR (Fig. 2).
The lymphocyte specific TF LEF1 was primarily active in pre-B-ALL and CLL/PLL cell lines but rather inactive in T-ALL and ALCL cell lines (Fig. 2), which is in line with the current literature [34][35][36] . In addition, the LEF1-CTFR activity was moderately upregulated in MCL cell lines and some BL and GC-DLBCL cell lines (Fig. 2). This activity pattern seems to reflect the situation in patients because upregulation of LEF1 in subsets of B-NHL patients has been reported before [37][38][39] .
For 9 of the 20 hematopoietic TFs we observed a moderate positive correlation between the gene expression levels of the TFs and their corresponding CTFR activities (Table 2, Fig. S5). The best correlations were detected for PU.1 (encoded by SPI1) and GATA2. In general, however, hematopoietic TF activities determined via CTFRs hardly correlated with the gene expression levels of the TFs (Fig. S6). For example, TAL1 expression was rather weak in cell lines from CML in blast crisis (Fig. S5), but the TAL1-CTFR activity was increased in these cell lines (Fig. 2). Conversely, TAL1 expression was detected in several AML cell lines at a level comparable to that in T-ALL cell lines (Fig. S5), but activity of the TAL1-CTFR was low in AML cell lines (Fig. 2). This underlines that upregulation of a TF alone is not sufficient to regulate its target genes. In some cases a defined CTFR (e.g. that of TAL1) might also be regulated by further transcriptional activators or repressors.
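The comparison of TF expression with CTFR activity amounts to a per-TF correlation across cell lines. A minimal sketch using the Pearson coefficient is given below; the text does not state which correlation measure was used, so the choice of Pearson is an assumption, and the values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Made-up values for one hypothetical TF across five cell lines.
tf_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
ctfr_activity = [0.2, 0.1, 0.9, 0.7, 1.1]
r = pearson(tf_expression, ctfr_activity)
print(f"r = {r:.2f}")
```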
In summary, activity scores of CTFRs are considerably more informative about the role of a TF than its transcript levels alone. We show that transcriptional activities in LL-100 cell lines mirror the lineage origin of hematologic malignancies for a set of specific TFs (e.g. PAX5). Other TFs (e.g. GATA1) reflect the differentiation status of the respective normal counterpart, and a third group of TFs (e.g. TAL1) shows aberrant activities highly characteristic of specific entities. In general, TF activities across all studied cell lines rarely correlated with their gene expression levels. Thus, analysis of CTFR activities from RNA-seq data is a suitable tool to measure and evaluate the relevance of a specific TF in hematological cell lines.
Aberrant NKL homeobox gene activities in lymphoid malignancies. Homeobox genes encode TFs that play fundamental roles in developmental processes, including embryonic development and cell differentiation in the adult. Consequently, deregulation of homeobox genes can cause developmental disturbances or cancer 40 . These genes are classified according to differences in their conserved homeobox and ordered into classes and subclasses 41 . The NKL subclass comprises 48 members which are involved in fundamental differentiation processes, such as NKX2-1 in the lung and the thyroid, and NKX2-5 in the heart 42,43 . Normal expression patterns of nine NKL homeobox genes in early hematopoiesis and subsequent lymphocyte development have been identified and termed the hematopoietic NKL-code 44,45 . According to this code, T-cells silence all NKL homeobox genes during their thymic development, while mature NK-cells maintain expression of MSX1 and mature B-cells that of HHEX or NKX6-3 [44][45][46] . Alterations of the NKL-code may underlie the generation of particular hematopoietic malignancies. In line with this notion, aberrant activity of 24 NKL homeobox genes has been reported to date in T-cell acute lymphoblastic leukemia (T-ALL), mediating differentiation arrest and transformation 44,47,48 .
Using the LL-100 transcriptome dataset we here screened NKL homeobox gene activities in cell lines and show some results for selected lymphoid entities. To discriminate active and inactive genes we have set a cutoff at 500 normalized counts. Accordingly, aberrant activation of particular subclass members was detected in immature T-ALL but not in mature T-cell lines (Fig. 3a). This finding supported the observation that NKL homeo-oncogenes provoke an arrest in differentiation which plays a role in immature thymocytes but obviously not in mature T-cells. Furthermore, MSX1 is an oncogene in T-ALL and a tumor suppressor in NK-cells 46,49 , showing accordingly reduced activity in NK-cell leukemia cell lines (Fig. 3a). Thus, these data confirm published deregulated NKL homeobox genes including MSX1, NKX2-5, NKX3-1 and TLX3 in T-cell and NK-cell leukemia. Moreover, our RNA-seq data indicated elevated NKX2-1 expression in T-ALL cell line RPMI-8402 (Fig. 3a). Aberrant activation of NKX2-1 has been identified in T-ALL patients by chromosomal translocation, representing thus an additional clinically relevant oncogene 50 . Subsequent RQ-PCR analysis confirmed NKX2-1 activity in this cell line (Fig. 3b). Of note, chromosomal and genomic analyses indicated absence of a translocation or an amplification targeting NKX2-1 in RPMI-8402 cells (data not shown). Therefore, this cell line may represent a model to examine alternative upstream and novel downstream factors of NKX2-1 in T-ALL.
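The cutoff-based discrimination of active and inactive genes can be sketched as follows. The threshold of 500 normalized counts is taken from the text; whether a value exactly at the cutoff counts as active is not stated, so the `>=` comparison is an assumption. The cell line names and count values are invented and do not represent measured data.

```python
ACTIVITY_CUTOFF = 500  # normalized counts, the threshold stated in the text


def call_active(normalized_counts, cutoff=ACTIVITY_CUTOFF):
    """Map each cell line to the sorted list of genes at or above the cutoff.

    Treating a count exactly equal to the cutoff as active is an assumption.
    """
    return {
        cell_line: sorted(gene for gene, count in genes.items() if count >= cutoff)
        for cell_line, genes in normalized_counts.items()
    }


# Made-up counts for two hypothetical cell lines; not measured values.
toy_counts = {
    "line_A": {"MSX1": 1200.0, "NKX2-1": 30.0, "TLX3": 500.0},
    "line_B": {"MSX1": 12.0, "NKX2-1": 0.0, "TLX3": 499.9},
}
calls = call_active(toy_counts)
print(calls)
```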
In normal B-cell development, the NKL homeobox genes HHEX and NKX6-3 are the only subclass members active in developing naïve and germinal center B-cells and in mature memory B-cells and plasma cells, while B-cell progenitor (BCP) cells additionally express HLX and MSX1 44,45 . Our data show that malignant BCP-ALL cell lines lack activity of HLX, MSX1 and NKX6-3 (except SEM) (Fig. S7), indicating fundamental changes in the normal expression pattern of NKL homeobox genes. Furthermore, three of five cell lines aberrantly expressed HMX2 or HMX3. The activity of these genes was confirmed by RQ-PCR in the indicated cell lines (Fig. 3c). Moreover, HMX2 overexpression was detected in 13% of 229 BCP-ALL patients by analysis of the public dataset GSE79533 (Fig. 3c), supporting the clinical relevance of this finding. Thus, HMX2 (and HMX3) may represent NKL homeobox genes primarily deregulated in this type of B-cell malignancy, potentially serving as diagnostic markers and/or therapeutic targets.
In DLBCL cell lines we detected silencing of HHEX (in OCI-LY7 and RI-1) and NKX6-3 (except DOHH-2) and aberrant activation of HLX (NU-DHL-1) and NKX3-1 (OCI-LY3) (Fig. S7). Of note, these data did not show significant differences between ABC- and GC-DLBCL cell lines, suggesting that NKL homeobox genes do not play a role in the discrimination of these disease subtypes. Surprisingly, PEL and MM cell lines (except RPMI-8226, which expresses BARX2) demonstrated complete absence of NKL homeobox gene activity (Fig. S7). Therefore, NKL subclass members may operate as basic tumor suppressors in these particular B-cell lymphoma types. The malignant cells of both PEL and MM are derived from mature B-cells, suggesting that NKL homeobox genes lose their oncogenic potential in the final stages of development. Finally, the lack of B-cell specific NKL homeobox gene activity in PEL is in accordance with the reported downregulation of general B-cell factors as indicated above (Fig. S2) 15 . Together, deregulation of NKL homeobox genes in B-cell malignancies appears to be more important than hitherto expected. The identified cell lines may serve as models to investigate the role of these genes in the indicated tumor entities. Thus, the LL-100 datasets allow the identification of cell line models for the examination of deregulated NKL homeobox genes in particular disease entities. The expression patterns of this fundamental gene subclass in cell lines reflect the situation observed in normal lymphocytes and in primary tumor cells, highlighting the significance of these cell line data for cancer research.

Copy number alterations and their effect on gene expression in DLBCL. DLBCL shows a high degree of genetic diversity with unique molecular patterns, including varying occurrences of copy number alterations (CNAs). These patterns reflect the different stages of B-cell maturation from which the tumors derive and are also mirrored in the diverse clinical outcomes [51][52][53][54] .
To evaluate such alterations and to test if differences in the molecular subtypes are maintained in culture, we used WES data generated in our study to call CNAs in DLBCL derived cell lines. In ABC and GC DLBCL, we identified on average 157 (+/−29) and 129 (+/−12) CNAs, respectively, with a size >10 kb (Fig. 4a and Table 3). While amplification of both arms of chr7 occurs frequently in both subtypes, certain events like the 6q-deletion seem to occur more often in the ABC-subtype. Intending to compare the identified events to primary tumors, we took a recently published set of significantly recurrent CNAs in DLBCL which described a total of 45 recurring focal events (14 amplifications and 31 deletions) in 304 patients 55 . Of those CNAs, we find 38 (84%) in at least one cell line, including all focal amplifications (Fig. 4a and Table S2).
Similarly, of the 20 arm-level alterations described, 10 out of 18 amplifications and both recurrent deletions are present in one or more cell lines (Table S2). Of particular interest are subtype-specific alterations, possibly reflecting different mutational processes during tumor development. Therefore, we assessed such specific events by integrating another set of ABC- or GC-related CNAs 56 and found 83% of patient-derived events in our cell lines (15 out of 18; Table S2). In addition to the previously observed 6q-deletion, we could confirm a preferential gain of 18q22-q23 in ABC-DLBCL cell lines (3/5 ABC-DLBCL cell lines and 0/6 cell lines of the GC-subtype). We also find the deletion of the far end of 1p36 exclusively in GC-DLBCL cell lines (4/6; Fig. 4b).
While CNAs can serve as diagnostic markers, their main impact results from associated changes in gene expression. We therefore compared the expression of all affected genes in a DLBCL-subtype specific manner, identifying around 20,000 genes to be affected by CNAs across all cell lines. Although we observe little overall change for the majority of genes, several outliers are present in each cell line (Fig. S8). To identify CNA-deregulated genes important for disease progression, we filtered for those genes included in COSMIC (Table S3), with 10 (DOHH-2) to 121 (OCI-LY3) showing changes in expression >1.5, concordant with the respective CN change (Fig. 4c). Several of these genes have been confirmed as deregulated in patient-derived RNA-seq data 55 , and we also find them associated with the corresponding CNA, e.g. BCL2 and KDSR on chr18. Interestingly, we also find that PRDM1 and FOXO3 show reduced expression in ABC-DLBCL cell lines harboring a deletion of 6q21. This deletion was described as ABC-subtype related in an earlier study 53 . However, those authors did not identify potential tumor-relevant genes in the deleted region 53 .
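The selection of genes whose expression changes concordantly with their copy number state can be sketched as a simple filter. The text gives only the threshold ">1.5"; interpreting it as a fold change relative to a reference, and requiring a fold change below 1/1.5 for losses, are assumptions of this sketch. Gene names and values are placeholders.

```python
def concordant_genes(genes, min_fold_change=1.5):
    """Return names of genes whose expression follows their copy number state.

    genes: iterable of (name, cn_state, fold_change), cn_state "gain" or "loss".
    A gain must show fold_change > min_fold_change, a loss
    fold_change < 1 / min_fold_change (assumed symmetric threshold).
    """
    selected = []
    for name, cn_state, fold_change in genes:
        if cn_state == "gain" and fold_change > min_fold_change:
            selected.append(name)
        elif cn_state == "loss" and fold_change < 1 / min_fold_change:
            selected.append(name)
    return selected


# Placeholder genes with made-up fold changes; not data from this study.
toy_genes = [
    ("GENE_A", "gain", 2.4),   # gained and upregulated: concordant
    ("GENE_B", "gain", 1.1),   # gained, but expression barely changed
    ("GENE_C", "loss", 0.4),   # deleted and downregulated: concordant
    ("GENE_D", "loss", 1.8),   # deleted but upregulated: discordant
]
hits = concordant_genes(toy_genes)
print(hits)
```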
In summary, we exemplarily identified CNAs in cell lines derived from both DLBCL-subtypes and characterized the associated expressional changes. We find a high degree of similarity towards data from primary tumors and highlighted which cancer-relevant genes become deregulated in the individual cell lines.
This analysis (i) characterizes CNAs in cell lines of the two major DLBCL-subtypes and shows how they recapitulate recurring events from patients, (ii) allows the identification of those genes that are deregulated by CNAs and likely have a disease-relevant function and (iii) by doing so enables the selection of appropriate models for further molecular research related to DLBCL. Furthermore, we believe that this kind of analysis is applicable to other entities and that thereby valuable models for those entities can be obtained.
Tissue-specific RNA isoforms. By allowing different combinations of exons, alternative splicing leads to the production of multiple mRNA isoforms of the same gene, often resulting in proteins of different functionality 58 . More than 90% of human genes are affected by alternative splicing 59,60 . Tissue-specific splice factors cooperate with ubiquitous RNA binding factors to generate tissue-specific RNA isoforms 59 . The existence of different promoters can also lead to different N-terminal RNA variants.
The RNA-seq data of the LL-100 panel allowed us to find RNA isoforms that are specific to different hematopoietic lineages. Bioinformatic analysis identified genes with tissue-specific exons, e.g. LIMS1 in myeloid vs T-cell lines (Fig. S9). Two N-terminal variants of LIMS1 were expressed in myeloid cell lines, but only one of them in T-cell lines (Fig. S9). These results were confirmed by RT-PCR and validated with a second cohort of cell lines (Figs 5, S10). Altogether, data from 18 AML and 17 T-ALL cell lines revealed that the two groups could be distinguished on the basis of LIMS1 exon 1 expression (NM_001193488) with a sensitivity of 1 and a specificity of 1.
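Sensitivity and specificity follow directly from the classification counts. The sketch below assumes that AML is treated as the positive class and that all 18 AML and 17 T-ALL cell lines were classified correctly, which is what a sensitivity and specificity of 1 imply; the function name is hypothetical.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# 18 AML (assumed positive class) and 17 T-ALL cell lines, none misclassified.
sens, spec = sensitivity_specificity(tp=18, fn=0, tn=17, fp=0)
print(sens, spec)
```

A single misclassified AML cell line would lower the sensitivity to 17/18, which illustrates how strongly these figures depend on cohort size.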
In sum, our data show that RNA-seq analysis allows cell lines from different lineages to be distinguished on the basis of alternatively expressed exons.

Table 3. Chromosomal gains and losses in ABC and GC DLBCL. Cell lines were grouped according to ABC- or GC-subtype, total number of CNAs, and gains and losses. CNAs were called with control-FREEC using the B-lymphoblastoid cell line NC-NC as normal control. Neighbouring alterations with identical copy number were fused and CNAs <10 kb were omitted.

Conclusion
One goal of personalized medicine in cancer medicine is the development of targeted therapies aiming to reverse detrimental effects of mutated or deregulated genes. The costs of sequencing technologies will presumably soon be low enough to allow routine diagnostics detecting genetic alterations for classification of the patient's tumor and determining treatment strategies.
Immortalized tumor cell lines have long been used to understand the molecular and cellular function of mutated genes and to develop new drugs. However, the cell line panels established hitherto did not represent most forms of leukemia and lymphoma 7,9 . For example, the NCI-60 human cell line panel developed for use in drug development comprises sixty human cancer cell lines derived from nine different tissues 7 . Only six of these cell lines (CCRF-CEM, HL-60, K-562, MOLT-4, RPMI-8226, SR-786) represent tumors of the blood.
Covering 22 leukemia and lymphoma entities, the novel LL-100 panel presented here comprises 100 cell lines for use in basic research and drug development. The selected cell lines of this panel are authenticated and free of contamination by mycoplasma or non-inherent viruses. Furthermore, the methods of RNA and DNA isolation and sequencing are identical for all cell lines. Therefore, this dataset allows comparative studies without methodological bias. We performed WES and RNA-seq analysis for all 100 cell lines. In exemplary studies, we show that lymphoma entities can be identified by gene expression analysis and splice variant analysis. WES analysis documented that copy number aberrations in DLBCL cell lines reflect the situation in primary tumor cells and may lead to the identification of potential oncogenes. RNA-seq analysis identified tumor entity-specific activities of CTFRs, demonstrating the usefulness of cell lines as model systems for transcription factor research. Finally, RNA-seq analysis revealed aberrant activities of NKL homeobox genes.
All data and cell lines are publicly available. As demonstrated exemplarily in this study, the sequencing data can be used for various approaches. We hope that the novel LL-100 panel described here will stimulate many studies in the field of leukemia and lymphoma research.

Methods
Cell lines. Cell lines were taken from the stock of the cell line bank (Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures). Cell lines were authenticated by DNA profiling and cytogenetics. Detailed references and cultivation protocols have been described previously 3 .
RNA-sequencing analysis. Total RNA was extracted with the miRNeasy Mini Kit (Qiagen, Hilden, Germany) including DNase digestion. Library preparation and sequencing were commissioned to GATC Biotech (Cologne, Germany). The GATC pipeline included the production of strand-specific (fr-first strand) mRNA libraries, quality control via Applied Biosystems Fragment Analyzer and Nanodrop, and concentration measurement via Qubit fluorometer. The libraries were sequenced on an Illumina HiSeq2500 (2 × 151 cycles, paired end run, 8 bp dual indices) with >29 million reads per sample, and the data were deposited at ENA (PRJEB30312). Reads were trimmed via fastq-mcf (ea-utils 1.04.807) and quality controlled via FastQC (www.bioinformatics.babraham.ac.uk/projects/fastqc). Reads were aligned by STAR (2.5.3a) 61 to the Gencode Homo sapiens genome (v26) and converted/sorted via samtools (0.1.19) 62 . Reads per gene were counted via the HTSeq-count python script (0.8.0) 63 . Data were processed and analyzed in the R/Bioconductor environment (3.3.2/3.3, www.bioconductor.org). Normalization, estimation of dispersions, and testing for differentially expressed genes based on a test assuming a negative binomial data distribution were computed via DESeq2 64 .
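For readers unfamiliar with count normalization, the median-of-ratios idea underlying DESeq2 size factors can be sketched in a few lines. This is a simplified re-implementation for illustration only, not DESeq2 code (DESeq2 is an R package); function names and count values are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: gene -> list of raw counts per sample. Genes with a zero count in
    any sample are skipped because their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for i, c in enumerate(gene_counts):
            log_ratios[i].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by the size factor of its sample."""
    return {g: [c / f for c, f in zip(v, factors)] for g, v in counts.items()}


# Made-up counts: sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
factors = size_factors(toy_counts)
normalized = normalize(toy_counts, factors)
print([round(f, 3) for f in factors])
```

After normalization, genes that differ only by sequencing depth receive equal values in both samples, which is the property that makes a fixed cutoff on normalized counts comparable across cell lines.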
Whole exome sequencing analysis. DNA was isolated with the High Pure PCR Template Preparation Kit (Roche Diagnostics, Mannheim, Germany). Library preparation (Agilent SureSelect Human All Exon V6, 60 Mb) and sequencing (2 × 151 bp + 8 bp barcoding, HiSeqX) were commissioned to Genewiz (Leipzig, Germany), and the data were deposited at ENA (PRJEB30297). Insert lengths were aimed to be greater than 250 bp in order to increase coverage and uniformity in coding regions 66 . Reads were aligned by STAR (2.5.3a) 61 to the human Gencode genome (v26). Subsequently, alignment files were processed (samtools 0.1.19), duplicates removed (picard 2.9.2, www.broadinstitute.github.io/picard/), and variants called via GATK tools (3.7, HaplotypeCaller) 67 . To identify copy number alterations (CNAs) in DLBCL cell lines, control-FREEC 70 (v11.0) was applied to the duplicate-free alignment files with NC-NC as normal cell control. Neighboring regions with identical CN in the dispersed whole exome data were fused into one region, CN regions below 10 kb were omitted, and the remaining regions were visualized via circos 71 (0.67-7). Individual regions and genes of interest were plotted with the R/Bioconductor packages ggplot2 (3.1.0) and Gviz (1.22.3).
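The post-processing of CNA calls described above (fusing neighboring regions with identical copy number, then discarding regions below 10 kb) can be sketched as follows. This is an illustration under stated assumptions, not the script used in this study: "neighboring" is taken to mean consecutive segments on the same chromosome in sorted order, regardless of the gap between them, and the tuple layout and coordinates are invented.

```python
def fuse_and_filter(segments, min_size=10_000):
    """Fuse consecutive same-chromosome segments with equal copy number,
    then drop fused regions shorter than min_size base pairs.

    segments: list of (chrom, start, end, copy_number), sorted by chrom, start.
    """
    fused = []
    for chrom, start, end, cn in segments:
        if fused and fused[-1][0] == chrom and fused[-1][3] == cn:
            # Extend the previous region instead of starting a new one.
            fused[-1] = (chrom, fused[-1][1], end, cn)
        else:
            fused.append((chrom, start, end, cn))
    return [seg for seg in fused if seg[2] - seg[1] >= min_size]


# Made-up segments; coordinates are illustrative only.
toy_segments = [
    ("chr1", 0, 6_000, 3),
    ("chr1", 8_000, 15_000, 3),    # same CN as previous: fused to 0-15,000
    ("chr1", 20_000, 24_000, 1),   # 4 kb after fusion: omitted
    ("chr2", 0, 50_000, 3),        # new chromosome: never fused with chr1
]
regions = fuse_and_filter(toy_segments)
print(regions)
```

Fusing before filtering matters: each of the first two segments is shorter than 10 kb on its own and would be lost if the size filter were applied first.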

Expression array analysis. Profiling of gene expression was commissioned to the Genome Analytics Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany. 500 ng total RNA were used for biotin labelling according to the 3′ IVT Express Kit (Affymetrix, Santa Clara, CA, USA). 7.5 µg of biotinylated cDNA were fragmented and placed in a hybridization cocktail containing four biotinylated hybridization controls (BioB, BioC, BioD, and Cre). Samples were hybridized to an identical lot of Affymetrix GeneChip HG-U133 Plus 2.0 for 16 h at 45 °C. Steps for washing and SA-PE staining were processed on the fluidics station 450 using the recommended FS450 protocol (Affymetrix). Image analysis was performed on GCS3000 Scanner and GCOS1.2 Software Suite (Affymetrix). For data analysis spot intensities were RMA-background corrected and