Genomic, transcriptomic and epigenomic sequencing data of the B-cell leukemia cell line REH

Abstract

Objectives

The aim of this data paper is to describe a collection of 33 genomic, transcriptomic and epigenomic sequencing datasets of the B-cell acute lymphoblastic leukemia (ALL) cell line REH. REH is one of the most frequently used cell lines for functional studies of pediatric ALL, and these data provide a multi-faceted characterization of its molecular features. The datasets described herein, generated with short- and long-read sequencing technologies, can both provide insights into the complex aberrant karyotype of REH, and be used as reference datasets for sequencing data quality assessment or for methods development.

Data description

This paper describes 33 datasets corresponding to 867 gigabases of raw sequencing data generated from the REH cell line. These datasets include five different approaches for whole genome sequencing (WGS) on four sequencing platforms, two RNA sequencing (RNA-seq) techniques on two different sequencing platforms, DNA methylation sequencing, and single-cell ATAC-sequencing.

Objective

Human cell lines are commonly used by researchers as accessible models of disease [1, 2]. The REH cell line, derived from a fifteen-year-old female patient at relapse, is frequently used in the study of ALL, the most common cancer in children [3, 4], as well as for method development [5]. At the same time, next-generation sequencing has become an invaluable tool for cancer research [6, 7], while long-read technology increasingly offers novel insights into complex oncological aberrations [8, 9]. Therefore, a multi-faceted dataset encompassing the genomics, transcriptomics and epigenomics of a cell line such as REH can be a valuable resource for leukemia researchers. Likewise, developers of bioinformatic analysis software stand to benefit from publicly available reference datasets [10, 11].

A subset of the datasets in this project was used for downstream analysis to catalog the structural variants and fusion genes of the REH cell line [12]. For this project, reads were mapped to the human reference genome GRCh38. Additionally, the long-read WGS datasets were subjected to de-novo assembly.

Here, we present the raw sequencing reads as well as assemblies and mapped BAM files in order to make the data available to the research community.

Data description

The project consists of 33 sequencing datasets generated from the ALL cell line REH, which was obtained from DSMZ (ACC 22) and cultured according to the supplier’s specifications (see Supplemental Methods) [13, 14]. The cell line’s authenticity was confirmed by karyotyping [12] and STR analysis [15]. The datasets are divided into nine library types, seven using DNA as input and two using RNA. Of the seven DNA library types, five are whole genome sequencing (WGS) methods producing genomic data [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34], one is a method producing chromatin accessibility data (single-cell ATAC-seq) [35,36,37,38,39], and one is a whole genome methylome sequencing method producing epigenomic data (EM-seq) [40, 41]. The WGS methods are Illumina TruSeq DNA PCR-Free, PacBio SMRT, Oxford Nanopore (ONT), MGISEQ stLFR and linked-read WGS (10x Genomics). RNA was used as input to two RNA-seq methods, Illumina TruSeq Stranded Total RNA and PacBio IsoSeq [42,43,44,45,46,47,48]. The datasets include raw sequencing reads in FASTA and FASTQ formats, reference-mapped BAM files, and de-novo assemblies (Table 1).
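The counts in the paragraph above can be cross-checked with a short tally. The following is a minimal sketch, assuming that each SRA reference cited for a data type corresponds to exactly one dataset; the variable and function names are illustrative and not part of the published resource.

```python
# Reference-number ranges cited for each data type in the text.
# Assumption: one cited SRA reference corresponds to one dataset.
CITED_REFERENCES = {
    "WGS": range(16, 35),                   # references 16 to 34
    "single-cell ATAC-seq": range(35, 40),  # references 35 to 39
    "EM-seq": range(40, 42),                # references 40 and 41
    "RNA-seq": range(42, 49),               # references 42 to 48
}

# Library types as enumerated in the text, grouped by input material.
LIBRARY_TYPES = {
    "DNA": [
        "Illumina TruSeq DNA PCR-Free",
        "PacBio SMRT",
        "Oxford Nanopore (ONT)",
        "MGISEQ stLFR",
        "10x Genomics linked-read",
        "single-cell ATAC-seq",
        "EM-seq",
    ],
    "RNA": [
        "Illumina TruSeq Stranded Total RNA",
        "PacBio IsoSeq",
    ],
}


def datasets_per_type(cited=CITED_REFERENCES):
    """Count the datasets cited for each data type."""
    return {name: len(refs) for name, refs in cited.items()}


counts = datasets_per_type()
total_datasets = sum(counts.values())
total_library_types = sum(len(types) for types in LIBRARY_TYPES.values())

print(counts)
print("datasets:", total_datasets)
print("library types:", total_library_types)
```

Under this assumption the tally gives 19 WGS, 5 single-cell ATAC-seq, 2 EM-seq and 7 RNA-seq datasets, which sum to the 33 datasets and nine library types stated in the text.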

Genomic datasets

The genomic data consists of short- and long-read sequencing datasets, including de-novo assemblies, providing a combination of generous coverage and contiguity that allows for the in-depth analysis of the genomic variation present in this cell line. Included are FASTQ files from the short-read WGS sequencing of two lanes prepared with the Illumina TruSeq DNA PCR-Free kit and sequenced on the HiSeq X sequencer with PE150 read-length, as well as a BAM file of the reads mapped to human reference genome GRCh38.

Long-read WGS datasets include FASTQ files generated from a CLR library and a HiFi library sequenced on the PacBio Sequel II, as well as six ONT libraries prepared with three different kits using DNA selected to varying sizes and sequenced on the PromethION 24. BAM files mapping reads generated from both PacBio libraries and the ONT ultralong library to GRCh38 are included, as are three de-novo assemblies generated from these reads using hifiasm and flye.

Additionally, there are FASTQ files from one WGS library prepared with BGI’s MGIEasy stLFR kit and sequenced on the MGISEQ-2000RS, as well as from two linked-read WGS libraries prepared using the 10x Genomics GemCode kit and sequenced on the Illumina HiSeq 2500.
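For readers who wish to use the FASTQ files for data quality assessment, a first sanity check is often to count reads and bases. The sketch below is one possible way to do this in plain Python and is not the pipeline used by the authors. It assumes the common four-line-per-record FASTQ layout, and the function name `fastq_summary` and the toy reads are hypothetical.

```python
import io


def fastq_summary(handle):
    """Return (read_count, total_bases, mean_read_length) for a FASTQ stream.

    Assumes four lines per record (header, sequence, '+', qualities),
    i.e. sequences are not wrapped over several lines.
    """
    lengths = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        seq = handle.readline().strip()
        plus = handle.readline()
        qual = handle.readline().strip()
        if not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError("malformed FASTQ record")
        lengths.append(len(seq))
    n_reads = len(lengths)
    total_bases = sum(lengths)
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, total_bases, mean_length


# Toy in-memory example with made-up reads (not REH data).
toy = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
print(fastq_summary(toy))  # (2, 12, 6.0)
```

For real files the same function can be given an open text handle, for example one returned by `gzip.open(path, "rt")` for compressed FASTQ.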

Chromatin accessibility datasets

Single-cell ATAC-seq enables the selective sequencing of chromatin-accessible genomic regions, allowing for the determination of chromatin accessibility profiles on a cellular level. A library was prepared using the Chromium Single Cell ATAC Reagent Kit from 10x Genomics and sequenced on an SP flowcell on an Illumina NovaSeq 6000 instrument. FASTQ data, plus a BAM file mapping these data to GRCh38, are included among the datasets.

Epigenomic datasets

Methylome analysis of the REH cell line can be performed using the epigenomic datasets, which identify 5-mC or 5-hmC modifications to DNA. Two such libraries were prepared, one with 10 ng and one with 100 ng input DNA, using the NEBNext enzymatic methyl-seq kit (EM-seq). The libraries were sequenced on an Illumina NovaSeq 6000 on an S4 flowcell.
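As an illustration of how such data are typically summarized, the sketch below computes a per-site methylation level from read counts. It relies on the general property of enzymatic conversion that modified cytosines (5-mC or 5-hmC) are read as C while unmodified cytosines are read as T. This is not the authors' analysis method; the function name and the site counts are hypothetical.

```python
def methylation_level(c_count, t_count):
    """Fraction of reads supporting a modified cytosine at one site.

    c_count: reads showing C (cytosine protected, i.e. 5-mC or 5-hmC)
    t_count: reads showing T (unmodified cytosine converted)
    Returns None when the site has no coverage.
    """
    covered = c_count + t_count
    if covered == 0:
        return None
    return c_count / covered


# Made-up counts for three sites: (C reads, T reads).
toy_sites = {"site_a": (18, 2), "site_b": (0, 25), "site_c": (0, 0)}
levels = {name: methylation_level(c, t) for name, (c, t) in toy_sites.items()}
print(levels)  # {'site_a': 0.9, 'site_b': 0.0, 'site_c': None}
```

Note that this calculation cannot distinguish 5-mC from 5-hmC, since both are read as C.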

Transcriptomic datasets

The datasets include both short-read and long-read transcriptomic data, allowing insight into gene expression and aberrations such as fusion genes, as well as detailed transcript splicing information. The RNA-seq datasets include FASTQ files from the short-read sequencing of two lanes prepared with the Illumina TruSeq Stranded Total RNA kit and sequenced with PE100 read length on a NovaSeq 6000 instrument, as well as a BAM file of these reads mapped to GRCh38. The long-read RNA-seq data consist of two IsoSeq libraries, with a varying bead ratio used to generate one library with standard-length transcripts and one library with long transcripts. For each of the IsoSeq libraries, an additional dataset is included that contains the resulting full-length non-concatemer (FLNC) reads and a BAM file mapping them to GRCh38.

Table 1 Overview of data files/data sets

Limitations

  • The 10x Genomics GemCode linked-read sequencing technology has been discontinued.

  • The MGISEQ WGS data was sequenced to low (~10x) sequencing depth.

  • The REH cells used to generate the datasets herein were obtained from a single source. Given that cell lines may undergo alterations during proliferation, leading to genetic heterogeneity within the cell population, these data may not serve as a universal reference for all REH cultures.
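To put the sequencing depth mentioned above in context, mean depth can be approximated as total sequenced bases divided by genome size. The following is a minimal sketch under the assumption of a genome size of about 3.1 gigabases; that value and the function names are assumptions introduced here, not figures from the datasets.

```python
# Assumption: approximate size of the human reference genome in base pairs.
GENOME_SIZE_BP = 3.1e9


def mean_depth(total_bases, genome_size=GENOME_SIZE_BP):
    """Approximate mean sequencing depth for a given number of bases."""
    return total_bases / genome_size


def bases_for_depth(depth, genome_size=GENOME_SIZE_BP):
    """Approximate number of bases needed to reach a given mean depth."""
    return depth * genome_size


# Under the stated assumption, ~10x depth corresponds to roughly 31 Gb.
print(f"{bases_for_depth(10) / 1e9:.0f} Gb")  # 31 Gb
```

This estimate ignores unmapped, duplicate and low-quality reads, so the effective depth of a real dataset will usually be somewhat lower.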

Data Availability

The data described in this Data note can be freely and openly accessed on SRA under BioProject PRJNA600820 [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48]. Supplemental methods are available at https://doi.org/10.6084/M9.FIGSHARE.22643065 [14].
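One possible way to retrieve individual runs is with the NCBI SRA Toolkit. The sketch below only builds the resolver URL and the command lines for a run accession and does not execute anything. Using `prefetch` and `fasterq-dump` is an assumption about the reader's tooling rather than a procedure described by the authors, and the output directory name `reh_fastq` is hypothetical. The two accessions are the Illumina WGS lanes listed in the references [16, 17].

```python
import re
import shlex

# Run accessions in this project follow the pattern SRR followed by digits.
RUN_PATTERN = re.compile(r"^SRR\d+$")


def identifiers_url(run):
    """Build the identifiers.org URL in the form used in the reference list."""
    return f"https://identifiers.org/insdc.sra:{run}"


def download_commands(run, outdir="reh_fastq"):
    """Return SRA Toolkit command lines for one run (not executed here).

    Assumption: the SRA Toolkit is installed and provides `prefetch`
    and `fasterq-dump`; `outdir` is a hypothetical directory name.
    """
    if not RUN_PATTERN.match(run):
        raise ValueError(f"not an SRA run accession: {run!r}")
    return [
        shlex.join(["prefetch", run]),
        shlex.join(["fasterq-dump", run, "--outdir", outdir]),
    ]


for run in ["SRR10882610", "SRR10882609"]:
    print(identifiers_url(run))
    for command in download_commands(run):
        print("  " + command)
```

Runs that hold mapped BAM files or assemblies may need to be retrieved differently, so the commands above are best treated as a starting point for the raw read datasets.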

References

  1. Gillet J-P, Varma S, Gottesman MM. The clinical relevance of Cancer Cell Lines. JNCI J Natl Cancer Inst. 2013;105:452–8.

  2. Gazdar AF, Minna JD. Cell lines as an investigational tool for the study of biology of small cell lung cancer. Eur J Cancer Clin Oncol. 1986;22:909–11.

  3. Rosenfeld C, Goutner A, Choquet C, Venuat AM, Kayibanda B, Pico JL, et al. Phenotypic characterisation of a unique non-T, non-B acute lymphoblastic leukaemia cell line. Nature. 1977;267:841–3.

  4. Cortes JE, Kantarjian HM. Acute lymphoblastic leukemia a comprehensive review with emphasis on biology and therapy. Cancer. 1995;76:2393–417.

  5. Raine A, Manlig E, Wahlberg P, Syvänen A-C, Nordlund J. SPlinted Ligation Adapter Tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing. Nucleic Acids Res. 2017;45:e36–6.

  6. Shyr D, Liu Q. Next generation sequencing in cancer research and clinical application. Biol Proced Online. 2013;15:4.

  7. LeBlanc VG, Marra MA. Next-generation sequencing approaches in Cancer: where have they brought us and where will they take us? Cancers. 2015;7:1925–58.

  8. Sakamoto Y, Sereewattanawoot S, Suzuki A. A new era of long-read sequencing for cancer genomics. J Hum Genet. 2020;65:3–10.

  9. Rausch T, Snajder R, Leger A, Simovic M, Giurgiu M, Villacorta L, et al. Long-read sequencing of diagnosis and post-therapy medulloblastoma reveals complex rearrangement patterns and epigenetic signatures. Cell Genomics. 2023;3:100281.

  10. Fang LT, Zhu B, Zhao Y, Chen W, Yang Z, Kerrigan L, et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol. 2021;39:1151–60.

  11. Ren L, Duan X, Dong L, Zhang R, Yang J, Gao Y et al. Quartet DNA reference materials and datasets for comprehensively evaluating germline variants calling performance. preprint. Bioinformatics; 2022.

  12. Lysenkova Wiklander M, Arvidsson G, Bunikis I, Lundmark A, Raine A, Marincevic-Zuniga Y et al. A complete digital karyotype of the B-cell leukemia REH cell line resolved by long-read sequencing. preprint. Cancer Biology; 2023.

  13. Lysenkova Wiklander M. REH Data Note - Overview of REH sequencing datasets. 2023. https://doi.org/10.6084/m9.figshare.23966340. Accessed 16 Aug 2023.

  14. Lysenkova Wiklander M. REH Data Note - Supplemental Methods.pdf. 2023. https://doi.org/10.6084/M9.FIGSHARE.22643065. Accessed 11 May 2023.

  15. Lysenkova Wiklander M. REH Data Note - Data File 3. Short Tandem Repeat Analysis of the REH cell line. 2023. https://doi.org/10.6084/m9.figshare.24131670

  16. NCBI Sequence Read Archive. WGS of REH (Illumina TruSeq DNA PCR-Free) - Illumina HiSeq X - Lane 1. 2023. https://identifiers.org/insdc.sra:SRR10882610

  17. NCBI Sequence Read Archive. WGS of REH (Illumina TruSeq DNA PCR-Free) - Illumina HiSeq X - Lane 2. 2023. https://identifiers.org/insdc.sra:SRR10882609

  18. NCBI Sequence Read Archive. WGS of REH (Illumina TruSeq DNA PCR-Free) - mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR23704824

  19. NCBI Sequence Read Archive. CLR WGS of REH (PacBio SMRT). 2023. https://identifiers.org/insdc.sra:SRR22805329

  20. NCBI Sequence Read Archive. HiFi WGS of REH (PacBio SMRT). 2023. https://identifiers.org/insdc.sra:SRR19123265

  21. NCBI Sequence Read Archive. HiFi/CLR WGS of REH (PacBio SMRT) - mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR23704823

  22. NCBI Sequence Read Archive. De-novo REH assembly (hifiasm, PacBio HiFi and CLR WGS). 2023. https://identifiers.org/insdc.sra:SRR23704827

  23. NCBI Sequence Read Archive. ONT WGS of REH, sheared to 10 kb. 2023. https://identifiers.org/insdc.sra:SRR22730978

  24. NCBI Sequence Read Archive. ONT WGS of REH, sheared to 20 kb. 2023. https://identifiers.org/insdc.sra:SRR22444744

  25. NCBI Sequence Read Archive. ONT WGS of REH, sheared to 30 kb. 2023. https://identifiers.org/insdc.sra:SRR23054498

  26. NCBI Sequence Read Archive. ONT WGS of REH, sheared to 60 kb, size selected with Circulomics SRE. 2023. https://identifiers.org/insdc.sra:SRR22444743

  27. NCBI Sequence Read Archive. ONT WGS of REH, size selected with Circulomics SRE. 2023. https://identifiers.org/insdc.sra:SRR22444742

  28. NCBI Sequence Read Archive. ONT WGS of REH, Ultralong. 2023. https://identifiers.org/insdc.sra:SRR21147769

  29. NCBI Sequence Read Archive. ONT WGS of REH, Ultralong - mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR23704822

  30. NCBI Sequence Read Archive. De-novo REH assembly (flye/medaka, ONT Ultralong WGS). 2023. https://identifiers.org/insdc.sra:SRR23704826

  31. NCBI Sequence Read Archive. De-novo REH assembly (flye/racon, ONT Ultralong and PacBio WGS). 2023. https://identifiers.org/insdc.sra:SRR23704825

  32. NCBI Sequence Read Archive. MGISEQ WGS of REH (stLFR). 2023. https://identifiers.org/insdc.sra:SRR18907774

  33. NCBI Sequence Read Archive. 10x GemCode linked-read WGS of REH (high molecular weight) - mapped - hg37. 2023. https://identifiers.org/insdc.sra:SRR10902121

  34. NCBI Sequence Read Archive. 10x GemCode linked-read WGS of REH (standard DNA) - mapped - hg37. 2023. https://identifiers.org/insdc.sra:SRR10902122

  35. NCBI Sequence Read Archive. Single cell ATAC sequencing of REH (10x Chromium) – 1 of 4. 2023. https://identifiers.org/insdc.sra:SRR22320001

  36. NCBI Sequence Read Archive. Single cell ATAC sequencing of REH (10x Chromium) – 2 of 4. 2023. https://identifiers.org/insdc.sra:SRR22320000

  37. NCBI Sequence Read Archive. Single cell ATAC sequencing of REH (10x Chromium) – 3 of 4. 2023. https://identifiers.org/insdc.sra:SRR22319999

  38. NCBI Sequence Read Archive. Single cell ATAC sequencing of REH (10x Chromium) – 4 of 4. 2023. https://identifiers.org/insdc.sra:SRR22319998

  39. NCBI Sequence Read Archive. Single-cell ATAC sequencing of REH - Illumina NovaSeq 6000 - mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR10907069

  40. NCBI Sequence Read Archive. EM-seq of REH (NEBNext) – 100ng DNA. 2023. https://identifiers.org/insdc.sra:SRR23020114

  41. NCBI Sequence Read Archive. EM-seq of REH (NEBNext) – 10ng DNA. 2023. https://identifiers.org/insdc.sra:SRR23020113

  42. NCBI Sequence Read Archive. RNA-seq of REH (Illumina TruSeq stranded total RNA) - Illumina NovaSeq 6000 - Lane 1. 2023. https://identifiers.org/insdc.sra:SRR10882846

  43. NCBI Sequence Read Archive. RNA-seq of REH (Illumina TruSeq stranded total RNA) - Illumina NovaSeq 6000 - Lane 2. 2023. https://identifiers.org/insdc.sra:SRR10882845

  44. NCBI Sequence Read Archive. RNA-seq of REH (Illumina TruSeq stranded total RNA) - mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR23704830

  45. NCBI Sequence Read Archive. HiFi RNA-seq of REH (PacBio IsoSeq) - standard-length transcripts. 2023. https://identifiers.org/insdc.sra:SRR22729869

  46. NCBI Sequence Read Archive. HiFi RNA-seq of REH (PacBio IsoSeq) - long transcripts. 2023. https://identifiers.org/insdc.sra:SRR22729868

  47. NCBI Sequence Read Archive. HiFi RNA-seq of REH (PacBio IsoSeq) - standard-length transcripts - FLNC and mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR23704829

  48. NCBI Sequence Read Archive. HiFi RNA-seq of REH (PacBio IsoSeq) - long transcripts - FLNC and mapped - hg38. 2023. https://identifiers.org/insdc.sra:SRR23704828

Acknowledgements

The authors would like to acknowledge the support of the National Genomics Infrastructure (NGI) unit in Uppsala in aiding with RNA/DNA extraction, library preparation, and sequencing, and Susanne Reinsbach for bioinformatics support.

Funding

Open access funding provided by Uppsala University. This project was funded in part by the Swedish Research Council (#2019-01976 and #2019-0222), the Swedish Childhood Cancer Fund (#2019-0046 and #2022-0086), and the Göran Gustafsson Foundation. This project received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 824110 EASI-Genomics. Sequencing was performed at the National Genomics Infrastructure (NGI) at SciLifeLab in Uppsala. NGI is funded by SciLifeLab, the Swedish Research Council RFI, and the Knut and Alice Wallenberg Foundation. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) and the Swedish National Infrastructure for Computing (SNIC) at the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX), partially funded by the Swedish Research Council through grant agreements no. 2022-06725 and no. 2018-05973.

Author information

Contributions

MLW, AA, LF and JN conceived the research and constructed the experimental design. JN and LF acquired funding. MLW and JN wrote the paper. EÖ, JL, AR, ACW, JR, YMZ, HG, TM and UL prepared and sequenced short-read Illumina libraries. SE, RE and PL performed bioinformatics analysis of Illumina sequencing libraries. AP, MBM, SH, and SHK prepared and sequenced long-read libraries. IB performed bioinformatics analysis of long-read sequencing libraries.

Corresponding author

Correspondence to Jessica Nordlund.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lysenkova Wiklander, M., Övernäs, E., Lagensjö, J. et al. Genomic, transcriptomic and epigenomic sequencing data of the B-cell leukemia cell line REH. BMC Res Notes 16, 265 (2023). https://doi.org/10.1186/s13104-023-06537-2

  • DOI: https://doi.org/10.1186/s13104-023-06537-2