Small RNA datasets of drug-susceptible Mycobacterium tuberculosis strains from Sabah, Malaysia

These datasets present a list of small RNAs from three drug-susceptible Mycobacterium tuberculosis strains isolated from Sabah, Malaysia. Sputum samples were obtained from three tuberculosis patients from different districts. The bacteria were detected using GeneXpert MTB/RIF, isolated and cultured in BACTEC™ MGIT™ 320, and tested for their drug susceptibility. Total RNAs were extracted, sequenced, and analyzed using bioinformatic tools to identify the small RNAs present in the Mycobacterium tuberculosis strains. Small RNA sequencing generated total raw reads of 63,252,209, 63,636,812, and 61,148,224 and total trimmed reads (15-30 nucleotides) of 51,533,188, 53,520,197, and 51,363,772 for Mycobacterium tuberculosis strains SBH49, SBH149, and SBH372, respectively. The raw data were submitted to the Sequence Read Archive (SRA) database of the National Center for Biotechnology Information (NCBI) under the accession numbers SRX16744291 (SBH49), SRX16744292 (SBH149), and SRX16744293 (SBH372). Small RNAs play important roles in cellular processes such as cell differentiation, cell signaling, development of resistance to antibiotics and immune response, and metabolism regulation. The small RNAs determined here could provide further insights into various cellular processes crucial for Mycobacterium tuberculosis survival and a better understanding of their gene regulation, which could ultimately open a new pathway for combating tuberculosis infection.



Value of the Data
• The data provide valuable information for researchers studying the relationship between small RNAs and cellular processes such as cell signaling and metabolism, among others, under different experimental conditions, and for the discovery of novel small RNAs.
• The data are useful for researchers studying the mechanisms of drug resistance and could contribute to the establishment of biosignatures related to drug-susceptible and drug-resistant Mycobacterium tuberculosis strains.
• The data could provide valuable information to researchers looking for pathways useful for the development of new diagnostics, vaccines, and therapies for tuberculosis.
• The data will help to understand the genetic and functional characteristics of circulating strains in different geographical areas.

Table 1 shows the pre-analysis data of raw reads for each sample obtained after sequencing. Initial raw reads of samples SBH49, SBH149, and SBH372 were 63,252,209, 63,636,812, and 61,148,224, respectively. After trimming the adapter sequence and removing low-quality reads, a total of 11,719,021, 10,116,615, and 9,784,452 reads were discarded from samples SBH49, SBH149, and SBH372, respectively. The reads with 15-30 nucleotide (nt) of samples

Culturing of Mycobacterium tuberculosis
Mycobacterium tuberculosis strains were isolated from three individuals who were diagnosed with pulmonary tuberculosis in 2018 in the Kota Kinabalu (SBH49), Penampang (SBH149), and Semporna (SBH372) districts. Briefly, sputum samples were collected from individuals who presented with tuberculosis symptoms and abnormal chest X-rays. The sputum samples (0.5 mL) were analyzed with GeneXpert MTB/RIF (Cepheid, Sunnyvale, CA, USA) according to the manufacturer's protocol [1]. The bacteria present in the GeneXpert-positive samples were isolated using liquid culture media. First, the sputum samples were digested and decontaminated with the mucolytic agent BBL® MycoPrep™ to reduce the sputum viscosity and minimize contamination by rapidly growing normal flora. Then, each processed sputum sample was cultured in a tube containing Middlebrook 7H9 medium (7 mL) with BACTEC™ MGIT™ Growth Supplement (Becton, Dickinson and Company, USA) and BBL™ MGIT™ PANTA™ (Becton, Dickinson and Company, USA) using the BD BACTEC™ MGIT™ 320 system (Becton, Dickinson and Company, USA) at 37 °C until the instrument flagged the tube as positive for growth. The bacterial cultures were subjected to antibiotic susceptibility testing against streptomycin, isoniazid, rifampicin, and ethambutol at final concentrations of 1.0, 0.1, 1.0, and 5.0 μg/mL, respectively, using the BACTEC™ MGIT™ SIRE kit (Becton, Dickinson and Company, USA) [2]. The isolates were preserved in 15% glycerol and stored at -80 °C in the Microbial Culture Collection at the Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah.

RNA extraction, quality control, and library preparation
Total RNA was extracted using the Lucigen MasterPure™ Complete DNA and RNA Purification kit (Epicentre Biotechnologies, Madison, WI, USA) with several modifications to the lysis method. Briefly, bacteria were revived from glycerol stock by culturing in a 7 mL MGIT tube for 14 days, corresponding to the log phase of Mycobacterium growth. The bacteria in the tube were pelleted by centrifugation at 5,000 × g for 5 minutes. The supernatant was decanted, and the cell pellet was mixed with 1X Tissue and Cell Lysis Solution (600 μL) and transferred to a 1.5 mL Eppendorf tube. The cell mixture was then transferred to an MN Bead Tube Type B (Macherey-Nagel, Düren, Germany) and lysed by alternating 1 minute of bead beating with 1 minute of cooling on ice for a total of 20 minutes, followed by treatment with proteinase K (20 mg/mL, 40 μL) (Pygene™, USA) at 55 °C for 15 minutes [3,4]. The remaining steps to obtain the total RNA pellet were conducted according to the manufacturer's instructions [5]. The pellet was resuspended in nuclease-free water (25 μL) and analyzed by 1.5% agarose gel electrophoresis at 90 V for 45 minutes using a gel pre-stained with FloroSafe Stain (First BASE Laboratories, Malaysia) and visualized using Gel Doc™ XR (Bio-Rad, USA). The purity (A260/A230 and A260/A280) and concentration of the extracted RNA were measured with a DS-11 spectrophotometer (DeNovix Inc, USA).
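The purity ratios and concentration mentioned above follow directly from the absorbance readings. The sketch below is only an illustration: the function name and the absorbance values are hypothetical and are not measurements from this study. It assumes absorbances normalized to a 1 cm path length and uses the standard conversion of 40 μg/mL of single-stranded RNA per A260 unit.

```python
# Standard conversion for single-stranded RNA at a 1 cm path length.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_metrics(a230, a260, a280, dilution_factor=1.0):
    """Return purity ratios and RNA concentration from absorbance readings.

    Hypothetical helper: inputs are assumed to be blank-corrected
    absorbances normalized to a 1 cm path length.
    """
    if a230 <= 0 or a280 <= 0:
        raise ValueError("A230 and A280 must be positive to form ratios")
    return {
        "a260_a280": a260 / a280,
        "a260_a230": a260 / a230,
        "conc_ug_per_ml": a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor,
    }


# Hypothetical readings, chosen only to illustrate the arithmetic.
example = rna_metrics(a230=0.25, a260=0.5, a280=0.25)
print(example)
```

Ratios near 2.0 are commonly taken to indicate RNA with little protein or salt carry-over, although acceptance thresholds vary between laboratories.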
Library preparation and sequencing were outsourced to Apical Scientific Sdn. Bhd, Malaysia. The integrity of the RNA was determined using the Agilent RNA 6000 Nano kit (Agilent Technologies, USA) [6]. Small RNA library preparation was carried out using the NEBNext® Multiplex Small RNA Library Prep Set for Illumina® (New England Biolabs, UK), following the manufacturer's instructions. To prevent the formation of adaptor dimers, the RNA samples were first ligated to the 3′ SR adaptor, and primer hybridization was performed. Then, 5′ SR adaptor ligation and cDNA synthesis were performed. The small RNA libraries were enriched by PCR amplification using a common primer and a primer containing one of the 48 index sequences. The libraries were gel purified using BluePippin™ (Sage Science, USA). The libraries were quantified using KAPA Library Quantification kits for Illumina sequencing platforms according to the qPCR Quantification Protocol Guide. Indexed libraries were pooled in equimolar amounts and sequenced on an Illumina HiSeq 2500 instrument to generate 51-base reads.
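Equimolar pooling amounts to taking the same molar quantity of each library, so the volume drawn from each is inversely proportional to its qPCR-derived concentration. The following is a minimal sketch of that calculation. The function name, the concentrations, and the target amount are all hypothetical; the actual values used by the sequencing provider are not reported here.

```python
def pooling_volumes_ul(conc_nm, fmol_per_library):
    """Return the volume (in microliters) of each library to pool.

    Since 1 nM equals 1 fmol per microliter, the volume that contains
    `fmol_per_library` is simply amount / concentration.
    """
    volumes = {}
    for name, conc in conc_nm.items():
        if conc <= 0:
            raise ValueError(f"concentration for {name} must be positive")
        volumes[name] = fmol_per_library / conc
    return volumes


# Hypothetical library concentrations in nM (not measured values).
hypothetical_conc_nm = {"SBH49": 10.0, "SBH149": 20.0, "SBH372": 5.0}
volumes = pooling_volumes_ul(hypothetical_conc_nm, fmol_per_library=20.0)
print(volumes)
```

With these illustrative inputs, the most dilute library contributes the largest volume, so that every library contributes the same number of molecules to the pool.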

Raw read pre-analysis and annotation
The data have been deposited in the NCBI (National Center for Biotechnology Information) [7] Sequence Read Archive (SRA) [SRA experiment accession numbers SRX16744291 (SBH49), SRX16744292 (SBH149), and SRX16744293 (SBH372)] under BioProject accession number PRJNA863377 (https://www.ncbi.nlm.nih.gov/sra/PRJNA863377) [8]. Further analysis was carried out using fastq-mcf version 1.04.676 [9]. A Phred quality score threshold of Q20 was used. Adapter and index sequences were trimmed and low-quality reads were removed to produce clean reads [9]. Clean reads having similar sequences were merged using R software, and the redundant, partial sequences were removed to generate the sequence tags, i.e., the individual sequences that are unique to each sRNA [10].
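The pre-analysis steps can be illustrated with a simplified Python sketch. It is not the fastq-mcf algorithm, nor the R code used in this study. It assumes exact adapter matching, a mean-quality filter at Q20 with Phred+33 encoding, the 15-30 nt length window reported for the trimmed reads, and collapsing of identical sequences only. The adapter string, function names, and reads are illustrative placeholders.

```python
from collections import Counter

# Illustrative adapter prefix; substitute the adapter used by the library kit.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact adapter match, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 by assumption)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def clean_reads(records, adapter, min_len=15, max_len=30, min_q=20):
    """Yield trimmed sequences that pass the length and quality filters."""
    for seq, qual in records:
        seq, qual = trim_adapter(seq, qual, adapter)
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the 15-30 nt window
        if mean_phred(qual) < min_q:
            continue  # low-quality read
        yield seq


def collapse(seqs):
    """Merge identical sequences into tags with their read counts."""
    return Counter(seqs)


def make_record(insert, adapter="", tail="", qchar="I"):
    """Build a hypothetical (sequence, quality) pair with uniform quality."""
    seq = insert + adapter + tail
    return seq, qchar * len(seq)


# Hypothetical reads: two duplicates, one too short, one low quality,
# and one without adapter.
records = [
    make_record("ACGTACGTACGTACGTACGT", ADAPTER, "AAAA"),
    make_record("ACGTACGTACGTACGTACGT", ADAPTER, "AAAA"),
    make_record("ACGT", ADAPTER),
    make_record("TTTTGGGGCCCCAAAATTTT", ADAPTER, qchar="#"),
    make_record("GGGGCCCCAAAATTTTGGGG"),
]
tags = collapse(clean_reads(records, ADAPTER))
print(tags)
```

Production trimmers additionally tolerate mismatches and partial adapter matches and trim low-quality bases from read ends, so counts from such a sketch would not be expected to reproduce those in Table 1.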

Ethics Statements
Ethics approval for this study was obtained from the Universiti Malaysia Sabah Medical Research and Ethics Committee [JKEtika 1/20 (10)]. Informed consent for sample collection was obtained from all the participants in this study. The authors took ethical concerns into consideration when gathering data and ensured that the information obtained from the respondents was used only for research purposes.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Small RNA Mycobacterium tuberculosis raw sequence reads (Original data) (NCBI).