Whole genome sequence data of an Antarctic bacterium, Arthrobacter sp. ES1 from the Schirmacher Oasis, East Antarctica

Arthrobacter is a genus of coryneform bacteria in the family Micrococcaceae. Arthrobacter species isolated from hostile environments can produce interesting bioactive compounds, some of which may represent new classes of antibiotics. Here, we present the draft genome sequence of Arthrobacter sp. ES1, isolated from the Schirmacher Oasis in East Antarctica. Genomic DNA was sequenced on an Illumina MiSeq sequencer. The genome of Arthrobacter sp. ES1 is 3,964,927 bp in size, with a GC content of 65.73%. The raw sequencing reads have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRR20664316.


Specifications Table
Subject: Biology
Specific subject area: Microbiology and genomics
Type of data: Table, Figure
How the data were acquired: The genomic library was constructed using the Nextera® XT DNA Sample Preparation Kit. Genome sequencing was performed using a 300-cycle MiSeq® Reagent Kit v2 on the Illumina MiSeq platform. Raw sequencing data were trimmed and filtered using SolexaQA and bowtie2 [1, 2]. De novo assembly was performed using Velvet [3]. Genome completeness was assessed using BUSCO [4]. Genome annotation was performed using the online Rapid Annotation using Subsystem Technology (RAST) server [5].
Data

Value of the Data
• The whole-genome sequence data can be used to identify Arthrobacter sp. ES1 and to determine whether it represents a new species.
• The whole-genome sequence data of strain ES1 can support comparative genomic studies with other Arthrobacter species.
• Unravelling the genome of strain ES1 may aid the discovery of novel gene clusters encoding bioactive compounds.

Objective
Arthrobacter species are ubiquitous in the environment and are frequently isolated from soil. Arthrobacter strains have exceptional survival abilities and have been isolated from a variety of harsh environments, including radioactive and chemically contaminated sites as well as the polar regions. These strains can metabolize or resist hazardous compounds and heavy metals [6]. In addition, polar environments have been proposed as a source of novel bioactive compounds. Wietz et al. [7] isolated sixteen antibiotic-producing bacterial strains from the central Arctic Ocean; seven of these strains were Arthrobacter spp. that can produce arthrobacilins A and C under different growth conditions [7]. Our objective was to sequence, assemble, and annotate the genome of Arthrobacter sp. ES1 to support the discovery of new bioactive compounds and evolutionary studies.

Data Description
This dataset includes the raw and assembled DNA sequences of Arthrobacter sp. ES1, together with their quality assessment and the genome annotation. The paired-end sequencing reads were designated ES1_R1.fastq (forward) and ES1_R2.fastq (reverse). Herein, the raw and clean sequencing reads, assembly statistics, genome quality, and annotation are reported. A total of 3,248,048 raw reads were generated, comprising 490,455,248 bases (Table 1). The reads were then pre-processed to remove low-quality, contaminant, and short reads, and 72.92% of the reads were retained as clean reads.

Table 1. Pre-processed sequencing read statistics of forward (ES1_R1.fastq) and reverse (ES1_R2.fastq) reads.

The genome size of strain ES1 is 3,964,927 bp at 77× sequence coverage, with a GC content of 65.73%. The draft genome of strain ES1 consists of 170 contigs; the longest contig is 356,645 bp, the N50 is 66,568 bp, and the N90 is 15,117 bp. De novo assembly produced 111 small contigs (< 10,000 bp) and 59 large contigs (> 10,000 bp) (Table 2).

The quality of the strain ES1 draft genome was examined with Benchmarking Universal Single-Copy Orthologs (BUSCO) using the actinobacteria_odb9 lineage dataset, which identified 97.8% complete BUSCOs (Fig. 1). The genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server, which predicted 3,904 coding sequences and 51 RNAs in strain ES1; 25% of the coding sequences were classified into 285 subsystems (Fig. 2).
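The assembly statistics above (N50, N90, contig size classes, GC content) follow standard definitions: Nx is the contig length L such that contigs of length L or longer together span at least x% of the total assembly. As a minimal sketch, assuming contig lengths and sequences are already available as Python values (for example, parsed from the Velvet contigs FASTA, which is not shown), they could be computed as below. The helper names `nx`, `gc_content`, and `size_classes` are hypothetical and are not part of Velvet, BUSCO, or RAST, and the example lengths are illustrative, not ES1 data.

```python
from typing import Iterable, Sequence, Tuple


def nx(lengths: Sequence[int], x: float) -> int:
    """Return Nx: the contig length L such that contigs of length >= L
    together span at least x% of the total assembly length."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("no contigs supplied")
    target = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return ordered[-1]  # defensive fallback; not reached for 0 < x <= 100


def gc_content(sequence: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T); N is ignored."""
    s = sequence.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def size_classes(lengths: Iterable[int], cutoff: int = 10_000) -> Tuple[int, int]:
    """Count contigs shorter than `cutoff` (small) and at least `cutoff` bp (large)."""
    small = large = 0
    for length in lengths:
        if length < cutoff:
            small += 1
        else:
            large += 1
    return small, large


# Illustrative (hypothetical) contig lengths, not ES1 data.
example = [100, 80, 50, 40, 30]
print(nx(example, 50), nx(example, 90))  # 80 40
print(size_classes(example, cutoff=60))  # (3, 2)
print(round(gc_content("GGCCAT"), 2))    # 66.67
```

The text reports its size classes with strict inequalities (< 10,000 and > 10,000 bp); this sketch assigns a contig of exactly the cutoff length to the large class so that every contig is counted once.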

Genomic DNA extraction and sequencing
Arthrobacter sp. ES1 was grown in nutrient broth (NB) at 20 °C for 3 days, and the culture was used for genomic DNA extraction. Genomic DNA was extracted using the DNeasy Blood and Tissue Kit (Qiagen Inc., USA) according to the manufacturer's instructions. The Nextera® XT DNA Sample Preparation Kit was used to construct the genomic library. Whole-genome shotgun sequencing was performed using a 300-cycle MiSeq® Reagent Kit v2 on an Illumina MiSeq sequencer to generate 150 bp paired-end reads.
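Read and base totals, such as the raw-read counts reported in Table 1, can be tallied directly from the paired FASTQ files. The sketch below is an illustration, not the authors' pipeline: it assumes uncompressed FASTQ with exactly four lines per record (the usual layout of Illumina output) and sums the forward and reverse files. The function names `fastq_stats` and `paired_stats` are hypothetical.

```python
from typing import Iterable, Tuple


def fastq_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Count reads and bases in a FASTQ stream with 4-line records."""
    reads = bases = 0
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of each record holds the sequence
            reads += 1
            bases += len(line.strip())
    return reads, bases


def paired_stats(r1_path: str, r2_path: str) -> Tuple[int, int]:
    """Sum read and base counts over the forward and reverse FASTQ files."""
    total_reads = total_bases = 0
    for path in (r1_path, r2_path):
        with open(path) as handle:
            reads, bases = fastq_stats(handle)
        total_reads += reads
        total_bases += bases
    return total_reads, total_bases


if __name__ == "__main__":
    # Toy records for illustration; with real data one might call
    # paired_stats("ES1_R1.fastq", "ES1_R2.fastq").
    toy = ["@read1", "ACGTAC", "+", "IIIIII", "@read2", "ACG", "+", "III"]
    print(fastq_stats(toy))  # (2, 9)
```

Counting only every fourth line keeps the sketch streaming and memory-light; it would miscount multi-line (wrapped) FASTQ, which is why the four-line layout is stated as an assumption.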

Reads Pre-Processing, Genome Assembly, Quality Assessment, and Annotation
The raw reads were pre-processed with SolexaQA to remove low-quality bases (Phred quality < 20) and short reads (minimum length, 50 bp) [1]. Reads originating from the phiX control library were then removed with bowtie2 [2]. FastQC was used to confirm that the resulting clean reads were of high quality (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) [8]. De novo assembly and scaffolding were performed using Velvet v1.2.10 [3]. The quality of the draft genome was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) [4]. The draft genome was annotated using the Rapid Annotation using Subsystem Technology (RAST) server [5].
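The quality-trimming rule above (remove bases below Phred 20, then discard reads shorter than 50 bp) can be sketched as follows. This is modelled on the approach described for SolexaQA's dynamic trimming, which keeps the longest contiguous segment of a read whose bases all pass the quality cutoff, followed by a length filter. It is not SolexaQA's own code: it assumes Phred+33 quality encoding and processes single reads, so pair-aware handling and the bowtie2 phiX filtering are not shown. The function names are hypothetical.

```python
from typing import Optional, Tuple

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def longest_high_quality_segment(qual: str, min_q: int = 20) -> Tuple[int, int]:
    """Return (start, end) of the longest run of bases with Phred >= min_q."""
    best_start = best_end = 0
    start = None  # start index of the current passing run, if any
    for i, ch in enumerate(qual):
        if ord(ch) - PHRED_OFFSET >= min_q:
            if start is None:
                start = i
            if i + 1 - start > best_end - best_start:
                best_start, best_end = start, i + 1
        else:
            start = None  # a low-quality base breaks the run
    return best_start, best_end


def trim_read(seq: str, qual: str, min_q: int = 20,
              min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Trim a read to its best segment; return None if it is too short."""
    start, end = longest_high_quality_segment(qual, min_q)
    if end - start < min_len:
        return None  # discarded by the length filter
    return seq[start:end], qual[start:end]


# Hypothetical read: three low-quality bases ('#' = Q2) then 60 at Q40 ('I').
seq = "NNN" + "ACGT" * 15
qual = "#" * 3 + "I" * 60
trimmed = trim_read(seq, qual)
print(None if trimmed is None else len(trimmed[0]))  # 60
```

Because a single low-quality base splits a read into separate runs, this rule can remove internal as well as terminal bases; tools differ on this point, so the sketch should be read as one plausible reading of "remove low-quality bases".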

Ethics Statement
This work involves neither human nor animal subjects. The authors declare that this manuscript is original work and has not been published elsewhere.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.