Draft genome sequence data of Gordonia hongkongensis strain EUFUS-Z928 isolated from the octocoral Eunicea fusca

Octocorals are among the most prolific sources of biologically active compounds. A significant part of their specialized metabolite richness is linked to the abundance of their associated microbiota. Consequently, research on the bioprospecting potential of microorganisms associated with these marine invertebrates has gained much interest. Here, we describe the draft genome of Gordonia hongkongensis strain EUFUS-Z928 isolated from the octocoral Eunicea fusca. The genome was assembled de novo from short-read whole-genome sequencing data. Additionally, functional annotation of predicted genes was performed using the RAST tool kit, including genome mining for specialized metabolite biosynthetic gene clusters using the antiSMASH v6.0 tool. The genome sequence data of G. hongkongensis EUFUS-Z928 can provide information for further analysis of the potential biotechnological use of this microorganism and guide the characterization of other related actinobacterial isolates. Likewise, this information increases the analytical capacity for studying the genus Gordonia.



Value of the Data
• The draft genome data of Gordonia hongkongensis strain EUFUS-Z928 provide valuable information for the study of the evolution of the genus Gordonia and its biotechnological potential.
• These data are valuable for environmental and clinical microbiology, bioprospecting, and biotechnology researchers.
• These data can be used for genome mining to discover novel metabolite biosynthesis pathways.
• Given the potential shown by Gordonia species in bioremediation, these data support further comparative genomics work and allow a better understanding of the mechanisms involved in bioremediation processes.

Data Description
The strain EUFUS-Z928 was isolated from the octocoral Eunicea fusca collected in Santa Marta Bay, Colombia. Table 1 shows the results of the de novo and scaffolded genome assembly of the strain EUFUS-Z928. Scaffolding substantially improved the assembly, reducing the number of contigs by 76.23% and yielding an L50 and L75 of 1 (N50 = 5,295,384 bp). The scaffolding used the genomes of the closest relatives as references, selected according to the overall genome relatedness indices (OGRI) obtained for the de novo assembly (Table S1: https://osf.io/q8xus/).
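The L50 and N50 values reported above follow the usual definitions: with the sequences sorted from longest to shortest, L50 is the smallest number of sequences whose combined length reaches half of the assembly, and N50 is the length of the last sequence in that set. A minimal Python sketch of the calculation is given below. The function name and the toy lengths are illustrative assumptions and are not taken from the EUFUS-Z928 assembly.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    fraction=0.5 gives N50/L50; fraction=0.75 gives N75/L75.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Fallback for rounding edge cases: the whole assembly is needed.
    return ordered[-1], len(ordered)


# Hypothetical toy lengths (bp): one dominant scaffold plus small contigs.
toy_lengths = [5000, 300, 200, 100]
n50, l50 = nx_lx(toy_lengths, 0.5)
n75, l75 = nx_lx(toy_lengths, 0.75)
print(n50, l50, n75, l75)
```

Read this way, an L50 and L75 of 1 mean that a single scaffold accounts for at least 75% of the assembled bases.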
The genome-based classification and identification found the strain EUFUS-Z928 to be closely related to the Gordonia terrae and Gordonia lacunae type strains (Table 2, Fig. 1A). Phylogenetic analysis of the 16S rRNA gene also found a close relationship with Gordonia hongkongensis (Fig. 1B).
Table 2 Overall genome relatedness indices (OGRI) between EUFUS-Z928 and the closely related type strain genomes.
Finally, phylogenetic analysis of the genes coding for protein translocase subunit SecA1 (secA1) and DNA gyrase subunit B (gyrB) allowed classification of strain EUFUS-Z928 as G. hongkongensis (Fig. 1C and D). At the time of the analysis, G. hongkongensis genomes were not available in the Type Strain Genome Server (TYGS); therefore, they could not be included in the whole genome-based phylogram.
A total of 5042 genes were annotated in the genome of G. hongkongensis EUFUS-Z928 (the complete annotation data can be found in Table S2: https://osf.io/2ra3k/). Of these, 4987 corresponded to coding sequences (CDS), most of them (62.44%) with a functional assignment (Table 3). Additionally, analysis with the antiSMASH v6.0 tool identified 14 biosynthetic gene clusters (BGCs) (Table 3 and Fig. 2), of which only the NRPS and terpene classes were represented by more than one cluster (4 and 2, respectively).
Regarding proteins with assignments to subsystems, as shown in Fig. 3, Metabolism (45.61%), Protein Processing (14.76%), Energy (13.23%), and Stress Response, Defense, Virulence (8.72%) were the subsystems with the highest assignments. In the latter, genes related to antibiotic resistance (n = 43), arsenic resistance (n = 5), as well as genes related to protection against oxidative stress such as mycothiol (n = 10) and protection from reactive oxygen species (n = 3) stand out. Complete information on the 1640 genes assigned to subsystems is shown in Table S3 (https://osf.io/6j5zs/).
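As a quick consistency check, the subsystem percentages above can be converted into approximate gene counts. The sketch below assumes that the percentages are taken over the 1640 genes assigned to subsystems. That denominator is an assumption, since the text does not state it explicitly, and the resulting counts are estimates rather than values reported in Table S3.

```python
# Percentages quoted in the text for the four largest subsystems.
subsystem_percent = {
    "Metabolism": 45.61,
    "Protein Processing": 14.76,
    "Energy": 13.23,
    "Stress Response, Defense, Virulence": 8.72,
}

# Assumed denominator: the 1640 genes assigned to subsystems.
TOTAL_SUBSYSTEM_GENES = 1640


def estimated_count(percent, total=TOTAL_SUBSYSTEM_GENES):
    """Convert a percentage of `total` into the nearest whole gene count."""
    return round(percent / 100 * total)


for name, pct in subsystem_percent.items():
    print(f"{name}: ~{estimated_count(pct)} genes")
```

Under this assumption each percentage maps to a value very close to a whole number of genes, which is consistent with 1640 being the denominator used for Fig. 3.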
According to the List of Prokaryotic names with Standing in Nomenclature, 47 species of the genus Gordonia have been reported so far (https://lpsn.dsmz.de/genus/gordonia; consulted on 04/02/2022). Although several strains of Gordonia are opportunistic pathogens, their potential for bioremediation of polluted environments [1] makes them a valuable biological resource in several research areas. The whole-genome sequence and functional annotation data of G. hongkongensis EUFUS-Z928 provide valuable information to facilitate the design and execution of more in-depth studies such as comparative genomics and genome mining.

Strain isolation and DNA extraction
Strain EUFUS-Z928 was isolated from a sample of the octocoral Eunicea fusca (collected by diving at Punta de Betín, 11°15′02.1″N 74°13′16.0″W, Santa Marta, Magdalena, Colombia). The isolation was carried out using a modified Zobell medium (1.25 g of yeast extract, 3.75 g of peptone, 18 g of NaCl, 2 g of MgCl2, 0.525 g of KCl, 0.075 g of CaCl2 and 15 g of agar dissolved in enough distilled water to make 1 L of solution) supplemented with nalidixic acid (50 μg/mL).
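For mapping or database submission, the collection coordinates are often needed in decimal degrees. The following Python sketch shows the conversion, assuming the coordinates are read as 11°15′02.1″N 74°13′16.0″W; the function name is illustrative.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West hemispheres are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Collection site, read as 11°15'02.1"N 74°13'16.0"W (assumed reading).
latitude = dms_to_decimal(11, 15, 2.1, "N")
longitude = dms_to_decimal(74, 13, 16.0, "W")
print(f"{latitude:.5f}, {longitude:.5f}")
```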
Genomic DNA extraction was performed using the Quick-DNA Fungal/Bacterial Microprep kit (Zymo Research Corporation, Irvine, CA, USA) following the manufacturer's instructions. The quality of the extracted DNA was verified by agarose gel electrophoresis, and the DNA was quantified using the Qubit 1X dsDNA High Sensitivity kit (Invitrogen, Life Technologies, CA, USA).

Whole genome sequencing, assembly and annotation
Whole-genome sequencing was performed by Macrogen Inc. (Korea) using Illumina paired-end sequencing technology. Short-read (151 bp) libraries were prepared using the TruSeq Nano DNA Library Prep kit (Part # 15041110, Rev. D, Illumina, Inc., San Diego, CA, USA) and sequenced on the Illumina NovaSeq 6000 platform. The raw sequence reads were quality-filtered, trimmed, and assembled de novo with the Shovill pipeline v1.1.0 (default parameters) [2], using SPAdes as the assembler [3]. Contigs shorter than 200 bp were removed. To check the quality of the de novo assembly, genome completeness was assessed with the BUSCO tool [4] (99.8%), and the ContEst16S algorithm [5] did not identify contamination in the assembled genome. Genome sequencing and assembly data are available from NCBI BioProject under accession PRJNA798903.
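The contig length filter described above can be illustrated with a short, dependency-free Python sketch. This is not the Shovill implementation; it only shows the rule "discard contigs shorter than 200 bp" applied to FASTA records. The function names and the toy sequences are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=200):
    """Keep contigs whose length is at least `min_length` bp."""
    return [(h, s) for h, s in records if len(s) >= min_length]


# Hypothetical toy assembly: one 250 bp contig and one 150 bp contig.
toy_fasta = [
    ">contig_1",
    "ACGT" * 50,
    "AC" * 25,
    ">contig_2",
    "G" * 150,
]
kept = filter_contigs(read_fasta(toy_fasta))
print([(name, len(seq)) for name, seq in kept])
```

To apply the same filter to a real assembly, an open file handle can be passed to read_fasta in place of the toy list.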
Genome scaffolding was performed using the MeDuSa web server [6] with the reference genomes Gordonia sp. SGD-V-85 (RefSeq assembly accession: GCF_001456905) and Gordonia terrae (RefSeq assembly accession: GCF_901542405). These genomes were selected based on the OGRI results obtained with the de novo assembled genome. The de novo assembled and scaffolded genomes were compared using the QUAST web server [7]. Genome annotation was performed with the RAST tool kit through the PATRIC service center [8]. To detect and characterize the content of specialized metabolite BGCs, we annotated the genome using the antiSMASH v6.0 tool [9]. A graphical circle map was generated on the CGView server to visualize the annotation results [10].

Phylogeny analysis
The analysis was conducted on both a genome-wide and a single-gene basis, including the 16S ribosomal RNA gene (well established for phylogenetic analysis of bacteria) and the secA1 and gyrB genes, which have also been used for Gordonia phylogenetic analysis and offer greater discriminatory power for identification at the species level [11]. The OGRI were calculated using the TYGS [12] and JSpeciesWS [13] web servers. The sequences of the 16S rRNA, secA1, and gyrB genes were retrieved from our annotated genome of G. hongkongensis EUFUS-Z928. Phylogenetic trees for single-gene analysis were estimated by the maximum likelihood method using the IQ-Tree tool [14] (bootstrap values were calculated from 1000 replicates). Phylograms were generated using MEGA v11.0.10 [15]. Whole-genome phylogenies were inferred using the Genome BLAST Distance approach in the TYGS server.

Ethics Statements
The samples used in this research were of Colombian origin and were obtained according to Amendment No. 5 to ARG Master Agreement No. 117 of 26 May 2015, granted by the Ministry of Environment and Sustainable Development, Colombia.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.