High quality draft genome sequences of Mycoplasma agassizii strains PS6T and 723 isolated from Gopherus tortoises with upper respiratory tract disease

Mycoplasma agassizii is one of the known causative agents of upper respiratory tract disease (URTD) in Mojave desert tortoises (Gopherus agassizii) and in gopher tortoises (Gopherus polyphemus). We sequenced the genomes of M. agassizii strains PS6T (ATCC 700616) and 723 (ATCC 700617) isolated from the upper respiratory tract of a Mojave desert tortoise and a gopher tortoise, respectively, both with signs of URTD. The PS6T genome assembly was organized in eight scaffolds, had a total length of 1,274,972 bp, a G + C content of 28.43%, and contained 979 protein-coding genes, 13 pseudogenes and 35 RNA genes. The 723 genome assembly was organized in 40 scaffolds, had a total length of 1,211,209 bp, a G + C content of 28.34%, and contained 955 protein-coding genes, seven pseudogenes, and 35 RNA genes. Both genomes exhibit a very similar organization and very similar numbers of genes in each functional category. Pairs of orthologous genes encode proteins that are 93.57% identical on average. Homology searches identified a putative cytadhesin. These genomes will enable studies that will help understand the molecular bases of pathogenicity of this and other Mycoplasma species.


Introduction
The genus Mycoplasma, within the bacterial class Mollicutes (Tenericutes), contains over one hundred species, many of which are pathogenic to vertebrates [1]. An upper respiratory tract disease has been implicated in population declines in Mojave Desert tortoises (Gopherus agassizii) found in the desert southwest of the United States and gopher tortoises (Gopherus polyphemus) inhabiting forests of the U.S. southeast [2][3][4]. Pathogens associated with this disease include two Mycoplasma, Mycoplasma agassizii and Mycoplasma testudineum [5][6][7]. Due to conservation concerns regarding URTD, this disease and its associated pathogens have become a topic of research interest, though our understanding of the biology and progression of URTD is lacking [8,9]. In particular, disease in tortoises is found with varying levels of morbidity, and one hypothesis for this finding is that there is genetic variation of M. agassizii associated with varying levels of virulence [8]. To understand better the amount of genomic differentiation occurring between M. agassizii collected from different tortoise host species, and to identify markers associated with virulence, we sequenced the M. agassizii genome from two strains, PS6 T and 723. This sequencing is part of a larger project to ultimately genetically detect variation in strains and their virulence from field-cultured samples.

Classification and features
Mycoplasma agassizii has been isolated from multiple tortoise species, and was found to be pathogenic in Mojave Desert tortoises and gopher tortoises in North America, causing URTD [5,6,10]. In infected North American tortoises, M. agassizii is most often found in the nasal passages and choana, but can also be isolated from the trachea and lungs [10]. This microbe forms a close extracellular association with the nasal epithelium of its host, and severe infections can result in lesions [11]. Infected hosts experience clinical signs of disease including nasal exudate, possibly leading to lethargic behavior and loss of appetite [5,11].
M. agassizii is coccoid to pleomorphic in shape, lacks a cell wall, and has a three-layer membrane ( Table 1, Fig.  1). These microbes range in size under 1 μm [10,11] and grow in culture at an optimal temperature of 30°C, with an extremely slow growth rate [10,12]. Mortality of M. agassizii occurs at temperatures above 37°C [12], and it retains viability after prolonged periods of cold temperatures [6,10], indicating that body temperatures experienced by its ectothermic hosts likely affect the microbe's success over the seasons. In an experiment to detect co-infection patterns of M. agassizii with its close relative M. testudineum, there was some indication that the two species form a facilitative relationship in a hostcontext-dependent manner [13]. Preliminary microbiome data suggest that the presence of M. agassizii is associated with a shift in the microbial community composition in Mojave and Sonoran Desert tortoises (Gopherus morafkai) (CLW, FCS and CRT, unpublished data).
The strains of M. agassizii that we have sequenced were isolated from two host species. Strain PS6 T was isolated from the upper respiratory tract of a Mojave Desert tortoise in the Las Vegas Valley, Nevada, USA [10], while strain 723 was obtained from an ill gopher tortoise in Sanibel Island, Florida, USA [6]. Strains were cultured in SP4 broth, and have been used in experiments to demonstrate their pathogenic effects on their tortoise hosts [5,6]. To determine the placement of M. agassizii in the mycoplasmal phylogeny, all 16S rRNA gene sequences from the type strains of Mycoplasma species were obtained from the SILVA database [14] and aligned using MUSCLE 3.8.31 [15], and a phylogenetic tree was constructed using the maximum likelihood method implemented in MEGA7 [16] (Fig. 2). Consistent with prior results [17,18], M. testudineum is a sister group of M. agassizii in the resultant tree, and the M. agassizii/M. testudineum clade is a sister group of Mycoplasma pulmonis, the agent of murine respiratory mycoplasmosis. All three species fall within the hominis group of Mycoplasma (see ref. [19] for group definitions). The 16S rRNA gene sequence from M. agassizii, strain PS6 T , is 99.8, 93.2 and 89.2% identical to those of M. agassizii strain 723, M. testudineum strain BH29 T , and M. pulmonis strain PG34 T , respectively.

Genome sequencing information
Genome project history Two strains of M. agassizii were selected for sequencing, strains PS6 T and 723, both isolated from tortoises with signs of URTD [5,6,10,20]. Sequencing was conducted in October 2016. The Whole Genome Shotgun projects were deposited at DDBJ/ENA/GenBank under the accession numbers NQMN00000000 (strain PS6 T ) and NQNY00000000 (strain 723). The versions described in this paper are the first versions. A summary of the information of both projects in compliance with MIGS version 2.0 [21] is shown in Table 2.

Growth conditions and genomic DNA preparation
Freeze-dried M. agassizii strains were obtained from the ATCC in March 2011 (strain PS6 T ) and May 2016 (strain 723). Strain PS6 T was cultured on SP4 media and re-pelleted in-lab prior to DNA extraction. Genomic DNA was extracted using the Qiagen DNeasy Blood and Tissue Kit protocol for Gram-negative bacteria and eluted with water. Extracted DNA was quantified on a Qiagen QIAxpert system and with Picogreen analysis.

Genome sequencing and assembly
Genome sequencing was conducted using the Illumina Nextera XT DNA Library Preparation Kit (Illumina, Inc., San Diego, USA) with the Illumina NextSeq500 platform (150 bp, paired-end) and 2 ng of starting genomic DNA at the Nevada Genomics Center (University of Nevada, Reno). Sequencing was performed in multiplex with multiple samples, using dual index sequences from the Illumina Nextera XT Index Kit, v2 (PS6 indices: index 1 N702, index 2 S510; 723 indices: index 1 N702, index 2 S511). A total of 349,251 and 332,967 read pairs were obtained for strains PS6 T and 723, respectively. Using Trimmomatic, version 0.36 [22], reads were trimmed to remove Nextera adapter sequences and low quality nucleotides from either end (average Phred score Q ≤ 5, four bp sliding window), and sequences trimmed to < 35 bp were removed. After trimming, 330,351 (PS6 T ) and 305,002 (723) read pairs, and 16,438 (PS6 T ) and 25,017 (723) single-reads (the pairs of which were removed) remained. De novo genome assembly was performed using SPAdes 3.10.1 [23], using as inputs the trimmed paired reads, and the trimmed single reads (assembly k-mer sizes 21, 33, 55, and 77, with read error-correction enabled and '-careful' mode mismatch correction). After removing scaffolds of less than 500 bp, the final assemblies consisted of 8 (PS6 T ) and 40 (723) scaffolds with a total length of 1,274,972 bp (PS6 T ) and 1,211,209 bp (723), an average length of 159,372 bp (PS6 T ) and 30,280 bp (723), and an N50 of 654,010 bp (PS6 T ) and 56,701 bp (723). The coverage was 38.51× for the PS6 T assembly and 37.73× for the 723 assembly.

Genome properties
The properties of both draft genomes are summarized in Table 3. The final assembly for strain PS6 T consisted of 8 scaffolds, with a total length of 1,274,972 bp, and a G + C content of 28.43%. PGAP [24] identified a total of 979 protein-coding genes, 13 pseudogenes, and 35 RNA genes. The assembly for strain 723 consisted of 40 scaffolds, with a total length of 1,211,209 bp, and a G + C content of 28. 34%. A total of 955 protein-coding genes, 7 pseudogenes, and 35 RNA genes were identified. In both cases, the identified RNA genes include 3 rRNAs (one 5S, one 16S and one 23S), 3 ncRNAs and 29 tRNAs. PGAP identified no CRISPR repeats in any of the two genomes, and CRISPR-Finder [30] identified 6 "questionable" repeats in the PS6 T genome and one "questionable" repeat in the 723 genome, but no "confirmed" repeats. The numbers of proteincoding genes in each COG category [31] were similar for both strains, and are summarized in Table 4.

Insights from the genome sequence
The small genome size and low G + C content of both M. agassizii genomes described here are consistent with those of other Mycoplasma genomes sequenced [18,32,33]. However, the M. agassizii genomes are significantly larger than the genome of M. testudineum, strain BH29 T (960,895 bp, 788 protein-coding genes; ref. [18]). The difference in the genome size of both sister species might account for the fact that M. agassizii is associated with URTD, whereas the link between M. testudineum and URTD is less clear [13]; i.e., genes present in M.  agassizii PS6 T and 723 (•). All 16S sequences from the Mycoplasma genus were obtained from the SILVA database [14]. Only sequences in the 'The All-Species Living Tree' Project (LTP), release 128, were retained. This dataset only contains sequences from type strains, designated with a superscripted "T". Clostridium botulinum strain ATCC 25763 was also included in the dataset as outgroup. Sequences were aligned using MUSCLE 3.8.31 [15]. A phylogenetic tree was obtained using the maximum likelihood method implemented in MEGA7 [16], with 1000 bootstrap replicates. Species with available genomes at the NCBI Genomes database or the Genomes Online Database are represented in bold face. GenBank accession numbers are shown in parentheses. Bootstrap support values above 50% are represented. The scale bar represents a divergence of 0.05 nucleotide substitutions per nucleotide position agassizii but not in M. testudineum might be responsible for pathogenicity.
In spite of the fact that the two M. agassizii strains sequenced here were obtained from geographically distant locations (the Mojave Desert and Sanibel Island) and from different tortoise species (G. agassizii and G. polyphemus; refs. [5,6,10,20]), the two genomes are very similar, exhibiting very similar sizes, numbers of genes (Table 3), functional composition (Table 4), and a high degree of synteny (Fig. 3a). A best-reciprocal-hit approach (based on BLASTP searches, E-value ≤10 − 10 ) identified 828 pairs of putative orthologs within both genomes. The sequences of proteins encoded by pairs of orthologous genes were aligned using ProbCons version 1.12 [34], and were 93. 57% identical on average (median: 96.84%). In contrast, comparison of the genomes of M. agassizii   (Fig. 4). Surprisingly, our 16S sequence for strain PS6 T and that obtained by Brown et al. (also for strain PS6 T ; ref. [20]) exhibit 8 differences (4 point differences and 4 indels; Fig. 4). These differences may represent mutations accumulated since the isolation of the strain, or sequencing errors.
To initiate pathogenesis, Mycoplasma cells usually require adhering to the host mucosa. Adhesion mechanisms are relatively well understood in Mycoplasma pneumoniae and its close relatives, but poorly understood in other Mycoplasma species [35]. In a prior study, we searched all available Mycoplasma genomic data (nr database, including the genome of M. testudineum BH29 T ) for homologs of M. pneumoniae cytadhesins P1, P30, P65, P40 and P90 and cytadhesin accessory proteins hmw1, hmw2 and hmw3, finding homologs only in species closely related to M. pneumoniae (Mycoplasma genitalium, Mycoplasma gallisepticum, Mycoplasma pirum, Mycoplasma alvi, Mycoplasma imitans, and Mycoplasma testudinis) [18]. Here, we expanded these analyses (BLASTP and TBLASTN searches; E < 10 − 5 and low-complexity regions filtering) to the two M. agassizii proteomes/genomes, also with negative results. In addition, none of the predicted M. agassizii proteins exhibit any of the Pfam domains present in the M. pneumoniae (domains "CytadhesinP1", "Adhesin_P1", "Cytadhesin_P30", "MgpC" and "EAGR_ box"). This could be attributed either to (i) M. pneumoniae adhesion proteins being specific to this species and its close relatives, or (ii) adhesion proteins evolving very fast, perhaps due to co-evolutionary races, precluding detection of homologs in distantly related species. The first possibility is supported by the fact that M. pulmonis, the most closely related known species to the M. agassizii/   [45], a web interface for Circos [46]. The relative order of scaffolds is unknown. For strain PS6 T , scaffolds are sorted by size   [36]. In support of the second scenario, our analysis of orthologous sequences revealed poor protein conservation among the sister groups M. agassizii and M. testudineum. We repeated our similarity searches using as query a list of known Mycoplasma adhesins, which we obtained by searching the text "Mycoplasma adhesin" in the Uni-Prot database [37]. Our prior searches against the M. testudineum BH29 T proteome/genome failed to detect any significant hits. In the current study, we detected significant similarity between a Mycoplasma mobile protein annotated as a "Truncated adhesin protein" (UniProt ID: Q8L3E5_9MOLU) and the proteins CJF60_ 05070 (strain PS6 T , 3308 amino acids) and CJJ23_03020 (strain 723, also 3308 amino acids) of M. agassizii. CJF60_ 05070 and CJJ23_03020 are 92% identical. The Cterminal part of the M. mobile protein exhibits homology to three regions of the M. agassizii proteins (positions 958-1261, 1296-1597 and 1717-1924 of CJF60_05070; positions 956-1259, 1294-1595 and 1715-1922 of CJJ23_ 03020). A BLASTP search using CJF60_05070 as query sequence against the nr database identified a total of 17 hits, including three adhesion proteins (Table 5). Of note, the first hit is a M. testudineum protein (34%), which was not detected in our prior analyses [18]. Equivalent results were obtained using the CJJ23_03020 protein sequence as query (data not shown). The TMHMM server v. 2.0 [29] predicted both CJF60_05070 and CJJ23_03020 to contain a transmembrane domain at the N-terminal part of the protein (positions 7-29), and most of the protein (positions 30-3308) to be extracellular. Taken together, these observations point to these proteins as potential M. agassizii adhesins.

Conclusions
We have obtained draft genome sequences for M. agassizii, strains PS6 T and 723, both isolated from tortoises of the genus Gopherus with URTD. Both genomes exhibited a very small size and low G + C content, which is typical from Mycoplasma genomes. The two assemblies were very similar, in terms of synteny and protein sequences, in spite of the fact that they were obtained from different hosts and geographical locations. We identified a putative cytadhesin in both genomes. The new genomes will facilitate future studies that will help understand the molecular bases of pathogenicity of this and other Mycoplasma species.