A high-quality reference genome for the fish pathogen Streptococcus iniae

Fish mortality caused by Streptococcus iniae is a major economic problem in aquaculture in warm and temperate regions globally. There is also risk of zoonotic infection by S. iniae through handling of contaminated fish. In this study, we present the complete genome sequence of S. iniae strain QMA0248, isolated from farmed barramundi in South Australia. The 2.12 Mb genome of S. iniae QMA0248 carries a 32 kb prophage, a 12 kb genomic island and 92 discrete insertion sequence (IS) elements. These include nine novel IS types that belong mostly to the IS3 family. Comparative and phylogenetic analysis between S. iniae QMA0248 and publicly available complete S. iniae genomes revealed discrepancies that are probably due to misassembly in the genomes of isolates ISET0901 and ISNO. Long-range PCR confirmed five rRNA loci in the PacBio assembly of QMA0248, and, unlike S. iniae 89353, no tandemly repeated rRNA loci in the consensus genome. However, we found sequence read evidence that the tandem rRNA repeat existed within a subpopulation of the original QMA0248 culture. Subsequent nanopore sequencing revealed that the tandem rRNA repeat was the most prevalent genotype, suggesting that there is selective pressure to maintain fewer rRNA copies under uncertain laboratory conditions. Our study not only highlights assembly problems in existing genomes, but provides a high-quality reference genome for S. iniae QMA0248, including manually curated mobile genetic elements, that will assist future S. iniae comparative genomic and evolutionary studies.


S. iniae QMA0248 methylome
DNA methylation guides numerous critical processes including defence against foreign DNA, DNA replication and repair, gene expression and virulence. Analysis of PacBio sequence data enabled the detection of genome-wide DNA methylation.
Three DNA methyltransferases (MTases) are encoded in the QMA0248 genome.
On the basis of homology to known MTases, QMA0248_0514 (annotated as M.Sin248ORF514P in REBASE) likely targets the GCNGC motif and QMA0248_1949 (annotated as M.Sin248ORF1949P in REBASE) likely targets GCCHR (1). QMA0248_0505 (annotated as M.Sin248ORF0505P in REBASE) is encoded ~5 kb upstream of QMA0248_0514 but has no close functional homologs and thus has an unknown recognition sequence (1). Only a small fraction (<3%) of GCNGC and GCCHR motifs were detected as methylated in QMA0248. Collectively, these two motifs account for 21,421 sites across the chromosome (approximately one every 100 bp), suggesting a potential role for methylation in the regulation of gene expression in QMA0248. This density is roughly equivalent to the frequency of Dam GATC sites in Escherichia coli. Further work is required to determine if the activity detected here is biologically meaningful.
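The site density quoted above can be checked with simple arithmetic. The sketch below uses only figures reported in this text (8,074 GCNGC sites, 13,347 GCCHR sites, and a genome of about 2.12 Mb); the variable names are illustrative, and the genome size is the rounded value from the abstract rather than an exact chromosome length.

```python
# Figures taken from the text; genome size is the rounded 2.12 Mb value.
gcngc_sites = 8074
gcchr_sites = 13347
genome_bp = 2.12e6

# Combined number of candidate methylation sites for the two motifs.
total_sites = gcngc_sites + gcchr_sites

# Average spacing between sites, assuming an even distribution.
mean_spacing_bp = genome_bp / total_sites

print(f"total sites: {total_sites}")
print(f"mean spacing: {mean_spacing_bp:.1f} bp")
```

The combined count matches the 21,421 sites stated above, and the mean spacing of just under 100 bp agrees with the "one every 100 bp" approximation.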
Uncharacterised homologs of QMA0248_0514 with 99% amino acid identity are found in eight other available S. iniae complete or draft genomes, including SF1, YSFST01-82, ISET0901, ISNO and 89353 (1). In most cases, restriction enzymes predicted to recognise GCNGC are encoded nearby, or immediately adjacent to, the respective MTase gene. Notably, in QMA0248 the adjacent restriction enzyme gene (QMA0248_0515) is a pseudogene that has been truncated by an IS981. In S. iniae KCTC 11634BP, the orthologous gene in its draft-quality 454 genome was truncated at the same point by a contig break (2), suggesting that in both QMA0248 and KCTC 11634BP the MTase does not function as part of a restriction-modification system. The GCNGC motif is found 8,074 times in the QMA0248 genome, suggesting that methylation activity could have wide-ranging regulatory consequences.
The closest homologs to QMA0248_0514 for which MTase activity has been determined are the M.CmaLM2II enzyme from Clostridium mangenotii LM2 and the M.LmoJ3I enzyme from Listeria monocytogenes J3115. Despite sharing only modest overall amino acid similarity with QMA0248_0514 (59% and 45%, respectively), regions of high amino acid identity within their predicted target recognition domains (34/34 for M.CmaLM2II and 32/24 for M.LmoJ3I) support the prediction that QMA0248_0514 would also methylate the second cytosine of the GCNGC motif.
Detection of m5C using PacBio data is normally unreliable (3). As expected, methylation was detected at only a small fraction of GCNGC sites in the QMA0248 genome, and the consensus motif determined by the PacBio SMRT Portal software includes additional bases that are probably artefactual (e.g. GCNGCAGC) (Supplementary Table S5). Further experimental work (such as using Tet1 pretreatment to enhance detection of m5C with PacBio sequencing, or Oxford Nanopore sequencing) is needed to determine the true extent of cytosine methylation in the S. iniae genome and its role in gene regulation.
M.NgoDCXV homologs are remarkably rare and confined to a few streptococcal species (including S. iniae SF1). No specific methylation of GCCHR was detected in the QMA0248 genome, but 126 motifs that partially overlapped with GCCHR showed evidence of methylation (Supplementary Table S5). The GCCHR motif is present at 13,347 locations in the QMA0248 genome, so this represents only a small fraction of available sites. The m4C modification is normally detectable from PacBio sequence data; therefore, further work is required to determine if QMA0248_1949 is expressed and functional.
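Counts of degenerate motifs such as GCNGC and GCCHR can be obtained by expanding the IUPAC ambiguity codes into a regular expression and scanning the sequence. The following is a minimal sketch, not the method used in this study: the function names are hypothetical, and the counting convention (overlapping matches, both strands for non-palindromic motifs, palindromic motifs counted once) is an assumption that may differ from the one behind the figures reported here.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of a motif written in IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a lookahead regex so that overlapping sites are all found."""
    pattern = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + pattern + "))")


def count_sites(sequence, motif):
    """Count motif sites; assumption: palindromic motifs are counted once,
    other motifs are counted on both strands."""
    sequence = sequence.upper()
    forward = sum(1 for _ in motif_regex(motif).finditer(sequence))
    rc = reverse_complement(motif)
    if rc == motif:
        return forward
    reverse = sum(1 for _ in motif_regex(rc).finditer(sequence))
    return forward + reverse


# Toy sequence for illustration only (not from the QMA0248 genome).
toy = "GCAGCGGCCAG"
gcngc_count = count_sites(toy, "GCNGC")
gcchr_count = count_sites(toy, "GCCHR")
print(gcngc_count, gcchr_count)
```

GCNGC is its own reverse complement, so a single-strand scan covers both strands, whereas GCCHR must also be searched as YDGGC to capture sites on the opposite strand.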

Methyltransferase QMA0248_0505
The third MTase in S. iniae QMA0248 (QMA0248_0505, known as M.Sin248ORF0505P in REBASE) is encoded ~5 kb upstream of QMA0248_0514 and shares a similar strain distribution. There are no close homologs of QMA0248_0505 for which a recognition site has been determined. Accordingly, it has been annotated by REBASE as a putative Type II N4-cytosine or N6-adenine DNA methyltransferase of unknown recognition sequence (1).