Comparative genomics of Nocardia seriolae reveals recent importation and subsequent widespread dissemination in mariculture farms in the South Central Coast region, Vietnam

Between 2010 and 2015, nocardiosis outbreaks caused by Nocardia seriolae affected many permit farms throughout Vietnam, causing mass fish mortalities. To understand the biology, origin and epidemiology of these outbreaks, 20 N. seriolae strains collected from farms in four provinces in the South Central Coast region of Vietnam, along with two Taiwanese strains, were analysed using genetics and genomics. PFGE identified a single cluster amongst all Vietnamese strains that was distinct from the Taiwanese strains. Like the PFGE findings, phylogenomic and SNP genotyping analyses revealed that all Vietnamese N. seriolae strains belonged to a single, unique clade. Strains fell into two subclades that differed by 103 SNPs, with almost no diversity within subclades (0–5 SNPs). There was no association between geographical origin and subclade placement, suggesting frequent N. seriolae transmission between Vietnamese mariculture facilities during the outbreaks. The Vietnamese strains shared a common ancestor with strains from Japan and China, with the closest strain, UTF1 from Japan, differing by just 220 SNPs from the Vietnamese ancestral node. Draft Vietnamese genomes range from 7.55 to 7.96 Mbp in size, have an average G+C content of 68.2 % and encode 7602–7958 predicted genes. Several putative virulence factors were identified, including genes associated with host cell adhesion, invasion, intracellular survival, antibiotic and toxic compound resistance, and haemolysin biosynthesis. Our findings provide important new insights into the epidemiology and pathogenicity of N. seriolae and will aid future vaccine development and disease management strategies, with the ultimate goal of nocardiosis-free aquaculture.


INTRODUCTION
The genus Trachinotus, of the family Carangidae, comprises a group of marine, medium-sized, migratory, pelagic finfish that are widely distributed in subtropical and tropical waters worldwide [1,2]. Many members of the genus, such as T. carolinus, T. blochii, T. ovatus, and T. falcatus, are of great economic importance for fisheries and aquaculture sectors in America and Asia due to their high-quality meat, fast growth, high market price, and strong adaptability to a variety of captive environments [3][4][5][6][7]. In Asia, the farming of permit fish, particularly the snub nose permit, T. falcatus, has commercially taken place in ponds, raceways, and floating sea cages in both brackish and sea waters. Since 2010, Asian mariculture farms have produced over 2 million tonnes of fish meat, significantly contributing to the food security, poverty alleviation, and economic growth of the region [8]. However, the shortage of quality seed stock and the risk of fish disease outbreaks in several countries are key obstacles and challenges for the sector's sustainable development.
T. falcatus fingerlings were first imported into Vietnam from Taiwan and China in the 2000s and have quickly gained popularity, with permit fish now the third largest group of commercially cultured marine fish after seabass and grouper. However, high mortality rates of T. falcatus weighing between 5 and 350 g (6-45 cm in length) emerged in 2010 during an epizootic event that affected sea cage farms in Khánh Hòa province, in the South Central Coast region of Vietnam. Since this initial outbreak, large-scale outbreaks have occurred at several other farming sites in southern and central parts of the country [9,10]. Infected fish showed clinical signs of nocardiosis such as lethargy, skin blisters, ulcers, and multiple yellowish to whitish nodules affecting both internal and external organs. Based on analyses of 16S rRNA gene sequences and biochemical characteristics, the bacterial pathogen Nocardia seriolae was confirmed as the causative agent [10]; however, the origin of N. seriolae affecting Vietnamese permit fish farms has not yet been identified.
N. seriolae is a Gram-positive, branching, filamentous intracellular bacterium of the family Nocardiaceae that was initially described as N. kampachi in farmed yellowtail, Seriola quinqueradiata [11], following large outbreaks in Mie Prefecture, Japan. An estimated loss of approximately 260 tonnes of cultured yellowtails due to the disease was recorded in 1989 [12]. Nocardiosis has also impacted several other important fish species within the Japanese aquaculture industry such as amberjack (Seriola dumerili), Japanese flounder (Paralichthys olivaceus), and chub mackerel (Scomber japonicus). N. seriolae has subsequently been documented in Taiwan, China, Korea, USA, and Mexico, where high mortalities and associated economic losses due to nocardiosis have been reported in freshwater and marine fish species in both cultured and wild populations [13][14][15][16][17][18][19][20][21][22][23]. Despite the significant economic losses that nocardiosis causes in fish aquaculture worldwide, there are currently no effective measures against the disease.
Five complete and eight draft N. seriolae genome sequences were publicly available prior to our study, representing isolates retrieved from Japan, South Korea, and China [24][25][26][27][28]. These genomes have provided important insights into N. seriolae epidemiology, transmission, pathogenesis, and infection control strategies; however, isolates from other nocardiosis-prevalent regions such as Taiwan, USA, Mexico, and Vietnam have not yet been examined, leaving major gaps in our understanding of this devastating infectious disease. In the current study, we sequenced the genomes of seven N. seriolae isolates obtained from different permit fish farm locations across Vietnam and compared them with the 13 previously genome-sequenced N. seriolae isolates, allowing a comparison of isolates spanning a decade in time and from a variety of sources and geographical locations. Using this information, we developed two novel SNP-based PCR assays to rapidly differentiate Vietnamese from non-Vietnamese strains, and strains representing the two Vietnamese clades. We also characterized potential virulence factors and antimicrobial/toxin resistance determinants to gain insights into pathogenicity and survival mechanisms. Finally, we functionally annotated our N. seriolae genomes to determine whether differences in gene content might contribute to physiological variability among isolates.

Impact Statement
Nocardia seriolae, the aetiological agent of a lethal granulomatous disease known as fish nocardiosis, has caused high fish mortalities to global aquaculture sectors in recent decades, particularly in Asia and the Americas. This pathogen possesses a highly conserved genome and minimal genetic diversity, which limits the discriminatory power of existing genotyping techniques such as PFGE, leading to insufficient resolution among genetically related strains. To overcome resolution issues using genotyping methods such as PFGE, we employed whole-genome sequencing (WGS) to create highly resolved time-calibrated phylogenies from all available N. seriolae genomes (n=20), including seven newly sequenced strains we retrieved from Vietnamese fish farms, where nocardiosis outbreaks are increasingly imposing a significant commercial burden. This comprehensive comparative genomic analysis provides the first global phylogenetic analysis of N. seriolae strains, allowing the elucidation of the temporal and spatial dynamics of this pathogen, particularly in Vietnam. Using the comparative genomic data, we developed two SNP-based genotyping assays for differentiating Vietnamese from non-Vietnamese strains, and for distinguishing between the two Vietnamese clades, offering an inexpensive tool for rapidly discriminating and tracing the origin of new nocardiosis outbreaks. Our WGS and SNP assays identified the rapid and undetected spread of N. seriolae throughout South Central Coast aquaculture facilities, reflecting the need for better surveillance measures for this emerging pathogen. Finally, our genomic analysis also identified multiple virulence factors and antimicrobial resistance genes, which provide valuable information for better understanding the pathogenicity and persistence of this important aquaculture pathogen.

Bacterial strains
Due to a ban on N. seriolae culture importation into Australia, all live culture work was carried out in laboratories at the Institute for Aquaculture, Nha Trang University, Vietnam (for Vietnamese strains) and the Department of Veterinary Medicine, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung, Taiwan (for Taiwanese strains).
Twenty-two N. seriolae strains isolated from fish were examined in this study, comprising 20 from Vietnam and two from Taiwan. Vietnamese strains were isolated from cultured permit fish (T. falcatus) (31.0-85.8 g) during nocardiosis outbreaks occurring between 2014 and 2015 in four provinces (Phú Yên, Khánh Hòa, Ninh Thuận, and Vũng Tàu) in the South Central Coast region, and the Taiwanese strains were isolated from largemouth bass (Micropterus salmoides) and mullet (Mugil cephalus) in 2007 (Fig. 1 and Table 1). Isolates were confirmed as N. seriolae based on morphological observations, Ziehl-Neelsen staining (Fig. 2), 16S rRNA gene sequencing, and biochemical characteristics. The 20 Vietnamese strains were subjected to PFGE analyses, and seven of these isolates were selected for whole-genome sequencing (WGS) to enable more detailed genetic analyses. All 22 isolates were tested using our SNP genotyping assays.
Isolates were preserved in Brain Heart Infusion (BHI; Difco) broth mixed with 25 % (v/v) glycerol and stored at −80 °C. For culturing, strains were grown in BHI broth at 28 °C for 5 days, with orbital shaking at 150 r.p.m. For DNA extraction, 0.3 ml of bacterial cells were pelleted at 6000 g at 4 °C for 5 min and washed twice with 1× sterile PBS. To test for a haemolytic reaction, N. seriolae colonies grown in BHI broth were streaked onto 5 % (v/v) sheep blood agar and incubated at 28 °C for 3 weeks (Fig. 2a).

PFGE typing
PFGE was performed using 50 U XbaI or AseI (New England BioLabs) as previously described [20]. The type strain, N. seriolae BCRC 13745 (JCM 3360; isolated from the spleen of farmed yellowtail in Nagasaki Prefecture, Japan, ca. 1974), was included for comparative purposes. Gels of DNA fragments were analysed using GelCompar II software version 6.5 (Applied Maths). Gel bands were automatically assigned by the software and were checked and corrected manually. Only clearly resolved bands were considered for further analysis. A dendrogram was constructed using an unweighted pair group method with arithmetic mean (UPGMA) approach and the Dice similarity coefficient, with band optimization and band position tolerances of 1.0 %. Isolates that showed similarity between the banding profiles of ≥80 % (fewer than six bands of difference) were defined as indistinguishable or clonally related, whereas patterns with <80 % similarity (six or more bands of difference) represented different clusters of unrelated strains [29,30].
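The band-matching rule described above can be illustrated with a minimal sketch. It is not the GelCompar II implementation: the function names, the toy fragment sizes, and the way tolerance is applied (here, relative to fragment size rather than to gel position) are assumptions made purely for illustration.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.01):
    """Dice coefficient between two lanes, each given as a list of fragment sizes (kb).

    Two bands are treated as shared when their sizes differ by no more than
    `tolerance` (as a fraction of the larger band). This is a simplification
    of gel-position tolerance, assumed here for illustration only.
    """
    unmatched_b = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(unmatched_b):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del unmatched_b[i]  # each band can be matched only once
                break
    total = len(bands_a) + len(bands_b)
    return 2 * shared / total if total else 1.0


def same_cluster(similarity, cutoff=0.80):
    """Apply the >=80 % similarity criterion used to define a cluster."""
    return similarity >= cutoff


# Hypothetical lanes: four of five bands are shared.
lane_1 = [40, 100, 250, 500, 1100]
lane_2 = [40, 100, 250, 480, 1100]
similarity = dice_similarity(lane_1, lane_2)
print(similarity, same_cluster(similarity))
```

A full analysis would feed the resulting pairwise similarity matrix into UPGMA clustering to build the dendrogram; that step is omitted here.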

DNA extraction
Total genomic DNA of bacterial isolates was extracted using the Wizard Genomic DNA Purification Kit (Promega) as per the manufacturer's instructions. DNA was checked for sterility and shipped to the University of the Sunshine Coast, Queensland, Australia. The quantity and purity of extracted DNA were assessed using a NanoDrop 2000 (Thermo Scientific) and 1 % gel electrophoresis. DNA for Illumina WGS was submitted on dry ice to the Australian Genome Research Facility (AGRF; North Melbourne, VIC, Australia).
We used a hierarchical rooted phylogenetic approach to identify the appropriate root for our N. seriolae-only phylogeny (Fig. 3). First, we identified the nearest genetic neighbour to N. seriolae via a SPANDx phylogenomic comparison of 134 Nocardia species genomes belonging to 78 assigned species and 10 unassigned species (Fig. S1, available in the online version of this article). Next, we reconstructed a rooted phylogeny using the closest relative, N. concava NBRC 100430 (RefSeq accession: GCF_000308815.1) (Fig. S2), to determine the most ancestral N. seriolae strain for phylogenetic rooting.

SNP genotyping
The SPANDx SNP matrix was used to identify SNPs that: (i) distinguished Vietnamese from non-Vietnamese N. seriolae strains (220 SNPs; SNP1 assay), and (ii) differentiated the two Vietnamese clades (103 SNPs; SNP2 assay). We selected SNPs at positions 60409 and 587171 in EM150506 for SNP1 and SNP2 assay design, respectively (Data S1). SYBR Green-based mismatch amplification mutation assay (SYBR-MAMA) real-time PCRs were developed to permit rapid genotyping of all strains from this study against these two SNPs. SYBR-MAMA, also known as allele-specific PCR or amplification-refractory mutation system, exploits the differential 3′ amplification efficiency of Taq polymerase in real time via allele-specific primers targeting each SNP allele at their ultimate 3′-end [45]. SYBR-MAMA has been used for SNP genotyping in many bacteria [46,47] due to its low cost and simplicity. Each SNP assay consisted of one common primer and two allele-specific primers, matching either the non-Viet allele or the Viet allele for the SNP1 assay, and the Viet Clade 1 allele or Viet Clade 2 allele for the SNP2 assay (Table 2). The same destabilizing mismatch (A for SNP1 and G for SNP2) was incorporated at the penultimate (−2) 3′ base of both allele-specific primers to increase allele specificity [48]. Cycles-to-threshold (C_T) values for each allele-specific reaction were used to determine the SNP genotype for each strain via a change in C_T value (ΔC_T).
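The primer layout described above (SNP allele at the ultimate 3′ base, a shared deliberate mismatch at the −2 position) can be sketched as follows. The flanking sequence, primer length, and function name are hypothetical; the actual primer sequences are those listed in Table 2, and real designs would also need to account for melting temperature and strand orientation.

```python
def mama_primer(upstream_flank, allele, mismatch, length=20):
    """Build one allele-specific primer, written 5'->3'.

    upstream_flank: template sequence immediately 5' of the SNP (same strand
                    as the primer); its last base sits at the -2 position.
    allele:         SNP base placed at the ultimate 3' end of the primer.
    mismatch:       deliberate destabilizing base substituted at -2.
    """
    flank = upstream_flank.upper()
    if length < 2 or len(flank) < length - 1:
        raise ValueError("flank too short for the requested primer length")
    if flank[-1] == mismatch:
        raise ValueError("mismatch base equals the template base at -2")
    # Keep the template bases up to (not including) -2, then add mismatch + allele.
    return flank[-(length - 1):-1] + mismatch + allele


# Hypothetical 9-base flank and alleles; both primers share the same -2 mismatch.
flank = "GATTACAGC"
primer_allele_1 = mama_primer(flank, "T", mismatch="A", length=8)
primer_allele_2 = mama_primer(flank, "G", mismatch="A", length=8)
print(primer_allele_1, primer_allele_2)
```

Because both allele-specific primers carry the same −2 mismatch, the only difference between them is the 3′-terminal base, so any difference in amplification efficiency is attributable to the SNP allele.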
To validate SNP genotypes for our newly developed assays, we first established the reference ΔC_T values for each assay by running against the two Taiwanese and seven genome-sequenced Vietnamese strains. Assays were then tested against the 13 remaining Vietnamese isolates to determine their genotypes. For each PCR run, control DNA samples representing the matching and nonmatching allele genotypes were used as positive controls, and at least two no-template controls were included.
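As a rough illustration of how a ΔC_T readout translates into a genotype call, the sketch below compares the C_T values of the two allele-specific reactions for one strain. The thresholds (`min_delta`, `no_amp_ct`), the labels, and the example C_T values are placeholders assumed for illustration; in practice the decision boundary would come from the reference ΔC_T values established with the genome-sequenced strains.

```python
def call_genotype(ct_first, ct_second, labels=("allele 1", "allele 2"),
                  min_delta=3.0, no_amp_ct=40.0):
    """Call a SNP genotype from two allele-specific C_T values.

    ct_first / ct_second: C_T of each allele-specific reaction, or None when
                          no amplification was detected.
    min_delta:            assumed minimum |delta C_T| needed for a confident call.
    no_amp_ct:            assumed C_T substituted for a non-amplifying reaction.
    """
    if ct_first is None and ct_second is None:
        return "no call"
    ct_first = no_amp_ct if ct_first is None else ct_first
    ct_second = no_amp_ct if ct_second is None else ct_second
    delta = ct_second - ct_first  # positive when the first primer amplifies earlier
    if delta >= min_delta:
        return labels[0]
    if delta <= -min_delta:
        return labels[1]
    return "ambiguous"


# Hypothetical C_T pairs for the SNP1 assay (Viet primer, non-Viet primer).
snp1_labels = ("Viet", "non-Viet")
print(call_genotype(18.2, 27.9, snp1_labels))
print(call_genotype(29.0, 19.5, snp1_labels))
print(call_genotype(None, None, snp1_labels))
```

Returning "ambiguous" rather than forcing a call keeps borderline reactions visible for repeat testing.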

Virulence and antimicrobial resistance profile determination
Antimicrobial resistance- and virulence-related genes among the Vietnamese N. seriolae genomes were identified using RAST and the Virulence Factor Database (VFDB), Victors and PATRIC Virulence Factor (VF) databases available on the Pathosystems Resource Integration Center (PATRIC) [60,61]. In addition, homologues of experimentally verified pathogenicity determinants within other members of the genus Nocardia were searched for in the N. seriolae genomes.

PFGE genotypes
Twenty N. seriolae isolates from four Vietnamese coastal provinces (Fig. 1) were subjected to XbaI and AseI digestion to determine isolate relatedness across provinces. Restriction fragment sizes ranged from 40 kb to 1.1 Mbp. PFGE with XbaI alone resulted in between 19 and 21 restriction fragments among the Vietnamese strains; similarly, between 16 and 20 fragments were identified using AseI. Seven distinct patterns (labelled as pulsotypes NsX1-NsX7) were present using XbaI-digested DNA fragments, and ten patterns (labelled as pulsotypes NsA1-NsA10) for AseI. Using the ≥80 % similarity cut-off and 'fewer than six bands of difference' Tenover criteria, only one cluster was identified for each enzyme [29,30]. Even when combining data from both enzymes, the 20 Vietnamese isolates were still closely related, irrespective of their geographical origin, as shown by their categorization into a single cluster that was distinct from the Japanese type strain (Fig. 4).

Phylogenomic analysis
Based on the PFGE results, seven geographically diverse Vietnamese isolates were Illumina-sequenced, resulting in high-coverage draft genomes (Table 3). These genomic data were generated to address two questions: (i) whether comparative genomics, as with PFGE, would reveal minimal genetic diversity among the Vietnamese N. seriolae strains, and (ii) whether phylogenomic analysis could identify a potential origin for nocardiosis in Vietnamese aquaculture facilities. The seven Vietnamese genomes generated in this study, plus the sequences of 13 publicly available N. seriolae strains (all from other Asian countries), were compared to identify phylogenetically informative SNPs. A total of 8206 SNPs were identified; 7517 (91.6 %) were located in coding regions and comprised 126 nonsense, 5163 missense and 1531 silent variants. Of the 8206 SNPs, 7275 high-confidence, orthologous, core genome, biallelic SNPs were identified among the 20 N. seriolae strains; these SNPs were used for phylogenomic reconstruction.
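The filtering step described above (retaining only core, biallelic SNPs) and the resulting pairwise SNP distances can be sketched on a toy matrix. This is not the SPANDx pipeline; the strain names, the allele strings, and the use of "N" and "-" for missing calls are assumptions for illustration.

```python
from itertools import combinations


def core_biallelic_sites(matrix):
    """Return indices of sites called in every strain with exactly two alleles."""
    strains = list(matrix)
    n_sites = len(matrix[strains[0]])
    keep = []
    for i in range(n_sites):
        column = {matrix[s][i] for s in strains}
        # Core: only A/C/G/T calls (no 'N' or '-'); biallelic: two distinct bases.
        if column <= set("ACGT") and len(column) == 2:
            keep.append(i)
    return keep


def pairwise_snp_distances(matrix, sites):
    """Count differing alleles between every pair of strains over the kept sites."""
    return {
        (a, b): sum(matrix[a][i] != matrix[b][i] for i in sites)
        for a, b in combinations(matrix, 2)
    }


# Hypothetical SNP matrix: one string of allele calls per strain.
toy_matrix = {
    "strain_a": "ACGTA",
    "strain_b": "ACGCA",
    "strain_c": "ATGCN",
    "strain_d": "GTGT-",
}
sites = core_biallelic_sites(toy_matrix)
for pair, distance in pairwise_snp_distances(toy_matrix, sites).items():
    print(pair, distance)
```

Sites with any missing call are dropped entirely, which mirrors the idea of restricting the comparison to the core genome shared by all strains.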
The phylogenomic dendrogram revealed five distinct strain clusters (Fig. 3). As with PFGE, the seven Vietnamese isolates were highly clonal, with all strains clustering into a single unique 'Vietnamese' clade. Within this clade were two subclades that differed by 103 SNPs. These subclade SNPs were well distributed across the genome, with no evidence of SNP clusters due to recombination. The phylogenomic analysis also suggested that N. seriolae undergoes very little, if any, recombination, as demonstrated by a very high consistency index of 0.997; in other words, homoplastic SNP characters, which are more common following recombination events [62], were essentially absent. Within the two Vietnamese subclades, isolates were virtually identical (0-5 SNPs), indicating limited genomic alterations among these lineages (Fig. 3). Notably, there was no link between geographical region and subclade placement, with strains from Phú Yên, Khánh Hòa and Vũng Tàu falling into both Vietnamese subclades, indicating frequent N. seriolae transmission events between regions. The most recent common ancestor of the Vietnamese strains differed by 220 SNPs from the next closest known strain, UTF1, which was isolated from cultured yellowtail that succumbed to nocardiosis in 2008 in Miyazaki Prefecture, Japan [27].
[…] Table 1). No amplification was observed for the no-template controls.
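To make the consistency index concrete, the sketch below computes it on a toy tree using Fitch parsimony: the minimum possible number of changes per site (number of states minus one) is divided by the number of changes the tree actually requires. This illustrates the standard definition only; the tree, strain names, and alleles are hypothetical, and it is not a reproduction of the software used to obtain the 0.997 value.

```python
def fitch_changes(tree, states):
    """Fitch parsimony for one site on a rooted binary tree.

    tree:   a strain name (leaf) or a 2-tuple of subtrees.
    states: dict mapping strain name -> allele at this site.
    Returns (possible ancestral states, number of changes required).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_n = fitch_changes(tree[0], states)
    right_set, right_n = fitch_changes(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_n + right_n
    return left_set | right_set, left_n + right_n + 1


def consistency_index(tree, matrix):
    """CI = sum of minimum possible changes / sum of observed changes."""
    strains = list(matrix)
    n_sites = len(matrix[strains[0]])
    min_changes = observed = 0
    for i in range(n_sites):
        states = {s: matrix[s][i] for s in strains}
        n_states = len(set(states.values()))
        if n_states < 2:
            continue  # invariant sites carry no information
        min_changes += n_states - 1
        observed += fitch_changes(tree, states)[1]
    if observed == 0:
        raise ValueError("no variable sites in the matrix")
    return min_changes / observed


# Hypothetical four-strain tree; the third site is homoplastic on this topology.
toy_tree = (("a", "b"), ("c", "d"))
toy_matrix = {"a": "AAT", "b": "AAC", "c": "GAT", "d": "GGC"}
print(consistency_index(toy_tree, toy_matrix))
```

A value of 1.0 means every SNP fits the tree with a single change, so a CI close to 1 indicates that very few sites require repeated changes of the kind recombination tends to produce.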

Genome assembly and functional annotation
To gain deeper insights into the seven Vietnamese N. seriolae genomes, we conducted a comparative analysis of genome assembly metrics and gene function. The Vietnamese genomes possess 6937 core genes and encode 7602–7958 predicted genes (Table 3). Multiple genome alignment of all strains against EM150506 using BRIG showed a high degree of homology (Fig. 5), demonstrating high conservation among N. seriolae genomes. There were four main non-homologous regions (positions 2 700 000-3 100 000, 3 900 000-4 100 000, 7 500 000-7 600 000 and 8 000 000-8 200 000 bp) that were present in the reference genome but absent in all other genomes; these regions may indeed be absent or may simply reflect differences in assembly quality [5]. Most genes at these loci were classified as hypothetical proteins, mobile element proteins and repeat regions; the remaining loci are mainly genes involved in membrane transport, biosynthesis, metabolism and transcription (Data S2). Little difference was found in the number of genes in family categories among Vietnamese vs. non-Vietnamese strains (Table 4). No plasmids were identified in any of the Vietnamese genomes, consistent with most N. seriolae genomes lacking plasmids; the only exception is CK-14008 from South Korea, which potentially harbours two plasmids [28].
A typical CRISPR-Cas system contains both a CRISPR array of repeat and spacer units, and associated cas genes; however, many systems are devoid of one of these components. These atypical CRISPR configurations are known as 'orphan' or 'isolated' CRISPR arrays and cas loci depending on which component is lacking. Between three and six CRISPR arrays were found in the Vietnamese strains, with lengths varying from 73 to 114 bp. Each array is made up of two direct repeats and one spacer without nearby Cas (CRISPR-associated) genes (Data S3). Notably, the same CRISPR array structure was found in all 20 N. seriolae genomes.

Virulence and antimicrobial/toxin resistance profiles
To explore the pathogenic potential of the Vietnamese N. seriolae strains, we assessed their virulence and antimicrobial/toxin resistance gene content in comparison to non-Vietnamese genomes. The RAST, VFDB, Victors and VF databases found between 182 and 202 genes that encode virulence and resistance factors, including gene products associated with Adherence (n=50-54), Cellular metabolism and nutrient uptake (   dismutase, phospholipase C and protease [63], were present in all Vietnamese and non-Vietnamese strains, indicating that they are highly conserved genes within this genus.

DISCUSSION
PFGE has conventionally been considered the 'gold standard' for studying the genetic diversity of many different pathogenic bacteria species, including N. seriolae [19,20,30,64]. PFGE has previously identified multiple pulsotypes among isolates retrieved from fish in Japan and Taiwan [19,20]. Notably, one study identified identical pulsotypes between certain Taiwanese 1997-2007 outbreak strains and Japanese N. seriolae isolated from yellowtail in 2002 (pulsotypes X1 and A1) and 2005 (pulsotype X11) [20], suggesting at least two transmission events between Taiwan and Japan. Unlike N. seriolae from Japan and Taiwan, all 20 Vietnamese isolates fell into a single cluster, even when using a combination of XbaI and AseI. However, PFGE lacked the resolution to differentiate Vietnamese isolates into the two clades identified using phylogenomic analysis. This limited resolution has also been documented for other bacteria such as Salmonella enterica [65], Listeria monocytogenes [66] and Escherichia coli [67]. It was unfortunately not practical to compare the Vietnamese pulsotypes with published studies due to known challenges with interlaboratory standardization using PFGE [68]; therefore, it is not known whether the Vietnamese PFGE cluster has been previously reported.
Next-generation sequencing provides excellent resolution, accuracy and data portability, and as such, has begun replacing PFGE as the new gold standard for nocardiosis outbreak analyses [69]. To illustrate the value of WGS for nocardiosis epidemiological investigations, we sequenced seven representative Vietnamese N. seriolae strains and compared them with all publicly available genomes (n=13). Consistent with the PFGE results, the limited genomic variation (0-5 SNPs; Fig. 3) observed among Vietnamese strains confirms a recent, single introduction into Vietnam, with subsequent dissemination across multiple mariculture facilities within the South Central Coast region. Phylogenomic analysis showed that Vietnamese strains were most closely related to UTF1, which was isolated from farmed yellowtail in Japan in 2008 [27]; this strain differed from the Vietnamese common ancestor by just 220 SNPs (MRCA: ~1998). Shimahara and colleagues [20] have previously postulated that transboundary translocation of live fish stocks asymptomatically infected with N. seriolae from China and Hong Kong may have introduced new strains into Japan. Wild-caught amberjack juveniles, one of the most susceptible hosts for N. seriolae infection, were also reportedly imported into Japan from Vietnam in 2000 [70]. However, there has not yet been a case of nocardiosis reported in Vietnam in aquatic species other than Trachinotus species, and the first of these cases was only recorded in 2012 [9]; therefore, it is unlikely that the Japanese N. seriolae was introduced from amberjack imported from Vietnam. Based on our genomic analysis, it is plausible that N. seriolae from Japan has been introduced into other countries such as Vietnam given that international export of valuable aquaculture fish species is relatively common; however, there is a paucity of information about import-export of live fish stocks from Japan or Vietnam, and, as such, this hypothesis cannot be confirmed.
Our BEAST results (Fig. S3) add further to our hypothesis of a recent introduction of N. seriolae into Vietnam from infected Trachinotus species. Our analysis showed that N. seriolae introduction into Vietnam occurred in ~2001 (95 % HPD: 1999-2003), which fits with the Taiwanese/Japanese outbreaks occurring in the late 1990s and early 2000s. We unfortunately lack isolate data from Taiwan that could suggest the directionality of transfer, or that could provide more accurate source attribution; nevertheless, we have been able to make some interesting and useful insights into the evolutionary history of N. seriolae in Vietnam based on this dated phylogeny.
Whilst our results suggest a probable Asian origin for the Vietnamese outbreaks, there are few publicly available N. seriolae genomes (only 20 as of 11 February 2022, including seven from our study), and none from other Asian regions such as Taiwan [20], Singapore, Malaysia, or Indonesia [71], or non-Asian regions such as Mexico [23] and USA [21] where N. seriolae outbreaks have been documented; therefore, the precise origin of the Vietnamese outbreaks and mode of N. seriolae introduction currently remain unresolved. Concerningly, our results, and those of others, demonstrate that, unchecked, N. seriolae transmission may represent a substantial unmitigated risk to fish aquaculture. It is thus an utmost imperative to establish domestic and international monitoring processes for N. seriolae for both farmed and wild species, including the implementation of molecular methods to characterize new outbreaks, to prevent the spread of this devastating pathogen into new environments, and associated heavy economic losses and food security concerns.
To facilitate the rapid identification of N. seriolae genotypes among our Vietnamese strains, we designed inexpensive SYBR-MAMAs targeting two phylogenetically informative SNPs. The first SNP assay robustly differentiates Vietnam from non-Vietnamese strains, thereby permitting prospective identification of newly transmitted strains into Vietnam, an essential facet in future fish importation biocontrol efforts. This assay can also be used to monitor for the emergence of Vietnamese strains in new regions, such as new aquaculture facilities in Vietnam, or prior to export of fingerlings to other countries. The second SNP assay rapidly differentiates strains belonging to the two Vietnamese clades. By applying this second assay to the 20 Vietnamese strains, we observed that both clades were well disseminated across all four provinces: Khánh Hòa, Ninh Thuận, Phú Yên and Vũng Tàu. Phylogenomic analysis of seven representative Vietnamese strains also showed dispersal of these two clades among three of the four provinces. Although unconfirmed, it is probable that the widespread trade of eggs, fingerlings and live permit fish for aquaculture in Vietnam since industry inception in the early 2000s, including local unmonitored trade among fish farmers, has driven the successful dissemination of N. seriolae among Vietnamese permit farms. Taken together, our findings highlight the large risk of undetected N. seriolae dispersal among mariculture facilities and the need for establishing strict monitoring practices to prevent further pathogen transmission.
WGS is currently laborious, expensive and inaccessible to most laboratories in Vietnam and many other Asian countries. Using comparative genomics, we established a catalogue of SNPs specific to each clade and subclade. This SNP database may be useful for both targeted resequencing efforts and the design of phylogenetically robust genotyping methods to permit source tracing of future N. seriolae outbreaks without the requirement for further WGS or bioinformatic analyses. The SYBR-MAMAs developed in this study successfully detected two phylogenetically informative SNPs, with genotyping results fully concordant with WGS, confirming that SYBR-MAMA is a valuable and inexpensive diagnostic method for SNP characterization.
Very little is known about the pathogenesis of Nocardia species, which are capable of invading host macrophages and preventing the fusion of phagosomes with lysosomes, leading to long-term survival and proliferation in host cells [72]. Due to the paucity of available genomic data for this pathogen, a final aspect of this study was to better understand virulence and antimicrobial resistance factors encoded by the N. seriolae genome. Our analysis of 20 N. seriolae genomes is the largest genomic assessment of this pathogen to date, and largely corroborates the conclusions drawn from a previous analysis of seven N. seriolae genomes, which showed that N. seriolae have >99.9 % Orthologous Average Nucleotide Identity values [28]. Analysis of the genome content of seven Vietnamese N. seriolae strains revealed that, like non-Vietnamese strains, they encode a high proportion of 'hypothetical protein' genes (i.e. 45.8 %), a finding that highlights the need for more studies to investigate the functions of these genes. More than 180 core genes (present in all strains) were found to code for antimicrobial resistance and virulence factors in the Vietnamese strains, including genes associated with Adherence (n=49), Cellular metabolism and nutrient uptake (n=10), Damage (n=6), Invasion and intracellular survival (n=33), Resistance to antibiotics and toxic compounds (n=26), and Others (n=11) that may possibly account for the main virulence traits of this fish pathogen. The presence of conserved genes encoding β-lactamase class C-like and penicillin-binding proteins (n=11), multidrug resistance protein ErmB (n=1), probable multidrug resistance protein NorM (n=1) and a small multidrug resistance family protein (n=1) in all N. seriolae genomes may explain observed antimicrobial resistance towards penicillin and cephalexin, two β-lactam antibiotics that are commonly used to treat nocardiosis in Vietnamese permit fish farms (data not shown).
CRISPRs, which are encoded by many bacterial and archaeal species, defend against invasive mobile genetic elements such as viral or plasmid DNA [73], and also play a role in bacterial pathogenesis, biofilm formation, adherence, programmed cell death and quorum sensing [74]. Acquisition and maintenance of CRISPR-Cas systems are greatly influenced by environmental conditions and microbial communities [75]. Recent research has shown that 40 % of CRISPR arrays are located away from, or are not associated with, any cas genes; these are known as orphan CRISPR arrays [76]. As in many other bacterial species such as Listeria monocytogenes, Aggregatibacter actinomycetemcomitans, Enterococcus faecalis, Staphylococcus spp., Pseudomonas aeruginosa and Salmonella enterica [77][78][79][80][81], orphan CRISPR arrays were found in N. seriolae genomes. These incomplete CRISPR-Cas systems may be a remnant of decaying loci that are recruited and/or selectively maintained to perform important, but as yet unknown, biological functions [73]. Alternatively, our results may be an artefact of current CRISPR-Cas prediction tools, which predict CRISPRs primarily based on the typical CRISPR structure [77]. As the role of these CRISPR loci in N. seriolae is not yet known, further work is needed to uncover their precise role in this pathogen.
In conclusion, our study provides novel insights into the epidemiology of N. seriolae outbreaks in farmed permit fish in Vietnam. Our detailed molecular and genomic analyses revealed minimal genomic diversity among Vietnamese N. seriolae isolates. Unlike PFGE, WGS detected strain variation at single-base resolution, and identified two distinct Vietnamese clades that share recent ancestry. Our results indicate recent importation of a single N. seriolae clone into Vietnam, which has then led to a nationwide outbreak of nocardiosis in permit fish farms. The analysis of additional genomes, particularly from other geographical regions, will be important for better understanding N. seriolae evolution, and will enable more precise investigations into the origin and transmission of this devastating pathogen. Finally, our SNP assays provide a rapid and inexpensive method for genotyping of ongoing and future nocardiosis outbreaks in Vietnam.

Funding information
This research was supported by an Australia Awards PhD scholarship to C.L., which is funded by the Australian Department of Foreign Affairs and Trade. D.S.S. and E.P.P. were supported by Advance Queensland fellowships (AQRF13016-17RD2 and AQIRF0362018, respectively).

Author contributions
C.L.: Project design, sample collection, sample and data analysis, results interpretation, drafting paper. D.S.S.: Data analyses and interpretation, drafting and revising paper. E.P.P.: Supervision, data analyses and interpretation, drafting and revising paper. T.T.A.N.: Assistance in the sample preparation and drafting paper. D.P.: Sample collection guidance, drafting and revising paper. H.V.-K.: Sample collection guidance, drafting and revising paper. I.D.K.: Assisting with the project design, revising paper. W.K.: Supervision, advising on project design, drafting paper. S.-C.C.: Assistance in PFGE analyses, drafting paper. M.K.: Supervision, project design, revising paper. All authors read and approved the final manuscript.

Conflicts of interest
The authors have no competing interests to declare.