Comparative Genomic Analysis and a Novel Set of Missense Mutation of the Leptospira weilii Serogroup Mini From the Urine of Asymptomatic Dogs in Thailand

Leptospira weilii belongs to the pathogenic Leptospira group and is a causal agent of human and animal leptospirosis in many world regions. L. weilii can produce varied clinical presentations, from asymptomatic through acute to chronic infections, and can occupy several ecological niches. Nevertheless, the genomic features and genetic basis behind the host adaptability of L. weilii remain elusive due to limited information. Therefore, this study aimed to examine the complete circular genomes of two new L. weilii serogroup Mini strains (CUDO6 and CUD13) recovered from the urine of asymptomatic dogs in Thailand and to compare them with the 17 genomes available for L. weilii. Variant calling analysis (VCA) was also undertaken to gain potential insight into the missense mutations, focusing on the known pathogenesis-related genes. Whole genome sequences revealed that the CUDO6 and CUD13 strains each contained two chromosomes and one plasmid, with an average genome size and G+C content of 4.37 Mbp and 40.7%, respectively. Both strains harbored almost all the confirmed pathogenesis-related genes in Leptospira. Two novel plasmid sequences, pDO6 and pD13, were identified in the strains CUDO6 and CUD13. Both plasmids contained genes responsible for stress response that may play important roles in bacterial adaptation during persistence in the kidneys. The core-single nucleotide polymorphism phylogeny demonstrated that both strains had a close genetic relationship. Amongst the 19 L. weilii strains analyzed, the pan-genome analysis showed an open pan-genome structure, correlated with their high genetic diversity. VCA identified missense mutations in genes involved in endoflagella, lipopolysaccharide (LPS) structure, mammalian cell entry protein, and hemolytic activities, which may be associated with host-adaptation in the strains. Missense mutations of the endoflagella genes of CUDO6 and CUD13 were associated with loss of motility. These findings extend the knowledge about the pathogenic molecular mechanisms and genomic evolution of this important zoonotic pathogen.


INTRODUCTION
Leptospirosis is a significant zoonotic disease caused by spirochetal bacteria in the genus Leptospira. The disease is recognized as a public health concern in many parts of the world, including Thailand (Wuthiekanun et al., 2007; Costa et al., 2015). There are estimated to be more than 1.03 million cases and 60,000 deaths from human leptospirosis worldwide per year, and the frequency has tended to increase due to global climate change. Most mammals can be infected by Leptospira, with clinical manifestations ranging from asymptomatic through chronic to acute life-threatening infections (Ko et al., 2009). In addition, asymptomatic animals play a role in disease maintenance and transmission by carrying the pathogen in their kidneys and shedding it via their urine, leading to contamination of the environment and infections in humans (Kurilung et al., 2017).
At least 64 different Leptospira species have been recognized and divided into three groups based on their phylogeny and virulence status, the latter encompassing the pathogenic, intermediate, and saprophytic groups (Picardeau, 2017; Thibeaux et al., 2018; Casanovas-Massana et al., 2019; Vincent et al., 2019). Moreover, 24 serogroups with more than 300 serovars have been determined based on the diversity of the lipopolysaccharide (LPS) antigen on the outer cell membrane (Evangelista and Coburn, 2010).
Leptospira weilii is a member of the pathogenic Leptospira group and has been widely reported to cause disease in humans and animals in Australia, China, Chile, and Thailand (Corney et al., 2008; Mason et al., 2016; Kurilung et al., 2017; Xu et al., 2017). L. weilii infection can develop with the classical symptoms of typical leptospirosis in humans, such as fever, headaches, and myalgia, similar to that of other pathogenic Leptospira, but can also produce pathological changes in the lungs and kidneys in the experimental guinea pig model (Slack et al., 2007; Xu et al., 2017). Nevertheless, two L. weilii strains (CUDO6 and CUD13) were recently isolated from the urine of asymptomatic dogs, and to our knowledge, this is the first isolation from a canine source (Kurilung et al., 2017). The characterization of CUDO6 and CUD13 became essential to understand strains that cause no symptoms of infection in dogs, and to serve as a basis for understanding the potential public health impact that such strains may have in the region.
The advent of next-generation sequencing (NGS) over the past decade has led to a rapid increase in the number of genome sequences in the databases, including for Leptospira species. This makes it possible to define genomic information and compare it amongst different groups and/or sources of infection (Fouts et al., 2016). Comparative genome analysis has revealed remarkable knowledge on the genus Leptospira in terms of the genes related to pathogenesis and host-adaptation, and has provided new insights into their genetic variation and evolutionary pathway (Xu et al., 2016). To expand our understanding of genome characteristics along with the evolution process related to host-adaptation, we serotyped and sequenced the L. weilii strains CUDO6 and CUD13 using the Oxford Nanopore Technology (ONT) and Illumina NovaSeq platforms to generate complete and accurate whole genome sequences. Multiple bioinformatics tools were then used to characterize their genomic features, which were then compared with those of the other available strains. Variant calling analysis (VCA) was also undertaken to investigate missense mutations that affected protein-coding genes related to the pathogenesis of Leptospira. This study provides a genetic basis for pathogenic Leptospira and improves our understanding of evolution and molecular pathogenesis for further studies.

Bacterial Strains, Growth Conditions, and DNA Isolation
The genomes of 19 L. weilii strains that originated from different hosts and geographical locations were used in this study (Table 1). L. weilii strain ICFT was excluded because its genome was of poor quality and the strain is likely to have been misclassified at the species level based on core genome phylogeny and average nucleotide identity (ANI) (Bulach and Adler, 2018; Vincent et al., 2019). The genomes of the L. weilii strains CUDO6 and CUD13 were obtained during this study, whereas the genomes of the 17 other strains were retrieved from the NCBI database for genomic comparison. Strains CUDO6 and CUD13 were recovered from the urine of asymptomatic dogs in the same period (the year 2014) and in the same area (Sop Khun village, Nan province, Thailand), and were confirmed at the species level by partial rrs sequencing and comparative phylogeny. In addition, multilocus sequence typing (MLST) was performed using seven house-keeping genes (caiB, glmU, mreA, pfkB, pntA, sucA, and tpiA) as previously described (Boonsilp et al., 2013). Both strains were assigned to the novel sequence type 94 (ST94) belonging to the L. weilii cluster. Moreover, the strains were single-locus variants (mreA52) of ST183 (mreA49) and ST193 (mreA55) from human L. weilii isolates in Laos and China (Kurilung et al., 2017).
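The single-locus variant relationship described above can be illustrated with a minimal Python sketch. Only the mreA allele numbers (52, 49, and 55) come from the text; every other allele number below is a hypothetical placeholder, and the function names are illustrative rather than part of any MLST tool.

```python
# Seven house-keeping loci of the MLST scheme named in the text.
LOCI = ("caiB", "glmU", "mreA", "pfkB", "pntA", "sucA", "tpiA")


def differing_loci(profile_a, profile_b):
    """Return the loci at which two allelic profiles carry different alleles."""
    return [locus for locus in LOCI if profile_a[locus] != profile_b[locus]]


def is_single_locus_variant(profile_a, profile_b):
    """Two sequence types are single-locus variants if exactly one locus differs."""
    return len(differing_loci(profile_a, profile_b)) == 1


# Placeholder allele "1" is hypothetical for every locus except mreA.
shared = {locus: 1 for locus in LOCI}
st94 = {**shared, "mreA": 52}   # canine isolates CUDO6 and CUD13
st183 = {**shared, "mreA": 49}  # human isolate ST mentioned in the text
st193 = {**shared, "mreA": 55}  # human isolate ST mentioned in the text

print(differing_loci(st94, st183))           # loci separating ST94 from ST183
print(is_single_locus_variant(st94, st193))  # whether ST94 and ST193 are SLVs
```

With real data, the profiles would be taken from the allelic designations assigned by the MLST database rather than typed by hand.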
The L. weilii CUDO6 and CUD13 strains were kept as a stock culture and stored at −80 °C before use. A total of 1 mL of stock culture was inoculated into 10 mL of liquid Ellinghausen-McCullough-Johnson-Harris medium supplemented with 6% (v/v) rabbit serum and incubated at 30 °C for 2 weeks. For DNA isolation, the 10-mL cultures were centrifuged at 4,800 rpm for 10 min, and then the genomic DNA was extracted from the cell pellet using the guanidinium thiocyanate-phenol-chloroform extraction method (Pitcher et al., 1989). Subsequently, the obtained DNA was sheared to get DNA fragments ranging from 6 to 20 kbp using Covaris g-TUBE (Covaris, United States). The fragmented DNA was further enriched using Agencourt AMPure XP beads (Beckman Coulter, United States), and measured for quantity and quality (A260/A280 ratio) using Qubit Fluorometric Quantitation (Thermo Fisher Scientific, United States) and NanoDrop (Thermo Fisher Scientific, United States), respectively. Genomic DNA was stored at −20 °C for further library preparation and genome sequencing.

Genome Sequencing, Assembly, and Annotation
The DNA library was prepared using the ligation sequencing 1D kit (SQK-LSK108) and barcoded with the Native barcoding kit (EXP-NBD103) (Oxford Nanopore, United Kingdom) as per the manufacturer's instructions. Whole genome sequencing was performed using a FLO-MIN106 flow cell with the MinION sequencing device (Oxford Nanopore, United Kingdom). Similarly, the genomic DNA was also sent in parallel for short-read DNA sequencing using the NovaSeq6000 S2 Reagent Kit (300 cycles) and S2 flow cell on a NovaSeq6000 (2 × 150-bp paired-end) system (Illumina, United States) at Eurofins Genomics GmbH, Germany. Raw fast5 reads from ONT were base called, quality filtered, and demultiplexed using Guppy v2.3.7 (ONT, United Kingdom). Adaptors from the ONT reads were removed using Porechop v0.2.1. The ONT reads were then assembled using the Canu v1.7 software with the default parameter setting (Koren et al., 2017). Paired-end Illumina reads were paired, trimmed, and quality filtered using the Trimmomatic v0.36 software (Bolger et al., 2014). The quality of the Illumina reads was measured via the online FastQC v0.11.8 software. Subsequently, the Illumina paired reads and draft genome sequence output from the Canu assembler were used as input files for Minimap2 (Li, 2018) and Racon (Vaser et al., 2017) for polishing twice. The third and fourth polishing steps were then performed with Pilon (Walker et al., 2014). For the final inspection and curation, the sequences were manually validated by mapping paired Illumina reads to the polished sequences with the aid of the Geneious software v10.2.3 (Biomatters, New Zealand). Genome sequences of the strains CUDO6 and CUD13 were primarily annotated with Prokka v1.12 (Seemann, 2014) and the Rapid Annotation using Subsystem Technology (RAST) (Aziz et al., 2008). Metabolic pathways were annotated with KEGG (Kanehisa et al., 2017) and MetaCyc (Caspi et al., 2019), as implemented in the MicroScope platform (Vallenet et al., 2009).

Average Nucleotide Identity and Phylogenetic Analysis
To confirm and define the species delineation of the CUDO6 and CUD13 strains, the ANI values between each pair of L. weilii genomes and the other publicly available genomes of various Leptospira species (Table 1) were calculated using OrthoANI (Lee et al., 2016). Isolates with ANI values of more than 95% in pairwise genome comparisons were considered the same species.
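The species criterion above amounts to a simple threshold on pairwise ANI values. The following Python sketch applies the 95% cutoff stated in the text to a small table of ANI values; the strain labels and ANI numbers are hypothetical and are not taken from the study, and the ANI values themselves would come from a tool such as OrthoANI.

```python
ANI_SPECIES_THRESHOLD = 95.0  # percent, the cutoff stated in the text


def same_species(ani_percent, threshold=ANI_SPECIES_THRESHOLD):
    """Return True when a pairwise ANI value is above the species cutoff."""
    return ani_percent > threshold


def classify_pairs(ani_table):
    """Map each genome pair to True/False under the ANI species criterion."""
    return {pair: same_species(value) for pair, value in ani_table.items()}


# Hypothetical pairwise ANI values (percent), for illustration only.
example_ani = {
    ("strain_A", "strain_B"): 99.9,
    ("strain_A", "strain_C"): 98.4,
    ("strain_A", "other_species"): 88.1,
}

calls = classify_pairs(example_ani)
for pair, is_same in calls.items():
    print(pair, "same species" if is_same else "different species")
```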
To investigate the phylogenetic relationship of L. weilii, a phylogenetic analysis was performed based on the concatenated sequence of core-single nucleotide polymorphisms (core-SNPs) located in the core-genome region of the 19 available L. weilii strains (Table 1). Briefly, the 19 L. weilii genomes were aligned using the Parsnp program nested in the Harvest package with the default parameter settings (Treangen et al., 2014). The aligned core genome sequence was then filtered for recombination regions using Gubbins (Croucher et al., 2015). The SNP sites were then called and concatenated using the Snippy v4.4.0 software, and the phylogenetic tree was generated using the maximum likelihood (ML) method with the GTR substitution model and 1,000 bootstrap replicates in the RAxML program (Stamatakis, 2014). The tree was visualized with FigTree.

Pan-Genome Analysis
The gene repertoire of the 19 L. weilii genomes was examined by generating a pan-genome structure using the Anvi'o package following the developer's recommendation (Delmont and Eren, 2018). Firstly, genome sequences were searched for open reading frames that could encode proteins using Prodigal v2.6.3 (Hyatt et al., 2010). The proteome sequences of the L. weilii strains (n = 19) were searched for amino acid similarity using the BLASTp algorithm and subsequently clustered as core (shared by all strains), accessory (shared with at least one other strain), and unique (present in only one strain) genes with MCL (Enright et al., 2002). The binary matrix of the presence/absence of gene clusters across all 19 L. weilii strains was used to generate fitted pan-genome and core-genome curves using the PanGP software. The pan-genome and core-genome curves were derived using a power-law regression model ($y = A_{pan} x^{B_{pan}} + C_{pan}$) and an exponential decay model ($y = A_{core} e^{B_{core} x} + C_{core}$), respectively (Tettelin et al., 2005, 2008). The $B_{pan}$ or γ parameter of the pan-genome curve indicates whether the pan-genome of L. weilii is open (0 < γ < 1), where the size of the pan-genome increases with the addition of new genomes, or closed (γ < 0), where a limited maximum number of genes is found as further genomes are added (Tettelin et al., 2005; Tettelin et al., 2008). Additionally, the unique gene clusters of strains CUDO6 and CUD13 were further inspected and identified using the NCBI and VF databases.
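As a rough illustration of the classification and of how the γ parameter is read, the following Python sketch sorts gene clusters from a presence/absence matrix into core, accessory, and unique sets, evaluates the power-law model, and interprets γ using the ranges given above. The function names, the toy matrix, and the parameter values are hypothetical; this is not the Anvi'o or PanGP implementation.

```python
def classify_gene_clusters(presence):
    """Classify gene clusters from rows of 0/1 flags (one flag per genome).

    core: present in every genome; unique: present in exactly one genome;
    accessory: present in more than one but not all genomes.
    """
    classes = {}
    for cluster, row in presence.items():
        n_present = sum(row)
        if n_present == len(row):
            classes[cluster] = "core"
        elif n_present == 1:
            classes[cluster] = "unique"
        elif n_present > 1:
            classes[cluster] = "accessory"
        else:
            classes[cluster] = "absent"
    return classes


def power_law_pan_size(x, a_pan, b_pan, c_pan):
    """Pan-genome size predicted by y = A_pan * x**B_pan + C_pan."""
    return a_pan * x ** b_pan + c_pan


def pan_genome_state(gamma):
    """Interpret the exponent (B_pan, or gamma) using the ranges in the text."""
    if 0 < gamma < 1:
        return "open"
    if gamma < 0:
        return "closed"
    return "undetermined"


# Toy presence/absence matrix for three genomes (hypothetical clusters).
toy_matrix = {
    "gc_core": [1, 1, 1],
    "gc_accessory": [1, 1, 0],
    "gc_unique": [0, 0, 1],
}
print(classify_gene_clusters(toy_matrix))
print(pan_genome_state(0.4))
```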

Variant Calling Analysis
To identify SNPs, VCA was performed using the Snippy v4.4.0 pipeline. Briefly, the filtered Illumina reads of L. weilii strains CUDO6 and CUD13 were mapped against the reference genome of L. weilii strain 2006001853 (human isolate) using BWA-MEM v0.7.12 (Li and Durbin, 2009). The SNPs were subsequently identified using Freebayes v1.1 with the default parameter setting. The missense variants that potentially affected the protein sequences were annotated using SnpEff v4.3 (Cingolani et al., 2012).
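To make the last step concrete, the sketch below shows one way missense calls could be pulled out of a SnpEff-annotated VCF record in Python. It assumes the standard SnpEff `ANN` INFO field, in which pipe-separated subfields carry the effect term (second subfield), the gene name (fourth), and the protein change in HGVS notation (eleventh). The INFO string, gene names, and function name are hypothetical, and this is not necessarily how the authors filtered their results.

```python
def missense_annotations(info_field):
    """Return (gene_name, protein_change) pairs for missense annotations.

    info_field is the INFO column of one VCF record annotated by SnpEff.
    """
    hits = []
    for entry in info_field.split(";"):
        if not entry.startswith("ANN="):
            continue
        # One ANN value can hold several comma-separated annotations.
        for annotation in entry[len("ANN="):].split(","):
            fields = annotation.split("|")
            if len(fields) < 11:
                continue  # malformed or truncated annotation
            effects = fields[1].split("&")  # combined effects are joined by '&'
            if "missense_variant" in effects:
                hits.append((fields[3], fields[10]))
    return hits


# Hypothetical INFO column with one missense and one upstream annotation.
example_info = (
    "DP=50;ANN="
    "T|missense_variant|MODERATE|geneA|geneA_id|transcript|tx1|"
    "protein_coding|1/1|c.100G>A|p.Ala34Thr|100/900|100/900|34/299||,"
    "T|upstream_gene_variant|MODIFIER|geneB|geneB_id|transcript|tx2|"
    "protein_coding||c.-50G>A|||||50|"
)
print(missense_annotations(example_info))
```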

Serotyping of Leptospira weilii CUDO6 and CUD13 Strains
A panel of polyclonal antibodies representing all pathogenic and saprophytic Leptospira serogroups was first used for serogroup classification of L. weilii strains CUDO6 and CUD13 by the microscopic agglutination test (MAT). Both strains reacted with polyclonal antibodies raised against Leptospira serogroups Hebdomadis, Mini, and Sejroe. Nevertheless, they showed a higher MAT titer against Leptospira serogroup Mini (titer > 1:1,280) than against the other serogroups (Supplementary Table 1).
For serovar identification, the whole set of monoclonal antibodies designed to differentiate serovars of serogroup Mini was subsequently tested against both strains. The results indicated that L. weilii CUDO6 and CUD13 strains reached the highest MAT titer against monoclonal antibody F106C1 (titer = 1:640) and were identified as an undesignated serovar (Supplementary Figure 1). Moreover, both strains shared a similar agglutination pattern across all monoclonal antibodies, suggesting that CUDO6 and CUD13 might be the same serovar and are likely to represent a novel serovar in dogs. They were also antigenically closely related to the undesignated serovar of Leptospira mayottensis serogroup Mini strain 200901116.

General Genomic Features of the Leptospira weilii CUDO6 and CUD13 Strains
Analysis of the genomic features of L. weilii strains CUDO6 and CUD13, recovered from the urine of asymptomatic dogs, showed that each strain comprised three circular replicons with a similar G+C content of 40.9%, which accounted for a total size of 4.38 and 4.36 Mbp for CUDO6 and CUD13, respectively. Of these, the two large replicons of strains CUDO6 (3.96 and 0.32 Mbp) and CUD13 (3.97 and 0.28 Mbp) were referred to as CI and CII, respectively. The small circular replicons measured 89.23 kbp for strain CUDO6 and 95.48 kbp for strain CUD13, with an average G+C content of 37.28%. These small circular replicons confirmed the previous report of plasmids in L. weilii (Wang et al., 2015), and represent, to the best of our knowledge, the first release of a complete circular plasmid sequence in L. weilii. The two plasmids were designated pDO6 and pD13 for the strains CUDO6 and CUD13, respectively.
The L. weilii CUDO6 genome encoded 3,965 protein-coding sequences (CDSs), whilst the L. weilii CUD13 genome yielded 3,905 CDSs. Of the predicted proteins, 1,006 CDSs and 1,001 CDSs, respectively, were function-categorized into 26 subsystems by RAST annotation. Both strains had a broadly similar number of proteins related to the subsystems of amino acids and derivatives, protein metabolism, cofactors, vitamins, prosthetic groups, pigments, carbohydrates, and motility and chemotaxis (Figure 1 and Supplementary Table 2).
The putative replication origin of CI and CII in both strains was predicted from the GC skews (Supplementary Figure 2). Similar to the other Leptospira species, the replication origin of CI was adjacent to the dnaA, dnaN, recF, and gyrAB genes (Ren et al., 2003;Nascimento et al., 2004b;Bulach et al., 2006;Picardeau et al., 2008), while replication origin of CII and the plasmid were adjacent to the partitioning system genes (parAB) and downstream of the replication gene (repB) (Bulach et al., 2006). Five rRNA genes, comprised of one rrf, two rrl, and two rrs genes, were identified in CI of CUDO6 and CUD13. Moreover, they had a similar number of tRNA genes (n = 37), encoding for all 20 amino acids. The chromosomal mapping based on the COG category of L. weilii strain CUD13 is illustrated in Figure 2, and the general genome characteristics of L. weilii CUDO6 and CUD13 are presented in Table 1.
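For readers unfamiliar with GC-skew-based origin prediction, the Python sketch below shows the usual idea: the skew (G − C)/(G + C) is computed in windows along the replicon, summed cumulatively, and the minimum of the cumulative curve is taken as the approximate origin. The window size, function names, and toy sequence are assumptions made for illustration; the text does not state which tool or parameters were used.

```python
def cumulative_gc_skew(sequence, window=1000):
    """Return the cumulative (G - C) / (G + C) skew, one value per window."""
    seq = sequence.upper()
    cumulative = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:  # skip windows without any G or C
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


def predicted_origin(sequence, window=1000):
    """Approximate origin: end coordinate of the window at the skew minimum."""
    skew = cumulative_gc_skew(sequence, window)
    index = min(range(len(skew)), key=skew.__getitem__)
    return min((index + 1) * window, len(sequence))


# Toy replicon: a C-rich stretch followed by a G-rich stretch (hypothetical).
toy_replicon = "C" * 3000 + "G" * 5000
print(cumulative_gc_skew(toy_replicon))
print(predicted_origin(toy_replicon))
```

On a real chromosome the prediction would normally be cross-checked against the position of genes such as dnaA, as done in the text.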
From the KEGG and MetaCyc metabolic pathway analysis, genes encoding the enzymes for biosynthesis of all 20 amino acids and glycolysis were present in both CUDO6 and CUD13 strains. In addition, genes responsible for long-chain fatty acid β-oxidation and glycerol degradation were also found, including acetyl-CoA acyltransferase, 2,3,4-saturated fatty acyl-CoA synthetase, enoyl-CoA hydratase, acetyl-CoA C-acyltransferase, glycerol kinase, and glycerol 3-phosphate oxidoreductase. Furthermore, genes involved in the tricarboxylic acid cycle, encompassing genes encoding pyruvate dehydrogenase, malate dehydrogenase, and citrate synthase, were identified. Pyruvate dehydrogenase could synthesize the intermediate acetyl-CoA, whilst malate dehydrogenase and citrate synthase could catalyze the conversion of malate to oxaloacetate and of oxaloacetate to citrate, respectively (Supplementary Figure 3). These findings supported that both L. weilii strains could utilize fatty acids and glycerol as energy sources, similar to all Leptospira species (Fouts et al., 2016).
FIGURE 1 | Number of RAST subsystems annotated in the L. weilii serogroup Mini strains CUDO6 (pink bar) and CUD13 (blue bar). The CDSs of strains CUDO6 and CUD13 were classified into 26 functional subsystems and arranged according to the number of CDSs in each category. Of these, amino acids and derivatives, protein metabolism, and cofactors had the largest numbers of identified CDSs in both L. weilii genomes.
The genome structure comparison between CUDO6 and CUD13 showed that the strains shared extensive syntenic regions in CI, although insertion sequence (IS)-mediated chromosomal rearrangements were observed in strain CUDO6. The major rearrangement event in CUDO6 CI occurred at around 1.40 Mbp from the replication origin. This region had two identical copies of ISLbp6 (fhg67_rs07085 and fhg67_rs14785) in opposite orientations on either side of the inversion breakpoints, resulting in the inversion of a fragment of approximately 1.75 Mbp (Supplementary Figure 4). In contrast, CII had a nearly colinear structure, except for an ∼39 kbp insertion region in CUDO6, which likely resulted from phage-mediated horizontal gene transfer (Supplementary Figures 2, 4). Plasmid comparison showed high sequence similarity, except for an 8 kbp insertion region in pD13, which contained genes encoding hypothetical proteins (Supplementary Figure 4).
Although CUDO6 and CUD13 belong to the same ST (ST94), the genomic organization of the two strains differed. This finding reflects the plasticity of genomic structure in Leptospira species (Jorge et al., 2018).

Phage, CRISPR-Cas System, and Putative Virulence Factor
A total of six and five putative prophage sequences were predicted in the CUDO6 and CUD13 genomes, respectively (Supplementary Table 3). The putative prophage sequences had a size range from 7.4 to 21.8 kbp.
FIGURE 2 | Circular maps of three replicons of L. weilii serogroup Mini strain CUD13. The outermost and second outermost rings represent genes on the sense and antisense strands colored according to COG categories annotation. The COG categories are denoted as follows: A (RNA processing and modification), B (chromatin structure and dynamics), J (translation, ribosomal structure and biogenesis), K (transcription), L (replication, recombination, and repair), D (cell cycle control, cell division, and chromosome partitioning), O (post-translational modification, protein turnover, and chaperones), M (cell wall/membrane/envelope biogenesis), N (cell motility), P (inorganic ion transport and metabolism), T (signal transduction mechanisms), U (intracellular trafficking, secretion, and vesicular transport), V (defense mechanisms), W (extracellular structures), Y (nuclear structure), Z (cytoskeleton), C (energy production and conversion), G (carbohydrate transport and metabolism), E (amino acid transport and metabolism), F (nucleotide transport and metabolism), H (coenzyme transport and metabolism), I (lipid transport and metabolism), Q (secondary metabolites biosynthesis, transport, and catabolism), R (general function prediction only), and S (function unknown). The second inner ring indicates the GC skew; positive skew is shown in pink-purple, and negative skew is shown in purple. The genome map was created and visualized using CGView (Stothard and Wishart, 2005).
One intact prophage was found in CI of both CUDO6 and CUD13, flanked by attL and attR, with an estimated size of 16.05 and 10.27 kbp, respectively, and a G+C content of around 41%, which was higher than that of the three circular replicons. They harbored a similar number of CDSs (n = 18), which were conserved between the two L. weilii strains. The prophage region was predicted to have an attachment site (attL and attR), putative tail fiber protein, lysis protein, phage-like proteins (PLPs), transposase, and hypothetical proteins that were homologous in sequence to those of many known prophages, such as Arthrobacter phage vB_ArtM-ArV1, Burkholderia phage KS9, Stx2-converting phage 1717, and Listeria phage LP-114. Intriguingly, one cryptic incomplete prophage was found within the serovar determinant region (rfb locus) of the CUDO6 and CUD13 genomes, spanning around 10.5 kbp and encoding 10 CDSs. Of these, four CDSs shared a 30-63% amino acid identity with the rhamnose biosynthesis enzymes of Burkholderia vietnamiensis phage G4, three CDSs matched (33-35% amino acid identity) the glycosyltransferase genes of Staphylococcus aureus phage N315, and one CDS matched (27% amino acid identity) the DegT family aminotransferase of Methylobacterium extorquens phage PA1. All CDSs related to O-antigen biosynthesis loci found on the prophage may be responsible for phage infection and/or the serogroup/serovar diversity in Leptospira species. Previously, O-antigen was demonstrated to be an important receptor for leptophage infection (Schiettekatte et al., 2018). Additionally, phage infection has also been reported to play an essential role in the emergence of novel serotypes in Salmonella spp., Shigella flexneri, and Vibrio cholerae (Wright, 1971; Allison and Verma, 2000; Faruque et al., 2003).
Thus, this cryptic prophage may exploit O-antigen as a receptor and also mediate O-antigen variation and/or serovar conversion in Leptospira by altered sugar composition and/or sugar ramification on the LPS structure (Allison and Verma, 2000). The effect of prophage on the Leptospira, especially to O-antigen alteration and/or serotype diversity, remains however to be evaluated.
The PLP encoding the host-nuclease inhibitor protein matched (27% amino acid identity) that of Haemophilus phage SuMu. The host-nuclease inhibitor protein is responsible for inhibiting the host RecBCD proteins, one of the host defense mechanisms against bacteriophage infection. This putative prophage may use this protein to interfere with the helicase and nuclease activities of RecBCD, enabling the phage to escape DNA degradation during phage replication (Court et al., 2007). In addition, plasmids pDO6 and pD13 were both found to have a similar intact prophage sequence with a total length of 21.8 kbp that encoded 37 CDSs. Most of these CDSs were involved in the transcriptional regulatory system and post-segregational killing system. These proteins may help both L. weilii strains adapt their growth and facilitate colonization in various milieus (Engelberg-Kulka and Glaser, 1999).
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (cas) genes, or CRISPR-Cas systems, are an adaptive immune system of bacteria that confers resistance to invasion by foreign nucleic acids, such as bacteriophages and plasmids (Makarova et al., 2015). These systems are found in around 40% of bacterial populations, including Leptospira species (Fouts et al., 2016; Makarova et al., 2020). This study used the criteria that broadly divide the CRISPR-Cas system into two classes (class 1 and class 2), with further subdivision into 33 distinct subtypes based upon the variety of the cas gene combination, sequence similarity, and phylogenetic analysis (Makarova et al., 2020). Both L. weilii strains CUDO6 and CUD13 harbored only one CRISPR locus, which was subsequently identified as CRISPR-Cas class 1 (CRISPR1) subtype I-E (Figure 3A). This CRISPR locus comprised eight cas genes (cas1, cas2, cas3, cas5, cas6e, cas7, cas8e, and cse2) encoding the adaptation, expression, and interference modules that act against foreign DNA invasion (Al-Attar et al., 2011). The CRISPR arrays of both L. weilii strains were uncoupled from the cas gene cluster and located upstream of the CRISPR locus. The CRISPR arrays consisted of two similar consensus directed repeats (CDRs) but with a different number of spacers (32 and 31 for CUDO6 and CUD13, respectively). Both L. weilii strains shared 31 identical spacers in their CRISPR arrays (Supplementary Table 4), with one unique spacer found in strain CUDO6 that was identified only as a hypothetical protein.
FIGURE 3 | CRISPR-Cas gene organization in the L. weilii serogroup Mini strains CUDO6 and CUD13. Both L. weilii strains contained CRISPR1 subtype I-E with an uncoupled CRISPR array region located upstream of the locus (A). The CRISPR arrays harbor both the consensus directed repeats (black diamond) and spacers (white rectangle). Strains CUDO6 and CUD13 carried two different consensus directed repeats, as shown in the sequence logo box (A). DNA sequences in the spacers are transcribed into CRISPR RNA (crRNA) and targeted to the protospacer of invading foreign DNA. The spacers of strains CUDO6 and CUD13 have the highest matching score to the protospacers of Leptospira plasmids Laicp and lcp1 (B).
The shared spacers of the CRISPR arrays of CUDO6 and CUD13 may reflect the same immunity background and evolutionary pathway of the two strains, which may have faced the same environmental challenges before dissemination to the new host. For a more detailed analysis, the spacer sequences of both L. weilii strains were screened for putative protospacers matching phage/prophage and plasmid sequences using the CRISPRTarget web tool (Biswas et al., 2013). Among the 31 identical spacers, four matched protospacers in Leptospira phages and plasmids, including Leptospira phage LalZ_80412, L. interrogans plasmid Laicp, L. interrogans plasmid lcp1, and L. mayottensis plasmid p_Lmay_MDI272 (Figure 3B). The result suggested that these phages and plasmids may successfully enter L. weilii.
According to the BLAST search of the virulence factor database (VFDB), a total of 518 and 519 putative virulence factors (VFs) were found in strains CUDO6 and CUD13, respectively (Supplementary Tables 5, 6). Most of the VFs were broadly categorized into adherence, motility, chemotaxis, and secretion systems. To gain insight into the main pathogenesis-related proteins in Leptospira, a total of 34 representative experimentally confirmed VFs were collected from the literature (Murray, 2015; Fouts et al., 2016; Picardeau, 2017), and protein sequence homology searching with the BLASTp algorithm revealed that 23 proteins (LigB, OmpL1, OmpL37, OmpL47, LipL21, LipL45, LruA, LipL32, FcpA, FlaA2, FliM, FliN, Loa22, ColA, Mce, KatE, ClpB, Sph2, HemO, LpxD, LA0589, LB194, and TlyA) were similarly distributed amongst all analyzed L. weilii strains (n = 19), indicating that those VFs were commonly shared in all L. weilii (Figure 4 and Supplementary Table 7). Strains CUDO6 and CUD13 contained almost all of the confirmed VFs. Nevertheless, the adherence proteins LenB, LenD, and LigA were absent in both strains, as reported for other L. weilii strains. Protein LA1641, which is associated with LPS structure, was also absent in CUDO6 (Murray et al., 2010). Although strains CUDO6 and CUD13 had genes with confirmed roles as VFs, they were isolated from the urine of asymptomatic dogs. Thus, a balance of gene content and gene expression may provide a better explanation for bacterial phenotypes than simple gene presence or absence. The process of host-adaptation is hypothesized to include both events. Therefore, virulence genes present in the L. weilii genome may be down-regulated during host-adaptation, thereby facilitating bacterial fitness to establish persistence in dog kidneys.

rfb Locus, Sialic Biosynthesis, and Lipid A Biosynthesis Encoding Genes
According to the serotyping, both the CUDO6 and CUD13 L. weilii strains were identified as belonging to the serogroup Mini (Supplementary Table 1). Subsequently, serovar typing was homologous to the undesignated serovar, which was serologically closely related to the undesignated serovar of L. mayottensis serogroup Mini strain 200901116 (Bourhy et al., 2014). We next investigated the serovar determinant region (rfb locus) of the strains CUDO6 and CUD13, and compared its genetic content with that of the same serogroup (serogroup Mini) and of different serogroups (serogroups Hardjo and Icterohaemorrhagiae) to find the link between serotyping and genetic information.
Strains CUDO6 and CUD13 harbored a serovar determinant region (rfb locus) spanning around 93 kbp in the genome and encoding 87 CDSs (Supplementary Figure 5). Consistent with previous reports, their rfb loci were embedded between the conserved MarR transcriptional regulator protein and the DASS sodium-coupled anion symporter, localized upstream and downstream of the rfb loci, respectively (Fouts et al., 2016). The complete set of genes for dTDP-rhamnose biosynthesis (rfbABCD) was found in their rfb loci, similar to that in other pathogenic Leptospira species (Fouts et al., 2016). Therefore, this typical gene cluster may be necessary for pathogenesis by altering the complexity of the LPS structure (Patra et al., 2015).
Genes of the Wzx/Wzy-dependent pathway, associated with the O-antigen assembly and export system, were subsequently found in the loci of both strains, including the flippase gene wzx and the O-antigen polymerase gene wzy. The sialic acid biosynthesis gene for N-acetylneuraminic acid (sialic acid) synthase (neuB2) was also identified within the loci. This gene is involved in producing sialic acid and its derivatives (legionaminic and pseudaminic acids), which participate in the post-translational modification of surface proteins, and are involved in bacterial colonization, immune evasion, and biofilm formation (Fouts et al., 2016). Therefore, the sialic acid biosynthesis genes may help CUDO6 and CUD13 to persist as asymptomatic infection agents in dogs. Although the lipid A biosynthesis encoding genes were not situated within this region, a complete set of 13 genes (lpxA, lpxB1, lpxB2, lpxC, lpxD1, lpxD2, lpxK, kdsA, kdsB1, kdsB2, kdtA, lnt, and htrB) involved in the pathway were identified elsewhere in both L. weilii genomes (BLASTp analysis).
In comparison, the rfb loci of the CUDO6 and CUD13 strains were highly conserved in gene composition and organization (sharing 87 paired orthologous proteins). Moreover, a total of 73 orthologous genes were conserved with the closely related (by serotyping) L. mayottensis serogroup Mini strain 200901116 (Supplementary Table 8), whereas the gene content of the rfb locus differed from that of distinct serogroups [serogroups Hardjo and Icterohaemorrhagiae (Supplementary Figure 5)]. This finding suggests that strains of the same serogroup share rfb gene content underlying some common antigens/epitopes, supporting the previous serotypic observation that serologically indistinguishable serotypes share highly identical gene contents amongst their rfb loci even though they come from different Leptospira species (de la Pena-Moctezuma et al., 1999).
Nevertheless, the rfb loci of CUDO6 and CUD13 contained unique genes that were not found in L. mayottensis strain 200901116, although most of these unique genes encoded hypothetical proteins. Changes in gene content in this region are thought to be related to serovar diversity and may lead to variation in the serological characteristics (de la Pena-Moctezuma et al., 1999). In addition, four genes from the DegT family of aminotransferases were found in the rfb loci of both CUDO6 and CUD13. These proteins may take part in the biosynthesis of the O-antigen side chains and contribute to the unique LPS structure of strains CUDO6 and CUD13 (Nascimento et al., 2004a).

Characteristics of the Extrachromosomal Replicon (Plasmid) of Leptospira weilii
The pDO6 and pD13 replicons contained 100 and 108 protein-coding genes (CDSs) with an overall G+C content of 37.40 and 37.20%, respectively. Both replicons had a lower G+C content than the circular chromosomes, which is congruent with the low G+C content of the confirmed plasmid Laicp in L. interrogans (34.64%) and plasmid p74 in Leptospira biflexa (37.47%) (Picardeau et al., 2008). Additionally, the two extrachromosomal DNA replicons met the criteria of being independently replicating plasmids, since they harbored a replication region (predicted by GC skew) and CDSs related to a partition system (parA and parB) and a toxin/antitoxin system (TAS) (mazE and mazF), which concurs with the previous report of plasmid characterization in L. interrogans. Moreover, a transcriptional regulator, transposase, and recombinase were identified in both plasmids. Notably, neither resistance genes nor specific virulence genes were detected within pDO6 and pD13, and the plasmids showed no similarity to the other L. weilii genomes in the database. However, some regions shared similarities with plasmids and chromosomes in other Leptospira species (i.e., L. interrogans and L. mayottensis), suggesting that Leptospira plasmids might share some common sequences and could integrate some of their DNA into the chromosome.
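The two sequence statistics used above, overall G+C content and GC skew, can be illustrated with a minimal sketch. It is not the pipeline used in this study; the function names and the toy sequences are hypothetical, and the window size is an arbitrary choice. In practice, a replication origin is commonly sought near the minimum of the cumulative GC skew curve.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def cumulative_gc_skew(seq: str, window: int) -> List[float]:
    """Cumulative sum of (G - C) / (G + C) over non-overlapping windows."""
    seq = seq.upper()
    running = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without G or C contribute no skew.
        running += (g - c) / (g + c) if (g + c) else 0.0
        curve.append(running)
    return curve


# Toy input (hypothetical); a real analysis would read the replicon from FASTA.
toy = "ATGCGGATCCATTA"
print(round(gc_content(toy), 2))
print(cumulative_gc_skew(toy, window=7))
```

For a real replicon, the window would typically be hundreds to thousands of bases, and the curve would be inspected together with the annotated parA/parB region.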
When pDO6 and pD13 were compared with the plasmids Laicp and p74, no orthologous gene cluster was shared amongst the four analyzed plasmids. However, 15 orthologous gene clusters that mainly encoded DNA replication protein, DNA-binding domain, MazEF toxin-antitoxin module, and transposase were common to the plasmids of pathogenic Leptospira (pDO6, pD13, and Laicp) (Supplementary Table 9), suggesting that plasmids of saprophytic and pathogenic Leptospira might use different mechanisms to be maintained in their respective host cells. A total of 65 orthologous gene clusters were shared between plasmids pDO6 and pD13, including genes encoding chromosome segregation protein, transcriptional regulator, and HigAB toxin-antitoxin module. Plasmid pD13 had six unique genes, all of which encoded hypothetical proteins. Within both L. weilii plasmids, the TASs may play a functional role in the stress response during persistence in nutritionally limited terrestrial environments or in the kidneys, rather than a role in maintaining plasmid stability during cell division (Christensen et al., 2003). Therefore, plasmids that carry beneficial genes for survival may promote L. weilii adaptability to a specific habitat.

Phylogenetic Analysis and Average Nucleotide Identity
The ANI values from the pairwise genome comparisons amongst the L. weilii strains (intraspecies) were all above 95%, whereas the values from the interspecies comparisons varied substantially from 64.69 to 94.41%. The ANI values supported the taxonomic status of strains CUDO6 and CUD13 as L. weilii, since they exceeded the 95% threshold for species delineation. Moreover, strains CUDO6 and CUD13 shared the highest ANI value (99.86%) (Supplementary Table 10).
The 19 L. weilii genomes analyzed contained a total of 40,599 SNPs in their core-genome region (core-SNPs), with the number of different SNPs between paired genomes varying from 39 to 31,428 SNP sites (Supplementary Table 11). Notably, L. weilii strains 56105 and 56655, which were isolated from human leptospirosis cases in Indonesia and China, respectively, harbored the largest number of different SNP sites in their core-genome (Figure 5 and Supplementary Table 11). This finding may imply a specific character of these two strains that was presumably related to the number of accumulated SNPs, leading to significant evolutionary diversification from the others. The strains CUDO6 and CUD13 had 39 different SNP sites between them. Moreover, the ML-based tree grouped these strains in the same clade with strong bootstrap support (100%), supporting their close genetic relationship. They clustered in the same lineage as the human L. weilii strains from Thailand (strain 2006001853), Laos (strain LNT1234), and China (strain 56655) (Figure 5). Thus, the progenitor of this lineage may have circulated amongst these localities. However, strains CUDO6 and CUD13 showed a distinct genetic distance, in terms of the SNP distance and phylogeny, from the human L. weilii strains, which may reflect the evolutionary diversification and adaptation of both strains to infect canine hosts.
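The pairwise SNP differences reported above are, in essence, counts of mismatched columns in a core-genome alignment. The sketch below illustrates that calculation under stated assumptions: the strain names and sequences are hypothetical toy data, and columns containing gaps or ambiguous bases are simply skipped, which may differ from the tool settings used in this study.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two sequences carry different unambiguous bases."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def pairwise_snp_distances(alignment: dict) -> dict:
    """Return {(name1, name2): distance} for every unordered pair of sequences."""
    return {
        (m, n): snp_distance(alignment[m], alignment[n])
        for m, n in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment of core-SNP columns.
toy_alignment = {
    "strain_A": "ACGTAC",
    "strain_B": "ACGTAT",
    "strain_C": "TCGAAT",
}
for pair, dist in pairwise_snp_distances(toy_alignment).items():
    print(pair, dist)
```

A matrix built this way corresponds to the kind of table summarized in Supplementary Table 11, where closely related isolates such as CUDO6 and CUD13 would appear as the pair with the smallest count.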

Pan-Genome Analysis and Unique Gene Identification
Pan-genome analysis revealed that the gene repertoire of the 19 L. weilii genomes analyzed comprised a total of 5,744 orthologous gene clusters. Of these, 2,645 (46.04%) and 2,183 (38%) gene clusters were identified as belonging to the core and accessory genomes, respectively. The number of strain-specific gene clusters ranged from two in strain CUD13 to 269 in strain LT2116 (Figure 6A). The power-law fitting curve for the pan-genome of L. weilii continued to rise with the continual addition of new genomes. In contrast, the trajectory of the exponential fit curve for the core-genome became constant after 17 genomes were analyzed (Figure 6B).
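The split into core, accessory, and strain-specific gene clusters can be sketched from a presence/absence table. The definitions used here are assumptions for illustration (core: present in every genome; strain-specific: present in exactly one; accessory: everything in between), and the cluster and genome names are hypothetical; the thresholds applied by the software used in this study may differ.

```python
def partition_pangenome(clusters: dict, n_genomes: int) -> dict:
    """Split clusters by how many genomes carry them.

    clusters maps a cluster id to the set of genome names in which it occurs.
    """
    parts = {"core": [], "accessory": [], "strain_specific": []}
    for cluster_id, genomes in clusters.items():
        k = len(genomes)
        if k == 0:
            continue  # a cluster seen in no genome carries no information
        if k == n_genomes:
            parts["core"].append(cluster_id)
        elif k == 1:
            parts["strain_specific"].append(cluster_id)
        else:
            parts["accessory"].append(cluster_id)
    return parts


# Hypothetical toy data: three genomes, four gene clusters.
toy_clusters = {
    "cluster_1": {"g1", "g2", "g3"},
    "cluster_2": {"g1", "g2"},
    "cluster_3": {"g3"},
    "cluster_4": {"g1", "g2", "g3"},
}
parts = partition_pangenome(toy_clusters, n_genomes=3)
total = sum(len(v) for v in parts.values())
for name, members in parts.items():
    print(name, len(members), f"{100 * len(members) / total:.1f}%")
```

Applied to the 5,744 clusters reported here, the same bookkeeping yields the core and accessory fractions quoted above, with the remainder being strain-specific clusters.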
According to Heaps' law, the pan-genome of the 19 L. weilii genomes analyzed showed characteristics of an "open" pan-genome structure, as supported by the γ parameter (γ = 0.12). Consistently, the new gene curve did not reach zero, and each sequential genome addition contributed an average of 48 new genes (Figure 6C). These results are consistent with the previous findings that described the open pan-genome nature of Leptospira (Fouts et al., 2016; Xu et al., 2016; Vincent et al., 2019), and correlate with their lifestyle and ability to successfully colonize and infect a broad range of hosts and environments.
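The γ parameter can be read as the exponent of a power law, P(N) = κN^γ, relating pan-genome size P to the number of genomes N, with a positive γ taken to indicate an open pan-genome. The sketch below estimates κ and γ by ordinary least squares on log-transformed values. This is a simplification chosen for illustration; pan-genome software typically fits the curve by non-linear regression over many genome orderings. The data points are synthetic (generated with a made-up κ of 4,000 and γ of 0.12), not the values from this study.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Fit P = kappa * N**gamma by least squares on log(N) versus log(P)."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(p) for p in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


# Synthetic pan-genome sizes for 1..19 genomes (illustrative values only).
counts = list(range(1, 20))
sizes = [4000 * n ** 0.12 for n in counts]

kappa, gamma = fit_power_law(counts, sizes)
status = "open" if gamma > 0 else "closed"
print(f"kappa={kappa:.1f} gamma={gamma:.3f} -> {status} pan-genome")
```

With real data, the scatter around the fitted curve matters as much as the point estimate, so the fit is usually reported together with the curve itself, as in Figure 6B.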
A total of 1,946 orthologous gene clusters (73.57%) of the core-genome of L. weilii were functionally annotated based on the COG category classification (Supplementary Table 12). Of these, the core-genome of the 19 L. weilii genomes analyzed was abundant in encoded proteins related to categories R (general function prediction only, 9.18%), T (signal transduction, 8.39%), M (cell wall/membrane/envelope biogenesis, 8.05%), and J (translation, 6.42%). This result may explain the common abilities of L. weilii that permit this microorganism to sense, respond, replicate, and survive in various hosts and environmental conditions. Furthermore, the pan-genome analysis revealed strain-specific genes that were uniquely present in strains CUDO6 and CUD13. Strain CUDO6 carried two strain-specific genes with potential hits in the VFDB to the twitching motility protein (30% amino acid identity) and auto-transport protein (29% amino acid identity) of Burkholderia pseudomallei K96243. Strain CUD13 contained three strain-specific genes, one encoding a hypothetical protein of unknown function, and the other two resembling the ABC transporters of Klebsiella pneumoniae NTUH-K2044 (28 and 31% amino acid identity, respectively).

Variant Calling Analysis
In comparison with L. weilii strain 2006001853 isolated from human leptospirosis in Thailand, a total of 1,101 missense mutations were identified amongst the CDSs of both L. weilii strains. Of these, 591 CDSs were functionally categorized in the COG database, and those related to categories S (function unknown), M (cell wall/membrane/envelope biogenesis), T (signal transduction mechanisms), and E (amino acid transport and metabolism) were over-represented in the set of missense variants, accounting for the majority of the protein functions affected (Supplementary Table 13). Interestingly, there were instances of missense mutations in pathogenesis-related genes involved with the endoflagellum (flaA2, flaB2, fliD, fliE, fliO, flgD, flhA, and flhB), LPS structure (degT, fcl, neuB1, neuB2, rfaD, rfbB, lptD, lpxD1, lpxD2, wzx, and wzy), mammalian cell entry protein (mce), and hemolytic and sphingomyelinase activities (sph1, sph2, and tlyA) in strains CUDO6 and CUD13. The missense mutations included SNPs, insertions/deletions (indels), and complex variations (when more than one change occurred in the same gene) (Table 2). All of these mutation types were found amongst the pathogenesis-related genes for the endoflagellum and LPS structure. The gene encoding the flagellar type III secretion system protein (flhB) showed mutations consisting of four SNPs and two indels, resulting in six amino acid changes. Similarly, wzx, encoding the O-antigen flippase, also showed several mutations consisting of four SNPs, generating four amino acid substitutions. For the mammalian cell entry gene mce, an A656G mutation was observed, leading to substitution of lysine with arginine at amino acid position 219. The most complex variations were found in sph2, which harbored eight SNPs and five indels, resulting in 19 amino acid changes (Table 2). The missense mutations of genes encoding virulence determinants of pathogenic Leptospira may be one reason for attenuated virulence and asymptomatic infection in animals.
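The relationship between the nucleotide change A656G in mce and the lysine-to-arginine substitution at residue 219 follows from simple codon arithmetic, sketched below. The reference codon of mce is not given in the text, so both lysine codons are tried; the helper names are hypothetical and the codon table is deliberately partial, listing only the codons needed for this illustration.

```python
# Partial codon table: only the codons needed for this illustration.
CODON_TO_AA = {"AAA": "K", "AAG": "K", "AGA": "R", "AGG": "R"}


def locate_in_cds(nt_pos: int):
    """Map a 1-based CDS nucleotide position to (codon number, position within codon)."""
    codon_number = (nt_pos - 1) // 3 + 1
    pos_in_codon = (nt_pos - 1) % 3 + 1
    return codon_number, pos_in_codon


def apply_substitution(codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Return the codon after replacing the base at the 1-based position."""
    i = pos_in_codon - 1
    return codon[:i] + alt_base + codon[i + 1:]


codon_number, pos_in_codon = locate_in_cds(656)
for ref_codon in ("AAA", "AAG"):  # the two lysine codons
    alt_codon = apply_substitution(ref_codon, pos_in_codon, "G")
    print(
        f"codon {codon_number}, base {pos_in_codon}: "
        f"{ref_codon} ({CODON_TO_AA[ref_codon]}) -> {alt_codon} ({CODON_TO_AA[alt_codon]})"
    )
```

Nucleotide 656 is the second base of codon 219, and an A-to-G change at the second position converts either lysine codon into an arginine codon, which agrees with the substitution described above.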
Endoflagella-mediated motility is a crucial VF for pathogenic Leptospira (Gomes-Solecki et al., 2017). Constructed mutant strains with affected endoflagella-related genes (flaA2 and fcpA) showed a loss of translocation and of the ability to infiltrate host cells (Lambert et al., 2012; Wunder et al., 2016). Moreover, a spontaneous non-motile mutant strain isolated from a dog with leptospirosis contained a single nucleotide deletion in fliM, resulting in the absence of motility in soft agar and an inability to cause disease in hamsters (Fontana et al., 2016). To examine the effect of the missense mutations present in the endoflagella genes, the motility of L. weilii strains CUDO6 and CUD13 was tested and compared with that of the motile reference L. interrogans serovar Copenhageni strain M20. Motility was observed for the L. interrogans control strain, which swarmed around 1 cm away from the inoculation spot, but not for L. weilii strains CUDO6 and CUD13 after 5 days of incubation (Supplementary Figure 6). Additionally, L. weilii strains CUDO6 and CUD13 reached an OD of 0.4-0.5, similar to that of L. interrogans serovar Copenhageni strain M20, after 5 days of incubation, indicating that the observed absence of motility was not due to a slower growth rate of the L. weilii strains. Although none of the missense mutations resulted in endoflagellar gene truncation, these results suggest that the missense mutations observed in the endoflagella genes of L. weilii strains CUDO6 and CUD13 might impair movement, decrease systemic bacterial dissemination, and facilitate host adaptation during asymptomatic infection. Similarly, LPS has been described as an essential factor for Leptospira infection (Gomes-Solecki et al., 2017). This complex cell wall component comprises three parts (lipid A moiety, core oligosaccharide, and O-antigen) responsible for antigenic variation and serovar diversity in pathogenic Leptospira (Patra et al., 2015).
Previously, an lpxD1-mutant strain with a hydrophobic modification of lipid A in the LPS structure showed attenuation of its virulence and gain of temperature adaptation (Eshghi et al., 2015). Additionally, evidence of over-expression of the O-antigen was associated with chronic infections in a rat model (Nally et al., 2005). Both findings could imply that alteration of the genes in the LPS structure may affect LPS expression and allow Leptospira to persist in animal carriers.
FIGURE 6 | Pan-genome and core-genome analysis of 19 L. weilii strains. The genomes in this figure are ordered according to their phylogenomic analysis based on clustering of the gene presence/absence matrix, which is shown at the upper right of the figure. The bars in the first 19 layers represent the gene clusters, as calculated by the Euclidean distances across the 19 L. weilii genomes. The next blue layer demonstrates the functional annotation of gene clusters using the EggNOG database, and the last green layer corresponds to core-genome organization (A). The power-law and exponential fitting curves, with equations, of the pan- and core-genomes of the 19 L. weilii strains are represented by blue and green lines, respectively (B), while the new gene curve is shown by a brown line (C).
The mce gene encodes the mammalian cell entry protein and has a role in cell adherence and invasiveness in Leptospira. Accordingly, a mce-mutant strain was demonstrated to have an attenuated phenotype with a decreased ability to internalize into macrophages during the early stage of infection. Moreover, a mce-mutant strain was detected in the urine of an animal model up to 2 weeks post-infection. This finding suggests that genetic mutation of mce may allow the bacterium to survive during infection and provide an advantage for asymptomatic carriage in animals. Pathogenic Leptospira can produce hemolysins that target red blood cells and host cell membrane sphingomyelin for nutrient acquisition and induce pro-inflammatory cytokines during the early stage of infection. However, the sph2-mutant strain had almost no hemolytic and sphingomyelinase activities, suggesting that genetic mutation of sph2 may reduce the function of hemolysins in pathogenic Leptospira (Narayanavari et al., 2015).
During chronic infection, pathogenic Leptospira must contend with selective pressures arising from the host immune response and from changes in ecological niche. Several adaptive strategies are used for short- and long-term adaptation, including altered VFs and/or the formation of factors that enhance the capacity to persist in the host. In short-term adaptation, many regulatory mechanisms are believed to quickly adjust gene expression to down-regulate virulence and/or increase expression of genes related to host persistence, such as those for O-antigen and biofilm (Nally et al., 2005; Yamaguchi et al., 2018). In contrast, genetic alterations that affect protein function (i.e., missense mutations) may be linked to long-term adaptation, especially beneficial mutations that are the consequence of positive or diversifying selection (Xu et al., 2016). This advantageous selection may enforce augmentation of persistence factors and/or repression of VFs during asymptomatic infection, leading to a heritable shift in the bacterial population to a less pathogenic state in the host (Kurilung et al., 2019). However, the association of genetic mutations with leptospiral asymptomatic infection in dogs remains to be elucidated in further studies.

CONCLUSION
This study provided new serotyping and genomic insights into the two L. weilii strains (CUDO6 and CUD13) that were isolated from the urine of asymptomatic dogs. Both strains were of the same serogroup (serogroup Mini), with largely conserved genomic features. They had a close evolutionary relationship based on their core-SNP phylogeny, reflecting the same microevolutionary background. Nevertheless, some disparities were identified between their genomes, including the chromosomal rearrangement, plasmid sequence, and unique genes encoding twitching motility protein, autotransport protein, and ABC transporter protein. Compared with the L. weilii strain from a human, missense mutations in pathogenesis-related genes encompassing endoflagella, LPS structure, mammalian cell entry, and hemolytic activities were identified in the non-motile strains CUDO6 and CUD13. These may be involved in the host adaptation of the strains, enabling them to persist as an asymptomatic infection in dogs. Overall, this study provides important genetic information to explore further the molecular epidemiology and the host-pathogen interactions of pathogenic L. weilii species.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

AUTHOR CONTRIBUTIONS
AK conceived and designed the experiments, performed the experiments, analyzed the data, and contributed to the writing of the manuscript. VP and NP contributed to study design, reagents, materials, and analysis tools, and supervision and proofreading of the writing. All authors contributed to the article and approved the submitted version.