Development of a sequence-based in silico OspA typing method for Borrelia burgdorferi sensu lato

Abstract Lyme disease (LD), caused by spirochete bacteria of the genus Borrelia burgdorferi sensu lato, remains the most common vector-borne disease in the northern hemisphere. Borrelia outer surface protein A (OspA) is an integral surface protein expressed during the tick cycle, and a validated vaccine target. There are at least 20 recognized Borrelia genospecies, that vary in OspA serotype. This study presents a new in silico sequence-based method for OspA typing using next-generation sequence data. Using a compiled database of over 400 Borrelia genomes encompassing the 4 most common disease-causing genospecies, we characterized OspA diversity in a manner that can accommodate existing and new OspA types and then defined boundaries for classification and assignment of OspA types based on the sequence similarity. To accommodate potential novel OspA types, we have developed a new nomenclature: OspA in silico type (IST). Beyond the ISTs that corresponded to existing OspA serotypes 1–8, we identified nine additional ISTs that cover new OspA variants in B. bavariensis (IST9–10), B. garinii (IST11–12), and other Borrelia genospecies (IST13–17). The IST typing scheme and associated OspA variants are available as part of the PubMLST Borrelia spp. database. Compared to traditional OspA serotyping methods, this new computational pipeline provides a more comprehensive and broadly applicable approach for characterization of OspA type and Borrelia genospecies to support vaccine development.


INTRODUCTION
Lyme borreliosis, or Lyme disease (LD), is the most common tickborne disease in the northern hemisphere and is caused by genospecies members of Borrelia burgdorferi sensu lato (s.l.) [1][2][3].B. burgdorferi spirochetes are extracellular pathogens whose lifecycle is restricted to cycling between vertebrae reservoir hosts and its vector, Ixodes ticks [4].At least 20 accepted genospecies have been described within the species complex of B. burgdorferi s.l.[5,6].Most LD cases in North America are caused by B. burgdorferi sensu stricto (hereafter B. burgdorferi), while multiple species of Borrelia cause the majority of LD in Europe and Asia, including B. burgdorferi B. garinii, B. bavariensis, and B. afzelii [1][2][3].Increases in the number of reported LD cases in these regions have been attributed to many factors, including but not limited to, changes in diagnostic procedures and case definitions [7], overestimation of cases due to differences and gaps in patient data [8,9], and geographical expansion of the tick vector, I. scapularis [10][11][12][13].
Outer surface protein A (OspA), encoded by ospA on linear plasmid 54 (lp54), is an outer membrane protein found in all B. burgdorferi s.l.species.OspA is expressed by the spirochetes while in the tick mid-gut, where it is integral to persistent colonization [14] and is a proven antigen target for LD vaccines [15][16][17][18][19][20][21].The primary disease-causing species of Borrelia in North America and Europe belong to OspA serotypes 1-6 [22].B. burgdorferi, the primary etiological agent of LD in North America, is typed as OspA serotype 1.By comparison, disease isolates from Europe include multiple genospecies and associated OspA serotypes.A comparison of ospA sequences revealed that certain European Borrelia species were rather homogeneous (e.g.B. afzelii serotype 2 and B. bavariensis serotype 4), while others were more heterogeneous (e.g.B. garinii serotypes 3, 5, and 6) in their OspA grouping [23,24], and that there was general agreement between ospA genotype and OspA serotype assigned using type-specific monoclonal antibodies (mAbs) [25].
The first seven OspA serotypes reported in 1993 [25], together with serotype 8 in 1996 [26], were identified using a traditional serotyping system limited by the non-commercial availability of reference mAbs and a requirement for mAb combinations for identification.Beginning in the early 2000s, sequence-based approaches [27,28] largely replaced traditional serological typing [25,26] for characterization of B. burgdorferi s.l.OspA types.Instead, OspA typing commonly utilized PCR in combination with restriction fragment length polymorphism (RFLP) analysis [29][30][31].Beyond OspA, multilocus sequence typing (MLST) [32] and a method that targets the 5S-23S rDNA (rrfA-rrlB) intergenic spacer (IGS) [33] are genotyping tools used for Borrelia.However, none of these molecular methods are able to fully interrogate OspA diversity, information that is critically important for interpretation of the immune response to OspA antigens included in investigational vaccines currently in development.Today, such approaches are being displaced by whole-genome sequencing (WGS) with the rapid advancement in next-generation sequencing (NGS) platforms and sequence analysis algorithms [34].In silico strain typing based on WGS/NGS data can provide more precise information on the diversity of B. burgdorferi s.l. and the relationship between phylogenetic clusters of OspA variants, Borrelia genospecies, and OspA type.
To support Borrelia surveillance and typing, we have compiled a database of over 400 genomes representing 11 genospecies of B. burgdorferi s.l.Approximately half were accessed from PubMLST and GenBank [35] and the remainder from NGS of isolates determined in-house.The isolates in this collective database comprise human pathogenic Borrelia species of North America, Europe, and Asia (B.burgdorferi, B. afzelii, B. garinii, B. bavariensis, B. spielmanii, B. bissettiae, and B. mayonii), as well as those not recognized as human pathogens (B.japonica, B. turdi, and B. finlandensis).Analysis of these genomes has revealed a comprehensive picture of OspA diversity across and within the major pathogenic B. burgdorferi genospecies and has been used to define sequence similarity boundaries between OspA types.The development of a sequence-based OspA in silico typing (IST) scheme, described here, provides a valuable tool for characterization of clinical samples at the level of OspA type and Borrelia genospecies.

Bacterial growth and whole-genome sequencing
MKP (Modified Kelly-Pettenkoffer) medium prepared in-house [36] was used for cultivation of B. afzelii, B. garinii, B. bavariensis, and B. spielmanii isolates from frozen stock vials, whereas BSK-H (modified Barbour-Stonenner-Kelly) medium (Sigma-Aldrich Cat# B8219) was used for the culture of B. burgdorferi.Cultures were incubated at 34 °C for 5 to 14 days and closely monitored for bacterial growth by dark field microscopy.Cultures were centrifuged at 10 000 g for 10 min once the spirochete concentration reached at least 1.0×10 6 cells ml −1 .Cell pellets were processed for genomic DNA extraction following the magnetic bead-based Genfind V3 DNA isolation protocol (Beckman Coulter Life Sciences cat# C34881).Next-generation sequencing was performed on the Illumina MiSeq platform, with 2×300 bp paired-end NGS chemistry following a slightly modified protocol previously described by Jones et al. [37].Read quality was verified using samtools (v1.15.1) to find the per-base coverage of the chromosome.An average sequencing depth of 193× was achieved across all isolates.

Genotype characterization and phylogenetic analysis
For NGS reads from each isolate, de novo genome assembly was performed using CLC Genomic Workbench (v21.0.5) with default settings, and the ospA sequence was obtained using blast.In the case of publicly sourced genomes, blast was again used to determine the ospA sequence and only isolates with a complete open reading frame were included.Whole-genome core alignment was performed using Parsnp [38], and the OspA-specific phylogenetic tree was generated using mega11 [39].Consensus OspA sequences per IST were calculated using CLC Genomic Workbench.For ISTs where <2 sequences were identified, a single OspA variant was used in place of a consensus sequence.

OspA reference variants and IST
Unique OspA protein sequences (Table S3) were obtained from a combination of sources: NCBI, the PubMLST database ( pubmlst.org), Mendeley, and internal collections of Borrelia (Table S1, Table S2).Phylogenetic analysis (see previous section) identified distinct OspA clusters, defined here as OspA ISTs, wherever there were more than two isolates.

Read alignment and consensus ospA sequence
Borrelia NGS reads were aligned by bwa mem (v0.7.17) to reference sequences for multiple genospecies and ISTs: B. burgdorferi (IST1), B. afzelii (IST2), B. garinii (IST3, IST5, IST6, IST11, IST12), and B. bavariensis (IST4, IST9, IST10).Per-base coverage was extracted using samtools and the mean and standard deviation coverage was computed for each alignment.The alignment with the highest mean minus one standard deviation was used for further analysis.The selected alignments were assessed for median coverage and sequence depth at each nucleotide in the ospA gene.Samples with <100 reads aligned, median coverage <10× and/or sequence depth <5× across <99 % of the gene were deemed low quality and excluded from further IST analysis.Variant calling was performed using bcftools (v1.17) with the setting QUAL>=30, and the consensus ospA sequence generated by bcftools.
To predict OspA IST from assembled genomic contigs, blastn was instead used to obtain alignments to the IST-specific reference sequences.The alignment with the highest E-value was used for downstream processing, with sequences with <99 % per-base alignment considered low quality and excluded from further analysis.

OspA IST assignment and estimation of IST-specific pairwise identity threshold
The consensus ospA nucleotide sequence was translated to amino acids and compared against 90 reference OspA protein variants (Table S3) using MAFFT (v.7.480 [40]).A pairwise sequence identity was then calculated for each comparison.In cases of exact sequence match, the OspA IST of the known sequence variant was used for IST assignment.For non-exact matches, the IST of the OspA variant with the highest pairwise identity was assigned only if the pairwise identity met the corresponding threshold.Otherwise, an 'unknown IST' designation was assigned.
To estimate IST-specific pairwise identity threshold, all known OspA variants were multi-aligned using MAFFT and the number of mismatches calculated using a custom Python script to determine pairwise sequence identity.For each OspA variant, the maximum pairwise sequence identity within and outside of their corresponding IST was calculated.A similarity threshold for each IST was then determined as the midpoint between the lowest within-IST percentage similarity and the highest out-of-IST percentage similarity.

Evaluation of OspA NGS pipeline using clinical isolates
To evaluate the performance of the OspA typing pipeline, 22 clinical B. burgdorferi s.l.isolates obtained from Valneva SE (Saint-Herblain, France) were used for testing.Sequence data were processed using the in silico OspA typing pipeline.Results were compared with OspA serotypes previously determined by amino acid sequence alignment to reference strains containing candidate OspA sequences [21].

Borrelia genome collections
We sequenced and assembled the genomes of 193 Borrelia isolates (Table S1).The majority of isolates were collected from humans between 1988 and 2018, 81 % from human skin biopsy samples (Fig. 1a).A small percentage (7 %) of isolates were obtained directly from ticks.Overall, B. afzelii and B. burgdorferi predominated and all B. afzelii included in the collection were isolated from Europe (88.4 % of these from the Netherlands) (Table S1).Within B. burgdorferi, 77.5 % were collected from the USA, whereas the remaining isolates were from European countries.
The sequences of an additional 227 Borrelia genomes were sourced from the public databases PubMLST and GenBank (Table S2).In contrast to our own collection, these were predominantly obtained directly from ticks (Fig. 1b).Nearly half of the isolates in this subset were B. burgdorferi from North America, with the remainder predominantly consisting of B. garinii and B. bavariensis from Europe and Japan (Table S2).To obtain broader representation of global OspA sequence diversity, we also included lp54 sequences of 55 isolates from the Mendeley Data repository (Table S2).These plasmid sequences were predominantly isolated from ticks in Japan.When combined with our internal collection, a total of 11 different B. burgdorferi s.l.genospecies were represented in the aggregate collection, with B. afzelii, B. burgdorferi, B. garinii, and B. bavariensis all well represented (Fig. 1c).

Genetic diversity of OspA within and across Borrelia genospecies
We extracted the ospA gene sequences from the compiled sequence collection, which were translated to amino acids in order to remove silent mutations caused by codon redundancy.In total, we identified a total of 90 unique OspA protein variants (Table S3).The phylogenetic tree of OspA variants was annotated to identify the existing reference Borrelia strains with known OspA serotypes [1][2][3][4][5][6][7][8] (Fig. 2).As expected, ospA sequences sharing the same OspA serotype were phylogenetically clustered.We also observed additional clusters in the phylogeny that did not correspond to any previously reported serotypes.To accommodate these potential novel OspA types, and to better characterize the OspA diversity, we have developed a new nomenclature: OspA in silico type (IST).We found that OspA IST correlated strongly with OspA serotypes 1-8 and could be linked to the corresponding genospecies: B. burgdorferi (IST1), B. afzelii (IST2), B. bavariensis (IST4), and B. garinii (IST3, IST5, IST6, IST7, and IST8) (Fig. 2, Table 1).OspA variants identified from B. burgdorferi (IST1) and B. afzelii (IST2) were limited to a single phylogenetic cluster.In contrast, two other genospecies typically restricted to LD cases in Europe and Asia, B. garinii and B. bavariensis, were more heterogenous, with multiple clusters of OspA variants (Fig. 2).OspA variants identified among B. bavariensis strains were clustered into three phylogenetic groups, one of which corresponded to IST4 (sequence variant 12, i.e., European OspA serotype 4).OspA variants corresponding to two new phylogenetic groups designated IST9 and IST10 were detected in B. bavariensis strains from Asia.Only OspA variant 12 was represented in the phylogenetic IST4 cluster, although 23 isolates carried this OspA variant (Tables S1 and S2).We next investigated the amino acid sequence diversity within and between ISTs.To begin, we measured the pairwise sequence identity across all 90 OspA variants.For each variant within an IST, we noted the highest sequence identity to variants from the same IST compared to the highest identity to variants not in that IST.The distributions of these two groups are shown in Fig. 3 and Fig. S1.In all 17 cases, a clear distinction was observed when comparing the sequence identity within an IST to the sequence identity of variants from another IST.This indicated that phylogenetic clustering was a robust method for assigning IST.
To further understand the variation between specific ISTs, we calculated the average sequence identity between OspA variants from each of the first 12 ISTs (Fig. 4a) corresponding to B. burgdorferi, B. afzelii, B. bavariensis, and B. garinii.ISTs 4, 5, 6, 8, and 12 were found to have high between-IST sequence identity, reaching nearly 93 % average identity in the cases of IST6 and IST12.This was relatively unsurprising as these ISTs occupy the same branch in the OspA phylogeny (Fig. 2).As variation in OspA is known to primarily occur at the C-terminal domain of the protein [21], we aligned these regions using consensus or representative OspA sequences for these ISTs.We found that the C-terminal variable domain was relatively conserved across these ISTs, with the exception of a segment from amino acid positions 206-230 (Fig. 4b).This region of OspA contained 11 amino acid substitutions and one indel, and clearly distinguished IST5 from IST8 and IST5 from IST6.In contrast, these substitutions were well conserved between IST4, IST6, and IST12.Similarly, we found that IST3 and IST7, which averaged 87.9 % identity with one another, had well conserved C-terminal domains when consensus OspA sequences were compared to the other ISTs (Fig. S2).
Although within-IST sequence identity was consistently greater than between-IST sequence identity, we found that IST3 OspA variants had the greatest sequence heterogeneity, only averaging 92.5 % identity when compared with one another (Fig. 4a).Based on the OspA phylogeny, it appeared that three sub-groups of IST3 were present in the collection (Fig. 2) and were sufficiently diverged from one another to impact the average sequence identity.We aligned the C-terminal domain of multiple IST3 OspA variants and identified a region from positions 201-227 that characterizes the two sub-groups, consisting of differences at eight amino acid positions in addition to a single residue indel (Fig. 4c).The full-length alignment of IST3 OspA variants is shown in Fig. S3.

of an in silico OspA typing pipeline
Leveraging this collection of diverse OspA variants, we sought to develop a standardized and automated method for determining the OspA IST of new isolates (Fig. 5).To this end, we first tested whether alignment of short-read NGS data of ospA from wholegenome sequence could be used to generate a consensus sequence of OspA.For initial processing, a reference sequence(s) was needed for alignment.To determine whether a single ospA sequence was sufficient as a reference for all ISTs, NGS reads of ospA from B. burgdorferi B31 (IST1) and a B. garinii IST6 isolate were simulated using SimuSCoP v1.0 and aligned to ospA sequences corresponding to ISTs 1-6.Coverage depth at each position of ospA is shown in Fig. S4.While coverage was consistent when aligned to the ospA sequence of the respective ISTs, the same could not be said when aligned to heterologous reference sequences.Consistent with sequence divergence in the C-terminal region of OspA, alignment failed at the 3′ end of ospA.An example of this is illustrated in Fig. S4a for B. burgdorferi, when aligning to non-burgdorferi ospA sequences.Similarly, the B. garinii input reads were found to align poorly to the non-garinii reference sequences at the 3′ end (Fig. S4b).Given that alignment quality is species-and IST-dependent, we concluded that alignment must be performed to a corresponding reference ospA sequence from each target IST in order to maximize coverage and sequence integrity.
To determine the best alignment for consensus sequence building, both the total read coverage and coverage variation were considered.When aligning reads to a reference sequence of a different IST, the average coverage across ospA was found to be both lower and more variable, particularly in the C-terminal variable domain (Fig. S3).To infer the best quality alignment across ISTs, an alignment score was calculated as the mean coverage minus one standard deviation.The alignment with the highest score was then used to build a consensus ospA sequence.Although this scoring selects for the most robust alignment, it did not consider cases where alignment quality or coverage is poor in general.In order to flag low-confidence ospA alignments, minimum required values were set for total reads aligned, median read coverage, and depth of coverage across the entire ospA gene sequence (at least 5× sequence depth; see Methods).If the assembled genomic contigs were used as input, blastn was used instead to obtain alignments to the reference sequences of individual ISTs.
Finally, to determine the genospecies and IST, the generated consensus ospA sequence was translated and aligned to the collection of 90 OspA variants with species and IST assignments (Fig. 5b).In cases where an exact OspA sequence match was available, the corresponding species and IST were assigned.If an exact match was not returned, the highest percentage similarity to a known sequence was recorded alongside the IST associated with that sequence.As we had previously calculated the sequence identity Fig. 3. Comparison of OspA amino acid sequence identity within and outside of each IST.For each of the 90 OspA variants, sequence identity was compared to each variant from the same IST, and the highest percentage similarity saved.The distribution of maximal sequence identity is shown in blue boxes.Likewise, sequence identity was compared to each variant not in the same IST and the highest percentage similarity saved.The distribution of maximal sequence identity for these cases is shown in red boxes.A similarity threshold for each IST was then calculated as the midpoint between the tails of the two distributions, the values for which are depicted below the plot.As only a single OspA variant comprises both IST4 and IST8, sequence identity within these ISTs is depicted at 100 %.Distributions for IST13-17 are found in Fig. S1.
for OspA variants with known ISTs (Fig. 3 and S1), these data were used to set thresholds for inclusion in each IST (see Methods).If the similarity of the query sequence was greater than or equal to the threshold for the known sequence, the associated IST was assigned.Otherwise, the sample was labelled as an unknown IST.

Testing of the OspA NGS pipeline using clinical Borrelia isolates
To evaluate performance of the OspA NGS pipeline, we ran a test dataset of whole-genome sequences from 22 isolates representing serotypes 1-6 [25] and 6 genospecies (B.burgdorferi, B. garinii, B. afzelii, B. bavariensis, B. mayonii, and B. spielmanii).A detailed comparison of the OspA ISTs identified by the pipeline is found in Table S4.The pipeline returned 100 % concordance with serologically OspA-typed genospecies, and all 22 isolates returned the expected results for OspA IST (Table S4).A single B. garinii isolate, previously labelled as OpsA serotype 3, was found to harbour a novel allele of ospA with 99 % identity to the reference IST3 ospA sequence used by the OspA NGS pipeline.

DISCUSSION
A hexavalent (serotypes 1-6) OspA-based vaccine, VLA15, is being developed for prevention of LD caused by Borrelia spp. that code for these most prevalent serotypes in North America and Europe [41].VLA15 is currently being tested for efficacy in a phase 3 study conducted in both North America and Europe (VALOR; NCT05477524).Determination of the OspA type of clinical isolates is important for vaccine development, both to determine concordance with the vaccine antigens, as well as to better understand the epidemiology and prevalence of different OspA types and Borrelia genospecies in human disease.Moreover, it is important to understand the genetic diversity of OspA.Characterization of the efficacy of VLA15 in Europe will be challenged by the number of potential OspA protein types present among European isolates.A reliable and implementable classification method that can readily accommodate new OspA types is currently lacking.In the present study, we developed a pipeline that accurately assigns OspA ISTs and Borrelia genospecies from either OspA PCR amplicon sequence reads or assembled Borrelia genome data.These include IST1-IST8, which correspond to OpsA serotypes 1-8 that had been previously assigned using traditional serological methods, as well as nine novel ISTs (IST9-IST17).Our IST approach is rapid, simple, standardized, and only requires ospA sequence data.The typing approach described in this work is the first reported pipeline specifically designed for OspA types of B. burgdorferi s.l.Consistent with mAb-assigned OspA serotypes [25,26], the OspA variants belonging to B. burgdorferi and B. afzelii, identified as IST1 and IST2, respectively, displayed near 100 % within-group homology.New OspA clusters (IST11 and IST12) were identified for B. garinii as they diverge from the five established OspA serotypes associated with this species [26] (reported here as IST3 and IST5-IST8).We found that the B. bavariensis IST4 European cluster was geographically distinct from two novel clusters (IST9-IST10) containing B. bavariensis isolates from Asia.The divergence between these OspA groups is expected, given evidence for a clonal expansion of the European population of B. bavariensis [42][43][44].Only a single OspA variant corresponding to IST4 (B.bavariensis) and IST8 (B.garinii) was identified in our sequence collection, and thus, within-serotype similarity was set at 100 % for these ISTs.
Genotyping Borrelia isolates in clinical samples is desirable for vaccine development to better understand the epidemiology of LD.Borrelia genospecies and OspA types are differentially distributed across the northern hemisphere and have been associated with distinct manifestations of LD [25,26,45,46].Sequence alignment of the ospA gene and analyses based on OspA amino acid sequence similarity have been used broadly for clustering B. burgdorferi s.l.isolates [23,[47][48][49][50].This clustering has provided useful genospecies information in agreement with classification based on the analysis of other conserved chromosomal genes [28] and mAb-based serotyping [25].However, alignment analyses suffer from limitations of scalability and standardization, and often partial ospA gene sequence was targeted.The pipeline presented here provides high resolution to differentiate between multiple OspA ISTs, as well as the ability to identify novel phylogenetic IST clusters within a single genospecies (e.g.IST9 and IST10 for B. bavariensis and IST11 and IST12 for B. garinii).Since the IST OspA pipeline only scans for ospA sequences, interference of host DNA or DNA of similar pathogens lacking ospA, such as relapsing-fever Borrelia, is minimized.Consequently, the pipeline is capable of IST assignment using a small amount of Borrelia DNA in a background of host DNA.Only NGS reads aligned to ospA are utilized, and IST is assigned with high sensitivity and specificity.The ability for de novo strain detection, identification, and assignment of ospA alleles is an advantage of NGS-based typing platforms, with the capability to easily build on phylogenetic trees as new disease-causing Borrelia species are discovered [51] and additional complete genomes are sequenced.
Evaluation of the OspA IST pipeline was limited to 22 Borrelia isolates from a single collection.The small number of Borrelia isolates with both serological typing data and OspA sequence information presented a challenge.In addition, the current pipeline design does not take co-infection of Borrelia species or strains into consideration, which has been observed in Europe [52]; therefore, the pipeline can only detect the dominant IST when working on samples that may carry >1 isolates from different ISTs.also opens the rare possibility of a novel variant being identified based on polymorphisms from a mixture of OspA sequences.Furthermore, this pipeline relies on the association between OspA sequence and genospecies, so assignment of genospecies based on could potentially be occluded in the case of horizontal gene or plasmid transfer during co-infection [53].Future developments of the OspA typing pipeline will aim to address these potential limitations.
Our work has uncovered novel clusters of OspA variants among strains of B. burgdorferi s.l., and also introduced the concept of ISTs as a new nomenclature for strain characterization.Furthermore, we developed a reliable open-source OspA analysis pipeline that enables characterization of novel Borrelia OspA types using NGS data without the need for traditional, antibody-based serotyping systems.As global surveillance of Borrelia continues to expand, this method has the potential to document additional ISTs.Novel ISTs will be incorporated into future updates to the in silico OspA typing pipeline and added to the IST scheme in the Borrelia PubMLST database.

Fig. 1 .
Fig.1.Host and genospecies distribution of Borrelia strains.(a) Isolates sequenced for this study were largely collected from human hosts, with the majority coming from skin biopsies.A small subset of strains was isolated directly from ticks.(b) Assembled genomes sourced from public collections were predominantly from ticks rather than human hosts.Some isolates were obtained from non-human animal hosts.(c) Core genome phylogeny of Borrelia strains and representation of the predominant genospecies in the combined collection.Cross-genospecies phylogeny was determined based on 18,348 core nucleotides containing 2,011 SNPs.
The diversity of OspA variants from B. garinii was more considerable, spanning seven phylogenetic clusters.Five corresponded to the OspA serotypes originally identified for B. garinii (serotypes 3, 5-8) [25, 26]; two novel B. garinii clusters were labelled as IST11 and IST12.Unlike the other B. garinii ISTs, only a single OspA variant was identified from

Fig. 2 .
Fig. 2. Phylogenetic tree of OspA variants.Phylogenetic analysis of 90 unique OspA protein variants.Clusters corresponding to the 17 ISTs defined in this study are labelled.The 12 OspA ISTs associated with species of Borrelia responsible for the majority of LD in North America, Europe, and Asia (B.burgdorferi, B. garinii, B. bavariensis, and B. afzelii) are bolded.

Fig. 4 .
Fig. 4. OspA sequence identity within and between ISTs.(a) Average percentage sequence identity of OspA protein variants within and between IST1-12.As ISTs 4 and 8 consist of only a single OspA sequence variant, no within-IST identity was calculated for these types.(b) C-terminal domain alignment of ISTs 4, 5,6,8, and 12 OspA sequences.(c) C-terminal alignment of representative IST3 OspA variants.

Fig. 5 .
Fig. 5. Workflow for an NGS-based in silico OspA typing pipeline.(a) NGS data of ospA are aligned to reference sequences for each IST.A consensus sequence is constructed based on the highest quality alignment.(b) The consensus ospA gene sequence is translated and multi-aligned against 90 known OspA variants.A sequence similarity threshold (Figs 3 and S1) is used to assign the IST of novel variants.

Table 1 .
Comparison of OspA in silico types (ISTs) and reported OspA serotypes corresponding to B. burgdorferi s.l.genospecies [25,26]ypes listed here were classified using the monoclonal antibody-based typing system described by Wilske and colleagues[25,26].† NR, not reported.