The emergence and diversification of a zoonotic pathogen from within the microbiota of intensively farmed pigs

Significance There is growing concern that rapid growth in livestock production and major changes in farming practices are driving the emergence of pathogens capable of causing disease in both livestock and humans. However, most studies neglect livestock microbiota as a potential source of emerging pathogens. Here, we show how the global transport of live animals has facilitated the emergence of an important livestock and human zoonotic pathogen from a common member of the pig respiratory microbiota. Our results indicate that pathogenic lineages are likely to continue to emerge and diversify and recommend ways of controlling this.


The expansion and intensification of livestock production is predicted to promote the emergence of pathogens. As pathogens sometimes jump between species, this can affect the health of humans as well as livestock. Here, we investigate how livestock microbiota can act as a source of these emerging pathogens through analysis of Streptococcus suis, a ubiquitous component of the respiratory microbiota of pigs that is also a major cause of disease on pig farms and an important zoonotic pathogen. Combining molecular dating, phylogeography, and comparative genomic analyses of a large collection of isolates, we find that several pathogenic lineages of S. suis emerged in the 19th and 20th centuries, during an early period of growth in pig farming. These lineages have since spread between countries and continents, mirroring trade in live pigs. They are distinguished by the presence of three genomic islands with putative roles in metabolism and cell adhesion, and an ongoing reduction in genome size, which may reflect their recent shift to a more pathogenic ecology. Reconstructions of the evolutionary histories of these islands reveal constraints on pathogen emergence that could inform control strategies, with pathogenic lineages consistently emerging from one subpopulation of S. suis and acquiring genes through horizontal transfer from other pathogenic lineages. These results shed light on the capacity of the microbiota to rapidly evolve to exploit changes in their host population and suggest that the impact of changes in farming on the pathogenicity and zoonotic potential of S. suis is yet to be fully realized.

Streptococcus suis | pathogen emergence | bacterial pathogens | comparative genomics | livestock pathogens
Global livestock populations have grown rapidly over the past few centuries, with the global biomass of livestock now exceeding that of humans and wild mammals combined (1, 2). This has been facilitated by intensive farming systems that have also led to increased livestock population density, lower genetic diversity, and the long-distance movement of live animals. These changes are predicted to promote the emergence of pathogens (3, 4). While pathogen emergence typically arises through a pathogen jumping into a new host, pathogens can also emerge from within the microbiota already associated with a host population (5, 6). This route to pathogen emergence may be particularly important in intensive farming systems, where large population size and high population density may select for traits associated with pathogenicity, while biosecurity reduces the risk of novel pathogens entering the population (7, 8).
Streptococcus suis was first reported as a cause of disease in farmed pigs in 1954 (9) and is now a major cause of bacterial disease in piglets and an emerging human zoonotic pathogen (10, 11). As well as being an important pathogen, S. suis is a ubiquitous component of the microbiota of the upper respiratory tract of all pigs. It is one of the most common bacterial species on the surface of the palatine tonsil, which is considered its main niche (12). S. suis disease in pigs takes the form of septicemia with sudden death, meningitis, arthritis, and endocarditis and most often affects piglets. It is also associated with respiratory disease, although these infections tend to be polymicrobial (13). Humans can be infected by S. suis either through contact with pigs or consumption of raw pork or other pig products. These infections result in pathologies similar to those seen in pigs and have high fatality rates. The first reported human case was in 1968 (14), and since then, S. suis has led to large outbreaks in China and has become a major cause of adult meningitis and septicemia in South-East Asia (15-18).
While S. suis is a diverse species, only a small number of strains, typically characterised by multilocus sequence type (ST) or serotype, are responsible for most cases of disease (19). What determines the pathogenicity of these strains remains poorly understood despite the identification of more than 100 putative virulence genes or factors (20). Difficulties in identifying the determinants of pathogenicity in S. suis have been attributed to its complex pathogenesis and high level of genetic diversity (20). Few studies have considered virulence factors in strains other than ST 1, which is responsible for most cases of S. suis disease in both pigs and humans worldwide (19).
In this study, we carried out a population-genomic analysis of 3,070 bacterial isolates sampled from tonsil and nasal swabs of pigs and wild boar, and blood and sites of infection in pigs and humans with S. suis disease, from Europe, North America, Asia, and Australia, dating from 1960 to 2020, to investigate the emergence, diversification, and geographic spread of pathogenic lineages of S. suis. Through development of a whole-genome typing schema, we identified 10 pathogenic lineages with broad geographic distributions, dated their origins, and mapped their movements between countries. We identified genomic changes associated with the emergence of these lineages and investigated their origins. We also considered the impact of pathogenicity on broader evolutionary dynamics, the ongoing diversification of pathogenic lineages, and how farming practices may have contributed to these processes.
S1). These isolates date from 1960 to 2020. Those from Denmark, Germany, the Netherlands, and Spain included similar numbers of isolates from the tonsils or noses of pigs without S. suis-associated disease (carriage isolates; n = 188) and isolates from blood or sites of infection in pigs with respiratory (n = 196) or systemic (n = 196) forms of S. suis-associated disease (disease isolates). Those from Myanmar only included carriage and environmental isolates and those from the United States only disease isolates. The collection from Spain included isolates from a wild boar population (n = 34). We combined these with previously published genome sequence data from isolates from Australia (n = 143), Canada (n = 200), China (n = 217), Denmark (n = 1), Vietnam (n = 191), the United Kingdom (n = 441), the Netherlands (n = 101), Spain (n = 10), and the United States (n = 16). This led to a collection of 3,070 high-quality genome assemblies, including 48 complete genomes.

Pathogenic
To investigate the population structure of S. suis, we considered genetic variation estimated from both single nucleotide polymorphisms (SNPs) in genes that are present across all isolates (core genes) and from the presence/absence of accessory genes (Fig. 1A and SI Appendix, Figs. S1-S4). Both revealed high levels of diversity and distinguished a cluster of closely related isolates that includes most of our collection (2,424/3,070 isolates). Isolates in this cluster have a maximum pairwise distance of 0.08 differences per nucleotide in a core genome alignment, while the remaining 647 isolates diverge from members of this cluster by between 0.13 and 0.32 differences per nucleotide site. We refer to this cluster as the "central population" of S. suis [a previous study referred to it as "normal" S. suis (21)]. The isolates that fall outside of the central population form several distinct clades in a core genome phylogeny (SI Appendix, Fig. S1). While it has been suggested that these may represent distinct species, a previous study concluded that phenotypic similarities and extensive gene exchange between some of these lineages and the central S. suis population meant that there was insufficient evidence to assign them to a new species or subspecies (21). In support of this conclusion, we found that divergent isolates are less clearly distinguished from the central population of S. suis by differences in gene content than by SNPs in core genes (Fig. 1A and SI Appendix, Fig. S1).
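The distances quoted above (at most 0.08 within the central population, 0.13 to 0.32 for divergent isolates) are expressed as differences per nucleotide site. As a minimal sketch of what such a quantity measures, the function below computes the proportion of differing sites between two aligned sequences. The function name and the decision to skip gaps and ambiguous bases are assumptions made for illustration; this is not the pipeline used in the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (an illustrative choice, not taken from the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = "ACGT"
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy example: one difference across eight comparable sites.
toy_distance = p_distance("ACGTACGT", "ACGTACGA")
```

On real data the same calculation would run over a concatenated core genome alignment, and a pair of isolates would be compared against the thresholds reported in the text.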
To mitigate any geographic bias, we characterised variation in disease association across S. suis using a subset of our collection that includes isolates that a) have a well-characterised association with disease or carriage states and b) are from a country from which we have large samples of both disease and carriage isolates in our collection (>40 of each). This subset includes isolates from Denmark, Germany, the Netherlands, Spain, Canada, and the United Kingdom (n = 1,193). In agreement with previous studies, our analysis revealed that while disease isolates are present across the entire genetic diversity of S. suis (including in divergent clades), they are concentrated in a subpopulation of the central population (Fig. 1 B and C) (22).
Using variation in both core and accessory genes, we partitioned the central population of S. suis into clusters of closely related isolates (lineages) (SI Appendix, Table S1). A reference database describing these lineages, which will allow for classification of further isolates using the same schema, is available online (www.bacpop.org/poppunk). Combining this with data on disease association allowed us to identify 10 "pathogenic" lineages. In each of these lineages, at least 60% of the isolates from the subset of our collection described above are disease associated, compared to 26% of isolates from other lineages in the central population, and 19% of isolates from divergent lineages (Fig. 1C). Combined, the 10 lineages account for 80% of disease-associated isolates in our collection (Fig. 1C and Table 1) (19). These lineages also have much higher frequencies of serotypes that are commonly associated with disease: 88% of isolates from these 10 lineages have a disease-associated serotype (1, 1/2, 2, 3, 4, 5, 6, 7, 8, 9, or 14) compared to 16% from other lineages (Table 1 and Fig. 1D).
The 10 pathogenic lineages of S. suis fall within a subpopulation of the central S. suis population. Core nucleotide distances between these 10 lineages are, on average, lower than between lineages across the rest of the central population (SI Appendix, Figs. S1 and S2). Nevertheless, they are not clearly distinguished from other lineages in the central population; there are an additional 34 lineages (98 isolates) that fall within the clade that includes the 10 pathogenic lineages (Fig. 1B). Combined, these 34 lineages show an elevated rate of disease association: 54% of isolates are disease-associated compared to 26% from the central S. suis population outside of this clade (Fig. 1C). They are also enriched for pathogenic serotypes; 41% of isolates have a pathogenic serotype compared to 8% in other lineages of the central population (Fig. 1D). This suggests that some of these lineages may also be pathogenic, but we cannot confidently characterise them as such due to small sample sizes (all have <15 isolates in the subset of our collection used for characterising disease association).
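The classification described above (a lineage is called pathogenic when at least 60% of its isolates in the well-characterised subset are disease associated) can be sketched as a simple per-lineage tally. The function name, the input layout, and the use of 15 isolates as a hard minimum sample size are assumptions for illustration; the text only notes that lineages with fewer than 15 isolates could not be characterised confidently.

```python
from collections import Counter


def pathogenic_lineages(isolates, min_fraction=0.60, min_isolates=15):
    """Return {lineage: disease fraction} for lineages meeting both cutoffs.

    isolates: iterable of (lineage_id, is_disease) pairs.
    min_isolates is a hypothetical sample-size floor, not a rule
    stated in the paper.
    """
    totals = Counter()
    disease = Counter()
    for lineage, is_disease in isolates:
        totals[lineage] += 1
        if is_disease:
            disease[lineage] += 1

    result = {}
    for lineage, n in totals.items():
        if n < min_isolates:
            continue  # too few isolates to characterise
        fraction = disease[lineage] / n
        if fraction >= min_fraction:
            result[lineage] = fraction
    return result


# Toy data: lineage A qualifies, B is mostly carriage, C is too small.
toy_isolates = (
    [("A", True)] * 13 + [("A", False)] * 7
    + [("B", True)] * 5 + [("B", False)] * 15
    + [("C", True)] * 9 + [("C", False)] * 1
)
toy_result = pathogenic_lineages(toy_isolates)
```

Lineage C illustrates the caveat in the text: a high disease fraction in a handful of isolates is not treated as sufficient evidence.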
Consistent Genomic Changes Reveal Evolutionary Constraints on Pathogen Emergence. Previous studies that have investigated genes associated with pathogenicity in S. suis have tended to focus on isolates from the zoonotic lineage ST1, which is both highly pathogenic in pigs and responsible for most cases of human disease. Here, we aimed to identify genes associated with pathogenicity more broadly across S. suis (i.e., across all pathogenic lineages of S. suis). To do this, we identified genes that are present at >70% higher frequency in isolates from any of the 10 pathogenic lineages relative to both lineages from the central S. suis population outside of the pathogenic clade and divergent lineages. While no genes are uniquely present in pathogenic lineages, a small number (n = 37) are present at >70% higher frequencies in isolates from pathogenic lineages than both isolates from other lineages from the central population and from divergent lineages (SI Appendix, Fig. S5 and Table S2). Most of these genes (26/37) fall within just three genomic islands: Island 1 (SSU_RS05400-SSU_RS05325 in the published annotation of P1/7), Island 2 (SSU_RS02325-SSU_RS02355), and Island 3 (SSU_RS01130-SSU_RS01185) (Fig. 2 and SI Appendix, Figs. S6-S8 and Table S2). Five of the 11 genes outside of the three genomic islands have previously been described as associated with pathogenicity in S. suis: mrp, sspA, sspep, SSU0587, and sstgase (20).
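The frequency filter described above can be sketched as follows. The sketch reads ">70% higher frequency" as a difference of more than 0.70 in the proportion of isolates carrying a gene, and pools isolates within each group; both readings are assumptions, as are the function names and input layout.

```python
def group_frequency(carriers, group):
    """Fraction of isolates in `group` that carry the gene."""
    return len(carriers & group) / len(group)


def enriched_genes(presence, pathogenic, comparison_groups, min_excess=0.70):
    """Genes whose frequency in pathogenic isolates exceeds the frequency
    in every comparison group by more than `min_excess`.

    presence: dict mapping gene name -> set of isolate ids carrying it.
    pathogenic: set of isolate ids from pathogenic lineages (non-empty).
    comparison_groups: list of non-empty sets of isolate ids.
    Returns {gene: smallest excess across the comparison groups}.
    """
    hits = {}
    for gene, carriers in presence.items():
        f_path = group_frequency(carriers, pathogenic)
        excesses = [f_path - group_frequency(carriers, g)
                    for g in comparison_groups]
        if all(e > min_excess for e in excesses):
            hits[gene] = min(excesses)
    return hits


# Toy data with hypothetical isolate ids and gene names.
toy_pathogenic = {"p1", "p2", "p3", "p4"}
toy_central_other = {"c1", "c2", "c3", "c4", "c5"}
toy_divergent = {"d1", "d2"}
toy_presence = {
    "geneA": {"p1", "p2", "p3", "p4", "c1"},   # enriched against both groups
    "geneB": {"p1", "p2", "c1", "c2", "c3"},   # not enriched
    "geneC": {"p1", "p2", "p3", "p4", "d1"},   # common in divergent isolates
}
toy_hits = enriched_genes(toy_presence, toy_pathogenic,
                          [toy_central_other, toy_divergent])
```

Requiring the excess against both comparison groups, rather than against their union, keeps a gene such as geneC from qualifying when it is rare in one background but common in the other.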
Island 1 is present in >95% of isolates from 9/10 pathogenic lineages, only 17% of isolates from the central population outside of the pathogenic clade (and these tend to be from lineages closely related to the pathogenic clade; Fig. 2A), and <1% of isolates from divergent lineages. The island contains genes predicted to encode proteins involved in the breakdown of host glycans present in the extracellular matrices of mammalian tissues. It includes a putative heparan sulfatase (SSU_RS05330 in P1/7) and a putative hyaluronate lyase (SSU_RS05335-SSU_RS05350 in P1/7), which cleaves hyaluronic acid found in all connective body tissue. However, in zoonotic lineage 1, the gene encoding the hyaluronate lyase protein is always truncated, likely leading to loss of function. The phosphotransferase system (PTS) and other enzymes present in this island may be associated with sugar transport and metabolism of host aminoglycans which are degraded extracellularly by S. suis. The degradation of extracellular matrix proteins may both contribute to the spread of S. suis through the connective tissue and provide a source of energy at epithelial surfaces. Nucleotide diversity within Island 1 is similar to and broadly correlated with diversity in core genes both within and between pathogenic lineages (Fig. 2 and SI Appendix, Figs. S9 and S10). The island is also found next to the same core gene (nrdF or SSU_RS05320 in P1/7) in 97% of isolates. Together, this suggests that the island was acquired once by a common ancestor of all pathogenic lineages. As homologous recombination is common in S. suis (23), the inheritance of this island is unlikely to have been entirely vertical, but our observations suggest that it has followed similar patterns to core genes (Fig. 2C).
Each of the 10 pathogenic lineages includes multiple STs and serotypes. STs and serotypes present in >5% of the isolates from our collection of each lineage are shown here, along with the number of isolates from these lineages in our collection, and the frequency of disease association (and systemic disease association) in the subset of our collection we used to characterize pathogenicity. A description of the STs and serotypes of all isolates in our collection is provided in SI Appendix, Table S1.
Island 2 is present in 84% of isolates from the 10 pathogenic lineages, and only 11% from the central population outside of the pathogenic clade, where it tends to be carried by isolates that are both closely related to the pathogenic clade and also carry Island 1 (Fig. 2A). It is not found in any isolates from divergent lineages. Its presence is variable across pathogenic lineages: It is present in >80% of isolates from lineages 1, 2, 3, 4, 6, 7, and 10, but absent from lineages 5, 8, and 9. Island 2 encodes a major pilin subunit (SSU_RS02345), a minor pilin subunit (SSU_RS02335 and SSU_RS02340), and a pilin-specific sortase (SSU_RS02350) that is involved in maturation. The minor pilin subunit (SSU_RS02335 and SSU_RS02340) was previously described as a pseudogene in several pathogenic strains of S. suis (including P1/7) (24). We find evidence that this is true of all isolates in lineages 1 and 2 and several from other lineages due to premature stop codons. Nevertheless, Fittipaldi et al. (24) showed that despite this truncation, S. suis P1/7 still produces a major pilin subunit. While they found that this did not play a role in adherence to porcine brain microvascular endothelial cells or virulence in a mouse model of sepsis, Faulds-Pain et al. (25) later showed that the pilin-specific sortase in this island was essential for causing disease in pigs via the intranasal route of infection. We find that in a small number of isolates (largely from lineages 3 and 7), the major pilin subunit is truncated (see SI Appendix, Fig. S6 for an example). Further work is required to understand the impact of this on the function of the island.
Diversity within Island 2 suggests multiple acquisitions from outside of S. suis: Three divergent versions are carried by pathogenic lineages (lineages 1 and 2 carry a similar version that is distinct from a version carried by lineages 6 and 10, and a version carried by lineages 3 and 7; Fig. 2 B and C and SI Appendix, Fig. S9). The island is found next to the same core gene (murD or SSU_RS02355 in P1/7) in 99% of isolates, and divergence within each of the pathogenic lineages is similar to and correlated with diversity in both core genes and Island 1 (SI Appendix, Figs. S9 and S11). Divergence between the pairs of lineages that carry each of the three versions of the island is variable, and similar to distances between these lineages based on core genes and Island 1. This suggests that the presence of each version of Island 2 in a pair of pathogenic lineages is a consequence of a single acquisition by a common ancestor. As the most recent common ancestor of lineages 1 and 2 is also that of the entire pathogenic clade, this suggests that the version of this island carried by lineages 1 and 2 was maintained since the common ancestor of the pathogenic clade, while the version carried by lineages 6 and 10 has been acquired much more recently.
Island 3 shows the strongest association with pathogenic lineages. It is present in >95% of isolates in the 10 pathogenic lineages and only 1% of isolates from the central population outside of the pathogenic clade. It is entirely absent from divergent lineages. It contains genes that code for an ABC transporter, a ROK (repressor, open reading frame, kinase) family gene (26), and a large predicted surface protein with similarity to a PTSII subunit. The ROK is similar to an E. coli repressor (Mlc), which represses several genes including two PTS genes (27). In Salmonella Typhimurium, Mlc positively regulates expression of the Salmonella Typhimurium pathogenicity island 1 genes by reducing the expression of the negative regulator HilE (28).
There are two divergent versions of Island 3 in our collection: one carried by lineages 1, 5, 6, 7, 8, 9, and 10 and the other by lineages 2, 3, and 4. While some isolates from lineage 9 carry a version that differs from these two, this appears to be the result of recombination between the two versions. In these isolates, there are extended tracts of SNPs that are shared with either of the two main versions, and there is only one unique SNP. Island 3 is found next to the same core gene in 98% of isolates in our collection (SSU_RS01115 in P1/7) and diversity in the island is positively correlated with diversity in core genes within lineages (SI Appendix, Fig. S12). Within-lineage diversity is generally lower in this island than in the other two islands (SI Appendix, Fig. S9), but this may reflect stronger selective constraint rather than a more recent origin. This is supported by lower divergence at 1st and 2nd codon positions relative to 3rd codon positions (SI Appendix, Fig. S13). Together, this suggests that the presence of Island 3 in each of the pathogenic lineages tends to be the result of a single acquisition by a common ancestor of the lineage. We find evidence of only one case of a second acquisition event by a pathogenic lineage: One isolate from lineage 2 carries the version of Island 3 associated with lineages 1, 5, 6, 7, 8, 9, and 10 (PH2016-139). While very low diversity within the two versions of the island means that the exact relationships between those carried by different lineages cannot be reconstructed, it indicates that for most pathogenic lineages Island 3 was obtained from another pathogenic lineage and that the transfer did not occur long before the most recent common ancestor of each lineage.
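The codon-position argument above rests on a standard observation: most changes at 3rd codon positions are synonymous, so under selective constraint divergence accumulates there faster than at 1st and 2nd positions. A minimal sketch of the comparison, assuming two aligned, in-frame, gap-free coding sequences (the function name and input form are illustrative, not taken from the study):

```python
def divergence_by_codon_position(seq_a: str, seq_b: str):
    """Return [d1, d2, d3], the proportion of differing sites at codon
    positions 1, 2, and 3 for two aligned in-frame coding sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if len(seq_a) == 0 or len(seq_a) % 3 != 0:
        raise ValueError("length must be a non-zero multiple of 3")
    n_codons = len(seq_a) // 3
    diffs = [0, 0, 0]
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a != b:
            diffs[i % 3] += 1  # index 0, 1, 2 -> codon position 1, 2, 3
    return [d / n_codons for d in diffs]


# Toy example: both differences fall at 3rd codon positions.
toy_divergence = divergence_by_codon_position("ATGGCTAAA", "ATGGCCAAG")
```

A pattern in which the third value is clearly larger than the first two is what would be expected if low diversity reflects constraint on the encoded proteins rather than a recent origin.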
Each of the three pathogenicity-associated islands is sometimes present in lineages other than the 10 pathogenic lineages (Fig. 2A and SI Appendix, Table S1). All three are present together in only six additional lineages (15 isolates): lineages 18, 107, 144, 164, 466, and 511. Of these, three fall inside the pathogenic clade and three within the central population but outside of the pathogenic clade. While the three in the pathogenic clade are all represented by multiple isolates, the three outside the pathogenic clade are all only represented by a single isolate. Combined, the six lineages have a broad geographic distribution that spans Europe, North America, Australia, and Asia. Lineage 18 alone includes isolates from Canada, the United States, and Myanmar that date from 1987 to 2019. Some of these lineages may represent additional pathogenic lineages that are not well represented by our collection, while others may represent the chance acquisition of these islands by isolates that are otherwise ill-suited to a pathogenic ecology.
The Emergence and Spread of Pathogenic Lineages Are Linked to Growth in Pig Farming and International Trade. We used the temporal structure in our collection of isolates from the six most common pathogenic lineages to construct dated phylogenies (Fig. 3 and SI Appendix, Fig. S14). These reveal that the dates of the most recent common ancestors of these lineages range from 1827 (lineage 2; 95% HPD: 1798 to 1854) to 1951 (lineage 5; 95% HPD: 1944 to 1958) (SI Appendix, Table S3). These origins largely predate a rapid period of growth in pig numbers in several European countries that accompanied the wide-scale shift to larger farms and indoor rearing in the early 20th century (29, 30) and instead accompany a period of human population growth in Europe and North America that followed the Industrial Revolution (31) (SI Appendix, Fig. S15).
The 10 pathogenic lineages all have a broad geographic spread: Each is found in at least 6/11 countries in our collection. Similarly, outside of these lineages, we find little evidence of geographic structure in S. suis: Isolates from each well-sampled country span the diversity of S. suis (SI Appendix, Fig. S16). This is also true of isolates in our collection from a wild boar population in Spain (SI Appendix, Fig. S17). Using our dated phylogenies and a discrete asymmetric model, we estimated the number of between-country movements for each of the six most common pathogenic lineages and their relative rates. We estimated between 16 and 45 between-country movements for each lineage, and 182 across all six lineages (SI Appendix, Table S4). This is likely to be a substantial underestimate of the true number of movements due to incomplete sampling.
Across all lineages, our highest inferred rate of between-country transmission was from Denmark to Germany, followed closely by Canada to the United States, the Netherlands to Germany, and the United States to Canada (Figs. 3 and 4 and SI Appendix, Fig. S18 and Table S4). This is consistent with these movements being driven by international trade in live pigs; over the last 80 y, the top three global exporters of live pigs have been Denmark, the Netherlands, and Canada, and the top two importers have been Germany and the United States (www.fao.org/faostat). Additionally, we find no evidence of the transmission of pathogenic lineages into Australia since its ban on live pig imports in the mid-1980s (32). While we infer several transmission events from Europe and Canada to Australia, the most recent is a movement of zoonotic lineage 1 from the United Kingdom that we estimate to have occurred between 1981 (95% CI: 1976 to 1987) and 1986 (95% CI: 1981 to 1991).
We also observe an unusually low frequency of pathogenic lineages in our collection from Myanmar. Pig farming in Myanmar is typically small-scale and most of our samples from this country are from backyard or semi-intensive farms that typically have 10 to 30 pigs. While all of our isolates from Myanmar are nonclinical and so we would expect a low frequency of pathogenic lineages, they are less likely to be from pathogenic lineages than nonclinical isolates from other countries: <1% of isolates from Myanmar compared to a mean of 18% across other countries (SI Appendix, Table S5). In fact, none of our isolates from backyard and semi-intensive farms in Myanmar are from pathogenic lineages (0/109 isolates from the central S. suis population); the only pathogenic lineage isolate in our collection from this country is from a larger commercial farm (1/39) with around 2,000 pigs.
As well as the spread of pathogenic lineages between pig farms in different countries, we find evidence of transmission between farmed pigs and wild boar. The two isolates from our collection of Spanish wild boar that are from a pathogenic lineage are both from lineage 1. They form a single clade that falls within a clade that appears to have been circulating in Spain since the 1970s. This indicates recent transmission between farmed pigs and local wild boar.
Individual lineages show different patterns of between-country transmission (Fig. 4 and SI Appendix, Figs. S19 and S20). This may reflect variation in both the prevalence of these lineages and their dates of introduction to particular countries. In particular, we find evidence that zoonotic lineage 1 has been circulating in the United Kingdom and the Netherlands for at least 50 y, with frequent transmission between these two countries, while the earliest evidence we find of its presence in North America is 2009 (95% CI: 2007 to 2011). This is consistent both with this lineage being less commonly associated with disease in North America and with reports of an increase in its rate of detection over the last few years. In contrast, lineage 2 shows evidence of circulation in Canada for around 80 y and repeated transmission from Canada to the United States. Unfortunately, uncertainty in the inferred locations of the common ancestors of each of the lineages means that we cannot confidently infer the country of origin for any pathogenic lineage.

Pathogenic Lineages Have Variable Associations with Systemic Disease and Carry Unique Genes. Genetic and phenotypic diversity can make it more difficult to control the spread of a pathogen. For example, it can lead to difficulties in developing cross-protective vaccines. It can also make a pathogen more capable of evolving to evade control measures. In S. suis, we find evidence of both phenotypic and genetic variation between pathogenic lineages. The six most common pathogenic lineages in our collection vary in their frequency of both association with disease relative to carriage (chi-squared test, P = 3.6 × 10⁻⁸) and systemic disease relative to respiratory disease (chi-squared test, P = 2.0 × 10⁻²⁰). Pairwise comparisons reveal two distinct groups: lineages 1, 3, 4, and 5 have higher frequencies of association with systemic disease, while lineages 2 and 6 have higher frequencies of association with respiratory disease (SI Appendix, Fig. S21). This is consistent with previous studies that found that ST28, which is part of lineage 2, is less virulent in mouse models than both ST1 (lineage 1) and ST25 (lineage 4) (33).
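The chi-squared tests mentioned above compare counts of isolates across lineages and disease categories. As a minimal sketch of the underlying calculation, the function below computes the Pearson chi-squared statistic and degrees of freedom for a contingency table. The counts are invented toy values, not data from the study, and converting the statistic to a P value (for example with a statistics library) is left out to keep the example self-contained.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom.

    table: list of rows of observed counts, e.g. one row per lineage and
    one column per category (disease vs. carriage). Every row and column
    total is assumed to be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy 2 x 2 table (hypothetical counts): rows are two lineages,
# columns are systemic and respiratory disease isolates.
toy_stat, toy_dof = chi_squared_statistic([[10, 20], [20, 10]])
```

With six lineages and two categories the table would have six rows and two columns, giving five degrees of freedom.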
Pathogenic lineages also show evidence of variation in their evolutionary dynamics and genome content. Average rates of nucleotide substitutions per site were highest for lineage 5, while the ratio of transitions to transversions is lower for lineages 2 and 4 (SI Appendix, Fig. S22). The six most common pathogenic lineages can be distinguished by their carriage of 2 to 32 lineage-specific genes (that are present in >95% of isolates in that lineage and <5% of isolates from other lineages; SI Appendix, Table S6). For instance, 17 genes distinguish zoonotic lineage 1. Far more genes are associated with multiple pathogenic lineages; 309 genes are present in >95% of isolates in at least one of the six most common pathogenic lineages and in <5% of isolates outside of the pathogenic clade (SI Appendix, Table S6). Apart from lineages 2 and 4 that share more genes than other pathogenic lineages, there is little evidence of clustering of lineages by shared gene content (SI Appendix, Fig. S23).

Pathogenic Lineages Continue to Diverge from Commensal Lineages and Diversify. Pathogenic lineages of S. suis tend to have smaller genomes than those from outside of the pathogenic clade, and pathogenic lineages with higher frequencies of association with disease tend to have smaller genomes than those with higher frequencies of association with carriage (Fig. 5A). This pattern was previously described in a study based on a smaller sample of S. suis isolates (22). Using the temporal structure in our collection, we additionally found that older isolates of pathogenic lineages tend to have larger genomes than more recent isolates. This suggests a gradual and ongoing process of genome reduction in pathogenic lineages (Fig. 5B). While all pathogenic lineages show this pattern, we observe variation in the rate of genome reduction across lineages, with zoonotic lineage 1 showing the slowest rate and lineage 5 the fastest.
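One simple way to summarise a trend of this kind is the slope of genome size against isolation year within a lineage. The sketch below fits an ordinary least-squares line; it is an illustration under stated assumptions, not the analysis used in the study, and it ignores the shared ancestry of isolates, which a full analysis would need to account for. The function name and the toy values are hypothetical.

```python
def ols_slope(years, genome_sizes):
    """Least-squares slope and intercept of genome size against year.

    years, genome_sizes: equal-length sequences of numbers with at least
    two distinct years. A negative slope indicates that more recent
    isolates tend to have smaller genomes.
    """
    if len(years) != len(genome_sizes) or len(years) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(genome_sizes) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("all years are identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(years, genome_sizes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy values (hypothetical): genome size in Mb falling by 0.02 Mb per decade.
toy_slope, toy_intercept = ols_slope(
    [1980, 1990, 2000, 2010], [2.10, 2.08, 2.06, 2.04]
)
```

Comparing slopes fitted separately per lineage would give a rough view of the variation in the rate of reduction described in the text.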
Against this backdrop of genome reduction, we found evidence of adaptive gene acquisitions. In particular, we found evidence of multiple capsular switches in all of the six most common pathogenic lineages (Fig. 1 and SI Appendix, Fig. S24). Serotype 2 is often linked with invasive disease in pigs and zoonotic disease in humans, particularly when carried by ST1 (zoonotic lineage 1). We found that the serotype 2 capsule locus is present in the genomes of the majority of isolates from lineage 1 and also in isolates from 3/9 of the other pathogenic lineages (lineages 2, 4, and 9) and a few other lineages within the pathogenic clade (SI Appendix, Table S1). While there is little diversity within the genes in the serotype 2 capsular locus either within or between lineages, there is greater diversity within zoonotic lineage 1 than within other lineages (SI Appendix, Fig. S25). Combined with an older estimate of the most recent common ancestor of lineage 1 (1876, 95% CI: 1860 to 1891) compared to the clades that carry serotype 2 in lineages 2 (1935, 95% CI: 1923 to 1948) and 4 (1914, 95% CI: 1902 to 1926), this suggests that this serotype has been horizontally transferred from lineage 1 to these other pathogenic lineages. We also observe evidence of repeated transitions between serotypes 2 and 1/2 [whose capsular loci are nearly identical (34)]. These transitions appear to be particularly common in lineage 2 (SI Appendix, Figs. S24 and S25).
Serotypes 1 and 14 are also common in zoonotic lineage 1. The genes encoding these capsules are largely shared with serotypes 2 and 1/2 (34), but divergence within the capsular genes is large enough to indicate independent origins of these two pairs of serotypes (SI Appendix, Fig. S24). Serotype 14 is also present in lineage 2 and in two other lineages (lineages 11 and 78). Diversity within the capsular genes of serotypes 1 and 14 is again consistent with them having been horizontally transmitted from zoonotic lineage 1 to other lineages and repeated transitions between serotypes 1 and 14.

Discussion
Studies of the impact of agricultural intensification on the risk of pathogen emergence generally focus on the most typical route: a pathogen jumping from one host species to another (4, 7, 35). In this study, we instead described the emergence of an important zoonotic pathogen from within a largely commensal member of the respiratory microbiota of pigs. Our results suggest that while this form of pathogen emergence is gradual, it can lead to a diverse pathogen whose impact on its host population is difficult to control, particularly alongside efforts to reduce the use of antibiotics in farming.
Over the last two hundred years, global pig numbers have increased more than 10-fold (31). While the global rate of increase was highest in the second half of the 20th century, in some regions, such as the United States, growth was faster in the 19th century (SI Appendix, Fig. S15). Population growth has been accompanied by increased population density. For example, between 1921 and 2011 the number of pigs in Canada increased from 3.3 to 14.6 million, while the number of farms declined from 453 to just over 7 thousand (36). Modern farming practices also involve frequent movement of pigs between farms, with a trend toward production specialisation meaning that piglets often move to a different farm after weaning, and breeding for genetic improvement leading to the transport of breeding pigs around the world. These kinds of changes are predicted to promote pathogen emergence by facilitating the transmission of pathogens between hosts and therefore reducing the selective cost associated with increased host morbidity (8, 37).
Our analyses reveal high levels of genetic diversity in S. suis in pigs that is mirrored in isolates sampled from a wild boar population in Spain. This diversity might reflect a long-standing association between S. suis and pigs (both domestic and wild). Our dating of the most recent common ancestors of the six most common pathogenic lineages in our collection indicates that they all emerged in the 19th and 20th centuries. The conclusion that these dates reflect an ecological shift toward pathogenicity in at least some of these lineages is supported by evidence that they coincided with the acquisition of a pathogenicity-associated genomic island (Island 3). It is further supported by patterns of genome reduction in each of the pathogenic lineages. In comparisons across bacterial species, it has been shown that bacterial pathogenicity is broadly associated with smaller genomes and fewer genes (22). While the drivers of this pattern are not well understood, and may be multiple, our observation of a gradual decline in the genome sizes of isolates from pathogenic lineages over our sampling period suggests a recent transition whose effect has yet to stabilise. Further evidence of an impact of intensive farming practices on pathogenic lineages of S. suis is found in our estimates of frequent long-distance movements, which are most likely to be a consequence of international trade in live animals.
While our results suggest that pathogenic lineages have emerged and spread globally over the last two centuries, they are also consistent with pathogenicity in S. suis predating this. Genetic diversity in all three of the pathogenicity-associated genomic islands we identified is consistent with their presence in S. suis long before the origins of individual pathogenic lineages. Genetic distances between the versions of Islands 1, 2, and 3 that are carried by lineages 1 and 2 are similar to distances between these lineages estimated from core genes. This may reflect the presence of these islands in a common ancestor of these two lineages. The prior existence of pathogenic lineages could therefore have aided the recent emergence of several new pathogenic lineages of S. suis.
Traits that promote virulence are thought to be selected for because they increase within-host growth or between-host transmission (38). The pathogenicity-associated genomic islands we identified have putative functions that may influence patterns of within-host growth. Islands 1 and 3 both have putative functions linked to metabolism, either the capacity to exploit particular sources of sugar within a host, or to regulate their metabolism. Metabolic capacity and growth rate have been linked to virulence in several bacterial species (39). The maintenance of both commensal and pathogenic lineages of S. suis could therefore be a consequence of a partitioning of the within-host niche. Pathogenic lineages may be better able to exploit particular regions of the tonsil than commensal lineages and vice versa, thereby reducing within-host competition. This could lead to segregation of these populations and reduced gene flow between them, which could in turn lead to the genome reduction in more pathogenic lineages due to fewer opportunities for gene acquisition from more diverse commensal lineages. This partitioning of the within-host niche may also be aided by Island 2. Pili are often involved in adhesion and evasion of cells, and therefore, this island might aid in the colonization of a particular region of the tonsil.
As we observe a high rate of spread of pathogenic lineages between countries, and previous studies have found that pathogenic strains of S. suis are capable of spreading rapidly between pigs within a farm (40), it is likely that pathogenic lineages have faster rates of between-host transmission than commensal lineages. As none of the pathogenicity-associated genes we identify have putative functions that are likely to be directly associated with between-host transmission, faster transmission rates may instead be driven by a trait with a diverse molecular basis, such as the capsular polysaccharide. In the human pathogen Streptococcus pneumoniae, the capsular structure is known both to protect against immune recognition and immune clearance within a host (41) and to promote between-host transmission by aiding survival outside a host (42) and increasing shedding (43).
The pathogenic lineages we have identified carry a mosaic of genes shared with other pathogenic and commensal lineages of S. suis. This is accompanied by diversity at key loci, such as the capsular locus, and in their association with systemic vs. respiratory disease. Our results reveal that the emergence of novel pathogenic lineages therefore has not only led to more pathogenic lineages but also to their diversification. This diversification of pathogenic lineages has led to difficulties in developing a cross-protective S. suis vaccine (44). We also find evidence of horizontal gene transfer between lineages leading to further diversification. In particular, we find evidence that the serotype 2 and serotype 1/14 capsules have been transferred from zoonotic lineage 1 (ST1) to other lineages. While we have been unable to explore this with our collection, which includes only a small sample of isolates from human infections, previous studies have suggested that strains of S. suis vary in their capacity to infect humans, and most human infections are caused by strains with a serotype 2 or 14 capsule (19). These acquisitions of the serotype 2 and 14 capsular loci may therefore have increased the zoonotic potential of these lineages.
Our results provide a framework for understanding the genomic diversity in S. suis and its association with pathogenicity. This is likely to be of widespread use in S. suis research and in informing strategies for controlling the burden of this disease on pig farming and human health. As our collection spans only a small proportion of the countries that farm pigs globally, further sampling from a broader range of countries and more extensive sampling within countries, particularly those with large and growing pig populations, is needed to investigate the existence of additional pathogenic lineages that are geographically restricted or have recently emerged. While our analyses suggest that the global movement of S. suis is driven by trade, they also suggest a possible role for wild boar. Further sampling of wild boar populations is needed to determine the role of wild boar in the transmission of pathogenic lineages between pig farms and into human populations. Further research is also required to experimentally characterise the functions of the pathogenicity-associated genomic islands we have identified and establish their relationship with pathogenicity. Our results suggest that they represent evolutionary constraints on the emergence of pathogenic lineages: The emergence of new pathogenic lineages is contingent on the horizontal transfer of genes from an existing pathogenic lineage to another susceptible lineage. The conditions that allow for the spread of pathogenic lineages may therefore also promote the emergence of new pathogenic lineages and the diversification of existing pathogenic lineages through generating opportunities for the transfer of pathogenicity-associated genes. While this process has so far been gradual, we might predict that the pace will increase. The intensification of pig farming is ongoing in some regions, and the common pathogenic lineages we have identified have only recently spread to some parts of the world. This could lead to both an expanding niche for pathogenic lineages and more opportunities for the emergence of new pathogenic lineages. Controlling the spread of pathogenic lineages of S. suis through pig populations should therefore be a priority to limit the potential impact of this pathogen on our future food security and public health.

Methods
Sampling of Isolates and Characterisation of Disease Association. We generated a collection of genome assemblies of 3,070 isolates of S. suis and its close relatives including both isolates that we sequenced and previously published data (SI Appendix, Table S1). The read data generated by this study is publicly available from the SRA; the BioProject IDs are provided in SI Appendix, Table S1. Our collection includes isolates from Australia, Canada, China, Denmark, Germany, Myanmar, the Netherlands, Spain, the United Kingdom, the United States, and Vietnam. We also included 29 published reference genomes.
Isolates from Denmark, Germany, the Netherlands, and Spain are largely from two newly sequenced collections. The first aimed to collect similar numbers of systemic disease, respiratory disease, and carriage isolates from pigs from each country and was sampled from 2014 to 2018 (n = 593). The second was a sample of historic isolates that aimed to capture the breadth of disease-associated strains from each country from sample archives; they date from 1960 to 2007 (n = 99). Isolates from the Netherlands also included a published collection of systemic disease isolates from humans and pigs that date from 1982 to 2008 (n = 97) (45). Isolates from Spain also included newly sequenced isolates from a herd of wild boars sampled in 2015 as part of a published study (n = 41) (46).
Isolates from the United States are from a newly sequenced collection of isolates from pigs from 2017 to 2020. Isolates were obtained from clinical cases submitted to the Iowa State University Veterinary Diagnostic Laboratory for routine diagnostics. Isolates from Myanmar are from another newly sequenced collection of isolates from pigs from farms and slaughterhouses in Yangon. They were sampled from pig farms that included small backyard farms, small-scale traditional farms, and modern industrial farms between 2016 and 2019. They were predominantly from throat swabs from pigs without S. suis disease but were also sampled from farm and slaughterhouse drainage systems.
Isolates from Vietnam are from a collection that aimed at sampling closely related populations from humans and pigs (described in ref. 47). These included systemic disease isolates (n = 153) from human clinical cases of meningitis from provinces in southern and central Vietnam, and systemic disease (n = 6) or nonclinical isolates (n = 32) from pigs, collected between 2000 and 2010. These isolates were exclusively serotype 2 or 14. Isolates from Canada are from a previously published collection that aimed to target similar numbers of clinical and nonclinical isolates and dates from 1983 to 2016 (described in ref. 50).
Modern isolates from pigs from Denmark, Germany, the Netherlands, Canada, the United Kingdom, and Vietnam were characterised as associated with systemic or respiratory disease, or nonclinical carriage based on clinical symptoms and the isolation site. In pigs that showed clinical symptoms consistent with S. suis infections (e.g., meningitis, septicemia, and arthritis), the site of isolation was classified as "systemic" if recovered from systemic sites (i.e., brain, liver, blood, joints). The site of recovery was classified as "respiratory" if derived from lungs with gross lesions of pneumonia. S. suis isolates from the tonsils, nose, or tracheo-bronchus of healthy pigs or dead pigs without any typical signs of S. suis infections were defined as "nonclinical". Isolates that could not confidently be assigned to these categories (e.g., a tonsil isolate from a pig with systemic signs) were classified as unknown.
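The classification rules in the paragraph above can be summarised as a small decision function. The following Python sketch is illustrative only: the function name, argument names, and site strings are assumptions introduced here, not part of the study's workflow, and real metadata would likely need more careful handling.

```python
# Hypothetical encoding of the disease-association rules described in the text.
SYSTEMIC_SITES = {"brain", "liver", "blood", "joints"}
CARRIAGE_SITES = {"tonsils", "nose", "tracheo-bronchus"}


def classify_isolate(site, clinical_signs, lung_lesions=False):
    """Return 'systemic', 'respiratory', 'nonclinical', or 'unknown'.

    site           -- isolation site (assumed free-text label, e.g. "brain")
    clinical_signs -- True if the pig showed signs consistent with S. suis infection
    lung_lesions   -- True if the lungs showed gross lesions of pneumonia
    """
    site = site.strip().lower()
    # Systemic: clinical signs and recovery from a systemic site.
    if clinical_signs and site in SYSTEMIC_SITES:
        return "systemic"
    # Respiratory: recovered from lungs with gross lesions of pneumonia.
    if site == "lungs" and lung_lesions:
        return "respiratory"
    # Nonclinical: upper respiratory tract of pigs without typical signs.
    if not clinical_signs and site in CARRIAGE_SITES:
        return "nonclinical"
    # Anything that cannot be assigned confidently (e.g. a tonsil isolate
    # from a pig with systemic signs) is left as unknown.
    return "unknown"
```

For example, a tonsil isolate from a pig with systemic signs falls through every rule and is returned as "unknown", matching the example given in the text.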
Whole Genome Sequencing and Assembly. Illumina whole genome sequencing was undertaken for all newly sequenced isolates. For isolates from Europe and the United Kingdom, DNA extraction, library preparation, and sequencing were undertaken using a HiSeq 2500 instrument (Illumina, San Diego, CA, USA) by MicrobesNG (Birmingham, UK). For isolates from the United States, multiplex genome libraries were prepared using the Nextera XT DNA library preparation kit (Illumina, San Diego, CA, USA). The genomic library was quantified using a Qubit fluorometer dsDNA HS kit (Life Technologies, Carlsbad, CA, USA) and normalized to the recommended amplification concentrations. The pooled libraries were sequenced on an Illumina MiSeq sequencer using MiSeq Reagent V3 for 600 cycles (Illumina, San Diego, CA, USA). Raw reads were demultiplexed automatically on the MiSeq.
We assembled high-quality reference genomes for 19 isolates. For 12/19 isolates, long-read sequencing library preparation was performed using Genomic-tips and a Genomic Blood and Cell Culture DNA Midi kit (Qiagen, Hilden, Germany). Sequencing was performed on the Sequel instrument from Pacific Biosciences using v2.1 chemistry and a multiplexed sample preparation. Reads were demultiplexed using Lima in the SMRT Link software (https://github.com/PacificBiosciences/barcoding). Reads shorter than 2,500 bases were removed using prinseq-lite.pl (https://sourceforge.net/projects/prinseq). Hybrid assemblies, using filtered PacBio and Illumina reads, and preliminary assemblies of long-read data assembled with Canu v1.9 (55) were generated with Unicycler v0.4.7 using the normal mode and default settings (56). Assembly graphs were visualised and, if necessary, manually corrected with Bandage (57). For the remaining 7/19 isolates, library preparation, short-read and long-read sequencing, and hybrid assembly with Unicycler were undertaken by MicrobesNG as part of their Enhanced Genome Service, which uses both Illumina and Oxford Nanopore Technologies. Four of these complete assemblies were published in a previous study (58).
Serotyping and Sequence-Typing. Serotypes and STs were determined in silico using the Athey et al. (59) pipeline.

Identification of Homologous Genes and Analysis of Pathogenicity-Associated Genomic Islands. All genomes were annotated using Prokka v1.14.5 (60). Panaroo v1.2.2 (61) was used to identify orthologous genes (using recommended parameter settings) and create alignments of core genes. Pathogenicity-associated genomic islands were identified through comparison of frequencies of homologous genes identified by Panaroo and analysis of their relative genomic locations. Concatenated alignments of genes within each genomic island were generated and checked by eye. Regions too divergent to align were excluded from further analysis. Distance matrices were estimated and a neighbour-joining tree was created using the ape package in R (62). Trees were visualised using iTOL and Grapetree (63, 64).
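To make the distance-matrix step concrete, the sketch below computes pairwise distances from a concatenated alignment in Python. It is not the authors' code: the analysis used the ape package in R, and the text does not state which substitution model was applied. The uncorrected p-distance (proportion of differing sites, skipping gaps and ambiguous bases) is used here purely as an assumption, and the function names are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    # No comparable sites: the distance is undefined.
    return differences / compared if compared else float("nan")


def distance_matrix(alignment):
    """Build a symmetric distance matrix (dict of dicts) from name -> sequence."""
    names = list(alignment)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = p_distance(alignment[n], alignment[m])
        matrix[n][m] = d
        matrix[m][n] = d
    return matrix
```

A matrix of this form is the input that a neighbour-joining algorithm would then cluster into a tree.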
Analysis of Population Structure, Phylogeny, and Geographic Spread. We used PopPUNK to cluster our genomes (65). We first used the software to identify divergent genomes and then used the pipeline to cluster genomes in the central population into lineages. We used the standard pipeline followed by refinement using only core distances to determine lineages.
We generated reference-mapped assemblies of the six most common pathogenic lineages with Bowtie2 using reference genomes within the lineages. We identified recombination using Gubbins v2.3.1 (66) and masked all identified recombinant sites in our alignments. We tested for temporal signal in each lineage using a regression of root-to-tip distances against sampling year using the trees output from Gubbins and a published R script (67). Roots were chosen so as to minimise the residual mean squares of a linear regression. For lineage 1, we downsampled our data from 530 to 200 isolates that were randomly selected under the constraint of maintaining the temporal and geographic breadth of the collection. For each lineage, we tested for temporal signal using 1,000 random permutations of dates over clades sampled from the same year to account for any confounding of temporal and genetic structure. This yielded significant evidence (P < 0.05) of temporal signal in all lineages except for in lineage 2. Nevertheless, as our estimate of evolutionary rate for lineage 2 was similar to lineages 1 and 3, we considered it likely that there was sufficient temporal signal to inform our estimates of dates.
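The temporal-signal test described above can be illustrated with a short Python sketch. This is not the published R script (67): it assumes the root-to-tip distances and the same-year clade assignments have already been extracted from a rooted tree, it does not perform the root optimisation, and the function and argument names are hypothetical. Dates are shuffled among clusters rather than among individual tips, so that tips in a same-year clade always share a date.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def clustered_date_permutation_test(years, distances, clusters, n_perm=1000, seed=1):
    """One-sided permutation test for a positive root-to-tip vs. year correlation.

    years     -- sampling year of each tip
    distances -- root-to-tip distance of each tip
    clusters  -- cluster id of each tip; tips in a cluster share a sampling year
    Returns (observed correlation, permutation P value).
    """
    rng = random.Random(seed)
    observed = pearson_r(years, distances)
    cluster_year = dict(zip(clusters, years))  # one year per cluster
    cluster_ids = sorted(cluster_year)
    exceed = 0
    for _ in range(n_perm):
        shuffled = [cluster_year[c] for c in cluster_ids]
        rng.shuffle(shuffled)  # reassign years among clusters
        permuted = dict(zip(cluster_ids, shuffled))
        perm_years = [permuted[c] for c in clusters]
        if pearson_r(perm_years, distances) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the P value is never zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

With 1,000 permutations, as in the text, the smallest attainable P value under this scheme is 1/1,001.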
We constructed dated phylogenies using BEAST v1.10 with an HKY+Γ model, a strict molecular clock, and an exponential population size coalescent model (68). We also constructed dated phylogenies using a relaxed molecular clock, a constant population size model, and a skyline population model, and found that our results were robust to different model choices. We undertook ancestral state reconstruction in BEAST to infer the geographic spread of these lineages by fitting an asymmetric discrete traits model to the posterior distributions of trees, with each state representing a country.
Data, Materials, and Software Availability. DNA sequence reads, genome

Lineages Emerged from a Subpopulation of a Diverse and Largely Commensal Species. We sequenced the genomes of isolates identified in laboratory assays as S. suis from pigs from farms in Denmark (n = 173), Germany (n = 166), the Netherlands (n = 168), Spain (n = 200), the United Kingdom (n = 49), Myanmar (n = 701), and the United States (n = 293) (SI Appendix, Table

Fig. 1. The relationship between pathogenicity and genetic structure within S. suis. (A) The first components of principal component analyses of the presence/absence of accessory genes and SNPs in core genes plotted against one another for all 3,070 isolates in our collection (other components and their eigenvalues are shown in SI Appendix, Figs. S3 and S4). Points represent individual isolates and different shapes/colours represent categories of lineages. Isolates in the dashed box fall within the central population of S. suis. (B) A core genome phylogeny of the central population of S. suis (excluding divergent isolates). Branches in the pathogenic clade are coloured red and branches outside of this clade are coloured black. Colours in the outer ring indicate the 10 most common lineages in our collection, which we also find to be the most pathogenic. Lineage 1 largely corresponds to ST1, which is associated with most cases of zoonotic disease in humans. (C) The frequency of disease-associated isolates in each of the 10 pathogenic lineages (1 to 10), in other lineages within the pathogenic clade (P, coloured red in B), in the central population outside of the pathogenic clade (C, coloured black in B), and in divergent lineages (D, green addition signs in A). Bars 1 to 10 are coloured to match the outer ring in B with the lower part of the bar (deeper colour) representing the frequency of systemic disease isolates and the upper part of the bar (paler colour) representing the frequency of respiratory disease isolates. (D) The frequency of disease-associated serotypes in the same groups as shown in C.

Fig. 2. Divergent evolutionary histories of the three pathogenicity-associated genomic islands. (A) The same core genome phylogeny as shown in Fig. 1B with three additional outer rings describing the presence of the three pathogenicity-associated genomic islands (1, 2, and 3). (B) Trees representing median genetic distances between pathogenic lineages based on coding regions of each of the three genomic islands (as described in SI Appendix, Figs. S6-S8). For Island 3, lineages that have average distances of zero are represented by the same circle split into segments and lineage 9 is represented by two circles due to the presence of two divergent versions of the island. (C) Cladograms representing inferred patterns of acquisition and inheritance of these three islands (red lines) compared to the core genome (thicker gray lines). Arrows indicate inferred acquisitions from outside of S. suis. A single inferred recombination event is omitted from the description of the history of Island 3 shown in C.

Fig. 3. Dates of emergence and paths of between-country transmission for the six most common pathogenic lineages. (A) Estimates of the dates of the most recent common ancestors of the six most common pathogenic lineages (coloured points) against an estimate of the global number of pigs (gray line). Country-specific estimates of pig numbers are shown in SI Appendix, Fig. S15. The vertical dashed line shows the date of the first reported case of S. suis disease in pigs (1954), and the dotted line shows the first reported human case (1968). (B) Map showing inferred routes of transmission of these six pathogenic lineages between the countries in our collection. Arrows represent routes with at least one inferred transmission event. Routes with more than ten inferred transmission events are shown in red, those with more than three in blue, and those with one to three in turquoise. Further details of the numbers and rates of movements between countries across our six lineages are shown in SI Appendix, Figs. S18-S20 and Table S4.

Fig. 4. Reconstructions of the emergence and spread of the six most common pathogenic lineages. Time-scaled phylogenies for the six most common pathogenic lineages of S. suis coloured by country of origin. Colours of branches represent the most likely ancestral location inferred from ancestral state reconstructions with an asymmetric discrete transition model in BEAST.

Fig. 5. Gradual and ongoing genome reduction in pathogenic lineages. (A) Average genome sizes for recently sampled isolates (2018 to 2020) are shown against the proportion of disease-associated isolates for pathogenic lineages 1 to 8. Pathogenic lineages 9 and 10 are not shown as we have no or very few recent isolates from these lineages in our collection. The size of the points reflects the number of isolates in our collection on a log-scale. The dashed line represents the average genome size of isolates of S. suis from outside of the pathogenic clade, where disease isolates are present at a frequency of approximately 25%. (B) Genome sizes of individual isolates against sampling year for lineages 1 to 8. Dashed lines represent best fits for linear models of genome size against year of isolation; all have a negative slope, with a gradient ranging from a loss of 516 bases per year (lineage 1) to a loss of 5,550 bases per year (lineage 5).
Isolates from the United Kingdom came from four different collections. The first sampled nonclinical and clinical isolates from pigs across England and Wales in 2009 to 2011 (described in ref. 47) (n = 193). The second in 2013 to 2014 sampled nonclinical isolates from five farms (described in ref. 48) (n = 117). The third in 2013 to 2014 targeted clinical isolates from pigs across England and Wales (described in ref. 49) (n = 129). The fourth is a newly sequenced collection of archived clinical isolates from pigs that date from 1987 to 2000 (n = 49).