Identification of evolutionarily conserved virulence factor by selective pressure analysis of Streptococcus pneumoniae

Evolutionarily conserved virulence factors can be candidate therapeutic targets or vaccine antigens. Here, we investigated the evolutionary selective pressures on 16 pneumococcal choline-binding cell-surface proteins since Streptococcus pneumoniae is one of the pathogens posing the greatest threats to human health. Phylogenetic and molecular analyses revealed that cbpJ had the highest codon rates to total numbers of codons under considerable negative selection among those examined. Our in vitro and in vivo assays indicated that CbpJ functions as a virulence factor in pneumococcal pneumonia by contributing to evasion of neutrophil killing. Deficiency of cbpL under relaxed selective pressure also caused a similar tendency but showed no significant difference in mouse intranasal infection. Thus, molecular evolutionary analysis is a powerful tool that reveals the importance of virulence factors in real-world infection and transmission, since calculations are performed based on bacterial genome diversity following transmission of infection in an uncontrolled population.


Abstract
Streptococcus pneumoniae is a pathogen that poses one of the greatest threats to human health, causing pneumonia, sepsis, and meningitis. Here, we investigated the evolutionary selective pressures on 16 pneumococcal choline-binding cell-surface proteins. Phylogenetic and molecular analyses revealed that the cbpJ and lytA genes had the highest codon rates under negative selection among those examined. Although LytA is an important virulence factor inducing pneumococcal autolysis, the role of CbpJ in pneumococcal pathogenesis is unknown. We therefore examined whether CbpJ acts as a virulence factor in vitro and in vivo.
In neutrophil bactericidal assays, the cbpJ mutant showed reduced survival as compared to wild-type (WT) S. pneumoniae. Mice intranasally infected with a cbpJ mutant strain showed a lower bacterial burden in bronchoalveolar lavage fluid and higher survival rate than those infected with WT cells, whereas no differences were observed between strains by intravenous infection. Thus, molecular evolutionary analysis is a powerful tool that reveals the importance of virulence factors in real-world infection and transmission, since calculations are performed based on bacterial genome diversity following transmission of infection in an uncontrolled population. In addition, evolutionarily conserved virulence factors are candidate therapeutic targets or vaccine antigens. Improper use of antibiotics creates evolutionary pressures that drive bacteria to acquire drug resistance by natural mutation and/or horizontal transfer of resistance genes. This is a major public health threat: it is estimated that drug-resistant infections cause 10 million deaths annually and may result in economic losses reaching 100 trillion US dollars by 2050 1 .
However, a target-to-hit screen typically requires 24 works-in-process and 94 million US dollars, and the baseline total cost is 1.8 billion US dollars over 13 years to launch a new drug 2 . In fact, the number of new antibiotics developed and approved has steadily decreased in the past three decades, leaving fewer options for treating resistant bacteria 3 .
Streptococcus pneumoniae is one of the pathogens posing the greatest threat to human health 4,5 . S. pneumoniae belongs to the mitis group 6,7 and is a major cause of pneumonia, sepsis, and meningitis 8,9 . In 2015, pneumococcal pneumonia caused over 1.5 million deaths in individuals of all ages, and this rate increased in people over 70 years old between 2005 and 2015 10 , which is especially problematic since the elderly population is growing in many parts of the world. Although pneumococcal conjugate vaccines have considerable benefits, non-vaccine pneumococcus serotypes have increased worldwide 11,12 . The conflict between the host immune system and pathogens leads to an evolutionary arms races known as the "Red Queen" scenario 13,14 . Protein regions at the host-pathogen interface are subjected to the strongest selective pressure and thus evolve under positive selection. Adaptive evolution has been reported in genes related to the mammalian immune system such as pattern recognition receptors 14 . In contrast, bacterial proteins are under negative/purifying selection, which is important for the survival and/or success of the species since non-synonymous substitution of codons can lead to lineage extinction (Fig. 1).
Phylogenetic and molecular evolutionary analyses can reveal the number of codons under negative/purifying selection in a species, since alterations in amino acid residues in regions under negative selective pressure are not allowed. Thus, drugs targeting these regions would be less likely to promote the development of resistance through natural mutation.
We analysed pneumococcal choline-binding proteins (CBPs) localised on the bacterial cell surface through interaction with choline-binding repeats and phosphoryl choline on the cell wall. S. pneumoniae harbours various CBPs including N-acetylmuramoyl L-alanine amidase (LytA), which induces pneumococcal-specific autolysis 15 . Although several CBPs have been characterised, their phylogenetic relationships remain unclear and the unclassified gene names are confusing. We first analysed the distribution of genes encoding CBPs based on pneumococcal genome sequences. Orthologues of genes in each strain were identified by phylogenetic analysis. We then calculated the evolutionary selective pressure on each codon from phylogenetic trees and aligned sequences. We found that the cbpJ gene contains the highest rate of codons under negative selection. CbpJ has no known functional domains except signal sequences and choline-binding repeats, and its role in pneumococcal pathogenesis is unclear. Functional analyses revealed that CbpJ contributes to evasion of host neutrophil-mediated killing in pneumococcal pneumonia. Thus, evolutionary analysis focusing on negative selection can reveal novel virulence factors.

Distribution of cbp genes among pneumococcal strains
Genes encoding CBPs among pneumococcal strains were extracted by tBLASTn search (Supplementary Table 1). Some genes were re-annotated since the search results showed that certain homologous regions were not matched to annotated open reading frames (ORFs). In strain SPNA45, the SPNA_01670 gene contains both predicted promoter regions and intact ORF structures of the cbpF and cbpJ genes. On the other hand, cbpG-homologous regions in strains R6, D39, SPN034183, SPN994038, and SPN994039 did not contain promoters (Supplementary Table 1 and Supplementary Table 2). Orthologous relationships of each gene were analysed. The distribution of cbp genes was unrelated to capsular serotypes. Four genes-i.e., lytA, lytB, cbpD, and cbpE-were conserved as intact ORFs in all 28 pneumococcal strains ( Fig. 2A). Other cbp genes contained frameshift mutations in the orthologues or were absent in some strains.

Phylogenetic relationships in pneumococcal CBPs
Phylogenetic relationships of genes encoding CBPs in pneumococcal species are confusing since some genes in the same cluster show high similarity to each other. To clarify the relationships, we compared common nucleotide sequences among genes encoding CBPs in strain TIGR4. Maximum likelihood and Bayesian phylogenetic analyses revealed two common clusters: one comprising cbpF, cbpG, cbpJ, cbpK, and cbpC, and the other lytA, lytB, lytC, cbpL, and cbpE ( Fig. 2B and Supplementary Fig. 1). The names of some CBP genes were not consistent with those of phylogenetically related genes. In particular, cbpF, cbpG, cbpJ, and cbpK were located close to each other in pneumococcal genomes and showed high similarity. We thus defined orthologous genes in each pneumococcal strain based on maximum likelihood and Bayesian phylogenetic analyses ( Fig. 3 and Supplementary Fig. 2).
The gene locus tag numbers in orthologous relationships are shown in Supplementary Table 1.
The sequence similarity of cbpF, cbpG, cbpJ, and cbpK and their close proximity within genomes indicated that a common ancestral S. pneumoniae acquired the genes by gene duplication. Phylogenetic trees showed well-separated clusters of each gene. These independent relationships indicated that horizontal gene transfer did not contribute to the spread of cbpF, cbpG, cbpJ, and cbpK in S. pneumoniae species, despite their ability to take up exogenous DNA. The genetic diversity of these genes may have been established by accumulation of natural mutations during pneumococcal transmission.

Evolutionary selective pressures on each of the CBP codons
To evaluate the significance of CBPs in real-life infection and transmission, we performed molecular evolutionary calculations based on bacterial genome diversity established after transmission of infection in an uncontrolled population. The nucleotide sequences of each CBP were aligned by codon, and conserved common codons were used for phylogenetic analysis ( Supplementary Fig. 3). The selective pressure on each gene was calculated based on the phylogenetic trees and aligned sequences ( Table 1). The rates of codons under negative selection are visualised in Supplementary Figure 4. Over 13% of total codons in cbpJ and lytA genes were under negative selection as compared to less than 5% for other cbp genes, indicating that these genes play an important role in the success of S. pneumoniae species. On the other hand, the pspA gene encoding the genetically divergent virulence factor PspA contained fewer evolutionarily conserved codons, but had the highest numbers of codons under positive pressure. Additionally, there were no evolutionarily conserved codons in the cbpG, cbpC, and cbpL genes. The latter two had no common codons since a few genes had frameshift mutations. When we re-calculated selective pressure without these genes, we found a low rate of codons under negative selection among CBP-encoding genes (Supplementary Table 3).

CbpJ acts as a virulence factor in pneumococcal pneumonia
While CbpJ had the highest rate of codons under negative selection among pneumococcal CBPs, it has no known functional domains except a choline-binding repeat in its amino acid sequence; moreover, its role in pneumococcal pathogenesis is unknown. In contrast, CbpL showed limited numbers of evolutionarily conserved codons even after the above-described adjustment. Domain structures and codons of CbpJ and CbpL under negative selection are shown in Figure 4A. To assess the roles of CbpJ and CbpL in pneumococcal pathogenesis, we generated mutant strains deficient in the corresponding genes. We first performed a blood bactericidal assay to investigate the role of CbpJ and CbpL in sepsis. The survival rates of ΔcbpJ and ΔcbpL strains in mouse blood were comparable to that of the wild-type (WT) strain (Fig. 4B). We also carried out a human neutrophil bactericidal assay without serum.
Strain ΔcbpJ had a lower survival rate than the WT. These results suggest that CbpJ contributes to the evasion of neutrophil-mediated killing (Fig. 4C).

In a mouse intravenous infection model, the survival rates of ΔcbpLand
ΔcbpJ-infected mice did not differ significantly from that of mice infected with WT S. pneumoniae (Fig. 5A). We also found that incubation of S. pneumoniae in human plasma for 3 h inhibited the expression of cbpL and cbpJ, as determined by quantitative real-time (q)PCR (Fig. 5B). On the other hand, mice intranasally infected with strain ΔcbpJ showed an improved survival rate as compared to those infected with WT S. pneumoniae; although a similar tendency was observed for ΔcbpL-infected relative to WT mice, the difference was not statistically significant (Fig. 5C). The number of bacteria in bronchoalveolar lavage fluid (BALF) from ΔcbpJ-infected mice was lower than that in BALF from ΔcbpLand WT-infected mice (Fig. 5D). We also examined whether CbpL or CbpJ contributes to the association of S. pneumoniae to alveolar epithelial cells and found that WT S. pneumoniae as well as ΔcbpL and ΔcbpJ mutant strains did not differ in their ability to adhere to A549 human alveolar epithelial cells (Fig. 5E). However, S. pneumoniae WT strain exhibited extensive inflammatory cell infiltration and bleeding as compared to the ΔcbpJ strain.
Histological examination of lung tissue from intranasally-infected mice showed that ΔcbpJ induced milder inflammation than the WT strain. Lung tissue from ∆cbpL-infected mice showed moderate inflammation (Fig. 5F). These results indicate that CbpJ acts as a pneumococcal virulence factor in lung infection by contributing to the evasion of neutrophil-mediated killing, whereas CbpJ has no role in bacterial survival in blood. In addition, cbpL deficiency in strain TIGR4 did not significantly attenuate pathogenesis in mouse lung and blood infection.

Discussion
In this study, we investigated evolutionarily conserved rates of CBP codons since these cell surface proteins directly interact with the external environment, which induces rapid rates of evolution in genes involved in genetic conflicts 14  Combining an evolutionary analysis and an animal model would be highly effective for evaluating the functional significance of a putative virulence factor.
Genome-wide association study (GWAS) is a powerful tool for identifying the relationship between genetic variants-mainly single nucleotide polymorphisms (SNPs)-and phenotype, such as in diseases. As GWAS has become more prevalent, various programs and software packages have been developed for this purpose 24,25 . On the other hand, this approach has certain limitations, including the requirements for an appropriate control group and detailed information regarding phenotype. In infectious diseases, it can be difficult to quantify clinical features recorded at different medical centres. Furthermore, in the case of most pathogens, there are no natural attenuated or avirulent strains that can serve as a control group. Our evolutionary analysis has the advantage that it can be performed with genomic information of pathogenic strains only by assuming the presence of pathogens as a phenotype evading natural selection. Since synonymous and non-synonymous substitutions are estimated to occur with equal probability under no selective pressure, a population in which the latter has resulted in extinction by natural selection can serve as a control group. While we have shown in the current study that evolutionary analysis with a small population has the power to detect evolutionarily conserved proteins, a larger population would allow a higher-resolution analysis, including the detection of conserved regions in some pathogenic strains isolated from a specific site of infection or pathological condition. Since this analysis involves simultaneous processing of aligned nucleotide and amino acid sequences, more information is obtained from only SNPs extracted from nucleotide sequences. In addition, automatised phylogenetic and evolutionary analyses are needed to analyse a large population.
Therefore, the development of software packages for meta-data is expected to aid widespread application of this analytical approach.
There are some limitations to our evolutionary analysis. However, these features of molecular evolutionary analysis can be advantages when screening for therapeutic target sites or vaccine antigens with a low frequency of missense mutations, which could reduce the virulence or survivability of the pathogen. Evolutionary analysis could also be an effective alternative strategy for overcoming drug resistance through antigen replacement, and could reduce costs associated with drug discovery and development.
The lytA gene, which was conserved among virtually all pneumococcal strains, showed the highest rates of codons under negative selection, except for the cbpJ gene that was only present in some strains. LytA is known to induce pneumococcal-specific autolysis 29 and contributes to pneumococcal virulence 30,31 . Our evolutionary analysis supports previous reports that the lytA gene is a suitable genetic marker 32,33 due to its evolutionary conservation.
We also showed that the pspA and cbpA genes contain relatively high rates of codons under positive selection, and both encode polymorphic virulent proteins 26,27,34 that are candidate vaccine antigens, although the genes are not universally present within a global serotype 1 collection 28 . In addition, selective pressure by vaccines can easily cause differentiation or deficiency of these proteins since the corresponding genes contain few codons under negative selection. A multivalent system would be required for vaccines using these antigens.
Although the difference was not statistically significant, mice intranasally infected with TIGR4 ΔcbpL strain showed a trend towards improved survival relative to the WT-infected mice. In a previous study, a D39lux cbpL-deficient strain showed reduced virulence as compared to the WT strain 35 . Since CbpL sequences in TIGR4 and D39 strains are similar, the discrepancy between the previous study and our findings is likely due to differences in other surface proteins in each strain. For example, the absence of CbpJ, which contributes to the evasion of neutrophil killing, could affect the survivability of D39.
Recently, antivirulence drugs have been developed as an additional strategy to treat or prevent bacterial infections. Drugs targeting bacterial virulence factors are expected to reduce the selective pressure of conventional antibiotics since they would not affect the natural survival of targeted bacteria 36 . Furthermore, the abundance of candidate targets is a major advantage of antivirulence strategies. Effective design of vaccines and antivirulence drugs requires a thorough understanding of virulence factors; combining our evolutionary analysis and traditional molecular microbiological approaches can improve the detection of potential drug targets. In this study, we identified CbpJ as a novel evolutionarily conserved virulence factor. Thus, molecular evolutionary analysis is a powerful system that can reveal the importance of virulence factors in real-world infections and transmission.

Phylogenetic and evolutionary analyses
Phylogenetic and evolutionary analyses were performed as previously described 37 S. pneumoniae TIGR4 isogenic cbpJ (ΔcbpJ) and cbpL (ΔcbpL) mutant strains were generated as previously described 22 . Briefly, the upstream region of cbpJ or cbpL, an aad9 cassette, and the downstream region of cbpJ or cbpL were combined by PCR using primers shown in Supplementary Table 4. The products were used to construct the mutant strains by double-crossover recombination with synthesised CSP2 50 . All mutations were confirmed by PCR amplification of genomic DNA isolated from the mutant strains.

Blood and neutrophil bactericidal assays
A blood bactericidal assay was performed as previously described 20,22,51 . Mouse blood was obtained via cardiac puncture from healthy female CD-1 mice (Slc:ICR, 6 weeks old; Japan SLC, Hamamatsu, Japan). For human neutrophil isolation, blood was collected via venopuncture from healthy donors after obtaining written, informed consent according to a protocol approved by the institutional review board of Osaka University Graduate School of Dentistry (H26-E43). Neutrophils were isolated from fresh human blood by density gradient centrifugation using Polymorphprep (Alere Technologies, Jena, Germany). Pneumococcal cells grown to the mid-log phase were washed and resuspended in phosphate-buffered saline (PBS). Bacterial cells (1 × 10 4 CFU/20 µl) were combined with fresh mouse blood (180 µl) or human neutrophils (2 × 10 5 cells/180 µl), and the mixture was incubated at 37°C in 5% CO 2 for 1, 2, and 3 h. Viable cell counts were determined by seeding diluted samples onto THY blood agar. The percent of original inoculum was calculated as the number of CFU at the specified time point divided by the number of CFU in the initial inoculum.

Mouse infection assays
All mouse experiments were conducted in accordance with animal protocols approved by the Animal Care and Use Committee of Osaka University Graduate School of Dentistry (28-002-0). Female CD-1 mice (Slc:ICR, 6 weeks old) were intranasally infected with 5 ×

Statistical analysis of in vitro and in vivo data was performed with a non-parametric
Kruskal-Wallis test with Dunn's multiple comparisons test. Mouse survival curves were compared with the log-rank test. P < 0.05 was considered statistically significant. The tests were carried out with Prism v.7.0d software (GraphPad Inc., La Jolla, CA, USA). All experiments were repeated at least three times. In the evolutionary analyses, P < 0.1 was regarded as a significant difference with the HyPhy default setting.

Acknowledgements
This study was supported in part by Japanese Society for the Promotion of Science

K G L I S A A G G GC C T G A T T T C C A A G G G A T T G A T C AC C A A A G C A T T G TT T T C C
ly tA _ T a iw a n 1 9 F _ 1 4 _ _ A T C C _ 7 0 0 6 6 9 _ _ T C H 8 4 3 1 _ 1 9 A _ _ S T 5 5        Growth of pneumococcal strains in mouse blood (B) and in the presence of human neutrophils (C). Bacterial cells were incubated in blood or with neutrophils for 1, 2, and 3 h at 37°C and 5% CO 2 , then serially diluted and plated on TS blood agar. The number of CFUs was determined following incubation. Growth index was calculated by dividing CFU after incubation by the CFU of the original inoculum. Data are presented as the mean of six samples with standard error.