Large scale genomic analysis shows no evidence for repeated pathogen adaptation during the invasive phase of bacterial meningitis in humans

Recent studies have provided evidence for rapid pathogen genome variation, some of which could potentially affect the course of disease. We have previously detected such variation by comparing isolates infecting the blood and cerebrospinal fluid (CSF) of a single patient during a case of bacterial meningitis. To determine whether the observed variation repeatedly occurs in cases of disease, we performed whole genome sequencing of paired isolates from blood and CSF of 938 meningitis patients. We also applied the same techniques to 54 paired isolates from the nasopharynx and CSF. Using a combination of reference-free variant calling approaches we show that no genetic adaptation occurs in the invasive phase of bacterial meningitis for four major pathogen species: Streptococcus pneumoniae, Neisseria meningitidis, Listeria monocytogenes and Haemophilus influenzae. From nasopharynx to CSF, no adaptation was seen in S. pneumoniae, but in N. meningitidis mutations potentially mediating adaptation to the invasive niche were occasionally observed in the dca gene. This study therefore shows that the bacteria capable of causing meningitis are already able to do this upon entering the blood, and no further sequence change is necessary to cross the blood-brain barrier. The variation discovered from nasopharyngeal isolates suggest that larger studies comparing carriage and invasion may help determine the likely mechanisms of invasiveness. Author Summary We have analysed the entire DNA sequence from bacterial pathogen isolates from cases of meningitis in 938 Dutch adults, focusing on comparing pairs of isolates from the patient’s blood and their cerebrospinal fluid. Previous research has been on only a single patient, but showed possible signs of adaptation to treatment within the host over the course of a single case of disease. By sequencing many more such paired samples, and including four different bacterial species, we were able to determine that adaptation of the pathogen does not occur after bloodstream invasion during bacterial meningitis. We also analysed 54 pairs of isolates from pre- and post-invasive niches from the same patient. In N. meningitidis we found variation in the sequence of one gene which appears to provide bacteria with an advantage after invasion of the bloodstream. Overall, our findings indicate that evolution after invasion in bacterial meningitis is not a major contribution to disease pathogenesis. Future studies should involve more extensive sampling between the carriage and disease niches, or on variation of the host.


Abstract
Recent studies have provided evidence for rapid pathogen genome variation, 20 some of which could potentially affect the course of disease. We have previously 21 detected such variation by comparing isolates infecting the blood and 22 cerebrospinal fluid (CSF) of a single patient during a case of bacterial meningitis. 23 To determine whether the observed variation repeatedly occurs in cases of 24 disease, we performed whole genome sequencing of paired isolates from blood 25 and CSF of 938 meningitis patients. We also applied the same techniques to 54 26 paired isolates from the nasopharynx and CSF. 27 Using a combination of reference-free variant calling approaches we show that 28 no genetic adaptation occurs in the invasive phase of bacterial meningitis for 29 four major pathogen species: Streptococcus pneumoniae, Neisseria meningitidis, 30 Listeria monocytogenes and Haemophilus influenzae. From nasopharynx to CSF , 31 no adaptation was seen in S. pneumoniae, but in N. meningitidis mutations 32 potentially mediating adaptation to the invasive niche were occasionally 33 observed in the dca gene. 34 This study therefore shows that the bacteria capable of causing meningitis are 35 already able to do this upon entering the blood, and no further sequence change 36 is necessary to cross the blood-brain barrier. The variation discovered from 37 nasopharyngeal isolates suggest that larger studies comparing carriage and 38 invasion may help determine the likely mechanisms of invasiveness. 39

62
Bacterial meningitis is a severe inflammation of the meninges surrounding the 63 brain as a response to the presence of bacteria [1]. This inflammation can 64 5 Such within-host variation has been shown to occur through a variety of 90 mechanisms such as recombination [15], gene loss [16,17] and variation in 91 regulatory regions [18][19][20]. The rapid variation that occurs in these regions of 92 the genome can increase the population's fitness as the bacteria adapt to the host 93 environment [21,22], and potentially affect the course of disease [23]. Previous 94 studies have shown variation between strains even during the rapid clinical 95 progression of bacterial meningitis [24,25]. 96 97 It is possible that the existing genetics of the bacterium invading from the 98 carriage population may determine, prior to blood stream invasion, whether CSF 99 invasion is possible. Causing invasive disease is an evolutionary dead end for 100 these pathogens, so studies of carriage will not observe selection for variations 101 that advantage bacteria in the blood or CSF. Current knowledge is mostly focused 102 at the serotype and MLST level, and lacks the resolution and sample size to be 103 able to address this question [26][27][28]. Though the only whole genome based 104 study suggests this is not case (at the gene level) in S. pneumoniae [29], we 105 believe higher powered study designs are needed to better answer this question. 106 107 We also hypothesise that bacterial variation may also occur during the invasive 108 phase of meningitis. We have previously reported in a single patient that the 109 bacteria appeared to adapt to the distinct conditions of blood and CSF [24]. 110 These are very different niches from that of nasopharyngeal carriage where this 111 variation is well documented [30], not least because the bacteria are under more 112 intense exposure to immune pressures and have less time over which to 113 accumulate mutations. 114 To look for adaptation to these three niches, we used samples from the 116 MeninGene study [31,32], based at the Academic Medical Centre Amsterdam. 117 which has a functional dlt operon, the variations were annotated with their 214 predicted function. There was no directionality to the mutations: 19 occurred in 215 the blood, and 11 in the CSF. Only seven of the patients infected by these strains 216 showed signs of blood invasion before CSF invasion (sinusitis or otitis); this also 217 did not show directionality. The nature of the mutations is shown in Fig. 2 and 218  The other notable gene with more mutations than expected in N. meningitidis 239 was porA, a variable region which is a major determinant of immune reaction 240 [48], in which 12 samples had frameshift mutations in one of two positions. 241 Phase variation in the gene's promoter region, affecting its expression, is 242 discussed in more detail below. 243

244
The mutations in table 2 show no association with blood or CSF specifically, so 245 do not represent adaptation to either niche. Genetic variation in pilE, hpuA, wbpC, 246 porA and lgtB within host has been observed previously in a single patient with a 247 hypermutating N. meningitidis infection [25]. The coding sequences with excess 248 variants that are found in the samples analysed here include these genes. This 249 also suggests an elevated background rate in these sequences, rather than strong 250 selection between the blood and CSF niches. We therefore separately investigated the mutations in non-coding regions, which 264 were only observed in S. pneumoniae and N. meningitidis. As genome annotations 265 from the calling above are not consistent between samples outside of CDSs, we 266 mapped the variation in intergenic regions to the coordinates of a reference 267 genome. In a subset of samples we had carriage isolates corresponding to blood 268 and CSF isolate pairs (discussed further below); we used these carriage isolates 269 as the reference genome to determine whether these mutations occur in the 270 blood or CSF isolate. frpB/NMB1988 which is a surface antigen involved in iron uptake [55]. 287 Differential expression of these genes may be an important factor affecting 288 invasion, but the mutations we observed that may affect this do not appear to be 289 specific to blood or CSF. 290 291 In N. meningitidis we observed six samples with single base changes in length of 303 the phase-variable homopolymeric tract in the porA gene's promoter sequence, 304 and five samples with the single base length changes in the analogous promoter 305 sequence of opc. While changes in the length of these tracts will affect expression 306 of the corresponding genes, both of which are major determinants of immune 307 response [59,60], the tract length does not correlate with blood or CSF 308 specifically. porA expression has previously been found to be independent of 309 whether isolates were taken from CSF, blood or throat [59]. 310 311 In S. pneumoniae, recent publications highlight a potential role in virulence for 312 the ivr locus, a type I restriction-modification system with a phase-variable 313 specificity gene allele of hsdS in the host specificity domain (Fig. 3) [19,20,61]. 314 There are six possible different alleles A-F ( Figure S5) for hsdS, each 315 corresponding to a different level of capsule expression. Some of these alleles are 316 more successful in a murine model of invasion, whereas others are more 317 successful in carriage. We used a mapping based approach (see methods) to 318 determine whether any of these alleles were associated with either the blood or 319 CSF niche specifically, which could be a sign of adaptation. 320

321
As the locus inversion is rapid and occurs within host, we first ensured that 322 cultured samples are representative of the original clinical samples using PCR 323 quantification of each allele. However, as even a single colony contains 324 heterogeneity at this locus, simply taking the allele with the most reads mapping 325 to it in each sample gives a poor estimate of the overall presence of each allele in 326 the blood and CSF niches. To take into account the mix of alleles present in each 327 sample, and to calculate confidence intervals, we developed a hierarchical 328 Bayesian model for the ivr allele (see Fig S6 and Methods). This simultaneously 329 estimates the proportion of each colony pick with alleles A-F for both each 330 individual isolate (p), and summed over all the samples in each niche (µ). We 331 apply this over i samples and c niches (in this case c can be blood or CSF). 332 333 For each pair of blood and CSF samples listed in Table S1, the difference in allele 334 prevalence πCSF -πblood was calculated (table S3). All S. pneumoniae samples had a 335 difference in mean of at least one allele (as the confidence intervals overlap 336 zero), highlighting the speed at which this locus inverts. 337 338 While this means that between a single CSF and blood pair the allele at this locus 339 usually changes, it is the mean of µc (corresponding to the mean allele frequency 340 in each niche over all sample pairs) which tells us whether selection of an allele 341 occurs in either the blood or CSF more generally. This is plotted in Fig. 4. As the 342 confidence intervals overlap, no particular allele is associated with either blood 343 or CSF S. pneumoniae isolates. 344 Previous work on a murine invasion model [19] has shown an increase in 346 proportion of alleles A and B over the course of infection. We did not observe the 347 same effect in our clinical samples, though the large confidence intervals from 348 the mathematical model suggest that genomic data with a small insert size 349 relative to the size of repeats in the locus is limited in resolving changes in this 350 allele. A small selective effect of ivr allele between these niches would therefore 351 not be detected, but we can rule out strong selection for a particular allele (odds 352 ratio > 2). Application to read data from a large carriage dataset may help 353 resolve whether the same effect does occur in humans, as it would provide a 354 greater temporal range over the course of pathogenesis. 355 Carriage and invasive disease sample pairs show some evidence of repeated 356 adaptation 357 Using the same methods, we also sequenced and analysed pairs of genomes from 358 54 patients that were collected from the nasopharynx and CSF. Six of these were 359 S. pneumoniae. In these S. pneumoniae samples, we detected only one sample 360 with any variation, which was a two base insertion upstream of the gph gene 361 ( Figure 5). This is similar to the amount of mutation observed between the blood 362 and CSF isolates, which is expected given the similar sampling timeframes. While 363 we found that a functional dlt operon appears to have a deleterious effect in 364 invasive disease, we did not observe mutation between our carriage and disease 365 samples. However, this was expected given the small number of carriage samples 366 relative to the effect size detected for this operon. 367 368 Between the 48 N. meningitidis carriage and CSF isolate pairs small numbers of 369 mutations were common. We went on to search for regions enriched for 370 mutation, however in 8 samples we observed large numbers of mutations 371 clustered close together. These represent single recombination events, so when 372 analysing genes enriched for mutation we counted each recombination as a 373 single event [62,63]. 374 375 Table 4 shows the results of this analysis. Similar genes are mutated as in the 376 blood/CSF pairs, again with no specificity to either niche. In phase variable 377 intergenic regions, we observed four sample pairs with an insertion or deletion 378 in the porA promoter tract with no niche specificity. Otherwise, none of the 379 regions above showed enrichment for mutation in either niche. These 380 observations support the theory that these genes mutate at a higher rate but do 381 not confer a selective advantage in any of the three niches studied. 382 383 We found overall that the amount of variation within bacteria that occurs after 415 infection is low. The mutation observed is not randomly distributed throughout 416 the genome, but is randomly distributed between blood and CSF isolates. These 417 mutations are therefore an observation of a higher mutation rate in these 418 regions during invasion (for example the pilus in N. meningitidis, which is known 419 to be under diversifying selection) but not repeated adaptation to either niche. 420 This study indicates that our previous observation of variation between blood 421 and CSF isolates from a single case of meningitis was a rare event most likely 422 driven by antibiotic selection pressure during treatment. 423

424
We went on to analyse 54 samples comparing carriage and invasive isolates from 425 the same patient. Though the sample size was lower, and we did not fully sample 426 diversity within the nasopharynx, we were able to get an insight into potential 427 genetic differences between bacteria in these niches. We see some of the same 428 genes mutate rapidly between blood and CSF isolates also doing this between 429 carriage and invasion. This supports the conclusion that these genes have a 430 higher mutation rate, rather than giving a selective advantage to a niche. Finally of cortex was then used with each set of corrected reads in its own path in the de 470 Bruijn graph, and bubble calling was used to produce a second set of variants 471 between samples. SNPs in the error corrected reads were also called using the 472 graph-diff mode of SGA [78]. 473

Simulations of closely related genomes 474
As the rate of variation is very low, we needed to ensure we had sufficient power 475 to call variants and didn't suffer from an elevated false negative rate. We did this 476 by simulating evolution of S. pneumoniae genomes along the branch of the tree 477 between S. pneumoniae R6 [79]  We used simulations to compare against a simple method of mapping against an 504 arbitrary reference, in this case TIGR4 [83]. We found our reference free method 505 has greater power, especially for INDELs (figure S8), and a markedly reduced 506 false positive rate. We also used an assembly method alone to compare gene 507 presence and absence, but this too suffered from a vastly elevated false positive 508 rate (figure S9). 509

Tests for genes, intergenic regions and genotypes enriched for mutation 510
To scan for repeated variation, the number of mutations in each CDS annotation 511 (adjusted for CDS length) was counted. We then performed a single-tailed 512 Poisson test using the genome wide mutation rate per base pair multiplied by 513 the gene length as the expected number of mutations. The resulting p-values we 514 corrected for multiple testing using a Bonferroni correction with the total 515 number of genes tested as the m tests. Tables 1 and 2 report those CDS with an 516 adjusted p-value less than the significance level of 0.05. For intergenic regions, 517 anywhere with more than one variant is reported. 518 519 To test whether certain genotypic backgrounds are associated with a higher 520 number of mutations that occurs post-invasion, a linear fit of each MLST against 521 number of mutations between blood and CSF isolates was performed. The p-522 values of the slope for each MLST were Bonferroni corrected; at a significance 523 level of 0.05 no MLST was associated with an increased number of mutations. 524 For genes the same test was performed, except samples were coded as one and 525 zero based on whether they had a mutation in the gene being tested or not. We 526 performed a logistic regression for each gene with over ten mutations in tables 1 527 and 2: no genes being mutated post invasion were associated with an MLST. 528

Copy number variation 529
We called copy number variations (CNVs) between samples by first mapping The observed number of reads mapping to each allele, prior distributions 604 defined above, and structure of the model in figure S6 defines a likelihood which 605 can be used to infer the most likely values of the parameters of interest p and µ. 606 We used Rjags to perform MCMC sampling to simulate the posterior distribution 607 of these parameters. We used 3 different starting points, and took and discarded 608 30000 burn in steps, followed by 45000 sampling steps. Noticeable auto-609 correlation was seen between consecutive samples, so only every third step in 610 the chain was kept in sampling from the posterior. We manually inspected plots 611 of each hyperparameter value and mean at each point in the chain, as well as the 612 Gelman and Rubin convergence diagnostic, which showed that the chain had 613 converged over the sampling interval (figure S10). 614 615 To model both the 5' end (TRD 1.1 and 1.2) and the 3' end (TRD 2.1, 2.2 and 2.3) 616 together, so each isolate i is represented by an allele A-F, for each isolate the total 617 number of reads mapping ni was drawn from the distribution in equation (1)  618 where j is the index of the TRD region, rij is the number of reads in sample i that 619 had a mate pair downstream from TRD1.j mapping to any TRD2 region, and pi is 620 the posterior for allele frequency in the sample. 621

622
The distribution for the number of reads mapping to each allele j, zij was defined 623 as in equation (2) where qi is a vector of length six which contains the number of reads mapped to 625 each allele A-F as described above, and p, i and n are as previously. A single 626 sample for z was taken for each isolate i. This 6-vector zij is then used as the 627 observed data in the same model as above to infer pi, and µc for the whole locus 628 allele (A-F) rather than just the 5' end. 629 630 For the 5' allele (TRD1.j) a model using a single k parameter rather than a k 631 indexed by tissue c was preferred (change in deviance information criterion [89] 632 DDIC = −0.523). For the 3' allele (TRD2.j), a model with a single k parameter did 633 not converge. A model with k indexed by allele was used instead. 634

Diversity of ivr allele within samples 635
As the speed of inversion is rapid, the subsequent polymorphism of this locus 636 was also used to evaluate our assumptions about diversity of the bacterial 637 population within each niche. We calculated the Shannon index of each sample's 638 vectors pCSF and pblood to measure diversity of the sample in each niche. The mean 639 Shannon index across CSF samples was 1.01 (95% highest posterior density 640 (HPD) 0.39-1.51) and 0.98 (95% HPD 0.35-1.55) in the blood. Looking at each 641 sample pair individually, the difference between diversity in each niche appears 642 normally distributed with a mean of zero (figure S11). Together, these 643 observations suggest a similar rate of diversity generation in each niche. This is 644 in line with our assumption that the two populations have similar mutation 645 rates, and a similar number of generations between being founded and being 646 sampled. 647