Typing complex meningococcal vaccines to understand diversity and population structure of key vaccine antigens

Background: Protein-conjugate capsular polysaccharide vaccines can potentially control invasive meningococcal disease (IMD) caused by five (A, C, W, X, Y) of the six IMD-associated serogroups. Concerns raised by the immunological similarity of the serogroup B capsule to human neural cell carbohydrates meant that ‘serogroup B substitute’ vaccines target more variable subcapsular protein antigens. A successful approach using outer membrane vesicles (OMVs) as major vaccine components had limited strain coverage. In 4CMenB (Bexsero ®), recombinant proteins have been added to ameliorate this problem.
Methods: Scalable, portable, genomic techniques were used to investigate the Bexsero ® OMV protein diversity in meningococcal populations. Shotgun proteomics identified 461 proteins in the OMV, defining a complex proteome. Amino acid sequences for the 24 proteins most likely to be involved in cross-protective immune responses were catalogued within the PubMLST.org/neisseria database using a novel OMV peptide Typing (OMVT) scheme.
Results: Among these proteins there was variation in the extent of diversity and association with meningococcal lineages, identified as clonal complexes (ccs), ranging from the most conserved peptides (FbpA, NEISp0578, and putative periplasmic protein, NEISp1063) to the most diverse (TbpA, NEISp1690). There were 1752 unique OMVTs identified amongst 2492/3506 isolates examined by whole-genome sequencing (WGS). These OMVTs were grouped into clusters (sharing ≥18 identical OMVT peptides), with 45.3% of isolates assigned to one of 27 OMVT clusters. OMVTs and OMVT clusters were strongly associated with cc, genogroup, and Bexsero ® antigen variants, demonstrating that OMV proteins exist in discrete, non-overlapping combinations associated with genogroup and Bexsero ® Antigen Sequence Type. This highly structured population of IMD-associated meningococci is consistent with strain structure models invoking host immune and/or metabolic selection.
Conclusions: The OMVT scheme facilitates region-specific WGS investigation of meningococcal diversity and is an open-access, portable tool with applications for vaccine development, especially in the choice of antigen combinations, assessment and implementation.


Introduction
Ideal vaccines comprise a single antigenically conserved component of the target pathogen, preferably against a virulence determinant. The highly successful tetanus and diphtheria toxoid vaccines epitomise this approach, with a major impact on disease globally as part of the World Health Organisation Expanded Programme on Immunisation (WHO EPI) 1 . Similarly, the conjugate polysaccharide vaccines against Haemophilus influenzae type b and certain capsular variants of Neisseria meningitidis, the meningococcus, and Streptococcus pneumoniae, have been highly successful in eliminating the diseases they cause, largely through herd immunity effects 2 . A number of more complex vaccines have also been successful, but several of these, including the widely used tuberculosis vaccine BCG 3 , or those comprising killed whole cells, such as whole cell pertussis and certain typhoid and cholera vaccines, are more difficult to define and control, often achieving imperfect disease control. Antigenically diverse pathogens continue to challenge vaccine design, and it is necessary either: (i) to identify protective antigens that are invariant, as has been attempted with the RTS,S malaria vaccine 4 , or (ii) to include multiple components, perhaps varying composition over time, as has been done with influenza vaccines over many years.
There is a limited repertoire of meningococcal polysaccharide capsules associated with invasive meningococcal disease (IMD), identified as serogroups A, B, C, W, X, and Y 5 , potentially simplifying vaccine formulation. While vaccines based on one or a few capsular polysaccharides have been deployed successfully since the 1960s, with notable success achieved by protein-polysaccharide conjugate vaccines 2 , a comprehensive anti-capsular vaccine has been precluded by concerns about the safety and efficacy of formulations that include the serogroup B polysaccharide. Attempts to generate 'serogroup B substitute' vaccines have consequently focussed on subcapsular antigens, especially proteins, but in most cases these are highly variable. Meningococcal diversity extends beyond antigen genes, with a very wide range of distinct genotypes defined by variation in 'housekeeping genes' that encode essential cytoplasmic metabolic functions. This diversity is captured by multilocus sequence typing (MLST) approaches, including conventional MLST 6 , ribosomal MLST (rMLST), and core genome MLST (cgMLST) 7 . These approaches have shown that both the genetic and the antigenic diversity of this organism are structured into clusters of related organisms that share a common ancestor (lineages). These clusters are recognised by conventional seven-locus MLST as clonal complexes (ccs) 8 . The ccs are correlated with clinical phenotypes, including the propensity to cause disease and its severity 9 , and with antigenic characteristics, including capsular and subcapsular antigens 10 . Comprehensive genomic typing schemes have been developed that catalogue serogroups, ccs and principal vaccine antigens 11 .
The multiple variants of the subcapsular antigens included in serogroup B substitute vaccines do not necessarily generate cross-reactive protective immune responses against a wide range of distinct meningococci 12 . Two main approaches have been adopted to overcome this problem: (i) the generation of strain-specific vaccines that target epidemics caused by particular meningococci; (ii) the identification of conserved antigens that offer broad protection. The most widely used strain-specific vaccines have been based on meningococcal outer membrane vesicles (OMVs), which contain outer membrane proteins (OMPs) as principal antigens, especially the PorA porin 5 . The investigational vaccine 'Nonamen' included nine different PorA proteins in a single OMV background to increase population coverage 13 . 'Second generation' meningococcal B substitute vaccine development, using both conventional 14 and 'reverse vaccinology' 15 approaches, identified Factor H binding protein (fHbp) as a leading contender for a single, conserved vaccine antigen, leading to licensure of two vaccines: Bexsero ® (4CMenB) 16 ; and Trumenba ® (rLP2086) 14 . The fHbp protein is, however, highly variable in meningococcal populations 17 , so the effectiveness of the protection offered, not fully established at the time of writing, is dependent on the generation of cross-protective immune responses 12 .
The bivalent Trumenba ® vaccine, containing two lipidated fHbp variants, was licensed for use in the US in 10-25-year-olds in 2014 and for those aged 10 years or over in Europe in 2017; however, as of summer 2018, Bexsero ®, licensed in 2013, was the only serogroup B substitute vaccine licensed for use in infants. This vaccine was introduced into the UK infant immunisation programme in 2015 18 and is undergoing post-implementation assessment 19 . It comprises three recombinant meningococcal proteins, fHbp, neisserial heparin binding antigen (NHBA), and Neisseria adhesin A (NadA), together with the MeNZB™ OMV, which contains PorA as a principal antigen 20 . The meningococcal antigen typing system (MATS) immunological assay had been used to assess potential cross-reactivity of toddler antibodies to the principal Bexsero ® components 21,22 and a Bexsero ® Antigen Sequence Typing scheme (BAST) was developed to assess the sequence variability of these vaccine components from meningococcal whole-genome sequence (WGS) data 23,24 . At the time of writing, however, no means were available to systematically assess the other protein components of the OMV, which could potentially contribute to cross-immunity generated by this vaccine. Shotgun proteomics was performed to identify the protein content of the Bexsero ® -derived OMVs and, alongside published data 25 , informed the selection of proteins for use in the development of an Outer Membrane Vesicle peptide Typing (OMVT) scheme. Here we describe the OMVT scheme and investigate its value in supporting the assessment of vaccine impact and the improvement of vaccine formulation.

Amendments from Version 1
We are grateful to the reviewers for their insightful and constructive comments. We have revised the manuscript to reflect these comments, in particular clarifying: the proteomic methodology; decisions for choosing proteins to include in the typing scheme; the uncertainty regarding cross-protection of certain outer membrane proteins; and the limitations of "designer" OMV vaccines.

Methods
Proteomic OMV sample preparation
Two batches of meningococcal NZ98/254 derived OMVs, from GlaxoSmithKline, Siena, Italy, were analysed in triplicate. Each replicate contained 100 µg total protein in 0.5% SDS and 200 mM triethylammonium bicarbonate (TEAB), pH 8.0, and was reduced with tris(2-carboxyethyl)phosphine and alkylated with iodoacetamide before an overnight acetone precipitation. A 200 mM TEAB solution containing 2.5 µg trypsin (Promega) was directly added to the protein pellet and digestion was performed at 37°C overnight. The resulting peptides from each sample in a set were labelled with a different isobaric tag (TMTs 126-131, ThermoFisher) before being mixed into a single sixplex sample.
A sample of the labelled peptide mixture was injected on to an XBridge C18 column (5 µm, 4.6 mm id and 25 cm long, Waters) for the first-dimension high pH RP-HPLC separation under a linear gradient consisting of mobile phase A (10 mM ammonium formate, pH 10.0) and up to 70% B (90% acetonitrile in mobile phase A) for 2 hours at a flow rate of 0.5 ml/min, using a Jasco system consisting of an autosampler, semi-micro HPLC pumps and a UV detector. Eluted fractions were collected and concatenated into eight tubes and vacuum dried. As OMPs are not all equally susceptible to enzymatic digestion, we evaluated different sample preparation methods. The method described provided an increased number of OMPs compared with other studies in our laboratory, and the contribution of membrane relative to cytoplasmic proteins was comparable to that in other studies 25-27 .

Nano-LC-MS/MS
Nano-LC and tandem mass spectrometry (MS/MS) were performed using a U3000 direct nano system coupled with nano-electrospray and an LTQ-Orbitrap Discovery mass spectrometer (Thermo). After resuspension in 0.1% formic acid, the HPLC fractions containing a mixture of sixplex labelled peptides were separated on a PepMap C18 reversed phase nano column (3 µm, 100 Å, 50 cm length, Thermo) at a column flow rate of 0.3 µl/min using a linear gradient of 5-25% for 180 min, 25-32% for 20 min and 32-90% for 10 min of 95% acetonitrile and 0.1% formic acid. MS scan and MS/MS fragmentation were carried out in the Orbitrap and LTQ mass analysers, respectively, using 2 cycles of top 3 data-dependent acquisition with dynamic exclusion mode enabled and a total cycle time of approximately 30 milliseconds. The first cycle used collision-induced dissociation (CID) fragmentation, generating spectra for peptide sequencing, and the second used high-energy CID (HCD).

Proteomic data analysis
Mass spectra processing, database searching and quantitation were carried out using Thermo Proteome Discoverer 1.4 with built-in Sequest against the Uniprot N. meningitidis MC58 FASTA database, release 2014_03. Spectra from the 8 fractions were added together as one sample during searching. Initial mass tolerances by MS were set to 10 ppm. Up to two missed tryptic cleavages were considered. Methionine oxidation was set as a dynamic modification, whereas carboxymethylation on cysteine and TMT6plex labels on the N-terminal amino acid and lysine side chain were set as static modifications. Positive protein identification was considered when matched to a minimum of two peptides sequenced at rank 1 with high confidence. Protein FASTA sequences identified were subsequently submitted to a web server for protein subCELlular LOcalisation prediction (CELLO; http://cello.life.nctu.edu.tw/cello2go/) and to PSORTb (Supplementary Table 1) 28-30 .

OMVT scheme development and nomenclature
A total of 25 proteins were chosen, primarily for pragmatic reasons as this was considered manageable for detailed curation within the typing scheme. The proteins represented the 20 most abundant outer membrane or periplasmic proteins identified by Nano-LC-MS/MS with an additional five proteins chosen from the published core proteome 25 , also identified by Nano-LC-MS/MS (Table 1). All proteins chosen for the scheme were predicted to be either outer membrane or periplasmic and therefore more likely to be OMV antigens than the cytoplasmic proteins, which were more abundant in the proteomic analysis. The cytoplasmic proteins are unlikely to be constituents of the meningococcal outer membrane and their presence in OMV samples is a probable consequence of cell lysis during detergent extraction causing cytosolic proteins to be released into the preparations 31 .
Although the Pilin E (PilE) protein was amongst this group, it was excluded from the scheme due to a high frequency (>60%) of sequencing or assembly difficulties, which were a consequence of the highly repetitive DNA regions and high diversity of this antigen, with multiple copies in the meningococcal genome. Thus, 24 proteins were taken forward and defined using a novel peptide-sequence based nomenclature and visualised with GView (Version 1.7) (Figure 1) 32 . Each of the 24 peptides was given the prefix NEISp, referring to the peptide sequence deduced from the corresponding nucleotide locus, designated with the prefix 'NEIS'. Each peptide was numbered according to its equivalent NEIS number. For example, porA has the nucleotide locus NEIS1364 and the peptide locus NEISp1364. This collection of 24 peptide antigens forms the OMVT scheme, which is hosted on the PubMLST Neisseria database 33 . The closed reference genome from New Zealand isolate NZ05/33 was used to identify the variant 1 amino acid sequence for each peptide included in the OMVT scheme. This isolate shares 99.4% sequence identity and 1595/1605 core genome loci with NZ98/254, the New Zealand outbreak isolate used to make the MeNZB™ vaccine. Every unique amino acid sequence variant subsequently identified was assigned a unique variant number, in order of discovery.

Peptide variant designation and caveats
For each component of the OMVT scheme, peptide variants were assigned from deduced peptide sequences of the respective NEIS locus. Where the NEIS locus was a complete coding sequence, i.e. containing a start codon and a terminal stop codon, with the number of base pairs (bp) being a multiple of three and no internal stop codons, each unique peptide sequence was assigned a unique arbitrary variant number in order of identification. Where the gene contained a mutation generating a frameshift, indel, or an internal stop codon more than 50 bp from the consensus stop codon, variant 0 (null) was assigned, following the deduction that no functional protein would be produced from such loci. Where internal stop codons occurred fewer than 50 bp from the consensus stop codon, peptide variants were assigned. Where no NEIS or NEISp variant could be deduced, due to incomplete sequencing or genome assembly, no variant was assigned.
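The designation rules above can be illustrated with a short Python sketch. This is not the PubMLST implementation: the function names, the set of accepted start codons, the use of the reference allele length to locate the consensus stop codon, and the treatment of a stop codon exactly 50 bp away are all assumptions made for illustration only.

```python
from typing import Dict, Optional

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Assumption: common bacterial start codons are accepted.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(cds: str) -> str:
    """Translate in frame 1; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def designate_variant(cds: Optional[str],
                      registry: Dict[str, int],
                      consensus_len: int,
                      near_stop_bp: int = 50) -> Optional[int]:
    """Return a peptide variant number, 0 (null), or None (not assigned).

    registry maps peptide sequence -> variant number, in order of discovery.
    consensus_len is the length in bp of the reference allele (hypothetical
    stand-in for the position of the consensus stop codon).
    """
    if not cds:
        return None                      # locus not recovered from the assembly
    cds = cds.upper()
    if cds[:3] not in START_CODONS:
        return None                      # assumption: treated as incomplete
    if len(cds) % 3 != 0:
        return 0                         # frameshift: null variant
    peptide = translate(cds)
    stop = peptide.find("*")
    if stop == -1:
        return None                      # no stop codon: treated as incomplete
    # Distance in bp between the first stop and the consensus stop codon.
    distance = (consensus_len - 3) - 3 * stop
    if distance > near_stop_bp:
        return 0                         # early stop: null variant
    peptide = peptide[:stop]
    if peptide not in registry:
        registry[peptide] = len(registry) + 1
    return registry[peptide]
```

Starting from an empty registry, the first sequence submitted (in the scheme, the NZ05/33 allele) receives variant 1 and each new peptide receives the next integer.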

OMVTs
For OMVT profiles with all 24 loci present, each unique combination was assigned an integer, the OMVT. For example, OMVT-1 corresponded to the OMV peptide variants present in the NZ05/33 genome. All subsequent OMVTs were numbered in order of identification. Although OMVTs provided a rapid means of identifying isolates with identical profiles, there were related isolates that differed at ≤23 loci. To analyse these relationships among OMVTs, a clustering method was applied using eBURSTv3, which determined non-overlapping groups of closely-related strains, i.e. no OMVT was allocated to more than one OMVT cluster, and entry into a cluster required matching at ≥18 loci to the central OMVT 34 . To ensure stability of OMVT clusters, a central OMVT was defined as the OMVT that differed from the largest number of other OMVTs at only a single peptide sequence. This was also used to name the cluster, for example OMVT-449 was the central OMVT of the OMVT-449 cluster. Bootstrapping for each group was assessed using the 23-peptide cut-off.
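As a rough illustration of the definitions above, the following Python sketch assigns OMVT numbers to complete profiles, picks a central OMVT as the one with the most single-locus variants, and lists the OMVTs that match it at ≥18 loci. It is a simplified stand-in and not eBURSTv3: it does not resolve non-overlapping groups or perform bootstrapping, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Profile = Tuple[int, ...]


def assign_omvt(profile: Sequence[Optional[int]],
                registry: Dict[Profile, int]) -> Optional[int]:
    """Return the OMVT for a complete profile, numbering in order of discovery.

    Assumption: a locus with no assigned variant is represented by None,
    and a profile with any such locus receives no OMVT.
    """
    if any(v is None for v in profile):
        return None
    key = tuple(profile)
    if key not in registry:
        registry[key] = len(registry) + 1
    return registry[key]


def shared_loci(a: Profile, b: Profile) -> int:
    """Number of loci at which two profiles carry the same peptide variant."""
    return sum(x == y for x, y in zip(a, b))


def central_omvt(profiles: Dict[int, Profile]) -> int:
    """OMVT differing from the largest number of others at exactly one locus."""
    n_loci = len(next(iter(profiles.values())))

    def slv_count(omvt: int) -> int:
        return sum(1 for other, p in profiles.items()
                   if other != omvt
                   and shared_loci(profiles[omvt], p) == n_loci - 1)

    # Ties are broken in favour of the lowest OMVT number (an assumption).
    return max(sorted(profiles), key=slv_count)


def cluster_members(profiles: Dict[int, Profile],
                    centre: int,
                    min_shared: int = 18) -> List[int]:
    """OMVTs matching the central OMVT at min_shared loci or more."""
    return sorted(o for o, p in profiles.items()
                  if shared_loci(profiles[centre], p) >= min_shared)
```

With 24 loci, an OMVT sharing 18 peptide variants with the central OMVT joins the cluster, whereas one sharing 17 does not.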
The OMVT scheme was used to catalogue the diversity of IMD isolates from the Meningitis Research Foundation Meningococcus Genome Library (MRF-MGL) and related isolate collections. This included 3506 WGS of all culture-confirmed meningococcal isolates in epidemiological years 2010/11 to 2016/17 from England, Wales, Scotland, and Northern Ireland, representing approximately 55% of all laboratory-confirmed IMD in these regions 24,33 . All isolates in the collection were automatically annotated for the OMVT variants, with manual curation of new variants. OMVTs and OMVT clusters were assigned accordingly.

Cluster analysis
GrapeTree software 35 was used to cluster allelic profiles from large collections of WGS and visualise these relationships. It provides an interactive, web-based interface associated with relevant metadata including: year, serogroup, cc, and BAST. GrapeTree was run through the PubMLST.org plug-in.

Statistical analysis
All statistical analyses were performed using R version 3.2.4. Cramer's V coefficient was used to assess the association of peptide loci with cc and was calculated using the 'cramersV' function in the 'lsr' package (v0.5) in R. The diversity of each OMVT protein was assessed using Shannon's index of diversity (H) and the "adjusted diversity" calculated from the natural log of the number of peptide variants/amino acid/isolate. H was calculated using the 'vegan' package (v2.4-v2.6) in R. H represents the uncertainty in predicting the peptide variant of a new isolate, given the number of peptide variants and the evenness in abundance of isolates possessing each variant. H increases as the richness (number of variants) and evenness (distribution) in the population increase.
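For readers who wish to see what these two statistics compute, a minimal Python sketch follows. It is not the R code used in the study: it implements the textbook definitions (Shannon's H with natural logarithms, and Cramer's V from an uncorrected Pearson chi-squared statistic), and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_index(variants: Sequence[Hashable]) -> float:
    """Shannon's H = -sum(p_i * ln p_i) over the observed peptide variants."""
    counts = Counter(variants)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def cramers_v(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Cramer's V between two categorical variables, e.g. variant and cc.

    Uses the Pearson chi-squared statistic without continuity correction.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows = Counter(xs)
    cols = Counter(ys)
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    if k <= 0:
        return float("nan")  # undefined when either variable is constant
    return math.sqrt(chi2 / (n * k))
```

A locus at which every isolate carries the same variant gives H = 0, two equally frequent variants give H = ln 2, and a variant that perfectly predicts cc gives V = 1.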

Results
The proteome of the OMV from Bexsero ®
A total of 461 proteins, representing the proteome of the NZ98/254 OMV, were unambiguously identified in this study (Supplementary Table 1). The predicted subcellular location of each protein revealed 282 cytoplasmic proteins, 60 inner membrane or periplasmic proteins, 36 outer membrane or extracellular proteins and 83 that could not be definitively assigned to any location. Other proteomic analyses of OMVs have found between 100 and 300 proteins 25-27 , with an increasing number of proteins identified in the OMVs presumably due to technical advances improving the sensitivity of mass spectrometers. All studies share the finding that membrane and periplasmic proteins dominate the proteome in terms of abundance. The peak areas of the top three most abundant peptides were used to provide a relative quantity of each of the proteins. By summing the peak areas of all the proteins from each location, the protein abundance in each sub-cellular location was determined, with 58% of the total protein from the outer membrane, 6% from either the inner membrane or periplasm, 32% cytoplasmic proteins, and 3% without a location defined. The 24 proteins chosen for the OMVT scheme encompass 60% of the total protein found in the OMVs (Figure 2).
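The quantification step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the Proteome Discoverer workflow: the text does not say whether the top three peptide peak areas were summed or averaged, so the sketch sums them, and the function names and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def protein_quantity(peptide_areas: List[float], top_n: int = 3) -> float:
    """Relative quantity of one protein from its most abundant peptides.

    Assumption: the top_n largest peak areas are summed.
    """
    return sum(sorted(peptide_areas, reverse=True)[:top_n])


def abundance_by_location(
        proteins: Iterable[Tuple[str, List[float]]]) -> Dict[str, float]:
    """Percentage of total protein quantity per predicted location.

    proteins yields (predicted location, peptide peak areas) per protein.
    """
    totals: Dict[str, float] = defaultdict(float)
    for location, areas in proteins:
        totals[location] += protein_quantity(areas)
    grand_total = sum(totals.values())
    return {loc: 100.0 * t / grand_total for loc, t in totals.items()}
```

For example, two outer membrane proteins with quantities of 100 and 50 and one cytoplasmic protein with a quantity of 50 would give 75% outer membrane and 25% cytoplasmic.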
The distribution of OMVT protein variants within the UK IMD isolates was non-random. A cumulative frequency distribution of the peptide variants for each locus, for the 3506 IMD isolates, identified the loci for which four variants were found in >90% of isolates, including: FbpA (NEISp0578); putative periplasmic protein (NEISp1063); Mip/FkpA (NEISp1487); and PldA (NEISp1687) (Figure 4). Other loci, including TbpA, LbpA, FetA, PorA, and PorB, showed more diversity and variants were distributed more evenly across the 3506 isolates. The distribution of the variation within each protein was also non-random, i.e. the variation was not distributed evenly across the amino acid sequence. In many OMVT proteins there were regions of high diversity concentrated in the so-called "variable regions", representing surface-exposed loops under immune selection, previously well-characterised for TbpA, PorB, and PorA 36-38 (Figure 5).

[Figure caption] Relative amount of protein as determined from peak area of tandem mass spectrometry data. Proteins were assigned to a group using the CELLO 29,30 and PSORTb 28 informatic tools; where the predictions differed, proteins were analysed manually using UniProt and available published data to define a location. Where a location could not be predicted, proteins were assigned an unknown status.

Distribution of OMVT variants by OMV vaccine
OMV vaccines can be uniquely identified using OMVTs; for example, the MeNZB™ and Bexsero ® vaccines are OMVT-1. Each deduced OMVT protein from all 3506 isolates was compared with the variants present in OMVT-1, with a perfect match giving a score of 24. Only 10/3506 (0.3%) of isolates did not share any peptide variants with MeNZB™; these belonged to cc23, cc213, and cc116. The only isolates with a defined cc that had 11 or more shared variants with MeNZB™ belonged to cc41/44 (Figure 8a). There were 34 isolates with undefined cc with ≥11 shared variants with MeNZB™, corresponding to STs that have not been assigned to a cc or isolates that had a partial MLST profile related to cc41/44. The cc41/44 isolates comprised 169 STs. Among these cc41/44 isolates, there was a bimodal distribution of OMVT variants shared with MeNZB™, suggesting two dis-
The same approach was used to analyse other OMV vaccines (MenBVac ®, MENGOCOC-VA ®, and the Chilean OMV vaccine, all of which were derived from cc32 isolates) and optimised OMV formulations (Figure 8b, c). By identifying variants that occurred at high frequency and across the most prevalent OMVT clusters, the composition of the 24 OMVT proteins was manipulated to improve the number of antigenic matches amongst the set of UK disease isolates. A formulation comprising the protein variants found in the most frequently occurring ccs, which differed from OMVT-1 at 14/24 loci, exhibited the highest degree of shared variants with an OMV formulation (Figure 8b). The maximum number of shared variants was 14 (n=40), and all isolates shared ≥1 variant. The lower overall number of variant matches arose because most isolates shared the loci that are independent of cc, and additionally only a small number of cc-specific variants.
Another way to broaden potential vaccine coverage would be to create a vaccine containing multiple OMVTs, for example a formulation representing the five most prevalent hyperinvasive ccs (OMVTs: 160, 368, 1152, 27, and 21). With this formulation all isolates in the reference set shared ≥2 variants with the proposed vaccine, with 3310/3506 (94.4%) isolates having ≥8 variant matches, across all ccs, except cc11 (Figure 8c). All cc11 isolates had ≤8 shared variants with the multiple OMVT vaccine; however, they are highly uniform and the development of a single-strain cc11 OMV vaccine, using for example OMVT-1149, would provide good coverage (Figure 8d), similar to that demonstrated by previously developed OMV vaccines in Cuba, Norway, and New Zealand.

[Figure caption] The disease isolates (n=3506) were stratified by cc (n shown after each cc) and then the number of matches, 1-24, are displayed on the right (n shown after the number of matches). The thickness of the lines is proportional to the number of isolates. Undefined cc represents STs that have not been assigned to a cc or isolates that had a partial multilocus sequence typing profile.
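The match counting used in these comparisons can be sketched in Python as follows. This is an illustrative reading of the approach and not the PubMLST analysis itself: for a vaccine containing several OMVTs, a locus is assumed to count as a match if the isolate's variant equals that of any component OMVT, and null (0) or unassigned (None) variants are assumed never to match. Function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Profile = Tuple[int, ...]


def count_matches(isolate: Sequence[Optional[int]],
                  vaccine_omvts: List[Profile]) -> int:
    """Number of loci (0-24) at which an isolate exactly matches the vaccine."""
    matches = 0
    for locus, variant in enumerate(isolate):
        if variant is None or variant == 0:
            continue  # assumption: missing or null variants cannot match
        if any(omvt[locus] == variant for omvt in vaccine_omvts):
            matches += 1
    return matches


def proportion_covered(isolates: List[Sequence[Optional[int]]],
                       vaccine_omvts: List[Profile],
                       threshold: int) -> float:
    """Proportion of isolates with at least `threshold` exact matches."""
    hits = sum(count_matches(i, vaccine_omvts) >= threshold for i in isolates)
    return hits / len(isolates)
```

Passing a single profile reproduces the comparison against one OMV (for example OMVT-1), whereas passing several profiles models a multiple-OMVT formulation.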

Discussion
The diversity of the meningococcus and the complexity of its infection biology 39
The majority of the OMVT scheme components were OMPs, constituting approximately 60% of the total protein content 51 known to be present in the meningococcal outer membrane, and therefore potentially exposed to diversifying selection imposed by immune responses, although the most diverse components, Pil and Opa, were too variable to be included in the scheme 52 . Consistent with this view, the cytoplasmic proteins present in the OMV formulation, three of which were ribosomal subunit proteins, were the least diverse; however, among the OMPs, there was a wide range of diversity. The least diverse proteins, a putative periplasmic protein (NEISp1063) and FbpA (NEISp0578), are likely not to have surface-exposed regions and might also be influenced by functional constraints. In the most diverse proteins, TbpA (NEISp1690), PorA (NEISp1364) and PorB (NEISp2020), peptide sequence variability was localised in particular 'variable regions' (VRs), shown in a number of studies to represent surface-exposed parts of the sequence subject to immune attack by bactericidal antibodies 36-38 . As our knowledge and understanding of the OMV proteome grows, in terms of protein function, location and immunological relevance to disease and cross-protection, the OMVT scheme could be amended to remove or add relevant antigens.
As reported previously for a number of meningococcal protein antigens, the variable components of the OMVT were non-randomly distributed among isolates, with the most diverse antigens exhibiting a non-overlapping structure 53,54 . The persistence of particular non-random, non-overlapping combinations of PorA and FetA antigens, a few of which attain high frequency in the population 55 , has been used as evidence to support the existence of stable strain types in the meningococcal population driven by immune and/or metabolic selection 53,54 . This structuring was especially exhibited by the more diverse proteins in the OMVT scheme, with the exception of TbpA (NEISp1690) and LbpA (NEISp1468). Thus, although there were a very large number of variants of individual proteins and combinations of these variants, they grouped into a relatively small number of clusters, similar to those observed with FetA and PorA 55 and the BAST scheme 23 . In addition, as with PorA:FetA types 9 and BASTs 24 , OMVT clusters were strongly associated with ccs and capsular group. Ccs correspond to lineages within the meningococcal population, several of which have an increased propensity to cause invasive disease. These are referred to as the hyperinvasive lineages 39 and they dominate collections of isolates from IMD, although they are rarer in isolates obtained from asymptomatic carriage 9 . A number of mechanisms have been proposed to explain the existence and persistence of these population structures in the highly recombinogenic meningococcus 56 , but whatever the mechanism that generates them, these associations can be exploited in vaccine formulation, potentially simplifying it 10 .
Indexing of the variation in the OMVTs in the context of representative isolate collections, such as the MRF-MGL 33 , along with the analysis tools embedded within the PubMLST.org website, enabled the in silico assessment of the likely performance of different OMV formulations. The OMVT from a given isolate could be rapidly determined and the likely content of an OMV vaccine made from the isolate deduced. The number of exact protein matches present in a given meningococcal population to this vaccine could then be readily established, indicating the likely degree of coverage that vaccine might attain. This analysis only considered exact matches to vaccine OMP variants, as cross-protection is not well characterised for antigens beyond PorA and FetA. Analysis of the UK reference dataset showed that the MeNZB™ OMV exhibited a similar number of exact matches to a putative cc11 OMV, with higher levels of exact matches attainable by alternative vaccine formulations or the inclusion of multiple OMVs in a single vaccine formulation. In the absence of evidence for very broad cross-protection, these comparisons suggest that multi-component or 'designer' OMVs that contained artificial repertoires of OMPs could have substantial advantages in terms of vaccine coverage, though this requires an understanding of regional meningococcal epidemiology with ongoing surveillance and the additional lipoproteins present in native OMV preparations.
Using a similar approach to the previously described MLST 6 and BAST 23 schemes, the OMVT scheme represents a portable and scalable means of cataloguing variation in the OMV components of meningococcal vaccines 12 . This provides a stable basis for the comparison of the extensive variation of these molecules. The OMVT scheme has the potential to be enhanced with: (i) bioinformatic approaches that predict B- and T-cell epitopes 57 ; and (ii) phenotypic expression and immunological data, to provide a more sophisticated prediction of the likely cross-protection provided by these vaccines 58 ; however, while a peptide sequence catalogue can be comprehensive, this is unlikely to be the case for phenotypes, and some deduction from sequence data is likely to remain essential for the foreseeable future. Correlation of the OMVT with immune responses in infants and those that affect carriage in older individuals is of particular importance in assessing the cost-effectiveness of novel meningococcal vaccines 18 . The embedding of the OMVT scheme within the PubMLST.org/neisseria database 59 enables the scheme to be implemented genus-wide, permitting inter-species as well as intra-species diversity to be explored. This is of particular importance given the renewed interest in the development of novel vaccines against Neisseria gonorrhoeae, the gonococcus, and the possible impact of Bexsero ® on gonococcal infection 60 . Inter-species diversity is also important when considering the possible impact of such vaccines on the oropharyngeal microbiota. Finally, this approach exemplifies how WGS data can be used to support the development and evaluation of complex multicomponent vaccines against highly variable pathogens.

Supplementary material
Supplementary

Grant information
This study was funded by the Wellcome Trust (109031 and 087622).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

The paper describes a novel typing scheme to address the population diversity of meningococcal vaccine antigens. This constitutes an important step forward, as it allows a more "holistic" approach to estimate vaccine coverage. Instead of focusing on one or a few individual antigens, using this method one can assess the overall potential contributions of multiple antigens in OMV-based vaccines.

Minor comments
Considering that the total OMV proteome listed in Supplementary Table 1 has 461 proteins, the selection of just 24 candidates for the scheme seems somewhat arbitrary. It would be helpful to give more discussion on the reasons for this particular selection, and how the results could be influenced by a different selection.
In our experience, not all outer membrane proteins are equally susceptible to the trypsin digestion step used during sample preparation for MS analysis, and therefore the details of the method used can influence the quantification results that are obtained. Have the authors compared different methods?
Some well-known vaccine antigens are missing in Supplementary Table 1, in particular the Opa proteins and fHbp. Were these left out deliberately?
Figure 4 contains very interesting information on variation of individual antigens. However, it is very difficult to identify the antigens corresponding to the different lines.
The NZ98/254 OMV used is obtained with detergent extraction. However, other methods for OMV preparation have also been described, which avoid the use of detergent extraction. Such native OMVs have different antigen composition, especially with regards to surface-exposed lipoproteins such as the well-known vaccine antigen factor H binding protein. They would therefore require different selections of OMVT antigens. It would be helpful to discuss this in the context of hypothetical novel OMV formulations (Figure 8).

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

We are grateful for your constructive review and we have addressed all the following points:

Minor comments
1. Considering that the total OMV proteome listed in Supplementary Table 1 has 461 proteins, the selection of just 24 candidates for the scheme seems somewhat arbitrary. It would be helpful to give more discussion on the reasons for this particular selection, and how the results could be influenced by a different selection.
We have clarified this in the "OMVT scheme development and nomenclature" section: A total of 25 proteins were chosen, primarily for pragmatic reasons as this was considered manageable for detailed curation within the typing scheme. The proteins represented the 20 most abundant outer membrane or periplasmic proteins identified by Nano-LC-MS/MS, with an additional five proteins chosen from the published core proteome 25, also identified by Nano-LC-MS/MS (Table 1). All proteins chosen for the scheme were predicted to be either outer membrane or periplasmic and therefore more likely to be OMV antigens than the cytoplasmic proteins, which were more abundant in the proteomic analysis. The cytoplasmic proteins are unlikely to be constituents of the meningococcal outer membrane, and their presence in OMV samples is a probable consequence of cell lysis during detergent extraction causing cytosolic proteins to be released into the preparations.

We have additionally noted in the "Distribution of OMVTs in IMD and association with cc" section:
To determine whether a different subset of OMV proteins could alter the clustering of isolates, the same 3506 meningococcal genomes were analysed using 50 OMV proteins (OMPs, periplasmic and cytoplasmic), showing similarly strong association with clonal complex (data not shown).
2. In our experience, not all outer membrane proteins are equally susceptible to the trypsin digestion step used during sample preparation for MS analysis, and therefore the details of the method used can influence the quantification results that are obtained. Have the authors compared different methods?
We have added this to the "Proteomic OMV sample preparation" section: As OMPs are not all equally susceptible to enzymatic digestion, we evaluated different sample preparation methods. The method described identified an increased number of OMPs relative to other studies in our laboratory, and the relative contribution of membrane and cytoplasmic proteins was comparable to other studies.
3. Some well-known vaccine antigens are missing in Supplementary Table 1, in particular the Opa proteins and fHbp. Were these left out deliberately?
We have added this in the figure legend of Supplementary
4. Figure 4 contains very interesting information on variation of individual antigens. However, it is very difficult to identify the antigens corresponding to the different lines.
We have made the markers on the graph larger as well as on the legend, so it will be easier to identify which colour represents the different antigens.
5. The NZ98/254 OMV used is obtained with detergent extraction. However, other methods for OMV preparation have also been described, which avoid the use of detergent extraction. Such native OMVs have different antigen composition, especially with regards to surface-exposed lipoproteins such as the well-known vaccine antigen factor H binding protein. They would therefore require different selections of OMVT antigens. It would be helpful to discuss this in the context of hypothetical novel OMV formulations (Figure 8).
We have clarified this in the figure legend of Supplementary

This manuscript describes the development of an Outer Membrane Vesicles Peptide Typing (OMVT) scheme based on a subset of selected proteins in 3506 UK invasive isolates. The paper is of great interest, well-written, and the methodology used to define the OMVT scheme and the definition of the relative clusters are clearly described. Moreover, the paper represents a milestone in vaccine development, because it provides important insight in the assessment of vaccine impact, improvement of vaccine formulations, and evaluation of inter-species and intra-species diversity.
Minor comments:
- page 4, paragraph "OMVT scheme development and nomenclature": The criteria used for the selection of five additional proteins from the core proteome, and the basis for the selection of only five, should be further clarified.
- page 7, paragraph "The proteome of the OMV from Bexsero": It would be worth describing the differences between the number of proteins identified from this analysis, compared to what is already published.
- Figure 4: The figure would be clearer by increasing the size of the dots to enable the reader to easily identify the distribution for each single antigen. A possibility could be also to zoom in the region of the dashed line, related to the four most common variants.
- Figure 5: Only results related to distribution of variation for proteins known to contain variable regions and for three most conserved proteins are shown in Figure 5. However, it would be interesting to add a comment to what happens in-between: are there other peptides containing variation patterns compatible with variable regions?
- Discussion at page 16 and Abstract: The sentence "[…] has been used as evidence to support the existence of stable strain types in the meningococcal population driven by immune and metabolic selection [50,51]" should read "immune and/or metabolic selection". The same applies to the abstract: "… is consistent with strain structure models invoking metabolic and/or host immune selection".
- Discussion: The comparison between potential coverage of real and hypothetical vaccines is very interesting. However, it could be more appropriate to clarify a few points in the discussion section:
• The differences between antigens in which a perfect match is needed to confer protection (e.g. PorA) and antigens for which the influence that sequence variability could have on the cross-protective ability is unknown should be further emphasized.
• The ability of additional antigens, not included in the OMV analysis, to confer cross-protection should be mentioned and further discussed. It would be interesting to evaluate in future how much these antigens cluster with the OMVT scheme.
• The sentence on page 16: "In the absence of evidence for a very broad cross-protection […]" highlights the importance of this study and raises a very interesting concept related to a multi-component or "designer" OMV vaccine. It would be interesting to briefly introduce also the possible limitations that such an approach may have.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate?
- Figure 4: The figure would be clearer by increasing the size of the dots to enable the reader to easily identify the distribution for each single antigen. A possibility could be also to zoom in the region of the dashed line, related to the four most common variants.
We have made the markers on the graph larger as well as on the legend, so it will be easier to identify which colour represents the different antigens.
- Figure 5: Only results related to distribution of variation for proteins known to contain variable regions and for three most conserved proteins are shown in Figure 5. However, it would be interesting to add a comment to what happens in-between: are there other peptides containing variation patterns compatible with variable regions?
We have added the following text to the Figure 5 legend: There were other proteins that had potential variable regions, including the well-described proteins NEISp1963 (FetA), NEISp1468 (LbpA) and NEISp1428 (TonB-dependent receptor).
- Discussion at page 16 and Abstract: The sentence "[…] has been used as evidence to support the existence of stable strain types in the meningococcal population driven by immune and metabolic selection [50,51]" should read "immune and/or metabolic selection". The same applies to the abstract: "… is consistent with strain structure models invoking metabolic and/or host immune selection".
Thank you, this has been changed in the updated manuscript.
- Discussion: The comparison between potential coverage of real and hypothetical vaccines is very interesting. However, it could be more appropriate to clarify a few points in the discussion section:
• The differences between antigens in which a perfect match is needed to confer protection (e.g. PorA) and antigens for which the influence that sequence variability could have on the cross-protective ability is unknown should be further emphasized.
We have added the following to the discussion: This analysis only considered exact matches to vaccine OMP variants as cross-protection is not well characterised for antigens beyond PorA and FetA.
• The ability of additional antigens, not included in the OMV analysis, to confer cross-protection should be mentioned and further discussed. It would be interesting to evaluate in future how much these antigens cluster with the OMVT scheme.
We have amended the following to include: As our knowledge and understanding of the OMV proteome grows, in terms of understanding protein function, location, and immunological relevance to disease and cross-protection, the OMVT scheme could be amended to remove or add relevant antigens.
• The sentence on page 16: "In the absence of evidence for a very broad cross-protection […]" highlights the importance of this study and raises a very interesting concept related to a multi-component or "designer" OMV vaccine. It would be interesting to briefly introduce also the possible limitations that such an approach may have.
We have added the following limitations of this approach: In the absence of evidence for very broad cross-protection, these comparisons suggest that multi-component or 'designer' OMVs that contained artificial repertoires of OMPs could have substantial advantages in terms of vaccine coverage, though this requires an understanding of regional meningococcal epidemiology with ongoing surveillance and the additional lipoproteins present in native OMV preparations.
Competing Interests: No competing interests were disclosed.