Modified MLVA for Genotyping Queensland Invasive Streptococcus pneumoniae

Background
Globally, over 800 000 children under five die each year from infectious diseases caused by Streptococcus pneumoniae. Genotyping methods are required to understand genetic relatedness between isolates, study transmission routes, assess the impact of human interventions such as vaccines, and determine infection sources. The ‘gold standard’ genotyping method, Multi-Locus Sequence Typing (MLST), is useful for long-term and global studies. Another genotyping method, Multi-Locus Variable Number of Tandem Repeat Analysis (MLVA), has emerged as a more discriminatory, less expensive and faster technique; however, there is no universally accepted MLVA method and it is currently suited to short-term and localised epidemiology studies. Australia currently has no national MLST database, nor has it adopted any MLVA method for short-term or localised studies. This study aims to improve S. pneumoniae genotyping by modifying existing MLVA techniques to be more discriminatory, faster, cheaper and technically less demanding than previously published MLVA methods and MLST.
Methods
Four different MLVA protocols, including a modified method, were applied to 317 serotyped invasive S. pneumoniae isolates obtained from sterile body sites of Queensland children under 15 years from 2007–2012. MLST was applied to 202 isolates for comparison.
Results
The modified MLVA4 is significantly more discriminatory than the ‘gold standard’ MLST method. MLVA4 has similar discrimination to the other MLVA techniques in this study. The failure to amplify particular loci seen in previous MLVA methods was minimised in MLVA4. Failure to amplify BOX-13 and Spneu19 was found to be serotype specific.
Conclusion
We have modified a highly discriminatory MLVA technique for genotyping Queensland invasive S. pneumoniae. MLVA4 can enhance our understanding of pneumococcal epidemiology and the changing genetics of the pneumococcus in localised and short-term studies.


Introduction
More than 800 000 children under five succumb to invasive pneumococcal diseases (IPD) each year globally [1]. Australia is not exempt, with a notification rate of 6.7 per 100 000 in 2013 [2][3]. IPD is defined as the isolation of S. pneumoniae from normally sterile body sites including blood, tissues, and cerebrospinal, joint, pericardial or pleural fluids [4]. The changing pneumococcal population structure worldwide is largely the result of serotype replacement and capsule switching, especially in first world countries such as Australia where vaccines are widely implemented [5][6][7]. Serotype replacement has been problematic because current pneumococcal vaccines target only 13 of 95 pneumococcal serotypes [8][9][10][11][12][13]. Capsule switching, the transfer of capsule genes from one pneumococcus to another, is a regular occurrence in pneumococcal populations; however, it has the potential to reduce vaccine efficacy because vaccine escape isolates can emerge [6][7]. Vaccine escape isolates can develop within 2-3 years of a vaccine's introduction, as already detected in the USA and Italy [6,14]. In vitro studies have demonstrated that capsule switching can also affect pneumococcal virulence, particularly since the polysaccharide capsule is a virulence factor of pneumococci [8]. A highly virulent capsule type 5 strain was rendered avirulent when expressing a type 3 capsule, and a type 6A strain expressing a type 6C capsule was more virulent than the wild types [15][16].
So far, few published studies have examined the pneumococcal population structure in Australia since the introduction of the 13-valent pneumococcal conjugate vaccine (13vPCV) in July 2011 [17].
For decades, the main technique for surveying serotype distribution and replacement has involved serotyping pneumococci with antisera (such as the Quellung reaction) [18]. However serotyping is expensive, laborious, and ambiguous, revealing no information about genetic recombination and capsule switching. Therefore several genotyping methods have been developed, including MLST [19] and several MLVA techniques [20][21][22][23]. It is important to bear in mind that genotyping and serotyping are both required in combination to detect capsule switching events.
Commonly, MLST is used to genotype S. pneumoniae based on the original technique involving housekeeping genes developed by Enright and Spratt [19]. MLST housekeeping genes are considered to be stable and less prone to recombination than the rest of the genome, enabling examination of long-term population changes and studies across wider areas [19]. However, MLST is expensive and laborious, not suitable for large-scale genotyping or routine use [20][21][24][25][26][27]. There is no national Australian MLST database, and the international MLST database only contains 138 Australian isolates from the year 1967 to 2013, providing very little information regarding the Australian pneumococcal population. The international MLST database could be used as a comparison against the pneumococcal isolates found in the rest of the world.
To reduce cost and labour intensity, MLVA was developed for genotyping S. pneumoniae. It is reported to be more discriminatory, less expensive and faster than MLST, but better suited to short-term epidemiological changes and localised outbreaks [20][21][27]. MLVA involves amplifying Variable Number of Tandem Repeats (VNTR) loci and sizing the PCR products according to fragment length. VNTRs are suitable genotyping targets because there are multiple loci throughout the genome and they are polymorphic [28]. MLVA can be multiplexed, and uses a DNA sequencer for high-throughput analysis of fragment sizes rather than for sequencing [25].
Different MLVA protocols exist, including the MLVA of Koeck et al. [21], which amplifies seventeen VNTRs, each in a singleplex PCR. This method was used to genotype pneumococcal isolates in Burkina Faso [29]. Its limitations include the impracticality of typing seventeen targets and the difficulty of comparing gel analyses. There is a freely accessible database (http://mlva.u-psud.fr/mlvav4/genotyping/index.php; see Simple Databases, labelled 'Streptococcus pneumoniae2005') which allows comparison of profiles; however, only 59 isolates are available in this database. More recently this protocol was modified by Van Cuyck et al. [22], who reduced the number of VNTRs to the seven that were experimentally determined to have the highest discrimination (Hunter-Gaston Diversity >0.8) within a population of 331 UK isolates. This seven-marker MLVA is claimed to be a minimum universal set, ideal for genotyping pneumococcal isolates. However, pneumococcal populations are known to differ between countries, so the selection of these seven MLVA markers may not be suitable for Australian isolates.
Another MLVA protocol was developed by Elberse et al. [20], which utilises eight BOX VNTR loci amplified in two multiplex PCRs with fluorescently labelled probes. BOX loci are a type of VNTR locus, distinguishable by varying numbers of boxB repeat regions (45 base pairs) found between boxA and boxC, which remain stable under laboratory conditions [20,30]. BOX elements can form secondary structures and can affect the expression of downstream genes [31]. Elberse's method uses BOX loci only (no other VNTR), while Koeck's method uses a combination of BOX loci and other VNTR loci, which do not contain stable boxA and boxC units and vary in repeat length (e.g. 60 base pairs). Elberse's MLVA protocol has been applied to genotype pneumococci in the Netherlands and carriage isolates from Portugal, as well as to track a localised outbreak in England [32]. However, a limitation is that some BOX loci fail to amplify (assigned '99'), leaving profiles incomplete, an issue that remains unresolved. An MLVA type can still be assigned even if the profile contains a non-amplified locus (studies from Elberse report that BOX-06 fails to amplify in 89% of serotype 7F isolates). The limitation of an incomplete genetic profile is that the 'true' bacterial fingerprint of the isolate cannot be seen. It is unknown whether this could have implications for population studies.
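The sizing step described above can be illustrated with a minimal sketch that converts a measured fragment length into a whole number of 45 base-pair boxB repeats and codes a missing peak as '99'. The function name, the 150 bp flanking length and the example fragment size are hypothetical and chosen only for illustration; real flank lengths are locus specific and are not given in the text.

```python
from typing import Optional

NON_AMPLIFIED = 99  # code used in the text for a locus that fails to amplify


def repeat_count(fragment_bp: Optional[float], flank_bp: int, repeat_bp: int = 45) -> int:
    """Convert a sized PCR fragment to a whole number of repeats.

    flank_bp is the combined length of everything in the amplicon that is
    not a repeat unit (primers, boxA, boxC). It is locus specific; the value
    used in the example below is hypothetical.
    """
    if fragment_bp is None:  # no peak detected for this locus
        return NON_AMPLIFIED
    repeats = round((fragment_bp - flank_bp) / repeat_bp)
    if repeats < 0:
        raise ValueError("fragment shorter than the flanking sequence")
    return repeats


# Hypothetical locus with 150 bp of flanking sequence and 45 bp boxB repeats.
print(repeat_count(285.4, flank_bp=150))  # (285.4 - 150) / 45 = 3.01 -> 3
print(repeat_count(None, flank_bp=150))   # 99
```

Rounding to the nearest whole repeat absorbs small sizing errors from capillary electrophoresis; a production pipeline would more likely use per-locus size bins.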
Finally, Multi-Locus boxB Typing (MLBT) was developed as a variation of MLVA that sequences VNTR loci to detect Single Nucleotide Polymorphisms (SNPs) as well as fragment length variations [23]. MLBT includes VNTRs that have been used in the other MLVA protocols; however, the complexity of MLBT does not allow easy comparison with other MLVA methods [20][21][23].
The aim of this study was to improve pneumococcal genotyping methods by modifying existing MLVA. It is important to modify the existing MLVA methods to be more discriminatory, faster, cheaper and technically less demanding than previously published MLVA methods and MLST.

Methods
Setting
All pneumococcal isolates from Queensland patients with IPD are required to be submitted to the Public Health Microbiology Laboratory at Queensland Health Forensic and Scientific Services (QHFSS), Brisbane. Serotyping (Quellung) is mandatory, however further genotyping has only been performed for research purposes. Invasive S. pneumoniae isolates taken from normally sterile body sites were serotyped by the Pneumococcal Reference Laboratory, QHFSS, using Quellung reaction [4,18].

Laboratory methods
S. pneumoniae isolates were cultured 16-streak on Horse Blood Agar (HBA, Oxoid, Australia), a commercially available product routinely used for the culture of bacteria. A single colony was re-cultured on fresh HBA to ensure isolates were genetically identical. Isolates were boiled in 400 μL TE buffer (~pH 8.0) for eight minutes to prepare a thermolysate containing template DNA, and were stored at -20°C until further use.
S. pneumoniae isolates (n = 317) detected in Queensland were genotyped using MLVA1, MLVA2 and MLVA4. Elberse's MLVA1 contained two multiplexes with eight BOX loci (Table 1) [20]. The MLVA2 procedure was based on Van Cuyck's MLVA method and contained seven VNTRs (Table 1) [22]. A single multiplex reaction was performed with Spneu17, Spneu19, Spneu27 and Spneu39, as the other three had already been typed in MLVA1. Spneu31 [21] and B10 [23] were amplified separately to determine whether they provided high discrimination within the pneumococcal population and were therefore suitable for the modified MLVA method.
The Corbett Cas1200 Robotic system was used to prepare mastermix with DNA thermolysate. The PCR protocol was optimised and consisted of 15min at 95°C, 30 cycles of 95°C for 30sec, 58°C for 60sec and 72°C for 60sec, followed by extension of 72°C for 10min and held at 4°C. Diluted PCR products (1:150 reverse-osmosis water (PALL, Australia)) were combined with 1200LIZ internal ladder (Applied Biosystems, Australia). A heat denaturation step (95°C for 5min, followed by a hold step at 4°C) was performed on a thermocycler to separate the dsDNA. Fragment sizing was performed on AB3130 (Applied Biosystems, Australia). Finally, MLST was also applied to selected isolates (n = 202) for comparison purposes as previously described [19,33].

Analysis
PeakScanner V1.0 software (Applied Biosystems, Australia) was used to analyse MLVA results. An MLVA type (MT) using MLVA1 was assigned to each isolate using the MLVA database (http://www.mlva.net). MTs for MLVA2 and MLVA4 were manually assigned from our own database. Van Cuyck et al. [22] did not provide a database or published MLVA types that could be used for the Queensland pneumococcal population. The pneumococcal population structure obtained with each MLVA method was displayed as an eBurst diagram produced with PHYLOViZ software [34]. Clonal clusters (CC) are defined as two or more isolates that are genetically related and linked by single locus variants (SLV) or double locus variants (DLV). Where an international or larger database was available, Queensland isolates were compared against it.
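The clonal cluster definition above can be sketched as a simple single-linkage grouping: two profiles are linked when they differ at one locus (SLV) or two loci (DLV), and any chain of such links forms a cluster. This is an illustration only and is not the algorithm implemented in PHYLOViZ. The profile names and repeat numbers are invented, and treating '99' like any other allele value is an assumption.

```python
from itertools import combinations


def locus_differences(a, b):
    """Number of loci at which two MLVA profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def clonal_clusters(profiles, max_diff=2):
    """Group profiles linked by chains of SLVs (1 difference) or DLVs (2).

    profiles maps a profile name to a tuple of repeat numbers.
    Returns a list of sets; profiles with no linked partner are omitted.
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]


# Invented four-locus profiles, for illustration only.
profiles = {
    "MT_a": (2, 3, 4, 5),
    "MT_b": (2, 3, 4, 6),  # SLV of MT_a
    "MT_c": (2, 3, 7, 8),  # DLV of MT_a and of MT_b
    "MT_d": (9, 9, 9, 9),  # unrelated singleton
}
print(clonal_clusters(profiles))
```

Setting `max_diff=1` restricts links to SLVs only, which in this toy data leaves MT_c outside the cluster.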
MLST results were analysed using ChromasPro software (Technelysium Pty Ltd.) using batch alignment analysis. Allele numbers and sequence types (ST) were assigned to each isolate from the MLST database (www.mlst.net) and displayed as an eBurst diagram using PHYLOViZ software.
The Simpson's Index of Diversity (S) was calculated to compare the discrimination of all genotyping methods (http://darwin.phyloviz.net/ComparingPartitions/index.php?link=Tool). If the 95% confidence intervals (CI) overlap between methods, the hypothesis that the methods have similar discriminatory power cannot be excluded. The Adjusted Wallace coefficient (AW) was used to measure the congruence between typing methods [35][36].
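As a rough sketch of the two statistics named above, the following computes Simpson's Index of Diversity and the Adjusted Wallace coefficient from two lists of type assignments for the same isolates. The formulas used are the standard ones, with AW = (W - Wi) / (1 - Wi) and Wi = 1 - S of the second method; confidence intervals, which the online tool reports, are not computed here. The type labels are invented.

```python
from collections import Counter


def simpsons_diversity(labels):
    """Probability that two isolates drawn without replacement
    are assigned different types."""
    n = len(labels)
    counts = Counter(labels).values()
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))


def wallace(a, b):
    """W(A->B): of the isolate pairs sharing a type under method A,
    the fraction that also share a type under method B.
    Undefined (division by zero) if no pair shares a type under A."""
    same_a = sum(c * (c - 1) for c in Counter(a).values())
    same_both = sum(c * (c - 1) for c in Counter(zip(a, b)).values())
    return same_both / same_a


def adjusted_wallace(a, b):
    """Wallace coefficient corrected for agreement expected by chance."""
    expected = 1 - simpsons_diversity(b)  # chance a random pair matches under B
    return (wallace(a, b) - expected) / (1 - expected)


# Invented assignments for six isolates under two methods.
mlva = ["m1", "m1", "m2", "m2", "m3", "m4"]
mlst = ["s1", "s1", "s1", "s1", "s2", "s2"]

print(f"{adjusted_wallace(mlva, mlst):.3f}")  # 1.000
print(f"{adjusted_wallace(mlst, mlva):.3f}")  # 0.176
```

The asymmetry in the toy output is the point of the coefficient: when one method subdivides the types of the other, the finer method predicts the coarser one well, but not the reverse.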
The frequency of non-amplified loci ('99') was compared between each MLVA method to determine whether this was a limitation of a particular method or associated with specific serotypes. The Hunter-Gaston Diversity Index (DI) (http://www.hpa-bioinformatics.org.uk/cgi-bin/DICI/DICI.pl) was used to calculate the genetic diversity of VNTR loci within the Queensland population, as used in previous studies [21][22][37]. DI is a measure of the variation of the number of repeats at each locus, ranging from 0.0 (no diversity) to 1.0 (complete diversity).
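A minimal sketch of both per-locus measures follows: the fraction of isolates of each serotype coded '99' at a locus, and the Hunter-Gaston DI of the repeat numbers at that locus. The isolate records are invented, and excluding '99' calls before computing DI is an assumption, since the text does not state how non-amplified loci were handled in that calculation.

```python
from collections import Counter

NON_AMPLIFIED = 99


def hunter_gaston(alleles):
    """D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)) over allele counts n_j."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two isolates")
    return 1 - sum(c * (c - 1) for c in Counter(alleles).values()) / (n * (n - 1))


def non_amplified_rate_by_serotype(isolates, locus):
    """Fraction of isolates of each serotype coded 99 at the given locus."""
    total = Counter()
    failed = Counter()
    for serotype, profile in isolates:
        total[serotype] += 1
        failed[serotype] += profile[locus] == NON_AMPLIFIED
    return {s: failed[s] / total[s] for s in total}


# Invented records: (serotype, {locus: repeat number}).
isolates = [
    ("7F", {"BOX-06": 99, "BOX-02": 2}),
    ("7F", {"BOX-06": 99, "BOX-02": 2}),
    ("7F", {"BOX-06": 3, "BOX-02": 2}),
    ("7F", {"BOX-06": 3, "BOX-02": 2}),
    ("3", {"BOX-06": 4, "BOX-02": 2}),
    ("3", {"BOX-06": 5, "BOX-02": 3}),
]

print(non_amplified_rate_by_serotype(isolates, "BOX-06"))

# Assumption: non-amplified calls are dropped before computing diversity.
amplified = [p["BOX-06"] for _, p in isolates if p["BOX-06"] != NON_AMPLIFIED]
print(round(hunter_gaston(amplified), 3))
print(round(hunter_gaston([p["BOX-02"] for _, p in isolates]), 3))
```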

Ethics statement
No human participants were involved directly in this study and hence human ethics clearance was not required. S. pneumoniae isolates routinely cultured from clinical specimens were used, and we investigated the epidemiology of the S. pneumoniae isolates, changing genotypes and population structure in Queensland. The MLVA4 method has novel aspects, as four sets of primers and the three multiplex PCRs were newly designed in this study.

Comparison of genotyping methods
Isolates selected for MLST included 13vPCV serotypes and non-13vPCV serotypes to minimise labour work and costs (excluding those originally in the 7vPCV i.e. serotype 4, 6B, 9V, 14, 18C, 19F and 23F, and serotype 19A). Studies have already shown that 7vPCV serotypes have significantly declined [38][39], so we focused on the examination of these recently targeted or non-targeted serotypes.
All three MLVA genotyping methods had a higher discriminatory power than MLST (Table 2). However, when MLVA4 was compared with the other two MLVA methods, all MLVA methods had similar discriminatory power (Table 2). MLVA4 has high congruence with MLVA1 (AW = 0.883), MLVA2 (AW = 0.766) and MLST (AW = 0.966). This indicates that two isolates sharing an MLVA4 type have a 96.6% probability of also sharing an MLST type. Conversely, the congruence of MLST with MLVA4 (AW = 0.314) indicates that two isolates sharing an MLST type have only a 31.4% probability of sharing an MLVA4 type, indicating that MLVA4 is more discriminatory.

Comparison of MLVA and MLST eBurst
MLVA4 has been shown to be vastly cheaper and faster than MLST, costing $23.71 compared to $346.65 per isolate, respectively, and taking only 3-4 days on average to genotype 48 isolates, compared to 16-20 days, respectively.
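The per-isolate figures above can be turned into a quick worked comparison. The two costs are taken from the text; applying both to a batch of 317 isolates is a hypothetical extrapolation, since MLST was applied to only 202 isolates in this study.

```python
MLVA4_COST = 23.71   # dollars per isolate, from the text
MLST_COST = 346.65   # dollars per isolate, from the text


def batch_cost(cost_per_isolate, n_isolates):
    """Total cost of genotyping a batch, rounded to cents."""
    return round(cost_per_isolate * n_isolates, 2)


ratio = MLST_COST / MLVA4_COST
print(f"MLST costs about {ratio:.1f} times as much per isolate")

# Hypothetical batch size, for illustration only.
n = 317
print(batch_cost(MLVA4_COST, n), batch_cost(MLST_COST, n))
```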
Comparison of MLST and our MLVA4 eBurst analysis clearly shows two different population structures (Fig. 1). The pneumococcal population structure is displayed according to the MLVA4 genotypes, and 32 clonal clusters (CC) are observed, with the eleven larger clusters labelled in the figure. The MLST results are overlaid on the MLVA4 results, so that each colour represents a different MLST type. Several MLVA4 clusters would appear as a single MLST type; for example, CC1, which predominantly contains serotype 7F isolates, has as many as ten different MLVA4 types but only one MLST type, as depicted by the single green colour of the circles.
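The overlay described above amounts to asking how many MLVA4 types fall within each MLST type. A minimal sketch of that tabulation is given here; the function name and the type identifiers are hypothetical and do not correspond to isolates in this study.

```python
from collections import defaultdict


def mlva_types_per_st(isolates):
    """Map each MLST sequence type to the set of MLVA types found within it.

    isolates is an iterable of (sequence_type, mlva_type) pairs.
    """
    by_st = defaultdict(set)
    for st, mt in isolates:
        by_st[st].add(mt)
    return dict(by_st)


# Invented (ST, MT) assignments, for illustration only.
isolates = [
    ("ST_a", "MT_1"),
    ("ST_a", "MT_2"),
    ("ST_a", "MT_3"),
    ("ST_a", "MT_1"),
    ("ST_b", "MT_4"),
]

counts = {st: len(mts) for st, mts in mlva_types_per_st(isolates).items()}
print(counts)  # {'ST_a': 3, 'ST_b': 1}
```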

Discussion
MLVA has emerged as an alternative genotyping technique to the 'gold standard' MLST as it has higher discriminatory power and is fast and inexpensive [20][21][24]. Results in this study support this finding, as MLVA4 is vastly cheaper and faster than MLST and is still comparable to the costs and laboratory processing time of the other published MLVA methods (Table 2). Reducing the time and cost of genotyping will have an impact on the public health field by making it possible to resolve "large and complex outbreak situations" [24].
The choice of the ten VNTRs for our MLVA4 method was based on a Hunter-Gaston Diversity of 0.8, as well as an anchor locus with low discrimination to determine long-term changes (BOX-02), and an extra locus with high discriminatory power for specific serotypes (Spneu19). Of the eight highly discriminatory VNTR loci used in previously published MLVA papers, seven VNTRs are included in the modified MLVA4 method [20][21][22][23]. MLVA4 maintains high congruence with the other MLVA methods and MLST (Table 2).
When comparing the different genotyping methods, MLVA4 method maintains a high discriminatory power whilst minimising the number of '99', and is significantly more accurate in representing the Queensland pneumococcal population structure than MLST. Clusters of isolates can be observed in more detail and can glean more information when combined with isolate information such as antibiotic resistance or location of disease (not investigated in this study). Admittedly, additional markers have increased the discriminatory power when applied to Queensland, Australian isolates however we have maintained minimal laboratory work to three multiplexes.
MLST was shown to be less discriminatory, although it may remain an ideal method for long-term and international epidemiology studies. A less discriminatory protocol is problematic if it does not detect emerging genotypes or outbreaks in a localised area or state. MLVA4 is evidently more efficient at discriminating pneumococcal isolates; for example, MLST type 191 (serotype 7F) can be separated into nine different MLVA4 types (Fig. 4).
However, this study confirms the problem of failing to amplify particular loci, resulting in incomplete genotypes. As a result, a number of primers were re-designed in this study in an attempt to amplify the failing loci (including BOX-10, BOX-12, BOX-13 and Spneu19). Elberse et al. [40] reported that 24% of their isolates still contained one or more non-amplified BOX loci even after repeated PCR. This study observed that some loci failed to amplify in specific serotypes; for example, BOX-06 failed to amplify in 75% of Queensland serotype 7F isolates, indicating either that primers needed to be redesigned or that VNTR fragments were absent. Serotype 7F (89% of isolates) has previously been associated with a large number of non-amplified BOX-06 loci [40]. Since the primers for MLVA are located in stable boxA and boxC regions flanking the boxB repeats, difficulty in amplifying these loci would not be expected. It is theorised that gene elements are more likely to be lost (or acquired) if there is a higher average number of boxB repeats [31]. It is possible that the BOX-06 region is missing from serotype 7F, as there are up to eight different BOX-06 fragment lengths. BOX-06 was therefore not used in MLVA4, owing to its high percentage of '99' results and low Hunter-Gaston diversity. Serotype 7F was the second most common serotype (9%) found in Queensland, so using loci with higher discrimination was favoured.
Similarly, the Spneu19 and BOX-13 loci could not be detected in serotype 3 and 33F isolates, respectively. The failure to amplify Spneu19 in serotype 3 has also been observed previously, suggesting that serotype 3 lacks pcpA, which codes for a non-essential surface protein involved in cell adhesion [21,41]. On the other hand, long BOX-13 fragments (>2000 bp) have been identified in serotype 33F isolates, possibly accounting for the '99' results since the AB3130 internal size ladder only reaches 1200 bp. Large fragments could be explained by the insertion of an insertion sequence (IS) element, making the BOX element appear to be larger than 2 kb. The presence of IS elements has been described in another MLVA study [42].
Variations in interpreting the population structure of S. pneumoniae have been observed when using different genotyping protocols. Potential capsule switches have already been observed between serotypes 19A and 15C in CC8 (MT59; ST411), and between serotypes 1 and 4 (MT36; ST306) in CC10, in our Queensland population using MLVA4 (S1 Fig.). MLST also identifies these capsule switches, as well as many others which may indicate false capsule switching, since MLST is less able to discern the true genetic background of S. pneumoniae isolates. When examining the international database, some of these capsule switches can be verified, for example ST199 (serotype 19A/15C), ST1012 (serotype 22F/33F) and ST4237 (serotype 6A/6C); however, there is no support for capsule switching of the other strains in the international database. This could mean that the capsule switch is relatively new. Alternatively, MLVA4 may be too discriminatory: even though a true capsule switch has occurred, MLVA4 identifies two distinct genotypes, so it is assumed that no capsule switching has occurred. Further investigation is required to determine whether MLVA4 could fail to detect capsule switches or whether MLST is detecting false capsule switches. Since MLVA4 is highly discriminatory, it may enable detection of capsule switching earlier than MLST would. Accurately and quickly detecting relationships between serotypes may have an impact on the selection of serotypes for future vaccine strategies. Furthermore, the ability to examine CC with higher discrimination using MLVA4 can provide insight into which BOX elements are changing. Serotype 7F (MLVA CC1 or ST191) largely diversifies due to BOX-10 and serotype 3 diversifies due to Spneu17. The specific functions of these elements are unknown; however, VNTRs and BOX elements are known to play a role in bacterial competence and virulence and can influence gene expression downstream [30,43]. VNTR loci with high diversity (e.g. Spneu17) would allow increased discrimination within localised or short-term studies, whereas VNTR loci with low diversity (e.g. BOX-02 and BOX-11) would allow identification of long-term changes. The population structure based on the respective genotypes determined by each genotyping method varies. MLST is considered the least discriminatory as only one genotype is assigned to serotype 3 and serotype 7F, whereas MLVA4 provides increased discrimination by identifying a number of genetically related but different genotypes.
In conclusion, we have developed an MLVA4 method for genotyping invasive S. pneumoniae. The main advantage of this new method over other MLVA protocols is the ability to achieve complete MLVA profiles for serotypes while also maintaining a highly discriminatory and fast genotyping technique. Loci that failed to amplify were found to be serotype specific, which may indicate that the BOX elements in these serotypes are variable and have the capacity to transpose. Further research is required to understand the VNTR genetics of these serotypes, as VNTRs and BOX loci are thought to play a role in virulence. MLVA4 is also more suitable for genotyping S. pneumoniae than MLST, as a more diverse population can be visualised, allowing accurate tracking of strains across the state. MLST may be more suitable for a national study than for a state study. This study establishes a population structure before and after 13vPCV introduction in Queensland, and it is expected that future monitoring will comprehensively and accurately depict the changes in the pneumococcal population. The future perspective of MLVA is that it will emerge as a cheap and fast genotyping method for localised and national studies that can still be used in conjunction with the traditional and slower serotyping and MLST methods for characterising S. pneumoniae.