Capsular Typing Method for Streptococcus agalactiae Using Whole-Genome Sequence Data

Group B streptococcus (GBS) capsular serotypes are major determinants of virulence and affect potential vaccine coverage. Here we report a whole-genome-sequencing-based method for GBS serotype assignment. This method shows strong agreement (kappa of 0.92) with conventional methods and increased serotype assignment (100%) to all 10 capsular types.

Streptococcus agalactiae, or group B streptococcus (GBS), is an important pathogen in neonates (1-3), with early infections being acquired from the maternal genitourinary tract (4). In addition, GBS is now recognized as an increasingly important pathogen among immunosuppressed and elderly individuals in high-income regions (5,6).
GBS expresses a capsular polysaccharide that is involved in virulence and immune evasion. Ten different serotype variants (i.e., Ia, Ib, II, III, IV, V, VI, VII, VIII, and IX), which differ in their disease-causing abilities, have been described. Conjugate vaccines targeting the most common disease-causing serotypes are currently in development (7). Establishment of vaccine serotype coverage is important, as is postintroduction surveillance to monitor for potential serotype replacement, as has been seen following the introduction of other conjugate vaccines (8).
Current methods for GBS serotype allocation rely on latex agglutination assays or PCR assays (9). Recent advances in whole-genome sequencing (WGS) have enabled the development of approaches that can be used in place of traditional microbiological methods, such as strain typing and antibiotic susceptibility profiling (10-12). A major advantage of this approach is that the cost of sequencing can be mitigated by the ability to use the same data to generate multiple outputs. Given the decreasing cost of WGS (13) and the ongoing increase in WGS data generation, we sought to establish and to validate a WGS-based method for GBS capsular typing.
We developed an algorithm for serotype assignment on the basis of sequence similarity between a given de novo assembly and capsular gene sequences of the 10 GBS serotypes. For nine serotypes, published sequences were used as references (Table 1); for serotype IX, however, only a partial capsular locus sequence has been published (14). A suitable reference for the full capsular locus region was therefore determined by WGS of a serotype IX isolate obtained from the Statens Serum Institute (Copenhagen, Denmark).
To assign the serotype for a given isolate, a BLAST database was generated from the de novo assembly and queried with the variable region of the capsular locus sequence for each serotype (cpsG-cpsK for serotypes Ia to VII and IX and cpsR-cpsK for serotype VIII), using BLASTn with an E value threshold of 1e-100 and otherwise default parameters. A serotype was considered correct if it showed ≥95% sequence identity over ≥90% of the sequence length.
These thresholds were chosen on the basis of being stringent enough to provide differentiation between the various reference sequences while maximizing serotype allocation for an initial test set of publicly available GBS WGS data, for which serotype information was not available (therefore, we had no way of knowing whether the assigned serotypes were actually correct).
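The assignment rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes BLASTn was run with the standard 12-column tabular output (`-outfmt 6`) and with the E value cutoff already applied, that each query ID is simply the serotype name, and that only the top-scoring hit per reference is considered (a capsular locus split across several contigs would need additional handling). The function names, the `ref_lengths` dictionary, and the tie-breaking rule (identity multiplied by coverage) are hypothetical choices made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

MIN_IDENTITY = 95.0  # percent identity threshold stated in the text
MIN_COVERAGE = 0.90  # fraction of the reference length stated in the text


def best_hits(blast_lines: Iterable[str],
              ref_lengths: Dict[str, int]) -> Dict[str, Tuple[float, float]]:
    """Return {serotype: (percent identity, coverage)} for the top hit per reference.

    Columns follow BLAST tabular format 6: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    top: Dict[str, Tuple[float, float, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        serotype = fields[0]          # assumption: query ID is the serotype name
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        bitscore = float(fields[11])
        # Fraction of the reference (query) sequence spanned by this hit.
        coverage = (abs(qend - qstart) + 1) / ref_lengths[serotype]
        if serotype not in top or bitscore > top[serotype][2]:
            top[serotype] = (pident, coverage, bitscore)
    return {s: (p, c) for s, (p, c, _) in top.items()}


def assign_serotype(hits: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """Return the serotype passing both thresholds, or None if none passes."""
    passing = {s: v for s, v in hits.items()
               if v[0] >= MIN_IDENTITY and v[1] >= MIN_COVERAGE}
    if not passing:
        return None
    # Assumed tie-break: prefer the highest identity x coverage product.
    return max(passing, key=lambda s: passing[s][0] * passing[s][1])
```

With this sketch, an isolate whose best hit to the Ia reference has 99.5% identity over the full reference length, and whose best hit to the III reference covers only 85% of that reference, would be assigned serotype Ia; an isolate with no hit passing both thresholds would be left unassigned.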
This sequence-based method for serotype allocation was validated using WGS with a set of 223 colonizing or invasive human isolates from Canada, Latin America, Singapore, the United Kingdom, the United States, and Thailand for which serotypes had been determined previously using conventional latex agglutination. Three isolates that did not have a capsular type assigned by latex agglutination methods were assigned serotypes Ib, VI, and VII by the WGS-based method. For all previously serotyped GBS isolates with a known capsule type, the kappa statistic of 0.92 indicated very strong agreement between WGS-predicted and conventional serotypes. Nine isolates had discordant results. In each case, there was strong support for the sequence-allocated serotype, with >98% sequence identity over 100% of the reference length in all nine cases (Fig. 1). Across all isolates, differences in relatedness between the capsular locus sequences of the different serotypes led to characteristic relationships between the allocated serotype (best match) and the second-best match. For example, all isolates assigned to serotype Ia had serotype III as the second-best match. In all cases, the second-best match was substantially poorer than the best match, demonstrating that there was no ambiguity in the predicted serotype (Fig. 1 and Table 3).
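For readers unfamiliar with the kappa statistic, the following Python sketch shows how agreement between two sets of serotype calls can be quantified. It assumes an unweighted Cohen's kappa, which the text does not specify, and the function name `cohens_kappa` and the toy labels in the usage note are hypothetical; they are not study data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(calls_a: Sequence[str], calls_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two paired lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of isolates with identical calls.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Expected chance agreement from the marginal frequencies of each method.
    count_a, count_b = Counter(calls_a), Counter(calls_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[k] * count_b[k] for k in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods gave the same single category throughout
    return (observed - expected) / (1.0 - expected)
```

As a toy illustration, latex calls of Ia, Ia, III, III paired with WGS calls of Ia, Ia, III, Ia give an observed agreement of 0.75, an expected chance agreement of 0.5, and therefore a kappa of 0.5.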
The nine isolates with discordant results and the three nontypeable isolates were retested by latex agglutination assays (Table 4) and were resequenced using the Illumina MiSeq platform, with sequence processing and WGS-based serotype prediction performed as described above. In all cases, resequencing results were consistent with the initial WGS classification. For 6/9 isolates with discordant results, the new latex agglutination results matched the WGS-based prediction, suggesting that the initial discordance might have resulted from incorrect latex agglutination typing or sample mislabeling. The other three isolates with discordant results and the three nontypeable isolates were all classified as nontypeable with retesting.
This WGS-based method for GBS serotyping, which was validated using 223 isolates that had been typed using conventional methods, was therefore highly accurate. Although WGS currently may not be cost-effective for direct replacement of traditional serotyping, costs are likely to decrease further. Furthermore, WGS may already be the cheapest option for combined studies, with possibilities for utilizing the resulting data for additional analyses, such as multilocus sequence typing, analyses of relatedness to other sequenced isolates, and detailed phylogenetic analyses.

[Table fragment: cross-tabulation of serotype assignments; only the first rows are preserved]

        Ia   Ib   II   III  IV   V    VI   VII  VIII IX
Ia      34   0    0    1    0    0    0    0    0    0    35
Ib      0    9    ...

[Table fragment: isolates with discordant results; column headings are not preserved]

...                             Ia    Ia    Ia
IW8194  Discordant results  III   IX    IX    IX
IW8466  Discordant results  Ia    III   III   III
IW8471  Discordant results  III   Ia    Ia    Ia
IW7157  Discordant results  Ib    II    II    II