Comparative genomics of Bordetella pertussis isolates from New Zealand, a country with an uncommonly high incidence of whooping cough

Whooping cough, the respiratory disease caused by Bordetella pertussis , has undergone a wide-spread resurgence over the last several decades. Previously, we developed a pipeline to assemble the repetitive B. pertussis genome into closed sequences using hybrid nanopore and Illumina sequencing. Here, this sequencing pipeline was used to conduct a more high-throughput, longitudinal screen of 66 strains isolated between 1982 and 2018 in New Zealand. New Zealand has a higher incidence of whooping cough than many other countries; usually at least twice as many cases per 100000 people as the USA and UK and often even higher, despite similar rates of vaccine uptake. To the best of our knowledge, these strains are the first New Zealand B. pertussis isolates to be sequenced. The analyses here show that, on the whole, genomic trends in New Zealand B. pertussis isolates, such as changing allelic profile in vaccine-related genes and increasing pertactin deficiency, have paralleled those seen elsewhere in the world. At the same time, phylogenetic comparisons of the New Zealand isolates with global isolates suggest that a number of strains are circulating in New Zealand, which cluster separately from other global strains, but which are closely related to each other. The results of this study add to a growing body of knowledge regarding recent changes to the B. pertussis genome, and are the first genetic investigation into B. pertussis isolates from New Zealand.


INTRODUCTION
Since the pertussis vaccine was introduced globally, New Zealand has noted a high number of infections and hospitalizations due to whooping cough compared with other developed countries; some sources indicate disease rates 5 to 10 times higher in New Zealand than the UK or USA. For instance, during the 1980s, hospitalization for whooping cough was required for 0.37 per 100 000 people in the USA, compared with 3.75 per 100 000 in New Zealand [18]. In addition, epidemic periods have often been more severe in New Zealand than other countries. During concurrent epidemics in the UK and New Zealand in 2012, 122.3 cases

Impact Statement
Since the 1990s, whooping cough has been resurgent in many countries around the world, despite the wide availability of vaccines. New Zealand has often had a higher incidence of whooping cough than other countries such as the USA, UK and Australia, both during outbreak periods and in the intervening years. One potential reason for the resurgence of whooping cough is genetic changes to the causative bacterium, Bordetella pertussis, with several recently identified, ongoing global genetic trends. No B. pertussis isolates from New Zealand have previously been sequenced, however. Here, we used hybrid sequencing to investigate the genomes of 66 New Zealand B. pertussis isolates, collected between 1982 and 2018. This revealed that genetic trends in New Zealand B. pertussis match those observed elsewhere, but over the years a number of highly similar or identical strains appear to have circulated (or are currently circulating) in the country, a phenomenon not commonly noted elsewhere. This first study of B. pertussis isolates from New Zealand contributes to the global understanding of B. pertussis genomics, as well as providing the groundwork for any future whole-genome sequencing of New Zealand B. pertussis isolates.
were seen per 100 000 people in New Zealand, compared to 20 per 100 000 in the UK [2,19]. Whilst, historically, rates of vaccine uptake in New Zealand were relatively low, a marked increase in immunization coverage across all age groups has been seen over the last two decades [20,21]. At 93 %, immunization coverage in New Zealand at all age points up to 5 years is now comparable to the mean 94 % coverage in the UK [22]. Nonetheless, rates of whooping cough have remained higher in New Zealand than elsewhere, with some outbreaks occurring that are unique to New Zealand. The latest such outbreak occurred between 2017 and 2019, a period during which countries such as the USA, UK and Australia did not observe a similar outbreak.
As of 2018, no B. pertussis sequencing reads or genomes in the NCBI database were tagged as being from New Zealand. However, B. pertussis isolates have been collected and stored at the Institute of Environmental Science and Research (ESR)'s Invasive Pathogen Laboratory in New Zealand since the 1980s [23]. Here, we sequenced the genomes of 66 B. pertussis isolates collected in New Zealand between 1982 and 2018, using a variant of the nanopore-based hybrid sequencing workflow we developed previously [24]. We then used comparative genomics to place these first New Zealand B. pertussis genome sequences in a global context, and to screen for any potential genetic factors responsible for New Zealand's ongoing high incidence of whooping cough.

METHODS
All data processing and analysis was carried out using the MRC's Cloud Infrastructure for Microbial Bioinformatics (CLIMB) [25].

Strain isolation
In total, 66 strains, collected between 1982 and late 2018, were stored at −80 °C at the ESR's Kenepuru Science Centre, Porirua, New Zealand. Strains were grown and heat-killed, then shipped on ice to the University of Bath, UK. On arrival, the heat-killed cells were stored at −20 °C. Serotyping data had previously been determined by the ESR as the strains were isolated, where possible (46/66 strains). Full details, including accession numbers, are included in Tables 1 and S1.

The outbreaks
Pertussis case numbers and incidence data were collated from the New Zealand Ministry of Health's Public Health Surveillance monthly reports, dating back to January 2003 (https://surv.esr.cri.nz/surveillance/monthly_surveillance.php, Table S2). The resulting timeline (Fig. 1) was used to classify the 'outbreak' periods in New Zealand studied here. For consistency, a major outbreak was defined here as the period during which incidence per 100 000 was above 30.0 for longer than 6 months. This resulted in three major outbreak periods since January 2003 being defined: July 2004 to November 2006 (29 months), November 2011 to October 2014 (36 months), and October 2017 to October 2019 (25 months). A fourth, shorter and smaller, increase in case numbers was also noted from early 2009 to late 2010. Further details of these outbreak periods are shown in Table 2.

DNA extraction and Illumina sequencing
Heat-killed cells were resuspended in 1 ml PBS and OD 600 was measured. Volumes of suspension equating 1 ml at an OD 600 of 2.0 (~4×10 9 B. pertussis cells) were pelleted in a microcentrifuge for 2 min at 12 000 g. gDNA was extracted from each pellet using the QIAamp DNA mini kit (Qiagen) according to the manufacturer's instructions, including a single-step elution into 200 µl of elution buffer (buffer AE). gDNA from 34 isolates was sent for Illumina MiSeq sequencing at the Milner Centre, University of Bath. gDNA from the remaining 32 isolates was sent to Novogene for Illumina NovaSeq sequencing. gDNA from three isolates was lost during shipping, hence 63 isolates were sequenced with Illumina sequencers.

Nanopore library preparation and sequencing
Sequencing libraries were prepared for all samples using ONT's Rapid Barcoding Kit (SQK-RBK004), according to the manufacturer's instructions, including the optional 1 × SPRI concentration and clean-up step (using Promega ProNex size selection beads) after library pooling and before addition of RAP adaptors. Between 10 and 12 barcodes were used per MinION flow cell (see Table S3 for full details).
Each pooled sequencing library was loaded onto an R9.4 MinION flow cell and sequenced for 48 h using a MinION Mk1b device with MinKNOW sequencing software.

Basecalling, demultiplexing and adaptor trimming
Deepbinner (v0.2.0, [26]) was used to demultiplex the raw fast5 files, using the 'realtime' setting with 'rapid' option. This placed all fast5 files into separate bins, one for each barcode. ONT's Guppy fast Flip-flop basecaller (v3.1.5+781ed57) was then run on each of these barcode bins, with its own rapid barcode demultiplexing option enabled. This placed each of the fastq files basecalled from the fast5s in the Deepbinner barcode bins into further barcode bins, resulting in 12 Deepbinner barcode bin fast5 directories (plus one 'unclassified' bin, containing reads with unclassifiable barcodes), each containing up to 12 further barcode bin fastq Table 1. Details of the New Zealand isolates sequenced here. Further details can be found in Table S1 Isolate  Table 1. Continued directories (plus 'unclassified'). Theoretically, most of the reads from any specific Deepbinner bin should have been placed in a bin corresponding to the same barcode by Guppy, but in some cases the two tools may disagree over the barcode identity; for example, most of the reads in the 'barcode01' Deepbinner bin should have also been placed in the 'barcode01' bin by Guppy after basecalling, but some may have been placed in other bins. Only reads identified as having the same barcode by both tools were retained for further processing; for example, when basecalling the fast5s from Deepbinner's 'barcode01' bin, only fastqs placed in Guppy's 'barcdoe01' bin were kept. Finally, Porechop (v0.2.4, [27]) was used to trim the barcodes and other adaptor sequences from the demultiplexed reads.

Hybrid genome assembly
Illumina MiSeq data was provided pre-trimmed by the Milner Genomics Centre. Illumina NovaSeq data provided by Novogene was trimmed using Trimmomatic (v0.36, [28]) with the options PE, HEADCROP:10, SLIDINGWINDOW:4 : 25 and MINLEN:100. Genome assembly was attempted for all strains using the hybrid pipeline we developed previously [24]. Nanopore reads were pre-corrected using Canu (v1.8, [29]), followed by hybrid assembly with nanopore and Illumina reads using Unicycler (v0.4.7, [30]). However, some of the available Illumina data was of lower quality (reads <70 bp, non-uniform length) or low coverage (<40×). Unicycler uses an Illumina-centric hybrid assembly method, using SPAdes [31] to first assemble the Illumina data into contigs, then the nanopore reads to attempt to bridge the contigs. The low quality of some of the Illumina data therefore prevented Unicycler from assembling closed genomes for these strains. An updated variant of the best long-read-centric hybrid assembly strategy identified in Ring et al. [24] was used to assemble genomes for the strains with low-quality Illumina data (full details of which strategy was used for each strain are shown in Table S1).  For the isolates which required long-read-centric assembly, Canu-corrected nanopore reads were assembled with Flye (v2.7b-b1526, [32]) and polished four times with Racon (v1.4.11, [33]), then polished with Medaka (v0.11.3, https://github.com/nanoporetech/ medaka), both using the nanopore reads alone. Finally, the assembly was polished three times using Pilon (v1.22, [34]) with the short Illumina reads.
Illumina reads were not available for three strains (NZ1, NZ5 and NZ29). Genome assemblies were produced for these strains using nanopore reads alone, using the Flye-Racon-Medaka strategy above, without the Pilon polishing steps. These strains were not included in SNP analysis, phylogenies or allele typing, but they were included in analysis of genome arrangement and numbers of IS elements.
All sequences were rearranged to start with gidA, and reverse complemented where necessary, as described previously [24].

Comparing genome arrangement
The closed genome sequences described above were aligned against each other using progressiveMauve (v20150226 build 10, [35]). The results were manually inspected and grouped into types (within each type, no arrangement differences were visible). A representative of each New Zealand arrangement type was aligned against 29 global strains (Table S4), representing each of the CDC arrangement types defined in Weigand, Peng's [17] landmark B. pertussis genomic arrangement study, in order to place the New Zealand arrangements in a wider global context.

Prediction of pertactin, pertussis toxin and filamentous hemagluttinin deficiency
The final closed genome sequence for each strain was annotated using Prokka (v1.14.6, [38]), with the Tohama I reference proteins for NC_002929.2 from GenBank as a guide. The resulting annotations for prn, ptxA-E and fhaB were screened for presence/ absence, and for insertion of IS 481, IS 1002, or IS 1663. Additionally, the assembled prn sequences were screened for the presence of other mutations previously identified in pertactin-deficient strains [39].

Phylogenies
To place the New Zealand strains in a global context, 89 global strains representing different continents, time periods and allelic profiles were included in the analysis. A further 98 global strains were included in more detailed trees for each of New Zealand's four whooping cough outbreaks since 2004. Illumina reads were downloaded from NCBI's SRA using fasterq-dump (v2.10.5) and trimmed using Trimmomatic (v0.36) using the options HEADCROP:10, SLIDINGWINDOW:4 : 20 and MINLEN:30 (see Table S5 for full details including accession numbers).  [43]. The best maximum-likelihood tree for each dataset was output in Newick format, TreeCollapserCL4 (v3.2, http://emmahodcroft.com/TreeCollapseCL.html) was used to collapse branches with bootstrap values below 70 %, and the resulting tree was then visualized using iTOL [44].

Closed genome sequences were assembled for all isolates
In total, 6 MinION R9.4 RevD flow cells were used to sequence the 66 New Zealand isolates. The amount of usable data per flow cell, after demultiplexing with Deepbinner and Guppy, varied from 3.94 to 12.76 Gb. This corresponded to coverage between 42× and 484× per isolate, with a mean coverage of 207×. The overall mean read length across all sequencing runs was 6282 bp. The individual run mean read lengths ranged from 4908 to 7917 bp.
The coverage and read lengths were sufficient to assemble closed genome sequences for all isolates, 63 in hybrid with Illumina data, and three using nanopore data alone (full details shown in Table S1).

Overall, 19 different genome arrangements were observed in the 66 New Zealand isolates
Closed genome sequences were produced for all 66 New Zealand isolates, and the genome arrangement of these closed sequences was investigated using progressiveMauve. The isolates were grouped based on shared arrangement, resulting in 19 different arrangement groups. Ten isolates displayed 'singleton' arrangements, which were not shared with any other isolate (Fig. S1); the remaining 56 isolates shared nine arrangements between them (shown in Fig. 2a). As seen in Weigand et al. [16], the rearrangements generally displayed a pattern of symmetrical inversions around the origin of replication.
One representative of each arrangement group, including each singleton, was aligned using progressiveMauve against representatives of each of the arrangement types defined in Weigand et al. [17] and Weigand et al. [16] (the representatives are listed in Table S4). This revealed that six of the New Zealand arrangements were congruent with Weigand arrangement types, including two of the New Zealand singletons, NZ10 and NZ48, which were congruent with the CDC Cluster-BP-23 and CDC046 arrangement types, respectively. In total, 30 New Zealand isolates shared the same arrangement type, CDC010. The remaining isolates were spread more evenly in smaller groups across the other eight arrangement types, as shown in Fig. 2b. Full details of the grouped arrangement types are given in Table S6. Arrangement type CDC010 was the fifth most commonly seen arrangement type in Weigand et al. [17], after CDC237, CDC002, singletons and CDC013; CDC237, CDC002 and CDC013 were also observed here.

Shifts in allelic profiles of New Zealand isolates have paralleled those seen elsewhere
MLST was used to identify which alleles of ptxP, ptxA, prn, fim2 and fim3 were present in the 63 New Zealand isolates for which Illumina data was available, using allele definitions from PubMLST or as defined in Bart et al. [7]. The results are shown in Fig. 3. The most common allelic profile in New Zealand in the WCV era, particularly before the 1990s, was ptxP1-ptxA1-prn1-fim2-1-fim3-1 (75 % of 1980s strains, n=8). This then shifted gradually to ptxP3-ptxA1-prn2-fim2-1-fim3-1 ( In total, 35% of all strains, and 89% of strains isolated since 2012, were predicted to be pertactin deficient The closed genome sequences were annotated using Prokka, and the resulting annotations were screened for the presence/ absence of prn, fhaB and ptxA-E, as well as for the insertion of IS 481, IS 1002 or IS 1663 into the same genes.
One additional strain (NZ63) from 2017 was found to have a mutation previously identified in Weigand et al. [17] as causing deficiency, a change from C to T at position 223, resulting in a premature stop codon. In total, therefore, 88.9 % (24/27) of strains isolated in 2012 or later are predicted to be pertactin deficient. This suggests that pertactin deficiency was uncommon in New Zealand B. pertussis strains until 2012, and has rapidly become prevalent since then.  [16,17]. One of the New Zealand isolates with a 'singleton' arrangement (NZ48) was also found to be congruent with CDC046 (not shown), whilst another was congruent with CDC Cluster-BP-23 (NZ10). (b)In total, 30 New Zealand isolates shared the same arrangement type (NZ001/CDC010). Ten isolates had 'singleton' arrangements shared with no other isolate (although one of these arrangements was found to be congruent with CDC046). The remaining 26 isolates were split more evenly between the other eight arrangement types. No FHA or Ptx deficiency caused by IS insertion was predicted in any strain.

New Zealand isolates mainly cluster according to allelic profile rather than outbreak or year of isolation, with some notable exceptions
A phylogeny was inferred using 325 core SNPs for 63 New Zealand isolates (Fig. 4), using Tohama I as a reference. The ptxP1 strains clustered separately from the ptxP3 strains. All non-prn2 strains were contained in the ptxP1 cluster. All predicted Prndeficient strains occurred in the ptxP3 cluster, with all but one of them clustered on the same sub-branch within the ptxP3 cluster. However, the different deficiency-causing prn2 mutations did not cluster together within this sub-branch (for example, the IS 481-1613fwd and IS 481-1613rev strains did not cluster separately), suggesting each mechanism for Prn deficiency has arisen on multiple occasions.
Most arrangement types were spread relatively evenly throughout the phylogeny; however, strains with the NZ002 arrangement type clustered, including the three highly similar strains from the 2004-2006 outbreak. These strains (NZ35, NZ36 and NZ37) were identical in terms of core genome SNPs; comparing the NZ36 and NZ37 Illumina reads to the hybrid assembled NZ35 genome showed that NZ35 and NZ36 were also identical in their whole genomes, whilst NZ37 was only one SNP different.
Likewise, strains with the arrangement type CDC237 clustered, and all were isolated during the most recent (2017-2019) outbreak, representing 40 % (n=10) of all strains sequenced from that outbreak. These four strains (NZ57, NZ58, NZ64 and NZ65) also shared the same mutation in their prn2 gene (IS 481-1613fwd), and three of them (NZ57, NZ64 and NZ65) were identical in terms of core genome SNPs. Comparing the Illumina reads for the four strains to the hybrid assembled whole-genome sequence of NZ64 showed that the cluster is separated by only 1-2 SNPs across the whole genome, with NZ57 and NZ65 being identical across the whole genome. Two of the strains (NZ57 and NZ58) were isolated in the Southern region within the same 2 week period, whilst the other two strains (NZ64 and NZ65) were isolated in the South Canterbury region several months later, but again within 2 weeks of each other.
In general, strains isolated earlier in time were found closer to the top of the tree. All of the 2017-2019 outbreak strains (n=10) occurred on the same major (Prn deficiency) branch, as did all but one of the 2011-2014 outbreak strains (n=14). Six of the 2011-2014 strains clustered on the same sub-branch (NZ50-NZ53, NZ55 and NZ56), and another six clustered on a different but close sub-branch (NZ22-NZ25, NZ47, NZ54).
On the whole, many New Zealand strains were very closely related, including a large group of eight strains (NZ51, NZ52, NZ55, NZ27, NZ56, NZ59, NZ50 and NZ53) collected throughout the 2000s/2010s, which were either identical or separated by only one or two SNPs in their core genome. Variant calling with Snippy, with the hybrid assembled NZ56 genome as a reference and the Illumina reads for the other seven strains, showed that six of the strains (NZ56, NZ51, NZ52, NZ53, NZ55 and NZ27) were separated by no more than three SNPs across their whole genome, whilst the remaining two strains (NZ59 and NZ50) were eleven and six SNPs different from NZ56, respectively.

New Zealand isolates cluster phylogenetically with global strains according to allelic profile, with some clusters of closely related isolates
To determine whether the New Zealand strains were genetically distinct from those circulating elsewhere in the world during 1982-2018, a further phylogeny was inferred from the 63 New Zealand strains and 89 global strains from the same period, using 511 core SNPs with Tohama I as the reference (Fig. 5). Again, all ptxP1 strains clustered on the same branch, and all ptxP3 strains clustered on a separate, larger branch. In general, strains isolated earlier (for example, in the 1980s) clustered at one end of the tree, whilst strains isolated later (for example, the 2010s) clustered at the other end.
During periods for which sequencing data was available from many countries, little geographical clustering was seen. For recent years (since 2014), most of the available sequencing data was for strains isolated in the USA and sequenced by the CDC; this gives the impression of geographical clustering, but is more likely simply due to under-sampling of strains from other countries.
However, certain New Zealand strains do group separately from the majority of the global strains. Groups A-C each contain ten isolates [nine that were isolated in New Zealand, and one that was isolated in either Australia (groups B and C) or Asia (group A)], which are clustered on their own sub-branches of the global tree, separate from the other groups and from the rest of their closest global neighbours. The isolates contained in each group are all strikingly similar to each other in terms of allelic profile, genomic arrangement, presence/absence of specific prn2 deficiency-causing mutations and, where data was available, serotype. Perhaps surprisingly, these samples were not all isolated in the same location, and nor were they always isolated around the same time (although all were isolated since 1999); in group A, for example, the oldest isolate was from 1999, whilst the newest was from 2010. Further details of groups A-C are shown in Table S7.
Group D is a smaller cluster of four strains (NZ57, NZ58, NZ64 and NZ65) which was previously identified in the New Zealand phylogeny. Fig. 4. Phylogenetic tree showing the evolutionary relationships of the New Zealand strains sequenced here, in the context of allele type, Prn deficiency, genomic arrangement and period of isolation. Snippy was used to identify variants between the NZ strains and the reference, Tohama I, as well as defining a core genome. TreeCollapserCL4 was used to collapse branches with bootstrap support < 70%, and IQ-Tree2 was used to infer a maximumlikelihood phylogeny, which was then displayed using iTOL. Key branches are highlighted. All ptxP3 strains are contained within one major branch, whilst a slightly smaller sub-branch contains all but one of the predicted Prn-deficient strains.

Isolates circulating during outbreak years appear to be more closely related to each other than outbreak strains in other countries
Groups A-D in the global phylogeny suggest that certain strains circulating in New Zealand in the 2000s and 2010s were more closely related to each other than those circulating in other countries, particularly during New Zealand whooping cough outbreak periods. To investigate whether this effect was simply due to oversampling of New Zealand strains in the global phylogeny, smaller phylogenies were constructed for each of the three major post-2003 New Zealand outbreaks, with additional global strains included for each outbreak period, and a more recent reference genome, B1917. Fig. 6 shows the focussed phylogeny for the 2004-2006 outbreak, which was constructed from 331 core SNPs. The New Zealand strains are spread across several different branches in the phylogeny, although certain strains are closely related: NZ35, NZ36 and NZ37, as mentioned previously, are identical in terms of core SNPs, and were not all isolated at the same time or location. However, on the same branch as these three New Zealand strains are two Australian strains. These Australian strains are identical to each other and differ from the New Zealand strains by only two core SNPs. Similarly, NZ32 and L508 (also from Australia) are identical to each other in their core genomes, as are two other Australian strains, B048 and L518.  Variants between the selected strains and the Tohama I reference were called, and a core genome was defined, using Snippy. TreeCollapserCL4 was used to collapse branches with bootstrap values <70%, and a maximum-likelihood phylogeny was inferred using IQTree2, and iTOL was used to display the resulting tree. Like the New Zealand strains alone, all strains carrying the ptxP3 allele* are found on the same major branch, separate from the ptxP1 strains. Several clusters containing highly similar strains, almost all from New Zealand, are indicated. *Allele information was not available for some of the more recent global strains.
L1779 and L1780). Four of the New Zealand strains in this group have identical core genomes, and the others differ by one SNP each. These were previously mentioned as part of group C in the global phylogeny (Fig. 5). The Australian strains are not identical to each other, but each is only two SNPs different from the New Zealand strains. Another six of the remaining New Zealand strains (NZ22-NZ25, NZ47 and NZ54) are also found on their own branch, and differ at most by only three SNPs in their core genomes. Interestingly, four of these six strains were isolated later in the outbreak, in 2013 (NZ22 and NZ23) or 2014 (NZ24 and NZ25). All are a part of group B in the global phylogeny (Fig. 5).
Finally, Fig. 8 shows a phylogeny from the most recent outbreak (2017-2019), which was constructed from 207 core genome SNPs. Four highly similar New Zealand strains mentioned previously (NZ57, NZ65, NZ64 and NZ58) cluster on their own branch. The remaining New Zealand strains, however, cluster on branches with strains from Australia and/or the USA.

DISCUSSION
New Zealand B. pertussis strains were investigated here for a number of reasons. Historically, vaccine uptake in New Zealand has been lower than in many other countries, although efforts to monitor and increase rates of uptake since 2008 mean that vaccine coverage is now comparable to that in the UK. Despite the increasing vaccine coverage, whooping cough incidence has remained higher in New Zealand than in most other countries, including the USA and UK. An outbreak occurred in New Zealand from Variants were called using recent reference strain B1917, and core genomes defined using Snippy. Branches with <70% bootstrap support were collapsed using TreeCollapserCL4, then phylogeny was inferred using IQ-Tree2 and visualized using iTOL. 2017 to 2019, for example -a period when other countries did not note a similar increase in cases. Although isolates have been collected and stored by New Zealand's ESR since the 1980s, no sequencing of any B. pertussis from New Zealand had previously been conducted. Therefore, the genomes of 66 isolates from between 1982 and 2018 were sequenced here, using hybrid nanopore and Illumina sequencing, to determine the following: whether global genomic trends observed in B. pertussis had been delayed by New Zealand's slower vaccine uptake; whether the strains circulating during whooping cough outbreaks were polyclonal, as seen in recent outbreaks in the UK and USA; and whether the consistently higher incidence of whooping cough in New Zealand could potentially be explained by the circulation of a unique hypervirulent strain.

Allelic profile and antigen deficiency trends in New Zealand generally match those observed elsewhere in the world
Analysis of the allelic profile of the ACV-related genes in the New Zealand isolates reveals a pattern in the New Zealand strains similar to that seen in Bart et al. 's landmark 2014 study of global strains throughout the 20th and early 21st century, as well as other studies of how B. pertussis populations have changed over recent years [2,7,45,46]. The most common global allelic profile during the 'WCV era' (defined as 1960-1995) was ptxP1-ptxA1-prn1-fim2-1-fim3-1 [7]. The most common allelic profile of the New Zealand strains from 1982 to 1995 was the same, and was present in 50 % (n=12) of the strains screened; the other 50 % of strains were split between those carrying ptxP3, and different prn alleles. In the Bart et al. 'WCV/ACV era' and ' ACV era' (post-1995), the most prevalent allelic profile shifted to ptxP3-ptxA1-prn2-fim2-1-fim3-1. An increased frequency of strains carrying the newer fim3-2 allele was also noted (from <1 % frequency in the WCV era to 37 % in the ACV era), with ptxP3-ptxA1-prn2-fim2-1-fim3-2 being observed as the dominant profile in the late 2010s [47]. Again, the most common allelic profile seen in the New Zealand strains throughout the period matches that seen elsewhere in the world. Noted shifts, such as ptxP1 to ptxP3 and prn1 to prn2, also appear to have happened in New Zealand on similar timescales to the rest of the world. Interestingly, although the progression of fim3-2 in the New Zealand strains reflects that seen in Bart et al. [7] throughout the WCV and early ACV eras, rising from 0 % of strains in the 1980s and 1990s to 30.4 % (7/23 strains) in the 2000s, a decline in the frequency of this allele is apparent in the strains from the 2010s (to 9.7 %, 3/31 strains, Fig. 3). The most recent strains in the Bart et al. screen were isolated in 2010; it is therefore possible that a similar decrease in prevalence would be seen in more recent global strains, rather than being a New Zealand-specific phenomenon. This is supported by studies such as Bowden et al. [6], which note a re-emergence of the fim3-1 allele since 2010.
Another shift observed globally in the ACV era has been the rapid increase in strains, which do not express functional pertactin protein. In Australia, for example, the percentage of Prn-deficient strains increased from 5-78 % over the period 2008 to 2012 [3].
A study of strains from European countries isolated between 1998 and 2015 showed a clear correlation between how early each country introduced the ACV and the current proportion of Prn-deficient strains [12]. The ACV was introduced in New Zealand in 2000, several years before some European countries, including the UK. Although the first Prn-deficient strains did not appear in the New Zealand cohort studied here until 2012, later than in some countries, the proportion of Prn-deficient strains rapidly increased; 88.9 % of strains since 2012 are Prn-deficient, including 90 % of those isolated in 2017 and 2018. This frequency is higher than for any of the European strains included in Barkoff et al. 's study [12], although it is similar to the frequencies observed in Australia (78 %) and the USA (85 %) [3,11]. The majority of the predicted Prn-deficient New Zealand strains (95.8 %, n=24) have an insertion of IS 481 in their prn gene, and one (NZ63) possesses a SNP, which results in a premature stop codon. Other mechanisms for Prn deficiency have been observed in other studies (for example, 41), such as a commonly observed inversion of 22 kb containing the prn promoter, but were not observed here. Yet further mechanisms for Prn deficiency may not have been identified from sequence alone as, in previous studies, the mechanism for deficiency has not always been identifiable. The percentage of Prn-deficient strains in New Zealand may therefore be even higher than predicted. Expression or non-expression of antigens such as Prn, Ptx and FHA is often tested in vitro, for example by Western blot, but only heat-killed cells were available here. Nonetheless, it should be noted that the use of hybrid genome assembly allowed the prediction from sequence alone of Prn deficiency caused by the most common deficiency mutations.
Overall, New Zealand's historically lower coverage of the pertussis vaccine does not seem to have significantly delayed the most commonly observed recent genomic changes, unlike countries that have been slower to take up the vaccine, such as China [45].
The B. Pertussis strains circulating in New Zealand appear to be more clonal than in many other countries A variety of global B. pertussis phylogenies have been produced over the last decade, including the landmark Bart et al. 2014 study [7]. Some of the key patterns from these phylogenies, for example the clear branching of ptxP3 from the ptxP1 strains, were also observed in the global phylogeny here (Fig. 5), with the New Zealand isolates clustering with the sequences from the rest of the world. Another discovery of Bart et al. was that there was very little geographic clustering of strains; strains from all around the world were spread throughout the phylogeny. Likewise, Sealey's [48] investigation of UK strains showed little geographic specificity. In addition, studies of specific outbreaks in the UK and USA showed that strains circulating during outbreaks were polyclonal, and not characterized by a single outbreak strain [2,6]. Conversely, an outbreak in Australia in the late 2000s and early 2010s was found to be caused primarily by the circulation of a group of highly related strains [4].
Although some of the New Zealand isolates cluster throughout the global phylogeny along with strains from other countries, some notable groups (labelled groups A-D in Fig. 5) of highly similar New Zealand isolates stand apart. Interestingly, the highly similar strains were not always isolated during the same year, often not even during the same outbreak, and all are from the post-2000 era. In each group, the strains are not only similar in terms of core genome SNPs, but also in serotype (where tested), Prn deficiency and, usually, genomic arrangement.
In case the clustering of New Zealand strains on the global tree was due to overrepresentation of New Zealand sequences compared to any other country, further phylogenies were constructed for the three major outbreaks since 2000, each including a greater number of global strains from the outbreak period, and each using a more recent reference genome (B1917) than the original global tree. As New Zealand's closest neighbour, it seemed possible that strains from Australia would show most similarity to New Zealand strains, so extra Australian strains were included where possible. In each of the trees, strains from Australia did indeed cluster most closely with the New Zealand strains, particularly during the 2004-2006 and 2011-2014 outbreaks. Three closed B. pertussis genome sequences were also available for Australian isolates, sequenced by Fong et al. [49]; the genomic arrangements of these three strains were compared to each of the identified New Zealand arrangement types, including the singletons, to further investigate the apparent close similarities between isolates from the two countries. However, only one of the Australian isolates shared an arrangement with any of the New Zealand isolates: CIDM-BP2 and NZ48 both belonged to cluster CDC046, first identified by Weigand et al. [16,17]. Like eight of the New Zealand isolates, two of the Australian isolates had singleton arrangements, not yet seen in any other isolates globally. Seemingly, whilst some isolates from Australia are highly similar to those from New Zealand, both countries also have circulating isolates, which are more geographically unique.
Whilst not indicating that the outbreaks in New Zealand have been caused by a single hypervirulent strain, the phylogenies do show evidence of some geographic specificity, and that many of the New Zealand strains are more closely related to each other than to strains from other countries except, perhaps, Australia. The phylogenetically close strains usually also share other similarities identified here, such as genome arrangement, serotype and prn mutation. Not all New Zealand strains fall into clustered groups, however, as some do occur throughout the phylogenies in clusters with strains from other countries.
The results presented here contribute to a growing body of knowledge on B. pertussis genomics, revealing changes that have occurred over time, influenced by the introduction and alteration of vaccines, as well as representing the first New Zealand B. pertussis isolates to be sequenced. This study also shows for the first time the utility of nanopore sequencing in conducting rapid and affordable B. pertussis strain screens, in combination with Illumina sequencing. The benefits of closed genome sequences for comparative genomics are demonstrated by the ability to group isolates not only by allelic profile, but also by genomic arrangement type. With a growing number of closed genome sequences to which the arrangement of future isolates can be compared, along with evidence that differences in genomic arrangement may influence the phenotypic behaviour of isolates [15], the ability to easily produce closed genome sequences is becoming increasingly useful. Finally, we were able to use the closed genome sequences to predict pertactin deficiency; in light of globally increasing antigen deficiency, this ability, particularly in the absence of live B. pertussis cells, could be especially beneficial in adding to our understanding of this ongoing trend.

Funding information
This work was funded by the University of Bath and Oxford Nanopore Technologies. The New Zealand isolates were originally collected, stored, prepared and shipped by the Institute of Environmental Science and Research, funded by the New Zealand Ministry of Health with the cooperation of the local diagnostic laboratories.