Phase variable DNA repeats in Neisseria gonorrhoeae influence transcription, translation, and protein sequence variation

There are many types of repeated DNA sequences in the genomes of the species of the genus Neisseria, from homopolymeric tracts to tandem repeats of hundreds of bases. Some of these have roles in the phase-variable expression of genes. When a repeat mediates phase variation, reversible switching between tract lengths occurs, which in the species of the genus Neisseria most often causes the gene to switch between on and off states through frame shifting of the open reading frame. Changes in repeat tract lengths may also influence the strength of transcription from a promoter. For phenotypes that can be readily observed, such as expression of the surface-expressed Opa proteins or pili, verification that repeats are mediating phase variation is relatively straightforward. For other genes, particularly those where the function has not been identified, gathering evidence of repeat tract changes can be more difficult. Here we present analysis of the repetitive sequences that could mediate phase variation in the Neisseria gonorrhoeae strain NCCP11945 genome sequence and compare these results with other gonococcal genome sequences. Evidence is presented for an updated phase-variable gene repertoire in this species, including a class of phase variation that causes amino acid changes at the C-terminus of the protein, not previously described in N. gonorrhoeae.


Introduction
In Neisseria gonorrhoeae, the causative agent of gonorrhoea, DNA repeats are intimately linked to the biology of the organism. N. gonorrhoeae, and the closely related bacterial species Neisseria meningitidis, undergo phase-variable stochastic switching of gene expression for several surface structures, contributing to antigenic variation and immune evasion as well as niche adaptation in the course of infection (Bhat et al., 1991; Moxon et al., 2006; Carbonnelle et al., 2009; Srikhanta et al., 2009; Omer et al., 2011). Phase variation is mediated by simple sequence repeats associated with genes. In the species of the genus Neisseria the vast majority of these genes contain homopolymeric tracts within their coding sequences (Snyder et al., 2001).
Comparative sequence analysis between a single N. gonorrhoeae and several N. meningitidis genome sequences identified over 100 potentially phase-variable genes (Snyder et al., 2001), some of which have since been demonstrated experimentally to be phase-variable (Jordan et al., 2005). Transcriptional and translational phase variation have been extensively studied in the species of the genus Neisseria. However, an additional class of simple sequence repeat-mediated phase variation has been described in Helicobacter canadensis following whole-genome analysis (Snyder et al., 2010). Simple sequence repeat-mediated changes in the presence or absence of C-terminal cell wall attachment motifs have also been described in Streptococcus agalactiae (Janulczyk et al., 2010). In N. meningitidis, a gene fusion between pglB2 and the downstream phosphoglycosyltransferase gene appears to be mediated by a poly-A repeat tract (Viburiene et al., 2013).
With the availability of additional gonococcal genome sequences, the gonococcal phase-variable repertoire has here been re-assessed. As a result, phase variation in which repeats at the 3′ ends of genes mediate changes in the C-terminal sequence of the proteins is described as part of a refined phase-variable gene repertoire.
Identification of repeat variation within N. gonorrhoeae strain NCCP11945. N. gonorrhoeae strain NCCP11945 was grown on GC agar (Oxoid) with Kellogg's (Kellogg et al., 1963) and 5 % Fe(NO3)3 supplements, either at 37 °C in a candle tin for 8 weeks with passages to fresh agar plates every 2 days, or at 37 °C in 5 % CO2 for 20 weeks with passages to fresh agar plates every 2-3 days. At each passage, cells were scraped from the plate and resuspended in 1 ml of GC broth to a turbidity equivalent to a 0.5 McFarland standard before inoculation onto fresh plates using a sterile cotton swab. DNA was extracted from these resuspensions using the Puregene Yeast/Bacterial kit (Qiagen). A sample (1 µg or 100 ng) of the DNA was genome sequenced using the Ion Personal Genome Machine, Ion Express Fragment Library kit, Ion Express Template kit, and Ion Sequencing kit (Life Technologies) or using the Illumina-based methods of the MicrobesNG service (microbesng.uk). Sequence read data was interpreted using Galaxy on usegalaxy.org (Afgan et al., 2016). Briefly, the reference sequence (NC_011035.1), Ion Torrent data for 8-week passages (KU1-4, KU1-45), Ion Torrent data for 20-week passages (KU1-95, KU1-96), and Illumina data for 20-week passages (2928-NS1_1 & 2929-NS1_2 and 2929-NS2_1 & 2929) were uploaded to Galaxy. The Ion Torrent BAM format files were converted to FASTQ format using BAMTools Convert (Barnett et al., 2011). FASTQ Groomer was used on all NGS data (Blankenberg et al., 2010). Bowtie2 was used to map the reads against the reference (Langmead et al., 2009; Langmead et al., 2012) before visualisation using the Integrative Genomics Viewer (Robinson et al., 2011; Thorvaldsdóttir et al., 2013).

Impact Statement
Phase variation plays a vital role in the ability of Neisseria gonorrhoeae to adapt to the various niche environments encountered. Through stochastic switching in the expression of key genes and regulatory systems, mediated by simple sequence repeats, the population of bacteria is diverse and readily able to survive in the face of selective pressures. Not all simple sequence repeats within the genome mediate phase variation. Previous investigations have sought to define the phase-variable repertoire of the species of the genus Neisseria and have identified a large number of candidates using a small number of genome sequences. With the availability of more genome sequence data and additional experimental data, we have refined the original repertoire to include those genes most likely to be phase-variable in N. gonorrhoeae. As these genes are important for survival, their definition as phase-variable is important for understanding pathogenesis and for potential future therapies. The advent of high-throughput sequencing has the potential to reveal additional cases of within-strain variation in repeat tracts, supporting phase-variable candidacy of genes.

Phase variable genes
The phase-variable gene repertoire of N. gonorrhoeae strain NCCP11945 was investigated and compared against gonococcal strains FA1090, FA19, FA6140, 35/02, and MS11 to assess the presence of similar repeat tracts across the species and variations in repeat tract lengths between strains.
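As an illustration of how candidate repeats of this kind might be located in a genome sequence, the following is a minimal sketch rather than the method used in this study. The function name and the thresholds (homopolymers of at least 7 bases, tandem repeats of at least 3 copies with units of up to 6 bases) are assumptions chosen for the example.

```python
import re


def find_repeat_tracts(seq, min_homopolymer=7, max_unit=6, min_copies=3):
    """Return sorted (start, unit, copies) tuples for candidate simple sequence repeats.

    Thresholds are illustrative assumptions, not values taken from the study.
    """
    seq = seq.upper()
    tracts = []
    # Homopolymeric tracts such as poly-G or poly-C.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (min_homopolymer - 1), seq):
        tracts.append((m.start(), m.group(1), len(m.group(0))))
    # Tandem repeats with units of 2..max_unit bases, e.g. AAGC or CTTCT.
    for size in range(2, max_unit + 1):
        pattern = r"([ACGT]{%d})\1{%d,}" % (size, min_copies - 1)
        for m in re.finditer(pattern, seq):
            unit = m.group(1)
            # Skip units that are themselves built from a shorter repeat
            # (e.g. GG or ACAC), which are reported at the shorter unit size.
            if any(unit == unit[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            tracts.append((m.start(), unit, len(m.group(0)) // size))
    return sorted(tracts)


# Toy sequence containing a (G)8 tract and an (AAGC)3 repeat.
example = "TTAC" + "G" * 8 + "TCAT" + "AAGC" * 3 + "TTGA"
print(find_repeat_tracts(example))
```

Because the scan reports the leftmost match, the unit of a tandem repeat may be returned as a rotation of the conventional name (for example GCAA rather than AAGC), depending on the flanking sequence.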
Transcriptional phase variation is mediated by repeats within or associated with the promoter region (Fig. 1a). Changes in the repeat alter the level of transcription of the gene, as in fetA (frpB; NGK_2557), where differences in the length of the poly-C homopolymeric tract between the −10 and −35 promoter regions alter expression (Carson et al., 2000). There are three transcriptional phase-variable genes in N. gonorrhoeae strain NCCP11945 (Table 1): fetA (NGK_2557), a lipoprotein (NGK_2186), and porA (NGK_0906/NGK_0907), although in gonococci porA does not have an intact coding region. Variation in the repeats between gonococcal strains is found for all three transcriptional phase-variable genes (Table 1).

Fig. 1. (a): Transcriptional phase variation, in which changes in a repeat tract alter the facing and spacing of the −10 and −35 promoter elements and the level of transcription of the gene. Phase variation of fetA is used as an example, where it has been shown that differences in spacing of the −10 and −35 elements due to changes in the poly-C repeat tract alter expression levels, represented by the widths of the arrows (Carson et al., 2000). (b): Translational phase variation, in which changes in a repeat tract towards the 5′ end of the coding sequence alter the reading frame of a coding region and switch expression on and off due to frame-shift. Phase variation of pilC is used as an example, where it has been shown that changes in the poly-G repeat tract generate frame-shifts which switch protein expression on and off (Jonsson et al., 1991). (c): C-terminal phase variation, in which changes in a repeat tract towards the 3′ end of the coding sequence alter the reading frame of a coding region and switch the encoded C-terminal amino acids between the three reading frames. In the example NGK_1211, two of the reading frames result in different C-terminal ends to the protein, while the third generates a fusion with the downstream coding sequence, NGK_1212. Only some examples of C-terminal phase variation result in this type of fusion (Table 3).

Table footnotes. #Gene phase variation candidacy in N. gonorrhoeae. Known, phase variation has been reported in the literature. Yes, there is evidence of repeat tract variation between strains supporting phase variation. **This coding sequence appears to be frame-shifted and annotated as two coding sequences. ††NGK_2186 and NGO2047 annotations are on opposite strands.
Most common in the species of the genus Neisseria is translational phase variation where, as in pilC, the repeat is within the 5′ portion of the coding region of the gene (Fig. 1b). Changes in the repeat tract generate frame-shift mutations in two of the three open reading frames, with the gene only being translated into protein when the repeat tract length puts the gene in-frame. Whilst many phase-variable genes in the species of the genus Neisseria contain homopolymeric tracts, some experience copy number changes in repetitive sequences, such as the CTTCT repeat in opa (Muralidharan et al., 1987; Bhat et al., 1991) or the AAGC repeat in autA (Peak et al., 1999; Arenas et al., 2015). In the N. gonorrhoeae strains examined here, the AAGC repeat in virG (NGK_0804) is only present in two or three copies (Table 2), rather than several copies as in NGK_0831a and autA (NGK_2082). Although virG has a low copy number for the repeat, variations between strains are observed and strains with many copies may yet be identified [there are currently none >(AAGC)3 in the NCBI nr/nt or wgs databases]. It is therefore placed amongst the phase-variable genes, even though its phase variation may occur at low frequency or be a strain-specific effect. There are 36 translational phase-variable genes in N. gonorrhoeae based on the strains examined (Table 2).
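The on/off logic of translational phase variation reduces to arithmetic modulo three on the number of bases gained or lost in the repeat. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the tract lengths taken as the in-frame ("on") state are invented for the example, not measured values for any gene.

```python
def reading_frame_shift(on_copies, copies, unit_len=1):
    """Frame offset (0, 1 or 2) relative to the in-frame ('on') repeat copy number.

    unit_len is the repeat unit size in bases: 1 for a homopolymeric tract,
    4 for AAGC, 5 for CTTCT.
    """
    return ((copies - on_copies) * unit_len) % 3


def is_on(on_copies, copies, unit_len=1):
    """True when the coding sequence downstream of the repeat is in frame."""
    return reading_frame_shift(on_copies, copies, unit_len) == 0


# Hypothetical poly-G tract that is in frame at 12 bases.
for n in (11, 12, 13, 15):
    print(n, "on" if is_on(12, n) else "off")

# Hypothetical CTTCT repeat in frame at 9 copies: each copy adds 5 bases,
# so the frame is restored only after a change of three copies.
for n in (9, 10, 11, 12):
    print(n, "on" if is_on(9, n, unit_len=5) else "off")
```

For a homopolymeric tract, one reading frame in three is on, which matches the description above of frame-shifts in two of the three frames. For repeat units whose length is not a multiple of three, such as AAGC and CTTCT, the same rule applies per copy gained or lost.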
In addition, a third class of repeat-mediated phase-variable gene was identified (Snyder et al., 2010). In these C-terminal phase-variable genes, a repeat tract towards the 3′ end of the coding region is able to alter the sequence at the C-terminus of the encoded protein (Fig. 1c). In N. gonorrhoeae strain NCCP11945, four of these C-terminal phase-variable genes were identified (Table 3). It is likely that in the case of the pilin sequence (NGK_2161), changes in the repeat cause pilus protein changes, mediating antigenic variation through a phase-variable mechanism. Comparisons also show repeat tract variation in a membrane protein (NGK_1211) and a mafB cassette (NGK_1624), supporting C-terminal phase variation in the species of the genus Neisseria. Variations in the products of mafB cassettes are believed to contribute to competition between species within the niche (Jamet et al., 2015). Although no variation was observed in these strains in ispH (NGK_0106), (G)8 repeats are known to vary in lgtC (NGK_1632), hpuA (NGK_2581), and hsdS (NGK_0571) (Table 2); therefore it is highly likely that the repeat in ispH also has the capacity to vary.
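The effect of a 3′ repeat on the C-terminus can be illustrated by translating the same coding sequence with different tract lengths. The following is a minimal sketch using the standard genetic code. The sequences are toy examples constructed for this illustration and do not correspond to NGK_1211 or any other gene, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def c_terminal_variants(upstream, base, lengths, downstream):
    """Map each tract length to the protein encoded with that tract length."""
    return {n: translate(upstream + base * n + downstream) for n in lengths}


# Toy gene: in-frame start, a poly-G tract, then a 3' region with a stop
# codon in each of the three reading frames.
variants = c_terminal_variants("ATGGCA", "G", [6, 7, 8], "TACTGATCTAAGCTGA")
for n, protein in variants.items():
    print(n, protein)
```

In this toy case each tract length yields a different C-terminal tail. If a frame contained no stop codon before the next coding sequence, translation would continue into it, which is the situation described above for the NGK_1211 and NGK_1212 fusion.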
A number of previously reported candidates are not supported by evidence of phase variation, based on the absence of tract length changes between the strains (Table 4). For example, although tract variation was reported for cvaA (NGK_0168), mafA-3 (NGK_2270), and dca (NGK_1830) in N. meningitidis (Martin et al., 2003), there are no changes observed in the short (C)4 and (G)5 tracts in these genes in N. gonorrhoeae (Table 4). They are therefore unlikely to be phase-variable in this species. Likewise, neither of the dinucleotide-repeat-containing genes (NGK_1607 and NGK_2274) shows variation (Table 4); dinucleotides are not likely to be phase-variable in the species of the genus Neisseria (Martin et al., 2003). All of these genes contain short repeats that do not vary or alternative nucleotide sequences in the strains investigated (Table 4).

Table footnotes. ‡From the N. gonorrhoeae strain FA19 genome sequence (NZ_CP012026.1). §From the N. gonorrhoeae strain FA6140 genome sequence (NZ_CP012027.1). #Gene phase variation candidacy in N. gonorrhoeae. Known, phase variation has been reported in the literature. Yes, there is evidence of repeat tract variation between strains supporting phase variation. **This coding sequence appears to be frame-shifted and annotated as two coding sequences. NP, the coding sequence is not present in this strain. (Chen et al., 1998; Shafer et al., 2002; Adamczyk-Poplawska et al., 2011). ††There is a 400 bp deletion in this strain encompassing the region that would contain this repeat.
Combined with the previous data on repeat variation within and between gonococcal strains and demonstration of phase variation (Sparling et al., 1986; Yang et al., 1996; Lewis et al., 1999; Snyder et al., 2001; Power et al., 2003; Jordan et al., 2005; Srikhanta et al., 2009), a revised repertoire of 43 transcriptional (Table 1), translational (Table 2), and C-terminal (Table 3) phase-variable genes is proposed for N. gonorrhoeae as a species. This is fewer than previous predictions (76 in Jordan et al., 2005) and thus far two-thirds (67 %, 29 out of 43) have been experimentally demonstrated to be phase-variable (Tables 1, 2). The additional 14 genes, 13 of which show strain-to-strain repeat variation, require additional investigation.
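The totals quoted above can be cross-checked against the per-class counts given in the text (three transcriptional, 36 translational, and four C-terminal genes). The short sketch below only restates those published numbers; the variable names are arbitrary.

```python
# Per-class counts as stated in the text (Tables 1-3).
repertoire = {"transcriptional": 3, "translational": 36, "C-terminal": 4}

total = sum(repertoire.values())          # revised repertoire size
demonstrated = 29                         # experimentally demonstrated genes
remaining = total - demonstrated          # genes requiring further investigation
percent_demonstrated = round(100 * demonstrated / total)

print(total, remaining, percent_demonstrated)
```

The sum of the three classes gives the repertoire of 43 genes, and 29 of 43 rounds to the 67 % reported, leaving the 14 genes that require additional investigation.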

Phase variable repeat copy number variation in vitro
Previously, for H. canadensis, 454 and Illumina genome sequence read data was used to support candidacy of phase-variable genes (Snyder et al., 2010). In the present study, Ion Torrent and Illumina genome sequence read data from N. gonorrhoeae strain NCCP11945 that had been passaged in the laboratory for 8 weeks or for 20 weeks was analysed for changes to phase-variable repeats for the 14 genes for which there is no within-strain evidence of phase variation (Tables 1-3). [...] phase-variable genes pilC1, opa, and fetA, suggesting that read data can support phase variability by demonstrating within-strain variation in tracts (Table 5). Of the 14 genes, only virG (NGK_0804) and the pyrimidine 5′-nucleotidase (NGK_2176) showed no changes in repeats (Table 5). The virG (AAGC)3 copy number is probably too low to vary; however, there may be strains with a greater copy number in which it would. Likewise, the poly-C repeat in NGK_2176 has been replaced with CAAACCCC in strain NCCP11945 and therefore would not be expected to phase vary in this strain; however, phase variation is likely in strain MS11, for example.

Table footnote. The region of the coding sequence containing the repeat tract does not have homology to the aligned region in these strains.
The Ion Torrent sequencing technology has been criticised for generating homopolymer-associated indels (Loman et al., 2012) and for calling tracts incorrectly at more than eight bases (Quail et al., 2012), the optimal length for phase variation. Homopolymeric tracts in Illumina data are believed to be less error prone (Schirmer et al., 2015). However, repeat sequence data from Illumina often agreed with Ion Torrent on the presence of variation (9 of 14 genes with variation in Ion Torrent, Table 5). When the Illumina data did not show repeat variation, this often corresponded to relatively low read coverage of the region compared to the Ion Torrent data (Table S2).
It is currently impossible to differentiate genuine biologically induced indels from sequencing errors (Narzisi & Schatz, 2015). What is ascribed to error may also include subtle changes that are constantly being generated within the bacterial population. From these data, the expected biological variation supporting phase variation appears to be present in N. gonorrhoeae strain NCCP11945 for 12 as yet unexplored genes.
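Within-strain variation of the kind assessed here amounts to tallying the repeat length observed in each read that spans a tract. The sketch below shows one way this could be done on read sequences directly; it is not the Galaxy and genome viewer workflow used in this study. The function name and flanking sequences are hypothetical, and reads are assumed to be already oriented on the reference strand.

```python
import re
from collections import Counter


def tract_length_counts(reads, left_flank, right_flank, unit):
    """Count repeat copy numbers seen between two flanking sequences.

    Only reads containing both flanks (i.e. spanning the whole tract) are
    counted. The flanks should not end or begin with the repeat unit itself.
    """
    pattern = re.compile(
        re.escape(left_flank) + "((?:" + re.escape(unit) + ")*)" + re.escape(right_flank)
    )
    counts = Counter()
    for read in reads:
        m = pattern.search(read.upper())
        if m:
            counts[len(m.group(1)) // len(unit)] += 1
    return counts


# Toy reads spanning a hypothetical poly-G tract flanked by AC and TC.
reads = [
    "TTAC" + "G" * 8 + "TCAT",
    "AC" + "G" * 7 + "TCA",
    "TTAC" + "G" * 8 + "TC",
    "AAAA",  # does not span the tract and is ignored
]
print(tract_length_counts(reads, "AC", "TC", "G"))
```

A tally with more than one well-supported length would be consistent with within-strain tract variation, subject to the caveat above that such counts cannot by themselves separate genuine indels from homopolymer sequencing errors.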

Conclusion
In conclusion, N. gonorrhoeae possesses three different mechanisms for phase variation: transcriptional, translational, and C-terminal. Stochastic systems play important roles in the biology of the organism, given the variety and number of genes involved. The functions of previously unexplored phase-variable genes, including one transcriptional phase-variable gene, nine translational phase-variable genes, and four C-terminal phase-variable genes, require further investigation.