Phasevarions of bacterial pathogens – phase-variable epigenetic regulators evolving from restriction–modification systems

Phase-variable DNA methyltransferases control the expression of multiple genes via epigenetic mechanisms in a wide variety of bacterial species. These systems are called phasevarions, for phase-variable regulons. Phasevarions regulate genes involved in pathogenesis, host adaptation and antibiotic resistance. Many human-adapted bacterial pathogens contain phasevarions. These include leading causes of morbidity and mortality worldwide, such as non-typeable Haemophilus influenzae, Streptococcus pneumoniae and Neisseria spp. Phase-variable methyltransferases and phasevarions have also been discovered in environmental organisms and veterinary pathogens. The existence of many different examples suggests that phasevarions have evolved multiple times as a contingency strategy in the bacterial domain, controlling phenotypes that are important in adapting to environmental change. Many of the organisms that contain phasevarions have existing or emerging drug resistance. Vaccines may therefore represent the best and most cost-effective tool to prevent disease caused by these organisms. However, many phasevarions also control the expression of current and putative vaccine candidates; variable expression of antigens could lead to immune evasion, meaning that vaccines designed using these targets become ineffective. It is therefore essential to characterize phasevarions in order to determine an organism's stably expressed antigenic repertoire, and rationally design broadly effective vaccines.


Introduction
Bacterial DNA methyltransferases and epigenetic regulation
Epigenetics is the study of heritable gene expression changes that occur without changes in the DNA sequence [1]. Many mechanisms of epigenetic gene regulation have been well studied in eukaryotes, including histone modification and genomic imprinting [1]. DNA methylation is probably the best-studied epigenetic regulatory mechanism, and adenine methylation is the most common form of DNA methylation in bacteria [2]. DNA adenine methyltransferase (Dam) is a well-established example of epigenetic regulation in bacteria. There is strong evidence that Dam regulates genes by methylating specific target sites in promoter regions, where methylation competes with the binding of regulatory proteins [3]. For example, variable expression of the Pap pilus and of antigen 43 in Escherichia coli is mediated by Dam methylation of the promoter regions of the pap operon and the agn43 gene, respectively. This methylation alters the ability of the regulatory proteins Lrp (at pap) and OxyR (at agn43) to bind DNA [4]. Another well-studied example of bacterial epigenetic regulation by adenine methylation is mediated by the cell cycle-regulated methyltransferase (CcrM), whose regulated expression controls the cell cycle of a number of Alphaproteobacteria [3]. The functions of solitary DNA methyltransferases such as Dam and CcrM have been reviewed in detail previously [3][4][5]. In summary, the expression of these methyltransferases does not phase-vary; they generate a static pattern of methylation, upon which stochastic switches can evolve through competition between methylation by Dam or CcrM and the binding of DNA-binding proteins at the same sites.

Bacterial restriction-modification (R-M) systems
In addition to solitary DNA methyltransferases such as Dam and CcrM, many bacterial DNA methyltransferases exist as part of R-M systems. These are classically thought of as bacterial immune systems, protecting the cell from foreign DNA, typically bacteriophage [6], but have also been demonstrated to perform several additional roles, including speciation and epigenetic regulation [7]. R-M systems are made up of restriction enzymes (R), which cleave DNA in a sequence-specific manner, and a cognate methyltransferase enzyme (M), which methylates the same sequences that are cleaved by the restriction enzyme, protecting 'self' DNA from degradation. There are four main classes of R-M systems found in bacteria (Fig. 1), each differing in subunit composition, sequence specificity, cleavage position and cofactor requirements [8].
Type I systems consist of co-transcribed hsdR, hsdM and hsdS genes, which encode restriction (R), methyltransferase (M) and specificity (S) subunits, respectively (Fig. 1a) [9]. In Type I systems, the hsdR and hsdM genes are highly conserved, whereas the DNA sequence that is restricted/methylated is dictated by the variable hsdS specificity subunit. Each hsdS gene encodes two half-target recognition domains (TRDs; the 5′ TRD and the 3′ TRD), each of which contributes half of the overall methylation specificity of the encoded HsdS protein. Each TRD recognizes a 3-4 bp sequence, and the two TRDs are separated by a central spanning domain, so that the two recognized half-sites are separated by a short, non-specific spacer. Therefore, if the region encoding the 5′ TRD or the 3′ TRD is exchanged for a different 5′ or 3′ TRD, the DNA target sequence recognized by the HsdS subunit changes, leading to restriction/methylation at different sequences [9]. An active restriction enzyme is made up of an R₂M₂S pentamer, whereas an M₂S trimer is an active, stand-alone methyltransferase.
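The bipartite recognition logic described above can be illustrated with a toy model. The Python sketch below is not derived from any real HsdS protein: the half-site sequences, the 6 bp spacer and the names HsdS, find_sites and revcomp are hypothetical. It builds a target pattern from a 5′ TRD half-site, a non-specific spacer and a 3′ TRD half-site, searches both strands of a sequence, and shows that exchanging only the 3′ TRD changes which sites are recognized.

```python
import re
from dataclasses import dataclass

# Complement table used to reverse-complement DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class HsdS:
    """Toy Type I specificity subunit; half-sites and spacer are hypothetical."""
    trd5: str    # half-site read by the 5' TRD
    spacer: int  # non-specific bases between the two half-sites
    trd3: str    # half-site read by the 3' TRD

    def target_regex(self) -> str:
        return f"{self.trd5}[ACGT]{{{self.spacer}}}{self.trd3}"

    def site_length(self) -> int:
        return len(self.trd5) + self.spacer + len(self.trd3)


def find_sites(hsds: HsdS, seq: str) -> list[int]:
    """0-based top-strand start positions of target sites on either strand."""
    pattern = re.compile(f"(?={hsds.target_regex()})")  # lookahead allows overlapping hits
    top = [m.start() for m in pattern.finditer(seq)]
    rc = revcomp(seq)
    # A hit starting at p on the reverse strand maps back to len - p - site_length.
    bottom = [len(seq) - m.start() - hsds.site_length() for m in pattern.finditer(rc)]
    return sorted(set(top + bottom))


# Hypothetical specificities sharing a 5' TRD but differing in the 3' TRD.
spec_a = HsdS(trd5="GAC", spacer=6, trd3="TTCA")
spec_b = HsdS(trd5="GAC", spacer=6, trd3="CCTG")

seq = "TT" + "GAC" + "AAAAAA" + "TTCA" + "TT"
print(find_sites(spec_a, seq))  # [2]
print(find_sites(spec_b, seq))  # []
```

Because each half-site is specified independently, every combination of a 5′ and a 3′ TRD defines its own target sequence, which is the property exploited by the phase-variable Type I systems discussed in this review.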
Type II systems consist of two independent enzymes: a restriction endonuclease (Res; R) and a methyltransferase (Mod; M) (Fig. 1b). Type II restriction enzymes are standard tools in the modern molecular biology laboratory. Recognition sequences for Type II Res and Mod are typically 4-8 bp long and palindromic. There are 11 distinct Type II Res subtypes, each differing in cleavage properties [10]. Type II Res and Mod proteins are both active, stand-alone enzymes.
Type III systems encode separate methyltransferase (M; encoded by mod) and restriction endonuclease (R; encoded by res) components (Fig. 1c). Type III mod and res genes are transcribed together, and their products form a two-subunit complex [11]. Mod (M₂) is active as a stand-alone methyltransferase, whereas an active restriction enzyme requires the formation of an R₂M₂ tetramer [12]. Mod catalyzes the methylation of a single strand of DNA at a specific 4-6 bp asymmetric recognition sequence, independently of Res [13]. Mod contains the target recognition domain (TRD), which dictates the DNA sequence that is methylated/restricted.
Type IV systems are methylation-dependent restriction systems, and are useful tools for epigenetic research [14]. However, they are not associated with a cognate methyltransferase [8,14], and will not be discussed further in this review.
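The contrast between palindromic Type II sites and asymmetric Type III sites can be checked directly. The sketch below uses EcoRI (GAATTC, Type II) and EcoP15I (CAGCAG, Type III) as familiar examples; the helper names is_palindromic and occurrences are hypothetical. It reports on which strand each site is read 5′→3′: a palindromic site is found on both strands at the same position, whereas an asymmetric site is found on only one. This is consistent with Type III Mod methylating a single strand at each site.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads identically 5'->3' on both strands."""
    return site == revcomp(site)


def occurrences(seq: str, site: str) -> list[tuple[int, str]]:
    """(top-strand start, strand) for each place the site is read 5'->3'.

    '+' means the site is read on the given (top) strand, '-' on the
    complementary strand. A palindromic site yields both at one position.
    """
    rc_site = revcomp(site)
    hits = []
    for i in range(len(seq) - len(site) + 1):
        window = seq[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        if window == rc_site:
            hits.append((i, "-"))
    return hits


print(is_palindromic("GAATTC"), is_palindromic("CAGCAG"))  # True False
print(occurrences("TTGAATTCTT", "GAATTC"))  # [(2, '+'), (2, '-')]
print(occurrences("AACAGCAGTT", "CAGCAG"))  # [(2, '+')]
```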

Methods for studying DNA methylation in bacteria
Whilst epigenetic gene regulation by DNA methyltransferases has been investigated in bacteria for many years, the sequences recognized and methylated by many of these systems have not been well characterized. Many methods have been developed to study eukaryotic CpG methylation, which is important in a variety of processes, including cancer development and chromatin structure [15]. Methods specific to CpG methylation, such as bisulphite sequencing [16], cannot be used to detect methylated adenine residues. Related approaches, such as methylation-specific co-immunoprecipitation [17] and methylation-specific PCR [18], rely on prior knowledge of the DNA sequence context in which methylation occurs. Other methods for detecting and studying adenine methylation have traditionally required extensive experimentation. Restriction inhibition assays using methylation-sensitive restriction enzymes can be used [19,20], but these rely on finding a restriction enzyme whose recognition sequence overlaps the methylated site. Methods such as chemical modification and bond formation using modified oligonucleotides and chemical crosslinking [21], or the use of radio-labelled AdoMet [22], require hazardous chemicals or specialized experimental set-ups. Mass spectrometry can detect the methyl group itself, but gives no information about the sequence that is methylated. More recently, next-generation sequencing methods for detecting methylation have supplanted the techniques described above. As these methods are described in detail in a recent review [23], we will only briefly describe two: MinION and Single-Molecule, Real-Time (SMRT) sequencing. Oxford Nanopore MinION DNA sequencing technology has been used to map methylated adenine and cytosine residues in bacterial genomic DNA [24], but it has not yet been used to discover the specificity of uncharacterized methyltransferases. Pacific Biosciences (PacBio) SMRT DNA sequencing technology [25,26] allows determination of both methyltransferase sequence specificity and sequence context. By analysing the kinetics of DNA synthesis during SMRT sequencing, the position of modifications such as methylation can be identified [25,27]. SMRT sequencing/methylome analysis therefore allows the generation of a complete, closed genome sequence, together with the sequence context and position of every detectable DNA modification [28]. SMRT methylome analysis has been reviewed in detail previously [29], as has the role that PacBio SMRT sequencing and methylome analysis have played in advancing the study of bacterial phasevarions [30]. SMRT sequencing and methylome analysis have been used extensively over the last ~5 years to: (i) verify existing DNA methyltransferase specificities [26]; (ii) identify previously uncharacterized methyltransferases in a variety of bacterial species [26,31]; (iii) characterize complete bacterial methylomes [31][32][33][34][35]; and (iv) determine the specificity of phase-variable DNA methyltransferases [36][37][38][39], the focus of this review. When PacBio SMRT sequencing is coupled with techniques that measure gene and/or protein expression changes, such as RNA-Seq or quantitative mass spectrometry-based proteomics (e.g. iTRAQ or SWATH), knowledge of methylation specificity can be combined with the gene expression changes that accompany methyltransferase phase variation [40]. This will allow the exact mechanisms behind gene regulation mediated by methyltransferase phase variation to be determined. We discuss the possible modes of gene regulation by phase-variable DNA methyltransferases later in this review.
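The kinetic principle behind SMRT methylome analysis can be sketched in a few lines. In the example below, the inter-pulse duration (IPD) at each reference position in native DNA is compared with the IPD in unmodified control DNA (for example, whole-genome-amplified DNA). Positions with an elevated IPD ratio are flagged, and their sequence contexts are extracted for motif analysis. The IPD values are synthetic, and the fixed cut-off of 2.0 and the function names are assumptions for illustration. Production pipelines score modifications statistically rather than with a single threshold.

```python
from statistics import mean


def ipd_ratios(native: list[list[float]], control: list[list[float]]) -> list[float]:
    """Per-position ratio of mean IPD in native DNA to mean IPD in control DNA."""
    return [mean(n) / mean(c) for n, c in zip(native, control)]


def flag_modified(ratios: list[float], threshold: float = 2.0) -> list[int]:
    """Positions whose IPD ratio meets a (hypothetical) fixed threshold."""
    return [i for i, r in enumerate(ratios) if r >= threshold]


def contexts(seq: str, positions: list[int], flank: int = 2) -> list[str]:
    """Sequence windows centred on flagged positions, for motif analysis."""
    return [seq[max(0, p - flank):p + flank + 1] for p in positions]


# Synthetic per-read IPDs (arbitrary units) for an 8 bp reference.
ref = "TTGACTTT"
native = [[1.0, 1.2], [0.9, 1.1], [1.0, 1.0], [3.8, 4.2],
          [1.1, 0.9], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
control = [[1.0, 1.0]] * len(ref)

ratios = ipd_ratios(native, control)
hits = flag_modified(ratios)
print(hits, contexts(ref, hits))  # [3] ['TGACT']
```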

Phase variation and phasevarions
Phase variation is the random and reversible switching of gene expression [41]. It is typically associated with genes encoding bacterial surface features, such as adhesins [42], pili [43], iron-acquisition proteins [44,45] and lipo-oligosaccharide (LOS) [46,47]. Phase variation allows a population of organisms to generate a variety of phenotypic variants. These mixed populations may contain individuals that are, for example, better equipped to colonize certain host niches, or primed to evade an immune response. This random switching of expression means that proteins encoded by phase-variable genes are not ideal vaccine candidates, as their expression is not stable. However, phase-variable genes are relatively easy to identify from the primary DNA sequence of an organism, as they contain well-defined DNA sequence features: inverted repeats (IRs) and simple sequence repeats (SSRs) [40,41]. Recombination between homologous IRs results in gene shuffling between expressed and silent variants of particular loci. Therefore, the protein encoded by a gene containing IRs is always expressed, but shuffles between allelic variants. SSR tracts are unstable, and vary in length through polymerase slippage during replication. Depending on the number of repeats in the tract, a gene containing an SSR tract in its open reading frame is either in-frame and expressed (ON), or out-of-frame, resulting in a premature stop codon, so that the protein is not expressed (OFF) or is expressed as a truncated variant. A change in tract length causes a downstream frameshift only when the length of the repeat unit is not divisible by three (e.g. Tₙ, (GA)ₙ, (AGCC)ₙ) [41].
Intriguingly, a number of bacterial pathogens contain methyltransferase genes, associated with R-M systems, that are subject to phase variation. Phase variation of methyltransferases can occur either through ON/OFF switching of the methyltransferase gene, so that methylation does or does not occur, or through the expression of multiple, variable methyltransferase specificities by shuffling between expressed and silent loci. Variable expression of methyltransferases leads to genome-wide methylation differences within a bacterial population, which in turn alter the expression of multiple genes through epigenetic mechanisms [30,48]. These systems are called phase-variable regulons (phasevarions), and have been described in many human-adapted pathogens [30]. All described phasevarions regulate the expression of multiple genes, including genes that are involved in host colonization, survival and pathogenesis, and many regulate putative vaccine candidates [20,36,37]. The presence of phasevarions complicates the identification of stably expressed proteins, as the regulated genes do not contain any easily identifiable features, and variation in their expression is random. Therefore, in contrast to genes controlled by classical 'sense and respond' systems, such as those governing nutrient uptake or responses to changes in temperature, the conditions influencing the expression of genes in phasevarions are not well defined. This provides the bacterial species containing phasevarions with an additional contingency strategy to survive changing environmental conditions. The only way to identify genes in a phasevarion is therefore by detailed study of the organisms containing such systems, and the study of phasevarions, and of the proteins they regulate, is critical to generate effective and stable vaccines.
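The frameshift rule for SSR tracts reduces to simple modular arithmetic. Gaining or losing Δ copies of a repeat unit of length k shifts the downstream reading frame by (k·Δ) mod 3 nucleotides. The sketch below assumes that any frameshift switches the gene OFF, ignoring truncated variants, and that one ON tract length is known. The reference length of six repeats used in the example is hypothetical.

```python
def frame_shift(unit_len: int, delta_repeats: int) -> int:
    """Downstream frame shift (0, 1 or 2 nt) after gaining/losing repeat units."""
    return (unit_len * delta_repeats) % 3  # Python's % is non-negative here


def expression_state(unit_len: int, n_repeats: int, n_on: int) -> str:
    """'ON' if a tract of n_repeats keeps the ORF in the same frame as a known ON tract.

    Assumes every frameshift leads to a premature stop (OFF); truncated
    variants are not modelled.
    """
    return "ON" if frame_shift(unit_len, n_repeats - n_on) == 0 else "OFF"


# Hypothetical (AGCC)n tract that is ON with 6 repeats.
print([expression_state(4, n, n_on=6) for n in range(4, 11)])
# ['OFF', 'OFF', 'ON', 'OFF', 'OFF', 'ON', 'OFF']

# A trinucleotide repeat unit never shifts the frame, whatever the tract length.
print({expression_state(3, n, n_on=6) for n in range(1, 20)})  # {'ON'}
```

For a repeat unit whose length is not a multiple of three, only one tract length in every three keeps the gene in frame.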

Inverting Type I hsdS loci are present in many bacterial species
A number of Type I R-M systems containing duplicated hsdS specificity genes with inverted repeats have been described in species such as Mycoplasma pulmonis [49], Porphyromonas gingivalis [50] and Listeria monocytogenes [51]. These systems have recently been thoroughly reviewed, and have been termed 'inverting' Type I systems [52], as they phase-vary by DNA inversion. This review will describe two of the best-studied systems: those found in the related pathogens Streptococcus pneumoniae and Streptococcus suis.
The SpnD39III system in S. pneumoniae was the first described phasevarion controlled by phase variation of a Type I R-M system; it switches between six distinct methyltransferase specificities [39]. S. pneumoniae is a major global bacterial pathogen, and is responsible for acute and chronic diseases of the upper and lower respiratory tract, and for serious and life-threatening infections, including meningitis and sepsis [53]. The SpnD39III system is present in every strain of S. pneumoniae that has a publicly available genome sequence (>200 strains) [39]. This Type I system contains multiple, duplicated hsdS genes containing IRs, which together encode two unique 5′ TRDs and three unique 3′ TRDs (Fig. 2a) [39]. The SpnD39III system therefore shuffles between six different specificities (two 5′ TRDs × three 3′ TRDs) through homologous recombination between the multiple, variable TRDs present in the expressed (hsdS) and silent (hsdS′ and hsdS′′) copies of these specificity genes (Fig. 2a). This process is catalyzed by a locus-associated recombinase, creX [54], and a non-locus-associated recombinase [55]. The six methyltransferase specificities of the SpnD39III locus, termed alleles A-F, each methylate a different DNA target sequence, and consequently each controls a different phasevarion. This results in six phenotypically distinct cell types within a population of S. pneumoniae [39]. For example, cells expressing the SpnD39III-A allele are highly invasive in a mouse model of infection [39], and S. pneumoniae expressing the SpnD39III-B allele show decreased expression of capsule biosynthetic genes and of genes involved in carbohydrate metabolism [39]. The phenotype associated with a given allele also appears to vary between strains: studies in different strains report different colony opacity phenotypes even when the same SpnD39III allele is expressed [54,56]. This implies that the SpnD39III allele expressed and strain-specific factors interact to increase the phenotypic variation of S. pneumoniae populations, adding a further level of complexity to characterizing the phenotypic effects of SpnD39III phasevarion switching on S. pneumoniae pathobiology.
An inverting Type I system is also present in S. suis. S. suis is a major pig pathogen, causing a range of respiratory tract infections as well as invasive diseases such as arthritis and septicaemia [57]. S. suis is also a major cause of zoonotically acquired meningitis in humans, particularly in South-East Asia [58], where it is a major public health concern; infections have high mortality rates because the organism can elicit a streptococcal septic/toxic shock-like syndrome in patients [59]. An inverting Type I system that shuffles between four different HsdS specificities was recently described in S. suis [60]. Unlike the 'six-way' SpnD39III system of S. pneumoniae, this 'four-way' system in S. suis contains neither a locus-associated recombinase nor a truncated hsdS′′ gene, but only duplicated, inverted hsdS loci (hsdS and hsdS′) that contain IRs (Fig. 2b). Interestingly, this four-way inverting Type I R-M system in S. suis is associated with a highly virulent zoonotic lineage [60,61]. No gene expression analysis has yet been carried out for the four alleles expressed by this system (alleles A-D). It will be interesting to decipher the gene expression differences controlled by these phasevarions, and their contribution to the invasive phenotype associated with the virulent lineage containing this system.
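The number of specificities an inverting Type I locus can generate follows from pairing every available 5′ TRD with every available 3′ TRD. This assumes, as reported for the SpnD39III and S. suis systems, that recombination can bring each pairing into the expressed hsdS gene. The sketch below enumerates these pairings with itertools.product. The TRD labels are hypothetical placeholders, and the enumeration order is not intended to match the published allele letters.

```python
from itertools import product


def specificities(trd5s: list[str], trd3s: list[str]) -> list[tuple[str, str]]:
    """Every 5'/3' TRD pairing that could occupy the expressed hsdS gene.

    Assumes each pairing is reachable by recombination and yields a
    distinct methyltransferase specificity.
    """
    return list(product(trd5s, trd3s))


# Hypothetical TRD labels; real TRD sequences are not modelled.
spn = specificities(["5TRD-1", "5TRD-2"], ["3TRD-1", "3TRD-2", "3TRD-3"])  # six-way
ssu = specificities(["5TRD-1", "5TRD-2"], ["3TRD-1", "3TRD-2"])            # four-way
print(len(spn), len(ssu))  # 6 4
```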

Type I R-M loci containing simple sequence repeats
A single Type I R-M locus has been observed that can phase-vary through changes in the length of an SSR tract located in the open reading frame of the encoding hsdS gene. This locus, named the NgoAV system [19], is present in the human pathogen Neisseria gonorrhoeae, which is responsible for the sexually transmitted infection (STI) gonorrhoea. N. gonorrhoeae has emerged as a major drug-resistant pathogen, with some strains being resistant to all but last-line antibiotics [62]. The SSR tract in the hsdS gene is a polyguanine (Gₙ) tract, located between the 5′ and 3′ TRDs (Fig. 2c). Variation between G₆ and G₇ changes the reading frame downstream of the tract, so that two different HsdS variants are expressed depending on the tract length. With G₆, the 3′ TRD is translated in frame with the 5′ TRD, producing a full-length HsdS protein as a single polypeptide. With G₇, the resulting frameshift introduces a premature stop codon before the 3′ TRD coding region, producing a truncated HsdS protein (Fig. 2c). These two HsdS proteins confer two different methyltransferase specificities [19]. No phenotypic characterization has been carried out on N. gonorrhoeae strains expressing the two HsdS variants, but the fact that the two HsdS proteins lead to methylation at different DNA target sequences strongly implies that this system controls a phasevarion.
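The consequence of the G₆/G₇ switch can be demonstrated by translating a toy open reading frame. The sequence below is a hypothetical mini-ORF, not the NgoAV hsdS gene. It consists of an ATG-AAA stand-in for the 5′ TRD, a G tract, and a stand-in for the 3′ TRD region. The sequence is designed so that, as described in the text, the G₆ variant reads through into the 3′ region while the extra G in the G₇ variant shifts the frame onto an early stop codon. Translation uses the standard genetic code; the function names translate and toy_hsds are illustrative.

```python
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases T, C, A, G).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(orf: str) -> str:
    """Translate from the first base until the first stop codon (or the end)."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def toy_hsds(n_g: int) -> str:
    """Hypothetical mini-ORF: 5' TRD stand-in, G tract, 3' TRD stand-in."""
    return "ATGAAA" + "G" * n_g + "CCTAAAGCTGCTTAA"


print(translate(toy_hsds(6)))  # MKGGPKAA  (in frame: reads into the 3' region)
print(translate(toy_hsds(7)))  # MKGGA     (frameshift: premature stop)
```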
Variation in SSR tract length can occur in hsdM genes as well as in hsdS genes. In these examples, changes in SSR tract length lead to ON/OFF switching of the hsdM methyltransferase gene, rather than to the production of full-length or truncated proteins, as seen with hsdS genes. Phase variation of Type I hsdM loci has been observed in Haemophilus influenzae [63] and Mannheimia haemolytica [64], although neither system has yet been shown to control a phasevarion.
Fig. 2 (partial caption). (b) The S. suis system contains only expressed (hsdS) and silent (hsdS′) genes containing inverted repeats (blue and brown). In this system there are two different 5′ TRDs (green and yellow) and two different 3′ TRDs (pink and purple), so four different HsdS allelic variants, and therefore four different methyltransferase specificities, can be expressed in a population of S. suis by recombination between the IRs in these hsdS loci (blue and brown boxes). (c) Either a full-length or a truncated protein is produced from the NgoAV locus in N. gonorrhoeae, depending on the length of the SSR tract between the 5′ TRD (red) and the 3′ TRD (green). The truncated HsdS is equivalent to just the 5′ TRD (red); two truncated proteins dimerize to form a functional HsdS protein. This leads to two different methyltransferase specificities.

Phase-variable Type III mod genes are widespread in the bacterial domain
Most of the phasevarions described to date are associated with Type III mod genes [13]. In these organisms the methyltransferase (Mod) phase-varies between two states (ON or OFF) through variation in the number of SSRs in the mod gene [48] (Fig. 3). A recent study reported that almost 20 % of all Type III mod genes contain SSR tracts in their open reading frame [65]. This remarkable observation suggests that almost one in five Type III mod genes may be able to phase-vary, and potentially control a phasevarion. Based on this analysis, the high sequence variability of these mod genes, and the differing location and repeat unit of their SSR tracts, phase-variable Type III methyltransferases appear to have evolved independently multiple times, and to be a common and widespread strategy used by bacteria to generate phenotypic diversity and improve adaptability.
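Surveys of this kind rest on detecting SSR tracts in open reading frames. A minimal sketch of such a scan is shown below. It finds tandem repeats of 1-5 bp units using a regular-expression back-reference, and skips units that are themselves repeats of a shorter unit. The minimum of five copies is an assumed threshold, not the criterion used in [65]. The mod-like sequence and the function names are invented for illustration.

```python
import re


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter unit (e.g. 'GG' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_ssr_tracts(seq: str, max_unit: int = 5, min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for tandem repeats of 1..max_unit bp units."""
    hits = []
    for k in range(1, max_unit + 1):
        # ([ACGT]{k}) captures one unit; \1{n,} requires n further exact copies.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_copies - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):  # skip e.g. 'GG' when 'G' already describes the tract
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return hits


toy_mod = "ATGTTT" + "AGCC" * 6 + "GATTAA"  # invented mod-like fragment
print(find_ssr_tracts(toy_mod))  # [(6, 'AGCC', 6)]
```

Because the AGCC unit is 4 bp long, any change in the number of copies found by such a scan would shift the downstream reading frame.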
Different mod genes show little sequence conservation (<25 % homology) (Fig. 3a). Allelic variants of individual mod genes show high (>95 %) DNA sequence identity in their 5′ and 3′ conserved regions, but contain a highly variable central region encoding the TRD (Fig. 3b) [66]. As with Type I hsdS genes, the TRD dictates the DNA sequence methylated by Mod. Therefore, mod genes that contain different TRDs but highly conserved 5′ and 3′ regions are classified as allelic variants of a single mod gene. This means that different alleles of individual mod genes encode enzymes that methylate different DNA target sequences. Methylation of a different target sequence means that different Mod alleles regulate the expression of a different set of genes; i.e. they control different phasevarions. An analysis of the evolution of phase-variable mod genes present in H. influenzae and the pathogenic Neisseria [66] indicated that new TRDs evolve by shuffling existing ones between strains and species.
Fig. 3 (partial caption). (a) modA contains an SSR tract made up of varying numbers of AGCC or AGTC repeats ((AGCC)ₙ or (AGTC)ₙ), and modB contains an (AACCC)ₙ repeat tract. The positions of the catalytic domain, DPPY, and the substrate-binding domain, FXGXG, are depicted by black bars. When bacterial strains contain multiple independently phase-variable mod genes, these are located in different places in the genome. modA genes are found in NTHi, pathogenic Neisseria and K. kingae; modB genes are found in the pathogenic Neisseria. (b) Allelic variants of the same mod gene are highly conserved in their 5′ and 3′ regions (white arrows), whereas the central variable TRD region differs between alleles of the same gene (different coloured boxes), meaning that different alleles have different methylation specificities. The example uses the five most common phase-variable modA alleles in patients with otitis media, studied in [35], as an illustration.
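The rule used to group mod genes into alleles can be expressed as a simple decision procedure. The sketch below assumes that each gene has already been split into a 5′ conserved region, a central TRD and a 3′ conserved region, with corresponding flanks pre-aligned to equal length. Real analyses would use a sequence alignment tool rather than position-by-position comparison. The sketch applies the >95 % identity criterion quoted above to both flanks, and calls two genes the same allele only if their TRDs are also near-identical. The TRD cut-off, the example sequences and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two pre-aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def classify_mod_pair(a: tuple[str, str, str], b: tuple[str, str, str],
                      flank_cutoff: float = 95.0, trd_cutoff: float = 95.0) -> str:
    """Compare two mod genes given as (five_prime, trd, three_prime) tuples."""
    five_id = percent_identity(a[0], b[0])
    three_id = percent_identity(a[2], b[2])
    if five_id <= flank_cutoff or three_id <= flank_cutoff:  # require >95 % flank identity
        return "different mod genes"
    same_trd = len(a[1]) == len(b[1]) and percent_identity(a[1], b[1]) > trd_cutoff
    return "same allele" if same_trd else "different alleles of the same mod gene"


c5 = "ATGGCTAAACTGGAAGATCC" * 2  # hypothetical 40 bp 5' region
c3 = "GCAGAACGTTCCGATTATAA"      # hypothetical 20 bp 3' region
mod_x = (c5, "AAAACCCCGGGG", c3)
mod_y = (c5[:-1] + "T", "TTGACCATGGCA", c3)  # one flank mismatch (97.5 %), new TRD
mod_z = ("T" * 40, "AAAACCCCGGGG", "T" * 20)

print(classify_mod_pair(mod_x, mod_y))  # different alleles of the same mod gene
print(classify_mod_pair(mod_x, mod_x))  # same allele
print(classify_mod_pair(mod_x, mod_z))  # different mod genes
```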

Many important human-adapted bacterial pathogens contain phase-variable Type III mod genes
Phasevarions as a novel gene regulation system were first described in H. influenzae ~15 years ago [67], following the observation of SSR tracts in Type III mod genes [68]. Phase-variable mod genes and phasevarions have since been well characterized in a number of human-adapted pathogens, including modA in non-typeable H. influenzae [36,69], modA and modB in N. gonorrhoeae and Neisseria meningitidis [20], modD in N. meningitidis [70], modH in Helicobacter pylori [71] and modM in Moraxella catarrhalis [72].
Non-typeable H. influenzae (NTHi) is a major human otopathogen that is responsible for acute and chronic infections of the respiratory tract, including middle ear infection (otitis media; OM) in children [73], and chronic obstructive pulmonary disease (COPD) [74,75] and community-acquired pneumonia [76] in adults. Since the introduction of a vaccine against H. influenzae serotype b (Hib), the incidence of invasive infection caused by NTHi has increased significantly worldwide [77,78]. Twenty-one modA alleles (modA1-21) have been described in NTHi, each containing a variable central TRD, meaning that each allele methylates a different DNA target sequence and implying that each allele controls a different phasevarion [36,66,79]. Characterization of NTHi strains taken from children with middle ear infection showed that ~65 % of strains contained one of just five phase-variable modA alleles (modA2, modA4, modA5, modA9 and modA10) [36]. In this study, modA phase variation was associated with differences in antibiotic resistance and immune evasion, and also influenced the variable expression of a number of putative NTHi vaccine candidates, e.g. OMP P6 and HMW [36]. The same study demonstrated that the modA2 ON state (i.e. expressed) was selected for in a chinchilla model of middle ear infection [36], indicating that the genes differentially regulated in the ModA2 phasevarion provide a selective advantage in the middle ear. Subsequent work demonstrated that the switch from modA2 OFF to modA2 ON results in more severe middle ear infection [80], suggesting that an as-yet uncharacterized interaction between modA2 ON and modA2 OFF sub-populations in the middle ear likely increases disease severity. Further work on modA-controlled phasevarions in NTHi demonstrated that they influence important pathobiological traits, such as resistance to oxidative stress [81] and biofilm formation [82].
N. gonorrhoeae and N. meningitidis are major human pathogens that are responsible for the STI gonorrhoea and for meningococcal meningitis and septicaemia, respectively. Both contain multiple independently switching Type III mod genes, and N. gonorrhoeae also contains a phase-variable Type I methyltransferase (described above). Both N. gonorrhoeae and N. meningitidis contain modA and modB, and N. meningitidis also contains modD [20,70,83]. ON/OFF switching of the modA13 allele in N. gonorrhoeae causes differences in virulence traits, such as invasion of human cells and biofilm formation [20]; ON/OFF switching of the modA11 and modA12 alleles in N. meningitidis results in differences in antibiotic susceptibility [84]. Phase variation of the modA11 allele in N. meningitidis also results in variable expression of the lactoferrin-binding proteins LbpA and LbpB [20], which were being investigated as vaccine candidates for N. meningitidis. This finding illustrates the importance of understanding which proteins are differentially regulated as part of phasevarions: LbpA and LbpB were thought to be stably expressed, as they contain no features associated with phase-variable genes. Currently six modB and seven modD alleles have been described [38,83], with the modD1 allele being associated with hypervirulent strains of N. meningitidis [70,83]. N. meningitidis can therefore encode up to three independently switching mod genes (modA, modB and modD), all of which likely play a key role in virulence.
The human pathogen H. pylori, which is responsible for gastric ulcers and is implicated as a cause of gastric cancer, has 17 different modH alleles [71]. The modH gene phase-varies ON/OFF through changes in the length of an SSR Gₙ tract located in its open reading frame [71]. The most prevalent modH allele in H. pylori, modH5, has been shown to control the expression of the flagellum of this organism [85]. A recent survey of the extent of SSR tracts in all Type III mod genes [65] demonstrated the presence of two further phase-variable mod genes in H. pylori, which were named modJ and modL. Whether these newly discovered Type III mod genes control phasevarions is yet to be demonstrated, although these genes contain SSR tracts of variable length in different strains [65], which strongly implies that they phase-vary. Therefore, like N. meningitidis, some strains of H. pylori can contain up to three independently switching Type III mod genes.
M. catarrhalis is a human respiratory tract pathogen. Individual strains can contain one of six different alleles of the modM gene [72,86]. An association of the modM3 allele with isolates causing middle ear infection has been demonstrated [37]. Like the selection of modA2 ON in NTHi described above, this implies that certain genes in the modM3 phasevarion provide a selective advantage in the middle ear [37]. A survey of all R-M systems in M. catarrhalis demonstrated the presence of two further putatively phase-variable Type III mod genes, which have been named modN and modO [86]. The presence of multiple phase-variable mod genes in multiple human-adapted pathogens suggests that high levels of phenotypic variation are selected for in these organisms, and that this plays a key role in virulence and pathobiology.
Newly identified phase-variable Type III mod genes in major human pathogens
Shiga toxin-producing Escherichia coli (STEC) is a major food-borne bacterial pathogen that causes potentially lethal haemolytic uraemic syndrome in ~10 % of infected individuals [87]. A Type III mod gene containing an SSR tract was recently identified in this important human pathogen [65]. Different strains of STEC containing this Type III mod gene had SSR tracts of varying length, implying phase-variable expression and the potential to control a phasevarion [65]. Phenotypic characterization of strains containing ON and OFF versions of this Type III mod gene will determine the importance of phase variation of this system for the virulence of the STEC strains that contain it. It will also be interesting to understand the effects of phase-variable methyltransferase expression in this pathogen.
Kingella kingae is a Gram-negative organism that has recently been identified as a cause of bone infections, endocarditis and bacteraemia in young children [88]. K. kingae contains the modK gene [89]. Differential expression of modK1 was shown to modulate the host immune response and to vary toxin production by K. kingae [89], showing the importance of phasevarion-mediated gene expression in the pathobiology of this emerging paediatric pathogen. K. kingae also contains modA alleles [65], which means that individual strains of K. kingae can potentially contain multiple, independently phase-variable mod genes controlling distinct phasevarions.
Multiple species of the genus Mycoplasma contain Type III mod genes with SSR tracts. Mycoplasmas are dedicated intracellular pathogens that cause disease in a variety of mammalian species [90]. Humans can be infected with Mycoplasma pneumoniae and Mycoplasma genitalium, which cause pneumonia and pelvic inflammatory disease, respectively. Both of these organisms contain multiple Type III mod genes containing SSRs [48,65,91]. As with the other newly identified SSR-containing mod genes described above, these systems have yet to be studied in detail. However, the advantage of having multiple ways to generate phenotypic diversity without gene acquisition is clear in pathogens with small genomes, and investigating the effects of phasevarions in mycoplasmas will be essential for generating effective vaccines and treatments.
The zoonotic pathogen S. suis contains a Type III mod gene with an SSR tract, in addition to the inverting Type I R-M system described above [60]. Three alleles of this mod gene, named modS1, modS2 and modS3, have been found in individual S. suis strains, and each has been shown to methylate a different target sequence [60]. Interestingly, analysis of a large collection of S. suis isolates [61] showed that no strain contains both a phase-variable Type I and a phase-variable Type III methyltransferase; strains contain one or the other [60]. Whether this is because the different systems arose in strains from different parts of the world, or because phase-variable Type I and Type III methyltransferases are incompatible, remains unclear. An incompatibility caused by 'over-methylation' seems unlikely, because several bacterial species (N. meningitidis, H. pylori and M. catarrhalis) apparently contain multiple phase-variable Type III mod genes. The separation of phase-variable Type I and Type III methyltransferases in S. suis therefore remains open to investigation.

Mode of action of phase-variable epigenetic regulators
Whilst the identification and study of phase-variable DNA methyltransferases has advanced rapidly over recent years, understanding of the basic mechanism(s) of gene regulation by these systems is much less advanced. PacBio SMRT sequencing and methylome analysis have allowed methyltransferase specificities to be determined, and transcriptomic and proteomic studies have identified the genes regulated by these systems, but this knowledge has not yet been underpinned by mechanistic studies of the events involved in gene regulation in phasevarions. Gene expression changes in phasevarions are likely mediated both by methylation and by loss of methylation at particular sites. The only example to date of how a gene is regulated by differential methylation by a phase-variable methyltransferase comes from study of the Type III mod gene modH5 in H. pylori [85]. Determination of the ModH5 recognition sequence (5′-Gm6ACC-3′) by SMRT sequencing and methylome analysis showed that this sequence is present in the promoter of flaA, the gene encoding the major flagellar protein. This allowed the authors to demonstrate that differential methylation of a 5′-GACC-3′ motif in the flaA promoter leads directly to differences in flaA expression. They also showed that, once the ModH5 recognition site had been removed from the flaA promoter by site-directed mutagenesis, flaA was no longer controlled by modH5 phase variation [85]. This study therefore provided the first evidence of direct control of a gene by a phase-variable methyltransferase. This mode of regulation can be thought of as 'primary' regulation by methylation, i.e. methylation directly affects expression of the gene at which it occurs (Fig. 4a). It is also likely that secondary regulation (Fig. 4b), i.e. methylation at a distal site affecting gene expression, and even tertiary regulation (Fig. 4c), i.e. methylation altering expression of a regulator that subsequently affects expression of its target genes, occur in phasevarions. We base this hypothesis on the observation that, when the sequences methylated by phase-variable methyltransferases are mapped onto the genes differentially regulated in their phasevarions, many of the regulated genes do not contain any recognition sequences in their promoter or immediately upstream regulatory regions [36][37][38]. The regulation of these genes must therefore involve indirect (i.e. secondary or tertiary) events. For example, modA11 phase variation results in differential regulation of the ferric uptake regulator Fur in N. meningitidis [20], which is likely to lead to differential expression of the Fur regulon because different amounts of Fur protein are produced in modA11 ON and OFF sub-populations. The demonstrated and potential modes of regulation by phase-variable DNA methyltransferases are summarized in Fig. 4.
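The mapping argument above amounts to a simple sequence scan: look for the methyltransferase recognition motif, on both strands, in the region upstream of each gene in a phasevarion. The minimal sketch below uses the ModH5 motif 5′-GACC-3′ from the text. The gene names, the upstream sequences and the 'candidate primary' versus 'indirect' labels are hypothetical; a real analysis would use defined promoter windows from an annotated genome together with methylome data, not motif presence alone.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based top-strand start, strand) for each motif match on either strand."""
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        # A palindromic motif would otherwise be counted twice at one position.
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits


def classify_genes(upstream: dict[str, str], motif: str) -> dict[str, str]:
    """Label phasevarion genes by whether their upstream region carries the motif."""
    return {
        gene: "candidate primary" if find_sites(seq, motif) else "indirect (secondary/tertiary)"
        for gene, seq in upstream.items()
    }


# Hypothetical upstream regions; these are NOT real promoter sequences.
upstream = {
    "geneA": "TTGACAATCGACCTTTATAATGC",  # carries GACC on the top strand
    "geneB": "TTGACATTTAAAAGGTCTATAAT",  # carries GGTC, i.e. GACC on the bottom strand
    "geneC": "TTGACATTTAAAATATTATAATG",  # no site
}
print(classify_genes(upstream, "GACC"))
```

Here geneA and geneB are flagged as candidates for primary regulation and geneC as indirectly regulated. A site upstream of a gene only makes it a candidate: as the flaA example shows, site-directed mutagenesis is needed to demonstrate direct control.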

Importance of phasevarions in vaccine development
As discussed throughout this review, understanding bacterial virulence and pathobiology requires an understanding of the genes regulated by phasevarions. Individual phase-variable genes are easily identified by sequence features such as SSRs and IRs, and are generally not ideal vaccine candidates because their expression is not stable. However, phase-variable genes should not necessarily be discounted as vaccine candidates if they are highly immunogenic and/or their expression is required in particular host niches. For example, the NadA protein forms part of the 4CMenB (Bexsero) vaccine against N. meningitidis serogroup B [92], because it has been shown to be expressed at high levels during infection [93]. The NTHi vaccine candidate Hia, an outer-membrane adhesin, is also phase-variable. Hia induces high levels of serum anti-Hia antibodies in a chinchilla model of NTHi disease [94], and Hia is expressed (phase ON) during colonization of the nasopharynx [42], so a vaccine targeting Hia could prevent NTHi colonization and disease. All described phasevarions regulate the expression of proteins involved in pathobiology, and many regulate current and putative vaccine candidates. Phasevarions are distinct from all previous examples of phase-variable genes because they create phenotypically distinct sub-populations through altered expression of multiple genes. Their presence complicates the identification of stably expressed proteins, as epigenetically regulated genes do not contain sequence features that are easily identifiable in silico. Understanding the factors regulated by phasevarions is therefore important for defining the stably expressed antigenic repertoire of organisms that contain phase-variable methyltransferases. This will allow the rational design of more effective vaccines and treatments, and ensure that any vaccine candidates shown to be regulated by phasevarions are expressed under conditions relevant to the induction of immunity, taking their ability to switch expression into account.

Conclusions
Work over the last 10 years has shown that phase-variable DNA methyltransferases are widespread in the bacterial domain. The identification of multiple distinct mechanisms of methyltransferase phase variation implies that phasevarions have evolved independently multiple times, and that this type of variable epigenetic regulation provides a strong selective advantage. In every case where phasevarions have been studied in detail, they regulate genes involved in multiple virulence traits, and understanding the factors they regulate is key to rational and successful vaccine design. Although each distinct phasevarion is expected to regulate the same set of genes in all strains of a species, this has never been studied. There is evidence of strain-specific gene expression differences influenced by the SpnD39III system in S. pneumoniae, as the phasevarion differs in some respects between S. pneumoniae strains D39 and TIGR4; however, the full extent of these differences needs further study. Whether the same methyltransferase controls the same phasevarion in different species also remains to be studied. The sequence recognized by the phase-variable methyltransferase will be the same, but differences between species in, for example, genome G/C content and gene content may change where the methylated sequence occurs, and therefore which genes are controlled by methylation differences. This is an outstanding question, and one we would be interested to see answered. Although the effort required to characterize phasevarions is extensive, ongoing advances in gene and protein expression technology and methylome analysis have streamlined this process. In many cases, particular phasevarions are associated with disease and pathobiology, which underlines the importance of thoroughly characterizing these systems in the bacterial species where they are present. A number of important bacterial pathogens contain phase-variable methyltransferases, and by implication phasevarions, that remain to be characterized. Studying these newly identified systems will be key to designing new and more effective vaccines, or improving existing vaccines, against the pathogens that contain them. Better and more effective vaccines are particularly important given the increase in antibiotic resistance. Vaccines represent the best way to prevent disease, and phasevarion identification and analysis are likely to play a major part in the development of stable and efficacious vaccines against a number of major human pathogens.
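The point about G/C content can be illustrated with a back-of-the-envelope calculation. Under a deliberately simple null model, in which every position is independent, G and C each occur with frequency g/2 and A and T each with frequency (1 - g)/2, the expected number of sites for a motif is the number of positions multiplied by the motif's probability, doubled for a non-palindromic motif because both strands are scanned. The model, the function names, the genome length and the two GC fractions below are assumptions for illustration only; real genomes are not random sequences, and where sites fall relative to genes matters as much as how many there are.

```python
def motif_probability(motif: str, gc: float) -> float:
    """Probability of `motif` at one position under an independent-base model."""
    p = 1.0
    for base in motif.upper():
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p


def expected_sites(motif: str, gc: float, genome_length: int) -> float:
    """Expected motif count in a random genome of the given GC fraction, both strands."""
    motif = motif.upper()
    rc = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    per_strand = (genome_length - len(motif) + 1) * motif_probability(motif, gc)
    # A palindromic motif reads the same on both strands, so count it only once.
    return per_strand if rc == motif else 2 * per_strand


# Hypothetical 1.6 Mb genomes that differ only in GC fraction.
for gc in (0.35, 0.55):
    print(gc, round(expected_sites("GACC", gc, 1_600_000)))
```

For 5′-GACC-3′, three of the four bases are G or C, so under this model the higher-GC genome is expected to carry roughly 2.7 times as many sites, and therefore a different set of potentially methylation-sensitive promoters.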

Fig. 1 .
Fig. 1. Illustration of the four main types of restriction-modification (R-M) system. (a) Type I R-M loci encode separate restriction (R), methyltransferase (M) and specificity (S) components, encoded by the hsdR, hsdM and hsdS genes, respectively. Methylation can occur independently of the R subunit, through a trimeric M₂S complex. For DNA cleavage to occur, a pentameric R₂M₂S complex must form. The HsdS subunit dictates the DNA sequences that are methylated and restricted; changing the S subunit will change the specificity. (b) Type II R-M systems encode separate restriction (Res; R) and methyltransferase (Mod; M) enzymes. The encoding genes are often located divergently in the genome. The enzymes recognize the same DNA motif (often 6 bp palindromic sequences), but act independently. (c) Type III R-M systems are encoded by co-localized mod (modification; encoding a methyltransferase, Mod/M) and res (restriction; encoding a restriction enzyme, Res/R) genes. Res proteins require Mod to restrict DNA, but Mod enzymes are active as stand-alone methyltransferases. (d) Type IV enzymes require modified DNA to act; in the example this is DNA modified with a methyl group (CH₃).

Fig. 2 .
Fig. 2. Illustration of phase-variable Type I R-M systems. (a) A phase-variable Type I R-M locus shuffles domains between expressed and silent loci via recombination between inverted repeats. The example uses the six-way switch SpnD39III, present in S. pneumoniae, as an illustration. Inverted repeats (IRs) are represented by yellow, brown and green boxes. Recombination of DNA sequences encoding the two different 5′ TRDs (blue and purple) and the three different 3′ TRDs (red, pink and orange) generates six different allelic variants in the expressed hsdS locus. S. pneumoniae can therefore express six unique HsdS allelic variants, meaning that six different methyltransferase activities are present in an S. pneumoniae population, depending on the sequence of the hsdS gene in the expressed locus of each individual bacterial cell. This locus also encodes a DNA recombinase, creX. (b) S. suis contains inverted hsdS loci like the SpnD39III system, but only has an expressed (hsdS) and a silent (hsdS′) gene containing inverted repeats (blue and brown). In this system, there are two different 5′ TRDs (green and yellow) and two different 3′ TRDs (pink and purple). Recombination between the IRs in these hsdS loci (blue and brown boxes) can therefore generate four different HsdS allelic variants, giving four different methyltransferase specificities in a population of S. suis. (c) Either a full-length or a truncated protein is produced from the NgoAV locus in N. gonorrhoeae, depending on the length of the SSR tract between the 5′ TRD (red) and the 3′ TRD (green). The truncated HsdS is equivalent to just the 5′ TRD (red), and two truncated proteins dimerize to form a functional HsdS protein. This leads to two different methyltransferase specificities.
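The variant counts in the Fig. 2 caption follow from pairing every 5′ TRD with every 3′ TRD in the expressed hsdS locus. The short sketch below enumerates those pairings, using the caption's colour labels as placeholder TRD names; the function name is hypothetical, and the sketch assumes, as the caption implies, that every pairing is reachable by recombination and gives a distinct specificity.

```python
from itertools import product


def hsds_variants(five_prime_trds: list[str], three_prime_trds: list[str]) -> list[tuple[str, str]]:
    """Every 5' TRD / 3' TRD combination that could occupy the expressed hsdS locus."""
    return list(product(five_prime_trds, three_prime_trds))


# Colour labels from Fig. 2 used as placeholder TRD names.
spnd39iii = hsds_variants(["blue", "purple"], ["red", "pink", "orange"])
s_suis = hsds_variants(["green", "yellow"], ["pink", "purple"])

print(len(spnd39iii), len(s_suis))  # 6 4
for five, three in spnd39iii:
    print(f"5' TRD {five} + 3' TRD {three}")
```

The same product rule explains why adding a single extra 3′ TRD to a locus with two 5′ TRDs adds two new specificities rather than one.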

Fig. 3 .
Fig. 3. Depiction of phase-variable Type III mod genes and alleles. (a) Different mod genes show little (<25 %) sequence identity, as illustrated with the modA (red) and modB (purple) genes. The hatched box represents the central target recognition domain (TRD) that dictates the DNA sequence methylated by the Mod protein; different TRDs in the same mod backbone represent different allelic variants of the same mod gene. The location of the SSR tract is depicted in grey, with the repeating unit shown underneath; e.g. modA contains an SSR tract made up of varying numbers of AGCC or AGTC repeats, (AGCC)n or (AGTC)n, and modB contains an (AACCC)n repeat tract. The positions of the catalytic domain, DPPY, and the substrate-binding domain, FXGXG, are depicted by black bars. When bacterial strains contain multiple independently phase-variable mod genes, these are located in different places in the genome. modA genes are found in NTHi, pathogenic Neisseria and K. kingae; modB genes are found in the pathogenic Neisseria. (b) Allelic variants of the same mod gene are highly conserved in their 5′ and 3′ regions (white arrows), whereas the central variable TRD region differs between alleles of the same gene (different coloured boxes), meaning that different alleles have different methylation specificities. The example uses the five most common phase-variable modA alleles in patients with otitis media, studied in [35], as an illustration.

Fig. 4 .
Fig. 4. Illustration of the ways phase-variable methyltransferases control gene expression in a phasevarion. (a) Direct DNA methylation at a promoter has been demonstrated for a single phasevarion-controlled gene, the fla gene in the H. pylori modH5 phasevarion [85]. Differential methylation between the modH5 ON and modH5 OFF states leads directly to differential expression of fla; we hypothesize two further modes of gene regulation by genome-wide differential methylation. (b) Secondary methylation, where methylation at a distal site leads to differential expression of a gene shown to be regulated by a phasevarion. There is no direct evidence for this, other than the fact that many genes controlled by phasevarions do not contain the DNA sequence recognized by the relevant methyltransferase in or near their promoter. (c) Tertiary methylation, where differential methylation leads to variable expression of a regulatory protein, or regulatory RNA, which then leads to differential regulation of genes. In the example, differential regulation of Fur by the modA11 phasevarion results in differential expression of multiple genes in the Fur regulon [20].