Campylobacter sequence typing databases: applications and future prospects

Human campylobacteriosis, caused by the zoonotic bacteria Campylobacter jejuni and Campylobacter coli, remains a major cause of gastroenteritis worldwide. For many countries the implementation of effective interventions to reduce the burden of this disease is a high priority. Nucleotide sequence-based typing, including multilocus sequence typing (MLST) and antigen gene sequence typing (AGST), has provided unified, comprehensive, and portable Campylobacter isolate characterization, with curated databases of genotypes available (pubMLST.org/campylobacter). Analyses of large collections of isolates from various sources with these approaches have provided many insights into the epidemiology of these ubiquitous and diverse organisms. C. jejuni and C. coli populations are structured into clonal complexes, which reflect genealogy and are associated with specific phenotypes, e.g. the predisposition to infect particular animals, a property that has permitted the development of genetic means of attributing isolates from human disease to potential sources. This has identified retail meat, and especially chicken, as the likely cause of most human disease in many countries, although some human isolates have other likely origins. Such data have led directly to effective intervention studies and will be important in ongoing targeting of intervention strategies and the monitoring of their effectiveness. MLST and AGST data have also been employed in epidemiological investigations and studies of Campylobacter evolution and population biology. The sequence databases that have been established are compatible with the whole-genome sequencing (WGS) approaches likely to be implemented soon; indeed, the hierarchical approach adopted by MLST and AGST will be essential for the exploitation of WGS data.


Introduction
More than 30 years after human campylobacteriosis was described as a 'new' disease (Skirrow, 1977; Skirrow et al., 1993), its epidemiology remains incompletely understood (Gillespie et al., 2002). Elucidating the transmission of the two major causative bacteria of this disease, Campylobacter jejuni (about 90 % of cases) and Campylobacter coli (most of the remaining 10 % of cases), to humans is essential for the development and implementation of effective public health interventions, which are a priority in many countries, including the UK (Tam et al., 2012). These two bacteria represent an extremely prevalent cause of gastroenteritis worldwide, responsible for an estimated 400-500 million cases a year (Friedman et al., 2000). A substantial proportion of cases are unreported, perhaps 7-10 times the number of reported cases in industrialized countries, yet these unreported cases still contribute significant costs to economies through lost working hours (Allos, 2001). Although campylobacteriosis is normally a self-limiting and relatively mild disease, it can be severe, indeed life-threatening, is a leading cause of hospitalization in the USA (Scallan et al., 2011), and in some cases leads to debilitating neuropathologies (Nachamkin, 2002).
Two features of Campylobacter infection have hindered investigations into the epidemiology of human campylobacteriosis: (i) human disease is most commonly sporadic (Gormley et al., 2011), and (ii) the bacterium can be readily isolated from intestines of many different animals and environmental sources such as water and soil (Brown et al., 2004). Accurate isolate characterization is consequently essential to this endeavour, but the early application of serological methods, which were effective for typing organisms such as Salmonella, was ineffective for Campylobacter. It has been established that this was due to a combination of factors including: (i) different antigens being targeted within one serological assay; (ii) phase variation of some antigens, meaning that the results changed upon subculture; and (iii) horizontal genetic exchange, which resulted in antigen genes being reassorted among different Campylobacter genotypes (Allos et al., 2004; Dingle et al., 2001b).
Molecular typing methods based on electrophoresis banding patterns or analysis of single loci, including PFGE fingerprinting, RFLP analysis and flaA short variable region (SVR) typing, were successful in highlighting similarities among Campylobacter isolates from human disease and farm animal species. They also indicated a possible role for environmental reservoirs, such as water and flies, in the transmission of Campylobacter strains to poultry and other host sources (Hald et al., 2008; Newell & Fearnley, 2003). Difficulties in the reproducibility of methodology and interpretation of results among laboratories, however, precluded these being used as unified typing schemes by which Campylobacter isolates could be compared on a wider scale (Wassenaar & Newell, 2000).
The application of sequence-based typing schemes to Campylobacter, both multi-locus sequence typing (MLST) (Dingle et al., 2001b) and antigen gene sequence typing (AGST) (Dingle et al., 2008), provided the tools necessary for the reproducible and portable classification of Campylobacter isolates. The widespread adoption of these approaches, along with the provision of online databases that catalogue the extensive variation of these bacteria (pubMLST.org/campylobacter), has permitted major advances in understanding their epidemiology and population biology on a local, national, and global scale (Fig. 1). MLST is a nucleotide sequence-based scheme (Maiden et al., 1998), based on the principles of multi-locus enzyme electrophoresis (MLEE) (Selander et al., 1986), and gives congruent results with this method for those organisms where the two approaches have been compared, including Campylobacter (Meinersmann et al., 2002; Sails et al., 2003b). MLST indexes variation at a number of different housekeeping genes, usually seven, which are subject to stabilizing selection for conservation of function (Maiden, 2006). Data in the form of nucleotide sequence or allelic profiles are electronically portable, comparable, and lend themselves to further population genetic analyses with a range of approaches. MLST is very powerful in detecting groups of related organisms, commonly referred to as 'clones' or 'lineages', but can lack resolution for very closely related isolates. This resolution can be enhanced by adding a sequence-based characterization of a number of more variable loci, particularly those encoding protein antigens (AGST), or by using other approaches (Clark et al., 2012). This review describes the impact of sequence-based isolate characterization in improving our understanding of the biology of C. jejuni and C. coli since the development of the C. jejuni MLST scheme in 2001 (Dingle et al., 2001b).

Campylobacter MLST scheme
The definitive seven-locus MLST scheme used for Campylobacter indexes variation within fragments of seven housekeeping genes (aspA, glnA, gltA, glyA, pgm, tkt and uncA) (Dingle et al., 2001b). A number of alternative primer sets are available for the amplification and sequencing of these genes from C. jejuni and C. coli (Dingle et al., 2001b, 2005; Gormley et al., 2008; Miller et al., 2005): given the high genetic variability of Campylobacter isolates, a combination of these is often required for the characterization of the widest range of isolates possible. Two other MLST schemes, specific for C. jejuni and using some different housekeeping loci, have also been described (Manning et al., 2003; Suerbaum et al., 2001). These are broadly equivalent to the definitive scheme, but are not widely used. Although these alternative schemes do not contribute to the recognized sequence typing nomenclature maintained by the PubMLST database, the additional loci can be used to expand the original scheme and gain greater discrimination. MLST schemes have been described for other Campylobacter species but shall not be discussed further here (Miller et al., 2005; Parsons et al., 2012; van Bergen et al., 2005).
Because MLST is a nucleotide sequence-based scheme, its data can be used to design diagnostic techniques, for example real-time PCR assays that identify single nucleotide polymorphisms (SNPs) (Best et al., 2005). Unfortunately, however, the extent and nature of the diversity of Campylobacter populations, with very large numbers of SNPs that are reassorted by frequent horizontal genetic exchange, make it impossible to definitively associate SNPs with genotypes, and any technique that does not exhaustively sample variation may result in the misidentification of isolates of these highly diverse organisms. Fortunately, recent increases in sequencing capacity and reductions in cost make it increasingly possible to collect the contiguous sequence data necessary for reliable typing of these organisms (Sheppard et al., 2012).
In common with other MLST schemes, each of the alleles at each locus is assigned a unique arbitrary allele number, in order of discovery. Thus each isolate typed by MLST has an allelic profile made up of the allele numbers for each locus. Each allelic profile is, in turn, assigned an arbitrary sequence type (ST) number; the ST therefore defines 3309 bp of unique sequence (e.g. ST-21 has the allelic profile 2-1-1-3-2-1-5). The need for a scheme which summarizes and compresses sequence data this efficiently is demonstrated by the consideration that, at the time of writing, there were 5891 STs defined in the pubMLST/campylobacter website, corresponding to the many unique sequences of 3309 bp out of a genome of 1.64 Mbp (i.e. the MLST loci correspond to about 0.2 % of the Campylobacter genome). STs in turn can be grouped into clonal complexes, groups of STs that share a minimum of four identical alleles with an ST that has been defined as a 'central genotype'. These central genotypes are typically high frequency, widely distributed in space and time, and occupy a central position when STs from populations are analysed with heuristic approaches such as split decomposition (Huson, 1998), NEIGHBOURNET (Bryant & Moulton, 2004) or EBURST (Feil et al., 2004) (Fig. 2). For many bacteria, including Campylobacter, these informally defined groupings have proven useful units of analysis (Colles et al., 2003; Dingle et al., 2002). MLST data consisting of all alleles described to date, together with provenance data for isolates with new STs, are distributed via the internet-accessible, curated database http://pubmlst.org/campylobacter/. At the time of writing (May 2012) this database included submissions from more than 100 users from diverse laboratories worldwide comprising more than 130 642 sequences, 5891 STs and data for 18 406 isolates.
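The nomenclature described above can be sketched in a few lines of Python. This is a minimal illustration and not the PubMLST implementation: only the ST-21 allelic profile is taken from the text, while the two other profiles, their labels, and the function names are hypothetical. The sketch looks up an ST from an allelic profile, applies the 'at least four shared alleles with a central genotype' rule for clonal complex membership, and repeats the arithmetic behind the 0.2 % figure.

```python
from typing import Dict, Optional, Tuple

# Locus order follows the seven-locus scheme described in the text.
LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")
Profile = Tuple[int, ...]

# Only the ST-21 profile is quoted in the text; the other two profiles
# and their labels are hypothetical, made up for illustration.
ST_TABLE: Dict[Profile, str] = {
    (2, 1, 1, 3, 2, 1, 5): "ST-21",
    (2, 1, 12, 3, 2, 1, 5): "ST-hypothetical-A",  # differs from ST-21 at gltA only
    (7, 9, 8, 3, 2, 4, 6): "ST-hypothetical-B",   # shares only glyA and pgm with ST-21
}

# Central genotypes that define clonal complexes (only one is listed here).
CENTRAL_GENOTYPES: Dict[str, Profile] = {
    "ST-21 complex": (2, 1, 1, 3, 2, 1, 5),
}


def shared_alleles(a: Profile, b: Profile) -> int:
    """Count loci at which two allelic profiles carry the same allele number."""
    return sum(x == y for x, y in zip(a, b))


def assign_complex(profile: Profile, min_shared: int = 4) -> Optional[str]:
    """Return the first clonal complex whose central genotype shares at
    least `min_shared` alleles with `profile`, or None if there is none."""
    for name, central in CENTRAL_GENOTYPES.items():
        if shared_alleles(profile, central) >= min_shared:
            return name
    return None


def mlst_genome_fraction(mlst_bp: int = 3309, genome_bp: int = 1_640_000) -> float:
    """Percentage of the genome covered by the concatenated MLST fragments."""
    return 100.0 * mlst_bp / genome_bp


if __name__ == "__main__":
    for profile, st in ST_TABLE.items():
        print(st, dict(zip(LOCI, profile)), assign_complex(profile))
    print(f"{mlst_genome_fraction():.2f} % of the genome")
```

A profile that matches more than one central genotype would need a tie-breaking rule; the sketch simply returns the first match, which is an assumption of this example.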

Campylobacter AGST
Additional discrimination of isolates, where required, has been achieved by indexing variation in the SVRs of the antigen genes flaA and flaB, encoding A and B regions of the flagella (Alm et al., 1993), and porA, encoding a major outer-membrane protein (Zhang et al., 2000). These can be sequence-typed in combination with MLST, for example to detect strains responsible for disease outbreaks (Clark et al., 2005; Dingle et al., 2002, 2008; Meinersmann et al., 1997; Sails et al., 2003a). The diversity in the nucleotide sequences of these gene regions, which are under positive (diversifying) selection, is much greater than that seen amongst housekeeping genes, which are under stabilizing selection, and so for the antigens allele designations are given for both nucleotide sequences and the translated peptide sequences which they encode. As with MLST data, alleles are assigned and curated using the PubMLST database (pubmlst.org/campylobacter/), which at the time of writing held designations for over 1500 flaA/B and 1400 porA nucleotide sequences, and 357 FlaA/B and 1364 MOMP (PorA) peptide sequences. Strain designation using MLST and AGST of three loci in combination gave a discriminatory index greater than 0.99, which was higher than reported for methods such as PFGE (Dingle et al., 2008).
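To make the notion of a discriminatory index concrete, the sketch below computes Simpson's index of diversity in the Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), which is a common choice for comparing typing schemes. The text does not state which formula was used, so this choice is an assumption, and the isolate data and allele numbers are invented for illustration. The example shows how adding flaA and porA alleles to the ST can split isolates that MLST alone cannot distinguish.

```python
from collections import Counter
from typing import Hashable, Sequence


def discriminatory_index(types: Sequence[Hashable]) -> float:
    """Hunter-Gaston discriminatory index: the probability that two
    isolates drawn without replacement are assigned different types."""
    n = len(types)
    if n < 2:
        raise ValueError("at least two isolates are required")
    counts = Counter(types)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical isolates as (ST, flaA allele, porA allele); values are made up.
isolates = [
    ("ST-21", 1, 10),
    ("ST-21", 1, 11),
    ("ST-21", 2, 10),
    ("ST-45", 3, 12),
    ("ST-45", 3, 12),
    ("ST-257", 4, 13),
]

d_mlst = discriminatory_index([st for st, _, _ in isolates])
d_combined = discriminatory_index(isolates)

if __name__ == "__main__":
    print(f"MLST only:   D = {d_mlst:.3f}")
    print(f"MLST + AGST: D = {d_combined:.3f}")
```

With six isolates the index is far from the values reported for real collections; the point of the toy data is only the direction of the change when antigen loci are added.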
Allelic variants from the central genotype (ST-45) are indicated by underlining, and the vertices are labelled for the allelic change that they represent. Note that the relationships among the STs do not conform to a bifurcating tree-like model. Pragmatically, clonal complexes are defined within pubMLST/campylobacter by a central genotype, of which ST-45 is one, with all STs sharing at least four loci with the central genotype being included in the complex.
association or virulence (Clark et al., 2007; Cody et al., 2009). It is not recommended that sequence typing of the FlaA SVR be used as a typing method in the absence of other typing data such as those from MLST, since it is not a consistent marker of ST, clonal complex, or even species (Dingle et al., 2005; Djordjevic et al., 2007; Korczak et al., 2009; Meinersmann et al., 2005).

Campylobacter population structure
C. jejuni and C. coli are among the bacteria shown to be naturally competent for DNA uptake (Wang & Taylor, 1990), and this property, principally due to the horizontal genetic exchange that it promotes, has a major impact on their population structure and evolution, as it does on other transformable bacteria (Didelot & Maiden, 2010). The two Campylobacter species are genetically highly diverse, with much of this diversity generated by reassortment of sequence variation (Harrington et al., 1997), as indicated by the very large number of alleles identified for each of the MLST loci (318-605 at the time of writing, depending on the locus), which is exceeded by approximately an order of magnitude by the number of STs (5891). MLST data have been used to estimate recombination parameters for these organisms, indicating very high rates of change with a relatively low contribution from point mutation (Wilson et al., 2009). As a consequence of this, these organisms do not exhibit a clonal population structure (Levin, 1981), but are partially clonal (Maynard Smith et al., 1993), and their populations are dominated by clusters of related genotypes which are recognised by MLST as clonal complexes. Although clonal complexes are pragmatically defined, as described above, they nevertheless have the strength that they reflect the genealogy of the species (Sheppard et al., 2010a, 2011a) and have become major units of analysis for Campylobacter populations (Dingle et al., 2002).
Intriguingly, the two species have different population structures. C. jejuni populations comprise many clonal complexes; although some groups of clonal complexes are phylogenetically related to one another, there is little evidence of a clonal frame linking all clonal complexes (Maiden & Dingle, 2008). C. coli, by contrast, comprises three distinct clades (clades 1-3), which are related to each other clonally. Most human infection is caused by clade 1 isolates, the majority of which belong to one of two clonal complexes, the ST-828 complex and the ST-1150 complex. Comparison of clade 1 isolates with clade 2 and 3 isolates has shown that the ST-828 and ST-1150 complexes have undergone recent extensive and genome-wide introgression of genetic material from C. jejuni (Sheppard et al., 2008, 2011a). This observation has been controversial, as genome-wide introgression is unusual in bacteria and violates some models of how bacterial populations evolve (Caro-Quintero et al., 2009; Lefébure et al., 2010), although this process may also have occurred during the evolution of Salmonella enterica serovars Typhi and Paratyphi A (Didelot et al., 2007). This process may nevertheless be the exception rather than the rule in bacterial populations and reflect a recent change in the evolutionary pressures experienced by C. jejuni and C. coli as a consequence of the development of intensive agriculture practices, especially in chicken production (Sheppard et al., 2011a, b, 2012).

Epidemiology of human infection
The main motivation to study C. jejuni and C. coli is human campylobacteriosis (Gormley et al., 2011), and the application of sequence-based typing to isolates from human disease has revealed very high diversity of isolates, with many types recovered. Despite this diversity, there is remarkable similarity amongst isolate collections both on a national and international scale, even across different continents (Cody et al., 2012; Dingle et al., 2008; Duim et al., 2003; Kittl et al., 2011; Litrup et al., 2007; Mickan et al., 2007; Sopwith et al., 2006). For example, there was relatively little genetic differentiation evident amongst human disease isolates from two different areas of the UK, Canada and Australia; however, greater genetic variation was apparent comparing strains from Curaçao, an island in the Caribbean, with those from the more industrialized countries (Dingle et al., 2008). These results imply that climate, culture, agricultural practices, and food distribution are important contributing factors in shaping the global epidemiology of human campylobacteriosis. It appears that Campylobacter genotypes are able to cross different continents colonizing the same host and food source more easily than colonizing different host sources within even the same farm (McCarthy et al., 2007).
The Campylobacter MLST Project in Scotland (CaMPS) study, surveying 5674 human disease isolates over an 18 month period, demonstrated the scalability of sequence typing methods and the ease with which such data can be shared. In addition to enabling source attribution and confirming the similarity of genotypes recovered from campylobacteriosis in diverse regions of the UK, the CaMPS study demonstrated that there were differences in the Campylobacter types causing infection in urban areas, where the risk factor is most likely to be retail food, and rural areas, where young children were more likely to be infected with genotypes similar to those observed in bovines (Strachan et al., 2009). Similar effects have been seen with spatial modelling in New Zealand, where rural residence and a high density of dairy cattle have both been shown to be risk factors for Campylobacter infection (Spencer et al., 2012).
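The idea behind source attribution can be illustrated with a deliberately naive frequency-based sketch. It is not the method used in the studies cited here, which rely on population genetic models; it simply assigns each human isolate to sources in proportion to how common its clonal complex is within each source, with all sources given equal prior weight. The function name, the equal-weight assumption, and all counts below are hypothetical.

```python
from typing import Dict

Counts = Dict[str, int]


def attribute_sources(human: Counts, sources: Dict[str, Counts]) -> Dict[str, float]:
    """Return the proportion of human isolates attributed to each source.

    For a clonal complex cc, the weight of source s is the relative
    frequency of cc within s; weights are normalised across sources.
    Isolates whose complex is absent from every source are reported
    under the key 'unattributed'.
    """
    totals = {s: sum(c.values()) for s, c in sources.items()}
    attributed = {s: 0.0 for s in sources}
    attributed["unattributed"] = 0.0
    for cc, n_human in human.items():
        weights = {
            s: (sources[s].get(cc, 0) / totals[s]) if totals[s] else 0.0
            for s in sources
        }
        weight_sum = sum(weights.values())
        if weight_sum == 0:
            attributed["unattributed"] += n_human
            continue
        for s, w in weights.items():
            attributed[s] += n_human * w / weight_sum
    n_total = sum(human.values())
    return {s: v / n_total for s, v in attributed.items()}


# Hypothetical isolate counts per clonal complex (values are made up).
sources = {
    "chicken": {"ST-257 complex": 8, "ST-21 complex": 2},
    "cattle": {"ST-61 complex": 6, "ST-21 complex": 4},
}
human = {"ST-257 complex": 5, "ST-21 complex": 3, "ST-61 complex": 2}

result = attribute_sources(human, sources)

if __name__ == "__main__":
    for source, share in result.items():
        print(f"{source}: {share:.2f}")
```

In this toy data the ST-21 complex occurs in both sources, so its human isolates are split between them; that split is the ambiguity that makes complexes found in several hosts hard to attribute.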
Investigation of disease outbreaks caused by Campylobacter is impossible without high-resolution genetic typing methods (Clark et al., 2012). Household outbreaks and secondary transmission are relatively rare and difficult to detect: although it has been shown that 89 % of such outbreaks are caused by a single ST (Rotariu et al., 2010), the national distribution of Campylobacter strains through food exacerbates the problem of distinguishing point source outbreaks from dissemination by national food production. For example, temporally related clusters of indistinguishable Campylobacter subtypes in the UK are indicative of widely distributed food (Dingle et al., 2008). The situation may be even more complex, with a recent outbreak in Scotland being associated with contaminated chicken liver pâté, from which four different Campylobacter strains were recovered (Forbes et al., 2009).
A number of longitudinal studies of human campylobacteriosis conducted in the UK using MLST have shown that the Campylobacter genotypes infecting humans in different regions are similar, which is consistent with nationally distributed food being a major source of infection, with some infection due to travel abroad (Cody et al., 2012). Although these studies show no association of particular genotypes with virulence, different clonal complexes are prevalent at different times, with the ST-45 and ST-283 complexes more common in the summer. There is evidence that genotypes change over time but that this is a gradual process, with relatively small changes in the frequencies of different clonal complexes affecting humans year-on-year (Bessell et al., 2012; Cody et al., 2012; McCarthy et al., 2012; Sopwith et al., 2010). Some differences are apparent in the epidemiology of human infection with C. jejuni and C. coli, but the reasons for this are unclear (Sopwith et al., 2010). In conclusion, most human disease is caused by Campylobacter genotypes found in retail food, especially chicken meat, with relatively small changes in genotypes over shorter periods of time, except for a seasonal signal in some, but not all genotypes. Such changes that do occur among countries appear to be largely due to different exposure risks.

Neuropathology
Although the majority of Campylobacter infections result in diarrhoea of varying degrees of severity, some lead to the severe neuropathological disorders Guillain-Barré syndrome (GBS) and Miller Fisher syndrome (MFS), and to reactive arthritis. These conditions are relatively rare, and well-described isolate collections of associated Campylobacter are small in number and contain relatively few isolates. Nevertheless, the application of sequence-based characterization has enabled comparative studies of the genotypes of isolates from uncomplicated gastroenteritis with data from various collections of Campylobacter isolates associated with GBS, MFS and reactive arthritis obtained in the Netherlands, Belgium, Denmark and Bangladesh (Dingle et al., 2001a; Islam et al., 2009; Nielsen et al., 2010). These studies show that the isolates from patients with neuropathology are diverse and broadly similar to those from gastroenteritis, supporting the suggestion that the genetic factors responsible for neuropathology are reassorted among C. jejuni strains by horizontal genetic exchange, made on the basis of hybridization studies of 1712 genes in 56 isolates associated with neuropathology (Taboada et al., 2007).
Two studies have highlighted the relative over-representation of the ST-22 complex in neuropathology-associated isolates (Dingle et al., 2001a; Nielsen et al., 2010): this complex also accounted for 2/10 of the GBS-associated isolates but none of 39 gastroenteritis isolates from Bangladesh. In addition the ST-403 complex, represented by two different STs, accounted for 5/10 GBS isolates as well as 26 % of gastroenteritis isolates from the same country (Islam et al., 2009). This complex is associated with lipooligosaccharide (LOS) class B, which is thought to be a molecular mimic of gangliosides of human nerve cells (Kimoto et al., 2006; Mortensen et al., 2009). The ST-403 complex also accounted for one isolate in a European isolate collection, but this was consistent with its relative abundance amongst human disease isolates as a whole (Dingle et al., 2001a). It is uncertain whether the apparent greater prevalence of ST-403 amongst the Bangladesh GBS isolates truly represents an increased tendency to cause neuropathogenic disease, or whether it reflects a greater abundance amongst human disease isolates as a whole, similar to the distribution observed in Curaçao (Dingle et al., 2008). Taken together these data are consistent with neuropathology largely being a consequence of expression of particular surface antigens by Campylobacter strains which infect humans: this expression is somewhat, but not absolutely, correlated with membership of a particular clonal complex (Engberg et al., 2001).

Campylobacter in animals
Food-based surveys have consistently shown the presence of high levels of Campylobacter in chicken meat, implicating chickens as a major potential source of human infection (Gormley et al., 2011). Consequently there has been extensive interest in infection of commercial chickens by Campylobacter, and commercial broiler flocks were the best-sampled animal source of Campylobacter at the time of writing, with the sampling of caecal contents at slaughter the most commonly used method. Characterization of such isolates with MLST has enabled direct comparison of isolates from a wide variety of samples, and has been consistent with that undertaken with earlier, less precise, typing methods. This indicates that flocks themselves are the source of the majority of strains contaminating the end product, although cross contamination in the abattoir is also significant (Allen et al., 2007; Colles et al., 2010; Wirz et al., 2010). The Campylobacter isolates recovered after slaughter from one free-range broiler crop were more similar to those recovered from retail chicken meat than to isolates obtained from the live flock before slaughter, implying that production processes have an important impact on the differential survival of Campylobacter genotypes (Colles et al., 2010). There is increasing evidence that a number of clonal complexes are agriculturally associated and dominant throughout poultry production, and thus commonly reach the human consumer via this route (Colles et al., 2010; Müllner et al., 2010; Sheppard et al., 2011b).
With the chicken meat industry highly industrialized, it is possible that agricultural practices, for example the carriage of chicken-associated STs on travel crates, may promote the transfer of chicken-associated strains amongst farms (Hastings et al., 2011; Ridley et al., 2011). In New Zealand, particular genotypes were associated with different poultry producers; however, there were only three producers in New Zealand and the situation may be more complex in other, less remote, countries (Müllner et al., 2010). A study in Switzerland, for example, found only minor differences in Campylobacter genotypes recovered from different poultry meat production companies, although greater prevalence of the ST-257 complex was noted from one abattoir, and a novel ST accounted for 34.6 % of isolates from another (Wirz et al., 2010).
The Campylobacter populations that infect broiler flocks can be complex, containing multiple genotypes, and flocks may be colonized by a succession of different genotypes over time (Bull et al., 2006; Colles et al., 2008b, 2011a; Schouls et al., 2003). Most information is available for broiler flocks which are slaughtered at a young age; however, a longitudinal study of a free-range broiler breeder flock over the course of a year indicated that Campylobacter populations naturally become more diverse as flocks age (Colles et al., 2011a). The greater diversity observed was similar to that sampled amongst flocks of wild birds, which, in contrast to intensively reared commercial birds, exhibit much greater variation in age group, immunological maturity and diet (Colles et al., 2008a, 2009). There is evidence that better leg health and lower growth rate are associated with greater diversity of Campylobacter genotypes in Campylobacter-positive broiler flocks (Bull et al., 2008a; Colles et al., 2008b), perhaps indicating that a single genotype infection is indicative of poor health status in chickens. Thus, improved welfare of commercial flocks may be a means of managing Campylobacter prevalence (Bull et al., 2008b). Whilst flocks on-farm are colonized by a relatively limited number of genotypes, abattoirs are a convenient point at which a wide variety of flocks (Powell et al., 2012), farms and companies can be sampled, in order to maximize the extent of genetic diversity that can be recovered.
It remains difficult to establish definitively the routes by which chicken flocks become colonized, possibly because there are many of them. A number of studies have isolated Campylobacter strains that were indistinguishable by MLST from broiler flocks and their environment, including areas of housing, drinking water, puddles and nearby cattle (Ogden et al., 2007; Patriarchi et al., 2011). Large population-based studies, however, indicate that while certain clonal complexes, for example the ST-21 and ST-45 complexes, are able to colonize multiple host sources, the majority of Campylobacter isolates from broiler flocks are characteristically 'chicken-associated' and can be differentiated from ruminant and environmental strains (McCarthy et al., 2007; Mullner et al., 2009b; Sheppard et al., 2009b; Wilson et al., 2008) (Table 1). Campylobacter genotypes isolated from non-agricultural sources, such as wild birds and mammals, or from environmental waters that are not contaminated by agricultural run-off, are found only rarely among broiler flock isolates (Griekspoor et al., 2010). In one study of free-range chickens, the STs recovered at two separate locations were similar at the same time and could not be predicted by farm location, despite flocks being exposed to different local environments through minimal biosecurity (Colles et al., 2008b). Biosecurity remains a high priority in reducing Campylobacter levels in chickens, however, and MLST typing of strains provides a means by which 'environmental contamination' can more accurately be defined and attributed to agricultural or wildlife sources (Ridley et al., 2011).
Fewer large studies have been published for other animal sources but, as seen with chickens, the clonal complexes present vary within and among cattle herds over time, and frequently resemble those isolated from human disease (Kwan et al., 2008a; Sanad et al., 2011). Some studies have identified a spatial relationship among genotypes, with isolates being more similar within rather than among farms (French et al., 2005; Kwan et al., 2008a; Rotariu et al., 2009). These data suggest that differences may be maintained among farms by localized transmission and continual reinfection, whilst animal movements on a regional scale may contribute to overall homogeneity of Campylobacter diversity (Ridley et al., 2011). Scottish cattle and sheep have been shown to harbour different Campylobacter populations, despite sharing the same farm environment (Sproston et al., 2011), and while the genotypes of Campylobacter isolates from pet dogs are diverse, they exhibit a high degree of similarity to human disease isolates (Parsons et al., 2009). Companion animals may present a potential source of infection for humans, but it is also possible that they become infected by shared routes of transmission.

Host association
Many clonal complexes are over-represented amongst particular host sources (Table 1), with growing evidence that host specificity of at least some Campylobacter genotypes overrides geographical location (McCarthy et al., 2007; Sheppard et al., 2010b). Examples include: (i) the ST-45 and ST-257 clonal complexes, which are common amongst poultry in Europe, New Zealand and, to a lesser extent, Senegal (Colles et al., 2003; de Haan et al., 2010a; Griekspoor et al., 2010; Habib et al., 2009a; Jorgensen et al., 2011; Kinana et al., 2006; Magnússon et al., 2011; McTavish et al., 2009; Patriarchi et al., 2011; Wirz et al., 2010); (ii) the ST-61 and ST-42 clonal complexes, which have been found to be common amongst ruminants in
This database depends on voluntary submissions and the data are, therefore, neither necessarily representative nor exhaustive; however, they do indicate that different clonal complexes show marked differences in the likelihood of being associated with particular isolation sources. Note that all clonal complexes are C. jejuni, with the exception of the ST-828 clonal complex, which is C. coli.

The differentiation of Campylobacter populations isolated from diverse wild bird species is particularly striking (Colles et al., 2008a, 2011b; Ogden et al., 2009; Sheppard et al., 2011b). C. jejuni populations isolated from more than 400 samples from wild starlings and geese sampled in the same geographical location provide a typical example, having no STs in common, and being more than 60 % different using FST, a measure of genetic differentiation, or gene flow (Colles et al., 2008a). There is evidence that some clonal complexes, in particular the ST-21 and ST-45 complexes, are 'multihost' genotypes which can be isolated from a wide variety of agricultural and environmental sources (Table 1) (Colles et al., 2003; Sheppard et al., 2011b). These genotypes remain a challenge in attribution and epidemiological analysis, and may require higher-resolution genotyping than seven-locus MLST.
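As a rough illustration of how a statistic such as FST quantifies differentiation between two isolate collections, the sketch below treats each ST as an allele at a single locus and computes FST = (H_T - H_S) / H_T, where H_S is the mean within-population gene diversity and H_T the diversity of the pooled frequencies. This simple estimator is an assumption of the example; the cited study may well have used a different, sequence-based estimator, and the ST labels below are placeholders rather than real data.

```python
from collections import Counter
from typing import Dict, Sequence


def frequencies(sample: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each type in a sample of isolates."""
    counts = Counter(sample)
    n = len(sample)
    return {k: c / n for k, c in counts.items()}


def gene_diversity(freqs: Dict[str, float]) -> float:
    """Probability that two random draws (with replacement) differ."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pop_a: Sequence[str], pop_b: Sequence[str]) -> float:
    """Two-population FST with equal population weights."""
    fa, fb = frequencies(pop_a), frequencies(pop_b)
    h_s = (gene_diversity(fa) + gene_diversity(fb)) / 2
    pooled = {k: (fa.get(k, 0.0) + fb.get(k, 0.0)) / 2 for k in set(fa) | set(fb)}
    h_t = gene_diversity(pooled)
    if h_t == 0:
        return 0.0  # both populations fixed for the same single type
    return (h_t - h_s) / h_t


# Placeholder ST labels for two hosts with no STs in common.
host_one = ["a"] * 5 + ["b"] * 5
host_two = ["c"] * 5 + ["d"] * 5

if __name__ == "__main__":
    print(f"FST = {fst(host_one, host_two):.3f}")
```

Note that sharing no STs does not by itself give a value of 1 with this estimator: within-population diversity lowers the statistic, so highly diverse populations can be completely non-overlapping and still score well below 1.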
The situation is less clear for C. coli, since the majority of isolates from human disease and agricultural sources group into the large ST-828 clonal complex, or the less frequently isolated ST-1150 clonal complex. Nonetheless, host association is evident, since the C. coli genotypes isolated from large studies of pigs have little overlap with those isolated from turkey, chicken, and human disease sources (Lang et al., 2010; Litrup et al., 2007; Miller et al., 2006, 2010). A small number of host-associated multidrug-resistant C. coli genotypes have been identified in association with the US turkey production industry (D'Lima et al., 2007). In addition, it has been shown that wild mallard ducks were colonized by C. coli that grouped into two clades which were distinct from the agriculture-associated ST-828 and ST-1150 clonal complexes (Colles et al., 2011b). After 10 years of MLST, novel STs for both Campylobacter species are now most often isolated from environmental sources such as water and wildlife, including rabbits, badgers, bank voles and even slugs, suggesting that the diversity of the Campylobacter population as a whole is far from exhaustively sampled but that we have a reasonable picture for human and agricultural isolates (Carter et al., 2009; French et al., 2005; Kwan et al., 2008b; Lévesque et al., 2008; Sproston et al., 2010; Williams et al., 2010).
Despite evidence of strong host association for particular clonal complexes, the frequency with which they are isolated varies among countries and farms, and over time. There are examples of localized transmission resulting in, for example, the predominance of the ST-474 complex amongst chicken and human disease isolates in New Zealand (McTavish et al., 2008). Some studies have identified higher similarity among clonal complexes isolated from chicken flocks that are sampled closest in time (Colles et al., 2008b; Jorgensen et al., 2011) or within, rather than among, farms for ruminant isolates (Rotariu et al., 2009). A succession of genotypes colonizing flocks and/or farms over time amongst chickens and ruminants has also been reported (Bull et al., 2006; Colles et al., 2008b, 2011a; Kwan et al., 2008a), and there is some evidence for different seasonal distributions amongst Campylobacter genotypes. The ST-45 clonal complex, in particular, exhibits its highest prevalence amongst a number of different isolation sources, including human disease sources, during spring or summer months (Grove-White et al., 2011; Jorgensen et al., 2011; Sopwith et al., 2006). Knowledge of the way in which prevalence of Campylobacter strains varies is an essential component of understanding their routes of transmission and potential for human disease (Sheppard et al., 2009a).
Two clonal complexes, the ST-21 and ST-45 complexes, are particularly diverse and are frequently isolated from a wide variety of sources (Table 1). They may represent genotypes that have evolved to exploit a range of different animal hosts, and indeed there is some evidence from whole-genome sequencing (WGS) studies of five ST-21 isolates that this might be the case (Gripp et al., 2011). Alternatively, it may be that these large clonal complexes contain subgroups of host-specific genotypes that are closely related and not easily distinguished from seven- or 10-locus data. The reasons for host specificity of specific Campylobacter genotypes, in terms of detailed molecular mechanisms, remain unclear, but MLST studies provide an evidence base from which representative isolates can be chosen for further studies to resolve these questions.

Genetic source attribution of Campylobacter isolates
Genetic attribution studies using MLST data have made a major contribution in improving our understanding of the relative importance of different infection sources in human disease, and therefore the relative importance of different routes of transmission (Cody et al., 2010b). This is an essential prerequisite for the design of effective interventions to reduce the incidence of human disease. A number of different genetic attribution models have been developed and employed with MLST data: (i) the Dutch model (Mullner et al., 2009b); (ii) the modified Hald model (Mullner et al., 2009a); (iii) STRUCTURE (Falush et al., 2003; Pritchard et al., 2000); (iv) the Asymmetrical Island (AI) model (Wilson et al., 2008), and (v) the Bayesian Analysis of Population Structure (BAPS) model (Corander & Marttinen, 2006). The Dutch and modified Hald models are based on comparing the number of human cases of disease caused by a Campylobacter subtype, relative to the proportional occurrence of particular subtypes in each potential host source (Mullner et al., 2009b). The modified Hald model additionally incorporates a Bayesian approach to statistical analysis in order that uncertainty regarding model parameters may be addressed. Rather than ST (i.e. allelic profile), STRUCTURE, the AI model and BAPS use nucleotide sequence data combined with Bayesian statistics.
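The proportional reasoning behind the Dutch model, as summarized above, can be sketched as follows. This is a simplified illustration, not the published implementation: for each ST, human cases are shared among sources in proportion to the relative frequency of that ST within each source, and cases with an ST seen in no source are left unattributed. The function name, source labels and all counts are hypothetical.

```python
def dutch_model_attribution(human_cases, source_counts):
    """Proportionally attribute human cases to sources, ST by ST.

    human_cases:   {st: number of human disease isolates with that ST}
    source_counts: {source: {st: number of isolates of that ST from the source}}
    Returns ({source: attributed cases}, number of unattributed cases).
    """
    # Relative frequency of each ST within each source
    source_props = {}
    for source, counts in source_counts.items():
        total = sum(counts.values())
        source_props[source] = (
            {st: n / total for st, n in counts.items()} if total else {}
        )

    attributed = {source: 0.0 for source in source_counts}
    unattributed = 0.0
    for st, cases in human_cases.items():
        weights = {s: source_props[s].get(st, 0.0) for s in source_counts}
        denom = sum(weights.values())
        if denom == 0:
            # ST not observed in any sampled source
            unattributed += cases
            continue
        for source, weight in weights.items():
            attributed[source] += cases * weight / denom
    return attributed, unattributed


# Invented counts for illustration only
human = {"ST-45": 30, "ST-61": 10, "ST-9999": 5}
sources = {
    "chicken": {"ST-45": 8, "ST-257": 2},
    "cattle": {"ST-61": 5, "ST-45": 5},
}
by_source, unknown = dutch_model_attribution(human, sources)
print(by_source, unknown)
```

The sketch makes the dependence on the reference collections explicit: the attributed shares change whenever the source samples change, which is why representative isolate collections matter for these analyses.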
The results from these methods using data from Scotland, north-west England and New Zealand were in agreement that strains isolated from chickens, and in particular chicken meat, were the most similar to those isolated from human disease (Gormley et al., 2008; Mullner et al., 2009b; Sheppard et al., 2009b). The percentage estimates have varied, with 58-76 % attribution to chicken sources for New Zealand human disease isolates and 58-78 % attribution to chicken sources for Scottish human disease isolates. This, at least in part, reflected differences in the attribution models used. The Asymmetrical Island model, which unlike STRUCTURE does not assume that all loci are unlinked, consistently gave the highest attribution to chicken and, importantly, also performed best with self-attribution validation tests (Sheppard et al., 2009b).
The application of BAPS attribution methods to data from Finland found attribution of human disease isolates to chicken sources to be much lower than in Scotland and New Zealand, at 45.4 % (de Haan et al., 2010b). This can be explained by a lower prevalence of Campylobacter amongst Finnish chicken flocks and differences in transmission to humans in Finland, where environmental exposure is higher, for example as a consequence of outdoor activities and swimming (McCarthy et al., 2012). In this case, attribution of human disease isolates to cattle sources was of equal importance, whilst 10.3 % of isolates could not be attributed to a particular source, suggesting that further investigation of more unusual transmission routes, for example water or companion animals, may be required.
In summary, the widespread collection of MLST data has made it possible to compare results for large numbers of isolates within and among different studies and sampling regimes, enabling the importance of transmission routes of human infection to be investigated. In early 2012, challenges remained in the application of this approach, including the availability of computational power, the number of loci to be included, and the availability of representative isolate collections to be used as reference sets; however, there was the prospect of refining these estimates as more data became available and methods improved. It is essential that large and representative strain collections from population studies are chosen for source attribution analyses in order to avoid false emphases and misleading results. Much added value is to be gained from information on prevalence as well as from provenance information for strains to ensure that biological significance is not lost.

Antimicrobial resistance
(Habib et al., 2009b), and in Senegal, where a relationship has been reported between ST-353 complex isolates from chicken carcasses and a particular resistance-conferring substitution in the gyrA gene (T86I), although identical resistance mechanisms could be found in distantly related isolates (Kinana et al., 2006). A number of other studies, however, have found less evidence for association of resistance phenotypes with membership of particular clonal complexes (Lévesque et al., 2008; Wang et al., 2011).
Resistance to erythromycin is typically much higher amongst C. coli isolates than C. jejuni (Chan et al., 2007; Wang et al., 2011; Wirz et al., 2010). Most of 104 C. coli isolates from US turkeys, with diverse STs, were resistant to erythromycin, with the exception of a cluster of STs characterized by the asp-103 MLST allele, thought to originate in C. jejuni (Chan et al., 2007). Whilst the majority of C. coli isolates were associated with an intervening sequence in the 23S rRNA genes, it was predictably absent in the cluster of sensitive STs. Extensive resistance to antimicrobials relevant to human disease is present among C. coli isolates from pigs in the USA and Switzerland (Egger et al., 2012; Thakur & Gebreyes, 2005).
These results demonstrate the comparative power that the adoption of sequence-based typing has provided, enabling studies from different parts of the world to be compared as they generate comparable data. Fluoroquinolone resistance in particular is a global problem (Smith & Fratamico, 2010), as it is present throughout the food chain and in human disease (Cody et al., 2010a), and certain Campylobacter genotypes are associated with resistance (Wirz et al., 2010). In some cases the same gyrA allele conferring resistance can be found in widely different types, although it is not possible at the present time to be sure if this is a consequence of horizontal genetic exchange or selection pressures producing the same resistance mutations in diverse bacteria (Kinana et al., 2006).

Future directions
At the time of writing (early 2012) clinical microbiology is entering an era of WGS of multiple bacterial isolates, with the prospect of highly parallel 'next generation' sequencing technology becoming available for routine applications (Sheppard et al., 2011b). The generation of high-quality sequence data for the great majority of the genome for between 10 and 100 US Dollars per bacterial isolate means that cost is unlikely to be a limiting factor, and instrument manufacturers are aggressively developing 'bench-top' parallel sequencers for mass deployment. It is not entirely clear, however, which technology will finally be deployed, although the data generated by the different instruments are largely equivalent. The more pertinent question is how these data are best to be exploited for epidemiological and evolutionary studies (Medini et al., 2008).
MLST and AGST data are entirely compatible with WGS data as they are effectively subsets of it; indeed, as has been illustrated here, the combined 10 loci used from MLST and AGST can resolve many clinical and epidemiological questions, and WGS data may be overdiscriminatory in some cases. WGS data make the great majority of the loci in the genome accessible simultaneously, increasing the range of questions that can be addressed. In 2010 the PubMLST database was upgraded to operate on a new database system, BIGSDB (Jolley & Maiden, 2010), replacing the MLSTDBNET (Jolley et al., 2004) and AGDBNET (Jolley & Maiden, 2006) systems, which formerly ran separate MLST and AGST databases. This upgrade enables pubMLST/campylobacter to hold and exploit WGS data.
BIGSDB stores three fundamental types of information: (i) isolate provenance and phenotype data; (ii) any type of sequence data, held in a sequence bin; and (iii) catalogues of locus diversity. The database has no inherent limitations on the number of any of these types of data, and a 'locus' for this purpose can be any nucleotide or peptide string.
Loci can be grouped into any number of schemes, such as the MLST scheme, to provide a further level of data organization. Standard search tools, such as BLAST, are used to search for known variants of predefined loci in the sequence bin. These are reported back as: (i) an identified allele; (ii) a novel sequence with a known relationship to an identified allele; or (iii) not present. The database then tags the related sequence in the sequence bin for easy future reference (Jolley & Maiden, 2010). If most genes are catalogued in the database, this means that the whole genome of a newly sequenced isolate can be rapidly and largely automatically annotated and its genetic variation defined. For organisms such as Campylobacter, which may have an 'open' genome, some gene discovery will be necessary for new isolates (Duong & Konkel, 2009), but most of the genome will be rapidly and effectively annotated (Sheppard et al., 2011b).
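The three possible outcomes of a locus search described above can be illustrated with a small sketch. It is not BIGSDB code and does not call BLAST: an exact substring search followed by an ungapped, per-position identity scan is used here purely as a stand-in for the real search tool. The function name, the 90 % identity threshold, the allele labels and the short sequences are all hypothetical.

```python
def call_allele(contig, known_alleles, min_identity=0.9):
    """Classify one locus in a contig against catalogued alleles.

    known_alleles: {allele_id: nucleotide sequence}
    Returns a dict whose 'status' is 'identified', 'novel' or 'not present'.
    """
    # (i) exact match to a catalogued allele
    for allele_id, seq in known_alleles.items():
        start = contig.find(seq)
        if start != -1:
            return {"status": "identified", "allele": allele_id,
                    "start": start, "end": start + len(seq)}

    # (ii) closest related sequence, scanning ungapped windows (BLAST stand-in)
    best = None
    for allele_id, seq in known_alleles.items():
        for start in range(len(contig) - len(seq) + 1):
            window = contig[start:start + len(seq)]
            identity = sum(a == b for a, b in zip(seq, window)) / len(seq)
            if best is None or identity > best["identity"]:
                best = {"status": "novel", "closest": allele_id,
                        "identity": identity,
                        "start": start, "end": start + len(seq)}
    if best is not None and best["identity"] >= min_identity:
        return best

    # (iii) nothing sufficiently similar was found
    return {"status": "not present"}


# Invented 20-base 'alleles' for illustration only
catalogue = {
    "aspA_1": "ATGGCTAAAGTTCGTGATCT",
    "aspA_2": "ATGGCTAAGGTTCGTGACCT",
}
print(call_allele("CCCC" + catalogue["aspA_2"] + "GGGG", catalogue))
print(call_allele("TTTT" + "ATGGCTAAAGTTCGTGATGT" + "AAAA", catalogue))
print(call_allele("ACACACACACACACACACACACACACAC", catalogue))
```

The returned start and end coordinates correspond to the idea of tagging the matched region in the sequence bin, so that the locus does not need to be searched for again.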
The hierarchical approach enshrined in BIGSDB enables the extensive diversity of the Campylobacter isolates of medical importance to be studied efficiently (Sheppard et al., 2012).
What is perhaps extraordinary is that so much has been learned about the epidemiology, pathogenesis and population biology of these organisms from the highly diverse seven MLST and three AGST loci, corresponding to a tiny fraction of the genome: this is presumably a reflection of the extensive genetic structuring of Campylobacter populations, evidenced by the correlation of MLST data with whole-genome sequences and hybridization studies (Hepworth et al., 2011; Lang et al., 2010; Taboada et al., 2008; Zautner et al., 2011). The association of particular genotypes with given phenotypes presumably reflects an important role of selection-driven adaptation to particular niches in the emergence of these types (Sheppard et al., 2011b). If this is indeed the case, then the careful correlation of genomic sequence data with known phenotype will ultimately elucidate the biology of these fascinating and persistent pathogens.