Overview of molecular typing methods for outbreak detection and epidemiological surveillance



Introduction
Identifying different types of organisms within a species is called typing. Traditional typing systems based on phenotype, such as serotype, biotype, phage type or antibiogram, have been used for many years. However, methods that examine the relatedness of isolates at the molecular level have revolutionised our ability to differentiate among bacterial types (or subtypes). The choice of an appropriate molecular typing method (or methods) depends largely on the problem to be solved and the epidemiological context in which the method is to be used, as well as on the temporal and geographical scale of its use. Importantly, human pathogens of one species can comprise very diverse organisms. Therefore, typing techniques should have excellent typeability, i.e. they should be able to assign a type to every isolate studied [1]. In outbreak investigations, a typing method must have the discriminatory power needed to distinguish all epidemiologically unrelated isolates. Ideally, such a method can also discriminate very closely related isolates to reveal person-to-person transmission of a strain, which is important for developing strategies to prevent further spread. At the same time, it must be rapid, inexpensive, highly reproducible, and easy to perform and interpret [1,2]. When typing is applied for continuous surveillance, the method must yield results that are sufficiently stable over time to allow the implementation of efficient infection control measures. Moreover, a typing method that is to be used in international networks should produce data that are portable (i.e. easily transferable between different systems) and that can be easily accessed via an open-source web-based database or a client-server database connected via the Internet. Additionally, a typing method used for surveillance should rely on an internationally standardised nomenclature, and it should be applicable to a broad range of bacterial species. There should also be procedures in place to check and validate, using quantifiable internal and external controls, that the typing data are of high quality. A clear advantage for a typing approach is the availability of software that: (i) enables automated quality control of raw typing data, (ii) allows pattern/type assignment, (iii) implements an algorithm for clustering isolates based on the obtained data, (iv) assists in the detection of outbreaks of infections, and (v) facilitates data management and storage. To date, many different molecular methods for the epidemiological characterisation of bacterial isolates have been developed. However, none of them is optimal for all forms of investigation. Thus, a thorough understanding of the advantages and limitations of the available typing methods is crucial for selecting the appropriate approaches to unambiguously define outbreak strains.
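Typeability and discriminatory power are usually quantified rather than only described. A common measure of discriminatory power, not specified in the text above, is the Hunter-Gaston adaptation of Simpson's index of diversity: the probability that two isolates drawn at random from a test population are assigned to different types. The sketch below assumes that type labels have already been assigned, records non-typeable isolates as `None` and excludes them from the index; the isolate data are invented for illustration.

```python
from collections import Counter


def typeability(types):
    """Fraction of isolates assigned a type; None marks a non-typeable isolate."""
    return sum(t is not None for t in types) / len(types)


def discriminatory_index(types):
    """Hunter-Gaston index D = 1 - sum(n_j*(n_j-1)) / (N*(N-1)).

    N is the number of typed isolates (non-typeable ones are excluded here)
    and n_j is the number of isolates assigned to type j.
    """
    typed = [t for t in types if t is not None]
    n = len(typed)
    if n < 2:
        raise ValueError("at least two typed isolates are needed")
    same_type_pairs = sum(c * (c - 1) for c in Counter(typed).values())
    return 1 - same_type_pairs / (n * (n - 1))


# Invented example: ten isolates, one of which could not be typed.
isolates = ["A", "A", "A", "B", "B", "C", "D", "E", "F", None]
print(round(typeability(isolates), 2))           # 0.9
print(round(discriminatory_index(isolates), 3))  # 0.889
```

A value of D close to 1 means that almost every pair of unrelated isolates is resolved into different types.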
Here, we present an overview of the typing methods that are currently used in bacterial disease outbreak investigations and active surveillance networks, and we specify their advantages and disadvantages. We focus on those methods that have the strongest impact on public health, or for which there is growing interest in relation to clinical use.

PubMed database searches
To investigate the impact of typing methods on public health, we first queried the PubMed database using combinations of specific keywords to retrieve the relevant articles, without any restriction on the date of publication. Furthermore, to reveal a growing interest in particular typing methods, we subsequently restricted the PubMed search to articles published between January 2010 and the present day (as of 1 December 2012). We considered a method to be of growing interest when the number of articles published between January 2010 and the present day was higher than the number of articles published before 2010. Specifically, an electronic search was conducted using the following combinations of keywords: PFGE [AND] typing; […]. Also, to identify the impact of particular typing methods on currently conducted outbreak investigations, we searched the PubMed database with a restriction to articles published between January 2011 and the present day, using the following combinations of specific keywords: PFGE [AND] outbreak; […]. The results of these literature searches are included in the following sections of this review that address the respective typing methods.
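The 'growing interest' criterion is a simple comparison of two hit counts. As a minimal sketch (the function name is ours, not part of any tool), applied to two pairs of counts reported later in this review (total hits, hits dated after the end of 2009):

```python
def is_of_growing_interest(total_hits, hits_since_2010):
    """True if more articles date from January 2010 onwards than from before 2010."""
    hits_before_2010 = total_hits - hits_since_2010
    return hits_since_2010 > hits_before_2010


# (total hits, hits dated after the end of 2009), as reported in this review
searches = {
    "DiversiLab AND typing": (63, 48),
    "spa AND typing": (548, 341),
}
for term, (total, recent) in searches.items():
    print(term, is_of_growing_interest(total, recent))  # True for both
```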

Pulsed-field gel electrophoresis
Pulsed-field gel electrophoresis (PFGE) has been considered the 'gold standard' among molecular typing methods for a variety of clinically important bacteria. When 'PFGE AND typing' were used as search terms, over 2,700 publications were retrieved from PubMed, which underscores the major influence and importance of this method in the field. For most bacterial species, the technique was adopted as an epidemiological tool in the 1990s [3-6]. Today, it is still the most frequently used approach to characterise bacterial isolates in outbreaks [7,8], as revealed by a PubMed search restricted to articles published between January 2011 and the present day. In total, 183 hits were obtained for the terms 'PFGE AND outbreak', whereas searches for each of the other methods in combination with the term 'outbreak' yielded fewer than 100 hits. For many years, PFGE has been a primary typing tool for analysing centre-to-centre transmission events, and it has been used successfully in large-scale epidemiological investigations [9]. The success of PFGE results from its excellent discriminatory power and high epidemiological concordance. Moreover, it is a relatively inexpensive approach with excellent typeability and intra-laboratory reproducibility. In the past decade, PFGE protocols have been standardised and inter-laboratory comparisons have been undertaken through several initiatives, such as PulseNet [10] and Harmony [11]. It has also been possible to establish international fingerprinting databases, which have allowed fast detection of emerging clones and monitoring of the spread of pathogenic bacterial strains across regions and countries. To perform PFGE, highly purified genomic DNA is cleaved with a restriction endonuclease that recognises sites occurring infrequently in the genome of the respective bacterial species. The resulting restriction fragments, which are mostly large, are separated on an agarose gel by 'pulsed-field' electrophoresis, in which the orientation of the electric field across the gel is changed periodically. The separated DNA fragments are visualised as bands that form a characteristic pattern on the gel, the PFGE pattern. For most bacteria, PFGE can resolve DNA fragments ranging in size from about 30 kb to over 1 Mb [12]. Large restriction fragments are thus separated in a size-dependent manner, and the method yields relatively few bands, which makes the results easier to analyse. A clear advantage of PFGE is that it covers a large portion (>90%) of the investigated genome. Accordingly, insertions or deletions of mobile genetic elements, as well as large recombination events within genomic DNA, result in changes in the PFGE pattern. Plasmid DNA usually does not interfere with the macrorestriction profile of the chromosomal DNA, which determines the PFGE pattern, because the fragments generated by restriction of plasmid DNA are too small to affect the profile. However, in some bacteria, differences in the carriage of large plasmids (over 50 kb) have been observed as single-band differences between PFGE profiles [12]. Although widely used, PFGE suffers from several limitations. The method is technically demanding, labour-intensive and time-consuming, and it may lack the resolution to distinguish bands of nearly identical size (i.e. fragments differing from each other in size by less than 5%). Moreover, the analysis of PFGE results is prone to some subjectivity, and continuous quality control and portability of data are limited compared with sequence-based methods.
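Computer-assisted comparison of PFGE patterns usually reduces each lane to a list of band sizes and scores pairs of lanes with a band-based similarity coefficient. The text does not prescribe a particular coefficient; the sketch below uses the Dice coefficient, 2 × shared bands / (bands in A + bands in B), together with a hypothetical position tolerance of 1.5% and a simple greedy band-matching rule, both of which are assumptions rather than a standard. The band sizes (in kb) are invented.

```python
def count_shared_bands(bands_a, bands_b, tolerance=0.015):
    """Greedily pair bands whose sizes differ by at most `tolerance`
    (relative to the larger band); each band is used at most once."""
    unmatched = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for i, other in enumerate(unmatched):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del unmatched[i]  # safe: we break out of the loop immediately
                break
    return shared


def dice_similarity(bands_a, bands_b, tolerance=0.015):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    if not bands_a and not bands_b:
        return 1.0
    shared = count_shared_bands(bands_a, bands_b, tolerance)
    return 2 * shared / (len(bands_a) + len(bands_b))


# Invented band sizes (kb): isolate 2 lacks the 48 kb band.
isolate_1 = [680, 450, 310, 240, 170, 95, 48]
isolate_2 = [680, 452, 310, 240, 170, 95]
print(round(dice_similarity(isolate_1, isolate_2), 3))  # 0.923
```

The tolerance is what absorbs small run-to-run differences in band position; choosing it is one source of the subjectivity mentioned above.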

Amplified fragment length polymorphism
In the amplified fragment length polymorphism (AFLP) method, genomic DNA is cut with two restriction enzymes, and double-stranded adaptors are specifically ligated to one of the sticky ends of the restriction fragments [13]. Subsequently, the restriction fragments ending with the adaptor are selectively amplified by polymerase chain reaction (PCR) using primers complementary to the adaptor sequence, the restriction site sequence and a number of additional nucleotides (usually 1-3) extending into the unknown DNA template. At the start of the amplification, highly stringent conditions are used to ensure that primers bind only to fully complementary template sequences. AFLP allows the specific co-amplification of high numbers (typically between 50 and 100) of restriction fragments and is often carried out with fluorescent dye-labeled PCR primers. This allows the fragments to be detected once they have been separated by size on an automated DNA sequencer. Subsequent computer-assisted comparison of the high-resolution banding patterns generated by AFLP enables the genetic relatedness among the studied bacterial isolates to be determined [14]. AFLP has been described as being at least as discriminatory as PFGE [15]. In addition, AFLP is reproducible and, like other DNA banding pattern-based methods, it can be automated [16] and its results are portable. The major limitations of AFLP are that it is labour-intensive (a typical analysis takes about three days) and that the kits for extraction of total DNA, the enzymes, the fluorescence detection systems and the adaptors are expensive.
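The restriction step of AFLP can be simulated in silico on a known sequence to anticipate which fragments could be amplified. The sketch below assumes the enzyme pair EcoRI (G^AATTC) and MseI (T^TAA), which the text does not name, keeps only fragments with one end from each enzyme (an assumption about the adaptor and primer design), and applies an assumed detection window of 50-500 bp. It ignores the selective nucleotides of the primers, which in practice reduce the number of amplified fragments further.

```python
import re

# Assumed enzyme pair: recognition site and cut offset on the top strand.
# Both sites are palindromic, so scanning one strand finds every site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "MseI": ("TTAA", 1),     # T^TAA
}


def cut_positions(seq):
    """Sorted (position, enzyme) pairs for every cut in a linear sequence."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # Zero-width lookahead also finds overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)


def aflp_fragment_sizes(seq, min_len=50, max_len=500):
    """Sizes of EcoRI-MseI fragments inside an assumed detection window.

    The sequence ends and the selective nucleotides are ignored.
    """
    cuts = cut_positions(seq.upper())
    sizes = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        if {left, right} == {"EcoRI", "MseI"} and min_len <= end - start <= max_len:
            sizes.append(end - start)
    return sorted(sizes)


# Synthetic sequence with two EcoRI sites and one MseI site.
seq = "A" * 20 + "GAATTC" + "C" * 100 + "TTAA" + "G" * 200 + "GAATTC" + "C" * 30
print(aflp_fragment_sizes(seq))  # [106, 204]
```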

Random amplification of polymorphic DNA and arbitrarily primed polymerase chain reaction
Random amplification of polymorphic DNA (RAPD) is based on the parallel amplification of a set of fragments using short arbitrary sequences (usually 10 bases) as primers that target several unspecified genomic sequences. Amplification is conducted at a low, non-stringent annealing temperature, which allows primers to hybridise to sequences containing multiple mismatches. When two primer binding sites on opposite DNA strands face each other at a distance of about 0.1-3 kb, an amplicon covering the sequence between them can be generated. The number and positions of primer binding sites vary between strains, so each strain yields a characteristic set of amplicons. RAPD amplicons can be analysed by agarose gel electrophoresis or, when the primers are labeled with appropriate fluorescent dyes, on an automated DNA sequencer. Although less discriminatory than PFGE, RAPD has been widely used for typing bacterial isolates in outbreaks [17,18], because it is simple, inexpensive, rapid and easy to use. The main drawback of RAPD is its low intra-laboratory reproducibility, which results from the very low annealing temperatures used. Moreover, RAPD lacks inter-laboratory reproducibility, since it is sensitive to subtle differences in reagents, protocols and equipment.
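Why a single arbitrary primer yields several amplicons can be illustrated with a crude in-silico model: the primer 'binds' wherever the template matches it with at most one mismatch (a stand-in for low-stringency annealing), and an amplicon forms when a site on one strand and a site on the opposite strand face each other 0.1-3 kb apart. The mismatch threshold, the primer and the template are assumptions; real annealing depends on temperature, mismatch position and reaction conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def binding_sites(seq, probe, max_mismatches):
    """Start positions where `probe` matches `seq` with <= max_mismatches."""
    k = len(probe)
    return [
        i for i in range(len(seq) - k + 1)
        if sum(a != b for a, b in zip(seq[i:i + k], probe)) <= max_mismatches
    ]


def rapd_amplicon_sizes(seq, primer, max_mismatches=1, min_len=100, max_len=3000):
    """Predict amplicon sizes for one arbitrary primer on a linear template.

    A match to the primer itself on the top strand means the primer anneals
    to the bottom strand and extends rightwards; a match to its reverse
    complement means it anneals to the top strand and extends leftwards.
    An amplicon spans a rightward site and a downstream leftward site.
    """
    k = len(primer)
    rightward = binding_sites(seq, primer, max_mismatches)
    leftward = binding_sites(seq, reverse_complement(primer), max_mismatches)
    sizes = [
        r + k - f
        for f in rightward
        for r in leftward
        if min_len <= r + k - f <= max_len
    ]
    return sorted(sizes)


primer = "CCGCAGCGGC"  # hypothetical 10-mer
template = "A" * 50 + primer + "T" * 500 + reverse_complement(primer) + "A" * 50
print(rapd_amplicon_sizes(template, primer))  # [520]
```

Because the mismatch tolerance decides which sites count, small changes in annealing conditions change the predicted set of amplicons, which mirrors the reproducibility problem described above.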
Arbitrarily primed PCR (AP-PCR) is a variant of the original RAPD method and is therefore often referred to as RAPD [19]. The differences between the AP-PCR and RAPD protocols involve several technical details. In AP-PCR: (i) the amplification is conducted in three parts, each with its own stringency and concentration of components, (ii) high primer concentrations are used in the first PCR cycles, and (iii) primers of variable length, often designed for other purposes, are used. Nevertheless, the advantages and limitations of AP-PCR are essentially the same as those of RAPD outlined above.

Repetitive-element polymerase chain reaction
Repetitive-element PCR (rep-PCR) classifies bacterial isolates on the basis of genomic fingerprint patterns. The method uses primers that hybridise to non-coding intergenic repetitive sequences scattered across the genome. DNA between adjacent repetitive elements is amplified by PCR, and multiple amplicons can be produced, depending on the distribution of the repeat elements across the genome. The sizes of these amplicons are then determined electrophoretically, and the banding patterns are compared to determine the genetic relatedness between the analysed bacterial isolates. Several families of repeat sequences have been used successfully for rep-PCR typing, such as the 'enterobacterial repetitive intergenic consensus' (ERIC), 'repetitive extragenic palindromic' (REP) and 'BOX' sequences [20]. As this typing approach is based on PCR amplification and subsequent DNA electrophoresis, rep-PCR results can be obtained in a relatively short time, and for the same reason the approach is very cheap. For many bacterial organisms, rep-PCR can be highly discriminatory [21,22]. The main limitation of rep-PCR combined with electrophoresis on traditional agarose gels is its insufficient reproducibility, which may result from variability in reagents and gel electrophoresis systems.
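Comparing banding patterns, for rep-PCR as for other fingerprinting methods, typically ends in a dendrogram built from pairwise distances (for example, 1 minus a band-based similarity). The text does not name the clustering algorithm; UPGMA (average linkage) is a widely used choice, and the sketch below is a minimal, unoptimised implementation of it, not the algorithm of any particular software package. The distance values are invented.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) clustering of isolates.

    labels: isolate names; dist: symmetric matrix (list of lists) of pairwise
    distances, e.g. 1 - band-based similarity. Returns the merges in order as
    (members_of_cluster_a, members_of_cluster_b, distance_at_merge).
    """
    members = {i: (name,) for i, name in enumerate(labels)}
    d = {(i, j): dist[i][j] for i in members for j in members if i < j}
    merges = []
    next_id = len(labels)
    while len(members) > 1:
        # Merge the closest pair of clusters.
        (a, b), value = min(d.items(), key=lambda item: item[1])
        merges.append((members[a], members[b], value))
        size_a, size_b = len(members[a]), len(members[b])
        for k in members:
            if k not in (a, b):
                d_ak = d[(min(a, k), max(a, k))]
                d_bk = d[(min(b, k), max(b, k))]
                # Mean of all pairwise distances between the merged isolates and k.
                d[(k, next_id)] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        members[next_id] = members[a] + members[b]
        del members[a], members[b]
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        next_id += 1
    return merges


labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.4, 0.9],
    [0.1, 0.0, 0.5, 0.8],
    [0.4, 0.5, 0.0, 0.7],
    [0.9, 0.8, 0.7, 0.0],
]
for left, right, value in upgma(labels, dist):
    print(left, right, round(value, 3))
```

Here A and B join first, C joins them at an average distance of 0.45, and D joins last; cutting the tree at a chosen distance yields the clusters.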
The DiversiLab system (bioMérieux, Marcy l'Etoile, France) is a semi-automated method based on the rep-PCR approach. We mention it here because it is used for local infection control by a number of hospitals worldwide. For this system, commercial PCR kits have been developed for a series of clinically important microorganisms [23]. After PCR, the amplified genomic DNA regions between repetitive elements are separated by high-resolution, chip-based microfluidic capillary electrophoresis, which substantially increases the resolution and reproducibility of rep-PCR compared with traditional gel electrophoresis. The resulting data are automatically collected, normalised and analysed by the DiversiLab software. A number of studies have evaluated DiversiLab by comparing its performance with current standard typing methods on well-characterised collections of outbreak-related and epidemiologically unrelated bacterial isolates [24-26]. These studies have shown that the DiversiLab system is simple, easy to perform, rapid and reproducible, offers full typeability and is applicable to a wide range of microorganisms. The authors concluded that, for most bacterial species, DiversiLab is especially useful for first-line detection of suspected outbreaks in hospital settings. In particular, Fluit and colleagues [25] have shown that DiversiLab is a useful tool for identifying hospital outbreaks of Acinetobacter spp., Stenotrophomonas maltophilia, Enterobacter cloacae, Klebsiella spp. and Escherichia coli, but that it is inadequate for Pseudomonas aeruginosa, Enterococcus faecium and methicillin-resistant Staphylococcus aureus (MRSA). The view that DiversiLab can be insufficiently discriminatory for typing some bacterial species, including MRSA, in outbreak settings was confirmed by Babouee et al. [27]. The results obtained by Overdevest and colleagues [26], who also evaluated the performance of DiversiLab, were in line with the findings of Fluit et al. [25], except for the conclusions regarding P. aeruginosa. Deplano and colleagues [24] demonstrated excellent epidemiological concordance of DiversiLab results, as the system correctly linked all outbreak-related isolates of vancomycin-resistant E. faecium (VREF), Klebsiella pneumoniae, Acinetobacter baumannii and P. aeruginosa. However, they also recommended that, for E. coli isolates with the same DiversiLab type, the results should be confirmed by testing additional markers [24]. The total cost of all consumables and reagents for DiversiLab is comparable to that of PFGE, at about EUR 20 per isolate. A PubMed search using 'DiversiLab AND typing' as the search term retrieved 63 publications, of which 48 were dated after the end of 2009, indicating a growing interest in DiversiLab as a typing tool. However, as the inter-laboratory reproducibility of rep-PCR approaches is generally limited, large-scale intra- and inter-laboratory reproducibility studies should be carefully performed to further evaluate the usefulness of the DiversiLab system for regional and eventually national surveillance of bacterial genotypes. Moreover, the DiversiLab database is housed on a manufacturer's server, which deters some potential users from adopting this typing system because of data security concerns.
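The text does not describe how the DiversiLab software compares normalised profiles. One common way to compare two electrophoresis traces sampled at the same aligned positions is the Pearson correlation coefficient, which ignores overall differences in signal intensity (scale and offset). The sketch below assumes traces that are already aligned and of equal length; the 0.95 cut-off is a hypothetical parameter, not a vendor setting.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two aligned traces of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("traces must be aligned and of equal length (>= 2 points)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a flat trace has no defined correlation")
    return cov / (sd_x * sd_y)


def compare_traces(x, y, cutoff=0.95):
    """Label a pair of isolates using a hypothetical similarity cut-off."""
    return "indistinguishable" if pearson(x, y) >= cutoff else "different"


trace = [0, 1, 5, 1, 0, 3, 8, 3, 0]           # invented intensities
brighter = [2 * v + 1 for v in trace]         # same pattern, stronger signal
print(compare_traces(trace, brighter))        # indistinguishable
print(round(pearson(trace, trace[::-1]), 3))  # 0.717
```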

Variable-number tandem repeat (VNTR) typing
Bacterial genomes possess many regions with nucleotide repeats in coding and non-coding DNA sequences. When these repeats are directly adjacent to each other and their number at the same locus varies between isolates, the respective genomic regions are called variable-number tandem repeat (VNTR) loci. The repeats at the same locus can be identical, or their nucleotide sequences can differ slightly. Multilocus VNTR analysis (MLVA) is a method that determines the number of tandem repeats at different loci in a bacterial genome. In the simplest MLVA assay, a number of well-selected VNTR loci are amplified by multiplex PCR and the amplicons are analysed on standard agarose gels [28]. An advantage of this simple, cheap, fast and easy-to-use assay is that the whole procedure can be performed in laboratories without sophisticated electrophoresis equipment. When MLVA does not allow a convenient and unambiguous calculation of the individual numbers of repeats per locus, some investigators call it multiple-locus VNTR fingerprinting (MLVF) [21,29]. A drawback of MLVF is that the resulting data cannot be compared directly between different laboratories. This is because the generated amplicons are monitored as banding patterns by conventional electrophoresis on low-resolution agarose gels. Such analyses do not reveal the exact numbers of repeats in the obtained amplicons, and it is also impossible to determine which band in a pattern corresponds to which PCR target. Better size separation of the amplified DNA fragments has been achieved by replacing standard agarose gels with microfluidic chip-based analysis on a fully integrated, miniaturised instrument. In 2005, Francois and colleagues [30] reported the use of automated microfluidic electrophoresis with the Agilent 2100 bioanalyzer 'lab-on-a-chip' for VNTR typing of S. aureus isolates. Since then, a growing number of studies have shown the clear advantage of microfluidic chips over standard agarose gels for MLVA/MLVF typing in terms of electrophoretic resolution, reproducibility, rapidity and automated data analysis [31,32].
For inter-laboratory comparison, the exact number of repeat units at each MLVA locus must be determined. The number of repeat units at each locus can be calculated from the size of the PCR product, the known length of a single repeat, and the length of the flanking consensus regions to which the primers were designed. Capillary electrophoresis on an automated DNA sequencer, combined with labeling of the primers with different fluorescent dyes, allows MLVA amplicons to be analysed in one run and still be typed individually [33,34]. The different fluorophores incorporated in the amplicons absorb the laser energy and emit light of different wavelengths, which is identified by the detector of the DNA sequencer. Using computer software, all loci are distinctly recognised on electropherograms according to their colours, and the repeat number per MLVA locus is calculated automatically from the amplicon sizes. Moreover, amplicon sizes are determined much more precisely with a DNA sequencer than with agarose gels or microfluidic chips. Once the number of repeats at a set of VNTR loci (the alleles) has been determined for a bacterial isolate, the ordered string of allele numbers, corresponding to the number of repeat units at each MLVA locus, forms an allelic profile (e.g. 7-12-3-3-22-11-6-1), which can easily be compared with reference databases via the Internet.
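The repeat-number calculation described above is simple arithmetic: n = (amplicon size - flanking length) / repeat-unit length. A minimal sketch, assuming that 'flanking length' means all non-repeat sequence in the amplicon (primers included) and that sizes come from a sequencer run. Because measured sizes are imprecise, the result is rounded, and a large deviation from an integer (an assumed tolerance of 0.25 repeat units) is flagged, since it may indicate an insertion or deletion rather than a change in repeat number. The locus parameters are hypothetical.

```python
def repeat_number(amplicon_size, flank_length, repeat_length, max_offset=0.25):
    """Number of tandem repeats in an MLVA amplicon.

    amplicon_size: size measured on the sequencer (bp)
    flank_length: all non-repeat sequence in the amplicon, primers included (bp)
    repeat_length: length of one repeat unit (bp)
    max_offset: assumed tolerance for sizing error, in repeat units
    """
    raw = (amplicon_size - flank_length) / repeat_length
    n = round(raw)
    if abs(raw - n) > max_offset:
        # A fractional repeat count may reflect an indel in the amplified
        # region rather than a change in repeat number: sequence the amplicon.
        raise ValueError(f"non-integer repeat count {raw:.2f}")
    return n


def allelic_profile(repeat_counts):
    """Ordered string of repeat numbers, one per locus."""
    return "-".join(str(n) for n in repeat_counts)


# Hypothetical loci: (measured size, flank length, repeat-unit length)
loci = [(191.6, 150, 6), (208, 100, 9), (371, 200, 57)]
print(allelic_profile(repeat_number(*locus) for locus in loci))  # 7-12-3
```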
The intrinsic limitation of MLVA is that it is not a universal method: primers need to be designed specifically for each targeted pathogenic species. This is the major reason why it cannot replace PFGE in epidemiological investigations in general. Furthermore, MLVA is not 100% reproducible unless the allele amplicons are sequenced and users have agreed on where the VNTR begins and ends at each locus. To improve the reproducibility of MLVA, VNTR loci can be amplified in single PCRs instead of multiplex reactions; however, this increases the assay time and cost. Amplicon sizing is not reproducible when different sequencers, polymers or fluorescent labels are used. In addition, a size difference at a VNTR locus does not always reflect a change in the number of tandem repeats, because insertions, deletions or duplications in the amplified region can give rise to the same size difference; in such cases, the amplicons must be sequenced. Importantly, MLVA has not yet been fully developed and properly validated for use in surveillance networks dedicated to clinically relevant organisms, as underscored by the many published protocols that still remain to be carefully validated.
An alternative strategy for epidemiological typing is to measure variation in VNTR regions by DNA sequencing. Methods relying on sequence variation in multiple VNTR regions have been developed for subtyping Mycobacterium avium subsp. paratuberculosis [35], Vibrio cholerae [36] and Legionella pneumophila [37] isolates.
When 'VNTR AND typing' was used as the search term, about 1,000 publications were retrieved from PubMed, showing that VNTR-based typing approaches are of major importance in the field.

Single locus sequence typing
Single locus sequence typing (SLST) is used to determine the relationships among bacterial isolates by comparing sequence variation in a single target gene. The term SLST was borrowed from the better-known approach called multilocus sequence typing (MLST) (see below), in which several genes are characterised by DNA sequencing to determine the genetic relatedness among isolates.
Typing based on the M-protein found on the surface of group A Streptococcus (GAS) has been the most widely used method for distinguishing GAS isolates [38]. The M-protein, encoded by the emm gene, is the major virulence and immunological determinant of this human-specific pathogen. In recent years, classic serological M-protein typing has largely been replaced by sequencing of the hypervariable region at the 5' end of the emm gene [39]. The emm-typing method has become the gold standard of GAS molecular typing for surveillance and epidemiological purposes, and more than 200 emm types have been described so far. Nevertheless, to fully discriminate GAS clones, emm-typing should be complemented with other typing methods, such as PFGE or MLST [40,41].
Nucleotide sequencing of the short variable region (SVR) of the flagellin B gene (flaB) provides adequate information for studying Campylobacter epidemiology. Although PFGE remains the most discriminatory typing method for Campylobacter, a study by Mellmann and colleagues [42] showed that sequencing of the SVR of flaB is a rapid, reproducible, discriminatory and stable screening tool. It was also found that flaB sequence typing is useful in combination with other typing methods, such as MLST, to differentiate closely related or outbreak isolates [43].
When 'emm OR flab AND typing' were used as search terms in PubMed, 238 hits were retrieved, which shows the importance of these methods for typing GAS and Campylobacter isolates.

Staphylococcus aureus protein A gene-typing
The most widely used SLST method is S. aureus protein A gene (spa)-typing, which involves sequencing the polymorphic X region of the protein A gene of S. aureus. Molecular typing of S. aureus isolates on the basis of protein A gene polymorphism was the first bacterial typing method based on repeat sequence analysis [44]. The high degree of genetic diversity in the VNTR region of the spa gene results not only from a variable number of short (24 bp) repeats, but also from various point mutations. In the spa sequence typing method, each identified repeat is assigned a code, and the spa-type is deduced from the order of the specific repeats. Although spa-typing has a lower discriminatory power than PFGE [45,46], its cost-effectiveness, ease of use, speed, excellent reproducibility, appropriate in vivo and in vitro stability, standardised international nomenclature, high throughput using the StaphType software, and full portability of data via the Ridom database (http://spaserver.ridom.de) make this method currently the most useful instrument for characterising S. aureus isolates at the local, national and international levels [47-52]. Importantly, this approach ensures strict criteria for internal and external quality assurance of the data submitted to the database, which is curated by SeqNet.org [50,53]. Furthermore, the implementation of the 'based upon repeat patterns' (BURP) algorithm in the StaphType software has greatly facilitated the assignment of spa-types to clonal complexes and singletons. Nevertheless, spa-typing also has certain disadvantages. The major drawback of this single-locus method is that it can misclassify particular types because of recombination and/or homoplasy. When 'spa AND typing' were used as search terms in PubMed, 548 hits were retrieved, which highlights the importance of this method for typing S. aureus isolates. Moreover, 341 of these publications were dated after the end of 2009, showing that spa-typing is gaining increasing influence.
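Deriving a spa-type from the succession of repeats can be sketched as follows, assuming the X region has already been trimmed to the repeat array and that every repeat is exactly 24 bp, as stated above. The repeat sequences, codes and type name are hypothetical placeholders, not entries of the Ridom SpaServer database, against which real assignments are made.

```python
REPEAT_LENGTH = 24  # repeat length given in the text

# Hypothetical repeat dictionary (sequence -> code); real codes and sequences
# are curated in the Ridom SpaServer database and are not reproduced here.
REPEAT_CODES = {
    "AAAGAAGACAACAAAAAGCCTGGC": "r01",
    "AAAGAAGACGGCAACAAACCTGGT": "r02",
    "GAGGAAGACAACAAAAAACCTGGT": "r03",
}

# Hypothetical spa-type table keyed by the succession of repeat codes.
SPA_TYPES = {("r01", "r02", "r02", "r03"): "type-X"}


def repeat_succession(x_region):
    """Split a trimmed X region into 24-bp repeats and translate them to codes."""
    if len(x_region) % REPEAT_LENGTH:
        raise ValueError("X region length is not a multiple of the repeat length")
    repeats = [x_region[i:i + REPEAT_LENGTH]
               for i in range(0, len(x_region), REPEAT_LENGTH)]
    unknown = [r for r in repeats if r not in REPEAT_CODES]
    if unknown:
        raise ValueError(f"unknown repeat(s): {unknown}")
    return tuple(REPEAT_CODES[r] for r in repeats)


def spa_type(x_region):
    """Look up the spa-type; report the repeat succession if it is not listed."""
    succession = repeat_succession(x_region)
    return SPA_TYPES.get(succession, "unassigned: " + "-".join(succession))


x_region = ("AAAGAAGACAACAAAAAGCCTGGC"
            + "AAAGAAGACGGCAACAAACCTGGT" * 2
            + "GAGGAAGACAACAAAAAACCTGGT")
print(repeat_succession(x_region))  # ('r01', 'r02', 'r02', 'r03')
print(spa_type(x_region))           # type-X
```

Because the type depends only on the order of repeat codes, two isolates with the same succession share a spa-type even if they differ elsewhere in the genome, which is one reason single-locus typing can misclassify.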

Multilocus sequence typing
To overcome the lack of portability, or poor portability, of data from traditional and older molecular typing approaches, the MLST method was developed. MLST is based on the principles of the phenotypic method multilocus enzyme electrophoresis (MLEE) [54], which relies on differences in the electrophoretic mobility of variants of several enzymes present in a bacterium. The first MLST scheme was developed for Neisseria meningitidis in 1998 [55]. Shortly thereafter, the method was extended to other bacterial species and, over time, it has become a very popular tool for global epidemiological studies and for studies on the molecular evolution of pathogens [56-66]. Accordingly, a PubMed search with the term 'MLST AND typing' yielded 1,485 hits. In MLST, internal fragments (of approximately 450-500 bp) of, mostly, seven housekeeping genes are amplified by PCR and sequenced. For each locus, unique sequences (alleles) are assigned arbitrary numbers, and the sequence type (ST) is determined from the combination of identified alleles (i.e. the 'allelic profile'). The number of nucleotide differences between alleles is not considered. The great advantage of MLST is that the data it produces are highly reproducible and, owing to an internationally standardised nomenclature, unambiguous. Moreover, the allele sequences and ST profiles are available in large central databases (http://pubmlst.org and www.mlst.net) that can be queried via the Internet. These databases also provide online software (eBURST) for determining the genetic relatedness between bacterial strains within a species, as well as MLST-maps to track the isolates of each ST recovered in each country, together with the details of these isolates. The great disadvantage of MLST is its high cost. The total cost of all consumables and reagents for MLST depends greatly on the number of loci investigated and on the country in which the typing is conducted. We estimate that, in European Union Member States, the total cost of an MLST analysis based on seven loci is about EUR 50 per isolate. In contrast, the total costs of MLVF performed with an Agilent BioAnalyzer, MLVA with a DNA sequencer, or SLST amount to only about EUR 2, EUR 8 and EUR 8 per isolate, respectively [32]. Moreover, MLST is labour-intensive, time-consuming and, for some pathogens, insufficiently discriminatory for routine use in outbreak investigations and local surveillance. To increase the discriminatory power of 'classical' MLST schemes based on seven housekeeping genes, sequencing results for particular antigen-encoding genes can be included in the analysis. This is exemplified by the two-locus sequence typing approach developed for N. gonorrhoeae (Neisseria gonorrhoeae multi-antigen sequence typing, NG-MAST), which includes two of the most variable gonococcal genes, namely por and tbpB [67]. Another example is the MLST approach developed for Salmonella enterica, in which two housekeeping genes, gyrB and atpD, were used in combination with the flagellin genes fliC and fljB [68]. Moreover, attempts have been made to develop MLST schemes that are entirely based on virulence genes. Such approaches, termed multivirulence-locus sequence typing (MVLST), have been applied for subtyping pathogens such as Listeria monocytogenes, V. cholerae, S. enterica and S. aureus [69-72]. Altogether, the currently available data suggest that MVLST has a higher discriminatory power than 'classical' MLST. However, for most MVLST approaches, additional research is needed. This should involve different and larger sets of isolates, and the results should be correlated with conventional epidemiological data to validate the applicability of MVLST for epidemiological typing.
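The MLST bookkeeping described above (arbitrary allele numbers per locus, an allelic profile, and an ST per distinct profile) can be sketched with a toy scheme. This is an illustration under stated assumptions: the three loci and their sequences are hypothetical, and in practice new alleles and STs are assigned by the curators of the central databases rather than locally. The locus-difference count at the end is the quantity used to define single-locus variants, on which eBURST-style grouping builds.

```python
class ToyMLSTScheme:
    """Allele numbering per locus and ST assignment per distinct profile."""

    def __init__(self, loci):
        self.loci = list(loci)
        self.alleles = {locus: {} for locus in self.loci}  # sequence -> allele number
        self.sequence_types = {}                            # allelic profile -> ST

    def allele_number(self, locus, sequence):
        """Known allele -> its number; new allele -> next free number.
        Any difference, even a single nucleotide, gives a new allele."""
        table = self.alleles[locus]
        if sequence not in table:
            table[sequence] = len(table) + 1
        return table[sequence]

    def type_isolate(self, sequences):
        """sequences: dict locus -> sequence. Returns (allelic profile, ST)."""
        profile = tuple(self.allele_number(locus, sequences[locus]) for locus in self.loci)
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return profile, self.sequence_types[profile]


def locus_differences(profile_a, profile_b):
    """Number of loci at which two profiles differ (1 = single-locus variant)."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


scheme = ToyMLSTScheme(["locA", "locB", "locC"])  # hypothetical loci
p1, st1 = scheme.type_isolate({"locA": "ACGT", "locB": "GGCC", "locC": "TTAA"})
p2, st2 = scheme.type_isolate({"locA": "ACGT", "locB": "GGCA", "locC": "TTAA"})
print(p1, st1)                    # (1, 1, 1) 1
print(p2, st2)                    # (1, 2, 1) 2
print(locus_differences(p1, p2))  # 1
```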

Comparative genomic hybridisation
A DNA microarray used for typing studies is a collection of DNA probes attached in an ordered fashion to a solid surface. These probes can be used to detect the presence of complementary nucleotide sequences in particular bacterial isolates. Microarrays are therefore convenient tools for detecting genes that serve as markers for specific bacterial strains, or for detecting allelic variants of a gene that is present in all strains of a particular species. The probes on the array may be PCR amplicons (> 200 bp) or oligonucleotides (up to 70-mers). Depending on the number of probes placed on the solid surface, low-density (hundreds of probes) and high-density (hundreds of thousands of probes) DNA microarrays can be distinguished. In the usual approach, total DNA is extracted from the pathogen of interest. This target DNA is then labelled, either chemically or enzymatically, and hybridised to a DNA microarray. Unbound target DNA is removed during subsequent washing steps of different stringency, and the signal from each successful hybridisation event between the labelled target DNA and an immobilised probe is measured automatically by a scanner. The data produced by a microarray assay are then analysed using dedicated software to assess bacterial diversity. The results obtained vary and depend on the design of the customised array.
DNA microarrays appear to be very well suited for bacterial typing, as underscored by the 506 PubMed hits obtained with the search terms 'microarrays OR microarray AND typing'. Microarrays are currently widely used to analyse genomic mutations, such as single-nucleotide polymorphisms (SNPs). In addition, microarray technology is an efficient tool for detecting extrachromosomal elements, such as plasmids [73,74]. Through microarray-based gene content analyses, pathogens can be simultaneously genotyped and profiled for their antimicrobial resistance and virulence potential. Importantly, a high-density whole genome microarray comprises probes that allow detection of the open reading frame (ORF) content of one or many genomes. Comparative genomics using whole genome microarrays has revealed that 10 major S. aureus lineages are responsible for the majority of infections in humans [75]. The application of recently developed microarrays (Sam-62), based on 62 S. aureus whole genome sequencing (WGS) projects and 153 plasmid sequences, has shown that MRSA transmission events unrecognised by other approaches can be identified by microarray profiling, which is capable of distinguishing between extremely similar but non-identical sequences [73]. Also, a high-density Affymetrix DNA microarray platform based on all ORFs identified on 31 chromosomes and 46 plasmids from a diverse set of E. coli and Shigella isolates has been applied to quickly determine the presence or absence of genes in the recently emerged E. coli O104:H4 and related isolates [76]. This genome-scale genotyping clearly discriminated between clinically, temporally and geographically distinct O104:H4 isolates. The authors therefore concluded [76] that the whole genome microarray approach is a useful alternative to WGS that saves time, effort and expense, and that it can be used in real-time outbreak investigations.
However, the application of high-density microarrays for bacterial typing in routine laboratories is currently hindered by the high cost of materials and the specialised equipment needed. Alere Technologies has therefore developed a rapid and economical microarray assay for diagnostic testing and epidemiological investigations. The assay was miniaturised to a microtitre strip format (ArrayStrips), allowing eight to 96 samples to be tested simultaneously. The Alere StaphyType DNA microarray for S. aureus covers 334 target sequences, including approximately 170 distinct genes and their allelic variants [77]. Ninety-six arrays are scanned on the reader, and the assignment of S. aureus isolates to particular genetic lineages is done automatically by software on the basis of their hybridisation profiles. In addition to the ArrayStrips, the ArrayTube platform, a single-test format, is also available for a number of bacterial species. Interestingly, the total cost of an Alere microarray test per bacterial isolate is comparable to that of PFGE (about EUR 20-30) and much lower than that of MLST (EUR 50). The whole typing procedure for 96 isolates can be completed within two working days. Recently, Alere Technologies has also developed genotyping DNA microarray kits for other bacterial species, such as E. coli, P. aeruginosa, L. pneumophila and Chlamydia trachomatis. Altogether, the available data show that microarray-based technologies are highly accurate. However, the reproducibility of microarray data within and between laboratories needs to be established before this technology is broadly applied. In particular, if SNPs are the target for typing of highly clonal species, DNA microarray analysis is probably not the best method to apply. Moreover, arrays have the major disadvantage that they cannot identify sequences that are not included in the array.
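The automatic lineage assignment from hybridisation profiles can be illustrated with a minimal sketch. It assumes that probe signals have already been normalised to a 0-1 scale, that a fixed threshold (here a hypothetical 0.4) separates presence from absence, and that each lineage is represented by one reference presence/absence profile. The probe names, reference profiles and threshold are invented, and this is not the algorithm of the StaphyType software; similarity is measured with the simple matching coefficient, i.e. the fraction of probes with identical calls.

```python
# Hypothetical sketch: lineage assignment from microarray hybridisation signals.
# Probe names, reference profiles and the 0.4 threshold are invented.

def call_presence(signals, threshold=0.4):
    """Convert normalised probe signals (0-1) into presence/absence calls."""
    return {probe: value >= threshold for probe, value in signals.items()}


def matching_similarity(calls_a, calls_b):
    """Simple matching coefficient: fraction of shared probes with equal calls."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        return 0.0
    return sum(calls_a[p] == calls_b[p] for p in shared) / len(shared)


def assign_lineage(signals, references, threshold=0.4):
    """Return (best-matching lineage, similarity) for one isolate."""
    calls = call_presence(signals, threshold)
    scores = {name: matching_similarity(calls, ref) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


REFERENCES = {
    "lineage_A": {"p1": True, "p2": True, "p3": False, "p4": False},
    "lineage_B": {"p1": True, "p2": False, "p3": True, "p4": False},
    "lineage_C": {"p1": False, "p2": False, "p3": True, "p4": True},
}
isolate = {"p1": 0.92, "p2": 0.08, "p3": 0.71, "p4": 0.15}
print(assign_lineage(isolate, REFERENCES))  # ('lineage_B', 1.0)
```

Here the isolate's calls agree with lineage_B at all four probes, whereas lineages A and C each agree at only two.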
Classical serotyping takes a few days to yield conclusive results. It requires a large set of costly antisera and is expensive and tedious, so its use is usually restricted to a few reference laboratories. These technical difficulties can be overcome with molecular serotyping methods. Accordingly, Alere Technologies has developed fast DNA serotyping assays based on oligonucleotide microarrays for C. trachomatis, E. coli and S. enterica [78,79]. The microarray serogenotyping assay for C. trachomatis includes a set of oligonucleotide probes designed to exploit multiple discriminatory sites located in variable domains 1, 2 and 4 of the ompA gene, which encodes the major outer membrane protein A. In the case of E. coli and S. enterica, separate approaches have been developed, but in both assays the genes encoding the O and H antigens were selected as target sequences. After multiplex amplification of the selected DNA target sequences using biotinylated primers, the samples are hybridised to the microarray probes under highly stringent conditions. The resulting signals yield genotype (serovar)-specific hybridisation profiles.
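How positive O- and H-antigen probes combine into a serotype designation can be sketched as follows. The probe names and probe-to-antigen tables are hypothetical (named after O-antigen wzx and flagellin fliC genes for readability); real assays use several probes per antigen and additional quality checks. The sketch reports the single O and H type detected, flags a missing antigen with '?', and lists all types if more than one is detected, which could indicate a mixed culture.

```python
# Hypothetical probe-to-antigen tables (probe names modelled on O-antigen wzx
# and flagellin fliC genes); real assays use many probes per antigen.
O_PROBES = {"wzx_O104": "O104", "wzx_O157": "O157", "wzx_O26": "O26"}
H_PROBES = {"fliC_H4": "H4", "fliC_H7": "H7", "fliC_H11": "H11"}


def resolve(types, unknown):
    """One detected type -> that type; none -> placeholder; several -> all, joined."""
    if len(types) == 1:
        return next(iter(types))
    if not types:
        return unknown
    return "/".join(sorted(types))


def predict_serotype(detected_probes):
    """Combine positive O- and H-antigen probes into an 'O:H' designation."""
    o_types = {O_PROBES[p] for p in detected_probes if p in O_PROBES}
    h_types = {H_PROBES[p] for p in detected_probes if p in H_PROBES}
    return f"{resolve(o_types, 'O?')}:{resolve(h_types, 'H?')}"


print(predict_serotype({"wzx_O104", "fliC_H4", "stx2"}))  # O104:H4
print(predict_serotype({"wzx_O157"}))                     # O157:H?
```

Probes that target neither antigen (here the hypothetical "stx2") are simply ignored by the serotype call.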

Optical mapping
Optical maps from single genomic DNA molecules were first described for a pathogenic bacterium in 2001 [80]. They were constructed for E. coli O157:H7 to facilitate genome assembly through accurate alignment of the contigs generated from the large number of short sequencing reads, and to validate the sequence data. Optical mapping, also called whole genome mapping, is now a proven approach for exploring diversity among bacterial isolates. Moreover, optical mapping can be coupled with next generation sequencing (NGS) technologies to close the gaps between sequence scaffolds effectively and accurately in de novo genome sequencing projects. The system creates ordered, genome-wide, high-resolution restriction maps from randomly selected individual DNA molecules [81]. High molecular weight DNA is obtained from gently lysed cells embedded in low-melting-point agarose. The purified DNA is subsequently stretched on a microfluidic device. Following digestion with a selected restriction endonuclease, the resulting fragments remain attached to the surface of the microfluidic device in the same order as they appear in the genome. The DNA is then stained with an intercalating fluorescent dye and visualised by fluorescence microscopy, and the lengths of the restriction fragments are determined from their fluorescence intensity. Finally, using specialised software, the consensus genomic optical map is assembled by overlapping multiple single-molecule maps. Whole chromosome optical maps can be created for a few organisms within two days. Owing to its high accuracy and resolution, optical mapping has been used successfully in retrospective outbreak investigations to examine the genetic relatedness among isolates of several bacterial species [82][83][84]. Mellmann and colleagues [85] were the first to create whole chromosome optical maps in a real-time outbreak investigation, for E. coli isolates recovered from patients in hospitals in four different German cities during the 2011 outbreak of E. coli O104:H4. Based on these studies, it can be concluded that optical mapping is a very powerful tool for assessing the genetic relationships among bacterial isolates. However, the use of this technique is currently limited by the high cost of the experiments and the specialised equipment needed.
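Using an optical map to align or validate sequence data relies on predicting where a restriction enzyme would cut that sequence. The sketch below computes an ordered in silico restriction map of a closed circular chromosome for a palindromic recognition site (XbaI, T^CTAGA, is used as an example) and compares it fragment by fragment with an observed map within a relative size tolerance. The toy genome, the 10% tolerance and the function names are hypothetical. Real optical maps contain many fragments of several kilobases, and alignment software must also tolerate missing or extra cuts and maps that start at different positions.

```python
def in_silico_map(genome, site, cut_offset):
    """Ordered fragment lengths (bp) obtained by cutting a closed circular
    chromosome at every occurrence of a palindromic recognition site.
    cut_offset is the cut position within the site (1 for XbaI, T^CTAGA)."""
    n = len(genome)
    wrapped = genome + genome[:len(site) - 1]  # also find sites spanning the origin
    cuts = sorted({(i + cut_offset) % n
                   for i in range(n) if wrapped.startswith(site, i)})
    if len(cuts) < 2:
        return [n]  # zero or one cut: a single molecule of genome length
    # distance from each cut to the next one, wrapping around the circle
    return [(cuts[(k + 1) % len(cuts)] - cuts[k]) % n for k in range(len(cuts))]


def maps_agree(predicted, observed, rel_tol=0.1):
    """True if two ordered maps have the same number of fragments and each
    pair of corresponding fragments differs by at most rel_tol (relative)."""
    return len(predicted) == len(observed) and all(
        abs(p - o) <= rel_tol * max(p, o) for p, o in zip(predicted, observed))


toy_genome = "AA" + "TCTAGA" + "GGGG" + "TCTAGA" + "CCCC"  # 22 bp, two XbaI sites
predicted = in_silico_map(toy_genome, "TCTAGA", cut_offset=1)
print(predicted)                          # [10, 12]
print(maps_agree(predicted, [11, 12]))    # True
print(maps_agree(predicted, [12, 10]))    # False: same sizes, different order
```

The comparison is deliberately order-sensitive: optical maps preserve the order of fragments along the chromosome, information that size-only separations such as PFGE do not retain.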

Whole genome sequencing
NGS has transformed genetic investigations by providing a cost-effective way to discover genome-wide variation. NGS technologies are also known as 'second generation sequencing' or 'high-throughput sequencing'. The terms next generation and second generation sequencing distinguish these approaches from first generation sequencing based on the Sanger method. The clear advantage of NGS over traditional Sanger sequencing is its ability to generate millions of reads (approximately 35-700 bp in length) in a single run at comparatively low cost. To reconstruct the complete nucleotide sequence of a genome, the many short reads must be assembled, either on the basis of their overlapping regions (de novo assembly) or by comparison with a previously sequenced 'reference' genome (resequencing). WGS is becoming a powerful and highly attractive tool for epidemiological investigations [85][86][87][88], and it is highly likely that, in the near future, WGS in routine clinical use will permit accurate identification and characterisation of bacterial isolates. However, the key challenge will not be to produce the sequence data, but to rapidly process and interpret the relevant information from large data sets. Ideally, this information should allow direct comparison with the results obtained by conventional typing methods (e.g. PFGE, MLST), and it should be stored in globally accessible databases. A further difficulty is that the reads produced by NGS technologies are relatively short, which can make de novo genome assembly a challenging enterprise. Accordingly, the term 'whole genome sequence' often refers to only approximately 90% of the entire genome. The gaps between assembled regions (contigs) are mainly caused by the presence of dispersed or tandemly arrayed repeats.
As current NGS platforms do not resolve such VNTRs very well, it is often difficult or even impossible to extract useful information on the repeats in MLVA loci from the available genome sequences. Also, an in silico restriction digest that simulates PFGE requires the gaps between the contigs to be closed completely, yielding one long, contiguous sequence. Therefore, PFGE profiles cannot be predicted without closing the genome sequences; moreover, it is necessary to know how the restriction sites used for PFGE are methylated in the organism of interest. To improve de novo genome assembly, new platforms that generate much longer reads are needed. Recently, a 'third-generation sequencer' (PacBio) was launched by Pacific Biosciences; it generates very long reads with average lengths of 2-3 kb, and reads of more than 7 kb are not uncommon with this system. Furthermore, the nanopore sequencing technologies developed by Oxford Nanopore generate reads of approximately 100 kb. The main limitations of these third-generation sequencing approaches are their very high costs and low accuracy (approximately 15% error rate). However, Pacific Biosciences and Oxford Nanopore promise further improvements that will generate long sequence reads with much higher accuracy [89].
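The principle of overlap-based de novo assembly, and why repeats leave gaps, can be illustrated with a deliberately naive greedy assembler. This is a teaching sketch under strong assumptions (error-free reads from a single strand, exact overlaps of at least a hypothetical minimum length of 3 bp); production assemblers use de Bruijn or overlap-layout-consensus graphs and handle sequencing errors, coverage and both strands.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap until no
    overlap of at least min_len remains; return the resulting contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # remaining contigs cannot be joined: gaps stay open
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGCGT", "CGTACG", "ACGTTA", "GTTAGC"]  # toy reads from ATGCGTACGTTAGC
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

With these four error-free 6-bp reads, the original 14-bp sequence is recovered. If the genome contained a repeat longer than the reads, reads from different copies would overlap equally well, so a greedy merge could join the wrong copies or leave separate contigs; reads long enough to span the repeat remove this ambiguity, which is the rationale for the long-read platforms described above.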
The costs of bacterial WGS by NGS continue to decline. A price level has now been reached that comes close to the price of an MLST analysis carried out by traditional Sanger sequencing. For example, sequencing a bacterial genome by NGS can cost as little as USD 100-150 (approximately EUR 75-110) per isolate, including sample preparation, library quality control (quantification and size assessment) and sequencing [90,91]. Not surprisingly, there is increasing interest in replacing PCR/Sanger sequencing with high-throughput deep sequencing technologies, such as 454 pyrosequencing, Illumina and the Ion Torrent system, which yield large numbers of short, high-quality reads.
Desktop sequencers are within the financial reach of many, if not all, reference laboratories. However, the procedure is still too slow, and genome assembly too complicated, for implementation in routine surveillance, as NGS requires substantial computing resources and the help of well-trained bioinformaticians. On the other hand, Windows-based software (e.g. BioNumerics and Lasergene) that allows users without in-depth bioinformatics expertise to assemble sequenced genomes and query them against reference genomes or other sequences is just around the corner. An important prerequisite for the effective application of WGS in the typing of microorganisms is the availability of novel web-accessible bioinformatics platforms for rapid data processing and analysis. Moreover, these bioinformatics tools should be simple enough for use in clinical settings. This is highly feasible, as exemplified by the convenient web-based method for MLST of 66 bacterial species developed by Larsen et al. [92]. This method uses short sequence reads or assembled genomes to identify MLST sequence types, and it is publicly available at www.cbs.dtu.dk/services/MLST. The great advantage of MLST based on seven housekeeping genes is that the method is fully standardised for numerous bacterial species. However, a very significant amount of genomic information, including DNA sequence and gene content diversity, exists outside the genes targeted by traditional MLST. Therefore, to characterise outbreak isolates more effectively and to strengthen surveillance systems for particular pathogens, higher resolution methods that utilise WGS are urgently needed. This view is critically underscored by the outbreak of multidrug-resistant enterohaemorrhagic E. coli (EHEC) O104:H4 infection, which caused numerous cases of haemolytic uraemic syndrome (HUS) in Germany between May and June 2011 [85,93]. This outbreak resulted in more than 4,000 cases of illness and 46 deaths [94]. Before the 2011 outbreak, only one case of HUS associated with E. coli O104:H4 had been reported in Germany, in 2001 [85,95]. Traditional MLST, based on the sequences of seven housekeeping genes, revealed that both the historical isolate recovered in 2001 and an isolate from a HUS patient in the 2011 outbreak had the same MLST sequence type (ST) 678. This indicated that both isolates were closely related. However, MLST failed to reveal the major differences between the outbreak isolate and the earlier isolate that became clearly evident upon their characterisation by NGS. Strikingly, the WGS data revealed that the 2011 outbreak isolate differed substantially from the 2001 isolate in chromosomal and plasmid content [85]. An independent study by Hao and colleagues [96] confirmed these results: their analysis of E. coli O104:H4 ST678 isolates (one of which was epidemiologically linked to the 2011 outbreak) showed that traditional MLST cannot accurately resolve relationships among genetically related isolates that differ in their pathogenic potential. Using the WGS data, they found evidence of homologous recombination between distantly related E. coli isolates, including the 2011 outbreak isolate, in 167 genes [96].
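The core of MLST from an assembled genome is to find which known allele of each housekeeping locus occurs in the assembly and then look up the resulting allelic profile. The sketch below is not the implementation of the web service by Larsen et al.; it assumes exact, full-length allele matches on either strand and uses a hypothetical three-locus scheme with 9-bp toy alleles and an invented profile table (real schemes use seven loci of several hundred base pairs each).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical toy scheme: three loci with two 9-bp alleles each.
ALLELES = {
    "locA": {1: "ATGAAACGT", 2: "ATGAAACGA"},
    "locB": {1: "GGCTTTAGC", 2: "GGCTTCAGC"},
    "locC": {1: "CCATGGTTA", 2: "CCATGCTTA"},
}
# Hypothetical allelic profile -> sequence type (ST) table.
PROFILES = {(1, 1, 1): 1, (2, 1, 1): 2, (2, 2, 1): 3}


def call_alleles(assembly, alleles=ALLELES):
    """Return the allele number found for each locus (exact match on either
    strand), or None if no known allele is present."""
    strands = (assembly, reverse_complement(assembly))
    profile = []
    for variants in alleles.values():
        hit = None
        for number, seq in variants.items():
            if any(seq in strand for strand in strands):
                hit = number
                break
        profile.append(hit)
    return tuple(profile)


def sequence_type(profile, profiles=PROFILES):
    """Look up the ST for a complete profile; None if unknown or incomplete."""
    return profiles.get(profile)


# Toy "assembly": locus B lies on the reverse strand; Ns separate the loci.
assembly = "NNNNN".join(["ATGAAACGA", reverse_complement("GGCTTTAGC"), "CCATGGTTA"])
profile = call_alleles(assembly)
print(profile, sequence_type(profile))  # (2, 1, 1) 2
```

An unmatched locus yields None, which in practice would point to a novel allele (or an incomplete assembly) and thus a possibly new ST. Because only a handful of loci are examined, isolates can share an ST while differing extensively elsewhere in the genome, as in the O104:H4 example above.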
We are convinced that, in the near future, WGS will become a highly powerful tool for outbreak investigations and surveillance schemes in routine clinical practice. However, this will require standard operating procedures for identifying variation by examining similarities and differences between bacterial genomes over time. A way forward seems to be the development of a genome-wide gene-by-gene analysis tool. To this end, two approaches can be used. The first approach would involve an extended MLST (eMLST): instead of the seven genes of traditional MLST, eMLST would be based on the whole core genome, i.e. all genes present in all isolates of a species. An allelic profile produced by eMLST would then be composed of hundreds to thousands of different alleles, depending on the genome size of the investigated species. A second, 'pan-genome' approach would use the full complement of genes in a species, including the core genome, the dispensable genome (a pool of genetic material found in a variable number of isolates within the species) and the unique genes specific to single strains of the species. In this approach, the relatedness of isolates would be measured by the presence or absence of genes across all genomes within a species. Such core- and pan-genome approaches will have a much higher discriminatory power than traditional MLST, allowing the discrimination of very closely related isolates. However, to use these approaches for bacterial typing, comparative genomics must first determine the core, dispensable and unique genes among bacterial genomes at the species level. This process can be greatly facilitated by the genome comparator of the Bacterial Isolate Genome Sequence Database (BIGSdb), the software implemented within the web-accessible PubMLST database (http://pubmlst.org/software/database/bigsdb/), which was created to store and compare sequence data for bacterial isolates [97]. Any number of sequences, from a single sequence read to whole genome data generated by NGS technologies, can be linked to an unlimited number of bacterial isolate records. Within BIGSdb, large numbers of loci can be defined, and allelic profiles for each bacterial isolate can be determined at levels of discrimination chosen according to the question being asked. In this way, WGS can probably replace MLST and other typing methods currently in use. As soon as the cost of WGS falls further and it becomes possible to perform sequencing and analysis in less than 24 hours, the method will be highly useful for real-time outbreak surveillance and will likely take over as the first-line surveillance typing method in any setting.
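The two gene-by-gene measures described above can be sketched in a few lines. The allelic distance counts core genome loci with different allele numbers, skipping loci not called in both isolates (a common convention, assumed here); the gene content distance uses the Jaccard index on presence/absence sets as one possible measure for the pan-genome approach. The profiles and gene names are hypothetical and far shorter than the hundreds to thousands of loci of a real eMLST scheme.

```python
def allelic_distance(profile_a, profile_b):
    """Return (number of differing loci, number of loci compared) for two
    core genome allelic profiles; None marks a locus that was not called."""
    compared = [(a, b) for a, b in zip(profile_a, profile_b)
                if a is not None and b is not None]
    return sum(a != b for a, b in compared), len(compared)


def gene_content_distance(genes_a, genes_b):
    """Jaccard distance between the gene sets of two isolates."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1 - len(genes_a & genes_b) / len(union)


# Hypothetical six-locus profiles; locus 4 was not called in isolate 1.
iso1 = (1, 4, 2, None, 7, 3)
iso2 = (1, 4, 5, 2, 7, 3)
print(allelic_distance(iso1, iso2))  # (1, 5)
print(gene_content_distance({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3", "g5"}))  # 0.4
```

Reporting the number of loci compared alongside the number of differences keeps distances from isolates with incomplete profiles interpretable.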
Although most typing approaches were developed to detect the presence or absence of genetic polymorphisms inside protein-encoding ORF sequences, important nucleotide sequence differences between strains of a species can also be found in intergenic regions. In Europe, the predominant method for Clostridium difficile typing is PCR-ribotyping, which requires PCR amplification of the intergenic spacer region between the 16S and 23S ribosomal RNA genes. This method appropriately groups isolates with identical PFGE pulsotypes and has excellent discriminatory power for isolates with different PFGE pulsotypes [98]. This supports the view that the analysis of DNA polymorphisms in intergenic regions by WGS may provide truly valuable epidemiological insights.
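How PCR-ribotyping patterns are compared is not described here; one widely used similarity measure for fragment-based patterns, such as PCR-ribotypes or PFGE profiles, is the Dice coefficient 2m/(a + b), where m is the number of matching bands and a and b are the numbers of bands in the two patterns. The sketch assumes band sizes in base pairs, a hypothetical 2% relative matching tolerance and simple greedy one-to-one matching; the band sizes are invented.

```python
def dice_similarity(bands_a, bands_b, tolerance=0.02):
    """Dice coefficient 2m / (a + b) for two banding patterns, where m counts
    bands matched one-to-one within a relative size tolerance."""
    unmatched = sorted(bands_b)
    matches = 0
    for size in sorted(bands_a):
        for k, other in enumerate(unmatched):
            if abs(size - other) <= tolerance * max(size, other):
                matches += 1
                del unmatched[k]  # each band in b can be matched only once
                break
    total = len(bands_a) + len(bands_b)
    return 2 * matches / total if total else 1.0


# Hypothetical PCR-ribotyping band sizes (bp) for two isolates.
print(dice_similarity([250, 300, 420, 560], [252, 300, 430, 600]))  # 0.5
```

Identical patterns score 1.0 and patterns without shared bands score 0; such pairwise scores can then be used to cluster isolates, for example with UPGMA.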
The genetic relatedness among bacterial isolates can also be determined by examining the genome sequence as a whole. In contrast to conventional molecular typing methods, WGS has the potential to compare different genomes at single-nucleotide resolution. This would allow accurate characterisation of transmission events and outbreaks. However, translating this potential into routine practice will require extensive investigation. Methods based on SNPs permit a detailed, targeted analysis of variation within related organisms. Very recently, Köser and colleagues [91] reported a clinically meaningful application of SNP analysis involving rapid high-throughput sequencing of MRSA isolates recovered during a putative outbreak in a neonatal intensive care unit. Whole genome SNP analysis identified the isolates associated with the outbreak and clearly separated them from non-outbreak isolates. However, one outbreak isolate showed a higher number of SNPs than the other outbreak isolates, which highlights the difficulty of applying a simple SNP cut-off to define the isolates belonging to an outbreak. Therefore, additional investigations and comparisons are needed to develop a strategy for automated interpretation of the data in outbreak situations in clinical practice.
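The cut-off problem can be made concrete with pairwise SNP distances. The sketch assumes that the isolates' core genomes are already aligned to equal length (here reduced to ten hypothetical variable positions) and counts positions at which both isolates carry an unambiguous base (A, C, G or T) and the bases differ. The isolate names, sequences and the two-SNP threshold are invented for illustration.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of aligned positions where both isolates have an unambiguous
    base (A, C, G or T) and the bases differ; gaps and Ns are ignored."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b)
               if a != b and a in "ACGT" and b in "ACGT")


def pairs_within_cutoff(alignment, cutoff):
    """List (isolate, isolate, distance) for every pair at or below cutoff."""
    pairs = []
    for x, y in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[x], alignment[y])
        if d <= cutoff:
            pairs.append((x, y, d))
    return pairs


alignment = {  # hypothetical concatenated variable sites
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACGTTT",
    "iso4": "TGCAACGTAC",
}
print(pairs_within_cutoff(alignment, cutoff=2))
# [('iso1', 'iso2', 1), ('iso2', 'iso3', 2)]
```

With a two-SNP threshold, iso1 is linked to iso2 and iso2 to iso3, yet iso1 and iso3 differ by three SNPs and are not directly linked. Whether iso3 belongs to the outbreak therefore depends on how such chains are treated, which illustrates why a single cut-off is hard to apply.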
Interestingly, the '100K Genome Project', an initiative of the US Food and Drug Administration (FDA), Agilent Technologies, the University of California at Davis and other federal and private partners, aims to sequence the genomes of 100,000 food-borne pathogen isolates over the next five years (http://100kgenome.vetmed.ucdavis.edu). The knowledge derived from this enormous effort will be extremely useful for epidemiological surveillance, not only because the genomic information will facilitate detailed comparisons between bacterial isolates, but also because the data will serve as a knowledge base for developing new pathogen detection and typing assays for outbreak investigations.
In addition to traditional epidemiological applications, WGS can also be effective for defining phenotypic characteristics, such as the virulence or antibiotic resistance of a particular pathogen [99]. First attempts to compile an in silico 'resistome' of antibiotic resistance genes have already been successful, as demonstrated by comparing genome-based predictions with the results of phenotypic susceptibility testing [91]. Similarly, a potential 'toxome', consisting of all toxin genes, was established from WGS data [91]. Accordingly, WGS could be used to support or replace the classical determination of bacterial serotypes, as it allows detection of the genes critical for the expression of particular serotype-specific antigens. However, a note of caution is warranted, since the genome sequence does not yet allow accurate prediction of whether particular genes are (possibly conditionally) expressed, or of their expression levels. This is critically underscored by proteomics studies of the cell surface proteomes and exoproteomes of different S. aureus isolates, which revealed high degrees of variation in the expression of particular proteins, including known virulence factors [100][101][102]. Lastly, genome sequences will also be used to search for genetic markers, such as the presence or absence of a gene or an amino acid substitution in a protein, that can be linked to an exclusive or higher occurrence in a disease, or to disease severity and virulence.
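Comparing genome-based resistance predictions with phenotypic testing reduces to a concordance calculation. The sketch uses a tiny, hypothetical rule table linking well-known S. aureus resistance genes to single antibiotics (mecA to oxacillin, blaZ to penicillin, ermC to erythromycin); real resistome analyses rely on curated databases, consider point mutations and many more genes, and would not treat these rules as complete.

```python
# Hypothetical, deliberately incomplete gene -> antibiotic rules.
RULES = {"mecA": "oxacillin", "blaZ": "penicillin", "ermC": "erythromycin"}


def predict_resistance(genes, rules=RULES):
    """Antibiotics predicted to be ineffective, given the genes detected by WGS."""
    return {rules[g] for g in genes if g in rules}


def concordance(isolates, antibiotics):
    """Fraction of isolate/antibiotic pairs in which the genotype-based
    prediction agrees with phenotypic testing (True = resistant)."""
    agree = total = 0
    for genes, phenotype in isolates:
        predicted = predict_resistance(genes)
        for drug in antibiotics:
            total += 1
            agree += (drug in predicted) == phenotype[drug]
    if total == 0:
        raise ValueError("no isolate/antibiotic pairs to compare")
    return agree / total


drugs = ["oxacillin", "penicillin", "erythromycin"]
isolates = [
    ({"mecA", "blaZ"}, {"oxacillin": True, "penicillin": True, "erythromycin": False}),
    ({"blaZ"}, {"oxacillin": False, "penicillin": True, "erythromycin": True}),
]
print(round(concordance(isolates, drugs), 3))  # 0.833
```

The second isolate is phenotypically erythromycin-resistant without any gene from the rule table, so the prediction misses it; such discordances may reflect resistance mechanisms missing from the rules. Conversely, a gene that is present but not expressed would produce a false resistant prediction, which is the limitation of sequence-only prediction noted above.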

Conclusions
In recent years, we have witnessed substantial technical improvements in existing approaches for typing bacterial isolates, and completely new technologies have emerged that will substantially change the way pathogenic microorganisms are defined and distinguished in the near future. This has involved major efforts towards automating these typing methods, improving their resolution and throughput, and designing adequate bioinformatics tools. The steadily increasing number of genotyping databases containing DNA sequences and DNA microarray profiles now allows easier and faster interlaboratory comparisons, retrospective analyses and long-term epidemiological surveillance of bacterial infections. Unfortunately, there is currently no single ideal typing method, and each genotyping approach has its own advantages and disadvantages. Therefore, depending on the setting (local, national or international), one or more typing methods need to be applied. If speed is important for containing a local disease outbreak, a PCR-based method with high discriminatory power, such as MLVF and/or DiversiLab, may work well for characterising the isolates. However, if an outbreak of bacterial disease is disseminated across various geographical locations, a more robust typing approach, such as PFGE, will be needed to allow reliable comparison of the results obtained in different laboratories. Notably, some of the newer methods, such as MLVA, SLST, MLST, SNP or DNA microarray analysis, allow isolates to be typed as well as with the gold standard PFGE, and urgently needed results can be obtained in shorter periods of time. On the other hand, these newer methods also have certain drawbacks, including the need for highly trained staff and expensive equipment, such as automated DNA sequencers or scanners.
Therefore, it is much easier to replace traditional methods with newer ones at the local level than in large national or international surveillance networks, where all laboratories (with different staff and budgets) must implement the same new typing method and train all participants in its standardised application. It is important to realise that a newly introduced method must be thoroughly validated by different independent laboratories to determine its typing potential, and this process takes years rather than months. A new method must also implement a specific, unambiguous nomenclature, which needs to be developed and refined during the validation process. Accordingly, the replacement of an old, well- and widely established method with a new one must be conducted gradually to avoid the loss of precious historical information generated over many years. This is underscored by the continued use of PFGE which, for example, has remained the preferred typing method in the PulseNet network for surveillance and investigation of food-borne outbreaks for over 15 years (www.cdc.gov/pulsenet/). Moreover, if a surveillance network addresses different bacterial species, it is very convenient if the same standardised typing platform can be used for all of them. This is another reason why PFGE is likely to remain a preferred method in PulseNet. Notably, because different typing methods usually target different genomic sequences, strain variations detected with one method may remain undetected with another. Therefore, in certain situations, the combined use of several typing methods may discriminate bacterial isolates more precisely than a single method.
Completely unambiguous typing of bacterial isolates can be achieved by WGS, as this technology has the potential to resolve single base differences between two genomes. WGS thus promises to deliver high-resolution genomic epidemiology as the ultimate method for bacterial typing. However, it is presently difficult to estimate when exactly this approach will become the norm in routine laboratories. In fact, we do not anticipate that WGS will completely replace other typing systems in the near future. Compared with many conventional methods, WGS is still neither rapid nor cost-effective. Nevertheless, recent technical improvements and cost reductions suggest that, in industrialised countries, WGS will gradually become a primary typing tool in routine use. In particular, bioinformatic solutions will be needed to rapidly extract the information from WGS that is important for clinical microbiology, infection control and public health. A common web-based database will therefore be necessary, both to enable quantifiable quality control of the enormous amount of sequencing data and to build a growing worldwide WGS reference database. In less-resourced countries, owing to limited financial resources, well-established conventional methods such as PFGE or PCR-based typing systems will probably prevail in routine laboratories in the coming decade, although these countries may rapidly adopt WGS once it becomes more affordable and practical. In this respect, it is important to bear in mind that all sequence-based typing methods already produce data sets that will remain readable by the next generation, because they are based on the universal genetic code. Moreover, the challenge is to continuously correlate the growing body of genome sequence information with the phenotypic characteristics of bacterial isolates and to make these data publicly available via the Internet, thereby ensuring that these achievements are put to clinical use not only in industrialised countries but also in less-resourced countries. Finally, the data produced by WGS will be invaluable for the development of new typing strategies and the optimisation of traditional typing methods, such as the PCR- and microarray-based approaches presented in this review.