Multicenter External Quality Assessment Program for PCR Detection of Mycobacterium ulcerans in Clinical and Environmental Specimens

Background: Mycobacterium ulcerans is the causative agent of Buruli ulcer (BU), a necrotizing disease of the skin, soft tissue and bone. PCR is increasingly used in the diagnosis of BU and in research on the mode of transmission and environmental reservoir of M. ulcerans.
Methodology/Principal Findings: The aim of this study was to evaluate the performance of laboratories in detecting M. ulcerans using molecular tests in clinical and environmental samples by implementing sequential multicenter external quality assessment (EQA) programs. The second round of the clinical EQA program revealed somewhat improved performance.
Conclusions/Significance: Ongoing EQA programs remain essential, and continued participation in future EQA programs by laboratories involved in the molecular testing of clinical and environmental samples for M. ulcerans for diagnostic and research purposes is strongly encouraged. Broad participation in such EQA programs also benefits the harmonization of quality in the BU research community and enhances the credibility of advances made in solving the transmission enigma of M. ulcerans.


Introduction
The implementation of PCR-based methods for the detection of Mycobacterium ulcerans, the causative organism of Buruli ulcer (BU), in clinical and environmental samples 15 years ago [1][2][3] has drastically improved our knowledge of BU. BU is an indolent necrotizing disease of the skin, subcutaneous tissue, and bone [4] occurring mainly in certain riverine rural areas of West and Central Africa and in coastal southeastern Australia with about 20,000 cases reported in the last decade [4]. BU is presently the third most common mycobacterial disease of humans, after tuberculosis and leprosy, and the least well understood of the three [5].
In most BU endemic settings the working conditions are difficult and the diagnosis of BU is often made on clinical and epidemiological grounds. However, the disease presents with a diverse range of clinical symptoms and, due to possible confusion with other tropical skin diseases, the added value of microbiological confirmation is becoming more appreciated. Among the available laboratory tests (direct smear examination for acid fast bacilli, culture, PCR and histopathology), PCR targeting the insertion element IS2404 (present in over 200 copies in the M. ulcerans genome) is by far the most sensitive and specific, and much faster than culture, which takes an average of 10 weeks and only has 45% sensitivity despite many efforts to improve decontamination methods, culture media and incubation conditions [6][7][8][9][10]. Since isolating M. ulcerans from environmental sources has until now, despite numerous attempts, only been successful once [11], most current knowledge on the environmental reservoir and mode of transmission of M. ulcerans is based on studies that have used PCR to amplify IS2404 and other targets (less frequent in the M. ulcerans genome) such as the insertion element IS2606, the ketoreductase B domain (KR) and the enoyl reductase domain (ER) suggesting the presence of M. ulcerans DNA in a number of biotic and abiotic elements of the environment [12][13][14]. The actual environmental reservoir(s) and mode(s) of transmission, however, remain a mystery.
Clinical and environmental samples can contain low concentrations of M. ulcerans DNA, PCR inhibitors and DNA from other sources that may generate non-specific PCR products, which present a number of difficulties when applying molecular methods to diagnosis and research. Moreover, previous external quality control studies for PCR detection of Mycobacterium tuberculosis and hepatitis B virus have shown that PCR may be unreliable because of false-positive results caused by contamination or because of false-negative results caused by a lack of sensitivity, inhibition, or other reasons [15][16][17]. The virtually unique reliance on PCR for diagnostic and research purposes in the field of BU requires the continued and convincing demonstration of its accuracy, reliability and reproducibility. This requirement compels laboratories to establish and implement effective and comprehensive quality assurance schemes for their PCR tests. Quality assurance involves intensive internal quality control as well as a system of external quality control. In light of this, in 2008 the Technical Advisory Group of the WHO Global Buruli Ulcer Initiative recommended the establishment of an external quality assessment program (EQAP) for the molecular detection of M. ulcerans in clinical and environmental samples.
EQAPs are performed to help laboratories maintain high standards. They enable participants to check that samples are processed correctly, that results are appropriately recorded, and that assays are robust and are being performed in an accurate and reproducible manner. External quality assessment can consist of audit visits, proficiency testing with an adequate number of coded specimens and periodic rechecking of specimens. This report summarizes the results of two rounds of proficiency testing for the molecular detection of M. ulcerans in clinical and environmental samples, coordinated, respectively, by the Institute of Tropical Medicine (ITM) in Antwerp, Belgium, and the Victorian Infectious Diseases Reference Laboratory (VIDRL) in Melbourne, Australia, both WHO Collaborating Centers for M. ulcerans. The objectives of these programs were to assess the performance of the participating laboratories in detecting M. ulcerans DNA and to compare their performance between rounds.

Materials and Methods
Proficiency testing by both WHO Collaborating Centers involved the distribution to national laboratories of panels of coded specimens with known status, with the coding sequence differing from panel to panel. All samples were shipped at ambient temperature to the participants (Table 1). In addition, a questionnaire on the methodologies used and laboratory characteristics was included in each shipment along with instructions and a results sheet.
Participating laboratories were asked to process the EQA panel using the DNA extraction and PCR methods they routinely used for molecular detection of M. ulcerans. To ensure confidentiality, all participating laboratories were assigned a code. The test results reported by the laboratories were compared with the coded results in a blinded way, and specific performance indicators (concordance, sensitivity, specificity and reproducibility) were calculated. The participating laboratories received a report summarizing the results of the EQAP that allowed them to compare their performance to that of other laboratories. Individual discussions between the WHO Collaborating Centers and some participants, to reflect on possible causes of and solutions to weak performance, took place by email and at the WHO BU meeting in Geneva, Switzerland, in March 2010.

Ethics Statement
The existing collection of anonymized surplus diagnostic samples hosted by the ITM in its role of WHO Collaborating Centre for the Diagnosis and Surveillance of Mycobacterium ulcerans Infection was used to prepare the distributed EQA panels. For this kind of activity no approval of an ethics committee is needed.

Clinical EQAP
During the first round, organized in 2009, the EQA panel for the detection of M. ulcerans DNA consisted of 34 suspensions of clinical specimens in PBS and 70% ethanol (50:50) to kill bacteria. During the second round, organized in 2011, 33 such suspensions were included. Suspensions were selected in such a way that they would allow an assessment of sensitivity, specificity, and inter-laboratory reproducibility (Table 2). The positive suspensions represented strong as well as weak positives (quantified by direct smear examination). Among the M. ulcerans negative samples, some suspensions of clinical specimens positive for M. tuberculosis and M. marinum were included. All suspensions were sent in duplicate to assess intra-laboratory reproducibility. In order to distinguish whether weak performance during the first round was due to problems during DNA extraction or during amplification, serial dilutions of extracted genomic M. ulcerans DNA suspended in 1× TE were distributed during the second round, in addition to the suspensions. Ten-fold dilutions from 29 × 10⁻⁴ ng/ml to 29 × 10⁻⁹ ng/ml genomic DNA were included in duplicate. Testing both specimen suspensions and DNA extracts allowed laboratories to evaluate the performance of their methods for DNA extraction and for PCR separately.
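The dilution series described above can be written out explicitly. The following is a minimal sketch, taking the series to run in ten-fold steps from 29 × 10⁻⁴ to 29 × 10⁻⁹ ng/ml; the function name `tenfold_series` is hypothetical and is not part of the EQA protocol.

```python
from fractions import Fraction

def tenfold_series(mantissa, first_exp, last_exp):
    """Return (exponent, concentration) pairs for a ten-fold dilution series.

    Concentrations are exact fractions (ng/ml) so that the ten-fold
    relationship between consecutive steps is not blurred by rounding.
    """
    step = -1 if last_exp < first_exp else 1
    return [(e, mantissa * Fraction(10) ** e)
            for e in range(first_exp, last_exp + step, step)]

# Assumed reading of the panel: 29 x 10^-4 down to 29 x 10^-9 ng/ml.
series = tenfold_series(29, -4, -9)
for exp, conc in series:
    print(f"29 x 10^{exp} ng/ml = {float(conc):.2e} ng/ml")
```

Under this reading the panel contains six concentrations spanning five orders of magnitude, each supplied in duplicate.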

Environmental EQAP
During the first (pilot) round, organized in 2008, participants were sent eight heat-sterilized environmental samples each comprising a mixture of soil, leaf litter, detritus and animal faeces collected from BU endemic and non-endemic areas in Victoria, Australia. The positive samples were true positives containing varying concentrations of M. ulcerans DNA as determined by IS2404 real-time PCR [14] (Table 3). The rationale for using only one sample type during the pilot round was to minimize the number of variables so that the only difference between samples was the presence or absence of M. ulcerans DNA. During the second round, organized in 2010, participants received 10 heat-sterilized environmental samples. The samples were selected to represent the types of samples commonly tested by BU researchers (water, soil, aquatic plants and animal faeces) and to include the types of samples that often contain PCR inhibitors (Table 3). The positive samples represented stronger as well as weaker positives as determined by IS2404 real-time PCR [14] (Table 3). The positive faecal samples were true positive samples collected from a BU endemic area, whereas the other positive samples were spiked with suspensions of M. ulcerans, as in Australia these types of samples are less frequently positive and contain lower concentrations of M. ulcerans DNA [19], rendering the collection of true positive samples difficult. To assess the impact of a delay in the shipment and/or processing of samples, two panels were retained at VIDRL (one stored at 4°C and the other at room temperature) and tested after all participants had submitted their results.

Statistical Analysis
Inter-laboratory reproducibility was calculated per sample as the number of laboratories whose qualitative result was concordant with that of the organizing center, divided by the total number of participating laboratories.
The concordance of a laboratory was calculated as the ratio of concordant qualitative results obtained by the laboratory over the total number of samples.
Results were considered false negative or false positive when they differed from the qualitative results obtained by the organizing laboratory using real-time PCR.
Intra-laboratory reproducibility was assessed by shipping all suspensions in duplicate and calculated as the ratio of pairs concordant with the organizing centers over the total number of pairs. GraphPad Prism v. 5 was used for linear regression analysis of reproducibility and concordance vs. workload, and false negative rate vs. the PCR detection limit.
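The indicators defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample and laboratory labels, and the toy results are hypothetical, and a duplicate pair is assumed to count as concordant only when both members agree with the organizing center.

```python
def concordance(lab_results, expected):
    """Fraction of samples for which one laboratory agrees with the organizer."""
    agree = sum(lab_results[s] == expected[s] for s in expected)
    return agree / len(expected)

def inter_lab_reproducibility(all_results, expected, sample):
    """Fraction of laboratories whose result for one sample matches the organizer."""
    agree = sum(res[sample] == expected[sample] for res in all_results.values())
    return agree / len(all_results)

def intra_lab_reproducibility(lab_results, expected, pairs):
    """Fraction of duplicate pairs in which both results match the organizer."""
    ok = sum(lab_results[a] == expected[a] and lab_results[b] == expected[b]
             for a, b in pairs)
    return ok / len(pairs)

def false_calls(lab_results, expected):
    """Return (false positives, false negatives) relative to the organizer."""
    fp = sum(lab_results[s] and not expected[s] for s in expected)
    fn = sum(expected[s] and not lab_results[s] for s in expected)
    return fp, fn

# Hypothetical toy panel: two specimens, each shipped in duplicate (True = positive).
expected = {"S1a": True, "S1b": True, "S2a": False, "S2b": False}
pairs = [("S1a", "S1b"), ("S2a", "S2b")]
results = {
    "labA": {"S1a": True, "S1b": True, "S2a": False, "S2b": False},
    "labB": {"S1a": True, "S1b": False, "S2a": False, "S2b": False},
}

print(concordance(results["labB"], expected))                # 0.75
print(inter_lab_reproducibility(results, expected, "S1b"))   # 0.5
print(intra_lab_reproducibility(results["labB"], expected, pairs))  # 0.5
print(false_calls(results["labB"], expected))                # (0, 1)
```

In this toy panel, labB misses one of the two duplicates of a positive specimen, which lowers its concordance and its intra-laboratory reproducibility and counts as a single false negative.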

Clinical EQAP
A total of 11 laboratories from 11 countries participated in the first round of the clinical EQAP while 18 laboratories from 15 countries took part in the second round (Table 1).

Results by sample (inter-laboratory reproducibility). During the first round, the proportion of qualitative results concordant with ITM varied between 44% and 100% (median 90%) by sample (Table 2). Only three samples were identified correctly by all laboratories. In the second round, this proportion ranged between 31% and 100% (median 94%), with eleven samples identified correctly by all laboratories. Six suspensions (actually the duplicates of three) were reported correctly by fewer than 60% of the laboratories, indicating that they were positive at the detection limit of the PCR assays used. These samples were indeed negative by direct smear examination and had reported IS2404-Ct values between 36 and 37, as compared with Ct values between 20 and 30 for smear-positive samples.
Results by laboratory (concordance, false positive and false negative results, intra-laboratory reproducibility). The proportion of concordant qualitative results varied between 58% and 100%, median 82%, by laboratory in the first round and between 64% and 100%, median 88%, in the second round (Table 4). In both rounds only two laboratories reported all results correctly. In the first round, seven (64%) of the participating laboratories reported false positive results while six (55%) laboratories reported false negative results. Five laboratories tested for individual inhibition and one of them erroneously observed inhibition in two samples. In the second round, six (38%) participating laboratories reported false positive results while 13 (81%) reported false negative results. Seven laboratories tested for inhibition in individual samples and correctly observed none.
When stratifying for workload we found a non-significant correlation between workload and concordance (Fig. 1) and no association between workload and reproducibility (Fig. 2).
Participants used a range of DNA extraction protocols, PCR methods and amplification targets, including both in-house methods (phenol-chloroform and modified Boom [20]) and kits (e.g. Roche respiratory specimen preparation kit, Qiagen QIAamp DNA Mini kit, MoBio UltraClean soil kit and MoBio spin columns, Qiagen Puregene core kit, Qiagen DNeasy blood and tissue kit, Roche High Pure PCR template preparation kit, Promega Maxwell 16 Instrument) for DNA extraction, and gel-based [21][22][23] and real-time PCR [14,24] for sequence detection. In the first round, six laboratories reported the use of an in-house DNA extraction method (58%-97%, median 75% concordant results) while five reported using a commercial method (68%-100%, median 94% concordant results). Three laboratories used real-time PCR (58%-100%, median 88% concordant results). Among the eight laboratories using gel-based conventional PCR, three used a nested format (74%-97%, median 82% concordant results) while five used a single-run format (68%-100%, median 77% concordant results). All laboratories amplified the IS2404 sequence. In addition, one laboratory amplified the gene encoding ketoreductase (KR), one laboratory the gene encoding enoyl reductase (ER), and a third laboratory the IS2606 sequence.
DNA extracts. All laboratories but one were able to detect DNA at a concentration of 29 × 10⁻⁵ ng/ml or lower. No laboratory detected the lowest concentration of 29 × 10⁻⁹ ng/ml. One laboratory detected M. ulcerans DNA in none of the DNA dilutions. Laboratories able to detect lower concentrations of DNA also had fewer false negative results among the specimen suspensions (Fig. 3), and this association was significant (p<0.0001).

Environmental EQAP
Seven laboratories from six countries participated in the first round of the environmental EQAP (Table 1). Eight laboratories from eight countries took part in the second round (Table 1). Five laboratories participated in both rounds.

Results by sample (inter-laboratory reproducibility). There was a high level of inter-laboratory reproducibility in both rounds (Table 3). In the first round, the proportion of results concordant with VIDRL varied between 86% and 100% by sample, with four samples correctly identified by all laboratories. In the second round, this proportion ranged
Results by laboratory (concordance, false positive and false negative results). The proportion of concordant qualitative results varied between 43% and 100% by laboratory in the first round and between 50% and 100% in the second round (Table 5). The median overall performance was 100% concordant in the first round and 95% in the second round. Six laboratories reported 100% concordant results in the first round while four reported 100% concordant results in the second round. Of the five laboratories that participated in both rounds, two reported all results correctly in both rounds, two scored worse in the second round and one scored better.
In the first round, one laboratory reported false positive results (Table 5). In the second round, one laboratory reported false positive results, two laboratories reported false negative results and one laboratory reported both false positive and false negative results.
Laboratory type, workload and methods used. In both rounds, the majority of participants were reference or academic laboratories. Three laboratories reported testing between 100 and 500 environmental samples per year while the other participants tested fewer than 100 samples. As with the clinical EQAP, participants used a range of DNA extraction protocols, PCR methods and amplification targets, including both in-house methods and commercial kits for DNA extraction and gel-based and real-time PCR for sequence detection. However, in both rounds, commercial kits were most commonly used for DNA extraction and real-time PCR targeting IS2404 (± other targets) was most commonly used for sequence detection. There was no correlation between participants' performance and laboratory type, workload or method. The two panels retained by the coordinating laboratory and tested after all participants had submitted results yielded results consistent with the expected results (in most cases less than a 10-fold difference in detection of M. ulcerans DNA), demonstrating that any delay in testing was unlikely to have been a cause of discordant results (Table 3).

Discussion
The results of the two rounds of clinical and environmental EQAP indicated that there is a great variation between laboratories in the quality of molecular detection of M. ulcerans from clinical and environmental samples. Only 36% of the laboratories in the first round of the clinical EQAP and 31% in the second round had more than 90% concordant results. In both environmental rounds, however, at least half of respondents reported 100% concordant results. This discrepancy between the clinical and environmental EQAP could be explained by the fact that, in the environmental program, samples had a higher bacillary load and because the laboratories participating in the environmental EQAP were among the stronger ones in the clinical EQAP (87% and 90% concordant results in the first and second round respectively).
False positive results indicate problems of specificity and are suggestive of cross-contamination during DNA extraction or PCR, most often due to carryover contamination with previously amplified PCR products. This highlights the importance of the three-room PCR principle (where DNA extraction, master mix preparation and template addition are separated) and reinforces the need for performing DNA extraction from clinical samples (particularly cultured isolates) and environmental samples in separate areas with dedicated reagents and equipment. The false negative results may be due to poor DNA extraction efficiency, low PCR sensitivity and/or PCR inhibition. Among the nine laboratories with discordant results during the first clinical EQAP, four had both false positive and false negative results, indicating problems with both the sensitivity and the specificity of their DNA extraction and/or PCR assay. The laboratories with few concordant results all had low intra-laboratory reproducibility, indicating that their false results were probably due to mistakes in manipulations rather than to the techniques used. This impression was reinforced by the observation that every extraction method and PCR assay showed great variations in concordant results. Moreover, reference laboratories had more concordant results than academic and hospital laboratories, again suggesting that the variation was most probably due to laboratory performance.
In the second round of the clinical EQAP, five laboratories had an intra-laboratory reproducibility of more than 90%, with three of them processing more than 100 clinical specimens in 2010. Among the laboratories with false results, a majority of 13 had false negatives, indicating problems with the sensitivity of DNA extraction and/or PCR assay. Six of these laboratories had intra-laboratory reproducibilities over 90%, indicating that their false results were due to the techniques in use rather than to manipulation errors.
Indeed, the two laboratories with the lowest number of concordant results (lab05 and lab15) did have high intra-laboratory reproducibilities. Moreover, they reported no false positive results, suggesting that the low sensitivity was probably due to a problematic PCR assay rather than manipulation errors. Indeed, neither of them detected DNA at concentrations below 29 × 10⁻⁵ ng/ml.
For most laboratories the limited sensitivity of molecular detection was probably due to weak detection by the PCR assay, as demonstrated in Fig. 3. Five laboratories detected DNA dilutions only down to 29 × 10⁻⁶ ng/ml and also reported several false negative results among the set of specimen suspensions. Among the laboratories that detected 29 × 10⁻⁷ ng/ml DNA, some still reported several false negative suspensions, indicating the use of an insufficiently sensitive DNA extraction method. Manipulation errors may also explain the combination of false positive results and a low intra-laboratory reproducibility (lab02, lab23 and lab28).
More laboratories participated in the second round of the clinical EQA program (11 in the 1st vs. 18 in the 2nd round) and the median proportion of concordant results increased (from 82% to 88%). The median inter- and intra-laboratory reproducibilities also increased, from 90% to 94% by sample and from 88% to 94% by laboratory, respectively. The proportion of laboratories reporting false positive results decreased from 64% to 38% while the proportion reporting false negative results increased from 55% to 81%, suggesting that the set of specimen suspensions distributed in the second round included some difficult ones/weak positives, challenging the detection limit of the PCR assays used. Eight laboratories participated in both rounds. Four laboratories performed better in the second round while three performed worse. One laboratory twice reported 100% concordant results. One of the laboratories that performed worse in round 2 (coded lab08 in round 1 and lab05 in round 2) reported results with a reduced concordance but a higher intra-laboratory reproducibility (Table 3). This laboratory reported many false negatives and did not detect M. ulcerans DNA at concentrations below 29 × 10⁻⁵ ng/ml, indicating that solving the PCR sensitivity problem will probably increase the number of concordant results compared with the 1st round. In a repeated quality control study for the molecular detection of M. tuberculosis, improved performance was observed as well [15,16].
Three of the eight laboratories that participated twice in the clinical EQA program used a different method for either DNA extraction (a commercial instead of an in-house method) or amplification (qPCR instead of a single-run or nested conventional PCR assay). The laboratory that changed to a commercial DNA extraction method slightly improved its performance, with fewer false positives but more false negatives. Both laboratories that changed to real-time PCR as their amplification method had reduced performance, with more false positives as well as false negatives. This could be due to those laboratories not yet being acquainted with the newly implemented technique.
Both in the clinical and environmental EQAP the laboratories' performances did not correlate with the methodologies used for DNA extraction and amplification. It is therefore not possible, based on the results presented here, to make any recommendations on methodologies. Comparisons of different methods for the molecular detection of M. ulcerans in clinical and environmental specimens have been described previously [20,22,25].
Of the five laboratories with false results in either the first or second round of the environmental EQAP, two reported false positive results, two reported false negative results and one reported both false positive and false negative results. In the second round, the three positive samples that were incorrectly reported as negative by at least two laboratories were the types of samples that typically contain PCR inhibitors (faeces and soil). The fact that one of these samples had an IS2404-Ct value of 22.19 (the strongest positive sample in the panel) and that these laboratories correctly detected M. ulcerans DNA in the water sample that had an IS2404-Ct value of 36.19 (the weakest positive sample in the panel) suggests that these false negative results were due to inhibition (although none was reported by participants) rather than low sensitivity of their DNA extraction or PCR assay. These results highlight the importance of including internal positive controls in every reaction to test for inhibition and the challenge of optimizing DNA extraction protocols and PCR assays to reduce inhibition without compromising sensitivity.
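The size of the gap between the two Ct values quoted above can be made concrete with the standard real-time PCR approximation that each cycle multiplies the amount of product by (1 + efficiency). The sketch below assumes perfect doubling per cycle (efficiency of 1.0) and that both samples were measured with the same assay; neither assumption is stated in the text, the function names are hypothetical, and the output is an order-of-magnitude illustration only.

```python
import math

def fold_difference(ct_weak, ct_strong, efficiency=1.0):
    """Approximate template ratio implied by a Ct gap.

    efficiency=1.0 is an assumed perfect doubling of product per cycle.
    """
    return (1.0 + efficiency) ** (ct_weak - ct_strong)

def cycles_for_fold(fold, efficiency=1.0):
    """Number of cycles corresponding to a given fold difference in template."""
    return math.log(fold) / math.log(1.0 + efficiency)

# Ct values quoted in the text for the weakest and strongest positive samples.
ratio = fold_difference(36.19, 22.19)
print(f"Ct gap of {36.19 - 22.19:.2f} cycles ~ {ratio:,.0f}-fold more template")

# A 10-fold difference in template corresponds to roughly 3.3 cycles.
print(f"10-fold difference ~ {cycles_for_fold(10):.2f} cycles")
```

Under these assumptions the strongest positive sample would have contained on the order of 16,000 times more target DNA than the weakest one, which supports the argument that missing the strong sample while detecting the weak one is hard to attribute to assay sensitivity alone.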
Implications of the false positive and false negative results of respondent laboratories may be that (i) patients suffering from illnesses other than BU receive inappropriate treatment; (ii) patients with BU are erroneously considered as suffering from other illnesses; (iii) epidemiological data on BU are unreliable; (iv) conclusions drawn from clinical as well as environmental studies are doubtful; (v) researchers pursue or abandon lines of research on the basis of incorrect environmental results, which hampers efforts to elucidate the mode of transmission and environmental reservoir of M. ulcerans; and (vi) PCR as a single confirmatory test is insufficient for laboratories with weak EQA results (<90% concordant results).
We therefore recommend that: (i) these quality assessment programs are continued on a regular basis; (ii) laboratories investigate the areas for improving their performance; (iii) laboratories implement rigorous internal quality control procedures (with e.g. the inclusion of (weak) positive and negative controls in every DNA extraction batch and PCR run as well as the inclusion of an internal positive control in every reaction); and (iv) the diagnosis of BU by microscopy is reinforced. Two-thirds of PCR positive specimens can be confirmed by direct smear examination [8,9,26,27]. Direct smear examination is a cheap and fast diagnostic method that can be applied easily in BU endemic areas without the need for expensive and sophisticated equipment [28,29]. Moreover, in most BU endemic countries systems are in place to control the quality of direct smear examination by national tuberculosis programs, and the periodic (e.g. quarterly) correlation of smear microscopy results with PCR results can serve as an additional internal control to detect problems with the molecular diagnosis of M. ulcerans.
Table 5. Qualitative results per laboratory of the two rounds of environmental EQAP.
Table 5 footnote: This laboratory did not formally submit results, but used the program to troubleshoot its DNA extraction and PCR protocols with the assistance of the coordinating laboratory (see results for details).