Sensitivity of diagnostic tests for human soil-transmitted helminth infections: a meta-analysis in the absence of a true gold standard

Reliable, sensitive and practical diagnostic tests are an essential tool in disease control programmes for mapping, impact evaluation and surveillance. To provide a robust global assessment of the relative performance of available diagnostic tools for the detection of soil-transmitted helminths, we conducted a meta-analysis comparing the sensitivities and the quantitative performance of the most commonly used copro-microscopic diagnostic methods for soil-transmitted helminths, namely Kato-Katz, direct microscopy, formol-ether concentration, McMaster, FLOTAC and Mini-FLOTAC. In the absence of a perfect reference standard, we employed a Bayesian latent class analysis to estimate the true, unobserved sensitivity of compared diagnostic tests for each of the soil-transmitted helminth species Ascaris lumbricoides, Trichuris trichiura and the hookworms. To investigate the influence of varying transmission settings we subsequently stratified the analysis by intensity of infection. Overall, sensitivity estimates varied between the different methods, ranging from 42.8% for direct microscopy to 92.7% for FLOTAC. The widely used double slide Kato-Katz method had a sensitivity of 74-95% for the three soil-transmitted helminth species at high infection intensity, however sensitivity dropped to 53-80% in low intensity settings, being lowest for hookworm and A. lumbricoides. The highest sensitivity, overall and in both intensity groups, was observed for the FLOTAC method, whereas the sensitivity of the Mini-FLOTAC method was comparable with the Kato-Katz method. FLOTAC average egg count estimates were significantly lower compared with Kato-Katz, while the compared McMaster counts varied. In conclusion, we demonstrate that the Kato-Katz and Mini-FLOTAC methods had comparable sensitivities. We further show that test sensitivity of the Kato-Katz method is reduced in low transmission settings.


Introduction
Reliable, sensitive and practical diagnostic tests are an essential tool in disease control programmes, including those for neglected tropical diseases. The requirements and expectations for a diagnostic tool in terms of technical performance, feasibility and costs change as control programmes progress through different phases, from initially high levels of infections to the confirmation of absence of infections. More precisely, during initial mapping to identify priority areas for control, when infection levels are typically highest, a diagnostic test with moderate sensitivity is acceptable, although the chosen tool needs to be easy to use, cost-effective and allow for the high-throughput screening of large populations (McCarthy et al., 2012;Solomon et al., 2012). Since mapping data can also serve as a baseline for the monitoring and evaluation of programme impact, diagnostic tests must have sufficient performance to detect changes in the prevalence and intensity of infection (Solomon et al., 2012). In later stages of programmes, when infection prevalence and intensity have decreased significantly, more sensitive diagnostic tools are needed to establish an endpoint of treatment programmes. If test sensitivity is insufficient at this point, light infections might be missed and this runs the risk of stopping control programmes too early, before programme endpoints have been achieved. Highly sensitive tests are also required for surveillance once treatment has been stopped to detect the potential re-occurrence of infections (McCarthy et al., 2012;Solomon et al., 2012). Finally, diagnostic tests play an important role in the assessment of treatment efficacy (Albonico et al., 2012) and in patient management.
For the detection of the human soil-transmitted helminth (STH) species, Ascaris lumbricoides, Trichuris trichiura and the hookworms (Necator americanus and Ancylostoma duodenale), The World Health Organization (WHO) currently recommends the use of the Kato-Katz method, based on duplicate slides (WHO, 2002). Other commonly used methods include direct smear microscopy, http://dx.doi.org/10.1016/j.ijpara.2014.05.009 0020-7519/Ó 2014 The Authors. Published by Elsevier Ltd. on behalf of Australian Society for Parasitology Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/). formol-ether concentration (FEC), McMaster, FLOTAC and Mini-FLOTAC. All of these techniques rely on visual examination of a small sample of stool to determine the presence and number of STH eggs (WHO, 1994). Due to intra-and inter-sample variation in egg counts (Booth et al., 2003;Krauth et al., 2012), microscopy-based techniques can have differing sensitivities, especially in low transmission settings. Moreover, diagnostic methods vary considerably in the quantification of egg counts, which is necessary to establish intensity of infection and to evaluate treatment effects (Knopp et al., 2011;Albonico et al., 2012;Levecke et al., 2014). In order to better understand the suitability of diagnostic tools for various transmission settings and stages of disease control programmes, we performed a meta-analysis of the most commonly used copro-microscopic STH diagnostic tests.
Our main study objective was an independent and global assessment of the relative performance of commonly used diagnostic methods for STH, as well as factors associated with heterogeneity in test sensitivity. Previous evaluations of STH diagnostics have generally relied on comparisons with a combined reference standard (generated by adding the results of several compared tests or consecutively obtained samples), an approach which has been widely criticised (Enoe et al., 2000;Ihorst et al., 2007). Moreover, the absence of a common reference standard has been a major obstacle for combined evaluations of diagnostic tests in the form of a meta-analysis. We have addressed this problem by using Bayesian latent class analysis (LCA), which allows simultaneous estimation of the unknown true prevalence of infection and the sensitivities and specificities of compared diagnostic tests. This approach has been previously applied to the evaluation of imperfect diagnostic tests for Chagas disease, leishmaniasis and malaria (Menten et al., 2008;de Araujo Pereira et al., 2012;Goncalves et al., 2012), as well as specific studies evaluating STH diagnostic methods (Booth et al., 2003;Tarafder et al., 2010;Assefa et al., 2014;Knopp et al., 2014). The approach has also been used for the meta-analyses of diagnostic test performance (Ochola et al., 2006;Menten et al., 2008;Limmathurotsakul et al., 2012). The current paper presents a Bayesian meta-analysis of different diagnostic tests for the detection of STH species.

Literature search
A systematic literature search was performed to identify publications presenting the evaluation of diagnostic techniques for the human STH species, A. lumbricoides, T. trichiura and hookworms (N. americanus and A. duodenale). Systematic searches were performed (date of search 25th February 2014) using the electronic databases PubMed (http://www.ncbi.nlm.nih.gov/), MEDLINE and EMBASE (via OvidSP) (http://ovidsp.uk.ovid.com/) and the medical subject headings and search terms as detailed in Supplementary Data S1. Articles were considered if written in English, German, French or Spanish. The search was validated by verifying that a number of previously identified key readings were included in the retrieved search results. The titles of initially obtained search results were screened for suitable content and all abstracts mentioning studies on helminths were retrieved. The abstracts were subsequently screened for studies using more than one diagnostic test for the determination of infections, even if not directly mentioning a comparison of test performances. Full texts were read and information on test outcomes, egg counts, age-groups, countries of the studies and years of publication was extracted where results were presented in a suitable format as explained below. Reference lists were screened for additional publications.
The literature selection process is outlined in Fig. 1. Data were collected separately for A. lumbricoides, T. trichiura and hookworms, and restricted to the most commonly used diagnostic methods for STH, namely Kato-Katz (Katz et al., 1972), direct microscopy (WHO, 1994), formol-ether concentration (FEC) (Ritchie, 1948), McMaster (Ministry of Agriculture Fisheries andFood, 1986), FLOTAC  and Mini-FLOTAC (Barda et al., 2013a). Other techniques such as midi-Parasep, Koga Agar Plate, Willis technique and Spontaneous tube sedimentation technique (SSTT) were not included due to a lack of suitable data. As performance during field surveys was the main interest, evaluations of diagnostic tests on samples from diagnostic laboratories of hospitals were excluded. Only data provided in the form of 2 Â 2 comparisons (T1+T2+, T1+T2À, T1ÀT2+, T1ÀT2À, where T1 and T2 are the two diagnostic methods and + and À indicate the observed positive or negative results) were retained. This also included data for which these 2 Â 2 comparisons could be created by transforming the original data provided, e.g. where comparisons were made against a combined 'gold standard' of two diagnostic methods. Additionally, data on egg counts obtained by the various techniques were retrieved, including those studies that did not provide data in a suitable format for the LCA. Arithmetic mean egg counts were the most commonly reported measures and therefore used for the analysis.
For articles where data could not be directly extracted, corresponding authors were invited to contribute additional study results. Three authors replied and provided four datasets for the analysis; we were also able to contribute a further two datasets to the analysis.

Bayesian LCA
A Bayesian latent class model was used to estimate the sensitivity of different diagnostic tests as described elsewhere (Dendukuri and Joseph, 2001;Branscum et al., 2005). LCA allows estimation of the sensitivity and specificity of imperfect diagnostic tests by assuming a probabilistic model for the relationship between five unobserved, or latent, parameters: true disease prevalence p k and the sensitivities S i , S j and specificities C i , C j of diagnostic methods i and j (Pepe and Janes, 2007). The model additionally incorporates the covariance terms covD þ ij , covD À ij to account for conditional dependency between compared diagnostic tests amongst infected and non-infected individuals, which is necessary as the included diagnostic tests are based on the same biological principle (detection of eggs under a microscope) and therefore factors other than the true infection status are likely to influence both test outcomes simultaneously (Dendukuri and Joseph, 2001). Thus, the joint distribution of the results of a 2 Â 2 table follows a multinomial distribution, ðX kþþ ; X kþÀ ; X kÀþ ; X kÀÀ Þ $ Multiðp kþþ ; p kþÀ ; p kÀþ ; p kÀÀ ; N k Þ with the multinomial probabilities calculated as follows: The conditional correlations between two test outcomes for infected and non-infected individuals were calculated as , respectively. Uninformative prior information was provided for the sensitivity and underlying true prevalence (using a beta distribution with the shape parameters alpha and beta equal to 1). For the covariance terms, a uniform prior distribution was assumed with limits as described in Dendukuri and Joseph (2001) and Branscum et al. (2005) to ensure that probabilities are confined to values between 0 and 1. Specificity was included as a fixed term based on the most parsimonious, best-fitting model (i.e. that with the lowest deviance information criterion (DIC) value) and was assumed to be the same for all compared methods. This was justified on the dual assumption that false positives are rarely obtained by any type of copromicroscopic technique (Knopp et al., 2011;Levecke et al., 2011) and the necessity to restrict the number of estimated parameters for the identifiability of the model. The models, built separately for A. lumbricoides, T. trichiura and hookworms, were computed using WinBUGS software version 14 (Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., 1996. BUGS: Bayesian Inference Using Gibbs Sampling. MRC Biostatistics Unit, Cambridge). Models were also developed separately for low and high intensity settings. Stratification was based on reported arithmetic mean egg counts (in eggs per gram of faeces, epg). Empirical cut-offs of 2500 epg, 400 epg and 165 epg average infection intensity were used for A. lumbricoides, T. trichiura and hookworms, respectively. These cut-offs were established based on the overall average infection intensity of studies included in the meta-analysis. Data with only geometric means reported were excluded from this analysis unless the geometric mean, which is lower than the average egg count, exceeded the cut-off value. Further details of model parameterisation, including handling of multiple slides, are provided in Supplementary Data S2.

Comparison of quantitative performances
To compare the various diagnostic tests in terms of their quantitative performance, we compared the arithmetic mean egg count obtained by various techniques. Statistical significance of differences was assessed using the non-parametric paired Wilcoxon signed-ranks test and the linearity of the relationship between counts was assessed by scatter plots of log-transformed (natural logarithm) average egg counts. Moreover, we evaluated the percentage of studies reporting egg counts of other techniques that were lower/higher than the Kato-Katz method, which currently forms the basis of the WHO defined intensity thresholds. To allow for a small variation in counts, egg counts were considered as lower or higher than the Kato-Katz method if these were lower or higher than the Kato-Katz egg count plus or minus 10%. Due to the limited availability of data and the fact that faecal egg counts do not vary significantly by the sampling effort for Kato-Katz analysis, all versions of Kato-Katz were combined (Levecke et al., 2014).

Identification of diagnostic test comparisons
The initial literature search identified 56 articles which were retrieved for full-text review. Of these, 32 studies fulfilled the inclusion criteria and 2 Â 2 comparison data could be obtained for 20 studies (Table 1) (see Fig. 1 for an outline of literature selection steps). The number of extracted 2 Â 2 comparisons by species  and diagnostic methods is shown in Fig. 2. The included studies were published between 2003 and 2014 and conducted in 12 countries, primarily among school-aged children. The inclusion of only recent studies was somewhat surprising. Even though the original literature search had retrieved studies published since 1967, the non-availability of 2 Â 2 data, the type of compared techniques and the evaluation of methods in laboratory or hospital samples led to their exclusion. The evaluation of diagnostic tests was mainly based on comparison with a combined reference-standard (14 of 20 studies); few studies used predicted estimates as a reference (1/20), an LCA approach (1/20) or a combination of the two (1/ 20). Three studies did not provide sensitivity estimates. The most widely applied method was the Kato-Katz method in 18 of 20 studies (mostly 1-slide or 2-slides on a single sample). The main characteristics of included studies are summarised in Table 1.

LCA of diagnostic test sensitivities (presence of infection)
For all STH species, the models allowing for dependency between compared diagnostic tests showed a better fit, indicated by a lower DIC (not shown). Significant positive correlation between diagnostic test outcomes for infected individuals was observed, especially for comparisons of a 1-slide 1-sample Kato-Katz test with other diagnostic tests (details are provided in Supplementary Data S2).
The estimated sensitivity of the 2-slide 1-sample Kato-Katz test for A. lumbricoides was 64.6% (95% BCI: 59.7-69.8%), for T. trichiura was 84.8% (95% BCI: 82.5-87.1%) and for hookworm was 63.0% (95% BCI: 59.8-66.4%). These estimates were only a slight improvement upon the sensitivities of a 1-slide 1-sample Kato-Katz test. However, increased sensitivities could be observed for 1-slide Table 1 Studies included in a soil-transmitted helminth (STH) diagnostic test meta-analysis. The literature search identified 20 studies evaluating selected diagnostic methods in field settings for which 2 Â 2 test comparisons could be obtained. The majority of studies compared diagnostic test performance with a combined reference standard obtained by adding positive test results from all evaluated methods. One additional study (Funk et al., 2013) was included for the evaluation of quantitative test performances. Analysis 1 refers to studies used for the latent class analysis of test sensitivities, while analysis 2 indicates studies used for the comparison of egg count outcomes. Kato-Katz performed on two consecutive samples. The sensitivity for Kato-Katz tests performed on three consecutive samples was only slightly further improved. Test specificities were not the main outcome and were fixed at 99.6% for A. lumbricoides, 97.5% for T. trichiura and 98.0% for hookworm, based upon model fit.

Effect of infection intensity on diagnostic test sensitivity
The obtained sensitivity estimates by intensity group are presented in Table 3 and Fig. 4. For all tests and STH species evaluated in both intensity groups, sensitivity varied markedly and most strongly for the Kato-Katz method. For example, for A. lumbricoides the 1-slide Kato-Katz method had a sensitivity of 48.8% (95% BCI: 37.6-58.2%) in the low intensity group compared with 95.8% (95% BCI: 91.8-98.5%) in the high intensity group. Interestingly, in the low intensity group the sensitivity of Kato-Katz was improved markedly by performance of a second slide on the same sample. The sensitivity of the FLOTAC method was highest at 81.8% (95% BCI: 65.5-90.3%) at low intensity compared with 97.1% (95% BCI: 93.1-99.7%) at high intensity.

Comparison of quantitative test performances
A total of 17, 16 and 27 comparisons of average Kato-Katz A. lumbricoides, T. trichiura and hookworm egg counts with other diagnostic methods were obtained from 11 articles (  Fig. 2. Two-by-two comparisons of diagnostic methods by soil-transmitted helminth species Acaris lumbricoides, Trichuris trichiuria and hookworm. The outlined comparisons were included in the models; numbers represent the number of available comparisons. Where studies could be subdivided into several populations, each was counted as one comparison. The Kato-Katz method could be differentiated into variations of the protocol according to number of slides or samples processed.

Table 2
Sensitivity estimates for selected diagnostic methods by helminth species. The sensitivity estimates and 95% Bayesian credible interval (BCI) were obtained for each soiltransmitted helminth species by Bayesian latent class analysis. Specificity was included as a fixed term based on model fit. however, resulted in a higher egg count for six of 11 comparisons (55%) for T. trichiura and four of 12 comparisons (33%) for hookworm whilst A. lumbricoides egg counts were significantly lower. The relationships between the logarithmic average measurements of Kato-Katz and FLOTAC or McMaster techniques followed a linear trend as shown by the scatter plots presented in Fig. 5.

Discussion
A global assessment of STH diagnostic test sensitivities and their extent of variation is required to investigate the suitability of diagnostic tools for different transmission settings or stages of STH control programmes. Here we present, to our knowledge, the first meta-analysis of STH diagnostic method performance using a Bayesian LCA framework to overcome the absence of a true gold standard (Dendukuri and Joseph, 2001;Branscum et al., 2005). Our results demonstrate that sensitivities of evaluated diagnostic tests are low overall and cannot be generalised over different transmission settings. Sensitivity, overall and in both intensity groups, was highest for the FLOTAC method, but was comparable for Mini-FLOTAC and Kato-Katz methods. Test sensitivities are strongly influenced by intensity of infection and this variation needs to be taken into account for the choice of a diagnostic test in a specific setting. Moreover, reduced test sensitivity at low infection intensities is of increasing importance as ongoing control programmes reduce the prevalence and intensity of STH infections within endemic communities.
The Kato-Katz method is the most widely used and reported diagnostic method, due to its simplicity and low cost (Katz et al., 1972), and is recommended by the WHO for the quantification of STH eggs in the human stool (WHO, 2002). Even though the overall sensitivity of the Kato-Katz method was low, the results of the stratified analysis suggest a high sensitivity of 74-95% when infection intensity is high, which is likely the case for mapping and baseline assessment. However, the test sensitivity dropped dramatically in low transmission settings, making the method a less valuable option in later stages of control programmes. This is likely a reflection of methodological problems specific to the Kato-Katz method, especially when diagnosing multiple STH species infections, as different helminth eggs have different clearing times (Bergquist et al., 2009). In high intensity settings, little value was added by performing a 2-slide test on the same sample, even though this is the currently recommended protocol; whereas in low intensity settings sensitivity was improved by performing a second slide. Sensitivity increased significantly when performing the Kato-Katz method on multiple consecutive samples, which is most likely explained by daily variations of egg excretions and the non-equal distribution of eggs in the faeces leading to substantial variation in egg numbers between stool samples from the same person (Booth et al., 2003;Krauth et al., 2012).
For all investigated STH species, sensitivity was highest for the FLOTAC method, even when evaluated in low intensity settings, a finding which is consistent with previous evaluations Knopp et al., 2009b;Glinz et al., 2010). However, despite its improved performance compared with other copromicroscopic methods, FLOTAC has several practical constraints including higher associated costs, necessity of a centrifuge and longer sample preparation time, decreasing its value as a universal diagnostic method (Knopp et al., 2009a). To enable its use in settings with limited facilities, the Mini-FLOTAC method, a simplified form of FLOTAC, was developed (Barda et al., 2013a). Our findings suggest that the sensitivity of Mini-FLOTAC is much lower than FLOTAC, and it does not outperform the less expensive Kato-Katz method according to a recent study in Kenya (Speich et al., 2010;Assefa et al., 2014). A recognised advantage of the Mini-FLOTAC method, however, is that it can be performed on fixed stools, enabling processing at a later date in a central laboratory. This can help to increase the quality control process and overcomes some of the logistical difficulties in examining fresh stool samples in the field on the day of collection (Barda et al., 2013a). The obtained Mini-FLOTAC sensitivity estimates have relatively high uncertainty, visible in the wide confidence intervals, probably due to the limited number of studies available for the analysis and their evaluation primarily in low transmission settings, where the number of positive individuals is very limited. The detection or failure of detection of a single individual therefore might have a large impact on the sensitivity estimate.
In remote areas where microscopy is often unavailable, studies can also use FEC, which allows the fixation of stool samples for later examination (WHO, 1994); several authors have also suggested the use of the McMaster technique as it is easier to standardise than Kato-Katz (Levecke et al., 2011;Albonico et al., 2012). Overall, the observed relative performances of these diagnostic tests when compared with the Kato-Katz method are consistent with those presented in the literature: the performance of Kato-Katz and McMaster methods were comparable, although this did vary by setting (Levecke et al., 2011;Albonico et al., 2013). Similarly, even though FEC had predominantly lower sensitivity than Kato-Katz in included studies, the reported relative performance varies in the literature (Glinz et al., 2010;Speich et al., 2013). The sensitivity of direct microscopy was consistently lower than the Kato-Katz method. Other available methods which were not included in our meta-analysis due to limited data availability, such as the midi-Parasep, do not show any improved test performance in their previous evaluations (Funk et al., 2013).
Although we present an improved approach for evaluating diagnostic test performances, accounting for the absence of a perfect gold standard by estimating the true unmeasured infection status and allowing for conditional dependency between the test outcomes, our analysis is subject to several limitations. The results presented here are limited by the low availability of comparable data for each diagnostic test, especially when performing the analysis stratified by intensity group. Direct microscopy was primarily evaluated in low intensity settings, which could have led to the lower observed sensitivity estimates, whereas the Kato-Katz method was evaluated in a full range of settings. The cut-off value to define high and low intensity groups of study populations was chosen based on the data included in the meta-analysis, but does not necessarily represent two main types of transmission settings. Nevertheless, the groupings demonstrate the substantial differences in test performance across varying infection intensities. As the investigated range of transmission settings was limited, further diagnostic test evaluations in specified transmission settings will be needed to provide concrete test performance estimates for each of the settings. To take into account the conditional dependency between compared diagnostic tests, we used a fixed effects model, assuming that conditional dependency is the same for all study settings. Different approaches allowing for varying correlations by using random effects to model sensitivities and specificities as a function of a latent subject-specific random variable could be explored further (Dendukuri and Joseph, 2001). Moreover, our findings might be biased towards results from studies comparing multiple diagnostic tests at the same time, as these are underpinned by a larger amount of data. Assumptions had to ensure identifiability of the model by limiting the number of parameters to be estimated. We focussed our analysis on the sensitivity of diagnostic tests, assuming that specificity of various methods do not differ largely, and therefore included the specificity of all single sample diagnostic tests as one fixed parameter. This assumption can be questioned, as for example Kato-Katz slides are more difficult to read than FLOTAC slides due to debris (Glinz et al., 2010); however, it is still an improvement on the assumption of 100% test specificity for all diagnostic tests as applied in previous publications (Booth et al., 2003;Knopp et al., 2011;Levecke et al., 2011). Using uninformative priors instead of fixed terms did not improve model fit and led to slightly wider BCIs. Importantly, the current model assumes that sensitivities are identical within all populations, which is not fulfilled if sensitivity varies by study setting (Toft et al., 2005). Indeed, the stratified analysis showed that sensitivity varied by infection intensity; however, there were not sufficient data to obtain good estimates for all tests in various transmission settings. Additionally, sensitivity in a specific study setting might be affected by other factors including stool consistency and diet, standardisation and adherence to protocols, equipment quality and human error (Bogoch et al., 2006;Bergquist et al., 2009;Levecke et al., 2011). To overcome the limited comparability of evaluations from different studies, purposeful evaluations of test sensitivity over a continuous range of infection intensities in comparable populations, for example before and after treatment rounds, are clearly necessary to better refine sensitivity estimates, and could be used to identify intensity categories within which sensitivity remains comparable. Results could then be transformed into recommendations for the use of diagnostic tests for different stages of disease control programmes.
The performance of a diagnostic tool should not only be measured in terms of sensitivity, but also needs to consider the ability of the test to quantify faecal egg counts. Current infection and treatment effect indicators are based on the Kato-Katz method, and the question arises whether the increasing use of other methods will constitute a problem for standardised recommendations (WHO, 2002). The comparison of average egg counts obtained by Kato-Katz and FLOTAC methods shows a broad agreement with previous studies with generally higher Kato-Katz egg counts (Knopp et al., 2009b(Knopp et al., , 2011Albonico et al., 2013). The quantitative performance of the McMaster technique, however, varied in comparison to the Kato-Katz method as higher McMaster average egg counts were observed in several studies, especially for T. trichiura and hookworms (Levecke et al., 2011;Albonico et al., 2012Albonico et al., , 2013. The current analysis has focussed on copro-microscopic diagnostic tests, which are based on examination of stool samples. There is current interest in developing more sensitive assays that allow a high sample throughput for screening of large populations using other biological samples and the simultaneous detection of several parasite species in co-endemic settings (Bergquist et al., 2009;Knopp et al., 2014). Recently, assays based on PCR have been developed for the detection of STH (Verweij et al., 2007;Schar et al., 2013;Knopp et al., 2014); however, we did not include this method in our meta-analysis due to limited data availability from field settings. Nonetheless, a recent study showed that the sensitivity of PCR methods was comparable with the Kato-Katz method, especially in low endemicity settings .
In conclusion, we provide a first known meta-analysis of the sensitivity and quantitative performance of STH diagnostic methods most widely used in resource-limited settings. Our results show that the FLOTAC method had the highest sensitivity both overall and in low intensity settings; however this technique low high low low high high Fig. 4. Sensitivity and 95% Bayesian credible intervals of a 1-slide 1-sample Kato-Katz, FLOTAC and McMaster test by intensity of infection and helminth species. The figure presents the results for those methods with sensitivity estimates in both intensity groups and for all three soil-transmitted helminth species. The Bayesian latent class analysis was performed, stratified by intensity of infection group, where high intensity was defined as P2500 eggs per gram of faeces (epg), P400 epg, and P165 epg average infection intensity for Ascaris lumbricoides, Trichuris trichiura and hookworm, respectively.

Table 4
Comparison of arithmetic mean egg counts (eggs per gram of faeces, epg) obtained by various techniques and Kato-Katz (various protocols). The statistical significance of the difference between egg counts was assessed using a Wilcoxon matched pairs signed-ranks test.

Method
n Mean epg (range) Mean Kato-Katz epg (range) n lower (%) a n higher (%) a Difference P value requires a centrifuge and has relatively low throughput. Our results further show that the sensitivities of the Kato-Katz and Mini-FLOTAC techniques were comparable and in high intensity settings both techniques provide a practical and reliable diagnostic method. A particular advantage of the Kato-Katz method is the ability to simultaneously detect STH and schistosome species at low cost; whereas the Mini-FLOTAC method has the advantage that it can be used on preserved samples. As control programmes reduce the intensity of infection, there is a need for diagnostic methods which are more sensitive than these currently used. In evaluating the performance of new diagnostic methods we recommend a standardised evaluation in multiple transmission settings, using the robust statistical methods presented here, as well as a consideration of the cost-effectiveness of alternative methods (Assefa et al., 2014). tests. The graphs are presented separately for the three soil-transmitted helminth species, Ascaris lumbricoides (A, D), Trichuris trichiura (B, E) and hookworm (C, F). Arithmetic mean egg counts were log-transformed (natural logarithm) for presentation purposes. Due to the limited data analysed, we refrained from fitting a regression line and presenting linear regression coefficients.