Accuracy of the interferon-gamma release assay for the diagnosis of tuberculous pleurisy: an updated meta-analysis

Background and Objectives. The best method for diagnosing tuberculous pleurisy (TP) remains controversial. Since a growing number of publications focus on the interferon-gamma release assay (IGRA), we meta-analyzed the available evidence on the overall diagnostic performance of IGRA applied to pleural fluid and peripheral blood. Materials and Methods. PubMed and Embase were searched for relevant English-language papers published up to October 31, 2014. Statistical analyses were performed using Stata and Meta-DiSc. Pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), negative predictive value (NPV) and diagnostic odds ratio (DOR) were calculated. Summary receiver operating characteristic curves and area under the curve (AUC) were used to summarize the overall diagnostic performance. Results. Fifteen publications met our inclusion criteria and were included in the meta-analysis. The following pooled estimates for diagnostic parameters of pleural IGRA were obtained: sensitivity, 0.82 (95% CI [0.79–0.85]); specificity, 0.87 (95% CI [0.84–0.90]); PLR, 4.94 (95% CI [2.60–9.39]); NLR, 0.22 (95% CI [0.13–0.38]); PPV, 0.91 (95% CI [0.85–0.96]); NPV, 0.79 (95% CI [0.71–0.85]); DOR, 28.37 (95% CI [10.53–76.40]); and AUC, 0.91. The corresponding estimates for blood IGRA were as follows: sensitivity, 0.80 (95% CI [0.76–0.83]); specificity, 0.70 (95% CI [0.65–0.75]); PLR, 2.48 (95% CI [1.95–3.17]); NLR, 0.30 (95% CI [0.24–0.37]); PPV, 0.79 (95% CI [0.60–0.87]); NPV, 0.75 (95% CI [0.62–0.83]); DOR, 9.96 (95% CI [6.02–16.48]); and AUC, 0.89. Conclusions. This meta-analysis suggests that pleural IGRA has potential as a complementary method for diagnosing TP; however, its cost, long turnaround time, and sub-optimal performance make it unsuitable as a stand-alone diagnostic tool. Better tests for the diagnosis of TP are required.


INTRODUCTION
Tuberculous pleurisy (TP) is the most common form of extrapulmonary tuberculosis, accounting for 23% of all tuberculosis cases and 30% of cases of disease-causing pleural effusion (PE) (Vidal et al., 1986; Corbett et al., 2003; Valdés et al., 2003), which involves exudate containing primarily lymphocytes. Direct diagnosis of TP would be the best way to avoid misdiagnosis and the resulting inappropriate treatment (Lin et al., 2009), but this remains a challenge. Definitive diagnosis of TP depends on isolating Mycobacterium tuberculosis from PE or pleural tissue. Conventional methods, such as PE culture, pleural biopsy and Ziehl-Neelsen staining, show poor sensitivity for detecting the limited amounts of bacteria in the PE of affected patients (Escudero et al., 1990; Valdés et al., 1998). Culturing PE is also time-consuming. Pleural biopsy is invasive and technically difficult, so its effectiveness depends on technical skill (Pérez & Jiménez, 2000). It may not be suitable for elderly patients and children, individuals with underlying co-morbidities, and those at high risk of bleeding. The tuberculin skin test is cross-reactive for Bacille Calmette Guérin (BCG) and many non-tuberculous mycobacteria, increasing the risk of misdiagnosis (Lawrence, 2000; Stead & To, 1987; Liebeschuetz et al., 2004). The limitations of these conventional approaches to diagnosing TP highlight the need to identify new diagnostic tools.
The PE of patients with TP has been shown to contain significantly higher levels of T lymphocytes and interferon (IFN)-γ than peripheral blood (North & Jung, 2004; Sharma et al., 2002), and the PE of these patients contains higher IFN-γ levels than the PE of uninfected individuals (Yamada et al., 2001). In fact, T lymphocytes that have previously been exposed to M. tuberculosis release more IFN-γ on repeat exposure. This inspired the development of a T-cell IFN-γ release assay (IGRA), which is now licensed as a blood test for diagnosis of latent tuberculosis (Lalvani, 2007; Pai, Zwerling & Menzies, 2008).
Whether IGRA can be used to diagnose TP is controversial. A previous meta-analysis concluded that it showed poor sensitivity and specificity for this purpose (Zhou et al., 2011). Nevertheless, a growing number of studies have focused on extending the use of IGRA to the diagnosis of TP (Hooper, Lee & Maskell, 2009). Therefore, the present meta-analysis was undertaken to comprehensively assess the overall accuracy of IGRA for the diagnosis of TP.

Search strategy and study selection
PubMed and Embase were searched for articles published before October 31, 2014. The following search terms were used: "pleural effusion/pleural fluid, pleurisy/pleuritis AND elispot, OR quantiferon, OR interferon-gamma assays, OR interferon-gamma release assays, OR t cell assays." The related-articles function was also used, and reference lists in relevant articles were searched manually.
Studies were included in our meta-analysis if they (1) used IGRA testing for the diagnosis of tuberculous pleurisy, (2) reported sufficient data to calculate the numbers of true positive, false positive, false negative, and true negative IGRA results for the diagnosis of TP, and (3) constituted original research published in English. Studies available only as abstracts were excluded.

Data extraction and quality assessment
Two reviewers independently checked all potentially relevant studies, and disagreements were resolved by consensus. Data were collected from each study, including first author, year of publication, country, participant characteristics, IGRA method, samples, cut-off values, sensitivity, specificity and methodological quality. For each study we constructed a 2 × 2 contingency table in which we recorded the numbers of true positive, false positive, false negative, and true negative results.
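As a minimal sketch of how the per-study accuracy measures follow from one 2 × 2 table, the Python snippet below computes them from the four cell counts. The class and function names, the example counts, and the 0.5 continuity correction for empty cells are illustrative assumptions; they are not taken from the included studies or from the software used in this meta-analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2 x 2 table (IGRA result vs. reference standard)."""
    tp: int  # IGRA positive, TP present
    fp: int  # IGRA positive, TP absent
    fn: int  # IGRA negative, TP present
    tn: int  # IGRA negative, TP absent


def accuracy_measures(t: TwoByTwo, correction: float = 0.5) -> dict:
    """Per-study accuracy measures.

    Assumption: if any cell is zero, `correction` is added to every cell
    so that the ratios remain defined.
    """
    cells = (t.tp, t.fp, t.fn, t.tn)
    if 0 in cells:
        tp, fp, fn, tn = (c + correction for c in cells)
    else:
        tp, fp, fn, tn = cells
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PLR": sens / (1 - spec),
        "NLR": (1 - sens) / spec,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "DOR": (tp * tn) / (fp * fn),
    }


# Made-up counts for illustration only.
example = TwoByTwo(tp=41, fp=6, fn=9, tn=44)
for name, value in accuracy_measures(example).items():
    print(f"{name}: {value:.2f}")
```

Note that PPV and NPV computed this way depend on the proportion of TP cases in the study sample, whereas sensitivity, specificity and the likelihood ratios do not.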
The methodological quality of the studies was assessed using the 14-item Quality Assessment for Studies of Diagnostic Accuracy (QUADAS) guidelines (Whiting et al., 2003). Each criterion was scored 1 if it was fulfilled, 0 if it was unclear, and −1 if it was not fulfilled, giving a maximum possible score of 14 points.
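The scoring rule can be written down in a few lines. The sketch below assumes each of the 14 QUADAS items has been rated as "yes", "unclear" or "no" (these labels and the function name are illustrative), and applies the threshold of 11 points that this meta-analysis uses to define higher-quality studies.

```python
from typing import Sequence

QUADAS_ITEMS = 14
POINTS = {"yes": 1, "unclear": 0, "no": -1}
HIGH_QUALITY_THRESHOLD = 11  # threshold used in this meta-analysis


def quadas_score(ratings: Sequence[str]) -> int:
    """Sum the item scores: +1 fulfilled, 0 unclear, -1 not fulfilled."""
    if len(ratings) != QUADAS_ITEMS:
        raise ValueError(f"expected {QUADAS_ITEMS} ratings, got {len(ratings)}")
    return sum(POINTS[r] for r in ratings)


# Hypothetical study: 12 items fulfilled, 1 unclear, 1 not fulfilled.
ratings = ["yes"] * 12 + ["unclear", "no"]
score = quadas_score(ratings)
print(score, "higher quality" if score >= HIGH_QUALITY_THRESHOLD else "lower quality")
```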

Statistical analyses
Standard methods recommended for meta-analyses of diagnostic test evaluations (Devillé et al., 2002) were used. Stata 12.0 and Meta-DiSc 1.4 were used for statistical analysis. The following accuracy measures were calculated for each study: sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), negative predictive value (NPV) and diagnostic odds ratio (DOR). Summary receiver operating characteristic (SROC) curves and area under the curve (AUC) were also calculated (Moses, Shapiro & Littenberg, 1993; Irwig et al., 1995; Vamvakas, 1998). Heterogeneity across studies was detected using chi-square and Fisher's exact tests. We planned to use a random-effects model to synthesize data if heterogeneity was present (P < 0.05 and I² > 50%) (Shen et al., 2012). Based on this rule, pooled average sensitivity, specificity and other diagnostic parameters of pleural and blood IGRA were calculated using, respectively, a random-effects model and a fixed-effects model (Irwig et al., 1995; Vamvakas, 1998). Potential presence of publication bias was tested using funnel plots and Egger's test. All statistical tests were two-sided, and the threshold of significance was set at P < 0.05.
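To illustrate the heterogeneity rule, the following sketch pools log diagnostic odds ratios by inverse-variance weighting and reports Cochran's Q, I² and a DerSimonian-Laird between-study variance. It is an assumption-laden illustration rather than a reproduction of the Stata or Meta-DiSc routines: the study counts are made up, the 0.5 continuity correction is assumed, and the model choice below uses only the I² > 50% part of the rule because the chi-square P value is not computed here.

```python
import math


def log_dor(tp, fp, fn, tn, correction=0.5):
    """Return (log DOR, variance of log DOR) for one study's 2 x 2 counts."""
    cells = (tp, fp, fn, tn)
    if 0 in cells:  # assumed continuity correction for empty cells
        tp, fp, fn, tn = (c + correction for c in cells)
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pool(effects, variances):
    """Inverse-variance pooling with Cochran's Q, I^2 and DerSimonian-Laird tau^2."""
    weights = [1 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return {"fixed": fixed, "random": random, "Q": q, "I2": i2, "tau2": tau2}


# Made-up counts (tp, fp, fn, tn) for three hypothetical studies.
studies = [(41, 6, 9, 44), (30, 2, 10, 38), (55, 15, 20, 60)]
effects, variances = zip(*(log_dor(*s) for s in studies))
result = pool(effects, variances)
model = "random" if result["I2"] > 50 else "fixed"
print(f"I2 = {result['I2']:.1f}%, pooled DOR ({model}) = {math.exp(result[model]):.2f}")
```

When the between-study variance is estimated as zero, the random-effects weights reduce to the fixed-effects weights and the two pooled estimates coincide.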

Diagnostic accuracy
In the 17 analyses of pleural IGRA, diagnostic sensitivity ranged from 0.44 to 1.0 (Fig. 2).
This meta-analysis involved two different types of commercially available assays: ELISPOT and ELISA. The ELISPOT assay, such as the T-SPOT-TB, involves sensitizing T cells to specific M. tuberculosis antigens, such as the early secreted antigenic target 6 (ESAT-6) and culture filtrate protein 10 (CFP-10), and then measuring the IFN-γ subsequently released. ELISA, such as QuantiFERON-TB Gold (QFT-G) or the third-generation 'In-Tube' (QFT-IT), measures the release of IFN-γ into whole blood or PE after stimulation by ESAT-6 and CFP-10. Comparison of overall diagnostic values for ELISPOT and ELISA did not allow a conclusion about which assay type was superior (Table 2).
We assessed the overall diagnostic performance by calculating SROC curves and the corresponding AUC. The SROC curve for pleural IGRA was not positioned near the desirable upper left corner; the point where sensitivity equals specificity (Q) was 0.84, and the AUC was 0.91 (Fig. 3A). The corresponding SROC curve for blood IGRA showed a Q of 0.77 and an AUC of 0.84 (Fig. 3B). Although neither the pleural nor the blood AUC was entirely satisfactory, this summary analysis suggests that pleural IGRA shows much better diagnostic performance than blood IGRA.
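The relationship between Q and AUC can be illustrated under one simplifying assumption: a symmetric SROC curve, that is, a DOR that stays constant across thresholds. This corresponds to a Moses-Littenberg model with zero slope and is not necessarily the model fitted by the software used here. In that case Q = √DOR / (1 + √DOR) and AUC = DOR / (DOR − 1)² × [(DOR − 1) − ln DOR]. The function names below are illustrative.

```python
import math


def q_star(dor: float) -> float:
    """Point on a symmetric SROC curve where sensitivity equals specificity."""
    root = math.sqrt(dor)
    return root / (1 + root)


def dor_from_q_star(q: float) -> float:
    """Invert q_star: the constant DOR implied by a given Q value."""
    return (q / (1 - q)) ** 2


def symmetric_sroc_auc(dor: float) -> float:
    """Area under a symmetric SROC curve with constant DOR."""
    if dor == 1:
        return 0.5  # limit of the formula for an uninformative test
    return dor / (dor - 1) ** 2 * ((dor - 1) - math.log(dor))


for label, q in (("pleural", 0.84), ("blood", 0.77)):
    dor = dor_from_q_star(q)
    print(f"{label}: Q = {q}, implied DOR = {dor:.1f}, AUC = {symmetric_sroc_auc(dor):.2f}")
```

Under this assumption, a Q of 0.84 corresponds to an AUC of about 0.91 and a Q of 0.77 to an AUC of about 0.84, in line with the values reported for Fig. 3.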

Multiple regression analysis and publication bias
The quality of the 17 studies in this meta-analysis varied considerably, with only five studies earning high QUADAS scores (≥ 11; Table 1). These scores were used in a meta-regression analysis to assess the effect of study quality on the relative DOR (RDOR) of IGRA for the diagnosis of TP (Table 3). Higher- and lower-quality studies did not differ significantly in RDOR for either pleural or blood IGRA (Table 3). Seven studies were performed in areas with a low tuberculosis incidence (Wilkinson et al., 2005; Ariga et al., 2007; Losi et al., 2007; Keng et al., 2013; Ates et al., 2011; Eldin et al., 2012; Kang et al., 2012) and 10 studies (eight publications) were performed in areas with a high tuberculosis incidence (Baba et al., 2008; Chegou et al., 2008; Dheda et al., 2009; Lee et al., 2009; Liu et al., 2013; Liao et al., 2014; Chung et al., 2011; Gao et al., 2012). Diagnostic accuracy of pleural IGRA depended significantly only on assay method (ELISPOT vs ELISA, P = 0.023), but not on study quality or tuberculosis incidence. Diagnostic accuracy of blood IGRA depended significantly on both assay method and tuberculosis incidence. Results of the RDOR analysis are shown in Table 3. Publication bias was analyzed using funnel plots and Egger's test. Since the funnel plots showed asymmetry (Fig. 4), Egger's tests were performed, which confirmed significant risk of publication bias in the meta-analyses for both blood IGRA and pleural IGRA (both P < 0.001).

Figure 3 Summary receiver operating characteristic (SROC) curves for T-cell interferon-gamma assays in pleural fluid (A) and peripheral blood (B).
Solid circles represent each study included in the meta-analysis, with circle size representing the sample size in each study. The regression SROC curves summarize the overall diagnostic accuracy.
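Egger's test for funnel-plot asymmetry, as used in this section, regresses each study's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error); an intercept far from zero suggests small-study effects. The sketch below is a plain least-squares version with made-up inputs and illustrative names. It returns the intercept, its standard error and the t statistic, and leaves the P value (from a t distribution with n − 2 degrees of freedom) to a statistics package.

```python
import math


def egger_intercept(effects, std_errors):
    """Egger regression: (effect / SE) on (1 / SE).

    Returns (intercept, standard error of intercept, t statistic).
    """
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are needed")
    x = [1 / se for se in std_errors]                   # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_var = residual_ss / (n - 2)
    se_intercept = math.sqrt(residual_var * (1 / n + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.nan
    return intercept, se_intercept, t_stat


# Made-up log DORs and standard errors for five hypothetical studies.
log_dors = [3.4, 2.9, 2.1, 3.8, 2.5]
ses = [0.55, 0.40, 0.25, 0.80, 0.35]
intercept, se, t = egger_intercept(log_dors, ses)
print(f"intercept = {intercept:.2f}, SE = {se:.2f}, t = {t:.2f}")
```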

DISCUSSION
IGRA has an advantage over conventional methods of diagnosing M. tuberculosis infection, because it is based on specific antigens, such as ESAT-6 and CFP-10, that are absent from BCG and most environmental mycobacteria. Whether this assay is suitable for diagnosing TP is controversial. Zhou et al. (2011) conducted a meta-analysis of the diagnostic role of IGRA for TP, but their inclusion criteria admitted only seven publications. Several years have passed and new studies have been published, so we conducted this updated meta-analysis. Our meta-analysis summarizes the available evidence on this question in an effort to provide guidance for TP diagnosis. Our results showed that the pooled sensitivities of pleural and blood IGRA were 0.82 and 0.80, respectively, and the corresponding specificities were 0.87 and 0.70. These findings, coupled with the relatively low AUC values representing overall performance, suggest that IGRA has some usefulness for diagnosing TP, but that it should be interpreted only in conjunction with conventional tests or clinical signs. Positive results from IGRA may be helpful for confirming TP, but the relatively low sensitivity makes it vulnerable to generating false negatives. Significant heterogeneity was found in sensitivity, specificity, PLR, NLR and DOR for pleural IGRA, and in specificity, PLR and DOR for blood IGRA. Five studies had a higher QUADAS score (≥ 11). There was no significant difference between higher-quality studies and lower-quality ones.
We assessed pleural and blood IGRAs using SROC curves and DOR, both of which combine sensitivity and specificity. SROC curves, which are unlikely to be affected by a diagnostic threshold effect (Jones & Athanasiou, 2005), gave Q values (the point at which sensitivity equals specificity) of 0.84 for pleural IGRA and 0.77 for blood IGRA, while the corresponding AUCs were 0.91 and 0.84, suggesting less than fully satisfactory overall accuracy. The DOR of a test is the ratio of the odds of obtaining a positive test result in the disease group to the odds of obtaining a positive test result in the no-disease group (Zhou et al., 2011). When DOR > 1, higher values indicate better discriminatory test performance. We calculated a pooled DOR of 28.37 for pleural IGRA and of 9.96 for blood IGRA, suggesting that IGRA, and particularly pleural IGRA, may be helpful for diagnosing TP. We found higher pooled sensitivity and specificity for pleural IGRA than a previous meta-analysis (Zhou et al., 2011), which likely reflects our inclusion of more articles. Similarly, our pooled DOR for pleural IGRA was higher than the one reported in that meta-analysis (19.0). We conclude that pleural IGRA has better prospects than blood IGRA for widespread clinical implementation. This may be due to compartmentalization of antigen-specific effector T cells, which can be recruited and concentrated at the site of infection, such as the pleural cavity. ESAT-6-specific, IFN-γ-secreting T cells are concentrated 15-fold in PE relative to peripheral blood in patients with TP (Wilkinson et al., 2005).
Potentially more clinically meaningful than DOR and SROC, PLR and NLR are often used as measures of diagnostic accuracy. PLR indicates how much the odds of a condition are increased by a positive test, while NLR indicates how much they are decreased by a negative test. A larger PLR means greater diagnostic accuracy, whereas a smaller NLR is better. The pooled PLR of 4.94 for pleural IGRA suggests that patients with TP have a nearly five-fold greater chance of a positive test result than patients without TP. Even though this PLR is larger than that reported in a previous meta-analysis (Zhou et al., 2011), it is still too small for clinical purposes. At the same time, we calculated a pooled NLR of 0.22 for pleural IGRA, indicating that a negative result is only about one-fifth as likely in a patient with TP as in a patient without TP; this ratio is not low enough to reliably rule out TP. The corresponding PLR and NLR for blood IGRA were even less satisfactory.
The pooled PPV for pleural IGRA was 0.91, indicating that 9% of positive results may be false positives. The NPV of pleural IGRA was 0.79, suggesting that 21% of negative results may be false negatives. The corresponding values for blood IGRA were less satisfactory. Although these PPV and NPV values are higher than those reported in a recent meta-analysis (Zhou et al., 2011), they are still not as high as necessary for reliable clinical performance.
Our results are consistent with the observation that pleural and blood IGRAs give a relatively high rate of false positive test results because IGRA cannot distinguish active from latent tuberculosis (Hooper, Lee & Maskell, 2009; Dheda et al., 2009). In the present meta-analysis, we found pleural IGRA to show a lower rate of false positive results than false negative results. Previous studies showed IGRA, especially T-SPOT-TB, to be helpful in the diagnosis of latent tuberculosis (Lalvani, 2007; Pai, Zwerling & Menzies, 2008), while the overall accuracy of the technique for diagnosing TP was lower than for diagnosing latent tuberculosis (Diel et al., 2011) but higher than for diagnosing active tuberculosis (Sester et al., 2011). This dependence of diagnostic accuracy on tuberculosis form may reflect the fact that patients with latent M. tuberculosis infection have superior immunologic function, such that a smaller pathogen load can elicit an effective response to tuberculosis antigen. Another explanation is significant heterogeneity among studies. A third possible explanation is transient exposure to non-replicating persistent M. tuberculosis in the pleural space of patients without PE.
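Because likelihood ratios act on odds rather than on probabilities, their clinical meaning depends on the pre-test probability of TP. The sketch below applies Bayes' rule in odds form to the pooled pleural PLR (4.94) and NLR (0.22); the pre-test probability of 30% is an arbitrary value chosen for illustration, and the function name is hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PLEURAL_PLR = 4.94  # pooled estimate from this meta-analysis
PLEURAL_NLR = 0.22  # pooled estimate from this meta-analysis
pre_test = 0.30     # assumed for illustration only

print(f"after a positive pleural IGRA: {post_test_probability(pre_test, PLEURAL_PLR):.2f}")
print(f"after a negative pleural IGRA: {post_test_probability(pre_test, PLEURAL_NLR):.2f}")
```

With this assumed pre-test probability, a positive result raises the probability of TP to roughly 68% and a negative result lowers it to roughly 9%, which illustrates why neither result alone settles the diagnosis.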
Two types of IGRAs are commercially available: the ELISA-based QFT-G or QFT-IT, and the ELISPOT-based T-SPOT-TB. Although both ELISPOT and ELISA measure IFN-γ release after T cell stimulation by ESAT-6 and CFP-10, ELISPOT has been reported to be more stable and sensitive (Liebeschuetz et al., 2004). Indeed, we found the sensitivity, PLR, DOR and AUC to be higher for pleural ELISPOT than for pleural ELISA (Table 2). On the other hand, the specificity and NLR were lower for pleural ELISPOT than for pleural ELISA. In the blood-based assay, sensitivity, specificity, PLR, DOR and AUC were higher for ELISPOT than for ELISA, but NLR was lower for ELISPOT than for ELISA. Therefore, we cannot determine whether ELISPOT or ELISA shows greater overall accuracy for diagnosing TP. This requires larger studies that compare the two types of IGRAs in parallel.
The reliability of meta-analysis in general is limited by the methodological quality and heterogeneity of included studies (Petitti, 2001). Quality was scored for every study on the basis of its title, introduction, methods, results and discussion. Each criterion was scored 1 if it was fulfilled, 0 if it was unclear, and −1 if it was not fulfilled. QUADAS thus expresses study quality as a numerical score, which makes quality easy to assess and to compare across studies. Overall, the quality of study design and of the reporting of diagnostic accuracy was reasonably good in most studies, and five studies had a higher QUADAS score (≥ 11). IGRA performance was similar in higher-quality studies (QUADAS ≥ 11) and lower-quality ones. Pleural IGRA studies showed significant heterogeneity in meta-analyses of sensitivity, specificity, PLR, NLR and DOR. Whether the study used ELISPOT or ELISA significantly affected the diagnostic accuracy of both pleural and blood IGRAs. We also found that whether a study was performed in an area of low or high tuberculosis incidence significantly affected the accuracy of blood IGRA, but not of pleural IGRA. A previous study concluded that IGRA was more sensitive and specific than conventional methods in areas of high tuberculosis prevalence (Gao et al., 2012). This contrasts with studies in low-incidence areas showing that T cells in pleural fluid respond to stimulation with ESAT-6 and CFP-10 significantly more than do T cells in peripheral blood (Ariga et al., 2007; Losi et al., 2007), perhaps reflecting the fact that most patients in such areas are immunocompetent. Our observation of a differential effect of study area on the two types of IGRAs may reflect country biases in the studies examining each type of IGRA. Future studies should address this question in detail.
Theoretically, tuberculosis antigen-specific responses like the one measured by IGRA should allow clinicians to distinguish TP from alternative diagnoses and provide greater discriminatory value than non-specific inflammatory biomarkers such as unstimulated IFN-γ or adenosine deaminase (ADA). However, comparing our findings with those of previous meta-analyses (Zhou et al., 2011; Liang et al., 2008) suggests that IGRA has lower overall accuracy than either IFN-γ or ADA for diagnosing TP. In fact, one study found that combining ADA and IFN-γ to diagnose TP led to 100% specificity (Keng et al., 2013). The authors of that study were unsure why IFN-γ and ADA perform better than IGRA. Future studies should investigate this question.
This meta-analysis has several limitations. First, we included only studies indexed in PubMed and Embase, and we excluded abstracts, letters to the editor and articles written in languages other than English. This may have led to publication bias, which is indeed suggested by our funnel plots and Egger's test. Second, only five of the 15 publications diagnosed TP based on bacteriological or histological assessment, or on the gold standard combination of both (Wilkinson et al., 2005; Ariga et al., 2007; Eldin et al., 2012; Liu et al., 2013; Gao et al., 2012). The remaining 10 publications used a mixture of bacteriological, histological or clinical assessment (Losi et al., 2007; Baba et al., 2008; Chegou et al., 2008; Dheda et al., 2009; Lee et al., 2009; Keng et al., 2013; Ates et al., 2011; Kang et al., 2012; Liao et al., 2014; Chung et al., 2011). Third, the results of this meta-analysis may be less applicable to severely immunocompromised subjects, since IGRA depends on host immunity and many studies excluded indeterminate results from analysis. This may have led to systematic error in some studies.

CONCLUSION
Our meta-analysis suggests that pleural IGRA shows much better diagnostic performance than blood IGRA. Pleural IGRA has potential as a complementary method for diagnosing TP; however, its sub-optimal performance, cost and long turnaround time make it unsuitable as a stand-alone diagnostic tool. Better tests for the diagnosis of TP are required.