Blood transcriptional biomarkers for active pulmonary tuberculosis in a high-burden setting: a prospective, observational, diagnostic accuracy study

Summary
Background Blood transcriptional signatures are candidates for non-sputum triage or confirmatory tests of tuberculosis. Prospective head-to-head comparisons of their diagnostic accuracy in real-world settings are necessary to assess their clinical use. We aimed to compare the diagnostic accuracy of candidate transcriptional signatures identified by systematic review, in a setting with a high burden of tuberculosis and HIV.
Methods We did a prospective observational study nested within a diagnostic accuracy study of sputum Xpert MTB/RIF (Xpert) and Xpert MTB/RIF Ultra (Ultra) tests for pulmonary tuberculosis. We recruited consecutive symptomatic adults aged 18 years or older self-presenting to a tuberculosis clinic in Cape Town, South Africa. Participants provided blood for RNA sequencing, and sputum samples for liquid culture and molecular testing using Xpert and Ultra. We assessed the diagnostic accuracy of candidate blood transcriptional signatures for active tuberculosis (including those intended to distinguish active tuberculosis from other diseases) identified by systematic review, compared with culture or Xpert MTB/RIF positivity as the standard reference. In our primary analysis, patients with tuberculosis were defined as those with either a positive liquid culture or Xpert result. Patients with missing blood RNA or sputum results were excluded. Our primary objective was to benchmark the diagnostic accuracy of candidate transcriptional signatures against the WHO target product profile (TPP) for a tuberculosis triage test.
Findings Between Feb 12, 2016, and July 18, 2017, we obtained paired sputum and RNA sequencing data from 181 participants, 54 (30%) of whom had confirmed pulmonary tuberculosis. Of 27 eligible signatures identified by systematic review, four achieved the highest diagnostic accuracy with similar area under the receiver operating characteristic curves (Sweeney3: 90·6% [95% CI 85·6–95·6]; Kaforou25: 86·9% [80·9–92·9]; Roe3: 86·9% [80·3–93·5]; and BATF2: 86·8% [80·6–93·1]), independent of age, sex, HIV status, previous tuberculosis, or sputum smear result. At test thresholds that gave 70% specificity (the minimum WHO TPP specificity for a triage test), these four signatures achieved sensitivities between 83·3% (95% CI 71·3–91·0) and 90·7% (80·1–96·0). No signature met the optimum criteria of 95% sensitivity and 80% specificity proposed by WHO for a triage test, or the minimum criteria (65% sensitivity and 98% specificity) for a confirmatory test, but all four correctly identified Ultra-positive, culture-negative patients.
Interpretation Selected blood transcriptional signatures met the minimum WHO benchmarks for a tuberculosis triage test but not for a confirmatory test. Further development of the signatures is warranted to investigate their possible effects on clinical and health economic outcomes as part of a triage strategy, or when used as an add-on confirmatory test in conjunction with the highly sensitive Ultra test for Mycobacterium tuberculosis DNA.
Funding Royal Society Newton Advanced Fellowship, Wellcome Trust, National Institute of Health Research, and UK Medical Research Council.


Introduction
Delays in diagnosis of active tuberculosis contribute to its high death toll and facilitate onward transmission of infection. 1 Current diagnostic tools include smear microscopy, microbiological culture, and molecular detection by Xpert MTB/RIF (Xpert) or Xpert MTB/RIF Ultra (Ultra). These all rely on obtaining sputum or other biological samples from the site of disease. Each approach has additional limitations, such as the poor sensitivity of microscopy, the time delay for culture, the high cost of molecular tests, and false-positive Ultra results arising from detection of non-viable Mycobacterium tuberculosis. WHO has specified an urgent need for a rapid, simple, and low-cost triage test that prioritises sensitivity to confidently rule out tuberculosis, or to identify patients who require further investigation. 2 A Delphi process partly informed by cost-effectiveness considerations concluded that such a test required a minimum of 90% sensitivity and 70% specificity. 2,3 As not all patients with tuberculosis produce sputum spontaneously, a non-sputum confirmatory test that prioritises specificity is also advocated. 2 Many host blood transcriptional signatures have been proposed to differentiate patients with pulmonary tuberculosis from healthy controls or patients with other infectious or respiratory diseases, 4 raising hopes for translation into near-patient assays. However, validation of these signatures is currently limited to evidence from case-control studies. 5 Such studies are prone to overestimate performance because of the spectrum effect arising from differences in disease prevalence and other unmeasured covariates in selected patient subgroups, and biased inclusion of cases at extremes of the distribution of phenotypes that might not be representative of the target population. 6 Independent validation in prospective, real-world populations is therefore crucial to assess true test performance; however, there are no comprehensive head-to-head comparisons in such settings for candidate blood transcriptional tuberculosis signatures.
WHO has endorsed use of Ultra to provide increased sensitivity compared with Xpert for PCR detection of M tuberculosis in sputum specimens. 7 However, Ultra returns more false-positive results (culture-negative) than Xpert, particularly within the semi-quantitative trace output category, which detects the lowest bacillary burden of M tuberculosis. 8 The large number of false-positives has been attributed to detection of DNA from non-culturable M tuberculosis as a result of past infection, which is more likely in high-burden settings. 9 The decreased specificity makes diagnostic interpretation of positive Ultra results challenging, and potentially undermines the value of its greater sensitivity. 7,8,10 Therefore, in addition to being applied as standalone tests, blood transcriptional biomarkers of tuberculosis could improve the specificity of Ultra by resolving results in which only traces of DNA are detected or those in patients with previous tuberculosis.
We undertook a prospective observational study to compare the diagnostic accuracy of candidate transcriptional signatures identified by systematic review. Our primary objective was to benchmark the performance of the signatures against the WHO target product profile (TPP) for a tuberculosis triage test. As secondary objectives, we sought to assess the performance of these signatures against WHO TPP criteria for a blood-based confirmatory tuberculosis test, and to explore their potential use as an add-on confirmatory test to clarify interpretation of positive Ultra results.

Research in context
Evidence before this study We did a systematic review, using comprehensive terms for "tuberculosis", "transcriptional", "signatures", and "blood", with no language or date restrictions. Many studies have been done with the aim of discovering whole-blood transcriptional signatures that discriminate individuals with tuberculosis from disease-free controls or from patients with other infectious or respiratory diseases. Several candidate signatures have thus been identified, raising hope of translation into near-patient assays. However, validation of these signatures has been limited, especially in settings where they are needed most and in sick patients undergoing routine investigation for tuberculosis. Only one previous study compared the diagnostic accuracy of candidate signatures in a head-to-head analysis, but key signatures were not included, and validation relied solely on existing case-control datasets. It has therefore been unclear which candidate signature works best for the diagnosis of tuberculosis, or if any signatures meet minimum or optimum benchmarks proposed by WHO in a real-world observational cohort. Addressing these research gaps is crucial to inform whether these biomarkers should be translated into scalable test platforms or considered for adoption by national programmes.

Added value of this study
To our knowledge, we provide the first comprehensive and systematic head-to-head comparison of candidate transcriptional signatures for identification of active tuberculosis in a prospective diagnostic accuracy study. Moreover, we used an unbiased consecutive sampling approach, in contrast to the case-control design of previous studies. Among 181 consecutive patients presenting for investigation of presumptive pulmonary tuberculosis in South Africa, four of 27 candidate transcriptional signatures performed equivalently to each other in discriminating individuals with tuberculosis from those without, irrespective of HIV status and other baseline characteristics. These signatures met or approximated to the minimum WHO target product profile for a triage test (of 90% sensitivity, 70% specificity). However, no signature met the optimum criteria (of 95% sensitivity, 80% specificity) for a tuberculosis triage test, or the minimum criteria for a confirmatory test (65% sensitivity, 98% specificity). The best-performing signatures all improved the specificity of the Xpert MTB/RIF Ultra microbiological molecular test for Mycobacterium tuberculosis DNA, in which the advantages of greater sensitivity have been undermined by a higher rate of false-positive results.

Implications of all the available evidence
Selected blood transcriptional biomarkers show promise as triage tests for patients being investigated for pulmonary tuberculosis in high-incidence settings, exemplified by our study site. The signatures did not achieve the minimum criteria needed for a confirmatory test and should not be used by themselves for this purpose. Nonetheless, they might improve diagnostic accuracy when used in conjunction with highly sensitive molecular tests for M tuberculosis DNA. These data support further development of assays for blood transcriptional biomarkers to enable interventional trials of their potential clinical and health-economic effects in the diagnostic pathway for tuberculosis.

Study design and participants
Our study was nested within a diagnostic accuracy study of sputum Xpert and Ultra tests for pulmonary tuberculosis. 10 Symptomatic adults (≥18 years) self-presenting for investigation of pulmonary tuberculosis were consecutively recruited in Cape Town, South Africa, from a tuberculosis clinic within a government primary health-care centre (Scottsdene). Patients were screened and investigated according to South African guidelines. 11 At recruitment, demographic and clinical metadata were recorded, including a modified tuberculosis symptom score (appendix 1, p 1). 12 This study was approved by the Stellenbosch University Faculty of Health Sciences Research Ethics Committee (N14/10/136). All participants provided written informed consent.

Specimen microbiology and definitions
Blood was collected in Tempus tubes, and patients provided two sputum samples. One was decontaminated by Mycoprep (BD, Johannesburg, South Africa) before double Ziehl-Neelsen smear microscopy and Mycobacteria Growth Indicator Tube 960 liquid culture (appendix 1, p1). The second sputum sample was used for Xpert testing. The next morning, patients provided a third sputum sample for Ultra testing. Sputum samples were either obtained via spontaneous expectoration or induced by nebulising with 5% sodium chloride for 7-10 min.
In our primary analysis, patients with tuberculosis were defined as those with either a positive liquid culture or a positive Xpert result, to overcome the limitation of a single culture reference. Patients with missing blood RNA or sputum results were excluded.

Blood RNA sequencing and data processing
Extraction and sequencing of blood mRNA was done as previously described. 13 RNAseq data were mapped to the reference transcriptome (Ensembl Human GRCh38 release 95) and processed as previously described, 14 focusing on protein-coding genes. Unless otherwise specified, log2-transformed transcripts per million values were used for analysis. To account for an observed batch effect that could not be explained by any biological or known technical variables (appendix 1, p 13), we tested two batch correction techniques, using the ComBat and sva functions from the sva package in R, respectively (appendix 1, pp 1-2). 15 Since surrogate variable analysis (SVA) preserved specified outcomes of interest (tuberculosis status, HIV status, age, sex, and ethnicity) while correcting any other, unwanted variation, and because samples clustered more tightly after batch correction with SVA (appendix 1, p 14), we used SVA-adjusted data for the primary analyses.

Systematic review of blood transcriptional signatures for tuberculosis
We previously did a systematic review 14 to identify candidate concise whole-blood transcriptional signatures for incipient or active tuberculosis published before April 15, 2019, including only signatures that were discovered by comparison with asymptomatic controls. In the present study, we extended the inclusion criteria from the previous review to also capture signatures intended to distinguish active tuberculosis from other diseases (appendix 1, p 2). Additionally, following initial peer review, we included two further signatures that met the inclusion criteria but were published after the date limit of our search. 16,17 All screened articles are listed in appendix 2, with reviewed full text articles matched against inclusion criteria. Signature scores were calculated using the original authors' methods (appendix 1, pp 2-4). Some signatures included genes whose annotations have since been withdrawn, or non-coding RNA and putative pseudogenes that were not present in our protein-coding RNAseq dataset (appendix 3). Where changes to the original model were made, or where a model had to be recreated, we validated the reconstructed model by comparing the area under the receiver operating characteristics curves (AUROCs) in the original dataset where possible (appendix 1, p 7).

Statistical analysis
Our sample size was primarily determined by the number of participants in the parent study 10 with paired blood RNA and sputum samples. To assess our statistical power, we used published models for estimates of sample size calculations in diagnostic tests (appendix 1, p 12). 18,19 The prevalence of tuberculosis in patients of the parent study was 30% (72/239). 10 At this prevalence, a total sample size of more than 135 participants was required to establish whether the blood transcriptional biomarkers could achieve the minimum thresholds of the WHO TPP for a triage test (90% sensitivity and 70% specificity) with a 10% margin of error. Assuming the best-performing test achieved an AUROC of at least 0·9 (as is generally the case in the original reports of each signature), a total sample size of more than 130 participants was required for 80% power to identify a 0·1 difference in AUROCs between paired tests.
This study is reported in accordance with the Standards for Reporting of Diagnostic Accuracy Studies guidelines. 20   significant. Cohort characteristics were compared with χ² or Mann-Whitney tests. CIs for the differences between proportions were calculated using the Newcombe-Wilson method with continuity correction. 21 The pROC package in R was used to construct receiver operating characteristic (ROC) curves to discriminate between patients with and without tuberculosis. CIs for ROC curves' sensitivities were plotted at 1% specificity intervals, using the ci.se function of the pROC package. We compared AUROCs for each candidate signature in a pairwise approach with the DeLong method, 22 using the signature with highest AUROC as reference.
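The AUROC used throughout has a simple interpretation that can be sketched outside R. The Python snippet below is an illustration only, not the study's analysis code (which used the pROC package): it computes the AUROC as the proportion of case-control pairs in which the patient with tuberculosis has the higher signature score, counting ties as one half. The function name `auroc` and all scores are hypothetical.

```python
def auroc(case_scores, control_scores):
    """AUROC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as one half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical signature scores, for illustration only (not study data).
tb_scores = [2.1, 1.7, 0.9, 2.5]
non_tb_scores = [0.4, 1.0, 0.9, 0.2, 1.8]

print(auroc(tb_scores, non_tb_scores))  # 0.825
```

This pairwise form is equivalent to the Mann-Whitney U statistic divided by the number of case-control pairs. It is quadratic in the number of samples, which is acceptable for a cohort of this size but would normally be replaced by a rank-based calculation for large datasets.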
To test for differential diagnostic accuracy among predefined population subgroups, we stratified the cohort according to age, sex, ethnicity, HIV infection, previous tuberculosis, and indices of disease severity at presentation (symptom score, body-mass index [BMI], haemoglobin concentration, and sputum smear results). We constructed univariable subgroup-specific ROC curves and compared their AUROCs using DeLong tests. Sensitivity, specificity, and predictive values were reported at the maximum Youden index reflecting the highest test accuracy. 23 Additionally, we assessed diagnostic accuracy when fixing sensitivity and specificity at the minimum and optimum thresholds, as defined by the WHO TPP criteria for triage and confirmatory tests of tuberculosis, 2 using the coords function in the pROC package. WHO thresholds for a triage test were minimum 90% sensitivity, 70% specificity; optimum 95% sensitivity, 80% specificity. WHO thresholds for a confirmatory test were minimum 65% sensitivity, 98% specificity. CIs for proportions were calculated using the binomial Wilson method, 24 implemented in the binconf function of the Hmisc R package. We used the upper limit of the CIs for each signature to assess whether they might achieve the required thresholds for sensitivity and specificity.
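Two of the calculations named above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' code, which used pROC and Hmisc in R. `wilson_ci` implements the standard Wilson score interval, and `youden_threshold` scans candidate cutoffs for the one that maximises sensitivity + specificity - 1. Both function names are hypothetical, the scores are invented, and the counts 45/54 and 49/54 are inferred from the sensitivities of 83·3% and 90·7% among 54 patients with tuberculosis quoted in the Summary.

```python
import math


def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default: 95% CI)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def youden_threshold(case_scores, control_scores):
    """Return (threshold, sensitivity, specificity) maximising Youden's J.
    A score greater than or equal to the threshold is called positive."""
    best = None
    for t in sorted(set(case_scores) | set(control_scores)):
        sens = sum(s >= t for s in case_scores) / len(case_scores)
        spec = sum(s < t for s in control_scores) / len(control_scores)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Counts inferred from the Summary: 45/54 = 83.3% and 49/54 = 90.7%.
for k in (45, 49):
    lo, hi = wilson_ci(k, 54)
    print(f"{k}/54: {100 * k / 54:.1f}% ({100 * lo:.1f}-{100 * hi:.1f})")

# Hypothetical scores, for illustration only (not study data).
print(youden_threshold([2.1, 1.7, 0.9, 2.5], [0.4, 1.0, 0.9, 0.2, 1.8]))
```

With these inferred counts the Wilson intervals come out at 71·3–91·0 and 80·1–96·0, which agrees with the intervals reported in the Summary. The Youden scan breaks ties by keeping the lowest threshold, an arbitrary choice made for this sketch.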
McNemar's tests were used to compare sensitivity and specificity between Ultra analysis alone and a diagnostic algorithm combining sputum Ultra analysis with transcriptional signatures (appendix 1, p 4).
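As a rough illustration of the paired comparison, the Python sketch below computes a two-sided exact McNemar p value from the two discordant-pair counts. Whether the study used the exact or the asymptotic form of the test is not stated, so the exact form is an assumption here, and the function name `mcnemar_exact` is hypothetical. The example counts are taken from the Results text (nine Ultra false-positive patients reclassified as negative, and two to eight Ultra true-positive patients reclassified as negative); because the algorithm only reclassifies Ultra-positive patients, no discordant pairs arise in the opposite direction. Per-signature counts in table 4 may differ.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value.
    b and c are the counts of discordant pairs in each direction."""
    n = b + c
    if n == 0:
        return 1.0
    # Binomial(n, 0.5) tail probability at or below the smaller count.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Specificity: nine false-positives corrected, none newly introduced.
print(mcnemar_exact(9, 0))  # 0.00390625
# Sensitivity: between two and eight true-positives lost.
print(mcnemar_exact(0, 2))  # 0.5
print(mcnemar_exact(0, 8))  # 0.0078125
```

Only the discordant pairs enter the calculation, so the result does not depend on how many patients were classified identically by both strategies.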
We did three sensitivity analyses to confirm the robustness of our results. First, we restricted the tuberculosis case definition to patients with culture confirmation, irrespective of Xpert results. Second, we estimated the best possible specificity of the transcriptional signatures by simulating increased sensitivity of the standard reference that might be achieved using four sputum cultures (appendix 1, p 4), 25 as previously described. 10 Third, ComBat was used as an alternative batch correction method to the surrogate variable analysis used in primary analysis. All statistical analyses were done, and data graphically visualised, in R (version 3.6.0) or GraphPad Prism (version 8.1.1).

Role of the funding source
The funders of the study had no role in study design, data collection, data analysis, data interpretation, writing of the report, or the decision to submit for publication. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Results
Between Feb 12, 2016, and July 18, 2017, we obtained blood RNA samples from 205 consecutive patients. 10 Paired sputum and RNA sequencing data were available in 181 participants included in our analysis (figure 1). Their baseline characteristics are given in table 1; characteristics of participants who were excluded from the analysis are in None of these clinical parameters independently discriminated between patients with and without tuberculosis with sufficient diagnostic accuracy for a tuberculosis triage test as defined by WHO TPP (appendix 1, p 16). 2 27 signatures from 18 of 645 articles identified by our systematic review and expert consultation met the inclusion criteria (appendix 1, p 17; appendix 2; table 2). 14 of these 27 signatures were derived from study populations that included South African participants. Ten signatures were discovered in datasets that included HIV-infected participants. Five signatures were intended for diagnosis of incipient tuberculosis; 22 signatures were discovered with their intended application for diagnosis of active tuberculosis disease. Of these 22 signatures, eight aimed to distinguish tuberculosis from asymptomatic controls (including people who were healthy or with latent tuberculosis infection), five targeted discrimination of tuberculosis from other diseases, and nine aimed to distinguish tuberculosis from a mixed population of patients with other diseases and healthy controls. 24 of the 27 signatures were discovered through a genome-wide approach. Ten signatures required reconstruction of random forest or support vector machine models. We assessed whether each of the models that required reconstruction or had been otherwise altered, achieved the AUROC reported by the authors in the original dataset (appendix 1, p 7). 
We could not recapitulate the original AUROC for two signatures: Anderson39.OD, 26 which had been reduced from 51 transcripts originally to the 39 protein-coding transcripts that were available in our RNAseq dataset, and Duffy10, 16 for which our attempt to reconstruct the original model did not achieve the same AUROC as that reported in their validation data. In this case, we used a binary support vector machine model, which did reproduce a similar AUROC in their validation cohort. In addition, this assessment was not possible for two other signatures (Huang11 and Kaforou45) for which the AUROCs were not provided in the original reports. 30,31
We ranked the 27 candidate transcriptional signatures by their AUROC for discriminating tuberculosis and no tuberculosis in all 181 patients. The signature with the highest diagnostic accuracy was Sweeney3 (AUROC 90·6% [95% CI 85·6–95·6]), which was derived from an analysis of multiple previously published studies of patients with pulmonary tuberculosis compared with controls comprising both healthy individuals and patients with non-tuberculosis diseases. 41 Test scores of the four signatures with the highest diagnostic accuracy, among all patients and stratified by HIV status, are shown in figure 2. In exploratory subgroup analyses, diagnostic accuracy of these four signatures was not affected by HIV infection (figure 3) or any other patient baseline characteristics, including age, sex, and previous tuberculosis disease (appendix 1, p 10). AUROCs tended to be numerically lower among black patients (compared with those of mixed ancestry), and numerically lower in patients with higher BMI and with tuberculosis symptom scores of less than 3, which might indicate less severe disease. None of these differences was significant for all four signatures. Additionally, there was no systematic effect of sputum smear status or haemoglobin concentration, as other markers of disease severity, on signature performance (appendix 1, p 10). 
Similarly, signature scores did not correlate with duration of cough, time to culture positivity, or minimum Xpert cycle threshold, as surrogate measures of bacterial load (appendix 1, p 18). Table 3 shows the sensitivity and specificity of the BATF2, Kaforou25, Roe3, and Sweeney3 signatures at the maximum Youden index of each in all 181 patients.
When ROC curves of these signatures were benchmarked against the WHO TPP criteria for a tuberculosis triage test, point estimates or 95% CIs of all four signatures reached the minimum cutoffs of 90% sensitivity and 70% specificity (figure 4). Similarly, when fixing either sensitivity at 90% or specificity at 70% to enforce minimum WHO TPP diagnostic criteria, all four signatures met or approximated to the required performance thresholds (table 3). However, the optimum target criteria of 95% sensitivity and 80% specificity were beyond the 95% CI of all four signatures, either at the maximum Youden index or when fixing sensitivity or specificity at the required thresholds (figure 4, table 3).
As a secondary objective, we assessed signature performance as a blood-based confirmatory tuberculosis test, using WHO TPP criteria as a reference (figure 4). 2 At the maximum Youden index, all four signatures with the highest diagnostic accuracy failed to reach the required 98% specificity (table 3). Similarly, when setting the test thresholds to enforce either 98% specificity or 65% sensitivity, these four signatures were substantially short of the minimum performance requirements (table 3).
In view of emerging concerns that the higher sensitivity of the Ultra test might be compromised by false-positive results, 7,8,10 we also assessed the potential use of blood signatures as an add-on confirmatory test for Ultra-positive patients. Of 51 patients with Ultra-positive results in our cohort, ten (20%) were designated as false-positive by comparison with our standard reference (ie, these individuals were culture-negative and Xpert-negative at enrolment). Six (60%) of the ten Ultra false-positive patients had trace-positive results. Previous tuberculosis disease was more common in patients with Ultra false-positive results compared with patients with Ultra true-positive results (seven [70%] of ten vs 14 [34%] of 41; χ² test p=0·039; figure 5).
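The comparison of previous tuberculosis between Ultra false-positive and true-positive patients can be checked with a short Python sketch. It assumes a Pearson χ² test on the 2×2 table without continuity correction; the text does not say whether a correction was applied, but the uncorrected form reproduces the reported p value. The function name `chi2_2x2` is hypothetical, and the cell counts are taken from the paragraph above (7 of 10 vs 14 of 41 with previous tuberculosis).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-squared distribution with 1 df.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Rows: Ultra false-positive, Ultra true-positive.
# Columns: previous tuberculosis, no previous tuberculosis.
stat, p = chi2_2x2(7, 3, 14, 27)
print(f"chi-squared = {stat:.2f}, p = {p:.3f}")  # chi-squared = 4.27, p = 0.039
```

One expected cell count in this table is below five (10 × 21 / 51 ≈ 4·1), so an exact test would be a reasonable alternative and would be expected to give a somewhat larger p value.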
Nine of the ten Ultra false-positive patients scored consistently below the Youden index threshold of all four transcriptional signatures with the highest diagnostic accuracy, correctly classifying them as non-tuberculosis (figure 5). This also included five of the six Ultra trace false-positives. However, two to eight (5-20%) of the 41 true-positive Ultra patients were incorrectly classified as non-tuberculosis at the Youden index threshold of each signature, consistent with the imperfect sensitivity of the transcriptional signatures. A diagnostic algorithm that used the blood transcriptional signature results to re-classify all Ultra-positive patients, or only those with trace results, or those with previous tuberculosis, led to improved specificity compared with Ultra analysis alone, with small associated reductions in sensitivity (table 4). Of note, follow-up of cases that were Ultra-positive but culture-negative in the parent study revealed that three of the ten cases that we designated as Ultra false-positives were diagnosed with tuberculosis at intervals of 295, 432, and 777 days. 10
Restricting the tuberculosis case definition to culture-proven patients led to re-assignment of only one culture-negative, Xpert-positive patient as without tuberculosis. Data reanalysis confirmed the finding that the four signatures performed equivalently, independent of HIV status, while meeting or approximating the minimum criteria for a tuberculosis triage, but not confirmatory, test (appendix 1, pp 19-20).
The possibility that some patients might have been diagnosed with tuberculosis after enrolment to our study and the absence of multiple sputum cultures might have led to an underestimation of the specificity of the transcriptional signatures. To overcome this limitation, we sought to estimate the best possible specificity that the signatures could achieve if the sensitivity of the standard reference was increased by additional sputum cultures. We reclassified signature false-positive cases (at the Youden index threshold) to true-positive cases by the ratio of the sensitivity expected from four sputum cultures to that of a single culture. 25 Even in this analysis, the four signatures with the highest diagnostic accuracy failed to achieve optimum criteria for a triage test, and minimum criteria for a confirmatory test (appendix 1, p 11). Finally, we repeated our analysis after batch correction with ComBat instead of surrogate variable analysis. Again, our main findings were unchanged, confirming the robustness of our results (appendix 1, pp 21-22).

Discussion
To our knowledge, this is the first comprehensive head-to-head analysis of candidate blood transcriptional biomarkers of tuberculosis in a prospective validation cohort with a high burden of tuberculosis and HIV. Four signatures (comprising 1-25 genes) had equivalent diagnostic accuracy for differentiating patients with and without tuberculosis, irrespective of HIV status. These signatures met or approximated to the minimum WHO TPP criteria of 90% sensitivity and 70% specificity for a triage test to rule out tuberculosis, but failed to reach the optimum criteria (95% sensitivity and 80% specificity), and at a test threshold that offers the maximum diagnostic accuracy, they missed 9-26% of tuberculosis cases (ie, five to 14 of 54 patients with tuberculosis).
Figure caption (partial, panels B and C): (B) ROC curves are replicated with restricted y axes, and benchmarked against target criteria for a tuberculosis triage test. Minimum criteria (90% sensitivity, 70% specificity) are indicated by the dashed black boxes, optimum criteria (95% sensitivity, 80% specificity) are indicated by the blue boxes. Light blue shaded areas represent the 95% CIs. (C) ROC curves are replicated with restricted x axes and benchmarked against minimum criteria for a confirmatory test (dashed black box: 65% sensitivity, 98% specificity). Light blue shaded areas represent the 95% CIs. ROC=receiver operating characteristic. AUROC=area under the ROC curve.
To date, no transcriptional signature has been translated into a point-of-care test, which would require the adaptation and validation of these tests as PCR-based assays. Such studies are underway; 17,29 however, the cost is likely to exceed the target threshold of $2 per sample. 2 Taken together with the suboptimal clinical performance observed in our study, the question is raised of whether host transcriptional biomarkers represent a realistic and achievable triage strategy for the resource-limited settings where they are most needed. Of note, the diagnostic accuracy of the best transcriptional biomarkers in the current analysis was similar to that of point-of-care C-reactive protein (CRP) alone for active case-finding among HIV-infected individuals. 42 Since CRP testing is likely to be substantially cheaper, prospective assessments of the superiority of transcriptional biomarkers above this benchmark are required if they are to be pursued for this application. We also tested whether transcriptional biomarkers could be used as blood-based confirmatory tests for tuberculosis, for which the WHO-specified maximum target price is higher. However, the transcriptional signatures with the highest diagnostic accuracy in our study had insufficient specificity, making them non-viable for confirmatory tuberculosis diagnostics. A principal advantage of these signatures is the easy accessibility of blood sampling. However, alternative microbiological tests for tuberculosis using non-sputum samples are being developed, 43,44 which might offer greater promise among patient subgroups where obtaining sputum is difficult.
In the current cohort, ten patients had false-positive Ultra results, including six with false-positive Ultra trace results. This finding permitted exploration of alternative clinical applications of host transcriptional signatures. The four signatures with the highest diagnostic accuracy in our study showed promise in correctly classifying Ultra false-positive patients, including those with trace results. Our preliminary results suggest that a diagnostic algorithm combining Ultra sputum analysis with blood transcriptional biomarkers improves Ultra specificity. Large-scale prospective validation studies are required to further assess this potential application, particularly among individuals suspected to be false-positives, such as those with trace results or a history of tuberculosis disease. Of note, Ultra false-positive results have been attributed to non-viable mycobacterial remnants, 7,8,10 but the fact that three individuals with Ultra false-positive results were diagnosed with tuberculosis after 295-777 days' follow-up raises the possibility that some false-positive results might represent detection of very early paucibacillary or latent infection, undetected by Xpert or culture. In a high-burden setting, we cannot exclude the possibility that these cases were due to acquisition of infection after enrolment. Therefore, whether Ultra-positive results in the absence of prevalent disease predict future incident disease can only be addressed by randomised trials to test whether tuberculosis treatment in this group will reduce incident disease.
Among the best-performing signatures, BATF2, Kaforou25, and Roe3 were originally discovered by comparing patients with active tuberculosis with asymptomatic individuals. 13,27,31 Nonetheless, their performance in this observational cohort of almost exclusively symptomatic patients suggests that these signatures can discriminate between tuberculosis and the casemix of other symptomatic illness in this context. Assessing the extent to which these findings are generalisable will require similar observational studies in settings that might have a different casemix. Additionally, whether existing signatures have reached the maximum possible diagnostic accuracy using blood transcriptomics, or whether novel signatures, derived on even larger discovery datasets, might lead to further improvements in diagnostic accuracy, remains to be seen. Likewise, whether integration of clinical metadata with biomarkers will generate models with greater diagnostic accuracy also needs to be tested using independent training and validation cohorts.
Within the limitations of the statistical power in our cohort, signature performance was independent of age, sex, HIV coinfection, or previous tuberculosis disease, and preserved in subgroup analyses of patients stratified by sputum smear status or haemoglobin concentrations as surrogate measures of disease severity. The point estimates for test performance among black patients, and patients with higher BMI and lower tuberculosis symptom scores were lower, but our study had insufficient power to assess the significance of these observations for all four signatures with the highest diagnostic accuracy.
An important strength of our study was the clinically relevant, real-life population of patients who were evaluated for tuberculosis in a high-burden setting, with both HIV-infected and HIV-uninfected individuals, and patients with varying severity of tuberculosis disease. We induced sputum, ensuring that we did not include only patients who could expectorate, for whom there is less need for non-sputum tests. Importantly, the non-tuberculosis group was not pre-selected to be homogeneous, thus likely encompassing a casemix of people with latent tuberculosis infection and other diseases. Second, we used a robust standard of culture or Xpert positivity as a diagnostic reference for our primary analysis, and confirmed that the most optimistic estimates of additional cultures in the standard reference would not significantly improve signature performance. Third, we did a systematic review to identify 27 candidate transcriptional signatures for tuberculosis to undertake a comprehensive head-to-head analysis. Finally, our dataset was exclusively used for validation rather than discovery, making it a truly independent diagnostic accuracy study.
A limitation of our study was the observed batch effect in RNA sequencing data, which appeared to result from a mixture of technical batch factors. We addressed this effect with two different data adjustment approaches, and found in both analyses that the same four signatures performed equivalently, irrespective of HIV status, and met or approximated the minimum criteria for a tuberculosis triage test but not for a confirmatory test. A second limitation of our study was that our cohort was restricted to adults with possible pulmonary tuberculosis. Similar independent validation studies are needed for children and patients with extrapulmonary tuberculosis. Since inclusion criteria for our systematic review were not limited by age or site of disease, the 27 candidate signatures identified could be tested in such a study. Third, no alternative diagnoses were available for patients without tuberculosis; thus, we were not able to establish whether false-positive results were related to particular non-tuberculosis diseases. Finally, this study was limited to transcriptional biomarkers. Prospective head-to-head studies comparing performance of transcriptional signatures with other candidate triage test biomarkers, such as point-of-care CRP 42 and automated chest radiograph interpretation tools, 45 or with strategies that integrate biomarkers with clinical metadata, are needed.
In conclusion, we showed that four blood transcriptional signatures have equivalent diagnostic accuracy for active tuberculosis, independent of HIV status. These biomarkers achieved the WHO minimum diagnostic accuracy parameters required for a tuberculosis triage test but failed to meet the criteria for a confirmatory test in the present setting. Notwithstanding the challenge of achieving the desired target price for such tests, further validation studies are needed to assess their application in different settings alongside head-to-head comparisons with other candidate triage biomarkers, with a view to interventional trials to assess their clinical and health economic effects.

Contributors
GT and MN conceived the study. ZP and BWPR were responsible for sample and metadata collection. JKR, PM, GRN, and ET processed the samples. CTT and RKG analysed the data with input from RFM, GT, and MN. CTT, RKG, and MN wrote the manuscript with input from all other authors.

Declaration of interests
RFM reports personal fees from Gilead. MN reports grants from the Wellcome Trust and the UK National Institute for Health Research (NIHR). BWPR reports non-financial support from Cepheid. RKG reports grants from NIHR. JKR reports grants from the UK Medical Research Council. GT reports grants from the Royal Society Newton Advanced Fellowship. JKR and MN have a UK patent application pending (1603367.2) in relation to blood transcriptomic biomarkers of tuberculosis. GRN, ZP, ET, CTT, and PM declare no competing interests.