How useful is the tuberculin skin test for tuberculosis detection: Assessing diagnostic accuracy metrics through a large Tunisian case-control study

Background and aim During the past decade, the frequency of extrapulmonary forms of tuberculosis (TB) has increased. These forms are often misdiagnosed. This change in the epidemiological profile of TB led us to reconsider the utility of the tuberculin skin test (TST) in active TB detection. This study aimed to evaluate the diagnostic accuracy of the TST for active TB detection. Methods This was a multicenter case-control study conducted in 11 anti-TB centers in Tunisia (June to November 2014). Cases were adults aged 18 to 55 years with newly diagnosed and confirmed tuberculosis. Controls were free from tuberculosis. A data collection sheet was completed and a TST was performed for each participant. The diagnostic accuracy of the TST was assessed using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), and the sensitivity and specificity of the selected cut-off point were estimated. Results Overall, 1050 participants were enrolled: 336 cases and 714 controls. The mean age was 38.3±11.8 years for cases and 33.6±11 years for controls. The mean TST induration diameter was significantly higher among cases than controls (13.7 mm vs. 6.2 mm; p = 10⁻⁶). The AUC was 0.789 [95% CI: 0.758-0.819; p = 0.01], corresponding to moderate discriminating performance. The most discriminative TST cut-off, giving the best sensitivity (73.7%) and specificity (76.6%) pair, was ≥ 11 mm, with a Youden index of 0.503. Positive and negative predictive values were 3.11% and 99.52%, respectively. Conclusions The TST could be a useful tool for active tuberculosis detection, with moderate overall performance and acceptable sensitivity and specificity at the 11 mm cut-off. However, it cannot be considered a gold standard test because of its multiple disadvantages.


Introduction
The tuberculosis (TB) epidemic poses a serious public health problem, with high associated morbidity and mortality worldwide. 1 It has been estimated that almost a quarter of the world's population has latent TB infection and that approximately 10 million people developed active TB in 2021, with about 1.6 million attributed deaths worldwide. 2 According to World Health Organization (WHO) estimates, diagnosing and treating TB saved 66 million lives between 2000 and 2020. 2 Improving the TB diagnostic procedure therefore remains necessary: an accurate diagnostic tool is a priority for early detection, appropriate treatment and timely control measures to limit the spread of infection. 3 However, there is no international agreement on a gold-standard test for detecting either active or latent TB. 3,4
Controls were recruited from basic health centers and district hospitals located in the same geographical areas as the anti-TB centers where cases were recruited, during the same study period. The sex distribution of controls was approximately the same as that of TB cases, so that the proportions of men and women were similar in both groups. None of the included controls had clinically manifest active TB; they had no respiratory or extra-respiratory symptoms that could be of TB origin.

Exclusion criteria
The following were excluded from the study:
-TB cases already treated for pulmonary or extra-pulmonary TB.
-TB cases and controls with a pathological condition that may lead to tuberculin anergy: acute viral infections (measles, infectious mononucleosis, influenza), lymphomas, neoplastic pathologies, sarcoidosis, severe bacterial infection, HIV infection, among others.
-TB cases and controls who had undergone immunosuppressive treatment, corticosteroid therapy for more than one month, or vaccination with live attenuated vaccines two months prior to the test.
-TB cases and controls with a known history of allergic reaction to a TST component or to a previous TST administration.

Data collection
Data were collected for all participants using a standardized questionnaire covering demographic variables (age, sex, educational level), medical history of any condition that may lead to tuberculin anergy, participant status (case or control), the type of TB for cases (pulmonary or extra-pulmonary), and the dates of TST administration and reading.

Tuberculin skin testing (TST)
The TST was performed for all participants, cases and controls, not as an investigator-mandated intervention but as part of the normal process of diagnosing the etiology of their disease.
The test was performed in all participants (cases and controls) using the Mantoux method, by injecting 0.1 mL of tuberculin solution (tuberculin PPD [Purified Protein Derivative RT23 from Copenhagen]) strictly intradermally on the forearm, away from any scar. 6 The TST results were read 72 hours after administration by the same trained investigator. The induration diameter (in mm) was calculated as the average of the transverse and longitudinal diameters of the induration. 6

Statistical analysis
Continuous and categorical variables were summarized as mean (± standard deviation) and relative frequencies (percentages), respectively. Pearson's chi-square test and Student's t-test were used to compare two percentages and two means for independent samples, respectively. The significance level adopted for all statistical tests was 0.05. All statistical analyses were performed using SPSS version 23.0.
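As a rough illustration of the Student's t-test described above, the sketch below computes the pooled-variance t statistic from summary statistics, using the mean ages reported in the abstract. It is a sketch under stated assumptions: the function name `pooled_t_test` is hypothetical, the study's own analysis was run in SPSS, and the p-value here uses a normal approximation to the t distribution (reasonable with about 1000 degrees of freedom) rather than the exact t distribution. The paper does not report a p-value for the age comparison, so no comparison with a published figure is implied.

```python
import math
from statistics import NormalDist


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Hypothetical helper: Student's two-sample t statistic (equal variances)
    from summary statistics.

    Returns (t, df, approximate two-sided p-value). The p-value uses a normal
    approximation to the t distribution, which is only reasonable for large df;
    an exact p-value requires the t distribution itself.
    """
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Mean ages from the abstract: cases 38.3 ± 11.8 (n = 336), controls 33.6 ± 11 (n = 714).
t, df, p = pooled_t_test(38.3, 11.8, 336, 33.6, 11.0, 714)
print(f"t = {t:.2f}, df = {df}, p ≈ {p:.1e}")
```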

Sensitivity and specificity
To assess the performance of the TST, we first calculated the sensitivity, specificity and Youden index of the TST induration diameter for different possible thresholds (from ≥ 5 mm to ≥ 15 mm).
The 95% confidence intervals (95% CI) of sensitivity and specificity were calculated with an Excel calculator, based on the exact binomial distribution.
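The exact binomial (Clopper-Pearson) interval mentioned above can be sketched with the Python standard library alone, by inverting the binomial cumulative distribution function with bisection. The authors used an Excel calculator; this is an assumed equivalent, not their tool. The function names are hypothetical, and the counts in the usage example are illustrative (547 of 714 controls testing negative is about 76.6%), not taken from the study data.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0  # all mass at X = 0
    if p >= 1.0:
        return 0.0  # all mass at X = n > k
    log_p, log_q = log(p), log(1.0 - p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q
        total += exp(log_pmf)
    return total


def solve_decreasing(f, target, iters=100):
    """Bisection for p in (0, 1) with f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still above target: root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for a binomial proportion x / n."""
    # Lower bound: P(X >= x | p) = alpha / 2, i.e. P(X <= x - 1 | p) = 1 - alpha / 2.
    lower = 0.0 if x == 0 else solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else solve_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


# Illustrative counts only: 547 of 714 controls negative (about 76.6% specificity).
low, high = clopper_pearson(547, 714)
print(f"Sp = {547 / 714:.3f}, 95% CI {low:.3f} to {high:.3f}")
```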
Sensitivity (Se) was defined as the proportion (ranging from 0 to 1) of tuberculosis cases who tested positive with the TST (true positives, TP). Cases not identified by the TST were false negatives (FN). 14
Specificity (Sp) was defined as the proportion (ranging from 0 to 1) of controls who tested negative with the TST (true negatives, TN). Controls who tested positive with the TST were false positives (FP). 14,15
The two intrinsic qualities of the test, sensitivity and specificity, were combined into a single index, the Youden index J, such that J = Se + Sp - 1.
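To make these definitions concrete, the following sketch scans cut-offs from ≥ 5 mm to ≥ 15 mm, computes Se, Sp and J at each one, and picks the cut-off with the highest Youden index. The diameters are invented toy values, not the study data; the function names are hypothetical, and a result is assumed positive when the induration is greater than or equal to the cut-off.

```python
def confusion_at(cases, controls, cutoff):
    """Counts at a TST cut-off: a result is positive when induration >= cutoff (mm)."""
    tp = sum(d >= cutoff for d in cases)
    fn = len(cases) - tp
    fp = sum(d >= cutoff for d in controls)
    tn = len(controls) - fp
    return tp, fn, fp, tn


def accuracy_table(cases, controls, cutoffs=range(5, 16)):
    """Se, Sp and Youden index J = Se + Sp - 1 for each cut-off."""
    rows = []
    for c in cutoffs:
        tp, fn, fp, tn = confusion_at(cases, controls, c)
        se = tp / (tp + fn)  # sensitivity: positives among cases
        sp = tn / (tn + fp)  # specificity: negatives among controls
        rows.append({"cutoff": c, "se": se, "sp": sp, "youden": se + sp - 1})
    return rows


# Hypothetical induration diameters in mm (NOT the study data).
cases = [14, 12, 11, 9, 16, 6, 13, 0]
controls = [0, 5, 12, 3, 8, 0, 10, 7, 2, 11]

table = accuracy_table(cases, controls)
best = max(table, key=lambda r: r["youden"])
for r in table:
    print(f">= {r['cutoff']:2d} mm  Se={r['se']:.3f}  Sp={r['sp']:.3f}  J={r['youden']:.3f}")
print("Best cut-off by Youden index:", best["cutoff"], "mm")
```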
The Youden index ranges from -1 to +1; a value less than or equal to 0 indicates that the test is diagnostically ineffective, and the closer the index is to 1, the better the test performs. 16
Positive and negative predictive values (PPV and NPV), or post-test probabilities, of the TST were also determined for the different possible thresholds, using Bayes' theorem 17 and Fagan's nomogram. 18 The predictive values were read from Fagan's nomogram using, as TB prevalence (pre-test probability), an estimate based on TB patient data from three university hospitals in Tunis, Tunisia (Rabta Hospital, Charles Nicolle Hospital and Abderrahmane Mami Hospital). The estimated TB prevalence was around 1% in 2016.
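The Bayes' theorem calculation of predictive values can be written directly, together with its odds form (post-test odds = pre-test odds × likelihood ratio), which is the relation Fagan's nomogram draws graphically. The example plugs in the sensitivity and specificity reported for the ≥ 11 mm cut-off and the approximate 1% pre-test probability; exact arithmetic of this kind may differ somewhat from values read off a graphical nomogram. Function names are hypothetical.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV from Bayes' theorem."""
    ppv = se * prevalence / (se * prevalence + (1 - sp) * (1 - prevalence))
    npv = sp * (1 - prevalence) / (sp * (1 - prevalence) + (1 - se) * prevalence)
    return ppv, npv


def post_test_probability(pretest_prob, lr):
    """Odds form of Bayes' theorem: convert to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


# Reported Se/Sp at the >= 11 mm cut-off, with a pre-test probability of about 1%.
se, sp, prev = 0.737, 0.766, 0.01
ppv, npv = predictive_values(se, sp, prev)
ppv_odds = post_test_probability(prev, se / (1 - sp))      # positive result: use LR+
npv_odds = 1 - post_test_probability(prev, (1 - se) / sp)  # negative result: use LR-
print(f"PPV = {ppv:.2%} (odds form {ppv_odds:.2%}), NPV = {npv:.2%} (odds form {npv_odds:.2%})")
```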
The positive and negative likelihood ratios (LR) and their 95% confidence intervals (95% CI) were also calculated for the different possible TST thresholds, using the same Excel calculator.
For an informative test, the positive LR lies between 1 and +∞; the further it is above 1 (as specificity approaches 1), the better the TST discriminates TB cases from non-cases.
For an informative test, the negative LR lies between 0 and 1; the closer it is to 0 (as sensitivity approaches 1), the better the TST discriminates TB cases from non-cases.
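The positive and negative LRs follow from Se and Sp as LR+ = Se / (1 - Sp) and LR- = (1 - Se) / Sp. The sketch below adds 95% CIs using the commonly used log method; whether the authors' Excel calculator used this method is an assumption. The 2×2 counts in the example are hypothetical values chosen to be close to the reported ≥ 11 mm sensitivity and specificity, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def likelihood_ratios(tp, fn, fp, tn, level=0.95):
    """LR+ and LR- with log-method confidence intervals (all counts must be > 0)."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    lr_pos = se / (1 - sp)
    lr_neg = (1 - se) / sp
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    # Standard errors of ln(LR+) and ln(LR-) from the 2x2 counts.
    se_log_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_log_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    ci_pos = (lr_pos * math.exp(-z * se_log_pos), lr_pos * math.exp(z * se_log_pos))
    ci_neg = (lr_neg * math.exp(-z * se_log_neg), lr_neg * math.exp(z * se_log_neg))
    return (lr_pos, ci_pos), (lr_neg, ci_neg)


# Hypothetical counts: 248/336 cases positive, 547/714 controls negative.
(lr_pos, ci_pos), (lr_neg, ci_neg) = likelihood_ratios(tp=248, fn=88, fp=167, tn=547)
print(f"LR+ = {lr_pos:.2f} ({ci_pos[0]:.2f} to {ci_pos[1]:.2f})")
print(f"LR- = {lr_neg:.2f} ({ci_neg[0]:.2f} to {ci_neg[1]:.2f})")
```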

ROC and area under the curve (AUC)
The overall discriminative performance of the TST was evaluated using the receiver operating characteristic (ROC) curve, 19 which was also used to determine the optimal cut-off, that is, the one giving the best (sensitivity, specificity) pair.
ROC curves were plotted first for all participants (age, sex and site of infection combined), then by site of infection (pulmonary and lymph node tuberculosis) and by age group.
The AUC was calculated for each ROC curve and reported with its 95% confidence interval (95% CI). The AUC ranges from 0 to 1, with 0.5 corresponding to no discrimination; the closer it is to 1, the better the discriminating ability, or overall diagnostic value, of the TST.
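The empirical AUC equals the probability that a randomly chosen case has a larger induration than a randomly chosen control, with ties counted as one half (the Mann-Whitney formulation), and it coincides with the trapezoidal area under the empirical ROC curve. The sketch below computes both on hypothetical toy diameters, not the study data; SPSS may compute its AUC and confidence interval differently, and the function names are illustrative.

```python
def empirical_auc(cases, controls):
    """AUC = P(case diameter > control diameter) + 0.5 * P(tie)."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def roc_points(cases, controls):
    """(1 - specificity, sensitivity) for each observed cut-off 'induration >= t'."""
    points = [(0.0, 0.0)]
    for t in sorted(set(cases) | set(controls), reverse=True):
        se = sum(d >= t for d in cases) / len(cases)
        fpr = sum(d >= t for d in controls) / len(controls)
        points.append((fpr, se))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve whose points are sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical induration diameters in mm (NOT the study data).
cases = [14, 12, 11, 9, 16, 6, 13, 0]
controls = [0, 5, 12, 3, 8, 0, 10, 7, 2, 11]

auc = empirical_auc(cases, controls)
pts = roc_points(cases, controls)
trap = trapezoid_auc(pts)
print(f"Mann-Whitney AUC = {auc:.4f}, trapezoidal AUC = {trap:.4f}")
```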

Ethical considerations
All included participants were informed about the purpose of the study and gave written informed consent to participate and to undergo the TST. They were also informed of their right to refuse participation or to withdraw at any time during data collection. All collected information and data analyses were kept confidential and anonymous during and after data collection.
In Tunisia, at the time of the study (2014), there was only one national ethics committee, which handled requests for clinical trials only. Since no blood samples were taken and no procedures were performed on participants for research purposes, and since the normal process of diagnosis and management was followed for all patients without any intervention (the TST was performed in all participants for diagnostic rather than research purposes), we did not submit our study to that committee at the time. The study was retrospectively approved in August 2023 by the ethics committee of the Faculty of Medicine of Tunis (approval number CE-FMT/2023/03/HCN/V1).

Sociodemographic characteristics of the study population
Overall, 1050 participants were included (336 cases and 714 controls), with a mean age of 35 years. More than half of the cases were female (n = 179, 53.3%; sex ratio M/F = 0.87), and their mean (± standard deviation) age was 38.3 (±11.8) years.
There was no significant difference in the mean diameter of the TST induration between male and female participants (8.5 mm (SD: 6.9 mm) versus 8.7 mm (SD: 7.9 mm); p = 0.6).
The TST induration diameter disaggregated by sex and by participant status is presented in Table 2.

ROC curve
Across all 1050 participants, the best discriminating TST cut-off was an induration diameter ≥11 mm, with a sensitivity of 73.7% (95% CI: 68.8%-78.1%) and a specificity of 76.6% (95% CI: 73.3%-79.5%), a Youden index of 0.503 and an area under the curve (AUC) of 0.789 (95% CI: 0.758-0.819; p = 0.01). Sensitivity exceeded 80% for cut-off values from ≥9 mm down to ≥5 mm (see Table 3, Figure 2A). Using Fagan's nomogram for the selected threshold (≥11 mm), the positive and negative predictive values were 3.11% and 99.52%, respectively (see Figure 3). When stratified by sex, the best cut-off was also ≥11 mm for both male participants (sensitivity: 77.2%; specificity: 75.2%; Youden index: 0.524) and female participants (sensitivity: 70.7%; specificity: 77.9%; Youden index: 0.486). Sensitivity exceeded 80% for cut-off values from ≥10 mm down to ≥5 mm in male participants and from ≥8 mm down to ≥5 mm in female participants (see Table 4).

Discussion
TB remains a serious public health threat. In this study, we evaluated the performance of the TST in a diagnostic setting and measured its accuracy metrics at different possible cut-off points in a large population of 1050 participants. The best discriminating cut-off was chosen as the one giving the best (sensitivity, specificity) pair, with the highest Youden index.
The TST is widely used to detect latent TB and to screen contacts of TB patients. It has the advantages of being easy, rapid, safe and low-cost. 20 In view of our results, the TST may be used for active TB diagnosis, with moderate overall performance (AUC = 0.789) 19,21 and a good balance of sensitivity and specificity at the 11 mm cut-off.
The positive (3.1) and negative (0.34) LRs at the selected cut-off also indicate moderate performance and utility of the TST. 22
There is no international agreement on the best TST cut-off for clearly distinguishing a positive from a negative result. Suggested values range from 5 mm to 10 mm of induration, and the choice depends on many epidemiological and immunological risk factors of the patient. 23 In our results, sensitivity exceeded 80% for cut-off values from ≥9 mm down to ≥5 mm. The induration threshold could therefore be lowered from 11 mm (the best cut-off) to 9 mm to increase the sensitivity of the TST, reducing false negatives and limiting missed diagnoses of TB, whether pulmonary or extra-pulmonary.
Moreover, extrapulmonary TB is difficult to diagnose because it often requires invasive procedures, which poses a diagnostic problem for clinicians. 24 In Tunisia, diagnostic confirmation of lymph node TB requires needle aspiration or surgical excision, with cytological or pathological and bacteriological study (direct examination, Xpert MTB/RIF test and culture). Lymph node TB lesions are generally paucibacillary, and direct examination is rarely positive (positive rate of only about 10%). 11 Since lymph node TB is usually more difficult to diagnose clinically than pulmonary TB, a 10 mm threshold could be used when lymph node TB is suspected, to ensure good sensitivity (>80%) of the test.
However, the TST has some limitations. Skin induration measurement is operator-dependent and remains subjective, and its interpretation depends on many other factors, such as immune status, comorbid conditions and prior bacille Calmette-Guérin (BCG) vaccination. 20,25 Indeed, the TST cannot distinguish TB infection from post-vaccination allergy. Cross-reaction of the TST with antigens of non-tuberculous mycobacteria and with the BCG vaccine antigen generates false positives, inducing an apparent false increase in sensitivity and a decrease in specificity. 3,26
Progress has certainly been made in developing other rapid tests for TB screening, such as Xpert MTB/RIF, Xpert MTB/RIF Ultra 2 and interferon-gamma release assays (IGRA). However, these tests are often unavailable and expensive, especially in low- and middle-income countries, which have the highest TB burden. 3,27 They are therefore not an accessible option for TB diagnosis in these resource-limited settings. 27
The strengths of our study include the large sample size (1050 participants), the ease of applying the TST, and the clear and detailed description of the diagnostic test evaluation methodology, which supports reproducibility. As with any epidemiological study, there were some limitations, mainly that the control sample may not be representative of all non-TB individuals in real life, and that the test was not administered blindly, which may have influenced the operator's interpretation of the result.
Therefore, further prospective studies are needed to evaluate the performance of the TST for diagnosing active TB more appropriately. We propose following a sample of close contacts of TB patients for a minimum of two years to prospectively detect incident new cases. At the end of the follow-up period, the contacts would be divided into newly diagnosed active cases and controls, forming a nested case-control study. This design would minimize selection bias and ensure good representativeness of the included cases and controls. All included participants would receive the TST and an IGRA at the same time, making it possible to compare the performance of the two tests through their corresponding ROC curves.
A progressive diagnostic approach based on a flowchart decision strategy using consecutive tests could also be a good alternative. 30 An economic evaluation of the cost-effectiveness of this diagnostic approach is needed to support its usefulness and rational application for public health purposes.
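One way to reason about such a consecutive-test strategy is through the standard serial and parallel combination formulas. The sketch assumes, hypothetically, that the two tests are conditionally independent given disease status (often not true in practice) and uses made-up values (Se 0.80, Sp 0.90) for the second test; only the TST values come from the reported ≥ 11 mm cut-off. The function names are illustrative.

```python
def serial_testing(se1, sp1, se2, sp2):
    """Positive only if BOTH tests are positive (assumes conditional independence)."""
    se = se1 * se2                     # a case must be detected by both tests
    sp = 1 - (1 - sp1) * (1 - sp2)     # a control is misclassified only if both are false positives
    return se, sp


def parallel_testing(se1, sp1, se2, sp2):
    """Positive if EITHER test is positive (assumes conditional independence)."""
    se = 1 - (1 - se1) * (1 - se2)     # a case is missed only if both tests miss it
    sp = sp1 * sp2                     # a control must be negative on both tests
    return se, sp


tst = (0.737, 0.766)           # reported TST Se/Sp at >= 11 mm
second = (0.80, 0.90)          # hypothetical second test, for illustration only
se_s, sp_s = serial_testing(*tst, *second)
se_p, sp_p = parallel_testing(*tst, *second)
print(f"Serial:   Se = {se_s:.3f}, Sp = {sp_s:.3f}")
print(f"Parallel: Se = {se_p:.3f}, Sp = {sp_p:.3f}")
```

Serial use trades sensitivity for specificity, which matches the aim of limiting BCG-related false positives; parallel use does the reverse.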
Finally, the ROC curve certainly plays an important role in determining the best TST cut-off for TB diagnosis or screening. However, the interpretation of the TST characteristics is probabilistic, and its predictive values cannot be properly assessed without considering all other risk factors. There is therefore a need to develop a predictive clinical score for TB based on a range of epidemiological (TB prevalence, demographic characteristics, history of contact with a TB patient), clinical (individual risk factors, TB symptoms) and paraclinical (chest X-ray, among others) factors. This approach could optimize and guide the use of the various TB diagnostic methods, avoiding unnecessary invasive examinations and procedures while reducing the cost of TB diagnosis.

Conclusions
The TST is a simple, low-cost test. It can be used for active TB detection, with moderate overall performance and acceptable sensitivity and specificity at the 11 mm cut-off. However, it has some disadvantages, especially its low specificity, with high false-positive rates in areas with mass BCG vaccination. Combining the TST with another test, such as an IGRA, could be a good alternative for early and accurate diagnosis of TB.
In the case of patients with adenitis, did you rule out the possibility of non-tuberculous mycobacteria that could have cross-reacted with the TST?
○ Regarding the controls, how did you ensure they were free of TB? Did they have chest X-rays, or was it based solely on the absence of symptoms compatible with TB? Since the TST was performed in controls as part of routine clinical practice and not as part of the study, was TB infection ruled out in those who had a TST >5-10 mm? On the other hand, if the TST was done as part of contact tracing, was it repeated after 8-12 weeks? How did you differentiate whether the controls were free of TB or had latent TB infection?

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Pediatric Infectious and Tropical Diseases
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Thank you very much for your review. We really appreciate your response and time. We have revised our manuscript as you suggested. Here are the responses to your comments:

Reviewer Comment:
-Title: Consider tuberculosis disease instead of active TB, and TB infection instead of latent TB, according to the latest terminology. Also, consider shortening the title length. I just wanted to point out a grammar mistake in the "Sensitivity and specificity", 10th paragraph: "the TST a test is more discriminating".
Response: It was corrected. Thank you.
*Ethical considerations: Another grammar mistake in 1st pg, it should be "have given".
Author Response: It was corrected. Thank you.

Reviewer Comment:
• I understand that not all cases were microbiologically confirmed. If not, please specify how many had microbiological confirmation.
The statistical methods that have been applied are appropriate for the data, and have been described and interpreted correctly. I only have a minor comment regarding the statistical elements: -Methods: "Positive and negative predictive values (PPV and NPV) (or post-test probabilities) of TST were also determined for different possible TST thresholds by two different methods, using the properties of Bayes' theorem and using Fagan's nomogram graph". Fagan's nomogram is not really a 'different method', but rather a graphical application of Bayes' theorem.
My main concern in reading this manuscript is the description of the recruitment and selection of the cases and controls, which for me leaves a number of questions unanswered:
-Did you approach/include all people who underwent the TST at the participating centers during the time period stated?
-No details are given regarding the number of cases and controls excluded due to the exclusion criteria.
-Did anyone decline involvement?
-Was there any particular reason to evaluate the specific time period chosen?
-Why did the cases undergo a TST at the time of their anti-TB treatment?
-Why did the controls undergo a TST if they had no signs of TB?
-Was the data collection questionnaire filled in by participants? Or was it filled in retrospectively by clinics?
-Were participants recruited prospectively at each clinic? This is implied but not explicitly stated.
Further minor points: -The English writing throughout can be understood, but would benefit from further review and editing.
-I don't think that there is much value in Figure 2, unless the TST values are divided by both sex and case-control status.
-Some of the references cited do not have full details listed (although you can click through the hyperlinks for the intended target documents), e.g. 1 and 2.
-Anonymized underlying data have been provided, but the data coding is not clear for all variables.
-The data described are 10 years old. Are there likely to have been further changes to TB epidemiology in this time?

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Partly

Figure 1. Box plot of TST diameter for cases and controls.

Figure 2. Receiver operating characteristic curve of the tuberculin skin test in diagnosing tuberculosis. (A) All 1050 participants. (B) 180 cases of lymph node tuberculosis. (C) 121 cases of pulmonary tuberculosis.

Figure 3. Positive predictive value (PPV) and negative predictive value (NPV) determined using Fagan's nomogram for the best selected threshold of TST induration (≥ 11 mm), for all participants (n = 1050).

1 Hospital La Paz, La Paz Research Institute (IdiPAZ), Universidad Autónoma de Madrid (UAM), Madrid, Spain 2 Centro de Investigacion Biomedica en Red Enfermedades Infecciosas (Ringgold ID: 637284), Madrid, Community of Madrid, Spain
I agree with the changes made and approve the manuscript as ready for indexing.

○ Results *TABLE 1: I recommend that you specify the other sites of extrapulmonary TB.
*Did you analyze the time between the contact risk or the beginning of the symptoms and TST performance?
*Did you analyze the reliability of the test according to the severity of the disease?
Is the work clearly and accurately presented and does it cite the current literature? Yes

Author Response:
It was done. Thank you.
"How useful is the tuberculin skin test for tuberculosis detection: Assessing diagnostic accuracy metrics through a large Tunisian case-control study"
Reviewer Comment: -Introduction: I suggest including a reference in the 6th paragraph, referring to pulmonary and extrapulmonary TB forms in Tunisia.
Author Response: Ref 11 corresponds to the entire previous paragraph (6th paragraph).
Reviewer Comment: -Methods:

Oliver Stirrup 1
University College London, London, UK 2 University College London, London, UK
Nouira et al. describe a case-control study evaluating the diagnostic performance of the tuberculin skin test (TST) for active TB. I am a statistician with only limited experience of TB research. As such, I have commented on the statistical methods used and the study design, but not on the scientific novelty and interest of this work.

Table 1. Sociodemographic and clinical characteristics of the included cases (n = 336) and controls (n = 714).

Table 2. Tuberculin skin test induration diameter disaggregated by sex (male versus female) and by participant status (cases versus controls).

Table 3. Sensitivity, specificity, Youden index, positive and negative likelihood ratios, and positive and negative predictive values for different possible tuberculin skin test thresholds (from 5 to 15 mm), for all participants (n = 1050).

Table 4. Sensitivity, specificity, Youden index, and positive and negative likelihood ratios for different possible tuberculin skin test thresholds (from 5 to 15 mm), by sex (male versus female). TST: tuberculin skin test; CI: confidence interval; LR: likelihood ratio.

Table 5. Sensitivity, specificity, Youden index, and positive and negative likelihood ratios for different possible tuberculin skin test thresholds (from 5 to 15 mm), by infection type (pulmonary versus lymph node tuberculosis; n cases = 301).