Using biomarkers to predict TB treatment duration (Predict TB): a prospective, randomized, noninferiority, treatment shortening clinical trial

Background: By the early 1980s, tuberculosis treatment was shortened from 24 to 6 months, maintaining relapse rates of 1-2%. Subsequent trials attempting shorter durations have failed, with 4-month arms consistently having relapse rates of 15-20%. One trial shortened treatment only among those without baseline cavity on chest x-ray and whose month 2 sputum culture converted to negative. The 4-month arm relapse rate decreased to 7% but was still significantly worse than the 6-month arm (1.6%, P<0.01). We hypothesize that PET/CT characteristics at baseline, PET/CT changes at one month, and markers of residual bacterial load will identify patients with tuberculosis who can be cured with 4 months (16 weeks) of standard treatment.
Methods: This is a prospective, multicenter, randomized, phase 2b, noninferiority clinical trial of pulmonary tuberculosis participants. Those eligible start standard of care treatment. PET/CT scans are done at weeks 0, 4, and 16 or 24. Participants who do not meet early treatment completion criteria (baseline radiologic severity, radiologic response at one month, and GeneXpert-detectable bacilli at four months) are placed in Arm A (24 weeks of standard therapy). Those who meet the early treatment completion criteria are randomized at week 16 to continue treatment to week 24 (Arm B) or complete treatment at week 16 (Arm C). The primary endpoint compares the treatment success rate at 18 months between Arms B and C.
Discussion: Multiple biomarkers have been assessed to predict TB treatment outcomes. This study uses PET/CT scans and GeneXpert (Xpert) cycle threshold to risk stratify participants. PET/CT scans are not applicable to global public health but could be used in clinical trials to stratify participants and possibly become a surrogate endpoint. If the Predict TB trial is successful, other immunological biomarkers or transcriptional signatures that correlate with treatment outcome may be identified.
Trial Registration: NCT02821832


Gates Open Research
Background
Tuberculosis (TB) is one of the top three global causes of infectious diseases and one of the top ten global causes of death, recently surpassing even HIV/AIDS 1. Treatment of drug sensitive TB (DS-TB) is long, typically requiring six months with good adherence to achieve cure and prevent relapse after therapy is stopped. However, treatment adherence and completion rates are not optimal, which could probably be improved by shorter and more effective treatments. Multiple studies conducted over the past 40 years have attempted to reduce the treatment duration of TB. In the 1970s, successful combination chemotherapy lasted 24 months, based on a series of studies conducted by the British Medical Research Council (BMRC). Follow-up studies in the early 1980s successfully reduced treatment duration to 6 months using pyrazinamide and rifampin, achieving relapse rates of 1-2%. Trials that reduced the duration to below 6 months experienced increasing relapse rates, roughly 12% at 4 months and up to 20% at 3 months 2, establishing 6 months as the accepted treatment duration for all cases of DS-TB.
Three recent randomized-controlled trials attempted to shorten treatment using a 4-month experimental arm with a fluoroquinolone substituted for one of the four standard drugs 3-5 . In all three trials, the 4-month treatment arms had relapse rates of approximately 15-20%, significantly higher than the standard of care 6-month arms but similar to the BMRC 4-month treatment trials (12%, 95% CI 9-16 among 364 patients) conducted 30+ years earlier 2 . These trials suggest that, with currently available drugs, roughly 80-85% of drug-sensitive TB patients are cured with 4 months of standard therapy but 15-20% will relapse if not treated for at least 6 months and, further, that a subset of those with less severe disease could be cured at 4 months. Identifying this subset prospectively could lead to: (1) new treatment guidelines for eligible lower risk patients; (2) criteria for selection of high-risk patients that might be candidates for Phase 2b studies with novel regimens with proposed shorter durations; and (3) quantitative estimates of the rate of change of markers associated with durable cure at specific timepoints to establish milestones for even shorter regimens. The hypothesis that patients with less severe disease can be cured earlier was tested in a separate treatment shortening trial, in which only those with less severe disease and good initial treatment response were randomized to 4 vs. 6 months of treatment 6 . In this trial, less severe disease was defined as absence of pulmonary cavities on baseline chest x-ray, and good treatment response was defined as sputum culture conversion to negative by 2 months of treatment. Only participants who met these 2 criteria were randomized to 4 vs. 6 months of treatment with standard of care drugs. This trial was stopped early by its Data and Safety Monitoring Board (DSMB) because those in the 6-month arm had a relapse rate of 1.6% while the 4-month arm had a significantly higher relapse rate of 7% (P<0.01). 
Despite this trial not meeting its target, the disease stratification criteria did increase the treatment success rate of the 4-month arm from 80-85% in the prior non-stratified trials to 93% (95% CI 89%, 96%), approaching the success rates seen with 6 months of treatment. If the stratification criteria could be further refined, a successful 4-month treatment arm may be feasible. The hypothesis of the Predict TB trial is that a combination of radiographic characteristics at baseline, the rate of change of these features at one month, and markers of residual bacterial load at 4 months will identify patients with tuberculosis who are cured within 4 months (16 weeks) of standard treatment.
Participants are stratified into treatment arms by the early treatment completion criteria, composed of baseline PET/CT criteria, the change in these measurements at week 4, and a minimum adherence dose count and Xpert cycle threshold as a marker of residual sputum bacterial load at week 16 (Table 2). Those with more severe disease (who do not meet all early treatment completion criteria) are placed in Arm A (standard of care arm, treatment completion at week 24). Those with less severe disease (who meet all early treatment completion criteria) are randomized at week 16 either to Arm B (standard treatment duration of 24 weeks) or Arm C (standard treatment shortened to 16 weeks). The third PET/CT scan is done at week 16 for participants in Arms B or C and is randomized to either week 16 or week 24 for participants in Arm A. All participants are followed to week 72 for final treatment outcomes (Figure 1), with sputum, blood, and urine samples collected per the schedule in Table 3. Study enrollment began in Cape Town in June 2017 and is expected to begin in Henan in October 2017. Enrollment is expected to take about 3 years and the study is expected to complete within 5 years.

Radiographic criteria
Baseline PET/CT:
• No total lung collapse of a single side, AND
• No pleural effusion, AND
• No single cavity air volume on CT scan >30 mL, AND
• CT scan hard volume (-100 to +100 HU density) <200 mL, AND
• PET total activity <1500 units
Week 4 PET/CT:
• All individual cavities decrease by >20% (unless cavity <2 mL), AND
• CT scan hard volume does not increase by >10% unless the increase is <5 mL, AND
• PET total activity does not increase by >30% unless the increase is <50 units

Adherence criterion
Minimum of 100 doses received by week 16

Figure 1. Predict TB study schematic.

Treatment adherence
Total treatment duration will be determined by dose counts. Participants will receive either 16 weeks of treatment (Arm C) or 24 weeks of treatment (Arms A and B). (Arm A participants may be treated longer at the discretion of the treating physician.) Participants receiving 16 weeks of treatment will receive 112 doses with a minimum total of 100 doses. Participants who do not meet this minimum dosing requirement within the week 16 visit window will not be eligible for randomization and will be moved to Arm A. Approximately 90% adherence is used, as missing more than this has been associated with an increased risk of poor outcomes 7. Participants receiving 24 weeks of treatment will receive 168 doses with a minimum total of 150 doses. Missed doses during the initial 8-week intensive phase will be added on to the end of the intensive phase, replacing continuation phase dosing. Arm B participants who do not achieve the minimum 150 doses within the Week 24 visit window will be allowed to complete a minimum of 150 doses even if this exceeds the visit window.
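As a minimal sketch of how the radiographic and adherence criteria listed in this section might be checked for one participant, the Python below encodes the thresholds given in the text. The class and function names are hypothetical, the reading that the <2 mL exemption applies to a cavity's baseline volume is an assumption, and the week 16 Xpert cycle threshold criterion is left out because its cutoff is not itemized here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scan:
    """Hypothetical summary of one PET/CT reading."""
    hard_volume_ml: float        # CT volume in the -100 to +100 HU range
    pet_total_activity: float    # PET total activity, in the protocol's units
    cavity_air_ml: List[float] = field(default_factory=list)  # one entry per cavity
    total_lung_collapse: bool = False
    pleural_effusion: bool = False


def meets_baseline(scan: Scan) -> bool:
    """Baseline PET/CT criteria: all conditions must hold."""
    return (
        not scan.total_lung_collapse
        and not scan.pleural_effusion
        and all(volume <= 30.0 for volume in scan.cavity_air_ml)
        and scan.hard_volume_ml < 200.0
        and scan.pet_total_activity < 1500.0
    )


def increased_too_much(before: float, after: float,
                       max_fraction: float, exempt_below: float) -> bool:
    """True if the rise exceeds max_fraction, unless the absolute rise is small."""
    increase = after - before
    if increase < exempt_below:
        return False
    return increase > max_fraction * before


def meets_week4(baseline: Scan, week4: Scan) -> bool:
    """Week 4 criteria; cavities are assumed to be listed in the same order."""
    for before, after in zip(baseline.cavity_air_ml, week4.cavity_air_ml):
        if before < 2.0:          # assumption: exemption uses baseline volume
            continue
        if after >= 0.8 * before:  # decrease must be greater than 20%
            return False
    if increased_too_much(baseline.hard_volume_ml, week4.hard_volume_ml, 0.10, 5.0):
        return False
    if increased_too_much(baseline.pet_total_activity,
                          week4.pet_total_activity, 0.30, 50.0):
        return False
    return True


def meets_radiographic_and_adherence(baseline: Scan, week4: Scan,
                                     doses_by_week16: int) -> bool:
    """Combine the radiographic criteria with the 100-dose minimum."""
    return (meets_baseline(baseline)
            and meets_week4(baseline, week4)
            and doses_by_week16 >= 100)
```

For instance, a participant whose largest cavity shrinks from 10 mL to 7 mL, whose hard volume and PET activity stay roughly stable, and who has taken 105 doses by week 16 would pass this check; the same participant with 99 doses would not.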
All possible and available forms of adherence monitoring are encouraged as much as local resources allow. This includes (but does not require) directly observed therapy (DOT), whether by a healthcare worker, an outreach worker, or a family member. The vast majority of participants will not receive formal DOT; these participants will be provided an electronic pill box, the Medication Event Reminder Monitor (MERM; Wisepill Technologies, South Africa), which has been shown to improve treatment adherence among TB patients in China 8 . The MERM is a box that stores dispensed medication and also contains a cartridge that monitors box openings and sounds an alarm daily at a set time to remind participants to take their medicine. Box open/close data will be downloaded on follow-up visits.

Treatment outcome definitions
Treatment success is defined as a participant with at least 2 consecutive negative cultures on solid medium over a span of at least 4 weeks, achieved by the end of therapy, with no subsequent confirmed positive cultures during follow-up. Participants who remain culture positive on solid medium at Week 24 in Arm A will be considered treatment failures, will be taken off study as meeting a study endpoint and referred to continue treatment per the local standard of care (SOC). Participants who convert to solid culture negative who subsequently have a single solid culture positive for Mtb before or at week 24 need to have a subsequent culture positive for Mtb to be confirmed as treatment failures. Isolated positive cultures that are not confirmed on a subsequent sputum sample are not considered failures as these may have arisen from processing error or laboratory contamination 9 . Solid culture results will be used for the primary endpoint analysis. Liquid culture results may be used for secondary analyses.
Participants randomized to Arms B or C who are subsequently found to have a positive culture for Mtb on solid medium from weeks 16-24, confirmed on a subsequent culture, will be considered treatment failures. These participants will be referred to continue treatment per local SOC and will be followed observationally until the end of their treatment to determine outcomes. Participants who convert their sputum to culture negative (2 consecutive negatives over ≥4 weeks) and who subsequently become culture positive for Mtb again on solid medium during follow-up after week 24, confirmed by a second positive sputum culture, will be considered recurrences. Isolated positive cultures that are negative on follow-up will not be considered recurrences. Relapses will be distinguished from re-infections by DNA strain typing and only relapses will be considered a study endpoint.
Participants who are treatment failures and relapses will have drug sensitivity testing done to inform subsequent treatment. Relapses on Arms B and C will have observational follow-up until the end of retreatment to determine outcomes.
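The outcome definitions above can be read as a small decision procedure over a participant's solid culture results. The following is a rough sketch under stated assumptions, not the study's analysis code: the function names are hypothetical, a positive culture is treated as confirmed only when the very next culture is also positive, and the strain typing step that separates relapse from re-infection is not modeled.

```python
from typing import List, Optional, Tuple

Culture = Tuple[int, bool]  # (study week, solid culture positive for Mtb?)


def conversion_week(cultures: List[Culture], end_of_therapy: int) -> Optional[int]:
    """Week at which 2+ consecutive negatives first span at least 4 weeks."""
    run_start = None
    for week, positive in sorted(cultures):
        if week > end_of_therapy:
            break
        if positive:
            run_start = None          # a positive result resets the run
        elif run_start is None:
            run_start = week
        elif week - run_start >= 4:
            return week
    return None


def first_confirmed_positive(cultures: List[Culture], after_week: int) -> Optional[int]:
    """First positive after conversion that the next culture confirms."""
    later = [c for c in sorted(cultures) if c[0] > after_week]
    for (week, positive), (_, next_positive) in zip(later, later[1:]):
        if positive and next_positive:
            return week
    return None


def classify(cultures: List[Culture], end_of_therapy: int,
             failure_cutoff: int = 24) -> str:
    """Return 'success', 'failure', or 'recurrence' for one participant."""
    converted = conversion_week(cultures, end_of_therapy)
    if converted is None:
        return "failure"
    positive_week = first_confirmed_positive(cultures, converted)
    if positive_week is None:
        return "success"              # isolated positives are ignored
    return "failure" if positive_week <= failure_cutoff else "recurrence"
```

A result of "recurrence" here would still need DNA strain typing before it could be counted as a relapse endpoint.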

Statistical analyses
This is a non-inferiority study, with the primary endpoint being a comparison of the rate of treatment successes at 18 months (after treatment initiation) between Arms B and C. Final study treatment outcome data from participants who are unable to return at 18 months but do return during the 1 year following will be imputed back to the 18-month time point for the primary endpoint.
The primary analysis will estimate the lower bound of a 95% confidence interval of the difference in success rates between Arms B and C. If the lower bound is greater than -7%, this will be evidence that the treatment-shortening arm is not inferior to the standard duration arm. Confidence intervals will be constructed using Wald intervals, with inverse weighting according to site-estimated variances, as a stratified analysis. Additional analyses of the primary endpoint will consider a non-stratified confidence interval of the difference.
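To make the stratified comparison concrete, here is a minimal sketch of an inverse-variance weighted Wald interval for the difference in success rates (Arm C minus Arm B), pooled across sites. The function names and the example counts in the usage note are hypothetical, and the handling of a site with zero estimated variance (raising an error) is an assumption; the statistical analysis plan may treat that case differently.

```python
import math
from typing import List, Tuple

# (successes in Arm C, n in Arm C, successes in Arm B, n in Arm B) for one site
SiteCounts = Tuple[int, int, int, int]


def stratified_wald_ci(sites: List[SiteCounts], z: float = 1.96):
    """Pooled difference (C minus B) with inverse-variance site weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for success_c, n_c, success_b, n_b in sites:
        p_c = success_c / n_c
        p_b = success_b / n_b
        variance = p_c * (1 - p_c) / n_c + p_b * (1 - p_b) / n_b
        if variance == 0:
            # Wald weights are undefined when both arms are 0% or 100%
            raise ValueError("site variance is zero; weight is undefined")
        weight = 1.0 / variance
        weighted_sum += weight * (p_c - p_b)
        weight_total += weight
    difference = weighted_sum / weight_total
    standard_error = math.sqrt(1.0 / weight_total)
    return (difference,
            difference - z * standard_error,
            difference + z * standard_error)


def is_noninferior(lower_bound: float, margin: float = 0.07) -> bool:
    """Noninferiority is supported when the lower bound is above -margin."""
    return lower_bound > -margin
```

With made-up counts such as `stratified_wald_ci([(66, 70, 68, 70), (67, 70, 67, 70)])`, the returned lower bound can be passed to `is_noninferior` to apply the -7% margin.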
The sample size is determined for the comparison between Arms B and C. Because these are lower risk participants, we expect a treatment success rate of 97%. Table 4 provides power calculations for a total enrollment of 117 and 140 per group. With true success rates of 97% in both arms, study power is greater than 90% with only 117 participants per group. However, to increase power to accommodate a scenario in which the true success rate in the four-month treatment arm is slightly lower than the six-month arm, a sample size of 140 per treatment arm was selected, corresponding to 155 subjects per arm after adjusting for a 10% loss to follow-up. We expect that approximately 50% of participants will be classified as higher risk and be placed into Arm A, giving a total study sample size of 620 participants.
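The enrollment figures in the preceding paragraph follow from simple arithmetic, sketched below. The function name is hypothetical, and the inputs are the values stated in the text (155 enrolled per randomized arm, 10% loss to follow-up, about 50% of participants placed in Arm A).

```python
def enrollment_plan(enrolled_per_arm: int = 155,
                    loss_to_follow_up: float = 0.10,
                    arm_a_fraction: float = 0.50):
    """Return (evaluable per arm, total randomized, total enrolled)."""
    # Participants per arm expected to remain after loss to follow-up
    evaluable_per_arm = enrolled_per_arm * (1 - loss_to_follow_up)
    # Arms B and C together
    randomized_total = 2 * enrolled_per_arm
    # Randomized participants are the share NOT placed in Arm A
    total_enrolled = randomized_total / (1 - arm_a_fraction)
    return evaluable_per_arm, randomized_total, total_enrolled
```

With the defaults this gives about 139.5 evaluable participants per arm (in line with the target of 140), 310 randomized participants, and 620 enrolled overall.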
PK-MIC substudy
Typically, bacteria are termed "resistant" to a drug if the bacteria are able to grow at concentrations above the established "breakpoint" for that drug and resistance is a well-known determinant of TB treatment outcome 10. Conventional drug susceptibility testing for isoniazid and rifampin will be employed during the trial to confirm that patients have susceptible isolates. The minimum inhibitory concentration (MIC) for a specific bacterial isolate is the drug concentration at which the growth of the bacteria is inhibited by approximately 95%. For each patient enrolled into the substudy, we will determine the specific isoniazid and rifampin MIC for their Mtb isolate. In addition, TB patients are known to have widely variable serum PK values, and these differences appear to affect treatment outcome 11-13. Because a given patient's serum drug concentration achieved will affect the clinical interpretation of a given MIC result, we hypothesize that a model incorporating both of these parameters may predict outcomes better than either one alone. This hypothesis will be tested in a substudy among study participants believed to be at higher risk of relapse based on preliminary data: those who move to Arm A due to an inadequate treatment response on the week 4 PET/CT scan. After substudy informed consent is signed, two substudy visits will occur where a baseline blood sample is drawn, TB medication for that day is dosed, then blood is again drawn at 1, 2, and 6 hours post-dose for pharmacokinetic (PK) analysis for isoniazid and rifampin. For every Arm A participant who agrees to join the substudy, a control participant from the combined B/C arm will also be recruited to join. This will be a convenience sample of participants willing to participate in the substudy and no specific sample size is targeted.

Data and Safety Monitoring Board (DSMB)
The standing NIAID DSMB with three global TB experts added as ad hoc members will provide oversight of the study. The DSMB will meet at least twice per year to evaluate safety, study conduct, and scientific validity and integrity of the trial.

Ethical statement
Informed consent is conducted in the local language of the participant (Chinese, English, Afrikaans, or Xhosa). The study radiation dose was reviewed and approved by the Radiation Safety Committee of the U.S. National Institutes of Health (NIH). The study protocol was reviewed and approved by the institutional review board (IRB) of the National Institute of Allergy and

Discussion
With currently available drugs for TB, all published trials attempting to shorten TB treatment below 6 months have failed.
The trial that came closest to success randomized only those with less severe disease and favorable early treatment response (baseline chest x-ray without cavity and a month 2 sputum culture that had converted to negative). Multiple tuberculosis biomarkers potentially predictive of treatment outcomes are being studied to improve the risk stratification of patients 14. From a microbiological standpoint, sputum culture conversion at 2 months of treatment is most commonly used to predict non-relapsing cure 15 but its true predictive ability is poor, with one meta-analysis showing a pooled sensitivity and specificity for predicting relapse of 40% (95% CI 25%-56%) and 85% (95% CI 77%-91%), respectively 16. A review of data from the BMRC trials from the 1970s and 1980s found only a weak correlation (R² = 0.36) for this marker as a surrogate for treatment failure and relapse, depending on factors such as geographic location, baseline disease and cavity status, and concomitantly used medications 17. Further evidence against using culture conversion as a surrogate for predicting treatment outcome was recently demonstrated from the phase 3 TB treatment shortening trial REMoxTB 4, where subsequent analyses of the culture data collected demonstrated poor correlation with treatment outcome whether analyzed at a single time point (2 months) or over time (time to culture conversion or time to culture positivity) 18.
The ability of early radiographic changes to predict subsequent treatment outcomes in TB has been recognized for over 50 years 19 and prior studies have identified baseline cavity on chest x-ray as a risk factor for relapse 20-22. However, chest x-rays are not sensitive for cavities, particularly smaller ones; more recent analyses of radiographic biomarkers have moved beyond chest x-ray to 2-deoxy-2-[18F]-fluoroglucose (FDG) positron emission tomography/computed tomography (PET/CT) as an early marker of treatment response and possibly as a marker for relapse at the end of treatment. CT scans produce more detailed lung morphology than chest x-rays and PET scans provide additional information on inflammatory activity. In macaques, changes on PET/CT scans correlate with TB disease activity and treatment response 23,24. Our group has analyzed human PET/CT data from a randomized clinical trial using metronidazole in the treatment of pulmonary multi-drug resistant tuberculosis (MDR-TB) participants 25. As a substudy within the overall MDR-TB study, we performed PET/CT scans at 0 and 2 months and CT scans at 0, 2, and 6 months of treatment and correlated these changes with final treatment outcomes 30 months after treatment start (6 months after the end of therapy). PET changes at 2 months and CT changes at 6 months appeared to be more sensitive to predict final treatment outcomes than sputum culture conversion at 2 months, although these differences were not statistically significant 26. These results support the potential of PET/CT imaging biomarkers as possible surrogate endpoints in clinical trials, and larger cohorts are needed to confirm these results.
We developed the radiographic early treatment completion criteria for the Predict TB trial (Table 2) (unpublished study; National Institutes of Health, Bethesda, MD, USA) using a cohort of 100 pulmonary DS-TB participants from Cape Town who received PET/CT scans at baseline, 1 month, and 6 months while on standard therapy through the national TB treatment program 27 . The participants were treated for 6 months, then followed through 18 months for final treatment outcomes. In total, 92 participants had complete PET/CT scan data, Xpert MTB/RIF cycle thresholds, and treatment outcomes. After 18 months of follow-up, 73 were considered cured, 8 failed treatment, and 11 were restarted programmatically on TB treatment, defined as the participant restarting treatment during follow-up for any reason. Our radiographic early treatment completion criteria are divided into baseline disease burden and week 4 reduction in disease burden due to treatment, which is reflective of the Johnson 2009 study which also had a measure of baseline disease burden (baseline chest x-ray without cavity) and treatment response (month 2 sputum culture conversion).
Currently, the only direct measure of TB sputum bacterial load is sputum smear, which is not very sensitive. Alternative surrogate markers evaluated include the time to positivity (TTP) of a positive culture on liquid mycobacterial culture systems, with the shorter TTP indicating a higher bacterial load. Different studies have demonstrated some correlation between Mycobacteria Growth Indicator Tube (MGIT) TTP and sputum bacterial load but with poor specificity in predicting treatment outcomes 28,29 . Thus, although time to culture conversion and MGIT TTP do correlate independently with treatment outcome, these markers do not discriminate well between high and low risk patients and therefore have only a limited role in predicting treatment outcomes of individual patients 18 . Another marker of sputum bacterial load is the GeneXpert cycle threshold. The GeneXpert assay is an automated rapid molecular diagnostic test for Mtb and resistance to rifampin with results provided directly from sputum within 2 hours 30 . The test is run using a polymerase chain reaction and the number of cycles (cycle threshold) at which the amplification curve crosses the specified threshold is recorded, with a lower cycle threshold suggestive of a higher bacterial load. In the study mentioned previously of 100 pulmonary DS-TB participants in Cape Town, a week 24 Xpert MTB/RIF cycle threshold of ≥30 predicted treatment failure with higher sensitivity and specificity than earlier time points 36 . This is in contrast to the week 8 culture, which has lower sensitivity (for cure) than Xpert MTB/RIF cycle threshold at week 24: 61% (week 8 culture) vs 89% (Xpert MTB/RIF cycle threshold week 24, p<0.01).
The estimated specificity (for failure) of the week 24 Xpert MTB/RIF cycle threshold was higher than that of week 8 culture (88% vs 50%), but the improvement was not statistically significant. The poor performance of week 8 culture data as a predictor of treatment outcome is similar to what was observed in the REMoxTB trial 18.
The observation that a later test predicts outcomes better than an earlier test may be similar to findings in HIV infection, where baseline CD4 cell count is a strong predictor of mortality over time but current CD4 cell count is even stronger 37 . This has been seen in TB too, where culture conversion status at month 6 predicts final treatment outcome significantly better than culture conversion status at month 2 38 . Taken together, these results suggest that Xpert MTB/RIF cycle thresholds collected later may be able to replace an earlier microbiological culture in predicting treatment outcomes, with the major advantage of Xpert MTB/RIF over culture being the time to test result, with Xpert requiring 2 hours and culture up to 6 weeks. Thus, Xpert cycle threshold may be useful as a point-of-care test whereas culture cannot.
In summary, the Predict TB trial builds upon previous trial results, in particular the Johnson 2009 trial 6. Instead of using a chest x-ray to determine baseline disease burden, we will use a PET/CT scan. Instead of using month 2 culture conversion as a measure of treatment response, we will use month 1 change in PET/CT scan disease burden and a month 4 Xpert MTB/RIF cycle threshold. Prior treatment shortening trials that randomized all subjects to shortened vs. standard treatment achieved treatment success rates in the 4-month arms of roughly 80-85%. Using a risk stratification approach, the Johnson study increased this to 93%. By refining their risk stratification parameters, we hypothesize that the treatment success rate in our 4-month arm will be non-inferior to our 6-month arm. If successful, our methodology could be extended to identify participants cured with even shorter regimens. We do not expect that PET/CT scans will become a risk stratification tool for global TB use because of their cost and because their availability is limited to larger cities. Use of PET/CT scans would likely be limited to clinical trials but could be a method to stratify trial participants and possibly become a surrogate endpoint by which to reduce the number of participants needed in a Phase 2b trial, shorten overall trial duration or predict drug sterilizing activity such as in early bactericidal activity studies. These achievements could expand the number of regimens evaluated in Phase 2b trials prior to committing to an expensive and time-consuming Phase 3 study and thereby contribute to the likelihood of identifying optimal regimens. If the Predict TB trial is successful, other immunological biomarkers or transcriptional signatures that correlate with treatment outcome may be identified. These markers or signatures will likely be much cheaper and more widely available than PET/CT scans and more amenable to being scaled up globally.

Summary
Chen and colleagues have provided a well-written protocol describing a randomized multi-site trial to determine if 4 months of treatment is noninferior to standard 6 months in low-risk pulmonary tuberculosis patients. The overall aim of this study is to identify patients with earlier cure through previously researched biomarkers of GeneXpert cycle thresholds and imaging markers on PET/CT scans.
Patients will be stratified by defined early treatment criteria, which incorporates initial PET/CT and imaging changes at 4 weeks, medication adherence, and GeneXpert cycle thresholds at 4 months. Those identified as low-risk will be randomized to 4 versus 6 months of treatment, while patients classified to have severe disease will complete a full 6 months of treatment. The primary endpoint is defined as treatment success at 18 months, with detailed attention to determine relapse versus repeat infection to clearly define treatment failure.
Chen et al provide a sound rationale for the trial based on multiple pre-clinical and clinical studies. Methods are well written, and supplemental material provides further detail which would allow easy reproducibility. The statistical analyses described are appropriate for the trial proposed. Overall, we have only a few minor comments.

Details
Introduction
The authors provide a well-written summary of how standard-of-care for pulmonary tuberculosis was defined as 6 months, and recent trials conducted which demonstrate unacceptable rates of relapse when treatment is shortened to 4 months.

Methods
The study design is clearly defined for study sites, inclusion and exclusion criteria, data collection and responsibilities, and methods to address adherence. In addition, statistical analysis is well explained, and power calculations provided for support of aimed sample size. A few clarifications would be helpful:
Method of randomization appears to be only in supplemental protocol, and would be useful to list it in the main proposal.
Are PET/CT readers blinded to the patient randomization?
The range of weight in inclusion criteria is broad - this could further be noted in the PK-MIC substudy.
Please consider adding additional info on analysis of PK samples (location, methods etc).

Discussion
The discussion is well-written and provides further detailed insight into the design approach. With this discussion, the reader can understand the cut off of GeneXpert cycle threshold of >30. Additional detail on the reasoning for timing of PET/CT at 4 weeks rather than a later date would be helpful. Reference 26 demonstrated changes on PET at 8 weeks compared to baseline was a better predictor of cure than liquid culture in MDR-TB patients. While reference 27 utilized 4 week PET/CT rather than 8 week in drug-sensitive patients, the authors described persistent PET activity likely related to inflammation and not necessarily active infection, which would be ongoing after 4 weeks of treatment.
Is the rationale for, and objectives of, the study clearly described? Yes

Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Ray Chen
We thank the reviewers for their review and comments. We have responded point by point below.

Methods
The study design is clearly defined for study sites, inclusion and exclusion criteria, data collection and responsibilities, and methods to address adherence. In addition, statistical analysis is well explained, and power calculations provided for support of aimed sample size. A few clarifications would be helpful: Method of randomization appears to be only in supplemental protocol, and would be useful to list it in the main proposal.
The method of randomization is described in the statistical analysis plan, although it is our practice to keep precise details (e.g., block size) restricted. In brief, participants who meet baseline, week 4, and week 16 early completion criteria are block randomized (stratified by site) to Arm B (continue treatment to week 24) or Arm C (complete treatment early at week 16).
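For readers unfamiliar with the approach, permuted block randomization stratified by site can be sketched as follows. This is an illustration only: the block size of 4, the seeds, and the function name are hypothetical placeholders, since the actual block size is deliberately not disclosed, and the trial's real allocation system is not described here.

```python
import random
from typing import Dict, List, Optional


def block_schedule(n_blocks: int, block_size: int = 4,
                   seed: Optional[int] = None) -> List[str]:
    """Build an allocation list of 'B'/'C' balanced within each block."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    schedule: List[str] = []
    for _ in range(n_blocks):
        block = ["B"] * (block_size // 2) + ["C"] * (block_size // 2)
        rng.shuffle(block)   # random order within the block
        schedule.extend(block)
    return schedule


# One independent schedule per site gives stratification by site.
schedules: Dict[str, List[str]] = {
    site: block_schedule(n_blocks=3, seed=index)
    for index, site in enumerate(["Cape Town", "Henan"])
}
```

Because each block contains equal numbers of both arms, allocation stays balanced within every site even if enrollment stops partway through the schedule, up to at most half a block.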

Are PET/CT readers blinded to the patient randomization?
For randomization to either Arm B or C, blinding is not a concern, as patient randomization does not occur until after the reader scores are entered into the database. Indeed, these results are needed to determine risk classification, which determines eligibility for randomization. Furthermore, PET/CT readers are blinded to other readers' results.
The other randomization element in this study occurs amongst arm A participants. The timing of imaging scans for arm A participants is randomized to either 16 or 24 weeks. PET/CT readers are not blinded to the timing of the scan.
The range of weight in inclusion criteria is broad - this could further be noted in the PK-MIC substudy.
This will be noted in future publications related to the PK-MIC substudy.

Please consider adding additional info on analysis of PK samples (location, methods etc).
In Cape Town, PK sample analysis will be performed on plasma samples shipped to the Rutgers New Jersey Medical School in Newark, New Jersey, USA. In China, PK sample analysis will be performed on plasma samples shipped to Shanghai Public Health Clinical Center in Shanghai. The samples will be analyzed by high pressure liquid chromatography coupled to a tandem mass spectrometer as previously described (Antimicrob Agents Chemother. 2012 Jan; 56(1): 446-457. doi: 10.1128/AAC.05208-11). The equipment and methods used by the two labs are currently harmonized and will be revalidated as producing similar results prior to protocol sample analysis. Data processing will be performed using a current version of Analyst Software (Applied Biosystems Sciex).

Discussion
The discussion is well-written and provides further detailed insight into the design approach. With this discussion, the reader can understand the cut off of GeneXpert cycle threshold of >30. Additional detail on the reasoning for timing of PET/CT at 4 weeks rather than a later date would be helpful. Reference 26 demonstrated changes on PET at 8 weeks compared to baseline was a better predictor of cure than liquid culture in MDR-TB patients. While reference 27 utilized 4 week PET/CT rather than 8 week in drug-sensitive patients, the authors described persistent PET activity likely related to inflammation and not necessarily active infection, which would be ongoing after 4 weeks of treatment.
Our decision to use a week 4 PET/CT scan was based primarily on data from reference 27, which concerned drug-sensitive patients; the 8-week changes in reference 26 were in MDR patients. The overall slower response in MDR patients compared to drug-susceptible patients suggests that a similarly outcome-predictive timepoint in drug-susceptible patients would be earlier than 8 weeks. Therefore, we chose to use the 4-week changes in drug-susceptible patients for this study. Although inflammation is still ongoing at week 4 compared to later time points, there appeared to be enough change from baseline to week 4 to be predictive of final treatment outcomes. Later time points may have less inflammation and possibly be more predictive, but a certain amount of inflammation still persists even 1 year after completion of treatment in apparently cured patients, as noted in reference 27. Our goal therefore was to identify an early time point that was still predictive of final treatment outcomes.
It is of note that patients are included based on results of a GeneXpert test but without reference to their smear status; it would be beneficial to include the results of the baseline smear in the analysis to ascertain whether it predicts outcome as well as it did in the Hong Kong study. Additionally, it may be worth considering including a minimum cycle threshold for inclusion into the trial.

Our goal is to include participants with a wide range of TB disease severity. Hence, we use GeneXpert positive as an inclusion criterion regardless of smear status. We do collect enrollment smear status so this can be analyzed. For GeneXpert cycle threshold, the inclusion criterion states that this test must be positive, which equates to a cycle threshold of <39.
There is no discussion of why HIV-infected patients have been excluded from the study. Published reviews such as those by Khan in 2010 and 2012 suggested that HIV-positive patients may require a longer duration of chemotherapy than those who are uninfected although much of the data derived from studies before the era of universal ART for patients who were coinfected. Inclusion of patients from South Africa with HIV coinfection receiving ART would have been particularly informative in view of the high proportion of patients affected. A lower bacillary load and limited evidence of cavitated disease may well have different implications in this patient group.

Adherence criteria
Reference 7, on which the rationale for requiring approximately 90% of prescribed treatment to have been received is based, is a study of patients with MDR-TB. Recent work by Rada Savic and colleagues has demonstrated this to be the case in patients receiving standard treatment as in the current trial. Use of MERM will presumably provide valuable data on the extent to which patients have opened the electronic pill box throughout the course of treatment. It does not mean the patients will always have taken their treatment every time they open the box! Are there any plans to ask patients about adherence either during the trial or at an exit interview, which can prove to be a more informative measure?

Detailed adherence data will be collected at every visit while the participant is on treatment, including participant self-reported adherence, pill counts, and MERM data.
Treatment outcome definitions
I note the fact that solid media is being used for the primary analysis but liquid media may be used for secondary analysis. What was the rationale for using solid media?

Solid medium was selected as the primary endpoint to reduce variability associated with false positive liquid culture results possibly due to laboratory cross-contamination. We note that the reviewer's recent publication (Phillips et al. BMC Medicine (2017) 15:207) comparing solid vs. liquid culture results from the REMoxTB trial found that such laboratory cross-contamination did sometimes occur, although additional factors also may have played a role. Using liquid culture results as the primary endpoint would also have been a reasonable choice.
With regard to patients who have a single positive culture either at their last scheduled visit or prior to defaulting, presumably this will be sufficient to classify such patients as having an unfavourable outcome, failure during treatment, or recurrence after stopping?

Participants who default will not be eligible for randomization. Participants who are randomized but then have a single positive culture on their last visit and cannot be brought back for a confirmatory culture will have their single positive culture strain typed. If the strain type is consistent with their original TB strain, this participant will be classified as "unfavorable." If the strain type is consistent with a new TB infection, the participant will be classified as "favorable."
How will other patients who are lost during treatment be classified? Unassessable or unfavourable? Will those lost during follow-up be regarded as unassessable in the analysis?

As indicated above, participants who are lost during treatment are not eligible for randomization and will not be included in the primary analysis. Participants lost during follow-up will be considered "unassessable" for the primary analysis. The number of randomized participants who are lost during follow-up is anticipated to be small, given the eligibility requirements for randomization. However, this will be closely monitored during the study. Additionally, sensitivity analyses will evaluate the impact of considering these participants as unassessable, taking into account the time at which each participant was lost to follow-up and their status at that time.
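The classification rules stated in the two responses above can be summarized as a short decision sketch. This is only an illustration of the rules as worded here, not the trial's analysis code: the function name, argument names, and the `Outcome` labels are hypothetical, and cases that the responses do not address (for example, a single positive culture that cannot be strain typed) are deliberately left undecided.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    FAVORABLE = "favorable"
    UNFAVORABLE = "unfavorable"
    UNASSESSABLE = "unassessable"
    EXCLUDED = "not randomized; excluded from primary analysis"


def classify_special_case(
    randomized: bool,
    lost_during_follow_up: bool,
    unconfirmed_single_positive: bool,
    strain_matches_baseline: Optional[bool] = None,
) -> Optional[Outcome]:
    """Apply the special-case rules from the responses (hypothetical helper).

    Returns None when none of the stated rules decides the case.
    """
    # Participants who default or are lost during treatment are never randomized.
    if not randomized:
        return Outcome.EXCLUDED
    # A single positive culture at the last visit with no confirmatory culture
    # is resolved by strain typing.
    if unconfirmed_single_positive:
        if strain_matches_baseline is None:
            return None  # not addressed in the responses
        if strain_matches_baseline:
            return Outcome.UNFAVORABLE  # consistent with the original strain
        return Outcome.FAVORABLE  # consistent with a new infection
    # Randomized participants lost during follow-up.
    if lost_during_follow_up:
        return Outcome.UNASSESSABLE
    return None  # ordinary cases follow the protocol's outcome definitions
```

The single-positive-culture rule is checked before the lost-to-follow-up rule because, as described, it applies precisely to participants who cannot be brought back for a confirmatory culture; that ordering is an interpretation rather than something the responses state explicitly.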

Summary
The protocol of this PREDICT TB study by Chen and colleagues describes a prospective, multi-site, non-inferiority, randomized trial of treatment shortening in pulmonary TB (PTB) patients stratified by refined early treatment completion criteria. The aim is to identify low-risk PTB patients who could be cured with a four-month course of anti-TB therapy (compared to the standard six-month regimen) by a combination of radiographic characteristics and markers of residual bacterial load, i.e. PET/CT scan results at baseline and changes after one month of treatment, and GeneXpert Ct values, a predictor of bacterial load.
All eligible PTB patients will commence the standard treatment regimen, while participants who meet ALL criteria for early treatment completion as defined will be randomized at 16 weeks to either continue treatment to the full 24 weeks or to complete treatment at 16 weeks, with a clearly defined follow-up protocol. PTB patients who do not meet the early treatment completion criteria at baseline, or who fail the early treatment completion criteria at 1-month follow-up, will continue standard treatment to 24 weeks. The primary endpoint is the treatment success rate at 18 months, compared between the randomized groups.
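The allocation just summarized can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the trial's randomization procedure: the `Participant` fields and `assign_arm` are hypothetical names, each criterion is reduced to a single boolean whose definition lives in the protocol, and simple 1:1 randomization is assumed because the allocation method is not described in this report.

```python
import random
from dataclasses import dataclass

# Planned treatment duration per arm, in weeks, as described in the protocol.
TREATMENT_WEEKS = {"A": 24, "B": 24, "C": 16}


@dataclass
class Participant:
    # Hypothetical flags; the protocol defines how each criterion is assessed.
    baseline_pet_ct_criterion_met: bool
    week4_pet_ct_response_met: bool
    week16_xpert_criterion_met: bool


def assign_arm(p: Participant, rng: random.Random) -> str:
    """Return 'A', 'B', or 'C' for a participant still on treatment at week 16."""
    meets_all = (
        p.baseline_pet_ct_criterion_met
        and p.week4_pet_ct_response_met
        and p.week16_xpert_criterion_met
    )
    if not meets_all:
        return "A"  # failing any criterion means 24 weeks of standard therapy
    # Assumed simple 1:1 randomization between continuing (B) and stopping (C).
    return rng.choice(["B", "C"])
```

For example, `assign_arm(Participant(True, True, False), random.Random())` returns `"A"`, since a participant who fails any one criterion is not randomized.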
This is a very well designed RCT with a clear and unambiguous hypothesis, which is based on rational and credible scientific data. The study is justified and attempts to answer a very topical question relating to the quest for shortened treatment of pulmonary TB patients.

Specific comments
Introduction: The protocol presents a good background to the proposed study with a review of previous studies that have attempted to shorten duration of TB treatment and the outcome. Background is well written, hypothesis is clear and credible and the authors have provided a clear justification for the RCT. However, we will suggest a revision of the hypothesis to state clearly that the focus of the study is specifically 'pulmonary tuberculosis'.

Methods:
The design of the RCT is clear and appropriate for the study aims and objectives in general. Particularly important is the very clear and unambiguous definition of criteria for early treatment completion. Having stated this, there are a number of minor methodological issues that need to be stated more clearly in the protocol: What was the sampling strategy for eligible participants at the study sites? There should be some statement about the qualification or specialization of the PET/CT readers to specify or indicate their minimum qualification and experience. Will the PET readers be blinded to the other patient characteristics, including the clinical and laboratory results, or not? Will the third reader who is to resolve any discrepant PET/CT scan be a more senior expert, or is this to be assumed? There is no explicit mention in the protocol of what happens to patients who pass the early treatment completion criteria at baseline and at 4 weeks follow-up, but are found smear/culture and/or Xpert positive at 8 weeks or at 12 weeks follow-up (smear/culture only). One can assume that the likelihood of this happening might be low, especially among the subjects with possibly less severe disease at baseline, but this is still a very possible scenario that should be considered in the study schematic. 'PK-MIC sub study': kindly spell out PK-MIC at first mention, although we believe it refers to 'pharmacokinetic'.

Statistical analysis
The statistical plan is well described and uses very appropriate methods to assess the comparisons of the primary endpoints between the trial arms. Detailed sample size calculations are also given.

Discussion:
Very well written

Conclusion:
This protocol by Chen clearly describes a well-designed randomized trial of well-refined severity stratification criteria that could potentially identify low-risk pulmonary TB patients who are cured within 4 months of standard TB therapy. Overall, this is a well-written and justifiable clinical trial protocol and we will recommend that it be accepted with minor corrections.
Is the rationale for, and objectives of, the study clearly described? Yes

Is the study design appropriate for the research question? Yes
Are sufficient details of the methods provided to allow replication by others? Yes

Are the datasets clearly presented in a useable and accessible format? Partly

Competing Interests: No competing interests were disclosed.

We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Ray Chen
We thank the reviewers for their comments. We have responded point by point below.

Specific comments
Introduction: The protocol presents a good background to the proposed study with a review of previous studies that have attempted to shorten duration of TB treatment and the outcome. Background is well written, hypothesis is clear and credible and the authors have provided a clear justification for the RCT. However, we will suggest a revision of the hypothesis to state clearly that the focus of the study is specifically 'pulmonary tuberculosis'.

Methods:
The design of the RCT is clear and appropriate for the study aims and objectives in general. Particularly important is the very clear and unambiguous definition of criteria for early treatment completion. Having stated this, there are a number of minor methodological issues that need to be stated more clearly in the protocol: What was the sampling strategy for eligible participants at the study sites?

In RSA, staff from the local TB clinics are aware of the study inclusion/exclusion criteria and refer newly diagnosed pulmonary TB patients who appear to fit the criteria to study staff for informed consent and screening. In China, local TB clinic staff are our study staff