A Decision-Theoretic Comparison of Treatments to Resolve Air Leaks After Lung Surgery Based on Nonparametric Modeling

We propose a Bayesian nonparametric utility-based group sequential design for a randomized clinical trial to compare a gel sealant to standard care for resolving air leaks after pulmonary resection. Clinically, resolving air leaks in the days soon after surgery is highly important, since longer resolution time produces undesirable complications that require extended hospitalization. The problem of comparing treatments is complicated by the fact that the resolution time distributions are skewed and multi-modal, so using means is misleading. We address these challenges by assuming Bayesian nonparametric probability models for the resolution time distributions and basing the comparative test on weighted means. The weights are elicited as clinical utilities of the resolution times. The proposed design uses posterior expected utilities as group sequential test criteria. The procedure's frequentist properties are studied by extensive simulations.


The motivating clinical trial
Intraoperative air leaks (IALs) occur in 48 to 75% of patients after pulmonary resection (Serra-Mitjans and Belda-Sanchís, 2005). Despite the routine use of intraoperative sutures and stapling devices, IALs remain a significant problem in the practice of thoracic surgery. IALs that persist beyond the immediate postoperative period of five days may result in longer chest tube drainage, greater postoperative pain, increased risk of infection, empyema, thromboemboli, and increased length of hospitalization (Merritt et al., 2010;Singhal et al., 2010). Air leaks are a particularly severe problem in patients with emphysematous lungs or who have undergone extensive visceral pleural denuding procedures, such as pleurectomy decortication. This is a surgical procedure in which the lining surrounding one lung first is removed (pleurectomy), and then any tumor masses that are growing inside the chest cavity are removed (decortication). In addition to the noted risks to the patient, the economic impact of a prolonged air leak is significant, primarily due to increased hospital stay. Because the standard procedure of suturing visible leaks and using staple reinforcement gives unpredictable results, an alternative technique to control IALs is the use of liquid sealants, which are thick fluids instilled in the areas of leaks. Progel (Neomend, Inc., Irvine, CA) is a polymeric biodegradable hydrogel sealant, that currently is the only FDA approved sealant to control IALs during pulmonary resection (Kobayashi et al., 2001).
Despite FDA approval, the true benefit of Progel in reducing the rate of occurrence or duration of IALs in lung resection patients has not been established, and therefore it is not used routinely. Researchers have conducted two studies comparing Progel (treatment group) with standard care (control group) to demonstrate the safety and efficacy of Progel (Allen et al., 2004;Klijian, 2012). Because the study of Allen et al. (2004) varied the application of Progel based on the size of the air bubbles seen in each patient, and the precise methodology of how this was done was not explained in sufficient detail to enable replication, the results of this trial are of limited use for a general comparison of Progel to standard care. The study of Klijian (2012) was retrospective and not randomized.
Given these limitations of existing data, the desire to obtain a prospective randomized comparison of Progel to standard care motivated the clinical trial described in this paper.
The trial has passed IRB (internal review board) approval and is scheduled to start accrual at The University of Texas M.D. Anderson Cancer Center.

Modeling considerations
Denote by T the days to resolve IALs, allowing the possibility that an air leak may not develop, represented by T = 0. Allen et al. (2004) and Klijian (2012) compared µ 0 and µ 1 , the means of T in a control and treatment group, respectively, using a standard t-test, and concluded that Progel was superior to standard care in reducing IALs. Figure 1  Center. The histogram suggests that a standard parametric model is inappropriate to describe air leak resolution time distributions. For example, a normal or log normal distribution would fail to allow for the observed multi-modality and late resolution times.
Moreover, some patients treated with Progel after resection may be free of air leaks immediately following surgery, corresponding to a positive probability mass at T = 0.
Let G 1 denote the distribution of T in the treatment (Progel) group and G 0 the distribution of T with the control (standard care). We will represent each G j , j = 0, 1 as a mixture of a point mass δ 0 at 0 and a hypothesized distribution M j for non-zero resolution times, with M 1 a left-shifted version of M 0 to formalize the assumption that, stochastically, IAL resolution times with Progel are no longer than with standard care. This order constraint is motivated by several medical considerations: Progel is inert, and thus it cannot react chemically with the patient's lung tissue, is not a potential source of infection, and does not slow down the healing process. Moreover, Progel cannot make an air leak worse because it does not contribute to air leak formation. These considerations motivate a priori stochastic ordering of G 1 and G 0 , which effectively says that, in terms of time to resolve an IAL, Progel may be better than standard care, but it cannot be worse. Nevertheless, for comparison we later report also inference under an otherwise equivalent model without the stochastic ordering constraint.
An important consideration in developing a trial design is that the use of an expected value as the target for a comparative test is inappropriate and inadequate, both because the historical distribution is skewed with a long right tail, and because a change in the early days after surgery is clinically more important than a comparable change in later days. Also, a standard test of µ 1 = µ 0 versus µ 1 < µ 0 would require an impractically large sample size to achieve any reasonable power. These complications are the principal reasons why designing a randomized trial to compare Progel to standard care is nontrivial, and why the use of Progel has not been widely accepted among surgeons who perform pulmonary resections.
The desire to obtain reliable confirmatory evidence to evaluate the comparative benefit of Progel motivates the randomized trial described in this paper. The goal of the trial is to assess the extent to which Progel is superior to standard care. The comparison also allows for the possibility no difference.

Stochastic ordering and Bayesian nonparametric priors
The time until resolution of air leaks for patients treated with Progel is a priori expected to be shorter than under standard care. This introduces a stochastic ordering constraint on G 0 and G 1 . Formally, a distribution G 1 is stochastically smaller than G 0 , denoted by G 1 G 0 , if the corresponding cumulative distribution functions satisfy F 1 (t) ≥ F 0 (t) for all t. Lehmann and Romano (2006) and Randles and Wolfe (1979) have modeled stochastic ordering parametrically. Although straightforward, these approaches are limited by the requirement that a parametric family must be specified.
To compare distributions of air leak resolution times, detailed features (e.g., skewed or multi-modal) of the distributions are important, leading us to consider a Bayesian nonparametric (BNP) approach. Importantly, uncertainties about the inference on these details are critical, as posterior probabilities about comparisons drive the decisions about sequential continuation and the terminal decision. Such descriptions of uncertainties are best considered in the framework of a probability model on the unknown distributions, as they are implemented in BNP models.
Formally, BNP refers to prior models for infinite dimensional unknown quantities.
Inference for random distributions, like G 0 and G 1 here, is a typical example. A common feature of BNP models is their large support, which allows one to approximate essentially arbitrary distributions (Ishwaran and James, 2001). For the proposed design, we use a model based on the Dirichlet process (DP) prior (Ferguson, 1973), which is by far the most commonly used BNP model for a random distribution. MacEachern (1999) introduced the dependent DP (DDP), which extended the DP to a probability model for a family {G x , x ∈ X} of random probability measures, indexed by some covariate x. The special case of a finite family, like {G 0 , G 1 } in our application, was discussed in De Iorio et al. (2004). Several authors have considered BNP models for stochastically ordered distributions. Gelfand and Kottas (2001) started with two DP random probability measures G 0 and G 1 , and used the product of the corresponding cumulative distribution functions to define a pair of stochastically ordered random probability measures. A general methodology for stochastic ordering by considering probability measures constrained to a convex set was proposed by Hoff (2003). Finally, Dunson and Peddada (2008) incorporated stochastic ordering constraints in the DDP prior. In this paper, we use a simple implementation of this finite DDP model with order constraint as the prior probability model for the distributions G 1 and G 0 of leak resolution times under Progel and control.
Details are discussed in Section 3. For more extensive reviews of BNP methods, see, for example, Hjort et al. (2010). Utility 100 50 10 6 5 4 3 2 0 Table 1: Elicited utility u(T ) for T = days to resolve intra-operative air leak.
We will show that the proposed design, based on differences in mean utilities evaluated from the posterior under the BNP model, has very desirable operating characteristics with n = 48 patients. The impact is the opportunity to establish what is expected to be a greatly superior treatment option for patients, with reasonable cost and effort.

Utilities and Trial Design
The primary outcome is T , the time (in days) to resolve an air leak in the lungs following surgery, and we define Y = log(T + 1). The possibility that an air leak may not develop is represented by T = Y = 0. The mean, or any other single measure of central tendency of Y , is not an appropriate summary for treatment comparisons. Instead, we take a utility-based approach. Utility-based decision criteria have been used recently in clinical trial designs (Thall and Nguyen, 2012;Lee et al., 2015). We use utilities to weight the importance of air leak resolution times after surgery. For example, a difference of a few days in time to resolution of air leaks in the days immediately after surgery is far more important than a comparable difference in later days. We performed a formal utility elicitation with our clinical collaborator RM. The rationale of the utility elicitation includes: 1) the most desirable resolution time is T = 0 (free of air leaks immediately after surgery, although this ideal outcome is almost never seen with standard care); 2) early (1 ≤ T ≤ 5) resolution of air leaks is very desirable and therefore the interval [1, 5] received a relatively high utility; 3) the utilities drop off steeply for later resolution times (T > 5). These considerations are both medical and economic, and they motivated the elicited utilities u(Y ) for Y = log(T + 1) in Table 1. In the table, the numerical utility 50 assigned to the outcome T = 5 days corresponds to the subjective assessment of RM that this comparatively favorable outcome is half as desirable as the ideal outcome of having no air leak at all. Similarly, the utility for T = 10 days reflects that this outcome, which involves a long hospital stay and the complications described earlier, is 1/5 as desirable as the outcome T = 5 days. We now are ready to define the expected utility for each group as where G j is a sampling model for the outcome in treatment group j. This expectation is over the distribution of the outcome T , and is conditional on the unknown distribution G j . We do not need to make any specific assumptions about G j yet, except for the existence of such a distribution.
Based on the probability model and the utilities of Table 1, we now define a design for the Progel trial. There are two types of decisions to be considered. At each interim test in the group sequential procedure, we make a stopping decision d i ∈ {0, 1} to stop (d i = 0) or continue (d i = 1). If we reach a predetermined maximum sample size, N, we set d N = 0 by definition. Upon stopping, a terminal decision a ∈ {0, 1} reports the final recommendation, with a = 1 denoting a recommendation for Progel and a = 0 for standard care. A decision-theoretic optimal solution would require backward induction (Bellman, 1957) to solve the full sequential decision problem. We stop short of carrying out this computationally prohibitive solution. Instead, we propose to conduct the trial as follows. Denote Y n = {Y ji , i = 1, . . . , n/2, j = 0, 1}, the observed data for the first n treated patients, that is n/2 patients in each group under the restricted equal randomization (rounding n/2 for odd n). The proposed decision criterion is the posterior probability where U ≥ 0 is a minimum clinically meaningful difference in expected utility. Because the sequential rule makes multiple decisions, as with any group sequential procedure the decision boundaries must be calibrated to control the design's overall false positive error rate. This is similar to the use of so-called alpha-spending functions in conventional frequentist group sequential designs. Like other frequentist summaries, false positive error rate (type-I error) is a probability under an assumed truth, with respect to repeated simulations of the entire trial. In the context of clinical trial designs such summaries under repeated simulations are also known as (frequentist) operating characteristics (OCs).
Because evaluating the design's OCs analytically is far too complex, we do this by repeated computer simulations of the design, under an array of different possible scenarios.
This follows routine practice in evaluating the behavior of sequentially adaptive clinical trial designs. In the present setting, the OCs are the type I error, mean sample size, and probabilities of different possible decisions (correct decision, stop due to futility, stop due to superiority). Details are reported in Tables 3 and S1.
After each cohort, we carry out Markov chain Monte Carlo posterior simulation and evaluate the posterior estimates η( U , Y n ). Let ξ U be an upper probability boundary for which the trial will be terminated early and the treatment arm declared superior Similarly, let ξ L be a lower boundary for which the trial will be terminated early due to futility, with the null hypothesis accepted, if η( U , Y n ) ≤ ξ L .
The bounds ξ L and ξ U are chosen by preliminary simulations to obtain a design with desirable frequentist OCs. In Section 4, we will illustrate how one may calibrate these bounds. In summary, the sequential stopping decision at any point of the trial is: d n = 1 if ξ L < η( U , Y n ) < ξ U ; d n = 0 otherwise.
Terminal decision rule. Upon stopping, we record the terminal decision a = 1 if η( U , Y n ) > 1 2 and a = 0 otherwise. Assuming that ξ L < 0.5 < ξ U , the rule simply records whether we stop due to crossing either the upper or lower bound, respectively. If the trial reaches the maximum number of patients, N = 48, the terminal decision uses the threshold η > 0.5 to determine a recommendation for Progel.

Model and properties
We now construct a prior probability model for G j , the sampling model for Y ij for patients under control (j = 0) and Progel (j = 1). Because some patients may be free of air leaks immediately following surgery, we allow a point mass at Y ji = 0 by defining G j , j = 0, 1, as mixtures where ∞ h=1 w h = 1. Also, we impose a constraint ν 10 ≥ ν 00 on the probabilities ν 10 and ν 00 , and M 1 M 0 , formalizing the prior belief that patients are more likely to be free of an air leak in the treatment group than in the control group. For M j = ∞ h=1 w h N (θ jh , σ 2 ), j = 0, 1, we use a DDP prior with common weights and dependent atoms. The common weights w h have the DP stick-breaking prior, The dependent prior on the atoms is constructed as follows, to ensure M 1 M 0 . We assume θ h = (θ 0h , θ 1h ) ∼ M , where M is a truncated multivariate normal base measure, including a positive probability κ for ties θ 0h = θ 1h : where N + (x | m, V ) refers to a truncated normal random variable x subject to x ≥ m, and κ = p(θ 0h = θ 1h ). For comparison we will also consider inference under a variation of model (3.2) without the order constraint, replacing the N + kernel by an unconstrained normal N (θ 1h , τ 2 ).
Denote M j = h w h δ θ jh , where δ θ jh denotes a point mass at θ jh , j = 0, 1. It is straightforward to show that M 1 M 0 , which implies M 1 M 0 , and this in turn implies G 1 G 0 , as desired. Barrientos et al. (2012) study the support properties of various DDP models. Applying Theorem 2 of Barrientos et al. (2012), it follows that the proposed model has full support over all pairs of stochastically ordered random probability measures.
For reference we state the complete model, We complete the model specification with choices for the hyperparameters ν j0 , ν j1 , σ 2 , κ, µ 1 , σ 1 , τ . In the context of clinical trial design, the hyperparameters should not introduce inappropriately strong information into the prior. To ensure this, we provide the following guidelines.
We first standardize the data by subtracting the sample meanȲ 1 of the Y 1i 's of the treatment group and scaling with the sample standard deviation s 1 , mapping Y ji → (Y ji − Y 1 )/s 1 . This is done to mitigate sensitivity to the measurement scale. We fix µ 1 = 0 and σ 1 = 1, to reflect the standardization. For σ 2 , we assume p(1/σ 2 ) = Ga(0.001, 0.001) to ensure that the prior is not too informative, where Ga(a, b) denotes a gamma distribution with mean a/b. To allow for a wide range of shifts in the response density, we specify p(1/τ 2 ) = Ga(0.5, 0.5). This implies a Cauchy distribution for θ 0h , which often is used as a robust choice in parametric models. To satisfy the constraint ν 10 ≥ ν 00 , we let ζ 0 = ν 00 , ζ 1 = ν 10 − ν 00 , and assume p(ζ 0 , ζ 1 ) = Dirichlet(0.1, 0.1, 0.1). Finally, we assume p(κ) = Beta(1, 1) and p(α) = Ga(1, 1). The conjugacy of the implied normal on θ h in (3.2) and the normal kernel in (3.3) greatly simplify posterior inference. Any Markov chain Monte Carlo (MCMC) scheme for DP mixture model as described, for example, in Neal (2000), can be applied. In our implementation, we used an implementation based on the finite DP (Ishwaran and James, 2001), which truncates the infinite sum in the DP mixture model after a finite number of terms. We used H = 10, following a recommendation based on Theorem 1 in Ishwaran and James (2002)

Trial Simulation Study
To assess average behavior of the proposed BNP trial design, we performed an extensive simulation study under a variety of scenarios that were constructed to mimic the Progel trial. For the proposed stopping and decision rules, we fixed the parameters as ξ U = 0.9, ξ L = 0.05, based on preliminary studies (described later) and examining the OCs of the proposed BNP design. In all scenarios, we set the maximum number of patients to be N = 48, randomized equally between the control and treatment group, with cohorts of 16 patients. The smallest clinically meaningful improvement used to define the decision criterion η( U , Y n ) was determined by our clinical collaborator (RM) to be U = 18, given the numerical utilities of IAL resolution times in Table 1.
We considered nine scenarios, and simulated 100 trials for each scenario. The response outcomes Y ji were generated from the simulation truth G o j shown in the last column of To calculate type I error and power, we define the null hypothesis H 0 : Under the proposed design, the test rejects H 0 in favor of Progel if η( U , Y n ) ≥ ξ U interimly with early stopping at n=16 or 32, and if η( U , Y N ) ≥ 0.5 for the terminal rule at N = 48. Similarly, the test fails to reject H 0 if η( U , Y n ) ≤ ξ L (n = 16, 32), with early stopping for futility, and if η( U , Y N ) < 0.5 at N = 48.
We fixed the hyperparameters as described earlier in Section 3 and fit the proposed BNP model (3.3) to each simulated data set. Table 3 3: Trial simulation results. MSS = mean sample size, TIE = type I error, PCD = probability of making the correct decision, P r(EarS) = probability of stopping early due to superiority, P r(EarF) = probability of stopping early due to futility, P r(FinS) = probability of declaring superiority in a final analysis without early stopping, and P r(FinF) = probability of declaring futility in a final analysis without early stopping. All probabilities are computed by repeated simulations.  Table S1.
The OCs under all nine scenarios are given in Table 3
We completed the model with a prior p(π j ) = Beta(0.1, 0.1) and a conjugate prior p(λ 2j ) = InvGa(b 1j , b 2j ). The hyperparameters b 1j and b 2j were determined by matching the prior mean of λ 2j with a maximum likelihood estimate and assuming a prior variance of 10. Finally, for λ 1j there is no conjugate prior. We followed Fink (1997) by assuming T ij ) + 2, and a 3j = 2, j = 0, 1. A final set of simulations explored robustness with respect to the decision boundaries ξ U and ξ L for the continuation decision. Table S3 summarizes OCs under Scenarios 2, 3, and 4. Again, while some summaries, like the probability of early stopping for futility, change in the expected direction, the nature of the overall comparison across scenarios remains unchanged under different criteria.

Conclusions and Discussion
We developed a Bayesian nonparametric (BNP) utility-based group sequential design to compare Progel with standard care in resolving air leaks after lung surgery. In this setting, Beyond the application discussed in this paper, the proposed BNP utility-based method can be extended to many other contexts. For example, in applications that involve multiple groups, one may replace the truncated bivariate normal base measure M * in (3.3) with a truncated multivariate normal distribution that incorporates the desired stochastic ordering constraints. Furthermore, the hypothesis testing framework discussed in Section 4 can be extended easily to testing equalities in multiple distributions that are stochastically ordered.
Finally, we note that the BNP model could be replaced by a sufficiently flexible parametric model without any substantial change in the performance of the proposed design. For example, one could use a mixture of H = 5 normals as the model. However, the computational effort for posterior simulation in any finite mixture of normal model is nothing less than in the proposed DDP model. We prefer the BNP model for reasons of conceptual clarity and, in principle, natural scaling to larger sample sizes and greater precision.