Bayesian nonparametric estimation of survival functions with multiple-samples information

In many real problems, dependence structures more general than exchangeability are required. For instance, in some settings partial exchangeability is a more reasonable assumption. For this reason, vectors of dependent Bayesian nonparametric priors have recently gained popularity. They provide flexible models which are tractable from both a computational and a theoretical point of view. In this paper, we focus on their use for estimating survival functions with multiple-samples information. Our methodology allows us to model the dependence among survival times of different groups of observations and extends previous work to arbitrary dimension. Theoretical results about the posterior behaviour of the underlying dependent vector of completely random measures are provided. The performance of the model is tested on a simulated dataset arising from a distributional Clayton copula.


Introduction
Bayesian nonparametric modelling in survival analysis problems often relies on the assumption that the observed times are exchangeable; see for example Doksum (1974) and Ishwaran and James (2009). Such an assumption fails to hold when we consider events that are pooled from different dependent scenarios. For example, consider patients under the same treatment but in different hospitals. The survival times of patients from the same hospital could be assumed exchangeable. On the other hand, this is not a reasonable assumption when we consider patients from different hospitals, since factors specific to each hospital might exert significant influence. In general, we can consider that the data originate from d different but related studies. Formally, we have d sets of observations where exchangeability is assumed only within each set. In the above cases, it is more appropriate to assume a form of dependence called partial exchangeability (see Section 2 for a formal account of exchangeability and partial exchangeability). This motivates the extension of Bayesian nonparametric models to a partially exchangeable setting where multiple-samples information can be used.
Applications of Bayesian nonparametrics in survival analysis go back, for example, to Doksum (1974) and Ferguson and Phadia (1979), who used nondecreasing independent increment processes to construct random survival functions. Dykstra and Laud (1981) and Lo and Weng (1989) focused on random hazard rates. More recently, Ishwaran and James (2009) used a general class of random hazard rate-based models, and Nieto-Barajas (2014) used a general short-term and long-term hazard ratios model. There is an ongoing effort in Bayesian nonparametrics to propose flexible dependent random probability measures, beginning with the seminal work of MacEachern (1999). In survival analysis, for example, De Iorio et al. (2009) introduced a model based on a dependent Dirichlet process. In a partially exchangeable setting, survival analysis models have been used, for example, in Epifani and Lijoi (2010), where a dependent two-dimensional extension of the neutral to the right (NTR) model was introduced, and in Lijoi and Nipoti (2014), where a dependent vector of hazard rates was constructed. Griffin and Leisen (2017) introduced a new class of vectors of dependent completely random measures, called Compound Random Measures, where the dependence contribution is modelled with a parametric distribution.
In the seminal work of Doksum (1974), the NTR model for survival functions was introduced. The NTR model can be expressed in terms of a Completely Random Measure (CRM) μ: when μ is evaluated at pairwise disjoint sets it gives rise to mutually independent nonnegative random variables. We say that a positive random variable Y has a NTR distribution given by a CRM μ, denoted Y ∼ NTR(μ), if

P(Y > t | μ) = exp{−μ((0, t])},   (1)

where μ is such that lim_{t→∞} μ((0, t]) = ∞. NTR distributions have several appealing properties, including the independence of normalized increments and posterior conjugacy for right-censored data. An extension of the NTR model to a partially exchangeable setting was given by Epifani and Lijoi (2010) for the 2-dimensional case. In the present work, we follow the approach of Epifani and Lijoi (2010) and focus on models based on a d-dimensional vector of completely random measures (VCRM). More precisely, we consider d collections of survival times {Y_i^(j) : i ∈ N \ {0}}, j ∈ {1, …, d}, such that, for t = (t_1, …, t_d) ∈ (R^+)^d,

P(Y_{i_1}^(1) > t_1, …, Y_{i_d}^(d) > t_d | μ) = exp{−Σ_{j=1}^d μ_j((0, t_j])},   (2)

with arbitrary i_1, …, i_d ∈ N \ {0}. This model is convenient because the dependence among the entries of the VCRM μ = (μ_1, …, μ_d) accounts for the dependence among the multiple samples in a partially exchangeable setting. Furthermore, marginally we recover the NTR model, namely Y_{n_i}^(i) ∼ NTR(μ_i) with i ∈ {1, …, d}, n_i ∈ N \ {0}. In (2) we want to model the dependence of the VCRM μ in a way that fixes the marginal behavior, so as to exploit the fact that marginally we recover a NTR model; Lévy copulas are a natural framework to model the dependence structure of VCRM's in this way. In this paper we provide a posterior characterization for the above model, see Theorem 1. Similarly to Epifani and Lijoi (2010) in the 2-dimensional setting, we show that the posterior distribution corresponds to a survival function of the type in (1), leading to a conjugacy property. Extensions of some results in Epifani and Lijoi (2010) are also provided.
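As a quick numerical illustration of the NTR construction (a sketch under assumed quantities, not the paper's inference procedure), one can approximate a CRM by finitely many random jumps and read off the survival function S(t) = exp{−μ((0, t])}; the gamma-distributed jump heights below are a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite approximation of a CRM on (0, T]:
# random jump locations and random jump heights.
T = 10.0
n_jumps = 500
locations = rng.uniform(0.0, T, size=n_jumps)
heights = rng.gamma(shape=0.05, scale=1.0, size=n_jumps)

def survival(t):
    """NTR survival S(t) = exp(-mu((0, t])) for the approximated CRM."""
    return np.exp(-heights[locations <= t].sum())

grid = np.linspace(0.0, T, 50)
S = np.array([survival(t) for t in grid])
# S starts at S(0) = 1 and is nonincreasing, as a survival function must be.
```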
We would like to stress that the derivation of such results is not trivial when considering an arbitrary dimension. In particular, Proposition 1 gives a general expression for the Laplace exponent when a Lévy copula is used to set the dependence of the VCRM underlying the d-dimensional NTR model; Proposition 3 gives an alternative characterization of the multivariate NTR model. Furthermore, other theoretical results are proved in order to facilitate the calculation of posterior means when the inferential exercise is implemented. Finally, we illustrate the methodology on a synthetic dataset.
The paper is organized as follows: Section 2 presents the preliminary notions which are needed in this work. In Section 3 we extend some results in Epifani and Lijoi (2010) to the multivariate setting. In particular, we state the posterior characterization of the model and provide some useful corollaries for implementing the posterior inference. In Section 4, an application with synthetic data is illustrated. All the proofs can be found in the appendix.

Preliminaries
In this section, we provide some preliminaries about exchangeability, partial exchangeability and vectors of completely random measures which are the building blocks of our Bayesian nonparametric proposal. Furthermore, we will illustrate the concept of a positive Lévy copula which is useful to model the dependence structure between the components of a vector of completely random measures.

Exchangeability and partial exchangeability
Let Z be a complete and separable metric space, with corresponding Borel σ-algebra Z = B(Z). As highlighted in the Introduction, in several problems the exchangeability assumption appears far too restrictive. In particular, we consider d groups of observations where the order in which they are collected within each group is irrelevant. To describe this setting we resort to the notion of partial exchangeability, as set forth by de Finetti (1938), which formalizes the idea of partitioning the entire set of observations into a certain number of classes, say d, in such a way that exchangeability may reasonably be assumed within each class. For ease of exposition, we confine ourselves to the case d = 2.

Vectors of completely random measures
Given a complete and separable metric space X, with corresponding Borel σ-algebra X = B(X), we call a measure μ on (X, X) boundedly finite if μ(A) < ∞ for any bounded set A ∈ X. A random measure is a measurable function from a probability space (Ω, F, P) into (M_X, M_X), the measure space formed by M_X, the space of boundedly finite measures on (X, X), and its corresponding Borel σ-algebra M_X. In particular, we will focus on the class of completely random measures introduced in Kingman (1967).
Definition 3. A random measure μ on a complete and separable metric space X with corresponding Borel σ-algebra X = B(X) is called a completely random measure (CRM) if for any collection of disjoint sets {A 1 , . . . , A n } ⊂ X the random variables μ(A 1 ), . . . , μ(A n ) are mutually independent.
A CRM μ admits the representation (Kingman, 1967) μ = μ_d + μ_fl + μ_r, where μ_d is a deterministic measure, μ_fl is a measure consisting of jumps with possibly random heights at fixed locations, and μ_r = Σ_{i≥1} W_i δ_{X_i}, where for i ∈ {1, 2, …} the X_i ∈ X are random jump locations and the W_i ∈ R^+ are random jump heights. The measures μ_d, μ_fl and μ_r are mutually independent. In particular, μ_r is again a CRM and is characterized by the Laplace transform

E[exp{−λ μ_r(A)}] = exp{−∫_{R^+ × A} (1 − e^{−λ s}) ν(ds, dx)},   (3)

where λ > 0 and ν is a measure on R^+ × X such that ∫_{R^+ × A} min{s, 1} ν(ds, dx) < ∞ for any bounded set A ∈ X. The measure ν is usually called the Lévy intensity of μ_r. In the remainder of this work we only consider CRM's μ without fixed jump locations or a deterministic part, so we take μ = μ_r to be solely determined by (3). In particular we focus on Lévy intensities ν which are homogeneous, i.e. ν(ds, dx) = ρ(ds) α(dx),
where α is a non-atomic measure on X governing the jump locations and ρ is a measure on R^+ governing the jump heights. A popular example of a homogeneous CRM is the σ-stable process given by (4). As an illustration, Figure 1 plots the associated process μ((0, t]) for the σ-stable process (4) with α(dx) = dx. We extend this framework to the multivariate setting by considering vectors (μ_1, …, μ_d) where each μ_j is a homogeneous CRM on (X, X) with respective Lévy intensity ν_j(ds, dx) = ν_j(ds) α(dx). Moreover, we take the intensity α to be smooth in the sense that α((0, t]) = γ(t), with γ : [0, ∞) → R^+ a nondecreasing and differentiable function such that γ(0) = 0 and lim_{t→∞} γ(t) = ∞; this last condition on the limit behaviour enables us to recover, marginally, the associated NTR cumulative distributions in our models. For any A_1, …, A_n in X with A_i ∩ A_j = ∅ for i ≠ j, the random vectors (μ_1(A_i), …, μ_d(A_i)) and (μ_1(A_j), …, μ_d(A_j)) are mutually independent; furthermore, one has a multivariate analogue of the Laplace transform (3), where λ = (λ_1, …, λ_d) ∈ (R^+)^d and ρ_d is a measure on (R^+)^d. Henceforth, ψ_t(λ) denotes the Laplace exponent of μ = (μ_1, …, μ_d); in the homogeneous case at hand, ψ_t(λ) = γ(t) ψ(λ) with ψ(λ) = ∫_{(R^+)^d} (1 − exp{−⟨λ, s⟩}) ρ_d(ds). In Section 3, we use this particular kind of homogeneous and additive vector of CRM's to construct priors for survival analysis models.
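The homogeneous σ-stable case can be simulated with the Ferguson-Klass series representation, which feeds unit-rate Poisson arrivals through the inverse of the tail integral U(x) = ∫_x^∞ ρ(ds). Below is a sketch under an assumed normalization ρ(ds) = (σ/Γ(1−σ)) s^{−1−σ} ds; the scale parameter A of (4) is set to 1 here, and the truncation level is an arbitrary choice.

```python
import numpy as np
from math import gamma as gamma_fn

rng = np.random.default_rng(1)

sigma = 0.5          # stability index in (0, 1)
T = 1.0              # time horizon; alpha(dx) = dx, so gamma(T) = T
eps = 1e-6           # truncation level for the smallest retained jump

def tail_integral(x):
    # U(x) = int_x^inf sigma/Gamma(1-sigma) s^{-1-sigma} ds = x^{-sigma}/Gamma(1-sigma)
    return x ** (-sigma) / gamma_fn(1.0 - sigma)

def inverse_tail(u):
    # U^{-1}(u), used by the Ferguson-Klass series representation
    return (gamma_fn(1.0 - sigma) * u) ** (-1.0 / sigma)

# Ferguson-Klass: ordered jump heights W_i = U^{-1}(Gamma_i / gamma(T)),
# with Gamma_i the arrival times of a unit-rate Poisson process,
# and i.i.d. uniform jump locations on (0, T].
heights, gamma_i = [], 0.0
while True:
    gamma_i += rng.exponential(1.0)
    w = inverse_tail(gamma_i / T)
    if w < eps:
        break
    heights.append(w)
heights = np.array(heights)
locations = rng.uniform(0.0, T, size=len(heights))

def mu(t):
    """mu((0, t]) for the truncated trajectory."""
    return heights[locations <= t].sum()
```

The jump heights come out in strictly decreasing order, since the arrival times increase and the inverse tail integral is decreasing.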

Positive Lévy copulas
Although in this work we consider vectors of CRM's with fixed marginal behaviour, it remains to establish their dependence structure. Kallsen and Tankov (2006) introduced the concept of positive Lévy copulas, which allows the construction of vectors of CRM's with fixed marginals.
For example, a vector of independent CRM's is obtained with the independence Lévy copula, while a vector of completely dependent CRM's is one whose jumps lie in a set S such that, whenever v, u ∈ S, either v ≤ u or u ≤ v componentwise. An interesting example of a positive Lévy copula is the Clayton Lévy copula

C_{θ,d}(u_1, …, u_d) = (u_1^{−θ} + ⋯ + u_d^{−θ})^{−1/θ}.   (7)

The parameter θ is positive and regulates the level of dependence; the independence and complete dependence cases above are recovered as the limiting cases θ → 0 and θ → ∞ of the Clayton Lévy copula.
We define the tail integral of a univariate Lévy intensity ν to be U(x) = ∫_x^∞ ν(s) ds. In the setting of Section 2.1, we use a Lévy copula C_d and the marginal tail integrals U_1, …, U_d associated to ν_1, …, ν_d to specify an absolutely continuous multivariate Lévy intensity. Under suitable regularity conditions, we can recover the multivariate Lévy intensity from the copula and the marginal intensities as

ρ_d(s_1, …, s_d) = ∂^d C_d(u_1, …, u_d)/∂u_1 ⋯ ∂u_d |_{u_1 = U_1(s_1), …, u_d = U_d(s_d)} ν_1(s_1) ⋯ ν_d(s_d).   (8)

For example, consider the Clayton Lévy copula with σ-stable margins, given by (4), and α(dx) = dx. Figure 2 shows the dependence behaviour when a 2-dimensional Clayton Lévy copula with parameter θ = 0.3 and θ = 3.5 is employed; we plot the associated stochastic processes μ_i((0, t]), i ∈ {1, 2}, as in Figure 1. As expected, when θ = 0.3, at each jump time one of the two jump heights tends to be large and the other small, since we are close to the independence case (where the processes almost surely share no jump times).
On the other hand, when θ is increased to 3.5, we can appreciate the higher dependence induced by a larger value of the copula parameter. We simulated the trajectories in Figure 2 by using Algorithm 6.15 in Cont and Tankov (2004), where a full treatment of the dependence structure of Lévy intensities is also given. Leisen and Lijoi (2011), Leisen, Lijoi and Spano (2013) and Zhu and Leisen (2015) used a Lévy copula approach for building vectors of dependent completely random measures.
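As a numerical sanity check of the recovery formula (8), assuming the 2-dimensional Clayton form C_θ(u, v) = (u^{−θ} + v^{−θ})^{−1/θ}, the mixed partial derivative entering (8) has the closed form (1 + θ)(uv)^{−θ−1}(u^{−θ} + v^{−θ})^{−1/θ−2}; a central finite difference confirms this.

```python
def clayton_levy_copula(u, v, theta):
    """2-dimensional Clayton Levy copula C_theta(u, v)."""
    return (u ** (-theta) + v ** (-theta)) ** (-1.0 / theta)

def mixed_partial_closed_form(u, v, theta):
    """Closed form of d^2 C / du dv, the copula factor entering (8)."""
    return ((1.0 + theta) * (u * v) ** (-theta - 1.0)
            * (u ** (-theta) + v ** (-theta)) ** (-1.0 / theta - 2.0))

def mixed_partial_numeric(u, v, theta, h=1e-4):
    """Central finite-difference approximation of d^2 C / du dv."""
    C = clayton_levy_copula
    return (C(u + h, v + h, theta) - C(u + h, v - h, theta)
            - C(u - h, v + h, theta) + C(u - h, v - h, theta)) / (4.0 * h * h)
```

In (8) this factor is evaluated at the marginal tail integrals, (u, v) = (U_1(s_1), U_2(s_2)), and multiplied by ν_1(s_1) ν_2(s_2).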

Working example
If we consider the Lévy intensity arising from (8) when using the d-dimensional Clayton Lévy copula (7), with parameter θ, and σ-stable marginals (4), with parameters A and σ, we obtain an explicit multivariate intensity. Furthermore, if we take θ = 1/σ we obtain the simplified Lévy intensity (9). Such an intensity corresponds to a particular family of vectors of completely random measures known as Compound Random Measures (CoRM's), introduced in Griffin and Leisen (2017); the previous Lévy intensity arises when taking φ = 1 in equation (4.4) of that paper. A convenient feature of this Lévy intensity is that, as shown in Proposition 3.1 of Zhu and Leisen (2015), we can explicitly obtain the corresponding Laplace exponent (10), where we take the appropriate limits when λ = (λ_1, …, λ_d) is such that λ_i = λ_j for some distinct i, j ∈ {1, …, d}. As indicated in the remark at the end of Section 3, evaluation of the Laplace exponent is necessary for the explicit calculation of the posterior mean of the survival function given censored data.
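A hedged sketch of how dependent jump pairs can be generated in the 2-dimensional Clayton case, in the spirit of the series representations behind Algorithm 6.15 of Cont and Tankov (2004): the conditional distribution along the second coordinate given a tail level u of the first is F(v | u) = (1 + (u/v)^θ)^{−(1+1/θ)}, which can be inverted in closed form. The truncation, the ordering by the first margin, and the σ-stable normalization below are all simplifying assumptions.

```python
import numpy as np
from math import gamma as gamma_fn

rng = np.random.default_rng(4)

sigma, theta, T = 0.5, 3.5, 1.0   # stability index, Clayton parameter, horizon
n_terms = 300                     # series truncation (approximation)

def inv_tail(u):
    # inverse tail integral of a sigma-stable intensity sigma/Gamma(1-sigma) s^{-1-sigma}
    return (gamma_fn(1.0 - sigma) * u) ** (-1.0 / sigma)

# Unit-rate Poisson arrivals give the tail levels u_i for the first margin.
u_levels = np.cumsum(rng.exponential(1.0, size=n_terms)) / T

# Conditional inverse of the Clayton Levy copula:
# F(v | u) = (1 + (u/v)^theta)^{-(1 + 1/theta)}, inverted at q ~ Uniform(0, 1).
q = rng.uniform(size=n_terms)
v_levels = u_levels * (q ** (-theta / (theta + 1.0)) - 1.0) ** (-1.0 / theta)

# Shared jump locations and the two coordinates' jump heights.
locations = rng.uniform(0.0, T, size=n_terms)
w1, w2 = inv_tail(u_levels), inv_tail(v_levels)
```

For large θ the paired heights w1 and w2 become nearly equal (approaching complete dependence), while for small θ they decouple.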

Main results
Let d ∈ N \ {0}, and suppose we have d collections of random variables {Y_j^(i) : j ∈ N \ {0}}, i ∈ {1, …, d}. We characterize the probability distribution of these random variables in terms of a vector of CRM's μ = (μ_1, …, μ_d), as in (12). We observe that under such a model the random variables in (12) are partially exchangeable and marginally follow a NTR process. The dependence structure in this model can be specified through the Lévy copula associated to the VCRM μ. This model extends the one in Epifani and Lijoi (2010) to an arbitrary dimension d.
The family of Clayton Lévy copulas is of interest because it has both the independence and the complete dependence cases as limiting behaviour. In the next result, we work towards expressions for the Laplace exponent associated to the Clayton family in which the dependence structure is decoupled across dimensions. This result is useful since, as we will see, an explicit calculation of ψ is of key importance for implementing Bayesian inference in our survival analysis model.
Proposition 1. Let ρ_d(s; θ) be the Lévy intensity associated via (8) to the Clayton Lévy copula C_{θ,d} and fixed marginal Lévy intensities ν_1, …, ν_d with corresponding Laplace exponents ψ_1, …, ψ_d. Denote by U_d(x) = (U_1(x_1), …, U_d(x_d)) the vector of tail integrals corresponding to the marginal Lévy intensities. Then the Laplace exponent ψ associated to ρ_d(s; θ) decouples across dimensions. We refer to Appendix A.1 for the proof. In the next result, we incorporate the Laplace exponent ψ into the multivariate survival analysis setting of (12). For ∅ ≠ {i_1, …, i_h} ⊆ {1, …, d}, we write (μ_{i_1}, …, μ_{i_h}) for the corresponding subvector and denote by ψ_{i_1,…,i_h} the respective Laplace exponents.
Proposition 2. In the context of (12), let 1 = (1, …, 1) and t_1 ≤ ⋯ ≤ t_d. We refer to Appendix A.2 for the proof. This result showcases the importance of the Laplace exponent ψ for calculating probabilities in the model, and the impact on the survival function of the function γ(t), the time-dependent part of the Laplace exponent. In Section 4, we show that the availability of the Laplace exponent is also of central importance for implementing Bayesian inference in the model. The model we work with generalizes to arbitrary dimension the classic model of Doksum (1974). We present a multivariate extension of Theorem 3.1 in Doksum (1974), which relates our model to the notion of neutrality to the right. Let F be a d-variate random distribution function on (R^+)^d and, for a d-variate vector of CRM's μ = (μ_1, …, μ_d), write μ_i(t) = μ_i((0, t]) for i ∈ {1, …, d}. We then have the following multivariate extension of Theorem 3.1 in Doksum (1974) and Proposition 4 in Epifani and Lijoi (2010).
Proposition 3. F(t), t = (t_1, …, t_d), has the same distribution as the NTR-type construction above for some d-variate CRM μ = (μ_1, …, μ_d) if and only if, for every h ∈ {1, 2, …} and vectors t_1 = (t_{1,1}, …, t_{d,1}), …, t_h = (t_{1,h}, …, t_{d,h}), the corresponding vectors of normalized increments of F are mutually independent. We refer to Appendix A.3 for the proof. We now establish some notation in order to address the posterior distribution arising from (12) when survival data are available. Let Y_{n_i}, i = 1, …, d, be d groups of observations coming from the distribution given by (12), where t_{i,n_i} = (t_{i,1}, …, t_{i,n_i}), and let the c's denote the respective censoring times; the set D of data then collects the exact and right-censored observations, and if n_e is the number of exact observations, the number of censored observations is n_c = Σ_{i=1}^d n_i − n_e. Taking into account the possible repetition of values among the observations, we consider the order statistics (T_(1), …, T_(k)) of the distinct observations, where k is the number of distinct observed times among all groups.
We now define the relevant set functions. The next theorem determines the posterior distribution of a vector of CRM's given censored data, and it applies to general vectors of CRM's. In particular, the assumption that the respective Lévy intensity is homogeneous has been dropped.
Theorem 1. Let μ = (μ_1, …, μ_d) be a d-variate CRM such that its corresponding Lévy intensity ν(s, dt) ds is differentiable with respect to t_0 on R^+ \ {0}, in the sense that for η_t = ν(s, (0, t]) the partial derivative η_{t_0}(s) = ∂η_t(s)/∂t |_{t=t_0} exists. Moreover, we assume that the entries of μ are not independent. Then the posterior distribution of μ given the data D is the distribution of a suitably updated random measure. We refer to Appendix A.4 for the proof. The previous result shows that the posterior distribution arising from (12) can be modeled in the same framework via a vector of CRM's, by updating the prior vector of CRM's μ as above. This result is enough to provide a scheme for posterior inference. In particular, in the setting of (12) and Theorem 1, we want to estimate the corresponding survival function P(Y^(1) > t_1, …, Y^(d) > t_d | μ_1, …, μ_d) when multiple-samples information is available.
A natural approach in Bayesian nonparametrics is to marginalize over the infinite-dimensional random element which characterizes the probability model. In our case, given censored data D, we calculate the mean of the survival function given the data by marginalizing over the vector of CRM's μ; Theorem 1 makes this calculation possible. The next results allow us to implement the inferential scheme needed to estimate the survival function via its posterior mean. We denote by e_i the elements of the canonical basis of R^d, and set S_L(t) = S(t Σ_{l∈L} e_l) for t > 0 and ∅ ≠ L ⊆ {1, …, d}. In view of the independent increments of the CRM's, the calculation of the posterior mean of S_L is all that is needed for the evaluation of the posterior mean of S. The next corollary shows how to evaluate the posterior mean of S_L.

Corollary 1. Let μ be a vector of CRM's with corresponding Lévy intensity such that η_t(s) = γ(t) ν(s), with γ a differentiable function satisfying γ′(t) ≠ 0 for t > 0. Moreover, assume that the entries of μ are not independent, and let ∅ ≠ L ⊆ {1, …, d}.

We can then estimate S(t) for arbitrary t ∈ (R^+)^d in terms of the estimates of the previous corollary. Indeed, let t = (t_1, …, t_d) and let π be a permutation of {1, …, d} such that t_{π(1)} ≤ t_{π(2)} ≤ ⋯ ≤ t_{π(d)}. Defining, for i ∈ {1, …, d − 1}, the corresponding sets of indices, it follows from the independence of increments of CRM's that the posterior mean of the survival function given censored data D is the estimator Ŝ(t) in (17). Usually, we deal with Lévy intensities which depend on a vector of hyper-parameters c. In the proof of Theorem 1, it is outlined how, given censored data D as before, the likelihood of the hyper-parameters in the Lévy intensity can be derived. This likelihood is necessary for implementing the inferential procedure and is displayed in the next corollary.

Corollary 2. Let μ be a vector of CRM's with corresponding Lévy intensity such that η_t(s) = γ(t) ρ_{d,c}(s), with γ a differentiable function satisfying γ′(t) ≠ 0 for t > 0, and c a vector of hyper-parameters. Given censored data D, we obtain the likelihood of c,
where ψ_{d,c} is the Laplace exponent associated to ρ_{d,c}.
The next lemma provides a useful identity for the computation of the integrals in Corollary 1 and Corollary 2.
We omit the proof as it is just an application of the binomial theorem, along the same lines as the proof of Lemma 3 in the appendix.

Application
In this section we fit a multivariate survival function to right-censored data in the framework of (12). As mentioned in the previous remark, the evaluation of the Laplace exponent of μ in (12) is necessary to evaluate the posterior mean in Corollary 1 and the likelihood in Corollary 2; with this in mind, we choose the random measure μ given by the Lévy intensity in (9), so that the corresponding Laplace exponent is readily given by (10). For illustration purposes, we use 4-dimensional data arising from a distributional copula with fixed marginal distributions; see Nelsen (2013) for an overview of distributional copulas. More precisely, we generate simulated data Y = (Y_1, …, Y_4) with probability distribution F_{θ,λ} given by a distributional Clayton copula with parameter θ and exponential marginals with parameter λ. Then, we perform right-censoring by considering censoring times c = (c_1, …, c_4) consisting of independent exponential random variables with parameter λ_c, and define

δ = (1_{Y_1 < c_1}, …, 1_{Y_4 < c_4}),   T = (min{Y_1, c_1}, …, min{Y_4, c_4}).   (18)
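The data-generating step can be sketched as follows (a sketch, not the paper's exact code): the Marshall-Olkin mixture representation of the Clayton copula is used, the parameter values θ = 2, λ = 1 and the reading of λ_c = 3.7 as a mean are illustrative assumptions, and a univariate Kaplan-Meier estimator of the kind entering the comparison of Section 4 is included.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 2000
theta, lam, lam_c = 2.0, 1.0, 3.7     # illustrative values

# Marshall-Olkin sampling of the distributional Clayton copula:
# V ~ Gamma(1/theta, 1), E_ij ~ Exp(1), U_ij = (1 + E_ij / V_i)^{-1/theta}.
V = rng.gamma(1.0 / theta, 1.0, size=(n, 1))
E = rng.exponential(1.0, size=(n, d))
U = (1.0 + E / V) ** (-1.0 / theta)

# Exponential(lam) marginals via the inverse-cdf transform.
Y = -np.log(1.0 - U) / lam

# Right-censoring with independent exponential times of mean lam_c
# (chosen so that most observations are exact).
c = rng.exponential(lam_c, size=(n, d))
delta = (Y < c).astype(int)            # 1 = exact, 0 = censored
T = np.minimum(Y, c)

def kaplan_meier(t_obs, d_obs):
    """Univariate Kaplan-Meier estimator: event grid and survival values."""
    order = np.argsort(t_obs)
    t_sorted, d_sorted = t_obs[order], d_obs[order]
    surv, s, n_risk = [], 1.0, len(t_sorted)
    for exact in d_sorted:
        if exact:
            s *= 1.0 - 1.0 / n_risk
        n_risk -= 1
        surv.append(s)
    return t_sorted, np.array(surv)

times, S1 = kaplan_meier(T[:, 0], delta[:, 0])
```

With these values, roughly lam / (lam + 1/lam_c) ≈ 79% of the observations in each coordinate are exact, and the Kaplan-Meier curve for the first coordinate should track the true marginal survival exp(−λ t).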
For fitting the data, we use the 4-dimensional Lévy intensity given by (9) and assign priors to its hyper-parameters σ and A: a log-normal prior for A and a Beta prior for σ. We use a Metropolis-within-Gibbs algorithm to draw samples from the posterior distributions of A and σ, making use of the likelihood shown in Corollary 2, and we present a Monte Carlo approximation of the estimator (17) in which we average over the posterior draws of A and σ. A more detailed description of the simulation algorithm is given in Appendix A.5. In Figures 3 and 4 we show the fit for 150 possibly right-censored observations as in (18), simulated for fixed values of θ and λ. We chose λ_c = 3.7 so that at least 75% of the observations of T are exact in each dimension. The construction of F_{θ,λ} through a distributional Clayton copula allows us to calculate the associated survival function explicitly, as shown in Appendix A.6; we use this true survival function for comparison with the fitted survival functions, which are given by the posterior mean as in (17). We ran 1000 iterations of the associated Metropolis-within-Gibbs sampler. Figure 3 and Figure 4 show that the estimated survival functions approximate the true functions well. For comparison purposes, we also present a Kaplan-Meier-type estimator of the survival function; see for example Aalen et al. (2008). As there is no multivariate Kaplan-Meier estimator, we use a product-type estimator of the multivariate survival function in which each factor is a univariate Kaplan-Meier estimator restricted to the corresponding conditional set of observations. In Figure 3 and Figure 4 we can see that, in the last subplots of each column, this estimator can fit poorly, since fewer observations enter the conditional Kaplan-Meier factors.

A.1. Proof of Proposition 1

We consider quantities a_{k,m}(λ), where k ∈ {1, …, d}, m ∈ {0, 1, …, d} and λ ∈ (R^+)^d, such that a_{0,d}(λ) < ∞; we adopt the convention that Π_{j=k}^{l} a_j = 1 when k > l, and denote by x_{−i} the vector x without its i-th entry. An integration by parts shows that these quantities satisfy a recursion formula for r ∈ {1, …, d}. We prove the next technical lemma.
Lemma 2. If a_{0,d}(λ) < ∞ then the next d + 1 identities hold.
Proposition 1 follows by considering the first equation in the Lemma statement and the definition of a_{0,d}.

A.2. Proof of Proposition 2
Proof. The result follows by using the independent increments property of CRM's.

A.3. Proof of Proposition 3
For notational purposes, in this proof we use the shorthand μ(t) = μ((0, t]) for a measure μ and a positive real number t.

A.4. Proof of Theorem 1
This proof is not restricted to the homogeneous Lévy intensity case; in this general setting, we recall that the Laplace exponent has the form (6). In order to prove the theorem we use the next technical lemma.

A.5. Simulation algorithm
We use a Metropolis-within-Gibbs sampler to draw simulations from σ|D and A|D as in Section 4. We recall that Corollary 2 gives the likelihood l(σ, A; D), and we denote by p_σ and p_A the prior distributions of σ and A as in Section 4. Given initial values σ^(0), A^(0), the algorithm iterates the following steps. (1) Draw A^(i+1) from a Metropolis-Hastings sampler with proposal distribution g(x′|x) ∼ Log-Norm(log(x), 1) and target distribution l(σ^(i), x; D) p_A(x). (2) Draw σ^(i+1) from a Metropolis-Hastings sampler with target distribution l(x, A^(i+1); D) p_σ(x).
For the fits in Section 4 we used 100 iterations for each inner Metropolis-Hastings sampler and 1000 iterations for the overall Gibbs sampler.
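The sampler above can be sketched as follows. The log-likelihood here is a hypothetical placeholder standing in for the expression of Corollary 2, the logit-scale proposal for σ is an assumption (A.5 only specifies the log-normal proposal for A), and the prior hyper-parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_lik(sigma, A):
    # Placeholder: substitute the log of the likelihood from Corollary 2.
    return -(A - 2.0) ** 2 - (sigma - 0.5) ** 2

def log_prior_A(A):
    # log-normal prior log-density, up to an additive constant
    return -np.log(A) - 0.5 * np.log(A) ** 2

def log_prior_sigma(sig, a=2.0, b=2.0):
    # Beta(a, b) prior log-density, up to an additive constant
    return (a - 1.0) * np.log(sig) + (b - 1.0) * np.log(1.0 - sig)

def metropolis_within_gibbs(n_iter=500, inner=20):
    A, sig = 1.0, 0.5
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        # (1) update A | sigma with a Log-Norm(log(A), 1) proposal
        for _ in range(inner):
            A_new = np.exp(np.log(A) + rng.normal())
            # Hastings correction for the log-normal proposal: q(x|x')/q(x'|x) = x'/x
            log_acc = (log_lik(sig, A_new) + log_prior_A(A_new)
                       - log_lik(sig, A) - log_prior_A(A)
                       + np.log(A_new) - np.log(A))
            if np.log(rng.uniform()) < log_acc:
                A = A_new
        # (2) update sigma | A with a random walk on the logit scale
        for _ in range(inner):
            z = np.log(sig / (1.0 - sig)) + 0.5 * rng.normal()
            sig_new = 1.0 / (1.0 + np.exp(-z))
            # Jacobian correction for the logit-scale random walk
            log_acc = (log_lik(sig_new, A) + log_prior_sigma(sig_new)
                       - log_lik(sig, A) - log_prior_sigma(sig)
                       + np.log(sig_new * (1.0 - sig_new))
                       - np.log(sig * (1.0 - sig)))
            if np.log(rng.uniform()) < log_acc:
                sig = sig_new
        draws[i] = A, sig
    return draws

draws = metropolis_within_gibbs(n_iter=200, inner=10)
```

The Hastings and Jacobian corrections keep the chains targeting the intended posteriors despite the asymmetric proposals; both parameters remain in their supports (A > 0, σ ∈ (0, 1)) by construction.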

A.6. Survival function of F θ,λ .
Let C_{θ,d} be a d-dimensional distributional Clayton copula and F_i, i = 1, …, d, a collection of marginal cumulative distribution functions; the survival function associated to the Clayton distributional copula and these marginals is then available in closed form.
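Concretely, writing C_{θ,d}(u) = (Σ_{i=1}^d u_i^{−θ} − (d − 1))^{−1/θ} for the distributional Clayton copula (an assumed standard parametrization) and using that every margin of a Clayton copula is again Clayton, the joint survival function can be evaluated by inclusion-exclusion over coordinate subsets; a sketch with exponential marginals:

```python
import numpy as np
from itertools import combinations

def clayton_cdf(u, theta):
    """Distributional Clayton copula C(u) = (sum u_i^{-theta} - (len(u)-1))^{-1/theta}."""
    u = np.asarray(u, dtype=float)
    return float((np.sum(u ** (-theta)) - (len(u) - 1.0)) ** (-1.0 / theta))

def survival(t, theta, lam):
    """P(Y_1 > t_1, ..., Y_d > t_d) by inclusion-exclusion over coordinate
    subsets, using that every margin of a Clayton copula is again Clayton."""
    t = np.asarray(t, dtype=float)
    F = 1.0 - np.exp(-lam * t)          # exponential marginal cdfs
    d = len(t)
    total = 1.0                          # contribution of the empty subset
    for k in range(1, d + 1):
        for J in combinations(range(d), k):
            if k == 1:
                total -= F[J[0]]
            else:
                total += (-1.0) ** k * clayton_cdf(F[list(J)], theta)
    return total
```

In the limits θ → 0 and θ → ∞ this recovers the independence product and the comonotone bound, respectively, which gives a convenient sanity check.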