Skip to main content
Log in

Bayesian tree-based heterogeneous mediation analysis with a time-to-event outcome

  • Original Paper
  • Published:
Statistics and Computing Aims and scope Submit manuscript

Abstract

Mediation analysis aims at quantifying and explaining the underlying causal mechanism between an exposure and an outcome of interest. In the context of survival analysis, mediation models have been widely used to achieve causal interpretation for the direct and indirect effects on the survival of interest. Although heterogeneity in treatment effect is drawing increasing attention in biomedical studies, none of the existing methods have accommodated the presence of heterogeneous causal pathways pointing to a time-to-event outcome. In this study, we consider a heterogeneous mediation analysis for survival data based on a Bayesian tree-based Cox proportional hazards model with shared topologies. Under the potential outcomes framework, individual-specific conditional direct and indirect effects are derived on the scale of the logarithm of hazards, survival probability, and restricted mean survival time. A Bayesian approach with efficient sampling strategies is developed to estimate the conditional causal effects through the Monte Carlo implementation of the mediation formula. Simulation studies show the satisfactory performance of the proposed method. The proposed model is then applied to an HIV dataset extracted from the ACTG175 study to demonstrate its usage in detecting heterogeneous causal pathways.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Algorithm 1
Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  • Albert, J.M., Nelson, S.: Generalized causal mediation analysis. Biometrics 67(3), 1028–1038 (2011)

    MathSciNet  Google Scholar 

  • Athey, S., Wager, S.: Estimating treatment effects with causal forests: an application. Obs. Stud. 5(2), 37–51 (2019)

    Google Scholar 

  • Avin, C., Shpitser, I., Pearl, J.: Identifiability of path-specific effects. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, pp. 357–363. Edinburgh, Scotland (2005)

  • Baron, R.M., Kenny, D.A.: The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J. Pers. Soc. Psychol. 51(6), 1173 (1986)

    Google Scholar 

  • Bonetti, M., Gelber, R.D.: Patterns of treatment effects in subsets of patients in clinical trials. Biostatistics 5(3), 465–481 (2004)

    Google Scholar 

  • Chipman, H.A., George, E.I., McCulloch, R.E.: BART: Bayesian additive regression trees. Ann. Appl. Stat. 4(1), 266–298 (2010)

    MathSciNet  Google Scholar 

  • Cho, S.-H., Huang, Y.-T.: Mediation analysis with causally ordered mediators using Cox proportional hazards model. Stat. Med. 38(9), 1566–1581 (2019)

    MathSciNet  Google Scholar 

  • Cui, Y., Kosorok, M.R., Sverdrup, E., Wager, S., Zhu, R.: Estimating heterogeneous treatment effects with right-censored data via causal survival forests. J. R. Stat. Soc. Ser. B Stat Methodol. 85(2), 179–211 (2023)

    Google Scholar 

  • Curth, A., Lee, C., van der Schaar, M.: SurvITE: learning heterogeneous treatment effects from time-to-event data. Adv. Neural. Inf. Process. Syst. 34, 26740–26753 (2021)

    Google Scholar 

  • Daniel, R., De Stavola, B., Cousens, S., Vansteelandt, S.: Causal mediation analysis with multiple mediators. Biometrics 71(1), 1–14 (2015)

    MathSciNet  Google Scholar 

  • Didelez, V.: Defining causal mediation with a longitudinal mediator and a survival outcome. Lifetime Data Anal. 25, 593–610 (2019)

    MathSciNet  Google Scholar 

  • Dorie, V., Hill, J., Shalit, U., Scott, M., Cervone, D.: Automated versus do-it-yourself methods for causal inference: lessons learned from a data analysis competition. Stat. Sci. 34(1), 43–68 (2019)

    MathSciNet  Google Scholar 

  • Fulcher, I.R., Tchetgen, E.T., Williams, P.L.: Mediation analysis for censored survival data under an accelerated failure time model. Epidemiology (Cambridge, Mass.) 28(5), 660 (2017)

  • Hahn, P.R., Murray, J.S., Carvalho, C.M.: Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects (with discussion). Bayesian Anal. 15(3), 965–1056 (2020)

    MathSciNet  Google Scholar 

  • Hammer, S.M., Katzenstein, D.A., Hughes, M.D., Gundacker, H., Schooley, R.T., Haubrich, R.H., Henry, W.K., Lederman, M.M., Phair, J.P., Niu, M., et al.: A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter. N. Engl. J. Med. 335(15), 1081–1090 (1996)

    Google Scholar 

  • Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

    MathSciNet  Google Scholar 

  • Hayes, A.F.: An index and test of linear moderated mediation. Multivar. Behav. Res. 50(1), 1–22 (2015)

    MathSciNet  Google Scholar 

  • Henderson, N.C., Louis, T.A., Rosner, G.L., Varadhan, R.: Individualized treatment effects with censored data via fully nonparametric Bayesian accelerated failure time models. Biostatistics 21(1), 50–68 (2020)

    MathSciNet  Google Scholar 

  • Hill, J., Linero, A., Murray, J.: Bayesian additive regression trees: a review and look forward. Annu. Rev. Stat. Appl. 7(1), 251–278 (2020)

    MathSciNet  Google Scholar 

  • Hu, L., Ji, J., Li, F.: Estimating heterogeneous survival treatment effect in observational data using machine learning. Stat. Med. 40(21), 4691–4713 (2021)

    MathSciNet  Google Scholar 

  • Huang, Y.-T., Yang, H.-I.: Causal mediation analysis of survival outcome with multiple mediators. Epidemiology (Cambridge, Mass.), 28(3), 370 (2017)

  • Ibrahim, J.G., Chen, M.-H., Sinha, D.: Bayesian Survival Analysis. Springer, New York (2001)

    Google Scholar 

  • Imai, K., Keele, L., Yamamoto, T.: Identification, inference and sensitivity analysis for causal mediation effects. Stat. Sci. 25(1), 51–71 (2010)

    MathSciNet  Google Scholar 

  • Jenks, J.D., Hoenigl, M.: CD4: CD8 ratio and CD8+ cell count for prognosticating mortality in HIV-infected patients on antiretroviral therapy. J. Lab. Precis. Med., 3 (2018)

  • Kang, K., Pan, D., Song, X.: A joint model for multivariate longitudinal and survival data to discover the conversion to Alzheimer’s disease. Stat. Med. 41(2), 356–373 (2022)

    MathSciNet  Google Scholar 

  • Lin, S.-H., Young, J.G., Logan, R., VanderWeele, T.J.: Mediation analysis for a survival outcome with time-varying exposures, mediators, and confounders. Stat. Med. 36(26), 4153–4166 (2017)

    MathSciNet  Google Scholar 

  • Linero, A.R.: Bayesian regression trees for high-dimensional prediction and variable selection. J. Am. Stat. Assoc. 113(522), 626–636 (2018)

    MathSciNet  Google Scholar 

  • Linero, A.R., Basak, P., Li, Y., Sinha, D.: Bayesian survival tree ensembles with submodel shrinkage. Bayesian Anal. 17(3), 997–1020 (2022)

    MathSciNet  Google Scholar 

  • Linero, A.R., Sinha, D., Lipsitz, S.R.: Semiparametric mixed-scale models using shared Bayesian forests. Biometrics 76(1), 131–144 (2020)

    MathSciNet  Google Scholar 

  • Linero, A.R., Yang, Y.: Bayesian regression tree ensembles that adapt to smoothness and sparsity. J. R. Stat. Soc. Ser. B Stat Methodol. 80(5), 1087–1110 (2018)

    MathSciNet  Google Scholar 

  • Linero, A. R., Zhang, Q.: Mediation analysis using Bayesian tree ensembles. Psychol. Methods (2022)

  • Lu, W., Zhang, H.H., Zeng, D.: Variable selection for optimal treatment decision. Stat. Methods Med. Res. 22(5), 493–504 (2013)

    MathSciNet  Google Scholar 

  • MacKinnon, D.P., Fairchild, A.J., Fritz, M.S.: Mediation analysis. Annu. Rev. Psychol. 58, 593–614 (2007)

    Google Scholar 

  • MacKinnon, D.P., Lockwood, C.M., Hoffman, J.M., West, S.G., Sheets, V.: A comparison of methods to test mediation and other intervening variable effects. Psychol. Methods 7(1), 83 (2002)

    Google Scholar 

  • Malinsky, D., Shpitser, I., Richardson, T.: A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 3080–3088. PMLR (2019)

  • Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

  • Monsalvo, M., Vallejo, A., Fontecha, M., Vivancos, M.J., Vizcarra, P., Casado, J.L.: CD4/CD8 ratio improvement in HIV-1-infected patients receiving dual antiretroviral treatment. Int. J. STD AIDS 30(7), 656–662 (2019)

  • Murray, J.S.: Log-linear Bayesian additive regression trees for multinomial logistic and count regression models. J. Am. Stat. Assoc. 116(534), 756–769 (2021)

    MathSciNet  Google Scholar 

  • Pearl, J.: Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty and Artificial Intelligence, pp. 411–420. Morgan Kaufman, San Francisco, CA (2001)

  • Preacher, K.J., Rucker, D.D., Hayes, A.F.: Addressing moderated mediation hypotheses: theory, methods, and prescriptions. Multivar. Behav. Res. 42(1), 185–227 (2007)

    Google Scholar 

  • Qin, J., Shen, Y.: Statistical methods for analyzing right-censored length-biased data under cox model. Biometrics 66(2), 382–392 (2010)

    MathSciNet  Google Scholar 

  • Royston, P., Parmar, M.K.: The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat. Med. 30(19), 2409–2421 (2011)

    MathSciNet  Google Scholar 

  • Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66(5), 688–701 (1974)

  • Rubin, D.B.: Causal inference using potential outcomes: design, modeling, decisions. J. Am. Stat. Assoc. 100(469), 322–331 (2005)

    MathSciNet  Google Scholar 

  • Sinha, D., Ibrahim, J.G., Chen, M.-H.: A Bayesian justification of Cox’s partial likelihood. Biometrika 90(3), 629–641 (2003)

    MathSciNet  Google Scholar 

  • Sun, R., Zhou, X., Song, X.: Bayesian causal mediation analysis with latent mediators and survival outcome. Struct. Equ. Model. 28(5), 778–790 (2021)

    MathSciNet  Google Scholar 

  • Tan, Y.V., Roy, J.: Bayesian additive regression trees and the general BART model. Stat. Med. 38(25), 5048–5069 (2019)

    MathSciNet  Google Scholar 

  • Ten Have, T.R., Joffe, M.M.: A review of causal estimation of effects in mediation analyses. Stat. Methods Med. Res. 21(1), 77–107 (2012)

    MathSciNet  Google Scholar 

  • Wang, W., Albert, J.M.: Causal mediation analysis for the Cox proportional hazards model with a smooth baseline hazard estimator. J. R. Stat. Soc. Ser. C Appl. Stat. 66(4), 741 (2017)

    MathSciNet  Google Scholar 

  • Xue, F., Tang, X., Kim, G., Koenen, K.C., Martin, C.L., Galea, S., Wildman, D., Uddin, M., Qu, A.: Heterogeneous mediation analysis on epigenomic PTSD and traumatic stress in a predominantly African American Cohort. J. Am. Stat. Assoc. 117(540), 1669–1683 (2022)

    MathSciNet  Google Scholar 

  • Yin, G., Ibrahim, J.G.: A general class of Bayesian survival models with zero and nonzero cure fractions. Biometrics 61(2), 403–412 (2005)

    MathSciNet  Google Scholar 

  • Zhou, X., Song, X.: Mediation analysis for mixture Cox proportional hazards cure models. Stat. Methods Med. Res. 30(6), 1554–1572 (2021)

    MathSciNet  Google Scholar 

Download references

Acknowledgements

This research was fully supported by GRF Grants (14303622, 14302220) from Research Grant Council of the Hong Kong Special Administration Region.

Author information

Authors and Affiliations

Authors

Contributions

R.S. and X.S. wrote the manuscript. R.S. and X.S. reviewed the manuscript.

Corresponding author

Correspondence to Xinyuan Song.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Derivation of Eqs. (3) and (4)

Under the identification assumptions stated in Sect. 2.3, the targeted function of potential survival time for subject i, \(\omega \{T_i(a, M_i(a'))\}\), is identified as

$$\begin{aligned}{} & {} \omega \{T_i(a,M_i(a'))\} {=} \int \omega \{T_i(a, m) | M_i(a') {=} m, \textbf{x}_i = \varvec{x}\}dF_{M_i(a')|\textbf{x}_i=\varvec{x}} \nonumber \\{} & {} \quad {\left[ \text {By Assumption } 4\right] }\nonumber \\{} & {} \quad = \int \omega \{T_i(a, m) |\textbf{x}_i = \varvec{x}\}dF_{M_i(a')|\textbf{x}_i=\varvec{x}}\nonumber \\{} & {} \quad {\left[ \text {By Assumption } 1\right] }\nonumber \\{} & {} \quad = \int \omega \{T_i(a, m) |A_i = a, \textbf{x}_i = \varvec{x}\}dF_{M_i(a')|\textbf{x}_i=\varvec{x}} \nonumber \\{} & {} \quad {\left[ \text {By Assumption } 2\right] }\nonumber \\{} & {} \quad = \int \omega \{T_i(a, m) |A_i = a, M_i = m, \textbf{x}_i = \varvec{x}\}dF_{M_i(a')|\textbf{x}_i=\varvec{x}} \nonumber \\{} & {} \quad {\left[ \text {By Assumption } 3\right] }\nonumber \\{} & {} \quad = \int \omega \{T_i(a, m) |A_i = a, M_i = m, \textbf{x}_i = \varvec{x}\}dF_{M_i(a')|A_i = a', \textbf{x}_i=\varvec{x}} \nonumber \\{} & {} \quad {\left[ \text {By Consistency}\right] }\nonumber \\{} & {} \quad = \int \omega \{T_i |A_i = a, M_i = m, \textbf{x}_i = \varvec{x}\}dF_{M_i | A_i = a', \textbf{x}_i=\varvec{x}}. \nonumber \\ \end{aligned}$$
(A.1)

The last equation follows from the basic consistency assumption of the counterfactual framework, which states that an individual’s potential outcome under their observed exposure history is exactly their observed outcome (Imai et al. 2010; Ten Have and Joffe 2012), i.e., \(M_i(A_i) = M_i\) and \(T_i(A_i, M_i(A_i)) = T_i\). We used the Monte Carlo method to integrate over \(F_{M_i|A_i, \textbf{x}_i}\), and also adopted the comonotonic sampling strategy (Linero and Zhang 2022) to enhance the efficiency of the numerical implementation of the mediation formula. Major steps are outlined in Algorithm A1.

Algorithm 2
figure b

The Monte Carlo implementation of mediation formula

By using inverse transform sampling with shared \(u_i\), \(M_i^{(itr)}(1)\) and \(M_i^{(itr)}(0)\) are sampled as comonotonic pairs. As explained by Linero and Zhang (2022), this approach is effective in reducing Monte Carlo errors by making the \(M_i^{(itr)}(a)\) pairs as correlated as possible, thereby allowing for accurate estimation of the conditional PSEs with only one sample per iteration.

Fig. 5
figure 5

Trace plots of a \(\sigma \), b \(\lambda _1\), and cd two randomly selected \(\tau (\textbf{x}_i)\) generated under setup (i) with \(\lambda _0(t) = 1\)

Table 10 Average bias, relative bias, RMSE, coverage rate and length of the 95% credible interval for the average PSEs on the scale of log hazard, survival probability, and RMST at the maximal observed time under setup (i) with \(n = 1000\) and a true baseline hazard of \(\lambda _0(t)=1/2t^{1/2}\)
Table 11 Average MSE of mediation effects across individuals and average (in-sample) PRMSE of the mediator and the time-to-event outcome on three scales under setup(i) with \(\lambda _0(t)=1/2t^{1/2}\)

The group-average or sample-average PSEs are identified through an additional integration over the covariate distribution \(F_{\textbf{x}}\). We followed the common practice in the mediation literature to approximate the integration using empirical distribution of the covariates based on the collected sample (Albert and Nelson 2011; Bonetti and Gelber 2004), under which circumstance Eq. 4 simplifies to

$$\begin{aligned}{} & {} {} {\bar{\omega }}\{T(a,M(a'))\} \approx \frac{1}{n}\sum _{i=1}^n \omega \{T_i(a,M_i(a'))\} \nonumber \\{}{} & {} {} = \frac{1}{n}\sum _{i=1}^n \int \omega \{T_i |A_i = a, M_i = m, {\textbf {x}}_i = \varvec{x}\} dF_{M_i | A_i = a', {\textbf {x}}_i=\varvec{x}}.\nonumber \\ \end{aligned}$$
(A.2)

Appendix B: Full conditional distributions of the tree ensembles

Denote \({\textbf{Y}} = (Y_1, Y_2, \dots , Y_n)^T\), \({\textbf{M}} = (M_1, M_2, \dots , M_n)^T\), \({\textbf{A}} = (A_1, A_2, \dots , A_n)^T\), \({\textbf{X}} = (\textbf{x}_1^T, \textbf{x}_2^T, \dots , \textbf{x}_n^T)^T\), and \(\varvec{\delta } = (\delta _1, \dots , \delta _n)^T\). Let \({\mathcal {D}} = \{{\textbf{Y}},{\textbf{X}},{\textbf{A}},\varvec{\delta }, {\textbf{M}}\}\) denote the observed data. The complete data likelihood of the proposed model is expressed as

$$\begin{aligned} p({\mathcal {D}}|\cdot )= & {} \prod _{i=1}^{n}\exp \Bigg [-\Lambda _0(Y_i)\exp \{\upsilon (\textbf{x}_i, M_i)+ \tau (\textbf{x}_i)A_i\}\big ]\nonumber \\{} & {} \quad \big [\lambda _0(Y_i)\exp \{\upsilon (\textbf{x}_i,M_i)+ \tau (\textbf{x}_i)A_i\}\Bigg ]^{\delta _i}\nonumber \\{} & {} \quad \times \frac{1}{\sqrt{2\pi }\sigma }\exp \{-\frac{(M_i - \upsilon _M(\textbf{x}_i) - \tau _M(\textbf{x}_i)A_i )^2}{2\sigma ^2}\},\nonumber \\ \end{aligned}$$
(A.3)

where \(\Lambda _0(t) = \int _0^t \lambda _0(u)du\) is the cumulative baseline hazard function. After assuming prior independence among the unknown parameters \(\{\mathcal {T}_j, \mathcal {M}_j\}_{j=1}^J\), \(\{\widetilde{\mathcal {T}}_h, \widetilde{\mathcal {M}}_h^{(M)}, \widetilde{\mathcal {M}}_h\}_{h=1}^H\), \(\{\breve{\mathcal {T}}_k, \breve{\mathcal {M}}_j\}_{k=1}^K\), \(\varvec{\lambda }\), and \(\sigma \), and prior independence among the individual binary trees inside each tree ensembles as well, the full conditional distribution of the parameters are derived as

follows.

(I) Full conditional distribution of \(\{\mathcal {T}_j, \mathcal {M}_j\}_{j=1}^J\)

Let \(R_{ij} = M_i - \sum _{l \ne j}g({\textbf {x}}_i;\mathcal {T}_l, \mathcal {M}_l) - \tau _M({\textbf {x}}_i)A_i\) and \(\varvec{R}_j = (R_{1j}, \dots , R_{nj})^T\) denote the partial residuals in the mediator regression equation w.r.t the jth binary tree \(\{\mathcal {T}_j,\mathcal {M}_j\}\). By prior independence and the i.i.d. assumption on the sample data,

$$\begin{aligned}{} & {} {} p\left( \{\mathcal {T}_j,\mathcal {M}_j\}_{j = 1}^{J}|{\mathcal {D}}, \sigma , \{\widetilde{\mathcal {T}}_h,\widetilde{\mathcal {M}}_h^{(M)}\}_{h=1}^H \right) \\{}{} & {} {} =\prod _{j=1}^{J}p(\mathcal {M}_j|\varvec{R}_j, \mathcal {T}_j, \sigma )p(\mathcal {T}_j|\varvec{R}_j,\sigma ), \end{aligned}$$

where

$$\begin{aligned}{} & {} {} p( \mathcal {M}_j|{\mathcal {D}},\mathcal {T}_j,\sigma ) \propto p(\varvec{R}_j|\mathcal {M}_j,\mathcal {T}_j, \sigma )p(\mathcal {M}_j|\mathcal {T}_j)\nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_j}\prod _{i:\mathbb {1}_{il}=1}\exp \Big \{-\frac{1}{2\sigma ^2}(R_{ij} -\mu _{jl})^2\Big \}\exp \Big \{-\frac{1}{2\sigma _\mu ^2}\mu _{jl}^2\Big \}\nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_j}\exp \Big \{-\frac{1}{2\sigma _\mu ^2}\mu _{jl}^2-\frac{1}{2\sigma ^2}\sum _{i:\mathbb {1}_{il} = 1}(R_{ij} - \mu _{jl})^2\Big \}\nonumber \\{}{} & {} {} \overset{d}{=}\ \prod _{l=1}^{b_j}N(a_l,\sigma ^2_l), \end{aligned}$$
(A.4)

with \(\sigma _l^2 = \frac{\sigma ^2\sigma _\mu ^2}{\sigma ^2 + \sigma _\mu ^2\sum _i\mathbb {1}_{il}}\), \(a_l = \frac{\sigma _\mu ^2\sum _{i:\mathbb {1}_{il} = 1}R_{ij}}{\sigma ^2 + \sigma _\mu ^2\sum _i\mathbb {1}_{il}}\), \(\sigma ^2_\mu = \frac{c^2_1}{4Jk_1^2}\), \(\mathbb {1}_{il}\) denoting that subject i is allocated to the lth terminal node of \(\mathcal {T}_j\) with mean parameter \(\mu _{jl}\), and ‘\(\overset{d}{=}\)’ representing equivalence in distribution. The conjugacy makes it feasible to draw \(\mathcal {T}_j\) from its marginal posterior distribution where \(\mathcal {M}_j\) is integrated out to avoid the need for reversible jumps. That is,

$$\begin{aligned}{} & {} {} p(\mathcal {T}_j| \varvec{R}_j,\sigma ) \propto p(\mathcal {T}_j)\left( \int p(\varvec{R}_j|\mathcal {M}_j, \mathcal {T}_j, \sigma )p(\mathcal {M}_j|\mathcal {T}_j)d\mathcal {M}_j\right) \nonumber \\{}{} & {} {} \propto p(\mathcal {T}_j)\prod _{l=1}^{b_j} \int \prod _{i:\mathbb {1}_{il}=1}\frac{1}{\sqrt{2\pi \sigma ^2}}\exp \left\{ -\frac{1}{2\sigma ^2}(R_{ij} - \mu _{jl})^2\right\} \nonumber \\{}{} & {} {} \quad \times \frac{1}{\sqrt{2\pi \sigma ^2_\mu }}\exp \left\{ -\frac{\mu _{jl}^2}{2\sigma ^2_\mu }\right\} d\mu _{jl}\nonumber \\{}{} & {} {} \propto p(\mathcal {T}_j)\prod _{l=1}^{b_j} \left( \frac{1}{\sqrt{2\pi \sigma ^2}}\right) ^{n}\exp \left\{ -\frac{\sum _{i=1}^n(R_{ij})^2}{2\sigma ^2}\right\} \nonumber \\{}{} & {} {} \quad \times \sqrt{\frac{\sigma _l^2}{\sigma ^2_\mu }}\exp \left\{ \frac{\left( \sum _{i:\mathbb {1}_{il} = 1}R_{ij}\right) ^2\sigma _l^2}{2\sigma ^4}\right\} . \end{aligned}$$
(A.5)

To acquire posterior samples of \(\mathcal {T}_j\), we employed the Metropolis-Hastings algorithm (Metropolis et al. 1953; Hastings 1970), with a classical proposal distribution suggested by (Chipman et al. 2010) that includes four possible moves with probabilities as follows: splitting at a leaf node (0.25), pruning a pair of leaf nodes (0.25), changing the splitting rule of an internal node (0.4), and swapping the splitting rule between an internal node and its child (0.1). Given the current iteration \(\mathcal {T}_j^{(itr)}\), a candidate tree \(\mathcal {T}_j^*\) is proposed according to the proposal distribution \(q(\cdot |\cdot )\), and is accepted as \(\mathcal {T}_j^{(itr+1)}\) with a probability of \(\min \left\{ 1, \frac{q\left( \mathcal {T}_j^{(itr)}\big |\mathcal {T}_j^*\right) p\left( \mathcal {T}_j^*\big |\varvec{R}^{(itr)}_j, \sigma ^{(itr)}\right) }{q\left( \mathcal {T}_j^*\big |\mathcal {T}_j^{(itr)}\right) p\left( \mathcal {T}_j^{(itr)}\big |\varvec{R}^{(itr)}_j, \sigma ^{(itr)}\right) }\right\} \).

(II) Full conditional distribution of \(\{\widetilde{\mathcal {T}}_h, \widetilde{\mathcal {M}}_h, \widetilde{\mathcal {M}}_h^{(M)}\}_{h=1}^H\)

Let \(\widetilde{R}_{ih}^{(M)} = M_i - \upsilon _M({\textbf {x}}_i) - \sum _{l \ne h}g({\textbf {x}}_i;\widetilde{\mathcal {T}}_l, \widetilde{\mathcal {M}}_l^{(M)})A_i\) and \(\varvec{\widetilde{R}}_h^{(M)} = (\widetilde{R}_{1\,h}^{(M)}, \dots , \widetilde{R}_{nh}^{(M)})^T\) denote the partial residuals in the mediator regression equation w.r.t the hth binary tree \(\{\widetilde{\mathcal {T}}_h,\widetilde{\mathcal {M}}_h^{(M)}\}\). Let \(\widetilde{R}_{ih} = \upsilon ({\textbf {x}}_i, M_i) + \sum _{l \ne h}g({\textbf {x}}_i;\widetilde{\mathcal {T}}_l, \widetilde{\mathcal {M}}_l)A_i\) and \(\varvec{\widetilde{R}}_h = (\widetilde{R}_{1h}, \dots , \widetilde{R}_{nh})^T\) denote the partial residuals in the PH model w.r.t \(\{\widetilde{\mathcal {T}}_h,\widetilde{\mathcal {M}}_h\}\). The full conditionals of \(\{\widetilde{\mathcal {T}}_h, \widetilde{\mathcal {M}}_h, \widetilde{\mathcal {M}}_h^{(M)}\}_{h=1}^H\) are derived similarly, with the only difference being that the leaf node parameters are bivariate due to the shared tree topologies. Assuming independence among the leaf node parameters, we have

$$\begin{aligned}{} & {} {} p\left( \{\widetilde{\mathcal {T}}_h,\widetilde{\mathcal {M}}_h^{(M)}, \widetilde{\mathcal {M}}_h\}_{h = 1}^{H}|{\mathcal {D}}, \sigma , \{(\mathcal {T}_j,\mathcal {M}_j)\}_{j=1}^J, \{\breve{\mathcal {T}}_k,\breve{\mathcal {M}}_k\}_{k=1}^K, \varvec{\lambda } \right) \\{}{} & {} {} =\,\prod _{h=1}^{H}p\left( \widetilde{\mathcal {M}}_h^{(M)},\widetilde{\mathcal {M}}_h|\varvec{\widetilde{R}}_h^{(M)}, \varvec{\widetilde{R}}_h,\widetilde{\mathcal {T}}_h, \varvec{\lambda },\sigma \right) \\{}{} & {} {} p\left( \widetilde{\mathcal {T}}_h|\varvec{\widetilde{R}}_h^{(M)}, \varvec{\widetilde{R}}_h,\varvec{\lambda },\sigma \right) =\,\prod _{h=1}^{H}p\left( \widetilde{\mathcal {M}}_h| \varvec{\widetilde{R}}_h,\widetilde{\mathcal {T}}_h, \varvec{\lambda }\right) p\left( \widetilde{\mathcal {M}}_h^{(M)}|\varvec{\widetilde{R}}_h^{(M)}, \widetilde{\mathcal {T}}_h, \sigma \right) \\{}{} & {} {} \quad \times p\left( \widetilde{\mathcal {T}}_h|\varvec{\widetilde{R}}_h^{(M)}, \varvec{\widetilde{R}}_h,\varvec{\lambda },\sigma \right) \end{aligned}$$

where

$$\begin{aligned}{} & {} {} p\left( \widetilde{\mathcal {M}}_h| \varvec{\widetilde{R}}_h,\widetilde{\mathcal {T}}_h, \varvec{\lambda }\right) \propto p\left( \varvec{\widetilde{R}}_h|\widetilde{\mathcal {M}}_h,\widetilde{\mathcal {T}}_h,\varvec{\lambda }\right) p\left( \widetilde{\mathcal {M}}_h|\widetilde{\mathcal {T}}_h\right) \nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_h}\prod _{i:\mathbb {1}_{il}=1}\exp \left[ \delta _i\left\{ \widetilde{R}_{ih} + \widetilde{\mu }_{hl}A_i\right\} - \exp \left\{ \widetilde{R}_{ih} + \widetilde{\mu }_{hl}A_i\right\} \right. \nonumber \\{}{} & {} {} \quad \left. \times \sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\right] \exp \left[ \widetilde{\mu }_{hl}{\widetilde{\zeta }} - \exp (\widetilde{\mu }_{hl}){\widetilde{\eta }}\right] \nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_h} \exp \left[ \widetilde{\mu }_{hl}\left( \sum _{i:\mathbb {1}_{il}= 1}\delta _iA_i + {\widetilde{\zeta }}\right) \right. \nonumber \\{}{} & {} {} \quad \left. - \exp (\widetilde{\mu }_{hl})\left\{ {\widetilde{\eta }} +\sum _{i:\mathbb {1}_{il}= 1,A_i=1}\exp (\widetilde{R}_{ih})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\right\} \right] \nonumber \\{}{} & {} {} \overset{d}{=}\prod _{l=1}^{b_h}logGam({\widetilde{\zeta }}^*, {\widetilde{\eta }}^*), \end{aligned}$$
(A.6)

with \({\widetilde{\zeta }}^* = \sum _{i:\mathbb {1}_{il}= 1}\delta _iA_i + {\widetilde{\zeta }}\) and \({\widetilde{\eta }}^* = {\widetilde{\eta }} +\sum _{i:\mathbb {1}_{il}= 1,A_i=1}\)\(\exp (\widetilde{R}_{ih})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\); and

$$\begin{aligned}{} & {} {} p\left( \widetilde{\mathcal {M}}_h^{(M)}|\varvec{\widetilde{R}}_h^{(M)}, \widetilde{\mathcal {T}}_h, \sigma \right) \propto p\left( \varvec{\widetilde{R}}_h^{(M)}|\widetilde{\mathcal {M}}_h^{(M)},\widetilde{\mathcal {T}}_h, \sigma \right) p\left( \widetilde{\mathcal {M}}_h^{(M)}|\widetilde{\mathcal {T}}_h\right) \nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_h}\prod _{i:\mathbb {1}_{il}=1}\exp \left\{ -\frac{1}{2\sigma ^2}\left( \widetilde{R}_{ih}^{(M)} -\widetilde{\mu }_{hl}^{(M)}\right) ^2\right\} \exp \left\{ -\frac{1}{2\sigma _{\widetilde{\mu }}^2}\left( \widetilde{\mu }_{hl}^{(M)}\right) ^2\right\} \nonumber \\{}{} & {} {} \overset{d}{=}\ \prod _{l=1}^{b_h}N({\widetilde{a}}_l,{\widetilde{\sigma }}^2_l), \end{aligned}$$
(A.7)

with \({\widetilde{\sigma }}_l^2 = \frac{\sigma ^2\sigma _{\widetilde{\mu }}^2}{\sigma ^2 + \sigma _{\widetilde{\mu }}^2\sum _i\mathbb {1}_{il}}\), \({\widetilde{a}}_l = \frac{\sigma _{\widetilde{\mu }}^2\sum _{i:\mathbb {1}_{il} = 1}\widetilde{R}_{ih}^{(M)}}{\sigma ^2 + \sigma _{\widetilde{\mu }}^2\sum _i\mathbb {1}_{il}}\), and \(\sigma ^2_{\widetilde{\mu }} = \frac{c^2_2}{4Hk_2^2}\).

Using the conjugate normal and log-gamma priors once again leads to closed-form marginal posteriors for \(\widetilde{\mathcal {T}}_h\), which is integrated over \(\{\widetilde{\mathcal {M}}_h,\widetilde{\mathcal {M}}_h^{(M)}\}\) and expressed as

$$\begin{aligned}{} & {} p\left( \widetilde{\mathcal {T}}_h|\varvec{\widetilde{R}}_h, \varvec{\widetilde{R}}_h^{(M)},\varvec{\lambda },\sigma \right) \propto \nonumber \\{} & {} \quad p(\widetilde{\mathcal {T}}_h)\int p\left( \varvec{\widetilde{R}}_h|\widetilde{\mathcal {M}}_h, \widetilde{\mathcal {T}}_h, \varvec{\lambda }\right) p\left( \widetilde{\mathcal {M}}_h|\widetilde{\mathcal {T}}_h\right) d\widetilde{\mathcal {M}}_h\nonumber \\{} & {} \quad \int p\left( \varvec{\widetilde{R}}_h^{(M)}|\widetilde{\mathcal {M}}_h^{(M)}, \widetilde{\mathcal {T}}_h, \sigma \right) p\left( \widetilde{\mathcal {M}}_h^{(M)}|\widetilde{\mathcal {T}}_h\right) d\widetilde{\mathcal {M}}_h^{(M)}\nonumber \\{} & {} \quad \propto p(\widetilde{\mathcal {T}}_h)\prod _{l=1}^{b_h} \int \exp \Bigg [\sum _{i:\mathbb {1}_{il}= 1}\delta _i\widetilde{R}_{ih} + \widetilde{\mu }_{hl}\sum _{i:\mathbb {1}_{il}= 1}\delta _iA_i\nonumber \\{} & {} \quad - \exp (\widetilde{\mu }_{hl})\sum _{i:\mathbb {1}_{il}=A_i= 1}\exp (\widetilde{R}_{ih})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\nonumber \\{} & {} \quad - \sum _{i:\mathbb {1}_{il}= 1,A_i=0}\exp (\widetilde{R}_{ih})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\Bigg ]\nonumber \\{} & {} \quad \times \frac{{\widetilde{\eta }}^{{\widetilde{\zeta }}}}{\Gamma ({\widetilde{\zeta }})}\exp \left( \widetilde{\mu }_{hl} {\widetilde{\zeta }} - \exp (\widetilde{\mu }_{hl}){\widetilde{\eta }}\right) d\widetilde{\mu }_{hl}\nonumber \\{} & {} \quad \times \int \prod _{i:\mathbb {1}_{il}=1}\frac{1}{\sqrt{2\pi \sigma ^2}}\exp \left\{ -\frac{1}{2\sigma ^2}\left( \widetilde{R}_{ih}^{(M)} - \widetilde{\mu }_{hl}^{(M)}\right) ^2\right\} \nonumber \\{} & {} \quad \frac{1}{\sqrt{2\pi \sigma ^2_{\widetilde{\mu }}}}\exp \left\{ -\frac{\left( \widetilde{\mu }_{hl}^{(M)}\right) ^2}{2\sigma ^2_{\widetilde{\mu }}}\right\} d\widetilde{\mu }_{hl}^{(M)}\nonumber \\{} & {} \quad \propto p(\widetilde{\mathcal {T}}_h)\prod _{l=1}^{b_h}\exp \left( \sum _{i:\mathbb {1}_{il}= 1}\delta _i\widetilde{R}_{ih}\right. \nonumber \\{} & {} \left. \quad - \sum _{i:\mathbb {1}_{il}= 1,A_i=0}\exp (\widetilde{R}_{ih})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\right) \frac{{\widetilde{\eta }}^{{\widetilde{\zeta }}}\Gamma ({\widetilde{\zeta }}^*)}{\Gamma ({\widetilde{\zeta }}) ({\widetilde{\eta }}^*)^{{\widetilde{\zeta }}^*}}\nonumber \\{} & {} \quad \times \left( \frac{1}{\sqrt{2\pi \sigma ^2}}\right) ^{n}\exp \left\{ -\frac{\sum _{i=1}^n(\widetilde{R}_{ih}^{(M)})^2}{2\sigma ^2}\right\} \nonumber \\{} & {} \quad \sqrt{\frac{{\widetilde{\sigma }}_l^2}{\sigma ^2_{\widetilde{\mu }}}}\exp \left\{ \frac{\left( \sum _{i:\mathbb {1}_{il} = 1}\widetilde{R}_{ih}^{(M)}\right) ^2{\widetilde{\sigma }}_l^2}{2\sigma ^4}\right\} . \end{aligned}$$
(A.8)

Posterior draws of \(\widetilde{\mathcal {T}}_h\) can thus be obtained using MH algorithm with the same proposal distribution described above.

(III) Full conditional distribution of \(\{\breve{\mathcal {T}}_k, \breve{\mathcal {M}}_k\}_{k=1}^K\)

Let \(\breve{R}_{ik} = \sum _{l \ne k}g({\textbf {x}}_i, M_i;\breve{\mathcal {T}}_l, \breve{\mathcal {M}}_l) + \tau ({\textbf {x}}_i)A_i\) and \(\varvec{\breve{R}}_k = (\breve{R}_{1k}, \dots , \breve{R}_{nk})^T\) denote the partial residuals in the PH model w.r.t \(\{\breve{\mathcal {T}}_k,\breve{\mathcal {M}}_k\}\). The full conditionals of \(\{\breve{\mathcal {T}}_k, \breve{\mathcal {M}}_k\}_{k=1}^K\) are derived as

$$\begin{aligned}{} & {} {} p\left( \{\breve{\mathcal {T}}_k,\breve{\mathcal {M}}_k\}_{k = 1}^{K}|{\mathcal {D}}, \varvec{\lambda }, \{\widetilde{\mathcal {T}}_h,\widetilde{\mathcal {M}}_h\}_{h=1}^H \right) \\{}{} & {} {} = \prod _{k=1}^{K}p\left( \breve{\mathcal {M}}_k|\varvec{\breve{R}}_k, \breve{\mathcal {T}}_k, \varvec{\lambda }\right) p\left( \breve{\mathcal {T}}_k|\varvec{\breve{R}}_k,\varvec{\lambda }\right) , \end{aligned}$$

where

$$\begin{aligned}{} & {} {} p\left( \breve{\mathcal {M}}_k|\varvec{\breve{R}}_k, \breve{\mathcal {T}}_k, \varvec{\lambda }\right) \propto p\left( \varvec{\breve{R}}_k|\breve{\mathcal {M}}_k, \breve{\mathcal {T}}_k, \varvec{\lambda }\right) p\left( \breve{\mathcal {M}}_k|\breve{\mathcal {T}}_k\right) \nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_k}\prod _{i:\mathbb {1}_{il}=1}\exp \left[ \delta _i\left\{ \breve{R}_{ik} + \breve{\mu }_{kl}\right\} \right. \nonumber \\{}{} & {} {} \quad \left. - \exp \left\{ \breve{R}_{ik} + \breve{\mu }_{kl}\right\} \sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\right] \exp \left[ \breve{\mu }_{kl}\breve{\zeta } - \exp (\breve{\mu }_{kl})\breve{\eta }\right] \nonumber \\{}{} & {} {} \propto \prod _{l=1}^{b_k} \exp \left[ \breve{\mu }_{kl}\left( \sum _{i:\mathbb {1}_{il}= 1}\delta _i + \breve{\zeta }\right) \right. \nonumber \\{}{} & {} {} \quad \left. - \exp (\breve{\mu }_{kl})\left\{ \breve{\eta } +\sum _{i:\mathbb {1}_{il}= 1}\exp (\breve{R}_{ik})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\right\} \right] \nonumber \\{}{} & {} {} \overset{d}{=}\prod _{l=1}^{b_k}\text{ log-gamma }(\breve{\zeta }^*, \breve{\eta }^*), \end{aligned}$$
(A.9)

with \(\breve{\zeta }^* = \sum _{i:\mathbb {1}_{il}= 1}\delta _i + \breve{\zeta }\) and \(\breve{\eta }^* = \breve{\eta } +\sum _{i:\mathbb {1}_{il}= 1}\exp (\breve{R}_{ik}) \sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\). The marginal posterior distribution of \(\breve{\mathcal {T}}_k\) is then derived as

$$\begin{aligned}{} & {} {} p(\breve{\mathcal {T}}_k|\varvec{\breve{R}}_k, \varvec{\lambda }) \propto p(\breve{\mathcal {T}}_k)\int p\left( \varvec{\breve{R}}_k|\breve{\mathcal {M}}_k, \breve{\mathcal {T}}_k, \varvec{\lambda }\right) p\left( \breve{\mathcal {M}}_k|\breve{\mathcal {T}}_k\right) d\breve{\mathcal {M}}_k,\nonumber \\{}{} & {} {} \propto p(\breve{\mathcal {T}}_k)\prod _{l=1}^{b_k}\int \exp \left[ \sum _{i:\mathbb {1}_{il}= 1}\delta _i\breve{R}_{ik} + \breve{\mu }_{kl}\sum _{i:\mathbb {1}_{il}= 1}\delta _i \right. \nonumber \\{}{} & {} {} \quad \left. - \exp (\breve{\mu }_{kl})\sum _{i:\mathbb {1}_{il}= 1}\exp (\breve{R}_{ik})\sum _{s:i\in {\mathcal {R}}(s)}\lambda _s\delta _s\right] \nonumber \\{}{} & {} {} \quad \times \frac{\breve{\eta }^{\breve{\zeta }}}{\Gamma (\breve{\zeta })}\exp \left\{ \breve{\mu }_{kl}\breve{\zeta } - \exp (\breve{\mu }_{kl})\breve{\eta }\right\} d\breve{\mu }_{kl}\nonumber \\{}{} & {} {} \propto p(\breve{\mathcal {T}}_k)\prod _{l=1}^{b_k}\exp \left( \sum _{i:\mathbb {1}_{il}= 1}\delta _i\breve{R}_{ik}\right) \nonumber \\{}{} & {} {} \quad \int \frac{\breve{\eta }^{\breve{\zeta }}}{\Gamma (\breve{\zeta })}\exp \Bigg [ \breve{\mu }_{kl}\bigg (\sum _{i:\mathbb {1}_{il}= 1}\delta _i + \breve{\zeta }\bigg ) \nonumber \\{}{} & {} {} \quad - \exp (\breve{\mu }_{kl})\bigg \{\breve{\eta }+\sum _{i:\mathbb {1}_{il}= 1}\exp (\breve{R}_{ik})\sum _{s: i\in {\mathcal {R}}(s)}\lambda _s\delta _s\bigg \}\Bigg ]d\breve{\mu }_{kl}\nonumber \\{}{} & {} {} \propto p(\breve{\mathcal {T}}_k)\prod _{l=1}^{b_k}\exp \left( \sum _{i:\mathbb {1}_{il}= 1}\delta _i\breve{R}_{ik}\right) \frac{\breve{\eta }^{\breve{\zeta }}\Gamma (\breve{\zeta }^*)}{\Gamma (\breve{\zeta })(\breve{\eta }^*)^{\breve{\zeta }^*}}. \end{aligned}$$
(A.10)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sun, R., Song, X. Bayesian tree-based heterogeneous mediation analysis with a time-to-event outcome. Stat Comput 34, 24 (2024). https://doi.org/10.1007/s11222-023-10340-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11222-023-10340-1

Keywords

Navigation