Abstract
Missing data is a common occurrence in mediation analysis. As a result, the methods used to construct confidence intervals around the indirect effect should account for missing data. Previous research has demonstrated that, for the indirect effect in complete data, Monte Carlo confidence intervals perform as well as nonparametric bootstrap confidence intervals (see MacKinnon et al., Multivariate Behavioral Research, 39(1), 99–128, 2004; Preacher & Selig, Communication Methods and Measures, 6(2), 77–98, 2012; Tofighi & MacKinnon, Structural Equation Modeling: A Multidisciplinary Journal, 23(2), 194–205, 2015). In this manuscript, we propose a simple, fast, and accurate two-step approach, based on the Monte Carlo method, for generating confidence intervals for the indirect effect in the presence of missing data. In the first step, an appropriate method, for example, full-information maximum likelihood or multiple imputation, is used to estimate the parameters of a mediation model and their sampling variance-covariance matrix. In the second step, the sampling distribution of the indirect effect is simulated using the estimates from the first step. A confidence interval is then constructed from the resulting sampling distribution. A simulation study with various conditions is presented. Implications of the results for applied research are discussed.
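The second step can be illustrated with a minimal sketch. It assumes that step one has already produced estimates of the two paths and their sampling variances and covariance (for example, from full-information maximum likelihood or pooled multiple imputation). The function name `monte_carlo_ci`, its arguments, and the numeric inputs are hypothetical and chosen only for illustration; this is not the implementation used in the study.

```python
import math
import random


def monte_carlo_ci(a_hat, b_hat, var_a, var_b, cov_ab=0.0,
                   draws=20000, level=0.95, seed=42):
    """Percentile Monte Carlo CI for the indirect effect a * b.

    Inputs are assumed to come from step one: the two path estimates
    and the elements of their sampling variance-covariance matrix.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2 x 2 sampling covariance matrix.
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21 ** 2)
    products = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Draw (a, b) from a bivariate normal centered on the estimates.
        a = a_hat + l11 * z1
        b = b_hat + l21 * z1 + l22 * z2
        products.append(a * b)
    # Percentile limits of the simulated sampling distribution.
    products.sort()
    tail = (1.0 - level) / 2.0
    lower = products[int(math.floor(tail * (draws - 1)))]
    upper = products[int(math.ceil((1.0 - tail) * (draws - 1)))]
    return lower, upper


# Hypothetical step-one output: both paths .376, sampling variances .01.
lower, upper = monte_carlo_ci(a_hat=0.376, b_hat=0.376,
                              var_a=0.01, var_b=0.01)
print(f"95% Monte Carlo CI: [{lower:.3f}, {upper:.3f}]")
```

Because the interval is read off the simulated distribution of the product, it does not need to be symmetric around the point estimate.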
Notes
While we use and recommend free and open-source statistical software, Mplus was used in the simulations because it is currently the fastest SEM software available. The speed was crucial for this study because it allowed us to investigate a wide range of simulation conditions (2,832) with adequate replication size (5,000). It also allowed us to use a large number of bootstrap samples (5,000) and imputations (100).
The path coefficient .376 for \({\alpha }\) and \({\beta }\) results in the indirect effect .141, the square of which times \(100\%\) is equal to \(2\%\). The path coefficient .600 for \({\alpha }\) and \({\beta }\) results in the indirect effect .361, the square of which times \(100\%\) is equal to \(13\%\). The path coefficient .714 for \({\alpha }\) and \({\beta }\) results in the indirect effect .509, the square of which times \(100\%\) is equal to \(26\%\).
The path coefficient .141 for \(\tau ^{\prime }\) squared times \(100\%\) is equal to \(2\%\). The path coefficient .361 for \(\tau ^{\prime }\) squared times \(100\%\) is equal to \(13\%\). The path coefficient .509 for \(\tau ^{\prime }\) squared times \(100\%\) is equal to \(26\%\).
The ampute function defines the proportion of missing cases as the proportion of the original cases that would be dropped had listwise deletion been employed. There are other ways of defining the proportion of missing data, for example, the number of cells in the data matrix with missing values divided by the total number of cells. For a more thorough presentation of the proportion of missing data used in the study, see https://jeksterslab.github.io/manMCMedMiss/articles/proportion-missing.html.
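The difference between the two definitions can be seen in a small sketch. The helper `missing_proportions` and the toy data are hypothetical and serve only to contrast the case-wise and cell-wise proportions; they are not part of the ampute function.

```python
def missing_proportions(rows):
    """Return (case-wise, cell-wise) proportions of missing data.

    `rows` is a list of equal-length records; None marks a missing cell.
    """
    n_cases = len(rows)
    n_cells = sum(len(row) for row in rows)
    # Cases that listwise deletion would drop.
    incomplete_cases = sum(1 for row in rows if any(v is None for v in row))
    # Individual cells with a missing value.
    missing_cells = sum(1 for row in rows for v in row if v is None)
    return incomplete_cases / n_cases, missing_cells / n_cells


# Toy data: 4 cases on 3 variables, 2 incomplete cases, 3 missing cells.
data = [
    (1.2, 0.7, 3.1),
    (0.4, None, 2.2),
    (None, None, 1.9),
    (2.0, 1.1, 0.5),
]
case_wise, cell_wise = missing_proportions(data)
print(case_wise, cell_wise)  # 0.5 0.25
```

In this toy example, half of the cases are incomplete even though only a quarter of the cells are missing.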
An alternative total variance was introduced by Li et al. (1991) and was used in the simulation. While results for this approach are available in the supplementary materials, they are omitted from the manuscript because this approach did not meaningfully improve on the performance of Eq. 11.
The automated MI implementation in Mplus only pools the diagonal elements of the sampling variance-covariance matrix. The wrapper function FitModelMI fits the model to each imputed data set in Mplus using normal-theory maximum likelihood, extracts the entire sampling variance-covariance matrix from each fitted model, and pools these matrices.
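Pooling a full sampling variance-covariance matrix can be sketched with Rubin's (1987) rules, where the total covariance is the within-imputation covariance plus \((1 + 1/M)\) times the between-imputation covariance. The function `pool_rubin` and the numeric inputs below are hypothetical; this is an illustration of the general pooling idea under that assumption, not the FitModelMI code.

```python
def pool_rubin(estimates, covariances):
    """Pool M sets of estimates and their full covariance matrices.

    estimates: list of M parameter vectors (lists of length p).
    covariances: list of M p-by-p sampling covariance matrices.
    Returns (pooled estimates, total covariance W + (1 + 1/M) * B).
    """
    m = len(estimates)
    p = len(estimates[0])
    pooled = [sum(est[j] for est in estimates) / m for j in range(p)]
    # Within-imputation covariance: average of the M matrices.
    within = [[sum(cov[j][k] for cov in covariances) / m for k in range(p)]
              for j in range(p)]
    # Between-imputation covariance of the estimates (divisor M - 1).
    between = [[sum((est[j] - pooled[j]) * (est[k] - pooled[k])
                    for est in estimates) / (m - 1)
                for k in range(p)] for j in range(p)]
    total = [[within[j][k] + (1 + 1 / m) * between[j][k] for k in range(p)]
             for j in range(p)]
    return pooled, total


# Hypothetical results from M = 3 imputations for two path coefficients.
estimates = [[0.30, 0.40], [0.34, 0.44], [0.32, 0.42]]
covariances = [[[0.010, 0.001], [0.001, 0.012]] for _ in range(3)]
pooled, total = pool_rubin(estimates, covariances)
print(pooled)
print(total)
```

The off-diagonal elements of the pooled matrix are retained here because the Monte Carlo method draws the parameters jointly, which requires their sampling covariances as well as their variances.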
References
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced structural equation modeling. Psychology Press. https://doi.org/10.4324/9781315827414
Arbuckle, J. L. (2021). Amos 28.0 user’s guide. IBM SPSS.
Aroian, L. A. (1947). The probability function of the product of two normally distributed variables. The Annals of Mathematical Statistics, 18(2), 265–271. https://doi.org/10.1214/aoms/1177730442
Asparouhov, T., & Muthén, B. O. (2022). Multiple imputation with Mplus (Tech. Rep.). http://www.statmodel.com/download/Imputations7.pdf
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173
Bauer, D. J., Preacher, K. J., & Gil, K. M. (2006). Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: New procedures and recommendations. Psychological Methods, 11(2), 142–163. https://doi.org/10.1037/1082-989x.11.2.142
Biesanz, J. C., Falk, C. F., & Savalei, V. (2010). Assessing mediational models: Testing and interval estimation for indirect effects. Multivariate Behavioral Research, 45(4), 661–701. https://doi.org/10.1080/00273171.2010.498292
Blanca, M. J., Arnau, J., López-Montiel, D., Bono, R., & Bendayan, R. (2013). Skewness and kurtosis in real data samples. Methodology, 9(2), 78–84. https://doi.org/10.1027/1614-2241/a000057
Bollen, K. A., & Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. Sociological Methodology, 20, 115. https://doi.org/10.2307/271084
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31(2), 144–152. https://doi.org/10.1111/j.2044-8317.1978.tb00581.x
Cheung, G. W., & Lau, R. S. (2007). Testing mediation and suppression effects of latent variables. Organizational Research Methods, 11(2), 296–325. https://doi.org/10.1177/1094428107300343
Cheung, M. W.-L. (2009a). Comparison of methods for constructing confidence intervals of standardized indirect effects. Behavior Research Methods, 41(2), 425–438. https://doi.org/10.3758/brm.41.2.425
Cheung, M. W.-L. (2009b). Constructing approximate confidence intervals for parameters with structural equation models. Structural Equation Modeling: A Multidisciplinary Journal, 16(2), 267–294. https://doi.org/10.1080/10705510902751291
Cheung, M. W.-L. (2021). Synthesizing indirect effects in mediation models with meta-analytic methods. Alcohol and Alcoholism, 57(1), 5–15. https://doi.org/10.1093/alcalc/agab044
Cochran, W. G. (1952). The \(\chi ^{2}\) test of goodness of fit. The Annals of Mathematical Statistics, 23(3), 315–345. https://doi.org/10.1214/aoms/1177729380
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587
Craig, C. C. (1936). On the frequency function of \(xy\). The Annals of Mathematical Statistics, 7(1), 1–15. https://doi.org/10.1214/aoms/1177732541
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. Cambridge University Press. https://doi.org/10.1017/CBO9780511802843
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185. https://doi.org/10.1080/01621459.1987.10478410
Efron, B. (1988). Bootstrap confidence intervals: Good or bad? Psychological Bulletin, 104(2), 293–296. https://doi.org/10.1037/0033-2909.104.2.293
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall. https://doi.org/10.1201/9780429246593
Enders, C. K. (2010). Applied missing data analysis. Guilford Publications.
Fritz, M. S., & MacKinnon, D. P. (2007). Required sample size to detect the mediated effect. Psychological Science, 18(3), 233–239. https://doi.org/10.1111/j.1467-9280.2007.01882.x
Goodman, L. A. (1960). On the exact variance of products. Journal of the American Statistical Association, 55(292), 708–713. https://doi.org/10.1080/01621459.1960.10483369
Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8(3), 206–213. https://doi.org/10.1007/s11121-007-0070-9
Hayes, A. F. (2022). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (3rd ed.). Guilford Publications.
Hayes, A. F., & Scharkow, M. (2013). The relative trustworthiness of inferential tests of the indirect effect in statistical mediation analysis. Psychological Science, 24(10), 1918–1927. https://doi.org/10.1177/0956797613480187
Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2022). semTools: Useful tools for structural equation modeling. https://CRAN.R-project.org/package=semTools
Koopman, J., Howe, M., & Hollenbeck, J. R. (2014). Pulling the Sobel test up by its bootstraps. In More statistical and methodological myths and urban legends: Doctrine, verity and fable in organizational and social sciences (pp. 224–243). Routledge/Taylor & Francis Group. https://doi.org/10.4324/9780203775851
Koopman, J., Howe, M., Hollenbeck, J. R., & Sin, H.-P. (2015). Small sample mediation testing: Misplaced confidence in bootstrapped confidence intervals. Journal of Applied Psychology, 100(1), 194–202. https://doi.org/10.1037/a0036635
Li, K. H., Raghunathan, T. E., & Rubin, D. B. (1991). Large-sample significance levels from multiply imputed data using moment-based statistics and an F reference distribution. Journal of the American Statistical Association, 86(416), 1065–1073. https://doi.org/10.1080/01621459.1991.10475152
Little, R. J. A., & Rubin, D. B. (2019). Statistical analysis with missing data (3rd ed.). Wiley. https://doi.org/10.1002/9781119482260
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. Erlbaum Psych Press. https://doi.org/10.4324/9780203809556
MacKinnon, D. P., Fritz, M. S., Williams, J., & Lockwood, C. M. (2007). Distribution of the product confidence limits for the indirect effect: Program PRODCLIN. Behavior Research Methods, 39(3), 384–389. https://doi.org/10.3758/bf03193007
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7(1), 83–104. https://doi.org/10.1037/1082-989x.7.1.83
MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence limits for the indirect effect: Distribution of the product and resampling methods. Multivariate Behavioral Research, 39(1), 99–128. https://doi.org/10.1207/s15327906mbr3901_4
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156–166. https://doi.org/10.1037/0033-2909.105.1.156
Muthén, L. K., & Muthén, B. O. (2017). Mplus user’s guide (8th ed.). Muthén & Muthén.
Pawitan, Y. (2013). In all likelihood: Statistical modelling and inference using likelihood. Oxford University Press.
Pesigan, I. J. A., & Cheung, S. F. (2020). SEM-based methods to form confidence intervals for indirect effect: Still applicable given nonnormality, under certain conditions. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.571928
Peugh, J. L., & Enders, C. K. (2004). Missing data in educational research: A review of reporting practices and suggestions for improvement. Review of Educational Research, 74(4), 525–556. https://doi.org/10.3102/00346543074004525
Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717–731. https://doi.org/10.3758/bf03206553
Preacher, K. J., & Hayes, A. F. (2008). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40(3), 879–891. https://doi.org/10.3758/brm.40.3.879
Preacher, K. J., & Selig, J. P. (2012). Advantages of Monte Carlo confidence intervals for indirect effects. Communication Methods and Measures, 6(2), 77–98. https://doi.org/10.1080/19312458.2012.679848
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Raghunathan, T. E., Lepkowski, J. M., Hoewyk, J. V., & Solenberger, P. (2001). A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology, 27(1), 85–95.
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2). https://doi.org/10.18637/jss.v048.i02
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. https://doi.org/10.1093/biomet/63.3.581
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. John Wiley & Sons, Inc. https://doi.org/10.1002/9780470316696
Savalei, V., & Rosseel, Y. (2021). Computational options for standard errors and test statistics with incomplete normal and nonnormal data in SEM. Structural Equation Modeling: A Multidisciplinary Journal, 29(2), 163–181. https://doi.org/10.1080/10705511.2021.1877548
Schafer, J. L. (1997). Analysis of incomplete multivariate data. Chapman and Hall/CRC. https://doi.org/10.1201/9780367803025
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. https://doi.org/10.1037/1082-989x.7.2.147
Schouten, R. M., Lugtig, P., & Vink, G. (2018). Generating missing values for simulation purposes: A multivariate amputation procedure. Journal of Statistical Computation and Simulation, 88(15), 2909–2930. https://doi.org/10.1080/00949655.2018.1491577
Serlin, R. C. (2000). Testing for robustness in Monte Carlo studies. Psychological Methods, 5(2), 230–240. https://doi.org/10.1037/1082-989x.5.2.230
Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7(4), 422–445. https://doi.org/10.1037/1082-989x.7.4.422
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology, 13, 290. https://doi.org/10.2307/270723
Sobel, M. E. (1986). Some new results on indirect effects and their standard errors in covariance structure models. Sociological Methodology, 16, 159. https://doi.org/10.2307/270922
Sobel, M. E. (1987). Direct and indirect effects in linear structural equation models. Sociological Methods & Research, 16(1), 155–176. https://doi.org/10.1177/0049124187016001006
Taylor, A. B., MacKinnon, D. P., & Tein, J.-Y. (2007). Tests of the three-path mediated effect. Organizational Research Methods, 11(2), 241–269. https://doi.org/10.1177/1094428107300344
Tofighi, D., & Kelley, K. (2019). Indirect effects in sequential mediation models: Evaluating methods for hypothesis testing and confidence interval formation. Multivariate Behavioral Research, 55(2), 188–210. https://doi.org/10.1080/00273171.2019.1618545
Tofighi, D., & Kelley, K. (2020). Improved inference in mediation analysis: Introducing the model-based constrained optimization procedure. Psychological Methods, 25, 496–515. https://doi.org/10.1037/met0000259
Tofighi, D., & MacKinnon, D. P. (2015). Monte Carlo confidence intervals for complex functions of indirect effects. Structural Equation Modeling: A Multidisciplinary Journal, 23(2), 194–205. https://doi.org/10.1080/10705511.2015.1057284
van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9780429492259
van Buuren, S., Brand, J. P. L., Groothuis-Oudshoorn, C. G. M., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064. https://doi.org/10.1080/10629360600810434
van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3). https://doi.org/10.18637/jss.v045.i03
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S. Springer. https://doi.org/10.1007/978-0-387-21706-2
Venzon, D. J., & Moolgavkar, S. H. (1988). A method for computing profile-likelihood-based confidence intervals. Applied Statistics, 37(1), 87. https://doi.org/10.2307/2347496
Wu, W., & Jia, F. (2013). A new procedure to test mediation with missing data through nonparametric bootstrapping and multiple imputation. Multivariate Behavioral Research, 48(5), 663–691. https://doi.org/10.1080/00273171.2013.816235
Yuan, K.-H., & Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance structure analysis with nonnormal missing data. Sociological Methodology, 30(1), 165–200. https://doi.org/10.1111/0081-1750.00078
Yzerbyt, V., Muller, D., Batailler, C., & Judd, C. M. (2018). New recommendations for testing indirect effects in mediational models: The need to report and test component paths. Journal of Personality and Social Psychology, 115(6), 929–943. https://doi.org/10.1037/pspa0000132
Zhang, Z., & Wang, L. (2012). Methods for mediation analysis with missing data. Psychometrika, 78(1), 154–184. https://doi.org/10.1007/s11336-012-9301-5
Zhang, Z., Wang, L., & Tong, X. (2015). Mediation analysis with missing data through multiple imputation and bootstrap. In Quantitative Psychology Research (pp. 341–355). Springer International Publishing. https://doi.org/10.1007/978-3-319-19977-1_24
Acknowledgements
The simulation was performed in part at the High-Performance Computing Cluster (HPCC) which is supported by the Information and Communication Technology Office (ICTO) of the University of Macau.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pesigan, I.J.A., Cheung, S.F. Monte Carlo confidence intervals for the indirect effect with missing data. Behav Res 56, 1678–1696 (2024). https://doi.org/10.3758/s13428-023-02114-4
DOI: https://doi.org/10.3758/s13428-023-02114-4