Influence Diagnostics for Correlated Binomial Regression Models: An Application to a Data Set on High-Cost Health Services Occurrence
DOI: https://doi.org/10.15446/rce.v44n2.85606
Keywords: generalized binomial distribution, health plan, influence, overdispersion, regression, residuals
This paper takes a frequentist approach to the class of correlated binomial regression models (Pires & Diniz, 2012), providing a new way to analyze correlated binary response variables. Model parameters are estimated by direct maximization of the log-likelihood function. We also develop a diagnostic analysis for the correlated binomial regression model: residuals based on predictive values and deviance residuals (Cook & Weisberg, 1982) are used to check model assumptions, and a global influence measure based on case deletion (Cook, 1977) is used to detect influential observations. Moreover, a sensitivity analysis is carried out to detect observations that could affect the inferential results. This is done using local influence metrics (Cook, 1986) under case-weight, response, and covariate perturbation schemes. A simulation study is conducted to assess the frequentist properties of the model parameter estimates and to check the performance of the diagnostic metrics under the correlated binomial regression model. A data set on high-cost claims made to a private health care provider in Brazil is analyzed to illustrate the proposed methodology.
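The abstract states only that parameters are estimated by direct maximization of the log-likelihood. As a rough illustration, the Python sketch below assumes the mixture form of the correlated binomial distribution CB(n, p, ρ), P(Y = y) = (1 − ρ)·C(n, y)·p^y·(1 − p)^(n − y) + ρ·[p·1{y = n} + (1 − p)·1{y = 0}], with a logit link for each p_i and ρ = expit(η). This parameterization is an assumption here and may differ from the one in Pires & Diniz (2012). Under it, E(Y) = np and Var(Y) = np(1 − p)[1 + (n − 1)ρ], so ρ > 0 induces overdispersion. The function names, the optimizer (steepest ascent with numerical gradients and a backtracking line search) and the toy data are all hypothetical. They stand in for whatever routine and data the authors actually used.

```python
import math


def expit(t):
    """Numerically stable inverse logit, 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def cb_logpmf(y, n, p, rho):
    """Log-pmf of CB(n, p, rho), ASSUMED to be the mixture
    (1 - rho) * Binomial(n, p) + rho * {Y = n w.p. p, Y = 0 w.p. 1 - p}.
    Requires n >= 1 and 0 <= y <= n."""
    binom = math.comb(n, y) * p**y * (1.0 - p) ** (n - y)
    extra = p if y == n else (1.0 - p if y == 0 else 0.0)
    dens = (1.0 - rho) * binom + rho * extra
    return math.log(dens) if dens > 0.0 else -math.inf


def loglik(theta, X, y, n):
    """theta = [beta_0, ..., beta_{k-1}, eta] with logit(p_i) = x_i' beta and
    rho = expit(eta), so rho stays in (0, 1) without explicit constraints."""
    *beta, eta = theta
    rho = expit(eta)
    total = 0.0
    for xi, yi, ni in zip(X, y, n):
        p_i = expit(sum(b * v for b, v in zip(beta, xi)))
        total += cb_logpmf(yi, ni, p_i, rho)
    return total


def num_grad(f, theta, h=1e-6):
    """Central-difference approximation to the gradient of f at theta."""
    grad = []
    for j in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad


def fit(X, y, n, max_iter=5000, tol=1e-8):
    """Maximize the log-likelihood directly by steepest ascent with an Armijo
    backtracking line search (a simple stand-in for a quasi-Newton routine)."""
    def f(th):
        return loglik(th, X, y, n)

    theta = [0.0] * (len(X[0]) + 1)  # start at beta = 0, rho = 0.5
    fval = f(theta)
    for _ in range(max_iter):
        g = num_grad(f, theta)
        g2 = sum(gj * gj for gj in g)
        if g2 < tol:  # squared gradient norm is small: treat as converged
            break
        step = 1.0
        while step > 1e-12:
            cand = [t + step * gj for t, gj in zip(theta, g)]
            fc = f(cand)
            if fc >= fval + 1e-4 * step * g2:  # sufficient-increase condition
                break
            step *= 0.5
        else:
            break  # no acceptable step: remaining gradient is numerical noise
        theta, fval = cand, fc
    return theta, fval


# Hypothetical toy data: 5 trials per unit, intercept plus one covariate.
X = [[1.0, x] for x in (-1.0, -1.0, -0.5, -0.5, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0)]
y = [0, 0, 0, 1, 0, 5, 2, 5, 5, 4]
n = [5] * 10
theta_hat, ll_hat = fit(X, y, n)
print("beta:", theta_hat[:-1], "rho:", expit(theta_hat[-1]), "loglik:", ll_hat)
```

Reparameterizing ρ through η keeps the search unconstrained. In practice a quasi-Newton method with an analytic or numerical Hessian would converge faster and also yield approximate standard errors. Case-deletion quantities could be obtained by re-maximizing the same `loglik` with one observation left out at a time.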
References
Agresti, A. (2015), Foundations of Linear and Generalized Linear Models, Wiley Series in Probability and Statistics, first edn, Wiley, New Jersey.
Akaike, H. (1974), 'A new look at the statistical model identification', IEEE Transactions on Automatic Control 19(6), 716-723. DOI: https://doi.org/10.1109/TAC.1974.1100705
Altham, P. M. E. (1978), 'Two generalizations of the binomial distribution', Journal of the Royal Statistical Society. Series C 27(2), 162-167. DOI: https://doi.org/10.2307/2346943
Cook, R. D. (1977), 'Detection of influential observations in linear regression', Technometrics 19(1), 15-18. DOI: https://doi.org/10.1080/00401706.1977.10489493
Cook, R. D. (1986), 'Assessment of local influence', Journal of the Royal Statistical Society. Series B (Methodological) 48(2), 133-169. DOI: https://doi.org/10.1111/j.2517-6161.1986.tb01398.x
Cook, R. D. & Weisberg, S. (1982), Residuals and influence in regression, Monographs on statistics and applied probability, Chapman and Hall, London.
Diniz, C. A. R., Tutia, M. H. & Leite, J. G. (2010), 'Bayesian analysis of a correlated binomial model', Brazilian Journal of Probability and Statistics 24(1), 68-77. DOI: https://doi.org/10.1214/08-BJPS014
Efron, B. (1986), 'Double exponential families and their use in generalized linear regression', Journal of the American Statistical Association 81(395), 709-721. DOI: https://doi.org/10.1080/01621459.1986.10478327
Fu, J. & Sproule, R. (1995), 'A generalization of the binomial distribution', Communications in Statistics - Theory and Methods 24(10), 2645-2658. DOI: https://doi.org/10.1080/03610929508831639
Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics 34(1), 1-14. DOI: https://doi.org/10.2307/1269547
Lehmann, E. L. & Casella, G. (1998), Theory of point estimation, second edn, Springer, New York.
Luceño, A. (1995), 'A family of partially correlated Poisson models for overdispersion', Computational Statistics and Data Analysis 20(5), 511-520. DOI: https://doi.org/10.1016/0167-9473(94)00057-P
McCullagh, P. & Nelder, J. A. (1989), Generalized Linear Models, second edn, Chapman and Hall, London. DOI: https://doi.org/10.1007/978-1-4899-3242-6
Nocedal, J. & Wright, S. J. (2006), Numerical Optimization, second edn, Springer-Verlag, New York.
Pires, R. M. & Diniz, C. A. R. (2012), 'Correlated binomial regression models', Computational Statistics and Data Analysis 56(8), 2513-2525. DOI: https://doi.org/10.1016/j.csda.2012.02.004
Prentice, R. L. (1986), 'Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors', Journal of the American Statistical Association 81(394), 321-327. DOI: https://doi.org/10.1080/01621459.1986.10478275
R Development Core Team (2007), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org
Schwarz, G. (1978), 'Estimating the dimension of a model', Annals of Statistics 6(2), 461-464. DOI: https://doi.org/10.1214/aos/1176344136
She, Y. & Owen, A. B. (2011), 'Outlier detection using nonconvex penalized regression', Journal of the American Statistical Association 106(494), 626-639. DOI: https://doi.org/10.1198/jasa.2011.tm10390
Sherman, M. (2011), Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties, Wiley Series in Probability and Statistics, John Wiley and Sons. DOI: https://doi.org/10.1002/9780470974391
Skellam, J. G. (1948), 'A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials', Journal of the Royal Statistical Society, Series B 10(2), 257-261. DOI: https://doi.org/10.1111/j.2517-6161.1948.tb00014.x
Zhu, H., Lee, S.-Y., Wei, B.-C. & Zhou, J. (2001), 'Case-deletion measures for models with incomplete data', Biometrika 88(3), 727-737. DOI: https://doi.org/10.1093/biomet/88.3.727
License
Copyright (c) 2021 Revista Colombiana de Estadística
This work is licensed under a Creative Commons Attribution 4.0 International License.
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.