Published

2021-07-12

Influence Diagnostics for Correlated Binomial Regression Models: An Application to a Data Set on High-Cost Health Services Occurrence

Diagnósticos de influencia para modelos de regresión binomial correlacionada: una aplicación a un conjunto de datos sobre la ocurrencia de servicios de salud de alto costo

DOI:

https://doi.org/10.15446/rce.v44n2.85606

Keywords:

generalized binomial distribution, health plan, influence, overdispersion, regression, residuals (en)
distribución binomial generalizada, plan de salud, influencia, sobredispersión, regresión, residuos (es)

Authors

  • Carlos Alberto Ribeiro Diniz, Federal University of São Carlos
  • Rubiane Maria Pires, Federal University of São Carlos
  • Carolina Costa Mota Paraíba, Federal University of Bahia
  • Paulo Henrique Ferreira, Federal University of Bahia

This paper takes a frequentist perspective on the class of correlated binomial regression models (Pires & Diniz, 2012), thus providing a new approach to analyzing correlated binary response variables. Model parameters are estimated by direct maximization of the log-likelihood function. We also consider a diagnostic analysis under the correlated binomial regression model setup. It uses residuals based on predictive values and deviance residuals (Cook & Weisberg, 1982) to check model assumptions, and a global influence measure based on case deletion (Cook, 1977) to detect influential observations. Moreover, a sensitivity analysis is carried out to detect possible influential observations that could affect the inferential results. This is done using local influence metrics (Cook, 1986) with case-weight, response, and covariate perturbation schemes. A simulation study is conducted to assess the frequentist properties of the model parameter estimates and to check the performance of the considered diagnostic metrics under the correlated binomial regression model. A data set on high-cost claims made to a private health care provider in Brazil is analyzed to illustrate the proposed methodology.

Este artículo considera una perspectiva frecuentista para tratar con la clase de modelos de regresión binomial correlacionada (Pires & Diniz, 2012), proporcionando así un nuevo enfoque para analizar variables de respuesta binaria correlacionadas. Los parámetros del modelo se estiman mediante la maximización directa de la función de log-verosimilitud. También consideramos un análisis de diagnóstico bajo la configuración del modelo de regresión binomial correlacionada, que se realiza considerando los residuos basados en valores predictivos y los residuos de desviación (Cook & Weisberg, 1982) para verificar los supuestos del modelo y la medida de influencia global basada en la eliminación de casos (Cook, 1977) para detectar observaciones influyentes. Además, se realiza un análisis de sensibilidad para detectar posibles observaciones influyentes que podrían afectar los resultados inferenciales. Esto se hace utilizando métricas de influencia local (Cook, 1986) con esquemas de perturbación de covariable, variable respuesta y ponderación de casos. Se realiza un estudio de simulación para evaluar las propiedades frecuentistas de los estimadores de parámetros del modelo y verificar el rendimiento de las métricas de diagnóstico consideradas bajo el modelo de regresión binomial correlacionada. Se analiza un conjunto de datos sobre un plan de salud de un operador brasileño para ilustrar la metodología propuesta.
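The estimation and case-deletion steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: as a stand-in it uses the ordinary binomial logit likelihood rather than the correlated binomial likelihood of Pires & Diniz (2012), and the function names (`neg_loglik`, `fit`, `likelihood_displacement`) and the simulated data are hypothetical. For each case i, the model is refitted without that case and the likelihood displacement LD_i = 2[l(θ̂) − l(θ̂_(i))] is computed, where both log-likelihoods are evaluated on the full data.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def neg_loglik(beta, X, y, m):
    """Negative binomial log-likelihood with a logit link.

    Stand-in for the correlated binomial likelihood (assumption):
    swap this function to study a different model.
    """
    eta = X @ beta
    log_binom = gammaln(m + 1) - gammaln(y + 1) - gammaln(m - y + 1)
    # y*eta - m*log(1 + exp(eta)) is the logit-link binomial kernel
    ll = log_binom + y * eta - m * np.logaddexp(0.0, eta)
    return -np.sum(ll)


def fit(X, y, m, start=None):
    """Estimate beta by direct numerical maximization of the log-likelihood."""
    start = np.zeros(X.shape[1]) if start is None else start
    res = minimize(neg_loglik, start, args=(X, y, m), method="BFGS")
    return res.x


def likelihood_displacement(X, y, m):
    """Case-deletion influence: LD_i = 2 * [l(beta_hat) - l(beta_hat_(i))]."""
    beta_hat = fit(X, y, m)
    ll_full = -neg_loglik(beta_hat, X, y, m)
    n = len(y)
    ld = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without case i, warm-starting from the full-data estimate
        beta_i = fit(X[keep], y[keep], m[keep], start=beta_hat)
        # Evaluate the full-data log-likelihood at the reduced-data estimate
        ld[i] = 2.0 * (ll_full + neg_loglik(beta_i, X, y, m))
    return beta_hat, ld


# Simulated illustration (hypothetical data, not the health-plan data set)
rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
m = np.full(n, 10)
y = rng.binomial(m, expit(-0.5 + 1.0 * x))
idx = int(np.argmax(x))
y[idx] = 0  # contaminate the case with the largest covariate value

beta_hat, ld = likelihood_displacement(X, y, m)
print("estimates:", beta_hat)
print("case with largest displacement:", int(np.argmax(ld)))
```

Cases with a large LD_i relative to the rest are candidates for closer inspection. Local influence analysis would instead perturb case weights, responses, or covariates and examine the curvature of the displacement, which this sketch does not attempt.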

References

Agresti, A. (2015), Foundations of Linear and Generalized Linear Models, Wiley Series in Probability and Statistics, first edn, Wiley, New Jersey.

Akaike, H. (1974), 'A new look at the statistical model identification', IEEE Transactions on Automatic Control 19(6), 716-723. DOI: https://doi.org/10.1109/TAC.1974.1100705

Altham, P. M. E. (1978), 'Two generalizations of the binomial distribution', Journal of the Royal Statistical Society. Series C 27(2), 162-167. DOI: https://doi.org/10.2307/2346943

Cook, R. D. (1977), 'Detection of influential observations in linear regression', Technometrics 19(1), 15-18. DOI: https://doi.org/10.1080/00401706.1977.10489493

Cook, R. D. (1986), 'Assessment of local influence', Journal of the Royal Statistical Society. Series B (Methodological) 48(2), 133-169. DOI: https://doi.org/10.1111/j.2517-6161.1986.tb01398.x

Cook, R. & Weisberg, S. (1982), Residuals and influence in regression, Monographs on statistics and applied probability, Chapman and Hall, London.

Diniz, C. A. R., Tutia, M. H. & Leite, J. G. (2010), 'Bayesian analysis of a correlated binomial model', Brazilian Journal of Probability and Statistics 24(1), 68-77. DOI: https://doi.org/10.1214/08-BJPS014

Efron, B. (1986), 'Double exponential families and their use in generalized linear regression', Journal of the American Statistical Association 81(395), 709-721. DOI: https://doi.org/10.1080/01621459.1986.10478327

Fu, J. & Sproule, R. (1995), 'A generalization of the binomial distribution', Communications in Statistics - Theory and Methods 24(10), 2645-2658. DOI: https://doi.org/10.1080/03610929508831639

Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics 34(1), 1-14. DOI: https://doi.org/10.2307/1269547

Lehmann, E. L. & Casella, G. (1998), Theory of point estimation, second edn, Springer, New York.

Luceño, A. (1995), 'A family of partially correlated Poisson models for overdispersion', Computational Statistics and Data Analysis 20(5), 511-520. DOI: https://doi.org/10.1016/0167-9473(94)00057-P

McCullagh, P. & Nelder, J. A. (1989), Generalized Linear Models, second edn, Chapman and Hall, London. DOI: https://doi.org/10.1007/978-1-4899-3242-6

Nocedal, J. & Wright, S. J. (2006), Numerical Optimization, second edn, Springer-Verlag, New York.

Pires, R. M. & Diniz, C. A. R. (2012), 'Correlated binomial regression models', Computational Statistics and Data Analysis 56(8), 2513-2525. DOI: https://doi.org/10.1016/j.csda.2012.02.004

Prentice, R. L. (1986), 'Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors', Journal of the American Statistical Association 81(394), 321-327. DOI: https://doi.org/10.1080/01621459.1986.10478275

R Development Core Team (2007), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org

Schwarz, G. (1978), 'Estimating the dimension of a model', Annals of Statistics 6(2), 461-464. DOI: https://doi.org/10.1214/aos/1176344136

She, Y. & Owen, A. B. (2011), 'Outlier detection using nonconvex penalized regression', Journal of the American Statistical Association 106(494), 626-639. DOI: https://doi.org/10.1198/jasa.2011.tm10390

Sherman, M. (2011), Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties, Wiley Series in Probability and Statistics, John Wiley and Sons. DOI: https://doi.org/10.1002/9780470974391

Skellam, J. G. (1948), 'A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials', Journal of the Royal Statistical Society, Series B 10(2), 257-261. DOI: https://doi.org/10.1111/j.2517-6161.1948.tb00014.x

Zhu, H., Lee, S.-Y., Wei, B.-C. & Zhou, J. (2001), 'Case-deletion measures for models with incomplete data', Biometrika 88(3), 727-737. DOI: https://doi.org/10.1093/biomet/88.3.727

How to Cite

APA

Diniz, C. A. R., Pires, R. M., Paraíba, C. C. M. and Ferreira, P. H. (2021). Influence Diagnostics for Correlated Binomial Regression Models: An Application to a Data Set on High-Cost Health Services Occurrence. Revista Colombiana de Estadística, 44(2), 253–278. https://doi.org/10.15446/rce.v44n2.85606

