Influence Diagnostics for Correlated Binomial Regression Models: An Application to a Data Set on High-Cost Health Services Occurrence

Carlos Alberto Ribeiro Diniz; Rubiane Maria Pires; Carolina Costa Mota Paraíba; Paulo Henrique Ferreira

doi:10.15446/rce.v44n2.85606

Published

2021-07-12

Keywords:

generalized binomial distribution, health plan, influence, overdispersion, regression, residuals

Authors

Carlos Alberto Ribeiro Diniz, Federal University of São Carlos
Rubiane Maria Pires, Federal University of São Carlos
Carolina Costa Mota Paraíba, Federal University of Bahia
Paulo Henrique Ferreira, Federal University of Bahia

Abstract

This paper adopts a frequentist perspective on the class of correlated binomial regression models (Pires & Diniz, 2012), thus providing a new approach to analyzing correlated binary response variables. Model parameters are estimated by direct maximization of the log-likelihood function. We also carry out a diagnostic analysis under the correlated binomial regression model, using residuals based on predictive values and deviance residuals (Cook & Weisberg, 1982) to check model assumptions, and a global influence measure based on case deletion (Cook, 1977) to detect influential observations. Moreover, a sensitivity analysis based on local influence metrics (Cook, 1986), under case-weight, response, and covariate perturbation schemes, is carried out to detect observations that could affect the inferential results. A simulation study is conducted to assess the frequentist properties of the parameter estimates and to check the performance of the diagnostic metrics under the correlated binomial regression model. A data set on high-cost claims made to a private health care provider in Brazil is analyzed to illustrate the proposed methodology.

References

Agresti, A. (2015), Foundations of Linear and Generalized Linear Models, Wiley Series in Probability and Statistics, first edn, Wiley, New Jersey.

Akaike, H. (1974), 'A new look at the statistical model identification', IEEE Transactions on Automatic Control 19(6), 716-723. DOI: https://doi.org/10.1109/TAC.1974.1100705

Altham, P. M. E. (1978), 'Two generalizations of the binomial distribution', Journal of the Royal Statistical Society. Series C 27(2), 162-167. DOI: https://doi.org/10.2307/2346943

Cook, R. D. (1977), 'Detection of influential observations in linear regression', Technometrics 19(1), 15-18. DOI: https://doi.org/10.1080/00401706.1977.10489493

Cook, R. D. (1986), 'Assessment of local influence', Journal of the Royal Statistical Society. Series B (Methodological) 48(2), 133-169. DOI: https://doi.org/10.1111/j.2517-6161.1986.tb01398.x

Cook, R. D. & Weisberg, S. (1982), Residuals and influence in regression, Monographs on statistics and applied probability, Chapman and Hall, London.

Diniz, C. A. R., Tutia, M. H. & Leite, J. G. (2010), 'Bayesian analysis of a correlated binomial model', Brazilian Journal of Probability and Statistics 24(1), 68-77. DOI: https://doi.org/10.1214/08-BJPS014

Efron, B. (1986), 'Double exponential families and their use in generalized linear regression', Journal of the American Statistical Association 81(395), 709-721. DOI: https://doi.org/10.1080/01621459.1986.10478327

Fu, J. & Sproule, R. (1995), 'A generalization of the binomial distribution', Communications in Statistics - Theory and Methods 24(10), 2645-2658. DOI: https://doi.org/10.1080/03610929508831639

Lambert, D. (1992), 'Zero-inflated Poisson regression, with an application to defects in manufacturing', Technometrics 34(1), 1-14. DOI: https://doi.org/10.2307/1269547

Lehmann, E. L. & Casella, G. (1998), Theory of point estimation, second edn, Springer, New York.

Luceño, A. (1995), 'A family of partially correlated Poisson models for overdispersion', Computational Statistics and Data Analysis 20(5), 511-520. DOI: https://doi.org/10.1016/0167-9473(94)00057-P

McCullagh, P. & Nelder, J. A. (1989), Generalized Linear Models, second edn, Chapman and Hall, London. DOI: https://doi.org/10.1007/978-1-4899-3242-6

Nocedal, J. & Wright, S. J. (2006), Numerical Optimization, second edn, Springer-Verlag, New York.

Pires, R. M. & Diniz, C. A. R. (2012), 'Correlated binomial regression models', Computational Statistics and Data Analysis 56(8), 2513-2525. DOI: https://doi.org/10.1016/j.csda.2012.02.004

Prentice, R. L. (1986), 'Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors', Journal of the American Statistical Association 81(394), 321-327. DOI: https://doi.org/10.1080/01621459.1986.10478275

R Development Core Team (2007), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org

Schwarz, G. (1978), 'Estimating the dimension of a model', Annals of Statistics 6(2), 461-464. DOI: https://doi.org/10.1214/aos/1176344136

She, Y. & Owen, A. B. (2011), 'Outlier detection using nonconvex penalized regression', Journal of the American Statistical Association 106(494), 626-639. DOI: https://doi.org/10.1198/jasa.2011.tm10390

Sherman, M. (2011), Spatial Statistics and Spatio-Temporal Data: Covariance Functions and Directional Properties, Wiley Series in Probability and Statistics, John Wiley and Sons. DOI: https://doi.org/10.1002/9780470974391

Skellam, J. G. (1948), 'A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials', Journal of the Royal Statistical Society, Series B 10(2), 257-261. DOI: https://doi.org/10.1111/j.2517-6161.1948.tb00014.x

Zhu, H., Lee, S.-Y., Wei, B.-C. & Zhou, J. (2001), 'Case-deletion measures for models with incomplete data', Biometrika 88(3), 727-737. DOI: https://doi.org/10.1093/biomet/88.3.727

How to Cite

APA

Diniz, C. A. R., Pires, R. M., Paraíba, C. C. M. & Ferreira, P. H. (2021). Influence Diagnostics for Correlated Binomial Regression Models: An Application to a Data Set on High-Cost Health Services Occurrence. Revista Colombiana de Estadística, 44(2), 253–278. https://doi.org/10.15446/rce.v44n2.85606

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
