Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods

  • Original paper
Computational Statistics

Abstract

Zero-inflated Poisson (ZIP) regression is widely applied to model effects of covariates on an outcome count with excess zeros. In some applications, covariates in a ZIP regression model are partially observed. Based on the imputed data generated by applying the multiple imputation (MI) schemes developed by Wang and Chen (Ann Stat 37:490–517, 2009), two methods are proposed to estimate the parameters of a ZIP regression model with covariates missing at random. One, proposed by Rubin (in: Proceedings of the survey research methods section of the American Statistical Association, 1978), consists of obtaining a unified estimate as the average of estimates from all imputed data sets. The other, proposed by Fay (J Am Stat Assoc 91:490–498, 1996), consists of averaging the estimating scores from all imputed data sets to solve the imputed estimating equation. Moreover, it is shown that the two proposed estimation methods are asymptotically equivalent to the semiparametric inverse probability weighting method. A modified formula is proposed to estimate the variances of the MI estimators. An extensive simulation study is conducted to investigate the performance of the estimation methods. The practicality of the methodology is illustrated with a data set from a motorcycle survey of traffic regulations.

References

  • Barry SC, Welsh AH (2002) Generalized additive modeling and zero-inflated count data. Ecol Model 157:179–188

  • Böhning D, Dietz E, Schlattmann P, Mendonca L, Kirchner U (1999) The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology. J R Stat Soc Ser A 162:195–209

  • Cameron AC, Trivedi PK (2013) Regression analysis of count data, 2nd edn. Cambridge University Press, New York

  • Clayton D, Spiegelhalter D, Dunn G, Pickles A (1998) Analysis of longitudinal binary data from multiphase sampling (with discussion). J R Stat Soc Ser B 60:71–87

  • Chen XD, Fu YZ (2011) Model selection for zero-inflated regression with missing covariates. Comput Stat Data Anal 55:765–773

  • Cheung YB (2002) Zero-inflated models for regression analysis of count data, a study of growth and development. Stat Med 21:1461–1469

  • Creemers A, Aerts M, Hens N, Molenberghs G (2012) A nonparametric approach to weighted estimating equations for regression analysis with missing covariates. Comput Stat Data Anal 56:100–113

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38

  • Deng D, Paul SR (2000) Score tests for zero inflation in generalized linear models. Can J Stat 27:563–570

  • Deng D, Paul SR (2005) Score tests for zero-inflation and over-dispersion in generalized linear models. Stat Sin 15:257–276

  • Dietz K, Böhning D (1997) The use of two-component mixture models with one completely or partly known component. Comput Stat 12:219–234

  • Fay RE (1996) Alternative paradigms for the analysis of imputed survey data. J Am Stat Assoc 91:490–498

  • Hall DB, Shen J (2010) Robust estimation for zero-inflated Poisson regression. Scand J Stat 37:237–252

  • Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685

  • Hsieh SH, Lee SM, Shen PS (2009) Semiparametric analysis of randomized response data with missing covariates in logistic regression. Comput Stat Data Anal 53:2673–2692

  • Hsieh SH, Lee SM, Shen PS (2010) Logistic regression analysis of randomized response data with missing covariates. J Stat Plan Inference 140:927–940

  • Huang L, Zheng D, Zalkikar J, Tiwari R (2017) Zero-inflated Poisson model based likelihood ratio test for drug safety signal detection. Stat Methods Med Res 26:471–488

  • Jansakul N, Hinde JP (2002) Score tests for zero-inflated Poisson models. Comput Stat Data Anal 40:75–96

  • Johnson NL, Kemp AW, Kotz S (2005) Univariate discrete distributions, 3rd edn. Wiley, New York

  • Lambert D (1992) Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34:1–14

  • Lee SM, Gee MJ, Hsieh SH (2011) Semiparametric methods in the proportional odds model for ordinal response data with missing covariates. Biometrics 67:788–798

  • Lee JH, Han G, Fulp WJ, Giuliano AR (2012a) Analysis of overdispersed count data: application to the human papillomavirus infection in men (HIM) study. Epidemiol Infect 140:1087–1094

  • Lee SM, Li CS, Hsieh SH, Huang LH (2012b) Semiparametric estimation of logistic regression model with missing covariates and outcome. Metrika 75:621–653

  • Lee SM, Hwang WH, Tapsoba JD (2016) Estimation in closed capture-recapture models when covariates are missing at random. Biometrics 72:1294–1304

  • Li CS (2011) A Lack-of-fit test for parametric zero-inflated Poisson models. J Stat Comput Simul 81:1081–1098

  • Li CS (2012) Score test for semiparametric zero-inflated Poisson model. Int J Stat Probab 1:1–7

  • Little RJA (1992) Regression with missing X’s: a review. J Am Stat Assoc 87:1227–1237

  • Liu H, Powers DA (2007) Growth curve models for zero-inflated count data: An application to smoking behavior. Struct Equ Model Multidiscip J 14:247–279

  • Lu SE, Lin Y, Shih WCJ (2004) Analyzing excessive no changes in clinical trials with clustered data. Biometrics 60:257–267

  • Lukusa TM, Lee SM, Li CS (2016) Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates. Metrika 79:457–483

  • Lukusa TM, Lee SM, Li CS (2017) Review of zero-inflated models with missing data. Curr Res Biostat 7:1–12

  • Mullahy J (1986) Specification and testing of some modified count data models. J Econ 33:341–365

  • Pahel BT, Preisser JS, Stearns SC, Rozier RG (2011) Multiple imputation of dental caries data using a zero-inflated Poisson regression model. J Public Health Dent 71:71–78

  • Reilly M, Pepe MS (1995) A mean score method for missing and auxiliary covariates data in regression methods. Biometrika 82:299–314

  • Ridout M, Demetrio CGB, Hinde J (1998) Models for count data with many zeros. In: 19th international biometric conference, Cape Town, pp 179–192

  • Righi P, Falorsi S, Fasulo A (2014) Methods for variance estimation under random hot deck imputation in business surveys. Rivista Di Statistica Ufficiale N 1–2(2014):45–64

  • Robins JM, Wang N (2000) Inference for imputation estimators. Biometrika 87:113–124

  • Robins JM, Rotnitzky A, Zhao LP (1994) Estimation of regression coefficients when some regressors are not always observed. J Am Stat Assoc 89:846–866

  • Rubin DB (1976) Inference and missing data. Biometrika 63:581–592

  • Rubin DB (1978) Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Proceedings of the survey research methods section of the American Statistical Association, vol 1. American Statistical Association, Boston, pp 20–28

  • Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York

  • Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91:473–489

  • Rubin DB, Schenker N (1986) Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81:366–374

  • Samani EB, Ganjali M, Amirian Y (2012) Zero-inflated power series joint model to analyze count data with missing responses. J Stat Theor Pract 6:334–343

  • Schafer JL, Graham JW (2002) Missing data: our view of the state of the art. Psychol Methods 7:147–177

  • Singh S (1963) A note on inflated Poisson distribution. J Indian Stat Assoc 1:140–144

  • Van den Broek J (1995) A score test for zero inflation in a Poisson distribution. Biometrics 51:738–743

  • Wang S, Wang CY (2001) A note on kernel assisted estimators in missing covariate regression. Stat Probab Lett 55:439–449

  • Wang D, Chen SX (2009) Empirical likelihood for estimating equations with missing values. Ann Stat 37:490–517

  • Wang CY, Wang S, Zhao LP, Ou ST (1997) Weighted semiparametric estimation in regression with missing covariate data. J Am Stat Assoc 92:512–525

  • Wang CY, Chen JC, Lee SM, Ou ST (2002) Joint conditional likelihood estimator in logistic regression with missing covariate data. Stat Sin 12:555–574

  • Xiang L, Lee AH, Yau KKW, McLachlan GJ (2007) A score test for overdispersion in zero-inflated Poisson mixed regression model. Stat Med 26:1608–1622

  • Yau KKW, Lee AH (2001) Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Stat Med 20:2907–2920

  • Zhao LP, Lipsitz S (1992) Designs and analysis of two-stage studies. Stat Med 11:769–782

Acknowledgements

The authors are very grateful for two referees’ helpful comments and suggestions that improved the presentation. This work was supported by the Ministry of Science and Technology of Taiwan (S.M. Lee).

Author information

Corresponding author

Correspondence to Chin-Shang Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

1.1 Proof of Theorem 1

It can be obtained from the empirical CDF \(\hat{F}(x|Y_i,\varvec{V}_i)\) in (10) that for \(i=1,\dots ,n\) and \(v=1,\dots ,M\),

$$\begin{aligned} E_{\hat{F}}(\tilde{S}_{iv}({\varvec{\theta }})|Y_i,\varvec{V}_i) =\sum _{k=1}^{n}\dfrac{\delta _kI(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)S_k({\varvec{\theta }})}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}. \end{aligned}$$
(18)

By using the expression of \(U_v({\varvec{\theta }})\) in (11), the expression of \(E_{\hat{F}}(\tilde{S}_{iv}({\varvec{\theta }})|Y_i,\varvec{V}_i)\) in (18), and the fact that

$$\begin{aligned} \sum _{i=1}^{n}(1-\delta _i)E_{\hat{F}}(\tilde{S}_{iv}({\varvec{\theta }})|Y_i,\varvec{V}_i) =\sum _{k=1}^n\delta _kS_k({\varvec{\theta }})\left[ \dfrac{1}{\hat{\pi }(Y_k,\varvec{V}_k)}-1\right] , \end{aligned}$$
(19)

\(v=1,\ldots ,M\), we have \(E_{\hat{F}}(U_v({\varvec{\theta }})|\mathcal {O})=n^{-1/2}\sum _{i=1}^n[\delta _i/\hat{\pi }(Y_i,\varvec{V}_i)]S_i({\varvec{\theta }}) =U_w({\varvec{\theta }},\hat{\varvec{\pi }})\), \(v=1,\ldots ,M\). Similarly, it can be shown that \(E_{\hat{F}}(\partial {U}_v({\varvec{\theta }})/\partial {\varvec{\theta }}|\mathcal {O}) =\partial {U}_w({\varvec{\theta }},\hat{\varvec{\pi }})/\partial {\varvec{\theta }}\) and, hence, \(E(\partial {U}_v({\varvec{\theta }})/\partial {\varvec{\theta }}) =E(\partial {U}_w({\varvec{\theta }},\hat{\varvec{\pi }})/\partial {\varvec{\theta }})\), \(v=1,\ldots ,M\).
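Identity (19) can be checked numerically on a small example. The sketch below is illustrative only: it assumes scalar scores, encodes each distinct value of \((Y_i,\varvec{V}_i)\) as an integer stratum label, and uses the hypothetical function name `check_identity_19`. It computes the conditional mean in (18) and the empirical selection probability \(\hat{\pi }\) per stratum, then evaluates both sides of (19).

```python
import numpy as np

def check_identity_19(strata, delta, score):
    """Return (lhs, rhs) of identity (19) for scalar scores.

    strata : integer label of the (Y_i, V_i) cell of each unit (assumed encoding)
    delta  : 1 if the covariate is observed, 0 if it is missing
    score  : S_k(theta); only the entries with delta == 1 are used
    """
    strata = np.asarray(strata)
    delta = np.asarray(delta, dtype=float)
    score = np.asarray(score, dtype=float)

    _, g = np.unique(strata, return_inverse=True)
    n_g = np.bincount(g)                             # units per stratum
    m_g = np.bincount(g, weights=delta)              # observed units per stratum
    if np.any(m_g == 0):
        raise ValueError("every stratum needs at least one observed unit")

    obs_sum = np.bincount(g, weights=delta * score)  # sum of observed scores
    cond_mean = (obs_sum / m_g)[g]                   # right-hand side of (18), per unit
    pi_hat = (m_g / n_g)[g]                          # empirical selection probability

    lhs = np.sum((1.0 - delta) * cond_mean)
    rhs = np.sum(delta * score * (1.0 / pi_hat - 1.0))
    return float(lhs), float(rhs)
```

Within a stratum with \(n_g\) units and \(m_g\) observed units, both sides reduce to \((n_g-m_g)/m_g\) times the sum of the observed scores, which is why the two returned values agree.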

Recall that \(S_i^*({\varvec{\theta }})=E(S_i({\varvec{\theta }})|Y_i,\varvec{V}_i)\), \(i=1,\dots ,n\). As given in (16), \(U_{m2}({\varvec{\theta }})\) can be expressed as follows:

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left[ \delta _iS_i({\varvec{\theta }})+(1-\delta _i)S_i^*({\varvec{\theta }})\right] \nonumber \\&+\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \tilde{S}_{i}({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|\mathcal {O})\right] \nonumber \\&+\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |\mathcal {O})-S_i^*({\varvec{\theta }})\right] . \end{aligned}$$
(20)

Note that the second term of the expression of \(U_{m2}({\varvec{\theta }})\) in (20) can be reformulated as

$$\begin{aligned} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) [\tilde{S}_{i}({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|\mathcal {O})] =O_p(M^{-1/2}), \end{aligned}$$
(21)

where \(\tilde{S}_i({\varvec{\theta }})=\sum _{v=1}^{M}\tilde{S}_{iv}({\varvec{\theta }})/M\). The third term of the expression of \(U_{m2}({\varvec{\theta }})\) in (20) can be rewritten as follows:

$$\begin{aligned}&\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\left[ E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|\mathcal {O})-S_i^*({\varvec{\theta }})\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\left[ \sum _{k=1}^{n}\dfrac{\delta _kS_k({\varvec{\theta }})I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)} \nonumber \right. \\&\quad \left. -S_i^*({\varvec{\theta }})\sum _{k=1}^{n}\dfrac{\delta _kI(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\left[ \sum _{k=1}^{n}\dfrac{\delta _k[S_k({\varvec{\theta }})-S_i^*({\varvec{\theta }})] I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _rI(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \sum _{k=1}^n\dfrac{\delta _k[S_k({\varvec{\theta }})-S_i^*({\varvec{\theta }})] I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}\right] \nonumber \\&\quad \times \,\left[ \dfrac{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}{\sum _{r=1}^{n}\delta _r I(Y_r=Y_i,\varvec{V}_r=\varvec{V}_i)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\dfrac{(1-\delta _i)}{\hat{\pi }(Y_i,\varvec{V}_i)} \left[ \dfrac{\sum _{k=1}^{n}\delta _k[S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})]I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}\right] \nonumber \\&=\dfrac{1}{\sqrt{n}}\sum _{k=1}^n\delta _k[S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})] \left\{ \sum _{i=1}^{n}\dfrac{(1-\delta _i)}{\hat{\pi }(Y_i,\varvec{V}_i)} \left[ \dfrac{I(Y_k=Y_i,\varvec{V}_k=\varvec{V}_i)}{\sum _{s=1}^{n}I(Y_s=Y_i,\varvec{V}_s=\varvec{V}_i)}\right] \right\} \nonumber \\&=\dfrac{1}{\sqrt{n}}\sum _{k=1}^{n}\delta _k[S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})] \left[ \dfrac{1-\hat{\pi }(Y_k,\varvec{V}_k)}{\hat{\pi }(Y_k,\varvec{V}_k)}\right] \nonumber \\&= \dfrac{1}{\sqrt{n}}\sum _{k=1}^{n}\delta _k\left[ \dfrac{1-\pi (Y_k,\varvec{V}_k)}{\pi (Y_k,\varvec{V}_k)}\right] \left[ S_k({\varvec{\theta }})-S_k^*({\varvec{\theta }})\right] +o_p(1). \end{aligned}$$
(22)

Hence, from (21) and (22), \(U_{m2}({\varvec{\theta }})\) can be re-expressed as

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}S_i({\varvec{\theta }})+ \left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\right\} \nonumber \\&+ O_p(M^{-1/2})+o_p(1). \end{aligned}$$
(23)
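The passage from (20) and (22) to (23) is a term-by-term regrouping. As a sketch, writing \(\pi _i\), \(S_i\) and \(S_i^*\) as shorthand (introduced here only for brevity) for \(\pi (Y_i,\varvec{V}_i)\), \(S_i({\varvec{\theta }})\) and \(S_i^*({\varvec{\theta }})\), the \(i\)th summand of the first term of (20) plus the \(i\)th summand of the leading term of (22) is

$$\begin{aligned} \delta _iS_i+(1-\delta _i)S_i^*+\delta _i\left[ \dfrac{1}{\pi _i}-1\right] (S_i-S_i^*)&= \dfrac{\delta _i}{\pi _i}S_i+\left[ 1-\delta _i-\dfrac{\delta _i}{\pi _i}+\delta _i\right] S_i^* \\&= \dfrac{\delta _i}{\pi _i}S_i+\left[ 1-\dfrac{\delta _i}{\pi _i}\right] S_i^*, \end{aligned}$$

which is the summand appearing in the leading term of (23).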

Because the first term of (23) is a sum of independent and identically distributed random vectors, it follows from the multivariate central limit theorem that \(U_{m2}({\varvec{\theta }}){\mathop {\rightarrow }\limits ^{d}}\mathcal {N}(\varvec{0},M({\varvec{\theta }},\varvec{\pi }))\) as \(n,M\rightarrow \infty \), where \(M({\varvec{\theta }},\varvec{\pi })\) is given in (15). In addition, \(U_{m2}({\varvec{\theta }})\) in (23) can be expressed as

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} U_w({\varvec{\theta }},\varvec{\pi }) +\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\nonumber \\&+ O_p(M^{-1/2}) +o_p(1). \end{aligned}$$
(24)

Because \(\hat{\varvec{\theta }}_{m2}^{(M)}\) is the solution of \(U_{m2}({\varvec{\theta }})=\varvec{0}\), it follows by a Taylor’s expansion of \(U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})\) at \({\varvec{\theta }}\) and the expression of \(U_{m2}({\varvec{\theta }})\) in (23) that

$$\begin{aligned} \varvec{0}=U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}\left\{ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}S_i({\varvec{\theta }}) +\left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\right\} \\&- G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})+O_p(M^{-1/2})+o_p(1), \end{aligned}$$

where \(G({\varvec{\theta }},\varvec{\pi })=E[-\partial {U}_w({\varvec{\theta }},\varvec{\pi })/(\sqrt{n}\partial {\varvec{\theta }})]\). Therefore, it can be obtained that

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}G^{-1}({\varvec{\theta }},\varvec{\pi })\left\{ \sum _{i=1}^n \left[ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}S_i({\varvec{\theta }})+\left( 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right) S_i^*({\varvec{\theta }})\right] \right\} \\&+\, O_p(M^{-1/2})+o_p(1). \end{aligned}$$

This implies that \(\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }}){\mathop {\rightarrow }\limits ^{d}}\mathcal {N}(\varvec{0},\Delta _m({\varvec{\theta }}))\) as \(n,M\rightarrow \infty \), where \(\Delta _m({\varvec{\theta }})=G^{-1}({\varvec{\theta }},\varvec{\pi })M({\varvec{\theta }},\varvec{\pi })[G^{-1}({\varvec{\theta }},\varvec{\pi })]^T\).
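For readers who want to evaluate the sandwich form \(G^{-1}M[G^{-1}]^T\) numerically, the following is a minimal sketch. It assumes that estimates of \(G({\varvec{\theta }},\varvec{\pi })\) and \(M({\varvec{\theta }},\varvec{\pi })\) are already available as square arrays; how those estimates are obtained is not covered here, and the function name `sandwich_cov` is hypothetical.

```python
import numpy as np

def sandwich_cov(G, M_mat):
    """Compute G^{-1} M_mat (G^{-1})^T without forming G^{-1} explicitly."""
    G = np.asarray(G, dtype=float)
    M_mat = np.asarray(M_mat, dtype=float)
    left = np.linalg.solve(G, M_mat)      # G^{-1} M_mat
    # solve(G, left.T) = G^{-1} M_mat^T G^{-T}; its transpose is G^{-1} M_mat G^{-T}
    return np.linalg.solve(G, left.T).T
```

Using linear solves instead of an explicit inverse is a common numerical choice; the result is the same matrix product up to rounding.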

Let \(\hat{\varvec{\theta }}_v\) be the solution to the estimating equations \(U_v({\varvec{\theta }})=\varvec{0}\). We have by a Taylor’s expansion of \(U_v(\hat{\varvec{\theta }}_v)\) at \({\varvec{\theta }}\) that

$$\begin{aligned} \varvec{0}=U_v(\hat{\varvec{\theta }}_v) =U_v({\varvec{\theta }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_v-{\varvec{\theta }})+o_p(1). \end{aligned}$$

Hence, it follows that \(\sqrt{n}(\hat{\varvec{\theta }}_v-{\varvec{\theta }})=G^{-1}({\varvec{\theta }},\varvec{\pi })U_v({\varvec{\theta }})+o_p(1)\). Because \(\hat{\varvec{\theta }}_{m1}^{(M)}=\sum _{v=1}^{M}\hat{\varvec{\theta }}_v/M\), using the above result and the expressions for \(U_{m2}({\varvec{\theta }})\) in (13) and (23), we obtain

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m1}^{(M)}-{\varvec{\theta }})= & {} G^{-1}({\varvec{\theta }},\varvec{\pi })\left( \dfrac{1}{M}\sum _{v=1}^MU_v({\varvec{\theta }})\right) +o_p(1)\nonumber \\= & {} \dfrac{1}{\sqrt{n}}G^{-1}({\varvec{\theta }},\varvec{\pi })\sum _{i=1}^n\left\{ \dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)} S_i({\varvec{\theta }})+\left[ 1-\dfrac{\delta _i}{\pi (Y_i,\varvec{V}_i)}\right] S_i^*({\varvec{\theta }})\right\} \nonumber \\&+\, O_p(M^{-1/2})+o_p(1), \end{aligned}$$
(25)

and it follows that \(\sqrt{n}(\hat{\varvec{\theta }}_{m1}^{(M)}-{\varvec{\theta }}){\mathop {\rightarrow }\limits ^{d}}\mathcal {N}(\varvec{0},\Delta _{m}({\varvec{\theta }}))\) as \(n,M\rightarrow \infty \).

1.2 Proof of Theorem 2

Because \(\hat{\varvec{\theta }}_{m2}^{(M)}\) is the solution of \(U_{m2}({\varvec{\theta }})=M^{-1}\sum _{v=1}^{M}U_v({\varvec{\theta }})=\varvec{0}\), a Taylor’s expansion of \(U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})\) at \({\varvec{\theta }}\) can lead to \(\varvec{0}=U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)}) =M^{-1}\sum _{v=1}^MU_v({\varvec{\theta }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})+o_p(1)\), which implies that

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }}) =G^{-1}({\varvec{\theta }},\varvec{\pi })\left( \dfrac{1}{M}\sum _{v=1}^{M}U_v({\varvec{\theta }})\right) +o_p(1). \end{aligned}$$
(26)

Thus, it follows from (25) and (26) that \(\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{m1}^{(M)})=o_p(1)\). This shows that \(\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{m1}^{(M)})\) converges in probability to \(\varvec{0}\) as \(n,M\rightarrow \infty \).
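The difference between the two combining rules can be seen in a toy setting. The sketch below is not the ZIP model of the paper: it assumes an intercept-only Poisson model with log link, whose score for one observation \(x\) is \(x-\exp (\theta )\), and takes the \(M\) completed data sets as given. The first rule averages the per-data-set solutions; the second solves the averaged estimating equation. The function name `combine_estimates` is hypothetical.

```python
import numpy as np

def combine_estimates(imputed):
    """Toy comparison of the two combining rules.

    imputed : array of shape (M, n), one completed sample of counts per row
              (each row must have a positive mean).
    Returns (theta_m1, theta_m2) for the score x - exp(theta).
    """
    imputed = np.asarray(imputed, dtype=float)
    means = imputed.mean(axis=1)                 # per-data-set sample means
    # Rule 1: solve each data set's equation (theta_v = log mean_v), then average.
    theta_m1 = float(np.mean(np.log(means)))
    # Rule 2: average the scores over data sets, then solve once.
    theta_m2 = float(np.log(means.mean()))
    return theta_m1, theta_m2
```

In this toy model the two values differ only through Jensen's inequality, and the gap shrinks as the per-data-set means concentrate around a common value.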

Next, we show that the semiparametric IPW estimator and the second MI-type estimator are asymptotically equivalent. \(U_{m2}({\varvec{\theta }})\) can be expressed as

$$\begin{aligned} U_{m2}({\varvec{\theta }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n} \left\{ \delta _iS_i({\varvec{\theta }})+(1-\delta _i)E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |Y_i,\varvec{V}_i) \right. \\&\left. +\,(1-\delta _i)\left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] \right\} . \end{aligned}$$

Using the fact given in (19), \(n^{-1/2}\sum _{i=1}^n[\delta _iS_i({\varvec{\theta }})+(1-\delta _i)E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)]\) can be expressed as

$$\begin{aligned}&\dfrac{1}{\sqrt{n}} \sum _{i=1}^{n}[\delta _iS_i({\varvec{\theta }})+(1-\delta _i)E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |Y_i,\varvec{V}_i)] \\&\quad = \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n} \delta _iS_i({\varvec{\theta }})+\dfrac{1}{\sqrt{n}}\sum _{k=1}^{n}\delta _kS_k({\varvec{\theta }}) \left[ \dfrac{1}{\hat{\pi }(Y_k,\varvec{V}_k)}-1\right] =U_w({\varvec{\theta }},\hat{\varvec{\pi }}). \end{aligned}$$

Hence it can be obtained that

$$\begin{aligned} U_{m2}({\varvec{\theta }})=U_w({\varvec{\theta }},\hat{\varvec{\pi }})+\dfrac{1}{\sqrt{n}}\sum _{i=1}^n(1-\delta _i) \left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] . \end{aligned}$$

Recall \(\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i) =M^{-1/2}\sum _{v=1}^M\left[ \tilde{S}_{iv}({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] \), \(i=1,\ldots ,n\). Because

$$\begin{aligned} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})|Y_i,\varvec{V}_i)\right] =M^{-1/2}\dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i)\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i) \end{aligned}$$

and \((1-\delta _i)\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i)\) are independent and identically distributed random vectors with mean \(\varvec{0}\) and covariance matrix \(E\{(1-\delta _1)[\mathcal {B}({\varvec{\theta }},Y_1,\varvec{V}_1)]^{\otimes 2}\}\), it implies by the multivariate central limit theorem that \(n^{-1/2}\sum _{i=1}^{n}(1-\delta _i)\mathcal {B}({\varvec{\theta }};Y_i,\varvec{V}_i)=O_p(1)\) and, hence,

$$\begin{aligned} U_{m2}({\varvec{\theta }})-U_w({\varvec{\theta }},\hat{\varvec{\pi }})= & {} \dfrac{1}{\sqrt{n}}\sum _{i=1}^{n}(1-\delta _i) \left[ \tilde{S}_i({\varvec{\theta }})-E_{\hat{F}}(\tilde{S}_{i1}({\varvec{\theta }})\big |Y_i,\varvec{V}_i)\right] \nonumber \\= & {} M^{-1/2}O_p(1)=O_p(M^{-1/2}). \end{aligned}$$
(27)
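The \(M^{-1/2}\) rate in (27) can be illustrated by simulation. The sketch below makes several assumptions that are not taken from the paper: scores are scalar toy values, strata are integer labels standing in for \((Y_i,\varvec{V}_i)\), the data are held fixed while only the imputations are redrawn, and imputation draws donors with replacement from the observed units of the same stratum. The function names `mc_term` and `rms_by_M` are hypothetical.

```python
import numpy as np

def mc_term(strata, delta, score, M, rng):
    """n^{-1/2} * sum_i (1 - delta_i) * [mean of M donor draws - donor mean]."""
    total = 0.0
    for g in np.unique(strata):
        in_g = strata == g
        donors = score[in_g & (delta == 1)]
        n_missing = int(np.sum(in_g & (delta == 0)))
        if n_missing == 0:
            continue
        if donors.size == 0:
            raise ValueError("a stratum with missing units has no donors")
        # M imputed scores per missing unit, drawn with replacement from donors
        draws = rng.choice(donors, size=(n_missing, M), replace=True)
        total += np.sum(draws.mean(axis=1) - donors.mean())
    return total / np.sqrt(len(score))

def rms_by_M(M_values, n=300, reps=500, seed=0):
    """Root mean square of the Monte Carlo term for each M, data held fixed."""
    rng = np.random.default_rng(seed)
    strata = rng.integers(0, 3, size=n)
    delta = rng.binomial(1, 0.6, size=n)
    score = rng.normal(size=n) + strata          # toy scalar scores
    out = {}
    for M in M_values:
        vals = [mc_term(strata, delta, score, M, rng) for _ in range(reps)]
        out[M] = float(np.sqrt(np.mean(np.square(vals))))
    return out
```

Under these assumptions the conditional variance of the term is proportional to \(1/M\), so increasing \(M\) by a factor of 16 should reduce the root mean square by a factor of roughly 4, up to simulation noise.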

Let \(\hat{\varvec{\theta }}_{m2}^{(M)}\) be the solution of \(U_{m2}({\varvec{\theta }})=\varvec{0}\) and \(\hat{\varvec{\theta }}_{ws}\) the solution of \(U_w({\varvec{\theta }},\hat{\varvec{\pi }})=\varvec{0}\). Taylor expansions of \(U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)})\) and \(U_w(\hat{\varvec{\theta }}_{ws},\hat{\varvec{\pi }})\) at \({\varvec{\theta }}\) give

$$\begin{aligned} \varvec{0}=U_{m2}(\hat{\varvec{\theta }}_{m2}^{(M)}) =U_{m2}({\varvec{\theta }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-{\varvec{\theta }})+o_p(1) \end{aligned}$$

and

$$\begin{aligned} \varvec{0}=U_w(\hat{\varvec{\theta }}_{ws},\hat{\varvec{\pi }}) =U_{w}({\varvec{\theta }},\hat{\varvec{\pi }})-G({\varvec{\theta }},\varvec{\pi })\sqrt{n}(\hat{\varvec{\theta }}_{ws}-{\varvec{\theta }})+o_p(1). \end{aligned}$$

Subtracting the second expansion from the first and using (27) yields

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{ws}) =G^{-1}({\varvec{\theta }},\varvec{\pi })\left[ U_{m2}({\varvec{\theta }})-U_w({\varvec{\theta }},\hat{\varvec{\pi }})\right] +o_p(1) =o_p(1)+O_p(M^{-1/2}). \end{aligned}$$

Therefore, it follows that \(\sqrt{n}(\hat{\varvec{\theta }}_{m2}^{(M)}-\hat{\varvec{\theta }}_{ws})\) converges in probability to \(\varvec{0}\) as \(n,M\rightarrow \infty \).

Cite this article

Lee, SM., Lukusa, T.M. & Li, CS. Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods. Comput Stat 35, 725–754 (2020). https://doi.org/10.1007/s00180-019-00930-x

  • DOI: https://doi.org/10.1007/s00180-019-00930-x
