Abstract
The one-sample sign test is a common procedure for distribution-free inference on a population quantile. It requires that the observations in a sample be independent, an assumption that is violated in settings such as clustered data, grouped data and longitudinal studies. Failure to account for the dependence structure leads to erroneous statistical inference. In this study, we develop statistical inference for a population quantile of order p in either balanced or unbalanced designs by incorporating the dependence structure when the distribution of within-cluster observations is exchangeable. We provide a point estimate, develop a testing procedure and construct confidence intervals for a population quantile of order p. Simulation studies demonstrate that the confidence intervals achieve their nominal coverage probabilities. Finally, we apply the proposed procedure to Academic Performance Index data.
Acknowledgments
The authors would like to thank two anonymous reviewers and the editor for their helpful comments on an earlier version of the manuscript.
Appendix
Proof of Theorem 1
We would like to minimize the variance of \(\bar{T}(\eta _{p,0})=T(\eta _{p,0})/N\) under the constraint that \(\frac{1}{N} \sum _{i=1}^M n_iw_i=1\). This minimization problem is equivalent to minimizing the following expression under the Lagrangian constraint
The optimal weight, \(w_{i,o}\), must satisfy the equality \(\frac{\partial {\varLambda }(w_i,\lambda )}{\partial w_i}=0\) for \(i=1,\ldots ,M\). This leads to the following M equalities
Using the Lagrangian constraint \(\frac{1}{N} \sum _{i=1}^M n_iw_i=1\), the estimate of \(\lambda \) is obtained as
Inserting the above expression in Eq. (5), the optimal weights, for \(i=1,\ldots ,M\), are obtained as
The variance of \(\sqrt{N}\bar{T}(\eta _{p,0})\), with the optimal weights, simplifies to
which reduces to
if \(n_i=n\) for \(i=1,\ldots ,M\).
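As an illustration of the Lagrangian argument above, the following Python sketch computes variance-minimizing weights in closed form. It assumes, hypothetically, that clusters are independent and that the variance of the i-th cluster sum has the exchangeable form \(p(1-p)n_i\{1+(n_i-1)\rho \}\). The correlation parameter `rho` and all function names are illustrative and are not the paper's notation, whose \(\delta \)-parameterization in Eq. (2) may differ.

```python
import numpy as np


def cluster_sum_variance(n, rho, p):
    """Assumed variance of a sum of n exchangeable indicators.

    Hypothetical form: p(1-p) * n * (1 + (n-1)*rho), with common
    within-cluster correlation rho (an assumption of this sketch).
    """
    n = np.asarray(n, dtype=float)
    return p * (1.0 - p) * n * (1.0 + (n - 1.0) * rho)


def optimal_weights(n, v):
    """Minimize sum_i w_i^2 v_i subject to (1/N) sum_i n_i w_i = 1.

    Setting the derivative of the Lagrangian to zero gives
    w_i proportional to n_i / v_i; the constraint fixes the scale.
    """
    n = np.asarray(n, dtype=float)
    v = np.asarray(v, dtype=float)
    N = n.sum()
    return N * (n / v) / np.sum(n**2 / v)


def scaled_variance(w, v, N):
    """Variance of (1/sqrt(N)) * sum_i w_i S_i for independent clusters."""
    return np.sum(np.asarray(w, dtype=float) ** 2 * v) / N


# Unbalanced design with four clusters (illustrative values).
n = np.array([2, 5, 9, 3])
v = cluster_sum_variance(n, rho=0.3, p=0.5)
w = optimal_weights(n, v)
print("weights:", w)
print("constraint:", np.sum(n * w) / n.sum())
print("variance (optimal):", scaled_variance(w, v, n.sum()))
print("variance (equal weights):", scaled_variance(np.ones(4), v, n.sum()))
```

Under these assumptions, equal cluster sizes make every weight equal to 1, which mirrors the simplification stated for \(n_i=n\).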
Proof of Theorem 2
Let
First, we consider the expected value of \(U_p(a/\sqrt{N})\)
It is clear that the first term (sum) is equal to 1 from the constraint on the weights. The second term converges to \(-f(\eta _p)\) as N goes to infinity. Hence, we have
We next show that the variance of \(U_p(a/\sqrt{N})\) approaches zero as N goes to infinity. Without loss of generality, assume that \(a>0\). It is easy to see that
where
has a probability mass function \(b(y_i,n_i,1-p_{a/\sqrt{N}},\delta )\) in Eq. (2) with \(p_{a/\sqrt{N}} = F(\eta _p)-F(\eta _p+a/\sqrt{N})\). The variance of \(U_p(a/\sqrt{N})\) then becomes
In the above equation, the expression
is finite for any N. Since F is a continuous function, \(p_{a/\sqrt{N}}\) converges to zero as N goes to infinity. Hence, the variance of \(U_p(a/\sqrt{N})\) converges to zero as N approaches infinity. This completes the proof of point-wise convergence. Uniform convergence on a compact set follows from the fact that the estimating equation is non-increasing in its argument.
EM-Algorithm
The EM algorithm is an iterative procedure with two steps: expectation (E-step) and maximization (M-step). Let \(p^{(0)}\) and \(\delta ^{(0)}\) be the initial values of the parameters p and \(\delta \). We set \(p^{(0)}\) to the sample \(p{\hbox {th}}\) quantile of the data and \(\delta ^{(0)}\) to 0.5.
E-step In the kth iteration of the algorithm, the expectation step computes the conditional expected values of the log-likelihood functions \(l_1(\delta ,\mathbf z )\) and \(l_2(p,\mathbf n ,\mathbf y ,\mathbf z )\) for given values of \(\mathbf y \) and the \((k-1)\)st-step estimates of p and \(\delta \)
where
and
M-step In the M-step, we update the kth iteration estimates by maximizing \(Q_1(\delta |\delta ^{(k-1)}, p^{(k-1)},\mathbf n ,\mathbf y )\) with respect to \(\delta \) and \(Q_2(p|\delta ^{(k-1)},p^{(k-1)}, \mathbf n ,\mathbf y )\) with respect to p. The updated kth iteration estimates are given by
and
For the final estimates, the E- and M-steps are repeated until a stopping rule is satisfied. In this paper, the iteration is terminated when the difference between successive estimates is less than \(10^{-6}\).
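The E- and M-steps described above can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: it assumes that the correlated binomial model takes a two-component mixture form in which, with probability \(\delta \), all observations in a cluster share a single Bernoulli(p) outcome and, with probability \(1-\delta \), the cluster count is Binomial\((n_i,p)\). The latent indicator playing the role of \(\mathbf z \), the function name `em_correlated_binomial`, and the choice to pass the starting value `p0` as an argument are assumptions of this sketch. The default \(\delta ^{(0)}=0.5\) and the \(10^{-6}\) stopping tolerance follow the text.

```python
import numpy as np


def em_correlated_binomial(y, n, p0, delta0=0.5, tol=1e-6, max_iter=1000):
    """EM for an ASSUMED mixture form of the correlated binomial model.

    y[i] is the count of successes in cluster i, n[i] the cluster size.
    Returns the estimates (p, delta).
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    all_one = (y == n).astype(float)   # every unit in the cluster is a success
    all_zero = (y == 0).astype(float)  # every unit in the cluster is a failure
    p, delta = float(p0), float(delta0)

    for _ in range(max_iter):
        # E-step: posterior probability that cluster i came from the
        # fully dependent component. It is zero unless y is 0 or n, and
        # the binomial coefficient equals 1 at both extremes.
        dep = delta * (p * all_one + (1.0 - p) * all_zero)
        ind = (1.0 - delta) * (p**n * all_one + (1.0 - p) ** n * all_zero)
        total = dep + ind
        tau = np.divide(dep, total, out=np.zeros_like(dep), where=total > 0)

        # M-step: closed-form maximizers of the expected log-likelihoods.
        delta_new = tau.mean()
        successes = np.sum(tau * all_one) + np.sum((1.0 - tau) * y)
        trials = np.sum(tau) + np.sum((1.0 - tau) * n)
        p_new = successes / trials

        # Stop when successive estimates differ by less than tol.
        converged = max(abs(p_new - p), abs(delta_new - delta)) < tol
        p, delta = p_new, delta_new
        if converged:
            break
    return p, delta


# Simulated example under the assumed mixture model (illustrative values).
rng = np.random.default_rng(0)
M, size, p_true, delta_true = 2000, 5, 0.3, 0.4
n = np.full(M, size)
z = rng.random(M) < delta_true
y = np.where(z, size * (rng.random(M) < p_true), rng.binomial(size, p_true, M))
print(em_correlated_binomial(y, n, p0=0.5))
```

Both updates have closed forms under this assumed model, so no numerical optimizer is needed inside the loop.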
Ozturk, O., Turkmen, A. Quantile inference based on clustered data. Metrika 79, 867–893 (2016). https://doi.org/10.1007/s00184-016-0581-0