Quantile inference based on clustered data

Ozturk, Omer; Turkmen, Asuman

doi:10.1007/s00184-016-0581-0


Published: 19 April 2016

Metrika, Volume 79, pages 867–893 (2016)


Abstract

The one-sample sign test is a common procedure for distribution-free inference on a population quantile. A basic requirement of this test is that the observations in the sample be independent. This assumption is violated in certain settings, such as clustered data, grouped data and longitudinal studies. Failure to account for the dependence structure leads to erroneous statistical inference. In this study, we develop statistical inference for a population quantile of order p in either balanced or unbalanced designs by incorporating the dependence structure when the distribution of within-cluster observations is exchangeable. We provide a point estimate, develop a testing procedure and construct confidence intervals for a population quantile of order p. Simulation studies demonstrate that the confidence intervals achieve their nominal coverage probabilities. Finally, we apply the proposed procedure to Academic Performance Index data.



Acknowledgments

The authors would like to thank two anonymous reviewers as well as the editor for their helpful comments on an earlier version of the manuscript.

Author information

Authors and Affiliations

Department of Statistics, The Ohio State University, 1958 Neil Avenue, 404 Cockins Hall, Columbus, OH, 43210-1247, USA
Omer Ozturk & Asuman Turkmen


Corresponding author

Correspondence to Asuman Turkmen.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 32 KB)

Appendix

Proof of Theorem 1

We would like to minimize the variance of $\bar{T}(\eta _{p,0})=T(\eta _{p,0})/N$ under the constraint that $\frac{1}{N} \sum _{i=1}^M n_iw_i=1$. This minimization problem is equivalent to minimizing the following expression under the Lagrangian constraint

$$\begin{aligned} {\varLambda }(w_i,\lambda )=\frac{1}{N^2} \sum _{i=1}^M w_i^2 n_i(1-p) p\left[ (1-\delta )+n_i\delta \right] + \lambda \left( \frac{1}{N}\sum _{i=1}^M n_i w_i-1\right) . \end{aligned}$$

The optimal weight, $w_{i,o}$, must satisfy the equality $\frac{\partial {\varLambda }(w_i,\lambda )}{\partial w_i}=0$ for $i=1,\ldots ,M$. This leads to the following M equalities

$$\begin{aligned} w_i=\frac{-\hat{\lambda }N}{2p(1-p) \left[ (1-\delta )+n_i\delta \right] }, \quad i=1,\ldots , M. \end{aligned}$$

(5)
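
As a reading aid, the stationarity condition behind Eq. (5) can be written out explicitly. Differentiating the variance term for cluster i and the constraint term with respect to $w_i$ gives

$$\begin{aligned} \frac{\partial {\varLambda }}{\partial w_i}=\frac{2}{N^2}\, w_i n_i\, p(1-p)\left[ (1-\delta )+n_i\delta \right] +\frac{\lambda n_i}{N}=0, \end{aligned}$$

and cancelling $n_i/N$ and solving for $w_i$ yields the form in Eq. (5).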

Using the Lagrangian constraint $\frac{1}{N} \sum _{i=1}^M n_iw_i=1$, the estimate of $\lambda $ is obtained as

$$\begin{aligned} \hat{\lambda }=\frac{-2}{\sum _{i=1}^M \frac{n_i}{p(1-p)[(1-\delta )+n_i\delta ]}}. \end{aligned}$$

Inserting the above expression in Eq. (5), the optimal weights, for $i=1,\ldots ,M$, are obtained as

$$\begin{aligned} w_{i,o}=\frac{\frac{N}{[(1-\delta )+n_i\delta ]}}{\sum _{j=1}^M \frac{n_j}{[(1-\delta )+n_j\delta ]}}. \end{aligned}$$

The variance of $\sqrt{N}\bar{T}(\eta _{p,0})$, with the optimal weights, simplifies to

$$\begin{aligned} Var \left( \sqrt{N}\bar{T} \left( \eta _{p,0}\right) \right) = \frac{p(1-p)}{\frac{1}{N}\sum _{i=1}^M \frac{n_i}{[(1-\delta )+n_i\delta ]} } \end{aligned}$$

which reduces to

$$\begin{aligned} Var \left( \sqrt{N}\bar{T} \left( \eta _{p,0}\right) \right) = p(1-p)[(1-\delta )+n\delta ] \end{aligned}$$

if $n_i=n$ for $i=1,\ldots ,M$.
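
The optimal weights and the resulting variance can be checked numerically. The following is a minimal sketch, not code from the paper: the function names `optimal_weights` and `scaled_variance` and the example cluster sizes are illustrative assumptions, and only the formulas for $w_{i,o}$ and $Var(\sqrt{N}\bar{T}(\eta _{p,0}))$ are taken from the derivation above.

```python
def optimal_weights(n, delta):
    """Optimal weights w_{i,o} for cluster sizes n and dependence parameter delta."""
    N = sum(n)
    # Denominator: sum_j n_j / [(1 - delta) + n_j * delta]
    denom = sum(nj / ((1 - delta) + nj * delta) for nj in n)
    return [(N / ((1 - delta) + ni * delta)) / denom for ni in n]


def scaled_variance(n, w, p, delta):
    """Var(sqrt(N) * T_bar) = (1/N) * sum_i w_i^2 n_i p(1-p)[(1-delta) + n_i delta]."""
    N = sum(n)
    return sum(
        wi ** 2 * ni * p * (1 - p) * ((1 - delta) + ni * delta)
        for wi, ni in zip(w, n)
    ) / N


if __name__ == "__main__":
    sizes = [2, 5, 10, 3]          # hypothetical unbalanced design
    w = optimal_weights(sizes, delta=0.3)
    print("weights:", w)
    print("constraint (should be 1):", sum(ni * wi for ni, wi in zip(sizes, w)) / sum(sizes))
    print("variance, optimal weights:", scaled_variance(sizes, w, 0.5, 0.3))
    print("variance, unit weights:   ", scaled_variance(sizes, [1.0] * 4, 0.5, 0.3))
```

In a balanced design every optimal weight equals 1, so the weighted and unweighted statistics coincide, which is why the variance collapses to $p(1-p)[(1-\delta )+n\delta ]$.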

Proof of Theorem 2

Let

$$\begin{aligned} U_p(a/\sqrt{N})=\frac{\bar{S}_p(\eta _p+a/\sqrt{N}) - \bar{S}_p(\eta _p)}{a/\sqrt{N}}. \end{aligned}$$

First, we consider the expected value of $U_p(a/\sqrt{N})$

$$\begin{aligned} E \left( U_p(a/\sqrt{N})\right) =\sum _{i=1}^M \frac{w_in_i}{N} \, \frac{F(\eta _p)- F \left( \eta _p+a/\sqrt{N}\right) }{a/\sqrt{N}}. \end{aligned}$$

It is clear that the first term (the sum) is equal to 1 from the constraint on the weights. The second term converges to $-f(\eta _p)$ as N goes to infinity. Hence, we have

$$\begin{aligned} \lim _{N\rightarrow \infty }E \left( U_p(a/\sqrt{N})\right) = -f(\eta _p). \end{aligned}$$

We next show that the variance of $U_p(a/\sqrt{N})$ approaches zero as N goes to infinity. Without loss of generality, assume that $a>0$. It is easy to see that

$$\begin{aligned} U_p\left( a/\sqrt{N}\right)= & {} \frac{\sqrt{N}}{aN} \sum _{i=1}^M w_i \sum _{j=1}^{n_i} I \left( \eta _p \le X_{ij} \le \eta _p+a/\sqrt{N}\right) \\= & {} \frac{\sqrt{N}}{aN} \sum _{i=1}^M w_i Y^*_i \left( a/\sqrt{N}\right) , \end{aligned}$$

where

$$\begin{aligned} Y^*_i \left( a/\sqrt{N}\right) = \sum _{j=1}^{n_i} I \left( \eta _p \le X_{ij} \le \eta _p+a/\sqrt{N} \right) \end{aligned}$$

has the probability mass function $b(y_i,n_i,1-p_{a/\sqrt{N}},\delta )$ given in Eq. (2) with $p_{a/\sqrt{N}} = F(\eta _p+a/\sqrt{N})-F(\eta _p)$. The variance of $U_p(a/\sqrt{N})$ then becomes

$$\begin{aligned} var \left( U_p \left( a/\sqrt{N}\right) \right) = \frac{1}{a^2 N} \sum _{i=1}^M w_i^2 n_i \left( 1-\delta +n_i \delta \right) p_{a/\sqrt{N}} \left( 1-p_{a/\sqrt{N}}\right) . \end{aligned}$$

In the above equation, the expression

$$\begin{aligned} \frac{1}{a^2N} \sum _{i=1}^M w_i^2n_i[ 1-\delta +n_i\delta ] \end{aligned}$$

is finite for any N. Since F is a continuous function, $p_{a/\sqrt{N}}$ converges to zero as N goes to infinity. Hence, the variance of $U_p(a/\sqrt{N})$ converges to zero as N approaches infinity. This completes the proof of point-wise convergence. Uniform convergence on a compact set follows from the fact that the estimating equation is non-increasing in its argument.
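
The behaviour of this variance can be illustrated numerically. The sketch below is not from the paper and rests on illustrative assumptions: a balanced design with $n_i=n$, unit weights $w_i=1$ (which satisfy the weight constraint), F taken to be the standard normal distribution, and $\eta _p=0$. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function (assumed choice of F)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def var_U(N, n, delta, a, eta=0.0):
    """Variance of U_p(a / sqrt(N)) for a balanced design with unit weights."""
    if N % n != 0:
        raise ValueError("N must be a multiple of the cluster size n")
    M = N // n
    # Probability that one observation falls in [eta, eta + a / sqrt(N)]
    p_a = norm_cdf(eta + a / math.sqrt(N)) - norm_cdf(eta)
    # (1 / (a^2 N)) * sum_i w_i^2 n_i (1 - delta + n_i delta), with w_i = 1
    prefactor = sum(n * (1 - delta + n * delta) for _ in range(M)) / (a ** 2 * N)
    return prefactor * p_a * (1 - p_a)


if __name__ == "__main__":
    for N in (100, 10_000, 1_000_000):
        print(N, var_U(N, n=5, delta=0.3, a=1.0))
```

Under these assumptions the prefactor stays fixed at $(1-\delta +n\delta )/a^2$ while $p_{a/\sqrt{N}}$ shrinks at rate $1/\sqrt{N}$, so the printed variances decrease toward zero.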

EM-Algorithm

The EM-algorithm is an iterative procedure involving two steps: expectation (E-step) and maximization (M-step). Let $p^{(0)}$ and $\delta ^{(0)}$ be the initial values of the parameters p and $\delta $. We set $p^{(0)}$ as the sample $p$th quantile of the data and $\delta ^{(0)}$ as 0.5.

E-step In the kth iteration of the algorithm, the expectation step computes the conditional expected values of the log-likelihood functions $l_1(\delta ,\mathbf z )$ and $l_2(p,\mathbf n ,\mathbf y ,\mathbf z )$ given $\mathbf y $ and the $(k-1)$st-step estimates of p and $\delta $

$$\begin{aligned} Q_1 \left( \delta |\delta ^{(k-1)},p^{(k-1)},\mathbf n , \mathbf y \right)= & {} E \left( l_1 \left( \mathbf z |\delta ^{(k-1)},p^{(k-1)},\mathbf n ,\mathbf y \right) \right) \\= & {} \sum _{i=1}^M \tau \left( y_i,p^{(k-1)},\delta ^{(k-1)}\right) \log (\delta ) \\&+\,\left( M-\sum _{i=1}^M\tau \left( y_i,p^{(k-1)},\delta ^{(k-1)}\right) \right) \log (1-\delta ), \\ Q_2 \left( p|\delta ^{(k-1)},p^{(k-1)},\mathbf n ,\mathbf y \right)= & {} E\left( l_2 \left( \mathbf z |\delta ^{(k-1)},p^{(k-1)},\mathbf n , \mathbf y \right) \right) \\= & {} B^* \left( p^{(k-1)},\delta ^{(k-1)}\right) \log (p) \\&+\,C^*\left( p^{(k-1)}, \delta ^{(k-1)}\right) \log (1-p), \end{aligned}$$

where

$$\begin{aligned} B^* \left( p^{(k-1)},\delta ^{(k-1)}\right)= & {} E\left( B(\mathbf z |p^{(k-1)},\delta ^{(k-1)},\mathbf y ,\mathbf n )\right) \\= & {} \sum _{i=1}^M \left( 1-\tau \left( y_i,p^{(k-1)},\delta ^{(k-1)} \right) \right) y_i I_{A_{1,i}} (y_i)\\&+\,\sum _{i=1}^M \tau \left( y_i,p^{(k-1)},\delta ^{(k-1)}\right) \frac{y_i}{n_i}I_{A_{2,i}}(y_i), \end{aligned}$$

and

$$\begin{aligned} C^* \left( p^{(k-1)},\delta ^{(k-1)}\right)= & {} E\left( C(\mathbf z |p^{(k-1)},\delta ^{(k-1)},\mathbf y ,\mathbf n )\right) \\= & {} \sum _{i=1}^M \left( 1-\tau \left( y_i,p^{(k-1)},\delta ^{(k-1)}\right) \right) \left( n_i-y_i\right) I_{A_{1,i}}(y_i)\\&+\,\sum _{i=1}^M\tau \left( y_i,p^{(k-1)},\delta ^{(k-1)}\right) \frac{(n_i-y_i)}{n_i}I_{A_{2,i}}(y_i). \end{aligned}$$

M-step In the M-step, we update the kth iteration estimates by maximizing $Q_1(\delta |\delta ^{(k-1)}, p^{(k-1)},\mathbf n ,\mathbf y )$ with respect to $\delta $ and $Q_2(p|\delta ^{(k-1)},p^{(k-1)}, \mathbf n ,\mathbf y )$ with respect to p. The updated kth iteration estimates are given by

$$\begin{aligned} \delta ^{(k)}= \frac{1}{M} \sum _{i=1}^M \tau \left( y_i,p^{(k-1)},\delta ^{(k-1)}\right) \end{aligned}$$

and

$$\begin{aligned} p^{(k)}= \frac{B^*\left( p^{(k-1)}, \delta ^{(k-1)}\right) }{B^*\left( p^{(k-1)}, \delta ^{(k-1)}\right) + C^*\left( p^{(k-1)}, \delta ^{(k-1)}\right) }. \end{aligned}$$

To obtain the final estimates, the E- and M-steps are repeated until a stopping rule is satisfied. In this paper, the iteration is terminated when the difference between successive estimates is less than $10^{-6}$.
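
The E- and M-steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation. The quantities $\tau $, $A_{1,i}$, $A_{2,i}$ and the mass function of Eq. (2) are defined in the main text, which is not reproduced here. The sketch assumes a two-component correlated binomial form in which, with probability $1-\delta $, $y_i$ is binomial on $A_{1,i}=\{0,\ldots ,n_i\}$ and, with probability $\delta $, all observations in the cluster agree so that $y_i\in A_{2,i}=\{0,n_i\}$. This form is consistent with the variance $n_ip(1-p)[(1-\delta )+n_i\delta ]$ used in the proofs. Under that assumption $\tau $ is the posterior probability of the second component. The names `tau` and `em`, the iteration cap, and the use of the largest absolute change as the stopping criterion are also assumptions.

```python
import math


def tau(y, n, p, delta):
    """Assumed posterior probability that a cluster belongs to the fully dependent component."""
    independent = (1 - delta) * math.comb(n, y) * p ** y * (1 - p) ** (n - y)
    # The dependent component only has mass on A_2 = {0, n}
    dependent = delta * p ** (y / n) * (1 - p) ** ((n - y) / n) if y in (0, n) else 0.0
    return dependent / (independent + dependent)


def em(y, n, p0, delta0, tol=1e-6, max_iter=10_000):
    """Iterate the E- and M-steps until successive estimates differ by less than tol."""
    p, delta = p0, delta0
    for _ in range(max_iter):
        # E-step: posterior weights at the current estimates
        t = [tau(yi, ni, p, delta) for yi, ni in zip(y, n)]
        B = sum((1 - ti) * yi + ti * yi / ni for ti, yi, ni in zip(t, y, n))
        C = sum((1 - ti) * (ni - yi) + ti * (ni - yi) / ni for ti, yi, ni in zip(t, y, n))
        # M-step: closed-form maximizers of Q_1 and Q_2
        new_delta = sum(t) / len(y)
        new_p = B / (B + C)
        converged = max(abs(new_p - p), abs(new_delta - delta)) < tol
        p, delta = new_p, new_delta
        if converged:
            break
    return p, delta


if __name__ == "__main__":
    counts = [0, 5, 2, 3, 5, 0, 1, 4]   # hypothetical y_i values
    sizes = [5] * 8                      # hypothetical cluster sizes n_i
    print(em(counts, sizes, p0=0.5, delta0=0.5))
```

Because $\tau $ vanishes for any cluster with $0<y_i<n_i$, the indicator $I_{A_{2,i}}$ is handled implicitly by the weights, and data with no all-or-nothing clusters drive the estimate of $\delta $ to zero after a single iteration.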


About this article

Cite this article

Ozturk, O., Turkmen, A. Quantile inference based on clustered data. Metrika 79, 867–893 (2016). https://doi.org/10.1007/s00184-016-0581-0


Received: 18 December 2014
Published: 19 April 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s00184-016-0581-0
