Testing longitudinal data by logarithmic quantiles

: The shoulder tip pain study of Lumley [13] is re-investigated. It is shown that the new logarithmic quantile estimation (LQE) technique in [9] applies and behaves well under singular covariance structure and small sample sizes as in the shoulder tip pain study. The ﬁndings in [6] can be assured under weaker assumptions using a combination of LQE and an ANOVA type statistic.


Introduction
Statistical decision theory for longitudinal data has attracted much attention over the past three decades. Various model assumptions for the parametric and nonparametric case have been proposed and investigated, partly with moderate success due to lack of information about the structure of the involved finite time series. Modeling the joint distribution of the time series and their parameters is a major draw back. A minor problem here is as well the estimation of unknown parameters and the speed of approximation to the distribution which is used to determine the quantiles for the testing procedure.
In order to overcome these difficulties Akritas and Arnold [1] developed a complete theory of nonparametric hypothesis testing in factorial designs. This was complemented by Akritas, Arnold and Brunner in [2] among other work. As reasonable test statistics rank methods were proposed, either rank transform or linear rank statistics under Wilcoxon scores. It is argued that for repeated measure designs or longitudinal data the gap in efficacy compared to parametric procedures is not as important to close as it is to avoid errors occurring by model misspecification. This leads to a simple statistical model where distributional assumptions are kept to a minimum of conditions. Such procedures are given by rank statistics, which reduces the statistical decision problem to the original Bernoulli idea of making decisions on coin tossing. The test statistics encountered in previous work are based on Wilcoxon scores which is sufficiently general (see [8] how to use Wilcoxon scores to improve efficiency of procedures compared to other score functions). The program can be carried out in full generality due to the central limit theorem for linear rank statistics under arbitrary dependencies in [3]. Using this method and its refinements by Munzel [14] and others (see [4] for an overview) a nearly complete evaluation of problems involving nonparametric factorial designs has been established over the years.
This note adds to this endeavor and introduces a new method of hypothesis testing for longitudinal data motivated by the study of Lumley [13] on a 'shoulder tip pain' data set, and subsequentially re-investigated in Brunner, Domhof and Langer [6]. In this study patients were scored at 6 time instances (on different days) according to their subjective feeling of pain levels. The group was stratified according to gender and treatment. It turned out that the corresponding Wald type test statistics was badly approximated by its limiting distribution while the ANOVA type statistics gave good results in connection with the Box-Welch-Satterhwaite approximation (see [6], p. 72 and 191). The second statistics can only be used making additional assumptions on the distribution approximating the unknown law, while the first method requires a huge data set. It is therefore desirable to have a procedure which is independent of such additional assumptions.
The new method we have introduced in [9] may well serve to overcome this unpleasant feature. Logarithmic quantile estimation is based on almost sure central limit theorems and has, for the first time, been set up in [9] in the general framework of simple linear rank statistics. This result parallels the result in [3] in as much that the assumptions are the same, the only change which has to be made is that the observations are defined on some fixed probability space (due to the a.s. behavior). It is evident that the result persists in the presence of ties, the ranks have to be interpreted as mid-ranks as in [14]. Previous results on the logarithmic quantile method are concerned with two sample linear rank statistics and the Behrens-Fisher problem [15], the comparison of the logarithmic quantile method and the t-test [15], the correlation coefficient in [10], and the c-sample problem under dependencies (i.e. the problem of testing if c dependent samples have the same distribution) in [9]. In all cases it was found that the logarithmic quantile method works well as a nonparametric procedure with minimal assumptions on distributions.
In this note we set up the testing procedure for longitudinal data based on the results in [3] and [9]. It follows that an almost sure weak convergence for the ANOVA-type statistics used in the shoulder tip pain study can be obtained and this result is given in Proposition 2.1. In this paper we extend the applicability of the almost sure central limit theorem for the c-sample problem with equal sample sizes explained in [9] to a longitudinal data design with multiple groups and unequal sample sizes. It turns out that for the shoulder tip pain study we can assure the findings in [6] with tolerable changes of p-values due to the generality of this method.

An unbalanced design for longitudinal data
The model for the longitudinal data we have in mind is a nonparametric factorial design with two factors (A and B), each of them having two groups. More general designs are certainly possible to treat along the lines here, but we restrict to the special design we are interested in, motivated by the Lumley study as treated in [6]. We will present the notation and the theory that lead to hypothesis testing for this type of design. Let X ik = (X ikj ), 1 ≤ i ≤ 4, 1 ≤ k ≤ n i , 1 ≤ j ≤ t be the recorded scores at t time points of the k-th independent repetition within group i, where the four groups of the factor A and B are joined for simplicity in notation. We denote by n = 4 i=1 n i the total number of independent repetitions and by N = nt the total number of observations. We use F ij (x) = 1 2 (P (X ikj ≤ x) + P (X ikj < x)) the normalized version of the distribution function of X ikj in order to allow for ties.
Denote by the vector of marginal distributions. Using the concept of relative marginal effects we can express the null hypotheses in terms of distribution functions and test for the main effects and their interaction. The relative marginal effects are defined as where is the average of all distribution functions in the model.

M. Denker and L. Tabacu
The estimators of the relative treatment effects are given by where R ikj is the rank of X ikj among all N observations and where c(u) is the normalized counting function and c(u) = 0, 1 2 or 1 according as u < 0, u = 0 or u > 0, and the vector of the estimated relative treatment effects. The nonparametric hypotheses for testing for the main effects and their interactions are given as H 0 : CF = 0, where C is a suitable chosen contrast matrix corresponding to each test. To test the hypotheses we will use an ANOVA type test statistic as in [5] which is preferred in case of a small sample size. For this we give an equivalent formulation of the null hypotheses as The test statistic that we use is defined by the quadratic form where n is the total number of subjects. It is known that under the assumptions below and under the null hypothesis H F 0 : MF = 0 ⇔ CF = 0, the statistic Q n (M) has, asymptotically, the same distribution as the random variable 1 are independent random variables and λ ij 's are eigenvalues of MV n M. The matrix V n is defined as Testing longitudinal data by logarithmic quantiles The asymptotic distribution of the statistic Q n (M) is derived under the following assumptions: In [6] the statistic in (2.4) can be used for testing if the asymptotic distribution is approximated by an F distribution whose degrees of freedom need to be estimated. This idea was introduced by Brunner, Dette, and Munk [7] who proposed approximating the distribution of the f such that the first two moments of these distributions be equal. This approximation procedure concludes that The estimator V n of the matrix V n defined in equation (2.5) is given by . . , R ikt ) ′ the vector of the ranks R iks of X iks among all N observations and R i· = 1 ni ni k=1 R ik , i = 1, 2, 3, 4, k = 1, . . . , n i . There are other forms of this approximation procedure and they depend on the nonparametric design considered (Brunner et al. [5,6]). More about this approximation method is given in [7].
In this paper we propose an alternative way to the approximation method given in [7] that holds under weaker assumptions. In Denker and Tabacu [9] we set a general framework for approximating quantiles based on the almost sure limit theorems. If a sequence of statistics satisfies the central limit theorem and the almost sure limit theorem and they have the same limiting distribution, then the true quantile of the chosen statistic can be approximated by the logarithmic empirical quantile obtained from the almost sure limit theorem. Using the logarithmic quantile estimation developed in [9] we were able to obtain approximations for the quantiles of the test statistic defined in (2.4). The almost sure quantile estimation is a procedure that avoids the approximation given in [7] by calculating the quantiles directly from the data. In order to perform the almost sure quantile estimation for the statistics defined in (2.4) we need to verify the almost sure weak convergence of (3.1) towards the limiting distribution. The almost sure weak convergence of the test statistics Q n (M) is stated in Proposition 2.1 and its proof is deferred to the Appendix in Section 5. In order to formulate the proposition we need some further notation. This leads to condition 5 in the next proposition which is necessary for convergence in distribution. For 1 ≤ k ≤ n 1 define the random variables y ikl = 0, for n 3 + 1 ≤ k ≤ n 1 , i = 1, 2, 4, Proposition 2.1. Under the assumptions converges to some covariance matrix Σ as n 1 → ∞, where for 1 ≤ k ≤ n 1 Σ k denotes the covariance matrix of z k , the statistics Q n (M) satisfy

Numerical studies
The calculation procedure to set up the quantile estimation numerically is described in [9]. We recall the essential steps here. The logarithmic average of the sequence of statistics Q n (M) has the form 1 n and where I C denotes the indicator function of the set C so that G N becomes an empirical distribution function. Then the empirical α-quantile of G N can be used in hypothesis testing, for example a typical rejection region may look like {X ∈ R dN : |Q N (M)| ≥ z α } with G N (z α ) = α. Note that the empirical logarithmic distribution for the general statistic Q n (M) = Q n (M)(X 1 , . . . , X n ) is not symmetric and the rejection or acceptance region might depend on the order of the observations. To overcome this problem we considered a number of random permutations of the observations and calculated the quantities of the permuted sequence of independent vectors. Now the empirical logarithmic α-quantiles can be computed by where "per" is the number of permutations that we want to consider and t * i,(n) α is the empirical logarithmic α-quantile for permutation i and is given by (1) , . . . , X τi(k) ) and τ i is the i-th permutation of {1, 2, . . . , n}.
Based on this algorithm, we perform a simulation study to obtain the significance level and the power of the ANOVA-type statistics using the logarithmic quantile estimation. We consider the same factorial design as in the Lumley study [13] with 41 subjects divided in two groups (factor A) each consisting of 22, respective 19 subjects, and each subject being measured at 6 time points (factor T). For simplicity we do not divide the two groups by gender. The simulated data comes from a multivariate normal distribution with an autoregressive correlation structure Σ = (σ kl ) 1≤k,l≤6 , where σ kl = τ 2 ρ |k−l| . For the following hypotheses we obtain the significance level and the power at different levels of α. We use the same notation as in Section 2. The null hypotheses are expressed as follows: F ij and for this particular simulation study a = 2, t = 6.
Under the null hypothesis the significance level was calculated using multivariate normal vectors with mean vector 0 and correlation structure given by τ 2 = 3, ρ = 0.2. We consider 500 simulations and 100 permutations for each test statistic and each α level. The results of the significance level are given in Table 1. The test is rather strongly conservative. This is a well known effect which already has been observed in [15]. It was shown there that this property holds over a large class of distributions, hence one would like to correct for this effect by better approximation for small sample. Note that at present there is no procedure for it. On the other hand, the simulation shows (as all other simulations in the literature) that the type I error is well controlled. The power for each test statistic and different alternatives is given in Tables 2, 3, 4. We consider 500 simulations and 100 permutations for each test statistic and each α level. The power was calculated using multivariate normal vectors with different mean vectors and correlation structure given by τ 2 = 3, ρ = 0.2. The results of the study show that alternatives are well detected when they are far off. Since the test has no assumptions on the underlying distribution functions, it cannot detect small differences when the sample sizes are small or moderate. The simulation also shows that small covariances do not seem to cause problems. Last, note that there is no direct competitor for the LQE procedure in our setup here   unless additional assumptions are met. Of course, in such cases other methods outperform the LQE method (see Section 4 for an example). The LQE method may still serve as a check for model correctness.

Shoulder tip pain study revisited
As we mentioned in the introduction, a description of the shoulder tip pain study and a first analysis based on cumulative odds ratio were given in Lumley [13]. Later on, Brunner et al. [6] analyzed the same data set as a nonparametric factorial design with three factors (treatment (A), gender (B), time (T )). In this study the scores of the shoulder pain of 41 patients that had a laparoscopic surgery were recorded at 6 time points. A group of 22 patients received a treatment to reduce the shoulder pain (the treatment consists of a suction procedure to reduce the abdominal gas) and the remaining 19 patients were the control group. Each group was divided by gender. We denote by X ik = (X ikj ), 1 ≤ i ≤ 4, 1 ≤ k ≤ n i , 1 ≤ j ≤ t the recorded scores at t = 6 time points of patient k within group i. We assume that the vectors X ik are independent but their components may be dependent. Note that n 1 = 14 is the number of patients in the treatment-female group, n 2 = 8 is the number of patients in the treatment-male group, n 3 = 11 is the number of patients in the control-female group and n 4 = 8 is the number of patients in the control-male group. Below we provide the nonparametric hypotheses of no main effects and no interaction effects expressed in terms of the vector of marginal distributions and contrast matrices: Here a = 2, b = 2, t = 6, ⊗ denotes the Kronecker product of matrices, 1 t = (1, . . . , 1) ′ denotes the t-dimensional vector of 1's, P t = I t − 1 t J t is a t-dimensional projection matrix of rank t − 1, I t the t-dimensional unit matrix and J t = 1 t 1 ′ t . The ANOVA-type statistics are calculated according to the formula given in (2.4): The empirical p-values calculated using the logarithmic quantile procedure explained in Section 3 are presented in Table 5 below. In Table 6 we reproduce the results that Brunner et al [6] obtained for this study using the approximation by the F-distribution given in (2.6). Note that we obtained larger p-values than in [6] and at level 0.05 we conclude that the time effect and the interaction treatment-time effect are not significant, which is contrary to the results in [6]. However, the result of a 10% and 7.7% p-value are acceptable to confirm [6].  Tables 5 and 6 are approximations of the quantiles of the test statistic defined in (2.4) since its asymptotic distribution cannot be calculated. We noticed that the p-values in Table 5 obtained by the logarithmic quantile estimation are higher than the ones obtained by Brunner et al. [6] using the approximation by the F-distribution.

Appendix
We present the details of the proof of Proposition 2.1 stated in Section 2. We can follow the general scheme of the corresponding proof of Theorem 2.1 in [9], however the details are to be changed and discussed here. We use the basic notation as in [9], equations (1)- (10). Define and For a fixed 1 ≤ l ≤ t and for a fixed 1 ≤ v ≤ 4, define the following distribution functions For fixed 1 ≤ l ≤ t and 1 ≤ v ≤ 4 define the simple linear rank statistics as The goal is to show the almost sure weak convergence of the statistics Q n (M) under the null hypothesis. This can be done by showing the almost sure weak convergence of the vector T n since each test statistic Q n (M) is a function of T n . The almost sure weak convergence of the vector can be obtained as in Section 3 (the dependent c-sample problem) of [9], by showing the almost sure weak convergence of the vector , where for a fixed 1 ≤ l ≤ t and 1 ≤ v ≤ 4, The almost sure weak convergence of B n is obtained using Theorem 3.1 in Lifshits [12]. This is an almost sure limit theorem for sums of random vectors and a different form of it also appears in Lifshits [11]. For this we need to show that B n can be expressed as a sum of independent vectors and check the assumptions in [12]. For a fixed n, we let γ i = n ni and assume there is λ 0 , N 0 independent of n with: 1. max n max 1≤i≤4 γ i ≤ N 0 < ∞, 2. 1 − λ 0 ≤ max n ni n ≤ λ 0 < 1 for 1 ≤ i ≤ 4, 3. as n → ∞, γ i = γ i (n) →γ i . Now, since n 1 > n 3 > n 2 = n 4 it follows that B n can be written as where z k are defined in Section 2. Now we check the assumptions in Theorem 3.1 of [12]. The first condition is the weak convergence of B n . By assumption 5 of the proposition we have that Σ 1 + · · · + Σ n1 n 1 → Σ as n 1 → ∞, and since the independent vectors z k are centered and bounded, they satisfy the Lindeberg condition and hence by the multivariate central limit theorem it follows that ζ n1 := 1 √ n 1 n1 k=1 z k → N(0, Σ) as n 1 → ∞ and then B n1γ1 = n 1 γ 1 t + 1 n 1 γ 1 t The second assumption is that for some ǫ > 0 it holds sup n1 E(ln + ln + ||ζ n1 ||) 1+ǫ < ∞, where we denote ln + x = ln x, if x ≥ 1 and 0 otherwise. It is easy to see that for ǫ = 1 we have E(ln + ln + ||ζ n1 ||) 2 ≤ E(||ζ n1 ||) 2 = 1 n 1 n1 k=1 E((z 1k1 ) 2 + · · · + (z 4kt ) 2 )