Saddlepoint p-values for a class of nonparametric tests for the current status and panel count data under generalized permuted block design

: Current status and panel count data appear in many applied ﬁelds, including medicine, clinical trials, epidemiology, econometrics, demography, engineering and public health. Therefore, in this article, we use the saddlepoint approximation method to approximate the exact p-value of a number of nonparametric tests for the current status and panel count data under a generalized permuted block design. The saddlepoint approximation is referred to as higher-order approximation and it is more accurate than the methods that lead to approximations that are accurate to the ﬁrst order, such as the asymptotic normal approximation method. To verify the accuracy and e ﬃ ciency of the saddlepoint approximation method, a simulation study is conducted. The simulation study results conﬁrm that the saddlepoint approximation method is more powerful than the existing approximation method. Furthermore, number of real current status and panel count data sets are analyzed and displayed as illustrative examples.


Introduction
In medical and reliability studies, units are inspected for the occurrence of an event of interest within a predefined period. Some units under study may not have experienced the event during the time specified for the study. All we know about the event time of those units is that it exceeds the time specified previously for the end of the study. This type of data is referred to as right-censored data. Another type of data appears when the time of the event of interest cannot be directly observed, but instead we know that the event of interest occurred within a specific interval. This type of data is referred to as interval censored data and often appears in studies that cannot be monitored continuously over time, such as epidemiology studies, demographic studies, and infectious disease studies, when infection is an unobserved event. In this regard, we can refer to two important references that offer more information and applications for the interval censored data, namely: Huang and Wellner [1] and Sun [2]. Due to time limits, expenses, and other difficulties, a more extreme form of survival censored data, namely current status data may be preferred. In general, the term current state data refers to two types of data, the first of which is also known as case-1 interval censored data, and we refer to it here as CS-I data. In CS-I data, the only available information is whether the event of interest has occurred before the examination time or not. Carcinogenicity studies, partner studies of HIV and studies of non-fatal human disease are examples of studies that contain CS-I data; see Gart et al. [3], Jewell and Shiboski [4] and Keiding et al. [5]. The second type of current status data appears in studies in which subjects acquire a recurring event over time, such as multiple incidental neoplasms. In these studies, each element of the study is examined only once, and the number of events that occurred before the time of the examination is counted. Therefore, the information available for this type of data is the number of repeated events that occurred before the time of the examination. Throughout this article we refer to this type of data as CS-II data. For more information and applications about this type of data and the studies in which it appears, readers can review the articles of Ii et al. [6], Diamond and McDonald [7] and Dinse [8]. Another type of important data that appears in medical follow-up studies and reliability studies which can be considered as a generalization of CS-II data is the panel count data. This type of data differs from the CS-II data in that each element of the study is examined more than once. Accordingly, the panel count data consists of the discrete examination times for each element of the study and the number of events that occurred between the successive examination times. It is possible to refer to a number of essential references that dealt with the panel count data in more detail, such as: Thall and Lachin [9], Sun and Kalbfleisch [10] and Wellner and Zhang [11]. As a result of the importance of the previous data types, nonparametric tests for CS-I data, CS-II data and panel count data were introduced by Sun and Kalbfleisch [12], Sun [13], Sun and Fang [14] and Balakrishnan and Zhao [15]. Such nonparametric tests are presented in more detail in the second section of this article.
The primary goal of clinical trials is to make an accurate treatment comparison. The design of a clinical trial includes choosing a method to assign treatment for patients. This method of allocating treatment to patients must fulfill several conditions, including concealing the allocation from the patient and researcher, so that their judgment on the experimental treatment is more objective and not biased. Moreover, the assignment of the next patient must be unexpected even if some information is available about previous assignments. The simplest procedure to achieve these conditions is to toss a coin to determine which of the two treatments is assigned to the patient. Assigning patients to treatment by tossing a balanced coin is often referred to as simple randomization, SR, or complete randomized design, CRD. The CRD achieves the highest level of randomization but may result in treatment imbalance which may cause selection bias in the trial outcome. The literature includes a number of other designs that ensure equilibrium between treatments, such as the random allocation rule, RAR, truncated binomial design, TBD and permuted block design, PBD. In RAR, assigning patients to treatment is done by selecting a ball at random from an urn containing n/2 balls for each treatment, and the withdrawal is without replacement. This process continues until the urn is empty. TBD uses the same technique of CRD until half of the patients are in one of the treatments, and then the rest go to the other treatment. PBD is the most commonly used randomization design; see McEntegart [16], Berger [17], and Pond et al. [18]. It uses the same technique of RAR within each block. A block is a group of empirical units that are similar in some measure. The way the block elements are similar is expected to have an effect on the response to treatments. This design is applied by dividing the sample into relatively homogeneous blocks of equal sizes, then RAR is applied within each block. The dispersion within each block is less than its counterpart within the entire sample. Therefore, testing the effect of treatment within the block is more effective compared to the whole sample [19][20][21]. A general form of the PBD is called generalized PBD which enables different block sizes and imbalanced group sizes within each block. As a result of the importance and uses of PBD, we propose in this article an approximation to the lower tail probability of a class of tests for current status and panel count data using the saddlepoint approximation method under generalized PBD.
The saddlepoint approximation method is one of the statistical approximation methods that has importance and various uses in many branches of statistics. It has been described as one of the approximation methods that has a significant impact on the development of high-precision approximations in the various branches of statistics [22]. The most influential and important contribution to this method began with Daniels [23], who presented an accurate approximation of the probability density function. Since the publication of this pioneering article by Daniels, many successive approximations to a number of statistical functions have been appeared, such as the saddlepoint approximation of univariate cumulative distribution function, CDF, [24], conditional CDF [25] and bivariate CDF [26]. After that, the previous approximations were widely used to solve many statistical problems. A limited number of them can be mentioned here, for example, Daniels [27], Davison and Hinkley [28], Butler [29] and Abd-Elfattah and Butler [30,31]. For a number of recent articles on this topic, readers can see [32][33][34][35][36][37][38].
This article consists of five sections, which can be briefly described as follows: The first section is an introduction to the topic of the article. The second section presents a number of nonparametric tests that manipulate the data of interest in this article. The third section is dedicated to displaying the approximation of the exact p-values for tests presented in the second section and similar tests using the saddlepoint approximation method. The fourth section is devoted to clarifying the theoretical results presented in the previous sections and comparing the accuracy of the proposed method with the asymptotic normal approximation method through a simulation study and the analysis of real data sets. The last section of this article presents a summary of the obtained observations and results.

Permutation tests for current status and panel count data under PBD
In this section, four permutation tests for current status and panel count data under PBD are displayed. Under PBD a sample of n individuals are divided into b blocks of sizes n l such that n = b l=1 n l . Within each block, the RAR is applied to randomize q l , l = 1, 2, ..., b of the individuals to the treatment group and n l − q l to the control group.
The general form of the permutation test statistic for current status and panel count data under PBD to test the tumor prevalence rate is given by where w l is the weight of the block l, l = 1, 2, ..., b, W l, j = w l a l, j , and ρ l, j is the group indicator for the jth individual belongs to the block l. Furthermore, a l, j is the score function of the test which varies according to the nature and type of data, and this will be addressed in more details in the following subsections for CS-I, CS-II and panel count data. Van Elteren [39] introduced the block weight w l = 1 n l +1 as the optimal block weight. Thus, we consider this optimal bock weight through the calculation of this article.

CS-I data
Consider the observed CS-I data for the jth individual in the lth block be {(t l, j , ρ l, j , δ l, j ), j = 1, 2, ..., n l , l = 1, 2, ..., b}, where t l, j is the examination time, ρ l, j is the treatment indicator (ρ l, j = 0 or 1 if the jth individual belongs to the control or treatment group, respectively), and δ l, j is an indicator of the occurrence of the event of interest before the examination time t l, j . Sun and Kalbfleisch [12] introduced test statistic for testing, H 0 : The prevalence of tumors in the treatment and control groups is equal as follows: where S l, j is the score function and δ * l, j =Ĝ(t l, j ) is the isotonic estimate of the CDF, G(t l, j ), of the common propagation of the tumor. For more details on how to calculateĜ(t l, j ), see Barlow et al. [40], and Sun and Kalbfleisch [12].
The statistic T 1 is asymptotically normally distributed [12]. It is worth noting that the statistics T 1 include a number of tests, including: Hoel and Walburg [41] statistic when S l, j = 1 and Finkelstein [42] statistic when S l, j = − log

CS-II data
Let the observed CS-II data for the jth individual in the lth block be {(t l, j , ρ l, j , ψ l, j ), j = 1, 2, ..., n l , l = 1, 2, ..., b}, where t l, j is the examination time, ρ l, j is the treatment indicator previously defined in Subsection 2.1, and ψ l, j = Ψ l, j (t l, j ) is the number of events that happened before the examination time t l, j . Also, let µ l, j (t) = E(Ψ l, j (t)|ρ l, j ) is the conditional mean function, MF, of the individual j in block l. To test the hypothesis of equality of the n l mean functions within each block, the test statistic of Sun and Kalbfleisch [43] can be modified and used as follows: where ψ * l, j is the isotonic regression estimate, IRE, of the MF µ l, j (t l, j ). One can use the algorithm that was introduced by Barlow et al. [40] to evaluate the IRE of the MF. Sun and Kalbfleisch [43] derived the asymptotic distribution of T 2 , assuming the null hypothesis is true and n l → ∞ or (n → ∞).

Panel count data
As we mentioned earlier, panel count data is a generalization of CS-II data. The panel count data differs from the CS-II data in that each element of the study is examined more than once, and the number of events that occurred between successive examination times is counted. Whereas in the CS-II data, each element of the study is examined only once, and the number of events that occurred before the time of the examination is counted. Through the relationship between the panel count data and CS-II, Sun and Fang [14] modified the test statistic in (2.3) to take into account the repeated or periodic examination of each element of the study. Since each element of the study is examined several times, the examination times for jth element of block l are as follows: 0 < t l, j,1 < t l, j,2 < ... < t l, j,m j , l = 1, 2, ..., b, j = 1, 2, ..., n l , where m j be the number of examination for the element j. In the framework of the previous description of the difference between the CS-II data and panel count data, the statistic (2.3) becomes where ψ * l, j,k is IRE of the MF. Sun and Fang [14] obtained the asymptotic normal distribution of T 3 . While Sun and Fang [14] relied on the IRE method for estimating the MF, Balakrishnan and Zhao [15] suggested using the nonparametric maximum likelihood estimator, NPMLE, for estimating the MF. Thus, the statistic T 3 becomes where ∆μ(t l, j,k ) =μ(t l, j,k ) −μ(t l, j,k−1 ), ∆ψ(t l, j,k ) = ψ(t l, j,k ) − ψ(t l, j,k−1 ), andμ(t) is the NPMLE of the common MF µ(t). Balakrishnan and Zhao [15] introduced the asymptotic normal distribution of T 4 .

Approximating the tail probabilities of statistic T
This section presents the proposed procedures for approximating the tail probabilities of statistic T under PBD. As we previously stated that the PBD divides the patients into independent blocks and within each block, patients are randomly distributed using the RAR. Therefore, the PBD is considered as repetition of the RAR. As a result of this, the permutation distribution of the vector of the treatment indicators (ρ 1,1 , ρ 1,2 , ..., ρ 1,n 1 , ..., is the permutation distribution of the vector of the treatment indicators of the block l, (ρ l,1 , ρ l,2 , ..., ρ l,n l ). The RAR ensures that a certain number of patients within block l are assigned to the treatment group, let this number of patients be q l , i.e n l j=1 ρ l, j = q l . Thus, the permutation distribution of the vector of the treatment indicators (ρ 1,1 , ρ 1,2 , ..., ρ 1,n 1 , ..., ρ b,1 , ρ b,2 , ..., ρ b,n b ) is equivalent to the distribution of the independent and identically Bernoulli(β l ) random variables, From the foregoing, we conclude that (3.1) n l j=1 W l, j Y l, j , and U l = n l j=1 Y l, j , l = 1, 2, ..., b, then the p-value of the statistic T can be obtained by evaluating the following probability where T 0 is the observed value of the statistic T . The probability in the right-hand side of Eq (3.2) can be approximated using the saddlepoint approximation of the conditional CDF [25] as follows: where Φ() and φ() are the CDF and PDF of the standard normal distribution. Also,η andυ are given as follows:η In Eq (3.5), C and C κ are the (b + 1) × (b + 1), and b × b Hessian matrices of C(τ, κ 1 , ..., κ b ), and C(0, κ 1 , ..., κ b ), respectively. The abbreviation det(C ) refers to determinant of the matrix C .

(3.9)
It is clear that the permutation distribution of the statistic T does not depend on β l 's values, for this reason, we can take β l equals to q l /n l . This value of β l makesκ l0 = 0, l = 1, 2, ..., b.

Illustrative examples and simulation study
This section explains the importance of the proposed method to approximate the exact p-value by comparing the proposed method with the normal approximation method. This comparison is conducted on two levels: The first level is real data analysis and the second level is a simulation study. During this section, statistics T 2 and T 3 are applied for current status and panel count data, respectively, and statistics T 1 and T 4 can be applied in the same manner. The exact p-value is approximated using the proposed method, which is the saddlepoint approximation method and the asymptotic normal approximation method for the considered tests. To compare the accuracy of the two approximation methods, we need to compute the exact p-value, and unfortunately, it cannot be calculated. Thus, we turn to the simulated method to introduce an accurate approximation to the exact mid-p-value by generating million of randomized PBD sequences for the control and treatment labels. Then, approximating the exact mid-p-value of the statistic T as { I(T > T 0 ) + 0.5 I(T = T 0 )}/10 6 . We refer to this approximated p-value as a reference mid-p-value or simulated mid-p-value. The mid-pvalue is used here instead of the ordinary p-value due to its accuracy preference by many statisticians. Especially for hypothesis testing in discrete problems. For views that promote use of the mid-p-value; see Pierce and Peters [44], Routledge [45], Agresti and Gottard [46], and Chapter 7 of Butler [29].

Illustrative examples
An important way to clarify the purpose of this section and illustrate the suggested procedures is through real illustrative examples. In this regard, two sets of real data are analyzed, representing current status data and panel count data. The current status data set were observed from a tumorigenicity study of laboratory male and female mice by Ii et al. [6]. This data is presented in Table 1 of the reference [6], and it represents the number of tumors detected at the time of death or the time of sacrifice for 199 mice, 100 of them are males (ρ l, j = 1) and 99 are females (ρ l, j = 0). The hypothesis of there is no difference in the rate of tumor development in groups of male and female mice is tested under PBD with nine blocks of sizes n 1 = 23, and n l = 22, l = 2, ..., 9 and q l = 14, 17, 7, 15, 10, 15, 7, 7, 8, for l = 1, ..., 9, respectively.  Table 1 represents the simulated mid-p-value, saddlepoint approximation, SPA, p-value and asymptotic normal approximation, ANA, p-value, for this data set, in the first row in front of the title "Example 1". As an example of samples of medium and small sizes, two samples of sizes 100 and 32 were taken from the total sample. The results of these two samples are indicated in Table 1 as "Example 2" and "Example 3", respectively. For the sample of size 100, the number of blocks was 5 with block size n l = 20, and q l = 5, 15, 9, 12, 9, l = 1, ..., 5. Furthermore, for the sample of size 32, the number of blocks was 4 with block size n l = 8, and q l = 5, 5, 2, 4, l = 1, ..., 4.
The second application example represents panel count data observed from a bladder tumors study [47,48]. The study began by removing all tumors from 72 patients and then distributing the patients into two groups, one of which was a treatment group and the other a control group. At the monthly follow-up visit for each patient during the year, the number of bladder tumors that were detected is recorded. The hypothesis of there is no difference in the rate of tumor development in treatment and control groups is tested under PBD with six blocks of sizes n l = 12, and q l = 9, 4, 6, 7, 6, 6, for l = 1, ..., 6, respectively. The three p-values for such data set are presented in Table 1 in front of the title "Example 4". As an example of a sample of small size, a sample of size 50 was taken from the total sample. For this sample, the number of blocks was 5 with block size n l = 10, and q l = 7, 4, 4, 4, 6, l = 1, ..., 5. Also, the three p-values for such data set are presented in Table 1 in front of the title "Example 5". Furthermore, Table 1 represents the relative absolute errors, RAE, of the saddlepoint and asymptotic normal approximation methods with respect to the simulated method which is considered here as the reference method.
By comparing the three columns of the p-values in Table 1, we can note that, in all the proposed examples, the SPA p-values are closer to the simulated p-values than the ANA p-values. Accordingly, we can say that the proposed method is more accurate than ANA method.

Simulation study
It is not possible to completely rely on the accuracy that appeared through the illustrative examples, although it was a good indicator of the accuracy of the proposed method. Therefore, a simulation study is conducted to verify that accuracy and form a clear conclusion about the accuracy of the proposed method. In this context, clinical trials are simulated to yield current status and panel count data. First, the current status data for the block l, l = 1, 2, ..., b is generated from Poisson(µ l, j (t l, j )e αρ l, j ) processes, where α is regression coefficient, the MF µ l, j (t l, j ) = λt l, j , λ is constant and t l, j = j/n l , j = 1, 2, ..., n l . Secondly, the panel count data for the block l, l = 1, 2, ..., b is generated assuming that we have a clinical trial in which patients are examined monthly for half a year. Accordingly, the number of examination for the patient j in block l, follows Uniform distribution U{1, 2, ..., 6}. As in the current status data, the number of the repeated event that found out at each examination time follows a Poisson process with MF, µ(t l, j,k ) exp{αρ l, j }, where µ(t l, j,k ) = λt l, j,k , and t l, j,k = k/6, k = 1, 2, ..., 6, j = 1, 2, ...., n l , l = 1, 2, ..., b. 1, 000 current status and panel count data sets are generated according to four scenarios, SC, namely SC1, SC2, SC3 and SC4 as follows: • SC1: n = 30, b = 5, n l = 6, q l = 3, l = 1, 2, ..., 5.
The treatment indicators ρ l, j are chosen by simulating a PBD sequence for each simulated data set. Since the aim of this subsection is to emphasize the accuracy of the proposed method, which appeared clearly in the examples illustrated in the previous subsection. Accordingly, three criteria are proposed to compare the saddlepoint and asymptotic normal approximation methods. The first criterion expresses the percentage of the number of times that the proposed approximation was closer to the reference method, which is the simulation method, which needs a lot of calculations, as we explained how to calculate it at the beginning of this section. We refer to this criterion as PER(S PA). The second and third criteria both measure the amount of error resulting from both methods to approximate the exact p-value of the considered class of tests. These two criteria are the mean relative absolute error and mean square error for both methods. We refer to them as RAE(S PA), RAE(ANA), MS E(S PA), and MS E(ANA). The RAE(S PA) can be obtained as RAE(S PA) = 1 where P i andP i are the simulated and SPA p-values, respectively. Similarly, the RAE(ANA) can be calculated by replacing SPA p-value by ANA p-value. Furthermore, the MS E(S PA) can be obtained as MS E(S PA) = 1 The MS E(ANA) also can be calculated by replacing SPA p-value by ANA p-value in the last formula, where N = 1, 000. The three comparison criteria are summarized in Table 2 of the current status data and in Table 3 for panel count data.  All the results presented in Tables 2 and 3 confirm the observation drawn from the illustrative examples, which is the proposed method is more accurate than the traditional method in approximating the exact p-value of the proposed class of tests. For example, we find that the lowest percentage of the proposed method approaching the reference method is PER(S PA) = 81.9%, while the highest percentage exceeds 96%. Moreover, if we take the results presented in the first row in Table 2, we note that the mean relative absolute error for the proposed and traditional methods are 3.37% and 14.17%, respectively. In this case, the mean relative absolute error resulting from the traditional method is approximately three times the corresponding error resulting from the proposed method. To illustrate this in another way, we plot the relative absolute errors for both approximation methods, SPA and ANA methods, for the aforementioned case in Figure 1. We would like to point out that the exact p-value of the considered class of nonparametric tests can be approximated using three approximation methods: The asymptotic normal approximation method, the simulation method, the saddlepoint approximation method, and the last one is the method proposed in this article. We have shown later that the saddlepoint approximation method is more accurate than the asymptotic normal approximation method. Therefore, we can use the saddlepoint approximation method as an alternative to the asymptotic normal approximation method for approximating the exact p-value. But there remains an important question here that may interest readers. This question is why we may use the saddlepoint method as an alternative to the simulation method. The answer is that the saddlepoint method is computationally less demanding than the simulation method. To make this matter more clear to the readers, the computing times for saddlepoint and simulation methods were recorded and these results are presented in Tables 4 and 5 for the two data types used in this article. From Tables 4 and 5, we note that to find the average value of 1,000 p-values that were calculated using the saddlepoint method, we need approximately a minute for all the imposed sample sizes. While we need 15 to 23 hours to calculate the corresponding value using the simulation method. Thus, we can say that the proposed method saves a lot of time compared to the simulation method and it is more accurate than the asymptotic normal approximation. Thus, we can suggest it as a good alternative to both asymptotic normal and simulated methods.

Conclusions
In this article, a number of nonparametric tests for the current status and panel count data which appear in many studies such as clinical trials, epidemiology, and demography are formulated in the form of a class of linear rank tests. The saddlepoint approximation method is proposed to approximate the exact distribution of the proposed class of tests under PBD. The proposed method of approximation showed high accuracy compared to the normal approximation method which is widely used with these types of tests. Thus, we can conclude that the proposed method is a good alternative to the normal approximation method, and the proposed method does not require a lot of calculations that take a lot of time, like the simulated method. Therefore, we can say that the proposed method has the required accuracy and does not require much time in its calculation.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.