Dbemplikegof: an R Package for Nonparametric Likelihood Ratio Tests for Goodness-of-fit and Two Sample Comparisons Based on Sample Entropy

We introduce and examine dbEmpLikeGOF, an R language for performing goodness-of-fit tests based on sample entropy. This package also performs the two sample distribution comparison test. For a given vector of data observations, the provided function dbEmpLikeGOF tests the data for the proposed null distributions, or tests for distribution equality between two vectors of observations. The proposed methods represent a distribution-free density-based empirical likelihood technique applied to nonparametric testing. The proposed procedure performs exact and very efficient p values for each test statistic obtained from a Monte-Carlo (MC) resampling scheme. Note by using an MC scheme, we are assured exact level α tests that approximate nonparametrically most powerful Neyman-Pearson decision rules. Although these entropy based tests are known in the theoretical literature to be very efficient, they have not been well addressed in statistical software. This article briefly presents the proposed tests and introduces the package, with applications to real data. We apply the methods to produce a novel analysis of a recently published dataset related to coronary heart disease.


Introduction 1.Empirical likelihood
Empirical likelihood (EL) allows researchers the benefit of employing powerful likelihood methods (e.g., maximizing likelihoods) without having to choose a parametric family for the data.A thorough overview of empirical likelihood methods can be found in Owen (2001).The research in this area continues to grow while empirical likelihood methods are being extended to many statistical problems (e.g., Vexler, Yu, Tian, and Liu 2010;Yu, Vexler, and Tian 2010).
In short, an outline of the EL approach can be presented as follows.Given independently identically distributed observations X 1 , . . ., X n , the EL function has the form of L p = Π N i=1 p i where the components p i , i = 1, . . ., n maximize the likelihood L p (maximum likelihood estimation) provided that empirical constraints, based on X 1 , . . ., X n are in effect (e.g., n i=1 p i = 1, n i=1 p i X i = 0, under the hypothesis EX 1 = 0).Computation of the EL's components p i , i = 1, . . ., n used to be an exercise in Lagrange multipliers.This nonparametric approach is a product of the consideration of the 'distribution-functions'-based likelihood Π n i=1 F (X i ) − F (X i −) over all distribution functions F where F (X i −) denotes the left hand limit of F at X i .
The following extensions from these methods involve a density-based likelihood methodology for goodness-of-fit testing.The proposed extensions have been motivated by developing test statistics that approximate nonparametrically most powerful Neyman-Pearson test statistics based on likelihood ratios.A density-based EL methodology can be introduced utilizing the EL concept (see Vexler and Gurevich 2010a,b;Gurevich and Vexler 2011).Following the EL methodology, the likelihood function L f = Π n i=1 f (X i ) where f (•) is a density function of X i can be approximated by Π n i=1 f i , where values of f i should maximize Π n i=1 f i provided that an empirical constraint which corresponds to f (u)du = 1 under an underlying hypothesis is in effect.Outputs of the density based EL approach have a structure that utilize sample entropy (e.g., Vexler and Gurevich 2010a).To date, density based EL tests have not been presented in R packages (R Development Core Team 2009) but are known to be very efficient in practice.Moreover, despite the fact that many theoretical articles have considered very powerful entropy-based tests, to our knowledge there does not exist software procedures to execute procedures based on sample entropy in practice.

Goodness-of-fit tests
Goodness-of-fit tests commonly arise when researchers are interested in checking whether the data come from an assumed parametric model.In certain situations, this question manifests to test whether two datasets come from the same parametric model.Commonly used goodnessof-fit tests include the Shapiro-Wilks (SW) test, Kolmogorov-Smirnov (KS) test, Wilcoxon rank sum test (WRS), the Cramér-von-Mises test, and the Anderson-Darling test (Darling 1957;Hollander, Wolfe, and Wolfe 1973;Royston 1991).
Recently several new goodness-of-fit tests have been developed using density based empirical likelihood methods.These powerful new tests offer exact level α tests with critical values that can be easily obtained via Monte-Carlo approaches.

EL ratio test for normality
The derivation of the EL ratio test for normality can be found in Vexler and Gurevich (2010b).To outline this method, we suppose that the data consist of n independent and identically distributed observations X 1 , . . ., X n .Consider the problem of testing the composite hypothesis that a sample X 1 , . . ., X n is from a normal population.Notationally, the null hypothesis is where N (µ, σ 2 ) denotes the normal distribution with unknown mean µ and unknown standard deviation σ.
Following Vexler and Gurevich (2010b), to test the null hypothesis at (1) we use the test statistic, where 0 < δ < 1, s denotes the sample standard deviation, and X (1) , . . ., X (n) represent the order statistics corresponding to the sample X 1 , . . ., X n .Note, here, X (j) = X (1) if j ≤ 1 and We employ the following decision rule, we reject the null hypothesis if and only if where C is a test threshold and V n is the test statistic defined in (2).

Since sup
µ,σ the type I error of the test at (3) can be calculated exactly using e.g., a Monte Carlo approach.
Figure 1 displays the Monte-Carlo roots C α of the equation P X 1 ,...,Xn∼N (0,1) {log (V n ) > C α } = α for different values of α and n. (For each value of α and n, the solutions were derived from 75,000 samples of size n.)

EL ratio test for uniformity
One can show that tests for uniformity correspond to general goodness-of-fit testing problems when the null hypothesis is based on completely known distribution functions.The full derivation of the EL ratio test for uniformity can be found in Vexler and Gurevich (2010b).We consider the test for the uniform distribution on the interval [0, 1] (U ni(0, 1)), specifying the null distribution versus the alternative that Y 1 , . . ., Y n are from a nonuniform distribution F (y).
By virtue of outputs from Vexler and Gurevich (2010b), we use the following EL ratio test statistic where 0 < δ < 1 and Y (1) , . . ., Y (n) correspond to the order statistics from the sample implies that H 0 is rejected, where C is a test threshold.The significance level of this test can be calculated according to the following equation, Figure 2 shows the roots C α of the equation in ( 8) for different values of α and n. (For each value of α and n, the solution is derived from 75,000 samples of size n).
Note, the test for uniformity in (7) will cover a generalized version of the goodness-of-fit problem when the distribution in H 0 is completely known.In other words, if we consider the random sample X 1 , . . ., X n from a population with a density function f and a finite variance we can test the hypotheses: where, under the alternative hypothesis, F H 1 is completely unknown, whereas under the null hypothesis, F H 0 (x) = F H 0 (x; θ) is known up to the vector of parameters θ = (θ 1 , . . ., θ d ).
Note, d ≥ 1 defines the dimension for θ.Although a strong assumption, by assuming that densities exist under the alternative, we are able to demonstrate asymptotic consistency of the proposed test statistic.By employing the probability integral transformation (Dodge 2006), if Hence, the uniformity test in ( 7) can be employed on data Y 1 , . . ., Y n to test whether X 1 , . . ., X n conforms with density f H 0 .

EL ratio test for distribution equality
In this section we present the EL ratio test for examining if two datasets are from the same distribution.The complete derivation for this case can be found in Gurevich and Vexler (2011).In short, let X 1 = (X 11 , X 12 , . . ., X 1n 1 ) denote independent observations in the first dataset and X 2 = (X 21 , X 22 , . . ., X 2n 2 ) denote independent observations in another dataset.Under H 0 (equal distributions), we assume that both groups are identically distributed.That is, our null hypothesis is where F X 1 and F X 2 denote the cumulative density function (CDF) for the observations in X 1 and X 2 , respectively.
Our test statistic for the hypothesis in ( 10) is given by Rn 1 ,n 2 = min The ∆ mij function is defined as: where I() denotes an indicator function that takes the value 1 if the condition in the parenthesis is satisfied and takes the value 0, otherwise.The x i(j) indicates the j-th order statistic for the group i. Note, here, The test rejects the null hypothesis for large values of log Rn 1 ,n 2 .Note that we define Significance of level α can be determined since for any distribution function F .Hence, the null distribution of Rn 1 ,n 2 is independent with respect to the form of the underlying distributions given H 0 .Hence, we can tabulate universal critical values regardless of the null distribution of the X ij 's.
Table 1 shows the critical values for the logarithm of Rn 1 ,n 2 for common sample sizes and significance levels.These critical values were obtained from deriving Monte Carlo roots of based on 75,000 repetitions of sampling X 1j ∼ N (0, 1) and X 2j ∼ N (0, 1).
In the following we present the structure and functioning of the package, with applications to real datasets.

What is package dbEmpLikeGOF
In summary, the dbEmpLikeGOF package provides a function dbEmpLikeGOF to be used for empirical likelihood based goodness-of-fit tests based on sample entropy.The function can also perform the two sample EL ratio test for the hypothesis in (10).The output of dbEmpLikeGOF analysis is an object containing the test statistic and the p value.Standard bootstrap options can be used in conjunction with the object (statistic) in order to make confidence sets a straightforward and automated task.
The proposed function provides the test statistic and p value, where the user can specify an option for the p value to be obtained from a Monte Carlo simulation or via interpolation from stored tables.A complementary function is also included in this package to compute the cut-off value for the appropriate tests of normality and uniformity.
To perform the goodness of fit function, we call the dbEmpLikeGOF function: dbEmpLikeGOF(x=data, y=na, testcall="uniform", pvl.Table=FALSE, num.MC=1000) where data represents a vector of data and the testcall option allows the user to perform the goodness-of-fit test for uniformity (uniform) or normality (normal).The pvl.Table option when set to TRUE employs a stored table of p values to approximate the p value for the given situation, when set to FALSE, a Monte-Carlo simulation scheme is employed to estimate the p value.The number of simulations in the Monte-Carlo scheme can be controlled using the num.MC option.
In the event that the user specifies both x and y in dbEmpLikeGOF the two sample distribution equality hypothesis in ( 10) is performed using the logarithm of the statistic in (11).
Further input options for dbEmpLikeGOF include specifying δ (delta) in ( 6) and δ (delta.equality) in ( 11).We recommend using the default settings and note that these procedures are fairly robust to the specification of δ.
In certain situations the user may simply be interested in obtaining the cut-off value for a given test and sample size.The function returnCutoff is designed to return the cut-off value for the specified goodness-of-fit test at a given α significance level.For example, the following code: returnCutoff(samplesize, testcall="uniform", targetalpha=.05,pvl.Table=F, num.MC=200) will return the Monte Carlo based test statistic cutoff for determining significance at level 0.05 for the null hypothesis in (5) with decision rule in (7).
The required input for returnCutoff requires the user to specify the sample size (samplesize) and targetalpha represents the significance level of the test.If the user specifies samplesize as a two element vector, then it is assumed that the user is specifying the two sample sizes for the distribution equality test.Note, num.MC represents the number of Monte-Carlo simulations performed to estimate the cut-off value.Similar to the dbEmpLikeGOF, there is an option to use stored tables to obtain the cutoff rather than Monte-Carlo simulations.The logical variable pvl.Table when true will determine the cut-off from interpolation based on stored tables.Importantly, note that the cutoff values for the test statistics in (2), (6), and (11) are returned on the logarithm scale with base e.
Using the methodology developed by North, Curtis, and Sham (2003), for each test statistic, T obs , the Monte Carlo p value is computed according to the equation below: where M represents the number of simulations and T (x 1 , . . ., x n ) is the statistic from the simulated data (x 1 , . . ., x n ) and T obs is the observed statistic.

Availability
The dbEmpLikeGOF package is available from the Comprehensive R Archive Network at http://CRAN.R-project.org/ and also available for download from the author's department webpage (http://sphhp.buffalo.edu/biostat/research/software/dbEmpLikeGOF/index.php).

Examples
This section provides examples of using dbEmpLikeGOF with the corresponding R code.
Using several publicly available datasets and a novel dataset we compare our results with results from other goodness-of-fit tests including Shapiro-Wilks (SW), Kolmogorov-Smirnov (KS), and Wilcoxon rank sum (WRS) tests.The SW test for normality was implemented using the R function shapiro.test.The one and two sample KS tests were implemented using the R function ks.test.The two sample WRS test was implemented using the R function wilcox.testR Development Core Team ( 2009).
Note that Monte Carlo studies presented in Vexler and Gurevich (2010a) and Gurevich and Vexler (2011) showed various situations when the density based EL test clearly outperformed the classical procedures.

Snowfall dataset
We consider the 63 observations of the annual snowfall amounts in Buffalo, New York as observed from 1910/11 to 1972/73 (data in Table 2 and Figure 3); see, for example, Parzen (1979).We perform the proposed test for (1) with the statistic in (2).We obtain the value of the test statistic to be 8.49 with an MC based p value of 0.3234 using the following command, dbEmpLikeGOF(x=snow, testcall="normal", pvl.Table=FALSE, num.mc=5000), where snow represents the vector of annual snowfall amounts.Note, when using a KS test to examine the same hypothesis for the snowfall dataset we obtain a p value of 0.9851 and a SW p value of 0.5591.Thus, we conclude that there is not significant evidence to conclude that the snowfall data is inconsistent with a normal distribution.
To examine the robustness of our tests, we employed a resampling technique where we randomly removed 10, 20 and 50 percent of the data and examined the significance of the test statistics derived from the remaining dataset.For each test, we repeated this technique 2000 times where the results are summarized in Table 3.When randomly removing 10, 20, and 50 percent of the data, we obtained a significant density based EL test statistic in 3.6, 5 and 6.1 percent of the simulations, respectively.Table 4 displays the average p value for each of the tests when randomly removing the data.Ultimately, the results from randomly removing a percentage of observations demonstrates the robustness of the proposed test.
To study the power of the EL statistic, we examine four snowfall datasets where each dataset is obtained by randomly removing 50 percent of the snowfall data.These datasets represent examples where the EL based test is significant (p value < 0.05), while the KS and SW tests are not significant (p values > 0.05).These examples are summarized by displaying the kernel density estimates and the hypothesized distributions as shown in Figure 4. From the examples in Figure 4, there is the potential for the EL tests to be more powerful than KS and SW tests.

Birth dataset
As another example of dbEmpLikeGOF, we examine a baby boom dataset summarizing the time of birth, sex, and birth weight for 44 babies born in one 24-hour period at a hospital  in Brisbane, Australia.These data appeared in an article entitled "Babies by the Dozen for Christmas: 24-Hour Baby Boom" in the newspaper The Sunday Mail on December 21, 1997.According to the article, a record 44 babies were born in one 24-hour period at the Mater Mothers' Hospital, Brisbane, Australia, on December 18, 1997.The article listed the time of birth, the sex, and the weight in grams for each of the 44 babies where the full dataset can be found at Dunn (1999).We examine whether an exponential distribution can be used to model the times between births.The data summarizing the time between births is shown in Table 5.Using the KS test for an exponential distribution, we obtain a p value of 0.3904.We transform the data in Table 5 using the inverse exponential distribution and thus the transformed data can be examined using the EL ratio test for uniformity.The following command returns the test statistic (6) and p value, dbEmpLikeGOF(x=baby, testcall="uniform", pvl.Table=FALSE, num.mc=5000), where baby represents the vector of transformed data.When this test is employed, we observe a MC based p value of 0.6708.Ultimately, for this data the time between births can be adequately modeled using an exponential distribution.
Similar to the snowfall dataset, we examine the robustness of our results by employing a bootstrap scheme where the bootstrap resamplings are taken when removing 10, 20, or 50 percent of the original dataset.The results are summarized in Tables 3 and 4 simulations where we randomly remove 10, 20, and 50 percent of the observations from the original dataset, we find significant statistics in 0, .2, and 2 percent of the simulated datasets, respectively.When randomly removing 50 percent of the observations from the original dataset, the mean p value using the EL test is smaller than the KS statistic, suggesting that the EL test may be more powerful in certain situations.This is supported by examining four birth datasets representing 50 percent of the original birth dataset where the EL test is significant (p value < 0.05), while the KS test is not significant (see Figure 5).Figure 5 displays the data driven kernel density estimate against the hypothesized distribution.From this example, there may be situations where the EL test for uniformity may be more powerful than the traditional KS test.

A TBARS data example
For a novel analysis using the density-based EL software, we consider data from a study evaluating biomarkers related to atherosclerotic coronary heart disease (see Acknowledgments).
Free radicals have been implicated in the atherosclerotic coronary heart disease process.Welldeveloped laboratory methods may grant an ample number of biomarkers of individual oxidative stress and antioxidant status.These markers quantify different phases of the oxidative stress and antioxidant status process of an individual.A population-based sample of randomly selected residents of Erie and Niagara counties of the state of New York, U.S.A., was the focus of this investigation.The New York State Department of Motor Vehicles drivers' license rolls were utilized as the sampling frame for adults between the ages of 35 and 65; where the elderly sample (age 65-79) was randomly selected from the Health Care Financing Administration database.Participants provided a 12-hour fasting blood specimen for biochemical analysis at baseline, and a number of parameters were examined from fresh blood samples.A complete description of this dataset is available at Schisterman, Faraggi, Browne, Freudenheim, Dorn, Muti, Armstrong, Reiser, and Trevisan (2001).
A cohort of 5620 men and women were selected for the analyses yielding 1209 cases (individuals that had a heart attack) and 4411 controls (no heart attack history).In a subset of this dataset we examine the significance of the thiobarbituric acid reactive substances (TBARS) variable which is known to play a role in atherosclerotic coronary heart disease process.TBARS was measured in patient serum samples using reverse-phase high performance liquid chromatography and spectrophotometric approaches.
We employed a bootstrap strategy to study the TBARS variable using the statistic in (11).
The strategy was based on randomly choosing 200 patients, where 100 patients had previously suffered a heart attack and 100 patients did not have a heart attack.The distribution of TBARS was examined for equality between the heart attack patient cohort and the no heart Figure 6 displays the histogram of the logarithm of the two sample test statistic calculated in (11).5.9 percent of the bootstrap resamplings yielded a significant test statistic.These results were also compared against the two sample KS test and two sample WRS test to compare distribution equality.Using the KS test, 3.4 percent of the resamplings were significant (p value < 0.05).Using the two sample WRS test, 4.5 percent of the resamplings were significant (p value < 0.05).
Thus all of the statistical tests suggest that TBARS is not significantly different in heart attack patients.This is further confirmed when examining the mean p-values in the resampled datasets.The mean p-values are 0.5262, 0.5002, and 0.4489 for the KS, WRS, and empirical likelihood tests, respectively.
To examine the power of the two sample EL test, we focus on four examples comparing 100 patients that had previously suffered a heart attack and 100 patients that did not have a heart attack, where the EL statistic is significant, however, the KS and WRS tests are not significant.Figure 7 displays the density for each cohort when a kernel density smoother is employed with the bandwidth chosen according to Equation (3.31) in Silverman (1986).These examples highlight situations where there may be a power advantage obtained using the density-based EL statistics over traditional goodness-of-fit tests such as the two sample KS test and two sample WRS test.

Figure 3 :
Figure 3: Histogram of snowfall data in Buffalo, NY from 1910/11 to 1972/73.Using the EL test for normality, we conclude that the distribution for the data is consistent with a normal distribution (p value=.3234).

Figure 4 :
Figure 4: Snow fall dataset examples where the density-based EL statistic is significant (p value < 0.05), but the Kolmogorov-Smirnov (KS) test and Shapiro Wilks (SW) test are not significant (p values > 0.05).The normal density (red curve) is determined using the sample mean and sample standard deviation to estimate the mean and standard deviation.The black curve represents the kernel density for a randomly chosen subset from the snowfall dataset described in Section 3.1.

Figure 6 :
Figure 6: Histogram of the test statistic for TBARS distribution equality based on 2000 bootstrap resamplings.

Figure 7 :
Figure 7: TBARS examples where the density-based EL distribution equality statistic is significant (p value < 0.05), but the two-sample Kolmogorov-Smirnov (KS) test and twosample Wilcoxon rank sum (WRS) test are not significant (p values > 0.05).

Table 1 :
The critical values for log(R n 1 ,n 2 ) with δ = 0.10 for the two sample comparison with various sample sizes n 1 and n 2 at significance level α.

Table 3 :
Resampling results for Snowfall dataset and Birth dataset.With 2000 simulations, we randomly remove 10, 20, and 50 percent of the observations in the original dataset (Snowfall or Birth).In each remaining dataset, we compute the test statistic and the percentage of significant test statistics (at level 0.05) are summarized in each cell.EL refers to the densitybased empirical likelihood test, KS denotes the Kolmogorov-Smirnov test, and SW denotes the Shapiro-Wilks test.

Table 4 :
Resampling results for Snowfall dataset and Birth dataset.With 2000 simulations, we randomly remove 10, 20, and 50 percent of the observations in the original dataset (Snowfall or Birth).In each remaining dataset, we compute the test statistic and the mean p values are summarized in each cell.EL refers to the density-based empirical likelihood test, KS denotes the Kolmogorov-Smirnov test, and SW denotes the Shapiro-Wilks test.