Article

On the Nuisance Parameter Elimination Principle in Hypothesis Testing

by Andrés Felipe Flórez Rivera *, Luis Gustavo Esteves, Victor Fossaluza and Carlos Alberto de Bragança Pereira
Institute of Mathematics and Statistics, University of São Paulo, São Paulo 05508-090, Brazil
* Author to whom correspondence should be addressed.
Entropy 2024, 26(2), 117; https://doi.org/10.3390/e26020117
Submission received: 1 December 2023 / Revised: 23 January 2024 / Accepted: 24 January 2024 / Published: 29 January 2024
(This article belongs to the Special Issue Bayesianism)

Abstract: The Non-Informative Nuisance Parameter Principle concerns the problem of how inferences about a parameter of interest should be made in the presence of nuisance parameters. The principle is examined in the context of the hypothesis testing problem. We prove that the mixed test obeys the principle for discrete sample spaces. We also show how adherence of the mixed test to the principle can greatly simplify the performance of the test. These findings are illustrated with new solutions to well-known problems of testing hypotheses for count data.

1. Introduction

Principles of Statistical Inference (or Data Reduction) constitute important guidelines on how to draw conclusions from data, especially when performing standard inferential procedures for unknown parameters of interest, like estimation and hypothesis testing. For instance, the Sufficiency Principle (SP) states that any sufficient statistic retains all relevant information about the unknown parameters that should be used to make inferences about them. It precisely recommends that if T is a sufficient statistic for the statistical model under consideration and  x 1  and  x 2  are sample points such that  T ( x 1 ) = T ( x 2 ) , then the observation of any of these points should lead to the same conclusions regarding the parameters of interest.
Besides the place of sufficiency in Statistical Inference, these recommendations cover several issues such as the contrast between post-experimental and pre-experimental reasoning and the roles of non-informative stopping rules, censoring mechanisms and nuisance parameters in data analysis. Among the main principles, the Sufficiency Principle is generally recognized as a cornerstone of Statistical Inference. On the other hand, the Likelihood Principle (LP) and its profound consequences are still subjects of intense debate. The reader will find a detailed discussion of the Likelihood Principle in [1,2,3,4,5,6].
In this work, we examine the Non-Informative Nuisance Parameter Principle (NNPP), introduced by Berger and Wolpert in their remarkable 1988 book, which concerns how inferences about a parameter of interest should be made in the presence of nuisance parameters. Nuisance parameters usually affect inferences about the parameter of interest, as in the estimation of the mean of a normal distribution with unknown variance, in the estimation of the parameters of a linear regression model in the presence of unknown variance, and in the determination of p-values for specific hypotheses in the analysis of  2 × 2  contingency tables ([7]). In a few words, the NNPP states that, under suitable conditions, it is irrelevant whether the value of a non-informative nuisance parameter is known or not in order to draw conclusions about the parameter of interest. Despite the importance of the problem of eliminating nuisance parameters in data analysis, this principle and its consequences have not been explored in much depth, as far as we have found in the literature. For this reason, we revisit the NNPP by formally stating it for the problem of hypothesis testing, present decision rules that meet the principle and show how the performance of a particular test in line with the NNPP can then be simplified.
This work is organized as follows: in Section 2, the NNPP for hypothesis testing is stated, discussed and illustrated under a Bayesian perspective. In Section 3, the Bayesian test procedure based on the concept of adaptive significance level and on an alternative p-value introduced by Pericchi and Pereira in [8], henceforth named the mixed test, is reviewed and is proven to satisfy the NNPP for discrete sample data when the (marginal) null hypothesis regarding the parameter of interest is a singleton (as a matter of fact, the result also holds when such a null hypothesis is specified by a hyperplane). In that section, we also define conditional versions of the adaptive significance level and p-value based on suitable statistics and prove that under those conditions, the performance of the mixed test is then simply the comparison between these new conditional quantities. These results are of great importance to make it easier to use the mixed test in various situations. In Section 4, we exemplify the main results by presenting new solutions by using the mixed test for well-known problems of test of hypotheses for count data under suitable reparametrizations of the corresponding models: we revisit the problems of comparison of Poisson population means and of testing the hypotheses of independence and symmetry in contingency tables. We make our final comments in Section 5. The proofs of the theorems and the calculations for one example in Section 4 are found in the Appendix A.

2. The Non-Informative Nuisance Parameter Principle for Hypothesis Testing

The problem of the elimination of nuisance parameters in statistical inference has a long history and remains a major issue. Proposals to deal with it include the marginalization of the likelihood function by integrating out the nuisance parameter ([9,10,11]), the construction of partial likelihood functions ([12,13,14], among others) and the consideration of conditional likelihood functions based on different notions of non-informativeness, sufficiency and ancillarity. Elimination of nuisance parameters and different notions of non-information have also been studied in more detail in [15,16,17,18], where, based on suitable statistics, the concepts of B, S and G non-information are presented. The generalized Sufficiency and Conditionality Principles are also discussed in [17]. On the other hand, Bayesian methods for eliminating nuisance parameters based on a suitable statistic T involve different definitions of sufficiency: for instance, K-Sufficiency, Q-Sufficiency and L-Sufficiency (see for example [17] and references therein).
In this section, the Non-Informative Nuisance Parameter Principle (NNPP) by Berger and Wolpert is discussed and formally defined for the problem of hypothesis testing. As we will see, the NNPP seems reasonable under the partial and conditional non-Bayesian approaches mentioned in the previous paragraph, and it is especially natural from the Bayesian standpoint. Despite the relevance of the problem of the elimination of nuisance parameters in data analysis, Berger and Wolpert [1] presented the NNPP but did not explore the principle in depth, as far as we have found in the literature.
Some notation is needed to continue. We denote by  θ  the unknown parameter and by X the sample to be observed.  Θ  and  X  represent the parameter and the sample spaces, respectively. The family of discrete probability distributions for X is denoted by  P = { P ( · | θ ) : θ ∈ Θ } . In addition, for  x ∈ X ,  L x ( · )  denotes the likelihood function for  θ  generated by the sample point x. By an experiment  E , we mean, as in [1], a triplet  E = ( X , θ , P ) , with X,  θ  and  P  as defined earlier. Finally, for a subset  Θ 0  of  Θ , we formulate the null hypothesis  H : θ ∈ Θ 0  and the alternative one  A : θ ∉ Θ 0 . We recall that a test function (procedure) for the hypotheses H versus A is a function  φ : X → { 0 , 1 }  that takes the value 1 ( φ ( x ) = 1 ) if H is rejected when  x ∈ X  is observed and takes the value 0 ( φ ( x ) = 0 ) if H is not rejected when x is observed. Under the Bayesian perspective, we also consider a continuous prior density function  π ( · )  for  θ  that induces, when combined with the likelihood function  L x ( · ) , a continuous posterior density function for  θ  given x,  π ( · | x ) .
In [1], Berger and Wolpert presented the following principle on how to make inferences about an unknown parameter of interest  θ 1  in the presence of a nuisance parameter  θ 2 : when a sample observation, say  x 0 , separates information concerning  θ 1  from information on  θ 2 , it is irrelevant whether the value of  θ 2  is known or unknown in order to make inferences about  θ 1  based on the observation of  x 0 . In other terms, if the conclusions on  θ 1  were to be the same for every possible value of the nuisance parameter, were  θ 2  known, then the same conclusions on  θ 1  should be reached even if  θ 2  is unknown. These authors then consider the following mathematical setup to formalize these ideas.
Let  θ = ( θ 1 , θ 2 ) , with  θ 1  and  θ 2  defined as in the previous paragraph. Consider  Θ = Θ 1 × Θ 2 ; that is, the parameter space is variation independent, where  Θ i R n i  is the set of values for  θ i n i N * i = 1 , 2 . Suppose the experiment  E = ( X , θ , P )  is carried out to learn about  θ . Let  E ¯ = ( ( X , θ 2 ) , θ 1 , P ¯ )  be the “thought” experiment in which the pair  ( X , θ 2 )  is to be observed (instead of observing only X), where  P ¯  is the family of distributions for  ( X , θ 2 )  indexed by  θ 1 . Suppose also that under experiment  E , the likelihood function generated by a specific  x 0 X  for  θ  has the following factored form:
L_{x_0}(\theta_1, \theta_2) = L^{1}_{x_0}(\theta_1)\, L^{2}_{x_0}(\theta_2),
where  L x 0 i : Θ i R + i = 1 , 2 ; that is,  L x 0 i  depends on  θ  only through  θ i .
Berger and Wolpert then states the Non-Informative Nuisance Parameter Principle (NNPP): if  E  and  x 0 X  are such that (1) holds, and if the inference about  θ 1  from the observation of  ( x 0 , θ 2 )  when  E ¯  is performed does not depend on  θ 2 , then the inferential statements made for  θ 1  from  E  and  x 0  should be the same as (should coincide with) the inferential statements made from  E ¯  and  ( x 0 , θ 2 )  for every  θ 2 Θ 2 .
The authors named such a parameter  θ 2  a Non-Informative Nuisance Parameter (NNP), as the conclusions or decisions regarding  θ 1  from  E ¯  and  ( x 0 , θ 2 )  do not depend on  θ 2 .
A likelihood function that satisfies (1) is named a likelihood function with separable parameters ([19]). The factored form of the likelihood function in (1) seems to capture the notion of “absence of information about a parameter, say  θ 1 , from the other,  θ 2 , and vice versa” under both Bayesian and non-Bayesian reasoning. Indeed, under the Bayesian paradigm, posterior independence between  θ 1  and  θ 2  (say, given  x 0 ) reflects the fact that one´s opinion about the parameter  θ 1  after observing  x 0  is not altered by any information about  θ 2 , and consequently, decisions regarding  θ 1  should not depend on  θ 2 . Since posterior independence between  θ 1  and  θ 2  given  x 0  is equivalent to the factored form of the likelihood function generated by  x 0  under prior independence, condition (1) sounds really reasonable as a mathematical description of separate information about the parameters. Thus, if a Bayesian statistician should make inferences regarding a parameter  θ 1  in the presence of a nuisance parameter  θ 2 , it would be ideal that these parameters are independent a posteriori; that is, the factored form of the likelihood function holds. This last equivalence is proven in the theorem below.
Theorem 1.
Let  E = ( X , θ , P )  be an experiment and  π ( · )  be the prior probability density function for  θ = ( θ 1 , θ 2 ) . Suppose  θ 1  is independent of  θ 2  ( θ 1 θ 2 ) a priori. Then, for each  x X ,
\theta_1 \perp \theta_2 \mid X = x \iff \exists\, L^{i}_{x} : \Theta_i \to \mathbb{R}_{+},\ i = 1, 2, \ \text{such that}\ L_x(\theta_1, \theta_2) = L^{1}_{x}(\theta_1)\, L^{2}_{x}(\theta_2).
On the other hand, the condition (1) seems to also be a fair representation of non-informativeness of one parameter on another under a non-Bayesian perspective. In fact, such a factored form of the likelihood function arises, for instance, when the sample X is conditioned on particular types of statistics that are simple to interpret under non-Bayesian paradigms. Note that for any statistic T, one can write
L_x(\theta_1, \theta_2) = P(X = x \mid \theta_1, \theta_2) = P(X = x \mid T(X) = T(x), \theta_1, \theta_2)\, P(T(X) = T(x) \mid \theta_1, \theta_2).
If, in addition, T is a statistic such that its distribution given  θ  depends only on  θ 1  and the conditional distribution of X given  T ( X ) = T ( x )  and  θ  depends only on  θ 2 , the factored form in (1) is easily obtained (such a statistic was named p-sufficient for  θ 1  by Basu [17]). In this situation, all the relevant information on  θ 1  is summarized in T, and one can fully make inferences on  θ 1  taking into account only the conditional distribution of T given  θ , which does not involve  θ 2 . Similarly, if T is a statistic such that its distribution given  θ  depends only on  θ 2  and the conditional distribution of X given  T ( X ) = T ( x )  and  θ  depends only on  θ 1 , the factored form in (1) holds. Such a statistic was named s-ancillary for  θ 1  by Basu ([17]), and it is somewhat evident that in this case, conclusions on  θ 1  should be drawn exclusively from the distribution of X given  T ( X )  and  θ , which does not depend on  θ 2 . Such a conditional approach to the problem of elimination of nuisance parameters had already been proposed by Basu ([17]) and, in a sense, is closely related to the NNPP of Berger and Wolpert. The next theorem formally presents these results.
Theorem 2.
Let  E = ( X , θ , P )  be an experiment in which  θ = ( θ 1 , θ 2 )  and Θ is variation independent. If there exists a statistic  T : X → T  that is either p-sufficient or s-ancillary for  θ 1 , then for each  x ∈ X , the likelihood function generated by x,  L x ( · ) , can be factored as in (1).
In summary, it seems reasonable that inferences about  θ 1  and  θ 2  can be performed independently under condition (1). Thus, if only  θ 1  is of interest, then it seems sensible under (1) that we reach the same conclusions on  θ 1  when x is observed either by using the whole likelihood function  L x  or only the factor  L x 1 . That is, it makes sense to disregard the information contained in  L x 2  and focus on  L x 1 . As mentioned by [19], examples of likelihood functions with separable parameters like (1) are rare, but if (1) holds, it would be a useful property for Bayesian and non-Bayesian statisticians to analyze statistical data, especially in the presence of nuisance parameters. This fact will be illustrated in Section 3 and Section 4.
We end this section by formally adapting the general NNPP to the special problem of hypothesis testing, in which inference about an unknown parameter consists of deciding whether a statement about the parameter (a statistical hypothesis) should be rejected or accepted by using the observable quantity X.
As before, let  E = ( X , θ , P )  be an experiment, with  Θ = Θ 1 × Θ 2 . Let  E ¯ = ( ( X , θ 2 ) , θ 1 , P ¯ )  be the “thought” experiment in which, in addition to X θ 2  is observed. Then, consider the following definition.
Definition 1.
Non-Informative Nuisance Parameter (NNP): Let  B ⊆ Θ 1  and let  φ ¯ : X × Θ 2 → { 0 , 1 }  be a test for the hypotheses
\bar{H}: \theta_1 \in B \quad \text{versus} \quad \bar{A}: \theta_1 \notin B.
Then, we say that  θ 2  is a Non-Informative Nuisance Parameter (NNP) for testing  H ¯  versus  A ¯  by using  φ ¯  if, for every  x X  such that (1) holds,  φ ¯ x , θ 2  does not depend on  θ 2 ; that is, it depends only on x.
In a nutshell, Definition 1 tells us something that appears intuitive: if the decision between  H ¯  and  A ¯  does not depend on  θ 2 , then  θ 2  does not provide any information about  θ 1 . In the following example, we illustrate this idea.
Example 1.
Consider that  Θ 1 = Θ 2 = R  and the experiment  E ¯ = ( ( X , θ 2 ) , θ 1 , P ) . Let  B ⊆ R  and  φ ¯ : X × Θ 2 → { 0 , 1 }  be the test for the hypotheses
\bar{H}: \theta_1 \in B \quad \text{versus} \quad \bar{A}: \theta_1 \notin B,
such that the null hypothesis is rejected when the conditional probability of B given x and  θ 2  is small; that is,
\bar{\varphi}(x, \theta_2) = 1 \iff P(\theta_1 \in B \mid x, \theta_2) < \delta,
where  δ ( 0 , 1 ) . Suppose, in addition, that  θ 1  and  θ 2  are independent a priori. Let us verify that  θ 2  is an NNP for testing these hypotheses by means of  φ ¯ . Let  x 0 X  be such that  L x 0 ( θ 1 , θ 2 ) = L x 0 1 ( θ 1 ) L x 0 2 ( θ 2 )  for specific functions  L x 0 1  and  L x 0 2 . Then,
\begin{aligned}
P(\theta_1 \in B \mid x_0, \theta_2) &= \int_B \pi(\theta_1 \mid x_0, \theta_2)\, d\theta_1
= \int_B \frac{P(X = x_0 \mid \theta_1, \theta_2)\, \pi(\theta_2 \mid \theta_1)\, \pi_1(\theta_1)}{\int_{\Theta_1} P(X = x_0 \mid \theta_1, \theta_2)\, \pi(\theta_2 \mid \theta_1)\, \pi_1(\theta_1)\, d\theta_1}\, d\theta_1 \\
&= \frac{\int_B L_{x_0}(\theta_1, \theta_2)\, \pi_2(\theta_2)\, \pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_1} L_{x_0}(\theta_1, \theta_2)\, \pi_2(\theta_2)\, \pi_1(\theta_1)\, d\theta_1}
= \frac{\int_B L^{1}_{x_0}(\theta_1)\, L^{2}_{x_0}(\theta_2)\, \pi_2(\theta_2)\, \pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_1} L^{1}_{x_0}(\theta_1)\, L^{2}_{x_0}(\theta_2)\, \pi_2(\theta_2)\, \pi_1(\theta_1)\, d\theta_1} \\
&= \frac{\int_B L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_1} L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1},
\end{aligned}
where  π i  is the prior of  θ i i = 1 , 2 . Thus, we have that
\bar{\varphi}(x_0, \theta_2) = 1 \iff \frac{\int_B L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_1} L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1} < \delta.
Note from Equation (7) that  φ ¯ ( x 0 , θ 2 )  does not depend on  θ 2 . Thus,  θ 2  is an NNP for testing  H ¯  versus  A ¯  by using  φ ¯ .
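To make Example 1 concrete, the following is a minimal numerical sketch (our illustration; the model is hypothetical and not one used in the paper), assuming X = (X1, X2) with independent Binomial(n1, θ1) and Binomial(n2, θ2) components and independent Beta(1, 1) priors. The likelihood then factors as in (1) for every sample point, and the posterior probability of B reduces to a Beta probability that is free of θ2:

```python
from scipy.stats import beta

# Hypothetical model (not from the paper): X1 ~ Bin(n1, theta1), X2 ~ Bin(n2, theta2),
# independent Beta(1, 1) priors on theta1 and theta2, and B = (0, 0.5) as the set of interest.
n1, n2 = 20, 30
x1, x2 = 6, 25          # observed counts; the likelihood factors as in (1) for any (x1, x2)
delta = 0.10            # rejection threshold for the posterior probability of B

# The posterior of theta1 given (x, theta2) is Beta(1 + x1, 1 + n1 - x1); it does not involve
# theta2, so P(theta1 in B | x, theta2) is the same for every value of the nuisance parameter.
post_prob_B = beta.cdf(0.5, 1 + x1, 1 + n1 - x1)

# \bar{phi}(x, theta2) = 1  <=>  P(theta1 in B | x, theta2) < delta, exactly as in Example 1.
reject_H_bar = post_prob_B < delta
print(post_prob_B, reject_H_bar)
```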
After defining an NNP, we formally state the Non-Informative Nuisance Parameter Principle (NNPP) for hypothesis testing.
Definition 2.
Non-Informative Nuisance Parameter Principle (NNPP): Let the parameter space be variation independent; that is,  Θ = Θ 1 × Θ 2 . Consider the experiments  E = ( X , θ , P )  and  E ¯ = ( ( X , θ 2 ) , θ 1 , P ¯ ) . Let  B ⊆ Θ 1  be the subset of  Θ 1  of interest. In addition, let  φ : X → { 0 , 1 }  and  φ ¯ : X × Θ 2 → { 0 , 1 }  be tests for the hypotheses
H: \theta \in B \times \Theta_2 \ \ \text{versus}\ \ A: \theta \notin B \times \Theta_2, \quad \text{and} \quad \bar{H}: \theta_1 \in B \ \ \text{versus}\ \ \bar{A}: \theta_1 \notin B,
respectively.
If  θ 2  is an NNP for testing  H ¯  versus  A ¯  by using  φ ¯ , and  x 0 ∈ X  is such that condition (1) holds, then
\varphi(x_0) = 1 \iff \bar{\varphi}(x_0, \theta_2) = 1.
The NNPP for statistical hypothesis testing says that if one intends to test a hypothesis regarding only the parameter  θ 1 , it is irrelevant whether  θ 2  is known or unknown if it is non-informative for such a decision-making problem. More formally, if one wants to test a hypothesis concerning only  θ 1  and he observes a sample point  x 0 X  that separates information on  θ 1  from information on  θ 2 —that is, (1) holds—then the performances of the tests  φ  under the original experiment  E  and  φ ¯  under the “thought” experiment  E ¯  should yield the same decision on the hypothesis  θ 1 B  if  θ 2  is non-informative for that purpose.
We should mention that the NNPP can be adapted to any other inferential procedure. However, in this work, we focus on the principle for the problem of hypothesis testing. We conclude this section by proving that tests based on the posterior probabilities of the hypotheses satisfy the NNPP under prior independence.
Example 2
(continuation of Example 1). Consider the conditions of Example 1. Consider  E = ( X , θ , P )  and let  φ : X { 0 , 1 }  be the test for the hypotheses
H: \theta \in B \times \Theta_2 \quad \text{versus} \quad A: \theta \notin B \times \Theta_2
that rejects the null hypothesis H if its posterior probability is small; that is,
\varphi(x) = 1 \iff P(\theta \in B \times \Theta_2 \mid x) < \delta.
Let  x 0 X  be such that  L x 0 ( θ 1 , θ 2 ) = L x 0 1 ( θ 1 ) L x 0 2 ( θ 2 ) . We can write the posterior probability on the right-hand side of (11) as
\begin{aligned}
P(\theta \in B \times \Theta_2 \mid x_0) &= \int_{B \times \Theta_2} \pi(\theta \mid x_0)\, d\theta
= \frac{\int_{B \times \Theta_2} L_{x_0}(\theta)\, \pi(\theta)\, d\theta}{\int_{\Theta} L_{x_0}(\theta)\, \pi(\theta)\, d\theta} \\
&= \frac{\int_{B \times \Theta_2} L^{1}_{x_0}(\theta_1)\, L^{2}_{x_0}(\theta_2)\, \pi_1(\theta_1)\, \pi_2(\theta_2)\, d\theta}{\int_{\Theta_1 \times \Theta_2} L^{1}_{x_0}(\theta_1)\, L^{2}_{x_0}(\theta_2)\, \pi_1(\theta_1)\, \pi_2(\theta_2)\, d\theta}
= \frac{\int_B L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_1} L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1},
\end{aligned}
where the last equality follows from Fubini’s Theorem. Hence,
\varphi(x_0) = 1 \iff \frac{\int_B L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1}{\int_{\Theta_1} L^{1}_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1} < \delta.
From Equations (7) and (13), we have that  φ ( x 0 ) = 1 ⟺ φ ¯ ( x 0 , θ 2 ) = 1 . Thus, the NNPP is met by tests based on posterior probabilities, as in Example 1. This result also holds when  Θ i ⊆ R n i ,  n i ∈ N * ,  i = 1 , 2 .
In the next section, we examine a second test procedure that is in line with the NNPP. We review the mixed test introduced by Pericchi and Pereira ([8]) and prove that such a test meets the NNPP for simple hypotheses concerning the parameter of interest. We also show how the adherence of the mixed test to the NNPP can then simplify its use.

3. The Mixed Test Procedure

The mixed test formally introduced in ([8]) is a test procedure that combines elements from both Bayesian and frequentist views. On the one hand, it considers an (intrinsically Bayesian) prior distribution for the parameter from which predictive distributions for the data under the competing hypotheses and Bayes factors are derived. On the other hand, the performance of the test depends on ordering the sample space by the Bayes factor and on the integration of these predictive distributions over specific subsets of the sample space in a frequentist-like manner. The mixed test is an optimal procedure in the sense that it minimizes linear combinations of averaged (weighted) probabilities of errors of decision. It also meets a few logical requirements for multiple-hypothesis testing and obeys the Likelihood Principle for discrete sample spaces despite the integration over the sample space it involves. In addition, the test overcomes several of the drawbacks fixed-level tests have. However, a difficulty with the mixed test procedure is the need to evaluate the Bayes factor for every sample point to order the sample space, which may involve intensive calculations. Properties of the mixed test and examples of application are examined in detail in [8,20,21,22,23,24,25].
Next, we review the general procedure for the performance of the mixed test and then show the test satisfies the NNPP when the hypothesis regarding the parameter of interest is a singleton.
First, we determine the predictive distributions for X under the competing hypotheses H and A,  f H  and  f A , respectively. For the null hypothesis  H : θ ∈ Θ 0 ,  Θ 0 ⊆ Θ ,  f H  is determined as follows: for each  x ∈ X ,
f_H(x) = \int_{\Theta_0} L_x(\theta)\, dP_H(\theta),
where  P H  denotes the conditional distribution of  θ  given  θ Θ 0 . That is, for each  x X f H ( x )  is the expected value of the likelihood function generated by x against  P H . Similarly, for the alternative hypothesis  A : θ Θ 0 c  we define
f_A(x) = \int_{\Theta_0^c} L_x(\theta)\, dP_A(\theta),
where  P A  denotes the conditional distribution of  θ  given  θ Θ 0 c . From (14) and (15), we obtain the Bayes factor of  x X  for the hypothesis H over A as
BF(x) = \frac{f_H(x)}{f_A(x)}.
Finally, the mixed test  φ * : X { 0 , 1 }  for the hypotheses H versus A consists in rejecting H when  x X  is observed if and only if the Bayes factor  B F ( x )  is small. That is, for each  x X ,
\varphi^{*}(x) = 1 \iff BF(x) \le b/a,
where the positive constants a and b reflect the decision maker’s evaluation of the impact of the errors of the two types or, equivalently, his prior preferences for the competing hypotheses. A detailed discussion on the specification of such constants is found in [8,20,21,22,23,24,25].
The mixed test can also be defined as a function of a new significance index. That is, (17) can be rewritten as a comparison between such a significance index and a specific cut-off value. These quantities are defined below.
For the mixed test defined in (17), the p-value of the observation  x 0 X  is the significance index given by
\text{p-value}(x_0) = \sum_{x \in D(x_0)} f_H(x),
where  D ( x 0 ) = { x ∈ X : B F ( x ) ≤ B F ( x 0 ) } . Also, we define the adaptive type I error probability of  φ *  as
\alpha^{*} = \sum_{x \in D} f_H(x),
where  D = { x ∈ X : B F ( x ) ≤ b / a } . Alternatively,  α *  is also known as the adaptive significance level of  φ * .
Pereira et al. [21] proved that the mixed test  φ *  for the hypotheses
H: \theta \in \Theta_0 \quad \text{versus} \quad A: \theta \in \Theta_0^c
can be written as
\varphi^{*}(x) = 1 \iff BF(x) \le b/a \iff \text{p-value}(x) \le \alpha^{*}.
Note that  φ *  consists of comparing the p- v a l u e  with the cut-off  α * , which depends on the specific statistical model under consideration and on the sample size, as opposed to a standard test with a fixed significance level that does not depend on the sample size.
The former does not have some of the disadvantages of the latter, such as inconsistency ([8,26]), lack of correspondence between practical significance and statistical significance ([8,27]), and absence of logical coherence under multiple-hypothesis testing. We continue with the main results of the manuscript.
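To fix ideas, the following sketch (ours, not code from the paper) implements the mixed test for a generic finite sample space, with the predictive functions f_H and f_A supplied as Python dictionaries indexed by sample points; the p-value in (18) and the adaptive significance level in (19) are obtained by summing f_H over the appropriate Bayes-factor-ordered subsets:

```python
def bayes_factor(x, f_H, f_A):
    # BF(x) = f_H(x) / f_A(x); f_H and f_A map each sample point to its predictive probability
    return f_H[x] / f_A[x]

def p_value(x0, f_H, f_A):
    # Equation (18): sum of f_H over D(x0) = {x : BF(x) <= BF(x0)}
    bf0 = bayes_factor(x0, f_H, f_A)
    return sum(f_H[x] for x in f_H if bayes_factor(x, f_H, f_A) <= bf0)

def adaptive_level(f_H, f_A, b_over_a=1.0):
    # Equation (19): sum of f_H over D = {x : BF(x) <= b/a}
    return sum(f_H[x] for x in f_H if bayes_factor(x, f_H, f_A) <= b_over_a)

def mixed_test(x0, f_H, f_A, b_over_a=1.0):
    # Rejects H iff BF(x0) <= b/a; equivalently, iff p_value(x0) <= adaptive_level(...)
    return bayes_factor(x0, f_H, f_A) <= b_over_a
```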

The Mixed Test Obeys the NNPP

In this subsection, we prove that the mixed test meets the NNPP when the hypothesis about the parameter of interest is simple. Next, we examine further the case in which there is a statistic s-ancillary for the parameter of interest and show how the introduction of the concepts of a conditional p- v a l u e  and a conditional adaptive significance level can make performance of the mixed test much easier.
Theorem 3.
Let  θ = ( θ 1 , θ 2 )  and  Θ = Θ 1 × Θ 2  (that is, Θ is variation independent). Let  E = ( X , θ , P )  and  E ¯ = ( ( X , θ 2 ) , θ 1 , P ¯ )  be two experiments as defined in Section 2. Let  θ 0 Θ 1 . In addition, let  φ * : X { 0 , 1 }  and  φ ¯ * : X × Θ 2 { 0 , 1 }  be the mixed tests for the hypotheses
H: \theta \in \{\theta_0\} \times \Theta_2 \ \ \text{versus}\ \ A: \theta \notin \{\theta_0\} \times \Theta_2, \quad \text{and} \quad \bar{H}: \theta_1 = \theta_0 \ \ \text{versus}\ \ \bar{A}: \theta_1 \neq \theta_0,
respectively. Assume  θ = θ 1 , θ 2  is absolutely continuous with prior density function π, with  θ 1 θ 2 . Then,  θ 2  is a Non-Informative Nuisance Parameter for testing  H ¯  versus  A ¯  by using  φ ¯ * , and for every  x X  such that (1) holds,
\varphi^{*}(x) = 1 \iff \bar{\varphi}^{*}(x, \theta_2) = 1.
Theorem 3 tells us that when the likelihood function can be factored as in (1), the mixed test obeys the NNPP. That is to say, if one aims to test a simple hypothesis about the parameter of interest  θ 1  in the presence of a non-informative nuisance parameter  θ 2  by means of the mixed test, then  θ 2  can be disregarded in the analysis. From a purely mathematical viewpoint, when  x ∈ X  satisfying (1) is observed, the decision between rejecting and accepting the null hypothesis regarding  θ 1  depends on  L x  only through the factor  L x 1 , which is not a function of  θ 2 , as we can see from Equation (A16) in Appendix A. It should be emphasized that Theorem 3 holds for null hypotheses more general than simple ones. For instance, the theorem is still valid when the null hypothesis H is of the form  H : θ ∈ Θ 0 × Θ 2 , where  Θ 0 ⊆ Θ 1  is a hyperplane of  Θ 1 . The proof of this result is quite similar to the proof of Theorem 3 in Appendix A and for this reason is omitted.
The adherence to the NNPP is indeed an advantage of the mixed test. It may bring a considerable reduction in the calculations involved along the procedure of the mixed test, especially under statistical models for which a statistic s-ancillary for the parameter of interest can be found. Such cases are examined after Corollary 1, which follows straightforwardly from Theorems 2 and 3.
Corollary 1.
Assume the same conditions of Theorem 3 and suppose there exists a statistic  T : X → T  that is p-sufficient for  θ 2  and s-ancillary for  θ 1 . Then, for all  x ∈ X ,  φ * ( x ) = 1 ⟺ φ ¯ * ( x , θ 2 ) = 1 .
Now, let us suppose that under experiment  E = ( X , θ , P ) , there is a statistic  T : X T  such that T is s-ancillary for  θ 1 . Let  H : θ { θ 0 } × Θ 2  be the hypothesis of interest. From the predictive distribution  f H  for X, we can define for each value  t T ( X ) = { T ( x ) : x X }  the conditional probability function for X given  T ( X ) = t f H , t  by
f_{H,t}(x) = \frac{f_H(x)}{\sum_{y \in \mathcal{X} : T(y) = t} f_H(y)}
if  T ( x ) = t , and  f H , t ( x ) = 0 , otherwise.
Finally, from the conditional distribution in (24), we define two conditional statistics: the conditional p- v a l u e  and the conditional adaptive significance level. Such quantities will be of great importance for the performance of the mixed test, as we will see in the next section.
Definition 3.
Conditional p-value: Let  E = ( X , θ , P )  be an experiment for which the statistic  T : X T  is s-ancillary for  θ 1 . Let  H : θ { θ 0 } × Θ 2  be the hypothesis of interest, and  f H , t t T ( X ) , as in (24). We define the p-value conditional on T,  p T v a l u e : X [ 0 , 1 ]  for each  x 0 X  by
p_T\text{-value}(x_0) = \sum_{x \in D(x_0)} f_{H, T(x_0)}(x) = \frac{\sum_{x \in D_T^{*}(x_0)} f_H(x)}{\sum_{x \in D_T(x_0)} f_H(x)},
where  D T * ( x 0 ) = { x ∈ X : B F ( x ) ≤ B F ( x 0 ) , T ( x ) = T ( x 0 ) }  and  D T ( x 0 ) = { x ∈ X : T ( x ) = T ( x 0 ) } . From Equation (A14), the  p T -value may be rewritten as
p_T\text{-value}(x_0) = \frac{\sum_{x \in D_T^{*}(x_0)} L^{1}_{x}(\theta_0) \int_{\Theta_2} L^{2}_{x}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2}{\sum_{x \in D_T(x_0)} L^{1}_{x}(\theta_0) \int_{\Theta_2} L^{2}_{x}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2},
where  L x 1 ( θ 0 ) = P ( X = x | T ( X ) = T ( x ) , θ 0 )  and  L x 2 ( θ 2 ) = P ( T ( X ) = T ( x ) | θ 2 ) , since T is s-ancillary for  θ 1 . It follows that
\begin{aligned}
p_T\text{-value}(x_0) &= \frac{\sum_{x \in D_T^{*}(x_0)} P(X = x \mid T(X) = T(x), \theta_0) \int_{\Theta_2} P(T(X) = T(x) \mid \theta_2)\, \pi_2(\theta_2)\, d\theta_2}{\sum_{x \in D_T(x_0)} P(X = x \mid T(X) = T(x), \theta_0) \int_{\Theta_2} P(T(X) = T(x) \mid \theta_2)\, \pi_2(\theta_2)\, d\theta_2} \\
&= \frac{\sum_{x \in D_T^{*}(x_0)} P(X = x \mid T(X) = T(x_0), \theta_0) \int_{\Theta_2} P(T(X) = T(x_0) \mid \theta_2)\, \pi_2(\theta_2)\, d\theta_2}{\sum_{x \in D_T(x_0)} P(X = x \mid T(X) = T(x_0), \theta_0) \int_{\Theta_2} P(T(X) = T(x_0) \mid \theta_2)\, \pi_2(\theta_2)\, d\theta_2} \\
&= \frac{\sum_{x \in D_T^{*}(x_0)} P(X = x \mid T(X) = T(x_0), \theta_0)}{\sum_{x \in D_T(x_0)} P(X = x \mid T(X) = T(x_0), \theta_0)};
\end{aligned}
that is,
p_T\text{-value}(x_0) = \sum_{x \in D_T^{*}(x_0)} P(X = x \mid T(X) = T(x_0), \theta_0).
Definition 4.
Conditional adaptive significance level: Let  E = ( X , θ , P )  be an experiment for which the statistic  T : X T  is s-ancillary for  θ 1 . Let  H : θ { θ 0 } × Θ 2  be the hypothesis of interest and  f H , t t T ( X )  be as in (24). We define the conditional adaptive significance level given T,  α T * : X [ 0 , 1 ] , for each  x 0 X  by
\alpha_T^{*}(x_0) = \sum_{x \in D} f_{H, T(x_0)}(x) = \frac{\sum_{x \in D \cap D_T(x_0)} f_H(x)}{\sum_{x \in D_T(x_0)} f_H(x)}.
The conditional adaptive significance level  α T *  may be rewritten as
\alpha_T^{*}(x_0) = \sum_{x \in D \cap D_T(x_0)} P(X = x \mid T(X) = T(x_0), \theta_0).
Definitions 3 and 4 are conditional versions of Definitions in (18) and (19), respectively. While calculation of the unconditional quantities involves the evaluation of the Bayes factor for every  x X , the determination of the conditional statistics at a specific sample point  x 0 X  depends only on the values of the Bayes factor for the sample points x such that  T ( x ) = T ( x 0 ) , which may be much easier to accomplish. Note also that the  p T -value and  α T *  can be seen, respectively, as an alternative (conditional) measure of evidence in favor of the null hypothesis H and an alternative threshold value for testing the competing hypotheses. As a matter of fact, one can substitute the p- v a l u e  and the adaptive significance level with their conditional versions in order to perform the mixed test. This is exactly what the next theorem states.
Theorem 4.
Assume the same conditions as in Corollary 1 and Theorem 3. Then, for all  x 0 X ,
\varphi^{*}(x_0) = 1 \iff p_T\text{-value}(x_0) \le \alpha_T^{*}(x_0).
The results of Theorems 3 and 4 and Corollary 1 suggest a way the mixed test may be applied with far fewer calculations: when a statistic T that is s-ancillary for the parameter of interest is available, one can perform the test by comparing the conditional statistics  p T -value and  α T *  instead of comparing the unconditional ones in Definitions (18) and (19). This possibility is illustrated in the next section.
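A corresponding sketch of Definitions 3 and 4 (again ours, assuming a finite sample space, the predictives as dictionaries, and the statistic T given as a Python function) shows that only the slice {x : T(x) = T(x0)} has to be ordered by the Bayes factor:

```python
def p_T_value(x0, f_H, f_A, T):
    # Definition 3: conditional p-value, computed only on the slice D_T(x0) = {x : T(x) = T(x0)}
    slice_pts = [x for x in f_H if T(x) == T(x0)]
    bf = {x: f_H[x] / f_A[x] for x in slice_pts}
    num = sum(f_H[x] for x in slice_pts if bf[x] <= bf[x0])
    den = sum(f_H[x] for x in slice_pts)
    return num / den

def alpha_T_star(x0, f_H, f_A, T, b_over_a=1.0):
    # Definition 4: conditional adaptive significance level on the same slice
    slice_pts = [x for x in f_H if T(x) == T(x0)]
    bf = {x: f_H[x] / f_A[x] for x in slice_pts}
    num = sum(f_H[x] for x in slice_pts if bf[x] <= b_over_a)
    den = sum(f_H[x] for x in slice_pts)
    return num / den
```

By Theorem 4, rejecting H when p_T_value(x0, ...) <= alpha_T_star(x0, ...) yields the same decision as the unconditional mixed test whenever T is s-ancillary for the parameter of interest.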

4. Examples

We now revisit three well-known problems of hypothesis testing for count data and present new solutions to them by means of the mixed test. In each problem, we consider a suitable reparametrization of the standard model in order to ensure that
  • There exists a statistic T that is ancillary to the new parameter of interest;
  • The hypothesis about the new parameter of interest under the reparametrization is a singleton (or a hyperplane);
  • The new parameter of interest is independent of the new nuisance parameter a priori;
  • The distribution of the data X given any value of the statistic T is simple enough to render the calculations of the conditional p- v a l u e  and the conditional adaptive significance level easy.

4.1. Comparison of Poisson Means

Suppose we are interested in testing the equality between two Poisson means: say  θ 1  and  θ 2 . Let  θ = ( θ 1 , θ 2 ) . For this purpose, let  X = ( X 1 , X 2 )  be a random vector to be observed such that given  θ X 1  and  X 2  are independent Poisson random variables with parameters  n θ 1  and  n θ 2 , respectively, where  n N * = { 1 , 2 , 3 , }  is a known integer. The hypotheses to be tested are
H: \theta \in \Theta_0 \quad \text{versus} \quad A: \theta \in \Theta_0^c,
where  Θ 0 = { ( θ 1 , θ 2 ) R + 2 : θ 1 = θ 2 } . The likelihood function for  θ R + 2  generated by  x = ( x 1 , x 2 ) N 2  is
L_x(\theta) = \frac{(n\theta_1)^{x_1}}{x_1!}\, \frac{(n\theta_2)^{x_2}}{x_2!}\, e^{-n\theta_1}\, e^{-n\theta_2}.
Suppose also that  θ 1  and  θ 2  are independent a priori and that  θ i  is distributed as a Gamma random variable with parameters  a i > 0  and  c > 0 i = 1 , 2 . That is, the prior density function of  θ  is
\pi(\theta) = \frac{c^{a_1}}{\Gamma(a_1)}\, \theta_1^{a_1 - 1} e^{-c\theta_1} I_{(0,\infty)}(\theta_1)\; \frac{c^{a_2}}{\Gamma(a_2)}\, \theta_2^{a_2 - 1} e^{-c\theta_2} I_{(0,\infty)}(\theta_2).
Although one can determine an exact expression for the Bayes factor in this case (as a matter of fact, in [25], the authors first presented a solution to the problem of testing the equality of Poisson means by using weighted likelihoods in the context of a production process monitoring procedure), the use of the mixed test under the above parametrization may be computationally disadvantageous, as the sample space is  N 2  and one should determine infinitely many Bayes factors to perform the test. To overcome this difficulty, we next consider the following reparametrization of the model: let  λ = ( λ 1 , λ 2 )  be the new parameter, where
\lambda_1 = \frac{\theta_1}{\theta_1 + \theta_2} \quad \text{and} \quad \lambda_2 = \theta_1 + \theta_2.
The new parameter space is then  Λ = ( 0 , 1 ) × R + . Now, the hypotheses (30) can be rewritten as
\tilde{H}: \lambda \in \Lambda_0 \quad \text{versus} \quad \tilde{A}: \lambda \in \Lambda_0^c,
with  Λ 0 = { 1 / 2 } × R + . Note that the likelihood function (31) can be rewritten by conditioning on the statistic  T ( X ) = X 1 + X 2  as follows:
\begin{aligned}
L_x(\theta) &= P(X_1 = x_1, X_2 = x_2 \mid \theta)
= P(X_1 = x_1, X_2 = x_2 \mid X_1 + X_2 = x_1 + x_2, \theta)\, P(X_1 + X_2 = x_1 + x_2 \mid \theta) \\
&= \binom{x_1 + x_2}{x_1} \left(\frac{\theta_1}{\theta_1 + \theta_2}\right)^{x_1} \left(\frac{\theta_2}{\theta_1 + \theta_2}\right)^{x_2}\, e^{-n(\theta_1 + \theta_2)}\, \frac{[n(\theta_1 + \theta_2)]^{x_1 + x_2}}{(x_1 + x_2)!}.
\end{aligned}
Hence, the induced likelihood function for  λ  generated by  ( x 1 , x 2 )  may be factored as
\tilde{L}_x(\lambda) = \binom{x_1 + x_2}{x_1} \lambda_1^{x_1} (1 - \lambda_1)^{x_2}\; \frac{(n\lambda_2)^{x_1 + x_2}}{(x_1 + x_2)!}\, e^{-n\lambda_2}.
Note that T is an ancillary statistic for  λ 1 , as it is distributed as a Poisson random variable with mean  n λ 2 , and the conditional distribution of X, given  T ( X ) = t t N , depends on  λ  only through  λ 1 . The prior distribution for  λ  is given by
\tilde{\pi}(\lambda) = \frac{\Gamma(a_1 + a_2)}{\Gamma(a_1)\Gamma(a_2)}\, \lambda_1^{a_1 - 1} (1 - \lambda_1)^{a_2 - 1} I_{(0,1)}(\lambda_1)\; \frac{c^{a_1 + a_2}}{\Gamma(a_1 + a_2)}\, \lambda_2^{a_1 + a_2 - 1} e^{-c\lambda_2} I_{(0,\infty)}(\lambda_2).
Now, as (34), (36) and (37) hold, it follows from Theorem 3 that  λ 2  is an NNP and that the performance of the mixed test for the hypothesis  H ˜ : λ ∈ { 1 / 2 } × R +  against  A ˜  based on  L ˜ x  and the prior  π ˜  is equivalent to the performance of the mixed test for the simple hypothesis  λ 1 = 1 / 2  against  λ 1 ≠ 1 / 2  based on the binomial-like factor of  L ˜ x  that depends only on  λ 1  and the marginal Beta prior density for  λ 1 , ignoring the NNP  λ 2 . In addition, Theorem 4 implies that the test for  H ˜  versus  A ˜  reduces to the comparison of the statistics  p T -value and  α T *  at the observed sample point, say  x 0 = ( x 01 , x 02 ) . Note that in this case, one does not need to evaluate the Bayes factor for every point of  N 2  but only for the  x 01 + x 02 + 1  points for which the sum of the components equals  x 01 + x 02 . That is, one needs to evaluate the Bayes factor only for the elements of  { ( u , v ) ∈ N 2 : T ( u , v ) = T ( x 0 ) = x 01 + x 02 }  when  x 0 = ( x 01 , x 02 )  is observed.
From Equations (A14) and (A15), one gets the following predictive functions under  H ˜  and under  A ˜  for X:
f_{\tilde{H}}(x) = (1/2)^{x_1 + x_2} \binom{x_1 + x_2}{x_1} \times K
and
f_{\tilde{A}}(x) = \binom{x_1 + x_2}{x_1}\, \frac{\Gamma(x_1 + a_1)\Gamma(x_2 + a_2)}{\Gamma(x_1 + x_2 + a_1 + a_2)}\, \frac{\Gamma(a_1 + a_2)}{\Gamma(a_1)\Gamma(a_2)} \times K,
where  K = \int_0^{\infty} \frac{(n\lambda_2)^{x_1 + x_2}}{(x_1 + x_2)!}\, e^{-n\lambda_2}\, \frac{c^{a_1 + a_2}}{\Gamma(a_1 + a_2)}\, \lambda_2^{a_1 + a_2 - 1} e^{-c\lambda_2}\, d\lambda_2 . Consequently, the Bayes factor  B F ( x )  is
BF(x) = (1/2)^{x_1 + x_2}\, \frac{\Gamma(a_1)\Gamma(a_2)}{\Gamma(a_1 + a_2)}\, \frac{\Gamma(x_1 + x_2 + a_1 + a_2)}{\Gamma(x_1 + a_1)\Gamma(x_2 + a_2)}.
Finally, it follows from (28) and (30) that for  x 0 = ( x 01 , x 02 ) N 2 ,
p_T\text{-value}(x_0) = \sum_{(x_1, x_2) \in D_T^{*}(x_0)} \binom{x_1 + x_2}{x_1} (1/2)^{x_1 + x_2} = \sum_{x_1 = 0}^{T(x_0)} \binom{T(x_0)}{x_1} (1/2)^{T(x_0)}\, I_{D(x_0)}\big((x_1, T(x_0) - x_1)\big)
and
\alpha_T^{*}(x_0) = \sum_{(x_1, x_2) \in D \cap D_T(x_0)} \binom{x_1 + x_2}{x_1} (1/2)^{x_1 + x_2} = \sum_{x_1 = 0}^{T(x_0)} \binom{T(x_0)}{x_1} (1/2)^{T(x_0)}\, I_{D}\big((x_1, T(x_0) - x_1)\big).
Note that in this case, the conditional  p T - v a l u e  resembles the frequentist p- v a l u e  for the simple hypothesis  θ = 1 2  under simple random sampling from the Bernoulli model with parameter  θ  (however, for the calculation of the  p T - v a l u e , the sample space is ordered by the Bayes factor instead of the likelihood ratio).
Example 3
(Comparison of Poisson means). In [25], the authors consider that a methodology to detect a shift in a production process is to compare the quality index of the current rating period P,  θ 2 , with the quality index of the previous rating period,  θ 1 . Suppose that we want to test if a process is under control; that is, if  θ 1 = θ 2 . For this purpose, two audit samples of size  n = 10  are collected at rating periods  P − 1  and P, respectively. Let  X 1  represent the number of defects found in the first sample and  X 2  represent the number of defects found in the second sample. Also suppose that  X 1  and  X 2  are Poisson random variables with parameters  n θ 1  and  n θ 2 , respectively. Let  X = ( X 1 , X 2 ) . For simplicity, we consider the hyperparameters in (32) as  a 1 = a 2 = c = 1 . Hence, the predictive functions under the competing hypotheses are given by:
f_H(x) = (1/2)^{x_1 + x_2} (x_1 + x_2 + 1) \binom{x_1 + x_2}{x_1} \left(\frac{n}{n+1}\right)^{x_1 + x_2} \left(\frac{1}{n+1}\right)^{2},
and
f_A(x) = \left(\frac{n}{n+1}\right)^{x_1 + x_2} \left(\frac{1}{n+1}\right)^{2}.
Consequently, the Bayes factor at  x = ( x 1 , x 2 ) N 2  can be expressed by
BF(x) = (1/2)^{x_1 + x_2} (x_1 + x_2 + 1) \binom{x_1 + x_2}{x_1}.
Now, suppose that two defects are found at rating period  P − 1  and nine defects are found at period P. That is, suppose that  X = ( 2 , 9 )  is observed. In this case, one gets  B F ( 2 , 9 ) = 0.322 . Considering  b / a = 1 , the conditional adaptive significance level and the conditional  p T -value at  ( 2 , 9 )  are, respectively,
\alpha_T^{*}((2, 9)) = \sum_{x_1 \in \{0, 1, 2, 3, 8, 9, 10, 11\}} \binom{2 + 9}{x_1} (1/2)^{2 + 9} = 0.227 \quad \text{and}
p_T\text{-value}((2, 9)) = \sum_{x_1 \in \{0, 1, 2, 9, 10, 11\}} \binom{11}{x_1} (1/2)^{11} = 0.065.
Since  p T -value ( ( 2 , 9 ) ) < α T * ( ( 2 , 9 ) ) , the decision is to reject the null hypothesis (34), where  λ = ( θ 1 / ( θ 1 + θ 2 ) , θ 1 + θ 2 ) .
Note that although the sample size is small, the null hypothesis can be rejected with a conditional  p T -value of 0.065. Such a value is not compared with standard (fixed) cut-off values such as 0.01 or 0.05 but rather with the conditional adaptive significance level of 0.227 for  X = ( 2 , 9 ) . Note also that performing the mixed test by means of the conditional statistics  p T -value and  α T *  when  X = ( 2 , 9 )  is observed requires the calculation of only finitely many Bayes factors (twelve, precisely), even though the sample space is infinite.
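The numbers above can be checked with a few lines of code (our sketch, using the Bayes factor of Example 3 with a_1 = a_2 = c = 1 and b/a = 1):

```python
from math import comb

x0 = (2, 9)
t = sum(x0)                                   # T(x0) = x1 + x2 = 11

def bf(x1, x2):
    # Bayes factor for the Poisson-means example with a1 = a2 = c = 1
    return (0.5 ** (x1 + x2)) * (x1 + x2 + 1) * comb(x1 + x2, x1)

print(round(bf(*x0), 3))                      # 0.322

# Conditional on T = 11, the null predictive for x1 is Binomial(11, 1/2).
cond_null = [comb(t, x1) * 0.5 ** t for x1 in range(t + 1)]

alpha_T = sum(p for x1, p in enumerate(cond_null) if bf(x1, t - x1) <= 1.0)        # b/a = 1
p_T     = sum(p for x1, p in enumerate(cond_null) if bf(x1, t - x1) <= bf(*x0))

print(round(alpha_T, 3), round(p_T, 3))       # 0.227 and 0.065: the null hypothesis is rejected
```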

4.2. Test of Symmetry

Suppose we want to test the hypothesis of symmetry in an  r × r  two-way contingency table. Several methods have been proposed for testing diagonal symmetry: see, for example, [28,29,30,31,32,33] and references therein. Here we propose a solution to this problem by using the mixed test and its properties. We present the simplest case  r = 2 . The reader will find the general case  r > 2  in Appendix A for the sake of readability.
Suppose each element (individual) of a sample of size n is classified into four mutually exclusive combinations of the two-valued variables  X 1  and  X 2 . Let Table 1 represent the observed frequencies of the cross-classifications, where  X i j  is the number of individuals classified into the  i -th category of  X 1  and the  j -th category of  X 2 ,  i , j = 1 , 2 . Let  θ = ( θ 11 , θ 12 , θ 21 ) , with  θ i j ≥ 0  and  θ 11 + θ 12 + θ 21 ≤ 1 , where  θ i j  denotes the probability of classification into the  i -th category of  X 1  and the  j -th category of  X 2 ,  i , j = 1 , 2 .
The hypotheses for testing diagonal symmetry are
H: \theta_{12} = \theta_{21} \quad \text{versus} \quad A: \theta_{12} \neq \theta_{21}.
We assume that the vector  X = ( X 11 , X 12 , X 21 )  is, given  θ , a multinomial random vector with parameters n and  θ . The likelihood function generated by  x = ( x 11 , x 12 , x 21 )  is then given by
L_x(\theta) = \frac{n!}{x_{11}!\, x_{12}!\, x_{21}!\, x_{22}!}\, \theta_{11}^{x_{11}} \theta_{12}^{x_{12}} \theta_{21}^{x_{21}} \theta_{22}^{x_{22}},
where  x 22 = n − x 11 − x 12 − x 21  and  θ 22 = 1 − θ 11 − θ 12 − θ 21 . Assume also a prior Dirichlet distribution with parameter vector  α = ( α 11 , α 12 , α 21 ; α 22 ) ,  α i j > 0 , for  θ . That is,
\pi(\theta) = \frac{\Gamma(\alpha_{11} + \alpha_{12} + \alpha_{21} + \alpha_{22})}{\Gamma(\alpha_{11})\Gamma(\alpha_{12})\Gamma(\alpha_{21})\Gamma(\alpha_{22})}\, \theta_{11}^{\alpha_{11} - 1} \theta_{12}^{\alpha_{12} - 1} \theta_{21}^{\alpha_{21} - 1} \theta_{22}^{\alpha_{22} - 1} I_{\Theta}(\theta),
where  Θ = { ( u 1 , u 2 , u 3 ) ∈ R + 3 : u 1 + u 2 + u 3 ≤ 1 } .
We should note that the determination of the predictive functions is much easier under the following reparametrization of the model: let us define
\lambda_{11} = \theta_{11}, \qquad \lambda_{21} = \theta_{12} + \theta_{21} \qquad \text{and} \qquad \lambda_{12} = \frac{\theta_{12}}{\theta_{12} + \theta_{21}}.
Let  λ = ( λ 12 , ( λ 11 , λ 21 ) ) . Thus, the new parameter space is  Λ = ( 0 , 1 ) × S 2 , where  S 2 = { ( u , v ) ∈ R + 2 : u + v ≤ 1 } .
Then, we can reformulate the hypotheses (46) as
\tilde{H}: \lambda \in \Lambda_0 \quad \text{versus} \quad \tilde{A}: \lambda \in \Lambda_0^c,
where  Λ 0 = { 1 / 2 } × S 2 . Note that the likelihood function for  θ  generated by  x = ( x 11 , x 12 , x 21 ) ∈ { ( a , b , c ) ∈ N 3 : a + b + c ≤ n }  can be rewritten by conditioning on the statistic  T ( X ) = ( X 11 , X 12 + X 21 )  as
\begin{aligned}
L_x(\theta) &= P(X = (x_{11}, x_{12}, x_{21}) \mid T(X) = (x_{11}, x_{12} + x_{21}), \theta)\, P(T(X) = (x_{11}, x_{12} + x_{21}) \mid \theta) \\
&= \binom{x_{12} + x_{21}}{x_{12}} \left(\frac{\theta_{12}}{\theta_{12} + \theta_{21}}\right)^{x_{12}} \left(\frac{\theta_{21}}{\theta_{12} + \theta_{21}}\right)^{x_{21}}\; \frac{n!}{x_{11}!\,(x_{12} + x_{21})!\,x_{22}!}\, \theta_{11}^{x_{11}} (\theta_{12} + \theta_{21})^{x_{12} + x_{21}} \theta_{22}^{x_{22}}.
\end{aligned}
Hence, the induced likelihood function for  λ  generated by  x = ( x 11 , x 12 , x 21 )  may be factored as
\tilde{L}_x(\lambda) = \binom{x_{12} + x_{21}}{x_{12}} \lambda_{12}^{x_{12}} (1 - \lambda_{12})^{x_{21}}\; \frac{n!}{x_{11}!\,(x_{12} + x_{21})!\,x_{22}!}\, \lambda_{11}^{x_{11}} \lambda_{21}^{x_{12} + x_{21}} (1 - \lambda_{11} - \lambda_{21})^{x_{22}}.
Note that T is an ancillary statistic for  λ 12 , as it is a multinomial random vector with parameters n and  ( λ 11 , λ 21 ) , and the conditional distribution of X given  T ( X ) = t ,  t ∈ { ( u , v ) ∈ N 2 : u + v ≤ n } , depends on  λ  only through  λ 12 . The prior distribution for  λ  is given by
\tilde{\pi}(\lambda) = \frac{\Gamma(\alpha_{12} + \alpha_{21})}{\Gamma(\alpha_{12})\Gamma(\alpha_{21})}\, \lambda_{12}^{\alpha_{12} - 1} (1 - \lambda_{12})^{\alpha_{21} - 1} I_{(0,1)}(\lambda_{12})\; \tilde{\pi}_{(\lambda_{11}, \lambda_{21})}(\lambda_{11}, \lambda_{21}),
where  π ˜ ( λ 11 , λ 21 )  is the prior Dirichlet distribution for  ( λ 11 , λ 21 )  with parameter vector  ( α 11 , α 12 + α 21 , α 22 ) .
As in the example of the previous subsection, the results from Section 3 imply that  ( λ 11 , λ 21 )  is an NNP for testing the hypotheses  H ˜  versus  A ˜  in (50) by using the mixed test. In addition, we only need to compare the conditional  p T - v a l u e  with the conditional adaptive significance level to decide between the hypotheses.
From Equations (A14) and (A15), one gets the following predictive functions under  H ˜  and under  A ˜  for X:
f_{\tilde{H}}(x_{11}, x_{12}, x_{21}) = (1/2)^{x_{12} + x_{21}} \binom{x_{12} + x_{21}}{x_{12}} \times K
and
f_{\tilde{A}}(x) = \binom{x_{12} + x_{21}}{x_{12}}\, \frac{\Gamma(x_{12} + \alpha_{12})\Gamma(x_{21} + \alpha_{21})}{\Gamma(x_{12} + x_{21} + \alpha_{12} + \alpha_{21})}\, \frac{\Gamma(\alpha_{12} + \alpha_{21})}{\Gamma(\alpha_{12})\Gamma(\alpha_{21})} \times K,
where
K = \int_{S_2} \frac{n!}{x_{11}!\,(x_{12} + x_{21})!\,x_{22}!}\, \lambda_{11}^{x_{11}} \lambda_{21}^{x_{12} + x_{21}} (1 - \lambda_{11} - \lambda_{21})^{x_{22}}\, \tilde{\pi}_{(\lambda_{11}, \lambda_{21})}(\lambda_{11}, \lambda_{21})\, d\lambda_{11}\, d\lambda_{21}.
Consequently, the Bayes factor  B F ( x )  is
BF(x) = (1/2)^{x_{12} + x_{21}}\, \frac{\Gamma(\alpha_{12})\Gamma(\alpha_{21})}{\Gamma(\alpha_{12} + \alpha_{21})}\, \frac{\Gamma(x_{12} + x_{21} + \alpha_{12} + \alpha_{21})}{\Gamma(x_{12} + \alpha_{12})\Gamma(x_{21} + \alpha_{21})}.
Finally, it follows from (28) and (30) that for  x = ( x 11 , x 12 , x 21 ) ,
p_T\text{-value}(x) = \sum_{(y_{11}, y_{12}, y_{21}) \in D_T^{*}(x)} \binom{y_{12} + y_{21}}{y_{12}} (1/2)^{y_{12} + y_{21}} = \sum_{y_{12} = 0}^{x_{12} + x_{21}} \binom{x_{12} + x_{21}}{y_{12}} (1/2)^{x_{12} + x_{21}}\, I_{D(x)}\big((x_{11}, y_{12}, x_{12} + x_{21} - y_{12})\big)
and
\alpha_T^{*}(x) = \sum_{(y_{11}, y_{12}, y_{21}) \in D \cap D_T(x)} \binom{y_{12} + y_{21}}{y_{12}} (1/2)^{y_{12} + y_{21}} = \sum_{y_{12} = 0}^{x_{12} + x_{21}} \binom{x_{12} + x_{21}}{y_{12}} (1/2)^{x_{12} + x_{21}}\, I_{D}\big((x_{11}, y_{12}, x_{12} + x_{21} - y_{12})\big).
In this example (as in the previous subsection), the conditional  p T -value looks like the frequentist p-value for the simple hypothesis  θ = 1 / 2  regarding an unknown proportion. We should emphasize that for the calculation of the  p T -value, the sample space is ordered by the Bayes factor in place of the likelihood ratio. Note also that the evaluation of this conditional statistic involves ordering at most  n + 1  points of the sample space (exactly those for which the statistic T takes the value  T ( x 0 )  when  x 0  is the effectively observed sample point). On the other hand, if one performs the mixed test without using these conditional quantities, all  C ( n + 3 , 3 ) = ( n + 3 ) ( n + 2 ) ( n + 1 ) / 6  elements of the sample space must be ordered.
Example 4
(Analysis of opinion swing). Suppose it is of interest to evaluate whether the proportion of individuals that did not support the US President before the State of the Union Address remained unchanged after his address. For this purpose,  n = 100  individuals are surveyed with regard to their support for the President before and after his annual message. The survey results are displayed in the following  2 × 2  contingency Table 2:
Let  X 1  ( X 2 ) be the support—“No” or “Yes”—for the President before (after) the State of the Union Address. Let  θ i j  be the probability that an individual is classified into the i-th category of  X 1  and j-th category of  X 2  (for instance,  θ 11  is the probability that an individual does not support the President both before and after his address). The hypothesis that the support for the President remains unchanged is  θ 21 + θ 22 = θ 12 + θ 22 . This is equivalent to the hypothesis that the proportion of swings from “Yes” to “No” is equal to the proportion of swings from “No” to “Yes”; that is, this is equivalent to the symmetry hypothesis  θ 12 = θ 21 . Thus, we can test such a hypothesis by means of the mixed test considering the mathematical setup of this subsection. Suppose  α = ( 1 , 1 , 1 , 1 ) . Then, the Bayes factor is given by
BF(x_{11}, x_{12}, x_{21}) = (1/2)^{x_{12} + x_{21}} (x_{12} + x_{21} + 1) \binom{x_{12} + x_{21}}{x_{12}}.
For the observed data  x = ( 20 , 17 , 10 ) , we obtain the Bayes factor  B F ( ( 20 , 17 , 10 ) ) = 1.76 . Considering  a = b  (that is,  b / a = 1 ), we do not reject the null hypothesis since  B F ( ( 20 , 17 , 10 ) ) > 1 . In this case, the conditional adaptive significance level and the conditional  p T -value at the point  ( 20 , 17 , 10 )  are
\alpha_T^{*}((20, 17, 10)) = \sum_{y_{12} \in \{0, \ldots, 9\} \cup \{18, \ldots, 27\}} \binom{17 + 10}{y_{12}} (1/2)^{17 + 10} = 0.122 \quad \text{and}
p_T\text{-value}((20, 17, 10)) = \sum_{y_{12} \in \{0, \ldots, 10\} \cup \{17, \ldots, 27\}} \binom{27}{y_{12}} (1/2)^{27} = 0.248.
Note that  p T -value ( ( 20 , 17 , 10 ) ) > α T * ( ( 20 , 17 , 10 ) ) , as expected. Note also that we ordered only 28 elements of the sample space by the Bayes factor to determine the above conditional quantities. To calculate the unconditional ones, we would have had to order all 176,851 points in the sample space.
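As in the previous example, the reported quantities can be verified directly (our sketch, using the Bayes factor above with α = (1, 1, 1, 1) and b/a = 1):

```python
from math import comb

x0 = (20, 17, 10)                     # (x11, x12, x21); T(x0) = (x11, x12 + x21) = (20, 27)
s = x0[1] + x0[2]                     # x12 + x21 = 27

def bf(x12, x21):
    # Bayes factor for the symmetry test with alpha = (1, 1, 1, 1)
    return (0.5 ** (x12 + x21)) * (x12 + x21 + 1) * comb(x12 + x21, x12)

print(round(bf(x0[1], x0[2]), 2))     # 1.76: BF > b/a = 1, so H is not rejected

# Conditional on T(x0), y12 follows a Binomial(27, 1/2) null predictive.
cond_null = [comb(s, y12) * 0.5 ** s for y12 in range(s + 1)]

alpha_T = sum(p for y12, p in enumerate(cond_null) if bf(y12, s - y12) <= 1.0)
p_T     = sum(p for y12, p in enumerate(cond_null) if bf(y12, s - y12) <= bf(x0[1], x0[2]))

print(round(alpha_T, 3), round(p_T, 3))   # 0.122 and 0.248
```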

4.3. Test of Independence

Consider the same statistical model as in the previous subsection. However, now we want to evaluate whether there exists (or not) association between the variables  X 1  and  X 2 . For this purpose, we may test the independence hypothesis between these variables. Consider the joint distribution for  ( X 1 , X 2 )  in Table 3 below:
The hypotheses to be tested are
H: \theta_{11} = (\theta_{11} + \theta_{12})(\theta_{11} + \theta_{21}) \quad \text{versus} \quad A: \theta_{11} \neq (\theta_{11} + \theta_{12})(\theta_{11} + \theta_{21}).
It is easy to check that hypotheses H and A can be rewritten as
H: \frac{\theta_{11}}{\theta_{11} + \theta_{12}} = \frac{\theta_{21}}{1 - (\theta_{11} + \theta_{12})} \quad \text{versus} \quad A: \frac{\theta_{11}}{\theta_{11} + \theta_{12}} \neq \frac{\theta_{21}}{1 - (\theta_{11} + \theta_{12})}.
Let us define
\lambda_{11} = \frac{\theta_{11}}{\theta_{11} + \theta_{12}}, \qquad \lambda_{21} = \frac{\theta_{21}}{1 - (\theta_{11} + \theta_{12})} \qquad \text{and} \qquad \lambda_{12} = \theta_{11} + \theta_{12},
and consider the new parameter  λ = ( ( λ 11 , λ 21 ) , λ 12 ) , which takes values in  Λ = ( 0 , 1 ) 2 × ( 0 , 1 ) . Let  T ( X ) = T ( X 11 , X 12 , X 21 ) = X 11 + X 12 . Proceeding as in the previous subsections, we obtain the following induced likelihood function for  λ  generated by  x = ( x 11 , x 12 , x 21 ) :
\tilde{L}_x(\lambda) = \binom{x_{11} + x_{12}}{x_{11}} \lambda_{11}^{x_{11}} (1 - \lambda_{11})^{x_{12}}\; \binom{x_{21} + x_{22}}{x_{21}} \lambda_{21}^{x_{21}} (1 - \lambda_{21})^{x_{22}}\; \binom{n}{x_{11} + x_{12}} \lambda_{12}^{x_{11} + x_{12}} (1 - \lambda_{12})^{x_{21} + x_{22}}.
Note that T is an ancillary statistic for  ( λ 11 , λ 21 ) , as it is a binomial random variable with parameters n and  λ 12 . In addition, for each possible value t of the statistic T, the conditional distribution of X given  T ( X ) = t  depends on  λ  only through  ( λ 11 , λ 21 ) .
The prior distribution for  λ  is such that  λ 11 ,  λ 21  and  λ 12  are independent Beta random variables with respective parameter pairs  ( α 11 , α 12 ) ,  ( α 21 , α 22 )  and  ( α 11 + α 12 , α 21 + α 22 ) .
Finally, note that under the new parametrization, the independence hypothesis is
\tilde{H}: \lambda \in \Lambda_0 \quad \text{versus} \quad \tilde{A}: \lambda \notin \Lambda_0,
where  Λ 0 = { ( u , v ) ( 0 , 1 ) 2 : u = v } × ( 0 , 1 ) .
From the results of Section 3, it follows that  λ 12  is an NNP for testing the hypotheses  H ˜  versus  A ˜  above by using the mixed test. In addition, we only need to compare the conditional  p T -value with the conditional adaptive significance level to decide between these hypotheses. In a sense, the test for the hypothesis of independence between  X 1  and  X 2  by using the conditional statistics resembles the test for the hypothesis of homogeneity that would be performed had the marginal counts  X 11 + X 12  and  n − X 11 − X 12  been fixed beforehand.
Considering, as in the previous subsection,  α = ( 1 , 1 , 1 , 1 ) , we obtain the following expression for the Bayes factor:
BF(x_{11}, x_{12}, x_{21}) = \frac{36\,(x_{11} + x_{12} + 1)!\,(n + 1 - x_{11} - x_{12})!}{x_{11}!\,x_{12}!\,x_{21}!\,x_{22}!\,(n + 3)\binom{n+2}{x_{11} + x_{21} + 1}}.
The conditional predictive probability function for X given  T ( X ) = X 11 + X 12 = t ,  t = 0 , \ldots , n , is given by
f_{H,t}(x_{11}, x_{12}, x_{21}) = \frac{6\binom{t}{x_{11}}\binom{n - t}{x_{21}}}{(n + 3)\binom{n+2}{x_{11} + x_{21} + 1}}\, I_{T^{-1}(t)}(x_{11}, x_{12}, x_{21}),
where  T^{-1}(t) = \{(y_{11}, y_{12}, y_{21}) \in \mathcal{X} : T(y_{11}, y_{12}, y_{21}) = y_{11} + y_{12} = t\} .
From the above distribution, one may obtain the conditional  p T -value and the conditional adaptive significance level  α T *  at each point in the sample space.
Example 5
(Market’s directional change). In [34] it is argued that the directional change of the stock market in January signals the directional change of the market for the remainder of the year. Suppose the following Table 4 summarizes the directional changes of the prices of a few stocks in both periods.
In this case, the Bayes factor is given by
BF(x_{11}, x_{12}, x_{21}) = \frac{36\,(x_{11} + x_{12} + 1)!\,(16 - x_{11} - x_{12})!}{x_{11}!\,x_{12}!\,x_{21}!\,x_{22}!\,18\binom{17}{x_{11} + x_{21} + 1}}.
For the observed data  x = ( 5 , 0 , 2 ) , we obtain the Bayes factor  B F ( ( 5 , 0 , 2 ) ) = 0.244 . Considering  a = b  (that is  b / a = 1 ) , we reject the null hypothesis by using the mixed test since  B F ( ( 5 , 0 , 2 ) ) < 1 . That is, the data from only a few stocks reveal that the directional change for the remainder of the year depends on the directional change in January. Note that although the sample size is small ( n = 15 ) and a cell count is equal to zero, the mixed test can be fully performed, as opposed to standard tests for the hypothesis of independence that rely on asymptotic results. In this case, the conditional adaptive significance level and the conditional  p T - v a l u e  at the point  ( 5 , 0 , 2 )  are
\alpha_T^{*}((5, 0, 2)) = 0.0109 \quad \text{and} \quad p_T\text{-value}((5, 0, 2)) = 0.0022.
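These values can be checked by enumerating only the slice {x : x_{11} + x_{12} = 5} (our sketch, using the Bayes factor and the conditional predictive f_{H,t} as reconstructed above, with b/a = 1; the printed values agree with those reported here up to rounding):

```python
from math import comb, factorial

n = 15
x0 = (5, 0, 2)                                   # (x11, x12, x21); x22 = n - sum(x0) = 8
t = x0[0] + x0[1]                                # T(x0) = x11 + x12 = 5

def bf(x11, x12, x21):
    # Bayes factor for the independence test with alpha = (1, 1, 1, 1), as given above
    x22 = n - x11 - x12 - x21
    num = 36 * factorial(x11 + x12 + 1) * factorial(n + 1 - x11 - x12)
    den = (factorial(x11) * factorial(x12) * factorial(x21) * factorial(x22)
           * (n + 3) * comb(n + 2, x11 + x21 + 1))
    return num / den

def f_H_t(x11, x12, x21):
    # Conditional null predictive f_{H,t} given T = x11 + x12 = t
    return 6 * comb(t, x11) * comb(n - t, x21) / ((n + 3) * comb(n + 2, x11 + x21 + 1))

print(round(bf(*x0), 3))                         # 0.244

# Enumerate the slice {x : T(x) = 5}: x11 = 0..5 (x12 = 5 - x11), x21 = 0..10.
slice_pts = [(x11, t - x11, x21) for x11 in range(t + 1) for x21 in range(n - t + 1)]

alpha_T = sum(f_H_t(*x) for x in slice_pts if bf(*x) <= 1.0)          # b/a = 1
p_T     = sum(f_H_t(*x) for x in slice_pts if bf(*x) <= bf(*x0))

print(round(alpha_T, 4), round(p_T, 4))          # approximately 0.011 and 0.0022
```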

5. Discussion

Statistical hypothesis testing is an important quantitative method that may help the daily activity of scientists from different areas of knowledge. However, with recent computational advances, the misuse of standard tests has come to light. Thus, problems with tests of significance and fixed-level tests have brought a growing need for alternative approaches to hypothesis testing that do not have such drawbacks. Among these alternatives, we revisit in this manuscript the mixed test by Pericchi and Pereira, which combines aspects from two opposing viewpoints: the frequentist and the Bayesian. The mixed test satisfies various reasonable properties one desires when performing a test of hypotheses. Here we prove that the mixed test also meets the Non-Informative Nuisance Parameter Principle (NNPP) for simple hypotheses regarding the parameter of interest. The NNPP concerns the question of how to make inferences about a parameter in the presence of Non-Informative Nuisance Parameters: it states that it is irrelevant whether a Non-Informative Nuisance Parameter is known or unknown in order to draw conclusions about a quantity of interest from data. This principle, though important, has not been explored in much depth, and for this reason, we studied it further in hypothesis testing problems. Nuisance parameters typically affect inferences about a parameter of interest: estimation of the mean of a normal distribution and estimation of the parameters of a linear regression model when the variance is unknown are examples of this.
Adherence of the mixed test to the NNPP allowed for much easier performance of the test, as the calculations involved were significantly reduced. Indeed, decision making between the competing statistical hypotheses was simplified in the three examples we examined: in each situation, conditioning on a suitable statistic and considering conditional versions of the p-value and the adaptive significance level proved to be an advantageous way to use the mixed test. The full extent to which the mixed test adheres to the NNPP, and to which its use can thereby be simplified, remains open in this work and is the goal of future investigation.

Author Contributions

All authors have contributed to the conceptualization, formal analysis and writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico [grant 141161/2018-3].

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Theorem 2.
Suppose there exists a statistic  T : X T  such that it is p-sufficient for  θ 1  and s-ancillary for  θ 2 . Then,
P(X = x \mid \theta) = P(X = x, T(X) = T(x) \mid \theta) = P(X = x \mid T(X) = T(x), \theta_1, \theta_2)\, P(T(X) = T(x) \mid \theta_1, \theta_2).
The result is immediate: as the conditional distribution of X given  T ( X ) = T ( x )  and  θ  depends only on  θ 2 , and the marginal distribution of T given  θ  depends only on  θ 1 , one can write, for each  x ∈ X ,  P ( T ( X ) = T ( x ) | θ 1 , θ 2 ) = L x 1 ( θ 1 )  and  P ( X = x | T ( X ) = T ( x ) , θ 1 , θ 2 ) = L x 2 ( θ 2 ) . The proof when T is p-sufficient for  θ 2  and s-ancillary for  θ 1  is analogous. □
Proof of Theorem 1.
We first prove the (⇒) part of the theorem. Suppose that  x ∈ X  is such that  θ 1 ⊥ θ 2 | X = x ; that is, the posterior distribution of  θ  given  X = x  can be factored as  π ( θ | x ) = π 1 ( θ 1 | x ) π 2 ( θ 2 | x ) . Then,
π ( θ | x ) = π 1 ( θ 1 | x ) π 2 ( θ 2 | x ) L x ( θ ) π ( θ ) Θ L x ( θ ) π ( θ ) d θ = Θ 2 L x ( θ ) π ( θ ) d θ 2 Θ L x ( θ ) π ( θ ) d θ Θ 1 L x ( θ ) π ( θ ) d θ 1 Θ L x ( θ ) π ( θ ) d θ .
Due to the fact that  θ 1 θ 2 , it follows from the last equality in (A2) that
L x ( θ ) π 1 ( θ 1 ) π 2 ( θ 2 ) = π 1 ( θ 1 ) Θ 2 L x ( θ ) π 2 ( θ 2 ) d θ 2 π 2 ( θ 2 ) Θ 1 L x ( θ ) π 1 ( θ 1 ) d θ 1 Θ L x ( θ ) π ( θ ) d θ .
The result follows considering, for instance,
L x 1 ( θ 1 ) = Θ 2 L x ( θ ) π 2 ( θ 2 ) d θ 2 and L x 2 ( θ 2 ) = Θ 1 L x ( θ ) π ( θ 1 ) d θ 1 Θ L x ( θ ) π ( θ ) d θ .
Next, we prove the converse. Suppose that the likelihood can be factored as  L x ( θ ) = L x 1 ( θ 1 ) L x 2 ( θ 2 ) . Then,
π ( θ | x ) = L x ( θ ) π ( θ ) Θ L x ( θ ) π ( θ ) d θ = L x 1 ( θ 1 ) L x 2 ( θ 2 ) π 1 ( θ 1 ) π 2 ( θ 2 ) Θ 1 × Θ 2 L x 1 ( θ 1 ) L x 2 ( θ 2 ) π 1 ( θ 1 ) π 2 ( θ 2 ) d θ .
The posterior marginal density of  θ i  is obtained from (A5) by integrating out the other component. Thus,
π i ( θ i | x ) = L x i ( θ i ) π i ( θ i ) Θ i L x i ( θ i ) π i ( θ i ) d θ i , i = 1 , 2 ,
and therefore,  π ( θ 1 , θ 2 | x ) = π 1 ( θ 1 | x ) π 2 ( θ 2 | x ) ; that is,  θ 1 θ 2 | X = x . □
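The equivalence in Theorem 1 can also be checked numerically on a discretized toy model. The following sketch is illustrative code written for this purpose, with toy binomial data and Beta(2, 2) priors chosen arbitrarily: it places independent priors on a grid for $(\theta_1, \theta_2)$, uses a likelihood that factors by construction, and verifies that the joint posterior equals the product of its marginals.

```python
import numpy as np
from scipy.stats import beta, binom

# Toy model: x1 ~ Binomial(n1, theta1) and x2 ~ Binomial(n2, theta2) independently,
# so the likelihood factors as L_x(theta) = L1(theta1) * L2(theta2).
x1, n1, x2, n2 = 7, 10, 3, 12

# Independent priors, discretized on a grid (Beta(2, 2) for both, purely illustrative).
grid = np.linspace(0.001, 0.999, 400)
prior1 = beta.pdf(grid, 2, 2)
prior2 = beta.pdf(grid, 2, 2)

# Factored likelihood evaluated on the grid.
L1 = binom.pmf(x1, n1, grid)
L2 = binom.pmf(x2, n2, grid)

# Joint (unnormalized) posterior on the product grid, then normalized.
joint = np.outer(L1 * prior1, L2 * prior2)
joint /= joint.sum()

# Marginal posteriors obtained by summing out the other component.
marg1 = joint.sum(axis=1)
marg2 = joint.sum(axis=0)

# Theorem 1: the joint posterior factors as the product of its marginals.
assert np.allclose(joint, np.outer(marg1, marg2))
print("Posterior factorization verified on the grid.")
```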
Proof of Theorem 3.
We first verify that $\theta_2$ is an NNP for testing $\bar{H}$ versus $\bar{A}$ by means of $\bar{\varphi}^*$. Recall that
$$\bar{\varphi}^*(x, \theta_2) = 1 \;\Longleftrightarrow\; \frac{\bar{f}_{\bar{H}}(x, \theta_2)}{\bar{f}_{\bar{A}}(x, \theta_2)} < \frac{b}{a},$$
where $\bar{f}_{\bar{H}}$ ($\bar{f}_{\bar{A}}$) is the predictive distribution for $(X, \theta_2)$ obtained under $\bar{H}$ ($\bar{A}$). In this case, the likelihood function generated by $(x_0, \theta_2)$ for $\theta_1$, with $x_0 \in \mathcal{X}$ such that (1) holds and $\theta_2 \in \Theta_2$, is
$$\bar{L}_{(x_0, \theta_2)}(\theta_1) = P(X = x_0 \mid \theta_1, \theta_2)\, \pi_2(\theta_2 \mid \theta_1) = L_{x_0}(\theta_1, \theta_2)\, \pi_2(\theta_2) = L^1_{x_0}(\theta_1)\, L^2_{x_0}(\theta_2)\, \pi_2(\theta_2).$$
Then, the predictive function under the null hypothesis $\bar{H}$ can be calculated as
$$\bar{f}_{\bar{H}}(x_0, \theta_2) = \int_{\{\theta_0\}} \bar{L}_{(x_0, \theta_2)}(\theta_1)\, dP_{\bar{H}}(\theta_1),$$
where $P_{\bar{H}}$, the conditional distribution of $\theta_1$ given $\theta_1 = \theta_0$, is degenerate at $\theta_0$. Thus,
$$\bar{f}_{\bar{H}}(x_0, \theta_2) = L^1_{x_0}(\theta_0)\, L^2_{x_0}(\theta_2)\, \pi_2(\theta_2).$$
In addition, the predictive function under the alternative hypothesis $\bar{A}$ is given by
$$\bar{f}_{\bar{A}}(x_0, \theta_2) = \int_{\Theta_1} L^1_{x_0}(\theta_1)\, L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, dP_{\bar{A}}(\theta_1) = \int_{\Theta_1} L^1_{x_0}(\theta_1)\, L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, \pi_1(\theta_1)\, d\theta_1 = L^2_{x_0}(\theta_2)\, \pi_2(\theta_2) \int_{\Theta_1} L^1_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1.$$
Thus, the Bayes factor can be expressed as
$$\frac{\bar{f}_{\bar{H}}(x_0, \theta_2)}{\bar{f}_{\bar{A}}(x_0, \theta_2)} = \frac{L^1_{x_0}(\theta_0)\, L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)}{L^2_{x_0}(\theta_2)\, \pi_2(\theta_2) \int_{\Theta_1} L^1_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1} = \frac{L^1_{x_0}(\theta_0)}{\int_{\Theta_1} L^1_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1}.$$
Note that Equation (A12) does not depend on $\theta_2$. As a result, the test in (A7) does not depend on $\theta_2$, and consequently, $\theta_2$ is an NNP for testing $\bar{H}$ versus $\bar{A}$ by means of $\bar{\varphi}^*$. Now, we shall determine the test $\varphi^*$ for $H$ versus $A$. The predictive distribution for $X$ at $x_0$ under the null hypothesis is
$$f_H(x_0) = \int_{\Theta_1 \times \Theta_2} L_{x_0}(\theta_1, \theta_2)\, dP_H(\theta_1, \theta_2).$$
It is not difficult to verify that for fixed $\theta_0 \in \Theta_1$, the conditional distribution of $\theta$ given $\theta_1 = \theta_0$ is such that $\theta_1$ is degenerate at $\theta_0$ and $\theta_2$ is independent of $\theta_1$ with density $\pi_2$.
Then,
$$f_H(x_0) = \int_{\Theta_1 \times \Theta_2} L_{x_0}(\theta_1, \theta_2)\, dP_H(\theta_1, \theta_2) = \int_{\Theta_2} L^1_{x_0}(\theta_0)\, L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2 = L^1_{x_0}(\theta_0) \int_{\Theta_2} L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2.$$
For the alternative hypothesis, we have that
$$f_A(x_0) = \int_{\Theta_1 \times \Theta_2} L^1_{x_0}(\theta_1)\, L^2_{x_0}(\theta_2)\, dP_A(\theta_1, \theta_2) = \int_{\Theta_1 \times \Theta_2} L^1_{x_0}(\theta_1)\, L^2_{x_0}(\theta_2)\, dP(\theta_1, \theta_2) = \int_{\Theta_1} \int_{\Theta_2} L^1_{x_0}(\theta_1)\, L^2_{x_0}(\theta_2)\, \pi_1(\theta_1)\, \pi_2(\theta_2)\, d\theta_1\, d\theta_2 = \int_{\Theta_1} L^1_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1 \int_{\Theta_2} L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2.$$
Finally,
$$\frac{f_H(x_0)}{f_A(x_0)} = \frac{L^1_{x_0}(\theta_0) \int_{\Theta_2} L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2}{\int_{\Theta_1} L^1_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1 \int_{\Theta_2} L^2_{x_0}(\theta_2)\, \pi_2(\theta_2)\, d\theta_2} = \frac{L^1_{x_0}(\theta_0)}{\int_{\Theta_1} L^1_{x_0}(\theta_1)\, \pi_1(\theta_1)\, d\theta_1} = \frac{\bar{f}_{\bar{H}}(x_0, \theta_2)}{\bar{f}_{\bar{A}}(x_0, \theta_2)}.$$
Hence,
$$\frac{f_H(x_0)}{f_A(x_0)} < \frac{b}{a} \;\Longleftrightarrow\; \frac{\bar{f}_{\bar{H}}(x_0, \theta_2)}{\bar{f}_{\bar{A}}(x_0, \theta_2)} < \frac{b}{a},$$
and consequently,
$$\varphi^*(x_0) = 1 \;\Longleftrightarrow\; \bar{\varphi}^*(x_0, \theta_2) = 1. \qquad \square$$
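To see the cancellation in (A12) numerically, consider the $2 \times 2$ symmetry setting with off-diagonal counts $x_{12}$ and $x_{21}$, parameter of interest $\lambda_1 = \theta_{12}/(\theta_{12} + \theta_{21})$ and null hypothesis $\lambda_1 = 1/2$. The sketch below is illustrative code, not part of the original development: it assumes a uniform Beta(1, 1) prior on $\lambda_1$ and uses the off-diagonal cells of Table 2 only as example counts. It computes the Bayes factor $L^1_{x_0}(1/2)/\int L^1_{x_0}(\lambda_1)\,\pi_1(\lambda_1)\,d\lambda_1$ both by numerical integration and in closed form; the nuisance factor $L^2_{x_0}(\lambda_2)\,\pi_2(\lambda_2)$ never needs to be evaluated.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import betaln
from scipy.stats import beta, binom

# Off-diagonal counts of a 2x2 table (illustrative values from Table 2).
x12, x21 = 17, 10
s = x12 + x21

# Reduced likelihood of the parameter of interest lambda1 = theta12 / (theta12 + theta21):
# L1(lambda1) = C(s, x12) * lambda1**x12 * (1 - lambda1)**x21.
def L1(lam):
    return binom.pmf(x12, s, lam)

# Assumed Beta(a, b) prior on lambda1 (uniform here, purely for illustration).
a_prior, b_prior = 1.0, 1.0

# Bayes factor of H: lambda1 = 1/2 against A, as in (A12):
# numerator L1(1/2); denominator is the integral of L1 against the prior.
bf_numeric = L1(0.5) / quad(lambda lam: L1(lam) * beta.pdf(lam, a_prior, b_prior), 0, 1)[0]

# Closed form: the binomial coefficient cancels, leaving
# (1/2)**s * B(a, b) / B(a + x12, b + x21).
bf_closed = 0.5**s * np.exp(betaln(a_prior, b_prior) - betaln(a_prior + x12, b_prior + x21))

print(f"Bayes factor (numerical integration): {bf_numeric:.6f}")
print(f"Bayes factor (closed form):           {bf_closed:.6f}")
```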
Proof of Corollary 1.
The corollary follows directly from Theorems 2 and 3. □
Proof of Theorem 4.
From Theorem 3 and Corollary 1, we have that for each $x_0 \in \mathcal{X}$,
$$\varphi^*(x_0) = 1 \;\Longleftrightarrow\; BF(x_0) \le \frac{b}{a}.$$
Then,
$$\varphi^*(x_0) = 1 \;\Longleftrightarrow\; BF(x_0) \le \frac{b}{a} \;\Longleftrightarrow\; D(x_0) \subseteq D$$
$$\Longrightarrow\; \sum_{x \in D(x_0)} f_{H, T(x_0)}(x) \le \sum_{x \in D} f_{H, T(x_0)}(x) \;\Longleftrightarrow\; p_T\text{-value}(x_0) \le \alpha^*_T(x_0).$$
Thus,
$$\varphi^*(x_0) = 1 \;\Longrightarrow\; p_T\text{-value}(x_0) \le \alpha^*_T(x_0).$$
The converse is proved by contraposition:
$$\varphi^*(x_0) = 0 \;\Longrightarrow\; \frac{b}{a} < BF(x_0) \;\Longrightarrow\; D \cup \{x_0\} \subseteq D(x_0).$$
As $x_0 \notin D$ if $BF(x_0) > \frac{b}{a}$, we obtain that
$$\varphi^*(x_0) = 0 \;\Longrightarrow\; \sum_{x \in D} f_{H, T(x_0)}(x) + f_{H, T(x_0)}(x_0) \le \sum_{x \in D(x_0)} f_{H, T(x_0)}(x).$$
Since $x_0 \in D_{T(x_0)}$ and $BF(x_0) > \frac{b}{a} > 0$, it follows that $f_{H, T(x_0)}(x_0) > 0$. Thus,
$$\varphi^*(x_0) = 0 \;\Longrightarrow\; \alpha^*_T(x_0) < p_T\text{-value}(x_0),$$
and consequently,
$$p_T\text{-value}(x_0) \le \alpha^*_T(x_0) \;\Longrightarrow\; \varphi^*(x_0) = 1.$$
From (A20) and (A22), the result follows. □
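To illustrate Theorem 4 in the simplest case, note that in the $2 \times 2$ symmetry problem the statistic $T$ fixes $s = x_{12} + x_{21}$, and, under $H$, $X_{12}$ is conditionally Binomial$(s, 1/2)$ given $T$. The sketch below is an illustration, not code from the paper: it assumes a uniform Beta(1, 1) prior on $\lambda_1$ and an arbitrary value for the cutoff $b/a$, enumerates the conditional sample space, and reaches the decision by comparing the conditional $p_T$-value with the conditional adaptive significance level $\alpha^*_T$, exactly as the theorem prescribes.

```python
import numpy as np
from scipy.special import betaln
from scipy.stats import binom

def bayes_factor(y12, s, a_prior=1.0, b_prior=1.0):
    """BF of H: lambda1 = 1/2 for off-diagonal counts (y12, s - y12), Beta prior assumed."""
    return 0.5**s * np.exp(betaln(a_prior, b_prior)
                           - betaln(a_prior + y12, b_prior + s - y12))

def mixed_test_2x2(x12, x21, cutoff):
    """Decide H: theta12 = theta21 via Theorem 4, conditioning on s = x12 + x21.

    `cutoff` plays the role of b/a; its value is an assumption of this sketch."""
    s = x12 + x21
    y = np.arange(s + 1)                       # conditional sample space given T
    cond_null = binom.pmf(y, s, 0.5)           # f_{H,T(x)}: Binomial(s, 1/2) under H
    bf = np.array([bayes_factor(k, s) for k in y])
    bf_obs = bayes_factor(x12, s)

    p_T = cond_null[bf <= bf_obs].sum()        # conditional p_T-value
    alpha_T = cond_null[bf <= cutoff].sum()    # conditional adaptive significance level
    reject = p_T <= alpha_T                    # Theorem 4: reject H iff p_T <= alpha*_T
    return p_T, alpha_T, reject

# Example with the off-diagonal counts of Table 2 and an assumed cutoff b/a = 1.
p_T, alpha_T, reject = mixed_test_2x2(17, 10, cutoff=1.0)
print(f"p_T-value = {p_T:.4f}, alpha*_T = {alpha_T:.4f}, reject H: {reject}")
```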
Mixed test for the symmetry hypothesis in $r \times r$ contingency tables
In this case, Table A1 represents the observed frequencies of the cross-classification of $n$ units by the variables $X_1$ and $X_2$.
Table A1. Observed frequencies of $X_1$ and $X_2$ in the $3 \times 3$ case.

                 X2
     X1     x11    x12    x13    n1.
            x21    x22    x23    n2.
            x31    x32    x33    n3.
            n.1    n.2    n.3    n
Let $X = (X_{ij})$ be the $(r^2 - 1)$-dimensional vector of cell counts and $\theta = (\theta_{ij})$ be the $(r^2 - 1)$-dimensional vector of cell probabilities, where $X_{ij}$ and $\theta_{ij}$ are self-explanatory. Suppose that $X$ is a multinomial random vector with parameters $n$ and $\theta$. The likelihood function generated by $x \in \mathcal{X}$ for $\theta$ is given by
$$L_x(\theta) = \frac{n!}{x_{11}! \cdots x_{rr}!}\, \theta_{11}^{x_{11}} \cdots \theta_{rr}^{x_{rr}}.$$
The hypotheses for testing diagonal symmetry are
$$H: \theta_{ij} = \theta_{ji} \;\; \forall\, i \neq j \qquad \text{versus} \qquad A: \theta_{ij} \neq \theta_{ji} \;\; \text{for at least one } i \neq j.$$
We also assume a prior Dirichlet distribution with parameter $\alpha = (\alpha_{ij})$ for $\theta$. That is,
$$\pi(\theta) = \frac{\Gamma(\alpha_{11} + \cdots + \alpha_{rr})}{\Gamma(\alpha_{11}) \cdots \Gamma(\alpha_{rr})}\, \theta_{11}^{\alpha_{11} - 1} \cdots \theta_{rr}^{\alpha_{rr} - 1}.$$
To perform the mixed test for the symmetry hypothesis, we consider the following reparametrization of the model: we define
$$\lambda_{ij} = \frac{\theta_{ij}}{\theta_{ij} + \theta_{ji}} \;\; \text{for } i < j, \qquad \lambda_{ij} = \theta_{ij} + \theta_{ji} \;\; \text{for } i > j, \qquad \lambda_{ij} = \theta_{ij} \;\; \text{for } i = j.$$
Let $\lambda = (\lambda_1, \lambda_2)$, where $\lambda_1$ is the $\frac{r^2 - r}{2}$-dimensional vector whose components are the $\lambda_{ij}$ with $i < j$, and $\lambda_2$ is the $\left(\frac{r^2 - r}{2} + r - 1\right)$-dimensional vector whose components are the $\lambda_{ij}$ with $i \geq j$. The new parameter space is then $\Lambda = (0, 1)^{\frac{r^2 - r}{2}} \times S_{\frac{r^2 - r}{2} + r - 1}$.
Then, we can rewrite the hypotheses (A24) as
$$\tilde{H}: \lambda \in \Lambda_0 \qquad \text{versus} \qquad \tilde{A}: \lambda \in \Lambda_0^c,$$
where $\Lambda_0 = B \times S_{\frac{r^2 - r}{2} + r - 1}$, and $B$ is the singleton $B = \{(\tfrac{1}{2}, \ldots, \tfrac{1}{2})\}$.
As in the previous sections, we consider a statistic $T$ that is s-ancillary for $\lambda_1$: $T$ is the $\left(\frac{r^2 - r}{2} + r - 1\right)$-dimensional vector whose components are the sums $X_{ij} + X_{ji}$ for $i < j$ and the counts $X_{ii}$ for $i = 1, \ldots, r - 1$. The induced likelihood function for $\lambda$ generated by $x$ is
$$\tilde{L}_x(\lambda) = \prod_{i < j} \binom{x_{ij} + x_{ji}}{x_{ij}} \lambda_{ij}^{x_{ij}} (1 - \lambda_{ij})^{x_{ji}} \;\cdot\; \frac{n!}{\prod_i x_{ii}! \prod_{i > j} (x_{ij} + x_{ji})!} \prod_i \lambda_{ii}^{x_{ii}} \prod_{i > j} \lambda_{ij}^{x_{ij} + x_{ji}}.$$
We can easily see that the likelihood function in (A28) can be factored as $\tilde{L}_x(\lambda_1, \lambda_2) = \tilde{L}^1_x(\lambda_1)\, \tilde{L}^2_x(\lambda_2)$. In addition, the prior distribution for $\lambda$ is such that $\lambda_1$ and $\lambda_2$ are independent, $\lambda_2$ being a Dirichlet random vector and $\lambda_1$ a vector of independent Beta random variables. That is,
$$\tilde{\pi}(\lambda) = \tilde{\pi}_1(\lambda_1)\, \tilde{\pi}_2(\lambda_2) = \prod_{i < j} \frac{\Gamma(\alpha_{ij} + \alpha_{ji})}{\Gamma(\alpha_{ij})\, \Gamma(\alpha_{ji})}\, \lambda_{ij}^{\alpha_{ij} - 1} (1 - \lambda_{ij})^{\alpha_{ji} - 1}\, I_{(0,1)}(\lambda_{ij}) \;\cdot\; \tilde{\pi}_2(\lambda_2).$$
From Theorem 3, we have that $\lambda_2$ is an NNP for testing $\tilde{H}$ versus $\tilde{A}$ by means of the mixed test. In addition, the mixed test for $\tilde{H}$ reduces to the mixed test for the simple hypothesis $\lambda_1 = (\tfrac{1}{2}, \ldots, \tfrac{1}{2})$ were $\lambda_2$ known. From Theorem 4, it follows that we only need to compare the conditional $p_T$-value with the conditional adaptive significance level to test $\tilde{H}$ against $\tilde{A}$. From (28), (30) and (A28), we obtain, for $x = (x_{ij}) \in \mathcal{X}$,
$$p_T\text{-value}(x) = \sum_{y \in D^*_T(x)} \prod_{i < j} \binom{y_{ij} + y_{ji}}{y_{ij}} \left(\tfrac{1}{2}\right)^{y_{ij} + y_{ji}} = \sum_{y \in D^*_T(x)} \prod_{i < j} \binom{x_{ij} + x_{ji}}{y_{ij}} \left(\tfrac{1}{2}\right)^{x_{ij} + x_{ji}}$$
and
$$\alpha^*_T(x) = \sum_{y \in D \cap D_T(x)} \prod_{i < j} \binom{y_{ij} + y_{ji}}{y_{ij}} \left(\tfrac{1}{2}\right)^{y_{ij} + y_{ji}} = \sum_{y \in D \cap D_T(x)} \prod_{i < j} \binom{x_{ij} + x_{ji}}{y_{ij}} \left(\tfrac{1}{2}\right)^{x_{ij} + x_{ji}}.$$
In this case, these conditional quantities are simply determined by the products of binomial-type probabilities.
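To make these products concrete, the sketch below enumerates the conditional sample space given $T$ for a $3 \times 3$ table and evaluates the conditional $p_T$-value and $\alpha^*_T$ as sums of products of Binomial$(x_{ij} + x_{ji}, 1/2)$ probabilities, as in (A30) and (A31). The code is illustrative: the counts, the uniform Beta(1, 1) priors on the $\lambda_{ij}$ and the value of the cutoff $b/a$ are assumptions made only for this example, and $D^*_T(x)$ and $D$ are taken, in line with the proof of Theorem 4, to be the sets of points whose Bayes factor does not exceed $BF(x)$ and $b/a$, respectively.

```python
import itertools
import numpy as np
from scipy.special import betaln
from scipy.stats import binom

# Observed 3x3 table (illustrative counts) and off-diagonal pair totals s_ij = x_ij + x_ji.
x = np.array([[10,  4,  2],
              [ 8, 12,  3],
              [ 5,  6,  9]])
pairs = [(0, 1), (0, 2), (1, 2)]
s = {p: x[p] + x[p[::-1]] for p in pairs}          # totals fixed by the statistic T
obs = {p: x[p] for p in pairs}                     # observed "upper" counts y_ij

def bayes_factor(y, a=1.0, b=1.0):
    """BF of the symmetry hypothesis for upper counts y = {(i,j): y_ij}, Beta(a, b) priors."""
    log_bf = 0.0
    for p in pairs:
        log_bf += s[p] * np.log(0.5) + betaln(a, b) - betaln(a + y[p], b + s[p] - y[p])
    return np.exp(log_bf)

def cond_null_prob(y):
    """Conditional null probability of y given T: product of Binomial(s_ij, 1/2) terms, as in (A30)."""
    return np.prod([binom.pmf(y[p], s[p], 0.5) for p in pairs])

cutoff = 1.0                                        # assumed value of b/a
bf_obs = bayes_factor(obs)

p_T, alpha_T = 0.0, 0.0
for combo in itertools.product(*(range(s[p] + 1) for p in pairs)):
    y = dict(zip(pairs, combo))
    prob, bf = cond_null_prob(y), bayes_factor(y)
    if bf <= bf_obs:                                # y in D_T*(x): at most as supportive of H as x
        p_T += prob
    if bf <= cutoff:                                # y in D, the rejection region, within D_T(x)
        alpha_T += prob

print(f"p_T-value = {p_T:.4f}, alpha*_T = {alpha_T:.4f}, reject H: {p_T <= alpha_T}")
```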

References

1. Berger, J.; Wolpert, R. The Likelihood Principle; Institute of Mathematical Statistics: Hayward, CA, USA, 1988.
2. Mayo, D. On the Birnbaum Argument for the Strong Likelihood Principle. Stat. Sci. 2014, 29, 227–239.
3. Dawid, A. Discussion of "On the Birnbaum Argument for the Strong Likelihood Principle". Stat. Sci. 2014, 29, 240–241.
4. Evans, M. Discussion of "On the Birnbaum Argument for the Strong Likelihood Principle". Stat. Sci. 2014, 29, 242–246.
5. Hannig, J. Discussion of "On the Birnbaum Argument for the Strong Likelihood Principle". Stat. Sci. 2014, 29, 254–258.
6. Bjørnstad, J. Discussion of "On the Birnbaum Argument for the Strong Likelihood Principle". Stat. Sci. 2014, 29, 259–260.
7. Shan, G. Exact Statistical Inference for Categorical Data; Academic Press: Cambridge, MA, USA, 2016.
8. Pericchi, L.; Pereira, C. Adaptive significance levels using optimal decision rules: Balancing by weighting the error probabilities. Braz. J. Probab. Stat. 2016, 30, 70–90.
9. Butler, R.W. Predictive Likelihood Inference with Applications. J. R. Stat. Soc. Ser. B 1986, 48, 1–38.
10. Severini, T. Integrated likelihoods for functions of a parameter. Stat 2018, 7, e212.
11. Berger, J.; Liseo, B.; Wolpert, R. Integrated Likelihood Methods for Eliminating Nuisance Parameters. Stat. Sci. 1999, 14, 1–28.
12. Cox, D.R. Partial likelihood. Biometrika 1975, 62, 269–276.
13. Dawid, A.P. On the concepts of sufficiency and ancillarity in the presence of nuisance parameters. J. R. Stat. Soc. Ser. B 1975, 37, 248–258.
14. Sprott, D.A. Marginal and conditional sufficiency. Biometrika 1975, 62, 599–605.
15. Barndorff-Nielsen, O. Nonformation. Biometrika 1976, 63, 567–571.
16. Barndorff-Nielsen, O. Information and Exponential Families: In Statistical Theory; Wiley: Chichester, UK, 1978.
17. Basu, D. On the Elimination of Nuisance Parameters. J. Am. Stat. Assoc. 1977, 72, 355–366.
18. Jørgensen, B. The rules of conditional inference: Is there a universal definition of nonformation? J. Ital. Stat. Soc. 1994, 3, 355.
19. Pace, L.; Salvan, A. Principles of Statistical Inference: From a Neo-Fisherian Perspective; World Scientific Publishing Company Pte Limited: Singapore, 1997.
20. Gannon, M.; Pereira, C.; Polpo, A. Blending Bayesian and Classical Tools to Define Optimal Sample-Size-Dependent Significance Levels. Am. Stat. 2019, 73, 213–222.
21. Pereira, C.; Nakano, E.; Fossaluza, V.; Esteves, L.; Gannon, M.; Polpo, A. Hypothesis Tests for Bernoulli Experiments: Ordering the Sample Space by Bayes Factors and Using Adaptive Significance Levels for Decisions. Entropy 2017, 19, 696.
22. Olivera, M. Definição do nível de significância em função do tamanho amostral [Definition of the significance level as a function of the sample size]. Master's Thesis, IME, Universidade de São Paulo, São Paulo, Brazil, 2014.
23. Pereira, B.; Pereira, C. A likelihood approach to diagnostic test in clinical medicine. Stat. J. 2005, 3, 77–98.
24. Montoya, D.; Irony, T.; Pereira, C.; Whittle, M. An unconditional exact test for the Hardy-Weinberg equilibrium law: Sample space ordering using the Bayes factor. Genetics 2001, 158, 875–883.
25. Irony, T.; Pereira, C. Bayesian hypothesis test: Using surface integrals to distribute prior information among the hypotheses. Resenhas IME-USP 1995, 2, 27–46.
26. DeGroot, M. Probability and Statistics; Addison-Wesley: Boston, MA, USA, 1986.
27. Freeman, P. The role of p-values in analysing trial results. Stat. Med. 1993, 12, 15–16.
28. Bowker, A. A Test for Symmetry in Contingency Tables. J. Am. Stat. Assoc. 1948, 43, 572–574.
29. Ireland, C.; Ku, H.; Kullback, S. Symmetry and marginal homogeneity of an r × r contingency table. J. Am. Stat. Assoc. 1969, 64, 1323–1341.
30. Kullback, S. Marginal Homogeneity of Multidimensional Contingency Tables. Ann. Math. Stat. 1971, 42, 594–606.
31. Bernardo, G.; Lauretto, M.; Stern, J. The full Bayesian significance test for symmetry in contingency tables. AIP Conf. Proc. 2012, 1443, 198–205.
32. Agresti, A. Categorical Data Analysis, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2013.
33. Tahata, K.; Tomizawa, S. Symmetry and asymmetry models and decompositions of models for contingency tables. SUT J. Math. 2014, 50, 131–165.
34. McClave, J.T.; Benson, P.G.; Sincich, T.T. Statistics for Business and Economics; Pearson: London, UK, 2001.
Table 1. Observed frequencies of $(X_1, X_2)$ in the $2 \times 2$ case.

                 X2
     X1     x11    x12    n1.
            x21    x22    n2.
            n.1    n.2    n
Table 2. Survey results.

                       After
                    No     Yes
   Before   No      20      17      37
            Yes     10      53      63
                    30      70     100
Table 3. Joint distribution of $X_1$ and $X_2$ given $\theta$.

                 X2
     X1     θ11    θ12    θ1.
            θ21    θ22    θ2.
            θ.1    θ.2    1
Table 4. Survey results.

                               After January
                              Up      Down
   January change   Up         5         0      5
                    Down       2         8     10
                               7         8     15