A Wald-type test statistic for testing linear hypothesis in logistic regression models based on minimum density power divergence estimator

In this paper a robust version of the classical Wald test statistic for linear hypotheses in the logistic regression model is introduced and its properties are explored. We study the problem under the assumption of random covariates, although some ideas for non-random covariates are also considered. The family of tests considered is based on the minimum density power divergence estimator instead of the maximum likelihood estimator, and it is referred to as the Wald-type test statistic in the paper. We obtain the asymptotic distribution and also study the robustness properties of the Wald-type test statistic. The robustness of the tests is investigated theoretically through influence function analysis as well as through suitable practical examples. It is theoretically established that the level as well as the power of the Wald-type tests are stable against contamination, while the classical Wald test breaks down in this scenario. Some classical examples are presented which numerically substantiate the theory developed. Finally, a simulation study is included to provide further confirmation of the validity of the theoretical results established in the paper.

Let $M^T$ be any matrix of $r$ rows and $k+1$ columns with $\mathrm{rank}(M) = r$, and $m$ a vector of order $r$ of specified constants such that $\mathrm{rank}(M^T, m) = r$. If we are interested in testing

$$H_0 : M^T\beta = m, \tag{1.3}$$

the Wald test statistic is usually used, in which $\beta$ is estimated by the maximum likelihood estimator (MLE). Notice that if we consider $M = I_{k+1}$ and $m = \beta_0$, we get the Wald-type test statistic presented by Bianco and Martinez (2009), based on a weighted Bianco and Yohai (1996) estimator. It is well known that the MLE of $\beta$ can be severely affected by outlying observations. Croux and Haesbroeck (2003) discuss the breakdown behavior of the MLE in the logistic regression model and show that the MLE breaks down when several outliers are added to a data set. In recent years several authors have attempted to derive robust estimates of the parameters in the logistic regression model; see for instance Pregibon (1982), Morgenthaler (1992), Carroll and Pederson (1993), Christmann (1994), Bianco and Yohai (1996), Croux and Haesbroeck (2003), Bondell (2005, 2008) and Hobza et al. (2008, 2017). Our interest in this paper is to present a family of Wald-type test statistics, based on the robust minimum density power divergence estimator, for testing the general linear hypothesis given in (1.3).
In Section 2 we present the minimum density power divergence estimator for $\beta$. The Wald-type test statistics based on the minimum density power divergence estimator are presented in Section 3, together with their asymptotic properties. The theoretical robustness properties are presented in Section 4 and, finally, Sections 5 and 6 are devoted to a simulation study and real data examples, respectively.
Based on (2.4) and (2.5), we shall define the minimum density power divergence estimator as follows.
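Definition 2.1. The minimum density power divergence estimator of the parameter $\beta$, $\hat\beta_\lambda$, in the logistic regression model is given by the value of $\beta$ that minimizes the density power divergence objective in (2.5).
In order to obtain the estimating equations we need the derivative of (2.5) with respect to $\beta$. Taking into account the expressions

$$\frac{\partial \pi(x_i^T\beta)}{\partial\beta} = \pi(x_i^T\beta)\bigl(1-\pi(x_i^T\beta)\bigr)x_i, \qquad \frac{\partial \bigl(1-\pi(x_i^T\beta)\bigr)}{\partial\beta} = -\pi(x_i^T\beta)\bigl(1-\pi(x_i^T\beta)\bigr)x_i,$$

and after some algebra, the estimating equations for $\lambda > 0$ are given by

$$\sum_{i=1}^{n} \frac{e^{\lambda x_i^T\beta} + e^{x_i^T\beta}}{\bigl(1+e^{x_i^T\beta}\bigr)^{\lambda+1}}\,\bigl(\pi(x_i^T\beta) - y_i\bigr)\,x_i = 0_{k+1}, \tag{2.6}$$

where $\pi(x_i^T\beta)$ is as given in (1.2). Based on the previous results we have established the following theorem.
Theorem 2.1. The minimum density power divergence estimator of $\beta$, $\hat\beta_\lambda$, can be obtained as the solution of the system of equations given in (2.6).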
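The estimating function behind Theorem 2.1 can be sketched in a few lines. The sketch below is an illustration, not the authors' code: it assumes that each observation contributes $w_\lambda(s)(\pi(s) - y)x$ with $s = x^T\beta$ and $w_\lambda(s) = (e^{\lambda s} + e^{s})/(1+e^{s})^{\lambda+1}$, which is what differentiating the usual density power divergence objective for Bernoulli responses gives. The function names `pi`, `dpd_weight` and `psi` are hypothetical.

```python
import math

def pi(s):
    """Logistic function e^s / (1 + e^s), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def dpd_weight(s, lam):
    """Assumed DPD weight: (e^{lam*s} + e^s) / (1 + e^s)^(lam + 1).

    Computed in the equivalent form pi^lam * (1 - pi) + (1 - pi)^lam * pi,
    with 1 - pi(s) evaluated as pi(-s) for numerical accuracy.
    """
    p, q = pi(s), pi(-s)
    return p**lam * q + q**lam * p

def psi(x, y, beta, lam):
    """Contribution of one observation (x includes the leading 1) to the
    estimating equations: weight * (pi(s) - y) * x."""
    s = sum(xj * bj for xj, bj in zip(x, beta))
    scale = dpd_weight(s, lam) * (pi(s) - y)
    return [scale * xj for xj in x]

# At lam = 0 the weight is identically 1, so psi is the usual MLE score term.
print(dpd_weight(1.0, 0.0), dpd_weight(1.0, 0.5))
```

For $\lambda = 0$ the weight equals 1 for every score, so the sketch reduces to the likelihood equations; for $\lambda > 0$ the weight decays for large $|s|$, which is the source of the downweighting discussed in Section 4.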
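If we consider $\lambda = 0$ in (2.6), we get the estimating equations for the MLE as

$$\sum_{i=1}^{n}\bigl(\pi(x_i^T\beta) - y_i\bigr)x_i = 0_{k+1}.$$

Based on equation (2.6), we can write the estimating equation for the MDPDE under the logistic regression model as

$$\sum_{i=1}^{n}\Psi_\lambda(x_i, y_i, \beta) = 0_{k+1}, \qquad \Psi_\lambda(x_i, y_i, \beta) = \frac{e^{\lambda x_i^T\beta} + e^{x_i^T\beta}}{\bigl(1+e^{x_i^T\beta}\bigr)^{\lambda+1}}\,\bigl(\pi(x_i^T\beta) - y_i\bigr)\,x_i. \tag{2.7}$$

In order to get the asymptotic distribution of the MDPDE of $\beta$, $\hat\beta_\lambda$, we assume that the explanatory variables are random, independent and identically distributed: $X_1, \ldots, X_n$ is a random sample from a random variable $X$ with marginal distribution function $H(x)$. Following the method given in Maronna et al. (2006), the asymptotic variance-covariance matrix of $\sqrt{n}\,\hat\beta_\lambda$ is

$$\Sigma_\lambda(\beta_0) = J_\lambda^{-1}(\beta_0)\,K_\lambda(\beta_0)\,J_\lambda^{-1}(\beta_0),$$

where

$$K_\lambda(\beta) = \int_{\mathcal{X}} E\bigl[\Psi_\lambda(x, Y, \beta)\Psi_\lambda^T(x, Y, \beta)\bigr]\,dH(x), \qquad J_\lambda(\beta) = \int_{\mathcal{X}} E\left[\frac{\partial \Psi_\lambda(x, Y, \beta)}{\partial\beta^T}\right]dH(x),$$

$\mathcal{X}$ is the support of $X$, and the expectations are taken with respect to the conditional distribution of $Y$ given $X = x$.

In relation to the matrix $K_\lambda(\beta_0)$, since $E[(\pi(x^T\beta) - Y)^2 \mid X = x] = \pi(x^T\beta)(1-\pi(x^T\beta))$, we have

$$K_\lambda(\beta) = \int_{\mathcal{X}} \left(\frac{e^{\lambda x^T\beta} + e^{x^T\beta}}{(1+e^{x^T\beta})^{\lambda+1}}\right)^{2}\pi(x^T\beta)\bigl(1-\pi(x^T\beta)\bigr)\,x x^T\,dH(x).$$

An estimator of $K_\lambda(\beta)$ is obtained by replacing $H$ with $H_n(x)$, the empirical distribution function associated with the sample $x_1, \ldots, x_n$. Then

$$\hat K_\lambda(\beta) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{e^{\lambda x_i^T\beta} + e^{x_i^T\beta}}{(1+e^{x_i^T\beta})^{\lambda+1}}\right)^{2}\pi(x_i^T\beta)\bigl(1-\pi(x_i^T\beta)\bigr)\,x_i x_i^T.$$

It is interesting to observe that for $\lambda = 0$ we get

$$K_0(\beta) = \int_{\mathcal{X}} \pi(x^T\beta)\bigl(1-\pi(x^T\beta)\bigr)\,x x^T\,dH(x) = I_F(\beta),$$

with $I_F(\beta)$ being the Fisher information matrix associated with the logistic regression model.

To compute the matrix $J_\lambda(\beta)$, we differentiate $\Psi_\lambda$ with respect to $\beta$. The term that multiplies $\pi(x^T\beta) - Y$ has conditional expectation zero, and hence

$$J_\lambda(\beta) = \int_{\mathcal{X}} \frac{e^{\lambda x^T\beta} + e^{x^T\beta}}{(1+e^{x^T\beta})^{\lambda+1}}\,\pi(x^T\beta)\bigl(1-\pi(x^T\beta)\bigr)\,x x^T\,dH(x),$$

and an estimator of $J_\lambda(\beta)$ is given by

$$\hat J_\lambda(\beta) = \frac{1}{n}\sum_{i=1}^{n}\frac{e^{\lambda x_i^T\beta} + e^{x_i^T\beta}}{(1+e^{x_i^T\beta})^{\lambda+1}}\,\pi(x_i^T\beta)\bigl(1-\pi(x_i^T\beta)\bigr)\,x_i x_i^T.$$

In particular, for $\lambda = 0$, we have $J_0(\beta) = I_F(\beta)$. From the sequence of above results, the next theorem follows.

Theorem 2.2. The asymptotic distribution of the MDPDE $\hat\beta_\lambda$ is given by $\sqrt{n}\,(\hat\beta_\lambda - \beta_0) \to N\bigl(0_{k+1}, \Sigma_\lambda(\beta_0)\bigr)$ in distribution as $n \to \infty$.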
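As a rough numerical illustration of the sandwich form of the asymptotic covariance, the following sketch computes empirical versions of $J_\lambda$ and $K_\lambda$ for a model with an intercept and one covariate, and then $J^{-1}KJ^{-1}$. It assumes the estimating function $w_\lambda(s)(\pi(s)-y)x$ with $w_\lambda(s) = (e^{\lambda s}+e^{s})/(1+e^{s})^{\lambda+1}$; the helper names and the restriction to two parameters are choices made here for brevity, not part of the paper.

```python
import math

def pi(s):
    """Logistic function, stable for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def weight(s, lam):
    """Assumed DPD weight pi^lam (1 - pi) + (1 - pi)^lam pi."""
    p, q = pi(s), pi(-s)
    return p**lam * q + q**lam * p

def jk_hat(x_values, beta, lam):
    """Empirical J and K (2x2) for the model logit = beta[0] + beta[1] * x."""
    n = len(x_values)
    J = [[0.0, 0.0], [0.0, 0.0]]
    K = [[0.0, 0.0], [0.0, 0.0]]
    for x1 in x_values:
        x = (1.0, x1)
        s = beta[0] + beta[1] * x1
        v = pi(s) * pi(-s)          # Bernoulli variance pi (1 - pi)
        w = weight(s, lam)
        for a in range(2):
            for b in range(2):
                J[a][b] += w * v * x[a] * x[b] / n
                K[a][b] += w * w * v * x[a] * x[b] / n
    return J, K

def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sandwich(x_values, beta, lam):
    """Sandwich covariance J^{-1} K J^{-1} evaluated at beta."""
    J, K = jk_hat(x_values, beta, lam)
    Ji = inv2(J)
    return matmul2(matmul2(Ji, K), Ji)

xs = [-1.5, -0.5, 0.0, 0.4, 1.2, 2.0]
print(sandwich(xs, (0.2, 1.0), 0.0))
print(sandwich(xs, (0.2, 1.0), 0.5))
```

At $\lambda = 0$ the two matrices coincide, so the sandwich collapses to the inverse of the empirical Fisher information; for $\lambda > 0$ the diagonal entries are at least as large, reflecting the efficiency loss paid for robustness.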
Remark 2.1. We have assumed that the covariates are random, a crucial assumption for obtaining the asymptotic distribution of the MDPDE through the standard asymptotic theory of M-estimators. It is interesting to highlight that when the covariates are non-stochastic (fixed design case), the asymptotic distribution of the MDPDE can be obtained from Ghosh et al. (2016d) without using the standard asymptotic theory of M-estimators. In order to present the results in the most general setting, we shall assume that the random variables $Y_i$, $i = 1, \ldots, I$, are binomial with parameters $n_i$ and $\pi_i = \pi(x_i^T\beta)$ instead of Bernoulli random variables. We denote $N = \sum_{i=1}^{I} n_i$ and let $n_{i1}$ denote the observed value of $Y_i$. We assume that $I$ is fixed and, for each $i = 1, \ldots, I$, construct the independent and identically distributed latent observations $z_{i1}, \ldots, z_{in_i}$, each following a Bernoulli distribution with probability $\pi_i$, such that $n_{i1} = \sum_{j=1}^{n_i} z_{ij}$. Then the $N$ random observations $z_{11}, \ldots, z_{1n_1}, z_{21}, \ldots, z_{2n_2}, \ldots, z_{I1}, \ldots, z_{In_I}$ are independent but have possibly different distributions, with $z_{ij} \sim \mathrm{Ber}(\pi_i)$. This falls under the general setup of independent but non-homogeneous observations considered in Ghosh and Basu (2013), and hence the corresponding estimating equations for the MDPDE, $\hat\beta^*_\lambda$ in this context, for $\lambda > 0$ are given by

$$\sum_{i=1}^{I} \frac{e^{\lambda x_i^T\beta} + e^{x_i^T\beta}}{\bigl(1+e^{x_i^T\beta}\bigr)^{\lambda+1}}\,\bigl(n_i\,\pi(x_i^T\beta) - n_{i1}\bigr)\,x_i = 0_{k+1}. \tag{2.12}$$

Following Ghosh and Basu (2013), the MDPDE $\hat\beta^*_\lambda$ is asymptotically normal, with an asymptotic covariance determined by two matrices $J^*(\beta_0)$ and $K^*(\beta_0)$. These matrices can be obtained directly from the general results of Ghosh and Basu (2013) or from the simplified results available for Bernoulli logistic regression with fixed design. For $\lambda = 0$ it is clear, based on (2.12), that we get the classical maximum likelihood estimator. In this situation $J^*(\beta_0)$ and $K^*(\beta_0)$ coincide, and we recover the classical asymptotic result for the MLE.

Wald type test statistic for testing linear hypothesis
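Based on the asymptotic distribution of $\hat\beta_\lambda$ we define a family of Wald-type test statistics for testing the null hypothesis

$$H_0 : M^T\beta = m, \tag{3.1}$$

where $M^T$ is any matrix of $r$ rows and $k+1$ columns and $m$ is a vector of order $r$ of specified constants. We assume that the matrix $M^T$ has full row rank, i.e., $\mathrm{rank}(M) = r$.
Definition 3.1. Let $\hat\beta_\lambda$ be the minimum density power divergence estimator. The family of Wald-type test statistics for testing the null hypothesis given in (3.1) is given by

$$W_n = n\,\bigl(M^T\hat\beta_\lambda - m\bigr)^T\bigl[M^T\Sigma_\lambda(\hat\beta_\lambda)M\bigr]^{-1}\bigl(M^T\hat\beta_\lambda - m\bigr). \tag{3.2}$$

In the particular case of $\lambda = 0$, i.e. when $\hat\beta_\lambda$ is the MLE, we get the classical Wald test statistic, because in this case $\Sigma_0(\beta) = I_F^{-1}(\beta)$.

Theorem 3.1. The asymptotic distribution of the Wald-type test statistic $W_n$ defined in (3.2), under the null hypothesis given in (3.1), is a chi-square distribution with $r$ degrees of freedom.
Proof. Under the null hypothesis, $\sqrt{n}\,(M^T\hat\beta_\lambda - m)$ is asymptotically normal with mean vector zero and variance-covariance matrix $M^T\Sigma_\lambda(\beta_0)M$, and $\Sigma_\lambda(\hat\beta_\lambda)$ is a consistent estimator of $\Sigma_\lambda(\beta_0)$. Since the standardized vector $[M^T\Sigma_\lambda(\beta_0)M]^{-1/2}\sqrt{n}\,(M^T\hat\beta_\lambda - m)$ has asymptotic variance-covariance matrix $I_{r\times r}$, the asymptotic distribution of $W_n$ is a chi-square distribution with $r$ degrees of freedom.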
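To make the use of Theorem 3.1 concrete, here is a minimal sketch for a single linear restriction ($r = 1$), in which the quadratic form reduces to a ratio of scalars. It assumes the Wald-type statistic has the usual form $W_n = n\,(M^T\hat\beta - m)^2 / (M^T\hat\Sigma M)$; the function name, argument names and the numbers in the example call are hypothetical, and the estimate and covariance matrix are taken as given inputs.

```python
import math

def wald_type_test(beta_hat, sigma_hat, n, m_vec, m0):
    """Wald-type statistic and asymptotic p-value for one restriction.

    beta_hat  : estimated coefficients (list of length k + 1)
    sigma_hat : estimated asymptotic covariance of sqrt(n) * beta_hat
    n         : sample size
    m_vec, m0 : the restriction m_vec . beta = m0
    """
    k = len(beta_hat)
    diff = sum(m_vec[i] * beta_hat[i] for i in range(k)) - m0
    var = sum(m_vec[i] * sigma_hat[i][j] * m_vec[j]
              for i in range(k) for j in range(k))
    w = n * diff * diff / var
    # For one degree of freedom, P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical inputs: test H0: beta_1 = 0 in a model with two covariates.
sigma = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
print(wald_type_test([0.1, 0.8, 1.2], sigma, 100, [0.0, 1.0, 0.0], 0.0))
```

The same function covers the classical Wald test and the robust versions: only the estimate and the covariance matrix passed in depend on $\lambda$.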

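Remark 3.1. If we consider

$$M^T = \bigl(0_{k\times 1},\; I_{k\times k}\bigr) \tag{3.3}$$

and $m = 0_k$, then $M^T\beta = m$ if and only if $\beta_i = 0$, $i = 1, \ldots, k$. Therefore, we can consider the Wald-type test statistics with $M^T$ defined in (3.3) for testing $H_0 : \beta_1 = \cdots = \beta_k = 0$. In this case, the asymptotic distribution of the Wald-type test statistic is a chi-square distribution with $k$ degrees of freedom. If we consider $M^T$ to be a vector with all elements equal to zero except for the $(i+1)$-th term, which equals 1, we can test $H_0 : \beta_i = 0$.

Based on the previous theorem, the null hypothesis given in (3.1) will be rejected if

$$W_n > \chi^2_{r,\alpha}, \tag{3.4}$$

where $\chi^2_{r,\alpha}$ is the quantile of order $1-\alpha$ of a chi-square distribution with $r$ degrees of freedom.

Let us consider $\beta^* \in \Theta$ such that $M^T\beta^* \neq m$, i.e., $\beta^*$ does not belong to the null hypothesis. We denote

$$\ell^*(\beta_1, \beta_2) = \bigl(M^T\beta_1 - m\bigr)^T\bigl[M^T\Sigma_\lambda(\beta_2)M\bigr]^{-1}\bigl(M^T\beta_1 - m\bigr),$$

and we derive an approximation to the power function of the test given in (3.4).

Theorem 3.2. Let $\beta^*$ be the true value of the parameter, with $M^T\beta^* \neq m$. The power function of the test given in (3.4) can be approximated by

$$\pi(\beta^*) = 1 - \Phi_n\left(\frac{\sqrt{n}}{\sigma(\beta^*)}\left(\frac{\chi^2_{r,\alpha}}{n} - \ell^*(\beta^*, \beta^*)\right)\right), \tag{3.5}$$

where $\Phi_n(x)$ tends uniformly to the standard normal distribution function $\Phi(x)$ and $\sigma(\beta^*)$ is given by

$$\sigma^2(\beta^*) = \left.\frac{\partial \ell^*(\beta, \beta^*)}{\partial\beta^T}\right|_{\beta=\beta^*}\Sigma_\lambda(\beta^*)\left.\frac{\partial \ell^*(\beta, \beta^*)}{\partial\beta}\right|_{\beta=\beta^*}, \qquad \left.\frac{\partial \ell^*(\beta, \beta^*)}{\partial\beta}\right|_{\beta=\beta^*} = 2M\bigl[M^T\Sigma_\lambda(\beta^*)M\bigr]^{-1}\bigl(M^T\beta^* - m\bigr).$$

Proof. We need the asymptotic distribution of the random variable $W_n/n = \ell^*(\hat\beta_\lambda, \hat\beta_\lambda)$. Since $\Sigma_\lambda(\hat\beta_\lambda)$ converges in probability to $\Sigma_\lambda(\beta^*)$, the variables $\ell^*(\hat\beta_\lambda, \hat\beta_\lambda)$ and $\ell^*(\hat\beta_\lambda, \beta^*)$ have the same asymptotic distribution. A first order Taylor expansion of $\ell^*(\hat\beta_\lambda, \beta^*)$ around $\beta^*$ gives $\sqrt{n}\,\bigl(\ell^*(\hat\beta_\lambda, \beta^*) - \ell^*(\beta^*, \beta^*)\bigr) \to N\bigl(0, \sigma^2(\beta^*)\bigr)$ in distribution. Now the result follows.

Remark 3.2.
Based on the previous theorem we can obtain the sample size necessary to get a fixed power $\pi(\beta^*) = \pi_0$. From (3.5), we must solve the equation

$$\pi_0 = 1 - \Phi\left(\frac{\sqrt{n}}{\sigma(\beta^*)}\left(\frac{\chi^2_{r,\alpha}}{n} - \ell^*(\beta^*, \beta^*)\right)\right)$$

and, for $\pi_0 > 1/2$, we get that $n = [n^*] + 1$ with

$$n^* = \frac{A + B + \sqrt{A(A+2B)}}{2\,\ell^*(\beta^*, \beta^*)^{2}}, \qquad A = \sigma^2(\beta^*)\bigl(\Phi^{-1}(1-\pi_0)\bigr)^2, \qquad B = 2\,\ell^*(\beta^*, \beta^*)\,\chi^2_{r,\alpha}.$$

In the following theorem we present an approximation to the power function at the contiguous alternative hypotheses $\beta_n = \beta_0 + n^{-1/2}d$, where $d$ is a fixed vector and $\beta_0$ satisfies $M^T\beta_0 = m$.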

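Theorem 3.3. An approximation of the power function for the test statistic given in (3.4), at $\beta_n = \beta_0 + n^{-1/2}d$, is given by

$$\pi(\beta_n) = 1 - F_{\chi^2_r(\delta)}\bigl(\chi^2_{r,\alpha}\bigr),$$

where $F_{\chi^2_r(\delta)}$ is the distribution function of a non-central chi-square with $r$ degrees of freedom and non-centrality parameter $\delta$ given by $\delta = d^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T d$.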
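The sample-size rule of Remark 3.2 lends itself to a short numerical sketch. The code below assumes that the power at a fixed alternative is approximated by $1 - \Phi\bigl(\sqrt{n}\,(\chi^2_{r,\alpha}/n - \ell)/\sigma\bigr)$, where $\ell > 0$ is the population value of $W_n/n$ at the alternative and $\sigma$ its asymptotic standard deviation; both are treated here as known inputs, and all names and example numbers are hypothetical. Solving the resulting quadratic in $\sqrt{n}$ gives the closed form used in `sample_size`.

```python
import math
from statistics import NormalDist

def approx_power(n, ell, sigma, crit):
    """Assumed power approximation 1 - Phi( sqrt(n) * (crit / n - ell) / sigma )."""
    z = math.sqrt(n) / sigma * (crit / n - ell)
    return 1.0 - NormalDist().cdf(z)

def sample_size(power, ell, sigma, crit):
    """Smallest integer n = [n*] + 1 whose approximate power exceeds `power`.

    n* is the positive root of ell * t^2 + sigma * z0 * t - crit = 0 in
    t = sqrt(n), with z0 = Phi^{-1}(1 - power). For power > 1/2 this equals
    (A + B + sqrt(A (A + 2B))) / (2 ell^2) with A = (sigma z0)^2, B = 2 ell crit.
    """
    z0 = NormalDist().inv_cdf(1.0 - power)
    a = (sigma * z0) ** 2
    b = 2.0 * ell * crit
    n_star = (a + b - sigma * z0 * math.sqrt(a + 2.0 * b)) / (2.0 * ell ** 2)
    return math.floor(n_star) + 1

crit_1df = 3.841458820694124   # 0.95 quantile of a chi-square with 1 df
n = sample_size(0.9, 0.1, 0.8, crit_1df)
print(n, approx_power(n, 0.1, 0.8, crit_1df))
```

In practice $\ell$ and $\sigma$ depend on the unknown alternative and on $\Sigma_\lambda$, so a calculation like this is best read as a planning device under a guessed alternative rather than an exact answer.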

Influence function of the MDPDE
We will use the influence function analysis of Hampel et al. (1986) to study the robustness of our proposed MDPDE and of the corresponding Wald-type test of the general linear hypothesis in the logistic regression model. Since the MDPDE can be written in terms of an M-estimator, as shown in Section 2, with ψ-function given by (2.7), we can directly apply the results of the M-estimation theory of Hampel et al. (1986) in order to get the influence function of the proposed MDPDE.
However, we first need to re-define the minimum density power divergence estimator $\hat\beta_\lambda$ from Definition 2.1 in terms of a statistical functional. Let us assume the stochastic nature of the covariates $X$ and that the observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. with some joint distribution $G$. Then we define the required statistical functional corresponding to $\hat\beta_\lambda$ as follows.
Definition 4.1. The minimum DPD functional $T_\lambda(G)$, corresponding to the minimum DPD estimator $\hat\beta_\lambda$, at the joint distribution $G$ is defined as the solution of the system of equations

$$E_G\bigl[\Psi_\lambda(X, Y, \beta)\bigr] = 0_{k+1}$$

with respect to $\beta$, whenever the solution exists.
If $G_0$ denotes the joint model distribution with true parameter value $\beta_0$, then $E_{G_0}[\pi(X^T\beta_0) - Y \mid X] = 0$, so $\beta_0$ solves this system and $T_\lambda(G_0) = \beta_0$. Therefore, the minimum DPD functional $T_\lambda$ is Fisher consistent.
Next, we can easily obtain the influence function for our MDPDE at the model distribution G 0 as presented in the following theorem. This can be derived either through a straightforward calculation or by applying the corresponding results from M-estimation theory of Hampel et al. (1986) and hence the proof of the theorem is omitted.
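Theorem 4.1. The influence function of the minimum DPD functional $T_\lambda$, as defined in Definition 4.1 with tuning parameter $\lambda$, at the model distribution $G_0$ is given by

$$IF\bigl((x_t, y_t), T_\lambda, G_0\bigr) = \frac{e^{\lambda x_t^T\beta_0} + e^{x_t^T\beta_0}}{\bigl(1+e^{x_t^T\beta_0}\bigr)^{\lambda+1}}\,\bigl(y_t - \pi(x_t^T\beta_0)\bigr)\,J_\lambda^{-1}(\beta_0)\,x_t,$$

where $J_\lambda(\beta_0)$ is as defined in Section 2 of the paper and $(x_t, y_t)$ is the point of contamination.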
Before studying the above influence function, let us first recall different types of outliers in logistic regression model following the discussion in Croux and Haesbroeck (2003). A contamination point (x t , y t ) will be a leverage point if x t is outlying in the covariates space and will be a vertical outlier (in response) if it is not a leverage point but the residual y t − π(x T t β) is large. Croux and Haesbroeck (2003) also noted that, for the maximum likelihood estimator of β, a vertical outlier or a "good" leverage point (for which the residual is small) has bounded influence whereas a bad leverage point (e.g., misclassified observation etc.) has infinite influence for ||x t || → ∞.
Next, in order to study the nature of the influence function of the MDPDE for different $\lambda$, note that the influence function given in Theorem 4.1 can be factored into two components: a scalar part $\Psi_\lambda$, which depends on the score $s = x_t^T\beta_0$ and the response $y_t$, and a vector part which is linear in the covariate $x_t$.

Figure 1 shows the nature of this function over the score input at $y = 0, 1$ for different values of $\lambda$. Clearly, the function $\Psi_\lambda$ corresponding to $\lambda = 0$ (MLE) is unbounded as $s \to \infty$, illustrating the well-known non-robust nature of the MLE. However, for $\lambda > 0$ the function $\Psi_\lambda$ is bounded in $s$ and becomes more redescending as $\lambda$ increases, which implies the increasing robustness of our proposed MDPDEs with increasing $\lambda > 0$. Further, to examine the effect of different types of leverage points more clearly, following Croux and Haesbroeck (2003), in Figure 2 we present the influence function of the MDPDE of the first slope parameter $\beta_1$ over the covariate values in a logistic regression model with two independent standard normal covariates and $\beta_0 = (0, 1, 1)^T$, fixing $y_t = 0$ (without loss of generality). We can see that when both covariates tend to $-\infty$ the influence function becomes zero for all MDPDEs, including the MLE (at $\lambda = 0$). These are the "good" leverage points, as noted in Croux and Haesbroeck (2003), and all MDPDEs are robust with respect to such good leverage points, as is the MLE. However, when the covariates approach $\infty$ they yield bad leverage points (generally corresponding to misclassified points) and have large influence on the MLE ($\lambda = 0$). In this case the influence function values of the MDPDEs with $\lambda > 0$ are quite small even for these bad leverage points and get progressively smaller as $\lambda$ increases. This phenomenon again indicates the increasing robustness of our proposed MDPDEs with larger positive $\lambda$.
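The contrast between good and bad leverage points can be checked numerically with a simple proxy. The sketch below does not reproduce the function plotted in Figure 1; it assumes the estimating function $w_\lambda(s)(\pi(s) - y)x$ with $w_\lambda(s) = (e^{\lambda s}+e^{s})/(1+e^{s})^{\lambda+1}$ and, for a contamination point moving along the direction of $\beta_0$ (so that $\|x_t\|$ grows like $|s|$), tracks the size $|s|\,w_\lambda(s)\,|y - \pi(s)|$. The name `leverage_proxy` is hypothetical.

```python
import math

def pi(s):
    """Logistic function, stable for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def weight(s, lam):
    """Assumed DPD weight pi^lam (1 - pi) + (1 - pi)^lam pi, with 1 - pi(s) = pi(-s)."""
    p, q = pi(s), pi(-s)
    return p**lam * q + q**lam * p

def leverage_proxy(s, y, lam):
    """Size of one observation's estimating-function contribution when the
    covariate norm grows proportionally to |s| (a proxy, not the paper's Psi)."""
    residual = pi(-s) if y == 1 else pi(s)   # |y - pi(s)| for y in {0, 1}
    return abs(s) * weight(s, lam) * residual

for lam in (0.0, 0.1, 0.5, 1.0):
    # y = 0 with a large positive score is a bad leverage point;
    # y = 0 with a large negative score is a good leverage point.
    print(lam, leverage_proxy(20.0, 0, lam), leverage_proxy(-20.0, 0, lam))
```

Under these assumptions the proxy grows linearly in $s$ for $\lambda = 0$ at a bad leverage point, while for any $\lambda > 0$ it decays like $s\,e^{-\lambda s}$; at good leverage points it vanishes for every $\lambda$.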
Remark 4.1. Under the setup of Remark 2.1, even when the covariates are non-stochastic, we can derive the influence function of the corresponding MDPDE, $\hat\beta^*_\lambda$, following Ghosh and Basu (2013). Whenever the covariates $x_i$ are fixed, the contamination needs to be considered over the conditional distribution of the response given the covariates, which is not identical across the groups with given fixed covariates. Hence, as in Ghosh and Basu (2013), we can consider contamination in any one group or in all the groups. This leads to one influence function of $\hat\beta^*_\lambda$ under contamination only in one group (the $i_0$-th, say) with covariate $x_{i_0}$, and to another under contamination in all the groups with covariates $x_1, \ldots, x_I$, respectively, at the contamination points $y_{t_1}, \ldots, y_{t_I}$. Note that, since the response in a logistic regression takes only the values 0 and 1, the contamination points $y_{t_i}$ all take values in $\{0, 1\}$ (misclassification errors), and hence all these influence functions are bounded with respect to contamination in the response for all $\lambda \geq 0$. Hence, the effect of these (misclassification) errors in the response cannot be clearly inferred from these influence functions alone; see Pregibon (1982), Copas (1988) and Victoria-Feser (2000) for more examples of such analyses of misclassification error in logistic regression with a fixed design. However, these influence functions are bounded in the values of the given fixed covariates only for $\lambda > 0$, implying the robustness of the MDPDEs with $\lambda > 0$ and the non-robust nature of the MLE (at $\lambda = 0$) with respect to extreme values of the fixed design in any one group.

Influence function of the Wald-Type test statistics
We will now study the robustness of the proposed Wald-type test of Section 3 through the influence function of the corresponding test statistic $W_n$ given in Definition 3.1. Ignoring the multiplier $n$, let us define the associated statistical functional for the test statistic $W_n$, evaluated at any joint distribution $G$, as

$$W_\lambda(G) = \bigl(M^T T_\lambda(G) - m\bigr)^T\bigl[M^T\Sigma_\lambda(T_\lambda(G))M\bigr]^{-1}\bigl(M^T T_\lambda(G) - m\bigr).$$

Now, considering the $\varepsilon$-contaminated joint distribution $G_\varepsilon = (1-\varepsilon)G + \varepsilon\wedge_w$, where $\wedge_w$ is the point mass contamination distribution at the contamination point $w = (x_t, y_t)$, the influence function of $W_\lambda(\cdot)$ is defined as

$$IF(w, W_\lambda, G) = \left.\frac{\partial}{\partial\varepsilon}W_\lambda(G_\varepsilon)\right|_{\varepsilon=0}.$$

Now, assuming the null hypothesis to be true, let $G_0$ denote the joint model distribution with true parameter value $\beta_0$ satisfying $M^T\beta_0 = m$. Then, under $G_0$, we have $T_\lambda(G_0) = \beta_0$ and hence $IF(w, W_\lambda, G_0) = 0$. Therefore, the first order influence function analysis is not adequate to quantify the robustness of the proposed Wald-type test statistics $W_\lambda$. It is bounded in the contamination points $w = (x_t, y_t)$ for all $\lambda \geq 0$, but this does not imply the robustness of the tests, since the family includes the well-known non-robust MLE based Wald test at $\lambda = 0$. This fact is consistent with the robustness analysis of other Wald-type tests under different setups (see, for example, Rousseeuw and Ronchetti, 1979; Toma and Broniatowski, 2011; Ghosh et al., 2016b), and we need to consider the second order influence analysis to assess the robustness of $W_\lambda$. The second order influence function of the Wald-type test statistic $W_n$ at the joint distribution $G$ is defined as

$$IF_2(w, W_\lambda, G) = \left.\frac{\partial^2}{\partial\varepsilon^2}W_\lambda(G_\varepsilon)\right|_{\varepsilon=0}.$$

Again, under the null hypothesis $H_0$ with $\beta_0$ being the corresponding true parameter value, this second order influence function simplifies further, as presented in the following theorem, and makes it possible to study the robustness of our proposed tests through its boundedness.
Theorem 4.2. The second order influence function of the proposed Wald-type test statistic $W_n$, given in Definition 3.1, at the null model distribution $G_0$ having true parameter value $\beta_0$ is given by

$$IF_2(w, W_\lambda, G_0) = 2\,IF(w, T_\lambda, G_0)^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T\,IF(w, T_\lambda, G_0).$$

Note that the influence function of the Wald-type test statistic is directly a quadratic function of the influence function of the corresponding MDPDE. Hence, as described in the previous subsection, the influence function of the proposed tests with $\lambda > 0$ will be small and bounded for all kinds of outliers in a logistic regression model, whereas the classical MLE based Wald test will have an unbounded influence function for large "bad" leverage points. Figure 3 shows the plots of these second order influence functions of the Wald-type test statistics for different $\lambda$, for testing the significance of the first slope parameter in a logistic regression model with two independent standard normal covariates and $\beta_0 = (0, 1, 1)^T$, fixing $y_t = 0$. The behavior of the influence functions is again similar to that observed for the corresponding MDPDE in Figure 2, which shows the greater robustness of our proposal at larger positive $\lambda$ over the non-robust MLE based Wald test at $\lambda = 0$.

Level and power influence functions
We now study the robustness of the proposed tests through the stability of their Type-I and Type-II errors, which are the two basic components for measuring the performance of any testing procedure. In particular, we will study the local stability of the level and power of the proposed tests through the corresponding influence function analysis. Note that the finite sample level and power of our proposed Wald-type tests are difficult to compute and have no general form; on the other hand, the tests are consistent, having asymptotic power equal to one against any fixed alternative. So, we will study the influence function of the asymptotic level under the null $\beta = \beta_0$ and of the asymptotic power under the sequence of contiguous alternatives $\beta_n = \beta_0 + n^{-1/2}d$, as defined in, for example, Hampel et al. (1986) and Ghosh et al. (2016b) among others. In particular, assuming that the contamination proportion tends to zero at the same rate at which the contiguous alternatives approach the null, we consider the following contaminated joint distribution for the power stability calculation:

$$G^P_{n,\varepsilon,w} = \left(1 - \frac{\varepsilon}{\sqrt{n}}\right)G_{\beta_n} + \frac{\varepsilon}{\sqrt{n}}\wedge_w, \tag{4.2}$$

where $w$ denotes the contamination point $w = (x_t^T, y_t)^T$, and $G_{\beta_n}$ denotes the joint model distribution with true parameter value $\beta = \beta_n$. The contaminated distribution to be considered for the level stability check can be obtained by substituting $d = 0$ in (4.2), which yields

$$G^L_{n,\varepsilon,w} = \left(1 - \frac{\varepsilon}{\sqrt{n}}\right)G_{\beta_0} + \frac{\varepsilon}{\sqrt{n}}\wedge_w.$$

Then, the level and power influence functions are defined in terms of the following quantities:

$$\alpha(\varepsilon, w) = \lim_{n\to\infty}P_{G^L_{n,\varepsilon,w}}\bigl(W_n > \chi^2_{r,\alpha}\bigr), \qquad \pi(\beta_n, \varepsilon, w) = \lim_{n\to\infty}P_{G^P_{n,\varepsilon,w}}\bigl(W_n > \chi^2_{r,\alpha}\bigr).$$

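Definition 4.2. The level influence function (LIF) and the power influence function (PIF) for the Wald-type test statistics $W_n$ are defined respectively as

$$LIF(w; W_n, G_0) = \left.\frac{\partial}{\partial\varepsilon}\alpha(\varepsilon, w)\right|_{\varepsilon=0}, \qquad PIF(w; W_n, G_0) = \left.\frac{\partial}{\partial\varepsilon}\pi(\beta_n, \varepsilon, w)\right|_{\varepsilon=0}.$$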
See Ghosh et al. (2016b) for an extensive discussion on the interpretations of the level and power influence functions and their relations with the influence function of the test statistics in the context of a general Wald-type test.
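Next, we will derive the forms of the LIF and PIF for our proposed tests in the logistic regression model, assuming that the conditions required for the derivation of the asymptotic distribution of the MDPDE hold.

Theorem 4.3. Under the conditions stated above, we have the following.

(i) The asymptotic distribution of the test statistic $W_n$ under $G^P_{n,\varepsilon,w}$ is non-central chi-square with $r$ degrees of freedom and non-centrality parameter

$$\delta = d_{\varepsilon,w,\lambda}(\beta_0)^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T d_{\varepsilon,w,\lambda}(\beta_0), \qquad d_{\varepsilon,w,\lambda}(\beta_0) = d + \varepsilon\,IF(w, T_\lambda, G_0).$$

(ii) The asymptotic power under $G^P_{n,\varepsilon,w}$ can be approximated as

$$\pi(\beta_n, \varepsilon, w) \cong P\bigl(\chi^2_r(\delta) > \chi^2_{r,\alpha}\bigr) = \sum_{v=0}^{\infty}\frac{e^{-\delta/2}(\delta/2)^v}{v!}\,P\bigl(\chi^2_{r+2v} > \chi^2_{r,\alpha}\bigr),$$

where $\chi^2_p(\delta)$ denotes a non-central chi-square random variable with $p$ degrees of freedom and non-centrality parameter $\delta$, and $\chi^2_q = \chi^2_q(0)$ denotes a central chi-square random variable with $q$ degrees of freedom.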

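Proof. Let $\beta^*_n = T_\lambda(G^P_{n,\varepsilon,w})$. A Taylor expansion of $T_\lambda$ around $G_{\beta_n}$ gives $\sqrt{n}\,(\beta^*_n - \beta_0) = d + \varepsilon\,IF(w, T_\lambda, G_0) + o(1)$, while $\sqrt{n}\,(\hat\beta_\lambda - \beta^*_n)$ is asymptotically normal with mean zero and variance-covariance matrix $\Sigma_\lambda(\beta_0)$ under $G^P_{n,\varepsilon,w}$. Therefore $\sqrt{n}\,(M^T\hat\beta_\lambda - m)$ is asymptotically normal with mean $M^T d_{\varepsilon,w,\lambda}(\beta_0)$ and variance-covariance matrix $M^T\Sigma_\lambda(\beta_0)M$, and hence we get that $W_n$ converges in distribution to $\chi^2_r(\delta)$, where $\delta$ is as defined in Part (i) of the theorem. Part (ii) of the theorem follows from Part (i) using the infinite series expansion of a non-central chi-square distribution function in terms of that of the central chi-square variables:

$$P\bigl(\chi^2_r(\delta) > x\bigr) = \sum_{v=0}^{\infty}\frac{e^{-\delta/2}(\delta/2)^v}{v!}\,P\bigl(\chi^2_{r+2v} > x\bigr).$$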
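The series expansion used in Part (ii) is easy to evaluate numerically. The sketch below computes the upper tail of a non-central chi-square as a Poisson mixture of central chi-square tails, using only the standard library; the truncation at 200 terms and the helper names `chi2_sf` and `ncx2_sf` are choices made for this illustration, and the central tail is obtained from a power series for the regularized incomplete gamma function, which is adequate for moderate arguments.

```python
import math

def chi2_sf(c, df):
    """P(chi2_df > c) via the series for the regularized lower incomplete gamma."""
    if c <= 0:
        return 1.0
    a, x = df / 2.0, c / 2.0
    # First term x^a e^{-x} / Gamma(a + 1); each next term multiplies by x / (a + n).
    term = math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))
    total = term
    n = 1
    while term > 1e-17 * total and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    return max(0.0, 1.0 - total)

def ncx2_sf(c, df, delta, terms=200):
    """P(chi2_df(delta) > c) as a Poisson(delta / 2) mixture of central tails."""
    if delta <= 0:
        return chi2_sf(c, df)
    log_half = math.log(delta / 2.0)
    total = 0.0
    for v in range(terms):
        log_weight = -delta / 2.0 + v * log_half - math.lgamma(v + 1.0)
        total += math.exp(log_weight) * chi2_sf(c, df + 2 * v)
    return total

crit = -2.0 * math.log(0.05)          # exact 0.95 quantile for 2 degrees of freedom
for delta in (0.0, 1.0, 4.0, 9.0):
    print(delta, ncx2_sf(crit, 2, delta))
```

With $\delta = 0$ the function returns the nominal level, and the tail probability increases with $\delta$, which is how the contiguous power in Theorem 4.3 responds to a larger non-centrality parameter.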

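Corollary 4.1. Putting $\varepsilon = 0$ in Theorem 4.3, we get the asymptotic power of the proposed Wald-type tests under the contiguous alternative hypotheses $\beta_n = \beta_0 + n^{-1/2}d$ as

$$\pi(\beta_n) = \pi(\beta_n, 0, w) \cong P\bigl(\chi^2_r(\delta_0) > \chi^2_{r,\alpha}\bigr), \qquad \delta_0 = d^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T d.$$

This is identical to the result obtained earlier, independently, in Theorem 3.3.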

Corollary 4.2.
Putting $d = 0$ in Theorem 4.3, we get the asymptotic distribution of $W_n$ under $G^L_{n,\varepsilon,w}$ as the non-central chi-square distribution having $r$ degrees of freedom and non-centrality parameter

$$\delta_L = \varepsilon^2\,IF(w, T_\lambda, G_0)^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T\,IF(w, T_\lambda, G_0).$$

Then, the asymptotic level under contiguous contamination is given by

$$\alpha(\varepsilon, w) = P\bigl(\chi^2_r(\delta_L) > \chi^2_{r,\alpha}\bigr).$$

In particular, as $\varepsilon \to 0$, $\beta^*_n \to \beta_0$ and the non-centrality parameter of the above asymptotic distribution tends to zero, leading to the null distribution of $W_n$. Now we can easily obtain the power and level influence functions of the Wald-type test statistics from Theorem 4.3 and Corollary 4.2, and these are presented in the following theorem.

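Theorem 4.4. Under the assumptions of Theorem 4.3, the power and level influence functions of the proposed Wald-type test statistic $W_n$ are given by

$$PIF(w; W_n, G_0) = 2\left.\frac{\partial}{\partial\delta}P\bigl(\chi^2_r(\delta) > \chi^2_{r,\alpha}\bigr)\right|_{\delta=\delta_0} d^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T\,IF(w, T_\lambda, G_0), \qquad LIF(w; W_n, G_0) = 0,$$

where $\delta_0 = d^T M\bigl[M^T\Sigma_\lambda(\beta_0)M\bigr]^{-1}M^T d$.
Further, the derivative of $\alpha(\varepsilon, w)$ of any order with respect to $\varepsilon$ will be zero at $\varepsilon = 0$, implying that the level influence function of any order will be zero.
Proof. We start with the expression of $\pi(\beta_n, \varepsilon, w)$ from Theorem 4.3. By the definition of the PIF and using the chain rule of derivatives, the PIF is the derivative of $P(\chi^2_r(\delta) > \chi^2_{r,\alpha})$ with respect to $\delta$, multiplied by the derivative of $\delta$ with respect to $\varepsilon$, both evaluated at $\varepsilon = 0$. Now $d_{0,w,\lambda}(\beta_0) = d$, and standard differentiation gives $\partial d_{\varepsilon,w,\lambda}(\beta_0)/\partial\varepsilon = IF(w, T_\lambda, G_0)$, so that the derivative of $\delta$ at $\varepsilon = 0$ equals $2\,d^T M[M^T\Sigma_\lambda(\beta_0)M]^{-1}M^T\,IF(w, T_\lambda, G_0)$. Combining the above results and simplifying, we get the required expression of the PIF as presented in the theorem.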
It is clear from the above theorem that, the asymptotic level of the proposed Wald-type test statistic will be unaffected by a contiguous contamination for any values of the tuning parameter λ, whereas the power influence function will be bounded whenever the influence function of the MDPDE is bounded (which happens for all λ > 0). Thus, the robustness of the power of the proposed tests again turns out to be directly dependent on the robustness of the MDPDE β λ used in constructing the test. In particular, the asymptotic contiguous power of the classical MLE based Wald-type test (at λ = 0) will be non-robust whereas that for the Wald-type tests with λ > 0 will be robust under contiguous contamination and this robustness increases as λ increases further.

Simulation study
In this section we empirically demonstrate some of the strong robustness properties of the density power divergence tests for the logistic regression model. We considered two explanatory variables $x_1$ and $x_2$ in this study, so $k = 2$. These two variables are distributed according to a standard normal distribution $N(0, I_{2\times 2})$. The response variables $Y_i$ are generated following the logit model as given in (1.1). The true value of the parameter is taken as $\beta_0 = (0, 1, 1)^T$. We considered the null hypothesis $H_0 : (\beta_1, \beta_2)^T = (1, 1)^T$. It can be written in the form of the general hypothesis given in (1.3) with $M^T = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and $m = (1, 1)^T$. Our interest was in studying the observed level (measured as the proportion of test statistics exceeding the corresponding chi-square critical value in a large number, here 1000, of replications) of the test under the correct null hypothesis. The result is given in Figure 4(a), where the sample size $n$ varies from 20 to 100. We have used several Wald-type test statistics, corresponding to different minimum density power divergence estimators, with $\lambda = 0, 0.1, 0.5$ and $1$ in this particular study. As previously mentioned, $\lambda = 0$ gives the classical Wald test for the logistic regression model. The horizontal line in the figure represents the nominal level of 0.05. It may be noticed that all the tests are slightly conservative for small sample sizes and lead to somewhat deflated observed levels. In particular, the Wald-type tests with higher values of $\lambda$ are relatively more conservative. However, this discrepancy decreases rapidly as the sample size increases.
To evaluate the stability of the level of the tests under contamination, we repeated the tests for the same null hypothesis after adding 3% outliers to the data. For the outlying observations we first introduced leverage points, where $x_1$ and $x_2$ are generated from $N(\mu_c, \sigma I_{2\times 2})$ with $\mu_c = (5, 5)^T$ and $\sigma = 0.01$. Then the values of the response variable corresponding to those leverage points were altered to produce vertical outliers ($y_t = 1$ was converted to $y_t = 0$). Figure 4(b) shows that the levels of the classical Wald test as well as of the DPD(0.1) test break down, whereas the Wald-type test statistics for $\lambda = 0.5$ and $\lambda = 1$ present highly stable levels.
To investigate the power of the tests we changed the null hypothesis to $H_0^* : (\beta_1, \beta_2)^T = (0, 0)^T$, and kept the data generating distributions as before, as well as the true value of the parameter, $\beta_0 = (0, 1, 1)^T$. In terms of the null hypothesis in (1.3), the value of $m$ is changed to $(0, 0)^T$ whereas $M$ remains unchanged from the previous experiment. The empirical power functions are calculated in the same manner as the levels of the tests, and are plotted in Figure 4(c). The Wald test is the most powerful under pure data. The power of the Wald-type test statistic for $\lambda = 0.1$ almost coincides with that of the classical Wald test in this case. The performances of the Wald-type test statistics for $\lambda = 0.5$ and $\lambda = 1$ are relatively poor for small samples; however, as the sample size increases to 60 and beyond, the powers are practically identical.
Finally, we calculated the power functions under contamination for the above hypothesis under the same setup as that of the level contamination. The observed powers of the tests are given in Figure 4(d). The Wald-type test statistics for $\lambda = 0.5$ and $\lambda = 1$ show stable powers under contamination, but the classical Wald test and the Wald-type test for $\lambda = 0.1$ exhibit a drastic loss in power. For very small sample sizes the classical Wald test and the Wald-type test for $\lambda = 0.1$ have slightly higher power than the other tests, but this advantage quickly disappears with increasing sample size. On the whole, the proposed Wald-type test statistics corresponding to moderately large $\lambda$ appear to be quite competitive with the classical Wald test for pure data, but they are far better in terms of robustness properties under contaminated data.
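The contaminated design described above can be mimicked with a short data generator. This is a sketch under stated assumptions rather than the authors' simulation code: it replaces the last 3% of the sample by outliers instead of appending them, it reads $\sigma = 0.01$ as a variance (standard deviation 0.1), and it sets the response of every outlier to 0; the function name `simulate` and its arguments are hypothetical.

```python
import math
import random

def simulate(n, beta=(0.0, 1.0, 1.0), contam=0.0, seed=0):
    """Generate (x1, x2, y) triples from the logit model, with a fraction
    `contam` of bad leverage points centred at (5, 5) and y forced to 0."""
    rng = random.Random(seed)
    n_out = int(round(contam * n))
    data = []
    for i in range(n):
        if i < n - n_out:
            # Clean observation: standard normal covariates, Bernoulli response.
            x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            s = beta[0] + beta[1] * x1 + beta[2] * x2
            y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-s)) else 0
        else:
            # Outlier: tight cluster around (5, 5), where the model predicts
            # y = 1 with near certainty, recorded with the response 0.
            x1, x2 = rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)
            y = 0
        data.append((x1, x2, y))
    return data

sample = simulate(100, contam=0.03, seed=1)
print(sample[-3:])
```

Feeding such samples to an MDPDE fit for several values of $\lambda$ and recording how often the Wald-type statistic exceeds the chi-square critical value would give empirical levels of the kind summarized in Figure 4(b).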

Real data examples
In this section we explore the performance of the proposed Wald-type tests in logistic regression models by applying them to several real data sets. The estimators are computed in R by minimizing the corresponding density power divergence, and the minimization is performed using the "optim" function.

Students data
As an interesting data example leading to the logistic regression model, we consider the students data set from Muñoz-Garcia et al. (2006). The data set consists of 576 students of the University of Seville. The response variable is the student's aim to graduate after three years. The explanatory variables are gender ($x_{i1} = 0$ if male; $x_{i1} = 1$ if female), entrance examination (EE) in University ($x_{i2} = 1$ if the first time; $x_{i2} = 0$ otherwise) and sum of marks ($x_{i3}$) obtained for the courses of the first term. There were 61 distinct cases (i.e. $n = 61$) in this study. We assume that the response variable follows a binomial logistic regression model as mentioned in Remark 2.1. We are interested in testing the null hypothesis that the gender of the student does not play any role in their aim. So the null hypothesis is given by $H_0 : \beta_1 = 0$. Figure 5 shows the p-values of the Wald-type tests for different values of $\lambda$. Muñoz-Garcia et al. (2006) mentioned that the 32nd observation is the most influential point, as it has a large residual and a high leverage value. If we use the classical Wald test or Wald-type tests with small $\lambda$ under the full data, the null hypothesis is rejected at the 10% level of significance. But this result is clearly a false positive, as the outlier deleted p-values for all $\lambda$ are close to 0.35. On the other hand, Wald-type tests with large $\lambda$ give robust p-values in both situations.

Lymphatic cancer data
Brown (1980), Pardo (2009) and Zelterman (2005, Section 3.3) studied data on the evidence of lymphatic cancer in prostate cancer patients, with the aim of predicting lymph nodal involvement of the cancer. There were five covariates (three dichotomous and two continuous): the X-ray finding ($x_{i1} = 1$ if present; $x_{i1} = 0$ if absent), size of the tumor by palpation ($x_{i2} = 1$ if serious; $x_{i2} = 0$ if not serious), pathology grade by biopsy ($x_{i3} = 1$ if serious; $x_{i3} = 0$ if not serious), the age of the patient at the time of diagnosis ($x_{i4}$) and serum acid phosphatase level ($x_{i5}$).
The data set covers 53 individuals. An ordinary logistic model is assumed here. We are interested in testing the significance of the size of the tumor on the response variable, so the null hypothesis is taken as $H_0 : \beta_2 = 0$. The p-values of the Wald-type tests for different values of $\lambda$ are given in Figure 6. Martín and Pardo (2009) noticed that the 24th observation is an influential point. The p-value of the classical Wald test under the full data is 0.0430, but if the outlier is deleted it becomes 0.0668. So if we consider a test at the 5% level of significance, the decision of the test changes when we delete just one outlying observation. However, Wald-type tests with high values of $\lambda$ always produce high p-values.

Vasoconstriction data
Finney (1947), Pregibon (1981) and Martín and Pardo (2009) studied data where the interest is in the occurrence of vasoconstriction in the skin of the finger. The covariates of the study were the logarithm of volume ($x_{i1}$) and the logarithm of rate ($x_{i2}$) of inspired air measured in liters. Pregibon (1981) has shown that two observations, the 4th and the 18th, are not fitted well by the logistic model, as they have large residuals. However, it can be checked easily that these observations are only outliers in the y-space and are not leverage points. Here we want to test that there is no effect of the covariates, so the null hypothesis is given by $H_0 : \beta_1 = \beta_2 = 0$. The p-value of the classical Wald test under the full data is 0.0194, and in the outlier deleted data it becomes 0.0371. But Figure 7 shows that Wald-type tests with large $\lambda$ produce large p-values.

Leukemia data
This data set consists of 33 cases on the survival of individuals diagnosed with leukemia. The explanatory variables are white blood cell count (x i1 ) and another variable which indicates the presence or absence of a certain morphological characteristic in the white cells (x i2 = 1 if present; x i2 = 0 if absent). This data set was also studied by Cook and Weisberg (1982), Johnson (1985) and Martín and Pardo (2009). They defined a success to be patient survival in excess of 52 weeks. We are interested to test the significance of two covariates, i.e. the null hypothesis is H 0 : β 1 = β 2 = 0. The plot of the p-values of Wald-type tests for different values of λ is given in Figure 8. Martín and Pardo (2009) noticed that the 15th observation is an influential point. The p-value of the classical Wald test under the full data is 0.0226, but if the outlier is deleted it becomes 0.0683. Thus, at 5% level of significance, the decision of the test depends on only one outlying observation. In this case also Wald-type tests with high values of λ always produce high p-values.

On the choice of tuning parameter λ
In this paper, we have proposed a robust family of Wald-type test statistics for testing the general linear hypothesis under the logistic regression model, which depends crucially on the tuning parameter λ involved in its definition. We have seen from the theoretical results and numerical illustrations throughout the paper that the power of the proposed Wald-type tests under contiguous alternative hypotheses decreases slightly with increasing λ under pure data with no contamination; in the presence of contamination, on the other hand, the stability of both power and level increases drastically with increasing λ. In particular, it can be noted from Figure 4 that the loss in power is not very significant even for λ ≈ 0.5 under moderate sample sizes, and this loss becomes almost zero for larger sample sizes; however, the levels of the tests are highly stable in the presence of contamination for any sample size with λ ≈ 0.5. Further, from the real data examples (Figures 5-8), we can also see that the p-values and the resulting inferences are highly robust for λ ≥ 0.4. All this empirical evidence suggests the use of λ ≈ 0.5 as an ad-hoc choice of the tuning parameter when applying the proposed method in any practical problem; this choice is expected to produce a fair trade-off between the power under pure data and the robustness under contamination.
Although this ad-hoc choice of λ works quite well in practice, many practitioners may believe that the level of contamination is different for each practical data set and hence that we should have a different trade-off for each of them. This can be done through an appropriate algorithm to obtain a data-driven choice of λ separately for each sample, which provides a trade-off between true power and robustness against outliers for the test based only on the data at hand. To develop such a method for our proposed Wald-type test statistics, we note that their performance depends directly on that of the MDPDE having the same tuning parameter λ. The power of our proposed Wald-type tests at contiguous alternatives, as obtained in Theorem 10, increases whenever the noncentrality parameter $\delta = d^T \Sigma_\lambda(\beta_0) d$ of the associated chi-square distribution decreases, i.e., whenever the asymptotic variance $\Sigma_\lambda(\beta_0)$ of the MDPDE used decreases and its asymptotic efficiency increases. Hence, as λ increases, both the asymptotic power of the Wald-type test at contiguous alternatives and the asymptotic efficiency of the MDPDE decrease slightly. On the other hand, the influence function of the Wald-type test statistics as well as their power influence functions are direct functions of the influence function of the MDPDE with the same tuning parameter. Therefore, the robustness of the test and that of the MDPDE depend in the same way on the tuning parameter λ; in particular, their robustness increases significantly with increasing λ. Consequently, the problem of a suitable data-driven selection of the tuning parameter λ for the proposed Wald-type test statistics, through a proper trade-off between power under pure data and robustness, can be equivalently solved by obtaining a data-driven tuning parameter with a proper trade-off between the asymptotic efficiency and the robustness of the MDPDE used in constructing the test statistics.
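The link between λ and robustness can be made concrete with a small numerical sketch. It assumes the standard density power divergence objective for a Bernoulli response with logit link, whose estimating equation has the form $\sum_i w_\lambda(\pi_i)(\pi_i - y_i)x_i = 0$ with $w_\lambda(\pi) = \pi^\lambda(1-\pi) + \pi(1-\pi)^\lambda$, so that λ = 0 gives $w_0 \equiv 1$ and recovers the likelihood score equation. The function names are hypothetical, and the observation used is artificial.

```python
import math


def dpd_weight(pi, lam):
    """Weight multiplying (pi - y) * x in the assumed MDPDE estimating equation."""
    return pi ** lam * (1.0 - pi) + pi * (1.0 - pi) ** lam


def score_contribution(eta, y, lam):
    """Scalar factor of x_i contributed by one observation.

    eta : linear predictor x_i^T beta
    y   : observed binary response (0 or 1)
    lam : tuning parameter lambda (lam = 0 gives the likelihood score)
    """
    pi = 1.0 / (1.0 + math.exp(-eta))
    return dpd_weight(pi, lam) * (pi - y)


# An artificial outlier in the y-space: y = 1 although the fitted
# success probability at eta = -5 is below one percent.
for lam in (0.0, 0.25, 0.5, 1.0):
    print(lam, abs(score_contribution(-5.0, 1, lam)))
```

Under this assumed form, the contribution of a badly fitted observation shrinks as λ grows, while well-fitted observations are also downweighted to some extent, which is one way to read the efficiency loss described above.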
There are a few existing approaches to the data-driven selection of the tuning parameter of the general MDPDE under the i.i.d. setup; the most popular among them is the method of Warwick and Jones (2005), who proposed minimizing an estimator of the MSE of the MDPDE to obtain the optimum λ. This approach has recently been studied in many contexts with suitable extensions (Ghosh and Basu, 2013; Ghosh et al., 2016a, 2016c) and has been shown to perform satisfactorily in selecting a proper tuning parameter for a given data set. Here, we will use this approach to propose a data-driven selection of the tuning parameter λ of the MDPDE, and hence of the proposed Wald-type tests, under the present logistic regression model. Following Warwick and Jones (2005), we need to minimize an estimate of the MSE of the MDPDE $\hat{\beta}_\lambda$, as a function of the tuning parameter λ, given by
$$\mathrm{MSE}(\lambda) = (\beta_\lambda - \beta^*)^T(\beta_\lambda - \beta^*) + \frac{1}{n}\,\mathrm{trace}\left[J_\lambda^{-1}(\beta_\lambda)\, K_\lambda(\beta_\lambda)\, J_\lambda^{-1}(\beta_\lambda)\right], \tag{7.1}$$
where $\beta^*$ is the true value of the target parameter and $\beta_\lambda$ is the best fitting parameter that minimizes the DPD measure (with tuning parameter λ) between the true and the model density. Note that, although we have considered the model to be correct in the previous parts of the paper, the above construction gives us more flexibility to work with true densities outside the model family as well. In particular, the first term in (7.1) indicates the model misspecification bias and becomes zero whenever the true density belongs to the assumed model family. On the other hand, the second term in (7.1) simply gives the variance of the MDPDE. We need to obtain an estimate of this MSE based on the given data without assuming that the model is true, and then minimize it suitably over λ ∈ [0, 1] (possibly through a grid search) to get the optimum λ for the data at hand.
In order to estimate the MSE in (7.1), let us first consider the (second) variance term. We have already provided estimators of $J_\lambda(\beta_\lambda)$ and $K_\lambda(\beta_\lambda)$ in Section 2, but assuming that the model is true. One can also obtain their model-free estimators, say $\widehat{J}_\lambda$ and $\widehat{K}_\lambda$, in a similar way in terms of the quantities $L_1$ and $L_2$ defined in Section 2. Next, in order to estimate the (first) bias term in (7.1), we can estimate $\beta_\lambda$ by the MDPDE $\hat{\beta}_\lambda$, but there is no obvious choice for $\beta^*$. Warwick and Jones (2005) suggested using a suitable pilot estimator $\beta_P$ in place of $\beta^*$, which leads to the following estimate of the MSE:
$$\widehat{\mathrm{MSE}}(\lambda) = (\hat{\beta}_\lambda - \beta_P)^T(\hat{\beta}_\lambda - \beta_P) + \frac{1}{n}\,\mathrm{trace}\left[\widehat{J}_\lambda^{-1}(\hat{\beta}_\lambda)\, \widehat{K}_\lambda(\hat{\beta}_\lambda)\, \widehat{J}_\lambda^{-1}(\hat{\beta}_\lambda)\right].$$
Note that this selection procedure clearly depends on the pilot estimator used. When we take the pilot estimator $\beta_P = \hat{\beta}_\lambda$, it corresponds to the assumption of no model bias and the approach coincides with that of Hong and Kim (2001); this is clearly more restrictive, and we lose the generality of the procedure against outliers due to model misspecification. Alternatively, Warwick and Jones (2005) suggested, through an extensive simulation study, that the choice $\beta_P = \hat{\beta}_1$ works well enough for the MDPDE under i.i.d. data. It was later concluded empirically, in an extension to the non-homogeneous setup, that the choice $\beta_P = \hat{\beta}_{0.5}$ often works better. Here, we will first empirically examine a good choice of the pilot estimator for the present case of random design logistic regression and illustrate that this method works in practice for making a data-driven choice of the tuning parameter λ.
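The selection procedure described above can be sketched as a plain grid search. The sketch assumes that the estimated MSE is the squared distance between the MDPDE and the pilot estimate plus the trace of the estimated asymptotic covariance matrix divided by $n$, in the spirit of Warwick and Jones (2005). The two callables `fit_mdpde` and `estimate_avar` are hypothetical placeholders that the user must supply (an MDPDE fitting routine and an estimator of $J_\lambda^{-1} K_\lambda J_\lambda^{-1}$); they are not part of any existing library.

```python
def estimated_mse(beta_hat, beta_pilot, avar, n):
    """Squared distance to the pilot estimate plus trace(avar) / n."""
    bias_sq = sum((b - p) ** 2 for b, p in zip(beta_hat, beta_pilot))
    variance = sum(avar[j][j] for j in range(len(avar))) / n
    return bias_sq + variance


def select_lambda(grid, fit_mdpde, estimate_avar, n, pilot_lambda=1.0):
    """Return (lambda, estimated MSE) minimizing the estimated MSE over grid.

    fit_mdpde(lam)           -> list of coefficients (user supplied)
    estimate_avar(lam, beta) -> estimated asymptotic covariance matrix
                                as nested lists (user supplied)
    pilot_lambda             -> tuning parameter of the pilot MDPDE
    """
    beta_pilot = fit_mdpde(pilot_lambda)
    best_lam, best_mse = None, float("inf")
    for lam in grid:
        beta_hat = fit_mdpde(lam)
        mse = estimated_mse(beta_hat, beta_pilot,
                            estimate_avar(lam, beta_hat), n)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam, best_mse
```

A call such as `select_lambda([k / 20 for k in range(21)], fit_mdpde, estimate_avar, n, pilot_lambda=0.5)` would compare 21 equally spaced values of λ in [0, 1] against a pilot MDPDE with λ = 0.5; the grid resolution is an arbitrary choice made here for illustration.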
Let us reconsider the simulation study discussed in Section 5, but now we perform the selection of λ following the above proposal in each iteration with different possible pilot estimators. Figure 9 gives the simulated level and power in the same setup as Figure 4. The average optimum values of λ for the pure data as well as the contaminated data are plotted in Figure 10. Pilot(λ) in these plots refers to the Wald-type test statistic where the MDPDE with tuning parameter λ is used as the pilot estimator. Figures 9(a) and 9(c) show that, in the pure data, the pilot estimator has no significant effect on the level and power of the tests. In fact, Figure 10(a) shows that the optimum λ turns out to be small (less than 0.25) in the case of pure data. This result is consistent with Figures 4(a) and 4(c), where we noticed almost no difference in the level or power of the tests with small values of λ. However, in the contaminated data the tuning parameter plays a vital role in the robustness of the test. This is also true for the pilot estimator. Figure 10(b) shows that the optimum value of λ is still small if a pilot estimator with small λ is chosen. As a result, the level of the test breaks down and the power of the test is not sufficiently high. On the other hand, a pilot estimator with large λ produces a large optimum λ (more than 0.5), so the corresponding test gives a stable level and high power; see Figures 9(b) and 9(d). The simulation results indicate that the tests with pilot estimators λ = 0.5 and λ = 1 both have sufficient robustness properties; moreover, λ = 1 gives slightly better results in this simulation setup. A similar scenario is observed for the DPD test with a fixed value of λ.
Now, we apply the proposed method of optimal selection of the tuning parameter λ to all our real data examples of Section 6. We used several pilot estimators, but in the end they produced almost the same optimal λ; see Table 1 for the detailed results. We have also computed the p-values corresponding to these optimum values of λ in the full data and in the outlier-deleted data for each example; these are reported in Table 2. Note that the p-values do not change significantly in the presence of outliers when the tuning parameter λ is chosen optimally for each example following the proposed algorithm. The interpretation of the results remains the same as discussed in the previous section; however, as it is based on the optimum choice of λ, it eliminates the subjective choice of the tuning parameter for the DPD tests.

Table 2. The optimum values of λ and the corresponding p-values in the full data (p-value$_1$) and in the outlier-deleted data (p-value$_2$).

Concluding remarks
Logistic regression for binary outcomes is one of the most popular and successful tools in the statistician's toolbox. It is frequently used by applied scientists in many disciplines to solve problems of real interest in their domain of application. However, in the present age of big data, the need for protection against data contamination and other modeling errors is paramount, and, wherever possible, strong robustness qualities should be a default requirement for statistical methods used in practice. In this paper we have presented one such class of inference procedures. We have provided a thorough theoretical evaluation of the proposed class of tests for testing the linear hypothesis in the logistic regression model, highlighting their robustness advantages. We have also produced substantial numerical evidence, including simulation results and a large number of real problems, to demonstrate how these theoretical advantages translate in practice into real gains. On the whole, we feel that the proposed tests will turn out to be a useful set of tools with significant practical application.