Semiparametric density testing in the contamination model

In this paper we investigate a semiparametric testing approach to decide whether the parametric family assigned to the unknown density of a two-component mixture model with one known component is correct. Based on a semiparametric estimation of the Euclidean parameters of the model (free from the null assumption), our method compares pairwise the Fourier-type coefficients of the model estimated directly from the data with those obtained by plugging the estimated parameters into the mixture model. These comparisons are incorporated into a sum-of-squares type statistic whose order is controlled by a penalization rule. We prove under mild conditions that our test statistic is asymptotically χ²(1)-distributed and study its behavior, both numerically and theoretically, under different types of alternatives, including contiguous nonparametric alternatives. We discuss the counterintuitive, from the practitioner's point of view, lack of power of the maximum likelihood version of our test in a neighborhood of challenging non-identifiable situations. Several level and power studies are conducted numerically on models close to those considered in the literature, such as in McLachlan et al. (2006), to validate the suitability of our approach. We also implement our testing procedure on the Carina galaxy real dataset, whose low luminosity mixes with that of its companion, the Milky Way. Finally we discuss possible extensions of our work to a wider class of contamination models.


Introduction
Let us consider n independent and identically distributed random variables (X_1, ..., X_n) drawn from a two-component mixture model with probability density function g defined by

g(x) = (1 − p) f_0(x) + p f(x), x ∈ R, (1.1)

where f_0 is a known probability density function, corresponding to a known signal, and where the unknown parameters of the model are the mixture proportion p ∈ (0, 1) and the probability density function f ∈ F (a given class of densities) associated to an unknown signal. Model (1.1) is widely used in statistics and is usually referred to as the contamination model. This class of models is especially suitable for the detection of differentially expressed genes under various conditions in microarray data analysis, see McLachlan et al. (2006) or Dai and Charnigo (2010). In astronomy such a model has been used to model mixtures of X-ray sources, see Melchior and Goulding (2018) and Patra and Sen (2016). Recently some applications have also been developed in selective Statistical Editing, see Di Zio and Guarnera (2013), in biology to model tree diameters, see Podlaski and Roesch (2014), or in kinetics to model plasma data, see Klingenberg et al. (2018).
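To fix ideas, model (1.1) is straightforward to simulate; the following sketch (illustrative parameter values, Gaussian components chosen purely for convenience) draws an i.i.d. sample from g by first drawing the latent Bernoulli(p) label of each observation:

```python
import numpy as np

def rcontamination(n, p, rf0, rf, rng):
    """Draw n i.i.d. observations from g = (1 - p) f0 + p f (model (1.1)).

    rf0 and rf are samplers for the known and the unknown component."""
    labels = rng.random(n) < p  # Bernoulli(p): True -> unknown component
    return np.where(labels, rf(n, rng), rf0(n, rng)), labels

rng = np.random.default_rng(0)
# Known component f0 = N(0, 1); unknown component f = N(3, 1); p = 0.4 (all illustrative).
x, labels = rcontamination(
    5000, 0.4,
    rf0=lambda m, r: r.normal(0.0, 1.0, m),
    rf=lambda m, r: r.normal(3.0, 1.0, m),
    rng=rng,
)
print(x.mean())  # close to p * 3 = 1.2
```

Samples of this form are reused below for the estimation and testing steps.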
Many techniques have been proposed to estimate the Euclidean and functional parameters p and f in model (1.1). The most popular methods for known finite order mixture models, such as the moment method, see Lindsay (1989), the moment generating function based method, see Quandt and Ramsey (1978), or the maximum likelihood method, see Lindsay (1983), are largely used but suffer from the requirement of assigning a parametric form to the density f. Since then, some semiparametric approaches have been developed, such as the pioneering work by Bordes et al. (2006), to relax that parametric modelling. These authors restricted, for example, their study to the class of location-shift symmetric densities in order to make model (1.1) semiparametrically identifiable. More recently, different nonparametric approaches have also been considered, such as in Nguyen and Matias (2014), where f_0 is a uniform distribution on [0, 1]. In Ma and Yao (2015), where f_0 is only supposed to belong to a parametric family, a tail identifiability approach is used, considering symmetric distributions embedded in a nonparametric envelope. We also recommend the recent work by Al Mohammad and Boumahdaf (2018), who consider situations where the unknown component f is defined through linear constraints. In Balabdaoui and Doss (2018) a log-concave assumption is made on the family F to ensure the identifiability of the model. In Patra and Sen (2016) the identifiability and estimation problem is considered under tail conditions with very few shape constraint assumptions.
imsart-generic ver.2014/10/16 file: DParxiv-V2.tex date: March 11, 2019

The goal of the present paper is to answer a very natural question, explicitly raised in McLachlan et al. (2006, Section 6) or Patra and Sen (2016, Section 9.2), which is basically "can we test whether the unknown component of the contamination model belongs to a given class of parametric densities?", or more formally to test

H_0: f ∈ {f_θ; θ ∈ Θ} against H_1: f ∉ {f_θ; θ ∈ Θ}, (1.2)

where f_θ is a probability density function parametrized by a Euclidean parameter θ belonging to a parametric space Θ. For simplicity we will restrict ourselves to the case where f_θ is a symmetric probability density function with respect to a location parameter µ ∈ R, as described in (2.1), but we discuss in Section 10 how our approach can be generalized to any class of parametric densities, provided that model (1.1) can be √n-estimated semiparametrically. This problem has been considered recently by Suesse et al. (2017), who use a maximum likelihood estimate-based testing approach. In general the behavior of the maximum likelihood estimator is difficult to control or figure out under the alternative, as illustrated in Section 7, since the model is then misspecified. To get a consistent testing method under both H_0 and H_1, at the price of some shape restriction about H_1, we propose to use an H_0 ∪ H_1 consistent semiparametric estimation approach in order to build an H_0-free statistic (not forcing the fit into the parametric model). To the best of our knowledge this is the first time that an H_0-free semiparametric approach is used to test mixture models. The advantage of this new strategy will be demonstrated, both theoretically and numerically, on very counterintuitive examples in the close neighborhood of non-identifiable situations, see Fig. 1 and comments. For a general overview of semiparametric mixture models we recommend the recent surveys by Xian et al.
(2018) or Gassiat (2018). Note that the test against a specific distribution, proposed in Bordes and Vandekerkhove (2010, Section 4.1), does not allow testing against a complete class of probability density functions, which is our goal here. To point out the interest of the statistical community in the contamination testing problem, let us mention the very recent work by Arias-Castro and Huang (2018) on the sparse variance contamination model and references therein. The main idea of our test is based on the data-driven smooth test procedure developed by Ledwina (1994), extending the idea of Neyman (1937), which consists in estimating the expansion coefficients of f in an orthogonal basis, first assuming f ∈ S (the set of probability density functions symmetric with respect to a location parameter µ ∈ R), and then comparing these estimates to those obtained by assuming f ∈ F. This approach has been used in Doukhan et al. (2015), see also references therein, but the specificity of the two-component mixture model necessitates a special adaptation of the Neyman smooth test. In our case we develop a two-rate procedure, one rate driven by the asymptotic normality of the test statistic and another driven by the almost sure rate of convergence of the semiparametric estimators. As we will discuss throughout this paper, the approach of Suesse et al.
(2017), restricted to model (1.1), does not allow investigating the asymptotic behavior of the test statistic under alternative assumptions (possibly contiguous), since the asymptotic behavior of the maximum likelihood estimator cannot be controlled properly under distribution misspecification. Another aspect of our nonparametric approach is that it can easily deal with situations where f_0 is only known through training data. This situation is illustrated in Section 9 through a real dataset collecting the radial velocities of the Carina galaxy and its companion, the Milky Way. The paper is organized as follows: in Section 2 we describe our two-step test methodology; in Section 3 we state the assumptions and asymptotic results under the null hypothesis; Section 4 is dedicated to the test divergence under the alternative; Section 5 is devoted to the study of our testing procedure under contiguous nonparametric alternatives (inspired by the parametric contiguous alternative concept); in Section 6 we discuss the choice of the reference measure when considering orthogonal bases for the unknown density decomposition; in Section 7 we conduct a power comparison between the semiparametric and maximum likelihood versions of our test; this section interestingly highlights the fact that a maximum likelihood approach could force, in certain setups of the McLachlan et al. (2006, Section 6) Gaussian mixture model, the number q of components defining f to be 1 when in reality q = 2; Section 8 is dedicated to a simulation-based study of empirical levels and powers; in Section 9 we proceed with the application of our testing method to the datasets (breast cancer, colon cancer, HIV) previously studied in McLachlan et al. (2006) and to the Galaxy dataset studied in Patra and Sen (2016). Finally in Section 10 we discuss further research directions connected with the contamination model testing problem.

Testing problem
Let us consider an independent and identically distributed sample denoted (X_1, ..., X_n), drawn from a probability density function g defined in (1.1) with respect to a given reference measure ν. The problem addressed in this section deals with testing the unknown component f, assuming that f belongs to S, the set of symmetric densities provided with the identifiability conditions in Bordes and Vandekerkhove (2010, p. 25). More precisely, denoting F = {f_(µ,θ); (µ, θ) ∈ Λ} the set of densities with respect to ν, with mean µ and shape parameter θ, where (µ, θ) is supposed to belong to a compact subset Λ of R × Θ, our goal is to test

H_0: f ∈ F against H_1: f ∈ S \ F. (2.1)

Our test procedure is based on the Ledwina (1994) approach and consists in estimating the expansion coefficients of the unknown density f in an orthogonal basis, first assuming f ∈ S, and then comparing these estimates to those obtained when f is supposed to belong strictly to the sub-parametric family F.
As intuitively expected, we will show how the study of the successive expansion coefficient differences helps in detecting possible departures from H_0 given the data. We will denote by Q = {Q_k, k ∈ N} an orthogonal family of L²(R, ν) satisfying ⟨Q_j, Q_k⟩ = δ_jk q_k², with δ_jk = 1 if j = k and 0 otherwise, and where the normalizing factors q_k² ≥ 1 will permit us to control the variance of our estimators, as illustrated in Lemmas 1 and 3. We assume that Q is an L²(R, ν) Hilbert basis, which is satisfied if there exists θ > 0 such that ∫_R e^{θ|x|} ν(dx) < ∞, and that the required integrability conditions are satisfied. Then, for all x ∈ R, the densities f_0, f and g can be expanded over Q; we denote by b_k, a_k and c_k their respective expansion coefficients, which from (1.1) are linearly related. Let us denote by Z a random variable with density f_{µ,θ} and consider the associated coefficients α_k(µ, θ). The null hypothesis can be rewritten as c_k = α_k(µ, θ), for all k ≥ 1. Since the probability density function f_0 is known, the coefficients b_k are automatically known. As a consequence, for all k ≥ 1, the coefficients a_k can be estimated empirically. To avoid a possible compensation phenomenon under H_1 between the estimation of ϑ = (p, µ) and the estimation of the α_k's, the estimator of (p, µ) will be obtained without assuming the null hypothesis, that is, using the semiparametric estimator ϑ̂_n = (p̂_n, µ̂_n) introduced in Bordes et al.
(2006) and studied more deeply in Bordes and Vandekerkhove (2010). Indeed, as numerically demonstrated in Section 7, the maximum likelihood estimator (p̃_n, µ̃_n, θ̃_n) under the null assumption tends to provide the best H_0-fitted model, whereas the semiparametric estimator of Bordes and Vandekerkhove (2010) is not influenced by this constraint and can provide very distant, Euclidean and functional, estimations under H_1 (when the model is misspecified under the null assumption). In the same way, considering the relation (1.1), the estimator of θ is obtained by the H_0-free semiparametric plug-in moment method, where E_θ(X_1^p) means that we express this expectation as a function of θ. The estimator of α_k(µ, θ) is obtained by using a standard plug-in approach. To illustrate our general approach, let us detail the Gaussian case here. If F is equal to G, the set of normal densities with mean µ and variance θ = s, then the plug-in moment method yields ŝ_n, where M̂_{2,n} = n⁻¹ ∑_{i=1}^n X_i². Now, coming back to generality, looking at the H_0 reformulation in (2.3), we expect that the successive differences R_{k,n} will allow us to detect any possible departure from the null hypothesis. For simplicity, and without loss of generality since the b_k's are known constants, we assume from now on that they are equal to zero. For all k ≥ 1, we define the k-th order coefficient T_{k,n} of our test statistic (incorporating the k-th order departure information from H_0), where U_{k,n} = (R_{1,n}, ..., R_{k,n}) and where D_{k,n} is an estimator of the corresponding covariance matrix, in which the estimator of var(R_{i,n}) is weakly consistent for var(R_i) as n → +∞, and e(n) → 0. Following Ledwina (1994) and Inglot et al.
(1997), we suggest a data-driven procedure to automatically select the number of coefficients needed to answer the testing problem. We introduce the following penalized rule to pick parsimoniously (trading off H_0 departure detection against the complexity induced by the index k) the "best" rank k for looking at T_{k,n}, where s(n) → 0 is a normalizing rate, d(n) → +∞ as n → +∞, pen(n) is a penalty term such that pen(n) → +∞ as n → +∞, and the β_k's are penalization factors; the choices used in practice are detailed in Section 6. To match the asymptotic normality regime, under H_0, of the test statistic T_{k,n} defined in (2.6), the normalizing factor s(n) is usually taken equal to one, but in our case, due to the specificity of the semiparametric mixture estimation (possibly adapted to nonparametric contiguous alternatives), we chose a slower, polynomially vanishing rate (see Section 6). By using the delta method, we can show that the second term of the above quantity is asymptotically normal; however, the behavior of the first term looks much more difficult to analyze due to the random factor ŝ_n inside the semiparametric estimate F̂_n. In addition to this technical difficulty, it would also be more satisfactory to investigate a Kolmogorov-type test embracing the whole complexity of F_(0,1), instead of a χ²(k)-type test based on the above expression evaluated over a k-grid. Again this is a very challenging problem. In that sense our approach provides a sort of asymptotic framework to capture the whole complexity of f through its (asymptotically unrestricted) decomposition in a basis of orthogonal functions.
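The penalized selection rule above can be sketched as follows; the statistics T_{k,n}, the penalty pen(n) and the factors β_k are those of the paper and are supplied here as inputs, while the default β_k = k and the choice pen(n) = log n are placeholder assumptions for illustration:

```python
import math

def select_rank(t, pen, beta=None, s_n=1.0):
    """Data-driven (Schwarz-type) rank selection, a sketch of the penalized
    rule: pick the smallest k maximizing s(n) * T_{k,n} - beta_k * pen(n).

    t: list [T_{1,n}, ..., T_{d(n),n}]; pen: penalty value pen(n);
    beta: penalization factors (hypothetical default beta_k = k)."""
    d = len(t)
    if beta is None:
        beta = list(range(1, d + 1))
    scores = [s_n * t[k] - beta[k] * pen for k in range(d)]
    best = max(scores)
    # smallest rank achieving the maximal penalized score
    return next(k + 1 for k, sc in enumerate(scores) if sc == best)

n = 1000
pen = math.log(n)            # a common penalty choice, pen(n) = log n
t = [4.2, 9.1, 9.3, 9.4]     # toy values of T_{k,n}
print(select_rank(t, pen))   # the gain beyond k = 1 does not beat the penalty
```

A large jump in T_{k,n} at some k, on the other hand, pulls the selected rank upward despite the penalty.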

Assumptions and asymptotic behavior under H 0
To test (2.1) consistently, based on the statistic T(n) = T_{Ŝ_n,n}, we will suppose the following conditions (A1-3), where e(n) denotes the trimming term in (2.7).
and there exist nonnegative constants M_1 and M_2 such that for all (µ, θ) ∈ Λ, where α̇_k denotes the gradient (∂α_k/∂µ, ∂α_k/∂θ). (A3) There exists a nonnegative constant M_3 such that for all (k, i). Under these three conditions, which will be checked in Lemmas 1 and 3 for the Gaussian and the Lebesgue reference measures respectively, we state the following results.
Corollary 3. Under (A1-3), the test statistic T(n) converges in law towards a χ²-distribution with one degree of freedom as n → +∞.
Remark 4. Theorem 2 and Corollary 3 still hold if we replace in T(n) the semiparametric estimators and their (asymptotic) variances by their maximum likelihood counterparts. The proofs of these two results are completely similar to the semiparametric case and rely on the asymptotic normality of the maximum likelihood estimator detailed in the supplementary material file. In this case the rate of the selection rule is the standard one, namely s(n) = 1.
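In practice, Corollary 3 (or Remark 4 for the maximum likelihood version) justifies the usual χ²(1) decision rule; a minimal sketch, using the standard identity P(χ²(1) > t) = erfc(√(t/2)) for the asymptotic p-value:

```python
import math

CHI2_1_Q95 = 3.8415  # 95% quantile of the chi2(1) distribution

def smooth_test_decision(t_n, crit=CHI2_1_Q95):
    """Reject H0 iff T(n) exceeds the chi2(1) critical value; also return
    the asymptotic p-value P(chi2(1) > t) = erfc(sqrt(t / 2))."""
    p_value = math.erfc(math.sqrt(t_n / 2.0))
    return t_n > crit, p_value

reject, pv = smooth_test_decision(6.2)
print(reject, pv)
```

For the nominal 5% asymptotic level used later in the simulations, H_0 is thus rejected as soon as T(n) exceeds 3.8415.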

Asymptotic behavior under H 1
In the next proposition we study the behaviour of our test statistic under H_1. Proposition 1. If f ∈ S \ F, then the test statistic T(n) tends to +∞ in probability with an n^λ drift, 0 < λ < 1/2, as n → +∞.
We would like to stress the fact that the identifiability conditions assumed when considering the class of densities S, see the definition in Section 2, are crucial in the proof of Proposition 1. As mentioned in Bordes, Delmas and Vandekerkhove (2006), there exist various non-identifiability cases for model (1.1). Let us recall the following one from Bordes and Vandekerkhove (2010), where ϕ is an even probability density function and p ∈ (0, 1). This example is very interesting since it clearly shows the danger of estimating model (1.1) when the probability density function of the unknown component has exactly the same shape as the known component. In particular, if ϕ is a given Gaussian distribution and we want to test whether the second component is Gaussian, we could possibly either reject or accept H_0 with our testing procedure depending on the convergence of our semiparametric estimators. Indeed the maximum likelihood estimator would converge towards the natural underlying Gaussian model, while the semiparametric method could possibly converge towards both solutions. To avoid this well identified concern, we recommend checking whether the departure between the maximum likelihood estimator and the semiparametric one is not driven by a factor 2, i.e. µ̃_n ≈ 2µ̂_n and p̃_n ≈ p̂_n/2. To advise on this possible proximity, one could check whether µ̃_n/2 and 2p̃_n respectively belong to the 95% confidence intervals of µ and p derived from the asymptotic normality of (p̂_n, µ̂_n), see Bordes and Vandekerkhove (2010). If so, we suggest initializing the semiparametric approach close to the maximum likelihood estimator to force it to detect the possibly existing f-component in model (1.1).
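The factor-2 diagnostic recommended above can be automated; the sketch below assumes that standard errors of the semiparametric estimator are available (from its asymptotic normality), and all numerical values are hypothetical:

```python
def factor_two_alert(mle, sp, sp_se, z=1.96):
    """Flag the factor-2 proximity described in the text: check whether
    mu_mle / 2 and 2 * p_mle fall in the 95% confidence intervals of mu
    and p built from the semiparametric estimator (sketch).

    mle, sp: dicts with keys 'p' and 'mu'; sp_se: standard errors of sp."""
    in_ci = lambda v, center, se: abs(v - center) <= z * se
    mu_close = in_ci(mle["mu"] / 2.0, sp["mu"], sp_se["mu"])
    p_close = in_ci(2.0 * mle["p"], sp["p"], sp_se["p"])
    return mu_close and p_close

# Hypothetical estimates near the non-identifiable configuration:
mle = {"p": 0.2, "mu": 8.0}
sp = {"p": 0.41, "mu": 3.95}
print(factor_two_alert(mle, sp, {"p": 0.03, "mu": 0.10}))  # alert raised
```

When the alert is raised, the re-initialization strategy described above can be applied before re-running the semiparametric algorithm.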

Detected contiguous alternatives
We consider in this section a vanishing convolution-class of nonparametric contiguous alternatives. More specifically, the null hypothesis consists here in considering that the observed sample X_n = (X_1, ..., X_n) comes from

X_i = (1 − U_i) Y_i + U_i Z_i, i = 1, ..., n,

where (U_i)_{i≥1} and (Y_i, Z_i)_{i≥1} are independent and identically distributed sequences, respectively distributed according to a Bernoulli distribution with parameter p and to f_0 ⊗ f_{µ,θ}, where f_{µ,θ} is the unknown density function with respect to the reference measure ν. For each n ≥ 1, the contiguous alternative consists in the fact that the observed sample X^(n) = (X_1^n, ..., X_n^n) comes from a row-independent triangular array

X_i^n = (1 − U_i) Y_i + U_i (Z_i + δ_n ε_i), i = 1, ..., n,

where (ε_i)_{i≥1} is an independent and identically distributed sequence of random variables, independent from the Z's, and δ_n → 0 as n → +∞ (vanishing factor). We assume here that, for all i ≥ 1, the distribution of Z_i + δ_n ε_i does not belong to S. In the Gaussian case this assumption is ensured if the ε's are non-Gaussian. It is also assumed that E(e^{θ|ε_1|}) < ∞ for some θ > 0. This type of contiguous modeling looks natural to us since, in any experimental field, measurement errors may occur, represented above by the δ_n ε_i's, and additively impact the true underlying phenomenon Z. We also recall at this point that the distribution of the Y's is theoretically known by assumption.
The whole collection of contiguous models will be denoted H_1^*. To emphasize the role of the index n in the triangular array, we will denote all the estimators depending on X^(n), or any function depending on G^(n), the cumulative distribution function of the X_i^(n)'s, with the extra superscript (n); for example, with this new notational rule, the estimator p̂_n(X^(n)) of p will be denoted p̂_n^(n). Similarly we will denote by ĝ_n the kernel density estimator of g^(n) involved in the contiguous alternative setup, see the supplementary material file, where the bandwidth h_n satisfies h_n → 0 and nh_n → +∞, and K is a symmetric kernel density function detailed in the supplementary file. We will also denote by E^(n) and P^(n) the expectation and the probability distribution under the alternative, and consider the following assumptions. (A5) The vanishing factor satisfies δ_n = n^{−3/4−ξ}, with 3γ < ξ < 2γ + 1/4. (A6) There exists a nonnegative constant C such that for all k ∈ N. Condition (A6) is checked in Lemmas 2 and 4 for the Gaussian and the Lebesgue reference measures. It is also satisfied for any reference measure with bounded support. For simplicity, we refer to conditions (A2-3) under H_1^* in the proposition below; this means that both conditions are satisfied for all n ≥ 1, replacing X_1 by X_1^n. Following the proof of these conditions under H_0 in the Appendix, it is possible to establish explicit moment conditions on ε, adapted to the moments of Z, to ensure (A2-3) under H_1^*. These conditions being technical and their proofs being tedious but straightforward, we do not detail them here. Proposition 2. If assumptions (A1-6) hold, then, under H_1^*, Ŝ_n converges in probability towards 1 and T(n) converges in law towards a χ²-distribution with one degree of freedom, as n → +∞.
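For simulation purposes, one row of the triangular array under H_1^* can be sketched as follows; the component distributions, the centered-exponential perturbation ε and the value ξ = 0.1 are illustrative only (the admissible range for ξ in (A5) depends on the rate γ of (A4)):

```python
import numpy as np

def contiguous_sample(n, p, xi, rng):
    """Row n of the triangular array under H1* (sketch):
    X_i = (1 - U_i) Y_i + U_i (Z_i + delta_n * eps_i), with
    delta_n = n^{-(3/4 + xi)} the vanishing factor. Here Y ~ f0 = N(0, 1),
    Z ~ N(3, 1) and eps is a centered exponential (non-Gaussian), all
    hypothetical choices."""
    delta_n = n ** (-(0.75 + xi))
    u = rng.random(n) < p
    y = rng.normal(0.0, 1.0, n)
    z = rng.normal(3.0, 1.0, n)
    eps = rng.exponential(1.0, n) - 1.0   # centered, non-Gaussian
    return np.where(u, z + delta_n * eps, y), delta_n

rng = np.random.default_rng(2)
x, d = contiguous_sample(10_000, 0.4, 0.1, rng)
print(d)  # n^{-0.85}: the perturbation vanishes as n grows
```

As n grows, such samples become indistinguishable in mean from null samples, which is the regime analyzed in Propositions 2 and 3.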

Undetected contiguous alternatives
Combining Assumptions (A4) and (A5), we clearly have 0 < ξ < 1/3, and then there exists ξ̃ = 3/4 + ξ ∈ (3/4, 13/12) such that δ_n = n^{−ξ̃}. The convergence rate of δ_n to zero is slow enough to distinguish the asymptotic null hypothesis when n tends to infinity. By contrast, we now consider two convergence rates which are too fast to recover the asymptotic null distribution of the test statistic, despite the convergence of the contiguous alternative towards the null hypothesis. These convergence rates are given under assumptions (A7) and (A8), where ε denotes a generic random variable involved in the above definition of the Z_n's. The rate in (A7) controls the mean deviation due to the perturbations ε, and the rate given in (A8) allows control of the variance of these perturbations when there is no mean deviation. Proposition 3. If assumption (A7) or (A8) holds, then, under H_1^*, T(n) converges in probability towards +∞. Moreover, under (A7) Ŝ_n converges in probability towards 1, and under (A8) Ŝ_n converges in probability towards 2.

Choice of the reference measure and test construction
In order to run our test, we now have to select a reference measure ν and an ad hoc orthogonal family Q = {Q_k, k ∈ N}. The choice of ν clearly depends on the support of X_1. For a compact support, one can choose a uniform distribution for ν and the associated Legendre polynomials. Since our numerical studies are dedicated to the Gaussian case, we illustrate here the choice of ν corresponding to two measures on the real line: the Gaussian and the Lebesgue one. The verification of conditions (A2-3) for these two measures is relegated to the supplementary material file.
Gaussian reference measure. In practice, in the present paper, we chose for ν the standard normal distribution for testing Gaussianity. This choice is adapted to any distribution supported on the real line. The set Q is constructed from the f_(0,1)-orthogonal Hermite polynomials, defined for all k ≥ 0 by

H_k(x) = (−1)^k e^{x²/2} (d^k/dx^k) e^{−x²/2}. (6.1)

We have ‖H_k‖² = k! and, for illustration purposes, the first six polynomials are H_0(x) = 1, H_1(x) = x, H_2(x) = x² − 1, H_3(x) = x³ − 3x, H_4(x) = x⁴ − 6x² + 3 and H_5(x) = x⁵ − 10x³ + 15x. Lemma 1. Let H_k be defined by (6.1) and let Q_k(x) = H_k(x), for all x ∈ R.
Assume that we want to test H 0 : f ∈ G, where G is the set of Gaussian densities.
Remark 5. Lemma 1 can be extended to a non-Gaussian null distribution f with known moments, as discussed in Remark 1 of the supplementary file.
Lemma 2. Let H_k be defined by (6.1) and let Q_k be as in Lemma 1. Then condition (A6) is satisfied.
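The Hermite polynomials H_k of (6.1) satisfy the standard three-term recurrence H_{k+1}(x) = x H_k(x) − k H_{k−1}(x), which gives a stable way to evaluate them; the Monte Carlo check below illustrates the f_(0,1)-orthogonality with ‖H_k‖² = k!:

```python
import numpy as np

def hermite_e(k, x):
    """Probabilists' Hermite polynomial H_k evaluated via the recurrence
    H_{k+1}(x) = x H_k(x) - k H_{k-1}(x), with H_0 = 1 and H_1 = x."""
    h_prev = np.ones_like(x, dtype=float)
    h = np.asarray(x, dtype=float)
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, x * h - j * h_prev
    return h

# Monte Carlo check of the f_(0,1)-orthogonality with ||H_k||^2 = k!:
rng = np.random.default_rng(3)
z = rng.normal(size=1_000_000)
print(np.mean(hermite_e(3, z) ** 2))               # ~ 3! = 6
print(np.mean(hermite_e(2, z) * hermite_e(3, z)))  # ~ 0
```

The same routine can serve to compute the empirical coefficients entering the test statistic when the Gaussian reference measure is used.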
Lebesgue reference measure. Another simple reference measure ν could be the Lebesgue measure over R. In that case, we would rather consider the set of orthogonal Hermite functions, defined in (6.2) from the Hermite polynomials H_k given in (6.1). Lemma 3. Let H_k be defined by (6.2), and let Q_k be the associated orthogonal functions. Then conditions (A2-3) are satisfied. Test construction. The computation of the test statistic T(n) = T_{Ŝ_n,n}, see expressions (2.6) and (2.8), is grounded in the computation of the quantities α_i(µ, s). We detail here the expression of R_{1,n} and var(R_{1,n}) when the reference measure is Gaussian, associated with the Hermite polynomials. To overcome the complex dependence between the estimators â_{1,n}, p̂_n, µ̂_n and ŝ_n, we split the sample into four independent sub-samples of sizes n_1, n_2, n_3 and n_4, with n_1 + n_2 + n_3 + n_4 = n. We use the first sample to estimate a_1, the second to estimate p, the third to estimate µ, and the last to estimate s. We get α_1(µ, s) = µ and α̂_{1,n} = µ̂_n, which makes R_{1,n} explicit. We propose a consistent estimator of var(R_{1,n}), where S²_{X,n_1} denotes the empirical variance based on (X_1, ..., X_{n_1}), and v̂_{p,n_2}, respectively v̂_{µ,n_3}, denotes the consistent estimator of var(p̂_{n_2}), respectively var(µ̂_{n_3}), obtained from Bordes and Vandekerkhove (2010, p. 40). The computation of the test statistic first requires the choice of d(n), e(n) and s(n). A preliminary study showed us that the empirical levels and powers were overall weakly sensitive to d(n) for d(n) large enough. From that study we decided to set d(n) equal to 10. The trimming term e(n) is calibrated equal to (log(n))⁻¹. The normalization s(n) = n^{α−1} is set close enough to n^{−1/2}, with α equal to 2/5, which seemed to provide good empirical levels.
Secondly, since the probability density functions considered in our simulations are R-supported, we use the standard Gaussian distribution for ν and its associated Hermite polynomials for Q. All our simulations are based on 200 repetitions. Let us briefly recall that the empirical level is defined as the percentage of rejections under the null hypothesis and that the empirical power is the percentage of rejections under the alternative. Finally, the asymptotic level is fixed, as is standard, at 5%.
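The test-construction step above can be sketched as follows; the form a_1 = E(X)/p (coming from Q_1(x) = x and b_1 = 0), the crude variance proxy, and the fact that p̂ and µ̂ are supplied directly instead of being computed by the semiparametric algorithm are all simplifying assumptions for illustration:

```python
import numpy as np

def split_four(x, rng):
    """Split the sample into four independent sub-samples of (roughly)
    equal sizes n1 + n2 + n3 + n4 = n, as done for the test construction."""
    idx = rng.permutation(len(x))
    return np.array_split(x[idx], 4)

def r1_and_var(x, p_hat, mu_hat, rng):
    """Sketch of the first-order contrast R_{1,n} in the Gaussian-reference
    case (assumed forms): with Q_1(x) = x, b_1 = 0 and alpha_1(mu, s) = mu,
    the first coefficient of f is a_1 = E(X) / p, so R_{1,n} = a1_hat - mu_hat.
    A crude variance proxy (ignoring the p and mu estimation terms) is
    returned alongside."""
    s1, s2, s3, s4 = split_four(x, rng)
    a1_hat = np.mean(s1) / p_hat                   # sub-sample 1 estimates a_1
    var_r1 = np.var(s1) / (len(s1) * p_hat ** 2)   # proxy, not the paper's formula
    return a1_hat - mu_hat, var_r1

rng = np.random.default_rng(4)
p, mu = 0.4, 3.0
n = 100_000
lab = rng.random(n) < p
x = np.where(lab, rng.normal(mu, 1.0, n), rng.normal(0.0, 1.0, n))
r1, v1 = r1_and_var(x, p, mu, rng)
print(r1)  # ~ 0 under H0 (Gaussian unknown component)
```

Under an alternative with a non-Gaussian or shifted unknown component, this contrast drifts away from zero, which is what the penalized statistic aggregates over k.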

Semiparametric and maximum likelihood approaches comparison
In our testing procedure we estimate p and µ by the semiparametric estimators proposed in Bordes and Vandekerkhove (2010) instead of the maximum likelihood estimators. In the same way, our estimation of θ, see expression (2.5), is H_0-free, contrary to what would happen when using the maximum likelihood technique. Both approaches are asymptotically equivalent under the null hypothesis, see Remark 4, and all the simulations we ran showed very similar empirical levels when comparing the semiparametric and maximum likelihood approaches under null models. However, under certain types of alternatives, the maximum likelihood approach can lead to very unexpected empirical powers. These behaviors are due to compensation phenomena in models close, for example, to the non-identifiable one described in Section 4. To illustrate this point clearly, we detail here the Gaussianity test in these cases. Write

g(x) = (1 − p) f_(0,1)(x) + p h_{a,s}(x − µ), (7.1)

where h_{a,s}(x) = (f_(0,s)(x − a) + f_(0,s)(x + a))/2, a ≠ 0, f_(0,s) being the centered Gaussian density with variance s. We notice that (7.1) turns out to satisfy, when µ = a and s = 1, the following rewriting:

g(x) = (1 − p/2) f_(0,1)(x) + (p/2) f_(0,1)(x − 2µ). (7.2)

In this case there are two different parametrizations for (7.1): one that we call the null parametrization, coinciding with H_0, with null parameters p_0 = p/2, µ_0 = 2µ and s_0 = 1, see the right-hand side of (7.2); the other one, called the alternative parametrization, coinciding with H_1, with p_1 = p, µ_1 = µ and s_1 = µ² + 1, see the right-hand side of (7.1). By construction the maximum likelihood estimator will favor the null parameters. We now study this phenomenon through a set of simulations where the parameters are µ = 4, s = 1 and p = 0.4. For comparison, we used the same initial values for both the semiparametric and maximum likelihood algorithms, namely (p, µ, s) = (0.3, 6, 8.5), which is exactly between the null parametrization (p, µ, s) = (0.2, 8, 1) and the alternative parametrization (p, µ, s) =
(0.4, 4, 17). It is now of interest to study the behavior of the semiparametric and maximum likelihood testing methods when the true model deviates smoothly from the null hypothesis in two ways: i) the unknown component is an h_{a,1} with µ ≠ a, i.e. a µ-symmetric mixture detected by the semiparametric method versus an (a + µ)-centered Gaussian attracting the maximum likelihood method; this case will be called the mean deviation trap; and ii) the unknown component is an h_{a,s} with µ = a but s ≠ 1, i.e. a µ-symmetric mixture detected by the semiparametric method versus a single Gaussian (q = 1) attracting the maximum likelihood method, when actually two Gaussian components (q = 2) would be necessary to accurately fit the model.
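The two parametrizations of the trap can be checked numerically; the sketch below encodes (7.1) and its null rewriting (7.2) (with f_0 = N(0, 1)) and verifies that they coincide when µ = a and s = 1:

```python
import math

def phi(x, mean=0.0, var=1.0):
    """Gaussian density f_(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def g_alt(x, p, mu, a, s):
    """g under the alternative parametrization (7.1):
    (1 - p) f_(0,1)(x) + p h_{a,s}(x - mu), with
    h_{a,s}(y) = (f_(0,s)(y - a) + f_(0,s)(y + a)) / 2."""
    h = 0.5 * (phi(x - mu, a, s) + phi(x - mu, -a, s))
    return (1 - p) * phi(x) + p * h

def g_null(x, p, mu):
    """The null reparametrization (7.2), valid when mu = a and s = 1:
    p0 = p / 2, mu0 = 2 mu, s0 = 1."""
    return (1 - p / 2) * phi(x) + (p / 2) * phi(x, 2 * mu, 1.0)

# The two parametrizations coincide when mu = a and s = 1:
x0 = 1.7
print(abs(g_alt(x0, 0.4, 4.0, 4.0, 1.0) - g_null(x0, 0.4, 4.0)))  # ~ 0
```

As soon as µ ≠ a (the mean deviation trap) or s ≠ 1 (the variance deviation trap), the identity breaks and the two parametrizations describe genuinely different models.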
Mean deviation trap. We consider deviations from the null model obtained by taking µ = 3, 2, 1 and s = 1. Fig. 1 shows the probability density function g under these respective alternatives. It can be observed that, if we try to visually detect a mixture of two Gaussian distributions, the probability density function of the left-side component clearly moves away from the Gaussian distribution family as µ moves largely away from a = 4, i.e. when µ = 1, but we bet that many practitioners would probably vote "intuitively" for a mixture of two Gaussian distributions when µ = 3 or 2. Fig. 1 in the supplementary file illustrates the difficulty of the maximum likelihood estimator to recognize the alternative model when the mean deviation is not distant enough (here µ = 3 and a = 4). Based on a run of 200 repetitions, it is shown that the maximum likelihood estimation is trapped at the null parametrization, namely (p, µ, s) = (0.2, 7, 1), when, on the opposite, the semiparametric estimation detects the correct alternative parametrization (p, µ, s) = (0.4, 3, 17). In Fig. 2 we display the empirical powers of our testing procedure based on the maximum likelihood and the semiparametric approaches for µ = 3, 2, 1, a = 4, s = 1, and for n = 1000, 2000, 5000. As expected, the maximum likelihood approach barely detects the alternative for small values of n, when its semiparametric counterpart surpasses it with up to 10 times more correct decisions. The reason for this lack of power is that our test focuses more on the moments of the second component than on those of the first one and, as seen in Fig. 1, the second component looks pretty much Gaussian even for µ = 1. Variance deviation trap. We consider the variance deviations s = 2, 3, 4, fixing µ = a = 4. Fig. 3 shows the probability density function g under these alternatives. Empirical powers are displayed in Fig. 4.
We can observe that both powers, associated with the maximum likelihood and semiparametric approaches, increase with the variance deviation, but it is worth noticing that the detection based on the maximum likelihood approach is again very poor compared to the semiparametric approach. As a conclusion, this set of numerical experiments shows the clear interest, in terms of testing power, of considering the semiparametric versus the maximum likelihood approach, especially in a close neighborhood of non-identifiable type (1.1) Gaussian models. Empirical levels. The empirical levels obtained over the simulated examples are reported in Fig. 2 of the supplementary file. It appears that a significant number of observations is needed to get close to the theoretical level. This drawback can be balanced by the fact that today, as mentioned in the Introduction, genomic datasets usually contain thousands of genes, which makes our methodology in practice suitable for a wide class of standard (from the sample size viewpoint) microarray analysis problems.
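The empirical levels and powers reported throughout are Monte Carlo rejection percentages; a generic skeleton (the inner test routine being left abstract) can be sketched as:

```python
import random

def empirical_rate(run_test, n_rep=200, seed=0):
    """Empirical level / power: the percentage of rejections of H0 over
    n_rep independent repetitions (generic Monte Carlo skeleton).

    run_test(rng) must simulate one dataset and return True iff H0 is rejected."""
    rng = random.Random(seed)
    return 100.0 * sum(run_test(rng) for _ in range(n_rep)) / n_rep

# Sanity check: a test rejecting with probability 5% under H0 should show
# an empirical level near the 5% asymptotic level.
level = empirical_rate(lambda rng: rng.random() < 0.05, n_rep=2000)
print(level)
```

In the paper's experiments, run_test would wrap the full pipeline: simulation of model (1.1), semiparametric estimation, rank selection and the χ²(1) decision.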

Empirical powers
In this section we consider the Gaussian testing problem (1.2) with F = G, where the alternatives for the unknown component are based on Student and Laplace distributions; the corresponding empirical powers are displayed in Figs. 5 and 6.
As expected, when comparing the Student alternatives pairwise, the power is greater for the t(3) distribution than for the t(10) distribution. The t(3) is very clearly detected by the test, since the detection level is greater than 80% in all cases and even close to 100% for n = 7000. Now, similarly to the mean and variance deviation trap setups investigated in Section 7, we can observe that the power is greater as p increases, which practically means that the Student component is enhanced in the model (recall that our test procedure is focused on the analysis of the second-component moments). We display the mixture densities corresponding to this set of alternatives in Fig. 3 of the supplementary file. For the first Student alternative, comparing p = 1/2 and p = 0.98, we can observe that a serious jump happens in terms of dissimilarity between the alternative model and the best-fitted (same mean and variance) Gaussian null model. For p = 0.98, the Student distribution strongly prevails and the test is automatically empowered. The second alternative is also detected, but with a lower power, say between 40% and 90%, due to the proximity of the Student t(10) to the Gaussian N(0, 1). In Fig. 3 of the supplementary file we can see how close the null distribution and the t(10) alternative are, especially for p = 1/3 and p = 1/2, and visually evaluate how challenging these testing problems really are.
The empirical powers for the Laplace alternatives are given in Fig. 6. The power is larger with the alternative L(1, 2) than with the alternative L(1, 1). Indeed the L(1, 2) distribution has a stronger shape departure from the Gaussian than the L(1, 1), and the associated mixture densities inherit these characteristics, as we can see in Fig. 3 of the supplementary file. These alternatives are globally very well detected by our method and the power increases strongly when p gets closer to 1 (see the green curve in Fig. 6). This is explained here by the fact that the nonparametric and the maximum likelihood estimators lead to notably different values, especially for p.
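As a quick numerical sanity check of the two Laplace alternatives (assuming L(µ, b) denotes a Laplace distribution with location µ and scale b, whose variance is 2b²):

```python
import numpy as np

rng = np.random.default_rng(2)
# Var(L(mu, b)) = 2 b^2: L(1, 1) has variance 2 and L(1, 2) has variance 8,
# matching the two mean-1 alternatives considered above.
v11 = np.var(rng.laplace(loc=1.0, scale=1.0, size=200_000))
v12 = np.var(rng.laplace(loc=1.0, scale=2.0, size=200_000))
```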
HIV data. We consider the HIV dataset of van't Wout et al. (2003). It contains expression levels of n = 7680 genes in CD4 T-cell lines after infection with the HIV-1 virus. The maximum likelihood estimates of the parameters are p_n = 0.98, µ_n = −0.15, s_n = 0.79. The semiparametric method provides p̂_n = 0.99, µ̂_n = 0.20 and ŝ_n = 0.80. The p-value given by our testing procedure is equal to 0.64, associated with the decision S_n = 1. As a consequence, normality under H_0 cannot be rejected, despite the fact that the maximum likelihood and semiparametric estimates of µ are quite different; both are close to 0, meaning a strong overlap of the mixed distributions (see the almost symmetric third probability density function in Fig. 7).
Galaxy data. We consider here the Carina dataset, see Walker et al. (2007), previously studied in Patra and Sen (2016). Carina is a low luminosity companion galaxy of the Milky Way. The dataset collects n = 1266 measurements of the radial velocity of stars in Carina. This is a contamination model in the sense that measurements of stars in the Milky Way are mixed with those of Carina (overlapping). The Milky Way is largely observed, see Robin et al. (2003).
Figure 8 shows the density f_0 of the radial velocity of the Milky Way, estimated over n = 170,601 observations. This density is clearly not zero-symmetric, but in such a case it is enough to refer to the tail-oriented set of identifiability conditions of Proposition 3 i) in Bordes et al. (2006) for the semiparametric estimation method to remain valid. Note also that the asymptotic results of Bordes and Vandekerkhove (2010) still hold in this setting. Though the nonparametric estimation of the Kullback distance is a delicate problem, see Berrett et al. (2018) and references therein, the fact that the unknown component of g under H_0 is supposed to have a parametric form should definitely help to control some technical tail issues specific to this estimation. We obtained for p and µ, respectively the proportion and the mean of the Carina radial velocity, the following estimates: p̂_n = 0.361 and µ̂_n = 222.60.
In their study, Patra and Sen (2016) obtained very similar values: p = 0.323 and µ = 222.9. However, the estimation of the variance s appears to be highly sensitive to the estimation of p. Using the plug-in estimator given by (2.5) we get ŝ_n = 453.93. Note that the estimate given in Patra and Sen (2016) was s_n = 56.4, which looks far from the expected value given the data. To illustrate this remark, we compare in Fig. 8 the kernel density estimate of the observed data with the probability density of model (1.1), obtained by replacing (p, µ, s) by our estimates (p̂_n, µ̂_n, ŝ_n) and by Patra and Sen (2016)'s estimates (p_n, µ_n, s_n). We can observe that our estimation provides an excellent fit, whereas the variance estimated by Patra and Sen (2016) appears to be far too small. Our test procedure yields a p-value equal to 0.75 with a test statistic T_{S_n,n} = T_1 = 0.097. As a consequence, there is no evidence here to reject the normality of the Carina radial velocity.
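Since the statistic is asymptotically χ²(1)-distributed under H_0, the reported Carina p-value can be reproduced from the value T_1 = 0.097 alone; a one-line sketch using the closed form of the χ²(1) survival function:

```python
from math import erfc, sqrt

# For one degree of freedom the chi-square survival function has the
# closed form P(chi2(1) > t) = erfc(sqrt(t / 2)).
T = 0.097
p_value = erfc(sqrt(T / 2.0))   # approximately 0.755
```

This is consistent with the p-value of 0.75 reported above.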

Discussion and perspectives
In this paper we proposed an H_0-free testing procedure to deal with the delicate problem of the contamination model parametrization. In our numerical study we focused our attention on the Gaussianity testing problem; however, it is very important to stress that our asymptotic results can be generalized to any suitable distribution (possibly non-symmetric). Indeed, if the unknown distribution of model (1.1) is embedded in a nonparametric envelope S provided with identifiability constraints, and if there exists a corresponding semiparametric √n-consistent estimation method, then the asymptotic results in Sections 3-4 extend straightforwardly. For this latter case, we refer to the recent work by Al Mohammad and Boumahdaf (2018), who consider in model (1.1) an unknown component defined through linear constraints. In their paper, the authors derive an original consistent and asymptotically normally distributed semiparametric estimation method with closed-form asymptotic variance expressions. Indeed, when considering null assumptions different from the Gaussian case, basically only the shape parameter estimation, usually deduced from moment equations, and the choice of the orthogonal basis described in Section 2 could possibly change, depending on the support of the tested distribution. Wavelet functions and Laguerre polynomials could respectively be used for probability density functions on the whole, respectively positive, real line, while Legendre or cosine bases could be used for densities with compact support. Also, with a slight adaptation of our work, we could definitely test the unknown component of the contamination model considered in the recent work by Ma and Yao (2015), where the first component density is only supposed to belong to a parametric family (the first component is not entirely known anymore). For each case, the use of the maximum likelihood or the semiparametric approach could again be discussed. On the other hand, as demonstrated in Section 7, see Figs.
2 and 4, the semiparametric testing approach shows better power performance than the maximum likelihood version, especially in the neighborhood of the mean and variance deviation trap situations (up to 10 times more efficient for small sample sizes). We also proposed in Section 5 a vanishing convolution class of nonparametric contiguous alternatives and studied theoretically their detectability under certain convergence rate conditions. In future work it would be very interesting to address the contiguous detection problem associated with the mean and variance deviation trap setups. This would namely consist in looking at the asymptotic behavior of our test when replacing the parameters µ and s in the mean and variance deviation trap setups by sequences µ_n and s_n converging respectively towards a and 1 as n goes to infinity. The major technical difficulty here is that we are not yet able to establish optimal convergence bounds for the semiparametric Euclidean estimator associated with a triangular array driven by the above asymptotic parametrization, see Remark 4 in the supplementary material file. Future work will also consider a K-sample extension, K ≥ 2, in the spirit of Wylupeck (2010), Ghattas et al. (2011) or, more recently, Doukhan et al. (2015).

imsart-generic ver. 2014/10/16 file: DParxiv-V2.tex date: March 11, 2019

It is important for us to keep the summation term up to d(n) in the left hand side of the above inequality-type event in order to straightforwardly use the almost sure rate of convergence of the semiparametric Euclidean parameters, see (11.5)-(11.6). We decompose R_{k,n} accordingly. By using the inequality (a + b)² ≤ 2(a² + b²), for all (a, b) ∈ R², we obtain a bound whose terms we now study separately. By the Markov inequality, we first have a bound whose right hand side goes to zero as n → +∞, since d(n)/log(n)e(n) = O(1) according to (A1) and (2.9). Secondly, by a decomposition involving p̂_n, we obtain the following majorization. Since the α_{k,n}'s are bounded by M_1 according to (A2),
we have a bound whose last right hand side term goes to zero as n → +∞, since λ ∈ (0, 1/2), by Bordes and Vandekerkhove (2010). Denoting ρ_0 = (µ_0, θ_0) and ρ̂_n = (µ̂_n, θ̂_n), we also have ‖ρ̂_n − ρ_0‖_2 = o_{a.s.}(n^{−1/2+α}), for all α > 0. Since the α̂_{k,n}'s are bounded by M_2 according to (A2), using the mean value theorem we obtain the corresponding bound.

Under the alternative there exists j ≥ 1 such that r_j ≠ 0. For simplicity let us consider j_0 = min{j ≥ 1 : r_j ≠ 0}. Since from (2.6), for every fixed k ≥ 1, we can decompose T_{k,n} as follows, it comes that for all k < j_0, T_{k,n} = O_p(n^{λ−1}), since the r_ℓ's are all equal to zero for 1 ≤ ℓ ≤ k, whereas for the index j_0 we have T_{j_0,n} ≥ n^λ r²_{j_0} + O_p(n^{λ−1/2}). It follows that for all k < j_0 we have P(s(n)T_{k,n} − β_k pen(n) < s(n)T_{j_0,n} − β_{j_0} pen(n)) → 1, as n → +∞.
This obviously shows, according to the definition (2.8) of S_n, that S_n ≥ j_0 with probability tending to one as n → +∞. Now, since T_{k,n} is increasing in k for every given n ≥ 1, we have T_{S_n,n} ≥ T_{j_0,n} ≥ n^λ r²_{j_0} + O_p(n^{λ−1/2}), which proves the desired result. Note that the right hand side of the previous inequality clearly shows a drift of our test statistic of order O_p(n^λ), 0 < λ < 1/2, under the alternative H_1. To prove that the right hand side term of the above probability goes to zero as n → +∞, we decompose R^{(n)}_{k,n} as follows, with α denoting the expectation of the k-th difference between the H We study the two above probabilities separately. First we have, according to the Markov inequality and Condition (A3), a bound whose last right hand side term goes to zero as n → +∞ according to (A1). Since the α̂_k's are bounded by M_2 according to (A2), using the mean value theorem, we obtain:

Secondly we have
whose last term goes to zero as n → +∞ according to (A1). Hence from (11.7), we obtain that P(S_n ≥ 2) → 0 as n → +∞. Therefore, using the proof of Corollary 3, we get the limiting distribution of the test statistic T^{(n)} under H*_1.

Proposition 3. Let us compute the closed forms of the quantities ψ_{1,n} and ψ_{2,n} defined in (11.9). It first comes

normalizing the test statistic as in Munk et al. (2010). To avoid instability in the evaluation of D^{−1}_{k,n}, following Doukhan et al. (2015), we add a trimming term e(n) to every i-th, i = 1, . . ., k, diagonal element of D_{k,n} as follows:

Fig 2: Empirical powers obtained with the maximum likelihood approach (left) and the semiparametric approach (right) under the trap effect for µ = 3, 2, 1 and a = 4.

Proposition 2.
Similarly to the proof of Theorem 2, we have P(S^{(n)}_n ≥ 2) ≤ P
Since, according to Theorem 3.2 in Bordes and Vandekerkhove (2010), the semiparametric estimator F̂_n of F satisfies a functional central limit theorem, one could consider ŝ_n in (2.5) as a natural estimate of s under H_0 and evaluate the square of

Here G denotes the set of Gaussian densities with mean µ and variance s, which we compare to Student and Laplace alternatives. First, a 1-shifted Student distribution t(3), having a shape far enough from the Gaussian distribution, with shift µ = 1. Second, a shifted Student t(10), again with a shift equal to 1, but having a shape closer to the null Gaussian distribution. Third, a Laplace distribution L(1, 1) with mean 1 and variance 2. The last alternative is a Laplace L(1, 2) with mean 1 and variance 8. The empirical powers for the Student and Laplace alternatives are respectively summarized in Figs. 5 and 6.
The asymptotic results of Bordes and Vandekerkhove (2010) still hold if the cumulative distribution function F_0 is replaced by a smooth empirical estimate F_{0,n} based on an a_n = ϕ(n) sized training dataset, provided that a_n/n → 0 as n → +∞. Unfortunately the study of the maximum likelihood estimate, see Section 5 of the supplementary file, cannot be generalized straightforwardly, since the nonparametric estimation of the Kullback distance, obtained by replacing f_0 by a kernel density estimate f_{0,n} in the log-likelihood, is known to be a very delicate problem, see Berrett et al. (2018).
Fig 8: Left side: estimated density of the Milky Way radial velocities. Right side: in black, the nonparametric density estimate of the Carina dataset; in red, resp. in blue, the probability density function of model (1.1) under f = f_{(µ,s)}, obtained by plugging our estimates (p̂_n, µ̂_n, ŝ_n), resp. Patra and Sen (2016)'s estimates (p_n, µ_n, s_n), into (p, µ, s).
More precisely, we could test the equality of K unknown components through K observed mixture models.

Acknowledgement. The authors acknowledge the Office for Science and Technology of the Embassy of France in the United States, especially its antenna in Atlanta, for its valuable support to this work.

11. Appendix: proofs of the main results

Theorem 2. Let us prove that P(S_n ≥ 2) vanishes as n → +∞. By definition of S_n in (2.8) and D_{k,n}[•] in (2.7), we have for all λ ∈ (0, 1/2):