Probing the scale of New Physics at the LHC: the example of Higgs data

We present a technique to determine the range of New Physics (NP) scales compatible with any set of data, in terms of well-defined Bayesian credible intervals. Our approach is based on a statistical view of the effective field theory describing New Physics at low energy. We formally introduce the notion of testable NP and show that it ensures the integrability of the posterior distribution. We apply our method to the Standard Model Higgs sector in light of recent LHC data, considering two generic scenarios. In the scenario of democratic higher-dimensional operators generated at one loop, we find the testable NP scale to lie within $[10,260]$ TeV at $95\%$ Bayesian credibility level. In the scenario of loop-suppressed field strength-Higgs operators, the testable NP scale lies within $[28,1200]$ TeV at $95\%$ Bayesian credibility level. More specific UV models are necessary to allow lower values of the NP scale.


Introduction
Several major experimental and theoretical facts, such as the measurement of neutrino masses, the evidence for dark matter, the hierarchy problem and the striking hints of Grand Unification, all point towards physics beyond the Standard Model (SM). Although there are strong expectations that such New Physics (NP) should show up at an energy scale close to the electroweak scale, direct searches for new states have so far been unsuccessful. Indirect constraints from electroweak precision measurements at LEP also push the NP scale $\Lambda$ above the electroweak scale.
Overall, it seems that $\Lambda$ should be substantially higher than the electroweak scale, $\Lambda \gg m_Z$. This paradigm is adopted in a large number of NP proposals, and we adopt this fairly general hypothesis in the present work. It implies that the NP involved in physical processes at an energy scale $E \ll \Lambda$ can be integrated out. This results in a low-energy effective theory consisting of the Standard Model supplemented by an infinite series of local, higher-dimensional operators (HDOs) $\mathcal{O}_i$ involving negative powers of the NP scale $\Lambda$,
$$\mathcal{L}_{\rm eff} = \mathcal{L}_{\rm SM} + \sum_i \frac{\alpha_i}{\Lambda^{d_i-4}}\,\mathcal{O}_i , \qquad (1)$$
where $d_i$ is the dimension of $\mathcal{O}_i$ and the $\alpha_i$ are dimensionless coefficients. Considering a set of experimental observations through this effective description of new physics, one may ask what information can be obtained about $\Lambda$. For a dataset perfectly compatible with the SM, it is common to derive a lower bound on $\Lambda$, barring fine-tuned cancellations among HDO-induced contributions. On the other hand, if the data show a deviation with respect to the SM, arbitrarily high values of $\Lambda$ should also be disfavoured, as the effective theory reduces to the SM in the decoupling limit $\Lambda \to \infty$ and cannot explain the discrepancy. Finding a general method to consistently infer the range of $\Lambda$ compatible with some data, whether or not they deviate from the SM, is the subject of the present work.
We use the effective theory approach within the framework of Bayesian statistics. An important feature of the Bayesian framework is that any irrelevant parameter can be eliminated consistently and in a well-defined way through integration (marginalisation). Here we are mainly interested in the probability distribution of $\Lambda$, $p(\Lambda|{\rm data})$, which is obtained by integrating over all the $\alpha_i$ coefficients. Adopting a Bayesian view is appropriate to account for the generic character of the scenarios we consider (i.e. it ensures that no fine-tuning is present in the scenario). The outline of this note is as follows. In Section 2 we briefly review the basics of Bayesian inference and discuss its application to effective theories. In Section 3 we show that one has to require NP to be testable to obtain an integrable posterior. The basic Markov Chain Monte Carlo (MCMC) setup and the conceptual subtleties inherent to our approach are discussed in Section 4. Although inference on $\Lambda$ applies to any kind of data, it is particularly motivated by current LHC results. In Section 5 we apply our method to the Standard Model Higgs sector, using the latest information available from CMS, ATLAS and the Tevatron. We discuss the leading constraints and the necessary conditions favouring lower values of the NP scale.

Effective theory and Bayesian inference
Let us briefly review the necessary notions of Bayesian statistics (see [2] for an introduction). In this approach, the probability $p$ of a proposition is defined as the degree of belief in that proposition. Our study lies in the domain of Bayesian inference, which is based on Bayes' theorem,
$$p(\theta|d,M) = \frac{p(d|\theta,M)\,p(\theta|M)}{p(d|M)} . \qquad (2)$$
In our case $\theta \equiv \{\Lambda, \alpha_{1\ldots n}\}$ are the parameters of the higher-dimensional operators (HDOs) defined in Eq. (1). The parameter space is denoted by $D$. $M$ is the Standard Model extended with HDOs, and $d$ represents the experimental data. The distribution $p(\theta|d,M)$ is the so-called posterior distribution, $p(d|\theta,M) \equiv L(\theta)$ is the likelihood function encoding the experimental data, and $p(\theta|M)$ is the prior distribution, which represents our a priori degree of belief in the parameters. The posterior distribution is the core of our results. Being interested in the new physics scale, we focus on the marginal posterior $p(\Lambda|d,M)$, obtained by integrating over all HDO coefficients $\alpha_i$,
$$p(\Lambda|d,M) = \int d\alpha_1 \cdots d\alpha_n \; p(\Lambda,\alpha_1,\ldots,\alpha_n|d,M) . \qquad (3)$$
The prior and posterior distributions do not need to be normalized to unity to carry out the inference process in its broader meaning. For example, if a significant deviation from the SM is present in the data, it is sufficient to look at the bump in the improper posterior of $\Lambda$ to get a good idea of the values of $\Lambda$ favoured by the data. However, to go further and determine intervals associated with an actual probability (Bayesian credible intervals), the posterior does need to be normalizable. More precisely, the posterior needs to be "proper": it should be integrable over an unbounded domain like $\mathbb{R}$. Over a bounded domain, the integral should be independent of the bound of the domain, unless the bound is well justified.
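To make the marginalisation over the HDO coefficients concrete, the sketch below integrates a joint posterior over a single coefficient $\alpha$ on a grid. It is a minimal illustration under stated assumptions, not the method used in the paper, which relies on MCMC (Sect. 4). It borrows the hypothetical "BSM coin" likelihood of Appendix D for one Heads and one Tails, a flat prior on $\alpha\in[-1,1]$ and the unnormalized log-uniform prior $1/\Lambda$. All function names are illustrative.

```python
import numpy as np

def coin_likelihood(lam, alpha):
    """Hypothetical BSM-coin likelihood for one Heads and one Tails: p(H) * p(T)."""
    x = alpha / lam
    return (0.5 + x) * (0.5 - x)

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y, for a 1D grid x."""
    return 0.5 * np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def marginal_posterior(lam_grid, n_alpha=2001):
    """Unnormalized p(Lambda|d) = p(Lambda) * integral over alpha of p(alpha) L(Lambda, alpha)."""
    alpha = np.linspace(-1.0, 1.0, n_alpha)
    prior_alpha = 0.5                     # flat prior on [-1, 1], normalized to 1
    prior_lam = 1.0 / lam_grid            # log-uniform prior, unnormalized
    joint = coin_likelihood(lam_grid[:, None], alpha[None, :]) * prior_alpha
    return prior_lam * trapezoid(joint, alpha)

lam = np.array([2.0, 10.0, 1.0e4])
post = marginal_posterior(lam)
# For this toy case the alpha-integral is exactly 1/4 - 1/(3 Lambda^2).
exact = (0.25 - 1.0 / (3.0 * lam**2)) / lam
print(post, exact)
```

Multiplying the result by $\Lambda$ shows that $\Lambda\, p(\Lambda|d)$ approaches $L_{\rm SM} = 1/4$ at large $\Lambda$. This is the behaviour responsible for the improperness discussed below.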
In the rest of this section we will observe that the Λ's posterior is improper. We will find the conceptual subtlety at the origin of this improperness, then propose a slight conceptual change leading to a proper Λ's posterior. In this work we consider as valuable the ability to determine Bayesian Credible intervals, and thus to have proper posteriors. However, even without paying particular interest to properness and Bayesian Credible intervals, the conceptual observations and their consequences that we will present below are in any case relevant for anyone interested in inference on Λ.
For concreteness, we give the NP scale a logarithmically uniform prior distribution,
$$p(\Lambda|M) \propto \frac{1}{\Lambda} . \qquad (4)$$
In doing so, all orders of magnitude are given the same probability weight. This is arguably the most objective choice, justified by the "principle of indifference" [3,4]. There is no sensible argument to fix an upper bound on $\Lambda$, so the prior of $\Lambda$ is improper. Similarly, we give uniform priors to the $\alpha_i$. Contrary to the domain of $\Lambda$, there are well-justified bounds on the $\alpha_i$ that follow from the perturbativity of the EFT approach. Indeed, for $\Lambda > 4\pi v$, perturbativity implies $\alpha_i \in [-16\pi^2, 16\pi^2]$. We refer to [5] for more details about the bounds on HDO coefficients. Although the priors adopted above are well motivated, the whole approach, including the upcoming statements, remains valid for any choice of priors, as long as the domain of the $\alpha_i$ remains bounded.
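The priors above can be sampled directly. The following sketch assumes NumPy and uses illustrative function names. It draws $\Lambda$ log-uniformly and $\alpha$ uniformly, then applies a tree-level perturbativity cut. That cut combines the bound $|\alpha|<16\pi^2$ quoted here for $\Lambda>4\pi v$ with the condition $|\alpha|/\Lambda^2<1/v^2$ used for $\Lambda<4\pi v$ in Sect. 5; the two coincide at $\Lambda=4\pi v$. The value of $v$ is left to the caller, with $\Lambda$ and $v$ expressed in the same units.

```python
import numpy as np

def sample_prior(n, lam_min, lam_max, alpha_max, rng):
    """Draw (Lambda, alpha): log-uniform in Lambda, flat in alpha on [-alpha_max, alpha_max]."""
    lam = np.exp(rng.uniform(np.log(lam_min), np.log(lam_max), size=n))
    alpha = rng.uniform(-alpha_max, alpha_max, size=n)
    return lam, alpha

def is_perturbative(lam, alpha, v):
    """Assumed tree-level cut: |alpha| < min(16 pi^2, Lambda^2 / v^2)."""
    bound = np.minimum(16.0 * np.pi**2, (lam / v) ** 2)
    return np.abs(alpha) < bound

rng = np.random.default_rng(0)
lam, alpha = sample_prior(400_000, 1.0, 1.0e4, 16.0 * np.pi**2, rng)
# With a log-uniform prior, each decade of Lambda receives the same weight (1/4 of draws here).
per_decade = np.histogram(np.log10(lam), bins=4, range=(0.0, 4.0))[0] / lam.size
print(per_decade)
```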
Determining the posterior distribution of $\Lambda$ is a standard Bayesian procedure. However, a peculiarity of the $\Lambda$ posterior is that in the decoupling limit $\Lambda \to \infty$ the likelihood tends to its SM value, $L \to L_{\rm SM}$. As the logarithmic prior of $\Lambda$ is also improper, the posterior distribution turns out to be improper in the $\Lambda$ direction,
$$p(\Lambda|d,M) \propto \frac{L_{\rm SM}}{\Lambda} \quad {\rm for}\ \Lambda\to\infty , \qquad (5)$$
which is not integrable at large $\Lambda$. To understand the origin of this improperness, let us introduce the notion of "testability", with its usual meaning in the philosophy of science (see e.g. [6]). Considering the effective Lagrangian Eq. (1), we observe that for $\Lambda \to \infty$ the new physics cannot manifest itself in the data, so it is not testable at $\Lambda \to \infty$. However, the behaviour of $p(\Lambda|d)$ in the decoupling limit does not seem to reflect this fact, as it remains constant up to the $1/\Lambda$ factor, which comes only from the prior.
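The divergence can be seen numerically in the same hypothetical BSM-coin toy of Appendix D: one Heads and one Tails, a flat $\alpha$ prior on $[-1,1]$ and a prior $1/\Lambda$ on $[2,\infty[$. Integrating its likelihood over $\alpha$ gives the closed form $p(\Lambda|d)\propto (1/4 - 1/(3\Lambda^2))/\Lambda$. The sketch, which assumes NumPy, integrates this expression up to a cutoff $\Lambda_{\max}$. Each additional decade adds roughly $L_{\rm SM}\ln 10 = (\ln 10)/4$, so the integral grows without bound as $\Lambda_{\max}\to\infty$.

```python
import numpy as np

def marginal_posterior(lam):
    """Closed-form unnormalized p(Lambda|d) for the hypothetical BSM coin (one H, one T)."""
    return (0.25 - 1.0 / (3.0 * lam**2)) / lam

def integral_up_to(lam_max, lam_min=2.0, n=200_001):
    """Integrate p(Lambda|d) on [lam_min, lam_max] in t = ln(Lambda), using dLambda = Lambda dt."""
    t = np.linspace(np.log(lam_min), np.log(lam_max), n)
    y = marginal_posterior(np.exp(t)) * np.exp(t)
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(t))

for lam_max in (1e2, 1e4, 1e6, 1e8):
    print(f"Lambda_max = {lam_max:.0e}: integral = {integral_up_to(lam_max):.4f}")
```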
Let us translate the notion of testability into a formal statement. We adopt the following definition as a Bayesian translation of testability: "A model is testable with respect to the SM for a given dataset $d$ whenever $L \neq L_{\rm SM}$." We would like to know what happens to our posterior when we require testability. For a continuous parameter space, requiring testability corresponds to excising a slice $\Omega_{\Lambda,L_{\rm SM}}$ of the parameter space, defined as $\Omega_{\Lambda,L_{\rm SM}} = \{\alpha_i \,|\, L(\Lambda,\alpha_i) = L_{\rm SM}\}$. By requiring testability of the HDO-extended SM, inference is therefore made on the possibilities of new physics that are actually testable by the data. Stated differently, to the initial question "What can we learn about $\Lambda$ from $d$?", we already know that the answer is "Nothing" whenever $L = L_{\rm SM}$. We therefore discard this particular possibility and investigate the NP that can actually be probed by $d$.
The fact that the requirement of testability leads to a proper posterior is demonstrated in Sect. 3 and in the Appendix; we assume it for the rest of this section. Requiring testability, the marginal posterior of the NP scale $\Lambda$ is then expressed as
$$p^*(\Lambda|d,M) \equiv p(\Lambda|L\neq L_{\rm SM},d,M) \propto \int_{D\setminus\Omega_{\Lambda,L_{\rm SM}}} d\alpha_1\cdots d\alpha_n \; p(\Lambda,\alpha_1,\ldots,\alpha_n|d,M) . \qquad (6)$$
In our approach this distribution is the relevant object for learning about the NP scale, and it is therefore at the centre of the subsequent applications. We refer the reader to Sect. 3 for a formal discussion. Notice that this subtlety about testability usually does not matter when the posterior is proper. Typically, the likelihood is continuous and bounded, such that the subdomain $\Omega_{\Lambda,L_{\rm SM}}$ has measure zero. Excluding this subdomain therefore does not change integrals of the posterior and leaves the results of inference unchanged. The requirement of testability becomes important in our case because the posterior is improper. More generally, this problem is likely to arise whenever the NP scale is a free parameter of the model. Some qualitative comments can be made about the effects that drive the shape of the $p^*(\Lambda|d,M)$ posterior. Both tails drop to zero fast enough for the distribution to be integrable. Let us first consider the low-$\Lambda$ tail of the distribution. Even though experimental constraints push $\Lambda$ to high values, precise cancellations between various HDO contributions often allow $\Lambda$ to reach low values. However, the regions of parameter space in which precise cancellations occur have a small statistical weight by construction, so their unnatural character is built into the Bayesian approach (see [7] for more considerations on naturalness). We conclude that the low-$\Lambda$ tail is set by the trade-off between goodness of fit and possible fine-tuning. For the high-$\Lambda$ tail, if the data $d$ are compatible with the SM, its shape is asymptotically independent of $d$ and is dictated only by the probability of $M$ to be testable.
In contrast, when $d$ shows some deviation with respect to the SM, a good fit of the deviation favours low values of $\Lambda$. The high-$\Lambda$ tail is thus shaped by two effects. By default it follows a profile that depends only on $M$, and this profile is overwhelmed by the shape dictated by goodness of fit once an excess appears in the data. This behaviour of the high-$\Lambda$ tail can be observed in the toy model of Appendix D.

Inference on testable new physics
In this section we scrutinize the posterior to better understand how its integral diverges, and then show that the requirement of testable NP leads to a proper posterior. The framework of Lebesgue integration is needed to treat the following questions rigorously, so we introduce the Lebesgue measure $\mu$. In what follows we let $\Lambda$ go to infinity, such that the designation "proper" is equivalent to "integrable". Various proofs are reported in the Appendix, together with a useful example of explicit computation within a toy model.
Let us first show that the integral of the posterior distribution diverges, i.e.
$$\int_{\Lambda_{\min}}^{\infty} d\Lambda \; p(\Lambda|d,M) = \infty . \qquad (7)$$
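To do so, let us rewrite Eq. (7) so that the manifold $\Omega_{\Lambda,L}$, defined by fixed values of the likelihood, appears explicitly:
$$\int_{\Lambda_{\min}}^{\infty} d\Lambda\, p(\Lambda|d,M) = \int_{\Lambda_{\min}}^{\infty} d\Lambda \int dL\; p(\Lambda,L|d,M) , \qquad (8)$$
where
$$p(\Lambda,L|d,M) = \int_{\Omega_{\Lambda,L}} d\sigma \, \frac{p(\Lambda,\alpha_1,\ldots,\alpha_n|d,M)}{J} . \qquad (9)$$
The Jacobian is
$$J = \Big( \sum_i \big(\partial L/\partial \alpha_i\big)^2 \Big)^{1/2} , \qquad (10)$$
and we define the measure $\nu$ on the likelihood values through
$$d\nu = p(\Lambda,L|d,M)\, d\mu(L) , \qquad (11)$$
where $\mu$ is the Lebesgue measure. It is shown in Appendix A that $p(\Lambda,L|d,M)$ tends to a Dirac peak (i.e. $\nu$ tends to the Dirac measure) in the decoupling limit,
$$p(\Lambda,L|d,M) \;\xrightarrow[\Lambda\to\infty]{}\; p(\Lambda|d,M)\,\delta(L-L_{\rm SM}) . \qquad (12)$$
A schematic picture of the $p(\Lambda,L|d,M)$ distribution is shown in Fig. 1. Using the Radon-Nikodym (RN) decomposition, the measure $\nu$ can be decomposed as
$$\nu = \nu_c + \nu_d , \qquad (13)$$
where $\nu_c$ is absolutely continuous with respect to the Lebesgue measure while $\nu_d$ is discrete. The discrete measure is concentrated on the SM value of the likelihood,
$$\nu_d(E) = \nu_d\big(E \cap \{L_{\rm SM}\}\big) , \qquad (14)$$
and we can then identify our "excised" marginal posterior as
$$p^*(\Lambda|d,M) \propto \int \left| \frac{d\nu_c}{d\mu} \right| d\mu . \qquad (15)$$
The presence of the absolute value is related to a non-trivial subtlety in the definition of the excised probability space, which is discussed in Appendix C. In the decomposition of $\nu$ defined by Eqs. (13)-(15), the contribution from the discrete measure $\nu_d$ turns out to be infinite,
$$\int_{\Lambda_{\min}}^{\infty} d\Lambda \int d\nu_d = \infty . \qquad (16)$$
In contrast, one can show that the measure $\nu_c$ leads to a finite contribution,
$$\int_{\Lambda_{\min}}^{\infty} d\Lambda \int \left|\frac{d\nu_c}{d\mu}\right| d\mu < \infty , \qquad (17)$$
for any HDO of dimension $m+4$. The proofs of Eqs. (16) and (17) are given in Appendix B. From this point of view, the divergent part of the posterior is localized on the subspace $\Omega_{\Lambda,L_{\rm SM}}$, which is precisely the domain where the new physics cannot be tested by the data. Requiring testability reduces the parameter space to $D\setminus\Omega_{\Lambda,L_{\rm SM}}$, so only the contribution of Eq. (17) remains in the posterior integral. Since this contribution is finite, the posterior of testable NP is indeed proper.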
We can check that the requirement of testability does no harm to the experimental information. Recall that the likelihood function originally comes from an experimental probability density function (PDF) $p_X(x)$ associated with some observable $X$. We assume that $p_X$ has no discrete component. The cumulative distribution function of the observable $X$ is
$$F_X(x) = \int_{-\infty}^{x} p_X(x')\,dx' .$$
Expressing $x$ as a function of $(\Lambda,\alpha_i)$, the likelihood function is then
$$L(\Lambda,\alpha_i) = p_X\big(x(\Lambda,\alpha_i)\big) .$$
The domain $\Omega_{\Lambda,L_{\rm SM}}$ is mapped onto the SM value of the observable, $x_{\rm SM}$. Excluding this domain amounts to excluding the point $x_{\rm SM}$ from the experimental density. Since a single point has measure zero, this leaves the cumulative distribution function unchanged. We conclude that the restriction from $D$ to $D\setminus\Omega_{\Lambda,L_{\rm SM}}$ leaves the experimental information invariant.

The MCMC setup
In the present work we evaluate posterior distributions by means of a Markov Chain Monte Carlo (MCMC) method. The basic idea of an MCMC is to set up a random walk in the parameter space such that the density of points asymptotically reproduces a target function, in our case the posterior distribution. Any marginalisation is then performed through a simple binning of the points of the Markov chain along the appropriate dimension. We refer to [8,2] for details on MCMCs and Bayesian inference. Our MCMC method uses the Metropolis-Hastings algorithm with a symmetric Gaussian proposal function. We check the convergence of our chains using an improved Gelman and Rubin test with multiple chains [9]. The first $10^4$ iterations are discarded (burn-in).
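A minimal sketch of such a sampler is given below, assuming NumPy. It implements random-walk Metropolis-Hastings with a symmetric Gaussian proposal and a burn-in of $10^4$ iterations, together with the basic Gelman-Rubin potential scale reduction factor $\hat R$. The improved multi-chain test of [9] is not reproduced. The step size, chain length and stand-in Gaussian target are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np

def metropolis_hastings(log_post, x0, step, n_iter, rng, burn_in=10_000):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    for i in range(n_iter):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Symmetric proposal: the acceptance ratio reduces to the posterior ratio.
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain[i] = x
    return chain[burn_in:]                              # discard the burn-in phase

def gelman_rubin(chains):
    """Basic potential scale reduction factor R-hat, one value per parameter."""
    chains = np.asarray(chains)                         # shape (m chains, n samples, d params)
    n = chains.shape[1]
    w = chains.var(axis=1, ddof=1).mean(axis=0)         # mean within-chain variance
    b = n * chains.mean(axis=1).var(axis=0, ddof=1)     # between-chain variance
    var_hat = (n - 1) / n * w + b / n
    return np.sqrt(var_hat / w)

def log_post(x):
    """Stand-in target: log-density of a 2D standard normal, up to a constant."""
    return -0.5 * np.sum(x**2)

rng = np.random.default_rng(0)
chains = [metropolis_hastings(log_post, rng.uniform(-2.0, 2.0, size=2), 1.0, 20_000, rng)
          for _ in range(4)]
print(gelman_rubin(chains))
```

Values of $\hat R$ close to 1 indicate that the chains have mixed. In an actual analysis, `log_post` would be the logarithm of the unnormalized posterior, returning `-np.inf` outside the prior domain, and the starting point must lie inside that domain.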
Some precautions regarding the MCMC method are necessary in view of the subtleties about improper posteriors discussed in Sects. 2 and 3. When using the MCMC method, we are not working with the exact continuous posterior distributions, such as the one discussed in Sect. 3. Instead, we manipulate histograms, which are estimators of the exact posteriors. These estimators are discrete distributions $\hat p$,
$$\hat p(\Lambda,\alpha_i|d,M) = \frac{N_{\rm bin}}{N \prod_n \Delta^{(n)}} ,$$
where $N$ is the total number of points, $N_{\rm bin}$ is the number of points falling in the bin containing $(\Lambda,\alpha_i)$, and $\Delta^{(n)}$ is the bin size along the various dimensions.
The estimator tends to its estimand $p(\Lambda,\alpha_i|d,M)$ when $N\to\infty$ and $\Delta\to 0$, i.e. in the continuum limit with infinite sampling. Notice that the bin size can be optimized for a given $N$. Bins that are too large give a poor (biased) estimate of the distribution, while bins that are too thin suffer from large binomial noise. There is therefore an optimal bin size that minimizes the uncertainty of the estimator. As far as we know, it is commonly determined in an ad hoc way, and we proceed similarly in this note.
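The bias-noise trade-off of the histogram estimator can be illustrated on a case where the exact density is known. The sketch below assumes NumPy and uses standard normal samples as a stand-in for chain points. It computes $\hat p = N_{\rm bin}/(N\Delta)$ in one dimension, where $N_{\rm bin}$ denotes the number of points in a bin, and measures its integrated squared error against the true density for very few, moderate and very many bins. The bin counts are hypothetical and do not constitute an optimization rule.

```python
import numpy as np

def histogram_estimator(samples, nbins, lo, hi):
    """Estimate p_hat = N_bin / (N * Delta) from 1D samples on [lo, hi]."""
    counts, edges = np.histogram(samples, bins=nbins, range=(lo, hi))
    delta = (hi - lo) / nbins
    return counts / (samples.size * delta), edges

def std_normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

def integrated_squared_error(samples, nbins, pdf, lo=-4.0, hi=4.0, n_grid=80_001):
    """Integrated squared error of p_hat against a known density, evaluated on a fine grid."""
    dens, edges = histogram_estimator(samples, nbins, lo, hi)
    x = np.linspace(lo, hi, n_grid)
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, nbins - 1)
    return np.mean((dens[idx] - pdf(x)) ** 2) * (hi - lo)

rng = np.random.default_rng(1)
samples = rng.standard_normal(10_000)
for nbins in (4, 40, 4000):
    print(nbins, integrated_squared_error(samples, nbins, std_normal_pdf))
```

With $10^4$ points, the moderate choice should give the smallest error. With 4 bins the estimate is dominated by bias, while with 4000 bins it is dominated by binomial noise.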
In the continuous case, we found in Sect. 3 that the $L = L_{\rm SM}$ subdomain (i.e. $\Omega_{\Lambda,L_{\rm SM}}$) has to be excluded to obtain a proper posterior. This feature translates to the discrete estimator as follows. Let us evaluate $\hat p$ without the $L \neq L_{\rm SM}$ restriction. Considering $\hat p$ in the $(\Lambda, L)$ plane, for $\Lambda\to\infty$ the only non-zero bin of $\hat p$ is the bin containing the value $L_{\rm SM}$. This is the discrete equivalent of the Dirac peak obtained in Eq. (12). To obtain the estimator of $p(\Lambda|L\neq L_{\rm SM},d,M)$, we therefore have to excise this bin, which is the discrete equivalent of the $L \neq L_{\rm SM}$ restriction. The fact that we exclude a seemingly finite slice of the parameter space should not be surprising, since for the estimator $\hat p$ the space is not continuous but discrete. Finally, the upper bound $\Lambda < \Lambda_{\max}$ also has to be finite in practice. For a given finite $N$ and a given bin size, there exists a finite value $\Lambda = \hat\Lambda$ above which all points of $\hat p$ are in the $L_{\rm SM}$ bin. In practice, one therefore has to make sure that $\Lambda_{\max}$ is large enough that $\hat\Lambda < \Lambda_{\max}$.
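The bin excision can be sketched as follows, assuming NumPy and the hypothetical BSM-coin toy of Appendix D with one Heads and one Tails. As a stand-in for MCMC output, points are drawn from the prior and weighted by the likelihood. The function drops every point whose likelihood falls in the $L$ bin containing $L_{\rm SM}=1/4$, then histograms the remaining points in $\log_{10}\Lambda$. Function names and bin choices are illustrative.

```python
import numpy as np

def excised_marginal(log10_lam, like, weights, l_sm, like_edges, lam_edges):
    """Histogram estimate of p*(Lambda|d): drop the L-bin containing L_SM, bin the rest in log10(Lambda)."""
    sm_bin = np.searchsorted(like_edges, l_sm, side="right") - 1     # L-bin holding L_SM
    like_bin = np.searchsorted(like_edges, like, side="right") - 1
    keep = like_bin != sm_bin                                         # excise the L_SM bin
    hist, _ = np.histogram(log10_lam[keep], bins=lam_edges, weights=weights[keep])
    return hist

# Hypothetical toy: prior draws weighted by the likelihood stand in for an MCMC chain.
rng = np.random.default_rng(2)
n = 200_000
log10_lam = rng.uniform(np.log10(2.0), 8.0, n)      # log-uniform prior on [2, 1e8]
alpha = rng.uniform(-1.0, 1.0, n)                    # flat prior on [-1, 1]
x = alpha / 10.0**log10_lam
like = (0.5 + x) * (0.5 - x)                         # one Heads, one Tails
like_edges = np.linspace(0.0, 0.3, 41)               # L_SM = 0.25 lies inside [0.2475, 0.255)
lam_edges = np.linspace(np.log10(2.0), 8.0, 31)
full, _ = np.histogram(log10_lam, bins=lam_edges, weights=like)
p_star = excised_marginal(log10_lam, like, like, 0.25, like_edges, lam_edges)
print(np.round(p_star[:6], 1))
```

With these bins, the $L_{\rm SM}$ bin is $[0.2475, 0.255)$. Every point with $\Lambda \ge 20$ falls into it because $\alpha^2/\Lambda^2 \le 1/400$ there, so $\hat\Lambda \approx 20$, and the choice $\Lambda_{\max}=10^8$ comfortably satisfies $\hat\Lambda<\Lambda_{\max}$. The unexcised histogram `full`, by contrast, stays populated all the way up to $\Lambda_{\max}$.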

Probing Λ in the Higgs sector
In this section we apply the inference process defined in Sects. 2-4 to the Standard Model Higgs sector extended with higher-dimensional operators. The theoretical treatment of HDOs and the data analysis are the same as in the recent work [5]. Here we briefly review the main points of the analysis and refer to this work for further theoretical and experimental details.
The Higgs sector is supplemented by a set of CP-even dimension-6 operators, whose basis is chosen as in [5]. Here $J_H$ and $J_f$ are $SU(2)$ or $U(1)_Y$ currents involving the Higgs field and the fermion $f$ respectively, and $J = \sum_f J_f$ are the SM fermion currents coupled to $B_\mu$ and $W_\mu$. This choice of basis ensures that the field strength-Higgs operators $\mathcal{O}_{FF}$ cannot be generated at tree level in a perturbative UV theory. We therefore consider two general cases, "democratic HDOs" and "loop-suppressed $\mathcal{O}_{FF}$'s", depending on whether or not the $\mathcal{O}_{FF}$'s are loop-suppressed with respect to the other HDOs. Moreover, in important classes of models, such as the R-parity-conserving MSSM, the HDOs can only be generated at one loop. We therefore consider two cases within the democratic HDO scenario: one with tree-level HDOs, $\alpha_i \in [-16\pi^2, 16\pi^2]$, and one with loop-level HDOs, $\alpha_i \in [-1, 1]$. For the case of loop-suppressed $\mathcal{O}_{FF}$'s, we assume that the unsuppressed HDOs are generated at tree level. We therefore investigate three scenarios, whose features are summarized in Tab. 1. For tree-level HDOs, perturbativity of the HDO expansion, $|\alpha|/\Lambda^2 < 1/v^2$, imposes an additional constraint for $\Lambda < 4\pi v$. We take custodial symmetry to be an exact symmetry of the theory, such that the operators $\mathcal{O}_D$ and $\mathcal{O}_{D2}$ are set to zero. Finally, we emphasize that these scenarios are generic, in the sense that they encompass all known UV models as well as realizations not yet thought of. This implies that features predicted only by specific UV models, such as suppression of HDOs or precise cancellations between HDOs, get a small statistical weight, since we consider the whole set of UV realizations.
Concerning data, we take into account the results of Higgs searches at the LHC and the Tevatron, as well as electroweak precision observables and trilinear gauge couplings. Higgs results [28,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27] have to be exploited with care, as HDOs modify both Higgs decays and production. We use results that (partly) account for correlations between the subchannels when they are available. When estimated decompositions into production channels are unavailable, we take the relative ratios of production cross sections for a SM Higgs [29,28] as a reasonable approximation. The Higgs mass is set to $m_h = 125.5$ GeV, close to the combined mass measurement from the two experiments, since it is not yet possible to treat it as a nuisance parameter without losing the correlations between production channels. We take into account the electroweak precision observables through the Peskin-Takeuchi $S$ and $T$ parameters [30,31]. Beyond $S$ and $T$, the $W$ and $Y$ parameters [32] should also be used in the HDO framework. However, the constraints arising from these parameters are negligible compared with our other constraints. Experimental values of $S$ and $T$ are taken from the latest SM fit [33], $S = 0.05 \pm 0.09$ and $T = 0.08 \pm 0.07$, with a correlation coefficient of 0.91. Regarding constraints on the trilinear gauge vertices (TGV), we take into account the LEP measurements [34].
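As an illustration of how the $S$, $T$ constraint can enter the likelihood, the sketch below assumes a bivariate Gaussian built from the central values, uncertainties and correlation quoted above. The Gaussian form and the function names are assumptions of this example, and the mapping from HDO coefficients to $S$ and $T$ is not included.

```python
import numpy as np

S0, T0 = 0.05, 0.08          # central values quoted in the text
SIG_S, SIG_T = 0.09, 0.07    # 1-sigma uncertainties
RHO = 0.91                   # S-T correlation coefficient
COV = np.array([[SIG_S**2, RHO * SIG_S * SIG_T],
                [RHO * SIG_S * SIG_T, SIG_T**2]])
COV_INV = np.linalg.inv(COV)

def chi2_st(s, t):
    """Correlated chi^2 of a prediction (s, t) against the S, T measurement."""
    d = np.array([s - S0, t - T0])
    return float(d @ COV_INV @ d)

def log_likelihood_st(s, t):
    """Assumed Gaussian log-likelihood, up to an additive constant."""
    return -0.5 * chi2_st(s, t)

print(chi2_st(0.0, 0.0))     # chi^2 of the SM point S = T = 0
```

Because the correlation is strong, a shift in $S$ alone is penalized much more than it would be for uncorrelated errors: for $\Delta S = \sigma_S$ at fixed $T$, the $\chi^2$ equals $1/(1-\rho^2)$ rather than 1.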
Applying the method described in Sects. 2-4, we obtain the normalizable posterior distributions $p^*(\Lambda|d)$. They can always be normalized to unity, so we refer to them as probability density functions (PDFs). It turns out that the posterior PDFs of the NP scale for tree-level and one-loop democratic HDOs are essentially the same under a shift $\log_{10}\Lambda \to \log_{10}\Lambda - \log_{10}(4\pi) \approx \log_{10}\Lambda - 1.10$, i.e. a rescaling $\Lambda \to \Lambda/4\pi$. This happens because the region $|\alpha| \in [0, \Lambda^2/v^2]$ for tree-level HDOs has a negligible impact on the posterior, so the tree-level and one-loop scenarios can be identified through a rescaling of $\Lambda$. The posterior PDFs of the NP scale for the various scenarios are shown in Fig. 2. The corresponding 95% Bayesian credible intervals (BCIs) are $[123, 3300]$ TeV and $[9.8, 260]$ TeV for tree-level and one-loop democratic HDOs respectively, and $[28, 1200]$ TeV for the scenario of loop-suppressed $\mathcal{O}_{FF}$'s.
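The rescaling can be checked with simple arithmetic against the 95% intervals quoted in the Conclusion. Rescaling the coefficient range by $16\pi^2$ is equivalent to rescaling $\Lambda$ by $4\pi$, so dividing the tree-level bounds by $4\pi$ should approximately reproduce the one-loop interval $[9.8, 260]$ TeV.

```python
import math

shift = math.log10(4.0 * math.pi)             # shift in log10(Lambda) between the two cases
tree_bci_tev = (123.0, 3300.0)                # 95% BCI for tree-level democratic HDOs (from the text)
loop_from_tree = tuple(b / (4.0 * math.pi) for b in tree_bci_tev)
print(f"log10(4 pi) = {shift:.4f}")
print("tree-level BCI divided by 4 pi:", [round(b, 1) for b in loop_from_tree])
```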
We find that the leading constraint on $\Lambda$ comes from the Higgs data for democratic HDOs, and from the electroweak observables for loop-suppressed $\mathcal{O}_{FF}$'s. This can be understood as follows. The $\mathcal{O}_{FF}$ operators are mapped onto field strength-Higgs anomalous couplings, including the $\zeta_g\, h (G_{\mu\nu})^2$ and $\zeta_\gamma\, h (F_{\mu\nu})^2$ couplings. Given that the corresponding SM couplings are generated at one loop, $\zeta_{g,\gamma}$ need to be significantly suppressed so as not to induce large deviations in the predictions for gluon fusion and $h\to\gamma\gamma$ (see [5] for details). For democratic HDOs, this need for small $\zeta_{g,\gamma}$ pushes $\Lambda$ to high values in order to suppress the $\mathcal{O}_{FF}$'s. In contrast, for the scenario of loop-suppressed $\mathcal{O}_{FF}$'s, the $\zeta_{g,\gamma}$ are already loop-suppressed with respect to the other anomalous couplings by assumption. This alleviates the aforementioned constraint, leaving the $S$, $T$ measurements as the leading constraints. Having identified the leading constraints, we may comment on the necessary conditions that allow more specific UV models to reach lower values of $\Lambda$. For models with democratic HDOs, a suppressed $\mathcal{O}_{GG}$ is required to reduce the $\zeta_g$ coupling. Since the $\zeta_\gamma$ coupling is proportional to $s_w^2\alpha_{WW} + c_w^2\alpha_{BB} - \frac{1}{2}c_w s_w\alpha_{WB}$, precise cancellations among these terms may occur within an appropriate UV model, while they are improbable (i.e. fine-tuned) in the generic scenario. Note that both conditions, on $\zeta_g$ and on $\zeta_\gamma$, need to be fulfilled in order to lower the values of $\Lambda$. If only one of the $\zeta$'s is suppressed, the outcome remains similar to the left plot of Fig. 2. This occurs in particular when these $\zeta$'s are generated perturbatively. In that case one has $\zeta_g/\zeta_\gamma \approx g_s^2/g_Y^2 \gg 1$, so $\zeta_\gamma$ is naturally suppressed with respect to $\zeta_g$, which then becomes the leading constraint. For models with loop-suppressed $\mathcal{O}_{FF}$'s, the main condition to reach a lower $\Lambda$ is a suppressed $\mathcal{O}_D$.
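The improbability of a cancellation in the $\zeta_\gamma$ combination can be illustrated with a quick Monte Carlo estimate. The example assumes $s_w^2 = 0.23$, independent coefficients drawn uniformly in $[-1,1]$ as in the one-loop democratic scenario, and a hypothetical tuning threshold $\epsilon = 0.01$ on the combination. These inputs are illustrative assumptions, not values taken from the analysis.

```python
import numpy as np

S_W2 = 0.23                                  # assumed value of sin^2(theta_w)
C_W2 = 1.0 - S_W2
C_W_S_W = np.sqrt(S_W2 * C_W2)

def zeta_gamma_combination(a_ww, a_bb, a_wb):
    """Combination s_w^2 a_WW + c_w^2 a_BB - (1/2) c_w s_w a_WB, to which zeta_gamma is proportional."""
    return S_W2 * a_ww + C_W2 * a_bb - 0.5 * C_W_S_W * a_wb

rng = np.random.default_rng(3)
a = rng.uniform(-1.0, 1.0, size=(3, 1_000_000))      # generic one-loop coefficients in [-1, 1]
combo = zeta_gamma_combination(*a)
eps = 0.01                                            # hypothetical "tuned" threshold
frac = np.mean(np.abs(combo) < eps)
print(f"P(|combination| < {eps}) = {frac:.4f}")
```

For these inputs the $c_w^2\alpha_{BB}$ term has the widest range, so the exact probability is $\epsilon/c_w^2 \approx 1.3\%$. Only about one generic configuration in 77 achieves such a cancellation.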
This operator induces the main contribution to the $S$ parameter, $\alpha S \approx s_w^2\,\alpha_D\, v^2/\Lambda^2$, the other contributions to $S$ and $T$ being loop-suppressed (see [5]).
The PDFs presented above are given for an optimal bin size. To illustrate the uncertainty associated with the MCMC estimation of the PDFs, we compute the 95% BCIs obtained with twice as many and half as many bins. We find the variations in $\log_{10}\Lambda$ to be $O(2\%)$. These variations originate from the uncertainty inherent to the concrete estimation method presented in Sect. 4, and are not related to the formal inference process described in Sects. 2 and 3.
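The stability check described above can be sketched as follows. The sketch assumes NumPy, an equal-tailed definition of the 95% BCI (the text does not specify which type of interval is used), and Gaussian stand-in samples of $\log_{10}\Lambda$ whose numbers are purely illustrative. The interval is read off the binned cumulative distribution for half, the nominal, and twice the number of bins.

```python
import numpy as np

def equal_tailed_bci(samples, nbins, level=0.95):
    """Equal-tailed credible interval read off a histogram of the samples."""
    counts, edges = np.histogram(samples, bins=nbins)
    cdf = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    tail = 0.5 * (1.0 - level)
    # Invert the piecewise-linear binned CDF defined on the bin edges.
    return np.interp(tail, cdf, edges), np.interp(1.0 - tail, cdf, edges)

# Stand-in chain: log10(Lambda) samples from a Gaussian (purely illustrative numbers).
rng = np.random.default_rng(4)
log10_lam = rng.normal(2.0, 0.5, size=200_000)
for nbins in (50, 100, 200):                          # half, nominal, twice the bins
    lo, hi = equal_tailed_bci(log10_lam, nbins)
    print(nbins, round(lo, 3), round(hi, 3))
```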

Conclusion
Whenever one considers a set of data, whether or not it shows a significant deviation from the Standard Model, it is interesting to ask what information can be obtained about the energy scale of a possible underlying new physics. We present a method to consistently infer the distribution of $\Lambda$ from any dataset. To do so, we use a statistical view of the unknown NP parametrized by higher-dimensional operators. To obtain a proper posterior, which is necessary to construct Bayesian credible intervals, we point out that NP must be required to be testable by the data.
We formally demonstrate, using Lebesgue integration, that this requirement implies proper posteriors. In doing so, we introduce a subspace in which the likelihood itself is taken as a random variable. Some conceptual subtleties related to this trick are discussed, and a helpful toy model is introduced in the appendix. Given that Markov Chain Monte Carlo methods are commonly used to perform statistical inference, we describe the concrete implementation of this inference process in MCMCs.
As an illustration, we apply our approach to the SM Higgs sector in light of recent data. Building on the recent work [5], we consider the scenarios of democratic HDOs and loop-suppressed $\mathcal{O}_{FF}$'s. For democratic HDOs, we obtain 95% Bayesian credible intervals of $[123, 3300]$ TeV and $[9.8, 260]$ TeV for tree-level and loop-generated HDOs respectively. For loop-suppressed $\mathcal{O}_{FF}$'s, we find the 95% BCI $[28, 1200]$ TeV, assuming that the unsuppressed HDOs are generated at tree level. More specific UV models, which suppress particular HDOs or predict otherwise fine-tuned relations, are necessary to favour lower values of the NP scale.
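Proof. In the decoupling limit $\Lambda \to \infty$, $L$ tends to $L_{\rm SM}$. Thus, for any arbitrarily small $\delta L > 0$, there exists a finite $\hat\Lambda$ such that $\Omega_{\Lambda,L} = \emptyset$ for any $\Lambda > \hat\Lambda$ and $|L - L_{\rm SM}| > \delta L$. In the decoupling limit with $L \neq L_{\rm SM}$, the integration domain $\Omega_{\Lambda>\hat\Lambda,L}$ therefore reduces to the empty set. This implies that $p(\Lambda,L|d,M) \to 0$ for $L \neq L_{\rm SM}$ when $\Lambda \to \infty$. Let us now study the behaviour for $L = L_{\rm SM}$. Defining $\partial_i L = \partial L(x_i)/\partial x_i$, a factor $\Lambda^{-m}$ comes out of the Jacobian $J = \big(\sum_i (\partial L/\partial\alpha_i)^2\big)^{1/2}$. $f_n$ converges pointwise to $f$ and $f_n(x) \le f(x)$, so that $\int f\,d\mu = \lim_{n\to\infty} \int f_n\,d\mu$ by the Monotone Convergence Theorem (MCT). We define $k_{\rm SM}$ such that $k_{\rm SM}/n < f^{\rm SM} < (k_{\rm SM}+1)/n$. We then have $\lim_{n\to\infty} f^{\rm SM}_n = f^{\rm SM}$ and $\lim_{n\to\infty} f^*_n = f^*$. Let us compute $\int f^{\rm SM}_n\,d\mu$, where $\mu$ is the Lebesgue measure. Given that $L \to L_{\rm SM}$ for $\Lambda \to \infty$, for any arbitrarily small $\delta L = k_{\rm SM}/2^n - L_{\rm SM}$ there exists a finite $\hat\Lambda$ such that $f \in E_{k_{\rm SM}}$ for any $\Lambda \in [\hat\Lambda, \infty]$. Therefore $\mu(E_{k_{\rm SM}}) = \infty$. This implies $\int f^{\rm SM}_n(\Lambda)\,d\mu = \infty$, hence $\int f^{\rm SM}(\Lambda)\,d\mu = \infty$ by the MCT, and thus $\int d\nu_d = \infty$.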
Let us now compute $\int f^*_n\,d\mu$. The measure $\mu(E_{k\neq k_{\rm SM}})$ is finite, so we have to show that the sum over $n$ converges. To do so, we first simplify $f^*$ using the $\Lambda \to \infty$ limit. The Limit Comparison Test (LCT) ensures that the simpler function has the same integrability properties as $f^*$. We denote the successive simplified functions by $\tilde f^*$.
We factorize the $\Lambda$ prior and take the likelihood function out of the first integral. For any finite $\hat\Lambda$, one can expand the likelihood in powers of $\hat\Lambda/\Lambda$. The LCT ensures that one can replace $L$ by its truncated expansion to study the integrability of $f^*$. Since $\tilde f^*$ is Riemann integrable over $[\Lambda_{\min}, \infty]$ and absolutely convergent, it is also Lebesgue integrable. We deduce that $\int \tilde f^*_n\,d\mu$ converges for $n \to \infty$, so $\int f^*_n\,d\mu$ converges as well by the LCT. The integral $\int f^*\,d\mu$ is therefore finite, and so is $\int d\nu_c$.

Appendix C. Probability definition in the excised space
Here we discuss the subtlety that leads to the appearance of the absolute value in $|d\nu_c/d\mu|$ in Eq. (15). We stress that this discussion matters mainly at the formal level. In practice, for example when computing the posterior $p^*(\Lambda|d)$ with the MCMC method of Sect. 4, this question does not arise.
First, notice that we expressed our posterior distribution as a function of the likelihood $L$. This is perfectly allowed, since the likelihood can be seen as a random variable like any other. However, the likelihood is also a conditional probability, which makes our "excised" space $D\setminus\Omega_{\Lambda,L_{\rm SM}}$ rather particular.
Second, let us note that in the Kolmogorov axioms of probability, the positivity axiom can be seen as a simple sign convention. For any sample space $\Omega$, if one requires $p(\Omega) = -1$ and $p(E) \le 0$ for all events $E \subseteq \Omega$, the subsequent results just change by a sign flip. Let us denote by $p^{(-)}$ the probabilities defined in this way, and by $p^{(+)}$ the usual positive probabilities. One of the consequences of using the $p^{(-)}$ system is that the expectation of a random variable $X$ is given by
$$\langle X \rangle = -\int X\, dp^{(-)} . \qquad ({\rm C.1})$$
When using such a convention, a crucial point is that conditional probabilities must still be taken positive, contrary to the actual probabilities; otherwise inconsistencies would appear through products of probabilities. The freedom to switch between the $p^{(+)}$ and $p^{(-)}$ systems of probabilities is simply a symmetry of classical probability theory. Keeping these points in mind, let us now compute the naive expectation $\langle L\rangle^*_\Lambda$ of the likelihood $L$ over the excised parameter space $D\setminus\Omega_{\Lambda,L_{\rm SM}}$, using the RN decomposition of Eq. (13). Clearly, $\langle L\rangle_\Lambda$ is not necessarily larger than $L_{\rm SM}$; this typically happens when the data disfavour the model with respect to the SM. We deduce that $\langle L\rangle^*_\Lambda$ can take both signs. However, as emphasized above, $L$ is a conditional probability and as such must be positive in any case. We conclude that $\nu_c$ has to be taken as a probability measure of the $p^{(-)}$ kind whenever $\langle L\rangle_\Lambda - L_{\rm SM} < 0$. The actual expectation is then $\langle L\rangle^*_\Lambda = -\int L\, d\nu_c$, which is positive, as it should be. We thus end up with the prescription that the measure $\nu_c$ is taken as a probability $p^{(+)}$ or $p^{(-)}$ when $\langle L\rangle_\Lambda - L_{\rm SM}$ is positive or negative, respectively. Finally, once we restrict ourselves to the excised space, we always have the freedom to switch between $p^{(-)}$ and $p^{(+)}$. Choosing to deal only with $p^{(+)}$, the probability density over the excised space is given by $|d\nu_c/d\mu|$, hence the absolute value in Eq. (15).

Appendix D. The BSM coin
To illustrate our approach, let us adopt a simple NP model. Suppose that the SM predicts that a certain coin is fair, i.e. it comes up Heads or Tails with probability 1/2. Suppose that an HDO modifies these probabilities such that the coin is no longer fair,
$$p(H|\Lambda,\alpha) = \frac{1}{2} + \frac{\alpha}{\Lambda} , \qquad p(T|\Lambda,\alpha) = \frac{1}{2} - \frac{\alpha}{\Lambda} . \qquad ({\rm D.1})$$
The SM is recovered for $\Lambda \to \infty$, or if $\alpha = 0$. $\Lambda$ is given a logarithmic prior over $[2, \infty[$, and $\alpha$ a flat prior over $[-1, 1]$. Let us assume that the coin is tossed twice and comes down "H,T". We toss the coin only twice to keep the subsequent expressions simple, since a longer sequence would complicate the formulas unnecessarily. With this outcome the data favour the SM hypothesis, so we expect a likelihood $L_\Lambda < L_{\rm SM}$.
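The toy model is simple enough to evaluate on a grid. The sketch below assumes NumPy and uses illustrative function names. It computes the unexcised quantity $\Lambda\,p(\Lambda|d)$, in which the $1/\Lambda$ prior cancels the factor $\Lambda$, for a general outcome of $H$ Heads and $T$ Tails with the flat prior on $\alpha \in [-1,1]$. For "H,T" one has $L_\Lambda = 1/4 - \alpha^2/\Lambda^2 \le L_{\rm SM} = 1/4$. An outcome such as $(H,T) = (5,20)$ instead produces a bump at low $\Lambda$.

```python
import numpy as np

def coin_likelihood(lam, alpha, heads, tails):
    """Likelihood of (heads, tails) for the BSM coin, with p(H) = 1/2 + alpha/Lambda."""
    x = alpha / lam
    return (0.5 + x) ** heads * (0.5 - x) ** tails

def lambda_times_posterior(lam_grid, heads, tails, n_alpha=4001):
    """Unnormalized Lambda * p(Lambda|d): the average of L over the flat alpha prior on [-1, 1]."""
    alpha = np.linspace(-1.0, 1.0, n_alpha)
    like = coin_likelihood(lam_grid[:, None], alpha[None, :], heads, tails)
    integral = 0.5 * np.sum((like[:, 1:] + like[:, :-1]) * np.diff(alpha), axis=1)
    return integral / 2.0                    # divide by the prior width to get the average

lam = np.logspace(np.log10(2.0), 6.0, 400)
for heads, tails in [(1, 1), (5, 5), (5, 20)]:
    y = lambda_times_posterior(lam, heads, tails)
    l_sm = 0.5 ** (heads + tails)
    print((heads, tails), "argmax Lambda =", round(float(lam[np.argmax(y)]), 2), "L_SM =", l_sm)
```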
We are grateful to the referee for pointing out this simple example to us. The proportionality constant is the same for both terms, and the divergent piece cancels between them. As a final illustration, $p(\Lambda|d)$ and $p^*(\Lambda|d)$ are shown in Fig. D.3 for various outcomes of the BSM coin tossing. As discussed in Sect. 2, the shapes remain roughly identical when the data are compatible with the SM. In contrast, a bump appears in $p(\Lambda|d)$ when the data favour the BSM hypothesis. The high-$\Lambda$ tail of $p^*(\Lambda|d)$ drops increasingly quickly as the BSM evidence grows.

Figure D.3 (caption partially lost). Left panel: $(H,T) = \ldots, (5,5), (20,20)$ in blue, purple and yellow, respectively. Right panel: $p(\Lambda|d)\times\Lambda$ (top) and $p^*(\Lambda|d)\times\Lambda$ (bottom) distributions for $(H,T) = (5,5), (5,15), (5,20), (5,30)$ in blue, purple, yellow and green, respectively.