Anatomy of the Higgs fits: a first guide to statistical treatments of the theoretical uncertainties

The studies of the Higgs boson couplings based on the recent and upcoming LHC data open up a new window on physics beyond the Standard Model. In this paper, we propose a statistical guide to the consistent treatment of the theoretical uncertainties entering the Higgs rate fits. Both the Bayesian and frequentist approaches are systematically analysed in a unified formalism. We present analytical expressions for the marginal likelihoods, useful to implement simultaneously the experimental and theoretical uncertainties. We review the various origins of the theoretical errors (QCD, EFT, PDF, production mode contamination...). All these individual uncertainties are thoroughly combined with the help of moment-based considerations. The theoretical correlations among Higgs detection channels appear to affect the location and size of the best-fit regions in the space of Higgs couplings. We discuss the recurrent question of the shape of the prior distributions for the individual theoretical errors and find that a nearly Gaussian prior arises from the error combinations. We also develop the bias approach, which is an alternative to marginalisation providing more conservative results. The statistical framework to apply the bias principle is introduced and two realisations of the bias are proposed. Finally, depending on the statistical treatment, the Standard Model prediction for the Higgs signal strengths is found to lie within either the $68\%$ or $95\%$ confidence level region obtained from the latest analyses of the $7$ and $8$ TeV LHC datasets.


Introduction and summary
Besides the historical discovery of a resonance around 125 GeV [1,2] that is most probably the Brout-Englert-Higgs boson responsible for the ElectroWeak (EW) symmetry breaking [3], the ATLAS and CMS Collaborations have provided a set of 88 rate measurementsbased on the full dataset collected so far with luminosities of ∼ 5 fb −1 at the center of mass energy √ s = 7 TeV and ∼ 20 fb −1 at √ s = 8 TeV [4, 5] (see also Ref. [6,7]) -that constitutes a new and precious source of indirect information on physics beyond the Standard Model (SM). Indeed, observing deviations of the Higgs boson rates with respect to their SM predictions would reveal the presence of an underlying theory while the absence of such deviations allows one to strongly constrain new models (see for example Ref. [8] for higherdimensional models, Ref. [9] for composite Higgs theories and Ref. [10] for supersymmetric scenarios). So far, no signs from an unknown world have came out from the data, but this is only the beginning of a long exploration, given the expected LHC upgrades [11].
The fits of the Higgs rates (c.f. Ref. [12] for the first set of analyses, Ref. [13][14][15][16] for the results after the Moriond 2013 winter conference and Ref. [4,5] for the latest official ATLAS and CMS analyses) are thus obviously important. Now certain aspects of these analyses remain to be worked out in order to obtain the final fits for testing new physics.
First, the precise likelihood functions associated to the experimental rates (in particular their specific shapes and the complete correlations between channels) are not provided in the present public papers, although they might be expected at some point. Second, a major part of the theoretical uncertainties is due to QCD calculations of the Higgs production rates [17][18][19][20] and their treatments in the fits raise questions in the Higgs physics community (see Ref. [21,22] for recent discussions). Taking carefully into account these theoretical uncertainties is crucial for the Higgs fits due to the following reasons.
First, theoretical uncertainties can be sizeable with respect to the experimental ones. The QCD uncertainty on the gluon-gluon fusion mechanism dominantly involved in most of the Higgs discovery channels induces typically an error of ∼ 10% on signal strengths (see Section 6), that is already comparable to the experimental error bars in several Higgs channels which reach values down to ∼ 20% [4][5][6][7]. Besides, considering for instance the CMS prospectives at √ s = 14 TeV with a luminosity of 300 fb −1 , the experimental error bars are around ∼ 5% (with same systematic errors as today) for the diphoton final state and less than ∼ 10% for the τ -lepton, Z and W boson channels [11] so that the theoretical error might even become the dominant one in some channels. Second, theoretical uncertainties might be of the same magnitude as the main potential deviations due to new physics. For instance the maximal corrections to Higgs couplings estimated in Ref. [23] for characteristic composite Higgs and supersymmetric models 0 lead typically to deviations of the signal strengths between ∼ 2% and tens of percent compared to SM. This is of the same order as the theoretical error mentioned above, so that one is precisely in the situation where the theoretical error deserves a careful treatment to test new physics scenarios. 1 0 In the case of no new states, related to the EW symmetry breaking, directly observed at the LHC. 1 This intermediate situation is to be contrasted with the two extreme cases of expected signal strength Therefore, in this paper, our primarily goal is to answer precisely the question : what is the correct treatment of the theoretical uncertainties in the fits of the Higgs boson rates? 2 This seemingly simple question has lead us to several new developments, summarized in the three lines of work described in the paragraphs below. First, we present a systematic survey of the various statistical treatments of the theoretical error and their applications to the Higgs fits within a unified formalism. We confront the frequentist and Bayesian frameworks, 3 4 that prove to exhibit a certain degree of convergence at the level of accuracy of the present LHC data. 5 We also compare the marginalisation and bias treatments. In the former, we consider the representative cases of Gaussian and flat combined priors because of the lack of knowledge inherent to the distribution of theoretical uncertainties. 6 We find the Gaussian prior to be well motivated by the full combination of each individual theoretical uncertainty. It turns out that the choice of one among all these statistical approaches may affect significantly the determination of the Higgs properties. It is thus important to understand precisely the conceptual differences between these approaches. Finally, this survey is the opportunity to provide useful analytical expressions for the marginalised likelihood functions, including the theoretical correlations among the Higgs channels. Second, we explain precisely the principle of bias 7 and its fundamental differences with the marginalisation principle. The bias principle is more conservative than the marginalisation principle by construction and does not depend on the shape of the priors of the nuisance parameters. This thorough examination of the bias principle leads naturally to introduce a statistical framework for biasing. We propose two realisations of the bias, referred to as the extremal bias and the envelope method, that apply in both frequentist and Bayesian contexts. Regarding the error combinations, important differences arise between the marginalisation and bias frameworks. 8 deviations much higher than the theoretical error (which can then be neglected) or deviations well smaller (no hope to detect them). In both of these cases, a detailed treatment of the theoretical error would not be really needed to test new physic scenarios. 2 Throughout this paper, we use generically the expression "theoretical error" to denote any error on the SM prediction for the Higgs rates. This is a slight wording abuse, because certain of these errors like the ones from the PDF determination have a partial experimental origin. 3 Sometimes in the literature, there are inconsistencies in the sense that errors are combined in a frequentist way (combination depending on the prior shape) while the priors are convoluted in a Bayesian way (convolution via integrations). 4 A pure Bayesian fit of the Higgs rates has been carried out in Ref. [16]. 5 To be contrasted with the preliminary study of Ref. [24] based on simulated Higgs data. 6 To the best of our knowledge, a flat prior for the theoretical uncertainty is for the first time applied to the Higgs fits. Notice also that the combination in quadrature of the theoretical and experimental errors, sometimes made in the literature, is equivalent to a marginalisation assuming Gaussian distributions for both sources of errors and neglecting the correlations. This is true in both frequentist and Bayesian cases. 7 A bias has been applied once in Ref. [14]. The analysis developed here improves the bias performed in Ref. [14] by including more effects like the production contamination, the individual scale/EFT/PDF errors, the branching fraction uncertainties, the correlations between Higgs channels and the Bayesian/frequentist cases. 8 For example, the PDF and amplitude uncertainties for the ggF mechanism are summed in quadrature in the Bayesian marginalisation, whereas they are linearly summed in the bias approach.
Third, we discuss and implement several improvements in the treatment of the theoretical uncertainties. (i) For the cross sections, the combinations of all the individual uncertainties are discussed exhaustively, including in particular the several errors constituting the parton PDF uncertainty. The so-called leading moment approximation is developed to facilitate the combination of such a high number of errors. (ii) The error contamination by various production modes and the errors on the Higgs branching ratios are taken into account. (iii) The correlations between the theoretical errors on the various Higgs detection channels are included. 9 We show that these theoretical correlations induce significant shifts of the bestfit regions in the Higgs coupling parameter space. (iv) A Higgs fit with more conservative theoretical errors is shown to illustrate the potential impact from the imperfect knowledge of the magnitude of these errors. For each of the statistical approaches developed along these three lines of work, we provide the up-to-date Higgs fit results based on the latest available data from the 7 and 8 TeV LHC, that can be readily used for new physics tests. From the theory side, we have updated the major gluon-gluon Fusion mechanism by using its reduced perturbative QCD error, issued from the recent calculation up to N 3 LO [25]. We have also included the theoretical uncertainty on this production mode due to the use of an Effective Field Theory in the amplitude calculation [25][26][27], so that the whole error on the cross section remains at ∼ 10%.

Statistical preliminaries
This section condenses the basic elements of frequentist and Bayesian statistics that will be used along the paper. In addition to statistical basics, the principle of bias is also presented.

Need-to-know frequentist and Bayesian statistics
In order to extract some information about a new physics model from a set of data, the central quantity to study is the likelihood function [28]. 10 The likelihood function is equal to the conditional probability density for obtaining the observed data, taken as a function of the hypothesis. In the case of predictions made in a given hypothesis H with n parameters {θ n } ≡ θ, the likelihood function reads where d represents the set of data. Note that the likelihood is defined up to an overall factor. In the present work, the data we will consider are the set of signal strength measurements from LHC and Tevatron, described in Section 4.1.
In particle physics, the likelihood function encloses a statistical uncertainty associated with the data. This is the uncertainty coming from the fluctuations inherent to the observation of a quantum process. This statistical uncertainty tends to zero in the limit of 9 We notice that such correlations were included e.g. in Ref. [15] for the specific assumption of errors with Gaussian priors and neglecting the correlations among different Higgs production modes. 10 Note this is an abuse of language, the likelihood function is actually a distribution.
a large amount of data. However, other sources of uncertainty can be present, both on the experimental or the theoretical side. For example, uncertainties arise from the finite resolution of a detector, or from the finite accuracy of a computation. These systematic uncertainties do not depend on the amount of data, and need to be taken into carefully.
In this paper, we are going to have a close look at the theoretical systematic uncertainties. The starting point for modeling a systematic uncertainty is to explicitly parametrize it. Namely, one introduces a set of new parameters, δ ≡ {δ i }, which explicitly modifies the likelihood, L(θ, δ) . (2.2) These new parameters are named nuisance parameters, as opposite to the θ's which are considered as the parameters of interest. This step of parametrisation is common to the frequentist and Bayesian frameworks, and is fairly universal. Discrepancies will appear in the way the δ's are treated, and will be at the center of our attention in the rest of the paper. Two fundamentally different points of view on how to treat the nuisance parameters, denoted as marginalisation and bias, will be further identified (in both the frequentist and Bayesian contexts).
In Bayesian statistics, model parameters are genuine random variables. They are associated with a so-called prior distribution, noted π(θ). In order to carry out a process of inference (for example, setting exclusion bounds), the relevant object to study is the posterior distribution, p(H, θ|d) ∝ L(θ) π(θ) . Ω being the whole parameter space. The The probability density function (pdf ) of this test is then computed by simulation (typically, using Monte-Carlo pseudo-data). The pdf of q(θ), noted f q , can then be used to evaluate a p-value, typically of the form where q d is the value given by the actual data. The 1 − α confidence regions are then obtained by solving p(θ) = α, i.e. the confidence regions are given by Ω α = {θ|p(θ) > α}. Whenever the likelihood is Gaussian, q follows a χ 2 distribution. One has then 1 − α = F (n) χ 2 is the χ 2 cumulative function with n degrees of freedoms. Confidence regions can thus be obtained by plotting q(θ) = q α . This simpler procedure is commonly used in the literature, even when the likelihood is not Gaussian. We adopt this procedure throughout this paper. In the case where the likelihoods are bivariate (which will be the case of our example of Higgs fit), we adopt the threshold values q = {2.30, 6.18, 11.83} . (2.8) In the Gaussian limit, these values match exactly the confidence levels 1 − α = {68.27%, 95.45%, 99.73%}.

Marginalisation principle
Having introduced the nuisance parameters δ 12 in the likelihood L(θ, δ), the next step is to eliminate them. This will effectively deform the likelihood, enlarging the preferred regions, and possibly shift their central values. In the Bayesian framework, this is naturally done by integrating over δ, so that where π(δ) is the prior distribution for the δ parameters. This operation is named marginalisation. In the frequentist framework, the likelihood is instead maximized, (2.10) 11 In classical frequentist statistics, hypotheses and parameters are not associated with probabilities. In this paper, for the frequentist side, we adopt the more general framework of hybrid Bayesian-frequentist statistics, in which a distribution can be attributed to a nuisance parameter. Conceptually, such distribution cannot be seen as a prior pdf, but corresponds to the likelihood for a real or imaginary measurement constraining the nuisance parameter (see Ref. [57], p. 4). However, by abuse of language, we will sometimes use the term "prior" in frequentist statistics as well. Classical frequentist statistics are recovered by giving a flat shape to these frequentist "prior" distributions. 12 Recall that we have defined δ as a set of nuisance parameters, δ ≡ {δi}. The subsequent integrations and maximisations will thus be multidimensional.
This operation is usually named profiling. Here however, in order to emphasize the parallel between Bayesian and frequentist cases, we also refer to it as "marginalisation". The outcome of Bayesian and frequentist marginalisation gives respectively the marginal likelihoods L B and L F . The best-fit regions are then obtained by using L B and L F in Eqs. (2.4) and (2.6), respectively. Finally, let us notice that in the frequentist case, it is clear that the marginalisation operation has the effect of selecting the values of δ preferred by the data.

Bias principle
The common feature of Bayesian and frequentist marginalisations is that nuisance parameters contribute to goodness-of-fit. This implies that the nuisance parameters can relax a tension among various measurements, which in turn induces a shift of the best-fit regions.
In the context of the search for new physics, such a shift could also be characteristic of the presence of a new physics signal. It is thus of highest importance to correctly understand the effects of nuisance parameters, in order not to confuse systematic uncertainties with the presence of new physics! In order to explicitly expose the shifts induced by nuisance parameters, and ultimately obtain more conservative results, a useful approach is to define a new operation, alternative to marginalising, with the requirement that the nuisance parameters do not contribute to goodness-of-fit. We will refer to this principle as bias, as opposite to the marginalisation principle. We will see that the bias principle provides results that are independent of the shape of the prior of the nuisance parameters.
The bias principle can be intuitively grasped as follows. Consider the likelihood L(θ, δ) with a single nuisance parameter on the interval δ ∈ [δ a , δ b ]. Instead of marginalising over δ, one can look at the contours of the likelihood for various discrete values of δ, say δ = δ a , δ b . For each value of δ, the contours are given by Eq. (2.4) (Bayesian) or Eq. (2.6) (frequentist). To obtain the contours, we can see that the likelihood is separately normalised for δ a and δ b . This normalisation is in general not the same for δ a and δ b . Because of this normalisation factor, no particular value of δ is preferred by the fit. It is this normalisation factor that concretely realises the bias principle.
In Bayesian statistics, the bias principle finds a general realisation as follows. The requirement one wants to implement is that the nuisance parameters δ do not contribute to goodness-of-fit. This is equivalent to ask that the δ do not have a preferred region once data are taken into account. To translate formally this condition, the relevant quantity to involve is the marginal posterior of δ, p(δ|d). To implement the bias principle, one should thus require p(δ|d) to be constant, which translates into the condition We see that the condition (2.11) fixes the π(δ) prior to be π(δ) = 1 Ω dθ L(θ, δ) π(θ) .

(2.13)
This peculiar prior is not independent on data, and is thus not orthodox with respect to the usual Bayesian philosophy. This is an expected consequence of biasing and all quantities are nevertheless well defined. It follows that the posterior for θ and δ has the form L(θ, δ)π(θ)/ dθ [L(θ, δ)π(θ)]. The Bayesian bias likelihood is then given by marginalising this particular posterior with respect to the nuisance parameters, . (2.14) In frequentist statistics, the bias principle is realized in a very similar way to the Bayesian case. The quantity telling how δ is constrained by the data is the marginal likelihood for δ (with its associated "prior"), max θ∈Ω [L(θ, δ)π(θ)π(δ)], which selects the preferred θ for a given δ. One requires this marginal likelihood to be constant, This implies that the π(δ) "prior" satisfies . (2.16) The marginal likelihood of θ is then given bȳ This operation is sometimes referred to as the envelope method. This is because, for a continuous domain D, it draws continuous regions which are wider than the ones obtained by marginalising. 13 Comparing the Bayesian and frequentist realisations of the bias principle, Eq. (2.14) and Eq. (2.17), it appears that the resulting bias operations are fully similar: the expressions Eq. (2.14) and Eq. (2.17) are identical up to interchanging maximisation and integration.
Let us finally comment about the best-fit regions for the bias likelihoods. The Bayesian bias is a particular case of Bayesian marginalisation with a well-chosen prior. The contours are thus obtained by integration, usingL B in Eq. (2.4). For the frequentist bias, the bias likelihoodL F can be treated using the usual likelihood ratio test and computing the associated p-value, as described in Eq. (2.6). We conclude that the best-fit regions for both the Bayesian and frequentist bias are well-defined.
Let us make an important comment which will turn useful for the frequentist treatments in Section 8. For a single δ in the discrete domain D = {δ a , δ b }, the best-fit regions 13 Using L = e −χ 2 /2 , one has the equivalent formulation of the envelope method in terms of χ 2 , (2.18) In case of classical frequentist statistics, π(θ) is a constant, so that the two log π(θ) terms cancel.
obtained by inserting the likelihood (2.17) in Eq. (2.6) reproduce exactly the ones in the discrete version of the bias described earlier in this subsection. Indeed, the normalized likelihood (2.17) will lead to a denominator equal to one in Eq. (2.6) and the role of this denominator in the contour definition will be played instead by the denominator of Eq. (2.17).
In this paper, we will refer to the general realisations of the bias principle given by Eq. (2.14), (2.17) as the envelope method, for both the Bayesian and frequentist versions. In contrast, the discrete version of the bias previously introduced can be seen as a minimal realisation of this principle. In this paper, we will refer to it as the extremal bias, for both the Bayesian and frequentist versions.

Combinations of theoretical uncertainties
This section applies to any systematic uncertainties. Nevertheless, since in this paper our main focus is on theoretical uncertainties, we will readily use this term. In the previous section, we have seen that the correct procedure to incorporate theoretical uncertainties into the likelihood is to model these uncertainties using nuisance parameters and treat them using either the marginalisation or the bias approach. From the practical point of view, this step of marginalisation can be computationally heavy to carry out, both in the Bayesian and frequentist cases. Indeed, for each point in the space of parameters of interest, for n nuisance parameters, either a n-dimensional integration or a n-dimensional maximisation has to be done, whose complexity typically grows exponentially with n.
Because of the cost of exact marginalisation, it is a common practice in the high-energy physics community to combine certain uncertainties in a preliminary step, before carrying out the operation of marginalising. This approach of "preliminary combinations" should be followed with some care, because it can be approximative and may contain implicit assumptions. In this section, we revisit and develop the various operations of preliminary combination on a firm statistical ground.

Error modelisation
Let Q be an arbitrary quantity entering into a base likelihood L[Q]. The uncertainty about Q can be modelled via a dependence of the form where δ is the nuisance parameter, associated with a distribution π(δ), defined over the domain D. Here and throughout this paper, without loss of generality, we let all the δ follow a "standard distribution", such that all the information about the magnitude of the uncertainty will be contained in the coefficient ∆. With this parametrisation, ∆ represents the relative uncertainty associated with Q. This linear model (3.1) is valid for any π distribution, provided that the magnitude of the relative error is small, ∆ 1. The actual definition of π depends on the statistical approach adopted. In the Bayesian case, δ is a random variable, so that one chooses E[δ] = 0, V[δ] = 1. 14 Note that the domain of δ can be either finite or infinite. In the hybrid frequentist case, one can follow the same conventions as for the Bayesian case. The classical frequentist case is equivalent to have a flat π, and one sets the domain to be D ≡ [−1, 1] in that case. For the errors we will consider, π will always be centred on zero.

Bayesian combination of theoretical uncertainties
In the Bayesian framework, a nuisance parameter δ is rigorously taken as a random variable with prior distribution π. In presence of various nuisance parameters, one may wish to combine various sources of error, say δ A and δ B . A combination of these sources can be done if they appear systematically into a single combination inside the likelihood, is the Dirac distribution. Here π A,B is the common prior of δ A , δ B . If these are independent, one has π A,B (δ A , δ B ) = π A (δ A )π B (δ B ). Note that the integration over δ C of the left-hand side of this equation recovers Eq. (2.9).
When δ A and δ B are independent, Eq. (2.9) implies that the distribution of δ C is exactly given by a convolution product, The variable x can be seen as δ∆. It is convenient to defineπ C (x) = π C x ∆ C , so that the width ofπ C is given by ∆ C . In contrast, recall that the width of π C is always normalized to one by convention. Using theπ definition, the convolution (3.3) can simply be written asπ or more shortlyπ (3.5) The resulting distribution π C has in general a non trivial shape, except for example when both π A and π B are Gaussian, in which case π C is Gaussian as well. In contrast, Eq. (3.3) implies that the magnitudes of the errors ∆ A , ∆ B are combined following irrespective of the shape of the distributions. That is, the errors are always combined in quadrature, i.e. the variances always add-up. Note the ∆ 2 's correspond to the variance of theπ distributions.
In case of two independent sets of several correlated variables δ A,i , δ B,i with respective covariance matrices C A , C B , combined as δ C,i = δ A,i + δ B,i , 15 the combination is naturally generalized to Again, this is independent of the prior shapes. The distribution of δ C,i is again obtained using Eq. (3.2). Finally, one may wish to combine nuisance parameters that are themselves correlated. In the case of two nuisance parameters δ A , δ B with a correlation coefficient ρ, one gets giving rise to a linear combination in the fully (anti-)correlated case ρ = ±1, and to Eq. (3.6) in the de-correlated case ρ = 0. The combination (3.8) is still independent of the prior shapes. Note that in this case π C is still obtained from Eq. (3.2), but is not given anymore by a convolution product because π A and π B are not factorised anymore. Finally, in the case of two sets of nuisance parameters δ A,i , δ B,i with a relative correlation matrix C AB , one gets All the results of this subsection are straightforward to derive using characteristic functions (see Appendix A).
In the limit ∆ A ∆ B , it appears that π C ∼ π A , i.e. the combined prior has mainly the shape of the leading uncertainty. In Section 3.4, we demonstrate that it is well justified to use Eq. (3.6), which is exact, together with the approximation π C ≈ π A . Beyond the ∆ A ∆ B limit, if one wishes to care about the shape of π C , a conservative approach is to consider both extreme cases π C = π A and π C = π B . This is because the actual shape of π C is always an intermediate distribution between π A and π B , as dictated by the convolution product.

Frequentist combination of theoretical uncertainties
Let us start again with the nuisance parameters δ A , δ B and their associated "prior" distribution π A,B . If the nuisance parameters enter as a single combination in the likelihood, L[δ A ∆ A + δ B ∆ B ], one can define the nuisance parameter δ C as above, and write is the Dirac distribution. 16 We emphasis that this formula is exactly similar to the Bayesian one, Eq. (3.2), with integration replaced by marginalisation. When π A,B (δ A , δ B ) = π A (δ A )π B (δ B ), it appears then that the distribution of δ C is given by (3.11) 15 Note that in this case, for simplicity, we used a different convention from the one-variable case: we do not factor out the magnitude of the uncertainties (∆i) in front of the δi. 16 Here δ[x] can be taken as the regularised Dirac peak.
This formula has a convolution product structure, where the integration has been replaced by a maximisation. From that point, it is then possible to compute the frequentist correlation matrix, C −1 ij = −∂ 2 log L/∂θ i ∂θ j . The general formula for the combination of C A , C B is straightforward but tedious to compute. In sharp contrast with the Bayesian case, it appears in the frequentist case that the combination of the correlation matrices C A , C B accordingly to Eq. (3.11) depends on the shape of the π A , π B distributions.
In the particular case where both π A , π B are Gaussian, the combination appears to be in quadrature, as in the Bayesian case. The combination formulas then match exactly the Bayesian ones, Eqs. (3.6) and (3.7). Moreover π C is also Gaussian. Another important particular case is the one of flat priors. In that case, π C appears to be flat, and the combination is linear, Note that no correlation matrix can be defined in the flat case. 17 In the case where δ A and δ B are correlated, they should be treated with a common "prior" as in the Bayesian case.

The leading moment approximation
Consider again the Bayesian case of a combination of two nuisance parameters, Recall that the δ parameters have zero mean and have a standard distribution so that E[δ] = 0, V[δ] = 1. Assume further that the magnitude of the uncertainty B is small with respect to the uncertainty A, When this condition is satisfied, the source of uncertainty B can be treated as a perturbation to the source of uncertainty A. Starting from this observation, one can obtain π C up to ∆ B /∆ A corrections (see Eq. (A.9)). This is demonstrated in Appendix A using characteristic functions. In particular, for independent variables, at the first non-trivial order in the expansion, one obtains that π C ≈ π A (3.14) Recall that π C is determined by the convolution productπ C =π A π B . Hence for ∆ A ∆ B , one can intuitively expect that the shape ofπ A andπ C are similar (see Eq. (3.14)), even though their widths are different (according to Eq. (3.15)). In case δ A and δ B are correlated, Eq. (3.15) has to be replaced be Eq. (3.8). This "leading moment" approximation is useful in presence of a hierarchy between the magnitude of the various uncertainties. It dictates how to consistently capture the main effects of the uncertainties into the likelihood. This in turn allows one to obtain an approximate form for the combined priors, which opens up the possibility of obtaining analytical expressions for the marginal likelihoods.
The leading moment approximation also applies when δ A and δ B appear in various linear combinations within the likelihood. This situation typically happens when various observables are affected by the same source of uncertainty. The case of two nuisance parameters and two combinations is discussed in Appendix A. One considers two combinations It is found that the ∆ C 1,2 are obtained as in the one-combination case discussed above. The correlation coefficient between δ C 1 and δ C 2 requires more attention.
it is found to be approximately equal to one. This implies that the shapes of the distributions of δ C 1 , δ C 2 and δ A are the same up to ∆ B 1,2 /∆ A 1,2 corrections (see Eq. (A.15)), that is (3.16) From Eq. (3.16), it appears that the leading moment approximation reduces the number of nuisance parameters in the likelihood. In the case where ∆ A 1 ∆ B 1 , ∆ A 2 ∆ B 2 , it appears that the correlation coefficient between δ C 1 and δ C 2 is approximately equal to the correlation coefficient between δ A and δ B (see Eq. (A.16)), so that (3.17) In the particular case where δ A and δ B are independent, one has In the other particular case where δ A and δ B are 100% correlated or anti-correlated, one has All the cases with more variables or more combinations can be deduced recursively from the case with two parameters and two combinations studied here. 18

Combining uncertainties in the bias approach
We now analyse how the combination of uncertainties arises in the case of the method of bias. We still consider a combination of nuisance parameters δ A,B entering in the likelihood as Recall that in our conventions, δ is a random variable with a fixed domain, while ∆ is a number representing the magnitude of the uncertainty. In the bias approach, by definition, the shape of the distribution of δ is set so that δ does not participate to the fit. The information about the uncertainty is thus encoded only in the domain of the variable δ∆. The choice of this domain has some degree of arbitrariness. This choice depends on how conservative one wants the results to be. In the following we choose to let δ vary in the interval [−1, 1] and we identify ∆ as a 1σ error, i.e. the same way it is defined for the marginalisation.
The operation of Bayesian bias can be seen as a special case of marginalisation, where the prior is set by Eq. (2.13). As the likelihood we consider in this section depends only on the combination δ A ∆ A + δ B ∆ B , this peculiar prior depends only on the combination δ A ∆ A + δ B ∆ B by construction. Let us denote it as π B bias (δ A ∆ A + δ B ∆ B ). In order to get the combination δ C ∆ C = δ A ∆ A + δ B ∆ B , one applies the definition of Eq. (3.2) using the π B bias prior. It turns out that π C (δ C ) = π B bias (δ C ∆ C ). This means that the domain of δ C ∆ C is given by the domain of (3.20) When δ A and δ B are independent, one has simply When δ A and δ B are 100% correlated positively (i.e. δ A = δ B ), it turns out that one has again the combination When δ A and δ B are 100% correlated negatively (i.e. δ A = −δ B ), the combination reads Let us stress that the correlation between δ A and δ B is determined by their common domain D δ A ∆ A ,δ B ∆ B . The above extreme cases are easily determined. The case of an intermediate correlation is trickier as it requires a precise definition of the domain. The case of an arbitrary correlation will not be needed throughout this paper. We see that the uncertainties are automatically combined linearly in the Bayesian bias method. These results above can be applied recursively to more complex combinations. For example if δ D ∆ D = δ A ∆ A + δ B ∆ B + δ C ∆ C , with δ A and δ B 100% anti-correlated and δ c independent from the two others, the bias combination gives Also, the bias combination applies in presence of various linear combinations (labelled by i) of the same nuisance parameters. In that case, the result of the combination is a common nuisance parameter δ, coming with different magnitudes ∆ i for each combination.
The frequentist bias has the same structure as the Bayesian bias. The starting point to determine the error combination is to use the frequentist version of the bias prior of Eq.(2.16) in Eq. (3.10). It follows that the frequentist combinations are the same as in the Bayesian case. We can thus conclude that in the bias approach, the preliminary combinations of uncertainties are done linearly, in both the frequentist and Bayesian cases. One should remark that such a combination is systematically more conservative than the combinations from both the Bayesian and frequentist marginalisations, as can be seen comparing Eqs. (3.21), (3.22), (3.23) with for example Eq. (3.8). Note that the combination in the frequentist marginalisation with flat prior (see e.g. Eq. (3.12)) is the same as the bias combination. Therefore the bias method is also more conservative than the standard marginalisation at the level of error combinations.

The Higgs boson rates
The couplings of the Higgs boson h are all predicted in the Standard Model, so that any deviation from the SM predictions would constitute a sign of the existence of physics beyond the SM. The Higgs couplings can be probed by collider experiments, which can produce the Higgs on-shell and observe its decays. This process of Higgs production followed by its decay is parametrised as The SM Higgs production mechanisms accessible at the LHC (and Tevatron) are i) gluongluon fusion (ggF), ii) vector boson fusion (VBF), iii) associated production with an electroweak gauge boson V = W, Z (VH), and iv) associated production with a tt pair (ttH).
The main SM Higgs decays observed at the colliders are decays into gauge bosons, h → γγ, ZZ, W + W − , and into heavy fermions, h → bb, ττ . The production modes X and final states Y will be therefore taken in the following list,

The data
The Higgs searches at ATLAS, CMS and the Tevatron are focussed on a specific final state Y . For each final state, various channels are defined using mutually exclusive cuts. Throughout this paper, these experimental channels will be labelled by lower case latin indices (i, j . . . ). We will consider all the 88 channels. A given i contains the information on the final state and the specific channel. In the following, it will be sometimes useful to refer to the final state Y corresponding to a given channel i. We will use the short notation Y i , meaning that Y is taken as a function of the variable i, i.e. Y i ≡ Y (i).
The results from Higgs searches at the LHC and the Tevatron are reported in terms of signal strengths µ ex i . A signal strength is defined as the ratio of the observed event number with the expected SM event number, The predicted SM event rate of a process pp (pp) and L is the integrated luminosity. However, from the experimental viewpoint, all the production processes contribute to a given final state. Hence the Higgs production cross sections have to be weighted by a selection efficiency SM X,i encoding the effects of kinematical cuts. The actual expected event rates are thus given by Note that the kinematical cuts have been to some extent designed to disentangle the production modes, so that often one of the efficiencies will dominate over the others.
The experimental central values of the µ ex i , the associated statistical errors, the experimental systematic errors, and the selection efficiencies SM X,i that we will exploit in our analysis are taken from the following references. The statistical and experimental systematic errors are often combined within these references and will be denoted here as ∆µ ex i .  [39,40].
Apart from statistical and experimental systematic errors, certain theoretical errors on µ ex i are included in the public results. To the best of our knowledge, the combination between these experimental and theoretical uncertainties is often made in quadrature. We thus subtract in quadrature these theoretical errors from the provided total uncertainties. How to properly (re)introduce the theoretical errors constitutes the main topic of this paper, and will be discussed at length in the upcoming sections.
Finally, we mention that we do not include in our fits more challenging observables related to the Higgs pair production [41], off-shell effects, loop-induced Zγ final state, electron/muon pair final states, final states induced by flavour-changing Higgs couplings, nor exotic or invisible final states. Some of those would require to introduce new parameters in the Lagrangian that we will consider in Eq. (4.7). The motivation is to keep a simple physical framework in order to discuss easily the statistical aspects. In any case, the present experimental limits on such Higgs observables are still not stringent enough to affect drastically the Higgs fits. Moreover, all the statistical concepts discussed throughout the paper can be simply extended to new Higgs observables.

New physics parametrisation
The new physics possibly lying beyond the SM may induce a distortion of the SM Higgs couplings. The correct way of dealing with the low-energy manifestation of heavy new physics is through the use of an effective Lagrangian (see e.g. Ref. [16] for global fits of the Higgs effective Lagrangian). The leading effects on the Higgs sector appear through dimension-6 operators. The effective Lagrangian then induces anomalous couplings between the Higgs and the SM particles. The anomalous couplings to weak bosons and to heavy fermions can be parametrised as where y t,b,c,τ are the SM Yukawa coupling constants (in mass eigenbasis), the subscript L/R indicates the fermion chirality, v is the Higgs vacuum expectation value, g hW W = 2M 2 W /v and g hZZ = M 2 Z /v are the EW gauge boson couplings. The c W,Z,t,b,c,τ parameters are defined such that the limiting case c W,Z,t,b,c,τ → 1 corresponds to the SM. New tensor structures are also generated by the effective Lagrangian but are not taken into account here.
Our focus being on theoretical uncertainties, we adopt a fairly simple parametrisation of the new physics effects. We assume universal deviations for fermion couplings, The c f are assumed to be real. Clearly, this simplified description of the new physics effects represents only a piece (operators with no extra derivatives) of the full dimension-6 effective Lagrangian. Having c W ≈ c Z and c f universality is however approximately compatible with certain new physics scenarios, like for a warped extra-dimension with bulk custodial symmetry vanishing IR brane kinetic terms for EW gauge bosons [42,43]. 19 Having only two parameters in this simplified framework, the results of our fits will systematically be presented in the c V − c f plane.
In the hypothesis of the existence of a physics Beyond the SM (BSM) parametrised by c V − c f , the expected signal strength is given by being defined in Eq. (4.5). This is the theoretical prediction of the experimental signal strength defined in Eq. (4.6). Both BSM cross sections and branching ratios σ BSM X , B BSM i can be expressed in terms of the SM amplitudes and of c V , c f . The expressions can for example be found in Ref. [44], whose procedure is closely followed here. In all generality, the BSM efficiencies are not the same as the ones of the SM either. However, this happens when couplings with new tensors structures are generated by new physics. In our simplified framework, this does not happen, such that one can safely take BSM X,i = SM X,i ≡ i X . The SM production cross sections and partial decay widths for the Higgs boson are taken, respectively, from the LHC Higgs cross section Working Group (LHCHWG) Ref. [17] (see also Ref. [18][19][20] as well as the recent N 3 LO ggF computation [25]) and Ref. [17,20]. These numerical results correspond to the rates calculated at the highest orders of EW and QCD corrections known so far (mixed EW-QCD at NNLO for the ggF mechanism [27] and at NLO for other Higgs production modes).

The base likelihood
Having introduced the statistical framework and the Higgs data in Sections 2 to 4, we can proceed with building the Higgs likelihood function. We define the base likelihood L 0 as the likelihood containing the central values of Higgs signal strengths, and the experimental uncertainties. The theoretical uncertainties are kept apart from now. Their inclusion into the base likelihood will be discussed at length in the next sections and is the central topic of this paper.
In absence of any experimental systematic errors, a signal strength variable follows a Poisson statistics, and the associated likelihood is thus a Poisson distribution. Whenever the event number is large enough, about O(10) in practice, the likelihood can be approximated by a Gaussian. In contrast, in presence of systematic uncertainties, this approximation generally does not hold. In practice however, the complete likelihood resulting from the combination of statistical and experimental systematic errors is not provided in the experimental public results. We will therefore model the base likelihood using Gaussian distributions, just as if the shape came out only from the statistical error. Such an approximation is expected to be good as long as the systematic error is small with respect to the statistical error, as shown in Section 3.4 and Appendix A.
The observed rates in the current 88 channels (labelled by i, j) are potentially correlated, for example because of the experimental error on the luminosity. The base likelihood follows therefore a multivariate normal distribution, where C ex ij is the correlation matrix among all channels. Ideally, each individual observed channel i must be considered in order to take into account all the experimental information available on the signal strengths. In practice, few elements of this correlation matrix have been provided by the Collaborations up to now. Therefore in the following, we will include only the diagonal elements of C ex ij , given by C ex ii = (∆µ ex i ) 2 , where ∆µ ex i is the experimental uncertainty extracted from the public experimental results. For future releases, we encourage the experimental Collaborations to provide as many elements as possible for the correlation matrix of the individual signal strengths. 20 Alternatively, to perform the Higgs fits one could think of using the correlations between the combined observed rates, that are currently provided by the LHC Collaborations. Although instructive, these combined rates do not keep track of all information since they are grouping together different Higgs production modes (which were originally measured independently), like µ ex VBF,VH and µ ex ggF,ttH for each Higgs decay channel [6,7]. Notice that such combined signal strengths also hide some information in the sense that they can result from summations over various exclusive selection cut categories.

The uncertainty on the signal strengths
The Higgs theoretical uncertainties we will refer to are the theoretical uncertainties associated with the expected event rates N SM i defined in Eq. (4.5), that are obtained through analytical and numerical computations in quantum field theory. These uncertainties will propagate both into the experimental signal strengths µ ex i and into the theoretical strengths µ th i , defined in Eqs. (4.6), (4.8). Following our conventions (see Section 3, Eq. (3.1)), the theoretical uncertainty on the Standard Model expected rate in a channel i is written under the form , and ∆ N i represents the relative magnitude of the uncertainty.
The theoretical uncertainty on N SM i propagates to the experimental signal strength as The case of the theoretical signal strength µ th is slightly trickier. Here we focus on the most realistic case where the deviations induced by new physics are small, so that the anomalous couplings c a (with a = (W, Z, t, b, c, τ )) are close to one, i.e. |c a −1| 1. The contributions from new physics can be linearised with respect to the small parameters c a − 1, so that the BSM event rate in the channel i can be written as In this expression, it appears that the leading source of uncertainty comes from the SM event rate uncertainty ∆ N i . In the expression of µ th i , it turns out that this uncertainty cancels out at first order between the numerator (N BSM i ) and the denominator (N SM i ). The subleading uncertainties would then come from a term quadratic in ∆ N i and from the relative uncertainty (c a − 1) ∼ ∆ N i . These higher-order contributions are subleading compared to the error on the experimental signal strength, given in Eq. (5.3), which is of order ∆ N i . In the following, we will thus focus only on the uncertainty of the experimental signal strength µ ex

The structure of the Higgs theoretical uncertainties
The theoretical uncertainty on N SM i comes from the errors on the Higgs cross sections σ SM X and partial decay widths Γ SM Y . Still following our conventions, these relative uncertainties are written as The exact content of these errors will be discussed in details in the next section. The uncertainty on the partial decay width propagates to the branching ratios. Defining the relative error on the branching ratios as B SM The uncertainty from the cross sections and branching ratios then propagates to the signal strength (4.6) and is thus encoded in a factor µ ex being the Y decay mode of the Higgs channel detection i. Note that the sign after the first equal symbol is just a convention if the errors are symmetric. Finally, the errors on cross sections and partial widths come from several sources. One can write those generically as with the relative errors ∆ n X , ∆ n Y to be detailed in the following. 22 Knowing the base likelihood of Eq. (5.1), and knowing where exactly the theoretical uncertainties enter, we have the complete Higgs likelihood as a function of all the quantities that will have to be treated statistically, namely the nuisance parameters and the effective BSM parameters, 23 Rigorously, the next step is to eliminate the nuisance parameters, δ n X , δ n Y , applying either the marginalisation or the bias method. In general these steps should be performed numerically, and are computationally heavy. Here however, we will use the methods of preliminary combinations advocated in Section 3. Then it will appear that the subsequent Higgs likelihoods are much lighter to treat.

Combining the Higgs rate uncertainties
In this section we shall combine the Higgs rate uncertainties that will be used in the marginal likelihood studied in Section 7. The most clear and rigorous statistical context 21 δ Y Y represents the Kronecker symbol. 22 Throughout the paper, we will systematically denote the values of ∆ n X , ∆ n Y taken from the literature by ∆ |0 or ∆ 0 . The possible ambiguities in the interpretation of these numbers will be discussed case by case. 23 In the following, to adopt compact notation, we will omit the cV , c f arguments of the likelihood function when no ambiguity is possible.
for the marginalisation procedure is arguably the one of Bayesian statistics. In particular, the nuisance parameters are treated on the same ground as the variables of interest and are thus automatically given a probability distribution (see for instance Ref. [45]). For that reason we focus in this section on the error combinations within the Bayesian context. The resulting likelihood involving the combined errors will be formally treated within both the Bayesian and frequentist marginalisations in Section 7.
As we have described in Section 2.2.1, the Bayesian marginalisation procedure eliminates the dependence of the likelihood on the nuisance parameters through an integration. For the Higgs likelihood Eq. (5.11), this integration reads where π 0 is the joint prior of all the nuisance parameters. Recall that this prior factorises when parameters are independent. More explicitly, this marginal likelihood reads The theoretical uncertainties δ µ i ∆ µ i on each signal strength µ i are expressed in terms of the uncertainties on cross section δ n X ∆ n X and partial decay width δ n Y ∆ n Y through Eqs. (5.7) to (5.10).
In the following subsections, starting from Eq. (6.2), we will combine all the sources of uncertainty step-by-step, following the combination formalism established in Section 3. The aim of this section is to provide a clear and exhaustive treatment of all the Higgs theoretical uncertainties.

Combining the PDF and α s uncertainties
Let us first discuss the errors on QCD predictions for the Higgs production cross sections at the proton level. Those are induced by the uncertainties on the parton Probability Density Functions (PDF) inside the proton. First, one may distinguish between two distinct origins to the PDF uncertainties: an experimental source -as the PDF are reconstructed from collider data -and the choice of a specific PDF set (MSTW, CT/CTEQ, NNPDF. . . ). Second, we consider simultaneously the parametric uncertainty coming from the strong coupling constant, α s . We consider both PDF and α s uncertainties simultaneously because they contribute in an intricate way to the cross section, as α s enters both in the hard process matrix element and the PDF themselves.
• Modeling the uncertainties: The uncertainties from α s and the collider data are modeled by the nuisance parameters δ αs , δ data and constitute independent sources of uncertainty (hence with factorisable priors). The relative uncertainties on α s and the PDF data can be parametrised as The α s error enters in the cross section in two different ways. On one hand, α s is used in the fit of the data aimed at determining the PDF themselves. On the other hand, α s is also involved in the hard subprocess that is convoluted with the PDF to obtain the final cross section. These two contributions to the cross section uncertainty, named here as ∆ αs,fit and ∆ αs,hard , are not available in the literature. However, we will show that the knowledge of these two separate contributions is not necessary either. Rather, provided that the relative errors ∆ αs,fit and ∆ data are small enough to be linearised, only the sum ∆ αs,hard + ∆ αs,fit is needed. This sum can typically be inferred from the literature. In order to understand the interplay among the α s and the data uncertainties, it is instructive to write explicitly how they enter into the cross section. One should start with the form where the first argument corresponds to the PDF input, while the second argument represents the α s -dependence coming from the partonic process. From this general form, one then introduces the δ αs and δ data nuisance parameters, and expand the expression at first order, 24 The terms in the last two lines represent the errors propagated to the cross section at first order in ∆, expressed as partial derivatives of σ SM X , and correspond precisely to the relative errors on the cross section, 25 It appears clearly that only the sum ∆ αs,hard X +∆ αs,fit X is needed. Fortunately, this is what is provided in the literature. This sum ∆ αs X ≡ ∆ αs,hard X + ∆ αs,fit X can be read for example from Ref. [20]. Note also that the nuisance parameter δ αs is common to any production mode, i.e. it does not carry the index X. In contrast, the nuisance parameter δ data X carries an 24 The ∂1,2 represents derivative with respect to the first and second argument of the function respectively, ∂1f = ∂f (x, y)/∂x, ∂2f = ∂f (x, y)/∂y. 25 Note that the ∆'s in Eq. (6.6) can be negative as they are identified from the partial derivatives in Eq. (6.5). In the rest of the paper however, the ∆'s are taken positive by convention. Different signs for the ∆'s would correspond to a negative correlation, that is instead included at the level of the δ's in the rest of the paper.
index X because each production mode potentially involves different initial states. These initial states correspond to different PDF, which are fitted from different data sets. Finally, one should check the validity of the error propagation at linear order in the cross sections (i.e. that the O(∆ 2 ) in Eq. (6.5) is well negligible). From Eq. (6.5)-(6.6), one can see that at linear order, for any fixed value of α s (i.e. fixed value of δ αs ), the error bar on σ SM X induced by the data uncertainty (obtained from varying δ data , e.g. in [-1,1]) should have the same size. A change with α s of this bar size could thus come only from higher order terms such like On the Fig. (57)- (58)- (59) of Ref. [20] for the various Higgs production reactions at the 8 TeV LHC, we see that the change of this bar size (vertical bar there) is small with respect to the shift (i.e. ∆ αs X ) of the bar central values. We conclude that one can restrict the expansion Eq. (6.5) to linear order in a good approximation.
Notice that a customary way to write these uncertainties is by splitting between the overall PDF error and the hard subprocess error, δ PDF The trouble when using this form is that the δ PDF X and δ αs,hard contributions are correlated via α s . Combining these uncertainties then requires to know such a correlation coefficient, which is fixed by ∆ αs,fit , as well as ∆ αs,hard . We emphasize that the use of this intermediate parametrisation brings unnecessary complications, and we recommend thus to avoid it.
Hence according to Eq. (6.6), the parametric uncertainties from α s are cast into a single error ∆ αs X , and add up with the statistical error from the data as Using this approach, one deals directly with the elementary sources of uncertainty. These two sources of error have no intrinsic relation and are thus independent, meaning that δ data and δ αs have factorisable priors. Similarly, the uncertainty from the choice of a specific PDF set, modeled by δ set , can be added up linearly to the errors of Eq. (6.7) in a good approximation. The linear approximation can be justified from Fig. (57) in Ref. [20]. There one can see that the size of the data error bars as well as the shifts induced by α s depend only weakly on the PDF set choice. The δ set error is also independent from the δ data X , δ α s errors and in turn possesses its own prior distribution. All those errors induce three terms in the sum of theoretical errors entering Eq. (5.9). These terms can be cast into a global PDF uncertainty, We recall that X = {ggF, VBF, VH, ttH} and that the ∆'s are relative errors, which are chosen by convention to correspond to one standard deviation. Those are related to the 1σ absolute errors on the SM Higgs cross section through e.g.
• Combining the three uncertainties: Here we combine the three sources of theoretical uncertainty described in Eq. (6.8). We will add up more and more errors progressively in the following subsections. These three independent sources of error are associated with three priors π αs , π data X , π set . These nuisance parameters appear in Eq. (6.2), where they are integrated over. We now proceed to combine these errors following the analysis of Section 3, starting from Eq. (3.2). In practice, for the discussion, it will be convenient to combine only two errors at a time. One then finds a likelihood of the type (6.2) depending only on the nuisance parameter δ PDF+αs X . The distribution of this nuisance parameter comes with a 1σ width ∆ PDF+αs X given by The nuisance parameter δ PDF+αs π PDF+αs X =π set X π data X π αs X , (6.10) whereπ PDF+αs ) and the variable x corresponds to the relative error δ PDF+αs X ∆ PDF+αs X . For the initial priors one has for exampleπ αs X (x) = π αs (x/∆ αs X ). The Eq. (6.9) and then (6.10) are justified in details in the rest of this subsection.
• Details on the data and α s error combinations: We emphasize that the Bayesian combination of the 1σ widths, as here in Eq. (6.9), is independent of the shapes of the prior distributions. This combination only depends on the possible correlations among individual errors [c.f. Section 3.2]. In the present case, there is no correlation between the δ data X and δ αs X parameters, as explained right below Eq. (6.7). This leads to the sum in quadrature of the 1σ errors (∆ data X ) 2 + (∆ αs X ) 2 in Eq. (6.9). Let us comment about those uncertainties. First, the error associated to π data X originates mainly from measurements: it is mainly induced by the limited accuracy of data points used to perform the fit for reconstructing PDF. Hence this error is mostly of statistical nature. There exists of course systematic errors as well, but it has been checked by several groups that the final π data X distribution can be reasonably taken as Gaussian [18]. Second, the uncertainty on α s originates mainly from lattice calculation errors (mainly theoretical) and especially from perturbative truncation errors [46] 26 . Indeed the α s determination from lattice methods (most accurate one in Ref. [46]) represents today the most precise determination and hence essentially dictates the final world average error [47]. The FLAG Working Group on lattice calculations has estimated a more conservative uncertainty on α s , which is increased by a new QCD perturbative error estimation [48], thus still leading to a dominant theoretical uncertainty. At this level, a comment is needed on the link between the 1σ errors and the uncertainty magnitudes provided in literature. To remain conservative we use ∆ αs X = ∆ αs X | 0 for the 1σ error, where ∆ αs X | 0 is the error provided by Ref. [17,20]. There is indeed a somewhat arbitrary choice for the relation between ∆ αs X and ∆ αs X | 0 , due to the theoretical (QCD) nature of the uncertainty. The origin of this arbitrariness is the fact that the QCD errors are just estimated by varying the renormalisation and factorisation scales on arbitrary intervals. We present a similar discussion in the beginning of next Section (6.2) for ∆ scale X . Concerning the 1σ error from data, one can adopt ∆ data X = ∆ data X | 0 (∆ data X | 0 being read from Ref. [17,20]). Indeed, the probability distribution for the uncertainty induced by the experimental data can be safely described by a Gaussian, as described above, so that the errors provided by Ref. [17,20] can reasonably be interpreted as 1σ errors.
Let us now discuss the convolution betweenπ data X andπ αs X that appears in Eq. (6.10). For that purpose, we first need to discuss the form of the π αs distribution. The shape of π αs can be taken as flat since the uncertainty on α s originates mainly from theoretical uncertainty, as mentioned above. However, the choice of the prior for a theoretical uncertainty is often controversial, so that we will also consider the case of a non-flat π αs distribution. 27 Finally, the convolution of the Gaussian prior,π data X , with a flat prior,π αs X , gives rise to a Gaussian distribution,π data X π αs X , in a good approximation for the various Higgs production modes. The justification is that theπ αs X width, ∆ αs X , is systematically smaller or of the same order as ∆ data X , 28 in which case the convolution leads to an almost pure Gaussian prior. This will be demonstrated explicitly in Fig. (2) for other priors.
• Details on the combination with the PDF set error: The various PDF estimations provided by the different fitting groups reflect several sources of error [49][50][51]. Indeed, these groups make different choices/hypotheses about the numbers of free parameters used to model the PDF 29 , the statistical methods adopted to fit the data 30 , the number of independently parameterized PDF (in particular regarding (anti-) strangeness), the collider results exploited, the matching methods applied to include heavy-quark mass effects in the flavour number scheme and the variable-or fixed-flavour number scheme. All these sources of uncertainty are synthesized in the 1σ error on the Higgs production rates noted ∆ set X . To remain conservative, we assume ∆ set X = ∆ set X | 0 , where ∆ set X | 0 is the error read from Fig. (57)-(59) of Ref. [20]. ∆ set X | 0 can be estimated by taking 27 To be consistent throughout the paper, concerning the initial priors, we will assume a flat shape for the distributions whose shape is unknown (uncertainties from QCD, parametrisation. . . ). 28 For the ggF example, our conservative treatment of the errors provided in Fig. (59) of Ref. [20] gives an half absolute width, W/2= √ 3∆σ αs X = √ 3∆σ αs X |0 0.5 pb, which is indeed comparable to, ∆σ data 0.5 pb. In the alternative case (see the analogous discussion at the start of Section 6.2), one has instead, W/2= √ 3∆σ αs X = ∆σ αs X |0 0.3 pb, which is clearly smaller than, ∆σ data X 0.5 pb, so that the Gaussian approximation for the final convolution would be even better because this case would tend to a situation where the non-Gaussian error becomes negligible. 29 The infinite-dimensional problem of representing a space of functions is reduced to a finite-dimensional form, in order to be manageable, by introducing a parametrisation of the PDF. half the interval obtained by using the various PDF sets which lead to a finite number of predictions for the Higgs rate central values. Of course, this determination of ∆ set X | 0 is probably underestimated as (i) the hypotheses made by the groups provide illustrative examples which do not necessarily indicate the extremal values of the PDF, and, (ii) the effects of the various sources of error listed above can potentially compensate each other. We comment on this point in the following paragraph. In Eq. (6.9), the sum in quadrature between the ∆ set X error and the data and α s errors is justified because these are independent uncertainties. Nevertheless, in practice, for our numerical applications, we use the so-called envelope method 31 to determine ∆ PDF+αs X as done in Ref. [20,52] 32 and calculated by the LHCHWG [17]. Note that the envelope method overestimates the combined errors, compensating somehow for the underestimation of the PDF set error. For the ggF mechanism, the ∆σ PDF+αs ggF error derived in this way has to be reduced by ∼ 40% to recover the quadrature summation of Eq. (6.9), and the decrease is smaller for the other Higgs production reactions. Hence, we conclude that the use of the envelope method to determine the global PDF uncertainties gives rise to a substantial overestimation of these errors.
We finally discuss the shape of the prior of the final combination π PDF+αs X . Most of the sources of error taken into account in ∆ set X are of theoretical nature and all the errors have unknown distributions. The shape of π set X is therefore assumed to be flat. The convolution ofπ set X (see Eq. (6.10)) with the nearly Gaussian distributionπ data X π αs X leads in a good approximation to a final Gaussian prior, π PDF+αs X 33 . Once more, this is guaranteed by the fact that for any Higgs production mode at the LHC, ∆ set X is smaller or comparable to the combination of ∆ data X and ∆ αs X (see for instance Ref. [20]).

Scale and EFT errors: the amplitude uncertainties
• Scale error: There exists another major type of error, this time at the parton level, on the QCD prediction for Higgs production cross sections. It originates from the lack of knowledge on the higher order contributions to the amplitude in the perturbative expansion, and can be recast into the dependence on the QCD renormalisation and factorisation scales. We note δ scale X the nuisance parameter representing this "scale uncertainty". There are no strong arguments to choose the shape for π scale X . As for many other theoretical uncertainties, the choice of the prior is typically a subject of controversy. Here we choose π scale X to be flat. Concerning the magnitude of the scale uncertainty ∆ scale X , it is also not 31 This "envelope method" corresponds precisely to the uncertainty combinations in the bias approach, see Section 3.5. What we call envelope method in the present paper is rather described in Section 2.2.2. 32 In the envelope method used in this reference, the whole uncertainty interval is found by searching at the minimum and maximum rates (considering the various PDF sets, αs values and including the possibility to move along the data-error bars). Then dividing by two this interval gives an estimation of the combined error as well as a central value for the rate. 33 Given that there are several sources of errors contained in the PDF set uncertainty, one may expect the π set X prior to be somehow peaked. This feature improves even more the Gaussian approximation of π PDF+αs X . clear to which width exactly corresponds the provided value, noted ∆ 0 X here, that is found in Ref. [17,18,25]. It is reasonable to expect ∆ scale X to be of order ∆ 0 X . To be more precise, we could make the two different assumptions, ∆ 0 X= ∆ scale X or ∆ 0 X= W/2 where W is defined as the support of the distribution, 34 with e.g. in the case of a flat distribution on an interval with size W: In order to be conservative in the choice of ∆ scale X , we choose the former hypothesis throughout this paper: ∆ scale X = ∆ 0 X . It is remarkable that recently [25], the calculation for the ggF mechanism has been pushed up to the complete N 3 LO order in perturbative QCD. This has allowed a reduction of the symmetrized 35 scale error from ∆ 0  Figure 1: Probability density distribution, π amp ggF (x/∆ amp ggF ) (in red), involving the relative error x (in %) of the ggF cross section, as derived through the convolution of the π Q,V ggF and π scale ggF priors (both in blue). The quantity ∆ amp ggF represents the relative 1σ error on the Higgs production rate (see text). For better comparison, the normalisation is chosen such that all the functions possess the same maximum, equal to unity at the origin.

• EFT error:
In the specific case of the ggF mechanism, another source of error arises in the amplitude of the Higgs production [54], that we describe now. The evaluation of this amplitude beyond the NLO level is possible within the Effective Field Theory (EFT) approach, where the particles running in the triangle loop are assumed to be much heavier than the produced Higgs boson to integrate out the heavy particles. For the top quark exchange, the infinite mass assumption, m t m H , induces a negligible 34 Recall that the support of a distribution is the domain where this distribution is not zero-valued. 35 Symmetrized over the positive and negative errors as, ∆ = [(∆ 2 + + ∆ 2 − )/2] 1/2 . 36 Choosing instead, µ0 = mH /2, could be motivated by a faster convergence of the perturbative series [25].
However, since it would lead to a significantly smaller uncertainty, ∆ 0 ggF 2.13%, we stick to the central choice, µ0 = mH , in order to remain conservative. error on the ggF amplitude [27,55]. In contrast, the EFT approach is clearly not valid for the other significant ggF contribution: the bottom quark exchange [25]. This inappropriate use of the EFT limit introduces some non-negligible error mainly through the interference between the bottom and dominant top quark loops (this error being smaller at the Tevatron than at the LHC) [56]. A similar uncertainty originates from the mixed QCD-EW corrections to the ggF process [27]. Those have been calculated at NNLO via the EFT approach based on the simplifying but unrealistic assumption, M W,Z m H . For all the EFT errors, some approximative estimations can be computed at NNLO (using K-factors obtained at NLO and NNLO for the top loop) [26,55]. A related uncertainty comes from the freedom in the choice of a renormalisation scheme for the bottom quark mass, involved in the ggF amplitude (on-shell scheme, MS scheme. . . ). The error from the renormalisation scheme dependence can be approximately estimated at NLO [55]. These three sources of theoretical uncertainty, namely the two kinds of EFT assumptions (on the heavy quark masses, m Q (Q = b, t), and vector boson masses, M V (V = W, Z)) and the m b scheme dependence, are independent and their respective priors are unknown. We assume these priors to be flat. To be conservative, we take the three 1σ errors to be equal to the numbers estimated in Ref. [26,55], for the 8 TeV LHC. Summing those in quadrature gives rise to the relative rate error, ∆ Q,V ggF= ∆σ Q,V ggF /σ SM ggF 5.6%. The convolution of the three flat priors (accordingly to Eq. (3.5)) leads to the blue distribution, π Q,V ggF , shown in Fig. (1), which already resembles a Gaussian shape as predicted by the central limit theorem.
• Combining the ∆ scale ggF and ∆ Q,V ggF errors: The theoretical scale and EFT uncertainties on the ggF mechanism are of different nature and are thus independent. The combined ggF 1σ error is in turn given by This error constitutes the characteristic width of the π amp ggF distribution obtained by convoluting theπ scale ggF andπ Q,V ggF priors, as performed in Fig. (1) (see the final red curve). Remarkably, this distribution,π amp ggF ≡π scale ggF π Q,V ggF , (6.12) derived from four purely flat priors, is Gaussian in a good approximation. This can be also seen in Fig. (3) where π amp ggF is plotted together with a pure Gaussian distribution (blue curves). Recall thatπ amp ggF (x) = π amp ggF (x/∆ amp ggF ) and the variable x corresponds to δ amp ggF ∆ amp ggF .

Combination of the PDF and amplitude errors
For the various Higgs production modes -except the ggF process that will be discussed separately below, one has to combine the PDF and scale errors to determine the final uncertainty on the whole cross section. The scale error adds up to the PDF error of Eq. (6.8), according to Eq. (5.9), defining the total uncertainty on the cross section, These errors being independent, the 1σ widths add-up in quadrature, as dictated by Section 3.2, i.e. irrespective of the π PDF+αs X and π scale X shapes. Recall that ∆ σ X is the 1σ width of the resultingπ σ X distribution. The prior π σ X of this total uncertainty is then given by (see Eq. (3.5))π σ X ≡π PDF+αs X π scale X , (6.15) withπ σ X (x) = π σ X (x/∆ σ X ) and x corresponding to δ σ X ∆ σ X . Let us discuss the form of the π σ X function, as generated through Eq. (6.15). The shape of π scale X being unknown, we assume a flat π scale X distribution. Remind that this error is simply obtained by varying the QCD scale, so that no favoured value is predicted for the cross section. It is therefore a sensible choice to assign equal probabilities to all the values of δ scale X (or equivalently of the Higgs cross section) inside a certain range. On the other hand, we have seen in Section 6.1 that π PDF+αs X is approximatively Gaussian. Given the relative values of ∆ PDF+αs X and ∆ scale X for each process X -which are systematically such that either ∆ PDF+αs

-a Gaussian π PDF+αs
X and a flat π scale X lead in a good approximation to a final Gaussian π σ X . This combination is shown in Fig. (2) for ZH production, for which ∆ PDF+αs • The ggF reaction: In the case of Higgs production via the ggF mechanism, the PDF error has to be combined with the whole amplitude error studied previously in Section 6.2. The resulting total error on the cross section is These two errors being independent, their widths add-up in quadrature,  Fig. (3), using the π amp ggF distribution obtained in Fig. (1) Figure 2: Probability density distribution, π σ ZH (x/∆ σ ZH ) (in red), involving the relative error x (in %) of the ZH production cross section, as derived through the convolution of a Gaussian π PDF+αs ZH and a flat π scale ZH priors (both in blue). The quantity, ∆ σ ZH , represents the relative 1σ error on the Higgs production rate. The normalisation is chosen such that all the functions possess the same maximum, equal to unity at the origin. The 1σ band for the π scale ZH distribution is indicated by the vertical dotted lines.  Figure 3: Probability density distribution, π σ ggF (x/∆ σ ggF ) (in red), involving the relative error x (in %) of the ggF cross section, as derived through the convolution of a Gaussian π PDF+αs ggF prior and the π amp ggF distribution obtained in Fig. (1) (both in blue). The quantity, ∆ σ ggF , represents the relative 1σ error for the ggF rate.

The production contamination
There are several production mechanisms for the Higgs boson (recall that X = {ggF, VBF, WH, ZH, ttH}). The cross section for each of these production modes is associated with a theoretical uncertainty, that has been obtained through subsections 6.1 to 6.3. In fact, one may note that the uncertainties of these various cross sections are potentially correlated, as they partly arise from common sources like the α s parametric error. Therefore the δ σ X follow a common distribution π σ , which does not necessarily factorise into π σ ggF π σ VBF × . . . The aspect of correlations among the cross section errors will be further discussed in Section 7.1. Here we shall proceed using the most general prior π σ , and we denote the resulting correlation matrix as ρ σ XX . 39 The contribution from the cross sections errors in a given detection channel can be read from Eq. (5.8). Let us first adopt a more compact notation, where the δ σ X ∆ σ X are defined in Eqs. (6.13), (6.16). The Higgs detection channels have been designed to select predominantly a certain mode of production. That is, for a given channel i, the experimental cuts are profiled so that typically the efficiency i X for one of the production modes X (see Eq. (4.6)) is much larger than for the others, implying a hierarchy among the ∆ X,i . We can therefore use the leading moment approximation, developed in Section 3 and Appendix A, to proceed to the combination of the errors. Applying the leading moment approximation amounts to treat the contaminations as a small perturbation of the uncertainty from the leading production mode. The cross section uncertainties propagate in a given detection channel as (P stands for production) Here the label of the combined nuisance parameter δ P X i is chosen to be the label of the dominant production mode in the i channel. Note that X i should be understood as X(i). This naming refers to the fact that the shape of the combined nuisance parameter prior corresponds approximatively to the shape for the dominant uncertainty, see Eq. (3.14). For example, if the production mode ggF dominates in the channel i, one has δ P X i = δ P ggF . (6.21) The various nuisance parameters δ P X are potentially correlated. They should thus follow a joint prior distribution, π P , generating a correlation matrix ρ P XX . Assuming generic correlations ρ σ XX among the various cross section errors, the magnitude of the combined production uncertainty in a channel i is given exactly by The leading moment approximation then dictates (see Eqs. (3.16)-(3.19)) that π P ≈ π σ . (6.23) Equation (6.23) implies that the correlations among the δ P X are approximatively the same as the ones between the δ σ X , i.e. ρ P XX ≈ ρ σ XX .

(6.24)
This fact can be understood as follows. Consider only two detection channels, i and j.
If the same production mode X= X i = X j dominates in both channels, they are nearly 39 In Section 7.1, the assumptions adopted for ρ σ XX will allow us to express π σ in terms of the π σ X .
100% correlated, so that they are described by a single nuisance parameter δ P X , which is equivalent to say that ρ P XX ≈ 1. Note that one has ρ σ XX = 1 by definition, so that ρ P XX ≈ ρ σ XX . Besides, if two different production modes X i = X j dominate respectively in the i and j channels, the uncertainties in both channels are respectively described by δ P X i and δ P X j . These two nuisance parameters inherit the correlation from the leading production modes X i and X j , which is given by ρ σ X i X j . Therefore one recovers Eq. (6.24). Finally, notice that for certain kinematical cuts selecting the ttH mode in the diphoton decay channel [29], even additional production modes can slightly contribute, like the bbH, tHW and tHbq productions. These production modes participate in the contamination and have thus been included in the combination of production modes in Eq. (6.20).

The uncertainties on branching ratios
Two sources of error affect the Higgs signal strengths: the production and the decay rate uncertainties (see Eq. (4.6)). The latter is often not considered in the Higgs fits. Still following our approach of step-by-step combinations, one should start with the signal strength error Eq. (5.8), where all uncertainties on production modes have been already combined (Eq. (6.20)). The uncertainties on production and decay rates combine thus as, up to an irrelevant global sign, where Γ SM Y i is the SM partial decay width for the detection channel i. In this equation, we apply the leading moment approximation to treat the branching ratios errors as perturbations of the leading error from production modes. This is why the δ µ X i parameters carry the index X i , which is the index of the dominant production mode in the channel i, as in the previous subsection. For example, if the production mode ggF dominates in the channel i, one has δ µ X i = δ µ ggF . (6.26) The relative error δ B Y i ∆ B Y i on the SM branching ratio is expressed as in Eq. (5.7), where the decay width uncertainty (5.10) can now be specified in terms of the various sources of error (c.f. Section 3 of Ref. [55] for a recent overview, and references therein), The partial decay width errors ∆Γ errors are associated to Gaussian distributions, and are thus identified without ambiguity with the errors defined in Ref. [20]. The ∆Γ thu Y errors are purely theoretical, so that one associates them with flat priors. To adopt a conservative prescription, as in Section 6.3, we interpret the numbers given in [17] as 1σ-widths. These numbers are thus directly identified with the ∆Γ thu Y . Now inserting Eq. (6.27) into Eq. (5.7) provides the contributions of the theoretical and parametric uncertainties to the branching ratios, where in the last line one introduces a compact notation for the error magnitudes. The sum over Y here must include all the individual Higgs decay channels (not only the ones effectively detected at colliders), namely Y ≡ bb, cc, WW, ZZ, ττ , γγ, gg . . . We stress that the parametric error δ on various decay rates Y arises from the same source (namely, varying the fundamental parameter a). The parametric errors on the various decays are thus fully correlated. Therefore, one could in principle drop the Y index on δ pu a Y . There is however a subtlety, because these errors can be either 100% correlated or 100% anti-correlated. The use of parameters δ pu a would render the full correlation manifest, but minus signs would have to be included in certain ∆ pu a Y . Here instead, we chose positive ∆'s by convention. We have thus to keep the Y index on δ pu a Y , bearing in mind that this Y labels only 100% correlation or anti-correlation. A second subtlety is that these signs are actually not clearly given in the literature. Rather, only the absolute values of the ∆ pu a Y | 0 are provided. We adopt a conservative choice by assuming that all these errors are 100% correlated.
We can now apply the leading moment approximation on the combination of Eqs. (6.25)-(6.28), where the leading uncertainty is δ P X i ∆ P i and the perturbation is The 1σ-width of the global theoretical uncertainty in a channel i is given by with ∆ P i given by Eq. (6.22). Regarding the prior distribution of the δ µ X , the discussion is exactly the same as the one in Section 6.4. That is, following the leading moment approximation, the joint distribution of the δ µ X corresponds to the one of the leading uncertainties δ P X , so that π µ ≈ π P . (6.30) This implies in particular that the δ µ X inherit the correlations from the δ P X , that is ρ µ XX ≈ ρ P XX .
Let us discuss the correlations used to derive Eq. (6.29), which are drawn from Ref. [17,18,20]. First, a given parametric uncertainty associated to δ pu a Y introduces 100% correlated errors among the various decay modes Y , so that the sum over Y of the ∆ a Y,i is linear. Recall the parametric correlations are taken to be all positive. There is also a slight correlation between δ P X i ∆ P i and δ pu αs Y ∆ αs Y,i , because δ P X also contains a contribution from the α s error.
The α s contribution being subleading in δ P X , its correlation with δ pu αs Y is expected to be small, so that we can neglect it. All the other sources of uncertainties are independent due to their different origins, so that summations in quadrature appear everywhere else in Eq. (6.29). Using the definitions of the reduced ∆'s in Eq. (6.28), we finally write explicitly the total theoretical uncertainty on the signal strength of a Higgs detection channel i, . (6.31)

Summary
In this section we have assembled step by step all the theoretical uncertainties on the Higgs signal strengths, starting from the Higgs likelihood Eq. (6.2). This combination is made possible by the statistical analysis of Section 3, whose results have been extensively used here. The final Higgs likelihood involving the combined errors reads The only label for the combined nuisance parameters δ µ X i is X i , the dominant production mode for a given channel i (see for instance Eq. (6.26)). The prior π µ is approximately equal to the prior of the production mode uncertainties π σ , through Eq. (6.23) and Eq. (6.30). In Section 7.1, the assumptions on the correlations among the production modes will allow us to express π σ in terms of the priors of individual production mode uncertainties π σ X (see Eqs. (3.16)- (3.19)).
One of the outcome of the combination procedure followed throughout this section is that the shape of the combined priors π σ X appears to be almost Gaussian. This comes partly because some of the priors for the individual sources of uncertainty are Gaussian. However, the main reason is actually that a substantial number of the individual sources of uncertainty are independent and of same order of magnitude. These conditions resemble to the ones of the central limit theorem, which predicts that the combination would converge towards a Gaussian distribution. Besides, the small errors from contamination and partial decay widths do not affect either the final prior shape under the leading moment approximation. It follows that the π µ distribution is close to a multivariate Gaussian distribution.
Finally, we stress again that the famous question of the linear versus quadratic summation of individual errors (as the ones used in this section to derive ∆ µ i in Eq. (6.31)) relies uniquely on the correlations among the errors, and is therefore independent of the shapes of the priors. This general feature holds when uncertainties are combined using Bayesian statistics.

Correlations of the detection channels
In this subsection we focus on the correlations among Higgs detection channels induced by the theoretical uncertainties. These correlations appear whenever a source of uncertainty contributes simultaneously to various channels.
As a preliminary observation, let us recall that these correlations are sometimes not taken into account in the literature. What is typically done in such case is that some amount of error, typically from Refs. [17][18][19][20], is added independently to the statistical error of each detection channel. Such combination typically reads (∆µ ex i ) 2 + (∆µ th i ) 2 if done in quadrature. From the point of view of nuisance parameters, this combination would correspond to associating one independent δ µ i ∆ µ i to each detection channel, and thus performing one integration per channel in the marginal likelihood.
The issue with such approach is that the correlations among channels induced by the theoretical uncertainties are lost. As stated in Section 2.2.2, these correlations are crucial because they potentially change the tension among the various channel measurements, which in turn can modify the best-fit regions. As slight modifications of the best-fit regions are expected in presence of new physics, treating correctly the theoretical uncertainties is fundamental.
Taking into account the correlations among channels amounts to consistently propagate the theoretical errors into the different detection channels. This is precisely what is done through the combination procedure of Section 6. Combining the errors together and using the leading moment approximation to treat subdominant errors, only five nuisance parameters δ µ ggF , δ µ VBF , δ µ ZH , δ µ WH and δ µ ttH arise (see Eqs. (6.31)-(6.32)). The uncertainty on each channel is described by only one of these δ µ X , where the X corresponds to the dominant production mode in this channel. That is, all channels dominated by the same production mode X have the same nuisance parameter δ µ X . This implies that these channels are 100% correlated.
In principle, the combination procedure of Section 6 describes the complete distribution for the δ µ X , π µ , including the correlations ρ µ XX among the different δ µ X . In practice, a complete knowledge of the correlations among the individual sources of uncertainties is needed to obtain ρ µ XX . Here we consider the determination of ρ µ XX as beyond the scope of this paper, since for example one would have to work out clearly the correlations among the Higgs production modes induced by the PDF data uncertainties (δ data X ). Using the information available in the literature we will rather consider some characteristic cases for ρ µ XX .
Let us first discuss the typical correlations induced by the PDF uncertainties (originating from the PDF data fit) and the scale uncertainties (c.f. Section 6.2) on the production cross sections. From now on, the δ µ X are denoted as δ X for simplicity, First, we will set δ ggF = −δ ttH since an anti-correlation between the corresponding PDF errors is reported in Ref. [57] 40 . Note that in reality, this anti-correlation is not total (its value is -0.6 in Ref. [57]) and furthermore the other source of error, the scale uncertainty, does not correlate the ggF and ttH cross sections as these come from independent QCD calculations.
The correlation coefficients of the PDF errors -between ∼ 0.63 and 0.93 [57] -for the three other production modes motivate us to take δ VBF = δ ZH = δ WH . This assumption is further justified by the fact that the PDF error is larger than the scale error (particularly for VBF) and that the scale error most probably correlates the ZH and WH modes. The correlation coefficients of the PDF errors between ggF and WH (-0.23), ZH (-0.14) or VBF (-0.57) suggest to consider the two extreme cases of vanishing correlation and 100% anti-correlation. The scale uncertainties tend to decorrelate these modes. It is thus coherent to consider the cases of vanishing correlation and 100% anti-correlation as the two extreme cases to study. All these assumptions are summarized as the two following configurations on the nuisance parameters, 41 keeping in mind that the realistic situation lies in between these extreme cases.
Regarding the PDF set error, the individual uncertainties giving rise to this error are not available in the literature. Rather, only the global PDF set error is estimated by changing various assumptions at a time. One can at least notice that the PDF set errors can be potentially correlated either negatively or positively, respectively, for the ggF and VBF reactions or the VBF and VH processes, as observed from the relative signs of rate variations in Fig. (57) of Ref. [20] when changing the PDF set. 42 These correlations are roughly consistent with the ones in Eq. (7.3).
Let us describe how the correlation configurations of Eq. (7.2)-(7.3) are related to the π µ appearing in the marginal likelihood (6.32). The prior π µ is approximately equal to the prior of the production mode uncertainties π σ (Eq. (6.23) and Eq. (6.30)) which can itself be expressed (according to (3.18)- (3.19)) in terms of the π σ X under the assumptions (7.2)-(7.3). One ends up with the two final priors, associated respectively to the correlation configurations of Eqs. (7.2)-(7.3), where δ() denotes the Dirac distribution. 40 It is not clear from this reference whether the correlations include as well the whole error from αs which is 100% correlated between the production modes. Nevertheless this source of error is minor compared to the other ones. 41 For consistency, these two configurations are used as well to determine the ρ σ XX correlation matrix of Eq. (6.22). 42 Recall that the Fig. (57) of Ref. [20] is for the 8 TeV LHC.

The Bayesian analytical likelihood
The π σ X priors deduced from the combination of all the cross section errors, in Section 6.3, have been found to be nearly Gaussian distributions. These Gaussian shapes are obtained by choosing flat shapes for all the unknown priors for theoretical uncertainties. As mentioned in Section 6.6, one expects this result to hold approximatively for other choices of initial priors. Nevertheless, in order to take into account in our numerical results the possibility of non-flat initial shapes, we also consider a totally different form of the final prior: we take it as a flat distribution. The choice of these two shapes (Gaussian and flat) provides an estimate of the impact of the prior shape on the final results. The distributions π σ X appearing in Eqs. (7.4)-(7.5) are hence defined as for the Gaussian and flat cases respectively. Recall that the variance of all the δ's, including δ σ X , are chosen to be equal to one for any prior shape. This appears clearly in Eq. (7.6) and implies the [− √ 3, √ 3] interval in Eq. (7.7). For analytical integrations of the final likelihood (6.32), it is convenient to denote by X a subset of fully correlated production modes, {X, X , . . .}. We then denote by Ω X the subset of channels (labelled by i) dominated by the production modes contained in X . In presence of anti-correlations, one further divides Ω X into two anti-correlated subsets Ω + X , Ω − X . Finally, the set of all channels is written Ω. Assuming the correlations among production modes follow Eq. (7.2), the set of detection channels is splitted into Ω {ggF,ttH} and Ω {VBF,WH,ZH} . Ω {ggF,ttH} is then splitted into the anti-correlated subsets Ω + {ggF,ttH} = Ω ggF , Ω − {ggF,ttH} = Ω ttH . Assuming the correlations of Eq. (7.3), there is instead a unique set Ω = Ω {ggF,ttH,VBF,WH,ZH} . It is splitted into the anti-correlated subsets Ω + {ggF,ttH,VBF,WH,ZH} = Ω ggF , Ω − {ggF,ttH,VBF,WH,ZH} = Ω {ttH,VBF,WH,ZH} . At that point it is also convenient to introduce the following quantities ζ X and η X X defined as The overall sign of ζ X is irrelevant. Note also that if X = X (as may occur in the η X X function), there are no theoretical correlations at all between the channels belonging to Ω X and Ω X .
In the case of a Gaussian prior (Eq. (7.6)), it is noticeable that the most general likelihood (6.32) can be integrated analytically and results in the simple analytical expression 43 Here δ X X is the Kronecker symbol. L µ is the base likelihood defined in Eq. (5.1), i.e. the likelihood before introducing nuisance parameters. One observes that the marginal likelihood takes the form of a product of the base likelihood with a term generated by the theoretical uncertainties. This term, which depends on c V , c f through ζ X , as well as on all theoretical and experimental uncertainties, implements all the deformations and correlations induced by the theoretical uncertainties.
For the case of no experimental correlations between different group of channels of dominant production modes, including the case considered without experimental correlations at all (see Section 5.1), one has η X X = 0 for X = X and The marginal likelihood (7.9) then reduces to, Note that this product is over different X subsets i.e. there are no theoretical correlations among the channels belonging to the different Ω X groups. Note that if one assumes a single independent nuisance parameter per channel, there is no sum in Eqs. (7.8), meaning that no correlation among channels is induced. 44 One can directly verify that in the purely de-correlated case (neither experimental nor theoretical correlations), Eq.(7.11) gives back the primary likelihood (5.1) with a summation in quadrature between the absolute experimental and theoretical errors, ∆µ ex i and µ ex i ∆ µ i . In the case of the flat prior of Eq. (7.7), there is no simple general form such as Eq. (7.9). However, assuming no experimental correlations among various Ω X subsets, the marginal likelihood takes a simple form, where Erf is the standard error function.

The marginal likelihood
In classical frequentist statistics, hypotheses are not associated with probabilities, so that there is no such thing as a prior distribution for a nuisance parameter. In the hybrid frequentist framework however, one can associate a parameter with a "prior" distribution that 43 A similar expression can also be obtained for an arbitrary correlation matrix ρ µ XX . Note one dropped an overall factor, as the likelihood is defined up to a normalisation constant. 44 We recall that such a combination should be avoided as it is not realistic.
can be seen as an extra likelihood constraining the nuisance parameter. Pushing forward the analogy with the Bayesian case, we worked out the way to combine uncertainties within frequentist statistics in Section 3.3. One may find however that the Bayesian combination of uncertainties are better defined than the frequentist one. More pragmatically, frequentist combinations are also more complicated, as the combination of the magnitude of the errors (the ∆'s) depends on the shape of the frequentist "priors", contrary to the Bayesian case. These drawbacks can constitute motivations to rather follow the Bayesian approach developed in previous sections. Nevertheless, for completeness we describe here the final part of the frequentist method for the Higgs fit. For that purpose we consider in the following, a generic prior, π µ (δ µ X ), of width ∆ µ i , obtained after a first phase of frequentist combination.
Recall that the frequentist marginalisation procedure, also called profiling, consists in maximizing over δ µ X , instead of integrating as done in Eq. (6.32). Hence the frequentist marginal Higgs likelihood reads As often done in practice for the frequentist treatment, one can equivalently minimize the χ 2 distribution, χ 2 = −2 log L, instead of the maximisation in Eq. (7.13), The best-fit point given by the χ 2 minimum in the (c f , c V ) parameter space is noted (ĉ f ,ĉ V ) and the best-fit regions are obtained by drawing contour levels of the difference (c.f. Section 2.1) at the values given by Eq. (2.8).

The frequentist analytical likelihood
Assuming that the Bayesian and frequentist combinations of the errors lead to analogous shapes for the final priors, we consider both a Gaussian and a flat shape for each π σ X prior, as in Eqs. (7.6)-(7.7). In the Gaussian case, the marginal likelihood (7.13) can be computed analytically, where the ζ X , η X X are defined as in Section 7.2. This is precisely the same result as for the Bayesian likelihood of Eq. (7.9), L Gauss B . For the case of no experimental correlations between the Ω X 's, the marginal likelihood with Gaussian prior thus simplifies just like in Eq. (7.11). 45 In this case, the marginal likelihood with a flat prior also gets an analytical expression, where ζ X , η X are defined as in Eq. (7.8), (7.10).

Numerical results
The frequentist marginalisation (likelihood (7.16) for the Gaussian prior or (7.17) for the flat one) is not illustrated here because the frequentist framework may seem slightly less consistent than the Bayesian one and the error combinations are more delicate. For these reasons, we rather recommend to use the Bayesian marginalisation technics for the Higgs fits. In any case, the Bayesian and frequentist approaches are expected to converge as the experimental uncertainties become small relatively to the theoretical ones. This situation will gradually occur in the next LHC Runs due to the decrease of the statistical uncertainties and the expected improvement in the knowledge of the experimental systematic errors.
We have described this feature in Ref. [58]. Now as a general remark allowing a better comprehension of the following subsections, let us try to explain in simple words the reason why the presence of nuisance parameters can indeed modify the size and the location of the best-fit domains in c V − c f . For the sake of understanding the impact on the size, it is easier to focus on frequentist marginalisation. Frequentist marginalisation can be seen as an approximation of Bayesian marginalisation, so that the same explanation holds for both. The frequentist marginalisation consists of a maximisation of the nuisance parameter (say δ µ X ) at any point in the space of the parameters of interest. This means that the value of δ µ X at a given point is chosen in order to maximise goodness-of-fit. Now, this improvement of goodness-of-fit is typically larger for the points far away from the best-fit point than for those close by the best-fit point. When this fact is true (which is usually the case), the operation of marginalising tends to enlarge the best-fit regions. The effect of the nuisance parameters on the location of the best-fit regions in c V − c f can be understood as follows. Recall that the nuisance parameters enter in the likelihood as µ ex i (1 + δ µ X i ∆ µ i ) (see Eq. (6.32)), so that they shift the central experimental value of the signal strength. This in turn can induce a change in the location of the best-fit point in c V − c f . Such a shift actually occurs if a non-zero value of δ µ X is preferred. This happens when a non-zero value for δ µ X helps relaxing the tensions (i.e. different preferred values of c V , c f ) among various signal strengths µ ex i . Notice that this means that the likelihood itself favours a non-zero value for δ µ X , even though the prior of δ µ X is centered on zero.

The forbidden case: no correlations
Following our overview approach, let us start with the simplest case: the Bayesian marginalisation in the absence of correlations between the theoretical errors of the different Higgs channels. Let us take for instance a Gaussian prior (taking a flat one would not change our conclusions). This case was described in more details in the beginning of Section 7.1 as well as in Section 7.2. In this "de-correlated" case, the likelihood is simply the primary likelihood (5.1) with a summation in quadrature of the absolute experimental and theoretical errors, (∆µ ex The best-fit domains in the c V − c f plane are derived following the standard procedure described in Section 2, and are shown in Fig. (4). Here and throughout Section 7.4, the priors for c V , c f are taken flat, π(c V,f ) ∝ 1. We see on this figure that the theoretical SM prediction (c V = c f = 1) lies well within the 68% C.L. 46 region. Physically, this implies that, with such a fit, no physics beyond the SM is required to interpret the 8 TeV LHC measurements of the Higgs rates. The increase of the best-fit domain sizes induced by the existence of theoretical errors is relatively weak, due to the sum in quadrature, as observed when comparing to the best-fit regions obtained with vanishing theoretical errors. The latter regions are superimposed on Fig. (4) for illustration purpose (as the dashed contours) and to ease the comparison with next plots. However let us recall that the likelihood used here (and leading to the colored regions of Fig. (4)) is not realistic as the correlations among the Higgs channels should not be neglected. We thus do not recommend the use of this likelihood.

Flat prior
From now on we consider the more realistic likelihoods obtained in Section 7.2. These likelihoods contain all the correlations between Higgs channels induced by the theoretical uncertainties. First, we consider the configuration with two independent nuisance parameters (see Eq. (7.2) and Eq. (7.4)). The Bayesian marginalisation over these two nuisance parameters leads to the analytical likelihood (7.12) for flat final priors. Applying the standard Bayesian procedure, described in Section 2, we find the best-fit regions of Fig. (5)[left]. By comparing the colored plots in Fig. (4) and Fig. (5)[left], one observes clearly a shift of the best-fit regions. This shift originates from the theoretical correlations that are taken into account in Fig. (5)[left]. This shift occurs because the relaxation of the tensions between the individual signal strength measurements (see discussion in the introduction of Section 7.4) is different in the correlated case and in the "de-correlated" one. We emphasize that this shift is a consequence of taking into account the theoretical correlations. Indeed we will see in next subsection that the same effect occurs for a different prior shape. Concerning the region size, a slight increase occurs relatively to Fig. (4). This comparison can be done by looking at the reference case (dashed contours) without theoretical errors at all, which is once more superimposed on Fig. (5)[left]. The plot on the right hand side of Fig. (5) is the same as the left plot but for the second correlation configuration, involving a single nuisance parameter (discussed in Eq. (7.3) and Eq. (7.5)). The effect of the theoretical correlations (relatively to Fig. (4)) appears to be softer than for the left plot: the shift is smaller. This difference between the two colored regions of Fig. (5) makes clear that the theoretical correlations have an important impact on the fits, and should thus be carefully taken into account.
As described below Eq. (7.3), the most realistic correlation configuration is most probably an intermediate configuration between those adopted in the two plots of Fig. (5). We thus conclude that, with the statistical treatment adopted here, the SM prediction remains in a good agreement (1σ level) with the 8 TeV LHC Higgs data, even once realistic theoretical correlations are taken into account.    (7.19)). The 68%, 95% and 99% credible domains are indicated respectively by the green, yellow and grey areas.

Gaussian prior
appears that there is no substantial difference (neither in location, size nor shape of the best-fit regions) between these two figures. This illustrates the mild impact of the choice of the shape for the prior of the theoretical uncertainties. We conclude that, with the present statistical uncertainties on Higgs data, the recurring question of the exact shape of the prior, 48 in particular for the errors due to truncated perturbative expansions in QCD, is nearly irrelevant.
However we should stress that this insensitivity to the prior shape occurs because the experimental uncertainties of the current data are typically larger or of the same order as the theoretical ones. This situation is expected to change with the upcoming LHC runs, as the statistical uncertainties will decrease with the integrated luminosity.

The nuisance parameters favoured by the data
Let us now consider the posterior distribution for the theoretical uncertainties themselves, instead of the posterior for the parameters of interest. Here we shall take the priors associated with the theoretical uncertainties (π σ X ) as flat and with an infinite range. For such choice of prior, the information of the posterior is fully contained in the likelihood (second line in Eq. (7.19)). The interest of this data-dominated posterior is that it allows us to study exclusively the information that the sole Higgs data provide about the theoretical uncertainties, ∆ µ i . We first consider the case with a single nuisance parameter δ ggF (i.e. the fully correlated case), given in Eq. (7.3), and we present in Fig. (7) the data-dominated posterior for δ ggF , 48 Including the details of the form at the boundaries in case e.g. of a flat distribution.
This posterior is obtained by integrating the likelihood of Eq. (6.32) (with π µ given by Eq. (7.4)) over all δ's but one, chosen to be δ ggF , and marginalising with respect to the c V , c f parameters with π(c V , c f ) ∝ 1. It appears in Fig. (7) that the posterior for δ ggF is centred on δ ggF −1. 49 This means that for each signal strength, the data typically favour a value falling at ±1σ (i.e. at ±∆ µ i ) from the nominal value µ ex i . In other words, for the correlation configuration of Eq. (7.3) the Higgs data provide a non-trivial indication that the magnitudes of the theoretical errors are reasonably well estimated. Indeed, the theoretical estimations predict the µ ex i to lie typically within the 1σ interval ±∆ µ i . This compatibility suggests that the ∆ µ i uncertainties, whose estimations rely on quite ad hoc QCD scale variations and on the arbitrariness in the choice of PDF sets, are nevertheless quite robust. On the other hand, one also notices in Fig. (7) that the credible intervals for p(δ ggF |µ ex i ) go beyond −1. This could be taken as an argument for slightly increasing the overall magnitude of the theoretical uncertainties (see next subsection).
The correlation configuration with two nuisance parameters, given by Eq. (7.2), leads to larger preferred values for the nuisance parameters δ ggF −2, δ VBF −5. We interpret these very large values as the fact that neglecting totally the correlation between the two nuisance parameters is an unrealistic hypothesis (as already described in Section 7.1). As a matter of fact, if one restored the usual prior for the δ's (i.e. a prior with unit variance, V [δ] = 1), a hypothesis testing would show that the data favour the correlation configuration of Eq. (7.3) with respect to the configuration of Eq. (7.2).

More conservative theoretical errors
Throughout this paper, we have been observing that, among the various origins of theoretical uncertainty involved in the Higgs fit, some are of a nature (see Section 6.1 -6.5) which renders difficult the exact determination of the associated 1σ interval. These are the truncation of the perturbative expansion for the QCD calculation of Higgs rates translated into an arbitrary error range for the renormalisation/factorisation scale µ = µ R = µ F (affecting the production and decay amplitudes as well as the α s coupling constant), the choices made (on the statistical method, the number of free parameters. . . ) in the different PDF sets, and finally the m b renormalisation scheme and EFT assumptions for the ggF mechanism. These considerations can be taken as a motivation to adopt more conservative theoretical errors.
Moreover, we have seen in the previous subsection (see Fig. (7)) that the data tend to prefer theoretical uncertainties that are somewhat larger than the combined 1σ width ∆ µ i obtained in Section 6, see e.g. the 68% C.L. interval in Fig. (7). Taking seriously this fact, it makes sense to perform the fits with a slight overall increase of the uncertainties. We suggest a rescaling ∆ µ i → 1.5 ∆ µ i (7.20) 49 For comparison, the maximum of p(δggF, cV = c f = 1|µ ex i ) is reached for δggF −0.7. as a reasonable estimation for a most conservative choice of theoretical uncertainties. Notice that the rescaling of Eq. (7.20) is equivalent (c.f. Eq. (6.32)) to rescale by 1.5 the axis on Fig. (7). For example, the point δ ggF = −1 becomes δ ggF = −1.5. The best-fit regions with ∆ µ i × 1.5 are shown in Fig. (8) for the two correlation configurations and considering the flat prior case (Eq. (7.12)), keeping in mind that with the current Higgs data, the final prior shape does not affect significantly those best-fit domains. The impact of the increase of the theoretical uncertainties (Eq. (7.20)) on the fit of the current Higgs data can be seen by comparing Fig. (5) and Fig. (8). It turns out that the shift of the preferred regions with respect to the case without theoretical errors gets slightly accentuated. In the correlation configuration of Eq. (7.4), i.e. with two independent δ X , it even appears (see Fig. (8)[left]) that the SM point moves just outside the 68% C.L. region.
The increase of this shift can be understood by recalling that rescaling the ∆ µ i is equivalent to increase the width of the δ X prior. It is then clear that more possibilities are opened for the preferred values of δ X . It turns out that these preferred values move further away from zero, which induces a more pronounced shift of the best-fit regions.
Even though these effects are not statistically significant for the current Higgs data, we stress that the impact of the theoretical errors will increase while more data will be accumulated at the LHC. The ambiguity existing in the theoretical errors estimation deserves thus to be taken into account. For future LHC phenomenological studies, we suggest to take into account, in the same way as proposed in this subsection, the impact on the fits from the lack of knowledge in theoretical errors.

Biasing the Higgs likelihood
The principle of bias has been presented in Section 2.2.2. To have a self-consistent section, we recall here the basics of a "biasing" procedure. We distinguish two realisations of the bias principle: the extremal bias and the envelope method.
The method of extremal biasing consists in drawing the best-fit regions for the parameters of interest for extreme fixed values of the theoretical errors. By the word 'extreme', we mean that we set the nuisance parameters δ at ±1 (corresponding to one-standard deviations with our conventions) in order to obtain a strong impact on the fit. In our Higgs fit, the theoretical uncertainties affect the signal strengths µ ex i , which in turn modify the preferred value of µ th i (c f , c V ) and thus the best-fit regions of c V , c f . Note that the choice of extreme values δ ± 1 can be seen as natural, and for that reason will be used in our numerical results, but strictly speaking remains only a choice with a certain degree of arbitrariness.
The envelope method corresponds formally to the continuous version of this extremal biasing. Loosely speaking, this is what one obtains if one does the fit for each fixed value of the nuisance parameters between the extreme values δ = ±1. One expects typically a deformed contour somehow interpolating between the regions of extremal biasing. For a more formal and unified description of these biasing methods, see Section 2.2.2.
What are the motivations for choosing the marginalisation or the bias approaches (extremal bias or envelope method) in the Higgs fits? The lack of knowledge on the shape of the prior associated to the main QCD uncertainties discussed in Section 6.3 encourages one to apply a bias method, which does not rely on the prior shape -in contrast with the marginalisation. Besides, the bias is more conservative. Indeed, while in the marginalisation the best-fit domain corresponds roughly to nuisance parameters centered around a preferred δ X value, in the bias methods δ X rather spans by construction its [−1, 1] interval without favouring any value. Hence, generally speaking (and this is the case for the Higgs fit), the best-fit regions in the space of the parameters of interest obtained through the bias methods are wider than the ones from marginalising. In addition, the envelope method allows one to see at a glance the whole best-fit domain in the c V − c f plane spanned by varying the nuisance parameters inside their entire [−1, 1] intervals. The price to pay here is maybe a heavier technical approach than in the marginalisation procedure: compare the marginalisation definitions in Eqs. (2.9),(2.10) with the biasing definitions in Eqs. (2.14),(2.17) (see for example Eq. (6.32) and Eq. (8.5) for the application to the Higgs likelihood). It is clear that more operations (either integrations or maximisations) are needed for the envelope method.

Combining the uncertainties
The starting point is the likelihood (5.1), and then (5.11). Applying the Eqs.
which is a new compact notation comparable to Eq. (6.19), we obtain the likelihood depending on a unique nuisance parameter, δ b , 2) relying on the combined error, or, 1}. This equation dictates to use the sum of the posteriors at δ b = −1 and δ b = 1, with each posterior separately normalised by its integral over the c V − c f plane.

Envelope method
The envelope method corresponds to letting vary continuously δ b within [−1, 1], i.e. this is the continuous version of the extremal bias, as discussed in Section 2.2.2. The corresponding likelihood isL .
This likelihood is derived by applying Eq. (2.14) with the likelihood L bias (c V , c f , δ b ) from Eq. (8.2). The best-fit regions are obtained through the standard procedure of Eqs. (2.3)-(2.4)-(2.5). Again, we take the priors for the parameter of interest to be flat, π(c V,f ) ∝ 1.

Extremal bias
For the extremal bias in the frequentist framework (see Section 2.2.2), one uses again the likelihood L bias (δ b ) (Eq. (8.2)), with δ b fixed at the two extreme values δ b = ±1. In practice, in order to draw the best-fit regions in c V −c f , one can define a χ-squared function difference as follows from Eq. (2.6). Remind that χ 2 (ĉ f ,ĉ V , δ b ) stands for the minimum of χ 2 with respect to c f , c V for a given δ b . The best-fit regions are obtained by drawing the contour levels of ∆χ 2 set at the values given in Eq. (2.8). Once more, the prior for the parameters of interest entering in Eq. (2.6) are taken flat, π(c V,f ) ∝ 1. If the two extreme regions overlap, the same remark as in the Bayesian case holds. To display consistently the two regions together, one has to follow the rigorous definition of Eq. (2.17), using a discrete domain D = {−1, 1}. This equation dictates to use the minimum of the two ∆χ 2 , i.e. min δ b ∈{−1,1} [∆χ 2 (c f , c V , δ b )].

Envelope method
For the envelope method in the frequentist case, one can proceed with the χ 2 introduced in Eq. (8.6) and definē according to the general definition of Eq. (2.17). This equation is the frequentist analog of Eq. (8.5). In order to draw the best-fit regions in the c V − c f plane, one should then define The best-fit regions are obtained by drawing the contour levels of ∆χ 2 set at the values given in Eq. (2.8). Again, the prior for the c V , c f parameters entering in Eq. (2.6) are taken flat, π(c V,f ) ∝ 1. Let us finally recall the parallel between Eq. (8.5) and Eq. (8.7). As first explained in Section 2.2.2, the subtracted term in Eq. (8.7) is the frequentist analogy of the ratio over dc f dc V L bias (c f , c V , δ b ) in Eq. (8.5). In both cases, the effect of this term is to remove the contribution of δ b to goodness-of-fit (which avoids favouring specific values of δ b ). Both formulas are analog up to exchanging integration over δ b with minimisation over δ b . The fact that the integration/minimisation over δ b is performed on the whole range [−1, 1], rather than on the discrete domain {−1, 1}, leads to an envelope in the c f − c V plane, instead of two distinct domains as in the extremal bias.

Numerical results
In this section, we apply both the frequentist and Bayesian versions of the bias method to the Higgs likelihood. We stress that the Higgs likelihood L bias (δ b ) is exactly the same in the two statistical frameworks, so that the discrepancies observed among the plots originate solely from the different statistical treatments. These two treatments differ in their definition of the best-fit regions (see Section 2.1) and their realisation of the bias principle (see Eqs. (2.14), (2.17)).

Extremal bias
In Fig. (9), we present the best-fit regions obtained through the Bayesian and frequentist bias methods, respectively described in Sections 8.  Fig. (9) correspond to the two correlation configurations surrounding the case with realistic correlations. It turns out that the best-fit regions obtained in these two extreme correlation configurations have only mild differences. Now, compare the two upper plots and lower plots of Fig. (9), corresponding respectively to the frequentist and Bayesian treatments. A small difference appears at the junction of the two set of regions, coming from the different realisation of the bias principle in the two statistical frameworks. Besides, the frequentist best-fit regions are slightly larger than the Bayesian ones, due to the non-equivalent definitions of the Bayesian and frequentist contours. Overall, there is a strong resemblance between the Bayesian and frequentist results. This reflects the weak impact of choosing the Bayesian or frequentist procedure for the extremal bias.
Let us now compare the lower plots of Fig. (9) with the previous Bayesian marginalisation plots obtained in Fig. (5) -considering of course respectively the two correlation configurations used in the left and right plots. One can clearly see that the best-fit regions 50 obtained from the extremal bias are larger than the ones obtained through marginalisation. This is because the regions in Fig. (5), derived by marginalising, correspond somehow to fix the nuisance parameters to their values favoured by the fit. For the present Higgs fits, it turns out that these preferred values are close to δ ≈ −1. Hence, the regions from the extremal bias (Fig. (9)) being obtained for δ b = ±1 (lower left set is for δ b = −1 51 ), they clearly cover more space in the c V − c f plane than the domains in Fig. (5).

Envelope method
The four plots of Fig. (10) illustrate the Bayesian and frequentist envelope methods performed accordingly to Sections 8.2.2 and 8.3.2. Again, both correlation configurations, giving rise to the combined errors of Eq. (8.3)-(8.4), are studied numerically. The two upper and lower plots of Fig. (10) differ due to the direct envelope method being not equivalent within the Bayesian and frequentist cases. The sets of frequentist envelopes represent the best-fit areas that would be obtained by superimposing the best-fit regions of the extremal bias, but for δ b spanning continuously the interval [−1, 1]. This correspondence between the envelope method and extremal bias appears clearly when one realises (c.f. end of Section 2.2.2) that the former is based on the Eqs. (8.7)-(8.8) while the latter can be obtained through the same equations just with a minimisation over the discrete domain δ b ∈ D = {−1, 1} in Eq. (8.7), instead of the continuous range [−1, 1]. The correspondence is visible when comparing the envelopes with the extreme sets of best-fit domains at δ b = ±1, obtained previously from the frequentist bias method and also superimposed on upper plots of Fig. (10), as dashed contours: these contours draw exactly the extreme limits of the envelopes. The two sets of Bayesian envelopes obtained in the two lower plots of Fig. (10) represent less conservative regions with respect to the frequentist envelope. Besides, the envelopes of these plots cover smaller regions than the best-fit domains that would be obtained by superimposing the best-fit regions of the extremal bias, but for δ b spanning continuously the interval [−1, 1]. This appears clearly when comparing those envelopes to the extreme sets of best-fit regions at, δ b = ±1, obtained previously from the Bayesian bias method (once more superimposed on the lower plots of Fig. (10), as dashed contours).
Finally, we mention that the SM point belongs to all the 68% C.L. regions of Fig. (10). At this level, we can illustrate one of the interests of the bias. Let us consider an hypothetical but plausible situation. For example, suppose that with future LHC data, the SM point would fall outside the 3σ region obtained by marginalising. Such a discrepancy could be interpreted either as an indirect effect of physics underlying the SM on the Higgs sector, or as a shift of the best-fit regions induced by values of the nuisance parameters favoured statistically by the fit. This shift induced by the nuisance parameters would come from the fact that the nuisance parameters and the parameters of interest are determined simultaneously. In contrast, in the envelope method, a SM prediction falling beyond the 3σ region would indicate the presence of new physics without any alternative explanation relying on the statistical treatment (the entire interval of the nuisance parameters being 51 The dependence of the best-fit region location on the nuisance parameter is induced by the dependence of the likelihood (8.2) on, µ ex  The dashed grey contours illustrate the best-fit regions at 68% C.L., 95% C.L. and 99% C.L., obtained in Fig. (9). The SM prediction is shown by the red point.

Conclusions
The main goal of this analysis was to work out a consistent statistical treatment of the theoretical uncertainties in the fits of the Higgs boson rates. We have analysed in a unified formalism both the Bayesian and frequentist approaches to theoretical uncertainties. We systematically analysed how to perform error combinations in a given statistical context and we have introduced a framework to use the bias principle on firm ground. This analysis has been the opportunity to update the Higgs rate fit based on the latest LHC data at 7 and 8 TeV. In the case of Bayesian marginalisation, we have found that the SM prediction for the Higgs couplings still falls into the 68% C.L. region of the c V − c f plane. Bayesian marginalisation benefits from well-defined distributions for the nuisance parameters and from an easier convolution of these error distributions compared to frequentist marginalisation.
We have reviewed all the fundamental sources of the individual theoretical errors involved in the SM Higgs cross sections and branching ratios. Then those errors have been combined in a careful 'step-by-step' approach following the Bayesian rules. In this task of combining a significant number of uncertainties (various Higgs production modes, decay channels. . . ), we were helped by the leading moment approximation -which has been deduced from considerations on the moment-generating function.
This has allowed us to show that the prior of the total uncertainty resulting from the combination of all the theoretical errors (using flat priors for the unknown ones) converges to a nearly Gaussian shape. Besides, it also came out from the numerical results that the precise form of this final theoretical prior is not crucial with respect to the determination of the best-fit regions. This conclusion holds only for the present data, which still have large experimental errors with respect to the theoretical ones.
In contrast, our analysis has shown that the correlations of the theoretical uncertainties among the Higgs detection channels induce a significant shift of the best-fit domains in the space of the parameters of interest. These correlations appear thus to be an unavoidable ingredient of the fits. The Higgs fits were performed in two extreme configurations of theoretical correlations between the various detection channels. The most realistic correlation setup is an intermediate configuration between those two. Such an approach is thus conservative. Besides, considering characteristic configurations has allowed us to derive simple analytic expressions for the marginal likelihood functions.
For future Higgs fits, given the ambiguities inherent to the estimation of the theoretical error magnitudes, we recommend to present an additional analysis with 1σ errors enhanced by a typical factor of 1.5 as a conservative benchmark. Such a factor is consistent with the 1σ theoretical errors preferred by the data. Of course the present degree of arbitrariness in the theoretical error magnitudes could be improved for instance with future higher order QCD calculations or new methods to determine the PDFs.
Finally, we have provided a rigorous statistical framework for the bias principle, which constitutes an alternative to marginalisation. This framework has lead us to define two complementary bias treatments: the extremal bias and the envelope method. The bias principle is more conservative than marginalisation by construction, and does not depend on the shape of the priors of the nuisance parameters, which are not always known. Therefore, a reasonable advice is to apply both the marginalisation and bias methods to the Higgs data. Using the envelope method, we find that the SM prediction belongs to the 68% C.L. region of the c V − c f plane.
Going one order further in the expansion leads to keep with N n = n!/(2(n − 2)!). At that point, the corrections to all moments m C n should in principle be kept.
We then take a second step in our approximation, by considering that the amount of information relevant for our problem somehow decreases with the order of the moment. As a consequence, the corrections to the first moments are the more relevant. Keeping the next-to-leading corrections up to order p, our approximation scheme thus reads if p < n . (A.7) In particular, truncating the corrections at p = 2 amounts to take into account only the correction to the variance, The other details of the shape remaining unperturbed, it follows that Here δ (n) is the n-th derivative of the Dirac distribution. It comes from the Laplace transform of the t n term of the moment-generating function (see also Ref. [59]). These δ (n) should be understood as the leading functional deformation to π A . In practice, it appears that keeping only the first leading moment is appropriate when π A is a one-parameter pdf. In that case, the parameter characterising π C is identified through the combination of variances. For example, taking the normal distribution π A = N (0, σ 2 A ) gives σ 2 C = σ 2 A + ∆ 2 B and π C = N (0, σ 2 C ). 52 The approach above also extends to correlated variables. The difference with respect to the uncorrelated case is that the moment-generating functions do not factorise, as δ A , δ B now share common moments. For example, truncating the corrections at p = 2 gives the correction where ρ (= m AB 1 ) is the covariance of (δ A , δ B ). In the limit of full correlation, one has ρ = 1, so that ∆ 2 C = (∆ A + ∆ B ) 2 . Note that when ρ > ∆ B /∆ A in Eq. (A.10), the contribution from the correlation term becomes larger than the contribution from the square term ∆ 2 B . 52 It is worth noticing that in the Gaussian case, this identification reproduces exactly the correction to the m C n at any order. This is not true for other distributions.
Finally, the leading moment approximation also extends to the case of several linear combinations of variables. Here we consider the case with two linear combinations of two variables δ A , δ B with correlation ρ. The combinations are defined as The variances are found to be like in the one-combination case described above. In the case ∆ A 1 ∆ B 1 , ∆ A 2 ∆ B 2 , the correlation coefficient ρ 12 between δ C 1 and δ C 2 reads .