Taming systematic uncertainties at the LHC with the central limit theorem

We study the simplifications occurring in any likelihood function in the presence of a large number of small systematic uncertainties. We find that the marginalisation of these uncertainties can be done analytically by means of second-order error propagation, error combination, and the Lyapunov central limit theorem, under mild approximations which are typically satisfied for LHC likelihoods. The outcomes of this analysis are i) a very light treatment of systematic uncertainties and ii) a convenient way of reporting the main effects of systematic uncertainties, such as the detector effects occurring in LHC measurements.


Introduction
The search for physics beyond the Standard Model requires a thorough statistical investigation of the data collected at the LHC. The analyses of LHC samples are often plagued by a number of systematic uncertainties, such as detector resolutions or the imperfect knowledge of physical constants, that should be treated with care in order to obtain reliable conclusions on a putative signal. A consistent implementation of these systematic uncertainties in the likelihood function is done either by summing over all the realisations of the nuisance parameters or by maximising the likelihood with respect to them, respectively within the Bayesian and frequentist frameworks.
While the operations of Bayesian and frequentist marginalisation are conceptually clear, one can distinguish two main issues related to their practical implementation. First, the correct treatment of systematic uncertainties is technically challenging to perform, because it necessitates many multidimensional integrations or maximisations. For example, integrating the O(4000) nuisance parameters of the Higgs likelihood is very difficult, even with the computing power accessible to the LHC Collaborations. Second, transmitting or making public the information of the systematic uncertainties can also be technically challenging. The study we carry out will end up providing new insights into both of these topics.
In the context of the LHC, communicating the full experimental likelihoods via the RooFit/RooStats framework [1,2] has been suggested in [3,4]. The presentation method we will propose is somewhat complementary to the proposal of [3,4], in that it is technically straightforward to carry out and leads to a fairly human-readable summary of the systematic uncertainties. Also, the goal of presenting LHC results decoupled from systematic uncertainties has been pursued in [5] in the context of theoretical errors on Higgs cross-sections. While the objectives of [5] are partly similar to the ones of this work, the results obtained are different. In particular, the general marginal likelihood that we will display is derived from first principles, and no discussion about reparametrisation templates is required.
Throughout this work we adopt the hypothesis of a large number of small independent sources of uncertainty. Here "small" qualifies the relative magnitude of the systematic uncertainties. We stress that this is an intrinsic property that does not depend on the magnitude of the statistical uncertainties, i.e. on the size of the data sample. The validity of our results will thus not depend on the amount of observed data.
The assumption of small relative magnitude is fairly weak as, to our knowledge, most LHC systematic uncertainties have a relative magnitude which is much lower than 100%. The assumption of independence follows naturally from the process of describing systematic uncertainty as correctly as possible, as the more one delves into the origin of uncertainty, the more its description becomes a set of elementary sources unrelated to each other. Independence will have the crucial implication that the combination of the elementary uncertainties is mostly described by its first and second moments.
Our computation consists of two steps, error propagation and error combination, that are laid down in Sec. 2. These steps already lead to a substantially simplified likelihood. In addition, if the relative magnitude of the combined uncertainties is somewhat small with respect to one, we show in Sec. 3 that marginalisation can be done analytically, providing a general, explicit formulation of the marginal likelihood. The cases of signal+background and differential distributions are treated. Finally, a signal strength toy-analysis illustrating the validity of our calculation is displayed in Sec. 4.

Taming a large number of small uncertainties
We consider an event-counting likelihood where $\hat n$ is the observed event number and $n$ is an expected (i.e. theoretical) event number. Assume $n$ depends on parameters of interest $\theta$ and on nuisance parameters $\delta$. The likelihood is then defined as
$$L(\theta,\delta) \equiv \Pr\big(\hat n \,\big|\, n(\theta,\delta),\, \theta,\, \delta\big). \tag{1}$$
All the variables written in this likelihood ($\theta$, $\delta$, $\hat n$, $n$) should be understood as vectors, whose labels and dimensions will be made explicit below. Without much loss of generality, 2 one further assumes that the various measurements of $n$ are statistically independent. In the following we will denote by $N$ the number of independent measurements, i.e. $I \in [1 \ldots N]$, and by $p$ the number of nuisance parameters. The likelihood of our focus has thus the form
$$L(\theta,\delta) = \prod_{I=1}^{N} \Pr\big(\hat n_I \,\big|\, n_I(\theta,\delta),\, \theta,\, \delta\big).$$
The approach laid down in the present section applies in fact to any likelihood that can be expressed as a function of the expected event numbers, $L[n_I(\theta,\delta)]$. However, the subsequent analytical marginalisation presented in Sec. 3 would not hold in general, as the approximate likelihood would not be Poissonian or Gaussian in $\delta$.

Parametrisation
For a systematic uncertainty spanning a given domain, there exists in principle an infinite number of parametrisations, which are all equivalent under a suitable redefinition of the distribution of the uncertainty. Among all possible parametrisations, it is useful to choose one that makes the relative magnitude of the uncertainty apparent. We define a standardised representation as follows.
2 Most LHC event selections are independent from each other. A notable exception occurs when various overlapping selections of the same dataset are reported, with no information about more elementary, mutually exclusive selections. The correct statistics describing the set of overlapping selections is a multivariate Poisson distribution. This has been described in detail in [9] in the context of diboson ATLAS results [10].
For any quantity $A$ subject to uncertainty that can take both signs, one simply defines
$$A = A_0\,(1 + \Delta\,\delta), \qquad E[\delta] = 0, \quad V[\delta] = 1, \tag{4}$$
so that $\Delta$ corresponds to the relative magnitude of the uncertainty, i.e. $V[A] = A_0^2\,\Delta^2$. Our working hypothesis of small relative magnitude translates as $\Delta \ll 1$. We also need to consider quantities that can only be positive, in the first place the expected event number $n$. Throughout this paper one defines the standardised form for the error on a positive quantity as
$$n = n_0\, e^{\Delta\,\delta}, \tag{5}$$
where $E[\delta] = 0$, $V[\delta] = 1$ and $n_0$ is the nominal value of $n$ in the absence of uncertainty. Below it will become clear that the expansion in $\Delta$ has to be done up to second order, so that Eq. (5) can be equivalently taken to be
$$n = n_0\left(1 + \Delta\,\delta + \tfrac{1}{2}\Delta^2\delta^2\right). \tag{6}$$
Compared to the linear form Eq. (4), one can see that the extra quadratic term ensures positivity of $n$. It also induces a small, positive shift of the mean value of $n$, as $E(n) = n_0(1 + \Delta^2/2)$, or similarly $E(n) = n_0\, e^{\Delta^2/2}$ without the expansion for a Gaussian $\delta$. In the latter case the variance is $V(n) = n_0^2\,(e^{2\Delta^2} - e^{\Delta^2}) = n_0^2\,\Delta^2\,(1 + O(\Delta^2))$.
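The moments of the standardised positive form can be checked numerically. The sketch below assumes a Gaussian nuisance parameter and purely illustrative values ($n_0 = 100$, $\Delta = 10\%$) that are not taken from the text; the function name `lognormal_moments` is hypothetical. It evaluates the mean and variance of $n = n_0 e^{\Delta\delta}$ by Gauss-Hermite quadrature and compares them with the closed forms.

```python
import numpy as np

def lognormal_moments(n0, delta_rel, order=40):
    """Mean and variance of n = n0 * exp(delta_rel * d) for d ~ N(0, 1),
    computed with Gauss-Hermite quadrature (probabilists' convention)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(order)
    weights = weights / weights.sum()      # normalise to a probability measure
    n = n0 * np.exp(delta_rel * nodes)
    mean = np.sum(weights * n)
    var = np.sum(weights * (n - mean) ** 2)
    return mean, var

# Illustrative values only (assumptions, not from the paper).
n0, delta_rel = 100.0, 0.1
mean, var = lognormal_moments(n0, delta_rel)

mean_closed = n0 * np.exp(delta_rel**2 / 2)
var_closed = n0**2 * (np.exp(2 * delta_rel**2) - np.exp(delta_rel**2))

print("mean:", mean, "closed form:", mean_closed)
print("variance:", var, "closed form:", var_closed)
print("leading-order variance n0^2 Delta^2:", n0**2 * delta_rel**2)
```

For these values the leading-order variance $n_0^2\Delta^2$ differs from the full one at the percent level, which gives a feeling for what "small relative magnitude" means in practice.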

Error propagation
As a first step, we want to propagate the systematic uncertainties at the level of the event numbers. For an event number $n$ depending on a quantity $Q$ subject to uncertainty, we have
$$n \equiv n(Q), \qquad Q = Q_0\,(1 + \Delta_Q\,\delta).$$
The propagation amounts to performing a Taylor expansion with respect to $\Delta_Q$. This expansion should be truncated appropriately to retain the leading effects of the systematic uncertainties in the likelihood. For now we take for granted that the expansion should be truncated above second order. This order will be justified further below.
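In a one-parameter case, second order propagation leads to
$$n = n_0\,\exp\!\left(\Delta_1\,\delta + \Delta_2\,\delta^2\right), \qquad \Delta_1 = \Delta_Q\,Q_0\,\frac{\partial \log n}{\partial Q}\bigg|_{Q_0}, \qquad \Delta_2 = \frac{1}{2}\,\Delta_Q^2\,Q_0^2\,\frac{\partial^2 \log n}{\partial Q^2}\bigg|_{Q_0},$$
with $n_0 = n(Q_0)$. This is most easily obtained by expanding $\log n$. Clearly, the validity of this expansion relies on neglecting higher powers of $\Delta_Q$ times the appropriate derivative of $\log n$. As long as $n$ is well-behaved, which should be checked in practice, this expansion is valid for uncertainties that have a small relative magnitude, i.e. $\Delta_Q \ll 1$.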
For example, for elementary systematic uncertainties that do not exceed ∆ ∼ 10%, keeping the expansion up to second degree implies that the neglected higher order terms are O(0.1%).
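The size of the neglected terms can be illustrated with a short sketch. It assumes a hypothetical dependence $n(Q) = 50\,Q^2$, a nominal value $Q_0 = 1$ and a 10% relative uncertainty on $Q$; none of these are taken from the text, and the helper name `propagate` is illustrative. The coefficients of the second-order expansion of $\log n$ are obtained by finite differences and the truncated form is compared with the exact one.

```python
import numpy as np

def propagate(log_n, q0, delta_q, h=1e-4):
    """Second-order propagation of a relative uncertainty delta_q on Q.

    Returns (n0, d1, d2) such that n ~ n0 * exp(d1 * delta + d2 * delta**2),
    using central finite differences of log n in the relative shift of Q.
    """
    f0 = log_n(q0)
    fp = log_n(q0 * (1 + h))
    fm = log_n(q0 * (1 - h))
    first = (fp - fm) / (2 * h)           # Q0 * d(log n)/dQ at Q0
    second = (fp - 2 * f0 + fm) / h**2    # Q0^2 * d^2(log n)/dQ^2 at Q0
    return np.exp(f0), delta_q * first, 0.5 * delta_q**2 * second

def n_of_q(q):
    """Hypothetical expected event number (an assumption for illustration)."""
    return 50.0 * q**2

n0, d1, d2 = propagate(lambda q: np.log(n_of_q(q)), q0=1.0, delta_q=0.1)
print("n0, Delta_1, Delta_2 =", n0, d1, d2)

for delta in (-1.0, 1.0):
    exact = n_of_q(1.0 * (1 + 0.1 * delta))
    approx = n0 * np.exp(d1 * delta + d2 * delta**2)
    print(delta, exact, approx, abs(approx / exact - 1))
```

In this example the relative error of the truncated form at $\delta = \pm 1$ stays below the per-mille level, in line with the estimate quoted above.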
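In the case where various uncertainties $Q_{1\ldots p}$ are propagated into $n$, it is convenient to use a vector notation for $\delta$ and $\Delta$. Assuming $p$ nuisance parameters, one defines
$$\delta = (\delta_1, \ldots, \delta_p), \qquad \Delta_Q = (\Delta_{Q_1}, \ldots, \Delta_{Q_p}).$$
The relative uncertainties propagated to $n_I$ are then written as
$$(\Delta_{1,I})_i = \Delta_{Q_i}\,Q_{0,i}\,\frac{\partial \log n_I}{\partial Q_i}, \qquad (\Delta_{2,I})_{ij} = \frac{1}{2}\,\Delta_{Q_i}\Delta_{Q_j}\,Q_{0,i}\,Q_{0,j}\,\frac{\partial^2 \log n_I}{\partial Q_i\,\partial Q_j},$$
with all derivatives evaluated at the nominal values $Q_0$. After this step of error propagation, the likelihood takes the form
$$L(\theta,\delta) \equiv \prod_I \Pr\Big(\hat n_I \,\Big|\, n_{0,I}\,\exp\!\big(\Delta_{1,I}\cdot\delta + \delta^t\cdot\Delta_{2,I}\cdot\delta\big),\, \theta,\, \delta\Big). \tag{13}$$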
All the $n_{0,I}$, $\Delta_{1,I}$, $\Delta_{2,I}$ depend in principle on the parameters of interest $\theta$.
Details about the order of truncation. For the sake of determining the truncation order, it is enough to consider a one-parameter case and take limits. Consider a likelihood $\Pr\big(\hat n \,|\, n_0 \exp(\Delta_1\delta + \Delta_2\delta^2)\big)$ with $\Delta_2 = O(\Delta_1^2)$. We first study the limit of an infinite amount of data. In that case, the likelihood tends to a Dirac peak, 6
$$\delta_D\big(\hat n - n_0 \exp(\Delta_1\delta + \Delta_2\delta^2)\big),$$
where $\delta_D$ denotes the Dirac distribution.
6 More precisely, the Dirac limit can be taken when the relative magnitude of the statistical uncertainty, given by $1/\sqrt{n_0}$ in the Poisson case, is small with respect to the inverse Fisher information of all the priors present in the problem.
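As $\delta$ is $O(1)$ by definition, it follows that $\log(\hat n/n_0) = O(\Delta_1)$. Using this fact, one can then verify that the marginal likelihood satisfies
$$\tilde L \propto \pi\!\left(\frac{\log(\hat n/n_0)}{\Delta_1}\right)\big(1 + O(\Delta_1)\big).$$
We conclude that the leading effect of the systematic uncertainty comes from $\Delta_1$, and thus appears at first order in the expansion.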
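Second, we study the case where the amount of data is small enough that the likelihood itself can be expanded with respect to $n_0\Delta_1$, $n_0\Delta_2$. Denoting by primes the derivatives of $\Pr(\hat n|n)$ with respect to $n$ evaluated at $n_0$, one obtains
$$L(\delta) = \Pr(\hat n|n_0) + n_0\,\Delta_1\,\delta\,\Pr{}' + \delta^2\left[\,n_0\Big(\Delta_2 + \tfrac{1}{2}\Delta_1^2\Big)\Pr{}' + \tfrac{1}{2}\,n_0^2\,\Delta_1^2\,\Pr{}''\,\right] + O(\Delta_1^3).$$
One then marginalises this likelihood with respect to the nuisance parameter $\delta$ using an arbitrary prior, $\int d\delta\, L(\delta)\,\pi(\delta)$. By definition, $E(\delta) = 0$ (see Eq. (4)), so the linear term vanishes. The leading effect of the uncertainties thus comes from the second order term in the expansion. This implies that the expansion has to be done at quadratic order from the beginning, so that the $\Delta_2$ term should not be neglected.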
As the truncation has to be done above second order in one of the limiting cases, it is convenient to use this order in all cases to ensure that all leading effects of systematic uncertainties are consistently taken into account.

Error combination
The previous step of propagation opens up the possibility of combining the nuisance parameters. We define nuisance parameters $\bar\delta_I$, associated to every measurement $I$, so that
$$n_I = \bar n_{0,I}\,\exp\!\big(\Delta_I\,\bar\delta_I\big) \equiv n_{0,I}\,\exp\!\big(\Delta_{1,I}\cdot\delta + \delta^t\cdot\Delta_{2,I}\cdot\delta\big). \tag{18}$$
These combined nuisance parameters are in general correlated with each other; we denote their joint distribution by $\bar\pi$. The set of equations (18) is the starting point for the combination of uncertainties. 7 The likelihood expressed with respect to the combined nuisance parameters is written as
$$\bar L(\theta,\bar\delta) = \prod_I \Pr\Big(\hat n_I \,\Big|\, \bar n_{0,I}\,\exp\!\big(\Delta_I\,\bar\delta_I\big),\, \theta\Big). \tag{20}$$
Following our conventions, the combined nuisance parameters have to satisfy $E[\bar\delta_I] = 0$, $V[\bar\delta_I] = 1$. The next task is to determine the numbers $\bar n_{0,I}$ and $\Delta_I$. This is obtained by taking the expectation and the variance on both sides of Eq. (18). The central values of the event numbers before and after combination are different because of the nonlinear propagation. It turns out that the diagonal terms of $\Delta_{2,I}$ contribute to the mean value of $n_I$, so that
$$\bar n_{0,I} = n_{0,I}\,\big(1 + \mathrm{tr}(\Delta_{2,I})\big). \tag{21}$$
7 At the level of the likelihood, combination is defined as the variable change
$$\int d^p\delta\; L(\theta,\delta)\,\prod_{i=1}^{p}\pi_i(\delta_i) = \int d^N\bar\delta\; \bar L(\theta,\bar\delta)\,\bar\pi(\bar\delta),$$
where the $\pi_i$ are the priors of the elementary nuisance parameters. This is equivalent to Eq. (18).
The relative magnitudes $\Delta_I$ are obtained by evaluating the variance on the two sides of Eq. (18). One gets
$$\Delta_I^2 = \Delta_{1,I}^t\cdot\Delta_{1,I} + O(\Delta^4), \tag{22}$$
where $O(\Delta^4)$ denotes higher order terms like $\mathrm{tr}(\Delta_{2,I}^t\cdot\Delta_{2,I})$, $(\mathrm{tr}\,\Delta_{2,I})^2$, $\Delta_{1,I}^t\cdot\Delta_{1,I}\,\mathrm{tr}\,\Delta_{2,I}$. One may note the contrast of Eq. (22) with the mean value Eq. (21), where $\Delta_{2,I}$ provides the main correction and cannot be ignored.
The next step is to compute the correlation matrix among the event numbers $n_I$, $n_J$ induced by the systematic uncertainties. At leading order, the correlation matrix is found to be
$$\rho_{IJ} = \frac{\Delta_{1,I}^t\cdot\Delta_{1,J}}{\Delta_I\,\Delta_J}. \tag{23}$$
In the next paragraph it will be made clear that under our working assumptions, this information is enough to describe the entire shape of the distribution $\bar\pi$ of the combined uncertainties.
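The combination step can be summarised in a few lines of code. The sketch below assumes the leading-order relations described in this section (shifted central value from the trace of $\Delta_{2,I}$, combined magnitude from the norm of $\Delta_{1,I}$, correlation from the normalised scalar products of the $\Delta_{1,I}$). The function name `combine` and all numerical inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def combine(n0, d1, d2):
    """Combine p elementary uncertainties into one nuisance parameter per channel.

    n0 : (N,)      nominal expected event numbers
    d1 : (N, p)    first-order relative uncertainties Delta_1
    d2 : (N, p, p) second-order relative uncertainties Delta_2
    Returns the shifted central values, the combined magnitudes and the
    correlation matrix, all at leading order in the Delta expansion.
    """
    n0 = np.asarray(n0, float)
    d1 = np.asarray(d1, float)
    d2 = np.asarray(d2, float)
    n0_bar = n0 * (1.0 + np.trace(d2, axis1=1, axis2=2))
    delta = np.sqrt(np.sum(d1**2, axis=1))
    rho = (d1 @ d1.T) / np.outer(delta, delta)
    return n0_bar, delta, rho

# Hypothetical inputs: N = 2 channels, p = 3 elementary uncertainties.
n0 = [100.0, 50.0]
d1 = [[0.03, 0.04, 0.00],
      [0.00, 0.04, 0.03]]
d2 = [np.diag([0.001, 0.001, 0.001]),
      np.diag([0.000, 0.002, 0.000])]

n0_bar, delta, rho = combine(n0, d1, d2)
print("shifted central values:", n0_bar)
print("combined magnitudes:", delta)
print("correlation matrix:\n", rho)
```

Only the diagonal of each $\Delta_{2,I}$ enters, through the trace, which is why the off-diagonal second derivatives do not need to be reported.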

Shape of the combined prior
Computing the joint distribution $\bar\pi$ of the combined uncertainties may seem at first sight a very challenging task, as there can be in principle a lot of uncertainty sources. The experimental Higgs likelihood, for example, contains O(4000) nuisance parameters, i.e. the vector $\delta$ has dimension O(4000). This means that 4000 convolutions would have to be done for each value of the combined nuisance parameters $\bar\delta_I$. One should however realise that the shape of $\bar\pi$ is determined by its central moments of order higher than two, which all depend only on the $\Delta_{1,I}$ at leading order in the $\Delta$-expansion. In fact, at leading order, the $\Delta_2$ term matters only for the mean value and always gives subleading contributions to higher moments. This can be seen by evaluating the central moments of Eq. (18). One can thus safely neglect the $O(\Delta^2)$ term in the combination $\Delta_I\,\bar\delta_I = \Delta_{1,I}\cdot\delta + O(\Delta^2)$, and make the crucial observation that this quantity is a sum of many independent random variables.
Besides, one notices that all the common distributions for nuisance parameters, such as the uniform, normal and log-normal distributions, possess finite higher moments. This is enough to invoke the Lyapunov central limit theorem (CLT) [12], 10 which can be stated as follows. If there exists an integer $\kappa > 0$ such that
$$\lim_{p\to\infty}\; \frac{1}{\Delta_I^{2+\kappa}} \sum_{i=1}^{p} \big|\Delta_{1,I}^{i}\big|^{2+\kappa}\, E\big[|\delta_i|^{2+\kappa}\big] = 0, \tag{24}$$
then the combination $\Delta_{1,I}\cdot\delta$ converges in distribution towards a normal law with variance $\Delta_I^2$. For $\kappa = 1$, for example, the condition involves the third absolute moments of the nuisance parameters. The Lyapunov condition is verified for any kind of prior shape used in LHC analyses, such as normal, log-normal or uniform distributions. 11
An estimate of the rate of convergence of the combined prior towards the normal law is given by the Berry-Esseen theorem [6,7,8]. For the combination of $p$ identical nuisance parameters the maximal difference between the cumulative distributions of the combined prior and of the Gaussian decreases as $1/\sqrt{p}$. For a combination of arbitrary nuisance parameters, which is our focus, the Berry-Esseen theorem states that this maximal difference is of order
$$\frac{1}{\Delta_I^{3}} \sum_{i=1}^{p} \big|\Delta_{1,I}^{i}\big|^{3}\, E\big[|\delta_i|^{3}\big].$$
This can be used in order to get an estimate of the convergence of the combined prior.
The arguments above can be applied separately to every combined nuisance parameter $\bar\delta_I$. However, whereas the elementary uncertainties are independent, the various $\bar\delta_I$ are correlated with each other; the correlation matrix is given by Eq. (23). The proof that the distribution of the set of $\bar\delta_I$ converges towards a multivariate normal is obtained by decomposing $\rho_{IJ}$ as $\rho_{IJ} = A^t_{IK} A_{KJ}$. 12 Provided that the Lyapunov condition is satisfied for every $A_{IK}\,\bar\delta_K$, one gets by definition a multivariate normal with diagonal correlation matrix. Applying the reverse transformation completes the proof that the combined prior has asymptotically the form
$$\bar\pi(\bar\delta) = \frac{1}{(2\pi)^{N/2}\,|\rho|^{1/2}}\, \exp\!\left(-\frac{1}{2}\,\bar\delta^t\cdot\rho^{-1}\cdot\bar\delta\right). \tag{26}$$
As a consequence, the combined uncertainty on the expected event numbers $n_I$ asymptotically follows a multivariate log-normal distribution (see Eq. (20)).
10 The Lyapunov CLT does not require identically distributed nuisance parameters, nor identical variances. In a similar fashion, the Lindeberg-Feller CLT [8,12,13] applies with a condition weaker than Eq. (24), but maybe less intuitive. The Lindeberg condition is implied by the Lyapunov condition.
11 In particular, note that when the distribution of the nuisance parameters is symmetric, all odd moments vanish identically for any $p$, so that odd-order deviations from the normal law, such as the skewness, are absent from the combination.
12 This is allowed as $\rho_{IJ}$ is a real symmetric, positive semi-definite matrix.
Besides these robust arguments, one may also note that many of the elementary systematic uncertainties readily have a Gaussian prior, which further enhances the convergence rate of the combination. This is also true for the log-normal distribution in the limit of small relative magnitude, in which case the log-normal is approximately Gaussian. The manifestation of the CLT in the case of theoretical uncertainties for Higgs production and decay rates has been explicitly observed in [14].
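The convergence estimate can be made concrete with a small sketch. It evaluates the standard Berry-Esseen ratio for independent, non-identically distributed variables, i.e. the sum of third absolute moments divided by the 3/2 power of the total variance, assuming flat priors with zero mean and unit variance. The function name `lyapunov_ratio` and the magnitudes used are hypothetical.

```python
import numpy as np

# E|d|^3 for a uniform nuisance parameter with zero mean and unit variance,
# i.e. uniform on [-sqrt(3), sqrt(3)].
FLAT_ABS_THIRD_MOMENT = 3 * np.sqrt(3) / 4

def lyapunov_ratio(d1, abs_third_moment=FLAT_ABS_THIRD_MOMENT):
    """Berry-Esseen / Lyapunov (kappa = 1) ratio for the sum of d1[i] * delta_i.

    Small values indicate that the combined prior is close to Gaussian.
    All elementary priors are assumed to share the same third absolute moment.
    """
    d1 = np.abs(np.asarray(d1, float))
    return abs_third_moment * np.sum(d1**3) / np.sum(d1**2) ** 1.5

# Equal magnitudes: the ratio decreases as 1/sqrt(p).
for p in (4, 16, 64):
    print(p, lyapunov_ratio(np.full(p, 0.05)))

# One dominant source: the ratio stays large, the CLT brings little.
print(lyapunov_ratio([0.10, 0.01, 0.01, 0.01]))
```

The second case illustrates that what matters is the number of sources of comparable magnitude: a single dominant uncertainty keeps its own prior shape, whatever the number of subleading ones.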

Practical considerations
We claimed above that under the assumption of a large number of small uncertainties, the shape of elementary priors does not matter and the shape of the combined prior is approximately Gaussian. All the information needed to treat the systematic uncertainties is in fact contained in the mean values and the covariance matrix of the combined nuisance parameters, that are obtained through the steps of propagation/combination described above.
Some practical conclusions can already be drawn. It turns out that the approximate treatment proposed above only requires the knowledge of a finite set of numbers:
• The magnitudes of the elementary uncertainties, $\Delta_{Q_i}$, of dimension $p$.
• The first derivatives of the expected event numbers with respect to every nuisance parameter, i.e. $\partial n_I/\partial\delta_i$, of dimension $N \times p$.
• The diagonal second derivatives of the expected event numbers with respect to every nuisance parameter, i.e. $\partial^2 n_I/\partial\delta_i^2$, of dimension $N \times p$.
All the relevant information about systematic uncertainties is thus encoded into $(2N+1)p$ numbers. The transmission of this information poses no technical challenge. In the context of LHC analyses, it could be an easy and efficient way for the Collaborations to make public the main detector effects. Besides, as a rule of thumb about the typical number of elementary uncertainties required for the CLT to converge, one can ask for a minimum of $p = 4$ to $5$ elementary uncertainties with similar magnitudes and flat priors. In the case of Gaussian priors, this constraint does not apply, as the combined prior is perfectly Gaussian for any $p$.

Analytic marginalisation for Poisson and Gaussian likelihoods
The previous steps of propagation, combination, and prior simplification can readily be used to reduce the number of nuisance parameters in any kind of fit. In the case of Higgs experimental uncertainties, the O(4000)-dimensional space of nuisance parameters would be reduced to a O(100)-dimensional space, the number of statistically independent observed channels. But ultimately, an integration still needs to be carried out over a space of substantially large dimension, for which a Monte Carlo integration is often required.
However, it turns out that one can go further under the extra condition that the combined uncertainties are small. Indeed, one can use a Taylor expansion with respect to the magnitude of the combined uncertainties $\Delta_I$ up to quadratic order in order to simplify the likelihood. This makes a completely analytical marginalisation possible. In both cases of Poisson and Gaussian statistics, the expansion of the likelihood reads
$$\Pr\big(\hat n_I \,\big|\, \bar n_{0,I}\, e^{\Delta_I \bar\delta_I}\big) \approx \Pr\big(\hat n_I \,\big|\, \bar n_{0,I}\big)\, \exp\!\left(\xi_I\,\bar\delta_I - \tfrac{1}{2}\,\eta_I\,\bar\delta_I^2\right). \tag{27}$$
In practice, the validity range of the approximation depends on the amount of data and on the expected number of events. This is illustrated in Fig. 1 for $\hat n = 10, 100$ and for typical values of the expected event number $\lambda$ roughly corresponding to 0, 2 and 3 sigma deviations. As a very rough rule of thumb for typical values of $\hat n$ and $\lambda$, one may keep in mind that the validity is good up to $\Delta \sim 20\%$. 13
We now plug the approximation Eq. (27) into the general likelihood $\bar L$, given in Eq. (20). The marginalisation of $\bar L$ is given by
$$\tilde L(\theta) = \int d^N\bar\delta\;\, \bar L(\theta,\bar\delta)\,\bar\pi(\bar\delta), \tag{28}$$
where the combined prior $\bar\pi$ is given by Eq. (26) and involves the correlation matrix $\rho_{IJ}$ of the $\bar\delta$ given in Eq. (23). The approximate marginal likelihood is found to be
$$\tilde L(\theta) = L_{\rm stat}(\theta)\, L_{\rm sys}(\theta), \qquad L_{\rm sys} = \big|\eta\rho + 1\big|^{-1/2}\, \exp\!\left(\tfrac{1}{2}\,\xi\cdot\rho\cdot(\eta\rho + 1)^{-1}\cdot\xi\right), \tag{29}$$
where "·" is matrix multiplication, and one introduced the vector $\xi$ with
$$\xi_I = \big(\hat n_I - \bar n_{0,I}\big)\,\Delta_I,$$
the diagonal matrix $\eta$ with
$$\eta_{II} = \bar n_{0,I}\,\Delta_I^2,$$
and $L_{\rm stat} = \prod_I \Pr(\hat n_I | \bar n_{0,I})$.
13 In the presence of a large sample, $\hat n \gg 1$, it is customary to approximate the Poisson likelihood by a Gaussian. It is worth noticing that, while the computation of Eq. (27) is straightforward for the Poisson case, obtaining the same result starting from the Gaussian is a bit more delicate. Depending on how the approximation is done, slightly different expressions can be obtained, which are all close to each other provided that $\hat n \gg 1$. The Poisson result Eq. (27) is valid for any $\hat n$, and will be used in the following.
Both $\xi$ and $\eta$ depend on the parameters of interest $\theta$ via the expected event numbers and the combined uncertainties, i.e. one has in general $\bar n_{0,I}(\theta)$, $\Delta_I(\theta)$. The $L_{\rm stat}$ term is almost the likelihood with no nuisance parameters, i.e. the piece of likelihood encoding the statistical uncertainty, except that it involves the shifted expected event numbers $\bar n_{0,I}$ (see Eq. (21)). In the limit of zero systematic uncertainty, i.e. $\Delta \to 0$, $L_{\rm sys}(\theta)$ becomes an irrelevant constant so that $\tilde L(\theta) \to L_{\rm stat}(\theta)$. The $L_{\rm sys}$ term encodes most of the effect of the systematic uncertainties. Its effect is to enlarge and shift the preferred regions of the parameters of interest.
Frequentist marginalisation (i.e. profiling), as well as the Bayesian and frequentist bias methods described in [14] can all be treated analytically, substituting Eq. (28) by the appropriate operation. The approach described above and leading to Eq. (29) is general. Nevertheless it is interesting to work out specific cases that are omnipresent in LHC analyses.
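The analytic marginalisation can be sketched and cross-checked numerically. The code below assumes Poisson statistics and a Gaussian combined prior, with $\xi_I = (\hat n_I - \bar n_{0,I})\Delta_I$ and $\eta_{II} = \bar n_{0,I}\Delta_I^2$, which is what a second-order expansion of the Poisson log-likelihood in $\Delta_I$ gives. The function names and the numbers are hypothetical. The single-channel case is compared with a direct Gauss-Hermite integration over the nuisance parameter.

```python
import math
import numpy as np

def log_poisson(n_obs, n_exp):
    """Log of the Poisson probability of n_obs given the expectation n_exp."""
    return n_obs * math.log(n_exp) - n_exp - math.lgamma(n_obs + 1)

def log_marginal_approx(n_obs, n0_bar, delta, rho):
    """Log of L_stat * L_sys for N channels with correlation matrix rho."""
    n_obs, n0_bar, delta = (np.atleast_1d(np.asarray(a, float))
                            for a in (n_obs, n0_bar, delta))
    rho = np.atleast_2d(np.asarray(rho, float))
    xi = (n_obs - n0_bar) * delta
    eta = np.diag(n0_bar * delta**2)
    m = eta @ rho + np.eye(len(n0_bar))               # eta.rho + 1
    log_stat = sum(log_poisson(k, lam) for k, lam in zip(n_obs, n0_bar))
    _, logdet = np.linalg.slogdet(m)
    log_sys = -0.5 * logdet + 0.5 * xi @ rho @ np.linalg.solve(m, xi)
    return log_stat + log_sys

def log_marginal_exact_1d(n_obs, n0_bar, delta, order=60):
    """Direct integration over one Gaussian nuisance parameter."""
    x, w = np.polynomial.hermite_e.hermegauss(order)
    w = w / w.sum()
    p = np.array([math.exp(log_poisson(n_obs, n0_bar * math.exp(delta * x_k)))
                  for x_k in x])
    return math.log(np.sum(w * p))

# Hypothetical single-channel check: 100 observed events, 10% combined uncertainty.
for n0_bar in (90.0, 100.0, 110.0):
    approx = log_marginal_approx(100, n0_bar, 0.1, 1.0)
    exact = log_marginal_exact_1d(100, n0_bar, 0.1)
    print(n0_bar, math.exp(approx - exact))
```

The printed ratio between the approximate and the numerically integrated marginal likelihood stays within a few percent of one for these inputs. The multi-channel call takes vectors for `n_obs`, `n0_bar`, `delta` and the $N \times N$ matrix `rho`.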

Signal strength fit
The general approach summarised by Eq. (29) can be applied to the very typical case where the expected number of events is split into a signal and a background component, $n = s + b$. The signal can also be further parametrised as $s = \mu\, s_0$, where the parameter of interest $\mu$ is a "signal strength modifier" and $s_0$ is some nominal value for the signal.
In principle both background and signal are plagued by systematic uncertainties, so that one should distinguish the elementary nuisance parameters for signal and background, $\delta^s$, $\delta^b$. After a preliminary step of error propagation, the systematic uncertainties on the expected rates take the form
$$n_I = s_{0,I}\,\exp\!\big(\Delta^s_{1,I}\cdot\delta^s + (\delta^s)^t\cdot\Delta^s_{2,I}\cdot\delta^s\big) + b_{0,I}\,\exp\!\big(\Delta^b_{1,I}\cdot\delta^b + (\delta^b)^t\cdot\Delta^b_{2,I}\cdot\delta^b\big).$$
In order to obtain the standard form for propagated errors, Eqs. (10), (11), (12), one defines the overall vector of elementary uncertainties $\delta = (\delta^s, \delta^b)$, and writes
$$n_I = n_{0,I}\,\exp\!\big(\Delta_{1,I}\cdot\delta + \delta^t\cdot\Delta_{2,I}\cdot\delta\big),$$
where
$$n_{0,I} = s_{0,I} + b_{0,I}, \qquad \Delta_{1,I} = \frac{\big(s_{0,I}\,\Delta^s_{1,I},\; b_{0,I}\,\Delta^b_{1,I}\big)}{s_{0,I} + b_{0,I}}, \tag{34}$$
and $\Delta_{2,I}$ follows in the same way from the second-order expansion of $\log n_I$. This makes contact with the standard notation of Eq. (13), and the analytic marginal likelihood is readily given by Eq. (29). If the $N$ independent likelihoods correspond to $N$ measurements of the same process, one has $s_{0,I} = s_0$, $b_{0,I} = b_0$ for every $I$. This case will be illustrated in Sec. 4.
Here, positive $s$ and $b$ have been assumed. It is also possible to allow $s$ to take negative values if it is dominated by the destructive interference between the SM and BSM matrix elements. In that case a linear modelling of the error on $s$ is fine, but one should bear in mind that if the support of the $\delta^s$ is such that $b + s$ can be zero, then, depending on the prior of $\delta^s$, the likelihood can blow up above some arbitrarily large value of $s$. This is a general fact that is not specific to the approximations studied in this paper.
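The first-order part of the signal+background combination can be checked with a few lines. The sketch assumes that the combined first-order vector is the yield-weighted concatenation of the signal and background vectors, as obtained by expanding $\log(s + b)$ around the nominal point. The function names and all numbers are hypothetical. The result is compared with a finite-difference gradient of $\log n$.

```python
import numpy as np

def signal_plus_background(s0, b0, d1_s, d1_b):
    """Nominal yield and first-order relative uncertainties of n = s + b."""
    n0 = s0 + b0
    d1 = np.concatenate([s0 * np.asarray(d1_s, float),
                         b0 * np.asarray(d1_b, float)]) / n0
    return n0, d1

def log_n(delta, s0, b0, d1_s, d1_b):
    """log of s0*exp(d1_s . delta_s) + b0*exp(d1_b . delta_b), with the
    second-order terms of the elementary uncertainties disregarded."""
    k = len(d1_s)
    return np.log(s0 * np.exp(np.dot(d1_s, delta[:k]))
                  + b0 * np.exp(np.dot(d1_b, delta[k:])))

# Hypothetical nominal yields and elementary uncertainties.
s0, b0 = 20.0, 100.0
d1_s = [0.10, -0.05, 0.08]
d1_b = [0.06, -0.04]

n0, d1 = signal_plus_background(s0, b0, d1_s, d1_b)

# Central finite-difference gradient of log n at delta = 0.
h = 1e-6
grad = np.array([(log_n(h * e, s0, b0, d1_s, d1_b)
                  - log_n(-h * e, s0, b0, d1_s, d1_b)) / (2 * h)
                 for e in np.eye(len(d1))])

print("n0 =", n0)
print("Delta_1 (formula):      ", d1)
print("Delta_1 (finite diff.): ", grad)
```

The weighting by $s_0/(s_0 + b_0)$ and $b_0/(s_0 + b_0)$ shows how a signal uncertainty is diluted when the background dominates, and vice versa.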

Differential distributions
Another typical measurement at the LHC is that of a differential distribution. The likelihood with no systematic uncertainties has then the form of Eq. (14), where every measurement corresponds to a different bin $I$, and $\hat n_I$ is the observed event number in the bin $I$. Denoting by $X \in D$ the variable along which the events are binned and by $D_I$ the subdomain of $D$ defining the bin $I$, the expected event numbers are given by $n_I = n_{\rm tot}\int_{D_I} dx\, f_X(x)$. Differential distributions get deformed by detector effects, which typically smear their shape. A general way of modelling the smearing is to write the binning variable as
$$X = X_0\,\big(1 + \Delta(X_0)\,\delta\big),$$
where $\Delta(X_0)$ is the relative magnitude of the uncertainty at the location $X_0$. As a simple example, we assume a model of smearing independent of $X$; the general case can be treated similarly. The expected number of events in a bin $I$ is then given by the integral of the smeared density over $D_I$, multiplied by $n_{\rm tot}$, the expected total number of events. Starting from Eq. (14), one can disentangle the information of shape and total event number,
$$L(\theta,\delta) = L_{\rm tot}(\theta)\, L_{\rm shape}(\theta,\delta), \qquad L_{\rm shape}(\theta,\delta) = \prod_{I=1}^{N} \left(\frac{n_I(\theta,\delta)}{n_{\rm tot}(\theta)}\right)^{\hat n_I}.$$
Only $L_{\rm shape}$ depends on $\delta$, as this nuisance parameter models a shape deformation. Expanding $\log n_I$ in $\Delta$ at quadratic order gives the coefficients $\Delta_{1,I}$ and $\Delta_{2,I}$ in terms of the first and second derivatives of $f_X$ over the bin, with
$$n_{I,0} = n_{\rm tot}\int_{D_I} dx\, f_X(x). \tag{40}$$
As an aside one may notice that when one expands $n_I$ at second order, the quadratic term explicitly shows the effect of smearing. If $f$ is convex (concave) over the bin, then $f''_X > 0$ ($f''_X < 0$), so that the quadratic term fills (depletes) the bin, in accordance with what is expected from a smearing process.
Plugging this expression into the likelihood gives exactly 14 the exponential of a quadratic polynomial in $\delta$, with $\xi = \sum_I \Delta_{1,I}$, $\eta = 2\sum_I \Delta_{2,I}$. Marginalising with a Gaussian prior for $\delta$ gives once again Eq. (29), here in a one-variable version. Finally, the unbinned version of the same likelihood is directly obtained by taking the limit of infinitely thin bins. Then $\hat n_I$ can be only zero or one, the integrals can be simplified and the observed events end up labelled by their position $\hat X_I$. The marginal likelihood follows in the same way. The information that has to be reported to reconstruct this smeared likelihood is
• The magnitude of the relative uncertainty on the binning variable $X$,
• The first and second derivatives of the expected shape $f_X$.

An example of signal strength fit
In order to illustrate our results, we consider a somewhat realistic scenario for the characterisation of a signal. Formally, the scenario considered corresponds to a particular case of the signal strength analysis described in Sec. 3.1. This example will also be used to check the accuracy of the approximate marginal likelihood, $\tilde L$.
To carry out this toy analysis one first has to set up the "observed" data, the expected background and the systematic uncertainties. We assume three independent observation channels $I = (A, B, C)$. An observed event number $\hat n_{A,B,C}$ is assumed for each channel. The expected event number is given a signal+background form $n = s + b$, which is taken to be common to all channels, so that $s_I = s$, $b_I = b$, and the nominal value of $b$ is fixed.
We further assume the presence of 3 independent systematic uncertainties labelled a, b, c for the signal and 2 uncertainties labelled d, e for the background. Both signal and background are positive, so that we use the error modelling of Eq. (5). We are going to consider the cases of a Gaussian and of a flat prior for $\delta$, which imply respectively log-normal and log-flat priors for the $s$, $b$ components of the expected event number. Assuming a first step of propagation from more elementary uncertainties, and disregarding the possible $\Delta_2$ terms for simplicity, the leading effect of the systematic uncertainties on the expected event number is characterised by $3 \times 5$ numbers $\Delta^{s/b,i}_{1,I}$, given in Tab. 1. In the notation of Sec. 2.5, one has $N = 3$, $p = 5$. Note that with this starting point, the magnitudes of the $p$ elementary uncertainties are already combined with the $N \times p$ derivatives of the expected signal. The uncertainties appear in each channel $I$ as
$$n_I = s\,\exp\!\Big(\sum_{i=a,b,c} \Delta^{s,i}_{1,I}\,\delta_i\Big) + b\,\exp\!\Big(\sum_{i=d,e} \Delta^{b,i}_{1,I}\,\delta_i\Big).$$
Having characterised the main effect of the systematic uncertainties, we can readily use the approximate marginal likelihood $\tilde L$. Moreover, we compute the exact marginal likelihood by numerically integrating over the five nuisance parameters, which provides a way of directly testing the accuracy of $\tilde L$. In practice, plotting the exact marginal likelihood of our example takes about an hour on a laptop with average specifications, while plotting the approximate likelihood is instantaneous.

Table 1: Observed data and nuisance parameters in three statistically independent channels A, B, C. The average expected background is taken to be $b_0 = 100$.

14 Interestingly, no Taylor expansion of the likelihood is needed to get this result.
All the numbers assumed for $\hat n_I$ and the $\Delta^{s/b,i}_{1,I}$ are given in Tab. 1. The observed numbers are chosen so that the statistical uncertainty is of O(10%). This is for example the case in the global fit of the 8 TeV Higgs signal strengths. The signs of the systematic uncertainties have been chosen so that the likelihood strongly depends on every nuisance parameter. The relative magnitudes of the elementary uncertainties are chosen to be O(10%).
This illustrative scenario can be used in order to check the accuracy of the two kinds of approximations required to obtain the $\tilde L$ likelihood: a) the CLT-based Gaussian approximation (see Sec. 2) and b) the likelihood expansion (see Sec. 3). The Gaussian and flat priors allow us to disentangle these two approximations, because for Gaussian priors, approximation a) is always satisfied, i.e. the CLT is perfectly convergent. Thus for Gaussian priors, the discrepancies between the approximate and exact likelihoods come only from approximation b). In contrast, in the flat prior case the discrepancies come from both approximations a) and b).
The exact and approximate marginal likelihoods are shown in Fig. 2. In the case of Gaussian priors for $\delta$ (i.e. log-normal uncertainties), the two curves agree very well. This shows that approximation b) is well under control. In order to test approximation a), i.e. the CLT convergence, we can now compare with the case of flat priors for $\delta$. It turns out that a mild discrepancy appears in certain regions. This illustrates the degree of convergence of the CLT in a five-parameter case. The two curves still agree fairly well in this flat-prior case, in the sense that the best-fit regions drawn from these likelihoods would be similar. For a larger number of nuisance parameters, this discrepancy is expected to decrease as the CLT convergence should improve.
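A simplified version of this comparison can be reproduced in a few lines. The sketch below is not the analysis of Tab. 1: it assumes a single channel, an expected event number of the form $n_0 \exp(\sum_i \Delta_i \delta_i)$ with five flat priors of unit variance, and hypothetical magnitudes of 5% each. The exact marginal likelihood is obtained with a tensor-product Gauss-Legendre quadrature and compared with the Gaussian-prior analytic form in its one-variable Poisson version, $\xi = (\hat n - n_0)\Delta$ and $\eta = n_0\Delta^2$ with $\Delta^2 = \sum_i \Delta_i^2$. All function names are illustrative.

```python
import math
import numpy as np

def log_poisson(n_obs, n_exp):
    """Log of the Poisson probability; n_exp may be a NumPy array."""
    return n_obs * np.log(n_exp) - n_exp - math.lgamma(n_obs + 1)

def exact_flat(n_obs, n0, deltas, nodes_per_dim=8):
    """Marginal likelihood with independent flat priors on [-sqrt(3), sqrt(3)],
    integrated on a tensor-product Gauss-Legendre grid."""
    x, w = np.polynomial.legendre.leggauss(nodes_per_dim)
    nodes, w = np.sqrt(3.0) * x, w / 2.0      # unit-variance flat prior
    t = np.zeros(1)                           # values of sum_i Delta_i * delta_i
    wt = np.ones(1)                           # matching product weights
    for d in deltas:
        t = (t[:, None] + d * nodes[None, :]).ravel()
        wt = (wt[:, None] * w[None, :]).ravel()
    return float(np.sum(wt * np.exp(log_poisson(n_obs, n0 * np.exp(t)))))

def approx_gaussian(n_obs, n0, deltas):
    """One-variable analytic marginal likelihood with a Gaussian combined prior."""
    delta2 = float(np.sum(np.square(deltas)))
    xi = (n_obs - n0) * math.sqrt(delta2)
    eta = n0 * delta2
    return math.exp(log_poisson(n_obs, n0) + 0.5 * xi**2 / (1 + eta)) / math.sqrt(1 + eta)

# Hypothetical setup: 100 observed events, five flat 5% uncertainties.
n_obs, deltas = 100, [0.05] * 5
for n0 in (90.0, 100.0, 110.0):
    ratio = exact_flat(n_obs, n0, deltas) / approx_gaussian(n_obs, n0, deltas)
    print(n0, ratio)
```

For these inputs the ratio stays within a few percent of one, which is the kind of mild discrepancy expected from the combined effect of the CLT convergence with five flat priors and of the likelihood expansion.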

Conclusion
With the goal of simplifying the treatment of systematic uncertainties in typical LHC analyses, we have studied the behaviour of a generic likelihood in the presence of a large number of uncertainties with small relative magnitudes.
Whenever this condition is satisfied, it turns out that well-controlled approximations become available, which provide a way of drastically simplifying the incorporation of systematic uncertainties into the likelihood. Our demonstration is split into steps of error propagation and error combination. In the latter, the Lyapunov central limit theorem applies to the combined uncertainties, thereby approximating their joint distribution as a multivariate normal. This implies that the shape of the priors of the elementary uncertainties is irrelevant; only their magnitudes matter.
Whenever the combined uncertainties are small enough, say 20%, the likelihood can be further simplified and the complete marginal likelihood is obtained analytically. This general result is applied to the important cases of signal strength characterisation and differential distribution smearing.
For illustration, we present a toy-analysis of signal strength characterisation including systematic uncertainties on signal and background. The approximate and exact marginal likelihoods are found to be in fairly good agreement in this example, implying that all approximations are well under control.
Beyond the obvious gain of avoiding heavy numerical marginalisation, another practical matter is the communication of systematic uncertainties, for example from an experiment to the public. Our approach implies that all the needed information is encoded into a finite set of numbers, namely the relative magnitude of elementary uncertainties and the derivatives of the expected event numbers. The transmission of this information is straightforward, and gives a fairly human-readable summary of the systematic uncertainties. In principle, this simple method could be used to make public the detector effects that are included in LHC analyses.
The marginal likelihood presented in this paper is purely Bayesian. It is also possible to compute analytically the marginal likelihood in the case of a frequentist profiling (described in App. A), as well as to apply the bias methods formalised in Ref. [14]. Finally, although our study is oriented towards LHC analyses, it could also be readily applied to other experimental contexts.
Profiling the approximate likelihood over the combined nuisance parameters, instead of integrating over them, gives $L_{\rm stat}\,\exp\!\big(\tfrac{1}{2}\,\xi\cdot\rho\cdot(\eta\rho + 1)^{-1}\cdot\xi\big)$, which resembles very much the Bayesian result up to a $|\eta\rho + 1|$ factor. Typically, the variation of this factor with respect to the parameters of interest is small compared to the variation of the exponential term. Hence, in practice, the frequentist and Bayesian approximate likelihoods are almost equivalent. The subsequent frequentist and Bayesian best-fit regions obtained from these likelihoods thus differ mostly by the definition of frequentist and Bayesian contours [15].