On the significance of new physics in $b\to s\ell^+\ell^-$ decays

Motivated by deviations with respect to Standard Model predictions in $b\to s\ell^+\ell^-$ decays, we evaluate the global significance of the new physics hypothesis in this system by including the {\it look-elsewhere effect} for the first time. We estimate the trial-factor with pseudo-experiments and find that it can be as large as seven. We calculate the global significance for the new physics hypothesis by considering the most general description of a non-standard $b\to s\ell^+\ell^-$ amplitude of short-distance origin. Theoretical uncertainties are treated in a highly conservative way by absorbing the corresponding effects into a redefinition of the Standard Model amplitude. Using the most recent measurements of LHCb, ATLAS and CMS, we obtain the global significance to be $4.3$ standard deviations.

While there is no single result exhibiting a $5\sigma$ deviation from the SM, the pattern of deviations, collectively denoted as the $b\to s\ell^+\ell^-$ anomalies, is striking. In order to guide future activities in this field, and possibly claim a discovery, it is essential to determine the combined statistical significance of these anomalies in a robust way. This is the purpose of this paper.
The first point to clarify is the alternative hypothesis that we aim to test with respect to the SM. The scope of this paper is to test in general terms the hypothesis of a new short-distance interaction connecting the $b$ and $s$ quarks with a dilepton pair. By short-distance we mean a NP interaction which appears as a local interaction in $b$-hadron decays. This general hypothesis, which is well justified by the absence of non-SM particles observed so far at colliders, allows us to describe $b\to s\ell^+\ell^-$ transitions using the general formalism of effective Lagrangians, encoding a hypothetical NP contribution via appropriate four-fermion operators. This description, which is conceptually similar to Fermi's theory of beta decays [12], allows us to consider each specific $b$-hadron decay of interest as a different way to probe the same underlying $b\to s\ell^+\ell^-$ short-distance interaction.
The hypothesis of NP effects in $b\to s\ell^+\ell^-$ transitions of short-distance origin was formulated first in Ref. [13]. Later on, several theory groups have analysed these processes within the framework of effective Lagrangians (see e.g. Refs. [14][15][16][17][18][19][20][21][22][23][24][25][26][27]). These analyses provided fits of the coefficients of well-defined sets of four-fermion operators, the so-called Wilson Coefficients (WCs), obtaining significances that in the last few years largely exceed the $5\sigma$ level [25][26][27]. While these results are interesting and highly valuable, they do not provide the robust and general estimate of the significance we aim for. Our goal is not obtaining the best fit values of the WCs, which is the main goal of these previous studies, but rather estimating the significance of the NP hypothesis irrespective of its specific structure.
Most of the WC fits quoted in the literature are obtained by varying a small number of WCs, typically one or two. While this approach is well suited to test specific (often well motivated) NP hypotheses, and to determine the values of the WCs in these frameworks, it does not provide an unbiased estimate of the significance of the NP hypothesis. As we clarify below, the significance thus obtained resembles the local significance in resonance searches. The concern lies in the fact that several measurements are performed but only a few exhibit deviations with respect to the SM, corresponding to well-defined sets of WCs. It should also be stressed that the WC basis is a purely conventional choice: if a given correlation emerges from data in a two-parameter fit, one can change the basis and perform a fit with apparently higher significance, enforcing such correlation via the basis choice and using a single parameter.
Overestimating the significance of a subset of measurements is equivalent to the look-elsewhere effect (LEE) in searches for new resonances [28][29][30]. While there is a small probability to observe an $n\sigma$ statistical fluctuation in a given bin of a distribution where the resonance could appear (local p-value), when several bins are measured the probability that at least one of them deviates by $n\sigma$ is larger (global p-value). When searching for a new resonance with unknown mass, the LEE can be addressed by calculating a trial-factor, defined as the ratio of the global and local p-values, with an ensemble of pseudo-experiments [30][31][32]. Conceptually, this is the same approach we adopt in this paper: we estimate the significance of the NP hypothesis in real data via pseudo-experiments. The trial-factor is then due to alternative deviations which could have emerged in a hypothetical dataset with the same experimental precision.
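The relation between local and global p-values in the resonance-search analogy can be illustrated with a minimal toy, which is not part of the analysis presented in this paper. It assumes a hypothetical spectrum of 20 independent bins with unit Gaussian fluctuations and a one-sided $2\sigma$ threshold; the function names, the number of bins and the number of pseudo-experiments are all illustrative choices.

```python
import random
from statistics import NormalDist


def local_p_value(n_sigma):
    """One-sided probability for a single bin to fluctuate above n_sigma."""
    return 1.0 - NormalDist().cdf(n_sigma)


def global_p_value_toys(n_sigma, n_bins, n_toys, seed=0):
    """Fraction of pseudo-experiments in which at least one bin exceeds n_sigma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_toys):
        # One pseudo-experiment: n_bins independent background-only fluctuations.
        largest = max(rng.gauss(0.0, 1.0) for _ in range(n_bins))
        if largest >= n_sigma:
            hits += 1
    return hits / n_toys


# Hypothetical setup: 20 independent bins, 2 sigma local threshold.
n_sigma, n_bins = 2.0, 20
p_local = local_p_value(n_sigma)
p_global = global_p_value_toys(n_sigma, n_bins, n_toys=20000)

# For independent bins the global p-value is also known in closed form.
p_global_exact = 1.0 - (1.0 - p_local) ** n_bins

# Trial factor: ratio of the global and local p-values.
trial_factor = p_global / p_local
print(f"local p = {p_local:.4f}, global p = {p_global:.4f}, trial factor = {trial_factor:.1f}")
```

In this toy the bins are independent, so the closed-form expression serves as a cross-check of the pseudo-experiments. In a realistic case with correlated observables no such closed form is available, which is why the trial-factor is estimated from the ensemble.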
There are fits in the literature that use a large number of WCs and a rather general NP hypothesis [14,25,33]. In particular, Ref. [33] fits all possible WC directions and therefore does not suffer from the LEE. The issue in this case is not the number of WCs but the effective number of degrees of freedom in the system, which depends on the correlations between WCs and the observables that are accessible to the experiments. Using pseudo-experiments is an efficient method to eliminate flat directions in the space of WCs and can easily account for many experimental details such as non-Gaussian uncertainties and correlated systematics.
Summarizing, the approach we propose to determine the statistical significance of NP in $b\to s\ell^+\ell^-$ transitions is based on the following points:
• We consider the short-distance $b\to s\ell^+\ell^-$ transition as a unique process constrained by different decay channels.
• We describe NP effects in $b\to s\ell^+\ell^-$ transitions using the most general effective Lagrangian compatible with the hypothesis of an effective local interaction.
• We estimate the trial-factor via an ensemble of pseudo-experiments generated according to the SM hypothesis and using the likelihood ratio as the test statistic.
• We adopt a highly-conservative attitude towards theory uncertainties, particularly in the case of non-local charm contributions.
This method allows us to evaluate the probability to observe the numerical coherence that is seen in data by chance. Only coherent deviations with respect to the SM can give a large value of the test statistic. All possible deviations in both the measurements and Wilson coefficients are considered. Therefore, this method evaluates the global significance of the $b\to s\ell^+\ell^-$ anomalies for the first time.

II. EFFECTIVE LAGRANGIAN AND SELECTION OF THE OBSERVABLES
In the limit where we assume no new particles below the electroweak scale, we can describe $b\to s\ell^+\ell^-$ transitions by means of an effective Lagrangian containing only light SM fields. The only difference between SM and effective Lagrangians, renormalized at a scale $\mu\sim m_b$, is the number of effective operators, which can be larger in the NP case. To describe all the relevant non-standard local contributions, we add to the SM effective Lagrangian

[Eq. (1)]

where $G_F$ denotes the Fermi constant, and where the index $i$ indicates the following set of dimension-six operators (treated independently for $\ell = e$ and $\mu$):

[Eq. (2)]

As shown in [46], these operators are in one-to-one correspondence with the independent combinations of dimension-six operators involving $b$, $s$ and lepton fields in the complete basis of dimension-six operators invariant under the SM gauge group. We do not include in the list (2) the dipole operators, $O_7^{(\prime)}$, for two reasons: these do not describe a $b\to s\ell^+\ell^-$ local interaction and they are well constrained by $\Gamma(B\to X_s\gamma)$ and $\Gamma(B\to K^*\gamma)$.$^1$ The four scalar operators in (2) lead to $b\to s\ell^+\ell^-$ amplitudes which are helicity suppressed. We thus restrict the attention to the single effective combination which contributes to the $B_s^0\to\mu^+\mu^-$ helicity-suppressed rate. Finally, in the absence of stringent experimental constraints on CP-violating observables, we treat the NP WCs as real parameters.$^2$ According to these general hypotheses, NP effects in $b\to s\ell^+\ell^-$ transitions are described in full generality by nine real parameters.
As far as $C_{9,10}^{e,\mu}$ are concerned, it is convenient to separate universal and non-universal corrections in lepton flavor, defining

[Eq. (3)]

Adopting a conservative attitude toward theoretical uncertainties, we restrict the attention to the following three sets of observables: i) the LFU ratios $R_K$ [11] and $R_{K^*}$ [5], ii) the branching ratio for the rare dilepton mode $B_s^0\to\mu^+\mu^-$ [3,6,7,47] and, iii) the normalized angular distribution in $B^0\to K^{*0}\mu^+\mu^-$ decays [9,10]. As the measurements in class i) and ii) are statistically dominated, they are treated as uncorrelated, whereas the full experimental correlation matrix given in Ref. [9] is used for the $B\to K^*\mu^+\mu^-$ angular observables.

$^1$ An explicit quantification of the change of the significance when $C_7^{(\prime)}$ are also varied, taking into account their a priori knowledge before any LHCb measurements, is presented in Section IV.

$^2$ This statement refers to the standard quark-phase convention, where the WCs are approximately real also in the SM. Imaginary contributions to the WCs would not interfere with the SM amplitude and cannot induce large deviations from the SM in CP-conserving observables.
By construction, the observables in class i) and ii) are insensitive to form-factor and decay constant uncertainties (except for $f_{B_s^0}$ in class ii) as well as non-local charm contributions. The latter induce contributions to the decay amplitudes that can effectively be described via the shift

[Eq. (4)]

where $q^2$ denotes the squared dilepton invariant mass. The absence of a completely reliable estimate of the theoretical uncertainty on the function $f^{c\bar c}_{B\to f}$, in particular on its normalization at $q^2 = 0$, forces us to treat the determination of $\Delta C_9^U$ as a SM nuisance parameter$^3$ and ignore the information from exclusive decay rates or dilepton spectra. This way we automatically remove most of the uncertainties associated with the hadronic form factors: a choice that may be seen as too conservative, but that certainly does not lead to an overestimate of the NP significance.
The only observable with a residual form-factor uncertainty we retain is the $B^0\to K^{*0}\mu^+\mu^-$ angular distribution. We keep it since this distribution is sensitive to non-standard effects in short-distance operators other than $O_9^\mu$, even if we marginalise over $\Delta C_9^U$. To reduce the form-factor uncertainty we make use of the $P_i$ observables [48]. We explicitly checked that consistent results are obtained using the $S_i$ observables [49], employing the form-factor parameterization in [50].
The set of nine parameters discussed above provides an unbiased description of heavy NP contributions to $b\to s\ell^+\ell^-$ transitions. In order to evaluate the impact of motivated, but more specific, theoretical assumptions, we also define a reduced set of WCs based on the hypothesis of small flavor-violating effects in the right-handed sector. According to this hypothesis, $C_i' \approx 0$ and the set of independent WCs is reduced to five operators. This hypothesis follows from the general assumption of a minimally broken $U(2)^3$ flavor symmetry: a general property of SM extensions which was proposed in [51] well before the observation of the $b\to s\ell^+\ell^-$ anomalies, motivated by the stringent constraints on right-handed quark flavor mixing, especially in the kaon system (see e.g. [52]).

$^3$ If we were to include more channels potentially affected by non-local charm contributions, we would need to treat the determination of $\Delta C_9^U$ from each channel as an independent nuisance parameter.

III. STATISTICAL METHOD
To evaluate the significance of the NP hypothesis in the $b\to s\ell^+\ell^-$ system we use the likelihood ratio, evaluated as a $\Delta\chi^2$, as the test statistic. The trial-factor is calculated with a technique similar to the one described in Refs. [30][31][32]. Starting from SM predictions, a large number of pseudo-experiments are generated, varying the measurements according to the experimental uncertainty. For each simulated experiment, the full set of WCs ($C_i$) is fitted and the $\Delta\chi^2$ between the best fit ($\hat C_i$) and the SM prediction ($\Delta\hat C_9^U$, $C_i^{\rm SM}$) is calculated. Data are fitted in the same way as pseudo-experiments and the distribution of $\Delta\chi^2$ is used to calculate the p-value. The software package Flavio [53] is used to fit WCs.
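The logic of this procedure can be sketched with a deliberately simplified toy, which is not the fit performed in this paper. It assumes a hypothetical linear model in which each observable is predicted as the SM value plus a linear combination of three parameters, with uncorrelated Gaussian uncertainties, and it ignores the nuisance parameter $\Delta C_9^U$. The matrix `A`, the uncertainties, the number of pseudo-experiments and the example value of 15 for the observed test statistic are illustrative assumptions, and NumPy least squares stands in for the actual fit performed with Flavio.

```python
import numpy as np


def delta_chi2(y, sm, A, sigma):
    """chi2(SM) - chi2(best fit) for predictions sm + A @ c with Gaussian errors."""
    r = (y - sm) / sigma          # residuals w.r.t. the SM, in units of sigma
    Aw = A / sigma[:, None]       # design matrix weighted in the same way
    c_hat, *_ = np.linalg.lstsq(Aw, r, rcond=None)  # best-fit parameters
    chi2_sm = r @ r
    chi2_best = np.sum((r - Aw @ c_hat) ** 2)
    return chi2_sm - chi2_best


def toy_distribution(sm, A, sigma, n_toys, rng):
    """Test-statistic distribution for pseudo-experiments generated under the SM."""
    toys = np.empty(n_toys)
    for i in range(n_toys):
        y = sm + sigma * rng.standard_normal(sm.size)  # fluctuate around the SM
        toys[i] = delta_chi2(y, sm, A, sigma)
    return toys


def p_value(observed, toys):
    """Fraction of pseudo-experiments with a test statistic at least as large."""
    return float(np.mean(toys >= observed))


# Hypothetical setup: 12 observables, 3 fitted parameters.
rng = np.random.default_rng(1)
n_obs, n_par = 12, 3
sm = np.zeros(n_obs)
A = rng.normal(size=(n_obs, n_par))
sigma = np.full(n_obs, 0.5)

toys = toy_distribution(sm, A, sigma, 5000, rng)
p_obs = p_value(15.0, toys)  # 15.0 is an arbitrary example value
print(f"mean toy delta chi2 = {toys.mean():.2f}, p-value = {p_obs:.4f}")
```

In this linear Gaussian toy the test statistic follows a $\chi^2$ distribution with as many degrees of freedom as fitted parameters, so the pseudo-experiments add nothing new. They become necessary once the model is non-linear, has flat directions, or the hypothesis is selected among several alternatives.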
One of the interesting features of the $b\to s\ell^+\ell^-$ anomalies is that they can be easily explained with only one WC: $C^\mu_{LL} = \Delta C_9^\mu - \Delta C_{10}^\mu$. While this makes the NP hypothesis easy to interpret from the theory point of view, it is not the best way to assess the sensitivity with respect to the SM. To illustrate this point we apply our method to the fit of one or two WCs. In Fig. 1, the $\Delta\chi^2$ distribution under the SM hypothesis is shown when the one/two WCs which maximise the likelihood are chosen to fit the data: for each pseudo-experiment, we fit every single possible one/two WC combination and choose the largest test statistic. The blue curve is an empirical function that best describes the distribution. The comparison with a $\chi^2$ distribution with one/two degrees of freedom demonstrates that a sizeable trial-factor is present. Taking for instance a hypothetical $4\sigma$ discrepancy when fitting the best one/two WCs, it would be diluted down to $3.7/3.5\sigma$ with a trial-factor equal to 4.1/7.0, respectively. Since the current best scenarios to explain the anomalies with NP in $C_{LL}$ or in $C_9$ and $C_{10}$ have emerged from the data, using these hypotheses to evaluate the NP significance can lead to overestimates.
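The dilution quoted above follows from multiplying the local p-value by the trial-factor and converting the result back into standard deviations. The short sketch below reproduces this arithmetic. The helper name is hypothetical and the one-sided Gaussian convention is an assumption, although the result does not depend on it as long as the same convention is used for both p-values.

```python
from statistics import NormalDist

_norm = NormalDist()


def diluted_significance(local_sigma, trial_factor):
    """Global significance from a local significance and a trial factor.

    Valid only while trial_factor * p_local stays below 1.
    """
    p_local = 1.0 - _norm.cdf(local_sigma)   # one-sided local p-value
    p_global = trial_factor * p_local        # definition of the trial factor
    return _norm.inv_cdf(1.0 - p_global)     # back to standard deviations


z_one_wc = diluted_significance(4.0, 4.1)   # best single-WC fit
z_two_wc = diluted_significance(4.0, 7.0)   # best two-WC fit
print(f"{z_one_wc:.1f} sigma, {z_two_wc:.1f} sigma")  # 3.7 sigma, 3.5 sigma
```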
As discussed in Sec. II, we advocate the full set of nine WCs to be used if we would like to have an agnostic approach to NP. However, the full set of WCs contains redundancy, which makes the fit unstable. For instance, the deviations in $R_K$ and $R_{K^*}$ can be explained with non-zero values of $C^\mu_{LL}$ or non-zero values of $C^\mu_{RL} = C_9^{\prime\mu} - C_{10}^{\prime\mu}$. Here we are not interested in interpreting the best NP direction and we therefore treat all of these in the same way. In total, the maximum number of WCs that can be fitted is seven, with the full basis of muonic operators, the single effective combination of scalar operators, and two electronic operators. Each pseudo-experiment is fitted six times, with all possible combinations of seven WCs. For each experiment, the largest test-statistic value is used. Adding redundant directions will not improve the $\chi^2$ of a given pseudo-experiment, since there are not enough sensitive measurements to constrain simultaneously all nine WCs.

IV. RESULTS
The $\Delta\chi^2$ distribution for the fit to the full set of Wilson coefficients is shown in Fig. 2 (top). The same procedure is then used in data, obtaining $\Delta\chi^2 = 31.4$, which corresponds to a global significance of $4.3\sigma$. As expected, the large $\Delta\chi^2$ value arises mostly due to the discrepancies with respect to the SM in the LFU ratios, $R_K$ and $R_{K^*}$. The goodness of fit to data can be computed by calculating the p-value of the absolute $\chi^2$ of the best fit. This results in an 11% p-value, which is acceptable. The largest pulls of the best fit with respect to the measurements come from the lowest $q^2$ bins of the angular observables in the $B^0\to K^{*0}\mu^+\mu^-$ decays. This is a known issue [54] and has a small impact on the significance. Eliminating the lowest $q^2$ bin of all the angular observables decreases the $\Delta\chi^2$ by only one unit and the fit quality improves, leading to a p-value associated with the absolute $\chi^2$ of 24%.
While the $C_7^{(\prime)}$ WCs do not describe $b\to s\ell^+\ell^-$ contact interactions and are not included in the default analysis, we investigated the impact of adding them to the set of WCs we allow to be affected by NP. Imposing constraints on $C_7^{(\prime)}$ prior to the flavour anomalies from Ref. [55] and including the angular analysis of $B^0\to K^{*}e^+e^-$ from Ref. [56], the total significance marginally decreases, as expected, from $4.3\sigma$ to $4.2\sigma$.
Here we advocate that for claiming a discovery, the NP significance should be calculated using an agnostic approach. However, as discussed in Sec. II, there were good a-priori theoretical reasons to assume no NP in $C'_{9,10}$. To evaluate the significance of this hypothesis we apply our method to the reduced set of five WCs. The $\Delta\chi^2$ distribution is shown in Fig. 2 (bottom). Applying the same fit to data we obtain $\Delta\chi^2 = 30.5$, which, integrating the distribution, corresponds to a significance of $4.7\sigma$. Interestingly, this is similar to the values quoted in the recent literature [57][58][59] for single-parameter fits of theoretically clean observables only. Having a larger number of free parameters, one could have expected a lower significance in our case. However, in this specific case the LEE is compensated by two facts: i) the inclusion of the angular distribution of the $B\to K^*\mu^+\mu^-$ decay which, even after marginalizing over $\Delta C_9^U$, retains some sensitivity to the other WCs; ii) the overall higher $\Delta\chi^2$ obtained with more parameters. This observation reinforces the high significance of the $b\to s\ell^+\ell^-$ anomalies in motivated NP models.

V. CONCLUSION AND DISCUSSION
In conclusion, we have presented a method to evaluate the global significance for the NP interpretation of the $b\to s\ell^+\ell^-$ anomalies. This method transposes the known criteria used for discovering new resonances, such as the Higgs boson, into searching for NP in $b\to s\ell^+\ell^-$ transitions. It is worth emphasizing that, while it is remarkable that all data can be explained by fitting one or two WCs and that this observation can be used to investigate what are the interesting theoretical directions, this hypothesis has been made after having seen the data. Using the same hypothesis to evaluate the global significance of NP would be the Bayesian-inference equivalent of choosing the prior after having calculated the likelihood. Therefore, we advocate a more agnostic method to calculate the global NP significance with respect to the SM in $b\to s\ell^+\ell^-$ processes. To this end, we have calculated the LEE for the first time and shown that the trial-factor cannot be neglected.
We stress that the approach proposed in this paper should not be interpreted as a criticism towards existing attempts made so far of combining and interpreting the anomalies in motivated theoretical frameworks. We are simply addressing a different question. While current fits of selected WC sets in the $b\to s\ell^+\ell^-$ system only evaluate a local significance, these approaches are fundamental to obtain theory insights on the flavor anomalies. Similarly, there is a strong theoretical interest in trying to combine the $b\to s\ell^+\ell^-$ anomalies with other hints of deviations from the SM, such as the $b\to c\ell\nu$ anomalies [60][61][62][63][64][65][66][67][68] or the recent $(g-2)_\mu$ result [69,70] (see also [71]). However, this combination is not appropriate to establish a global significance, given that the hypothesis of a connection between different processes is made a posteriori, after having observed data.
We also recognise that our approach of treating $\Delta C_9^U$ as a nuisance SM parameter can be viewed as an overly conservative choice. Nevertheless, in the absence of a widely accepted estimate for the theory uncertainty of the non-local $c\bar c$ contributions, this is mandatory for a conservative estimate of the significance. While the uncertainties of all the measurements used here are statistically dominated, the results of our analysis can be improved by adding correlations of experimental systematic uncertainties and taking into account that they can follow non-Gaussian PDFs. Additional potential improvements concern the observables to be included. To simplify the numerical analysis we have only included the observables that are most sensitive.
For instance, observables such as $Q_5$ [54] measured by Belle [72] were not included in this work since these measurements are still not precise enough to have a sizeable impact. For the same reason, angular observables in $B_s^0\to\phi\mu^+\mu^-$ [73,74] decays are not considered. While the decay $B_s^0\to\phi\mu^+\mu^-$ is analogous to $B^0\to K^{*0}\mu^+\mu^-$ from the theory point of view, it is limited statistically due to the value of the fragmentation fraction $f_s/f_d$ [75] and to the fact that it is not a self-tagged decay.
While beyond the scope of this paper, a more rigorous approach that includes all observables and treats all correlated systematics is desirable in view of future combinations.
With current data, all these effects are expected to have a small impact and will not change the main conclusions presented here.
The global significance of 4.3 standard deviations we obtain for the NP hypothesis in the $b\to s\ell^+\ell^-$ system clearly demonstrates the potential of combining different measurements in this system, even when adopting an agnostic alternative hypothesis and a highly conservative theory approach. In view of future measurements, we advocate that experimental collaborations adopt this method to calculate the global significance of the new physics hypothesis in a conservative and unbiased way.

ACKNOWLEDGMENTS
This work was inspired by a discussion with Niels Tuning and a separate discussion with Joaquim Matias. We acknowledge their contributions in attracting our attention to this problem. We thank Konstantinos Petridis and Diego Tonelli for very useful discussions and comments. We also thank Abhijit Mathad for valuable crosschecks on the numerical results of the paper. This project has received funding from the Swiss National Science Foundation (SNF) under contracts 00021-182622 and 200021-175940, and from the European Research Council (ERC) via the European Union's Horizon 2020 research and innovation programme under grant agreement 833280 (FLAY).