New physics in $b\to s \ell\ell$ decays with complex Wilson coefficients

We perform a data-driven analysis of new physics (NP) effects in exclusive $b \to s \ell^+\ell^-$ decays in a model-independent effective theory approach with dimension six operators considering scalar, pseudo-scalar, vector and axial-vector operators with the corresponding Wilson coefficients (WC) taken to be complex. The analysis has been done with the most recent data while comparing the outcome with that from the relatively old data-set. We find that a left-handed quark current with vector muon coupling is the only one-operator $(\mathcal{O}_9)$ scenario that can explain the data in both the cases with real and complex WC with a large non-zero imaginary contribution. We simultaneously apply model selection tools like cross-validation and information-theoretic approach like Akaike Information Criterion (AIC) to find out the operator or sets of operators that can best explain the available data in this channel. The $\mathcal{O}_9$ with complex WC is the only one-operator scenario which survives the test. However, there are a few two and three-operator scenarios (with real or complex WCs) which survive the test, and the operator $\mathcal{O}_9$ is common among them.

'cross-validation' [27]. In accordance to our earlier publication [19], we use both AICc and cross-validation to pin down the best possible scenarios.
In the above equation, g is the strong coupling constants. The current-current and the QCD-penguin operators are taken from [28]. Though the primed operators and O S,P vanish or are highly suppressed in the SM, they could be relevant in some NP scenarios. For a proper organization of the perturbative expansion of the WCs, a normalization factor 1/g 2 has been introduced in front of the operatorsÕ i (i = 7, .., 10) [28]. To be consistent, a similar factor has been introduced in front of the primed operator. Note that we are treating the new WCs as free parameters and fitting them from data. Therefore, the discussion on this choices of a proper normalization is more relevant in the context of the SM operators and the corresponding WCs, in particular, in their renormalization group equation (RGE) evolution. For these operators and the corresponding WCs, it is possible to change the normalization to another commonly used basis [29,30]: We can see that both the basis of operators are equivalent. With a change in the normalization of the operators, there will be an appropriate scaling in the respective WCs. As mentioned above, since we fit the new WCs from data, working in any basis should be fine with the proper scaling of the corresponding WCs. In our analysis, we have considered the NP effects in the following set of operators (non-tilde basis): The relevant WCs are the following: C 7 , ∆C 9 , C 9 , ∆C 10 , C 10 , C ( ) S , and C ( ) P . The amplitude and the corresponding rate distributions in B → K ( * ) decays are defined using the hadronic form factors computed in full QCD. Our treatment of the decay amplitude in B → K is closely related to that given in ref. [29] and that for the B → K * decays are taken from [20]. The SM WCsC 7 ,C 9 andC 10 are taken in next-tonext-to-leading logarithmic (NNLL) approximation, the details of which are provided in [28]. The contributions from the four-quark current-current operators O q 1 , O q 2 (with q = u and c) and quark-penguin operators O 3.. 6 are included in an effective coefficientC ef f 9 of the operator O 9 which also includes the leading order (LO) charm-loop effects. Note TABLE I: List of a few observables with pulls > 2 of the 'Likelihood dataset 2020' and 'Likelihood dataset 2016'. The superscripts on the observables indicate the q 2 range in GeV 2 . The corresponding SM predictions can be seen from [19,29]. thatC ef f 9 is in general complex. A part of the contributions from the operators O 3.. 6 are also included inC ef f 7 . The perturbatively calculable non-factorizable corrections to the operators O 1 and O 2 at order α s are taken from [31,32] and added as a correction toC ef f 9 . Other non-factorizable corrections which have been considered in this analysis include the weak annihilation and the hard spectator scattering contributions as given in [33,34]. Note that both these additional contributions include the charm loop effects, and they are complex. In this analysis, we have not considered the contributions from the non-factorizable power corrections which are expected to be very small as pointed out in a very recent analysis [35]. When the final state contains a vector meson, one can construct various helicity/transversity amplitudes like A ⊥L,R , A L,R , A 0L,R etc., which are then used to form angular coefficients relevant in defining the CP-symmetric and asymmetric observables measured by the different experimental collaborations. As mentioned earlier, details about various transversity amplitudes and the respective angular coefficients can be obtained from [20]. Here, the non-factorizable corrections are considered as additional contributions ∆A ⊥L,R , ∆A L,R and ∆A 0L,R to the transversity amplitudes for details see section 4.3 of ref. [20]. In the SM, the complex phases will appear in all the A iL,R 's and ∆A iL,R 's.
The NP contributions to operators O 9,10 are given by ∆C 9,10 . We have not explored the possibility of new physics effects in O 7 which is tightly constrained from the available data on inclusive and exclusive radiative decays. However, we have considered new physics effects in the chirality flipped operator O 7 , the real part of the corresponding WC C 7 is also tightly constrained by the data. However, as pointed out in [36], the direct CP-asymmetry A CP in the inclusive b → sγ decays tightly constrained the imaginary part of C 7 along with it's real part, though it has marginal effect in the Im(C 7 ). Note that in our analysis, we are not using A CP (b → sγ) as an input since both its predicted and the measured values have large error. We have constrained both the real and imaginary parts of the C 7 from all other available data as will be discussed below, and we do not expect the currently available inputs on A CP (b → sγ) to provide tighter constraints on Im(C 7 ).

III. EXPERIMENTAL INPUTS
In what follows, we categorically present the experimental inputs in our analysis. Our main analysis is based on the following data sets: • Likelihood dataset 2020: (i) Measured values of the angular observables in B 0 → K * 0 µ + µ − decays in different bins, gathered from ref. [6] (LHCb).
(iv) Binned data on the differential branching fractions and angular observables (CP-averaged and asymmetric) for B s → φµ + µ − (LHCb) from ref. [42]. Other inputs are: For the purpose of comparison we perform our analysis over two other datasets apart from the set mentioned above, which we call the (i) Moment 2016 dataset, (ii) Likelihood 2016 dataset, which are as given below • Moments dataset 2016: Values for the angular observables measured by the "Method of moments" in B 0 → K * 0 µ + µ − decays in different bins, gathered from refs. [5] (LHCb). • Likelihood dataset 2016: Values for the angular observables due to the "Unbinned Maximum likelihood" in B 0 → K * 0 µ + µ − decays in different bins, gathered from refs. [5] (LHCb). The only difference between the datasets mentioned above is subject to the angular observables in the B 0 → K * 0 µ + µ − sector due to LHCb. The inputs discussed in items (ii), (iii) and (iv) of our primary dataset (Likelihood dataset 2016) are included in the Moments dataset 2016 and Likelihood dataset 2016 as well. Beyond the fit scenarios discussed above, we have also carried out an analysis without considering the inputs on CP-asymmetric observables in B s → φµµ decays.
In addition to other inputs (e.g. CKM matrix elements) [47], we have included the lattice input f Bs = 0.2284±0.0037 GeV [48] in all the fits. Unless otherwise specified, all numerical uncertainties quoted in this analysis denote the 1σ (68.27% c.l.) range. Note that in [6], LHCb did not update the measurement on CP-asymmetric observables in B 0 → K * 0 µ + µ − decays. Therefore, to check whether they provide tighter constraints on the complex WCs, we incorporate those measurements [5] in a different fit along with the data defined as Likelihood dataset 2020. This additional set of B 0 → K * 0 asymmetric observables is completely uncorrelated with the Likelihood dataset 2020. Also, we consider the bins for q 2 ≤ 6 GeV 2 to avoid any contamination from the charm resonances. Other relevant information about the inputs from theory can be obtained from our earlier publication [19] and the references therein.

IV. ANALYSIS AND RESULTS
Our 'Likelihood dataset 2020' contains a total of 224 observables. We first check whether each of the new operators defined in eq.4 can independently explain the present data after including all these observables. To do so, we perform a frequentist statistical analysis optimizing a χ 2 statistic which is a function of the relevant WCs. Separate covariance matrices are constructed for statistical (systematic) uncertainties where necessary. In the post-process for each fit, we obtain the corresponding fit-quality using p-value. For other technical details about the fit, we refer to our previous study [19]. We perform separate analyses with complex, as well as real WCs.
For the analysis with all data, we obtain a very poor quality fit for all the one operator scenarios with the respective p-values << 1%. It is hard to get a statistically significant result for the NP to fit all the 224 data points that  we consider. There must be some data points that can not be explained alongside the rest of the data. Note that O 9 is the only one operator scenario in both the analyses with complex and real WCs having a relatively better p-value, though it is < 1%. Table III shows the results of the fits corresponding to O 9 only. In all other one operator scenarios, the fit quality is very poor, with the respective p-values ≈ 0. Parameter uncertainties are obtained from both the Fisher-matrix and the 1-dimensional (1-D) profile-likelihood of the parameter of interest. We have repeated the fit with the new 'Likelihood 2020 datasets' in two more subsets: (i) without considering the inputs from the CPasymmetric observables in B s → φµµ, (ii) including the inputs from the CP-asymmetric observables in B → K * µµ from 'Likelihood 2016 datasets'. For comparison, the results for these different datasets corresponding to the oneoperator scenario O 9 are presented in the same table III. The allowed confidence intervals for Re(∆C 9 ) are consistent with each other for all the three different fits, and large imaginary contributions are allowed by the present data even after dropping the CP-asymmetric observables in B s → φµµ and B → K * µµ from the fit. We have given the values of the respective χ 2 SM and mentioned the corresponding p-values (in SM) for all the three different scenarios. For all these cases, following ref. [49] we have calculated Pull SM which is defined as Here, F is the χ 2 cumulative distribution function and n dof is the associated number of degrees of freedom. In this case we are comparing the respective NP scenarios against the SM by defining Pull SM . A large value of this quantity indicates a large deviation from the SM. The respective values of the Pull SM are given in the third column of the above table.
In the following we will discuss the possibility of improvement in the statistical significance of the fits. To proceed further, we have computed the pulls of the SM estimates w.r.t the corresponding measured values for all the observables that make up the different datasets as discussed above. The pull corresponding to the i th observable O i is defined as: where ∆O i corresponds to the uncertainty of the data, including theoretical uncertainties. The SM predictions are provided in [29]. From our analysis, we found a few angular observables (listed in table I) whose measured values (from LHCb) have pulls greater than 2 w.r.t the corresponding SM estimates for the likelihood dataset 2020. The table also shows the pulls corresponding to the likelihood dataset 2016 for the same observables so that the trend of the data and the differences between the two sets become clear. , respectively. Besides this, using results of fit to all data, we compare the fitted values (in an NP scenario ) of each observable to Observables in List-1 with the respective pulls in the one-operator scenario O9 BR(B 0 → K 0 µ + µ − ) [1,6] [4,6] (ATLAS) → 2.90 P [4,6] (ATLAS)→ 2.10 P [4,6] (ATLAS) → 2.02 The respective values of pull N P for the observables in List-1 which are obtained using eq. 7 for the one-operator scenario O9 with complex WC. For the scenario with real WC also, these observables have pull N P > 2. The superscripts on the observables indicate the q 2 range in GeV 2 .
their measured values by defining the pull, which for the i th observable is given by Here, O i (C N P k ) is the predicted value of the i th observable in a new physics scenario with the best fit value of C N P k . A few observations from the pull analyses of eq. 7: 1. Apart from O 9 , in all the other one-operator NP scenarios, the data points listed in tables I and II with a pull > 2 are also selected with a pull N P > 2.
2. Only for the one-operator scenario O 9 , the three data points: P 3. Note that apart from the data points given in table I and II, the LFUV data points R (LHCb) also have pull > 2 compared to their SM estimates. However, in different NP scenarios, the estimates of pull N P for R (LHCb) in various NP scenarios including O 9 . As will be discussed later, it is hard to explain the data on R [0.045,1.1] K * 0 (LHCb) in the NP scenarios that we are considering in our analysis. We have included all these important data points on LFUV observables in our fits. 4. In all the NP scenarios, a few additional data points are selected with a pull N P > ∼ 2. The data points which are common among all of them are the following: [1,6] (Belle).

5.
Note that in the one-operator scenario O 9 , the four data points mentioned in item 4 are the only additional points which are picked up with a pull N P > 2 in addition to what we have discussed earlier in points 1 and 2.
6. For the one operator scenarios other than O 9 the number of data points with pull N P > 2 are more than the four data points mentioned in point 3.
7. Note that the data points: [1,6] (Belle) (which are common to all other datasets) are picked up with a pull N P > 2 in the analyses with other datasets.
Following the above discussion and with the primary intention to get a fit with allowed p-values, we prepare a list of data as given below which we will drop while fitting, • List-1: This list is prepared from the Likelihood dataset 2020. It contains the observables listed in tables I and II for the same dataset with a pull > 2.5. As discussed above, the optimised observable P has not been included in this list. Also, it includes P 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), S 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), dB [1,6] (Belle). Including all these, the list contains 12 data points which are explicitly shown in table IV. For the observables listed in List-1, the respective values of the pull N P are given in table IV; these values are obtained for the one-operator scenario O 9 with complex WCs. For the same scenario with real WC the respective  pulls are also greater than 2. Following the discussions in items 2 and 3 above, we are not dropping the following data points from LHCb in the analyses: P We repeat all the fits with one operator after dropping these 12-data points from our complete list. After dropping the observables given in List-1, we have noticed a considerable improvement in the overall fit quality. Other than O 9 , the fit-qualities for all the other one-operator scenarios, though better than those before dropping these data, are still very poor. For O 9 , we have presented the corresponding results in table V. The respective improvements in the fit scenarios compared to those given in table III are apparent. Note that the allowed parameter spaces for ∆C 9 are consistent in the fits with and without the data dropped from List-1, respectively. We observe a slight improvement in the fit-quality for the analysis with complex WCs compared to real WCs.
For the current datasets, we do not have many references to cross-check our findings. At the moment, we are aware of a few analyses which have used 2020 LHCb datasets [50][51][52][53]. However, none of these considers the exact same dataset as ours; only a subset has been used. At least in [52,53], the authors perform a global fit of complex C 9 WC including LHCb 2020 data on B 0 → K * 0 µ + µ − [6]. In ref. [51], the authors have performed the analysis with 117 data points which include a considerable number of data from Λ b → Λµµ. The analysis of ref. [50] finds the above scenario to be favourable. However, in [50], the authors have considered a total of 180 data points. Although many of them coincide with ours, there are appreciable differences. For example, they have not considered any CPaveraged observables (S i ), forward-backward asymmetries A F B from LHCb, measured values of branching fractions and isospin asymmetries from Belle, a few optimized observables in large bins from LHCb, asymmetric observables in B s → φµµ decays etc. Also, they have considered a few data points corresponding to the high-q 2 regions, which we don't consider. Therefore, in general, it would be hard to compare our findings with them directly. However, we would like to mention that out of the 12 data points listed in table IV only 4 data points are considered in the analysis of [50]. They have included the following four data points: [4,6] (ATLAS), P 6 (B 0 → K * 0 µ + µ − ) [4,6] (LHCb) and , rest of the 8 data points are not considered.
A few additional remarks on the impact of the data points listed in table IV on the fit to 'Likelihood dataset 2020': • Note that the data points listed in tables I and II with higher pulls (w.r.t. the SM values) also have higher values of pull N P obtained using eq. 7. • In the analysis of Likelihood dataset 2020, if we drop only the observables listed in the left column of table IV, we get an allowed fit for the scenario O 9 with a p-value ≈ 7%. • We have noted that the inclusion of data points: P 4 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), P [4,6] 6 (LHCb) and [4,6] (ATLAS) from List-1 reduce the fit-quality from 62.5% to 36%. In addition, if we include dB , the p-value of the fit will reduce further by a few percent. Hence, we can get an allowed fit with an appreciable p-value for the scenario O 9 even if we include these three inputs on optimized observables. Estimation of the parameter-uncertainties from the Hessian matrix of the χ 2 assumes Gaussian nature for the said uncertainties, and is not always accurate. On the other hand, confidence intervals estimated from the actual 1-D profile likelihoods correctly capture the non-uniformity of the uncertainties (also, ambiguities), and thus are considered to be more accurate. Following this reasoning, we have chosen to showcase our results in the latter form in table V. The corresponding 1-D and 2-D profile likelihoods for the real and imaginary WCs have been depicted as 1-CL plots, which are shown in fig. 1. In 2-D, x σ CL corresponds to the region whose projection on both axes would bound x σ CL for the corresponding 1-D parameter space. For example, in 2-D, 1 σ ≡ 39.347%CL.
As mentioned earlier, in figs. 1a and 1b, we have compared the 1-D profile likelihoods of Re(∆C 9 ) and Im(∆C 9 ) obtained from fits to the different datasets as given in table V. In fig.1c, we have shown the corresponding 2-D profile likelihoods in the Re(∆C 9 ) − Im(∆C 9 ) plane. Following are the observations made from these profile likelihoods: • As shown in the table V, the allowed confidence intervals for Re(∆C 9 ) are consistent in all the three fit scenarios.
• At 68.3%CL, we note a significant shift in the allowed regions of Im(∆C 9 ) in the fits with and without the CP-asymmetric observables in B → K * modes, and they are almost mirror images of each other. • The fit to the datasets without the CP-asymmetric observables in B → K * modes has a slight preference for positive solutions for Im(∆C 9 ), though negative solutions are allowed. Also at 95%CL, the allowed regions are consistent with zero (see figure 1c). • On the other hand, the fit to the datasets with all the CP-asymmetric observables in B → K * and B s → φ modes prefers negative solutions for Im(∆C 9 ), though positive solutions are also allowed. In this case, even at 68% CL the allowed regions are consistent with zero (see figure 1c). • In the fit with all the CP-asymmetric observables dropped, both positive and negative solutions for Im (∆C 9 ) are allowed and both of them can be considered favorable. As can be seen from fig. 1c, at 68.3%CL we have two disconnected regions that are almost symmetric w.r.t zero; while at 95%CL we get a zero consistent solution.
In particular, we have checked that the χ 2 min for the negative and positive solutions are 194.93 and 195.03, respectively 1 .
• The allowed ranges of Im(∆C 9 ) will be sensitive to the choices of the CKM elements since in the SM, the complex part of C 9 is sensitive to the CKM element V ub which has a phase. In this analysis, we have presented our results using the inputs from CKMfitter group [47] where the analysis has been done assuming SM for the ∆F = 2 processes 2 . It is possible to understand these observations from a study of the dependencies of the CP-asymmetric and other associated angular observables in B → K * and B s → φ decay modes on Im(∆C 9 ). We have figured out a few CPasymmetric observables in B s → φµµ decays, for example, A and A [2,5] 8 for which, at the present level of accuracy, the current measured values prefer the positive values of Im(∆C 9 ) slightly. The dependence of these observables on Im(∆C 9 ) for the best fit value of Re(∆C 9 ) are shown in figure 2. Note that though these observables prefer a positive value for Im(∆C 9 ), negative values are also allowed. This is also evident from the corresponding profile likelihood as discussed above. On the other hand, from the CP-asymmetric observables in B → K * modes we have found that A favor negative values of Im(∆C 9 ), which are shown in fig. 2, though positive values are also allowed. In general, for both B → K * and B s → φ modes the dependence of A 8 on Im(∆C 9 ) should be similar. However, as can be seen from the figures, the deciding factors in this case are the respective measured values. Note that the measured values of A 8 in B s → φ have large errors and are slightly more positive than the ones measured in B → K * modes. The data for the B → K * modes being relatively more precise, they play the dominant role in choosing the favorable solutions for Im(∆C 9 ) when combined with the CP-asymmetric observables due to B s → φ.
In all the three fit scenarios, with and without the CP-asymmetric observables, a considerable value of Im(∆C 9 ) is allowed. Note that large values of Im(∆C 9 ) are allowed even when the CP-asymmetric observables in B → K * and B s → φ decays are not included in the fit, though at 95%CL the solutions are consistent with zero. The requirement of a large Im(∆C 9 ), even in the absence of any CP-asymmetric observables can be understood by looking at the dependence of other observables on Im(∆C 9 ). We have pointed out a few such observables in this work. In figs. 3a, 3b, 3c and 3d, we have shown the variation of F L , A F B , P 5 and R K + respectively with Im(∆C 9 ) for the allowed values of Re(∆C 9 ) (at 68%CL), as given in Table V. We display only those q 2 -bins that exhibit the possibility of a large non-zero imaginary contribution from ∆C 9 . As can be seen from the figures, we require a large non-zero Im(∆C 9 ) to explain the experimental observations within their 1-σ confidence interval in a few specified q 2 -bins. Moreover, the allowed values are symmetric about Im(∆C 9 ) = 0. Though it is too early to conclude, at the moment it might be hinting towards the possibility of a large imaginary contribution to ∆C 9 . However, further assertions and hence solid conclusions must be subject to more precise data, hopefully in the near future.
In passing, we also carry out the fit for the interesting scenario ∆C 9 = −∆C 10 . The analysis with the LHCb 'Moment datasets' shows this scenario to be a favorable solution for the data, which is in agreement with earlier observations (see, for example, [29,55]). Also, for the 'Liklihood 2020 datasets', this scenario has been shown as a favourable one in [50]. For the NP scenario as mentioned above, we have done a comparative study in all the different datasets given earlier. At first, we have carried out the fit to this scenario after dropping the inputs given in 'List-1'. The corresponding result is shown in the first row in Table VI. The fit has a p-value ∼ 6%, which can be considered as a marginally significant result. Therefore, to look for a possibility of improvement in the quality of fit, we have prepared a second list with the following observables, • List-2: It contains all the observables in table II and the observables from LHCb likelihood 2020 dataset in    . On top of this, we have dropped few more observables like P 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), S 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), dB dq 2 (B + → K + µ + µ − ) [0.1,0.98] (LHCb) and BR(B + → K + µ + µ − ) [1,6] (Belle). Overall, this list contains 19 data points.
Additional observables in List-2 with the respective pulls in the one-operator scenario O9 In table VII, the respective pulls for the additional observables in List-2 for the one-operator scenario O 9 is shown. For the scenario ∆C 9 = −∆C 10 also, these observables have similar pulls.
We repeat the fit for the new physics scenario as mentioned above after dropping the 19 data points as listed in List-2 4 . The corresponding results are given in the third row of the table VI. We noticed improvement in the fit quality in terms of the respective p-values.
We have also noted improvement in fit quality for the scenario with a single operator O 9 . For example, after dropping the observables listed in 'List-2' in the scenario "Likelihood 2020 dataset", the results are as given below: • without the CP-asymmetric observables in B → K * : Re(∆C 9 ) → −1.08 ± 0.11, Im(∆C 9 ) → 1.20 ± 0.38, p-value 95%, • with the CP-asymmetric observables in B → K * : Re(∆C 9 ) → −1.08 ± 0.11, Im(∆C 9 ) → −1.03 ± 0.48, p-value 93.3%. Inspite of the improvement in the respective p-values in all the other one operator scenarios, they can not be considered as a good fit. 3 Note that the only available data on A is from LHCb; therefore, we keep it in our analysis, though these measurements have large errors and a pull > 2. Following the earlier discussion, we don't drop P [4,6] 5 , S [4,6] 5 , and P [4,6] 2 from the analysis 4 In this scenario, we perform the analysis with 205 number of data-points which is still larger than that was considered in [50] where they have got a fit-probability of about 50% for the NP scenario ∆C 9 = −∆C 10 .
Observables from List-1 or List-2 Other Experiment LHCb NP prediction (O9) BR(B 0 → K 0 µ + µ − ) [1,6] (Belle) (0.31 ± 0.19) × 10 −7 (Belle) (0.92 ± 0.17) × 10 −7 (1.12 ± 0.21) × 10 −7 BR(B + → K + µ + µ − ) [1,6] [2,4.3] 0.85 ± 0.35 (CMS) N.A. 0.0225 ± 0.0005 In table VIII, we have compared the predicted values of the observables from List-2 (or in List-1) in the one operator scenario O 9 with the respective measured values from LHCb and other experiments. This table discusses 9-data points from Lists, out of which eight are measured in experiments other than LHCb. As compared to List-1, in List-2 we have added 7 more observables out of which the predicted value of F H (B + → K + µ + µ − ) [2,4.3] in the NP scenario O 9 has been given in table VIII. Note that the observables P 6 , P 8 , P 1 and P 3 are related to S 7 , S 8 , S 3 and S 9 , respectively. Therefore, any observed deviations in P 6 , P 8 , P 1 and P 3 are related to that in the measured values of S 7 , S 8 , S 3 and S 9 , respectively. Based on all these and the discussion above, a few useful remarks about the data points in List-2 or List-1 are in order: • We have dropped the two data points BR(B 0 → K 0 µ + µ − ) [1,6] and BR(B + → K + µ + µ − ) [1,6] measured by Belle. As can be seen from table VIII, the measured values of these branching fractions by Belle and LHCb are not consistent with each other, at least within their 1-σ error bars. The LHCb data points are included in the analysis. In the new physics scenario, along with the respective parameter spaces favoured by all the analysed data points, the predicted values of the above two branching fractions are in good agreement with that measured by LHCb while these are in tension with Belle. The measured values by Belle have large errors as compared to LHCb. It is essential to mention that many other data points from LHCb on the same branching fractions in different small q 2 bins are included in the analysis, which is comfortably explained by the allowed NP scenario. For example, the dataset includes dB dq 2 (B +,0 → K +,0 µ + µ − ) from LHCb in all the bins of q 2 in between [0.98,6]. Also, we have dropped dB dq 2 (B + → K + µ + µ − ) [0.1,0.98] by LHCb, as can be seen from the table predicted value is in tension with the measured one. As discussed earlier, this data point has minimal impact on the fit probability (p-value).
• The measured value of A I (B → Kµ + µ − ) [1,6] by Belle has large errors and the allowed NP solution has negligible impact on this observable. • We have presented our main analyses after dropping the data points P [4,6] 4 , P [4,6] 5 , S [4,6] 4 and S [4,6] 5 measured by ATLAS since these data points have large pulls. The reason is clear from the observation made in table VIII. The data points have large errors and are inconsistent with the relatively precise data from LHCb. The allowed NP solution can easily accommodate these LHCb data points alongside all the other data considered in the analyses. While the respective Belle data largely deviate from the one predicted in the allowed NP scenario.
• In the analyses, we have dropped the data points P from LHCb. For the same observables, we include the data points in three other small q 2 bins in the region 0.1 ≤ q 2 ≤ 4 GeV 2 . Note that the SM predictions are small, these observables have very low sensitivity to the variation of ∆C 9 [1,56]. We have observed that it is hard to explain the current measured values in the above-mentioned bins for the allowed NP solutions. The data points in the other three bins, as mentioned above, are consistent with the predicted values in the SM. Here, we would like to mention that even though the above four data points impact the statistical significance of the results, if we keep them in the analysis, we get results with allowed p-values.
• CMS has measured F H (B + → K + µ + µ − ) [2,4.3] with an error ≈ 40%. The corresponding prediction in the NP scenario O 9 is given in as these observables are insensitive to ∆C 9 [1,20,56]. In the respective bins, the predictions for the allowed NP scenarios do not change from the respective SM predictions. Here, we would like to mention that the measurements are also available in three more bins for each of the observables, and they are all consistent with the respective SM predictions.
• Out of the seven observables as mentioned in the last item, F H (B + → K + µ + µ − ) [2,4.3] , P have major impact in determining the quality of the fit. For example, in the analysis of NP scenario O 9 , alongside the data points in List-1 if we drop only these three data points (with large pulls) then the p-value of the fit increases from 62.5% (table V) to 87%.
• We have checked that if we drop the Belle and ATLAS data, and F H (B + → K + µ + µ − ) [2,4.3] from CMS from List-1, we get acceptable fit quality ∼ 10 % for the complex O 9 scenario. Therefore, to get an acceptable fit we don't have to drop any LHCb data. All the observations mentioned in the above items are also applicable for the scenario O 9 = −O 10 . We can not explain a few data listed above in the allowed NP scenario because of their low sensitivity to this scenario. Also, we notice that to get an acceptable fit, we don't need to drop any LHCb data. Still, the above exercise is helpful for one to see the impact of very few data points in determining the statistical significance of the analysis without changing the outcome. However, we should stress that the data suffer from large error and further conclusions must wait for more precise data.
As mentioned earlier, we present our results based on the most recent 'likelihood' data (by LHCb). To understand the trend of the data, we carry out a fit using the old-data (Run-1: Likelihood and moments) on angular observables from LHCb [5] with complex WCs. The 'Likelihood datasets 2016' and 'Moment datasets 2016' contain a total of 259 and 281 data points respectively. For the 'moment datasets', including all these observables, one can fit the one-operator scenario O 9 with an allowed p-value. However, it is hard to fit the other one-operator scenarios with allowed p-values, including the ∆C 9 = −∆C 10 scenario. On the other hand, for 'Likelihood datasets 2016', fit to any NP scenario, including all the observables has a fit-probability p << 3 %. As discussed earlier, the data points: [4,6] (ATLAS), S 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), dB dq 2 (B + → K + µ + µ − ) [0.1,0.98] (LHCb) and BR(B + → K + µ + µ − ) [1,6] (Belle), and the ones given in table II are common to all the datasets. To be consistent with the earlier analyses, we prepare a third list consisting of these observables only: • List-3: It includes P 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), S 5 (B 0 → K * 0 µ + µ − ) [4,6] (ATLAS), dB [1,6] (Belle) in addition to the observables listed in table II with a pull > 2.5. After dropping the data points given in List-3 from the old-datasets, the fit results for the scenario ∆C 9 = −∆C 10 is given in the last two rows of table VI. Note that in both the datasets, we have results with appreciable fit-probabilities.
For the one-operator scenarios, the comparative results are shown in Table XI in the appendix. We have dropped the inputs on the observables given in List-1 for the "Likelihood 2020 datasets". Note that in the analyses involving the "Likelihood 2016 datasets" and "Moments 2016 datasets" we have dropped the inputs given in List-3. We find that O 9 is the only one operator scenario that can comfortably explain all these data-sets, and a large value of Im(∆C 9 ) is allowed by the current data. The fit with the "moment-dataset" results in appreciable fit probabilities for all the one operator scenarios, as seen from table XI. The old-data-set includes the CP-asymmetric angular observables in B → K * decays while the 2020 update [6] does not. With a p-value ≈ 71.6%, the fit with O 9 for the old Likelihood data is slightly better than the one we obtain for the new-data-set, though the value of the corresponding χ 2 min is larger in the old-dataset as compared to the new dataset. This could be due to the relatively large number of observables in the old datasets. As can be seen from table XI, the magnitude of the best fit value of Re(∆C 9 ), corresponding to the new dataset reduces by ≈ 10% as compared to the one obtained from the old likelihood-data.

V. MODEL SELECTION
As seen above, O 9 is the only one-operator scenario that can comfortably explain the present 'likelihood' data. In the future, more precise data might prefer more complex multi-operator scenarios or models. However, with the increasing complexity of a model, its predictive capability deteriorates. Thus, model selection needs to take both goodness of the fit and the complexity of the competing models into account. Selecting the best model from a set of candidates for a given set of data is thus not an easy task. Several criteria are available in the literature that achieve this. Keeping at par with our earlier publication [19], in order to measure model performance and select the best model from a set of potential models we use the mean squared error (MSE) and small-sample-corrected Akaike's Information Criterion (AICc), which is defined as where n is the sample-size and K is the number of estimable parameters. The applicability of these criteria and other details are provided in [19].   For model selection, 'cross-validation' (CV) is the most generally applicable, powerful, reliable, and computationally expensive method. The most straightforward and expensive version of cross-validation is "leave-one-out cross-validation" (LOOCV). In LOOCV, one of the data points is left out and the rest of the sample ("training set") is optimized for a particular model. Then that result is used to find the predicted squared error (SE) for the left out data point, which is defined for the i th observable as given below where σ exp and σ theory are the experimental and theory errors of the the i th observable. This process is repeated for all data points and a mean-squared-error (MSE) is obtained using all those residuals. This process is repeated for all models. The models with the least MSE are the best ones. Although leave-one-out-cross-validation (LOO-CV) is asymptotically equivalent to AICc, there are differences between the two. Theoretical considerations aside, AICc is just likelihood penalized by the degrees of freedom. Evidently, AICc accounts for uncertainty in the data (-2Log(L)) and makes the assumption that more parameters leads to higher risk of over-fitting (2k). Cross-validation (CV) just looks at the test set performance of the model, with no further assumptions. There is no explicit measure of model complexity, unlike AICc. Clearly, AICc penalizes model complexity more than CV. The accepted practice in the literature is that if one cares mostly about making predictions and assumes the test set(s) to be reasonably similar to the validation sets, one should go for CV (only with large number of data).
Following the reasoning in the paragraph above, it seems clear a priori that models with low MSE but not selected by AICc may have more parameters. Furthermore, this study clearly finds that under the simultaneous application of both AICc and CV, the models cluster in such a way that results in a more robust selection of models than the use of any one of these criteria alone. So the fact that some of the selected models by CV are further discarded by AIC due to relative complexity is actually quite a non-trivial finding.
As shown from eq. 4, there are 9 operators and corresponding WCs that we are considering. Considering all of them to be real, there are k=1,9 (9!/k!(9 − k)!) = 511 possible combinations, i.e. scenarios to be considered. If we consider the WCs to be complex in general, disregarding the combinations with a mixture of real and complex WCs, there are exactly those many cases where the WCs are complex. In total, there are 1022 possible combinations of the coefficients forming a predefined global set of different scenarios. Thus, the considered scenarios only include cases where the WCs are either all real or all complex. Among several competing models, we select the best ones that explain the data by using the conservative limit of ∆AICc ≤ 6 and with a low value of MSE X-val < 1.5. There is no particular reason for choosing the allowed values of MSE X-val lower than 1.5, it is just a choice. The idea is to check that the selected scenarios in AIC c should not have a very high MSE score. Also, we have noted that most of the models which are not selected by both the criteria have MSE score >> 1.5. Out of these 1022 possible combinations, only a few are selected by these criteria and are listed in table IX. We rank the selected models according to the value of ∆AICc. Note that O 9 with complex WC, though not the best model, is the only one-operator scenario passing all the selection criteria. Some two, three and four-operator scenarios are selected, and all of these contain O 9 (with real or complex WC) as one of the operators. Note that apart from model 18, all the selected models have complex WCs. Here, we have dropped a few selected complex models with five operator scenarios. The best model is a three-operator scenario: [O 9 , O 9 , O 10 ] with complex WCs. As expected, the real parts of the WCs are large. Also a large non-zero contribution from Im(∆C 9 ) is required in all the selected cases. In a few cases large non-zero contributions from Im(C 9 ) is also allowed. Note that we have obtained the above results in the absence of the CP-asymmetric observables from B → K * decays. We have explicitely checked that this conclusion holds even if we drop the asymmetric observables in B s → φµµ decays. For this particular dataset χ 2 SM = 288.9. As we have discussed earlier, our main datasets due to 'Likelihood 2020' do not contain the CP-asymmetric observables of B → K * decay modes. It is expected that these observables will be more sensitive to the imaginary parts of the WCs. As shown in table V, we have carried out a fit after adding the data on these observables from the 'Likelihood 2016 datasets' with those in 'Likelihood 2020 datasets'. The results show that the allowed range of Re(∆C 9 ) remains the same, while the Im(∆C 9 ) changes its sign because of a few asymmetric observables as shown earlier. We have checked that almost all the models given in table IX are selected even after incorporating the additional datasets as mentioned above. The corresponding details are provided in table X. There are a few changes in the ranking of the respective models. Now in the AIC c ranking, the two operator scenario [O 9 , O 9 ] with real WCs is slightly ahead of the three operator scenario [O 9 , O 10 , O 9 ] (model-585) with complex WCs. This is due to a slight increase in the value of χ 2 min /DOF for the model-585. There are a few more complex models which are selected for 4 < ∆AIC c < 6 but are not listed in this table. The one operator scenario O 9 with real WC appears in the list with a value of ∆AIC c = 5.032 and MSE X−val = 0.964, respectively. The same scenario has a relatively high value of ∆AIC c = 8.51 in the analysis without the CP-asymmetric data in B → K * modes. Also, the two operator scenario [O 9 , O 10 ] with real WCs is now selected by the data with ∆AIC c = 5.569 and MSE X−val = 0.967. Earlier this model was not selected because of its high value of ∆AIC c . On the other hand, the two operator scenario [O 9 , O 10 ] (model no. 530) with the complex WCs is not selected by the present datasets since it has ∆AIC c = 7.32, though the associated p value is 61.58%.
We note from table X that in all the two or more operator scenarios, the two operators O 9 and O 9 are common and Im(C 9 ) has a sizable positive contribution that is not consistent with zero at 68% confidence interval (CI). However, Im(∆C 9 ) is consistent with zero even at 68% CI, though it could be large within the allowed interval since the estimated error is large. The differences in the allowed CIs of Im(∆C 9 ) between the results presented in tables IX and X could be because a large positive value of Im(∆C 9 ) is not allowed by the CP-asymmetric data in B → K * modes (see Figs. 2c, 2d, 2e and 2f). Similarly, in the relevant scenarios Im(∆C 10 ) or Im(C 10 ) could be large while being consistent with zero. Fig. 4a shows the predictions of R ( * ) K and P 5 in different q 2 bins for our selected models from table IX, while comparing them with the corresponding measurements. Note that here the correlations between the SM and the NP parameters have been neglected. In all the models, the predictions are consistent with the measurements of R ( * ) K by Belle. Our best model can accommodate all the observed data on R ( * ) K and P 5 at 1-σ CL, except the R low K * (LHCb). In fact, none of the selected models can explain the R low K * (LHCb) at 1-σ. At low-q 2 the B → K * decay rates are dominated by 1/q 2 enhanced photon contributions and as pointed out earlier in several model independent analyses [15,55,57], a NP contact interaction that explains R K and R Cen K * affects R Low K * typically by at most 10%. Also, the possibility of explaining R Low K * with tensor operators has been addressed in ref. [58], which we are not considering in this analysis (see also [59]). Note that in all the selected models, the predicted results are consistent with their respective measured values at 2-σ. In model 641, the predictions of R K and R Cen K * have relatively large errors compared to that in R Low K * . This is due to the large errors in the fit result of Im(∆C 10 ), while R Low K * is very less sensitive to these WCs. Similar predictions/comparisons are provided in Fig. 4b for the selected models in table X. No noticeable changes are observed except now the errors in R K and R Cen K * have reduced because of the reason mentioned above.

VI. SUMMARY
Following a model-independent effective theory approach with dimension-six operators, we have analyzed the new physics effects in b → s decays, based on the data available till date. To the best of our knowledge, we are analyzing the relevant operator basis with complex WCs for the first time in literature. We have found that O 9 is the only one-operator scenario with both real and complex WC (with a large non-zero imaginary part), which can provide a plausible explanation of the given data. This is the case even if all the CP-asymmetric observables are dropped from the fit. We have pointed out the corresponding CP-averaged and CP-asymmetric observables which could be the probable source of such large imaginary contributions. Given the data, we have used the method of model selection incorporating both AIC c and cross-validation to pinpoint the best possible combination of operators with real and complex WC, which can best explain the data. The scenario with O 9 is the only one-operator scenario which passes the test. However, there are a few two, three and four-operator scenarios which have passed all the criteria set by the selection methods. Allowed confidence intervals of the new WCs are shown. For the selected models, we have provided predictions for various observables and compared them.