Simultaneous Explanation of the $R_K$ and $R_{D^{(*)}}$ Puzzles: a Model Analysis

$R_K$ and $R_{D^{(*)}}$ are two $B$-decay measurements that presently exhibit discrepancies with the SM. Recently, using an effective field theory approach, it was demonstrated that a new-physics model can simultaneously explain both the $R_K$ and $R_{D^{(*)}}$ puzzles. There are two UV completions that can give rise to the effective Lagrangian: (i) $VB$: a vector boson that transforms as an $SU(2)_L$ triplet, as in the SM, (ii) $U_1$: an $SU(2)_L$-singlet vector leptoquark. In this paper, we examine these models individually. A key point is that $VB$ contributes to $B^0_s$-${\bar B}^0_s$ mixing and $\tau \to 3\mu$, while $U_1$ does not. We show that, when constraints from these processes are taken into account, the $VB$ model is just barely viable. It predicts ${\cal B} (\tau^-\to\mu^-\mu^+\mu^-) \simeq 2.1 \times 10^{-8}$. This is measurable at Belle II and LHCb, and therefore constitutes a smoking-gun signal of $VB$. For $U_1$, there are several observables that may point to this model. Perhaps the most interesting is the lepton-flavor-violating decay $\Upsilon(3S) \to \mu \tau$, which has previously been overlooked in the literature. $U_1$ predicts ${\cal B}(\Upsilon(3S) \to \mu \tau)|_{\rm max} = 8.0 \times 10^{-7}$. Thus, if a large value of ${\cal B}(\Upsilon(3S) \to \mu \tau)$ is observed -- and this should be measurable at Belle II -- the $U_1$ model would be indicated.

1. $b \to s \mu^+ \mu^-$: The LHCb Collaboration has made measurements of $B \to K^* \mu^+ \mu^-$ [1,2] that deviate from the SM predictions [3]. The Belle Collaboration finds similar results [4]. The main discrepancy is in the angular observable $P'_5$ [5]. The significance of the discrepancy depends on the assumptions about the theoretical hadronic uncertainties [6][7][8]. Indeed, it has recently been argued [9] that, by including non-factorizable power corrections, the experimental results can be reproduced within the SM. However, the latest fits to the data [10,11], which take into account the hadronic uncertainties, find that a discrepancy is still present, and that it may reach the $4\sigma$ level.
The LHCb Collaboration has also measured the branching fraction and performed an angular analysis of $B^0_s \to \phi \mu^+ \mu^-$ [12,13]. They find a $3.5\sigma$ disagreement with the predictions of the SM, which are based on lattice QCD [14,15] and QCD sum rules [16].
2. $R_K$: The LHCb Collaboration has found a hint of lepton non-universality. They measured the ratio $R_K \equiv {\cal B}(B^+ \to K^+ \mu^+ \mu^-)/{\cal B}(B^+ \to K^+ e^+ e^-)$ in the dilepton invariant mass-squared range $1~{\rm GeV}^2 \le q^2 \le 6~{\rm GeV}^2$ [17], and found $R_K^{\rm expt} = 0.745^{+0.090}_{-0.074}~{\rm (stat)} \pm 0.036~{\rm (syst)}$ [Eq. (1.1)]. This differs from the SM prediction of $R_K^{\rm SM} = 1 \pm 0.01$ [18] by $2.6\sigma$, and is referred to as the $R_K$ puzzle.
3. $R_{D^{(*)}}$: The charged-current decays $\bar B \to D^{(*)} \ell^- \bar\nu_\ell$ have been measured by the BaBar [19], Belle [20] and LHCb [21] Collaborations. It is found that the values of the ratios $R_{D^{(*)}} \equiv {\cal B}(\bar B \to D^{(*)} \tau^- \bar\nu_\tau)/{\cal B}(\bar B \to D^{(*)} \ell^- \bar\nu_\ell)$ ($\ell = e, \mu$) considerably exceed their SM predictions. Assuming Gaussian distributions, and taking correlations into account, the experimental results and theoretical predictions can be combined to yield the ratios $R^{\rm ratio}_{D^{(*)}}$ [22,23]. The measured values of $R_D$ and $R_{D^*}$ represent deviations from the SM of $1.7\sigma$ and $3.1\sigma$, respectively. These are known as the $R_D$ and $R_{D^*}$ puzzles.
It must be stressed that, while the discrepancies in point 1 have some amount of theoretical input, those in points 2 and 3 are quite clean. As such, the $R_K$ and $R_{D^{(*)}}$ puzzles provide very intriguing hints of new physics (NP).
In Ref. [25], Hiller and Schmaltz searched for a NP explanation of the $R_K$ puzzle. They performed a model-independent analysis of $b \to s \ell^+ \ell^-$, considering NP operators of the form $(\bar s {\cal O} b)(\bar\ell {\cal O}' \ell)$, where ${\cal O}$ and ${\cal O}'$ span all Lorentz structures. They found that the only NP operator that can reproduce the experimental value of $R_K$ is of $(V-A) \times (V-A)$ form: $(\bar s_L \gamma_\mu b_L)(\bar\ell_L \gamma^\mu \ell_L)$. Subsequent fits [26][27][28], which included both the $B \to K^* \mu^+ \mu^-$ and $R_K$ data, confirmed that such a NP operator can also account for the $P'_5$ discrepancy.
To be specific, $b \to s \mu^+ \mu^-$ transitions are defined via the effective Hamiltonian of Eq. (1.3), in which the primed operators are obtained by replacing $L$ with $R$. The Wilson coefficients $C_a^{(\prime)}$ include both SM and NP contributions. In the fits it was shown that a NP contribution to $b \to s \mu^+ \mu^-$ is required; one of the possible solutions is $C_9^{NP} = -C_{10}^{NP} < 0$, with $C_9^{NP}$ large. This corresponds to the $(\bar s_L \gamma_\mu b_L)(\bar\ell_L \gamma^\mu \ell_L)$ operator of Ref. [25].
In Ref. [29], Glashow, Guadagnoli and Lane (GGL) stressed that the NP responsible for lepton flavor non-universality will generally also lead to lepton-flavor-violating (LFV) effects. To illustrate this, they proposed the following explanation of the $R_K$ puzzle. The NP is assumed to couple preferentially to the third generation with $(V-A) \times (V-A)$ form, giving rise to the operator of Eq. (1.4), where $G = O(1)$, $G/\Lambda_{\rm NP}^2 \ll G_F$, and the primed fields are the fermion eigenstates in the gauge basis. When one transforms to the mass basis, this generates the operator $(\bar b_L \gamma_\mu s_L)(\bar\mu_L \gamma^\mu \mu_L)$ that contributes to $\bar b \to \bar s \mu^+ \mu^-$. The contribution to $\bar b \to \bar s e^+ e^-$ is much smaller, leading to a violation of lepton flavor universality. GGL's point was that LFV decays, such as $B \to K \mu e$, $K \mu \tau$ and $B^0_s \to \mu e$, $\mu \tau$, are also generated.
In Ref. [30], it was pointed out that, assuming the scale of NP is much larger than the weak scale, the operator of Eq. (1.4) should be made invariant under the full $SU(3)_C \times SU(2)_L \times U(1)_Y$ gauge group. There are two possibilities, $O_1^{NP}$ and $O_2^{NP}$ [Eq. (1.5)], where $G_1$ and $G_2$ are both $O(1)$, and the $\sigma^I$ are the Pauli matrices. Here $Q' \equiv (t', b')^T$ and $L' \equiv (\nu'_\tau, \tau')^T$. The key point is that $O_2^{NP}$ contains both neutral-current (NC) and charged-current (CC) interactions. The NC and CC pieces can be used to respectively explain the $R_K$ and $R_{D^{(*)}}$ puzzles. (Of course, while a common model of these anomalies is intriguing, it is also more constraining than separate explanations of the two puzzles.)
This method was explored in greater detail in Ref. [44]. The starting point is the model-independent effective Lagrangian of Eq. (1.6), which is based on Eq. (1.5). These operators are written in the gauge basis and involve only third-generation fermions.
In transforming from the gauge basis to the mass basis, the left-handed down- and up-type quarks are operated upon by the matrices $D$ and $U$, respectively, where the Cabibbo-Kobayashi-Maskawa (CKM) matrix is $V_{CKM} = U^\dagger D$. The leptons are different: neglecting the neutrino masses, the left-handed charged and neutral leptons are both operated upon by the same matrix $L$. In Ref. [44] it is assumed that the transformations $D$ and $L$ lead to mixing only between the second and third generations, so that they each depend on only one unknown theoretical parameter, respectively $\theta_D$ and $\theta_L$.
In the mass basis, the above operators contribute to a variety of $B$ decays. Ref. [44] considers a set of such processes/observables [45,46]. The experimental measurements thus put constraints on the coefficients, which are all functions of $G_1$, $G_2$, $\theta_D$ and $\theta_L$. When all constraints are taken into account, it is found that the $R_K$ and $R_{D^{(*)}}$ puzzles can be simultaneously explained if $\theta_L$ is of the order of $\pi/16$ and $\theta_D$ is very small (less than $V_{cb}$). With these values for $\theta_L$ and $\theta_D$, one can make predictions for the rates of other (LFV) processes, and this is done for $B \to K^{(*)} \ell \ell'$ and $B^0_s \to \ell \ell'$ ($\ell \ell' = \tau\mu, \tau e, \mu e$).
Finally, Ref. [44] considers possible UV completions that can give rise to $O_2^{NP}$ [Eq. (1.5)], which is required to explain both $R_K$ and $R_{D^{(*)}}$. Its coefficient ($G_2/\Lambda_{\rm NP}^2$) suggests that this operator is generated by the tree-level exchange of a single particle. In this case, there are only four possibilities for the underlying NP model [47]: (i) a vector boson ($VB$) that transforms as $(1, 3, 0)$ under $SU(3)_C \times SU(2)_L \times U(1)_Y$, as in the SM, (ii) an $SU(2)_L$-triplet scalar leptoquark ($S_3$), (iii) an $SU(2)_L$-singlet vector leptoquark ($U_1$), (iv) an $SU(2)_L$-triplet vector leptoquark ($U_3$). It is shown that the combination of $O_1^{NP}$ and $O_2^{NP}$ generated by the $S_3$ and $U_3$ leptoquarks cannot simultaneously explain $R_K$ and $R_{D^{(*)}}$. The only possible UV completions are therefore the $VB$ and $U_1$ models.
But this now raises a question. If the NP responsible for the $R_K$ and $R_{D^{(*)}}$ puzzles leads to the effective Lagrangian of Eq. (1.6), the underlying NP model is either $VB$ or $U_1$. But which is it? Short of actually producing a $W'/Z'$ or a leptoquark in an experiment, is there any way of distinguishing the two models?
At first glance, the answer is no. After all, the two models lead to the same effective Lagrangian and so are "equivalent." However, this is not really true. To see this, one has to understand the difference between analyses based on an effective field theory (EFT) and those based on models. In an EFT analysis, one writes all effective operators of a given order; these are considered as independent. The effective Lagrangian of Eq. (1.6) includes all four-fermion operators containing two quarks and two leptons. One can also write four-quark and four-lepton operators. But since these are uncorrelated with the operators with two quarks and two leptons, and since only processes of the type $q \to q' \ell \ell^{(\prime)}$ are studied, these other operators are uninteresting.
But this does not hold in a model analysis. Concretely, while both $VB$ and $U_1$ models lead to operators with two quarks and two leptons, $VB$ also produces four-quark and four-lepton operators at tree level. In particular, it will contribute significantly to $B^0_s$-${\bar B}^0_s$ mixing and the lepton-flavor-violating decay $\tau \to 3\mu$. These will lead to additional constraints on $\theta_D$ and $\theta_L$, respectively. Furthermore, while $VB$ contributes to $B \to K^{(*)} \nu \bar\nu$, $U_1$ does not. The bottom line is that the experimental constraints on the $VB$ model are more stringent than those on the $U_1$ model. Thus, the predictions for the rates of other processes can be very different in the two models, and this may allow us to distinguish them. It is this feature that is studied in the present paper.
We begin in Sec. 2 by reviewing the method of Ref. [44] for generating contributions to $R_K$ and $R_{D^{(*)}}$, as well as the NP models in which this can occur. These include the vector-boson model $VB$ and the $S_3$, $U_1$ and $U_3$ leptoquark models. The experimental measurements that constrain these models are described in Sec. 3. These include not only processes involving $b \to s \ell^+ \ell^-$, $b \to s \nu \bar\nu$ and $b \to c \tau^- \bar\nu$, but also $\tau \to \mu \phi$, $B^0_s$-${\bar B}^0_s$ mixing and $\tau \to 3\mu$. These experimental constraints are applied to the models in Sec. 4. As in Ref. [44], we find that $S_3$ and $U_3$ are excluded, leaving only $VB$ and $U_1$. In Sec. 5 we examine the predictions of $VB$ and $U_1$ for other processes, to see if the two models can be distinguished. We find that, in fact, there are a number of different ways of doing this. Useful processes/observables include $\tau \to 3\mu$, $R_K$, and a previously overlooked lepton-flavor-violating decay, $\Upsilon \to \mu\tau$. We conclude in Sec. 6.

Models
Including the generation indices $i, j, k, l$, the effective Lagrangian of Eq. (1.6) can be written as in Eq. (2.1). This holds in both the gauge and mass bases. The gauge eigenstates, which involve only third-generation fermions, are indicated by primes on the spinors; the mass eigenstates have no primes. The transformation from the gauge basis to the mass basis is given in Eq. (2.2), where $U$, $D$ and $L$ are $3 \times 3$ unitary matrices and the spinors $u^{(\prime)}$, $d^{(\prime)}$, $\ell^{(\prime)}$ and $\nu^{(\prime)}$ include all three generations of fermions. The fact that the left-handed charged and neutral leptons are both operated upon by the same matrix $L$ is a result of neglecting the neutrino masses. The CKM matrix is given by $V_{CKM} = U^\dagger D$. The assumption of Ref. [44] is that the transformations $D$ and $L$ involve only the second and third generations, so that each is described by a single mixing angle ($\theta_D$ and $\theta_L$, respectively). Because of these transformations, for the down-type quarks and charged leptons, couplings involving the second generation (possibly flavor-changing) are possible in the mass basis. (For the up-type quarks, the first generation can also be involved.)
Specifically, in the mass basis the operators involve matrices $X$ and $Y$, which include the transformations from the gauge to the mass basis. The exact forms of these matrices depend on which four-fermion operator is used. For the decay $b \to s \ell^+ \ell^-$, $X$ and $Y$ are built from $D$ and $L$, respectively. If up-type quarks are involved in a process (such as $b \to c \tau^- \bar\nu$), one must include the transformation matrix $U$ [Eq. (2.2)]. Because $V_{CKM} = U^\dagger D$, the amplitude will involve factors of $V_{CKM}$ in addition to $X$ and $Y$. In terms of components, the effective Lagrangian is given in Eq. (2.6). In processes such as $b \to c \tau^- \bar\nu_\tau$ and $b \to s \nu \bar\nu$, the final-state neutrinos are not detected, and so one must sum over all neutrinos. In this case, since $V_{PMNS}$ is unitary ($V_{PMNS}^\dagger V_{PMNS} = 1$), its effect on these processes vanishes.
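The way a third-generation-only coupling spreads to the second generation can be illustrated numerically. The sketch below is not taken from the text: it assumes a particular convention for the 2-3 rotation and assumes that the mass-basis coupling matrix has the form $M^\dagger\,{\rm diag}(0,0,1)\,M$ (with $M = D$ for $X$ and $M = L$ for $Y$). The function names and the sample angles are hypothetical, chosen only to be of the size discussed in this paper.

```python
import math

def rotation_23(theta):
    """Rotation mixing only the 2nd and 3rd generations (assumed convention)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0],
            [0.0, c, s],
            [0.0, -s, c]]

def mass_basis_coupling(theta):
    """Assumed form M^T diag(0,0,1) M for a real rotation M.

    With P = diag(0,0,1), the product reduces to (M^T P M)_ij = M_3i * M_3j.
    """
    m = rotation_23(theta)
    return [[m[2][i] * m[2][j] for j in range(3)] for i in range(3)]

# Illustrative angles only (not fitted values).
theta_D = 0.03
theta_L = math.pi / 16

X = mass_basis_coupling(theta_D)  # quark side
Y = mass_basis_coupling(theta_L)  # lepton side

# Flavor-changing 2-3 entries and flavor-diagonal 2-2 entries.
print("X_23 (b-s coupling)   =", X[1][2])
print("Y_22 (mu-mu coupling) =", Y[1][1])
print("Y_23 (mu-tau coupling)=", Y[1][2])
```

In this convention the $b$-$s$ entry is $-\sin\theta_D\cos\theta_D$ and the $\mu$-$\mu$ entry is $\sin^2\theta_L$, so that, under these assumptions, the $b \to s\mu^+\mu^-$ coupling scales as $\sin\theta_D\cos\theta_D\,\sin^2\theta_L$. The first generation remains decoupled, consistent with the assumption of 2-3 mixing only.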
For the processes of interest ($b \to s \mu^+ \mu^-$, $b \to s \nu \bar\nu$ and $b \to c \tau^- \bar\nu$), the NP contributions are given in Eqs. (2.7)-(2.9). From these expressions, we see that there is no NP contribution to $b \to s \mu^+ \mu^-$ if $g_1 = -g_2$, and none to $b \to s \nu \bar\nu$ if $g_1 = g_2$.
In the above, the NP is described in effective field theory language, as in Ref. [44]. However, we are interested in explicitly studying the models that can lead to this EFT. There are two categories of NP models: those with new vector bosons, and those that involve leptoquarks. Below we summarize the features of the various models.

SM-like vector bosons
This model contains vector bosons ($VB$s) that transform as $(1, 3, 0)$ under $SU(3)_C \times SU(2)_L \times U(1)_Y$, as in the SM. We refer to the $VB$s as $V = W', Z'$.
In the gauge basis, the Lagrangian describing the couplings of the $VB$s to left-handed third-generation fermions involves the Pauli matrices $\sigma^I$ ($I = 1, 2, 3$). Once the heavy $VB$ is integrated out, we obtain an effective Lagrangian relevant for $b \to s \ell^+ \ell^-$, $b \to c \tau^- \bar\nu$ and $b \to s \nu \bar\nu$ decays. Comparing this with Eq. (2.1), we find the relations of Eq. (2.12). Note that $g_2$ can be either positive or negative in this model. When one transforms to the mass basis, the $VB$s couple to other generations. The $Z'$ contributes at tree level to $b \to s \mu^+ \mu^-$ and $b \to s \nu \bar\nu$; the $W'$ contributes at tree level to $b \to c \tau^- \bar\nu$. These contributions are given in Eqs. (2.7)-(2.9) for the above values of $g_1$ and $g_2$.
The above processes all involve four-fermion operators that contain two quarks and two leptons. But $VB$ exchange also produces four-quark and four-lepton operators at tree level. In the gauge basis, the corresponding effective Lagrangian is given in Eq. (2.13). In the mass basis, these contribute to processes such as $B^0_s$-${\bar B}^0_s$ mixing and $\tau \to 3\mu$, and their measurements can be used to further constrain the $VB$ model.
Table 1. Fierz transformations and Pauli-matrix identities used in the analysis of LQ models.
There are a number of variants of the $VB$ model; for example, see Refs. [48][49][50][51][52]. Note that some of these models address the $b \to s \mu^+ \mu^-$ anomalies with a $Z'$, while others also try to explain the $R_{D^{(*)}}$ puzzle. In some models, new fermions are involved. This introduces additional parameters, which can lead to more flexibility in predictions.

Leptoquarks
In Refs. [35,53] it was shown that six different types of leptoquark (LQ) models can explain $R_{D^{(*)}}$. Of these, only four ($S_1$, $S_3$, $U_1$ and $U_3$) lead to four-fermion operators of the desired form [47]. However, different models will produce different combinations of the two operators. Below, with the help of the identities in Table 1, we determine these combinations for each of the four LQ models. That is, we derive the relation between $g_1$ and $g_2$, as well as the signs of these quantities.
Note that, unlike the $VB$ model, four-quark and four-lepton operators are not produced in LQ models at tree level.
In the gauge basis, the interaction Lagrangian for the $S_1$ LQ is given in Ref. [53], where $\psi^c = C \bar\psi^T$ denotes a charge-conjugated fermion field. When the heavy LQ is integrated out, we obtain the corresponding effective Lagrangian. $SU(2)_L$ indices have been inserted in the first line. In the second line, we have used relations from Table 1 and then suppressed the indices. Comparing this with Eq. (2.1), we find that $g_1 = -g_2$. When one transforms to the mass basis, the $S_1$ LQ couples to other generations. However, because $g_1 = -g_2$, it does not contribute to $b \to s \mu^+ \mu^-$ [Eq. (2.7)] and hence cannot explain $R_K$. So this LQ model is not of interest to us.

SU
$S_3$ is a scalar LQ that is an $SU(2)_L$ triplet (it transforms as $(3, 3, -2/3)$). In the gauge basis, its interaction Lagrangian is given in Ref. [53]. Integrating out the heavy LQ, we obtain the effective Lagrangian and, comparing this with Eq. (2.1), the values of $g_1$ and $g_2$. When one transforms to the mass basis, the $S_3$ LQ couples to other generations. Its components contribute at tree level to the processes of interest; these contributions are given in Eqs. (2.7)-(2.9) for the above values of $g_1$ and $g_2$.
$U_1$ is a vector LQ that is an $SU(2)_L$ singlet (it transforms as $(3, 1, 4/3)$). Its interaction Lagrangian in the gauge basis is given in Ref. [53]. Integrating out the heavy LQ, and inserting $SU(2)_L$ indices, we obtain the effective Lagrangian. Comparing this with Eq. (2.1), we find that $g_1 = g_2$. In the mass basis, the $U_1$ LQ couples to other generations and contributes at tree level to $b \to s \mu^+ \mu^-$ and $b \to c \tau^- \bar\nu$. These contributions are given in Eqs. (2.7) and (2.9) for the above values of $g_1$ and $g_2$. However, because $g_1 = g_2$, there is no contribution to $b \to s \nu \bar\nu$.
The $U_1$ LQ has been studied in Ref. [56].
The $U_3$ LQ is a vector that is an $SU(2)_L$ triplet (it transforms as $(3, 3, 4/3)$). In the gauge basis, its interaction Lagrangian is given in Ref. [53]. When the heavy LQ is integrated out, we obtain the effective Lagrangian and, comparing this with Eq. (2.1), the values of $g_1$ and $g_2$. In the mass basis, the $U_3$ LQ couples to other generations. Its components contribute at tree level to the processes of interest; these contributions are given in Eqs. (2.7)-(2.9) for the above values of $g_1$ and $g_2$.

Summary
We briefly recap the above results. We assume that the NP couples only to the third generation in the gauge basis, and that it produces four-fermion operators with a $(V-A) \times (V-A)$ structure. We find that there are four NP models that contribute to both $R_K$ and $R_{D^{(*)}}$. There are two operators, $O_1^{NP}$ and $O_2^{NP}$, shown in Eq. (2.1), whose coefficients are $g_1$ and $g_2$. The four models contribute differently to $O_1^{NP}$ and $O_2^{NP}$. For example, for $VB$ we have $g_1 = 0$ and $g_2 = -g^{33}_{qV} g^{33}_{\ell V}$, where $g_2$ can be positive or negative. In Ref. [44], it is noted that $\lambda^{(3)}$ ($= g_2$) is positive for the $S_3$ and $U_3$ models, but negative for $U_1$. This is confirmed by the above.

Constraints
When one transforms to the mass basis, two new parameters are introduced, $\theta_D$ and $\theta_L$. The NP contributes to $b \to s \mu^+ \mu^-$, $b \to s \nu \bar\nu$ and $b \to c \tau^- \bar\nu$. These contributions are given in Eqs. (2.7)-(2.9); the coefficients are (different) functions of $g_1$, $g_2$, $\theta_D$ and $\theta_L$. Another decay to which all four models contribute is $\tau \to \mu \phi$. In addition, the $VB$ model contributes to other processes, such as $B^0_s$-${\bar B}^0_s$ mixing and $\tau \to 3\mu$. The experimental measurements of, or limits on, these processes provide constraints on the NP parameter space.
In order to compare models, we fix $\Lambda_{\rm NP} = 1$ TeV and assume a common value for the third-generation coupling. We apply all the experimental constraints to establish the allowed region in the $(\theta_D, \theta_L)$ parameter space. If there is no region in which all constraints overlap, the model is excluded. For the models that are retained, we predict the rates for other processes based on the allowed region in parameter space. Since this region can be different for different models, it may be possible to distinguish them.
In the effective Hamiltonians for these processes, the Wilson coefficients include both the SM and NP contributions: $C_X = C_X({\rm SM}) + C_X({\rm NP})$. The NP Wilson coefficients are obtained by comparing with Eqs. (2.7)-(2.9) (and recalling that ${\cal L}_{NP}$ and $H_{\rm eff}$ have opposite signs). In the following subsections we examine the experimentally-preferred values of the above quantities.
$C_9^{\mu\mu}({\rm NP}) = -C_{10}^{\mu\mu}({\rm NP})$
A global analysis of the $b \to s \ell^+ \ell^-$ anomalies was recently performed in Ref. [10]. It was found that there is a significant disagreement with the SM, possibly as large as $4\sigma$, and that it can be explained if there is NP in $b \to s \mu^+ \mu^-$. There are four possible explanations, each having roughly equal goodness-of-fit. Of these, it is solution (ii), $C_9^{\mu\mu}({\rm NP}) = -C_{10}^{\mu\mu}({\rm NP})$, that interests us, and the fit gives the allowed $3\sigma$ range for these Wilson coefficients. Note that this range of the NP contribution is consistent with the $R_K$ anomaly: the central value of $R_K^{\rm expt}$ can be explained with $C_9^{\mu\mu}({\rm NP}) \simeq -0.55$.
The NP contribution to $b \to s \nu \bar\nu$ can be constrained by the existing data on $\bar B \to K \nu \bar\nu$ and $\bar B \to K^* \nu \bar\nu$ decays. The BaBar and Belle Collaborations give 90% C.L. upper limits on these decays [45,46] [Eq. (3.8)]. In Ref. [59], these are compared with the SM predictions. Taking into account the theoretical uncertainties [59], one obtains 90% C.L. upper bounds on the NP contributions [Eq. (3.10)]. A constraint on the NP contribution can also be obtained from the inclusive decay. The ALEPH Collaboration gives the 90% C.L. upper limit as ${\cal B}(B \to X_s \nu \bar\nu) \le 6.4 \times 10^{-4}$ [60]. However, this implies ${\cal B}^{\rm SM+NP}_{X_s}/{\cal B}^{\rm SM}_{X_s} \le 22$, which is a weaker constraint than that from the exclusive decays.
In Eq. (3.14) we have assumed $C_V^{e \nu_j} = 0$.

τ → µφ
The NP effective Lagrangian of Eq. (2.6) generates the process $\tau \to \mu s \bar s$, which will lead to $\tau \to \mu \phi$ and $\tau \to \mu \eta^{(\prime)}$. The branching ratios (neglecting the mass of the muon) are expressed in terms of the hadronic currents, that is, in terms of the decay constants of the $\phi$ and $\eta^{(\prime)}$, and their ratio is given in Eq. (3.18).
We estimate $f_\phi^2$ from the leptonic decay of the $\phi$. Taking the values for $m_\phi$, $m_\tau$, $\tau_\tau$, $\Gamma_\phi$ and ${\cal B}(\phi \to \mu^+ \mu^-)$ from Ref. [61], this yields $f_\phi \approx 225$ MeV. The $\eta^{(\prime)}$ decay constant is obtained using $f_\pi = 130$ MeV, $f_1 \sim 1.1 f_\pi$, $f_8 \sim 1.3 f_\pi$ [62], and $\theta = 19^\circ$. The current 90% C.L. limits on these branching ratios are given in Ref. [61], for example ${\cal B}(\tau \to \mu \eta) < 6.5 \times 10^{-8}$. Of these decays, τ → µη is the least constraining. And since τ → µφ and τ → µη are of the same order, we will use $\tau \to \mu \phi$ to constrain the coupling $\kappa$. Using ${\cal B}(\tau \to \mu \phi) < 8.4 \times 10^{-8}$ [64] and $\Lambda_{\rm NP} = 1$ TeV, we obtain a constraint on the NP parameters.

$B^0_s$-${\bar B}^0_s$ mixing
As noted in Sec. 2.1, the $VB$ model also generates four-quark operators at tree level. In the mass basis, the operator of Eq. (2.13) includes a $\Delta B = 2$ term, where we have assumed $g^{33}_{qV} = g^{33}_{\ell V}$. This generates a contribution to $B^0_s$-${\bar B}^0_s$ mixing. In the SM, the same operator is produced via a box diagram [Eqs. (3.25) and (3.26)]. In these expressions, $x_t \equiv m_t^2/m_W^2$ and $\eta_{B_s} = 0.551$ is the QCD correction [65]. The SM and NP contributions can be combined, and the mass difference in the $B_s$ system is then obtained from their sum. As we will see in the next section, $B^0_s$-${\bar B}^0_s$ mixing will put a constraint on the $VB$ model, but it is weaker than that from $\bar B \to K^* \nu \bar\nu$.
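The value $f_\phi \approx 225$ MeV quoted in the $\tau \to \mu\phi$ discussion can be cross-checked numerically. The sketch below assumes the standard leading-order relation $\Gamma(V \to \ell^+\ell^-) = \frac{4\pi\alpha^2 Q_q^2}{3} \frac{f_V^2}{m_V} (1 + 2\eta^2)\sqrt{1 - 4\eta^2}$, with $\eta = m_\ell/m_V$ and $Q_s = -1/3$; this may not be the exact expression used in the text. The numerical inputs are approximate PDG-like values entered by hand as assumptions, and the function name is hypothetical.

```python
import math

ALPHA = 1 / 137.036      # fine-structure constant at low energy
M_PHI = 1019.46          # MeV, approximate phi mass (assumed input)
GAMMA_PHI = 4.25         # MeV, approximate total phi width (assumed input)
BR_PHI_MUMU = 2.87e-4    # approximate B(phi -> mu+ mu-) (assumed input)
M_MU = 105.66            # MeV, muon mass
Q_S = -1.0 / 3.0         # electric charge of the strange quark

def decay_constant_from_leptonic_width(m_v, gamma_ll, m_lepton, quark_charge):
    """Invert the leading-order V -> l+ l- width for the decay constant (MeV)."""
    eta2 = (m_lepton / m_v) ** 2
    # Kinematic factor from the non-zero lepton mass.
    phase_space = (1 + 2 * eta2) * math.sqrt(1 - 4 * eta2)
    f_squared = (3 * m_v * gamma_ll
                 / (4 * math.pi * ALPHA**2 * quark_charge**2 * phase_space))
    return math.sqrt(f_squared)

gamma_mumu = GAMMA_PHI * BR_PHI_MUMU  # partial width in MeV
f_phi = decay_constant_from_leptonic_width(M_PHI, gamma_mumu, M_MU, Q_S)
print(f"f_phi = {f_phi:.0f} MeV")
```

With these inputs the result lies within a few MeV of the quoted 225 MeV; the muon-mass factor changes the answer by well under one percent.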

τ → 3µ
Finally, the $VB$ model also produces four-lepton operators at tree level. In the mass basis, the Lagrangian of Eq. (2.13) includes an operator that generates the decay $\tau \to 3\mu$. As this is a lepton-flavor-violating decay, it can arise only due to NP. The decay rate for $\tau \to 3\mu$ includes a factor $X$, which is a suppression factor due to the non-zero muon mass, expressed in terms of $\eta_\mu = m_\mu/m_\tau$. At present, the branching ratio for $\tau^- \to \mu^- \mu^+ \mu^-$ has only an experimental upper bound [71]. This then puts a constraint on $\theta_L$ in the $VB$ model.
Figure 1. Allowed regions in the $(\theta_L, \theta_D)$ plane for the $VB$, $S_3$, $U_1$, and $U_3$ models. We have fixed the NP scale as $\Lambda_{\rm NP} = 1$ TeV and, in each model, the third-generation coupling. The $R_D$, $R_{D^*}$ and $R_K$ anomalies can be explained in the shaded regions colored in pink, red, and blue, respectively. The regions bounded by the gray, cyan, and green lines are allowed from the measurements of $b \to s \nu \bar\nu$, $\tau \to 3\mu$, and $\tau \to \mu \phi$, respectively. For the $VB$ model, the entire region in the figure satisfies the constraint from $B^0_s$-${\bar B}^0_s$ mixing.

Models: allowed parameter space
Taking into account all the experimental constraints described in Sec. 3, we find the allowed parameter space in the four NP models. We assume $\Lambda_{\rm NP} = 1$ TeV, and take a common value for the third-generation coupling (for $VB$, $g^{33}_{qV} g^{33}_{\ell V} = 1$). For all four models, the flavor anomalies $R_D$, $R_{D^*}$ and $R_K$ can be explained in the shaded regions colored in pink, red and blue, respectively. The gray shaded region is allowed from $\bar B \to K^{(*)} \nu \bar\nu$ at 90% C.L. The region bounded by the green lines is consistent with the 90% C.L. upper limit on the branching ratio of $\tau \to \mu \phi$. For the $VB$ model, there are additional constraints coming from $B^0_s$-${\bar B}^0_s$ mixing and $\tau \to 3\mu$. For the $\tau \to 3\mu$ constraint, the region to the left of the cyan line is allowed. On the other hand, with the given $g^{33}_{qV} g^{33}_{\ell V} = 1$, $B^0_s$-${\bar B}^0_s$ mixing does not provide any constraint: the entire region in the figure is allowed.
Based on this figure, one can make two observations:
• There are only two regions in parameter space where the constraints from $R_D$, $R_{D^*}$, $R_K$ and $\bar B \to K^{(*)} \nu \bar\nu$ (if applicable) might overlap. These are roughly around $\pi/16 \lesssim \theta_L \lesssim \pi/8$, with $\theta_D$ near 0 (region 1) or $\pi/2$ (region 2). However, the additional constraint from $\tau \to \phi \mu$ distinguishes the two regions. That is, while region 1 satisfies the $\tau \to \phi \mu$ constraint, region 2 does not, and is therefore excluded. Henceforth, we focus only on region 1.
• For the $VB$ model, the constraint from $B^0_s$-${\bar B}^0_s$ mixing has the same shape as that from $\bar B \to K^{(*)} \nu \bar\nu$. They are both independent of $\theta_L$, and so bound only $\theta_D$. However, we see that the $B^0_s$-${\bar B}^0_s$ mixing constraint is much less stringent than that from $\bar B \to K^{(*)} \nu \bar\nu$. As noted above, for $g^{33}_{qV} g^{33}_{\ell V} = 1$, the entire region in the figure is allowed, while the region allowed by $\bar B \to K^{(*)} \nu \bar\nu$ is quite small.
In order to obtain more information, in Fig. 2, we show the constraints in region 1 of the $(\theta_L, \theta_D)$ plane for the $VB$, $S_3$, $U_1$, and $U_3$ models. In the figures, we indicate the values of the contours for the flavor anomalies, that is, $R_X^{\rm NP+SM}/R_X^{\rm SM}$ for $X = K$ and $D^{(*)}$. From this figure, we can see that
• On the other hand, the $VB$ and $U_1$ models are allowed if $\theta_D$ is very small, $< \pi/64$. To be precise, for our choice of couplings, $\theta_D \le 0.035$ for $VB$, while $\theta_D \le 0.028$ for $U_1$.
• Comparing the $VB$ and $U_1$ models, the constraints from the flavor anomalies ($R_K$, $R_{D^{(*)}}$) are similar (for $\theta_D$ near 0). But $VB$ has additional constraints from $\bar B \to K^{(*)} \nu \bar\nu$ and $\tau \to 3\mu$, so that the allowed region is much smaller for $VB$ than for $U_1$. In particular, for the $VB$ model the maximum allowed value of $|\theta_L|$ is $\sim \pi/16$, while for the $U_1$ model, $|\theta_L|$ can be much larger ($|\theta_L|_{\rm max} = 0.61$).
The $VB$ and $U_1$ models are therefore the candidates to simultaneously explain the $R_K$ and $R_{D^{(*)}}$ puzzles in the case where the NP couples predominantly to the third-generation fermions. The favored value for the mixing in the down sector is then very small, with $(\theta_D)_{\rm max} = 0.035$ ($VB$) or 0.028 ($U_1$).
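To put the quoted angle bounds on a common footing, the short sketch below converts them to degrees and checks them against the reference values $\pi/64$ and $\pi/16$ mentioned in this section. All numbers are taken from the text; the variable and function names are hypothetical.

```python
import math

# Maximum allowed mixing angles quoted in the text (radians).
THETA_D_MAX = {"VB": 0.035, "U1": 0.028}
THETA_L_MAX = {"VB": math.pi / 16, "U1": 0.61}

def summarize(angle_rad):
    """Return the angle in degrees and as a fraction of pi."""
    return math.degrees(angle_rad), angle_rad / math.pi

for model in ("VB", "U1"):
    d_deg, _ = summarize(THETA_D_MAX[model])
    l_deg, l_frac = summarize(THETA_L_MAX[model])
    print(f"{model}: theta_D max = {d_deg:.2f} deg, "
          f"theta_L max = {l_deg:.1f} deg ({l_frac:.3f} pi)")

# Both down-sector bounds should lie below pi/64, as stated in the text.
below_pi_over_64 = all(t < math.pi / 64 for t in THETA_D_MAX.values())
print("theta_D bounds below pi/64:", below_pi_over_64)

# How much more lepton mixing the U1 model tolerates than VB.
theta_L_ratio = THETA_L_MAX["U1"] / THETA_L_MAX["VB"]
print(f"theta_L max ratio (U1 / VB) = {theta_L_ratio:.2f}")
```

The down-sector angles are about two degrees or less in both models, while the lepton-sector angle allowed for $U_1$ is roughly three times the $\pi/16$ that caps the $VB$ model.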

Predictions
We have now established that the $VB$ and $U_1$ models are candidates to explain the present discrepancies with the SM in $b \to s \mu^+ \mu^-$, $R_{D^{(*)}}$ and $R_K$. The main question we wish to address in this paper is: is there any way of distinguishing the two models? There are two handles that can potentially accomplish this. First, the $VB$ model contributes to four-lepton and four-quark operators, and hence to processes such as $\tau \to 3\mu$ and $B^0_s$-${\bar B}^0_s$ mixing, while the $U_1$ model does not$^5$. Second, due to additional constraints, the allowed region in $(\theta_L, \theta_D)$ space is quite a bit smaller for $VB$ than for $U_1$. Below we explore the predictions of the two models for various processes. As we will see, it is potentially possible to distinguish the $VB$ and $U_1$ models.
$^5$ In this paper we perform the analysis at tree level. Radiative corrections to four-lepton operators have been considered in Ref. [72] within an EFT framework. However, as with all EFT analyses, the results do not necessarily apply to all models. To obtain the proper result, a more complete analysis must be done within each individual model.

Processes
The $3\sigma$ allowed ranges of $R^{\rm ratio}_{D^{(*)}}$ are given in Eq. (3.13). At present, large deviations from the SM are allowed (up to 79% and 53% for $R_D$ and $R_{D^*}$, respectively). On the other hand, from Fig. 2, we see that the $VB$ and $U_1$ models are allowed only if $\theta_D$ is very small. This means that such large deviations in $R_{D^{(*)}}$ from the SM are not favored, as these are inconsistent with the $R_K$ anomaly. The predictions of the two models are similar, so that even if $R^{\rm ratio}_{D^{(*)}}$ is measured with greater precision, it will probably not be possible to distinguish the $VB$ and $U_1$ models. However, if the measurements confirm large deviations from the SM, both models will be ruled out.

$R_K$
The situation is different for $R_K$. Using Eq. (1.1), its allowed $3\sigma$ range is $0.498 \le R_K \le 1.036$. The models predict [73] $0.59 \le R_K \le 0.90$ for $VB$, while the range for $U_1$ extends down to 0.51. We therefore see that the $U_1$ model can accommodate smaller values of $R_K$ than can the $VB$ model. This is due to the fact that its allowed $(\theta_L, \theta_D)$ region includes larger values of $\theta_L$. Thus, if future measurements of $R_K$ find it to be in the 0.51-0.59 range, this would point clearly to $U_1$ (and exclude $VB$).
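The $3\sigma$ range quoted above, and the $2.6\sigma$ tension with the SM, can be reproduced with a few lines of arithmetic. The sketch below assumes that Eq. (1.1) is the published LHCb value $R_K = 0.745^{+0.090}_{-0.074}\,{\rm (stat)} \pm 0.036\,{\rm (syst)}$, and that statistical and systematic errors are combined in quadrature separately for the upper and lower sides; the function name is hypothetical.

```python
import math

# Assumed content of Eq. (1.1): the published LHCb measurement of R_K.
R_K_CENTRAL = 0.745
STAT_UP, STAT_DOWN = 0.090, 0.074
SYST = 0.036

# SM prediction quoted in the text: 1 +/- 0.01.
R_K_SM, R_K_SM_ERR = 1.0, 0.01

def n_sigma_range(n):
    """Asymmetric n-sigma interval with stat and syst added in quadrature."""
    sigma_up = math.hypot(STAT_UP, SYST)
    sigma_down = math.hypot(STAT_DOWN, SYST)
    return R_K_CENTRAL - n * sigma_down, R_K_CENTRAL + n * sigma_up

low, high = n_sigma_range(3)
print(f"3 sigma range: {low:.3f} <= R_K <= {high:.3f}")

# Tension with the SM: the SM lies above the measurement, so the upward
# experimental error is the relevant one.
pull = (R_K_SM - R_K_CENTRAL) / math.hypot(math.hypot(STAT_UP, SYST), R_K_SM_ERR)
print(f"deviation from SM: {pull:.1f} sigma")

# VB prediction quoted in the text, checked against the 3 sigma interval.
VB_RANGE = (0.59, 0.90)
vb_inside = low <= VB_RANGE[0] and VB_RANGE[1] <= high
print("VB prediction inside 3 sigma range:", vb_inside)
```

Under these assumptions the interval comes out as $0.498 \le R_K \le 1.036$, matching the text, and the window 0.51-0.59 that would single out $U_1$ sits entirely inside it.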

τ → 3µ
This decay is particularly interesting because only the $VB$ model contributes to it. The present experimental upper bound is that of Ref. [71], discussed in Sec. 3. Belle II expects to reduce this limit to $< 10^{-10}$ [74]. The reach of LHCb is somewhat weaker, $< 10^{-9}$ [75]. Now, the amplitude for $\tau \to 3\mu$ depends only on $\theta_L$ [Eq. (3.31)]. In the allowed $(\theta_L, \theta_D)$ region for the $VB$ model, the maximum value of $\theta_L$ is fixed by the bound on ${\cal B}(\tau^- \to \mu^- \mu^+ \mu^-)$. But there is also a lower limit on $\theta_L$, due to the overlap of all the constraint regions. $VB$ then predicts ${\cal B}(\tau^- \to \mu^- \mu^+ \mu^-) \simeq 2.1 \times 10^{-8}$. Thus, the $VB$ model predicts that $\tau \to 3\mu$ should be observed at both LHCb and Belle II. This is a smoking-gun signal for the model.

$B \to K^{(*)} \mu \tau$
The BaBar Collaboration obtained an experimental bound of ${\cal B}(B^+ \to K^+ \mu^\pm \tau^\mp) < 4.8 \times 10^{-5}$ at 90% C.L. [76]. Belle II will collect 100 times more data than BaBar, and this will allow it to measure ${\cal B}(B^+ \to K^+ \mu^\pm \tau^\mp)$ to a level of $5 \times 10^{-7}$ [77]. The models predict [73] $3.2 \times 10^{-9} \le {\cal B}(B \to K^{(*)} \mu \tau) \le 2.3 \times 10^{-8}$ for $VB$, with essentially the same upper limit for $U_1$. Neither model can produce ${\cal B}(B \to K^{(*)} \mu \tau)$ sufficiently large that it can be observed at Belle II. (And even if it could be observed, it would not be possible to distinguish the $VB$ and $U_1$ models, since their upper limits on the branching ratio are essentially the same.)

$B \to K^{(*)} \tau^+ \tau^-$
The BaBar Collaboration recently put a limit of ${\cal B}(B^+ \to K^+ \tau^+ \tau^-) < 2.25 \times 10^{-3}$ at 90% C.L. [78]. Belle II will be able to improve on this, but because there are two $\tau$'s in the final state, the expected reach is only $\sim 2 \times 10^{-4}$ [77].
To measure and calculate the branching ratio of $B \to K^{(*)} \tau^+ \tau^-$, we need to deal with charmonium resonances. In analogy with $B \to K^{(*)} \mu^+ \mu^-$, we integrate over $q^2 > 15~{\rm GeV}^2$ and obtain the partial branching ratio by using flavio [73]. The maximum value of ${\cal B}(B \to K^{(*)} \tau^+ \tau^-)$ possible in both models is still two orders of magnitude smaller than the estimated reach of Belle II. This decay therefore cannot be used as a signal of the $VB$ and/or $U_1$ models.

$B^0_s \to \mu \tau$, $B^0_s \to \tau^+ \tau^-$
At present, LHCb is working on measuring these two decays, which are difficult due to the presence of $\tau$'s in the final state. However, no estimates of the reach are available [79]. (At Belle II, a rough estimate for $B^0_s \to \tau^+ \tau^-$ could be $\sim 2 \times 10^{-3}$ with 50 ab$^{-1}$ of data, obtained by rescaling the present data at Belle.) For $B^0_s \to \mu \tau$, the upper limits on the branching ratio are essentially the same for the $VB$ and $U_1$ models. Thus, the only way to distinguish the models is if ${\cal B}(B^0_s \to \mu \tau)$ were measured to be between $1.8 \times 10^{-9}$ and $5.3 \times 10^{-8}$. This can only occur in the $U_1$ model. However, it is unlikely that such a small branching ratio is measurable. On the other hand, the upper limits on ${\cal B}(B^0_s \to \tau^+ \tau^-)$ are different for the two models. Thus, if this branching ratio were found to be in the range $6.7 \times 10^{-6}$ to $1.2 \times 10^{-5}$, this would point to $VB$ (and exclude $U_1$). However, here too it is not clear that such a small branching ratio is measurable.

Υ → µτ
Finally, we turn to Υ → µτ . This lepton-flavor-violating decay has been overlooked in previous analyses, but it is potentially an important process to consider 6 . At the fermion level, this decay is bb → µτ , and it can receive contributions from both the V B and U 1 models. Note that this process has a pattern of mixing different from the above processes, and thus the models provide unique predictions.
In the past, the BaBar [81] and CLEO [82] Collaborations have studied lepton flavor violation in narrow $\Upsilon(nS)$ ($n = 1, 2, 3$) decays. The strongest limits come from BaBar [81], which put an upper limit on ${\cal B}(\Upsilon(2S, 3S) \to \mu\tau)$ of a few times $10^{-6}$. This was obtained using 13.6 fb$^{-1}$ and 26.8 fb$^{-1}$ of the BaBar dataset on the $\Upsilon(2S)$ and $\Upsilon(3S)$, respectively. Belle II is expected to collect a few hundred fb$^{-1}$ of data on the $\Upsilon(3S)$ [77]. A precise estimate of the sensitivity to $\Upsilon(3S) \to \mu\tau$ will require a dedicated study. However, given the order-of-magnitude increase in luminosity at Belle II compared to BaBar, we expect roughly an order-of-magnitude improvement in the sensitivity. That is, a reach of about $10^{-7}$ for ${\cal B}(\Upsilon(3S) \to \mu\tau)$ at Belle II is not unreasonable. These decays may also be studied at LHCb, but we are not aware of the LHCb reach for these processes.
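As a rough illustration of the scaling argument above, the following sketch rescales an existing upper limit to a larger dataset. The function name `scaled_reach` is hypothetical, and the two $\Upsilon(3S)$ inputs are assumptions chosen only to stand in for "a few times $10^{-6}$" and "a few hundred fb$^{-1}$"; they are not values taken from the references. A limit that improves linearly with luminosity corresponds to a background-free search, while a background-dominated search would improve only as the square root.

```python
def scaled_reach(current_limit, lumi_ratio, background_free=True):
    """Rescale a branching-ratio upper limit to a larger dataset.

    background_free=True : limit ~ 1/L       (linear scaling, as implied in the text)
    background_free=False: limit ~ 1/sqrt(L) (background-dominated search)
    """
    if current_limit <= 0 or lumi_ratio <= 0:
        raise ValueError("current_limit and lumi_ratio must be positive")
    exponent = 1.0 if background_free else 0.5
    return current_limit / lumi_ratio ** exponent


# B+ -> K+ mu tau: BaBar limit of 4.8e-5 with 100 times more data at Belle II.
print(f"{scaled_reach(4.8e-5, 100):.1e}")  # 4.8e-07, close to the quoted 5e-7

# Upsilon(3S) -> mu tau: ASSUMED illustrative inputs, not values from the paper.
babar_limit = 3e-6          # stand-in for "a few times 10^-6"
lumi_ratio = 300.0 / 26.8   # stand-in for "a few hundred fb^-1" over 26.8 fb^-1
print(f"{scaled_reach(babar_limit, lumi_ratio):.1e}")
print(f"{scaled_reach(babar_limit, lumi_ratio, background_free=False):.1e}")
```

With these assumed inputs the linear scaling gives a reach of a few times $10^{-7}$, in line with the order-of-magnitude estimate, while the square-root scaling is noticeably weaker. This is one reason a dedicated sensitivity study is needed.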
In the SM, the LFV decay $\Upsilon(nS) \to \ell^- \ell'^+$, where $\ell$ and $\ell'$ represent leptons of different flavor, is highly suppressed. On the other hand, in the $VB$ and $U_1$ models, $\Upsilon(nS) \to \mu^- \tau^+$ receives significant contributions. Assuming the NP is purely left-handed, the decay rate for this process is given by
where $\eta_\tau = m_\tau/m_{\Upsilon(nS)}$ and $\kappa$ contains the coupling corresponding to the transition $b \bar b \to \tau\mu$. In the $VB$ and $U_1$ models we have
The decay constant $f_{\Upsilon(nS)}$ can be found using the electromagnetic decay $\Upsilon(nS) \to \ell^- \ell^+$, which is unaffected by NP. Its decay rate can be expressed as
where $\eta_\ell(nS) = m_\ell/m_{\Upsilon(nS)}$.
$^6$ Quark-flavor-violating quarkonium decays were considered in Ref. [80].
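To illustrate how a decay constant can be extracted from the electromagnetic width, the sketch below assumes the standard leading-order expression $\Gamma(\Upsilon \to \ell^-\ell^+) = \frac{4\pi\alpha^2 Q_b^2 f_\Upsilon^2}{3 m_\Upsilon}(1 + 2\eta_\ell^2)\sqrt{1 - 4\eta_\ell^2}$, whose normalization may differ from the convention used in this paper. The function name `decay_constant` is hypothetical, and the mass and leptonic width of the $\Upsilon(3S)$ are approximate inputs assumed for the example rather than values quoted in the text.

```python
import math

ALPHA = 1 / 137.036   # fine-structure constant; ASSUMPTION: running of alpha is ignored
Q_B = -1 / 3          # electric charge of the b quark


def decay_constant(m_upsilon, gamma_ll, m_lepton):
    """Invert the assumed leading-order width formula for f_Upsilon.

    All quantities are in GeV.
    """
    eta = m_lepton / m_upsilon
    phase_space = (1 + 2 * eta**2) * math.sqrt(1 - 4 * eta**2)
    f_squared = 3 * m_upsilon * gamma_ll / (
        4 * math.pi * ALPHA**2 * Q_B**2 * phase_space
    )
    return math.sqrt(f_squared)


# ASSUMED approximate inputs for Upsilon(3S) -> e+ e- (GeV).
f_3s = decay_constant(m_upsilon=10.3552, gamma_ll=0.443e-6, m_lepton=0.000511)
print(f"f_Upsilon(3S) ~ {f_3s:.2f} GeV")
```

Under these assumptions the result is a few hundred MeV. The precise value depends on the choice of $\alpha$ and on radiative corrections, which this sketch omits.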

Summary
There are therefore three observables that can distinguish the $VB$ and $U_1$ models:
does not contribute to the decay). This implies that the LFV decay $\tau \to 3\mu$, which is absent in the SM, should be observed at both LHCb and Belle II. This is therefore a smoking-gun signal: it can occur only in the $VB$ model, and if the decay is not seen, the model would be ruled out.
2. $R_K$: The current $3\sigma$ range for $R_K$ is $0.498 \le R_K \le 1.036$. The $U_1$ model can accommodate smaller values of $R_K$, while the $VB$ model cannot. Specifically, if future measurements of $R_K$ find it to be in the range 0.51 to 0.59, this would point to $U_1$ (and exclude $VB$).
3. $\Upsilon \to \mu\tau$: To date, the LFV decay $\Upsilon \to \mu\tau$ has been overlooked as a test of NP models in $B$ decays. Within the $VB$ model, ${\cal B}(\Upsilon(nS) \to \mu\tau)$ can be at most a few times $10^{-9}$, but in the $U_1$ model, it can reach a few times $10^{-8}$. Belle II should be able to measure ${\cal B}(\Upsilon(3S) \to \mu\tau)$ down to $\sim 10^{-7}$. However, this is only a very rough estimate; a detailed study is needed for a precise determination of the reach. It may be that, in fact, Belle II (or LHCb) will be able to observe branching ratios of $O(10^{-8})$. If the decay $\Upsilon \to \mu\tau$ is observed, this will suggest the $U_1$ model.
There are five other observables that receive contributions in the $VB$ and $U_1$ models:
However, either these observables cannot distinguish the two models, or, if they can, the predicted branching ratios fall below the expected reach of Belle II and LHCb.

Figure 3: As in Fig. 2 for the $VB$ model, but with $g^{33}_{qV} g^{33}_{\ell V} = 0.3$ (left) or $g^{33}_{qV} g^{33}_{\ell V} = 5$ (right). Left: the blue ($R_K$), cyan ($\tau \to 3\mu$) and red ($R_{D^*}$) regions barely overlap, so this is the minimum value of the coupling$^2$ for which $VB$ is viable. Right: the gray ($b \to s \nu \bar\nu$), cyan and red regions barely overlap, so this is the maximum value of the coupling$^2$ for which $VB$ is viable.

Varying the Couplings
The results of the previous subsection were obtained assuming that $g^{33}_{qV} g^{33}_{\ell V} = |h^{33}_{U_1}|^2 = 1$. However, there is nothing special about this value of the square of the coupling (henceforth denoted coupling$^2$). This raises the question: if the coupling$^2$ is allowed to take different values, how do the results of Sec. 5.1 change? This is examined in this subsection.
For each new value of the coupling$^2$, one must redo the analysis of Sec. 4 to determine the region in $(\theta_L, \theta_D)$ parameter space allowed by the various experimental constraints. That is, figures of the type in Fig. 2 are produced. The following results are found:
• For the $S_3$ and $U_3$ models, it is found that the $R_{D^*}$ and $B \to K^{(*)} \nu \bar\nu$ regions do not overlap, and this is independent of the value of the coupling$^2$. $S_3$ and $U_3$ are therefore excluded.
• The $VB$ model is viable only if $0.3 \le g^{33}_{qV} g^{33}_{\ell V} \le 5$; see Fig. 3.
• The $U_1$ model is viable only if $h^{33}$ Values of the coupling$^2$ larger than 5 are allowed; see Fig. 4.
In fact, we do have some information about the value of the coupling$^2$. One can set limits on coupling$^2/\Lambda^2_{\rm NP}$ from direct searches, assuming a certain mode of production for the new mediator states. Following Ref. [83], using the $b \bar b \to \tau\tau$ process mediated by $s$- or $t$-channel vector-boson or leptoquark exchange, one can get the following rough upper
For coupling$^2 = 1$, we found that (i) for both models, large deviations in $R_{D^{(*)}}$ from the SM are not favored, and (ii) the allowed maximum value of $R^{\rm ratio}_{D^{(*)}}$ is similar for the $VB$ and $U_1$ models, so that they cannot be distinguished using this observable. From the above numbers, we see that, when the coupling$^2$ is allowed to vary, (i) no longer holds: somewhat large (but not too large) values of $R^{\rm ratio}_{D^{(*)}}$ are now allowed. However, (ii) still holds. We therefore conclude that $R_{D^{(*)}}$ cannot be used to distinguish the $VB$ and $U_1$ models.
$^7$ To be precise, the bound given in Ref. [83] should be applied as $g^{33}_{qV} g^{33}_{\ell V} \cos^2\theta_D \cos^2\theta_L \le 3$ and $|h^{33}_{U_1} \cos\theta_D \cos\theta_L|^2 \le 5$ (for $\Lambda_{\rm NP} = 1$ TeV). The down-sector mixing, which reduces the rate of $b \bar b$ pair production, is negligible since $\theta_D \ll 1$ for the present case. As for the lepton mixing, it can at most reduce the decay rate into $\tau\tau$ by 15% (for $\theta_L \le \pi/8$). Here we (conservatively) ignore this effect, resulting in a slightly more stringent constraint on the coupling$^2$, as shown in the main text.
3. $\tau \to 3\mu$: (5.14)
As noted above, the expected reach of Belle II for ${\cal B}(\tau^- \to \mu^- \mu^+ \mu^-)$ is $< 10^{-10}$ [74], while for LHCb it is $< 10^{-9}$ [75]. For coupling$^2 = 1$, we found that $[{\cal B}(\tau^- \to \mu^- \mu^+ \mu^-)]_{\rm min} = 4.5 \times 10^{-9}$ [Eq. (5.3)], so that the $VB$ model predicts that $\tau \to 3\mu$ should be observed at both LHCb and Belle II. When the coupling$^2$ is allowed to vary, the minimum value for ${\cal B}(\tau^- \to \mu^- \mu^+ \mu^-)$ is smaller, but the decay should still be observed at Belle II. Thus, the $\tau \to 3\mu$ decay remains a smoking-gun signal for the $VB$ model.
4. $B \to K^{(*)} \mu\tau$: When the coupling$^2$ is allowed to vary, the values of ${\cal B}(B \to K^{(*)} \mu\tau)|_{\rm max}$ predicted by the $VB$ and $U_1$ models are quite different, so that, in principle, $B \to K^{(*)} \mu\tau$ decays could be used to distinguish the two models. Unfortunately, this will not work, as both values of ${\cal B}(B \to K^{(*)} \mu\tau)|_{\rm max}$ are below the reach of Belle II (which is $5 \times 10^{-7}$ [77]).
5. $B \to K^{(*)} \tau^+ \tau^-$: Here too, when the coupling$^2$ is allowed to vary, we find that the value of ${\cal B}(B \to K^{(*)} \tau^+ \tau^-)|_{\rm max}$ is quite different in the $VB$ and $U_1$ models. For $U_1$, this value may just be attainable at Belle II (its reach is $\sim 2 \times 10^{-4}$ [77]). Thus, $B \to K^{(*)} \tau^+ \tau^-$ could perhaps be used to distinguish the two models.
6. $B^0_s \to \mu\tau$: Once again, the value of ${\cal B}(B^0_s \to \mu\tau)|_{\rm max}$ is quite different in the two models. However, we cannot evaluate whether this decay can be used to distinguish the two models, as we do not know the reach of LHCb or Belle II for $B^0_s \to \mu\tau$.
7. $B^0_s \to \tau^+ \tau^-$: The value of ${\cal B}(B^0_s \to \tau^+ \tau^-)|_{\rm max}$ is quite different in the two models. However, we cannot evaluate whether this decay can be used to distinguish the two models, as we do not know the reach of LHCb or Belle II for $B^0_s \to \tau^+ \tau^-$.
8. $\Upsilon(3S) \to \mu\tau$: Previously, we made a rough estimate that Belle II should be able to measure ${\cal B}(\Upsilon(3S) \to \mu\tau)$ down to $\sim 10^{-7}$. We speculated that perhaps Belle II could do better than this (and noted that a precise determination of the reach can only be obtained through a detailed study). However, the above predicted values of ${\cal B}(\Upsilon(3S) \to \mu\tau)|_{\rm max}$ show that, even with our rough estimate, the $U_1$ model can lead to rates for $\Upsilon(3S) \to \mu\tau$ that are easily observable at Belle II. If this decay were seen, it would exclude $VB$ and point to $U_1$. This demonstrates the importance of this process for testing NP models in $B$ decays.
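The 15% figure quoted in footnote 7 for the effect of lepton mixing can be checked with a short calculation. The sketch below assumes that the reduction is set by the factor $\cos^2\theta_L$ appearing in the bound, which is our reading of the footnote rather than a statement from Ref. [83]. The helper `coupling_bound` is a hypothetical name introduced for illustration.

```python
import math


def coupling_bound(raw_bound, theta_l, theta_d=0.0):
    """Upper bound on coupling^2 once the mixing factors are divided out.

    ASSUMPTION: the direct-search bound applies to
    coupling^2 * cos^2(theta_D) * cos^2(theta_L), as in footnote 7.
    """
    return raw_bound / (math.cos(theta_d) ** 2 * math.cos(theta_l) ** 2)


theta_l_max = math.pi / 8
reduction = 1 - math.cos(theta_l_max) ** 2
print(f"maximum reduction: {reduction:.1%}")  # 14.6%, i.e. about 15%

# Ignoring the mixing (theta_L = 0) gives the more stringent bounds of 3 and 5.
for raw in (3.0, 5.0):
    print(raw, round(coupling_bound(raw, theta_l_max), 2))
```

Under this assumption, keeping the mixing factor at $\theta_L = \pi/8$ would loosen the bounds of 3 and 5 to roughly 3.5 and 5.9, so dropping it is indeed the conservative choice.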

Combining Observables
Above, we have seen that it is indeed possible to distinguish the $VB$ and $U_1$ models. $VB$ predicts that $\tau \to 3\mu$ should be seen at LHCb and Belle II, and there are several observables that are signals of $U_1$. Should one of these signals be seen, indicating the presence of a particular type of NP, it would of course be very exciting. However, even more information about the underlying NP model can be obtained by using the measurements of other observables.
The key point is that both models contain three unknown parameters: $\theta_L$, $\theta_D$ and coupling$^2/\Lambda^2_{\rm NP}$ (without loss of generality, we set $\Lambda_{\rm NP} = 1$ TeV). Then, given the measurement of an observable that indicates which of the $VB$ and $U_1$ models is present, one can use two other observables to derive the values of all the parameters of the model.
To illustrate this, suppose that $R_K$ and $R_{D^{(*)}}$ are measured very precisely, and $R_K = 0.781$ and $R^{\rm ratio}_{D^{(*)}} = 1.077$ are found.

Conclusions
At present, there are several measurements of $B$ decays that exhibit discrepancies with the predictions of the SM. These include $P'_5$ (from an angular analysis of $B \to K^* \mu^+ \mu^-$), the differential branching fraction of $B^0$ . These suggest NP in $\bar b \to \bar s \mu^+ \mu^-$ (first three signals) or $\bar b \to \bar c \tau^+ \nu_\tau$ ($R_{D^{(*)}}$). Now, suppose that NP is present, and that it couples preferentially to the left-handed third-generation particles in the gauge basis. In Ref. [30], it was noted that, if this NP is invariant under the full $SU(3)_C \times SU(2)_L \times U(1)_Y$ gauge group, then, when one transforms to the mass basis, one generates the operators $(\bar b_L \gamma_\mu s_L)(\bar\mu_L \gamma^\mu \mu_L)$ (which contributes to $\bar b \to \bar s \mu^+ \mu^-$) and $(\bar b_L \gamma_\mu c_L)(\bar\tau_L \gamma^\mu \nu_{\tau L})$ (which contributes to $\bar b \to \bar c \tau^+ \nu_\tau$). In other words, the $R_K$ and $R_{D^{(*)}}$ puzzles can be simultaneously explained.
This idea was explored in greater detail, using an effective field theory approach, in Ref. [44]. Here the starting point is a model-independent effective Lagrangian consisting of two four-fermion operators in the gauge basis, each with its own coupling. It was assumed that the transformation from the gauge basis to the mass basis leads to mixing only between the second and third generations. As a consequence, for the down-type quarks, only one unknown theoretical parameter is introduced: $\theta_D$. Similarly, for the charged leptons, $\theta_L$ is the new parameter. In the mass basis, the two operators contribute to a variety of $B$ decays, all with two quarks and two leptons at the fermion level: $B \to K^* \mu^+ \mu^-$, $B^0_s \to \phi \mu^+ \mu^-$, $R_K$, $R_{D^{(*)}}$, $B \to K^{(*)} \nu \bar\nu$. The coefficients of the operators in the mass basis are all functions of the coupling$^2$, $\theta_D$ and $\theta_L$. For assumed values of the coupling$^2$, the experimental measurements lead to an allowed region in $(\theta_L, \theta_D)$ space. This region was found to be nonempty, showing that a simultaneous explanation of $R_K$ and $R_{D^{(*)}}$ is possible.
There are two UV completions that can give rise to the effective Lagrangian. They are (i) $VB$: a vector boson that transforms as an $SU(2)_L$ triplet, as in the SM, and (ii) $U_1$: an $SU(2)_L$-singlet vector leptoquark.
The purpose of this paper is to explore ways of distinguishing the $VB$ and $U_1$ models. There are two reasons to think that this might be possible. First, the $VB$ model does not lead only to tree-level operators with two quarks and two leptons. It also produces four-quark and four-lepton operators. As such, it also contributes to processes such as $B^0_s$-${\bar B}^0_s$ mixing and $\tau \to 3\mu$. These will lead to additional constraints on $\theta_D$ and $\theta_L$, respectively. Second, while $VB$ contributes to $B \to K^{(*)} \nu \bar\nu$, $U_1$ does not. The net effect is that the experimental constraints on the $VB$ model are more stringent than those on the $U_1$ model. That is, the allowed region in $(\theta_L, \theta_D)$ space is smaller for $VB$ than for $U_1$. This implies that the predictions for the rates of other lepton-flavor-violating processes may be very different in the two models, which will allow us to distinguish them.
With this in mind, we proceeded as follows. First, for each of the models, we applied the relevant experimental constraints to determine the allowed region in $(\theta_L, \theta_D)$ space. The constraints from the measurements of $R_K$, $R_D$, $R_{D^*}$, and $\tau \to \mu\phi$ applied to both models. For $VB$ there were additional constraints from $B \to K^{(*)} \nu \bar\nu$, $B^0_s$-${\bar B}^0_s$ mixing, and $\tau \to 3\mu$. Second, using the allowed $(\theta_L, \theta_D)$ region, we computed the predictions of the two models for various observables. (This was done for $g^{33}_{qV} g^{33}_{\ell V} = 1$ ($VB$) and $h^{33}$
We note in passing that, for $g^{33}_{qV} g^{33}_{\ell V} = h^{33}$ .6)] than in the effective field theory analysis of Ref. [44]. This is because all constraints have been included in the model analysis. This illustrates that the results from the effective field theory analysis must be used carefully: despite being "model-independent," they do not necessarily apply to all models.
We found that it is indeed possible to distinguish the $VB$ and $U_1$ models experimentally. The results when the couplings vary in the ranges $0.3 \le g^{33}_{qV} g^{33}_{\ell V} \le 3$ and $0.5 \le h^{33}$
This should be observable at Belle II (the expected reach is $< 10^{-10}$). The $\tau \to 3\mu$ decay is therefore a smoking-gun signal for the $VB$ model. There is no similar observable for the $U_1$ model. However, there are a number of processes that can potentially point to $U_1$. For the decay $\Upsilon(3S) \to \mu\tau$, we estimated that Belle II should be able to measure its branching ratio down to $\sim 10^{-7}$. The $U_1$ ($VB$) model predicts ${\cal B}(\Upsilon(3S) \to \mu\tau)|_{\rm max} = 7.9 \times 10^{-7}$ ($2.2 \times 10^{-8}$). Thus, if this decay were observed, it would indicate $U_1$ (and exclude $VB$). Another possibility is $R_K$. Its present allowed $3\sigma$ range is $0.498 \le R_K \le 1.036$. The $U_1$ ($VB$) model predicts $(R_K)_{\rm min} = 0.51$ (0.58). The $U_1$ model can therefore accommodate smaller values of $R_K$ than can the $VB$ model, so that, if future measurements find $R_K$ to be in the range 0.51 to 0.58, this would exclude $VB$ and favor $U_1$. Finally, for the other decays $B \to K^{(*)} \mu\tau$, $B \to K^{(*)} \tau^+ \tau^-$, $B^0_s \to \mu\tau$, and $B^0_s \to \tau^+ \tau^-$, in all cases the $U_1$ model predicts larger branching ratios than does $VB$. However, whether or not these decays can be used to distinguish the two models depends on whether they can be observed at Belle II or LHCb.
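The discrimination logic for $\Upsilon(3S) \to \mu\tau$ and $R_K$ described above can be summarized in a small sketch. It uses only the extrema quoted in this section and treats them as hard cuts, ignoring experimental uncertainties, which is a simplifying assumption. The names `compatible_models`, `UPSILON_BR_MAX` and `RK_MIN` are hypothetical.

```python
UPSILON_REACH = 1e-7                           # rough Belle II reach estimated in the text
UPSILON_BR_MAX = {"U1": 7.9e-7, "VB": 2.2e-8}  # quoted maximum branching ratios
RK_MIN = {"U1": 0.51, "VB": 0.58}              # quoted minimum values of R_K


def compatible_models(upsilon_br=None, rk=None):
    """Return the models not excluded by the given (hypothetical) measurements.

    ASSUMPTION: quoted extrema are treated as sharp boundaries, with no
    experimental or theoretical uncertainty.
    """
    models = {"U1", "VB"}
    if upsilon_br is not None:
        models = {m for m in models if upsilon_br <= UPSILON_BR_MAX[m]}
    if rk is not None:
        models = {m for m in models if rk >= RK_MIN[m]}
    return sorted(models)


print(compatible_models(upsilon_br=3e-7))           # ['U1']
print(compatible_models(rk=0.55))                   # ['U1']
print(compatible_models(upsilon_br=1e-8, rk=0.70))  # ['U1', 'VB']
print(UPSILON_BR_MAX["VB"] < UPSILON_REACH)         # True
```

The last line reflects the statement that any observation of $\Upsilon(3S) \to \mu\tau$ at the estimated reach would lie above the $VB$ maximum and would therefore point to $U_1$.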
Notes Added: (1) While this paper was being completed, the Belle Collaboration released a new measurement of $R_{D^*}$ [84]. They find consistency with the SM at the level of $0.6\sigma$. If this result is combined with the previous results of BaBar, Belle and LHCb, the discrepancy with the SM is reduced. In any case, neither of the $VB$ and $U_1$ models presented in this paper allows for large deviations in $R_{D^{(*)}}$ from the SM. Thus, this result is rather favored. (2) After this paper was submitted to the arXiv, we were informed that LHCb has now set the upper limit ${\cal B}(B^0_s \to \tau^+ \tau^-) < 3.0 \times 10^{-3}$ (95% C.L.) [85].