$\tau \to \ell +$ invisible through invisible-savvy collider variables

New particles $\phi$ in the MeV-GeV range produced at colliders and escaping detection can be searched for at operating $b$- and $\tau$-factories such as Belle II. A typical search topology involves pair-produced $\tau$s (or mesons), one of which decays to visibles plus the $\phi$, while the other provides a tag. One crucial impediment of these searches is the limited ability to reconstruct the parents' separate boosts. This is the case in the 'typical' topology where both decay branches include escaping particles. We observe that such a topology lends itself to the use of kinematic variables such as $M_2$, designed for pairwise decays to visibles plus escaping particles, and endowed with a built-in ('MAOS') way to efficiently guess the parents' separate boosts. Starting from this observation, we construct several kinematic quantities able to discriminate signal from background, and apply them to a benchmark search, $\tau \to e + \phi$, where $\phi$ can be either an axion-like particle or a hidden photon. The variables we consider can be applied to a wider range of topologies than the current reference technique, based on the event thrust, with which they are nearly uncorrelated. Application of our strategy leads to an improvement by a factor close to 3 in the branching-ratio upper limit for $\tau \to e \phi$, with respect to the currently expected limit, assuming $m_\phi \lesssim 1$ MeV. For example, we anticipate a sensitivity of $1.7 \times 10^{-5}$ with the data collected before the 2022 shutdown.

New light particles are commonplace in Standard-Model (SM) extensions and allow one to solve elegantly problems of both conceptual and observational nature, see e.g. [1]. Depending on the structure of their couplings, these particles may actually be as heavy as a GeV. Remarkably, new scalars in the MeV-GeV range with larger-than-weak couplings to SM matter are fully compatible with the body of knowledge we have on stable matter [2], because most constraints, notably astrophysical data, apply to interactions with first-generation matter only. Theoretically, there is no compelling reason why these particles' couplings should be universal across generations, or flavour-diagonal [3]. Meson or $\tau$ decays at colliders are especially suited to test such couplings, not only by definition, but also because of the large statistics and accuracy now attainable.
A common hypothesis that allows for minimal model dependence is that these light particles, once produced in the decay, escape detection. The strongest limits are obtained in missing-energy searches where the new particle is either produced in the beam interaction with a fixed target, or is the product of collisions whose initial state is very well known, e.g. $e^+ e^-$ beams. For a recent comprehensive review see Ref. [2].
arXiv:2106.16236v2 [hep-ph] 17 Oct 2021
A prototype example of a search under the above hypothesis is $\tau \to \ell$ ($\ell = e$ or $\mu$) plus an axion-like particle (ALP, denoted by $\phi$) [4], performed at Mark III [5] and ARGUS [6], and on-going at Belle II [7] (a recent limit was also placed by [8]). At these facilities, the parent $\tau$'s are pair-produced at well-defined energies and their decay products are collected over a large angular acceptance, which allows for an accurate estimate of the total missing energy of the system. The dominant background to this kind of search is represented by SM processes also containing undetected particles, notably neutrinos. Hence, for the rare signal and the overwhelming background alike, the separate momenta of the pair-produced $\tau$'s are unknown. To pinpoint the signal, the reference strategy has historically been to estimate the signal-$\tau$ momentum using the visible momenta on the tag side. If for the latter one assumes $\tau \to 3\pi\nu$ (note that the $3\pi$ allow for a high-quality vertex), the signal-$\tau$ momentum may then be estimated via the relations [6]
$$E_\tau \approx \sqrt{s}/2, \qquad \hat{p}_\tau \approx -\hat{p}_{3\pi},$$
where, here and henceforth, a hat denotes a unit vector. A generalisation of this technique [9], representing the current state of the art, takes advantage of the 'thrust axis' $\hat{n}$ [10,11] of the event, identified from the maximum of the 'thrust scalar' $T \equiv \sum_i |\vec{p}_i \cdot \hat{n}| / \sum_i |\vec{p}_i|$, where $\vec{p}_i$ denotes all the visible momenta in the decay. This direction can be used for approximating the signal-$\tau$ momentum [7] as
$$E_\tau \approx \sqrt{s}/2, \qquad \hat{p}_\tau \approx \hat{n}_{\rm thrust}.$$
In either case, the spectrum of the signal-side daughter lepton accompanying the $\phi$ particle is then calculated through a boost to the rest frame of the parent $\tau$.
We propose a different approach based on the following observations: (i) signal and background decays have a common topology, consisting of visible final states plus either the elusive $\phi$ (signal), or neutrinos (backgrounds).
There exists an arsenal of kinematic variables that are designed precisely for such a pairwise decay topology, in particular the 'stransverse mass' $M_{T2}$ [12,13] and Lorentz-invariant generalisations thereof [14], here collectively denoted as $M_2$; (ii) $M_2$ is a minimisation procedure in the three unknowns constituting the invisible 3-momentum on one of the two decay branches. The momentum at the minimum, henceforth referred to as the $M_2$-Assisted On-Shell, or 'MAOS', invisible momentum [15,16], is distributed around the corresponding true invisible 3-momentum. $M_2$ thereby offers a 'built-in' estimator of the invisible 3-momenta, separately for the two branches, which addresses the underlying challenge of the search; (iii) we expect $M_2$, as well as MAOS momenta, to show negligible correlation with variables built out of visible momenta only. Because of this small correlation, all of these variables can be profitably combined. Our results show that the performance of such a combination is substantially higher than in the case where the different variables discussed are used individually.
We introduce our idea in the context of $\tau \to \ell + \phi$, with $\phi$ an ALP. However, the kinematic methods discussed may be applied to any other beyond-SM scenario with the same topology, e.g. lepton-flavour-violating couplings mediated by a spin-0 or spin-1 particle. We will comment notably on the case of a 'hidden photon'.
The signal of interest is
$$e^+ e^- \to \tau(\to \ell\,\phi)\;\tau(\to 3\pi\,\nu),$$
where we have omitted charge specifications on the r.h.s., and we assume that the $\phi$ escapes undetected. The $\tau$ decay to three charged pions is used as a tag. We denote this channel as '$1\times3$'. The dominant, irreducible background is
$$e^+ e^- \to \tau(\to \ell\,\nu\bar{\nu})\;\tau(\to 3\pi\,\nu),$$
while other channels such as $\tau(\to \pi\nu)\,\tau(\to 3\pi\nu)$ are all suppressed by either PID requirements, kinematic selections, event-shape analysis or track vertexing. The tag decay in eq. (2) is chosen for comparison with current state-of-the-art measurements; however, MAOS momenta can also be calculated for 1-prong tags such as $\tau(\to \ell\nu\bar{\nu})$, for which the ARGUS/thrust method is not available. We will thus also consider this '$1\times1$' channel, whose irreducible background is $\tau(\to \ell\nu\bar{\nu})\,\tau(\to \ell\nu\bar{\nu})$. Clearly, in terms of the decay topology the signal and backgrounds thus differ only by the number of invisible particles. As mentioned, this topology lends itself to the use of $M_{T2}$ and its generalisations. Such variables have been extensively applied to high-$p_T$ searches, notably of pair-produced supersymmetric particles. To our knowledge, however, this approach has never been considered for decays of pair-produced mesons or leptons of the kind of interest here.
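The $M_{T2}$ variable is the two-decay-chain generalisation of the $M_T$ variable [17,18], used since CERN's UA1 experiment to measure the $W$ mass in $W \to \ell\nu$, and defined from the inequality
$$m_W^2 \geq M_T^2 \equiv m_\ell^2 + m_\nu^2 + 2\,\big(E_T^{\ell} E_T^{\nu} - \vec{p}_T^{\,\ell} \cdot \vec{p}_T^{\,\nu}\big), \qquad E_T \equiv \sqrt{m^2 + |\vec{p}_T|^2},$$
implying that the $M_T$ endpoint allows one to measure $m_W$. If one has two parents decaying to visible products with collective momenta $p_{1,2}$ for the two branches, plus invisible final states on either branch, one may generalise the above argument with $\max\{M_T({\rm branch}\ 1), M_T({\rm branch}\ 2)\}$: the larger of the two $M_T$ values furnishes the best lower bound on the parent mass. While the two invisible momenta, $k_{1,2}$, are not known individually, they fulfil the constraint $\vec{k}_{1T} + \vec{k}_{2T} = \vec{P}_T^{\rm miss}$, where the r.h.s. denotes the measured transverse momentum imbalance. Hence, the 'most conservative max' one can take is the minimum over the configurations fulfilling such a constraint, i.e.
$$M_{T2} \equiv \min_{\vec{k}_{1T} + \vec{k}_{2T} = \vec{P}_T^{\rm miss}} \max\big\{ M_T(\vec{p}_{1T}, \vec{k}_{1T}),\ M_T(\vec{p}_{2T}, \vec{k}_{2T}) \big\},$$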
which defines $M_{T2}$ [12,13]. The subscript "T" in the above discussion denotes the projection onto the plane transverse to the beam direction. Such a projection is unnecessary at lepton colliders like Belle II, where the full momentum imbalance, transverse as well as longitudinal, is reconstructible. We can then employ the fully Lorentz-invariant extension of $M_{T2}$, known as $M_2$ [14] (see also [19,20]). As elucidated in Ref. [20], several variations of the $M_2$ variable can actually be used for one and the same topology, depending on which kinematic constraints (i.e. on-shell mass relations) are imposed in the minimisation and which are not. This feature makes $M_2$ an extremely versatile tool. Since the center-of-mass system (CMS) energy is fixed, we focus on the following definition
$$M_2 \equiv \min_{\vec{k}_1, \vec{k}_2} \max\big\{ M(p_1, k_1),\ M(p_2, k_2) \big\} \quad \text{subject to} \quad \vec{k}_1 + \vec{k}_2 = \vec{P}^{\rm miss}, \quad (p_1 + p_2 + k_1 + k_2)^2 = s, \qquad (5)$$
where $M^2(p_i, k_i) \equiv (p_i + k_i)^2$, $s$ is the squared collision energy, and $p_i$ ($k_i$) denote here and henceforth the visible-(invisible-)system total momenta on the decay branch $i = 1, 2$. One may further subject eq. (5) to the decaying parents' on-shell mass relations, i.e. $(p_1 + k_1)^2 = (p_2 + k_2)^2 = m_\tau^2$. However, the additional constraints lead to the same minimum as our $M_2$ definition [21] (see footnote 5). As for $M_{T2}$, the $M_2$ endpoint is the parent-particle mass. Compared to $M_{T2}$, $M_2$ distributions peak at a higher value and are more populated toward the endpoint. Interestingly, one common feature is that the smaller the number of invisible particles, the more the distributions are populated toward their upper edge (for $M_{T2}$, see related discussions in [23,24]). This is displayed in fig. 1 for the case $m_\phi = 1$ MeV. Hence a shape analysis of both $M_{T2}$ and $M_2$ could in principle be used to extract information about the number of invisible particles in the event, i.e. whether the event is more signal- or background-like. Given the correlation between $M_{T2}$ and $M_2$, for definiteness we only consider the latter in the rest of our discussion.
Importantly, the MAOS solution to the constrained minimisation in eq. (5) can be used as an estimator of the true values of $\vec{k}_{1,2}$, to be denoted as $\vec{k}^{\rm maos}_{1,2}$ [15,16]. As in the $M_{T2}$ case, the $M_2$-based MAOS momenta [20,25] are distributed symmetrically around the true momenta, and peak at the respective true values. A few remarks are in order. First, the $M_2$-based MAOS method turns out to be more efficient than the traditional MAOS method from $M_{T2}$ [25]. One reason is the fact that the $M_{T2}$-based MAOS solution comes with a twofold ambiguity in the longitudinal components of the invisible momenta, whereas the $M_2$-based MAOS solution is unique, as all momentum components are treated on a similar footing. Second, according to the definition of the full invariant masses $M$ in eq. (5), one further piece of information would be required: $k_{1,2}^2$. We set $k_{1,2}^2 = 0$, which usefully ensures the inequality $M_2 \leq m_\tau$ for both signal and background events [15,16]. At face value, $k_{1,2}^2 = 0$ is a bad guess, because e.g. for the background channel $\ell\nu\bar{\nu}$ one has $k^2 = m_{\nu\nu}^2$, which peaks around 1 GeV$^2$. We inspected more refined ansätze, including the unfeasible case where one uses the truth-level $k_{1,2}^2$ values. We found that the $M_2$ distributions do not depend significantly on this choice, in agreement with existing literature [15,16], although this issue may warrant further scrutiny.
Footnote 5: In the definition (5), $M_2$ is similar to $M_{2{\rm Cons}}$, devised to study $H \to \tau^+ \tau^-$ at hadron colliders [21,22], except that the constraint on the missing longitudinal momentum is not adopted.
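To make the constrained minimisation and the MAOS momenta more concrete, the following is a minimal brute-force sketch, not the authors' implementation (their numerical analysis relies on the YAM2 library). It assumes massless invisible systems ($k_{1,2}^2 = 0$), momenta expressed in the CMS, and the two constraints $\vec{k}_1 + \vec{k}_2 = \vec{P}^{\rm miss}$ and $|\vec{k}_1| + |\vec{k}_2| = \sqrt{s} - E_1 - E_2$. Under these assumptions the direction of $\vec{k}_1$ fixes its magnitude, so a simple scan over directions suffices. The function names `inv_mass` and `m2_scan` and the grid sizes are hypothetical choices made for illustration.

```python
import math

def dot3(a, b):
    """Euclidean dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def inv_mass(p, k):
    """Invariant mass of a visible four-vector p = (E, px, py, pz)
    combined with a massless invisible 3-momentum k (assumption: k^2 = 0)."""
    e = p[0] + math.sqrt(dot3(k, k))
    px, py, pz = p[1] + k[0], p[2] + k[1], p[3] + k[2]
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def m2_scan(p1, p2, sqrt_s, n_theta=60, n_phi=120):
    """Grid-scan estimate of M_2 in the CMS.

    Returns (M_2, k1, k2), where k1 and k2 are the invisible 3-momenta
    at the minimum, i.e. the MAOS momenta on the scanned grid."""
    pmiss = [-(p1[i] + p2[i]) for i in (1, 2, 3)]  # total invisible 3-momentum
    emiss = sqrt_s - p1[0] - p2[0]                 # total invisible energy
    pm2 = dot3(pmiss, pmiss)
    best = (float("inf"), None, None)
    for i in range(n_theta):
        cos_t = -1.0 + (2 * i + 1) / n_theta       # bin midpoints in cos(theta)
        sin_t = math.sqrt(1.0 - cos_t ** 2)
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            n = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
            # |k1| solving |k1| + |pmiss - k1| = emiss along direction n
            k1mag = (emiss ** 2 - pm2) / (2.0 * (emiss - dot3(n, pmiss)))
            k1 = [k1mag * c for c in n]
            k2 = [pmiss[a] - k1[a] for a in range(3)]
            m = max(inv_mass(p1, k1), inv_mass(p2, k2))
            if m < best[0]:
                best = (m, k1, k2)
    return best
```

A grid scan can only overestimate the true minimum, so a dedicated constrained minimiser is preferable in practice; the sketch is meant only to show which quantities enter the definition and how the MAOS momenta come out as a by-product of the minimisation.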
With the thus-defined $\vec{k}^{\rm maos}_{1,2}$ we can construct additional variables that would require knowledge of the invisible momenta, and that offer criteria for signal-background discrimination. A first example is $|\vec{p}_\ell|_{\tau\text{-RF}}$ in the $1\times3$ channel. In the $\tau$ rest frame (RF), obtained through either the MAOS or thrust methods, this quantity is equivalent to the signal-side invisible momentum. In fig. 2 we thus compare the distribution for $p_e^{\rm maos} \equiv |\vec{p}_e|^{\rm maos}_{\tau\text{-RF}}$ with the corresponding quantity obtained through the thrust, for $m_\phi = 1$ MeV, representative of the small-mass case. For comparison, the case $m_\phi = 1$ GeV is shown in fig. 6 in the Appendix. In either case we restrict to the $1\times3$ channel. MAOS and thrust achieve a comparable separation between the signal and the background distributions for $m_\phi = 1$ GeV, whereas MAOS performs better for small $m_\phi$, as quantified by the full analysis, to be discussed at the end of the paper.
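The discriminating power of the lepton momentum in the $\tau$ rest frame follows from two-body kinematics: for $\tau \to e\phi$ the electron momentum in the true $\tau$ rest frame is fixed by the masses, whereas for the three-body background it is a continuous spectrum. The sketch below illustrates the two ingredients, the two-body momentum and the boost into the rest frame of an estimated parent four-momentum (for instance $p_e + k^{\rm maos}$ on the signal side). The helper names and the mass values are illustrative assumptions, not taken from the paper.

```python
import math

M_TAU = 1.77686   # GeV, approximate tau mass (illustrative value)
M_E = 0.000511    # GeV, approximate electron mass (illustrative value)

def two_body_momentum(m_parent, m_a, m_b):
    """Momentum magnitude of either daughter in the parent rest frame."""
    lam = (m_parent ** 2 - (m_a + m_b) ** 2) * (m_parent ** 2 - (m_a - m_b) ** 2)
    return math.sqrt(lam) / (2.0 * m_parent)

def boost_to_rest_frame(p, parent):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of `parent`."""
    e_par = parent[0]
    m_par = math.sqrt(e_par ** 2 - sum(c * c for c in parent[1:]))
    beta = [c / e_par for c in parent[1:]]
    b2 = sum(b * b for b in beta)
    if b2 == 0.0:
        return tuple(p)                      # parent already at rest
    gamma = e_par / m_par
    bp = sum(b * c for b, c in zip(beta, p[1:]))
    e_new = gamma * (p[0] - bp)
    coef = (gamma - 1.0) * bp / b2 - gamma * p[0]
    return (e_new,) + tuple(c + coef * b for c, b in zip(p[1:], beta))

def momentum_in_parent_rf(p, parent):
    """|p| of a daughter after boosting to the estimated parent rest frame."""
    boosted = boost_to_rest_frame(p, parent)
    return math.sqrt(sum(c * c for c in boosted[1:]))
```

With a perfect estimate of the parent momentum, `momentum_in_parent_rf` applied to signal electrons would return `two_body_momentum(M_TAU, M_E, m_phi)` for every event; the width of the reconstructed peak then reflects the quality of the MAOS or thrust estimate.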
One further quantity constructible from $\vec{k}^{\rm maos}_{1,2}$ is the ratio
$$\xi_k \equiv \frac{\min\{|\vec{k}_1|, |\vec{k}_2|\}}{\max\{|\vec{k}_1|, |\vec{k}_2|\}}, \qquad (6)$$
with $\xi_k^{\rm maos}$ denoting the corresponding ratio calculated with $\vec{k}^{\rm maos}_{1,2}$, and $\xi_p$ the analogous ratio built from the visible momenta $\vec{p}_{1,2}$. This variable is reminiscent of the $R_{p_T}$ ratio pointed out in Ref. [23], with two differences. First, $R_{p_T}$ is constructed in terms of the visible momenta, whereas $\xi_k$ requires the invisible ones, separately for the two branches, which is precisely what MAOS provides. Second, $R_{p_T}$ is a 'max-over-min' ratio, implying the noncompact domain $[1, \infty)$, i.e. a long distribution tail. The difference between signal and background is thus 'diluted' over this tail. Conversely, $\xi_k$ spans the compact domain $[0, 1]$, which enhances the shape difference between signal and background. The $\xi_k$ distribution performs best of all and is shown in fig. 3 (first panel) for the case $m_\phi = 1$ MeV. Note that $\xi_{k,p}$ could be defined in the lab or CMS frames. We used the CMS-frame definition, where the slope differences are more pronounced. For purposes of comparison, the $\xi_p$, $R_{k_T}$ and $R_{p_T}$ distributions are shown in the remaining panels of fig. 3. Besides, all of $\xi_{k,p}$ and $R_{k_T, p_T}$ are also shown in fig. 6 of the Appendix for the case $m_\phi = 1$ GeV. The underlying rationale of eq. (6) is that this ratio is expected to be closer to unity for the $1\times1$-channel background (4th entry in the legend), because of the symmetric decay chains. By the same argument, we also expect the distribution 'slope' to decrease roughly with the number of invisibles. The $\xi_k$ histograms, and to a lesser extent the $\xi_p$ ones, display both of these features.
We briefly discuss other known variables that do not require MAOS momenta, and that show a small enough correlation with those discussed so far. One popular example for lepton colliders is the recoil mass [26,27], defined as $M_{\rm recoil}^2 = (P_{\rm CMS} - p_1 - p_2)^2$, i.e. the invariant mass of the full invisible system.
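Both quantities just introduced are simple functions of the momenta. As a minimal sketch, assuming that $\xi$ is the min-over-max ratio of the two branch momentum magnitudes (as its $[0, 1]$ domain indicates) and that four-vectors are stored as `(E, px, py, pz)` tuples, they may be computed as follows. The function names are hypothetical.

```python
import math

def norm3(v):
    """Magnitude of a 3-vector."""
    return math.sqrt(sum(c * c for c in v))

def xi(a, b):
    """Min-over-max ratio of two 3-momentum magnitudes, lying in [0, 1].

    Pass the two MAOS invisible momenta for xi_k, or the two visible
    momenta for xi_p. Returns 0.0 in the degenerate case of two null vectors."""
    na, nb = norm3(a), norm3(b)
    hi = max(na, nb)
    return min(na, nb) / hi if hi > 0.0 else 0.0

def recoil_mass(p_cms, p1, p2):
    """Recoil mass from M_recoil^2 = (P_CMS - p1 - p2)^2.

    Negative squared masses due to rounding or resolution are clipped to zero."""
    q = [p_cms[i] - p1[i] - p2[i] for i in range(4)]
    m2 = q[0] ** 2 - q[1] ** 2 - q[2] ** 2 - q[3] ** 2
    return math.sqrt(max(m2, 0.0))
```

For example, `recoil_mass((10.58, 0, 0, 0), p1, p2)` evaluates the recoil mass in the CMS for a collision energy of 10.58 GeV, a value used here only for illustration.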
Since there are more invisible particles in the background, they 'typically' have a larger invariant mass than in the signal case. This property is clearly visible in the $M_{\rm recoil}$ distribution in fig. 4 (first panel), for both the $1\times3$ and the $1\times1$ cases. One may reverse the argument for the variable $E_{\rm miss} \equiv |\vec{P}^{\rm miss}|$: in the $\tau$-pair rest frame, the invisible particles are boosted along the momentum direction of the parent $\tau$, so their three-momenta partially cancel those from the other decay chain. The more symmetric the decay, the more efficient this cancellation. Hence one may expect a 'thicker' tail for the signal decay $\phi + \nu\nu$ than for the corresponding background $\nu\nu + \nu\nu$. In practice, the discriminating power of $E_{\rm miss}$ is inferior to that of $M_{\rm recoil}$, as shown by the second panel of fig. 4.
Before presenting our main analysis, we collect details about our setup. We generate $e^+ e^- \to \tau^+ \tau^-$ using MadGraph. Tag-side decays and backgrounds are obtained through TauDecay [29], whereas signal-side decays are populated as phase space through ROOT. We generate about $1.5 \cdot 10^7$ events per process. In order to populate the phase space in a similar way as Ref. [7], we apply the cut $0.8 \leq T \leq 0.99$ on the thrust scalar. We also inspected the effect of including further cuts on the total visible energy and on the invariant mass of the 3-prong system, used in the Belle-II analysis [7] for the suppression of reducible backgrounds, and found it to be negligible in our case. Momentum smearing due to detector effects is typically 1% and we safely neglect this effect. To be more exact, event distributions are vastly more populated for $p_T > 0.2$ GeV/c [30], which is the region where the below-1% momentum-smearing figure holds. Our numerical analysis uses the public library YAM2 [31] and the TMVA [32] class available in ROOT. We restrict to phase-space decays for comparison with Ref. [7], and also for the following reason.
The variables discussed above are insensitive to the angular distribution of the new particle. The cuts implemented to mimic the search in Ref. [7] will not modify angular distributions either, as these cuts affect invariant masses or momentum magnitudes. We also note that, since our decay of interest is 2-body, the decay amplitude is isotropic and so is the differential decay rate [33]. As a consequence, our results apply equally to the case of an ALP and of a hidden photon, irrespective of the chirality of the coupling. A separate, interesting question would be to construct "invisible-spin-savvy" variables, which exploit MAOS or thrust momenta in spin-sensitive kinematic variables (as in e.g. [34]), in order to tell apart different coupling assumptions.
We next discuss the main analysis. We first note that $M_2$, $\xi_{k,p}$, $M_{\rm recoil}$ and $E_{\rm miss}$ can be unambiguously calculated for the $1\times3$ and $1\times1$ cases alike (see footnote 8), and their distributions depend, to different degrees, on the number of invisibles in the decay. We thus collectively denote this ensemble as 'invisible-savvy' variables, and construct a classifier that we refer to as ISy.
Note that ISy does not include $p_e^{\rm maos}$. We then consider a set of cases, labelled (a), (b), (c), (d) and (abc) in the following. We refrain from including an (abcd) case, which would show the 'maximal' improvement achievable from both the use of two channels in lieu of one, and the use of more variables. In fact, a reliable combination of $1\times3$ and $1\times1$ is not straightforward, because of the different tag, and should be performed on actual data. Fig. 5 (first panel) presents a comparison of the performance profiles for the cases discussed above, i.e. (a) to (d), in the plane of signal efficiency vs. background rejection, yielding the 'ROC' curve. In the single-variable cases (a) and (b), we use a cut-based approach rather than a BDT/NN, with the optimal cut obtained by TMVA. We see that, for $m_\phi = 1$ MeV, $p_e^{\rm maos}$ has a larger area under the curve (AUC) than $p_e^{\rm thrust}$ for any signal efficiency. For $m_\phi = 1$ GeV, the two AUCs are comparable, and we show the corresponding plot in fig. 7 (left) in the Appendix. In the remaining panels of this figure we also show, for the sake of reproducibility, the BDT response for both cases $m_\phi = 1$ MeV and 1 GeV.
Our S/B separation can be translated into an estimate of the upper limit (UL) on $\mathcal{B}(\tau \to e\phi)$ achievable with a given Belle-II luminosity. To determine such a limit we proceed as follows. Given a sample of $N_s$ signal and $N_b$ background events, the weight of the sample background can be determined as $w_b = B/N_b$ where, in our case, $B = \sigma_{\tau\tau} \times \mathcal{B}(\tau\tau \to {\rm bkg}) \times L$, with $\sigma_{\tau\tau}$ the total $\tau^+\tau^-$ production cross-section and $L$ the luminosity. We need the weight $w_s$ for the signal sample, from which we may estimate the statistical significance $\sigma(w_s)$. The $\sigma(w_s)$ value corresponding to a 95% confidence-level (CL) exclusion is given by $\sigma(w_s) = 1.96$ [33], a relation that we invert in terms of $w_s$. (With fixed $N_{s,b}$, one may proceed iteratively.) This procedure is independent of the template-fit method used for the on-going Belle-II analysis [7], hence the agreement with [7] of the $p_e^{\rm thrust}$-case upper-limit curve shown in fig. 5 is a non-trivial check of our approach. We also verified that, after the classifier cut, $B$ is large enough that the above approximate relations are valid [36].
Footnote 8: On the other hand, application of $p_e^{\rm maos}$ and $p_e^{\rm thrust}$ is not straightforward in the $1\times1$ case, because the symmetry of the topology implies a combinatorial ambiguity, introducing a separate source of uncertainty. Including these variables in this case requires a dedicated study (see e.g. the discussion in [35]).
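The inversion of $\sigma(w_s) = 1.96$ can be sketched numerically as follows. The explicit significance formula is not reproduced in the text above, so the sketch assumes, purely for illustration, the standard asymptotic expression $\sqrt{2[(s+b)\ln(1+s/b) - s]}$ for $s$ signal events on top of $b$ background events; the actual expression used in the analysis may differ. The names `significance` and `solve_signal_weight` are hypothetical, and `n_s`, `n_b` stand for the numbers of sample events surviving the classifier cut.

```python
import math

def significance(s, b):
    """Assumed asymptotic significance for s signal on b background events.

    This specific form is an illustrative assumption, not necessarily
    the formula used in the analysis described above."""
    z2 = 2.0 * ((s + b) * math.log1p(s / b) - s)
    return math.sqrt(max(z2, 0.0))

def solve_signal_weight(n_s, n_b, w_b, z_target=1.96, tol=1e-10):
    """Find w_s such that significance(w_s * n_s, w_b * n_b) = z_target.

    Uses bisection, which is valid because the significance grows
    monotonically with the signal yield at fixed background."""
    b = w_b * n_b
    lo, hi = 0.0, 1.0
    while significance(hi * n_s, b) < z_target:   # enlarge bracket if needed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if significance(mid * n_s, b) < z_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since the signal weight is proportional to the signal branching ratio at fixed luminosity and sample size, the value of `w_s` returned here translates directly into an upper limit once the normalisation of the generated signal sample is specified.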
Before concluding with the main analysis results we note that our approach has a wide range of applicability, be it to searches for new decays to invisibles or to improving the knowledge of background decays to invisibles, to the extent that the 'pairwise-decay' topology is the same. Besides, if one restricts to transverse variables, a similar approach may also be applied to meson or $\tau$ decays at hadron colliders.
We conclude by presenting the 95% CL upper limit on the signal branching ratio (BR) as a function of the new light particle mass, for different assumed Belle-II luminosities $L$, and with the different classifiers discussed. These upper limits are summarised in the middle and rightmost panels of fig. 5. In particular, the former shows, for the integrated luminosity $L = 0.1$ ab$^{-1}$, a comparison among the cases (a), (b), (c), (d), (abc) discussed above; conversely, the last panel of fig. 5 focuses on the evolution of the expected upper limit with luminosity, and shows the cases $L = \{0.1, 1, 50\}$ ab$^{-1}$, corresponding to the dataset accumulated as of Summer 2021, the dataset anticipated before the 2022 shutdown, and the target Belle-II dataset, respectively. As also suggested by the ROC curves, $p_e^{\rm maos}$ allows for a better BR limit than $p_e^{\rm thrust}$ for small $m_\phi \lesssim 0.1$ GeV. For example, this improvement is a factor of $\approx 1.8$ for $m_\phi \approx 0$ and $L = 0.1$ ab$^{-1}$. In short, for an ALP or hidden vector of small $m_\phi \lesssim 1$ MeV, we anticipate that application of our full strategy to the $1\times3$ channel alone will lead to 95%-CL limits for the three luminosities above (for example, around $1.7 \times 10^{-5}$ with the data anticipated before the 2022 shutdown), to be compared with $\mathcal{B}(\tau \to e\phi) \leq \{1.3 \cdot 10^{-4}, 4.0 \cdot 10^{-5}, 5.7 \cdot 10^{-6}\}$ with the thrust method alone. As a consequence, our strategy improves by a factor close to 3 the limit achievable with the strategy currently in place within Belle II. We thus expect a Belle-II limit on BR($\tau \to e$ + invisible) stronger than the existing ARGUS limit [6] by a factor of 50, 170 and 1150 for the three luminosities, respectively.