Enhancement of new physics signal sensitivity with mistagged charm quarks

We investigate the potential for enhancing search sensitivity for signals having charm quarks in the final state, using the sizable bottom-mistagging rate for charm quarks at the LHC. Provided that the relevant background processes contain light quarks instead of charm quarks, the application of b-tagging on charm quark-initiated jets enables us to reject more background events than signal ones due to the relatively small mistagging rate for light quarks. The basic idea is tested with two rare top decay processes: i) t -> c h -> c b bbar and ii) t -> b H+ -> b bbar c, where h and H+ denote the Standard Model-like higgs boson and a charged higgs boson, respectively. The major background source is a hadronic top quark decay such as t -> b W+ -> b sbar c. We test our method with Monte Carlo simulations of the 14 TeV LHC, and find that the signal-over-background ratio can be increased by a factor of O(6-7) with a suitably designed (heavy) flavor tagging algorithm and scheme.


I. INTRODUCTION
The discovery of the Higgs particle at the Large Hadron Collider (LHC) [1,2] reaffirms that the Standard Model (SM) is a successful description of fundamental particles and their interactions in nature. Nevertheless, the detailed mechanism of protecting its mass scale from large quantum corrections is still unexplained by the SM, and new physics beyond the Standard Model (BSM) is anticipated to address this puzzle. Since the corrections are dominantly contributed by the top quark, the top quark sector has been regarded as a promising host to accommodate and reveal new physics signatures. Furthermore, the LHC, dubbed a "top factory", is capable of copiously producing top quarks in pairs via the strong interaction, and it can therefore be taken as a great venue to discover new physics phenomena using top quarks.
We emphasize that although many physical properties of the top quark have been measured with great precision since its discovery, its decays are relatively poorly measured; typical errors in the top quark decays are of O(10%), mostly coming from systematics in the measurement of the t-channel single top quark cross section [3,4]. Hence, any new physics effects emerging in the top quark decay channels are, in principle, less constrained by current experimental data.
One of the rare top decay examples to be considered here is $t \to ch$ via a flavor changing neutral current (FCNC) [5][6][7][8] followed by the dominant decay mode of the 125 GeV higgs, i.e., $h \to b\bar{b}$. In principle, nothing precludes the top quark from decaying in this manner. Nevertheless, the SM prediction for the branching ratio (Br) of this process is extremely small due to the famous Glashow-Iliopoulos-Maiani mechanism and second-third generation mixing suppression, which results in $\mathrm{Br}(t \to ch)_{\rm SM} \approx 10^{-13} - 10^{-15}$ [5][6][7]. Therefore, a significant excess over such a small SM expectation could be a convincing sign of the existence of new physics. In fact, once new physics is introduced, the aforementioned suppressions can be relaxed, and thus considerably larger branching fractions can be anticipated, e.g., $\mathrm{Br}(t \to ch)_{\rm BSM} \approx 10^{-3} - 10^{-6}$ depending on the details of the BSM models of interest [7], which is comparable with the recent experimental bound reported by the CMS collaboration [9].

* immworry@ufl.edu † parc@apctp.org
Another exciting scenario to be considered here is $t \to bH^+$, where the charged higgs subsequently decays into a charm quark and an anti-bottom quark, unlike the typical decay mode $H^+ \to c\bar{s}$. A sizable branching fraction of $H^+ \to c\bar{b}$ arises in a few models with two or more higgs doublets: for example, multi-higgs doublet models (MHDM) [10], flipped two-higgs doublet models (2HDM) [11][12][13] with "natural flavor conservation", and the Aligned-2HDM [14]. Depending on the model details, $\mathrm{Br}(H^+ \to c\bar{b})$ could be as large as ∼80% [15]. Although existing experimental searches for $t \to bH^+ \to b\bar{s}c$ by the CDF [16] and ATLAS [17] collaborations could be applied to the decay $H^+ \to c\bar{b}$ [12], an enhanced branching ratio motivates more dedicated searches to discover a new phenomenon or directly constrain the parameter space in the relevant physics models.
For concreteness, we focus on the collider signatures in the context of pair-produced top quarks, and assume that one of the top quarks decays into two bottom quarks and one charm quark via the decay sequences described above while the other follows the regular leptonic decay cascade. Given the visible final state defined by the signal processes, the dominant SM background is semi-leptonic top quark pair production. Since there exist three bottom quarks for the signal process vs. two bottom quarks for the background one, the requirement of three bottom-tagged jets can substantially reduce background events. It is noteworthy that this event selection leaves the hadronic top quark decay into $b\bar{s}c$ (i.e., $t \to bW^+ \to b\bar{s}c$) as the main background source because the b-mistagging rate for charm quarks is rather sizable. We henceforth take it as the major background unless specified otherwise.

arXiv:1507.03990v2 [hep-ph] 9 Jun 2016
We point out that, remarkably enough, the high mistagging rate for charm quarks can be useful for a further improvement in the relevant signal-over-background ratio (S/B). More specifically, if one demands an additional bottom-tagged jet, then signal events are selected when the remaining charm quark is tagged as a bottom quark, whereas background events are selected when the remaining strange quark is tagged as a bottom quark, for which the corresponding mistagging rate is typically far smaller than that for charm quarks. Therefore, we expect the relevant signal sensitivity to increase so that it is possible to probe smaller branching fractions of signal processes. Of course, a non-negligible reduction in the signal acceptance due to the additional b-jet requirement could be an issue. Given the immense production cross section of top pairs and a large expected integrated luminosity, for example, $\mathcal{L} = 300\ \mathrm{fb}^{-1}$ at the 14 TeV LHC, adequate statistics can nevertheless be achieved in these search channels.

II. EXPECTED ENHANCEMENT AND POTENTIAL ISSUES
To develop intuition on the basic idea described thus far, we provide a rough estimate of the expected enhancement by parametrizing the pertinent efficiencies. As mentioned before, a way to enhance the S/B (before any posterior analysis using kinematic variables) is to require one more b-tagged jet in the final state, utilizing the sizable mistagging rate of charm-induced jets. For a more systematic comparison, we begin with the (would-be) conventional selection scheme (denoted by 3b), that is, three bottom jets, one regular jet, and a W gauge boson. Since the W is irrelevant to the later discussion, we drop it for convenience. We first define the efficiencies for the identification of bottom-initiated jets: $\epsilon^b_b$ as the b-tagging efficiency of a b quark, $\epsilon^b_c$ as the b-mistagging efficiency of a c quark, and $\epsilon^b_s$ as the b-mistagging efficiency of an s quark (light quarks). With this set of definitions and $S$ being the number of signal events before the tagging procedure, the expected number of signal events in the 3b scheme ($S_{3b}$) is given by

$$S_{3b} = S\left[(\epsilon^b_b)^3\,(1-\epsilon^b_c) + 3\,(\epsilon^b_b)^2\,(1-\epsilon^b_b)\,\epsilon^b_c\right], \tag{1}$$

where the first term represents the leading contribution while the second term represents the subleading contribution, i.e., the case where the c-induced jet is mistagged while one of the b-induced jets is not tagged. When it comes to the major background, the leading contribution comes from a hadronic W decaying into c and s as mentioned earlier. Therefore, the expected number of background events in the 3b scheme ($B_{3b}$) is

$$B_{3b} = B\left[(\epsilon^b_b)^2\,\epsilon^b_c\,(1-\epsilon^b_s) + (\epsilon^b_b)^2\,(1-\epsilon^b_c)\,\epsilon^b_s + 2\,\epsilon^b_b\,(1-\epsilon^b_b)\,\epsilon^b_c\,\epsilon^b_s\right], \tag{2}$$

where $B$ denotes the number of background events before the tagging procedure. We then have the S/B in the 3b scheme as

$$\left(\frac{S}{B}\right)_{3b} = \frac{S}{B}\,\frac{(\epsilon^b_b)^3\,(1-\epsilon^b_c) + 3\,(\epsilon^b_b)^2\,(1-\epsilon^b_b)\,\epsilon^b_c}{(\epsilon^b_b)^2\,\epsilon^b_c\,(1-\epsilon^b_s) + (\epsilon^b_b)^2\,(1-\epsilon^b_c)\,\epsilon^b_s + 2\,\epsilon^b_b\,(1-\epsilon^b_b)\,\epsilon^b_c\,\epsilon^b_s}. \tag{3}$$

Here we assume that the expected number of background events originating from $t \to b\bar{d}u$ is negligible because two light quarks are involved.
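The counting behind the 3b-scheme yields can be cross-checked numerically. The following is a minimal sketch, assuming that every jet is tagged independently with a flavor-dependent probability; the function name `prob_exact_tags` and the three efficiency values are illustrative placeholders, not numbers taken from Ref. [18].

```python
from itertools import product


def prob_exact_tags(efficiencies, n_tags):
    """Probability that exactly n_tags jets are b-tagged.

    Each jet is assumed to be tagged independently with its own efficiency.
    """
    total = 0.0
    for pattern in product((True, False), repeat=len(efficiencies)):
        if sum(pattern) != n_tags:
            continue
        weight = 1.0
        for eff, tagged in zip(efficiencies, pattern):
            weight *= eff if tagged else 1.0 - eff
        total += weight
    return total


# Placeholder efficiencies chosen for illustration only (assumed values).
EPS_B, EPS_C, EPS_S = 0.70, 0.20, 0.01

signal_jets = [EPS_B, EPS_B, EPS_B, EPS_C]      # b, b, b, c
background_jets = [EPS_B, EPS_B, EPS_C, EPS_S]  # b, b, c, s

sig_3b = prob_exact_tags(signal_jets, 3)
bkg_3b = prob_exact_tags(background_jets, 3)

# Closed forms for comparison; the leading term is written first.
sig_3b_closed = EPS_B**3 * (1 - EPS_C) + 3 * EPS_B**2 * (1 - EPS_B) * EPS_C
bkg_3b_closed = (
    EPS_B**2 * EPS_C * (1 - EPS_S)
    + EPS_B**2 * (1 - EPS_C) * EPS_S
    + 2 * EPS_B * (1 - EPS_B) * EPS_C * EPS_S
)

print(f"signal 3b fraction:     {sig_3b:.4f}")
print(f"background 3b fraction: {bkg_3b:.4f}")
```

Enumerating all tag patterns keeps the sketch independent of any particular closed form, so the same function can be reused for other jet flavor contents or tag multiplicities.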
On the other hand, if we modify the aforementioned selection scheme by requiring an additional b-tagged jet instead of a regular jet (denoted by 4b), the expected numbers of signal and background events ($S_{4b}$ and $B_{4b}$, respectively) are expressed as

$$S_{4b} = S\,(\epsilon^b_b)^3\,\epsilon^b_c, \tag{4}$$

$$B_{4b} = B\,(\epsilon^b_b)^2\,\epsilon^b_c\,\epsilon^b_s, \tag{5}$$

from which the relevant S/B is simply given by

$$\left(\frac{S}{B}\right)_{4b} = \frac{S}{B}\,\frac{\epsilon^b_b}{\epsilon^b_s}, \tag{6}$$

where a small portion from $t \to b\bar{d}u$ is neglected again in computing $B_{4b}$. Defining the improvement of the 4b scheme with respect to the 3b scheme as $I$, we have

$$I \equiv \frac{(S/B)_{4b}}{(S/B)_{3b}} = \frac{\epsilon^b_b}{\epsilon^b_s}\,\frac{B_{3b}/B}{S_{3b}/S}. \tag{7}$$

Since tagging efficiencies vary with the transverse momentum of jets, it is interesting to investigate the dependence of $I$ on the $P_T$ of b-jets, which is explicitly shown by red dots in FIG. 1. As an example tagging scheme, the CSVM tagger of the CMS collaboration has been adopted, and the relevant efficiencies are applied based upon the values reported in Ref. [18], where they were measured in data using $t\bar{t}$ events. To understand this behavior more intuitively, it is worthwhile to rewrite $I$ using the leading contributions (i.e., the first terms) in eqs. (1) and (2):

$$I \approx \frac{\epsilon^b_c\,(1-\epsilon^b_s)}{\epsilon^b_s\,(1-\epsilon^b_c)}. \tag{8}$$

One can easily see that the value of $I$ is solely governed by $\epsilon^b_c$ and $\epsilon^b_s$. More specifically, it increases as $\epsilon^b_c$ increases and as $\epsilon^b_s$ decreases, so that a large gap between $\epsilon^b_c$ and $\epsilon^b_s$ is favored to attain a large improvement. In fact, it turns out that heavy flavor quarks are less often tagged as b-jets while light quarks fake b-jets more often in the high $P_T$ region. The reason is that jets with a large $P_T$ are typically collimated so that errors in particle tracking, which is involved in the tagging algorithm, are likely to increase. As a consequence, a less significant improvement is shown in the high $P_T$ region.

A couple of issues may arise in this strategy. Now that charm quark tagging techniques are being developed [19], one could apply them to improve the S/B as an alternative option.
To compare the idea in this paper with data analyses involving the c-tagging technique, we again define the relevant efficiencies for the identification of charm-initiated jets: $\epsilon^c_b$ as the c-mistagging efficiency of a b quark, $\epsilon^c_c$ as the c-tagging efficiency of a c quark, and $\epsilon^c_s$ as the c-mistagging efficiency of an s quark. Although there may exist some non-trivial correlation between heavy flavor tagging techniques and the possibility of a b-c mixing tagger [19,23,24], to be more conservative we ignore the events in which any of the visible entities yields a contradictory result between b-tagging and c-tagging; for example, if a certain b is not only b-tagged but also c-tagged, the associated event is discarded. We basically require three b-tagged jets together with one c-tagged jet. Denoting the tagging scheme explained thus far as 3b1c, we have the expected numbers of signal and background events ($S_{3b1c}$ and $B_{3b1c}$, respectively) in this scheme as

$$S_{3b1c} = S\left[(\tilde{\epsilon}^b_b)^3\,\tilde{\epsilon}^c_c + 3\,(\tilde{\epsilon}^b_b)^2\,\tilde{\epsilon}^c_b\,\tilde{\epsilon}^b_c\right], \tag{9}$$

$$B_{3b1c} = B\left[(\tilde{\epsilon}^b_b)^2\,\tilde{\epsilon}^b_s\,\tilde{\epsilon}^c_c + (\tilde{\epsilon}^b_b)^2\,\tilde{\epsilon}^b_c\,\tilde{\epsilon}^c_s + 2\,\tilde{\epsilon}^b_b\,\tilde{\epsilon}^c_b\,\tilde{\epsilon}^b_c\,\tilde{\epsilon}^b_s\right], \tag{10}$$

where

$$\tilde{\epsilon}^b_q \equiv \epsilon^b_q\,(1-\epsilon^c_q), \qquad \tilde{\epsilon}^c_q \equiv \epsilon^c_q\,(1-\epsilon^b_q), \qquad q = b, c, s.$$

The associated S/B is given by

$$\left(\frac{S}{B}\right)_{3b1c} = \frac{S_{3b1c}}{B_{3b1c}},$$

and in turn, the associated improvement can be written as

$$I' \equiv \frac{(S/B)_{3b1c}}{(S/B)_{3b}}.$$

As shown in FIG. 1, $I$ is larger than $I'$ in the entire range of $P_T$. This behavior can be viewed in a simpler way by considering the leading contributions (i.e., the first terms) in eqs. (9) and (10). We then have

$$I' \approx \frac{1-\epsilon^c_b}{1-\epsilon^c_s}\,I,$$

where $I$ is the approximated expression in eq. (8). Since $\epsilon^c_b$ is typically larger than $\epsilon^c_s$, $I'$ is typically smaller than $I$. From this series of calculations, we find that the simultaneous application of b- and c-tagging techniques is not beneficial with respect to the signal-over-background ratio. Furthermore, the expected number of signal events itself (after applying the flavor tagging techniques) becomes worse, as is clear from the comparison between eqs. (4) and (9). Thus we employ only the b-tagging technique for our data analysis.

FIG. 1: Potential improvements $I$ and $I'$ of the signal-over-background ratio as a function of the $P_T$ of jets in $t\bar{t}$ events. In this plot, the dependence on the rapidity of a jet is integrated out ($|\eta| < 2.4$). The efficiencies associated with b-tagging are obtained from Ref. [18], whereas for $I'$ the efficiencies associated with c-tagging are fixed to be the average values of the medium operating point in Ref. [19], for which $(\epsilon^c_b, \epsilon^c_c, \epsilon^c_s) \approx (0.125, 0.20, 0.010)$.
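The comparison between the 4b and 3b1c schemes can be illustrated with a few lines of arithmetic. This is a rough sketch at leading order only, assuming independent b- and c-tag decisions per jet and the veto on doubly tagged jets described in the text. The c-tagging values are the averages quoted in the FIG. 1 caption; the b-tagging values are placeholders assumed here for illustration, and the helper `only_tagged` is a hypothetical name.

```python
def only_tagged(tag_eff, veto_eff):
    """Probability of firing one tagger while not firing the other."""
    return tag_eff * (1.0 - veto_eff)


# b-tagging efficiencies per quark flavor: placeholder values (assumed).
B_TAG = {"b": 0.70, "c": 0.20, "s": 0.01}
# c-tagging efficiencies per quark flavor: averages quoted in the text.
C_TAG = {"b": 0.125, "c": 0.20, "s": 0.010}

# Leading-order S/B in each scheme, in units of the untagged S/B.
sb_3b = (B_TAG["b"] ** 3 * (1 - B_TAG["c"])) / (
    B_TAG["b"] ** 2 * B_TAG["c"] * (1 - B_TAG["s"])
)
sb_4b = B_TAG["b"] / B_TAG["s"]

# 3b1c: three jets b-tagged only, one jet c-tagged only.
only_b = {q: only_tagged(B_TAG[q], C_TAG[q]) for q in B_TAG}
only_c = {q: only_tagged(C_TAG[q], B_TAG[q]) for q in B_TAG}
sig_3b1c = only_b["b"] ** 3 * only_c["c"]                 # b, b, b -> b; c -> c
bkg_3b1c = only_b["b"] ** 2 * only_b["s"] * only_c["c"]   # b, b, s -> b; c -> c
sb_3b1c = sig_3b1c / bkg_3b1c

improvement_4b = sb_4b / sb_3b
improvement_3b1c = sb_3b1c / sb_3b

print(f"I  (4b vs 3b)   = {improvement_4b:.2f}")
print(f"I' (3b1c vs 3b) = {improvement_3b1c:.2f}")
```

With these inputs the ratio of the two improvements depends only on the c-mistagging rates of b and s quarks, which is why adding the c-tag requirement lowers the improvement whenever b quarks are c-mistagged more often than light quarks.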
Another issue that one may raise is the possibility that other SM backgrounds come into play once an additional bottom-tagged jet is required. For the cases at hand, an immediate example is $t\bar{t}b\bar{b}$, for which one W gauge boson decays leptonically while the other is undetected. Certainly, this issue depends strongly on the signal processes of interest, i.e., it may not be an issue for other signal models or collider signatures. We closely look at the impact of $t\bar{t}b\bar{b}$ on the relevant analysis when switching from the 3b scheme to the 4b scheme, in conjunction with Monte Carlo simulations, later.

TABLE I: Reduction rates for signal and background processes in 3b and 4b schemes, and the associated improvements $I$. The reduction rates for $t\bar{t}$ are computed with all decay modes included, whereas those for signal processes are computed only with the semi-leptonic decay mode.

III. COLLIDER STUDY
Equipped with the rough estimate discussed thus far, we test the feasibility of the basic idea in the context of the two aforementioned example models with Monte Carlo simulations of the 14 TeV LHC. The parton-level events for the signal and the background are generated by MadGraph5_aMC@NLO [25]. The output is then passed to Pythia 6.4 [26] and Delphes3 [27], in that order. Jet formation is conducted by the anti-$k_t$ algorithm with a jet radius parameter R = 0.5. The b-tagging efficiencies in the detector simulator for bottom, charm, and light quarks are tuned according to the performance of the CSVM algorithm reported by the CMS collaboration [18]. Furthermore, we apply cuts on the final state of both signal and background processes, closely following the selection scheme used in Ref.
We then require different b-jet multiplicities for those selected events; for the 3b (4b) scheme, we exclusively demand 3 (4) b-tagged jets and 1 (0) regular jets together with an isolated lepton. Table I summarizes not only the reduction rates of signal and background processes in both schemes but also the resulting improvements. Six benchmark scenarios are examined here. To see the potential dependence on the choice of the charged higgs mass, we vary it from 80 GeV to 160 GeV at intervals of 20 GeV. The reduction rates for the $t\bar{t}$ sample are evaluated with hadronic and leptonic channels included, whereas those for the signal samples are evaluated only with semi-leptonic events.
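The exclusive multiplicity requirement can be expressed as a small classification routine. The sketch below is only an illustration: the `Jet` container and the `classify_event` function are hypothetical names, "an isolated lepton" is interpreted here as exactly one isolated lepton (an assumption), and the kinematic cuts applied beforehand are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Jet:
    pt: float        # transverse momentum in GeV
    b_tagged: bool   # outcome of the b-tagging algorithm


def classify_event(jets, n_isolated_leptons):
    """Return '3b', '4b', or None according to the exclusive tag counts."""
    # Assumption: the selection asks for exactly one isolated lepton.
    if n_isolated_leptons != 1:
        return None
    n_b = sum(jet.b_tagged for jet in jets)
    n_regular = len(jets) - n_b
    if (n_b, n_regular) == (3, 1):
        return "3b"
    if (n_b, n_regular) == (4, 0):
        return "4b"
    return None


example = [Jet(85.0, True), Jet(60.0, True), Jet(45.0, True), Jet(38.0, False)]
print(classify_event(example, n_isolated_leptons=1))
```

Because both schemes are exclusive, an event can enter at most one of them, so the reduction rates of the two schemes can be compared directly on the same preselected sample.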
We observe that the relevant signal-over-background ratio can be improved by a factor of O(6-7) for all scenarios; no significant dependence on the choice of the charged higgs mass is seen. We point out that, compared with the rough estimate shown in FIG. 1, the overall improvement is reduced by a factor of ∼2.5 from the maximum expected improvement. In more detail, signal efficiencies in the 4b scheme are ∼10% of those in the 3b scheme, which does not differ much from the mistagging rate for charm quarks, whereas the background efficiency in the 4b scheme is ∼1.6% of that in the 3b scheme, which is larger than the typical mistagging rate for light quarks. To understand this slight mismatch, one could suspect that a large fraction of b-tagged jets come with a large transverse momentum ($\gtrsim 100$ GeV) so that the overall improvement gets reduced, based on the observation in FIG. 1. However, it turns out that only about a quarter of b-jets belong to this hard $P_T$ regime for both signal and background events, as is clear from FIG. 2. In fact, this is not surprising, given the charged or SM-like higgs masses themselves and the mass gap between the top quark and the charged or SM-like higgs. Therefore, it does not have any substantial impact. We instead identify the effect of initial and final state radiation as the dominant cause of such a departure from the expectation in FIG. 1. More specifically, initial or final state radiated gluons that split into a heavy flavor quark pair play a major role because those quarks are b-tagged with high probability. In other words, once the contributions from $t\bar{t}b\bar{b}$ or $t\bar{t}c\bar{c}$ are switched on, the relevant efficiency drop for $t\bar{t}$ is not as large as expected, while such contributions appear as a subleading effect for signal processes.
To validate this intuition, we perform another simulation of $t\bar{t}$ with initial and final state radiation completely turned off, and find that the ratio of the 4b to the 3b efficiency returns to ∼0.7%, which is close to the typical mistagging rate for light quarks.

IV. PROJECTIONS AND CONCLUSIONS
As mentioned at the beginning, the LHC is capable of copiously producing top quark pairs, which makes the search channels discussed here systematics-dominated. In more detail, the significance $\sigma$ is given by

$$\sigma = \frac{S}{\sqrt{B + \kappa^2 B^2}},$$

where $\kappa$ is a prefactor encoding the overall systematics of backgrounds. Considering the typical $\kappa$ of O(3%) for the $t\bar{t}$ channel (see, for example, Ref. [29]), we find that the second term in the denominator becomes larger than the first one once $B$ is greater than ∼1100, i.e., $B > 1/\kappa^2$. Note that the expected production cross section of top quark pairs at next-to-next-to-leading order including resummation of next-to-next-to-leading logarithms is 954 pb at the 14 TeV LHC [30], and the branching ratio of $t\bar{t} \to b\bar{b}\,c\bar{s}\,\ell\nu$ is ∼15%. From these two numbers, we can easily see that the number of relevant background events $B$ is much larger than ∼1100 even with an integrated luminosity of 1 fb$^{-1}$. Therefore, the relevant significance is proportional to S/B so that the improvements discussed in this letter can be directly translated into the associated signal sensitivity, that is, for a given scenario, one can probe a ∼6-7 times smaller branching fraction into the signal process of interest than expected in the 3b scheme. Obviously, posterior analyses with kinematic variables etc. can increase S/B further, which is beyond the scope of this paper. We instead leave such a research direction as future work. We again emphasize that the search strategy proposed here is not restricted to the benchmark scenarios employed here, but can be straightforwardly extended to situations where the final state of the signal processes of interest contains charm quark-initiated jet(s) while the corresponding object(s) in backgrounds are light quark-initiated one(s). We also remark that different operating points may give rise to better improvements. Finally, we strongly encourage the ATLAS and CMS collaborations to adopt this idea as an alternative strategy in relevant new physics model searches.
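The systematics-dominated regime can be checked with simple arithmetic. The sketch below assumes the significance takes the form S / sqrt(B + κ²B²), which reproduces the B ≈ 1100 crossover quoted for κ = 3%; the function name `significance` and the signal yield used in the last step are illustrative assumptions, and the background count is the raw production yield before any selection.

```python
import math


def significance(n_signal, n_background, kappa):
    """Assumed form: S / sqrt(B + (kappa * B)^2)."""
    return n_signal / math.sqrt(n_background + (kappa * n_background) ** 2)


KAPPA = 0.03                 # typical background systematics quoted in the text
XSEC_TTBAR_PB = 954.0        # top pair cross section at 14 TeV, in pb
BR_SEMILEPTONIC_CS = 0.15    # Br of the b bbar c sbar l nu final state
LUMI_FB = 1.0                # integrated luminosity in fb^-1

# Statistical and systematic terms are equal when B = 1 / kappa^2.
b_crossover = 1.0 / KAPPA**2

# 1 pb = 1000 fb, so events = cross section [pb] * 1000 * luminosity [fb^-1].
n_background = XSEC_TTBAR_PB * 1000.0 * LUMI_FB * BR_SEMILEPTONIC_CS

# Far above the crossover the significance approaches S / (kappa * B).
n_signal = 100.0             # arbitrary illustrative signal yield (assumed)
exact = significance(n_signal, n_background, KAPPA)
approx = n_signal / (KAPPA * n_background)

print(f"crossover B            : {b_crossover:.0f}")
print(f"background at 1 fb^-1  : {n_background:.0f}")
print(f"exact / approx         : {exact / approx:.4f}")
```

Since the background yield exceeds the crossover by roughly two orders of magnitude already at 1 fb$^{-1}$, the significance scales with S/B to good accuracy, which is why an improvement in S/B translates directly into sensitivity.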