Tagging new physics with charm

We propose a new variable, the charm fraction, for collider searches for new physics. We analyze this variable in the context of searches for simplified supersymmetry models with squarks, the gluino, and the bino, assuming that only the lightest mass-degenerate squarks can be produced at the high-luminosity LHC. The charm fraction complements event counting and kinematic information, increasing the sensitivity of the searches for models with heavy gluinos, for which squark production is flavor-blind. If squarks are discovered at the LHC, this variable can help discriminate between different underlying models. In particular, with improved charm tagging, the charm fraction can provide information on the gluino mass, and in some scenarios, on whether this mass is within the reach of a future 100 TeV hadron collider.

can thus increase the sensitivity of LHC squark searches to scenarios with very heavy gluinos, which are challenging due to their smaller production cross sections. As the gluino mass decreases, t-channel gluino-exchange diagrams with quarks in the initial state become increasingly important, and the fraction of charm quarks from pair-produced squarks goes down.
We therefore assume that the gluino is beyond the discovery reach of the high-luminosity LHC (HL-LHC), and study the charm fraction in squark pair production. If an excess is observed in jets plus missing energy searches, significant effort would be required in order to determine the multiplicity, the SU(3) charge, the mass, and the spin of the particles produced. In addition, a key question is whether additional particles with masses beyond the LHC reach exist. The simplified models we study here are characterized by four parameters: the number of squark flavors produced, the gluino mass, the squark mass, and the bino LSP mass. The first three determine the BSM production cross section, while the latter two (and in particular, their difference) determine the event kinematics and consequently the efficiency of the search. As is well known, measurements of various kinematic observables such as the effective mass m_eff and the stransverse mass m_T2 can be used to extract some information on the squark and bino masses [7]. The charm fraction, which is a qualitatively different observable, can yield new information on the underlying model, and in particular on the gluino mass. The latter is an important input for the planning of future accelerators like the 100 TeV proton-proton collider, which is anticipated to be sensitive to gluino masses up to 10-15 TeV [8,9].
While we use squark production as a concrete example, we believe it is important to approach LHC searches with as few theory biases as possible. Thus, for example, the new states produced could be colored Kaluza-Klein fermions, and the "squarks" should be thought of merely as new fundamental colored scalars. The charm fraction may help determine whether these newly produced states are the end of the story.
This paper is organized as follows. In Section 2, we specify the simplified models we use, review the basics of squark searches, describe the Monte Carlo numerical analysis, and expand on the treatment of charm tagging in our analysis. In Section 3, we proceed to study the charm fraction in the models and discuss the results. We end with some remarks in Section 4.

General framework: models and overview of searches
In order to demonstrate the use of the charm fraction, we consider simplified models, consisting of the first- and second-generation squarks, the gluino, and a bino LSP. We assume that the squark spectrum is flavor-blind: squarks of the same gauge quantum numbers are mass degenerate, while some hierarchies may exist between left-handed and right-handed squarks, and/or between up- and down-type squarks. We imagine a scenario in which the gluino is beyond the reach of the 14 TeV LHC, with mass above 4 TeV [10], and assume that only the lightest squarks can be directly produced. Note that the latter assumption requires only mild hierarchies among the squark masses. Since the gluino mass affects the production of squark pairs, the models are characterized by the squark mass mq, the bino mass mχ⁰₁, the gluino mass mg, and the number of squark flavors produced, which we denote by Nq. We consider three scenarios:
• Nq = 8 models, in which all the left- and right-handed squarks are degenerate;
• Nq = 4 models, in which only the right-handed squarks can be directly produced (with the left-handed squarks beyond LHC reach);
• Nq = 2 models, in which only the right-handed up-type squarks, ũ_R and c̃_R, can be directly produced (with all remaining squarks beyond LHC reach).
The parameters of the different models are summarized in Table 1.

Search basics
The simplified models we consider predict squark-pair production at the LHC, yielding events with at least two hard jets, large missing energy, and no electron or muon. Our analysis below closely follows ATLAS analyses of this topology. As we focus on the HL-LHC with √s = 14 TeV and an integrated luminosity L = 3000 fb⁻¹, we employ the Meff-2j-3100 signal region (SR) of Ref. [10], which discusses HL-LHC reaches based on this event topology. This SR selects events with two or more jets, missing energy above 160 GeV, and inclusive effective mass above 3100 GeV. Some of the model points we will display in the discussion of the charm fraction are already excluded by 13 TeV data. In order to determine whether a model is already excluded, we use the Meff-2j-2000 SR of Ref. [11], which is based on 13 TeV LHC data with L = 13.3 fb⁻¹. The full sets of cuts defining both SRs are reviewed in Table 2.
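As an illustration of how such a selection could be applied to simulated events, the following is a minimal sketch that implements only the Meff-2j-3100 requirements quoted in the text and in the caption of Table 2 (jet definition, jet multiplicity, lepton veto, missing energy, and inclusive effective mass). The `Jet` container and the function name are hypothetical, and the remaining cuts of Table 2 (for instance the ∆φ requirements) are deliberately omitted, so this is not the full signal-region definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jet:
    """Hypothetical minimal jet record (not a Delphes class)."""
    pt: float   # transverse momentum in GeV
    eta: float  # pseudorapidity


def passes_meff_2j_3100_partial(jets: List[Jet], met: float, n_leptons: int) -> bool:
    """Apply only the cuts quoted in the text; Table 2 contains further cuts.

    n_leptons is assumed to count electrons and muons with p_T > 10 GeV.
    """
    # Jet definition for this SR: p_T > 20 GeV and |eta| < 4.5.
    good_jets = [j for j in jets if j.pt > 20.0 and abs(j.eta) < 4.5]
    if n_leptons != 0:          # electron/muon veto
        return False
    if len(good_jets) < 2:      # at least two jets
        return False
    if met <= 160.0:            # missing energy above 160 GeV
        return False
    h_t = sum(j.pt for j in good_jets)   # scalar sum of jet p_T
    m_eff_incl = met + h_t               # inclusive effective mass
    return m_eff_incl > 3100.0


# Example: two hard central jets with moderate missing energy.
example_jets = [Jet(pt=1600.0, eta=0.3), Jet(pt=1400.0, eta=-1.0)]
selected = passes_meff_2j_3100_partial(example_jets, met=200.0, n_leptons=0)
```
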

SM backgrounds and charm production
The main SM background for the two-jets plus missing energy search is Z + jets with the Z boson decaying into neutrinos (see Fig. 1a and Fig. 1b). The next source of background is W + jets production, which we return to below. Dibosons and tt̄ production give smaller contributions (see, e.g., Ref. [11]). The dominant W background is from W decays to tau plus neutrino, with the tau decaying hadronically, predominantly to light jets. These processes receive large contributions from diagrams like Fig. 1b, with the Z replaced by a W, leading to a τ, a neutrino, and two jets in the final state, as well as from diagrams like Fig. 2a, which lead to a τ, a neutrino, and one jet. Another type of background from W production is processes in which the W decays into a light lepton (electron or muon) plus neutrinos, and the lepton is lost in the reconstruction (see, e.g., Fig. 2b).

Table 2: Definitions of our signal regions. SR Meff-2j-2000 is from the ATLAS analysis based on 13.3 fb⁻¹ data at the 13 TeV LHC [11], and Meff-2j-3100 is based on the HL-LHC study [10]. In Meff-2j-2000 (Meff-2j-3100), jets are required to satisfy p_T > 50 GeV and |η| < 2.8 (p_T > 20 GeV and |η| < 4.5), and ∆φ cuts are applied to all the jets with p_T > 50 GeV (p_T > 40 GeV). H_T is the scalar sum of p_T of all the jets, and m_eff (incl.) is the sum of E_T^miss and H_T. Events are vetoed if electrons and/or muons with p_T > 10 GeV are present.
Signal regions: Meff-2j-2000, Meff-2j-3100
Number of jets, electrons, muons: ≥ 2, = 0, = 0
In the invisible-Z background, the leading source of charm quarks is QCD production of cc̄ pairs as shown in Fig. 1a. Another important source of charm quarks in the SM background comes from higher-order, but log-enhanced, processes, and in particular "gluon splitting" into a cc̄ pair (see, e.g., Ref. [12]). While this is not included in our leading order (LO) simulation of the hard processes, some component of charm pairs from gluon splitting is generated by Pythia. Gluon-charm initial states in Fig. 1b and gluon-strange initial states in Figs. 2a and 2b give small contributions since they are PDF-suppressed. In the latter two figures, processes with an initial-state down quark are CKM-suppressed.

Monte Carlo simulation
Signal samples are generated by MadGraph5_aMC@NLO [13] at LO, with the PDF set NNPDF2.3QED at LO with α_s = 0.13 [14]. The baseline selections described in Table 3 are applied based on the missing transverse energy and jet p_T. Parton showering and hadronization are performed by Pythia 6.4 [15]. Tau decays are simulated by TAUOLA [16]. For detector simulation, Delphes 3.3.0 [17] is utilized with the default detector card, where the radius parameter of the anti-k_T algorithm [18,19] for jet clustering is set to R = 0.4 to match the ATLAS studies. Pile-up effects are not considered. These signal samples are rescaled by next-to-leading-order (NLO) K-factors, which are calculated by Prospino 2 [20].¹ We then apply the selection cuts for SR Meff-2j-2000, using the K-factors for 13 TeV collisions, and compare to the upper bound obtained by the ATLAS analysis [11] to determine whether the model point is excluded.
¹ Prospino does not handle non-degenerate squarks from the first two generations. However, the NLO correction is dominated by QCD contributions (light quarks and gluons) [21]. Therefore, additional heavy squarks only contribute at next-to-next-to-leading order and the Prospino K-factors are a good approximation even in this case. As previously mentioned, we only require mild hierarchies between the masses of the lightest and other squarks, so we will ignore leading-log corrections.

To obtain the charm fraction at the HL-LHC, background events are also generated by the same procedure. Our selection cuts, especially the high m_eff cut of 3100 GeV, suppress the W + jets background so that it is about a third of the Z + jets background. At the same time, the Z + jets background is easier to calculate compared to the W + jets background, since the latter comprises different components (e.g., jτν, jjτν). We simulate the Z + jets background and the different components of the W + jets background at leading order. We find that the fractions of (truth-level) charm quarks in each of these are similar. We therefore obtain the total number of background events by reweighting the Z + jets sample (with the baseline selection in Table 3) to match the number of events from Z + jets and W + jets processes in the ATLAS analysis [10] (Figure 8b), and approximate the fraction of charm quark events (whether these contain a single charm quark or a pair of charm quarks) in this sample by its value in our simulated Z + jets sample.

Events are selected by the cuts of the Meff-2j-3100 SR. For each event in the SR, the two jets with leading p_T are considered in the calculation of the charm fraction. Initially, Delphes 3.3.0 is utilized to determine the "truth-level" jet flavor. In this algorithm, a jet is considered bottom-flavored, or a "truth-level bottom jet", if one or more bottom quarks (b or b̄) exist in the jet cone as a Pythia-level parton. If no bottom partons are found but charm partons exist, the jet is labeled as a "truth-level charm jet". Otherwise, the jet is treated as a light jet at the truth level. The detector-level charm tagging is performed based on this "truth-level" information as we describe below.
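The truth-level labeling rule described above (bottom takes precedence over charm, and everything else is light) can be sketched as follows. This is an illustrative reimplementation of the rule as stated in the text, not the Delphes code itself; the function name is hypothetical, and the input is assumed to be the list of PDG identifiers of the partons already matched to the jet cone (the cone matching itself is not shown).

```python
from typing import Iterable

# PDG Monte Carlo identifiers for the heavy quarks (antiquarks carry a minus sign).
PDG_CHARM = 4
PDG_BOTTOM = 5


def truth_flavor(parton_pdg_ids: Iterable[int]) -> str:
    """Label a jet from the partons found inside its cone.

    Returns "b" if any bottom (anti)quark is present, otherwise "c" if any
    charm (anti)quark is present, otherwise "light".
    """
    flavors = {abs(pdg_id) for pdg_id in parton_pdg_ids}
    if PDG_BOTTOM in flavors:
        return "b"
    if PDG_CHARM in flavors:
        return "c"
    return "light"


# A jet containing a charm-anticharm pair plus a gluon is labeled as charm,
# which is the behavior discussed later in connection with gluon splitting.
label = truth_flavor([4, -4, 21])
```
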

Charm tagging
The main limiting factor in measurements of the charm fraction is the charm-tagging capability, and in particular the fake rates. Current analyses at the LHC experiments utilize charm-tagging algorithms based on the working points (ε_c, ε_b, ε_l) = (0.19, 0.2, 0.005) [1] or (0.2, 0.24, 0.02) [2]. Here ε_c is the tagging efficiency of charm quarks, while ε_b and ε_l are the mistag rates for bottom and light jets, respectively. These taggers are primarily trained on tt̄ samples and thus the maximal jet transverse momentum does not exceed 300 GeV [10]. In contrast, the average p_T for the simplified models presented here is ∼ 500 GeV, so charm tagging would be more challenging. Still, charm-tagging algorithms will likely undergo significant improvement by the end of the HL-LHC program. Thus, we consider two optimistic scenarios with efficiencies ε_c = 0.5 or 0.3 and mistag rates ε_b = 0.2 and ε_l = 0.005 (see also Ref. [22]). Since we cannot reliably estimate the p_T and η dependence of the various efficiencies, we take them to be constant over the entire ranges.
For a given set of tagging parameters, (ε_c, ε_b, ε_l), we can divide the sample of events passing the cuts as follows. In each event, we examine the truth-level flavor of each of the two hardest jets. A truth-level charm jet is "tagged" as a charm jet with probability ε_c. Similarly, each truth-level light (bottom) jet is "tagged" as a charm jet with probability ε_l (ε_b). We denote the number of these "charm-tagged" jets by N_c, and the total number of events in the sample by N_ev. For high-efficiency, high-purity taggers, it would also be useful to separately consider events in which the two hardest jets are both tagged as charm jets. We denote the number of these double-tagged events by N_2-tag. One caveat of our simulation is the fact that the Delphes algorithm that we utilize treats cc̄ pairs originating from gluons or from taus as charm jets, even when the pair is clustered into a single jet. We return to this issue in the discussion of the results.
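The probabilistic tagging procedure just described can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the analysis code: each event is represented by the truth-level labels ("c", "b", or "light") of its two hardest jets, the efficiencies are constant, and the function name and the fixed random seed are hypothetical choices.

```python
import random
from typing import Iterable, Tuple


def count_charm_tags(
    events: Iterable[Tuple[str, str]],
    eff_c: float,
    eff_b: float,
    eff_l: float,
    seed: int = 0,
) -> Tuple[int, int, int]:
    """Return (N_ev, N_c, N_2tag) for a list of two-jet truth labels.

    Each jet is charm-tagged with a probability set by its truth flavor:
    eff_c for charm jets, eff_b for bottom jets, eff_l for light jets.
    """
    rng = random.Random(seed)
    tag_probability = {"c": eff_c, "b": eff_b, "light": eff_l}
    n_ev = n_c = n_2tag = 0
    for jet_flavors in events:
        tags = [rng.random() < tag_probability[flavor] for flavor in jet_flavors]
        n_ev += 1
        n_c += sum(tags)       # number of charm-tagged jets (0, 1, or 2)
        n_2tag += all(tags)    # event counted once if both jets are tagged
    return n_ev, n_c, n_2tag


# With an ideal tagger (1, 0, 0) the outcome is deterministic.
toy_events = [("c", "c"), ("light", "light"), ("c", "light"), ("b", "light")]
ideal_counts = count_charm_tags(toy_events, eff_c=1.0, eff_b=0.0, eff_l=0.0)
```

For realistic efficiencies the counts fluctuate with the random seed, which is one source of the Monte Carlo uncertainty on quantities derived from N_c and N_2-tag.
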

The charm fraction
Squark pairs would be produced at the LHC either through flavor-democratic processes (see Fig. 3), or through gluino-mediated processes (see Fig. 4), which are sensitive to the proton PDFs and are thus flavor-dependent. As the gluino mass is increased, the latter processes become less significant.
We define the charm fraction F_c as the ratio

F_c = N_c / (2 N_ev),

where N_ev and N_c were defined in Sec. 2.4. For events coming from squark pair production, we expect this fraction to increase as the gluino mass increases. This behavior is exhibited in Fig. 5a, where we plot the charm fraction for a model with Nq = 8 squarks with mq = 1.5 TeV and a massless bino, assuming an ideal tagger (ε_c = 100%, ε_b = 0, ε_l = 0). Error bars are the Monte Carlo uncertainties. The fraction rises from 0.10 to 0.16 as the gluino mass varies from 4 TeV to 13 TeV, and asymptotes to 0.25 for a decoupled gluino. Repeating this for the SM background yields a charm fraction of 0.09. The hollow points are already excluded by the Meff-2j-2000 SR of [11] (see Sec. 2.3 for details).

In Fig. 5b, we show the fraction of double-tagged events,

F_c^2-tag = N_2-tag / N_ev.

The value for the decoupled gluino is 0.18, which is smaller than the naive expectation of 0.25 because jets from QCD radiation can be harder than charm jets from charm squarks. The SM background is reduced by a larger relative margin as the number of double charm events is smaller for the SM.

The results of Figs. 5a and 5b are based on truth-level parton flavor; however, realistically, we must consider charm-tagged jets. In Fig. 6, we show the results for various model points in the (N_ev, F_c) plane, assuming tagging efficiencies of (ε_c, ε_b, ε_l) = (0.5, 0.2, 0.005). We focus on a narrow range of squark masses, for which the models may potentially be probed by the HL-LHC, and which are not yet excluded for at least some gluino masses. We then vary the number of squarks produced, over Nq = 2, 4, 8, and consider both heavy (300 GeV) and massless binos. The squark and bino masses are chosen such that all the models yield similar kinematics, and cannot be distinguished based on m_T2 [7]: the theoretical m_T2 endpoints for these models are in the range 1500 ± 50 GeV.

Each shape-color combination maps to a particular choice of (Nq, mq, mχ⁰₁): the shape of the central value marker indicates Nq, and the color designates pairs of squark and bino masses (mq, mχ⁰₁). The numbers in the legend correspond to (Nq, mq/GeV, mχ⁰₁/GeV). Points with the same shape and color correspond to the different gluino masses of Table 1: in a sequence increasing in N_ev, the values of mg decrease, beginning with mg = 450 TeV. Points with hollow central values are already excluded by the Meff-2j-2000 analysis of [11]. Only statistical uncertainties on F_c (assuming 3000 fb⁻¹ of integrated luminosity) are shown.
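To make the two observables concrete, the sketch below computes them from event counts and evaluates the naive decoupled-gluino expectation for the three scenarios. Two assumptions are made and should be read as such: the single-tag fraction is normalized per candidate jet, F_c = N_c / (2 N_ev), which is chosen because it reproduces the quoted ideal-tagger limit of 0.25 for Nq = 8; and the naive expectation assumes purely flavor-democratic production of mass-degenerate squarks with no QCD radiation, so that it equals the number of charm squark states divided by Nq. All function and variable names are hypothetical.

```python
from typing import Tuple

# Number of charm squark states in each scenario (assumption spelled out in
# the lead-in): Nq = 8 contains c_L and c_R, Nq = 4 and Nq = 2 contain c_R only.
CHARM_STATES = {8: 2, 4: 1, 2: 1}


def charm_fractions(n_ev: int, n_c: int, n_2tag: int) -> Tuple[float, float]:
    """Return (F_c, F_c^2-tag) from the counts N_ev, N_c, N_2-tag.

    F_c is normalized to the two candidate jets per event (assumed).
    """
    if n_ev <= 0:
        raise ValueError("n_ev must be positive")
    return n_c / (2.0 * n_ev), n_2tag / n_ev


def naive_decoupled_fraction(n_squarks: int) -> float:
    """Naive per-jet charm fraction for a decoupled gluino and an ideal tagger."""
    return CHARM_STATES[n_squarks] / n_squarks


f_c, f_2tag = charm_fractions(n_ev=4, n_c=3, n_2tag=1)
naive = {nq: naive_decoupled_fraction(nq) for nq in (8, 4, 2)}
```

In this naive picture the Nq = 8 and Nq = 4 scenarios share the same limit, while the Nq = 2 scenario is twice as charm-rich; realistic values are lower because jets from QCD radiation can displace the charm jets among the two hardest jets.
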
For the largest gluino masses, discovery based on N_ev alone would be challenging. In Fig. 6, we have not displayed horizontal error bars on N_ev since we cannot reliably estimate the dominant systematic uncertainties. Still, using current LHC analyses as a guide, it is reasonable to expect the systematic uncertainty on N_ev to be around 10% [10]. The SM prediction is then N_ev^SM = 3433 ± 59 (stat) ± 343 (syst) = 3433 ± 348, with the two uncertainties added in quadrature. Roughly, most of the model points of Fig. 6 lead to excesses in N_ev below 3σ (corresponding to N_ev ≈ 4500). Furthermore, for fixed values of (Nq, mq/GeV, mχ⁰₁/GeV), only a limited range of gluino masses remains for which a 5σ discovery, requiring N_ev ≳ 5200, would be possible. Recall that hollow points denote models which are excluded by Ref. [11].
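The numbers in this paragraph can be checked with a short calculation. The sketch below assumes a Poisson statistical uncertainty on the SM count, a 10% systematic uncertainty, their combination in quadrature, and a significance defined as the excess divided by the total SM uncertainty; the last definition is an assumption that reproduces the approximate thresholds quoted in the text, not necessarily the exact procedure used.

```python
import math

N_SM = 3433          # SM prediction for N_ev quoted in the text
SYST_FRACTION = 0.10  # assumed relative systematic uncertainty

stat = math.sqrt(N_SM)           # Poisson statistical uncertainty
syst = SYST_FRACTION * N_SM      # systematic uncertainty
total = math.hypot(stat, syst)   # quadrature sum


def n_ev_for_significance(n_sigma: float) -> float:
    """Event count corresponding to an excess of n_sigma over the SM."""
    return N_SM + n_sigma * total


def excess_significance(n_ev: float) -> float:
    """Excess over the SM prediction in units of the total SM uncertainty."""
    return (n_ev - N_SM) / total


three_sigma = n_ev_for_significance(3.0)
five_sigma = n_ev_for_significance(5.0)
```

With these inputs the uncertainties round to 59, 343, and 348 events, and the 3σ and 5σ thresholds come out slightly below the rounded values of 4500 and 5200 quoted above.
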
On the other hand, for large gluino masses, the charm content of supersymmetry events is large, so charm tagging can be used to increase the sensitivity to these models. Since it is reduced by the fraction of charm squarks produced and by the charm-tagging efficiency, the number of charm-tagged events is prone to larger statistical uncertainties compared to N_ev; however, many systematic uncertainties cancel out in the ratio F_c, including the uncertainty on the jet energy scale, which affects the determination of both the missing energy and m_eff. We expect the dominant remaining sources of uncertainty to be the charm-tagging efficiencies and the PDFs. Note that the latter do not completely cancel in the ratio, as the charm fraction is sensitive to the relative sizes of the PDFs of the valence quarks, gluons, and sea quarks.² As seen in Fig. 6, for models with a heavy gluino, the charm fraction displays the largest deviation from the SM background.
Let us assume a 10% uncertainty on F_c to get a rough estimate of the discriminating power of this variable. The SM then predicts F_c = (5.8 ± 0.3 (stat) ± 0.6 (syst))% = (5.8 ± 0.6)%. The deviation of F_c from the SM prediction is then at the level of 1.4-3.2σ for decoupled gluino models, and, when combined with N_ev, may allow for discovering these models. Obviously, the F_c values shown in Fig. 6 are weighted averages of the SM and supersymmetry samples, and are particularly skewed towards the smaller SM value when the squark production cross section is small. Thus, for example, for the Nq = 2 models with mq = 1.5 TeV, a vanishing bino mass, and gluino masses of 6 TeV and above, the excesses in N_ev vary between 0.5σ and 2.7σ, while the deviations of the charm fraction from the SM prediction vary between 2.3σ (for the decoupled gluino) and 1.0σ (for the 6 TeV gluino). The combination of these two variables increases the sensitivity for these challenging scenarios.
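The statement that the observed F_c is a weighted average of the SM and supersymmetry samples follows from the fact that both the tagged-jet counts and the event counts add. A minimal sketch is given below; the SM inputs are the values quoted in the text, while the signal yield and signal charm fraction in the example are purely hypothetical numbers chosen for illustration.

```python
def combined_charm_fraction(n_sm: float, f_sm: float, n_sig: float, f_sig: float) -> float:
    """Charm fraction of a sample made of SM and signal events.

    Each component is weighted by its number of events, so a small signal
    yield pulls the result towards the SM value.
    """
    if n_sm + n_sig <= 0:
        raise ValueError("total number of events must be positive")
    return (n_sm * f_sm + n_sig * f_sig) / (n_sm + n_sig)


N_SM, F_SM = 3433, 0.058   # SM values quoted in the text

# Hypothetical signal: 1000 events with a tagged charm fraction of 13%.
f_mixed = combined_charm_fraction(N_SM, F_SM, n_sig=1000, f_sig=0.13)

# The same hypothetical signal fraction with a ten times smaller yield.
f_small_signal = combined_charm_fraction(N_SM, F_SM, n_sig=100, f_sig=0.13)
```

In this toy example the larger signal yield shifts F_c from 5.8% to about 7.4%, whereas the smaller yield leaves it close to the SM value, which is the skewing effect described above.
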
If an excess in N_ev is observed, attention will be focused on the properties of the new particles produced and on whether additional new particles exist. As explained above, the model points shown here are chosen such that the end points of their various m_T2 distributions lie in the range 1500 ± 50 GeV.³ Thus, it will be difficult to distinguish between them based on their missing energy signatures. The charm fraction can clearly break the degeneracy between different underlying models. While a definitive statement cannot be made given that we cannot reliably estimate the systematic uncertainties, the results suggest that models with gluino masses around or below 10 TeV can be discriminated from decoupled gluino scenarios. Thus, for example, F_c can easily discriminate between the (8, 1500, 1) model with a decoupled gluino, which gives N_ev ≈ 4800 and F_c = (7.9 ± 0.3 (stat))%, and the other models which give a similar value of N_ev, but with gluino masses between 4 and 7.5 TeV and F_c = (5.9-6.3)%; assuming a 10% systematic uncertainty on F_c, these are 2-3σ away from each other.
With a high efficiency to tag charm jets, it is sensible to also consider the fraction of double-charm-tagged events, F_c^2-tag, which we plot for the same set of models in Fig. 7. Compared to F_c, F_c^2-tag suffers from QCD radiation effects and larger statistical uncertainties due to a further reduction by approximately ε_c. If the systematic uncertainty in the charm fraction is dominated by uncertainties on tagging efficiencies, F_c^2-tag will also be subject to a systematic uncertainty approximately twice that of F_c. However, because the SM prediction for F_c^2-tag is small, a deviation from the SM value will be more significant. For example, assuming a 20% systematic uncertainty on [...]

Figure 6 assumes charm-tagging efficiencies of (ε_c, ε_b, ε_l) = (0.5, 0.2, 0.005). These are far better than those currently published [1,2]. Furthermore, these numbers are expected to deteriorate as the jet p_T increases; however, since we do not know the high-p_T and η dependence of the tagger at the HL-LHC, we take them to be constant over the entire range. For comparison, we also show results for the same model points, but with more conservative efficiencies, in Figs. 8 and 9.
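The factor of two relating the systematic uncertainties of the double-tag and single-tag fractions can be motivated by a one-line error propagation. The estimate below assumes that mistags are negligible, so that the single-tag fraction scales linearly and the double-tag fraction quadratically with the charm-tagging efficiency, and that the efficiency uncertainty is small and fully correlated between the two jets; it is an illustrative estimate rather than a result taken from the analysis.

```latex
F_c \propto \epsilon_c , \qquad
F_c^{2\text{-tag}} \propto \epsilon_c^{2}
\quad\Longrightarrow\quad
\frac{\delta F_c^{2\text{-tag}}}{F_c^{2\text{-tag}}}
\simeq 2\,\frac{\delta \epsilon_c}{\epsilon_c}
\simeq 2\,\frac{\delta F_c}{F_c} .
```

The same scaling explains the additional statistical penalty: the double-tagged yield is suppressed by roughly one further power of the efficiency relative to the single-tagged yield.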

Discussion
The charm fractions displayed in Figs. 6-9 contain both SM and supersymmetric contributions. As noted in Sec. 2.4, our analysis overestimates the number of charm jets from gluon splitting compared to realistic detectors, since with Delphes, a jet containing a cc̄ pair is labelled as a charm jet. This occurs in both supersymmetry and SM events, but since gluon splitting is more important in the SM, the effect on the SM charm fraction is more pronounced.
The charm fraction of supersymmetric models can in principle be lower than in SM events. This requires a low gluino mass, which is not of much interest to us, since discovery in this case occurs based on the total number of events. In the pure supersymmetry sample, the charm fractions of Nq = 8 and Nq = 4 are identical (assuming the other three parameters are equal). Note that, while bino-mediated t-channel processes give an O(1%) modification of the cross section and are thus negligible, processes involving winos may give an O(10%) contribution even if the winos are as heavy as the squarks. Therefore, in the presence of left-handed squarks, our results assume an extremely large wino mass. Still, our analysis can be straightforwardly generalized to include winos, with little qualitative change. Charm squarks would mainly decay to strange quarks in this case and vice versa.

[Figure: horizontal axis N_ev; legend entries (2, 1500, 1), (2, 1600, 300), (4, 1600, 300), (8, 1500, 1), (8, 1600, 300), SM.]

As mentioned above, estimating the number of charm jets from SM processes is nontrivial. Beyond this theoretical difficulty, in standard experimental analyses, collinear charm pairs from gluon splitting are typically merged into a single jet, but jets containing two heavy quarks are subsequently discarded [24]. New approaches for heavy flavor tagging were proposed recently to address this problem [24]. The intrinsic charm fraction of the proton is another potential source of charm quarks that is hard to estimate [25].
Fortunately, these theoretical uncertainties can be straightforwardly circumvented by measuring the charm content of the SM background in the data. For the Meff-2j-3100 SR analysis, the background is dominated by Z + jets, and the charm fraction can be measured in the analogous sample with the Z decaying leptonically. In fact, Z + c production with leptonic Z decays has been used by CMS for training some of their charm taggers [26]. Thus, one can extract the numbers of both charm and non-charm jets in the sample of invisible-Z decays, and by subtracting them, obtain the purely supersymmetric charm fraction. While this will be subject to larger experimental uncertainties, the theory systematics will be significantly improved.
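The subtraction described above can be sketched as follows. The function inverts the event-weighted average of the SM and supersymmetric components: given the total sample and a data-driven SM estimate, it returns the charm fraction of the purely supersymmetric component. The function name is hypothetical, and the example inputs are the rounded values quoted in the text for the SM prediction and for the (8, 1500, 1) decoupled-gluino model, so the output is only indicative.

```python
def signal_charm_fraction(n_tot: float, f_tot: float, n_sm: float, f_sm: float) -> float:
    """Charm fraction of the non-SM component after background subtraction.

    n_tot, f_tot: event count and charm fraction of the full selected sample.
    n_sm, f_sm:   the same quantities for the SM background, e.g. as measured
                  in a control sample with leptonic Z decays.
    """
    n_sig = n_tot - n_sm
    if n_sig <= 0:
        raise ValueError("no excess over the SM background")
    # Tagged-jet counts are proportional to n * f, so they subtract linearly.
    return (n_tot * f_tot - n_sm * f_sm) / n_sig


# Rounded values quoted in the text (N_ev ~ 4800, F_c = 7.9%; SM: 3433, 5.8%).
f_signal = signal_charm_fraction(n_tot=4800, f_tot=0.079, n_sm=3433, f_sm=0.058)
```

With these rounded inputs the subtracted charm fraction is roughly 13%, which is of the size expected for a per-jet charm content near one quarter tagged with an efficiency of 0.5.
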

Conclusions
The exclusion limits on superpartner masses from ATLAS and CMS are fast approaching the discovery reach of the LHC. The fraction of charm quarks in jet plus missing energy events provides a new handle on superpartner production, and may increase the sensitivity of LHC searches to squark-pair production. While we have only studied here squark pair production, the charm fraction in gluino-pair production is of interest too as this process is flavor democratic for degenerate squarks.
We did not address here the production of top and bottom squarks. Because of their relatively large Yukawa couplings, these are likely to be split in mass from the first- and second-generation squarks. Furthermore, because the bottom and top content of the proton is negligible, their production is mainly gluon-mediated, and to a good approximation, independent of the gluino mass. Thus, while their discovery would yield additional information, it is orthogonal to our discussion here. We also neglect winos and higgsinos in this study. The latter have little effect on first- and second-generation squark production. Winos, on the other hand, mediate t-channel squark production, and would alter our results unless they are very heavy.
We have argued that the charm fraction can be used to disentangle different model points with similar kinematics. We note that, while event kinematics are largely governed by the squark and bino masses, they have some sensitivity to the gluino mass as well, since how central or forward the events are depends on the weight of t-channel gluino processes. This suggests that measurements of the charm fraction may be optimized by a judicious choice of kinematic cuts in order to extract the gluino mass.
Here we studied models with mass-degenerate up and charm squarks. While plausible, this is by no means mandatory. With mass splittings between the squarks, the fermion Cabibbo mixing will typically translate into up-charm mixing of the left-handed squarks; for concrete spectra, see, e.g., Ref. [27]. Measuring the charm fraction will yield information on the squark flavor composition.