A swarm of Bs

New physics signals containing five or more b-tagged jets, but without missing E_T or leptons, could realistically be sitting within the current 8 TeV LHC data set without receiving meaningful constraints from any of the existing LHC searches at either ATLAS or CMS. This work provides several examples of simple, motivated models that yield final states containing many b-jets. To study the potential for uncovering new physics in these high b-jet multiplicity channels, this paper focuses on a natural supersymmetry scenario where each of the pair-produced stops decays to an on-shell chargino, which subsequently decays via an MFV-motivated, R-parity violating coupling. This gives rise to an eight-jet final state containing six b-quarks. Although no public measurements exist, estimates indicate that the standard model backgrounds in high b-jet multiplicity channels should be very small. To circumvent the background uncertainty, an asymmetric method is presented that utilizes two different techniques to conservatively exclude or to discover new physics in high b-jet multiplicity final states.


JHEP08(2014)073
However, the many b-jet background of the standard model (a subset of the QCD background) is very uncertain. To date, there has been no public LHC measurement of many b-tagged jets without accompanying missing E_T or leptons. As we will discuss, both Monte Carlo generation and simple projections estimate that the background is small. However, very large NLO K-factors have been known to appear within QCD backgrounds [9], and it is extremely difficult to estimate how large the backgrounds are expected to be without using data-driven methods. For this reason, we will utilize a novel asymmetric approach that treats signal exclusion and signal discovery separately.
This paper is outlined as follows: in section 2, we will briefly discuss some new physics models that give rise to high b-jet multiplicities. We will then propose an LHC study looking for new physics in high b-multiplicity channels in section 3. First, we estimate the highly uncertain QCD backgrounds in subsection 3.1, before presenting an asymmetric method to separately constrain (3.2) and discover (3.3) new physics. Section 4 contains the conclusions.

New physics with high b-jet multiplicities
Final states with many b-jets, but without leptons or large missing E_T, do not only originate within baroque models. In fact, "third-generation dominance," a simple ansatz for the coupling of new particles to the standard model particles, naturally leads to decays that involve b-quarks. One motivation for this ansatz is that any new physics closely tied to the generation of mass, such as extended Higgs sectors, is expected to exhibit this behavior. Additionally, indirect bounds from low energy flavor and CP observables, such as K-K̄ mixing and the neutron electric dipole moment, indicate that the standard model's first- and second-generation particles must couple to new physics either very weakly or in a very structured manner, while the bounds on interactions involving the third generation are generically much weaker. While this is more a statement about the current experimental limitations of flavor studies on the third generation, the possibility that sizable couplings exist is enough motivation to search for models with such couplings. Minimal flavor violation (MFV) [10], which ties all flavor relations to the standard model Yukawas, is one example of third-generation dominance; however, more general scenarios of third-generation dominance are quite plausible [11][12][13].
Arguably, the most motivated spectra from natural SUSY contain light stops and higgsinos [14] (naturalness constraints on the gluino mass can be alleviated if the gauginos are Dirac [15,16]). For R-parity violating couplings originating from an MFV structure [17][18][19][20][21], the superpotential coupling λ_323 U^c_3 D^c_2 D^c_3 is the largest. With its fairly large cross-section, pair-production of stops (i.e., through R-parity conserving diagrams) is a promising avenue for discovery of SUSY. If a stop that is at least partially right-handed is the lightest superpartner, then the decay t̃ → bs would be the dominant decay channel in these MFV RPV SUSY models [22,23]. However, if the higgsinos (which can naturally be nearly degenerate) are lighter than the stop, the decay t̃ → bχ̃^+ (and, when not kinematically forbidden, the phase-space suppressed t̃ → tχ̃^0_1,2) would dominate, as this decay uses the top Yukawa, an O(1) coupling. The chargino can then decay promptly through an off-shell t̃ to yield a six b-quark and two light quark final state [24,25]. This branching ratio can realistically be near 100%.
In R-parity conserving SUSY, the addition of a hidden sector [26,27] could yield a very high multiplicity of b-jets without introducing significant missing E_T. Stealth SUSY [28,29] is a simple example that can give rise to this signature. If the stealth sector contains an NMSSM-like singlet S and singlino S̃ (with m_S̃ ≈ m_S), then S will most often decay to bb through its mixing with the Higgs sector. In the presence of a jet p_T hierarchy [30], gluinos as light as 800 GeV could be hiding in the data. A simple model involves a gluino decay chain in which q̃ is a second-generation squark, the gluino-squark mass relation (m_g̃ relative to m_q̃) introduces a jet p_T hierarchy, and a singlet-singlino mass squeezing makes the gravitino too soft to contribute appreciably to the missing E_T. In principle, this is a signal with twelve jets containing eight bs. Even with a jet p_T hierarchy, this simple model may be constrained by searches sensitive to the rarer S → τ+τ− decays. However, with decoupled gluinos (which is consistent with naturalness for Dirac gluinos), the direct pair production of higgsinos in a similar model gives rise to eight jets, all of which are b-jets. As an alternative, one could have direct sbottom production, where the sbottom decays give rise to six jets, all of which are b-jets. One could argue that S → τ+τ− decays could constrain this model as well. However, due to the paucity of τ + b searches at the LHC, there may be no sensitivity to this much lower S_T signature [24,25]. Even if such searches were to exist, after branching factors, soft partonic-level τs of O(30−40 GeV) would only rarely produce events with useful light leptons.
Resonances of extended Higgs sectors can cascade decay into other resonances. For instance, many b-jets can arise from production of a heavy Higgs that decays dominantly into two pseudoscalars, each of which decays to a SM Higgs and a light pseudoscalar, i.e.,
pp → H → AA → φhφh → (bb)(bb)(bb)(bb), (2.5)
where A and φ are pseudoscalars, H is the heavy Higgs, and h is the observed ∼ 125 GeV SM-like Higgs state. Although constrained by the current best fits to the observed Higgs state [31], if the model is an extension of a Type III or IV two Higgs doublet model (also known as "lepton-specific" and "flipped," respectively) [8], then bb pairs can dominate all pseudoscalar decays. With the high branching fractions and low expected backgrounds, it is plausible that multi-b could prove the most sensitive channel to these models. However, it is also possible that other decay channels of the Higgs, such as h → WW* or h → γγ, could prove the more promising discovery mode. Decay chains in extended Higgs sectors, such as the example shown here, are a motivated extension to the standard model wherein many b-jets can occur.
The decay b′ → bh → b(bb) makes vector-like fourth-generation quarks another place where high multiplicity b-jet signatures could appear. Minimal b′ models require BR(b′ → bZ) ∼ BR(b′ → bh) [32]; in these models, if BR(b′ → bh) ∼ 50%, then BR(b′b̄′ → 6b) ∼ 13%. However, in extended models, such as those having both (b′, q_−4/3)_L/R and b′_L/R fields (in analogy to the t′ models of [33]), the branching ratios of the b′ have more freedom, and b′ → bh can dominate. Although these models can receive significant constraints from precision electroweak observables, most notably Z → bb, a complete model may realize a consistent solution. As we are only positing this decay path as a phenomenological possibility, uncovering a specific model is tangential to the focus of this work.
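The ∼ 13% figure for the minimal b′ model can be reproduced with a short calculation, under the assumption that both b′ → bh and b′ → bZ feed the six-b final state through h → bb and Z → bb. The values BR(h → bb) ≈ 0.58 and BR(Z → bb) ≈ 0.15 used below are standard reference numbers that are not stated in the text, and the function name is purely illustrative.

```python
# Illustrative arithmetic only; the two input branching ratios are assumptions
# (standard reference values), not numbers quoted in the text.
BR_H_TO_BB = 0.58   # assumed for a ~125 GeV SM-like Higgs
BR_Z_TO_BB = 0.15   # assumed Z -> bb branching ratio


def br_pair_to_6b(br_bprime_to_bh, br_bprime_to_bz):
    """Fraction of b' pair events in which each b' yields three b-quarks."""
    per_leg = br_bprime_to_bh * BR_H_TO_BB + br_bprime_to_bz * BR_Z_TO_BB
    return per_leg ** 2


minimal = br_pair_to_6b(0.5, 0.5)    # minimal model: BR(bh) ~ BR(bZ) ~ 50%
extended = br_pair_to_6b(1.0, 0.0)   # extended model in which b' -> bh dominates
print(f"minimal: {minimal:.2f}, extended: {extended:.2f}")
```

Under these assumptions the minimal case lands near 13%, while a b′ that decays only to bh would put roughly a third of pair events into the six-b final state.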
For the remainder of this paper, we will focus on the 6b + 2j signature of pair-produced stops in natural MFV RPV SUSY (figure 1). This signal has been shown to receive no meaningful constraint from any existing LHC study [24,25]. To remove the possibility of top quarks entering in decays (which could receive constraints from existing searches), we will consider signal regions where the decay t̃ → tχ̃^0 is negligible, i.e. m_t̃ − m_χ̃ ≲ 200 GeV. The branching fraction into the desired final state can be set to one without invoking any artificial squeezing of new states or contrived assumptions about coupling structures. After the branching fraction is set to one, this simplified model is left with only two free parameters: the mass of the stop and the mass of the chargino.
We will briefly note that displaced decays, including two-, three-, and four-body scenarios, are a realistic option for most of the signatures mentioned above. While there are certainly difficulties with b-jets originating from a point away from the primary interaction vertex, this possibility is important enough that such signals should be specifically addressed in studies of displaced new physics.
Estimating the sensitivity to the various other signatures mentioned here is beyond the scope of this work. However, we stress that high b-jet multiplicities can easily arise from a variety of new physics models.

Table 1. The cuts used in this study for the five signal regions (Region 1 through Region 5). H_T is, in all cases, the sum over jets with p_T > 40 GeV and |η| < 2.5. In addition to using higher H_T cuts, the different signal regions also use different b-tagging working points. The parameters b_eff, c_eff, and j_eff are the percentages of jets originating from a partonic-level b, c, or light parton (g, u, d, s) that are tagged as a b-jet. Realistic efficiencies are here taken from the CMS 7 TeV study of b-tagging with the CSV tagging algorithm [34].

A many b-tags study
While signatures containing five or more b-jets with neither missing E_T nor isolated leptons may be motivated from a new physics perspective, the challenge for such a study lies in the background estimation. However, even with an unknown background, it can be possible to place meaningful bounds: if the signal alone (i.e., considered in the presence of zero background) would be too large to account for the observed data, then the signal can be excluded. In this section, we present a simple study where our RPV stop signature can be conservatively constrained independent of the estimated background. We will then illustrate how an already established technique could ferret out the resonant structures in the events and ultimately conclude that an excess is due to a genuine signal, rather than to a larger than expected background. Such an asymmetric approach to exclusion and discovery has been used in experimental studies before (see, for instance, [35]), so this general strategy is not without precedent.
One immediate concern for signals with high jet activity, but without leptons or missing E_T, is whether the events will pass the trigger requirements. Fortunately, CMS has in place six-jet, eight-jet, and high-H_T triggers that are not prescaled. By the end of the 8 TeV run, the thresholds stood at p_T,j > 45 GeV for the six-jet trigger, p_T,j > 30 GeV for the eight-jet trigger, and Σ p_T,j > 750 GeV (summed over jets with p_T > 40 GeV) for the H_T trigger [36]. In this study, the H_T trigger proved the most powerful probe for most signals, so it will be the only trigger used throughout (footnote 2).
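As a rough illustration of the three trigger thresholds quoted above, the following sketch evaluates them on a list of jets. The (p_T, η) tuple representation, the function names, and the use of the analysis-level |η| < 2.5 requirement inside the H_T sum are assumptions made for the example; the actual CMS trigger definitions may differ in detail.

```python
# Jets are (pT in GeV, eta) tuples; this representation is a hypothetical choice.

def h_t(jets, pt_min=40.0, eta_max=2.5):
    """Scalar sum of jet pT over jets with pT > pt_min and |eta| < eta_max."""
    return sum(pt for pt, eta in jets if pt > pt_min and abs(eta) < eta_max)


def n_jets_above(jets, pt_min):
    """Number of jets with pT above pt_min (no eta requirement assumed here)."""
    return sum(1 for pt, _ in jets if pt > pt_min)


def fired_triggers(jets):
    """Evaluate the six-jet, eight-jet, and H_T thresholds quoted in the text."""
    return {
        "6j": n_jets_above(jets, 45.0) >= 6,
        "8j": n_jets_above(jets, 30.0) >= 8,
        "HT": h_t(jets) > 750.0,
    }


soft_event = [(300.0, 0.5), (250.0, -1.0), (150.0, 2.0),
              (80.0, 2.7), (45.0, 0.1), (35.0, 0.0)]
hard_event = [(100.0, 0.0)] * 8

print(h_t(soft_event), fired_triggers(soft_event))
print(h_t(hard_event), fired_triggers(hard_event))
```

In the first event the 80 GeV jet falls outside |η| < 2.5 and the 35 GeV jet is below the p_T floor, so H_T comes out just under the 750 GeV threshold.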

As the trigger efficiencies in our region of interest are unknown, we assume they are 100%. Unless the trigger efficiencies are very low, the results will not be extremely sensitive to these details, as our signal is typically an order of magnitude larger than our backgrounds in regions where the trigger efficiency turn-on could be an issue. We note that the n_b = 0 QCD data could be used to reliably model the H_T trigger efficiency turn-on.
In this study, we use five different signal regions with differing H_T thresholds. Additionally, we utilize three different b-tagging working points (from the CMS CSV tagging algorithm [34]) for the search: medium, tight, and very tight, with approximate b-tagging efficiencies of 0.7, 0.6, and 0.5, respectively. We note that the "very tight" working point is taken from their figures, but has not been used in any CMS experimental study thus far. For simplicity, in this work, b-tagging efficiencies are treated as being constant in both p_T and η. Details of the five search regions are shown in table 1.
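To give a feel for what these working points imply for a six-b signal, the sketch below computes the binomial probability of tagging at least five of six b-jets with a constant per-jet efficiency. It assumes all six b-quarks are reconstructed as separate jets within acceptance and ignores mistags of the two light jets, so it is an upper-level estimate rather than the efficiency obtained in the full simulation; the function name is illustrative.

```python
from math import comb


def prob_at_least(n_tagged_min, n_b, eff):
    """Binomial probability that at least n_tagged_min of n_b true b-jets are tagged."""
    return sum(
        comb(n_b, k) * eff ** k * (1.0 - eff) ** (n_b - k)
        for k in range(n_tagged_min, n_b + 1)
    )


# Approximate efficiencies quoted in the text for the three working points.
WORKING_POINTS = {"medium": 0.7, "tight": 0.6, "very tight": 0.5}

tag_probabilities = {
    name: prob_at_least(5, 6, eff) for name, eff in WORKING_POINTS.items()
}
for name, prob in tag_probabilities.items():
    print(f"{name}: P(>=5 tags of 6 b-jets) = {prob:.3f}")
```

Under these simplifications the probability falls from roughly 42% at the medium working point to roughly 11% at the very tight one, which is the price paid for the lower mistag rates.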

Backgrounds
The dominant backgrounds to signals with ≥ 5 bs are ttbb, bbbbbb, bbbbcc, and bbbb + {light jets}. With the exception of ttbb, these have never been explicitly measured in published LHC data. The QCD cross-sections for each of these processes have very large uncertainties. As it is difficult to reliably estimate the backgrounds, we will implement an asymmetric approach that is insensitive to the background estimation by using different methods for placing limits (subsection 3.2) and for making discoveries (subsection 3.3). However, we will first estimate the sizes that might be expected for these standard model backgrounds.
CMS has measured ttbb explicitly in 8 TeV data and found σ(ttbb) = 460 ± 45(stat) ± 120(sys) fb under a b-jet p_T cut of 20 GeV [37]. As this measurement was made within the relatively clean, semi-leptonic top decay channel, the size of the all-hadronic channel relevant for this study can be reliably scaled. For the signal regions used in our study, we adopt a p_T cut of 30 GeV; however, we will conservatively apply the measured p_T > 20 GeV cross-section to our sample. To simulate this background, we generated ttbb + {0, 1, 2} jets with Alpgen [38] and showered with PYTHIA 8 [39]. Alpgen is not equipped to match jets in the 4Q sample [38]; nevertheless, we combine these three samples and normalize their sum to the measured σ(ttbb) of [37]. As a cross-check, ttbb was also generated in MadGraph 5 [40]. The normalized distribution there is in good agreement with the Alpgen sample. In each of our five signal regions, 1 event is expected (see table 2).
The cross-section for bbbbbb (bbbbcc) is unknown. Naïve generation in MadGraph 5 yields a partonic-level 8 TeV cross-section of 11 fb (35 fb) under the cuts p_T,j ≥ 30 GeV and |η| < 2.5. Of course, six jets with p_T of 30 GeV will not pass the trigger thresholds, and we found that typically no events from either sample would be expected to pass the selection cuts of our signal regions. For our presentation, we apply a K-factor of K = 3 to these samples (resulting in a cross-section of 33 (105) fb). This K-factor is chosen because it is greater than typical top system K-factors, and thus we believe it to be a conservative choice. However, even with this K-factor, these two samples together yield fewer than one event in each signal region.
Footnote 2: [...] for several different choices of p_T,j/b > 30, 40, 50, 60 GeV, as well as N_j ≥ 8, N_b ≥ 6, p_T,j/b > 40 GeV, each occasionally served as the most effective search region, although the gain over the H_T-based search regions was typically small.
Table 2. Above: the backgrounds and how they populate the signal regions used in this study. Hard process generation is in either Alpgen or MadGraph 5, as discussed in the text of section 3.1, and parton showering is performed by PYTHIA 8. ttbb is normalized to the measured cross-section of [37], while all other backgrounds receive a K-factor of 3. We stress that these very uncertain backgrounds are not ultimately necessary for our proposed method of setting limits. In projecting a potential exclusion for figure 2, we round our expected background to act as our "observed" values. Below: six of our signal benchmark points are presented (indicated with m_t̃ and m_χ̃±), and we display how they are expected to populate each of our five signal regions. The most sensitive channel to each signal is shown in bold. Production of the signals is in MadGraph 5; showering is in PYTHIA 8.
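The size of these samples before any selection can be made concrete with the usual yield formula N = σ × K × L × ε. The sketch below uses the cross-sections and K-factor quoted in the text together with the 20 fb^-1 data set size mentioned in the conclusions; the function name and the default efficiency of one (no selection applied) are illustrative assumptions.

```python
LUMI_FB_INV = 20.0  # size of the 8 TeV data set, in fb^-1


def expected_events(sigma_fb, k_factor=1.0, efficiency=1.0, lumi_fb_inv=LUMI_FB_INV):
    """Expected yield N = sigma * K * L * efficiency (sigma in fb, L in fb^-1)."""
    return sigma_fb * k_factor * lumi_fb_inv * efficiency


# Partonic-level cross-sections quoted in the text, scaled by K = 3.
n_6b = expected_events(11.0, k_factor=3.0)
n_4b2c = expected_events(35.0, k_factor=3.0)
print(n_6b, n_4b2c)
```

Before selection these correspond to several hundred and a couple of thousand events, respectively, so a combined yield below one event in each signal region implies that only a very small fraction of these events survives the H_T and b-tagging requirements.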
The cross-section for bbbb + {light jets} is also unknown. To simulate this, we use Alpgen to generate bbbb + {0, 1, 2, 3} jets. Again, Alpgen is not equipped to match jets in the 4Q sample [38]; nevertheless, we scale these four samples up by a K-factor of 3, yielding an overall cross-section of 88 pb (footnote 3). As can be seen in table 2, this proves the dominant background in all signal regions. However, there are very large uncertainties on this estimation. It has been established that large K-factor corrections can sometimes appear in QCD backgrounds [9], especially along the tails of distributions. For this reason, we are hesitant to utilize leading order Monte Carlo estimations in setting exclusion limits (footnote 4). Fortunately, as we will discuss in the next subsection, the method we utilize in this study is designed to remove all sensitivity to the background estimations.
Footnote 3: Alpgen technically includes the bbbbbb and bbbbcc in this sample [38]. However, as a direct generation of this is shown to be quite small, we ignore this double counting.
Footnote 4: On the other hand, extremely good agreement has also been measured between some Monte Carlo distributions and LHC data [41].
As an alternative to the extremely conservative exclusion strategy we will use in this work, it is possible to attempt to utilize data-driven techniques to estimate the backgrounds. Unfortunately, projections from a control region with n_b = 5 are nearly impossible, because the signal would be expected to heavily contaminate any such control region that contains events. However, one could use projections from control regions with n_b = 0−4 to estimate the size of the n_b = 5 backgrounds. This would be particularly useful in measuring the dominant background of bbbb + {light jets}. Similarly, while it is expected to be small, an ansatz of
σ(bbbbbb)/σ(bbbbjj) ≈ σ(ttbb)/σ(ttjj) ≈ 2.3% (3.1)
could prove useful in making a controlled extrapolation from n_b = 3, 4 measurements to get estimations for the bbbbbb and bbbbcc backgrounds (footnote 5).
Footnote 5: The ttbb and ttjj cross-section data are taken from [37].

Signal exclusion
As can be clearly seen in table 2, our signals over much of the interesting region are expected to be much larger, often by even an order of magnitude, than our backgrounds. While one might not trust our background estimates, we can assume, solely for the sake of deriving extremely conservative exclusion limits, that our expected backgrounds are zero. If our expected number of signal events proves too large to have produced only the number of observed events, then the signal can be excluded with some confidence. For instance, if a study were performed and Region 2 were measured to have 6 events, whereas our signal predicts 50 events (as with our m_t̃ = 500 GeV, m_χ̃ = 350 GeV benchmark), then it is extremely improbable that this signal would have such a large downward fluctuation to produce a mere 6 events. This can be well excluded, certainly at a 95% confidence level, even with fairly large systematic uncertainties on the signal. This conservative exclusion method is only useful in placing robust limits. In particular, in the event of a genuine signal, this method would be of no use for discovery.
This conservative exclusion method of assuming the backgrounds are zero can be applied across the five signal regions to constrain our benchmark RPV model. To simulate our signal, we use a grid of points in m_t̃ ∈ [150, 1000] GeV in 50 GeV steps vs m_χ̃± ∈ [m_t̃ − 200 GeV, m_t̃ − 25 GeV] in 25 GeV steps. These signal samples were generated using MadGraph 5 and showered in PYTHIA 8. In figure 2, the exclusion contours, assuming the data observed manifests as in table 2, are shown as a thick, solid (thin, dashed) black line for signal systematic uncertainties of 25% (50%), where the signal uncertainty is assumed to be Gaussian. The distribution of the five signal regions shown is for the case with 25% systematic signal uncertainty. As can be clearly seen, even under the conservative assumptions used here, powerful exclusion on the signal region is achievable. Of course, for the same number of observed events, exclusions would become stronger by incorporating any background estimate acquired from data-driven methods.
Figure 2. Shown in the plane of m_t̃ vs δ ≡ m_t̃ − m_χ̃± are the 95% CL limits derived from the number of "observed" events as estimated in table 2. The thick, solid (thin, dashed) line corresponds to the conservatively defined signal-only 95% exclusion limit, i.e. derived assuming the expected background is zero as discussed in the text, for a 25% (50%) systematic uncertainty applied to the signal. The five signal regions are denoted by R1, R2, etc., and indicate the most constraining signal region in the case of 25% signal systematic uncertainty. In the upper left corner of the plot, m_χ̃ falls below 100 GeV, and would either have appeared at LEP or has a mass that is unphysical.
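The signal-only exclusion logic can be sketched numerically as follows. The text does not spell out the exact statistical procedure, so this is only one plausible reading: the p-value is the Poisson probability of observing at most n_obs events from the signal alone, and a Gaussian signal uncertainty (truncated at zero) is averaged over with a simple midpoint rule. All function names and the integration settings are assumptions made for the illustration.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(N <= n_obs) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n_obs + 1):
        term *= mean / k
        total += term
    return total


def p_value(n_obs, signal, rel_syst=0.0, steps=4000):
    """Probability of seeing <= n_obs events from signal alone (background set to zero).

    A Gaussian uncertainty of width rel_syst * signal, truncated at zero,
    is averaged over numerically when rel_syst is nonzero.
    """
    if rel_syst == 0.0:
        return poisson_cdf(n_obs, signal)
    sigma = rel_syst * signal
    low = max(0.0, signal - 6.0 * sigma)
    high = signal + 6.0 * sigma
    width = (high - low) / steps
    numerator = denominator = 0.0
    for i in range(steps):
        s = low + (i + 0.5) * width  # midpoint of the i-th integration bin
        weight = math.exp(-0.5 * ((s - signal) / sigma) ** 2)
        numerator += weight * poisson_cdf(n_obs, s)
        denominator += weight
    return numerator / denominator


def excluded(n_obs, signal, rel_syst=0.0, cl=0.95):
    """True if the signal-only hypothesis is rejected at the given confidence level."""
    return p_value(n_obs, signal, rel_syst) < 1.0 - cl


def smallest_excluded_signal(n_obs, cl=0.95):
    """Signal yield at which P(N <= n_obs) drops to 1 - cl (no systematics), by bisection."""
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (low + high)
        if poisson_cdf(n_obs, mid) > 1.0 - cl:
            low = mid
        else:
            high = mid
    return high


# Example from the text: 6 observed events in Region 2 versus 50 predicted signal events.
for syst in (0.0, 0.25, 0.50):
    print(syst, p_value(6, 50.0, syst), excluded(6, 50.0, syst))
print(smallest_excluded_signal(6))
```

In this simplified treatment, any signal predicting more than about a dozen events would be excluded by an observation of 6 events when systematics are neglected, which is why a prediction of 50 events remains excluded even after sizable signal uncertainties are folded in.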

Signal discovery
Unfortunately, the method utilized in the previous section is only capable of setting limits and could never indicate an excess as being due to a genuine signal. In the face of very uncertain backgrounds, how can one make a discovery? Fortunately, these exotic events are, in principle, fully reconstructable. If a larger than anticipated background is observed, then one can utilize the "jet-ensemble" technique of [43][44][45][46] to search for resonant structures.
In these CDF and CMS searches for RPV gluinos, a scatter plot is formed by taking each event that passes the selection cuts, considering all possible trijet combinations, and plotting Σ_j |p_T,j| vs M_jjj. As the trijet objects originating from a true resonance will occasionally be somewhat boosted, the decay products will be slightly collimated and appear with a higher Σ_j |p_T,j|, while preserving an M_jjj value close to the mass of the resonance. On the other hand, objects that do not all originate from a common resonance typically have M_jjj ≳ Σ_j |p_T,j|. Thus, by only considering combinations populating a region offset by Σ_j |p_T,j| − M_jjj > ∆_j, both the incorrect jet combinations in signal events and all combinations in typical background events are removed.
We will adopt this strategy by taking the leading eight jets with p_T,j > 30 GeV in each event (if fewer than eight jets are in an event passing the selection criteria, we take them all) and considering all possible combinations to form both trijet and quadrajet resonances (for eight jets, this means 56 and 70 combinations, respectively). We will implement a very large offset of ∆_j = 300 GeV. These distributions are presented in figure 3 for our m_t̃ = 500 GeV and m_χ̃± = 350 GeV benchmark in signal region 2. Shown in blue are 50 events from this signal; in red are 50 events originating from a scaled up bbbb + {light jets} background (with a K-factor of ∼ 30 applied). Even with only 50 events, the distribution of signal is markedly different from that of the background within the signal region (lower right region of each plot). In models producing fewer events, the higher energy LHC run may be necessary to determine that an excess originates from a genuine signal, as opposed to a higher than expected background. In figure 3, a ∆_j cut of 300 GeV was used, but the results are not very sensitive to this precise choice. A normalized fraction of combinations passing the ∆_j cut is shown in figure 4 for both four-jet (solid) and three-jet (dashed) reconstructions. There, it can be seen that increasing ∆_j affects background more harshly than signal. Even a mild choice of ∆_j (e.g., 100 GeV) can discriminate signal from background, but a larger value provides better discrimination if there are sufficiently many events. This fact has no significant dependence on the spectra of signal masses.
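The combinatorial step described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code: jets are represented as hypothetical (p_T, η, φ) tuples and treated as massless, and the function names are invented for the example.

```python
import math
from itertools import combinations


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) for a jet given in (pT, eta, phi) coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def invariant_mass(vectors):
    """Invariant mass of the summed four-vectors."""
    energy, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(energy * energy - px * px - py * py - pz * pz, 0.0))


def jet_ensemble(jets, n_in_combo, delta, max_jets=8, pt_min=30.0):
    """List (sum_pt, mass, passes_cut) for every n-jet combination of the leading jets."""
    leading = sorted((j for j in jets if j[0] > pt_min),
                     key=lambda j: j[0], reverse=True)[:max_jets]
    results = []
    for combo in combinations(leading, n_in_combo):
        sum_pt = sum(j[0] for j in combo)
        mass = invariant_mass([four_vector(*j) for j in combo])
        results.append((sum_pt, mass, sum_pt - mass > delta))
    return results


# Hypothetical event: the three leading jets are collimated, as for a boosted resonance.
example_jets = [(200.0, 0.1, 0.0), (180.0, 0.2, 0.3), (150.0, -0.1, -0.3),
                (120.0, 1.0, 2.5), (90.0, -1.2, 3.0), (60.0, 0.5, -2.0),
                (45.0, -0.8, 1.5), (35.0, 2.0, -1.0)]

trijets = jet_ensemble(example_jets, 3, delta=300.0)
quadjets = jet_ensemble(example_jets, 4, delta=300.0)
print(len(trijets), len(quadjets))
print(sum(1 for _, _, ok in trijets if ok), "trijet combinations pass the offset cut")
```

For an eight-jet event this enumerates the 56 trijet and 70 quadrajet combinations mentioned in the text; only combinations whose scalar p_T sum exceeds their invariant mass by more than ∆_j are retained for the resonance search.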
The kinematic features of the signal and background samples shown in figure 3 should hold even if fully matched samples were generated. As the signal distribution is due to the decay of slightly boosted resonances, this feature should be unaffected by matrix element/parton shower matching. If Alpgen were equipped to match the background samples with an MLM matching procedure, the matched events would still be a subset of the unmatched events used in this study. Matching could enhance the number of background events populating the signal region only if there were an abnormally large matching efficiency ratio between background events that reach the signal region and events that do not. As there is no reason to expect such behavior, and nothing of the sort appears in the QCD data of [46], it should not occur in these high b-jet multiplicity samples.
Of course, because the simulation done here is at leading order with unmatched jets, the uncertainties on the background are potentially large, especially the uncertainties in the kinematic distributions. Data-driven control regions should be used to more accurately estimate the kinematic distributions of the backgrounds. In an actual experimental study, one could utilize the data from lower n_b channels, i.e. the n_b = 0, 1, 2, 3, 4 regions, to estimate the expected kinematic distributions of the jets in the n_b = 5 sample. In particular, the number of events expected to fall into the signal region at particular m_3j and m_4j values should be predictable. Importantly, the contamination from the top samples in each n_b region needs to be correctly extrapolated. The details of implementing this data-driven approach are better suited for the experimenters conducting the study.
While not relevant for the particular model studied here, the other models producing high b-jet multiplicity signatures that were mentioned in section 2 all contained bb resonances. Resonances in bb are quite generic, and identifying them within an excess would be a key to understanding the signal, so we will briefly discuss how one could distinguish them from background. This discussion is only intended to indicate that there are simple handles on the problem that can be employed to discover these scenarios in the event of an excess, and is not intended to be a thorough study of these bb resonance scenarios.
First, one could plot the dijet masses versus the p_T sum formed only from b-tagged jets. This would help to reduce combinatoric inefficiency. For the stealth gluino example, one would expect decay products to often be quite boosted. However, in cases where the boost is not typically as significant (for instance, the higgsino production example), this method would be less effective. As another option, one could, for each event, choose the two bb pairs with the greatest ∆ = |p_T,b1| + |p_T,b2| − M_bb, and plot M_bb,1 vs M_bb,2 to locate resonances. This method would obviously be most effective when all resonances are of the same mass, although with enough events, it could suffice even if there are multiple bb resonances of different masses.
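The second option, selecting the two bb pairs with the greatest ∆, could be sketched as below. The jets are hypothetical (p_T, η, φ) tuples treated as massless, and the requirement that the two chosen pairs share no jet is an assumption of this example rather than something stated in the text.

```python
import math
from itertools import combinations


def dijet_mass(jet1, jet2):
    """Invariant mass of two massless jets given as (pT, eta, phi)."""
    (pt1, eta1, phi1), (pt2, eta2, phi2) = jet1, jet2
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))


def best_bb_pairs(b_jets, n_pairs=2):
    """Return up to n_pairs disjoint ((i, j), mass, delta) entries, largest delta first."""
    candidates = []
    for i, j in combinations(range(len(b_jets)), 2):
        mass = dijet_mass(b_jets[i], b_jets[j])
        delta = b_jets[i][0] + b_jets[j][0] - mass
        candidates.append(((i, j), mass, delta))
    candidates.sort(key=lambda c: c[2], reverse=True)

    chosen, used = [], set()
    for (i, j), mass, delta in candidates:
        if i in used or j in used:
            continue  # assumed: each b-jet may belong to only one selected pair
        chosen.append(((i, j), mass, delta))
        used.update((i, j))
        if len(chosen) == n_pairs:
            break
    return chosen


# Hypothetical b-tagged jets: two collimated pairs plus one unrelated jet.
example_b_jets = [(150.0, 0.0, 0.0), (140.0, 0.2, 0.4),
                  (120.0, 1.5, 3.0), (110.0, 1.3, 2.7),
                  (50.0, -2.0, 1.5)]

pairs = best_bb_pairs(example_b_jets)
for indices, mass, delta in pairs:
    print(indices, round(mass, 1), round(delta, 1))
```

The two masses returned per event would then be the coordinates of one point in the M_bb,1 vs M_bb,2 plane.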

Conclusions
In this work, we have discussed the value and feasibility of searching for new physics within very high b-jet multiplicities. We utilized signal regions that focused on the H_T-based trigger at CMS, and that contained five or more b-tagged jets. With the conservative estimates used in this study, these channels are shown to have very low expected backgrounds, which should yield only a handful of events in the existing 20 fb^-1 data set at 8 TeV.
To study the potential for uncovering new physics in this channel, we focused on a particular R-parity violating and minimal flavor violating supersymmetric scenario with a natural superpartner mass hierarchy containing only a right-handed stop and nearly degenerate higgsinos. This model, with only a few parameters, receives no constraints from any existing experimental study. Due to the large background uncertainties, we adopted an asymmetric approach to constrain and discover our signal through different methods. With our study, we illustrated that powerful constraints are feasible, even under extremely conservative assumptions about the size of the background. We showed that stops up to 750 GeV could realistically be excluded by a study using the current data set.
The asymmetric method, cut-and-count for exclusion and resonant reconstruction for discovery, can be applied to other scenarios with very uncertain backgrounds as well. For instance, the backgrounds to another potential signature of RPV stops, t̃ → tχ̃ → t(jjj), have extremely large uncertainties; however, the tt + 6 jet background is small enough that meaningful limits could be placed (as can be inferred from [47]). Resonant reconstruction on the same sample could distinguish new physics from a large background.
While one of the most motivated RPV SUSY scenarios can give rise to this unconstrained signature, we discussed a handful of other models that can also yield high b-jet multiplicities. These specific examples included stealth SUSY, b′ models, and extended Higgs sectors. The search strategy employed in this work is generic enough that other models of new physics, especially those that couple to standard model particles with a third-generation dominant structure, could be constrained or discovered through an experimental study of this nature. Models such as these can slip through the current LHC analyses, meaning this signature is an outstanding gap in LHC coverage at both ATLAS and CMS.
Even in the absence of new physics, a measurement of high b-jet multiplicities at the LHC is important, as it can help to further our understanding of QCD, while simultaneously providing important constraints on models that have not yet been imagined. When the LHC begins collecting data at a higher energy, the results from 8 TeV will be invaluable for background estimation. For these reasons, such an experimental study should use general cuts and not tailor itself specifically to any one signal in order to maximize its value in the future.