Search for single production of a vector-like quark via a heavy gluon in the 4b final state with the ATLAS detector in pp collisions at √s = 8 TeV

A search is performed for the process pp → G* → B_H b̄ / B̄_H b → H b b̄ → b b̄ b b̄, predicted in composite Higgs scenarios, where G* is a heavy colour-octet vector resonance and B_H a vector-like quark of charge −1/3. The data were obtained from pp collisions at a centre-of-mass energy of 8 TeV, corresponding to an integrated luminosity of 19.5 fb⁻¹, recorded by the ATLAS detector at the LHC. The largest background, multijet production, is estimated using a data-driven method. No significant excess of events with respect to Standard Model predictions is observed, and upper limits on the production cross section times branching ratio are set. Comparisons with the predictions of a specific benchmark model are made, resulting in lower mass limits in the two-dimensional mass plane of m(G*) vs. m(B_H). © 2016 The Author. Published by Elsevier B.V.


Introduction
Composite Higgs models [1][2][3][4] interpret the Higgs boson discovered at the Large Hadron Collider (LHC) [5] as a pseudo-Goldstone boson arising from spontaneous symmetry breaking in a new strongly coupled sector. They thereby address the naturalness problem: the extreme fine tuning required in the Standard Model (SM) to cancel the quadratically divergent radiative corrections to the Higgs boson mass. A generic prediction of these models is the existence of massive vector-like quarks (VLQs). These VLQs are expected to mix mainly with the third generation of SM quarks [6][7][8], leading to partial compositeness. Colour-octet resonances (massive gluons) also occur naturally in these models [6,7,9,10].
Searches for vector-like quarks by the ATLAS and CMS experiments, in both the pair- and single-production modes [11][12][13][14][15][16][17][18][19][20][21][22], constrain their masses to be above 700-900 GeV. This analysis searches for single production of a vector-like quark B_H of charge −1/3 via the s-channel exchange of a heavy colour-octet vector resonance G*, using data recorded by the ATLAS detector at the LHC. The search targets H b b̄ production through pp → G* → B_H b̄ / B̄_H b → H b b̄ → b b̄ b b̄ (see Fig. 1)¹, following Ref. [23] and using the benchmark model of Ref. [9].

This simplified minimal composite Higgs model has a composite sector with a global SU(3)_c × SU(2)_L × SU(2)_R × U(1)_Y symmetry and an elementary sector that contains the SM particles but not the Higgs boson. Physical states of the composite sector include the heavy gluon G*, a composite Higgs boson and heavy vector-like quarks of charge 5/3, 2/3, −1/3 and −4/3. Among these heavy quarks, one singlet of charge 2/3 mixes with the right-handed SM top quark with an angle θ_tR, and similarly one singlet of charge −1/3 mixes with the right-handed SM bottom quark with an angle θ_bR. After mixing between the gluons of the elementary and composite sectors by an angle θ_s, the physical heavy gluon has a coupling g_c cos θ_s to composite states, where g_c = g_s / sin θ_s and g_s is the coupling of the SM gluon. The other parameters of the model are the composite fermion masses, assumed to be universal, the heavy-gluon mass m(G*) and two Yukawa couplings Y_T and Y_B. In a large part of the parameter space, the lightest of the new heavy quarks is B_H, of charge −1/3, which in this model decays exclusively to Hb. In Ref. [23], the condition m(B_H) = m(G*)/2 is applied, so that pair production of the heavy partners is kinematically forbidden and the width of G* remains moderate. In the search presented here, the phase space is extended to m(B_H) ≥ m(G*)/2. For m(B_H) < m(G*)/2, existing results on pair production of vector-like quarks can be recast in a model with a massive colour octet [24].
For high masses of the G* and B_H resonances, the Higgs boson is highly boosted and its decay products are reconstructed as a single large-radius (large-R) jet in the detector, whereas for lower masses the four b-quarks are reconstructed as separate small-radius jets. The analysis uses two sets of selection criteria to target these two cases.
¹ Charge-conjugate states are implied in the following text.
Signal samples based on the model discussed in Ref. [23] are generated with MadGraph5_aMC@NLO [28], using CTEQ6L1 [29] parton distribution functions (PDFs), in the mass region […] TeV, in steps of 250 GeV in m(G*) and 125 GeV in m(B_H). The Higgs boson mass is set to 126 GeV and its branching ratio BR(H → b b̄) to 56.1% [30]. The parameters of the model are set as in Ref. [23].

The event selection requires at least two b-jets in the final state. Multijet events from strong interactions have a large cross section and are the dominant background. Because of the large number of events required to simulate this background and the difficulty of modelling it accurately, it is evaluated using a data-driven method, as described in Section 6. Other background contributions include top-quark pair and single-top-quark production, generated with Powheg-Box [31][32][33] interfaced to Pythia [34] using CT10 PDFs [35]. The tt̄ sample is normalised to the theoretical calculation performed at next-to-next-to-leading order (NNLO) including resummation of next-to-next-to-leading logarithmic (NNLL) soft-gluon terms with Top++2.0 [36,37], giving an inclusive cross section of 253 +13/−15 pb [38]. Samples of tt̄ + Z and tt̄ + H events are generated with Pythia and CTEQ6L1 PDFs. The Sherpa [39] generator, with CT10 PDFs, is used to simulate W/Z + jets samples with leptonic decays of the vector bosons. Sherpa is also used to generate Z + jets events with Z → b b̄, where the extra jets are produced inclusively. Contributions from diboson backgrounds (WW, WZ and ZZ) are estimated to be negligible.

² The ATLAS experiment uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). The distance in η-φ space is referred to as ΔR = √((Δη)² + (Δφ)²).

Object reconstruction
The final state consists of four jets originating from b-quarks (b-jets), two of which come from the Higgs boson decay. If the Higgs boson is sufficiently boosted, with a transverse momentum pT ≳ 300 GeV, the two b-jets may be merged into a single jet with a large radius parameter (large-R jet); two different jet definitions are therefore used.
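The ≈300 GeV scale can be motivated with the common rule of thumb ΔR ≈ 2m/pT for the angular separation of the decay products of a boosted particle of mass m. The sketch below only evaluates that approximation for m = 126 GeV; it is not part of the analysis, and the function names are illustrative.

```python
# Rule-of-thumb estimate (an approximation, not taken from the analysis):
# for a boosted two-body decay X -> b b-bar, the b-quark separation is
# roughly dR ~ 2 m / pT, with m and pT the mass and transverse momentum of X.

def approx_delta_r(mass_gev: float, pt_gev: float) -> float:
    """Approximate eta-phi separation of the two decay products."""
    return 2.0 * mass_gev / pt_gev


def fits_in_large_r_jet(mass_gev: float, pt_gev: float, radius: float = 1.0) -> bool:
    """True if both b-quarks are expected to fall inside one jet of this radius."""
    return approx_delta_r(mass_gev, pt_gev) < radius


M_HIGGS = 126.0  # GeV, the Higgs boson mass used in the simulated samples

for pt in (200.0, 250.0, 300.0, 500.0):
    print(f"pT = {pt:.0f} GeV: dR ~ {approx_delta_r(M_HIGGS, pt):.2f}, "
          f"merged in R = 1.0: {fits_in_large_r_jet(M_HIGGS, pt)}")
```

At pT = 300 GeV the estimate gives ΔR ≈ 0.84, inside an R = 1.0 jet, whereas at 250 GeV it is about 1.01, at the edge of the jet.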
Jets with a smaller radius parameter, or small-R jets, are reconstructed from calibrated calorimeter energy clusters [40,41] using the anti-kt algorithm [42] with a distance parameter R = 0.4. They are corrected for pile-up by a jet-area subtraction method and calibrated by a jet energy scale factor [44]. To ensure high-quality reconstruction of central jets while rejecting most jets not coming from hard-scattering events, the criteria described in Ref. [43] are applied. Jets are required to have pT > 50 GeV and |η| < 2.5. The high pT threshold used in the event selection ensures that the contamination from pile-up jets is small.
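A minimal sketch of the per-jet steps, assuming the simplest form of the area subtraction, pT − ρA (with ρ the event pile-up pT density and A the jet area), and a single constant scale factor standing in for the real pT- and η-dependent calibration. The names `rho`, `area` and `jes_factor` are illustrative, not taken from the ATLAS software.

```python
from dataclasses import dataclass


@dataclass
class Jet:
    pt: float    # GeV, before pile-up correction
    eta: float
    area: float  # jet catchment area in eta-phi space


def area_subtracted_pt(jet: Jet, rho: float) -> float:
    """Remove the pile-up contribution rho * A (rho: pT density in GeV per unit area)."""
    return max(jet.pt - rho * jet.area, 0.0)


def select_small_r_jets(jets, rho, jes_factor=1.0, pt_min=50.0, eta_max=2.5):
    """Pile-up-correct each jet, apply a (hypothetical, constant) energy-scale
    factor, then keep jets with pT > pt_min and |eta| < eta_max."""
    selected = []
    for jet in jets:
        pt = area_subtracted_pt(jet, rho) * jes_factor
        if pt > pt_min and abs(jet.eta) < eta_max:
            selected.append((pt, jet.eta))
    return selected


jets = [Jet(60.0, 0.5, 0.5), Jet(55.0, 1.0, 0.5), Jet(120.0, 2.7, 0.5)]
print(select_small_r_jets(jets, rho=12.0))
```

With ρ = 12 GeV per unit area, the 55 GeV jet falls below threshold after losing 6 GeV, and the 120 GeV jet fails the |η| requirement, so only the first jet survives.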
Small-R jets are identified as containing a b-hadron (b-tagged) by a multivariate algorithm [45]. This algorithm is configured to give a b-tagging efficiency of 70% in simulated tt̄ events, with a mistag probability of about 1% for gluon and light-quark jets and about 20% for c-quark-initiated jets. The b-tagging efficiency in simulated events is corrected to account for differences observed between data and simulation.
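To see what this working point implies per event, the sketch below computes the probability of finding at least three b-tags among four jets. It assumes that jets are tagged independently with fixed rates (70% for b-jets, 1% for light jets), whereas the real efficiencies depend on pT and η; the helper name is hypothetical.

```python
from itertools import product


def prob_at_least_n_tags(efficiencies, n_min):
    """Probability that at least n_min jets are tagged, assuming every jet is
    tagged independently with its own tagging probability."""
    total = 0.0
    for outcome in product((True, False), repeat=len(efficiencies)):
        if sum(outcome) < n_min:
            continue
        p = 1.0
        for tagged, eff in zip(outcome, efficiencies):
            p *= eff if tagged else 1.0 - eff
        total += p
    return total


EFF_B, MISTAG_LIGHT = 0.70, 0.01  # working-point values quoted in the text

four_b = prob_at_least_n_tags([EFF_B] * 4, 3)
two_b_two_light = prob_at_least_n_tags([EFF_B, EFF_B, MISTAG_LIGHT, MISTAG_LIGHT], 3)
print(f"4 b-jets: {four_b:.3f}, 2 b + 2 light jets: {two_b_two_light:.4f}")
```

With these rates about 65% of events with four genuine b-jets carry at least three tags, against about 1% for events with two b-jets and two light-quark jets.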
Large-R jets are reconstructed using the anti-kt algorithm with R = 1.0. Jet trimming [46,47] is applied to reduce the contamination from pile-up and underlying-event activity: subjets are formed using the kt algorithm [48] with R = 0.3, and subjets with pT(subjet)/pT(jet) < 5% are removed.
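A minimal sketch of the trimming step, assuming the kt subjets (R = 0.3) have already been formed, since the reclustering itself is not shown. Four-vectors are plain (E, px, py, pz) tuples in GeV, and the numbers are invented to show a soft subjet being removed.

```python
import math

# Four-vectors are (E, px, py, pz) tuples in GeV.

def vsum(vectors):
    return tuple(sum(v[i] for v in vectors) for i in range(4))


def pt(p):
    return math.hypot(p[1], p[2])


def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def trim(subjets, f_cut=0.05):
    """Drop subjets with pT below f_cut times the original (untrimmed) jet pT
    and return the four-vector of the trimmed jet."""
    jet_pt = pt(vsum(subjets))
    return vsum([s for s in subjets if pt(s) / jet_pt >= f_cut])


# Two hard subjets (e.g. from H -> b b-bar) plus one soft, pile-up-like subjet.
subjets = [(100.0, 80.0, 60.0, 0.0), (100.0, 80.0, -60.0, 0.0), (5.0, 0.0, 3.0, 4.0)]
print(f"untrimmed mass: {mass(vsum(subjets)):.1f} GeV, "
      f"trimmed mass: {mass(trim(subjets)):.1f} GeV")
```

Removing the soft subjet, which carries about 2% of the jet pT, lowers the jet mass from about 128 GeV to 120 GeV.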
Leptons are vetoed in this analysis to reduce backgrounds involving leptonically decaying vector bosons. Electron candidates with pT > 7 GeV and |η| < 2.47 are identified from energy clusters in the electromagnetic calorimeter matched to a track in the inner detector. 'Medium' quality requirements, as defined in Ref. [49], are applied together with two isolation criteria: the scalar sum of the transverse momentum (transverse energy) within a radius ΔR = 0.2 around the electron candidate must be less than 15% (14%) of the electron pT (ET). Muons with pT > 7 GeV and |η| < 2.4 are reconstructed from matched tracks in the muon spectrometer and the inner detector. Quality criteria are applied, as described in Ref. [50], together with an isolation requirement: the scalar sum of the transverse momentum of tracks within a radius ΔR = 0.2 around the muon candidate must be less than 10% of the muon pT.
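The veto can be written as a per-lepton check. The sketch assumes that the isolation sums and the quality decision are precomputed, and it reads the electron criteria as a track-based pT sum compared with the electron pT and a calorimeter-based transverse-energy sum compared with its ET; the `Lepton` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Lepton:
    flavour: str          # "e" or "mu"
    pt: float             # GeV
    eta: float
    et: float             # transverse energy (used for electrons), GeV
    track_iso: float      # scalar sum of track pT within dR = 0.2, GeV
    calo_iso: float       # scalar sum of transverse energy within dR = 0.2, GeV
    passes_quality: bool  # 'medium' electron or muon quality decision


def is_veto_lepton(lep: Lepton) -> bool:
    """True if the lepton is a selected isolated e or mu that vetoes the event."""
    if not lep.passes_quality or lep.pt <= 7.0:
        return False
    if lep.flavour == "e":
        return (abs(lep.eta) < 2.47
                and lep.track_iso < 0.15 * lep.pt
                and lep.calo_iso < 0.14 * lep.et)
    if lep.flavour == "mu":
        return abs(lep.eta) < 2.4 and lep.track_iso < 0.10 * lep.pt
    return False


def passes_lepton_veto(leptons) -> bool:
    return not any(is_veto_lepton(lep) for lep in leptons)
```

A non-isolated muon, for instance one inside a b-jet, does not veto the event under these criteria, which keeps semileptonic b-decays in the sample.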

Event selection
Because of the very large hadronic background at the LHC, it is not feasible to simulate multijet events with adequate Monte Carlo statistics. The simulation of high-pT b-jets may also be subject to large uncertainties. For these reasons, for each mass pair (m(G*), m(B_H)) being tested, a data-driven technique is used to evaluate the expected background, as described in Section 6. The technique requires control regions orthogonal to the signal regions. A blind analysis is performed, in which the background is first evaluated without knowledge of the data in the signal regions. Since a large number of mass-pair hypotheses are tested, all signal-region requirements are applied when evaluating the background, except that events inside the Higgs boson candidate mass window are kept blinded.

Event preselection
Events in the signal region are first preselected according to the following criteria (see end of Section 5.2 for the signal region definition).
• They must satisfy a combination of six triggers requiring multiple jets and b-jets at various pT thresholds, where b-jets are identified by a dedicated online b-tagging algorithm. This combination of triggers is more than 99% efficient for signal events passing the offline selection, across the B_H and G* mass ranges considered in this analysis.
• They are vetoed if they contain reconstructed isolated leptons (e or µ), in order to reduce the contribution from W/Z + jets and tt̄ backgrounds.
• At least three small-R b-tagged jets must be present in the signal region.
• The invariant mass of the system composed of all selected R = 0.4 jets is required to be greater than 600 GeV.
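The preselection can be sketched as a single predicate. Here the trigger decision and the presence of an isolated lepton are taken as precomputed booleans, and jets are hypothetical (pT, η, φ, m, b-tag) tuples for the selected R = 0.4 jets.

```python
import math


def four_vector(pt, eta, phi, m=0.0):
    """(E, px, py, pz) from (pT, eta, phi, mass)."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def invariant_mass(vectors):
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def passes_preselection(jets, trigger_fired, has_isolated_lepton, m_min=600.0):
    """jets: selected R = 0.4 jets as (pt, eta, phi, mass, is_btagged)."""
    if not trigger_fired or has_isolated_lepton:
        return False
    if sum(1 for j in jets if j[4]) < 3:  # at least three b-tagged jets
        return False
    return invariant_mass([four_vector(*j[:4]) for j in jets]) > m_min


jets = [(200.0, 0.0, 0.0, 0.0, True), (200.0, 0.0, math.pi, 0.0, True),
        (150.0, 1.0, math.pi / 2, 0.0, True), (150.0, -1.0, -math.pi / 2, 0.0, False)]
print(passes_preselection(jets, trigger_fired=True, has_isolated_lepton=False))
```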
Two event topologies are considered for the signal, depending on the boost of the Higgs boson. Highly boosted Higgs bosons are reconstructed using large-R jets as described in Section 4; this topology corresponds to the merged scenario (see Section 5.2). If no large-R jet is found, an attempt is made to reconstruct the Higgs boson from two small-R jets (see Section 5.3). The acceptance times reconstruction efficiency for the combined yields of the two topologies varies from 5% to 20%, depending on the masses of the G* and B_H.

Merged selection
The signal region for the merged case consists of the following requirements.
• A large-R jet must be present with pT > 300 GeV, |η| < 2.0 and mass in the range [90, 140] GeV. The mass window was optimised for signal sensitivity. If more than one such large-R jet is present, the Higgs boson candidate is chosen to be the one with mass closest to 126 GeV. At least one b-tagged jet must be matched to it within ΔR < 1.0.
• There must be at least two additional b-tagged jets separated from the Higgs boson candidate by ΔR(H, j) > 1.4. The two with the highest pT are used to reconstruct the G* and B_H candidates.
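A sketch of the merged Higgs boson candidate selection, assuming large-R jets as (pT, η, φ, mass) tuples and b-tagged small-R jets as (pT, η, φ) tuples. The function name and data layout are illustrative, and φ differences are wrapped into [−π, π) before ΔR is computed.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
    return math.hypot(eta1 - eta2, dphi)


def merged_higgs_candidate(large_r_jets, b_jets, m_higgs=126.0):
    """large_r_jets: (pt, eta, phi, mass); b_jets: b-tagged small-R jets (pt, eta, phi).
    Returns (higgs_jet, [two leading b-jets away from it]) or None."""
    candidates = [j for j in large_r_jets
                  if j[0] > 300.0 and abs(j[1]) < 2.0 and 90.0 <= j[3] <= 140.0]
    if not candidates:
        return None
    higgs = min(candidates, key=lambda j: abs(j[3] - m_higgs))
    dr = [delta_r(higgs[1], higgs[2], b[1], b[2]) for b in b_jets]
    if not any(d < 1.0 for d in dr):  # need a b-tagged jet matched to the candidate
        return None
    away = sorted((b for b, d in zip(b_jets, dr) if d > 1.4),
                  key=lambda b: b[0], reverse=True)
    return (higgs, away[:2]) if len(away) >= 2 else None


large_r = [(450.0, 0.2, 0.0, 118.0), (320.0, -1.0, 2.0, 95.0)]
b_jets = [(200.0, 0.3, 0.3), (380.0, -0.5, 3.0), (150.0, 1.5, -2.5), (60.0, 0.0, 0.4)]
print(merged_higgs_candidate(large_r, b_jets))
```

Both large-R jets pass the kinematic cuts; the one with mass 118 GeV is chosen as closer to 126 GeV, and the 380 GeV and 150 GeV b-jets are the two leading jets separated from it.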
Once the Higgs boson candidate has been identified, there remains an ambiguity in assigning the other jets to the vector-like quark B_H. The four-momentum of the B_H candidate is reconstructed as the sum of the four-momenta of the Higgs boson candidate and either the next-to-leading-pT (category 1) or the leading-pT (category 2) b-jet away from it, depending on the assumed mass difference between G* and B_H. For a large G*-B_H mass difference, the B_H and the b-quark from the G* decay have high momenta, so the b-jet from the subsequent B_H decay is likely to be the next-to-leading jet. For a small mass difference the opposite holds, since the B_H decay products are then more boosted than the G* decay products. For each (m(G*), m(B_H)) pair, the category with the higher probability of forming the correct pairing is chosen, based on simulated signal events. Finally, the G* four-momentum is reconstructed as the sum of the four-momenta of the Higgs boson jet and the two leading-pT b-jets not matched to the Higgs boson candidate.
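The two pairing categories differ only in which b-jet is added to the Higgs boson candidate, while the G* candidate is the same in both. A minimal sketch, assuming massless b-jets and illustrative (pT, η, φ) inputs sorted by decreasing pT:

```python
import math


def four_vector(pt, eta, phi, m=0.0):
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def mass(*vectors):
    """Invariant mass of the sum of the given (E, px, py, pz) vectors."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def reconstruct_masses(higgs, b_away, category):
    """higgs: (pt, eta, phi, m) of the Higgs candidate; b_away: the two leading
    b-jets away from it, as (pt, eta, phi) sorted by decreasing pT.
    Category 1 pairs the Higgs with the next-to-leading jet, category 2 with
    the leading one. Returns (m_BH, m_Gstar)."""
    h = four_vector(*higgs)
    lead, sublead = (four_vector(*b) for b in b_away)
    b_from_bh = sublead if category == 1 else lead
    return mass(h, b_from_bh), mass(h, lead, sublead)


higgs = (300.0, 0.0, 0.0, 126.0)
b_away = [(400.0, 0.0, math.pi), (100.0, 0.0, math.pi / 2)]
for cat in (1, 2):
    m_bh, m_g = reconstruct_masses(higgs, b_away, cat)
    print(f"category {cat}: m(B_H) = {m_bh:.1f} GeV, m(G*) = {m_g:.1f} GeV")
```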
Different signal regions are defined for the different (m(G*), m(B_H)) hypotheses. They are characterised by the choice of category defined above and by lower thresholds on the reconstructed masses of the G* and B_H candidates. Five inclusive signal regions are defined, with the minimum mass of the G* candidate ranging from 0.8 to 1.8 TeV and that of the B_H candidate from 0.5 to 1 TeV; these are shown in Table 1.
No upper cut on the resonance masses is applied, since the multijet background distribution falls rapidly and the resonance widths become larger at high masses. For each mass pair considered, the signal region giving the maximum signal sensitivity, defined as the ratio s/√b of the expected number of signal events s to the square root of the expected number of background events b, is chosen.

Table 1: Signal region definitions: category 1 (2) refers to the case where the next-to-leading-pT (leading-pT) jet not associated with the Higgs boson is assumed to be from the B_H decay.
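The choice of signal region for a given mass pair amounts to an argmax of s/√b. The sketch assumes the expected signal and background yields per region are already known and that b > 0; the yields shown are invented.

```python
import math


def best_signal_region(expected):
    """expected: {region: (s, b)} with expected signal and background yields
    (b > 0) for one (m(G*), m(B_H)) hypothesis.
    Returns the region that maximises s / sqrt(b)."""
    def sensitivity(region):
        s, b = expected[region]
        return s / math.sqrt(b)
    return max(expected, key=sensitivity)


# Invented yields for illustration only.
yields = {"SR1": (12.0, 400.0), "SR2": (8.0, 100.0), "SR3": (3.0, 9.0)}
print(best_signal_region(yields))
```

Here the region with the smallest signal yield wins (s/√b = 0.6, 0.8 and 1.0), because its background is much lower.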

Resolved selection
Events in the resolved signal region are required to satisfy the following criteria.
• To allow a later combination with the merged channel, events are required to fail the merged selection criteria.
• Events are required to have exactly four small-R jets with pT > 50 GeV and |η| < 2.5, with at least three of these jets b-tagged. The Higgs boson candidate is reconstructed from the pair of jets whose invariant mass is closest to 126 GeV. This invariant mass is required to lie in the interval [90, 140] GeV, and the transverse momentum of the dijet system must satisfy pT(jj) > 200 GeV.
The four-momentum of the B_H candidate is reconstructed from the sum of the four-momenta of the Higgs boson candidate and either the leading-pT or the next-to-leading-pT jet away from the Higgs boson jets, depending on the G*-B_H mass splitting. As in the merged case, for each mass pair considered the category with the lower jet mis-assignment rate is chosen, based on samples of simulated signal events. Inclusive signal regions are defined by the same lower mass thresholds as in the merged case, shown in Table 1. Each mass pair is assigned to the same signal region in the merged and resolved analyses.
The four-momentum of the G* candidate is reconstructed from the sum of the four-momenta of the four jets in the event.
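A sketch of the resolved reconstruction, assuming exactly four massless jets given as (pT, η, φ) and that the b-tagging requirement has already been checked. The function name and the constructed example, in which two jets are placed so that their invariant mass is exactly 126 GeV, are illustrative.

```python
import math
from itertools import combinations


def four_vector(pt, eta, phi, m=0.0):
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return (math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def vsum(*vectors):
    return tuple(sum(v[i] for v in vectors) for i in range(4))


def mass(p):
    return math.sqrt(max(p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2, 0.0))


def resolved_masses(jets, category, m_higgs=126.0):
    """jets: exactly four small-R jets as (pt, eta, phi).
    Returns (m_H, m_BH, m_Gstar), or None if the Higgs requirements fail."""
    vecs = [four_vector(*j) for j in jets]
    i, k = min(combinations(range(4), 2),
               key=lambda ij: abs(mass(vsum(vecs[ij[0]], vecs[ij[1]])) - m_higgs))
    higgs = vsum(vecs[i], vecs[k])
    if not (90.0 <= mass(higgs) <= 140.0) or math.hypot(higgs[1], higgs[2]) <= 200.0:
        return None
    others = sorted((n for n in range(4) if n not in (i, k)),
                    key=lambda n: jets[n][0], reverse=True)
    b_from_bh = vecs[others[1]] if category == 1 else vecs[others[0]]
    return mass(higgs), mass(vsum(higgs, b_from_bh)), mass(vsum(*vecs))


half = 0.5 * math.acos(1.0 - 126.0 ** 2 / (2 * 150.0 * 150.0))  # m_jj = 126 GeV
jets = [(150.0, 0.0, half), (150.0, 0.0, -half), (300.0, 0.5, math.pi), (100.0, -0.5, 2.0)]
print(resolved_masses(jets, category=1))
```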

Modelling of the multijet background
The 'ABCD' data-driven method is used to estimate the multijet background. For each of the ten signal regions (five merged and five resolved), three control regions orthogonal to the signal region are defined. Region B has all the signal-region selection criteria of Section 5 applied, including the lepton veto and the lower cuts on the masses of the B_H and G* candidates, but the Higgs boson candidate mass is required to be outside the interval [90, 140] GeV. Region C has all the signal-region requirements, but exactly two jets must be b-tagged. Region D has the Higgs boson candidate mass outside the Higgs boson mass window and exactly two b-tagged jets. In regions C and D, only one of the two jets not associated with the Higgs boson candidate is b-tagged. The number of multijet (MJ) events expected in the signal region (SR) is then evaluated as

$$N_{\mathrm{MJ}}^{\mathrm{SR}} = \frac{N_{B}\, N_{C}}{N_{D}}, \tag{1}$$

where N_X is the number of events in region X after subtracting the top-quark, diboson and other electroweak background contributions determined from MC simulation.
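The multijet estimate of Eq. (1), with a simple statistical uncertainty, can be sketched as follows. The sketch assumes Poisson uncertainties on the data counts only (the subtracted simulation is treated as exact), and the counts are invented.

```python
import math

REGIONS = ("B", "C", "D")


def abcd_estimate(data, simulated_bkg):
    """Multijet yield in the signal region, N_MJ = N_B * N_C / N_D.

    data, simulated_bkg: dicts of event counts in control regions B, C and D;
    the simulated non-multijet backgrounds are subtracted first.
    Returns (estimate, statistical uncertainty)."""
    n = {r: data[r] - simulated_bkg[r] for r in REGIONS}
    estimate = n["B"] * n["C"] / n["D"]
    # Poisson uncertainty sqrt(N_data) per region, combined in quadrature as
    # relative uncertainties; the MC subtraction is treated as exact here.
    rel = math.sqrt(sum(data[r] / n[r] ** 2 for r in REGIONS))
    return estimate, estimate * rel


# Invented counts for illustration only.
data = {"B": 2000, "C": 300, "D": 6000}
sim = {"B": 100.0, "C": 20.0, "D": 200.0}
est, err = abcd_estimate(data, sim)
print(f"N_MJ(SR) = {est:.1f} +- {err:.1f}")
```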
This estimate assumes that the two variables used to define the regions, the Higgs boson candidate mass and the number of b-tagged jets, are uncorrelated for multijet events, so that the choice of control regions introduces no bias. To evaluate, and if needed correct for, any residual bias, a re-weighting is performed on two kinematic distributions: the leading-jet pT and the ΔR between the reconstructed Higgs boson candidate and the leading jet not associated with it. Control regions C and D (B and D) are re-weighted, using a method similar to Ref. [51], to have the same shape as control region B (C), with per-bin weights N_B/N_D (N_C/N_D). The effect of this re-weighting is found to be negligible, and therefore no correction is applied.
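The re-weighting check can be illustrated with histograms stored as lists of bin counts. Applying per-bin weights N_B,i/N_D,i to both C and D maps D onto B, so the re-weighted estimate reduces to Σ_i N_B,i N_C,i / N_D,i; it coincides with the inclusive estimate whenever the C/D ratio is the same in every bin. This is a generic sketch, not the implementation of Ref. [51], and the numbers are invented.

```python
def binned_abcd(n_b, n_c, n_d):
    """Bin-by-bin ABCD, sum_i N_B,i * N_C,i / N_D,i, over bins of a kinematic
    variable (e.g. leading-jet pT). Equivalent to re-weighting C and D with
    per-bin weights N_B,i / N_D,i and then applying N_B * N_C' / N_D'."""
    return sum(b * c / d for b, c, d in zip(n_b, n_c, n_d) if d > 0)


def inclusive_abcd(n_b, n_c, n_d):
    return sum(n_b) * sum(n_c) / sum(n_d)


n_b, n_d = [100.0, 50.0, 20.0], [1000.0, 500.0, 100.0]
factorised_c = [30.0, 15.0, 3.0]   # C/D = 0.03 in every bin
correlated_c = [30.0, 15.0, 9.0]   # C/D rises in the last bin

for n_c in (factorised_c, correlated_c):
    print(f"binned: {binned_abcd(n_b, n_c, n_d):.3f}, "
          f"inclusive: {inclusive_abcd(n_b, n_c, n_d):.3f}")
```

In the factorised case both estimates agree; in the correlated case they differ by about 10%, which is the kind of bias the re-weighting is designed to reveal.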
A validation region is defined for each signal region as the 15 GeV sidebands outside the Higgs boson candidate mass window, i.e. 75-90 GeV and 140-155 GeV. The multijet contribution is estimated as above, but with control regions B and D excluding these sidebands and with region C now defined by the two sidebands with exactly two b-tagged jets. The estimate, after adding back the simulation-based background, is then compared with the number of observed data events in these regions. Table 2 shows that the expected and observed numbers of events agree well in the validation regions for both the merged- and resolved-channel signal regions.

Systematic uncertainties
Systematic uncertainties from several sources affect the expected numbers of background and signal events.Table 3 shows the estimated size of the different components.
The statistical uncertainty in the data control regions used for the estimation of the multijet background is considered as part of the statistical error.
There is an uncertainty in the number of background events due to the difference between the observed and estimated numbers of events in each validation region. If, in a given validation region, the observed number of events is compatible with the estimated number within one standard deviation (calculated as the sum in quadrature of the relative statistical uncertainties of the two), this standard deviation is taken as the background estimation uncertainty. Otherwise, the background uncertainty is taken to be the fractional difference between the observed and estimated numbers of events. This is the largest uncertainty, ranging from 5% in SR1 to 27% in SR5 for the merged case, and from 3.5% in SR1 to 16% in SR5 for the resolved case.
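The rule above can be written as a small function. The sketch assumes a Poisson uncertainty √N for the observed count and expresses the fractional difference relative to the estimate, which the text does not specify; argument names are hypothetical and the yields are invented.

```python
import math


def vr_uncertainty(observed, expected, expected_stat):
    """Relative background-estimation uncertainty from one validation region.

    observed: data count; expected: estimated total background;
    expected_stat: absolute statistical uncertainty of the estimate."""
    sigma = math.hypot(math.sqrt(observed) / observed, expected_stat / expected)
    fractional_diff = abs(observed - expected) / expected
    # Compatible within one standard deviation: quote sigma; otherwise the difference.
    return sigma if fractional_diff <= sigma else fractional_diff


print(f"{vr_uncertainty(400, 390.0, 10.0):.3f}")  # compatible case
print(f"{vr_uncertainty(100, 150.0, 5.0):.3f}")   # incompatible case
```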
The tt̄ contribution dominates the simulation-based background. The theoretical uncertainty in its cross section is taken to be 6%, as discussed in Section 3.
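The 6% figure is consistent with the larger relative side of the 253 +13/−15 pb tt̄ cross section quoted in Section 3. The quick check below assumes symmetrisation by taking the larger side, a choice the text does not state explicitly.

```python
XSEC_PB, UP_PB, DOWN_PB = 253.0, 13.0, 15.0  # tt-bar cross section, +13 / -15 pb

rel_up = UP_PB / XSEC_PB      # upward relative variation
rel_down = DOWN_PB / XSEC_PB  # downward relative variation
symmetrised = max(rel_up, rel_down)  # assumption: take the larger side
print(f"+{rel_up:.1%} / -{rel_down:.1%} -> {symmetrised:.0%}")
```

The relative variations are about +5.1% and −5.9%, which round to the quoted 6%.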
Uncertainties due to the calibration and modelling of the detector affecting the simulation-based background estimates in the control and signal regions are principally due to the jet energy scale (JES) and jet energy resolution (JER).JES uncertainties for small-R jets include contributions from detector reconstruction and from different physics modelling and evaluation methods [52].Uncertainties leading to a higher (lower) yield than the nominal value are added in quadrature to the total JES up (down) uncertainty.
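The sign-based quadrature sum can be sketched as follows, assuming each JES component has already been propagated to a varied yield; the function name and numbers are illustrative.

```python
import math


def jes_total(nominal, varied_yields):
    """Combine per-component JES variations into total up/down uncertainties.
    Variations that raise the yield enter the 'up' sum and those that lower it
    enter the 'down' sum; each sum is taken in quadrature."""
    up_sq = down_sq = 0.0
    for y in varied_yields:
        delta = y - nominal
        if delta > 0:
            up_sq += delta ** 2
        else:
            down_sq += delta ** 2
    return math.sqrt(up_sq), math.sqrt(down_sq)


up, down = jes_total(100.0, [103.0, 96.0, 104.0, 99.0])  # invented yields
print(f"+{up:.2f} / -{down:.2f}")
```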
To evaluate the impact of JER for small-R jets, energies of simulated jets are smeared to be consistent with the JER measured in data.The JER systematic uncertainty is the difference between the nominal and smeared values.
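A common way to implement such smearing is to multiply each simulated jet pT by a Gaussian random factor with mean 1 and width √(σ_data² − σ_MC²), where σ denotes the relative resolution. The sketch below assumes that form and uses a simple count of jets above threshold as the observable; it is an illustration, not the ATLAS implementation, and the function names are hypothetical.

```python
import math
import random


def smear_pt(pt, sigma_data, sigma_mc, rng):
    """Smear a simulated jet pT so that its relative resolution matches data.
    sigma_data, sigma_mc: relative resolutions sigma(pT)/pT; the extra Gaussian
    width sqrt(sigma_data^2 - sigma_mc^2) is applied (zero if MC is worse)."""
    extra = math.sqrt(max(sigma_data ** 2 - sigma_mc ** 2, 0.0))
    return pt * rng.gauss(1.0, extra)


def jer_uncertainty(jet_pts, sigma_data, sigma_mc, pt_min=50.0, seed=1):
    """Change in the number of jets above pt_min caused by the smearing."""
    rng = random.Random(seed)
    nominal = sum(pt > pt_min for pt in jet_pts)
    smeared = sum(smear_pt(pt, sigma_data, sigma_mc, rng) > pt_min for pt in jet_pts)
    return smeared - nominal


print(jer_uncertainty([48.0, 52.0, 60.0, 120.0] * 1000, sigma_data=0.12, sigma_mc=0.10))
```

Only jets near the 50 GeV threshold can migrate across it, so the size of the effect depends on how populated that region is.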
JES uncertainties for large-R jets in the central region are evaluated as described in Ref. [47]. The jet mass scale (JMS) uncertainty is 4-5% for pT ≲ 700 GeV and increases linearly with pT to about 8% in the range 900 ≲ pT ≲ 1000 GeV.
The total uncertainty in the measured b-tagging efficiency was evaluated in Ref. [53] and depends on pT and η. For high-pT jets, the systematic uncertainty is derived from simulation. Its impact on the simulation-based backgrounds is estimated here, accounting for the statistical uncertainty, the uncertainty in the generator-dependent scale factors, the track momentum scale, resolution and efficiency uncertainties, and the extrapolation uncertainties for light jets. It is at or below the percent level, well below the background estimation uncertainty, which always dominates.
The predicted signal is not confined to the signal region: it could also constitute a fraction of the observed data in the control regions.The effect of this potential contamination on the statistical procedure is described in Section 8.
Systematic uncertainties due to detector effects also affect the VLQ signal yields. They are dominated by the b-tagging uncertainties, which range from 16% to 40% depending on pT, while the other sources of systematic uncertainty listed above are below 5%. Theoretical uncertainties in the signal cross section due to the choice of PDFs are estimated from CTEQ6.6 with its 22 eigenvectors [29].

Table 3: Systematic and statistical uncertainties in the total background in each of the signal regions for the merged and resolved analyses. The background estimation uncertainties have been scaled by the ratio of the multijet contribution to the total background estimate in order to obtain the relative uncertainty in the total background.
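For Hessian PDF sets such as CTEQ6.6, a standard symmetric prescription combines the paired eigenvector variations of the signal cross section as ΔX = ½ √(Σ_i (X_i⁺ − X_i⁻)²), summed over the 22 eigenvectors. The sketch assumes this prescription, which the text does not spell out, and uses invented cross-section values.

```python
import math


def hessian_pdf_uncertainty(plus, minus):
    """Symmetric Hessian PDF uncertainty:
    dX = 0.5 * sqrt(sum_i (X_i+ - X_i-)^2) over eigenvector pairs."""
    if len(plus) != len(minus):
        raise ValueError("need one '+' and one '-' variation per eigenvector")
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(plus, minus)))


# Invented signal cross sections (fb) for the 22 eigenvector pairs.
plus = [10.2] * 22
minus = [9.8] * 22
print(f"PDF uncertainty: {hessian_pdf_uncertainty(plus, minus):.3f} fb")
```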

Results
After applying all selection criteria in the signal regions, the multijet background in the Higgs boson candidate mass window is evaluated according to Eq. (1). Mass distributions of reconstructed Higgs boson candidates are shown in Fig. 2 for the merged and resolved cases in SR3. The observed data and the background predictions are consistent within the statistical and systematic uncertainties.
For each pair of mass points considered, the expected signal yield, based on the benchmark model, is evaluated in the corresponding signal region. These yields are given by σ × (A × ε), where σ is the cross section including all the branching fractions and (A × ε) is the acceptance times reconstruction efficiency of the signal selection. The amount of contamination, defined as the expected ratio of the number of signal events in control region B, C or D to that in the signal region, is also estimated.

Table 4 shows the expected background and observed event yields in each of the signal regions for the merged and resolved cases. No significant excess of data events is found compared with the expected SM background. Taking into account the number of expected background events in each signal region and the signal yield for each tested mass pair, together with all statistical and systematic uncertainties, upper limits at the 95% confidence level (CL) are set on the cross section times branching fraction of the signal, combining the results of the merged and resolved analyses and using the CLs prescription [54] and RooStats [55]. To account for possible contamination of the control regions by signal, an iterative procedure is used: a 95% CL limit is first obtained assuming no contamination in the control regions. The contamination in regions B, C and D is then calculated, assuming a signal corresponding to that limit, and the multijet background is re-evaluated. The procedure is repeated until the limit converges to a stable value.
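The iterative treatment of signal contamination can be sketched for a single-bin counting experiment. The sketch replaces the RooStats machinery with a hand-written CLs limit that ignores systematic uncertainties, expresses contamination as expected signal in B, C and D per signal event in the signal region, and uses invented counts; all function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson mean mu (adequate for the modest yields used here)."""
    term = total = math.exp(-mu)
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def cls_upper_limit(n_obs, background, cl=0.95):
    """CLs upper limit on the signal yield for one counting experiment,
    ignoring systematic uncertainties (a deliberate simplification)."""
    alpha = 1.0 - cl
    cl_b = poisson_cdf(n_obs, background)
    lo, hi = 0.0, 10.0 * (n_obs + background + 10.0)
    for _ in range(200):  # bisection: CLs(s) decreases monotonically with s
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) / cl_b > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limit_with_contamination(n_obs, control, other_bkg, contamination,
                             tol=1e-6, max_iter=100):
    """Iterate: limit -> signal in B, C, D -> new multijet estimate -> new limit.

    control: data in B, C, D after subtracting simulated backgrounds;
    contamination: expected signal in B, C, D per signal event in the SR."""
    s = 0.0
    for _ in range(max_iter):
        n = {r: control[r] - contamination[r] * s for r in ("B", "C", "D")}
        multijet = n["B"] * n["C"] / n["D"]
        s_new = cls_upper_limit(n_obs, multijet + other_bkg)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    raise RuntimeError("iteration did not converge")


control = {"B": 2000.0, "C": 300.0, "D": 6000.0}  # invented counts
no_contam = {"B": 0.0, "C": 0.0, "D": 0.0}
contam = {"B": 0.5, "C": 0.2, "D": 0.1}
print(limit_with_contamination(110, control, 5.0, no_contam))
print(limit_with_contamination(110, control, 5.0, contam))
```

Without contamination the loop stops after the second pass, reproducing the plain limit; with small contamination it settles on a self-consistent limit within a few iterations.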

Expected and observed limits on the cross section σ(pp → G* → B_H b) × BR(B_H → Hb) × BR(H → b b̄), where σ(pp → G* → B_H b) represents the cross section of the process pp → G* → B_H b and its charge conjugate, are shown in Table 5, together with the theoretical cross section for the benchmark model and its theoretical uncertainty. Limits for the particular cases where m(B_H) = m(G*)/2 and m(B_H) = m(G*) − 250 GeV are shown in Figs. 3 and 4.

Conclusion
A search for a heavy gluon and a charge −1/3 vector-like quark in the process pp → G* → B_H b, with B_H → bH and H → b b̄, has been performed using an integrated luminosity of 19.5 fb⁻¹ of pp collision data recorded at √s = 8 TeV with the ATLAS detector at the LHC. The main background, multijet production, is estimated with a data-driven technique. Five signal regions are defined based on the choice of jet assignment to the B_H candidate and on lower mass requirements for the reconstructed G* and B_H. No significant excess over the SM predictions is observed, and upper limits are set at the 95% confidence level on the total cross section times branching ratio in the two-dimensional plane of m(G*) vs. m(B_H).

Table 1, minimum reconstructed (m(G*), m(B_H)) values [TeV] for the five signal regions: (1.0, 0.5), (1.3, 0.5), (0.8, 0.5), (1.5, 0.5), (1.8, 1.0).

Figure 2: Observed (black points) and expected (red band) distributions of the reconstructed Higgs boson candidate mass in signal region 3 for the (a) merged and (b) resolved cases. The normalisation of region C is applied as an overall factor, not bin by bin, for the Higgs boson candidate mass window. The red bands represent the systematic uncertainty in the expected background. The distribution from a signal with m(G*) = 1 TeV and m(B_H) = 0.75 TeV is also shown, for the parameters listed in Section 3. The lower panels show the ratio of the observed number of events in data to the expected background.

Figure 3: Observed (solid) and expected (dashed) 95% CL upper limits on the cross section σ(pp → G* → B_H b) × BR(B_H → Hb) × BR(H → b b̄) for VLQ mass points with m(B_H) = m(G*)/2, from the combined merged and resolved analyses, together with the theoretical prediction based on the parameters given in Section 3. The uncertainty band around the theoretical cross section reflects the uncertainty in the CTEQ6.6 PDFs.

Table 2: Expected and observed numbers of events in the validation regions (VR) associated with their respective signal regions for the merged and resolved channels. Only the statistical uncertainty is shown.

Table 4: Observed data and background yields in the different signal regions for the merged and resolved cases. The first uncertainty is statistical and the second is systematic, while for individual background contributions only the statistical uncertainty is shown. Statistical uncertainties in the numbers of data events in the control regions used to estimate the multijet background are included in the total statistical uncertainty. The row tt̄/top includes tt̄, single-top and tt̄ + V/H backgrounds, while W/Z + jets includes leptonic and hadronic decays of the vector boson.

Table 5: Combined limits, in fb, on σ(pp → G* → B_H b) × BR(B_H → Hb) × BR(H → b b̄). The first and second entries in each cell give the expected and observed limits, respectively. The third entry gives the cross section in fb predicted by the benchmark model. Red cells are excluded and green cells are not excluded at 95% CL. Cases where m(G*) > 2m(B_H) are not considered in this analysis and are marked in yellow.