A search for resonances decaying into a Higgs boson and a new particle X in the XH → qqbb final state with the ATLAS detector

A search for heavy resonances decaying into a Higgs boson (H) and a new particle (X) is reported, utilizing 36.1 fb⁻¹ of proton–proton collision data at √s = 13 TeV collected during 2015 and 2016 with the ATLAS detector at the CERN Large Hadron Collider. The particle X is assumed to decay to a pair of light quarks, and the fully hadronic final state XH → qq̄′bb̄ is analysed. The search considers the regime of high XH resonance masses, where the X and H bosons are both highly Lorentz-boosted and are each reconstructed using a single jet with large radius parameter. A two-dimensional phase space of XH mass versus X mass is scanned for evidence of a signal, over a range of XH resonance mass values between 1 TeV and 4 TeV, and for X particles with masses from 50 GeV to 1000 GeV. All search results are consistent with the expectations for the background due to Standard Model processes, and 95% CL upper limits are set, as a function of XH and X masses, on the production cross-section of the XH → qq̄′bb̄ resonance. © 2018

The track-based mass term in the combined jet mass is the jet mass estimated via tracks with p_T > 0.4 GeV associated with the large-R jet using "ghost association" [39]; it is scaled in the combined mass formula by the ratio of calorimeter to track p_T estimates in order to account for the missing neutral-particle component in the track-jet. The ghost association technique relies on repeating the jet clustering process with the addition of measured tracks that have the same direction but infinitesimally small p_T, so that the jet properties remain unaffected. A track is associated with a jet if it is contained in the jet after this re-clustering procedure. The weighting factors w_calo and w_track depend on p_T, satisfy w_calo + w_track = 1, and are used to optimize the combined mass resolution. To partially account for the energy carried by muons from semileptonic b-hadron decays, the four-momentum of the closest muon candidate is added to the calorimeter jet four-momentum [47]; the muon must satisfy "Tight" muon identification criteria [46], have p_T > 4 GeV and |η| < 2.5, and lie within ∆R = 0.2 of a b-tagged track-jet. In this study, only large-R jets with m_J > 50 GeV are considered for further analysis.
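The description of the combined mass can be written as a short sketch. It assumes that the combined mass is a linear combination of the calorimeter-based and track-based estimates, with the track-based term scaled by the calorimeter-to-track p_T ratio as stated above. The function and argument names are illustrative, and the weight is taken as an input because its p_T dependence is not given here.

```python
def combined_jet_mass(m_calo, m_track, pt_calo, pt_track, w_calo):
    """Combine calorimeter and track-based mass estimates of a large-R jet.

    Masses and momenta are in GeV. `w_calo` is the calorimeter weight; the
    track weight follows from the constraint w_calo + w_track = 1.
    Assumption: the combination is linear in the two mass estimates.
    """
    if not 0.0 <= w_calo <= 1.0:
        raise ValueError("w_calo must lie in [0, 1]")
    if pt_track <= 0.0:
        raise ValueError("pt_track must be positive")
    w_track = 1.0 - w_calo
    # Scale the track mass by pT(calo)/pT(track) to account for the
    # neutral-particle component that tracks do not measure.
    m_track_scaled = m_track * pt_calo / pt_track
    return w_calo * m_calo + w_track * m_track_scaled


def passes_mass_preselection(m_j, threshold_gev=50.0):
    """Large-R jets are kept only if m_J exceeds 50 GeV."""
    return m_j > threshold_gev
```

With equal weights, a calorimeter mass of 120 GeV and a track mass of 80 GeV scaled by a p_T ratio of 1.5 give a combined mass of 120 GeV.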
Identifying a large-R jet as a hadronically decaying Higgs boson candidate is aided by using track-jets matched via ghost association to the large-R jet [48]. The identification of b-hadrons relies on a multivariate b-tagging algorithm [49] applied to a set of tracks in a region of interest around each track-jet axis. The b-tagging requirements result in an efficiency of 77% for track-jets containing b-hadrons, and the misidentification rate is ∼2% (∼24%) for light-flavour (charm) jets. These values were determined in a sample of simulated tt̄ events. For simulated samples the b-tagging efficiencies are corrected, based on the jet p_T, to match those measured in data [50].
Identifying a large-R jet as a hadronically decaying X → qq̄ boson candidate is aided by using jet substructure techniques. The variable D_2 is defined as a ratio of two- and three-point energy correlation functions [51,52], which are based on the energies and pairwise angular distances of particles within a jet. This variable is optimized [53] to distinguish between jets originating from a single parton and those from the two-body decay of a heavy particle. A detailed description of the optimization, performed using simulated V decays, can be found in Refs. [54,55]. Studies of boosted W boson decays [54] demonstrate that the D_2 variable is well modelled, with good agreement observed between data and simulation.
"Loose" electrons and muons are reconstructed and identified as described in Refs. [46,56]. These leptons have p_T > 7 GeV, |η| < 2.5 (2.47) for muons (electrons), |d_0|/σ(d_0) < 3 (5) and |z_0 sin θ| < 0.5 mm, where d_0 is the transverse impact parameter with respect to the beam line, σ(d_0) is the corresponding uncertainty, and z_0 is the distance between the longitudinal position of the track along the beam line at the point where d_0 is measured and the longitudinal position of the primary vertex. An isolation criterion is also applied. Specifically, within a cone of size ∆R = 0.2 (0.3) around an electron (muon), the scalar sum of transverse momenta of tracks divided by the lepton p_T is required to be less than a cut-off value, chosen to provide a constant efficiency of around 99% as a function of p_T and |η|.

Event selection
Events are selected from the 2015 (2016) running period using a trigger that requires a single large-R jet with p_T > 360 (420) GeV. Selected events are required to have at least one primary vertex (PV) that has at least two associated tracks, each with transverse momentum p_T > 0.4 GeV. For events with more than one PV candidate, the one with the largest sum of squared track transverse momenta (Σ p_T²) is chosen as the hard-scatter PV. To select a hadronic final state, events with one or more "Loose" charged leptons (electrons or muons) are vetoed.
Events are kept only if they contain at least two large-R jets. For events with more than two large-R jets, only the two "leading" jets, namely those with the highest p_T, are considered as possible X and H candidates. To ensure high and uniform trigger efficiency, the leading large-R jet must satisfy p_T > 450 GeV.
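The preselection described so far can be sketched as a single function. This is a minimal illustration, assuming that the trigger decision has already been applied upstream and that the lepton count and primary-vertex flag are precomputed; the `Event` and `LargeRJet` containers are hypothetical and not part of any ATLAS software.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LEADING_JET_PT_MIN_GEV = 450.0   # offline requirement on the leading large-R jet
LARGE_R_JET_MASS_MIN_GEV = 50.0  # m_J requirement applied to all large-R jets


@dataclass
class LargeRJet:
    pt: float    # transverse momentum in GeV
    mass: float  # combined jet mass in GeV


@dataclass
class Event:
    n_loose_leptons: int
    has_primary_vertex: bool
    large_r_jets: List[LargeRJet]


def select_candidate_jets(event: Event) -> Optional[Tuple[LargeRJet, LargeRJet]]:
    """Return the two leading large-R jets, or None if the event is rejected."""
    if not event.has_primary_vertex:
        return None
    if event.n_loose_leptons > 0:  # veto events with "Loose" electrons or muons
        return None
    jets = sorted(
        (j for j in event.large_r_jets if j.mass > LARGE_R_JET_MASS_MIN_GEV),
        key=lambda j: j.pt,
        reverse=True,
    )
    if len(jets) < 2:
        return None
    if jets[0].pt <= LEADING_JET_PT_MIN_GEV:
        return None
    return jets[0], jets[1]
```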
Additional requirements are imposed to distinguish hadronically decaying boson candidates from background jets. To be identified as a Higgs jet candidate, the combined mass of the jet is required to satisfy 75 GeV < m_J < 145 GeV, a criterion that is ∼90% efficient for Higgs boson jets. The number of b-tagged jets inside the Higgs jet is used to categorize the events in two orthogonal signal regions: the "1-tag" signal region (SR1) and the "2-tag" signal region (SR2) require Higgs candidates to contain exactly one and exactly two b-tagged jets, respectively. The SR1 region recovers some of the efficiency lost at high masses of the XH resonance (denoted Y in the following), where the angular separation between the two b-quarks from the Higgs decay is small and the fragmentation products are merged [47]. As depicted in Fig. 1 and described in Section 7, a number of different samples of events are selected to be used as part of the background estimation strategy. These samples include events in which the Higgs jet candidate has a mass in the so-called "high sideband" (HSB) region with 145 GeV < m_J < 200 GeV, as well as events in a "low sideband" (LSB) region with 50 GeV < m_J < 75 GeV. In addition, "0-tag" samples, with Higgs candidates that pass these various mass requirements but have no associated b-tagged jets, are also selected. The 0-tag sideband samples are denoted by LSB0 and HSB0. The 0-tag sample satisfying the SR mass cut is denoted by CR0 to emphasize its use as a control region in the background determination (see Section 7), rather than as an additional signal region.

Figure 1: Illustration of the definitions of the various mutually exclusive data regions. The data are divided into Low Sidebands (LSB), Signal Regions (SR), and High Sidebands (HSB) according to the mass of the candidate Higgs jet. Each region is subdivided further according to the number of b-tagged jets associated with the candidate Higgs jet.
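The nine mutually exclusive regions can be summarized as a small classification function. This is a sketch under two assumptions not stated in the text: a jet mass exactly on a boundary is assigned to the higher region, and Higgs candidates with more than two b-tagged jets are not assigned to any region. The function names are illustrative.

```python
def mass_region(m_j):
    """Classify the Higgs candidate by its combined mass (GeV)."""
    if 50.0 < m_j < 75.0:
        return "LSB"
    if 75.0 <= m_j < 145.0:   # boundary handling is an assumption
        return "SR"
    if 145.0 <= m_j < 200.0:  # boundary handling is an assumption
        return "HSB"
    return None


def region_name(m_j, n_btags):
    """Return e.g. 'SR1', 'HSB0' or 'CR0', or None if outside all regions."""
    region = mass_region(m_j)
    if region is None or n_btags not in (0, 1, 2):
        return None
    if region == "SR" and n_btags == 0:
        return "CR0"  # the 0-tag sample in the SR mass window is a control region
    return f"{region}{n_btags}"
```

For example, `region_name(125.0, 2)` gives `"SR2"` and `region_name(125.0, 0)` gives `"CR0"`.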
If both large-R jets fulfil the Higgs mass requirement, the jet with the higher number of associated b-tagged jets is chosen as the Higgs candidate. If both have the same number of b-tagged jets, the ambiguity is resolved by assigning the leading-p_T large-R jet as the Higgs candidate. Studies of signal samples with m(X) = 110 GeV demonstrate that, in the lower m(Y) region, the probability to correctly identify the H and X jets is 98.8%. This value decreases to 89.5% for m(Y) = 4 TeV, due to the falling efficiency in the very high mass region to tag two separate b-tagged jets within the H jet.
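The assignment rule can be illustrated as follows. The sketch covers only the Higgs mass window of the signal regions; how the candidate is chosen when only sideband masses are available is not specified here, so the function returns None in that case. The `Jet` container and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HIGGS_WINDOW_GEV = (75.0, 145.0)


@dataclass
class Jet:
    pt: float     # GeV
    mass: float   # combined jet mass in GeV
    n_btags: int  # number of b-tagged track-jets associated with the jet


def in_higgs_window(jet: Jet) -> bool:
    low, high = HIGGS_WINDOW_GEV
    return low < jet.mass < high


def assign_candidates(leading: Jet, subleading: Jet) -> Optional[Tuple[Jet, Jet]]:
    """Return (higgs_candidate, x_candidate) for the two leading jets."""
    lead_ok = in_higgs_window(leading)
    sub_ok = in_higgs_window(subleading)
    if lead_ok and sub_ok:
        # More b-tags wins; a tie is resolved in favour of the leading-pT jet.
        if subleading.n_btags > leading.n_btags:
            return subleading, leading
        return leading, subleading
    if lead_ok:
        return leading, subleading
    if sub_ok:
        return subleading, leading
    return None  # neither jet is in the Higgs window (not covered by this sketch)
```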
The large-R jet which is not chosen as the Higgs candidate is assigned to the X hypothesis. The jet substructure variable D_2, described previously, is used to check whether the X candidate jet is compatible with the two-prong structure expected due to its assumed X → qq̄ decay. The requirement on the value of D_2 corresponds to that defined in Refs. [54, 55] to provide a constant efficiency of 50% for selecting hadronic V decays, when applied along with an associated requirement that m_J lies within a window around the V mass. Given that the X mass is a priori unknown, the event selection does not make any requirement on the mass of the X candidate jet beyond the m_J > 50 GeV restriction applied to all large-R jets. The data are, however, interpreted in windows of m_X.
The mass of the Y → XH candidate is required to be larger than 1 TeV. The efficiencies for the 1-tag and 2-tag signal region selections are presented as a function of m(Y) and m(X) in Fig. 2 for the various signal samples. No signal samples with m(Y) > 4 TeV were simulated, due to the low expected sensitivity for such high masses with the current data sample. However, data events with higher reconstructed Y mass values are included in the overflow bin in the mass distributions.

Signal modelling
The signal search in the two-dimensional space of m(Y) versus m(X) applies a sliding-window technique to the m_J spectrum of the X candidate jet, dividing the data into a series of overlapping m_J ranges. As described in Section 9, for each m_J window of the X candidate jet, the corresponding m_JJ distribution is examined for evidence of an excess due to a signal, where m_JJ denotes the invariant mass of the system formed by the H and X candidate large-R jets.
The reconstructed X mass resolution varies from 12 GeV to 40 GeV. The widths of the overlapping m_J windows for the X candidate jet are chosen to be around twice the X mass resolution, but also take into account the limited number of data events. The central values of neighbouring m_J windows are shifted by roughly half of the X mass resolution, ensuring that they overlap and do not leave gaps in the search coverage. Studies in which simulated signals were injected at various mass values were used to demonstrate the reasonableness of the window choices.
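The window construction can be sketched as below. The resolution model is hypothetical: it interpolates linearly between 12 GeV at m = 50 GeV and 40 GeV at m = 1000 GeV, which is only one way to connect the two quoted values. The sketch also ignores the adjustment for limited data statistics, so the windows it produces are not those used in the analysis.

```python
M_MIN_GEV, M_MAX_GEV = 50.0, 1000.0


def mass_resolution(m_x):
    """Hypothetical X mass resolution (GeV): linear from 12 GeV to 40 GeV."""
    frac = (m_x - M_MIN_GEV) / (M_MAX_GEV - M_MIN_GEV)
    frac = min(max(frac, 0.0), 1.0)
    return 12.0 + frac * (40.0 - 12.0)


def sliding_windows(m_min=M_MIN_GEV, m_max=M_MAX_GEV):
    """Overlapping (low, high) m_J windows covering [m_min, m_max].

    Each window is about twice the resolution wide, and neighbouring
    window centres are shifted by half of the resolution.
    """
    windows = []
    centre = m_min + mass_resolution(m_min)
    while centre <= m_max:
        res = mass_resolution(centre)
        low = max(centre - res, m_min)
        high = min(centre + res, m_max)
        windows.append((low, high))
        centre += 0.5 * res
    return windows
```

Because the step between centres is smaller than the half-width of each window, consecutive windows always overlap and no mass value in the range is left uncovered.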
The reconstructed Y → XH mass resolution varies from about 65 GeV to 100 GeV. For each m_J window of the X candidate jet, the binning for the corresponding m_JJ distribution is chosen to be at least as large as the Y mass resolution, dependent on the size of the data sample. Morphing techniques are subsequently applied to parameterize grid points using the four surrounding nearby points. The normalization of the parameterized signals is estimated using linear interpolation.
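As an illustration of the normalization step, the sketch below interpolates a signal yield at an intermediate (m(Y), m(X)) point from the four surrounding grid points. It assumes that "linear interpolation" is applied in both mass dimensions (bilinear interpolation) on a rectangular grid; the shape morphing itself is not reproduced, and all names are illustrative.

```python
def bilinear_interpolate(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinear interpolation inside the rectangle [x0, x1] x [y0, y1].

    fij is the value at (xi, yj); for instance x can be m(Y) and y can be
    m(X), with f the signal normalization at the generated grid points.
    """
    if x1 == x0 or y1 == y0:
        raise ValueError("grid points must be distinct in both dimensions")
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    return (
        (1.0 - tx) * (1.0 - ty) * f00
        + tx * (1.0 - ty) * f10
        + (1.0 - tx) * ty * f01
        + tx * ty * f11
    )
```

At the centre of the rectangle the result is the average of the four corner values, and at each corner it reproduces the corner value exactly.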

Background estimation
After the event selection, over 96% of the SM background originates from multijet processes. A data-driven method is used to determine the shape of the dominant multijet background, with its normalization in the signal regions being determined with the fit procedure described in Section 9. The small additional background contributions from tt̄ and V+jets production are estimated via simulation. The background determination is performed separately in each of the m_J windows considered for the X candidate jet mass.
As depicted in Fig. 1, the event sample, before any specific requirements are made on the mass of the X candidate jet, is separated into mutually exclusive categories according to the mass and number of b-tagged jets of the Higgs boson candidate. The SR1 and SR2 regions are used to perform the signal search, while the other seven regions are used as control regions to develop and validate the background model. The multijet background component in each of the seven control regions is determined by subtracting from the data the tt̄ and V+jets contributions predicted by simulation.
The modelling of the dominant multijet background in the SR1 and SR2 signal regions starts from the CR0 sample, for which the candidate Higgs jet satisfies the SR requirement on the jet mass but has no associated b-tagged jets. The CR0 sample should contain negligible signal contamination. The basic strategy is to use events in the CR0 sample with at least one (two) track-jet(s) associated with the Higgs jet to model the shape of the multijet background in the SR1 (SR2) signal region. However, sources of possible differences between the multijet components of CR0 versus SR1 and SR2 include changes in the underlying event populations due to the absence or presence of b-quarks as well as kinematic differences arising from the application of b-tagging, since the b-tagging efficiency depends on the p_T and η of the track-jet.

The corrections required to take these differences into account are extracted from the HSB events. In each of the HSB regions with different numbers of b-tags (HSB0, HSB1, HSB2), a pair of two-dimensional histograms is filled, one of p_T versus η for the leading track-jet, and the other for the subleading track-jet. Subsequently, leading and subleading track-jet reweighting maps are created by dividing the HSB1 and HSB2 histograms by the corresponding HSB0 histogram. A Gaussian kernel, similar to that applied in Ref. [59], is used to smooth out statistical fluctuations by taking a weighted sum of neighbouring bins. The weight is inversely proportional to the width of the Gaussian kernel and depends on the statistical uncertainty of each bin. The result is a reduction of statistical fluctuations, while adding negligible bias to the distributions.
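The construction of a reweighting map can be sketched in two steps: a bin-by-bin ratio of a tagged to an untagged histogram, followed by a kernel smoothing. The exact kernel of Ref. [59] is not described here, so the weight used below (a Gaussian in bin distance multiplied by the inverse variance of each bin) is an assumption, as are the function names and default parameters. Histograms are plain nested lists indexed as [p_T bin][η bin].

```python
import math


def ratio_map(tagged, untagged):
    """Bin-by-bin ratio of two 2D histograms; empty denominators give 0."""
    return [
        [(t / u if u > 0 else 0.0) for t, u in zip(row_t, row_u)]
        for row_t, row_u in zip(tagged, untagged)
    ]


def gaussian_smooth(values, variances, sigma_bins=1.0, radius=2):
    """Smooth a 2D map with a Gaussian kernel weighted by inverse variance.

    Assumed weight for neighbour (k, l) of bin (i, j):
        exp(-0.5 * d^2 / sigma_bins^2) / variances[k][l]
    Bins with non-positive variance are skipped.
    """
    n_rows, n_cols = len(values), len(values[0])
    smoothed = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            num = den = 0.0
            for k in range(max(0, i - radius), min(n_rows, i + radius + 1)):
                for l in range(max(0, j - radius), min(n_cols, j + radius + 1)):
                    if variances[k][l] <= 0.0:
                        continue
                    d2 = (k - i) ** 2 + (l - j) ** 2
                    w = math.exp(-0.5 * d2 / sigma_bins ** 2) / variances[k][l]
                    num += w * values[k][l]
                    den += w
            smoothed[i][j] = num / den if den > 0.0 else values[i][j]
    return smoothed
```

A flat map is left unchanged by the smoothing, while a single poorly measured bin is pulled towards its better-measured neighbours.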
To describe the SR multijet background shapes, the CR0 events are reweighted using the maps in bins of track-jet p_T and η extracted from the sideband. The reweighting is performed only for the 2-tag samples, as the modelling of the shape of the multijet background in the 1-tag sideband regions without reweighting is observed to be adequate. Due to the correlations between the kinematic variables of the two leading track-jets used for the reweighting, multiple reweighting iterations are performed, until the track-jet properties, p_T and η, are matched within statistical uncertainties.
The background modelling is validated by using the same method to predict the background in the LSB regions and comparing the prediction with the data; good agreement is obtained. Agreement is also confirmed in the signal region by integrating over all values of the mass of the X candidate jet, thereby diluting any possible signal contamination to a negligible level.

Systematic uncertainties
The main systematic uncertainty in the background estimate arises from the potential mis-modelling of the background processes. As described in Section 7, the multijet background shapes in the two signal regions are estimated directly from data using sideband regions. To determine the systematic uncertainty in this method, the multijet background shapes in the HSB1 and HSB2 regions are compared, in each m_J window of the X candidate jet, to those of the reweighted HSB0 multijet distributions. The differences, in bins of m_JJ, are fitted with a straight line, and the fitted line is used as the multijet modelling systematic uncertainty. The largest observed shape difference is found to be approximately 12%, while the smallest is 3%. A normalization uncertainty of 30% is assigned to the small tt̄ background, based on the ATLAS tt̄ differential cross-section measurement [60]. The same uncertainty is conservatively applied to the small V+jets background component.
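The straight-line fit to the shape differences can be illustrated with an ordinary least-squares fit. The sketch assumes that the difference is expressed as a relative deviation per m_JJ bin and that all bins enter with equal weight; neither choice is specified in the text, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def shape_uncertainty(bin_centres, target, reweighted):
    """Fitted relative difference between target and reweighted yields.

    `target` holds e.g. the HSB2 multijet yields per m_JJ bin and
    `reweighted` the reweighted HSB0 yields in the same bins.
    """
    rel_diff = [(t - r) / r for t, r in zip(target, reweighted)]
    slope, intercept = fit_line(bin_centres, rel_diff)
    return [intercept + slope * x for x in bin_centres]
```

Fitting a line instead of using the raw bin-by-bin differences suppresses statistical fluctuations in sparsely populated m_JJ bins.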
The uncertainty in the combined 2015+2016 integrated luminosity is 2.1%. It is derived, following a methodology similar to that detailed in Ref. [61], from a calibration of the luminosity scale using x–y beam-separation scans performed in August 2015 and May 2016. This uncertainty is applied to the yields of both the signal and the small tt̄ and V+jets backgrounds, which are not determined from data.
Additional systematic uncertainties in the signal acceptance arise from the choice of PDF set and the uncertainty in the amount of initial- and final-state radiation. For all the two-dimensional mass grid points, a constant 5% uncertainty is applied and covers both effects.
Uncertainties related to the signal parameterization method are estimated by comparing the mass distributions of generated signal points to those predicted from morphing nearby points in the two-dimensional space of m(Y) versus m(X). The largest observed normalization deviation is ∼8%, while differences in the signal shape also reach levels up to ∼8%. Both effects are included as uniform uncorrelated systematic uncertainties.
Systematic uncertainties which account for experimental effects on the signal shape include the large-R jet mass scale and mass resolution, as well as D_2 and the b-tagging. The impact of each of these effects is evaluated by shifting each variable according to these systematic uncertainties and then re-performing the signal parameterization. Based on the measurements documented in Refs. [55, 62], a 2% uncertainty is assigned for the large-R jet p_T scale, while uncertainties of 20% and 15%, respectively, are assigned for the large-R jet mass resolution and D_2 resolution. Uncertainties in the b-tagging correction factors, which are derived from dedicated flavour-enriched samples in data, are applied to the simulated event samples. An additional term is included to extrapolate the measured uncertainties to the high-p_T region of interest. This term is calculated from simulated events by considering variations of the quantities affecting the b-tagging performance such as the impact parameter resolution, percentage of poorly measured tracks, description of the detector material, and track multiplicity per jet. The dominant effect on the uncertainty when extrapolating to high p_T is related to the change in tagging efficiency when smearing the track impact parameters based on the resolution measured in data and simulation. The uncertainty in the b-tagging efficiency is measured as a function of b-jet p_T and η for track-jets with p_T < 250 GeV, while for higher p_T values it is extrapolated using simulation [50]. These uncertainties vary between 2% and 8% in the lower p_T range, and rise to approximately 9% for track-jets with p_T > 400 GeV.

Statistical treatment and results
The signal search in the two-dimensional space of m(Y) versus m(X) is performed by applying a sliding-window technique to the m_J spectrum of the X candidate jet, which divides the data into a series of overlapping m_J ranges. For each m_J window, a binned maximum-likelihood fit is then performed to the resultant m_JJ distributions in both SR1 and SR2 signal regions simultaneously. A test statistic based on the profile likelihood ratio [63] is used to test hypothesized values of the global signal strength (µ), corresponding to the HVT model. The systematic uncertainties, described previously, are modelled with Gaussian or log-normal constraint terms (nuisance parameters) in the definition of the likelihood function.
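The structure of the simultaneous binned fit can be illustrated with a stripped-down likelihood. This sketch keeps only the Poisson terms for the m_JJ bins of the two signal regions and a single signal strength µ; the nuisance parameters, their constraint terms and the free multijet normalizations are omitted, and the best-fit µ is found by a simple grid scan instead of a minimizer. All names and the scan range are illustrative.

```python
import math


def poisson_nll(observed, expected):
    """Negative log-likelihood of independent Poisson bins."""
    return sum(
        nu - n * math.log(nu) + math.lgamma(n + 1)
        for n, nu in zip(observed, expected)
    )


def simultaneous_nll(mu, regions):
    """Sum the binned Poisson NLL over all regions for signal strength mu.

    Each region is a dict with per-bin lists 'data', 'signal', 'background';
    the expected yield in a bin is mu * signal + background.
    """
    total = 0.0
    for region in regions:
        expected = [
            mu * s + b for s, b in zip(region["signal"], region["background"])
        ]
        total += poisson_nll(region["data"], expected)
    return total


def fit_mu(regions, mu_max=10.0, n_steps=1000):
    """Grid scan for the mu in [0, mu_max] that minimizes the NLL."""
    best_mu, best_nll = 0.0, simultaneous_nll(0.0, regions)
    for i in range(1, n_steps + 1):
        mu = mu_max * i / n_steps
        nll = simultaneous_nll(mu, regions)
        if nll < best_nll:
            best_mu, best_nll = mu, nll
    return best_mu
```

Because both regions share the same µ, a signal that is weak in one region can still be constrained by the other, which is the motivation for fitting SR1 and SR2 together.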
Each m J window of the X candidate jet is fitted independently, but as the mass ranges overlap, the fit results are correlated. In each fit, the normalizations of the multijet backgrounds in the SR1 and SR2 signal regions are treated as free parameters. All of the systematic uncertainties are treated as correlated between the two signal regions.
The data distributions and fit results are shown for three example m_J windows of the X candidate jet in Figs. 3–5. In each case, the left (right) upper plot shows the m_JJ distribution in the SR1 (SR2) signal region, and the results of the simultaneous fit are superimposed on the plots. Examples of possible signal models are also shown, with arbitrary overall normalization, illustrating the corresponding contributions that would be expected in the SR1 and SR2 regions. The data distributions are well described by the fitted SM background, and there is no significant evidence of a signal. The same conclusion is valid for all m_J windows of the X candidate jet.
The fits to the data are used to set upper limits on the production cross-section, σ(pp → Y → XH → qq̄′bb̄). Exclusion limits are computed using the CL_s method [64], with a value of µ regarded as excluded at the 95% confidence level (CL) when CL_s is less than 5%. The corresponding limit plots are included in panel (c) of Figs. 3–5.
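The CL_s criterion can be illustrated with a single-bin counting experiment. This is a deliberate simplification: the analysis uses a profile-likelihood-ratio test statistic over binned m_JJ distributions, whereas the sketch below uses the observed event count itself as the test statistic and ignores systematic uncertainties. The function names are illustrative.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    return sum(
        math.exp(-mean) * mean ** k / math.factorial(k)
        for k in range(n_obs + 1)
    )


def cls_value(n_obs, signal, background):
    """CL_s = CL_(s+b) / CL_b for a counting experiment.

    CL_(s+b) is the probability, under signal plus background, of observing
    at most n_obs events; CL_b is the same probability under background only.
    """
    cl_sb = poisson_cdf(n_obs, signal + background)
    cl_b = poisson_cdf(n_obs, background)
    return cl_sb / cl_b


def is_excluded(n_obs, signal, background, threshold=0.05):
    """A signal hypothesis is excluded at 95% CL when CL_s < 5%."""
    return cls_value(n_obs, signal, background) < threshold
```

Dividing by CL_b protects against excluding signals to which the search has little sensitivity: when zero events are observed, CL_s reduces to exp(−s) regardless of the background expectation.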
A summary of the expected and observed 95% CL limits on the production cross-section, σ(pp → Y → XH → qq̄′bb̄), is given in the two-dimensional plane of m(Y) versus m(X) in Fig. 6. For X mass windows around the V and Higgs boson mass values, the results are found to be compatible with the dedicated ATLAS VH and HH searches reported in Refs. [4] and [9], respectively. To visualize both the magnitude and the sign of discrepancies between the expected and observed limits, Fig. 7 presents the differences between the expected and observed limits, expressed in multiples of σ, the equivalent Gaussian significance. The maximum observed deviation has a local significance of about 2.5σ, corresponding to a global deviation of 1.2σ.

Figure 5: The m_JJ mass distributions in the m_J window of the X candidate jet from 500 GeV to 584 GeV after the likelihood fit for events in (a) the 1-tag signal region (SR1) and (b) the 2-tag signal region (SR2). The highest mass bin includes any overflows. The background expectation is given by the filled histograms and the ratio of the observed data to the background (Data/Bkg) is shown in the lower panel. The uncertainties shown are those after the fit described in the text. One particular example of a possible signal model, namely with m(X) = 542 GeV and m(Y) = 2.9 TeV, is overlaid with an arbitrary overall normalization, illustrating the corresponding contributions that would be expected in the SR1 and SR2 regions. Panel (c) shows the resultant 95% CL upper limit on the production cross-section, σ(pp → Y → XH → qq̄′bb̄).

Figure 7: Differences between the observed and the expected 95% CL limits on the resonance production cross-section σ(pp → Y → XH → qq̄′bb̄), expressed in terms of multiples of σ, the equivalent Gaussian significance.

Conclusion
A search for new heavy resonances Y decaying into a Higgs boson and a new particle X was carried out using 36.1 fb⁻¹ of pp collision data collected at √s = 13 TeV by ATLAS during the 2015 and 2016 runs of the CERN Large Hadron Collider. The Y → XH → qq̄′bb̄ channel was studied using a generic approach in the topological regime where both bosons are highly Lorentz-boosted, and each is reconstructed as a single jet with a large radius parameter. Jet substructure and b-tagging techniques were exploited to tag the X and Higgs bosons and to reduce the dominant multijet background. Values of Y (X) mass in the range of 1 TeV to 4 TeV (50 GeV to 1000 GeV) were considered.
A search for evidence of an excess in the XH mass spectrum was made in a large number of overlapping sliding windows in the mass of the X particle. The data are found to be in agreement with the Standard Model background expectations and only small deviations are observed, with local (global) significance of no more than 2.5σ (1.2σ). Within the framework of a modified Heavy Vector Triplet model, upper limits on the resonance production cross-section σ(pp → Y → XH → qq̄′bb̄) were set at 95% CL in the two-dimensional space of the Y mass versus the X mass.