Search for resonant pair production of Higgs bosons in the $b\bar{b}b\bar{b}$ final state using $pp$ collisions at $\sqrt{s}$ = 13 TeV with the ATLAS detector

A search for resonant Higgs boson pair production in the $b\bar{b}b\bar{b}$ final state is presented. The analysis uses 126-139 fb$^{-1}$ of $pp$ collision data at $\sqrt{s}$ = 13 TeV collected with the ATLAS detector at the Large Hadron Collider. The analysis is divided into two channels, targeting Higgs boson decays which are reconstructed as pairs of small-radius jets or as individual large-radius jets. Spin-0 and spin-2 benchmark signal models are considered, both of which correspond to resonant $HH$ production via gluon$-$gluon fusion. The data are consistent with Standard Model predictions. Upper limits are set on the production cross-section times branching ratio to Higgs boson pairs of a new resonance in the mass range from 251 GeV to 5 TeV.


Contents
This paper presents a search for resonant pair production of Higgs bosons via gluon-gluon fusion in thē¯fi nal state, using the full LHC Run 2 dataset collected by ATLAS. The results are interpreted in terms of two representative benchmark models: a generic spin-0 boson, (as, for example, predicted by two-Higgs-doublet models [22] such as the Minimal Supersymmetric Standard Model [23,24]), and a spin-2 Kaluza-Klein graviton, * KK , in the context of the bulk Randall-Sundrum (RS) model [25][26][27][28]. In both cases, only the gluon-gluon fusion production mode is considered. No further assumptions are made on the signal models except for the spin hypothesis and generated resonance width. Example production diagrams for these signal models are shown in Figure 1. Throughout this analysis, the nominal →¯branching ratio is taken to be 0.582, corresponding to the SM value at a Higgs boson mass of 125 GeV [29].
The analysis is divided into two complementary channels: resolved, in which each of the four -quarks from the decays leads to an individually reconstructed jet, and boosted, which targets the topology where each is produced with large transverse momentum ( T ) and its decay products are reconstructed as a single large-radius jet. The resolved and boosted channels target low and high resonance masses, respectively. The resolved channel covers resonance masses from 251 GeV to 1.5 TeV, and the boosted channel covers resonance masses from 900 GeV to 5 TeV. The two channels are statistically combined in the mass range where they overlap.
In addition to utilizing the full Run 2 dataset and benefiting from progress in the ATLAS -jet identification algorithms, this analysis is improved over the previous ATLAS search [7] in several ways. In the resolved channel, a machine-learning algorithm is used to pair the jets into Higgs boson candidates and the fully data-driven background model is substantially improved with a neural-network-based reweighting procedure. In the boosted channel, variable-radius track-jets are used for -tagging to recover signal acceptance for the highest resonance masses, which fell in the previous analysis as the Higgs boson decay products became very collimated, and the search is extended to cover the previously unexplored resonance mass range between 3 and 5 TeV. In both channels, the analysis uses a new neural-network-based -tagging algorithm DL1r, which performs better than the older MV2 algorithm [30,31].

ATLAS detector
The ATLAS detector [32] at the LHC covers nearly the entire solid angle around the collision point. 1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector (ID) system is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range | | < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides four space-point measurements per track, the first hit normally being in the insertable B-layer installed before Run 2 [33,34]. The next layer outward is the silicon microstrip tracker, which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to | | = 2.0.
The calorimeter system covers the pseudorapidity range | | < 4.9. Within the region | | < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering | | < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within | | < 1.7, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimized for electromagnetic and hadronic energy measurements respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T·m across most of the detector. A set of precision chambers covers the region | | < 2.7 with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range | | < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [35]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
An extensive software suite [36] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment. 1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector and the -axis along the beam pipe. The -axis points from the IP to the center of the LHC ring, and the -axis points upwards. Cylindrical coordinates ( , ) are used in the transverse plane, being the azimuthal angle around the -axis. The pseudorapidity is defined in terms of the polar angle as = − ln tan( /2). Angular distance is measured in units of Δ ≡

Data and simulated samples 3.1 Data sample
This analysis is performed using LHC collision data at √ = 13 TeV. Only data collected during stable beam conditions with all relevant detector systems functional are used [37]. The resolved (boosted) channel uses 126 fb −1 (139 fb −1 ) of data collected in 2016-2018 (2015-2018). The triggers for both channels are based on jets reconstructed using the anti-algorithm [38,39]. For the resolved (boosted) channel, the jets are clustered with a radius parameter of = 0.4 ( = 1.0).
The resolved channel uses a combination of 12 triggers with various requirements on the transverse energy ( T ) and -tagging status of the jets [40]. These include requirements on any one of four sets of objects: • two -jets plus two additional jets (2 +2j), • two -jets plus one additional jet (2 +1j), • a single high-T -jet, and • two -jets plus a high T , defined as the scalar sum of all jets' T (2 + T ).
The minimum T requirements on the jets are 35 GeV for the -jets from the 2 +2j triggers, 55 GeV for the -jets from the 2 + T triggers, 100 to 150 GeV (depending on the year) for the additional jet used for the 2 +1j triggers, and finally 225 or 300 GeV for the single high-T -jet trigger. The minimum T requirement is 300 GeV for the triggers which include it. In the trigger, T is computed using all jets with T ≥ 30 GeV. The efficiency of individual triggers varies from a few percent to up to 80% depending on the kinematic and -tagging requirements of the trigger, and the signal hypothesis. The choice of triggers is optimized to maximize the signal efficiency over the full range of hypothesized resonance mass values. The set of triggers used depends on the data-taking year, and the triggers from each year have different effects on kinematic distributions. This results in a different signal-to-background ratio in each year's data. Therefore, the datasets from each year are treated independently until they are combined in the statistical analysis. During 2016 data taking, a fraction of the data (8.3 fb −1 ) was affected by an inefficiency in the online vertex reconstruction, which reduced the efficiency of the algorithms used to identify -jets; those events were not retained for further analysis. This results in an integrated luminosity of 24.6 fb −1 for the 2016 dataset in the resolved channel. The integrated luminosities of the 2017 and 2018 datasets are 43.7 fb −1 and 57.7 fb −1 , respectively.
In the boosted channel, events were selected from the 2015 dataset using a trigger that required a single jet, , with T > 360 GeV. In 2016, 2017 and 2018 a similar trigger was used but requiring T > 420 GeV. The 2017 and 2018 triggers have additional requirements on the mass of the jet of > 40 GeV and > 35 GeV, respectively. The efficiency of these triggers is 98% for data passing the jet requirements described in Section 6, so the triggers do not significantly impact any relevant kinematic distributions and the datasets corresponding to each year are combined into one dataset.

Simulated samples
Monte Carlo (MC) simulation is used for the modeling of signal events and, in the case of the boosted channel, for the modeling of the tt background. The ATLAS detector response is simulated with G 4 [41] for background samples, spin-0 signal samples with a resonance mass of 1 TeV or higher, and all spin-2 signal samples. A F II [42], which utilizes a fast calorimeter simulation, is used for spin-0 signals with resonance masses below 1 TeV.
The signal processes for both benchmark models were simulated at leading order (LO) in s , using M G 5_ MC@NLO 2.6.1 [43] for the spin-0 samples and M G 5_ MC@NLO 2.2.2 for the spin-2 samples. For both cases, the NNPDF2.3 [44] parton distribution function (PDF) set was used. Both resonances were produced via gluon-gluon fusion and were forced to decay into a pair of SM Higgs bosons, as indicated in Figure 1. Signal samples were generated for resonance masses in a range from 251 GeV to 5 TeV, with increased spacing between the higher mass-points. The spacing becomes larger than the detector resolution at high masses. For the spin-0 model, interpolation methods [45] are used to estimate the mass distribution between the points above 2 TeV for which samples were simulated. The spin-2 resonance is wide compared to the mass spacing, so no interpolation is required.
For the spin-0 case, a two-Higgs-doublet model was used in the event generation, taking the heavy CP-even neutral scalar as the resonance of interest. Its width was set to be much smaller than the detector resolution, and the other non-SM particles in this model do not enter the generation (At LO in QCD, there are no Feynman diagrams for this process containing them GeV is the effective four-dimensional Planck scale. The generated width, based on the model prediction, ranges from 3% to 20% of the resonance mass, which is not negligible compared to the detector resolution. The hadronization and showering were modeled using P 8.186 [49] with E G 1.2.0 for heavy-flavor decays. The A14 set of tuned underlying-event parameters [50] and the NNPDF2.3 PDF set were used. The cross-sections for this process are taken from Ref. [28]. They are used solely for setting limits on the graviton mass. Top quark pair production (tt) was simulated at next-to-leading order (NLO) in s using P B v2 [51-54]. Parton showering, hadronization, and the underlying event were modeled using P 8.230 with E G 1.6.0 for heavy-flavor decays. The matrix element calculation uses NNPDF3.0 [55] as the PDF set, while the parton shower and underlying-event modeling uses NNPDF2.3 [44] and the A14 set of tuned parameters. The damping parameter ℎ damp , which effectively regulates radiation at high T , was set to 1.5 times the top quark mass. The tt simulation is normalized using the value of the inclusive cross-section calculated with T ++ 2.0 [56,57]. This accounts for next-to-next-to-leading-order (NNLO) corrections in The effect of multiple interactions in the same and neighboring bunch crossings (pileup) was modeled by overlaying each simulated hard-scattering event with inelastic proton-proton ( ) events generated with P 8.186 using the NNPDF2.3 PDF and the A3 set of tuned parameters [58].

Object reconstruction
Primary vertices from proton-proton interactions are reconstructed using at least two charged-particle tracks with T > 500 MeV measured with the ID [59]. The vertex which has the largest sum of squared track momenta ( 2 T ) is selected as the hard-scatter primary vertex. Hadronic jets are reconstructed using the anti-algorithm [38,39]. Depending on the use case, different input objects and radius parameters are used.
For the resolved channel, small-R jets are clustered using = 0.4 with particle-flow objects as inputs [60]. Particle-flow objects are charged-particle tracks matched to the hard-scatter vertex and calorimeter energy clusters following an energy subtraction algorithm that removes the calorimeter deposits associated with good-quality tracks from any vertex. The tracking information is used to improve the clusters' energy resolutions. The momenta of these jets are calibrated in a multi-step procedure [61]. Jets with T < 60 GeV and | | < 2.4 must also satisfy a requirement based on the output of the multivariate "jet vertex tagger" (JVT) algorithm, which is used to identify and reject jets in which much of the energy originates from pileup interactions [62]. The "Tight" working point, corresponding to an average signal efficiency of 96%, is used and jets failing this requirement are discarded. All small-jets are required to have T > 40 GeV and | | < 2.5. Any jets failing these requirements are discarded and not used further, except where stated explicitly.
Additional small-jets are reconstructed from topological clusters of energy deposits in the calorimeter [63] instead of particle-flow objects. These jets are used exclusively for the purpose of applying quality criteria to identify events which are consistent with noise in the calorimeter or noncollision background [64]. They are calibrated in the same way as the small-jets reconstructed from particle-flow objects. If an event contains at least one jet which has T > 20 GeV, passes the JVT, and fails to meet these quality criteria, the event is rejected.
For the boosted channel, large-R jets are clustered using = 1.0 with topological clusters of energy deposits in calorimeter cells as the input. The clusters are locally calibrated [65] before being combined into jets. After these large-jets are created, a trimming procedure [66] is applied to mitigate the effects of pileup: the constituents are reclustered into "subjets" using the algorithm [67] with = 0.2, and any of these subjets with less than 5% of the large-jet's T are removed. The large-jets are calibrated following a procedure similar to that for the small-jets; however, there is no area-based pileup subtraction step or global sequential calibration [68]. Additionally, the mass of each large-jet is calibrated using both calorimeter and track information [69].
The boosted channel also makes use of track-jets to identify individual -hadron decays within the largejet. ID tracks are clustered using the anti-algorithm with a variable radius. The effective radius is inversely proportional to the T of the constituent(s) in question: = / T . Here, a value of = 30 GeV is used. Minimum and maximum values of this effective radius are set at min = 0.02 and max = 0.4. Track-jets do not have a dedicated calibration; their momenta are taken to be the vector sum of the momenta of their constituent tracks. After being reconstructed, these track-jets are exclusively matched to largejets using the ghost association method [70].
A -tagging algorithm [30,31] is applied to both the small-jets and the track-jets to identify those which are likely to have originated from a -quark. The DL1r algorithm is used, at a working point chosen to have 77% efficiency on average for jets associated with true -hadrons in simulated tt events. This is a multivariate algorithm which uses a selection of inputs including information about the impact parameters of ID tracks, the presence of displaced secondary vertices, and the reconstructed flight paths of -and -hadrons inside the jet [71]. At the chosen working point, the light-jet (charm-jet) rejection measured in tt MC simulation is about a factor of 130 (4.9) on average for small-jets. The training and calibration of this algorithm is performed separately for each jet type [72,73]. Correction factors are applied to the simulated samples to compensate for differences between the -tagging efficiencies in data and simulation.
Muons are reconstructed by matching ID tracks with either MS tracks or aligned individual hits in the MS and performing a combined track fit. They are required to have T > 4 GeV and | | < 2.5, and to satisfy "Medium" identification criteria based on track-quality variables [74]. Muons are used only to apply corrections to jet momenta.
A momentum correction is applied to -tagged small-jets to account for energy lost to soft out-of-cone radiation and to muons and neutrinos in semileptonic -hadron decays. This correction follows the procedure used in Ref.
[75] and consists of two parts. For the first, if any muon is within Δ = 0.4 of a -tagged small-jet, the four-momentum of the muon is added to that of the jet. Any energy deposited in the calorimeter by the muon is then subtracted from the jet to prevent double counting; this is computed according to the description in Ref. [76]. In the second step a global scale factor is applied to each -tagged small-jet based on its T and whether or not it has a muon within Δ = 0.4 of the jet axis. These scale factors are derived from simulation.
To account for energy lost in semileptonic -hadron decays, a similar muon-in-jet correction is applied to large-jets. If a muon is matched within a distance of Δ = min 0.4, 0.04 + 10 GeV/ muon T to one of the two leading track-jets associated with the large-jet, and if the track-jet is -tagged, the muon is considered part of the large-jet. Again, any energy deposited in the calorimeter by the muon is subtracted from the jet to prevent double counting. The muon four-momentum is then added to the calorimeter-based component of the large-jet four-momentum, and the jet mass is recalculated [77].

Event selection
To be considered for analysis, events must pass the trigger requirements specified in Section 3.1. To simplify the modeling of trigger efficiencies, events are sorted into exactly one of four classes based on offline kinematic quantities, each of which requires one specific type of trigger to be passed. In decreasing order of priority, these are as follows: 1. If the leading jet has T > 325 GeV and is -tagged, the trigger requiring one high-T -jet is used.
2. If the leading jet has T > 170 GeV and is not -tagged, the trigger requiring two -jets and one additional jet is used.
3. If the T in the event (computed using all jets with T > 25 GeV and | | < 2.5) is greater than 900 GeV, the trigger requiring a high T is used.
4. For all remaining events, the trigger requiring two -jets and two additional jets is used.
The definitions of these classes are the result of a dedicated sensitivity optimization intended to minimize the expected limits on the signal cross-section. Following the trigger selection, events are required to have at least four small-jets. Events are then divided into two categories, "2 " (where exactly two jets are -tagged) and "4 " (where at least four jets are -tagged). Exactly four jets are selected to construct the two candidates. For 4 events, the four -tagged jets with the highest T are selected. For 2 events, the two -tagged jets and the two untagged jets with the highest T are selected. The 2 events are needed to construct the background model for the 4 category. This selection of untagged jets can introduce a kinematic bias with respect to the 4 category, but this is accounted for by the reweighting function described in Section 5.2.
After the four jets are chosen, there are three possible combinations for pairing them into candidates. For a given pairing, the four-momentum of the candidate is defined as the sum of the four-momenta of the jets used to construct it. The pairing is chosen using a boosted decision tree (BDT). This is trained using LightGBM [78] to classify each of the three possible pairings in a signal event as either correct or incorrect. The correct pairing is defined by using the generator's decay record to match jets to the parton-level -quarks which result immediately from the decay. The classifier assigns each of the three candidate pairings a score, and the pairing with the highest score is chosen. The input variables to the BDT are the separations in pseudorapidity , azimuthal angle, and their quadratic sum (Δ , Δ , and Δ , respectively) between the two jets in each pair. Although the same information is contained in Δ and Δ alone, additional use of a precalculated Δ leads to improved performance. The BDT is also parameterized in the invariant mass of the four identified -jets ( (4 )), which is included as an additional input feature. However, (4 ) cannot itself be used to discriminate between correct and incorrect pairings as it is independent of the pairing. Its inclusion as a parameter serves the purpose of ensuring optimal performance for all resonance masses. The BDT is trained on a sample consisting of one quarter of the simulated spin-0 signal events, across the full range of resonance masses considered in the resolved channel. A further selection on training events is applied, requiring them to contain at least four jets with T > 35 GeV, each of which is within Δ = 0.4 of a true -quark, defined at parton level and originating from a Higgs boson decay. This T requirement is loosened relative to the nominal selection in order to increase the number of selected events with low (4 ). The events entering this training are not used anywhere else in the analysis. This technique results in less background in the signal region (defined below) compared to the strategy used in the previous ATLAS search [7], which was based on minimizing the difference between the two di-jet invariant masses. The BDT algorithm finds the correct pairing in at least 65% of signal events. This efficiency is lowest for low resonance masses, but reaches almost 100% for resonance masses of 600 GeV and higher. This is a significant improvement with respect to the efficiency of the previous method, especially at low resonance masses.
After the candidates are formed, they are ordered by the scalar sum of the T of their constituent jets: 1 and 2 denote the leading and subleading candidates, respectively. A pseudorapidity separation between the two candidates of |Δ | < 1.5 is required in order to reduce the multĳet background. Additionally, a "top veto" is applied, to reduce the background from hadronic top quark decays. This is defined by combining every possible pair of jets with T > 40 GeV and | | < 2.5, including those that were not selected for the candidates, to form " candidates". "Top quark candidates" are built by pairing candidates with each remaining jet that was selected for candidates. Events are rejected if any top quark candidate satisfies < 1.5, where the discriminant is defined as Here, ( ) and ( ) denote the masses of the candidates under consideration, while SM ( ) and SM ( ) denote the measured masses of these particles (80.4 GeV and 172.5 GeV, respectively [79]). The denominators in the expression for represent the approximate mass resolution of the detector.
Finally, events are sorted into three kinematic regions based on the invariant masses of the candidates. The first of these is the signal region (SR), defined by requiring The shape of the SR in the ( 1 )-( 2 ) plane is chosen to optimize the signal significance. The mass values of 120 GeV and 110 GeV correspond to the position of the peaks of the simulated signal ( 1 ) and ( 2 ) distributions. The deviations from the measured Higgs boson mass of 125 GeV [79] are due to detector effects, as well as energy lost to neutrinos from the -hadron decays and to out-of-cone radiation. Jets which lose energy give rise to lower mass in their candidate. Since these jets are more likely to compose 2 by definition, this results in a slightly lower mass for 2 than 1 on average. The validation region (VR) contains the events not in the SR which satisfy the condition VR HH ≡

√︃
( Finally, the control region (CR) contains the events not in the SR or VR which satisfy the condition The centers of the VR and CR are shifted relative to that of the SR to ensure that the mean candidate masses are equal in the three regions. The shapes of these regions in the ( 1 )-( 2 ) plane are shown with the 2 data in Figure 2.
After the full selection, the final discriminating variable "corrected (HH)" is constructed. This is obtained by rescaling the four-momenta of the candidates such that ( 1 ) = ( 2 ) = 125 GeV. The corrected (HH) is then the invariant mass of the sum of the two resulting four-momenta. This procedure improves the scale and resolution of the reconstructed signal mass distribution by correcting for detector effects and physical processes such as radiative emission outside the jet cones. This correction improves the signal mass resolution by up to 25% and shifts the mean of the mass distribution closer to the true value. It also modifies the background shape, but does not introduce any signal-like features. The signal efficiency times acceptance for the various event selection steps is shown in Figure 3. The efficiency at low resonance masses is mainly limited by the trigger. At high resonance masses the jets start to merge together and the reconstruction and -tagging efficiencies decrease. The efficiency is substantially larger for the spin-2 model than for the spin-0 model because the corrected (HH) distribution of the spin-2 model is much broader. This has an especially large effect at the lowest resonance masses, where the shape of the corrected (HH) distribution is distorted toward higher values. This is a result of both the (HH) correction and the natural mass shape of the graviton resonance near the production threshold.  Trigger categories 4 jets, 2 tagged 4 tagged | HH| < 1.5 min(XWt) > 1.5 XHH < 1.6 (b) Figure 3: Cumulative acceptance times efficiency as a function of resonance mass for each event selection step in the resolved channel for (a) the spin-0 and (b) the spin-2 signal models. The local maximum at 251 GeV is a consequence of the near-threshold kinematics.

Background estimation
After the selection described above, the background is dominated by pure QCD multĳet processes (excluding top quark production), with the approximately 5% remainder almost entirely composed of tt. This background composition was determined by comparing tt MC simulation to the total background estimate in the SR; it is purely meant to be indicative and is not used in the statistical treatment. The background is modeled using a purely data-driven technique; the only MC simulation used is for the signal.
The background shape in the 4 SR is estimated from data in the 2 regions and the 4 CR. The signal contamination in the 2 dataset is evaluated and found to be negligible compared to the background uncertainties. The signal-to-background ratio in the 4 CR depends on the signal hypothesis and ranges from 10% to 25% of that in the SR. However, the impact on the analysis results was studied and found negligible. This was determined by injecting various signals into the CR (and VR) data using cross sections corresponding to existing experimental upper limits.
The event kinematics in the 2 and 4 regions are not expected to be identical, so a reweighting function which maps the 2 kinematic distributions onto the 4 distributions is derived: where 2 (ì) and 4 ( ì) are the probability density functions for 2 and 4 data, respectively, over a set of kinematic variables ì. This function is derived in the CR and then applied to the 2 SR in order to produce a model of the background in the 4 SR. It can also be applied more generally to any 2 region to produce a background model for the corresponding 4 region.
The computation of ( ì) is a density ratio estimation problem, for which a variety of approaches exist. The method employed in this analysis is modified from Refs. [80,81] and makes use of an artificial neural network (NN). This NN is trained on 2 and 4 CR data to minimize the loss function: The function in Eq.
(1) optimizes this loss by equalizing the contributions from the two terms.
The kinematic variables used to make up ì are chosen to be sensitive to the differences between the 2 and 4 events. These are: 1. log T of the selected jet with the second-highest T , 2. log T of the selected jet with the fourth-highest T , 3. log(Δ ) between the two selected jets with the smallest Δ , 4. log(Δ ) between the other two selected jets, 5. the average | | of selected jets, 6. log T of the system, 7. Δ between the two candidates, 8. Δ between the jets making up 1 , 9. Δ between the jets making up 2 , 10. log min( ) , and 11. the number of jets in the event with T > 40 GeV and | | < 2.5, including jets that are not selected.
Here, "selected" jets refer to the four jets which are used to construct the candidates. The variables to which the reweighting is most sensitive are the jet multiplicity, Δ between the two candidates, and log T of the system. The NN has three densely connected hidden layers of 50 nodes, each with a rectified linear unit activation function [82], and a single-node linear output.
The training of the NN is subject to variation due to initial conditions and the limited size of the training samples. To account for these effects, the bootstrap resampling technique is used [83]. This entails constructing a set of training samples by sampling with replacement from the original. The NN is trained independently on each element of this set, using different initial conditions each time. This results in an ensemble of background estimates. Since the original training sample is large, the resulting background estimate in each bin can be approximated as being Gaussian-distributed. Additionally, this sampling-with-replacement procedure can be approximated by applying a randomly distributed integer weight to each event, drawn from a Poisson distribution with a mean of 1. Both of these approximations are used in order to reduce the computational complexity of the problem. These "bootstrap weights" are independent of ( ì), which reweights the nominal 2 kinematic distributions to the nominal 4 kinematic distributions. To increase the stability of the background estimate, the median value of ( ì) for each event is calculated across the ensemble and used as the nominal background estimate. This ensemble of weights is also used to evaluate the uncertainty due to the finite training sample, as detailed in Section 5.3.
The effect of applying this reweighting to the CR, where it is derived, is shown in Figure 4. The output of this procedure is an estimate of the corrected (HH) distribution in the 4 SR, which is then used as input to the statistical procedure detailed in Section 7. The optimization of the bin width of the corrected (HH) distribution is based on the detector resolution at low masses. The performance of this reweighting outside of the region where it is derived is checked using the VR. The scaling of the 2 distribution to the 4 sample size is always derived from the two respective CRs, as part of the reweighting procedure. The data are found to be compatible with the background model in the VR, as shown in Figure 5. Residual differences between the CR and VR are used to estimate a systematic uncertainty as described in Section 5.3.

Systematic uncertainties
The most limiting uncertainties in the resolved channel are those arising from the data-driven background estimate which is derived in the CR and applied in the SR. There are two main sources: uncertainties from the limited sample size in the CR, and physical differences between the CR and SR.
The limited sample size in the CR can lead to a random bias in the reweighting function ( ì). Ideally, this effect would be evaluated using the event-level covariance matrix between all bootstrap weights; however, this is computationally intractable in practice. Instead, an approximation is computed using the interquartile  range (IQR) of each event's weight distribution as well as the IQR of the normalization factor ( ) for each bootstrap training , which is defined as where 4 is the number of 4 events, ( ) is the weight for event from the bootstrap resampling , and the 2 and 4 datasets are restricted to the region in which the reweighting NNs are trained. Varied distributions are constructed by assigning each event a weight which is varied to the upper boundary of the event-level weight IQR. These varied distributions are then scaled to have the same normalization as the nominal distribution, multiplied by the ratio of the upper boundary value of the IQR of ( ) to its nominal value. These rescaled varied distributions form an envelope around the nominal one, which specifies the size of this uncertainty in each corrected (HH) bin. As this uncertainty is statistical in origin, it is uncorrelated across (HH) bins.
Uncertainties in the background estimate also arise from kinematic differences between the CR and the SR. To evaluate these effects, an alternative background model is derived in the VR instead of the CR. The difference between the corrected (HH) distributions from the nominal and alternative background models is used to estimate the uncertainty in the shape of the (HH) distribution in the SR. To allow sufficient flexibility in the model, this uncertainty is parameterized in terms of two components: low-T and high-T , where T now denotes the scalar sum of the T of the four jets constituting the candidates. This variable is chosen because it is correlated with (HH), but does not introduce a discontinuity in the (HH) spectrum when the two components are varied separately. The boundary between low-T and high-T events is chosen to be 300 GeV. Each of these two components is symmetrized around the nominal shape to construct a two-sided uncertainty. These uncertainties are taken to be uncorrelated across the different years to accommodate differences due to the varying triggers and run conditions. Several detector modeling uncertainties are evaluated and included. These affect only the signal description, as the background is estimated entirely from data. Uncertainties in the jet energy scale and resolution are treated according to the prescription in Ref. [61]. Uncertainties in the -tagging efficiency are treated according to the prescription in Ref. [30]. Uncertainties in the trigger efficiencies are evaluated from measurements of per-jet online efficiencies for both jet reconstruction and -tagging, which are used to compute event-level uncertainties. These are then applied to the simulated events as overall weight variations. The uncertainty in the total integrated luminosity used in this analysis is 1.7% [84], obtained using the LUCID-2 detector for the primary luminosity measurements [85].
Several sources of theoretical uncertainty affecting the signal models were considered and are described as follows. Uncertainties due to modeling of the parton shower and underlying event are evaluated by comparing results between two generators for these parts of the calculation: the nominal H 7.1.3 and the alternative P 8.235. This is found to have a 5% effect on the signal acceptance and a negligible impact on the (HH) distribution, independently of the resonance mass. Uncertainties in the matrix element calculation are evaluated by raising and lowering the factorization and renormalization scales used in the generator by a factor of two, both independently and simultaneously. This results in an effect smaller than 1% for all variations and all masses; the impact of such uncertainties is therefore neglected. PDF uncertainties are evaluated using the PDF4LHC_NLO_MC set [86] by calculating the signal acceptance for each replica and taking the standard deviation. In all cases, these result in a less than 1% uncertainty in the signal acceptance, and therefore these are also neglected. Theoretical uncertainties in the →b ranching ratio [29] are included; they amount to a 2.5% overall uncertainty on the signal normalization.

Results
The corrected (HH) distributions for data and the estimated background after the fit to data described in Section 7 are shown in Figure 6. The data agree well with the background prediction and no significant excess is observed. The event yields for data, background, and several signal hypotheses are presented in Tables 1 and 2. These are integrated over windows in the corrected (HH) spectrum containing approximately 90% of the signal in each case. These windows are defined such that their first and last bins contain the 5 th and 95 th percentile of the distribution, respectively. This range is larger for the spin-2 signal because the benchmark model predicts a wider resonance. The statistical interpretation of the data is discussed in Section 7.  After passing the trigger requirement, each event is required to contain at least two large-jets with T > 250 GeV. The two highest-T jets are selected as the candidates. The leading and subleading candidates ordered by T are denoted by 1 and 2 , respectively. Each candidate is required to have | | < 2.0, ( ) > 50 GeV, and at least one associated track-jet. Additionally, at least one candidate must have T > 450 GeV, which is driven by the trigger threshold. In order to further reduce the background, an additional requirement of |Δ | < 1.3 is placed on the candidates.
Events are then categorized according to the multiplicity and -tagging status of the track-jets associated with each of the two candidates. For candidates with more than two associated track-jets, only the two with the highest T are considered.
Since track-jets have a variable radius, it is possible for a high-T jet to be contained completely within the catchment area of another low-T jet. This can lead to a pathological case in which the low-T jet's axis is also contained within the high-T jet. This can result in misassignment of tracks to jets by the -tagging algorithm (which is based purely on proximity to the jet axis). To avoid any degradation in performance resulting from this, events containing such collinear track-jets are vetoed from the boosted channel.
Three signal-enriched categories with four or fewer -tagged track-jets are defined: • Events in the "4 " category have two -tagged track-jets associated with each candidate.
• Events in the "3 " category have two -tagged track-jets associated with one candidate and exactly one -tagged track-jet associated with the other candidate.
• Events in the "2 " category have exactly one -tagged track-jet associated with each candidate.
These are collectively labeled as the high-tag categories. Including events with less than four -tagged track-jets increases the sensitivity of the search especially for high resonance masses, where, due to the large boost, track-jets can become so close that they are often not reconstructed individually.
Additional low-tag categories with track-jets that fail the -tagging requirement are also defined in order to estimate the background.
• Events in the "2 -2 " category (for modeling 4 ) have one candidate with two or more associated -tagged track-jets and the other candidate with no -tagged track-jets but two or more untagged track-jets.
• Events in the "2 -1 " category (for modeling 3 ) have one candidate with two or more associated -tagged track-jets and the other candidate with no -tagged track-jets but one or more untagged track-jets.
• Events in the "1 -1 " category (for modeling 2 ) have one candidate with exactly one associated -tagged track-jet and the other candidate with no -tagged track-jets but one or more untagged track-jets.
In these low-tag categories, the candidate that has no -tagged track-jets is also referred to as untagged, while the other one is labeled as tagged. The untagged candidate in the 2 -1 region is allowed to have more than one track-jet because requiring exactly one would result in a very small number of events in this category. A diagram of events in these high-tag and low-tag categories is shown in Figure 7.
Events satisfying the 2 -2 criteria also necessarily satisfy the 2 -1 criteria. To avoid overlap between the two categories, these events are distributed randomly between them, with 80% allocated as 2 -1 events and the remaining 20% allocated as 2 -2 events. This corresponds roughly to the ratio of background events present in the two categories.
Similarly to the resolved channel, events are sorted into signal, validation, and control regions based on the invariant masses of the candidates. The SR is defined by requiring < 1.6, where This definition, as well as those of the validation and control regions, slightly differs from that in the resolved channel. This is due to the different energy scale of the boosted jet reconstruction and the different background distribution. The VR contains the events not in the SR which satisfy the condition  2b-2f 2b-1f 1b-1f Figure 7: Illustration of the three high-tag categories (4 , 3 , and 2 ) with the corresponding low-tag categories used to estimate the multĳet background (2 -2 , 2 -1 , and 1 -1 ). Teal cones represent large-jets, yellow cones represent associated -tagged track-jets, and white cones represent associated untagged track-jets. For candidates with more than two associated track-jets, only the two with the highest T are considered.
Finally, the CR contains the events not in the SR or VR which satisfy the condition CR HH ≡
The CR is shifted to higher masses relative to the signal and validation regions in order to maximize the number of selected events while avoiding the low-mass peak of the multĳet background distribution. The definition of these regions in the ( 1 )-( 2 ) plane are shown with the 2 -1 data in Figure 8.
In order to ensure orthogonality between the resolved and boosted channels, any events passing the resolved signal region selection are vetoed from the boosted channel. This priority choice results in the best signal sensitivity.
The signal acceptance times efficiency for various steps of the selection is shown in Figure 9.  this efficiency is obtained after the other selection steps including the SR definition. The signal efficiency in the 4 region has a maximum around 1.5 TeV. Above that value the track-jets start to merge together, and for the highest resonance masses the 2 category becomes the most efficient.

Background estimation
As in the resolved channel, the background in the boosted channel is dominated by QCD-induced jet production, which is separated into multĳet (light quark) and tt production. The fractions of tt relative to the total background are 10%, 15% and 30% for the 4 , 3 and 2 regions, respectively. Other background sources, such as single Higgs boson production, SM production, ( → b)+jets, and → b b account for ≤1% of the total and are neglected.
A data-driven method is used to estimate the multĳet background in each of the 4 , 3 , and 2 signal regions. The tt background is estimated from MC simulation, with corrections derived from data applied in the 3 and 2 regions.
The overall normalization of the multĳet and tt estimates are obtained from a fit to the CR data in each category. Two normalization parameters MJ and tt per -tagging category (here denoted ) are introduced as follows: Here, hi and lo denote the number of events in bin of the 1 mass distribution in the high-tag and corresponding low-tag regions, respectively. The parameter MJ scales the multĳet background from the low-tag CR to the high-tag CR, and tt corrects the MC estimate in the high-tag region. The values of MJ and tt are determined using a maximum-likelihood fit to the mass of the leading candidate after the kinematic reweighting is applied. The mass of the leading candidate discriminates between the two background processes, as shown in Figure 10. The data in the control regions of the 4 , 3 , and 2 categories are fitted separately, and therefore a total of six floating normalization factors are obtained. However, the value of tt in the 4 region is fixed to 1, since the fit is insensitive to the tt MC normalization with the available dataset and therefore cannot constrain it. The results of these normalization fits are summarized in Table 3. It is assumed that these values of MJ and tt are also applicable in the VR and SR; potential deviations from this assumption are accounted for by systematic uncertainties. The fact that MJ 1 implies that the background is much larger in the low-tag categories than in the high-tag categories. As a result, any potential bias in the high-tag background estimate due to signal contamination in the low-tag categories is much smaller than the signal contribution itself in the high-tag regions. Table 3: Best-fit values for MJ and tt , with their statistical uncertainties. The linear correlation coefficient between the two parameters is also given. The value of tt in the 4 region is fixed to 1, since the data are unable to constrain it significantly.  The difference between these low-tag and high-tag regions is that the low-tag events have an untagged candidate (no -tagged track-jets), while high-tag events instead have a tagged candidate (exactly one -tagged track-jet, since only the 3 and 2 categories are considered here). Therefore, the reweighting applied to low-tag events is based on their untagged candidates, with the aim of matching the kinematics of the tagged candidates in high-tag events. This reweighting is derived purely in the low-tag regions; the tagged candidates in the 1 -1 category are used to define the target.
The following kinematic distributions are used to construct the reweighting, for which leading and subleading refer to an ordering in T : 1. T of the candidate, 2. T of the chosen track-jet, 3. of the chosen track-jet, and 4. Δ between the leading and subleading track-jets (for candidates with at least two track-jets).
The "chosen" track-jet is the -tagged one for tagged candidates and a random one for untagged candidates. In tagged candidates with two track-jets, the leading and subleading track-jets have roughly equal probabilities to be the -tagged one, so this random selection does not introduce a significant bias. Separate distributions are constructed for leading and subleading candidates, as well as for leading and subleading track-jets.  At each iteration , cubic splines are fitted to the ratios of tagged to untagged distributions, and the weights are updated according to where indexes the different reweighting variables , denotes the spline functions, and the "learning rate" controls how much the weight can change with each iteration. This is set to = 1 − 0.5 , and ten iterations are used, after which convergence is observed. Suppressing the learning rate for early iterations is intended to avoid instabilities. This reweighting function is applied to the low-tag data sample which contains multĳet and tt events. In order to obtain the multĳet model, the tt contribution is subtracted from data, and for that purpose the tt events in the low-tag regions are therefore also reweighted. The tt distributions in the high-tag regions are not reweighted.
In order to reduce the effect of statistical fluctuations in the background at high (HH), the following function is fitted to the reweighted multĳet and tt distributions for (HH) ≥ 1200 GeV: where the are dimensionless free parameters and ≡ (HH)/ √ . This function and similar ones have been used to fit falling dĳet and multĳet spectra in similar analyses (e.g. Refs. [7, 87]). The results of these smoothing fits for the multĳet background model are shown in Figure 11. Due to the small number of events in the 4 category, the shape of the¯distribution in this region is taken from the 3 category and scaled to the yield in the 4 category. This is found to be consistent with the shape in the 4 category within statistical uncertainties. The background model outside the region where it is derived is checked using the VR, as shown in Figure 12. Good agreement between the background model and the data in the VR is observed; residual differences between the CR and VR are used to estimate a systematic uncertainty as described in Section 6.3.

Systematic uncertainties
The boosted channel is generally limited by statistical uncertainties, especially for higher resonance masses. Systematic uncertainties in the background and hypothesized signals are nonetheless fully accounted for and described here.
The uncertainty in the data-driven background estimate is split into three normalization components and five shape components. Each of these components is evaluated separately for the tt background (from simulation) and the multĳet background.
The first normalization component is associated with the assumption that the normalization factor MJ derived in the CR is also applicable in the SR. To evaluate this uncertainty, a Gaussian process technique [88] is used to interpolate the multĳet contribution in the SR. First, a fit is performed on the two-dimensional ( 1 )-( 2 ) distribution, with the SR data removed, to determine the parameters for a Gaussian two-point correlation function in each of the low-tag and high-tag regions. The correlation scales are optimized as part of the fit and are found to be >100 GeV in all cases, so the interpolation is able to smoothly fit across the SR. Second, the resulting correlation function is used to construct an estimate for all points in the ( 1 )-( 2 ) plane. Event yields in each region, in particular the SR, are determined from this result. The uncertainty is finally defined as the difference of high-tag to low-tag yield ratio between the SR and the CR. This is applied only to the multĳet component of the background; the tt estimate from MC is subtracted from the data when deriving it.
The second normalization component is associated with the choice of control region. The suitability of this choice is validated by comparing the background model with data in signal-depleted regions; however, a small dependence of the background estimate on this choice remains. This is accounted for by defining slightly different CRs and re-deriving the background model from each of them. The largest difference between any of these and the nominal background model is taken as a normalization uncertainty. The alternative CRs are defined by raising or lowering the values of ( 1 ) and ( 2 ) by 3 GeV (independently, for four variations). Additional variations are defined by increasing or decreasing the value of the CR HH requirement by 3 GeV while applying the opposite change to the VR HH requirement, effectively making the CR thicker or thinner in the ( 1 )-( 2 ) plane. In all cases, a veto on the SR is maintained. This procedure results in uncertainties of 0.9%, 1.6%, and 6.0% for the 2 , 3 , and 4 regions, respectively.
The third component of the background normalization uncertainty is associated with the statistical uncertainty of the normalization fit. The best-fit values of MJ and tt are varied along the eigenvectors of their covariance matrix. The varied values are propagated through the background estimation procedure to evaluate the resulting effect. Since tt is fixed to a value of 1 for the 4 category, only one variation is performed in that case, corresponding to MJ . This variation changes the normalizations of the individual background components, resulting in different total background shapes.
The first component of the background shape uncertainty comes from the limited sample size used to fit the smoothing function in Eq. (2). This is taken into account by an eigenvariation method using the covariance matrix of the function parameters. Function variations are defined by varying the best-fit parameters according to the eigenvectors, scaled to the square root of the corresponding eigenvalues. These varied functions are then treated as components of the uncertainty in the background shape. Three variations are used for each of the multĳet and tt background components.
Two further components of the shape uncertainty in the background model are associated with choosing the fit function that is used in Eq. (2). As the function and fit range are arbitrary and chosen based only on empirical results from the CR and VR, both are varied. Seven alternative functional forms are used, based on the scheme from Ref. [87]. The nominal fit is compared with the other seven, and those which differ the most from the nominal one in each direction are used to define an envelope for the shape uncertainty. The result of this procedure is shown in Figure 11. The fit range is also varied by changing its upper and lower bounds by 100 GeV in each direction. Again, the results which differ the most from the nominal one are used to construct an envelope, defining another shape uncertainty. In cases where all variations fall on one side of the nominal result, an envelope is constructed by symmetrization.
The fourth component of the shape uncertainty is the "residual" one, associated with shape differences between the CR and SR. This is associated only with the multĳet background, as direct simulation of the SR is used for the tt background shape. This uncertainty is evaluated by comparing the multĳet background model with the data in the VR, after subtracting the tt component determined from simulation. Since the events in the VR are kinematically closer to the SR than events in the CR are, this comparison of the background models covers residual mismodeling connected to the extrapolation over the regions. The multĳet models agree within statistical uncertainties for the 3 and 4 regions, so this uncertainty is only derived for 2 events. The shape of the uncertainty is defined as the ratio of the multĳet background model to the tt-subtracted VR data. Empty bins and bins with relative statistical uncertainty over 50% are discarded to suppress unphysical effects from statistical fluctuations, and the remainder are smoothed and symmetrized to obtain the final shape.
The fifth component of the shape uncertainty is associated with the nonclosure of the background estimation method itself. This is evaluated by running the full background estimation procedure on multĳet MC simulation. The reweighting is derived in the CR and applied to the low-tag events in both the VR and SR, which are summed into a single region for the purpose of evaluating this uncertainty. The ratio of the resulting estimate to the actual MC prediction in that summed region is determined as a function of (HH). A linear function is fitted to this set of ratios in each high-tag category, in order to mitigate the effects of statistical fluctuations. The result is then symmetrized to form an envelope, which defines this component of the shape uncertainty. It is applied only to the multĳet component of the background. For 2 and 3 categories, the relative size of this uncertainty is 5% or less for (HH) < 2 TeV, ranging up to about 15% for the largest values of (HH). For the 4 category, the relative size of this uncertainty is less than 15% over the whole (HH) range.
Several experimental uncertainties are considered. These affect only the signal and tt background models directly, as the multĳet background is estimated from data alone. However, they have indirect effects on the multĳet background model due to the tt subtraction applied in the data-driven procedure. These are accounted for by propagating the corresponding variations through the background estimation procedure. Uncertainties in the jet energy and mass scales and resolutions are treated according to the prescriptions in Refs. [89,90]. Uncertainties in the -tagging efficiency are treated according to the prescription in Ref.
[30]; this follows exactly the same procedure as in the resolved channel. Unlike in the resolved channel, no trigger efficiency uncertainties are required: no -tagging is done at trigger level and the jet T requirement is high enough that the triggers are fully efficient. As in the resolved channel, the uncertainty in the total integrated luminosity is 1.7% [84], obtained using the LUCID-2 detector [85].
Theory uncertainties in the tt background model are evaluated for the matrix element and parton shower parts of the calculation. Matrix element uncertainties are computed by raising and lowering the factorization and renormalization scales by a factor of two. The P damping parameter ℎ damp is also varied upwards by a factor of two. Additionally, a comparison with an alternative matrix element generator, M G 5_ MC@NLO 2.6.0, is performed and the difference is taken as an uncertainty component. Uncertainties in the initial-state and final-state radiation modeling are evaluated using both scale variations and eigenvariations of the A14 tune [50]. Parton shower uncertainties are obtained by generating alternative samples which are showered using H 7.1.3 instead of P 8.230 and taking the difference between the resulting (HH) distributions. The effects of PDF uncertainties are evaluated by comparing the nominal (HH) distribution with those obtained from a set of 100 weight variations, but are found to be much smaller than the statistical uncertainty and are therefore neglected.
Theory uncertainties in the signal models are evaluated using exactly the same method as for the resolved channel. Here, the parton shower and underlying event are found to have a 10% effect on the signal acceptance, independently of the resonance mass. Uncertainties in the matrix element calculation are again found to be negligible, and are not included in the statistical interpretation.

Results
The (HH) distributions for data and the estimated background are shown in Figure 13 for the three categories. The data agree well with the background and no significant excess is observed. The numbers of events for data, background, and several signal hypotheses are presented in Tables 4 and 5. These event yields are integrated over a set of (HH) bins containing approximately 90% of the signal in each case. These windows are defined such that their first and last bins contain the 5 th and 95 th percentile of the distribution, respectively. This range is larger for the spin-2 signal because the benchmark model predicts a wider resonance. The statistical interpretation of the data is discussed in Section 7.  Table 4: Boosted signal region data, estimated background, and signal event yields in (HH) windows containing roughly 90% of each signal, for representative spin-0 mass hypotheses. The signal is normalized to the overall expected limit on its cross-section; its uncertainties are evaluated by adding all individual components in quadrature. The background yields and uncertainties are evaluated after a background-only fit to the data. The 4 category is not used for ( ) > 3 TeV and the 2 category is not used for ( ) < 2 TeV.

Combined results and interpretations
For each signal model, the hypothesis of the presence of a signal is tested using a profile likelihood ratio [91]. The likelihood fit is carried out in bins of corrected (HH) for the resolved channel and in bins of (HH) for the boosted channel. In the resolved channel, data from 2016, 2017, and 2018 are included separately in a simultaneous fit. In the boosted channel, data from the 2 , 3 , and 4 signal regions are included separately in a simultaneous fit. The relative contribution of each -tagging category to the sensitivity varies significantly with the resonance mass. At very high masses, the track-jets associated with one Higgs boson candidate tend to be highly boosted and are often not reconstructed as individual jets. Therefore, the 4 category is used only for signal hypotheses with ( / * KK ) ≤ 3 TeV, and the 2 category is used only for signal hypotheses with ( / * KK ) ≥ 2 TeV. For resonance masses in the range 900 GeV-1.5 TeV, the resolved and boosted channels are fitted simultaneously.
The likelihood function used to construct the test statistic has a standard form, consisting of a product of Poisson distributions for the yields in each bin and constraint functions for nuisance parameters describing systematic uncertainties. For uncertainties due to the limited sample size in data or MC simulation, the constraint is a Poisson distribution. For all other systematic uncertainties, the constraint is a Gaussian distribution. Any systematic uncertainty which is treated as uncorrelated between different regions or bins has a separate independent nuisance parameter for each of them. Uncertainties in the luminosity and signal modeling are treated as fully correlated between the resolved and boosted channels. All other uncertainties in the background model are treated as uncorrelated between the resolved and boosted channels. The statistical model is implemented using HistFactory [92].
The global significance is evaluated according to the procedure detailed in Ref. [93]. Pseudo-experiments are generated from the background-only model that was fitted to data, and used to construct a local -value distribution as a function of the resonance mass. The number of level crossings below a reference level of = 0.5 is used together with the local -value to compute a global -value. The most significant excess is found for a signal mass of 1100 GeV. The local significance of this excess is 2.3 for the spin-0 signal model and 2.5 for the spin-2 signal model. Its global significance is 0.4 for the spin-0 signal model and 0.8 for the spin-2 signal model.
Upper limits on the cross-section of resonant Higgs boson pair production via gluon-gluon fusion ggF are set in each of the benchmark models. These are based on the CL s method [94], where a cross-section value is considered excluded at the 95% confidence level (CL) when CL s is less than 0.05. For signal masses up to 3 TeV, the limits are computed using asymptotic formulae [91]. At higher masses, the asymptotic approximation is inaccurate, so the limits are instead computed by sampling pseudo-experiments. The results are shown in Figure 14. The theoretical prediction for the bulk RS model with / Pl = 1 is also shown; this is taken from Ref. [28]. This model is excluded for masses between 298 GeV and 1460 GeV. The expected mass exclusion range is from 304 GeV to 1740 GeV. The difference between the limits on the spin-0 and spin-2 signal models at low mass is primarily due to the fact that the spin-2 model predicts a much broader corrected (HH) distribution. In particular, the spin-2 signals with masses below 300 GeV are sensitive to a small deficit in the data between 350 and 400 GeV, while the spin-0 signals with masses below 300 GeV are not.
The impacts of the most important systematic uncertainties are shown in Table 6. In order to compute these numbers, the limit-setting procedure is repeated, but with the nuisance parameters in question held fixed to their best-fit values instead of being allowed to vary within an uncertainty. The resulting expected limit is an approximation of how much the sensitivity of the search would be improved if the "true values" Observed limit (95% CL) Expected limit (95% CL) Expected limit ± 1σ Expected limit ± 2σ Resolved expected limit Boosted expected limit Observed limit (95% CL) Expected limit (95% CL) Expected limit ± 1σ Expected limit ± 2σ production in the (a) spin-0 and (b) spin-2 signal models. The ±1 and ±2 uncertainty ranges for the expected limits (colored bands) are shown. Expected limits using each of the resolved and boosted channels individually (dashed colored lines) are shown. The theoretical prediction for the bulk RS model with / Pl = 1 [28] (solid red line) is shown; the decrease below 350 GeV is due to a sharp reduction in the * KK → branching ratio. The nominal →¯branching ratio is taken as 0.582. Table 6: Impacts of the main systematic uncertainties on the expected 95% CL upper limits on the signal cross-section for four illustrative values of ( ). These are defined as the relative decrease in the expected limit when each relevant nuisance parameter is held fixed to its best-fit value instead of being assigned an uncertainty. The spin-0 signal model is used here. of those parameters were known exactly. Uncertainties originating from the limited sample size in any data region are not considered "systematic" for the purposes of this evaluation, and no corresponding fit parameters are held fixed. For all signal mass hypotheses, statistical uncertainties are dominant. At low masses, uncertainties in the shape of the background (HH) distribution from the data-driven estimate also contribute significantly. These uncertainties get considerably constrained (typically by a factor of 2-3) by the fit. Detector and theoretical uncertainties have only a very small impact on the sensitivity of the search.

Conclusion
A search for resonant pair production of Higgs bosons in the¯¯final state was carried out using up to 139 fb −1 of LHC collision data collected by the ATLAS detector at √ = 13 TeV. Results are reported for the resolved channel, where each bb pair is reconstructed as two separate small-jets, and the boosted channel, where each bb pair is reconstructed as a single large-jet. The sensitivity of this analysis is improved relative to previous searches by using more sophisticated background modeling techniques, machine-learning methods, and variable-radius track-jets with optimized -tagging in addition to the full ATLAS Run 2 dataset. The expected upper limits on the cross-section are reduced relative to the previous ATLAS search in this final state by approximately 20% at low resonance masses and more than 80% at high masses. This search also covers resonance masses in the range from 3 to 5 TeV for the first time.
No significant evidence of a signal is observed. Upper limits are set on the cross-section of resonant Higgs boson pair production for two benchmark models: a generic narrow spin-0 resonance, and a spin-2 graviton in the context of a bulk Randall-Sundrum model with / Pl = 1, both of which are assumed to be produced via gluon-gluon fusion. The bulk Randall-Sundrum model is excluded for graviton masses between 298 GeV and 1460 GeV.

Acknowledgments
We thank CERN for the very successful operation of the LHC, as well as the support staff from our institutions without whom ATLAS could not be operated efficiently.          [75] ATLAS Collaboration, Evidence for the →¯decay with the ATLAS detector, JHEP 12 (2017) 024, arXiv: 1708.03299 [hep-ex].

References
[76] ATLAS Collaboration, Muon reconstruction performance of the ATLAS detector in proton-proton collision data at