Search for nonresonant Higgs boson pair production in the final state with two bottom quarks and two tau leptons in proton-proton collisions at √s = 13 TeV

A search for the nonresonant production of Higgs boson pairs (HH) via gluon-gluon and vector boson fusion processes in final states with two bottom quarks and two tau leptons is presented. The search uses data from proton-proton collisions at a center-of-mass energy of √s = 13 TeV recorded with the CMS detector at the LHC, corresponding to an integrated luminosity of 138 fb⁻¹. Events in which at least one tau lepton decays hadronically are considered, and multiple machine learning techniques are used to identify and extract the signal. The data are found to be consistent, within uncertainties, with the standard model (SM) predictions. Upper limits on the HH production cross section are set to constrain the parameter space for anomalous Higgs boson couplings. The observed (expected) upper limit at 95% confidence level corresponds to 3.3 (5.2) times the SM prediction for the inclusive HH cross section and to 124 (154) times the SM prediction for the vector boson fusion HH cross section. At 95% confidence level, the Higgs field self-coupling is constrained to be within −1.7 and 8.7 times the SM expectation, and the coupling of two Higgs bosons to two vector bosons is constrained to be within −0.4 and 2.6 times the SM expectation.


Introduction
The discovery of the Higgs boson (H) by the ATLAS and CMS Collaborations [1-3] was a major step towards improving the understanding of the mechanism of electroweak symmetry breaking. With the value of the H mass (m_H) experimentally measured by the CMS Collaboration to be 125.38 ± 0.14 GeV [4], the structure of the Higgs scalar field potential and the strength of the H self-coupling are precisely predicted in the standard model (SM). While the measured properties are so far consistent with the SM [5], determining the H self-coupling provides an independent test of the SM and is fundamental to experimentally probing the shape of the scalar potential that underlies the electroweak symmetry breaking mechanism.
The trilinear self-coupling of the Higgs boson (λ_HHH) can be extracted from the measurement of the Higgs boson pair (HH) production cross section. In the SM, HH production in proton-proton (pp) collisions at the LHC occurs mainly via gluon-gluon fusion (ggF), through diagrams involving either only the couplings of the H to a loop of virtual fermions, or additionally the λ_HHH coupling itself. The leading-order (LO) gg → HH Feynman diagrams shown in Fig. 1 have approximately the same amplitude but interfere destructively, thereby reducing the production cross section. The SM prediction for the HH production cross section, computed at next-to-next-to-LO (NNLO) precision for a center-of-mass energy of √s = 13 TeV, is σ_ggF^HH = 31.05 +6%/−23% (scale + m_t) ± 3% (PDF + α_S) fb for m_H = 125 GeV [6,7], where PDF is the parton distribution function, α_S is the strong coupling constant, and m_t is the mass of the top quark. Among the other HH production modes, vector boson fusion (VBF) is of particular interest: it is the second-largest contribution to HH production and gives access to the coupling between one or two H bosons and two vector bosons. Despite its small production cross section, σ_VBF^HH = 1.726 +0.03%/−0.04% (scale) ± 2.1% (PDF + α_S) fb at next-to-NNLO precision for √s = 13 TeV [8], the distinctive VBF topology provides a useful handle to identify signal events, thanks to its two forward jets, which are well separated in pseudorapidity (η) and have a large invariant mass. The LO Feynman diagrams for VBF HH production are shown in Fig. 2.
In the nonresonant case, effects of physics beyond the SM can appear via anomalous couplings of the H or via new particles contributing to the virtual quantum loops in Fig. 1. The experimental signature would be a modification of the HH production cross section and of the kinematic properties of the events [9].
The variation of λ_HHH with respect to the SM prediction is parameterized by the coupling modifier κ_λ = λ_HHH/λ_HHH^SM, which affects both the ggF and the VBF production mechanisms. Similarly, variations of the coupling of the H to the top quark are parameterized by κ_t, while those of the couplings involved in VBF production are parameterized by κ_V (VVH coupling) and κ_2V (VVHH coupling).
In this paper, we present a search for HH production in pp collisions at √s = 13 TeV, investigating both the ggF and the VBF production mechanisms. The final state where one H decays to bb and the other decays to τ⁺τ⁻ (for simplicity, HH → bbττ), characterized by a decay branching fraction of 7.3% for m_H = 125 GeV, is studied. The sizable branching fraction, combined with the precise tau lepton identification algorithms developed within the CMS Collaboration (DEEPTAU [10]), makes this final state one of the most sensitive to HH production. We consider three decay modes of the ττ system: τ_μτ_h, τ_eτ_h, and τ_hτ_h, where τ_μ, τ_e, and τ_h correspond, respectively, to the decay of a tau lepton into a muon, an electron, or hadrons, with the associated neutrinos. These account for 87.6% of all ττ decay modes. Quantum chromodynamics (QCD) multijet and Z/γ* → ℓℓ events represent the dominant background in the fully hadronic channel, while in the semileptonic channels tt events represent the largest contamination. The data used in this analysis correspond to an integrated luminosity of 138 fb⁻¹ collected during 2016-2018 by the CMS experiment.
The search described in this paper improves on the previous HH → bbττ results from the CMS and ATLAS Collaborations [11,12] by making use of a larger data sample, an improved trigger strategy, and several newly developed neural networks that identify the b jets from the H decay, categorize the events, and perform the signal extraction. Moreover, this analysis builds upon CMS-wide innovations such as the new DEEPTAU [10] and DEEPJET [13] algorithms. All these factors lead to particularly stringent results on the HH production cross sections.
Tabulated results are provided in the HEPData record for this analysis [14].

The CMS detector
The CMS detector is a multipurpose apparatus optimized to study high-transverse-momentum (p_T) physics processes in pp collisions. The central feature of the CMS detector is a superconducting solenoid providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker that covers the region |η| < 2.5, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections that cover |η| < 3. Forward calorimeters extend the η coverage, provided by the barrel and endcap detectors, to |η| < 5. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid, covering the range |η| < 2.4. Events of interest are selected using a two-tiered trigger system. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of about 4 µs [15]. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage. Events are reconstructed with the particle-flow (PF) algorithm, which aims to identify each individual particle in an event through an optimized combination of information from the various elements of the CMS detector; reconstructed particles are classified as electrons, muons, photons, and charged or neutral hadrons. An important portion of the bbττ final state is reconstructed and identified through the combination of PF objects, such as the jets originating from b quarks, the τ_h candidates, and the missing transverse momentum vector p⃗_T^miss, computed as the negative vector p_T sum of all the PF candidates in an event. The magnitude of the p⃗_T^miss vector is denoted as p_T^miss [35]. The primary vertex (PV) is taken to be the vertex corresponding to the hardest scattering in the event, evaluated using tracking information alone, as described in Section 9.4.1 of Ref. [36].
The electron momentum is estimated by combining the energy measurement in the ECAL with the momentum measurement in the tracker. The momentum resolution for electrons with p_T ≈ 45 GeV from Z → ee decays ranges from 1.6 to 5.0%. It is generally better in the barrel region than in the endcaps, and also depends on the bremsstrahlung energy emitted by the electron as it traverses the material in front of the ECAL [37,38]. Electron identification (ID) is based on a boosted decision tree (BDT) that combines ECAL-only, tracker-only, and ECAL-plus-tracker observables, including the isolation of the electron. The "tight" working point of this discriminator, characterized by an efficiency of 80%, is employed in the analysis [37].
Muons are measured in the range |η| < 2.4, with detection planes based on three technologies: drift tubes, cathode strip chambers, and resistive plate chambers. The single-muon trigger efficiency exceeds 90% over the full η range, and the efficiency to reconstruct and identify muons is greater than 96%. Matching muons to tracks measured in the silicon tracker results in a relative p_T resolution, for muons with p_T up to 100 GeV, of 1% in the barrel and 3% in the endcaps. The p_T resolution in the barrel is better than 7% for muons with p_T up to 1 TeV [39]. The isolation of muon candidates (I_rel^ℓ) is defined as the p_T sum of PF candidates inside a cone of ∆R = √((∆η)² + (∆ϕ)²) < 0.4 around the muon, relative to the p_T^ℓ of the muon (where ϕ is the azimuthal angle in radians):

I_rel^ℓ = ( Σ p_T^charged + max[0, Σ p_T^neutral-had + Σ p_T^γ − 0.5 Σ p_T^PU] ) / p_T^ℓ,    (1)

where Σ p_T^charged, Σ p_T^neutral-had, and Σ p_T^γ are the scalar sums of the transverse momenta of charged hadrons originating from the PV, neutral hadrons, and photons, respectively, and Σ p_T^PU is the sum of the transverse momenta of charged hadrons not originating from the PV. In this analysis, the signal muon candidates are required to have I_rel^ℓ < 0.15 and to pass the "tight" identification criteria described in Ref. [39]. The combined identification and isolation efficiency varies between 90 and 99% in the p_T range 20-200 GeV.
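As a worked illustration of Eq. (1), a minimal sketch of the relative-isolation computation; the function name and argument layout are hypothetical, and the 0.5 factor is the standard Δβ pileup correction:

```python
def rel_iso(pt_lepton, sum_charged, sum_neutral_had, sum_photon, sum_pu):
    """Relative isolation of Eq. (1); inputs are scalar pT sums in GeV."""
    # Delta-beta correction: subtract half the pileup charged-hadron sum
    # from the neutral component, clamped at zero.
    neutral = max(0.0, sum_neutral_had + sum_photon - 0.5 * sum_pu)
    return (sum_charged + neutral) / pt_lepton

# A muon passes the analysis isolation requirement if rel_iso(...) < 0.15.
```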
Hadronic decays of τ leptons are reconstructed using the "hadrons-plus-strips" algorithm [40-42], which classifies individual hadronic decay modes of the τ by combining charged hadrons from the PF reconstruction with neutral pions. The latter are reconstructed by clustering electrons and photons into rectangular strips, which are narrow in η but wide in ϕ. The spread in ϕ accounts for photons originating from neutral pion decays that convert into electron-positron pairs while traversing the silicon tracker. These particles are bent in opposite ϕ directions by the magnetic field and may emit further bremsstrahlung photons before reaching the ECAL. The decay modes considered in this analysis produce one charged pion or kaon plus zero, one, or two neutral pions, or three charged pions or kaons plus zero or one neutral pion. The DEEPTAU algorithm identifies τ_h objects using a convolutional neural network that combines high-level tau lepton observables with low-level features, obtained from the silicon tracker, the ECAL and HCAL, and the muon detectors, for all the PF candidates, electrons, and muons reconstructed inside the tau lepton isolation cone (∆R < 0.5). Three different discriminants are employed to distinguish τ_h candidates from jets ("DEEPTAUVSJETS"), electrons ("DEEPTAUVSELE"), and muons ("DEEPTAUVSMU"). Different working points are used in each analysis channel, as summarized in Table 1.

Jets are reconstructed offline from PF candidates, clustered using the anti-k_T algorithm [43,44] with distance parameters of 0.4 and 0.8, giving jets denoted as "AK4" and "AK8", respectively. The constituents of the AK8 jets are reclustered using the Cambridge-Aachen algorithm [45,46]; the "modified mass drop tagger" algorithm [47,48], also known as the "soft-drop" algorithm, is applied to remove soft, wide-angle radiation from the jet. The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5-10% of the true momentum over the entire p_T spectrum and detector acceptance. Pileup collisions can give rise to jets not coming from the hard-scattering process (pileup jets) or contribute additional tracks and calorimetric energy depositions, increasing the apparent jet momentum. To mitigate this effect for AK4 jets, tracks identified as originating from pileup vertices are discarded and an offset is applied to correct for remaining contributions [49]. For AK8 jets, the pileup-per-particle identification algorithm [50] is used to mitigate the effect of pileup at the reconstructed-particle level, making use of local shape information, event pileup properties, and tracking information. Jet energy corrections are derived from simulation studies so that the average measured energy of jets matches that of particle-level jets. In situ measurements of the momentum balance in dijet, γ + jets, Z + jets, and multijet events are used to determine any residual differences between the jet energy scale in data and in simulation, and appropriate corrections are applied [51,52]. Jets originating from b quarks are identified using the DEEPJET algorithm [13], a deep neural network combining secondary vertex properties, track-based variables, and PF jet constituents (neutral- and charged-particle candidates).
Different thresholds applied to the output of the DEEPJET algorithm define working points characterized by their b tagging performance; in this analysis the "loose" and "medium" working points are used, corresponding to an efficiency (misidentification rate for light-flavor and gluon jets) of 94 (10)% and 84 (1)%, respectively [13].
As mentioned in the introduction, this analysis considers three decay modes of the ττ system: τ_μτ_h, τ_eτ_h, and τ_hτ_h. Different trigger strategies are employed for each of these channels. In the semileptonic ones, single-lepton and cross triggers are used: the former require the presence of an isolated electron or muon only, while the latter also require an additional hadronically decaying tau lepton. The use of cross triggers represents the main change in the trigger strategy with respect to the result previously published on 2016 data [11] and improves the analysis sensitivity by allowing the p_T thresholds on electrons and muons to be lowered, as described in Table 1. In the τ_hτ_h channel, triggers requiring the presence of a pair of hadronically decaying τ leptons with p_T > 35 GeV are employed. A VBF ditau trigger was introduced in 2017 to extend the analysis phase space, targeting events containing two hadronically decaying τ leptons with p_T > 20 GeV and two jets, one with p_T > 115 GeV and the other with p_T > 45 GeV. The online trigger thresholds for each electron, muon, or tau lepton are reported in Table 1. Compared to the online selection, the offline thresholds are increased by 1 GeV for electrons and muons and by 5 GeV for τ_h candidates, to ensure that the trigger efficiency reaches its plateau value.
Events are classified into the three ττ decay channels based on the τ lepton candidates reconstructed offline. Selected signal events are required to have at least one τ_h candidate. An event is classified as τ_μτ_h if a muon passing the selections detailed in Table 1 is found; otherwise it is classified as τ_eτ_h if an electron passing the selections is found; if neither a muon nor an electron is found, the event is classified as τ_hτ_h if a second τ_h is present. In the τ_hτ_h channel, unless stated otherwise, the first τ_h refers to the more isolated one.
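The channel assignment can be summarized as a simple priority cascade; the following sketch assumes hypothetical event fields holding the objects that pass the Table 1 selections:

```python
def classify_channel(event):
    """Assign an event to one ditau channel following the cascade above."""
    if not event["tau_h"]:          # at least one hadronic tau is required
        return None
    if event["muons"]:              # muon passing the Table 1 selections
        return "tau_mu tau_h"
    if event["electrons"]:          # electron passing the Table 1 selections
        return "tau_e tau_h"
    if len(event["tau_h"]) >= 2:    # two hadronic taus, most isolated first
        return "tau_h tau_h"
    return None
```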
Each reconstructed electron, muon, or tau lepton candidate is required to have transverse and longitudinal distances between its track and the PV satisfying |d_xy| < 0.045 cm (electrons and muons only) and |d_z| < 0.2 cm. The two objects forming a pair must have opposite charge and be separated by ∆R > 0.5. To reduce the Z/γ* → ℓℓ contamination, a third-lepton veto is applied, discarding events where additional electrons or muons are present.
Finally, events selected with the criteria described above are required to have at least two additional AK4 jets with p_T > 20 GeV, |η| < 2.4, and ∆R(j, ℓ) > 0.5, where ℓ represents each of the two leptons forming the ττ candidate pair. Studies of the previously published bbττ result [11] highlighted the selection of the H → bb candidate as one of the least efficient steps. A new b jet assignment algorithm (referred to as HH-bTag) has therefore been developed, based on a neural network architecture. The input features of the network include kinematic variables of the b jet candidates and of the H → ττ system, the DEEPJET scores of the b jet candidates, and their angular separation with respect to the H → ττ candidate. For each event, all possible b jet candidates are assigned a score by the HH-bTag algorithm; the two jets with the highest scores are taken as those originating from the decay of the H. Two additional jets are then sought among all the remaining jets with p_T > 30 GeV and |η| < 4.7: all possible pairs are built, and the pair with the highest invariant mass (m_jj) is chosen as the VBF candidate pair. If the event was selected by the VBF ditau trigger, and not by the other ditau triggers, then, because of the lower p_T thresholds on the τ_h candidates, three additional requirements are imposed: m_jj > 800 GeV, and p_T > 140 GeV and p_T > 60 GeV for the leading and subleading VBF jet candidates, respectively.
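The jet-assignment logic can be sketched as follows; the HH-bTag network itself is not reproduced (`hhbtag_score` stands in for its per-jet output), and the jet attributes are hypothetical:

```python
from itertools import combinations

def select_jets(jets, hhbtag_score):
    """Pick the two H->bb jets and, if possible, a VBF candidate pair."""
    # H->bb candidates: the two jets with the highest HH-bTag scores.
    ranked = sorted(jets, key=hhbtag_score, reverse=True)
    b_jets, remaining = ranked[:2], ranked[2:]
    # VBF candidates: among the remaining jets with pT > 30 GeV and
    # |eta| < 4.7, the pair with the largest invariant mass m_jj.
    candidates = [j for j in remaining if j.pt > 30 and abs(j.eta) < 4.7]
    vbf_pair = None
    if len(candidates) >= 2:
        vbf_pair = max(combinations(candidates, 2),
                       key=lambda pair: (pair[0].p4 + pair[1].p4).mass)
    return b_jets, vbf_pair
```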
The selections on leptons and jets described in this section and in Table 1 lead to a final efficiency and acceptance product of 5.5 (3.4)% for the ggF (VBF) SM signal events.

Event categorization and discriminating variables
Events selected as described in Section 4 are classified into eight mutually exclusive categories, in order to maximize the sensitivity to both the ggF and VBF HH searches.
In order to categorize signal events produced through the VBF mechanism, a selection (denoted "VBF inclusive") is introduced, requiring that two VBF jet candidates, as described in Section 4, be identified in the event and that they pass two further requirements (jointly referred to as the "VBF tag") designed to reduce the background contamination: m_jj > 500 GeV and a pseudorapidity separation (∆η) between the two VBF jet candidates larger than 3.
Because of the ggF signal contamination in the VBF inclusive selection, a multiclassification approach is introduced to enhance the analysis sensitivity. It creates two additional signal categories (classVBF for the VBF HH events and classGGF for the ggF contribution), as well as three background-enriched ones (classttH for ttH events, classTT for the tt contamination, and classDY for the Z/γ* → ℓℓ contribution), used to constrain the systematic uncertainties affecting these processes. Each event is assigned to the category for which it has the highest multiclassifier score, the score representing the probability that the event originates from the process denoted by the category name.
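A minimal sketch of this category assignment, assuming the five class scores are available as a dictionary:

```python
VBF_CLASSES = ["classVBF", "classGGF", "classttH", "classTT", "classDY"]

def assign_category(scores):
    """Return the class with the highest multiclassifier score."""
    return max(VBF_CLASSES, key=lambda c: scores[c])
```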
Events without two identified VBF jet candidates, or with two VBF jet candidates that do not pass the VBF tag selection, are classified by exploiting the different Lorentz-boost regimes of the bb system in order to increase the analysis sensitivity. If the two b jets have ∆R(b, b) > 0.8, they are reconstructed as separate AK4 jets; if ∆R(b, b) < 0.8, they are merged and reconstructed as a single AK8 jet. In the first case the jets are referred to as "resolved", in the second as "boosted". To be categorized as boosted, events are required to have an AK8 jet mass of at least 30 GeV and a distance ∆R < 0.4 between each of the two subjets and the previously selected AK4 jets, and both subjets must pass the loose b tagging working point. All events not satisfying these requirements are categorized as resolved. In addition, resolved events are split into two categories depending on the b tag multiplicity: events where only one jet passes the medium b tagging working point (res1b), and events where both jets pass it (res2b).
After object selection and event categorization, the kinematic information in the events is used to introduce additional requirements on the invariant masses of the b jet candidates (m_bb) and of the tau lepton candidates (m_ττ); the latter is reconstructed using the SVFIT algorithm [53], a dynamic likelihood technique combining the kinematic properties of the two visible lepton candidates and the p_T^miss in the event. Events satisfying the invariant mass requirements constitute the signal region (SR) of the analysis. For the resolved categories the selection is an elliptical requirement in the (m_ττ, m_bb) plane, given in Eq. (2), while for the boosted category, because of its different kinematic properties, a different elliptical requirement is used, given in Eq. (3). No invariant mass requirement is applied to the VBF multicategories.

In Eqs. (2) and (3), the mass offsets (numerators) and resolutions (denominators) are chosen using a random search that minimizes the background acceptance for a signal efficiency larger than 90%. Contrary to what was done in Ref. [11], the purpose of the invariant mass selection is no longer to maximize the signal significance, but rather to remove significantly outlying background events in regions where no signal is expected, while the discrimination of HH events from the background is left to a specifically designed neural network, as described in the next paragraph.
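For illustration, a sketch of a mass-window requirement of the form implied above, with offsets in the numerators and resolutions in the denominators; the elliptical form and all parameter names are assumptions, and the actual values entering Eqs. (2) and (3) are specific to the analysis:

```python
def in_signal_region(m_tautau, m_bb, off_tt, res_tt, off_bb, res_bb):
    """Assumed elliptical (m_tautau, m_bb) window; all parameters in GeV.

    off_* are the assumed mass offsets (numerators) and res_* the
    assumed resolutions (denominators) of Eqs. (2) and (3).
    """
    return ((m_tautau - off_tt) / res_tt) ** 2 \
         + ((m_bb - off_bb) / res_bb) ** 2 < 1.0
```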
The eight orthogonal categories so obtained (res2b, res1b, boosted, classVBF, classGGF, classttH, classTT, and classDY) are used for the signal extraction via a deep neural network developed to identify HH → bbττ events, referred to simply as the DNN in the following. A single training is performed with events from all years, channels, and categories, obtained from the simulated background and SM HH signal samples described in Section 3. The goal of the training is to classify events as originating either from signal or from background processes by assigning them a single prediction: predictions closer to zero indicate "background-like" events, while predictions closer to one indicate "signal-like" events. The final DNN distributions used in the signal extraction fit are obtained by inferring the predictions of the trained network separately in each of the eight orthogonal categories, split by channel and year.
The DNN is composed of two discriminators, each consisting of ten neural networks trained via ten-fold stratified cross-validation of the training set. The architecture of the networks is the same as the optimal architecture developed in Ref. [54], with the addition of increased overfitting supervision [55,56]. The approach used to train the DNN [57-60] follows that used in the CMS HH High-Luminosity LHC projection analysis for the bbττ final state [61,62]. In this approach, the simulated events are split evenly into two subsets, and each discriminator of the DNN is trained on a different subset. At the inference stage, each discriminator is applied to the subset of events on which it was not trained. Thus, all of the simulated events are used for inference, reducing the associated statistical uncertainty in the sample density distributions, while avoiding the systematic bias in the predictions that would arise if the discriminators were applied to events on which they were trained. It should be noted that this approach does not provide an explicit validation sample, and therefore does not permit any fine-tuning of hyperparameters without the risk of biasing the model. Nevertheless, the lack of tuning is not expected to significantly impair the performance of the DNN [54]. A total of 26 features (selected from a starting pool of over 100) are used as input to the DNN. Besides categorical features, such as the year of data taking, the ττ decay mode, and the number of VBF jet candidates, among the most important features are the DEEPJET scores of the b jets, binned in working points, the invariant masses of the HH, ττ, and bb systems, and several kinematic variables, such as visible momenta and distances in the η-ϕ plane between reconstructed particles, describing the spatial distribution of the physics objects in the event.
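The two-fold inference scheme can be sketched as follows; the split key and model interfaces are hypothetical:

```python
def dnn_predict(events, model_a, model_b):
    """Apply each discriminator only to events it was not trained on."""
    predictions = []
    for event in events:
        # model_a is assumed trained on even event numbers, model_b on odd.
        in_subset_a = event["event_number"] % 2 == 0
        model = model_b if in_subset_a else model_a  # cross-apply
        predictions.append(model(event["features"]))
    return predictions
```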

Background modeling
The main background sources contaminating the SR are tt production, Z/γ* → ℓℓ production, and QCD multijet events; all three are modeled using dedicated approaches. The production of single top quarks, single H bosons, W bosons in association with jets, dibosons, tribosons, and tt pairs in association with a single boson or a pair of vector bosons is instead estimated from simulation, as described in Section 3. For all backgrounds except multijet, both events with genuine hadronically decaying tau leptons and events with leptons or with quark or gluon jets misidentified as τ_h candidates are taken from simulation. The fraction of misidentified τ_h candidates in the dominant background (tt) is lower than 20% in the analysis SR. No specific correction is applied to compensate for potential differences between simulation and data for misidentified τ_h candidates, but a systematic uncertainty is added to the fit model as described in Section 7.
The tt background contribution is modeled with simulation. Since its normalization shows a disagreement with respect to the observed data, a correction scale factor (ttSF) is fitted in a tt-enriched control region (CR). Three CRs are defined, one per year, containing τ_eτ_h, τ_μτ_h, and τ_hτ_h events that satisfy the res2b selections outlined in Section 5 and pass a mass requirement obtained by inverting the resolved-category requirement of Eq. (2). The contamination from other backgrounds in the tt-enriched CRs is less than 9% for all three years. The ttSFs resulting from the fit are 0.908, 0.988, and 0.966 for 2016, 2017, and 2018, respectively, with fit uncertainties ranging between 0.6 and 0.9%.
The Z/γ* → ℓℓ background contribution is also modeled with simulation. Scale factors for the simulation normalization are obtained from 18 CRs containing events with two isolated muons compatible with the Z → µµ decay, with the same jet selection criteria as in the SR. The CRs are defined as follows: the data are first split into three CRs based on the number of b-tagged jets (0, 1, or 2); each CR so obtained is further split into six CRs based on the p_T of the reconstructed Z candidate. Simulated events are correspondingly split into 18 templates based on the number of b partons at the generator level and the p_T of the generator-level Z boson; one additional template, combining all other backgrounds, is also considered. The scale factors are obtained by performing a simultaneous fit of the dimuon mass in the CRs using the 19 templates, with their normalizations left floating.
The multijet background contribution is determined from data in jet-enriched regions where the tau lepton pair requirements are inverted. Two different shapes are obtained in regions where either the pair charge or the τ_h isolation requirement is inverted (in the τ_hτ_h final state, only the isolation requirement on the lower-p_T τ_h candidate is inverted), with all other selections applied as in the SR. The final multijet background template is defined as the mean of these two shapes. The yield is obtained as the product of the yields in the two regions defined above, divided by the yield in the region where both the charge and isolation requirements are inverted. The contributions of the other backgrounds, predicted as outlined above, are subtracted in all these regions.
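This is a variant of the ABCD method; a minimal sketch with plain lists of bin contents, assuming the other backgrounds have already been subtracted:

```python
def multijet_estimate(shape_inv_charge, shape_inv_iso,
                      yield_inv_charge, yield_inv_iso, yield_inv_both):
    """Build the multijet template from the inverted-selection regions.

    shape_* are background-subtracted histograms (lists of bin contents)
    from the charge-inverted and isolation-inverted regions; yield_* are
    the corresponding background-subtracted event yields.
    """
    # Shape: bin-by-bin mean of the two inverted-selection shapes.
    shape = [0.5 * (a + b) for a, b in zip(shape_inv_charge, shape_inv_iso)]
    # Normalization: product of the single-inversion yields divided by
    # the yield of the doubly inverted region.
    norm = yield_inv_charge * yield_inv_iso / yield_inv_both
    total = sum(shape)  # assumed nonzero for this sketch
    return [norm * s / total for s in shape]
```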
A comparison between the data and the simulated predictions, corrected as described in this section, is shown in Fig. 3 for three of the most important input features of the DNN described in Section 5.

Systematic uncertainties
In this analysis, we include various sources of systematic uncertainty originating from the limited knowledge of the background and signal processes, discrepancies between simulation and data, and imperfect knowledge of the detector response. They are categorized as "normalization" and "shape" uncertainties: the former affect only the total event yield of a process, while the latter also affect the distribution of the events. The uncertainties described in this section are introduced as nuisance parameters in the maximum likelihood fit described in Section 8, with log-normal (Gaussian) priors for normalization (shape) uncertainties.
The following normalization uncertainties are considered. These uncertainties are applied only to the signals and to the background processes estimated from simulation. Since the normalizations of the tt, Z/γ* → ℓℓ, and multijet backgrounds are obtained from data, they are not subject to the integrated luminosity uncertainties.
• Electron and muon reconstruction, isolation, and identification uncertainties are determined from the simulation-to-data scale factors; a value of 1% is obtained for both electrons and muons [37,39]. An additional uncertainty of 3 (15)% for tau leptons with p_T < 100 GeV (> 100 GeV) is added in the τ_hτ_h channel [10].
• During the 2016-2017 data taking, a gradual shift in the timing of the inputs of the ECAL L1 trigger in the region |η| > 2.0 caused a trigger inefficiency. For events containing an electron (a jet) with p_T > 50 (100) GeV, the efficiency loss in the region 2.5 < |η| < 3.0 is approximately 10-20%, depending on p_T, η, and time. Correction factors were computed from data and applied to the acceptance evaluated from simulation. An uncertainty of 2% is assigned to this effect.
• The uncertainty in the pileup reweighting technique is estimated by varying the applied pileup weights by their uncertainty. The resulting systematic uncertainty is estimated to be 1% and is correlated among all channels and categories within each year.
• The normalization of the tt background is taken from a fit to one CR per year, as described in Section 6. The uncertainty in the scale factors obtained in these CRs is purely statistical and is always below 1%.
• The normalization of the Z/γ* → ℓℓ background is taken from a fit to 18 CRs per year, as described in Section 6. The uncertainties in the scale factors obtained in these CRs are propagated to the SR, taking into account their correlations, and range from 0.1 to 60% depending on the year and the CR considered.
• The multijet background contribution is determined from data in jet-enriched regions, as described in Section 6. Two normalization uncertainties are derived to take into account the statistical fluctuations of the event yields in these CRs and their dependence on the τ_h isolation requirement used to define them.
• The uncertainties in the normalizations of the backgrounds modeled relying solely on the simulated events range from 2 to 10%.
• The theoretical uncertainty in the HH production cross section: via ggF, +6%/−23% (scale + m_t) and ±3% (PDF + α_S) [6]; via VBF, +0.03%/−0.04% (scale) and ±2.1% (PDF + α_S) [8]. These uncertainties are considered only when upper limits are quoted with respect to the SM and for the likelihood scans; they are not included for the upper limits on the cross section.
• The theoretical uncertainties in the H branching fractions [63] amount, for the decay to bb, to ±0.65% (theory), +0.72%/−0.74% (m_q), and +0.78%/−0.80% (α_S), where m_q is the quark mass; and, for the decay to tau leptons, to +1.16%/−1.17% (theory), +0.98%/−0.99% (m_q), and ±0.62% (α_S).
• The uncertainty in the modeling of the VBF signal in PYTHIA8 is estimated using samples generated with the dipole recoil option, which affects the initial-state parton showers, set to ON [64]. The ratio of integrated yields in the dipole recoil ON/OFF samples is taken as the uncertainty: it varies from 10% for highly populated categories such as classVBF, to 70% for categories with a small VBF signal contribution, such as classttH.
The following shape uncertainties are considered.
• The uncertainty in the measurement of the energy of τ_h candidates [10]. The uncertainties are derived by combining low- and high-p_T measurements in Z → ττ and off-shell W* → τν events. Four different uncertainties are included to account for the different τ_h decay modes considered in this analysis. When considering the uncertainty for a particular decay mode, the shift is applied only to the τ_h candidates reconstructed with that decay mode, while all other τ_h candidates are left unchanged.
• Uncertainties related to the calibration of the jet energy scale (JES) and resolution (JER). For the JES, 11 separate sources of uncertainty are included per year: those appearing in multiple years are treated as fully correlated, while those appearing in only one year are treated as uncorrelated. For the JER, alternative templates are produced by shifting all the jet-related features. These shifts stem from the use of scale factors to smear the energy of simulated jets to match the energy resolution observed in data.
• Separate uncertainties in the energy scale of electrons misidentified as τ_h candidates are provided for two different decay modes, h± and h±π⁰. The uncertainty in the energy scale of muons misidentified as τ_h candidates is 1%, uncorrelated across decay modes.
• The uncertainties arising from the application of the DEEPTAU identification scale factors, determined with a tag-and-probe procedure as a function of the τ_h candidate p_T. Five uncertainties, computed in p_T bins of the τ_h candidate, are included for the identification of τ_h candidates against jets. All of them are used in the τ_μτ_h and τ_eτ_h channels, while in the τ_hτ_h channel, since both leptons are required to have p_T above a threshold, only the highest-p_T bin uncertainty applies. Two uncertainties are included for the identification against electrons, one for the barrel and one for the endcaps, treated as uncorrelated across η bins of the τ_h candidate.
• The shape uncertainty in the multijet contribution is determined from the two alternative templates described in Section 6.
• The uncertainties arising from the misidentification of jets as τ_h candidates, determined in τ_μτ_h CRs defined by inverting the charge-sign requirement on the tau leptons and requiring that neither of the b jet candidates pass the "medium" working point of the DEEPJET algorithm. Six uncertainties, one for the barrel and one for the endcaps for each year, are derived and treated as uncorrelated across years and detector regions.
• Trigger efficiencies and scale factors are measured in a Z → ττ → µν_µν_τ τ_hν_τ enriched region with a tag-and-probe procedure and fitted separately for data and simulation. The uncertainty, determined as a function of the reconstructed τ_h candidate p_T, is taken from the fit. The scale factors are obtained from the data-to-simulation ratio, and their uncertainties are propagated accordingly. Four uncertainties are included to account for the different τ_h decay modes considered in this analysis; they are applied to the τ_h leg of each channel. Two additional trigger uncertainties cover the cases where the tau lepton decays to an electron or a muon. Finally, one uncertainty is added for the τ_hτ_h final state in 2017-2018 to account for the jet-leg scale factors of the VBF trigger.
• The uncertainty in the b tagging efficiency. This takes into account the contamination from udscg (cb) jets in the heavy-flavor (light-flavor) regions, as well as statistical fluctuations in both the data and simulation samples used for the computation of the b tagging efficiency.
• The uncertainties in the pileup jet identification scale factors as functions of p_T and η.
The largest sources of systematic uncertainty are the imperfect knowledge of the ggF HH production cross section, the statistical fluctuations affecting the multijet background estimation, and the mismodeling of jet and τ lepton identification and reconstruction in simulated samples. The total impact of the systematic uncertainties on the result is approximately 15%.

Results
A binned maximum likelihood fit of the DNN prediction to the data is performed simultaneously in the eight categories per channel and for all three years, for a total of 72 input distributions. The systematic uncertainties described in Section 7 are introduced as nuisance parameters in the likelihood function. Upper limits at 95% confidence level (CL) on the HH production cross section are set using the asymptotic modified frequentist method (asymptotic CL_s) [65-67]. We report the expected and observed sensitivities for the inclusive (ggF + VBF) HH production cross section as a function of κ_λ, and those for the VBF HH production cross section as a function of κ_2V. Coupling modifiers that are not part of the scanning procedure are fixed to unity, corresponding to the SM prediction.
The binning schemes for the DNN prediction distributions used in the likelihood fit are chosen to minimize the expected upper limit, while keeping the number of bins as small as possible and ensuring the stability of the fit. The schemes start from finely binned histograms, whose bins are then merged using Bayesian optimization techniques to find the best performing scheme [68]. To reduce the computational complexity, limits are minimized consecutively by combining one category after the other and freezing the binning of all previous histograms. To avoid a poor modeling of the background yield or of the background uncertainty, each bin is required to contain at least one tt and one Z/γ* → ℓℓ event and to have a weighted background yield larger than 0.03.
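A sketch of the per-bin validity requirement used while merging, with hypothetical field names for the bin bookkeeping:

```python
def bin_is_valid(bin_info):
    """A merged bin must be well modeled before it enters the fit."""
    return (bin_info["n_tt_events"] >= 1        # at least one tt event
            and bin_info["n_dy_events"] >= 1    # at least one Z/gamma*->ll
            and bin_info["bkg_yield"] > 0.03)   # weighted background yield
```

During the optimization, a candidate binning scheme producing any invalid bin would be discarded before the expected limit is evaluated.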
The postfit DNN distributions used in the maximum likelihood fit are exemplified in Fig. 4, which shows the two most sensitive categories in the ggF (res2b) and VBF (classVBF) searches in the τ_hτ_h channel in 2018; the distributions are plotted as a function of the DNN bin number for easier visualization.
As the total number of distributions is relatively large, we provide distributions that combine the content of bins with equal or similar expected sensitivity in terms of a signal-to-background ratio, as shown in Fig. 5, separately for the three channels τ_eτ_h, τ_μτ_h, and τ_hτ_h, as well as in Fig. 6 for the combination of the full 2016-2018 data set. In each plot, the bins of all input distributions are collected, ordered according to their expected prefit signal-to-square-root-background ratio, and finally merged according to a dedicated binning rule. Hence, the rightmost bins are the most sensitive to the signal, whereas the leftmost bins are dominated by the background contributions.
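A sketch of this ordering step, assuming per-bin prefit signal (s) and background (b) yields with b > 0; the merging rule here is a simple equal-size grouping standing in for the dedicated rule mentioned above:

```python
import math

def order_and_group_bins(bins, n_groups):
    """Order bins by prefit s/sqrt(b), then group runs of similar bins."""
    ranked = sorted(bins, key=lambda x: x["s"] / math.sqrt(x["b"]))
    size = max(1, len(ranked) // n_groups)
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```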
The expected and observed limits for the ggF plus VBF (inclusive) HH production cross section at the SM point (κ λ = 1) and those for the VBF only HH production cross section (κ 2V = 1) are listed in Tables 2 and 3, respectively. A visual representation of these limits is provided in Fig. 7.
Figure 5: Combination of bins of all postfit distributions, ordered according to the expected prefit signal-to-square-root-background ratio, where the signal is the SM HH signal expected in that bin and the background is the prefit background estimate in the same bin, separately for the τ_eτ_h channel (upper left), the τ_μτ_h channel (upper right), and the τ_hτ_h channel (lower). The ratio panel also shows the signal scaled to the observed exclusion limit (as given in Table 2).

The upper limits are used to set two-dimensional constraints on the coupling modifiers κ_λ, κ_t, κ_2V, and κ_V, as shown in Fig. 9. A point in a two-dimensional parameter space is excluded when the upper limit on the overall rate of HH production at 95% CL is below the theoretical cross section evaluated at the corresponding parameter values.
For both the ggF and the VBF results reported here, the largest sensitivity is provided by the τ_hτ_h channel, followed by the τ_μτ_h channel and lastly the τ_eτ_h channel. The resolved categories, especially res2b, dominate the sensitivity of the constraint on κ_λ, while the combined VBF multicategories, particularly classVBF, provide the most stringent constraints on κ_2V.
Assuming that an HH signal exists with the properties predicted by the SM, a scan of the likelihood as a function of the κ_λ and κ_2V coupling modifiers is performed, as shown in Fig. 10. This method consists of obtaining central best fit values and confidence intervals for the coupling modifiers directly from the profile likelihood [67], as opposed to using the CL_s method as in the results reported above. The observed confidence interval on κ_λ corresponds to [+0.68, +6.37] at 68% CL and to [−1.77, +8.73] at 95% CL, for a best fit value of κ_λ = +3.61. The observed confidence interval on κ_2V corresponds to [+0.28, +1.88] at 68% CL and to [−0.34, +2.49] at 95% CL, for a best fit value of κ_2V = +1.07.

Figure 6: Combination of bins of all postfit distributions, ordered according to the expected signal-to-square-root-background ratio, where the signal is the SM HH signal expected in that bin and the background is the prefit background estimate in the same bin, with the background contribution split into physics processes (left) and into the three considered final-state channels (right). The ratio panel also shows the signal scaled to the observed exclusion limit (as given in Table 2).

Summary
A search for nonresonant Higgs boson pair (HH) production via gluon-gluon fusion (ggF) and vector boson fusion (VBF) processes in final states with two bottom quarks and two tau leptons was presented. The search uses the full 2016-2018 data set of proton-proton collisions at a center-of-mass energy of √s = 13 TeV recorded with the CMS detector at the LHC, corresponding to an integrated luminosity of 138 fb⁻¹. The three decay modes of the ττ pair with the largest branching fractions were selected, requiring one tau lepton to always decay hadronically and the other to decay either leptonically or hadronically. Upper limits at 95% confidence level (CL) are set on the inclusive ggF plus VBF HH production cross section, as well as on the VBF-only HH production cross section. This analysis benefits from an improved trigger strategy as well as from a series of techniques developed especially for this search, among them several neural networks to identify the b jets from the H decay, to categorize the events, and to perform the signal extraction. Moreover, this analysis builds upon the improvements made by the CMS Collaboration in the jet and tau lepton identification and reconstruction algorithms. All these techniques enable particularly stringent results on the HH production cross sections.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid and other centers for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the following funding agencies:

References

[10] CMS Collaboration, "Identification of hadronic tau lepton decays using a deep neural network", JINST 17 (2022) P07023, doi:10.1088/1748-0221/17/07/P07023, arXiv:2201.08458.