Search for Higgs boson pair production in the $\gamma\gamma\mathrm{b\overline{b}}$ final state in pp collisions at $\sqrt{s} =$ 13 TeV

A search is presented for the production of a pair of Higgs bosons, where one decays into two photons and the other one into a bottom quark-antiquark pair. The analysis is performed using proton-proton collision data at $\sqrt{s} =$ 13 TeV recorded in 2016 by the CMS detector at the LHC, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The results are in agreement with standard model (SM) predictions. In a search for resonant production, upper limits are set on the cross section for new spin-0 or spin-2 particles. For the SM-like nonresonant production hypothesis, the data exclude a product of cross section and branching fraction larger than 2.0 fb at 95% confidence level (CL), corresponding to about 24 times the SM prediction. Values of the effective Higgs boson self-coupling $\kappa_\lambda$ are constrained to be within the range $-$11<$\kappa_\lambda$<17 at 95% CL, assuming all other Higgs boson couplings are at their SM value. The constraints on $\kappa_\lambda$ are the most restrictive to date.


Introduction
The discovery of a boson with a mass of about 125 GeV, with properties compatible with those expected for the Higgs (H) boson of the standard model (SM) [1][2][3], has stimulated interest in the detailed exploration and understanding of the origin of the Brout-Englert-Higgs (BEH) mechanism [4,5]. The production of a pair of Higgs bosons (HH) is a rare process that is sensitive to the structure of the BEH potential through the Higgs boson's self-coupling mechanism. In the SM, the corresponding cross section via gluon-gluon fusion in proton-proton (pp) collisions at √ s = 13 TeV is predicted to be σ gg→HH = 33.5 +2.5 −2.8 fb [6][7][8], a value beyond the reach of the analyses based on the current integrated luminosity of the CERN LHC program.
Many theories beyond the SM (BSM) suggest the existence of heavy particles that can couple to a pair of Higgs bosons. These particles could appear as a resonant contribution to the invariant mass of the HH system and induce a significant increase of the HH production cross section with respect to the SM. For example, models with warped extra dimensions (WED) [9] postulate the existence of compactified extra spatial dimensions. They predict new resonances decaying to a Higgs boson pair, including spin-0 radions R [10] and spin-2 Kaluza-Klein gravitons (KK graviton) [11]. The benchmark scenario considered in this paper, the bulk Randall-Sundrum (RS) model [12], assumes that all fields can propagate in the extra dimension. Models with an extended Higgs sector also predict a spin-0 resonance that decays to a pair of SM Higgs bosons, if sufficiently massive. Examples of such models are the singlet extension of the SM [13], the two-Higgs-doublet models [14] (in particular, the minimal supersymmetric model [15]), and the Georgi-Machacek model [16]. Many of these models predict that heavy scalar production occurs predominantly through the gluon-gluon fusion process. The Lorentz structure of the effective coupling between the scalar and the gluon is the same for a radion or a heavy Higgs boson. Therefore, the kinematics for the production of a radion or an additional Higgs boson are essentially the same, provided the spin-0 resonance is narrow. The radion results can therefore be applied to constrain this class of models.
If the new particles are too heavy to be observed through a direct search, they may contribute to HH production through virtual processes and lead to an enhancement of the cross section with respect to the SM (as discussed, e.g., in Ref. [17]). In addition, different BSM models can modify the Higgs boson's fundamental couplings and impact HH production in gluon-gluon fusion [18] (ggHH) and vector boson fusion (VBF) [6,8].
This letter describes a search for the production of pairs of Higgs bosons via pp → HH → γγbb using a data sample of 35.9 fb −1 collected by the CMS experiment in 2016. Both nonresonant and resonant production are explored, with the search for a narrow resonance X conducted at masses m X between 260 and 900 GeV. The fully reconstructed γγbb final state combines the large SM branching fraction (B) of the H → bb decay with the comparatively low background and good mass resolution of the H → γγ channel, yielding a total B(HH → γγbb) of 0.26% [6]. The search uses the mass spectra of the diphoton (m γγ ), dijet (m jj ), and four-body systems (m γγjj ), as well as the associated helicity angles, to provide discrimination between the HH production signal and the other SM processes. The ggHH production process is studied in detail and the sensitivity of CMS data to the nonresonant VBF production mechanism is investigated for the first time.

The CMS detector
The CMS detector, its coordinate system, and the main kinematic variables used in the analysis are described in detail in Ref. [29]. The central feature of the CMS apparatus is a superconducting solenoid, of 6 m internal diameter, providing a magnetic field of 3.8 T. A silicon pixel and strip tracker covering the pseudorapidity range |η| < 2.5, an electromagnetic calorimeter (ECAL) made of lead tungstate crystals, and a brass and scintillator hadron calorimeter reside within the field volume. Forward calorimeters extend the pseudorapidity coverage above |η| = 3.0. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. The first level of the CMS trigger system, composed of special hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a time interval of less than 4 µs. The high-level trigger further decreases the event rate, from around 100 kHz to less than 1 kHz, before data storage [30].

Simulated events
Signal samples are simulated at leading order (LO) using MADGRAPH5 aMC@NLO 2.3.2 [31,32] interfaced with LHAPDF6 [33]. The next-to-leading-order (NLO) parton distribution function (PDF) set PDF4LHC15 NLO MC is used [34][35][36][37][38]. The models describe the production through gluon-gluon fusion of particles with narrow width (set to 1 MeV) that decay to two Higgs bosons with the mass of m H = 125 GeV [39]. Events are generated either for spin-0 radion production, or spin-2 KK graviton production, as predicted by the bulk RS model. For each spin hypothesis 16 mass points are generated within the range 260 ≤ m X ≤ 900 GeV in steps of 10 GeV for m X between 260 and 300 GeV, and in steps of 50 GeV for m X above 300 GeV.
In the nonresonant case we use the effective field theory approach and notations from Refs. [6,40]. First we consider two SM coupling modifiers: κ λ ≡ λ HHH /λ SM HHH , which measures deviations of the Higgs boson trilinear coupling λ from its SM expectation, λ SM HHH ≡ m 2 H /(2v 2 ) = 0.129; and κ t ≡ y t /y SM t , which measures deviations of the top quark Yukawa coupling y t from its SM expectation y SM Here v = 246 GeV denotes the vacuum expectation value of the Higgs boson, m H its mass, and m t denotes the top quark mass. Second, this analysis also considers couplings not found in the SM that are derived from dimension-6 operators: contact interactions between two Higgs bosons and two top quarks (c 2 ), between one Higgs boson and two gluons (c g ), and between two Higgs bosons and two gluons (c gg ). We define these three parameters in such a way that their values are zero within the SM. The most general production is described by a modification of the SM Lagrangian, the relevant part of which, labelled as L HH , is given in the following equation [41]: where t L and t R are the top quark fields with left and right chiralities, respectively. The H denotes the physical Higgs boson field, G µν is the gluon field strength tensor, and α S is the strong coupling constant. The notation h.c. is used for the Hermitian conjugate. Five main Feynman diagrams, shown in Fig. 1, contribute to ggHH at LO.
The cross section is expressed at LO as a function of five BSM parameters (κ λ , κ t , c 2 , c g , and c gg ) [6,40] and this parametrization is approximately extended to the next-to-next-to-leading order matched to the next-to-next-to-leading log in quantum chromodynamics (QCD) [6,7,42]. which affects the final sensitivity through the modification of the acceptance and efficiency of the experimental analysis.
To avoid a prohibitively large number of samples to simulate and to analyze, we use the method proposed in Refs. [6,40,43] to partition the parameter space into 12 regions with distinct kinematics, referred to as clusters. Each of the clusters with its HH kinematics can be represented by a point in the 5D parameter space that is referred to as a benchmark hypothesis. We simulate the 12 benchmark hypotheses together with two additional ones -one assuming all parameters to be SM ones (referred to as SM benchmark hypothesis) and the other using identical assumptions except for a vanishing Higgs boson self-coupling (referred to as κ λ = 0). The list of benchmark hypotheses is provided in Table 1. An additional HH sample is produced via the VBF mechanism using SM couplings.
The ensemble of events obtained by combining all 14 gluon-gluon initiated samples covers the possible kinematic configurations of the effective field theory parameter space. These events can therefore subsequently be reweighted using the procedure derived in Ref. [44] to model any desired point in the full 5D parameter space. In this procedure, an event-by-event weight is analytically calculated from the generator-level information on the HH system. Table 1: Parameter values of nonresonant BSM benchmark hypotheses. The first two columns correspond to the SM and κ λ = 0 samples, while the next 12 correspond to the benchmark hypotheses identified using the method from Ref. [43]. The dominant backgrounds to the γγbb final state are those in which two objects identified as photons (either prompt photons or jets misidentified as photons) are produced in association with jets (referred to as nγ + jets). The simulation of these final states is challenging due to large effects of higher orders in QCD [45] and limited knowledge of fragmentation effects in the case of a jet misidentified as a photon. In this analysis, these contributions are modeled entirely from the data.
Single Higgs boson production in the SM, with two additional jets and with a subsequent decay of the Higgs boson to two photons, is also considered. In some cases additional jets can be effectively initiated by b quarks, but in others they can be initiated by lighter quarks and misidentified as a b jet. The considered processes -gluon-gluon fusion (ggH), VBF, and associated production with tt (ttH), bb (bbH), and vector bosons (VH) -are sources of background in this analysis. They are simulated using MADGRAPH5 aMC@NLO 2.2.2 for VH and 2.3.3 for bbH, and POWHEG 2.0 [46][47][48][49] at NLO for ggH, VBF H , and ttH.
All generated events are processed with PYTHIA 8.212 [50] with the tune CUETP8M1 [51] for showering, hadronization, and the underlying event description, and GEANT4 [52] for the simulation of the CMS detector response. The simulated events include multiple overlapping pp interactions (pileup) as observed in the data.

Data set and event selection
Events are selected using double-photon triggers, which require two photons with transverse momenta p γ1 T > 30 GeV and p γ2 T > 18 GeV for the leading and subleading photons, respectively. In addition, calorimeter-based isolation and shower shape requirements are imposed online on the two photons. Finally, the diphoton invariant mass is required to exceed 90 GeV.
In the offline selection, events are required to have at least one well-identified vertex with a position less than 24 cm away from the nominal interaction point in the z-direction. The primary vertex is identified by a multivariate analysis that was trained for the measurement of H → γγ production [53]. This analysis uses the momenta of the charged particle tracks associated with the vertex, and variables that quantify the vector and scalar balance of p T between the diphoton system and the charged particle tracks associated with the vertex. The presence of at least two jets in the final state of this analysis helps to correctly identify the primary vertex in more than 99.9% of the simulated signal events.

The H → γγ candidate
Photons are identified using a multivariate technique that includes as inputs the p T of the electromagnetic shower, its longitudinal leakage into the hadron calorimeter, and its isolation from jet activity in the event. It was designed during the data taking at √ s = 8 TeV [53,54] and retrained with √ s = 13 TeV data. Identified photon candidates with a track matched to the ECAL cluster are rejected. Photon energies are calibrated subsequently and their energies in simulated samples are smeared to match the resolution in data [53].
Events are required to have at least two identified photon candidates that are within the ECAL and tracker fiducial region (|η| < 2.5), but excluding the ECAL barrel-endcap transition region (1.44 < |η| < 1.57). The photon candidates are required to pass the following criteria: 100 < m γγ < 180 GeV; p γ1 T /m γγ > 1/3 and p γ2 T /m γγ > 1/4. In cases where more than two photons are found, the photon pair with the highest transverse momentum p γγ T is chosen. For events that pass the above selections, the trigger efficiency is 100%.

The H → bb candidate
The particle-flow (PF) algorithm aims to reconstruct each individual particle (referred to as a PF candidate) in an event with an optimized combination of information from the various elements of the CMS detector [55]. Jets are clustered from these candidates using the anti-k T algorithm with a distance parameter Rj = 0.4 [56,57]. Jet candidates are required to have p T > 25 GeV and |η| < 2.4. In addition, identification criteria are applied to remove spurious jets associated with calorimeter noise. Finally, jets must be separated from each of the two selected photon candidates by a distance ∆R γj ≡ √ (∆η γj ) 2 + (∆φ γj ) 2 > 0.4, where φ is the azimuthal angle in radians. The selected jets are combined into dijet candidates. At least one dijet candidate is necessary for an event to be selected. The combined secondary vertex algorithm, optimized for 13 TeV data, provides a continuous b tagging score defined between 0 and 1. It is used to quantify the probability that a jet is a result of a b quark hadronization [58]. In cases where more than two jets are found, the dijet constructed from the two jets with the highest b tagging scores is selected. An event is accepted if 70 < m jj < 190 GeV.
The energies of the two selected jets are corrected using the standard CMS algorithm, which is flavor blind [59]. In addition to this correction, a jet energy regression procedure is used to improve the m jj resolution. A multivariate analysis technique is used to correct the absolute scale of the heavy-quark jets by taking into account their specific fragmentation features, i.e., a larger contribution from charged leptons and neutrinos than light-quark jets. The approach we use in this letter is similar to the one used in the CMS search for the SM H → bb decays [60], and in addition, takes advantage of variables related to the missing transverse momentum vector, p miss T , to estimate the neutrino contribution to the heavy-quark decay. The p miss T is calculated as the negative of the vectorial sum of the transverse momenta of all PF candidates. The jets forming the dijet candidate are ordered by p T , and an optimized energy regression distinguishes the higher p T jet from the lower p T one.

The HH system
A summary of the baseline selection requirements is presented in Table 2. After the diphoton and dijet candidates are selected, they are combined to form an HH candidate.  [70,190] To have a better estimate of m HH we correct m γγjj using which mitigates the m γγjj dependency on the dijet and diphoton energy resolutions with the assumption that the dijet and diphoton originate from a Higgs boson decay [61]. This procedure has an effect similar to the kinematic fit used previously in Ref.
[23], but without the need for analysis-level measurements of quantities such as the jet energy and position resolution. The improvements in the m HH reconstruction are most striking for low-m X hypotheses, as shown in Fig (Table 2), and are normalized to unit area.
With the four reconstructed objects from the HH decay, angular correlations in the signal can provide important information to separate it from the background. In this analysis we consider three helicity angles. The scattering angle, θ CS HH , is defined in the Collins-Soper (CS) frame of the four-body system [62], as the angle between the momentum of the Higgs boson decaying into two photons and the line that bisects the acute angle between the colliding protons. Since the directions of the two Higgs boson candidates are collinear in the CS frame, the choice of the Higgs boson decaying to photons as the reference direction is arbitrary. Therefore, we use the absolute value of the cosine of this angle |cos θ CS HH | to obviate this arbitrariness. The decay angles are defined as the angles of the decay products in each Higgs boson's rest frame with respect to the direction of motion of the boson in the CS frame. Since the two photons in the Higgs boson decay are indistinguishable and the charges of the b quarks are not considered in this analysis, the absolute values of the cosines of these angles are used: |cos θ γγ | and |cos θ jj |.

Background properties
The main kinematic distributions of the γγbb final state that are used throughout the analysis (invariant masses and helicity angles) are shown in Fig. 3, after the basic selections summarized in Table 2. The data in Fig. 3 are dominated by nγ + jets events, which are the primary contribution to the background in this region of phase space. The SM single-Higgs boson production processes, represented by colored areas in the figure, are three orders of magnitude lower than the nonresonant nγ + jets processes. Only single-Higgs boson production processes with a sufficient number of events are shown for clarity of the figure: ttH, VH, and ggH. Finally, signal shapes are shown in the figures, where the resonant ones have been normalized to a cross section of 500 fb, while the SM-like ggHH (VBF HH) signal has been normalized to 10 4 (10 5 ) times its cross section.
As expected, the signals produce peaks in m γγ and m jj . The resonant di-Higgs boson signals peak sharply in M X , while SM HH processes exhibit broad structures induced by the interference pattern of different processes contributing to the HH production and shaped by the anal- HH (x10 → gg ) 5 VBF HH (x10 HH (x10 → gg ) 5 VBF HH (x10 HH (x10 → gg ) 5 VBF HH (x10 HH (x10 → gg ) 5 VBF HH (x10 HH (x10 → gg ) 5 VBF HH (x10 HH (x10 → gg ) 5 VBF HH (x10   Table 2 for the kinematic distributions described in Sections 1 and 4.3: M X (top left) and |cos θ CS HH | (top right); m γγ (middle left) and |cos θ γγ | (middle right); m jj (bottom left) and |cos θ jj | (bottom right). The statistical uncertainties on data are barely visible beyond the markers. The resonant signal cross section is normalized to 500 fb. ysis selections. The data show a smoothly falling mass spectrum, as expected for the nγ + jets background. Finally, the single-Higgs boson backgrounds peak in m γγ , but not in m jj or M X (except the VH process gives a peak in m jj around the V masses).
The |cos θ CS HH | distribution is sensitive to the tensor structure of the production mechanism (see for example Ref. [63]). It is relatively flat for ggHH [64] and the spin-0 mediated production. For the spin-2 mediated production it decreases toward 1, while for VBF HH and the data it rises toward 1. The distribution of the cosine of the H → γγ helicity angle is expected to be flat for the samples with genuine Higgs bosons. The decrease toward 1 is due to the selections on photon p T . In the data the distribution rises up to 0.8 and then decreases. This shape results from the combination of matrix element properties and the asymmetric selections on the photon p T . In the same way, the |cos θ jj | distribution is flat for the signal, but rises significantly toward 1 for the data and ggH.

Event classification and modeling
After dijet and diphoton candidate selection, events are placed into categories using the M X variable and a multivariate (MVA) classifier. Both variables are designed to minimize the correlation between m γγ and m jj . In each category, a parametric fit is performed in the twodimensional m γγ -m jj plane for the signal extraction procedure. This 2D approach helps to constrain the impact of the single-Higgs boson background since its structure in m jj differs from that of the signal. Finally, all the categories are combined together assuming a signal model to maximize the sensitivity of the analysis.

M X categorization
In the nonresonant search, a categorization is performed using the M X information. Since the M X spectrum for SM-like ggHH production has a maximum at around 400 GeV and the nγ + jets background peaks at the kinematic threshold of 250 GeV = 2m H , the maximal sensitivity is achieved for M X > 350 GeV. However, anomalous couplings may change the M X distribution for the signal hypothesis. Therefore, instead of imposing a M X selection, events are categorized in the nonresonant search into high-mass (HM) and low-mass (LM) regions that are above and below M X = 350 GeV, respectively.
In the resonant search, M X is used to define a unique signal region that depends on the mass of the resonance being sought. This mass window typically contains 60% of the signal at low m X , increasing gradually for higher m X . The resonant search starts just above the threshold at 260 GeV 2m H and extends up to m X = 900 GeV. Above this value the typical separation between the two boosted b quarks produced in the Higgs boson decays is of the order of R j and the decay products cannot be reconstructed as two separate jets [65].

MVA categorization
An MVA procedure is used to select the most signal-like events and to further classify them. With this goal, a boosted decision tree (BDT) is trained with the TMVA package [66] using three types of variables: • b tagging variables: the b tagging score of each jet in the dijet candidate; • Helicity angles as defined in Section 4.3; • HH transverse balance variables: p γγ T /m γγjj and p jj T /m γγjj , where p jj T is the transverse momentum of the dijet candidate.
The BDT is trained with the ensemble of ggHH samples as the signal hypothesis in the nonresonant search separately for low-and high-mass categories. For the resonant cases the ensemble of resonant signals is used to train one BDT for m X < 600 GeV and another one for m X > 600 GeV. This training strategy maximizes the sensitivity to the massive resonances. The background events used for the training are obtained from a control sample that was extracted from the data by inverting the identification condition on one of the two photons. This sample is dominated by events having a photon produced with three accompanying jets. After excluding the events with 120 < m γγ < 130 GeV in the signal and control samples, the kinematic properties of the two samples are well matched.   Table 2.The resonant signal cross section is normalised to 500 fb. Figure 4 shows the BDT output from one of the four trainings. The MVA efficiently separates gluon-gluon produced signals from the nγ + jets background that represents the dominant contribution to the data. The most powerful discriminating variables used in the BDT are the b tagging scores of the jets, followed by the kinematic variables. Therefore, the single-Higgs boson production samples with genuine contributions from two b quarks (ttH and Z(→ bb)H) are classified as more signal-like, while ggH and other VH processes are classified as more background-like. Finally, events from the VBF HH production are selected less efficiently than those from ggHH production.
For a given category the purity is defined as the ratio between the number of events coming from a hypothetical signal with a production cross section normalized to 1 fb and the number of background events. For each of the four trainings, the output of the MVA classifier is used to define a category with the highest purity (HPC) and another with medium purity (MPC). The remaining events are rejected, because they do not improve the sensitivity of the analysis. In the nonresonant low-M X MPC region, an additional requirement on the b tagging score, corresponding to 80% efficiency for real b jets [58], is applied to mitigate the impact of jets from pileup. Table 3 shows the HPC and MPC definitions for the different regions of the resonant and nonresonant analyses.

Signal efficiency
The typical signal efficiencies obtained in the resonant analysis are shown in Fig. 5. Consecutive selection efficiencies for different analysis steps are shown: after the trigger selection that includes a loose online preselection on the photons, after the diphoton candidate selection, after the dijet candidate selection, and after the MVA categorization. The final efficiencies range from approximately 20% (low mass) to 50% (high mass) for both the spin-0 and spin-2 resonance hypotheses. For an identical m X value, the efficiency is a bit higher for a spin-2 hypothesis than for a spin-0, because the Higgs bosons are, on average, produced more centrally in the KK graviton model considered in this letter (see Fig. 3).
The efficiency times acceptance is 30 (13)% for the SM-like ggHH (VBF HH) signal hypothesis, with 25 (10)% in the high-mass region and 5 (3)% in the low-mass one. The difference between the two production mechanisms mainly comes from the fact that the MVA was trained assuming a ggHH signal.  Table 2. Dijet selections are those described in Table 2.

Signal modeling
The probability density (PD) of each invariant mass distribution used in the signal extraction procedure is modeled with a double-sided Crystal Ball (CB) function, which is a modified version of the standard CB function [67] with two independent exponential tails. This modeling is useful in situations in which a lower-energy tail might be created by energy mismeasurements and a higher-energy tail by the mismatching of objects (for example when a jet from additional QCD radiation is misidentified as one of the H boson decay jets). The final two-dimensional signal model PD is the product of the independent m γγ and m jj models. The no-correlation hypothesis is checked by comparing the two-dimensional m γγ -m jj distribution from the simulated signal samples with the two-dimensional PD. For the typical expected number of signal events in this analysis, the impact of such correlations is found to be negligible.
The PD parameters are obtained by fitting the simulated signal samples in each analysis region. For each m X point and each nonresonant sample a dedicated fit is performed. The resolution is estimated by the σ eff value defined as half of the width of the narrowest region containing 68.3% of the signal shape. Examples of the signal shapes in the nonresonant analysis, assuming an SM-like signal, are shown in Fig. 6. The diphoton resolution is determined to be ≈1.6 GeV and the dijet resolution is ≈20 GeV. The mean of the Gaussian core of the CB function, µ, is close to m H .

Background modeling
The total background model is obtained as a sum of the nγ + jets background continuum PD and single-Higgs boson production PDs, the latter being normalized to their SM production cross sections.
The nγ + jets background continuum for both the resonant and nonresonant analyses is described using polynomials in the Bernstein basis [53]. A second-order Bernstein polynomial fits the data well. In categories with fewer events, occurring in the resonant analysis, a firstorder Bernstein polynomial is used. This choice of the background PD is tested for possible biases in the signal extraction by comparing it to other possible background models, such as exponentials and Laurent polynomials. The bias from the chosen PD is always found to be smaller than the statistical uncertainty in the fit, and can be safely neglected [2]. The correlation between m γγ and m jj was measured in the data control sample and found to be compatible with zero.
The SM single-Higgs boson background contribution is estimated using a PD fitted to simulated samples. For all production mechanisms, the m γγ distribution is modeled by a doublesided CB function. The m jj modeling depends on the production mechanism: for ggH and VBF H production m jj is modeled with a Bernstein polynomial; for the VH production a double-sided CB function is expected to describe the line shape of the hadronic decays of vector bosons; for ttH and bbH a double-sided CB function is also used. Like the signal modeling, the final 2D SM single-Higgs boson model is an independent product of models of the m γγ and m jj distributions. This background contribution is explicitly considered only for the nonresonant search, since for the resonant one it is severely reduced by a tight selection window on M X . The residual events are accounted for by the continuum background models for the m γγ and m jj variables.

Fitting procedure and systematic uncertainties
A likelihood function is defined based on the total PD including the backgrounds, signal hypothesis, and the data. Then an unbinned maximum likelihood fit is performed to the 2D m γγ -m jj data distribution. The parameters for the signal yield and for the background-only PD are constrained in the fit. Uniform priors are used to parametrize the nonresonant background  PD and log-normal priors are assumed for the single-Higgs boson background parametrizing our degree of uncertainty in the exact SM production cross section. When converting the fitted yields into production cross sections, we use simulation to estimate the selection efficiency for the signal. The difference between the simulation and the data is taken into account through parameters included in the likelihood function. Parameters not of immediate interest (nuisance parameters) are varied in the fit according to a log-normal PD function and can be classified according to their impact on the analysis. The uncertainty in the estimation of the integrated luminosity modifies the total expected signal normalization and is taken to be 2.5% [68]. Other systematic uncertainties modify the efficiency of the signal selection or impact the signal or the Higgs boson PD.
The photon-related uncertainties are discussed in Ref. [53]. The photon energy scale (PES) is known at a sub-percent level and the photon energy resolution (PER) is known with 5% precision. A 2% normalization uncertainty is estimated in the offline diphoton selection efficiency and in the trigger efficiency, while 1% is assigned to quantify the uncertainty on the photon identification efficiency.
The uncertainties in the jet energy scale (JES) and jet energy resolution (JER) are accounted for by changing the jet response by one standard deviation for each source [59]. They impact the average m jj peak position by approximately 1% and the peak resolution by 5%. The effects on the signal acceptance are also accounted for.
To use the b tagging score as an input to the classification MVA, its simulated distribution is matched to data by applying differential scale factors that depend on the jet p T and η [58]. The uncertainty in the efficiency of the categorization MVA is estimated by varying the b tagging differential scale factors within one standard deviation of their uncertainties [58]. The impacts of PES, PER, JES, and JER on the MVA classification procedure have been found to be negligible.  Theoretical uncertainties in the SM HH boson production QCD missing orders Normalization 4.3-6 PDF and α S uncertainties Normalization 3.1 m t effects Normalization 5 Theoretical uncertainties have been applied to the normalization of the single-Higgs boson background, but not on BSM signals. When we consider the SM-like search and parametrize the BSM cross section σ BSM gg→HH by the ratio µ HH = σ BSM gg→HH /σ SM gg→HH , the theoretical uncertainties on σ SM gg→HH are included in the likelihood. The following sources are considered correlated for all the single-Higgs boson channels (except bbH) and the double-Higgs boson channel in line with the recommendations from Ref. [6]: the scale dependence of higher-order terms; the impact from the choice of PDF quadratically summed with the uncertainty on α S ; and, in the case of the HH channel the uncertainties related to the inclusion of m t into the cross section calculations. Finally, for the bbH channel all sources are summed together including also the uncertainty on the b quark mass.
The systematic uncertainties are summarized in Table 4. The correlations between different categories are taken into account.

Results
No evidence for HH production is observed in the data. Upper limits on the production cross section of a pair of Higgs bosons times the branching fraction B(HH → γγbb) are computed using the modified frequentist approach for confidence levels (CL s ), taking the profile likelihood as a test statistic [69][70][71][72] in the asymptotic approximation. The limits are subsequently compared to theoretical predictions assuming SM branching fractions for Higgs boson decays.

Resonant signal
The observed and median expected upper limits at 95% confidence level (CL) are shown in Fig. 9, for the pp → X → HH → γγbb process assuming spin-0 and a spin-2 resonances. The data exclude a cross section of 0.23 to 4.2 fb depending on m X and the spin hypothesis.
The results are compared with the cross sections for bulk radion and bulk KK graviton production in WED models. In analogy with the Higgs boson, the hypothesized bulk radion field is expected to be predominantly produced through gluon-gluon fusion [73] and the cross section is calculated at NLO accuracy in QCD, using the recipe suggested in Ref. [74]. The theoretical input used in this letter is identical to the one from a previous CMS analysis [23]. More details can be found in Ref. [75]. The production cross section in this model is proportional to 1/Λ 2 R , where Λ R is the scale parameter of the theory. The analysis at √ s = 8 TeV [23], already excluded a radion resonance up to 980 GeV for the scale parameter Λ R = 1 TeV, but had no sensitivity for Λ R = 3 TeV. In this analysis the observed limits are able to exclude radion resonances, assuming Λ R = 2 TeV, for all points below m X = 840 GeV and, assuming Λ R = 3 TeV, below m X = 540 GeV. The couplings of the gravitons to matter fields are defined by κ/M Pl , with M Pl ≡ M Pl / √ 8π being the reduced Planck mass and κ the warp factor of the metric. Assuming κ/M Pl = 1.0, gravitons are excluded within the range 290 < m X < 810 GeV, while assuming κ/M Pl = 0.5, the exclusion range is 350 < m X < 530 GeV.

Nonresonant signal
The observed (expected) 95% CL upper limits on the SM-like pp → HH → γγbb process are 2.0 (1.6 fb); 0.79 (0.63 pb) for the total ggHH production cross section assuming SM Higgs boson branching fractions. The results can also be interpreted in terms of observed (expected) upper limits on µ HH of 24 (19). This is the most stringent constraint to date from the LHC. In particular, the constraint on µ HH improves over the previous search by a factor of three [23].
An additional study is performed including both VBF HH and ggHH production mechanisms in the definition of the scaling factor where σ SM VBF HH = 1.64 +0.05 −0.06 fb [6]. The expected sensitivity of the analysis for µ ext HH improves by 1.3% compared to µ HH . The improvement is smaller than the relative contribution of the VBF production cross section to the total one in the SM because of the nonoptimal selection efficiency of this analysis for the VBF events, as explained in Section 5.
The results are also interpreted in the context of Higgs boson anomalous couplings. Limits on the different BSM benchmark hypotheses (listed in Table 1) are shown in Fig. 10 (top). Using the limits on the benchmark hypotheses and the map between the clusters and the points in the 5D BSM parameter space, one can estimate the constraints provided by these data in different regions of phase space. It is important to stress that the same analysis categories are used for the SM-like search and for all BSM nonresonant searches. The differences between the limits come only from the kinematic properties of the benchmark signal hypotheses. For instance, the tightest constraint is placed on the benchmark 2 hypothesis, which features a m HH spectrum that extends above 1 TeV owing to large contributions from dimension-6 operators. The least restrictive constraint is on benchmark 7 that describes models with large values of κ λ , where the m HH spectrum peaks below 300 GeV. For benchmark 2 most of the events would be observed in the HM categories, while for benchmark 7 most events fall in the LM categories. Assuming equal cross sections, benchmark hypothesis 2 has a much better signal-over-background ratio than benchmark hypothesis 7. In the intermediate case of SM-like production it was observed that the HM categories have the best sensitivity.
We reweight the benchmark samples to model different values of the SM coupling modifier κ λ , with κ t and other BSM parameters fixed to their SM values. In Fig. 10 (bottom), 95% CL limits on the nonresonant Higgs boson pair production cross sections are shown as a function of the ratio κ λ . Assuming that the top quark Yukawa coupling is SM-like (κ t = 1), the analysis observes limits that constrain κ λ to values between −11 and 17.  for all masses below m X = 540 GeV, and the KK graviton (spin-2) hypothesis for the mass range 290 < m X < 810 GeV, assuming κ/M Pl = 1.0 (M Pl being the reduced Planck mass and κ the warp factor of the metric). For nonresonant production with SM-like kinematics, a 95% CL upper limit of 2.0 fb is set on σ(pp → HH → γγbb), corresponding to about 24 times the SM prediction. Anomalous couplings of the Higgs boson are also investigated, as well as the vector boson fusion HH production process. Values of the effective Higgs boson self-coupling κ λ are constrained to be within the range −11 < κ λ < 17 at 95% CL, assuming all other Higgs boson couplings to be at their SM values. The direct constraints reported on SM-like production and κ λ are the most restrictive to date.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMWFW and FWF (Aus  [30] CMS Collaboration, "The CMS trigger system", JINST 12 (2017) P01020, doi:10.1088/1748-0221/12/01/P01020, arXiv:1609.02366.