Evidence for the Higgs boson decay to a bottom quark-antiquark pair

A search for the standard model (SM) Higgs boson ($\mathrm{H}$) decaying to $\mathrm{b}\overline{\mathrm{b}}$ when produced in association with an electroweak vector boson is reported for the following processes: $\mathrm{Z}(\nu\nu)\mathrm{H}$, $\mathrm{W}(\mu \nu)\mathrm{H}$, $\mathrm{W}(\mathrm{e} \nu)\mathrm{H}$, $\mathrm{Z}(\mu\mu)\mathrm{H}$, and $\mathrm{Z}(\mathrm{e}\mathrm{e})\mathrm{H}$. The search is performed in data samples corresponding to an integrated luminosity of 35.9 fb$^{-1}$ at $\sqrt{s}=$ 13 TeV recorded by the CMS experiment at the LHC during Run 2 in 2016. An excess of events is observed in data compared to the expectation in the absence of a $\mathrm{H} \to\mathrm{b}\overline{\mathrm{b}}$ signal. The significance of this excess is 3.3 standard deviations, where the expectation from SM Higgs boson production is 2.8. The signal strength corresponding to this excess, relative to that of the SM Higgs boson production, is 1.2 $\pm$ 0.4. When combined with the Run 1 measurement of the same processes, the signal significance is 3.8 standard deviations with 3.8 expected. The corresponding signal strength, relative to that of the SM Higgs boson, is 1.06 $^{+0.31}_{-0.29}$.

The H → bb decay tests directly the Higgs boson coupling to fermions, and more specifically to down-type quarks, and has not yet been established experimentally. In the SM, for a Higgs boson mass m H = 125 GeV, the branching fraction is approximately 58% [27], by far the largest. An observation in this channel is necessary to solidify the Higgs boson as the source of mass generation in the fermion sector of the SM [28,29].
At the Tevatron pp collider the sensitivity of the SM Higgs boson search, for masses below 130 GeV, was dominated by its production in association with a weak vector boson (VH production) and its decay to bb [30]. The combined searches from the CDF and D0 Collaborations resulted in an excess of events with a local significance, at m H = 125 GeV, of 2.8 standard deviations, with an expected value of 1.6. For the H → bb search at the LHC, the following Higgs boson production processes have been considered: in association with a top quark pair [31][32][33][34], through vector boson fusion [35,36], through VH production [37,38], and, more recently, through gluon fusion [39]. The process with the largest sensitivity is VH production.
The combined searches for H → bb by the ATLAS and CMS Collaborations in Run 1, at √ s = 7 and 8 TeV, evaluated for a Higgs boson mass of 125.09 GeV, resulted in a significance of 2.6 standard deviations, with 3.7 standard deviations expected [18]. The corresponding signal strength, relative to the SM expectation, is µ = 0.7 ± 0.3. The significance from the individual search by the ATLAS (CMS) experiment is 1.7 (2.0) standard deviations, with 2.7 (2.5) standard deviations expected, and a signal strength µ = 0.6 ± 0.4 (µ = 0.8 ± 0.4).
Recent results by the ATLAS Collaboration [40] in the search for H → bb through VH production at √ s = 13 TeV, with data corresponding to an integrated luminosity of 36.1 fb −1 , report a significance of 3.5 standard deviations, corresponding to a signal strength of µ = 1.20 +0. 42 −0. 36 . The combination with the results from the same search in Run 1 [37] yields a significance of 3.6 standard deviations and a signal strength µ = 0.90 +0.28 −0.26 . This article reports on the search with the CMS experiment for the decay of the SM Higgs boson to bottom quarks, H → bb, when produced through the pp → VH process, where V is either a W or a Z boson. This search is performed with data samples from Run 2 of the LHC, recorded during 2016, corresponding to an integrated luminosity of 35.9 fb −1 at √ s = 13 TeV. The following five processes are considered in the search: Z(νν)H, W(µν)H, W(eν)H, Z(µµ)H, and Z(ee)H. The final states that predominantly correspond to these processes, respectively, are characterized by the number of leptons required in the event selection, and are referred to as the 0-, 1-, and 2-lepton channels.
Throughout this article the term "lepton" (denoted ) refers solely to muons and electrons, but not to taus. The leptonic tau decays in WH and ZH processes are implicitly included in the W(µν)H, W(eν)H, Z(µµ)H, and Z(ee)H processes. Background processes originate from the production of W and Z bosons in association with jets from gluons and from light-or heavyflavor quarks (W+jets and Z+jets), from singly and pair-produced top quarks (single top and tt), from diboson production (VV), and from quantum chromodynamics multijet events (QCD).
Simulated samples of signal and background events are used to optimize the search. For each channel, a signal region enriched in VH events is selected together with several control regions, each enriched in events from individual background processes. The control regions are used to test the accuracy of the simulated samples' modeling for the variables relevant to the analysis. A simultaneous binned-likelihood fit to the shape and normalization of specific distributions for the signal and control regions for all channels combined is used to extract a possible Higgs boson signal. The distribution used in the signal region is the output of a boosted decision tree (BDT) event discriminant [41,42] that helps separate signal from background. For the control regions, a variable that identifies jets originating from b quarks, and that discriminates between the different background processes, is used. To validate the analysis procedure, the same methodology is used to extract a signal for the VZ process, with Z → bb, which has a nearly identical final state to VH with H → bb, but with a production cross section of 5 to 15 times larger, depending on the kinematic regime considered. Finally, the results from this search are combined with those of similar searches performed by the CMS Collaboration during Run 1 [18, 36,38].
This article is structured as follows: Sections 2-3 describe the CMS detector, the simulated samples used for signal and background processes, and the triggers used to collect the data. Sections 4-5 describe the reconstruction of the detector objects used in the analysis and the selection criteria for events in the signal and control regions. Section 6 describes the sources of uncertainty in the analysis, and Section 7 describes the results, summarized in Section 8.

The CMS detector and simulated samples
A detailed description of the CMS detector can be found elsewhere in Ref. [43]. The momenta of charged particles are measured using a silicon pixel and strip tracker that covers the range |η| < 2.5 and is immersed in a 3.8 T axial magnetic field. The pseudorapidity is defined as η = − ln[tan(θ/2)], where θ is the polar angle of the trajectory of a particle with respect to the direction of the counterclockwise proton beam. Surrounding the tracker are a crystal electromagnetic calorimeter (ECAL) and a brass and scintillator hadron calorimeter (HCAL), both used to measure particle energy deposits and both consisting of a barrel assembly and two endcaps. The ECAL and HCAL extend to a range of |η| < 3.0. A steel and quartz-fiber Cherenkov forward detector extends the calorimetric coverage to |η| < 5.0. The outermost component of the CMS detector is the muon system, consisting of gas-ionization detectors placed in the steel flux-return yoke of the magnet to measure the momenta of muons traversing through the detector. The two-level CMS trigger system selects events of interest for permanent storage. The first trigger level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events in less than 3.2 µs. The high-level trigger software algorithms, executed on a farm of commercial processors, further reduce the event rate using information from all detector subsystems. The variable ∆R= √ (∆η) 2 +(∆φ) 2 is used to measure the separation between reconstructed objects in the detector, where φ is the angle (in radians) of the trajectory of the object in the plane transverse to the direction of the proton beams.
Samples of simulated signal and background events are produced using the Monte Carlo (MC) event generators listed below. The CMS detector response is modeled with GEANT4 [44]. The signal samples used have Higgs bosons with m H = 125 GeV produced in association with vector bosons. The quark-induced ZH and WH processes are generated at next-to-leading order (NLO) using the POWHEG [45][46][47] v2 event generator extended with the MiNLO procedure [48,49], while the gluon-induced ZH processes (denoted ggZH) are generated at leadingorder (LO) accuracy with POWHEG v2. The MADGRAPH5 aMC@NLO [50] v2.3.3 generator is used at NLO with the FxFx merging scheme [51] for the diboson background samples. The same generator is used at LO accuracy with the MLM matching scheme [52] for the W+jets and Z+jets in inclusive and b-quark enriched configurations, as well as the QCD multijet sample. The tt [53] production process, as well as the single top quark sample for the t-channel [54], are produced with POWHEG v2. The single top quark samples for the tW- [55] and s-channel [56] are instead produced with POWHEG v1. The production cross sections for the signal samples are rescaled to next-to-next-to-leading order (NNLO) QCD + NLO electroweak accuracy combining the VHNNLO [57][58][59], VH@NNLO [60,61] and HAWK v2.0 [62] generators as described in the documentation produced by the LHC Working Group on Higgs boson cross sections [63], and they are applied as a function of the vector boson transverse momentum (p T ). The production cross sections for the tt samples are rescaled to the NNLO with the next-to-next-to-leading-log (NNLL) prediction obtained with TOP++ v2.0 [64], while the W+jets and Z+jets samples are rescaled to the NLO cross sections using MADGRAPH5 aMC@NLO. The parton distribution functions (PDFs) used to produce the NLO samples are the NLO NNPDF3.0 set [65], while the LO NNPDF3.0 set is used for the LO samples. For parton showering and hadronization the POWHEG and MADGRAPH5 aMC@NLO samples are interfaced with PYTHIA 8.212 [66]. The PYTHIA8 parameters for the underlying event description correspond to the CUETP8M1 tune derived in Ref. [67] based on the work described in Ref. [68].
During the 2016 data-taking period the LHC instantaneous luminosity reached approximately 1.5 × 10 34 cm −2 s −1 and the average number of pp interactions per bunch crossing was approximately 23. The simulated samples include these additional pp interactions, referred to as pileup interactions (or pileup), that overlap with the event of interest in the same bunch crossing.

Triggers
Several triggers are used to collect events with final-state objects consistent with the signal processes in the channels under consideration.
For the 0-lepton channel, the quantities used in the trigger are derived from the reconstructed objects in the detector identified by a particle-flow (PF) algorithm [69] that combines the online information from all CMS subsystems to identify and reconstruct individual particles emerging from the proton-proton collisions: charged hadrons, neutral hadrons, photons, muons, and electrons. The main trigger used requires that both the missing transverse momentum, p miss T , and the hadronic missing transverse momentum, H miss T , in the event be above a threshold of 110 GeV. Online p miss T is defined as the magnitude of the negative vector sum of the transverse momenta of all reconstructed objects identified by the PF algorithm, while H miss T is defined as the magnitude of the negative vector sum of the transverse momenta of all reconstructed jets (with p T > 20 GeV and |η| < 5.2) identified by the same algorithm. For Z(νν)H events with p miss T > 170 GeV, evaluated offline, the trigger efficiency is approximately 92%, and near 100% above 200 GeV.
For the 1-lepton channels, single-lepton triggers are used. The muon trigger p T threshold is 24 GeV and the electron p T threshold is 27 GeV. For the 2-lepton channels, dilepton triggers are used. The muon p T thresholds are 17 and 8 GeV, and the electron p T thresholds are 23 and 12 GeV. All leptons in these triggers are required to pass stringent lepton identification criteria. In addition, to maintain an acceptable trigger rate, and to be consistent with what is expected from signal events, leptons are also required to be isolated from other tracks and calorimeter energy deposits. For W(µν)H events that pass all offline requirements described in Section 5, the single-muon trigger efficiency is ≈95%. The corresponding efficiency for recording W(eν)H events with the single-electron trigger is ≈90%. For Z( )H signal events that pass all offline requirements in Section 5, the dilepton triggers are nearly 100% efficient.

Event reconstruction
The characterization of VH events in the channels studied here requires the reconstruction of the following objects in the detector, using the PF algorithm [69] and originating from the primary interaction vertex: muons, electrons, neutrinos (reconstructed as p miss T ), and jetsincluding those that originate from the hadronization of b quarks, referred to as "b jets".
The reconstructed vertex with the largest value of summed physics-object p 2 T is taken to be the primary pp interaction vertex. The physics objects are the objects reconstructed by a jet finding algorithm [70,71] applied to all charged tracks associated with the vertex, plus the corresponding associated p miss T . The pileup interactions affect jet momentum reconstruction, p miss T reconstruction, lepton isolation, and b tagging efficiencies. To mitigate these effects, all charged hadrons that do not originate from the primary interaction vertex are removed from consideration in the event. In addition, the average neutral energy density from pileup interactions is evaluated from PF objects and subtracted from the reconstructed jets in the event and from the summed energy in the isolation criteria used for leptons [72]. These pileup mitigation procedures are applied on an object-by-object basis.
Muons are reconstructed using two algorithms [73]: one in which tracks in the silicon tracker are matched to hits in the muon detectors, and another in which a track fit is performed using hits in the silicon tracker and in the muon systems. In the latter algorithm, the muon is seeded by hits in the muon systems. The muon candidates used in the analysis are required to be successfully reconstructed by both algorithms. Further identification criteria are imposed on the muon candidates to reduce the fraction of tracks misidentified as muons. These include the number of hits in the tracker and in the muon systems, the fit quality of the global muon track, and its consistency with the primary vertex. Muon candidates are required to be in the |η| < 2.4 region.
Electron reconstruction [74] requires the matching of a set of ECAL clusters, denoted supercluster (SC), to a track in the silicon tracker. Electron identification [74] relies on a multivariate technique that combines observables sensitive to the amount of bremsstrahlung along the electron trajectory, such as the geometrical matching and momentum consistency between the electron trajectory and the associated calorimeter clusters, as well as various shower shape observables in the calorimeters. Additional requirements are imposed to remove electrons that originate from photon conversions. Electrons are required to be in the range |η| < 2.5, excluding candidates for which the SC lies in the 1.444 < |η SC | < 1.566 transition region between the ECAL barrel and endcap, where electron reconstruction is not optimal.
Charged leptons from W and Z boson decays are expected to be isolated from other activity in the event. For each lepton candidate, a cone in η-φ is constructed around the track direction at the event vertex. The scalar sum of the transverse momentum of each reconstructed particle, including neutral particles, compatible with the primary vertex and contained within the cone is calculated, excluding the contribution from the lepton candidate itself. This sum is called isolation. In the presence of pileup, isolation is contaminated with particles from the other interactions. A quantity proportional to the pileup is used to correct the isolation on average to mitigate reductions in signal efficiency at larger values of pileup. In the 1-lepton channel, if the corrected isolation sum exceeds 6% of the lepton candidate p T , the lepton is rejected. In the 2-lepton channel, the threshold is looser; the isolation of each candidate can be up to 20 (15%) of the muon (electron) p T . Including the isolation requirement, the total efficiency for reconstructing muons is in the range of 85-100%, depending on p T and η. The corresponding efficiency for electrons is in the range of 40-90%.
Jets are reconstructed from PF objects using the anti-k T clustering algorithm [70], with a distance parameter of 0.4, as implemented in the FASTJET package [71,75]. Each jet is required to lie within |η| < 2.4, to have at least two tracks associated with it, and to have electromagnetic and hadronic energy fractions of at least 1%. The last requirement removes jets originating from instrumental effects. Jet energy corrections are applied as a function of η and p T of the jet [76]. The missing transverse momentum vector, p miss T , is calculated offline as the negative of the vectorial sum of transverse momenta of all PF objects identified in the event [77], and the magnitude of this vector is denoted p miss T in the rest of this article.
The identification of b jets is performed using a combined multivariate (CMVA) b tagging algorithm [78,79]. This algorithm combines, in a likelihood discriminant, information within jets that helps differentiate between b jets and jets originating from light quarks, gluons, or charm quarks. This information includes track impact parameters, secondary vertices, and information related to low-p T leptons if contained within a jet. The output of this discriminant has continuous values between −1.0 and 1.0. A jet with a CMVA discriminant value above a certain threshold is labeled as "b-tagged". The efficiency for tagging b jets and the rate of misidentification of non-b jets depend on the threshold chosen, and are typically parameterized as a function of the p T and η of the jets. These performance measurements are obtained directly from data in samples that can be enriched in b jets, such as tt and multijet events (where, for example, requiring the presence of a muon in the jets enhances the heavy-flavor content of the events). Three thresholds for the CMVA discriminant value are used in this analysis: loose (CMVA L ), medium (CMVA M ), and tight (CMVA T ). Depending on the threshold used, the efficiencies for tagging jets that originate from b quarks, c quarks, and light quarks or gluons are in the 50-75%, 5-25%, and 0.15-3.0% ranges, respectively. The loose (tight) threshold has the highest (lowest) efficiency and allows most (least) contamination.
In background events, particularly tt, there is often additional, low energy, hadronic activity in the event. Measuring the hadronic activity associated with the main primary vertex provides additional discriminating variables to reject background. To measure this hadronic activity only reconstructed charged-particle tracks are used, excluding those associated with the vector boson and the two b jets. A collection of "additional tracks" is assembled using reconstructed tracks that: (i) satisfy the high purity quality requirements defined in Ref. [80] and p T > 300 MeV; (ii) are not associated with the vector boson, nor with the selected b jets in the event; (iii) have a minimum longitudinal impact parameter, |d z (PV)|, with respect to the main PV, rather than to other pileup interaction vertices; (iv) satisfy |d z (PV)| < 2 mm; and (v) are not in the region between the two selected b-tagged jets. This region is defined as an ellipse in the η-φ plane, centered on the midpoint between the two jets, with major axis of length ∆R(bb)+1, where ∆R(bb)= (∆η bb ) 2 +(∆φ bb ) 2 , oriented along the direction connecting the two b jets, and with minor axis of length 1. The additional tracks are then clustered into "softtrack jets" using the anti-k T clustering algorithm with a distance parameter of 0.4. The use of track jets represents a clean and validated method [81] to reconstruct the hadronization of partons with energies down to a few GeV [82]; an extensive study of the soft-track jet activity can be found in Refs. [83,84]. The number of soft track jets with p T > 5 GeV is used in all channels as a background discriminating variable.
Events from data and from the simulated samples are required to satisfy the same trigger and event reconstruction requirements. Corrections that account for the differences in the performance of these algorithms between data and simulated samples are computed from data and used in the analysis.

Event selection
A signal region enriched in VH events is determined separately for each channel. Simulated events in this region are used to train an event BDT discriminant to help differentiate between signal and background events. Also for each channel, different control regions, each enriched in events from individual background processes, are selected. These regions are used to study the agreement between simulated samples and data, and to provide a distribution that is combined with the output distribution of the signal region event BDT discriminant in the H → bb signalextraction fit. This control region distribution is obtained from the second-highest value of the CMVA discriminant among the two jets selected for the reconstruction of the H → bb decay, denoted CMVA min .
As mentioned in the Introduction, background processes to VH production with H → bb are the production of vector bosons in association with one or more jets (V+jets), tt production, single-top-quark production, diboson production, and QCD multijet production. These processes have production cross sections that are several orders of magnitude larger than that of the Higgs boson, with the exception of the VZ process with Z → bb, with an inclusive cross section only about 15 times larger than the VH production cross section. Given the nearly identical final state, this process provides a benchmark against which the Higgs boson search strategy can be tested. The results of this test are discussed in Section 7.1.
Below we describe the selection criteria used to define the signal regions and the variables used to construct the event BDT discriminant. Also described are the criteria used to select appropriate background-specific control regions and the corresponding distributions used in the signal-extraction fit.

Signal regions
The signal region requirements are listed in Table 1. Events are selected to belong exclusively to only one of the three channels. Signal events are characterized by the presence of a vector boson recoiling against two b jets with an invariant mass near 125 GeV. The event selection therefore relies on the reconstruction of the decay of the Higgs boson into two b-tagged jets and on the reconstruction of the leptonic decay modes of the vector boson.
The reconstruction of the H → bb decay is based on the selection of the pair of jets that have the highest values of the CMVA discriminant among all jets in the event. The highest and second-highest values of the CMVA discriminant for these two jets are denoted by CMVA max and CMVA min , respectively. Both jets are required to be central (with |η| < 2.4), to satisfy standard requirements to remove jets from pileup [85], and to have a p T above a minimum threshold, that can be different for the highest (j 1 ) and second-highest (j 2 ) p T jet. The selected dijet pair is denoted by "jj" in the rest of this article.
The background from V+jets and diboson production is reduced significantly when the b tagging requirements are applied. As a result, processes where the two jets originate from genuine b quarks dominate the sample composition in the signal region. To provide additional suppression of background events, several other requirements are imposed on each channel after the reconstruction of the H → bb decay. Table 1: Selection criteria that define the signal region. Entries marked with "-" indicate that the variable is not used in the given channel. Where selections differ for different p T (V) regions, there are comma separated entries of thresholds or square brackets with a range that indicate each region's selection as defined in the first row of the table. The values listed for kinematic variables are in units of GeV, and for angles in units of radians. Where selection differs between lepton flavors, the selection is listed as (muon, electron).

0-lepton channel
This channel targets mainly Z(νν)H events in which the p miss T is interpreted as the transverse momentum of the Z boson in the Z → νν decay. In order to overcome large QCD multijet backgrounds, a relatively high threshold of p miss T > 170 GeV is required. The QCD multijet background is further reduced to negligible levels in this channel when requiring that the p miss T does not originate from the direction of (mismeasured) jets. To that end, if there is a jet with |η| < 2.5 and p T > 30 GeV, whose azimuthal angle is within 0.5 radians of the p miss T direction, the event is rejected. The rejection of multijet events with p miss T produced by mismeasured jets is aided by using a different missing transverse momentum reconstruction, denoted p miss T (trk), obtained by considering only charged-particle tracks with p T > 0.5 GeV and |η| < 2.5. For an event to be accepted, it is required that p miss T (trk) and p miss T be aligned in azimuth within 0.5 radians. To reduce background events from tt and WZ production channels, events with any additional isolated leptons with p T > 20 GeV are rejected. The number of these additional leptons is denoted by N a .

1-lepton channel
This channel targets mainly W( ν)H events in which candidate W → ν decays are identified by the presence of one isolated lepton as well as missing transverse momentum, which is implicitly required in the p T (V) selection criteria mentioned below, where p T (V) is calculated from the vectorial sum of p miss T and the lepton p T . Muons (electrons) are required to have p T > 25 (30) GeV. It is also required that the azimuthal angle between the p miss T direction and the lepton be less than 2.0 radians. The lepton isolation for either flavor of lepton is required to be smaller than 6% of the lepton p T . These requirements significantly reduce possible contamination from QCD multijet production. With the same motivation as in the 0-lepton channel, events with any additional isolated leptons are rejected. To substantially reject tt events, the number of additional jets with |η| < 2.9 and p T > 25 GeV, N aj , is allowed to be at most one.

2-lepton channel
This channel targets Z → decays, which are reconstructed by combining isolated, oppositely charged pairs of electrons or muons and requiring the dilepton invariant mass to satisfy 75 < M( ) < 105 GeV. The p T for each lepton is required to be greater than 20 GeV. Isolation requirements are relaxed in this channel as the QCD multijet background is practically eliminated after requiring compatibility with the Z boson mass [86].

p T (V) requirements, H → bb mass reconstruction, and event BDT discriminant
Background events are substantially reduced by requiring significant large transverse momentum of the reconstructed vector boson, p T (V), or of the Higgs boson candidate [87]. In this kinematic region, the V and H bosons recoil from each other with a large azimuthal opening angle, ∆φ(V, H), between them. Different p T (V) regions are selected for each channel. Because of different signal and background content, each of these regions has different sensitivity and the analysis is performed separately in each region. For the 0-lepton channel, a single region requiring p miss T > 170 GeV is studied. The 1-lepton channel is also analyzed in a single region, with p T (V) > 100 GeV. The 2-lepton channels consider two regions: low-and high-p T regions defined by 50 < p T (V) < 150 GeV and p T (V) > 150 GeV.
After all event selection criteria described in this section are applied, the dijet invariant mass resolution of the two b jets from the Higgs boson decay is approximately 15%, depending on the p T of the reconstructed Higgs boson, with a few percent shift in the value of the mass peak relative to 125 GeV. The Higgs boson mass resolution is further improved by applying multivariate regression techniques similar to those used at the CDF experiment [88] and used for several Run 1 H → bb analyses by ATLAS and CMS [37,38]. The regression estimates a correction that is applied after the jet energy corrections discussed in Section 4. It is computed for individual b jets in an attempt to improve the accuracy of the measured energy with respect to the b quark energy. To this end, a BDT is trained on b jets from simulated tt events with inputs that include detailed jet structure information, which differs in jets from b quarks from that of jets from light-flavor quarks or gluons. These inputs include variables related to several properties of the secondary vertex (when reconstructed), information about tracks, jet constituents, and other variables related to the energy reconstruction of the jet. Because of semileptonic b hadron decays, jets from b quarks contain, on average, more leptons and a larger fraction of missing energy than jets from light quarks or gluons. Therefore, in the cases where a low-p T lepton is found in the jet or in its vicinity, the following variables are also included in the regression BDT: the p T of the lepton, the ∆R distance between the lepton and the jet directions, and the momentum of the lepton transverse to the jet direction.
For the three channels under consideration, the H → bb mass resolution, measured on simulated signal samples when the regression-corrected jet energies are used, is in the 10-13% range, and it depends on the p T of the reconstructed Higgs boson. Averaging over all channels, the improvement in mass resolution is approximately 15%, resulting in an increase of about 10% in the sensitivity of the analysis. The performance of these corrections is shown in Fig. 1 for simulated samples of Z( )H(bb) events. The validation of the technique in data is done using the p T ( )/p T (jj) distribution in samples of Z → events containing two b-tagged jets, and using the reconstructed top quark mass distribution in the lepton+jets final state in tt-enriched sam- ples. After the jets are corrected, the root-mean-square value of both distributions decreases, the peak value of the p T ( )/p T (jj) distribution is shifted closer to 1.0, and the peak value of the reconstructed top quark mass gets closer to the top quark mass. These distributions show good agreement between data and the simulated samples before and after the regression correction is applied. Importantly, the reconstructed dijet invariant mass distributions for background processes do not develop a peak structure when the regression correction is applied to the selected b-tagged jets in the event.
As mentioned above, to help separate signal from background in the signal region, an event BDT discriminant is trained using simulated samples for signal and all background processes. The set of event input variables used, listed in Table 2, is chosen by iterative optimization from a larger number of potentially discriminating variables. Among the most discriminating variables for all channels are the dijet invariant mass distribution, M(jj), the number of additional jets, N aj , the value of CMVA for the jet with the second-highest CMVA value, CMVA min , and the distance, ∆R(jj), between the two jets.

Background control regions
To help determine the normalization of the main background processes, and to validate how well the simulated samples model the distributions of variables most relevant to the analysis, several control regions are selected in data. Tables 3-5 list the selection criteria used to define these regions for the 0-, 1-, and 2-lepton channels, respectively. Separate control regions are specified for tt production and for the production of W and Z bosons in association with either predominantly heavy-flavor (HF) or light-flavor (LF) jets. While some control regions are very pure in their targeted background process, others contain more than one process.
Different background processes feature specific b jet compositions, e.g. two genuine b jets for tt and V+bb, one genuine b jet for V+b, no genuine b jet for V+udscg. This characteristic, together with their different kinematic distributions, results in distinct CMVA min distributions Jets are counted as additional jets to those selected to reconstruct the H → bb decay if they satisfy the following: p T > 30 GeV and |η| < 2.4 for the 0-and 2-lepton channels, and p T > 25 GeV and |η| < 2.9 for the 1-lepton channel.
distance in η-φ between jets ∆η(jj) difference in η between jets ∆φ(jj) azimuthal angle between jets p T (V) vector boson transverse momentum ∆φ(V, jj) azimuthal angle between vector boson and dijet directions >0.5 ∆φ( p miss T , p miss T (trk)) -<0.5 <0.5 min ∆φ(j, p miss T ) < π/2 -- Table 4: Definition of the control regions for the 1-lepton channels. The HF control region is divided into low-and high-mass ranges as shown in the table. The significance of p miss T , σ(p miss T ), is p miss T divided by the square root of the scalar sum of jet p T where jet p T > 30 GeV. The values listed for kinematic variables are in units of GeV, except for σ(p miss T )whose units are √ GeV. For angles units are radians. Entries marked with "-" indicate that the variable is not used in that region.  Table 6: Data/MC scale factors for each of the main background processes in each channel, as obtained from the combined signal-extraction fit to control and signal region distributions described in Section 7. Electron and muon samples in the 1-and 2-lepton channels are fit simultaneously to determine average scale factors. The same scale factors for W+jets processes are used for the 0-and 1-lepton channels.
Process 0-lepton 1-lepton 2-lepton low-p T (V) 2-lepton high-p T (V) W0b 1.14 ± 0.07 1.14 ± 0.07 --W1b 1 that serve to extract the normalization scale factors of the various simulated background samples when fit to data in conjunction with the BDT distributions in the signal region to search for a possible VH signal. In this signal-extraction fit, discussed further in Section 7, the shape and normalization of these distributions are allowed to vary, for each background component, within the systematic and statistical uncertainties described in Section 6. These uncertainties are treated as independent nuisance parameters. The simulated samples for the V+jets processes are split into independent subprocesses according to the number of MC generator-level jets (with p T > 20 GeV and |η| < 2.4) containing at least one b hadron. Table 6 lists the scale factors obtained from the fit. These account not only for possible cross section discrepancies, but also for potential residual differences in the selection efficiency of the different objects in the detector. Scale factors obtained from a similar fit to the control regions alone are consistent with those in Table 6. Given the significantly different event selection criteria, each channel probes different kinematic and topological features of the same background processes and variations in the value of the scale factors across channels are to be expected.  Table 6 have been applied to the corresponding simulated samples. Figure 3 shows examples of CMVA min and event BDT distributions, also for different control regions and for different channels, where not only the scale factors are applied but also the shapes of the distributions are allowed to vary according to the treatment of systematic uncertainties from all nuisances in the signalextraction fit. These BDT distributions are from control regions and do not participate in that fit. The signal region BDT distributions used in the fit are presented in Section 7.
In inclusive vector boson samples, selected for this analysis, the p T (V) spectrum in data is observed to be softer than in simulated samples, as expected from higher-order electroweak corrections to the production processes [89]. The events in all three channels are re-weighted to account for the electroweak corrections to p T (V). The correction is negligible for low p T (V) but is sizable at high p T (V), reaching 10% near 400 GeV.
After these corrections, a residual discrepancy in p T (V) between data and simulated samples is observed in some control regions. In the 0-lepton channel, tt samples are re-weighted as a function of the generated top quark's p T according to the observed discrepancies in data and simulated samples in differential top quark cross section measurements [90]. This re-weighting resolves the discrepancy in p T (V) in tt control regions. In the 1-lepton channel, additional corrections are needed for W+jets samples, and corrections are derived from the data in 1lepton control regions for these processes: tt, W+udscg, and the sum of W+b, W+bb, and single  Table 6. The top row of plots is from the 0-lepton Z+HF control region. The middle row shows variables in the 1-lepton tt control region. The bottom row shows variables in the 2lepton Z+HF control region. The plots on the left are always p T (V). Plots on the right show a key variable that is validated in that control region. These variables are, from top to bottom, the azimuthal angle between the two jets that comprise the Higgs boson, the reconstructed top quark mass, and the ratio of p T (V) and p T (jj). top quark backgrounds. A re-weighting of simulated events in p T (V) is derived for each, such that the shape of the sum of simulated processes matches the data. The correction functions are extracted through a simultaneous fit of linear functions in p T (V). The uncertainties in the fit parameters are used to assess the systematic uncertainty. The p T (V) spectra resulting from re-weighting in either the top quark p T or p T (V) are equivalent.
The V+jets LO simulated samples are used in the analysis because, due to computing resource limitations, considerably more events are available than for the NLO samples. A normalization K factor is applied to the LO samples to account for the difference in cross sections. Kinematic distributions between the two samples are found to be consistent after matching the LO distribution of the pseudorapidity separation ∆η(jj) between the two H → bb jet candidates to the NLO one. Different corrections are derived depending on whether these two jets are matched to zero, one, or two b quarks. Both the ∆η(jj) distributions of the NLO samples and the corrected LO samples agree well with data in control regions.

Systematic uncertainties
Systematic effects affect the H → bb mass resolution, the shapes of the CMVA min distributions, the shapes of the event BDT distributions, and the signal and background yields in the most sensitive region of the BDT distributions. The uncertainties associated with the normalization scale factors of the simulated samples for the main background processes have the largest impact on the uncertainty in the fitted signal strength µ. The next largest effects result from the size of the simulated samples and from uncertainties in correcting mismodeling of kinematic variables, both in signal and in background simulated samples. The next group of significant systematic uncertainties are related to b tagging uncertainties and uncertainties in jet energy. All systematic uncertainties considered are listed in Table 7 and are described in more detail below, in the same order as they appear in the Table. The sizes of simulated samples are sometimes limited. If the statistical uncertainty in the content of certain bins in the BDT distributions for the simulated samples is large, Poissonian nuisance parameters are used in the signal extraction binned-likelihood fit. These are required mainly in the V+jets samples and are a leading source of systematic uncertainty in the analysis.
The corrections to the p T (V) spectra in the tt and W+jets samples are applied per sample according to the uncertainty in the simultaneous p T (V) fit described in Section 5.2. This uncertainty on the correction is at most 5% on the background yield near p T (V) of 400 GeV. The shape difference in the event BDT and CMVA min distributions between simulations of two event generators are used to account for imperfect modeling in the nominal simulated samples. For the V+jets, the difference between the shapes for events generated with the MADGRAPH5 aMC@NLO MC generator at LO and NLO is considered as a shape systematic uncertainty. For the tt process, the differences in the shapes between the nominal sample generated with POWHEG and that obtained from the MC@NLO [91] generator are considered as shape systematic uncertainties. Variations on the QCD factorization and renormalization scales and on the PDF choice are considered for the simulated signal and background samples. The scales are varied by factors of 0.5 and 2.0, independently, while the PDF uncertainty effect on the shapes of the BDT distributions is evaluated by using the PDF replicas associated to the NNPDF set [65].
The b tagging efficiencies and the probability to tag as a b jet a jet originating from a different flavor (mistag) are measured in heavy-flavor enhanced samples of jets that contain muons and are applied consistently to jets in signal and background events. The measured uncertainties for the b tagging scale factors are: 1.5% per b-quark tag, 5% per charm-quark tag, and 10% per  [79]. These uncertainties are propagated to the CMVA min distributions by re-weighting events. The shape of the event BDT distribution is also affected by the shape of the CMVA distributions because CMVA min is an input to the BDT discriminant. For the 2-lepton channel CMVA max is also an input to this discriminant. The signal strength uncertainty increases by 8% and 5%, respectively, due to b tagging efficiency and mistag scale factor uncertainties propagated through the CMVA distributions and finally to the event BDT distributions.
The uncertainties in the jet energy scale and resolution have an effect on the shape of the event BDT output distribution because the dijet invariant mass is a crucial input variable to the BDT discriminant. The impact of the jet energy scale uncertainty is determined by recomputing the BDT output distribution after shifting the energy scale up and down by its uncertainty. Similarly, the impact of the jet energy resolution is determined by recomputing the BDT output distribution after increasing or decreasing the jet energy resolution. The uncertainties in jet energy scale and resolution affect not only the jets in the event but also the p miss T , which is recalculated when these variations are applied. The individual contribution to the increase in signal strength uncertainty is found to be around 6% for the jet energy scale and 4% for the jet energy resolution uncertainty. The uncertainty in the jet energy scale and resolution vary as functions of jet p T and η. For the jet energy scale there are several sources of uncertainty that are derived and applied independently as they are fully uncorrelated between themselves [92], while for the jet energy resolution a single shape systematic is evaluated.
The total VH signal cross section has been calculated to NNLO+NNLL accuracy in QCD, combined with NLO electroweak corrections, and the associated systematic uncertainties [63] include the effect of scale variations and PDF uncertainties. The estimated uncertainties in the NLO electroweak corrections are 7% for the WH and 5% for the ZH production processes, respectively. The estimate for the NNLO QCD correction results in an uncertainty of 1% for the WH and 4% for the ZH production processes, which includes the ggZH contribution.
An uncertainty of 15% is assigned to the event yields obtained from simulated samples for both single top quark and diboson production. These uncertainties are about 25% larger than those from the CMS measurements of these processes [93][94][95], to account for the different kinematic regime in which those measurements are performed.
Another source of uncertainty that affects the p miss T reconstruction is the estimate of the energy that is not clustered in jets [77]. This affects only the 0-and 1-lepton channels, with an individual contribution to the signal strength uncertainty of 1.3%.
Muon and electron trigger, reconstruction, and identification efficiencies in simulated samples are corrected for differences in data and simulation using samples of leptonic Z boson decays. These corrections are affected by uncertainties coming from the efficiency measurement method, the lepton selection, and the limited size of the Z boson samples. They are measured and propagated as functions of lepton p T and η. The parameters describing the turn-on curve that parametrizes the Z(νν)H trigger efficiency as a function of p miss T are varied within their statistical uncertainties, and are also estimated for different assumptions on the methods used to derive the efficiency. The total individual impact of these uncertainties on lepton identification and trigger efficiencies on the measured signal strength is about 2%.
The uncertainty in the CMS integrated luminosity measurement is estimated to be 2.5% [96]. Events in simulated samples must be re-weighted such that the distribution of pileup in the simulated samples matches that estimated in data. A 5% uncertainty on pileup re-weighting is assigned, but the impact of this uncertainty is negligible.
The combined effect of the systematic uncertainties results in a 25% reduction of the expected significance for the SM Higgs boson rate.

Results
Results are obtained from combined signal and background binned-likelihood fits, simultaneously for all channels, to both the shape of the output distribution of the event BDT discriminants in the signal region and to the CMVA min distributions for the control regions corresponding to each channel. The BDT discriminants are trained separately for each channel to search for a Higgs boson with a mass of 125 GeV. To remove the background-dominated portion of the BDT output distribution, only events with a BDT output value above thresholds listed in Table 1 are considered. To achieve a better sensitivity in the search, this threshold is optimized separately for each channel. In this signal-extraction fit, the shape and normalization of all distributions for signal and for each background component are allowed to vary within the systematic and statistical uncertainties described in Section 6. These uncertainties are treated as independent nuisance parameters in the fit. Nuisance parameters, the signal strength, and the scale factors described in Section 5.2 are allowed to float freely and are adjusted by the fit.
In total, seven event BDT output distributions are included in the fit: one for the 0-lepton channel, one for each lepton flavor for the 1-lepton channels, and two for each lepton flavor for the 2-lepton channels (corresponding to the two p T (V) regions). The number of CMVA min distributions included is 24, corresponding to the control regions listed in Tables 3-5: three for the 0-lepton channel, four for each lepton flavor for the 1-lepton channels, and six for each lepton flavor for the 2-lepton channels (each corresponding to one of two p T (V) regions). Figure 4 shows the seven BDT distributions after they have been adjusted by the fit. Figure 5 combines the BDT output values of all channels where the events are gathered in bins of similar expected Table 8: The total numbers of events in each channel, for the rightmost 20% region of the event BDT output distribution, are shown for all background processes, for the SM Higgs boson VH signal, and for data. The yields from simulated samples are computed with adjustments to the shapes and normalizations of the BDT distributions given by the signal extraction fit. The signal-to-background ratio (S/B) is also shown.
Process 0-lepton 1-lepton 2-lepton low-p T (V) 2-lepton high-p T (V) signal-to-background ratio, as given by the value of the output of their corresponding BDT discriminant. The observed excess of events in the bins with the largest signal-to-background ratio is consistent with what is expected from the production of the SM Higgs boson. To detail this excess, the total numbers of events for all backgrounds, for the SM Higgs boson signal, and for data are shown in Table 8 for each channel, for the rightmost 20% region of the BDT output distribution, where the sensitivity is large. The simulation yields are adjusted using the results of fit.
The significance of the observed excess of events in the signal extraction fit is computed using the standard LHC profile likelihood asymptotic approximation [97][98][99][100]. For m H = 125.09 GeV, it corresponds to a local significance of 3.3 standard deviations away from the backgroundonly hypothesis. This excess is consistent with the SM prediction for Higgs boson production with signal strength µ = 1.19 +0.21 −0.20 (stat) +0.34 −0.32 (syst). The expected significance is 2.8 standard deviations with µ = 1.0. Together with this result, Table 9 also lists the expected and observed significances for the 0-lepton channel, for the 1-lepton channels combined, and for the 2-lepton channels combined. The observed signal strength µ is shown in the lower portion of Fig. 6 for 0-, 1-and 2-lepton channels. The observed signal strengths of the three channels are consistent with the combined best fit signal strength with a probability of 5%. In the upper portion of Fig. 6 the signal  strengths for the separate WH and ZH production processes are shown. The two production modes are consistent with the SM expectations within uncertainties. The fit for the WH and ZH production modes is not fully correlated to the analysis channels because the analysis channels contain mixed processes. The WH process contributes approximately 15% of the Higgs boson signal event yields in the 0-lepton channel, resulting from events in which the lepton is outside the detector acceptance, and the ZH process contributes less than 3% to the 1-lepton channel when one of the leptons is outside the detector acceptance. Figure 7 shows a dijet invariant mass distribution, combined for all channels, for data and for the VH and VZ processes, with all other background processes subtracted. The distribution is constructed from all events that populate the signal region event BDT distributions shown in Fig. 4. The values of the scale factors and nuisance parameters from the fit used to extract the VH signal are propagated to this distribution. To better visualize the contribution of events from signal, all events are weighted by S/(S+B), where S and B are the numbers of expected signal and total post-fit background events in the bin of the output of the BDT distribution in which each event is contained. The data are consistent with the production of a standard model Higgs boson decaying to bb. In the Figure, aside from the weights, which favor the VH process, the event yield from VZ processes is reduced significantly due to the p T (V) and M(jj) selection requirements for the VH signal region, and from the training of the BDT that further discriminates against diboson processes.

Extraction of VZ with Z → bb
The VZ process with Z → bb, having a nearly identical final state as VH with H → bb, serves as a validation of the methodology used in the search for the latter process. To extract this diboson signal, event BDT discriminants are trained using as signal the simulated samples for this process. All other processes, including VH production (at the predicted SM rate), are treated as background. The only modification made is the requirement that the signal region M(jj) be in the [60,160] GeV range.  The results from the combined fit for all channels of the control and signal region distributions, as defined in Sections 5.1 and 5.2, are summarized in Table 10 for the same √ s = 13 TeV data used in the VH search described above. The observed excess of events for the combined WZ and ZZ processes has a significance of 5.0 standard deviations from the backgroundonly event yield expectation. The corresponding signal strength, relative to the prediction of the MADGRAPH5 aMC@NLO generator at NLO mentioned in Section 2, is measured to be µ VV = 1.02 +0.22 −0.23 .  Figure 8 shows the combined event BDT output distribution for all channels, with the content of each bin, for each channel, weighted by the expected signal-to-background ratio. The excess of events in data, over background, is shown to be compatible with the yield expectation from VZ production with Z → bb.
0.5 1 1.5 Figure 8: Combination of all channels in the VZ search, with Z → bb into a single event BDT distribution. Events are sorted in bins of similar expected signal-to-background ratio, as given by the value of the output of their corresponding BDT discriminant. The bottom inset shows the ratio of the data to the predicted background, with a red line overlaying the expected SM contribution from VZ with Z → bb.

Combination with Run 1 VH(bb) analysis
The results from the search for VH with H → bb, presented in this article, are combined with those from the similar searches performed by the CMS experiment [18, 36,38] during Run 1 of the LHC, using proton-proton collisions at √ s = 7 and 8 TeV with data samples corresponding to integrated luminosities of up to 5.1 and 18.9 fb −1 , respectively. The combination yields an observed signal significance, at m H = 125.09 GeV, of 3.8 standard deviations, where 3.8 are expected. The corresponding signal strength is µ = 1.06 +0.31 −0.29 . All systematic uncertainties are assumed to be uncorrelated in the combination, except for cross section uncertainties derived from theory, which are assumed to be fully correlated. Treating all uncertainties as uncorrelated has a negligible effect on the significance. Table 11 lists these results.  [9] ATLAS Collaboration, "Study of (W/Z)H production and Higgs boson couplings using H → WW * decays with the ATLAS detector", JHEP 08 (2015) 137, doi:10.1007/JHEP08(2015)137, arXiv:1506.06641. [14] ATLAS Collaboration, "Measurements of the Higgs boson production and decay rates and coupling strengths using pp collision data at √ s = 7 and 8 TeV in the ATLAS experiment", Eur. Phys. J. C 76 (2016) 6, doi:10.1140/epjc/s10052-015-3769-y, arXiv:1507.04548.