Observation of the Higgs boson decay to a pair of τ leptons with the CMS detector

A measurement of the H → ττ signal strength is performed using events recorded in proton–proton collisions by the CMS experiment at the LHC in 2016 at a center-of-mass energy of 13 TeV. The data set corresponds to an integrated luminosity of 35.9 fb⁻¹. The H → ττ signal is established with a significance of 4.9 standard deviations, to be compared to an expected significance of 4.7 standard deviations. The best fit of the product of the observed H → ττ signal production cross section and branching fraction is 1.09 +0.27/−0.26 times the standard model expectation. The combination with the corresponding measurement performed with data collected by the CMS experiment at center-of-mass energies of 7 and 8 TeV leads to an observed significance of 5.9 standard deviations, equal to the expected significance. This is the first observation of Higgs boson decays to τ leptons by a single experiment.


Introduction
In the standard model (SM) of particle physics [1][2][3], electroweak symmetry breaking is achieved via the Brout-Englert-Higgs mechanism [4][5][6][7][8][9], leading, in its minimal version, to the prediction of the existence of one physical neutral scalar particle, commonly known as the Higgs boson (H). A particle compatible with such a boson was observed by the ATLAS and CMS experiments at the CERN LHC in the ZZ, γγ, and W⁺W⁻ decay channels [10][11][12], during the proton-proton (pp) data taking period in 2011 and 2012 at center-of-mass energies of √s = 7 and 8 TeV, respectively. Subsequent results from both experiments, described in Refs. [13][14][15][16][17][18], established that the measured properties of the new particle, including its spin, CP properties, and coupling strengths to SM particles, are consistent with those expected for the Higgs boson predicted by the SM. The mass of the Higgs boson has been determined to be 125.09 ± 0.21 (stat) ± 0.11 (syst) GeV, from a combination of ATLAS and CMS measurements [19].
To establish the mass generation mechanism for fermions, it is necessary to probe the direct coupling of the Higgs boson to such particles. The most promising decay channel is τ⁺τ⁻, because of the large event rate expected in the SM compared to the μ⁺μ⁻ decay channel (B(H → τ⁺τ⁻) = 6.3% for a mass of 125.09 GeV), and of the smaller contribution from background events with respect to the bb decay channel.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume, there are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
Events of interest are selected using a two-tiered trigger system [29]. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 μs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to about 1 kHz before data storage.
Significant upgrades of the L1 trigger during the first long shutdown of the LHC have benefited this analysis, especially in the τ h τ h channel. These upgrades improved the τ h identification at L1 by giving more flexibility to object isolation, allowing new techniques to suppress the contribution from additional pp interactions per bunch crossing, and to reconstruct the L1 τ h object in a fiducial region that matches more closely that of a true hadronic τ decay. The flexibility is achieved by employing high bandwidth optical links for data communication and large field-programmable gate arrays (FPGAs) for data processing.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [30].

Simulated samples
Signal and background processes are modeled with samples of simulated events. The signal samples with a Higgs boson produced through gluon fusion (ggH), vector boson fusion (VBF), or in association with a W or Z boson (WH or ZH), are generated at next-to-leading order (NLO) in perturbative quantum chromodynamics (pQCD) with the POWHEG 2.0 [31][32][33][34][35] generator. The MINLO HVJ [36] extension of POWHEG 2.0 is used for the WH and ZH simulated samples. The set of parton distribution functions (PDFs) is NNPDF30_nlo_as_0118 [37]. The ttH process is negligible. The various production cross sections and branching fractions for the SM Higgs boson production, and their corresponding uncertainties are taken from Refs. [38][39][40] and references therein.
The MG5_aMC@NLO [41] generator is used for Z + jets and W + jets processes. They are simulated at leading order (LO) with the MLM jet matching and merging [42]. The MG5_aMC@NLO generator is also used for diboson production simulated at next-to-LO (NLO) with the FxFx jet matching and merging [43], whereas POWHEG 2.0 and 1.0 are used for tt and single top quark production, respectively. The generators are interfaced with PYTHIA 8.212 [44] to model the parton showering and fragmentation, as well as the decay of the τ leptons. The PYTHIA parameters affecting the description of the underlying event are set to the CUETP8M1 tune [45].
Generated events are processed through a simulation of the CMS detector based on Geant4 [46], and are reconstructed with the same algorithms used for data. The simulated samples include additional pp interactions per bunch crossing, referred to as "pileup". The effect of pileup is taken into account by adding concurrent minimum bias collision events generated with PYTHIA. The simulated events are weighted such that the distribution of the number of additional pileup interactions, estimated from the measured instantaneous luminosity for each bunch crossing, matches that in data, with an average of approximately 27 interactions per bunch crossing.
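The pileup reweighting described above can be illustrated with a minimal Python sketch. It assumes that the data and simulated pileup distributions are available as histograms with identical binning; the function name and the handling of empty simulated bins are illustrative choices, not taken from the analysis software.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights that reshape the simulated pileup distribution to data.

    data_hist, mc_hist: lists of bin contents with the same binning in the
    number of pileup interactions. Both are normalized to unit area first.
    """
    data_total = float(sum(data_hist))
    mc_total = float(sum(mc_hist))
    weights = []
    for d, m in zip(data_hist, mc_hist):
        # Assumption: bins without simulated events cannot be reweighted
        # and receive a weight of zero.
        weights.append((d / data_total) / (m / mc_total) if m > 0 else 0.0)
    return weights


# Toy histograms (hypothetical numbers, three pileup bins).
print(pileup_weights([10, 20, 10], [20, 10, 10]))
```

Each simulated event would then be multiplied by the weight of the bin corresponding to its number of pileup interactions.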

Event reconstruction
The reconstruction of observed and simulated events relies on the particle-flow (PF) algorithm [47], which combines the information from the CMS subdetectors to identify and reconstruct the particles emerging from pp collisions: charged hadrons, neutral hadrons, photons, muons, and electrons. Combinations of these PF objects are used to reconstruct higher-level objects such as jets, τ h candidates, or missing transverse momentum. The reconstructed vertex with the largest value of summed physics-object p_T^2 is taken to be the primary pp interaction vertex. The physics objects are the objects constructed by a jet finding algorithm [48,49] applied to all charged tracks associated with the vertex, including tracks from lepton candidates, and the corresponding associated missing transverse momentum.
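As an illustration of the primary vertex choice, the following Python sketch selects the vertex with the largest summed squared transverse momentum. The input layout (one list of physics-object p T values per vertex) is a hypothetical simplification; the construction of the physics objects themselves is not reproduced here.

```python
def choose_primary_vertex(vertices):
    """Return the index of the vertex with the largest sum of p_T^2.

    vertices: list of lists; each inner list holds the p_T values (GeV) of
    the physics objects associated with that vertex (hypothetical layout).
    """
    sums = [sum(pt * pt for pt in objects) for objects in vertices]
    return max(range(len(vertices)), key=lambda i: sums[i])


# Toy example: the single 15 GeV object (225 GeV^2) beats two 10 GeV objects (200 GeV^2).
print(choose_primary_vertex([[10.0, 10.0], [15.0], [3.0, 4.0]]))
```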
Muons are identified with requirements on the quality of the track reconstruction and on the number of measurements in the tracker and the muon systems [50]. Electrons are identified with a multivariate discriminant combining several quantities describing the track quality, the shape of the energy deposits in the ECAL, and the compatibility of the measurements from the tracker and the ECAL [51]. To reject non-prompt or misidentified leptons, a relative lepton isolation is defined as:

\[ I^{\ell} \equiv \frac{\sum_{\text{charged}} p_T + \max\left(0, \sum_{\text{neutral}} p_T - \frac{1}{2} \sum_{\text{charged, PU}} p_T\right)}{p_T^{\ell}}. \]

In this expression, $\sum_{\text{charged}} p_T$ is the scalar sum of the transverse momenta of the charged particles originating from the primary vertex and located in a cone of size $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2} = 0.4$ (0.3) centered on the muon (electron) direction. The sum $\sum_{\text{neutral}} p_T$ represents a similar quantity for neutral particles. The contribution of photons and neutral hadrons originating from pileup vertices is estimated from the scalar sum of the transverse momenta of charged hadrons in the cone originating from pileup vertices, $\sum_{\text{charged, PU}} p_T$. This sum is multiplied by a factor of 1/2, which corresponds approximately to the ratio of neutral to charged hadron production in the hadronization process of inelastic pp collisions, as estimated from simulation. The expression $p_T^{\ell}$ stands for the p T of the lepton. Isolation requirements used in this analysis, based on $I^{\ell}$, are listed in Table 1.
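A minimal Python sketch of this isolation variable is given below. It assumes the commonly used pileup-corrected form, in which the neutral sum minus one half of the pileup charged sum is not allowed to become negative; the function and argument names are illustrative, and the inputs are assumed to be already restricted to the isolation cone.

```python
def relative_isolation(lepton_pt, charged_pts, neutral_pts, charged_pu_pts):
    """Relative lepton isolation with a pileup correction for neutrals.

    charged_pts:    p_T of charged particles from the primary vertex in the cone
    neutral_pts:    p_T of neutral particles (photons, neutral hadrons) in the cone
    charged_pu_pts: p_T of charged hadrons from pileup vertices in the cone
    """
    charged = sum(charged_pts)
    neutral = sum(neutral_pts)
    # Factor 1/2: approximate neutral-to-charged ratio quoted in the text.
    pileup_neutral_estimate = 0.5 * sum(charged_pu_pts)
    # Assumption: the corrected neutral term is clamped at zero.
    corrected_neutral = max(0.0, neutral - pileup_neutral_estimate)
    return (charged + corrected_neutral) / lepton_pt


# Toy lepton with p_T = 40 GeV (hypothetical numbers).
print(relative_isolation(40.0, [2.0], [3.0], [2.0]))
```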
Table 1: Kinematic selection requirements for the four di-τ decay channels. The trigger requirement is defined by a combination of trigger candidates with p T over a given threshold (in GeV), indicated inside parentheses. The pseudorapidity thresholds come from trigger and object reconstruction constraints. The p T thresholds for the lepton selection are driven by the trigger requirements, except for the leading τ h candidate in the τ h τ h channel, the τ h candidate in the μτ h and eτ h channels, and the muon in the eμ channel, where they have been optimized to increase the significance of the analysis. Columns: Channel, Trigger requirement, Lepton selection.

Jets are reconstructed with an anti-k T clustering algorithm implemented in the FastJet library [49,52]. It is based on the clustering of neutral and charged PF candidates within a distance parameter of 0.4. Charged PF candidates not associated with the primary vertex of the interaction are not considered when building jets. An offset correction is applied to jet energies to take into account the contribution from additional pp interactions within the same or nearby bunch crossings. The energy of a jet is calibrated based on simulation and data through correction factors [53]. In this analysis, jets are required to have p T greater than 30 GeV and |η| less than 4.7, and are separated from the selected leptons by a ΔR of at least 0.5. The combined secondary vertex (CSV) algorithm is used to identify jets that are likely to originate from a b quark ("b jets"). The algorithm exploits the track-based lifetime information together with the secondary vertices associated with the jet to provide a likelihood ratio discriminator for the b jet identification. A set of p T -dependent correction factors is applied to simulated events to account for differences in the b tagging efficiency between data and simulation. The working point chosen in this analysis gives an efficiency for real b jets of about 70%, and a misidentification rate of about 1% for light-flavor quark or gluon jets.
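The kinematic jet requirements quoted in the text (p T above 30 GeV, |η| below 4.7, and an angular separation of at least 0.5 from the selected leptons) can be sketched in Python as follows. The dictionary-based event layout is a hypothetical simplification, and jet clustering, calibration, and b tagging are not reproduced.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(deta^2 + dphi^2), with dphi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def select_jets(jets, leptons, pt_min=30.0, abs_eta_max=4.7, dr_min=0.5):
    """Keep jets passing the p_T, |eta|, and lepton-separation requirements.

    jets, leptons: lists of dicts with keys "pt", "eta", "phi" (hypothetical layout).
    """
    selected = []
    for jet in jets:
        if jet["pt"] <= pt_min or abs(jet["eta"]) >= abs_eta_max:
            continue
        # Reject jets closer than dr_min to any selected lepton.
        if any(delta_r(jet["eta"], jet["phi"], lep["eta"], lep["phi"]) < dr_min
               for lep in leptons):
            continue
        selected.append(jet)
    return selected


leptons = [{"pt": 35.0, "eta": 0.0, "phi": 0.0}]
jets = [
    {"pt": 50.0, "eta": 1.0, "phi": 0.0},  # kept
    {"pt": 20.0, "eta": 1.0, "phi": 2.0},  # fails p_T
    {"pt": 60.0, "eta": 0.1, "phi": 0.1},  # overlaps with the lepton
    {"pt": 40.0, "eta": 5.0, "phi": 1.0},  # fails |eta|
]
print(len(select_jets(jets, leptons)))
```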
Hadronically decaying τ leptons are reconstructed with the hadron-plus-strips (HPS) algorithm [54,55], which is seeded with anti-k T jets. The HPS algorithm reconstructs τ h candidates on the basis of the number of tracks and of the number of ECAL strips in the η-φ plane with energy deposits, in the 1-prong, 1-prong + π 0 (s), and 3-prong decay modes. A multivariate (MVA) discriminator [56], including isolation and lifetime information, is used to reduce the rate for quark- and gluon-initiated jets to be identified as τ h candidates. The working point used in this analysis has an efficiency of about 60% for genuine τ h , with about 1% misidentification rate for quark- and gluon-initiated jets, for a p T range typical of τ h originating from a Z boson. Electrons and muons misidentified as τ h candidates are suppressed using dedicated criteria based on the consistency between the measurements in the tracker, the calorimeters, and the muon detectors [54,55]. The working points of these discriminators depend on the decay channel studied. The τ h energy scale in simulation is corrected per decay mode, on the basis of a measurement in Z → ττ events. The rate and the energy scale of electrons and muons misidentified as τ h candidates are also corrected in simulation, on the basis of a tag-and-probe measurement [57] in Z → ℓℓ events.
All particles reconstructed in the event are used to determine the missing transverse momentum, p miss T . The missing transverse momentum is defined as the negative vectorial sum of the transverse momenta of all PF candidates [58]. It is adjusted for the effect of jet energy corrections. Corrections to the p miss T are applied to reduce the mismodeling of the simulated Z + jets, W + jets and Higgs boson samples. The corrections are applied to the simulated events on the basis of the vectorial difference of the measured missing transverse momentum and total transverse momentum of neutrinos originating from the decay of the Z, W, or Higgs boson. Their average effect is the reduction of the p miss T obtained from simulation by a few GeV.
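The definition of the missing transverse momentum as the negative vectorial sum of the PF candidate transverse momenta can be illustrated with a short Python sketch. The candidates are represented as hypothetical (p T , φ) pairs, and the jet energy and recoil corrections mentioned in the text are deliberately left out.

```python
import math


def missing_pt(candidates):
    """Negative vector sum of transverse momenta.

    candidates: iterable of (pt, phi) pairs for all PF candidates.
    Returns (magnitude, phi) of the missing transverse momentum.
    """
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py), math.atan2(py, px)


# Two candidates along phi = 0: the missing momentum points in the opposite direction.
magnitude, phi = missing_pt([(10.0, 0.0), (5.0, 0.0)])
print(magnitude, abs(phi))
```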
The visible mass of the ττ system, m vis , can be used to separate the H → ττ signal events from the large contribution of irreducible Z → ττ events. However, the neutrinos from the τ lepton decays carry a large fraction of the τ lepton energy and reduce the discriminating power of this variable. The SVFIT algorithm combines the p miss T with the four-vectors of both τ candidates to calculate a more accurate estimate of the mass of the parent boson, denoted as m ττ . The resolution of m ττ is between 15 and 20% depending on the ττ final state. A detailed description of the algorithm can be found in Ref. [59]. Both variables are used in the analysis, as detailed in Section 6, and m vis is preferred over m ττ when the background from Z → ℓℓ events is large.
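For illustration, the visible mass is simply the invariant mass of the summed four-vectors of the two visible τ decay products. The Python sketch below assumes each leg is described by (p T , η, φ, mass); the helper names are illustrative, and the likelihood-based m ττ reconstruction of Ref. [59] is not attempted.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (p_T, eta, phi, m) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def visible_mass(leg1, leg2):
    """Invariant mass of the sum of two four-vectors (E, px, py, pz)."""
    energy, px, py, pz = (a + b for a, b in zip(leg1, leg2))
    m2 = energy * energy - px * px - py * py - pz * pz
    # Guard against tiny negative values from floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Two massless, back-to-back legs with p_T = 45 GeV give a mass of 90 GeV.
leg_a = four_vector(45.0, 0.0, 0.0, 0.0)
leg_b = four_vector(45.0, 0.0, math.pi, 0.0)
print(visible_mass(leg_a, leg_b))
```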

Event selection
Selected events are classified into the various decay channels according to the number of selected electrons, muons, and τ h candidates. The resulting event samples are made mutually exclusive by discarding events that have additional loosely identified and isolated muons or electrons. Leptons must meet the minimum requirement that the distance of closest approach to the primary vertex satisfies |d z | < 0.2 cm along the beam direction, and |d xy | < 0.045 cm in the transverse plane. The two leptons assigned to the Higgs boson decay are required to have opposite-sign electric charges. In the μτ h channel, events are selected with a combination of online criteria that require at least one isolated muon trigger candidate, or at least one isolated muon and one τ h trigger candidate, depending on the offline muon p T . In the eτ h channel, the trigger system requires at least one isolated electron object, whereas in the eμ channel, the triggers rely on the presence of both an electron and a muon, allowing lower online p T thresholds. In the τ h τ h channel, the trigger selects events with two loosely isolated τ h objects. The selection criteria are summarized in Table 1.
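In the ℓτ h channels, the large W + jets background is reduced by requiring the transverse mass, m T , to satisfy

\[ m_T \equiv \sqrt{2\, p_T^{\ell}\, p_T^{\text{miss}} \left[1 - \cos(\Delta\phi)\right]} < 50\ \text{GeV}, \]

where $p_T^{\ell}$ is the transverse momentum of the lepton ℓ, and Δφ is the azimuthal angle between its direction and the p miss T .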
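The transverse mass requirement can be sketched in Python as follows, assuming the standard definition of m T from the lepton p T , the p miss T , and their azimuthal separation. The 50 GeV threshold is the signal-region value quoted later in the text; function names are illustrative.

```python
import math


def transverse_mass(lepton_pt, lepton_phi, met, met_phi):
    """m_T = sqrt(2 * pT(lepton) * pT(miss) * (1 - cos(dphi)))."""
    dphi = lepton_phi - met_phi
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))


def passes_mt_selection(lepton_pt, lepton_phi, met, met_phi, mt_max=50.0):
    """Signal-region requirement m_T < mt_max (GeV)."""
    return transverse_mass(lepton_pt, lepton_phi, met, met_phi) < mt_max


# A lepton back-to-back with the missing momentum (W-like topology): m_T = 80 GeV.
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
```

Events in which the lepton and the p miss T are back-to-back, as is typical of W boson decays, obtain large m T values and are rejected.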
In the eμ channel, the tt background is reduced by requiring p ζ − 0.85 p vis ζ > −35 or −10 GeV depending on the category, where p ζ is the component of the p miss T along the bisector of the transverse momenta of the two leptons and p vis ζ is the sum of the components of the lepton transverse momenta along the same direction [60]. This selection criterion has a high signal efficiency because the p miss T is typically oriented in the same direction as the visible di-τ system in signal events. In addition, events with a b-tagged jet are discarded to further suppress the tt background in the eμ channel.

Table 2: Category selection and observables used to build the 2D kinematic distributions. The events neither selected in the 0-jet nor in the VBF category are included in the boosted category, as denoted by "Others".
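The p ζ requirement used in the eμ channel can be illustrated with the Python sketch below. It follows the definitions given in the text: p ζ is the projection of the p miss T onto the bisector of the two lepton directions, and p vis ζ is the projection of the summed lepton transverse momenta onto the same axis. The (p T , φ) input layout and the function name are illustrative assumptions.

```python
import math


def d_zeta(lep1, lep2, met, alpha=0.85):
    """Return p_zeta - alpha * p_zeta_vis.

    lep1, lep2, met: (pt, phi) pairs in the transverse plane.
    """
    (pt1, phi1), (pt2, phi2), (met_pt, met_phi) = lep1, lep2, met
    # Unit vectors along the two lepton directions.
    u1 = (math.cos(phi1), math.sin(phi1))
    u2 = (math.cos(phi2), math.sin(phi2))
    # The bisector is the normalized sum of the two unit vectors.
    bx, by = u1[0] + u2[0], u1[1] + u2[1]
    norm = math.hypot(bx, by)
    if norm < 1e-12:
        raise ValueError("back-to-back leptons have no unique bisector")
    zx, zy = bx / norm, by / norm
    p_zeta = met_pt * (math.cos(met_phi) * zx + math.sin(met_phi) * zy)
    p_zeta_vis = (pt1 * u1[0] + pt2 * u2[0]) * zx + (pt1 * u1[1] + pt2 * u2[1]) * zy
    return p_zeta - alpha * p_zeta_vis


# Toy event: leptons at +-45 degrees, missing momentum along their bisector.
value = d_zeta((30.0, math.pi / 4), (30.0, -math.pi / 4), (20.0, 0.0))
print(value, value > -35.0)
```

The threshold applied to the returned value (−35 or −10 GeV) depends on the category and is therefore left to the caller.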

Categorization
The event sample is split into three mutually exclusive categories per decay channel. In each category the two variables that maximize the H → ττ sensitivity are chosen to build two-dimensional (2D) distributions.
The three categories are defined as: Integrating over the whole m jj phase space, up to 57% of the signal events in the VBF category are produced in the VBF production mode, but this proportion increases with m jj .
• Boosted: This category contains all the events that do not enter one of the previous categories, namely events with one jet and events with several jets that fail the specific requirements of the VBF category. It contains gluon fusion events produced in association with one or more jets (78-80% of signal events), VBF events where one of the jets has escaped detection or has low m jj (11-13%), as well as Higgs bosons produced in association with a W or a Z boson decaying hadronically (4-8%). While m τ τ is chosen as one of the dimensions of the distributions, p τ τ T is taken as the second dimension to specifically target Higgs boson events produced in gluon fusion, with a Lorentz-boosted boson recoiling against jets. Most background processes, including W + jets and QCD multijet events, typically have low p τ τ T . The 2D distributions for the signal and W + jets background in the boosted category of the μτ h decay channel are shown in Fig. 1 (bottom).
The categories and the variables used to build the 2D distributions are summarized in Table 2. The results of the analysis are extracted with a global maximum likelihood fit based on the 2D distributions in the various signal regions, and on some control regions, detailed in Section 7, that constrain the normalizations of the main backgrounds.

Background estimation
The largest irreducible source of background is the Drell-Yan production of Z/γ * → ττ, ℓℓ. In order to correct the yield and distributions of the Z/γ * → ττ, ℓℓ simulations to better reproduce the Drell-Yan process in data, a dedicated control sample of Z/γ * → μμ events is collected in data with a single-muon trigger, and compared to simulation. The control sample is composed of events with two well-identified and well-isolated opposite-charge muons with p T greater than 25 GeV and an invariant mass between 70 and 110 GeV. More than 99% of events in this region come from Z/γ * → μμ decays. Differences in the distributions of m ℓℓ/ττ and p T (ℓℓ/ττ) in data and in simulations are observed in this control region, and 2D weights based on these variables are derived and applied to simulated Z/γ * → ττ, ℓℓ events in the signal region of the analysis. In addition, corrections depending on m jj are derived from the Z/γ * → μμ region and applied to the Z/γ * → ττ, ℓℓ simulation for events with at least two jets passing the VBF category selection criteria. After this reweighting, good agreement between data in the Z/γ * → μμ region and simulation is found for all other variables. The simulated sample is split, on the basis of the matching between objects at the generator and at the detector levels, into events with prompt leptons (muons or electrons), hadronic decays of the τ leptons, and jets or misidentified objects at the detector level that do not have corresponding objects at generator level within ΔR < 0.2. The electroweak production of Z bosons in association with two jets is also taken into account in the analysis; it contributes up to 8% of the Z boson production in the VBF category.

Fig. 1. … (bottom) categories in the μτ h decay channel. The background processes are chosen for illustrative purpose for their separation from the signal. The Z → μμ background in the 0-jet category is concentrated in the regions where the visible mass is close to 90 GeV and is negligible when the τ h candidate is reconstructed in the 3-prong decay mode. The Z → ττ background in the VBF category mostly lies at low m jj values whereas the distribution of VBF signal events extends to high m jj values. In the boosted category, the W+jets background, which behaves similarly to the QCD multijet background, is rather flat with respect to m ττ , and is concentrated at low p ττ T values. These distributions are not used as such to extract the results.
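The 2D reweighting of the Drell-Yan simulation, derived from the Z/γ * → μμ control region as a function of the dilepton mass and p T , can be sketched in Python as follows. The bin edges, the yields, and the treatment of empty or out-of-range bins are hypothetical choices made for illustration only.

```python
import bisect


def derive_2d_weights(data_counts, sim_counts):
    """Bin-by-bin data/simulation ratio; inputs are nested lists [mass_bin][pt_bin]."""
    return [
        # Assumption: bins without simulated events keep a weight of 1.
        [d / s if s > 0 else 1.0 for d, s in zip(data_row, sim_row)]
        for data_row, sim_row in zip(data_counts, sim_counts)
    ]


def lookup_weight(weights, mass_edges, pt_edges, mass, pt):
    """Weight for a simulated event with generator-level dilepton mass and p_T."""
    i = bisect.bisect_right(mass_edges, mass) - 1
    j = bisect.bisect_right(pt_edges, pt) - 1
    # Assumption: values outside the binning use the nearest outer bin.
    i = min(max(i, 0), len(mass_edges) - 2)
    j = min(max(j, 0), len(pt_edges) - 2)
    return weights[i][j]


# Hypothetical control-region yields in 2 mass bins x 2 p_T bins.
weights = derive_2d_weights([[110, 90], [100, 120]], [[100, 100], [100, 100]])
print(lookup_weight(weights, [70.0, 90.0, 110.0], [0.0, 50.0, 1000.0], 95.0, 60.0))
```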
The background from W + jets production contributes significantly to the μτ h and eτ h channels, when the W boson decays leptonically and a jet is misidentified as a τ h candidate. The W + jets distributions are modeled using simulation, while their yields are estimated using data, as detailed below. In the boosted and VBF categories, statistical fluctuations in the distributions from simulations are reduced by relaxing the isolation of the τ h and ℓ candidates, which has been checked not to bias the distributions. The simulated sample is normalized in such a way as to obtain agreement between the yields in data and the predicted backgrounds in a control region enriched in the W + jets background, which is obtained by applying all selection criteria, with the exception that m T is required to be greater than 80 GeV instead of less than 50 GeV.
The W + jets event purity in this region varies from about 50% in the boosted category to 85% in the 0-jet category. The high-m T sidebands described above, for each category, are included as control regions in the global fit. The constraints obtained in the boosted category are extrapolated to the VBF category of the corresponding decay channel because the topology of the boosted and VBF events is similar, and few data events would pass the high-m T sideband selection in the VBF category. Fig. 2 shows the control regions with m T > 80 GeV in the 0-jet and boosted categories of the μτ h and eτ h channels. These control regions are composed of only one bin because they are used solely to constrain the normalization of the W + jets process. In the eμ and τ h τ h decay channels, the W + jets background is small compared to other backgrounds, and its contribution is estimated from simulations.
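As a simplified illustration of how a single-bin high-m T control region constrains the W + jets normalization, the Python sketch below computes a closed-form scale factor. In the analysis the constraint is obtained by including the control regions in the fit together with all uncertainties; the ratio below is only a stand-in for that procedure, and the yields are hypothetical.

```python
def wjets_scale_factor(n_data, n_wjets_sim, n_other_bkg):
    """Scale factor that makes the W + jets prediction match data in a control region.

    n_data:      observed yield in the high-m_T control region
    n_wjets_sim: simulated W + jets yield in the same region
    n_other_bkg: summed prediction of all other backgrounds in the region
    """
    if n_wjets_sim <= 0:
        raise ValueError("simulated W + jets yield must be positive")
    return (n_data - n_other_bkg) / n_wjets_sim


# Hypothetical yields: 1000 observed, 800 simulated W + jets, 150 other backgrounds.
print(wjets_scale_factor(1000.0, 800.0, 150.0))
```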
The QCD multijet events constitute another important source of reducible background in the ℓτ h channels, and it is entirely estimated from data. Various control samples are constituted to estimate the shape and the yield of the QCD multijet background in these channels, as explained below:
1. The raw yield is extracted using a sample where the ℓ and the τ h candidates have the same sign. Using this sample, the QCD multijet process is estimated from data by subtracting the contribution of the Drell-Yan, tt, diboson, and W + jets processes.
2. The yield obtained above is corrected to account for differences between the background composition in the same-sign and opposite-sign regions. The extrapolation factor between the same-sign and opposite-sign regions is determined by comparing the yield of the QCD multijet background for events with ℓ candidates passing inverted isolation criteria, in the same-sign and opposite-sign regions. It is constrained and measured by adding to the global fit the opposite-sign region where the ℓ candidates pass inverted isolation criteria, using the QCD multijet background estimate from the same-sign region with ℓ candidates passing inverted isolation criteria.
For the same reasons as in the case of the W + jets background, the constraints are also extrapolated to the VBF signal region. The same technique is used in the eμ decay channel, but no control region is included in the fit because QCD multijet events contribute little to the total background in this decay channel.
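The two steps of the QCD multijet estimate can be summarized in a short Python sketch: the non-QCD predictions are subtracted from the same-sign data, and the result is scaled by the same-sign to opposite-sign extrapolation factor. All yields and the value of the extrapolation factor below are hypothetical, and the clamping at zero is an illustrative choice.

```python
def qcd_opposite_sign_yield(n_data_ss, non_qcd_ss, os_ss_factor):
    """Data-driven QCD multijet yield in the opposite-sign region.

    n_data_ss:    observed yield in the same-sign region
    non_qcd_ss:   dict of predicted same-sign yields for the non-QCD processes
    os_ss_factor: opposite-sign to same-sign extrapolation factor
    """
    # Step 1: raw same-sign QCD yield = data minus all other processes.
    qcd_ss = max(n_data_ss - sum(non_qcd_ss.values()), 0.0)
    # Step 2: extrapolate from the same-sign to the opposite-sign region.
    return os_ss_factor * qcd_ss


non_qcd = {"Drell-Yan": 100.0, "tt": 50.0, "diboson": 10.0, "W+jets": 140.0}
print(qcd_opposite_sign_yield(500.0, non_qcd, 1.1))
```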
In the τ h τ h channel, the large QCD multijet background is estimated with a slightly different method, from a sample composed of events with opposite-sign τ h satisfying a relaxed isolation requirement, disjoint from the signal region. In this region, the QCD multijet background shape and yield are obtained by subtracting the contribution of the Drell-Yan, tt, and W + jets processes, estimated as explained above, from the data. The QCD multijet background yield in the signal region is obtained by multiplying the yield previously obtained in the control region by an extrapolation factor. The extrapolation factor is measured in events passing identical selection criteria as those in the signal region, and in the relaxed isolation region, except that the τ h candidates are required to have the same sign. The events selected with opposite-sign τ h candidates passing relaxed isolation requirements form control regions, shown in Fig. 4, and are used in the fit to extract the results.
The tt production process is one of the main backgrounds in the eμ channel. The 2D distributions in all decay channels are predicted by simulation. The normalization is adjusted to the one observed in a tt-enriched sample orthogonal to the signal region. This control region, shown in Fig. 5, is added to the global fit to extract the results, and is defined similarly as the eμ signal region, except that the p ζ requirement is inverted and the events should contain at least one jet.
The contributions from diboson and single top quark production are estimated from simulation, as is the H → WW background.

Uncertainties related to object reconstruction and identification
The overall uncertainty in the τ h identification efficiency for genuine τ h leptons is 5%, which has been measured with a tag-and-probe method in Z → ττ events. This number is not fully correlated among the di-τ channels because the τ h candidates are required to pass different working points of the discriminators that reduce the misidentification rate of electrons and muons as τ h candidates. The trigger efficiency uncertainty per τ h candidate amounts to an additional 5%, which leads to a total trigger uncertainty of 10% for processes estimated from simulation in the τ h τ h decay channel. This uncertainty has also been measured with a tag-and-probe method in Z → ττ events. In the 0-jet category of the μτ h and eτ h channels, the relative contribution of τ h in a given reconstructed decay mode is allowed to fluctuate by 3% to account for the possibility that the reconstruction and identification efficiencies are different for each decay mode. This uncertainty has been measured in a region enriched in Z → ττ events with one τ lepton decaying hadronically and the other one decaying to a muon, by comparing the level of agreement in exclusive bins of the reconstructed τ h decay mode, after adjusting the inclusive normalization of the Z → ττ simulation to its best-fit value. The effect of migration between the reconstructed τ h decay modes is negligible in other categories, where all decay modes are treated together. For events where muons or electrons are misidentified as τ h candidates, essentially Z → μμ events in the μτ h decay channel and Z → ee events in the eτ h decay channel, the τ h identification leads to rate uncertainties of 25 and 12%, respectively, per reconstructed τ h decay mode.
Using m vis and the reconstructed τ h decay mode as the observables in the 0-jet category of the μτ h and eτ h channels helps reduce the uncertainty after the signal extraction fit: the uncertainty in the rate of muons or electrons misidentified as τ h becomes of the order of 5%. The energy scale uncertainty for muons or electrons misidentified as τ h candidates is 1.5 or 3%, respectively, and is uncorrelated between reconstructed τ h decay modes. The fit constrains these uncertainties to about one third of their initial values. For events where quark- or gluon-initiated jets are misidentified as τ h candidates, a linear uncertainty that increases by 20% per 100 GeV in τ h p T accounts for a potential mismodeling of the jet → τ h misidentification rate as a function of the τ h p T in simulations. The uncertainty has been determined from a region enriched in W + jets events, using events with a muon and a τ h candidate in the final state, characterized by a large transverse mass between the p miss T and the muon [54,55]. In the decay channels with muons or electrons, the uncertainties in the muon and electron identification, isolation, and trigger efficiencies lead to a rate uncertainty of 2% for both muons and electrons. The uncertainty in the electron energy scale, which amounts to 2.5% in the endcaps and 1% in the barrel of the detector, is relevant only in the eμ decay channel, where it affects the final distributions. In all channels, the effect of the uncertainty in the muon energy scale is negligible.
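The linear uncertainty in the jet → τ h misidentification rate can be illustrated by the event weights it would imply. The Python sketch below assumes that the variation vanishes at zero τ h p T , grows by 20% per 100 GeV, and is symmetric around the nominal weight of 1; these details, and the clamping of the downward variation at zero, are assumptions made for illustration.

```python
def jet_to_tau_fake_variations(tau_pt, slope_per_100_gev=0.20):
    """Up and down weights for a jet misidentified as a tau_h candidate.

    Assumption: the relative shift is slope * p_T / (100 GeV), applied
    symmetrically around the nominal weight of 1.
    """
    shift = slope_per_100_gev * tau_pt / 100.0
    # Assumption: the downward variation is not allowed to become negative.
    return 1.0 + shift, max(1.0 - shift, 0.0)


# A tau_h candidate with p_T = 50 GeV receives a +-10% variation.
print(jet_to_tau_fake_variations(50.0))
```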
The uncertainties in the jet energy scale depend on the p T and η of the jet [53]. They are propagated to the computation of the number of jets, which affects the distribution of events among the 0-jet, VBF, and boosted categories, and to the computation of m jj , which is one of the observables in the VBF category. The rate uncertainty related to discarding events with a b-tagged jet in the eμ decay channel is up to 5% for the tt background. The uncertainty in the mistagging rate of gluon and light-flavor jets is negligible.
The p miss T scale uncertainties [61], which are computed event-by-event, affect the normalization of various processes through the event selection, as well as their distributions through the propagation of these uncertainties to the di-τ mass m ττ . The p miss T scale uncertainties arising from unclustered energy deposits in the detector come from four independent sources related to the tracker, ECAL, HCAL, and forward calorimeter subdetectors. Additionally, p miss T scale uncertainties related to the uncertainties in the jet energy scale measurement, which lead to uncertainties in the p miss T calculation, are taken into account. The combination of both sources of uncertainties in the p miss T scale leads to an uncertainty of about 10% in the measured signal strength.

Background estimation uncertainties
The Z → τ τ background yield and distribution are corrected based on the agreement between data and the background prediction in a control region enriched in the Z → μμ events, as explained in Section 7. The extrapolation uncertainty related to kinematic differences in the selections in the signal and control regions ranges between 3 and 10%, depending on the category. In addition, shape uncertainties related to the uncertainties in the applied corrections are considered; they reach 20% for some ranges of m jj in the VBF category. These uncertainties arise from the different level of agreement between data and simulation in the Z → μμ control region obtained when varying the threshold on the muon p T .
The uncertainties in the W + jets event yield determined from the control regions in the μτ h and eτ h channels account for the statistical uncertainty of the observed data, the statistical uncertainty of the W + jets simulated sample, and the systematic uncertainties associated with background processes in these control regions. In addition, an uncertainty in the extrapolation of the constraints from the high-m T (m T > 80 GeV) control regions to the low-m T (m T < 50 GeV) signal regions is taken into account. This extrapolation uncertainty ranges from 5 to 10%, and is obtained by comparing the m T distributions of simulated and observed Z → μμ events where one of the muons is removed and the p miss T adjusted accordingly, to mimic W + jets events. The reconstructed invariant mass of the parent boson in the rest frame is multiplied by the ratio of the W and Z boson masses before removing the muon. In the τ h τ h and eμ channels, where the W + jets background is estimated from simulation, the uncertainty in the yield of this small background is equal to 4 and 20%, respectively. The larger value for the eμ channel includes uncertainties in the misidentification rates of jets as electrons and muons, whereas the uncertainty in the misidentification rate of jets as τ h candidates in the τ h τ h channel is accounted for by the linear uncertainty as a function of the τ h p T described earlier.
The uncertainty in the QCD multijet background yield in the eμ decay channel ranges from 10 to 20%, depending on the category. It corresponds to the uncertainty in the extrapolation factor from the same-sign to opposite-sign region, measured in events with anti-isolated leptons. In the μτ h and eτ h decay channels, uncertainties from the fit of the control regions with leptons passing relaxed isolation conditions are considered, together with an additional 20% uncertainty that accounts for the extrapolation from the relaxed-isolation control region to the isolated signal region. In the τ h τ h decay channel, the uncertainty in the QCD multijet background yield is a combination of the uncertainties obtained from fitting the dedicated control regions with τ h candidates passing relaxed isolation criteria, and of extrapolation uncertainties to the signal region ranging from 3 to 15% and accounting for limited disagreement between prediction and data in signal-free regions with various loose isolation criteria.
The yield of events in a tt-enriched region is added to the maximum likelihood fit to control the normalization of this process in the signal region, as explained in Section 7. The uncertainty from the fit in the control region is automatically propagated to the signal regions, resulting in an uncertainty of about 5% on the tt cross section. Per-channel uncertainties related to the object reconstruction and identification are considered when extrapolating from the eμ final state to the others. The tt simulation is corrected for differences in the top quark p T distributions observed between data and simulation, and an uncertainty in the correction is taken into account.
The combined systematic uncertainty in the background yield arising from diboson and single top quark production processes is estimated to be 5% on the basis of recent CMS measurements [62,63].

Signal prediction uncertainties
The rate and acceptance uncertainties for the signal processes related to the theoretical calculations are due to uncertainties in the PDFs, variations of the QCD renormalization and factorization scales, and uncertainties in the modeling of parton showers. The magnitude of the rate uncertainty depends on the production process and on the event category.
The inclusive uncertainty related to the PDFs amounts to 3.2, 2.1, 1.9, and 1.6%, respectively, for the ggH, VBF, WH, and ZH production modes [38]. The corresponding uncertainty for the variation of the renormalization and factorization scales is 3.9, 0.4, 0.7, and 3.8%, respectively [38]. The PDF-related acceptance uncertainties arising from the particular selection criteria used in this analysis are less than 1% for ggH and VBF production. The acceptance uncertainties for VBF production related to the renormalization and factorization scales are also less than 1%, while the corresponding uncertainties for the ggH process are treated as shape uncertainties because they increase linearly with p τ τ T and m jj . The p T distribution of the Higgs boson in the powheg 2.0 simulations is tuned to match more closely the next-to-NLO (NNLO) plus next-to-next-to-leading-logarithmic (NNLL) prediction in the HRes2.1 generator [64,65]. The acceptance changes with the variation of the parton shower tune in herwig++ 2.6 samples [66] are considered as additional uncertainties, and amount to up to 7% in the boosted category. The theoretical uncertainty in the branching fraction of the Higgs boson to τ leptons is equal to 2.1% [38].
The theoretical uncertainties in the signal production depend on the jet multiplicity; this effect is included by following the prescriptions in Ref. [67]. This effect needs to be taken into account because the definitions of the three categories used in the analysis are based partially on the number of reconstructed jets. Additional uncertainties for boosted Higgs bosons, related to the treatment of the top quark mass in the calculations, are considered for signal events with p τ τ T > 150 GeV.
Theory uncertainties in the signal prediction contribute an uncertainty of 10% to the measurement of the signal strength.

Other uncertainties
The uncertainty in the integrated luminosity amounts to 2.5% [68].
Uncertainties related to the finite number of simulated events, or to the limited number of events in data control regions, are taken into account. They are considered for all bins of the distributions used to extract the results if the uncertainty is larger than 5%. They are uncorrelated across different samples, and across bins of a single distribution. Taken together, they contribute an uncertainty of about 12% to the signal strength measurement, coming essentially from the VBF category, where the background templates are less populated than in the other categories.
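The 5% threshold described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the analysis code: the function name `bins_needing_stat_nuisance`, the example yields, and the assumption of unweighted events (statistical error equal to the square root of the bin content) are all hypothetical.

```python
import math


def bins_needing_stat_nuisance(yields, errors, threshold=0.05):
    """Return indices of bins whose relative statistical uncertainty
    exceeds the threshold (5% by default)."""
    selected = []
    for i, (y, e) in enumerate(zip(yields, errors)):
        # Empty bins have no defined relative uncertainty; skip them here.
        if y > 0 and e / y > threshold:
            selected.append(i)
    return selected


# Hypothetical template: bin contents, with errors assumed to be sqrt(N).
example_yields = [900.0, 100.0, 25.0, 4.0]
example_errors = [math.sqrt(y) for y in example_yields]

selected_bins = bins_needing_stat_nuisance(example_yields, example_errors)
print(selected_bins)
```

In this toy template, sparsely populated bins are the ones selected, which mirrors the statement that the VBF category, with its less populated templates, dominates this source of uncertainty.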
The systematic uncertainties considered in the analysis are summarized in Table 3.

Results
The extraction of the results involves a global maximum likelihood fit based on 2D distributions in all channels, shown in Figs. 6-17, together with the control regions for the tt, QCD multijet, and W + jets backgrounds. The choice of the binning is driven by the statistical precision of the background and data templates,

Table 3
Sources of systematic uncertainty. If the global fit to the signal and control regions, described in the next section, significantly constrains these uncertainties, the values of the uncertainties after the global fit are indicated in the third column. The acronyms CR and ID stand for control region and identification, respectively.

Observed and predicted 2D distributions in the VBF category of the eτ h decay channel. The description of the histograms is the same as in Fig. 6.

Fig. 9. Observed and predicted 2D distributions in the VBF category of the eμ decay channel. The description of the histograms is the same as in Fig. 6.

Fig. 11. Observed and predicted 2D distributions in the boosted category of the μτ h decay channel. The description of the histograms is the same as in Fig. 6.

Fig. 12. Observed and predicted 2D distributions in the boosted category of the eτ h decay channel. The description of the histograms is the same as in Fig. 6.

Observed and predicted distributions in the 0-jet category of the τ h τ h decay channel. The description of the histograms is the same as in Fig. 6.

Fig. 15. Observed and predicted 2D distributions in the 0-jet category of the μτ h decay channel. The description of the histograms is the same as in Fig. 6.

Table 4
Background and signal expectations, together with the number of observed events, for bins in the signal region for which log 10 (S/(S + B)) > −0.9, where S and B are, respectively, the number of expected signal events for a Higgs boson with a mass m H = 125.09 GeV and of expected background events, in those bins. The background uncertainty accounts for all sources of background uncertainty, systematic as well as statistical, after the global fit. The contribution from "other backgrounds" includes events from diboson and single top quark production. The contribution from Higgs boson decays to a pair of W bosons is zero in these bins.

The systematic uncertainties are represented by nuisance parameters that are varied in the fit according to their probability density functions. A log-normal probability density function is assumed for the nuisance parameters affecting the event yields of the various background contributions, whereas systematic uncertainties that affect the shape of the distributions are represented by nuisance parameters whose variation results in a continuous perturbation of the spectrum [69] and which are assumed to have a Gaussian probability density function. Overall, the statistical uncertainty in the observed event yields is the dominant source of uncertainty for all combined results.
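A log-normal constraint on a yield can be sketched as follows. The parametrization used here is an assumption, since the text does not spell it out: a yield with relative uncertainty δ is scaled by κ^θ, with κ = 1 + δ and θ a nuisance parameter constrained by a unit Gaussian. The function names and the example numbers are hypothetical.

```python
def lognormal_yield(nominal, rel_unc, theta):
    """Scale a nominal yield by kappa**theta, where kappa = 1 + rel_unc.

    theta = 0 returns the nominal yield; theta = +1 shifts it up by
    one standard deviation. The result stays positive for any theta.
    """
    kappa = 1.0 + rel_unc
    return nominal * kappa ** theta


def gaussian_penalty(theta):
    """Contribution of a unit-Gaussian constraint on theta to the
    negative log-likelihood, up to an additive constant."""
    return 0.5 * theta ** 2


# Hypothetical background with 100 expected events and a 5% yield uncertainty.
for theta in (-1.0, 0.0, 1.0):
    print(theta, lognormal_yield(100.0, 0.05, theta), gaussian_penalty(theta))
```

One reason such a form is often preferred over a plain Gaussian on the yield is that the scaled yield can never become negative, however far the fit pulls θ.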
Grouping events in the signal region by their decimal logarithm of the ratio of the signal (S) to signal-plus-background (S + B) in each bin (Fig. 18), an excess of observed events with respect to the SM background expectation is clearly visible in the most sensitive bins of the analysis. The expected background and signal contributions, as well as the observed number of events, are indicated per process and category in Table 4 for the bins with log 10 (S/(S + B)) > −0.9. The channel that contributes the most to these bins is τ h τ h .
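The grouping by log 10 (S/(S + B)) can be sketched as below. This is an illustrative sketch only: the function name `group_by_log_purity`, the bin edges, and the per-bin numbers are hypothetical and are not taken from the analysis.

```python
import math
from bisect import bisect_right


def group_by_log_purity(bins, edges):
    """Sum expected signal, expected background, and observed counts of
    analysis bins into intervals of log10(S / (S + B)).

    bins  : iterable of (S, B, observed) tuples, one per analysis bin
    edges : increasing interval edges in log10(S / (S + B))
    """
    n = len(edges) - 1
    totals = [{"S": 0.0, "B": 0.0, "obs": 0} for _ in range(n)]
    for s, b, obs in bins:
        if s <= 0.0:
            continue  # log10 is undefined without expected signal
        x = math.log10(s / (s + b))
        i = bisect_right(edges, x) - 1
        if 0 <= i < n:  # values outside the edge range are dropped
            totals[i]["S"] += s
            totals[i]["B"] += b
            totals[i]["obs"] += obs
    return totals


# Hypothetical analysis bins: (expected S, expected B, observed events).
example_bins = [(1.0, 199.0, 210), (2.0, 48.0, 55), (3.0, 7.0, 12),
                (1.0, 49.0, 47), (4.0, 6.0, 11)]
edges = [-3.0, -2.0, -1.5, -0.9, 0.0]

totals = group_by_log_purity(example_bins, edges)
for lo, hi, t in zip(edges[:-1], edges[1:], totals):
    print(f"[{lo}, {hi}): S={t['S']}, B={t['B']}, obs={t['obs']}")
```

The last interval in this toy example corresponds to the log 10 (S/(S + B)) > −0.9 selection quoted in the text.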
An excess of observed events relative to the background expectation is also visible in Fig. 19, where every mass distribution for a constant range of the second dimension of the signal distributions

The excess in data is quantified by calculating the corresponding local p-value using a profile likelihood ratio test statistic [70][71][72][73]. As shown in Fig. 20

A likelihood scan is performed for m H = 125.09 GeV in the (κ V , κ f ) parameter space, where κ V and κ f quantify, respectively, the ratio between the measured and the SM value for the couplings of the Higgs boson to vector bosons and fermions, with the methods described in Ref. [26]. For this scan only, Higgs boson decays to pairs of W bosons are considered as part of the signal. All nuisance parameters are profiled for each point of the scan. As shown in Fig. 22, the observed likelihood contour is consistent with the SM expectation of κ V and κ f equal to unity.
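The relation between a local p-value and a significance in standard deviations can be sketched with the usual one-sided Gaussian convention. The sketch assumes the asymptotic approximation, in which the significance Z equals the square root of the profile likelihood ratio test statistic q0; whether the analysis relies on this approximation is not stated here, and the function names are hypothetical.

```python
import math


def z_from_q0(q0):
    """Asymptotic significance Z = sqrt(q0) for a discovery test statistic."""
    return math.sqrt(max(q0, 0.0))


def p_value_from_z(z):
    """One-sided Gaussian tail probability for a significance of z
    standard deviations: p = 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Significances quoted in the abstract: 4.9 (13 TeV) and 5.9 (combination).
for z in (4.9, 5.9):
    print(z, p_value_from_z(z))
```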
The results are combined with the results of the search for H → τ τ performed with the data collected with the CMS detector at center-of-mass energies of 7 and 8 TeV [14], using a common signal strength for all data-taking periods. All uncertainties are considered as fully uncorrelated between the different center-of-mass energies. The combination leads to an observed and an expected significance of 5.9 standard deviations. The corresponding best fit value for the signal strength μ is 0.98 ± 0.18 at m H = 125.09 GeV. This constitutes the most significant direct measurement of the coupling of the Higgs boson to fermions by a single experiment.

Acknowledgements
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses.