Measurement of the $VH, H\rightarrow \tau\tau$ process with the ATLAS detector at 13 TeV

A measurement of the Standard Model Higgs boson produced in association with a $W$ or $Z$ boson and decaying into a pair of $\tau$-leptons is presented. This search is based on proton-proton collision data collected at $\sqrt{s}=13$ TeV by the ATLAS experiment at the LHC corresponding to an integrated luminosity of 140 fb$^{-1}$. For the Higgs boson candidate, only final states with at least one $\tau$-lepton decaying hadronically ($\tau\rightarrow \mathrm{hadrons} + \nu_\tau$) are considered. For the vector bosons, only leptonic decay channels are considered: $Z \rightarrow \ell\ell$ and $W\rightarrow \ell\nu_\ell$, with $\ell=e,\mu$. An excess of events over the expected background is found with an observed (expected) significance of 4.2 (3.6) standard deviations, providing evidence of the Higgs boson produced in association with a vector boson and decaying into a pair of $\tau$-leptons. The ratio of the measured cross-section to the Standard Model prediction is $\mu_{\text{VH}}^{\tau\tau} = 1.28\ ^{+0.30}_{-0.29}\ (\mathrm{stat.})\ ^{+0.25}_{-0.21}\ (\mathrm{syst.})$. This result represents the most precise measurement of the $VH$, $H\rightarrow\tau\tau$ process achieved to date.


Introduction
This paper presents a search for the associated production of the Higgs boson with a vector boson in which the Higgs boson decays to a pair of $\tau$-leptons. This process is referred to as $VH(\tau\tau)$, where $V$ represents either a $W$ or $Z$ boson. Two possible final states are considered for the $H \rightarrow \tau^{+}\tau^{-}$ decay: either both $\tau$-leptons decay hadronically to one or more hadrons ($\tau_{\mathrm{had}}\tau_{\mathrm{had}}$), or one $\tau$-lepton decays leptonically ($\tau \rightarrow \ell\nu_{\ell}\nu_{\tau}$, $\ell = e, \mu$) and one hadronically ($\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$). The combination in which both $\tau$-leptons from the Higgs boson decay leptonically ($\tau_{\mathrm{lep}}\tau_{\mathrm{lep}}$) is not included in order to ensure an event selection that is independent of analyses such as the one presented in Ref. [1].
The events associated with the $VH(\tau\tau)$ process are classified by the leptonically decaying vector boson candidate (either a $W$ or $Z$ boson) and by the Higgs boson decay channel ($\tau_{\mathrm{had}}\tau_{\mathrm{had}}$ or $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$) into four channels. Each channel is independently optimised to separate these rare events from their background processes and to maximise the sensitivity of the analysis. The vector boson decaying to light leptons provides an efficient trigger for these events without relying on the Higgs boson decay products. Both the $VH$ production mode [2,3] and the $H \rightarrow \tau\tau$ decay channel [4-6] have separately been observed by the ATLAS and CMS experiments in recent years. Recently, the CMS Collaboration measured the signal strength of the inclusive $VH(\tau\tau)$ process relative to the SM prediction to be $1.79 \pm 0.45$ [7].
A similar search was performed by the ATLAS Collaboration using 20.3 fb$^{-1}$ of LHC Run 1 data at $\sqrt{s} = 8$ TeV [8]. Because of the smaller size of the dataset used in that search, it was only able to set a 95% confidence level (C.L.) upper limit on the overall $VH(\tau\tau)$ cross-section of 5.6 times the SM prediction. Compared with the Run 1 result, the analysis presented in this paper uses nearly seven times the total integrated luminosity. This analysis also benefits from the higher centre-of-mass energy of Run 2 ($\sqrt{s} = 13$ TeV), which increases the Higgs boson production cross-section [9] by a factor of about two, and from improved physics-object reconstruction and calibration. The event selection is expanded through the addition of several channels, and the overall analysis strategy is improved via a neural-network (NN) discriminant that enhances the separation of signal from background.

The ATLAS detector
The ATLAS detector [10] at the LHC covers nearly the entire solid angle around the collision point.$^{1}$ It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range $|\eta| < 2.5$. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [11,12]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to $|\eta| = 2.0$. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range $|\eta| < 4.9$. Within the region $|\eta| < 3.2$, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering $|\eta| < 1.8$ to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within $|\eta| < 1.7$, and two copper/LAr hadron endcap calorimeters. The solid angle coverage between $|\eta| = 3.2$ and $|\eta| = 4.9$ is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in the magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region $|\eta| < 2.7$, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range $|\eta| < 2.4$ with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [13]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
$^{1}$ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the centre of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis. The pseudorapidity is defined in terms of the polar angle $\theta$ as $\eta = -\ln \tan(\theta/2)$. Angular distance is measured in units of $\Delta R \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$.
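The coordinate conventions in the footnote above can be made concrete with a minimal sketch (function names are ours, not from ATLAS software):

```python
import math

def pseudorapidity(theta: float) -> float:
    """eta = -ln tan(theta/2), with theta the polar angle in radians."""
    return -math.log(math.tan(theta / 2.0))

def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance dR = sqrt((d_eta)^2 + (d_phi)^2),
    wrapping d_phi into [-pi, pi)."""
    deta = eta1 - eta2
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

# A track perpendicular to the beam (theta = pi/2) has eta = 0.
print(pseudorapidity(math.pi / 2))
```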
An extensive software suite [14] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and simulation samples
The dataset used for this measurement consists of the LHC proton-proton collision data recorded by the ATLAS experiment at $\sqrt{s} = 13$ TeV during the Run 2 period from 2015 to 2018. Events are selected for analysis only if they are of good quality and if all the relevant detector components are known to have been in good operating condition [15]. The total integrated luminosity of the analysed data is 140 fb$^{-1}$. Only events that pass the relevant trigger requirements are considered in the analysis. These triggers are designed to select single electrons, single muons, or combinations of these two light leptons [16-19]. The thresholds applied to the reconstructed transverse momentum ($p_{\mathrm{T}}$) for the single-lepton triggers were $p_{\mathrm{T}}^{e} > 27\,(25)$ GeV and $p_{\mathrm{T}}^{\mu} > 27.3\,(21)$ GeV for the 2016-2018 (2015) data-taking periods. The $p_{\mathrm{T}}$ thresholds for the dilepton triggers were $p_{\mathrm{T}}^{e} > 18$ GeV and $p_{\mathrm{T}}^{\mu} > 14.7$ GeV for the entire data-taking period. For the $VH(\tau\tau)$ events, the trigger acceptance was maximised by using the logical OR of triggers requiring one or two light leptons.
Samples of Monte Carlo (MC) simulated events are used to optimise the event selection and to model the signal and several background processes. Simulated event samples for the $VH(\tau\tau)$ signal, as well as all background samples, were produced using various MC generators, as described in Table 1. The samples were produced with the ATLAS simulation infrastructure [20] using the full detector simulation performed by the Geant4 [21] toolkit. The Powheg NNLOPS program [22-26] was used to model gluon-gluon fusion (ggF) Higgs boson production with next-to-next-to-leading-order (NNLO) accuracy with the PDF4LHC15NLO [27] parton distribution function (PDF) set. The vector-boson fusion (VBF) and $VH$ production processes were simulated with Powheg at next-to-leading-order (NLO) accuracy in QCD using the PDF4LHC15NLO [27] PDF set. The MC predictions of the Higgs production modes mentioned above were normalised to cross-sections calculated at NNLO in the strong coupling with NLO electroweak corrections [28-32]. The production of $t\bar{t}H$ events was simulated using Powheg Box [22-26] at NLO using the NNPDF30NNLO [33] PDF set. In all signal events, the decays of the $\tau$-leptons were modelled by Pythia 8.235 [34]. Background samples of $V$+jets events use Sherpa 2.2.1 [35] with NNLO accuracy and the NNPDF30NNLO [33] PDF set, while the diboson and triboson events were generated by Sherpa 2.2.2 [35] (including $\tau$-lepton decays) at NNLO with the NNPDF30NNLO [33] PDF set, and the $t\bar{t}$ and single-top samples were generated by Powheg+Pythia 8.230 at NLO accuracy using the NNPDF30NLO [33] PDF set, with Pythia also performing the $\tau$-lepton decays. The Powheg+Pythia8 samples use EvtGen (v1.6.0) [36] for the simulation of $b$-hadron decays.
The effects of multiple interactions in the same and neighbouring bunch crossings (pile-up) were modelled by overlaying minimum-bias events to reproduce the pile-up distributions seen in the data. These minimum-bias events were simulated using the soft QCD processes of Pythia 8.186 [37] with the A3 [38] set of tuned parameters and the NNPDF2.3LO [33] PDF set.
Table 1: Details of the Monte Carlo event samples used to model the most relevant processes in this analysis: the process name, the names of the MC generator and of the model of the underlying event with hadronisation and parton showering (UEPS), the corresponding PDF set, and the perturbative order in QCD to which the events were generated.

Object reconstruction and event selection
The $VH(\tau\tau)$ event selection requires the reconstruction of electrons, muons, the visible products of hadronically decaying $\tau$-leptons ($\tau_{\text{had-vis}}$), jets (along with their flavour-tagging properties), and missing transverse momentum ($E_{\mathrm{T}}^{\mathrm{miss}}$). The number of reconstructed electrons, muons and $\tau_{\text{had-vis}}$ in each event is used to separate the events into analysis channels.

Object reconstruction
Electron candidates are reconstructed from tracks in the inner detector matched to calorimeter energy deposits [39]. The electron candidates must fulfil the following baseline requirements: $p_{\mathrm{T}} > 13$ GeV, a pseudorapidity $|\eta|$ below 2.5 and outside the barrel-endcap transition region ($1.37 < |\eta| < 1.52$), and passing the Loose likelihood selection requirement (93% efficient) for electron identification [39]. For an electron to qualify for one of the signal-enhanced categories, it must additionally pass the Tight identification selection requirement (80% efficiency) and the Loose isolation criterion, which is defined for both calorimeter- and track-based isolation [39].
Muon candidates are reconstructed from tracks in the muon spectrometer matched to tracks in the inner detector [40]. Baseline muon candidates included in the analysis are required to pass a minimum $p_{\mathrm{T}}$ threshold of 9 GeV, have $|\eta| < 2.5$, and pass the Loose muon identification selection requirement (corresponding to over 98% efficiency). For a muon to qualify for one of the signal-enhanced categories, it must pass the Tight selection requirement (with an efficiency between 90% and 93%, depending on the $p_{\mathrm{T}}$ of the muon). Selected muon candidates must also pass a Tight isolation criterion that is based exclusively on tracking information [40].
Jets are reconstructed from particle-flow objects using the anti-$k_t$ [41,42] algorithm with a distance parameter $R = 0.4$, and calibrated as in Ref. [43]. Additional requirements on the jet-vertex-tagger (JVT) [44] are imposed to suppress jets originating from pile-up. In order to identify jets initiated by $b$-quarks, used to suppress top-quark backgrounds in $VH(\tau\tau)$ events, the DL1r $b$-tagging algorithm [45-48] is applied to jets with $p_{\mathrm{T}} > 20$ GeV and $|\eta| < 2.5$. The fixed 85% efficiency selection requirement is inverted to reject $b$-jets. At the 85% efficiency working point, the rejection factors for jets initiated by $c$-quarks and light partons are 2.6 and 29, respectively.
The final states of $\tau$-lepton hadronic decays include a neutrino and a set of visible decay products, most frequently one or three charged pions and up to two neutral pions. The reconstruction of the $\tau_{\text{had-vis}}$ is seeded by jets reconstructed via the anti-$k_t$ algorithm, using calibrated energy clusters as inputs, with a distance parameter of $R = 0.4$ [49]. Jets seeding $\tau_{\text{had-vis}}$ candidates are additionally required to have $p_{\mathrm{T}} > 10$ GeV and $|\eta| < 2.5$. To separate $\tau_{\text{had-vis}}$ candidates initiated by hadronic $\tau$-lepton decays from jets initiated by quarks or gluons, a recurrent neural network (RNN) [50] identification method was trained on information from reconstructed charged-particle tracks and clusters of energy in the calorimeter associated with $\tau_{\text{had-vis}}$ candidates, as well as on high-level discriminating variables. A separate multivariate discriminant based on a boosted decision tree [51] is also used to reject backgrounds arising from electrons mimicking a $\tau_{\text{had-vis}}$ (mainly from $Z \rightarrow ee$+jets in this analysis). This discriminant (eBDT) is built using information from the calorimeter and the tracking detector; transition radiation information from the TRT system plays a key role in its performance. Baseline $\tau_{\text{had-vis}}$ are required to have one or three associated tracks, an electric charge of $\pm 1$, $p_{\mathrm{T}} > 20$ GeV and $|\eta| < 2.5$, excluding the barrel-endcap transition region. In addition, a dedicated muon-veto criterion, designed to reject muons misreconstructed as $\tau_{\text{had-vis}}$ (typically due to large calorimeter energy deposits), is applied. To qualify as objects at the earliest stage of the analysis event categorisation, each $\tau_{\text{had-vis}}$ must pass the Medium RNN identification selection requirement, with efficiencies of 75% for 1-prong and 60% for 3-prong candidates, and the Loose eBDT requirement, with an efficiency of 95% [51].
An overlap-removal procedure is applied to ensure that the electrons, muons, $\tau_{\text{had-vis}}$ and jets used in this analysis are built from mutually exclusive sets of tracks and calorimeter energy deposits. More details can be found in Ref. [52].
The missing transverse momentum vector is an estimate of the momentum imbalance in the transverse plane of the detector. It is calculated as the negative vector sum of the transverse momenta of all reconstructed final-state objects (electrons, muons, $\tau_{\text{had-vis}}$, and jets). The magnitude of this vector is referred to as the missing transverse momentum ($E_{\mathrm{T}}^{\mathrm{miss}}$). Tracks not associated with any reconstructed object are also included in the calculation, and serve to estimate the contribution from low-$p_{\mathrm{T}}$ collision remnants (referred to as the soft-term contribution). The default Tight working point of the ATLAS missing-transverse-momentum reconstruction is used [53]: the Tight $E_{\mathrm{T}}^{\mathrm{miss}}$ is calculated without forward jets with $|\eta| > 2.4$ and $20 < p_{\mathrm{T}} < 30$ GeV. This tighter requirement removes regions of phase space that contain more pile-up jets than hard-scatter jets.
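The negative-vector-sum construction described above can be sketched in a few lines (an illustrative toy, not the ATLAS reconstruction; objects are reduced to $(p_{\mathrm{T}}, \phi)$ pairs and the soft term is omitted):

```python
import math

def missing_et(objects):
    """E_T^miss as the negative vector sum of the transverse momenta of all
    reconstructed objects, each given as a (pt, phi) pair.
    Returns (magnitude, azimuthal angle)."""
    px = -sum(pt * math.cos(phi) for pt, phi in objects)
    py = -sum(pt * math.sin(phi) for pt, phi in objects)
    return math.hypot(px, py), math.atan2(py, px)

# Two back-to-back objects of equal pT balance exactly, so E_T^miss ~ 0.
met, _ = missing_et([(50.0, 0.0), (50.0, math.pi)])
print(met)
```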

Event categorization and selection
The analysis channels are defined by the vector boson associated with the Higgs boson production and by the decay modes (leptonic or hadronic) of the $\tau$-leptons associated with the Higgs boson decay. This results in four different channels: $WH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$, $WH(\tau_{\mathrm{had}}\tau_{\mathrm{had}})$, $ZH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$ and $ZH(\tau_{\mathrm{had}}\tau_{\mathrm{had}})$. In all channels, only the final states in which the vector boson decays to light leptons are considered. In the $\tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ channels, the leading-$p_{\mathrm{T}}$ light lepton is assigned to the vector boson ($V$). If $V$ is a $Z$ boson, the other light lepton assigned to the $Z$ boson leptonic decay is required to have the same flavour and opposite electric charge (referred to as "sign" throughout the paper) to those of the leading-$p_{\mathrm{T}}$ light lepton. If multiple opposite-sign light-lepton pairs can be formed, the pair with invariant mass closest to the $Z$ boson mass is chosen. This invariant mass is also required to be within a given mass range, optimised separately for each category, to suppress events with a pair of light leptons not originating from a $Z \rightarrow \ell\ell$ process. The light lepton not assigned to the $Z$ boson leptonic decay is then associated with the leptonic decay of the $\tau$-lepton from the Higgs boson. When $V$ is a $W$ boson, the sub-leading-$p_{\mathrm{T}}$ light lepton is assigned to the Higgs boson with no flavour or charge selection applied. For each category, the selection is organised into a Preselection and a Signal Region selection. The Preselection is used as a starting point of shared selections from which a series of validation regions is further defined to check the modelling of specific background sources. Because of the kinematic selections applied, the light lepton from the $\tau_{\mathrm{lep}}$ decay in the $WH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$ and $ZH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$ Signal Region events has a small probability, of about 3%, to be misassigned to the vector boson instead of the Higgs boson.
The main analysis strategy uses an NN classifier as the final discriminant. As a cross-check, another version of the analysis is performed using Higgs boson mass estimators as the final discriminants, as was done in the Run 1 analysis [8]. In the $ZH$ channels, the Missing Mass Calculator ($m_{\mathrm{MMC}}$) [54] is used to estimate the Higgs boson mass. The $m_{\mathrm{MMC}}$ method provides the most likely parent-particle mass when that parent decays into multiple sources of $E_{\mathrm{T}}^{\mathrm{miss}}$, as in most $H \rightarrow \tau\tau$ event topologies. For the $WH$ channels, in which both the $W$ boson and the Higgs boson decays act as sources of $E_{\mathrm{T}}^{\mathrm{miss}}$, the $m_{\mathrm{MMC}}$ assumption that the $E_{\mathrm{T}}^{\mathrm{miss}}$ (i.e. the neutrinos) originates only from the Higgs boson decay is no longer valid, so the Late-Projected transverse mass $m_{\mathrm{T2}}$ [55] is instead used as a discriminant.$^{3}$
Table 2 summarises the selection criteria used for each category. The event selection shown in Table 2 is identical for the cross-check analysis using the Higgs boson mass discriminants, with the exception of the Higgs boson mass-window cuts. The combined signal efficiency is about 6% and 8% for the $WH$ and $ZH$ channels, respectively.$^{4}$ The $WH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$ selection has a signal efficiency of about 7% and accounts for about 44% of all the $VH(\tau\tau)$ signal events across the four categories. The $WH(\tau_{\mathrm{had}}\tau_{\mathrm{had}})$ selection has a signal efficiency of about 5% and accounts for about 32%. The $ZH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$ selection has a signal efficiency of about 7% and accounts for about 10%. The $ZH(\tau_{\mathrm{had}}\tau_{\mathrm{had}})$ selection has a signal efficiency of about 9% and accounts for about 14%.

Selection of the $WH$, $H \rightarrow \tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ events
The Preselection begins by requiring exactly two light leptons and one $\tau_{\text{had-vis}}$ that pass their baseline requirements as well as the additional isolation and identification criteria described in Section 4.1. A $b$-jet veto is applied to suppress backgrounds from top-quark production.
For the Signal Region selection, a same-sign light-lepton requirement is applied to suppress $Z$+jets backgrounds, and the $\tau_{\text{had-vis}}$ is required to have the opposite sign to the light leptons. A differentiation based on the final-state light-lepton flavours is also used to further optimise the selection criteria. For the dielectron final state, the invariant mass of the two electrons must satisfy $m_{ee} \notin [80, 100]$ GeV to reduce the contamination from $Z$+jets events with $Z \rightarrow ee$ in which one of the electrons was reconstructed with the wrong charge, thereby passing the same-sign light-lepton requirement. The scalar sum of the $p_{\mathrm{T}}$ of all three final-state particles is required to be greater than 90 GeV to suppress events that contain one or two misidentified objects. Finally, for the main analysis strategy using the NN discriminant, the $m_{\mathrm{T2}}$ is required to fall within $60 < m_{\mathrm{T2}} < 130$ GeV to improve the signal-to-background ratio.
$^{3}$ The $m_{\mathrm{T2}}$ variable is constructed to provide an event-by-event lower bound on the transverse mass of the heaviest parent particle, in this topology the Higgs boson. It is defined as $m_{\mathrm{T2}} = \min_{\sum_i \vec{q}_{\mathrm{T}}^{\,i} = \vec{E}_{\mathrm{T}}^{\mathrm{miss}}} \big[ \max_i\, m_{\mathrm{T}}^{\mathrm{parent}_i} \big]$, where $m_{\mathrm{T}}^{\mathrm{parent}_i}$ is the late-projected transverse mass of the $i$-th parent particle and $\vec{q}_{\mathrm{T}}^{\,i}$ is the sum of the transverse momenta of the invisible particles assigned to it.
$^{4}$ Estimated using MC simulations and normalised to the total number of signal events from a specific process falling into the detector acceptance with no trigger requirements.
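The min-max structure of the stransverse-mass construction in footnote 3 can be illustrated with a brute-force sketch (a generic $m_{\mathrm{T2}}$ with massless invisible particles and a simple grid scan; the late-projected variant of Ref. [55] differs in detail, and the function names are ours):

```python
import math

def mt(m_vis, ptx, pty, qx, qy):
    """Transverse mass of a visible system (mass m_vis, transverse momentum
    (ptx, pty)) paired with a massless invisible momentum (qx, qy)."""
    et_vis = math.sqrt(m_vis**2 + ptx**2 + pty**2)
    et_inv = math.hypot(qx, qy)
    mt_sq = m_vis**2 + 2.0 * (et_vis * et_inv - (ptx * qx + pty * qy))
    return math.sqrt(max(mt_sq, 0.0))

def mt2_scan(vis1, vis2, metx, mety, n=200, qmax=200.0):
    """Brute-force mT2: minimise max(mT1, mT2) over all splittings
    q1 + q2 = E_T^miss of the invisible momentum. vis = (m, ptx, pty),
    all quantities in GeV. Grid scan, for illustration only."""
    best = float("inf")
    for i in range(n + 1):
        for j in range(n + 1):
            q1x = -qmax + 2.0 * qmax * i / n
            q1y = -qmax + 2.0 * qmax * j / n
            q2x, q2y = metx - q1x, mety - q1y
            best = min(best, max(mt(*vis1, q1x, q1y), mt(*vis2, q2x, q2y)))
    return best
```

With zero missing momentum and massless visible systems, the minimum is reached at the trivial splitting and $m_{\mathrm{T2}}$ vanishes, which makes the lower-bound character of the variable easy to check.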

Selection of the $WH$, $H \rightarrow \tau_{\mathrm{had}}\tau_{\mathrm{had}}$ events
The Preselection begins by requiring exactly one light lepton and two $\tau_{\text{had-vis}}$ that pass their baseline requirements as well as the additional isolation and identification criteria required to qualify as analysis-level objects, as described in Section 4.1. A $b$-jet veto is applied to suppress backgrounds from top-quark production.
The Signal Region selection adds the following criteria. The two $\tau_{\text{had-vis}}$ are required to have opposite signs. A requirement on the angular distance between the two $\tau_{\text{had-vis}}$ candidates ($0.8 < \Delta R(\tau_{\text{had-vis}}, \tau_{\text{had-vis}}) < 2.8$) and on the scalar sum of their $p_{\mathrm{T}}$ ($\sum p_{\mathrm{T}}^{\tau_{\text{had-vis}}} > 100$ GeV) are imposed to suppress events that contain one or two misidentified $\tau_{\text{had-vis}}$ passing the selection. An additional cut on the transverse mass$^{5}$ between the $E_{\mathrm{T}}^{\mathrm{miss}}$ and the light lepton, $m_{\mathrm{T}}(\ell, E_{\mathrm{T}}^{\mathrm{miss}}) > 20$ GeV, is applied to reduce the $Z \rightarrow \tau\tau$ background. Finally, for the main analysis strategy using the NN discriminant, the $m_{\mathrm{T2}}$ is required to fall within $80 < m_{\mathrm{T2}} < 130$ GeV to improve the signal-to-background ratio.
$^{5}$ The transverse mass is defined as $m_{\mathrm{T}} = \sqrt{2\, p_{\mathrm{T}}^{\ell}\, E_{\mathrm{T}}^{\mathrm{miss}}\, (1 - \cos\Delta\phi)}$, where $\Delta\phi$ is the azimuthal angle between the lepton and the $E_{\mathrm{T}}^{\mathrm{miss}}$ direction.
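The transverse-mass cut above follows the standard massless formula, which can be sketched directly (an illustrative helper, with our own function name):

```python
import math

def transverse_mass(pt_lep, phi_lep, met, phi_met):
    """m_T = sqrt(2 * pT_lep * MET * (1 - cos(dphi))) in the massless
    approximation; inputs in GeV and radians."""
    dphi = phi_lep - phi_met
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(dphi)))

# A lepton back-to-back with the MET (W -> l nu topology) maximises m_T:
print(transverse_mass(40.0, 0.0, 40.0, math.pi))
```

When the lepton and $E_{\mathrm{T}}^{\mathrm{miss}}$ are aligned, $m_{\mathrm{T}}$ collapses to zero, which is why the $m_{\mathrm{T}} > 20$ GeV cut rejects topologies in which the apparent missing momentum points along the lepton.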

Selection of the $ZH$, $H \rightarrow \tau_{\mathrm{lep}}\tau_{\mathrm{had}}$ events
The Preselection begins by requiring exactly three light leptons and one $\tau_{\text{had-vis}}$ that pass their baseline requirements as well as the additional isolation and identification criteria required to qualify as analysis-level objects, as described in Section 4.1. Two of the light leptons must have the same flavour and opposite signs, as they are associated with the $Z$ boson decay. The $m_{\mathrm{MMC}}$ algorithm is required to have successfully converged.
The Signal Region selection further requires that the decay products of the Higgs boson have opposite signs. The invariant mass of the light leptons must satisfy $81 < m_{\ell\ell} < 101$ GeV. The scalar sum of the $p_{\mathrm{T}}$ of the two objects associated with the Higgs boson decay is required to be greater than 60 GeV. Lastly, for the main analysis strategy using the NN discriminant, the $m_{\mathrm{MMC}}$ must satisfy $100 < m_{\mathrm{MMC}} < 170$ GeV to enhance the signal-to-background ratio.
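As a compact illustration, these Signal Region cuts amount to a single boolean selection per event (a sketch with hypothetical per-event variables, not the analysis code):

```python
def zh_lephad_sr(higgs_opposite_sign, m_ll, sum_pt_higgs, m_mmc):
    """Sketch of the ZH(tau_lep tau_had) Signal Region cuts described above.
    All masses and momenta in GeV; variable names are illustrative."""
    return (higgs_opposite_sign
            and 81.0 < m_ll < 101.0       # Z-candidate mass window
            and sum_pt_higgs > 60.0       # scalar pT sum of Higgs decay products
            and 100.0 < m_mmc < 170.0)    # MMC Higgs-mass window

# An event with m_ll = 91 GeV, sum pT = 80 GeV, m_MMC = 125 GeV passes:
print(zh_lephad_sr(True, 91.0, 80.0, 125.0))
```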

Selection of the $ZH$, $H \rightarrow \tau_{\mathrm{had}}\tau_{\mathrm{had}}$ events
The Preselection begins by requiring exactly two light leptons and two $\tau_{\text{had-vis}}$ that pass their baseline requirements as well as the additional isolation and identification criteria required to qualify as analysis-level objects, as described in Section 4.1. The two light leptons, associated with the $Z$ boson decay, must have the same flavour and opposite signs. The $m_{\mathrm{MMC}}$ algorithm is required to have successfully converged.
For the Signal Region selection, the two $\tau_{\text{had-vis}}$ from the Higgs boson decay are required to have opposite signs. The invariant mass of the light leptons must satisfy $71 < m_{\ell\ell} < 111$ GeV; in this case, the $m_{\ell\ell}$ acceptance window is slightly larger than in the $ZH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$ category because of the lower background levels. To further suppress events with misidentified $\tau_{\text{had-vis}}$, the scalar sum of the $p_{\mathrm{T}}$ of the two $\tau_{\text{had-vis}}$ is required to be greater than 75 GeV. Lastly, for the main analysis strategy using the NN discriminant, the $m_{\mathrm{MMC}}$ must satisfy $100 < m_{\mathrm{MMC}} < 180$ GeV.

Background estimation
The main backgrounds in this analysis consist of diboson events and $Z$+jets events in which a jet is misidentified as a light lepton ($\ell = e, \mu$) or a $\tau_{\text{had-vis}}$. The background contribution from events with misidentified jets is estimated using a data-driven technique. Other background components, such as top-quark decays, $t\bar{t}H$, and triboson events, are estimated using MC simulations.
In the $ZH$ categories, the $ZZ$ events are the main background source and are estimated using simulated events. In both $ZH$ categories, the $ZZ$ events form $\sim 60\%$ of the total background, while the background from misidentified jets accounts for much of the remaining $\sim 40\%$. The other background sources (top-quark decays, triboson and $t\bar{t}H$ events) account for less than 1% of the total background and are estimated using simulations.
In the $WH$ categories, the background from misidentified jets represents $\sim 70\%$ of the total background. In both $WH$ categories, the $WZ$ events account for $\sim 30\%$ of the total background and are estimated using simulated events. The other background sources (top-quark decays, triboson, and $t\bar{t}H$ events) form less than 2% of the total background and are also estimated using simulated events.
Because validating the diboson background in dedicated validation regions is difficult, the analysis normalises this background to the Standard Model expectation, relying on previous, higher-statistics measurements [56,57]. No dedicated validation region is used to extract the diboson background normalisation from the data.
The background from misidentified jets is evaluated using the Fake Factor method [58,59]; a similar background estimation was used in the previous version of this analysis [8]. A Fake Factor is defined as $F = \epsilon/(1 - \epsilon)$, where $\epsilon$ represents the selection efficiency of misidentified objects ($\tau_{\text{had-vis}}$ or light leptons). This means that $F$ is the ratio of the number of objects passing the selection requirements (as described in Section 4.1) to the number of objects that pass nearly all the selection requirements but fail one or both of the identification and isolation requirements. For electrons and muons, at least one of the identification and isolation requirements must fail, while $\tau$-leptons must pass the Very-Loose identification requirement (efficiency of 99%) and fail the Medium identification requirement. The derived $F$ values range from 0.32 (0.07) at $p_{\mathrm{T}} < 25$ GeV to 0.02 (0.01) at $p_{\mathrm{T}} > 60$ GeV for 1-track (3-track) $\tau_{\text{had-vis}}$ candidates. For the muon (electron) case, the $F$ values range from 0.14 (0.13) at $p_{\mathrm{T}} < 15\,(10)$ GeV to 0.25 (0.36) at $p_{\mathrm{T}} > 20$ GeV.
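The arithmetic of the Fake Factor definition and of the extrapolation it enables can be sketched as follows (an illustrative toy; the real measurement is binned and includes MC subtraction):

```python
def fake_factor(eff: float) -> float:
    """F = eff / (1 - eff), where eff is the selection efficiency of
    misidentified objects measured in a control region."""
    return eff / (1.0 - eff)

def fake_estimate(n_anti_id: float, ff: float) -> float:
    """Scale the event yield in the inverted (anti-ID/anti-isolation) region
    by the Fake Factor to predict the misidentified-jet yield in the
    corresponding identified region."""
    return ff * n_anti_id

# A misidentification efficiency of 20% gives F = 0.25, so 100 anti-ID
# events predict 25 fake events after the full selection.
print(fake_estimate(100.0, fake_factor(0.2)))
```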
The expected number of events with misidentified jets in a given region is obtained by using the Fake Factor to scale the number of events selected in an orthogonal region in which one or more requirements are inverted: the identification and/or isolation requirements for light leptons, and the identification requirements for the $\tau_{\text{had-vis}}$. The Fake Factors are measured in a dedicated $Z$+jets control region (CR) enriched in background from misidentified jets, and are then used as extrapolation factors to estimate the number of selected fake objects in the signal region. This $Z$+jets CR is based on the selection of exactly two opposite-sign light leptons that are required to be consistent with a $Z$-boson decay, plus a third object that is assumed to originate from a jet misidentified as an electron, muon or $\tau_{\text{had-vis}}$ and is used for the determination of the corresponding Fake Factors. The Fake Factors are computed in bins of lepton $p_{\mathrm{T}}$ and $|\eta|$. The $\tau_{\text{had-vis}}$ Fake Factors are also parametrised with respect to the $\tau_{\text{had-vis}}$ seed-jet width, defined as the $p_{\mathrm{T}}$-weighted sum of the constituents' distances from the jet axis (jet width $= \sum_i \Delta R_i\, p_{\mathrm{T},i} / \sum_i p_{\mathrm{T},i}$), separately for one- and three-track $\tau_{\text{had-vis}}$ candidates. In the $\tau_{\text{had-vis}}$ case, the jet-width parametrisation is particularly important because the width is highly correlated with the quark-gluon composition of the jets, and quark- and gluon-initiated jets exhibit different $\tau_{\text{had-vis}}$ misidentification rates. Contributions from a jet misidentified as the light lepton that triggered the event are estimated from simulation and found to be negligible. Diboson events can contaminate the selection at the level of a few percent or below, and are therefore subtracted using the MC prediction. The evaluation of the background from misidentified jets takes into account the presence of multiple misidentified objects, of which there can be as many as three in the $WH$ categories, in which only one light lepton triggered the event.
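The jet-width variable used to parametrise the $\tau_{\text{had-vis}}$ Fake Factors is a simple $p_{\mathrm{T}}$-weighted mean, which can be written directly (a sketch; constituents are reduced to $(\Delta R, p_{\mathrm{T}})$ pairs):

```python
def jet_width(constituents):
    """pT-weighted mean angular distance of jet constituents from the jet
    axis: width = sum_i(dR_i * pT_i) / sum_i(pT_i).
    constituents: iterable of (dR, pT) pairs."""
    sum_pt = sum(pt for _, pt in constituents)
    return sum(dr * pt for dr, pt in constituents) / sum_pt

# A narrow quark-like jet concentrates pT at small dR, giving a small width;
# a diffuse gluon-like jet gives a larger one.
print(jet_width([(0.1, 10.0), (0.3, 30.0)]))
```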
In all the categories, the modelling of the background from misidentified jets is validated in a same-sign region enriched in this background. For the $ZH$ categories, the Preselection criteria are applied, and the objects associated with the Higgs boson decay are required to have the same charge. For the $WH$ categories, all the Signal Region cuts are applied, except that the $m_{\mathrm{T2}}$ mass cut is removed and a same-sign requirement is applied to the $\tau$-lepton candidates. The selected region contains a sufficiently large number of events to minimise statistical fluctuations. Additional validation regions in the $ZH$ categories contain sufficient events to verify the modelling of the background from misidentified jets, as summarised in Table 3. The same table also shows the expected composition of the processes contributing to the background from misidentified jets in each region, obtained entirely from simulation.
The uncertainties associated with the Fake Factor method have statistical and systematic components. The statistical component is estimated for each Fake Factor bin separately and consists of the statistical uncertainties of the data in the $Z$+jets CR propagated to the Fake Factors. For each category, a dedicated systematic uncertainty takes into account the statistical fluctuations associated with the subtracted MC component. Another systematic uncertainty accounts for the residual difference between the modelling of the background from misidentified jets and the data in the same-sign validation region. This uncertainty is evaluated at the Preselection stage, which requires the objects associated with the Higgs boson decay to have the same charge, and, for each category, in bins of the $\tau_{\text{had-vis}}$ $p_{\mathrm{T}}$. The statistical component is negligible ($< 5\%$) compared with the systematic uncertainty, which ranges up to 23% in the high-$p_{\mathrm{T}}$ region (above 60 GeV).
Figure 1 shows distributions of selected kinematic variables, demonstrating the good modelling of the background from misidentified jets in the same-sign validation region of each of the four analysis categories.
Table 3: Validation regions, in addition to the same-sign region enriched in misidentified backgrounds, used to check the modelling of the background from misidentified jets in the $ZH$ categories. The last column shows an estimate of the dominant contribution to the background from misidentified jets, obtained from a pure MC study. The definition of the collinear mass ($m_{\mathrm{coll}}$) can be found in Ref. [60].

Analysis strategy
In this analysis, the signal strength is measured by a fit to the score distribution of a neural-network (NN) classifier. As a cross-check for this method, an alternative strategy is applied using the same mass-based observables as in Run 1: $m_{\mathrm{MMC}}$ for the $ZH$ channels and $m_{\mathrm{T2}}$ for the $WH$ channels.

Neural network analysis
The result is extracted using a fit to the distribution of the score of an NN classifier. Six different NN classifiers are trained: one for each of the $WH$ channels and the $ZH(\tau_{\mathrm{had}}\tau_{\mathrm{had}})$ channel, and three for $ZH(\tau_{\mathrm{lep}}\tau_{\mathrm{had}})$, which has a dedicated classifier for each combination of final-state light-lepton flavours ($e+e$, $e+\mu$, and $\mu+\mu$).
The NNs are trained to distinguish simulated signal from diboson events using a combination of low-level and high-level kinematic information from the individual particles (e.g. $p_\text{T}$, $|\eta|$, $\phi$) and the overall event (e.g. $E_\text{T}^\text{miss}$ and the dilepton mass). The full list of input variables used for each NN is provided in Table 4. The NN training is performed at the Preselection stage to mitigate the limited size of the MC samples and to exploit the larger dataset, thereby avoiding overtraining. The mass-based observables $m_\text{MMC}$ and $m_\text{T2}$ were not included in the list of inputs, as they were found not to yield a significant improvement in sensitivity.
Each NN is implemented using Keras [61] with a TensorFlow [62] backend. The networks consist of two initial transformation layers followed by three fully connected layers of 128 nodes with ReLU [63] activations. The output layer consists of a single node with a sigmoid activation function. The two initial transformation layers enforce rotational invariance in $\phi$ by adding a global $\phi$ offset during training, applied consistently event-by-event to all reconstructed objects. The binning of the NN score distributions in the four categories results from an optimisation that maximises the significance under the constraint that the statistical uncertainty of the signal and background templates is no larger than 20% in each bin.

Table 4: Input variables for the neural networks, first those common to all channels and then those specific to each category. The indices "1" and "2" refer to the leading and sub-leading objects, respectively (following a $p_\text{T}$ ordering). The symbol $\ell_\tau$ refers to the light lepton originating from a $\tau$-lepton decay, while $\ell$ (without any index) refers to a light lepton associated with the $V$ boson decay.
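The global $\phi$-offset idea behind the transformation layers can be illustrated with a short sketch; the function names and values here are hypothetical, and only the wrapping convention and the invariance property are shown:

```python
import numpy as np

def shift_phi(phi, offset):
    """Add a global azimuthal offset and wrap the result into [-pi, pi)."""
    return np.mod(np.asarray(phi) + offset + np.pi, 2.0 * np.pi) - np.pi

def delta_phi(a, b):
    """Wrapped azimuthal difference between two objects."""
    return shift_phi(a - b, 0.0)
```

Because only angular differences between objects carry physical information, applying the same offset to every object in an event leaves all $\Delta\phi$ observables unchanged, which is the invariance the transformation layers exploit.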

Mass-based analysis
As a cross-check of the main analysis strategy, the approach used in the Run 1 analysis, based on the mass observables ($m_\text{MMC}$ for the $ZH$ channels and $m_\text{T2}$ for the $WH$ channels), is also adopted to extract the signal yield. The $WH$ categories use $m_\text{T2}$ instead of $m_\text{MMC}$ because the presence of an additional neutrino from the $W$ decay breaks an important assumption of the $m_\text{MMC}$ algorithm, as discussed in Section 4.2.
For the mass-based analysis, the requirement on the mass observable is dropped from the Signal selection criteria in all categories to better constrain the background using the sidebands, where the background contribution dominates. As a consequence, the total number of events selected in the Signal region increases by up to a factor of about four, owing to a general increase of the fraction of background from misidentified $\tau_\text{had-vis}$.
As shown in Table 4, the mass variables ($m_\text{MMC}$ and $m_\text{T2}$) are not used as inputs to the NN. However, the binning of their distributions follows the same optimisation criteria used for the NN-based fit, as discussed in the previous section.
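A much-simplified sketch of such a binning optimisation is given below. It keeps only the template-uncertainty constraint (merging fine bins from the high-score side until each merged bin has a relative statistical uncertainty below 20%); the significance maximisation is omitted, and all names and numbers are illustrative:

```python
def merge_bins(counts, rel_err, max_rel_err=0.20):
    """Greedily merge fine histogram bins, starting from the highest bin,
    until each merged bin's relative statistical uncertainty is below
    max_rel_err. Returns the surviving bin edges as fine-bin indices."""
    edges = [len(counts)]
    acc, var = 0.0, 0.0
    for i in range(len(counts) - 1, -1, -1):
        acc += counts[i]
        var += (rel_err[i] * counts[i]) ** 2
        if acc > 0.0 and var ** 0.5 / acc <= max_rel_err:
            edges.append(i)
            acc, var = 0.0, 0.0
    if edges[-1] != 0:
        if len(edges) == 1:   # no merged bin ever met the threshold
            edges.append(0)
        else:                 # absorb leftovers into the last accepted bin
            edges[-1] = 0
    return edges[::-1]
```

With a falling template such as `counts = [100, 50, 10, 2]`, the sparsely populated high-score bins are merged while the well-populated low-score bins are kept fine.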

Systematic uncertainties
Systematic uncertainties affect the yields in the signal and control regions as well as the shape of the fitted distribution. They can be separated into four groups: MC sample statistical uncertainties (treated with the lite version of the Beeston-Barlow method [64]), experimental uncertainties, theoretical uncertainties for the backgrounds, and theoretical uncertainties for the signal. The systematic uncertainties related to the estimation of misidentified objects are described in Section 5.
Experimental uncertainties pertain to the trigger as well as to the final-state objects: reconstruction, identification and isolation efficiency uncertainties for electrons [39], muons [65], $\tau_\text{had-vis}$ [66], jets [43, 67-69], $b$-jets [70, 71] and $E_\text{T}^\text{miss}$ [72]. The uncertainties associated with the $\tau_\text{had-vis}$ identification efficiency are in the range of 2% to 6%, while the eBDT efficiency uncertainty is 1% to 2%. All these uncertainties are parameterised as a function of the $\tau_\text{had-vis}$ $p_\text{T}$ and of the number of associated tracks or the $\tau$-lepton decay mode (eBDT efficiency). For the $\tau_\text{had-vis}$ energy scale, the total uncertainty is in the range of 1% to 4%, arising from a combination of measurements: (i) a direct measurement with $Z\rightarrow\tau\tau\rightarrow\mu\,\tau_\text{had-vis} + 3\nu$ events, (ii) measurements of the calorimeter response to single particles, and (iii) comparisons between simulations using different detector geometries or Geant4 physics lists [21]. This uncertainty is also parameterised as a function of the $\tau_\text{had-vis}$ $p_\text{T}$ and the number of associated tracks [66]. The energy scale and resolution uncertainties of the final-state objects are also taken into account. Experimental uncertainties affect the shape of the final-discriminant distribution, the background yields, and the signal cross-section through their effects on the acceptance of, and migration between, the analysis categories. An additional uncertainty from the luminosity measurement [73, 74], amounting to 0.83%, is also included.
The theoretical uncertainties for the diboson, triboson and top-quark backgrounds are estimated from simulation. These include the systematic uncertainties due to the renormalisation ($\mu_r$), factorisation ($\mu_f$) and resummation ($\mu_\text{qsf}$) scales, the jet-to-parton matching scheme (CKKW) [75], the choice of the $\alpha_s$ value, and the PDFs. For the diboson background, the most relevant contributions come from the $\mu_r$ and $\mu_f$ uncertainties, which affect the shape and the overall normalisation with a total uncertainty that ranges from 7% to 12% across the two production processes. For the top-quark background, uncertainties related to the choice of matrix-element and parton-shower generators [76, 77], the initial- and final-state radiation model [78], and the PDFs [79] are considered. Their effect on the normalisation and shape of the final discriminant is included in the statistical analysis.
The Higgs boson production cross-section uncertainties are obtained from Ref. [9]. To account for missing higher orders in QCD, additional uncertainties are estimated by varying $\mu_r$, $\mu_f$, $\mu_\text{qsf}$, the choice of the $\alpha_s$ value, and the choice of matrix-element generator or parton-shower and hadronisation model. For the matrix-element variation, predictions by Powheg Box v2 are compared with those by MadGraph5_aMC@NLO [80]. The parton-shower and hadronisation-model variation replaces the nominal Pythia 8 simulation with Herwig 7 [76, 77]. Additional theoretical uncertainties affecting the $t\bar{t}H$ production cross-section are also considered; more details can be found in Ref. [6].

Results
The statistical procedure is based on a likelihood function $\mathcal{L}(\mu, \theta)$, constructed as the product of Poisson probability terms over the bins of the input distributions. The parameter of interest, $\mu$, is the signal strength multiplying the SM cross-section for Higgs boson production in association with a vector boson times the branching fraction into $\tau\tau$; it is extracted by maximising the likelihood. An additional statistical procedure, based on a likelihood function defined in the same way, is used to estimate two parameters of interest, $\mu_{WH}^{\tau\tau}$ and $\mu_{ZH}^{\tau\tau}$, separately for the $WH$ and $ZH$ categories, respectively. Systematic uncertainties enter the likelihood as nuisance parameters (NPs), $\theta$. Most of the uncertainties discussed in Section 7 are constrained with Gaussian or log-normal probability density functions. Systematic variations subject to large statistical fluctuations are smoothed, and systematic uncertainties with a negligible impact on the final results are pruned away category by category. Only the Signal regions are considered in the fit. The normalisations of the background contributions from diboson, triboson, top-quark and other small backgrounds are taken from simulation. The probability that the background-only hypothesis is compatible with the observed data is determined using the profile-likelihood-ratio test statistic defined in Ref. [81].
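The core of such a binned likelihood fit can be illustrated in a few lines, with nuisance parameters omitted for brevity and with invented per-bin yields (nothing below is taken from the analysis):

```python
import numpy as np

def nll(mu, n_obs, s, b):
    """Negative log-likelihood for Poisson bins with expectation mu*s + b
    (constant terms dropped; nuisance parameters omitted for brevity)."""
    lam = mu * s + b
    return np.sum(lam - n_obs * np.log(lam))

# Hypothetical signal, background and observed yields per bin.
s = np.array([5.0, 10.0, 20.0])
b = np.array([100.0, 50.0, 20.0])
n_obs = np.array([106.0, 63.0, 45.0])

# Profile scan over the signal strength mu.
mus = np.linspace(0.0, 3.0, 3001)
scan = np.array([nll(m, n_obs, s, b) for m in mus])
mu_hat = mus[np.argmin(scan)]

# Background-only test statistic q0 and the approximate significance,
# using the asymptotic relation Z = sqrt(q0).
q0 = 2.0 * (nll(0.0, n_obs, s, b) - scan.min())
z0 = np.sqrt(max(q0, 0.0))
```

In the real analysis the likelihood additionally contains the Gaussian and log-normal constraint terms for the nuisance parameters, which are profiled rather than fixed.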

Results of the neural network analysis
For a Higgs boson mass of 125 GeV, when the $WH$ and $ZH$ categories are combined under the constraint $\mu_{WH}^{\tau\tau} = \mu_{ZH}^{\tau\tau} = \mu_{VH}^{\tau\tau}$, the NN-based fit shows an observed significance of 4.2 standard deviations with respect to the background-only hypothesis, compared with an expectation of 3.6 standard deviations. The fitted value of the signal strength is $\mu_{VH}^{\tau\tau} = 1.28\ ^{+0.39}_{-0.36} = 1.28\ ^{+0.30}_{-0.29}\ (\mathrm{stat.})\ ^{+0.25}_{-0.21}\ (\mathrm{syst.})$.
Using the Standard Model prediction of $(6.59 \pm 0.03)$ fb for the cross-section [9], this corresponds to a measured cross-section of $8.5\ ^{+2.6}_{-2.4}$ fb.
The total statistical uncertainty is defined as the uncertainty in $\mu_{VH}^{\tau\tau}$ when all the NPs are fixed to their best-fit values. The total systematic uncertainty is then defined as the difference in quadrature between the total uncertainty in $\mu_{VH}^{\tau\tau}$ and the total statistical uncertainty. As this breakdown of the statistical and systematic uncertainties shows, the result is limited by the data sample size.
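The difference-in-quadrature definition can be checked directly against the quoted numbers:

```python
import math

# Quoted total and statistical-only uncertainties on the signal strength.
total_up, total_dn = 0.39, 0.36
stat_up,  stat_dn  = 0.30, 0.29

# Systematic component as the difference in quadrature, as defined above.
syst_up = math.sqrt(total_up**2 - stat_up**2)  # reproduces +0.25
syst_dn = math.sqrt(total_dn**2 - stat_dn**2)  # reproduces -0.21

# Measured cross-section: mu times the SM prediction of 6.59 fb
# (about 8.4 fb with these rounded inputs; the paper quotes 8.5 fb).
mu, sigma_sm = 1.28, 6.59
sigma = mu * sigma_sm
```

The small difference between $\mu \times \sigma_\text{SM}$ evaluated from the rounded inputs and the quoted 8.5 fb is a rounding effect.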
The relative effects of the systematic uncertainties on the measurement of $\mu_{VH}^{\tau\tau}$ are shown in Table 5. The impact of a category of systematic uncertainties is defined as the difference in quadrature between the uncertainty in $\mu_{VH}^{\tau\tau}$ computed when all NPs are fitted and that obtained when the NPs in that category are fixed to their best-fit values. As shown in Table 5, the systematic uncertainties associated with the $\tau_\text{had-vis}$ reconstruction (including its identification and calibration) and those associated with the background sample size play a dominant role, followed by the modelling of the background from misidentified jets.
Figure 2 shows the post-fit distributions of the NN scores. The background prediction in all post-fit distributions is obtained by setting the nuisance parameters to their best-fit values. Given the fluctuations seen in several distributions, extensive goodness-of-fit studies were performed, including Monte Carlo pseudo-experiments to confirm the applicability of the asymptotic approximation used in extracting the analysis sensitivity.
Figure 3 shows the data, background, and signal yields, where the final-discriminant bins in all regions are combined into bins of $\log_{10}(S/B)$. Here, $S$ and $B$ are the fitted signal and background yields in each analysis bin. Table 6 shows the post-fit yields in the four categories, which are in good agreement with the data.
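The rebinning into $\log_{10}(S/B)$ can be sketched as follows, with hypothetical per-bin yields and coarse-bin edges standing in for the fitted values:

```python
import numpy as np

# Hypothetical fitted signal/background and observed yields across regions.
s = np.array([0.1, 0.5, 2.0, 4.0, 0.2, 1.0])
b = np.array([100.0, 50.0, 10.0, 4.0, 80.0, 20.0])
n = np.array([101.0, 52.0, 13.0, 9.0, 79.0, 22.0])

log_sb = np.log10(s / b)

# Combine the analysis bins into coarse bins of log10(S/B).
edges = np.array([-4.0, -2.0, -1.0, 0.5])
idx = np.digitize(log_sb, edges) - 1

n_bins = len(edges) - 1
data_hist = np.zeros(n_bins)
bkg_hist = np.zeros(n_bins)
for i, j in enumerate(idx):
    data_hist[j] += n[i]
    bkg_hist[j]  += b[i]
```

Bins with the largest $\log_{10}(S/B)$ collect the most signal-like events, which is why any excess concentrates on the right of such a plot.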
A combined fit is also performed with the signal strengths for the $WH$ and $ZH$ production processes floating separately. The results of this fit are shown in Figure 4. The probability that the signal strengths measured in the two production processes are compatible is 56%.
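A standard way to quantify such a compatibility is a likelihood-ratio test between the common-strength and independent-strength fits; a minimal sketch, with an illustrative test-statistic value not taken from the paper, is:

```python
import math

# q = -2 ln [ L(mu_WH = mu_ZH) / L(mu_WH, mu_ZH free) ] is asymptotically
# chi-square distributed with one degree of freedom.
def compatibility_p(q):
    """p-value of a chi-square variable with 1 dof: p = erfc(sqrt(q/2))."""
    return math.erfc(math.sqrt(q / 2.0))

# Illustrative value: q of about 0.34 corresponds to a p-value near 56%,
# i.e. good compatibility between the two signal strengths.
p = compatibility_p(0.34)
```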
Table 5: Summary of the different sources of uncertainty affecting the observed $\mu_{VH}^{\tau\tau}$, and their impact as computed by the NN-based fit described in Section 6.1. Experimental uncertainties for reconstructed objects combine efficiency and energy/momentum scale and resolution uncertainties. 'Simulated background sample size' includes the bin-by-bin statistical uncertainties in the simulated backgrounds and in the misidentified-jets background, which is estimated using data.


Results of the mass-based analysis
For all channels combined, the fitted value of the signal strength from the mass-based fit is in good agreement with the result of the NN-based analysis discussed in the previous section. The observed excess in the mass-based analysis has a significance of 3.5 standard deviations, compared with an expectation of 2.6 standard deviations. The relative effects of the systematic uncertainties on the measurement of $\mu_{VH}^{\tau\tau}$ are similar to those discussed in Section 8.1.
Figure 5 shows the post-fit distributions of the observables used in the mass-based analysis: $m_\text{MMC}$ for the $ZH$ categories and $m_\text{T2}$ for the $WH$ ones. The background prediction in all post-fit distributions is obtained by setting the nuisance parameters according to the results of the combined $WH$ and $ZH$ fit.

Conclusion
A search for the Standard Model Higgs boson decaying into a $\tau\tau$ pair and produced in association with a leptonically decaying $W$ or $Z$ boson is presented, using data collected by the ATLAS experiment in proton-proton collisions during Run 2 of the LHC. The data correspond to an integrated luminosity of 140 fb$^{-1}$ collected at a centre-of-mass energy of $\sqrt{s} = 13$ TeV.
In addition to the approximately seven times larger dataset, the main improvements with respect to the Run 1 result come from more sophisticated analysis methodologies, including the introduction of a neural-network discriminant to reject the diboson background and improved $\tau_\text{had-vis}$ identification algorithms.
An excess over the expected background is observed with a significance of 4.2 standard deviations, compared with an expectation of 3.6. The measured signal strength relative to the SM prediction for $m_H = 125$ GeV is $\mu_{VH}^{\tau\tau} = 1.28\ ^{+0.30}_{-0.29}\ (\mathrm{stat.})\ ^{+0.25}_{-0.21}\ (\mathrm{syst.})$. This analysis currently provides the most sensitive measurement of Higgs boson production in association with a leptonically decaying vector boson in events where the Higgs boson decays into a pair of $\tau$-leptons.

Figure 1 :
Figure 1: Distributions of representative kinematic variables in the misidentified-background-enriched same-sign region: (a) the Higgs boson transverse momentum ($p_\text{T}^H$) in the $WH$ ($\tau_\text{had}\tau_\text{had}$) category, (b) the missing transverse momentum ($E_\text{T}^\text{miss}$) in the $WH$ ($\tau_\text{lep}\tau_\text{had}$) category, (c) the angular distance ($\Delta R(\ell,\ell)$) between the two light leptons associated with the $Z\rightarrow\ell\ell$ decay in the $ZH$ ($\tau_\text{had}\tau_\text{had}$) category, and (d) the invariant mass ($m_{\ell\ell}$) of the two light leptons associated with the $Z\rightarrow\ell\ell$ decay in the $ZH$ ($\tau_\text{lep}\tau_\text{had}$) category. The hatched band represents the pre-fit statistical, experimental and theoretical uncertainties. The signal contributions are included in the predictions and are normalised as predicted by the Standard Model.

Figure 2 :
Figure 2: Post-fit distributions of the NN scores for the NN-based analysis in the (a) $WH$ ($\tau_\text{had}\tau_\text{had}$), (b) $WH$ ($\tau_\text{lep}\tau_\text{had}$), (c) $ZH$ ($\tau_\text{had}\tau_\text{had}$) and (d) $ZH$ ($\tau_\text{lep}\tau_\text{had}$) categories. The hatched band indicates the total post-fit uncertainty of the total predicted yields. The post-fit signal contributions are included in the predictions.

Figure 3 :
Figure 3: Event yields as a function of $\log_{10}(S/B)$ for data, background, and a Higgs boson signal with $m_H = 125$ GeV. Final-discriminant bins in all regions are combined into bins of $\log_{10}(S/B)$, where $S$ is the fitted signal and $B$ is the fitted background from the NN-based fit. The Higgs boson signal contribution is shown after rescaling the SM cross-section by the signal-strength value extracted from data ($\mu = 1.28$). The lower panel shows the pull of the data relative to the background-only expectation, evaluated as the difference between the data and the background-only expectation divided by the sum in quadrature of the data and background statistical uncertainties. The solid line shows the expected pull in each bin for the best-fit signal value.

Figure 4 :
Figure 4: The fitted values of the Higgs boson signal strength $\mu^{\tau\tau}$ for $m_H = 125$ GeV for the $WH$ and $ZH$ processes and their combination from the NN-based fit. The individual $\mu^{\tau\tau}$ values for the $WH$ ($ZH$) processes are obtained from a simultaneous fit with the signal strength for each of the $WH$ and $ZH$ processes floating independently. The probability of compatibility of the individual signal strengths is 56%.

Figure 5 :
Figure 5: Post-fit distributions for the mass-based analysis of $m_\text{MMC}$ in the (a) $ZH$ ($\tau_\text{had}\tau_\text{had}$) and (b) $ZH$ ($\tau_\text{lep}\tau_\text{had}$) categories, and of $m_\text{T2}$ in the (c) $WH$ ($\tau_\text{had}\tau_\text{had}$) and (d) $WH$ ($\tau_\text{lep}\tau_\text{had}$) categories. The hatched band indicates the total post-fit uncertainty of the total predicted yields. The post-fit signal contributions are included in the predictions.

Table 2 :
Preselection and Signal Region selection criteria for the four categories. "OS" stands for opposite-sign, "SS" for same-sign.

Table 6 :
Post-fit yields from the NN-based fit performed with $\mu_{WH}^{\tau\tau} = \mu_{ZH}^{\tau\tau} = \mu_{VH}^{\tau\tau}$. The symbol "-" is used when no events or fewer than $10^{-2}$ events are present. The columns correspond to the $WH$ ($\tau_\text{had}\tau_\text{had}$), $WH$ ($\tau_\text{lep}\tau_\text{had}$), $ZH$ ($\tau_\text{had}\tau_\text{had}$) and $ZH$ ($\tau_\text{lep}\tau_\text{had}$) categories.