UvA-DARE (Digital Academic Repository) A search for the dimuon decay of the Standard Model Higgs boson with the ATLAS detector

A search for the dimuon decay of the Standard Model (SM) Higgs boson is performed using data corresponding to an integrated luminosity of 139 fb − 1 collected with the ATLAS detector in Run 2 pp collisions at √ s = 13 TeV at the Large Hadron Collider. The observed (expected) signiﬁcance over the background-only hypothesis for a Higgs boson with a mass of 125 . 09 GeV is 2 . 0 σ (1 . 7 σ ). The observed upper limit on the cross section times branching ratio for pp → H → μμ is 2 . 2 times the SM prediction at 95% conﬁdence level, while the expected limit on a H → μμ signal assuming the absence (presence) of a SM signal is 1 . 1 (2 . 0). The best-ﬁt value of the signal strength parameter, deﬁned as the ratio of the observed signal yield to the one expected in the SM, is μ = 1 . 2 ± 0 . 6. © 2020 The Author(s). by Elsevier open access article CC


Introduction
In July 2012, the ATLAS and CMS Collaborations announced the discovery of a new particle with a mass of approximately 125 GeV [1,2] at the CERN Large Hadron Collider (LHC). Subsequent measurements have indicated that this particle is consistent E-mail address: atlas .publications @cern .ch. son with m H = 125.09 GeV is (2.17 ± 0.04) × 10 −4 [11]. However, physics beyond the SM [12,13] could modify the branching ratio.
Both the ATLAS and CMS Collaborations carried out searches for the H → μμ process based on partial sets of about one quarter of the data collected during Run 2 of the LHC [14,15]. This paper presents an improved search for the dimuon decay of the Higgs boson using the full pp collision dataset recorded with the ATLAS detector in the LHC Run 2 period, spanning 2015 to 2018 at √ s = 13 TeV, corresponding to an integrated luminosity of about 139 fb −1 . Compared to the previous publication [14], several improvements have been made. They include a better categorisation based on multivariate techniques that exploit the topological and kinematic differences between the different signal production modes and the background processes, improvements in the muon reconstruction, a large increase in the equivalent integrated luminosity of the simulated background samples using a dedicated fast simulation, and an improved methodology for the background modelling.
The analysis selects events with two opposite-charge muons and classifies them into 20 mutually exclusive categories based on the event topology and multivariate discriminants to increase the signal sensitivity. After event categorisation, the signal yield is extracted by a simultaneous fit to the 20 dimuon mass (m μμ ) distributions in the range 110-160 GeV together with background normalisation and shape parameters, exploiting the resonant behaviour of the Higgs boson signal. The Higgs boson is assumed to have a mass of m H = 125.09 GeV [5] for all results presented.
More recent measurements of the Higgs boson mass [16,17] are compatible within their uncertainties with this value.

ATLAS detector
The ATLAS detector [18,19] covers nearly the entire solid angle around the collision point. 1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroid magnets.
The inner detector (ID) system is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range |η| < 2.5. A high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track. It is surrounded by a silicon microstrip tracker, which typically provides four measurement points per track. These silicon detectors are complemented by a transition radiation tracker, which enables radially extended track reconstruction up to |η| = 2.0.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity liquid-argon (LAr) sampling calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by a scintillatortile calorimeter, segmented into three barrel structures within |η| < 1.7, and two LAr hadronic endcap calorimeters.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting aircore toroids. The precision chamber system covers the region 1 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the z-axis along the beam pipe. The x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the z-axis. The pseudorapidity is defined in terms of the polar angle θ as η = − ln tan(θ/2). The rapidity is defined as y = 1 2 ln E+pz E−pz and the distance between two objects is defined as R = ( y) 2 + ( φ) 2 .
|η| < 2.7 with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range |η| < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
The data were collected with a two-level trigger system [20]. The first-level trigger (L1) is implemented in hardware and uses a subset of the detector information. This is followed by a softwarebased high-level trigger which runs algorithms similar to those in the offline reconstruction software, reducing the event rate to approximately 1 kHz from the maximum L1 rate of 100 kHz.

Data and simulated event samples
The pp collision data at √ s = 13 TeV analysed here correspond to the full recorded Run 2 dataset, with an integrated luminosity of 139 fb −1 after the application of data quality requirements. The mean number of pp interactions per bunch crossing was about 34. Events used in this analysis were recorded using a combination of single-muon triggers with transverse momentum thresholds up to 26 GeV for isolated muons and 50 GeV for muons without any isolation requirement imposed, allowing to recover some inefficiency introduced by the isolation requirement at trigger level for high momentum muons. The trigger efficiency for the sum of the H → μμ signal processes is about 91% relative to the common event preselection discussed in Section 4. Samples of simulated Monte Carlo (MC) events are used to optimise the selection, to model the signal processes and to develop an analytic function to model the m μμ distributions for the background estimate. The signal samples as well as a complete set of background processes were processed through the full ATLAS detector simulation [21] based on Geant4 [22], henceforth referred to as fully simulated samples.
Signal samples were generated for the main Higgs boson production modes. The mass of the Higgs boson was set in the simulation to m H = 125 GeV and the corresponding width is H = 4.07 MeV [23]. The samples are normalised with the latest available theoretical calculations of the corresponding SM production cross sections, summarised in Ref. [11]. The normalisation of all Higgs boson samples also accounts for the H → μμ branching ratio of 2.17 × 10 −4 calculated with HDECAY [24][25][26][27] and PROPHECY4F [28][29][30].
Higgs bosons produced via vector-boson fusion (VBF) and in as-  [71] and interfaced to Pythia 8 using the A14 set of tuned parameters [72]. The cross section is taken from a calculation accurate to NLO in QCD with NLO electroweak corrections [73][74][75][76].
Background events from the Drell-Yan (DY) Z /γ * → μμ process were generated with Sherpa 2.2.1 [77] using NLO-accurate matrix elements for up to two partons, and LO-accurate matrix elements for up to four partons calculated with the Comix [78] and OpenLoops [79,80] libraries and the NNPDF3.0 NNLO set. They were matched to the Sherpa parton shower [81] using the MEPS@NLO prescription [82][83][84][85]. Diboson processes (W W , W Z, and Z Z ) as well as electroweak Z jj production were simulated in a similar set-up with Sherpa 2.2.1. The tt and single-top-quark samples were generated at NLO accuracy with Powheg-Box [86,87] using the NNPDF3.0NLO PDF set interfaced to Pythia 8 for parton showering and hadronisation using the A14 parameter set. For the W t process, the diagram removal scheme [88] is applied to remove the overlap with tt production. The production of tt V events was modelled using the MadGraph5_aMC@NLO [69] generator at NLO in a set-up similar to the one used for the tt H process. The effects of multiple pp collisions in the same or neighbouring bunch crossings (pile-up) are included in the MC simulation by overlaying inelastic pp interactions produced using Pythia 8 with the NNPDF2.3LO set of PDFs [89] and the A3 set of tuned parameters [90]. Events are reweighted such that the distribution of the average number of interactions per bunch crossing matches that observed in data. Simulated events are corrected to reflect the momentum scales and resolutions as well as the trigger, reconstruction, identification, and isolation efficiencies measured in data for all the physics objects used in this analysis.
The background samples discussed above provide an equivalent integrated luminosity that is typically 5-20 times higher than that of data and are used to train multivariate classifiers and to test the background modelling. However, the statistical uncertainties in the dominant DY background are a limiting factor in studying the background modelling at the level required by the small expected H → μμ signal. Therefore, two fast-simulation set-ups were developed to generate significantly larger DY samples. The primary fast-simulation DY sample is based on parton-level events generated [91] with Sherpa 2.2.4 [92] using LO matrix elements for Z /γ * production with up to three additional partons and using the CT14 NNLO PDF set [93]. The parton-level events were processed with Pythia 8 to provide QED and QCD parton showering and hadronisation, and double-counted QCD emissions were removed using the CKKW-L algorithm [94] with a merging scale of 20 GeV. For further cross-checks, an additional fast-simulation DY sample was prepared that simulates Z /γ * + 0, 1 partons inclusively at NLO accuracy using Powheg-Box [95] with the CT10 PDF set [96] and Z /γ * + 2 partons at LO accuracy with Alpgen [97] using the CTEQ6L1 PDF set [98]. These parton-level events were processed with an approximate QCD shower algorithm, overlaps between the two samples were removed, and QED FSR was provided by Photos [99]. For both of these generated samples, experimental effects were approximated using parameterisations rather than using the full ATLAS detector simulation and reconstruction software. The parameterisations, extracted from fully simulated MC samples or directly from ATLAS data, reproduce the reconstruction and selection efficiencies of detector-level objects by event weighting and model the resolution of the ATLAS detector with predetermined probability distributions. Detailed descriptions were employed for the muon momentum resolution and muon trigger and selection efficiencies, for the photons from QED FSR, for hadronic jets from the primary interaction and pile-up events in terms of kinematics and the number of associated ID tracks, and for the effect of pile-up and the underlying event on the measurement of the miss-ing transverse momentum E miss T . In total, two sets of about 10-20 billion events are prepared in this way, corresponding to an equivalent integrated luminosity of at least 50 ab −1 in the kinematic phase space relevant for the analysis.
Both fast simulation DY samples give a good description of the data distributions of the observables used in this analysis to discriminate the DY background from the H → μμ signal, i.e. the m μμ mass spectra and the multivariate discriminants described in the following sections. Small residual differences are taken into account by reweighting the mass spectra to the data sidebands as described in Section 6.

Object definitions and event selection
Events are required to contain at least one reconstructed pp collision vertex candidate with at least two associated ID tracks each with p T > 0.5 GeV. The vertex with the largest sum of p 2 T of tracks is considered to be the primary vertex of the hard interaction. For signal events the primary vertex selection criteria has an efficiency of about 99% [100].
The majority of muon candidates are reconstructed by combining a track in the ID with a track in the MS. To improve the muon reconstruction efficiency in the region of |η| < 0.1, which has limited coverage in the MS, additional muon candidates are identified by matching a reconstructed ID track to either an MS track segment or a calorimetric energy deposit consistent with a minimum-ionising particle. In the region 2.5 < |η| < 2.7, which is not covered by the ID, additional muons are reconstructed from an MS track with hits in the three MS layers and combined with forward ID hits, if possible. Muon candidates are required to satisfy the 'loose' criteria defined in Ref. [100] and have p T > 6 GeV and |η| < 2.7. Muons with an associated ID track must be matched to the primary vertex by having a longitudinal impact parameter z 0 that satisfies |z 0 sin(θ )| < 0.5 mm, where θ is the polar angle of the track. The significance of the transverse impact parameter d 0 calculated relative to the measured beam-line position is re- Furthermore, isolation criteria are applied to suppress non-prompt muons originating from hadron decays. The isolation selection uses information about ID tracks and calorimeter energy deposits in a range R < 0.2 around the muon as described in Ref. [101].
Since muons may lose a significant fraction of their energy by QED FSR, up to one final-state photon candidate per event is included in the m μμ calculation to improve the signal reconstruction. Photon candidates are reconstructed with a procedure similar to the one described in Ref.
[101], optimised to achieve the best sensitivity for the H → μμ signal. Only photon candidates close to muons ( R(γ , μ) < 0.2) are considered. To reduce background from pile-up interactions, a variable threshold is imposed on the photon transverse momentum p γ T ; the threshold increases linearly from p γ T = 3 GeV for R = 0 to p γ T = 8 GeV for R = 0.2. If more than one photon passes this requirement, the photon with the highest transverse momentum is selected. A QED FSR candidate is found in about 5% of the events and the signal m μμ width is reduced by about 3% when considering all reconstructed signal events. With these selections, contributions from the loop-induced decay H → Z γ , Z → μμ [102] are expected to be around 0.1% of the H → μμ yield and are thus neglected in the further analysis.
Electrons are reconstructed by matching clusters of energy in the electromagnetic calorimeter to tracks in the ID. They are required to satisfy 'Medium' identification criteria [103], have p T > 7 GeV and |η| < 2.47 and be outside the region of 1.37 < |η| < 1.52. Similarly to muons, electrons are required to be isolated from additional activity measured by ID tracks and the calorimeters within R < 0.2 [101] and to be matched to the primary vertex with |z 0 sin(θ )| < 0.5 mm and |d 0 |/σ (d 0 ) < 5. Table 1 Summary of the main event selection criteria common to all events as well as the criteria applied to the selection of hadronic jets. The bottom sections give the basic requirements on leptons and b-tagged jets for the analysis categories targeting different Higgs boson production processes. The subleading muon momentum threshold is 15 GeV in all categories except the V H 3-lepton categories, where it is lowered to 10 GeV.

Common preselection
Primary vertex Two opposite-charge muons Jets p T > 25 GeV and |η| < 2.4 or with p T > 30 GeV and 2.4 < |η| < 4.5 tt H Category at least one additional e or μ with p T > 15 GeV, at least one b-jet (85% WP) Jets containing b-hadrons with |η| < 2.5 are identified as btagged jets using a multivariate b-tagging algorithm. Two identification working points (WP) are used [108,109] to provide a 60% (85%) efficiency in tt events and a rejection factor of 1200 (25) for light-flavour jets, respectively. Neutrinos escape from the detector and lead to missing transverse momentum E miss T . The E miss T is defined as the magnitude of the negative vectorial sum of the transverse momenta of the selected and calibrated physics objects (including muons, electrons and jets) and the ID tracks not associated with any physics object (soft term) [110,111].
Events are selected if they contain at least two oppositecharge muon candidates. The leading muon is required to have p T > 27 GeV to be above the trigger threshold and in most categories the subleading muon has to have p T > 15 GeV. Further requirements on the presence or absence of additional muons, electrons and b-tagged jets depend on the targeted Higgs boson production mode (tt H, V H, or ggF +VBF), as detailed in Section 5. The main selection requirements are summarised in Ta

Event categorisation
Events satisfying the preselection criteria of Section 4 are classified into 20 mutually exclusive categories. They are defined to exploit the topological and kinematic differences between the background processes and the different Higgs boson production modes: ggF, VBF, V H and tt H. The background is dominated inclusively by the DY process, while diboson production, tt and single-top production and rarer SM processes such as tt V play a significant role in the categories targeting V H and tt H production.
After preselecting events according to the presence of additional leptons and the number of jets and b-tagged jets, boosted decision trees (BDT) [112,113] are trained using the XGBoost package [114] to enhance the signal sensitivity as explained in the following. In order to avoid any potential bias, all trainings are performed using k-fold cross-validation, where k different partitions are used in turn for training, for validation and for testing.
The category selections targeting the different Higgs boson production modes are made in a specific exclusive order, corresponding to the order in which they are presented in this section.

tt H category
A category enriched in tt H events is defined in order to target the dileptonic or semileptonic decay of the tt system. Events are considered for this category if there is at least one lepton (e or μ) with a transverse momentum p T > 15 GeV in addition to the opposite-sign muon pair and at least one b-tagged jet selected by the 85% efficiency working point. The two highest-p T muons with opposite charge are chosen as the Higgs boson decay candidate and used to calculate the variable m μμ used in the final fit. This procedure correctly selects the muon pair coming from the Higgs boson decay in about 80% of the cases. After this selection, a BDT is trained using simulated tt H, H → μμ events as signal and simulated events from all SM background processes as background, both selected in a range m μμ = 100-200 GeV. The 12 variables used in the BDT include the Higgs boson candidate's transverse momentum p μμ T , the value of the cosine of the lepton decay angle cos θ * in the Collins-Soper frame [115], the transverse momenta of the additional leptons, the multiplicity of central jets with |η| < 2.5, the multiplicity of b-tagged jets, and the scalar sum of the transverse momenta of all the jets, H T . In addition, several invariant masses derived from the reconstructed objects as described in the following are used. If there are not enough reconstructed objects in the event to define the invariant masses described in the following, fixed arbitrary values outside their physical ranges are assigned. The leptonic top-quark candidate mass m Lep-Top is calculated as the transverse mass of the system composed of the third lepton, the missing transverse momentum and the b-tagged jet candidate (if more than one b-tagged jet is present, the one yielding m Lep-Top closest to 173 GeV is chosen). The transverse mass of the leptonic W -boson candidate m Lep-W is calculated from the system composed of the third lepton and the missing transverse momentum. The hadronic top-quark candidate mass m Had-Top is reconstructed from three jets, where one jet must be b-tagged (if only one b-tagged jet is present, it is used in the reconstruction of both m Lep-Top and m Had-Top ). If more than three jets are available, the combination is chosen that maximises the probability of compatibility with a hadronic top-quark decay in terms of closeness of m Had-Top and m Had-W to the top-quark and W -boson masses, respectively, where the mass of the hadronically decaying W boson, m Had-W , is calculated from the two non-b-tagged jets associated with the hadronic top decay. If, in addition to the two muons associated with the Higgs boson candidate, two additional opposite-sign and same-flavour leptons are present in the event, their invariant mass is used in the BDT. Similarly, if there are at least three muons reconstructed, the mass of the dimuon pair formed from the muon with the third-highest p T and the oppositely charged muon assigned to the Higgs boson is included in the classification.
A selection is applied to the BDT score to define one tt Henriched category (named tt H in the following) by optimising the sensitivity to the predicted SM signal. The background is expected to be dominated by the tt Z process with additional contributions from the production of tt, dibosons, and tt H with Higgs boson decays into a final state different from H → μμ. Assuming SM Higgs boson production and decay, 1.2 signal events are expected in this category with a purity of 98% for the tt H process relative to other Higgs boson production modes and a signal-to-background ratio of 8% in the mass window m μμ = 120-130 GeV. Two BDTs are trained, separately for the three-lepton and fourlepton events, to discriminate between the simulated signal and background events satisfying the preselection criteria and having a dimuon invariant mass in the range m μμ = 110-160 GeV. The three-lepton BDT uses the W H, H → μμ production as signal.

V H categories
The variables used include the azimuthal separation φ between the Higgs boson candidate and the missing transverse momentum, the transverse momentum of the W lepton, the transverse mass of the W boson candidate, the azimuthal separation and the separation in pseudorapidity η between the Higgs boson candidate and the W lepton, the missing transverse momentum, the transverse momentum of the leading jet (if present) and the number of reconstructed jets.
For the four-lepton events the Z H, H → μμ production is chosen as signal in the BDT training. The variables used in the BDT include the azimuthal separation between the leptons from the Z → candidate, the azimuthal separation and the separation in pseudorapidity between the H → μμ candidate and the Z → candidate, the invariant mass of the Z → candidate, the number of jets and the transverse momentum of the two leading jets (if present).
Three V H categories are defined by applying selection criteria to the two BDT scores which optimise the sensitivity to the predicted SM signal. Two categories are defined for the three-lepton events and are named VH3LH and VH3LM, where the former has the higher signal-to-background ratio. For the four-lepton events, only one category is selected and is named VH4L. The diboson processes are expected to constitute about 70% (55%) of the total background in the VH3LH (VH3LM) category with smaller contributions expected from top-quark pair production and the DY process. In the VH4L category, about 98% of the background is from the Z Z process. Assuming the SM Higgs boson production and decay, the numbers of signal events expected in the m μμ = 120-130 GeV mass window for the VH3LM, VH3LH and VH4L categories are 2.8, 1.4 and 0.5, respectively, and the corresponding signal-tobackground ratios are 0.8%, 3.7% and 2.6%. The expected H → μμ signal purity for the V H production process relative to other Higgs production modes is 89% in the VH3LM category and more than 99% in the VH3LH and VH4L categories.

ggF and VBF categories
The events not selected in the tt H or V H categories described above, are further classified according to the number of reconstructed jets into three jet multiplicity categories: 0-jet, 1-jet and 2-jet, where the last includes events with two or more jets. Events with at least one b-tagged jet selected by the 60% efficiency working point or with a third muon with p T > 15 GeV are rejected in the ggF and VBF categories, as they are found to have a very low signal-to-background ratio and negligible signal sensitivity. To fully exploit the kinematic differences between the signal and the backgrounds, which are dominated by DY dimuon production contributing more than 90% of all background events after preselection, BDTs are trained in each jet multiplicity category. All BDTs are trained using the MC background samples and the simulated H → μμ signal in the mass window m μμ = 120-130 GeV.
In the 2-jet category, a BDT is trained to disentangle signal events produced by VBF, used as signal sample in the training, from background events. This BDT, with a score denoted by O VBF , is based on 17 variables related to the dimuon and dijet systems as described below. The dimuon system is characterised by the transverse momentum p μμ T , rapidity y μμ and the value of cos θ * . Compared to events from the dominant DY background, signal events from both ggF and VBF production are characterised by larger p μμ T and smaller absolute values of y μμ . The cos θ * distributions provide some discrimination due to different spin-structures and Z -γ interference effects for the DY background [116]. For the leading and subleading jets in the event (denoted by j 1 and j 2 ), the following variables are computed: p T and η of j 1 and j 2 ; the azimuthal separation between the dimuon system and each jet, φ μμ, j 1 and φ μμ, j 2 ; the kinematics of the dijet system ( jj) characterised by transverse momentum p jj T , mass m jj and rapidity y jj ; and the azimuthal separation between the dimuon and dijet systems, φ μμ, jj . These variables exploit the unique signature of the VBF process: two high-p T jets separated by a large rapidity gap with little hadronic activity. For jets that have a high p T > 50 GeV and are in the central region |η| < 2.1, the multiplicity of ID tracks with p T > 0.5 GeV associated with each of the two leading jets, N j i track , is also used to help discriminate between jets produced by fragmentation of gluons and quarks [117,118]. In addition, E miss T and H T , which can discriminate the tt background from the signal, are also considered.
From the O VBF classifier, four categories in the region with the highest score are selected and named VBF Very High, High, Medium, and Low. The expected H → μμ signal contributing to these categories is dominated by the VBF process with a purity ranging from about 93% (VBF Very High) to 65% (VBF Low) and the expected SM signal-to-background ratio computed in the 120-130 GeV mass window varies between 18% (VBF Very High) and 2.8% (VBF Low). The predicted number of SM signal events in the VBF categories ranges between 2.8 and 7.5 events.
The remaining events are considered for further classification in three other BDTs split by jet multiplicity, with scores denoted as O The ggF production process contributes 80%-100% of the expected H → μμ signal in these categories. The expected SM signal-tobackground ratio in the 120-130 GeV mass window varies from 1.7%-1.5% (2-jet Very High and 1-jet Very High) to 0.07% (0-jet Low). The predicted number of SM signal events in the ggF categories ranges between 17 and 125 events.

Signal and background modelling and systematic uncertainties
The signal extraction is based on a binned maximum-likelihood fit to the invariant mass spectrum of the dimuon system as described in Section 7. Analytic models are used in the fit to describe the m μμ distributions for both the signal and background processes.

Signal modelling
In the SM, the H → μμ signal is predicted to be a narrow resonance with a width of 4.1 MeV for m H = 125.09 GeV. The observed signal shape is thus determined by detector resolution effects on the muon momentum measurement. A double-sided Crystal Ball function is used for the Higgs signal model. This function is a modification of the Crystal Ball function [119,120], and consists of a Gaussian central part with a power-law tail on each side.
For each of the 20 categories, the signal parameters are fitted to signal MC spectra summed over all production modes (ggF, VBF, V H, tt H) assuming the relative normalisations as predicted by the SM. Within each category no significant differences are found between the signal shapes of the different production modes, indicating that the signal parameterisation is not sensitive to the assumption on their relative normalisations. The width of the Gaussian component of the double-sided Crystal Ball function varies between 2.6 and 3.2 GeV depending on the category. Potential biases in the extracted signal yields due to the analytic parameterisations are tested with a signal injection procedure: in a signal-plus-background fit to pseudo-data constructed from the expected signal and background distributions, the extracted signal yields agree with those injected within the statistical accuracy of about 0.3%.
Several sources of systematic uncertainty in the signal modelling are considered, including both theoretical and experimental effects. The theoretical uncertainties in the signal production affect the number of signal events expected in each category. The uncertainties considered for the main production modes (ggF and VBF) include the impact of the missing higher-order QCD corrections, PDFs, the underlying event and hadronisation. In particular, the uncertainty in the ggF signal is derived using the approach described in Ref.
[11], including effects from the variation of QCD scales for factorisation, renormalisation and resummation, and the migration between jet-multiplicity regions [121][122][123][124][125][126][127][128]. The uncertainty in the ggF Higgs boson transverse momentum, including the effect of migration between different kinematic regions and of the treatment of the top-quark mass in the loop corrections, is also taken into account. In addition, dedicated uncertainties are assigned for the ggF signal acceptance in VBF topologies. The uncertainties in the predicted SM branching ratio and Higgs boson production cross sections are included in accord with Ref. [11]. The uncertainties associated with the modelling of the underlying event and parton showering are estimated by considering the Pythia 8 systematic eigentune variations and by comparing events showered by Pythia 8 with those showered by Herwig 7 [129,130]. The impact of the theory uncertainties on the predicted signal acceptances in the different categories ranges between a few per mill and 15% for ggF production. Similarly, for the VBF production the impact of the theory uncertainties on the predicted signal acceptances varies between a few per mill and 7%. For the V H and tt H categories the theory systematic uncertainties have an impact on the predicted signal acceptances of between a few per mill and about 18%.
Systematic uncertainties related to the different reconstructed physics objects used in the analysis affect the expected signal yields in each category. In addition, systematic uncertainties in the muon momentum scale and resolution also affect the signal mass distribution. The experimental uncertainties considered are the muon reconstruction and identification efficiencies, the efficiencies due to the trigger, isolation and impact parameter requirements, the muon momentum scale and resolution [100,131,132], the determination of the E miss T soft term [110], the b-tagging efficiency [109], the uncertainty in the number of tracks associated with the jets [117], the pile-up modelling [90], and uncertainties in the electron reconstruction and identification efficiency [103] as well as in the jet reconstruction efficiency, energy scale and resolution [133]. The impact of the experimental uncertainties on the predicted signal yields and modelling in the different categories is dominated by the uncertainties in the jet energy scale and resolution and the muon momentum resolution. The former can affect signal yields by up to about 10% in some of the 2-jet categories. The muon momentum resolution uncertainty has an impact on the fitted yields ranging between 1% and 6% depending on the category.
The experimental uncertainty of 240 MeV in the assumed value of the Higgs mass from Ref. [5] is also taken into account. All these sources of uncertainty are included in the signal extraction fit described in Section 7 through nuisance parameters acting on the relative signal yields in the different categories and on the signal mass distributions.

Background modelling
Due to the very small signal-to-background ratio, which is at the level of 0.2% in the region m μμ = 120-130 GeV in an inclusive selection, an accurate determination of the background is of paramount importance. The m μμ background spectrum is parameterised by analytic functions that can describe this distribution at the per-mill level to avoid a significant bias in the extracted signal yields. The mass range used for the fit, m μμ = 110-160 GeV, is optimised to obtain the best signal sensitivity taking into account the statistical and systematic uncertainties. For the ggF and VBF categories, the background is dominated by the DY process, which accounts for more than 90% of the total, with small contributions from top-quark processes (mainly in the 2-jet categories) and diboson production. In the tt H and V H categories, the dominant backgrounds are associated production of tt Z and V Z with Z → μμ, respectively, while the DY process, tt production and other diboson processes give minor contributions.
To achieve the required accuracy in the analytic description of the background m μμ distribution, the following approach is used. A core function that describes the DY mass shape inclusively is multiplied by an empirical function that can correct for distortions of the mass shape due to the event selection and categorisation, higher-order theory corrections and other smaller background contributions. The empirical functions chosen are also flexible enough to describe the background shape in categories where the dominant background is not the DY process. The core function has no free parameters and is common to all categories, while the empirical functions have a certain number of free parameters that are selected and fit to data independently in each category.
The core component of the background is an analytical function based on a LO DY line-shape, described in Appendix A, convoluted with detector effects. The experimental resolution in the dimuon invariant mass is found to have an important effect on the core function, since it produces a significant shape variation in the mass region just above the Z -boson resonance and thus influences the lower end of the fit region in the H → μμ search. To take this effect into account, the LO DY line-shape is convolved with a Gaussian function with a mass-dependent resolution derived from the simulation.
The core function is multiplied by the empirical component to obtain the final background parameterisation used in the fits to the m μμ spectra. Two families of functions are studied for this empirical component: power-law functions ('Power') and exponentials of polynomials ('Epoly'), as defined in Table 2.
The criteria used to select the background functions from among those listed above and to determine the associated systematic uncertainty, referred to as the spurious signal (SS) [1], are described in the following. The SS yields are taken as the measured signals obtained in signal-plus-background fits to the backgroundonly MC templates. They are determined not only for a signal mass of 125 GeV, but also for values of m H between 120 and 130 GeV in steps of 1 GeV. The templates derived from fast and full simulation DY samples are reweighted using first-or second-order polynomial functions in m μμ to the data sidebands for all these studies.
As a first requirement, only functions able to fit the data sidebands, the fully simulated background samples and the fast DY simulation with a χ 2 probability of the fit greater than 1% (for all these samples) are considered. For the tt H and V H categories, only the data sidebands and the fully simulated background samples are considered for these criteria, and the DY contribution is neglected since it is very small and subject to large statistical fluctuations.
For the functions that satisfy these criteria, a spurious-signal test is performed separately in each category. For the ggF and VBFenriched categories the primary fast-simulation DY sample based on Sherpa as described in Section 3 is used, since it has high statistical precision, while for the tt H and the V H categories the fully simulated non-DY background samples are used. Only the functions with the absolute value of the SS below 20% of the expected signal statistical error in data in the mass range 120 to 130 GeV are considered. When applying this requirement, the MC statistical error is subtracted from the absolute value of the SS. Among the functions that pass this requirement, those with the smallest number of degrees of freedom are selected in each category to minimise the statistical uncertainty that dominates in this search. If more than one function per category passes this last selection, the one with the smallest SS is selected. The maximum absolute value of the SS in the mass range 120-130 GeV is taken to be the background modelling uncertainty for the respective category.
As an additional cross-check, the SS tests for the ggF and VBFenriched categories are also performed on the fully simulated Sherpa DY samples and the alternative fast DY simulation based on the merged Powheg-Box and Alpgen DY samples as explained in Section 3. Further cross-checks are performed with the fast DY simulation after applying several theoretical variations, such as changes of the QCD renormalisation and factorisation scales by factors of two and one half and alternative PDF sets, and experimental variations of the muon momentum resolution and scale and the pile-up jet modelling within the experimental uncertainties. In all these checks, no statistically significant increase in the SS values is found, hence they are not included as additional systematic uncertainty since their impact would be negligible. The SS systematic uncertainty also addresses any potential local biases in the mass spectra close to the signal region caused by the experimental selections, such as the BDT score requirements or the lepton pairing procedure in the V H categories.
After applying the above criteria, there is no evidence of statistically significant mismodelling, as no SS values are found that are more than two standard deviations away from zero for a signal mass of 125 GeV. This considers the statistical accuracy of the fast DY simulation that is about ten times better than that of the data. All the SS are considered as uncorrelated systematic uncertainties among the different categories. If the SS uncertainties were considered as fully correlated between categories, the expected significance would change by less than 2%.
The SS uncertainties in the different categories range from a few per cent up to about 20% of the expected data statistical uncertainties in the VBF and ggF categories and up to about 30% in the V H and tt H categories, which have less statistical precision in their background simulated samples.

Other systematic uncertainties
In addition to the systematic uncertainties in the signal and background modelling described above, the uncertainty of 1.7% in the combined 2015-2018 integrated luminosity is also considered. It is derived from the calibration of the luminosity scale using x-y beam-separation scans [134], obtained using the LUCID-2 detector [135] for the primary luminosity measurements.

Results
The signal yield is obtained by a simultaneous binned maximum-likelihood fit to the m μμ distributions of the 20 categories in the range 110-160 GeV. The chosen bin size is 0.1 GeV. Confidence intervals are based on the profile-likelihood-ratio test statistic [136]. The systematic uncertainties listed in Section 6  + S/B), where S are the observed signal yields and B are the background yields derived from the fit to data in the m μμ = 120-130 GeV window. The background and signal pdf are derived from the fit to the data, with S normalised to its best-fit value. The lower panels compare the fitted signal pdf, normalised to the signal best-fit value, to the difference between the data and the background model. The error bars represent the data statistical uncertainties.

Table 3
Number of events observed in the m μμ = 120-130 GeV window in data, the number of signal events expected in the SM (S SM ), and events from signal (S = μ × S SM ) and background (B) as derived from the combined fit to the data with a signal strength parameter of μ = 1.2. The uncertainties in S SM correspond to the systematic uncertainty of the SM prediction, the uncertainty in S is given by that in μ, and the uncertainty in B is given by the sum in quadrature of the statistical uncertainty from the fit and the SS uncertainty. In addition the observed number of signal events divided by the square root of the number of background events (S/ √ B) and the signal-to-background ratio (S/B) in % for each of the 20 categories described in the text are displayed. In the last column, the width of the Gaussian component of the double-sided Crystal Ball function used in the signal modelling (σ , as described in Section 6) is reported.

Category
Data The best-fit value of the signal strength parameter, defined as the ratio of the observed signal yield to the one expected in the SM, is μ = 1.2 ± 0.6, corresponding to an observed (expected) significance of 2.0σ (1.7σ ) with respect to the hypothesis of no H → μμ signal. The spectra of the dimuon invariant mass for all the analysis categories after the signal-plus-background fit are presented in Fig. 1. In Fig. 1 Table 3.
The best-fit values of the signal strength parameters for the five major groups of categories (tt H + V H, ggF 0-jet, 1-jet, 2-jet, and VBF) are shown in Fig. 2 together with the combined value. A goodness-of-fit test is performed using the saturated model technique [137] and returns a probability of 10%.
The signal strength uncertainty is dominated by the data statistical error of about ±0.58. The impact of the systematic uncertainties on the signal strength is found to be +0.18 −0.13 , with contributions from the signal theory uncertainties that account for +0.13 −0.08 , the signal experimental uncertainties that account for +0.07 −0.03 and the spurious-signal uncertainties that account for ±0.10.
The compatibility of the measured signal strengths between the 20 categories is tested by repeating the fit after allowing each category to have its own signal strength parameter. The probability of compatibility is found to be at the level of 2%. With the same methodology, the probability of compatibility between the signal strengths of the five groups of categories shown in Fig. 2 is found to be 20%. Among the 20 categories, those with an individual signal strength at the level of two standard deviations from the mean value are the VBF Medium, the 0-jet Very High and the VH4L. For each of these three categories, it was checked that adding one degree of freedom to the function used to model the background or changing the functional form from 'Power' to 'Epoly' does not significantly impact the analysis results or the probability of compatibility between the 20 categories.
An upper limit on the signal strength μ is computed using a modified frequentist CL s method [136,138]. The observed upper limit on μ at 95% confidence level (CL) is found to be 2.2, with an expected limit of 1.1 for the case of no H → μμ signal and an expected limit of 2.0 for the case of an H → μμ signal at SM strength. The corresponding branching ratio upper limit at 95% CL is B(H → μμ) < 4.7 · 10 −4 , assuming the SM cross section for Higgs boson production.
This result represents an improvement of about a factor of 2.5 in expected sensitivity compared with the previous ATLAS publication [14]. Of this improvement, a factor of about two is due to the larger analysed dataset and the additional 25% improvement can be attributed to more advanced analysis techniques.

Conclusion
A search for the rare dimuon decay of the Higgs boson is performed using the full Run 2 dataset of 139 fb −1 collected with the ATLAS detector in pp collisions at √ s = 13 TeV at the LHC.
The best-fit value for the SM H → μμ signal strength parameter is found to be μ =

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The parton luminosity contribution L qq is derived from the PDF4LHC15 PDF set as a function of ŝ = m 2 μμ using APFEL [140] interfaced to LHAPDF [141] and parameterised using a 6th order polynomial. The matrix element component σ qq (ŝ) = σ qq (m μμ )/ (2m μμ ) can be expressed as The ATLAS Collaboration Here Q , V , A denote the electric charges, vector and axial-vector couplings of the fermions, α, G F the electroweak couplings, m Z , Z the mass and width of the Z -boson using values from Ref. [142] and N c = 3 the number of QCD colour charges. The DY function described above is then convolved with a Gaussian function with a mass-dependent resolution derived from the simulation.