Evidence for Higgs boson decays to a low-mass dilepton system and a photon in pp collisions at $\sqrt{s} =$ 13 TeV with the ATLAS detector

A search for the Higgs boson decaying into a photon and a pair of electrons or muons with an invariant mass $m_{\ell\ell}<30$ GeV is presented. The analysis is performed using 139 fb$^{-1}$ of proton-proton collision data, produced by the LHC at a centre-of-mass energy of 13 TeV and collected by the ATLAS experiment. Evidence for the $H \rightarrow \ell \ell \gamma$ process is found with a significance of 3.2$\sigma$ over the background-only hypothesis, compared to an expected significance of 2.1$\sigma$. The best-fit value of the signal strength parameter, defined as the ratio of the observed signal yield to the one expected in the Standard Model, is $\mu = 1.5 \pm 0.5$. The Higgs boson production cross-section times the $H \rightarrow\ell\ell\gamma$ branching ratio for $m_{\ell\ell}<$ 30 GeV is determined to be 8.7 $^{+2.8}_{-2.7}$ fb.


Introduction
In July 2012, the ATLAS and CMS Collaborations at the CERN Large Hadron Collider (LHC) announced the discovery of a new particle with a mass of approximately 125 GeV [1,2]. The observed properties of the particle, such as its couplings to Standard Model (SM) elementary particles, its spin and its parity, are so far consistent with the predictions for the SM Higgs boson [3-7].
Measurements of rare decays of the Higgs boson, such as $H \rightarrow \ell\ell\gamma$ where $\ell$ is an electron or muon, can probe coupling modifications introduced by possible extensions to the SM [8]. In addition, such three-body Higgs boson decays can be used to probe CP violation in the Higgs sector [9,10].
Multiple processes contribute to the $H \rightarrow \ell\ell\gamma$ decay: Dalitz decays involving a $Z$ boson or a virtual photon ($\gamma^{*}$) (Figure 1(a-c)), as well as the decay of the Higgs boson to two leptons and a photon from final-state radiation (FSR) (Figure 1(d)). Their respective fractions depend on the invariant mass of the dilepton pair, $m_{\ell\ell}$. In this analysis only low-mass dilepton pairs with $m_{\ell\ell} < 30$ GeV are considered. This region is completely dominated by the decay through $\gamma^{*}$ [8,11,12]. The contributions of the other processes and interferences are negligible. Based on a data sample of proton-proton ($pp$) collisions at $\sqrt{s} = 13$ TeV with an integrated luminosity of 35.9 fb$^{-1}$, the CMS Collaboration reported a 95% CL upper limit on the production cross-section times branching ratio for the low-$m_{\mu\mu}$ $H \rightarrow \mu\mu\gamma$ process of 4.0 times the SM prediction [13]. In addition, both the ATLAS and CMS Collaborations carried out searches at $\sqrt{s} = 13$ TeV for the closely related $H \rightarrow Z\gamma$ process [13,14]. The CMS Collaboration also searched for the low-$m_{\ell\ell}$ $H \rightarrow \ell\ell\gamma$ process in the dimuon and dielectron channels in $pp$ collisions at $\sqrt{s} = 8$ TeV [15].
This paper describes a search for $H \rightarrow ee\gamma$ and $H \rightarrow \mu\mu\gamma$ decays with $m_{\ell\ell} < 30$ GeV. When the invariant mass of the two electrons is low and the transverse momentum of the dielectron system is high, their electromagnetic showers can overlap in the calorimeter. Therefore, the search for $ee\gamma$ final states requires the development of dedicated electron trigger and identification algorithms.
The search uses $pp$ collision data at $\sqrt{s} = 13$ TeV recorded with the ATLAS detector during Run 2 of the LHC between 2015 and 2018, corresponding to a total integrated luminosity of 139 fb$^{-1}$. The sensitivity of the search is enhanced by dividing the selected events into mutually exclusive categories, according to the event topology and lepton flavour. The dominant background is the irreducible non-resonant production of $\ell\ell\gamma$. After event categorisation, the signal yield is extracted by a simultaneous fit of parametric functions to the reconstructed $\ell\ell\gamma$ invariant mass ($m_{\ell\ell\gamma}$) distributions in all categories.

ATLAS detector
The ATLAS detector [16] covers nearly the entire solid angle around the collision point. It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic (EM) and hadron calorimeters, and a muon spectrometer (MS). The ID covers the pseudorapidity range $|\eta| < 2.5$. It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. The silicon pixel detector provides up to four measurements per track. The insertable B-layer (IBL) [17,18] constitutes the innermost layer at a radius of 33.3 mm. It surrounds the beam pipe, which has an inner radius of 23.5 mm. Lead/liquid-argon (LAr) sampling calorimeters provide EM energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range ($|\eta| < 1.7$). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to $|\eta| = 4.9$. For $|\eta| < 2.5$, the EM calorimeter is divided into three longitudinal layers, which are finely segmented in $\eta$ and $\phi$, particularly in the first layer. This segmentation allows the measurement of the lateral and longitudinal shower profile, and the calculation of shower shapes [19] used for particle identification and background rejection. The longitudinal segmentation of the EM calorimeter is also exploited to calibrate the energy response of electron and photon candidates [19]. The MS comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by superconducting air-core toroids. The precision chamber system covers the region $|\eta| < 2.7$ with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range $|\eta| < 2.4$ with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
A two-level trigger system [20] was used during the $\sqrt{s} = 13$ TeV data-taking period. The first-level trigger (L1) is implemented in hardware and uses a subset of the detector information. This is followed by a software-based high-level trigger which runs algorithms similar to those in the offline reconstruction software, reducing the event rate to approximately 1 kHz from the maximum L1 rate of 100 kHz.

Data and simulated event samples
The analysed $pp$ collision data correspond to the full recorded LHC Run-2 data set at $\sqrt{s} = 13$ TeV with an integrated luminosity of 139 fb$^{-1}$ after data quality requirements [21]. The events were collected with a combination of single-lepton, dilepton, diphoton, and lepton+photon triggers. Standard electron triggers, which require narrow, isolated EM energy clusters, are efficient if the two electrons in the event have a very small angular separation $\Delta R$, as the two electrons produce a cluster that is similar to that of a single electron. For $\Delta R > 0.1$, two separate EM clusters are typically produced. For the 2017 and 2018 data-taking periods, a dedicated trigger was introduced which requires at least one photon with [...]; details are provided in Ref. [22]. The overall trigger efficiencies for the $H \rightarrow ee\gamma$ and $H \rightarrow \mu\mu\gamma$ processes are 98% and 96%, respectively (97% combined), relative to the event selection discussed in Section 5. The uncertainty in the trigger efficiency is negligible.
Samples of simulated Monte Carlo (MC) events are crucial for the optimisation of the search strategy and the modelling of the signal and background processes. Simulated $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ signal and $H \rightarrow \gamma\gamma$ background events are used to parameterise these processes, and simulated non-Higgs $\ell\ell\gamma$ events are used to choose analytic functional forms to describe the non-resonant background. The generated MC events, unless stated otherwise, were processed with the full ATLAS detector simulation [23] based on Geant4 [24]. The effect of multiple interactions in the same and neighbouring bunch crossings (pile-up) was included by overlaying minimum-bias events simulated with Pythia 8.186 [25] using the NNPDF2.3LO parton distribution function (PDF) set [26] and the A3 set [27] of tuned parameters. The MC events were weighted to reproduce the distribution of the number of interactions per bunch crossing observed in the data.
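The pile-up reweighting mentioned above can be illustrated with a minimal sketch. It assumes that the weight in each bin of the number of interactions per bunch crossing is the ratio of the normalised data and simulation distributions. The function name and the histogram contents are hypothetical and are not taken from the analysis software.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights that reshape the simulated pile-up profile to the data profile.

    Both inputs are lists of bin contents over the same binning of the
    number of interactions per bunch crossing (illustrative assumption).
    """
    data_total = sum(data_hist)
    mc_total = sum(mc_hist)
    weights = []
    for n_data, n_mc in zip(data_hist, mc_hist):
        if n_mc == 0:
            weights.append(0.0)  # no simulated events to reweight in this bin
        else:
            weights.append((n_data / data_total) / (n_mc / mc_total))
    return weights


# Hypothetical profiles, for illustration only
data_hist = [10, 30, 40, 20]
mc_hist = [25, 25, 25, 25]
weights = pileup_weights(data_hist, mc_hist)
```

Applying these weights event by event leaves the total simulated yield unchanged while matching the shape of the data distribution.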
Higgs boson production in the gluon-gluon fusion (ggF) and vector-boson fusion (VBF) production modes, as well as in quark-initiated associated production with a $W$ or $Z$ boson ($q\bar{q}/qg \rightarrow VH$) or with two top quarks ($t\bar{t}H$) was modelled with the Powheg-Box v2 MC event generator [28-32]. Powheg-Box v2 was interfaced with Pythia 8 [25] to simulate the $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ and $H \rightarrow \gamma\gamma$ decays. Pythia also provides parton showering, hadronisation and the underlying event. The PDF4LHC15 PDF set [33] was used, except for $t\bar{t}H$, where the NNPDF3.0nlo PDF set [26] was used. For the ggF process, the Powheg NNLOPS program [34,35] achieves next-to-next-to-leading-order (NNLO) accuracy in QCD for inclusive observables after reweighting the Higgs boson rapidity spectrum [36]. The simulation reaches next-to-leading-order (NLO) accuracy in QCD for the VBF, $VH$, and $t\bar{t}H$ processes. For $VH$, the MiNLO technique [37-39] was applied. No simulated samples are available for gluon-initiated associated production with a $Z$ boson ($gg \rightarrow ZH$) and associated production with two $b$-quarks ($b\bar{b}H$). Their minor contribution, 1% of the expected Higgs boson production, is modelled using the acceptances from the $q\bar{q}/qg \rightarrow ZH$ and ggF samples, respectively.
Alternative $H \rightarrow \gamma\gamma$ samples are considered for the ggF and VBF processes, where Herwig 7 [40] was used instead of Pythia 8 to provide parton showering. Weights were calculated by comparing the generator-level Higgs boson transverse momentum and jet distributions in these samples with the distributions obtained from the nominal $H \rightarrow \gamma\gamma$ samples. The weights are applied to the $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ ggF and VBF samples to estimate the effect of varying the parton shower and underlying event.
The mass of the Higgs boson was set in the simulation to $m_H = 125$ GeV and its width to $\Gamma_H = 4.1$ MeV [41]. The Higgs mass peak position in the simulation was corrected to account for the small difference relative to the measured value of 125.09 GeV [42] (see Section 6); otherwise the effect on the kinematics is neglected.
There are multiple calculations of the $H \rightarrow \ell\ell\gamma$ branching ratios for different slices of phase space [8,11,12,73-76]. The $H \rightarrow \ell\ell\gamma$ branching ratios used in this analysis are $\mathcal{B}(H \rightarrow ee\gamma) = 7.20 \times 10^{-5}$ and $\mathcal{B}(H \rightarrow \mu\mu\gamma) = 3.42 \times 10^{-5}$, as estimated with Pythia 8 for $m_{\ell\ell} < 30$ GeV. Their sum corresponds to $\sim$5% of the $H \rightarrow \gamma\gamma$ branching ratio. When extrapolated to a common phase space, the Pythia 8 estimate agrees with the predictions of Refs. [74,76] within a relative difference of 3%. Currently, no theoretical uncertainty calculation exists for the low-$m_{\ell\ell}$ $H \rightarrow \ell\ell\gamma$ branching ratio. Therefore, the theoretical uncertainties in the $H \rightarrow \gamma\gamma$ and $H \rightarrow Z\gamma$ branching ratios are considered [12], and the larger of the two uncertainties is used, which corresponds to the 5.8% relative uncertainty in the $H \rightarrow Z\gamma$ branching ratio [41].
The background originates mainly from non-resonant $\ell\ell\gamma$ production. Events were simulated with the Sherpa 2.2.8 [77] generator based on LO matrix elements for $\ell\ell\gamma$ production with up to three additional partons and using the NNPDF3.0 PDF set [78]. The Sherpa simulation includes parton shower, fragmentation and underlying-event modelling. As the statistical uncertainties in the simulated $\ell\ell\gamma$ background samples are a limiting factor when studying the background modelling at the level required by the small signal-to-background ratio, a procedure was developed to generate significantly larger samples. These are based on events generated using the Sherpa 2.2.8 configuration described above with object efficiencies approximated by parameterisations rather than using the full ATLAS detector simulation and reconstruction software. The parameterisations, extracted from fully simulated MC samples, reproduce the reconstruction and selection efficiencies of detector-level objects via event weighting. Comparisons with a sample that underwent the full ATLAS detector simulation, and whose effective luminosity is greater than the data luminosity, show good agreement within the statistical uncertainties.
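As a quick arithmetic check of the statement that the two branching ratios sum to about 5% of the diphoton branching ratio, the sketch below uses the two values quoted above. The $H \rightarrow \gamma\gamma$ branching ratio of $2.27 \times 10^{-3}$ is not given in the text; it is an assumed input, taken here as the commonly quoted SM value for a Higgs boson mass near 125 GeV.

```python
# Branching ratios quoted in the text for m_ll < 30 GeV
BR_EE_GAMMA = 7.20e-5
BR_MUMU_GAMMA = 3.42e-5

# Assumption (not stated in the text): SM H -> gamma gamma branching ratio
BR_GAMMA_GAMMA_ASSUMED = 2.27e-3

br_sum = BR_EE_GAMMA + BR_MUMU_GAMMA
ratio_to_diphoton = br_sum / BR_GAMMA_GAMMA_ASSUMED

print(f"sum of branching ratios: {br_sum:.3e}")
print(f"fraction of assumed diphoton branching ratio: {ratio_to_diphoton:.1%}")
```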

Object selection
Events are required to have a collision vertex associated with at least two tracks with transverse momentum $p_{\mathrm{T}} > 0.5$ GeV each. In the case of multiple vertices, the vertex with the largest $\sum p_{\mathrm{T}}^{2}$ of the associated tracks is considered to be the primary vertex.
Muon candidates are obtained by matching high-quality tracks in the MS and ID. Standalone MS tracks are used to extend the muon reconstruction beyond the ID acceptance to $2.5 < |\eta| < 2.7$. Muon candidates are required to satisfy the medium identification criteria [79,80], be within $|\eta| < 2.7$ and have $p_{\mathrm{T}} > 3$ GeV. Muon candidates with an associated ID track must be matched to the primary vertex by having a longitudinal impact parameter $\Delta z_{0}$ that satisfies $|\Delta z_{0} \cdot \sin\theta| < 0.5$ mm, where $\theta$ is the polar angle of the track. The significance of the transverse impact parameter $d_{0}$ calculated relative to the measured beam-line position is required to be $|d_{0}|/\sigma_{d_{0}} < 3$, where $\sigma_{d_{0}}$ is the uncertainty in $d_{0}$. A subset of the muon candidates are also required to be isolated from additional activity in the tracking detector and in the calorimeters, using a loose isolation selection [79,80]. The efficiency of the muon reconstruction and identification, as well as the momentum calibration, including the associated systematic uncertainties, is estimated as described in Refs. [79,80].
Photon and electron candidates are reconstructed from energy clusters in the calorimeters which are formed using an algorithm based on dynamical, topological cell-clustering [19]. Candidates in the transition region between the barrel and endcap EM calorimeters, $1.37 < |\eta| < 1.52$, are excluded. The performance of the electron and photon reconstruction, including the associated systematic uncertainties, is studied in Ref. [19].
Photon candidates can be either unconverted or converted. In the latter case, a track or conversion vertex is matched to the EM cluster. Photon candidates are required to satisfy $p_{\mathrm{T}} > 20$ GeV and $|\eta| < 2.37$, and pass the tight identification requirements [19,81] as well as a loose isolation selection [19].
The energy of the EM clusters associated with the photon and electron candidates is corrected in successive steps using a combination of simulation-based and data-driven correction factors [19]. The simulation-based calibration regression is optimised separately for electrons, unconverted photons and converted photons. The resolution of the energy response is corrected in the simulation to match the resolution observed in data using $Z \rightarrow ee$ events, by smearing the electron energy such that the width of the simulated $Z$ boson peak matches the width observed in data.
Because of the event kinematics of the signal process, it is common for the energy deposits of the two electrons in the EM calorimeter to be reconstructed as a single cluster. Therefore, two types of electron candidates are defined, each with its own selection criteria: one in which a topological cluster of energy deposits is associated with one selected ID track (resolved electron [82]), representing a single electron, and one in which a topological cluster is associated with two selected ID tracks (merged $ee$), representing a merged electron pair. Each ID track considered must satisfy $|\Delta z_{0} \cdot \sin\theta| < 0.5$ mm and $|d_{0}|/\sigma_{d_{0}} < 5$. Resolved electron candidates must satisfy the medium likelihood-based identification criteria [19,82], have $p_{\mathrm{T}} > 4.5$ GeV and be within $|\eta| < 2.47$. A subset of the resolved electron candidates are also required to pass a loose isolation requirement [19].
ID tracks considered for merged electrons must have opposite charge, $p_{\mathrm{T}} > 5$ GeV, $|\eta| < 2.5$, and at least seven hits in the pixel and microstrip detectors combined. To suppress backgrounds from converted photons, tracks must also have a hit in the innermost pixel layer, and merged-electron candidates are rejected if they match a conversion vertex with a radius larger than 20 mm whose momentum agrees better with the cluster energy than the momentum of the track that geometrically matches the cluster best. Because the kinematic behaviour of merged electrons in the calorimeter most closely resembles a photon converting in the material close to the interaction vertex, the energy of the merged-$ee$ object is calibrated as a converted photon with a conversion radius set to $r_{\mathrm{conv}} = 30$ mm calculated relative to the measured beam-line position. As presented in Figure 2(a), this treatment is found to induce minimal bias in the dielectron energy measurement. To cover remaining differences between the simulated detector response to converted photons and merged-$ee$ objects, the difference between the squares of the energy resolutions for converted photons and merged-$ee$ objects in the MC simulation is treated as the square of an additional energy resolution uncertainty. The four-momentum of the merged-$ee$ candidate is constructed using the calibrated energy and the direction and invariant mass obtained from the vertex reconstructed from the two electron-track candidates [83].
Merged-$ee$ candidates are required to have $|\eta| < 2.37$ (excluding $1.37 < |\eta| < 1.52$) and $p_{\mathrm{T}} > 20$ GeV, and satisfy dedicated identification requirements, as the standard electron criteria have a low efficiency for objects with closely spaced energy deposits or broader EM showers. For the merged-$ee$ identification, a multivariate discriminator is trained to separate the $\gamma^{*} \rightarrow ee$ signal objects from jets or single electrons. The input variables for the training include shower shape variables [82], the information provided by the transition radiation tracker [84], and the kinematic information from the cluster and ID tracks. Merged-$ee$ candidates are also required to pass a tight isolation requirement [19]. A combined efficiency of $\sim$50% for the merged-$ee$ identification and isolation is achieved for $H \rightarrow \gamma^{*}\gamma \rightarrow ee\gamma$ events. Since photons with a relatively small conversion radius offer a signature similar to that of merged-$ee$ objects, the merged-$ee$ identification and isolation efficiency is measured in data using a tag-and-probe method with FSR photons from $Z$ boson radiative decays ($Z \rightarrow \ell\ell\gamma$). Candidate $Z \rightarrow \ell\ell\gamma$ events are selected similarly to those in Ref. [81]. Only two-track converted photons with $r_{\mathrm{conv}} < 160$ mm, corresponding to conversions inside the silicon pixel detector volume, are considered. The $Z \rightarrow \ell\ell\gamma$ and background yields are estimated from a fit to the $m_{\ell\ell\gamma}$ distribution in data. The extracted efficiencies are compared with the efficiencies estimated from simulated $Z \rightarrow \ell\ell\gamma$ events as shown in Figure 2(b) for photons with $|\eta| < 0.8$. The resulting $p_{\mathrm{T}}$- and $\eta$-dependent data/simulation scale factors are between 0.9 and 1.1 and are used to correct the identification and isolation efficiencies of the simulated $H \rightarrow \gamma^{*}\gamma \rightarrow ee\gamma$ events. The statistical uncertainties of the scale factors are taken into account. In addition, a systematic uncertainty is assessed for the background modelling by varying the selection criteria of the $m_{\ell\ell\gamma}$ background template. The total uncertainty reaches 2% for $20 < p_{\mathrm{T}} < 30$ GeV and 9% for $p_{\mathrm{T}} > 50$ GeV.
Figure 2(b) also shows a comparison of the extracted efficiencies with efficiencies in simulated $H \rightarrow \gamma^{*}\gamma \rightarrow ee\gamma$ events where an additional generator-level requirement of $|\Delta\eta| < 0.003$ is used. This requirement is applied in order to better match the signal signature to the converted photon signature in the detector; approximately 70% of the merged-$ee$ objects in the signal sample pass it. The efficiencies of converted photons and the merged-$ee$ objects agree within 10% for $p_{\mathrm{T}} < 30$ GeV and within 5% for $p_{\mathrm{T}} > 30$ GeV in the entire $\eta$ range.
Figure 2: (a) Ratio of reconstructed to true merged-$ee$ energy in simulated $H \rightarrow \gamma^{*}\gamma \rightarrow ee\gamma$ events as a function of the true merged-$ee$ $p_{\mathrm{T}}$ for several energy calibration techniques. The merged-$ee$ object is calibrated as a photon with a conversion radius of 30 mm (black circles, analysis choice), 100 mm (red squares), and 400 mm (blue upward triangles) or as an electron (purple downward triangles). (b) Combined merged-$ee$ identification and isolation efficiency extracted from $Z \rightarrow \ell\ell\gamma$ events with a photon that converts at a radius of $r_{\mathrm{conv}} < 160$ mm. The efficiencies are shown for photons with $|\eta| < 0.8$ as a function of $p_{\mathrm{T}}$. Data (black circles) are compared with simulated $Z \rightarrow \ell\ell\gamma$ events (red squares). The resulting efficiencies are also compared with efficiencies in simulated $H \rightarrow \gamma^{*}\gamma \rightarrow ee\gamma$ events (blue triangles). On all points, the vertical error bars indicate the statistical uncertainties.
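The primary-vertex choice described above can be sketched as follows. The representation of a vertex as a plain list of track transverse momenta is hypothetical, and the sketch assumes that only tracks passing the 0.5 GeV threshold enter the sum of squared transverse momenta, which the text does not state explicitly.

```python
def select_primary_vertex(vertices, min_tracks=2, min_track_pt=0.5):
    """Return the index of the primary vertex, or None if no vertex qualifies.

    vertices: list of vertices, each given as a list of track pT values in GeV
    (an illustrative data layout, not the ATLAS event data model).
    """
    best_index, best_sum_pt2 = None, -1.0
    for index, track_pts in enumerate(vertices):
        good_tracks = [pt for pt in track_pts if pt > min_track_pt]
        if len(good_tracks) < min_tracks:
            continue  # fewer than two tracks above threshold
        sum_pt2 = sum(pt * pt for pt in good_tracks)
        if sum_pt2 > best_sum_pt2:
            best_index, best_sum_pt2 = index, sum_pt2
    return best_index


# Hypothetical event with three vertex candidates
vertices = [[0.6, 0.7, 0.8], [20.0, 15.0], [50.0]]
primary = select_primary_vertex(vertices)
```

In this example the third candidate has the hardest track but only one track, so it does not qualify, and the second candidate is chosen.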

Event selection
Candidate $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ events must have at least one reconstructed photon, and at least one opposite-charge, same-flavour pair of leptons (muons or resolved electrons) or one merged-$ee$ object. One of the muons (resolved electrons) in the lepton pair must have $p_{\mathrm{T}} > 11$ (13) GeV, to match the $p_{\mathrm{T}}$ thresholds used in the dilepton triggers. As discussed above, the merged-$ee$ $p_{\mathrm{T}}$ is required to be larger than 20 GeV. If the leading photon overlaps with one of the EM clusters (resolved electrons or merged-$ee$) forming the $\gamma^{*}$ candidate within $\Delta R < 0.02$, the photon is discarded and the next-highest-$p_{\mathrm{T}}$ photon is considered. No isolation requirement is applied to the subleading lepton (muon or resolved electron) as it is within the isolation cone of the leading lepton in the majority of events. Additionally, if the subleading lepton (muon or electron) falls within the isolation cone of the leading lepton, it is not included in the calculation of the isolation variable.
Muon pairs are given the highest priority if there are lepton pairs of different flavours in one event. If no opposite-charge muon pair satisfying the requirements above is found, the electron-pair candidates are considered. The resolved electron pair or merged-$ee$ object with the highest vector-sum of the $p_{\mathrm{T}}$ of the associated ID tracks is selected.
In order to suppress events arising from FSR processes, events are rejected if the photon is within a $\Delta R = 0.4$ cone around either of the selected leptons. If the axis of a jet is within $\Delta R = 0.4$ of the photon or a muon, or within $\Delta R = 0.2$ of an electron (merged or resolved), the jet is discarded. However, if the electron-jet angular separation is $0.2 < \Delta R < 0.4$, the electron and therefore the event is discarded.
To suppress events involving $Z$ boson decays, the dilepton invariant mass must satisfy $m_{\ell\ell} < 30$ GeV.
The invariant mass of the $\ell\ell\gamma$ system is required to satisfy $110 < m_{\ell\ell\gamma} < 160$ GeV. Additionally, both the photon and dilepton momenta must satisfy $p_{\mathrm{T}} > 0.3 \cdot m_{\ell\ell\gamma}$.
The selected events are classified into mutually exclusive categories, depending on the lepton types and event topologies. The VBF-enriched categories, which have the best expected signal-to-background ratio, but the lowest event count, are defined as follows. Events must contain at least two jets with $p_{\mathrm{T}} > 25$ GeV. If the leading or subleading jet is a forward jet (defined as a jet with $|\eta| > 2.5$), it is required to have $p_{\mathrm{T}} > 30$ GeV to suppress jets originating from pile-up. In addition, the invariant mass of the two leading jets, $m_{jj}$, must be greater than 500 GeV, and the pseudorapidity separation between the two leading jets, $|\Delta\eta_{jj}|$, greater than 2.7. The quantity $|\eta_{\ell\ell\gamma} - 0.5(\eta_{j1} + \eta_{j2})|$ [88] is required to be less than 2.0, where $\eta_{\ell\ell\gamma}$ is the pseudorapidity of the $\ell\ell\gamma$ system and $\eta_{j1}$ ($\eta_{j2}$) is the pseudorapidity of the leading (subleading) jet. The selected leptons and the photon must be separated from the two jets by $\Delta R > 1.5$. Additionally, the azimuthal separation between the $\ell\ell\gamma$ system and the system formed by the two jets must be greater than 2.8. These jet variables are expected to have a different shape for the VBF signal and the background from QCD processes.
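The angular-separation rules above can be written as a short sketch. The dictionary layout of the objects, the function names and the order in which the rules are applied (FSR veto, then jet removal, then electron veto) are assumptions made for illustration and are not taken from the analysis code.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def _dr(a, b):
    return delta_r(a["eta"], a["phi"], b["eta"], b["phi"])


def apply_overlap_rules(photon, leptons, jets):
    """Return (keep_event, surviving_jets) for one candidate event."""
    # FSR suppression: reject if the photon lies within dR = 0.4 of a lepton.
    if any(_dr(photon, lep) < 0.4 for lep in leptons):
        return False, []
    surviving = []
    for jet in jets:
        near_photon = _dr(jet, photon) < 0.4
        near_muon = any(l["flavour"] == "muon" and _dr(jet, l) < 0.4 for l in leptons)
        near_electron = any(l["flavour"] == "electron" and _dr(jet, l) < 0.2 for l in leptons)
        if not (near_photon or near_muon or near_electron):
            surviving.append(jet)
    # A surviving jet within dR < 0.4 of an electron (so beyond 0.2)
    # removes the electron and therefore the event.
    for jet in surviving:
        if any(l["flavour"] == "electron" and _dr(jet, l) < 0.4 for l in leptons):
            return False, []
    return True, surviving
```

A jet close to the photon is simply dropped, whereas a jet at an intermediate distance from an electron causes the whole event to be rejected.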
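The mass and transverse-momentum requirements can be illustrated with a minimal sketch that builds four-vectors from $(p_{\mathrm{T}}, \eta, \phi)$. The helper names are hypothetical, leptons and photons are treated as massless by default for simplicity, and the example kinematics are invented.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) in GeV from pT, pseudorapidity and azimuth."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(*vectors):
    energy, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    return math.sqrt(max(energy * energy - px * px - py * py - pz * pz, 0.0))


def transverse_momentum(*vectors):
    return math.hypot(sum(v[1] for v in vectors), sum(v[2] for v in vectors))


def passes_mass_selection(lep1, lep2, photon):
    """Apply m_ll < 30 GeV, 110 < m_llg < 160 GeV and pT > 0.3 * m_llg."""
    m_ll = invariant_mass(lep1, lep2)
    m_llg = invariant_mass(lep1, lep2, photon)
    pt_ll = transverse_momentum(lep1, lep2)
    pt_gamma = transverse_momentum(photon)
    return (m_ll < 30.0 and 110.0 < m_llg < 160.0
            and pt_ll > 0.3 * m_llg and pt_gamma > 0.3 * m_llg)


# Invented example: collinear lepton pair recoiling against a photon
lep1 = four_vector(40.0, 0.0, math.pi)
lep2 = four_vector(22.5, 0.0, math.pi)
photon = four_vector(62.5, 0.0, 0.0)
selected = passes_mass_selection(lep1, lep2, photon)
```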
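The VBF-enriched requirements listed above can be collected into one function, as in the sketch below. It treats jets as massless when computing the dijet mass and assumes that an event fails if a forward leading or subleading jet does not reach 30 GeV; both are simplifying assumptions, and the tuple layouts are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def is_vbf_enriched(jets, objects, llg_eta, llg_phi):
    """jets: (pt, eta, phi) tuples ordered by decreasing pT.
    objects: (eta, phi) tuples for the selected leptons and the photon."""
    selected = [jet for jet in jets if jet[0] > 25.0]
    if len(selected) < 2:
        return False
    (pt1, eta1, phi1), (pt2, eta2, phi2) = selected[0], selected[1]
    # Forward jets (|eta| > 2.5) must have pT > 30 GeV.
    for pt, eta in ((pt1, eta1), (pt2, eta2)):
        if abs(eta) > 2.5 and pt <= 30.0:
            return False
    # Dijet invariant mass for massless jets.
    m_jj = math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))
    if m_jj <= 500.0 or abs(eta1 - eta2) <= 2.7:
        return False
    if abs(llg_eta - 0.5 * (eta1 + eta2)) >= 2.0:
        return False
    # Leptons and photon must be well separated from both jets.
    for eta, phi in objects:
        for jet_eta, jet_phi in ((eta1, phi1), (eta2, phi2)):
            if math.hypot(eta - jet_eta, delta_phi(phi, jet_phi)) <= 1.5:
                return False
    # Azimuthal separation between the llg system and the dijet system.
    dijet_phi = math.atan2(pt1 * math.sin(phi1) + pt2 * math.sin(phi2),
                           pt1 * math.cos(phi1) + pt2 * math.cos(phi2))
    return abs(delta_phi(llg_phi, dijet_phi)) > 2.8
```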
The $p_{\mathrm{T}t}$ is defined as the component of the transverse momentum of the $\ell\ell\gamma$ system that is perpendicular to the difference of the three-momenta of the dilepton system and the photon candidate. This quantity is strongly correlated with the transverse momentum of the $\ell\ell\gamma$ system, but has better experimental resolution [89,90]. Events failing the VBF-enriched selection, but having $p_{\mathrm{T}t} > 100$ GeV, are assigned to the high-$p_{\mathrm{T}t}$ category, which has a lower expected signal-to-background ratio than the VBF-enriched category, but is expected to have more events. The high-$p_{\mathrm{T}t}$ category is expected to have an increased fraction of VBF and $VH$ events as these production modes lead on average to higher Higgs boson $p_{\mathrm{T}}$ than ggF production.
Table 1: [...] GeV. In addition, the following numbers are given: number of $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ events in the smallest $m_{\ell\ell\gamma}$ window containing 90% of the expected signal ($S_{90}$), the non-resonant background in the same interval ($B_{90}$) as estimated from fits to the data sidebands using the background models described in Section 6, the resonant background in the same interval ($B_{H \rightarrow \gamma\gamma}$), the expected signal purity $f_{90} = S_{90}/(S_{90} + B_{90})$, and the expected significance estimate [...].
The vast majority of selected events do not fall into the VBF-enriched or high-$p_{\mathrm{T}t}$ categories discussed above, and are placed into the low-$p_{\mathrm{T}t}$ category. The full list of all categories considered in the analysis, together with the expected yields for a 125.09 GeV Higgs boson decaying into $\ell\ell\gamma$, is shown in Table 1. The table also summarises the observed number of events in data in the $m_{\ell\ell\gamma}$ mass range of 110-160 GeV.
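A minimal sketch of the $p_{\mathrm{T}t}$ calculation and of the category assignment is given below. It assumes the commonly used transverse-plane form, in which the axis is built from the difference of the dilepton and photon transverse momenta; the function names and the example momenta are hypothetical.

```python
import math


def p_tt(ll_px, ll_py, gamma_px, gamma_py):
    """Component of the llg transverse momentum perpendicular to the axis
    defined by the difference of the dilepton and photon transverse momenta."""
    axis_x, axis_y = ll_px - gamma_px, ll_py - gamma_py
    norm = math.hypot(axis_x, axis_y)
    if norm == 0.0:
        return 0.0  # degenerate axis; treated as zero in this sketch
    sum_x, sum_y = ll_px + gamma_px, ll_py + gamma_py
    # Magnitude of the 2D cross product with the unit axis vector.
    return abs(sum_x * axis_y - sum_y * axis_x) / norm


def assign_category(is_vbf_enriched, ptt):
    """Mutually exclusive topology categories, checked in priority order."""
    if is_vbf_enriched:
        return "VBF-enriched"
    if ptt > 100.0:
        return "high-pTt"
    return "low-pTt"


# Invented example: dilepton and photon at right angles in the transverse plane
example_ptt = p_tt(100.0, 0.0, 0.0, 100.0)
example_category = assign_category(False, example_ptt)
```

When the dilepton system and the photon are exactly back to back in the transverse plane, the system transverse momentum is parallel to the axis and $p_{\mathrm{T}t}$ vanishes.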

Signal and background modelling
To extract the observed signal yield, parametric functions are chosen to model the $\ell\ell\gamma$ invariant mass distributions of signal and background in each analysis category. A combined model is built from these functions and fitted to the selected data in the $m_{\ell\ell\gamma}$ range of 110-160 GeV, simultaneously in all categories.
The signal model, including its parameters, is obtained by fitting a double-sided Crystal Ball function (DSCB) [91,92] to the $m_{\ell\ell\gamma}$ distribution obtained from the $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ samples described in Section 3 after applying the event selection and categorisation from Section 5. The DSCB function features a Gaussian core and asymmetric power-law tails. A shift of +0.09 GeV is applied to the mean of the Gaussian function to account for the fact that the sample assumes a Higgs boson mass of 125 GeV. The effective signal mass resolution, defined as half the width containing 68% of the signal events, depends on the category and lies between 1.6 GeV ($ee$-merged high-$p_{\mathrm{T}t}$ category) and 2.2 GeV ($ee$-resolved low-$p_{\mathrm{T}t}$ category).
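For readers unfamiliar with the DSCB shape, the sketch below gives one standard unnormalised form: a Gaussian core joined continuously to a power-law tail on each side. The parameter names and the numerical values used in the example are illustrative assumptions; the fitted parameters of the analysis are not given in the text.

```python
import math


def dscb(m, mean, sigma, alpha_lo, n_lo, alpha_hi, n_hi):
    """Unnormalised double-sided Crystal Ball shape evaluated at mass m."""
    t = (m - mean) / sigma
    if t < -alpha_lo:
        # Low-mass power-law tail, continuous with the core at t = -alpha_lo.
        base = (n_lo / alpha_lo) / (n_lo / alpha_lo - alpha_lo - t)
        return math.exp(-0.5 * alpha_lo ** 2) * base ** n_lo
    if t > alpha_hi:
        # High-mass power-law tail, continuous with the core at t = +alpha_hi.
        base = (n_hi / alpha_hi) / (n_hi / alpha_hi - alpha_hi + t)
        return math.exp(-0.5 * alpha_hi ** 2) * base ** n_hi
    return math.exp(-0.5 * t * t)  # Gaussian core


# Simulated peak position plus the +0.09 GeV shift quoted in the text
SHIFTED_MEAN = 125.0 + 0.09
# Hypothetical shape parameters, chosen only for illustration
peak_value = dscb(SHIFTED_MEAN, SHIFTED_MEAN, 1.8, 1.5, 5.0, 1.5, 5.0)
```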

The $H \rightarrow \gamma\gamma$ process, which contributes as background to the electron categories through converted photons, is modelled using the same functions and parameters as the signal in the respective categories, and is normalised to the predicted SM yield. The parameterisations are compatible with the statistically limited distributions obtained from simulated $H \rightarrow \gamma\gamma$ events. This background contribution is relatively small (<2.5% and <7% of the expected $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ signal in the $ee$-resolved channel and $ee$-merged channel, respectively) and is taken into account in the fitting procedure.
The non-resonant background is also estimated with parametric functions. The background normalisation and function parameters are allowed to float in the fit to data. The chosen functional forms are based on background templates that are built taking into account the contributions of different processes to the background. The function choice is performed separately for each category.
The dominant part of the background originates from the non-resonant $\ell\ell\gamma$ process. There is also a smaller background from events with misidentified photons, electrons, or muons. To estimate the fraction of events with a misidentified photon, a control region is formed using the signal selection, but dropping the photon isolation requirement and using it as a discriminating variable in a template fit. A background template enriched in events with misidentified photons is built by inverting the photon identification selection, with the prompt-photon contamination removed by subtracting its distribution found in simulated events. The template normalisation is obtained in the background-dominated sideband of the photon isolation distribution for each category separately. In the signal region, about 10% of all selected events have a misidentified photon, independently of the category. The fraction of events with a misidentified electron or muon that is in fact a hadronic jet, a lepton from heavy-flavour decays or an electron from a photon conversion, is estimated in a similar manner, using a control region in which the isolation selection is dropped for the softer lepton. The estimated fractions of events with misidentified leptons are 4% in the $\mu\mu$ low-$p_{\mathrm{T}t}$ category, 2% in the $ee$-merged low-$p_{\mathrm{T}t}$ category, and 30% in the $ee$-resolved low-$p_{\mathrm{T}t}$ category. Events in the last category are separated into two groups depending on the angular distance between the electrons, as two populations with different misidentified-electron rates and mass distributions were found. The numbers in the other categories are extracted as well, but suffer from fairly large statistical uncertainties.
The invariant mass template for the non-resonant $\ell\ell\gamma$ background is built from the simulated events described in Section 3. The invariant mass templates for events with misidentified objects are obtained from background-dominated control regions, and scaled by the yields derived above. Reasonable agreement is observed between the templates containing the sum of all backgrounds and the sidebands of the $m_{\ell\ell\gamma}$ distributions in data (105-120 GeV and 130-160 GeV).
The choice of fit function for the non-resonant background is made in each category using signal-plus-background fits to the constructed background-only template by measuring the bias associated with each function, expressed as the number of fitted signal events, and choosing the function with the smallest number of degrees of freedom that satisfies the bias criteria described below.
The functional forms used to model the background are selected from the following: exponential ($e^{a\,m_{\ell\ell\gamma}}$), exponential of a second-order polynomial ($e^{a\,m_{\ell\ell\gamma} + b\,m_{\ell\ell\gamma}^{2}}$), and a power-law function ($m_{\ell\ell\gamma}^{a}$), where $a$ and $b$ are free parameters. Signal hypotheses with $m_H$ ranging from 121 GeV to 129 GeV are tested in steps of 1 GeV, and the fit bias is evaluated as the absolute maximum bias over this range (referred to as the 'spurious signal' in the following), similar to what is done in Ref. [1]. A function passes the test if the spurious signal is less than 10% of the expected number of $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ events or less than 20% of the statistical uncertainty of the fitted spurious signal due to the expected number of background events. To account for statistical fluctuations in the background template, the criteria are relaxed by the statistical uncertainty due to the template [93]. Furthermore, in a background-only fit of the template in the mass range $110 < m_{\ell\ell\gamma} < 160$ GeV, the function must pass a $\chi^{2}$ test with a probability larger than 1%.
As described in Ref. [93], an additional check is performed by fitting both the chosen function and a function belonging to the same family with one more degree of freedom to the sidebands of the $m_{\ell\ell\gamma}$ distribution in data, to ensure the data do not prefer the more complex function. Following the procedure described above, the power-law function is selected for all categories, except for the $\mu\mu$ low-$p_{\mathrm{T}t}$ and $ee$-merged low-$p_{\mathrm{T}t}$ categories, which are best described by the second-order exponential polynomial, and the $ee$-resolved VBF-enriched category, for which the exponential function is chosen.
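The candidate background shapes and the spurious-signal criterion can be sketched as follows. The way the criterion is relaxed here, by subtracting the template statistical uncertainty from the absolute spurious signal before comparing with the thresholds, is an assumption made for illustration; Ref. [93] should be consulted for the actual prescription. All names and numbers are hypothetical.

```python
import math


def exponential(m, a):
    return math.exp(a * m)


def exp_poly2(m, a, b):
    return math.exp(a * m + b * m * m)


def power_law(m, a):
    return m ** a


def passes_spurious_signal_test(fitted_signal_by_mass, expected_signal,
                                stat_uncertainty, template_uncertainty=0.0):
    """fitted_signal_by_mass maps each tested m_H (121..129 GeV) to the
    signal yield fitted on the background-only template."""
    spurious = max(abs(n) for n in fitted_signal_by_mass.values())
    # Assumed form of the relaxation for template fluctuations.
    relaxed = max(spurious - template_uncertainty, 0.0)
    return relaxed < 0.10 * expected_signal or relaxed < 0.20 * stat_uncertainty


# Invented fit results for three of the tested mass hypotheses
fitted = {121: 0.5, 125: -1.2, 129: 0.3}
strict = passes_spurious_signal_test(fitted, expected_signal=10.0, stat_uncertainty=5.0)
relaxed = passes_spurious_signal_test(fitted, expected_signal=10.0, stat_uncertainty=5.0,
                                      template_uncertainty=0.5)
```

Among the functions that pass, the one with the fewest free parameters would then be retained, in line with the selection rule stated in the text.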
An unbinned extended likelihood function is formed from the product of each category's parameterised signal-plus-background probability density function. Systematic uncertainties are considered in the form of nuisance parameters with Gaussian or log-normal constraints. They are correlated across all categories, except for the spurious-signal uncertainties. The latter are implemented for each category as a signal-like component with a yield parameter that is constrained by a Gaussian function, centred at zero, with a width corresponding to the estimated spurious signal. The Higgs boson mass is set to 125.09 ± 0.24 GeV [42].
The parameter of interest is the signal strength , which is defined as the ratio of the measured signal yield to the SM expectation, taking into account the uncertainties on the latter. The corresponding profile likelihood ratio is maximised to extract the best-fit [94]. A possible excess over the background-only hypothesis is quantified by a -value using the profile likelihood ratio, evaluated for a vanishing → ℓℓ branching ratio, as a test statistic. For all results, the asymptotic approximation is used [94]. Cross-checks of the best-fit and the -value extracted with pseudo-experiments show good agreement with those obtained using the asymptotic approximation.
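In the asymptotic approximation, the significance $Z$ is the square root of the discovery test statistic and is related to the one-sided $p$-value through the standard normal distribution. The short sketch below illustrates this conversion with the Python standard library, using the observed significance of 3.2 quoted in the abstract as input; the function names are illustrative.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def significance_from_q0(q0):
    """Asymptotic discovery significance, Z = sqrt(q0) for q0 >= 0."""
    return math.sqrt(max(q0, 0.0))


def p_value_from_significance(z):
    """One-sided tail probability of a standard normal above z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


def significance_from_p_value(p):
    return STANDARD_NORMAL.inv_cdf(1.0 - p)


observed_p = p_value_from_significance(3.2)
print(f"p-value for Z = 3.2: {observed_p:.2e}")
```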
The measurement of the signal strength can be converted into a measurement of the Higgs boson production cross-section times the $H \rightarrow \ell\ell\gamma$ branching ratio in the fiducial region $m_{\ell\ell} < 30$ GeV. In this case, the included theory uncertainties are adjusted as discussed in Section 7, and the acceptance is extracted with the simulated samples described in Section 3.

Systematic uncertainties
The total observed systematic uncertainty in the signal strength measurement is 11%, which is about 35% of the size of the statistical uncertainty. Therefore, systematic uncertainties do not play a dominant role in this analysis.
The dominant experimental systematic uncertainties are due to the estimated biases in the fitted signal events (spurious signal, see Section 6). The corresponding uncertainty in the observed signal strength amounts to 6.1%. Other non-negligible systematic uncertainties relate to photon and lepton identification efficiencies in the simulated signal samples, in particular for merged objects, as well as the energy/momentum calibration (see Section 4). Including jet uncertainties, which have a much smaller impact, these add up to 4.0%.
The uncertainty in the combined 2015-2018 integrated luminosity is 1.7% [95], obtained using the LUCID-2 detector [96] for the primary luminosity measurements. Uncertainties in the pile-up modelling contribute 1.7%. Uncertainties in the estimate of the $H \rightarrow \gamma\gamma$ background have a small impact of 0.7%. They include uncertainties in the photon-electron fake rate, and the uncertainties in the $\sigma(pp \rightarrow H) \times B(H \rightarrow \gamma\gamma)$ measurement [93].
The assumed uncertainty in the $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ branching ratio contributes 5.8%. The choice of QCD scales impacts the number and distribution of signal events in the different categories and is evaluated for the ggF, VBF, and VH production modes using a scheme similar to the one discussed in Ref. [93]. The corresponding uncertainty in the measured signal strength is 4.7%. Uncertainties in the PDF are evaluated using the eigenvectors of the PDF4LHC15 PDF set [33] and have a smaller effect of 2.3%. A conservative uncertainty of 50% is assigned to the normalisation of the $t\bar{t}H$, $gg \rightarrow ZH$, and $b\bar{b}H$ production modes, as no dedicated $H \rightarrow \gamma^{*}\gamma \rightarrow \ell\ell\gamma$ samples are available, with an impact of 0.8%. Parton shower uncertainties contribute only 0.3%.
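As a rough cross-check, the individual contributions quoted in this section can be combined in quadrature and compared with the total systematic uncertainty of 11%. This is a sketch under the assumption that the listed sources are uncorrelated and form the complete list; the actual result comes from the profile likelihood fit, not from this sum. The dictionary keys are labels chosen here for readability.

```python
import math

# Relative impacts on the signal strength, in percent, as quoted in the text.
contributions_percent = {
    "spurious signal": 6.1,
    "identification, calibration and jets": 4.0,
    "luminosity": 1.7,
    "pile-up modelling": 1.7,
    "H -> gamma gamma background": 0.7,
    "branching ratio": 5.8,
    "QCD scale": 4.7,
    "PDF": 2.3,
    "minor production modes": 0.8,
    "parton shower": 0.3,
}


def quadrature_sum(values):
    """Combine uncorrelated uncertainties by adding them in quadrature."""
    return math.sqrt(sum(v * v for v in values))


total_percent = quadrature_sum(contributions_percent.values())
print(f"Quadrature sum: {total_percent:.1f}%")
```

Under these assumptions the quadrature sum comes out close to the quoted total of 11%.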
For the $\sigma(pp \rightarrow H) \times B(H \rightarrow \ell\ell\gamma)$ measurement, the effect of the theory uncertainties is reduced to 1.1% for the QCD scale and 0.9% for the PDF uncertainty, as only acceptance effects are considered, whereas the uncertainties in the predicted cross-sections and branching ratio are not applicable to this measurement.

Results
The $m_{\ell\ell\gamma}$ distributions of the selected events and the result of the global fit of the parametric signal-plus-background models to the data are shown in Figure 3 for each event category.
The best-fit value of the signal-strength parameter is $\mu = 1.5 \pm 0.5 = 1.5 \pm 0.5$ (stat.) $^{+0.2}_{-0.1}$ (syst.), while the corresponding expected SM value is $\mu_{\text{exp}} = 1.0 \pm 0.5 = 1.0 \pm 0.5$ (stat.) $^{+0.2}_{-0.1}$ (syst.). The best-fit signal strength in the muon (electron) channel, obtained from a fit with two separate signal-strength parameters, is $\mu_{\mu\mu} = 1.9 \pm 0.7$ ($\mu_{ee} = 1.0 \pm 0.7$). Figure 4 shows the results of the fit when the signal strength in each category is allowed to float independently. As anticipated in Table 1, the low-$p_{\text{T}t}$ categories, especially in the $\mu\mu$ channel, have the smallest uncertainties. It can be seen that all categories yield results that are consistent with each other and with the result of the single-$\mu$ fit.
For illustration, Figure 5 shows the $m_{\ell\ell\gamma}$ distribution, with every data event reweighted by a category-dependent weight $\ln(1 + S_{90}/B_{90})$, where $S_{90}$ is the number of signal events in the smallest window containing 90% of the expected signal and $B_{90}$ is the expected number of background events in the same window, which consists of the resonant $H \rightarrow \gamma\gamma$ background as well as the non-resonant background. The latter is estimated from fits to the data sidebands using the background models.
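The event reweighting can be sketched as follows. The category names, yields and event masses used in the example are invented placeholders, not values from the analysis, and the helper names are hypothetical.

```python
import math


def category_weight(s90, b90):
    """Weight ln(1 + S90/B90) for a category with S90 signal and B90 background events."""
    return math.log(1.0 + s90 / b90)


def weighted_histogram(events, weights, low, high, n_bins):
    """Fill a uniform-bin histogram of three-body mass with per-category weights.

    events  -- iterable of (mass_in_GeV, category_name) pairs
    weights -- mapping from category_name to its weight
    """
    width = (high - low) / n_bins
    hist = [0.0] * n_bins
    for mass, category in events:
        if low <= mass < high:
            # Guard against rounding pushing a value at the upper edge out of range.
            index = min(int((mass - low) / width), n_bins - 1)
            hist[index] += weights[category]
    return hist


# Placeholder inputs for illustration only.
yields = {"cat_a": (10.0, 100.0), "cat_b": (5.0, 500.0)}
weights = {name: category_weight(s, b) for name, (s, b) in yields.items()}
events = [(125.0, "cat_a"), (126.0, "cat_a"), (112.0, "cat_b"), (170.0, "cat_b")]
hist = weighted_histogram(events, weights, 110.0, 160.0, 10)
```

Categories with a larger signal-to-background ratio receive larger weights, so the weighted distribution emphasises the most sensitive categories.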
The observed (expected) significance over the background-only hypothesis for a Higgs boson with a mass of 125.09 GeV is 3.2$\sigma$ (2.1$\sigma$).

Conclusion
A search for the Higgs boson decaying into a low-mass pair of electrons or muons and a photon is presented. The analysis is performed using a data set recorded by the ATLAS experiment at the LHC with proton-proton collisions at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 139 fb$^{-1}$. For a Higgs boson with a mass of 125.09 GeV and $m_{\ell\ell} < 30$ GeV, evidence for the $H \rightarrow \ell\ell\gamma$ process is found with a significance of 3.2$\sigma$ over the background-only hypothesis, compared to an expected significance of 2.1$\sigma$. The best-fit value of the signal-strength parameter, defined as the ratio of the observed signal yield to the signal yield expected in the Standard Model, is $\mu = 1.5 \pm 0.5$. The Higgs boson production cross-section times the $H \rightarrow \ell\ell\gamma$ branching ratio for $m_{\ell\ell} < 30$ GeV is determined to be 8.7 $^{+2.8}_{-2.7}$ fb. This result constitutes the first evidence for the decay of the Higgs boson into a pair of leptons and a photon, an important step towards probing Higgs boson couplings in this rare decay channel.