Search for an exotic decay of the Higgs boson into a Z boson and a pseudoscalar particle in proton-proton collisions at $\sqrt{s}$ = 13 TeV

A search for an exotic decay of the Higgs boson to a Z boson and a light pseudoscalar particle (a), decaying to a pair of leptons and a pair of photons, respectively, is presented. The search is based on proton-proton collision data at a center-of-mass energy of $\sqrt{s}$ = 13 TeV, collected with the CMS detector and corresponding to an integrated luminosity of 138 fb$^{-1}$. The analysis probes pseudoscalar masses $m_\mathrm{a}$ between 1 and 30 GeV, leading to two pairs of well-isolated leptons and photons. Upper limits at 95% confidence level are set on the Higgs boson production cross section times its branching fraction to two leptons and two photons. The observed (expected) limits are in the range of 1.1-17.8 (1.7-17.9) fb within the probed $m_\mathrm{a}$ interval. An excess of data above the expected standard model background with a local (global) significance of 2.6 (1.3) standard deviations is observed for a mass hypothesis of $m_\mathrm{a}$ = 3 GeV. Limits on models involving axion-like particles, formulated as an effective field theory, are also reported.


Introduction
Following the discovery of the Higgs (H) boson by the ATLAS and CMS Collaborations [1][2][3] at the CERN LHC, a thorough program of precision measurements [4,5] was carried out to uncover possible deviations from the predictions of the standard model (SM) of particle physics and to decipher the nature of the scalar sector. In particular, deviations in the H boson decay width or the observation of exotic decay modes would constitute evidence of beyond-the-SM (BSM) physics.
Axion-like particles, referred to here as ALPs (a), were originally proposed to address the strong CP problem [6]. Recently, it was shown that ALPs could explain the observed anomaly in the magnetic moment of the muon [7]. Theoretical overviews of ALP models can be found in Refs. [8,9]. The models are formulated as an effective field theory of ALPs coupled to various SM particles. In particular, these models allow couplings amongst the H boson, the Z boson, and the ALP fields, where the Z boson is found to be longitudinally polarized [10]. Several searches involving ALPs, targeting different processes and final states, have been performed by the ATLAS and CMS Collaborations [11][12][13][14][15][16][17][18][19]. In particular, the search for H → aa → γγγγ has probed the existence of pseudoscalars with masses as low as 0.1 GeV [18].
In this Letter, we report a search for an exotic decay of the H boson to an ALP and a Z boson, where the ALP decays to a pair of photons, and the Z boson decays to electrons or muons. The dominant Feynman diagram contributing to this process is shown in Fig. 1. Such a final state, with two charged leptons and two photons, results in an experimental signature that has a low cross section in the SM [20], and provides a complementary channel for the search for ALPs. It is the first search of this type at the LHC. Assuming that the narrow-width approximation is valid for decays of the ALP and that the Z boson is on-shell, only the mass range m_a < m_H − m_Z ≈ 34 GeV is kinematically accessible to the H → Za decay, where m_H and m_Z are the H and Z boson masses, respectively. For m_a below 1 GeV, the two photons originating from the ALP decay can no longer be separated in the detector [21]. Consequently, in this analysis, the mass range 1 < m_a < 30 GeV is considered. The tabulated results are provided as HEPData records [22].

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a 3.8 T magnetic field. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [23].
Events of interest are selected using a two-tiered trigger system. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of 4 µs [24]. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage [25].

Data and simulated samples
The proton-proton (pp) collision data at √s = 13 TeV used in this analysis were collected in 2016-2018 and correspond to a total integrated luminosity of 138 fb⁻¹.
Signal samples for the process pp → H → Za → ℓℓγγ (where ℓ = e, µ, and leptonic τ decays are included in the signal) are generated at leading order (LO) for the m_a range of 1-10 GeV in 1 GeV steps, and 10-30 GeV in 5 GeV steps, using the MADGRAPH5 aMC@NLO 2.2.2 (2.4.2) [26][27][28] generator for the analysis of the data collected in 2016 (2017-2018). The H boson mass is assumed to be 125 GeV, and only the dominant production mode of the H boson, gluon fusion, is considered. Other production modes were evaluated and found to have signal selection efficiencies comparable to the gluon fusion mode, with no more than a 10% relative difference. Consequently, since the final results are normalized to the H boson inclusive production cross section, differences in kinematics between the production modes have a negligible impact (<1%) on the predicted signal normalization.
The predominant background in this search is the SM Drell-Yan production of Z + jets, in which jets are misidentified as photons. As in Refs. [29,30], this background contribution is modeled entirely from data to extract the results. For the optimization of the analysis selection, simulated Drell-Yan events are used. They are generated at next-to-leading order with the MADGRAPH5 aMC@NLO 2.2.2 (2.4.2) [26][27][28] generator for the analysis of 2016 (2017-2018) data.
The set of parton distribution functions (PDFs) NNPDF3.0 [31] (NNPDF3.1 [32]) is used for the 2016 (2017-2018) simulation. Parton showering and hadronization are simulated using the PYTHIA 8.230 generator [33] with the CP5 underlying event tune for the simulation of all three years [34,35]. The response of the CMS detector is modeled using the GEANT4 toolkit [36,37]. Additional interactions in the same or adjacent bunch crossings (pileup) are modeled by superimposing simulated minimum bias collisions on the hard-scattering interaction, with the multiplicity matching that observed in data.

Event reconstruction
The primary vertex is taken to be the vertex corresponding to the hardest scattering in the event, evaluated using tracking information alone [38].
The particle-flow (PF) algorithm [39] aims to reconstruct and identify each individual particle in an event (PF candidate), with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained from the ECAL energy deposition in a supercluster, which combines the deposits from the photon and from conversions produced in the tracker material upstream of the ECAL [40]. The energy of electrons is estimated from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with the origin of the electron track. A multivariate regression technique is employed to correct the photon and electron energies measured in the ECAL. These procedures are described in Ref. [40]. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
Electrons with transverse momentum p_T > 7 GeV and |η| < 2.5 are identified with a multivariate discriminant, which is constructed using variables related to the bremsstrahlung along the electron trajectory, ECAL energy measurements, electromagnetic showers, missing pixel detector hits, and the photon conversion vertex fit probability. The momentum resolution for electrons with p_T ≈ 45 GeV from Z → ee decays, where e refers to e⁺ and e⁻, is in the range 1.7-4.5%. For electrons with p_T ≲ 30 GeV, the momentum resolution is in the range 2.1-6.1% [40]. It is generally better in the barrel than in the endcaps, and also depends on the bremsstrahlung energy emitted by the electron as it traverses the material in front of the ECAL.
Muons are measured in the range |η| < 2.4, with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive-plate chambers. The efficiency of the single-muon trigger, which has a 20 GeV p_T threshold, exceeds 90% over the full η range, and the efficiency to reconstruct and identify muons is greater than 96%. Matching muons to tracks measured in the silicon tracker results in a relative p_T resolution, for muons with p_T up to 100 GeV, of 1% in the barrel and 3% in the endcaps [41].
To reduce the contributions from leptons arising from hadron decays within jets, a requirement is imposed on each lepton candidate using its relative isolation, defined as

I_ℓ = [ Σ_charged p_T^i + max(0, Σ_neutral p_T^i + Σ_γ p_T^i − p_T^PU) ] / p_T^ℓ, (1)

where the sums run over the PF candidates within an angular cone of radius ∆R < 0.3 around the lepton (with ∆R = √((∆η)² + (∆ϕ)²), and ϕ the azimuthal angle in radians), p_T^i represents the p_T of the individual charged hadrons, neutral hadrons, or photons, and p_T^PU is an estimate of the pileup contribution [42]. For electrons, the relative isolation variable, I_e, is included in the multivariate discriminant. For muons, the relative isolation is required to satisfy I_µ < 0.35. In addition, the three-dimensional impact parameter of electrons and muons is required to be consistent with the primary collision vertex.
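As an illustration, this kind of relative isolation can be sketched in a few lines of Python. The dictionary-based candidate format and the factor 0.5 multiplying the pileup sum (a Δβ-style correction) are assumptions made for the sketch, not the exact CMS prescription:

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance dR = sqrt(d_eta^2 + d_phi^2), with d_phi wrapped to [-pi, pi]."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def relative_isolation(lepton, pf_candidates, cone=0.3):
    """Scalar-sum pT of charged hadrons, neutral hadrons, and photons inside the
    cone, with a pileup correction on the neutral component, divided by the
    lepton pT.  Candidates are hypothetical dicts with pt/eta/phi/type keys."""
    charged = neutral = photons = pileup = 0.0
    for c in pf_candidates:
        if delta_r(lepton["eta"], lepton["phi"], c["eta"], c["phi"]) >= cone:
            continue
        if c["type"] == "charged":
            charged += c["pt"]
        elif c["type"] == "neutral":
            neutral += c["pt"]
        elif c["type"] == "photon":
            photons += c["pt"]
        elif c["type"] == "pileup":
            pileup += c["pt"]
    # The neutral component is corrected for pileup and floored at zero.
    return (charged + max(0.0, neutral + photons - 0.5 * pileup)) / lepton["pt"]
```

A muon would then pass the requirement if the returned value is below 0.35.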
An algorithm is used to account for final-state radiation (FSR) from leptons. Photons reconstructed with the PF algorithm are considered as FSR candidates if they satisfy p_T^γ > 2 GeV and I_γ < 1.8, where I_γ is calculated similarly to the lepton isolation variable. Each FSR candidate is then assigned to the closest lepton in the event. The candidates are further required to have ∆R(γ, ℓ)/(p_T^γ)² < 0.012 GeV⁻² and ∆R(γ, ℓ) < 0.5. These FSR photon candidates are excluded from the calculation of the lepton isolation variables. The Higgs boson mass resolution is improved by about 1-4% with this FSR recovery algorithm.
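The FSR selection requirements can be sketched as follows; the dictionary-based event format is again hypothetical, and only the cuts quoted above are implemented:

```python
import math

def delta_r(a, b):
    """Angular distance between two objects with eta/phi keys."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def fsr_candidates(photons, leptons):
    """Attach each loosely isolated photon (pT > 2 GeV, I_gamma < 1.8) to its
    nearest lepton, keeping it only if dR < 0.5 and dR / pT^2 < 0.012 GeV^-2."""
    matched = []
    for ph in photons:
        if ph["pt"] <= 2.0 or ph["iso"] >= 1.8:
            continue
        lep = min(leptons, key=lambda l: delta_r(ph, l))
        dr = delta_r(ph, lep)
        if dr < 0.5 and dr / ph["pt"] ** 2 < 0.012:
            matched.append((ph, lep))
    return matched
```

The returned photon-lepton pairs would then be removed from the lepton isolation sums before applying the isolation requirement.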
Lepton reconstruction and selection efficiencies are measured in data with the "tag-and-probe" technique with an inclusive sample of Z boson events [43]. The differences between the efficiencies in data and simulation are observed to be around 1-4%, depending on the p_T and η of the lepton considered. They are used to correct the lepton efficiencies in simulation.
Prompt photons are distinguished from jets using a cut-based technique [29] that uses information related to the photon isolation variables, the ratio of the energy in the HCAL behind an electromagnetic supercluster to the supercluster energy, and the transverse width of the electromagnetic shower (σ_iηiη). Because the two photons are collimated for low m_a, their σ_iηiη and their photon isolation in the ECAL, which is sensitive to the energy contribution from the nearby photon, become correlated. In this analysis, σ_iηiη and the photon isolation in the ECAL are therefore removed from the identification requirement. Photons are required to lie in the geometrical region |η| < 2.5 and to have p_T > 10 GeV. The photon identification efficiency is measured using Z → ee events with electrons reconstructed as photons, using the tag-and-probe technique.

Event selection optimization
A set of event requirements is defined to maximize the sensitivity to a potential signal. The leading muon (electron) is required to have p_T > 20 (25) GeV and the subleading one p_T > 10 (15) GeV, while each photon is required to have p_T > 10 GeV.
For each event, Z boson candidates are formed by considering all opposite-sign, same-flavor lepton pairs and choosing the pair with the dilepton invariant mass m_ℓℓ closest to the nominal Z boson mass, while ALP candidates are built from the two selected photons with the highest p_T. For each dilepton and diphoton candidate, a Za candidate is constructed. The Za candidates are required to satisfy m_ℓℓ > 50 GeV, 95 < m_ℓℓγγ < 180 GeV, and ∆R(ℓ, γ) > 0.4. The loose dilepton invariant mass requirement, m_ℓℓ > 50 GeV, is applied in order not to introduce any mass peak sculpting in the m_ℓℓγγ distribution for high ALP candidate masses.
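The Z candidate pairing can be sketched as follows in Python; the dictionary-based lepton format is hypothetical, not the CMS event model, and leptons are treated as massless:

```python
import math
from itertools import combinations

M_Z = 91.19  # GeV, nominal Z boson mass

def four_vector(pt, eta, phi, m=0.0):
    """(E, px, py, pz) from collider kinematics."""
    px, py = pt * math.cos(phi), pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px**2 + py**2 + pz**2 + m**2)
    return (e, px, py, pz)

def invariant_mass(p4s):
    """Invariant mass of a list of four-vectors."""
    e, px, py, pz = (sum(c) for c in zip(*p4s))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def best_z_candidate(leptons):
    """Opposite-sign, same-flavor lepton pair with mass closest to M_Z."""
    pairs = [
        (l1, l2) for l1, l2 in combinations(leptons, 2)
        if l1["flavor"] == l2["flavor"] and l1["charge"] != l2["charge"]
    ]
    def pair_mass(pair):
        return invariant_mass(
            [four_vector(l["pt"], l["eta"], l["phi"]) for l in pair])
    return min(pairs, key=lambda p: abs(pair_mass(p) - M_Z))
```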
A boosted decision tree (BDT) event classifier is trained to separate the signal from the background. In order to make the classifier output uniform and sensitive to the full m_a range considered in this search, the training sample is parameterized as a function of m_a [44]. In this approach, a parameter equal to the hypothesized pseudoscalar mass, m_a,hyp, is provided as one of the inputs to the training. The parameterized classifier requires only a single training and is able to interpolate to intermediate m_a hypotheses that were not used during the training. Simulations corresponding to various m_a hypotheses are combined and treated as a common signal in the training. The value of m_a,hyp is set equal to the mass hypothesis for the signal simulation. For the background, simulated Drell-Yan events are used in the training, and the parameter m_a,hyp is randomly distributed as a flat function in the m_a range. The variables used to train the BDT are chosen such that the value of m_ℓℓγγ cannot be inferred from the inputs. For this purpose, the correlations between the input variables and m_ℓℓγγ are checked, and the variables that have a strong correlation with m_ℓℓγγ are removed. Finally, the following discriminating variables are used as input to the BDT:
• p_T(γ1) and p_T(γ2), where γ1 and γ2 are the leading and subleading photons, respectively;
• R_9(γ1) and R_9(γ2): the energy sum of the 3×3 crystal array centered on the most energetic crystal in the supercluster, divided by the energy of the supercluster;
• σ_iηiη(γ1) and σ_iηiη(γ2): the second moment of the log-weighted distribution of crystal energies in η, calculated in the 5×5 matrix around the most energetic crystal in the supercluster and rescaled to units of crystal size;
• I_γ(γ1) and I_γ(γ2): the isolation variable obtained by summing the p_T of photons inside an isolation cone of ∆R = 0.3 around the photon direction, including the contribution of the other selected photon;
• I_γ,a: the isolation variable obtained by summing the p_T of photons inside an isolation cone of ∆R = 0.3 around the direction of the ALP candidate;
• the angular separation between the Z boson and the diphoton pair, ∆R(Z, a);
• the angular separation between the two photons, ∆R(γ1, γ2);
• the angular separation between the leading photon and the Z boson, ∆R(γ1, Z);
• the ALP candidate's p_T divided by m_ℓℓγγ;
• the H boson candidate's p_T;
• the difference between the invariant mass of the ALP candidate and the m_a,hyp parameter, divided by m_ℓℓγγ: (m_a − m_a,hyp)/m_ℓℓγγ.
Because the leptons are selected with tight requirements, lepton identification criteria are not included as discriminating variables. In addition, since the angular variables rank low among the BDT input variables, the difference between the pseudoscalar and scalar assumptions is negligible. The simulated signal and background data sets from the different data-taking years are scaled by the corresponding integrated luminosities and combined into a large training sample. Additionally, signal and background events are divided into independent training and testing samples to confirm the absence of overtraining.
The four most discriminating variables, shown in Fig. 2, are (m_a − m_a,hyp)/m_ℓℓγγ, the leading photon's σ_iηiη, the subleading photon's σ_iηiη, and the leading photon's R_9. The distributions of the simulated background and the data are found to be in reasonable agreement. Since the background model used to extract the limits is estimated from data (see Section 8), any remaining disagreement between data and simulated background does not bias the final results. The shape discrepancy between the data and simulated background is propagated to the final BDT selection, and its impact is checked to be negligible (<2%).
A unique BDT output is obtained for each m_a hypothesis. Figure 3 shows the distributions of the BDT outputs for data and simulated events for m_a = 1, 10, 20, and 30 GeV.
Events are categorized according to the output of the BDT. The BDT thresholds maximize the approximate median significance (AMS) [45] over all possible categories in the signal region (115 < m_ℓℓγγ < 135 GeV), separately for each m_a. The AMS is defined as

AMS = √( 2 [ (S + B) ln(1 + S/B) − S ] ). (2)

In Eq. (2), S and B refer to the numbers of signal and background (Drell-Yan simulation) events in the signal region. In order to minimize the impact of statistical fluctuations, the BDT output distribution of the background is smoothed using the super-smoothing technique [46,47]. This optimization procedure is performed separately for each m_a hypothesis for up to two categories, based on the BDT score. An increase of less than 1% in the AMS value is observed when increasing the number of categories beyond one. Therefore, a single category based on the BDT output is considered. Table 1 details the category boundaries, the related signal efficiencies, and the Drell-Yan background yields.
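The AMS criterion can be sketched in a few lines of Python; the helper below is illustrative, not analysis code:

```python
import math

def ams(s, b):
    """Approximate median significance for s expected signal and b expected
    background events: AMS = sqrt(2[(s + b) ln(1 + s/b) - s]).
    For s << b this reduces to the familiar s / sqrt(b)."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))
```

Category boundaries would then be chosen to maximize this quantity over the events passing each candidate BDT threshold.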

Statistical procedure
The statistical procedure to extract the results is identical to that described in Ref. [48]. An unbinned maximum likelihood fit is performed on the m_ℓℓγγ distribution in the mass range 95 < m_ℓℓγγ < 180 GeV for each m_a hypothesis, with an m_a granularity of 1 GeV in the range 1-30 GeV. A likelihood function is defined for each m_a hypothesis using analytic models that describe the m_ℓℓγγ distributions of the signal and background processes. Nuisance parameters are included to account for the experimental and theoretical systematic uncertainties described in Section 9. The data collected in 2016-2018 are combined, with the electron and muon channels merged together. The best-fit values and confidence intervals for the parameters of interest are estimated using a profile likelihood test statistic

q(α) = −2 ln [ L(α, θ̂_α) / L(α̂, θ̂) ], (3)

where α̂ and θ̂ denote the unconditional maximum likelihood estimates of the parameters of interest and the nuisance parameters, respectively, whereas θ̂_α corresponds to the conditional maximum likelihood estimate of the nuisance parameters for fixed values of the parameters of interest α.

Signal model
The signal shape for the m_ℓℓγγ distribution, for each nominal signal hypothesis, is constructed from simulation. After all the selection criteria are applied, a signal model is built for each m_a hypothesis and for each data-taking year. The m_ℓℓγγ distribution is modeled with a sum of n Gaussian functions (n < 5). The electron and muon channels are treated separately. These signal models, scaled by the corresponding integrated luminosities, are summed together to construct the final signal model.
The number of Gaussian functions used for the signal modeling is determined with an F-test [49]: the F-test first fixes the order of the Gaussian sum, and the parameters that best fit the signal distribution are then extracted. As an example, the signal models are shown in Fig. 4 for m_a = 30 GeV in the electron and muon channels for the year 2018. The full width at half maximum (FWHM) and the effective standard deviation (σ_eff), defined as half the width of the smallest interval containing 68% of the m_ℓℓγγ distribution, are also shown.
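The logic of an F-test for nested fit models can be sketched as follows. The residual-sum-of-squares interface is an illustrative simplification, not the likelihood-based test actually used in the analysis:

```python
def f_statistic(rss_simple, rss_complex, p_simple, p_complex, n_points):
    """F statistic for comparing two nested fits: does adding parameters
    (e.g. one more Gaussian) reduce the residual sum of squares by more than
    expected from the extra degrees of freedom alone?"""
    num = (rss_simple - rss_complex) / (p_complex - p_simple)
    den = rss_complex / (n_points - p_complex)
    return num / den
```

A large value of the statistic (compared with the appropriate F-distribution quantile) favors keeping the additional Gaussian; otherwise the simpler model is retained.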
To build the signal models for the intermediate mass hypotheses in the range 10 < m_a < 30 GeV, two ingredients must be considered: the shape of the m_ℓℓγγ distribution and its normalization. Since the shape of the m_ℓℓγγ distribution does not significantly depend on m_a in the interpolation range, only the normalization of the signal model is parameterized. Figure 5 shows the product of the efficiency and acceptance as a function of m_a, including the intermediate mass hypotheses. The BDT selection efficiency is higher in regions with lower diphoton background contamination because diphoton-mass information, (m_a − m_a,hyp)/m_ℓℓγγ, is included in the BDT training. Since the diphoton background contamination is minimal for masses around 5 GeV, the product of the efficiency and acceptance has a maximum at m_a ≃ 5 GeV. The photons from the ALP decay are also easier to distinguish in the high-mass region, which simplifies the separation between signal and background by the BDT; this effect leads to an increase of the product of the efficiency and acceptance beyond 20 GeV. The better momentum resolution of muons makes the efficiency times acceptance of the muon channel higher than that of the electron channel. The efficiency decreases, however, in the later data-taking years because of higher pileup. For each intermediate point, a signal model is constructed using the m_ℓℓγγ shape of the nearest mass hypothesis, and the normalization is interpolated from the two nearest mass hypotheses.
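The normalization interpolation can be sketched as a linear interpolation between the two nearest simulated mass points; the (mass, efficiency × acceptance) grid format is hypothetical:

```python
import bisect

def interpolated_normalization(m_a, grid):
    """grid: sorted (mass, efficiency*acceptance) pairs from simulation.
    Intermediate hypotheses reuse the shape of the nearest simulated mass
    point, but their normalization is interpolated linearly between the two
    neighbouring grid points."""
    masses = [m for m, _ in grid]
    if m_a in masses:
        return dict(grid)[m_a]
    hi = bisect.bisect_left(masses, m_a)
    (m0, e0), (m1, e1) = grid[hi - 1], grid[hi]
    return e0 + (e1 - e0) * (m_a - m0) / (m1 - m0)
```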

Background model
The background model is built to describe the shape of the m_ℓℓγγ distribution that comes from processes other than the decay of the H boson. Since the shape of this distribution is not known a priori, different functional forms must be considered. They result in different numbers of estimated background events in the signal region, affecting the signal extraction. The uncertainty arising from the choice of background model is handled with the envelope method [50]: the background modeling is performed using data as in Ref. [29], and the choice of the background functional form is treated as a discrete nuisance parameter in the likelihood fit to data. As shown in Fig. 6, the m_ℓℓγγ distribution consists of a turn-on peak around 105-115 GeV, driven by the photon p_T selection, and a falling spectrum at higher mass. To model the background distribution, functions of the following general form are considered:

f_bkg(m_ℓℓγγ) = N(µ, σ) ⊗ [ Θ(m_ℓℓγγ − s) f(m_ℓℓγγ; α) ], (4)

where N is a Gaussian function with mean µ and standard deviation σ, Θ is the Heaviside step function defining the cutoff point s below which the falling spectrum function f with shape parameters α has no influence, and ⊗ denotes a convolution.
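A numerical sketch of such a turn-on shape, assuming the Gaussian enters as a resolution term convolved with the step-times-falling-spectrum factor (an assumption about the exact composition) and taking a power law as the falling function, can be written as:

```python
import numpy as np

def background_model(m, s, mu_res, sigma_res, alpha):
    """Evaluate a smoothly rising, then falling background shape: a power law
    m**(-alpha) switched on above the cutoff s by a step function, smeared by
    convolution with a Gaussian of mean offset mu_res and width sigma_res.
    Illustrative sketch only; the fit uses analytic function families."""
    grid = np.linspace(80.0, 200.0, 2001)  # evaluation grid in GeV
    falling = np.where(grid > s, grid ** (-alpha), 0.0)
    # Gaussian kernel centered on the middle of the grid (plus any offset),
    # normalized so that the convolution preserves the overall yield.
    gauss = np.exp(-0.5 * ((grid - grid.mean() - mu_res) / sigma_res) ** 2)
    smeared = np.convolve(falling, gauss / gauss.sum(), mode="same")
    return float(np.interp(m, grid, smeared))
```

Below the cutoff the shape falls off within a few resolution widths, producing the turn-on; well above it, the power law dominates.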
The falling spectrum function families considered in the analysis are exponentials, Bernstein polynomials, Laurent series, and power-law functions. A subset of functions from each family is used to build the background model. For each family, the maximum order is chosen with an F-test, and the minimum order is determined by applying a requirement on the goodness of fit to the data. A penalty is added to take into account the number of floating parameters in each candidate function. When measuring a given parameter of interest, the discrete profiling method minimizes the overall negative logarithm of the likelihood over all allowed functions.
The fit is performed over the range 95 < m_ℓℓγγ < 180 GeV, where the background model is extended from the sideband regions (95 < m_ℓℓγγ < 115 GeV and 135 < m_ℓℓγγ < 180 GeV) into the signal region, and data from all data-taking years and from the electron and muon channels are combined to construct the background model. A unique background model is created for each m_a hypothesis. For intermediate m_a hypotheses, the BDT boundary is taken from the closest simulated m_a. Figure 6 shows the best-fit functions for m_a = 1 and 30 GeV.
For each m a hypothesis, an ensemble of pseudo-experiments was generated using the various background functions.Each pseudo-experiment was fitted using the discrete profiling method, and it was verified that the chosen functional form used to describe the background does not introduce any bias in the signal extraction.

Systematic uncertainties
The systematic uncertainty associated with the background modeling is taken into account with the envelope method, as described in Section 8. The signal modeling has systematic uncertainties of two kinds: those that modify the shape of the m_ℓℓγγ distribution, and those that leave the shape of the m_ℓℓγγ distribution unchanged but affect the overall normalization of the signal process.
Uncertainties affecting the shape of the m_ℓℓγγ distribution are typically related to the energy of the individual photons and leptons, and therefore impact the mean and width of the signal model. Their sources are the following:
• Photon energy scale and resolution: corrections are applied to the photon energy scale in data and to the energy resolution in simulation. The uncertainties related to these corrections are computed using Z → ee events [40]. The resulting uncertainty in the signal strength due to the uncertainty in the energy scale and resolution is estimated to be 5.7% (2016), 3.5% (2017), and 4.5% (2018). The uncertainties for each data set are fully correlated;
• Lepton energy scale and resolution: corrections are applied to the lepton energy scale in data and to the energy resolution in simulation. The uncertainties related to these corrections are computed using Z → ℓℓ events [40,41]. The resulting uncertainty in the signal strength due to the uncertainty in the energy scale and resolution is estimated to be 4.3 (4.9)% for 2016, 4.2 (4.4)% for 2017, and 4.9 (5.2)% for 2018 in the electron (muon) channel. The uncertainties for each data set are fully correlated.
The uncertainties that affect the normalization of the signal model are the following:
• Integrated luminosity: uncertainties in the integrated luminosity measurement are estimated to be 1.2% (2016), 2.3% (2017), and 2.5% (2018) [51][52][53]. The uncertainty in the total integrated luminosity of the three years combined is 1.6%. The uncertainties for each data set are partially correlated and account for the common sources in the luminosity measurement schemes;
• Pileup modeling: the total inelastic pp cross section is varied by ±5%, and the analysis is repeated with the shifted weights. The maximum difference of the yield with respect to the nominal yield is taken as the systematic uncertainty. The average magnitude of the resulting uncertainty is below 3% across the full m_a range. The uncertainties for each data set are fully correlated;
• Lepton and photon identification efficiencies: the analysis is rerun by shifting the lepton and photon identification scale factors, which are applied to simulations to match the lepton and photon selection efficiencies in data, by one standard deviation, and the maximum difference of the yield with respect to the nominal yield is taken as the systematic uncertainty. The average magnitude of the resulting uncertainty is around 10% across the full m_a range. The uncertainties for each data set are fully correlated;
• BDT uncertainties: because the BDT score is used to define the signal region, uncertainties in the BDT inputs can lead to event migration across category boundaries. The systematic uncertainties in the input variables, which include the data-simulation differences in the shower shape variables, are propagated to the final BDT selection. Using the same BDT score boundaries to define the signal region, the maximum difference of the yield with respect to the nominal yield is taken as the systematic uncertainty. The resulting uncertainty is less than 2%.
Theoretical PDF and scale uncertainties have a negligible impact on the analysis results and are therefore not applied. A summary of all the systematic uncertainties is given in Table 2. The impact of the systematic uncertainties on the expected upper limit on the cross section is about 1% across the probed m_a range, and the analysis is primarily limited by the statistical uncertainty.

Results and interpretation
Upper limits at 95% confidence level (CL) on the cross section times branching fraction of the H boson decaying to a Z boson and an ALP are evaluated using the CLs criterion, taking the profile likelihood as a test statistic [54][55][56]. In this procedure, an asymptotic approximation for the likelihood is used [45]. The limits are calculated in the mass range 1-30 GeV and are shown in Fig. 7. The observed (expected) limits are in the range 1.1-17.8 (1.7-17.9) fb within the probed m_a interval of 1-30 GeV. An excess of data above the expected SM background with a local (global) significance of 2.6 (1.3) standard deviations [57] is observed for a mass hypothesis of m_a = 3 GeV.
Upper limits at 95% CL are calculated on C_ZH^eff/Λ [8,9], as shown in Fig. 8, where C_ZH^eff is the effective coupling parameter of the H boson, the Z boson, and the ALP, and Λ is the scale of BSM physics. In this interpretation, in order to avoid the issue of ALPs being long-lived in the detector, the ALP is assumed to decay promptly with the branching fraction B(a → γγ) = 100% [19]. The indirect limit from the search for invisible decays of the Higgs boson performed by the CMS Collaboration [58] is also shown. Whereas the expected limits on the H boson production cross section times its branching fraction into a dilepton and a diphoton pair via a Z boson and a pseudoscalar, shown in Fig. 7, do not exhibit a strong mass dependence for m_a > 3 GeV, the limits on C_ZH^eff/Λ are several factors stronger in the low-mass region with respect to m_a = 30 GeV.

Figure 7: Expected and observed upper limits at 95% CL on the H boson production cross section times its branching fraction into two leptons and two photons, as a function of m_a. The dashed black curve is the expected upper limit, while the one and two standard-deviation bands are shown in green and yellow, respectively. The solid black curve is the observed upper limit.

Figure 8: Expected and observed limits at 95% CL on C_ZH^eff/Λ, assuming the ALP decays exclusively to a photon pair. The dashed black curve is the expected upper limit, while the one and two standard-deviation bands are shown in green and yellow, respectively. The solid black curve is the observed upper limit. The red curve represents the interpreted result from the search for invisible decays of the Higgs boson performed by the CMS Collaboration.

Summary
A search for Higgs boson (H) decays to a Z boson and an axion-like particle (ALP), which subsequently decay into a lepton pair and a photon pair, respectively, is presented. The analysis is based on proton-proton collision data collected at √s = 13 TeV by the CMS experiment in 2016-2018, corresponding to an integrated luminosity of 138 fb⁻¹. The analysis probes pseudoscalar masses (m_a) in the range 1-30 GeV. This is the first search for Higgs boson decays in the final state of two leptons and two photons. Upper limits are set at 95% confidence level on the production cross section of the Higgs boson times its branching fraction into a dilepton and a diphoton pair via a Z boson and a pseudoscalar, σ(pp → H) B(H → Za → ℓℓγγ), where ℓ = e, µ. The observed (expected) limits vary in the range 1.1-17.8 (1.7-17.9) fb within the probed m_a interval of 1-30 GeV. The largest excess with respect to the standard model prediction is observed for m_a = 3 GeV and has a local (global) significance of 2.6 (1.3) standard deviations.

Figure 1 :
Figure 1: Feynman diagram for a BSM decay of the H boson into a Z boson and a light pseudoscalar boson, subsequently decaying to two leptons (ℓ = e, µ) and two photons, respectively.

Figure 2 :
Figure 2: Distributions of the four most discriminating variables used as input to the BDT: (m a − m a,hyp )/m ℓℓγγ (upper left), the leading photon's σ iηiη (upper right), the subleading photon's σ iηiη (lower left), and the leading photon's R 9 (lower right). The events pass the selection criteria described in Section 5. The signal is scaled to a cross section of 0.1 pb and the background sample is normalized to an integrated luminosity of 138 fb −1 . The systematic uncertainties included in the shaded band are related to the photon efficiency, lepton efficiency, and pileup modeling. The impact of the remaining disagreement between data and simulation is negligible.

Figure 3 :
Figure 3: Distributions of the BDT output for m a = 1 GeV (upper left), 10 GeV (upper right), 20 GeV (lower left), and 30 GeV (lower right). The events pass the selection criteria described in Section 5. The signal is scaled to a cross section of 0.1 pb and the background sample is normalized to an integrated luminosity of 138 fb −1 . The systematic uncertainties included in the shaded band are related to the photon efficiency, lepton efficiency, and pileup modeling.

Figure 4 :
Figure 4: Fit to the simulated m ℓℓγγ distributions for a signal with m a = 30 GeV in the electron (left) and muon (right) channels for the year 2018.

Figure 5 :
Figure 5: Product of detector efficiency and analysis acceptance for signal samples with various m a values in the electron (left) and muon (right) channels. The error bars include statistical and systematic uncertainties. The systematic uncertainty accounts for the photon efficiency, lepton efficiency, and pileup modeling uncertainties.

Figure 6 :
Figure 6: Invariant mass m ℓℓγγ distribution in data (black points). The signal-plus-background model fit is shown for m a = 1 (left) and 30 (right) GeV, where the solid red line shows the total signal-plus-background contribution, and the dashed red line shows the background component only. The lower panels show the residual signal yield after the background subtraction. The one (green, inner) and two (yellow, outer) standard deviation bands show the uncertainties in the fitted background model. These bands include the uncertainty due to the choice of function and the uncertainty in the fitted parameters.

Figure 7 :
Figure 7: Expected and observed 95% CL limits on the product of the production cross section of the H boson and its branching fraction into a dilepton and a diphoton pair via a Z boson and a pseudoscalar, σ(pp → H) B(H → Za → ℓℓγγ). The dashed black curve is the expected upper limit, while the one and two standard-deviation bands are shown in green and yellow, respectively. The solid black curve is the observed upper limit.

Figure 8 :
Figure 8: Expected and observed limits at 95% CL on C eff ZH /Λ, assuming the ALP decays exclusively to a photon pair. The dashed black curve is the expected upper limit, while the one and two standard-deviation bands are shown in green and yellow, respectively. The solid black curve is the observed upper limit. The red curve represents the interpreted results from the search for invisible decays of the Higgs boson performed by the CMS Collaboration.
Acknowledgments
Foundation; the Alexander von Humboldt Foundation; the Science Committee, project no. 22rl-037 (Armenia); the Belgian Federal Science Policy Office; the Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture (FRIA-Belgium); the Agentschap voor Innovatie door Wetenschap en Technologie (IWT-Belgium); the F.R.S.-FNRS and FWO (Belgium) under the "Excellence of Science - EOS" - be.h project n. 30820817; the Beijing Municipal Science & Technology Commission, No. Z191100007219010, and the Fundamental Research Funds for the Central Universities (China); the Ministry of Education, Youth and Sports (MEYS) of the Czech Republic; the Shota Rustaveli National Science Foundation, grant FR-22-985 (Georgia); the Deutsche Forschungsgemeinschaft (DFG), under Germany's Excellence Strategy - EXC 2121 "Quantum Universe" - 390833306, and under project number 400140256 - GRK2497; the Hellenic Foundation for Research and Innovation (HFRI), Project Number 2288 (Greece); the Hungarian Academy of Sciences, the New National Excellence Program - ÚNKP, the NKFIH research grants K 124845, K 124850, K 128713, K 128786, K 129058, K 131991, K 133046, K 138136, K 143460, K 143477, 2020-2.2.1-ED-2021-00181, and TKP2021-NKTA-64 (Hungary); the Council of Science and Industrial Research, India; ICSC - National Research Center for High Performance Computing, Big Data and Quantum Computing, funded by the EU NextGeneration program (Italy); the Latvian Council of Science; the Ministry of Education and Science, project no. 2022/WK/14, and the National Science Center, contracts Opus 2021/41/B/ST2/01369 and 2021/43/B/ST2/01552 (Poland); the Fundação para a Ciência e a Tecnologia, grant CEECIND/01334/2018 (Portugal); the National Priorities Research Program by Qatar National Research Fund; MCIN/AEI/10.13039/501100011033, ERDF "a way of making Europe",
and the Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia María de Maeztu, grant MDM-2017-0765, and the Programa Severo Ochoa del Principado de Asturias (Spain); the Chulalongkorn Academic into Its 2nd Century Project Advancement Project, and the National Science, Research and Innovation Fund via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation, grant B05F650021 (Thailand); the Kavli Foundation; the Nvidia Corporation; the SuperMicro Corporation; the Welch Foundation, contract C-1845; and the Weston Havens Foundation (USA).

Table 1 :
Minimum BDT output values used to define the signal region, with the associated signal efficiencies and background yields. The statistical uncertainties are also shown.

Table 2 :
Sources of systematic uncertainties and their impact on the signal strength for each data-taking period.