Search for an exotic decay of the Higgs boson to a pair of light pseudoscalars in the final state with two muons and two b quarks in pp collisions at 13 TeV

A search for exotic decays of the Higgs boson to a pair of light pseudoscalar particles a 1 is performed under the hypothesis that one of the pseudoscalars decays to a pair of opposite sign muons and the other decays to bb. Such signatures are predicted in a number of extensions of the standard model (SM), including next-to-minimal supersymmetry and two-Higgs-doublet models with an additional scalar singlet. The results are based on a data set of proton-proton collisions corresponding to an integrated luminosity of 35 . 9 fb − 1 , accumulated with the CMS experiment at the CERN LHC in 2016 at a centre-of-mass energy of 13 TeV. No statistically signiﬁcant excess is observed with respect to the SM backgrounds in the search region for pseudoscalar masses from 20 GeV to half of the Higgs boson mass. Upper limits at 95% conﬁdence level are set on the product of the production cross section and branching fraction, σ h B ( h → a 1 a 1 → μ + μ − bb ) , ranging from 5 to 33 fb, depending on the pseudoscalar mass. Corresponding limits on the branching fraction, assuming the SM prediction for σ h , are ( 1–7 ) × 10 − 4 .


Introduction
The discovery of the particle now identified as the Higgs boson by the ATLAS and CMS experiments [1-3] at the CERN LHC [4] has opened a new era in the history of particle physics. So far, precise measurements of the Higgs boson spin, parity, width, and couplings in production and decay have been consistent with the expectations for the standard model (SM) Higgs boson [5][6][7][8]. However, the possibility of exotic Higgs boson decays to new lighter bosons is not excluded, and is proposed in various theories beyond the SM (BSM) [9]. The LHC combination of the SM Higgs boson measurements at 7 and 8 TeV allows Higgs boson decays to BSM states with a rate of up to 34% [7] at 95% confidence level (CL). The LHC data at 13 TeV have been used to place an upper limit of about 40% for the Higgs boson branching fraction (B) to BSM particles at 95% CL [10].
Several searches for exotic decays of the Higgs boson have been performed at the LHC, using the data at 8 TeV [11][12][13][14] and 13 TeV [15][16][17][18][19][20][21]. Such decays occur in the context of the nextto-minimal supersymmetric standard model, NMSSM, and other extensions to two-Higgs-doublet models (2HDM) where the exis-E-mail address: cms -publication -committee -chair @cern .ch. tence of a scalar singlet is hypothesised (2HDM+S) [9, [22][23][24]. The 2HDM, and hence 2HDM+S, are categorised into four types depending on the interaction of SM fermions with the Higgs doublet structure [14]. All SM particles couple to the first Higgs doublet, 1 , in type I models. In type II models, which include the NMSSM, up-type quarks couple to 1 while leptons and down-type quarks couple to the second Higgs doublet, 2 . Quarks couple to 1 and leptons couple to 2 in type III models. In type IV models, leptons and up-type quarks couple to 1 , while down-type quarks couple to 2 . After electroweak symmetry breaking, the 2HDM predicts a pair of charged Higgs bosons H ± , a neutral pseudoscalar A, and two neutral scalar mass eigenstates, H and h. In the decoupling limit the lighter scalar eigenstate, h, is the observed boson with m h ≈ 125 GeV. In 2HDM+S models, a complex scalar singlet S R + i S I that has no direct Yukawa couplings is introduced. Hence, it is expected to decay to SM fermions by virtue of mixing with the Higgs sector. This mixing is small enough to preserve the SM-like nature of the h boson.
In this Letter we consider the Higgs boson decay to a pair of a 1 particles where a 1 is a pseudoscalar mass eigenstate mostly composed of S I . We perform a search for the decay chain h → a 1 a 1 → μ + μ − bb. The gluon gluon fusion (ggF) and the vector boson fusion (VBF) production mechanisms are considered, with production cross sections of 48.58 ± 2.45 pb (at next-to-next-to- next-to-leading order in QCD) and 3.78 ± 0.08 pb (at next-tonext-to-leading order in QCD), respectively [25]. As a benchmark, the branching fraction of h → a 1 a 1 is assumed to be 10%. The branching fractions of a 1 to SM particles depend on the type of 2HDM+S, on the pseudoscalar mass m a 1 , and on tan β, defined as the ratio of the vacuum expectation values of the second and first doublets. The tan β parameter is assumed to be 2 which implies 2B(a 1 → bb)B(a 1 → μ + μ − ) = 1.7 × 10 −3 for m a 1 = 30 GeV in type-III 2HDM+S [9]. For the set of parameters under discussion and with 20 ≤ m a 1 ≤ 62.5 GeV, no strong dependence on m a 1 is expected for B(a 1 → bb) and B(a 1 → μ + μ − ) [9]. The product of the cross section and branching fraction is therefore approximated to be about 8 fb for all m a 1 values considered in this analysis.
The present search for the exotic a 1 particle in the μ + μ − bb final state is sensitive to the mass range of 20 ≤ m a 1 ≤ 62.5 GeV. The sensitivity of the search largely decreases towards m a 1 ≈ 20 GeV and lower because a 1 gets boosted and the two b quark jets tend to merge [26]. The upper bound is imposed by the Higgs boson mass. The analysis is performed using the proton-proton collision data at 13 TeV collected with the CMS detector during 2016, corresponding to an integrated luminosity of 35.9 fb −1 . Though the signal selection is optimised for the h → a 1 a 1 → μ + μ − bb process, decays of h → a 1 a 1 → μ + μ − τ + τ − can contribute to the selected sample if hadronically decaying τ leptons are misidentified as b quark jets. Such a contribution is found to be negligible using the benchmark scenario, although in some parts of the parameter space the enhancement in B(a 1 → τ + τ − ) can lead to a nonnegligible fraction of these events surviving the selection. This is taken into account in the scan over the (m a 1 , tan β) plane in the type III 2HDM+S, as for certain values, the increase in μ + μ − τ + τ − signal can affect the sensitivity. The signal from a 1 a 1 → bbτ + τ − with τ → μ leads to m μμ significantly smaller than m a 1 and is not considered in the search.
The CMS detector is briefly described in Section 2. The data and simulated samples are introduced in Section 3. Section 4 is devoted to the event selection and categorisation. The signal and background modelling is discussed in Section 5, while in Section 6, different sources of systematic uncertainties are described. Results are presented in Section 7, and the paper is summarised in Section 8.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and endcap sections. Forward calorimeters, made of steel and quartz-fibres, extend the pseudorapidity coverage provided by the barrel and endcap detectors. Muons are detected in gasionisation chambers embedded in the steel flux-return yoke outside the solenoid. They are measured in the pseudorapidity range |η| < 2.4, with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive plate chambers. The efficiency to reconstruct and identify muons is greater than 96%. Matching muons to tracks measured in the silicon tracker results in a relative transverse momentum (p T ) resolution, for muons with p T up to 100 GeV, of 1% in the barrel and 3% in the endcaps [27]. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [28].

Simulated samples
The NMSSMHET model [9] is used to generate signal samples with the Monte Carlo (MC) event generator MadGraph5_amc@nlo [29] at leading order (LO). The signal samples span the m a 1 search region in 5 GeV steps. Background processes with dominant contributions are the Drell-Yan production in association with additional b quarks and tt in the dimuon final state. Simulated samples for background processes are used in this analysis to optimise the selection and for validation purposes in those selection steps that yield reasonable statistical precision. The contribution of backgrounds to the selected sample is directly extracted from data with no reference to simulation. The Drell-Yan process, Z/γ * (→ + − ) + jets with a minimum dilepton mass threshold of 10 GeV, is modelled with the same event generator at LO, exclusive in number of additional partons (up to 4). The reference cross section for the Drell-Yan process is computed using fewz 3.1 [30] at next-to-next-to-leading order. The top quark samples, tt and single top quark production, are produced with powheg2.0 [31][32][33][34] at next-to-leading order (NLO). Backgrounds from diboson (WW, WZ, ZZ) production are generated at NLO with the same program and settings as that of the Drell-Yan samples. The only exception is the WW process that is generated at LO. The set of parton distribution functions (PDFs) is NLO NNPDF3.0 for NLO samples, and LO NNPDF3.0 for LO samples [35]. For all samples, pythia 8.212 [36] with tune CUETP8M1 [37,38] is used for the modelling of the parton showering and fragmentation. The full CMS detector simulation based on Geant4 [39] is implemented for all generated event samples. In order to model the effect of additional interactions per bunch crossing (pileup), generated minimum bias events are added to the simulated samples. The number of additional interactions are scaled to agree with that observed in data [40].

Event selection and categorisation
Events are filtered using a high-level trigger requirement based on the presence of two muons with p T > 17 and 8 GeV. For offline selection, events must contain at least one primary vertex, considered as the vertex of the hard interaction. At least four tracks must be associated with the selected primary vertex. The longitudinal and radial distances of the vertex from the centre of the detector must be smaller than 24 and 2 cm, respectively. The vertex with the largest sum of p 2 T of the physics objects is chosen for the analysis. The physics objects are the jets, clustered using the jet finding algorithm [41,42] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the p T of those jets. Extra selection criteria are applied to leptons and jets, reconstructed using the CMS particle-flow algorithm [43].
The selection requires two muons with opposite electric charge in |η| < 2.4, originating from the selected primary vertex. Events with the leading (subleading) muon p T > 20 (9) GeV are selected.
A relative isolation variable I rel is calculated by summing the transverse energy deposited by other particles inside a cone of size R = √ ( η) 2 + ( φ) 2 = 0.4 around the muon with φ being the azimuthal angle measured in radians, divided by the muon p T , where I ch. h , I γ , I n. h and I PU ch. h are, respectively, the scalar p T sums of stable charged hadrons, photons, neutral hadrons, and charged hadrons associated with pileup vertices. The contribution 0.5 I PU ch. h accounts for the expected pileup contribution from neutral particles. The neutral-to-charged particle ratio is taken to be approximately 0.5 from isospin invariance. Only muons with the isolation variable satisfying I rel < 0.15 are considered in the analysis. The efficiencies for muon trigger, reconstruction, and selection in simulated events are corrected to match those in data. In case more muons in the event pass the selection requirements, the two with the largest p T are chosen. Jets are reconstructed by clustering charged and neutral particles using the anti-k T algorithm [41] with a distance parameter of 0.4. The reconstructed jet energy is corrected for effects from the detector response as a function of the jet p T and η. Contamination from pileup, underlying event, and electronic noise are subtracted [44,45]. Extra η-dependent smearing is performed on the jet energy in simulated events as prescribed in Refs. [44,45].
Events are required to have at least two jets with |η| < 2.4 and p T > 20 (leading) and 15 GeV (subleading), with both jets separated from the selected muons ( R > 0.5). A combined secondary vertex algorithm is used to identify jets that are likely to originate from b quarks. The algorithm uses the track-based lifetime information together with the secondary vertices inside the jet to provide a multivariate discriminator for the b jet identification [46]. Working points "loose" (L), "medium" (M), and "tight" (T) are defined. They correspond to thresholds on the discriminator, for which the misidentification probability is around 10, 1, and 0.1%, respectively, for jets originating from light quarks and gluons [46]. The misidentification probability for jets originating from c quarks is around 30, 10, and 2%, respectively, for the loose, medium, and tight working points. The efficiencies for correctly identifying b jets are ≈80% for the loose, ≈60% for the medium, and ≈40% for the tight working point. The jet with maximum discriminator value must pass the tight working point of the algorithm, while the second is required to pass the loose one. The correction factors for b jet identification are applied to simulated events to reproduce the data distribution of the b tagging discriminator. In events with more jets passing the selection criteria, the two with the largest p T are taken.
The imbalance in the transverse momentum in signal events is not expected to be large, as the contribution from neutrinos from semileptonic decays in b jets is typically small. The missing transverse momentum, p miss T is defined as the magnitude of the negative vector sum of the transverse momenta of all reconstructed particles. The jet energy calibration introduces corrections to the p miss T measurement [45]. Events are required to have p miss T < 60 GeV.
Assuming the b quark jets and muons are the decay products of the pseudoscalar a 1 , it is expected to have m bb ≈m μμ ≈m a 1 in signal events. Moreover, the system of muons and b quark jets is expected to have an invariant mass close to m h . A χ 2 variable is introduced as Here σ bb and σ h are, respectively, the mass resolutions of the di-b-quark jet system and the Higgs boson candidate, derived from simulation. The mass resolution of the di-b-quark jet system increases linearly with m a 1 . It is evaluated on an event-by-event basis, where m μμ is assumed to be equal to m a 1 . The decay width of a 1 is negligible compared with the experimental mass resolutions in the analysis. The distribution of χ 2 in the signal sample with m a 1 = 40 GeV is compared with that in backgrounds in  A method that fully relies on data is used to estimate the background, as described in Section 5. Simulated background samples are however used to optimise the selection. Fig. 3 shows distributions, in data and simulation, for events passing the selection requirements except those of p miss T and χ 2 . In this figure, expected number of simulated events is normalised to the integrated luminosity of 35.9 fb −1 . Data and simulation are compared for the p T of the dimuon system, and the mass and p T of the di-b-jet system. Using the same selected muon and jet pairs, Fig. 3 also illustrates the distributions of the invariant mass m μμbb and the transverse momentum p μμbb T of the four-body system. The distributions for simulated events follow reasonably those in the data, within the statistical uncertainties presented in the figure. The yield in data and the expected yields in simulation are presented in Table 1. The expected yield from a signal of h → a 1 a 1 → μ + μ − τ + τ − is found to be around 0.01 with the model parameters used in this table.
To enhance the sensitivity, an event categorisation is employed: different categorisation schemes are tried, and the one resulting in the highest expected significance is chosen. The data in a sideband region are used to determine the categorisation that is most sensitive for this analysis. The sideband region is constructed using the same selection as that for the signal region except that 5 < χ 2 < 11. In simulated background samples, the correlations between χ 2 and m μμ and the variables used for categorisation are found to be small. The best sensitivity is found with categorisation according to the b tagging discriminator value of the loose b-tagged jet. The tight-tight (TT) category contains events with both jets passing the tight requirements of the b jet identification algorithm. Events in which the loose b-tagged jet passes the medium b tagging requirements but fails the tight conditions fall Table 1 Event yields for simulated processes and data after requiring two muons and two b jets (μ + μ − bb selection) and after the final selection. The expected number of simulated events is normalised to the integrated luminosity of 35  into the tight-medium (TM) category. The remaining events with the loose b-tagged jet failing the medium requirements of the b jet identification algorithm belong to the tight-loose (TL) category. On average, 41% of signal events pass the TL selection, while 32% fulfil the TM requirements and 27% belong to the TT category. According to the data in the sideband region, the majority of background events (≈70%) fall into the TL category whereas about 20% pass the TM requirements and less than 10% can meet the TT criteria.

Signal and background modelling
The search is performed using an unbinned fit to the m μμ distribution in data, simultaneously in the TT, TM, and TL categories. The signal shape is modelled with a weighted sum of Voigt profile [47] and Crystal Ball (CB) functions [48], where the mean values of the two are bound to be the same. The initial values for the signal model parameters are extracted from a simultaneous fit of the model to the simulated signal samples described in Section 3. Almost all parameters in the signal model are found to be independent of m a 1 and are fixed in the final fit. The only exceptions are the resolution parameter of the Voigt profile and CB functions, σ v and σ cb , respectively. These parameters depend linearly on m a 1 and only their slopes, respectively α and β, float in the final fit within their uncertainties, The expected signal efficiency and acceptance are interpolated for m a 1 values not covered by simulation. The m μμ distribution in data is used to evaluate the contribution of backgrounds. The uncertainty associated with the choice of the background model is treated in a similar way as other uncertainties for which there are nuisance parameters in the fit. The unbinned likelihood function for the signal-plus-background fit has the form where s(p, m μμ ) is the parametric signal shape with the set of parameters indicated by p, and b(m μμ ) is the background model. The shape for the background is modelled, independently in each category, with a set of analytic functions using the discrete profiling method [49][50][51]. In this approach the choice of the functional form of the background shape is considered as a discrete nuisance parameter, for which the best fit value can vary as the trial value of the parameter of interest (m μμ ) varies. The background parameter space therefore contains multiple models, each including its own parameters.  To provide the input background models to the discrete profiling method, the data are modelled with different parametrisations of polynomials. The degrees of the polynomials are determined through statistical tests (F-test) [52] to ensure the sufficiency of number of parameters and to avoid over-fitting the data. The input background functions are tried in the minimisation of the negative logarithm of the likelihood with a penalty term added to account for the number of free parameters in the background model. The discrete profiling method can choose a different best-fit functional form for the background as the physics parameter of interest varies, thus effectively incorporating the systematic uncertainty on the background functional form: in the present analysis the result is to yield expected upper limits that are about 10% less stringent than those obtained with a single functional form for the background. The likelihood ratio for the penalised likelihood function L can be written as where μ is the measured quantity. The numerator is the maximum penalised likelihood for a given μ, at the best fit values of nuisance parameters, θ μ and of the background function, b μ . The denominator is the global maximum for L, achieved at μ =μ, θ =θ and b =b. A confidence interval for μ is obtained with the background function maximising L for any value of μ [49].

Systematic uncertainties
The statistical interpretation of the analysis takes into account several sources of systematic uncertainties related to the accuracy in the signal modelling and uncertainties in the signal acceptance. The imprecise knowledge of the background contributions is taken into account by the discrete profiling method described in Section 5.
Theoretical uncertainties: to evaluate the upper limit on B(h → a 1 a 1 → μ + μ − bb), the Higgs boson production cross section is set to the SM prediction where an uncertainty of 4.7% is considered for the sum of the ggF and VBF production cross sections, accounting for PDF and α s uncertainties [25].
Uncertainties in signal shape and acceptance modelling: an uncertainty of 2.5% is assigned to the integrated luminosity of the CMS 13 TeV data collected in 2016 [40]. The uncertainty in the number of pileup interactions per event is estimated by varying the total inelastic ppcross section by ±4.6% [53]. The simulation-to-data correction factors for the trigger efficiency, muon reconstruction, and selection efficiencies are estimated using a "tag-and-probe" method [54] in Drell-Yan data and simulated samples. These uncertainties include the pileup dependence of the correction factors. For the jet energy scale (JES), the variations are made according to the ηand p T -dependent uncertainties and propagated to the p miss T of the event. An additional uncertainty, arising from unclustered energies in the event, is assessed for p miss T . For the jet energy resolution, the smearing corrections are varied within their uncertainties [44]. Systematic uncertainty sources that affect the simulation-to-data corrections of the b tagging discriminator distribution are JES, the contaminations from light flavor (LF) jets in the b-jet sample, the contaminations from heavy flavor (HF) jets in the light-flavor jet sample, and the statistical fluctuations in data and MC. The uncertainties due to JES and light-flavor jet contamination in b-jet samples are found to be dominant [46].
Uncertainties in the choice of the renormalisation, μ r , and factorisation, μ f , scales are estimated by doubling and halving μ r and μ f simultaneously in the signal sample. To estimate the uncertainties associated with the parton showering and fragmentation model, additional signal samples are produced using herwig ++ [55] and compared to pythia. Finally, uncertainties arising from the limited understanding of the PDFs [56] are taken into account. These uncertainties have a negligible effect on the shape of the signal. Their effects on the yield are taken into account by introducing nuisance parameters with log-normal distributions into the fit.

Results
The analysis yields no significant excess of events over the SM background prediction. Fig. 4 shows the m μμ distribution in the data of all categories together with the best fit output for the background model, including uncertainties.
The upper limit on σ h B(h → a 1 a 1 → μ + μ − bb) is obtained at 95% CL using the CL s criterion [57,58] and an asymptotic approximation to the distribution of the profiled likelihood ratio test statistic [59]. Assuming the SM cross sections for the Higgs boson production processes within the theoretical uncertainties, an upper limit is placed on B(h → a 1 a 1 → μ + μ − bb) using the same procedure. Limits are evaluated as a function of m a 1 . The observed and expected limits are illustrated in Fig. 5 for both cases. Dominant systematic uncertainties are those associated with the b jet identification, followed by the modelling of parton shower and fragmentation. For m a 1 = 40 GeV, the b tagging uncertainties arising from LF contamination and JES amount to 17 and 14%, respectively. The uncertainty arising from the parton shower and fragmentation models is about 7%. Other uncertainties are below 5%. ggF Higgs boson production cross section has increased by a factor of about 2.3 over that at 8 TeV, while the production cross section of main backgrounds, Drell-Yan and tt, has increased by a factor of 1.5 and 3.3, respectively. Despite the relative increase in backgrounds, better sensitivity is achieved using improved analysis techniques in Run II. Observed limits on B(h → a 1 a 1 ) are shown in Fig. 6 in the plane of (m a 1 , tan β) for type-III and type-IV 2HDM+S, using only the μ + μ − bb signal. The allowed ranges for B(h → a 1 a 1 ) ≤ 1 and B(h → a 1 a 1 ) ≤ 0.34 [7] are also presented.
μμbb as well as the sensitivity of the analysis. Here sel. refers to the acceptance and the selection efficiency of the process. The ratio sel. μμτ τ / sel. μμbb is about 1% in the TL category while it reduces to 0.3 and 0.1% in the TM and TT categories, respectively. However, because of the increase in the relative branching fraction, the contribution of the μ + μ − τ + τ − sig- Fig. 6. Observed upper limits at 95% CL on B(h → a 1 a 1 ) in the plane of (m a1 , tan β) for (upper) type-III and (lower) type-IV 2HDM+S, using only the μ + μ − bb signal. nal becomes nonnegligible in the type-III 2HDM+S with tan β ≈ 5. Fig. 7 shows the observed limits on B(h → a 1 a 1 ) in the (m a 1 , tan β) plane, including the contribution of μ + μ − τ + τ − signal for type-III 2HDM+S. The observed limit contours of B(h → a 1 a 1 ) = 1.00 and B(h → a 1 a 1 ) = 0.34 are generally extended compared with Fig. 6 (upper).

Summary
A search for the Higgs boson decay to a pair of new pseudoscalars h → a 1 a 1 → μ + μ − bb, motivated by the next-to-minimal supersymmetric standard model and other extensions to two-Higgs-doublet models, is carried out using a sample of protonproton collision data corresponding to an integrated luminosity of 35.9 fb −1 at 13 TeV centre-of-mass energy. No statistically significant excess is found in data with respect to the background prediction. The results of the analysis are presented in the form of upper limits, at 95% confidence level, on the product of the Higgs boson production cross section and branching fraction, σ h B(h → a 1 a 1 → μ + μ − bb) as well as on the Higgs boson branching fraction assuming the SM prediction of σ h . The former

Acknowledgements
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses.