Search for MSSM Higgs bosons decaying to $\mu^+\mu^-$ in proton-proton collisions at $\sqrt{s} =$ 13 TeV

A search is performed for neutral non-standard-model Higgs bosons decaying to two muons in the context of the minimal supersymmetric standard model (MSSM). Proton-proton collision data recorded by the CMS experiment at the CERN Large Hadron Collider at a center-of-mass energy of 13 TeV were used, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. The search is sensitive to neutral Higgs bosons produced via the gluon fusion process or in association with a $\mathrm{b\overline{b}}$ quark pair. No significant deviations from the standard model expectation are observed. Upper limits at 95% confidence level are set in the context of the $m_\mathrm{h}^{\text{mod+}}$ and phenomenological MSSM scenarios on the parameter $\tan\beta$ as a function of the mass of the pseudoscalar A boson, in the range from 130 to 600 GeV. The results are also used to set a model-independent limit on the product of the branching fraction for the decay into a muon pair and the cross section for the production of a scalar neutral boson, either via gluon fusion, or in association with b quarks, in the mass range from 130 to 1000 GeV.


Introduction
The boson discovered at the Large Hadron Collider (LHC) in 2012 [1-3], with a mass around 125 GeV [4], has properties that are consistent with those predicted for the standard model (SM) Higgs boson [5]. However, the SM is known to be incomplete, and several well-motivated theoretical models beyond the SM predict an extended Higgs sector. One example is supersymmetry [6, 7] that protects the mass of the Higgs boson against quadratically divergent quantum corrections. In the minimal supersymmetric standard model (MSSM) [8][9][10], the Higgs sector consists of two Higgs doublets, one of which couples to up-type fermions and the other to down-type fermions. Assuming that CP symmetry is conserved, this results in two charged bosons H ± , two neutral scalar bosons, h and H, and one pseudoscalar boson, A.
At the tree level, the Higgs sector in the MSSM can be described by only two parameters, which are commonly chosen as m A , the mass of the neutral A, and tan β, the ratio of the vacuum expectation values of the neutral components of the two Higgs doublets. The masses of the other four Higgs bosons can be expressed as a function of these two parameters. Beyond the tree level the MSSM Higgs sector depends on additional parameters, which enter via higher-order corrections in perturbation theory, and which are usually fixed to values motivated by experimental constraints and theoretical assumptions. Setting these parameters defines a benchmark scenario [11], which is then described by m A and tan β. The relevant scenarios are those consistent with a mass of one neutral boson of 125 GeV for the majority of the probed m A -tan β parameter space [12], and not ruled out by other existing measurements. In particular, the m mod+ h scenario [11] constrains the mass of the h boson to be near 125 GeV for a wide range of tan β and m A values, by tuning some of the MSSM parameters. In the phenomenological MSSM (hMSSM) [13][14][15][16] the mass of h boson is an input parameter, set to 125 GeV, and the observed neutral boson is interpreted as the h boson. Small differences in the cross sections and branching fractions exist between the two models, although the kinematics of the Higgs bosons remains almost identical.
This Letter reports on a search for beyond-the-SM neutral Higgs bosons in the dimuon final state in proton-proton (pp) collisions at a center-of-mass energy √ s of 13 TeV. The search is performed in the context of the MSSM for values of m A larger than 130 GeV, assuming either the m mod+ h or the hMSSM scenario. For values of m A 200 GeV, the MSSM is close to the decoupling limit: the h boson takes the role of the observed SM-like Higgs boson at 125 GeV, and the H and A bosons are nearly degenerate in mass. For values of m A 200 GeV the MSSM leads to similar, but not degenerate, masses for the H and A bosons [17]. The mass of the h boson is assumed to be at 125 GeV, and its width smaller than the experimental resolution, consistently with the ATLAS and CMS measurements in other decay modes [4,18,19]. The analysis tests the h boson production as predicted by the MSSM and the constraints on its production mechanisms measured by ATLAS and CMS are not enforced. Alternatively, the search is also performed in a model-independent way, where a neutral boson is assumed to be produced either via gluon fusion or in association with a bb quark pair.
At the LHC, dominant production mechanisms for the neutral A and H bosons are gluon fusion, in which the Higgs boson can be produced via a virtual loop of bottom or top quarks, and b-associated production, where the Higgs boson is produced in association with a b quark pair. This is also the case of the h boson for values of m A 200 GeV, while, in the decoupling regime, the h boson production mechanisms correspond to those predicted by the SM. Figure 1 shows the Feynman diagrams for the two production processes at leading order (LO). The gluon fusion mechanism is more relevant for tan β 30, whereas at LO, the coupling of the Higgs boson to down-type fermions is enhanced by tan β, resulting in b-associated pro- duction becoming more important at large tan β. The coupling of the neutral Higgs boson to charged leptons is enhanced for the same reason. Although the branching fraction to muons is predicted to be about 300 times smaller than that for the τ + τ − final state, the µ + µ − channel can be fully reconstructed, and the dimuon invariant mass can be measured with a precision of a few percent by exploiting the excellent muon momentum resolution of the CMS detector, making the dimuon final state an additional probe of the MSSM.
The common experimental signature of the two production mechanisms is a pair of oppositecharge muons with high transverse momentum (p T ). The b-associated production process is characterized by the presence of additional jets originating from b quark fragmentation, whereas the events containing jets from light quarks or gluons are linked to the gluon fusion production mechanism. The presence of a signal would be characterized by an excess of events over the SM background in the dimuon invariant mass corresponding to the value of the Higgs boson masses.
The analysis is performed using the data at √ s = 13 TeV collected during 2016 by the CMS experiment at the LHC corresponding to an integrated luminosity of 35

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a field of 3.8 T. Within the field volume are a silicon pixel and strip tracker, a crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Muons are measured in gasionization detectors embedded in the steel return yoke of the magnet. The first level (L1) of the CMS trigger system uses information from the calorimeters and muon detectors to select events of interest. The high-level trigger processor farm decreases the L1 accept rate from around 100 kHz to about 1 kHz before data storage. A more detailed description of the CMS detector, together with a description of the coordinate system and main kinematic variables used in the analysis, can be found in Ref. [37].

Signal and background simulation
Samples of Monte Carlo (MC) simulated events are generated to model the Higgs bosons signal for the two leading production processes. This is done for a large number of m A and tan β combinations, where m A spans the range from 130 to 1000 GeV and tan β is varied from 5 to 60. Higgs boson events are generated with a mass within ±3Γ of the nominal Higgs boson mass, where Γ is the intrinsic width. The values of Γ strongly depend on m A and tan β, being, for example, Γ = 0.2 (2.7)% of the nominal Higgs boson mass at m A = 150 (550) GeV and tan β = 10 (40). The signal samples are generated with PYTHIA 8.212 [38] at LO. Additional signal samples are generated at next-to-LO (NLO) for some mass points to estimate higher-order corrections: gluon fusion samples are produced with POWHEG 2.0 [39], while b-associated production samples are produced with MADGRAPH5 aMC@NLO [40] using the four-flavor scheme.
Simulated background processes are used to optimize the event selection but not to model the background shape and normalization, which are determined directly from data. The most relevant SM background processes considered are Drell-Yan (DY) production, and single and pair production of top quarks, which can produce µ + µ − pairs with large invariant mass. Other background sources are the diboson production processes, W ± W ∓ , W ± Z, and ZZ, whose contributions are each smaller than 1% for dimuon invariant masses larger than 130 GeV, the Higgs boson search region. The background samples are generated at NLO using MADGRAPH5 aMC@NLO and POWHEG. Spin correlations in multiboson processes generated using MADGRAPH5 aMC@NLO are simulated using MADSPIN [41]. The NNPDF 3.0 [42] parton distribution functions (PDFs) are used for all samples. The parton shower and hadronization processes are modeled by PYTHIA with the CUETP8M1 [43] underlying event tune.
Detector response is based on a detailed description of the CMS detector and is simulated with the GEANT4 package [44]. Additional pp interactions in the same or nearby bunch crossings (pileup) are simulated by PYTHIA. During the data taking period, the CMS experiment was operating with, on average, 23 inelastic pp collisions per bunch crossing. The distribution of the number of additional interactions per bunch crossing in the simulation is weighted to match that observed in the data.
The values of the Higgs boson masses, widths, and the Yukawa couplings are calculated as a function of m A and tan β following the LHC Higgs Cross Section Working Group prescriptions [45,46], using the FEYNHIGGS 2.12.0 [47][48][49][50][51] program for the m mod+ h scenario. The inclusive cross sections of the Higgs bosons for the gluon fusion process are obtained with SUSHI [52], which includes NLO supersymmetric-QCD corrections [53][54][55][56][57][58], next-to-NLO (NNLO) QCD corrections for the top-quark contribution in the effective theory of a heavy top quark [59][60][61][62][63], and electroweak effects by light quarks [64,65]. Higgs boson cross sections for the b-associated production are calculated with SUSHI, and rely on matched predictions [66], which are based on the five flavour NNLO QCD calculation [67] and the four flavour NLO QCD calculation [68,69]. Higgs to µ + µ − branching fractions are calculated with FEYNHIGGS for the m mod+ h scenario and using the program HDECAY 6.40 [70] for the hMSSM scenario. Cross sections for the tt and DY background processes are computed at the NNLO with TOP++2.0 [71] and FEWZ3.1 [72], respectively, while for the single top and the diboson production processes they are computed at NLO with HATHOR [73,74] and MCFM [75], respectively.

Object reconstruction and event selection
The particle-flow (PF) algorithm [76] aims at reconstructing and identifying each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained from the ECAL measurement. The energy of electrons is obtained from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
Muons with 20 < p T < 100 GeV are measured with a relative p T resolution of 1.3 to 2% in the barrel and better than 6% in the endcaps. The p T resolution in the barrel is better than 10% for muons with p T up to 1 TeV [77, 78].
Jets are reconstructed using the anti-k T clustering algorithm [79] with a distance parameter of 0.4, as implemented in the FASTJET package [80]. The quantity missing transverse momentum, p miss T , is defined as the magnitude of the negative vector p T sum of all the PF objects (charged and neutral) in the event, and is modified by corrections to the energy scale of reconstructed jets. Collision vertices are obtained from reconstructed tracks using a deterministic annealing algorithm [81]. The reconstructed vertex with the largest value of summed physics-object p 2 T is taken to be the primary pp interaction vertex (PV). The physics objects are the jets, clustered using the jet finding algorithm [79,80] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum taken as the negative vector sum of the p T of those jets.
The combined secondary vertex algorithm of Ref.
[82] is used to identify jets resulting from the hadronization of b quarks. A medium operating working point of the algorithm is applied to jets with p T > 20 GeV in the pseudorapidity range |η| < 2.4. Within this kinematic range, the efficiency of the algorithm is 66% with a misidentification probability of 1%.
The events are preselected by the trigger system [83] requiring a muon candidate with |η| < 2.4, satisfying at least one of the following criteria: p T > 24 GeV with isolation (iso) requirements, or p T > 50 GeV without isolation requirements. These are the trigger algorithms with the lowest p T threshold whose output is not artificially reduced to limit the event rate and that cover the entire η acceptance of the muon detector. Since the Higgs boson signal is searched for over a large mass range, the p T of the muons from its decay can vary from tens to hundreds of GeV. Therefore, two sets of muon identification (ID) criteria are employed in the analysis: one is optimized for muons with lower p T ( 200 GeV) (ID1) and the other for muons with larger p T (ID2).
Events with a pair of opposite-charge muons, coming from the PV, are selected requiring both muons to satisfy the same ID criterion. Accepting, more generally, pairs of muons that pass any of the two ID criteria would lead to a negligible increase in signal efficiency. At least one of the two muon candidates has to match (in η and azimuthal angle φ in radians) the muon that triggered the event. The trigger requirement depends on the ID algorithm. Offline reconstructed muons with |η| < 2.4 are considered. Their offline p T is required to be higher than 26 or 53 GeV, to be compatible with the muon that triggered the event. To reject muons from nonprompt decays, muon candidates must be isolated. The offline isolation variable is calculated depending on the ID algorithm, and is labelled iso1 (iso2) for ID1 (ID2). For ID1 it is the scalar p T sum of the PF charged and neutral hadrons in a cone of radius ∆R = √ (∆η) 2 + (∆φ) 2 = 0.4 around the muon direction, divided by the muon p T . The charged PF particles not associated with the PV are not considered in this sum, and a correction is applied in order to account for the neutral particle contamination arising from pileup [84]. For ID2 the offline iso is computed as the scalar p T sum of tracks in the silicon tracker, excluding the muon, in a cone of radius ∆R = 0.3 around the muon direction, and divided by the muon p T . Tracks not associated with the PV are not considered. Energy deposits in the calorimeters are not included, since electromagnetic showers can develop from photons radiated by a high-p T muon. The invariant mass of the Higgs boson candidate is reconstructed from the two highest-p T opposite-charge muon candidates in the event. The dimuon selection criteria are summarized in Table 1.
The muon momentum measurement is crucial for the reconstruction of the Higgs boson mass peaks since improving the dimuon mass resolution increases the sensitivity of the analysis. To set limits accurately, the mean and the resolution of the dimuon mass peaks in simulation must match those of the data. A correction of the muon momentum has been applied in order to provide consistent measurements in the different φ and η regions of the detector, improving the net resolution in data. The correction [78] is also applied to the simulated muons to align the scale and resolution to those measured in the data. The magnitudes of the momentum scale corrections are about 0.2 and 0.3% in the barrel and endcaps, respectively, for muons with p T up to 200 GeV. For muons with larger p T , since the statistical precision of the data is too poor to derive a correction, only a systematic uncertainty is considered (see Section 5).  categories, for events with dimuon invariant mass larger than 130 GeV, as observed in data (dots) and predicted by simulation (colored histograms). The shaded gray band around the total background histogram represents the total uncertainty in the simulated prediction. The contribution of the expected signal for m A = 300 GeV and tan β = 20, scaled by a factor of 100, is superimposed for illustration. The vertical line represents the upper threshold used to select the events in the two categories.
When the Higgs boson is produced in association with a bb pair, additional jets from b quark Table 2: Summary of the selection criteria that define the two event categories. Categorization is applied after the muon selection.
b-tag category No-b-tag category b-tagged jets 1 with p T > 20 GeV, |η| < 2.4 Veto Untagged jets 0,1 with p T > 20 GeV, |η| < 2.4 p miss T <40 GeV <80 GeV fragmentation are expected. Jets with p T > 20 GeV and |η| < 2.4 are considered in this analysis: those that satisfy the requirements for the medium b-tagging working point [82] are taken as b-jet candidates, otherwise they are taken as untagged jets. Events containing b-jet candidates provide the highest sensitivity for the b-associated production channel, and events that do not contain b-tagged jets provide the best sensitivity for the gluon fusion production channel. The events are therefore split into two exclusive categories: the b-tag category, containing events with strictly one b jet and at most one additional untagged jet, and the no-b-tag category, containing events without b-tagged jets. In the first category, the requirement of strictly one b jet is aimed at suppressing about 30% of the dominant background from top quark pairs, since the observed b-tagged jet multiplicity in tt events is on average higher than for the Higgs boson signal. This is because more than half of the signal events from b-associated production are characterized by b jets emitted at large η, out of the acceptance of the tracking detector, and failing the b-tag requirements, whereas b jets in tt events are preferentially emitted in the central η region. Therefore, discarding events with two or more b-tagged jets allows the tt background to be rejected without any major impact on the signal efficiency. Furthermore, tt events are characterized by a higher multiplicity of additional untagged jets than the signal events.
Signal events are characterized by a rather small p miss T . However, the background content is quite different for the two categories, as shown in Fig. 2. The background from tt events, characterized by a relatively large p miss T from W boson decays, is much more relevant for the b-tag category. For the no-b-tag category, the dominant background is DY production, whose events are characterized by a p miss T distribution that is similar to that of the signal. For this reason, a requirement on p miss T , separately tuned for the b-tag and the no-b-tag events, improves the background rejection and increases the signal sensitivity. Events belonging to the b-tag (no-b-tag) category are required to have p miss T < 40 (80) GeV. This requirement reduces the background from top quark production by about 75% (40%). The selection criteria that define the two categories are summarized in Table 2.

Signal efficiency and signal systematic uncertainties
For each value of m A and tan β, the signal efficiency for each Higgs boson sample is defined as the fraction of generated events that fulfill the selection criteria. This definition of efficiency also includes the effects of limited detector acceptance and the selections outlined in Section 4. Figure 3 shows the selection efficiency for the A boson as a function of m A , for the gluon fusion and the b-associated production processes, and for the two event categories. Each curve corresponds to the mean of the efficiency obtained by varying tan β between 5 and 60, while the band of each curve corresponds to the efficiency variations combined with the statistical and systematic uncertainties (described in the next paragraph) of the simulated samples. For a given mass, the selection efficiency is weakly dependent on tan β, since this parameter mostly affects the Higgs boson width, with a negligible impact on the kinematic properties of the event. The efficiency to detect events produced in association with b quarks is approximately Figure 3: The selection efficiency for the A boson, as a function of its mass, for the two production mechanisms, b-associated and gluon fusion, and for each of the two event categories. The band centered on each curve corresponds to the envelope of efficiencies obtained when varying tan β, combined with the statistical and systematic uncertainties.
10% at high masses for the b-tag category. This value is mostly determined by the large fraction of b jets that are emitted with an η value that is outside the coverage of the tracking detectors, and indeed ≈50% of events from b-associated samples are reconstructed in the no-b-tag category. The efficiency to detect events from gluon fusion reaches a maximal value at ≈65% for m A 400 GeV. The very small but nonvanishing efficiency for signal produced via gluon fusion in the b-tag category is due to the b misidentification probability, which is about 1%. The corresponding efficiencies for the H boson are consistent with those shown in Fig. 3.
The systematic uncertainties in the signal description arise from a possible mismodeling of the signal efficiency, of the signal shape, and, for the model interpretation, from uncertainties in its cross section.
The systematic uncertainties that affect the signal efficiency are given in Table 3. The size of the simulated signal samples introduces a statistical uncertainty in the signal efficiency that is between 0.2% and 6%, depending on the number of generated events.
In order to account for the differences between data and simulation in the muon trigger efficiency, identification, and isolation, scale factors calculated using the tag-and-probe technique [77, 78] have been applied to simulated events. A similar procedure is used to account for discrepancies between data and simulation in the b-tagging efficiency. A global correction, calculated as the product of the various scale factors, is applied as an event-by-event weight. The uncertainty associated with each scale factor is then propagated to the analysis and its impact on the final selection efficiency is assigned as systematic uncertainty. An event-by-event weight is also applied to account for the modeling of the pileup in the simulation. The uncertainty in the knowledge of the pileup multiplicity is evaluated by varying the total inelastic cross section [85, 86] by ±5%, which translates into an uncertainty smaller than 1% in the signal efficiency. The uncertainty associated with the jet energy scale [87] is estimated by rescaling the jet momentum by a factor depending on the p T and η of each jet. This variation is also propagated to the p miss T determination. Its effect on the signal selection efficiency is about 1.6 (0.4)% for the b-tag (no-b-tag) category. Systematic uncertainties in the unclustered energy are propagated Table 3: Systematic uncertainties in the signal efficiency for the two event categories. The systematic uncertainties hold for both Higgs boson production processes except for the sources listed in the last three rows, which apply to the b-associated production process only. For these three sources, in the model-independent search for a neutral boson produced in association with b quarks, the uncertainties are applied as quoted in the table. In the MSSM interpretation, these numbers have to be weighted by the relative contribution of the b-associated production process to each category. For those sources of systematics that depend on m A the range of uncertainty is quoted.

Source
Systematic uncertainty (%) b-tag category No-b-tag category MC statistical uncertainty 0.5-6 0. b tag (only for b-associated production) 2 0.6 b jet multiplicity (only for b-associated production) 20-30 7-20 Untagged jet multiplicity (only for b-associated production) 7-25 to the determination of p miss T . The effect on the signal efficiency is 4.1% for the b-tag category, and 0.3% for the no-b-tag category. Systematic uncertainty in the b-tagging algorithm affects the signal yield and the category migration with an impact on the signal efficiency of 2% for the b-tag category and 0.6% for the no-b-tag category. The uncertainty in the total integrated luminosity is 2.5% [88] and affects the signal yield.
The uncertainties in the MSSM cross sections depend on m A , tan β, and the scenario. They are provided by the LHC Higgs Cross Section Working Group [45,46]. An uncertainty of 3% is used to account for the parton distribution functions.
Additional corrections are applied to take into account the fact that the signal samples are generated with PYTHIA at LO instead of using an NLO generator. Higher-order corrections affect the Higgs boson p T modeling, with impacts on the muon acceptance and the jet multiplicity. Moreover, they cause event migration between the two categories. The acceptance obtained from the LO samples is corrected to that predicted at NLO. The corresponding systematic uncertainty is set to the size of the correction itself. The correction on the modeling of the Higgs p T increases the signal efficiency by 1-4%, depending on the Higgs boson mass. The correction on the b-jet multiplicity affects only the b-associated signal, resulting in a correction of 20-30% depending on m A , which increases the signal efficiency for the b-tag category, and a correction of 7-20% decreasing the signal efficiency for the no-b-tag category. An additional correction of 7-25%, related to the untagged jet multiplicity, is applied, and reduces the signal efficiency for the b-tag category, due to the veto on the untagged jets.
The systematic uncertainties in the b-tag efficiency and the jet multiplicity shown in Table 3 apply only to the b-associated production process. Both the b-tagging and the b-jet multiplicity uncertainties are anticorrelated between the two event categories. In the model-independent analysis for the case in which the neutral boson is assumed to be entirely produced in association with b quarks, these uncertainties are applied, as quoted in Table 3, while in the MSSM interpretation, where both the gluon fusion and the b-associated production processes contribute to the two event categories, these systematic uncertainties are weighted by the relative contribution of the latter process.
The shape of the reconstructed Higgs boson invariant mass distribution is affected by the muon momentum scale and resolution. Uncertainties in the calibration of these quantities are propagated to the shape of the invariant mass distribution assuming a Gaussian prior, leading to a variation of up to 10% in the width of the signal mass peak, and to a negligible shift of its position. These uncertainties are taken into account as a signal shape variation in the calculation of the exclusion limit.

Modeling of the signal and background shapes
The invariant mass spectrum of the signal events that pass the event selection is used to determine the signal yield for each category. In the framework of the MSSM, this is done by fitting the invariant mass distribution of the h, H, and A bosons, separately for the two event categories and for various combinations of m A -tan β values. The function F sig used to parametrize the signal mass shape [21] is defined as: (1) In Eq. (1), the terms F h , F H , and F A describe the mass shape of the h, H, and A signals, respectively. Each term is a convolution of a Breit-Wigner (BW) function to describe the resonance, with a Gaussian function to account for the detector resolution. The two parameters of the BW function, as well as the variance of each Gaussian function, are free parameters of the fit used to determine the signal model, while the quantities w h , w H , and w A are the numbers of expected events for each boson passing the event selection. For the m A -tan β points for which the signal samples were not generated, the parameters are interpolated from the nearby generated points. In order to correct for differences of the order of a few GeV between the PYTHIA prediction of m H with respect to the value calculated by FEYNHIGGS in the m mod+ h or the value used in the hMSSM, especially for m A 200 GeV, the invariant mass distribution of the H boson is shifted by the corresponding amount. For the model-independent analysis the signal shape is described using one single resonance in Eq. (1).
The analysis does not use background estimation from simulation due to the limited size of simulated events compared to data in the region of interest, as well as due to the large theoretical uncertainties in the background description at high invariant masses. Therefore, given the smooth dependence of the background shape on the dimuon invariant mass, it is estimated from the data, by assuming a functional form to describe its dependence as a function of the reconstructed dimuon invariant mass, m µµ , and by fitting it to the observed distribution.
The functional form used to describe the background shape is defined as: (2) The quantity exp(λm µµ ) parametrizes the exponential part of the mass distribution, and f represents the weight of the BW term with respect to DY photon exchange, while N 1 and N 2 correspond to the integral of each term in F bkg . The quantities λ and f are free parameters of the fit. The parameters Γ Z and m Z are separately determined for the two event categories by fitting the dimuon mass distribution close to the Z boson mass. The fit provides the effective values of such quantities, which include detector and resolution effects. Their values are then kept constant when using F bkg in the final fit. The systematic uncertainty that stems from the choice of the functional form in Eq.
(2), which was used in earlier searches [21], is assessed as described below.
A linear combination of the functions describing the expected signal and the background is then used to perform a binned maximum likelihood fit to the data, where the uncertainties are treated as nuisance parameters: The fit is performed for each m A and tan β hypothesis, as the yield of the signal events and the shape of F sig depend on these quantities. The parameters that describe the signal are determined by fitting the simulated samples that pass the event selection with Eq. (1), for each m A and tan β pair, as explained above. Subsequently they are assigned as constant terms in F fit . The quantity f bkg is a free parameter in the fit, and the fraction of signal events is defined as f sig = (1 − f bkg ). The overall normalization is also a free parameter and is profiled in the fit.
For each m A assumption, the function F fit is used to fit the data over an m µµ range centered on m A . The range has to be large enough to account for the signal width, including the experimental resolution, and it is ±50 GeV for m A ≤ 290 GeV, ±75 GeV for 290 < m A ≤ 390 GeV, and ±100 GeV for 390 < m A ≤ 500 GeV. For values of m A smaller than 165 GeV the lower bound of the mass window is set to 115 GeV. For m A > 500 GeV, the entire range from 400 to 1200 GeV is used. The h boson is used to constrain the results when its mass is included in the fitted mass range.
The uncertainty introduced by the choice of the analytical function used to parametrize the background is estimated by using a method similar to that used in Refs. [3, 21, 89]. The method is based on the determination of the number of spurious signal events that are introduced by the choice of the background function F bkg , when the background is fit by the function F fit . The invariant mass spectrum is fitted by the function F a bkg , chosen among various functional forms: Eq. (2) or other similar expressions that include a BW plus exponentials, and sum of exponentials. All these functional forms adequately describe the background distribution observed in data. The fit is performed in the proper mass range centered around the assumed value of m A , and the parameters of F a bkg are determined. Then, thousands of MC pseudoexperiments are generated, each one containing the same number of events as observed in the data, distributed according to the functional form F a bkg . For each pseudo-experiment, the invariant mass distribution is then fit with the function F fit of Eq. (3), once using F a bkg , and then using a different function F b bkg , given by Eq.
(2). For each pseudo-experiment, the spurious signal yield, expressed by the number of events N a bias and N b bias , is determined. The quantity N a bias is on average consistent with zero within statistical fluctuations. The quantity N b bias represents the number of spurious signal events that are found in the signal yield if the function F b bkg is used to describe the background, when the background itself is actually distributed according to F a bkg . The median of the distribution of the difference N a bias − N b bias obtained from the pseudo-experiments is defined as the bias introduced by using the function F b bkg , relative to the tested mass m A . This procedure is repeated for each function F a bkg among the functional forms mentioned above, and the largest bias is taken as the systematic uncertainty in the number of signal events obtained from the maximum likelihood fit, due to the choice of Eq.  Figure 4: Examples of fits to data with a signal plus background hypothesis, for a narrow-width signal with a mass of 400 GeV (left), and 980 GeV (right), for the two event categories added together, after weighting by their sensitivity. The resonance φ is assumed to be produced via the b-associated production, and to decay to two muons. The 68 and 95% CL bands, shown in dark green and light yellow, respectively, include the uncertainties in the background component of the fit. The lower panel shows the difference between the data and the background component of the fit. was shown to lead to similar biases over the whole mass range. The number of spurious signal events varies between a few units and a few hundred depending on the mass of the signal and the event category. Although the bias is due to the modelling of the background, its impact on the result depends on the expected signal strength and shape, both varying according to m A and tan β in the model-dependent analysis, and according to the mass of a generic resonance φ for the model-independent case. More details about the effect of the bias on the final results are discussed in Section 7.
An example of fits to the data with Eq. (3), for the model-independent case, is shown in Fig. 4. Two mass hypotheses, 400 and 980 GeV, are assumed for a single narrow-width resonance φ decaying to two muons. The two event categories are combined according to their sensitivity, S/(S + B), where S and B are the number of events in the expected signal and observed background, respectively. The uncertainties in the integrated luminosity, in the signal efficiency, and in the background parametrization are taken into account as nuisance parameters.

Results
No evidence of Higgs boson production beyond the SM prediction is observed in the mass range in which the analysis has been performed. Exclusion limits at 95% confidence level (CL) are therefore determined.
A maximum likelihood fit to the data, as explained in the previous section, is performed under the background only and the signal-plus-background hypotheses, where the background includes the expectation for the SM Higgs boson. The systematic uncertainties are incorporated as nuisance parameters in the likelihood. The upper limits for the signal production are computed using the CL s [90, 91] criterion and the hybrid frequentist-bayesian approach, where the distributions of the test-statistic are derived from pseudo-experiments [92].
The results are interpreted within the MSSM in the context of the m mod+ h and hMSSM scenarios, by combining both event categories. The 95% CL limit on the parameter tan β is presented as a function of m A : the exclusion limit is chosen for each m A as the tan β value at which the CL s is lower than 0.05.

95% CL excluded:
Observed The results of the τ + τ − analysis [28] exclude a much larger m A -tan β region, reaching the value of tan β = 60 at m A = 1.5 TeV. For values of m A up to 400 GeV the µ + µ − results exclude a larger m A -tan β region compared to the results of the bb analysis [31], which is instead slightly more sensitive at higher m A reaching the value of tan β = 60 at about m A = 700 GeV.

CMS
Limits on the production cross section times decay branching fraction σB(φ → µ + µ − ) for a single neutral scalar boson φ have also been determined. In the model-independent interpretation the φ boson is searched for as a single resonance with mass m φ assuming a narrow width or a width equal to 10% of m φ . In the first case the intrinsic width of the signal is smaller than the invariant mass resolution, while in the second case the width is larger even for mass val- ues near 1000 GeV (lower sensitivity of the analysis). The simulated signal of the A boson in the tan β = 5 case (smallest intrinsic width, dominated by the detector resolution) is used as a template to compute the detection efficiency of a generic φ boson decaying to a muon pair. The φ boson is assumed to be produced entirely either via the b-associated or the gluon fusion process, and the analysis is performed separately for the two production mechanisms. Figure 6 shows the 95% CL upper limits on the cross section times the decay branching fraction to µ + µ − as a function of the φ mass for a narrow resonance. These limits are more stringent by a factor of 2 to 3 than those recently obtained by ATLAS in a similar search [22]. The corresponding upper limits assuming a signal template with a width equal to 10% of its mass value are shown in Fig. 7. In the case of large signal widths, the upper limits as a function of m φ start from 140 GeV. This is done to have the signal peak ±3Γ within the fit range. Moreover, as one may expect, the limits are less stringent than for the narrow-width approximation, and it is no longer possible to distinguish the fine structure of the 95% CL limits as a function of the mass, as observed for the narrow-width case.

Summary
A search for neutral minimal supersymmetric standard model (MSSM) Higgs bosons decaying to µ + µ − was performed using 13 TeV data collected in proton-proton collisions by the CMS experiment at the LHC. No excess of events was found above the expected background due to standard model (SM) processes. The 95% confidence level upper limit for the production of beyond SM neutral Higgs bosons is determined in the framework of the m mod+ h and the phenomenological scenarios of the MSSM. For the ratio of the vacuum expectation values of the neutral components of the two Higgs doublets, tan β, its excluded values range from ≈10 to ≈60 for a mass of the pseudoscalar A boson (m A ) from 130 to 600 GeV. The larger collected luminosity and the higher center-of-mass energy exclude a larger m A -tan β region, compared to what was obtained at 7 and 8 TeV in a similar analysis. Model-independent exclusion limits on the production cross section times branching fraction of a generic narrow-width neutral boson decaying to two muons have been determined assuming the neutral boson to be produced entirely either via b-associated or gluon fusion mechanisms. The limits are determined in the mass range from 130 to 1000 GeV, separately for the two production mechanisms. Similarly, exclusion limits are also obtained assuming a signal width equal to 10% of its mass value.    [76] CMS Collaboration, "Particle-flow reconstruction and global event description with the CMS detector", JINST 12 (2017) P10003, doi:10.1088/1748-0221/12/10/P10003, arXiv:1706.04965.