Measurement of the B$^0_\mathrm{S}$ $\to$ $\mu^+\mu^-$ decay properties and search for the B$^0$ $\to$ $\mu^+\mu^-$ decay in proton-proton collisions at $\sqrt{s}$ = 13 TeV

Measurements are presented of the B$^0_\mathrm{S}$ $\to$ $\mu^+\mu^-$ branching fraction and effective lifetime, as well as results of a search for the B$^0$ $\to$ $\mu^+\mu^-$ decay in proton-proton collisions at $\sqrt{s}$ = 13 TeV at the LHC. The analysis is based on data collected with the CMS detector in 2016-2018 corresponding to an integrated luminosity of 140 fb$^{-1}$. The branching fraction of the B$^0_\mathrm{S}$ $\to$ $\mu^+\mu^-$ decay and the effective B$^0_\mathrm{S}$ meson lifetime are the most precise single measurements to date. No evidence for the B$^0$ $\to$ $\mu^+\mu^-$ decay has been found. All results are found to be consistent with the standard model predictions and previous measurements.


Introduction
Rare B meson decays provide a sensitive probe for beyond-the-standard-model (BSM) effects and allow exploring energy scales much higher than the ones directly accessible at the CERN LHC. A key factor in the success of these studies is the availability of precise theoretical predictions for experimentally accessible processes. The leptonic decays B 0 s → µ + µ − and B 0 → µ + µ − represent such a case, where precise theoretical predictions can be matched with a clear experimental signature. These rare decays are examples of flavor changing neutral current processes, which are strongly suppressed in the standard model (SM), making them sensitive to BSM physics contributions. In this Letter, when decays are mentioned, charge-conjugated decay modes are implied.
These predictions include next-to-leading order corrections of electroweak origin and next-tonext-to-leading order quantum chromodynamics (QCD) corrections. The largest contribution to the theoretical uncertainty is from the determining of the Cabibbo-Kobayashi-Maskawa matrix element values, in particular |V cb |. The issue can be mitigated by considering a ratio of the branching fraction to the mass difference of the heavy and light B 0 s and B 0 mesons [4]. A number of experiments at e + e − and hadron colliders have searched for these decays, but only recently the first observation of the B 0 s → µ + µ − decay was reported in a combined analysis of data taken by the CMS and LHCb Collaborations [5], which was later confirmed by the ATLAS [6], CMS [7], and LHCb [8-10] experiments individually. Currently, the most precise measurement of the B 0 s → µ + µ − branching fraction is achieved in a combined analysis of data from the three experiments [11]. The analysis shows a deviation from the SM prediction at the level of 2.4 standard deviations (σ) for the B 0 s → µ + µ − branching fraction. No significant detection of the B 0 → µ + µ − decay has been reported so far.
A few recent measurements of the semileptonic b → sℓ + ℓ − processes (where lepton ℓ = e or µ) have reported disagreements at the level of 2-3 σ with the SM predictions. Deviations were found in measurements of the differential branching fractions of the B → K * µ + µ − and B 0 s → ϕµ + µ − decays [12][13][14][15], angular observables in the B → K * µ + µ − decays [16,17], and in searches for lepton flavor universality violation via measurements of the R K and R K * ratios [18][19][20][21][22]. Not all the individual measurements confirm these observations though [23,24] and the revised R K and R K * measurements are found to be consistent with the SM [25,26].
In the framework of an effective field theory, the b → sℓ + ℓ − decays are dominated by the semileptonic operators O 9 = (s L γ µ b L )(ℓγ µ ℓ) and O 10 = (s L γ µ b L )(ℓγ µ γ 5 ℓ). The BSM physics contributions could be observed as deviations of the corresponding Wilson coefficients (C 9 and C 10 ) from their SM values. As the B 0 s → µ + µ − and B 0 → µ + µ − decays are dominated by the O 10 operator, they may be sensitive to the same effects. In contrast to the semileptonic case, the nonperturbative hadronic contributions for leptonic B meson decays enter solely through decay constants f B and f B S , which are precisely known from lattice QCD calculations [27], making the theory calculations more robust. Therefore, a precise measurement of the B 0 s → µ + µ − decay properties may have a big impact on the interpretation of these anomalies.
The effective lifetime of the B 0 s meson measured in the B 0 s → µ + µ − decay is an independent and theoretically clean probe for BSM physics [28]. In the SM, the heavy (B 0 s,H ) and light (B 0 s,L ) mass eigenstates are linear combinations of the flavor eigenstates. The lifetimes of the heavy and light mass eigenstates are τ H = 1.624 ± 0.009 ps and τ L = 1.429 ± 0.007 ps, respectively [29]. Only the heavy B 0 s,H mass eigenstate can decay to the µ + µ − final state. This is because, in the absence of charge-conjugation and parity (CP) violation in the B 0 s mixing, the B 0 s mass eigenstates are also CP eigenstates, with the heavier one being CP odd, while µ + µ − is also a CP-odd final state. Therefore, any significant deviation of the measurement from the established value for the B 0 s,H lifetime would indicate a BSM physics contribution. Currently, the most precise measurement of the B 0 s meson lifetime (τ) in the B 0 s → µ + µ − decay, τ(B 0 s → µ + µ − ) = 2.07 ± 0.29 ps, comes from the LHCb experiment [9, 10]. In this Letter, we report on new measurements of the B 0 s → µ + µ − decay properties and a search for the B 0 → µ + µ − decay based on proton-proton collision data collected by the CMS experiment in 2016-2018 at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 140 fb −1 . This new analysis uses improved techniques compared to the earlier publication based on 2011-2012 and 2016 data [7]; consequently the 2016 data sample has been re-analyzed, and the new results supersede the ones from Ref. [7]. Given that the sensitivity of the 2016-2018 data significantly exceeds that of the 2011-2012 sample, no attempt is made to combine the new results with those from 2011-2012 data. Tabulated results are provided in the HEPData record for this analysis [30].

Data analysis overview
The data analysis strategy employed in this measurement is based on the previous CMS studies of the B 0 s → µ + µ − and B 0 → µ + µ − decays. We made several modifications and improvements to increase the analysis sensitivity, while also benefiting from the large amount of data collected in 2017-2018. Because of the increased sensitivity, new methods were developed to achieve a better understanding of various systematic effects.
Leptonic B meson decays are reconstructed by combining two oppositely charged muons, performing a common vertex fit, and imposing selection criteria to separate small signals from large backgrounds. Here and in what follows, we use the notation B to denote either the B 0 or B 0 s meson. The dominant background sources are the combinatorial background where the two muons originate from two different heavy quarks, the partially reconstructed semileptonic decays where both muons originate from the same B meson, and the peaking background coming from the charmless two-body hadronic decays of B mesons.
The combinatorial and partially reconstructed backgrounds are the main limiting factors in the analysis sensitivity. Despite being reducible backgrounds with several distinct features, they are copious, which makes it difficult to reject them completely without losing a significant fraction of signal events. To maximize the analysis sensitivity, we perform a multivariate analysis (MVA) combining multiple discriminating observables in a single powerful discriminator (d MVA ) using a boosted decision tree algorithm. The training and performance evaluation of the MVA are described in Section 6.
The charmless two-body decays, such as B 0 → K + π − and B 0 s → K + K − , could mimic signal when both charged hadrons are misidentified as muons. We measure the misidentification probabilities in data using the K 0 S → π + π − , ϕ(1020) → K + K − , and Λ → pπ − decays after restricting the decay distance to match that of the B mesons. We find a reasonable agreement between the data and Monte Carlo (MC) simulation for pions and kaons, where the misidentification probability is dominated by pion and kaon decays to a muon and a muon antineutrino. The probability to misidentify protons as muons is an order of magnitude smaller according to simulation, and found to be even smaller in data, thus making contributions from the associated processes unimportant. With a stringent muon identification based on a multivariate analysis [7], we reduce the charmless two-body backgrounds to a negligible level compared to the dominant combinatorial background.
The results are extracted using simultaneous unbinned maximum likelihood (UML) fits in multiple categories (discussed in Section 7). For the branching fraction measurements, we perform a two-dimensional (2D) fit using the dimuon invariant mass (m µ + µ − ) and its uncertainty as the observables. For the lifetime extraction, we perform a three-dimensional (3D) fit using the dimuon mass, the decay time, and the decay time uncertainty as the observables.
Given the poor precision in the knowledge of the bb cross section at the LHC, a direct extraction of the branching fractions of the B 0 s → µ + µ − and B 0 → µ + µ − decays would be affected by a large uncertainty. As commonly done in B physics analyses, the signal branching fraction is instead calculated by normalizing it to another decay channel of a B meson, for which the branching fraction is well known and whose characteristics allow for a precise reconstruction with small backgrounds. The B + → J/ψK + decay with J/ψ → µ + µ − is the best candidate in our case as a normalization channel. We also consider B 0 s → J/ψϕ(1020) decays with ϕ(1020) → K + K − as a cross-check of the nominal result. The signal branching fractions B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) can then be extracted as where N X is the number of the candidates of decay X, as extracted from the fit, and ε X is the corresponding full selection efficiency derived from MC simulation. In addition, f u , f d , and f s are the production fractions for B + , B 0 , and B 0 s mesons, respectively. The ratio f u / f d is expected to be 1 in the SM due to isospin symmetry. The ratio f s / f u , as well as B(B + → J/ψK + ) and B(B 0 s → J/ψϕ(1020)), are external inputs discussed in Section 9. Another advantage of measuring the branching fractions with respect to other decays is that it allows for a cancellation of many systematic uncertainties in the selection and reconstruction efficiencies of the signal and normalization channels.
To reduce unintentional bias, the analysis employs a "data blinding" technique. All optimization studies are performed using signal from MC simulation and background from data that does not include events with a dimuon mass of 5.15-5.50 GeV. Once the selection criteria and measurement procedure have been finalized, the data are unblinded.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity η coverage provided by the barrel and endcap detectors. Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [31].
The silicon tracker used in 2016 measured charged particles within the range |η| < 2.5. For nonisolated particles with the transverse momentum of 1 < p T < 10 GeV and |η| < 1.4, the track resolutions were typically 1.5% in p T and 25-90   Muons are measured in the range |η| < 2.4, with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive-plate chambers. Matching muons to tracks measured in the silicon tracker results in a relative p T resolution, for muons with p T up to 100 GeV, of 1% in the barrel and 3% in the endcaps [35].
Events of interest are selected using a two-tiered trigger system. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed latency of about 4 µs [36]. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage [37].

Data and Monte Carlo simulation
We split the data collected with the CMS detector in 2016-2018 into four distinct periods: 2016a, 2016b, 2017, and 2018. The integrated luminosities of the corresponding samples are 20.0, 16.6, 42.0, and 61.3 fb −1 [38][39][40]. Data from the 2016a period were affected by strip tracker dynamic hit inefficiency. The problem was resolved in August 2016 and the rest of the 2016 data-taking period is referred to as 2016b. Before the 2017 data-taking period, the pixel detector was upgraded, improving the acceptance, redundancy, and resolution. During data taking in 2017, the pixel detector had synchronization issues at the beginning of the run, followed by DC-DC converter failures [33] leading to about 11% of the detector being unresponsive. Most of these issues were resolved before data taking began in 2018.
We use multiple samples of MC simulated events to evaluate the signal efficiency, the detector response, and the background yields. The simulated event samples are generated with PYTHIA 8.212 [41] using the CP5 underlying event tune [42] and propagated through the CMS detector model using the GEANT4 [43] package. The decays of B hadrons are simulated using the EVT-GEN 1.3.0 [44] program and final-state photon radiation is described using PHOTOS 3.56 [45].
Multiple interactions within the same or nearby bunch crossings (pileup) are simulated for all samples by overlapping the hard-scattering event with several minimum bias events, with the multiplicity similar to the one observed in data (averaging to 23 for 2016 and 32 for 2017-2018).

Event reconstruction and selection
The events used in this analysis were collected with a set of dimuon triggers designed to select B → µ + µ − , B + → J/ψK + , and B 0 s → J/ψϕ(1020) events. To achieve an acceptable trigger rate, the first-level trigger required two high-quality [36] oppositely charged muons restricted to |η| < 1.5. At the high-level trigger, a high-quality dimuon secondary vertex (SV) [37] was required and the events were restricted to mass ranges of 4.5-6.0 GeV and 2.9-3.3 GeV for the B and J/ψ mesons, respectively. The J/ψ triggers additionally required the SV to be displaced from the beam spot (defined as the average interaction point in the plane transverse to the beams) and the displacement vector to be aligned with the dimuon momentum.
The event candidate selection starts with the reconstruction of dimuon candidates, which are used to build different B mesons for the signal and normalization channels. The selection is kept as similar as possible for all the channels to benefit from the cancellation of systematic effects in the ratio of the efficiencies used for the branching fraction normalization. Both muons are required to have a high-quality inner track [32] with p T > 4 GeV and |η| < 1.4, and pass the tight muon identification requirements [7, 35], which suppress misidentified muons from pion and kaon decays, and from other sources. Extra single and double kaons with p T > 1 GeV and |η| < 2.5 are required for the B + → J/ψK + and B 0 s → J/ψϕ(1020) normalization channels, respectively.
We obtain B candidate properties by employing the kinematic fitter described in Ref. [46]. We apply different kinematic constraints depending on the final state. For the B 0 s → µ + µ − and B 0 → µ + µ − decays, we use the vertex constraint, while for the B + → J/ψK + and B 0 s → J/ψϕ(1020) decays we also add a mass constraint for the J/ψ candidate.
From the B candidate's decay vertex and its momentum we build a refitted trajectory representing the B candidate. Then, for each reconstructed primary vertex (PV) in the event, the trajectory is extrapolated to the point closest to that vertex. The absolute distance between the closest point and the PV in 3D space is defined as the impact parameter. The PV with the smallest impact parameter is selected as the best PV for the B candidate. Table 1 shows a summary of the B candidate selection requirements for the signal extraction and normalization UML fits. The 3D SV displacement significance is defined as the 3D distance between the PV and the dimuon SV, divided by its corresponding uncertainty. The 2D µ + µ − pointing angle α is defined as the angle between the dimuon momentum and the line connecting the PV and SV, and is calculated in the transverse plane with respect to the beam direction. The pointing angle requirement is introduced to match the cos α > 0.9 requirement used in the J/ψ triggers.

Multivariate analysis
Most of the observables used to distinguish signal from background have rather weak discriminating power. Therefore, we employ an MVA to combine them into a single, more powerful discriminator d MVA . Compared to the previous analysis [7], we have relaxed the preselection requirements, developed new discriminating observables, added significantly more data for the model training, and used a more advanced machine learning algorithm. This allows us to significantly improve the analysis sensitivity, achieving the same level as in the previous measurement with just ∼60% of the previous data.
Inputs to the MVA are split into three major classes. The first class includes pointing angles, which are defined as the angles between the B candidate momentum and the line connecting Table 1: Selection requirements for the three decay channels used in the signal yield and normalization fits. Addition selection requirements are applied for the B + → J/ψK + control sample used in systematic studies.
the PV and SV, either in 2D or 3D. We use both definitions since the 2D version benefits from a smaller uncertainty in the vertex position and the 3D version provides additional matching information along the beam line. These observables are effective at rejecting all types of backgrounds, except for the ones originating from the two-body decays.
The second class of observables is related to the SV. The dimuon candidates from the combinatorial background tend to be associated with a low-quality SV fit. Therefore, the SV probability, calculated using the χ 2 and the number of degrees of freedom of the fit, is one of the most powerful discriminating variables. Furthermore, we achieve additional background suppression by rejecting events that contain a better-quality SV formed by one of the muons and any track in the event. Finally, most of the misreconstructed SVs tend to be close to the PV and therefore can be rejected using the SV displacement information.
The last class of observables is designed to detect nearby decay products present in semileptonic decays of b and c hadrons. We compute the number of tracks compatible with the µ + µ − SV. In addition, isolation variables are calculated for the B candidate as well as both muon candidates. The isolation observables have been optimized in the previous studies [7] to maximize the separation power between the B 0 s → µ + µ − signal and the background, while maintaining a reasonable agreement between the MC simulation and the data for the B + → J/ψK + normalization channel. The isolation is defined as where p T is the momentum of either the B or muon candidate and the sum includes all chargedparticle tracks with p T > 0.9 GeV in a cone of radius ∆R = √ (∆η) 2 + (∆φ) 2 around the candidate momentum direction, with φ representing the azimuthal angle in radians. Only the charged particles that do not belong to the candidate and are associated with the same PV are considered. The ∆R is required to be smaller than 0.7 for the B candidate isolation and 0.5 for the muon isolation.
For the MVA training, we employ the XGBOOST library [47], which implements an advanced gradient boosting algorithm. The training is performed on a mixture of simulated B 0 s → µ + µ − signal events and background events in data selected using the sidebands of the dimuon mass distribution consisting of two regions: 5.5-5.9 GeV populated by the combinatorial background and 4.9-5.1 GeV representing a combination of the partially reconstructed and combinatorial backgrounds. The events are split into training and testing categories in a 2:1 proportion. To reuse all available events, we train three classifiers by assigning events to one of the categories based on their event number modulo 3. This allows us to classify all events in data, making sure that no event was evaluated by a classifier that was trained on the event itself.
The B + → J/ψK + selection requirements in Table 1 define the normalization sample used to extract the final branching fractions. We also define a B + → J/ψK + control sample. This sample starts with an additional selection requirement of kaon p T < 1.5 GeV, which effectively requires the kaon to carry only a small fraction of the parent B meson momentum, providing a better match with the B → µ + µ − kinematic distributions.
Even with the kaon p T < 1.5 GeV requirement, there are still differences in the kinematic distributions for the two decays, which may have an impact on the analysis. The most important one is the difference in the invariant mass of the B 0 s and J/ψ mesons, which has a significant impact on the opening angle between the two muons. The J/ψ → µ + µ − decay tends to have a larger uncertainty in the µ + µ − vertex position along the dimuon momentum. Therefore the SV significance for the B + → J/ψK + events needs to be scaled by a factor of ∼1.6 to match the We use the B + → J/ψK + control sample to measure the MVA performance in data. To achieve the best matching between the d MVA distributions for the B 0 s → µ + µ − and B + → J/ψK + decays, we need to select appropriate input observables for the MVA in the B + → J/ψK + control sample. For the pointing angle and the impact parameter we use the µ + µ − K + final-state observables, since otherwise we would have an incorrect B candidate momentum vector. We also ignore the kaon track in all the isolation calculations and extra track counting. For the remaining inputs, we rely on the µ + µ − observables only. Figure 1 shows a comparison of the d MVA distributions for the B + → J/ψK + simulation and data. The data plots have backgrounds subtracted using the sPlot technique [48] applied to the UML fits of the B + → J/ψK + invariant mass distributions. We observe good agreement between the MC simulation and data for the 2016a and 2016b samples. The agreement is worse for the 2017 and 2018 samples. We derive corrections to the efficiency of the d MVA selection requirements, defined in the caption of Table 2, in two different ways. The first ("Ratio") method derives the corrections using the B + → J/ψK + data and MC samples and applies them to the B 0 s → µ + µ − and B 0 → µ + µ − efficiencies. The second ("XGBOOST") method is based on the idea of reweighting the MC simulation samples to match the data. We were not able to find a single variable that would allow us to compensate for the discrepancy. Therefore we developed an approach using the XGBOOST algorithm to train a classifier on the difference between the simulation and data in B + → J/ψK + events and use it to reweight the simulated B → µ + µ − events. We trained the XGBOOST classifier using the same inputs that we use for the MVA. The backgrounds are subtracted from the data via the sPlot technique, as described above. The corrections from the two methods are summarized in Table 2. In general, the two methods give results compatible within 1-2σ. We use the results from the XGBOOST method as a default, and take the difference between the results of the two methods as a systematic uncertainty.

Data analysis
In order to extract results we perform a set of UML fits: the B + → J/ψK + and B 0 s → J/ψϕ(1020) yield fits, the simultaneous B 0 s → µ + µ − and B 0 → µ + µ − branching fraction fit, and the B 0 s → µ + µ − effective lifetime fit.
For the branching fraction measurement, we perform a 2D fit of the dimuon invariant mass and the relative mass resolution distributions within multiple event categories. The events are categorized using the following three independent criteria: The likelihood consists of five components: the B 0 s signal, the B 0 signal, the partially reconstructed semileptonic background, the peaking B → h + h − background, where h represents a hadron, and the combinatorial background. The statistical uncertainties from the MC simulation and the systematic uncertainties are propagated to the final results by introducing nuisance parameters, which are profiled during the fit.
The signal components are modeled by Crystal Ball functions [49] and include the per-event mass resolution in the parameterization. The mass resolution and scale are calibrated using the J/ψ → µ + µ − and Υ(1S) → µ + µ − data samples. All the parameters of the signal model are fixed in the fit, except for the width of the Crystal Ball function, which is a conditional parameter proportional to the dimuon mass resolution.
The semileptonic background is dominated by the three-body B → h − µ + ν and B → hµ + µ − processes. The contribution of this component is determined from the simulation of B 0 → π − µ + ν, B 0 s → K − µ + ν, B + → π + µ + µ − , and B 0 → π 0 µ + µ − decays. Contributions from Λ b → pµ − ν and B + c → J/ψµ + backgrounds are found to be negligible compared with the uncertainty in the dominant background normalization. The shape of the semileptonic background is a Gaussian function with the mass and width parameters allowed to vary in the fit to the data.
The peaking B → h + h − background is represented by a sum of Gaussian and Crystal Ball functions with a common central mean value. The parameters used in the fit to the data are fixed from the results of a fit to the distribution obtained from MC simulation of B 0 → K + K − , K + π − , π + π − , and B 0 s → K + K − , K − π + , π + π − decays, with branching fractions from Ref.
[29]. The yields of the semileptonic and peaking background components are calculated using the normalization channel and the corresponding MC simulation with the corrected efficiencies. Systematic uncertainties of 25% and 50% are assigned to the semileptonic and peaking backgrounds, respectively, to account for uncertainties in the rate of misidentifying charged hadrons as muons. The nuisance parameters in the yield calculation, such as the ratio of the efficiencies and the normalization of the B + → J/ψK + process, are constrained using Gaussian or log-normal priors, according to the corresponding systematic uncertainties.
The combinatorial background is modeled by a linear function (constrained to be positive in the entire fit range). The yields and the slopes of the combinatorial background are free parameters in the UML.
We estimate the expected performance of the branching fraction measurement via an ensemble of pseudo-experiments generated using the SM values for the branching fractions and the lifetime. The relative uncertainties, which include the systematic uncertainties described in Section 8, are expected to be +11.1 −10.5 % in B(B 0 s → µ + µ − ) and +67 −62 % in B(B 0 → µ + µ − ). To extract the effective lifetime of the B 0 s meson in the B 0 s → µ + µ − decay, we perform a 3D UML fit to the dimuon invariant mass, decay time, and decay time uncertainty, dividing the data by data-taking period, d MVA value, and rapidity of the most forward muon, as we do for the branching fraction fit. The signal acceptance as a function of the decay time is extracted from simulation and corrected with the B + → J/ψK + decays in data. To minimize the differences between the two channels, we use the B + → J/ψK + control sample defined in Section 6, along with its MVA. The combinatorial background decay time distribution was obtained from a mass sideband in data. The decay time uncertainty is calculated for each event and is used as an observable in the fit. The decay time uncertainty models are obtained from the simulation samples and sideband data. Using pseudo-experiments generated with a complete B 0 s → µ + µ − model, the expected total uncertainty in the lifetime is found to be +0.18 −0.16 ps.

Systematic uncertainties 8.1 Branching fraction measurement
The branching fraction measurements have multiple sources of experimental and theoretical systematic uncertainties. The experimental uncertainties are dominated by the uncertainty in the B → µ + µ − signal efficiency corrections due to mismodeling of d MVA in MC simulation, the kaon reconstruction and selection efficiency for the B + → J/ψK + and B 0 s → J/ψϕ(1020) normalization measurements, and the trigger efficiency difference between the signal and nor-malization channels. The uncertainties in the branching fractions of the B + → J/ψK + and B 0 s → J/ψϕ(1020) decays, as well as in f s / f u , are considered to be external uncertainties, which are factorized out in the final results.
The signal efficiency corrections for mismodeling of the d MVA distribution are estimated with two different methods described in Section 6. The two methods give results compatible with each other. Based on the difference between the two methods, we assign a 2 (3)% systematic uncertainty in the corrections for the B 0 s → µ + µ − and B 0 → µ + µ − signal efficiencies for the d MVA > 0.90 (0.99) selection ("d MVA correction").
The hadron tracking efficiency uncertainty is obtained by comparing the ratio of the measured branching fractions of the two-body D 0 → K − π + to the four-body D 0 → K − π + π + π − decays to the world average value [29]. This gives a "Tracking efficiency" uncertainty of 2.3% [50] for each kaon.
As a result of the different kinematics and triggers for the signal and normalization channels, the trigger efficiency effects do not fully cancel and are corrected using MC simulation. The simulation of the trigger efficiency is checked by comparing with the efficiency measured using data obtained from other triggers. The observed differences between simulation and data are used to establish a "Trigger efficiency" systematic uncertainty of 3.7 (2.4)% for 2016 (2017-2018).
There are a few more systematic uncertainties in the measurement. The fit bias is extracted from the difference between the branching fraction obtained from the pseudo-experiments and the input SM value, combined with the variation caused by using different background models in the fit ("Fit bias"). The shape uncertainty in the normalization channel is derived by using different signal templates in the yield fits ("B + → J/ψK + shape uncertainty"). The pileup uncertainty is extracted from the difference in the efficiency performance derived using the pileup distribution in data and in MC simulation ("Pileup"). The normalization channels use a tighter SV probability requirement than the signal channel because of the different triggers. The corresponding uncertainty is evaluated by comparing the efficiency difference between the tighter and the signal channel SV probability requirement (>0.100 with respect to >0.025) in the data and MC simulation in B + → J/ψK + events ("Vertex quality requirement"), where the data is from a sample that is triggered without the SV probability requirement. Table 3 summarizes the systematic uncertainties for the branching fraction measurements using B + → J/ψK + events for normalization, including the B + → J/ψK + branching fraction uncertainty [29]. For the B 0 s → µ + µ − branching fraction measurement with the B 0 s → J/ψϕ(1020) normalization, the systematic uncertainty in the tracking efficiency is doubled to 4.6% due to the presence of two kaons in the final state, and the shape uncertainty is found to be 1.5%. At the same time, this measurement could be free from explicitly taking into account the B production fraction systematic uncertainty, as discussed in Section 9.
The lifetime of the B 0 s meson has a significant impact on the signal efficiency for the B 0 s → µ + µ − decays. The branching fraction measurements are reported assuming the SM value for the lifetime (1.61 ps). Since the lifetime affects the branching fraction measurements, we provide a correction for alternative lifetime hypotheses. The scale factor for the branching fraction is 1.577 − 0.358τ, where τ is the B 0 s meson lifetime in ps.

Lifetime measurement
The dominant source of systematic uncertainty in the lifetime measurement is associated with mismodeling of the correlation between the d MVA and the decay time. This correlation stems from the most discriminating MVA variables, the pointing angle and its uncertainty, both of Table 3: Summary of the systematic uncertainties for the B 0 s → µ + µ − and B 0 → µ + µ − branching fraction measurements. which are strongly correlated with the decay time. The correlation enters via the decay distance: the larger the decay distance is, the better one knows the direction from the PV to SV. As the decay distance gets shorter, the uncertainty in the pointing angle increases, making such events harder to distinguish from the background. Mismodeling of these correlations in simulation can have a significant impact on the decay time distribution.
The decay time is also correlated with many selection requirements. Most of them are well simulated. We measure the lifetime bias in B + → J/ψK + events using a relaxed selection requirement, d MVA > 0.90, and compare it to the prediction from simulation. We find the bias for the lifetime measurement to be 0.04-0.05 ps, depending on the data-taking period ("Lifetime fit bias").
For the final selection, we derive a correction as a ratio of the decay time distributions for the d MVA > 0.99 and d MVA > 0.90 requirements using B + → J/ψK + events in data. Then we apply this correction to the B 0 s → µ + µ − decay time distribution extracted from simulation using d MVA > 0.90 as the selection requirement. Repeating the procedure using simulated events, we find that the method may introduce a bias of up to 0.10 ps in 2016 data. The bias is found to be much smaller in 2017 and 2018 data. These effects ("Decay time distribution mismodeling") are taken into account in the lifetime fit by introducing independent nuisance parameters in the fit model. Two additional minor systematic uncertainties are also included. The uncertainty from the imperfect parameterization of the efficiency dependence on the decay time is derived using different analytical functions in the lifetime fit to B + → J/ψK + events ("Efficiency modeling"). We also measure the lifetime in the MC samples generated with different lifetimes from the pseudo-experiments while sharing the same efficiency function. The difference between the measured lifetime and the input lifetime of the MC samples is assigned as a systematic uncertainty ("lifetime dependence"). Table 4 summarizes the systematic uncertainties in the lifetime measurement. The uncertainties of 2017 and 2018 data-taking periods are treated as correlated and other two are treated as uncorrelated.

Results
Using the result of the B + → J/ψK + normalization fit with Eqs. (1) and (3), we find the branching fractions to be: The correlation between the extracted B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) branching fractions is −0.120. These results are based on the following external inputs: The branching fractions are taken from Ref. [29]. The f s / f u ratio is derived from the p Tdependent measurement of the f s / f u ratio by LHCb [51]. We are using the p T distribution observed in this measurement, shown in Fig. 2, to compute an effective f s / f u ratio for the corresponding phase space. Measurements of the p T and η dependence of the ratio by the CMS Collaboration [52] are found to be consistent with the LHCb results.
The mass projections of the likelihood fits with all four data-taking periods combined together are shown in Fig. 3. The event yields for each component of the fit are summarized in Table 5. The profile likelihood as a function of the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions for 1D and 2D cases are shown in Fig. 4.
We also estimate the branching fractions using the B 0 s → J/ψϕ(1020) decays for the normalization. While this result is free from the explicit systematic uncertainty in the f s / f u ratio, it depends on the B 0 s → J/ψϕ(1020) branching fraction. At the moment, this branching fraction measurement uses the f s / f u ratio measurement as an input, but this dependence may be eliminated when new independent measurements of the B 0 s → J/ψϕ(1020) branching fraction become available, such as the measurement planned by the Belle II Collaboration at the KEKB e + e − collider [53] using the Υ(5S) data. Experimentally, the measurement based on the B 0 s → J/ψϕ(1020) normalization channel has slightly larger systematic uncertainties due to the presence of the second kaon in the final state.
The UML fit projection on the decay time axis for the signal region 5.28 < m µ + µ − < 5.48 GeV is shown in Fig. 6. The observed lifetime is consistent with the world average value of 1.624 ± 0.009 ps [29] within 1 σ, and therefore we do not correct the corresponding selection efficiency when performing the branching fraction measurement.    (N(B 0 )), the combinatorial background (N(comb)), the peaking background (N(peak)), and the semileptonic background (N(semi)) are summarized for each category (post-fit). The total expected and observed event yields are given in N(total) and Data column, respectively. Regions 0 and 1 refer to the ranges of 0.0-0.7 and 0.7-1.4, respectively, for the |η| of the most forward muon. The uncertainties are statistical only. The search for the B 0 → µ + µ − decay has not revealed a significant event excess with respect to the dominant combinatorial background prediction. The 95% confidence level upper limit on the branching fraction is found to be B(B 0 → µ + µ − ) < 1.9 × 10 −10 at 95% CL.
More data will be required to establish its existence and compare the result with the SM predictions.
The uncertainties in the branching fraction and effective lifetime measurements are dominated by the statistical component, which means that significant improvements can be expected in the precision of future measurements with the LHC Run 3 data.
The effective B 0 s meson lifetime measurement in the B 0 s → µ + µ − decay has achieved a precision comparable with the lifetime difference between the heavy and light B 0 s meson mass eigenstates, thus offering sensitivity to potential beyond-the-SM physics effects in the effective lifetime.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid and other centres for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the following funding agencies:  [5] CMS and LHCb Collaborations, "Observation of the rare B 0 s → µ + µ − decay from the combined analysis of CMS and LHCb data", Nature 522 (2015) 68, doi:10.1038/nature14474, arXiv:1411.4413. [7] CMS Collaboration, "Measurement of properties of B 0 s → µ + µ − decays and search for B 0 → µ + µ − with the CMS experiment", JHEP 04 (2020) 188, doi:10.1007/JHEP04(2020)188, arXiv:1910.12127.
[34] CMS Collaboration, "Track impact parameter resolution for the full pseudorapidity coverage in the 2017 dataset with the CMS Phase-1 pixel detector", CMS Detector Performance Report CMS-DP-2020-049, 2020.