Search for decays of neutral beauty mesons into four muons

A search for the non-resonant decays $B^0_s \rightarrow \mu^{+}\mu^{-}\mu^{+}\mu^{-}$ and $B^0 \rightarrow \mu^{+}\mu^{-}\mu^{+}\mu^{-}$ is presented. The measurement is performed using the full Run 1 data set collected in proton-proton collisions by the LHCb experiment at the LHC. The data correspond to integrated luminosities of $1$ and $2~\mathrm{fb}^{-1}$ collected at centre-of-mass energies of $7$ and $8~\mathrm{TeV}$, respectively. No signal is observed and upper limits on the branching fractions of the non-resonant decays at $95\%$ confidence level are determined to be \mathcal{B}(B^0_s \rightarrow \mu^{+}\mu^{-}\mu^{+}\mu^{-})&<2.5 \times 10^{-9} \mathcal{B}(B^0 \rightarrow \mu^{+}\mu^{-}\mu^{+}\mu^{-})&<6.9 \times 10^{-10}.


Introduction
The rare decays B 0 (s) → µ + µ − µ + µ − proceed through b → d(s) flavour-changing neutralcurrent processes, which are strongly suppressed in the Standard Model (SM). 1 In the main non-resonant SM amplitude, one muon pair is produced via amplitudes described by electroweak loop diagrams and the other is created by a virtual photon as shown in figure 1(a). The branching fraction of the non-resonant B 0 s → µ + µ − µ + µ − decay is expected to be 3.5 × 10 −11 [1].
Theories extending the SM can significantly enhance the B 0 (s) → µ + µ − µ + µ − decay rate by contributions of new particles. For example, in minimal supersymmetric models (MSSM), the decay can proceed via new scalar S and pseudoscalar P sgoldstino particles, which both decay into a dimuon final state as shown in figure 1(b). There are two types of couplings between sgoldstinos and SM fermions. Type-I couplings describe interactions between a sgoldstino and two fermions, where the coupling strength is proportional to the fermion mass. Type-II couplings describe a four-particle vertex, where a scalar and a pseudoscalar sgoldstino interact with two fermions. Branching fractions up to B(B 0 s → SP ) ≈ 10 −4 and B(B 0 → SP ) ≈ 10 −7 are possible [2]. Sgoldstinos can decay into a pair of photons or a pair of charged leptons [3]. In this analysis the muonic decay is considered, as the coupling to electrons is smaller and the large τ -lepton mass limits the available phase space. The branching fractions of the sgoldstino decays strongly depend on the model parameters such as the sgoldstino mass and the supersymmetry breaking scale. In the search for Σ + → pµ + µ − decays the HyperCP collaboration found an excess of events, 1 The inclusion of charge-conjugate processes is implied throughout. which is consistent with the decay Σ + → P p with P → µ + µ − and a pseudoscalar mass of m(P ) = 214.3 ± 0.5 MeV [4].
So far only limits on the SM and MSSM branching fractions at 95% confidence level have been measured by LHCb based on the data recorded in 2011 [5] to be These limits are based on assumed sgoldstino masses of m(S) = 2.5 GeV/c 2 , which is approximately the central value of the allowed mass range, and m(P ) = 214.3 MeV/c 2 .
The dominant SM decays of neutral B mesons into four muons proceed through resonances like φ, J/ψ and ψ(2S). The most frequent of these decays is B 0 s → J/ψ φ, where both the J/ψ and the φ mesons decay into a pair of muons, as shown in figure 1(c). In the following, this decay is referred to as the resonant decay mode and treated as a background. From the product of the measured branching fractions of the underlying decays B(B 0 s → J/ψ φ), B(J/ψ → µ + µ − ), and B(φ → µ + µ − ) [6] its branching fraction is calculated to be (1.83 ± 0.18) × 10 −8 .
In this paper a search for the non-resonant SM process, and for the MSSM-induced B 0 (s) → µ + µ − µ + µ − decays is presented, using pp collision data recorded by the LHCb detector during LHC Run 1. Potentially contributing sgoldstinos are assumed to be short lived, such that they do not form a displaced vertex. The analysed data correspond to integrated luminosities of 1 and 2 fb −1 collected at centre-of-mass energies of 7 and 8 TeV, respectively. The branching fraction is measured relative to the decay B + → J/ψ (→ µ + µ − )K + , which gives a clean signal with a precisely measured branching fraction [6]. This yields a significant improvement compared to the previous measurement, where the use of the B 0 → J/ψ K * 0 decay as normalisation mode resulted in a large systematic uncertainty originating from the S-wave fraction and the less precisely measured branching fraction. The advantage of normalising to a well-known B meson decay is that dominant systematic uncertainties originating mainly from the bb cross-section cancel in the ratio. The LHCb detector [7,8] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter, is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov (RICH) detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
In the simulation, pp collisions are generated using Pythia [9, 10] with a specific LHCb configuration [11]. Decays of hadronic particles are described by EvtGen [12], in which final-state radiation is generated using Photos [13]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [14,15] as described in ref. [16].

Event selection
The online event selection is performed by a trigger, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. In this analysis candidate events are first required to pass the hardware trigger, which for 7 TeV (8 TeV) data selects events with at least one muon with a transverse momentum of p T > 1.48 GeV/c (p T > 1.76 GeV/c) or at least one pair of muons with the product of the transverse momenta larger than (1.296) 2 GeV 2 /c 2 ((1.6) 2 GeV 2 /c 2 ). In the subsequent software trigger, at least one of the final-state particles is required to have p T > 1 GeV/c and an impact parameter larger than 100 µm with respect to all PVs in the event.
In the offline selection, the B 0 (s) decay vertex is constructed from four good quality muon candidates that form a common vertex and have a total charge of zero. The vertex is required to be significantly displaced from any PV. Among the four final-state muons, there are four possible dimuon combinations with zero charge. In all four combinations, the mass windows corresponding to the φ (950-1090 MeV/c 2 ), J/ψ (3000-3200 MeV/c 2 ) and ψ(2S) (3600-3800 MeV/c 2 ) resonances are vetoed. This efficiently suppresses any background from any of the three mentioned resonances to a negligible level. Contributions of other charmonium states are found to be negligible.

JHEP03(2017)001
The MatrixNet (MN) [17], a multivariate classifier based on a Boosted Decision Tree [18,19], is applied in order to remove combinatorial background, where a candidate B 0 (s) vertex is constructed from four muons that do not originate from a single B meson decay. The input variables are the following properties of the B 0 (s) candidate: the decay time, the vertex quality, the momentum and transverse momentum, the cosine of the direction angle (DIRA), and the smallest impact parameter chisquare (χ 2 IP ) with respect to any PV, where χ 2 IP is defined as the difference between the vertex-fit χ 2 of a PV reconstructed with and without the B 0 (s) candidate. The DIRA is defined as the angle between the momentum of the reconstructed B 0 (s) candidate and the vector from the PV with the smallest χ 2 IP to the B 0 (s) decay vertex. As training samples, simulated B 0 s → µ + µ − µ + µ − and B 0 → µ + µ − µ + µ − events, generated with a uniform probability across the decay phase space, are used as a signal proxy. Before training, the signal simulation is weighted to correct for known discrepancies between data and simulation as described later. The background sample is taken from the far and the near sidebands in data as defined in table 1. In order to verify that the classification of each event is unbiased, 10-fold cross-validation [20] is employed.
Background arising from misidentifying one or more particles is suppressed by applying particle identification (PID) requirements. Information from the RICH system, the calorimeters and the muon system is used to calculate the difference in log-likelihood between the hypothesis of a final-state particle being a pion or a muon, DLL µπ .
Events in the signal region are not examined until the analysis is finalised. Events outside the signal region are split into the far sidebands, used to calculate the expected background yield, and the near sidebands, used to optimise the cuts on the MN response and the minimum DLL µπ values of the four muon candidates in the final state. The optimization of the cuts is performed on a two-dimensional grid maximising the figure of merit [21] The intended significance in terms of standard deviations (σ) is set to three. Very similar selection criteria are found when using five. The expected background yield before applying the MN and PID selection, N expected bkg , is determined from a fit to the events in the near sidebands using an exponential function. For each grid point the background efficiency, ε bkg , is measured using events from the near sidebands. The signal efficiency, ε signal , is measured for each grid point using simulated B 0 Now that a suitable simulation model is available, significant improvements in terms of signal efficiency and background rejection are made by employing a multivariate classifier and being able to measure the selection efficiency from simulation.

Selection efficiencies and systematic uncertainties
The optimal working point corresponds to signal efficiencies of (0.580 ± 0.003)% and   (0.568 ± 0.003)% for the B 0 s → µ + µ − µ + µ − and B 0 → µ + µ − µ + µ − decay modes, respectively. Sources of peaking background such as B 0 → K * 0 µ + µ − , in which the kaon and the pion originating from the K * decay are misidentified as muons, are reduced to a negligible level by the optimised selection. The efficiencies for the MSSM processes are measured using simulated samples of the B 0 (s) → S(→ µ + µ − )P (→ µ + µ − ) decays, where the B 0 (s) meson decays into a pseudoscalar sgoldstino with a mass of 214.3 MeV/c 2 [4] and a scalar sgoldstino with a mass of 2.5 GeV/c 2 . Both the P and S particles are simulated with a decay width of Γ = 0.1 MeV/c 2 , which corresponds to a prompt decay. The measured efficiencies are the same for the B 0 s and the B 0 decays and amount to (0.648 ± 0.003)%. The difference between the SM and the MSSM efficiencies originates from the fact that in the case of the decay proceeding via P and S sgoldstinos, the decay products are more likely to be within the acceptance of the LHCb detector. In order to test the dependence of the measured B 0 (s) → S(→ µ + µ − )P (→ µ + µ − ) branching fractions on the mass of the scalar sgoldstino, the selection efficiency is measured in bins of dimuon invariant mass while requiring the corresponding other dimuon mass to be between 200 and 950 MeV/c 2 . An efficiency variation of O(20%) is observed.
The selection applied to the normalisation mode B + → J/ψ (→ µ + µ − )K + differs from that applied to the signal modes in the PID criteria and that no multivariate analysis technique is applied. The total efficiency is (1.495 ± 0.006)%. The uncertainties on the efficiencies are driven by the limited number of simulated events and are treated as systematic uncertainties of 0.4-0.5%.
The total efficiency is calculated as the product of the efficiencies of the different stages of the selection. As an alternative to the trigger efficiency calculated on simulation, the value is calculated on B + → J/ψ (→ µ + µ − )K + data [22] and a systematic uncertainty of 3% is assigned corresponding to the relative difference. The efficiency of the MN classifier to select the more frequent decay is compared between data and simulation. The relative difference of 0.3% is assigned as a systematic uncertainty. Another source of systematic uncertainty arises from the track finding efficiency. Again, values obtained from data [23] and simulation are compared and the deviation is treated as a correction factor for the efficiency, while the uncertainty on the deviation, 1.7%, is assigned as a systematic uncertainty.
In general the agreement in the observables used in the selection between data and simulation is very good, although there are some distributions that are known to deviate.

JHEP03(2017)001
Therefore, the gradient boosting reweighting technique [24] is used to calculate weights that correct for differences between data and simulation in B 0 s → J/ψ (→ µ + µ − )φ(→ K + K − ). The weighting is performed in the track multiplicity, the B transverse momentum, the χ 2 of the decay vertex fit and the χ 2 IP . The first two are chosen because they are correlated with the PID variables and the latter two dominate the feature ranking obtained from the MN training. These weights are applied to the B 0 (s) → µ + µ − µ + µ − and B + → J/ψ (→ µ + µ − )K + simulation samples, and are used to calculate the MN and the PID efficiencies. In order to account for inaccuracies of this method resulting from the kinematic and topological differences between the decay modes, systematic uncertainties of 3.6% are assigned based on the difference of the MN efficiency on B 0 (s) → µ + µ − µ + µ − and B 0 s → J/ψ (→ µ + µ − )φ(→ K + K − ). For the B + → J/ψ (→ µ + µ − )K + decay mode, the efficiencies are measured with and without weights and the observed difference of 2.3% is assigned as systematic uncertainty.
In order to determine accurate efficiencies of the applied PID requirements, calibration samples of muons from J/ψ → µ + µ − and φ → µ + µ − decays and of kaons from D * + → D 0 (→ K − π + )π + decays are used. The relative frequency for kaons and muons to pass the PID criteria is calculated in bins of track multiplicity, particle momentum and pseudorapidity. Different binning schemes are tested and the observed differences in the efficiencies of 1% for B + → J/ψ (→ µ + µ − )K + and 0.5% for B 0 (s) → µ + µ − µ + µ − are assigned as systematic uncertainties. Additionally, 3% of the simulated B 0 (s) → µ + µ − µ + µ − decays contain muons with low transverse momentum outside the kinematic region covered by the calibration data. This fraction is assigned as a systematic uncertainty. Candidates that have a reconstructed invariant mass within ±40 MeV/c 2 around the known B 0 (s) mass, which corresponds to ±2σ of the mass resolution, are treated as signal candidates. The accuracy of the efficiency of this cut is evaluated on B 0 s → J/ψ (→ µ + µ − )φ(→ K + K − ) data. A systematic uncertainty of 0.5% corresponding to the relative difference of the efficiency measured on data and simulation is assigned. Systematic uncertainties of 0.9% and 0.5% in the case of B 0 (s) → µ + µ − µ + µ − and B + → J/ψ (→ µ + µ − )K + originate from the imperfections of the efficiency of the event reconstruction due to soft photon radiation and 0.6% from mismatching of track segments between different tracking stations in the detector, which is measured using simulated events. All relevant sources of systematic uncertainty along with the total values are summarised in table 2. The most significant improvements with regard to the preceding measurement are the larger available data sample, and the choice of the B + → J/ψ (→ µ + µ − )K + decay as normalisation mode, which has the advantage of a precisely measured branching fraction and the absence of an additional systematic uncertainty originating from the S-wave correction.

Normalisation
The B + → J/ψ (→ µ + µ − )K + signal yield is determined by performing an unbinned extended maximum likelihood fit to the K + µ + µ − invariant mass distribution. In this fit the J/ψ mass is constrained [25] to the world average [6]. The normalisation yield is found to be N (B + → J/ψ (→ µ + µ − )K + ) = 687890 ± 920. The J/ψ K + mass spectrum along  with the fit result is shown in figure 2. A systematic uncertainty of 0.3% is assigned to the determined B + → J/ψ (→ µ + µ − )K + yield by using an alternative fit model and performing a binned extended maximum likelihood fit.
The B 0 (s) → µ + µ − µ + µ − branching fraction is calculated as where N (B + → J/ψ (→ µ + µ − )K + ) and N (B 0 (s) → µ + µ − µ + µ − ) are the observed yields of the normalisation and the signal channel, respectively. The ratio between the production rates of B 0 s and B 0 was measured by LHCb to be f s /f d = 0.259 ± 0.015 [27]. The measurement was performed using pp collision data at √ s = 7 TeV, but found to be stable between √ s = 7 TeV and 8 TeV by a previous LHCb measurement [28]. The ratio between the B + and B 0 production rates is assumed to be unity. As a consequence f s /f u is equal to f s /f d .
The single event sensitivities, η d,s , amount for the B 0 s and the B 0 decay modes in the SM and in the MSSM scenario. Here, the uncertainties are the combined values of the statistical uncertainty on the B + → J/ψ (→ µ + µ − )K + yield and the systematic uncertainty. In the case of η s the systematic uncertainty is dominated by the ratio of f u /f s and in the case of η d by the weighting procedure applied to correct for the difference between data and simulation.
The individual sources of systematic uncertainties given in table 2 are assumed to be uncorrelated and are combined quadratically. The total systematic uncertainty is 9.2% for the B 0 s decay and 7.2% for the B 0 decay. These values are small compared to the statistical uncertainty on the expected number of background events in the B 0 and B 0 s search regions. The whole analysis strategy is cross-checked by measuring the B 0 s → J/ψ (→ µ + µ − )φ(→ µ + µ − ) branching fraction. The obtained value has a precision of 20% and is compatible with the product of the branching fractions of the underlying decays taken from ref. [6].
The number of expected background events is determined by fitting an exponential function to the far sidebands of m(µ + µ − µ + µ − ). Extrapolating and integrating the fitted function in ±40 MeV/c 2 wide windows around the B 0 (s) meson masses yields the number of expected background events, The statistical uncertainty is the combination of the Poissonian uncertainty originating from the limited size of the data sample and the uncertainty on the fit parameters. As an alternative fit model a second-order polynomial is used and the difference between these background expectations is assigned as a systematic uncertainty.

Results
The final distribution of the reconstructed mass of the four muon system is shown in figure 3. No candidates are observed in either the B 0 or the B 0 s search region, which is consistent with the expected background yield.
The Hybrid CLs procedure [29][30][31], with log-normal priors to account for uncertainties of both background and efficiency estimations, is used to convert the observations into upper limits on the corresponding branching fractions. The exclusion at 95% confidence level assuming the SM single event sensitivities is shown in figure 4. The result for the corresponding MSSM values is presented in figure 5. The limits on the branching fractions of the B 0 s and B 0 decays are anti-correlated. Replacing the log-normal priors by gamma distributions yields the same results.
Assuming negligible cross-feed between the B 0 s and the B 0 search regions, the observed upper limits on the branching fractions at 95% confidence level are found to be Weighting B 0 (s) → µ + µ − µ + µ − 3.6 Kinematic coverage of PID calibration data 3.0 ±40 MeV/c 2 search region efficiency 0.5 Track segments mismatching 0.6 Combined η s MSSM 9.2 Combined η d MSSM 7. 2   Table 2. Summary of systematic uncertainties affecting the single event sensitivities along with the total systematic uncertainty calculated by adding up the individual components in quadrature. The dominating uncertainty arising from f u /f s only contributes to η s . The uncertainty of the stated selection efficiencies arising from the limited number of simulated events is 0.5% for B 0 → µ + µ − µ + µ − and 0.4% for all other considered decay modes.

Conclusion
In summary, a search for non-resonant B 0 (s) → µ + µ − µ + µ − decays has been presented. In addition, the sensitivity to a specific MSSM scenario has been probed. The applied selection focuses on finding four muon tracks that form a common vertex. For the SM scenario and the MSSM decay through short-lived scalar and pseudoscalar new particles, the limits set by the previous measurement performed by LHCb on a subset of the present data, are improved by a factor of 6.4    Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.