Search for the lepton-flavour violating decay $D^0 \to e^\pm\mu^\mp$

A search for the lepton-flavour violating decay $D^0 \to e^\pm \mu^\mp$ is made with a dataset corresponding to an integrated luminosity of 3.0 fb$^{-1}$ of proton-proton collisions at centre-of-mass energies of $7$ TeV and $8$ TeV, collected by the LHCb experiment. Candidate $D^0$ mesons are selected using the decay $D^{*+} \to D^0 \pi^+$ and the $D^0 \to e^\pm \mu^\mp$ branching fraction is measured using the decay mode $D^0 \to K^-\pi^+$ as a normalisation channel. No significant excess of $D^0 \to e^\pm \mu^\mp$ candidates over the expected background is seen, and a limit is set on the branching fraction, $\mathcal{B}(D^0 \to e^\pm \mu^\mp)<1.3 \times 10^{-8}$, at 90 % confidence level. This is an order of magnitude lower than the previous limit and it further constrains the parameter space in some leptoquark models and in supersymmetric models with R-parity violation.


Introduction
Searches for decays that are forbidden in the Standard Model (SM) probe potential contributions from new processes and particles at mass scales beyond the reach of direct searches. The decay D 0 → e ± µ ∓ is an example of a forbidden decay, in which lepton flavour is not conserved. 1 The contributions to this process from neutrino oscillations would give a rate that is well below the reach of any currently feasible experiment. However, the decay is predicted to occur in several other models that extend the SM, with rates varying by up to eight orders of magnitude.
In Ref. [1] three extensions to the SM are considered: in a minimal supersymmetric (SUSY) SM with R-parity violation (RPV) the branching fraction B(D 0 → e ± µ ∓ ) could be as large as O(10 −6 ); in a theory with multiple Higgs doublets it would be less than about 7 × 10 −10 ; and in the SM extended with extra fermions the branching fraction would be less than O (10 −14 ). In Ref. [2] an RPV SUSY model is considered in which limits on products of couplings are obtained from the experimental upper limit on the branching fraction B(D + s → K + e ± µ ∓ ); from these limits, B(D 0 → e ± µ ∓ ) could be as large as 3 × 10 −8 . A similar study of constraints on coupling constants in RPV SUSY [3], obtained from limits on the branching fraction B(D + → π + e ± µ ∓ ), showed that B(D 0 → e ± µ ∓ ) could reach 10 −7 . LHCb has previously set limits [4] on branching fractions for the B meson decays B 0 → e ± µ ∓ and B 0 s → e ± µ ∓ , using them to put lower limits on the masses of Pati-Salam leptoquarks [5]. As is shown in Ref. [6], lepton-flavour violating charm decays are relatively insensitive to the presence of such leptoquarks. However, in a recent paper [7] it is shown that in other leptoquark scenarios B(D 0 → e ± µ ∓ ) could be as large as 4 × 10 −8 .
The first experimental limit on B(D 0 → e ± µ ∓ ) was from Mark II [8], and more recent results have come from E791 [9] and BaBar [10]. The most stringent limit is from Belle [11], B(D 0 → e ± µ ∓ ) < 2.6 × 10 −7 at 90% confidence level (CL). An improved limit, below O(10 −7 ), would provide tighter constraints on coupling constants in RPV SUSY models [1][2][3], while a limit below 4 × 10 −8 would also constrain the parameter space in some leptoquark models [7]. This Letter presents a search for the decay D 0 → e ± µ ∓ using pp collision data corresponding to integrated luminosities of 1.0 fb −1 at a centre-of-mass energy of 7 TeV and 2.0 fb −1 at 8 TeV, collected by the LHCb experiment in 2011 and 2012, respectively. In the analysis, signal candidates are selected using the decay D * + → D 0 π + and the measurements are normalized using the well-measured channel D 0 → K − π + , which has the same topology as the signal. A multivariate analysis based on a boosted decision tree algorithm (BDT) is used to help separate signal and background. The mass spectrum in the signal region, defined as 1815 − 1915 MeV/c 2 , is not examined until all analysis choices are finalized. 1 The inclusion of charge-conjugate processes is implied.

Detector and simulation
The LHCb detector [12,13] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter, is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The online event selection is performed by a trigger [14], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage in which all charged particles with p T > 500 (300) MeV/c are reconstructed for 2011 (2012) data. At the hardware trigger stage, events are required to have a muon with high p T , or a hadron, photon or electron with high transverse energy in the calorimeters. The software trigger requires a two-, three-or four-track secondary vertex with a significant displacement from the primary pp interaction vertices. At least one charged particle must have a transverse momentum p T > 1.7 GeV/c and be inconsistent with originating from a PV. A multivariate algorithm [15] is used for the identification of secondary vertices consistent with the decay of a b or c hadron.
In the simulation, pp collisions are generated using Pythia [16] with a specific LHCb configuration [17]. Decays of hadronic particles are described by EvtGen [18], in which final-state radiation is generated using Photos [19]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [20] as described in Ref. [21]. Samples of simulated events are generated for the signal D 0 → e ± µ ∓ channel, for the normalization D 0 → K − π + channel and for D 0 → π + π − , which is an important background channel.

Event selection and efficiencies
In the first stage of the offline event selection, the D * + → D 0 (e ± µ ∓ )π + and D * + → D 0 (K − π + )π + candidates that pass the trigger selection are required to have a vertex, formed from two good-quality tracks associated with particles of opposite charge, that is well separated from any PV, with the summed momentum vector of the two particles pointing to a PV (the mean number of PVs per beam crossing is 1.6). The measured momentum of the electron candidates is corrected to account for loss of momentum by bremsstrahlung in LHCb simulation D 0 →e + µ -D 0 →π + πmisid Figure 1: Mass spectra from simulation for D 0 → e ± µ ∓ decays (solid line) and D 0 → π + π − decays reconstructed as D 0 → e ± µ ∓ (dashed line). Each spectrum is normalized to unit area. The vertical line indicates the mass of the D 0 meson. the detector, using the photon energy deposition in the electromagnetic calorimeter [22]. Muon and electron candidates, and pions and kaons from the D 0 → K − π + candidates, are required to have p > 4 GeV/c and p T > 0.75 GeV/c and to be positively identified by the particle identification systems. The soft pion from the candidate D * + → D 0 π + decay is required to have p T > 110 MeV/c and to be consistent with coming from the PV. A kinematic fit is performed, with the two D 0 decay tracks constrained to a secondary vertex and the soft pion and D 0 candidates constrained to come from the PV. This fit improves the resolution on the mass difference between the reconstructed D * + and D 0 mesons, which is required to be in the range 135 − 155 MeV/c 2 . About 2% of events contain more than one D * + → D 0 π + candidate and in these events one is chosen at random. After the above selections, 2114 candidates remain in the signal mass region for D 0 → e ± µ ∓ and 330 359 for D 0 → K − π + (the trigger accept rate for the latter channel is scaled to retain only 1% of candidates). An important source of background in the sample of D 0 → e ± µ ∓ candidates comes from D 0 → π + π − decays where one pion is misidentified as an electron and the other as a muon. From simulations and calibration samples in the data [13], the probability for a D 0 → π + π − event to be selected in the final sample of candidate signal events is found to be (1.0 ± 0.6) × 10 −8 in the 7 TeV data and (1.8 ± 0.4) × 10 −8 in the 8 TeV data. Figure 1 shows a comparison of the mass spectra, from simulation, for D 0 → e ± µ ∓ decays and for D 0 → π + π − decays reconstructed as D 0 → e ± µ ∓ , with each spectrum normalized to unit area. The low-mass tail for genuine D 0 → e ± µ ∓ decays is caused by bremsstrahlung from the electrons; about 15% of the signal lies below 1810 MeV/c 2 . The misidentified D 0 → π + π − decays produce a peak at a mass about 15 MeV/c 2 below the signal mass.
Misidentified D 0 → K − π + decays always have reconstructed mass below the region selected for the analysis, because of the large mass difference between kaons and electrons or muons; as a consequence, there is no background from this source. Other sources of background include the semileptonic decay modes D 0 → π − e + ν e and D 0 → π − µ + ν µ , with the pion misidentified as a muon or an electron, respectively. Since, as part of bremsstrahlung recovery, the energy of unrelated photons may be incorrectly added to the energy of the electron candidates, these semileptonic backgrounds extend smoothly above the signal region and are treated as part of the combinatorial background of e ± µ ∓ pairs where the two lepton candidates have different sources. Trigger, selection and particle identification efficiencies, and misidentification probabilities, are obtained from a combination of simulation and data. Control samples of well-identified electrons, muons, pions and kaons in data are obtained from J/ψ meson decays into pairs of electrons or muons and from D * + → D 0 (K − π + )π + decays, selected using different requirements from those used in the current analysis. These control samples are binned in pseudorapidity and transverse momentum of the tracks, and in the track multiplicity of the event. The hardware trigger efficiency for signal is evaluated using data, while the efficiency for the software trigger and offline selections is evaluated using simulation after validation with the data control samples. Where efficiencies are taken from the simulation, the samples are weighted to take into account differences between simulation and data, particularly in the distribution of per-event track multiplicities.

Multivariate classifier
A multivariate classifier based on a BDT [23] with a gradient boost [24] is used to divide the selected sample into bins of different signal purity. The following variables are used as inputs to the BDT: the smallest distance of closest approach of the D 0 candidate to any PV; an isolation variable that depends on how much additional charged particle momentum is in a region of radius R ≡ (∆η) 2 + (∆φ) 2 = 1 around the D * + candidate, where η and φ are pseudorapidity and azimuthal angle; χ 2 of the kinematic fit; and χ 2 IP , the impact parameter χ 2 with respect to the associated PV, for each of the D * + and D 0 candidates, and for the two D 0 decay tracks. The variable χ 2 IP is defined as the difference in vertex fit χ 2 with and without the particle considered. None of the BDT input variables contains particle identification information. It therefore performs equally well for the signal and normalization channels (and for the misidentified D 0 → π + π − decays).
The BDT is trained separately for the 7 TeV and 8 TeV data samples, to exploit the dependence of some input variables, for example the isolation variable, on the collision energy. The background sample used for the training comprises selected candidates with invariant mass within 300 MeV/c 2 of the known D 0 mass, but excluding the signal region, 1815 − 1915 MeV/c 2 . The training for signal is done with the simulated D 0 → e ± µ ∓ events. One half of each sample is used for training the BDT, while the other half is used to test for over-training. No evidence for over-training is seen. Following procedures used in Refs. [25,26], the BDT output value, which lies between −1 (most background-like) and 1 (most signal-like), is used to separate the data sample into three sub-samples with ranges chosen to give optimum separation between the background-only and signal-plusbackground hypotheses.

Fits to mass spectra
In order to determine the number of signal decays, extended maximum likelihood fits are made simultaneously to unbinned distributions of m(D 0 ) and ∆m = m(D * + ) − m(D 0 ) for the D 0 → e ± µ ∓ candidates in each of the three BDT bins for the 7 TeV and 8 TeV data. Hereinafter, m(D 0 ) denotes the mass of the D 0 candidate for both signal and normalization channels, and ∆m denotes the mass difference between the D * + and D 0 candidates. In these fits, from which the branching fraction is extracted directly, all systematic uncertainties, as discussed in Sect. 6, are included as Gaussian constraints on the appropriate parameters.
The D 0 → e ± µ ∓ signal probability density functions (PDF) in the three BDT bins are obtained from the simulation. The simulated D 0 → e ± µ ∓ mass spectra are fitted using the sum of two Crystal Ball functions [27] with a common peak value but different widths. One of the Crystal Ball functions has a low-mass tail to account for energy loss due to bremsstrahlung while the other is modified to have a high-mass tail to accommodate events where a bremsstrahlung photon is incorrectly assigned to an electron candidate. The per-event particle multiplicity affects the amount of bremsstrahlung radiation recovered for the electron candidates, and this differs between simulation and data. Therefore both the simulation and the data are classified in three bins of the variable N SPD , the number of hits in the scintillating pad detector, which is a measure of the particle multiplicity. The parameters of the signal PDF are obtained as averages of their values in the three bins of N SPD , weighted to account for data-simulation differences. The PDF shapes for the peaking background due to misidentified D 0 → π + π − decays (see Fig. 1) are obtained in the same way as for D 0 → e ± µ ∓ , using the same functional form for the signal shapes, and their yields are Gaussian-constrained in the fits. The combinatorial background for the D 0 candidate mass is described by a second-order polynomial.
The signal shapes in the ∆m distributions for the D 0 → e ± µ ∓ and D 0 → π + π − channels are each parametrised as a sum of three Gaussian functions; for D 0 → e ± µ ∓ two of the Gaussians functions have the same mean, but the one with the largest width is allowed to have a different mean, while the three mean values are independent for the D 0 → π + π − shape. In each case all three Gaussian functions have independent widths. The combinatorial background in ∆m is fitted using an empirical function of the form where N is a normalization factor, (∆m) 0 is the threshold mass difference, and a, b and c are free parameters. In the fits to the D 0 → e ± µ ∓ candidates, the parameter a is fixed to zero. A fraction of the D 0 → e ± µ ∓ and the misidentified D 0 → π + π − decays is associated to a random soft pion, and therefore peaks in m(D 0 ) but not in ∆m. This fraction is Gaussian constrained to the value 23.7 ± 0.2% found in the fits to the D 0 → K − π + normalisation channel, discussed below. Figure 2 shows the fit results for the combined 7 TeV and 8 TeV dataset, separately for the three bins of BDT output. The peaks seen in the m(D 0 ) and ∆m distributions are due to misidentified D 0 → π + π − decays. No evidence is seen for any D 0 → e ± µ ∓ signal. The fits return a total of −7 ± 15 signal decays.
For the normalisation channel D 0 → K − π + , for which there are many candidates, binned fits are done separately to the 7 TeV and 8 TeV samples, using a sum of two Gaussian functions with a common mean to model the D 0 candidate mass distribution, and a sum of three Gaussian functions for the ∆m distribution. In the latter case, two of the Gaussian functions have the same mean, but the one with the largest width is allowed to have a different mean. The function defined by Eq. (1) is used for the background in the ∆m spectrum, with all parameters allowed to vary in the fit. Figure 3 shows the results of the fit for the D 0 → K − π + normalization samples in the 8 TeV data, for both the m(D 0 ) and ∆m distributions. Totals of 80 × 10 3 and 182 × 10 3 D * + → D 0 (→ K − π + )π + decays are observed in the 7 TeV and 8 TeV data, respectively.

Systematic uncertainties
The uncertainty on the fitted D 0 → e ± µ ∓ signal rate is dominated by statistical fluctuations of the combinatorial background. Sources of systematic uncertainty that could affect the final result include those on the yield of the normalization D 0 → K − π + decay, uncertainties in the shapes of the PDFs used for D 0 → e ± µ ∓ and D 0 → π + π − , and uncertainties in the selection efficiencies and particle misidentification probabilities. All of these uncertainties are included as Gaussian constraints in the fits described in Sect. 5.
In the nominal fit to signal candidates, the parameters of the signal PDF, obtained from the simulation, are Gaussian constrained according to their uncertainties. To obtain these uncertainties, samples of B + → J/ψ K + decays with J/ψ → e + e − are selected in both simulation and data, and the e + e − mass spectra are fitted using the same functional form as used for D 0 → e ± µ ∓ . The fractional differences in the parameter values between the J/ψ → e + e − fits to the data and to the simulation are taken as the fractional systematic uncertainties on the corresponding parameters of the PDF for the D 0 → e ± µ ∓ candidate mass spectra.
For the fits to the fully simulated, misidentified D 0 → π + π − mass spectra, some selection requirements are removed in order to have enough events to obtain reliable fits. The efficiency of the selection requirements that are not applied varies linearly by a relative 9.4% with reconstructed mass across the fit region. The PDF for the peak shape in the misidentified D 0 → π + π − decays is corrected for this variation of efficiency, and the resulting contribution to the systematic uncertainty on the yield is taken as 4.7%.
To allow for uncertainties in the fractions of D 0 → e ± µ ∓ signal and misidentified D 0 → π + π − decays that are estimated in the three bins of BDT output, a comparison is made between these fractions for simulated D 0 → e ± µ ∓ , simulated D 0 → π + π − and well identified D 0 → π + π − decays in the data. Since the BDT does not take into account particle identification, the largest differences between these fractions in each bin, typically 2.5%, are taken as the systematic uncertainties on the fractions in the data.
To account for differences between data and simulation in the per-event track multiplicity, the reconstruction efficiencies and misidentification probabilities for simulated events are evaluated in three bins of N SPD . These are then weighted to match the multiplicity distribution in the data. Half of the differences between the unweighted and the weighted efficiencies and misidentification probabilities, typically 5%, are taken as the systematic uncertainties on these quantities. Further uncertainties, of 2.5% for each of D 0 → e ± µ ∓ and D 0 → π + π − , are included to account for limited knowledge of the tracking efficiencies.
Using the calibration samples, particle identification and trigger efficiencies are estimated in bins of pseudorapidity, transverse momentum and event multiplicity. Overall efficiencies are determined by scaling the simulation so that the distributions in these variables match the data. To estimate systematic uncertainties from this procedure, different binning schemes are used and the resulting changes in the efficiency values are treated as systematic uncertainties. Overall systematic uncertainties are 6% on the D 0 → e ± µ ∓ selection efficiency and 30% on the D 0 → π + π − misidentification probability.
To study systematic effects in the fit to the normalization channel, the order of the background polynomial is increased, the number of bins changed, fixed parameters are varied and the Gaussian mean values in the ∆m fits are constrained to be equal. From these studies a contribution of 1% is assigned to the systematic uncertainty on the yield. Similar procedures as described above for the signal channel are also used to evaluate the other systematic uncertainties for the D 0 → K − π + normalization channel. The resulting overall systematic uncertainty in the measured number of D 0 → K − π + decays is 5%.

Results and conclusions
The measured branching fraction for the signal channel is given by where N eµ and N Kπ are the fitted numbers of D 0 → e ± µ ∓ and D 0 → K − π + decays, the corresponding are the overall efficiencies, and the branching fraction for the normalization channel, B(D 0 → K − π + ) = (3.88 ± 0.05)%, is taken from Ref. [28]. The efficiencies eµ = (4.4 ± 0.3) × 10 −4 and Kπ = (2.5 ± 0.1) × 10 −6 , for the signal and normalization channels, are the products of the reconstruction efficiencies for the final-state particles, including the geometric detector acceptance, the selection efficiencies, and the trigger efficiencies (including the 1% scaling in the trigger for the D 0 → K − π + channel). No evidence is seen for a D 0 → e ± µ ∓ signal in the overall mass spectrum, nor in any individual bin of BDT output, and the measured branching fraction is B(D 0 → e ± µ ∓ ) = (−0.6 ± 1.2) × 10 −8 , where the uncertainty accounts for both statistical and systematic effects. An upper limit on the branching fraction is obtained using the CL S method [29], where the p-value for the signal-plus-background hypothesis is compared to that for the background-only hypothesis. The expected and observed CL S values as functions of the assumed branching fraction are shown in Fig. 4, where the expected CL S values are obtained using an Asimov dataset [30] as described in Ref. [31], and are the median expected limits under the assumption of no signal. Expected limits based on pseudoexperiments give consistent results. There is excellent correspondence between the expected and observed CL S values, and an upper limit is set on the branching fraction, B(D 0 → e ± µ ∓ ) < 1.3 × 10 −8 at 90% CL (and < 1.6 × 10 −8 at 95% CL). This limit will help to further constrain products of couplings in supersymmetric models that incorporate R-parity violation [1][2][3] and constrains the parameter space in some leptoquark scenarios [7].
In summary, a search for the lepton-flavour violating decay D 0 → e ± µ ∓ is performed on a data sample corresponding to an integrated luminosity of 3.0 fb −1 collected in pp collisions at centre-of-mass energies of 7 and 8 TeV. The data are consistent with the background-only hypothesis, and a limit is set on the branching fraction, B(D 0 → e ± µ ∓ ) < 1.3 × 10 −8 at 90% CL, which is an order of magnitude lower than the previous limit.