B flavour tagging using charm decays at the LHCb experiment

An algorithm is described for tagging the flavour content at production of neutral B mesons in the LHCb experiment. The algorithm exploits the correlation of the flavour of a B meson with the charge of a reconstructed secondary charm hadron from the decay of the other b hadron produced in the proton-proton collision. Charm hadron candidates are identified in a number of fully or partially reconstructed Cabibbo-favoured decay modes. The algorithm is calibrated on the self-tagged decay modes B+ → J/ψ K+ and B0 → J/ψ K*0 using 3.0 fb−1 of data collected by the LHCb experiment at pp centre-of-mass energies of 7 TeV and 8 TeV. Its tagging power on these samples of B → J/ψ X decays is (0.30 ± 0.01 ± 0.01)%.


Introduction
Measurements that involve mixing and time-dependent CP asymmetries in decays of neutral B mesons require the identification of their flavour content at production. This is achieved via various flavour tagging algorithms that exploit information from the rest of the pp collision event. Same-side (SS) taggers look for particles produced in association with the signal B meson during the hadronization of the b quark [1]. The d or s partner of the light valence quark of the signal B has a roughly 50% chance of hadronizing into a charged pion or kaon. Since b quarks are mostly produced in bb̄ pairs, the flavour content of the signal B meson can also be deduced from available information on the opposite-side (OS) b hadron, whose flavour is opposite to that of the signal B meson at production time. OS muon and electron taggers look for leptons originating from semileptonic b → cW− transitions of the b hadron, and an OS kaon tagger looks for kaons coming from b → c → s transitions. A vertex-charge tagger reconstructs the decay vertex of the OS b hadron and predicts its charge by weighting the charges of its decay products according to their transverse momentum. The OS taggers employed by LHCb are described in ref. [2] and the SS taggers in refs. [3,4]. This paper reports a new flavour tagging algorithm for the LHCb experiment that relies on reconstructed decays of charm hadrons produced in the OS b hadron decay. For the development and evaluation of the tagging algorithm, signal B meson and charm hadron candidates are reconstructed using data corresponding to 3 fb−1 of integrated luminosity, collected by LHCb at centre-of-mass energies of 7 TeV and 8 TeV in 2011 and 2012, respectively.
The performance of a flavour tagging algorithm is characterized by its tagging efficiency, ε_tag, mistag fraction, ω, and dilution, D = 1 − 2ω. For a simple tagging algorithm with discrete decisions (B0, B̄0, or untagged), these metrics are directly related to the numbers of rightly tagged (R), wrongly tagged (W), and untagged (U) events in a signal sample:

$$\varepsilon_{\rm tag} = \frac{R+W}{R+W+U}, \qquad \omega = \frac{W}{R+W}, \qquad D = 1 - 2\omega = \frac{R-W}{R+W}. \tag{1.1}$$
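The relations ε_tag = (R + W)/(R + W + U), ω = W/(R + W), and D = 1 − 2ω can be checked with a few lines of Python. This is an illustrative sketch only: the function name and the event counts in the usage line are hypothetical and do not come from any LHCb sample.

```python
def tagging_metrics(R: int, W: int, U: int) -> dict:
    """Tagging efficiency, mistag fraction and dilution from the numbers of
    rightly tagged (R), wrongly tagged (W) and untagged (U) signal events."""
    tagged = R + W
    if tagged == 0:
        raise ValueError("no tagged events")
    eps_tag = tagged / (tagged + U)
    omega = W / tagged
    dilution = 1.0 - 2.0 * omega  # identical to (R - W) / (R + W)
    return {"eps_tag": eps_tag, "omega": omega, "dilution": dilution}


# hypothetical counts, for illustration only
print(tagging_metrics(R=60, W=40, U=900))
```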

JINST 10 P10005
The performance of the flavour tagging algorithms is improved by assigning confidence weights to their tagging decisions. For each tagger, a multivariate classifier is trained on simulated data to distinguish between correct and incorrect decisions [2]. The inputs to the classifier are a selection of kinematic and geometric quantities describing the tagging track(s), the signal B meson, and the event. The classifier then calculates a predicted mistag probability η for each decision. The predicted mistag probability is calibrated to data using an appropriate flavour self-tagged mode, such as B+ → J/ψ K+, or a mode involving neutral B oscillation that self-tags its flavour at decay time, such as B0 → J/ψ K*0 or B0s → D−s π+ [4,5] (the use of charge-conjugate modes is implied throughout this paper). This calibration procedure provides a function ω(η), which relates the actual mistag probability ω to the predicted mistag probability η. Weighting each signal candidate by 1 − 2ω(η) leads to an improved effective mistag fraction ω_eff and associated dilution D_eff = 1 − 2ω_eff. The statistical power of a CP asymmetry measurement using a tagging algorithm is proportional to the effective tagging efficiency (or tagging power) ε_eff, defined as

$$\varepsilon_{\rm eff} = \varepsilon_{\rm tag}\, D_{\rm eff}^2 = \varepsilon_{\rm tag}\,(1 - 2\omega_{\rm eff})^2. \tag{1.2}$$

The typical combined tagging power of the current set of OS tagging algorithms used by LHCb is approximately 2.5% [3, 6-8]. Any increase in this tagging power improves the statistical precision achievable in CP measurements at LHCb.
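With per-candidate calibrated mistag probabilities, the tagging power can be accumulated event by event as (1/N) Σ (1 − 2ω(η_i))², where N counts all signal candidates and untagged candidates contribute zero. This is the usual per-event form of ε_tag(1 − 2ω_eff)², assumed here. A minimal sketch, assuming a calibration function ω(η) is already available; the function name is hypothetical.

```python
from typing import Callable, Optional, Sequence


def tagging_power(etas: Sequence[Optional[float]],
                  omega_of_eta: Callable[[float], float]) -> float:
    """eps_eff = (1/N) * sum over tagged candidates of (1 - 2*omega(eta))**2.

    `etas` has one entry per signal candidate: the predicted mistag probability
    for tagged candidates, or None for untagged ones (which add nothing).
    """
    if not etas:
        raise ValueError("empty sample")
    total = 0.0
    for eta in etas:
        if eta is None:
            continue  # untagged candidate: no flavour information
        dilution = 1.0 - 2.0 * omega_of_eta(eta)
        total += dilution ** 2
    return total / len(etas)


# usage: identity calibration omega(eta) = eta, half of the candidates tagged
print(tagging_power([0.3, None, 0.3, None], lambda eta: eta))
```

For a constant mistag probability this reduces to ε_tag(1 − 2ω)²; in the usage line that is 0.5 × 0.4² = 0.08.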

Detector and simulation
The LHCb detector [9,10] is a single-arm forward spectrometer covering the pseudorapidity range between 2 and 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region [11], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes [12] placed downstream of the magnet. The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter, is measured with a resolution of (15 + 29/pT) µm, where pT is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons in the momentum range 2-100 GeV/c are distinguished using information from two ring-imaging Cherenkov detectors [13]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [14]. The online event selection is performed by a trigger [15], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction.
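For orientation, the quoted impact-parameter resolution can be evaluated at a few transverse momenta. The snippet is a direct transcription of the (15 + 29/pT) µm parametrization; the function name is our own.

```python
def ip_resolution_um(pt_gev: float) -> float:
    """Impact-parameter resolution (15 + 29/pT) micrometres, pT in GeV/c."""
    if pt_gev <= 0.0:
        raise ValueError("pT must be positive")
    return 15.0 + 29.0 / pt_gev


for pt in (0.5, 1.0, 2.0, 10.0):
    print(f"pT = {pt:4.1f} GeV/c -> sigma_IP = {ip_resolution_um(pt):5.1f} um")
```

The 29/pT term, which reflects multiple scattering, dominates below a few GeV/c. This is the regime of many charm daughters.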
In the simulation, pp collisions are generated using PYTHIA [16,17] with a specific LHCb configuration [18]. Decays of hadronic particles are described by EVTGEN [19], in which final-state radiation is generated using PHOTOS [20]. The interaction of the generated particles with the detector, and its response, are implemented using the GEANT4 toolkit [21] as described in ref. [22].

Tagging potential of OS charm hadrons
In events containing a signal B decay, the opposite-side D+, D0, and Λc+ charm hadrons are primarily produced through the quark-level b → c transition. The charge of the D+ or Λc+ determines the flavour of the parent b hadron. For D0 decays through the dominant Cabibbo-favoured process D0 → K− X, the kaon charge determines the flavour of the charm hadron, and thereby that of the parent b hadron (the effect of D0 mixing is negligible). The OS charm tagging algorithm uses charm hadron candidates reconstructed in a number of decay modes, chosen for their relatively large branching fractions and listed in table 1. These include fully reconstructed hadronic modes with a single charged kaon in the final state, partially reconstructed hadronic modes with an unobserved neutral pion, and partially reconstructed semileptonic modes. Table 1 also reports the breakdown of the charm tagger's performance by decay mode. The relative rate and relative power of each mode are the fractions it contributes to the algorithm's total tagging efficiency ε_tag and tagging power ε_eff, which are presented in section 6 and table 3. The algorithm predicts the flavour of the signal B meson using the charge of the kaon in the same manner as the OS kaon tagger. However, selecting kaons through the reconstruction of c hadrons, rather than through their individual kinematic properties, results in a different set of selected kaons and provides a complementary source of tagging information.
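The charge-to-flavour logic can be written down explicitly. A K− (from c → s) or a positive D+ or Λc+ (which contains a c quark) implies that the opposite-side hadron contained a b quark, so the signal B contained a b̄ quark. In this sketch the tag decision is +1 when the signal B is predicted to contain a b̄ quark (B0, B+) and −1 when it contains a b quark (B̄0, B−). This sign convention and the function names are assumptions made for illustration.

```python
def tag_from_kaon_charge(kaon_charge: int) -> int:
    """Tag decision from the kaon charge in a Cabibbo-favoured charm decay.

    K- <- (c -> s) <- (opposite-side b -> c), so the signal B carries a b-bar: +1.
    """
    if kaon_charge not in (-1, +1):
        raise ValueError("kaon charge must be -1 or +1")
    return -kaon_charge


def tag_from_charm_charge(charm_charge: int) -> int:
    """Tag decision from the charge of a D+ or Lambda_c+ candidate.

    A positive charm hadron contains a c quark, i.e. it came from an
    opposite-side b quark, so the signal B carries a b-bar: +1.
    """
    if charm_charge not in (-1, +1):
        raise ValueError("charm hadron charge must be -1 or +1")
    return charm_charge


# D+ -> K- pi+ pi+: both routes give the same decision
assert tag_from_kaon_charge(-1) == tag_from_charm_charge(+1) == +1
```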
Several effects contribute an irreducible component to the mistag probability of the OS charm tagging algorithm. The dominant contributions come from the oscillation of opposite-side neutral B mesons (B0-B̄0 mixing) and from "wrong-sign" charm hadrons produced in b → cc̄q transitions. The impact of Cabibbo-suppressed D0 → K+ X decays is negligible, because these typically produce additional kaons and do not mimic the modes used by the tagging algorithm. Doubly Cabibbo-suppressed decays such as D0 → K+π− have a negligibly small branching fraction. Accounting for the relative production rates of b hadrons, neutral B oscillation, and the branching fractions of the decay modes used in the tagger, the irreducible mistag probabilities for the D0, D+, and Λc+ modes are estimated to be 23%, 19%, and 6%, respectively.
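The irreducible mistag probabilities quoted above bound the information each mode can provide. Even a background-free candidate contributes at most (1 − 2ω)² to the tagging power. The arithmetic below uses only the three numbers from the text; the function name is our own.

```python
def max_dilution_squared(omega_irreducible: float) -> float:
    """Upper bound on D^2 = (1 - 2*omega)**2 for a given irreducible mistag."""
    return (1.0 - 2.0 * omega_irreducible) ** 2


irreducible = {"D0": 0.23, "D+": 0.19, "Lambda_c+": 0.06}
for mode, omega in irreducible.items():
    print(f"{mode:10s} omega >= {omega:.2f} -> D^2 <= {max_dilution_squared(omega):.3f}")
```

This gives upper bounds of roughly 0.29, 0.38, and 0.77 per tagged event for the D0, D+, and Λc+ modes, respectively.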
In addition to the irreducible mistag probability arising from physics effects, the charm hadron candidates are contaminated by combinatorial background and by partially reconstructed b and c hadron backgrounds, which can lead to an incorrect tagging decision. For each mode, the charm tagger uses a multivariate algorithm that combines geometric and kinematic properties of the c hadron candidate and its daughters. The resulting discriminating variable is used both to suppress the combinatorial background and to predict the mistag probability of each surviving candidate.

Selection of charm candidates
Charm decay candidates are formed by combining kaon, pion, and proton candidates that satisfy particle identification criteria. These particles are required to have momentum p > 1000 MeV/c and transverse momentum with respect to the beam axis pT > 100 MeV/c, and to be significantly displaced from any PV. The partially reconstructed modes and the decay D0 → K−π+π+π− have large combinatorial backgrounds, so more stringent requirements are imposed on the displacement of their final-state particles from the PV. In addition, particles in D0 → K−π+π+π− candidates are required to have pT > 150 MeV/c.
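The kinematic part of this track preselection is simple enough to sketch. The displacement and particle-identification requirements are not modelled, because their values are not given here. The Track type and the mode label are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    p: float   # momentum in MeV/c
    pt: float  # transverse momentum w.r.t. the beam axis in MeV/c


FOUR_BODY_D0 = "D0 -> K- pi+ pi+ pi-"  # hypothetical mode label


def passes_kinematics(track: Track, mode: str) -> bool:
    """p > 1000 MeV/c and pT > 100 MeV/c (150 MeV/c for the four-body D0 mode).

    Displacement and particle-identification requirements are not modelled.
    """
    pt_min = 150.0 if mode == FOUR_BODY_D0 else 100.0
    return track.p > 1000.0 and track.pt > pt_min
```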
Table 1. Decay modes used in the OS charm tagger. The symbol Hc stands for any c hadron. The definition of the two right-most columns is given in the text.

Decay mode    Relative rate    Relative power

Charm hadron candidates are required to pass a number of selection requirements. These include a maximum distance of closest approach between each pair of daughter tracks and a minimum quality of the decay-vertex fit. Each candidate is required to be well separated from any PV and to have a trajectory that points back to the best PV, chosen to be the PV for which the impact parameter significance of the charm hadron is smallest. The invariant mass of the charm hadron candidate is required to be consistent with the known mass of the corresponding charm hadron: within 100 MeV/c² for the Λc+ channel and within 50 MeV/c² for all other fully reconstructed D decay modes. For the partially reconstructed D → K−π+X modes, the K−π+ mass is required to lie either in a window from 400 MeV/c² below to 0 MeV/c² above the known D0 mass, or within ±50 MeV/c² of the K*(892)0 resonance. The former window is motivated by the invariant-mass distribution of K−π+ pairs from the quasi-two-body decay D0 → K−ρ+. The latter selects D → K*(892)0 X decays. Charm candidates surviving these criteria still contain significant background contamination, which must be further reduced in order to lower the mistag probability of the algorithm.
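The mass windows above can be expressed as two small predicates. The known masses below are approximate PDG values supplied as assumptions for illustration (they are not quoted in this paper), and the treatment of the window edges is a guess.

```python
# Approximate PDG masses in MeV/c^2: assumed for illustration, not quoted in the paper
M_D0 = 1864.84
M_DPLUS = 1869.66
M_LAMBDAC = 2286.46
M_KSTAR0 = 895.55


def in_full_reco_window(mass: float, known_mass: float, is_lambdac: bool) -> bool:
    """Fully reconstructed modes: +-100 MeV/c^2 for Lambda_c+, +-50 MeV/c^2 otherwise."""
    half_width = 100.0 if is_lambdac else 50.0
    return abs(mass - known_mass) < half_width


def in_partial_kpi_window(m_kpi: float) -> bool:
    """Partially reconstructed D -> K- pi+ X: K- pi+ mass in [-400, 0] MeV/c^2
    around the D0 mass, or within +-50 MeV/c^2 of the K*(892)0 mass."""
    below_d0 = -400.0 <= m_kpi - M_D0 <= 0.0
    near_kstar = abs(m_kpi - M_KSTAR0) < 50.0
    return below_d0 or near_kstar
```

The asymmetric D0 window keeps K−π+ pairs whose mass is reduced by the missing π0, as expected from D0 → K−ρ+ with ρ+ → π+π0.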
For each mode, an adaptive boosted decision tree (BDT) [23,24] is used both to suppress background candidates and to estimate mistag probabilities. The inputs to the BDT are variables describing the decay kinematics, the decay vertex and its displacement, and particle identification information on the decay products. A variable related to the decay time is calculated from the distance between the c hadron decay vertex and the corresponding best PV. This approximates the sum of the decay times of the c hadron and its parent b hadron. The BDTs are trained using Monte Carlo (MC) simulations of bb̄ events containing B+ → J/ψ K+, B0 → J/ψ K*0, and B0s → J/ψ φ decays on the signal side and inclusive b hadron decays on the opposite side. These B decays are used to model the various sources and amounts of background encountered when reconstructing OS c hadrons recoiling against signal B decays.
The output of the BDT, together with the true identity of the candidates in simulation, is used to compute the predicted mistag probability η for each c hadron candidate. Only candidates with η < 45% are used in the flavour tagging decision. Removing the remaining candidates significantly reduces the computing time of the algorithm at little cost to tagging performance. When multiple charm candidates are present, the one with the lowest predicted mistag probability is retained. The combined efficiency of these requirements for retaining tagged events is (59.00 ± 0.07)% in simulation and (53.4 ± 0.3)% in data.
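The per-event choice is sketched below. Candidates with η ≥ 0.45 are dropped, the lowest-η survivor is kept, and the event is left untagged if none survive. The function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple

ETA_MAX = 0.45  # candidates with eta >= 0.45 are discarded


def select_candidate(etas: Sequence[float]) -> Optional[Tuple[int, float]]:
    """Return (index, eta) of the lowest-eta candidate with eta < ETA_MAX,
    or None if no candidate survives (the event is then untagged)."""
    survivors = [(i, eta) for i, eta in enumerate(etas) if eta < ETA_MAX]
    if not survivors:
        return None
    return min(survivors, key=lambda item: item[1])
```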

Calibration
While simulated data are used to develop and optimize the charm tagging algorithm, its performance is calibrated with collision data by comparing the algorithm's predictions to the known flavours of signal B candidates, according to the procedure detailed in ref. [2]. The calibration parameters p0, p1, Δp0, and Δp1 are defined by

$$\omega(\eta) = p_0 + p_1\,(\eta - \langle\eta\rangle), \qquad \Delta\omega(\eta) = \Delta p_0 + \Delta p_1\,(\eta - \langle\eta\rangle),$$

where ⟨η⟩ is the average predicted mistag probability, ω is the actual mistag probability averaged over B+ and B− signal mesons, and Δω is the excess mistag probability for B+ mesons with respect to B− mesons. Equivalent definitions hold for B0/B̄0 signal. In the ideal case, the offset parameter p0 equals ⟨η⟩ (and p1 = 1), so the related parameter δp0 = p0 − ⟨η⟩ is often more convenient.
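This sketch assumes the linear forms ω(η) = p0 + p1(η − ⟨η⟩) and Δω(η) = Δp0 + Δp1(η − ⟨η⟩), and reads Δω as ω(B+) − ω(B−). Under those assumptions the flavour-specific mistag probabilities are ω ± Δω/2. The class and method names are hypothetical. The usage line takes the central values of the B+ → J/ψ K+ row of table 2.

```python
from dataclasses import dataclass


@dataclass
class LinearCalibration:
    p0: float
    p1: float
    dp0: float
    dp1: float
    eta_mean: float  # <eta> of the calibration sample

    def omega(self, eta: float) -> float:
        """Calibrated mistag probability averaged over both production flavours."""
        return self.p0 + self.p1 * (eta - self.eta_mean)

    def delta_omega(self, eta: float) -> float:
        """Excess mistag for B+ (B0) with respect to B- (B0-bar)."""
        return self.dp0 + self.dp1 * (eta - self.eta_mean)

    def omega_for(self, eta: float, flavour: int) -> float:
        """flavour = +1 for B+ / B0, -1 for B- / B0-bar."""
        return self.omega(eta) + 0.5 * flavour * self.delta_omega(eta)


# central values of the B+ -> J/psi K+ row of table 2 (p0 = <eta> + delta p0)
cal = LinearCalibration(p0=0.379 - 0.025, p1=1.00, dp0=0.015, dp1=-0.08, eta_mean=0.379)
print(cal.omega(0.379), cal.omega_for(0.379, +1), cal.omega_for(0.379, -1))
```

At η = ⟨η⟩ this gives ω ≈ 0.354, split into about 0.3615 for B+ and 0.3465 for B−.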
A calibration of the algorithm has been performed using the flavour self-tagged mode B+ → J/ψ K+. The signal candidates are formed by combining pairs of oppositely charged muons, with invariant mass consistent with the known J/ψ mass, with charged kaons. They are required to pass a set of selection requirements that ensure a good signal-to-background ratio [2]. When multiple candidates are present in a single event, the one with the best decay-vertex fit is kept. A fit to the reconstructed B+ mass distribution is used to separate signal and background via the sPlot procedure, which computes signal and background weights for each candidate [25]. The empirical model for the signal is a sum of two Crystal Ball functions [26], while the background is modeled by an exponential distribution. A total of 1.1 × 10⁶ signal candidates in this channel are found in the full dataset. The parameters p0 and p1 are determined by splitting the data into 13 bins of η between 0.19 and 0.45, calculating the signal-weighted mistag fraction ω in each bin, and fitting a straight line to ω as a function of η. The parameters Δp0 and Δp1 are obtained in the same way from the excess mistag probability Δω (Fig. 2).

A cross-check of the calibration has been carried out using a B0 → J/ψ K*0 control sample. For this calibration, B0-B̄0 oscillation must be taken into account. The Hypatia function [27] is used to model the signal mass distribution, while the background is modeled with a sum of two exponential functions. A set of simultaneous fits to the B0 decay-time distribution in bins of η is performed, in which p0, p1, Δp0, and Δp1 are parameters of the fit model. In each bin, the raw B0-B̄0 mixing asymmetry is defined as

$$A_{\rm mix}(t) = \frac{N(D = P,\, t) - N(D \neq P,\, t)}{N(D = P,\, t) + N(D \neq P,\, t)},$$

where D is the B meson flavour at decay time t and P is the production flavour predicted by the charm tagger. The amplitude of this asymmetry is governed by the actual mistag fraction ω_i in the bin, while the bin's average predicted mistag probability is ⟨η⟩_i. The fit adjusts the calibration parameters so that the calibrated value ω(⟨η⟩_i) matches ω_i in each bin. A projection of the fitted model onto the mixing asymmetry is shown in Fig. 3.
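The binned B+ → J/ψ K+ procedure and the expected shape of the B0 mixing asymmetry can both be sketched in a few lines. This is a simplified illustration, not the analysis code. The following are all assumptions: the binomial uncertainty with an effective sample size for sWeighted data, the weighted least-squares line fit, and the value Δm_d ≈ 0.51 ps⁻¹ (an approximate world average). The asymmetry model ignores decay-time resolution and production asymmetries. In that idealized case the raw asymmetry is (1 − 2ω) cos(Δm_d t).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (mean eta in bin, omega, sigma_omega)


def binned_omega(etas: Sequence[float], wrong: Sequence[int], sweights: Sequence[float],
                 n_bins: int = 13, lo: float = 0.19, hi: float = 0.45) -> List[Point]:
    """sWeighted mistag fraction in equal-width bins of predicted mistag eta."""
    width = (hi - lo) / n_bins
    # per bin: sum w, sum w*eta, sum w*wrong, sum w^2
    acc = [[0.0, 0.0, 0.0, 0.0] for _ in range(n_bins)]
    for eta, is_wrong, w in zip(etas, wrong, sweights):
        if not lo <= eta < hi:
            continue
        b = min(int((eta - lo) / width), n_bins - 1)
        acc[b][0] += w
        acc[b][1] += w * eta
        acc[b][2] += w * is_wrong
        acc[b][3] += w * w
    points = []
    for sw, sw_eta, sw_wrong, sw2 in acc:
        if sw <= 0.0:
            continue  # empty bin (or net-negative signal weight): skip it
        omega = sw_wrong / sw
        n_eff = sw * sw / sw2  # effective sample size of weighted events
        # binomial error; the floor avoids a zero uncertainty when omega is 0 or 1
        sigma = math.sqrt(max(omega * (1.0 - omega), 1e-6) / n_eff)
        points.append((sw_eta / sw, omega, sigma))
    return points


def fit_line(points: Sequence[Point], eta_mean: float) -> Tuple[float, float]:
    """Weighted least squares for omega = p0 + p1 * (eta - eta_mean)."""
    s = sx = sy = sxx = sxy = 0.0
    for eta, omega, sigma in points:
        w = 1.0 / sigma ** 2
        x = eta - eta_mean
        s += w
        sx += w * x
        sy += w * omega
        sxx += w * x * x
        sxy += w * x * omega
    det = s * sxx - sx * sx
    if det == 0.0:
        raise ValueError("need at least two distinct eta values")
    p0 = (sy * sxx - sx * sxy) / det
    p1 = (s * sxy - sx * sy) / det
    return p0, p1


def raw_mixing_asymmetry(t_ps: float, omega: float, dm_ps: float = 0.51) -> float:
    """Idealized A_mix(t) = (1 - 2*omega) * cos(dm * t), t in ps, dm in 1/ps."""
    return (1.0 - 2.0 * omega) * math.cos(dm_ps * t_ps)
```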
The values of the calibration parameters obtained from the fit are given in table 2. They are compatible with those obtained in the B+ → J/ψ K+ mode, with a total χ² per degree of freedom of 0.65.
Figure 2. Excess mistag probability Δω as a function of the predicted mistag probability η for the B+ → J/ψ K+ data sample. A straight-line fit to extract the parameters Δp0 and Δp1 is superimposed. The dark (green) and light (yellow) bands are the regions within 1σ and 2σ of the fitted value, respectively.

Figure 3. Raw B0-B̄0 mixing asymmetry (defined in eq. 5.1) versus decay time for the B0 → J/ψ K*0 data sample. The amplitude of the asymmetry is diluted due to mistagging by the charm tagger. The mixing asymmetry from the fit is superimposed.

Table 2. Calibration parameters as determined from the B+ → J/ψ K+ and B0 → J/ψ K*0 control samples. For both calibration modes, the average predicted mistag probability ⟨η⟩ is 0.379. The first uncertainties are statistical and the second are systematic. The systematic uncertainties are evaluated using simulation.

Sample          δp0 (10⁻³)     p1                    Δp0 (10⁻³)     Δp1
B+ → J/ψ K+     −25 ± 3 ± 3    1.00 ± 0.06 ± 0.02    15 ± 5 ± 4     −0.08 ± 0.12 ± 0.04
B0 → J/ψ K*0    −18 ± 8 ± 3    1.16 ± 0.17 ± 0.02    23 ± 11 ± 4    0.21 ± 0.25 ± 0.04

The relatively small yield of the decay B0s → D−s π+ precludes a data-driven calibration on a B0s mode. Therefore, to ensure that the algorithm performs similarly for B0s channels as for B+ and B0 channels, separate calibrations are performed on simulated B+ → J/ψ K+, B0 → J/ψ K*0, and B0s → J/ψ φ events. Where statistically significant differences between the calibration parameters in the three channels are found, a systematic uncertainty corresponding to half of the maximum difference is assigned to the parameter. These systematic uncertainties are roughly the size of the statistical uncertainties for the parameters p0 and Δp0, but are negligible for p1 and Δp1. Propagating these uncertainties results in a 0.011% absolute systematic uncertainty on the tagging power, comparable to its statistical uncertainty.
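The rule for this simulation-based systematic uncertainty can be sketched as follows: half of the maximum difference, applied only when the differences are significant. The significance threshold `n_sigma` and the assumption of independent channel uncertainties (combined in quadrature) are hypothetical, since the text specifies neither. The numbers in the usage line are made up.

```python
import math
from itertools import combinations
from typing import Sequence


def channel_systematic(values: Sequence[float], errors: Sequence[float],
                       n_sigma: float = 2.0) -> float:
    """Half the maximum difference between channels, or 0 if no pair of
    channels differs by more than n_sigma combined standard deviations."""
    pairs = combinations(zip(values, errors), 2)
    # independent samples assumed: pairwise uncertainties added in quadrature
    significant = any(abs(a - b) > n_sigma * math.hypot(ea, eb)
                      for (a, ea), (b, eb) in pairs)
    return 0.5 * (max(values) - min(values)) if significant else 0.0


# hypothetical p0 values from simulated B+, B0 and B0s samples
print(channel_systematic([0.350, 0.356, 0.353], [0.001, 0.001, 0.001]))
```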
Other sources of systematic uncertainty on the calibration parameters have been investigated and found to have a negligible effect. These include the choice of model for the B invariant-mass distribution in the B+ → J/ψ K+ channel. Two alternative models of the mass distribution were used and gave nearly identical results.
There are additional systematic uncertainties related to flavour tagging that must be considered in a CP asymmetry analysis. These include differences between the signal channel sample and calibration channel sample in phase space distribution, event multiplicity, number of primary vertices,

Performance
The distribution of η after calibration for the B+ → J/ψ K+ control sample is shown in Fig. 4. The tagging efficiency, mistag fraction, and tagging power of the charm tagger are reported in table 3 for the training sample of simulated B → J/ψ X decays and for both calibration channels. The statistical uncertainty of the tagging power is dominated by the propagated statistical uncertainty of the calibration parameters. The overall tagging power is slightly higher in simulation than in data, owing to differences in the distributions of the input variables. The tagging powers in the two B → J/ψ X calibration channels are consistent. Table 3 also reports the tagging metrics for the decays B0 → D−π+ and B0s → D−s π+. Fits to the mass distributions of the signal candidates are performed to separate signal from background.
In each fit, the signal is modeled by a sum of two Crystal Ball functions and the combinatorial background by an exponential function. Several fully and partially reconstructed backgrounds are also modeled in the fit to the B0s → D−s π+ sample. The tagging efficiency for these samples is found to be higher than for the B → J/ψ X samples, due to correlations between the kinematics of the signal B and the opposite-side charm hadrons. The effective mistag fraction for these samples is consistent with that of the B → J/ψ X samples. The net effect is a higher tagging power for these B → DX decays, similar to that observed for other opposite-side tagging algorithms [7,28].
To use the charm tagger in a physics analysis, its tagging information can be combined with that of other tagging algorithms. The actual gain in performance depends on the method of combination and calibration, as well as on the set of tagging algorithms being combined. Because of correlations with other tagging algorithms, in particular the OS kaon and vertex-charge taggers, the maximum possible increase in tagging power from adding the charm tagging algorithm is less than its individual tagging power. The performance of the combination of the current OS tagging algorithms with and without the charm tagger has been measured on the B+ → J/ψ K+ data sample. The absolute net gain in tagging power using the current combination algorithm is found to be around 0.11%, compared to the current total OS tagging power of about 2.5% [3, 6-8].

Conclusion
An algorithm has been developed that determines the flavour of a signal b hadron at production time by reconstructing opposite-side charm hadrons in a number of decay channels. The flavour tagger uses boosted decision tree algorithms trained on simulated data, and has been calibrated and evaluated on data using the self-tagged decay B+ → J/ψ K+. Its tagging power for data in this channel is found to be (0.30 ± 0.01 (stat) ± 0.01 (syst))%. The calibration has been cross-checked using the decay B0 → J/ψ K*0, giving consistent results. The tagging power is found to be higher for the decays B0 → D−π+ and B0s → D−s π+, at (0.40 ± 0.02 (stat) ± 0.01 (syst))% and (0.39 ± 0.03 (stat) ± 0.01 (syst))%, respectively.