First study of the CP-violating phase and decay-width difference in $B_s^0\to\psi(2S)\phi$ decays

A time-dependent angular analysis of $B_s^0\to\psi(2S)\phi$ decays is performed using data recorded by the LHCb experiment. The data set corresponds to an integrated luminosity of 3.0 fb$^{-1}$ collected during Run 1 of the LHC. The CP-violating phase and decay-width difference of the $B_s^0$ system are measured to be $\phi_s = 0.23^{+0.29}_{-0.28} \pm 0.02$ rad and $\Delta\Gamma_s = 0.066^{+0.041}_{-0.044} \pm 0.007$ ps$^{-1}$, respectively, where the first uncertainty is statistical and the second systematic. This is the first time that $\phi_s$ and $\Delta\Gamma_s$ have been measured in a decay containing the $\psi(2S)$ resonance.


Introduction
The interference between the amplitudes of decays of $B_s^0$ mesons to $c\bar{c}X$ CP eigenstates directly or via mixing gives rise to a CP-violating phase, $\phi_s$. In the Standard Model (SM), ignoring subleading penguin contributions, this phase is predicted to be $-2\beta_s$, where $\beta_s = \arg[-(V_{ts}V_{tb}^*)/(V_{cs}V_{cb}^*)]$ and $V_{ij}$ are elements of the CKM quark flavour mixing matrix [1].
Measurements of $\phi_s$ using $B_s^0\to J/\psi K^+K^-$ and $B_s^0\to J/\psi\pi^+\pi^-$ decays have been reported previously by the LHCb collaboration [2] based upon 3.0 fb$^{-1}$ of integrated luminosity collected in $pp$ collisions at a centre-of-mass energy of 7 TeV in 2011 and 8 TeV in 2012 at the LHC. Measurements of $\phi_s$ using $B_s^0\to J/\psi\phi$ decays have also been made by the D0 [3], CDF [4], CMS [5] and ATLAS [6] collaborations. The world-average value of these direct measurements is $\phi_s = -0.033 \pm 0.033$ rad [7]. The global average from indirect measurements gives $\phi_s = -0.0376^{+0.0007}_{-0.0008}$ rad [8]. Measurements of $\phi_s$ are interesting since new physics (NP) processes could modify the phase if new particles were to contribute to the box diagrams describing $B_s^0$-$\bar{B}_s^0$ mixing [9, 10].
In this analysis $\phi_s$ is measured using a flavour-tagged, decay-time-dependent angular analysis of $B_s^0\to\psi(2S)\phi$ decays, with $\psi(2S)\to\mu^+\mu^-$ and $\phi\to K^+K^-$. In addition, measurements of the decay-width difference of the light (L) and heavy (H) $B_s^0$ mass eigenstates, $\Delta\Gamma_s \equiv \Gamma_L - \Gamma_H$, the average $B_s^0$ decay width, $\Gamma_s \equiv (\Gamma_L + \Gamma_H)/2$, and the polarisation amplitudes of the $B_s^0\to\psi(2S)\phi$ decay are reported. This is the first time that a higher $c\bar{c}$ resonance is used to measure $\phi_s$.
This analysis follows very closely that of $B_s^0\to J/\psi K^+K^-$ decays in Refs. [2,11], and only significant changes with respect to those analyses are described in this paper. Section 2 describes the phenomenology of the $B_s^0\to\psi(2S)\phi$ decay and the physics observables. Section 3 describes the LHCb detector, data and simulated samples that are used along with the optimisation of their selection. Section 4 details the $B_s^0$ meson decay-time resolution, decay-time efficiency and angular acceptance and Section 5 describes the flavour tagging algorithms. Results and systematic uncertainties are given in Section 6 and Section 7, respectively. Conclusions are presented in Section 8.

Phenomenology
The full formalism used for this analysis can be found in Ref. [11], where the $J/\psi$ is now replaced with the $\psi(2S)$ meson. The differential decay rate as a function of the signal decay time, $t$, and three helicity angles, $\Omega = (\cos\theta_\mu, \cos\theta_K, \varphi)$ (Fig. 1), is described by a sum of ten terms, corresponding to the four polarisation amplitudes (three corresponding to the $K^+K^-$ from the $\phi$ being in a P-wave configuration, and one to allow for an additional non-resonant $K^+K^-$ S-wave component) and their interference terms. Each term is the product of a time-dependent function and an angular function,
$$X(t,\Omega) \equiv \frac{\mathrm{d}^4\Gamma(B_s^0\to\psi(2S)\phi)}{\mathrm{d}t\,\mathrm{d}\Omega} \propto \sum_{k=1}^{10} h_k(t)\,f_k(\Omega), \tag{1}$$
where the definitions of $h_k(t)$ and $f_k(\Omega)$ are given in Ref. [11]. The $f_k(\Omega)$ functions depend only upon the final-state decay angles. The $h_k(t)$ functions depend upon all physics parameters of interest, which are $\Gamma_s$, $\Delta\Gamma_s$, $\phi_s$, $|\lambda|$, the mass difference of the $B_s^0$ eigenstates, $\Delta m_s$, and the polarisation amplitudes $A_i = |A_i|e^{-i\delta_i}$, where the indices $i \in \{0, \parallel, \perp, S\}$ refer to the different polarisation states of the $K^+K^-$ system. The sum $|A_\parallel|^2 + |A_0|^2 + |A_\perp|^2$ equals unity and by convention $\delta_0$ is zero. The S-wave fraction is defined as $F_S \equiv |A_S|^2/(|A_0|^2 + |A_\perp|^2 + |A_\parallel|^2 + |A_S|^2)$.
The parameter $\lambda$ describes CP violation in the interference between mixing and decay and is defined by $\lambda = \eta_i\,(q/p)\,(\bar{A}_i/A_i)$. The complex parameters $p = \langle B_s^0|B_{s,L}\rangle$ and $q = \langle\bar{B}_s^0|B_{s,L}\rangle$ describe the relation between flavour and mass eigenstates, where $B_{s,L}$ is the light mass eigenstate and $\eta_i$ is the CP eigenvalue of the polarisation state $i$. The CP-violating phase is defined by $\phi_s \equiv -\arg(\eta_i\lambda)$ and is assumed here to be the same for all polarisation states. In the absence of CP violation in decay it follows that $|\lambda| = 1$. In this paper CP violation in $B_s^0$-meson mixing is assumed to be negligible, following measurements in Refs. [12,13].
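As a short worked consequence of the normalisation convention above (an illustrative derivation, not an expression quoted from Ref. [11]), the S-wave fraction can be written in terms of $|A_S|^2$ alone, since the three P-wave amplitudes sum to unity:

```latex
\begin{align}
F_S &\equiv \frac{|A_S|^2}{|A_0|^2 + |A_\perp|^2 + |A_\parallel|^2 + |A_S|^2}
     = \frac{|A_S|^2}{1 + |A_S|^2}, \\
|A_S|^2 &= \frac{F_S}{1 - F_S}.
\end{align}
```

With this convention $F_S$ lies in $[0, 1)$, and a fit may equivalently be parametrised in terms of either $F_S$ or $|A_S|^2$.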

Detector, data set and selection
The LHCb detector [14,15] is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$, designed for the study of particles containing $b$ or $c$ quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the $pp$ interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, $p$, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/$c$. The minimum distance of a track to a primary vertex (PV), the impact parameter, is measured with a resolution of $(15 + 29/p_{\rm T})$ µm, where $p_{\rm T}$ is the component of the momentum transverse to the beam, in GeV/$c$. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
The online event selection is performed by a trigger [16], which consists of a hardware stage, based on information from the calorimeter and the muon system, followed by a software stage. In this analysis, candidates are required to pass the hardware trigger that selects muons and muon pairs based on their transverse momentum. In the software stage, events are triggered by a $\psi(2S)\to\mu^+\mu^-$ candidate, where the $\psi(2S)$ is required to be consistent with coming from the decay of a $b$ hadron, by using either impact parameter requirements on the decay products or the detachment of the $\psi(2S)$ candidate from the PV.
In the simulation, pp collisions are generated using Pythia [17] with a specific LHCb configuration [18]. Decays of hadronic particles are described by EvtGen [19], in which final-state radiation is generated using Photos [20]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [21] as described in Ref. [22].
The $B_s^0\to\psi(2S)\phi$ candidates are first selected with loose requirements to ensure high efficiency and significant background rejection. The $\psi(2S)$ candidates are reconstructed from pairs of oppositely-charged particles identified as muons, and the $\phi$ candidates are reconstructed from pairs of oppositely-charged particles identified as kaons. The invariant mass of the muon (kaon) pair must be within 60 MeV/$c^2$ (12 MeV/$c^2$) of the known $\psi(2S)$ ($\phi$) mass [23]. Reconstructed kaon tracks that do not correspond to actual trajectories of charged particles are suppressed by requiring a good track $\chi^2$ per degree of freedom. The $p_{\rm T}$ of each $\phi$ candidate is required to be larger than 1 GeV/$c$.
The $\psi(2S)$ and $\phi$ candidates that are consistent with originating from a common vertex are combined to create $B_s^0$ candidates. Subsequently, a kinematic fit [24] is applied to the $B_s^0$ candidates in which the $\psi(2S)$ mass is constrained to the known value [23] and the $B_s^0$ candidate is required to point back to the PV, to improve the resolution on the invariant mass $m(\psi(2S)K^+K^-)$. Combinatorial background from particles produced at the PV is reduced by requiring that the $B_s^0$ candidate decay time (computed from a vertex fit without the PV constraint) is larger than 0.3 ps. Backgrounds from the misidentification of final-state particles from other decays such as $B^0\to\psi(2S)K^+\pi^-$ and $\Lambda_b^0\to\psi(2S)pK^-$ are negligible.
To further improve the signal-to-background ratio, a boosted decision tree (BDT) [25,26] is applied. The BDT is trained using simulated $B_s^0\to\psi(2S)\phi$ events for the signal, while candidates from data with $m(\psi(2S)K^+K^-)$ larger than 5400 MeV/$c^2$ are used to model the background. Twelve variables that have good discrimination power between signal and background are used to define and train the BDT. These are: the $B_s^0$ candidate kinematic fit $\chi^2$; the $p_{\rm T}$ of the $B_s^0$ and $\phi$ candidates; the $B_s^0$ candidate flight distance and impact parameter with respect to the PV; the $\psi(2S)$ candidate vertex $\chi^2$; the $\chi^2_{\rm IP}$ of the kaon and muon candidates (defined as the change in $\chi^2$ of the PV fit when reconstructed with and without the considered particle) and the muon identification probabilities. The optimal working point for the BDT is determined using a figure of merit that optimises the statistical power of the selected data sample for the analysis of $\phi_s$ by taking account of the number of signal and background candidates, as well as the decay-time resolution and flavour-tagging power of each candidate.
Figure 2 shows the distribution of $m(\psi(2S)K^+K^-)$ for the selected $B_s^0\to\psi(2S)\phi$ candidates. An extended maximum likelihood fit is made to the unbinned $m(\psi(2S)K^+K^-)$ distribution, where the signal component is described by the sum of two Crystal Ball [27] functions and the small combinatorial background by an exponential function. All parameters are left free in the fit, including the yields of the signal and background components. This fit gives a yield of $4695 \pm 71$ signal candidates and $174 \pm 10$ background candidates in the range $m(\psi(2S)K^+K^-) \in [5310, 5430]$ MeV/$c^2$. It is used to assign per-candidate weights (sWeights) via the sPlot technique [28], which are used to subtract the background contribution in the maximum likelihood fit described in Section 6.
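The way sWeights arise from a two-component mass fit can be illustrated with a minimal sketch. It is not the analysis code: the signal shape is simplified to a single Gaussian and the background to a flat distribution (instead of two Crystal Ball functions and an exponential), the mean and width values are hypothetical, and the function names `fit_yields` and `signal_sweights` are introduced here only for illustration. The weights follow the standard sPlot construction, in which the covariance matrix of the fitted yields is combined with the per-candidate PDF values.

```python
import numpy as np

def fit_yields(f_sig, f_bkg, n_iter=500):
    """Extended ML estimate of the two yields with the PDF shapes held fixed."""
    f = np.vstack([f_sig, f_bkg])                    # shape (2, n_candidates)
    yields = np.full(2, 0.5 * f.shape[1])            # start from an even split
    for _ in range(n_iter):
        denom = yields @ f                           # sum_k N_k f_k(m) per candidate
        yields = yields * (f / denom).sum(axis=1)    # fixed-point (EM) update
    return yields

def signal_sweights(f_sig, f_bkg, yields):
    """Per-candidate signal sWeights from the fitted yields and PDF values."""
    f = np.vstack([f_sig, f_bkg])
    denom = yields @ f
    # Inverse covariance of the yields: sum over candidates of f_n f_j / denom^2
    inv_cov = (f[:, None, :] * f[None, :, :] / denom**2).sum(axis=2)
    cov = np.linalg.inv(inv_cov)
    return (cov[0] @ f) / denom

# Toy sample in the mass window quoted in the text; shape parameters are hypothetical.
rng = np.random.default_rng(1)
M_LO, M_HI = 5310.0, 5430.0      # MeV/c^2
MEAN, WIDTH = 5367.0, 7.0        # MeV/c^2, assumed for illustration only
mass = np.concatenate([rng.normal(MEAN, WIDTH, 4700),
                       rng.uniform(M_LO, M_HI, 170)])
f_sig = np.exp(-0.5 * ((mass - MEAN) / WIDTH) ** 2) / (WIDTH * np.sqrt(2 * np.pi))
f_bkg = np.full_like(mass, 1.0 / (M_HI - M_LO))

yields = fit_yields(f_sig, f_bkg)
weights = signal_sweights(f_sig, f_bkg, yields)
print("fitted yields:", yields)
print("sum of signal sWeights:", weights.sum())
```

At the likelihood maximum the signal sWeights sum to the fitted signal yield, and candidates far from the peak receive negative weights, which is how the background contribution is subtracted on a statistical basis.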

Detector resolution and efficiency
The resolution on the measured decay time is determined with the same method as described in Refs. [2, 11] by using a large sample of prompt $J/\psi K^+K^-$ combinations produced directly in the $pp$ interactions. These events are selected using prompt $J/\psi\to\mu^+\mu^-$ decays via a prescaled trigger that does not impose any requirements on the separation of the $J/\psi$ from the PV. The $J/\psi$ candidates are combined with oppositely charged tracks that are identified as kaons, using a similar selection as for the signal decay. The resolution model, $R(t - t')$, is the sum of two Gaussian distributions with per-event widths. These widths are calibrated by using a maximum likelihood fit to the unbinned decay time and decay-time uncertainty distributions of the prompt $J/\psi K^+K^-$ combinations, using a model composed of the sum of a $\delta$ function for the prompt component and two exponential functions for long-lived backgrounds, all of which are convolved with the resolution function. A third Gaussian distribution is added to the total fit function to account for the small ($<1\%$) fraction of decays that are associated to the wrong PV. The average effective resolution is $46.6 \pm 1.0$ fs. Simulated $B_s^0\to J/\psi K^+K^-$ and $B_s^0\to\psi(2S)K^+K^-$ events show no significant difference in the effective decay-time resolution between the two decay modes.
The reconstruction efficiency is not constant as a function of decay time due to displacement requirements made on signal tracks in the trigger and event selection. The efficiency is determined using the control channel $B^0\to\psi(2S)K^*(892)^0$, with $K^*(892)^0\to K^+\pi^-$, which is assumed to have a purely exponential decay-time distribution. It is defined as
$$\varepsilon^{B_s^0}_{\rm data}(t) = \varepsilon^{B^0}_{\rm data}(t) \times \frac{\varepsilon^{B_s^0}_{\rm sim}(t)}{\varepsilon^{B^0}_{\rm sim}(t)},$$
where $\varepsilon^{B^0}_{\rm data}(t)$ is the efficiency of the control channel and $\varepsilon^{B_s^0}_{\rm sim}(t)/\varepsilon^{B^0}_{\rm sim}(t)$ is the ratio of efficiencies of the simulated signal and control modes after the full trigger and selection chain has been applied. This correction accounts for the small differences in the lifetime and kinematics between the signal and control modes.
The $B^0\to\psi(2S)K^*(892)^0$ decay is selected using a similar trigger, preselection and the same BDT training and working point as used for the signal (with appropriate changes for kaon to pion). Backgrounds from the misidentification of final-state particles from other decays such as $B_s^0\to\psi(2S)\phi$ and $\Lambda_b^0\to\psi(2S)pK^-$ are negligible. Similarly, possible backgrounds from $B^0_{(s)}\to\psi(2S)\pi^+\pi^-$ decays where a pion is misidentified as a kaon, and $B^+\to\psi(2S)K^+$ decays combined with an additional random pion, are negligible.
The $\psi(2S)K^+\pi^-$ invariant mass distribution is shown in Fig. 3 along with the result of a fit composed of the sum of two Crystal Ball (CB) functions for the signal and an exponential function for the background. The tail parameters and relative fraction of the two CB functions are fixed to values obtained from a fit to simulated $B^0\to\psi(2S)K^*(892)^0$ decays. The core widths and common mean of the CB functions are free in the fit and the $B^0$ yield is found to be $28\,676 \pm 195$. The efficiency is defined as
$$\varepsilon^{B^0}_{\rm data}(t) = \frac{N^{B^0}_{\rm data}(t)}{N^{B^0}_{\rm gen}(t)},$$
where $N^{B^0}_{\rm data}(t)$ is the number of signal $B^0\to\psi(2S)K^*(892)^0$ decays in a given bin of decay time and $N^{B^0}_{\rm gen}(t)$ is the number of events generated from an exponential distribution with lifetime $\tau_{B^0} = 1.520 \pm 0.004$ ps [23]. The exponential distribution is convolved with a double Gaussian resolution model, the parameters of which are determined from a fit to the decay time distribution of prompt $J/\psi K^+\pi^-$ combinations. In total $10^7$ events are generated. The sPlot [28] technique with $m(\psi(2S)K^+\pi^-)$ as discriminating variable is used to determine $N^{B^0}_{\rm data}(t)$. The analysis is not sensitive to the absolute scale of the efficiency. The final decay-time efficiency for the $B_s^0\to\psi(2S)\phi$ signal is shown in Fig. 4. It is relatively uniform at high values of decay time but decreases at low decay times due to selection requirements placed on the track $\chi^2_{\rm IP}$ variables.
The efficiency as a function of the $B_s^0\to\psi(2S)\phi$ helicity angles is not uniform due to the forward geometry of the LHCb detector and the requirements imposed on the final-state particle momenta. The three-dimensional efficiency, $\varepsilon(\Omega)$, is determined with the same technique as used in Ref. [11] using simulated events that are subjected to the same trigger and selection criteria as the data. The relative efficiencies vary by up to 20%, dominated by the dependence on $\cos\theta_\mu$.
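To give a feel for what an effective resolution of about 47 fs means for the oscillation measurement, the sketch below uses a common convention that is assumed here rather than quoted from the text: the effective resolution of a multi-Gaussian model is taken to be the width of the single Gaussian that produces the same dilution, $D = \sum_j f_j \exp(-\sigma_j^2 \Delta m_s^2/2)$, of the $B_s^0$ oscillation amplitude. The function names are illustrative, and the value of $\Delta m_s$ is the one constrained in the fit of Section 6.

```python
import math

DELTA_M_S = 17.757  # ps^-1, mixing frequency used as a constraint in the fit

def dilution(widths_ps, fractions, delta_m=DELTA_M_S):
    """Oscillation-amplitude dilution of a sum of zero-mean Gaussians."""
    return sum(f * math.exp(-0.5 * (s * delta_m) ** 2)
               for s, f in zip(widths_ps, fractions))

def effective_resolution(widths_ps, fractions, delta_m=DELTA_M_S):
    """Width (ps) of the single Gaussian giving the same dilution (assumed convention)."""
    d = dilution(widths_ps, fractions, delta_m)
    return math.sqrt(-2.0 * math.log(d)) / delta_m

# A single Gaussian of 46.6 fs reduces the oscillation amplitude to about 71%.
print(dilution([0.0466], [1.0]))
print(effective_resolution([0.0466], [1.0]))
```

Under this convention a resolution of 46.6 fs corresponds to a dilution of roughly 0.71, which is why the decay-time resolution enters the figure of merit used in the selection optimisation.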
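The two-step construction of the decay-time efficiency described above (a binned ratio of sWeighted control-channel data to generated events, followed by a correction from simulation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: the function and argument names are hypothetical, the binning is arbitrary, and every bin is assumed to contain generated events.

```python
import numpy as np

def control_efficiency(t_data, sweights, t_gen, bin_edges):
    """Binned efficiency of the control channel: N_data(t) / N_gen(t)."""
    n_data, _ = np.histogram(t_data, bins=bin_edges, weights=sweights)
    n_gen, _ = np.histogram(t_gen, bins=bin_edges)
    return n_data / n_gen            # assumes no empty bins in n_gen

def signal_efficiency(eff_control_data, eff_signal_sim, eff_control_sim):
    """Control-channel efficiency corrected by the simulated signal/control ratio."""
    return eff_control_data * eff_signal_sim / eff_control_sim

# Tiny hand-made example with two decay-time bins (times in ps).
edges = np.array([0.3, 1.0, 2.0])
eff_b0 = control_efficiency(
    t_data=np.array([0.5, 0.6, 1.5]),
    sweights=np.array([1.0, 1.0, 0.5]),
    t_gen=np.array([0.4, 0.5, 0.7, 0.9, 1.2, 1.8]),
    bin_edges=edges,
)
eff_bs = signal_efficiency(eff_b0, np.array([0.4, 0.5]), np.array([0.5, 0.5]))
print(eff_b0, eff_bs)
```

Because only the shape of the efficiency matters for the fit, any overall normalisation of the generated sample drops out, consistent with the statement that the analysis is not sensitive to the absolute scale.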

Flavour tagging
The $B_s^0$ candidate flavour at production is determined by two independent classes of flavour tagging algorithms, the opposite-side (OS) taggers [29] and the same-side kaon (SSK) tagger [30], which exploit specific features of the production of $b\bar{b}$ quark pairs in $pp$ collisions, and their subsequent hadronisation. Each tagging algorithm gives a tag decision and a mistag probability. The tag decision, $q$, takes values $+1$, $-1$, or 0, if the signal meson is tagged as $B_s^0$, $\bar{B}_s^0$, or is untagged, respectively. The fraction of events in the sample with a nonzero tagging decision gives the efficiency of the tagger, $\varepsilon_{\rm tag}$. The mistag probability, $\eta$, is estimated event-by-event, and represents the probability that the algorithm assigns a wrong tag decision to the event; it is calibrated using data samples of several flavour-specific $B^0$, $B^+$ and $B_{s2}^{*0}$ decays to obtain the corrected mistag probability, $\overset{\scriptscriptstyle(-)}{\omega}$, for an initial flavour $\overset{\scriptscriptstyle(-)}{B}{}_s^0$ meson. A linear relationship between $\eta$ and $\overset{\scriptscriptstyle(-)}{\omega}$ is used for the calibration. The effective tagging power is given by $\varepsilon_{\rm tag}(1 - 2\omega)^2$ and for the combined taggers in the $B_s^0\to\psi(2S)\phi$ signal sample is $(3.88 \pm 0.13 \pm 0.12)\%$, where the first uncertainty is statistical and the second systematic.
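The quantities defined above can be made concrete with a short sketch. The per-event form of the tagging power, $\frac{1}{N}\sum_{\rm tagged}(1 - 2\omega_i)^2$, is equivalent to $\varepsilon_{\rm tag}$ times the average of $(1-2\omega)^2$ over tagged candidates. The specific linear form $\omega = p_0 + p_1(\eta - \langle\eta\rangle)$ and the parameter names are assumptions made for illustration; the text states only that the relationship is linear.

```python
import numpy as np

def calibrate(eta, p0, p1, eta_mean):
    """Linear mistag calibration; this parametrisation is an assumed example."""
    return p0 + p1 * (np.asarray(eta, dtype=float) - eta_mean)

def tagging_power(decisions, omega):
    """Effective tagging power: (1/N) * sum over tagged candidates of (1 - 2*omega)^2."""
    decisions = np.asarray(decisions)
    omega = np.asarray(omega, dtype=float)
    tagged = decisions != 0                 # untagged candidates carry q = 0
    return np.sum((1.0 - 2.0 * omega[tagged]) ** 2) / len(decisions)

# Four candidates: two tagged (q = +1, -1) and two untagged (q = 0).
q = [+1, -1, 0, 0]
omega = [0.3, 0.4, 0.5, 0.5]
print(tagging_power(q, omega))
```

In this toy example the tagging efficiency is 50% but the tagging power is only 5%, which illustrates why a sample of about 4700 signal candidates with a tagging power near 3.9% has the statistical reach of a much smaller perfectly tagged sample.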

Maximum likelihood fit
The physics parameters are determined by a weighted maximum likelihood fit of a signal-only probability density function (PDF) to the four-dimensional distribution of $B_s^0\to\psi(2S)\phi$ decay time and helicity angles. The negative log-likelihood function to be minimised is given by
$$-\ln\mathcal{L} = -\alpha\sum_i W_i \ln\mathcal{P},$$
where $W_i$ are the sWeights computed using $m(\psi(2S)K^+K^-)$ as the discriminating variable and the factor $\alpha = \sum_i W_i/\sum_i W_i^2$ is necessary to obtain the correct parameter uncertainties from the Hessian of the negative log-likelihood. The PDF, $\mathcal{P} = \mathcal{S}/\int\mathcal{S}\,\mathrm{d}t\,\mathrm{d}\Omega$, is obtained from [...], which allows for the inclusion of information from both tagging algorithms in the computation of the decay rate. The function $X(t,\Omega)$ is defined in Eq. 1 and $\bar{X}(t,\Omega)$ is the corresponding function for $\bar{B}_s^0$ decays. As in Ref. [11], the angular efficiency is included in the normalisation of the PDF via ten integrals, $I_k = \int\mathrm{d}\Omega\,\varepsilon(\Omega)f_k(\Omega)$, which are calculated using simulated events. In contrast to Refs. [2, 11], the fit is performed in a single bin of $m(K^+K^-)$, within 12 MeV/$c^2$ of the known $\phi$ mass.
In the fit, Gaussian constraints are applied to the $B_s^0$ mixing frequency $\Delta m_s = 17.757 \pm 0.021$ ps$^{-1}$ [7] and the tagging calibration parameters. The fitting procedure has been validated using pseudoexperiments and simulated $B_s^0\to\psi(2S)\phi$ decays. Due to the symmetry in the PDF there is a two-fold ambiguity in the solutions for $\phi_s$ and $\Delta\Gamma_s$; the solution with positive $\Delta\Gamma_s$ is used [31]. The results of the fit to the data are shown in Tables 1 and 2 while the projections of the fit onto the data are shown in Fig. 5. The results are consistent with previous measurements of these parameters [2-6], and the SM predictions for $\phi_s$ and $\Delta\Gamma_s$ [32-34]. They show no evidence of CP violation in the interference between $B_s^0$ meson mixing and decay, nor for direct CP violation in $B_s^0\to\psi(2S)\phi$ decays as the parameter $|\lambda|$ is consistent with unity. The likelihood profile for $\delta_\parallel$ is not parabolic and the 95% confidence level range is [2.4, 3.9] rad.
Figure 6 shows values of $F_L \equiv |A_0|^2$, the fraction of longitudinal polarisation, for $B_s^0\to\phi\mu^+\mu^-$ [35], $B_s^0\to J/\psi\phi$ and $B_s^0\to\psi(2S)\phi$ final states as a function of the invariant mass squared of the dimuon system, $q^2$. The precise measurement of $F_L$ from $B_s^0\to J/\psi\phi$ at $q^2 = 9.6$ GeV$^2/c^4$ is now joined by the precise measurement from this paper at $q^2 = 13.6$ GeV$^2/c^4$, demonstrating a clear decrease with $q^2$ towards the value of 1/3, as predicted by Ref. [36].
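The role of the factor $\alpha$ in the weighted likelihood can be seen in a minimal sketch. The function below evaluates $-\alpha\sum_i W_i\ln\mathcal{P}_i$ for given per-candidate PDF values; the name `weighted_nll` and its arguments are illustrative and the PDF values are placeholders rather than the full angular model.

```python
import numpy as np

def weighted_nll(pdf_values, weights):
    """Weighted negative log-likelihood, -alpha * sum_i W_i * ln P_i."""
    w = np.asarray(weights, dtype=float)
    p = np.asarray(pdf_values, dtype=float)
    alpha = w.sum() / np.sum(w ** 2)        # rescales to the effective sample size
    return -alpha * np.sum(w * np.log(p))

# With unit weights alpha = 1 and the usual unweighted expression is recovered.
print(weighted_nll([0.5, 0.5], [1.0, 1.0]))
# Doubling every weight leaves the result unchanged, since alpha halves.
print(weighted_nll([0.5, 0.5], [2.0, 2.0]))
```

Both calls give $2\ln 2$: an overall rescaling of the weights does not change the curvature of the likelihood, so the parameter uncertainties obtained from the Hessian reflect the effective number of candidates, $(\sum_i W_i)^2/\sum_i W_i^2$, rather than the arbitrary normalisation of the weights.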

Systematic uncertainties
Systematic uncertainties for each of the measured parameters are reported in Table 3. They are evaluated by observing the change in physics parameters after repeating the likelihood fit with a modified model assumption, or by generating pseudoexperiments in case of uncertainties originating from the limited size of a calibration sample. In general the sum in quadrature of the different sources of systematic uncertainty is less than 20% of the statistical uncertainty, except for $\Gamma_s$ where it is close to 60%.
Repeating the fit to $m(\psi(2S)K^+K^-)$ in bins of the decay time and helicity angles shows that the mass resolution depends upon $\cos\theta_\mu$. This breaks the assumption that $m(\psi(2S)K^+K^-)$ is uncorrelated with the observables of interest, which is implicitly made by the use of weights from the sPlot technique. The effect of this correlation is quantified by repeating the four-dimensional likelihood fit for different sets of signal weights computed from fits to $m(\psi(2S)K^+K^-)$ in bins of $\cos\theta_\mu$. The largest variation in each physics parameter is assigned as a systematic uncertainty. The mass model is tested by computing a new set of sWeights, using a Student's t-function to describe the signal component of the $m(\psi(2S)K^+K^-)$ distribution.
The statistical uncertainty on the angular efficiency is propagated by repeating the fit using new sets of the ten integrals, $I_k$, systematically varied according to their covariance matrix. The effect of assuming perfect angular resolution in the likelihood fit is studied using pseudoexperiments. There is a small effect on the polarisation amplitudes and strong phases while all other parameters are unaffected.
The decay-time resolution is studied by generating pseudoexperiments using the nominal double Gaussian model and subsequently fitting them using a single Gaussian model, the parameters of which have been calibrated on the prompt $J/\psi K^+K^-$ sample. In addition, the nominal model parameters are varied within their statistical uncertainties and the fit repeated.
The decay-time efficiency introduces a systematic uncertainty from three different sources. First, the contribution due to the statistical uncertainty on the determination of the decay-time efficiency from the control channel is determined by repeating the fit multiple times after randomly varying the parameters of the time efficiency within their statistical uncertainties. The statistical uncertainty is dominated by the size of the $B^0\to\psi(2S)K^*(892)^0$ control sample. Second, a Student's t-function is used as an alternative mass model for the $m(\psi(2S)K^+\pi^-)$ distribution and a new decay-time efficiency function is produced. Finally, the efficiency function is recomputed with the lifetime of the $B^0$ modified by $\pm1\sigma$. In all cases the difference in fit results arising from the use of the new efficiency function is taken as a systematic uncertainty. The sensitivity to the BDT selection is studied by adjusting the working point around the optimal position [...]
Figure 6: $|A_0|^2$ as a function of the invariant mass squared of the dimuon system, $q^2$. Data points are taken from Ref. [35] ($B_s^0\to\phi\mu^+\mu^-$, circles), Ref. [...]
[...] [2] and pseudoexperiments were used to assess the impact of ignoring such a contribution. Only $\Gamma_s$ was affected, with a bias on its central value of $(+20 \pm 6)\%$ of its statistical uncertainty. The assumption is made that the ratio of efficiencies for selecting $B_s^0\to\psi(2S)\phi$ decays either promptly or via the decay of $B_c^+$ mesons is the same as that for $B_s^0\to J/\psi\phi$ decays. This leads to a bias of $+0.002 \pm 0.001$ ps$^{-1}$ in $\Gamma_s$. The central value of $\Gamma_s$ is therefore reduced by 0.002 ps$^{-1}$ and a systematic uncertainty of 0.001 ps$^{-1}$ is assigned.
A test for a possible bias in the fit procedure is performed by generating and fitting many simulated pseudoexperiments of equivalent size to the data sample. The resulting biases are small and those that are not compatible with zero within two standard deviations are quoted as systematic uncertainties.
The uncertainty from knowledge of the LHCb detector's length and momentum scale is negligible, as is the statistical uncertainty from the sWeights. The tagging parameters are allowed to float in the fit using Gaussian constraints according to their uncertainties, and thus their systematic uncertainties are propagated into the statistical uncertainties reported on the physics parameters themselves. The systematic uncertainties for $\phi_s$, $\Delta\Gamma_s$ and $\Gamma_s$ can be treated as uncorrelated between this result and those in Ref. [2].
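The statement at the start of this section, that the total systematic uncertainty is generally below 20% of the statistical one, can be checked against the values quoted in the abstract for $\phi_s$ and $\Delta\Gamma_s$. The sketch below is illustrative only: the helper names are hypothetical, and the ratio is taken conservatively with respect to the smaller of the two asymmetric statistical uncertainties.

```python
import math

def quadrature_sum(values):
    """Combine independent uncertainty sources in quadrature."""
    return math.sqrt(sum(v * v for v in values))

def syst_to_stat_ratio(syst, stat_up, stat_down):
    """Ratio of systematic to statistical uncertainty, using the smaller stat. side."""
    return syst / min(stat_up, stat_down)

# Values quoted in the abstract (phi_s in rad, Delta Gamma_s in ps^-1).
ratio_phi_s = syst_to_stat_ratio(0.02, 0.29, 0.28)
ratio_dgamma_s = syst_to_stat_ratio(0.007, 0.041, 0.044)
print(ratio_phi_s, ratio_dgamma_s)
```

The ratios are about 7% for $\phi_s$ and 17% for $\Delta\Gamma_s$, so the measurement of both parameters is statistically limited.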

Conclusions
Using a dataset corresponding to an integrated luminosity of 3.0 fb$^{-1}$ collected by the LHCb experiment in $pp$ collisions during LHC Run 1, a flavour-tagged, decay-time-dependent angular analysis of approximately 4700 $B_s^0\to\psi(2S)\phi$ decays is performed. The analysis gives access to a number of physics parameters including the CP-violating phase, average decay width and decay-width difference of the $B_s^0$ system as well as the polarisation amplitudes and strong phases of the decay. The effective decay-time resolution and effective tagging power are approximately 47 fs and 3.9%, respectively. This is the first measurement of the CP content of the $B_s^0\to\psi(2S)\phi$ decay and the first time that $\phi_s$ and $\Delta\Gamma_s$ have been measured in a final state containing the $\psi(2S)$ resonance. The results are consistent with previous measurements [2-6] and the SM predictions [32-34], and show no evidence of CP violation in the interference between $B_s^0$ meson mixing and decay. The parameter $|\lambda|$ is consistent with unity, implying no evidence for direct CP violation in $B_s^0\to\psi(2S)\phi$ decays. The fraction of longitudinal polarisation in the $B_s^0\to\psi(2S)\phi$ decay is measured to be lower than that in the $B_s^0\to J/\psi\phi$ decay, consistent with the predictions of Ref. [36].