1 Introduction

The P-wave \({\mathrm {B}} ^0_{\mathrm {s}} \) states are the bound states of \({\mathrm {b}}\) and \({\mathrm {s}}\) quarks with an orbital angular momentum \(L=1\). Since the b quark is considerably heavier than the strange quark, heavy-quark effective theory (HQET) [1, 2] can be applied to describe this system. In the HQET framework, the state can be described by L and the spin of the light quark, providing a total angular momentum of the light subsystem \(j=L\pm \frac{1}{2}\). In the case of \(L=1\), this results in \(j=\frac{1}{2}\) or \(j=\frac{3}{2}\). Including the additional splitting from the spin of the heavy b quark results in a total angular momentum \(J=j\pm \frac{1}{2}\), yielding two doublets, with the four states denoted as: \({\mathrm {B}} ^{*}_{{\mathrm {s}}0}\) (\(j=\frac{1}{2}\), \(J^P=0^+\)), \({\mathrm {B}} ^{*}_{{\mathrm {s}}1}\) (\(j=\frac{1}{2}\), \(J^P=1^+\)), \({\mathrm {B}} _{{\mathrm {s}}1} \) (\(j=\frac{3}{2}\), \(J^P=1^+\)), and \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) (\(j=\frac{3}{2}\), \(J^P=2^+\)). The two former states have not been observed to date, while the latter two are known as the \({\mathrm {B}} _{{\mathrm {s}}1}(5830)^0 \) and \({\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \) mesons, respectively. For simplicity in this paper, shortened symbols are used to denote the following particles: \(\mathrm {K} ^{*0} \equiv \mathrm {K} ^{*}(892)^0 \), \({\mathrm {B}} _{1} \equiv {\mathrm {B}} _{1}(5721)^0 \), \({\mathrm {B}} ^{*}_{2} \equiv {\mathrm {B}} ^{*}_{2}(5747)^0 \), \({\mathrm {B}} _{{\mathrm {s}}1} \equiv {\mathrm {B}} _{{\mathrm {s}}1}(5830)^0 \), \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \equiv {\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \), and \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \) refers to either \({\mathrm {B}} _{{\mathrm {s}}1} \) or \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \). Charge-conjugate states are implied throughout the paper. According to HQET, the decays \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- \), \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- \), and \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- \) are allowed and should proceed through a D-wave transition, while the decay \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- \) is forbidden. Similar conclusions apply to the decays into \({\mathrm {B}} ^{(*)0} \mathrm {K} ^0_{\mathrm {S}} \).

Orbitally excited states of the \({\mathrm {B}} ^0_{\mathrm {s}} \) meson were observed by the CDF and D0 Collaborations via the decays into \({\mathrm {B}} ^{(*)+} \mathrm {K} ^- \) [3, 4]. More recently, the LHCb Collaboration presented a more precise study of these states and observed the decay \({\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- \) [5], favouring the spin-parity assignment \(J^P=2^+\) for the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \) state. The CDF Collaboration subsequently presented a study of excited \({\mathrm {B}}\) meson states [6] that included measurements of the \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \rightarrow {\mathrm {B}} ^{(*)+} \mathrm {K} ^- \) decays. Table 1 summarizes all the available experimental \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \) results.

Table 1 Results on the masses, mass differences, and natural widths of the \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \) mesons from previous measurements. The mass differences are defined as \(\varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{\pm } \equiv M({\mathrm {B}} _{{\mathrm {s}}1})-M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{\mathrm {K} ^-}^{\mathrm {PDG}} \) and \(\varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{\pm } \equiv M({\mathrm {B}} ^{*}_{{\mathrm {s}}2})-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}}-M_{\mathrm {K} ^-}^{\mathrm {PDG}} \), where the PDG superscript refers to the world-average mass values at the time of each publication

In this paper, the first observation of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) decay and a measurement of its branching fraction relative to that of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- \) decay are presented. The \({\mathrm {B}} ^+ \) and \({\mathrm {B}} ^0 \) candidates are reconstructed using the \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi (\mu ^+\mu ^-)\mathrm {K} ^+ \) and \({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi (\mu ^+\mu ^-)\mathrm {K} ^{*0} (\mathrm {K} ^+ \pi ^-)\) decays, respectively. Measurements of several ratios of branching fractions and ratios of production cross sections times branching fractions are determined using the formulae:

$$\begin{aligned} R^{0\pm }_{2}= & {} \frac{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} \nonumber \\= & {} \frac{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}\, \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}\nonumber \\&\times \frac{\mathcal {B}({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ )}{\mathcal {B}({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^{*0} )\mathcal {B}(\mathrm {K} ^{*0} \rightarrow \mathrm {K} ^+ \pi ^- )\mathcal {B}(\mathrm {K} ^0_{\mathrm {S}} \rightarrow \pi ^+\pi ^- )}, \nonumber \\\end{aligned}$$
(1)
$$\begin{aligned} R^{0\pm }_{1}= & {} \frac{\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )} \nonumber \\= & {} \frac{N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}\,\frac{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}\nonumber \\&\times \frac{\mathcal {B}({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ )}{\mathcal {B}({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^{*0} )\mathcal {B}(\mathrm {K} ^{*0} \rightarrow \mathrm {K} ^+ \pi ^- )\mathcal {B}(\mathrm {K} ^0_{\mathrm {S}} \rightarrow \pi ^+\pi ^- )}, \nonumber \\\end{aligned}$$
(2)
$$\begin{aligned} R^{\pm }_{2*}= & {} \frac{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}\nonumber \\= & {} \frac{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} \, \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}, \end{aligned}$$
(3)
$$\begin{aligned} R^{0}_{2*}= & {} \frac{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}\nonumber \\= & {} \frac{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )} \, \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}, \end{aligned}$$
(4)
$$\begin{aligned} R^{\pm }_{\sigma }= & {} \frac{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} _{{\mathrm {s}}1} \mathrm {X})\,\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} ^{*}_{{\mathrm {s}}2} \mathrm {X})\,\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} \nonumber \\= & {} \frac{N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} \, \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}, \end{aligned}$$
(5)
$$\begin{aligned} R^{0}_{\sigma }= & {} \frac{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} _{{\mathrm {s}}1} \mathrm {X})\,\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} ^{*}_{{\mathrm {s}}2} \mathrm {X})\,\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )} \nonumber \\= & {} \frac{N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )} \, \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}, \end{aligned}$$
(6)

where \(\mathrm {X}\) stands for an inclusive reaction, and \(N(\mathrm {A}\rightarrow \mathrm {BC})\) and \(\epsilon (\mathrm {A}\rightarrow \mathrm {BC})\) correspond to the number of \(\mathrm {A}\rightarrow \mathrm {BC}\) decays observed in data and the total efficiency for the \(\mathrm {A}\rightarrow \mathrm {BC}\) decay, respectively. The branching fractions of the decays \({\mathrm {B}} ^{*+} \rightarrow {\mathrm {B}} ^+ \gamma \) and \({\mathrm {B}} ^{*0} \rightarrow {\mathrm {B}} ^0 \gamma \) are assumed to be 100%. Additionally, the mass differences in the studied decays and the natural width of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \) state are measured, as well as the mass differences \(M_{{\mathrm {B}} ^0}-M_{{\mathrm {B}} ^+} \) and \(M_{{\mathrm {B}} ^{*0}}-M_{{\mathrm {B}} ^{*+}} \). The data sample corresponds to an integrated luminosity of of proton-proton collisions at \(\sqrt{s}= 8\,\text {TeV} \), collected by the CMS experiment [7] at the CERN LHC in 2012.

2 The CMS detector

The central feature of the CMS apparatus is a superconducting solenoid of 6\(\text {\,m}\) internal diameter, providing a magnetic field of 3.8\(\text {\,T}\). Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Muons are detected in the pseudorapidity range \(|\eta |<2.4\) in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. The main subdetectors used for the present analysis are the silicon tracker and the muon detection system. The silicon tracker measures charged particles within the range \(|\eta | < 2.5\). For nonisolated particles with transverse momentum \(1< p_{\mathrm {T}} < 10\,\text {GeV} \) and \(|\eta | < 1.4\), the track resolutions are typically 1.5% in \(p_{\mathrm {T}} \) and 25–90 (45–150)\(\,\mu \text {m}\) in the transverse (longitudinal) impact parameter [8]. Matching muons to tracks measured in the silicon tracker results in a relative \(p_{\mathrm {T}}\) resolution for muons with \(p_{\mathrm {T}} < 10\,\text {GeV} \) of 0.8–3.0% depending on \(|\eta |\) [9]. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [7].

Events of interest are selected using a two-tiered trigger system [10]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100\(\text {\,kHz}\) within a time interval of less than 4\(\,\mu \text {s}\). The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1\(\text {\,kHz}\) before data storage.

3 Event reconstruction and selection

The data sample is collected with an HLT algorithm designed to select events with two muons consistent with originating from a charmonium resonance decaying at a significant distance from the beam axis. The requirements imposed at the trigger level include \(p_{\mathrm {T}} (\mu ^\pm ) >3.5\,\text {GeV} \), \(|\eta (\mu ^\pm ) |<2.2\), \(p_{\mathrm {T}} (\mu ^+\mu ^-) >6.9\,\text {GeV} \), dimuon vertex \(\chi ^2\) fit probability \(P_{\text {vtx}}(\mu ^+\mu ^-) > 10\%\), dimuon invariant mass \(1.0<M(\mu ^+\mu ^-)<4.8\,\text {GeV} \), distance between the beam axis and the reconstructed dimuon vertex position in the transverse plane \(L_{xy}(\mu ^+\mu ^-) > 3 \sigma _{L_{xy}(\mu ^+\mu ^-)}\), where \(\sigma _{L_{xy}(\mu ^+\mu ^-)}\) is the uncertainty in \(L_{xy}(\mu ^+\mu ^-)\), and the cosine of the dimuon candidate pointing angle to the beam axis \(\cos (\vec {L}_{xy}(\mu ^+\mu ^-)\), \(\vec {p_{\mathrm {T}}}(\mu ^+\mu ^-)) > 0.9\). The pointing angle is the angle between the \(\mu ^+\mu ^- \) candidate momentum in the transverse (xy) plane and the vector from the beam axis position to the reconstructed dimuon vertex in the transverse plane.

The reconstruction and selection of the \({\mathrm {B}}\) meson candidates are similar to those described in Ref. [11]. The analysis requires two muons of opposite charge that must match those that triggered the event readout. The trigger requirements are confirmed and the \({\mathrm {J}}/\psi \) candidates are selected by tightening the dimuon mass region to \([3.04,\,3.15]\,\text {GeV} \).

The \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \) candidates are constructed by combining the selected \({\mathrm {J}}/\psi \) candidates with a track having \(p_{\mathrm {T}} >1\,\text {GeV} \) to which the kaon mass is assigned. The muon candidates must also satisfy the soft-muon identification criteria described in Ref. [9], and the kaon candidates must pass the high-purity track requirements detailed in Ref. [8]. A kinematic fit to the three tracks is performed that constrains the dimuon invariant mass to the world-average \({\mathrm {J}}/\psi \) mass [12]. From all the reconstructed \(\mathrm {pp}\) collision vertices in an event, the primary vertex (PV) is chosen as the one with the smallest \({\mathrm {B}} ^+ \) pointing angle. This pointing angle is the angle between the \({\mathrm {B}} ^+ \) candidate momentum and the vector from the PV to the reconstructed \({\mathrm {B}} ^+ \) candidate vertex. Furthermore, in this procedure, if any of the three tracks used in the \({\mathrm {B}} ^+ \) candidate reconstruction are included in the fit of the chosen PV, they are removed, and the PV is refitted. The \({\mathrm {B}} ^+ \) candidates are required to have \(p_{\mathrm {T}} ({\mathrm {B}} ^+) >10\,\text {GeV} \), \(P_{\text {vtx}}({\mathrm {B}} ^+)>1\%\), \(L_{xy}({\mathrm {B}} ^+) > 5 \sigma _{L_{xy}({\mathrm {B}} ^+)}\), and \(\cos (\vec {L}_{xy}({\mathrm {B}} ^+)\), \(\vec {p_{\mathrm {T}}}({\mathrm {B}} ^+))> 0.99\). The invariant mass distribution of the \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \) candidates is shown in Fig. 1a. An unbinned extended maximum-likelihood fit is performed to this distribution using a triple-Gaussian function with common mean for the signal, an exponential function for the combinatorial background, and a fixed-shape function, derived from simulation, accounting for the Cabibbo-suppressed \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \pi ^+ \) decay. The parameters of the signal and the combinatorial background contributions, as well as the yields of the different components, are free in the fit. The effective resolution of the signal function (\(\sigma _{M_{{\mathrm {B}} ^+}} \)) found from simulation of about 24\(\,\text {MeV}\) is consistent with the resolution measured in data. The invariant mass \(M({\mathrm {B}} ^+)\) returned by the vertex fit is required to lie in the range \([5.23, 5.33]\,\text {GeV} \), corresponding to a \(\pm 2\sigma _{M_{{\mathrm {B}} ^+}} \) window around the \({\mathrm {B}} ^+ \) mass.

The selected \({\mathrm {B}} ^+ \) candidates are combined with each track originating from the chosen PV with the charged kaon mass assigned to it. The track charge must be opposite to that of the reconstructed \({\mathrm {B}} ^+ \) meson candidate (in the following, this track is referred to as \(\mathrm {K} ^- \)). The kaon candidate is required to fulfill the standard high-purity track requirements [8] and have \(p_{\mathrm {T}} (\mathrm {K} ^-)>1\,\text {GeV} \).

The reconstruction of \({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi (\mu ^+\mu ^-)\mathrm {K} ^{*0} (\mathrm {K} ^+ \pi ^-)\) candidates is similar to the one used for the charged decay mode. The dimuon combinations forming \({\mathrm {J}}/\psi \) candidates are obtained using the same algorithm. The \({\mathrm {B}} ^0 \) candidates are constructed from the selected \({\mathrm {J}}/\psi \) candidates and two tracks of opposite charge, assumed to be from a kaon and a pion. The tracks are required to satisfy standard high-purity track requirements [8] and have \(p_{\mathrm {T}} >1\,\text {GeV} \). Those kaon and pion candidates that can be matched to a signal in the muon chambers are rejected.

Fig. 1
figure 1

Invariant mass distributions of a \({\mathrm {J}}/\psi \mathrm {K} ^+ \) and b \({\mathrm {J}}/\psi \mathrm {K} ^{*0} \) candidates in data with the fit results superimposed. The points represent the data, with the vertical bars giving the corresponding statistical uncertainties. The thick curves are results of the fits, the dash-dotted lines display the signal contributions, and the short-dashed lines show the combinatorial background contributions. The long-dashed line shows in a the contribution from the \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \pi ^+ \) decay, and in b the contribution from partially reconstructed \({\mathrm {B}} \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^{*0} \mathrm {X}\) decays. The dashed line in b displays the contribution from swapping \(\mathrm {K} ^\pm \rightarrow \pi ^\pm \) in the reconstruction

The \({\mathrm {B}} ^0 \) candidates are obtained by performing a kinematic vertex fit to the four tracks described above that constrains the dimuon invariant mass to that of the \({\mathrm {J}}/\psi \) meson [12]. The candidates are required to have \(L_{xy}({\mathrm {B}} ^0) > 5 \sigma _{L_{xy}({\mathrm {B}} ^0)}\), \(P_{\text {vtx}}(\mu ^+\mu ^- \mathrm {K} ^+ \pi ^-)>1\%\), \(\cos (\vec {L}_{xy}({\mathrm {B}} ^0),\vec {p_{\mathrm {T}}}({\mathrm {B}} ^0))> 0.99\), and \(p_{\mathrm {T}} ({\mathrm {B}} ^0) >10\,\text {GeV} \). To reject the contribution from \({\mathrm {B}} ^0_{\mathrm {s}} \rightarrow {\mathrm {J}}/\psi \phi \) decay, the invariant mass of the two hadron tracks, if both are assigned the kaon mass, is required to be above 1.035\(\,\text {GeV}\). We demand that the \(\mathrm {K} ^+ \pi ^- \) invariant mass is within 90\(\,\text {MeV}\) of the \(\mathrm {K} ^{*0} \) mass [12]. If both the \(\mathrm {K} ^+ \pi ^- \) and \(\mathrm {K} ^- \pi ^+ \) hypotheses pass this selection, then the \(\mathrm {K} ^+ \pi ^- \) invariant mass must be closer to the \(\mathrm {K} ^{*0} \) mass than the \(\mathrm {K} ^- \pi ^+ \) invariant mass. The invariant mass distribution of the selected \({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \pi ^- \) candidates is shown in Fig. 1b. It is fitted with a sum of a triple-Gaussian function with a common mean for the signal, a double-Gaussian function accounting for the \(\mathrm {K} ^\pm \rightarrow \pi ^\pm \) swapped (KPS) component, where the second Gaussian is asymmetric, and an exponential function for the combinatorial background. An additional Gaussian function is included to account for the partially reconstructed \({\mathrm {B}} \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^{*0} \mathrm {X}\) background near the left edge of the fit region. The resolution parameters of the signal function and the parameters of the KPS are fixed to the values obtained in simulation; the other parameters are free in the fit. The effective resolution of the signal function (\(\sigma _{M_{{\mathrm {B}} ^0}} \)) found from the simulation is about 19\(\,\text {MeV}\). The \({\mathrm {B}} ^0 \) candidate returned by the vertex fit is required to have an invariant mass in the range 5.245 to 5.313\(\,\text {GeV}\), corresponding to approximately \(\pm 2\sigma _{M_{{\mathrm {B}} ^0}} \) around the known \({\mathrm {B}} ^0 \) mass [12]. The fit results are used to extract the fraction of the KPS with respect to the signal yield in the \({\mathrm {B}} ^0 \) signal region of \((18.9\pm 0.3)\%\), where the uncertainty is statistical only.

The selected \({\mathrm {B}} ^0 \) candidates are combined with \(\mathrm {K} ^0_{\mathrm {S}}\) candidates that are formed from detached two-prong vertices, assuming the decay \(\mathrm {K} ^0_{\mathrm {S}} \rightarrow \pi ^+\pi ^- \), as described in Ref. [13]. The two-pion invariant mass is required to be within \(\pm 20\,\text {MeV} \) of the \(\mathrm {K} ^0_{\mathrm {S}}\) mass [12], which corresponds approximately to 4 times the \(\pi ^+\pi ^-\) mass resolution. The two pion tracks are refitted with their invariant mass constrained to the known \(\mathrm {K} ^0_{\mathrm {S}}\) mass, and the obtained \(\mathrm {K} ^0_{\mathrm {S}}\) candidate is required to satisfy \(P_{\text {vtx}}(\mathrm {K} ^0_{\mathrm {S}})>1\%\) and \(\cos (\vec {L}_{xy}(\mathrm {K} ^0_{\mathrm {S}}),\vec {p_{\mathrm {T}}}(\mathrm {K} ^0_{\mathrm {S}}))> 0.999\). Multiple candidates from the same event are not removed.

Simulated events that are used to obtain relative efficiencies and invariant mass resolutions are produced with pythia v6.424 [14]. The \({\mathrm {b}}\) hadron decays are modelled with evtgen 1.3.0 [15]. Final-state photon radiation is included in evtgen using photos  [16, 17]. The events are then passed through a detailed Geant4-based simulation [18] of the CMS detector with the same trigger and reconstruction algorithms as used for the data. The simulation includes effects from multiple \(\mathrm {pp}\) interactions in the same or nearby beam crossings (pileup) with the same multiplicity distribution as observed in data. Matching of the reconstructed candidates to the generated particles is obtained by requiring \(\varDelta R=\sqrt{\smash [b]{(\varDelta \eta )^2 + (\varDelta \phi )^2}}\) to be <0.015 for \(\pi ^\pm \) and \(\mathrm {K} ^\pm \), <0.004 for muons, and <0.020 for \(\mathrm {K} ^0_{\mathrm {S}}\), where \(\varDelta \eta \) and \(\varDelta \phi \) are the differences in pseudorapidity and azimuthal angle (in radians), respectively, between the three-momenta of the reconstructed and generated particles.

4 Fits to the \({\mathrm {B}} \mathrm {K} \) invariant mass distributions

For every invariant mass distribution fit discussed in this section, the functional models for the signal and the combinatorial background components are chosen such that a good description of the binned distribution is obtained. The description quality is verified using the difference between the data and fit result, divided by the statistical uncertainty in the data and also with \(\chi ^2\) tests.

4.1 \({\mathrm {B}} ^+ \mathrm {K} ^- \) invariant mass

To improve the \({\mathrm {B}} ^+ \mathrm {K} ^- \) invariant mass resolution, the variable \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) is computed as

$$\begin{aligned} m_{{\mathrm {B}} ^+ \mathrm {K} ^-} = M({\mathrm {B}} ^+ \mathrm {K} ^-) - M({\mathrm {B}} ^+) + M_{{\mathrm {B}} ^+}^{\mathrm {PDG}}, \end{aligned}$$

where \(M({\mathrm {B}} ^+ \mathrm {K} ^-)\) is the invariant mass of the reconstructed \({\mathrm {B}} ^+ \mathrm {K} ^- \) combination, \(M({\mathrm {B}} ^+)\) is the reconstructed \({\mathrm {B}} ^+ \) mass, and \(M_{{\mathrm {B}} ^+}^{\mathrm {PDG}}\) is the world-average \({\mathrm {B}} ^+ \) meson mass [12].

Fig. 2
figure 2

a Two-dimensional distribution of \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) versus \(m_{{\mathrm {B}} ^+ \pi ^-} \) in data. b The fitted \({\mathrm {B}} ^+ \pi ^- \) invariant mass distribution. The points represent the data, the thick solid curve is the fit projection, the thin lines indicate the three excited \({\mathrm {B}} ^0 \) signal contributions, the short-dashed curve is the combinatorial background, and the long-dashed lines show the contributions from the excited \({\mathrm {B}} ^0_{\mathrm {s}} \) decays

The decays of excited \({\mathrm {B}} ^0 \) mesons \({\mathrm {B}} _{1} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \), \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^+ \pi ^- \), and \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \) contribute to the obtained \({\mathrm {B}} ^+ \mathrm {K} ^- \) mass distribution, as seen from the two-dimensional distribution in Fig. 2a. It is important to take into account these background contributions in the fits to the \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) distribution. Simulated samples of these decays are reconstructed in the same way as the collision events to obtain the corresponding reflection shapes in the \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) distribution. In order to measure the yields of these reflections, the \({\mathrm {B}} ^+ \pi ^- \) invariant mass, \(m_{{\mathrm {B}} ^+ \pi ^-} \), is computed the same way as \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \). Fits are performed on the \(m_{{\mathrm {B}} ^+ \pi ^-} \) distribution observed in data, using the same data set, with a pion mass assigned to the track instead of a kaon mass. Then the obtained yields of these contributions are used in the fits to the \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) distribution.

The measured \(m_{{\mathrm {B}} ^+ \pi ^-} \) distribution is presented in Fig. 2b. Clear enhancements are seen around 5.65–5.75\(\,\text {GeV}\), corresponding to the decays of excited \({\mathrm {B}} ^0 \) mesons. An unbinned extended maximum-likelihood fit is performed to this distribution. The three signal functions accounting for the \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^+ \pi ^- \), \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \), and \({\mathrm {B}} _{1} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \) decays are D-wave relativistic Breit–Wigner (RBW) functions, convolved with a double-Gaussian resolution function, with parameters fixed according to the simulation (the typical effective resolution is about 5.5\(\,\text {MeV}\), significantly below the natural widths of the states). As verified in simulations, the signal shapes of \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \) and \({\mathrm {B}} _{1} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \) decays (where the photon from the \({\mathrm {B}} ^{*+} \) decay is lost and only the \({\mathrm {B}} ^+ \pi ^- \) mass is reconstructed) are simply shifted by the mass difference \(M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}} =45.34\pm 0.23\,\text {MeV} \) [12]. The combinatorial background is parametrized by the function \((x-x_0)^{\alpha }\, P_n(x)\), where \(x\equiv m_{{\mathrm {B}} ^+ \pi ^-} \), \(x_0\) is the threshold value, \(\alpha \) is a free parameter, and \(P_n\) is a polynomial of degree n, with \(n=3\). Additional, relatively small contributions come from the excited \({\mathrm {B}} ^0_{\mathrm {s}} \) decays. They are included in the fit with free normalizations and fixed shapes, obtained from the simulation.

In the nominal fit, the masses and natural widths of the excited \({\mathrm {B}} ^0 \) mesons are fixed to their world-average values [12]. The fit region is not extended to values above 5865\(\,\text {MeV}\) to avoid having to model the \({\mathrm {B}} (5970)\) contribution [6]. The fitted event yields are about 8500, 10 500, and 12 000 for the \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^+ \pi ^- \), \({\mathrm {B}} ^{*}_{2} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \), and \({\mathrm {B}} _{1} \rightarrow {\mathrm {B}} ^{*+} \pi ^- \) signals, respectively.

Fig. 3
figure 3

Invariant mass distributions of a \({\mathrm {B}} ^+ \mathrm {K} ^- \) and b \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) candidates with the results of the fit overlaid. The points represent the data, the thick solid curves are the results of the overall fits, and the thin solid lines display the signal contributions. The short-dashed lines show the combinatorial background contributions. The long-dashed lines show: in a the contributions from excited \({\mathrm {B}} ^0 \) meson decays, and in b the contributions from swapping \(\mathrm {K} ^\pm \rightarrow \pi ^\pm \) in the reconstruction of the \({\mathrm {B}} ^0 \) mesons

Figure 3a shows the measured \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) distribution. The three peaks from lower to higher mass correspond to the decays \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- \), \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- \), and \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- \). An unbinned extended maximum-likelihood fit is performed to this distribution using the sum of three signal functions, a background function, and the three reflections from the excited \({\mathrm {B}} ^0 \) decays. The signals are described with D-wave RBW functions convolved with double-Gaussian resolution functions obtained from the simulation (the effective resolutions are about 1–2\(\,\text {MeV}\)). The natural widths of the \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \) states and their masses are free parameters in the fit. The nonresonant background is modelled by \((x-x_0)^{\alpha }\, P_n(x)\), where \(x\equiv m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \), \(x_0\) is the threshold value, and the nominal fit uses \(n=6\). The reflections correspond to the contributions of excited \({\mathrm {B}} ^0 \) meson decays into a \({\mathrm {B}} ^{(*)+} \) meson and a charged pion, as described above. The shapes of these contributions are obtained from the simulation and are fixed in the fit to the data. The yields of these reflections are corrected by the efficiency of using the restricted fit region \(x_0<m_{{\mathrm {B}} ^+ \mathrm {K} ^-} <5.95\,\text {GeV} \). The results of the fit are presented in the second column of Table 2, where the measured masses of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) and \({\mathrm {B}} _{{\mathrm {s}}1} \) mesons are given with respect to the corresponding world-average \({\mathrm {B}} ^+ \) or \({\mathrm {B}} ^{*+} \), and \(\mathrm {K} ^- \) masses [12].

Table 2 The observed signal yields (N), natural widths (\(\varGamma \)), and mass differences from the fits to the \(m_{{\mathrm {B}} \mathrm {K} } \) distributions in data. The uncertainties are statistical only

4.2 \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass

Similarly to the \({\mathrm {B}} ^+ \mathrm {K} ^- \) channel, the variable \(m_{{\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}}} = M({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}}) - M({\mathrm {B}} ^0) + M_{{\mathrm {B}} ^0}^{\mathrm {PDG}}\) is used for the \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass. The \(m_{{\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}}} \) distribution of the selected \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) candidates is shown in Fig. 3b. There is a significant peak at about 5840\(\,\text {MeV}\) and a smaller one at 5781\(\,\text {MeV}\), corresponding to the decays \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) and \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \), respectively. The contribution from the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \) decay, also shown in Fig. 3b at 5795\(\,\text {MeV}\), is not statistically significant. However, it is still included in the fit model described below.

The decays \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \), \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \), and \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \) are modelled using three D-wave RBW functions convolved with double-Gaussian resolution functions whose parameters are fixed according to the simulation. The masses and natural widths are free parameters in the fit. Similarly to the \({\mathrm {B}} ^+ \mathrm {K} ^- \) final state, if the photon from \({\mathrm {B}} ^{*0} \) decay is lost and only the \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) mass is reconstructed, the peak position is simply shifted by the mass difference \(M_{{\mathrm {B}} ^{*0}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^0}^{\mathrm {PDG}} =45.18\pm 0.23\,\text {MeV} \) [12]. Studies on simulated events show that when the kaon and the pion from the \({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \pi ^- \) decay are exchanged, the three decays mentioned above produce narrow peaks at the same mass values as the signal peaks. In order to account for these KPS contributions, three additional RBW functions, convolved with double-Gaussian shapes, are added, where the parameters of these Gaussians are fixed to the values obtained in the simulation and the yields are fixed relative to the signal yields using the mistagging probability found in the fit to the \({\mathrm {B}} ^0 \) invariant mass distribution. A function of the form \((x-x_0)^{\alpha }\, P_n(x)\) is used to describe the combinatorial background, where \(x\equiv m_{{\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}}} \), \(x_0\) is the threshold value, and \(n=1\). The results of the fit are presented in the last column of Table 2, where the signal yields do not include the KPS component.

The significance of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) decay is estimated to be 6.3 standard deviations in the baseline fit model using a ratio of the fit likelihoods with and without the signal component [19]. Systematic uncertainties, discussed in the next section, are taken into account using nuisance parameters for the mass resolution, the KPS fraction, and the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) mass and natural width. These parameters are allowed to vary in the fits but are constrained by Gaussian probability density functions. In particular for the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) mass and natural width, the world-average values and their uncertainties [12] are used. Under variations of the fit range and background model, the significance varies from 6.3 to 7.0 standard deviations. Similarly, the statistical significance of the \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \) signal peak is 3.9 standard deviations, where the systematic uncertainties due to the mass resolution and KPS fraction are taken into account, as well as the uncertainties in the \({\mathrm {B}} _{{\mathrm {s}}1} \) mass and natural width. The significance varies from 3.6 to 3.9 standard deviations under variations of the fit region and the background model.

5 Efficiencies and systematic uncertainties

The efficiency for each decay channel is calculated using simulated signal samples. It is defined as the number of reconstructed signal events from the simulation divided by the number of generated events. The efficiency includes the detector acceptance, trigger, and candidate reconstruction efficiencies. Only the ratios of such efficiencies for different decay modes are needed in formulae (1)–(6), which reduces the systematic uncertainties in those ratios. The resulting efficiency ratios used in the measurements of the ratios of the branching fractions are:

$$\begin{aligned} \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}&= 15.77\pm 0.18,\\ \frac{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}&= 16.33\pm 0.20,\\ \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}&= 0.961\pm 0.010,\\ \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}&= 0.970\pm 0.012,\\ \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )}{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}&= 0.953\pm 0.010,\\ \frac{\epsilon ({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{\epsilon ({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}&= 0.987\pm 0.012, \end{aligned}$$

where the uncertainties are statistical only and related to the finite size of the simulated samples.

The ratios \(R^{0\pm }_{2} \) and \(R^{0\pm }_{1} \) involve different numbers of final-state tracks from the decay processes in the numerator and denominator, and the related signal yields are extracted from fits to different invariant mass distributions, unlike the ratios \(R^{\pm }_{2*} \), \(R^{0}_{2*} \), \(R^{\pm }_{\sigma } \), and \(R^{0}_{\sigma } \). Therefore, the systematic uncertainties are described separately for the two cases in the next two subsections.

The statistical uncertainties in the efficiency ratios are considered as sources of systematic uncertainty in the measured branching fraction ratios. The systematic uncertainties related to muon reconstruction and identification and trigger efficiencies cancel out in the ratios. Systematic uncertainties associated with the track reconstruction efficiency are assigned only in ratios involving final states with a different number of tracks. Validation studies of the simulated signal samples are performed by comparing distributions of variables employed in the event selection between simulation and background-subtracted data, using the channels with the larger yields in data (\({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- \), \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- \), and \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \)). No significant deviations are found, and no additional systematic uncertainties in the efficiency ratios are assigned.

5.1 Systematic uncertainties in the ratios \(R^{0\pm }_{2}\) and \(R^{0\pm }_{1}\)

A systematic uncertainty of \(2\times 3.9\%=7.8\%\) [8] is assigned to the \(R^{0\pm }_{2} \) and \(R^{0\pm }_{1} \) ratios due to the uncertainty in the track reconstruction efficiency, since the neutral decay channel has two additional charged particles in the final state in comparison to the charged decay channel.

To evaluate the systematic uncertainties related to the choice of the invariant mass fit model, several alternative functions are tested. The systematic uncertainty in each signal yield is calculated as the highest deviation of the observed signal yield from the baseline fit result. Changes in each fit involve variations in the polynomial degree n in the background model and the fit range; for the fit to the \(m_{{\mathrm {B}} ^+ \pi ^-} \) distribution the variations also include letting the signal masses and natural widths float. The uncertainties related to fits to the \({\mathrm {B}} ^+ \pi ^- \), \({\mathrm {B}} ^+ \mathrm {K} ^- \), and \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass distributions are treated separately and include:

  • A systematic uncertainty related to the fit to \({\mathrm {B}} ^+ \pi ^- \) invariant mass of 2.5% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^-)\) and 2.0% for \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^-)\),

  • A systematic uncertainty related to the fit to \({\mathrm {B}} ^+ \mathrm {K} ^- \) invariant mass of 2.4% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^-)\) and 4.6% for \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^-)\),

  • A systematic uncertainty related to the fit to \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass of 14% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )\) and 8.1% for \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )\).

The uncertainty from the invariant mass resolution is estimated by comparing the \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \) decays in data and simulation, yielding a difference of at most 2.6%. To account for this, the signal fits to the \(m_{{\mathrm {B}} ^+ \mathrm {K} ^-} \) and \(m_{{\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}}} \) distributions in data are repeated with the resolutions decreased and increased by 3%. The largest deviations from the baseline in the measured ratios are: 0.7% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )\) and 2.2% for \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )/N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )\). These values are used as systematic uncertainties in the ratios \(R^{0\pm }_{2} \) and \(R^{0\pm }_{1} \).

The fraction of the KPS component in the \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) signals is obtained from the fit to the \({\mathrm {B}} ^0 \) invariant mass distribution in the data. The systematic uncertainty in this fraction is evaluated by varying the \({\mathrm {B}} ^0 \) signal mass resolution by \(\pm 3\%\). The resulting variations of the KPS fraction are at most 3%. The other variations in the fit to the \({\mathrm {J}}/\psi \mathrm {K} ^{*0} \) invariant mass distribution result in negligible changes in the KPS fraction. The corresponding systematic uncertainty is 2.6% in both \(R^{0\pm }_{2} \) and \(R^{0\pm }_{1} \). As expected, the changes of the other ratios (\(R^{0}_{2*} \), \(R^{0}_{\sigma } \)) under these variations are negligible.

Formulae (1) and (2) assume the decay \({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \pi ^- \) proceeds only through the \(\mathrm {K} ^{*0} \) resonance. The systematic uncertainty related to this assumption is estimated by fitting the \(\mathrm {K} ^+ \pi ^- \) invariant mass distribution obtained from the candidate \({\mathrm {B}} ^0 \) data events using the background-subtraction technique \({}_\mathrm {s}\)Plot [20]. This gives an estimate of 5% for the nonresonant \(\mathrm {K} ^+ \pi ^- \) fraction in the total number of signal events, which is included as a systematic uncertainty in the ratios \(R^{0\pm }_{2} \) and \(R^{0\pm }_{1} \).

All these systematic uncertainties are summarized in Table 3, along with the total systematic uncertainty, calculated as the sum in quadrature of the different sources.

Table 3 Relative systematic uncertainties in percent in the ratios \(R^{0\pm }_{2} \) and \(R^{0\pm }_{1} \)
Table 4 Relative systematic uncertainties in percent in the ratios \(R^{\pm }_{2*} \), \(R^{0}_{2*} \), \(R^{\pm }_{\sigma } \), and \(R^{0}_{\sigma } \)

5.2 Systematic uncertainties in the ratios \(R^{\pm }_{2*}\), \(R^{0}_{2*}\), \(R^{\pm }_{\sigma }\), and \(R^{0}_{\sigma }\)

No systematic uncertainty related to the track reconstruction efficiency is assigned to the ratios considered in this subsection, since they involve final states in the numerator and denominator with equal numbers of charged particles.

In order to evaluate the systematic uncertainties related to the choice of the invariant mass fit model, several alternative functions are tested, as in the previous subsection. The systematic uncertainty in each ratio is calculated as the largest deviation of the corresponding ratio of signal yields obtained using alternative fit models with respect to the baseline fit model. The uncertainties related to the fits to \({\mathrm {B}} ^+ \pi ^- \), \({\mathrm {B}} ^+ \mathrm {K} ^- \), and \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass distributions are treated separately and include:

  • A systematic uncertainty related to the fit to \({\mathrm {B}} ^+ \pi ^- \) invariant mass of 2.9% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )\) and 2.7% for \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )\),

  • A systematic uncertainty related to the fit to \({\mathrm {B}} ^+ \mathrm {K} ^- \) invariant mass of 17% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )\) and 7.1% for \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )\),

  • A systematic uncertainty related to the fit to \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass of 13% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )\) and 24% for the ratio \(N({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )\).

The systematic uncertainty in the ratios \(R^{\pm }_{2*} \), \(R^{0}_{2*} \), \(R^{\pm }_{\sigma } \), and \(R^{0}_{\sigma } \), related to the knowledge of the invariant mass resolution is estimated as in the previous subsection, and is found to be in the range 1.2–3.0%.

The systematic uncertainty associated with the uncertainty in the mass differences \(M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}} \) and \(M_{{\mathrm {B}} ^{*0}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^0}^{\mathrm {PDG}} \) must be taken into account, since these values are fixed in the fits. The baseline fits are repeated with each mass difference fixed to its nominal value plus and minus its uncertainty, and the largest deviations from the baseline of the obtained ratios of signal yields are taken as systematic uncertainties: 7.7% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )\) and 4.8% for \(N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )/N({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )\). The changes in other ratios under variations of \(M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}} \) and \(M_{{\mathrm {B}} ^{*0}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^0}^{\mathrm {PDG}} \) are negligible.

The systematic uncertainties due to non-\(\mathrm {K} ^{*0} \) contributions cancel out in the ratios \(R^{0}_{2*} \) and \(R^{0}_{\sigma } \).

Table 4 lists those systematic uncertainties, together with the total ones, calculated by summing the different contributions in quadrature.

5.3 Systematic uncertainties in the mass differences and natural widths

The fits to the \({\mathrm {B}} \mathrm {K} \) invariant mass distributions are also used to measure the mass differences

$$\begin{aligned} {\begin{aligned} \varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{\pm }&= M({\mathrm {B}} ^{*}_{{\mathrm {s}}2})-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}}-M_{\mathrm {K} ^-}^{\mathrm {PDG}}, \\ \varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{\pm }&= M({\mathrm {B}} _{{\mathrm {s}}1})-M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{\mathrm {K} ^-}^{\mathrm {PDG}}, \\ \varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{0}&= M({\mathrm {B}} ^{*}_{{\mathrm {s}}2})-M_{{\mathrm {B}} ^0}^{\mathrm {PDG}}-M_{\mathrm {K} ^0_{\mathrm {S}}}^{\mathrm {PDG}}, \\ \varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{0}&= M({\mathrm {B}} _{{\mathrm {s}}1})-M_{{\mathrm {B}} ^{*0}}^{\mathrm {PDG}}-M_{\mathrm {K} ^0_{\mathrm {S}}}^{\mathrm {PDG}}. \end{aligned} } \end{aligned}$$

Using these values, the mass differences

$$\begin{aligned} M_{{\mathrm {B}} ^0}-M_{{\mathrm {B}} ^+}&= \varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{\pm }-\varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{0} +M_{\mathrm {K} ^-}^{\mathrm {PDG}}-M_{\mathrm {K} ^0_{\mathrm {S}}}^{\mathrm {PDG}} \end{aligned}$$

and

$$\begin{aligned} M_{{\mathrm {B}} ^{*0}}-M_{{\mathrm {B}} ^{*+}}&= \varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{\pm }-\varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{0} +M_{\mathrm {K} ^-}^{\mathrm {PDG}}-M_{\mathrm {K} ^0_{\mathrm {S}}}^{\mathrm {PDG}} \end{aligned}$$

can be determined.

The natural width of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) state is measured only in the \({\mathrm {B}} ^+ \mathrm {K} ^- \) channel due to the limited number of events in the \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) channel. Systematic uncertainties in these measurements are discussed in this subsection.

Table 5 Systematic uncertainties (in \(\,\text {MeV}\)) in the measured mass differences and natural width. The \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) width is measured only in the \({\mathrm {B}} ^+ \mathrm {K} ^- \) channel

The uncertainty related to the choice of the fit model is estimated by testing alternative fit models, as in Sect. 5.1. The largest deviation from the mass difference obtained from each baseline fit value is taken as the systematic uncertainty in the respective mass difference or natural width. The uncertainties related to the fits to the \({\mathrm {B}} ^+ \pi ^- \), \({\mathrm {B}} ^+ \mathrm {K} ^- \), and \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) invariant mass distributions are treated separately.

The systematic uncertainty associated with the knowledge of the mass difference \(M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}} \) (or \(M_{{\mathrm {B}} ^{*0}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^0}^{\mathrm {PDG}} \)) is taken into account as well: the baseline fits are repeated with the mass difference \(M_{{\mathrm {B}} ^{*}}^{\mathrm {PDG}}-M_{{\mathrm {B}}}^{\mathrm {PDG}} \) fixed to its nominal value plus or minus its uncertainty. The largest deviation from the baseline of the obtained mass differences and natural width is taken as the corresponding systematic uncertainty.

Studies of simulated events show that the mass differences measured in the reconstructed invariant mass distributions are slightly shifted with respect to the mass differences used in the generation of simulated events. Therefore, the measured mass differences are corrected by the observed shifts (which are up to 0.056\(\,\text {MeV}\)), and each shift is conservatively treated as a systematic uncertainty in the respective mass-difference measurement.

In order to estimate the systematic uncertainties due to possible misalignment of the detector [21], eighteen different simulated samples with various distorted geometries are produced and analyzed for each of the four decay channels. From these measurements the largest deviation of the estimation of the invariant mass or its resolution from the perfectly aligned case is accepted as an estimate of the systematic uncertainty from a possible detector misalignment. The magnitudes of distortions are large enough to be detected and corrected by the standard alignment procedures [21]. The shifts in the measured mass differences observed in these simulations are up to 0.038\(\,\text {MeV}\). The systematic uncertainty in the invariant mass resolution of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- \) signal is found to be 0.042\(\,\text {MeV}\), and the corresponding uncertainty in \(\varGamma _{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}} \) is obtained by repeating the baseline fit with the resolution increased or decreased by this value. The largest deviation in the measured natural width with respect to the baseline value is used as a systematic uncertainty.

The systematic uncertainties related to the invariant mass resolution are estimated in the same way as in the previous subsections and are found to be up to 0.007\(\,\text {MeV}\) for the mass differences and 0.2\(\,\text {MeV}\) for the natural width. This source of uncertainty is conservatively considered to be uncorrelated with the systematic uncertainty related to a possible detector misalignment.

These systematic uncertainties are summarized in Table 5, together with the total systematic uncertainties, calculated by summing in quadrature the different contributions. It was checked that the mass of the \({\mathrm {B}} ^+\) meson, measured in the \({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ \) decay, is consistent with the world-average value, after taking into account the systematic uncertainties related to the shift from the reconstruction and possible detector misalignment.

6 Results

The decay \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) is observed for the first time with a corresponding statistical significance of 6.3 standard deviations. The first evidence (3.9 standard deviations) for the decay \({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \) is found. In the measurements presented below of the relative branching fractions, cross sections multiplied by branching fractions, masses, mass differences, and natural width, the first uncertainty is statistical, the second is systematic, and if there is a third, it is related to the uncertainties in the world-average values of the branching fractions, masses, and mass differences [12].

Formulae (1)–(4) are used with the branching fractions [12] \(\mathcal {B}({\mathrm {B}} ^+ \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^+ )=(1.026\pm 0.031)\,10^{-3},\) \(\mathcal {B}({\mathrm {B}} ^0 \rightarrow {\mathrm {J}}/\psi \mathrm {K} ^{*0} )=(1.28\pm 0.05)\,10^{-3},\) \(\mathcal {B}(\mathrm {K} ^{*0} \rightarrow \mathrm {K} ^+ \pi ^- )=(0.99754\pm 0.00021),\) and \(\mathcal {B}(\mathrm {K} ^0_{\mathrm {S}} \rightarrow \pi ^+\pi ^- )=(0.6920\pm 0.0005)\) to determine the following ratios of branching fractions:

$$\begin{aligned} R^{0\pm }_{2}&= \frac{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )}{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} = 0.432\pm 0.077\pm 0.075\pm 0.021, \\ R^{0\pm }_{1}&= \frac{\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )} = 0.49\pm 0.12\pm 0.07\pm 0.02, \\ R^{\pm }_{2*}&= \frac{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} = 0.081\pm 0.021\pm 0.015, \\ R^{0}_{2*}&= \frac{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )} = 0.093\pm 0.086\pm 0.014. \end{aligned}$$

The ratio \(R^{0\pm }_{2} \) is in good agreement with the theoretical predictions of about 0.43 [22, 23], while the ratio \(R^{0\pm }_{1} \) is 2.5 standard deviations away from the theoretical prediction of 0.23 [22], which, however, has no uncertainty estimate. The third ratio is in agreement with the measurements of LHCb [5] and CDF [6]: \(0.093\pm 0.013\pm 0.012\) and \(0.10\pm 0.03\pm 0.02\), respectively. It is also consistent with the theoretical predictions [22,23,24,25]. The fourth ratio is a new result.

In addition, using Eqs. (5)–(6), the ratios of production cross sections times branching fractions are measured:

$$\begin{aligned} {\begin{aligned} R^{\pm }_{\sigma }&= \frac{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} _{{\mathrm {s}}1} \mathrm {X})\,\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*+} \mathrm {K} ^- )}{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} ^{*}_{{\mathrm {s}}2} \mathrm {X})\,\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^+ \mathrm {K} ^- )} \\&= 0.233\pm 0.019\pm 0.018, \\ R^{0}_{\sigma }&= \frac{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} _{{\mathrm {s}}1} \mathrm {X})\,\mathcal {B}({\mathrm {B}} _{{\mathrm {s}}1} \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} )}{\sigma (\mathrm {pp} \rightarrow {\mathrm {B}} ^{*}_{{\mathrm {s}}2} \mathrm {X})\,\mathcal {B}({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} )} \\&= 0.266\pm 0.079\pm 0.063. \end{aligned}} \end{aligned}$$

The value of \(R^{\pm }_{\sigma } \) was previously determined by LHCb to be \(0.232\pm 0.014\pm 0.013\) [5] at \(\sqrt{s}=7\,\text {TeV} \) and in a different pseudorapidity region, consistent with the result presented here.

The following mass differences are obtained:

$$\begin{aligned} {\begin{aligned} \varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{\pm }&= M({\mathrm {B}} ^{*}_{{\mathrm {s}}2})-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}}-M_{\mathrm {K} ^-}^{\mathrm {PDG}} \\&= 66.87\pm 0.09\pm 0.07\,\text {MeV}, \\ \varDelta M_{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}}^{0}&= M({\mathrm {B}} ^{*}_{{\mathrm {s}}2})-M_{{\mathrm {B}} ^0}^{\mathrm {PDG}}-M_{\mathrm {K} ^0_{\mathrm {S}}}^{\mathrm {PDG}} \\&= 62.37\pm 0.48\pm 0.07\,\text {MeV}, \\ \varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{\pm }&= M({\mathrm {B}} _{{\mathrm {s}}1})-M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{\mathrm {K} ^-}^{\mathrm {PDG}} \\&= 10.45\pm 0.09\pm 0.06\,\text {MeV}, \\ \varDelta M_{{\mathrm {B}} _{{\mathrm {s}}1}}^{0}&= M({\mathrm {B}} _{{\mathrm {s}}1})-M_{{\mathrm {B}} ^{*0}}^{\mathrm {PDG}}-M_{\mathrm {K} ^0_{\mathrm {S}}}^{\mathrm {PDG}} \\&= 5.61\pm 0.23\pm 0.06\,\text {MeV}. \end{aligned}} \end{aligned}$$

The first two mass differences are in good agreement with LHCb [5] and CDF [6] results (see Table 1). Using these two measurements, the world-average masses of the \({\mathrm {B}} ^+ \) and \(\mathrm {K} ^- \) mesons, and the mass difference \(M_{{\mathrm {B}} ^{*+}}^{\mathrm {PDG}}-M_{{\mathrm {B}} ^+}^{\mathrm {PDG}} \), the \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \) masses are determined:

$$\begin{aligned}\begin{aligned} M({\mathrm {B}} ^{*}_{{\mathrm {s}}2})&= 5839.86\pm 0.09\pm 0.07\pm 0.15\,\text {MeV},\\ M({\mathrm {B}} _{{\mathrm {s}}1})&= 5828.78\pm 0.09\pm 0.06\pm 0.28\,\text {MeV}.\\ \end{aligned}\end{aligned}$$

The measured masses in the \({\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) channel are consistent with our results using the \({\mathrm {B}} ^+ \mathrm {K} ^- \) channel but have significantly larger uncertainties.

Using the mass-difference measurements above, the mass differences between the neutral and charged \({\mathrm {B}} \) and \({\mathrm {B}} ^{*} \) mesons are found to be:

$$\begin{aligned}\begin{aligned} M_{{\mathrm {B}} ^0}-M_{{\mathrm {B}} ^+}&= 0.57\pm 0.49\pm 0.10\pm 0.02\,\text {MeV}, \\ M_{{\mathrm {B}} ^{*0}}-M_{{\mathrm {B}} ^{*+}}&= 0.91\pm 0.24\pm 0.09\pm 0.02\,\text {MeV}. \end{aligned}\end{aligned}$$

The first mass difference result is consistent with the significantly more precise world-average value of \(0.31\pm 0.06\,\text {MeV} \) [12]. There are no previous measurements of \(M_{{\mathrm {B}} ^{*0}}-M_{{\mathrm {B}} ^{*+}} \), and this paper presents a new method to measure both of these mass differences.

Lastly, the natural width of the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2} \) meson is determined to be

$$\begin{aligned} \varGamma _{{\mathrm {B}} ^{*}_{{\mathrm {s}}2}} = 1.52\pm 0.34\pm 0.30\,\text {MeV}, \end{aligned}$$

consistent with the results of LHCb [5] and CDF [6] (see Table 1).

7 Summary

The P-wave \({\mathrm {B}} ^0_{\mathrm {s}} \) meson states are studied using a data sample corresponding to an integrated luminosity of of proton-proton collisions collected by the CMS experiment at \(\sqrt{s}= 8\,\text {TeV} \) in 2012. Observation and evidence are reported for the decays \({\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \rightarrow {\mathrm {B}} ^0 \mathrm {K} ^0_{\mathrm {S}} \) and \({\mathrm {B}} _{{\mathrm {s}}1}(5830)^0 \rightarrow {\mathrm {B}} ^{*0} \mathrm {K} ^0_{\mathrm {S}} \), respectively. Four ratios of branching fractions and two ratios of production cross sections multiplied by the branching fractions of the P-wave \({\mathrm {B}} ^0_{\mathrm {s}} \) mesons into a \({\mathrm {B}}\) meson and kaon are measured. In addition, the differences between the \({\mathrm {B}} ^{(*)}_{{\mathrm {s}}1,2} \) mass and the sum of the \({\mathrm {B}}\) meson and kaon mass are determined, as well as the \({\mathrm {B}} ^{*}_{{\mathrm {s}}2}(5840)^0 \) natural width. Finally, using a new approach, the mass differences \(M_{{\mathrm {B}} ^0}-M_{{\mathrm {B}} ^+} \) and \(M_{{\mathrm {B}} ^{*0}}-M_{{\mathrm {B}} ^{*+}} \) are measured, where the latter is determined for the first time.