Measurement of angular parameters from the decay $\mathrm{B}^0 \to \mathrm{K}^{*0} \mu^+ \mu^-$ in proton-proton collisions at $\sqrt{s} = $ 8 TeV

Angular distributions of the decay $\mathrm{B}^0 \to \mathrm{K}^{*0} \mu^+ \mu^-$ are studied using a sample of proton-proton collisions at $\sqrt{s} = $ 8 TeV collected with the CMS detector at the LHC, corresponding to an integrated luminosity of 20.5 fb$^{-1}$. An angular analysis is performed to determine the $P_1$ and $P_5'$ parameters, where the $P_5'$ parameter is of particular interest because of recent measurements that indicate a potential discrepancy with the standard model predictions. Based on a sample of 1397 signal events, the $P_1$ and $P_5'$ parameters are determined as a function of the dimuon invariant mass squared. The measurements are in agreement with predictions based on the standard model.


Introduction
Phenomena beyond the standard model (SM) of particle physics can become manifest directly, via the production of new particles, or indirectly, by modifying the production and decay properties of SM particles. Analyses of flavor-changing neutral-current decays are particularly sensitive to the effects of new physics because these decays are highly suppressed in the SM. An example is the decay B 0 → K * 0 µ + µ − , where K * 0 indicates the K * 0 (892) meson, with the chargeconjugate reaction implied here and elsewhere in this Letter unless otherwise stated. An angular analysis of this decay as a function of the dimuon invariant mass squared (q 2 ) allows its properties to be thoroughly investigated.
The differential decay rate for B 0 → K * 0 µ + µ − can be written in terms of q 2 and three angular variables as a combination of spherical harmonics, weighted by q 2 -dependent angular parameters. These angular parameters in turn depend upon complex decay amplitudes, which are described by Wilson coefficients in the relevant effective Hamiltonian [1]. There can be different formulations of the angular parameters. In this Letter we present measurements of the so-called P 1 and P 5 parameters [2,3].
The new measurements of the P 1 and P 5 angular parameters in B 0 → K * 0 µ + µ − decays presented in this Letter are performed using a sample of events collected in proton-proton (pp) collisions at a center-of-mass energy of 8 TeV with the CMS detector at the CERN LHC. The data correspond to an integrated luminosity of 20.5 ± 0.5 fb −1 [35]. The K * 0 meson is reconstructed through its decay to K + π − , and the B 0 meson by fitting to a common vertex the tracks from two oppositely charged muon candidates and the tracks from the K * 0 decay. The values of P 1 and P 5 are measured by fitting the distributions of events as a function of three angular variables: the angle between the µ + and the B 0 in the dimuon rest frame, the angle between the K + and the B 0 in the K * 0 rest frame, and the angle between the dimuon and the Kπ decay planes in the B 0 rest frame. The measurements are performed in the q 2 range from 1 to 19 GeV 2 . Data in the ranges 8.68 < q 2 < 10.09 GeV 2 and 12.90 < q 2 < 14.18 GeV 2 correspond to B 0 → J/ψK * 0 and B 0 → ψ K * 0 decays, respectively, and are used as control samples, since they have the same final state as the nonresonant decays of interest. Here, ψ denotes the ψ(2S) meson.
CMS previously exploited the same data set used in this analysis to measure two other angular parameters in the B 0 → K * 0 µ + µ − decay as a function of q 2 : the forward-backward asymmetry of the muons, A FB , and the K * 0 longitudinal polarization fraction, F L , as well as the differential branching fraction, dB/dq 2 [31]. After a simplification of the theoretical decay rate expression, this previous measurement was performed using two out of the three angular variables. The analysis was performed with a blinded procedure: the definition of fit strategy and its validation, as well as background distribution determination have been performed on simulated samples, control region and signal side bands. The final fit on data has been done at the end of validation. The analysis presented in this Letter shares with the previous analysis, together with the data set, the criteria used for selecting signal events, which are reported in Section 3 for completeness.

The CMS detector
A detailed description of the CMS detector, together with the coordinate system and the standard kinematic variables, can be found in Ref. [36]. The main detector components used in this analysis are the silicon tracker and the muon detection systems. The silicon tracker, positioned within a superconducting solenoid that provides an axial magnetic field of 3.8 T, consists of three pixel layers and ten strip layers (four of which have a stereo view) in the barrel region, accompanied by similar pixel and strip detectors in each endcap region, for a total pseudorapidity coverage of |η| < 2.5. For tracks with transverse momenta 1 < p T < 10 GeV and |η| < 1.4, the resolutions are typically 1.5% in p T and 25-90 (45-150) µm in the transverse (longitudinal) impact parameter [37]. Muons are measured in the range |η| < 2.4 with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive plate chambers. The probability for a pion, kaon, or proton to be misidentified as a muon is less than 2.5 × 10 −3 , 0.5 × 10 −3 , and 0.6 × 10 −3 , respectively, for p T > 4 GeV and |η| < 2.4. The muon identification efficiency is greater than 0.80 (0.98) for p T > 3.5 GeV and |η| < 1.2 (1.2 < |η| < 2.4) [38]. In addition to the tracker and muon detectors, CMS is equipped with electromagnetic and hadronic calorimeters.
Events are selected using a two-level trigger system [39]. The first level consists of specialized hardware processors that use information from the calorimeters and muon systems to select events of interest at a rate of around 90 kHz. A high-level trigger processor farm further decreases the event rate to less than 1 kHz before data storage.

Reconstruction, event selection, and efficiency
The criteria used to select the candidate events during data taking (trigger) and after full event reconstruction (offline) make use of the relatively long lifetime of B 0 mesons, which leads them to decay an average of about 1 mm from their production point. The trigger uses only muon information to select events, while the offline selection includes the full reconstruction of all decay products.
All events used in this analysis were recorded with the same trigger, requiring two identified muons of opposite charge to form a vertex that is displaced from the pp collision region (beamspot). Multiple pp collisions in the same or nearby beam crossings (pileup) cause multiple vertices in the same event. The beamspot position (the most probable collision point) and size (the extent of the luminous region covering 68% of the collisions in each dimension) were continuously measured through Gaussian fits to reconstructed pileup vertices as part of the online data quality monitoring. The trigger required each muon to have p T > 3.5 GeV, |η| < 2.2, and to pass within 2 cm of the beam axis. The dimuon system was required to have p T > 6.9 GeV, a vertex fit χ 2 probability larger than 10%, and a separation of the vertex relative to the beamspot in the transverse plane of at least 3 standard deviations, where the calculation of the standard deviation includes the calculated uncertainty in the vertex position and the measured size of the beamspot. In addition, the cosine of the angle in the transverse plane between the dimuon momentum vector and the vector from the beamspot to the dimuon vertex was required to be greater than 0.9.
The offline reconstruction requires at least two oppositely charged muons and at least two oppositely charged hadrons. The muons are required to match those that triggered the event. The matching is performed by requiring an offline muon to match a trigger-level muon within ∆R = √ (∆η) 2 + (∆φ) 2 < 0.1, where ∆η and ∆φ are the pseudorapidity and azimuthal angle differences, respectively, between the directions of the trigger-level and offline muons. Offline muons must, in addition, satisfy general muon identification requirements. For example, the muon track candidate from the silicon tracker must match a track segment from the muon detector, the χ 2 per degree of freedom in a global fit to the silicon tracker and muon detector hits must be less than 1.9, there must be at least six silicon tracker hits, including at least two from the pixel detector, and the transverse (longitudinal) impact parameter with respect to the beamspot must be less than 3 (30) cm. These selection criteria are chosen to optimize the muon identification efficiency as measured using J/ψ → µ + µ − decays in data. The dimuon system at the offline level is required to satisfy the same requirements as specified above for the trigger level.
The charged hadron candidates are required to fail the muon identification criteria, have p T > 0.8 GeV, and an extrapolated distance d of closest approach to the beamspot in the transverse plane greater than twice the sum in quadrature of the uncertainty in d and the beamspot transverse size. For at least one of the two possible identity assignments-that the positively charged hadron is a kaon and the negatively charged hadron a pion, or vice versa-the invariant mass of the hadron pair must lie within 90 MeV of the nominal K * 0 mass [40]. To remove contamination from φ(1020) → K + K − decays, we temporarily assign the kaon mass to both charged hadrons, and then eliminate the candidate if the resulting invariant mass of the hadron pair is less than 1.035 GeV. The B 0 candidates are obtained by fitting the four charged tracks to a common vertex, and applying a vertex constraint to improve the resolution of the track parameters. The B 0 candidates must have p T > 8 GeV, |η| < 2.2, vertex fit χ 2 probability larger than 10%, vertex transverse separation L from the beamspot greater than 12 times the sum in quadrature of the uncertainty in L and the beamspot transverse size, and cos α xy > 0.9994, where α xy is the angle in the transverse plane between the B 0 momentum vector and the line-of-flight between the beamspot and the B 0 vertex. The invariant mass m of the B 0 candidate must lie within 280 MeV of the nominal B 0 mass (m B 0 ) [40] for either the K − π + µ + µ − or K + π − µ + µ − possibility. The selection criteria are optimized using signal event samples from simulation and background event samples from sideband data in m. The sideband includes both a low-and a high-mass region and is defined by 3σ m < |m − m B 0 | < 280 MeV, where σ m is the average mass resolution (≈45 MeV) obtained from fitting a sum of two Gaussian functions with a common mean to simulated signal events. After applying the selection criteria, about 5% of the events have more than one candidate. A single candidate is chosen based on the best B 0 vertex χ 2 probability.
For each of the selected events, the dimuon invariant mass q and its uncertainty σ q are calculated. We define B 0 → J/ψK * 0 and B 0 → ψ K * 0 control samples through the requirements |q − m J/ψ | < 3σ q and |q − m ψ | < 3σ q , respectively, where m J/ψ and m ψ are the nominal masses [40] of the indicated meson. The average value of σ q is about 26 MeV.
The remaining event sample still contains contributions from B 0 → J/ψK * 0 and B 0 → ψ K * 0 decays, mainly because of unreconstructed soft photons in the charmonium decay, i.e., J/ψ or ψ → µ + µ − γ. These events have a low value of q and fall outside the control sample selection described above. They also have a low value of m and can be selectively removed using a combined requirement on q and m.
Using Monte Carlo (MC) simulation, these requirements were set so that less than 10% of the background events originate from the control channels.
To avoid bias, the optimization of the selection criteria, and the fit strategy described below in Section 4, are determined before data in the signal region are examined. The selection criteria do not depend on the choice of the primary vertex, and their optimization procedure makes use of both MC simulated signal events generated with the same pileup distribution as in data, and sideband data. After applying these requirements, 3191 events remain.
The selected four-track vertex is identified as a B 0 or B 0 candidate depending on whether the K + π − or K − π + invariant mass is closest to the nominal K * 0 mass. The fraction of candidates assigned to the incorrect state is estimated from simulation to be 12-14%, depending on q 2 .
The global efficiency, , is the product of the acceptance and the combined trigger, reconstruction, and selection efficiencies, all of which are obtained from MC simulated event samples. The pp collisions are simulated using the PYTHIA [41] event generator, version 6.424, with particle decays described by the EVTGEN [42] generator, version 9.1, in which final-state radiation is generated using PHOTOS [43]. The default matrix element in PYTHIA is used to describe the events. The simulated particles are propagated through a detailed model of the detector based on GEANT4 [44]. The reconstruction and selection of the generated events proceed as for the data. Separate samples of events are generated for B 0 decays to K * 0 µ + µ − , J/ψ K * 0 , and ψ K * 0 , with K * 0 → K + π − and both J/ψ and ψ decaying to µ + µ − . The distribution of pp collision vertices in each sample is adjusted to match the observed distribution.
The acceptance is obtained from generator-level events, i.e., before the particle propagation with GEANT4, and is defined as the fraction of events with p T (B 0 ) > 8 GeV and |η(B 0 )| < 2.2 that satisfy the single-muon requirements p T (µ) > 3.3 GeV and |η(µ)| < 2.3. These criteria are less restrictive than the final selection criteria in order to account for finite detector resolution, since they are applied to generator-level quantities. Only events satisfying the acceptance criteria are processed through the GEANT4 simulation, the trigger simulation, and the reconstruction software.
The combined trigger, reconstruction, and selection efficiency is given by the ratio of the number of events that satisfy the trigger and selection requirements and have a reconstructed B 0 candidate compatible with a generated B 0 meson, relative to the number of events that satisfy the acceptance criteria. The generated and reconstructed B 0 are considered to be compatible if the reconstructed K + candidate appears within a distance ∆R of the generated K + meson, and analogously for the π − , µ + , and µ − , where ∆R = 0.3 for the hadrons and ∆R = 0.004 for the muons. Requiring all four particles in the B 0 decay to be matched results in an efficiency of 99.6% (0.4% of the events have a correctly reconstructed B 0 candidate that is not matched to a generated B 0 meson) and a purity of 99.5% (0.5% of the matched candidates do not correspond to a correctly reconstructed B 0 candidate). Efficiencies are determined for both correctly tagged (the K and π have the correct charge) and mistagged (the K and π charges are reversed) candidates.

Background studies
Using simulation, we search for possible backgrounds that might peak in the B 0 mass region. The event selection is applied to inclusive MC samples of B 0 , B s , B + , and Λ b decays to J/ψ X and ψ X, where X denotes all of the exclusive decay channels found in the PDG [40], and with the J/ψ and ψ decaying to µ + µ − . No evidence for a peaking structure near the B 0 mass is found. The distributions of the few events that satisfy the selection criteria are similar to the shape of the combinatorial background. As an additional check, we generate events with B s → K * 0 (K + π − )µ + µ − decays. Assuming that the ratio of branching fractions B(B s → J/ψK * 0 )/B(B 0 → J/ψK * 0 ) ≈ 10 −2 [40], less than one event passes our selection criteria.
Possible backgrounds from events with two hadrons misidentified as muons, in particular from the hadronic fully reconstructable B 0 → DX decays, are suppressed by the misidentification probability (10 −3 × 10 −3 ), and are thus considered negligible. Also, events from B 0 → J/ψK * 0 decays, where a muon and a hadron are swapped, are suppressed by the hadronto-muon misidentification probability (10 −3 ) and by the muon-to-hadron identification inefficiency (10 −1 ). In fact, given the amount of reconstructed B 0 → J/ψK * 0 events (165 k), we expect ≈16 events distributed in the two adjacent q 2 bins close to the J/ψ control region.
Backgrounds from semileptonic decays such as where D − decays to K * 0 µ − ν µ and K * 0 to K + π − , are also studied using simulation. We estimate in data less than one event for each of the three decays populating the low-mass sideband. All these potential sources of background are evaluated in the whole q 2 range, excluding the J/ψ and ψ control regions, and are found to be negligible.
The impact of other partially reconstructed multibody B decays that might affect the low-mass sideband is addressed in Section 5.
Backgrounds from events in which a B + → K + µ + µ − decay is combined with a random pion, and from events with a Λ b → Λ 0 µ + µ − decay, where Λ 0 decays to p K, in which the proton is assigned the pion mass, are found to be negligible. In fact, both processes are flavorchanging neutral-current decays, therefore they have a comparable branching fraction to our signal process. The former decay has a theoretical lower bound on the invariant mass that lies at ≈5.41 GeV. We search in data for an invariant mass peak around the B + world-average mass after computing the invariant mass for both K + µ + µ − and K − µ + µ − possibilities in events with 5.41 − σ m < m < m B 0 + 0.280 GeV, but no evidence of such a peak is found. For the latter decay we search in data for an invariant mass peak around the Λ b world-average mass after assigning the proton mass to the track previously identified as a pion. Also in this case, no evidence of a peak is found. Indeed, the simulation shows that less than one event is expected to pass our selection requirements.

Analysis method
This analysis measures the P 1 and P 5 values in B 0 → K * 0 µ + µ − decays as a function of q 2 . Figure 1 illustrates the angular variables needed to describe the decay: θ is the angle between the positive (negative) muon momentum and the direction opposite to the B 0 B 0 momentum in the dimuon rest frame, θ K is the angle between the kaon momentum and the direction opposite to the B 0 B 0 momentum in the K * 0 K * 0 rest frame, and ϕ is the angle between the plane containing the two muons and the plane containing the kaon and the pion in the B 0 rest frame. Although the K + π − invariant mass is required to be consistent with that of a K * 0 meson, there can be a contribution from spinless (S-wave) K + π − combinations [25,[45][46][47]. This is parametrized with three terms: F S , which is related to the S-wave fraction, and A S and A 5 S , which are the interference amplitudes between the S-and P-wave decays. Including these components, the angular distribution of B 0 → K * 0 µ + µ − decays can be written as [25]: where F L denotes the longitudinal polarization fraction of the K * 0 . This expression is an exact simplification of the full angular distribution, obtained by folding the ϕ and θ angles about zero and π/2, respectively. Specifically, if ϕ < 0, then ϕ → −ϕ, and the new ϕ domain is [0, π].
If θ > π/2, then θ → π − θ , and the new θ domain is [0, π/2]. We use this simplified version of the expression because of difficulties in the fit convergence with the full angular distribution due to the limited size of the data sample. This simplification exploits the odd symmetry of the angular variables with respect to ϕ = 0 and θ = π/2 in such a manner that the cancellation around these angular values is exact. This cancellation remains approximately valid even after accounting for the experimental acceptance because the efficiency is symmetric with respect to the folding angles.
For each q 2 bin, the observables of interest are extracted from an unbinned extended maximumlikelihood fit to four variables: the K + π − µ + µ − invariant mass m and the three angular variables θ , θ K , and ϕ. The unnormalized probability density function (pdf) in each q 2 bin has the following form: where the three terms on the righthand side correspond to correctly tagged signal events, mistagged signal events, and background events. The parameters Y C S and Y B are the yields of correctly tagged signal events and background events, respectively, and are determined in the fit. The parameter f M is the fraction of signal events that are mistagged and is determined from simulation. Its value ranges from 0.124 to 0.137 depending on the q 2 bin.
The signal mass probability functions S C (m) and S M (m) are each the sum of two Gaussian functions, with a common mean for all four Gaussian functions, and describe the mass distribution for correctly tagged and mistagged signal events, respectively. In the fit, the mean, the four Gaussian function's width parameters, and the two fractions specifying the relative contribution of the two Gaussian functions in S C (m) and S M (m) are determined from simulation. The function S a (θ K , θ , ϕ) describes the signal in the three-dimensional (3D) space of the angular variables and corresponds to Eq. (1). The combination B m (m) B θ K (θ K ) B θ (θ ) B ϕ (ϕ) is obtained from the B 0 sideband data in m and describes the background in the space of (m, θ K , θ , ϕ), where B m (m) is an exponential function, B θ K (θ K ) and B θ (θ ) are second-to fourth-order polynomials, depending on the q 2 bin, and B ϕ (ϕ) is a first-order polynomial. The factorization assumption of the background pdf in Eq. (2) is validated by dividing the range of an angular variable into two at its center point and comparing the distributions of events from the two halves in the other angular variables.
The functions C (θ K , θ , ϕ) and M (θ K , θ , ϕ) are the efficiencies in the 3D space of | cos θ K | ≤ 1, 0 ≤ cos θ ≤ 1, and 0 ≤ ϕ ≤ π for correctly tagged and mistagged signal events, respectively. The numerator and denominator of the efficiency are separately described with a nonparametric technique, which is implemented with a kernel density estimator [48,49]. The final efficiency distributions used in the fit are obtained from the ratio of 3D histograms derived from the sampling of the kernel density estimators. The histograms have 40 bins in each dimension. A consistency check of the procedure used to determine the efficiency is performed by dividing the simulated data sample into two independent subsets, and extracting the angular parameters from the first subset using the efficiency computed from the second subset. The efficiencies for both correctly tagged and mistagged events peak at cos θ ≈ 0, around which they are rather symmetric for q 2 < 10 GeV 2 , and are approximately flat in ϕ. The efficiency for correctly tagged events becomes relatively flat in cos θ for larger values of q 2 , while it has a monotonic decrease for increasing cos θ K values for q 2 < 14 GeV 2 . For larger values of q 2 a decrease in the efficiency is also seen near cos θ K = −1. The efficiency for mistagged events has a minimum at cos θ ≈ 0 for q 2 > 10 GeV 2 , while it is maximal near cos θ K = 0 for q 2 < 10 GeV 2 .
For large values of q 2 a mild maximum also appears near cos θ K = 1.
The fit is performed in two steps. The initial fit does not include a signal component and uses the sideband data in m to obtain the B m (m), B θ K (θ K ), B θ (θ ), and B ϕ (ϕ) distributions. The distributions obtained in this step are then fixed for the second step, which is a fit to the data over the full mass range. The fitted parameters in the second step are the angular parameters P 1 , P 5 , and A 5 S , and the yields Y C S and Y B . To avoid difficulties in the convergence of the fit related to the limited number of events, the angular parameters F L , F S , and A S are fixed to previous measurements [31].
The expression describing the angular distribution of B 0 → K * 0 µ + µ − decays, Eq. (1), and also its more general form in Ref. [25], can become negative for certain values of the angular parameters. In particular, the pdf in Eq. (2) is only guaranteed to be positive for a particular subset of the P 1 , P 5 , and A 5 S parameter space. The presence of such a boundary greatly complicates the numerical maximization process of the likelihood by MINUIT [50] and especially the error determination by MINOS [50], in particular near the boundary between physical and unphysical regions. Therefore, the second fit step is performed by discretizing the P 1 , P 5 two-dimensional space and by maximizing the likelihood as a function of the nuisance parameters Y C S , Y B , and A 5 S at fixed values of P 1 and P 5 . Finally, the distribution of the likelihood values is fit with a bivariate Gaussian distribution. The position of the maximum of this distribution inside the physical region provides the measurements of P 1 and P 5 .
The interference terms A S and A 5 S must vanish if either of the two interfering components vanish. These constraints are implemented by requiring |A S | < f is a ratio related to the S-and P-wave line shapes, calculated to be 0.89 near the K * 0 meson mass [25]. The constraint on A S is naturally satisfied since F S , F L , and A S are taken from previous measurements [31].
To ensure correct coverage for the uncertainties, the Feldman-Cousins method [51] is used with nuisance parameters. Two main sets of pseudo-experimental samples are generated. The first (second) set, used to compute the coverage for P 1 (P 5 ), is generated by assigning values to the other parameters as obtained by profiling the bivariate Gaussian distribution description of the likelihood determined from data at fixed P 1 (P 5 ) values. When fitting the pseudo-experimental samples, the same fit procedure as applied to the data is used.
The fit formalism and results are validated through fits to pseudo-experimental samples, MC simulation samples, and control channels. Additional details, including the size of the systematic uncertainties assigned on the basis of these fits, are described in Section 5.

Systematic uncertainties
The systematic uncertainty studies are described below and summarized in Table 1 in the same order.
The adequacy of the fit function and the procedure to determine the parameters of interest are validated in three ways. First, a large, statistically precise MC signal sample with approximately 400 times the number of events as the data is used to verify that the fitting procedure produces results consistent with the input values to the simulation. The difference between the input and output values in this check is assigned as a simulation mismodeling systematic uncertainty. It is also verified that fitting a sample with only either correctly tagged or mistagged events yields the correct results. Second, 200 subsamples are extracted randomly from the large MC signal sample and combined with background events obtained from the pdf in Eq. (2) to mimic independent data sets of similar size to the data. These are used to estimate a fit bias by comparing the average values of the results obtained by fitting the 200 samples to the results obtained using the full MC signal sample. Much of the observed bias is a consequence of the fitted parameters lying close to the boundaries of the physical region. Third, 200 pseudoexperiments, each with the same number of events as the data sample, are generated in each q 2 bin using the pdf in Eq. (2), with parameters obtained from the fit to the data. Fits to these 200 samples do not reveal any additional systematic uncertainty. Because the efficiency functions are estimated from a finite number of simulated events, there is a corresponding statistical uncertainty in the efficiency. Alternatives to the default efficiency function are obtained by generating 100 new distributions for the numerator and the denominator of the efficiency ratio based on the default kernel density estimators as pdfs, and rederiving new kernel density estimators for each trial. The effect of these different efficiency functions on the final result is used to estimate the systematic uncertainty.
The efficiency determination is checked by comparing efficiency-corrected results obtained from the control channels with the corresponding world-average values. The B 0 → J/ψK * 0 control sample contains 165 000 events, compared with 11 000 events for the B 0 → ψ K * 0 sample. Because of its greater statistical precision, we rely on the B 0 → J/ψK * 0 sample to perform the check of the efficiency determination for the angular variables. We do this by measuring the longitudinal polarization fraction F L in the B 0 → J/ψK * 0 decays. We find F L = 0.537 ± 0.002 (stat), compared with the world-average value 0.571 ± 0.007 (stat+syst) [40]. The difference of 0.034 is propagated to P 1 and P 5 by taking the root-mean-square (RMS) of the respective distributions resulting from refitting the data 200 times, varying F L within a Gaussian distribution with a standard deviation of 0.034. As a cross-check that the overall efficiency is not affected by a q 2 -dependent offset, we measure the ratio of branching fractions B(B 0 → ψ K * 0 )/B(B 0 → J/ψK * 0 ) = 0.480 ± 0.008 (stat) ± 0.055 (R µµ ψ ), by means of efficiency-corrected yields including both correctly and wrongly tagged events (the same central value is obtained also separately for the two subsets of events), where R µµ ψ refers to the ratio B(J/ψ → µ + µ − )/B(ψ → µ + µ − ) of branching fractions. This is compared to the world-average value 0.484 ± 0.018(stat) ± 0.011(syst) ± 0.012(R ee ψ ) [40], where R ee ψ refers to the corresponding ratio of branching fractions to e + e − . The two results are seen to agree within the uncertainties.
To evaluate the uncertainty in the mistag fraction f M , we allow this fraction to vary in a fit to the events in the B 0 → J/ψK * 0 control sample. We find f M = (14.5 ± 0.5)%, compared to the result from simulation (13.7 ± 0.1)%. The difference of 0.8 is propagated to P 1 and P 5 by determining the RMS of the respective distributions obtained from refitting the data 10 times, varying f M within a Gaussian distribution with a standard deviation of 0.8.
The systematic uncertainty associated with the functions used to model the angular distribution of the background is obtained from the statistical uncertainty in the background shape, as these shapes are fixed in the final fit. This uncertainty is determined by fitting the data 200 times, varying the background parameters within their Gaussian uncertainties, and taking the RMS of the angular parameter values as the systematic uncertainty. Moreover, for the q 2 bin reported in Fig. 2, upper two rows, which shows an excess around cos θ ≈ 0.7 that is also present in the sideband distribution (not shown in the figure), we refit the data using different descriptions of the background as a function of cos θ . The differences in the measurement of P 1 and P 5 are within the systematic uncertainty quoted for the background distribution.
The low-mass sideband might contain partially reconstructed multibody B 0 decays. We test this possibility by refitting the data with a restricted range for the low-mass sideband, i.e., starting from ≈5.1 instead of ≈5 GeV. No significant differences are seen in the measurement of P 1 and P 5 , and therefore no systematic uncertainty is assigned.
To evaluate the systematic uncertainty associated with the signal mass pdfs S C (m) and S M (m), we fit the B 0 → J/ψK * 0 and B 0 → ψ K * 0 control samples allowing two of the width values in the four Gaussian terms to vary at a time. The maximum change in P 1 and P 5 for either of the two control channels is taken as the systematic uncertainty for all q 2 bins.
The q 2 bin just below the J/ψ (ψ ) control region, and the q 2 bin just above, may be contaminated with B 0 → J/ψK * 0 (B 0 → ψ K * 0 ) "feed-through" events that are not removed by the selection procedure. A special fit in these two bins is performed, in which an additional background term is added to the pdf. This background distribution is obtained from simulated B 0 → J/ψK * 0 (B 0 → ψ K * 0 ) events, with the background yield as a fitted parameter. The resulting changes in P 1 and P 5 are used as estimates of the systematic uncertainty associated with this contribution.
To properly propagate the uncertainty associated with the values of F L , F S , and A S , taking into account possible correlations, 10 pseudo-experiments per q 2 bin are generated using the pdf parameters determined from the fit to data. The number of events in these pseudo-experiments is 100 times that of the data. The pseudo-experiments are then fit twice, once with the same procedure as for the data and once with P 1 , P 5 , A 5 S , F L , F S , and A S allowed to vary. The average ratio ρ of the statistical uncertainties in P 1 and P 5 from the first fit to that in the second fit is used to compute this systematic uncertainty, which is proportional to the confidence interval determined from the Feldman-Cousins method through the coefficient The stability of ρ as a function of the number of events of the pseudo-experiments is also verified. As crosschecks of our procedure concerning the fixed value of F L , we fit the two control regions either fixing F L or allowing it to vary, and find that the values of P 1 and P 5 are essentially unaffected, obtaining the same value of F L as in our previous study [31]. Moreover, we refit all the q 2 bins using only the P-wave contribution for the decay rate in Eq. (1) and leaving all three parameters, P 1 , P 5 , and F L , free to vary. The differences in the measured values of P 1 and P 5 are within the systematic uncertainty quoted for the F L , F S , and A S uncertainty propagation.
The effects of angular resolution on the reconstructed values of θ K and θ are estimated by performing two fits on the same set of simulated events. One fit uses the true values of the angular variables and the other fit their reconstructed values. The difference in the fitted parameters between the two fits is taken as an estimate of the systematic uncertainty.
The systematic uncertainties are determined for each q 2 bin, with the total systematic uncertainty obtained by adding the individual contributions in quadrature.
As a note for future possible global fits of our P 1 and P 5 data, the systematic uncertainties associated with the efficiency, Kπ mistagging, B 0 mass distribution, and angular resolution can be assumed to be fully correlated bin-by-bin, while the remaining uncertainties can be assumed to be uncorrelated.  Figure 2: Invariant mass and angular distributions of K + π − µ + µ − events for (upper two rows) 2 < q 2 < 4.3 GeV 2 and (lower two rows) 4.3 < q 2 < 6 GeV 2 . The projection of the results from the total fit, as well as for correctly tagged signal events, mistagged signal events, and background events, are also shown. The vertical bars indicate the statistical uncertainties.

Results
The events are fit in seven q 2 bins from 1 to 19 GeV 2 , yielding 1397 signal and 1794 background events in total. As an example, distributions for two of these bins, along with the fit projections, are shown in Fig. 2. The fitted values of the signal yields, P 1 , and P 5 are given in Table 2 for the seven q 2 bins. The results for P 1 and P 5 are shown in Fig. 3, along with those from the LHCb [33] and Belle [34] experiments. The fitted values of A 5 S vary from −0.052 to +0.057. The measured signal yields, which include both correctly tagged and mistagged events, the P 1 and P 5 values, and the correlation coefficients, in bins of q 2 , for B 0 → K * 0 µ + µ − decays. The first uncertainty is statistical and the second is systematic. The bin ranges are selected to allow comparison with previous measurements.  A SM prediction, denoted SM-DHMV, is available for comparison with the measured angular parameters. The SM-DHMV result, derived from Refs. [18,25], updates the calculations from Ref. [52] to account for the known correlation between the different form factors [53]. It also combines predictions from light-cone sum rules, which are valid in the low-q 2 region, with lattice predictions at high q 2 [54] to obtain more precise determinations of the form factors over the full q 2 range. The hadronic charm-quark loop contribution is obtained from Ref. [55]. A reliable theoretical prediction is not available near the J/ψ and ψ resonances. The SM prediction is shown in comparison to the data in Fig. 3 and it is seen to be in agreement with the CMS results. Thus, we do not obtain evidence for physics beyond the SM. Qualitatively, the CMS measure-ments are compatible with the LHCb results. The Belle measurements lie systematically above both the CMS and LHCb results and the SM prediction.

Summary
Using proton-proton collision data recorded at √ s = 8 TeV with the CMS detector at the LHC, corresponding to an integrated luminosity of 20.5 fb −1 , an angular analysis has been performed for the decay B 0 → K * 0 µ + µ − . The signal sample consists of 1397 selected events. For each of seven bins between 1 to 19 GeV 2 in the dimuon invariant mass squared q 2 , unbinned maximumlikelihood fits are performed on the distributions of the K + π − µ + µ − invariant mass and three angular variables to obtain values of the P 1 and P 5 parameters. The results are among the most precise to date for these parameters and are consistent with predictions based on the standard model.