Measurement of spin correlations in tt production using the matrix element method in the muon+jets final state in pp collisions at √s = 8 TeV

The consistency of the spin correlation strength in top quark pair production with the standard model (SM) prediction is tested in the muon+jets final state. The events are selected from pp collisions, collected by the CMS detector, at a centre-of-mass energy of 8 TeV, corresponding to an integrated luminosity of 19.7 fb⁻¹. The data are compared with the expectation for the spin correlation predicted by the SM and with the expectation of no correlation. Using a template fit method, the fraction of events that show SM spin correlations is measured to be 0.72 ± 0.08 (stat) $^{+0.15}_{-0.13}$ (syst), representing the most precise measurement of this quantity in the muon+jets final state to date. © 2016 The Author. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Funded by SCOAP³.


Introduction
At the CERN LHC top quarks are predominantly produced in pairs (tt), mainly via gluon fusion, with each top quark decaying almost 100% of the time into a W boson and a b quark. The final states can be categorised as dilepton, where both W's decay into a lepton and a neutrino, hadronic, where both W's decay into quarks, and lepton+jets otherwise. The W decay into a tau lepton and neutrino is only considered leptonic if the τ decays include a muon or electron.
In quantum chromodynamics (QCD), the quark spins in heavy quark production are correlated. Since the lifetime of top quarks is smaller than the hadronisation timescale (1/Λ_QCD), which in turn is smaller than the spin decorrelation timescale m_t/Λ_QCD² ∼ 3 × 10⁻²¹ s, the top quarks decay before their spins decorrelate.
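The ordering of the three timescales can be checked with a short calculation in natural units, where a time is ħ divided by an energy. The sketch below is illustrative only: the value of Λ_QCD and the top quark decay width are assumed round numbers that are not quoted in this Letter, and the function name is hypothetical.

```python
# Natural-unit conversion: time [s] = hbar [GeV s] / energy [GeV].
HBAR_GEV_S = 6.582e-25   # reduced Planck constant in GeV*s

# Assumed illustrative inputs (not taken from the text).
LAMBDA_QCD_GEV = 0.2     # assumed QCD scale
TOP_WIDTH_GEV = 1.4      # assumed approximate top quark decay width
TOP_MASS_GEV = 172.5     # top quark mass, in GeV


def timescales():
    """Return (top lifetime, hadronisation time, spin decorrelation time) in seconds."""
    lifetime = HBAR_GEV_S / TOP_WIDTH_GEV
    hadronisation = HBAR_GEV_S / LAMBDA_QCD_GEV
    decorrelation = HBAR_GEV_S * TOP_MASS_GEV / LAMBDA_QCD_GEV ** 2
    return lifetime, hadronisation, decorrelation


if __name__ == "__main__":
    names = ("top lifetime", "hadronisation", "spin decorrelation")
    for name, value in zip(names, timescales()):
        print(f"{name}: {value:.1e} s")
```

With these assumed inputs the decorrelation time comes out at the 10⁻²¹ s level, about three orders of magnitude longer than the hadronisation time.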
This spin correlation is therefore propagated to the top quark decay products and one can infer the tt spin correlation strength A by studying the angular correlations between the decay products, where

$$A = \frac{(N_{\uparrow\uparrow} + N_{\downarrow\downarrow}) - (N_{\uparrow\downarrow} + N_{\downarrow\uparrow})}{(N_{\uparrow\uparrow} + N_{\downarrow\downarrow}) + (N_{\uparrow\downarrow} + N_{\downarrow\uparrow})} \qquad (1)$$

is the asymmetry between the number of tt pairs with aligned and antialigned spins. The value of A depends on the spin quantization axis chosen and on the production modes. Given the high centre-of-mass energy at the LHC, the helicity basis is used, in which the spin quantization axis is defined as the top quark or antiquark direction in the tt rest frame. The corresponding value of the spin correlation strength in the helicity basis is referred to as A_hel. Since the spin correlation strength is precisely, but non-trivially, predicted by the standard model (SM), an accurate measurement of this variable tests various aspects of the SM, including the strength of the QCD coupling and the relative contribution of tt production modes, although new physics can influence the spin correlation strength [1,2].
Tevatron experiments made measurements of the tt spin correlation strength using template fits to the angular distributions of the top quark decay products and extracting the fraction f of tt events with the SM prediction of spin correlation, defined as

$$f = \frac{N^{t\bar{t}}_{\mathrm{SM}}}{N^{t\bar{t}}_{\mathrm{SM}} + N^{t\bar{t}}_{\mathrm{uncor}}} \qquad (2)$$

where $N^{t\bar{t}}_{\mathrm{SM}}$ is the number of SM tt events, whereas $N^{t\bar{t}}_{\mathrm{uncor}}$ represents the number of events with uncorrelated tt. The top quark and antiquark in the uncorrelated tt events decay spherically. The assumption is that there are only SM and uncorrelated tt events, with a fraction (1 − f) of uncorrelated tt events. The physical range of the parameter f is restricted to [0, 1], with f = 1 for a sample of tt events produced by the SM. However, quite often an unconstrained template fit is performed, allowing for non-physical values of this parameter. The CDF Collaboration extracted the fraction f of events with the SM prediction of spin correlation using the lepton+jets final state [3] and the D0 Collaboration extracted this fraction using the dilepton final states [4,5]. The D0 Collaboration also made a spin correlation measurement using the matrix element method (MEM) [6] in the dilepton final state and found direct evidence of tt spin correlation by combining the measurements using MEM in the dilepton and lepton+jets final states [7].
The combined measurement yielded f = 0.85 ± 0.29 (stat + syst) using a data sample of proton-antiproton collisions at √s = 1.96 TeV, corresponding to an integrated luminosity of 5.3 fb⁻¹. At the LHC, the ATLAS Collaboration has reported observation of spin correlations in top quark pair production [8]. In the most recent measurement by the ATLAS Collaboration, the spin correlation measurement was performed using template fits to the distribution of the difference in azimuthal angle between the two oppositely charged leptons in the dilepton final state. This measurement at √s = 8 TeV, using 20.3 fb⁻¹ of integrated luminosity, resulted in f = 1.20 ± 0.05 (stat) ± 0.13 (syst) [9]. Another result by ATLAS in the dilepton channel has been reported in [10]. The only measurement in the lepton+jets final state at the LHC so far was made by the ATLAS Collaboration using the opening angle distributions between the decay products of the top quark and antiquark [11], giving f = 1.12 ± 0.11 (stat) ± 0.22 (syst) at √s = 7 TeV, using 4.6 fb⁻¹ of integrated luminosity.
Here, a measurement of the top quark spin correlations in events characterised by the presence of a muon and jets (μ+jets) is described using a MEM at √s = 8 TeV with 19.7 fb⁻¹ of integrated luminosity. Events with a muon coming from a τ decay are not considered as part of the signal. In this analysis, the traditional discrete hypotheses are investigated: SM and uncorrelated tt production and decay. In the MEM, the likelihood of an observed event to be produced by a given theoretical model is calculated. The likelihood ratio of the sample makes it possible to distinguish between the two hypotheses. In addition, the distribution of event likelihood ratios is used in a template fit to extract the fraction f of events with the SM prediction of spin correlation.

The rest of this Letter is organised as follows. Section 2 describes the apparatus used in this measurement, the CMS detector. Section 3 describes the simulation samples used in this analysis. The event selection and the reconstruction procedure of the physics objects in an event are given in Section 4. In Section 5, the MEM is briefly explained. Section 6 describes the first part of this analysis, the hypothesis-testing procedure, followed by the extraction of the variable f with a template fit in Section 7. The sources of systematic uncertainties are discussed in Section 8. A description of the treatment of these uncertainties in both parts of the analysis and the results are given in Section 9. Finally, a summary of the analysis is presented in Section 10.

The CMS detector
The central feature of the CMS apparatus [12] is a 3.8 T superconducting solenoid of 6 m internal diameter. The silicon pixel and strip tracker used for measuring charged-particle trajectories, a lead tungstate crystal electromagnetic calorimeter (ECAL), and the brass and scintillator hadron calorimeter (HCAL) are located within the superconducting solenoid volume. The calorimeters, ECAL and HCAL, both of which consist of a barrel and two endcap sections, surround the silicon tracking volume. Forward calorimetry extends the coverage provided by the barrel and endcap detectors to a pseudorapidity of |η| = 5.
Muons are measured using the tracker and the muon system that consists of gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. Muons are measured in the range |η| < 2.4, using three detector technologies: drift tubes, cathode strip chambers, and resistive-plate chambers. Matching muons to tracks measured in the silicon tracker results in a relative transverse momentum (p_T) resolution of 1.3-2.0% in the barrel and better than 6% in the endcaps for muons with 20 < p_T < 100 GeV. The p_T resolution in the barrel is better than 10% for muons with p_T up to 1 TeV [13].
The first level of the CMS trigger system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events. The high-level trigger processor farm further decreases the event rate from around 100 kHz to around 400 Hz, before data storage.
A more detailed description of the CMS detector, together with a definition of the coordinate system used, can be found in Ref. [12].

Signal and background modeling
The signal processes (tt events in the μ+jets final state, SM and uncorrelated) as well as other tt decay channels (SM and uncorrelated) are simulated on the basis of a next-to-leading-order (NLO) calculation using the generator MC@NLO v3.41 [14] with a top quark mass of 172.5 GeV. Parton showering is simulated using HERWIG 6.520 [15] with the default HERWIG 6 underlying event tune. The NLO parton distribution function (PDF) set used is CTEQ6M [16]. The background samples of W+jets and Z/γ*+jets processes are generated using MadGraph 5.1.3.30 [17], PYTHIA 6.426, and TAUOLA v27.121.5 [18]. The backgrounds from single top quark processes are generated using POWHEG v1 [19][20][21] and TAUOLA [22]. The Z2* underlying event tune is used. The most recent PYTHIA Z2* tune is derived from the Z1 tune [23], which uses the CTEQ5L parton distribution set, whereas Z2* adopts CTEQ6L [24]. The generated events are processed through the CMS detector simulation based on Geant4 [25] and event reconstruction. To estimate the size of the effect of the top quark mass and of the factorisation and renormalisation scale uncertainties, MC@NLO samples with varied top quark mass and scales are used.

The signal event yields are scaled to match the predicted top quark pair production cross section in proton-proton collisions at pb for a top quark mass equal to the world average of 173.3 GeV [26], computed with next-to-next-to-leading-order (NNLO) QCD corrections and next-to-next-to-leading-logarithmic (NNLL) resummation accuracy [27]. The simulated samples for the background processes are normalised using cross section calculations, generally at NLO accuracy [27]. Where necessary, systematically varied cross sections have been used for the normalisation. The simulation is corrected to the pileup conditions seen in the data. Pileup refers to the additional proton-proton interactions recorded simultaneously from the same bunch crossing. During 2012 data taking, there were on average 20 interactions per bunch crossing.

Event reconstruction and selection
The event selection has been optimised to identify tt events in the μ+jets final state. A single-muon trigger with a muon p_T threshold of 24 GeV and a restriction on the pseudorapidity of |η| < 2.1 is used to collect the data samples. Isolation and identification criteria are applied at the trigger level to achieve manageable rates with minimal loss of efficiency.
The physics objects used in this analysis are reconstructed with the CMS particle-flow (PF) algorithm [28,29]. The PF algorithm reconstructs and identifies each individual particle in an event using combined information from all CMS subdetectors. The energy of photons is directly obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum measured at the primary interaction vertex by the tracker, the energy of the matched ECAL cluster, and the total energy of the associated bremsstrahlung photons. The momentum of muons is obtained from the curvature of the track associated to the muon. The energy of charged hadrons is determined from a combination of their momenta measured in the tracker and the matching energy deposits in the ECAL and HCAL, corrected for the calorimeter response to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy.
The reconstructed muon candidates are required to have p_T > 26 GeV and |η| < 2.1, so as to be in a region where the trigger is fully efficient. The track associated to the muon candidate is required to have a minimum number of hits in the silicon tracker, to be consistent with the primary vertex, and to have a high-quality fit which combines a track in the tracker and a minimum number of hits in the muon detectors into one track. For each muon candidate, a PF-based relative isolation is calculated, corrected for pileup effects on an event-by-event basis. The transverse momenta of all reconstructed particle candidates (excluding the muon itself) are summed in a cone of size ΔR < 0.4 around the muon direction, with $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$, where φ is the azimuthal angle expressed in radians. The pileup contribution in this scalar sum is corrected for by summing only over the charged particles associated to the event vertex in the charged particle contribution, and subtracting the average energy due to pileup in the neutral particle contribution. After subtraction of the pileup contribution, the scalar sum is required to be smaller than 12% of the muon p_T. It is required that exactly one of these well-identified muon candidates is present in the event. In addition, a looser selection on muons is applied, which requires a relative isolation of less than 20% of the muon p_T, p_T > 10 GeV, and |η| < 2.5. Events with additional muons passing the looser identification criteria, as well as events with an electron, are discarded. Events selected from other tt final states are denoted as "tt other" and consist of roughly 70% tt events in the dilepton final state and 30% events in the τ+jets final state.
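The isolation requirement described above can be sketched in a few lines of Python. This is a minimal illustration rather than the CMS implementation: the particle representation (dictionaries with `pt`, `eta`, `phi`), the function names, and the clamping of the pileup-subtracted neutral sum at zero are assumptions made for the example.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the azimuthal difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def relative_isolation(muon, charged_from_vertex, neutrals, neutral_pileup_pt, cone=0.4):
    """Pileup-corrected scalar pT sum in a cone around the muon, divided by the muon pT.

    charged_from_vertex: charged candidates already associated to the event vertex.
    neutral_pileup_pt:   average neutral pT expected from pileup inside the cone.
    """
    def cone_sum(particles):
        return sum(p["pt"] for p in particles
                   if delta_r(muon["eta"], muon["phi"], p["eta"], p["phi"]) < cone)

    charged = cone_sum(charged_from_vertex)
    # Assumption: a negative neutral sum after subtraction is set to zero.
    neutral = max(0.0, cone_sum(neutrals) - neutral_pileup_pt)
    return (charged + neutral) / muon["pt"]


def passes_tight_isolation(rel_iso):
    """Signal-muon requirement quoted in the text: less than 12% of the muon pT."""
    return rel_iso < 0.12
```

Wrapping Δφ matters for candidates on either side of φ = ±π, which would otherwise appear to be far apart.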
For each event, hadronic jets are clustered from the reconstructed particle-flow particles with the anti-k_T algorithm [30,31], with a distance parameter of 0.5. The jet momentum is determined as the vector sum of all particle momenta in the jet, which has been determined from simulation to be within 5% to 10% of the true momentum over the whole p_T spectrum and detector acceptance. Contributions from pileup are taken into account by an offset correction to the jet energies. Jet energy scale (JES) corrections up to particle level are derived from simulation, and are confirmed with in-situ measurements of the energy balance in dijet and photon+jet events. The jet energy resolution (JER) in simulation is corrected to match the resolution observed in data. Additional selection criteria are applied to each event to remove spurious jet-like features originating from isolated uncharacteristic noise patterns in certain HCAL regions [32] and in the silicon avalanche photodiodes used in the ECAL barrel detector. The first three jets leading in p_T are required to have a p_T of at least 30 GeV, the fourth leading jet of at least 25 GeV, and the remaining jets of at least 20 GeV. At least two selected jets should be identified as coming from the decay of B-hadrons, based on the combined secondary vertex (CSV) algorithm with medium working point (CSVM) [33]. The CSV algorithm makes use of secondary vertices, when available, combined with track-based lifetime information. The missing transverse momentum vector is defined as the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all reconstructed particles in an event. Its magnitude is referred to as E_T^miss. To reduce the effect of final-state radiation (FSR), while not statistically limiting the analysis, we restrict the data set to events with four or five selected jets.
To ensure that the selected jets in the event describe the tt kinematic quantities, we reject events if they have additional forward jets in the region 2.4 < |η| < 4.7 with p_T > 50 GeV.
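The jet requirements above can be summarised as a short selection function. The sketch below is a hypothetical illustration: the jet representation, the function names, and the central acceptance of |η| < 2.4 for selected jets (taken here as the complement of the forward region) are assumptions, not details given in the text.

```python
LEADING_PT_THRESHOLDS = (30.0, 30.0, 30.0, 25.0)  # GeV, for the four leading jets
SOFT_PT_THRESHOLD = 20.0                          # GeV, for any further jet
CENTRAL_ETA_MAX = 2.4                             # assumed acceptance of selected jets


def jet(pt, eta, btag=False):
    """Hypothetical minimal jet record."""
    return {"pt": pt, "eta": eta, "btag": btag}


def select_jets(jets):
    """Return the central jets passing the rank-dependent pT thresholds."""
    central = sorted((j for j in jets if abs(j["eta"]) < CENTRAL_ETA_MAX),
                     key=lambda j: j["pt"], reverse=True)
    selected = []
    for rank, candidate in enumerate(central):
        if rank < len(LEADING_PT_THRESHOLDS):
            threshold = LEADING_PT_THRESHOLDS[rank]
        else:
            threshold = SOFT_PT_THRESHOLD
        if candidate["pt"] < threshold:
            break  # jets are pT-ordered, so the selected set must be contiguous
        selected.append(candidate)
    return selected


def has_hard_forward_jet(jets):
    """Forward-jet veto: any jet with 2.4 < |eta| < 4.7 and pT > 50 GeV."""
    return any(2.4 < abs(j["eta"]) < 4.7 and j["pt"] > 50.0 for j in jets)


def passes_jet_selection(jets):
    """Four or five selected jets, at least two b-tagged, no hard forward jet."""
    selected = select_jets(jets)
    n_btag = sum(1 for j in selected if j["btag"])
    return 4 <= len(selected) <= 5 and n_btag >= 2 and not has_hard_forward_jet(jets)
```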
To further increase the quality of the event selection and reduce the background contribution, we use a kinematic fitter, HitFit [34], designed to reconstruct the kinematic quantities of the tt system in the lepton+jets final state. The kinematic quantities observed in the event are varied within the detector resolution to satisfy some predefined constraints, i.e. the reconstructed hadronically decaying W boson mass is required to be consistent with 80.4 GeV and the reconstructed top quark and antiquark masses are required to be equal. The HitFit algorithm tries every jet-quark permutation and the solution with the highest goodness-of-fit (or equivalently, the lowest χ²/ndof, with ndof being the number of degrees of freedom) is chosen as the best estimate of the correct jet-quark permutation. We do not rely on HitFit to estimate the jet-quark permutation correctly; however, HitFit is used to decide which four jets in the event to use in the reconstruction of the tt final state in five-jet events. It is required that two of the jets selected by HitFit are identified as originating from B-hadrons. The selection of the jets in the event could be done with simpler methods, e.g. selecting the highest-p_T jets, but HitFit offers the possibility to apply additional quality criteria. In order to reduce the background fraction and the fraction of mismodeled events, we only select events with a HitFit χ²/ndof < 5 or, equivalently, with a fit probability larger than 0.08. The value of the χ²/ndof selection is chosen to maximise the separation power defined by Eq. (7) in Section 6. Mismodeled events can be due to the inclusion of radiated jets in the tt reconstruction or to events with poorly reconstructed jet quantities. The χ² probability distribution is shown for data and simulation in Fig. 1.

The event yield after the full event selection is displayed in Table 1. The contributions are estimated from simulation and normalised to the observed luminosity using theoretical cross sections. The selection efficiencies for the SM and uncorrelated signal samples are very similar, so that the event selection does not bias the data towards one hypothesis. The background contribution due to multijet processes has been estimated from simulation and is found to be negligible.

Fig. 1.
HitFit χ² probability distribution in data and simulation, including the statistical uncertainties. The relative contributions in simulation are calculated using the theoretical cross sections with the total yield normalised to data. For the analysis, we only consider events with a probability larger than 0.08, as indicated by the arrow.

Matrix element method
The matrix element method [35][36][37][38] is a technique that directly relates theory with experimental events. The compatibility of the data recorded with the leading-order (LO) matrix element (ME) of a certain process is evaluated. The probability that an event is produced by this process is calculated using the full kinematic information in the event.
The probability P(x_i|H) to observe an event i with kinematic properties x_i for a certain hypothesis H is given by

$$P(x_i|H) = \frac{1}{\sigma_{\mathrm{obs}}(H)} \int f_{\mathrm{PDF}}(q_1)\, f_{\mathrm{PDF}}(q_2)\, \mathrm{d}q_1\, \mathrm{d}q_2\, \frac{(2\pi)^4\, |M(y, H)|^2}{q_1 q_2 s}\, W(x_i, y)\, \mathrm{d}\Phi_6 \qquad (3)$$

The given probability is equivalent to an event likelihood. In this equation, q_1 and q_2 represent the parton energy fractions in the collision, f_PDF(q_1) and f_PDF(q_2) are the PDFs, s is the centre-of-mass energy squared of the colliding protons, and dΦ_6 represents the phase space volume element. The transfer function, W(x, y), relates observed kinematic quantities x with parton-level quantities y. For every y, the transfer function is normalised to unity by integrating over all possible values of x. The LO ME is represented by M(y, H), where H denotes the hypothesis used. The tt spin correlation strength is not a parameter of the SM Lagrangian, therefore H is not a continuous parameter. The MEs M(y, H_cor) and M(y, H_uncor) both describe tt production and subsequent decay in the μ+jets channel and are valid for both on- and off-shell top quarks. In this analysis, the hypotheses are either the SM (H_cor), giving rise to a finite value of the spin correlation strength A (as discussed in Section 1), or the spin-uncorrelated hypothesis with A = 0 (H_uncor). Finally, σ_obs(H) represents the observed tt cross section of the hypothesis, which ensures that the probability is normalised. The quantity σ_obs(H) consists of the product of the production cross section σ, which is identical for our considered hypotheses, and the overall selection efficiency ε(H). The selection efficiencies for events from both hypotheses are very similar, with ε(SM) = 0.0448 ± 0.0001 (stat) for the SM tt signal hypothesis and ε(uncor) = 0.0458 ± 0.0001 (stat) for the uncorrelated signal hypothesis, which causes acceptance effects to nearly cancel in the likelihood ratio. The likelihood calculation is performed using MadWeight [39], in the MadGraph5 framework [17].
Since, in our convention, the likelihood for a single event is represented by P(x_i|H), the likelihood of a sample with n events is

$$L(x_1, \ldots, x_n|H) = \prod_{i=1}^{n} P(x_i|H) \qquad (4)$$

The transfer function of a given interacting particle depends on the specifics of the detector. In this analysis, the transfer function is used to correct the jet kinematic quantities. The reconstructed jet energy information, corrected for JES and JER, is mapped onto parton-level quantities by integrating over the parton energy within the transfer function resolution during the likelihood calculation. All other kinematic quantities (such as angular information or lepton quantities) are unmodified by the transfer function, as these are measured with sufficient accuracy with the CMS detector to describe a final state that does not include a dilepton resonance. The description of these variables with a Dirac delta function speeds up the integration. The E_T^miss is also described with a Dirac delta function and is only used to correct the kinematic quantities of the event for the transverse Lorentz boost. The event transfer function is the product of the object transfer functions, assuming no correlation between the reconstructed objects.

The jet energy transfer function is determined from tt simulation to which the JES and JER corrections have been applied. For each jet in the simulation that is unambiguously matched to a parton with ΔR(jet, parton) < 0.3, E_jet and E_parton are compared (separately for jets matched to b and light-flavour partons). The E_jet distribution is fitted with a Gaussian function, where the Gaussian mean and width depend on E_parton and are given by μ(E_jet) = m_0(η_parton) + m_1(η_parton) E_parton and σ(E_jet) = σ_0(η_parton) + σ_1(η_parton) E_parton + σ_2(η_parton) √E_parton, respectively. The fit of the E_jet distribution is converted to a single Gaussian transfer function, which is a function of the variable ΔE = E_parton − E_jet and whose parameters are a function of E_parton.
The transfer function, which is determined in the full kinematic phase space, is given by

$$W(E_{\mathrm{parton}}, E_{\mathrm{jet}}) = \frac{1}{\sqrt{2\pi}\,\sigma(E_{\mathrm{jet}})} \exp\left[-\frac{\left(E_{\mathrm{jet}} - \mu(E_{\mathrm{jet}})\right)^2}{2\,\sigma^2(E_{\mathrm{jet}})}\right] \qquad (5)$$

with E_jet = E_parton − ΔE, where the parameters are determined independently for b jets and light-flavour jets, in three slices of |η_parton| given by 0 < |η_parton| < 0.87, 0.87 < |η_parton| < 1.48, and 1.48 < |η_parton| < 2.5. In Fig. 2, the distribution of ΔE = E_parton − E_jet from simulation is shown for all values of E_parton and |η_parton|. This is compared to the ΔE distribution obtained by folding the E_parton spectrum of matched partons with the transfer function. The reasonably good agreement of the resolution and the tails of the two distributions shows that the determined transfer functions are adequate.
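A single-Gaussian jet transfer function of this kind can be sketched as follows. The parameter values are hypothetical placeholders for one |η_parton| slice and one jet flavour, and the width is taken here as σ_0 + σ_1 E_parton + σ_2 √E_parton, which is an assumption of this example. The numerical integral checks the normalisation over the observed jet energy for a fixed parton energy.

```python
import math


def transfer_function(e_jet, e_parton, m0, m1, s0, s1, s2):
    """Gaussian density in E_jet with parton-energy-dependent mean and width."""
    mean = m0 + m1 * e_parton
    width = s0 + s1 * e_parton + s2 * math.sqrt(e_parton)  # assumed width model
    z = (e_jet - mean) / width
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * width)


def normalisation(e_parton, m0, m1, s0, s1, s2, steps=20000):
    """Trapezoidal integral of the transfer function over E_jet >= 0."""
    mean = m0 + m1 * e_parton
    width = s0 + s1 * e_parton + s2 * math.sqrt(e_parton)
    upper = mean + 12.0 * width
    h = upper / steps
    total = 0.5 * (transfer_function(0.0, e_parton, m0, m1, s0, s1, s2)
                   + transfer_function(upper, e_parton, m0, m1, s0, s1, s2))
    for i in range(1, steps):
        total += transfer_function(i * h, e_parton, m0, m1, s0, s1, s2)
    return total * h


# Hypothetical parameters, chosen only to give a plausible jet energy resolution.
PARAMS = {"m0": 2.0, "m1": 0.95, "s0": 3.0, "s1": 0.05, "s2": 0.8}

if __name__ == "__main__":
    print(normalisation(100.0, **PARAMS))
```

For a 100 GeV parton these placeholder values give a mean of 97 GeV and a width of 16 GeV, so the part of the Gaussian lost below E_jet = 0 is negligible.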
The disadvantage of using a LO ME is that there is no explicit treatment of final-state radiation in the MEs. As a result, the ME does not always cover the full event information, leading to a slightly reduced discrimination between the two hypotheses. In addition, background events evaluated under a tt hypothesis will more closely resemble the uncorrelated hypothesis, as there is no correlation between the decay products. In the template fit part of this analysis, the small bias due to this effect is corrected for with a calibration curve (described in Section 7), whereas in the hypothesis testing the background contribution is fixed to the predictions from simulation, so no bias is present. MadWeight [39], the tool used to perform the MEM likelihood calculations, can partially correct for the initial-state radiation (ISR) effect by evaluating the LO ME at an overall partonic p_T of the tt system equal to the reconstructed p_T of the system, thus properly treating five-jet events where one jet is due to ISR. Due to final-state radiation (FSR), the matching with the LO ME, which requires four jets as input, becomes more difficult and more sensitive to systematic uncertainties related to variations of the jet energy scale or of the renormalisation and factorisation scales. The tt system is reconstructed using the four jets selected by HitFit, the lepton, and the p_T^miss. The p_T^miss quantity is assigned to the undetected neutrino from the tt muon+jets final state. In the MadWeight likelihood calculations, every jet-quark permutation compatible with the b tagging information is taken into account.

Hypothesis testing
The compatibility of the data with the SM hypothesis and the fully uncorrelated hypothesis is tested. The likelihood for each event is calculated under these two hypotheses, as described in Section 5. According to the Neyman-Pearson lemma, the test statistic with maximum separation power for a sample coming from either of two simple hypotheses is the likelihood ratio. This analysis uses as the discriminating variable the event likelihood ratio λ_event, formed from P(H_cor) and P(H_uncor), where P(H_cor) is the likelihood for the event under the SM hypothesis and P(H_uncor) is the likelihood under the uncorrelated hypothesis.
Following the prescription proposed by Cousins et al. [40], we use −2 ln λ_event as test statistic, a quantity hereafter referred to as the event likelihood ratio. The distributions of −2 ln λ_event are shown in Fig. 3 for the SM tt sample (Fig. 3, top) and the uncorrelated tt sample (Fig. 3, bottom). The plots show a shape comparison between data and simulation. The differences between the SM and uncorrelated distributions are statistically significant.
The separation power of Eq. (7) is computed from the distributions of the sample likelihood ratio under the two hypotheses, with μ_{1,2} being the means of the distributions and α_{1,2} their widths [40]. The separation power is a measure of the discrimination obtainable, for the size of the data set, between the two hypotheses, expressed in standard deviations (σ). Fig. 4 shows that a separation power of 8.8σ can be obtained with the MEM when only statistical effects are taken into account. The distributions will be modified by the inclusion of the systematic uncertainties described in Section 8. The range of the −2 ln λ_event distribution is chosen to maximise the separation power, while the binning is chosen finely enough to preserve the available separation power. In addition, the event selection (in particular the selection on the HitFit χ²/ndof) has been optimised to maximise the separation power.
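To make the idea concrete, the sketch below builds sample-level test statistics from pseudo-experiments and quantifies how well two hypotheses separate. It assumes a common convention in which the separation is the distance between the two means divided by the quadrature sum of the two widths; this form, the function names, and the toy numbers are assumptions of the example and are not taken from the analysis.

```python
import math
import random
import statistics


def sample_statistic(event_values, n_events, rng):
    """Sum of per-event -2 ln(lambda) values for one pseudo-experiment."""
    return sum(rng.choices(event_values, k=n_events))


def pseudo_experiments(event_values, n_events, n_toys, rng):
    """Distribution of the sample-level statistic under one hypothesis."""
    return [sample_statistic(event_values, n_events, rng) for _ in range(n_toys)]


def separation_power(values_h1, values_h2):
    """|mu1 - mu2| / sqrt(w1^2 + w2^2), with w the sample standard deviation (assumed form)."""
    mu1, mu2 = statistics.fmean(values_h1), statistics.fmean(values_h2)
    w1, w2 = statistics.stdev(values_h1), statistics.stdev(values_h2)
    return abs(mu1 - mu2) / math.sqrt(w1 * w1 + w2 * w2)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy per-event values for two hypotheses with slightly shifted means.
    events_h1 = [rng.gauss(0.30, 0.5) for _ in range(5000)]
    events_h2 = [rng.gauss(0.32, 0.5) for _ in range(5000)]
    toys_h1 = pseudo_experiments(events_h1, n_events=2000, n_toys=300, rng=rng)
    toys_h2 = pseudo_experiments(events_h2, n_events=2000, n_toys=300, rng=rng)
    print(f"separation: {separation_power(toys_h1, toys_h2):.2f} sigma")
```

The toy illustrates why the separation grows with the data set size: a small per-event shift accumulates in the sum faster than the width of the summed distribution.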

Extraction of fraction of events with SM spin correlation
We extract the fraction f of tt signal events with the SM spin correlation by performing a template fit to the −2 ln λ_event distribution. The fit model M(f_obs, β_obs) is given by

$$M(f_{\mathrm{obs}}, \beta_{\mathrm{obs}}) = (1 - \beta_{\mathrm{obs}})\left[f_{\mathrm{obs}}\, T_{\mathrm{cor}} + (1 - f_{\mathrm{obs}})\, T_{\mathrm{uncor}}\right] + \beta_{\mathrm{obs}}\, T_{\mathrm{bkg}}$$

where f_obs is the fraction of events with the SM spin correlation, and β_obs is the fraction of background in the data. The tt signal SM template, the tt signal uncorrelated template, and the background template are denoted by T_cor, T_uncor, and T_bkg, respectively. The background template contains the averaged contribution of the tt other background with SM spin correlation and the tt other background with no spin correlation, as these contributions are the same within statistical uncertainties. Systematic uncertainties are not included in the fit model. The parameter estimation is done using a binned maximum likelihood fit in RooFit [41], using Minuit [42]. The total normalisation is fixed to the observed data yield, but the relative background contribution and the fraction f_obs are allowed to vary unconstrained in the fit. The binning and range of the template distributions are fixed to those used in the hypothesis testing, where they have been chosen to optimise the separation power between the two hypotheses.
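The structure of such a three-template fit can be illustrated with a small standalone sketch. It is not the RooFit/Minuit fit used in the analysis: a plain grid scan stands in for the minimiser, the model is written as (1 − β)[f T_cor + (1 − f) T_uncor] + β T_bkg, the form implied by the definitions of f_obs and β_obs, and the four-bin templates, function names, and scan ranges are hypothetical.

```python
import math


def fit_model(f, beta, t_cor, t_uncor, t_bkg):
    """Per-bin model: (1 - beta) * [f*T_cor + (1 - f)*T_uncor] + beta*T_bkg."""
    return [(1.0 - beta) * (f * c + (1.0 - f) * u) + beta * b
            for c, u, b in zip(t_cor, t_uncor, t_bkg)]


def binned_nll(f, beta, data, t_cor, t_uncor, t_bkg):
    """Poisson negative log-likelihood (constants dropped), total yield fixed to data."""
    total = sum(data)
    nll = 0.0
    for n, m in zip(data, fit_model(f, beta, t_cor, t_uncor, t_bkg)):
        expected = total * m
        if expected <= 0.0:
            return math.inf  # the model predicts a non-positive yield in this bin
        nll += expected - n * math.log(expected)
    return nll


def scan_fit(data, t_cor, t_uncor, t_bkg,
             f_range=(-0.5, 1.5), beta_range=(0.0, 0.5), steps=(200, 100)):
    """Grid scan over (f, beta); f is deliberately allowed outside [0, 1]."""
    best_nll, best_f, best_beta = math.inf, None, None
    for i in range(steps[0] + 1):
        f = f_range[0] + i * (f_range[1] - f_range[0]) / steps[0]
        for j in range(steps[1] + 1):
            beta = beta_range[0] + j * (beta_range[1] - beta_range[0]) / steps[1]
            value = binned_nll(f, beta, data, t_cor, t_uncor, t_bkg)
            if value < best_nll:
                best_nll, best_f, best_beta = value, f, beta
    return best_f, best_beta


# Hypothetical unit-normalised templates with four bins.
T_COR = [0.1, 0.2, 0.3, 0.4]
T_UNCOR = [0.4, 0.3, 0.2, 0.1]
T_BKG = [0.1, 0.4, 0.4, 0.1]

if __name__ == "__main__":
    data = [10000.0 * m for m in fit_model(0.7, 0.15, T_COR, T_UNCOR, T_BKG)]
    print(scan_fit(data, T_COR, T_UNCOR, T_BKG))
```

The background template in this toy is chosen so that it is not a linear combination of the two signal templates; otherwise f and β could not be determined independently.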
There is a small bias in the extraction of f_obs in the template fit due to the presence of background in the sample. The background shape more closely resembles the behaviour of the uncorrelated template, and the size of the sample from which the background template is derived is small. The small bias is corrected for with a calibration function. The bias is estimated from the simulation via pseudo-experiments with the observed data set size for a range of working points (f_input, β_input). At each working point, the mean observed f_obs and β_obs are extracted to construct a 2D calibration function, used to derive f_calibrated as a function of the observed f_obs and β_obs. The f_obs and β_obs variables have been shifted by the weighted average of the evaluated working points to decorrelate the fit parameters. The calibration function is expressed in terms of the shifted variables f_obs − 0.502 and β_obs − 0.150. The fit parameters of the calibration function are listed in Table 2.

Fig. 5.
Result of the template fit to data. The squares represent the data with the statistical uncertainty smaller than the marker size, the dotted curve is the overall result of the fit, the solid curve is the contribution of the SM signal template to the fit, the dashed curve is the contribution of the uncorrelated signal, and the dash-dot curve is the background contribution.
It has been checked that the initial values of the parameters in the fit model have no influence on the template fit result. The result of the template fit on the data is shown in Fig. 5, with f_obs,data = 0.747 ± 0.092, β_obs,data = 0.168 ± 0.024, and χ²/ndof = 1.552. From simulation, a background fraction β of 15.5% is expected in the fit range. After calibration of both the nominal result and the statistical uncertainty, the result is f = 0.72 ± 0.08 (stat). In the fit to the −2 ln λ_event distribution in the range [−0.7, 1.26], the correlation between f_obs,data and β_obs,data is around 54%.

Sources of systematic uncertainty
Systematic uncertainties affecting this analysis come from various sources, such as detector effects, theoretical uncertainties, and mismodeling in the simulation. The simulation is corrected where necessary by the use of event weights to account for efficiency differences in the data and simulation, e.g. muon identification, isolation efficiency, trigger efficiencies, b tagging and mistagging rates and pileup modeling. The systematic uncertainties are determined, independently of each other, by varying the efficiency correction, resolution, or scale correction factors within their uncertainties. For some uncertainties, this is equivalent to varying the event weights, for others, this requires recalculating the event likelihoods. In both cases, the −2 ln λ event distributions from which pseudo-experiments are drawn to calculate the sample likelihood ratios in simulation or that are used as templates for the fit, are modified. The sources of systematic uncertainties common to the hypothesis testing and template fit are listed and explained below. The order of the list of contributions gives an indication of the relative importance of the contribution in both the template fit and the hypothesis testing. The explicit treatment of the systematic uncertainties is explained in more detail in Section 9.1 for the hypothesis testing and in Section 9.2 for the template fit.
Limited statistical precision of simulation: The −2 ln λ event distributions are obtained from simulation with finite statistical precision. To estimate the effect of the statistical precision in this distribution on the observed significance or on the template fit, each bin of the −2 ln λ event distribution is varied randomly using a Poisson distribution within the statistical uncertainties. This is done independently for each simulation sample that contributes to the −2 ln λ event distribution.
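The bin-by-bin fluctuation described here can be sketched as follows. This is a minimal illustration, not the analysis code: it assumes that each simulation sample is available as an array of raw (unweighted) event counts per bin together with a normalisation weight. The function name `fluctuate_template`, the example counts, and the weights are all hypothetical.

```python
import numpy as np

def fluctuate_template(raw_counts_per_sample, weights, rng):
    """Return one Poisson-fluctuated total template.

    raw_counts_per_sample: list of 1D arrays with raw simulated counts per bin
    weights: one normalisation weight per sample (hypothetical values)
    """
    total = np.zeros_like(raw_counts_per_sample[0], dtype=float)
    for counts, weight in zip(raw_counts_per_sample, weights):
        # Each sample is fluctuated independently, bin by bin,
        # before it is scaled and added to the total template.
        total += weight * rng.poisson(counts)
    return total

rng = np.random.default_rng(seed=1)
samples = [np.array([400, 900, 600, 100]), np.array([50, 80, 60, 10])]
weights = [0.5, 2.0]
varied = fluctuate_template(samples, weights, rng)
```

Repeating the call many times gives an ensemble of templates whose spread reflects the limited size of the simulated samples.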
Scale uncertainty: SM and uncorrelated tt samples with varied renormalisation and factorisation scales are used to estimate the uncertainty arising from the choice of these scales. The renormalisation and factorisation scales are simultaneously doubled or halved with respect to their nominal values, which are set to the sum of the transverse masses squared of the final-state particles (in the case of tt events this is the top quark pair and any additional parton) divided by two. The effect of the scale variation on the event selection is included.
JES and JER effects: The four-momenta of all jets reconstructed in simulated events are varied simultaneously within the uncertainties of the pT- and η-dependent JES [43,44] prior to the event selection. The additional resolution correction applied to the simulation to take into account the resolution difference between data and simulation is varied within the uncertainties in the simulation. The likelihood calculations are performed with the varied jet quantities, using the nominal transfer function. The JES uncertainty enters the measurement in two ways: (i) acceptance effects modify the relative contributions of the backgrounds and (ii) the event likelihood values vary due to the modified quantities. The latter effect is dominant.
Parton distribution functions: The PDF is varied within its uncertainty eigenvectors (CT10) in signal and background, and the effects are propagated through the event weights [45,46]. The procedure to propagate the effect to the −2 ln λ event distribution is described in [45].
Top quark mass uncertainty: SM and uncorrelated tt samples with varied top quark mass values have been produced, including the effect on the event selection. The nominal sample is simulated with a top quark mass of 172.5 GeV, whereas the systematically varied samples are simulated with m t = 169.5 GeV and m t = 175.5 GeV. The −2 ln λ event distribution is varied within 1/3 of the deviation obtained with m t = 175.5 GeV and m t = 169.5 GeV in order to mimic the −2 ln λ event variation caused by a 1 GeV uncertainty in the top quark mass world average value [26].
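The scaling of the mass variation can be illustrated with a short sketch. It assumes that the bin contents depend linearly on the top quark mass over the ±3 GeV range, so that a 1 GeV shift corresponds to one third of the deviation seen in the varied samples. The function name `scale_mass_variation` and the example bin contents are hypothetical.

```python
import numpy as np

def scale_mass_variation(nominal, varied, sample_shift_gev=3.0, target_shift_gev=1.0):
    """Scale a template deviation from a 3 GeV mass shift down to 1 GeV.

    Assumes a linear dependence of each bin content on the top quark mass.
    """
    nominal = np.asarray(nominal, dtype=float)
    varied = np.asarray(varied, dtype=float)
    fraction = target_shift_gev / sample_shift_gev  # 1/3 for the samples above
    return nominal + fraction * (varied - nominal)

# Hypothetical bin contents for m_t = 172.5 GeV and m_t = 175.5 GeV
nominal_bins = [10.0, 20.0]
bins_175p5 = [13.0, 17.0]
bins_plus_1gev = scale_mass_variation(nominal_bins, bins_175p5)
```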
The top quark pT modeling: The model of tt production in MadGraph as well as in mc@nlo predicts a harder top quark transverse momentum spectrum than is observed in the data [47,48]. The top quark pairs can be reweighted based on the pT spectrum of generator-level top quarks to obtain better agreement with the measured differential cross section. This reweighting is not applied in this analysis, but we do assign an uncertainty to the tt modeling by changing the event weight and propagating the effect to the −2 ln λ event shape.

Background modeling and theoretical cross sections: We determine the relative contribution of the backgrounds using the theoretical cross sections for the background processes. The cross sections are varied within the theoretical uncertainties [27] and the effects are propagated to the analysis. The total background shape will change due to the change in relative contributions. In the hypothesis testing, the total background fraction is fixed to the systematically varied value, whereas in the template fit, this fraction can vary freely in the fit. For the W+jets contribution, we vary the background yield by 50% and propagate the effects to the analysis, which is ample to cover the uncertainties on the theoretical cross sections. The shape of the W+jets background template is also varied by evaluating the −2 ln λ event distribution without the W+jets shape included, but keeping the total background fraction fixed to the nominal value.
Pileup: A 5% uncertainty on the inelastic pp cross section is taken into account and propagated to the event weights [49].

The b tagging efficiency and mistag rates: The pT- and η-dependent tagging and mistagging efficiencies for light- and heavy-flavour jets are varied within their uncertainties and are propagated to the event weights in the simulation [50].
Lepton trigger, identification, and isolation efficiencies: pT- and η-dependent scale factors are applied to the simulation to correct for efficiency differences in the data and simulation for the single lepton trigger, lepton identification and isolation. These scale factors are varied independently within their uncertainties and the effects are propagated to the event weights.

The contribution of the individual systematic uncertainty sources is evaluated in the template fitting procedure described in Section 9.2 and reported in Table 3. The relative size of each systematic uncertainty contribution is consistent in the hypothesis testing procedure and the template fitting.

Hypothesis testing
To evaluate the compatibility of the data with either of the hypotheses, the systematic variations of the −2 ln λ event distribution need to be propagated to the −2 ln λ sample distribution. We assess the effect of this event likelihood ratio fluctuation by a Gaussian template morphing technique in which all systematic uncertainties are evaluated simultaneously. In each pseudo-experiment, we draw a sample from the morphed template with a size equal to that of the data set, and evaluate the sample likelihood ratio.
The −2 ln λ event distribution is morphed in the following way.
We draw a vector x of random numbers from a Gaussian distribution with mean 0 and width 1. Per systematic uncertainty source k, we have an independent entry x_k in the vector. In each bin i of the morphed template, the bin content N_i is calculated as shown in the following equation, with H(x_k) a Heaviside step function and N_i^nom the original bin content:

\[
N_i = N_i^{\text{nom}} + \sum_k |x_k| \left[ H(x_k)\,\bigl(N_i^{k,\text{up}} - N_i^{\text{nom}}\bigr) + H(-x_k)\,\bigl(N_i^{k,\text{down}} - N_i^{\text{nom}}\bigr) \right]
\]

Here, N_i^{k,up} and N_i^{k,down} are the bin contents of the systematically varied −2 ln λ event distribution for the upward and downward variation, respectively. The summation runs over all systematic uncertainty sources. The systematic upward fluctuation is chosen for a systematic source when x_k is positive and the downward fluctuation is chosen when x_k is negative. This equation shows that all systematic uncertainty sources are varied simultaneously while the bin-to-bin correlations of the systematic effect are preserved. If the systematic up- and down-effects are asymmetric in size, this asymmetry is preserved. If the systematic up- and down-effects give a change in the same direction, the larger of the two contributions is chosen as a one-sided uncertainty while zero is used for the opposite side.

Per template morphing iteration, we draw one x, which gives us a varied −2 ln λ event distribution. From this distribution with this particular x, we draw one pseudo-experiment with a size equal to that of the data set. This is done independently for the SM and uncorrelated −2 ln λ event distribution.
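A minimal sketch of one morphing iteration followed by one pseudo-experiment is given below. It assumes that each shift scales linearly with |x_k|, and it omits the special treatment of up- and down-variations that point in the same direction. The function names, the clipping of negative bin contents, and the example numbers are illustrative assumptions, not part of the analysis.

```python
import numpy as np

def morph_template(nominal, up, down, x):
    """Morph a binned template for one draw of the nuisance vector x.

    nominal: (nbins,) nominal bin contents
    up, down: (nsyst, nbins) systematically varied bin contents
    x: (nsyst,) standard-normal random numbers, one per source
    """
    nominal = np.asarray(nominal, dtype=float)
    x = np.asarray(x, dtype=float)[:, None]
    shift_up = np.asarray(up, dtype=float) - nominal
    shift_down = np.asarray(down, dtype=float) - nominal
    # Upward shift for positive x_k, downward shift for negative x_k,
    # each scaled by |x_k| (assumed linear scaling).
    shifts = np.abs(x) * np.where(x > 0, shift_up, shift_down)
    return nominal + shifts.sum(axis=0)

def draw_pseudo_experiment(template, n_events, rng):
    """Draw binned counts for n_events from the (clipped) template shape."""
    probs = np.clip(template, 0.0, None)
    probs = probs / probs.sum()
    return rng.multinomial(n_events, probs)

rng = np.random.default_rng(seed=7)
nominal = np.array([100.0, 200.0, 100.0])
up = np.array([[110.0, 200.0, 90.0]])    # one hypothetical source
down = np.array([[95.0, 200.0, 105.0]])
x = rng.standard_normal(size=1)
morphed = morph_template(nominal, up, down, x)
pseudo_data = draw_pseudo_experiment(morphed, n_events=1000, rng=rng)
```

Because the whole varied template is shifted with a single x_k per source, the bin-to-bin correlation of each systematic effect is kept, and an asymmetry between the up and down variations carries over directly.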
We perform repeated pseudo-experiments with the template morphing technique to obtain the systematically varied sample likelihood ratio distribution shown in Fig. 6. The comparison of Figs. 4 and 6 shows the degradation of the separation power between the SM distribution and the uncorrelated distribution due to the systematic uncertainties. In addition, the result of the asymmetric behaviour of some systematic uncertainty sources is clearly visible. Performing 10^7 pseudo-experiments is enough to populate the Gaussian tails in the template morphing phase space, ensuring a smooth −2 ln λ sample distribution with low statistical uncertainty even in the tails.

Fig. 6: The samples in simulation contain signal and background mixed according to the theoretical cross sections, with the solid distribution obtained using SM tt simulation and the dashed distribution obtained using uncorrelated tt simulation, including systematic uncertainties. The arrow indicates the −2 ln λ sample observed in data. The dotted curve shows a mixture of 72% SM tt events and 28% uncorrelated tt events.
From the value of the data sample likelihood ratio, we find that 98.7% of the SM simulated area is above the data value, leading to an observed agreement with the SM hypothesis of 2.2 standard deviations. We find that 0.2% of the uncorrelated simulated area is above the data value, leading to an observed agreement with the uncorrelated hypothesis of 2.9 standard deviations. From this we conclude that the data are more compatible with the SM hypothesis than with the uncorrelated hypothesis. The dominant uncertainty sources are the JES, scale variation, and the top quark mass uncertainties. The JES uncertainty is responsible for the asymmetric tails in the distribution.
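The quoted numbers of standard deviations are consistent with converting the tail fractions into one-sided Gaussian significances. The sketch below assumes that convention; the function name `one_sided_significance` is hypothetical.

```python
from statistics import NormalDist

def one_sided_significance(tail_fraction):
    """Convert a one-sided tail probability into Gaussian standard deviations."""
    return NormalDist().inv_cdf(1.0 - tail_fraction)

# SM hypothesis: 98.7% of the area lies above the data, so 1.3% lies below.
z_sm = one_sided_significance(1.0 - 0.987)
# Uncorrelated hypothesis: 0.2% of the area lies above the data.
z_uncorrelated = one_sided_significance(0.002)

print(f"SM: {z_sm:.1f} sigma, uncorrelated: {z_uncorrelated:.1f} sigma")
```

Rounded to one decimal place, the two values reproduce the 2.2 and 2.9 standard deviations given in the text.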
As a test of the compatibility of the result in the hypothesis testing and the extraction of f, the hypothesis testing has been performed with a tt sample constructed such that 72% of the events contained SM correlations while the remaining 28% had no correlation. As a result we find a sample likelihood ratio distribution, shown in Fig. 6, in between the SM and uncorrelated scenarios, with a data compatibility of 0.6 standard deviations. The value measured in data, which is slightly below the mean of the distribution, is within the expectation of statistical and systematic effects. We would have achieved even better agreement had we used in simulation a value of the top quark mass equal to the world average measurement of 173.3 GeV [26].

Extraction of fraction of events with SM spin correlation
In the extraction of f using a template fit to the variable −2 ln λ event , we have the same list of systematic uncertainty sources as described earlier, but in addition a systematic uncertainty due to the calibration of the method is taken into account. The calibration uncertainty is obtained by propagating the uncertainties in the calibration fit parameters shown in Table 2 and propagating the fit uncertainty of the fit parameter β obs,data .
The systematic uncertainties are determined by fitting the data with systematically varied templates and taking the difference from the nominal fit result. The systematic contributions are listed in Table 3, where the fit uncertainty of the nominal result is also shown. The systematic uncertainty related to the finite size of the simulation samples is evaluated by fitting one pseudo-data set from the simulation with 1000 Poisson-fluctuated templates. The Gaussian width of the resulting distribution of f obs is taken as the systematic uncertainty. This is done for each simulation sample independently, with the uncertainties added in quadrature. In the template fit method, all systematic uncertainties are treated as independent of each other. The total systematic uncertainty is obtained by adding the positive and negative contributions in Table 3 separately in quadrature.
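For independent sources, the combination of the individual contributions can be sketched as below, assuming the positive and negative shifts are summed in quadrature separately. The function name `combine_in_quadrature` is hypothetical, and the input numbers are placeholders chosen for illustration; they are not the entries of Table 3.

```python
import math

def combine_in_quadrature(contributions):
    """Combine independent (plus, minus) uncertainty magnitudes in quadrature."""
    plus = math.sqrt(sum(p ** 2 for p, _ in contributions))
    minus = math.sqrt(sum(m ** 2 for _, m in contributions))
    return plus, minus

# Placeholder (plus, minus) magnitudes for two hypothetical sources
placeholder_sources = [(0.03, 0.04), (0.04, 0.03)]
total_plus, total_minus = combine_in_quadrature(placeholder_sources)
```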

Summary
The hypothesis that tt events are produced with correlated spins as predicted by the SM is tested using a matrix element method in the μ+jets final state at √ s = 8 TeV, using pp collisions corresponding to an integrated luminosity of 19.7 fb −1 . The data agree with the uncorrelated hypothesis within 2.9 standard deviations, whereas agreement with the SM is within 2.2 standard deviations. Our hypotheses are only considered up to NLO effects in the simulation, with LO matrix elements in the likelihood calculations.
Using a template fit method, the fraction of events which show SM spin correlations has been extracted. This fraction is measured to be f = 0.72 ± 0.08 (stat) +0.15 −0.13 (syst), leading to a spin correlation strength of A_hel^measured = 0.23 ± 0.03 (stat) +0.05 −0.04 (syst) using the value obtained in simulation, which is compatible with the theoretical prediction for A_hel^SM from [51,52]. The result is the most precise determination of this quantity in the muon+jets final state to date and is competitive with the most accurate result in the dilepton final state [9].
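The step from f to the spin correlation strength can be made explicit under the assumption that the measured strength scales linearly with f. The symbol A_hel^sim, for the value in the SM simulation, is introduced here for illustration only; that value is not quoted in this section and is inferred below from the rounded results.

```latex
\[
A^{\text{measured}}_{\text{hel}} = f \, A^{\text{sim}}_{\text{hel}}
\quad\Rightarrow\quad
A^{\text{sim}}_{\text{hel}} \approx \frac{0.23}{0.72} \approx 0.32
\]
\[
0.08 \times 0.32 \approx 0.03, \qquad
0.15 \times 0.32 \approx 0.05, \qquad
0.13 \times 0.32 \approx 0.04
\]
```

With this scaling, the statistical and systematic uncertainties on f map onto the quoted uncertainties on the spin correlation strength.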
Acknowledgements
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses.