Measurement of the production cross section for a W boson and two b jets in pp collisions at √s = 7 TeV

Article history: Received 23 December 2013 Received in revised form 14 June 2014 Accepted 14 June 2014 Available online 18 June 2014 Editor: M. Doser


Introduction
This Letter reports a study of the production of a W boson and two b jets in proton-proton collisions, where the W boson is observed via its decay to a muon and a neutrino, and each b jet is identified by the presence of a b hadron with a displaced decay vertex. The production mechanism of bb pairs together with W or Z bosons has been the subject of extensive theoretical studies and is included in different simulation programs [1][2][3], but is still not thoroughly understood. Previous measurements of vector boson production with associated b-quark jets have shown varying levels of agreement with theoretical calculations [4][5][6].
According to the standard model (SM), the primary contribution for bb production in association with a W boson is due to the splitting of a gluon into a bb pair. Two different models for b-quark production are available, depending on whether there are four or five quark flavors in the proton parton distribution functions (PDFs) [7]. Therefore, a precise experimental measurement of the W + bb production cross section provides important input to the refinement of theoretical calculations in perturbative quantum chromodynamics (QCD), as well as the validation of Monte Carlo (MC) techniques.
A key feature of this analysis compared to others [4][5][6] is the bb phase space that is covered. Previous measurements have concentrated on W-boson production with at least one observed b-quark jet, for which the predictions differ from the experimental results. This difference is larger in the production of events with a collinear bb pair that is reconstructed as one jet [8,9], a topology afflicted by significant theoretical uncertainties. Focusing on the observation of W-boson production with two well-separated b-quark jets, this analysis provides a complementary approach by probing a kinematic regime that is better understood theoretically.
The production of W + bb events is an irreducible background in analyses involving two separated and well-identified b jets, such as SM Higgs boson production in association with an electroweak gauge boson and subsequent decay to bb. The discovery of a Higgs boson with a mass of approximately 125 GeV by the ATLAS and CMS Collaborations [10-12] motivates further studies to determine the coupling of this new boson to b quarks.
Other SM processes produce events with an experimental signature similar to the one studied here. These include production of top quark-antiquark pairs (tt), associated production of a W boson with light jets misidentified as b-quark jets, single-top-quark production, multijet production (henceforth labeled "QCD multijet"), Drell-Yan production associated with jets, and electroweak diboson production.

CMS detector and event samples
This analysis uses a sample of proton-proton collisions at a center-of-mass energy of 7 TeV, corresponding to an integrated luminosity of 5.0 fb⁻¹. CMS uses a right-handed coordinate system, with the origin at the nominal interaction point, the x axis pointing to the center of the LHC ring, the y axis pointing up (perpendicular to the plane of the LHC ring), and the z axis along the counterclockwise-beam direction. The polar angle θ is measured from the positive z axis and the azimuthal angle φ is measured in the x-y plane in radians. The magnitude of the transverse momentum p T is calculated as $p_T = \sqrt{p_x^2 + p_y^2}$.
A superconducting solenoid is the central feature of the CMS detector, providing an axial magnetic field of 3.8 T parallel to the beam direction. A silicon pixel and strip tracker, a crystal electromagnetic calorimeter, and a brass/scintillator hadron calorimeter are located within the solenoid. A quartz-fiber Cherenkov calorimeter extends the coverage to |η| < 5.0, where η = − ln[tan (θ/2)]. Muons are measured in gas-ionization detectors embedded in the steel flux return yoke outside the solenoid. The first level of the CMS trigger system, composed of custom hardware processors, is designed to select the most interesting events using information from the calorimeters and muon detectors. A high-level trigger processor farm decreases the event rate to a few hundred hertz, before data storage.
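As a minimal illustrative sketch (not part of the CMS software; the function names are hypothetical), the transverse momentum and pseudorapidity defined above can be computed in Python as follows:

```python
import math


def transverse_momentum(px: float, py: float) -> float:
    """Magnitude of the momentum in the x-y plane: p_T = sqrt(px^2 + py^2)."""
    return math.hypot(px, py)


def pseudorapidity(theta: float) -> float:
    """eta = -ln[tan(theta/2)] for a polar angle theta in radians, 0 < theta < pi."""
    return -math.log(math.tan(theta / 2.0))
```

A particle emitted perpendicular to the beam (θ = π/2) has η = 0, and |η| grows without bound as the direction approaches the beam axis, which is why detector coverage is quoted as a range in |η|.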
A number of MC event generators are used to simulate the signal and background event samples. Vector boson + jets and tt + jets productions are generated at leading order (LO) using . For all processes, the detector response is simulated using a detailed description of the CMS detector based on Geant4 [20]. The reconstruction of simulated events is performed with the same algorithms used for the analyzed data sample. The simulated event samples include additional minimum-bias interactions per bunch crossing (pileup).

Event reconstruction
Individual particles emerging from each collision are reconstructed with the particle-flow (PF) technique [21,22]. This approach uses the information from all subdetectors to identify and reconstruct individual particle candidates in the event, classifying them into mutually exclusive categories: charged hadrons, neutral hadrons, photons, electrons, and muons.
Muons are reconstructed by combining the information from the tracker and the muon spectrometer [23]. The muon candidates are required to originate from the primary vertex of the event, chosen as the vertex with the highest sum of p 2 T of the charged particles associated with it. The muon relative isolation is defined as
The p T of a muon passing the identification and isolation requirements is combined with the missing transverse energy E miss T of the event to form a W candidate. We define E miss T as the negative vector sum of the transverse momenta of all reconstructed particle candidates in the event. The value of E miss T is corrected for noise in the electromagnetic and hadron calorimeters using the procedure described in Ref. [24], which is based on a parametrization of the recoil energy measured in Z → μμ events. The reconstructed transverse mass of the system, M T , is calculated from the p T of the isolated muon and the E miss T of the event as $M_T = \sqrt{2\, p_T(\mu)\, E_T^{\text{miss}}\, (1 - \cos\Delta\phi)}$, where Δφ is the difference in azimuth between E miss T and p T (μ). In W → μν decays, the M T distribution exhibits a Jacobian peak with a kinematic endpoint at the W mass. It is therefore a natural discriminator against non-W final states, such as QCD multijet events, that have a lepton candidate and E miss T but a relatively low value of M T .
Jets are constructed using the anti-k T clustering algorithm [25], as implemented in the fastjet package [26,27], with a distance parameter of 0.5. Jet clustering is performed using individual particle candidates reconstructed with the PF technique. Jets are required to pass identification criteria that eliminate jets originating from noisy channels in the hadron calorimeter [28]. Jets originating from pileup interactions are rejected by requiring consistency of the jets with the primary interaction vertex. Small corrections to the jet energy (relative and absolute calibrations of the detector) are applied as a function of the p T and η of the jet [29].
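The transverse mass used above can be illustrated with a short Python sketch. It assumes the standard definition M T = sqrt(2 p T E miss T (1 − cos Δφ)); the function and argument names are hypothetical and are not taken from the analysis code.

```python
import math


def transverse_mass(pt_mu: float, met: float, dphi: float) -> float:
    """Transverse mass (GeV) of the muon + missing-energy system.

    pt_mu : muon transverse momentum in GeV
    met   : missing transverse energy in GeV
    dphi  : azimuthal angle between the muon and the missing energy, in radians
    """
    return math.sqrt(2.0 * pt_mu * met * (1.0 - math.cos(dphi)))
```

For a muon and a neutrino emitted back to back in the transverse plane (Δφ = π) with 40 GeV each, this gives M T = 80 GeV, close to the W mass endpoint. Collinear configurations (Δφ near 0) give M T near zero, which is the behaviour exploited to suppress multijet events.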
The combined secondary vertex (CSV) b-tagging algorithm [30] exploits the long lifetime and relatively large mass of b hadrons to provide optimized b-quark jet discrimination. The CSV algorithm combines information about impact parameter significance, secondary vertex (SV) kinematic properties, and jet kinematic properties in a likelihood-ratio technique. Jets are b-tagged by imposing a minimum threshold on the CSV discriminator value. This threshold provides an efficiency of approximately 50% for identifying jets containing b-flavored hadrons, while limiting the misidentification probability for light-quark and gluon jets to 0.1% and for c-quark jets to 3%. Furthermore, to increase the purity of the sample, a selected jet is required to have a reconstructed SV. This additional requirement has a small impact on the selected b-quark jets (93% efficiency with respect to the b-tag selection) while reducing the combined misidentification probability for c-quark, light-quark, and gluon jets to <0.1%.

W + bb event selection
Candidate events are selected online by a single-muon trigger that requires a reconstructed muon with p T > 24 GeV and |η| < 2.1. The offline W + bb event selection requires an isolated muon with I rel < 0.12, p T > 25 GeV, |η| < 2.1, and exactly two jets with p T > 25 GeV and |η| < 2.4, where both selected jets must contain an SV and pass the b-tagging requirement. To reduce the contribution from Z-boson production, the event is rejected if a second muon forms an invariant mass m μμ > 60 GeV with the isolated muon. There are no requirements on the isolation or p T of the second muon. The tt background is reduced by requiring that there be no additional isolated electrons or muons with p T > 20 GeV in the event and no additional jets with p T > 25 GeV and 2.4 < |η| < 4.5. To reduce the contribution from QCD multijet events, M T > 45 GeV is required. The total number of observed events in the data sample after all selection requirements are applied is 1230.
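The offline requirements listed above can be summarized in a small Python sketch. The `Jet` and `Event` containers and all field names are hypothetical simplifications introduced for illustration (for example, the dimuon mass and the count of extra isolated leptons are assumed to be precomputed); only the numerical thresholds are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Jet:
    pt: float               # GeV
    eta: float
    has_sv: bool = False    # reconstructed secondary vertex
    b_tagged: bool = False  # passes the CSV threshold


@dataclass
class Event:
    mu_pt: float            # GeV, selected muon
    mu_eta: float
    mu_irel: float          # relative isolation of the selected muon
    mt: float               # GeV, transverse mass of the W candidate
    jets: List[Jet] = field(default_factory=list)
    max_dimuon_mass: float = 0.0  # largest m_mumu with any second muon (0 if none)
    n_extra_leptons: int = 0      # extra isolated e/mu with pT > 20 GeV


def passes_wbb_selection(ev: Event) -> bool:
    """Apply the W + bb offline selection thresholds quoted in the text."""
    muon_ok = ev.mu_irel < 0.12 and ev.mu_pt > 25.0 and abs(ev.mu_eta) < 2.1
    central = [j for j in ev.jets if j.pt > 25.0 and abs(j.eta) < 2.4]
    forward = [j for j in ev.jets if j.pt > 25.0 and 2.4 < abs(j.eta) < 4.5]
    # Exactly two central jets, both with an SV and a b tag.
    jets_ok = len(central) == 2 and all(j.has_sv and j.b_tagged for j in central)
    return (
        muon_ok
        and jets_ok
        and not forward                  # veto on additional forward jets
        and ev.max_dimuon_mass <= 60.0   # Z-boson veto
        and ev.n_extra_leptons == 0      # veto on additional isolated leptons
        and ev.mt > 45.0                 # multijet suppression
    )
```

Jets below the 25 GeV threshold are ignored by both the jet count and the veto in this sketch, which is one possible reading of the selection as stated.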
We first estimate the normalized distributions for the signal and each type of background and then perform a global fit to determine the fraction of each background in the candidate sample. The shapes of the signal and background distributions for the variables we use in the fit are evaluated using simulation, except for the QCD multijet background, which is derived from data. The cross sections for the W + jets and Z + jets processes are calculated with the predictions from fewz [31] evaluated at next-to-next-to-leading order (NNLO) using the MSTW08 NNLO PDF set [32]. Single-top-quark and diboson production cross sections are normalized to the NLO cross section predictions from mcfm [33,34] using the MSTW08 NLO PDF set. The tt cross section is taken at NNLO as calculated in Ref. [35]. For each background simulation we apply the same selection requirements as for the candidate sample to generate the relevant distributions for fitting.
To verify that the simulation describes the data both in shape and normalization, and can be used in the fit as described in the following section, we select specific data samples that are enriched with the relevant backgrounds and check the performance of the simulation. These control regions are not used in the final signal extraction and serve only for this verification task, with the exception of the QCD multijet and tt control regions.
The shapes of the distributions for multijet events are taken directly from a multijet enriched data sample obtained using the signal selection requirements and, in addition, requiring a nonisolated muon: I rel > 0.2. The yield of multijet events is obtained from a fit performed on events with M T < 40 GeV, which is below the Jacobian peak of the W → μν. The resulting normalization is extrapolated to the signal region, M T > 45 GeV. The relative uncertainty in the yield of QCD multijet events is estimated to be ±50%, taking into account both the fit result and the extrapolation to the high-M T range. This relative uncertainty also covers shape mismodelings of the small multijet contribution in the final sample.
The W + light-quark jets process, where the jets are not initiated by b or c quarks, is the dominant background before applying the selection requirements on the SV and on b-tagging. The b-tagging algorithm reduces the contamination of light-quark and c-quark jets in the selected sample to approximately 2% of the total expected yield. The contribution of events with a single b-quark jet in the initial state and a misidentified second light-quark or gluon jet is negligible.
A tt background control data sample is formed by requiring two jets in addition to the two highest p T b-tagged jets. This higher jet multiplicity requirement selects a sample that is dominated by tt events. Fig. 1 (top) shows the invariant mass m J 3 J 4 of the third- and fourth-highest p T jets in the event. In tt events this observable is correlated with the mass of the hadronically decaying W boson. This tt control region is used in the final fit for the signal yield to constrain the tt background normalization in the signal region. The simulation describes the observed distributions well, both in terms of shape and normalization.
A Z + jets background data sample is defined by requiring the standard selection criteria with the additional requirement of a second muon with opposite charge such that the invariant mass of the dimuon system is consistent with a Z boson (70 < m μμ < 100 GeV). This sample is used to validate the Z + jets background estimate, as documented in Ref. [36]. The simulation describes the experimental distributions well in this control data sample.
A single-top-quark background sample is defined by selecting events in which the W boson is accompanied by exactly one b-quark jet, which passes the tagging criteria, and an additional forward jet with |η| > 2.8. No further rejection of additional light jets or leptons is imposed. The simulation describes the single-top-quark background data sample well, as documented in Ref. [37], and therefore it is used to estimate the yield and shape of the distributions of kinematic variables in the signal region.
The expected yield in the signal region for the SM Higgs boson of M H = 125 GeV associated with a W boson where the Higgs boson decays to bb pairs and the W boson decays to a muon and a neutrino has been computed using the powheg event generator. It would account for <0.2% of the total expected yield in the signal region and is not considered for this measurement.

Signal extraction
After all the selection requirements, the largest background contributions are the production of tt pairs and single top quarks. Contributions to the background from W + jets, Z + jets, diboson production, and QCD multijet events are much smaller, as shown in the middle column of Table 1.
The composition of the candidate data sample is extracted via a binned extended maximum-likelihood fit. Because the tt background is large (larger in fact than the signal), it is essential to constrain tightly both the signal and the tt background with a simultaneous fit to the p T of the leading jet (p T,J 1 ) in the signal region after all selection requirements are applied, and to the m J 3 J 4 distribution obtained from the tt control sample. The normalization of each background contribution is allowed to vary in the fit within its uncertainty. The p T of the leading jet is chosen as the final fit variable because of its discrimination power against top-related backgrounds. Fig. 1 shows the fitted distributions: p T,J 1 in the signal region (bottom) and m J 3 J 4 in the tt control sample (top); the yields shown for the different processes are those resulting from the fit. The χ 2 of the fit is 16.9, for 29 degrees of freedom (χ 2 /dof = 0.58). The fitted yields for all the processes are listed in Table 1, and compared to the predictions. All observed yields are found to be in agreement with the expectations.
Table 1. Comparison of the yields expected from the SM and obtained from the fit to the analyzed data sample. The predicted yield uncertainty takes into account the uncertainties in the measurement of the cross section for each of the processes, except for the multijet contribution, which is estimated using a multijet enriched background data sample. The uncertainty in the fitted yields combines the statistical and systematic uncertainties obtained from the extended binned maximum-likelihood fit technique.
Observed events: 1230
The systematic uncertainties, including those in the predicted background yields, are introduced as nuisance parameters in the fit with constraints around the estimated central value. Any cross section or acceptance uncertainty in the background processes is introduced as a log-normal constraint on the rate of the process. Alternate binned templates are obtained by varying the different sources of systematic uncertainty; the nominal and alternate templates are then interpolated depending on the nuisance parameter values. One of the largest systematic uncertainties comes from the relative uncertainty in the b-tagging efficiency (6% per jet). This and other uncertainties in the light-quark and c-quark jet mistagging efficiencies are taken from Ref. [30]. The jet energy and muon p T scales are allowed to vary within their uncertainties (1-3% and 0.2%, respectively). Relative uncertainties in the muon efficiency (due to trigger, reconstruction, identification, and isolation) are estimated to be 1%. The average number of pileup events in the data sample analyzed is 9.
The uncertainty associated with the pileup in the simulation is studied by shifting the overall mean number of interactions per bunch crossing up or down by 0.6, which has a negligible effect on the measurement. To account for the uncertainty in the description of the E miss T spectrum, the component of E miss T that is not clustered in jets is shifted by ±10%. The normalizations of the background processes are also taken into account, with an uncertainty assigned to each process according to the theoretical predictions, the previous CMS measurements when available, or an estimate from the multijet data sample. The overall relative uncertainty in the signal selection efficiency due to the choice of PDF set is estimated by following the PDF4LHC recommendation and found to be approximately 1% [32,38-41]. Varying the factorization (μ F ) and renormalization (μ R ) scales, also based on the PDF4LHC recommendation, leads to an uncertainty of 10%. A similar procedure is followed to estimate the effect of scale variations on the signal shape, yielding an uncertainty in the cross section smaller than 1%. The relative integrated luminosity uncertainty is 2.2% [42].
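The treatment of nuisance parameters described above (log-normal rate constraints and interpolation between nominal and alternate templates) can be sketched in Python. This is only an illustration under stated assumptions: the text does not specify the interpolation scheme, so a piecewise-linear bin-by-bin interpolation is assumed here, and the log-normal scaling factor is assumed to be (1 + relative uncertainty) raised to the nuisance parameter. All function names are hypothetical.

```python
from typing import List


def lognormal_rate(nominal: float, rel_unc: float, theta: float) -> float:
    """Scale a process yield for a log-normal rate uncertainty.

    theta is the nuisance parameter in units of standard deviations;
    theta = 0 returns the nominal yield (assumed convention).
    """
    return nominal * (1.0 + rel_unc) ** theta


def morph_template(
    nominal: List[float], up: List[float], down: List[float], theta: float
) -> List[float]:
    """Piecewise-linear interpolation between binned templates (assumed scheme).

    theta = +1 reproduces the 'up' template, theta = -1 the 'down' template.
    Bin contents are clipped at zero so that yields stay non-negative.
    """
    morphed = []
    for n, u, d in zip(nominal, up, down):
        shift = theta * (u - n) if theta >= 0 else -theta * (d - n)
        morphed.append(max(n + shift, 0.0))
    return morphed
```

In a fit of this kind, each nuisance parameter would additionally contribute a constraint term to the likelihood; that term is omitted here to keep the sketch short.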
The number of events in the candidate sample, 1230, is in agreement with the expected and fitted total yields, although it is not explicitly included in the fitting process. The number of signal events obtained from the binned maximum-likelihood fit is
To cross-check these results, an independent study is performed with looser b-tagging criteria, corresponding to an efficiency of 70% for selecting a jet containing b-flavored hadrons, while the misidentification probability for light-quark and gluon jets is 1% and for c-quark jets is 11%. All other selection criteria for the signal and control samples remain unchanged. Since the c-quark jet contribution becomes significant with these looser criteria, it is essential to use variables in the fit that can discriminate against both W + cc and top-quark-initiated processes. The invariant mass measured using all particles originating at the SVs of the highest p T (m SV,J 1 ) and second-highest p T (m SV,J 2 ) jets can distinguish between W + bb and W + cc. The scalar sum of the transverse momenta of the jets, H T , is used to distinguish W + jets from top-quark contributions. The W + bb signal is extracted in a two-dimensional fit using the two variables m SV,J 1 + m SV,J 2 and H T and constraining the tt contribution in the tt background data sample as described above. The distributions of the variables m SV,J 1 + m SV,J 2 and H T , which are projections of the two-dimensional distributions fitted in this cross-check, are shown in Fig. 2, with yields as given by the fit. The central value of the cross section computed with this method differs by less than 3% from the primary fit result.

Results
The W + bb cross section is measured within a fiducial volume defined by requiring a final-state muon with p T > 25 GeV and |η| < 2.1 and exactly two final-state particle jets, reconstructed using the anti-k T jet algorithm with a distance parameter of 0.5, with p T > 25 GeV and |η| < 2.4 and with each containing at least one b hadron with p T > 5 GeV. Events with extra jets with p T > 25 GeV and |η| < 4.5 are vetoed.
Within this fiducial phase space, the W + bb cross section is obtained using the expression where the efficiency of the selection requirements, ε sel = (11.2 ± 1.0)%, is computed using the MadGraph + pythia MC sample. The uncertainty in this selection efficiency comes from the PDF and scale variation uncertainties mentioned above. The experimental uncertainties are included in the determination of N S . The measured fiducial cross section is ± 0.06 (theo.) ± 0.01 (lum.) pb.
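For orientation, a fiducial cross section is conventionally obtained by dividing the fitted signal yield by the selection efficiency and the integrated luminosity. The form below is a sketch under that assumption; the symbols σ fid and L int are introduced here for illustration and are not taken from the source.

```latex
\[
  \sigma_{\text{fid}} \;=\; \frac{N_S}{\epsilon_{\text{sel}} \, \mathcal{L}_{\text{int}}}
\]
```

In an expression of this form, a relative uncertainty in the selection efficiency translates to first order into an equal relative uncertainty in the cross section.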
This measured value cannot be directly compared to the SM NLO cross section calculated with mcfm [33,34] because the latter pertains to jets of partons, not jets of hadrons, and does not include the production of bb pairs from double-parton scattering (DPS). mcfm predicts a cross section of 0.52 ± 0.03 pb at the parton level, using the MSTW2008 NNLO PDF set and setting the factorization and renormalization scales to μ F = μ R = m W + 2m b [34]. The uncertainty in the theoretical cross section quoted above is estimated by varying the scales μ F , μ R simultaneously up and down by a factor of two. It also takes into account the PDF uncertainties following the PDF4LHC recommendation. The scale uncertainty in the theoretical cross section may be underestimated because of the requirement of exactly two jets in the final state, which introduces a veto on events with extra jets. Therefore, a more conservative estimate of this uncertainty in the theoretical prediction is computed, following the procedure described in Ref. [43], and the total theoretical uncertainty is found to be 30%.
Two corrections are needed to link the theoretical prediction to the measurement: a hadronization correction and a DPS correction. At the parton level, the events are required to have a muon of p T > 25 GeV and |η| < 2.1 and exactly two parton jets of p T > 25 GeV and |η| < 2.4, each containing a b quark. The hadronization correction factor C b→B = 0.92 ± 0.01, calculated using a five-flavor MadGraph + pythia reference MC, is used to extrapolate the cross section computed at the level of parton jets to the level of final-state particle jets. The uncertainty assigned to this correction is obtained by comparing the corresponding factors computed with a four-flavor MadGraph MC simulation. The simulated MadGraph + pythia events include DPS production of bb pairs and they reproduce these processes adequately as measured by CMS [44]. The contribution of DPS events to the cross section at the parton-jet level is estimated to be σ DPS = (σ W × σ bb )/σ eff = 0.08 ± 0.05 pb. The value of the effective cross section, σ eff , is taken from Ref.
In addition to this measurement of the production cross section, we have explored the kinematics of the W + bb system. The angular distance between the two selected b jets, $\Delta R_{J_1,J_2} = \sqrt{(\Delta\eta_{J_1,J_2})^2 + (\Delta\phi_{J_1,J_2})^2}$, is compared to the SM prediction in Fig. 3 (top). Signal and background yields are taken from the binned maximum-likelihood fit, and their shapes from Monte Carlo simulations or data as described in Section 4. The minimum separation of 0.5 between the two jets is an important aspect of the phase space definition, as discussed in the introduction. Fig. 3 (bottom) compares the M T distribution to the SM predictions. Fig. 4 shows the invariant mass of the two selected b-quark jets (m J 1 J 2 ) as well as the transverse momentum of the system formed by the two b-quark jets (p T,J 1 J 2 ). The simulation describes the observed distributions well.
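Two of the quantities discussed above lend themselves to a short Python sketch: the angular distance between two jets and the DPS estimate σ DPS = (σ W × σ bb)/σ eff. The function names are hypothetical, and wrapping the azimuthal difference into [−π, π) is a standard convention assumed here rather than something stated in the text.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two jets."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def dps_cross_section(sigma_w: float, sigma_bb: float, sigma_eff: float) -> float:
    """DPS estimate sigma_W * sigma_bb / sigma_eff (all inputs in the same units)."""
    return sigma_w * sigma_bb / sigma_eff
```

The wrapping matters for jets on either side of the φ = ±π boundary: two jets at φ = 3.0 and φ = −3.0 are separated by about 0.28 in azimuth, not 6.0.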

Summary
In summary, we have presented a measurement of the W + bb production cross section in proton-proton collisions at 7 TeV.
The W + bb events have been selected in the W → μν decay mode with a muon of p T > 25 GeV and |η| < 2.1, and two b jets of p T > 25 GeV and |η| < 2.4. The data sample corresponds to an integrated luminosity of 5.0 fb⁻¹. The measured fiducial cross section for production of a W boson and two b jets, This study provides the first measurement for pp → W + bb production at 7 TeV in this particular phase space, thereby complementing previous measurements performed at the LHC [6], which focused on the production of W bosons accompanied by one identified b jet. The precision of the measured cross section approaches that of theoretical predictions at NNLO, thus enabling sensitive tests of perturbative calculations in the SM.
[Figure caption, partial: ... system (bottom). Signal and background yields are taken from the binned maximum-likelihood fit described in the text. The uncertainty band corresponds to the uncertainty in the yields as given by the fit. The last bin in both plots includes overflow events. The lower panels show the ratio of observed data events to the total fitted yield.]

Acknowledgements
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our