Search for $WH$ associated production in 5.3 fb$^{-1}$ of $p\bar{p}$ collisions at the Fermilab Tevatron

We present a search for associated production of Higgs and $W$ bosons in $p\bar{p}$ collisions at a center-of-mass energy of $\sqrt{s}=1.96$ TeV in 5.3 fb$^{-1}$ of integrated luminosity recorded by the D0 experiment. Multivariate analysis techniques are applied to events containing one lepton, an imbalance in transverse energy, and one or two $b$-tagged jets to discriminate a potential $WH$ signal from standard model backgrounds. We observe good agreement between data and background, and set an upper limit of 4.5 (at 95% confidence level and for $m_H=115$ GeV) on the ratio of the $WH$ cross section multiplied by the branching fraction of $H \rightarrow b \bar{b}$ to its standard model prediction. A limit of 4.8 is expected from simulation.

[...] through their Yukawa couplings to the Higgs field. The mass of the Higgs boson ($m_H$) is not predicted by the SM, but the combination of direct searches at the CERN $e^+e^-$ Collider (LEP) [1] and precision measurements of other electroweak parameters constrains $m_H$ to $114.4 < m_H < 185$ GeV at the 95% CL [2]. While the region $158 < m_H < 175$ GeV has been excluded at the 95% CL by a combination of searches at CDF and D0 [3][4][5][6], the remaining mass range continues to be probed at the Fermilab Tevatron Collider. The associated production of a Higgs boson and a leptonically decaying $W$ boson is among the cleanest Higgs boson search channels at the Tevatron, and provides the largest event yield for the decay $H \rightarrow b\bar{b}$ in the range $m_H < 135$ GeV. Several searches for $WH$ production at a $p\bar{p}$ center-of-mass energy of $\sqrt{s} = 1.96$ TeV have been published. Three of these [7][8][9] use subsamples (0.17 fb$^{-1}$, 0.44 fb$^{-1}$, and 1.1 fb$^{-1}$) of the data analyzed in this paper, while three from the CDF collaboration are based on cumulative samples (0.32 fb$^{-1}$, 0.95 fb$^{-1}$, and 2.7 fb$^{-1}$) of integrated luminosity [10][11][12].
We present a new search using an improved multivariate technique based on data collected with the D0 detector, corresponding to an integrated luminosity of 5.3 fb$^{-1}$. The search selects events with one charged lepton ($\ell$ = electron, $e$, or muon, $\mu$), an imbalance in transverse energy ($\not\!\!E_T$) that arises from the unobserved neutrino in the $W \rightarrow \ell\nu$ decay, and either two or three jets, with one or two of these selected as candidate $b$-quark jets ($b$-tagged).
The channels are separated into independent categories based on the number of $b$-tagged jets in an event (one or two). Single $b$-tagged events contain three important sources of background: (i) multijet events, where a jet is misidentified as an isolated lepton, (ii) $W$ boson production in association with $c$-quark or light-quark jets, and (iii) $W$ boson production in association with two heavy-flavor ($b\bar{b}$, $c\bar{c}$) jets. In events with two $b$-tagged jets, the dominant backgrounds are from $Wb\bar{b}$, $t\bar{t}$, and single top-quark production.
The analysis relies on the following components of the D0 detector [13]: (i) a central-tracking system, which consists of a silicon microstrip tracker (SMT) and a central fiber tracker (CFT), both located within a 2 T superconducting solenoidal magnet; (ii) a liquid-argon/uranium calorimeter containing electromagnetic, fine hadronic, and coarse hadronic layers, segmented into a central section (CC), covering pseudorapidity $|\eta| < 1.1$ relative to the center of the detector [14], and two end calorimeters (EC) extending coverage to $|\eta| \approx 4.0$, all housed in separate cryostats [15], with scintillators between the CC and EC cryostats providing sampling of developing showers for $1.1 < |\eta| < 1.4$; (iii) a muon system located beyond the calorimetry consisting of layers of tracking detectors and scintillation trigger counters, one before and two after the 1.8 T iron toroids. A 2006 upgrade of the D0 detector added an inner layer of silicon [16] to the SMT and an improved calorimeter trigger [17]. The integrated luminosity is measured using plastic scintillator arrays located in front of the EC cryostats at $2.7 < |\eta| < 4.4$. The trigger and data acquisition systems are designed to accommodate high instantaneous luminosities.
Events in the electron channel are triggered by a logical OR of several triggers that require an electromagnetic (EM) object or an EM object in conjunction with a jet. Trigger efficiencies are taken into account in the Monte Carlo (MC) simulation through a weighting of events based on an efficiency derived from data and parametrized as a function of electron $\eta$ and azimuth $\phi$, and jet transverse momentum $p_T$.
We accept events for the muon channel from a mixture of single high-$p_T$ muon, jet, and muon plus jet triggers, and expect this inclusive trigger to be fully efficient for our selection criteria. We verify this by comparing events that pass a well-modeled subset of high-$p_T$ muon triggers to those that are selected by the inclusive set of triggers. Good agreement is observed between data and MC for this high-$p_T$ muon subset of triggers. Events not selected by a high-$p_T$ muon trigger tend to be selected by a jet trigger. The efficiency of this complementary set of triggers is modeled as a function of the scalar sum of jet $p_T$ in an event ($H_T$). This model provides a gain in efficiency relative to the high-$p_T$ muon triggers, and produces good agreement between data and MC for the combination of all triggers following its application to the simulation.
The pythia [18] MC generator is used to simulate production of dibosons with inclusive decays ($WW$, $WZ$, and $ZZ$), $WH \rightarrow \ell\nu b\bar{b}$, and $ZH \rightarrow \ell\ell b\bar{b}$ ($\ell = e$, $\mu$, or $\tau$). The contribution to the total signal yield from $ZH$ events in which one lepton is not identified is approximately 5%. Background from $W/Z$ ($V$)+jets and $t\bar{t}$ events is generated with alpgen [19] interfaced to pythia for parton showering and hadronization. The alpgen samples are produced using the MLM parton-jet matching prescription [19]. The $V$+jets samples are divided into $V$+light jets and $V$+heavy-flavor jets. The $V$+light jets samples include $Vjj$, $Vbj$, and $Vcj$ processes, where $j$ is a light-flavor ($u$, $d$, $s$ quarks or gluons) jet, while the $V$+heavy-flavor samples for $Vb\bar{b}$ and $Vc\bar{c}$ are generated separately. Production of single top-quark events is generated using comphep [20,21], with pythia used for parton evolution and hadronization. Simulation of both background and signal processes relies on the CTEQ6L1 [22] leading-order parton distribution functions for all MC events. These events are processed through a full D0 detector simulation based on geant [23] using the same reconstruction software as used for D0 data. Events from randomly chosen beam crossings are overlaid on the simulated events to reproduce the effect of multiple $p\bar{p}$ interactions and detector noise.
The simulated background processes are normalized to their predicted SM cross sections, except for $W$+jets events, which are normalized to data before applying $b$-tagging, where contamination from the $WH$ signal is expected to be negligible. The signal cross sections and branching fractions are calculated at next-to-next-to-leading order (NNLO) and are taken from Refs. [24][25][26][27][28], while the $t\bar{t}$, single $t$, and diboson cross sections are at next-to-leading order (NLO), and taken from Ref. [29], Ref. [30], and the mcfm program [31], respectively. As a cross check, we compare data with NLO predictions for $W$+jets based on mcfm, and find a relative data/MC normalization factor of $1.0 \pm 0.1$, where the normalization for data is obtained after subtracting all other expected background processes. The normalizations of the $Vb\bar{b}$ and $Vc\bar{c}$ yields in MC relative to data are consistent with the ratio of LO/NLO cross sections predicted by mcfm. Therefore we apply these mcfm ratios to the corresponding $W$+heavy-flavor and $Z$+heavy-flavor jet processes.
This analysis is based on a preselection of events with an electron of $p_T > 15$ GeV, with $|\eta| < 1.1$ or $1.5 < |\eta| < 2.5$, or a muon of $p_T > 15$ GeV, with $|\eta| < 1.6$. Preselected events are also required to have $\not\!\!E_T > 20$ GeV, either two or three jets with $p_T > 20$ GeV (after correction of the jet energy [32]) and $|\eta| < 2.5$, and $H_T > 60$ GeV for 2-jet events, or $H_T > 80$ GeV for 3-jet events. The $\not\!\!E_T$ is calculated from the individual calorimeter cells in the EM and fine hadronic layers of the calorimeter, and is corrected for the presence of muons. All energy corrections to electrons and jets (including energy from the coarse hadronic layers associated with jets) are propagated into the $\not\!\!E_T$. To suppress multijet background, events with $M_T^W < 40 - 0.5\,\not\!\!E_T$ GeV are removed, where $M_T^W$ is the transverse mass of the $W$ boson candidate. Events with additional charged leptons isolated from jets that pass the flavor-dependent $p_T$ thresholds $p_T^e > 15$ GeV, $p_T^\mu > 10$ GeV, and $p_T^\tau > 10$ or 15 GeV, depending on $\tau$ decay channel [33], are rejected to decrease dilepton background from $Z$ boson and $t\bar{t}$ events. Events must have a reconstructed $p\bar{p}$ interaction vertex (containing at least three associated tracks) that is located within $\pm 40$ cm of the center of the detector in the longitudinal direction.
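The multijet-suppression requirement above can be illustrated with a short sketch. It assumes the usual definition of the transverse mass for a massless lepton and neutrino, $M_T^W = \sqrt{2\,p_T^\ell\,\not\!\!E_T\,(1 - \cos\Delta\phi)}$, which the text does not spell out. The function names are illustrative and are not part of any D0 software.

```python
import math

def transverse_mass(lep_pt, met, dphi):
    """Transverse mass (GeV) of the lepton + missing-E_T system.

    Assumes a massless lepton and neutrino; dphi is the azimuthal
    angle between the lepton and the missing-E_T vector (radians).
    """
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))

def passes_triangle_cut(lep_pt, met, dphi):
    """Keep the event unless M_T^W < 40 - 0.5 * MET (all in GeV)."""
    return transverse_mass(lep_pt, met, dphi) >= 40.0 - 0.5 * met

# Back-to-back lepton and MET, as in a typical W decay: kept.
print(passes_triangle_cut(40.0, 40.0, math.pi))
# Lepton nearly aligned with a modest MET, as in many multijet events: removed.
print(passes_triangle_cut(20.0, 25.0, 0.2))
```

The cut removes a triangular region at low $M_T^W$ and low $\not\!\!E_T$, which is where events with a mismeasured jet and no real neutrino tend to accumulate.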
Lepton candidates are identified in two steps. In the first step, each candidate must pass "loose" identification criteria. For electrons, we require 95% of the energy in a shower to be deposited in the EM section of the calorimeter, isolation from other calorimeter energy deposits, spatial distributions of calorimeter energies consistent with those expected for EM showers, and a reconstructed track matched to the EM shower that is isolated from other tracks. For the "loose" muon, we require hits in each layer of the muon system, scintillator hits in time with a beam crossing (to veto cosmic rays), a spatial match with a track in the central tracker, and isolation from jets within $\Delta R < 0.5$ [14] to reject semileptonic decays of hadrons. In the second step, the loose leptons are subjected to a more restrictive "tight" selection. Tight electrons must satisfy more restrictive calorimeter isolation fractions and EM energy-fraction criteria, and satisfy a likelihood test developed on $Z \rightarrow ee$ data based on eight quantities characterizing the EM nature of the particle interactions [34]. Tight muons must satisfy stricter isolation criteria on energy in the calorimeter and momenta of tracks near the trajectory of the muon candidate. Inefficiencies introduced by lepton-identification and isolation criteria are determined from $Z \rightarrow \ell\ell$ data. The final selections rely only on events with tight leptons, with events containing only loose leptons used to determine the multijet background.
Jets are reconstructed using a midpoint cone algorithm [35] with radius 0.5. Identification requirements for jets are based on longitudinal and transverse shower profiles, and minimize the possibility that the jets are caused by noise or spurious depositions of energy. For data taken after 2006, and in the corresponding simulation, jets must have at least two associated tracks emanating from the reconstructed $p\bar{p}$ interaction vertex. Any difference in efficiency for jet identification between data and simulation is corrected by adjusting the jet energy and resolution in simulation to match those measured in data. Comparison of alpgen with other generators and with data shows small discrepancies in distributions of jet pseudorapidity and dijet angular separations [36]. The data are therefore used to correct the alpgen $W$+jets and $Z$+jets MC events through polynomial reweighting functions parameterized by the leading and second-leading jet $\eta$, and $\Delta R$ between the two leading jets, that bring these distributions for the total simulated background and the high-statistics sample of events prior to $b$-tagging into agreement.
Instrumental background and that from semileptonic decays of hadrons, referred to jointly as the multijet background, are estimated from data. The instrumental background is significant in the electron channel, where a jet with a high EM fraction can pass electron-identification criteria, or a photon can be misidentified as an electron. In the muon channel, the multijet background is less important and arises mainly from semileptonic decays of heavy-flavor quarks, where the muon passes isolation criteria.
To estimate the number of events that contain a jet that passes the "tight" lepton selection, we determine the probability $f_{T|L}$ for a "loose" lepton candidate, originating from a jet, to also pass tight identification. This is done in events that pass preselection requirements without applying the selection on $M_T^W$, i.e., events that contain one loose lepton and two jets, but small $\not\!\!E_T$ (5-15 GeV). The total non-multijet background is estimated from MC and subtracted from the data before estimating the contribution from multijet events. For electrons, $f_{T|L}$ is determined as a function of electron $p_T$ in three regions of $|\eta|$ and four of $\Delta\phi(\not\!\!E_T, e)$, while for muons it is taken as a function of $|\eta|$ for two regions of $\Delta\phi(\not\!\!E_T, \mu)$. The efficiency for a loose lepton to pass the tight identification ($\varepsilon_{T|L}$) is measured in $Z \rightarrow \ell\ell$ events in data, and is modeled as a function of $p_T$ for electrons and muons. The estimation of multijet background described in Ref. [34] is used to determine the multijet background directly from data, where each event is assigned a weight that contributes to the multijet estimation based on $f_{T|L}$ and $\varepsilon_{T|L}$ as a function of event kinematics. Since $f_{T|L}$ depends on $\not\!\!E_T$, the scale of this estimate of the multijet background must be adjusted when comparing to data with $\not\!\!E_T > 20$ GeV. Before applying $b$-tagging, we perform a fit to the $M_T^W$ distribution to set the scales for the multijet and $W$+jets backgrounds simultaneously.
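The role of $f_{T|L}$ and $\varepsilon_{T|L}$ can be illustrated with a minimal sketch of the standard two-equation "matrix method". This is assumed here to be the kind of procedure described in Ref. [34]; the text does not give the formula. Constant rates are used for simplicity, whereas the analysis parametrizes both as functions of event kinematics. All function names and numbers are hypothetical.

```python
def multijet_in_tight(n_loose, n_tight, eff_real, fake_rate):
    """Estimate the multijet yield in the tight sample.

    Solves
        n_loose = n_real + n_mj
        n_tight = eff_real * n_real + fake_rate * n_mj
    for n_mj and returns fake_rate * n_mj.
    """
    if eff_real <= fake_rate:
        raise ValueError("need eff_real > fake_rate")
    return fake_rate * (eff_real * n_loose - n_tight) / (eff_real - fake_rate)

def event_weight(is_tight, eff_real, fake_rate):
    """Per-event weight; summed over the loose sample it gives the same estimate."""
    scale = fake_rate / (eff_real - fake_rate)
    return scale * (eff_real - 1.0) if is_tight else scale * eff_real

# Illustrative numbers only: 800 real leptons and 200 jets in the loose sample.
eff, fake = 0.9, 0.2
n_loose = 1000
n_tight = 760  # 0.9 * 800 + 0.2 * 200
print(multijet_in_tight(n_loose, n_tight, eff, fake))
```

Assigning weights event by event, rather than solving once for the total, lets the multijet prediction be built for any distribution, such as the $M_T^W$ spectrum used in the normalization fit.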
Efficient identification of $b$ jets is central to the search for $WH$ production. The D0 neural network (NN) $b$-tagging algorithm [37] for identifying heavy-flavored jets is based on a combination of seven variables sensitive to the presence of tracks or secondary vertices displaced significantly from the primary vertex. All tagging efficiencies are determined separately for data and for simulated events. We first use a low threshold on the NN output that corresponds to a misidentification rate of 2.7% for light-flavor jets of $p_T \geq 50$ GeV that are mistakenly tagged as heavy-flavored jets. If two jets in an event pass this $b$-tagging requirement, the event is classified as double-$b$-tagged (DT). Events that are not classified as DT are considered for placement in an independent single-$b$-tag (ST) sample, which requires exactly one jet to satisfy a more restrictive NN operating point corresponding to a misidentification rate of 0.9%. The efficiencies for identifying a jet that contains a $b$ hadron for the two NN operating points are $(63 \pm 1)\%$ and $(53 \pm 1)\%$, respectively, for a jet with a $p_T$ of 50 GeV. These efficiencies are determined for "taggable" jets, i.e., jets with at least two tracks, each with at least one hit in the SMT. Simulated events are corrected to have the same fraction of jets satisfying the taggability and $b$-tagging requirements as found in preselected data.
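The DT/ST categorization described above can be sketched as follows. The NN threshold values are hypothetical placeholders, since the text quotes only the misidentification rates of the two operating points, and reading "two jets" as "at least two" is an assumption of the sketch.

```python
def classify_btag(nn_outputs, loose_cut, tight_cut):
    """Return 'DT', 'ST', or None for one event.

    nn_outputs: NN b-tag output for each taggable jet in the event.
    loose_cut / tight_cut: hypothetical NN thresholds standing in for
    the two operating points (only their mistag rates are quoted).
    """
    n_loose = sum(1 for x in nn_outputs if x > loose_cut)
    if n_loose >= 2:  # assumption: "two jets" read as "at least two"
        return "DT"
    n_tight = sum(1 for x in nn_outputs if x > tight_cut)
    if n_tight == 1:
        return "ST"
    return None

# Hypothetical thresholds and NN outputs, for illustration only.
for jets in ([0.9, 0.6], [0.9, 0.3], [0.6, 0.3]):
    print(jets, classify_btag(jets, loose_cut=0.5, tight_cut=0.8))
```

Testing the DT condition first makes the two samples statistically independent, so they can be combined as separate channels in the limit setting.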
The expected event yields following these selection criteria for specific backgrounds and for $m_H = 115$ GeV are compared to the observed number of events in Table I. Distributions in dijet invariant mass for the two jets of highest $p_T$ in 2-jet and 3-jet events are shown for the ST and DT samples in Fig. 1(a-d). The data are well described by the sum of the simulated SM processes and multijet background. The contributions expected from a Higgs boson with $m_H = 115$ GeV, multiplied by a factor of ten, are also shown for comparison.
Table I: Summary of event yields for the $\ell$ + $b$-tagged jets + $\not\!\!E_T$ final state. Event yields in data are compared with the expected number of ST and DT events in the samples with $W$ boson candidates plus two or three jets, composed of contributions from simulated diboson pairs (labeled "$WZ$" in the table), $W/Z$+$b\bar{b}$ or $c\bar{c}$ ("$Wb\bar{b}$"), $W/Z$+light-quark jets ("$W$+lf"), and top-quark ("$t\bar{t}$" and "Single $t$") production, as well as data-derived multijet background ("MJ"). The quoted uncertainties include both statistical and systematic contributions, including correlations between background sources and channels. The expectation for $WH$ signal is given for $m_H = 115$ GeV.
We use a random forest (RF) multivariate technique [38,39] to separate the SM background from signal, and search for an excess, which is expected primarily at large values of the RF discriminant. A separate RF discriminant is used for each combination of jet multiplicity (two or three), lepton flavor ($e$ or $\mu$), and number of $b$-tagged jets (one or two). The 2-jet events are divided into data-taking periods, before and after the 2006 detector upgrade, for a total of twelve separately trained RFs for each chosen Higgs boson mass. Each RF consists of a collection of individual decision trees, with each tree considering a random subset of the twenty kinematic and topological input variables listed in Table II. The final RF output is the average over the individual trees. The input variables $\sqrt{\hat{s}}$ and $\Delta R(\mathrm{dijet}, \ell + \nu)$ each have two solutions arising from the two possibilities for the neutrino $p_z$, assuming the lepton and $\not\!\!E_T$ ($\nu$) constitute the decay products of an on-shell $W$ boson. The angles $\theta^*$ and $\chi$ are described in Ref. [40], and exploit kinematic differences arising from the scalar nature of the Higgs boson and the spins of objects in the $Wb\bar{b}$ background. The RF outputs from 2-jet ST and DT events are shown in Fig. 1(e,f).
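The two-fold ambiguity in the neutrino $p_z$ mentioned above follows from imposing the $W$ mass on the lepton plus $\not\!\!E_T$ system, which gives a quadratic equation in $p_z^\nu$. A minimal sketch is given below. The $W$ mass value, the treatment of a negative discriminant, and all names are assumptions of the sketch, not details taken from the analysis.

```python
import math

M_W = 80.4  # GeV; nominal W mass assumed for the constraint

def neutrino_pz_solutions(lep_px, lep_py, lep_pz, met_x, met_y, m_w=M_W):
    """Both p_z solutions for the neutrino from the W-mass constraint.

    Lepton and neutrino are treated as massless. If the discriminant is
    negative (transverse mass above m_w), it is clipped to zero so that
    the two solutions coincide; this is one common convention.
    """
    lep_pt2 = lep_px**2 + lep_py**2
    lep_e = math.sqrt(lep_pt2 + lep_pz**2)
    met2 = met_x**2 + met_y**2
    a = 0.5 * m_w**2 + lep_px * met_x + lep_py * met_y
    disc = max(a * a - lep_pt2 * met2, 0.0)
    root = lep_e * math.sqrt(disc)
    return ((a * lep_pz - root) / lep_pt2, (a * lep_pz + root) / lep_pt2)

# Illustrative kinematics in GeV.
print(neutrino_pz_solutions(30.0, 0.0, 10.0, -20.0, 5.0))
```

Each solution gives a different neutrino four-vector, and therefore a different value of any variable built from the full $\ell + \nu$ system, which is why such inputs appear twice.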
The dijet mass distribution is especially sensitive to $WH$ production, and was used previously to set limits on $\sigma(p\bar{p} \rightarrow WH) \times B(H \rightarrow b\bar{b})$ in Ref. [8]. However, the gain in sensitivity using the RF output as the final discriminant is about 20% for a Higgs mass of 115 GeV, which, in terms of the expected limit on the $WH$ cross section, is equivalent to a gain of about 40% in integrated luminosity. The systematic uncertainties that affect the signal and SM backgrounds can be categorized by the nature of their source, i.e., theoretical (e.g., uncertainty on a cross section), MC modeling (e.g., reweighting of alpgen samples), or experimental (e.g., uncertainty on integrated luminosity). Some of these uncertainties affect only the normalization of the signal or backgrounds, while others also affect the differential distribution of the RF output.
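The equivalence quoted above between a 20% gain in sensitivity and about 40% in integrated luminosity can be checked with a one-line estimate. It assumes that the expected limit is statistics-limited and scales as the inverse square root of the integrated luminosity $L$, and it reads the 20% gain as a factor of 1.2 between the two expected limits; both are assumptions of this sketch rather than statements from the analysis:

$$
\sigma_{95}^{\mathrm{exp}} \propto \frac{1}{\sqrt{L}}
\quad\Longrightarrow\quad
\frac{L'}{L} = \left(\frac{\sigma_{95}^{\mathrm{exp}}}{\sigma_{95}^{\prime\,\mathrm{exp}}}\right)^{2} = (1.2)^{2} = 1.44 ,
$$

i.e., roughly 40% more integrated luminosity would be needed to reach the same expected limit with the dijet mass alone.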
Theoretical uncertainties include uncertainties on the $t\bar{t}$ and single top-quark production cross sections (10% and 12%, respectively [29,30]), an uncertainty on the diboson production cross section (6% [31]), and an uncertainty on $W$+heavy-flavor production (20%, estimated from mcfm). These uncertainties affect only the normalization of the backgrounds.
Uncertainties from modeling that affect the distribution in the RF output include uncertainties on trigger efficiency as derived from data (3-5%), lepton identification and reconstruction efficiency (5-6%), reweighting of alpgen MC samples (2%), the MLM matching applied to $W/Z$+light-jet events (< 0.5%), and the systematic uncertainties associated with the choice of renormalization and factorization scales in alpgen as well as the uncertainty on the strong coupling constant (2%). Uncertainties on the alpgen renormalization and factorization scales are evaluated by adjusting the nominal scale for each, simultaneously, by factors of 0.5 and 2.0.
Experimental uncertainties that affect only the normalization of the signal and SM backgrounds arise from the uncertainty on integrated luminosity (6.1%) [43]. Those that also affect the distribution in RF output include jet taggability (3%), $b$-tagging efficiency (2.5-3% per heavy-quark jet), the light-quark jet misidentification rate (10%), acceptance for jet identification (5%), and jet-energy calibration and resolution (varies between 15% and 30%, depending on the process and channel). The background-subtracted data points for the RF discriminant for $m_H = 115$ GeV, with all channels combined, are shown with their systematic uncertainties in Fig. 2.
Fig. 2: Distribution in the RF discriminant for $m_H = 115$ GeV, for the difference between data and background expectation, for all channels (both $e$ and $\mu$, ST and DT, and 2-jet and 3-jet), shown with statistical uncertainties. The lightly shaded region represents the total systematic uncertainty before using constraints from data (referred to as "Pre-Fit" in the legend), while the solid lines represent the total systematic uncertainty after constraining with data ("Post-Fit" in the legend). The darker shaded region represents the SM Higgs signal expectation scaled up by a factor of 5.
We observe no excess relative to expectation from SM background, and we set upper limits on the production cross section $\sigma(WH)$ using the RF outputs for the different channels. The binning of the RF output is adjusted to ensure adequate population of background events in each bin. We calculate all limits at the 95% CL using a modified frequentist approach and a Poisson log-likelihood ratio as test statistic [44,45]. The likelihood ratio is studied using pseudoexperiments based on randomly drawn Poisson trials of signal and background events. We treat systematic uncertainties as "nuisance parameters" constrained by their priors, and the best fits of these parameters to data are determined at each value of $m_H$ by maximizing the likelihood ratio [46]. Independent fits are performed to the background-only and signal-plus-background hypotheses. All appropriate correlations of systematic uncertainties are maintained among channels and between signal and background. The systematic uncertainties before and after fitting are indicated in Fig. 2. The log-likelihood ratios for the background-only model and the signal-plus-background model as a function of $m_H$ are shown in Fig. 3(a).
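The modified frequentist procedure described above can be sketched in simplified form. The example below computes $CL_s = CL_{s+b}/CL_b$ from Poisson pseudoexperiments using a binned log-likelihood ratio. It omits the nuisance-parameter fits and correlated systematic uncertainties that the analysis includes, and all names, bin contents, and settings are illustrative assumptions.

```python
import math
import random

def llr(counts, signal, background):
    """Poisson log-likelihood ratio -2 ln[L(s+b)/L(b)], summed over bins."""
    return 2.0 * sum(s - n * math.log(1.0 + s / b)
                     for n, s, b in zip(counts, signal, background))

def poisson(rng, mean):
    """Draw one Poisson variate (Knuth's method; adequate for modest means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def cls(observed, signal, background, n_toys=2000, seed=1):
    """CLs = CL(s+b) / CL(b) from pseudoexperiments (no systematics)."""
    rng = random.Random(seed)
    llr_obs = llr(observed, signal, background)
    splusb = [s + b for s, b in zip(signal, background)]

    def tail(means):
        # Fraction of toys at least as background-like as the data.
        hits = 0
        for _ in range(n_toys):
            toy = [poisson(rng, m) for m in means]
            if llr(toy, signal, background) >= llr_obs:
                hits += 1
        return hits / n_toys

    cl_sb, cl_b = tail(splusb), tail(background)
    return cl_sb / cl_b if cl_b > 0 else 1.0

# A signal hypothesis is excluded at the 95% CL when CLs < 0.05.
print(cls([10], [20.0], [10.0]) < 0.05)
```

In a limit scan, the signal expectation would be scaled by a common factor until $CL_s$ reaches 0.05; that factor is the quantity reported as a multiple of the SM prediction.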
The upper limit on $\sigma(p\bar{p} \rightarrow WH) \times B(H \rightarrow b\bar{b})$ at the 95% CL is a factor of 4.5 larger than the SM expectation for $m_H = 115$ GeV. The corresponding upper limit expected from simulation is 4.8. The analysis is repeated for ten other $m_H$ values from 100 to 150 GeV; the corresponding observed and expected 95% CL limits relative to their SM expectations are given in Table III and in Fig. 3(b).
In conclusion, $\ell + \not\!\!E_T$ + 2 or 3-jet events have been analyzed in a search for $WH$ production in 5.3 fb$^{-1}$ of $p\bar{p}$ collisions at the Fermilab Tevatron. The yield of single and [...]
[...] search Council (Sweden); and CAS and CNSF (China).