Search for the standard model Higgs boson produced in association with a $Z$ boson in 7.9 fb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV using the CDF II detector

We present a search for the standard model Higgs boson produced in association with a $Z$ boson, using up to 7.9 fb$^{-1}$ of integrated luminosity from $p\bar{p}$ collisions collected with the CDF II detector. We utilize several novel techniques, including multivariate lepton selection, multivariate trigger parametrization, and a multi-stage signal discriminant consisting of specialized functions trained to distinguish individual backgrounds. By increasing acceptance and enhancing signal discrimination, these techniques have significantly improved the sensitivity of the analysis above what was expected from a larger dataset alone. We observe no significant evidence for a signal, and we set limits on the $ZH$ production cross section. For a Higgs boson with mass 115 GeV/$c^2$, we expect (observe) a limit of 3.9 (4.8) times the standard model predicted value, at the 95% credibility level.

Production of Higgs bosons at the Tevatron primarily proceeds through the gluon fusion mechanism, $gg \to H$ [8]. Low-mass Higgs bosons ($m_H < 135$ GeV/$c^2$) decay predominantly to a pair of b quarks, with a branching fraction of 79% (40%) [8]. Due to overwhelming QCD multijet production, low-mass searches with Higgs production via gluon fusion and $H \to b\bar{b}$ decay are not feasible. To overcome this difficulty, we utilize the associated production of a Higgs boson with a massive vector boson, where leptonic decays of the vector boson produce distinctive event signatures.
This Letter presents a search for the SM Higgs boson using the $ZH \to \ell^+\ell^- b\bar{b}$ process, where $\ell$ is an electron ($e$) or muon ($\mu$). We search for events containing two oppositely-charged leptons consistent with the decay of a Z boson, and a hadronic signature consistent with the $H \to b\bar{b}$ decay mode. Previous searches [9,10] by the CDF and D0 collaborations have demonstrated that this final state provides good sensitivity to a Higgs boson signal, primarily due to the ability of the experiments to reconstruct both the Z and Higgs bosons. We study data from $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV recorded by the CDF II detector. We combine two independent analyses, one with $Z \to e^+e^-$ [11] and one with $Z \to \mu^+\mu^-$ [12], using data corresponding to 7.5 and 7.9 fb$^{-1}$ of integrated luminosity, respectively.
The CDF II detector [13] consists of silicon-based and wire-drift-chamber tracking systems immersed in a 1.4 T magnetic field for particle momentum determination. Surrounding the tracking systems are electromagnetic and hadronic calorimeters, providing coverage in the pseudorapidity [14] range |η| < 3.6. Additional drift chambers used for muon identification are located in the outermost layer of the detector.
The sensitivity of this updated analysis is enhanced by using several novel techniques following two general strategies: increasing acceptance and enhancing signal discrimination. To increase acceptance, we introduce artificial neural networks (NNs) for lepton selection, and we also use several online event-selection (trigger) algorithms not previously used. Using a new technique, we are able to accurately model the combined behavior of these triggers, allowing access to ZH candidate events beyond the reach of the previous CDF searches. To enhance signal discrimination, we form a multi-stage event discriminant organized to isolate ZH candidates from known SM and instrumental processes (backgrounds).
To improve on standard cut-based lepton identification, we instead select leptons consistent with the decay of a Z boson by using several NNs. Each NN identifies individual electrons or muons, distinguishing them from both non-leptonic candidates and true leptons not originating from Z decays. A single NN is used for muon identification, and is trained [15] to distinguish between true muons from simulated Z decays and misidentified muons from a data sample containing same-charge muon candidates. In events with Z → µµ decays well contained in the detector, the muon NN selection achieves a Z identification efficiency of ∼96%, while simultaneously rejecting ∼94% of the non-Z background. Detector geometry [16,17] motivates three NNs for electron identification.
One is optimized for identification in the pseudorapidity range |η| < 1.1. The other two NNs are trained for the forward regions; one considers only candidates with a silicon-based track and the other considers candidates without such a track in the region 1.1 ≤ |η| ≤ 2.8. Compared to the selection utilized in previous searches, the electron NN has improved the rejection of jets misidentified as electrons by a factor of five. In total, the multivariate lepton selection has increased the acceptance of the analysis by ∼20% over previous searches [9].
Complementary to the improved lepton identification, we add triggers that were not previously utilized in this analysis channel. Rather than using a single trigger with a threshold for muon $p_T$ or electron $E_T$ for the respective Z selection, we consider any event selected by any trigger in three general sets. The first set includes several triggers that select events containing muon detector and drift chamber activity indicative of a high-$p_T$ muon [18,19]. This set may include triggers with lower $p_T$ thresholds than the default muon trigger. The second set of triggers selects events with a large calorimeter-energy imbalance (missing transverse energy, $\not\!\!E_T$ [20]). Some of these events contain muons that are not selected with the high-$p_T$ muon trigger, thereby increasing the acceptance of this analysis. A third set of triggers selects events with activity in the calorimeter suggestive of a high-$E_T$ electron [21]. By using these sets of triggers rather than just single triggers for each lepton type, we increase the event selection acceptance by ∼10%. To model the complicated correlations between kinematic variables used in the trigger selection described above, we use a novel technique in which NN functions parametrize trigger efficiencies as a function of kinematic observables [11,12].
Utilizing the above strategies to increase acceptance, we select events containing opposite-sign [22], same-flavor lepton pairs with $m_{\ell\ell}$ in a window ([76, 106] GeV/$c^2$) centered on the mass of the Z boson. Additionally, we require at least two jets [23], with transverse energy $E_T > 25$ GeV for the leading jet, and $E_T > 15$ GeV for all other jets. All jets are required to come from the central region of the detector, $|\eta| < 2.0$.
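As an illustration only, the selection requirements quoted above can be sketched in Python. The class and function names are hypothetical and are not part of the CDF software; the sketch also assumes that the opposite-charge requirement applies uniformly to all lepton pairs (the content of footnote [22] is not reproduced here) and that jets failing the $E_T$ or $|\eta|$ criteria are simply ignored rather than vetoing the event.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lepton:
    flavor: str  # "e" or "mu"
    charge: int  # +1 or -1


@dataclass
class Jet:
    et: float   # transverse energy in GeV
    eta: float  # pseudorapidity


def passes_pretag_selection(leptons: List[Lepton], m_ll: float, jets: List[Jet]) -> bool:
    """Apply the Z -> ll and jet requirements quoted in the text (illustrative only)."""
    if len(leptons) != 2:
        return False
    first, second = leptons
    # Same-flavor, opposite-charge lepton pair (assumed to apply to every pair).
    if first.flavor != second.flavor or first.charge * second.charge >= 0:
        return False
    # Dilepton invariant mass window around the Z boson mass, in GeV/c^2.
    if not 76.0 <= m_ll <= 106.0:
        return False
    # Keep central jets above the lower E_T threshold, ordered by E_T.
    good_jets = sorted(
        (j for j in jets if abs(j.eta) < 2.0 and j.et > 15.0),
        key=lambda j: j.et,
        reverse=True,
    )
    # At least two jets, with the leading jet above 25 GeV.
    return len(good_jets) >= 2 and good_jets[0].et > 25.0
```

An event with a $\mu^+\mu^-$ pair at $m_{\ell\ell} = 91$ GeV/$c^2$ and two central jets of 40 and 20 GeV would pass this sketch, while the same event with a 20 GeV leading jet would not.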
We define a pre-tag (PT) region before applying b-quark jet identification, consisting of events with a reconstructed Z boson and two or more jets passing the criteria described above. We observe 33 975 events in the PT region, and expect a total background yield of 34 200 ± 4800 events, where the quoted uncertainty includes both systematic and statistical contributions. We expect 13.6 ± 1.1 ZH signal events in the PT region, for $m_H = 115$ GeV/$c^2$. The dominant process in the PT region is Z+light-flavor (LF) jets (u, d, s, and gluon jets), accounting for ∼85% of the total background. Z+heavy-flavor (HF) jet events (b and c) contribute less than 10% of the background in the PT region, but become relatively more significant in the signal regions. These processes are modeled using ALPGEN [24] to simulate the hard-scatter process, and PYTHIA [25] for the subsequent hadronization. The Z+jets processes are simulated at leading order and require a K-factor of 1.4 [26] for normalization to NLO cross sections. Other small backgrounds include diboson (ZZ, WZ, and WW) events and $t\bar{t}$ events, simulated entirely with PYTHIA normalized to NLO [27] and NNLO [28] predictions, respectively. Finally, other processes, such as QCD multijet production, can produce two selected leptons in the event. For muon events, this background is modeled using same-charge muon pairs from data. For electron events, we measure the rate of jets passing the electron NN using collision data to estimate the contribution from these processes. This background accounts for ∼3% of the background in the PT region.
We utilize two different b-quark-identifying algorithms to search for jets consistent with the $H \to b\bar{b}$ decay. The secondary vertex algorithm (SV) [29] identifies jets consistent with the decay of a long-lived b hadron by searching for displaced vertices. The SV algorithm has both a tight and a loose operating point; the loose point has better b-jet identification efficiency but also has a higher rate of jets incorrectly identified as b jets. The jet probability (JP) algorithm [30] uses track impact parameters relative to the primary vertex to construct a likelihood for all jet tracks to have originated from the primary vertex. Both algorithms have imperfect rejection of c-quark jets, allowing some events containing them to contribute to the final signal regions.
We use the combination of the two highest-$E_T$ jets to form potential Higgs boson candidates. We use a hierarchy of tag combinations to define three independent signal regions. We first search for events with two tight SV tags, defining the double-tag (DT) region, the most sensitive. A second signal region includes events with one loose SV tag and one JP tag (LJP), and the third contains events with just one tight SV tag (ST). These three regions are combined to search for ZH production. Table I shows the expected numbers of events for the signal and background processes, as well as the observed data.
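The tag hierarchy described above can be sketched as a small classification function. This is a minimal illustration under stated assumptions, not the analysis code: the tag labels (`"SV_tight"`, `"SV_loose"`, `"JP"`) and the function name are hypothetical, and we assume that a jet passing the tight SV operating point also counts as passing the loose one.

```python
from typing import Optional, Set


def classify_tag_region(jet1_tags: Set[str], jet2_tags: Set[str]) -> Optional[str]:
    """Assign the two leading jets to DT, LJP, or ST, checked in that order.

    Returns None when the event enters none of the three signal regions.
    """
    def has_loose_sv(tags: Set[str]) -> bool:
        # Assumption: a tight SV tag implies a loose SV tag.
        return "SV_loose" in tags or "SV_tight" in tags

    # Double tag: both jets carry a tight SV tag.
    if "SV_tight" in jet1_tags and "SV_tight" in jet2_tags:
        return "DT"
    # One loose SV tag on one jet and one JP tag on the other jet.
    if (has_loose_sv(jet1_tags) and "JP" in jet2_tags) or (
        has_loose_sv(jet2_tags) and "JP" in jet1_tags
    ):
        return "LJP"
    # Single tight SV tag on either jet.
    if "SV_tight" in jet1_tags or "SV_tight" in jet2_tags:
        return "ST"
    return None
```

Checking the regions in a fixed order is what keeps them independent in this sketch: an event that qualifies for DT is never also counted in LJP or ST.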
In this analysis, we use a one-dimensional signal discriminant while maintaining the simultaneous separation of $t\bar{t}$ and Z+jets events from the ZH signal that was previously accomplished through a two-dimensional discriminant [9]. This method also further enhances signal discrimination by using two additional NNs in a multi-stage method, as described below.
We first train a NN signal discriminant, using several kinematic variables such as the dijet mass and $\not\!\!E_T$, to distinguish signal-like events from background-like events; the signal class is trained with simulated ZH events and the background class with a mixture of all background processes. Each data and simulated event is sent through the same signal discriminants, with a unique function optimized for each of 11 Higgs mass hypotheses, defined in increments of 5 GeV/$c^2$ be-
The multi-stage method defines three samples (I, II, III) where events can enter the final distributions used for limit setting. The first step involves separating $t\bar{t}$ and Z+jets events. This is done using a NN function trained to separate these specific processes. A cut on the output of this discriminant is chosen to define a $t\bar{t}$-enhanced sample (Sample I). Events which fail this cut and fall into Samples II or III are passed through a second NN function trained to separate b jets from charm and light-flavor jets [31]. A cut on the output of this flavor separator function defines a sample containing mainly $Z + c\bar{c}$ and Z+LF backgrounds (Sample II), and a region enriched in b jets (Sample III).
This multi-stage approach produces final output distributions with three samples enriched in various background processes, as seen in Fig. 1, where we add (0, 1, 2) to the signal discriminant output score for each event when the event falls in Sample (I, II, III) as described above. By enhancing the signal discrimination in this way, we increase the sensitivity of the analysis by ∼10% over the technique used in Ref. [9]. We use these distributions to set limits on the ZH production cross section times H → bb branching ratio.
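The construction of the final one-dimensional output can be illustrated with a short Python sketch. The function name, the cut values, and the score conventions are assumptions made for illustration: we take all three NN outputs to lie in [0, 1], with a higher first-stage score meaning more $t\bar{t}$-like and a higher flavor-separator score meaning more b-like. The actual cut values are not given in the text.

```python
def multistage_output(
    signal_score: float,
    ttbar_score: float,
    flavor_score: float,
    ttbar_cut: float = 0.5,   # hypothetical cut value
    flavor_cut: float = 0.5,  # hypothetical cut value
) -> float:
    """Map an event onto the combined axis: Sample I, II, III -> offsets 0, 1, 2."""
    if ttbar_score > ttbar_cut:
        offset = 0  # Sample I: ttbar-enhanced
    elif flavor_score <= flavor_cut:
        offset = 1  # Sample II: mainly Z + cc and Z + light-flavor jets
    else:
        offset = 2  # Sample III: enriched in b jets
    # The signal discriminant score is shifted into the interval for its sample.
    return offset + signal_score
```

With these conventions the combined distribution spans [0, 3], and each unit interval holds the signal discriminant shape for one of the three samples.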
We evaluate several systematic uncertainties on the background and signal events. A large source of systematic uncertainty arises from the cross section values used in the normalization of events: 40%, 10%, 6%, and 5%, for Z+HF [32], $t\bar{t}$, diboson, and ZH simulated events, respectively. An uncertainty of (1, 2, 5)% is applied to the (ST, LJP, DT) ZH samples after measuring changes in acceptance using simulated events with more or fewer particles radiated by the incoming and outgoing partons. The mistag prediction is measured using data, and carries an uncertainty ranging from 13.5% (ST) to 28 ing for uncertainty in the measurement of integrated luminosity. The trigger model applied to simulated events requires a 5% normalization uncertainty. We also apply uncertainties on the lepton reconstruction efficiency and energy measurement of 1% and 1.5%, respectively. For muons (electrons), we measure a 5% (50%) uncertainty on the normalization of the remaining background processes, based on differences in the rates of events containing same-charge and opposite-charge lepton pairs and in the rates of jets misidentified as electrons.
In addition, some sources of uncertainty also include shape variations, which account for the migration of events in the final signal discriminant distributions when the shape-defining quantities are fluctuated within their uncertainties. These include uncertainties on the jet energies [33] as well as on the expected rate of Z + mistag events.
Comparing the observed data to our background prediction including uncertainties, we do not find any evidence of a ZH signal. We set upper limits on the ZH production cross section times $H \to b\bar{b}$ branching ratio using a Bayesian algorithm [34], assuming a uniform prior on the signal rate. We do this by performing simulated experiments, each with a pseudo-dataset generated by randomly varying the normalizations of background processes within their respective statistical and systematic uncertainties, taking into account all background expectations in the absence of a signal. Each simulated experiment produces an upper limit on the ZH production cross section. The median of the 95% CL upper limits from the simulated experiments is taken to be the expected 95% CL upper limit of the analysis. We define the 1-sigma and 2-sigma deviations on the expected limit as the bounds which contain 68.3% and 95.5%, respectively, of the simulated experiment results. The observed data distribution is used to set the observed limit in a similar fashion. These limits are shown graphically, along with the 1-sigma and 2-sigma ranges, in Fig. 2. We find that the observed limit is in good agreement with the expected limit for no signal, within the 1-sigma range across all Higgs mass hypotheses.
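The logic of the expected-limit procedure can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of Ref. [34]: it assumes a single counting bin instead of the binned discriminant distributions, a Poisson likelihood with a uniform prior on the signal yield, and a single Gaussian background normalization uncertainty that enters only the pseudo-data generation (it is not marginalized in the posterior). All function names and numerical inputs are hypothetical.

```python
import math
import random
import statistics


def upper_limit(n_obs: int, b: float, cl: float = 0.95,
                s_max: float = 40.0, steps: int = 2000) -> float:
    """Bayesian upper limit on the signal yield s for a flat prior on s >= 0."""
    ds = s_max / steps
    grid = [i * ds for i in range(steps + 1)]
    # Log of the Poisson likelihood in s; the n! term is constant and dropped.
    log_post = [n_obs * math.log(s + b) - (s + b) for s in grid]
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]
    total = sum(post)
    cumulative = 0.0
    for s, p in zip(grid, post):
        cumulative += p
        if cumulative >= cl * total:
            return s
    return s_max


def sample_poisson(rng: random.Random, mean: float) -> int:
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def expected_limit(b: float, rel_syst: float,
                   n_pseudo: int = 200, seed: int = 1) -> float:
    """Median upper limit over background-only pseudo-experiments."""
    rng = random.Random(seed)
    limits = []
    for _ in range(n_pseudo):
        # Vary the background normalization within its systematic uncertainty.
        b_varied = max(1e-6, rng.gauss(b, rel_syst * b))
        # Statistical fluctuation of the pseudo-dataset, with no signal.
        n_pseudo_obs = sample_poisson(rng, b_varied)
        limits.append(upper_limit(n_pseudo_obs, b))
    return statistics.median(limits)
```

In this sketch the limit is expressed as a signal event yield; dividing by the yield predicted by the standard model would express it as a multiple of the predicted value, as in the quoted results. The 1-sigma and 2-sigma bands would follow from the corresponding quantiles of the same list of pseudo-experiment limits.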
In conclusion, we have performed a search for the standard model Higgs boson in the process $ZH \to \ell^+\ell^- b\bar{b}$. The sensitivity of this analysis has improved due to several new multivariate techniques, including multivariate lepton identification, the use of NNs to obtain trigger efficiencies for simulated events, and a novel multi-stage discriminant approach used to enhance signal discrimination. We observe no significant excess and set an upper limit on the ZH production cross section times $H \to b\bar{b}$ branching ratio. We expect (observe) a limit of 3.9 (4.8) times the standard model predicted value, for a Higgs boson with mass $m_H = 115$ GeV/$c^2$, at the 95% CL. The novel techniques presented here improve the sensitivity of the analysis by ∼25% above the gain expected from the ∼85% larger dataset.