Search for Single Top Quark Production at DZero Using Neural Networks

We present a search for electroweak production of single top quarks in ~90 pb^-1 of data collected with the DZero detector at the Fermilab Tevatron collider. Using arrays of neural networks to separate signals from backgrounds, we set upper limits on the cross sections of 17 pb for the s-channel process ppbar->tb+X, and 22 pb for the t-channel process ppbar->tqb+X, both at the 95% confidence level.

According to the standard model, top quarks can be produced at the Tevatron $p\bar{p}$ collider via two mechanisms. One involves a virtual gluon that decays via the strong interaction into a $t\bar{t}$ pair. This mode has been observed by the CDF and DØ collaborations [1,2]. The second mechanism involves the electroweak production of a single top quark at a $Wtb$ vertex, where $W$ and $b$ refer to the W boson and b quark. There are three ways of producing a single top quark: an s-channel process $q'\bar{q} \to t\bar{b}$, a t-channel mode $q'g \to tq\bar{b}$, and a final state generated via both the s- and t-channels, $bg \to tW$. For a top quark of mass 174.3 ± 5.1 GeV [3], the predicted cross sections are 0.75 ± 0.12 pb for tb production [4] and 1.47 ± 0.22 pb for tqb production [5], both calculated at next-to-leading-order precision with $\sqrt{s} = 1.8$ TeV. For tW production, the cross section is 0.15 pb at leading order [6]. The errors on the cross sections include contributions from the choice of parton distribution functions (PDF), the uncertainty on the gluon distribution, the choice of scale, and the experimental error on the top-quark mass measurement. In this Letter, we use the notation "tb" to refer to both $t\bar{b}$ and the charge-conjugate process $\bar{t}b$, and "tqb" to refer to both $tq\bar{b}$ and $\bar{t}\bar{q}b$.
The DØ collaboration recently published its first results on the production of single top quarks [7]. The search used a classical selection method to optimize the signal significance, based on the expected kinematic properties of the events. The 95% confidence level upper limits on the cross sections were determined to be 39 pb for the s-channel process and 58 pb for the t-channel process. The events contained an isolated electron or muon [8], missing transverse energy, and jets. At least one jet in each event was required to contain a "tagging" muon [9], used as an indication that the jet originated from the hadronization of a b quark. We have now developed a new and more powerful technique using arrays of neural networks that allow us to utilize the far more numerous untagged events in the search, as well as to improve the sensitivity for tagged events. This Letter describes the new method of event selection and presents significantly improved upper limits on the single-top-quark production cross sections.
The DØ detector [10] in Run 1 (1992-1996) had three major components: a drift-chamber-based central tracking system that included a transition radiation detector, a uranium/liquid-argon calorimeter with a central module (CC) and two end calorimeters (EC), and an outer muon spectrometer. For the electron channel, we use 91.9 ± 4.1 pb$^{-1}$ of data collected with a trigger that required an electromagnetic (EM) energy cluster in the calorimeter, a jet, and missing transverse energy ($\not\!E_T$). For events passing the final selection criteria, the efficiency of this trigger is (90-99)%, depending on the location of the EM cluster in the calorimeter and on the presence of a tagging muon. In the muon channel, we use 88.0 ± 3.9 pb$^{-1}$ of data acquired with several triggers that required either $\not\!E_T$, or a muon and a jet. The combined efficiency of these triggers is (92-98)%. A third data sample, obtained with a trigger requiring just three jets, is used for measuring one of the backgrounds. Since the multijet cross section is very large, this trigger was prescaled, and we have 0.8 pb$^{-1}$ of such data. Each of the three samples contains approximately one million events. We reconstruct the events offline by applying the same criteria to identify electrons, jets, and isolated and tagging muons, as described in Ref. [7].
Single-top-quark events have a readily identifiable final-state topology. The top quark decays to a W boson and a b quark. The W boson decays to a central (i.e., low pseudorapidity |η| [11]), isolated electron or muon with high transverse energy and momentum ($E_T$, $p_T$), and a central, high-$E_T$ neutrino. We infer the presence of the neutrino from the vector imbalance of $E_T$ in the event. For an s-channel tb event, there are two central, high-$E_T$ b jets, whereas for a t-channel tqb event, there is only one such b jet, plus a forward, light-quark jet, and a central low-$E_T$ b jet. About 11% of the time [12], one of the b jets contains a tagging muon. Unfortunately, events like this are easily mimicked by many background processes. Before applying the initial selection criteria, the largest background in the electron channel comes from multijet events in which a jet is misidentified as an electron. We call this false isolated-electron background "mis-ID e." In the muon channel, the dominant component of the background is from multijet production with a coincident cosmic ray or beam-halo particle misidentified as an isolated muon. We call this false isolated-muon background "cosmic"; it is not included in the background model. Other backgrounds such as W+jets, $t\bar{t}$ pairs, and $b\bar{b}$ pairs, contribute at the few percent level. The $b\bar{b}$ background affects only the muon channel, when a muon from a b decay is misidentified as isolated. We call this false-isolated muon background "mis-ID µ."

The analysis starts with the simple selection criteria listed in Table I. The requirements are determined by comparing distributions for the summed backgrounds with those of the data before most of the initial selection criteria are applied, and rejecting regions where there is poor agreement between data and the model for the sum of signals and backgrounds. The backgrounds are modeled using data weighted to represent the mis-ID e or mis-ID µ backgrounds, and six samples of Monte Carlo (MC) events: $t\bar{t}$, $Wb\bar{b}$, $Wc\bar{c}$ (including Wcs and Wss), Wjj (where j represents u, d, or g), WW, and WZ production.
The final requirement in Table I on the µ+jets/tag decay channel is a cutoff on the output of a neural network trained to reject events in which a cosmic ray has been misidentified as an isolated muon. The network uses the mlpfit package [13], and has seven input nodes, 15 hidden nodes, and one output node. The input variables are described in Table II. The pseudo-three-dimensional impact parameter is defined as

$IP_{3d} = \sqrt{IP_{BV}^2 + IP_{NB}^2}$

where "BV" stands for "bend view" and "NB" for "non-bend" view for the muon trajectory through the spectrometer toroids.

∆φ(µ, tag µ): Opening angle in the transverse plane between the high-$p_T$ muon and tagging muon
$p_T$(µ), $p_T$(tag µ): Muon transverse momentum
$z_{vert}$(µ), $z_{vert}$(tag µ): Position of the primary vertex projected from the muon's track in the calorimeter
$IP_{3d}$(µ), $IP_{3d}$(tag µ): Pseudo-3d impact parameter of the muon's trajectory relative to the beam axis

The cosmic-ray-rejection network (NN$_{cosmic}$) is trained on a background sample of 575 data events chosen to contain cosmic rays by requiring ∆φ(µ, tag µ) > 2.4, and on an equal-sized cosmic-ray-free signal sample of MC tb, tqb, $t\bar{t}$, and W+jets events. In the background training sample, the first muon passes either the isolated or nonisolated identification criteria. The results of the training are shown in Fig. 1(a). We accept events if the value of the network output is greater than 0.79, a cutoff designed to maximize rejection of the cosmic-ray component of the background. This selects 73% of the s-channel single-top-quark acceptance, 76% of the t-channel acceptance, 47% of the background included in the model (i.e., not including the rejected cosmic-ray component), and 28% of the data. The result of applying the network to the data and to the model for signal+background is shown in Fig. 1(b).
After applying the initial selection criteria, the combined acceptance for the s-channel tb signal is 3.8%; for the t-channel tqb signal it is 3.6%. The signal-to-background (S:B) ratios range from 1:40 to 1:470, depending on the production and decay channels. In what follows, we improve on the S:B ratios significantly by using arrays of neural networks that reject background while retaining adequate signal acceptance. The networks are trained to recognize detailed features of the signals and backgrounds, including correlations among the kinematic variables. They thereby provide superior separation of signal from background relative to classical selection techniques, where no correlations are taken into account.
Without the neural networks, no useful information about single-top-quark production can be obtained from the untagged events because the S:B ratio is so poor. Using the neural networks allows the untagged channels to provide as much sensitivity as the tagged ones. For training the neural networks, we divide the background event samples into five sets. There are three sets for W+jets events: "Wjj," "Wbb" (for the combined $Wb\bar{b}$ and $Wc\bar{c}$ MC samples), and "WW" (for the combined WW and WZ MC sets); and a set each for the $t\bar{t}$ and misidentified-lepton ("mis-ID l") backgrounds. Each analysis has five parallel networks, one to reject each type of background. There are four separate analyses: tb → e + jets, tb → µ + jets, tqb → e + jets, and tqb → µ + jets. We use the same networks for both untagged and tagged events combined, but choose different cutoffs on the output variables, depending on whether there is a tagging muon. This provides eight sets of results from 20 neural networks and 40 cutoffs on the outputs.
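The counting in the preceding paragraph can be checked with a short enumeration. This is only a bookkeeping sketch: the string labels are shorthand chosen here for illustration and are not identifiers from the analysis software.

```python
from itertools import product

# Labels below are illustrative shorthand for the categories named in the text.
production_modes = ["tb", "tqb"]
decay_channels = ["e+jets", "mu+jets"]
background_sets = ["Wjj", "Wbb", "WW", "ttbar", "mis-ID l"]
tag_states = ["untagged", "tagged"]

# One analysis per (production mode, decay channel) pair.
analyses = list(product(production_modes, decay_channels))

# Each analysis has one network per background set.
networks = list(product(analyses, background_sets))

# Each network output gets a separate cutoff for untagged and tagged events.
cutoffs = list(product(networks, tag_states))

# Results are reported per analysis and per tag state.
result_sets = list(product(analyses, tag_states))

print(len(analyses), len(networks), len(cutoffs), len(result_sets))
```

The four lengths correspond to the four analyses, 20 networks, 40 cutoffs, and eight sets of results quoted above.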
We use the package mlpfit [13], which has multi-layered perceptrons with a feed-forward structure and back-propagation of the errors for efficient computation. The networks are trained on samples of 2000-9000 signal MC events and background sets of the same size, using the "hybrid linear Broyden-Fletcher-Goldfarb-Shanno" learning method [14]. The performance of a neural network of course depends on the choice of input variables, which should be selected to provide maximal discrimination between signal and background, and should reflect as many different properties of the events as possible. However, too many input variables can worsen performance if the additional information is weak compared to the noise they introduce into the analysis. The variables chosen as inputs to the neural networks are defined in Table III.
The "best jet" in an event is the one that, when combined with the isolated lepton and the neutrino, generates an invariant mass closest to that of the top quark (174.3 GeV). The momentum components of the neutrino are derived from $\not\!E_T$ by assuming that it and the lepton arise from the decay of a W boson. Of the two possible solutions to the quadratic relation for the W mass, the one with the smallest absolute value of the neutrino's longitudinal momentum is chosen. The invariant mass variable $M_{best}$ in Table III uses the best jet in its definition. Several variables use all the jets in the event ("alljets"). Those which exclude the best jet are denoted with a "prime".
We optimize the performance of each network by choosing the number of hidden nodes that minimizes the network's error function. The numbers of nodes are given in Table IV. Figure 2 illustrates the outputs of one set of five networks, after training on t-channel single-top-quark signals and each of the five background sets in the combined untagged and tagged electron+jets decay channels. The least amount of discrimination is obtained for the Wjj events, which is unfortunate since this process has a large cross section. Also, a significant fraction of the tqb signal cannot be differentiated from $t\bar{t}$ background. (This is not the case for the lower jet-multiplicity s-channel tb events.) The best separation of signal from background is obtained for events with a misidentified electron.
The outputs of the neural networks for the tqb search in the untagged electron+jets decay channel are shown in Fig. 3. They are obtained by passing all the signal and background events, and then the data, through each network. The cutoffs on the outputs are simultaneously chosen by minimizing the expected limit on the cross section in this channel.
The signal acceptances and numbers of events predicted to remain in the data after the initial event selection criteria are shown in Table V. Table VI shows the acceptances and event yields after the neural network selections. We measure the signal acceptances using MC samples of s-channel and t-channel single-top-quark events from the comphep event generator [15], with the pythia package [16] used to simulate fragmentation, initial-state and final-state radiation, the underlying event, and leptonic decays of the W boson. For all event samples, the PDF is CTEQ3M [17]. The MC events are processed through a detector simulation program based on the geant package [18] and a trigger simulation, and are then reconstructed. We apply all selection criteria directly to the reconstructed MC events, except for several particle identification requirements that are taken into account using factors obtained from other DØ data.
We calculate the acceptance for $t\bar{t}$ pairs and for the five subprocesses for W+jets in a manner similar to that used for signal, and then convert to a number of events using the integrated luminosity for each channel and the appropriate cross section. The $t\bar{t}$ background is modeled using herwig [19]. The $Wb\bar{b}$, $Wc\bar{c}$, and Wjj processes use comphep, followed by pythia, and the diboson processes are from pythia. The cross sections are DØ's measured value for $t\bar{t}$ [20], leading-order values for $Wb\bar{b}$, $Wc\bar{c}$, and Wjj [21], and next-to-leading-order values for WW [22] and WZ [23].
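The conversion from acceptance to an expected number of events is the product of acceptance, cross section, and integrated luminosity. The sketch below applies it to the s-channel signal using numbers quoted elsewhere in this Letter (3.8% combined acceptance after initial selection, 0.75 pb, and roughly 90 pb$^{-1}$). Treating the two decay channels as a single sample with a common luminosity is a simplification made here for illustration, and the function name is hypothetical.

```python
def expected_events(acceptance, cross_section_pb, luminosity_inv_pb):
    """Expected yield N = acceptance * cross section * integrated luminosity.

    acceptance is a fraction of the total cross section (0.038 for 3.8%).
    """
    return acceptance * cross_section_pb * luminosity_inv_pb

# Rough illustration: s-channel tb signal after the initial selection,
# using a single approximate luminosity for both decay channels.
n_tb = expected_events(0.038, 0.75, 90.0)
print(f"{n_tb:.2f} expected tb events")
```

The same function applies to each MC-modeled background once its acceptance and cross section are specified per channel.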

FIG. 2. Results of training the five neural networks used to separate t-channel tqb signal from background in the untagged and tagged electron+jets decay channels. For each plot, the lighter curves are for background and the darker curves for the signal. The solid curves are for untagged events, and the dashed curves for tagged ones.

In the electron channel, the misidentified-electron background is measured using multijet data. For each jet that passes the electron $E_T$ and |η| requirements, the events are weighted by the probability that a jet mimics an electron. These probabilities are determined from the same multijet sample, but for $\not\!E_T$ < 15 GeV, and are found to be (0.0231 ± 0.0039)% (CC), and (0.0850 ± 0.0118)% (EC) in untagged events, and (0.0154 ± 0.0019)% (CC), and (0.0612 ± 0.0057)% (EC) in tagged events. The probabilities are independent of the jet $E_T$ for $E_T$ > 20 GeV. We normalize the integrated luminosity of the multijet sample to match the data sample used in the search for signal, and correct for a small difference in trigger efficiency between the two samples.
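The jet-by-jet weighting described above can be sketched as follows, using the central values of the quoted probabilities. The event representation (a tag state plus the calorimeter region of each qualifying jet) and the single `scale` factor standing in for the luminosity normalization and trigger-efficiency correction are assumptions made for illustration.

```python
# Central values of the quoted jet-to-electron misidentification
# probabilities, converted from percent to fractions.
FAKE_PROBABILITY = {
    ("untagged", "CC"): 0.0231e-2,
    ("untagged", "EC"): 0.0850e-2,
    ("tagged", "CC"): 0.0154e-2,
    ("tagged", "EC"): 0.0612e-2,
}

def mis_id_e_yield(events, scale=1.0):
    """Sum the fake-electron weights over multijet events.

    events: iterable of (tag_state, regions), where regions lists "CC" or "EC"
            for every jet passing the electron E_T and |eta| requirements
            (a hypothetical representation).
    scale:  assumed overall factor for luminosity normalization and the
            trigger-efficiency correction.
    """
    total = 0.0
    for tag_state, regions in events:
        for region in regions:
            total += FAKE_PROBABILITY[(tag_state, region)]
    return scale * total

# Toy input: two multijet events, not real data.
toy_events = [("untagged", ["CC", "EC"]), ("tagged", ["CC"])]
toy_yield = mis_id_e_yield(toy_events)
```

An event with several qualifying jets contributes once per jet, which matches the statement that the weighting is applied for each jet that passes the requirements.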
In the muon channel, the misidentified-muon background is from $b\bar{b}$ events; it is produced when one or both of the b quarks decays semileptonically to a muon, and one muon is misidentified as isolated. There are two ways such events can mimic signals. First, one of the b jets may not be reconstructed, and its muon can therefore appear to be isolated. Second, a muon can be emitted wide of its jet and be reconstructed as an isolated muon. The background from each source is measured using data collected with the same triggers as used for the muon signal. The events are required to pass all selection criteria, except that the muon, which otherwise passes the isolated muon requirements, is within a jet. Events with truly isolated muons are excluded. Each event is then weighted by the probability that a nonisolated muon is reconstructed as an isolated one. This probability is measured using the same data, but for $\not\!E_T$ < 15 GeV, and is found to be a few percent for each source on average. The probabilities are parametrized as a function of the muon $p_T$; they are higher at low $p_T$ and fall to zero for $p_T$ > 28 GeV. We calculate a weighted average of the two results to obtain the number of mis-ID µ background events.

After applying the neural network selection criteria, the combined acceptances are 0.86% for the tb signal and 0.88% for the tqb signal, with the S:B ratios increased to between 1:9 and 1:99, a factor of 4-8 improvement compared with those after the initial event selections.
We use a Bayesian approach [24] to calculate limits on the cross sections for single-top-quark production in the s-channel and t-channel modes. The inputs are the numbers of observed events, the signal acceptances and backgrounds, and the integrated luminosities. Covariance matrices are used to describe the correlated uncertainties on these quantities. A flat prior is used for the single-top-quark cross section, and a multivariate Gaussian prior for the other quantities. We calculate the likelihood functions in each decay channel and combine them to obtain the following 95% confidence level upper limits:

• $\sigma(p\bar{p} \to tb + X) < 17$ pb
• $\sigma(p\bar{p} \to tqb + X) < 22$ pb
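A heavily simplified sketch of a Bayesian upper limit with a flat prior on the cross section is given below. It treats a single counting channel with a Poisson likelihood and fixed, exactly known background and acceptance times luminosity, so it omits the multivariate Gaussian prior, the covariance matrices, and the combination of channels used in the actual calculation. The function names and the numerical-integration settings are assumptions.

```python
import math

def poisson_likelihood(n_obs, mean):
    """Poisson probability of n_obs events for a given expected mean."""
    if mean <= 0.0:
        return 1.0 if n_obs == 0 else 0.0
    return math.exp(n_obs * math.log(mean) - mean - math.lgamma(n_obs + 1))

def bayesian_upper_limit(n_obs, background, acc_times_lumi,
                         cl=0.95, sigma_max=200.0, steps=20000):
    """Upper limit on the cross section (pb) for a flat prior on sigma >= 0.

    The posterior is proportional to the Poisson likelihood with mean
    sigma * acc_times_lumi + background; it is integrated on a grid with
    the trapezoid rule, truncated at sigma_max.
    """
    step = sigma_max / steps
    sigmas = [i * step for i in range(steps + 1)]
    post = [poisson_likelihood(n_obs, s * acc_times_lumi + background)
            for s in sigmas]
    # Cumulative trapezoid integral of the unnormalized posterior.
    cumulative = [0.0]
    for i in range(1, len(post)):
        cumulative.append(cumulative[-1] + 0.5 * (post[i - 1] + post[i]) * step)
    total = cumulative[-1]
    for sigma, integral in zip(sigmas, cumulative):
        if integral >= cl * total:
            return sigma
    return sigma_max

# Toy check: zero observed events, no background, unit acceptance * luminosity.
toy_limit = bayesian_upper_limit(0, 0.0, 1.0)
```

In the toy case the posterior is a pure exponential, so the 95% limit should sit near $-\ln(0.05) \approx 3.0$ in units of 1/(acceptance × luminosity), which gives a quick sanity check of the integration.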
The contributions of each decay channel to these results are shown in Table VII.

To conclude, we have searched for the electroweak production of single top quarks using a neural-network signal-selection technique. We find no evidence for such production and set upper limits on the cross sections for s-channel production of tb and t-channel production of tqb. The limits are consistent with expectations from the standard model.

FIG. 1. Output from the neural network used to reject cosmic-ray contamination in the tagged muon+jets decay channel: (a) shows the results for the two training samples, and (b) shows the output for the sum of the signals and modeled backgrounds, and the data.
FIG. 3. Outputs from the five neural networks used to separate t-channel tqb signal from background in the untagged electron+jets decay channel. For each plot, the upper curve with error band shows the sum of signal and all backgrounds, the lower curve shows 50 times the expected single-top-quark signal, the shaded circles with error bars show the data after the initial event selection, and the shaded histogram shows the data after all neural network selection criteria have been applied, except for the network shown in that plot.

TABLE I. Initial selection criteria.

TABLE II. Input variables to the cosmic ray neural network.

TABLE IV. Number of input (i), hidden (h), and output (o) nodes (i-h-o) for each of the neural networks.

TABLE V. Signal acceptances (as percentages of the total cross sections) and numbers of events expected after application of initial selection criteria.

TABLE VI. Signal acceptances (as percentages of the total cross sections) and numbers of events expected after application of both the initial and neural network selection criteria.

TABLE VII. The 95% confidence level upper limits on cross sections for the two production modes of single top quarks. Values are in picobarns.