Kinematic Discrimination of tW and tt Productions Using Initial State Radiation

Production of a single top quark provides an excellent opportunity for understanding top quark physics and the Cabibbo-Kobayashi-Maskawa structure of the quark sector in the Standard Model. Although single top production in association with a b-quark was observed at the Tevatron in 2009, single top production in association with a W gauge boson was not observed until 2014 at the LHC, where top quark pair production is the dominant background. Because tW and this background are kinematically similar, it is challenging to find kinematic variables that offer good signal-background separation, which naturally leads to the use of multivariate methods. In this paper, we investigate the kinematic structure of the tW+j channel using the M_T2 and invariant mass variables, and find that tW+j production can be well separated from tt production with high purity, at a low cost in statistics, by exploiting these kinematic correlations.


Introduction
The research program at the Large Hadron Collider (LHC) has been greatly successful: it has not only discovered a new scalar state [1,2] that is consistent with the Higgs boson of the Standard Model (SM), but also rediscovered the SM with great precision. Among the precision studies, the top quark (t) has received particular attention, both in its own right and as a window to new physics. In fact, the LHC, dubbed a "top factory", copiously produces top quark pairs via the strong interaction. Although single top production is mediated by the electroweak interaction, its rate is sizable owing to the large center-of-mass energy, so the LHC also provides an ideal environment to study the single top modes. In the SM, the single top production cross section is proportional to the square of one of the Cabibbo-Kobayashi-Maskawa (CKM) matrix elements, $|V_{tb}|^2$, so that single top channels provide a way to measure this parameter. Beyond this parameter measurement, the cross section is also sensitive to various new phenomena such as fourth-generation models and models with flavor-changing neutral currents [3].
The production of a single top quark through s-channel and t-channel W gauge boson exchange was observed at the 5.0 standard deviation level of significance separately by D0 [4] and by CDF [5], whereas the associated production of a single top quark with a W gauge boson (henceforth denoted by tW) had too small a cross section to be observed at the Tevatron. Nevertheless, the discovery of the tW channel is of great importance as 1) a confirmation of the SM in the top sector, 2) a measurement or cross-check of $|V_{tb}|$, and 3) a possible link to new physics searches such as bottom partners [6,7]. The LHC reached a sufficient production cross section to observe the tW mode [8,9] only five years after the discovery of the s-channel and t-channel single top modes, and the combination of the cross section measurements can be found in Ref. [10]. The ATLAS and CMS collaborations have devoted a lot of effort to developing sophisticated multivariate techniques that exploit the differences in the kinematic distributions between the signal and the backgrounds, i.e., the Boosted Decision Tree (BDT) method for CMS and the Multi-Variate Analysis (MVA) method for ATLAS. Yet no single kinematic variable provides a reasonable separation between the signal and the backgrounds. The signal channel is defined by the process shown in the left panel of Figure 1, while the major background is ordinary top quark pair production ($t\bar{t}$) in which one of the bottom quarks is missed. Although the probability of missing a bottom quark is small, the overwhelming production rate of $t\bar{t}$ gives rise to a sizable background to the signal process. It is interesting to compare the kinematic features of the tW and $t\bar{t}$ systems.
First of all, the bottom quark comes from the decay of a top quark with a W gauge boson for both signal and background processes, i.e., the typical hardness and the directional preference of the bottom quark are similar. Therefore, its kinematic ensemble for both tW and tt, e.g., the distribution in the transverse momentum, tends to be close to each other. An analogous argument is readily applicable to the lepton. For both signal and background, it is emitted from a W boson along with a neutrino, so that the typical hardness and the directional preference are anticipated to be similar. Along the line of this observation, it is not surprising that other variables induced from the momenta of b-quarks and leptons do not show a reasonable performance in separating the signal and background events. In other words, it is rather difficult to find suited kinematic variables that offer good signal-background discrimination.
Given this challenging situation, we propose an alternative strategy based on kinematic variables, which could have expedited the observation of the single top mode associated with a W gauge boson. The main idea can be summarized as follows. We require an additional jet on top of a bottom-tagged jet, two opposite-sign leptons, and (large) missing transverse energy in the final state. The extra jet can be either b-tagged or not, i.e., $2b + \ell^+\ell^- + \not\!\!E_T$ or $1b + 1j + \ell^+\ell^- + \not\!\!E_T$, respectively. For the latter signal region, we carry out exactly the same analysis as for the former, i.e., we treat the additional non-b-tagged jet as if it were a bottom-initiated jet. With this requirement, the background restores the regular dileptonic $t\bar{t}$ event topology.$^{1}$ On the other hand, the signal process comes with a single b-quark at leading order, so higher-order contributions are essential to meet the requirement, i.e., an extra jet must be attached to the leading-order process. An example diagram is shown in the right panel of Figure 1. Unlike the background, tW with an additional jet has an ill-defined event topology because such a jet typically comes from either initial state radiation (ISR) or final state radiation (FSR). We then apply the well-known $M_{T2}$ variable [11][12][13][14] and the conventional invariant mass formed by a bottom quark and a lepton, $m_{b\ell}$. While the background yields upper-bounded distributions in these variables, the signal distributions are expected to stretch beyond the kinematic endpoints of the background, to an extent dictated by the hardness of the extra jet. A large fraction of signal events is therefore expected to survive kinematic cuts on $M_{T2}$ and $m_{b\ell}$, while the background events are significantly suppressed.
The rest of this paper is organized as follows. In the next section, we briefly review the $M_{T2}$ variable, taking dileptonic $t\bar{t}$ as a concrete example. In Sec. 3, we discuss the behavior of $t\bar{t}$ and tW in the $M_{T2}$ and $m_{b\ell}$ variables with the requirement of $1b + 2\ell + \not\!\!E_T$. We then re-examine their behavior in those variables with an additional jet requirement in Sec. 4. Sec. 5 is reserved for our discussion and outlook.
2 A review on the $M_{T2}$ variable

The $M_{T2}$ and $m_{b\ell}$ variables are well motivated, especially for a cascade decay of a heavy particle that proceeds through two-step two-body decays such as the top decay, and it therefore makes sense to investigate them for the tW case. While $m_{b\ell}$ is (relatively) well known, the $M_{T2}$ variable has non-trivial and less familiar features. We therefore provide a brief review of the $M_{T2}$ variable that is employed in the analyses of the following sections. For concreteness, we take the event topology defined by pair-produced top quarks that subsequently decay dileptonically (see also Figure 2):

$$ t\bar{t} \to (b\,W^+)(\bar{b}\,W^-) \to (b\,\ell^+\nu)(\bar{b}\,\ell^-\bar{\nu}). \qquad (2.1) $$

Solely for convenience, we take the decay sequence initiated by the top quark as the first decay side and that initiated by the anti-top quark as the second decay side. The $M_{T2}$ variable was originally proposed as a simple generalization of the well-known transverse mass to the case where each of the pair-produced heavier particles decays into an invisible particle along with a visible state [11][12][13][14]. Since the total missing transverse momentum $\not\!\mathbf{P}_T$ is shared by the two invisible particles, $M_{T2}$ is formally defined as the minimum, over the transverse components of the invisible momenta (denoted by $\mathbf{q}_T^{(1)}$ and $\mathbf{q}_T^{(2)}$), of the larger of the two transverse masses ($M_T^{(1)}$ and $M_T^{(2)}$) of the two decay chains, subject to the $\not\!\mathbf{P}_T$ constraint, i.e., the total sum of the transverse momenta should identically vanish:

$$ M_{T2}(\tilde{m}) \equiv \min_{\mathbf{q}_T^{(1)} + \mathbf{q}_T^{(2)} = \not\mathbf{P}_T} \left[ \max\left\{ M_T^{(1)}\big(\mathbf{q}_T^{(1)}; \tilde{m}\big),\; M_T^{(2)}\big(\mathbf{q}_T^{(2)}; \tilde{m}\big) \right\} \right], \qquad (2.2) $$

where $\tilde{m}$ denotes the hypothetical (test) mass parameter for the invisible particles and the superscripts indicate the associated decay side. When more than one visible particle is involved in each decay chain, one can define $M_{T2}$ in various subsystems [13], which can be further categorized into symmetric and asymmetric subsystems according to whether or not both $M_T^{(i)}$ ($i = 1, 2$) are constructed in the same fashion.
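To make the min-max construction of $M_{T2}$ concrete, the following is a minimal brute-force sketch, not the algorithm used in the analysis. It scans the trial invisible momentum of the first decay side on a grid that is repeatedly refined around the best point, with the second side fixed by the missing-momentum constraint. The function names, the grid parameters, and the initial search box are illustrative assumptions; dedicated, much faster algorithms exist and would be preferred in practice.

```python
import math

def transverse_mass(m_vis, p_vis, m_inv, q):
    """Transverse mass of one decay side for a trial invisible momentum q.

    m_vis, p_vis: invariant mass and transverse momentum (px, py) of the visible system.
    m_inv, q:     test mass and trial transverse momentum (qx, qy) of the invisible particle.
    """
    et_vis = math.sqrt(m_vis**2 + p_vis[0]**2 + p_vis[1]**2)
    et_inv = math.sqrt(m_inv**2 + q[0]**2 + q[1]**2)
    mt_sq = m_vis**2 + m_inv**2 + 2.0 * (et_vis * et_inv - p_vis[0] * q[0] - p_vis[1] * q[1])
    return math.sqrt(max(mt_sq, 0.0))

def mt2(vis1, vis2, p_miss, m_test1=0.0, m_test2=0.0, n_grid=41, n_zoom=12):
    """Brute-force M_T2: minimize max(M_T^(1), M_T^(2)) over q1, with q2 = p_miss - q1.

    vis1, vis2: (mass, (px, py)) of the visible system on each decay side.
    The initial search box is a heuristic (an assumption of this sketch).
    """
    (m1, p1), (m2, p2) = vis1, vis2
    half = (math.hypot(*p1) + math.hypot(*p2) + math.hypot(*p_miss)) or 1.0
    center = (0.0, 0.0)
    best_val = float("inf")
    for _ in range(n_zoom):
        step = 2.0 * half / (n_grid - 1)
        best_q = center
        for i in range(n_grid):
            qx = center[0] - half + i * step
            for j in range(n_grid):
                qy = center[1] - half + j * step
                q2 = (p_miss[0] - qx, p_miss[1] - qy)
                val = max(transverse_mass(m1, p1, m_test1, (qx, qy)),
                          transverse_mass(m2, p2, m_test2, q2))
                if val < best_val:
                    best_val, best_q = val, (qx, qy)
        # Zoom in around the best point found so far.
        center, half = best_q, 2.0 * step
    return best_val

# Two massless visible particles with identical momenta, no upstream momentum:
value = mt2((0.0, (50.0, 0.0)), (0.0, (50.0, 0.0)), p_miss=(-100.0, 0.0))
```

Because the objective is the maximum of two convex functions of the trial momentum, a refined grid search is a reasonable, if slow, way to approximate the minimum. Asymmetric subsystems are covered by passing different test masses for the two sides.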
For the case of the $t\bar{t}$ system, there are three symmetric subsystems, henceforth denoted by the $(bb)$, $(\ell\ell)$, and $(b\ell b\ell)$ subsystems according to the visible particles associated with the subsystem under consideration. We explicitly delineate those three subsystems in Figure 2, and the operational difference among them is summarized below:
• For the $(b\ell b\ell)$ subsystem, the transverse masses for the top quarks are minimized with the neutrinos considered as invisible particles.
• For the $(\ell\ell)$ subsystem, the transverse masses for the $W^\pm$ are minimized with the neutrinos considered as invisible particles. The visible momenta of the bottom quarks are considered as upstream momenta.
• For the $(bb)$ subsystem, the transverse masses for the top quarks are minimized with the $W^\pm$ considered as invisible particles. The visible momenta of the leptons are considered as downstream momenta, so they are treated as invisible.
Since the neutrino plays the role of the invisible particle in the $(b\ell b\ell)$ and $(\ell\ell)$ subsystems, the relevant test mass is typically taken to be 0 GeV, as per the SM neutrino mass. Analogously, for the $(bb)$ subsystem, the relevant test mass is typically taken to be 80 GeV, as per the mass of the W gauge boson. Similar constructions can be performed for the asymmetric subsystems [15]. In this case, there arise three different subsystems denoted by $(b\ell\ell)$, $(b\ell b)$, and $(b\ell)$, again named after the visible particles associated with the subsystem of interest. The corresponding subsystems are explicitly delineated in Figure 3, and the operational difference among them is explained below:
• For the $(b\ell\ell)$ subsystem, the transverse masses for the top quark on one decay side and the $W^\pm$ on the other decay side are minimized with the neutrinos considered as invisible particles. The visible momentum of the remaining bottom quark is considered as upstream momentum.
• For the $(b\ell b)$ subsystem, the transverse masses for the top quarks are minimized with the neutrino on one decay side and the $W^\pm$ on the other decay side considered as invisible particles. The visible momentum of the remaining lepton is considered as downstream momentum, so it is treated as invisible.
• For the $(b\ell)$ subsystem, the transverse masses for the top quark on one decay side and the $W^\pm$ on the other decay side are minimized with the neutrino on one decay side and the $W^\pm$ on the other decay side considered as invisible particles. The visible momenta of the remaining bottom quark and lepton are considered as upstream and downstream momenta, respectively, the latter of which is treated as invisible.
Since the neutrino is the invisible particle on both decay sides for the $(b\ell\ell)$ subsystem, the relevant test mass is typically taken to be 0 GeV, as per the SM neutrino mass. In contrast, in the other two subsystems two different particle species play the role of invisible particles, so two different test masses are imposed accordingly, i.e., 0 GeV and 80 GeV as per the masses of the SM neutrino and the W gauge boson, depending on the subsystem of interest. One noteworthy fact is that the associated $M_{T2}$ distributions are bounded above by the mass of the decaying particle.$^{2}$ In fact, the analytic expressions for the kinematic endpoints can be written in terms of the mass parameters involved in the decay process [11][12][13]. Interestingly, if the test masses equal the true masses of the invisible particles in the relevant subsystem, the maximum $M_{T2}$ value equals the heavier of the actual masses of the particles whose transverse masses are minimized. For our $t\bar{t}$ example, subsystems $(b\ell b\ell)$, $(bb)$, $(b\ell\ell)$, $(b\ell b)$, and $(b\ell)$ simply return the top quark mass, while subsystem $(\ell\ell)$ returns the W mass, if each of the test masses is imposed correspondingly.
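The six subsystems, their test masses, and the expected $t\bar{t}$ endpoints described above can be collected in a small lookup table. The sketch below is only a bookkeeping aid: the key names and the helper function are hypothetical, the W mass is rounded to the 80 GeV test-mass value quoted in the text, and the top mass is the 173 GeV input used in the simulation.

```python
M_TOP = 173.0  # GeV, input top mass of the simulation
M_W = 80.0     # GeV, W test-mass value quoted in the text
M_NU = 0.0     # GeV, neutrino test mass

# (test mass on side 1, test mass on side 2) and expected ttbar endpoint per subsystem.
SUBSYSTEMS = {
    "blbl": {"test_masses": (M_NU, M_NU), "endpoint": M_TOP},  # symmetric
    "ll":   {"test_masses": (M_NU, M_NU), "endpoint": M_W},    # symmetric
    "bb":   {"test_masses": (M_W, M_W),   "endpoint": M_TOP},  # symmetric
    "bll":  {"test_masses": (M_NU, M_NU), "endpoint": M_TOP},  # asymmetric
    "blb":  {"test_masses": (M_NU, M_W),  "endpoint": M_TOP},  # asymmetric
    "bl":   {"test_masses": (M_W, M_NU),  "endpoint": M_TOP},  # asymmetric
}

def exceeds_endpoint(subsystem, mt2_value):
    """True if an M_T2 value lies beyond the expected ttbar endpoint."""
    return mt2_value > SUBSYSTEMS[subsystem]["endpoint"]
```

For the asymmetric entries, the first test mass refers to the decay side that contains a lepton in its visible state and the second to the other side.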

tW at the leading order: existing analyses
We first discuss the collider signatures of the dileptonic tW channel at leading order, together with a brief review of the corresponding experimental measurements conducted by the CMS and ATLAS collaborations [8,9]. For more concrete discussions later on, Monte Carlo event samples of $t\bar{t}$ and tW including realistic effects such as detector resolution have been prepared. For both signal and background, the parton-level events at leading order are generated by MadGraph5_aMC@NLO [16] with the NNPDF23 parton distribution functions [17], which are the default of MadGraph5_aMC@NLO. The output of the parton-level event generator is subsequently fed to Pythia6.4 [18] for showering and hadronization, and then to Delphes3 [19] for detector effects. All simulations are done for a proton-proton collider at $\sqrt{s} = 8$ TeV with an input top mass of 173 GeV. Given the final state defined by dileptonic tW at leading order, i.e., $b\,\ell^+\ell^- + \not\!\!E_T$ with $\ell$ being either $e$ or $\mu$, several SM processes can give rise to the same visible final state. Among them, dileptonic $t\bar{t}$ in which one of the b-quarks is lost turns out to be the dominant background, and we therefore focus on the comparison between the two processes throughout this paper. To be left mostly with tW and $t\bar{t}$ events, we closely follow the event selection scheme employed in Ref. [8], among which the key criteria are enumerated below:
• $N_\ell = 2$ with opposite electric charges, $p_T^{e,\mu} > 10$ GeV and $|\eta^{e(\mu)}| < 2.5\,(2.4)$, (3.1)
• $\not\!\!E_T > 50$ GeV for the same-flavor channels,
• $m_{\ell\ell} > 20$ GeV and $|m_{\ell\ell} - m_Z| > 10$ GeV,
where $N_\ell$ and $N_{j(b)}$ denote the number of selected leptons and jets (b-tagged jets), respectively, and $\not\!\!E_T$ is defined as $|\sum_i \mathbf{p}_T^{\,i}| = |-\not\!\mathbf{P}_T|$ with $i$ running over all detected particle species. Jets are formed by the anti-$k_t$ algorithm [21] with a radius parameter $R = 0.5$, and the b-tagging efficiency is fixed to 70%, while light-quark jets are mis-tagged at a rate of 1% [8].
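The lepton-level criteria listed above can be sketched as a simple event filter. This is an illustration under stated assumptions rather than the selection code of Ref. [8]: the event layout (dictionaries with `flavor`, `charge`, `pt`, `eta`, `phi`), the function names, and the Z mass value are hypothetical choices, the leptons are treated as massless, and the Z-mass veto is applied to all flavor combinations because the text does not restrict it.

```python
import math

M_Z = 91.19  # GeV; assumed reference value, not quoted in the text

def dilepton_mass(l1, l2):
    """Invariant mass of two massless leptons given (pt, eta, phi)."""
    m_sq = 2.0 * l1["pt"] * l2["pt"] * (
        math.cosh(l1["eta"] - l2["eta"]) - math.cos(l1["phi"] - l2["phi"]))
    return math.sqrt(max(m_sq, 0.0))

def passes_dilepton_selection(leptons, met):
    """Apply the lepton, missing-energy and dilepton-mass criteria quoted in the text."""
    eta_max = {"e": 2.5, "mu": 2.4}
    good = [l for l in leptons
            if l["pt"] > 10.0 and abs(l["eta"]) < eta_max[l["flavor"]]]
    if len(good) != 2:                                  # exactly two selected leptons
        return False
    l1, l2 = good
    if l1["charge"] * l2["charge"] >= 0:                # opposite electric charges
        return False
    if l1["flavor"] == l2["flavor"] and met <= 50.0:    # MET cut, same-flavor only
        return False
    m_ll = dilepton_mass(l1, l2)
    return m_ll > 20.0 and abs(m_ll - M_Z) > 10.0       # low-mass and Z vetoes

electron = {"flavor": "e", "charge": -1, "pt": 40.0, "eta": 0.5, "phi": 0.0}
muon = {"flavor": "mu", "charge": +1, "pt": 30.0, "eta": -0.3, "phi": 2.5}
accepted = passes_dilepton_selection([electron, muon], met=30.0)
```

The jet and b-tag multiplicity requirements are left out of this sketch.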
Having the events passing the above-given selection cuts, we first show that conventional kinematic variables such as M  Figure 4. Speaking of the M T 2 variables in various subsystems, we see that both of tt (blue dashed histograms) and tW (red solid histograms) develop similar distributions in them. For the case of tt, the distribution in each subsystem is nothing but the one anticipated in the respective subsystem, and therefore, the associated kinematic endpoint is expected to be the same as the W gauge boson mass (M T 2 of subsystem ( )) or the top quark mass (M T 2 of subsystems (b ) and (b )) with test masses imposed correspondingly as mentioned earlier [13]. The theoretical endpoints are indicated by black dashed lines, and we see that most of tt events are populated below them as expected. The small overflow in the M T 2 distributions for the ( ) and (b ) subsystems is due to various sources such as mis-measurement of E / T and parton showering/fragmentation (see, for example, Ref. [22] for more systematic study on the effect of those sources). On the other hand, for the M T 2 distribution in the (b ) subsystem, it is hard to find out kinematic configurations corresponding to the relevant endpoint so that the distribution does not reach the expected endpoint. When it comes to the signal process, in some sense, the final state of tW does not differ from that of tt. For example of the ( ) subsystem, while the M T 2 for tW can be interpreted as the one applied to the situation where W gauge bosons are pair-produced with a non-zero transverse upstream momentum given by a bottom quark, the net upstream momentum for tt is defined by a vector sum of the transverse momenta of two bottom quarks. Therefore, both distributions are expected to be upper-bounded by the same endpoint as well as to develop similar shapes up to the details of upstream momenta. A similar analogy is relevant to M T 2 for the other two subsystems. 
In this case, however, the tt is interpreted as a single top production associated with a W gauge boson with a missing b-jet absorbed into the upstream momentum. Again, signal and background distributions are expected to be bounded above by the same endpoint, and are inclined to exhibit similar shapes up to the details of upstream momenta. From all these observations, we conclude that M T 2 's in various subsystems are not good signal-background discriminators.
Finally, considering the $m_{b\ell}$ distribution (the lower-right panel of Figure 4), we see very similar behavior for both tW (red solid histogram) and $t\bar{t}$ (blue dashed histogram). Since there exists a two-fold combinatorial ambiguity [20], we keep only the smaller of the two values to ensure the boundedness of the $m_{b\ell}$ distributions. For both processes, the kinematic endpoint is dictated by the correct combination, i.e., the invariant mass formed by the $b$ and the $\ell$ belonging to the decay cascade initiated by the same top quark, so that the expected maximum of $m_{b\ell}$ should be identical, that is,

$$ m_{b\ell}^{\max} = \sqrt{m_t^2 - m_W^2}, \qquad (3.5) $$

where all final state particles, i.e., bottom jet, lepton, and neutrino, are assumed massless. Again, the theoretical endpoint is indicated by a black dashed line, while the actual distributions show a small overflow that mostly stems from events where an ISR jet is mis-tagged as a bottom-quark-initiated jet. In addition to the correct combinations, even the ensembles of incorrectly combined $m_{b\ell}$ are expected to be similar to each other, because the lepton on the wrong combinatorial side is emitted from the same particle species, a W, for both tW and $t\bar{t}$. Of course, there may be a difference between the W from the decay of a top quark and the W produced in association with a top quark. Our simulation result shown in the lower-right panel of Figure 4, however, suggests that such a difference is insignificant.$^{3}$ All these observations confirm that $m_{b\ell}$ is not an ideal kinematic variable for discriminating signal events from background ones either.

The poor efficiency in separating tW and $t\bar{t}$ events with a few simple kinematic variables motivates a more sophisticated method. As a matter of fact, the ATLAS and CMS collaborations have made use of Boosted Decision Trees (BDT) [23] for the purpose of rejecting more background events while retaining more signal events.
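As a numerical illustration of the $m_{b\ell}$ construction, the sketch below evaluates the endpoint expected for the correct pairing in the massless limit, $\sqrt{m_t^2 - m_W^2}$, and implements the "keep the smaller of the two pairings" rule for events with one b-jet and two leptons. The function names and the four-vector layout `(E, px, py, pz)` are assumptions of this sketch, and the W mass is rounded to the 80 GeV value used as a test mass in the text.

```python
import math

M_TOP = 173.0  # GeV, input top mass of the simulation
M_W = 80.0     # GeV, rounded W mass as used for the test mass in the text

def mbl_endpoint(m_top=M_TOP, m_w=M_W):
    """Kinematic endpoint of m_bl for massless b, lepton and neutrino."""
    return math.sqrt(m_top**2 - m_w**2)

def invariant_mass(p, q):
    """Invariant mass of two four-vectors given as (E, px, py, pz)."""
    e, px, py, pz = (p[i] + q[i] for i in range(4))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def mbl_smaller_pairing(b_jet, lepton_1, lepton_2):
    """With one b-jet and two leptons, keep the smaller of the two b-lepton masses."""
    return min(invariant_mass(b_jet, lepton_1), invariant_mass(b_jet, lepton_2))

endpoint = mbl_endpoint()  # roughly 153 GeV for the inputs above
```

Keeping the smaller value guarantees that the chosen $m_{b\ell}$ never exceeds the value of the correct pairing, which is why the endpoint is preserved.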
The BDT is a type of multivariate analysis (MVA), which is a category of analysis methods that combine multiple input variables into a single discriminant. A BDT takes a number of input variables (chosen by the analyst) and trains a certain number of decision trees to separate the signal and background based on Monte Carlo samples for each (for both CMS and ATLAS, it was tW vs. tt, and other backgrounds were not included). To improve signal acceptance and background rejection with reliable performance, the relevant machine-training is "boosted" by giving a special weight to the cases where signal events are eventually identified as background events and vice versa. It has served very well the purpose of signal-background separation in the context of tW discovery. However, it is rather difficult to find variables yielding the best sensitivity so as to discriminate tW from backgrounds event-by-event. In addition, the eventual performance highly depends on the training samples, so that the internal procedure is rather obscure.
4 tW at the next-to-leading order: an alternative strategy

Motivated by the difficulty of separating the signal events from the background ones using simple kinematic variables, we propose an alternative strategy based on kinematic variables to enhance the signal-over-background ratio. The basic idea is to consider a higher-order contribution, that is, the simple attachment of an extra jet (see the right panel of Figure 1 for an example event topology). The additional jet can be either mis-tagged as a bottom-initiated jet or not, and we consider both cases separately later on. Hence, we define two signal regions whose final states are characterized by two opposite-sign leptons, large missing energy, and two (one) b-tagged and zero (one) ordinary jets:

Signal region I (SR-I): $pp \to 2b + \ell^+\ell^- + \not\!\!E_T$
Signal region II (SR-II): $pp \to 1b + 1j + \ell^+\ell^- + \not\!\!E_T$

We particularly emphasize that the discriminating power of $M_{T2}$ and $m_{b\ell}$ can be dramatically improved for tW with an extra jet. For the background process (i.e., $t\bar{t}$), the requirement of SR-I simply retrieves the entire dileptonic decay topology of top pairs, so the associated decay topology is well defined. For SR-II, even if the extra jet is not b-tagged, we expect that the relevant final state of the background comes mostly from dileptonic top pairs, i.e., the associated decay topology is as well defined as that of SR-I. On the contrary, for the signal process, the extra jet is typically emitted as initial or final state radiation, and thus the relevant event topology is ill defined. The main idea behind the proposed strategy is to exploit this difference. The distributions of $t\bar{t}$ events in $M_{T2}$ of the six subsystems and in $m_{b\ell}$ are bounded above, and their upper bounds (i.e., kinematic endpoints) can be easily calculated as in the case considered in the previous section.
On the other hand, for tW , the extra jet coming from ISR can be arbitrarily hard so that the corresponding endpoints in the M T 2 and m b distributions are completely dictated by the hardness of such an additional jet.

Signal region I:
We begin with the discussion for signal region I, followed by that for signal region II in the next subsection. The event selection scheme for SR-I is the same as Eqs. We clearly see that most of the tt events are confined below the expected kinematic endpoint, whereas a large fraction of the tW events exceed the kinematic endpoints for the tt system. Therefore, if one sets the cut near the kinematic endpoint, i.e., keeping the event whose M T 2 value is greater than the cut, one can reject most of the background events with many signal events retained. The right panels of Figure 5 demonstrate the associated efficiency curves for the tt and tW in the M T 2 cuts. They clearly show that the signal efficiency, tW (red solid curves) overwhelms the background efficiency, tt (blue dashed curves) as the cuts are close to or beyond the tt kinematic endpoints (black dashed lines). A similar analysis can be conducted for the three asymmetric subsystems that are exhibited in Figure 6: the (b b) subsystem in the top panels, the (b ) subsystem in the middle panels, and the (b ) subsystem in the bottom panels. The M T 2 distributions are shown in the left panels, while the corresponding efficiency plots are shown in the right panels. Since the invisible particles in the (b b) and (b ) subsystems are different in both decay legs, the relevant test masses are applied accordingly, i.e., 0 GeV for the decay leg involving a lepton and 80 GeV for the decay leg involving only a bottom. On the other hand, the (b ) subsystem assumes identical invisible particles (here neutrino) so that a common test mass of 0 GeV is employed. Note that there arises a combinatorial issue for all asymmetric subsystems. For any given event, there are two partitionings depending on the way of grouping one lepton and one bottom quark, and for each partitioning two M T 2 values are available. To resolve this combinatorial ambiguity, we follow the prescription used in Ref. 
[24] with a slight modification, which is summarized as follows. As mentioned above, each partitioning has two $M_{T2}$ values, a smaller one and a larger one. Suppose that for one partitioning we have the smaller value $a$ and the larger value $A$, while for the other partitioning we have the smaller value $b$ and the larger value $B$. When ordering those four values, we have six possibilities. As one of the partitionings is correct, either $A$ or $B$ is surely correct. However, we do not know a priori which is the case. Here we simply choose the smaller of $A$ and $B$ as a conservative approach.$^{4}$ The orderings and the selection rule are tabulated in Table 1. In principle, this selection scheme is not unique, and other possibilities are available (see, for example, Ref. [20]). We tried other possible selection schemes and found that the above-described prescription gives the best signal-background separation.
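The selection rule just described, taking the smaller of the two per-partitioning maxima, amounts to a one-line min-max. The sketch below uses a hypothetical function name and made-up numbers purely for illustration.

```python
def select_mt2(partition_1, partition_2):
    """Resolve the two-fold pairing ambiguity for an asymmetric subsystem.

    Each argument holds the two M_T2 values of one partitioning, (a, A) and (b, B).
    The rule keeps the smaller of the two larger values, min(A, B).
    """
    larger_1 = max(partition_1)
    larger_2 = max(partition_2)
    return min(larger_1, larger_2)

# Illustrative values in GeV (not taken from the simulation):
chosen = select_mt2((120.0, 160.0), (90.0, 210.0))  # picks 160.0
```

Since either $A$ or $B$ comes from the correct partitioning, the selected value never exceeds the correctly paired one, so the $t\bar{t}$ endpoint is not artificially violated by the choice.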
Producing the distributions in Figure 6 according to this prescription, we observe a clear separation between the signal and background events in all three subsystems. Most of the background events populate the region below the expected kinematic endpoint for $t\bar{t}$, while a large number of signal events are found beyond the endpoint. Again, if a cut is applied near the kinematic endpoint, most of the background events are suppressed while many signal events are kept. This expectation is supported by the associated efficiency curves as functions of the $M_{T2}$ cuts. As in the symmetric subsystems, they show that the signal efficiency (red solid curves) dominates over the background efficiency (blue dashed curves) when the cuts are near or beyond the $t\bar{t}$ kinematic endpoints (black dashed lines).
It is interesting to understand this overflow phenomenon of the signal in the M T 2 distributions of various subsystems by investigating its asymptotic behavior in the presence of a very hard b-jet that typically emerges due to a mis-tag of an ISR jet. By definition of M T 2 given in Eq. (2.2), it is sufficient to evaluate the global minimum of the transverse mass for the decay side having such a hard b-jet, assuming that it is M (1) T solely for convenience.
are the transverse energy and transverse mass formed by all visible particles belonging to the first decay side. One then can prove that the global minimum of the above transverse mass is given by where m v(1) simply implies the invariant mass formed by the relevant visible particles [25,26].
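The statement that the global minimum of a single transverse mass equals the visible invariant mass plus the test mass can be checked numerically. The sketch below assumes the standard transverse mass expression for a massive visible system and a massive invisible particle; the function names are hypothetical, and the minimizing momentum $\mathbf{q}_T = (\tilde{m}/m_v)\,\mathbf{p}_T^{v}$ is a standard result that requires $m_v > 0$.

```python
import math
import random

def transverse_mass(m_vis, p_vis, m_inv, q):
    """Transverse mass for visible (m_vis, p_vis) and invisible (m_inv, q) systems."""
    et_vis = math.sqrt(m_vis**2 + p_vis[0]**2 + p_vis[1]**2)
    et_inv = math.sqrt(m_inv**2 + q[0]**2 + q[1]**2)
    mt_sq = m_vis**2 + m_inv**2 + 2.0 * (et_vis * et_inv - p_vis[0] * q[0] - p_vis[1] * q[1])
    return math.sqrt(max(mt_sq, 0.0))

def minimizing_q(m_vis, p_vis, m_inv):
    """Invisible momentum at which the transverse mass is smallest (needs m_vis > 0)."""
    ratio = m_inv / m_vis
    return (ratio * p_vis[0], ratio * p_vis[1])

m_vis, p_vis, m_test = 120.0, (200.0, -50.0), 80.0   # illustrative values in GeV
mt_min = transverse_mass(m_vis, p_vis, m_test, minimizing_q(m_vis, p_vis, m_test))

# Random trial momenta never give a smaller transverse mass than m_vis + m_test.
random.seed(1)
trials = [transverse_mass(m_vis, p_vis, m_test,
                          (random.uniform(-500, 500), random.uniform(-500, 500)))
          for _ in range(1000)]
```

Because $M_{T2}$ is bounded below by this single-side minimum, a visible invariant mass that grows with the hardness of a mis-tagged jet drags $M_{T2}$ upward with it.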
More specifically, if $m_{v(1)}$ is formed by a bottom quark and a lepton, both taken to be massless, it is evaluated by

$$ m_{v(1)}^2 = m_{b\ell}^2 = 2 E_b E_\ell \left(1 - \cos\theta_{b\ell}\right), \qquad (4.5) $$

where $\theta_{b\ell}$ denotes the angle between $b$ and $\ell$. One can easily see that it can be arbitrarily large as the bottom quark becomes arbitrarily hard, unless $b$ and $\ell$ are extremely collinear. Thus, Eq. (4.4) can be arbitrarily large, and in turn, so can $M_{T2}$. This argument is readily applicable to the subsystems where at least one of the decay sides involves a lepton and a bottom quark at the same time: for example, subsystems $(b\ell b\ell)$, $(b\ell b)$, and $(b\ell\ell)$. If $m_{v(1)}$ vanishes, however, this argument becomes subtle, and it is better to look at the full expressions of both $M_T$'s: where in the second line of Eq. (4.7) we used the assumption that p

A similar observation can be made for the $m_{b\ell}$ distribution using an analogous argument. Again, the requirement of an additional jet on top of a bottom-tagged jet and two opposite-sign leptons retrieves the entire decay topology of the dileptonic $t\bar{t}$ system, so the invariant mass variable is upper-bounded as in the case of Sec. 3. On the other hand, the additional jet, which is mis-tagged as a bottom quark in SR-I, can be arbitrarily hard, so the invariant mass evaluated with it can be arbitrarily large, as explained in Eq. (4.5) and thereafter. We therefore expect that the $m_{b\ell}$ distribution for $t\bar{t}$ is bounded above, whereas that for tW features a long tail stretching beyond the expected $m_{b\ell}$ endpoint of the $t\bar{t}$ system. Obviously, a combinatorial issue arises in constructing the $m_{b\ell}$ distributions. For the treatment of wrong combinations in $m_{b\ell}$, we again follow the prescription used in Ref. [24], as adopted for the $M_{T2}$ variables in the asymmetric subsystems. With this selection scheme in mind, we plot the $m_{b\ell}$ distributions for $t\bar{t}$ and tW in Figure 7, where the signal and the background distributions are shown by the red solid and the blue dashed histograms, respectively.
As the selection scheme preserves the kinematic endpoint of the $m_{b\ell}$ distribution for the $t\bar{t}$ system (see also Eq. (3.5)), we denote this theoretical endpoint by the black dashed line. We clearly see that for a large fraction of signal events the associated $m_{b\ell}$ value exceeds the kinematic endpoint, as expected. As with $M_{T2}$, if one imposes an $m_{b\ell}$ cut near the $t\bar{t}$ kinematic endpoint, i.e., keeps the events whose $m_{b\ell}$ value is greater than the cut, one can reject most of the background events while retaining many signal events. The right panel of Figure 7 shows the associated efficiency curves for $t\bar{t}$ and tW as functions of the $m_{b\ell}$ cut. We again observe that the signal efficiency (red solid curve) is higher than the background efficiency (blue dashed curve) when the cut is close to or beyond the $t\bar{t}$ kinematic endpoint (black dashed line).
To look at the signal-background separation of each variable more closely, we plot the Receiver Operating Characteristic (ROC) curves in Figure 8. The right panel of it magnifies the region where the background rejections are large. The ROC curve showing the best performance (i.e., large signal efficiency as well as large background rejection) is drawn in the rightmost position, and the others are exhibited in sequence of decreasing performance such , and M T 2 (b ). The diagonal line connecting (1, 0) and (0, 1) (black dotted lines) is drawn for a reference. We here omit the one for the ( ) subsystem because it is hardly beneficial in selecting signal events against background ones. In other words, it is below or close to the above-mentioned diagonal line in all range. In Table 2, we also tabulate the cuts and signal efficiencies ( tW ) of four sample points for which the background events are rejected by a rate of 99.9%, 99%, 90%, and 50%. The ROC curves suggest that four variables should provide with almost equally best efficiencies, which are the m b and the M T 2 in subsystems (b b ), (b b), and (b ): for example, 99.9% of background rejection vs. ∼5% of signal acceptance, 99% of background rejection vs. ∼20% of signal acceptance, and so on.
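A working point of the kind quoted above (a cut and a signal efficiency for a given background rejection) can be extracted from per-event values of any of the variables as sketched below. The function names and the toy numbers are hypothetical; events are kept when the variable exceeds the cut, as in the text.

```python
import math

def efficiency(values, cut):
    """Fraction of events kept by requiring value > cut."""
    return sum(1 for v in values if v > cut) / len(values)

def cut_for_rejection(background, rejection):
    """Smallest cut, chosen among the background values, that rejects at least
    the requested fraction of background events."""
    ordered = sorted(background)
    n_reject = math.ceil(rejection * len(ordered))
    if n_reject == 0:
        return float("-inf")
    return ordered[n_reject - 1]

def working_point(signal, background, rejection):
    """Return (cut, signal efficiency, achieved background rejection)."""
    cut = cut_for_rejection(background, rejection)
    return cut, efficiency(signal, cut), 1.0 - efficiency(background, cut)

# Toy values in GeV, for illustration only:
toy_background = [10.0 * k for k in range(1, 11)]   # 10, 20, ..., 100
toy_signal = [50.0, 90.0, 150.0, 200.0]
cut, eff_sig, rej_bkg = working_point(toy_signal, toy_background, rejection=0.75)
```

Scanning `rejection` over a range of values and collecting the pairs of signal efficiency and background rejection traces out a ROC curve for the chosen variable.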
As the above-mentioned four are the best variables, it is interesting to investigate the correlations among them to see whether the discriminating power can be improved further. One could attempt various combinations. For example, Figure 9 shows the unit-normalized two-dimensional temperature plots of $M_{T2}(b\ell b\ell)$ vs. $m_{b\ell}$ for the $t\bar{t}$ (left panel) and the tW (right panel) events. Very roughly, the two variables are positively correlated, i.e., as $M_{T2}$ in the $(b\ell b\ell)$ subsystem increases, $m_{b\ell}$ increases as well, and vice versa. This trend is more manifest for signal events, partly because both values are dictated by the hardness of the additional jet. Hence, it is rather challenging to obtain a dramatic improvement from simple schemes such as rejecting events whose $M_{T2}(b\ell b\ell)$ and $m_{b\ell}$ values are simultaneously below given respective cuts. We instead see that the background events tend to populate a localized region (the lower-left corner of the figure), while the signal events spread over a (relatively) wider region. Given this observation, a potential improvement could be achieved by introducing a customized cut enveloping the background region in the left panel of Figure 9. We do not perform a detailed study in this direction because it is beyond the scope of this paper.

Once the jet is selected in this way, it is treated as another b-jet throughout the subsequent analysis. To preclude the inclusion of any extra loose jet, we additionally require that there be only one jet even under the looser requirement $p_T^j > 20$ GeV and $|\eta^j| < 4.9$. Although most events come from either tW or $t\bar{t}$, SR-II differs from SR-I in a couple of qualitative respects. First, an enhanced signal-over-background ratio is anticipated. Since the additional jet typically originates from ISR/FSR gluons, more tW + j events pass the relevant selection criteria than in SR-I.
In contrast, the ordinary dileptonic tt comes with two bottom quarks at the parton level, so that the requirement of a single regular jet and a single bottom jet reduces the background acceptance by the missing rate of bottom quarks. At the expense of gaining more signal acceptance, the separation of the signal from the background events becomes less efficient. The reason is that for tt there is a greater chance that such an extra jet comes from ISR, which would have been rejected by the requirement of an additional b-tagged jet. As in the signal process tW + j, the hard ISR jet can make even tt events exceed the expected kinematic endpoint, and as a result, the signal efficiency is (slightly) reduced for a given background rejection. The combinatorial ambiguity arising in all variables but the M_T2 for the (bb) subsystem is handled by the same prescriptions elaborated in the previous subsection. The employed test masses are the same as the ones used in the corresponding M_T2 variables in SR-I. As before, the black dashed lines indicate the expected endpoints of the tt system. We observe that all distributions look very similar to the corresponding ones shown in Figures 5, 6, and 7. However, we also observe that more background events leak beyond the associated kinematic endpoints, as discussed before. To see the correlation between signal acceptance and background rejection, we plot the ROC curves in Figure 11. As in SR-I, the right panel zooms in on the region where the background rejection is large. The color code is the same as that in Figure 8. More quantitatively, we list the cuts and signal efficiencies at four sample points in Table 3, in the same manner as Table 2. The signal acceptance is somewhat worse than that in SR-I for large background rejection, but it improves relative to SR-I as the background rejection decreases.
Table 3. Signal efficiency ε_tW (numbers in parentheses) and the associated cuts in GeV for m b and M_T2 in various subsystems with respect to SR-II. The numbers are tabulated for four representative background rejections, 1 − ε_tt.

Would-be-expected statistical significance
In this subsection, we discuss the statistical significance that would have been achieved by the kinematic variable-based strategy studied thus far. First, since we have closely followed the selection scheme used by the CMS collaboration, our simulation was able to reproduce the numbers of signal (1,500 ± 20 ± 130) and background (7,090 ± 60 ± 900) events in the bℓ⁺ℓ⁻ + E̸_T channel (their signal channel) with an integrated luminosity of 12.2 fb⁻¹. For their control regions, CMS reported 220 ± 10 ± 30 events for tW and 7,650 ± 60 ± 1,020 for tt in SR-I (bbℓ⁺ℓ⁻ + E̸_T), and 790 ± 20 ± 80 for tW and 12,910 ± 80 ± 1,320 for tt in SR-II (bjℓ⁺ℓ⁻ + E̸_T), where the first uncertainty is statistical and the second is systematic [8]. We have verified that our simulation provides results consistent with the above numbers of events in

Discussions and outlook
The top quark is the heaviest particle in the Standard Model and has the largest coupling to the Higgs boson. It may open up a new window toward new physics, and it is therefore important to understand its properties. Very recently, production of a top quark in association with a W boson has been observed by the ATLAS and CMS collaborations. Most kinematic properties of the signal (tW) are very similar to those of tt, which is the dominant background.
Multivariate analysis has been adopted to discover tW production without a detailed understanding of the kinematics of the signal and its backgrounds.
In this paper, we have re-examined the production of a single top quark and a W gauge boson in the Standard Model with a non-conventional strategy. Our suggestion is to consider tW + j instead of tW, which also modifies the relevant backgrounds correspondingly. This next-to-leading-order tW production recovers the visible final state of ordinary tt, the major background, under the assumption that in tt the additional jet mostly comes from one of its bottom quarks. Clearly, the relevant kinematic structure of the background is well-defined, so that the distributions of well-known kinematic variables such as the invariant mass and M_T2 feature well-defined kinematic endpoints. This is in contrast to the ill-defined kinematic structure of tW + j, which stems from the fact that j typically comes from ISR/FSR. As a consequence, we observed that for tW + j the kinematic endpoints of the aforementioned distributions are also ill-defined, i.e., the distributions are not bounded from above. Based on these observations, we found that one can suppress the tt background very efficiently with those variables while retaining a high signal efficiency. According to our naive estimate, discovery with a large significance could have been made much earlier with a simple use of kinematic variables. Since this method provides excellent background rejection, one could use this channel to study other properties of the top quark. We strongly encourage the ATLAS and CMS collaborations to revisit their studies of tW in light of our suggestions.
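As a reminder of what a well-defined endpoint means for the background, the familiar textbook case is the invariant mass of the b quark and the charged lepton from a single decay t → bW → bℓν. The expression below is a sketch under stated assumptions: the b quark, lepton, and neutrino are taken as massless, the top quark and W boson are on shell, and the mass values m_t ≈ 173 GeV and m_W ≈ 80.4 GeV are illustrative inputs rather than numbers taken from this paper.

```latex
m_{b\ell}^{\max} \;=\; \sqrt{m_t^{2} - m_W^{2}}
\;\approx\; \sqrt{(173)^{2} - (80.4)^{2}}\ \mathrm{GeV}
\;\approx\; 153\ \mathrm{GeV}
```

A correctly paired b and ℓ from tt cannot exceed this value, up to width and detector-resolution effects, whereas a pairing that involves an ISR/FSR jet is not subject to the bound.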
We emphasize that our novel strategy is very general and can play a key role in separating signal and background events even in the context of physics models beyond the Standard Model. More specifically, the discussion in this paper is readily applicable to any processes that resemble the following structure:
AĀ vs. AB̄ (or ĀB),
where the former represents pair production of particle A while the latter represents single production of particle A in association with particle B. Here A → Bb (Ā → B̄b̄), B → Cc (B̄ → C̄c̄), and the bar denotes the antiparticle. In supersymmetric models, one can imagine the following processes.
(1) t̃t̃* vs. t̃χ̃₁⁻ (or t̃*χ̃₁⁺), where t̃ → χ̃₁⁺b → bℓ⁺ν̃ and similarly t̃* → χ̃₁⁻b̄ → b̄ℓ⁻ν̃
(2) g̃g̃ vs. g̃q̃ (or g̃q̃*), where g̃ → q̃q → qqχ̃₁⁰
The selection procedure targeting the full visible state of the former processes inevitably demands an extra object for the latter, leading to an ill-defined event topology for the latter only. The kinematic variable-based strategy proposed in this paper can then help separate the latter processes from the former.