Observation of electroweak W + W − pair production in association with two jets in proton-proton collisions at √ s = 13 TeV

An observation is reported of the electroweak production of a W + W − pair in association with two jets, with both W bosons decaying leptonically. The data sample corresponds to an integrated luminosity of 138 fb − 1 of proton-proton collisions at √ s = 13 TeV, collected by the CMS detector at the CERN LHC. Events are selected by requiring exactly two opposite-sign leptons (electrons or muons) and two jets with large pseudorapidity separation and high dijet invariant mass. Events are categorized based on the ﬂavor of the ﬁnal-state leptons. A signal is observed with a signiﬁcance of 5.6 standard deviations (5.2 expected) with respect to the background-only hypothesis. The measured ﬁducial cross section is 10.2 ± 2.0 fb and this value is consistent with the standard model prediction of 9.1 ± 0.6 fb.


Introduction
Electroweak (EW) scattering processes of the form VV → VV , with V and V being W, Z, or γ vector bosons, are crucial tools to investigate the mechanism of EW symmetry breaking [1,2]. The existence of a Higgs boson with a mass of 125 GeV [3][4][5] prevents the violation of unitarity in such vector boson scattering (VBS) processes by adding new exchange diagrams that cancel divergences in the theoretical calculations involving massive gauge bosons [6]. Therefore, the precise measurement of VBS cross sections in proton-proton (pp) collisions at the CERN LHC can probe the nature of the Higgs sector and search for effects beyond the standard model.
The cross sections of the various VBS processes and their associated backgrounds differ considerably depending on the choice of the pair of bosons and their final state. The EW production of two W bosons with the same electric charge (W ± W ± ) in the fully leptonic final state has been extensively studied by the ATLAS and CMS Collaborations [7][8][9][10][11][12]. In this paper, the full 2016-2018 data set recorded by the CMS experiment is exploited to search for the purely EW production of a pair of opposite-sign (OS) W bosons in association with two jets, a process that has not been observed yet.
This analysis, however, is more challenging than the W ± W ± channel, mainly due to the large OS background from top pair (tt) production, which results in a lower experimental sensitivity. From a theoretical perspective, they are both key ingredients to provide a full picture of the EW symmetry breaking mechanism in the SM, but the EW production of W + W − bosons is more sensitive to new physics phenomena that might affect the couplings of W bosons to the Higgs boson. In view of a future combination of VBS channels to set constraints on effectivefield theory parameters, this process is particularly important for its capability of precisely determining several dimension-6 operators [13].   1 (left and middle) shows some typical diagrams at the lowest order in the EW coupling (i.e., α 6 including subsequent W boson decays) that contribute to the signal. In Fig. 1 (right), the production of the W + W − final state is instead mediated via gluon exchange, referred to as quantum chromodynamics (QCD)-induced W + W − production. These QCD diagrams constitute an irreducible background for the analysis, although they have different kinematic properties that can be used to reduce their contamination in the signal region (SR).
The characteristic signature of VBS events includes two vector bosons and two jets emitted at forward and backward rapidities. The jets in the VBS topology (tagging jets) have large pseudorapidity separation (|∆η jj |), a high dijet invariant mass (m jj ), and suppressed hadronic activity between them. The analysis selects final states with two OS leptons (e + e − , µ + µ − , e ± µ ∓ ). The eµ channel is less populated by Drell-Yan (DY) events, thus resulting in a lower background contamination and higher sensitivity. The neutrinos in the final state result in large missing transverse momentum (p miss T ).
tracks and calorimetric energy depositions, increasing the apparent jet momentum. To mitigate this effect, tracks identified as originating from pileup vertices are discarded and a correction is applied to remove the remaining spurious neutral contributions [21].
Jet energy corrections are derived from simulation studies so that the average measured energy of jets becomes identical to that of particle level jets. In situ measurements of the momentum balance in dijet, photon+jet, Z+jet, and multijet events are used to determine any residual differences between the jet energy scale in data and in simulation, and corrections are applied to simulated events [22].
The missing transverse momentum vector p miss T is computed as the negative vector sum of the transverse momenta of all the PF candidates in an event, and its magnitude is denoted as p miss T [23]. The p miss T is modified to account for corrections to the energy scale of the reconstructed jets in the event. The pileup per particle identification (PUPPI) algorithm [24], which weights the PF candidates according to their probability to originate from the primary interaction vertex, is applied to reduce the pileup dependence of the p miss T observable.

Data sets and simulated samples
The 2016-2018 data sets, corresponding to integrated luminosities of 36.3, 41.5, and 59.7 fb −1 , respectively, are analyzed. The integrated luminosities for the 2016, 2017, and 2018 data taking years have individual uncertainties of 1.2-2.5% [25-27], whereas the overall uncertainty for the 2016-2018 period is 1.6%. To account for changes in the detector and for the different numbers of pileup interactions, a different simulation is employed in the analysis of each yearly data set.
Our analysis requires events filtered by trigger algorithms that select either a single lepton passing a high p T threshold, or two leptons with a lower p T threshold, satisfying both isolation and identification criteria. In the 2016 data set, the p T threshold of the single electron trigger is 25 GeV for |η| < 2.1 and 27 GeV for 2.1 < |η| < 2.5, whereas the p T threshold of the single muon trigger is 24 GeV. Double lepton triggers have lower p T thresholds, namely 23 GeV (12 GeV) for the leading (trailing) lepton in the double electron trigger, 17 GeV (8 GeV) for the leading (trailing) lepton in the double muon trigger and 23 GeV (8 GeV in the first part of the data set, corresponding to 17.7 fb −1 , and 12 GeV in the remainder) for the leading (trailing) lepton in the electron-muon trigger. In the 2017 data set, single electron and single muon p T thresholds are raised to 35 and 27 GeV, respectively. In the 2018 data set the corresponding single lepton p T thresholds are 32 and 24 GeV. Double lepton p T thresholds in 2017-2018 data sets are the same as those described for the 2016 data set, except for the p T threshold of the trailing lepton in the electron-muon trigger which is 12 GeV.
All expected physics processes are modeled via Monte Carlo simulation, reweighted to account for known discrepancies between data and simulated events. Corrections to the trigger efficiencies, as well as the efficiencies for electron and muon reconstruction, identification, and isolation as functions of the lepton p T and η, are extracted from events with leptonic Z boson decays using a "tag-and-probe" technique, as described in Ref. [28]. The b jet tagging efficiency [29] is measured and corrections are derived using data samples enriched in b quark jets. For each data set, simulated events are reweighted according to the pileup profile distribution observed in data.
The signal process is simulated at leading order (LO) with MADGRAPH5 aMC@NLO (v.2.4.2) [30] interfaced with the event generator PYTHIA 8.240 (8.230) [31] in 2016 (2017 and 2018). We require two quarks and two leptonically decaying W bosons in the final state; contributions from τ decays to lighter leptons are also included in the simulation. Diagrams containing a top quark contribution are not included in the signal matrix element, since EW top quark production is taken into account by the tt and single top quark (tW) background samples. W bosons are generated within 15 decay lifetimes from their on-shell mass. Contributions beyond this limit do not significantly increase the overall cross section and therefore are not considered.
The dipole approach [32] is used to model the initial-state radiation, rather than the standard p T -ordered one used in the PYTHIA parton shower generator, which does not properly describe extra QCD emissions in the vector boson fusion (VBF) and VBS processes [33]. The QCD-induced W + W − background is modeled with POWHEG v2 [34], and the production of the second jet is described at LO accuracy in QCD [35]. The interference between the EW and QCD-induced processes was evaluated, and it results in a negligible effect.
Higgs boson production mechanisms are included in the analysis and treated as a background source. All production modes are simulated with POWHEG v2 at next-to-LO (NLO) accuracy in QCD. Gluon-gluon fusion (ggF) events are further reweighted to match next-to-NLO accuracy, according to the NNLOPS scheme [36]. The Higgs boson decay into two W bosons and subsequently into leptons is simulated with the JHUGEN generator [37], whereas its decay into two τ leptons is simulated with PYTHIA. Other minor Higgs boson decay channels are neglected.
On-shell VBF Higgs boson production is negligible when simulating two on-shell W bosons, as is done for the signal sample. Accordingly, this contribution is removed from our signal definition, whereas off-shell effects are retained. Furthermore, the SR phase space is tailored to enhance EW W + W − production occurring without the exchange of a Higgs boson, which is suppressed by the tight selection on the dilepton invariant mass as discussed in Section 4. A dedicated analysis has been designed to target the on-shell VBF Higgs boson production [38], where, conversely, our signal sample is regarded as a background process.
The following additional background processes are modeled in simulation: nonresonant W boson pair production induced by gluons (included in the QCD-induced W + W − background estimation), tt and tW production, DY lepton pair production, Wγ and Zγ production, and multiboson production. Most of the event samples are generated at NLO in QCD using either POWHEG v2, MADGRAPH5 aMC@NLO (v2.4.2), or MCFM (v7.0) [39][40][41]. Only Wγ events are generated in the LO mode with MADGRAPH5 aMC@NLO (v2.4.2). The p T distribution of the tt component of tt + tW background is weighted to better match data [42]. A similar procedure is applied to the p T distribution of the Z boson in DY samples [43]. Other EW-induced diboson channels (WZ, Wγ and W ± W ± ) produced in association with two jets have been evaluated and their contribution to the SR is negligible, hence they have not been included in the background estimation.
The chosen parton distribution functions (PDFs) and underlying event tunes are common to all simulated events for a given data set. The parton showering and hadronization processes are simulated through PYTHIA 8.226 (8.230) in 2016 (2017-2018). The PDF set is NNPDF 3.0 [44,45] (3.1 [46]) and the underlying event tune is CUETP8M1 [47] (CP5 [48]) for the 2016 (2017-2018) samples. All generated events are processed through a simulation of the CMS detector based on GEANT4 [49] and are reconstructed with the same algorithms as used for data.

Event selection
The VBS final state is characterized by the presence of two jets from the incoming partons with a large |∆η jj | and a high m jj , and two OS leptons and two neutrinos from the W boson decays.
• Two OS leptons (electrons or muons), with dilepton mass m > 50 GeV and transverse momentum p T > 30 GeV, are selected with the tight selections described in Refs. [50,51]. The thresholds for the highest and second-highest p T leptons are 25 and 13 GeV, respectively. Events with an additional lepton with p T > 10 GeV are rejected; • p miss T > 20 GeV; • At least two jets with p T > 30 GeV, m jj > 300 GeV, and |∆η jj | > 2.5.
Requirements on m and p T variables are added to reduce contributions from on-shell Higgs boson production and DY to τ + τ − background, respectively, without losing signal efficiency. The kinematic phase space is then divided into a SR and two CRs, which are used to check the agreement between data and simulation and to constrain the normalization of the major backgrounds, i.e., tt and DY production. Each region is further categorized according to the charged lepton flavor composition: two electrons (ee), two muons (µµ), or one electron and one muon (eµ). The SR is defined by requiring that no b jets, defined with the loose working point of the DeepJet algorithm [29], are present. The transverse mass m T is defined as where φ is the azimuthal angle in radians; m T is required to be above 60 GeV in the eµ SR. In the SR for ee and µµ, the p miss T threshold is raised to 60 GeV and m is required to be greater than 120 GeV to reject DY Z boson production. In the eµ category, only a residual DY contribution (from τ + τ − → eµ events) remains.
The SR is further split into two regions to optimize the signal significance. The categorization is based on the centrality of the dilepton system with respect to the tagging jets, quantified by the so-called Zeppenfeld variable with η , η j 1 , and η j 2 being the pseudorapidities of the lepton and two jets, respectively. The categories with Z < 1 are enriched with signal and have less background contamination. Post-fit event yields are shown in Table 1. The tt CRs are defined by inverting the b jet veto and thus requiring the presence of at least one b jet with p T > 20 GeV in the final state, resulting in a ≈ 95% pure tt sample. In the DY CRs, the b veto requirement is the same as that in the SR. In the DY eµ category, the m T requirement is reversed with respect to the SR and a 50 < m < 80 GeV window is selected.
In DY ee and µµ categories, the dilepton mass is chosen to be close to the Z boson mass peak, |m − m Z | < 15 GeV. Moreover, the DY ee and µµ CRs are divided in two |∆η jj | bins, as explained in Section 5. The fraction of DY events in the DY CRs are ≈ 64% and ≈ 91% for the eµ and ee/µµ categories, respectively. A summary of all regions included in the analysis is shown in App. A.

Background estimation and signal extraction
The normalizations of the major backgrounds are measured by a simultaneous fit of data, including CRs. For the tt background the determination mainly comes from the dedicated tt CR.
In the ee and µµ categories, DY production is one of the leading background sources, typically arising when a lepton pair is reconstructed with high p miss T due to instrumental effects. A large fraction of the DY background (≈50%) comes from events where at least one of the two jets originates from a pileup vertex, whereas the remaining DY events are fully associated with the "hard" interaction, in which the two highest p T jets are radiated by initial-state quarks. These two backgrounds are measured as two independent processes and scaled with different parameters during the fit. Hence, the normalization of the DY background is determined from separate CRs with |∆η jj | < 5 (dominated by events where the jets originate from initial-state QCD radiation) and |∆η jj | > 5 (dominated by events where at least one of the jets comes from a pileup interaction). A third, minor source of DY background is ττ events, and its normalization is determined from the eµ category.
In contrast, there is no CR purely enriched by nonresonant QCD-induced W + W − events. The normalization of this background is left to float freely and is determined by the global fit in all regions.
Nonprompt leptons, i.e., either leptons produced in decays of hadrons or jets misidentified as leptons, come mainly from W + jets events. This background is directly estimated from data by applying a transfer function to events entering a W + jets CR, where one of the two leptons fails either the tight identification or isolation criteria applied in the SR, but passes a looser selection [53]. The transfer function is determined in a separate control sample by measuring the rate of objects satisfying loose lepton requirements that also pass the signal lepton criteria. Triggers requiring either one electron and one jet, or a single muon are applied to select this control sample. In both cases, the lepton must be well separated from the highest p T jet, and the contribution from leptons originating from W boson decays is suppressed by requiring p miss T < 20 GeV. The remaining contamination of prompt leptons from the EW production of a Z boson in association with jets is estimated from simulation and removed.
Other minor backgrounds, such as the Higgs boson and multiboson production, are estimated entirely from simulation.
In the eµ signal category a feed-forward deep neural network (DNN) is used to separate the VBS signal from the tt and QCD-induced W + W − backgrounds. For optimization purposes, separate DNN models were built for the subregions with low (Z < 1) and high (Z > 1) values of the Zeppenfeld variable. The two models share the same architecture and are fit using nine discriminating variables, which are chosen from a larger set of observables; these variables are listed in Table 2. The DNN implementation comprises five fully connected hidden layers, the first two (last three) having 128 (64) nodes each, that are trained with the stochastic gradient descent technique of the "Adam" optimizer tool [54] to achieve a good separation of signal and backgrounds. A binary cross-entropy loss function [55,56] is minimized in both models. The m jj and |∆η jj | post-fit distributions are shown as an example in App. A, and they agree with SM predictions within post-fit uncertainties. These distributions are also shown in the ee and µµ combined top CRs, where the data is in excellent agreement with the simulation. The DNN output is illustrated in the eµ top CR too; less than 5% residual shape dependence is observed. Such disagreement is mostly concentrated in background-dominated regions of the spectrum, and we have checked that it does not affect the signal extraction. The signal strength modifier of EW W + W − production, µ EW = σ obs /σ SM , is the parameter of interest and is translated to a cross section measurement in two different fiducial volumes; more details are given in Section 7. The signal extraction procedure is based on a binned maximum likelihood fit of the chosen discriminating variable distribution, as specified in the next paragraphs, with signal and background templates, performed simultaneously in all categories. CRs are included as single-bin templates where the number of events is fit to data.
In the eµ SRs, the binned DNN output is chosen as the discriminating variable. The DNN output spans a range between 0 and 1 and can be interpreted as the probability of each event to be identified as signal. Therefore, high DNN values are signal-enriched, whereas background samples mostly populate the low DNN spectrum. In the ee and µµ SRs, different discriminating variables are chosen as a function of m jj and |∆η jj |. For m jj > 500 GeV and |∆η jj | > 3.5, where the signal-to-background ratio is the largest, m jj is used. The remaining phase space is divided into three bins for each flavor composition (ee and µµ) and Z category (Z < 1 and Z > 1), the number of events in each region being the discriminating variable. The bins are defined as follows: • 300 < m jj < 500 GeV and 2.5 < |∆η jj | < 3.5; • m jj > 500 GeV and 2.5 < |∆η jj | < 3.5; • 300 < m jj < 500 GeV and |∆η jj | > 3.5.
The number of events in each bin of the templates included in the likelihood function is modeled as a Poisson random variable, with a mean value that is the sum of the contributions from all processes. Systematic uncertainties are represented by individual nuisance parameters with log-normal distributions. The uncertainties affect the overall normalizations of signal and background processes, as well as the shapes of the distributions of the observables. The nuisance parameters associated to shape uncertainties are given by a unit Gaussian distribu-tion. Correlations between systematic uncertainties in different categories are included.
Figs. 2 and 3 show the observed post-fit distributions for the full data set in bins of the DNN output for the eµ category, and in bins of m jj and |∆η jj | for the ee and µµ categories. Similarly, Fig. 4 shows the number of post-fit events in the CRs. The difference between data and MC in the last bin of the DNN output for Z < 1 in Fig. 2 was investigated by verifying that input variables reasonably agreed with data at DNN > 0.88. The discrepancy is not localized in any bins of such distributions and, because of the good modeling of the top background, is therefore considered to be compatible with a statistical under-fluctuation of data. [23] are also included and calculated by varying the momenta of unclustered particles that are not identified with either a jet or a lepton. The b tagging [29] introduces various uncertainty sources. The uncertainties from the b tagging algorithm itself are correlated among all data sets, whereas uncertainties due to the finite size of the control samples are uncorrelated. Finally, the uncertainty in the pileup reweighting procedure is applied to all relevant simulated samples.
Among theoretical uncertainties, effects due to the choice of the renormalization and factorization scales are evaluated, to cover missing higher order terms in the perturbation series of cross  section calculations. These are computed by varying the scales up and down independently by a factor of two with respect to their nominal values, ignoring the extreme cases where they are shifted in opposite directions [60,61]; the envelopes of the various distributions are taken as one standard-deviation variation. Only shape effects are included when varying these scales for theoretical uncertainties that affect the signal and main backgrounds, since their normalizations are directly measured in data.
The PDF uncertainties are computed as recommended by the NNPDF prescription [62]. In the signal process, they do not introduce any shape effect in the m jj and DNN output distributions, hence they are not considered. For tt and DY backgrounds, since normalization effects have no impact in the fit, PDF uncertainties can only affect the ratio of the expected yields between the SR and the CR. Such uncertainties are included in the CR and estimated to be 1% and 2% for tt and DY backgrounds, respectively. The modeling of both the parton shower and the underlying event is included. For the former, the uncertainties are computed using PYTHIA by shifting the renormalization scale of a factor 2 and 0.5 for both initial and final state radiation; these variations are propagated to each sample, but only shape effects are retained for the signal. Underlying event uncertainties are derived from alternative samples obtained by varying the PYTHIA tune parameters as described in Ref. [48]. Two fiducial cross section measurements are provided. Therefore, two sets of theoretical uncertainties are computed as illustrated above in each fiducial volume.
A list of all systematic uncertainties in the cross section measurement is presented in Table 3 and refers to the fiducial volume in which signal candidates are selected, as discussed in Section 7. The systematic component of the overall relative uncertainty in the signal cross section measurement is 13.1%. The statistical uncertainty is evaluated by setting all sources of systematic uncertainty to their best-fit result and its value is 14.9%. The combined relative uncertainty in the cross section measurement is 19.8%.

Results
All categories are simultaneously fit to data using a maximum likelihood template fit. Expected results are assessed by using the Asimov data set [63], a pseudoexperiment in which data are set in each bin to the value provided by the prediction. The statistical significance of the signal is quantified by means of a p-value [64], converted to an equivalent Gaussian significance, which corresponds to the probability of observing data with a larger discrepancy with respect to the background-only hypothesis, under the asymptotic approximation [63]. The observed (expected) significance for the signal is 5.6 (5.2) standard deviations.
The EW W + W − production cross section is measured in two different fiducial volumes, one more inclusive and one closer to the region defined by kinematic criteria applied in the preselection. In the former one, parton-level requirements define the phase space of interest, in particular the two outgoing partons (qq ) are required to have p T > 10 GeV and an invariant mass m qq > 100 GeV; τ leptons are included in the simulated signal sample, and their subsequent decay to leptons is included as part of the signal. The measured cross section is 99 ± 20 fb, compared with the LO prediction of 89 ± 5 (scale) fb, where the theoretical uncertainty is computed by varying the factorization scale of the signal.
The other volume is defined as the preselection region where signal candidates are reconstructed, as outlined in Section 4, but kinematic requirements are transposed to generator-level. If a photon is found within a distance ∆R = √ (∆η) 2 + (∆φ) 2 < 0.1 from a lepton, its fourmomentum is added to that of the lepton, making a "dressed" lepton. Additionally, if such a lepton is found within a distance ∆R = 0.4 from a jet axis, the event is discarded. Electrons and muons coming from a τ decay are vetoed. The missing transverse momentum is computed as the modulus of the vector sum of transverse momenta associated with all invisible particles generated in the event, and is required to be greater than 20 GeV. A summary of the requirements of such a fiducial volume is presented in Table 4. The measured fiducial cross section is 10.2 ± 2.0 fb, while the LO theoretical prediction is 9.1 ± 0.6 (scale) fb.

Summary
The EW production of a pair of opposite-sign W bosons in association with two jets has been observed in a data set corresponding to an integrated luminosity of 138 fb −1 collected with the CMS detector at the CERN LHC in proton-proton collisions at √ s = 13 TeV. Tabulated results are provided in the HEPData record for this analysis [65]. Events containing two opposite-sign leptons (electrons or muons), missing transverse momentum, and two jets with large separation in pseudorapidity and high dijet invariant mass were selected. A deep neural network was employed to deal with the irreducible background from the QCD-induced production of W boson pairs, and the dominant background from the production of tt quark pairs.
The measured signal corresponds to an observed (expected) significance of 5.6 (5.2) standard deviations with respect to the background-only hypothesis. The EW W + W − production cross section has been measured in two fiducial volumes. In the more inclusive one, the cross section is 99 ± 20 fb (89 ± 5 fb expected), whereas in that comparable with the experimental phase space the measured cross section is 10.2 ± 2.0 fb (9.1 ± 0.6 fb expected). These results are compatible Table 4: Definition of the fiducial volume similar to the reconstructed SR.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid and other centers for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the following funding agencies:  [16] CMS Collaboration, "The CMS trigger system", JINST 12 (2017) P01020, doi:10.1088/1748-0221/12/01/P01020, arXiv:1609.02366.
[52] D. L. Rainwater, R. Szalapski, and D. Zeppenfeld, "Probing color singlet exchange in Z + two jet events at the CERN LHC", Phys. Rev. D 54 (1996) Table 2. The contributions from background and signal (red line) processes are shown as stacked histograms; systematic uncertainties are plotted as dashed gray bands. Data points are displayed with asymmetric Poisson vertical bars to ensure a correct statistical coverage all over the spectrum. Data/SM Uncertainties Figure 6: Post-fit distribution of the DNN output in different-flavor top CR. Data to SM expectation ratio has some residual shape dependence (less than 5% across the full spectrum), which however was found to not affect the analysis. The contributions from background and signal (red line) processes are shown as stacked histograms; systematic uncertainties are plotted as dashed gray bands. Data points are displayed with asymmetric Poisson vertical bars to ensure a correct statistical coverage all over the spectrum.