Search for the rare decay of the W boson into a pion and a photon in proton-proton collisions at $\sqrt{s} = $ 13 TeV

A search is performed for the rare decay W$^\pm\to\pi^\pm\gamma$ in proton-proton collisions at $\sqrt{s} =$ 13 TeV. Data corresponding to an on W integrated luminosity of 137 fb$^{-1}$ were collected during 2016 to 2018 with the CMS detector. This analysis exploits a novel search strategy based on W boson production in top quark pair events. An inclusive search for the W$^\pm\to\pi^\pm\gamma$ decay is not optimal at the LHC because of the high trigger thresholds. Instead, a trigger selection is exploited in which the W boson originating from one of the top quarks is used to tag the event in a leptonic decay. The W boson emerging from the other top quark is used to search for the W$^\pm\to\pi^\pm\gamma$ signature. Such decays are characterized by an isolated track pointing to a large energy deposit, and by an isolated photon of large transverse momentum. The presence of b quark jets reduces the background from the hadronization of light-flavor quarks and gluons. The W$^\pm\to\pi^\pm\gamma$ decay is not observed. An upper exclusion limit is set to this branching fraction, corresponding to 1.50 $\times$ 10$^{-5}$ at 95% confidence level, whereas the expected upper limit exclusion limit is 0.85 $^{+0.52}_{-0.29}$ $\times$ 10$^{-5}$.


Introduction
Rare hadronic decays of W bosons represent probes of the strong interaction at the boundary between the perturbative and nonperturbative domains of quantum chromodynamics (QCD), offering insights into factorization and meson form factors at large energy scales [1][2][3]. In particular, exclusive decays of the W boson into final states containing a single meson can be used to test these theoretical frameworks in a context where the energy scales are sufficiently large. At such scales, corrections that depend on the momentum transferred to the hadron in the final state can be neglected. The observation of these decays would validate the QCD factorization formalism, and enhance the possibility to perform precise calculations using such an approach. Furthermore, at future colliders they could provide a new way to measure the mass of the W boson that is based solely on visible single-particle decay products. Theoretical calculations for such branching fractions (B) have large uncertainties: the expected value of B(W ± → π ± γ), for instance, is in the range of 10 −9 − 10 −7 [1,4]. Upper limits on B(W ± → π ± γ) and on two other exclusive W boson decays were set previously at 95% confidence level (CL): B(W ± → π ± γ) < 7.0 × 10 −6 [5], B(W ± → D ± s γ) < 1.3 × 10 −3 [6], and B(W ± → π ± π ± π ∓ ) < 1.01 × 10 −6 [7]. For the Z boson, there are comparatively many more results on searches for such rare exclusive decay channels [8].
This Letter reports the results of the first search at the CERN LHC for the rare decay W ± → π ± γ. The high energy thresholds for single photon triggers make an inclusive search, similar to the one performed by the CDF Collaboration [5], unsuitable at CMS. Therefore, this analysis uses a novel approach [4], focusing on W boson production from top quark-antiquark pair (tt) events at the center-of-mass energy of 13 TeV. The muon or electron emerging from the leptonic decay of one of the two W bosons from the tt pair is used to select events at the trigger level. The other W boson in the event is employed to search for the rare decay W ± → π ± γ. The decay is characterized by an isolated photon of large transverse momentum (p T ), and an isolated track corresponding to a particle with large p T . The analysis exploits the reduced jet activity expected in the vicinity of the pion [4], which is produced via electroweak processes, to reject background pions. The b quark jets are used in the multivariate event selection to reduce the contribution from non tt backgrounds. The W ± → π ± γ branching fraction is determined by parametrizing the number of observed signal events (N sig ) as: where σ tt is the tt production cross section, B(W ∓ → ∓ ν) the branching fraction of the W boson into leptons, L int the integrated luminosity, and ε the efficiency for the whole selection procedure (including acceptance effects). The symbol τ indicates that the leptonic decays of the τ lepton originating from W → τν are also considered in the computation.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid 6 m in internal diameter, providing a magnetic field of 3.8 T. A silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter reside within the solenoid volume. Each of these systems is composed of a barrel and two endcap sections. Forward hadron calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
Events of interest are selected using a two-tiered trigger system [9]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz with a latency of about 4 µs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage. A more detailed description of the CMS detector, together with a definition of its coordinate system and relevant kinematic variables, can be found in Ref. [10].
The signal process is simulated at leading order (LO) in perturbative QCD using the Monte Carlo (MC) event generator PYTHIA 8.226 (8.230) for 2016 (2017 and 2018) [11]. The transverse momentum of the W boson (p W T ) decaying into a pion and a photon in the signal MC is weighted so that it matches that obtained in a next-to-LO (NLO) W → ν ( = µ, e) simulation in POWHEG v2.0 [12][13][14][15], where the W boson originates from tt decay. This correction does not affect the total signal normalization.
The main background processes in this analysis are nonsignal tt production, Drell-Yan events, associated production of a vector boson and a photon or jets (Wγ → νγ, Zγ → γ and W + jets → ν + jets), along with standard model events comprised uniquely of jets produced through the strong interaction, referred to as QCD multijet events. The tt backgrounds are simulated at NLO using the POWHEG v2.0 framework [15], whereas the Drell-Yan, Wγ → νγ, Zγ → γ, W + jets → ν + jets, and QCD multijet processes are simulated at NLO using the MADGRAPH5 aMC@NLO 2.2.2 (2.4.2) generator for 2016 (2017 and 2018) [16].
The NNPDF 3.0 NLO [17] (NNPDF 3.1 next-to-NLO [18]) parton distribution functions are used for generating all 2016 (2017 and 2018) MC samples. The generators are interfaced to PYTHIA for parton showering and parton fragmentation. The PYTHIA parameters affecting the description of the underlying event are set to the CUETP8M1 tune [19] (CP5 tune [20]) for the 2016 (2017 and 2018) simulation. For the 2016 tt backgrounds, the CP5 tune is used instead of CUETP8M1. The difference in generator version and configuration over the years of data taking reflects the improved knowledge of the production mechanisms in pp collisions at CMS.
All the simulated background samples are normalized to the cross sections obtained from the corresponding event generator. For the signal, the tt cross section measured by CMS with the 2015 data [21] is used, together with the most up-to-date measurements of B(W → ν) [8]. pp interaction vertex with the largest value of summed physics-object p 2 T is referred to as the primary vertex (PV). The physics objects are the jets, clustered using the jet finding algorithm [24,25] with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum p miss T , computed as the negative vector p T sum of all the PF candidates in an event. This vector, whose magnitude is denoted as p miss T [26], is modified to account for corrections to the energy scale of the reconstructed jets in the event.
Muon candidates are reconstructed by identifying signals from the muon subsystems together with those from the inner tracker [27]. The inner detector track must be consistent with originating from the PV. The energy of muons is obtained from the curvature of the corresponding track. The p T sum of the PF candidates within a cone of ∆R ≡ √ (∆η) 2 + (∆φ) 2 < 0.4 around the muon direction is required to be <25% of the muon p T after implementing a correction for neutral PF objects not associated with the PV. Here φ indicates the azimuthal angle in radians in the plane transverse to the beam axis.
Electron candidates are reconstructed from clusters in the ECAL that are matched to tracks reconstructed using a Gaussian-sum filter algorithm. The track is required to be consistent with originating from the PV. Electron identification proceeds through a multivariate classifier that involves observables sensitive to bremsstrahlung along the electron trajectory, the geometrical and energy-momentum compatibility between the electron track and the associated energy cluster in the ECAL, the energy distribution of the electromagnetic shower, and discrimination against electrons originating from photon conversion. The PF-based isolation variables are also used as input variables in the multivariate classifier, including the p T of photons, charged and neutral hadrons in the event [28].
Photon reconstruction is performed in the region |η| < 2.5, excluding the ECAL barrel-endcap transition region 1.44 < |η| < 1.57. The energies of photons are obtained from the ECAL measurement. An electron veto is implemented to minimize photon misidentification, combining information from the ECAL and the pixel tracker. As in the electron case, photon identification proceeds through a multivariate classifier that makes use of PF-based isolation variables [29,30].
Jets are reconstructed from the PF candidates using the anti-k T clustering algorithm [24] with a distance parameter of 0.4, as implemented in the FASTJET library [25]. Jets originating from b quarks are tagged with the DEEPCSV algorithm [31], which uses secondary vertices, together with other lifetime information, and exploits a deep neural-network architecture. For the chosen working point, the efficiency to select b quark jets is around 90%, and the rate for incorrectly tagging jets originating from the hadronization of gluons or u, d, s quarks is about 10%. Jet momentum is determined as the vectorial sum of all particle momenta in the jet and, based on simulation, is typically within 5 to 10% of the true momentum over the whole p T spectrum and detector acceptance. Additional pp interactions within the same or nearby bunch crossings can contribute additional tracks and calorimetric energy depositions, increasing the apparent jet momentum. To mitigate this effect, tracks identified as originating from pileup vertices are discarded and an offset correction is applied to correct for remaining contributions. Jet energy corrections are derived from simulation studies so the average measured energy of jets becomes identical to that of particle-level jets. In situ measurements of the momentum balance in dijet, photon+jet, Z+jet, and multijet events are used to determine any residual differences between the jet energy scale in data and in simulation, and appropriate corrections are made [32]. Additional selection criteria are applied to remove jets potentially originating from instrumental effects or reconstruction failures.

Event selection
The analysis strategy and selection criteria were designed before inspecting the signal region (SR) in the collision data. Events are chosen at trigger level with either a muon or an electron with p T above 24 (27) GeV for muons in the 2016 and 2018 (2017) samples, and 27 (32) GeV for electrons in the 2016 (2017 and 2018) samples, following changes in the trigger reconstruction algorithms for these particles throughout the data-taking periods. The reconstructed isolated muon and electron candidates that match the trigger conditions must satisfy specific offline identification criteria, and are required to be in the region |η| < 2.4, with p T > 25 GeV (or >28 GeV for 2017) for muons, and >33 GeV (or >30 GeV for 2016) for electrons. The muon and electron tracks should be within |d z | < 0.5 cm of the PV along the beam axis, and |d xy | < 0.2 cm in the transverse plane.
Events where multiple leptons satisfy the selection criteria are discarded to reduce the Drell-Yan background contribution. Pions are selected by searching for particle tracks with p T > 20 GeV and charge opposite to that of the lepton. To reduce contamination from pion candidates misidentified as leptons, the requirements ∆φ(µ, π) > 0.09 and ∆φ(e, π) > 0.05 are imposed on the angle between the lepton and the candidate pion. These requirements remove the region where the lepton and the candidate pion overlap. Their effect on the signal efficiency is negligible. To ensure the compatibility of the pion track with the PV, we also require |d z | < 0.5 cm and |d xy | < 0.2 cm. The pion with largest p T that satisfies all these requirements is chosen for further analysis. Isolated photons with p T > 20 GeV and |η| < 2.5 are selected. The condition ∆φ( , γ) > 0.04 reduces the misidentification of the trigger lepton as the selected photon. The photon with the largest p T that satisfies all these requirements is chosen for further analysis, and the event is retained if the pion-photon invariant mass is in the range 50 < m π γ < 100 GeV.

Pion isolation
Since the pion from the W ± → π ± γ decay originates from an electroweak process, minimal jet activity related to this decay is expected in its vicinity. For this reason, the analysis implements a pion-isolation variable based on the p T sum of all the PF candidates (excluding the pion itself) contained in a cone with radius ∆R around the pion, divided by the pion p π T (Σp T /p π T ). The cone is defined to be 0.02 < ∆R < 0.5, with the lower bound helping to exclude from the sum the single charged particles generating two nearby reconstructed tracks. To reduce the contribution from pileup, charged particles entering the sum are required to originate from the PV, with the same d xy and d z criteria as used for the pion. Figure 1 shows the distribution of the pion-isolation variable for signal events, which appear to be highly isolated, whereas background events indicate a clear presence of jet activity. A discrepancy between data and MC is observed for background events at low values of the pion-isolation variable. This discrepancy, independent of the data-taking period, does not affect the background characterization extracted from data, but somewhat reduces the effectiveness of the multivariate selection technique described in Section 5.2, which is trained on simulated events. As far as the signal is concerned, this discrepancy is included in the systematic uncertainty as discussed in Section 7.

Multivariate selection
The final event selection is performed using a multivariate analysis technique. A boosted decision tree (BDT) classifier is trained using half of the signal and background MC events, and is validated through the other half. The requirement of a muon or an electron generates differences in the background distributions and the two cases are therefore handled as separate chan- MC uncert (stat) ( The simulated MC distribution for the signal is given by the dashed red line and corresponds to a 1% branching fraction for the W ± → π ± γ decay. The statistical uncertainties in the data are small and thus not visible. In the lower plot, the ratio between data and the background component of the MC is shown. The gray bands represent the statistical uncertainty in the MC background. nels. The steps in the training and validation are carried out on the merged sample corresponding to the three years of data taking. The BDT input variables with good discriminating power and whose distributions are well described by the simulation are the transverse momenta of the lepton (p T ), pion (p π T ), and photon (p γ T ), the p miss T in the event, the pion-isolation variable, and the number of the b-tagged jets in the event with p T > 25 GeV (n b ), which provides particularly good suppression of the background topologies characterized by no b-tagged jets. Distributions of the BDT discriminant for collision data with simulated signal and background are shown in Fig. 2. The shape of the data is in agreement with the MC background.
Finally, thresholds for the BDT discriminant are defined for the two channels to identify a unique SR and a unique control region (CR) from the merging of the two lepton channels. These thresholds aim at obtaining the highest possible significance, while fulfilling the requirements discussed in the following. The CR, which includes a negligible signal contribution with respect to the SR, is chosen so that the m π γ distribution contains enough events to be well fitted and to limit the impact of the systematic uncertainty in the background description. This is obtained for 0.206< BDT discriminant <0.281 in the muon channel and 0.209< BDT discriminant <0.269 in the electron channel. The SR should contain a number of events comparable to that in the CR, which is the case for values of the BDT discriminant >0.281 in the muon channel and >0.269 in the electron channel.
The products of the signal efficiency and the acceptance for the two channels, ε , are computed separately for the 2016, 2017, and 2018 samples. They are evaluated as the ratio between the number of signal events in the MC sample in the SR, and the initial number of events generated in each channel. The efficiencies are given in Table 1 with their total uncertainties.  Year ε µ ε e 2016 0.12 ± 0.01 0.07 ± 0.01 2017 0.11 ± 0.01 0.07 ± 0.01 2018 0.12 ± 0.01 0.07 ± 0.01

Signal and background yield extraction
The signal and background yields are extracted using an unbinned maximum likelihood fit to the m π γ distribution in the SR. The probability density function (pdf) used to perform the fit is defined as follows: where N bkg (SR) is a floating parameter representing the number of background events in the SR, and G is a Gaussian pdf used to account for the nuisance parameters that model the systematic uncertainties. The functional form of the signal is determined fitting the m π γ distribution obtained from simulation. The sum of a double crystal ball (DCB) function [33] and a Gaussian pdf is chosen to characterize the signal lineshape f sig , whose parameters, excluding the normalization, are then fixed in the fit to the m π γ distribution of the data.
The CR is used to estimate the functional form of the background directly from data. A linear polynomial ( f bkg ) is chosen to describe the background. The values of the slope and the intercept (normalization) of this straight line can then vary in the fit to the m π γ distribution of the data in the SR.
The number of signal events N sig is not sensitive to the contribution of W bosons that are not produced via tt decay. It was verified in simulation that such processes are rejected by the requirement of tt production with a large p T lepton and jets originating from the hadronization of b quarks. Another contribution to the number of signal events that is not considered arises from the associated production of a top quark and a W boson, which is expected to be two orders of magnitude smaller than that from tt events.

Systematic uncertainties
Systematic uncertainties are treated as nuisance parameters in the fit to the m π γ distribution. These uncertainties are included as log-normal pdfs in the likelihood and will be discussed in the following in descending order of impact on the final result.
The tt production cross section in pp collisions at √ s = 13 TeV, which is included in Eq. (1), was measured by CMS using the 2015 data [21] to be 815 ± 43 pb, where the total uncertainty is the quadratic sum of the statistical, systematic, and luminosity-related components. The integrated luminosities of the 2016, 2017, and 2018 data-taking periods are individually known with uncertainties in the 2.3-2.5% range [34][35][36], whereas the total 2016-2018 integrated luminosity has an uncertainty of 1.8%; the improvement in precision reflects the uncorrelated time evolution of some systematic effects.
The choice of pdfs used to describe the background has an influence on the measured W ± → π ± γ branching fraction. To estimate this effect, a fit is performed to the data in the SR with the nominal background pdf, i.e., a linear function. The same distribution is then fitted again using an alternative pdf, namely an exponential. The following pull is calculated for the branching fraction: where B lin represents the value of the W ± → π ± γ branching fraction as extracted from the fit with a linear function, σ B lin is its uncertainty, and B exp indicates the value of the branching fraction extracted from the fit using the exponential function. The pull amounts to 14.6% and is taken as an estimate of the systematic uncertainty associated with the background parametrization.
The parameters representing the peak and width of the Gaussian component of the DCB pdf for the signal are extracted with their uncertainties through a fit to the simulated events. To include the limited event count in the MC sample, we take the width of the DCB function and its uncertainty as extracted from the fit to the MC events. Moreover, we include a 1% uncertainty in the position of peak of the DCB function to account for the photon energy scale, which dominates the uncertainty in the pion-photon invariant mass. First, we fit the data in the SR fixing the values of the peak and the width of the DCB function to their central values (nominal case). We then repeat the fit after increasing and decreasing these parameters by their uncertainties (shifted parameters). We compare the results by calculating a set of pulls in the parameter of interest: where σ B nominal represents the uncertainty associated to B nominal . The largest pull of 10.6% is an estimate of the systematic uncertainty associated with the parametrization of the signal. Since the other parameters of the signal lineshape are highly correlated with each other, as well as with the peak and the width of the Gaussian component of the DCB function, it is not necessary to float them within their uncertainties.
The contributions to the uncertainties in ε include the statistical uncertainties, the BDT modeling, the modeling of kinematic variables for the signal in PYTHIA, the scale factors that correct for differences between detector conditions in collision data and simulation, and the chargemisidentification in the tracking algorithm. To assess the uncertainty in modeling of the BDT, the performance of a BDT trained and validated with the nominal input variables is compared with that of a BDT trained with the same variables, and validated with variables to which a Gaussian spreading or a shift is applied. The width of the spreading corresponds to 5% of the value of p T , p π T , p γ T , and p miss T . The pion isolation is shifted upwards by 10%, based on the agreement between collision data and simulation observed in Fig. 1. The number of b jets, n b , is increased or decreased by 1 for 5% of the events in the training samples of 2016 and 2017, and for 10% of the events in the training sample of 2018, thus accounting for the larger uncertainties in the jet energy calibration during part of the 2018 data taking. For energy and momentum, this spreading corresponds to the largest expected resolution. For charged particle tracks, it includes the uncertainties in the tracking efficiency. The largest change in ε is 1% for a given background rejection efficiency in the muon channel, and 2% in the electron channel.
The procedure for modeling a process in PYTHIA represents another source of systematic uncertainty, since inaccuracies of the model have an impact on the signal acceptance. As a first step in the evaluation of this uncertainty, we consider the p T distribution of the generated signal W bosons, which is corrected to match that obtained from NLO tt MC generated events in POWHEG v2 (as discussed in Section 3). To assess the effect of a systematic change in the p W T reweighting, the p W T spectrum is shifted upwards and downwards by 5%, which corresponds to the maximum uncertainty in the weights used. The performance of the BDT trained and validated through the nominal variables is compared to that of a BDT trained on the nominal variables, and tested using MC events where the p W T -based reweighting is shifted by 5%. The most significant changes in signal efficiency for a given background rejection efficiency are about 2% and 3% in the muon and the electron channels, respectively.
The signal acceptance also depends on the angular spectrum of the particles involved in the W ± → π ± γ decay. To evaluate the uncertainty on ε related to the PYTHIA modeling of the W polarization, we modify this angular spectrum. Specifically, we consider the generated distribution in θ * , the angle between the pion direction in the W boson rest frame and the direction of the W boson momentum in the laboratory. We use sin 2 θ * and cos θ * distributions, corresponding to longitudinal and transverse polarizations of the W boson, respectively. By taking the ratio of each of these to the observed θ * distributions, we obtain two sets of weights that are used to reweight the signal. In similarity with p W T , a BDT is trained on a set of nominal events, and validated on a set that has the signal renormalization implemented. The most significant changes in terms of ε for a given background rejection efficiency are about 3% and 5% for the muon and the electron channels, respectively.
An additional uncertainty of 1.4% arises from the use of scale factors in the simulation, and an uncertainty of 1% covers the effects of charge misidentification in the tracking algorithm. These uncertainties, derived from detector performance studies [27][28][29]37], are summed in quadrature to the statistical and to the other systematic uncertainties, described above, in the signal efficiency, as presented in Table 1.

Results
The m π γ distribution is shown in Fig. 3 for the combination of the two lepton channels and the three data-taking periods. The simulated MC distribution for the signal is given by the dashed red line and corresponds to a 10 −4 branching fraction for the W ± → π ± γ decay. The uncertainties in the data are statistical only. The blue line represents the best fit to the data using the model described in Eq. (2). In the lower plot, the ratio between data and the background component of the MC is shown. The gray bands represent the uncertainty (statistical + systematic) in the MC background.
No significant excess is observed above the expected background. An observed upper limit at 95% CL is set on the branching fraction of the W boson to a pion and a photon using the asymptotic CL s method [38,39]: and the expected upper limit is 0.85 +0.52 −0.29 × 10 −5 . The total uncertainty is dominated by the statistical contributions, which account for ≈80%. The largest systematic uncertainties arise from the measurements of the tt cross section (≈5%) and the integrated luminosity (≈2%).

Summary
A first search is reported for the rare decay W ± → π ± γ at the LHC. The search is based on the proton-proton collision data collected at a center-of-mass energy of 13 TeV at the CMS experiment in 2016-2018, corresponding to an integrated luminosity of 137 fb −1 . Because of the high trigger thresholds for single photons, which make an inclusive search unsuitable at CMS, the measurement is performed using top quark-antiquark pair events, where one of the produced W bosons decays into leptons. This is the first search for W ± → π ± γ that adopts such a strategy. The rare decay is characterized by an isolated track, for which a specific pionisolation variable is defined, and an isolated photon with large transverse momentum. The data are compatible with the background-only hypothesis. The upper limit on the branching fraction of the W boson to a pion and a photon is 1.50 × 10 −5 at the confidence level of 95%.
Since this result is statistically limited, it can be significantly improved with larger integrated luminosities. Our study demonstrates the feasibility of a search for such rare decays of the W bosons at the LHC and defines a search strategy that can be adopted at future hadron colliders. This paves the way to increasingly precise measurements that could, on the one hand, provide effective validation of the QCD factorization formalism and, on the other, offer an alternative way to measure the mass of the W boson.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. [23] CMS Collaboration, "Particle-flow reconstruction and global event description with the CMS detector", JINST 12 (2017) P10003, doi:10.1088/1748-0221/12/10/p10003, arXiv:1706.04965.