Charm-quark Yukawa Coupling in $h\rightarrow c\bar{c}\gamma$ at LHC

It is extremely challenging to probe the charm-quark Yukawa coupling at hadron colliders, primarily due to the large Standard Model (SM) background (including $h\to b\bar b$) and the lack of an effective trigger for the signal $h\to c\bar c$. We examine the feasibility of probing this coupling at the LHC via the Higgs radiative decay $h\rightarrow c\bar{c}\gamma$. The additional photon in the final state may help with signal identification and background suppression. Adopting a refined triggering strategy and utilizing basic machine learning, we find that a coupling limit of about 8 times the SM value may be reached with $2\sigma$ sensitivity after the High-Luminosity LHC (HL-LHC). Our result is comparable and complementary to other projections for direct and indirect probes of $h\to c\bar c$ at the HL-LHC. Without a significant change in detector capabilities, there would be no significant improvement for this search from higher-energy hadron colliders.


Introduction
It is of fundamental importance to establish the pattern of the Higgs boson Yukawa couplings to fermions in order to verify the Standard Model (SM) and seek hints of physics beyond the SM (BSM). The couplings to third-generation fermions have all been observed with over 5σ significance. For top quarks, there is a large indirect contribution to the gluon-gluon fusion production mode and the photon-photon decay mode. However, direct observation is important to ensure there are no BSM quantum corrections to Higgs boson production or decay. Both the ATLAS and CMS collaborations have recently observed the production of top quark pairs in association with the Higgs boson [1,2] as well as Higgs boson decays to bottom quark pairs [3,4]. For leptons, the challenging decay channel $h\to\tau^+\tau^-$ already reached 5σ with the LHC Run 1 data [5], and now ATLAS and CMS have both individually observed this decay mode [6,7]. With the upgrade of the LHC to its high-luminosity phase (HL-LHC), the Higgs coupling measurements to the heaviest generation of fermions will reach an accuracy of about 20% or better [8] and will extend to kinematic regions with high Higgs boson transverse momentum ($p_T^h$) [9,10]. Direct observations of the Higgs couplings to the second generation of fermions will be critically important to confirm the pattern of non-universal Yukawa couplings and to search for deviations from the SM, as predicted in theories with an extended Higgs sector [11]. The channel $h\to\mu^+\mu^-$ is the cleanest Higgs signal of all decay modes [12,13]. Even with a branching fraction as small as $2\times10^{-4}$, ATLAS and CMS are already nearly sensitive to the SM rate [14,15], and a measurement with an accuracy of 13% is expected at the HL-LHC [16]. In contrast to $h\to\mu^+\mu^-$, the second-generation hadronic decay modes of the Higgs boson are very difficult to distinguish from SM backgrounds, including other Higgs boson decays.
While b-jet tagging is a powerful tool for rejecting backgrounds, c-jets are harder to distinguish from b-jets and light jets [17–19], and strange-quark jets are nearly identical to up- and down-quark jets [20,21]. The $h\to c\bar c$ branching ratio is expected to be about 3%, so the challenge is background rejection and triggering, not statistics.
So far, there have been two experimental studies probing the Higgs-charm Yukawa coupling ($y_c$). One approach is to use the clean associated production of the Higgs boson with a vector boson and exploit charm tagging [19]. A key challenge with this method is that the $h\to b\bar b$ contribution is large compared to the $c\bar c$ signal. An optimistic projection for the full HL-LHC dataset suggests that a limit of 6 times the SM rate at 95% confidence level may be achievable [22]. A second approach used exclusive decays of the Higgs boson into a J/ψ and a photon [23]. While this final state can be well separated from backgrounds in the $J/\psi\to\mu^+\mu^-$ channel [24–26], it suffers from a small branching ratio and relies on modeling assumptions to extract the Higgs-charm Yukawa coupling. In particular, the leading contribution to this process is via vector meson dominance, $\gamma^*\to J/\psi$, which is an order of magnitude larger than the contribution involving the charm-quark Yukawa coupling [23,27,28], leading to a weaker upper bound on $y_c$ of about 50 times the SM prediction at the HL-LHC [29].
Another recent proposal for probing the Higgs-charm Yukawa coupling is to study the associated production process $gc\to ch$ [30]. This has the advantage of being independent of the Higgs decay mode, but suffers from a low rate and significant background. Other proposals for direct or indirect probes of first- and second-generation quark-Higgs couplings [31–34] are challenging due to large SM backgrounds and contamination from other Higgs production and decay modes. A global analysis of Higgs decays can also constrain the charm-Higgs Yukawa coupling, with a projected sensitivity of about 6 times the SM expectation [35,36].
It has recently been pointed out that the radiative decay of the Higgs boson to a pair of charm quarks, $h\to c\bar c\gamma$, could be used to constrain the charm-quark Yukawa coupling [37]. The additional photon can help with triggering as well as with suppressing both non-Higgs and Higgs backgrounds. In particular, the electromagnetic coupling would disfavor the down-type quarks, especially the flavor-tagged $b\bar b\gamma$ mode. In this work, we examine the feasibility of a Higgs-charm Yukawa coupling measurement in the $h\to c\bar c\gamma$ channel at the HL-LHC. By proposing an optimal triggering strategy and simulating realistic detector effects, we show that a coupling of about 8 times the SM value may be reached at 95% confidence after the HL-LHC. This approach is complementary to, and competitive with, other methods. We also explore the extent to which an energy upgrade of the LHC (HE-LHC) could improve the sensitivity.
The rest of the paper is organized as follows. In Sec. 2, we consider the features of the signal and background processes and propose an optimal but realistic trigger. In Sec. 3, we perform detailed analyses, including some basic machine learning, to obtain the optimal signal sensitivity. We extend our analyses to the HE-LHC in Sec. 4 and present our conclusions and outlook in Sec. 5.

Table 1: Representative operating points for the c-tagging efficiency ($\epsilon_c$), the b-jet mis-tag rate ($\epsilon_b$), and the light-jet mis-tag rate ($\epsilon_j$).

Trigger Considerations at HL-LHC
We focus on the leading Higgs production channel, gluon fusion, followed by the radiative decay
$$gg\to h\to c\bar c\gamma. \qquad (2.1)$$
The signal is thus characterized by an isolated photon recoiling against two charm-tagged jets, with a three-body invariant mass near the Higgs resonance. The energies of the two charm jets are limited by the Higgs boson mass, and the photon tends to be soft and collinear with one of the charm quarks. Due to the large collision rate (40 MHz), the enormous inelastic pp cross section for producing central activity, and limitations in hardware, most collisions at the LHC are discarded in real time. The trigger system is therefore a key challenge for recording physics processes with relatively soft final states such as $h\to c\bar c\gamma$. The rest of this section explores the impact of triggering on the $h\to c\bar c\gamma$ analysis in the context of the HL-LHC.

Signal and background processes
Tagging jets originating from charm quarks (c-tagging) is challenging, but important for suppressing backgrounds from light Quantum Chromodynamics (QCD) jets and from b-quark jets. Encouragingly, a recent study from ATLAS [19] has shown very promising c-tagging results. Based on the ATLAS result, the three c-tagging working points listed in Table 1 are studied for the $h\to c\bar c\gamma$ search. One of the dominant backgrounds for the $h\to c\bar c\gamma$ search is QCD dijet production in association with a photon, where both jets are (mis-)tagged as c-jets. Similarly, QCD three-jet production also contributes to the background if one of the jets is misidentified as a photon. In addition to these hard-scatter background processes, one or more of the tagged objects could come from an additional, nearly simultaneous pp collision (pile-up). Many sophisticated pile-up mitigation techniques have been proposed [38–47], which can significantly reduce the contamination from pile-up both in the trigger and in offline analysis.
However, no method can eliminate all of the pile-up, and all methods perform worse (if they are applicable at all) at the trigger level. Since pile-up conditions will be extreme at the HL-LHC (typically 200 pile-up collisions per bunch crossing), their contribution to the event rate must be taken into account.
Current and future upgrades of the ATLAS and CMS trigger systems [48,49] will allow for multi-object requirements using offline-like information. In order to obtain a high-efficiency, (relatively) low-rate trigger for $h\to c\bar c\gamma$, we propose a new approach that requires two jets and one photon in the central region with an invariant mass near the Higgs resonance.

Simulation Setup
Since the cross section for Higgs boson production is much smaller than that for multijet production, the trigger rate is dominated by background. In order to estimate the trigger rate, the following background processes are simulated using MadGraph5_aMC@NLO [50], including up to one additional jet matched using the MLM prescription [51]:
$$pp\to j\gamma \quad \text{and} \quad pp\to jj. \qquad (2.2)$$
The parton shower and hadronization are simulated with PYTHIA 6.4.28 [52], and a fast detector simulation is implemented using DELPHES 3 [53] with the detector card delphes_card_ATLAS_PileUp.tcl. Pile-up is modeled by mixing µ = 200 minimum-bias events simulated using PYTHIA with the hard-scatter processes.

The ATLAS and CMS trigger systems consist of a hardware trigger (L1) and a software-based high-level trigger (HLT). While the HLT jet resolution is very similar to offline, the L1 jet momentum resolution is much worse than offline due to the coarser detector granularity and the reduced information available to the reconstruction algorithms. Since the $p_T$ spectrum is steeply falling, the event rate will have a significant contribution from low-$p_T$ events that fluctuate upward. In order to model the L1 jet resolution, a normal random number with a mean of zero and a standard deviation of 13 GeV is added to each jet energy. This additional resolution is estimated from the trigger turn-on curves in Ref. [54] as follows. Consider a jet trigger that requires $p_T^{\rm L1} > X$ GeV. The distribution of the L1 jet $p_T$ given the offline jet $p_T$ should be approximately Gaussian (ignoring effects from the prior) with a mean $\mu$ and standard deviation $\sigma$. Suppose that the trigger is 50% efficient at $p_T^{\rm offline} = Y$. Since the mean and median of a Gaussian are the same, it must be that $\mu = X$ for $p_T^{\rm offline} = Y$. Using Fig. 31a in Ref. [54], this procedure gives the relationship $p_T^{\rm offline} \approx 2.5\, p_T^{\rm L1}$. Now let $Y$ instead denote the offline $p_T$ at which the trigger becomes essentially fully efficient (about 99.9%). This means that the 3σ tail of the Gaussian with $\mu \sim Y/2.5$ is at $X$. Therefore, $\sigma \sim (Y/2.5 - X)/3$. Once again using Fig. 31a in Ref. [54], this procedure gives $\sigma \sim 5$ GeV, approximately independent of $p_T$. Translating this 5 GeV back to the offline scale gives $5\times 2.5 \sim 13$ GeV. Some degradation in this resolution will occur between the LHC and the HL-LHC, but much of the loss from pile-up will be compensated by performance gains from detector upgrades.
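The turn-on-curve argument above can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian L1 response whose mean is the offline $p_T$ divided by the factor 2.5 quoted above; the function names and the example values X = 20 GeV and Y = 87.5 GeV are hypothetical, chosen only so that the arithmetic reproduces σ ≈ 5 GeV, and are not read off Ref. [54].

```python
import random
from statistics import NormalDist
from typing import Optional

# Offline-to-L1 scale factor from the 50%-efficiency point (p_T^offline ~ 2.5 p_T^L1).
SCALE = 2.5


def l1_sigma_from_plateau(x_l1: float, y_plateau: float, scale: float = SCALE) -> float:
    """sigma ~ (Y/scale - X)/3: at offline pT = Y (trigger ~fully efficient),
    the 3-sigma lower tail of the Gaussian L1 response sits at the threshold X."""
    mu_l1 = y_plateau / scale  # mean L1 pT for an offline jet with pT = Y
    return (mu_l1 - x_l1) / 3.0


def turn_on(pt_offline: float, x_l1: float, sigma_l1: float, scale: float = SCALE) -> float:
    """Trigger efficiency P(pT^L1 > X | pT^offline) for a Gaussian L1 response."""
    return 1.0 - NormalDist(pt_offline / scale, sigma_l1).cdf(x_l1)


def smear_l1(pt_offline: float, sigma_offline: float = 13.0,
             rng: Optional[random.Random] = None) -> float:
    """Emulate the coarser L1 jet response by adding a zero-mean Gaussian of
    width sigma_offline (13 GeV on the offline scale); applied to pT here."""
    if rng is None:
        rng = random.Random()
    return pt_offline + rng.gauss(0.0, sigma_offline)


if __name__ == "__main__":
    sigma_l1 = l1_sigma_from_plateau(x_l1=20.0, y_plateau=87.5)  # hypothetical X, Y
    print(f"sigma_L1 = {sigma_l1:.1f} GeV -> offline scale {sigma_l1 * SCALE:.1f} GeV")
    print(f"efficiency at Y: {turn_on(87.5, 20.0, sigma_l1):.4f}")
```

Working on the L1 scale and multiplying by 2.5 at the end mirrors the two-step estimate in the text; the smearing function then applies the resulting offline-scale width to each simulated jet.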
In addition to degrading the resolution of reconstructed jets, pile-up is also a source of jets, both from additional hard multijet events and from random combinations of radiation from multiple soft collisions. Offline, the most effective method for tagging these pile-up jets is to identify the hard-scatter collision vertex and then record the momentum contribution from tracks originating from other vertices. Full-scan tracking and vertexing is not currently available at L1, but both ATLAS and CMS will implement some form of L1 tracking for the HL-LHC [48, 55–60]. Using Ref. [55] as an example, we assume an L1 tracking system with nearly 100% efficiency for central charged-particle tracks with $p_T > 3$ GeV and a $z_0$ resolution of 0.2 cm. We further assume that some timing information will be available at L1, so that no pile-up tracks with $p_T > 3$ GeV enter the analysis. All of these conditions are optimistic, but they are useful for bounding what is achievable with the HL-LHC dataset. Tracks identified as originating from pile-up are removed before jet clustering, so that in a particle-flow-like [61,62] jet reconstruction algorithm, pile-up jets are reconstructed with less than their true energy. To further suppress pile-up jets, the fraction of each jet's transverse momentum carried by tracks is computed:
$$r_c = \frac{\sum p_T^{\rm track}}{p_T^{\rm jet}},$$
where $p_T^{\rm track}$ is the transverse momentum of each L1-reconstructable track in the jet and $p_T^{\rm jet}$ is the transverse momentum of the jet. Large values of $r_c$ correspond to hard-scatter-like jets, while low values are indicative of pile-up jets. Since the sophisticated pile-up mitigation techniques mentioned earlier can be employed with nearly offline-level performance at the HLT, and the pile-up challenge is most severe at L1, the impact of pile-up at the HLT and offline is ignored for the results presented in later sections.
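The track-fraction variable $r_c$ can be sketched as follows. This is a minimal illustration under the optimistic assumptions stated above: every track is taken to carry a perfectly known hard-scatter/pile-up label (standing in for vertexing plus timing), only tracks above 3 GeV are reconstructable at L1, and the cut value `r_min` is hypothetical since no value is quoted here.

```python
from dataclasses import dataclass
from typing import Iterable

L1_TRACK_PT_MIN = 3.0  # GeV, L1 tracking threshold assumed in the text


@dataclass
class Track:
    pt: float                # transverse momentum in GeV
    from_hard_scatter: bool  # assumed known from (optimistic) vertexing + timing


def track_pt_fraction(tracks: Iterable[Track], jet_pt: float,
                      pt_min: float = L1_TRACK_PT_MIN) -> float:
    """r_c = (sum of pT of L1-reconstructable hard-scatter tracks in the jet) / jet pT.
    Pile-up tracks above pt_min are assumed removed; tracks below pt_min are not
    reconstructed at L1, so neither contributes to the numerator."""
    if jet_pt <= 0:
        raise ValueError("jet_pt must be positive")
    track_sum = sum(t.pt for t in tracks if t.pt > pt_min and t.from_hard_scatter)
    return track_sum / jet_pt


def is_hard_scatter_like(tracks: Iterable[Track], jet_pt: float, r_min: float) -> bool:
    """Keep jets with a large track fraction; r_min is a hypothetical cut value."""
    return track_pt_fraction(tracks, jet_pt) > r_min


jet_tracks = [Track(10.0, True), Track(5.0, True), Track(2.0, True), Track(8.0, False)]
print(track_pt_fraction(jet_tracks, jet_pt=50.0))  # 0.3
```

A jet built mostly from neutral pile-up energy has few qualifying tracks and therefore a small $r_c$, which is why a lower bound on $r_c$ acts as a pile-up veto.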
Displaced-vertex reconstruction at L1 is unlikely to be possible with high efficiency, so we assume that no explicit c-tagging is possible at L1. At the HLT, we assume offline-like c-tagging. Flavor tagging does degrade with pile-up, but detector upgrades are expected to compensate for this (Fig. 6 in Ref. [63] and Fig. 19a in Ref. [64]).
The probability for a jet to fake a photon depends on how well isolated photon candidates are required to be. Very stringent isolation requirements result in a purer sample of prompt photons at a cost of signal efficiency, while loose requirements admit many fragmentation photons originating from jets. In our study, we follow the performance evaluation by ATLAS [63], and assume that the fake photon rate would be We further assume that misidentified photons carry 75% of the jet transverse momentum. In our simulation, hard-scatter jets are defined as jets within ∆R < 0.3 of a truth-level jet.
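The jet-to-photon fake model can be sketched as below. This is a minimal illustration, assuming each jet fakes a photon independently; the rate from Ref. [63] is not reproduced in this text, so `fake_rate` is a free placeholder parameter, and the function name is hypothetical. Only the 75% momentum fraction comes from the text.

```python
import random
from typing import List, Optional, Tuple

Jet = Tuple[float, float, float]  # (pt [GeV], eta, phi)

FAKE_PT_FRACTION = 0.75  # fake photon carries 75% of the jet pT (from the text)


def apply_photon_fakes(jets: List[Jet], fake_rate: float,
                       rng: Optional[random.Random] = None) -> Tuple[List[Jet], List[Jet]]:
    """Let each jet independently fake a photon with probability fake_rate.
    Returns (fake_photons, surviving_jets): a faking jet is removed from the
    jet list and replaced by a photon along the same direction."""
    if not 0.0 <= fake_rate <= 1.0:
        raise ValueError("fake_rate must be a probability")
    if rng is None:
        rng = random.Random()
    photons: List[Jet] = []
    survivors: List[Jet] = []
    for pt, eta, phi in jets:
        if rng.random() < fake_rate:
            photons.append((FAKE_PT_FRACTION * pt, eta, phi))
        else:
            survivors.append((pt, eta, phi))
    return photons, survivors
```

Random sampling keeps the sketch short; with small fake rates, assigning each jet a weight equal to the fake rate instead of sampling would make better use of limited simulation statistics.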

Trigger Design
Currently, the L1 trigger has a maximum rate of 100 kHz, while the HLT has a maximum rate of 1 kHz. After the HL-LHC upgrades [65,66], the trigger rates at L1 and HLT are expected to be about 1 MHz and 10 kHz, respectively. It is therefore vital to ensure that the event rates of the processes are within the capacities of both the L1 trigger and the HLT.
For the L1 trigger, we require two jets and a photon with transverse momenta To suppress the QCD background and keep the L1 trigger rate under control, we make use of the fact that the three final-state objects come from the decay of the Higgs resonance. Therefore, we also require the invariant mass of the three trigger objects at L1 to be
$$90\ \text{GeV} < M_{jj\gamma} < 160\ \text{GeV}. \qquad (2.9)$$
As the two jets come from the Higgs decay and tend not to have high transverse momenta, they are often not the two leading jets at L1. Therefore, we require the two candidate jets to be among the 5 hardest jets in each event.
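The L1 selection logic (a jet pair among the five hardest jets, plus a photon, with $90 < M_{jj\gamma} < 160$ GeV) can be sketched as follows. This is a minimal illustration treating all objects as massless; the jet and photon $p_T$ thresholds are not reproduced in this text, so they are required arguments here, and the values used in the example below (25 and 15 GeV) are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Obj = Tuple[float, float, float]  # (pt [GeV], eta, phi), treated as massless


def four_vector(obj: Obj) -> Tuple[float, float, float, float]:
    """(E, px, py, pz) of a massless object."""
    pt, eta, phi = obj
    return (pt * math.cosh(eta), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta))


def invariant_mass(*objs: Obj) -> float:
    e, px, py, pz = (sum(c) for c in zip(*(four_vector(o) for o in objs)))
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))


def passes_l1(jets: Sequence[Obj], photon: Obj, jet_pt_min: float, photon_pt_min: float,
              m_low: float = 90.0, m_high: float = 160.0, n_leading: int = 5) -> bool:
    """Accept the event if any jet pair among the n_leading hardest jets, together
    with the photon, passes the pT thresholds and has m_low < M_jjgamma < m_high."""
    if photon[0] <= photon_pt_min:
        return False
    leading = sorted(jets, key=lambda j: j[0], reverse=True)[:n_leading]
    for j1, j2 in combinations(leading, 2):
        if min(j1[0], j2[0]) <= jet_pt_min:
            continue
        if m_low < invariant_mass(j1, j2, photon) < m_high:
            return True
    return False


# Two back-to-back 60 GeV jets plus a 20 GeV photon: M_jjgamma ~ 138.6 GeV.
jets = [(60.0, 0.0, 0.0), (60.0, 0.0, math.pi)]
print(passes_l1(jets, (20.0, 0.0, math.pi / 2), jet_pt_min=25.0, photon_pt_min=15.0))
```

Looping over all pairs among the five leading jets, rather than only the top two, is what lets the trigger recover signal jets that are outranked by harder pile-up or ISR jets.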
The corresponding trigger rate is listed in the first row of Table 2. The trigger rate is calculated using the instantaneous luminosity
$$\mathcal{L} = 5\times10^{34}\ \text{cm}^{-2}\,\text{s}^{-1} = 5\times10^{-5}\ \text{fb}^{-1}\,\text{s}^{-1} \qquad (2.10)$$
at the HL-LHC [67]. We note that the dominant contribution at L1 comes from QCD multi-light-jet production with a jet faking a photon. As shown in Table 2, the trigger proposed above would occupy less than 1% of the total bandwidth, and thus is plausible to implement as part of the HL-LHC trigger menus of ATLAS and CMS.
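The rate arithmetic behind Eq. (2.10) is simply rate = σ × $\mathcal{L}$ after converting units (1 fb$^{-1}$ = $10^{39}$ cm$^{-2}$). A minimal sketch follows; the cross section used in the example is hypothetical and is not a value from Table 2.

```python
CM2_PER_INV_FB = 1e39     # 1 fb^-1 = 1e39 cm^-2
HL_LHC_LUMI_CM2S = 5e34   # instantaneous luminosity, Eq. (2.10)
L1_BUDGET_HZ = 1e6        # expected HL-LHC L1 output rate (~1 MHz)
HLT_BUDGET_HZ = 1e4       # expected HL-LHC HLT output rate (~10 kHz)


def lumi_in_inv_fb_per_s(lumi_cm2s: float) -> float:
    """Convert cm^-2 s^-1 to fb^-1 s^-1."""
    return lumi_cm2s / CM2_PER_INV_FB


def trigger_rate_hz(sigma_fb: float, lumi_cm2s: float = HL_LHC_LUMI_CM2S) -> float:
    """Rate = (cross section surviving the trigger selection) x (instantaneous luminosity)."""
    return sigma_fb * lumi_in_inv_fb_per_s(lumi_cm2s)


def bandwidth_fraction(rate_hz: float, budget_hz: float = L1_BUDGET_HZ) -> float:
    return rate_hz / budget_hz


def expected_events(sigma_fb: float, int_lumi_inv_fb: float = 3000.0) -> float:
    """Event count for an integrated luminosity (default 3 ab^-1 = 3000 fb^-1)."""
    return sigma_fb * int_lumi_inv_fb


# Hypothetical: 0.1 microbarn (1e8 fb) surviving the L1 selection.
rate = trigger_rate_hz(1e8)
print(f"{rate:.0f} Hz, {100 * bandwidth_fraction(rate):.2f}% of the L1 budget")
```

With these inputs a 0.1 μb selection gives 5 kHz, i.e. 0.5% of a 1 MHz L1 budget, which illustrates the scale behind the "less than 1%" statement.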

Cut-based Analysis
To gain physical intuition, we start with a simple analysis that uses only thresholds on various kinematic quantities ("cut-based"). In addition to the trigger requirements described above, we select signal events with a basic threshold on the leading-jet transverse momentum.

Figure 1: Distributions of (a) the smaller of the separations between the candidate jets and the photon; (b) the three-body invariant mass of the two candidate jets and the photon. Signal (blue solid) and background (red dashed) are both normalized to unit area.

Figure 1a shows the normalized distribution of the smaller of the two photon-jet separations. As the photon in the signal process comes from final-state radiation, it tends to be close in angle to one of the jets. Therefore, to optimize the signal significance, we further require the smaller of the separations between the candidate jets and the photon to satisfy $\Delta R^{\min}_{j\gamma} < 1.8$.
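The $\Delta R^{\min}_{j\gamma}$ requirement can be sketched as below, using the usual definition $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ with $\Delta\phi$ wrapped into $[-\pi, \pi)$. The function names are illustrative; only the cut value 1.8 comes from the text.

```python
import math
from typing import Sequence, Tuple

Direction = Tuple[float, float]  # (eta, phi)


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(a: Direction, b: Direction) -> float:
    return math.hypot(a[0] - b[0], delta_phi(a[1], b[1]))


def delta_r_min(jets: Sequence[Direction], photon: Direction) -> float:
    """Smaller of the separations between the candidate jets and the photon."""
    return min(delta_r(j, photon) for j in jets)


def passes_drmin(jets: Sequence[Direction], photon: Direction, cut: float = 1.8) -> bool:
    return delta_r_min(jets, photon) < cut
```

The wrap-around in `delta_phi` matters: without it, two objects at φ = 3.0 and φ = −3.0 would appear 6.0 apart instead of about 0.28.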
We quantify the sensitivity using a profile likelihood fit to the three-body invariant mass in the range
$$60\ \text{GeV} < M_{jj\gamma} < 160\ \text{GeV}, \qquad (3.3)$$
as shown in Fig. 1b, with 5 GeV bins, in two event categories, defined by whether 1 or 2 of the Higgs candidate jets are c-tagged.
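For orientation, a binned fit of this kind can be approximated with the standard asymptotic formula for the median expected CL$_s$ limit. The sketch below is not the fit used for the results here: it assumes no nuisance parameters (fixed background, no systematics) and $s_i \ll b_i$, which holds for the tiny S/B of this search, so that $\sigma_\mu \approx 1/\sqrt{\sum_i s_i^2/b_i}$ and $\mu_{\rm up} = \sigma_\mu\,\Phi^{-1}(1-\alpha/2)$. Bins from both c-tag categories are simply concatenated; the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence


def expected_cls_upper_limit(signal: Sequence[float], background: Sequence[float],
                             cl: float = 0.95) -> float:
    """Median expected CLs upper limit on the signal strength mu (asymptotic,
    no nuisance parameters, s_i << b_i):
        sigma_mu = 1 / sqrt(sum_i s_i^2 / b_i),  mu_up = sigma_mu * Phi^-1(1 - alpha/2)."""
    if len(signal) != len(background):
        raise ValueError("signal and background must share the same binning")
    fisher = sum(s * s / b for s, b in zip(signal, background) if b > 0)
    if fisher <= 0:
        raise ValueError("no sensitive bins")
    alpha = 1.0 - cl
    return NormalDist().inv_cdf(1.0 - 0.5 * alpha) / math.sqrt(fisher)


# Binning of Eq. (3.3): 5 GeV bins from 60 to 160 GeV -> 20 bins per category.
edges = list(range(60, 165, 5))
print(len(edges) - 1, "bins per category")
```

Because the limit scales as $1/\sqrt{\sum_i s_i^2/b_i}$, bins near the Higgs peak with the largest $s_i^2/b_i$ dominate, which is why the mass window and the c-tag categorization carry most of the sensitivity.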
The expected 95% CL$_s$ [68] upper limit (approximately at the 2σ level) on the signal strength, in the absence of systematic uncertainties, is found to be
$$\mu < 106,\ 88,\ 86, \qquad (3.4)$$
for operating points I, II, III with a luminosity of 3 ab$^{-1}$. If BSM physics significantly modifies the charm-Yukawa coupling, which can be parametrized in the κ-scheme by $\kappa_c = y_c/y_c^{\rm SM}$, then the above upper limit can be translated into an upper limit on $\kappa_c$.

The expected numbers of events and event rates in the range $100 < M_{jj\gamma} < 140$ GeV are summarized in Table 2 for the different event categories and the c-tagging working points described in Table 1. The third column shows the numbers of events for $h\to c\bar c\gamma$ through QED radiation. The signal-to-background ratio S/B is between $10^{-6}$ and $10^{-5}$. As the background is dominated by QCD multijet processes, it would likely be estimated using data-driven techniques. The resulting systematic uncertainties may not be small, but would likely be comparable to or smaller than the large relative statistical uncertainty on the signal. We also note that, although we aimed to optimize the light-jet rejection, the yields of the background process $h\to b\bar b\gamma$ due to mis-tagging are about 1.5–3 times larger than those of $h\to c\bar c\gamma$ for the different c-tagging working points, comparable to previous studies.

Table 2: Expected numbers of events of the signal and background, and event rates, in the range $100 < M_{jj\gamma} < 140$ GeV at the HL-LHC with $L = 3$ ab$^{-1}$. The first row gives the event rate at L1, with only the requirements in Sec. 2 applied. Systematic uncertainties are not accounted for in the significance calculation in the last column.

Machine Learning Analysis
In order to study the benefit of a more complex analysis approach, a boosted decision tree (BDT) is trained to distinguish the Higgs signal from the multijet background. The BDT is trained using XGBoost [69] with 5-fold cross-validation. The following 13 input features are used for training: Even though $M_{jj\gamma}$ is the most important feature, it is not explicitly provided to the BDT, in order to minimize bias in the distribution used for the profile likelihood fit in the range of Eq. (3.3) for extracting the expected upper limit. The distribution of the BDT output for signal and background, along with a receiver operating characteristic (ROC) curve, is shown in Fig. 2. The two most important features used by the BDT are $p_{T,j}^{\max}$ and $\Delta R^{\min}_{j\gamma}$, which are also the features used to form the simple event selection in the previous section.
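The two pieces of bookkeeping around the BDT, the 5-fold split and the ROC summary, can be sketched without any ML library. This is a minimal illustration, not the training code used here: the classifier itself is omitted (with XGBoost one would, for example, fit an `xgboost.XGBClassifier` on each training split and score the held-out fold), and the function names are hypothetical. The AUC uses the Mann-Whitney form, i.e. the probability that a random signal event outscores a random background event.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle event indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists. Every event lands in exactly one test
    fold, so each event is scored by a model that never saw it in training."""
    folds = kfold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def roc_auc(sig_scores: Sequence[float], bkg_scores: Sequence[float]) -> float:
    """Area under the ROC curve = P(signal score > background score),
    counting ties as 1/2. O(n*m), which is fine for a sketch."""
    wins = 0.0
    for s in sig_scores:
        for b in bkg_scores:
            wins += 1.0 if s > b else (0.5 if s == b else 0.0)
    return wins / (len(sig_scores) * len(bkg_scores))
```

Scoring every event out-of-fold is what allows the full simulated sample to be reused for the subsequent likelihood fit without the BDT having been trained on the same events.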
Using a selection based on the BDT, the expected 95% CL$_s$ upper limit on the signal strength in the absence of systematic uncertainties is found to be
$$\mu < 91,\ 77,\ 75 \quad\Rightarrow\quad \kappa_c < 9.6,\ 8.8,\ 8.6, \qquad (3.9)$$
for operating points I, II, III with a luminosity of 3 ab$^{-1}$. This is a modest improvement of about 10% over the cut-based result. Further gains from multivariate approaches may be possible, but will likely require advances in photon identification, pile-up mitigation, and c-tagging using low-level information. Given that the correct objects are identified, the $M_{jj\gamma}$ distribution already captures most of the information available for separating signal and background.
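The step from $\mu$ to $\kappa_c$ in Eq. (3.9) can be approximated by assuming that only the Yukawa-induced rate changes, so that $\mu \approx \kappa_c^2$. This is an assumption for illustration: it neglects the change in the total Higgs width and any non-Yukawa contributions to $h\to c\bar c\gamma$. Under it, the BDT limits map to $\kappa_c \approx$ 9.54, 8.77, 8.66, within about 1% of the quoted values.

```python
import math


def kappa_c_limit(mu_limit: float) -> float:
    """Assumes mu ~ kappa_c**2, neglecting changes to the total Higgs width
    and non-Yukawa contributions to h -> c cbar gamma."""
    if mu_limit < 0:
        raise ValueError("mu_limit must be non-negative")
    return math.sqrt(mu_limit)


for point, mu in zip(("I", "II", "III"), (91, 77, 75)):
    print(point, round(kappa_c_limit(mu), 2))
```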

HE-LHC Projection
Given the recent proposal of an energy upgrade of the LHC (HE-LHC) operating at $\sqrt{s} = 27$ TeV [77] after the high-luminosity phase, it is informative to estimate the potential reach for the radiative decay $h\to c\bar c\gamma$. However, this is non-trivial without knowing the pile-up conditions and the detector performance in the new environment. For the purpose of illustration, we therefore give only a crude projection, assuming an environment similar to that of the HL-LHC studies above. We consider the option with luminosity $L = 3$ ab$^{-1}$ and the same pile-up µ = 200. We also assume the same L1 trigger rate.
To compensate for the larger and harder background at 27 TeV, we raise the trigger thresholds to
$$p_{T,j} > 40\ \text{GeV}, \qquad p_{T,\gamma} > 23\ \text{GeV}, \qquad (4.1)$$
in order to maintain the same L1 event rate. As future experiments would come with significant improvements, we relax the isolation cut in Eq. (2.5) and conservatively assume that the same fake photon rate can be achieved while the photon isolation efficiency remains unchanged for photons with $p_T > 20$ GeV. The expected 95% CL$_s$ upper limit on the signal strength from the cut-based analysis is found to be
$$\mu < 98,\ 82,\ 81 \quad\Rightarrow\quad \kappa_c < 9.9,\ 9.0,\ 9.0.$$
We thus find no significant improvement in probing the charm-quark Yukawa coupling at the high-energy upgrade of the LHC, since the sensitivity is mostly limited by the L1 rate, which in this work is assumed to stay the same at the HE-LHC. We reiterate that the projection here should be considered in the context of our assumptions, since the results depend sensitively on the unknown pile-up and detector performance. Given these assumptions, there is room for potential improvement should the HE-LHC experiments be constructed.
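Why a larger, harder background forces higher thresholds at fixed L1 rate can be illustrated with a toy power-law spectrum. This is a single-object toy under stated assumptions, not how Eq. (4.1) was obtained (that came from the full simulation of the three-object trigger): if $dN/dp_T \propto p_T^{-n}$ with $n > 1$, the rate above a threshold $t$ scales as $t^{1-n}$, so a cross-section increase by a factor $R$ is compensated by $t' = t\,R^{1/(n-1)}$. The values of $n$ and $R$ in the example are hypothetical.

```python
def rescaled_threshold(pt_threshold: float, rate_factor: float, n: float) -> float:
    """Toy model: dN/dpT proportional to pT**(-n) with n > 1, so the rate above a
    threshold t scales as t**(1 - n). If the cross section grows by rate_factor,
    keeping the rate fixed needs t' = t * rate_factor**(1 / (n - 1))."""
    if n <= 1:
        raise ValueError("spectrum must fall faster than 1/pT (n > 1)")
    if rate_factor <= 0:
        raise ValueError("rate_factor must be positive")
    return pt_threshold * rate_factor ** (1.0 / (n - 1.0))


# Hypothetical: a 16x larger rate with an n = 5 spectrum doubles the threshold.
print(rescaled_threshold(30.0, 16.0, 5.0))
```

The steeper the spectrum (larger $n$), the smaller the threshold increase needed; raising thresholds on soft signal objects nonetheless costs signal efficiency, which is consistent with the absence of a significant gain at 27 TeV.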

Conclusion
While it is of fundamental importance to probe the charm-quark Yukawa coupling, doing so is extremely challenging at hadron colliders, primarily due to the SM background and the lack of an effective trigger for the signal $h\to c\bar c$. We pointed out that the branching fraction for the Higgs radiative decay $h\to c\bar c\gamma$ is about $4\times10^{-4}$ and thus would yield a large number of events at the HL-LHC. The additional photon in the final state may help with signal identification and background suppression. For instance, the electromagnetic coupling would disfavor the down-type quarks, especially the flavor-tagged $b\bar b\gamma$ mode. We thus proposed to take advantage of the radiative decay and examined the feasibility of probing the charm-quark Yukawa coupling. We proposed a refined triggering strategy and an analysis that combines many event features in a boosted decision tree. Our results can be summarized as follows.
• A traditional cut-based analysis for identifying the signal $h\to c\bar c\gamma$ yields a sensitivity to a coupling of about 9 times the SM value at the 2σ level at the HL-LHC.
• A boosted decision tree improves the sensitivity by about 10%, reaching a coupling limit of about 8 times the SM value at 2σ.
• As a crude estimate for the sensitivity reach at the HE-LHC, assuming the same pile-up and L1 trigger rate, we found no significant improvement over the HL-LHC results. There is room for improvement given the assumptions about the HE-LHC experiments and running conditions.
Our results with semi-realistic simulations are comparable to other related studies [30–36] and better than the $h\to J/\psi+\gamma$ channel [29] in constraining the charm-Yukawa coupling. Although our result is slightly weaker than the sensitivity of about 3 times the SM value from the ATLAS direct search [22], there are uncertainties in both analyses due to effects missing in one or the other, so more detailed experimental studies would be required to know which method will achieve the best precision. Multiple complementary approaches are needed to improve the sensitivity to test the SM prediction. We close with a few remarks on possible future improvements. Since one of the limiting factors is the huge L1 event rate from the QCD multijet background, better photon identification would significantly improve the results. Furthermore, improved c-tagging would also enhance the sensitivity, and machine learning techniques would be more beneficial there. Finally, extending the analysis to other Higgs production modes and different kinematic regimes may help with the trigger challenge.