Probing non-standard bb̄h interaction at the LHC at √s = 13 TeV

In the detailed probe of Higgs boson properties at the Large Hadron Collider, and in looking for new physics signatures in the electroweak symmetry breaking sector, the bottom quark Yukawa coupling plays a crucial role. We investigate a possible departure of the bb̄h coupling from its standard model value, phenomenologically expressed in terms of a modification factor α_b, in bb̄-associated production of the 125-GeV scalar at the high-luminosity LHC. In a next-to-leading order estimate, we make use of a gradient boosting algorithm to improve in statistical significance upon a cut-based analysis. We find that it is possible to probe down to α_b = 3 with more than 5σ significance, with L = 3000 fb⁻¹ and √s = 13 TeV, while the achievable limit on α_b at 95% C.L. is ±1.95.


Introduction
Whether the 125 GeV scalar discovered in 2012 [1,2] is 'the Higgs' or 'a Higgs' is still an unresolved issue. Most importantly, its interaction strengths with the relatively heavy fermions are not yet known precisely enough, in contrast to its interactions with gauge boson pairs, where the uncertainty is much smaller [3]. For example, the signal strength defined as μ_b = σ(bb̄)/σ(bb̄)_SM, where the denominator corresponds to the rate predicted by the standard model (SM), lies in the range 0.84-1.24 [4]. Thus there is considerable scope for variation with respect to the standard model value.
Here, we propose one way of reducing this uncertainty, by taking a fresh look at h production in association with bb̄ at the high-luminosity Large Hadron Collider (HL-LHC).
The bb̄-associated production of the Higgs has already been studied [5][6][7][8][9][10][11][12][13][14][15], and the conclusion is that the rates are too small to make any difference, as far as the SM interaction is concerned. However, the rather large error bar keeps alive the possibility of enhancement in the presence of new physics. This is reflected, for example, in two Higgs doublet models (2HDM), where regions in the parameter space with a large b-coupling of the 125 GeV scalar are still consistent with all experiments [16]. It is important, therefore, to look for clear signatures of such enhancement as the stamp of new physics.
Taking a model-independent standpoint, let us parametrize the modification factor for the bb̄h interaction strength by α_b, treated here as real, as
$$y_b = \alpha_b\,(y_b)_{\rm SM}. \qquad (1.1)$$
Here (y_b)_SM = √2 m_b/v is the SM bottom Yukawa coupling (v = 246 GeV is the vacuum expectation value), and y_b is the bottom Yukawa coupling in a new physics model. The analysis of Higgs-p_T data with ∫L dt = 35.9 fb⁻¹ [17] already restricts α_b and α_c (its analogue for the charm) to −1.1 ≤ α_b ≤ +1.1, −4.9 ≤ α_c ≤ +4.8 at 95% C.L. However, no other non-standard Higgs interaction is allowed in such an analysis, and thus the contributions to h → ZZ, γγ bring in stringent constraints. Dropping this restrictive assumption and allowing for 'nuisance parameters' relaxes the corresponding ranges to [−8.5, 18.0] for α_b and [−33.0, 38.0] for α_c. A more recent study [18] in the context of the high-luminosity LHC, running up to 3000 fb⁻¹, yields the projected limits −2.0 ≤ α_b ≤ +4.0, −10.0 ≤ α_c ≤ +10.0 at 95% C.L., when other non-standard interactions are not forbidden and the branching ratios in the ZZ and γγ channels are not used as prior constraints.
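As a numerical illustration of the parametrisation above, the following minimal sketch evaluates (y_b)_SM = √2 m_b/v and the rescaled coupling for a given α_b. The function names are hypothetical, and the bottom mass value used in the example is an assumption, since the text does not state which m_b is adopted.

```python
import math

V_EW = 246.0  # GeV, vacuum expectation value quoted in the text


def sm_bottom_yukawa(m_b, v=V_EW):
    """SM bottom Yukawa coupling, (y_b)_SM = sqrt(2) * m_b / v."""
    return math.sqrt(2.0) * m_b / v


def modified_bottom_yukawa(alpha_b, m_b, v=V_EW):
    """Rescaled coupling y_b = alpha_b * (y_b)_SM for a real alpha_b."""
    return alpha_b * sm_bottom_yukawa(m_b, v)


# Illustrative input only: m_b = 4.18 GeV is an assumed value,
# not one taken from the paper.
M_B_ASSUMED = 4.18
y_sm = sm_bottom_yukawa(M_B_ASSUMED)
y_np = modified_bottom_yukawa(3.0, M_B_ASSUMED)
```

With these inputs the SM coupling is of order 0.02, so even α_b = 3 corresponds to a small coupling in absolute terms.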
We show here that α_b can be pinned down to an even narrower range by considering pp → bb̄h in the high-luminosity run. In this channel, significant enhancement takes place at the production level itself for large α_b. This is an advantage, since the level of enhancement does not saturate with increasing α_b, unlike the effect on the branching ratio in the bb̄ channel when the anomalous bb̄h coupling shows up in decays alone.
The resulting signal, where one looks for four b-jets with two of them close to the h-peak, is enhanced substantially for α_b → 3.0. However, it is also plagued by backgrounds, including four b-jets from QCD, bb̄Z production, and QCD production of 2b2c, with the two c-quark jets faking b-jets.
The backgrounds receive larger next-to-leading order (NLO) QCD corrections than the signal, thus making the signal significance smaller at NLO than at leading order (LO). Our analysis reveals how the resulting loss in signal significance due to the NLO QCD effects can be ameliorated by adopting an algorithm based on Boosted Decision Trees (BDT), in particular the gradient boosting technique.
In section 2, we outline the framework within which we operate, with α_b (and α_c) taken as purely phenomenological parameters, with no bar prima facie on other non-standard interactions. We also discuss the signal and all major irreducible background processes involved in the present analysis. Sections 3 and 4 report, respectively, on the cut-based and the BDT-based machine learning analyses. We summarise and conclude in section 5.

The parametrisation of anomalous couplings
We are interested in bb̄-associated Higgs production followed by the Higgs decaying to a pair of b quarks at the LHC, thus resulting in a 4b final state. Representative Feynman diagrams for bb̄h production at the LHC are shown in Figure 1; the h → bb̄ decay is not shown in the diagrams. Apart from the production process, α_b also appears in the h → bb̄ decay vertex, so the total signal cross section for a given α_b is enhanced over the SM cross section. The enhancement factor for the signal cross section over the SM is shown in Figure 2 as a function of α_b for B(h → bb̄) ≈ 60% [19]. The solid/blue line represents the factor when the new physics effect is accounted for only in the production part; the dashed/green line represents the enhancement factor when the new physics is accounted for in both the production and the decay process.
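The two enhancement factors described above can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the exact expression used in this work: the production rate is taken to scale as α_b², and in the decay only the h → bb̄ partial width is rescaled by α_b² while all other partial widths keep their SM values. The function names are hypothetical.

```python
def production_enhancement(alpha_b):
    """Enhancement when alpha_b enters the production vertex only (assumed ~ alpha_b^2)."""
    return alpha_b ** 2


def branching_ratio_bb(alpha_b, br_sm=0.60):
    """B(h -> bb) when only the bb partial width is rescaled by alpha_b^2.

    Assumption: all other Higgs partial widths stay at their SM values.
    """
    a2 = alpha_b ** 2
    return a2 * br_sm / (1.0 + br_sm * (a2 - 1.0))


def total_enhancement(alpha_b, br_sm=0.60):
    """Enhancement of sigma x B over the SM with alpha_b in production and decay."""
    return production_enhancement(alpha_b) * branching_ratio_bb(alpha_b, br_sm) / br_sm


prod_only = production_enhancement(3.0)   # 9.0
prod_and_decay = total_enhancement(3.0)   # 81 / 5.8, roughly 14
```

Under these assumptions the branching ratio saturates towards 1 for large α_b, whereas the production factor keeps growing as α_b², which is the behaviour the text attributes to this channel.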
The major backgrounds to the 4b final state come from QCD 4b-jets, QCD 2b2c with the c-quarks faking b-jets, bb̄Z production, and hZ production. We ignore the QCD 4j process, as the probability of four light jets faking four b-jets is insignificantly small. The bb̄Z background, whose Feynman diagrams are shown in Figure 3, has the same topology as the bb̄h signal and is thus expected to be irreducible. The QCD and the hZ backgrounds, however, are expected to be reducible, as their topology differs from that of the signal. The leading order cross sections for the signal and the backgrounds for the 4b final state, estimated in the MadGraph5_aMC@NLO v2.6.4 (mg5_aMC) [20] package with generator level cuts, are given in Table 1. We use a fixed renormalization (μ_R) and factorization (μ_F) scale of μ_R = μ_F = μ_0 = (2m_b + m_h)/2 for the signal as well as for the backgrounds, motivated by the bb̄h production topology. The scale uncertainties are shown in Table 1.
Figure 3. Representative Feynman diagram for the topologically very similar background process through bb̄Z production at the LHC at LO.
Table 1. Parton level cross sections of the signal and the major background processes at the leading order (LO), after applying the generator level event selection cuts.
…: 636.098 +60.6% −35.4% (scale) ± 12.6% (PDF)
hZ → 4b: 0.01764 +4.7% −5.8% (scale) ± 6.1% (PDF)
Table 2. Expected number of events for the signal as well as the backgrounds and signal significance (S/√B) at L = 3000 fb⁻¹ with α_b = 3 at LO as well as at NLO.
We use the NNPDF3.0 [21] sets of parton distribution functions (PDFs) with α_s(m_Z) = 0.118 for our calculations. A branching ratio of 60% is used for the h → bb̄ decay [19] with m_h = 125 GeV.

Cut-based analysis
We generated events for the signal and the backgrounds in mg5_aMC at LO and NLO with the chosen renormalization and factorization scale. The QCD-2b2c background, however, is generated only at LO, and it is used in the NLO analysis with a k-factor of 1.4 taken from the QCD-4b background. The showering and hadronization of the events are performed by PYTHIA8 [22], followed by detector simulation with Delphes-3.4.2 [23]. We estimated the expected number of events with four b-tagged jets for the signal with α_b = 3 and for the backgrounds after detector simulation at an integrated luminosity of L = 3000 fb⁻¹ for two kinematical regions, cut1 and cut2 (Eq. (3.1)), and present them in Table 2 for μ_R = μ_F = μ_0 = (2m_b + m_h)/2. For cut2, we select events with at least one b-jet pair with invariant mass in the range [100, 150] GeV, thus emulating a Higgs candidate. We calculate the signal significance, defined as S/√B with S the number of signal events and B the total number of background events, for the two cut regions; the values are shown in the lowest row of Table 2. A bb̄h signal with α_b = 3 can be observed with a significance of 4.54 (4.21) at LO (NLO) in the cut2 region at an integrated luminosity of L = 3000 fb⁻¹ for μ_R = μ_F = μ_0 = (2m_b + m_h)/2. The signal significances for the other renormalization and factorization scales, namely μ_R = μ_F = μ_0/2 and 2μ_0, are shown in the next section for comparison.
Since the QCD corrections are much smaller for the signal than for the QCD backgrounds, and the shapes of the distributions of the variables are similar at LO and NLO, the signal significance is smaller at NLO than at LO. Beyond the cut2 region, further cuts on variables of the b-jets such as p_T, H_T, m_4b, and E_T do not improve the signal significance. These variables, however, may improve the signal significance in certain combinations, which we explore with the gradient boosting technique in the next section.
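The Higgs-candidate requirement and the significance definition used above can be sketched as follows. This is a minimal illustration, assuming each b-jet is given as (p_T, η, φ, mass) in GeV and that the window is [100, 150] GeV; the helper names are hypothetical and the full cut definitions of the analysis are not reproduced here.

```python
import math
from itertools import combinations


def four_vector(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) in GeV from collider coordinates."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two four-vectors."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def has_higgs_candidate(bjets, m_low=100.0, m_high=150.0):
    """True if at least one b-jet pair falls inside the mass window."""
    return any(m_low <= invariant_mass(a, b) <= m_high
               for a, b in combinations(bjets, 2))


def significance(n_signal, n_background):
    """Simple significance estimate S / sqrt(B)."""
    return n_signal / math.sqrt(n_background)


# Toy event: two back-to-back 62.5 GeV jets give a pair mass of 125 GeV.
toy_event = [four_vector(62.5, 0.0, 0.0), four_vector(62.5, 0.0, math.pi),
             four_vector(20.0, 0.0, 0.1), four_vector(20.0, 0.0, 0.2)]
passes = has_higgs_candidate(toy_event)
```

The event yields fed to `significance` would be the luminosity-weighted counts of Table 2; the toy event here only exercises the pair-mass logic.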

Analysis based on the gradient boosting technique
After estimating the maximally achievable signal significance with a simple cut-based analysis (cut2 in Eq. (3.1)), we further explore the possibility of improving the significance with a machine learning technique, namely Gradient Boosted Decision Trees (gradient BDT) [24], employing various kinematical variables. We use the package XGBoost [24] as the toolkit for the gradient boosting, and construct kinematical features as input for it.
We use an equal number of signal and background events to classify them using the train module of XGBoost. The backgrounds are mixed in the ratio of their corresponding rates in the cut2 region given in Table 2. We use 80% of the total dataset for training and the remaining 20% for testing. At first, we vary the XGBoost parameters to find the combination that gives the maximum accuracy in classifying the signal and the backgrounds. We obtain a maximum accuracy of 69.13% ± 0.44% (1σ) for LO events and 68.06% ± 0.23% (1σ) for NLO events at μ_R = μ_F = μ_0 for the following combination of parameter values [25]:
• Step size shrinkage: η = 0.1,
• Maximum depth of a tree: max_depth = 50,
• Subsample ratio of the training instances: subsample = 0.9,
• Subsample ratio of columns when constructing each tree: colsample_bytree = 0.3,
• Minimum loss reduction required to make a further partition on a leaf node of the tree: γ = 1.0,
• L2 regularization term on weights: λ = 50.0,
• L1 regularization term on weights: α = 1.0,
• Number of parallel trees constructed during each iteration: num_parallel_tree = 8.
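The training setup described above can be sketched as follows. This is a minimal illustration, not the analysis code: the helper names, the `binary:logistic` objective, and the number of boosting rounds are assumptions that the text does not specify, while the remaining parameter values are those listed above. The data handling uses only the standard library; `xgboost` and `numpy` are imported only inside the training helper.

```python
import random

# Parameter values quoted in the text; "objective" is an assumption.
XGB_PARAMS = {
    "objective": "binary:logistic",
    "eta": 0.1,
    "max_depth": 50,
    "subsample": 0.9,
    "colsample_bytree": 0.3,
    "gamma": 1.0,
    "lambda": 50.0,
    "alpha": 1.0,
    "num_parallel_tree": 8,
}


def mix_backgrounds(samples, rates, n_total, rng):
    """Draw about n_total background events, each sample in proportion to its rate."""
    total_rate = sum(rates.values())
    mixed = []
    for name, events in samples.items():
        n_draw = round(n_total * rates[name] / total_rate)
        mixed.extend(rng.sample(events, min(n_draw, len(events))))
    return mixed


def split_80_20(events, labels, rng, train_fraction=0.8):
    """Shuffle and split into training and testing parts."""
    order = list(range(len(events)))
    rng.shuffle(order)
    n_train = int(train_fraction * len(order))
    train, test = order[:n_train], order[n_train:]
    return ([events[i] for i in train], [labels[i] for i in train],
            [events[i] for i in test], [labels[i] for i in test])


def train_and_score(x_train, y_train, x_test, num_boost_round=100):
    """Train a booster and return scores for x_test.

    num_boost_round is a hypothetical choice, not taken from the text.
    """
    import numpy as np
    import xgboost as xgb
    dtrain = xgb.DMatrix(np.asarray(x_train, dtype=float), label=y_train)
    dtest = xgb.DMatrix(np.asarray(x_test, dtype=float))
    booster = xgb.train(XGB_PARAMS, dtrain, num_boost_round=num_boost_round)
    return booster.predict(dtest)
```

In use, the signal sample and the mixed background sample would be truncated to equal size, labelled 1 and 0, concatenated, and passed through `split_80_20` before calling `train_and_score` on rows of kinematical features.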
The final probability distributions of the BDT output (XGBoost score) for the signal and the total backgrounds are shown in the top row of Figure 4. The corresponding significances are listed in Table 3, in the second and fourth columns for LO and NLO, respectively. Table 3 also contains results for the renormalization and factorization scale choices μ_0/2 and 2μ_0 at NLO, for the reason discussed below.
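One common way to turn a classifier score distribution into a significance is to scan a lower threshold on the score and keep the value that maximises S/√B. The sketch below illustrates that generic procedure; whether the significances quoted here were obtained in exactly this way is an assumption, and the function name and step count are hypothetical.

```python
import math


def best_score_cut(sig_scores, bkg_scores, n_sig, n_bkg, n_steps=100):
    """Scan a lower cut on the score and return (cut, S/sqrt(B)) at the maximum.

    n_sig and n_bkg are the expected event yields before the score cut;
    the score lists are used only to estimate the selection efficiencies.
    """
    best_cut, best_z = 0.0, 0.0
    for i in range(n_steps):
        cut = i / n_steps
        eff_s = sum(s >= cut for s in sig_scores) / len(sig_scores)
        eff_b = sum(b >= cut for b in bkg_scores) / len(bkg_scores)
        if eff_b == 0.0:
            continue  # no background left in the test sample: estimate undefined
        z = n_sig * eff_s / math.sqrt(n_bkg * eff_b)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z
```

Skipping thresholds with zero surviving background avoids quoting a significance driven by the finite size of the test sample rather than by the separation itself.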
The QCD correction strengths and the shapes of the distributions of the kinematical variables change as the renormalization and factorization scales are changed, for the signal as well as for the backgrounds. As a result, our results in the cut-based as well as in the BDT-based analysis are expected to differ for different μ_R and μ_F. So, apart from μ_0, we repeat all analyses for two extreme choices of μ_R and μ_F, scaled by factors of one half and two, i.e., μ_R = μ_F = μ_0/2 and 2μ_0. The results are shown in Table 3 for μ_R = μ_F = μ_0/2, μ_0, 2μ_0, with the results for μ_0 repeated for comparison. The QCD correction strength increases for the signal as the scale choices are doubled to 2μ_0, while it decreases as the scale choices are reduced to μ_0/2. However, the QCD corrections remain roughly the same for the backgrounds, especially for the dominant 4b QCD background. As a result, in the cut-based analysis the signal significance improves by 25% as the scale choices are doubled, but decreases when the scale choices are halved. In the XGBoost result, however, the significance enhancement factor increases a little for the lower scale choice due to the increase in signal efficiency.
So far we have shown the results for α_b = 3, i.e., for a fixed value of the new physics parameter. The total signal significances, combining the cut-based selection and XGBoost, are computed for varying α_b and are shown in the left panel of Figure 5 for μ_R = μ_F = μ_0 at LO and NLO at L = 3000 fb⁻¹. The right panel of Figure 5 compares the signal significance for the three scale choices μ_0/2, μ_0, and 2μ_0 at NLO for the same luminosity, L = 3000 fb⁻¹. The limit on α_b is found to be ±1.95 at 95% C.L. at NLO for μ_R = μ_F = μ_0 (Figure 5, left panel). The limits on α_b are tighter for higher scale choices and weaker for lower scale choices, as can be seen in the right panel.
It appears that the QCD corrections for the backgrounds are always larger than those for the signal over a range of renormalization and factorization scales, thus making the signal significance smaller at NLO than at LO for both the cut-based and the XGBoost analysis. A 5σ discovery significance is achievable for a moderate value of α_b = 3 at a projected luminosity of 3000 fb⁻¹ at the LHC.

Summary and conclusions
While the LHC is emphatically looking for any indication of elusive new physics, hints of it may already be hidden in the Higgs data. Precision measurements of the Higgs couplings to third-generation quarks are thus crucial in indirect probes of physics beyond the standard model. In the present work, we probe the non-standard hbb̄ coupling, parametrized from a model-independent standpoint, in bb̄-associated production of the Higgs. We point out the importance and effectiveness of this channel in uncovering the modification factor in the hbb̄ interaction strength.
With a detailed detector level simulation, we devised a phase space region to emulate a Higgs peak in the signal and also to regulate the background processes. We obtain a moderate signal significance, both at LO and at NLO, for a modification factor α_b = 3 at the high-luminosity LHC. This cut-based significance is then further improved by the gradient boosting technique. Overall, the NLO result is slightly weaker than the LO one. We also investigate the effect of (renormalization and factorization) scale variation on the results at NLO and observe a significant variation, with better results at relatively higher scale values.
The limit on α_b, which is ±1.95 at 95% C.L. for L = 3000 fb⁻¹, surpasses the existing results in the literature.
During the concluding stage of this study, we came across reference [26], where an anomalous hbb̄ interaction has been investigated via the Higgs decay channel h → γγ, in the context of higher energies and luminosities than those currently envisioned for the HL-LHC. Our study differs from theirs in several ways. First, by concentrating on the bb̄ decay mode, one expects considerably larger event rates. Secondly, it also entails more severe backgrounds. Thirdly, next-to-leading order QCD effects are more non-trivial, not only for the backgrounds but for the signal as well. We have shown how to overcome the second and third issues, especially with the help of gradient boosting techniques, and thus improve upon hitherto estimated levels of constraining α_b at the HL-LHC, with ∫L dt = 3000 fb⁻¹.