Search for light pseudoscalar boson pairs produced from decays of the 125 GeV Higgs boson in ﬁnal states with two muons and two nearby tracks in pp collisions at

A search is presented for pairs of light pseudoscalar bosons, in the mass range from 4 to 15 GeV, produced from decays of the 125 GeV Higgs boson. The decay modes considered are ﬁnal states that arise when one of the pseudoscalars decays to a pair of tau leptons, and the other one either into a pair of tau leptons or muons. The search is based on proton-proton collisions collected by the CMS experiment in 2016 at a center-of-mass energy of 13 TeV that correspond to an integrated luminosity of 35.9 fb − 1 . The 2 μ 2 τ and 4 τ channels are used in combination to constrain the product of the Higgs boson production cross section and the branching fraction into 4 τ ﬁnal state, σ B , exploiting the linear dependence of the fermionic coupling strength of pseudoscalar bosons on the fermion mass. No signiﬁcant excess is observed beyond the expectation from the standard model. The observed and expected upper limits at 95% conﬁdence level on σ B , relative to the standard model Higgs boson production cross section, are set respectively between 0.022 and 0.23 and between 0.027 and 0.19 in the mass range probed by the analysis.


Introduction
After the discovery of the 125 GeV Higgs boson (H) [1,2], searches for additional Higgs bosons, based on predictions beyond the standard model (SM), constitute an important part of the scientific program at the CERN Large Hadron Collider (LHC). The present analysis examines theoretical models that contain two Higgs doublets and an additional complex singlet Higgs field (denoted hereafter as 2HD+1S), that does not couple at tree level to fermions or gauge bosons and interacts only with itself and the Higgs doublets [3][4][5][6][7][8][9][10]. In CP conserving models, which are considered in this Letter, the Higgs sector features seven physical states, namely three CP-even, two CP-odd, and two charged bosons, where one of the CP-even states corresponds to the H. This kind of Higgs sector is realized, for example, in next-to-minimal supersymmetric models that solve the so-called μ problem of the minimal supersymmetric extension of the SM [11]. A large set of the 2HD+1S models is allowed by measurements and constraints set by searches for additional Higgs bosons and supersymmetric particles [12][13][14][15][16][17].
E-mail address: cms -publication -committee -chair @cern .ch. This Letter addresses specific 2HD+1S models in which the lightest pseudoscalar boson (a 1 ) with mass 2m a 1 < 125 GeV has a large singlet component, and therefore its couplings to SM particles are significantly reduced. For this reason, analyses using direct production modes of a 1 , such as gluon-gluon fusion (ggF) or b quark associated production, have limited sensitivity. The a 1 boson is nonetheless potentially accessible in the H decay to two pseudoscalar bosons. The a 1 states can be identified via their decay into a pair of fermions [18][19][20][21][22][23][24][25]. Constraints on the H couplings allow a branching fraction for H decays into non-SM particles as large as 34% [26], which can potentially accommodate the H → a 1 a 1 decay at a rate sufficiently high for detection at the LHC.
Several searches for H → a 1 a 1 decays have been performed in the ATLAS and CMS experiments in Run 1 (8 TeV) and Run 2 (13 TeV) of LHC, exploiting various decay modes of the a 1 boson, and probing different ranges of its mass [27][28][29][30][31][32][33][34][35][36][37][38][39][40]. These searches found no significant deviation from the expectation of the SM background and upper limits were set on the product of the production cross section and the branching fraction for signal resulting in constraints on parameters of the 2HD+1S models.
This analysis presents a search for light a 1 bosons in the decay channels H → a 1 a 1 → 4τ /2μ2τ , using data corresponding to an  in 2016 at a center-of-mass energy of 13 TeV. The analysis covers the mass range from 4 to 15 GeV and employs a special analysis strategy to select and identify highly Lorentz-boosted muon or tau lepton pairs with overlapping decay products. The study updates a similar one performed by the CMS Collaboration in Run 1 [28], and complements other recent CMS searches for the H → a 1 a 1 decay performed in Run 2 data in the 2μ2τ [30], 2τ 2b [31], 2μ2b [38] and 4μ [39] final states, covering respective mass ranges of 0.25 < m a 1 < 3.40 GeV for the 4μ final state and 15.0 < m a 1 < 62.5 GeV for the 2μ2τ , 2τ 2b, and 2μ2b final states.
The branching fraction a 1 → τ τ depends on the details of the model, namely the parameter tan β, the ratio of vacuum expectation values of the two Higgs doublets, and on which Higgs doublet couples to either charged leptons, up-type quarks or down-type quarks [41]. In Type-II 2HD+1S models, where one Higgs doublet couples to up-type fermions while the other couples to down-type fermions, the a 1 → τ τ decay rate gets enhanced at large values of tan β. The branching fraction of this decay reaches values above 90% at tan β > 3 for 2m τ < m a 1 < 2m b , where m τ is the mass of the tau lepton and m b is the mass of the bottom quark. For higher values of m a 1 the branching fraction decreases to 5-6% since the decay into a pair of bottom quarks becomes kinematically possible and overwhelms the decay into a pair of tau leptons. However, in some of the 2HD+1S models the a 1 → τ τ decay may be dominant even above the a 1 → bb decay threshold. This is realized, e.g., for tan β > 1 in the Type-III 2HD+1S models, where one Higgs doublet couples to charged leptons, whereas the other doublet couples to quarks [41]. The signal topology targeted by the present analysis is illustrated in Fig. 1. Each a 1 boson is identified by the presence of a muon and only one additional charged particle, the objective of this approach being the decay channels a 1 → μμ and a 1 → τ μ τ one-prong . The τ μ denotes the muonic tau lepton decay, and τ one-prong stands for its leptonic or one-prong hadronic decay. The three-prong modes are not used because of the very high QCD multijet background and lower reconstruction signal efficiency.
Given the large difference in mass between the a 1 and the H states, the a 1 bosons will be produced highly Lorentz-boosted, and their decay products are highly collimated. This will result in a signature with two muons, each of which is accompanied by a nearby particle of opposite charge. The search focuses primarily on the dominant ggF process, in which the H state is produced with relatively small transverse momentum p T , and the a 1 pseudoscalars are emitted nearly back-to-back in the transverse plane, with a large separation in azimuth φ between the particles originating from one of the a 1 decays and those of the other a 1 . In the ggF process, the H can be also produced with a relatively high Lorentz boost when a hard gluon is radiated from the initial-state gluons or from the heavy-quark loop. In this case, the separation in φ is reduced, but the separation in pseudorapidity η can be large.
The analysis therefore searches for a signal in a sample of samecharge (SC) dimuon events with large angular separation between the muons, where each muon is accompanied by one nearby oppositely charged particle originating from the same a 1 decay. The requirement of having SC muons in the event largely suppresses background from the top-quark-pair, Drell-Yan, and diboson production. This requirement also facilitates the implementation of a dedicated SC dimuon trigger with relatively low thresholds and acceptable rates as described in Section 4.

CMS detector
The central feature of the CMS detector is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Forward calorimeters extend the η coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel fluxreturn yoke outside the solenoid.
Events of interest are selected using a two-tiered trigger system [42]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 μs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate below 1 kHz before data storage.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [43].

Simulated samples
For the simulation of the dominant ggF production process, the Monte Carlo (MC) event generators pythia (v.8.212) [44] and [45] are used in order to model the H → a 1 a 1 → 4τ and H → a 1 a 1 → 2μ2τ signal events, respectively. For both decay modes the p T distribution of the H emerging from ggF is reweighted with next-to-next-to-leading order (NNLO) K factors obtained by the program HqT (v2.0) [46,47] with NNLO NNPDF3.0 parton distribution functions (PDF) [48], hereby taking into account the more precise spectrum calculated to NNLO with resummation to next-to-next-to-leading-logarithms order. Subdominant contributions from other production modes of H, namely vector boson fusion process (VBF), vector boson associated production (VH) and top quark pair associated production (ttH) are estimated using the pythia (v.8.212) generator.
Showering and hadronization are carried out by the pythia (v.8.212) generator with the CUETP8M1 underlying event tune [54], while a detailed simulation of the CMS detector is based on the

Event selection
Events are selected using a SC dimuon trigger with p T thresholds of 17 (8) GeV for the leading (subleading) muon. To pass the high-level trigger, the tracks of the two muons are additionally required to have points of closest approach to the beam axis within 2 mm of each other along the longitudinal direction.
Events are reconstructed with the particle-flow (PF) algorithm [56] which aims to identify and reconstruct individual particles as photons, charged hadrons, neutral hadrons, electrons, or muons (PF objects). The proton-proton (pp) interaction vertices are reconstructed using a Kalman filtering technique [57,58]. Typically more than one such vertex is reconstructed because of multiple pp collisions within the same or neighbouring bunch crossings. The mean number of such interactions per bunch crossing was 23 in 2016.
The reconstructed vertex with the largest value of summed physics-object p 2 T is taken to be the primary interaction vertex (PV). The physics objects are the jets, clustered using the jetfinding algorithm [59,60] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the p T of those jets. Events must contain at least two SC muons reconstructed with the PF algorithm, which have to fulfil the following requirements.
• The pseudorapidity of the leading (higher p T ) and the subleading (lower p T ) muons must be |η| < 2.4. • The p T of the leading (subleading) muon must exceed 18 (10) GeV. • The transverse (longitudinal) impact parameters of muons with respect to the PV are required to be |d 0 | < 0.05 (|d z | < 0.1) cm.
• The angular separation between the muons is R = If more than one SC muon pair is found in the event to satisfy these requirements, the pair with the largest scalar sum of muon p T is chosen.
In the next step, the analysis employs information about tracks associated with the reconstructed charged PF objects, excluding the pair of SC muons. Selected muons and tracks are used to build and isolate candidates for the a 1 → τ μ τ one-prong or a 1 → μμ decays (referred to as a 1 candidates throughout the Letter). Three types of tracks are considered in the analysis.
A track is regarded as being nearby a muon if the angular separation R between them is smaller than 0.5. Each muon of the SC pair is required to have one nearby "signal" track with a charge opposite to its charge. This muon-track system is accepted as an a 1 candidate if no additional "isolation" tracks are found in the R cone of 0.5 around the muon momentum direction. The event is selected in the final sample if it contains two a 1 candidates. The Table 1 The signal acceptance and the number of expected signal events after selection in the SR. The number of expected signal events is computed for a benchmark value of branching fraction, B(H → a 1 a 1 )B 2 (a 1 → τ τ ) = 0.2 and assuming that the H production cross section is the one predicted in the SM. The quoted uncertainties for predictions from simulation include only statistical ones. B(H → a 1 a 1 )B 2 (a 1 → τ τ ) = 0.2 and assuming that the H production cross section is the one predicted in the SM. Contributions from the ggF, VBF, VH and ttH processes are summed up. The yield of the 2μ2τ signal is estimated under the assumption that the partial widths of the a 1 → μμ and a 1 → τ τ decays satisfy the relation [23] ( The ratio of branching fractions of the a 1 a 1 → 2μ2τ and a 1 a 1 → 4τ decays is computed through the ratio of the partial widths The factor of 2 in Eq. (2) arises from two possible decays, a 1 → 2τ 2μ, that produce the final state with two muons and two tau leptons. The ratio in Eq. (2) ranges from about 0.0073 at m a 1 = 15 GeV to 0.0155 at m a 1 = 4 GeV. The contribution from the H → a 1 a 1 → 4μ decay is estimated taking into account Eq. (1). It ranges between 0.4 and 2% of the total signal yield in the 2μ2τ and 4τ final states, depending on the probed mass of the a 1 boson. This contribution is not considered in the present analysis. The number of observed events selected in the SR amounts to 2035. A simulation-based study shows that the QCD multijet events dominate the sample of events selected in the SR. Contribution from other background sources constitutes about 1% of events selected in the SR.
The two-dimensional (2D) distribution of the invariant masses of the muon-track systems, constituting a 1 candidates, is used to discriminate between signal and the dominant QCD multijet background in the signal extraction procedure. The 2D distribution is filled with a pair of the muon-track invariant masses (m 1 , m 2 ), ordered by their value, m 2 > m 1 . The binning of the 2D distribution adopted in the analysis is illustrated in Fig. 2. As m 2 is required to exceed m 1 , only (i, j) bins with j ≥ i are filled in the 2D distribution, yielding in total 6(6 + 1)/2 = 21 independent bins. Bins (i, 6) with i = 1, 5 contain all events with m 2 > 6 GeV. Bin (6, 6) contains all events with m 1,2 > 6 GeV.

Modeling background
A simulation-based study reveals that the sample of SC muon pairs selected as described in Section 4, but without requiring the Table 2 Control regions used to construct and validate the background model. The symbols N sig , N iso and N soft denote the number of "signal", "isolation" (which are a subset of "signal" tracks) and "soft" tracks, respectively, within a cone of R = 0.5 around the muon momentum direction. The last row defines the SR.

Control region
Assessment of presence of a 1 candidates, is dominated by QCD multijet events, where about 85% of all selected events contain bottom quarks in the final state. The SC muon pairs in these events originate mainly from the following sources: • muonic decay of a bottom hadron in one bottom quark jet and cascade decay of a bottom hadron into a charm hadron with a subsequent muonic decay of the charm hadron in the other bottom quark jet; • muonic decay of a bottom hadron in one bottom quark jet and decay of a quarkonium state into a pair of muons in the other jet; • muonic decay of a bottom hadron in one bottom quark jet and muonic decay of a B 0 meson in the other bottom quark jet. The SC muon pair in this case may appear as a result of B 0 -B 0 oscillations.
The normalized 2D (m 1 , m 2 ) distribution for the muon-track pairs with m 2 > m 1 is represented in the sample of background events by a binned template constructed using the following relation • C (i, j) is a symmetric matrix, accounting for possible correlation between m 1 and m 2 , the elements of the matrix C (i, j) are referred to as "correlation factors" in the following.
The condition C (i, j) = 1 for all bins (i, j) would indicate an absence of correlation between m 1 and m 2 . We sum the contents to account for the fact that each event enters the 2D (m 1 , m 2 ) distribution with ordered values of the muon-track invariant masses.
By construction the background model estimates the dominant QCD multijet production as well as small contributions from other processes.
Multiple control regions (CRs) are introduced in order to derive and validate the modeling of f 1D (i) and C (i, j). The CRs are defined on the basis of a modified isolation criteria applied to one or both muon-track pairs. The isolation criteria are specified by the multiplicity of "isolation" tracks in the cone of R = 0.5 around the muon momentum direction. The summary of all CRs used to derive and validate the modeling of background shape is given in Table 2.

Modeling of f 1D (i)
The f 1D (i) distribution is modeled using the N 23 CR. Events in this CR pass the SC dimuon selection and contain only one a 1 candidate composed of the isolated "signal" track and muon (first muon). The invariant mass of the first muon and associated track enters the f 1D (i) distribution. Another muon (second muon) is required to be accompanied by either two or three nearby "isolation" tracks. The simulation shows that more than 95% of events selected in the CR N 23 are QCD multijet events, while the remaining 5% is coming from tt, Drell-Yan and other electroweak processes.
The modeling of the f 1D (i) template is based on the hypothesis that the kinematic distributions for the muon-track system, making up an a 1 candidate (the first muon and associated track), are weakly affected by the isolation requirement imposed on the second muon; therefore the f 1D (i) distribution of the muon-track system forming an a 1 candidate is expected to be similar in the SR and the N 23 CR.
This hypothesis is verified in control regions labelled N iso,2 = 1 and N iso,2 = 2, 3. Events are selected in these CR if one of the muons (first muon) has more than one "isolation" track (N iso > 1).
At least one of these "isolation" tracks should also fulfil the criteria imposed on the "signal" track. As more than one of these tracks can pass the criteria imposed on "signal" tracks, two scenarios have been investigated, namely using either the lowest or the highest p T "signal" tracks ("softest" and "hardest") to calculate the muon-track invariant mass. If only one "signal" track is found nearby to the first muon, the track is used both as the "hardest" Fig. 3. The observed invariant mass distribution, normalized to unity, of the first muon and the softest (upper) or hardest (lower) accompanying "signal" track for different isolation requirements imposed on the second muon: when the second muon has only one accompanying "isolation" track (N iso,2 = 1; circles); or when it has two or three accompanying "isolation" tracks (N iso,2 = 2, 3; squares). and the "softest" signal track. For the second muon, two isolation requirements are considered: when the muon is accompanied by only one "signal" track and the muon-track system is isolated as in the SR (CR N iso,2 = 1), or when it is accompanied by two or three "isolation" tracks as in the CR N 23 (CR N iso,2 = 2, 3). The invariant mass distributions of the first muon and the softest or hardest accompanying track are then compared for the two different isolation requirements on the second muon, N iso,2 = 1 and N iso,2 = 2, 3.
The results of this study are illustrated in Fig. 3. In both cases, the invariant mass distributions differ in each bin by less than 6%. This observation indicates that the invariant mass of the muontrack system, making up an a 1 candidate, weakly depends on the isolation requirement imposed on the second muon, thus supporting the assumption that the f 1D (i) distribution can be determined from the N 23 CR. The potential dependence of the muon-track invariant mass distribution on the isolation requirement imposed on the second muon is verified also by comparing shapes in the control regions N 23 and N 45 . The latter CR is defined by requiring the presence of 4 or 5 "isolation" tracks nearby to the second muon, while the first muon-track pair passes selection criteria for the a 1 candidate. The results are illustrated in Fig. 4. A slight difference is observed between distributions in these two CRs. This difference is taken as a shape uncertainty in the normalized template f 1D ( j) entering Eq. (3). Fig. 5 presents the normalized invariant mass distribution of the muon-track system for data selected in the SR and for the background model derived from the N 23 CR. The data and background distributions are compared to the signal distributions, obtained from simulation, for four representative mass hypotheses, m a 1 = 4, 7, 10, and 15 GeV. The invariant mass of the muon-track system is found to have higher discrimination power between the background and the signal at higher m a 1 . For lower masses, the signal shape becomes more background like, resulting in a reduction of discrimination power.

Modeling of C (i, j)
In order to determine the correlation factors C (i, j), an additional CR (labelled Loose-Iso) is used. It consists of events that contain two SC muons passing the identification and kinematic selection criteria outlined in Section 4. Each muon is required to have two or three nearby tracks. One of them should belong to the category of "signal" tracks, whereas remaining tracks should belong to the category of "soft" tracks. About 36k data events are selected in this CR. The simulation predicts that the QCD multijet events dominate this CR, comprising more than 99% of selected events. It was also found that the overall background-to-signal ratio is enhanced compared to the SR by a factor of 30 to 40, depending on the mass hypothesis, m a 1 . The event sample in this region is used to build the normalized distribution f 2D (i, j). Finally, the correlation factors C (i, j) are obtained according to Eq. (3) as  where f 1D (i) is the 1D normalized distribution with two entries per event (m 1 and m 2 ). The correlation factors C (i, j) derived from data in the Loose-Iso CR are presented in Fig. 6. To obtain estimates of C (i, j) in the signal region, the correlation factors derived in the Loose-Iso CR have to be corrected for the difference in C (i, j) between the signal region and Loose-Iso CR. This difference is assessed by comparing samples of simulated background events. The correlation factors estimated from simulation in the signal region and the Loose-Iso CR are presented in Fig. 7. The correlation factors in the signal region are then computed as where • C (i, j) CR data are correlation factors derived for the Loose-Iso CR in data (Fig. 6); • C (i, j) SR MC are correlation factors derived for the SR in the simulated QCD multijet sample (Fig. 7, upper); MC are correlation factors derived for the Loose-Iso CR in the simulated QCD multijet sample (Fig. 7, lower). <0.5% The difference in correlation factors derived in the SR (Fig. 7, upper) and in the Loose-Iso CR (Fig. 7, lower) using the QCD multijet sample is taken into account as an uncertainty in C (i, j).

Modeling signal
The signal templates are derived from the simulated samples of the H → a 1 a 1 → 4τ and H → a 1 a 1 → 2μ2τ decays. The study probes the signal strength modifier, defined as the ratio of the product of the measured signal cross section and the branching fraction into the 4τ final state B(H → a 1 a 1 )B 2 (a 1 → τ τ ) to the inclusive cross section of the H production predicted in the SM. The relative contributions from different production modes of H are defined by the corresponding cross sections predicted in the SM. The contribution of the H → a 1 a 1 → 2μ2τ decay, is computed assuming that the partial widths of a 1 → τ τ and a 1 → μμ decays satisfy Eq. (1).
The invariant mass distribution of the muon-track system in the a 1 → μμ decay channel peaks at the nominal value of the a 1 boson mass, while the reconstructed mass of the muon-track system in the a 1 → τ τ decay is typically lower, because of the missing neutrinos. This is why the H → a 1 a 1 → 2μ2τ signal samples have a largely different shape of the (m 1 , m 2 ) distribution compared to the H → a 1 a 1 → 4τ signal samples. Fig. 8 compares the (m 1 , m 2 ) distributions unrolled in a one row between the H → a 1 a 1 → 4τ and H → a 1 a 1 → 2μ2τ signal samples for mass hypotheses m a 1 4 GeV and 10 GeV. The signal distributions are normalized assuming the SM H production rate with the branching fraction B(H → a 1 a 1 )B 2 (a 1 → τ τ ) equal to 0.2. Table 3 lists the systematic uncertainties considered in the analysis for both signal and background.

Uncertainties related to the background
The estimation of the QCD multijet background is based on observed data, therefore it is not affected by imperfections in the simulation, reconstruction, or detector response.
The shape of the background in the (m 1 , m 2 ) distribution is modeled according to Eq. (3), while its uncertainty is dominated by uncertainties related to the correlation factors C (i, j) (as described in Section 5.2). Additionally, it is also affected by the shape uncertainty in the 1D template f 1D (m) (as discussed in Section 5.1). The bin-by-bin uncertainties in mass correlation factors C (i, j), derived from Eq. (5), are composed of the statistical uncertainties in observed data and simulated samples, as presented in Figs. 6 and 7, and range from 3 to 60%. These uncertainties are accounted for in the signal extraction procedure by one nuisance parameter per bin in the (m 1 , m 2 ) distribution [61]. The systematic uncertainties related to the extrapolation of C (i, j) from the Loose-Iso CR to the SR are derived from the dedicated MC study outlined in Section 5.2. The related shape uncertainty is determined by comparing correlation factors derived in the simulated samples, between the signal region and the Loose-Iso CR.
In the case when B(H → a 1 a 1 )B 2 (a 1 → τ τ ) = 0.34, corresponding to an upper limit at 95% confidence level (CL) on the branching fraction of the H decay into non-SM particles from Ref.
[26], the impact of possible signal contamination in the Loose-Iso CR is estimated on a bin-by-bin basis, and it is at most 2.8% in the bin (6, 6) which was found to have a negligible effect on the final results. For all other CRs, the signal contamination was found to be well below 1%.

Uncertainties related to signal
An uncertainty of 2.5% is assigned to the integrated luminosity estimate [62].
The uncertainty in the muon identification and trigger efficiency is estimated to be 2% for each selected muon obtained with the tag-and-probe technique [63]. The track selection and muontrack isolation efficiency is assessed with a study performed on a sample of Z bosons decaying into a pair of tau leptons. In the selected Z → τ τ events, one tau lepton is identified via its muonic decay, while the other is identified as an isolated track resulting from a one-prong decay. The track is required to pass the nominal selection criteria used in the main analysis. From this study, the uncertainty in the track selection and isolation efficiency is evaluated. The related uncertainty affects the shape of the signal estimate, while changing the overall signal yield by 10-18%. The muon and track momentum scale uncertainties are smaller than 0.3% and have a negligible effect on the analysis.
The bin-by-bin statistical uncertainties in the signal acceptance range from 8 to 100%, while the impact on the overall signal normalization varies between 5 and 20%.
Theoretical uncertainties have an impact on the differential kinematic distributions of the produced H, in particular its p T spectrum, thereby affecting signal acceptance. The uncertainty due to missing higher-order corrections to the ggF process is estimated with the HqT program by varying the renormalization (μ R ) and factorization (μ F ) scales. The H p T -dependent K factors are recomputed according to these variations and applied to the simulated signal samples. The resulting effect on the signal acceptance is estimated to vary between 1.2 and 1.5%, depending on m a 1 . In a similar way, the uncertainty in the signal acceptance is computed for the VBF, VH and ttH production processes. The impact on the acceptance is estimated to vary between 0.8 and 2.0%, depending on the process and probed mass of the a 1 boson.
The HqT program is also used to evaluate the effect of the PDF uncertainties. The nominal K factors for the H p T spectrum are computed with the NNPDF3.0 PDF set [48]. Variations of the NNPDF3.0 PDFs within their uncertainties change the signal acceptance by about 1%, whilst using the CTEQ6L1 PDF set [64] changes the signal acceptance by about 0.7%. The impact of the PDF uncertainties on the acceptance for the VBF, VH and ttH production processes is estimated in the same way and a 2% uncertainty is considered to account for these.
Systematic uncertainties in theoretical predictions for the signal cross sections are driven by variations of the μ R and μ F scales and PDF uncertainties. Uncertainties related to scale variations range from 0.4 to 9%, depending on the production mode. Uncertainties related to PDF vary between 2.1 and 3.6%.

Results
The signal is extracted with a binned maximum-likelihood fit applied to the (m 1 , m 2 ) distribution. For each probed mass of the a 1 boson, the (m 1 , m 2 ) distribution is fitted with the sum of two templates, corresponding to expectations for the signal and background, dominated by QCD multijet events. The normalization of both signal and background are allowed to float freely in the fit. The systematic uncertainties affecting the normalization of the signal templates are incorporated in the fit via nuisance parameters with a log-normal prior probability density function. The shape-altering systematic uncertainties are represented by nuisance parameters whose variations cause continuous morphing of the signal or background template shape, and are assigned a Gaussian prior probability density functions. The bin-by-bin statistical uncertainties are assigned gamma prior probability density functions. Fig. 9 shows the distribution of (m 1 , m 2 ), where the notation for the bins follows that of Fig. 2. The shape and the normalization of the background distribution are obtained by applying a fit to the observed data under the background-only hypothesis. Also shown are the expectations for the signal at m a 1 = 4, 7, 10, and 15 GeV. The signal normalization is computed assuming that the H is produced in pp collisions with a rate predicted by the standard model, and decays into a 1 a 1 → 4τ final state with a branching fraction of 20%. No significant deviations from the background expectation are observed in the (m 1 , m 2 ) distribution.
Results of the analysis are used to set upper limits at 95% CL on the product of the cross section and branching fraction, a 1 a 1 )B 2 (a 1 → τ τ ), relative to the inclusive SM cross section of H production. The modified frequentist CL s criterion [65,66], and the asymptotic formulae are used for the test statistic [67], implemented in the RooStats package [68]. Fig. 10 shows the observed and expected upper limits at 95% CL on the signal cross section times the branching fraction, relative to the total cross section of the H boson production as predicted in the SM. The observed limit is compatible with the expected limit within one standard deviation in the entire range of m a 1 considered, and ranges from 0.022 at m a 1 = 9 GeV to 0.23 at m a 1 = 4 GeV and reaches 0.16 at m a 1 = 15 GeV. The expected upper limit ranges from 0.027 at m a 1 = 9 GeV to 0.16 at m a 1 = 4 GeV and reaches 0.19 at m a 1 = 15 GeV. The degradation of the analysis sensitivity towards lower values of m a 1 is caused by the increase of the background yield at low invariant masses of the muon-track systems, as illustrated in Figs. 5 and 9. With increasing m a 1 , the average an- gular separation between the decay products of the a 1 boson is increasing. As a consequence, the efficiency of the signal selection drops down, as we require the muon and the track, originating from the a 1 → τ μ τ one-prong or a 1 → μμ decay, to be within a cone of R = 0.5. This explains the deterioration of the search sensitivity at higher values of m a 1 . The shaded area in blue indicates the excluded region of >34% for the branching fraction of the H decay into non-SM particles at 95% CL [26].
The new limits improve significantly over the previous 8 TeV limits [28] by 30% (for low masses) and up to 80% (for intermediate masses of 8 GeV), while the new analysis further extends the coverage of m a 1 up to 15 GeV.

Summary
A search is presented for light pseudoscalar a 1 bosons, produced from decays of the 125 GeV Higgs boson (H) in a data set corresponding to an integrated luminosity of 35.9 fb −1 of protonproton collisions at a center-of-mass energy of 13 TeV. The analysis is based on the H inclusive production and targets the H → a 1 a 1 → 4τ /2μ2τ decay channels. Both channels are used in combination to constrain the product of the inclusive signal production cross section and the branching fraction into the 4τ final state, exploiting the linear dependence of the fermionic coupling strength of a 1 on the fermion mass. With no evidence for a signal, the observed 95% confidence level upper limit on the product of the inclusive signal cross section and the branching fraction, relative to the SM H production cross section, ranges from 0.022 at m a 1 = 9 GeV to 0.23 at m a 1 = 4 GeV and reaches 0.16 at m a 1 = 15 GeV. The expected upper limit ranges from 0.027 at m a 1 = 9 GeV to 0.16 at m a 1 = 4 GeV and reaches 0.19 at m a 1 = 15 GeV.