Search for light pseudoscalar boson pairs produced from decays of the 125 GeV Higgs boson in final states with two muons and two nearby tracks in pp collisions at $\sqrt{s} =$ 13 TeV

A search is presented for pairs of light pseudoscalar bosons, in the mass range from 4 to 15 GeV, produced from decays of the 125 GeV Higgs boson. The decay modes considered are final states that arise when one of the pseudoscalars decays to a pair of tau leptons, and the other one either into a pair of tau leptons or muons. The search is based on proton-proton collisions collected by the CMS experiment in 2016 at a center-of-mass energy of 13 TeV that correspond to an integrated luminosity of 35.9 fb${-1}$. The 2$\mu$2$\tau$ and 4$\tau$ channels are used in combination to constrain the product of the Higgs boson production cross section and the branching fraction into 4$\tau$ final state, $\sigma\mathcal{B}$, exploiting the linear dependence of the fermionic coupling strength of pseudoscalar bosons on the fermion mass. No significant excess is observed beyond the expectation from the standard model. The observed and expected upper limits at 95% confidence level on $\sigma\mathcal{B}$, relative to the standard model Higgs boson production cross section, are set respectively between 0.022 and 0.23 and between 0.027 and 0.19 in the mass range probed by the analysis.


Introduction
After the discovery of the 125 GeV Higgs boson (H) [1,2], searches for additional Higgs bosons, based on predictions beyond the standard model (SM), constitute an important part of the scientific program at the CERN Large Hadron Collider (LHC). The present analysis examines theoretical models that contain two Higgs doublets and an additional complex singlet Higgs field (denoted hereafter as 2HD+1S), that does not couple at tree level to fermions or gauge bosons and interacts only with itself and the Higgs doublets [3][4][5][6][7][8][9][10]. In CP conserving models, which are considered in this Letter, the Higgs sector features seven physical states, namely three CP-even, two CP-odd, and two charged bosons, where one of the CP-even states corresponds to the H. This kind of Higgs sector is realized, for example, in next-to-minimal supersymmetric models that solve the so-called µ problem of the minimal supersymmetric extension of the SM [11]. A large set of the 2HD+1S models is allowed by measurements and constraints set by searches for additional Higgs bosons and supersymmetric particles [12][13][14][15][16][17].
This Letter addresses specific 2HD+1S models in which the lightest pseudoscalar boson (a 1 ) with mass 2m a 1 < 125 GeV has a large singlet component, and therefore its couplings to SM particles are significantly reduced. For this reason, analyses using direct production modes of a 1 , such as gluon-gluon fusion (ggF) or b quark associated production, have limited sensitivity. The a 1 boson is nonetheless potentially accessible in the H decay to two pseudoscalar bosons. The a 1 states can be identified via their decay into a pair of fermions [18][19][20][21][22][23][24][25]. Constraints on the H couplings allow a branching fraction for H decays into non-SM particles as large as 34% [26], which can potentially accommodate the H → a 1 a 1 decay at a rate sufficiently high for detection at the LHC.
Several searches for H → a 1 a 1 decays have been performed in the ATLAS and CMS experiments in Run 1 (8 TeV) and Run 2 (13 TeV) of LHC, exploiting various decay modes of the a 1 boson, and probing different ranges of its mass [27][28][29][30][31][32][33][34][35][36][37][38][39][40]. These searches found no significant deviation from the expectation of the SM background and upper limits were set on the product of the production cross section and the branching fraction for signal resulting in constraints on parameters of the 2HD+1S models. This analysis presents a search for light a 1 bosons in the decay channels H → a 1 a 1 → 4τ/2µ2τ, using data corresponding to an integrated luminosity of 35.9 fb −1 , collected with the CMS detector in 2016 at a center-of-mass energy of 13 TeV. The analysis covers the mass range from 4 to 15 GeV and employs a special analysis strategy to select and identify highly Lorentz-boosted muon or tau lepton pairs with overlapping decay products. The study updates a similar one performed by the CMS Collaboration in Run 1 [28], and complements other recent CMS searches for the H → a 1 a 1 decay performed in Run 2 data in the 2µ2τ [30], 2τ2b [31], 2µ2b [38] and 4µ [39] final states, covering respective mass ranges of 0.25 < m a 1 < 3.40 GeV for the 4µ final state and 15.0 < m a 1 < 62.5 GeV for the 2µ2τ, 2τ2b, and 2µ2b final states.
The branching fraction a 1 → ττ depends on the details of the model, namely the parameter tan β, the ratio of vacuum expectation values of the two Higgs doublets, and on which Higgs doublet couples to either charged leptons, up-type quarks or down-type quarks [41]. In Type-II 2HD+1S models, where one Higgs doublet couples to up-type fermions while the other couples to down-type fermions, the a 1 → ττ decay rate gets enhanced at large values of tan β. The branching fraction of this decay reaches values above 90% at tan β > 3 for 2m τ < m a 1 < 2m b , where m τ is the mass of the tau lepton and m b is the mass of the bottom quark. For higher values of m a 1 the branching fraction decreases to 5-6% since the decay into a pair of bottom quarks becomes kinematically possible and overwhelms the decay into a pair of tau leptons. However, in some of the 2HD+1S models the a 1 → ττ decay may be dominant even above the a 1 → bb decay threshold. This is realized, e.g., for tan β > 1 in the Type-III 2HD+1S models, where one Higgs doublet couples to charged leptons, whereas the other doublet couples to quarks [41].
The signal topology targeted by the present analysis is illustrated in Fig. 1. Each a 1 boson is identified by the presence of a muon and only one additional charged particle, the objective of this approach being the decay channels a 1 → µµ and a 1 → τ µ τ one-prong . The τ µ denotes the muonic tau lepton decay, and τ one-prong stands for its leptonic or one-prong hadronic decay. The three-prong modes are not used because of the very high QCD multijet background and lower reconstruction signal efficiency.
Given the large difference in mass between the a 1 and the H states, the a 1 bosons will be produced highly Lorentz-boosted, and their decay products are highly collimated. This will result in a signature with two muons, each of which is accompanied by a nearby particle of opposite charge. The search focuses primarily on the dominant ggF process, in which the H state is produced with relatively small transverse momentum p T , and the a 1 pseudoscalars are emitted nearly back-to-back in the transverse plane, with a large separation in azimuth φ between the particles originating from one of the a 1 decays and those of the other a 1 . In the ggF process, the H can be also produced with a relatively high Lorentz boost when a hard gluon is radiated from the initial-state gluons or from the heavy-quark loop. In this case, the separation in φ is reduced, but the separation in pseudorapidity η can be large. The analysis therefore searches for a signal in a sample of same-charge (SC) dimuon events with large angular separation between the muons, where each muon is accompanied by one nearby oppositely charged particle originating from the same a 1 decay. The requirement of having SC muons in the event largely suppresses background from the top-quark-pair, Drell-Yan, and diboson production. This requirement also facilitates the implementation of a dedicated SC dimuon trigger with relatively low thresholds and acceptable rates as described in Section 4.

Lorentz-boosted states
Well separated same-charge muons Figure 1: Illustration of the signal topology, in which the H decays into two a 1 bosons, where one a 1 boson decays into a pair of tau leptons, while the other one decays into a pair of muons or a pair of tau leptons. The analyzed final state consists of one muon and an oppositely charged track in each a 1 decay.

CMS detector
The central feature of the CMS detector is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Forward calorimeters extend the η coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
Events of interest are selected using a two-tiered trigger system [42]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate below 1 kHz before data storage.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [43].

Simulated samples
For the simulation of the dominant ggF production process, the Monte Carlo (MC) event generators PYTHIA (v.8.212) [44] and MADGRAPH5 aMC@NLO (v.2.2.2) [45] are used in order to model the H → a 1 a 1 → 4τ and H → a 1 a 1 → 2µ2τ signal events, respectively. For both decay modes the p T distribution of the H emerging from ggF is reweighted with next-to-nextto-leading order (NNLO) K factors obtained by the program HQT (v2.0) [46,47] with NNLO NNPDF3.0 parton distribution functions (PDF) [48], hereby taking into account the more precise spectrum calculated to NNLO with resummation to next-to-next-to-leading-logarithms order. Subdominant contributions from other production modes of H, namely vector boson fusion process (VBF), vector boson associated production (VH) and top quark pair associated production (ttH) are estimated using the PYTHIA (v.8.212) generator.
Showering and hadronization are carried out by the PYTHIA (v.8.212) generator with the CUETP8M1 underlying event tune [54], while a detailed simulation of the CMS detector is based on the GEANT4 [55] package.

Event selection
Events are selected using a SC dimuon trigger with p T thresholds of 17 (8) GeV for the leading (subleading) muon. To pass the high-level trigger, the tracks of the two muons are additionally required to have points of closest approach to the beam axis within 2 mm of each other along the longitudinal direction.
Events are reconstructed with the particle-flow (PF) algorithm [56] which aims to identify and reconstruct individual particles as photons, charged hadrons, neutral hadrons, electrons, or muons (PF objects). The proton-proton (pp) interaction vertices are reconstructed using a Kalman filtering technique [57,58]. Typically more than one such vertex is reconstructed because of multiple pp collisions within the same or neighbouring bunch crossings. The mean number of such interactions per bunch crossing was 23 in 2016.
The reconstructed vertex with the largest value of summed physics-object p 2 T is taken to be the primary interaction vertex (PV). The physics objects are the jets, clustered using the jet-finding algorithm [59,60] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the p T of those jets. Events must contain at least two SC muons reconstructed with the PF algorithm, which have to fulfil the following requirements.
• The pseudorapidity of the leading (higher p T ) and the subleading (lower p T ) muons must be |η| < 2.4. • The p T of the leading (subleading) muon must exceed 18 (10) GeV.
• The transverse (longitudinal) impact parameters of muons with respect to the PV are required to be |d 0 | < 0.05 (|d z | < 0.1) cm.
• The angular separation between the muons is ∆R = If more than one SC muon pair is found in the event to satisfy these requirements, the pair with the largest scalar sum of muon p T is chosen.
In the next step, the analysis employs information about tracks associated with the reconstructed charged PF objects, excluding the pair of SC muons. Selected muons and tracks are used to build and isolate candidates for the a 1 → τ µ τ one-prong or a 1 → µµ decays (referred to as a 1 candidates throughout the Letter). Three types of tracks are considered in the analysis.
A track is regarded as being nearby a muon if the angular separation ∆R between them is smaller than 0.5. Each muon of the SC pair is required to have one nearby "signal" track with a charge opposite to its charge. This muon-track system is accepted as an a 1 candidate if no additional "isolation" tracks are found in the ∆R cone of 0.5 around the muon momentum direction. The event is selected in the final sample if it contains two a 1 candidates. The set of selection requirements outlined above defines the signal region (SR).
The expected signal acceptance and signal yield for a few representative values of m a 1 are reported in Table 1. The signal yields are computed for a benchmark value of the branching fraction, B(H → a 1 a 1 )B 2 (a 1 → ττ) = 0.2 and assuming that the H production cross section is the one predicted in the SM. Contributions from the ggF, VBF, VH and ttH processes are summed up. The yield of the 2µ2τ signal is estimated under the assumption that the partial widths of the a 1 → µµ and a 1 → ττ decays satisfy the relation [23] The ratio of branching fractions of the a 1 a 1 → 2µ2τ and a 1 a 1 → 4τ decays is computed through the ratio of the partial widths Γ(a 1 → µµ) and Γ(a 1 → ττ) as The factor of 2 in Eq. (2) arises from two possible decays, a 1 → 2τ2µ, that produce the final state with two muons and two tau leptons. The ratio in Eq. (2) ranges from about 0.0073 at m a 1 = 15 GeV to 0.0155 at m a 1 = 4 GeV.
The contribution from the H → a 1 a 1 → 4µ decay is estimated taking into account Eq. (1). It ranges between 0.4 and 2% of the total signal yield in the 2µ2τ and 4τ final states, depending on the probed mass of the a 1 boson. This contribution is not considered in the present analysis.
The number of observed events selected in the SR amounts to 2035. A simulation-based study shows that the QCD multijet events dominate the sample of events selected in the SR. Contribution from other background sources constitutes about 1% of events selected in the SR. Table 1: The signal acceptance and the number of expected signal events after selection in the SR. The number of expected signal events is computed for a benchmark value of branching fraction, B(H → a 1 a 1 )B 2 (a 1 → ττ) = 0.2 and assuming that the H production cross section is the one predicted in the SM. The quoted uncertainties for predictions from simulation include only statistical ones.

Modeling background
A simulation-based study reveals that the sample of SC muon pairs selected as described in Section 4, but without requiring the presence of a 1 candidates, is dominated by QCD multijet events, where about 85% of all selected events contain bottom quarks in the final state. The SC muon pairs in these events originate mainly from the following sources: • muonic decay of a bottom hadron in one bottom quark jet and cascade decay of a bottom hadron into a charm hadron with a subsequent muonic decay of the charm hadron in the other bottom quark jet; • muonic decay of a bottom hadron in one bottom quark jet and decay of a quarkonium state into a pair of muons in the other jet; • muonic decay of a bottom hadron in one bottom quark jet and muonic decay of a B 0 meson in the other bottom quark jet. The SC muon pair in this case may appear as a result of B 0 -B 0 oscillations.
The normalized 2D (m 1 , m 2 ) distribution for the muon-track pairs with m 2 > m 1 is represented in the sample of background events by a binned template constructed using the following relation where • C(i, j) is a symmetric matrix, accounting for possible correlation between m 1 and m 2 , the elements of the matrix C(i, j) are referred to as "correlation factors" in the following.
The condition C(i, j) = 1 for all bins (i, j) would indicate an absence of correlation between m 1 and m 2 . We sum the contents of the nondiagonal bins (i, j) and (j, i) in the Cartesian product f 1D (i) f 1D (j) to account for the fact that each event enters the 2D (m 1 , m 2 ) distribution with ordered values of the muon-track invariant masses.
By construction the background model estimates the dominant QCD multijet production as well as small contributions from other processes.
Multiple control regions (CRs) are introduced in order to derive and validate the modeling of f 1D (i) and C(i, j). The CRs are defined on the basis of a modified isolation criteria applied to one or both muon-track pairs. The isolation criteria are specified by the multiplicity of "isolation" tracks in the cone of ∆R = 0.5 around the muon momentum direction. The summary of all CRs used to derive and validate the modeling of background shape is given in Table 2. Table 2: Control regions used to construct and validate the background model. The symbols N sig , N iso and N soft denote the number of "signal", "isolation" (which are a subset of "signal" tracks) and "soft" tracks, respectively, within a cone of ∆R = 0.5 around the muon momentum direction. The last row defines the SR.

Control region
Signal extraction 2 035

Modeling of f 1D (i)
The f 1D (i) distribution is modeled using the N 23 CR. Events in this CR pass the SC dimuon selection and contain only one a 1 candidate composed of the isolated "signal" track and muon (first muon). The invariant mass of the first muon and associated track enters the f 1D (i) distribution. Another muon (second muon) is required to be accompanied by either two or three nearby "isolation" tracks. The simulation shows that more than 95% of events selected in the CR N 23 are QCD multijet events, while the remaining 5% is coming from tt, Drell-Yan and other electroweak processes. The modeling of the f 1D (i) template is based on the hypothesis that the kinematic distributions for the muon-track system, making up an a 1 candidate (the first muon and associated track), are weakly affected by the isolation requirement imposed on the second muon; therefore the f 1D (i) distribution of the muon-track system forming an a 1 candidate is expected to be similar in the SR and the N 23 CR.
This hypothesis is verified in control regions labelled N iso,2 = 1 and N iso,2 = 2, 3. Events are selected in these CR if one of the muons (first muon) has more than one "isolation" track (N iso > 1). At least one of these "isolation" tracks should also fulfil the criteria imposed on the "signal" track. As more than one of these tracks can pass the criteria imposed on "signal" tracks, two scenarios have been investigated, namely using either the lowest or the highest p T "signal" tracks ("softest" and "hardest") to calculate the muon-track invariant mass. If only one "signal" track is found nearby to the first muon, the track is used both as the "hardest" and the "softest" signal track. For the second muon, two isolation requirements are considered: when the muon is accompanied by only one "signal" track and the muon-track system is isolated as in the SR (CR N iso,2 = 1), or when it is accompanied by two or three "isolation" tracks as in the CR N 23 (CR N iso,2 = 2, 3). The invariant mass distributions of the first muon and the softest or hardest accompanying track are then compared for the two different isolation requirements on the second muon, N iso,2 = 1 and N iso,2 = 2, 3. The results of this study are illustrated in Fig. 3. In both cases, the invariant mass distributions differ in each bin by less than 6%. This observation indicates that the invariant mass of the muon-track system, making up an a 1 candidate, weakly depends on the isolation requirement imposed on the second muon, thus supporting the assumption that the f 1D (i) distribution can be determined from the N 23 CR. [GeV] , softest trk µ  Figure 3: The observed invariant mass distribution, normalized to unity, of the first muon and the softest (left) or hardest (right) accompanying "signal" track for different isolation requirements imposed on the second muon: when the second muon has only one accompanying "isolation" track (N iso,2 = 1; circles); or when it has two or three accompanying "isolation" tracks (N iso,2 = 2, 3; squares).
The potential dependence of the muon-track invariant mass distribution on the isolation requirement imposed on the second muon is verified also by comparing shapes in the control regions N 23 and N 45 . The latter CR is defined by requiring the presence of 4 or 5 "isolation" tracks nearby to the second muon, while the first muon-track pair passes selection criteria for the a 1 candidate. The results are illustrated in Fig. 4. A slight difference is observed between distributions in these two CRs. This difference is taken as a shape uncertainty in the normalized template f 1D (j) entering Eq. (3). Figure 5 presents the normalized invariant mass distribution of the muon-track system for data selected in the SR and for the background model derived from the N 23 CR. The data and background distributions are compared to the signal distributions, obtained from simulation, for four representative mass hypotheses, m a 1 = 4, 7, 10, and 15 GeV. The invariant mass of the muon-track system is found to have higher discrimination power between the background and the signal at higher m a 1 . For lower masses, the signal shape becomes more background like, resulting in a reduction of discrimination power.

Modeling of C(i, j)
In order to determine the correlation factors C(i, j), an additional CR (labelled Loose-Iso) is used. It consists of events that contain two SC muons passing the identification and kinematic selection criteria outlined in Section 4. Each muon is required to have two or three nearby tracks. One of them should belong to the category of "signal" tracks, whereas remaining tracks should belong to the category of "soft" tracks. About 36k data events are selected in this CR. The simulation predicts that the QCD multijet events dominate this CR, comprising more than 99% of selected events. It was also found that the overall background-to-signal ratio is enhanced compared to the SR by a factor of 30 to 40, depending on the mass hypothesis, m a 1 . The event sample in this region is used to build the normalized distribution f 2D (i, j). Finally, the correlation factors C(i, j) are obtained according to Eq. (3) as where f 1D (i) is the 1D normalized distribution with two entries per event (m 1 and m 2 ). The correlation factors C(i, j) derived from data in the Loose-Iso CR are presented in Fig. 6. To obtain estimates of C(i, j) in the signal region, the correlation factors derived in the Loose-Iso CR have to be corrected for the difference in C(i, j) between the signal region and Loose-Iso CR. This difference is assessed by comparing samples of simulated background events. The correlation factors estimated from simulation in the signal region and the Loose-Iso CR are presented in Fig. 7.
The correlation factors in the signal region are then computed as where • C(i, j) CR data are correlation factors derived for the Loose-Iso CR in data (Fig. 6);  The QCD multijet background model is derived from the control region N 23 . Also shown are the normalized distributions from signal simulations for four mass hypotheses, m a 1 = 4, 7, 10, and 15 GeV (dashed histograms), whereas for higher masses the analysis has no sensitivity. Each event in the observed and expected signal distributions contributes two entries, corresponding to the two muon-track systems in each event passing the selection. The signal distributions include 2µ2τ and 4τ contributions. The lower panel shows the ratio of the observed to expected number of background events in each bin of the distribution. The grey shaded area represents the background model uncertainty.
• C(i, j) SR MC are correlation factors derived for the SR in the simulated QCD multijet sample (Fig. 7, left); • C(i, j) CR MC are correlation factors derived for the Loose-Iso CR in the simulated QCD multijet sample (Fig. 7, right).
The difference in correlation factors derived in the SR (Fig. 7, left) and in the Loose-Iso CR (Fig. 7, right) using the QCD multijet sample is taken into account as an uncertainty in C(i, j).

Modeling signal
The signal templates are derived from the simulated samples of the H → a 1 a 1 → 4τ and H → a 1 a 1 → 2µ2τ decays. The study probes the signal strength modifier, defined as the ratio of the product of the measured signal cross section and the branching fraction into the 4τ final state B(H → a 1 a 1 )B 2 (a 1 → ττ) to the inclusive cross section of the H production predicted in the SM. The relative contributions from different production modes of H are defined by the [GeV]    corresponding cross sections predicted in the SM. The contribution of the H → a 1 a 1 → 2µ2τ decay, is computed assuming that the partial widths of a 1 → ττ and a 1 → µµ decays satisfy Eq. (1).
The invariant mass distribution of the muon-track system in the a 1 → µµ decay channel peaks at the nominal value of the a 1 boson mass, while the reconstructed mass of the muon-track system in the a 1 → ττ decay is typically lower, because of the missing neutrinos. This is why the H → a 1 a 1 → 2µ2τ signal samples have a largely different shape of the (m 1 , m 2 ) distribution compared to the H → a 1 a 1 → 4τ signal samples. Figure 8 compares the (m 1 , m 2 ) distributions unrolled in a one row between the H → a 1 a 1 → 4τ and H → a 1 a 1 → 2µ2τ signal samples for mass hypotheses m a 1 4 GeV and 10 GeV. The signal distributions are normalized assuming the SM H production rate with the branching fraction B(H → a 1 a 1 )B 2 (a 1 → ττ) equal to 0.2.
(1,     Table 3 lists the systematic uncertainties considered in the analysis for both signal and background.

Uncertainties related to the background
The estimation of the QCD multijet background is based on observed data, therefore it is not affected by imperfections in the simulation, reconstruction, or detector response.
The shape of the background in the (m 1 , m 2 ) distribution is modeled according to Eq. (3), while its uncertainty is dominated by uncertainties related to the correlation factors C(i, j) (as described in Section 5.2). Additionally, it is also affected by the shape uncertainty in the 1D template f 1D (m) (as discussed in Section 5.1). The bin-by-bin uncertainties in mass correlation factors C(i, j), derived from Eq. (5), are composed of the statistical uncertainties in observed data and simulated samples, as presented in Figs. 6 and 7, and range from 3 to 60%. These uncertainties are accounted for in the signal extraction procedure by one nuisance parameter per bin in the (m 1 , m 2 ) distribution [61]. The systematic uncertainties related to the extrapolation of C(i, j) from the Loose-Iso CR to the SR are derived from the dedicated MC study outlined in Section 5.2. The related shape uncertainty is determined by comparing correlation factors derived in the simulated samples, between the signal region and the Loose-Iso CR.
In the case when B(H → a 1 a 1 )B 2 (a 1 → ττ) = 0.34, corresponding to an upper limit at 95% confidence level (CL) on the branching fraction of the H decay into non-SM particles from Ref.
[26], the impact of possible signal contamination in the Loose-Iso CR is estimated on a binby-bin basis, and it is at most 2.8% in the bin (6,6) which was found to have a negligible effect on the final results. For all other CRs, the signal contamination was found to be well below 1%. <0.5%

Uncertainties related to signal
An uncertainty of 2.5% is assigned to the integrated luminosity estimate [62].
The uncertainty in the muon identification and trigger efficiency is estimated to be 2% for each selected muon obtained with the tag-and-probe technique [63]. The track selection and muontrack isolation efficiency is assessed with a study performed on a sample of Z bosons decaying into a pair of tau leptons. In the selected Z → ττ events, one tau lepton is identified via its muonic decay, while the other is identified as an isolated track resulting from a one-prong decay. The track is required to pass the nominal selection criteria used in the main analysis. From this study, the uncertainty in the track selection and isolation efficiency is evaluated. The related uncertainty affects the shape of the signal estimate, while changing the overall signal yield by 10-18%. The muon and track momentum scale uncertainties are smaller than 0.3% and have a negligible effect on the analysis.
The bin-by-bin statistical uncertainties in the signal acceptance range from 8 to 100%, while the impact on the overall signal normalization varies between 5 and 20%.
Theoretical uncertainties have an impact on the differential kinematic distributions of the produced H, in particular its p T spectrum, thereby affecting signal acceptance. The uncertainty due to missing higher-order corrections to the ggF process is estimated with the HQT program by varying the renormalization (µ R ) and factorization (µ F ) scales. The H p T -dependent K factors are recomputed according to these variations and applied to the simulated signal samples. The resulting effect on the signal acceptance is estimated to vary between 1.2 and 1.5%, depending on m a 1 . In a similar way, the uncertainty in the signal acceptance is computed for the VBF, VH and ttH production processes. The impact on the acceptance is estimated to vary between 0.8 and 2.0%, depending on the process and probed mass of the a 1 boson.
The HQT program is also used to evaluate the effect of the PDF uncertainties. The nominal K factors for the H p T spectrum are computed with the NNPDF3.0 PDF set [48]. Variations of the NNPDF3.0 PDFs within their uncertainties change the signal acceptance by about 1%, whilst using the CTEQ6L1 PDF set [64] changes the signal acceptance by about 0.7%. The impact of the PDF uncertainties on the acceptance for the VBF, VH and ttH production processes is estimated in the same way and a 2% uncertainty is considered to account for these.
Systematic uncertainties in theoretical predictions for the signal cross sections are driven by variations of the µ R and µ F scales and PDF uncertainties. Uncertainties related to scale variations range from 0.4 to 9%, depending on the production mode. Uncertainties related to PDF vary between 2.1 and 3.6%.

Results
The signal is extracted with a binned maximum-likelihood fit applied to the (m 1 , m 2 ) distribution. For each probed mass of the a 1 boson, the (m 1 , m 2 ) distribution is fitted with the sum of two templates, corresponding to expectations for the signal and background, dominated by QCD multijet events.
The normalization of both signal and background are allowed to float freely in the fit. The systematic uncertainties affecting the normalization of the signal templates are incorporated in the fit via nuisance parameters with a log-normal prior probability density function. The shape-altering systematic uncertainties are represented by nuisance parameters whose variations cause continuous morphing of the signal or background template shape, and are assigned a Gaussian prior probability density functions. The bin-by-bin statistical uncertainties are assigned gamma prior probability density functions. Figure 9 shows the distribution of (m 1 , m 2 ), where the notation for the bins follows that of Fig. 2. The shape and the normalization of the background distribution are obtained by applying a fit to the observed data under the background-only hypothesis. Also shown are the expectations for the signal at m a 1 = 4, 7, 10, and 15 GeV. The signal normalization is computed assuming that the H is produced in pp collisions with a rate predicted by the standard model, and decays into a 1 a 1 → 4τ final state with a branching fraction of 20%. No significant deviations from the background expectation are observed in the (m 1 , m 2 ) distribution.
Results of the analysis are used to set upper limits at 95% CL on the product of the cross section and branching fraction, σ(pp → H + X)B(H → a 1 a 1 )B 2 (a 1 → ττ), relative to the inclusive SM cross section of H production. The modified frequentist CL s criterion [65,66], and the asymptotic formulae are used for the test statistic [67], implemented in the RooStats package [68]. Figure 10 shows the observed and expected upper limits at 95% CL on the signal cross section times the branching fraction, relative to the total cross section of the H boson production as predicted in the SM. The observed limit is compatible with the expected limit within one standard deviation in the entire range of m a 1 considered, and ranges from 0.022 at m a 1 = 9 GeV to 0.23 at m a 1 = 4 GeV and reaches 0.16 at m a 1 = 15 GeV. The expected upper limit ranges from 0.027 at m a 1 = 9 GeV to 0.16 at m a 1 = 4 GeV and reaches 0.19 at m a 1 = 15 GeV. The degradation of the analysis sensitivity towards lower values of m a 1 is caused by the increase of the background yield at low invariant masses of the muon-track systems, as illustrated in Figs. 5 and 9. With increasing m a 1 , the average angular separation between the decay products of the a 1 boson is increasing. As a consequence, the efficiency of the signal selection drops down, as we require the muon and the track, originating from the a 1 → τ µ τ one-prong or a 1 → µµ decay, to be within a cone of ∆R = 0.5. This explains the deterioration of the search sensitivity at higher values of m a 1 . The shaded area in blue indicates the excluded region of >34% for the branching fraction

Summary
A search is presented for light pseudoscalar a 1 bosons, produced from decays of the 125 GeV Higgs boson (H) in a data set corresponding to an integrated luminosity of 35.9 fb −1 of protonproton collisions at a center-of-mass energy of 13 TeV. The analysis is based on the H inclusive production and targets the H → a 1 a 1 → 4τ/2µ2τ decay channels. Both channels are used in combination to constrain the product of the inclusive signal production cross section and the branching fraction into the 4τ final state, exploiting the linear dependence of the fermionic coupling strength of a 1 on the fermion mass. With no evidence for a signal, the observed 95% confidence level upper limit on the product of the inclusive signal cross section and the branching fraction, relative to the SM H production cross section, ranges from 0.022 at m a 1 = 9 GeV to 0.23 at m a 1 = 4 GeV and reaches 0.16 at m a 1 = 15 GeV. The expected upper limit ranges from 0.027 at m a 1 = 9 GeV to 0.16 at m a 1 = 4 GeV and reaches 0.19 at m a 1 = 15 GeV.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses.