Measurements of decay branching fractions of $H\to b\bar{b}/c\bar{c}/gg$ in associated $(e^{+}e^{-}/\mu^{+}\mu^{-})H$ production at the CEPC

The high-precision measurement of Higgs boson properties is one of the primary goals of the Circular Electron Positron Collider (CEPC). Measurements of the $H\to b\bar{b}/c\bar{c}/gg$ decay branching fractions at the CEPC are presented, considering a scenario in which 5000 $\mathrm{fb}^{-1}$ of $e^{+}e^{-}$ collision data are analysed at a center-of-mass energy of 250 GeV. In this study the Higgs bosons are produced in association with a pair of leptons, dominantly through the $ZH$ production process. The statistical uncertainty of the signal cross section is estimated to be about 1 % in the $H\to b\bar{b}$ final state, and approximately 5 % to 10 % in the $H\to c\bar{c}/gg$ final states. In addition, the main sources of systematic uncertainty and their impact on the branching fraction measurements are discussed. This study demonstrates the potential for precise measurements of the hadronic final states of the Higgs boson decay at the CEPC, and will provide key information for understanding the Yukawa couplings between the Higgs boson and quarks, which are predicted to be the origin of the quark masses in the Standard Model.


Introduction
The discovery of a scalar boson with a mass around 125 GeV at the Large Hadron Collider (LHC) [1,2] completed the final piece of the Standard Model (SM). This particle, interpreted as the Higgs boson, plays a crucial role in Electroweak Spontaneous Symmetry Breaking (EWSB), known as the Higgs mechanism [3][4][5]. The Higgs mechanism allows the $W$ and $Z$ bosons to be massive while preserving the $SU(2)_L \times U(1)_Y$ gauge invariance. As a consequence of this mechanism, fermions such as quarks and charged leptons acquire their masses from their couplings to the Higgs field. The masses of the fermions ($m_{f_i}$) in the SM are proportional to their Yukawa couplings ($h_i$) to the Higgs field:
$$m_{f_i} = \frac{h_i v}{\sqrt{2}},$$
where $v \approx 246$ GeV is the vacuum expectation value of the Higgs field. Measuring the Yukawa couplings between the Higgs boson and the SM fermions is therefore essential for understanding the origin of the fermion masses, and a deviation of these couplings from the SM prediction would indicate new physics. The dominant Higgs boson decays into fermionic final states are $H\to b\bar{b}$, $H\to \tau^+\tau^-$ and $H\to c\bar{c}$, whose branching fractions are predicted to be 57 %, 6 % and 2.7 % respectively [6,7]. In addition, the Higgs boson can decay to a pair of gluons via heavy quark loops. The large coupling between the Higgs boson and the top quark leads to a considerable branching fraction for $H\to gg$, estimated to be about 9 % [8,9].
Until now, all Higgs boson measurements have been performed at hadron colliders. The leading fermionic Higgs boson decay, $H\to b\bar{b}$, was studied by both ATLAS and CMS using the LHC Run-I data, which contain about 5 fb$^{-1}$ and 20 fb$^{-1}$ of $pp$ collision data at $\sqrt{s}$ of 7 TeV and 8 TeV respectively. These measurements include several Higgs boson production channels: the $VH$ [10, 11], $t\bar{t}H$ [12][13][14] and VBF [15,16] processes. The $H\to b\bar{b}$ decay was also studied at the Tevatron [17] in $VH$ production, using 9.7 fb$^{-1}$ of $p\bar{p}$ collision data at $\sqrt{s}$ of 1.96 TeV. The $H\to b\bar{b}$ signal strength, defined as the ratio of the measured cross section to the corresponding SM prediction, is estimated to be $0.70 \pm 0.29$ from the combination of the ATLAS and CMS analyses of Run-I data [18]. In 2018, observations of the $H\to b\bar{b}$ decay in $VH$ production were reported by ATLAS [19] and CMS [20], using 79.8 fb$^{-1}$ and 41.3 fb$^{-1}$ of $pp$ collision data at $\sqrt{s}$ of 13 TeV. The measured signal strengths are $1.16 \pm 0.16\,(\mathrm{stat.})\,^{+0.21}_{-0.19}\,(\mathrm{sys.})$ and $1.01 \pm 0.22$ respectively. The $H\to c\bar{c}$ decay was also studied by ATLAS using 36.1 fb$^{-1}$ of data at $\sqrt{s}$ of 13 TeV [21], giving an upper limit about 100 times higher than the SM prediction at 95% confidence level. The precision of these results is limited by the large QCD background, which is unavoidable at hadron colliders.
A lepton collider has a significant advantage in precise Higgs measurements, as it is free of the QCD production background and has a precise and tunable initial energy. Several future lepton colliders capable of precise measurements of Higgs boson parameters have been proposed, such as the International Linear Collider (ILC) [22], the $e^+e^-$ Future Circular Collider (FCC-ee) [23], the Compact Linear Collider (CLIC) [24] and the Circular Electron Positron Collider (CEPC) [25]. The CEPC is an electron-positron collider proposed by the Chinese high energy physics community. It can be operated at $\sqrt{s}$ of 240 GeV to 250 GeV, and the design instantaneous luminosity is $2 \times 10^{34}$ cm$^{-2}$s$^{-1}$. The Higgs production cross section at the CEPC is about 0.2 pb at $\sqrt{s}$ of 250 GeV. The primary production process is $ZH$ production (96.6 %), often referred to as the Higgs-strahlung [26][27][28] process, while the fractions of production via $WW$-fusion [29] and $ZZ$-fusion are 3.06 % and 0.29 % respectively. After ten years of running, one million Higgs bosons (5000 fb$^{-1}$ of collision data) are expected to be collected at the CEPC.
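The expected Higgs yield quoted above follows from multiplying the cross section by the integrated luminosity. The short sketch below repeats that arithmetic and splits the total by the quoted production fractions; the function and variable names are illustrative only and are not part of any CEPC software.

```python
def expected_events(cross_section_pb, luminosity_fb_inv):
    """Expected event count N = sigma * L, with sigma in pb and L in fb^-1."""
    return cross_section_pb * 1000.0 * luminosity_fb_inv  # 1 pb = 1000 fb

# Production fractions quoted in the text for sqrt(s) = 250 GeV.
PRODUCTION_FRACTIONS = {"ZH": 0.966, "WW fusion": 0.0306, "ZZ fusion": 0.0029}

n_higgs = expected_events(0.2, 5000.0)
per_mode = {mode: frac * n_higgs for mode, frac in PRODUCTION_FRACTIONS.items()}

print(f"total Higgs bosons: {n_higgs:.0f}")
for mode, n in per_mode.items():
    print(f"{mode}: {n:.0f}")
```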
The work presented here focuses on Higgs production in association with a pair of leptons ($e^+e^-$ or $\mu^+\mu^-$), in which the Higgs boson decays to a pair of $b$-quarks, $c$-quarks or gluons. The leptons come either from the $Z$ boson decay in the Higgs-strahlung process, or from $ZZ$-fusion, which takes place only in the $e^+e^-H$ channel. The measurements of the signal cross sections, denoted as $\sigma^{H\to b\bar{b}}_{l^+l^-H}$, $\sigma^{H\to c\bar{c}}_{l^+l^-H}$ and $\sigma^{H\to gg}_{l^+l^-H}$, are described. The branching fractions of $H\to b\bar{b}/c\bar{c}/gg$ can be derived once the cross sections of $l^+l^-H$ production, $\sigma_{e^+e^-H}$ and $\sigma_{\mu^+\mu^-H}$, are determined from other measurements.
The work presented in this paper is partly inspired by the $H\to b\bar{b}/c\bar{c}/gg$ analysis at the ILC [30], and it follows up on the $H\to b\bar{b}/c\bar{c}/gg$ analysis presented in the CEPC Higgs white paper [31], with an improved background estimation. This paper is organized as follows. After the introduction, a brief description of the detector is given in Sec. 2. The MC samples and event selections are described in Sec. 3 and Sec. 4 respectively. In Sec. 5, the flavor tagging and the flavor-template-recoil-mass fit, which is the procedure used to extract the signal event yields, are presented. In Sec. 6, the uncertainties of the signal cross sections and signal branching fractions are discussed, and finally a short summary is provided in Sec. 7.

Detector Design
A detailed description of the proposed CEPC detector can be found elsewhere [25]. A vertex detector with high pixel resolution is located in the innermost part of the detector. Its 6 layers of sensors are arranged coaxially, at radii from 16 mm to 60 mm, covering 97% to 90% of the polar-angle range. The spatial resolution of a single layer is 2.8 µm in the 2 inner layers and 4 µm in the 4 outer layers. The overall IP resolution can be represented as:
$$\sigma_{r\phi} = a \oplus \frac{b}{p\,\sin^{3/2}\theta}.$$
The first and second terms on the right-hand side describe the resolution from the finite single-point position measurement and the resolution due to multiple scattering, respectively. They are parameterized by $a$ and $b$, which are estimated to be 5 and 10 respectively at the CEPC. The parameters $p$ and $\theta$ are the momentum and polar angle of the reconstructed charged particle, and $\oplus$ denotes summation in quadrature. The high spatial resolution is essential for track impact parameter (IP) measurements and vertex reconstruction, on which the identification of the flavor of jets (flavor tagging) primarily relies. A Time Projection Chamber (TPC) is located outside the vertex detector and performs most of the track measurement. It covers the solid angle up to $\cos\theta = 0.98$. When operated in the design magnetic field of 3.5 T, the momentum resolution is $\sigma(1/p_T) = 10^{-4}$ GeV$^{-1}$.
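As an illustration of how the two terms of the IP resolution combine, the sketch below evaluates the parameterization for a given track. It assumes that $a$ is expressed in µm, $b$ in µm·GeV and $p$ in GeV, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def ip_resolution_um(p_gev, theta_rad, a=5.0, b=10.0):
    """Quadrature sum a (+) b / (p sin^{3/2} theta).

    Assumed units (not stated in the text): a in um, b in um*GeV, p in GeV.
    """
    multiple_scattering = b / (p_gev * math.sin(theta_rad) ** 1.5)
    return math.hypot(a, multiple_scattering)

# A stiff central track is dominated by the single-point term a,
# while a soft or forward track is dominated by multiple scattering.
print(ip_resolution_um(10.0, math.pi / 2))
print(ip_resolution_um(1.0, math.radians(30.0)))
```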
The calorimeter system includes two sub-systems: the electromagnetic calorimeter (ECAL) and the hadronic calorimeter (HCAL). They are designed to have high energy resolution as well as high spatial resolution. The ECAL is a silicon-tungsten detector, which uses tungsten as the absorber and silicon as the sensor. It contains 30 layers of sampling structures. Each layer is divided into cells of 5 mm × 5 mm. The HCAL includes 40 layers. Each layer contains a 20 mm thick stainless steel absorber, a 3 mm thick glass resistive plate chamber (GRPC) and 3 mm thick readout electronics with 1 cm × 1 cm readout pads. The overall jet energy resolution (JER) is 3-4%, and the invariant mass resolution of the two-jet system is required to be around 3-4%, in order to distinguish the final states of hadronic $Z$ and $W^\pm$ boson decays. The high granularity of the ECAL and HCAL is crucial for the particle flow algorithm [32], which aims to reconstruct and identify each particle individually by combining information from all sub-detectors. The details of the ECAL and HCAL design and performance can be found in Ref. [33].
The muon system is mounted at the outermost part of the detector. The baseline design of the muon detector has 8 sensitive layers in the barrel and endcap regions. On average, muons with a momentum of 2 GeV are expected to reach the first layer, while those with momentum over 4 GeV can penetrate all 8 layers. Muons with momentum above 5 GeV can be detected with a standalone muon identification efficiency above 95%, while the fake rate for pions is less than 1%. In order to improve the precision of the muon momentum measurement, the longitudinal and transverse position resolutions are required to be $\sigma_z = 1.5$ cm and $\sigma_{r\phi} = 2.0$ cm respectively.

MC Sample
Both background and signal events are generated using Whizard [34], configured for unpolarized electron-positron collisions at $\sqrt{s}$ of 250 GeV. The mass of the Higgs boson is assumed to be 125 GeV and the couplings are set to the SM predictions. The fragmentation and hadronization are performed with PYTHIA6 [35]. All the MC datasets are normalized to the yields expected in data with an integrated luminosity of 5000 fb$^{-1}$, by assigning a weight to the events of each process. The details of the event generation for the CEPC can be found in Ref. [36].
Charged particles are identified as electrons or muons by the Lepton Identification in Calorimeter with High Granularity (LICH) algorithm, a dedicated lepton identification algorithm designed for Higgs factories, as described in Ref. [42]. The algorithm uses the dE/dx information measured by the TPC, together with the shower and hit information in the high-granularity calorimeter, as discriminating variables. A Boosted Decision Tree algorithm [43] with gradient boosting (GBDT) is used to further extract the discriminating characteristics of the variables. The overall efficiencies for electrons and muons are 99.7% and 99.9% respectively, with the rate of electrons and muons being misidentified as each other smaller than 0.07%. The rates at which particles such as $\pi^\pm$ are identified as electrons or muons are 0.21% and 0.05% respectively.
Jet reconstruction and flavor tagging are essential to this analysis. They are done with the LCFIPLUS [44] software package, which integrates vertex finding, jet reconstruction and jet flavor tagging. Before the jet clustering, the secondary vertices are identified from the reconstructed tracks. Jets are reconstructed with the Durham algorithm [45]. This algorithm begins with jet cluster candidates, which are either single reconstructed particles or compound objects such as reconstructed secondary vertices. The procedure iteratively pairs the clusters and calculates the distance between them, defined as
$$y_{ij} = \frac{2\min(E_i^2, E_j^2)\,(1 - \cos\theta_{ij})}{E_{\mathrm{vis}}^2},$$
where $E_i$ and $E_j$ are the energies of the $i$-th and $j$-th clusters, and $\theta_{ij}$ is the angle between them. $E_{\mathrm{vis}}$ is the sum of the energies of all the clusters in the event. The pair of clusters with the minimum $y_{ij}$ is merged, reducing the number of clusters by 1, until the stopping criteria are met. The stopping criterion can be either a minimum $y_{ij}$ threshold, or the number of remaining clusters being equal to the required jet multiplicity. In this analysis, the $y_{ij}$ threshold is set to 0, and each event is forced to have two reconstructed jets. The minimum value of $y_{ij}$ can be denoted as $Y_k$, in which $k$ is the multiplicity of cluster candidates. When $k$ is larger than 2, the signal events have relatively smaller $Y_k$ than the backgrounds with at least $k$ primary partons*. This is because in a signal event the closest two clusters, among the $k$ of them, are likely to originate from the same parton. They tend to be collinear, which leads to a small $Y_k$.
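The clustering procedure described above can be sketched in a few lines. The example below is a minimal illustration and not the LCFIPLUS implementation: it assumes each cluster is a four-vector (E, px, py, pz) with non-zero momentum, merges clusters by adding four-vectors (an assumed recombination scheme), and records $Y_k$ as the minimum $y_{ij}$ found when $k$ clusters remain. All names are hypothetical.

```python
import math
from itertools import combinations

def durham_distance(c_i, c_j, e_vis):
    """y_ij = 2 min(E_i^2, E_j^2) (1 - cos theta_ij) / E_vis^2."""
    (e_i, *p_i), (e_j, *p_j) = c_i, c_j
    dot = sum(a * b for a, b in zip(p_i, p_j))
    norm = math.sqrt(sum(a * a for a in p_i)) * math.sqrt(sum(b * b for b in p_j))
    return 2.0 * min(e_i, e_j) ** 2 * (1.0 - dot / norm) / e_vis ** 2

def cluster_to_jets(particles, n_jets=2):
    """Merge the closest pair until n_jets clusters remain; return jets and Y_k."""
    clusters = [tuple(p) for p in particles]
    e_vis = sum(c[0] for c in clusters)
    y_k = {}
    while len(clusters) > n_jets:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: durham_distance(clusters[ij[0]], clusters[ij[1]], e_vis),
        )
        y_k[len(clusters)] = durham_distance(clusters[i], clusters[j], e_vis)
        # Assumed recombination: plain four-vector addition.
        merged = tuple(a + b for a, b in zip(clusters[i], clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, y_k

# Toy event: two roughly collinear pairs in opposite hemispheres.
toy_event = [
    (10.0, 0.0, 0.0, 10.0),
    (5.0, 0.5, 0.0, 4.975),
    (12.0, 0.0, 0.0, -12.0),
    (6.0, 0.0, 0.6, -5.97),
]
jets, y_values = cluster_to_jets(toy_event)
print(jets)
print(y_values)
```

In this toy event each merge combines nearly collinear clusters, so both $Y_4$ and $Y_3$ are small, which is the behaviour the text attributes to two-parton signal events.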

Event selection
The final state of the signal contains two jets and two leptons with opposite charge and the same flavor. There are two types of backgrounds according to their final states: the irreducible backgrounds and the reducible backgrounds. The irreducible backgrounds have the same final state as the signal. The semi-leptonic $ZZ$ process, in which one $Z$ boson decays to $e^+e^-$ or $\mu^+\mu^-$ and the other decays to a quark pair, is a typical example and the major component of the irreducible backgrounds. The reducible backgrounds include all the other types of background, which have different final states, such as hadronic or leptonic $W^+W^-$ and $ZZ$ production, lepton pair or quark pair production, semi-leptonic $W^+W^-$ production, and Higgs production processes other than $l^+l^-H$. There are also backgrounds from $e^+e^-H$ or $\mu^+\mu^-H$ production in which the Higgs boson decays to $WW^*$ or $ZZ^*$ and both vector bosons subsequently decay to quarks. They are reducible backgrounds, since there are more quarks in the final state than in the signal. However, they behave more like irreducible backgrounds experimentally, so they are discussed separately from the typical irreducible and reducible backgrounds, and are referred to as the $e^+e^-H$ or $\mu^+\mu^-H$ background in this analysis. There are also other irreducible backgrounds from $l^+l^-H$, in which the Higgs boson decays to a light quark pair, to $\tau^+\tau^-$ or to photons, when all the tau leptons or photons from the Higgs boson decay are misidentified as jets. Their contributions are expected to be very small, and they are classified as reducible backgrounds instead of $l^+l^-H$ backgrounds.
* Here primary parton refers to partons produced directly via electroweak processes or Higgs boson decay. It counts only the parton multiplicity before gluon radiation and gluon splitting.
Each event must contain two isolated tracks with opposite charge, reconstructed as $e^+e^-$ or $\mu^+\mu^-$. The energy of each isolated lepton candidate must be above 20 GeV. The isolation criterion requires $E_{\mathrm{cone}}^2 < 4E_{\mathrm{lep}} + 12.2$, where $E_{\mathrm{lep}}$ is the energy of the lepton and $E_{\mathrm{cone}}$ is the energy within a cone of $\cos\theta_{\mathrm{cone}} > 0.98$ around the lepton. Here $E_{\mathrm{lep}}$ and $E_{\mathrm{cone}}$ are measured in GeV. Events with additional isolated leptons are rejected (extra isolated lepton veto). The polar angle of the lepton pair system is required to satisfy $|\cos\theta_{\mu^+\mu^-}| < 0.81$ and $|\cos\theta_{e^+e^-}| < 0.71$. The angle $\psi$ between the two isolated tracks is required to satisfy $\cos\psi > -0.93$ in the $e^+e^-H$ channel and $\cos\psi > -0.74$ in the $\mu^+\mu^-H$ channel, to reject events from lepton pair production, where the leptons tend to be back to back. The invariant mass of the lepton pair is required to be inside the $Z$-mass window, defined as 77.5 GeV to 104.5 GeV.
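The lepton-pair requirements listed above can be collected into a single predicate. The sketch below is illustrative only: the function names, argument layout and the inclusive treatment of the $Z$-mass window edges are assumptions, and the extra isolated lepton veto is not modeled.

```python
# Channel-dependent thresholds quoted in the text.
CHANNEL_CUTS = {
    "ee": {"max_abs_cos_theta_ll": 0.71, "min_cos_psi": -0.93},
    "mumu": {"max_abs_cos_theta_ll": 0.81, "min_cos_psi": -0.74},
}

def lepton_is_isolated(e_lep, e_cone):
    """Energy threshold and isolation criterion; energies in GeV."""
    return e_lep > 20.0 and e_cone ** 2 < 4.0 * e_lep + 12.2

def passes_lepton_pair_selection(channel, leptons, cos_theta_ll, cos_psi, m_ll):
    """leptons: two (E_lep, E_cone) pairs; m_ll: lepton-pair invariant mass in GeV."""
    cuts = CHANNEL_CUTS[channel]
    return (
        all(lepton_is_isolated(e_lep, e_cone) for e_lep, e_cone in leptons)
        and abs(cos_theta_ll) < cuts["max_abs_cos_theta_ll"]
        and cos_psi > cuts["min_cos_psi"]
        and 77.5 <= m_ll <= 104.5  # window edges assumed inclusive
    )

good_leptons = [(45.0, 5.0), (40.0, 3.0)]
print(passes_lepton_pair_selection("mumu", good_leptons, 0.3, -0.5, 91.2))
print(passes_lepton_pair_selection("mumu", good_leptons, 0.3, -0.8, 91.2))
```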
The remaining particles in the event are used to reconstruct exactly two jets with polar angle $\theta_{\mathrm{jet}}$ in the range $|\cos\theta_{\mathrm{jet}}| < 0.96$. The two jets are required to contain at least 20 particles, each with an energy of no less than 0.4 GeV. This requirement is optimized to separate the signal events from events containing fake jets from photons or leptons. The invariant mass of the jet pair is required to be between 75 GeV and 150 GeV to reject the irreducible backgrounds.
To suppress the $e^+e^-H$ and $\mu^+\mu^-H$ backgrounds, $Y_4$, an indicator of the primary parton multiplicity described in Sec. 3, is required to be less than 0.011.
The invariant mass of the system recoiling against the lepton pair, denoted as $M^{ll}_{\mathrm{recoil}}$, provides a clear signature of the $llH$ events. The definition of $M^{ll}_{\mathrm{recoil}}$ is:
$$M^{ll}_{\mathrm{recoil}} = \sqrt{\left(\sqrt{s} - E_{l^+} - E_{l^-}\right)^2 - \left|\vec{P}_{l^+} + \vec{P}_{l^-}\right|^2}, \tag{2}$$
in which $\sqrt{s} = 250$ GeV, while $E$ and $\vec{P}$ stand for the energy and momentum of the leptons respectively. A Higgs mass window is defined by requiring $M^{ll}_{\mathrm{recoil}}$ to be between 124 GeV and 140 GeV. The signal and background yields at each stage of the event selection are summarized in Tab. 1 for the $\mu^+\mu^-H$ and $e^+e^-H$ analyses.
Table 1. Event yields of the cut flow. Signal events are $ll + H \to ll + b\bar{b}/c\bar{c}/gg$ combined. The $\mu^+\mu^-H$ and $e^+e^-H$ backgrounds refer to events in which the Higgs boson is produced in association with $\mu^+\mu^-$ or $e^+e^-$ but decays to final states other than $b\bar{b}/c\bar{c}/gg$. 'Other Higgs background' stands for Higgs production processes different from the signal. 'Irreducible SM background' is the $e^+e^-/\mu^+\mu^-$ + jet pair process without Higgs production. 'Other SM background' includes all the other background processes. The 'fit region' is described in Sec. 5.
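The recoil mass of Eq. (2) depends only on the two lepton four-momenta and the center-of-mass energy. The following sketch evaluates it for leptons given as (E, px, py, pz) tuples in GeV; the function name and the tuple layout are illustrative assumptions.

```python
import math

def recoil_mass(lep1, lep2, sqrt_s=250.0):
    """Mass of the system recoiling against the lepton pair, in GeV."""
    energy = lep1[0] + lep2[0]
    momentum_sq = sum((a + b) ** 2 for a, b in zip(lep1[1:], lep2[1:]))
    mass_sq = (sqrt_s - energy) ** 2 - momentum_sq
    # Resolution effects can push mass_sq slightly negative; clamp at zero.
    return math.sqrt(max(mass_sq, 0.0))

# Two back-to-back 50 GeV leptons leave 150 GeV for the recoiling system.
print(recoil_mass((50.0, 50.0, 0.0, 0.0), (50.0, -50.0, 0.0, 0.0)))
```

Because only the leptons enter the calculation, the Higgs signal can be identified without using its decay products, which is what makes the recoil mass a useful fit variable for all three hadronic final states.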

Recoil-mass-flavor fit
After applying the object and event selection described in Sec. 4, it is necessary to determine the fractions of the $H\to b\bar{b}$, $H\to c\bar{c}$ and $H\to gg$ components. This is achieved with the high-performance multivariate flavor tagging toolkit in LCFIPLUS, which handles both the training and the application of the flavor tagging algorithms. The training is performed on a simulated $Z\to q\bar{q}$ sample, produced at $\sqrt{s}$ of 91.2 GeV. The reconstructed jets in the sample are classified into 4 categories according to the multiplicity of secondary vertices and leptons in the jet: jets with both a secondary vertex and a lepton, jets with a secondary vertex but without a lepton, jets without a secondary vertex but with a lepton, and jets with neither a secondary vertex nor a lepton. The details of the secondary vertex finding can be found in Ref. [44]. The leptons are selected according to the lepton identification presented in Sec. 3, without any isolation requirement. In each category, two trainings, one for the b-tagging algorithm and the other for the c-tagging algorithm, are performed with the GBDT method. Discriminating variables such as jet kinematic variables, track impact parameters and secondary vertex parameters are included in the training. The training produces a b-tagging model and a c-tagging model. By invoking these models, a b-jet likeness weight and a c-jet likeness weight are calculated for each jet, representing the resemblance of the jet to a b-jet or a c-jet respectively. The likeness weights range between 0 and 1, and a higher weight indicates a higher likelihood that the jet is a b-jet or a c-jet.
The b-likeness weights of the two individual jets of any selected event, $L_{b1}$ and $L_{b2}$, can be used to construct the combined B-likeness, defined as:
$$X_B = \frac{L_{b1} L_{b2}}{L_{b1} L_{b2} + (1 - L_{b1})(1 - L_{b2})}. \tag{3}$$
A combined C-likeness $X_C$ can be defined in a similar way, by replacing $L_{b1}$ and $L_{b2}$ with $L_{c1}$ and $L_{c2}$ respectively on the right-hand side of Eq. (3). The conservation of quark flavor in the Higgs boson decay guarantees that $X_B$ ($X_C$) is close to 1 if the Higgs boson decays to $b\bar{b}$ ($c\bar{c}$) and close to 0 otherwise. Thus the flavor of the jets in each event can be characterized by the two-dimensional distribution of the variables $X_B$ and $X_C$. Flavor templates are created from the $(X_B, X_C)$ distributions of the different processes. By fitting the data with these flavor templates, one can obtain the fraction of each process. This template fit approach was used in the ILC $H\to b\bar{b}/c\bar{c}/gg$ analysis [30].
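A small numerical sketch shows how the combination in Eq. (3) sharpens the per-jet weights: two moderately b-like jets yield a combined likeness closer to 1 than either jet alone. The function name is illustrative, and the handling of the undefined case where one weight is exactly 0 and the other exactly 1 is an assumption.

```python
def combined_likeness(l1, l2):
    """Combine two per-jet likeness weights in [0, 1] following Eq. (3)."""
    numerator = l1 * l2
    denominator = numerator + (1.0 - l1) * (1.0 - l2)
    if denominator == 0.0:
        # One weight is exactly 0 and the other exactly 1: the ratio is
        # undefined, so return the neutral value 0.5 (an assumed convention).
        return 0.5
    return numerator / denominator

print(combined_likeness(0.9, 0.9))  # both jets b-like
print(combined_likeness(0.5, 0.5))  # no information
print(combined_likeness(0.1, 0.1))  # both jets unlike b
```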
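Here the template fit is combined with the fit to $M^{ll}_{\mathrm{recoil}}$, defined in Eq. (2). The combined fit is motivated by the need to separate the irreducible backgrounds from the signal while extracting the flavor components of the signal events. The fit is applied simultaneously to three variables: $M^{ll}_{\mathrm{recoil}}$, $X_B$ and $X_C$. The RooFit package is used to perform an unbinned likelihood fit to the weighted events. The overall likelihood function is constructed as:
$$\begin{aligned}
L(&M^{ll}_{\mathrm{recoil}}, X_B, X_C;\ \theta_s, \theta_b, N^{\mathrm{sig}}_{H\to b\bar{b}}, N^{\mathrm{sig}}_{H\to c\bar{c}}, N^{\mathrm{sig}}_{H\to gg}, N^{\mathrm{bkg}}_{\mathrm{irred}\,b\bar{b}}, N^{\mathrm{bkg}}_{\mathrm{irred}\,c\bar{c}}, N^{\mathrm{bkg}}_{\mathrm{irred}\,uds}, N^{\mathrm{bkg}}_{l^+l^-H}, N_{\mathrm{redu}}) \\
={}& P_{\mathrm{sig}}(M^{ll}_{\mathrm{recoil}}; \theta_s)\Big(N^{\mathrm{sig}}_{H\to b\bar{b}}\, P^{H\to b\bar{b}}_{\mathrm{flavor}}(X_B, X_C) + N^{\mathrm{sig}}_{H\to c\bar{c}}\, P^{H\to c\bar{c}}_{\mathrm{flavor}}(X_B, X_C) \\
&\qquad + N^{\mathrm{sig}}_{H\to gg}\, P^{H\to gg}_{\mathrm{flavor}}(X_B, X_C) + N^{\mathrm{bkg}}_{l^+l^-H}\, P^{H\to \mathrm{other}}_{\mathrm{flavor}}(X_B, X_C)\Big) \\
&+ P_{\mathrm{irred}}(M^{ll}_{\mathrm{recoil}}; \theta_b)\Big(N^{\mathrm{bkg}}_{\mathrm{irred}\,b\bar{b}}\, P^{\mathrm{irred}\,b\bar{b}}_{\mathrm{flavor}}(X_B, X_C) + N^{\mathrm{bkg}}_{\mathrm{irred}\,c\bar{c}}\, P^{\mathrm{irred}\,c\bar{c}}_{\mathrm{flavor}}(X_B, X_C) + N^{\mathrm{bkg}}_{\mathrm{irred}\,uds}\, P^{\mathrm{irred}\,uds}_{\mathrm{flavor}}(X_B, X_C)\Big) \\
&+ N_{\mathrm{redu}}\, P_{\mathrm{redu}}(M^{ll}_{\mathrm{recoil}}, X_B, X_C).
\end{aligned} \tag{4}$$
$N^{\mathrm{sig}}_{H\to b\bar{b}}$, $N^{\mathrm{sig}}_{H\to c\bar{c}}$ and $N^{\mathrm{sig}}_{H\to gg}$ are the signal event yields of $H\to b\bar{b}/c\bar{c}/gg$ respectively, which are the quantities of interest in this analysis. $N^{\mathrm{bkg}}_{\mathrm{irred}\,b\bar{b}}$, $N^{\mathrm{bkg}}_{\mathrm{irred}\,c\bar{c}}$ and $N^{\mathrm{bkg}}_{\mathrm{irred}\,uds}$ are the event yields of the irreducible background with $b\bar{b}$, $c\bar{c}$ or light quarks in the final state, respectively. $N_{\mathrm{redu}}$ is the event yield of the reducible backgrounds. $\theta_s$ and $\theta_b$ are the shape parameters of the $M^{ll}_{\mathrm{recoil}}$ spectrum of $llH$ and the irreducible background respectively. Functions of the form $P^{p}_{\mathrm{flavor}}(X_B, X_C)$ are the two-dimensional distributions of $(X_B, X_C)$ for process $p$, which are modeled as two-dimensional histograms with 20 × 20 bins from MC simulation. The recoil mass function of $l^+l^-H$ events, denoted as $P_{\mathrm{sig}}(M^{ll}_{\mathrm{recoil}}; \theta_s)$, is described by a Crystal Ball function combined with a double-sided exponential function which peaks near the Higgs mass threshold, to describe the resolution effects of the track energy and momentum measurements.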
The lepton pair recoil mass spectrum of irreducible background events, denoted as $P_{\mathrm{irred}}(M^{ll}_{\mathrm{recoil}}; \theta_b)$, is described by a first-order Chebychev polynomial ($\mu^+\mu^-H$ channel) or by exponential functions ($e^+e^-H$ channel). $P_{\mathrm{redu}}$ is the three-dimensional distribution of $M^{ll}_{\mathrm{recoil}}$, $X_B$ and $X_C$ of the reducible backgrounds, taken from MC simulation. The $M^{ll}_{\mathrm{recoil}}$ range in the fit is set to be between 120 GeV and 140 GeV for the $\mu^+\mu^-H$ channel, and between 115 GeV and 140 GeV for the $e^+e^-H$ channel. These ranges are slightly wider than the signal regions, to give a better estimate of the irreducible background. The signal and background yields in these regions are also summarized in Tab. 1. All the functions, including the recoil mass functions, the flavor templates and $P_{\mathrm{redu}}(M^{ll}_{\mathrm{recoil}}, X_B, X_C)$, are normalized to 1 in the fit range.
The shape parameters of the Crystal Ball function in $\theta_s$ and all the parameters in $\theta_b$ are free in the fit. The tail distribution of the signal events in the $M^{ll}_{\mathrm{recoil}}$ spectrum near the Higgs mass threshold is fixed according to a signal-only fit. The event yield parameters $N^{\mathrm{sig}}_{H\to b\bar{b}}$, $N^{\mathrm{sig}}_{H\to c\bar{c}}$, $N^{\mathrm{sig}}_{H\to gg}$, $N^{\mathrm{bkg}}_{\mathrm{irred}\,b\bar{b}}$, $N^{\mathrm{bkg}}_{\mathrm{irred}\,c\bar{c}}$ and $N^{\mathrm{bkg}}_{\mathrm{irred}\,uds}$ are also free in the fit, while $N^{\mathrm{bkg}}_{l^+l^-H}$ and $N_{\mathrm{redu}}$ are fixed to the MC predictions. The 3D distribution of the reducible background, $P_{\mathrm{redu}}(M^{ll}_{\mathrm{recoil}}, X_B, X_C)$, is also fixed to that predicted by MC simulation.
The fit program minimizes the negative logarithm of the likelihood function, $-\sum_i w_i \log L(M^i, X^i_B, X^i_C)$, in which $L$ is the likelihood function presented in Eq. (4), and $w_i$, $X^i_B$, $X^i_C$ and $M^i$ are the event weight†, $X_B$, $X_C$ and $M^{ll}_{\mathrm{recoil}}$ of event $i$ respectively. The summation runs over all the events in the fit region. The fit results for the simulated data are shown in Fig. 1, in which one can see that the model describes the data very well.
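The weighted negative log-likelihood can be illustrated with a deliberately simplified one-dimensional toy. The sketch below is not the RooFit model of this analysis: it replaces the Crystal Ball signal shape with a Gaussian (untruncated, which is a good approximation for a narrow peak well inside the range), uses a flat background, fits a signal fraction instead of yields, and minimizes by a coarse grid scan. All names and numbers are hypothetical.

```python
import math

FIT_LO, FIT_HI = 120.0, 140.0  # recoil-mass fit range in GeV (mu+mu-H channel)

def toy_density(mass, f_sig, mean=125.0, sigma=0.3):
    """Toy model: Gaussian signal peak plus flat background, fraction f_sig."""
    gauss = math.exp(-0.5 * ((mass - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    flat = 1.0 / (FIT_HI - FIT_LO)
    return f_sig * gauss + (1.0 - f_sig) * flat

def weighted_nll(masses, weights, f_sig):
    """-sum_i w_i log L(M_i) for the toy model."""
    return -sum(w * math.log(toy_density(m, f_sig)) for m, w in zip(masses, weights))

def scan_signal_fraction(masses, weights, steps=99):
    """Coarse grid scan over f_sig in (0, 1); a stand-in for a real minimizer."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda f: weighted_nll(masses, weights, f))

masses = [124.8, 125.0, 125.2, 125.1, 124.9, 121.0, 130.0, 137.0]
weights = [0.5] * len(masses)  # per-event normalization weights
best_f = scan_signal_fraction(masses, weights)
print(best_f, weighted_nll(masses, weights, best_f))
```

A uniform weight rescales the objective without moving its minimum; in a real weighted fit the weights also affect the parameter uncertainties, which is one reason the statistical uncertainty is usually evaluated with a separate procedure.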

Uncertainties of Measurements
The statistical uncertainty is estimated using a toy MC method with 6000 ($\mu^+\mu^-H$ channel) and 10000 ($e^+e^-H$ channel) iterations. In each iteration, the 'data' are filled into a 3D histogram with dimensions ($M^{ll}_{\mathrm{recoil}}$, $X_B$, $X_C$). In each bin of the histogram, the event yield is fluctuated according to a Poisson distribution. A binned fit with the model described in Sec. 5 is then applied to the fluctuated histogram. The statistical uncertainty of the signal event yields is estimated from the dispersion of the fit results over the toy MC samples. The results of the toy MC test for $H\to b\bar{b}$, $H\to c\bar{c}$ and $H\to gg$ are shown in Fig. 2, in terms of the pull of the fitted number of signal events for toy MC samples in the signal region. The pull distributions conform well to the standard normal distribution.
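The toy MC procedure can be sketched with a much simpler stand-in for the fit. In the example below each bin is fluctuated with a Poisson distribution, and the signal yield is then estimated by subtracting the known background from the observed total, rather than by the template fit used in the analysis. The bin contents, the counting estimator and all names are hypothetical.

```python
import math
import random
import statistics

def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def run_toys(signal, background, n_toys, seed=1):
    """Return the pull of the estimated signal yield for each toy experiment."""
    rng = random.Random(seed)
    n_true = sum(signal)
    pulls = []
    for _ in range(n_toys):
        observed = [poisson(s + b, rng) for s, b in zip(signal, background)]
        total = sum(observed)
        n_est = total - sum(background)  # counting estimate, stands in for the fit
        pulls.append((n_est - n_true) / math.sqrt(total))
    return pulls

signal = [5.0, 20.0, 40.0, 20.0, 5.0]  # hypothetical expected signal per bin
background = [10.0] * 5                # hypothetical expected background per bin
pulls = run_toys(signal, background, n_toys=2000)
print(statistics.mean(pulls), statistics.stdev(pulls))
```

A pull distribution with mean near 0 and width near 1 indicates that the estimator is unbiased and that its quoted uncertainty matches the actual spread of the results.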
The systematic uncertainties from the luminosity, the lepton identification and selection efficiencies, the $Z\to\mu^+\mu^-/e^+e^-$ modeling and the ISR correction factor [46] affect the measurements of $\sigma^{H\to b\bar{b}/c\bar{c}/gg}_{l^+l^-H}$. However, these uncertainties also affect the measurement of the inclusive cross section of Higgs boson production in association with a lepton pair, $\sigma_{llH}$, such as that presented in Ref. [47]. To obtain the branching fractions of $H\to b\bar{b}/c\bar{c}/gg$, one needs to divide the measured $\sigma^{H\to b\bar{b}/c\bar{c}/gg}_{l^+l^-H}$ by $\sigma_{llH}$. As a consequence, the systematic uncertainties discussed above cancel in the ratio, and they are ignored here.
The fit method described in Sec. 5 has two types of systematic uncertainty. The first kind is due to imperfect modeling of the PDFs in the likelihood in Eq. (4). Inaccurate modeling of the $M^{ll}_{\mathrm{recoil}}$ distribution and a bias in the prediction of $P_{\mathrm{flavor}}(X_B, X_C)$ can lead to this kind of uncertainty. The latter is discussed later; here we focus only on the recoil mass modeling. The recoil mass functions for signal and background are verified by fitting the signal and background datasets separately. The results demonstrate that the functions describe the shapes very well. The shape parameters are left to float in the fit, except for those of the tail function in the signal recoil mass. The tail function can be studied by comparing the track momentum and energy resolution in data and MC. For now we assume that the resolution is well simulated in the MC samples.
The other kind of systematic uncertainty in the fit comes from the uncertainty of the fixed parameters, including the event yields of the $e^+e^-H$ or $\mu^+\mu^-H$ background, as well as the yields of all reducible backgrounds. We conservatively set the $H\to WW^*$ and $H\to ZZ^*$ event yields 5% higher and lower than the MC prediction, and vary the yields of the non-$ZZ$ backgrounds by ±100%, to estimate the corresponding systematic uncertainty. The systematic uncertainties discussed above are included in the row 'Fixed Background' in Tab. 2.
In signal events, extra leptons (leptons other than the 2 primary leptons) come from leptonic decays of heavy flavor quarks. The systematic uncertainty of the extra isolated lepton veto efficiency for $H\to b\bar{b}$ and $H\to c\bar{c}$ can be estimated using the $b\bar{b}$ and $c\bar{c}$ events produced at the Z-pole respectively. By selecting two-jet events, requiring both jets to be tagged as b-jets and looking for an isolated lepton, the efficiency of the lepton veto can be studied with 2 billion $b\bar{b}$ events in the Z-pole sample. Owing to the high statistics of the Z-pole sample and the very low rate of occurrence of extra leptons, the veto efficiency can be studied with a precision so high that it has no visible impact on the analysis presented. The impact on $H\to c\bar{c}$ can be studied in a similar way. By requiring both jets to be c-tagged in the Z-pole sample and looking for isolated leptons, one can study the veto efficiency with a precision high enough that it has no visible impact on the analysis. For $H\to gg$ we assume that the MC has a ±50% uncertainty in predicting the lepton veto efficiency, and estimate the impact to be negligible in this analysis.
The jet particle multiplicity, the jet angular distribution and the $Y_k$ value can be studied with very high precision using the high-statistics Z-pole data. Correspondingly, the systematic uncertainties of the efficiencies of the jet particle multiplicity cut, the jet $\cos\theta$ cut and the $Y_k$ cut are negligible.
The systematic uncertainty of the efficiency of the jet pair invariant mass cut can be estimated from the jet energy resolution. We apply a smearing to the jet pair mass distribution according to a Gaussian distribution corresponding to the jet energy resolution. We take the value of 4% as the jet energy resolution from the CEPC pre-CDR [25] and find that the uncertainties on the event yields of $H\to b\bar{b}$, $H\to c\bar{c}$ and $H\to gg$ are $^{+0.68\%}_{-0.20\%}$, $^{+0.43\%}_{-1.08\%}$ and $^{+0.71\%}_{-1.68\%}$ respectively. The uncertainties from the extra lepton veto, the jet angular distribution and reconstructed particle multiplicity, and the jet pair mass resolution are included in the row 'Event Selection' in Tab. 2.
† In this analysis the event weights are used for normalization, as mentioned in Sec.
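The smearing procedure can be illustrated on a toy mass distribution. The sketch below multiplies each dijet mass by a Gaussian factor of 4% relative width and compares the efficiency of the 75 GeV to 150 GeV window before and after. The toy mass spectrum, the multiplicative form of the smearing and all names are assumptions made for illustration, not the procedure used to obtain the numbers quoted above.

```python
import random

def window_efficiency(masses, lo=75.0, hi=150.0):
    """Fraction of events with dijet mass inside [lo, hi] GeV."""
    return sum(lo <= m <= hi for m in masses) / len(masses)

def smear(masses, rel_sigma, rng):
    """Multiply each mass by a Gaussian factor of relative width rel_sigma."""
    return [m * rng.gauss(1.0, rel_sigma) for m in masses]

rng = random.Random(7)
# Hypothetical reconstructed dijet mass spectrum (GeV), not from the analysis.
toy_masses = [rng.gauss(120.0, 12.0) for _ in range(20000)]

nominal = window_efficiency(toy_masses)
smeared = window_efficiency(smear(toy_masses, 0.04, rng))
relative_shift = (smeared - nominal) / nominal
print(nominal, smeared, relative_shift)
```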
Since the flavor tagging is implemented via the flavor template fit, the flavor tagging systematic uncertainty is directly caused by the difference between the templates predicted by MC and the templates in data. Evaluating such differences demands careful flavor tagging commissioning and calibration. Although no such commissioning or calibration has been done yet, we can estimate the systematic uncertainty by assuming a residual difference between data and MC after calibration, and then studying its impact on the $H\to b\bar{b}/c\bar{c}/gg$ branching fraction measurements. We select $ZZ\to q\bar{q} + \mu^+\mu^-$ events as a control sample, which has a purity of 99.6%, and assume that a data-MC comparison of the template distribution has been done on this control sample. The estimate of the data-MC agreement is limited by the statistical uncertainty of the control sample and by the knowledge of the flavor composition of hadronic $Z$ decays. For example, more than 80% of the $b\bar{b}$ events are concentrated in the region with the highest b-likeness (b-likeness > 0.95) and the lowest c-likeness (c-likeness < 0.05). If, due to some b-tagging systematic effect, the fraction of $Z\to b\bar{b}$ events in this most populated bin changed, the data-MC disagreement would increase. There are $1.92 \times 10^4$ $Z\to b\bar{b}$ events in this bin, which corresponds to a statistical uncertainty of 0.72%. The current combined measurement of $R_b$, defined as $\mathrm{Br}(Z\to b\bar{b})/\mathrm{Br}(Z\to q\bar{q})$, has an uncertainty of 0.31% [48]. So the data and MC can be compared with a precision of $\sqrt{(0.72\%)^2 + (0.31\%)^2} = 0.78\%$. By scaling the contents of this bin up and down by 0.78% in the $b\bar{b}$ template, one can estimate the uncertainties on $H\to b\bar{b}$, $H\to c\bar{c}$ and $H\to gg$ to be $^{+0.2\%}_{-0.4\%}$, $^{+3.7\%}_{-5.0\%}$ and $^{+0.2\%}_{-0.7\%}$ respectively. With a much larger control sample (hadronic events at the Z-pole) and a better understanding of the relationship between the flavor tagging variables and the kinematic features, the uncertainty could be further reduced.
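The 0.72% and 0.78% figures above follow from Poisson counting and a quadrature sum. The short check below reproduces them; note that the combination uses the rounded 0.72%, as in the text. The variable names are illustrative.

```python
import math

n_bb_in_bin = 1.92e4  # Z -> bb events in the most populated template bin

# Relative Poisson uncertainty of the bin content, in percent.
stat_percent = 100.0 / math.sqrt(n_bb_in_bin)

# Quadrature sum with the R_b uncertainty, using the rounded values of the text.
total_percent = math.hypot(0.72, 0.31)

print(f"statistical: {stat_percent:.2f} %")
print(f"combined:    {total_percent:.2f} %")
```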
The statistical uncertainty and systematic uncertainty discussed in this section are summarized in Tab. 2.

Conclusion
The measurements of the branching fractions of $H\to b\bar{b}/c\bar{c}/gg$ are studied in the $\mu^+\mu^-H$ and $e^+e^-H$ processes, in the scenario of analysing 5000 fb$^{-1}$ of $e^+e^-$ collision data at $\sqrt{s}$ of 250 GeV at the CEPC. The statistical uncertainties on the $\sigma^{H\to b\bar{b}}_{l^+l^-H}$, $\sigma^{H\to c\bar{c}}_{l^+l^-H}$ and $\sigma^{H\to gg}_{l^+l^-H}$ measurements are estimated to be around 1.1%, 10.5% and 5.4% respectively in the $\mu^+\mu^-H$ channel, and 1.6%, 14.7% and 10.5% respectively in the $e^+e^-H$ channel. The systematic uncertainties on the branching fraction measurements are also studied, and are around 0.6%, 6% and 8% for the $b\bar{b}$, $c\bar{c}$ and $gg$ final states respectively. The high precision of this measurement benefits from the distinct signature of events containing a Higgs boson and the clean background at an electron-positron collider, as well as from the model-independent analysis. As a comparison, by combining the extrapolated ATLAS and CMS results in the scenario of the High Luminosity LHC (HL-LHC) [49], the $H\to b\bar{b}$ branching fraction precision is expected to be 4.4%, in which the statistical, systematic and theoretical uncertainties are 1.5%, 1.3% and 4.0% respectively. This study demonstrates the feasibility of a precise measurement of the Higgs Yukawa couplings to quarks at the CEPC.

Acknowledgment*
We would like to thank the CEPC Higgs physics working group for the valuable discussions, as well as for the software and physics work without which this work could not have been accomplished.