Tuning and validation of hadronic event generator for R value measurements in the tau-charm region *

: To measure the R value in an energy scan experiment with e + e − collisions, precise calculation of initial state radiation is required in the event generators. We present an event generator for this consideration, which incorporates initial state radiation eﬀects up to second order accuracy. The radiative correction factor is calculated using the totally hadronic Born cross section. The measured exclusive processes are generated according to their cross sections, while the unknown processes are generated using the LUND Area Law model, and its parameters are tuned with data collected at √ s =3 . 65 GeV. The optimized values are validated with data in the range √ s = 2 . 2324—3 . 671 GeV. These optimized parameters are universally valid for event generation below the D¯D threshold.


Introduction
The total cross section for hadron production in positron-electron (e + e − ) annihilation is one of the most fundamental observables in particle physics. A precise measurement of the hadronic cross section allows us to determine the hadronic contributions to the running of the quantum electrodynamic (QED) fine structure constant α, electroweak parameters, and the strong coupling α s . The R value, defined as the ratio of the total hadronic cross section to that of e + e − → µ + µ − at Born level, have been measured by many collaborations in e + e − scan experiments, over the center-of-mass energy from the two pion mass threshold (M 2π ) to the Z peak [1]. In the tau-charm energy region, the R values measured at BESII [2] were used in the evaluation of the hadronic contribution from the five quark loops at the energy of Z peak, ∆α (5) had (M 2 Z ), with an improved precision by a factor of 2 [3].
A large number of exclusive processes have been measured over the range from M 2π to 5 GeV [4], but most cross sections have large uncertainties. To improve these measurements, a hadronic event generator is needed for us to get better understanding of background events from e + e − → hadrons.
Especially, a precise R-value measurement requires excellent control of radiative correction (RC) and vacuum polarization (VP) in the Monte Carlo (MC) program. We design an event generator for measuring R values and exclusive decays in e + e − collisions. The generator is constructed in the framework of BesEvtGen [5], incorporating both the RC and VP effects. We also present details of the parameter optimization of the Lund Area Law (LUARLW ) model [6] with data, and validations with various distributions within the energy range √ s = 2.2324-3.671 GeV.

Framework of event generator
The generator is constructed as a model of the Bes-EvtGen package. It provides the 4-momentum of each final state particle for detector simulation, and provides the ISR correction factor and VP factors for users to undress the observed cross section. The basic idea of this generator is to decompose the total hadronic cross section into the measured exclusive processes and remaining unknown processes. The latter are generated with the LUARLW model.

Initial state radiative correction
In an e + e − energy scan experiment, we consider a measurement of the Born cross section (σ 0 ) for a process e + e − → X i , as shown in Fig. 1 (a), where X i denotes the hadron states of i-th process. Due to ISR, the observed cross section (σ) is actually for the process e + e − → γ ISR X i , as shown in Fig. 1 (b). The observed cross section is related to the Born cross section by the quasi-real electron method [7]: where m is the invariant mass of the final states; Π(m) is the vacuum polarization function, which will be discussed later; s is the e + e − center-of-mass energy squared; x ≡ 2E * γ / √ s = 1 − m 2 /s, and E * γ is the total energy carried by ISR photons in the e + e − center-of-mass frame; M th is the mass threshold of a given process. To calculate the finite-order leading logarithmic correction, the structure function method is used [8]. This method results in the same factorized form for the radiative photon emission cross section. Up to order α 2 , the radiative function takes the form: where Here the exponential part in Eq. (2) accounts for soft multi-photon emission, while the remaining part takes into account hard collinear bremsstrahlung in the leading logarithmic approximation. We use the radiative function up to the second order calculation to determine the cross section; it is accurate enough to construct the event generator for our purpose, though contributions from the α 3 -order are known [9].
To do the RC for the process e + e − → hadrons, we use the cross sections for the light hadron productions measured so far. In the energy region from M 2π to 5 GeV, the total cross sections are quoted from the Particle Data Group (PDG) [10]. The total distribution is shown in Fig. 2. At the leading order of QED calculation, the ISR photon is characterized by soft energy and beam collinear distribution. A more general result is obtained by the method of Bonneau and Martin [11] up to m 2 e /s terms, and the angular distributions is calculated by where E is the beam energy in the center of mass system of the electron and positron.

Vacuum polarization
The VP of the photon is a quantum effect which leads to the scale dependence of the electromagnetic coupling. It therefore plays an important role in the e + e − physics and it is crucial to know it for the R value measurement.
Conventionally the VP function is denoted by Π(q 2 ), where q is a space-or time-like momentum. In the R value measurement, we only consider the time-like case, i.e. s = q 2 , which receives all possible one-particle irreducible leptonic and hadronic contributions. Their contributions to Π(s) are calculated and then summed. While the leptonic contributions can be predicted within perturbative theory, the precise determination of the hadronic contributions depends on dispersion relations using experimental data as input.
The VP has been calculated by many groups and is available in the literature. Comparisons between them are given in Ref. [12]. There are notable differences below 1.6 GeV, and above 2.0 GeV; visible differences appear when approaching the charmonium resonances. We use the results from the Fred Jegerlehner group [13]. It provides leptonic and hadronic VPs both in the spaceand time-like region. For the leptonic VP the complete one-and two-loop results and the known high-energy approximation for the three-loop corrections are included. The hadronic contributions are given in tabulated form in the subroutine HADR5N [14]. Figure 3 shows the VP factor defined by 1/|1 − Π(s)| 2 in the energy region √ s = 2.0 − 5 GeV. The values at J/ψ and ψ(2S) peaks are very large but less significant elsewhere.

Cross sections for exclusive processes
Many exclusive processes have been measured in the e + e − collision experiments. Currently we collect 76 exclusive modes, with energy region covering from 0.3 GeV up to about 6 GeV. The Born cross sections are quoted from the published papers; their information is given in Table 1. The sum of these cross sections is shown in Fig.  2. Below 2.0 GeV, the total cross section is the sum of the exclusively measured ones.
The narrow vector resonances, such as ψ(3770), ψ(2S), J/ψ, ρ(1700), and ω(1420), are also included in the calculation for the ISR correction factor. The cross sections for these narrow resonances are represented with the Breit-Wigner function where M, γ, and γ ee are the mass, total width and partial decay width to e + e − final state, respectively. The distribution of cross section versus center-ofmass energy is described by an empirical function, which is parameterized with a multi-Gaussian function. Its parameters are determined by fitting the cross section mode by mode. These empirical functions are used in the generator for the calculation of the ISR correction factor and event type sampling.
The angular distribution for ISR photons is implemented according to Eq. (6). However, angular distributions are implemented only for two-body decays, namely, 1−cos 2 θ for PP (where P is a pseudoscalar meson) modes, and 1+α cos 2 θ for the PV (α = 1) and BB modes, where V is a vector meson, and B is a baryon. The angular distribution parameter α for the BB mode is taken as the quark model prediction [43]. The phase space model is used for multi-body decays.

LUND Area Law model
The hadronic events produced in the e + e − annihilation are evolved as follows. As the first step, a quarkantiquark (qq) pair is produced from a virtual photon, coupled to the e + e − pair. Then the qq branching proceeds via emitting gluons, and further develops into hadrons. In the high energy region, the cluster model (e.g. HERWIG [44]) and LUND string model (e.g. JETSET/PYTHIA [45]) are available and precise enough to describe the hadronic fragmentation with parameters optimized at boson Z peak. However, in the intermediate and low energy region, parameters need to be optimized or a new model is desirable to describe the light quark fragmentation.
In the tau-charm energy region, the LUARLW model [6] has been proposed to estimate the multiplicity distribution for primary hadrons produced from the string fragmentation. The probability distribution reads: where c 0 , c 1 , c 2 , α, β and γ are parameters to be tuned with data. An interface to access the LUARLW model is designed in the BesEvtGen [5] framework, and is only used to generate the primary hadrons. The further decays into light hadrons are realized with BesEvtGen [5].

Monte Carlo algorithm
The event sampling proceeds via two steps. Firstly, the mass of the hadron system, M hadrons , is sampled according to the distribution of the observed cross section, i.e. dσ(s)/dm, for the process e + e − → γ ISR X i according to Eq. (1). For simplicity, the ISR energy, √ s−M hadrons , is imposed on a single photon. The second step is to sample the event type topology according to the ratios of individual cross sections at the energy point M hadrons .

Sampling of M hadrons
To calculate the total observed cross section at √ s, we split the integral of Eq. (1) into two parts, i.e.
where the threshold energy M th is the sum of masses for the final state particles, and the broken point is taken at with a cut E cut γ on the ISR photon energy. At the BESIII detector, the designed photon energy of the detection range is from ∼ 20 MeV to 4.2 GeV [46]. If the photon energy is less than 20 MeV, it will be impossible to reconstruct it in the detection simulation.
Hence in practice, E cut γ can be set to an energy less than the sensitivity of photon detection. e.g., E cut γ = 1 MeV. In the range 0 ∼ E cut γ , the ISR photon is too soft to be detected, so the ISR photon is not considered. To simplify the calculation, Born cross sections near the energy point √ s are assumed to be a constant value σ 0 ( √ s). Using the relation x = 1 − m 2 /s, the second integral can be further decomposed into two parts: with b ≪ 1. Using the radiative function given in Eq.
(2), one has with Spence's function Li 2 (x) = −x+ 1 4 To sample the M hadrons , we split the region M th ∼ √ s into a few hundred intervals. The cumulative cross section up to the i-th interval, m i , iŝ The M hadrons is sampled according to theσ(m i ) distribution with the discrete MC sampling technique.

Sampling of event type
Using the discrete MC sampling technique, the final states for exclusive modes are sampled according to the ratios of their cross sections (σ m ) to the total cross section (σ tot ), i.e., where m is an index for exclusive precess, and events for the remainder part, 1 − m c m , are generated with the LUARLW model.

Strategy to optimize the LUARLW parameters
The LUARLW model parameters are optimized with the parameterized response function method. The optimal values are obtained by simultaneously fitting this function to data distributions. The idea for this method is borrowed from that implemented in the event generator tuning tool Professor and Rivet [47] system, which was introduced by TASSO, and later used by ALEPH, DELPHI [48][49][50][51][52][53], and recently by the LHC [47]. This method has the advantage of eliminating the problem from the so-called manual and brute-force tunings, such as the slow tuning procedure and the sub-optimal results.
An ensemble of MC samples was produced within the framework of the BesEvtGen [5] event generator, and then it is subject to detector simulation with BOSS software [57]. 91 independent MC samples were prepared, each one generated with a different set of LUARLW parameters, which were randomly chosen in the parameter space around a given central point p 0 . All MC samples were produced with equal statistics, and were large enough so that the overall statistical uncertainties are negligible.
By including the correlations among the model parameters, the dependence of physical observable is expanded up to the quadratic term as done in Ref. [54], and the response function reads where n is the number of parameters to be fitted, and M C(p 0 + δp, x) denotes the distribution of physical observable x predicted for a given set of parameter values p 0 + δp, where p 0 is the central value and δp i is the deviation of the i-th parameter. The quadratic term in the expansion accounts for the possible correlations between the model parameters. The number of coefficients a (0,1,2) , L, in the expansion is calculated with and the coefficients are determined by fitting Eq. (12) to the L reference simulation distributions. This fit is equivalent to solving a system of linear equations of Eq. (12). Then the optimal values of the parameters p i , their errors σ i , and their correlation coefficients ρ ij will be determined with a standard χ 2 fit to data using package MINUIT [55]. The fit is done simultaneously for all distributions and for all bins.
To minimize statistical uncertainties, the model parameters should be fitted to the distributions that show strong dependence on the parameters under consideration and least dependence on the others. For each distribution, a quality to measure the sensitivity to the model i-th parameter is calculated, i.e.
where δM C(x) is the change of the distribution M C(x) when the model parameter p i is changed by δp i from its central value. Sensitivity values for charged track distributions and event shapes vary within the range from −0.3 to 0.3, but the polar angle and azimuthal distributions for charged tracks are not sensitive to the change of model parameters. This is because the inclusive charged tracks are distributed isotropically over the whole phase space. Taking the sensitivity into consideration, only 12 observable distributions are kept for the model parameter fit. They are the number of photons (N γ ), the number of charged tracks (N track ), momentum of tracks (P track ),  [45], where W is the total reconstructed energy of an event, and P ⊥ is the transverse momentum.
We have 12 parameters to be optimized. According to Eq. (13), there are 91 coefficients, a (0,1,2) in Eq. (12) to be determined. Hence we need at least 91 MC samples to determine these coefficients. These were prepared with 0.5 million events for each sample. Then the dependence of response function on model parameters is established, and this analytical expression is used to simultaneously fit to the data distributions after QED background events are subtracted. In the optimization procedure, the χ 2 function is defined over each bin, ie., χ 2 → χ 2 /N , where χ 2 values are calculated over nonempty N bins. To consider the requirement of fit goodness on the multiplicity of charged tracks, this distribution is weighted with a factor of 10, while other distributions are weighted with a unitary factor. This weighted factor is chosen by requiring that the fit quality of all distributions are satisfactory.

Event selection and fit results
We use the data taken at √ s =3.65 GeV to optimize the parameters. To validate these parameters, we check whether it is suitable for describing the data distribution in the energy region 2.0-4.26 GeV. The QED backgrounds, e.g. e + e − → e + e − , γγ, γ * γ * , µ + µ − , and τ + τ − are subtracted using MC samples, and they are normalized according to their cross sections to the luminosity of data sets. The event selection criteria for light hadrons are similar to those applied to the R value measurements [2,56].
The selected candidates are characterized by the distributions of charged track multiplicity (N track ), track energy (E track ) and momentum (p track ), polar angle (cos θ), azimuthal angle (φ), rapidity, peseudorapidity, and a set of event shapes. These distributions are normalized to one and the errors are scaled for all bins.
To consider the possible correlations between these observable quantities, different observable combinations were tried. In each combination, track observables, N γ , N track , E track , x f and x ⊥ , must be included, while the p track distribution or event shapes are partly included in the simultaneous fit. Generally speaking, the more observable distributions are involved in the fit, the worse fit quality one gets. To validate the resulted parameters, they are reused to generate MC samples, and compared to data. The covariant matrix in fitting was checked, and it shows that there are strong correlations among these parameters. This indicates that the model parameters in question are not independent, which leads to some technical issues. One is the instability of the fitted values. If initial values are changed, then the fit gives a different set of parameters with almost the same fit quality. The dependence on the initial values brings about the so called multi-solution. Fortunately, we find that the produced MC distributions with these multi-solution values are in good agreement with data distributions. The correlation between the parameters means that the fitted value may be unphysical. One recipe to tackle this issue is to fix correlated parameters to the physical values, thus the fit can yield physical values for uncorrelated model parameters.

Validation of tuned parameters
In the simultaneous fit to data, we have tried various combination of data distributions, which results in a few sets of parameters. To select the most optimal values, we compare the data to the MC distributions, which are generated with optimized parameters for all sets. We require that the parameters can produce MC distributions having the best fit goodness quality χ 2 /N , where N is the total number of bins for calculating the χ 2 values. The optimal values are given in Table 2. These values are only responsible for unknown processes other than the exclusive modes. For example, the parameter PARJ(15)=0 implies that exclusive modes have produce sufficient scalar mesons, so the LUARLW model forbids the scalar meson production. Figure 4 shows a comparison of data and MC distributions at √ s = 3.65 GeV, where the MC sample is produced with the optimized parameters. The agreement between them is satisfactory. To demonstrate the flexibility of these parameters at low energy points, we generate MC at 3.06 GeV with the same parameters, and Fig. 5 shows comparisons between the data and MC simulation. The agreement between the data and MC distributions is acceptable. However, above the DD threshold, we check these parameters with the data taken at 4.26, 4.23 and 4.6 GeV, and we find that the agreement between data and MC gets worse. This suggests   that the optimized parameters are acceptable only below the DD threshold. To optimize parameters above the DD threshold, the charm meson decays will have to be added.
To validate this set of parameters for the MC generation below the DD threshold, we compare the charged track multiplicity distributions at 14 energy points from √ s = 2.2324 to 3.671 GeV, as shown in Fig. 6. When extending this set of parameters from 3.65 GeV to low energy points, the agreement between the data and MC multiplicity distributions gets better. This is due to the fact that the total cross section equals the sum of the exclusive ones when approaching the energy 2.0 GeV, as shown in Fig. 2.

Discussion and summary
To summarize, we have developed an event generator for R measurement at energy scan experiments, incorporating the initial state radiation effects up to the second order correction. In the event generator, the ISR correction factor is calculated using the totally hadronic Born cross sections measured in experiments. The measured exclusive processes are generated according to their cross sections, while unknown processes are generated using the LUARLW model, whose parameters are tuned with the data collected at 3.65 GeV. To validate the optimized parameters, we compare various distributions using data sets covering from energy √ s = 2.2324 to 3.671 GeV. We conclude that the optimized parameters are valid for MC generation below the DD threshold. Above the DD threshold, the parameters should be optimized with the charm meson decays.