First observation of forward $Z \rightarrow b \bar{b}$ production in $pp$ collisions at $\sqrt{s}=8$ TeV

The decay $Z \rightarrow b \bar{b}$ is reconstructed in $pp$ collision data, corresponding to 2 fb$^{-1}$ of integrated luminosity, collected by the LHCb experiment at a centre-of-mass energy of $\sqrt{s}=8$ TeV. The product of the $Z$ production cross-section and the $Z \rightarrow b \bar{b}$ branching fraction is measured for candidates in the fiducial region defined by two particle-level $b$-quark jets with pseudorapidities in the range $2.2<\eta<4.2$, with transverse momenta $p_{T}>20$ GeV and dijet invariant mass in the range $45<m_{jj}<165$ GeV. From a signal yield of $5462 \pm 763$ $Z \rightarrow b \bar{b}$ events, where the uncertainty is statistical, a production cross-section times branching fraction of $332 \pm 46 \pm 59$ pb is obtained, where the first uncertainty is statistical and the second systematic. The measured significance of the signal yield is 6.0 standard deviations. This measurement represents the first observation of the $Z \rightarrow b \bar{b}$ production in the forward region of $pp$ collisions.


Introduction
Measurements of Z -boson production in pp collisions constitute an important test of the Standard Model (SM), since they allow the electroweak sector to be precisely probed [1][2][3]. The LHCb experiment can be used to measure the decay of the Z boson into a bb quark pair in the forward region that is inaccessible at other LHC experiments.
The decay Z → bb provides a standard candle for searches in final states with a bb quark pair. The inclusive search for the SM Higgs decay to two b quarks at the LHC is of great interest, since the measurement of the Higgs boson coupling to b quarks is an important test of the SM [4]. Several extensions of the SM predict that new heavy particles that decay to two energetic b quarks could be accessible at LHC collision energies [5][6][7]. A sizeable Z → bb event sample will enable the measurement of the bb forward-central asymmetry at the Z pole, which could be enhanced by the contributions from new physics processes [8]. The forward-central asymmetry in inclusive bb events has previously been measured by the LHCb collaboration [9].
The measurements of this decay can also be used to demonstrate that no biases are induced by the b-jet reconstruction procedure and that the reconstruction efficiencies are evaluated correctly. In addition, the Z → bb decay is important to determine the so-called b-jet energy scale. This is the factor that has to be applied to the reconstructed b-jet energy in simulated events in order to reproduce the actual detector response. The CDF collaboration reconstructed the Z → bb decay in pp collisions at 1.96 TeV and determined the b-jet energy scale, obtaining a relative uncertainty on the product of the cross-section and the branching fraction of 29%. The analysis of the ATLAS collaboration reconstructed boosted Z → bb candidates in the central region of pp collisions at 8 TeV, with pseudorapidity |η| < 2.5, and determined the cross-section with a relative uncertainty of 16%. The CMS collaboration made the first observation of the Z → bb decay in a single-jet topology in the same pseudorapidity region, with a significance of 5.1 standard deviations. This Letter describes a new method to study the Z → bb decay, performed on pp collision data collected at a centre-of-mass energy of √ s = 8 TeV, corresponding to an integrated luminosity of 2 fb −1 . The low trigger thresholds on the particle energies that are employed at LHCb and the excellent b-jet identification performance make it possible to select candidates within a large invariant mass range, including those with masses below the Z -boson pole. Events are selected requiring two b-jet candidates, referred to as a b dijet, and an additional jet that balances the transverse momentum of the bb system. The invariant mass distribution of the b dijet is used to determine the Z → bb yield and the b-jet energy scale. The invariant mass distribution of the QCD background is determined using a control region that is defined through observables related to the b-dijet system and to the associated balancing jet. Simulated data are used to evaluate the reconstruction efficiency and the detector acceptance, enabling a measurement of the Z production cross-section multiplied by the Z → bb branching fraction.

The LHCb detector, trigger and simulation
The LHCb detector [13,14] is a single-arm forward spectrometer fully instrumented in the pseudorapidity range 2 < η < 5, which is designed for the study of b and c hadrons. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a siliconstrip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV. 1 The minimum distance of a track to a primary vertex, the impact parameter, is measured with a resolution of (15 + 29/p T ) μm, where p T is the component of the momentum transverse to the beam, in GeV. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillatingpad (SPD) and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers. The online event selection is performed by a trigger system, which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. Events are required to satisfy at least one of the following hardware trigger requirements: contain a muon with p T > 1.86 GeV, a hadron with transverse energy in the calorimeters E T > 3.7 GeV, an electron with E T > 3 GeV, a photon with E T > 3 GeV or a pair of muons with p T 1 · p T 2 > 1.6 GeV 2 . A global event cut (GEC) on the number of hits in the SPD is applied in order to prevent highmultiplicity events from dominating the processing time. At the software trigger stage events are required to have a two-, threeor four-track secondary vertex (SV) with significant displacement from any primary vertex. A multivariate algorithm [15] is used for the identification of secondary vertices consistent with the decay of a b hadron, strongly suppressing the contamination from charmed hadrons. Simulated events generated with Pythia [16], with a specific LHCb configuration [17], are used to model the properties of the signal Z → bb events and backgrounds such as Z → cc, W → qq decays and tt events. Decays of hadronic particles are described by EvtGen [18], where the final-state radiation is generated using Photos [19]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [20] as described in Ref. [21].

Candidate selection
Candidates are selected by requiring the presence of at least three jets, which are reconstructed as detailed in Refs. [22][23][24][25]. Jets are reconstructed using a particle flow algorithm [25] and are clustered with the anti-k T algorithm [26] with a distance parameter 0.5, as implemented in the FastJet software package [27]. A jet energy correction [25] determined from simulation is applied to recover the jet energy at particle level and jet quality requirements 1 In this Letter natural units where h = c = 1 are used. are applied [25]. Jets are heavy-flavour tagged, i.e. as containing a b or c hadron, if a SV is found with a distance R < 0.5 from the jet axis, where R is the distance in the (η, φ) plane and φ is the azimuthal angle between the jet axis and the vector that points from the pp interaction point to the SV. The details of the flavour-tagging algorithm are described in Ref. [28]. Two heavyflavour tagged jets are required to form a Z → bb candidate. At least one of the two b-jet candidates must be tagged by a SV selected by the software trigger requirements. The two heavy-flavour jets are each required to have transverse momenta p T > 20 GeV, pseudorapidities in the range 2.2 < η < 4.2, and a combined invariant mass (m jj ) in the range 45 < m jj < 165 GeV. The fiducial region of the measurement within which the cross-section is determined is defined by the kinematical requirements described above applied to particle-level jets, which are jets reconstructed in the simulation from stable particles (i.e. particles with lifetime in excess of 10 ps, excluding neutrinos) using the default reconstruction algorithm.
In order to increase the signal-to-background ratio, the absolute azimuthal angle between the two b-jets is required to be greater than 2.5 radians. The presence of a balancing jet is required to help discriminate Z → bb events from the QCD multijet background. The Z + jet signal is predominantly produced via quark-gluon scattering, while the QCD multijet background is produced via gluongluon interactions [29]. The balancing jet is defined as that which minimises the total p T of the Z boson and the jet. This jet is required to have p T > 10 GeV and 2.2 < η < 4.2. Given the SM cross-sections [30] and the selection efficiencies, which are evaluated using simulation, about 17 × 10 3 Z → bb candidates, 600 Z → cc candidates, 200 W → qq candidates and 50 tt candidates are expected after the application of the selection criteria. A sample of around 6 × 10 5 candidates is selected in data, dominated by the combinatorial background from the multijet QCD events.
A multivariate classifier is trained to discriminate Z → bb events from combinatorial QCD events. A uniform Gradient Boost Boosted Decision Tree technique [31] is adopted, in order to ensure a selection efficiency with a low dependence on the dijet invariant mass. The classifier is trained using four kinematical variables of the three-jet system, chosen for both their low correlation with the dijet invariant mass and for their discriminating power. The variables are the absolute pseudorapidity difference between the two heavy-flavour jets, the p T of the balancing jet, the angle between the balancing-jet momentum and the Z -boson candidate momentum in the azimuthal plane with respect to the beam axis, and the polar angle between the balancing-jet momentum and the Z -boson flight direction in the Z -boson rest frame. The classifier is trained using 5% of the data sample to represent the combinatorial QCD background. This training sample has a negligible Z → bb, Z → cc, W → qq and tt contamination and it is not used in the dijet invariant mass fit described below. The signal process is modelled using simulated Z → bb events. The distributions of the input observables related to the balancing-jet kinematics are validated by comparing the high purity Z (→ μ + μ − ) + jet data sample described in Ref. [24] with the corresponding simulation sample.
The output of the classifier (uGB) is shown in Fig. 1. Candidates are selected in two different regions of uGB: the signal region (uGB > x s ), which has enhanced Z → bb contribution, and a control region (uGB < x c ), which has a larger contribution from QCD combinatorial events. The two regions are fitted simultaneously to determine the Z → bb yield, and the values of x s and x c are chosen in order to achieve the best signal significance.

Signal yield determination
A simultaneous fit to the b-dijet invariant mass distributions in the signal and control regions is performed to determine the Z → bb yield and the jet energy scale factor, k JES . A triple-Gaussian model is used to describe the Z → bb dijet invariant mass distribution. The parameters of this model are obtained separately for the candidates in the signal and control regions using simulation, and are fixed in the fit to the data. The k JES factor is also introduced in the Z → bb invariant mass distribution model in order to account for differences between simulation and data in the jet four-momentum. This is achieved by substituting m jj with m jj /k JES in the model. The reconstructed invariant mass of dijets in Z → bb simulated events has a mean of 80 GeV, i.e. below the known Z -boson mass [30], and a resolution of 16%. The reduced mean is due to parton radiation outside the jet cone, missing energy, and residual biases in the reconstructed jet energy that are not recovered by the jet energy correction.
The invariant mass distribution of the combinatorial background is parametrized with a Pearson IV distribution, as is typical to describe the multijet combinatorial background [10]. The four parameters of the Pearson IV function are free to vary in the fit and they have approximately the same values in the signal and control regions, since the uGB is trained to be as uniform as possible with respect to the dijet invariant mass. To take into account the residual correlation with the dijet invariant mass, the Pearson IV distribution is multiplied in the signal (control) region by a linear transfer function t s(c) (m jj ), defined as where the superscript s (c) indicates the signal (control) region, and a s(c) and b s(c) are parameters fixed in the invariant mass fit. The parameters a s(c) and b s(c) are determined by fitting the transfer function to the selection efficiency after the requirement that uGB > x s (uGB < x c ) as a function of the dijet invariant mass in the 45 < m jj < 60 GeV and 100 < m jj < 165 GeV intervals, where the Z → bb contribution is negligible. As a cross-check, data events with uGB < x c are fitted with only the QCD background model, ignoring the small Z → bb contribution, and a good fit quality is obtained.
The invariant mass model used to fit the signal region is where N c Q is the number of QCD events in the control region and Q (m jj ), t c (m jj ) and Z c (m jj ) are the Pearson IV distribution, the transfer function and the Z -boson invariant mass distribution model in the control region. The parameter R is the ratio of the efficiency for Z -boson candidates selected with uGB < x c and uGB > x s and is determined from simulation and fixed in the fit. A simultaneous unbinned maximum likelihood fit is performed with the N s Q , N c Q , N s Z , k JES and the Pearson IV parameters free to vary. Pseudoexperiments are used to verify that the fit is stable and estimate any bias. The parameter N s Z is determined with a bias of about 2% and the value returned by the fit is corrected accordingly in the cross-section determination.
The fit result is shown in Fig. 2 and the background-subtracted data and result of the fit are shown in Fig. 3. The Z -boson yield in the signal region is 5462 ± 763 and the jet energy scale factor is measured to be 1.009 ± 0.015. Using Wilks' theorem [32], the Z → bb statistical significance is found to be 7.3 standard deviations.
As an additional cross-check to validate the technique, a fit to the dijet invariant mass distribution for candidates with x c < uGB < x s is performed, with a model analogous to that used in the signal and control regions. In this case, the parameters of the QCD background are fixed to the values returned by the default fit, but the Z → bb yield in this region, N v Z , is left free. The goodness of this fit is acceptable and the ratio N v Z /N s Z is compatible with the expectation from simulation.

Cross-section determination and systematic uncertainties
The product of the Z -boson production cross-section and the Z → bb branching fraction is determined using Z is the efficiency of the selection requirements, including uGB > x s , for events in the fiducial region, f uGB is the fraction (5%) of data events removed for the multivariate classifier training and 1 + f Z →cc is a factor applied to correct for the small Z → cc contamination. The selection efficiency is obtained from simulation, but correction factors are applied to account for differences in the heavy-flavour tagging efficiencies between data and simulation [28]. By using a small sample with a looser trigger requirement and a technique similar to that described in Ref. [24], the GEC efficiency is also corrected for differences in data and simulation. The balancing-jet selection efficiency is corrected at Next-to-Leading-Order (NLO) using simulated Z → bb events produced with aMC@NLO [33] plus Pythia for parton showers. The f Z →cc fraction is obtained by multiplying the Z → cc and Z → bb branching fraction ratio [30] by the acceptance and the efficiency ratios, both determined using simulation.
The sources of systematic uncertainty considered for the measurement are given in Table 1. Systematic effects that are asso-   Table 1 Systematic uncertainties on the cross-section, σ Z = σ (pp → Z )B(Z → bb), and jet energy scale in percent. The total uncertainty is the sum in quadrature of all the contributions. ciated with differences between data and simulation can affect the signal invariant mass distribution model and the selection efficiency. The impact of these differences is evaluated by repeating the fit with a modified signal model and by recalculating the cross-section varying s Z . Other sources of systematic uncertainties are related to the signal extraction procedure.
The method described in Ref. [28] is used to assess the systematic uncertainty due to the heavy-flavour tagging efficiency which amounts to 5%-10% per jet, depending on the p T range. This uncertainty is dominated by the size of the calibration samples used in the heavy-flavour tagging efficiency measurement. Since one of the two b-jet candidates must be tagged by a SV selected by the software trigger, the uncertainty on this trigger efficiency is included in this contribution. The systematic uncertainty associated with the hardware trigger efficiency is determined by measuring the efficiency with a tag-and-probe technique, using the high purity Z (→ μ + μ − ) + jet data sample [24]. In order to avoid trigger bias on the jet selection, the tag is the muon that triggered the event and the probe is the associated jet. The hardware trigger efficiency measured on probe jets is compared between data and simulation and the maximum difference in intervals of the jet p T is taken as an uncertainty. The latter does not take into account the systematic uncertainty on the GEC efficiency, which is determined separately by studying its dependence on the b-dijet invariant mass and assigning the largest variation as the uncertainty. The systematic uncertainty on the jet energy correction includes biases due to jet flavour dependence, reconstruction of tracks which are not associated to a real particle, the track momentum resolution and residual differences between simulation and data, as described in Refs. [24,25]. The jet energy resolution is modelled in simulation with an uncertainty measured in Refs. [23,25]. The uncertainties related to the jet reconstruction and identification are taken from Ref. [25]. The systematic uncertainty associated with the balancing-jet se-lection efficiency is evaluated by measuring this efficiency in the Z (→ μ + μ − ) + jet data and simulation samples and taking the difference as a systematic uncertainty.
The uncertainty on the model of the signal invariant mass distribution is determined by repeating the fit with an alternative distribution, consisting of the sum of two modified Gaussians. The uncertainty on the QCD model is determined by considering an alternative parametrization, consisting of an exponential decay model multiplied by a function that describes the effect of the jet p T requirements on the invariant mass distribution. It has been verified, by generating pseudoexperiments with this alternative model and by fitting them with the default model, that the choice of the QCD distribution model introduces a small bias in the measurement. This bias is taken as the systematic uncertainty. The systematic uncertainty associated with the transfer functions is evaluated by repeating the fit using second-order polynomial functions instead of linear functions. In these fits the coefficients of the quadratic terms are varied in a range consistent with the data in the invariant mass sidebands used in the determination of the transfer functions. The maximum variation with respect to the default measurement is taken as the uncertainty. The efficiency ratio R is determined using both Z (→ μ + μ − ) + jet data and simulation, and the observed difference is taken as a systematic uncertainty. The uncertainty associated with a possible bias introduced by the fit procedure is determined using pseudoexperiments.
The fit is repeated introducing contributions from the subdominant backgrounds, tt and W → qq , fixed to their SM expectations [30] and modelled with the simulation. The difference in the results is assigned as a systematic uncertainty. The final-state radiation systematic uncertainty is determined as described in Ref. [16].
The systematic uncertainty due to the Z → cc contribution is dominated by the knowledge of the Z → cc branching fraction [30] used in the evaluation of the f Z →cc parameter. The systematic uncertainty on the luminosity is determined as in Ref. [34]. The different sources of systematic uncertainties are considered to be uncorrelated and the total, relative systematic uncertainty is 17.7% for the cross-section measurement, dominated by the heavyflavour tagging efficiency uncertainty (16.6%). The total systematic uncertainty for the jet energy scale measurement is 1.1% and is dominated by the uncertainty on the transfer functions (0.8%). The significance of the signal yield, including all statistical and systematic uncertainties, is 6.0 standard deviations.

Results and conclusions
The product of the Z -boson production cross-section and the Z → bb branching fraction in pp collisions at a centre-of-mass energy of 8 TeV is where the first uncertainty is statistical and the second is systematic. The measurement is made in the fiducial region defined by two particle-level b jets with p T > 20 GeV, 2.2 < η < 4.2, and 45 < m jj < 165 GeV.
The expected cross-section in the fiducial region of the experimental measurement is calculated at NLO using aMC@NLO plus Pythia for the parton showers and the NNPDF3.0 Parton Distribution Functions (PDFs) set [35]. The theoretical prediction determined in this way is where the first uncertainty is related to the missing higher-order corrections and to the value of the strong coupling constant, and the second uncertainty is related to the PDFs. The uncertainty due to missing higher-order corrections is evaluated by varying the renormalization and factorization scales by a factor of two around the nominal choice, and taking the maximum differences with respect to the nominal values. The uncertainty on the strong coupling is included by varying it within its uncertainty and recalculating the cross-section. The uncertainty on the PDFs is estimated by taking the variance of the cross-section predictions, where each replica of the NNPDF3.0 set is used in turn. The prediction and the measurement are compatible within one standard deviation. The additional data being collected by the LHCb collaboration will allow a more stringent comparison with the theoretical prediction in the future. Moreover, the systematic uncertainty on the heavy-flavour tagging efficiency will be reduced by collecting more data [28].
The measured jet energy scale factor is k JES = 1.009 ± 0.015 ± 0.011, where the first uncertainty is statistical and the second uncertainty is systematic. The k JES factor is compatible with unity, which demonstrates that the LHCb simulation reproduces accurately the b-jet energy in data for bb-jet pairs with about 100 GeV of invariant mass. Since a jet energy correction evaluated using simulation is already applied on b jets, k JES represents the residual correction obtained using the data.