b-Jet Identification in CMS

A large fraction of the CMS physics program relies on the identification (tagging) of jets containing the decay of a B hadron (b jets). The b jets can be discriminated from jets produced by the hadronization of light quarks based on characteristic properties of B hadrons, such as their long lifetime. This paper presents an overview of the large variety of b-tagging algorithms and the measurement of their performance with data collected in 2011 and 2012. A special focus lies on new methods of b-tagging in jet substructure. Searches for new physics often focus on boosted final states characterized by particles with large transverse momenta, so that the decay products of heavy particles tend to be collimated and reconstructed as a single jet, known as a fat jet. In this case, the reconstruction of the fat-jet substructure is necessary to identify the particle initiating the fat jet. The substructure reconstruction can be significantly improved by the identification of b jets.



Introduction
Identification of jets arising from bottom-quark hadronisation and decay (b-tagging [1] [2]) is used in many physics analyses to perform precise measurements of the standard model (SM) and for new particle searches. New physics signatures with b jets in the final state are expected at high mass, where the b quarks might end up in boosted topologies with overlapping jets from top-quark or Higgs boson decays, making b-tagging more challenging. The Compact Muon Solenoid (CMS) [3] detector recorded proton-proton collisions occurring at the LHC during the 2012 data taking. With its precise charged-particle tracking system and robust lepton identification, it is well matched to the task of b-jet identification. In this paper, the different b-tagging algorithms developed and used in CMS are described in section 2. The performance measurements are then presented in section 3. Finally, b-tagging in boosted topologies is discussed in section 4.

Email address: camille.beluffi@cern.ch (Camille Beluffi)

Algorithms and discriminators for b-tagging
The hadronization of a b quark produces a B hadron, which propagates a measurable distance before decaying. This leads to characteristic properties of the resulting b jet, such as the presence of a displaced secondary vertex whose flight distance is larger than its resolution. Tracks coming from a secondary vertex have a large impact parameter, which can also be used to identify b jets. In addition, in 20% of cases a b jet contains a lepton coming from the semi-leptonic decay of the B hadron. These features are used to build taggers, each yielding a single discriminator value per jet. To analyse the 2012 dataset, three taggers were used:
- Combined Secondary Vertex (CSV): secondary vertex and track-based lifetime information are used to build a likelihood discriminator.
- Jet Probability (JP): the jet is assigned an estimate of the likelihood that all associated tracks come from the primary vertex.
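
As an illustration of how a Jet Probability style discriminator can be built, the sketch below combines per-track probabilities of compatibility with the primary vertex into a single jet-level value. The combination formula, P_jet = Π · Σ_{j=0}^{N-1} (-ln Π)^j / j! with Π the product of the track probabilities, and the floor of 0.005 applied to each track probability are assumptions of this sketch and are not specified in the text above. The function name and inputs are hypothetical.

```python
import math


def jet_probability(track_probs, floor=0.005):
    """Combine per-track primary-vertex probabilities into one jet-level value.

    track_probs: probabilities (0 < p <= 1) that each track is compatible
    with the primary vertex. The floor is an assumed safeguard so that a
    single badly measured track cannot dominate the product.
    """
    if not track_probs:
        raise ValueError("need at least one track")
    # ln(Pi), where Pi is the product of the floored track probabilities
    log_pi = sum(math.log(max(p, floor)) for p in track_probs)
    pi = math.exp(log_pi)
    n_tracks = len(track_probs)
    # P_jet = Pi * sum_{j=0}^{N-1} (-ln Pi)^j / j!
    series = sum((-log_pi) ** j / math.factorial(j) for j in range(n_tracks))
    return pi * series
```

With this definition, a jet whose tracks all have low primary-vertex probabilities (displaced tracks, as expected for b jets) receives a small value, while a jet made of prompt tracks receives a value closer to one.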

Performance measurement
In order to use the b-tagging algorithms in physics analyses, the performance of each algorithm has to be calibrated in data. Many methods have been developed in CMS to measure the b-jet tagging efficiency and the misidentification probability to tag a light-parton jet as a b jet. Three samples of events are used: inclusive jet samples, muon-enriched jet samples and tt̄-enriched samples.

b-tagging efficiency measurement
The b-tagging efficiency is measured in data using several methods applied to multijet events and tt̄ events. The efficiency measured in data is compared with the identification efficiency for b jets in the simulation, resulting in a data/MC scale factor: $SF_b = \varepsilon_b^{\mathrm{data}} / \varepsilon_b^{\mathrm{MC}}$.
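
The scale factor is the ratio of the efficiency measured in data to the efficiency in simulation. A minimal sketch of this ratio, with simple propagation of uncorrelated uncertainties, is given below. The function name, the treatment of the uncertainties and the numbers in the usage note are illustrative assumptions, not measured values.

```python
import math


def scale_factor(eff_data, err_data, eff_mc, err_mc=0.0):
    """Return (SF, uncertainty) for SF = eff_data / eff_mc.

    The uncertainties are assumed to be uncorrelated and small, so the
    relative uncertainties are added in quadrature.
    """
    if eff_data <= 0.0 or eff_mc <= 0.0:
        raise ValueError("efficiencies must be positive")
    sf = eff_data / eff_mc
    rel_err = math.hypot(err_data / eff_data, err_mc / eff_mc)
    return sf, sf * rel_err
```

For example, `scale_factor(0.6, 0.03, 0.8)` gives a scale factor of 0.75 with an uncertainty of 0.0375.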

Measurement in multijet events
The PtRel, IP3D and LT methods use a sample of jets enriched in heavy-flavour content by requiring a soft muon within the jet (muon-jet). The fraction of b jets in the selected sample is estimated by fitting the data distribution of a discriminant variable: the transverse momentum of the muon relative to the jet axis (PtRel), the 3D impact parameter of the muon track (IP3D), or the discriminator distribution of another tagger (LT). The b-tagging efficiency in data is measured by first estimating the number of b jets in the muon-jet sample with the fit, then repeating the fit on the subsample of muon-jets passing the tagging requirement. The efficiency is the ratio of the second number to the first. The System8 method uses two weakly correlated taggers, one of which is the tagger under study. They are tested in two samples with different b-quark enrichment. Various systematic uncertainties are considered, among them the pileup description, the rate of gluon splitting into b-quark pairs, the muon $p_T$ spectrum and b-fragmentation modelling, and the description of the relative direction of the muon with respect to the jet.
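
The fit-and-ratio procedure used by the PtRel, IP3D and LT methods can be sketched as follows. This is a deliberately simplified illustration: it assumes only two templates (b and non-b), uses an unweighted least-squares fit in place of a binned likelihood fit, and reuses the same templates for the tagged subsample. All function names and histograms are hypothetical.

```python
def normalise(hist):
    """Scale a histogram (list of bin contents) to unit area."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram must have positive content")
    return [x / total for x in hist]


def fit_b_fraction(data, b_template, other_template):
    """Least-squares estimate of f_b in: data shape = f_b*b + (1 - f_b)*other."""
    shape = normalise(data)
    b = normalise(b_template)
    o = normalise(other_template)
    num = sum((s - oi) * (bi - oi) for s, bi, oi in zip(shape, b, o))
    den = sum((bi - oi) ** 2 for bi, oi in zip(b, o))
    if den == 0.0:
        raise ValueError("templates are identical; the fraction is undetermined")
    # Clamp to the physical range [0, 1]
    return min(1.0, max(0.0, num / den))


def tagging_efficiency(data_all, data_tagged, b_template, other_template):
    """Ratio of fitted b-jet yields after and before the tagging requirement."""
    n_b_all = fit_b_fraction(data_all, b_template, other_template) * sum(data_all)
    n_b_tag = fit_b_fraction(data_tagged, b_template, other_template) * sum(data_tagged)
    if n_b_all <= 0.0:
        raise ValueError("no b jets fitted in the full sample")
    return n_b_tag / n_b_all
```

In a realistic measurement the templates would be taken from simulation or control samples, separately for the full and tagged selections, and the fit would account for the statistical uncertainty in each bin.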

Measurement in tt̄ events
Both lepton+jets and dileptonic final states are used. In the lepton+jets channel, the flavour tag consistency (FTC) method and the bSample method are used. In the dilepton channel, the flavour tag matching (FTM) method is used as well as the LT method, which can be applied to the same events. The FTC method (FTM method) requires consistency between the observed and expected number of tags in the lepton+jets (dilepton) events. A log-likelihood fit is performed with the b-jet tagging efficiency as a free parameter. In the bSample method, the b-jet tagging efficiency is measured from two subsamples, one enriched in b jets and the other depleted, based on the transverse mass of the muon and jet from the leptonically decaying top quark. Efficiencies are derived from the difference between the discriminator distributions in the two subsamples. The main sources of systematic uncertainty are the jet-parton matching, the definition of the renormalization and factorization scales, the choice of the parton distribution function, the pileup description, and the jet energy scale and jet energy resolution.
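
The idea of fitting the tagging efficiency from the observed number of tags per event can be illustrated with a toy likelihood. The sketch below is not the FTC or FTM implementation: it assumes that every selected event contains exactly two b jets, that tags are independent, and that non-b jets are never tagged, so that the number of tags per event follows a binomial distribution. Function names and inputs are hypothetical.

```python
import math


def log_likelihood(eff, tag_counts, n_b_jets=2):
    """Log-likelihood of observing tag_counts[k] events with k tagged jets."""
    if len(tag_counts) != n_b_jets + 1:
        raise ValueError("tag_counts needs one entry per tag multiplicity")
    ll = 0.0
    for k, n_events in enumerate(tag_counts):
        # Binomial probability of k tags among n_b_jets b jets
        p = math.comb(n_b_jets, k) * eff ** k * (1.0 - eff) ** (n_b_jets - k)
        ll += n_events * math.log(p)
    return ll


def fit_efficiency(tag_counts, steps=1000):
    """Scan the efficiency on a grid in (0, 1) and return the best value."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda eff: log_likelihood(eff, tag_counts))
```

In this toy model the maximum can also be obtained analytically as the total number of tags divided by the total number of b jets. A grid scan is used here only to mirror the structure of a likelihood fit with the efficiency as the free parameter.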

Combination of b-tagging efficiency measurements
The combination is based on a weighted mean of the different scale factor measurements, taking into account correlated and uncorrelated uncertainties and evaluating the shared fraction of events between the different methods. Table 1 compares the combined scale factors $SF_b$ measured in multijet and tt̄ events, averaged over the $p_T$ spectrum of jets from top decays.
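
A weighted mean that accounts for correlations can be sketched for the simplest case of two measurements. The example below uses the standard minimum-variance linear combination with a single correlation coefficient. The exact procedure used for the combination described above is not detailed in the text, so this is an assumed simplification, and the function name, the correlation coefficient and the inputs are hypothetical.

```python
import math


def combine_two(x1, s1, x2, s2, rho=0.0):
    """Minimum-variance weighted mean of two measurements.

    x1, x2: measured values; s1, s2: total uncertainties;
    rho: assumed correlation coefficient between the two measurements.
    Returns (combined value, combined uncertainty, weight of x1).
    """
    var1, var2 = s1 * s1, s2 * s2
    cov = rho * s1 * s2
    denom = var1 + var2 - 2.0 * cov
    if denom <= 0.0:
        raise ValueError("measurements are fully degenerate")
    w1 = (var2 - cov) / denom
    combined = w1 * x1 + (1.0 - w1) * x2
    variance = (var1 * var2 - cov * cov) / denom
    return combined, math.sqrt(variance), w1
```

For uncorrelated inputs this reduces to the usual inverse-variance weighted mean. A positive correlation increases the combined uncertainty relative to the uncorrelated case when the two uncertainties are of similar size.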

Misidentification probability measurement
The probability of light-flavour quark and gluon jets being misidentified as b jets is evaluated with negative taggers, which are identical to the default algorithms, except that they use only tracks with negative impact parameter values or secondary vertices with negative decay lengths. Owing to resolution effects, the discriminator values for negative and positive taggers are expected to be symmetric for light-parton jets. We can therefore derive the misidentification probability $\varepsilon^{\mathrm{misid}}$ from the rate of negative-tagged jets $\varepsilon^{-}$ in inclusive jet data. A correction factor, $R_{\mathrm{light}} = \varepsilon^{\mathrm{misid}}_{\mathrm{MC}} / \varepsilon^{-}_{\mathrm{MC}}$, is evaluated from the simulation in order to correct for second-order asymmetries in the negative and positive tag rates of light-flavour quark and gluon jets, and for the heavy-flavour contribution to the negative tags: $\varepsilon^{\mathrm{misid}}_{\mathrm{data}} = \varepsilon^{-}_{\mathrm{data}} \times R_{\mathrm{light}}$. The data/MC scale factors of the misidentification probabilities, $SF_{\mathrm{light}} = \varepsilon^{\mathrm{misid}}_{\mathrm{data}} / \varepsilon^{\mathrm{misid}}_{\mathrm{MC}}$, are given in Table 2.

Table 2: Data/MC scale factors of the misidentification probability, $SF_{\mathrm{light}}$.

tagger    SF_light
JPL       1.03 ± 0.01 ± 0.07
CSVL      1.10 ± 0.01 ± 0.05
JPM       1.10 ± 0.02 ± 0.20
CSVM      1.17 ± 0.02 ± 0.15
TCHPT     1.27 ± 0.06 ± 0.27
JPT       1.11 ± 0.07 ± 0.31
CSVT      1.26 ± 0.07 ± 0.28
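
The chain of definitions for the negative-tag method can be written out as a short calculation. The sketch below follows the relations given in the text (correction factor from simulation, corrected misidentification probability in data, data/MC scale factor). The function name and the input rates in the usage note are hypothetical and are not measured values.

```python
def misid_scale_factor(neg_rate_data, neg_rate_mc, misid_mc):
    """Apply the negative-tag method.

    neg_rate_data: negative-tag rate measured in inclusive jet data
    neg_rate_mc:   negative-tag rate in simulation
    misid_mc:      true light-jet misidentification probability in simulation
    Returns (misid_data, r_light, sf_light).
    """
    if neg_rate_mc <= 0.0 or misid_mc <= 0.0:
        raise ValueError("simulated rates must be positive")
    r_light = misid_mc / neg_rate_mc       # correction factor from simulation
    misid_data = neg_rate_data * r_light   # corrected misidentification in data
    sf_light = misid_data / misid_mc       # data/MC scale factor
    return misid_data, r_light, sf_light
```

For example, `misid_scale_factor(0.011, 0.010, 0.012)` gives a correction factor of 1.2, a misidentification probability in data of 0.0132 and a scale factor of 1.1. With these definitions the scale factor is algebraically equal to the ratio of the negative-tag rates in data and simulation.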

b-tagging in boosted topologies
High-mass resonances with a final state containing b quarks are predicted by various models of new physics. They may decay into top-quark pairs or Higgs bosons, and if these top quarks or Higgs bosons have a large enough momentum ("boosted topologies"), their decay products are very collimated, resulting in a small angular distance ∆R between them, and end up clustered in a single fat jet. Boosted topologies are usually reconstructed and interpreted using jet substructure reconstruction methods such as top/W/Z-tagging algorithms [4]. Algorithms of b-tagging in the jet substructure can significantly improve the sensitivity of these methods.
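
The angular distance ∆R between two objects is conventionally computed from their pseudorapidity and azimuthal angle differences. The sketch below shows this computation, together with the commonly quoted rule of thumb ∆R ≈ 2m/p_T for the separation of the decay products of a particle of mass m and transverse momentum p_T. The rule of thumb is not taken from the text above and is only approximate. Function names are hypothetical.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal angle difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def approx_opening(mass, pt):
    """Rule-of-thumb separation of two-body decay products: 2 m / pT."""
    if pt <= 0.0:
        raise ValueError("pt must be positive")
    return 2.0 * mass / pt
```

For instance, `approx_opening(125.0, 500.0)` returns 0.5, which illustrates why the decay products of a sufficiently boosted heavy particle can be contained in a single large-radius jet.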

b-tagging in jet substructure
One important reconstruction parameter is the size of the jet, which needs to be optimised to include all decay products, depending on the jet $p_T$. Two cases have been studied in detail. For top-tagging, the use of the HEPTopTagger [5] algorithm, which is based on Cambridge/Aachen jets of size R = 1.5 (CA15), is investigated. The fat-jet substructure is identified by undoing the CA algorithm clustering. For Higgs-tagging, the focus is on CA jets of size R = 0.8 and the jet substructure is described by pruned jets. Algorithms of b-tagging can then be applied to the fat jet or to its substructure components, the second option giving the best performance (see Fig. 1).

Performance measurement
Measurement of the b-tagging efficiency in boosted topologies is challenging and needs specific treatment, since results on standard jets are not necessarily applicable to boosted objects. For Higgs-tagging, the efficiency is measured using the LT method on different control samples to study the performance of b-tagging both on fat jets and on subjets. The agreement found between data and simulation is compatible with what is observed in non-boosted topologies. A modified implementation of the FTC method has been developed to measure the b-tagging efficiency in boosted top-quark events, and the results show that the simulation reproduces the b-tagging efficiencies in data equally well in boosted and in non-boosted top-quark events.