Performance of $b$-Jet Identification in the ATLAS Experiment

The identification of jets containing $b$ hadrons is important for the physics programme of the ATLAS experiment at the Large Hadron Collider. Several algorithms to identify jets containing $b$ hadrons are described, ranging from those based on the reconstruction of an inclusive secondary vertex or the presence of tracks with large impact parameters to combined tagging algorithms making use of multi-variate discriminants. An independent $b$-tagging algorithm based on the reconstruction of muons inside jets as well as the $b$-tagging algorithm used in the online trigger are also presented. The $b$-jet tagging efficiency, the $c$-jet tagging efficiency and the mistag rate for light flavour jets in data have been measured with a number of complementary methods. The calibration results are presented as scale factors defined as the ratio of the efficiency (or mistag rate) in data to that in simulation. In the case of $b$ jets, where more than one calibration method exists, the results from the various analyses have been combined taking into account the statistical correlation as well as the correlation of the sources of systematic uncertainty.


Introduction
The identification of jets containing b hadrons is an important tool used in a spectrum of measurements comprising the Large Hadron Collider (LHC) physics programme. In precision measurements in the top quark sector as well as in the search for the Higgs boson and new phenomena, the suppression of background processes that contain predominantly light-flavour jets using b-tagging is of great use. It may also become critical to achieve an understanding of the flavour structure of any new physics (e.g. supersymmetry) revealed at the LHC. Several algorithms to identify jets containing b hadrons have been developed, exploiting the long lifetime, high mass and decay multiplicity of b hadrons and the hard b-quark fragmentation function. They range from an algorithm that uses the signed significance of the decay length with respect to the proton-proton collision location, in the following referred to as the primary vertex, of an inclusively reconstructed secondary vertex to more refined algorithms using both secondary vertex properties and the significance of the transverse and longitudinal impact parameters of the charged particle tracks. The most discriminating observables resulting from these algorithms are combined in artificial neural networks. An independent b-tagging algorithm based on reconstructed muons inside jets, exploiting the relatively large fraction of b-hadron decays with muons in the final state, about 20%, and the b-tagging algorithm used for the online trigger selection have also been developed.
The performance of the tagging algorithms has been characterised in simulated events, including the dependence on additional proton-proton interactions in the same bunch crossing, referred to as pile-up in the following. A first comparison between data and simulation focuses on the basic ingredients for b-tagging, namely the track properties, including the impact parameter distributions. A second comparison focuses more specifically on tracks in b jets, and is made possible by fully reconstructing the b-hadron decay $B^{\pm} \to J/\psi K^{\pm}$.
To use b-tagging in physics analyses, the efficiency $\epsilon_b$ with which a jet containing a b hadron is tagged by a b-tagging algorithm needs to be measured. Other necessary pieces of information are the probability of mistakenly tagging a jet containing a c hadron (but not a b hadron) or a light-flavour parton (u-, d-, s-quark or gluon g) jet as a b jet. In the following, these are referred to as the c-jet tagging efficiency and mistag rate, respectively.
Several methods have been developed to measure the b-jet tagging efficiency, the c-jet tagging efficiency and the mistag rate in data. The b-jet tagging efficiency has been measured in an inclusive sample of jets with muons inside and in samples of $t\bar{t}$ events with one or two leptons in the final state. The c-jet tagging efficiency has been measured in an inclusive sample of jets associated to D + mesons as well as in a sample of W + c events. The mistag rate has been measured in an inclusive jet sample. The calibration results are presented as data-to-simulation scale factors, derived from the ratio of the efficiency or mistag rate measured in data to that obtained in simulated events. Where more than one calibration method exists the results from the various analyses have been combined taking into account the statistical and systematic correlation.
This paper is intended to provide a complete description of almost all the b-tagging developments in ATLAS during Run 1 of the LHC in the years 2010-2012. The results are illustrated with data taken in the year 2011 at a centre-of-mass energy of 7 TeV. As these developments extended over a period of years, there is some variation between the simulated samples and systematic uncertainties used for the data efficiency measurements depending on the chronology. Also, several of the methods developed to measure the tagging efficiency of b jets on the small samples available at the start of Run 1 have meanwhile been abandoned in favour of more precise calibration methods developed later; this is reflected in the choice of results used in the combination of b-jet efficiency measurements made to achieve the ultimate precision. In those methods used previously, quoted values and uncertainties for parameters entering the analysis do reflect the best knowledge at the time. They have not been updated since to benefit from the improved present knowledge on some of the analysis ingredients.
Section 2 starts with a discussion of the data and simulated samples used throughout this paper, along with a description of the corrections applied to the simulated samples to reproduce the experimental conditions present in the data. The various b-tagging algorithms are described in sections 3, 4, and 5. Section 6 discusses the effects of pile-up, while section 7 provides a comparison between data and simulated samples of distributions of selected quantities important for b-tagging. Calibrations of the b-jet tagging efficiency and their combination are discussed in sections 8, 9, 10, and 11. Calibrations of the c-jet tagging efficiency are covered in sections 12 and 13, while the mistag rate calibration is discussed in sections 14 and 15.

Data and simulation samples, object selection
The studies presented in this paper are generally based on a data sample corresponding to approximately 4.7 fb$^{-1}$ of 7 TeV proton-proton collision data, after requiring the data to be of good quality; slight differences exist due to variations in data quality requirements. The data have been collected in 2011 using the ATLAS experiment. The ATLAS detector is a large, general-purpose collider detector and is described in detail elsewhere [1]. Its most prominent features, as relevant to b-jet identification and its performance estimation, are:
• An Inner Detector (ID) [2], providing tracking and vertexing capabilities for |η| < 2.5.¹ It is immersed in an axial 2 T magnetic field and features three subdetectors employing different techniques. A pixel detector consisting of three layers of silicon pixel sensors is located closest to the beam line. It is followed by a silicon microstrip detector (SCT), consisting of eight (eighteen) layers of silicon microstrip sensors arranged in cylinders (disks) in its barrel (endcap) region, and by a straw tube tracker providing of order 36 measurements for track reconstruction as well as causing high-energy electrons to generate transition radiation. Especially the pixel and microstrip layers are essential for the purpose of a precise reconstruction of tracks and of displaced vertices.
• A hermetic hadronic calorimeter covering the range |η| < 4.9. Its central part is a steel and scintillating tile sampling calorimeter; its forward parts are again sampling calorimeters, using a liquid argon detection medium and copper and tungsten absorbers.
• A large air-core Muon System (MS), providing stand-alone precision muon momentum reconstruction in the range |η| < 2.7 using a combination of drift tube and resistive plate chamber technologies, and equipped with dedicated detectors for triggering and precise timing. A system of one barrel and two endcap magnet toroids provides a bending power ranging between 1 Tm and 7.5 Tm, lowest in the transition region between the toroids.
A three-level trigger system was used to reduce the event rate from the 20 MHz bunch crossing rate to ∼ 200 Hz. The trigger selections used in the different studies are described in the corresponding sections.
The key objects for b-tagging are the calorimeter jets, the tracks reconstructed in the Inner Detector and the signal primary vertex of the hard-scattering collision of interest which is selected from the set of all reconstructed primary vertices. Each vertex is required to have two or more tracks. Tracks are reconstructed from clusters of signals in the silicon pixel and microstrip sensors, and drift circles in the straw tube tracker (collectively referred to as "hits" in the following). They are associated with the calorimeter jets based on their angular separation $\Delta R(\mathrm{track}, \mathrm{jet}) \equiv \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$. The association ∆R cut varies as a function of the jet p T , resulting in a narrower cone for jets at high p T which are more collimated. At 20 GeV, it is 0.45 while for more energetic jets with a p T of 150 GeV the ∆R cut is 0.26. Any given track is associated with at most one jet; if it satisfies the association criterion with respect to more than one jet, the jet with the smallest ∆R is chosen. The track selection criteria depend on the b-tagging algorithm, and are detailed in section 3.
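As an illustration only, the track-to-jet association described above might be sketched as follows. The text quotes the cone size at just two jet p T values, so `cone_size` is a placeholder that interpolates linearly between them and is not the parametrisation used in ATLAS; the function names and input layouts are likewise assumptions of this sketch.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def cone_size(jet_pt_gev):
    """Hypothetical cone shape: linear interpolation between the two quoted
    points (0.45 at 20 GeV, 0.26 at 150 GeV), held constant outside them."""
    lo_pt, lo_dr, hi_pt, hi_dr = 20.0, 0.45, 150.0, 0.26
    x = min(max(jet_pt_gev, lo_pt), hi_pt)
    return lo_dr + (hi_dr - lo_dr) * (x - lo_pt) / (hi_pt - lo_pt)

def associate_tracks(tracks, jets):
    """Map each track index to the index of the closest jet whose cone contains it.

    tracks: list of (eta, phi); jets: list of (pt_gev, eta, phi).
    Tracks outside every cone are left unassigned.
    """
    assignment = {}
    for i, (t_eta, t_phi) in enumerate(tracks):
        best = None  # (delta_r, jet index) of the closest matching jet so far
        for j, (j_pt, j_eta, j_phi) in enumerate(jets):
            dr = delta_r(t_eta, t_phi, j_eta, j_phi)
            if dr < cone_size(j_pt) and (best is None or dr < best[0]):
                best = (dr, j)
        if best is not None:
            assignment[i] = best[1]
    return assignment
```

A track that satisfies the criterion for two jets is given to the nearer one, so each track enters at most one jet.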
Jets used in this paper are reconstructed from topological clusters [1] formed from energy deposits in the calorimeters using the anti-k t algorithm with a radius parameter of 0.4 [3][4][5]. The jet reconstruction is done at the electromagnetic scale and then a scale factor is applied in order to obtain the jet energy at the hadronic scale. In the studies based on jets with associated muons, the jet energy is further corrected for the energy of the muon and the average energy of the corresponding neutrino in simulated events, to arrive at the jet energy scale of an inclusive b-jet sample. The measurement of the jet energy and the specific cuts used to reject jets of bad quality are described in ref. [6]. The jets are generally required to have |η| < 2.5 and transverse momentum p T > 20 GeV. Furthermore, the jet vertex fraction (JVF) is computed as the summed transverse momentum of the tracks associated with a jet consistent with originating from the selected primary vertex (defined as having a longitudinal impact parameter with respect to it less than 1 mm) divided by the summed transverse momentum of all tracks associated with a jet, where only tracks with transverse impact parameters less than 1.5 mm are considered; it is required to be larger than 0.75. The selection of the primary vertex is described in section 3.1. Some measurements of the b-jet tagging efficiency make use of soft muons (p T > 4 GeV) associated with jets, using a spatial matching of ∆R(jet, µ) < 0.4.
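The jet vertex fraction defined above can be written out as a short sketch. The dictionary keys and the choice to return `None` for a jet without usable tracks are assumptions made here for illustration; the text does not say how such jets are treated.

```python
def jet_vertex_fraction(tracks, max_abs_z0_mm=1.0, max_abs_d0_mm=1.5):
    """JVF from tracks given as dicts with 'pt' (GeV), 'd0' and 'z0' (mm).

    Impact parameters are taken with respect to the selected primary vertex.
    Only tracks with |d0| below the transverse cut enter either sum.
    """
    considered = [t for t in tracks if abs(t["d0"]) < max_abs_d0_mm]
    total = sum(t["pt"] for t in considered)
    if total == 0:
        return None  # assumption of this sketch: JVF undefined without tracks
    from_pv = sum(t["pt"] for t in considered if abs(t["z0"]) < max_abs_z0_mm)
    return from_pv / total

def passes_jvf(tracks, threshold=0.75):
    """True if the jet has a defined JVF above the quoted threshold."""
    jvf = jet_vertex_fraction(tracks)
    return jvf is not None and jvf > threshold
```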
Multiple Monte Carlo (MC) simulated samples are used throughout this paper. The properties and performance of the tagging algorithms are mostly studied using simulated samples of $t\bar{t}$ events, which unless otherwise stated are generated with MC@NLO v3.41 [7] interfaced to HERWIG v6.520 [8]; for several studies and performance measurements, multijet samples generated using PYTHIA v6.423 [9] are used. To reproduce the pile-up conditions in the data, extra collisions have been superimposed on the simulated events. To simulate the detector response, the generated events are processed through a GEANT4 [10] simulation of the ATLAS detector, and then reconstructed and analysed in the same way as the data. The simulated detector geometry corresponds to a perfectly aligned Inner Detector and the majority of the disabled silicon detector (pixel and strip) modules and front-end chips present in data are masked in the simulation. The ATLAS simulation infrastructure is described in more detail in ref. [11].
To bring the simulation into agreement with data for distributions where discrepancies are known to be present, corrections have been applied to some of the simulated samples. The average number of interactions per bunch crossing, denoted µ , ranged between 4 and 20 [12]. Its distribution in simulated events has been reweighted to ensure a good agreement in the distribution of the number of reconstructed primary vertices between data and simulation. The fraction of pile-up interactions leading to visible signatures (reconstructible interactions) in the region 2.09 < |η | < 3.84 is computed from refs. [13,14], and is used to scale the µ values prior to the reweighting described above, to bring the numbers of reconstructible interactions in agreement between data and simulated events. Applying this scaling has been verified to lead to a good agreement between data and simulated events also in the average number of reconstructed primary vertices as a function of µ . When appropriate, the p T spectrum of the simulated jets has also been reweighted to match the spectrum in data, to account e.g. for the fact that the prescale factors of low threshold jet triggers present in data are not activated in the simulation.
The labelling of the flavour of a jet in simulation is done by spatially matching the jet with generator level partons [15]: if a b quark with a transverse momentum of more than 5 GeV is found within ∆R(b, jet) < 0.3 of the jet direction, the jet is labelled as a b jet. If no b quark is found the procedure is repeated for c quarks and τ leptons. A jet for which no such association can be made is labelled as a light-flavour jet.
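The labelling procedure lends itself to a compact sketch. It is assumed here that the same p T and ∆R thresholds apply when the procedure is repeated for c quarks and τ leptons; the tuple layout and function names are hypothetical.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the phi difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def label_jet(jet_eta, jet_phi, partons, min_pt_gev=5.0, max_dr=0.3):
    """Return 'b', 'c', 'tau' or 'light' for a simulated jet.

    partons: list of (kind, pt_gev, eta, phi) generator-level objects.
    The hypotheses are tried in the order b, c, tau; the first match wins.
    """
    for kind in ("b", "c", "tau"):
        for p_kind, pt, eta, phi in partons:
            if (p_kind == kind and pt > min_pt_gev
                    and delta_r(jet_eta, jet_phi, eta, phi) < max_dr):
                return kind
    return "light"
```

Because b quarks are tested first, a jet containing both a b and a c quark within the cone is labelled as a b jet.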

Lifetime-based tagging algorithms
The lifetime-based tagging algorithms take advantage of the relatively long lifetime of hadrons containing a b quark, of the order of 1.5 ps (cτ ≈ 450 µm). A b hadron with p T = 50 GeV will have a significant mean flight path length l = βγcτ, travelling on average about 3 mm in the transverse direction before decaying and therefore leading to topologies with at least one vertex displaced from the point where the hard-scatter collision occurred. Two classes of algorithms aim at identifying such topologies. An inclusive approach consists of using the impact parameters of the charged-particle tracks from the b-hadron decay products. The transverse impact parameter, d 0 , is the distance of closest approach of the track to the primary vertex point, in the r-φ projection. The longitudinal impact parameter, z 0 , is the difference between the z coordinates of the primary vertex position and of the track at this point of closest approach in r-φ. The tracks from b-hadron decay products tend to have large impact parameters which can be distinguished from tracks stemming from the primary vertex. Two tagging algorithms exploiting these properties are discussed in this article: JetProb, used mostly for early data, and IP3D for high-performance tagging. The second approach is to reconstruct explicitly the displaced vertices. Two algorithms make use of this technique: the SV algorithm attempts to reconstruct an inclusive secondary vertex; while the JetFitter algorithm aims at reconstructing the complete b-hadron decay chain. Finally, the results of several of these algorithms are combined in the MV1 tagger to improve the light-flavour-jet rejection and to increase the range of b-jet tagging efficiency for which the algorithms can be applied. These algorithms are discussed in detail in sections 3.2-3.4.
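The order of magnitude of the flight path can be checked with a short calculation, since the transverse projection of βγcτ equals (p T /m)cτ. The b-hadron mass of 5.28 GeV is an assumed input that does not appear in the text. Taking the hadron itself at p T = 50 GeV gives roughly 4.3 mm with these inputs, while a hadron carrying about 70% of that momentum (an assumed fragmentation fraction) gives roughly 3 mm, the value quoted above.

```python
def mean_transverse_flight_mm(pt_gev, mass_gev=5.28, ctau_mm=0.45):
    """Mean transverse flight length (pT / m) * c*tau in millimetres.

    mass_gev is an assumed b-hadron mass; ctau_mm is the 450 micron value
    quoted in the text.
    """
    return pt_gev / mass_gev * ctau_mm

# Hadron carrying the full 50 GeV, and one carrying an assumed 70% of it.
full = mean_transverse_flight_mm(50.0)
scaled = mean_transverse_flight_mm(0.7 * 50.0)
```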

Key ingredients
The determination on an event-by-event basis of the primary vertex [16] is particularly important for b-tagging, since it defines the reference point with respect to which impact parameters and vertex displacements are expressed. The precision of the reconstructed vertex positions improves with increasing associated track multiplicity. For example, in minimum bias events it improves from approximately 300 µm (600 µm) in the x and y (z) directions for two-track vertices to 20 µm (35 µm) for vertices with 70 associated tracks. The vertex resolution depends strongly on the event topology, and significantly better resolutions can be achieved in events with high-p T jets or leptons. The number of reconstructed primary vertices is substantially larger than one in the presence of pile-up interactions: during the highest instantaneous luminosity of the 2011 data taking period, six primary vertex candidates were reconstructed on average. The adopted strategy is to choose the primary vertex candidate that maximises the sum of the associated tracks' $p_{\mathrm{T}}^2$. The performance of this algorithm depends on the final state and on the pile-up conditions (as will be discussed further in section 6); simulation studies indicate that the probability to choose the correct primary vertex in $t\bar{t}$ events is higher than 98%, while in lower-multiplicity final states it can be considerably lower.
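The vertex choice described above amounts to a simple maximisation, sketched below. Representing each candidate as a list of track p T values is an assumption made for brevity; the two-track minimum is the requirement stated in section 2.

```python
def select_primary_vertex(vertices, min_tracks=2):
    """Index of the candidate with the largest sum of squared track pT.

    vertices: list of lists of track pT values in GeV, one list per candidate.
    Candidates with fewer than min_tracks tracks are ignored; returns None
    if no candidate qualifies.
    """
    best_index, best_sum = None, -1.0
    for i, track_pts in enumerate(vertices):
        if len(track_pts) < min_tracks:
            continue
        sum_pt2 = sum(pt * pt for pt in track_pts)
        if sum_pt2 > best_sum:
            best_index, best_sum = i, sum_pt2
    return best_index
```

Squaring the p T favours a vertex with a few hard tracks over a pile-up vertex with many soft ones, which is why the choice works well in high-multiplicity hard-scatter events.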
The actual tagging is performed on the sub-set of tracks in the event that are associated with the jet. Once associated with a jet, tracks are subject to specific requirements designed to select well-measured tracks and to reject so-called fake tracks (in which not all hits used for the track reconstruction originate from a single charged particle) and tracks from long-lived particles (K s , Λ and other hyperon decays) or material interactions (photon conversions or hadronic interactions). The b-tagging baseline quality level requires at least seven precision hits (pixel or micro-strip hits) on the track, and at least two of these in the pixel detector, one of which must be in the innermost pixel layer. Only tracks with p T > 1 GeV are considered. The transverse and longitudinal impact parameters defined with respect to the primary vertex must fulfil |d 0 | < 1 mm and |z 0 | sin θ < 1.5 mm, where θ is the track polar angle (the factor sin θ serves to make the efficiency for tracks to pass these selection criteria less dependent on their polar angles). This selection is used by all the tagging algorithms relying on the impact parameters of tracks. The average number of b-tagging quality tracks associated to a jet with p T = 50 GeV (200 GeV) is 3.5 (7). In typical $t\bar{t}$ events, the average number of selected tracks per light-flavour (b quark) jet is 3.7 (5.5) and their average p T is 6.6 GeV (6.3 GeV), respectively. The SV and JetFitter algorithms use looser track selection criteria, in particular to maximise the efficiency to identify tracks originating from material interactions or decays of long-lived particles; these tracks are subsequently removed for b-tagging purposes. The main differences in the selection cuts for the SV algorithm are: p T > 400 MeV, |d 0 | < 3.5 mm (no cut on z 0 ). The corresponding cuts used by the JetFitter algorithm are: p T > 500 MeV, |d 0 | < 7 mm, |z 0 | sin θ < 10 mm. 
Both algorithms make a requirement of at least one hit in the pixel detector (with no requirement on the innermost pixel layer).
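The three sets of track requirements can be summarised in a short sketch. The `Track` container and its field names are hypothetical. For the SV and JetFitter selections only the cuts quoted above are encoded; whether any remaining baseline requirements (such as the seven precision hits) also apply to them is not stated in the text, so they are omitted here.

```python
import math
from dataclasses import dataclass

@dataclass
class Track:
    pt_gev: float
    d0_mm: float
    z0_mm: float
    theta: float                   # polar angle in radians
    n_pixel_hits: int
    n_sct_hits: int
    has_innermost_pixel_hit: bool

def passes_ip_selection(t):
    """Baseline selection used by the impact parameter-based algorithms."""
    return (t.n_pixel_hits + t.n_sct_hits >= 7
            and t.n_pixel_hits >= 2
            and t.has_innermost_pixel_hit
            and t.pt_gev > 1.0
            and abs(t.d0_mm) < 1.0
            and abs(t.z0_mm) * math.sin(t.theta) < 1.5)

def passes_sv_selection(t):
    """Quoted SV cuts only: one pixel hit, looser pT and d0, no z0 cut."""
    return t.n_pixel_hits >= 1 and t.pt_gev > 0.4 and abs(t.d0_mm) < 3.5

def passes_jetfitter_selection(t):
    """Quoted JetFitter cuts only."""
    return (t.n_pixel_hits >= 1
            and t.pt_gev > 0.5
            and abs(t.d0_mm) < 7.0
            and abs(t.z0_mm) * math.sin(t.theta) < 10.0)
```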

Impact parameter-based algorithms
For the tagging itself, the impact parameters of tracks are computed with respect to the selected primary vertex. Given that the decay point of the b hadron must lie along its flight path, the transverse impact parameter is signed to further discriminate the tracks from b-hadron decay from tracks originating from the primary vertex. The sign is defined as positive if the track intersects the jet axis in front of the primary vertex, and as negative if the intersection lies behind the primary vertex. The jet axis is defined by the calorimeter-based jet direction. However if an inclusive secondary vertex is found in the jet (cf. section 3.3), the jet direction is replaced by the direction of the line joining the primary and the secondary vertices. The experimental resolution generates a random sign for the tracks originating from the primary vertex, while tracks from the b-/c-hadron decay normally have a positive sign. Decays of e.g. K 0 s and Λ 0 as well as interactions in the detector material also produce tracks with positively signed impact parameters, enhancing the probability to identify light flavour jets as b-quark jets.
JetProb [17] is an implementation of a simple algorithm extensively used at LEP and later at the Tevatron. It uses the track impact parameter significance $S_{d_0} \equiv d_0/\sigma_{d_0}$, where $\sigma_{d_0}$ is the uncertainty on the reconstructed $d_0$. The $S_{d_0}$ value of each selected track in a jet, $i$, is compared to a pre-determined resolution function $R(S_{d_0})$ for prompt tracks, in order to measure the probability that the track originates from the primary vertex, $P_{\mathrm{trk},i}$, as
$$P_{\mathrm{trk},i} = \int_{-\infty}^{-|S_{d_0}^{i}|} R(s)\,\mathrm{d}s. \qquad (3.1)$$
The resolution function is determined from experimental data using the negative side of the signed impact parameter distribution, assuming that the contribution from heavy-flavour particles is negligible. The individual track probabilities $P_{\mathrm{trk},i}$ for the $N$ tracks with positive $d_0$ are then combined as follows:
$$P_{\mathrm{jet}} = P_0 \sum_{k=0}^{N-1} \frac{(-\ln P_0)^k}{k!}, \qquad P_0 = \prod_{i=1}^{N} P_{\mathrm{trk},i}.$$
For light-flavour jets and a perfect suppression of tracks resulting from decays of long-lived hadrons or from material interactions, the distribution of $P_{\mathrm{jet}}$ should be uniform, while it should peak around zero for b jets. This robust algorithm with no dependence on simulation was mostly used for data taken before 2011, and is still used for online b-tagging (this is discussed in section 5).
IP3D is a more powerful algorithm relying on both the transverse and longitudinal impact parameters, as well as their correlations. It is based on a log-likelihood ratio (LLR) method in which for each track the measurement $S \equiv (d_0/\sigma_{d_0}, z_0/\sigma_{z_0})$ is compared to pre-determined two-dimensional probability density functions (PDFs) obtained from simulation for both the b- and light-flavour-jet hypotheses. The ratio of probabilities defines the track weight. The jet weight is the sum of the logarithms of the individual track weights. The LLR formalism allows track categories to be used by defining different dedicated PDFs for each of them. Currently two exclusive categories are used: the tracks that share a hit in the pixel detector or more than one hit in the silicon strip detector with another track, and those that do not.

JINST 11 P04008
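The two jet-level combinations can be sketched in a few lines. The JetProb part assumes the usual form P jet = P 0 Σ (−ln P 0 )^k / k!, summed over k from 0 to N−1, with P 0 the product of the N track probabilities; the IP3D part simply sums the logarithms of the per-track likelihood ratios. The return values chosen for empty or degenerate inputs are conventions of this sketch only.

```python
import math

def jet_probability(track_probs):
    """Combine per-track prompt probabilities into a jet probability.

    P_jet = P0 * sum_{k=0}^{N-1} (-ln P0)^k / k!, with P0 = prod(track_probs).
    """
    n = len(track_probs)
    if n == 0:
        return 1.0  # sketch convention: no tracks, nothing disfavours prompt
    p0 = math.prod(track_probs)
    if p0 <= 0.0:
        return 0.0  # sketch convention: avoid log(0)
    minus_log = -math.log(p0)
    return p0 * sum(minus_log ** k / math.factorial(k) for k in range(n))

def ip3d_jet_weight(track_likelihoods):
    """Sum of ln(p_b / p_light) over tracks.

    track_likelihoods: list of (p_b, p_light) PDF values for each track.
    """
    return sum(math.log(p_b / p_light) for p_b, p_light in track_likelihoods)
```

For a single track the jet probability reduces to the track probability, and for uniformly distributed track probabilities the combination is again uniform, which is the property the text relies on for light-flavour jets.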

Vertex-based algorithms
To further increase the discrimination between b jets and light-flavour jets, an inclusive three dimensional vertex formed by the decay products of the b hadron, including the products of the possible subsequent charm hadron decay, can be sought. The algorithm starts from all tracks that are significantly displaced from the primary vertex² and associated with the jet, and forms vertex candidates for track pairs with vertex fit χ 2 < 4.5. Vertices compatible with long-lived particles or material interactions are rejected: the invariant mass of the charged-particle track four-momenta is used to reject vertices that are likely to originate from K s , Λ decays and photon conversions, while the position of the vertex in the r-φ projection is compared to a simplified description of the innermost pixel layers to reject secondary interactions in the detector material. All tracks from the remaining two-track vertices are combined into a single inclusive vertex, using an iterative procedure to remove the track yielding the largest contribution to the χ 2 of the vertex fit until this contribution passes a predefined threshold.
A simple discriminant between b jets and light-flavour jets is the flight length significance $L_{3D}/\sigma_{L_{3D}}$, i.e., the distance between the primary vertex and the inclusive secondary vertex divided by the measurement uncertainty. The significance is signed with respect to the jet direction, in the same way as the transverse impact parameter of tracks is. The flight length significance is the discriminating observable on which the SV0 tagging algorithm relies. As is typical for secondary vertex tagging algorithms, the mistag rate is much smaller than for impact parameter-based algorithms, but the limited secondary vertex finding efficiency, of approximately 70%, can be a drawback.
SV1 is another tagging algorithm based on the same secondary vertex finding infrastructure, but it provides a better performance as it is based on a likelihood ratio formalism, like the one explained previously for the IP3D algorithm. Three of the vertex properties are exploited: the vertex mass (i.e., the invariant mass of all charged-particle tracks used to reconstruct the vertex, assuming that all tracks are pions), the ratio of the sum of the energies of these tracks to the sum of the energies of all tracks in the jet, and the number of two-track vertices. In addition, the ∆R between the jet direction and the direction of the line joining the primary vertex and the secondary vertex is used in the LLR. Some of these properties are illustrated in figure 1 for b jets, c jets and light-flavour jets in simulated $t\bar{t}$ events. SV1 relies on a two-dimensional distribution of the first two variables and on two one-dimensional distributions of the latter variables. The secondary vertex finding efficiency depends in particular on the event topology. SV1 requires an a priori knowledge of the secondary vertex finding efficiency for b jets, $\epsilon_b^{\mathrm{SV}}$, and the corresponding efficiency for light-flavour jets, $\epsilon_l^{\mathrm{SV}}$, obtained from simulation. This efficiency is shown as a function of the jet p T in figure 1c.
A very different algorithm, JetFitter [15], exploits the topological structure of weak b- and c-hadron decays inside the jet. A Kalman filter is used to find a common line in three dimensions on which the primary vertex and the bottom and charm vertices lie, as well as their positions on this line approximating the b-hadron flight path. With this approach, the b- and c-hadron vertices are not merged, even when only a single track is attached to each of them. In the JetFitter algorithm, the decay topology is described by the following discrete variables: the number of vertices with at least two tracks, the total number of tracks at these vertices, and the number of additional single track vertices on the b-hadron flight axis. The vertex information is condensed in the following observables, shown in figure 2: the vertex mass (the invariant mass of all charged particle tracks attached to the decay chain), the energy fraction (the energy of these charged particles divided by the sum of the energies of all charged particles associated to the jet), and the flight length significance L/σ L (the average displaced vertex decay length divided by its uncertainty; the individual reconstructed vertices contribute to the average decay length weighted by the inverse square of their decay length uncertainties). The six JetFitter variables defined above are used as input nodes in an artificial neural network. As the input variable distributions depend on the p T and |η| of the jets, these kinematic variables are included as two additional input nodes.
To ensure that the jet p T and |η| spectra of the b, c and light-flavour jets in the training sample are not used by the neural network to separate the different jet flavours, a two-dimensional reweighting yielding flat kinematic distributions for all three jet flavours is performed prior to the neural network training. A coarse two-dimensional binning with seven bins in p T and three bins in |η| is used for the reweighting. The JetFitter neural network has three output nodes, corresponding to the b-, c- and light-flavour-jet hypotheses, referred to as P b , P c and P l . The network topology includes two hidden layers, with 12 and 7 nodes, respectively. A discriminating variable to select b jets and reject light-flavour jets is then defined from the values of the corresponding output nodes: w JetFitter = ln(P b /P l ).

² $d_{3D}/\sigma_{d_{3D}} > 2$, where $d_{3D}$ is the three dimensional distance between the primary vertex and the point of closest approach of the track to this vertex, and $\sigma_{d_{3D}}$ its uncertainty.
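One simple way to obtain flat kinematic distributions per flavour is to weight each jet by the inverse of the population of its (flavour, p T , |η|) bin. The sketch below follows that approach; it is an illustration of the idea and not necessarily the procedure used for the JetFitter training, and the bin edges in the usage example are placeholders.

```python
from collections import Counter

def flat_kinematic_weights(jets, pt_edges, abs_eta_edges):
    """Per-jet weights that flatten the (pT, |eta|) distribution per flavour.

    jets: list of (flavour, pt, abs_eta). Every populated bin of a given
    flavour ends up with a total weight of one; jets outside the binning
    receive zero weight.
    """
    def bin_index(value, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None  # outside the binning

    keys = []
    for flavour, pt, abs_eta in jets:
        i = bin_index(pt, pt_edges)
        j = bin_index(abs_eta, abs_eta_edges)
        keys.append(None if i is None or j is None else (flavour, i, j))
    counts = Counter(k for k in keys if k is not None)
    return [0.0 if k is None else 1.0 / counts[k] for k in keys]

# Usage with placeholder edges (two pT bins, two |eta| bins).
example_jets = [("b", 30.0, 0.1), ("b", 35.0, 0.2), ("b", 120.0, 0.1), ("l", 30.0, 0.1)]
example_weights = flat_kinematic_weights(example_jets, [20.0, 60.0, 200.0], [0.0, 1.2, 2.5])
```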

Combined tagging algorithms
The vertex-based algorithms exhibit much lower mistag rates than the impact parameter-based ones, but their efficiency for actual b jets is limited by the secondary vertex finding efficiency. Both approaches are therefore combined to define versatile and powerful tagging algorithms. The LLR-based IP3D and SV1 algorithms are combined in a straightforward manner by summing their respective weights: this is the so-called IP3D+SV1 algorithm. Another combination technique is the use of an artificial neural network, which can take advantage of complex correlations between the input values. Two tagging algorithms are defined in this way, IP3D+JetFitter and MV1.
Figure 2. The vertex mass (top), energy fraction (middle) and flight length significance (bottom) for b jets (left), c jets (middle) and light-flavour jets (right), split according to the decay chain topology found by JetFitter. In the case that no vertex with at least two outgoing tracks has been reconstructed, these quantities are computed from reconstructed single track vertices as explained in the text. The distributions are obtained from a simulated sample of $t\bar{t}$ events generated with POWHEG [18,19] interfaced to PYTHIA.
The IP3D+JetFitter algorithm is defined in the same way as the JetFitter algorithm itself, with the only difference being that the output weight of the IP3D algorithm is used as an additional input node, and that the number of nodes in the two intermediate hidden layers is increased to 14 and 9, respectively. The discriminating variable to select b jets and reject light-flavour jets is defined as w IP3D+JetFitter = ln(P b /P l ). A specific tuning of the IP3D+JetFitter algorithm to provide a better discrimination between b and c jets uses w IP3D+JetFitter(c) = ln(P b /P c ) as a discriminant.
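Both discriminants are built from the same three network outputs and differ only in the denominator, as the following minimal sketch shows. The function name and the numerical outputs in the usage lines are invented for illustration.

```python
import math

def jetfitter_discriminants(p_b, p_c, p_l):
    """Return (ln(P_b / P_l), ln(P_b / P_c)) from the three network outputs."""
    return math.log(p_b / p_l), math.log(p_b / p_c)

# Invented outputs: a jet that looks b-like against light flavour
# but is harder to separate from charm.
w_light, w_charm = jetfitter_discriminants(p_b=0.6, p_c=0.3, p_l=0.1)
```

In this example the b-versus-light weight is larger than the b-versus-charm weight, reflecting that charm jets resemble b jets more closely than light-flavour jets do.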
MV1 is an algorithm used widely in ATLAS physics analyses. Distributions of the three MV1 input variables (the IP3D and SV1 discriminants as well as the sum of the IP3D and JetFitter discriminants) are shown in figure 3, for b jets, c jets, and light-flavour jets in simulated $t\bar{t}$ events.
Figure 3. Distribution of the IP3D (a), SV1 (b) and IP3D+JetFitter (c) weights, for b, c and light-flavour jets. These three weights are used as inputs for the MV1 algorithm. The spikes at w IP3D ≈ −20 and ≈ −30 correspond to pathological cases where the IP3D weight could not be computed, due to the absence of good-quality tracks. The spike at w SV1 ≈ −1 corresponds to jets in which no secondary vertex could be reconstructed by the SV1 algorithm, and where discrete probabilities for a b and light-flavour jet not to have a vertex are assigned. The irregular behaviour in w IP3D+JetFitter arises because both the w IP3D and the w JetFitter distribution (not shown) exhibit several spikes.
The distributions of the correlations between the three input weights are also shown in figure 4, for b jets, c jets and light-flavour jets. These distributions illustrate the potential gain in combining the three weights: it can be seen that the IP3D weight has only limited correlations with the secondary vertex-based weights, while naturally SV1 and IP3D+JetFitter weights are more correlated but the correlation is different in the b-jet, c-jet and light-flavour-jet samples. The MV1 neural network is a perceptron with two hidden layers consisting of three and two nodes, respectively, and an output layer with a single node which holds the final discriminant variable. The implementation used is the MLP code from the TMVA package [20]. The training relies on a back-propagation algorithm and is based on two simulated samples of b jets (signal hypothesis) and light-flavour jets (background hypothesis). Most of the jets are obtained from simulated tt events and their average transverse momentum is around 60 GeV. To provide jets with higher p T for the training, simulated dijet events with jets in the 200 GeV < p T < 500 GeV range are also included. As in the case of the JetFitter neural network, since the tagging performance depends strongly on the p T and, to a lesser extent, on the η of the jet, biases may arise from the different kinematic spectra of the two training samples (of light-flavour and b jets). To reduce this effect, weighted training events are used. Each jet is assigned to a category defined by a coarse two-dimensional grid in (p T , η) with four bins in η and ten bins in p T . Jets in the same category are given the same weight, defined according to the overall fraction of all jets in this category, and the jet category is fed to the network as an additional input variable. The MV1 output weight distribution is shown in figure 5 for b, c, and light-flavour jets in simulated tt events. 
The spike around 0.15 corresponds mostly to jets for which no secondary vertex could be found.

Figure 4. Distributions of the correlations between the IP3D, SV1 and IP3D+JetFitter weights, for b jets (top), c jets (middle) and light-flavour jets (bottom). The spikes at w IP3D ≈ −20 and ≈ −30 correspond to pathological cases where the IP3D weight could not be computed, due to the absence of good-quality tracks. The spike at w SV1 ≈ −1 corresponds to jets in which no secondary vertex could be reconstructed by the SV1 algorithm, and where discrete probabilities for a b and light-flavour jet not to have a vertex are assigned.

Light-flavour-jet rejection versus jet p T , for the MV1 algorithm. In each bin the cut on the b-tagging weight is adjusted to maintain an average 60% (70%) b-jet tagging efficiency.
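The category-based weighting of the MV1 training jets can be illustrated with a short sketch. It is not the ATLAS implementation: the bin edges below are hypothetical (the text only gives the number of bins), and the weight convention, the inverse of the fraction of jets in the category, is an assumption chosen so that every populated category carries the same total weight.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges: the text only states 4 bins in eta and 10 bins in pT.
ETA_EDGES = [0.0, 0.6, 1.2, 1.8, 2.5]                          # |eta|, 4 bins
PT_EDGES = [20, 30, 40, 50, 60, 75, 90, 110, 140, 200, 500]    # GeV, 10 bins


def category(pt, eta):
    """Index of the coarse (pT, |eta|) cell; out-of-range values are clamped."""
    n_eta = len(ETA_EDGES) - 1
    n_pt = len(PT_EDGES) - 1
    i_eta = min(max(bisect_right(ETA_EDGES, abs(eta)) - 1, 0), n_eta - 1)
    i_pt = min(max(bisect_right(PT_EDGES, pt) - 1, 0), n_pt - 1)
    return i_eta * n_pt + i_pt


def training_weights(jets):
    """Return (categories, weights) for a list of (pt, eta) jets of one flavour.

    Assumed convention: weight = 1 / (fraction of jets in the category).
    """
    cats = [category(pt, eta) for pt, eta in jets]
    counts = Counter(cats)
    n_jets = len(jets)
    return cats, [n_jets / counts[c] for c in cats]
```

Applying the same function separately to the b-jet and light-flavour-jet samples would give both samples the same weighted kinematic spectrum, which is the purpose stated in the text; the category index is what would be passed to the network as the additional input.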

Performance in simulation
The performance of the tagging algorithms is estimated in large samples of simulated tt events. Figure 6 shows the light-flavour-jet rejection as a function of b-jet tagging efficiency. As expected, a clear hierarchy between the standalone and combined algorithms is observed. In particular, the use of a combined tagging algorithm can improve the rejection by a factor 4 to 10 compared to JetProb in the 60-80% efficiency range.
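A rejection-versus-efficiency point of this kind can be computed from the discriminant values of simulated jets. The sketch below assumes that the rejection is the inverse of the light-flavour-jet mistag rate and that a jet is tagged when its weight is at or above the cut; the function names are illustrative only.

```python
def cut_for_efficiency(b_weights, target_eff):
    """Cut value such that about target_eff of the b jets have weight >= cut."""
    ordered = sorted(b_weights, reverse=True)
    n_pass = max(1, round(target_eff * len(ordered)))
    return ordered[n_pass - 1]


def efficiency(weights, cut):
    """Fraction of jets with a tagging weight at or above the cut."""
    return sum(w >= cut for w in weights) / len(weights)


def light_rejection(b_weights, light_weights, target_eff):
    """Return (cut, rejection) at the requested b-jet tagging efficiency."""
    cut = cut_for_efficiency(b_weights, target_eff)
    mistag = efficiency(light_weights, cut)
    rejection = float("inf") if mistag == 0 else 1.0 / mistag
    return cut, rejection
```

Scanning `target_eff` over a range of values would trace out a curve of the type shown in figure 6.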
For physics analyses it is important to understand the light-flavour-jet rejection as a function of kinematic variables. Figures 7 and 8 show the dependence on jet p T and η, respectively. The rejection is best at intermediate jet p T values and in the central region. At low p T and/or high |η|, the performance is degraded mostly because of the increase of multiple scattering and secondary interactions. For p T greater than about 200 GeV, some dilution arises because the fraction of fragmentation tracks increases, and more b hadrons fly beyond the first pixel layer. In addition, a further performance degradation results from pattern recognition issues in the core of very dense jets.
As mentioned in the previous section, algorithms such as IP3D+JetFitter can be tuned to achieve a better charm rejection. For high-performance b-tagging algorithms, the ability to reject c jets also becomes important. Charm hadrons have sufficiently long lifetimes to also lead to reconstructible secondary vertices. Since JetFitter relies not only on the long lifetimes of b and c hadrons but also on the full decay topology, it can help to discriminate b jets and c jets, for instance by separating b jets with cascade charm decays (i.e. at least 2 vertices) from single-vertex c jets. The neural network used for the IP3D+JetFitter combination has three output neurons: one for each of the light-quark, b and c hypotheses. The usual IP3D+JetFitter algorithm is built using the LLR of the light-flavour-jet and b-jet outputs. Figure 9 shows the c-jet rejection versus the b-jet tagging efficiency. On the other hand, the figure also shows that merely adding the SV1 and IP3D discriminants does not help to improve the performance with respect to IP3D+JetFitter.
Since hadronic decays of τ leptons can be reconstructed as jets which can mimic b jets, it is useful to know the discrimination power between τ jets and b jets. This is shown in figure 10 for two tagging algorithms.

Muon-based tagging algorithm
Decays of b hadrons to muons, either direct, b → µ − , or through the cascade, b → c → µ + (or, with significantly smaller rate, b → c̄ → µ − ), can be exploited to identify b jets.³ The intrinsic efficiency of muon-based tagging algorithms is typically lower than that of lifetime-based tagging algorithms due to the limited branching fraction of b hadrons to muons (≈ 20%, including both direct and cascade decays). The Soft Muon Tagger (SMT), which is described in this section, is a muon-based tagging algorithm that does not use any lifetime information. This makes it complementary to the lifetime-based techniques and subject to significantly different sources of systematic uncertainties.

Muon selection
The muons considered for tagging in the SMT algorithm are required to be reconstructed both in the ID and the MS, so-called combined muons [1]. Such muons must satisfy track quality requirements on the number of hits in the different ID sub-detectors, aimed at reducing the number of light-flavour hadron decays in flight. Candidate muons also have to be loosely compatible with the reconstructed primary vertex, in order to reject charged particles from additional proton collisions, especially at high LHC instantaneous luminosities, or from nuclear interactions of the hard collision products with the detector material. A candidate muon is associated with a jet if ∆R(jet, µ) < 0.5. If more than one jet fulfils this requirement, the muon is associated with the nearest jet only. The candidate muon must further fulfil a set of selection criteria, referred to as SMT selection criteria in the following: |d 0 | < 3 mm, |z 0 · sin θ| < 3 mm and p T > 4 GeV.
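The association and kinematic selection described above can be sketched as follows. The dictionary keys are hypothetical names chosen for the example, lengths are assumed to be in mm and momenta in GeV, and ∆R is taken as the usual distance in (η, φ) space.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in (eta, phi), with the phi difference wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def associate_muon(muon, jets, max_dr=0.5):
    """Index of the nearest jet with Delta R < max_dr, or None if there is none."""
    best, best_dr = None, max_dr
    for i, jet in enumerate(jets):
        dr = delta_r(muon["eta"], muon["phi"], jet["eta"], jet["phi"])
        if dr < best_dr:
            best, best_dr = i, dr
    return best


def passes_smt_kinematics(muon):
    """|d0| < 3 mm, |z0 sin(theta)| < 3 mm and pT > 4 GeV."""
    return (abs(muon["d0"]) < 3.0
            and abs(muon["z0"] * math.sin(muon["theta"])) < 3.0
            and muon["pt"] > 4.0)
```

Choosing the nearest jet inside the loop implements the rule that a muon compatible with several jets is associated with only one of them.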
Light charged mesons (π ± , K ± ) decay predominantly into muons and thus contribute significantly to a sample of jets with associated muons. Given the long lifetimes of light charged mesons, a small fraction of those mesons decay between the end of the ID volume and the entrance of the muon system. While in those cases the ID measures the track parameters for the meson, the MS is sensitive to the track of the muon produced in the decay, giving rise to an enlarged χ 2 for the combination of both measurements. In order to discriminate between b and light-flavour jets the SMT therefore uses the χ 2 of the statistical combination of the track parameters of muons reconstructed in the ID and MS, χ 2 match , normalised to the number of degrees of freedom. The momentum imbalance and kink from the decay between the light charged meson and daughter muon will result in χ 2 match values larger on average than for decays of heavy-flavoured hadrons. The χ 2 match is defined as

$$\chi^2_{\mathrm{match}} = \frac{1}{5}\,(P_{\mathrm{ID}} - P_{\mathrm{MS}})^{T}\,(W_{\mathrm{ID}} + W_{\mathrm{MS}})^{-1}\,(P_{\mathrm{ID}} - P_{\mathrm{MS}}),$$

where P ID and P MS are the 5-dimensional vectors of the trajectory helix parameters measured in the ID and MS, respectively, and W ID and W MS are their associated 5 × 5 covariance matrices. The χ 2 match distribution for the different flavour sources in simulated tt events is shown in figure 11. Compared to b or c jets, light-flavour jets indeed show a significantly broader χ 2 match distribution. A jet is considered tagged by the SMT if it has an associated candidate muon passing the SMT selection criteria, which also include the requirement χ 2 match < 3.2.

³ Charge-conjugate decay modes are implied throughout this paper.
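As an illustration of the matching variable, the sketch below evaluates a χ 2 of the form (P ID − P MS)ᵀ (W ID + W MS)⁻¹ (P ID − P MS) divided by the number of parameters. This explicit form is an assumption based on the description in the text (two 5-dimensional parameter vectors with 5 × 5 covariance matrices, normalised to the number of degrees of freedom); the small linear solver is included only to keep the example self-contained.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def chi2_match(p_id, p_ms, w_id, w_ms):
    """Chi-square of the ID-MS track parameter difference, per degree of freedom."""
    n = len(p_id)
    diff = [a - b for a, b in zip(p_id, p_ms)]
    w_sum = [[w_id[i][j] + w_ms[i][j] for j in range(n)] for i in range(n)]
    y = solve(w_sum, diff)           # y = (W_ID + W_MS)^-1 (P_ID - P_MS)
    return sum(d * v for d, v in zip(diff, y)) / n


def smt_chi2_pass(chi2, cut=3.2):
    """Requirement quoted in the text: chi2_match < 3.2."""
    return chi2 < cut
```

Solving the linear system instead of inverting the summed covariance matrix explicitly is a common numerical choice and does not change the result.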

Performance in simulation
Various aspects of the performance of the SMT algorithm have been studied in simulated events of different physics processes. An inclusive sample of di-muon events from J/ψ meson and Z boson decays has been used to provide a clean source of genuine muons spanning a wide transverse momentum range. This allows studies of the efficiency of the SMT selection criteria for isolated muons, including the χ 2 match cut. This efficiency, which is found to be on average around 95%, has been studied as a function of the muon transverse momentum and pseudorapidity. It is found not to depend significantly on the transverse momentum, and to exhibit only a mild dependence on the pseudorapidity.
The efficiency of the SMT algorithm to identify b and c jets has been evaluated using a sample of simulated tt events. The average b- and c-jet tagging efficiencies in this sample are found to be 11.1% and 4.4%, respectively. The efficiencies as a function of jet p T are given in figure 12. As expected, the tagging efficiencies are significantly lower than what is typically found for lifetime-based tagging algorithms, due to the limited branching ratio of muonic b- and c-hadron decays. A dependence on the jet p T is observed, whereby a lower efficiency is found for lower p T : softer jets originate from decays of b hadrons with lower transverse momentum, which in turn produce less energetic tagging muons. The latter are more likely to fail the SMT pre-selection requirement on the muon p T (p T > 4 GeV). The efficiency becomes almost flat when jets attain a p T range where they produce high transverse momentum muons.
The mistag rate, i.e. the efficiency to falsely identify a light-flavour jet as a b jet, has been estimated using a sample of simulated inclusive jet events, generated with PYTHIA. As mentioned before, mistagging of light-flavour jets as b jets is mainly caused by decays in flight of charged pions and kaons, π + , K + → µ + ν µ . Another source is instrumental effects like punch-through of hadrons through the calorimeters and nuclear interactions of particles within a jet with the material in the calorimeters, mimicking muons in the MS. The values of the mistag rate, determined as a function of jet p T and |η|, are summarised in figure 13. They are found to be very low, demonstrating the power of the SMT tagging algorithm.

b-jet trigger algorithm
The possibility to identify b jets at trigger level is crucial for physics processes with purely hadronic final states containing b jets because the absence of leptons and the huge inclusive jet background make other trigger selections very challenging.

Trigger selection
The b-jet trigger selection starts from the calorimetric jet candidates, reconstructed by the hardware-based first level trigger (LVL1); the corresponding charged-particle tracks, reconstructed by the two subsequent software-based trigger levels, the second level trigger (LVL2) and the Event Filter (EF), are then analysed with lifetime-based algorithms. For a detailed description of the ATLAS trigger scheme, including the detailed descriptions of tracking, vertexing and beamspot determination in the trigger, see ref. [21].
During the 2011 data taking, the b-tagging trigger selection was based on the impact parameter significance of the reconstructed tracks. The tagging algorithm adopted for the primary physics trigger was an online version of the JetProb algorithm described in section 3.2, applied to jet candidates identified by the LVL1 trigger. To maximise the acceptance for different physics channels, various b-jet trigger selections were deployed during 2011 data taking, differing in LVL1 jet requirements as well as in b-tagging requirements. The trigger selections required either a single or multiple b-tagged jets, and the b jets were selected at three working points. These working points, referred to as tight, medium and loose, correspond respectively to approximately 40%, 55% and 70% identification efficiency for selecting jets corresponding to true offline b jets, measured on a tt simulated sample. The b-tagging triggers also exploited a refined jet reconstruction at LVL2 and EF, which offers a better correlation between online and offline jet p T , to reduce further the rate without compromising the jet trigger efficiency plateau of the LVL1 selection. The rate reduction provided solely by the request of one tight (two medium) b-tagged jet(s) is a factor of 6 (13) at LVL2 and 2 (4) at EF.
The data collected in 2011 are compared to a PYTHIA generated dijet sample, and distributions of basic ingredients for the b-jet triggers are shown in figure 14. The overall agreement is good but to take into account deviations in the simulation, especially in the impact parameter tails, data-driven techniques will be employed to derive data-to-simulation scale factors, as described in sections 8, 13 and 14.

Performance in simulation
The different tagging methods are characterised, at each trigger level, as a curve showing the light-flavour-jet rejection (R l ) versus the efficiency to select b jets (ε b ). The characterisation of trigger selections also involves studying the bias that each trigger level imposes on the next one and on the final recorded sample. In particular, for the b-jet triggers, this can be derived as an additional rejection versus efficiency curve for offline tagging algorithms, measured on a sample selected by a single b-jet trigger.
The combined rejection versus efficiency curves for the LVL2, EF and offline selections based on JetProb and measured in a sample of HERWIG generated tt events are shown in figure 15; the EF (offline) performance is shown starting from the tight and medium L2 (EF) working points.
When compared with the same curve measured on an unbiased sample, the curve describing the offline rejection on jets selected by a single b-jet trigger also provides an estimate of the correlation between the tagging algorithms used in the different selection stages. In each plot an offline curve, which is obtained on an unbiased sample, is drawn to provide an estimate of this correlation. For instance, figure 15b shows that a sample of jets selected by the offline b-tagging is not biased by the b-jet trigger "medium" selection if the offline selection operates at an average efficiency of about 40%. However, the use of the b-jet trigger is not limited to this unbiased offline sample since data-to-simulation efficiency scale factors are derived for trigger selection and for combined trigger and offline selections.

JINST 11 P04008

Figure 15. The combined rejection versus efficiency curves for the LVL2, EF and offline JetProb tagging algorithms for the tight (a) and medium (b) trigger working points. The offline rejection versus efficiency curve, measured on an unbiased sample, is superimposed, providing an estimate of the correlation between online and offline selections. The offline jets are required to have |η| < 2.5 and transverse momentum p T > 50 GeV (corresponding to the full acceptance of the jet trigger).

Dependence of the b-tagging performance on pile-up
With the increasing instantaneous luminosity of the LHC during 2011 data taking, the rate of pile-up interactions increased substantially with an average of 12 interactions per bunch crossing in the later data taking periods, reaching maximum values of more than 20 interactions. These additional interactions can potentially affect the b-tagging performance through several effects:
• The hard scatter primary vertex has to be identified among the reconstructed primary interaction vertices along the beam line (see section 2). Identifying the wrong primary vertex as the signal vertex typically results in rejecting tracks for the signal jets when applying the quality criteria for b-tagging tracks, consequently losing the power to tag these jets as b jets. This effect is less pronounced in final states containing jets and/or charged leptons with large transverse momenta, such as tt events. However, it can play an important role in topologies with lower transverse momenta of the final state objects or if some high transverse momentum objects are not reconstructed. Pile-up effects on vertex reconstruction can also lead to a worsening of the z-resolution of the primary vertex due to contamination from tracks from nearby interactions. This will translate into a worsening of the longitudinal impact parameter resolution which constitutes an important input to b-tagging algorithms. Furthermore, the fraction of tracks in the tails of the longitudinal impact parameter distribution is increased, which also degrades the b-tagging performance. Studies in ref. [22] have shown that the fraction of tt events with a misidentified primary vertex is below 2% for the number of additional interactions as present during data taking in 2011. The resolution of the z coordinate of the signal vertex degrades by about 10% for an average of 12 additional interactions as in the later data taking periods in 2011.
As explained in section 2, a requirement on the jet vertex fraction has been applied, so that only jets compatible with the selected primary event vertex are used in the b-tagging analyses. As a result, jets from the hard scatter interaction that are lost when the wrong primary vertex is selected as signal vertex do not enter into the determination of the performance of b-tagging algorithms. The consequences of this depend strongly on the specific analysis considered and are not discussed in detail in this paper.
• The increased density of charged particle tracks in the inner tracking detectors makes track reconstruction more challenging. An increased rate of falsely associated hits or hits shared with other tracks, as well as an increased rate of fake tracks are the most important consequences. Furthermore, misassociated hits can lead to tails in the impact parameter resolutions for these tracks. These aspects have been studied in refs. [22,23]. It has been found that for the pile-up conditions in the 2011 data, there is no significant degradation of the track reconstruction efficiency and the track impact parameter resolution in the transverse plane. However, there is some increase of the rate of fake tracks and a slight worsening of the track impact parameter resolution along the z direction.
• Pile-up interactions can create additional jets reconstructed in the detector. If the corresponding interaction vertex is close to the primary vertex of the hard scatter process of interest, charged particle tracks stemming from the pile-up interaction might be falsely associated to the hard scatter primary vertex and mimic lifetime signatures leading to an increased misidentification rate of non-b jets. If the pile-up jet overlaps with a signal jet, tracks from the pile-up interaction might be misassociated with the signal jet, diluting the b-tagging performance. Studies in ref. [22] have shown that this is the main source of an increased multiplicity of tracks in signal jets in the presence of pile-up. If the pile-up vertex is sufficiently displaced from the hard scatter vertex, the corresponding tracks will be rejected by the selection criteria, typically not causing false identification of the pile-up jets.
The dependence of the b-tagging performance on the number of reconstructed primary vertices has been studied using simulated tt events. An important input to the b-tagging algorithms is the information from the reconstruction of inclusive secondary decay vertices in jets. The secondary vertex reconstruction can be affected by additional tracks from pile-up vertices. Figure 16 shows the rate with which secondary vertices are reconstructed by the SV1 algorithm in jets of different flavour, normalised to the average secondary vertex reconstruction rate. It can be seen that for c and b jets, where the reconstructed secondary vertices are mainly real vertices from decays of long-lived heavy hadrons, the secondary vertex rate is nearly independent of the number of pile-up interactions. For light-flavour jets on the other hand an almost linear dependence can be observed, leading to an increased misidentification rate of light-flavour jets. Figure 16 also shows the rejection of light-flavour jets for a b-jet tagging efficiency of 70% versus the number of reconstructed primary vertices for the MV1 algorithm for tt events. It can be seen that the light-flavour-jet rejection degrades with increasing number of pile-up interactions, resulting in a light-flavour-jet rejection rate that is reduced by a factor of almost two for the highest level of pile-up as present in the year 2011.

Figure 16. The dependence of the secondary vertex reconstruction rate of the SV1 algorithm (a) and the light-flavour jet rejection of the MV1 algorithm (b) on the number of reconstructed primary vertices, estimated from simulated tt events. The secondary vertex reconstruction rate has been normalised to the rate in the inclusive sample.

Simulation modelling of b-tagging input observables
An accurate modelling of the b-tagging performance in the simulation is based on a correct description of the underlying quantities, such as the reconstruction efficiency and fake rate of tracks and vertices, and the properties of the reconstructed objects. In this section, a comparison between data and simulation is presented for a number of b-tagging input observables.

Measurement of the impact parameter resolution of charged particles
Two key ingredients for discriminating between tracks originating from displaced vertices and those originating from the primary vertex are: the transverse impact parameter (IP) of a track, d 0 , and z 0 sin θ, the longitudinal impact parameter projected onto the direction perpendicular to the track. Both of these quantities can be measured with respect to the primary vertex in an unbiased way (d PV and z PV sin θ): if the track under consideration was used for the primary vertex determination, it is first removed from the primary vertex which is subsequently refitted, and the impact parameters are computed with respect to this new vertex.
Since the primary interaction point itself has a spread (of approximately 25 µm in the x and y directions), it is not possible to measure the track impact parameter resolution σ IP track directly. Relating the impact parameter distributions to the purely track-based resolution is not straightforward, since it is convolved with the resolution on the primary vertex position: $\sigma^2_{\mathrm{IP}} = \sigma^2_{\mathrm{IP\,track}} + \sigma^2_{\mathrm{PV}}$, where $\sigma^2_{\mathrm{PV}}$ is the projection of the primary vertex uncertainty along the axis of closest approach of the track to the primary vertex on the transverse or longitudinal plane.
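The quadrature relation between the measured, track-based and primary-vertex contributions can be made concrete with a small numerical sketch. It only inverts the relation for a single pair of widths; the measurement described in this section works track by track with an iterative deconvolution, which this example does not attempt to reproduce. The function name and the numbers in the usage note are illustrative.

```python
import math


def track_ip_resolution(sigma_ip, sigma_pv):
    """Track-only resolution from sigma_IP^2 = sigma_track^2 + sigma_PV^2."""
    if sigma_pv > sigma_ip:
        raise ValueError("vertex contribution exceeds the measured width")
    return math.sqrt(sigma_ip ** 2 - sigma_pv ** 2)
```

For example, a measured width of 50 µm with a primary vertex contribution of 30 µm would correspond to a track-only resolution of 40 µm.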
In this section, a measurement of the impact parameter resolution in data is presented. Since the measurement does not require a high-luminosity sample, and to limit pile-up effects, only the first runs of the collected data in 2011 are used.
The data were required to satisfy standard ID data quality requirements. The simulation samples considered are PYTHIA generated dijet samples. Events passing a logical OR of inclusive jet triggers, with at least 10 tracks used in the primary vertex reconstruction are retained for this study.
Tracks fulfilling the following basic track quality selection are used:
• The track must be included in the primary vertex reconstruction.
• ≥ 2 hits in the pixel detector.
• ≥ 7 hits in the combined pixel and SCT detectors.
In order to extract the correct impact parameter resolution from data it is important to understand how to subtract the contribution from the resolution on the position of the reconstructed primary vertex. Since the primary vertex fit uses the beam spot constraint, the beam spot size is already included in the estimated uncertainty on the primary vertex position.

The tracks are divided into different categories of η, p T √ sin θ, and the number of innermost pixel layer hits to ensure an almost constant resolution within a single category. Both the d 0 and z 0 sin θ resolution have been measured for each track category. The pseudorapidity η is chosen as it reflects the kinematics of the particle production mechanism while θ is more suitable for parametrising detector-related effects. Finally, p T √ sin θ has been chosen instead of p T itself because it is directly linked to the multiple scattering contribution to the impact parameter resolution in the case that the material traversed by charged particles follows a cylindrical geometry. The resolution is modelled as [...]

The method used to subtract the primary vertex reconstruction contribution to the IP resolution, σ PV , relies on an iterative deconvolution procedure. For each iteration it is possible to obtain the deconvolved distribution by multiplying the measured impact parameter of each track by a correction factor. For example, for the transverse impact parameter with respect to the primary vertex: [...] where K is a correction factor that depends on the iteration index. For the first iteration K is equal to one. For each iteration, σ d PV can be evaluated by fitting each d PV distribution and for the i-th iteration it should be: [...] which can then be used to calculate K i+1 . To evaluate the width of the core of the d PV distributions, and hence estimate the impact parameter resolution, a Gaussian fit is first applied to the whole distribution, and a temporary mean and width are obtained.
A new fit range, of width four times the temporary fit width, is then centred around the temporary mean; finally the distribution is refitted within this new range. The iterative procedure ends when the fitted σ d PV is stable within approximately 0.01%. About five iterations are needed to make the K factor converge to stable values that range between 0.8 and 1.2. This iterative procedure was verified on Monte Carlo simulation; the impact parameter resolutions derived from reconstructed tracks in simulated events converge well, especially at high p T , to the values derived from the tracks reconstructed directly from the simulated hits in the ID. Figure 17 shows the comparison between data and simulation for both the transverse and longitudinal impact parameter resolutions, measured with respect to the primary vertex as a function of η for tracks with one hit in the innermost pixel detector layer, for two different p T √ sin θ regions (0.4 GeV < p T √ sin θ < 0.5 GeV and p T √ sin θ > 20 GeV). The η dependence of the transverse impact parameter resolution is shown in the upper plots for low- and high-p T tracks. The low-p T tracks of the first region show a rise in resolution versus |η| because of the increase in the multiple scattering contribution dominating the resolution in this momentum interval. At high p T , for the tracks of the second region, the hit resolution and potential residual misalignments of the silicon detectors dominate, leading only to a moderate η dependence in d 0 . The lower plots show the resolution of the projected longitudinal impact parameter z 0 sin θ. Because of this projection and of the variation of the average pixel hit's cluster size with η, a strong dependence is seen both at low and high p T . In both cases, d 0 and z 0 sin θ, the low-p T regime is well modelled in simulation thanks to the excellent description of the material in the beam pipe and the first layers of the pixel detector.
The high-p T regime exhibits a significantly better resolution in simulation compared to data. These differences are attributed to residual alignment uncertainties in data not present in simulation, as well as to imperfections in the cluster modelling in the pixel sensors in simulation.

Input variable comparisons using fully reconstructed b hadrons
In this section, a comparison between data and simulation is presented of observables entering b-tagging algorithms. The main goal of this comparison is to validate the description in simulation of the b jets and quantify possible differences.
A pure b-jet sample is obtained by exploiting an invariant mass based selection of fully reconstructed b hadrons in an inclusive decay channel. It is possible to isolate a very pure b-jet sample to be used for the comparison by matching those candidates to jets, albeit at the expense of the sensitivity to the modelling of heavy flavour lifetimes and decay processes.

Comparison procedure and sample selection
Although there are a reasonable number of decay channels of b hadrons that would be suitable for the selection of the b-jet-enriched sample, in practice only the decay mode B ± → J/ψ(µ + µ − )K ± is chosen for this analysis. This decay is characterised by both a clear signature and a high branching fraction (≈ 10⁻³), compared to other decays involving a J/ψ.
A logical OR of J/ψ triggers has been used for the event selection, applied to the full 2011 dataset and to all simulated samples. In the simulated signal sample, which is generated using PYTHIA, true B ± → J/ψ K ± decays are required, matched in ∆R to the reconstructed candidate. The simulated jet kinematic (p T , η) spectra are reweighted to match those in the data.
The J/ψ candidate is selected requiring two muons with p T > 4 GeV and invariant mass within 200 MeV from the J/ψ mass. Secondly, a fully reconstructed B ± candidate is selected following the scheme shown in figure 18. In the B ± selection procedure all tracks that fulfil minimal quality requirements, and have a transverse momentum greater than 2.5 GeV, are refitted to a common vertex together with the selected muons. If more than one candidate is found in the event, the one with the lowest vertex fit χ 2 is selected.
Finally, the B ± candidate is matched to a jet satisfying the selection criteria used in this paper (p T > 20 GeV, |η| < 2.5) by means of an angular matching. No JVF requirement is imposed. It should have ∆R(B, jet) < 0.4, where the candidate B ± direction is estimated by summing the momenta of the muons and the third charged-particle track. If more than one jet is found compatible with the B ± candidate, only the jet having the smallest ∆R is considered.

The obtained mass spectrum of all B ± candidates is shown in figure 19, together with a fit to a Gaussian signal on top of a falling combinatorial background. In order to separate the signal from the combinatorial background, a sideband-based background subtraction procedure is adopted. The sideband region is defined as the mass region between 3σ above the signal peak and 6.6 GeV, where σ is the width of the Gaussian; masses below the signal peak are not used since they have a large contribution from other, partially reconstructed b-hadron decays. The main assumption is that the distributions in background events of variables under study are the same for the events in the sideband region and under the resonance peak. This assumption has been tested in simulated samples for each variable, discarding the ones showing correlation with the invariant mass of the B meson.
In the following, the B ± signal region is defined as the mass region within two standard deviations from the signal peak. For each variable, its distribution in the sideband region is subtracted from that in the signal region, after proper normalisation. This normalisation is evaluated from the fit to the invariant mass spectrum using a double exponential function for the combinatorial background and a Gaussian for the signal. Systematic effects on the background subtraction have been estimated by replacing the double exponential with a single exponential. The statistical uncertainty on the estimated fraction of combinatorial background is found to be negligible.
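The sideband subtraction described above can be sketched as follows. The region definitions follow the text (signal within two standard deviations of the peak, sideband from 3σ above the peak up to 6.6 GeV); the expected background yield in the signal region is taken as an input, since in the analysis it comes from the fit to the mass spectrum, and all function and argument names are hypothetical.

```python
def in_signal_region(mass, peak, sigma):
    """Within two standard deviations of the fitted signal peak (masses in GeV)."""
    return abs(mass - peak) < 2.0 * sigma


def in_sideband(mass, peak, sigma, upper=6.6):
    """Between 3 sigma above the peak and the upper mass limit of 6.6 GeV."""
    return peak + 3.0 * sigma < mass < upper


def sideband_subtract(signal_hist, sideband_hist, n_bkg_in_signal):
    """Subtract the sideband shape, scaled to the background yield under the peak.

    signal_hist, sideband_hist: bin contents of the same variable in both regions.
    n_bkg_in_signal: background yield in the signal region, e.g. from a mass fit.
    """
    scale = n_bkg_in_signal / sum(sideband_hist)
    return [s - scale * b for s, b in zip(signal_hist, sideband_hist)]
```

The subtraction is only meaningful for variables whose background shape is the same in the sideband and under the peak, which is the assumption stated in the text.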

Comparison of variables
For the sake of clarity the investigated observables are divided into three categories (although this procedure is not completely rigorous): detector specific variables, variables sensitive to the hadronisation physics, and tagging algorithm performance variables.
The first group of variables that have been compared are those mostly related to detector reconstruction effects. The B ± decay tracks as well as the hadronisation tracks, defined as those tracks that have been associated to the matched jet but not identified as the B ± decay products, have been used for this comparison. Inner Detector hits and impact parameters in the x-y plane (d 0 ) and along the beam axis (z 0 ) with respect to the primary vertex and their errors for B ± decay tracks have been studied. Hadronisation tracks have been used to study the associated number of innermost pixel detector layer and other pixel detector hits, which are of utmost importance for tagging algorithm performance. The distributions of these quantities in data and simulation are shown in figure 20. The impact parameter distributions are slightly wider in data than in simulated events, consistent qualitatively with the slightly worse impact parameter resolutions observed in figure 17, and the numbers of innermost layer and total pixel hits associated to each track are slightly lower in data than in simulation.
The reliability of the description of the hadronisation process in simulated events is verified by means of the second group of variables. These include the angular distance of hadronisation tracks to the jet axis and the track multiplicities, as shown in figure 21. In both cases excellent agreement with data is found, reflecting the quality of the Monte Carlo generator tuning, with a small tendency of the simulation to underestimate the multiplicity.

The last comparison targets the performance of the b-tagging algorithms. Given that the B ± decay is completely reconstructed, a detailed comparison of the secondary vertex-based algorithms is possible. In figure 22 the comparison between data and simulation of the number of tracks associated with the displaced vertices reconstructed by the SV1 and JetFitter algorithms is shown. In addition, figure 23 compares the efficiencies in data and simulation of the SV1 and JetFitter algorithms in associating the B ± decay products with the displaced vertices. The agreement between data and simulation is good but shows, nevertheless, slightly lower efficiency in data to select the tracks resulting from the B ± decay.
In summary, the overall agreement between distributions in data and simulated events is quite good, with only small differences in the impact parameter distributions and hit and track multiplicities. The track p T spectrum in the sample studied here is soft, and therefore the discrepancies observed at high track p T in section 7.1 are not evident here.

b-jet tagging efficiency calibration using muon-based methods
The ideal sample for the calibration of flavour-tagging algorithms is composed of jets characterised by a strong predominance of a single flavour, whose fractional abundance can be measured from data. For the b-jet tagging efficiency calibration, a good sample can be obtained by selecting jets containing a muon: because of the semileptonic decay of the b hadrons this sample is enriched in b jets.
Two methods were used to measure the b-jet tagging efficiency in the inclusive sample of jets containing a muon: p rel T and system8. The p rel T method uses templates of the muon momentum transverse to the jet axis to fit the fraction of b jets before and after b-tagging to extract the b-jet tagging efficiency. The system8 method, developed within the D0 experiment [24], was designed to involve a minimal input from simulation and therefore to be less sensitive to the associated systematic uncertainties. It applies three independent criteria to a data sample containing a muon associated with a jet to build a system of eight equations between observed and expected event counts.

Data and simulation samples
The events used in the analyses were collected with triggers that require a muon reconstructed from hits in the Muon Spectrometer and spatially matched to a calorimeter jet. In each jet p T bin of the analyses, the muon-jet trigger with the lowest jet threshold that has reached the efficiency plateau is used.
In the lower jet p T region (up to 60 GeV) jets with E T > 10 GeV at the EF level are required. Starting from 60 GeV up to 110 GeV the analyses use events with at least one jet with E T > 10 GeV at the first trigger level, while for jet p T above 110 GeV the trigger threshold is increased to 30 GeV. During data taking each of the muon-jet triggers was prescaled to collect data at a fixed rate slightly below 1 Hz.
For quantities related to b and c jets, the analyses make use of a simulated muon-filtered inclusive jet sample, referred to below as the µ-jet sample, where the events are required to have a muon with p T > 3 GeV at generator level. The sample is generated with PYTHIA [9], utilising the ATLAS AUET2B LO** PYTHIA tune [25]. A total of 25.5 million events have been simulated in four intervals of p̂ ⊥ , the momentum of the hard scatter process perpendicular to the beam line [9], starting from p̂ ⊥ = 17 GeV. For estimates of inclusive flavour fractions, as well as quantities related to light-flavour jets, the analyses make use of an inclusive jet sample for which the simulation has been carried out in six p̂ ⊥ intervals. About 2.8 million events have been simulated per p̂ ⊥ interval.
To reduce the dependence on the modelling of p rel T for muons in light-flavour jets, the heavy-flavour content in the p rel T sample is increased by requiring that there is at least one jet in each event, other than those used in the p rel T measurement, with a reconstructed secondary vertex with a signed decay length significance L/σ(L) > 1. The same sample is used as a subsample, called the p-sample, in the system8 analysis. This flavour-enhancement requirement is not applied in the sample used to derive the p rel T template for light-flavour jets.

Jet energy correction for semileptonic b decays
The jet energy measurement in ATLAS is characterised using the calorimeter response R calo = p jet T /p truth T , where p truth T is the p T of a matched jet built of final state particles with a lifetime longer than 10 ps, except for muons and neutrinos [6]. For b jets containing semileptonic decays, however, a larger fraction of the momentum is carried by muons and neutrinos than for inclusive jets. Therefore an additional correction is applied based on the all-particle response, R all = p jet+µ T /p truth,all T , where p jet+µ T includes selected reconstructed muons in the jet cone (while correcting for their mean energy loss in the calorimeters) and p truth,all T is the p T of a matched jet built of final state particles with a lifetime longer than 10 ps. This correction and its systematic uncertainties are described in detail also in ref. [6]. The estimation of the effect of the systematic uncertainty on the calibrations is described in section 8.5.

The p rel T method
The number of b jets before and after tagging can be obtained for a subset of all b jets, namely those containing a reconstructed muon, using the variable p rel T which is defined as the momentum of the muon transverse to the combined muon-plus-jet axis. Muons originating from b-hadron decays have a harder p rel T spectrum than muons in c and light-flavour jets. Templates of p rel T in simulated events are constructed for b, c and light-flavour jets separately, and these are fitted to the p rel T spectrum of muons in jets in data to obtain the fraction of b jets before and after requiring a b-tag.
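As a minimal sketch of the definition above, p rel T can be computed from the muon and jet three-momenta. The function name `p_rel_t`, the tuple representation of the momenta and the choice to build the combined axis as the plain vector sum of muon and jet momenta are assumptions made for illustration; this is not the ATLAS reconstruction code.

```python
import math

def p_rel_t(muon_p, jet_p):
    """Muon momentum transverse to the combined muon-plus-jet axis.

    muon_p, jet_p: (px, py, pz) tuples in GeV (hypothetical representation).
    Assumes the jet momentum does not already include the muon.
    """
    # Combined muon-plus-jet axis as the vector sum of the two momenta.
    axis = tuple(m + j for m, j in zip(muon_p, jet_p))
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("muon-plus-jet axis has zero length")
    # |p_mu x axis| / |axis| is the component of p_mu perpendicular to the axis.
    cx = muon_p[1] * axis[2] - muon_p[2] * axis[1]
    cy = muon_p[2] * axis[0] - muon_p[0] * axis[2]
    cz = muon_p[0] * axis[1] - muon_p[1] * axis[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz) / norm

# Made-up example: a 4 GeV muon perpendicular to a 3 GeV jet.
example = p_rel_t((0.0, 4.0, 0.0), (3.0, 0.0, 0.0))
```

A muon collinear with the jet gives p rel T = 0, while muons from b-hadron decays tend to populate larger values.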
As the templates from c and light-flavour jets have a rather similar shape, the fit can only reliably separate the b jets from non-b jets. Therefore, the ratio of the c and light-flavour fractions is constrained in the fit to the value observed in simulated events, which in the pre-tagged sample ranges from 2 at low p T to 0.7 at high p T . This ratio is then varied as a systematic uncertainty, as described in section 8.5. Figure 24 shows examples of template fits to the p rel T distribution in data before (left) and after (right) b-tagging.
Figure 24. Examples of template fits to the distribution of p rel T , the momentum of the muon transverse to the combined muon-plus-jet axis, in data before (a) and after (b) b-tagging by applying the MV1 tagging algorithm at 70% efficiency, for jets with 60 GeV < p T < 75 GeV.
Having obtained the flavour composition of jets containing muons from the p rel T fits, the b-jet tagging efficiency is defined as
$$\varepsilon_b^{\mathrm{data}} = \frac{f_b^{\mathrm{tag}} \cdot N^{\mathrm{tag}}}{f_b \cdot N} \cdot C = \frac{f_b^{\mathrm{tag}} \cdot N^{\mathrm{tag}}}{f_b^{\mathrm{tag}} \cdot N^{\mathrm{tag}} + f_b^{\mathrm{untag}} \cdot N^{\mathrm{untag}}} \cdot C,$$
where $f_b$ and $f_b^{\mathrm{tag}}$ are the fractions of b jets in the pre-tagged and tagged samples of jets containing muons, and $N$ and $N^{\mathrm{tag}}$ are the total number of jets in those two samples. In practice, the second form of the expression is used (where $N^{\mathrm{untag}}$ and $f_b^{\mathrm{untag}}$ denote the total number of events in the untagged sample and the fitted b-jet fraction therein), since this explicit subdivision into statistically independent samples allows for a proper computation of the statistical uncertainty on $\varepsilon_b^{\mathrm{data}}$. The factor $C$ corrects the efficiency for the biases introduced through differences between data and simulation in the modelling of the b-hadron direction and through heavy flavour contamination of the p rel T template for light-flavour jets, as described below. The efficiency measured for b jets with a semileptonically decaying b hadron in data is compared to the efficiency for the same kind of jets in simulated events to compute the corresponding data-to-simulation efficiency scale factor.
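The advantage of working with statistically independent tagged and untagged samples can be illustrated with a short sketch. The function names, the input numbers and the use of simple linear error propagation with an exact correction factor C are assumptions made here for illustration, not the procedure used in the analysis.

```python
import math

def b_tag_efficiency(n_b_tag, sigma_tag, n_b_untag, sigma_untag, c=1.0):
    """Efficiency from fitted b-jet yields in the tagged and untagged samples.

    n_b_tag   : fitted b-jet fraction times number of jets, tagged sample
    n_b_untag : fitted b-jet fraction times number of jets, untagged sample
    sigma_*   : their uncertainties, assumed independent (illustrative)
    c         : correction factor C, treated here as exact
    """
    total = n_b_tag + n_b_untag
    eff = c * n_b_tag / total
    # Linear error propagation for eff = A / (A + B) with independent A and B.
    sigma = c * math.hypot(n_b_untag * sigma_tag, n_b_tag * sigma_untag) / total**2
    return eff, sigma

def scale_factor(eff_data, eff_sim):
    """Data-to-simulation efficiency scale factor."""
    return eff_data / eff_sim

# Made-up yields for illustration only.
eff, sigma = b_tag_efficiency(900.0, 30.0, 300.0, 20.0)
```

With these invented inputs the efficiency is 0.75 with an uncertainty of about 0.014; dividing by the efficiency in simulation gives the scale factor.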
Both the pre-tagged and the tagged samples are fitted using templates derived from all jets passing the jet selection criteria defined in section 2. The p rel T templates for b and c jets are derived from the simulated µ-jet sample, using muons associated with b and c jets, without requiring any b-tagging criteria. It has been verified that the pre-tagged and tagged template shapes agree within statistical uncertainties. The template for light-quark jets is derived from muons in jets in a light-flavour dominated data sample. The sample is constructed by requiring that no jet in the event is b-tagged by the IP3D+SV1 tagging algorithm, using an operating point that yields a b-jet tagging efficiency of approximately 80% in simulated tt events. This requirement rejects most events containing b jets and yields a sample dominated by c and light-flavour jets. The b-jet contamination in this sample varies between 2% and 6% depending on the p T bin. The small bias introduced in the measurement from the b-jet contamination in the light-template is corrected for in the final result.
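The role of the fixed charm-to-light ratio in the template fit can be sketched as follows. The templates and yields are invented, the function name is hypothetical, and an unweighted least-squares estimate replaces the actual fit purely to keep the example short.

```python
def fit_b_fraction(data, t_b, t_c, t_light, c_to_light):
    """Least-squares estimate of the b-jet fraction from binned templates.

    data              : observed counts per p_rel bin
    t_b, t_c, t_light : unit-normalised template shapes per bin
    c_to_light        : charm-to-light ratio, held fixed in the fit
    Unweighted least squares is a simplification assumed for this sketch.
    """
    total = float(sum(data))
    w_c = c_to_light / (1.0 + c_to_light)  # charm share of the non-b component
    t_nonb = [w_c * c + (1.0 - w_c) * lf for c, lf in zip(t_c, t_light)]
    # Model: data_i / total = t_nonb_i + f_b * (t_b_i - t_nonb_i), linear in f_b.
    num = sum((d / total - nb) * (b - nb) for d, b, nb in zip(data, t_b, t_nonb))
    den = sum((b - nb) ** 2 for b, nb in zip(t_b, t_nonb))
    return num / den

# Toy templates (made up): b jets have the hardest p_rel spectrum.
T_B = [0.2, 0.3, 0.5]
T_C = [0.5, 0.3, 0.2]
T_L = [0.6, 0.3, 0.1]
# Toy data built with a b fraction of 0.4 and a charm-to-light ratio of 1.
toy_data = [410.0, 300.0, 290.0]
fitted = fit_b_fraction(toy_data, T_B, T_C, T_L, 1.0)
varied = fit_b_fraction(toy_data, T_B, T_C, T_L, 2.0)
```

In this toy the fit returns 0.40 when the ratio matches the one used to build the data and 0.37 when the ratio is doubled, which illustrates why the ratio is varied as a systematic uncertainty.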
As the p rel T method is directly affected by how well the b-hadron direction and the calorimeter jet axis are modelled in the simulation, a difference in the jet direction resolution between data and simulation, or an improper modelling of the angle between the b quark and the b hadron in simulation would cause the p rel T spectra in simulation and data to disagree, introducing a bias in the measurement. To study this effect, an independent jet axis was formed by the vector addition of the momenta of all tracks in the jet. The difference between this track-based and the standard calorimeter-based jet axis in the azimuthal angle φ and the pseudorapidity η, ∆φ(calo, track) and ∆η(calo, track), was derived in both data and simulation. The difference between the track-based jet axis direction and the calorimeter-based jet axis direction is observed to be larger in data than in simulation, and the φ and η of the calorimeter-based jet axis in simulation were therefore smeared such that the ∆φ(calo, track) and ∆η(calo, track) distributions agreed better with those from data. No significant p T dependence of the difference between the widths in data and simulation was observed, and a smearing based on a Gaussian distribution with a width of 0.004 in φ and 0.008 in η was found to give good agreement between data and simulation in all bins of jet p T . The p rel T templates for b and c jets were rederived from this smeared sample, and the p rel T distribution in data was fitted using these altered templates. The difference between using the unsmeared and smeared jet directions is then taken as a systematic uncertainty.
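The Gaussian smearing of the calorimeter jet axis can be sketched as below. Only the two widths are taken from the text; the function names, the seed and the wrapping of φ into [-π, π) are assumptions made for this illustration.

```python
import math
import random
import statistics

SIGMA_PHI = 0.004  # Gaussian width in phi quoted in the text
SIGMA_ETA = 0.008  # Gaussian width in eta quoted in the text

def wrap_phi(phi):
    """Map an azimuthal angle into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def smear_jet_axis(eta, phi, rng):
    """Return (eta, phi) of the jet axis after independent Gaussian smearing."""
    return (eta + rng.gauss(0.0, SIGMA_ETA),
            wrap_phi(phi + rng.gauss(0.0, SIGMA_PHI)))

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
smeared = [smear_jet_axis(0.0, 0.0, rng) for _ in range(20000)]
eta_width = statistics.pstdev([eta for eta, _ in smeared])
phi_width = statistics.pstdev([phi for _, phi in smeared])
```

The templates would then be rebuilt from the smeared directions and the fit repeated, with the shift in the result taken as the uncertainty.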

The system8 method
The system8 method [24] uses three uncorrelated selection criteria to construct a system of eight equations based on the number of events surviving any given subset of these criteria. The system, which is fully constrained, is used to solve for eight unknowns: the efficiencies for b and non-b jets to pass each of the three selection criteria, and the number of b and non-b jets originally present in the sample. As there are insufficient degrees of freedom to make a complete separation of the non-b component into (c, s, d, u, g) jet flavours, these are combined into one category and denoted cl. In simulated events, the flavour composition of the sample is relatively independent of jet p T in the range studied, while the efficiencies to pass each of the selection criteria have a strong p T dependence.
The three selection criteria chosen are:
• The lifetime-based tagging criterion under study.
• The requirement p rel T > 700 MeV.
• The requirement of at least one other jet in the event, other than the one containing the muon, with a reconstructed secondary vertex with a signed decay length significance L/σ(L) > 1.
The resulting system of eight equations relates the numbers of events passing each combination of these criteria to the eight unknowns. In these equations, the superscripts LT and MT denote the lifetime tagging criterion and soft muon tagging criterion, respectively. The n and p numbers denote the size of the samples without (n) and with (p) the application of the requirement of another jet; these samples are referred to as the "n" sample and the "p" sample, respectively.
Little correlation is expected between the variables used in the above criteria. However, even if correlations between tagging algorithms are small in practice, they must be accounted for. This is accomplished through correction factors, α i , i = 1, ..., 8, which are defined as ratios of efficiencies (eq. (8.3)).
A lack of correlation between two criteria thus implies that the related correction factors are equal to unity.
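For orientation, the structure of the system and of the correction factors can be sketched in the form in which system8 is commonly written. The assignment of the indices of the α i to individual terms is an assumption of this sketch, chosen so that α 2 , α 3 , α 4 and α 8 multiply the non-b terms; it should be checked against ref. [24] before being relied upon.

```latex
\begin{align*}
n &= n_b + n_{cl}, \\
p &= p_b + p_{cl}, \\
n^{LT} &= \varepsilon_b^{LT} n_b + \varepsilon_{cl}^{LT} n_{cl}, \\
p^{LT} &= \alpha_6\, \varepsilon_b^{LT} p_b + \alpha_4\, \varepsilon_{cl}^{LT} p_{cl}, \\
n^{MT} &= \varepsilon_b^{MT} n_b + \varepsilon_{cl}^{MT} n_{cl}, \\
p^{MT} &= \alpha_5\, \varepsilon_b^{MT} p_b + \alpha_3\, \varepsilon_{cl}^{MT} p_{cl}, \\
n^{LT,MT} &= \alpha_1\, \varepsilon_b^{LT} \varepsilon_b^{MT} n_b
           + \alpha_2\, \varepsilon_{cl}^{LT} \varepsilon_{cl}^{MT} n_{cl}, \\
p^{LT,MT} &= \alpha_7 \alpha_6 \alpha_5\, \varepsilon_b^{LT} \varepsilon_b^{MT} p_b
           + \alpha_8 \alpha_4 \alpha_3\, \varepsilon_{cl}^{LT} \varepsilon_{cl}^{MT} p_{cl},
\end{align*}
% Correction factors as ratios of efficiencies in simulation (assumed indexing):
\begin{align*}
\alpha_1 &= \frac{\varepsilon_b^{LT,MT,n}}{\varepsilon_b^{LT,n}\,\varepsilon_b^{MT,n}}, &
\alpha_2 &= \frac{\varepsilon_{cl}^{LT,MT,n}}{\varepsilon_{cl}^{LT,n}\,\varepsilon_{cl}^{MT,n}}, &
\alpha_3 &= \frac{\varepsilon_{cl}^{MT,p}}{\varepsilon_{cl}^{MT,n}}, &
\alpha_4 &= \frac{\varepsilon_{cl}^{LT,p}}{\varepsilon_{cl}^{LT,n}}, \\
\alpha_5 &= \frac{\varepsilon_b^{MT,p}}{\varepsilon_b^{MT,n}}, &
\alpha_6 &= \frac{\varepsilon_b^{LT,p}}{\varepsilon_b^{LT,n}}, &
\alpha_7 &= \frac{\varepsilon_b^{LT,MT,p}}{\varepsilon_b^{LT,p}\,\varepsilon_b^{MT,p}}, &
\alpha_8 &= \frac{\varepsilon_{cl}^{LT,MT,p}}{\varepsilon_{cl}^{LT,p}\,\varepsilon_{cl}^{MT,p}}.
\end{align*}
```

In this form, setting every α i to unity corresponds to fully uncorrelated criteria, with efficiencies that are the same in the n and p samples.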
As it is impossible to isolate independent corresponding samples in data, these correlations are inferred from simulated samples. The correction factors for b jets, as well as the c-jet information used to compute the cl correction factors, are derived from the simulated µ-jet sample, while the light-flavour-jet information used to compute the cl correction factors is derived from the simulated inclusive jet sample. As light-flavour jets only rarely have reconstructed muons associated with them, the statistical uncertainty on the correction factors would be unacceptably large if they were derived from muons matched to light-flavour jets in simulation. Instead, a charged particle track, fulfilling the requirements made for the Inner Detector track matched to reconstructed muons, is chosen at random and treated subsequently as if it were a muon. To ensure that Inner Detector tracks model the kinematic properties of reconstructed muons in light-flavour jets, the tracks are weighted to account for the p T - and η-dependent probability that a muon reconstructed as a track in the Inner Detector is also reconstructed in the Muon Spectrometer, as well as the sculpting of the muon kinematics by the muon trigger term. An additional correction factor is applied to account for the probability that a muon originating from an in-flight decay is associated with the jet.
As system8 only includes correction factors for b and non-b jets, the c and light-flavour samples have to be combined to obtain the cl correction factors. The relative normalisation of the charm and light-flavour samples is inferred from the simulated inclusive jet sample, leading to a charm-to-light ratio in the n- and p-samples which ranges from 0.6 to 1.5 depending on the sample and jet p T bin. The variation of the charm fraction in the combined sample is treated as a systematic uncertainty, as discussed in section 8.5. The values of the correction factors depend on the tagging algorithm, operating point and jet p T bin. For the MV1 tagging algorithm at 70% efficiency, the correction factors for b jets (cl jets) range between 0.96 and 1.04 (between 0.93 and 1.15).
The system of equations is solved technically by minimising a χ 2 function relating the observed event counts in the eight disjoint event categories to the eight parameters $n_b$, $n_{cl}$, $p_b$, $p_{cl}$, $\varepsilon_b^{LT}$, $\varepsilon_{cl}^{LT}$, $\varepsilon_b^{MT}$ and $\varepsilon_{cl}^{MT}$. Since no degrees of freedom remain, the found minimum must have χ 2 = 0.
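A χ 2 of this kind can be sketched for the simplified case in which all correction factors are set to unity. The dictionary keys, the choice of a Pearson χ 2 , the particular split into eight disjoint categories and all numbers are assumptions made for illustration; the sketch evaluates the function and does not perform the minimisation.

```python
from itertools import product

def expected_counts(par):
    """Expected yields in eight disjoint categories, assuming all alpha_i = 1.

    par holds the eight parameters n_b, n_cl, p_b, p_cl and the efficiencies
    eLT_b, eLT_cl, eMT_b, eMT_cl (key names are hypothetical).
    The p sample is taken to be a subset of the n sample.
    """
    yields = {
        "p": (par["p_b"], par["p_cl"]),
        "n-only": (par["n_b"] - par["p_b"], par["n_cl"] - par["p_cl"]),
    }
    counts = {}
    for sample, (n_b, n_cl) in yields.items():
        for lt, mt in product((True, False), repeat=2):
            prob_b = ((par["eLT_b"] if lt else 1 - par["eLT_b"])
                      * (par["eMT_b"] if mt else 1 - par["eMT_b"]))
            prob_cl = ((par["eLT_cl"] if lt else 1 - par["eLT_cl"])
                       * (par["eMT_cl"] if mt else 1 - par["eMT_cl"]))
            counts[(sample, lt, mt)] = n_b * prob_b + n_cl * prob_cl
    return counts

def chi2(observed, par):
    """Pearson chi-square between observed and expected category counts."""
    expected = expected_counts(par)
    return sum((observed[key] - exp) ** 2 / exp for key, exp in expected.items())

# Made-up parameter values used to generate the "observed" counts.
truth = {"n_b": 40000.0, "n_cl": 60000.0, "p_b": 9000.0, "p_cl": 3000.0,
         "eLT_b": 0.70, "eLT_cl": 0.10, "eMT_b": 0.60, "eMT_cl": 0.30}
observed = expected_counts(truth)
shifted = dict(truth, eLT_b=0.65)
```

Evaluating `chi2(observed, truth)` gives exactly zero, as expected for eight equations and eight parameters, while `chi2(observed, shifted)` is positive.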

Systematic uncertainties
The systematic uncertainties affecting the p rel T and system8 methods are common to a large extent. One important class of common systematic uncertainties comprises those addressing how well the simulation models heavy flavour production, decays and fragmentation. Other common systematic uncertainties are those arising from the imperfect knowledge of the jet energy scale and resolution as well as the modelling of the additional pile-up interactions. A systematic uncertainty that applies only to the p rel T analysis is the heavy-flavour contamination in the p rel T light-flavour data control sample, while a systematic uncertainty that only applies to the system8 analysis arises from varying the muon p rel T cut which is used as the soft muon tagging criterion. The systematic uncertainties on the data-to-simulation scale factor $\kappa_b^{\mathrm{data/sim}} \equiv \varepsilon_b^{\mathrm{data}} / \varepsilon_b^{\mathrm{sim}}$ of the MV1 tagging algorithm at 70% efficiency are shown in tables 1 and 2 for the p rel T and system8 methods respectively. The estimates of the systematic uncertainties, especially in the system8 analysis, suffer from the limited number of simulated events which leads to unphysical bin-to-bin variations in some cases. However, when the calibration results of several methods are combined (see section 10) these irregularities are smoothed out.

Simulation statistics
The limited size of the simulated event samples results in statistical fluctuations on the p rel T templates in the case of the p rel T analysis, and in statistical uncertainties on the system8 correlation factors in the system8 analysis.
The effect from limited template statistics in the p rel T analysis is assessed through pseudoexperiments. In the system8 analysis, the limited statistics available for the samples used to estimate the correlation factors is accounted for using an extra contribution to the fit χ 2 :
$$\Delta\chi^2 = (\boldsymbol{\alpha} - \boldsymbol{\alpha}_0)^{T}\, V^{-1}\, (\boldsymbol{\alpha} - \boldsymbol{\alpha}_0).$$
Here, $\boldsymbol{\alpha}$ denotes the eight additional fit parameters, $\boldsymbol{\alpha}_0$ their estimates, and $V$ the corresponding covariance matrix. To estimate the contribution to the uncertainty from this limited statistics, the uncertainty for the fit without this addition is subtracted quadratically from the uncertainty for the fit including it.
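The constraint term and the quadratic subtraction described above can be sketched as follows. For brevity the covariance matrix is assumed to be diagonal here, which is a simplification of this example only, and the function names and numbers are hypothetical.

```python
import math

def penalty_chi2(alpha, alpha0, variances):
    """Constraint term for the correction factors, assuming a diagonal V.

    alpha     : current values of the correction factors in the fit
    alpha0    : their estimates from simulation
    variances : diagonal elements of the covariance matrix (assumption)
    """
    return sum((a - a0) ** 2 / v for a, a0, v in zip(alpha, alpha0, variances))

def quadrature_difference(sigma_with, sigma_without):
    """Uncertainty attributed to the extra term: sqrt(with^2 - without^2)."""
    diff = sigma_with ** 2 - sigma_without ** 2
    if diff < 0.0:
        raise ValueError("the fit with the constraint has the smaller uncertainty")
    return math.sqrt(diff)

# Made-up numbers for illustration only.
penalty = penalty_chi2([1.02, 0.98], [1.0, 1.0], [0.0004, 0.0004])
mc_stat = quadrature_difference(0.05, 0.04)
```

With these invented inputs each correction factor sits one standard deviation from its estimate, so the penalty is 2, and the simulation-statistics contribution to the uncertainty is 0.03.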
In addition, the limited simulation statistics result in an uncertainty on the denominator in the scale factor expression, denoted simulation tagging efficiency in tables 1 and 2.

Modelling of gluon splitting to bb and cc
As the properties of jets with two b or c quarks inside (originating e.g. from gluon splitting) are different from those containing only a single b or c quark, a possible mismodelling of the fraction of double-b or double-c jets in simulation has to be taken into account. Jets which have two associated b quarks or c quarks are either given a weight of zero or a weight of two (effectively removing or doubling the double-b or double-c contribution) when constructing the p rel T templates and system8 correlation factors. This uncertainty on c-jet production is the dominant one for the p rel T analysis at high jet p T because gluon splitting is a larger contribution to the total charm production in this regime. Also, in this region the p rel T variable has a reduced discriminating power and hence the fit becomes more sensitive to a change in the shape of the c-jet template. The systematic effects tend to cancel in the ratios that define the system8 correction factors, leading to much reduced systematic uncertainties for system8.
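The weight-zero and weight-two variations can be sketched as a reweighting applied while filling a template histogram. The jet representation, the function name and the toy values are assumptions made for illustration.

```python
def build_template(jets, edges, double_weight):
    """Unit-normalised p_rel histogram with reweighted double-heavy-quark jets.

    jets          : list of (p_rel, n_heavy_quarks) pairs (hypothetical format)
    edges         : ascending bin edges
    double_weight : weight for jets with two associated b (or c) quarks;
                    0 removes them, 2 doubles them, 1 is the nominal template
    """
    hist = [0.0] * (len(edges) - 1)
    for p_rel, n_quarks in jets:
        weight = double_weight if n_quarks >= 2 else 1.0
        for i in range(len(hist)):
            if edges[i] <= p_rel < edges[i + 1]:
                hist[i] += weight
                break
    total = sum(hist)
    if total == 0.0:
        raise ValueError("template is empty")
    return [h / total for h in hist]

# Toy jets (made up): two single-b jets and one double-b jet.
toy_jets = [(0.5, 1), (0.7, 1), (1.5, 2)]
toy_edges = [0.0, 1.0, 2.0]
nominal = build_template(toy_jets, toy_edges, 1.0)
removed = build_template(toy_jets, toy_edges, 0.0)
doubled = build_template(toy_jets, toy_edges, 2.0)
```

The measurement would be repeated with the `removed` and `doubled` templates and the shifts with respect to the nominal result taken as the uncertainty.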

b-hadron direction modelling
Both the p rel T and the system8 analyses make use of the momentum of the associated muon transverse to the combined muon-plus-jet axis, where the muon plus jet axis is a measure of the b-hadron direction. A different jet direction resolution in data and simulation would therefore affect both analyses. This is accounted for by smearing the calorimeter jet direction by 0.004 in φ and 0.008 in η, as discussed in section 8.3. Both analyses use the result from the unsmeared template distributions as the central value and treat the full difference with the result from the smeared distributions as a systematic uncertainty.

b-quark fragmentation
An incorrect modelling of the b-quark fragmentation in simulation can affect the momentum spectrum of the muons from b decays and thus alter which muons pass the selection criteria. To investigate the impact of fragmentation on the data-to-simulation scale factor, the p rel T templates and system8 correlation factors have been rederived on a simulated sample where the b fragmentation function was reweighted so that the average fraction of the b-quark energy given to the b hadron was varied by 5%.
The production fractions of the various b-flavoured hadrons have been measured both at LEP and the Tevatron [26,27], and the results for b baryon production are only compatible at the 2 σ level. The production fractions in the simulated samples used in this paper are in reasonable agreement with the fractions as measured by LEP. A systematic uncertainty is evaluated by considering the difference in the result obtained by reweighting all of the events so that the distribution of hadron species matches the measured Tevatron values.

b-hadron decay
The spectrum of the muon momentum in the b-hadron rest frame, denoted as p * , directly affects the shape of the p rel T distribution for b jets. The p * spectrum has two components, direct b → µ + X decays and cascade b → c/c̄ → µ + X decays. Their branching fractions are BF(b → ℓX) = (10.69 ± 0.22)% and BF(b → c/c̄ → ℓX) = (9.62 ± 0.53)%, respectively [26], giving the ratio BF(b → ℓX)/BF(b → c/c̄ → ℓX) = 1.11 ± 0.07, where ℓ denotes either a muon or an electron.
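The quoted ratio can be checked with a few lines of arithmetic. Treating the two branching-fraction uncertainties as uncorrelated is an assumption of this sketch; it reproduces the quoted value after rounding.

```python
import math

def ratio_with_uncertainty(a, sigma_a, b, sigma_b):
    """Ratio a/b with uncorrelated relative uncertainties added in quadrature."""
    ratio = a / b
    relative = math.hypot(sigma_a / a, sigma_b / b)
    return ratio, ratio * relative

# Branching fractions in percent, as quoted in the text.
ratio, sigma = ratio_with_uncertainty(10.69, 0.22, 9.62, 0.53)
```

Rounded to two decimal places this gives 1.11 ± 0.07.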

JINST 11 P04008
This ratio of branching fractions has been varied within the quoted uncertainty. To investigate the effect of variations of the p * spectra, a weighting function has been applied to the p * spectrum of muons from the direct b → µ + X decay. This weighting function has been derived by comparing the direct p * spectrum of b → ℓ + X decays in PYTHIA as used in the analysis with the corresponding spectrum measured in ref. [28].

Charm-to-light ratio
Both the p rel T and system8 methods are sensitive to the relative fractions of c and light-flavour jets in the simulation. As the p rel T templates for c and light-flavour jets have a very similar shape, the p rel T fits can become unstable if both components are allowed to vary freely in the fit. Therefore the fits to the p rel T templates are performed with the ratio of the charm and light-flavour fractions fixed to the simulated value. In the system8 method, the ratios of the c-jet and light-flavour-jet fractions in the n and p samples are also fixed to their simulated values. The relative fractions of c and light-flavour jets in these samples affect the correction factors related to non-b jets (α 2 , α 3 , α 4 and α 8 ). In both analyses the impact of the constrained charm-to-light ratio has been addressed by varying the ratio up and down by a factor of two. The charm-to-light ratio variation is one of the dominant systematic uncertainties in the p rel T analysis, especially in the highest p T bin where the b- and c-jet templates start to look very similar. The system8 analysis, which only relies on the p rel T cut to increase the fraction of b jets in the sample, is less affected.

Fake muons in b jets
The p rel T templates and system8 correlation factors are obtained from the simulated µ-jet sample where a muon with p T > 3 GeV is required at the generator level. This filter suppresses b jets containing a reconstructed muon from other sources compared to those where the muon originated from a b decay (about 30% of the muons in this category, denoted as "fake" muons here, are indeed real muons produced by in-flight decays of light mesons; the remainder are not true muons). The fraction of fake muons in this sample is therefore likely to be lower than in data. As this could potentially impact the p rel T b template shapes, the p rel T measurement has been repeated with the fake muon fraction in the b template increased by a factor of three, which was found to have a negligible impact on the final result. In the system8 measurement, the fake muon contribution is varied in both b and c jets, again with a negligible impact on the final result.

p rel T light-flavour-template contamination
In the p rel T method, the templates for light-flavour jets are obtained from a light-flavour-enriched data sample. A measurement bias can arise from b-jet contamination in the light-flavour template. This b-jet contamination in the light-flavour template is estimated from simulation to be between 4% and 10%, depending on the jet p T bin. The bias introduced by this contamination is corrected for in the final result, and the result of a 25% relative variation of the b-jet contamination is taken as a systematic uncertainty.

p rel T cut variation
The system8 analysis uses a cut on the p rel T of the muon associated to the jet to arrive at a sample with enhanced heavy flavour content. The p rel T cut, which is nominally placed at 700 MeV, was varied between 600 MeV and 800 MeV, and the difference with respect to the nominal result was taken as a systematic uncertainty.

Jet energy scale and resolution
A jet energy scale in simulation that is different from that in data would bias the p T spectrum of the simulated events used to extract the p rel T templates and the system8 correlation factors. The systematic uncertainty originating from the jet energy scale is obtained by scaling the p T of each jet in the simulation up and down by one standard deviation, according to the jet energy scale uncertainty [6].
To estimate the systematic uncertainty from the jet energy resolution, a smearing has been applied to the jet energies in simulation, corresponding to the jet energy resolution uncertainties as described in [29].

Semileptonic correction
Both the p rel T and system8 analyses measure the b-jet tagging efficiency of jets containing semileptonically decaying b hadrons. Both analyses therefore make an extra jet energy scale correction, described in section 8.2, to correct the jet energy to the inclusive b-jet scale. The uncertainty on this correction, which amounts to about 2%, reflects how sensitive the correction is to the modelling of b jets in the simulation, the correlation between the correction and the b-tag output weights and how well the correction agrees in data and simulation. The uncertainty on the semileptonic correction is propagated through the p rel T and system8 analyses as a systematic uncertainty. Systematic sources which affect both the semileptonic correction and the p rel T templates or system8 correlation factors are varied in a correlated manner.

Pile-up µ reweighting
Simulation studies show that the impact on the b-tagging performance from the change in pile-up conditions during 2011 is relatively small compared to the precision of the p rel T and system8 analyses. The change in light-flavour-jet rejection at fixed b-jet tagging efficiency exceeds 5% only for the tightest operating points. With the µ distribution in simulated events reweighted to that in the 2011 data, as described in section 2, only the 3% relative uncertainty on the µ scale factor applied prior to the reweighting procedure affects the pile-up in simulated events. It is accounted for by repeating the analyses after changing the scale factor accordingly.

Extrapolation to inclusive b jets
The p rel T and system8 methods only measure the b-jet tagging efficiency in data for b jets with a semileptonic b-hadron decay. The b-jet tagging efficiency is different for these b jets than for inclusive b jets (in simulated events their ratio decreases from about 1.1 for the lowest p T jets, to values agreeing with unity within about 1% for jet p T ≳ 100 GeV). This is because the charged particle multiplicity is different in semileptonic and hadronic b-hadron decays, and, most importantly, the jets used in the p rel T and system8 analyses always contain a high-momentum and typically well-measured muon track whereas the hadronic b jets do not. However, assuming that the simulation adequately models the relative differences in b-jet tagging efficiencies between semileptonic and hadronic b jets, the same data-to-simulation scale factor is valid for both types of jets.
To investigate the validity of this assumption, the data-to-simulation scale factor was measured separately for jets with and without muons using a high purity sample of b jets in tt dilepton events. The ratio of the data-to-simulation scale factors for jets with and without muons was found to be consistent with unity for all tagging algorithms and operating points. The uncertainty on the measurement, approximately 4%, is assigned as a systematic uncertainty on the data-to-simulation scale factors obtained with the muon-based methods. However, due to the limited number of b jets with a semileptonic b-hadron decay in the tt dilepton events, this analysis was not performed in bins of jet p T .
In order to independently investigate effects that can potentially lead to different relative b-jet tagging efficiencies in semileptonic and hadronic b jets in the simulation, and especially their jet p T dependence, properties of the b-hadron production process and of semileptonic and hadronic b-hadron decays have been studied. The procedure used is analogous to that used for the calibration of the c-jet tagging efficiency, which is described in a more formal way in section 12.5. For the following quantities, the simulation has been adjusted to available measurements [30]: b-hadron production fractions, branching fractions of semileptonic b-hadron decays, relative branching fractions of the dominant exclusive semileptonic b-hadron decays and topological branching fractions of c-hadron decays. These adjustments result in very small shifts (well below 1%) and, given the uncertainties of the measurements, negligible uncertainties on the ratio of tagging efficiencies for jets with semileptonic and inclusive b-hadron decays. The effect of gluon splitting to bb also has a direct impact on the b-jet tagging efficiency as predicted by the simulation. The variation mentioned before (assigning weights of zero or two) results in a small change of the b-jet tagging efficiency in simulation, with an almost linear dependence on jet p T . The charged particle multiplicity spectrum of b-hadron decays has a direct impact on the b-jet tagging efficiency, and variations of the charged particle multiplicity spectrum of hadronic b-hadron decays do not cancel in the ratio of relative tagging efficiencies.
Since no dedicated measurements of the charged particle multiplicity spectrum of (hadronic) b-hadron decays for an admixture of b hadrons as present in high energy collisions are available,⁴ the potential effect has been studied by adjusting the charged particle spectrum to the one predicted by EvtGen [32].⁵ This is the dominant effect on the extrapolation to inclusive b jets; the impact from these variations on the ratio of the relative differences in b-jet tagging efficiencies between semileptonic and hadronic b jets is typically a few percent (depending on the scenario). The jet p T dependence of this effect is negligible. Since these studies do not show a significant jet p T dependence, the uncertainty of 4% from the measurements of data-to-simulation scale factors in jets with and without muons is applied to all jet p T bins.

Results
The b-jet tagging efficiency measured in data using the p rel T and system8 methods, the corresponding values from simulation and the resulting data-to-simulation scale factors for the MV1 tagging algorithm at 70% efficiency are shown in figure 25. As the jets selected for the p rel T and system8 measurements are different, the fractions of b-tagged jets are not necessarily equal. On average the scale factor is about 0.95, which is typically one standard deviation lower than unity, and has no strong dependence on p T . It has to be kept in mind that the dominant systematic uncertainty from the extrapolation of the scale factors to an inclusive b-jet sample is fully correlated between p T bins.
⁴ Only the total number of charged particles in b-hadron decays has been measured [30], but not their spectrum. Event-wise charged particle multiplicity spectra of Υ(4S) decays, corresponding to an admixture of B⁺B⁻ and B⁰B̄⁰ meson pairs, have been measured in ref. [31].
⁵ Other scenarios have also been studied for cross-checks.

b-jet tagging efficiency calibration using tt-based methods
The calibration methods described in the previous sections are based on dijet events. At the LHC, the large tt production cross section provides an alternative source of events enriched in b jets. With the large integrated luminosity of 4.7 fb −1 collected during 2011, the methods based on tt selections have become competitive for the first time with the muon-based b-jet calibration methods described in section 8. Compared to the muon-based methods, the tt-based methods provide a b-tagging calibration measurement in an inclusive jet sample rather than a sample of semileptonic b jets and cover a larger range in p T .
In the following, four calibration methods are presented. The tag counting method fits the multiplicity of b-tagged jets in tt candidate events while the kinematic selection method measures the b-tagging rate of the leading jets in the tt signal sample. The kinematic fit method uses a fit of the tt event topology to extract a highly purified sample of b jets from which the b-jet tagging efficiency is obtained. Finally, the combinatorial likelihood method improves the precision offered by the kinematic selection method by exploiting the kinematic correlations between the jets in the event.
The kinematic selection method is applied to both the single-lepton and dilepton decay channels, whereas the tag counting method is presented only in the single-lepton channel. By construction the kinematic fit method is restricted to the single-lepton channel. The combinatorial likelihood method is applied only to the dilepton channel.

Simulation samples, event selections and background estimates
The $t\bar{t}$ signal is simulated using MC@NLO interfaced to HERWIG (POWHEG [18,19] interfaced to PYTHIA [9] in the case of the combinatorial likelihood analysis) with the mass of the top quark set to 172.5 GeV. The cross section is normalised to the approximate NNLO calculation from HATHOR 1.2 [33] using the MSTW 2008 90% parton distribution function (PDF) sets [34], incorporating PDF+$\alpha_S$ uncertainties according to the MSTW prescription [35], and cross-checked with the approximate NNLO calculation of ref. [36] as implemented in Top++ 1.0 [37].
For the main backgrounds, which consist of W/Z boson production in association with multiple jets, ALPGEN v2.13 [38] is used, which implements the exact LO matrix elements for final states with up to six partons. Using the LO PDF set CTEQ6L1 [39], the following backgrounds are generated: W+jets events with up to five partons, Z/γ*+jets events with up to five partons, and diboson WW+jets, WZ+jets and ZZ+jets events. The MLM matching scheme [38,40] of the ALPGEN generator is used to remove overlaps between the n and n + 1 parton samples. In the combinatorial likelihood analysis, SHERPA v1.4.1 [41] with the CT10 PDF set [42] is used instead for both W+jets and Z+jets processes.
For all but the diboson processes, separate samples are generated that include $b\bar{b}$ quark pair production at the matrix element level. In addition, for the W+jets process, separate samples containing $Wc$+jets and $Wc\bar{c}$+jets events are produced. The same program employed for the generation of the $t\bar{t}$ signal (MC@NLO for most analyses, POWHEG for the combinatorial likelihood analysis) is used for the production of single-top s- and Wt-channel backgrounds. AcerMC is used for t-channel production. The uncertainty due to the choice of $t\bar{t}$ generator is evaluated by comparing the predictions of MC@NLO with those of POWHEG interfaced to HERWIG or PYTHIA.

Event selection
Events in the single-lepton and dilepton tt channels are triggered using a high-p T single-electron or single-muon trigger. In addition to the objects described in section 1 the tt analyses require isolated electrons and muons, as well as missing transverse momentum.
In all tt analyses, both in the single-lepton and dilepton channels, the b-jet tagging efficiency measurement is performed in a sample comprising all electron and muon combinations (e+jets and µ+jets or ee, µµ and eµ), including muons and electrons resulting from τ lepton decays.

Selection of the single-lepton sample
In the single-lepton channels (e+jets and µ+jets), the following event selection is applied:
• The appropriate single-electron trigger (with trigger thresholds at 20 or 22 GeV, depending on the data taking period) or single-muon trigger (with trigger threshold at 18 GeV) has fired.
• The event contains exactly one reconstructed lepton with p T > 25 GeV (e) or p T > 20 GeV (µ), matching the corresponding trigger object.
• The lepton must be isolated from any jet activity. Beyond requiring that the nearest jet must be separated by ∆R > 0.4, the summed calorimeter transverse energies deposited in a cone of size ∆R < 0.2 around electron directions must be less than 3.5 GeV; for muons this maximum value is 4 GeV, in a cone of size ∆R < 0.2, and in addition the summed track p T in a cone of size ∆R < 0.3 must be less than 4 GeV. In all cases the contribution from the lepton itself is subtracted.
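• The missing transverse momentum [43] is required to be $E_{\mathrm{T}}^{\mathrm{miss}} > 30$ GeV ($E_{\mathrm{T}}^{\mathrm{miss}} > 20$ GeV) in the e+jets (µ+jets) channel, and a requirement is also placed on the transverse mass, $m_{\mathrm{T}}(l\nu) = \sqrt{2\, p_{\mathrm{T}}^{l}\, E_{\mathrm{T}}^{\mathrm{miss}}\, (1 - \cos\Delta\phi)}$, where $\Delta\phi$ is the azimuthal angular difference between the directions of the selected lepton and the missing transverse momentum. These cuts reduce the contribution from the multijet background.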
• The event is required to have at least four jets with p T > 25 GeV, |η| < 2.5 and (if tracks are associated with the jet) JVF > 0.75.
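The kinematic quantities used in this selection can be illustrated with a minimal Python sketch. It assumes the standard transverse-mass definition and a hypothetical jet representation (a dict with `pt`, `eta` and `jvf` keys, with `jvf` set to `None` when no tracks are associated with the jet); none of these names come from the analysis software.

```python
import math


def transverse_mass(lep_pt, met, dphi):
    """Transverse mass in GeV from the lepton pT, the missing transverse
    momentum and their azimuthal separation (standard definition, assumed)."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))


def n_selected_jets(jets, pt_min=25.0, eta_max=2.5, jvf_min=0.75):
    """Count jets with pT > pt_min, |eta| < eta_max and, if tracks are
    associated with the jet, JVF > jvf_min."""
    count = 0
    for jet in jets:
        if jet["pt"] <= pt_min or abs(jet["eta"]) >= eta_max:
            continue
        # The JVF cut only applies when the jet has associated tracks.
        if jet["jvf"] is not None and jet["jvf"] <= jvf_min:
            continue
        count += 1
    return count
```

An event would pass the jet requirement of the single-lepton selection if `n_selected_jets(jets) >= 4`.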

Background estimation in the single-lepton channel
The background in the single-lepton channel is expected to be around 30%. The dominant background arises from W boson production with associated jets (W +jets). Its estimate is based on the prediction from Monte Carlo simulation, corrected with scale factors derived directly from data. The correction of the overall normalisation is obtained with a charge asymmetry method [44]. The flavour composition of the W +jets sample is measured with a tag counting method [45], which provides scale factors for W bb/cc+jets, W c+jets and W with light-flavour jets events used to correct Monte Carlo simulation predictions. The second most important contribution to the background comes from multijet production and is measured directly in data using the matrix method. This method relies on finding a relationship between events with real and non-prompt or fake leptons, as described in ref. [46]. Estimates of other backgrounds processes such as single top, diboson and Z+jets production are obtained from Monte Carlo simulation. Figure 26 shows the jet multiplicity and jet p T distributions for data and simulated events. These distributions are sensitive to a correct description of the multijet and W +jets backgrounds, and they show a good agreement between the predicted background and signal contributions and data.

Selection of the dilepton sample
A very pure sample of $t\bar{t}$ events with dileptonic decays (ee, µµ and eµ) can be obtained with the following event selection criteria:
• A single-electron trigger (trigger threshold at 20 or 22 GeV depending on the data taking period) or single-muon trigger (trigger threshold at 18 GeV) has fired.
• The event contains exactly two oppositely charged leptons (ee, µµ or eµ), with the electron candidate satisfying p T > 25 GeV, and the muon candidate p T > 20 GeV. At least one of these must be associated with a lepton trigger object, and both leptons must satisfy the isolation requirements described in section 9.1.2.
• The event contains at least two jets with p T > 25 GeV, |η| < 2.5 and (if tracks are associated with the jet) JVF > 0.75. The jet p T threshold is lowered to 20 GeV for the combinatorial likelihood method.
• In the ee and µµ channels, to suppress backgrounds from Z+jets and multijet events the missing transverse momentum must satisfy E miss T > 60 GeV, and the invariant mass of the two leptons must differ by at least 10 GeV from the Z boson mass (Z mass veto): |m − m Z | > 10 GeV. To suppress backgrounds from Υ and J/ψ decays, a low mass cut of m > 15 GeV is applied.
• In the eµ channel, no $E_{\mathrm{T}}^{\mathrm{miss}}$ or Z boson mass veto cuts are applied. However, for all analyses except the combinatorial likelihood one, the scalar sum of the transverse momenta of the jets

Background estimation in the dilepton channel
The dilepton channel has a purity of 80%-85% depending on the lepton flavours. The dominant background contains non-prompt or fake leptons: either electron-like jets reconstructed as electrons, or non-prompt leptons from the decay of a heavy-flavour hadron within a jet. This background receives contributions from W boson production with associated jets, s-channel, t-channel and Wt single top production, the single-lepton decay of $t\bar{t}$ pairs and multijet events.
In most analyses, this background is estimated directly from data with a matrix method [46] for each of the three channels separately; and all background processes leading to two prompt leptons (diboson, Z+jets and single top in the Wt-channel) are directly taken from the simulation. In the combinatorial likelihood analysis, the non-prompt and fake lepton background is instead estimated from a sample where both leptons have the same charge signs, and for which residual contributions predicted from simulation are corrected. The Z+jets background normalisation for each jet multiplicity bin in the eµ channel is estimated using a data/MC normalisation factor obtained from corresponding ee and µµ data samples with |m − m Z | < 10 GeV; in the combined ee and µµ data samples it is estimated from a sample where the suppression of the Z resonance is similarly replaced with a requirement |m − m Z | < 10 GeV. Figure 27 shows the jet multiplicity and jet p T distributions for data and simulated events. A good agreement between data and simulation can be seen.

Tag counting method
The tag counting method makes use of the fact that, since the branching fraction of $t \to Wb$ in the Standard Model is very close to unity, each $t\bar{t}$ event is expected to contain exactly two real b jets. If there were no other sources of b jets and if only b jets were b-tagged, the expected number of events with two b-tagged jets would be $\varepsilon_b^2 N_{\mathrm{sig}}$, while the number of events with one b-tagged jet would be $2\varepsilon_b(1-\varepsilon_b)N_{\mathrm{sig}}$, where $N_{\mathrm{sig}}$ is the number of $t\bar{t}$ signal events and $\varepsilon_b$ is the b-jet tagging efficiency.
In reality, the number of reconstructed (or tagged) b jets in a tt event will not necessarily be equal to two, since the b jets from the top quark decays can be outside the detector acceptance, and additional b jets can be produced through gluon splitting. Moreover, c jets and light-flavour jets, which come from the hadronic W -boson decay or initial or final state radiation, can be tagged as b jets. These effects are taken into account by evaluating the expected fractions, F i jk , of events containing i b jets, j c jets and k light-flavour jets that pass the event selection. The F i jk fractions are estimated from Monte Carlo simulation and are derived separately for the tt signal and the various background processes. The expected number of events with n b-tagged jets is calculated as the sum of all these contributions. The b-jet tagging efficiency can be extracted by fitting the expected event counts to the observed counts.
The expected number of events with n b-tagged jets, $\langle N_n \rangle$, is calculated as

$$\langle N_n \rangle = \sum_{i,j,k} \left\{ \left( \sigma_{t\bar{t}} \cdot \mathrm{BF} \cdot A_{t\bar{t}} \cdot L \cdot F^{t\bar{t}}_{ijk} + N_{\mathrm{bkg}} \cdot F^{\mathrm{bkg}}_{ijk} \right) \times \sum_{i'+j'+k'=n} \binom{i}{i'} \varepsilon_b^{i'} (1-\varepsilon_b)^{i-i'} \binom{j}{j'} \varepsilon_c^{j'} (1-\varepsilon_c)^{j-j'} \binom{k}{k'} \varepsilon_l^{k'} (1-\varepsilon_l)^{k-k'} \right\}, \qquad (9.1)$$

where i, j and k (i′, j′ and k′) represent the number of pre-tagged (tagged) b, c, and light-flavour jets. $F_{ijk}$ is the fraction of events containing i b jets, j c jets and k light-flavour jets before any tagging requirement is applied in each $p_{\mathrm{T}}$ bin. BF is the branching fraction to each final state, $A_{t\bar{t}}$ is the event selection efficiency for that particular final state, and L is the integrated luminosity. The binomial coefficients account for the number of arrangements in which the n tags can be distributed. The efficiencies to mistag a c jet or light-flavour jet as a b jet, $\varepsilon_c$ and $\varepsilon_l$ respectively, are fixed to the values found in Monte Carlo simulation, but with data-driven scale factors applied as obtained with the methods described in sections 13 and 14. $N_{\mathrm{bkg}}$ is the number of background events. To apply the method as a function of $p_{\mathrm{T}}$, the $F_{ijk}$ fractions are computed in $p_{\mathrm{T}}$ bins using only the jets in each event that fall in a given $p_{\mathrm{T}}$ bin. For both signal and background the dominant fraction is $F_{000}$, which occurs when no jets fall in that particular $p_{\mathrm{T}}$ bin. Since a single event can contribute to several $p_{\mathrm{T}}$ bins, this method maximises the use of the available jets in the sample.
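The counting logic described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the analysis code: a single event class with a given total yield is used, the flavour fractions are passed as a hypothetical dict mapping `(i, j, k)` to $F_{ijk}$, and all function names are invented for the example.

```python
from math import comb


def binomial_pmf(n_jets, n_tagged, eff):
    """Probability that n_tagged out of n_jets jets are tagged."""
    return comb(n_jets, n_tagged) * eff**n_tagged * (1.0 - eff) ** (n_jets - n_tagged)


def tag_multiplicity_probs(n_b, n_c, n_l, eff_b, eff_c, eff_l):
    """Probability of each total tag multiplicity for an event with
    n_b b jets, n_c c jets and n_l light-flavour jets."""
    probs = [0.0] * (n_b + n_c + n_l + 1)
    for tb in range(n_b + 1):
        for tc in range(n_c + 1):
            for tl in range(n_l + 1):
                probs[tb + tc + tl] += (
                    binomial_pmf(n_b, tb, eff_b)
                    * binomial_pmf(n_c, tc, eff_c)
                    * binomial_pmf(n_l, tl, eff_l)
                )
    return probs


def expected_tag_counts(n_events, fractions, eff_b, eff_c, eff_l, max_tags):
    """Expected number of events with 0..max_tags tagged jets, given the
    total yield and the flavour fractions {(i, j, k): F_ijk}."""
    expected = [0.0] * (max_tags + 1)
    for (i, j, k), frac in fractions.items():
        probs = tag_multiplicity_probs(i, j, k, eff_b, eff_c, eff_l)
        for n_tags, prob in enumerate(probs):
            if n_tags <= max_tags:
                expected[n_tags] += n_events * frac * prob
    return expected
```

In the idealised case where every event contains exactly two b jets and nothing else is tagged, `expected_tag_counts(1000, {(2, 0, 0): 1.0}, 0.7, 0.0, 0.0, 2)` reproduces the simple $\varepsilon_b^2 N_{\mathrm{sig}}$ and $2\varepsilon_b(1-\varepsilon_b)N_{\mathrm{sig}}$ expectations.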
The 0-tag bin is dominated in the single-lepton channel by inclusive jet and W +jets backgrounds and is therefore not included in the fit. The inclusive jet background is subtracted from the n-tag distribution prior to performing the fit since the F i jk fractions cannot be estimated reliably from Monte Carlo simulation. For the remaining background processes, dominated by W +jets, F

The extraction of parameters in eq. (9.1) from the data is performed using a likelihood fit. The likelihood function used is

$$\mathcal{L} = \mathrm{Gaus}\left(\sigma_{t\bar{t}} \,|\, \sigma_{t\bar{t},\mathrm{MC}}, \delta_{\sigma_{t\bar{t},\mathrm{MC}}}\right)\, \mathrm{Gaus}\left(N_{\mathrm{bkg}} \,|\, N_{\mathrm{bkg,MC}}, \delta_{N_{\mathrm{bkg}}}\right) \prod_{n\text{-tags}} \mathrm{Pois}\left(N_n \,|\, \langle N_n \rangle\right). \qquad (9.2)$$

The number of events in each n-tag bin is described by a Poisson probability with an average value corresponding to the number of expected events. The $t\bar{t}$ cross section and $N_{\mathrm{bkg}}$ are floating parameters of the fit, but each is constrained by a Gaussian distribution with a width of one standard deviation of the respective normalisation uncertainty. The uncertainty introduced by the Monte Carlo simulation statistics has been estimated from the uncertainties on the $F_{ijk}$ fractions and is found to be negligible.
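The structure of such a constrained fit, a product of Poisson terms for the tag-multiplicity bins and Gaussian constraints on the two normalisations, can be sketched as a negative log-likelihood in Python. The function and argument names are hypothetical, and the expected counts are assumed to have been computed beforehand for the current parameter values; a minimiser of the user's choice would then vary the parameters.

```python
import math


def poisson_nll(observed, expected):
    """Negative log of the Poisson probability of `observed` given `expected`."""
    return expected - observed * math.log(expected) + math.lgamma(observed + 1.0)


def gaussian_nll(value, mean, sigma):
    """Negative log of a Gaussian density with the given mean and width."""
    return 0.5 * ((value - mean) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def tag_counting_nll(observed_counts, expected_counts,
                     xsec, xsec_mc, d_xsec,
                     n_bkg, n_bkg_mc, d_n_bkg):
    """Negative log-likelihood: Gaussian constraints on the signal cross
    section and background yield, times Poisson terms for each n-tag bin."""
    nll = gaussian_nll(xsec, xsec_mc, d_xsec)
    nll += gaussian_nll(n_bkg, n_bkg_mc, d_n_bkg)
    for n_obs, n_exp in zip(observed_counts, expected_counts):
        nll += poisson_nll(n_obs, n_exp)
    return nll
```

Working with the negative logarithm turns the product of terms into a sum, which is numerically more stable when the event counts are large.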

Kinematic selection method
The kinematic selection method relies on the knowledge of the flavour composition of the $t\bar{t}$ signal and background samples, and extracts the b-jet tagging efficiency by measuring the fraction of b-tagged jets in data. Given an expected number of b, c and light-flavour jets, as well as the c- and light-flavour-jet mistag efficiencies, the fraction of b-tagged jets in data is given by

$$f_{b\text{-tag}} = \varepsilon_b f_b + \varepsilon_c f_c + \varepsilon_l f_l + \varepsilon_{\mathrm{fake/np}} f_{\mathrm{fake/np}},$$

which can be rearranged to solve for the b-jet tagging efficiency, $\varepsilon_b$:

$$\varepsilon_b = \frac{1}{f_b} \left( f_{b\text{-tag}} - \varepsilon_c f_c - \varepsilon_l f_l - \varepsilon_{\mathrm{fake/np}} f_{\mathrm{fake/np}} \right).$$

Here, $f_b$, $f_c$ and $f_l$ are the expected fractions of b, c, and light-flavour jets in data, which are estimated from simulated events. $\varepsilon_c$ and $\varepsilon_l$ are the mistag efficiencies for c and light-flavour jets to be tagged as b jets, which are taken from Monte Carlo simulation but with data-driven scale factors applied, obtained with the methods described in sections 13 and 14. $f_{\mathrm{fake/np}}$ is the fraction of jets from the non-prompt or fake lepton (in the dilepton channel) or inclusive jet (in the single-lepton channel) background and is determined from data. The flavour fractions are calculated with respect to the sum of jets from Monte Carlo simulation and obey the relation

$$f_b + f_c + f_l + f_{\mathrm{fake/np}} = 1.$$

The flavour composition of the jet sample obtained after applying a dilepton selection is shown in figure 29, binned in both $p_{\mathrm{T}}$ and η. The expected fraction of b-tagged non-prompt lepton, fake lepton or inclusive jet events, $\varepsilon_{\mathrm{fake/np}}$, is estimated from data, as detailed below.
The inclusive jet mistag rate in the single-lepton channel, $\varepsilon_{\mathrm{fake/np}}$, is measured in a data control region enriched in inclusive jet events. The control region is obtained by reversing the $E_{\mathrm{T}}^{\mathrm{miss}}$ and $m_{\mathrm{T}}(l\nu)$ selection criteria:
• e+jets channel: 5 GeV < $E_{\mathrm{T}}^{\mathrm{miss}}$ < 30 GeV and $m_{\mathrm{T}}(l\nu)$ < 25 GeV.
Moreover, the leptons in the control region are only required to satisfy looser selection criteria (so-called loose leptons), following ref. [46]. Loose muons are not required to fulfil any isolation criteria, while the isolation criterion for loose electrons is less strict than that used in the baseline event selection. The predicted contributions of the $t\bar{t}$, single top, diboson, W+jets and Z+jets processes, obtained from Monte Carlo simulation, are subtracted from the events measured in the control region in data.
In the dilepton channel, the fraction of b-tagged jets coming from the non-prompt or fake lepton background ε fake/np is determined from events in which both leptons have the same charge. The remaining event selection criteria are required to be satisfied. Since it is expected that neither the dileptonic tt decay nor the background processes Z+jets or single top produce same sign events (the contribution from events with a wrongly measured electron charge is expected to be small), a sample is obtained that is dominated by events having at least one non-prompt or fake lepton.
To increase the purity in the single-lepton channel, in addition to the selection described in section 9.1.2, the events are also required to have at least one jet b-tagged with the MV1 tagging algorithm at an operating point that corresponds to an efficiency of 70%. Based on which jet is b-tagged, the single-lepton sample is split into two sub-samples in the following way: • If the leading jet is b-tagged, the b-tagging rate of the next three jets is measured (L234 sample).
• If the next-to-leading jet is b-tagged, the b-tagging rate of the leading jet is measured (L1 sample).
Subsequently, jets are divided in bins of p T , in which the number of b-tagged jets from each selection is counted. To calculate the b-jet tagging efficiency, the combined L1 and L234 sample is used.
In the dilepton channel, the b-jet fraction of the sample is increased by using only the two leading jets in each event, as this reduces the contamination of c and light-flavour jets originating from initial and final state gluon radiation.
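The extraction step of the kinematic selection method amounts to simple arithmetic once the flavour fractions and mistag rates are known. The Python sketch below assumes that the tagged-jet fraction is the fraction-weighted sum of the per-flavour tagging rates, including the fake or non-prompt component; the function name and arguments are hypothetical.

```python
def b_tag_efficiency(f_tag, f_b, f_c, f_l, f_fake, eff_c, eff_l, eff_fake):
    """Solve the tagged-jet fraction for the b-jet tagging efficiency.

    f_tag            : fraction of b-tagged jets measured in data
    f_b, f_c, f_l    : expected b, c and light-flavour jet fractions
    f_fake           : fraction of jets from the fake/non-prompt background
    eff_c, eff_l     : c-jet and light-flavour-jet mistag efficiencies
    eff_fake         : tagging rate of the fake/non-prompt component
    """
    if f_b <= 0.0:
        raise ValueError("the b-jet fraction must be positive")
    non_b_tags = eff_c * f_c + eff_l * f_l + eff_fake * f_fake
    return (f_tag - non_b_tags) / f_b
```

Because the non-b contributions are subtracted and the result is divided by $f_b$, the relative impact of the mistag rates and flavour fractions shrinks as the b-jet purity of the sample increases, which is the motivation for the purity-enhancing requirements described above.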

Kinematic fit method
The kinematic fit method is based on the selection of a high-purity b-jet sample by applying a kinematic fit, similar to that described in ref. [47], to the events passing the selection described in section 9.1. The kinematic fit performed on the single-lepton tt event topology provides a mapping between the reconstructed jets, the lepton, and the missing transverse momentum onto the b jets originating directly from the top quark decays and the jets (leptons) from the subsequent hadronic (leptonic) W -boson decay. The kinematic fit exploits the masses of the two top quarks and W bosons as constraints, leading to four constraints in total with one unmeasured parameter resulting in three degrees of freedom. The fit, which is based on a χ 2 -minimisation method, is performed on all permutations of the six highest-p T jets, and the permutation with the lowest χ 2 is retained.
The b-jet tagging efficiency is measured with the jet assigned by the fit to be the b jet on the leptonic side of the event (i.e., associated with the leptonic W -boson decay). Choosing the lowest χ 2 permutation of the kinematic fit selects the correct jet association in about 60% of the tt events.
In addition to the combinatorial background the sample still contains backgrounds from other processes, such as single top and W+jets events. Nevertheless, the full b-jet weight distribution can be obtained from data by using a statistical background subtraction. This subtraction is done by dividing the sample into two orthogonal sub-samples based on the information about the jets associated to the hadronic side of the event (i.e., associated with the hadronic W-boson decay): the first sub-sample ("signal sample") is selected by applying additional cuts to increase the fraction of correct mappings, while the second sub-sample ("background sample") is enriched in incorrect mappings. The additional cuts applied to the signal sample are:
• The jet identified by the kinematic fit to be the b jet on the hadronic side of the event needs to be b-tagged by the MV1 tagging algorithm at the 70% efficiency operating point. This is applied to suppress the W+jets events and incorrect permutations.
• The jets associated with the hadronic W -boson decay must not be b-tagged by the MV1 b-tagging algorithm at the 70% efficiency operating point.
• Only events with six or fewer jets with p T > 25 GeV are considered.
The background sample is instead defined by removing the b-tagging requirement on the hadronic-side b jet and the jet multiplicity requirement, and inverting the b-tagging veto on the jets associated to the W-boson decay:
• At least one of the jets assigned by the kinematic fit to the hadronic W-boson decay is required to be b-tagged by the MV1 tagging algorithm at the 70% efficiency operating point.
To verify that the signal sample is enriched in correct mappings at low values of fit χ 2 , while the background sample is dominated by incorrect mappings at all values of fit χ 2 , a truth-match based on a ∆R cut to the original partons of the hard interactions is performed in Monte Carlo simulation. Here, groupings of partons, hadron-level jets and reconstructed jets are chosen in a way that minimises the sum of their respective distances in the η − φ plane. Such a triplet is considered to be matched if the respective sum of the three distances in the η − φ-plane passes the requirement ∆R(parton, hadron) + ∆R(reco, hadron) + ∆R(reco, parton) < 0.5. Due to e.g. unreconstructed jets, it will not always be possible to define the above triplets, and thus a fraction of the events will remain unmatched. The unmatched mappings remain in the analysis: in Monte Carlo simulations the truth b jets are taken into account independently of their matching status.
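The matching criterion for a (parton, hadron-level jet, reconstructed jet) triplet can be written compactly in Python. In this sketch each object is represented simply as an `(eta, phi)` tuple, which is an assumption made for illustration; the azimuthal difference is wrapped into the range [−π, π) before the distance is computed.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with the azimuthal difference wrapped."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def is_matched(parton, hadron_jet, reco_jet, max_sum=0.5):
    """Apply the requirement on the sum of the three pairwise distances.
    Each argument is an (eta, phi) tuple."""
    total = (
        delta_r(*parton, *hadron_jet)
        + delta_r(*reco_jet, *hadron_jet)
        + delta_r(*reco_jet, *parton)
    )
    return total < max_sum
```

Choosing the grouping that minimises this sum, as described in the text, would require looping over candidate triplets and keeping the smallest total; that step is omitted here.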
The χ 2 distributions of both the signal and background samples are shown in figure 30, together with the result of the truth-match. As desired, the signal sample has a sizable fraction of correct mappings, while the background sample almost exclusively is made up of unmatched or incorrect mappings. Furthermore, the correct mappings predominantly have low χ 2 values, while the high χ 2 -region is fully dominated by incorrect and unmatched mappings.
The remaining background of incorrect mappings in the signal sample selected from data can therefore be estimated from the background sample. As events with high values of $\chi^2$ are predominantly incorrect mappings in both sub-samples, the background sample prediction can be normalised at high $\chi^2$ values ($\chi^2 > 25$), by using the scale factor

$$S_{\mathrm{BG}} = \frac{N_{\mathrm{signal}}(\chi^2 > 25)}{N_{\mathrm{background}}(\chi^2 > 25)}, \qquad (9.5)$$

where $N_{\mathrm{signal}}$ and $N_{\mathrm{background}}$ are the numbers of events in the signal and background samples, respectively. The background-subtracted b-tag weight distribution of the b jet from the leptonic decay in the signal sample, from which the b-jet tagging efficiency is eventually extracted, is subsequently derived by subtracting the b-tag weight distribution in the background sample, scaled according to eq. (9.5).
For the background subtraction method to work correctly it is imperative that the shape of the χ 2 distribution of the non-b portion of the background sample agrees with that of the non-b portion of the signal sample. This has been found to be the case for all the b-tagging algorithms tested.
The measurement of the b-jet tagging efficiency is based on the b-tag weight distribution of the sample of b jets on the leptonic side of the event. An important advantage of this method is that a continuous calibration of the b-tag weight distribution is feasible, as the full distribution is reconstructed. The b-jet tagging efficiency for a given operating point, corresponding to a certain weight cut $w_{\mathrm{cut}}$, can be calculated from the weight distribution $T(w)$ of the selected b-jet sample after the background subtraction, normalised to unity, by integrating above the threshold $w_{\mathrm{cut}}$:

$$\varepsilon_b(w_{\mathrm{cut}}) = \int_{w_{\mathrm{cut}}}^{\infty} T(w')\, \mathrm{d}w'. \qquad (9.6)$$

Depending on the available statistics, the measurement of the b-jet tagging efficiency can be binned in any variable, for example $p_{\mathrm{T}}$ or η. The complete sequence of calibration steps for the MV1 b-tagging algorithm for jets with 25 GeV < $p_{\mathrm{T}}$ < 200 GeV is presented in figures 31 and 32. After scaling the background sample (figure 31) the prescription results in a background-subtracted distribution of the MV1 weight (figure 32). Using eq. (9.6) the efficiency is then derived. The method applied to simulated events ("expected") is shown to describe the distribution obtained from the sample of true b jets in Monte Carlo simulated events. The ratio displayed is the efficiency measured in data divided by the efficiency calculated from true b jets in simulated events.
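The sequence of steps (sideband normalisation, subtraction, integration above the cut) can be sketched with plain Python lists standing in for binned b-tag weight histograms. The names are hypothetical, and the sketch assumes that the weight cut coincides with a bin edge so that the integral reduces to a sum over bins.

```python
def sideband_scale(n_signal_high_chi2, n_background_high_chi2):
    """Normalisation of the background sample to the signal sample,
    taken from the high-chi2 region where incorrect mappings dominate."""
    return n_signal_high_chi2 / n_background_high_chi2


def subtract_background(signal_hist, background_hist, scale):
    """Bin-by-bin subtraction of the scaled background-sample histogram."""
    return [s - scale * b for s, b in zip(signal_hist, background_hist)]


def efficiency_above_cut(weight_hist, first_bin_above_cut):
    """Fraction of the (background-subtracted) distribution above the cut."""
    total = sum(weight_hist)
    return sum(weight_hist[first_bin_above_cut:]) / total


# Toy example with three weight bins; the cut sits at the lower edge of bin 2.
scale = sideband_scale(50.0, 100.0)
subtracted = subtract_background([30.0, 20.0, 50.0], [20.0, 10.0, 10.0], scale)
efficiency = efficiency_above_cut(subtracted, 2)
```

Dividing by the total of the subtracted histogram plays the role of normalising the weight distribution to unity before integrating.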

Combinatorial likelihood method
The combinatorial likelihood method is intended to increase the precision by exploiting the kinematic correlations between the jets in the dilepton sample. Like the kinematic selection method, this method relies on an a priori knowledge of the flavour composition of the $t\bar{t}$ signal and background samples.
Events with either two or three jets selected using the criteria detailed in section 9.1.4 are used in the analysis. The analysis is carried out separately for two-and three-jet events, and for the eµ and combined ee and µµ channels. In detail, the b-jet tagging efficiency determination in each of the resulting four channels uses an unbinned maximum likelihood fit.
In the two-jet case the following per-event likelihood function is adopted:

$$\mathcal{L}(p_{\mathrm{T},1}, p_{\mathrm{T},2}, w_1, w_2) = \frac{1}{2} \sum_{(i,k)} \Big[ f_{bb}\, P_{bb}(p_{\mathrm{T},i}, p_{\mathrm{T},k})\, P_b(w_i|p_{\mathrm{T},i})\, P_b(w_k|p_{\mathrm{T},k}) + f_{bj}\, P_{bj}(p_{\mathrm{T},i}, p_{\mathrm{T},k})\, P_b(w_i|p_{\mathrm{T},i})\, P_j(w_k|p_{\mathrm{T},k}) + f_{jj}\, P_{jj}(p_{\mathrm{T},i}, p_{\mathrm{T},k})\, P_j(w_i|p_{\mathrm{T},i})\, P_j(w_k|p_{\mathrm{T},k}) \Big], \qquad (9.7)$$

where:
• the indices (i, k) run over (1, 2) and (2, 1);
• $f_{bb}$ and $f_{bj}$ are the two independent jet flavour fractions, and $f_{jj} = 1 - f_{bb} - f_{bj}$;
• $P_f(w|p_{\mathrm{T}})$ is the PDF (probability density function) for the b-tagging discriminant or weight for a jet of flavour f, for a given transverse momentum;6
• and $P_{f_1 f_2}(p_{\mathrm{T},1}, p_{\mathrm{T},2})$ is the two-dimensional PDF for $[p_{\mathrm{T},1}, p_{\mathrm{T},2}]$ for the flavour combination $[f_1, f_2]$.
All PDFs are implemented as binned histograms. For example, for N $p_{\mathrm{T}}$ bins, $P_{f_1 f_2}(p_{\mathrm{T},1}, p_{\mathrm{T},2})$ is expressed as an N × N binned histogram. For the symmetric bb and jj combinations, the PDF is symmetrised, reducing the number of independent bins to determine from $N^2 - 1$ to $N(N+1)/2 - 1$, which reduces the statistical fluctuations from the Monte Carlo simulation; as a consequence, the explicit symmetrisation expressed by eq. (9.7) for these combinations is for notational convenience only. The flavour PDFs $P_f(w|p_{\mathrm{T}})$ are defined in a similar way, with one binned histogram for each $p_{\mathrm{T}}$ bin. All PDFs are determined from simulation, except for the b-jet weight PDF, which contains the information to be extracted from the data.
A histogram with only two bins is used to describe the b-weight PDF for each $p_{\mathrm{T}}$ bin, with the bin above the cut value corresponding to the b-jet tagging efficiency. The b-jet tagging efficiency then corresponds to

$$\varepsilon_b(p_{\mathrm{T}}) = \int_{w_{\mathrm{cut}}}^{\infty} P_b(w'|p_{\mathrm{T}})\, \mathrm{d}w'.$$

The likelihood function distinguishes between the different flavour fractions, but not between signal and background processes. To extract $P_b(w|p_{\mathrm{T}})$ in bins of $p_{\mathrm{T}}$, the flavour fractions $f_{f_1 f_2}$, the $P_{f_1 f_2}(p_{\mathrm{T},1}, p_{\mathrm{T},2})$ and the non-b-jet b-weight PDFs are determined from simulation.
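A per-event likelihood of this kind can be evaluated with a few lines of Python once all PDFs are stored as binned histograms. The sketch below is only an illustration under explicit assumptions: the weight PDF of every flavour is reduced to two bins (tagged or not tagged), so that it is fully described by a per-$p_{\mathrm{T}}$-bin efficiency; the sum over the two jet orderings is normalised by a factor 1/2; and all container layouts and names are hypothetical.

```python
import bisect


def find_bin(edges, value):
    """Index of the histogram bin containing value (edges sorted ascending);
    values outside the range are assigned to the first or last bin."""
    index = bisect.bisect_right(edges, value) - 1
    return min(max(index, 0), len(edges) - 2)


def two_jet_likelihood(pts, tagged, pt_edges, fractions, pt_pdfs, efficiencies):
    """Per-event likelihood for a two-jet event.

    pts          : (pT of jet 1, pT of jet 2)
    tagged       : (bool, bool), whether each jet passes the weight cut
    fractions    : {'bb': f_bb, 'bj': f_bj}; f_jj is derived
    pt_pdfs      : {'bb': NxN list, 'bj': NxN list, 'jj': NxN list}
    efficiencies : {'b': [eff per pT bin], 'j': [eff per pT bin]}
    """
    bins = [find_bin(pt_edges, pt) for pt in pts]
    frac = dict(fractions)
    frac["jj"] = 1.0 - frac["bb"] - frac["bj"]

    def weight_prob(flavour, jet):
        # Two-bin weight PDF: probability to be above or below the cut.
        eff = efficiencies[flavour][bins[jet]]
        return eff if tagged[jet] else 1.0 - eff

    total = 0.0
    for i, k in ((0, 1), (1, 0)):
        for combo in ("bb", "bj", "jj"):
            total += (
                frac[combo]
                * pt_pdfs[combo][bins[i]][bins[k]]
                * weight_prob(combo[0], i)
                * weight_prob(combo[1], k)
            )
    return 0.5 * total
```

In a fit, the entries of `efficiencies['b']` would be the free parameters, and the negative logarithm of this function would be summed over all selected events.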
A slightly more complex likelihood function is defined for the three-jet case, which is conceptually analogous but needs to take into account that the number of jet flavour combinations increases to four. Accordingly, there are up to 3! = 6 equivalent jet combinations that the likelihood needs to be summed over, as they are a priori indistinguishable in data. In this case the indices (i, k, l) run over all possible permutations of the three jets, and the various PDFs are defined in a similar way to the two-jet case.
6 This means that, regardless of the jet $p_{\mathrm{T}}$, the integral of the PDF over the b-tagging weight variable is one.
In order to simplify the determination of $P_{f_1 f_2 f_3}(p_{\mathrm{T},1}, p_{\mathrm{T},2}, p_{\mathrm{T},3})$ from simulation, which otherwise requires prohibitive simulation statistics, a factorisation assumption is made for this PDF. The effect of this approximation was tested, along with the entire fitting method, using closure tests based on simulated events. In these tests the fit procedure is applied to the Monte Carlo simulation itself in order to check for possible biases. The tests have been performed for all four channels (eµ and ee+µµ, 2 and 3 jets) and for several different b-jet tagging efficiency points. These tests all yield efficiencies compatible with their known inputs within the statistical uncertainties, with systematic effects from non-closure found to be much less than 1%.

Systematic uncertainties
The tag counting and kinematic selection analyses share most systematic uncertainties, and many sources of uncertainty are also in common with the combinatorial likelihood analysis. One important class of common systematic uncertainties are those addressing how well the simulation models the tt production process, including generator dependence, initial state radiation, and heavy flavour fragmentation. The estimation of the contamination from the main background processes is another source of systematic uncertainties. Furthermore, common systematic uncertainties are those arising from the imperfect knowledge of the jet energy scale and resolution. The dominant systematic uncertainty in the kinematic fit method arises from uncertainties in the background subtraction.
All individual contributions to the systematic uncertainties are summarised in tables 3 through 7, in bins of jet p T and for the MV1 b-tagging algorithm at an operating point corresponding to a nominal efficiency of 70%. While the total uncertainties on the results of the tag counting and the kinematic selection methods are dominated by systematic uncertainties, the kinematic fit method is limited by data statistics. In the combinatorial likelihood method the statistical and systematic uncertainties are approximately equal in size. The estimates of some systematic uncertainties include a sizeable statistical component which leads to unphysical bin-to-bin variations in some cases. However, when the calibration results of several methods are combined (see section 10) these irregularities are smoothed out.

Initial and final state radiation
Initial and final state radiation (IFSR) directly affects the flavour composition of the $t\bar{t}$ events. The associated systematic uncertainty due to IFSR is estimated by studies using samples generated with AcerMC [48] interfaced to PYTHIA, and by varying the parameters controlling IFSR in a range consistent with experimental data [49,50]. In the combinatorial likelihood method, the IFSR parameters are also varied in single top Wt-channel events.

Table 3. Relative uncertainties (in %) for the tag counting method in the single lepton $t\bar{t}$ channel (e+jets and µ+jets combined), for the MV1 algorithm at an operating point corresponding to a 70% tagging efficiency. Negligibly small uncertainties are indicated by dashes.

Generator and fragmentation dependence
The baseline generator MC@NLO+HERWIG may not correctly predict the kinematic distributions of $t\bar{t}$ events, which may result in differences in the acceptance and flavour composition of selected events. A systematic uncertainty is assigned to the choice of Monte Carlo generator (Generator) by comparing the results produced with the baseline $t\bar{t}$ generator with those produced with events simulated with POWHEG+HERWIG. In the combinatorial likelihood method, which uses POWHEG+HERWIG as the baseline generator, a comparison between POWHEG+HERWIG and ALPGEN+HERWIG is done instead. Uncertainties in the fragmentation modelling (Fragmentation) are estimated by comparing results between events generated with POWHEG+HERWIG and those generated using POWHEG+PYTHIA.

Background normalisation
In all analyses the dominant backgrounds are estimated using data-driven techniques. In the single-lepton final state, the dominant background comes from W+jets production, and the normalisation of this background is varied by 13%, based on the consideration of the various scale factors used to correct the expectations derived from simulated events. In the dilepton final states the Z+jets normalisation uncertainty depends on the number of jets in the final state. An inclusive normalisation uncertainty of 4% is assumed, and following ref. [51], an additional term of 24% per jet is added in quadrature.
In the combinatorial likelihood method the normalisation of the Z+jets background is varied by 20 %. In the single-lepton analyses, where the Z+jets background is substantially smaller, it is normalised to the theoretical cross section and varied by 60% [46]. The multijet background in the kinematic selection measurement is varied by 50% in the e+jets channel, covering any differences in kinematic distributions arising from mismodelling of the multijet background. In the µ+jets channel, by comparing estimates based on two different control regions, the uncertainty on the multijet background normalisation can be reduced to 20%. The non-prompt or fake lepton background in the kinematic selection dilepton and combinatorial likelihood analyses is varied by 50%.
The single top and diboson backgrounds are normalised to their theoretical cross sections, and the corresponding uncertainties (8% for the single top Wt channel [52], 4% for the s- and t-channel single top production processes [53,54], and 5% for the diboson background) are accounted for in the analysis. In the combinatorial likelihood method, the relative $t\bar{t}$ to single top normalisation is varied by 25% in the two-jet bin and 35% in the three-jet bin, motivated by scale uncertainties and parton distribution function systematic variations.
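As a small worked example of the jet-multiplicity-dependent Z+jets normalisation uncertainty quoted above, the following Python sketch adds one 24% term per jet in quadrature to the 4% inclusive uncertainty. Reading "per jet" as one independent term for each jet in the final state is an interpretation made here for illustration, and the function name is hypothetical.

```python
import math


def zjets_normalisation_uncertainty(n_jets, inclusive=0.04, per_jet=0.24):
    """Relative Z+jets normalisation uncertainty for a final state with
    n_jets jets: inclusive term plus one per-jet term per jet, in quadrature."""
    return math.sqrt(inclusive**2 + n_jets * per_jet**2)
```

Under this reading the uncertainty grows with the square root of the number of jets, so final states with more jets carry a larger, but not linearly larger, normalisation uncertainty.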

Background flavour composition
The flavour composition of all background samples except W+jets and in some cases Z+jets is taken from simulation. No systematic uncertainty on the flavour composition for these samples is assigned. For the W+jets background the normalisations of heavy flavour (HF) events ($Wb\bar{b}$+jets, $Wc\bar{c}$+jets and $Wc$+jets) are varied within their uncertainties. Sources of systematic uncertainty that affect the HF scale factors in W+jets events often also affect the calibration methods described in this paper directly. Examples of such systematic uncertainties are the uncertainties on the $t\bar{t}$ cross section and W+jets normalisation. To account for such correlations, the impact of these uncertainties is evaluated coherently on all components of the analysis. In the combinatorial likelihood method the heavy flavour component of the Z+jets background is varied up and down by 100% and 50% respectively, which is conservative compared to the variations between data and simulated events observed in Z + b measurements [55].

Background modelling
In the combinatorial likelihood method, the uncertainty from the modelling of Z+jets is estimated by comparing events generated with SHERPA to those generated with ALPGEN. The impact of the modelling of diboson events is investigated by comparing ALPGEN to HERWIG. In the other analyses, this uncertainty is estimated to be negligible compared to the corresponding background normalisation and background flavour composition uncertainties, and is neglected.

Jet reconstruction efficiency, energy scale and resolution
The systematic uncertainty originating from the jet energy scale [6] is obtained by scaling the p T of each jet in the simulation up and down by the estimated uncertainty on the jet energy scale. In the combinatorial likelihood method, the independent components of the jet energy scale uncertainty are applied separately, as discussed in ref. [6]. The nominal jet energy resolution in Monte Carlo simulation and data are found to be compatible, but a systematic uncertainty is assigned to cover the effect of possible residual differences by smearing the jet energy in simulated events. The full difference from the nominal result is taken as the uncertainty. The jet reconstruction efficiency was derived using a tag-and-probe method in dijet events and found to be compatible with a measurement using simulated tt events. A systematic uncertainty is assigned to cover the effect of possible residual differences by randomly rejecting jets based on the measured jet reconstruction efficiency.
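The three jet-related variations described above can be illustrated with a minimal Python sketch. The function names, the callable `rel_unc` giving the relative jet energy scale uncertainty, and the parameters `extra_resolution` and `inefficiency` are hypothetical and are not taken from the analysis software; the sketch only shows the mechanics of scaling, smearing and random rejection.

```python
import random

def scale_jet_pt(jets_pt, rel_unc, direction):
    """Scale each jet pT up (direction=+1) or down (direction=-1)
    by its relative jet energy scale uncertainty rel_unc(pt)."""
    return [pt * (1.0 + direction * rel_unc(pt)) for pt in jets_pt]

def smear_jet_pt(jets_pt, extra_resolution, rng):
    """Apply an additional Gaussian smearing of relative width
    extra_resolution to each simulated jet pT."""
    return [pt * (1.0 + rng.gauss(0.0, extra_resolution)) for pt in jets_pt]

def drop_jets(jets_pt, inefficiency, rng):
    """Randomly reject each jet with probability `inefficiency`."""
    return [pt for pt in jets_pt if rng.random() >= inefficiency]

# Illustrative values only (GeV); the 2% scale uncertainty is made up.
rng = random.Random(1)
jets = [100.0, 50.0]
jets_up = scale_jet_pt(jets, lambda pt: 0.02, +1)
jets_down = scale_jet_pt(jets, lambda pt: 0.02, -1)
```

In each case the measurement would be repeated on the modified jets and the shift with respect to the nominal result taken as the uncertainty.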

Jet vertex fraction
The modelling of the jet vertex fraction in simulated events has been studied in Z → ℓℓ+jets events. Jets from the hard scatter interaction are selected from events where one jet is produced back-to-back with a high-p T Z boson, while jets from pile-up interactions are selected from events where the Z boson is produced almost at rest. Correction factors, bringing the efficiency of a jet either from the hard scatter vertex or from a pile-up vertex to pass the jet vertex fraction cut in simulated events to agree with that measured in data, are applied. These correction factors are then varied within their uncertainties.

Mistag efficiencies
In both the tag counting and the kinematic selection methods, the mistag efficiencies for c and light-flavour jets, ε c and ε l , directly enter the expression used to obtain the b-jet tagging efficiency. The efficiencies in simulated events are adjusted by the data-to-simulation scale factors obtained with the methods described in sections 13 and 14. The efficiencies are then varied within the uncertainties on these correction factors, which range from approximately 12% to 50%.
In the kinematic selection methods the b-jet tagging efficiency ε fake/np for jets from the multijet background in the single-lepton analysis and the non-prompt or fake lepton background in the dilepton analysis is measured in a control region in data. In the dilepton analysis an uncertainty of 50% is assumed, while in the single-lepton analysis the uncertainty is obtained by comparing the baseline result with the b-jet tagging efficiencies measured in events in which the requirement of an isolated electron is replaced by that of a jet with a large electromagnetic energy fraction, the so-called jet-electron model [45].

Missing transverse momentum
In the combinatorial likelihood method, the uncertainty due to the modelling of the soft terms used in the E miss T calculation is accounted for. In addition, the variations in the E miss T due to the jet, electron and muon uncertainties are taken into account when the corresponding systematics are varied.

Lepton trigger and identification efficiency, energy scale and resolution
The modelling in simulation of the lepton trigger, reconstruction and selection efficiencies as well as the energy resolution and scaling have been assessed using tag-and-probe techniques in Z → ee and Z → µµ events, as described in ref. [56]. The correction factors obtained are further varied within their uncertainties.

Luminosity
The uncertainty on the integrated luminosity affects the measurement of the b-jet tagging efficiency due to the change in the overall normalisation of the backgrounds estimated from simulation. The integrated luminosity has been measured with a precision of 3.9% following the methods described in ref. [57].

Pile-up µ reweighting
In all analyses, the Monte Carlo simulation is reweighted on an event-by-event basis to reproduce the distribution of the average number of proton-proton interactions (µ) measured in data, after scaling µ in the Monte Carlo simulation as described in section 2. In the combinatorial likelihood method, the associated systematic uncertainty is evaluated as described in section 8.5, while the other analyses probe the indirect effect of pile-up on the results through pile-up related uncertainties in object modelling such as the jet energy scale and missing transverse momentum corrections (E miss T pile-up).
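As an illustration of the event-by-event reweighting, the following Python sketch derives per-bin weights from binned µ distributions in data and simulation. The function name and the histogram inputs are hypothetical, and the prior scaling of µ in the simulation mentioned above is not included.

```python
def pileup_weights(mu_data_hist, mu_mc_hist):
    """Return per-bin weights that map the normalised simulated <mu>
    distribution onto the normalised distribution observed in data."""
    n_data = sum(mu_data_hist)
    n_mc = sum(mu_mc_hist)
    weights = []
    for d, m in zip(mu_data_hist, mu_mc_hist):
        # Bins that are empty in simulation cannot be reweighted.
        weights.append((d / n_data) / (m / n_mc) if m > 0 else 0.0)
    return weights

# Illustrative bin contents only.
weights = pileup_weights([10, 30, 60], [50, 30, 20])
```

Each simulated event would then be assigned the weight of the µ bin it falls into.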

Top momentum reweighting
In the combinatorial likelihood analysis, the jet p T spectrum in data is found to be softer than the prediction from the POWHEG+PYTHIA tt sample. The distribution of the average p T of the top and anti-top quark is therefore reweighted at truth level according to the unfolded measurement performed on 2011 √ s = 7 TeV data [58]. The corresponding systematic uncertainty is taken to be 100% of the correction.

Top pair production cross section
In the kinematic selection method, the tt cross section is used to normalise the expected tt signal relative to the backgrounds. The uncertainty on the predicted tt cross section is 10% [59].

Top quark mass
The kinematic fit method involves a top quark mass constraint. To estimate the uncertainty from this source, the measurement has been repeated with simulated events with top masses of 170 and 175 GeV and the change in the results is taken as a systematic uncertainty.
The kinematic fit method normalises the background sample in the high χ 2 region. The χ 2 value used to define the high χ 2 region has been varied from 20 to 50, and the effect on the final result is taken as a systematic uncertainty.

Pretag cut
In the kinematic fit method, the jets originating from the decay of the W boson are distinguished from those originating from bottom quarks by means of a b-tagging requirement. Nominally jets are b-tagged by the MV1 algorithm at the 70% operating point. The measurement has been repeated using the 75% operating point, and the full difference to the nominal result is taken as a systematic uncertainty.

Results
The b-jet tagging efficiency measured in data, the corresponding values from simulation and the resulting data-to-simulation scale factors for the MV1 tagging algorithm at 70% efficiency are shown as a function of the jet p T in figures 33 and 34 for the single-lepton and dilepton analyses, respectively. The agreement in the scale factors among all the methods is very good. The scale factors are close to unity, with an uncertainty ranging from 4% to about 40%, depending on the jet p T and on the calibration method. The efficiencies and scale factors as a function of |η| are shown in figure 35 for the combinatorial likelihood method. No significant scale factor dependence on |η| is observed.

The p rel T method in tt events
To further the understanding of the results obtained from muon-based calibration methods in dijet events, a p rel T analysis (as discussed already in section 8) has been performed on the single-lepton tt sample. As the tt sample has a high b-jet purity and no trigger bias on b jets, this tt-based measurement provides a useful cross check to the dijet-based p rel T measurement. Furthermore, this allows the application of different calibration methods using a very similar physics process, helping to understand potential biases originating from either different calibration analyses or the use of event samples from different physics processes.
The analysis takes the muon p rel T templates of b, c, and light-flavour jets from simulation. It has been verified through studies of simulated events that there are several sources of contamination (geometric overlaps from the close-by muons) to the muons in light-flavour jets. The dominant source is the isolated lepton that the other W boson in tt dilepton events decays to. This geometric overlap can be reduced by requiring the muons associated with jets that have a charge opposite to that of the isolated lepton in single-lepton tt events to have p T < 20 GeV. To reduce the heavy-flavour contamination to the light-flavour p rel T template, specific jet isolation (∆R(µ, other jets) > 0.8) and muon selection (∆R(µ, jet) < 0.3) criteria are imposed.
The two major sources of background in this analysis are W +jets and multijet events. With some minor exceptions, the backgrounds are estimated as described in section 9.1.3. The p rel T distributions in W +jets, Z+jets, single top and diboson events are obtained from simulation. The b, c, and light-flavour p rel T templates from W +jets samples (together with templates from Z+jets, single top and diboson) are added to the p rel T templates obtained from the simulated tt sample when performing the template fit. For the multijet background, both in µ+jets and e+jets channels, the muon p rel T shape and normalisation are obtained using the data-driven matrix methods described in section 9.1.3. Both in the pre-tag event sample (with muon-in-jet requirements) and in the tagged event sample (further requiring at least one jet being tagged by the b-tagging algorithm to be calibrated), the multijet contribution is estimated by using the fake rate ε fake of the tagged sample.
Examples of the b, c, and light-flavour jet p rel T templates obtained from simulation are shown in figure 36. The multijet p rel T template is treated as a fourth template besides the b, c, and light-flavour jet p rel T templates and is kept fixed at the estimated normalisation when performing the fit, while the normalisations of the b, c, and light-flavour jets are adjusted in the fit to the data. Figure 37 shows an example of p rel T template fit results before and after tagging requirements. The b-jet tagging efficiencies measured in data, the true b-jet tagging efficiencies in simulation, and the b-jet tagging efficiency data-to-simulation scale factors for the MV1 tagging algorithm at 70% efficiency are shown in figure 38 as a function of jet p T . Within sizeable statistical and systematic uncertainties, the data-to-simulation scale factors are consistent both with unity and with the scale factors obtained using the p rel T method in the dijet sample. Therefore, no evidence exists for any biases in the results originating from trigger biases and the lower b-jet fraction in the dijet analysis. The results are consistent also with the other calibration results described in the present section. However, the uncertainties are too large for this method to address possible differences observed between the dijet-based and tt-based calibration results. The compatibility between these two results is discussed in more detail in the following section.

JINST 11 P04008
Figure 37. Examples of template fits to the p rel T distribution in data before (a) and after (b) b-tagging with the MV1 algorithm at 70% b-jet tagging efficiency, for jets with 60 GeV < p T < 90 GeV.
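The structure of the template fit, with three floating normalisations and one fixed multijet template, can be sketched in Python as follows. This is a simplified unweighted least-squares stand-in and not the fit used in the analysis; the function names and all numerical values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_templates(data, floating, fixed):
    """Least-squares normalisations of the floating unit-area templates.
    The fixed template (already normalised to its estimated yield) is
    subtracted from the data before the fit."""
    residual = [d - f for d, f in zip(data, fixed)]
    k = len(floating)
    ata = [[sum(x * y for x, y in zip(floating[i], floating[j]))
            for j in range(k)] for i in range(k)]
    atb = [sum(x * r for x, r in zip(floating[i], residual))
           for i in range(k)]
    return solve_linear(ata, atb)

# Illustrative unit-area templates in four p_T^rel bins (made-up shapes).
b_tmpl = [0.10, 0.20, 0.30, 0.40]
c_tmpl = [0.25, 0.35, 0.25, 0.15]
l_tmpl = [0.40, 0.30, 0.20, 0.10]
multijet = [5.0, 5.0, 5.0, 5.0]  # fixed at its estimated normalisation
data = [600 * b + 250 * c + 150 * l + q
        for b, c, l, q in zip(b_tmpl, c_tmpl, l_tmpl, multijet)]
n_b, n_c, n_l = fit_templates(data, [b_tmpl, c_tmpl, l_tmpl], multijet)
```

Repeating such a fit before and after the tagging requirement gives the b-jet yields from which a tagging efficiency can be formed.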
The analysis shares many systematic uncertainties with the p rel T analysis performed in the dijet sample, in particular those that affect the p rel T template shapes, as discussed in section 8.5. Besides these, most of the systematic uncertainties affecting other tt-based b-jet tagging efficiency calibrations have also been taken into consideration; details can be found in section 9.6. The systematic uncertainties on the light-flavour p rel T template contamination are calculated through independent variations by 100% of the estimated dilepton, b-jet, and c-jet contaminations. A 13% uncertainty on the W +jets normalisation is assumed. The normalisation of the multijet background is varied by 50%, based on the impact of either using the ε fake obtained in the pre-tag or tagged multijet samples. The b-jet tagging efficiency scale factor uncertainties for the MV1 tagging algorithm at the 70% efficiency working point are summarised in table 8. The table includes a set of systematic uncertainties, required if the scale factors were to be applied to an inclusive b-jet sample. However, when comparing the results from this method to those obtained from muon-based calibration methods in dijet events, the uncertainty associated with the extrapolation to inclusive b jets should not be considered. Analogously, when comparing to results from other tt-based measurements, many tt-related uncertainties will partially cancel.

Combination of b-jet efficiency calibration measurements
To obtain the best overall precision of the b-jet tagging efficiency calibration measurements, a combination of the results is performed. In each jet p T bin, the best estimate of the true data-to-simulation scale factor κ̂ i is extracted by maximising the likelihood that each measurement κ i , associated with a statistical uncertainty δκ stat i and a set of systematic uncertainties δκ syst il , originates from a Gaussian probability density function P i with mean value κ̂ i . The combination of N measurements is performed by maximising the resulting likelihood. The combination is performed separately for each p T bin, assuming that all sources of uncertainty are correlated between different measurements within a single bin. An alternative likelihood that includes all the p T bins in a global fit is used to estimate the impact on the final result of the assumption of bin-to-bin correlations. In this alternative likelihood, systematic uncertainties that are correlated among different bins are expressed through a single variable (and only one constraint), while systematic uncertainties specific to each individual bin (i.e. MC simulation statistical uncertainties) are implemented adding a different variable (and independent constraint) for each bin. In the case of correlated systematic uncertainties, the relative sign of the uncertainty in each individual measurement is taken into account. Each systematic uncertainty on the final scale factor has a positive (negative) sign if the difference between the shift in the scale factor when applying a positive and a negative variation of the underlying parameter is positive (negative).
The final b-jet tagging efficiency scale factors are combinations of three individual measurements: the results of the dijet-based p rel T and system8 analyses shown in section 8.6, and those of the tt-based combinatorial likelihood method shown in section 9.7.
The latter enters the combination with four individual channels, corresponding to the eµ and combined ee and µµ channels, each separately for two- and three-jet events. The fit quality and hence the overall compatibility between the different measurements is evaluated by computing the global χ 2

$$\chi^2 = \sum_{i,j=1}^{N_{\mathrm{all}}} (\kappa_i - \hat{\kappa}_i)\,(C^{-1})_{ij}\,(\kappa_j - \hat{\kappa}_j),$$

where N all refers to the total number of measurements in all p T bins and the covariance matrix C i j accounts for correlations within and between different p T bins.
As the p rel T and system8 analyses use partly overlapping samples, the statistical uncertainty is partially but not fully correlated. The correlation coefficients have been derived using toy experiments in which somewhat simplified versions of the p rel T and system8 analyses were performed. The statistical correlation of the two analyses was found to be moderate, e.g., below 50% for the MV1 algorithm at the 70% efficiency operating point. The smallest correlations are observed in the p T bins that suffer from large statistical uncertainties, while the largest correlations are found for bins in the lower p T range where the statistical uncertainties are smaller. There are also p T bins in which the two analyses use different but highly prescaled triggers, leading to a negligible correlation. The correlation of the statistical uncertainty is accounted for in the combination by dividing it into two components, one which is treated as fully correlated and one which is treated as uncorrelated. The systematic uncertainty arising from limited simulation statistics is treated as fully uncorrelated between p T bins but fully correlated between the p rel T and system8 analyses for a given bin. The main effect of this combination is a slightly reduced uncertainty at low p T compared to the uncertainty resulting from the system8 calibration alone.
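The treatment of a partially correlated statistical uncertainty can be illustrated with a small Python sketch for two measurements. It uses a covariance-weighted (BLUE-style) average rather than the likelihood fit of the analysis, and it assumes that a correlation coefficient ρ is implemented by assigning a fraction √ρ of each uncertainty to the fully correlated component; both the function names and this particular splitting are assumptions made for illustration.

```python
import math

def split_stat(sigma, rho):
    """Split sigma into a fully correlated and an uncorrelated component
    whose quadratic sum is sigma (assumed splitting, 0 <= rho <= 1)."""
    return math.sqrt(rho) * sigma, math.sqrt(1.0 - rho) * sigma

def combine_two(k1, s1, k2, s2, rho):
    """Covariance-weighted average of two measurements with correlation
    rho < 1. Returns (combined value, uncertainty, chi2)."""
    c11, c22, c12 = s1 * s1, s2 * s2, rho * s1 * s2
    det = c11 * c22 - c12 * c12
    # Elements of the inverse of the 2x2 covariance matrix.
    i11, i22, i12 = c22 / det, c11 / det, -c12 / det
    norm = i11 + i22 + 2.0 * i12
    w1 = (i11 + i12) / norm
    w2 = (i22 + i12) / norm
    combined = w1 * k1 + w2 * k2
    r1, r2 = k1 - combined, k2 - combined
    chi2 = r1 * i11 * r1 + 2.0 * r1 * i12 * r2 + r2 * i22 * r2
    return combined, math.sqrt(1.0 / norm), chi2

# Illustrative scale factors and uncertainties only.
combined, sigma, chi2 = combine_two(1.00, 0.10, 0.90, 0.10, 0.5)
```

With equal uncertainties, a positive correlation leaves the combined value unchanged but increases its uncertainty relative to the uncorrelated case.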
The agreement in the data-to-simulation scale factors between the p rel T and system8 methods is very good. Their combination is shown, for the MV1 tagging algorithm at the 70% operating point, as the dijet result in figure 39, along with the results of the combinatorial likelihood analysis and their combination. It is seen that dijet scale factors tend to be systematically lower, by about 5%, than those resulting from the combinatorial likelihood analysis. However, both bin-by-bin and globally they are consistent with each other; the value of χ 2 /N dof in this combination turns out to be 0.95, with N dof = 48, corresponding to a probability to obtain a χ 2 larger than the observed one of 57%, providing evidence for a spread between measurements commensurate with the assigned uncertainties. The combination results have also been compared with those obtained with the "Best Linear Unbiased Estimates" (BLUE) method [60]. The two sets of results show good agreement, as well as similar χ 2 compatibility estimates. For this particular tagging algorithm and operating point, the efficiency scale factors are consistent with unity in the kinematic range covered by the analyses.
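The quoted χ 2 probability can be cross-checked with a short Python sketch. It uses the closed-form expression for the upper tail of the χ 2 distribution that holds for an even number of degrees of freedom; the function name is hypothetical.

```python
import math

def chi2_survival_even_dof(chi2, ndof):
    """P(X > chi2) for a chi-square distribution with an even number of
    degrees of freedom: exp(-x/2) * sum_{j<ndof/2} (x/2)^j / j!."""
    if ndof <= 0 or ndof % 2 != 0:
        raise ValueError("ndof must be a positive even integer")
    half = chi2 / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, ndof // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# chi2/N_dof = 0.95 with N_dof = 48, as quoted in the text.
p_value = chi2_survival_even_dof(0.95 * 48, 48)
```

For the values quoted above this gives a probability close to the 57% stated in the text.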

b-jet efficiency calibration of the soft muon tagging algorithm
A jet is considered tagged by the soft muon tagging (SMT) algorithm if it contains a reconstructed muon fulfilling the criteria listed in section 4. The efficiency with which a b jet in data passes such a tagging requirement is determined in a three-step approach. Data-to-simulation scale factors for the efficiency to reconstruct a muon are obtained from a tag-and-probe method, as described in refs. [61,62]. The efficiency with which a reconstructed muon passes the SMT selection criteria is measured using a tag-and-probe method in samples of isolated muons produced in J/ψ → µµ and Z → µµ decays as described in section 11.2. Finally, the probability that a b jet of a certain p T and η contains a reconstructed muon is derived from the same simulated tt sample used also in the tt-based b-jet efficiency measurements; systematic uncertainties are assigned to account for possible mismodelling of this probability. These three parts are then combined into a measurement of the SMT b-jet tagging efficiency.

Data and simulation samples
For the tag-and-probe method, events in the J/ψ → µµ sample are collected using a set of muon triggers optimised to be efficient at low p T [63], while events in the Z → µµ sample are required to have a tag muon that is accepted by the lowest-p T unprescaled trigger available in a given data taking period (which imposes a muon p T requirement of 18 GeV) [64].
The J/ψ → µµ and Z → µµ events in data are compared to simulated dimuon events, generated with PYTHIA. A simulated sample of tt events, generated with MC@NLO interfaced to HERWIG, is used to derive the b-jet tagging efficiency of the SMT algorithm.

Tag-and-probe based SMT muon efficiency measurement
The tag muon is required to pass stringent selection criteria to ensure that it is of good quality. A probe muon is then selected as a reconstructed muon that passes looser quality criteria but fulfils the requirement that the invariant mass of the tag-probe pair is consistent with that of the J/ψ meson (3096.916 ± 0.011 MeV [26]) or the Z boson (91.1876 ± 0.0021 GeV [26]).
All tag and probe muons are required to be measured both in the tracking detector and in the muon system, and to satisfy the selection criteria mentioned in section 4. The J/ψ → µµ events are further required to satisfy the following selection criteria, in line with the requirements for the study of the muon reconstruction efficiency, motivated in ref. [62]:
• The tag muon is required to have |η| < 2.5 and p T > 4 GeV. The transverse and longitudinal impact parameters must fulfil |d 0 | < 0.3 mm, |z 0 | < 1.5 mm, |S d 0 | < 3 and |S z 0 | < 3.
• The probe muon is required to have |η| < 2.5 and p > 3 GeV.
The Z → µµ events are instead required to pass the following selection criteria, consistent with those adopted in the study of the muon reconstruction efficiency, described in ref. [61]:
• The tag muon is required to have |η| < 2.4 and p T > 20 GeV.
• The probe muon is required to have |η| < 2.5 and p T > 7 GeV.
• Both muons are required to have longitudinal impact parameter |z 0 | < 10 mm and to be isolated from other tracks according to p 0.4 T < 0.2p muon T , where the variable p 0.4 T is the sum of the p T of ID tracks in a ∆R = 0.4 cone around the muon.
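The Z → µµ tag-and-probe requirements listed above can be written as a single predicate, sketched here in Python. The dictionary keys (`eta`, `pt`, `z0`, `pt_cone40`) and the function name are hypothetical; the invariant mass requirement on the tag-probe pair is not included.

```python
def passes_z_selection(tag, probe):
    """Apply the Z -> mumu tag and probe muon requirements.
    Each muon is a dict with eta, pt [GeV], z0 [mm] and
    pt_cone40 [GeV] (sum of ID track pT in a Delta R = 0.4 cone)."""
    def common(mu):
        # Longitudinal impact parameter and track isolation, both muons.
        return abs(mu["z0"]) < 10.0 and mu["pt_cone40"] < 0.2 * mu["pt"]

    tag_ok = abs(tag["eta"]) < 2.4 and tag["pt"] > 20.0
    probe_ok = abs(probe["eta"]) < 2.5 and probe["pt"] > 7.0
    return tag_ok and probe_ok and common(tag) and common(probe)

# Illustrative muons only.
tag = {"eta": 0.5, "pt": 35.0, "z0": 1.0, "pt_cone40": 2.0}
probe = {"eta": -2.45, "pt": 9.0, "z0": -3.0, "pt_cone40": 1.0}
selected = passes_z_selection(tag, probe)
```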
The efficiency for a muon to satisfy the SMT selection criteria, ε µ−SMT , is then measured as

$$\varepsilon_{\mu-\mathrm{SMT}} = \frac{N_{\mu}^{\mathrm{SMT}}}{N_{\mu}}, \tag{11.1}$$

where N µ refers to the number of probe muons either in the J/ψ → µµ or the Z → µµ sample and N SMT µ refers to the number of probe muons passing the SMT selection criteria. The background in the J/ψ → µµ sample before and after applying the SMT algorithm is estimated using fits to the dimuon invariant mass distribution. In the invariant mass range between 2.5 and 3.6 GeV, a second order polynomial and a Gaussian are used to describe the background and signal, respectively. The fitted background within 3σ of the mean of the signal peak is then subtracted from the observed event counts in this range. The remainder is assumed to be the number of J/ψ mesons, and used to estimate the efficiency through eq. (11.1). The mean of the Gaussian obtained in the fit to the pre-tag sample is used in the fit to the tagged sample.
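The background-subtracted efficiency estimate in the J/ψ → µµ sample can be sketched in Python as follows. The function name and the input counts are hypothetical, and the simple binomial uncertainty shown here ignores the uncertainty on the fitted background, which the analysis treats separately as a systematic uncertainty.

```python
import math

def smt_muon_efficiency(n_all, bkg_all, n_pass, bkg_pass):
    """Ratio of background-subtracted probe counts after and before
    the SMT selection, with a simple binomial uncertainty."""
    sig_all = n_all - bkg_all      # J/psi candidates before SMT
    sig_pass = n_pass - bkg_pass   # J/psi candidates after SMT
    eff = sig_pass / sig_all
    err = math.sqrt(eff * (1.0 - eff) / sig_all)
    return eff, err

# Illustrative counts within 3 sigma of the signal peak (made up).
eff, err = smt_muon_efficiency(10500, 500, 9200, 200)
```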
In the Z → µµ sample, the backgrounds considered are Z → ττ, W → µν, W → τν, tt, bb and cc processes. The background contribution in the Z → µµ sample, estimated with simulated events, is found to be negligible (less than one per mille). Hence, no background subtraction is applied.

b-jet tagging efficiency measurement
The efficiency with which a b jet is tagged by the SMT algorithm, ε b , is defined as

$$\varepsilon_b^{\mathrm{data}} = \varepsilon_b^{\mathrm{sim}} \cdot \frac{\varepsilon_{\mu-\mathrm{SMT}}^{\mathrm{data}}}{\varepsilon_{\mu-\mathrm{SMT}}^{\mathrm{sim}}} \cdot \frac{\varepsilon_{\mu-\mathrm{reco}}^{\mathrm{data}}}{\varepsilon_{\mu-\mathrm{reco}}^{\mathrm{sim}}},$$

where ε sim b is later referred to as the uncalibrated b-jet tagging efficiency and ε data b as the data-calibrated b-jet tagging efficiency. The correction factor ε data µ−SMT /ε sim µ−SMT , which corrects the efficiency with which a muon passes the SMT selection criteria, is obtained with the tag-and-probe method as described in section 11.2. The correction factor ε data µ−reco /ε sim µ−reco , which corrects the muon reconstruction efficiency, is obtained from tag-and-probe studies of J/ψ → µµ and Z → µµ decays as described in refs. [61,62]. The muon reconstruction efficiency is generally well modelled by the simulation, with scale factors compatible with unity for most of the detector region.
The simulated sample from which ε sim b is derived is corrected for known differences between data and simulation in the b-jet modelling. The simulated branching fractions of the various b-hadron decay modes giving rise to a muon are scaled to the world average values [26], both for direct decays and sequential decays via charm quarks and τ leptons. The jet energy in the simulated sample is corrected to match the scale and resolution observed in data, extracted from inclusive dijet samples and γ/Z+jet samples [65]. A correction of the jet energy to account for the momentum of the neutrino and the muon in semileptonic b-hadron decays is also applied, using the same procedure and corrections as in section 8.5. Residual differences in the modelling of b decays (b-quark fragmentation and hadronisation), together with the kinematics of the hard scatter process, are accounted for by assigning systematic uncertainties derived from the comparison between the nominal simulated samples and alternative models.
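The way the two muon correction factors enter the data-calibrated efficiency can be made explicit with a minimal Python sketch. The function names are hypothetical, the numerical values are purely illustrative, and the quadrature sum of relative uncertainties assumes independent sources.

```python
import math

def calibrated_b_efficiency(eff_sim, sf_smt, sf_reco):
    """Scale the simulated b-jet tagging efficiency by the SMT muon
    and muon reconstruction data-to-simulation correction factors."""
    return eff_sim * sf_smt * sf_reco

def relative_uncertainty(*rel_uncs):
    """Quadrature sum of independent relative uncertainties."""
    return math.sqrt(sum(u * u for u in rel_uncs))

# Illustrative inputs only, not measured values.
eff_data = calibrated_b_efficiency(0.10, 0.95, 1.00)
rel_unc = relative_uncertainty(0.03, 0.04)
```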

Systematic uncertainties
The systematic uncertainties associated with the b-jet tagging efficiency measurement are summarised in table 9.

SMT muon efficiency uncertainties
The uncertainties on the efficiency of a probe muon passing the SMT selection criteria are described below. The quadratic sum of these uncertainties constitutes the SMT muon efficiency uncertainty.

J/ψ fit uncertainties
In the J/ψ → µµ sample, the difference in the efficiency estimates obtained using 3σ and 5σ J/ψ signal mass ranges (ε 5σ −ε 3σ ) is assigned as an uncertainty. Moreover, for a fixed fit range, the values of the three fit parameters are varied within their uncertainties and the difference in the efficiency obtained with the resulting minimum and maximum background estimates (ε bkgmax − ε bkgmin ) is assigned as a systematic uncertainty.
In the Z → µµ sample, systematic uncertainties are estimated by varying the selection cuts, tightening the |m µµ − m Z | cut to 6 GeV and loosening it to m Z − 20 GeV ≤ m µµ ≤ m Z + 10 GeV, and varying the isolation cut on

Isolation dependence
As the isolation profile of inclusive muons from b-hadron decays is very different from that for muons produced from a J/ψ or a Z boson, the dependence of the SMT muon efficiency scale factors on the muon isolation has been investigated. Three isolation variables were chosen for this investigation. The first is the transverse energy deposited in the calorimeter in cones of three different sizes (corresponding to opening angles of 0.2, 0.3 and 0.4 radians) around the central muon (see footnote 7). The other two are the number of tracks and the summed p T of the tracks found in cones of the same sizes. While the statistics in the data sample are sufficient to cover a wide isolation range, the more limited statistics in the simulated sample do not allow data-to-simulation scale factors to be derived for muons surrounded by a large amount of transverse energy or a large number of tracks. Within the range studied, however, the data-to-simulation scale factor found is consistent with unity, indicating no dependence of the muon efficiency on the isolation of the probe muon. Thus no systematic uncertainty from this source has been assigned.

Muon reconstruction efficiency
The muon reconstruction efficiency in simulated events is corrected with data-to-simulation scale factors obtained from tag-and-probe studies of J/ψ → µµ and Z → µµ decays [61,62]. The uncertainties on these scale factors are propagated to the b-jet tagging efficiency.

Generator dependence
The MC generator uncertainty covers the modelling of the b-quark kinematics as a result of the hard interaction in different NLO generators. This is evaluated by comparing the efficiencies in ALPGEN+HERWIG [38] and POWHEG+HERWIG samples to that obtained using the baseline MC@NLO+HERWIG sample.

Parton shower and fragmentation model
Different showering and fragmentation models may result in different kinematics of the soft muon from the b-hadron decay. This is taken into account by comparing the efficiencies obtained using two samples of tt events, generated with POWHEG, one of which has been showered by HERWIG and the other by PYTHIA. To ensure that only the effects of the parton showering models are compared, rather than the difference in decay tables used by each hadronisation routine, the branching fraction of b → µX decays (both direct and sequential) in both simulated samples is re-weighted to match that in ref. [26].
Footnote 7: To exclude the signal from the central muon, the transverse energy deposited in a cone of opening angle 0.1 in the centre of these larger cones is not counted.

Initial and final state radiation
The systematic uncertainty due to initial and final state radiation (IFSR) in the tt events is estimated by studies using samples generated with ACERMC [48] interfaced to PYTHIA, and by varying the parameters controlling IFSR in a range consistent with experimental data [49,50].

Parton distribution function
The uncertainty on the parton distribution functions (PDFs) translates into an uncertainty on the b-quark kinematics. This is accounted for by using three different PDF sets, the nominal CT10 [42] as well as MSTW [34] and NNPDF [66]. The PDFs are varied based on the uncertainty along each of the PDF eigenvectors. Each variation is evaluated via an event-by-event re-weighting of the simulated tt sample. The total uncertainty assigned is the envelope of all PDF uncertainties.
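One possible reading of the envelope prescription is sketched below in Python: each PDF set provides an uncertainty band on the efficiency, and the total uncertainty is taken as half the width of the envelope of all bands. The function name, the half-width convention and the numerical values are assumptions made for illustration.

```python
def envelope_half_width(bands):
    """Half the width of the envelope of (low, high) uncertainty bands,
    one band per PDF set."""
    low = min(band[0] for band in bands)
    high = max(band[1] for band in bands)
    return 0.5 * (high - low)

# Illustrative efficiency bands for three PDF sets (made-up values).
bands = [(0.098, 0.102), (0.097, 0.101), (0.099, 0.104)]
pdf_uncertainty = envelope_half_width(bands)
```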

Branching fraction rescaling
The rates of the direct and sequential decays of b quarks to muons are rescaled in the simulated samples to match the world average values. The experimental uncertainty on each of these values [26] is propagated to the final result. The final uncertainty on the BF rescaling is larger at higher p jet T , due to the larger relative weight of decays through double charm creation (like b → cc → µ), whose BF is known with a 30% accuracy.
The η and p T dependence of the SMT muon efficiency from the tag-and-probe method is shown in figures 40 and 41 for the J/ψ → µµ and Z → µµ analyses respectively. In most of the η range the data efficiency is lower than the efficiency predicted by simulation. Only in the range η > 2 is the simulation efficiency significantly lower; the reason is a misconfiguration in these simulations. While the p T dependence of the data-to-simulation scale factors in the J/ψ → µµ sample is weak, there is a strong p T dependence of the data-to-simulation scale factors in the Z → µµ sample which is caused by the sensitivity of the χ 2 match cut to residual misalignments of the Muon Spectrometer relative to the inner tracker that are present in the data but not in the perfectly aligned simulation. The scale factors were found to exhibit no strong dependence on φ.
The b-jet tagging efficiency of the SMT algorithm as a function of the jet p T is shown in figure 42, where the simulation has been scaled with the muon reconstruction efficiency results from refs. [61,62] and the SMT muon efficiency calibration results from section 11.2. For the latter scaling, the J/ψ → µµ (Z → µµ) results are used for muon p T < 12 GeV (p T > 12 GeV).
The green area indicates the statistical uncertainty summed in quadrature with the systematic uncertainties from the b-jet modelling presented in section 11.4.2 (with the exception of the uncertainty on the muon reconstruction efficiency) while the hashed area corresponds to the quadrature sum of the statistical uncertainty and the systematic uncertainties on the muon reconstruction and SMT muon efficiencies. As analyses using the SMT tagging algorithm use the muon-based scale factors rather than the jet-based ones, no jet-based data-to-simulation scale factors are presented here.

c-jet tagging efficiency calibration using events with a W boson produced in association with a c quark
The efficiency with which a b-tagging algorithm tags c jets is referred to as the c-jet tagging efficiency. The method to calibrate the c-jet tagging efficiency described in this section is based on the selection of a single c jet produced in association with a W boson and identified by a soft muon stemming from the semileptonic decay of a c hadron; the W boson is reconstructed via its decay into an electron and a neutrino. In proton-proton collisions at a centre-of-mass energy of √ s = 7 TeV, the dominant production mechanism is gs → W − c and gs̄ → W + c̄, where the W boson is always accompanied by a c quark of opposite charge. Given that the soft muon and c quark charge signs are the same, requiring that the charge of the soft muon and the charge of the electron from the W-boson decay be of opposite sign selects W +c events with high purity. Most of the background processes are evenly populated with events where the charges of the decay leptons are of opposite sign (OS) or of same sign (SS). Therefore, the number of W +c signal events can be extracted as the difference between the numbers of events with opposite and with same charge leptons (OS-SS). This fundamental strategy has already been exploited in several W +c production cross section measurements [67][68][69][70][71]. Since the present analysis was performed in the course of the recent W +c cross section measurement [71], details about the extraction of the W +c sample, i.e. the event selection and background estimations, can be found in this reference. In the remainder of this section, jets that are soft-muon tagged (SMT) applying the algorithm described in section 4 are referred to as SMT jets and a sample composed of such jets extracted as the number of OS-SS events is referred to as the SMT-jet sample.
In a first step the c-jet tagging efficiency is measured using the SMT-jet sample in data and simulation. Following that, an extrapolation procedure is performed to derive data-to-simulation scale factors that are applicable to an unbiased, inclusive sample of c jets. It should be noted that the analysis described here does not attempt to perform a measurement of the calibration factors in bins of c-jet transverse momentum, because of the limited available number of events.

Data and simulation samples
The signal process is defined as a W boson produced in association with a single charm quark. W bosons produced in association with light quarks or gluons, hereafter referred to as W+light, as well as charm- or bottom-quark pairs are considered as backgrounds. The contribution from W bosons produced in association with a single bottom quark is negligible. The background further includes the production of $Z/\gamma^*$+jets, top-quark pairs, single top quarks, dibosons (WW, WZ and ZZ) and multijet events. Because of the symmetry in the process of producing heavy charm- or bottom-quark pairs in W+$c\bar{c}$ and W+$b\bar{b}$ events, these are expected to show an even population of OS and SS events, whereas this is not the case for the other processes.
The data sample used to perform the analysis was collected with a single-electron trigger. The W+c signal process is generated using ALPGEN 2.13 [38], where the showering and the hadronisation are done with PYTHIA 6.423 [9]. An additional signal sample is produced to study systematic uncertainties, where HERWIG 6.520 [8] is used for the parton shower and JIMMY 4.31 [72] for the underlying event. Samples with zero to four additional partons are used and the MLM [73] matching scheme is applied to remove overlaps between events with a given parton multiplicity generated both by the matrix element and the parton shower. Overlaps with ALPGEN samples used to model background processes leading to W bosons and heavy-flavour quarks are also removed. The CTEQ6L1 parton distribution function [39] is used.
To improve on known shortcomings of the ALPGEN+PYTHIA predictions and to minimise systematic uncertainties, several c-quark fragmentation and c-hadron decay properties are corrected as explained in section 12.5. In the following this corrected sample is referred to as the ALPGEN+PYTHIA-corrected sample; the sample without any of the fragmentation and decay corrections applied is called the ALPGEN+PYTHIA-default sample. To study c-hadron decay properties another signal sample is used, which is also generated with ALPGEN and PYTHIA but where the EVTGEN [32] program is used to model the c-hadron decays.
Details on the simulated samples used to describe the W , Z and top background processes can be found in ref. [71]. Their contributions are normalised to NNLO predictions in case of the inclusive W , Z and tt productions [36,74] and to NLO predictions for the other processes [75,76]. The normalisation as well as other properties of the multijet background are determined using data-driven techniques.

Event selection
Only the most important steps of the event selection are mentioned here, while a more detailed description can be found in ref. [71]. Some information on the object definitions is also given in section 2.
W bosons are reconstructed via their leptonic decay into an electron and a neutrino. Electrons are required to have a transverse momentum $p_T > 25$ GeV and a pseudorapidity $|\eta| < 2.47$, excluding the calorimeter transition region $1.37 < |\eta| < 1.52$. Electrons that fulfil the "tight" identification criteria described in ref. [77] and re-optimised for the 2011 data-taking conditions are selected. In addition a calorimeter-based isolation requirement is applied: the sum of transverse energies in calorimeter cells within a cone of radius $\Delta R < 0.3$ around the electron direction, $\sum_{\Delta R<0.3} E_T^{\mathrm{cells}}$, is required to be less than 3 GeV. Only events with exactly one isolated electron are selected; events with additional electrons or isolated muons are rejected to suppress events from Z and $t\bar{t}$ background processes. Events are required to have missing transverse momentum ($E_T^{\mathrm{miss}}$) of at least 25 GeV, accounting for the presence of the neutrino. The transverse mass of the W boson candidate reconstructed from the electron and neutrino candidates, $m_T(\ell\nu)$, is required to exceed 40 GeV. Events with exactly one jet with $p_T > 25$ GeV are selected. This single jet is moreover required to contain exactly one muon within a cone of radius $\Delta R = 0.5$ around the jet direction, following the selection requirements of the SMT algorithm as described in section 4.
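The selection requirements listed above can be summarised in a short sketch. The `Event` record and its field names are hypothetical and chosen only for illustration; they do not correspond to any ATLAS software interface. Only the numerical thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical flat event record; all field names are illustrative.
    n_isolated_electrons: int
    n_extra_leptons: int    # additional electrons or isolated muons
    ele_pt: float           # electron pT in GeV
    ele_eta: float          # electron pseudorapidity
    ele_calo_iso: float     # sum of cell E_T within dR < 0.3, in GeV
    met: float              # missing transverse momentum in GeV
    mt_w: float             # transverse mass of the W candidate in GeV
    n_jets_pt25: int        # number of jets with pT > 25 GeV
    n_muons_in_jet: int     # muons within dR < 0.5 of the jet axis


def electron_ok(pt, eta, iso):
    """Kinematic, acceptance and isolation cuts quoted in the text."""
    abs_eta = abs(eta)
    in_transition_region = 1.37 < abs_eta < 1.52
    return pt > 25.0 and abs_eta < 2.47 and not in_transition_region and iso < 3.0


def passes_selection(ev):
    """Return True if the event satisfies the W+c candidate selection."""
    return (
        ev.n_isolated_electrons == 1
        and ev.n_extra_leptons == 0
        and electron_ok(ev.ele_pt, ev.ele_eta, ev.ele_calo_iso)
        and ev.met >= 25.0
        and ev.mt_w > 40.0
        and ev.n_jets_pt25 == 1
        and ev.n_muons_in_jet == 1
    )
```

The electron identification ("tight") and the detailed SMT muon requirements are not modelled here; they are assumed to have been applied when filling the record.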

Determination of the W + c yield
The yield of the W+c signal process is determined by exploiting its charge correlation: the number of SS events is subtracted from the number of OS events, $N^{\mathrm{OS-SS}} = N^{\mathrm{OS}} - N^{\mathrm{SS}}$. The remaining background, substantially reduced after the OS-SS subtraction, consists predominantly of W+light and to a lesser extent of multijet events. Their contributions after the OS-SS subtraction are estimated using data-driven methods as sketched below and explained in more detail in ref. [71]. Smaller backgrounds from $Z/\gamma^*$+jets, top and diboson production are estimated from MC simulations. Backgrounds from W+$b\bar{b}$ and W+$c\bar{c}$ events are negligible since they are expected to contain the same number of OS and SS events.

JINST 11 P04008
The number of OS-SS events of the W+light and multijet backgrounds is obtained using the relation
$$N^{\mathrm{OS-SS}}_{\mathrm{bkg}} = \frac{2\,A_{\mathrm{bkg}}}{1 - A_{\mathrm{bkg}}}\, N^{\mathrm{SS}}_{\mathrm{bkg}}, \qquad (12.1)$$
where the number of background events in the SS sample, $N^{\mathrm{SS}}_{\mathrm{bkg}}$, and the OS/SS asymmetry, $A_{\mathrm{bkg}}$, defined as
$$A_{\mathrm{bkg}} = \frac{N^{\mathrm{OS}}_{\mathrm{bkg}} - N^{\mathrm{SS}}_{\mathrm{bkg}}}{N^{\mathrm{OS}}_{\mathrm{bkg}} + N^{\mathrm{SS}}_{\mathrm{bkg}}}, \qquad (12.2)$$
are determined independently. For the W+light background, an estimate of $N^{\mathrm{SS}}_{W+\mathrm{light}}$ is obtained from MC simulation, corrected for the rate of SMT light-flavour jets as described in section 15. For the multijet background, $N^{\mathrm{SS}}_{\mathrm{multijet}}$ is determined using a data-driven method. In the SS sample, a binned maximum likelihood fit of two templates to the $E_T^{\mathrm{miss}}$ distribution in data is performed. One template represents the multijet background and the other the contributions from all other sources (including the W+c signal). While the former is extracted from a data control region selected by inverting some of the electron identification criteria as well as the electron isolation requirement, the latter is obtained from simulation. Since the SS sample is mainly composed of W+light and multijet events, the initial W+light and multijet estimates are adjusted by performing a constrained $\chi^2$ fit so that the sum of all backgrounds and the signal equals the number of data SS events. The other backgrounds and the small W+c signal contribution are fixed to their values predicted by simulation in this fit.
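As a minimal sketch of how an OS-SS background yield follows from an SS yield and an asymmetry, the snippet below assumes the conventional definition of the OS/SS asymmetry, A = (N_OS - N_SS)/(N_OS + N_SS). Solving that definition for N_OS gives N_OS - N_SS = 2A/(1 - A) N_SS. The function names and the numbers in the example are illustrative, not values from the analysis.

```python
def asymmetry(n_os, n_ss):
    """OS/SS asymmetry, assumed here to be (OS - SS) / (OS + SS)."""
    return (n_os - n_ss) / (n_os + n_ss)


def os_minus_ss_from_asymmetry(n_ss, a_bkg):
    """OS-SS yield implied by an SS yield and an asymmetry.

    Follows from solving A = (OS - SS) / (OS + SS) for OS.
    """
    if a_bkg >= 1.0:
        raise ValueError("asymmetry must be below 1")
    return 2.0 * a_bkg / (1.0 - a_bkg) * n_ss


# Illustrative numbers only: 550 OS and 450 SS background events.
a = asymmetry(550.0, 450.0)                    # 0.1
n_diff = os_minus_ss_from_asymmetry(450.0, a)  # recovers 550 - 450
```

Determining the SS yield and the asymmetry separately lets each be estimated with the method best suited to it, as done for the W+light and multijet backgrounds.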
For both the W +light and multijet backgrounds, A bkg is obtained using data-driven methods. The asymmetry of the multijet background is derived by performing the template fit to the E miss T distribution in data, as described above, separately for the OS and SS samples. A multijet , derived from the fit results according to eq. (12.2), is found to be consistent with zero within the assigned total uncertainties. The total uncertainties are dominated by the statistical component.
The asymmetry of the W +light background, A W +light , is obtained from MC simulation, but corrected by a factor derived in a data sample that is defined by omitting the identification requirements for the soft muon. The correction factor is obtained by investigating the charge correlation of the electron from the W -boson decay and generic tracks passing the soft-muon kinematic requirements and being associated with the selected jet. A W +light is found to be approximately 10%. The assigned total uncertainty is dominated by the statistical uncertainty due to the limited size of the simulated sample.
The selected numbers of OS and SS events in the data SMT-jet sample are 7445 and 3125, respectively. This results in a number of OS-SS data events of 4320 ± 100 (stat.) and an extracted number of SMT W+c events of 3910 ± 100 (stat.) ± 160 (syst.). The estimated number of background events, the number of data events and the measured W+c yield are summarised in table 10. The only backgrounds exhibiting a significant asymmetry are from the single-top and diboson processes.
Table 10. Number of OS-SS events of the different backgrounds and in data, as well as the measured W+c yield. The combined statistical and systematic uncertainties are quoted. Correlations between the uncertainties due to exploiting the constraint in the SS sample are taken into account when computing the total background uncertainty.
Figure 43 shows the $p_T$ distribution of c-jet candidates in OS-SS events as well as the output weight of the MV1 tagging algorithm, for which the c-jet tagging efficiency is calibrated as described in the following sections. The signal contribution is normalised to the measured yield and the background contributions to the values listed in table 10. The W+c signal shapes are derived from the ALPGEN+PYTHIA-corrected simulated signal sample. The multijet shapes are extracted from the data control region used to derive the fit templates for determining $A_{\mathrm{multijet}}$.
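The OS-SS subtraction of the quoted data counts can be checked with a few lines. The statistical uncertainty is computed here under the assumption that the OS and SS counts are independent Poisson variables, so that the variance of the difference is the sum of the two counts; the text does not state how the quoted uncertainty was obtained.

```python
import math


def os_minus_ss(n_os, n_ss):
    """Return the OS-SS difference and its statistical uncertainty.

    Assumes independent Poisson counts: var(OS - SS) = OS + SS.
    """
    diff = n_os - n_ss
    stat = math.sqrt(n_os + n_ss)
    return diff, stat


# Data counts quoted in the text for the SMT-jet sample.
diff, stat = os_minus_ss(7445, 3125)
print(f"N(OS-SS) = {diff} +- {stat:.0f} (stat.)")
```

Under this assumption the uncertainty comes out close to the rounded value of 100 events quoted for the data.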

Measurement of the c-jet tagging efficiency of SMT c jets
The output weight of the MV1 tagging algorithm for which the c-jet tagging efficiency is calibrated at operating points corresponding to b-jet tagging efficiencies of 85 %, 75 %, 70 % and 60 % in simulated tt events is shown in figure 43b. It should be noted that in what follows the number of events refers to the number of OS-SS subtracted events, unless indicated otherwise.
The c-jet tagging efficiency of SMT c jets, $\varepsilon^{\mathrm{data}}_{c(\mu)}$, is derived as the fraction of W+c events selected in data that pass a certain b-tagging requirement,
$$\varepsilon^{\mathrm{data}}_{c(\mu)} = \frac{N^{b\text{-tag}}_{Wc}}{N_{Wc}},$$
where $N_{Wc}$ is the number of W+c events before applying the b-tagging requirement (hereinafter referred to as pre-tag level or sample) and $N^{b\text{-tag}}_{Wc}$ is the number of W+c events passing the b-tagging requirement. $N_{Wc}$ is derived as described in section 12.3.
The number of b-tagged signal events, N b-tag W c, is determined by subtracting from the number of all b-tagged events the expected number of b-tagged background events taking into account the tagging rates of the different background components, which depend on the jet flavour composition of the background component and their respective tagging efficiencies. The tagging rates of the W +light, tt, single top, diboson and Z +jets backgrounds are extracted using MC simulation with the tagging efficiencies of the differently flavoured jets being corrected to match those in data by applying b-tagging scale factors (sections 10 and 14) and the corresponding uncertainties being taken into account. The total uncertainties on the tagging rates are either dominated by or of the same order as the statistical uncertainties due to the limited size of the simulated samples.
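The background subtraction at the tagged level can be sketched as follows. Each background is represented by its pre-tag OS-SS yield and its tagging rate; the function names and all numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def tagged_signal_yield(n_tag_all, backgrounds):
    """Subtract the expected tagged background from all tagged events.

    backgrounds: iterable of (pretag_yield, tagging_rate) pairs,
    one per background component.
    """
    expected_tagged_bkg = sum(n * rate for n, rate in backgrounds)
    return n_tag_all - expected_tagged_bkg


def tagging_efficiency(n_tag_signal, n_pretag_signal):
    """Fraction of pre-tag signal events passing the b-tagging requirement."""
    return n_tag_signal / n_pretag_signal


# Hypothetical inputs: 1000 tagged events in total and two backgrounds.
n_tag_sig = tagged_signal_yield(1000.0, [(200.0, 0.1), (100.0, 0.4)])
eff = tagging_efficiency(n_tag_sig, 4000.0)
```

In the analysis the tagging rates come from simulation with b-tagging scale factors applied, or from data-driven fits for the multijet component; this sketch takes them as given inputs.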
The tagging rate of the multijet background is estimated using a data-driven method which follows closely the procedure to extract the OS/SS asymmetry at pre-tag level described in section 12.3: a binned maximum likelihood fit of templates to the E miss T distribution in data before and after applying the b-tagging requirement is performed. The multijet templates used to derive the number of b-tagged multijet events are extracted from data control samples defined in section 12.3 with the additional requirement that the selected SMT jet is b-tagged. Since the multijet tagging rates computed using the fit results for the OS and SS samples lead to compatible results, the final multijet tagging rate is obtained from the fit results derived for the sum of the OS and SS samples. The fit results before and after applying the MV1 b-tagging requirement corresponding to the ε b =70 % operating point are shown in figure 44. The multijet tagging rates for the different operating points vary between 26 % and 55 % indicating that the multijet sample has a large heavy flavour component. The relative total uncertainties range between 15 % and 23 % accounting for the dominating statistical uncertainties of the E miss T fits and systematic uncertainties related to the choice of the fit range and variations of the shapes of the multijet and non-multijet templates, obtained by modifying the inverted electron identification requirements or the relative fractions of the different background processes, respectively.
The c-jet tagging efficiencies of SMT c jets in data ε data c(µ) derived according to eq. by the precision on the W +light and multijet background yields at pre-tag level, in particular on the data-driven OS/SS asymmetry estimates as mentioned in section 12.3, and on the W +light tagging rate. The statistical uncertainties are of the same order as the systematic uncertainties.
The expected c-jet tagging efficiency, $\varepsilon^{\mathrm{sim}}_{c(\mu)}$, is defined as the fraction of SMT c jets selected in simulated ALPGEN+PYTHIA-default W+c events that pass the b-tagging requirement. In figure 45a $\varepsilon^{\mathrm{sim}}_{c(\mu)}$ is compared to $\varepsilon^{\mathrm{data}}_{c(\mu)}$ for the different b-tagging operating points. The data-to-simulation scale factors for SMT c jets, $\kappa^{\mathrm{data/sim}}_{c(\mu)} = \varepsilon^{\mathrm{data}}_{c(\mu)} / \varepsilon^{\mathrm{sim}}_{c(\mu)}$, are shown in figure 45b. They decrease from 0.99 to 0.87 with increasing tightness of the operating points, while their total uncertainties increase from 4 % to 10 %. The systematic uncertainties arise from the previously discussed background determinations as well as from the W boson reconstruction and the SMT c-jet identification efficiencies. A more detailed discussion and a breakdown of the systematic uncertainties can be found in section 12.6. The statistical uncertainties are of the same order as or larger than the systematic uncertainties.

Calibration of the c-jet tagging efficiency for inclusive c-jet samples
Due to several differences between an inclusive sample of c jets and a sample of SMT c jets the derived c-jet tagging efficiency scale factors need to be extrapolated in order to be applicable also to the former.
Selecting a sample of c jets via semimuonic decays of c hadrons leads to a c-hadron composition that differs from that of an inclusive c-jet sample, given that the semileptonic branching fractions of the different c-hadron types vary significantly. Since the c-hadron types differ also in several other characteristics relevant for the performance of b-tagging algorithms, e.g. the lifetime (which is correlated with the semileptonic branching fraction) or the charged particle decay multiplicity, the tagging efficiencies of c jets associated to different types differ considerably. For instance, the tagging efficiencies of c jets associated to the most prominent weakly decaying c-hadron types for the $\varepsilon_b = 70$ % operating point of the MV1 tagging algorithm are as estimated from the ALPGEN+PYTHIA-corrected sample, where the quoted uncertainties are statistical only. Therefore, the overall tagging efficiency of a c-jet sample strongly depends on the admixture of different c hadrons. Furthermore, the tagging efficiencies for SMT c jets differ from those of inclusive c-jet samples because of differences in the decay properties: for example, requiring an associated muon guarantees at least one well reconstructed track stemming from a c-hadron decay.
The c-jet tagging efficiency of an inclusive sample of c jets, $\varepsilon_c$, can be obtained from the c-jet tagging efficiency of SMT c jets, $\varepsilon_{c(\mu)}$, by applying an extrapolation factor α:
$$\varepsilon_c = \alpha \cdot \varepsilon_{c(\mu)}.$$
The comparison of the expected $\varepsilon^{\mathrm{sim}}_c$ and $\varepsilon^{\mathrm{sim}}_{c(\mu)}$ for several operating points of the MV1 tagging algorithm, derived using the W+c sample simulated with ALPGEN+PYTHIA-default, shows that $\varepsilon^{\mathrm{sim}}_c$ is systematically lower than $\varepsilon^{\mathrm{sim}}_{c(\mu)}$, resulting in a correction factor $\alpha_{\mathrm{sim}}$ of about 0.8 for the different operating points.
The c-jet tagging efficiency scale factor $\kappa^{\mathrm{data/sim}}_c$ for inclusive c jets can be computed similarly from the measured c-jet tagging efficiency scale factor $\kappa^{\mathrm{data/sim}}_{c(\mu)}$ by applying a correction factor δ,
$$\kappa^{\mathrm{data/sim}}_c = \delta \cdot \kappa^{\mathrm{data/sim}}_{c(\mu)}, \qquad \delta = \frac{\alpha_{\mathrm{data}}}{\alpha_{\mathrm{sim}}},$$
where δ is expressed as the ratio of the efficiency extrapolation factors in data, $\alpha_{\mathrm{data}}$, and simulation, $\alpha_{\mathrm{sim}}$. Mismodelling of the differences between SMT c-jet and inclusive c-jet samples in MC simulation leads to different extrapolation factors α and thus to a ratio δ deviating from one.
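The role of δ can be made explicit in one line. Assuming that the inclusive and SMT c-jet tagging efficiencies are related by an extrapolation factor α separately in data and in simulation, the inclusive scale factor factorises as:

```latex
\kappa_c^{\mathrm{data/sim}}
  = \frac{\varepsilon_c^{\mathrm{data}}}{\varepsilon_c^{\mathrm{sim}}}
  = \frac{\alpha_{\mathrm{data}}\,\varepsilon_{c(\mu)}^{\mathrm{data}}}
         {\alpha_{\mathrm{sim}}\,\varepsilon_{c(\mu)}^{\mathrm{sim}}}
  = \delta\,\kappa_{c(\mu)}^{\mathrm{data/sim}},
\qquad
\delta \equiv \frac{\alpha_{\mathrm{data}}}{\alpha_{\mathrm{sim}}} .
```

If the simulation modelled the difference between SMT and inclusive c jets perfectly, the two extrapolation factors would coincide and δ would equal one.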
The efficiency extrapolation factor $\alpha_{\mathrm{data}}$ is estimated using the simulated ALPGEN+PYTHIA-corrected sample, in which c-quark fragmentation and c-hadron decay properties are corrected to the current best knowledge as discussed in detail in the following, yielding a corresponding extrapolation factor $\alpha^{\mathrm{corr}}_{\mathrm{sim}}$. An estimate for the scale factor extrapolation factor δ is then obtained from
$$\delta \approx \frac{\alpha^{\mathrm{corr}}_{\mathrm{sim}}}{\alpha_{\mathrm{sim}}}. \qquad (12.6)$$
In order to describe the c-hadron composition of inclusive c-jet samples correctly, the fragmentation fractions of the relevant weakly decaying c-hadron types in the PYTHIA-default sample are reweighted to those obtained by combining the results of dedicated measurements performed in $e^+e^-$ and $e^\pm p$ collisions [78]. By correcting the semileptonic branching fractions of c hadrons to match the world average values [30], the modelling of the c-hadron composition in the simulated SMT c-jet sample is also improved. Comparing the c-hadron fractions in the SMT c-jet sample with the ones in the inclusive c-jet sample shows that the $D^+$-meson component is strongly enhanced in the SMT c-jet sample due to its relatively large semileptonic branching fraction: the SMT c-jet sample consists of similar amounts (∼ 43 %) of $D^0$ and $D^+$ mesons, while in the inclusive c-jet sample the $D^0$ meson is clearly the dominant c-hadron type (∼ 60 %).
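Reweighting fragmentation fractions amounts to assigning each simulated c jet a weight equal to the ratio of the target fraction to the simulated fraction for its c-hadron type. The sketch below illustrates this with made-up fractions; neither set of numbers is taken from the analysis or from ref. [78], and the dictionary keys are illustrative labels.

```python
def fraction_weights(mc_fractions, target_fractions):
    """Per-hadron-type event weights that map MC fractions onto targets."""
    return {h: target_fractions[h] / mc_fractions[h] for h in mc_fractions}


# Hypothetical fragmentation fractions (each set sums to one).
mc_fractions = {"D0": 0.55, "D+": 0.25, "Ds": 0.10, "Lambda_c": 0.10}
target_fractions = {"D0": 0.60, "D+": 0.22, "Ds": 0.08, "Lambda_c": 0.10}

weights = fraction_weights(mc_fractions, target_fractions)

# After reweighting, the weighted MC fractions reproduce the targets.
reweighted = {h: mc_fractions[h] * weights[h] for h in mc_fractions}
```

Because both sets of fractions are normalised, the overall sample normalisation is preserved by construction.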
Given that the b-tagging algorithms exploit track and vertex properties that can specifically be associated with b- and c-hadron decays, it is important that these are well modelled by MC simulations. This applies in particular to the charged particle decay multiplicity of c-hadron decays. In order to improve its description in the PYTHIA-default simulation, the relative branching fractions of the dominant semileptonic decay channels of the abundant $D^+$ and $D^0$ mesons are corrected to match the world average values [30]. The less frequent decay channels of the two mesons, which are known to a lower precision, are adjusted to maintain the overall normalisation. It is found that the relative fractions of $D^+$ decays with one and three charged decay products have a noticeable impact on the c-jet tagging efficiency of SMT c jets. The hadronic n-prong branching fractions of c hadrons in the PYTHIA-default sample are also corrected, with a significant impact on the predicted c-jet tagging efficiency of an inclusive c-jet sample. While the corrections in case of the $D^0$ meson have been inferred from measured inclusive n-prong branching fractions [30], the hadronic n-prong branching fractions of the $D^+$ and $D_s$ mesons as well as the $\Lambda_c^+$ baryon are reweighted to the predictions of the EVTGEN simulation. A comparison of the PYTHIA-default and EVTGEN predictions, as well as the measured values in case of the $D^0$ meson, reveals large differences in the hadronic n-prong distributions. In particular the 2-to-0-prong ratio of the $D^0$ meson and the 3-to-1-prong ratio of the $D^+$ meson are found to have a significant influence on the inclusive c-jet tagging efficiency.
Finally, the b-tagging performance also depends on the kinematic distributions of the c hadron and its decay products: first, the effect of any mismodelling of the momentum fraction of the c hadron ($p_T^{c\,\mathrm{hadron}} / p_T^{c\,\mathrm{jet}}$), which is sensitive to the c-quark fragmentation function, is evaluated by comparing different simulations; second, the momentum of the decay muon in the rest frame of the c hadron ($p^*$) is reweighted to agree with the EVTGEN prediction.
The ALPGEN+PYTHIA-corrected simulation obtained by applying all corrections discussed above predicts a significantly lower c-jet tagging efficiency for inclusive c-jet samples than the ALPGEN+PYTHIA-default simulation, mainly due to the correction of the hadronic n-prong decay branching fractions. Since the impact of the corrections on the c-jet tagging efficiency predicted for a SMT c-jet sample is very small (<2% for all operating points), the efficiency correction factor computed using the ALPGEN+PYTHIA-corrected sample, $\alpha^{\mathrm{corr}}_{\mathrm{sim}} = 0.69$-$0.76$, is systematically lower than $\alpha_{\mathrm{sim}} = 0.79$-$0.83$ of the ALPGEN+PYTHIA-default sample. The resulting scale factor extrapolation factors δ are 0.86-0.95, systematically lower than unity with a decreasing trend towards tighter operating points. Their total systematic uncertainties, ranging between 3 % and 7 %, are due to the aforementioned corrections and are discussed in detail in section 12.6. Statistical uncertainties are neglected since the numerator and the denominator of δ are computed using approximately the same simulated events.
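The extrapolation from SMT c jets to inclusive c jets reduces to two multiplications, sketched below. The input values are hypothetical placeholders of a plausible size, not the per-operating-point values of the analysis, which are only quoted as ranges in the text.

```python
def extrapolate_scale_factor(kappa_smt, alpha_corr_sim, alpha_sim):
    """Extrapolate an SMT c-jet scale factor to inclusive c jets.

    delta is approximated by the ratio of the extrapolation factors from
    the corrected and the default simulation; the inclusive scale factor
    is delta times the SMT scale factor.
    """
    delta = alpha_corr_sim / alpha_sim
    return delta * kappa_smt, delta


# Hypothetical inputs for a single operating point.
kappa_incl, delta = extrapolate_scale_factor(
    kappa_smt=0.90, alpha_corr_sim=0.72, alpha_sim=0.80
)
```

A value of δ below one lowers the inclusive scale factor relative to the one measured for SMT c jets, in line with the trend described above.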

Systematic uncertainties
Systematic uncertainties on the c-jet tagging efficiency scale factors arise from the W boson reconstruction and SMT c-jet identification, the pre-tag yield and tagging rate determination of the backgrounds as well as from the extrapolation procedure to correct the measured c-jet tagging efficiency scale factors for SMT c jets. The different contributions are summarised in table 11 and discussed below.

Event reconstruction
The W boson reconstruction uncertainty arises from the electron trigger and reconstruction efficiencies, the electron energy scale and resolution as well as the $E_T^{\mathrm{miss}}$ reconstruction. There are two main sources of uncertainty on the c-jet identification: first the determination of the jet energy scale and resolution, second the reconstruction efficiency and the energy resolution of the soft muon as well as the SMT tagging efficiency and mistag rate. The lepton uncertainties are assessed by varying each of the efficiencies, the mistag rate, the energy scale and resolution in simulation within the range of the assigned uncertainties as determined from independent measurements and re-calculating the resulting c-jet tagging efficiency. The uncertainties due to jet energy scale and resolution determinations are estimated in the same way. The uncertainties on the lepton and jet energy scale and resolution are additionally propagated to the reconstruction of the missing transverse momentum. Further systematic uncertainties that affect the $E_T^{\mathrm{miss}}$ reconstruction, but are not associated with reconstructed objects, are also accounted for. The systematic uncertainties due to the jet energy scale and resolution calibrations dominate the event reconstruction uncertainties, but are of the same order as the statistical uncertainty due to the limited size of the simulated signal sample. A more detailed breakdown and discussion of the event reconstruction uncertainties can be found in ref. [71].

Pre-tag yields and background tagging rates
The determination of the OS-SS background yields at pre-tag level and the assessment of the corresponding uncertainties are discussed in section 12.3. The main source of systematic uncertainties is the data-driven estimation of the OS/SS asymmetry of the W +light and multijet backgrounds. The uncertainty due to the background tagging rates is dominated by the uncertainty on the W+light tagging rate mainly because of the limited size of the simulated sample used to derive it, as discussed in section 12.4.

Fragmentation and decay modelling
The c-quark fragmentation and c-hadron decay properties are corrected to improve the modelling of the ALPGEN+PYTHIA signal sample as described in section 12.5. Whenever results from independent measurements are used to correct the MC description, the uncertainties assigned to those results are propagated to the extrapolated scale factors. This is done for the fragmentation fractions and the semileptonic decay branching fractions of the prominent weakly decaying c hadrons as well as for the hadronic n-prong decay branching fractions of the $D^0$ meson. Where corrections are derived from MC simulations because no measurements are available, the corresponding systematic uncertainties are assessed by comparing predictions from different MC generators. Hence, the difference between the PYTHIA and HERWIG simulations is used to estimate the uncertainty due to the fragmentation function of c quarks. The systematic uncertainty due to a possible mismodelling of the $p^*$ distribution of the soft muon is evaluated from the difference between the EVTGEN and PYTHIA simulations. The largest difference between the EVTGEN and either the PYTHIA or HERWIG simulations is used to estimate the uncertainties due to the hadronic n-prong decay branching fractions of the $D^+$ and $D_s$ mesons as well as the $\Lambda_c^+$ baryon. The largest effect on the final scale factors computed for inclusive c jets arises from the correction of the n-prong decay branching fractions of hadronically decaying c hadrons. Since only semileptonically decaying c hadrons are used in the data measurement, uncertainties on the properties of hadronically decaying c hadrons propagate fully to the scale factors for inclusive c jets.

Results
The data-to-simulation c-jet tagging efficiency scale factors for several operating points of the MV1 tagging algorithm, with respect to a W+c sample simulated with ALPGEN+PYTHIA-default, are shown in figure 46. Being applicable to inclusive samples of c jets, these scale factors are derived from the measured c-jet tagging efficiency scale factors for SMT c jets (see section 12.4) by a simulation-based extrapolation procedure. The results range between 0.75 and 0.92, decreasing with increasing tightness of the operating point, while the assigned total uncertainties increase from 5 % to 13 %. There are three main sources of uncertainty that are of the same order: the statistical uncertainty, the systematic uncertainty on the measured scale factors for SMT c jets and the systematic uncertainty due to the extrapolation procedure described in section 12.5. The main contribution to the last of these is the limited knowledge of the charged particle multiplicity of c-hadron decays.

c-jet tagging efficiency calibration using the D* method
In this section the c-jet tagging efficiency is measured using a sample of jets containing $D^{*+}$ mesons, by comparing the yield of $D^{*+}$ mesons before and after the tagging requirement. The measurement is based on the $D^{*+} \to D^0(\to K^-\pi^+)\pi^+$ decay mode, and the contamination from $D^{*+}$ mesons that result from b-hadron decays is measured with a fit to the $D^0$ pseudo-proper time distribution.

Data and simulation samples
The data sample used in the D* measurement was collected using a logical OR of inclusive jet triggers. These triggers have been heavily prescaled to a constant bandwidth of about 0.5 Hz each and reach an efficiency of 99% for events having the leading jet with an offline $p_T$ higher than the corresponding trigger thresholds by a factor ranging between 1.5 and 2. Events with at least one jet with $p_T$ above a given threshold at the highest trigger level are selected, and using a combination of the inclusive jet triggers, the data set covers the entire 20-200 GeV jet $p_T$ range used in the analysis. The analysis makes use of a Monte Carlo simulated sample of multijet events. The samples used are equivalent to those used in the muon-based b-jet tagging efficiency measurement (see section 8) with the exception that each event in the sample used in the D* analysis is required to contain a $D^{*+}$ meson in the decay mode $D^0(\to K^-\pi^+)\pi^+$. Approximately one million events have been simulated per $\hat{p}_\perp$ bin.
As the trigger algorithms requiring a single jet with a $p_T$ below approximately 250 GeV are prescaled in data but not in simulated events, the $p_T$ spectrum of jets in the multijet samples is harder in data than in simulation. Therefore the jet $p_T$ distribution has been reweighted to match that observed in data.
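A simple way to reweight one spectrum to another is to take the bin-by-bin ratio of the two normalised histograms and apply it as an event weight. The sketch below assumes both histograms share the same binning; the bin contents are invented for illustration, and the text does not specify the exact reweighting procedure used.

```python
def bin_weights(data_counts, sim_counts):
    """Per-bin weights that reshape the simulated spectrum to the data.

    Both histograms are normalised to unit area before taking the ratio,
    so only the shape is changed. Empty simulation bins get weight zero.
    """
    data_total = float(sum(data_counts))
    sim_total = float(sum(sim_counts))
    weights = []
    for d, s in zip(data_counts, sim_counts):
        if s > 0:
            weights.append((d / data_total) / (s / sim_total))
        else:
            weights.append(0.0)
    return weights


# Invented jet-pT histograms with three bins each.
data_hist = [10.0, 30.0, 60.0]
sim_hist = [40.0, 40.0, 20.0]
weights = bin_weights(data_hist, sim_hist)  # approximately [0.25, 0.75, 3.0]
```

Each simulated event is then weighted by the entry for the bin its jet falls into.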

The D* analysis
D*+ selection
$D^{*+}$ mesons are reconstructed in the decay $D^{*+} \to D^0\pi^+$, with $D^0 \to K^-\pi^+$. Pairs of oppositely charged tracks are considered for the $D^0$ candidates, assigning both kaon and pion mass hypotheses to them. Studies on simulated data confirm that only the correct combination of mass hypotheses produces a $D^0$ in the expected mass region. The $D^0$ candidates are then combined with charged particle tracks with opposite sign to that of the kaon candidate, assigning the pion mass to them.
The $D^{*+}$ candidates must satisfy the following criteria:
• All tracks must have at least five hits in the silicon tracking detectors, at least one of them in the pixel detector.
• The transverse momenta of the kaon and pion candidates from the $D^0$ decay candidate have to satisfy $p_T > 1$ GeV.
• The transverse momentum of the $D^{*+}$ candidate has to exceed 4.5 GeV.
The decay chain is fitted as follows: first the $D^0$ vertex is formed by fitting the kaon and pion candidates, and the resulting $D^0$ direction is reconstructed by combining the kaon and pion four-momenta; the $D^0$ direction is then extrapolated back and fitted with the pion candidate to form the $D^{*+}$ vertex. The decay chain is fitted with a tool allowing the simultaneous reconstruction and fit of both vertices. No requirements are made on the vertex fit $\chi^2$ probability in order to minimise the bias on the b-tagging.
The $D^{*+}$ candidate is in turn associated with a reconstructed jet by requiring its direction to be within $\Delta R(D^{*+}, \mathrm{jet}) = 0.3$ of the jet direction. Finally, to reduce the amount of combinatorial background, as well as the contribution from b jets, the momentum of the $D^{*+}$ candidate projected along the jet direction has to exceed 30% of the jet energy.
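The jet association and the momentum-fraction requirement can be sketched with elementary vector arithmetic. The function names and the argument layout (Cartesian three-vectors as tuples, energies in GeV) are assumptions made for this illustration.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the eta-phi plane, with phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def longitudinal_fraction(p_cand, jet_dir, jet_energy):
    """Candidate momentum projected on the jet axis, divided by jet energy."""
    norm = math.sqrt(sum(c * c for c in jet_dir))
    p_parallel = sum(p * c for p, c in zip(p_cand, jet_dir)) / norm
    return p_parallel / jet_energy


def associated_to_jet(cand_eta, cand_phi, jet_eta, jet_phi,
                      p_cand, jet_dir, jet_energy):
    """Apply the dR < 0.3 matching and the 30% momentum-fraction cut."""
    close = delta_r(cand_eta, cand_phi, jet_eta, jet_phi) < 0.3
    hard = longitudinal_fraction(p_cand, jet_dir, jet_energy) > 0.30
    return close and hard
```

Wrapping the azimuthal difference matters for candidates and jets on opposite sides of the ±π boundary, which would otherwise appear far apart.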
The kinematics of the decay causes the $D^{*+}$ to release only a small fraction of energy to the prompt pion, usually called the "slow pion"; for this reason the $D^{*+}$ signal is commonly studied as a function of the mass difference ∆m between the $D^{*+}$ and $D^0$ candidates. $D^{*+}$ mesons are expected to form a peak in the ∆m distribution around 145.4 MeV, while the combinatorial background forms a rising distribution, starting at the pion mass. Figure 47 shows the distributions of the mass difference for the $D^{*+}$ candidates associated with a reconstructed jet for four different jet $p_T$ intervals: 20-30 GeV, 30-60 GeV, 60-90 GeV, and 90-140 GeV.
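A fit of the ∆m distribution in each jet $p_T$ interval is performed in order to determine the yield of the $D^{*+}$ mesons. The signal part S of the ∆m distribution is fitted using a modified Gaussian ($\mathrm{Gauss}^{\mathrm{mod}}$) function,
$$\mathrm{Gauss}^{\mathrm{mod}} \propto \exp\left[-0.5\, x^{\,1 + 1/(1 + 0.5x)}\right], \qquad x = \left|(\Delta m - \Delta m_0)/\sigma\right|, \qquad (13.1)$$
which provides a better description of the signal tails than a simple Gaussian. The mean and width of the ∆m peak, $\Delta m_0$ and σ, are the free parameters in the fit. The combinatorial background B is fitted with a power function multiplied by an exponential function,
$$B \propto (\Delta m - m_\pi)^{\alpha}\, e^{-\beta(\Delta m - m_\pi)}, \qquad (13.2)$$
where $m_\pi$ is the charged-pion mass and α and β are free fit parameters.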
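The two fit components can be written down as plain functions. The functional forms used here are an assumption: a modified Gaussian of the kind commonly used in D* analyses, exp[-0.5 x^(1 + 1/(1 + 0.5x))] with x = |(∆m - ∆m0)/σ|, and a threshold function (∆m - m_π)^α exp[-β(∆m - m_π)]. Normalisation constants are omitted, and the pion mass value is a rounded constant introduced for this sketch.

```python
import math

PION_MASS_MEV = 139.57  # charged-pion mass in MeV (rounded)


def gauss_mod(dm, dm0, sigma):
    """Unnormalised modified Gaussian in the mass difference dm (MeV).

    The exponent of x grows from 1 towards 2 near the peak and falls
    back towards 1 far away, which gives wider tails than a Gaussian.
    """
    x = abs((dm - dm0) / sigma)
    return math.exp(-0.5 * x ** (1.0 + 1.0 / (1.0 + 0.5 * x)))


def background(dm, alpha, beta):
    """Unnormalised threshold shape: power law times exponential.

    Vanishes at and below the pion-mass threshold.
    """
    d = dm - PION_MASS_MEV
    if d <= 0.0:
        return 0.0
    return d ** alpha * math.exp(-beta * d)


def model(dm, n_sig, n_bkg, dm0, sigma, alpha, beta):
    """Sum of signal and background shapes with free amplitudes."""
    return n_sig * gauss_mod(dm, dm0, sigma) + n_bkg * background(dm, alpha, beta)
```

Four standard deviations from the peak this signal shape is roughly two orders of magnitude above a plain Gaussian of the same width, which is the tail behaviour the modified form is meant to capture.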

Background subtraction technique
To allow for comparisons between data and simulation of observables related to the $D^{*+}$ mesons or jets, for the mixture of b and c jets present in data, a background subtraction technique is used. Signal and background regions are defined as the region within 3σ of the ∆m peak value and the region above 150 MeV, respectively. The choice of the ∆m intervals for the signal and background regions aims at including almost all the signal events in the signal region and ensuring a negligible fraction of signal events in the background region. For each observable, the data distribution is extracted as follows: the distribution of events from the background region, normalised to the fitted background fraction in the signal region, is subtracted from the corresponding distribution in the signal region. The procedure relies on the assumption that the distribution of the observable of interest is the same for the combinatorial background under the peak and in the sidebands. This assumption has been verified to be valid in simulated events. It is further supported by the observation in data that the distributions obtained from two different contiguous sideband regions (∆m ∈ [150, 160] MeV and ∆m ∈ [160, 168] MeV) are compatible with each other within their statistical uncertainty.
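The sideband subtraction described above can be sketched for a binned observable. The sideband histogram is scaled so that its integral equals the fitted number of background events under the peak and is then subtracted bin by bin. The function name and the example bin contents are illustrative.

```python
def sideband_subtract(signal_hist, sideband_hist, n_bkg_in_signal):
    """Subtract the scaled sideband shape from the signal-region histogram.

    n_bkg_in_signal: fitted number of combinatorial background events
    inside the signal window, used to normalise the sideband shape.
    """
    sideband_total = float(sum(sideband_hist))
    if sideband_total <= 0.0:
        raise ValueError("sideband histogram is empty")
    scale = n_bkg_in_signal / sideband_total
    return [s - scale * b for s, b in zip(signal_hist, sideband_hist)]


# Invented example: 50 sideband events scaled to 25 background events.
subtracted = sideband_subtract([50.0, 80.0, 30.0], [20.0, 20.0, 10.0], 25.0)
```

The result is only meaningful if the observable has the same shape for the background under the peak and in the sideband, which is the assumption tested in the text.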

Measuring the flavour composition in the D*+ sample
The measurement of the flavour composition for the selected D*+ sample is a key ingredient for its use in b-tagging calibration studies. The discriminating variable adopted in this paper to identify bottom and charm components is the D 0 pseudo-proper time, defined as
$$t(D^0) = \frac{m_{D^0}\, L_{xy}(D^0)}{p_{\mathrm{T}}(D^0)}, \tag{13.3}$$
where m D 0 is the D 0 meson mass, p T (D 0 ) is the transverse momentum of the reconstructed D 0 candidate and L xy (D 0 ) is the distance, in the transverse plane, between its decay vertex and the primary vertex in the event.
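As an illustration, the pseudo-proper time can be computed as below, assuming the usual definition t = m · L xy / p T , so that t carries the length unit of L xy . The function name is hypothetical and the D 0 mass value is an approximate world-average figure inserted for the example.

```python
M_D0 = 1864.84  # approximate D0 mass in MeV


def pseudo_proper_time(l_xy_mm, pt_mev, mass_mev=M_D0):
    """Pseudo-proper time in mm from the transverse decay length (mm)
    and the transverse momentum (MeV) of the D0 candidate."""
    if pt_mev <= 0:
        raise ValueError("transverse momentum must be positive")
    return mass_mev * l_xy_mm / pt_mev
```

A signed L xy propagates directly to a signed pseudo-proper time, which is why the fit range quoted later extends to negative values.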
The first step of the flavour composition fit is the extraction of charm and bottom templates from simulated data:
• The resolution on the D 0 pseudo-proper time, R(t), is described by the sum of a simple Gaussian and a modified Gaussian. As it has been verified with simulated events that the resolution does not depend on the D*+ production mechanism, its parameters are fitted to the more abundant charm component.
• The D 0 pseudo-proper time distribution for the charm component, F c (t), is modelled as a single exponential function with a time constant equal to the measured D 0 lifetime [26], convolved with the pseudo-proper time resolution R(t); no additional fits are needed to obtain the charm component model.
Once the models for charm and bottom components are fixed, their sum is built as
$$F(t) = f_b\, F_b(t) + (1 - f_b)\, F_c(t),$$
where f b is the fractional bottom abundance, and is used to fit the simulated or real background-subtracted data. A binned maximum likelihood fit is performed leaving the f b parameter free. A validation of the fit procedure is performed by splitting the simulated inclusive sample into 40 sub-samples and repeating the pseudo-proper time fit on each sub-sample. The pull distribution of the fitted purity is found to be compatible with a Gaussian distribution centred on zero and with unit width, thus confirming that the fit results are unbiased and the uncertainties properly estimated.
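A one-parameter binned maximum likelihood fit of this kind can be sketched as follows. This is a simplified illustration, not the fit used in the analysis: it assumes fixed, unit-normalised bottom and charm templates, a plain Poisson likelihood, and a grid scan instead of a numerical minimiser; all function names are hypothetical.

```python
import math


def binned_nll(f_b, data, tmpl_b, tmpl_c):
    """Poisson negative log-likelihood of the binned data, constants dropped."""
    n_tot = sum(data)
    nll = 0.0
    for n, p_b, p_c in zip(data, tmpl_b, tmpl_c):
        mu = n_tot * (f_b * p_b + (1.0 - f_b) * p_c)  # expected bin content
        if mu <= 0.0:
            if n > 0:
                return float("inf")
            continue
        nll += mu - n * math.log(mu)
    return nll


def fit_bottom_fraction(data, tmpl_b, tmpl_c, steps=1000):
    """Scan f_b on a uniform grid in [0, 1] and return the value with the lowest NLL."""
    grid = (i / steps for i in range(steps + 1))
    return min(grid, key=lambda f: binned_nll(f, data, tmpl_b, tmpl_c))
```

Repeating such a fit on independent sub-samples and histogramming (fitted value − true value) / uncertainty gives the pull distribution used in the validation.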

Fit results
The fit is performed using the D 0 pseudo-proper time defined in eq. (13.3), in the range [-1, 2] mm. The fit to background-subtracted real data, in the four bins of jet p T , is shown in figure 48. The bottom fractions as determined by these fits are summarised in table 12. In order to cross-check the fit results, the distribution of the D 0 impact parameter, a variable sensitive to the bottom component, is analysed. Figure 49 shows the comparison between the background-subtracted data and the Monte Carlo simulation for the impact parameter of the D 0 meson emerging from the D*+ decay. The distribution in simulated events is obtained by summing the bottom and charm components according to the overall f b value given in table 12. Data and simulation distributions are found to be in reasonable agreement.
Using the background subtraction technique described above, the shape of any variable in data can be compared to that in simulation. Figure 50 shows the distributions of the SV0 output weight, namely the decay length significance,⁸ the IP3D+JetFitter output weight, the IP3D+SV1 output weight and the MV1 output weight in the background-subtracted D*+ sample. The discrepancies observed between the tag weight shapes in data and simulation will be reflected in the data-to-simulation scale factors derived with the D* method.

Measuring the c-jet tagging efficiency using D*+ candidates
The selected sample can be used to measure the c-jet tagging efficiency for jets associated with D*+ candidates, by performing a combined fit to the ∆m distributions for D*+ mesons in jets before and after applying the b-tagging requirement.
The fit parameters describing the signal and the background shapes are required to be equal for the two distributions and the combined fit only introduces the D*+ tagging efficiency ε D*+ as an extra parameter accounting for the reduction in the D*+ peak in the tagged jets. The procedure was tested in simulation and it has been verified that the measured efficiency on jets associated with a D*+ meson is unbiased.
⁸ The significance is set to −10 when no secondary vertex is found.
Using this method it is possible to obtain the efficiency to tag jets associated with a D*+ candidate. This inclusive efficiency ε D*+ is then decomposed into the efficiencies for b and c jets using
$$\varepsilon_{D^{*+}} = f_b\, \varepsilon_b + (1 - f_b)\, \varepsilon_c ,$$
where f b is the fraction of D*+ mesons coming from bottom, before the b-tagging selection, determined by the fit to the pseudo-proper lifetime. The efficiency to tag a b jet, ε b , is taken from simulation and corrected by the data-to-simulation scale factors obtained by the p rel T and system8 methods (the combination of individual calibration results is discussed in detail in section 10). It is straightforward to solve this equation for ε c .
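The decomposition can be made explicit with a short Python sketch. It assumes that the inclusive efficiency is the mixture ε D*+ = f b ε b + (1 − f b ) ε c implied by the text; the function name and the numbers in the usage note are illustrative only.

```python
def charm_efficiency(eff_dstar, f_b, eff_b):
    """Solve eff_dstar = f_b * eff_b + (1 - f_b) * eff_c for eff_c."""
    if not 0.0 <= f_b < 1.0:
        raise ValueError("bottom fraction must lie in [0, 1)")
    return (eff_dstar - f_b * eff_b) / (1.0 - f_b)
```

For example, an inclusive efficiency of 0.30 with f b = 0.2 and ε b = 0.7 would correspond to ε c = 0.2. The division by (1 − f b ) shows why the uncertainty on the bottom fraction feeds directly into the measured c-jet efficiency.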

Extrapolation to inclusive charm
The calibration procedure described above measures the b-tagging efficiency for c jets with an exclusively reconstructed D*+ decay, D*+ → D 0 (→ K − π + )π + , and hence the corresponding scale factor $\kappa^{\mathrm{data/sim}}_{c(D^{*+})} = \varepsilon^{\mathrm{data}}_{c(D^{*+})} / \varepsilon^{\mathrm{MC}}_{c(D^{*+})}$. To be applicable to an inclusive sample of c jets, an extrapolation procedure has to be applied to obtain the corresponding scale factor $\kappa^{\mathrm{data/sim}}_{c}$. The extrapolation procedure follows closely the procedure described in section 12.5 for the c-jet tagging efficiency calibration analysis based on W + c events (section 12). The typical values of α, defined following eq. (12.4) as $\alpha = \varepsilon_{c} / \varepsilon_{c(D^{*+})}$, evaluated for the MV1 algorithm, range between 0.5 and 0.7, depending on the working point. Despite the fact that the weakly decaying D 0 meson has a significantly shorter lifetime than e.g. the D + meson, c jets containing an exclusively reconstructed D*+ meson are tagged more often than generic c jets, which is explained by the requirement of having at least two reconstructed tracks from the weak decay of a D 0 meson. To obtain the scale factor extrapolation factor δ, defined following eq. (12.5) as $\delta = \kappa^{\mathrm{data/sim}}_{c} / \kappa^{\mathrm{data/sim}}_{c(D^{*+})}$, the c-hadron fragmentation fractions of weakly decaying c hadrons and the charged particle multiplicities in the decays of weakly decaying c hadrons have been corrected as described in section 12.5. The main discrepancies between the Monte Carlo simulation and the experimental knowledge are the Λ + c fragmentation fraction and the D 0 → 0−prong decay branching ratio, which are both lower in the simulation. Therefore the effect of the extrapolation procedure is to decrease the estimated inclusive c-jet tagging efficiency. The extrapolation factor δ ranges between 0.82 and 0.92, depending on the tagging working point, and is relatively independent of the jet p T .

Systematic uncertainties
The dominant systematic uncertainties affecting the method presented in this paper are those related to the fit of the yield of D*+ mesons, to the extraction of the fraction of D*+ mesons originating from bottom hadrons and to the extrapolation of the c-jet tagging efficiency scale factor measured on jets associated with a D*+ meson to that of an inclusive c-jet sample.
The systematic and statistical uncertainties on the c-jet tagging efficiency scale factors of the MV1 tagging algorithm at 70% efficiency are shown in table 13. Each source of systematic uncertainty listed in the table is explained below.

Bottom fraction fit
To study the effects of imperfect modelling of the pseudo-proper time resolution in simulation and the bottom lifetime uncertainty, the following procedure is adopted:
• Resolution systematics: the fit results have a weak dependence on the assumed resolution functions, and a conservative systematic uncertainty is assigned by fixing the Gaussian and modified Gaussian widths to 0.5 and 1.5 times the resolution fitted on the simulated sample. This mainly affects small pseudo-proper time values, while the fit results are mainly influenced by the bottom tails at high positive values.
• Lifetime uncertainty: the lifetimes of the two exponentials used in modelling the bottom component are each varied by the fractional error on the inclusive b-hadron lifetime world average [26].
In both cases the maximum positive and negative variations in the bottom fraction central value are taken as an estimate of the corresponding systematic uncertainty. The total uncertainty on the bottom fraction is calculated by combining the fit statistical error with the resolution and lifetime systematics.

b-jet tagging efficiency scale factor
The tagging efficiency for b jets is evaluated by multiplying the value found in simulation by the scale factor measured with the p rel T and system8 methods, described in section 8. The variation of this scale factor within its error is propagated to the final results as a systematic uncertainty.

D*+ mass fit
The systematic uncertainty in the mass fit is evaluated by removing the constraint that the width of the D*+ mass peak and the parametrisation of the background shape are the same in the pre-tagged and tagged sample. The fit is separately repeated with and without these assumptions and the efficiency variations are taken as two separate systematic uncertainties. The obtained uncertainties, by definition single sided, have been symmetrised assuming that a similar variation could have been observed also in the opposite direction.

Jet energy scale and resolution
The systematic uncertainty originating from the jet energy scale is obtained by scaling the p T of each jet in the simulation up and down by one standard deviation, according to the uncertainty of the jet energy scale [65]. This systematic uncertainty impacts both the true c-jet tagging efficiency and the pseudo-proper time templates. The effect of uncertainties on the jet energy resolution has been found to be negligible.

Pile-up ⟨µ⟩ reweighting
In principle, the c-jet tagging efficiency as well as its estimation using reconstructed D*+ candidates may be affected by event pile-up. As for the b-jet efficiency measurements described in sections 8 and 9, the distribution of the average number of interactions per bunch crossing, ⟨µ⟩, in simulated events is reweighted to agree with that in the data. The evaluation of the corresponding systematic uncertainty follows the procedure described in section 8.5.

Extrapolation
The uncertainties on the extrapolation factor are obtained by varying individually each charm fragmentation fraction and the topological decay branching fractions of c hadrons as described in section 12.5. Each variation is accounted for as a separate systematic uncertainty in table 13.

Results
The measured c-jet tagging efficiencies in data, the c-jet tagging efficiencies in simulation and the resulting data-to-simulation scale factors for the MV1 tagging algorithm at 70% efficiency are shown in figure 51. The corresponding data-to-simulation scale factors, after the extrapolation correction described in section 13.2, are shown in figure 52. [...] in the W + c analysis. With conservative assumptions on the correlations of systematic uncertainties between the two analyses and bin-to-bin correlations within the D* analysis, the weighted scale factor of the D* analysis is larger than that of the W + c analysis by about one standard deviation.

Mistag rate calibration
The mistag rate is defined as the fraction of light-flavour jets that are tagged by a b-tagging algorithm. The mistag rate is measured in data, using an inclusive sample of jets, with the negative tag method which is described in the following.

Data and simulation samples
The event sample for the mistag rate measurement was collected using a logical OR of inclusive jet triggers, analogous to what was done in the D* analysis (see section 13.1).
The analysis also makes use of simulated inclusive jet samples, similar to those used in the muon-based b-jet tagging efficiency measurements and the D* c-jet tagging efficiency measurement, but without any muon or D*+ filter. About 2.8 million events have been simulated per p̂ ⊥ bin.

The negative tag method
Light-flavour jets are tagged as b jets mainly because of the finite resolution of the Inner Detector and the presence of tracks stemming from displaced vertices from long-lived particles or material interactions. Prompt tracks that are seemingly displaced, due to the finite resolution of the tracker, will as often appear to originate from a point behind as in front of the primary vertex with respect to the jet axis. In other words, the lifetime-signed impact parameter distribution of these tracks as well as the signed decay length of vertices reconstructed with these tracks are expected to be symmetric about zero.
The inclusive tag rate obtained by reversing the impact parameter significance sign of tracks for impact parameter based tagging algorithms, or reversing the decay length significance sign of secondary vertices for secondary vertex based tagging algorithms, is therefore expected to be a good approximation of the mistag rate due to resolution effects. For the SV0 algorithm, which is a basic secondary vertex based algorithm where the tag weight w is the signed decay length significance of the reconstructed secondary vertex, a jet is considered negatively tagged if it contains a secondary vertex with decay length significance w < −w cut rather than decay length significance w > w cut , where w cut is the reference weight cut value for a particular efficiency. For advanced tagging algorithms, based on likelihood ratios or neural networks, the negative tag rate is instead computed in a more complex way, defining a negative version of the tagging algorithm which internally reverses the impact parameter and the decay length selections. For such algorithms, a jet is considered negatively tagged if it has a negative tag weight w neg > w cut rather than the standard w > w cut . Figure 53 shows the tag weight distribution of the SV0 algorithm, as well as the standard and negative weight distributions for the IP3D+JetFitter algorithm. For the SV0 algorithm and for not too large weights (for reference a 50% b-jet efficiency is obtained with a requirement w > 5.65), the tag weight distribution is almost symmetric about zero for light-flavour jets, and the negative side of the tag weight distribution is dominated by light-flavour jets. For the IP3D+JetFitter algorithm, the standard and negative tag weight distributions for light-flavour jets are similar in shape, while the tag weight distributions for b and c jets differ substantially. For reference the weight cut value w cut for the IP3D+JetFitter algorithm, corresponding to a b-jet efficiency of 70%, is 0.35.
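For the SV0 case, the positive and negative tag definitions can be sketched in a few lines of Python. The function name and the convention of using `None` for jets without a reconstructed secondary vertex are assumptions made for this illustration.

```python
def sv0_tag_rates(weights, w_cut):
    """Return (positive tag rate, negative tag rate) for a list of jets.

    Each entry of `weights` is the signed decay length significance of the
    secondary vertex in the jet, or None if no vertex was found.
    """
    n_jets = len(weights)
    n_pos = sum(1 for w in weights if w is not None and w > w_cut)
    n_neg = sum(1 for w in weights if w is not None and w < -w_cut)
    return n_pos / n_jets, n_neg / n_jets
```

For the advanced algorithms no such simple mirror cut exists, since the negative weight w neg comes from re-running a sign-flipped version of the tagging algorithm; that case is not covered by this sketch.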
The mistag rate ε l is then approximated by the negative tag rate of the inclusive jet sample, ε neg inc . The approximation would be exact if the negative tag weight distribution were identical for all jet flavours and identical to the normal tag weight distribution. In reality, two correction factors are applied to relate ε neg inc to ε l .
• The negative tag rate for heavy-flavour, b and c jets, differs from the negative tag rate for light-flavour jets. b and c jets are positively tagged mainly because of the measurable lifetimes of b and c hadrons, shifting the decay length significance distributions towards larger values. However, effects like the finite jet direction resolution can flip the sign of the discriminating variable, increasing significantly the negative tag rate for b and c jets. The correction factor $k_{\mathrm{hf}} = \varepsilon^{\mathrm{neg}}_{l} / \varepsilon^{\mathrm{neg}}_{\mathrm{inc}}$ is defined to account for this effect. Because of the effects described above and the relatively small fractions of b and c jets in the inclusive sample, k hf is typically smaller than, but close to, unity.
• A symmetric decay length or impact parameter significance distribution for light-flavour jets is only expected for fake secondary vertices arising e.g. from track reconstruction resolution effects. However, a significant fraction of reconstructed secondary vertices have their origin in charged particle tracks stemming from long-lived particles (K 0 s , Λ 0 etc.) or material interactions (hadronic interactions and photon conversions). These vertices will show up mainly at positive decay length significances and thus cause an asymmetry for the positive versus negative tag rate for light-flavour jets. The correction factor $k_{ll} = \varepsilon_{l} / \varepsilon^{\mathrm{neg}}_{l}$ is defined to account for this effect. Because of the sources in light-flavour jets showing positive decay length, k ll is larger than unity. In particular k ll for the MV1 algorithm ranges, depending on jet p T and on the operating point, between 1 and 13.
The measured negative tag rate value ε neg inc is converted to the mistag rate ε l using the two correction factors k hf and k ll defined above: $\varepsilon_{l} = \varepsilon^{\mathrm{neg}}_{\mathrm{inc}}\, k_{\mathrm{hf}}\, k_{ll}$. Both correction factors are derived from simulated events.
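The conversion can be written out as a minimal Python sketch, with the correction factors taken from simulated rates and then applied to the negative tag rate measured in data. Function names and the numbers in the usage note are illustrative, not values from the analysis.

```python
def correction_factors(eps_l_sim, eps_neg_l_sim, eps_neg_inc_sim):
    """Return (k_hf, k_ll) from simulated rates:
    k_hf = eps_neg_l / eps_neg_inc and k_ll = eps_l / eps_neg_l."""
    k_hf = eps_neg_l_sim / eps_neg_inc_sim
    k_ll = eps_l_sim / eps_neg_l_sim
    return k_hf, k_ll


def mistag_rate(eps_neg_inc, k_hf, k_ll):
    """Mistag rate from the inclusive negative tag rate and the two corrections."""
    return eps_neg_inc * k_hf * k_ll
```

By construction the product k hf k ll equals ε l / ε neg inc in simulation, so applying the factors to the simulated negative tag rate returns the simulated mistag rate exactly. With simulated rates of 0.012, 0.008 and 0.010 for ε l , ε neg l and ε neg inc , the factors are k hf = 0.8 and k ll = 1.5.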

Systematic uncertainties
The systematic uncertainties on the mistag rate scale factor $\kappa^{\mathrm{data/sim}}_{l} \equiv \varepsilon^{\mathrm{data}}_{l} / \varepsilon^{\mathrm{sim}}_{l}$ of the MV1 tagging algorithm at 70% efficiency are shown in tables 14 and 15, for jets with |η| ∈ [0, 1.2] and |η| ∈ [1.2, 2.5], respectively. In cases where the number of simulated events is not sufficient to evaluate with sufficient precision the effect of a given systematic variation, the systematic uncertainty has been evaluated by merging two adjacent jet p T bins.

Simulation statistics
The statistical uncertainties on k hf and k ll have been propagated through the analysis. The resulting uncertainties range between 1 and 11% (between 1% and 6% for the 70% operating point).

Data taking period dependence
The negative tag analysis has been carried out in three different data taking periods, and half of the largest difference between the results in different data taking periods is assigned as an uncertainty. These differences between periods (of up to 5.6%) may be related to biases introduced by the trigger selection (the inclusive jet triggers used in this analysis have undergone substantial changes in the prescale factors applied to them with evolving instantaneous luminosity) that have not been fully modelled in the selection of simulated events.
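The prescription of taking half of the largest difference between periods amounts to the following short Python sketch; the function name and the example values in the usage note are hypothetical.

```python
def half_spread(results):
    """Half of the largest difference between any two results."""
    if len(results) < 2:
        raise ValueError("need at least two results to form a difference")
    return 0.5 * (max(results) - min(results))
```

For three hypothetical period results of 1.20, 1.26 and 1.17, the assigned uncertainty would be 0.045.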

Jet vertex fraction
A jet vertex fraction requirement is imposed for the measurement, which helps to suppress the jets originating from pile-up. The dependence on the cut is studied by removing the requirement and repeating the measurement. The resulting uncertainties are below 1%.

Jet energy scale and resolution
A bias in the jet energy measurement in simulation compared to data will result in biases in the correction factors k hf and k ll if there is a correlation between the jet energy and these quantities. As the mistag rate increases with the jet energy, a shift in the jet energy scale in simulated events will also lead to an apparent mismatch between the mistag rate in data and simulated events.
To study this effect, the reconstructed jet energies were alternately shifted up and down by the uncertainty on the jet energy scale [6]. Half of the full difference of the corresponding shifts of the mistag rates is assigned as a systematic uncertainty. The resulting uncertainties are below 2%.
A bias of jet energy resolution between data and simulation is corrected with a smearing function applied to simulated events, which leads to a migration of jets between neighbouring p T bins. The effective difference from nominal is below 5%.

Trigger bias
The negative tag analysis uses the two leading jets in each event and requires them to be in a back-to-back configuration (∆φ > 2). As generally only one of the two leading jets is in the jet p T region where the inclusive jet trigger used to select the events is fully efficient, any mismodelling in the simulation of the trigger turn-on behaviour can lead to analysis biases. For example, it is found that the number of tracks associated to the leading and sub-leading jets in a given jet p T interval differs in data but not to the same extent in simulated events. In order to account for possible trigger biases, the measurement has been repeated using only jets with sub-leading p T . The variation in the mistag rate is taken as a systematic uncertainty. The trigger bias systematic uncertainty is one of the dominant uncertainties in the negative tag analysis, and is generally between 5 and 10%.

Heavy flavour fractions
The fractions of b and c jets enter directly into the correction factor k hf for the negative tag analysis. Relative uncertainties on the b- and c-jet fractions of 10% and 30% have been propagated through the analysis, resulting in uncertainties which are generally below 5%. These uncertainties are obtained from comparing these fractions as obtained from simulation with their estimates obtained by fitting templates of the distributions of the invariant mass of tracks significantly displaced from the primary vertex to the data.
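The dependence of k hf on the flavour fractions can be illustrated with the Python sketch below. It assumes that the inclusive negative tag rate is the fraction-weighted sum of the per-flavour negative tag rates and that the light-flavour fraction absorbs any change in the b- or c-jet fraction; both the function names and these choices are assumptions of the sketch, not a description of the analysis code.

```python
def heavy_flavour_correction(f_b, f_c, eps_neg_l, eps_neg_b, eps_neg_c):
    """k_hf = eps_neg_l / eps_neg_inc for given flavour fractions."""
    f_l = 1.0 - f_b - f_c
    eps_neg_inc = f_l * eps_neg_l + f_b * eps_neg_b + f_c * eps_neg_c
    return eps_neg_l / eps_neg_inc


def fraction_variations(f_b, f_c, eps_neg_l, eps_neg_b, eps_neg_c,
                        rel_b=0.10, rel_c=0.30):
    """Relative change of k_hf when the b (c) fraction is varied by 10% (30%)."""
    nominal = heavy_flavour_correction(f_b, f_c, eps_neg_l, eps_neg_b, eps_neg_c)
    varied = {
        "b_up": (f_b * (1.0 + rel_b), f_c),
        "b_down": (f_b * (1.0 - rel_b), f_c),
        "c_up": (f_b, f_c * (1.0 + rel_c)),
        "c_down": (f_b, f_c * (1.0 - rel_c)),
    }
    return {
        name: heavy_flavour_correction(fb, fc, eps_neg_l, eps_neg_b, eps_neg_c)
        / nominal - 1.0
        for name, (fb, fc) in varied.items()
    }
```

Whenever the heavy-flavour negative tag rates exceed the light-flavour one, increasing a heavy-flavour fraction raises the inclusive rate and therefore lowers k hf .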

Heavy flavour tagging efficiencies
The negative-tagging efficiencies for b and c jets directly enter into the negative tag analysis through the correction factor k hf . These efficiencies as obtained from simulation have been varied by 20% and 40%, respectively. The variations used for the b- and c-jet tagging efficiencies have conservatively been doubled compared to the uncertainties quoted in sections 8 and 13 to account for the extrapolation from the positive-tagging efficiency to the negative one. The resulting uncertainties are generally below 10%.

Long-lived particle decays, material interactions, fake tracks
The products from decays of long-lived particles, e.g. K s , Λ 0 , hadronic interactions or photon conversions in the detector material (mainly interactions in the first material layers of the detector), may cause reconstructed secondary vertices in light-flavour jets. While the secondary vertex based algorithms apply a veto to secondary vertices consistent with these decays or interactions, not all of them can be detected and there is a sizable fraction of vertices where one track arising from such decays or interactions is paired with a track from a different source into a vertex. Fake or badly measured tracks may also give rise to additional vertices.
To estimate the resulting systematic uncertainty from an imperfect modelling of the rate of such vertices in simulated events, the fractions of jets containing fake tracks, long-lived particles like K s and Λ 0 , or material interactions have been varied based on estimates in data, before the application of their suppression criteria. The modelling of fake tracks is evaluated using the fraction of jets containing tracks with χ 2 /N dof > 3. The difference in the fraction of such tracks between data and simulated events is found to be 30%, which is assigned as a systematic uncertainty. The fractions of jets with K s or Λ 0 decays in data and simulation are compared by counting the number of events in the K s and Λ 0 mass peaks. As the fraction of reconstructed K s and Λ 0 candidates is consistent between data and simulation, the statistical uncertainty of the estimate, which is approximately 10%, is used as a systematic uncertainty. Finally, the uncertainty associated with jets with a hadronic interaction or photon conversion is estimated in simulated events, considering jets containing a selected track produced at a radial distance from the beam line r > 25 mm. About 80% of all jets have at least one such track, and a systematic uncertainty of 10% is assigned to this fraction, based on the precision with which the material in the detector is known. The resulting uncertainties are up to 7%, with the largest effects originating from hadronic interactions.

Track multiplicity
The simulation does not properly reproduce the multiplicity of tracks associated with jets observed in data. This could be due to imperfect modelling of fragmentation, differences in the relative fraction of quark and gluon jets in the light-flavour sample, or differences between data and simulation in the track reconstruction efficiency in the core of jets where the track density is high. A higher track multiplicity implies a larger probability of accidentally tagging a light-flavour jet as a b jet for purely combinatorial reasons. The systematic uncertainty in the negative tag analysis due to the track multiplicity is estimated by reweighting the jet sample according to the ratio of distributions of the number of tracks associated to jets in data and simulation. The effect of the track multiplicity reweighting ranges between approximately 5% at low jet p T and over 40% at high jet p T in the forward region. The track multiplicity systematic uncertainty affects the higher jet p T bins more because the discrepancy between data and simulation is larger in this region, presumably due to an imperfect modelling of track reconstruction in the core of high-p T jets in simulated events as well as to an imperfect description of the track multiplicities over a wide range of jet transverse momenta in the generator.
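The reweighting can be sketched in Python as follows. Per-jet weights are taken as the ratio of the normalised track multiplicity distributions in data and simulation, and the tag rate in simulation is then recomputed with these weights. The function names and the list-based inputs are assumptions made for this illustration.

```python
from collections import Counter


def multiplicity_weights(data_ntrk, sim_ntrk):
    """Weight per track multiplicity: normalised data / normalised simulation."""
    data_counts = Counter(data_ntrk)
    sim_counts = Counter(sim_ntrk)
    n_data, n_sim = len(data_ntrk), len(sim_ntrk)
    return {
        n: (data_counts.get(n, 0) / n_data) / (sim_counts[n] / n_sim)
        for n in sim_counts
    }


def reweighted_tag_rate(sim_ntrk, sim_tagged, weights):
    """Tag rate in simulation after applying the per-jet multiplicity weights."""
    total = sum(weights[n] for n in sim_ntrk)
    tagged = sum(weights[n] for n, t in zip(sim_ntrk, sim_tagged) if t)
    return tagged / total
```

The difference between the reweighted and the nominal rate then gives the size of the effect. If jets with more tracks are tagged more often and data have more tracks per jet than simulation, the reweighted rate is higher than the nominal one.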

Impact parameter resolution
The secondary vertex reconstruction is very sensitive to the tracking resolution and the proper estimation of the track parameter errors, especially in light-flavour jets where a large contribution of fake vertices is present. It has been shown in section 7.1 that the track impact parameter resolutions in simulation are slightly better than those in data. Therefore, track impact parameters in the simulation have been smeared in order to bring data and simulation into better agreement. The chosen smearing approach does not take into account correlated modifications of the impact parameters of tracks that pass through the same pixel module, as would be needed to model residual misalignments in the Inner Detector. The parameters for the smearing have been chosen to cover the observed discrepancies in the impact parameter resolution between data and simulation. After having applied the track impact parameter smearing to the tracks in simulation, the primary vertex reconstruction and b-tagging have been rerun and the whole analysis repeated. The effect in the negative tag analysis is approximately 5% in the central η region but can be as large as 22% in the forward η region where the modelling of the impact parameter resolution is worse.
Given that the impact parameter sign for tracks associated with a jet depends on the direction of the tracks relative to the jet direction (unless a secondary vertex is found, as detailed in section 3.2), the finite jet angular resolution results in a degree of arbitrariness for tracks nearly aligned with the jet direction. The factors k hf and k ll therefore are sensitive to this resolution, and the uncertainty on the angular resolution in principle translates into uncertainties on k hf and k ll . In practice, however, smearing the jet directions in the simulation as done in section 8.3 has a negligible effect on the impact parameter significance distributions, and consequently on the predicted mistag rate.

Results
The measured mistag rates in data, the mistag rates in simulation and the resulting data-to-simulation scale factors for the MV1 tagging algorithm at 70% efficiency are shown in figure 54 for two different regions of the jet pseudorapidity.
For the MV1 tagging algorithm at the 70% efficiency operating point the mistag rate in data tends to be higher than in simulation, leading to data-to-simulation scale factors that are about 1.2 and 1.4 in the central and forward directions, respectively.

Mistag rate calibration of the soft muon tagging algorithm
The probability with which a light-flavour jet is tagged by the SMT algorithm, referred to as the mistag rate, is measured in data using an inclusive jet sample. The method is designed to minimise biases from heavy-flavour jets and to make minimal use of information obtained from simulation.

Data and simulation samples
To cover a wide transverse momentum range, the events are required to pass one of several inclusive jet triggers, with p T thresholds ranging between 10 and 40 GeV. Only events in which the reconstructed primary vertex has at least five tracks associated to it are considered.
The data are compared to the same set of simulated QCD jet samples used in the negative tag analysis (see section 14.1).

Mistag rate measurement
The method to measure the mistag rate with collision data is based on a system of three equations and three unknowns (among which the mistag rate). The IP3D+JetFitter lifetime tagging algorithm (see section 3.4) is used as an auxiliary tagging algorithm to enhance the inclusive jet sample in light-flavour jets. Two samples are selected from events with exactly two jets, one in which one jet is not tagged by the IP3D+JetFitter algorithm (single-veto sample) and one in which neither jet is tagged by the IP3D+JetFitter algorithm (double-veto sample). In the latter sample the fraction of heavy-flavour jets is expected to be considerably suppressed. As the amount of heavy-flavour jets in the double-veto sample largely determines the uncertainty on the mistag rate measurement, the operating point of the IP3D+JetFitter algorithm is chosen to correspond to a high efficiency (80% in simulated tt̄ events).
The number of jets in data which are tagged by the SMT in each of the two samples above is given by one equation per sample, eqs. (15.1) and (15.2), where N (N′) is the number of selected jets in the single- (double-) veto sample in data, N SMT (N′ SMT ) is the corresponding subset of jets which are also tagged by the SMT, f HF ( f′ HF ) is the fraction of heavy-flavour jets in the single- (double-) veto sample and ε SMT LF is the SMT mistag rate. Assuming that the single-veto sample is already dominated by light-flavour jets and neglecting the effect of the second veto on this component (the inaccuracies in these approximations have been verified to lead to measurement biases negligible compared to the corresponding uncertainties), f HF and f′ HF are then related by
$$f'_{\mathrm{HF}} = f_{\mathrm{HF}} \cdot (1 - \varepsilon_{\mathrm{HF}}), \tag{15.3}$$
with ε HF denoting the average of the b- and c-jet lifetime tagging efficiencies weighted by their relative fractions in the single-veto sample. Solving for ε SMT LF , one obtains eq. (15.4). The heavy-flavour efficiency of the IP3D+JetFitter algorithm, ε HF , is evaluated from true heavy-flavour jets in the simulated QCD jet sample described in section 4.2, corrected using data-to-simulation scale factors from the calibration methods described in sections 8 and 13. Since the ratio of the fractions of b and c jets in the single- and double-veto samples could be different in data and simulation, a systematic uncertainty is associated to variations of the b-to-c ratio in the mistag rate estimate, as discussed in section 15.3.
It has been observed that the estimation of the mistag rate bears a systematic bias with respect to the true value of the mistag rate for low jet transverse momenta. This has been found by comparing the true and estimated mistag rates in simulation: the method returns an estimate about 20% to 40% lower for jet p T < 40 GeV. This effect is due to the correlation between the χ 2 match cut and the muon p T , which causes a migration of light-flavour jets towards higher values of the IP3D+JetFitter weights, rendering them more heavy-flavour-like. This bias is taken into account in the treatment of the systematic uncertainties (section 15.3).

Systematic uncertainties
The systematic uncertainties considered for the mistag rate measurement in data are shown in tables 16 and 17 for the two different jet pseudorapidity regions.

Calibration of the advanced tagging algorithm
The data-to-simulation scale factors of the efficiency of the IP3D+JetFitter algorithm (which have been determined in a way similar to that described in sections 8-10) have been varied within their uncertainties, which are also comparable to those derived for the MV1 tagging algorithm.

Flavour composition
The ratio of the fractions of b and c jets in data can be different than that found in the simulated events used to calculate ε HF . The systematic uncertainty associated to the limited knowledge of the b-to-c composition in data is assessed by doubling the fraction of b jets in the simulated QCD jet sample and re-deriving the heavy-flavour tagging efficiency for the IP3D+JetFitter algorithm. This leads to a relative 3-4% difference in the mistag rate which is taken as a systematic uncertainty.
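The effect of doubling the b-jet fraction on the average heavy-flavour efficiency can be illustrated with a short Python sketch. It takes ε HF as the average of the b- and c-jet efficiencies weighted by their relative fractions; the function name and the numerical inputs in the usage note are purely illustrative.

```python
def heavy_flavour_efficiency(f_b, f_c, eps_b, eps_c):
    """Average of the b- and c-jet tagging efficiencies weighted by the
    relative b and c fractions (the fractions need not sum to one)."""
    return (f_b * eps_b + f_c * eps_c) / (f_b + f_c)
```

With hypothetical inputs f b = 0.25, f c = 0.75, ε b = 0.8 and ε c = 0.4, the nominal average is 0.50. Doubling the b fraction to 0.5 while keeping the c fraction fixed raises it to 0.56.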
Table 16. Relative systematic and statistical uncertainties, in %, on the mistag rate for the SMT tagging algorithm for jets with |η| < 1.2. Negligibly small uncertainties are indicated by dashes.

Method bias
The bias of the method's estimate with respect to the true mistag rate is evaluated in simulated events as a function of the jet $p_{\mathrm{T}}$ and η. The relative difference between the estimated and true mistag rates is summarised in tables 16 and 17. The bias is found to originate from the correlation between $\chi^{2}_{\mathrm{match}}$ and muon (or jet) $p_{\mathrm{T}}$, resulting in a slightly harder jet $p_{\mathrm{T}}$ spectrum for jets tagged by the SMT. The difference is assigned as a systematic uncertainty on the estimation of the mistag rate.

Muon momentum corrections
The effect of the muon momentum corrections on the acceptance of the SMT algorithm to light-flavour jets (from the $p_{\mathrm{T}}(\mu) > 4$ GeV cut) has been studied and found to be negligible in all cases.

Pile-up dependence
The mistag rate is studied as a function of the number of additional minimum-bias interactions. As no dependence is observed, no systematic uncertainty has been assigned. However, the available statistics do not allow a firm conclusion on the matter.

Results
The mistag rate $\varepsilon^{\mathrm{SMT,data}}_{\mathrm{LF}}$, measured in data using eq. (15.4), together with the mistag rate in simulated events, $\varepsilon^{\mathrm{SMT,sim}}_{\mathrm{LF}}$, and the data-to-simulation scale factor $\kappa^{\mathrm{SMT}}_{\mathrm{LF}} = \varepsilon^{\mathrm{SMT,data}}_{\mathrm{LF}} / \varepsilon^{\mathrm{SMT,sim}}_{\mathrm{LF}}$ are displayed in figure 55. The statistical uncertainties for jets with $p_{\mathrm{T}} < 30$ GeV are too large to allow for a meaningful measurement; the results for higher jet $p_{\mathrm{T}}$ values indicate a scale factor $\kappa^{\mathrm{SMT}}_{\mathrm{LF}}$ compatible with unity within one standard deviation.
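As an illustration of what "compatible with unity within one standard deviation" means for a ratio of two rates, the sketch below forms a scale factor and its uncertainty. It assumes the data and simulation uncertainties are uncorrelated and small enough for first-order error propagation, which is a simplification and not necessarily the treatment used in the measurement. The function names and input values are hypothetical.

```python
import math


def scale_factor(eff_data, err_data, eff_sim, err_sim):
    """Ratio eff_data / eff_sim with first-order, uncorrelated error propagation."""
    if eff_data <= 0.0 or eff_sim <= 0.0:
        raise ValueError("rates must be positive")
    kappa = eff_data / eff_sim
    # Relative uncertainties add in quadrature for a ratio of independent quantities.
    rel_err = math.hypot(err_data / eff_data, err_sim / eff_sim)
    return kappa, kappa * rel_err


def pull_from_unity(kappa, err_kappa):
    """Distance of the scale factor from one, in units of its uncertainty."""
    return (kappa - 1.0) / err_kappa


# Invented mistag rates for a single jet pT bin.
kappa, err_kappa = scale_factor(0.0040, 0.0008, 0.0042, 0.0002)
pull = pull_from_unity(kappa, err_kappa)
print(f"kappa = {kappa:.3f} +/- {err_kappa:.3f}, pull from unity = {pull:+.2f}")
```

A scale factor is compatible with unity within one standard deviation when the absolute value of the pull is below one.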

Conclusions
Figure 55. The mistag rate in data and simulation (left) and the data-to-simulation scale factor (right) for the SMT algorithm, for jets with |η| < 1.2 (top) and jets with 1.2 < |η| < 2.5 (bottom). The last bin also includes jets with $p_{\mathrm{T}} > 100$ GeV.

Several b-tagging algorithms to identify jets arising from the hadronisation of b quarks have been developed in the ATLAS collaboration. The most powerful b-tagging algorithms are based on the lifetime of b hadrons, which leaves detectable signatures in the form of charged particle tracks significantly displaced from the primary event vertex, or of secondary decay vertices in the detector. Whereas relatively simple and robust algorithms have been used already in analyses based on data collected in the very early periods of LHC running, more advanced algorithms based on sophisticated reconstruction techniques and multivariate combination methods have been provided for the analyses based on data collected from 2011 onwards. The most performant single algorithm is based on the complete reconstruction of the b-hadron decay chain involving secondary and tertiary decay vertices. This technique is used for the first time at a hadron collider experiment. To obtain the best possible performance, several b-tagging algorithms have been combined using multivariate analysis techniques like neural networks and boosted decision trees. The choice of a working point, corresponding to a given b-jet tagging efficiency and rejection of non-b jets, allows the use of the b-tagging information to be adapted to the needs of specific physics analyses. For an efficiency to identify b jets of 70%, the MV1 algorithm (the main algorithm used in ATLAS to analyse the 2011 and 2012 data) achieves rejection rates for light-flavour jets of about 100 and for c jets of about five, as estimated using simulated $t\bar{t}$ events.
For kinematic properties of jets that are particularly favourable for b-tagging, namely in the central region of the detector and for jet transverse momenta around 80-150 GeV, the rejection of light-flavour jets reaches values above 300.
In the 2011 data a significant number of additional pile-up events, leading to additional vertices along the beam line, were present. It has been shown that this level of pile-up leads to only a minor degradation of the b-tagging performance.
To increase the trigger efficiency for predominantly purely hadronic event topologies without leptons in the final state, b-tagging is also applied at the software-based trigger levels, which allows the trigger thresholds for these events to be lowered significantly.
In addition to the algorithms based on the lifetime of b hadrons, the presence of a muon from a semileptonic b-hadron decay in a jet is used in a dedicated b-tagging algorithm.
Since b-tagging algorithms rely critically on the reconstruction of charged particle tracks and the determination of their properties, dedicated studies have been performed. The impact parameter resolution of charged particle tracks, an important quantity driving the performance of b-tagging algorithms, has been measured over a wide kinematic range, after deconvoluting the pure track impact parameter resolution from the contribution of the primary vertex resolution. The simulation describes the data well, especially in the region of low track transverse momenta where the multiple scattering contribution dominates, pointing to an excellent description of the ID material. Differences at high track transverse momenta can be attributed to residual ID misalignments.
To obtain a sample where particles originating from b-hadron decays and jet fragmentation can be cleanly separated, the decay $B^{\pm} \to J/\psi(\mu^{+}\mu^{-})K^{\pm}$ has been reconstructed. This allows properties of b-hadron and fragmentation tracks to be validated separately, as well as details of the b-tagging algorithms such as the association of the different types of particles to reconstructed primary and secondary vertices. Very good agreement has been found between data and simulation.
To make use of b-tagging algorithms in data analyses and fully specify their associated systematic uncertainties, these algorithms have been calibrated using the data themselves. The comparison to the expectation from simulation is achieved through data-to-simulation scale factors for the tagging efficiencies of the different kinds of jets (b, c and light-flavour jets). These scale factors are then applied as corrections in physics analyses and their uncertainties are propagated to the final result.
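One common way of applying such scale factors in an analysis is as per-jet weights on simulated events. The sketch below follows a widely used convention that the text does not spell out and that is assumed here: a tagged jet receives the scale factor itself, while an untagged jet receives the ratio of the corrected to the simulated inefficiency, so that the overall normalisation is preserved. Function names and numbers are hypothetical.

```python
def jet_weight(is_tagged, scale_factor, eff_sim):
    """Per-jet weight correcting the simulated tagging decision to data.

    scale_factor : data-to-simulation efficiency scale factor for this jet
    eff_sim      : tagging efficiency in simulation for this jet's flavour and kinematics
    """
    if is_tagged:
        return scale_factor
    if not 0.0 <= eff_sim < 1.0:
        raise ValueError("eff_sim must lie in [0, 1) for untagged jets")
    # Inefficiency in data (1 - SF * eff) divided by inefficiency in simulation.
    return (1.0 - scale_factor * eff_sim) / (1.0 - eff_sim)


def event_weight(jets):
    """Product of per-jet weights; jets is an iterable of
    (is_tagged, scale_factor, eff_sim) tuples."""
    weight = 1.0
    for is_tagged, scale_factor, eff_sim in jets:
        weight *= jet_weight(is_tagged, scale_factor, eff_sim)
    return weight


# Invented event: one tagged b jet and one untagged light-flavour jet.
example_jets = [(True, 0.95, 0.70), (False, 1.10, 0.01)]
print(f"event weight: {event_weight(example_jets):.4f}")
```

With this choice the efficiency-weighted average of the tagged and untagged weights equals one, so the correction changes the tagged fraction without changing the total yield. Uncertainties on the final result are then obtained by repeating the calculation with the scale factors shifted up and down.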
The efficiency to tag b jets with the muon-based tagging algorithm has been calibrated using J/ψ → µµ and Z → µµ events while the rate at which light-flavour jets are misidentified as b jets has been calibrated with an inclusive jet sample.
To calibrate the efficiency of the lifetime-based tagging algorithms to tag b jets, two classes of events have been used. The first one is composed of QCD jet events containing a jet with an identified muon inside. Such a sample is enriched with b jets in which a semileptonic b-hadron decay has occurred. Two methods, $p_{\mathrm{T}}^{\mathrm{rel}}$ and system8, allow a measurement of the b-jet tagging efficiency of lifetime-based algorithms up to jet transverse momenta of 200 GeV with statistical (systematic) uncertainties in the range of 1.5% to 4% (5% to 7%). The second class consists of selected $t\bar{t}$ events, which naturally have a high b-jet content, containing either one or two isolated leptons from the decays of W bosons. Several calibration methods are exploited for these samples. The tag counting method is based on the number of identified b jets in each event, while the kinematic selection method exploits the fraction of jets in a b-enriched sample that are identified as b jets. The combinatorial likelihood method increases the precision by exploiting the kinematic correlations between the jets, while the kinematic fit method makes use of a kinematic fit to the $t\bar{t}$ system to identify the b jets. The $t\bar{t}$-based measurements allow the calibration analyses to be extended to jet transverse momenta of 300 GeV. Typical statistical (systematic) uncertainties for the combinatorial likelihood method range from 2% to 8% (2% to 8%). The $p_{\mathrm{T}}^{\mathrm{rel}}$ method has also been applied to $t\bar{t}$ events, yielding results compatible with those from the method applied to dijet events, albeit with larger statistical and systematic uncertainties.
To obtain the best possible accuracy for the b-jet efficiency data-to-simulation scale factors, three individual measurements have been combined based on statistical methods taking into account the correlations in statistical and systematic uncertainties. This combined fit shows good consistency of the different measurements and results in uncertainties between 2% and 4% for transverse momenta between 20 and 200 GeV, rising to 12% for jets with transverse momenta between 200 and 300 GeV.
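How correlations enter such a combination can be illustrated with the best linear unbiased estimate (BLUE), a standard technique that is swapped in here for illustration and is not claimed to be the procedure used for the combination described above. The sketch takes a full covariance matrix, in which the off-diagonal terms carry the statistical and systematic correlations, and returns the combined value, its uncertainty and the weights. All inputs are hypothetical.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-300:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def blue_combine(values, covariance):
    """Combine correlated measurements of one quantity.

    Returns (combined value, combined uncertainty, weights)."""
    ones = [1.0] * len(values)
    cinv_ones = solve_linear(covariance, ones)  # C^{-1} 1
    norm = sum(cinv_ones)                       # 1^T C^{-1} 1
    weights = [w / norm for w in cinv_ones]
    combined = sum(w * v for w, v in zip(weights, values))
    return combined, math.sqrt(1.0 / norm), weights


# Two invented scale-factor measurements with a positive covariance.
measurements = [0.98, 1.02]
cov = [[0.0016, 0.0006],
       [0.0006, 0.0025]]
combined, sigma, weights = blue_combine(measurements, cov)
print(f"combined = {combined:.4f} +/- {sigma:.4f}, weights = {weights}")
```

The weights sum to one by construction, and the more precise measurement receives the larger weight; a positive correlation reduces the gain in precision relative to combining independent measurements.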
Two novel methods have been developed to measure the efficiency to tag c jets. The first one is based on a sample where a c jet, identified through a semileptonic decay of a c hadron into a muon, is produced in association with a W boson. Exploiting the correlation of the charges between the muon in the c jet and the W boson provides a c-jet sample with very high purity. The resulting c-jet tagging efficiency scale factors have uncertainties between 5% and 13%, depending on the chosen b-tagging operating point. The second method is based on the exclusive reconstruction of the decay $D^{*+} \to D^{0}(K^{-}\pi^{+})\pi^{+}$, which allows a sample of c jets to be defined after the subtraction of the b-jet contribution. The statistical and systematic uncertainties on the data-to-simulation scale factors are about 10% and between 10% and 20%, respectively. Since both methods are based on sub-samples of specific c-hadron decays, a consistent procedure has been developed to obtain results valid for an inclusive sample of c jets. These two methods have been adopted for the first time to measure the efficiencies of b-tagging algorithms for c jets.
The rate to misidentify light-flavour jets as b jets has been measured on a sample of QCD jet events using the negative tag method. This method is based on modified versions of the algorithms where the signs of quantities sensitive to b-hadron lifetimes have been inverted. The uncertainties for the individual measurements extending up to jet transverse momenta of 750 GeV are typically in the range from 20% to 50% with a close to negligible statistical contribution.
The b-tagging algorithms discussed in this paper and their data-to-simulation scale factors derived in the calibration analyses have been applied in many ATLAS physics analyses covering a wide range of physics processes.