A parametrized Kalman filter for fast track fitting at LHCb

We present an alternative implementation of the Kalman filter employed for track fitting within the LHCb experiment. It uses simple parametrizations for the extrapolation of particle trajectories in the field of the LHCb dipole magnet and for the effects of multiple scattering in the detector material. A speedup of more than a factor of four is achieved while maintaining the quality of the estimated track quantities. This Kalman filter implementation could be used in the purely software-based trigger of the LHCb upgrade.


Introduction
The LHCb experiment is a dedicated heavy flavour physics experiment at the LHC focusing on the study of hadrons containing b and c quarks [1]. Due to the high luminosity at the LHC and the high proton-proton interaction cross section, a sophisticated trigger system is needed to reduce the rate of collisions saved for offline analysis. During Runs 1 and 2 of the LHC, this trigger system consisted of a hardware stage, reducing the rate from 40 MHz to 1 MHz, followed by a two-stage software trigger. In the latter, the full tracking system was read out and a partial (first stage) and full (second stage) event reconstruction were performed [2]. Both software stages included a fit of selected track candidates using a Kalman filter to extract their parameters and to reject fake tracks. In addition, the software trigger allowed an online calibration and alignment of the detector [3].
During Run 3 of the LHC, LHCb will be provided with a factor five higher luminosity compared to Run 2. In this scope, most of the subdetectors are currently being replaced or upgraded [4][5][6][7] and a new trigger strategy has been developed [8]. The hardware trigger will be removed and a two-stage, fully software-based trigger will process the full 30 MHz 1 of bunch-crossing rate. In the first stage, tracks with a high transverse momentum (p T ) and primary vertices will be reconstructed. These objects are used to select events with displaced topologies typical for b-hadron and c-hadron decays, and to select high-p T objects from decays of heavy vector bosons. In the second stage, a full event reconstruction will be performed, without any requirement on the p T and including particle identification. A large number of exclusive and several universal event selections based on the decay topology will be applied.
In LHCb, track reconstruction is split into a pattern recognition and a Kalman filtering [9,10] stage. During pattern recognition, sets in each subdetector are constructed from signals that potentially result from the passage of a single charged particle. Simple parametrizations are used throughout this procedure as it is only concerned with finding the right sets of signals and not to provide the best estimate of the track parameters. During the filtering stage, an estimate for the track parameters is calculated, and fake tracks are rejected. Given that the output of the filtering stage is used for physics selections the best possible precision needs to be achieved, hence an (extended) Kalman filter is used for track fitting. Ideally, Kalman filtering of the track candidates is already performed during the first trigger stage. However, the Kalman filter which was used during Run 1 and 2 in LHCb, in the following called default Kalman, is significantly too slow. It relies on lookup tables for the magnetic field and the material distribution of the detector [11], so-called maps. In addition it uses Runge-Kutta methods to solve the differential equations necessary to propagate the particle through the regions with an inhomogeneous magnetic field. Accessing the values in the lookup table and solving the differential equations are time consuming and prohibit the usage of the current Kalman filter in the first stage of the upgraded trigger system. This conclusion is independent of the choice of computing architecture (CPU or GPU) which is used for the first trigger stage.
In this paper, a fully parametrized version of the Kalman filter in LHCb, called parametrized Kalman, is presented. It obtains precise values of track parameters and track quality variables, while relying on neither computationally costly extrapolation methods nor material or magnetic field maps.

Detector and simulation
The LHCb detector [1] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5. Its Run 3 configuration includes a high-precision tracking system consisting of a silicon-pixel vertex detector surrounding the pp interaction region [5] (VELO), a large-area silicon-strip detector (Upstream Tracker (UT)) [7] located upstream of a dipole magnet with a bending power of about 4 Tm [12], and three stations of scintillating-fibre detectors (SciFi) [7] placed downstream of the magnet. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors [6,13]. Photons, electrons and hadrons are identified by a calorimeter system consisting of an electromagnetic and a hadronic calorimeter [6,14]. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [6,15]. Given the lack of collision data at this point for Run 3, simulation is required to model the effects of the detector response, the detector acceptance and the imposed selection requirements. In the simulation, pp collisions are generated using Pythia [16] with a specific LHCb configuration [17]. Decays of unstable particles are described by EvtGen [18], in which final-state radiation is generated using Photos [19]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [20] as described in Ref. [21].

Principles
In the following, the Kalman filter formalism and its application in the LHCb track reconstruction is outlined. During Kalman filtering, the information from measurements at detector planes is successively combined to obtain optimal estimates of the track parameters. The track is represented as a set of states at fixed z-positions 2 , which are typically detector layers. Each of these states is given by x = (x, y, t x , t y , q p ) and the corresponding covariance matrix P , where t x and t y are the slopes with respect to the z axis, q the charge of the particle in units of the electron charge and p its absolute momentum.
The Kalman filter procedure needs an estimate of a state as a starting point. Filtering is then a repeated application of two steps. Firstly, the current state is extrapolated to the next detector layer, and secondly, the extrapolated state is updated using the measurement in this layer. If the track has no associated measurement in this layer, the update step is omitted. These steps can be formalized as follows: given the state (x k−1|k−1 , P k−1|k−1 ) at position z k−1 , the extrapolated state (x k|k−1 , P k|k−1 ) at position z k is given by where the extrapolation function f k (x) is given by five individual mappings f k = . This leads to the transport matrix F k as The noise matrix Q k accounts for uncertainties of the extrapolation, e.g. due to scattering at the material of the detector layers or the material in between. The extrapolated state is then combined with the measurement m k in the respective detector layer to obtain the new state estimate at the position z k , x k|k and P k|k , using the following steps: Here H k projects the estimated state vector to the measurement space in order to allow a calculation of the residual r k . The covariance matrix of this residual is given by S k and is combined with the covariance matrix of the state to obtain the Kalman gain K k . The latter defines then how the estimated state is modified by the residual. The variance of the residual is given by R k . Starting at the most upstream measurement, the measurements are successively added and the track parameters updated until the last detector layer is reached. The same procedure is repeated starting at the most downstream measurement and successively including more upstream measurements. This yields two sets of states at every measurement position, which can be combined to obtain the respective optimal state.
The quality of a track can be estimated by its χ 2 track value. The value at each measurement is given by: and χ 2 track is then simply χ 2 k after all measurements have been added using the combined, optimal states. The optimal state estimates and the measurement information can also be used to remove measurements that show a large separation from the fitted trajectory by having a large contribution to the χ 2 track value. They are therefore likely to be wrongly associated to the respective track, and are so-called outliers. Once an outlier is removed, all Kalman filter steps are performed again. This procedure can be repeated until the maximum allowed number of outliers are removed, or no more outliers are present.
The above formalism is also the basis of the Kalman filter that is currently used for track fitting in the LHCb experiment. The extrapolation functions f k are based on maps of the magnetic field along the trajectory and numerical models for the extrapolations. Their complexities range up to a fifth-order Runge-Kutta method. The noise matrices Q k are obtained by a dedicated model for the multiple scattering and a map of the material traversed by the particle.
In the parametrized Kalman filter presented in this paper, these two costly steps are replaced by simple parametrizations. The extrapolation functions f k are given by analytic expressions that allow a fast evaluation and calculation of the derivatives in Equation 3. The noise matrices Q k depend on the momentum of the particle and are parametrized by a few parameters per extrapolation step.
An important difference with respect to the default Kalman filter is the treatment of energy loss due to the interaction with the detector material. While the multiple scattering is taken directly into account, the energy loss is not part of the extrapolation functions f k , i.e. f q p k is the unity transformation. This shortcoming is compensated by choosing the momentum of the state vectors to represent the momentum at the moment of production of the particle. Thereby, the extrapolation functions also take this initial momentum as input and thus indirectly take into account all energy loss that happened on average up to the respective detector layer. The only caveat being that q p after the filtering is only the best representation of the true value at the production point of the particle.

Parametrizations
Depending on the strength of the magnetic field and the typical distance between detector layers, different empirical analytical functions for the extrapolation are used.
Inside the VELO, where the magnetic field is very weak, these functions and the noise matrix are given by: where ∆z is the extrapolation distance along the z-direction and z 0 the initial or final z coordinate for a downstream or upstream extrapolation, respectively. The parameters p V 0 , p V 1 andp V 0 top V 3 are the same for all upstream and downstream extrapolations inside the VELO. They are determined using simulated B 0 s → φφ decays within the LHCb software framework, where φ → K + K − . This simulated sample allows to create a dataset D, containing pairs of states representing two consecutive measurements of one track inside the VELO. In addition to the true state parameters obtained from the simulation, also an extrapolation of each state to the z position of the respective other state is included in the dataset. Such extrapolation is based on the default extrapolation algorithm in LHCb [11]. This dataset allows tuning the parameters employing a minimization of the following likelihood-inspired function: Here, G(x, σ x ) is a normalized Gaussian distribution centered around 0 with width σ x . The two states of each dataset entry are represented by x 1 and x 2 , and the variable s is one of the state variables, s ∈ {x, t x , y, t y }. The positive empirical constant c is chosen to be small with respect to the amplitude of the Gaussian function and softens the impact of outliers.
In a first step, the extrapolation functions f x to f tx are tuned individually, taking into account that f x depends on the previously determined parameters for f tx . These tuning minimizations employ the state vector x 2 that is obtained by the extrapolation of the state vector x 1 . This choice improves the precision of the parametrized extrapolation, by removing the effect of multiple scattering that would be present if instead the true state was chosen for x 2 .
In a second step, the parameters of the extrapolation functions are fixed, and a minimization of the following function is performed: Here, G 2 (x, y, σ y , σ y , ρ) is a normalized two-dimensional Gaussian distribution centered around 0 with widths σ x and σ y and a correlation factor ρ. The variable d is either x or y.
In this minimization, the true state vector x 2 is used in order to get the correct estimate of the parameters for the respective elements of the noise matrix Q.
Inside the UT and the SciFi detector stations, the magnetic field is significantly stronger than inside the VELO and higher order terms are needed for the extrapolation functions: The noise matrix is given in full analogy to Equation 11 with the parametersp T 0 tõ p T 3 , where T either stands for the UT or the SciFi detector. These parameters and the parameters p T 0 to p T 4 are individually determined on simulation for every step from one detector layer to the next and for the upstream and downstream extrapolation separately. The same strategy as for the tuning of the parameters related to the extrapolation inside the VELO is followed.
For the long extrapolations between the different tracking subdetectors, more sophisticated parametrizations are necessary. In the case of the step between the VELO and the UT, where the magnetic field is still weak, the extrapolation is based on two equations.
The first describes the change in momentum along the x-direction of the particle: where t x/y,UT and t x/y,V are the state variables at the first UT detector layer and the last measurement inside the VELO, respectively. The right hand side of the equation consists of an integral of the magnetic field along the trajectory of the particle. Note that the integral expression is simply a parameter which was fitted for on the dataset. The second ingredient for the extrapolation is to model the effect of the magnetic field as a single kink of the trajectory at a certain z-position z mag between the VELO and the UT: where z V and z UT are the positions of the states inside the VELO and the UT, respectively. Equation 15 can be solved for t x,UT and Equation 16 is then employed to get an expression for x UT . The unknowns in these expressions are parametrized as a function of the state variables inside the VELO: In addition, the y-position of the extrapolated state is given by: where ∆z is defined as the difference between z UT and z V . The noise matrix is defined in analogy to Equation 11 with the parametersp S 0 top S 3 . These parameters and the parameters p S 0 to p S 8 are individually determined for the upstream and downstream extrapolation. The same strategy as for the tuning of the parameters related to the extrapolation inside the VELO is followed.
The extrapolation from the UT to the SciFi detector is more delicate because it is done over a distance of more than 5 meters through a strong magnetic field. Moreover, this field is far from uniform -in particular, it varies rapidly in the upper and lower regions, close to the magnet yoke. To ensure a good quality of the global track fit, the error on the extrapolation should be well below the other sources of error, mainly multiple scattering. The chosen solution is an expansion of the magnetic deviation in powers of q/p. The parametrization aims at giving good precision for charged particles used in physics analyses, that is for trajectories which roughly come from the origin.
To do so, the ideal direction (t 0 x , t 0 y ) as the one of a particle of charge q, momentum p, starting from the origin and hitting the UT detector layer in a given point (x, y) is defined. As a good approximation, we can take t 0 x = x/z + Bq/p, t 0 y = y/z, where B is proportional to the integrated field between the origin and the UT. The deviations from the ideal direction, δt x = t x − t 0 x , δt y = t y − t 0 y , are small, so only a first order expansion in δt x , δt y is considered. Corrections of higher order would be negligible compared to multiple scattering errors.
Finally, a polynomial expansion in q/p for the ideal direction is built, and a correction in δt x , δt y with coefficients which are themselves polynomials of q/p is added: where the first two terms are the straight line extrapolation, and the next ones the curvature correction. Similar expressions are used for the other state parameters f y (x), f tx (x), f ty (x). The degrees of expansion K 1 and K 2 are tuned for each parameter to obtain the required precision. In practice K 1 = 9, K 2 = 7 for f x and f tx and K 1 = 7, K 2 = 5 for f y and f ty are used. The dependence on x, y of the coefficients A u k , B u k , C u k , with u = x, y, t x , t y , is described through a tabulation on a grid of 50×50 points regularly spaced on the rectangle defined by |x/z| ≤ 0.25, |y/z| ≤ 0.25, by steps ∆X, ∆Y . In order to avoid a systematic convexity bias of a bilinear interpolation, the values at x, y are computed by a quadratic interpolation between the tabulated values at the six closest points on the grid: if (X, Y ) is the closest one, these values are: where ε x and ε y are the signs of ξ = (x − X)/∆X and ψ = (y − Y )/∆Y , respectively. With these notations the interpolation formula for a quantity F is given by : The tabulated values are obtained using the standard Runge-Kutta method of order 4, with 20 values of q/p in the range (−1/p min , 1/p min ), with p min = 3000 MeV/c and a polynomial fit in q/p. As a consequence, they do not give a reliable result for momenta below p min . Another limitation is the larger errors on the edges of the acceptance, especially for |t y | 0.25, where the field has strong spatial variations.

Performance
A sample of simulated proton-proton collisions that include a B 0 s → φφ, φ → K + K − decay is used to compare the reconstruction quality of the parametrized and the default Kalman filter. The extrapolation of the most upstream state estimate to the beam line is the same in both filters and is based on a simplified material map of the detector [11]. Therefore, not the state near the beam line, but the state at the most upstream measurement is employed for the comparison of the two Kalman filters. Although only tracks with measurements in each of the subdetectors are considered for this study, this is in principle not a requirement for operating the parameterized Kalman filter Figure 1 compares the resolution of the momentum, the x-position and the slope t x as a function of the true momentum of a particle. Since the position and slope are nearly exclusively determined by the measurements in the VELO, where only a very weak magnetic field is present, the parametrizations of the parametrized Kalman filter are sufficient to obtain results comparable to the default Kalman filter in these variables. In contrast, the momentum estimate strongly depends on the extrapolations in regions with strong magnetic field. There, especially at momenta below 10 GeV/c, an up to 20% worse resolution is observed for the parametrized Kalman filter. The Kalman filter does not only provide an estimate of the state parameters, but also a corresponding covariance matrix. In Figure 2 the pull distributions of the estimated momentum, x-position and slope t x for the parametrized Kalman filter are shown. In all three cases, good uncertainty estimates are visible. However, in analogy to the observations made for the resolution, the pull distribution of the momentum features slightly more pronounced tails.
Besides the estimate of the state near the beam line, which is used for the reconstruction of charged particles, an important output of the Kalman filter is the fit quality described by the χ 2 track per degrees of freedom N dof . In Figure 3, this quantity is shown for the parametrized Kalman filter for real tracks coming from a particle and fake tracks consisting of random combinations of clusters. In addition, the real track efficiencies and fake track rejection rates are shown for both Kalman filter versions when applying upper bounds on this quantity. The parametrized Kalman filter shows a slightly worse but overall comparable performance in separating the two track classes.
The fitted tracks are combined to reconstruct B 0 s → φφ candidates. Figure 4 shows the invariant mass distribution of candidates based on the two Kalman filter versions. A single  Gaussian distribution and a first order polynomial are employed to model the signal peak and the combinatorial background, respectively. This yields nearly identical estimated mass resolutions of 12.8 MeV/c 2 and 12.9 MeV/c 2 for the default and the parametrized Kalman filter, respectively.
In order to compare the timing performance of the parametrized Kalman filter and the default Kalman filter, throughput studies on a machine with two Intel(R) Xeon(R) Silver 4214 processors were performed. Simulated proton-proton collisions were used in order to mimic the situation of real data taking. Depending on the configuration of the outlier removal strategy, an overall speedup factor between 4 and 5.5 with respect to the default Kalman filter was achieved. The largest speedup is achieved when no iterations for the outlier removal are performed. Singling out the calculation steps of the Kalman filter, i.e. neglecting the part of the algorithms where the measurement information is constructed, the speedup factor is even larger and ranges from 5.7 to 10.
In the case of the parametrized Kalman filter, and singling out again the calculation step of the Kalman filter, 50% of the time is spent extrapolating the states between the detector layers. Here, the extrapolation between the UT and the SciFi constitutes the biggest component with a relative fraction of 40%. The remaining Kalman filter steps, consisting of updating the states with the cluster information and the combination of upstream and downstream filtered states, are responsible for 16% and 14% of the time spent, respectively. The extrapolation to the beam line, which is based on the default LHCb extrapolation algorithm, is responsible for the remaining 20% of the time budget.

Conclusion
We presented an alternative implementation of a Kalman filter for the LHCb experiment. Based on simple parametrizations of material effects and the extrapolation through the magnetic field of the detector, this algorithm achieves a significant speedup with respect to the current implementation, while retaining comparable quality of the track parameters. In the future, further improvements of the parametrizations might allow an even better estimate of the track parameters and a subsequent speedup. Ideas currently under discussion include for example an analytic parametrization of the x and y dependence of the parameters employed in the extrapolation from the UT to the SciFi detector and a better account for the limited acceptance of low momentum particles. The version presented in this document or a future implementation might therefore be well suited for the usage in the LHCb software trigger system for Run 3 of the LHC.