The calculation of player speed from tracking data

This short communication considers the calculation of player speed from tracking data. Whereas there are many player tracking systems, all rely on the collection of Cartesian coordinates corresponding to the players on the pitch. From these Cartesian coordinates, there are many ways that one could approximate player speed and acceleration. We introduce some simple principles from exploratory data analysis, which help yield more reliable speed calculations. The general principles are illustrated on various player tracking systems.


Introduction
In the past decade, the advent of player tracking data has sparked a revolution in sports analytics. 1 With player tracking data, analysts have access to the Cartesian coordinates of each player on the pitch where the observations are recorded frequently (e.g. 10 times per second). The availability of such detailed data provides opportunities to investigate sporting questions that were previously unimaginable. Gudmundsson and Horton 2 provide a review paper on spatio-temporal analyses used in invasion sports where player tracking data are available.
Currently, player tracking systems are expensive, and consequently, tracking data are only collected in "big" sports such as basketball (the National Basketball Association), soccer (various leagues and competitions), football (the National Football League) and hockey (the National Hockey League). Tracking data are not only collected during matches but also during workout sessions where fitness, training and health considerations are main concerns.
Tracking data are typically proprietary and are supplied by service providers using various technologies. 3 There are four prominent technologies: (1) global positioning systems (GPS), (2) local positioning systems (LPS), (3) inertial measurement units (IMU) and (4) optical tracking (OT) systems.
OT systems are fundamentally different as they do not require wearable devices and do not directly determine player coordinates. Instead, OT technology requires advanced camera systems and player recognition software to evaluate player coordinates. No matter which technology is utilized, tracking systems begin with the collection of the (x, y) coordinates of the participants measured at frequent time intervals. With the coordinates, various statistics can be calculated or approximated (e.g. speed, acceleration, distance traveled, etc.).
In this paper, we are concerned with derivative calculations associated with tracking data coordinates. Specifically, we are interested in the approximation of player speed, which is an important statistic in sports analytics and sports science. For example, Wu and Swartz 4 require player speeds in soccer to assess off-the-ball activity. They introduce a measure which addresses defensive anticipation. Buchheit et al. 5 use regression methodology to determine factors that are associated with player speed in soccer. For example, horizontal force and horizontal power were seen to be associated with speed. Oliva-Lozano et al. 6 characterize positional differences in soccer based on acceleration and sprint profiles. Related to speed, Shen et al. 7 analyze pace of play in soccer, and conclude that pace increases with decreasing team quality, which indicates the importance of playing with pace. From a training and performance perspective, Ferrari Bravo et al. 8 demonstrate that sprint-training significantly increases both aerobic and anaerobic performances in soccer. Naturally, different applications require different levels of accuracy. For example, in sports science, critical velocity is an active research field which relies on highly accurate measurements of speed. 9 Much has been written on the accuracy of various tracking data technologies. For example, Mara et al. 10 considered the displacement accuracy of an OT system, Tan et al. 11 investigated the validity and accuracy of a GPS system, and Pino-Ortega et al. 12 provided a review of the validity and reliability of LPS systems against other devices. Massard et al. 13 questioned the need for sprint testing based on the comparison of GPS match and field-testing data. However, all of these investigations rely on some measure of the truth against which tracking measurements are compared.
What should experimenters do if they do not have access to the truth and they are unsure of the accuracy of speed calculations obtained from tracking data? This paper introduces some simple principles from exploratory data analysis that assist experimenters to obtain more reliable estimates of speed.
In Section "Data," we describe the datasets upon which our methods are illustrated, and we describe how player speed is calculated from tracking data coordinates. In Section "Exploratory analyses," some simple exploratory plots are introduced that help the analyst obtain more reliable speed calculations. We conclude with a short discussion in Section "Discussion."

Data
We have access to tracking data from matches during the 2019 season of the Chinese Super League (CSL). The CSL uses OT technology (previously discussed) provided by Stats Perform where observations were recorded 10 times per second. The tracking data consist of roughly one million rows per match measured on seven variables. Each row corresponds to a particular player at a given instant in time. The soccer tracking data were initially provided as xml files, and were processed in R for further analysis. In Table 1, we present three rows of the soccer tracking data. Here we observe (x, y) coordinates and player identifiers at every 1/10th of a second. The entries are mostly intuitive except perhaps for the (x, y) coordinates, which refer to the player location on a 105 m by 68 m soccer field. For example, (x, y) = (−52.5, 0) corresponds to the middle of the goal line on the left-hand side of the soccer field.
Our second dataset corresponds to tracking data from the National Football League (NFL). Unlike the OT soccer data, the NFL data were based on GPS technology, but were also collected using 10 Hz sampling frames. The data were used in the 2019 Big Data Bowl competition and are publicly available at https://github.com/nflfootball-ops/Big-Data-Bowl.
Here we use data corresponding to a single deep pass play by the wide receiver Brandin Cooks of the New England Patriots, taken from a 7-second interval during the 7 September 2017 match against the Kansas City Chiefs. In Table 2, we present three rows of the football tracking data. Here we observe a similar structure to the tracking data in soccer. The football tracking data include the (x, y) coordinates for players measured in yards, where x refers to the player position along the long axis of the field ranging from 0 to 120 yards, and y refers to the player position along the short axis of the field ranging from 0 to 53.3 yards. For instance, (x, y) = (0, 0) corresponds to the bottom left of the football field. The remaining variables in Table 2 are mostly intuitive, where dis corresponds to the distance traveled from the previous frame (i.e. previous 1/10th second) and dir corresponds to the angle of player motion in degrees. The frame.id is the frame identifier for each frame, which resets to one for each play.

Speed calculations
We emphasize that the approach that we introduce is general and straightforward. It can be utilized using any tracking technology in any sport. However, knowledge of the sport dictates our interpretation of the exploratory plots.
Consider then a particular player where our interest concerns the calculation of their speed. If (x(t), y(t)) denotes the location of the player at time t, then the player's speed at time t is defined by

$$ s(t) = \lim_{h \to 0^{+}} \frac{\sqrt{(x(t+h) - x(t))^2 + (y(t+h) - y(t))^2}}{h}. \tag{1} $$

In words, formula (1) is the limiting change in the distance traveled with respect to time. Of course, (1) is a mathematical expression based on taking a limit, and is not a quantity that can be calculated from data. Instead, with tracking data, the player's locations are obtained at regular times which are denoted by (x_1, y_1), (x_2, y_2), ..., (x_n, y_n).
Here, the subscripts i = 1, ..., n of the Cartesian coordinates refer to the time increments. Therefore, assuming that t corresponds to an observed time increment from the tracking data, it is reasonable to approximate s(t) in (1) by

$$ \hat{s}(t) = \frac{\sqrt{(x_{t+\Delta} - x_{t-\Delta})^2 + (y_{t+\Delta} - y_{t-\Delta})^2}}{2\Delta}, \tag{2} $$

where Δ = 1, 2, ... is an increment that needs to be specified. In our illustration with 10 Hz data, the value Δ = 1 corresponds to 1/10th of a second, so that the denominator 2Δ corresponds to 0.2 s.
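The symmetric difference described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the analyses in this paper: the function name `central_speed`, the `hz` argument and the choice to return NaN at frames where t − Δ or t + Δ is unobserved are all assumptions made for the example.

```python
import numpy as np

def central_speed(x, y, delta=1, hz=10.0):
    """Estimate speed by the symmetric difference over 2 * delta frames.

    x, y  : coordinate sequences of equal length, sampled at a constant rate
    delta : increment in frames (1, 2, ...)
    hz    : sampling rate in frames per second (assumed constant)
    Returns an array of the same length as x, with NaN wherever
    frame t - delta or t + delta is not observed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D sequences of equal length")
    if delta < 1:
        raise ValueError("delta must be a positive integer")
    n = x.size
    speed = np.full(n, np.nan)
    if n > 2 * delta:
        dx = x[2 * delta:] - x[:-2 * delta]   # x[t + delta] - x[t - delta]
        dy = y[2 * delta:] - y[:-2 * delta]   # y[t + delta] - y[t - delta]
        # 2 * delta frames span 2 * delta / hz seconds
        speed[delta:n - delta] = np.hypot(dx, dy) / (2 * delta / hz)
    return speed
```

With coordinates in meters and `hz=10.0`, the result is in m/s; `delta=1` averages over 0.2 s and `delta=4` over 0.8 s.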
We have simplified the discussion by referring to speed. The approximation of velocity is also of interest where velocity has a directional component in addition to the scalar quantity speed. Note that acceleration calculations are also important, and are obtained as derivatives of speed.
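As a hedged sketch of how velocity and acceleration relate to the speed estimate, the same symmetric difference can be applied to each coordinate separately (giving velocity components) and then to the resulting speed series (giving the rate of change of speed). The helper names `central_diff` and `velocity_speed_acceleration` are hypothetical, and reusing the same Δ for both differencing steps is an assumption of this example, not a recommendation from the text.

```python
import numpy as np

def central_diff(values, delta=1, hz=10.0):
    """Symmetric difference (v[t+delta] - v[t-delta]) / (2*delta/hz), NaN at the ends."""
    v = np.asarray(values, dtype=float)
    out = np.full(v.size, np.nan)
    if v.size > 2 * delta:
        out[delta:v.size - delta] = (v[2 * delta:] - v[:-2 * delta]) / (2 * delta / hz)
    return out

def velocity_speed_acceleration(x, y, delta=1, hz=10.0):
    """Return velocity components, speed, and the rate of change of speed."""
    vx = central_diff(x, delta, hz)          # velocity along x
    vy = central_diff(y, delta, hz)          # velocity along y
    speed = np.hypot(vx, vy)                 # magnitude of the velocity vector
    accel = central_diff(speed, delta, hz)   # derivative of speed (scalar)
    return vx, vy, speed, accel
```

Because acceleration is a difference of differences, it is only available 2Δ frames in from each end of the series, and any error in the coordinates is divided by the time step twice.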

Exploratory analyses
Whereas the estimand s(t) in (1) is an instantaneous speed, its estimate ŝ(t) in (2) is an average speed taken over the time period 2Δ. It may therefore appear that smaller values of Δ will yield better estimates. However, this needs to be balanced against the fact that player coordinates (x_t, y_t) are subject to measurement error, as is the time interval 2Δ. Therefore, inaccuracies in the speed estimates are propagated from inaccuracies in the raw data.
To theoretically investigate the magnitude of error in speed via measurement error in the numerator of (2), we consider the true speed Δ_l/(2Δ), which denotes the change in location Δ_l divided by the change in time, where Δ denotes the previously defined incremental step size in time. With measurement error present, we denote the observed speed by (Δ_l ± E)/(2Δ), where E denotes a fixed error in the location measurement corresponding to the device. Then the absolute error (AE) is given by

$$ \mathrm{AE} = \left| \frac{\Delta_l \pm E}{2\Delta} - \frac{\Delta_l}{2\Delta} \right| = \frac{E}{2\Delta}. \tag{3} $$

We note that the AE (3) decreases as the increment Δ increases.
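Under the definitions above, the absolute error follows by direct subtraction, and its dependence on Δ can be written out for 10 Hz data. The worked lines below are a sketch: the value E = 0.1 m in the final line is purely hypothetical and is chosen only to make the arithmetic concrete.

```latex
\begin{align*}
\mathrm{AE}
  &= \left| \frac{\Delta_l \pm E}{2\Delta} - \frac{\Delta_l}{2\Delta} \right|
   = \frac{E}{2\Delta}, \\[4pt]
\Delta = 1 \ (2\Delta = 0.2\ \mathrm{s}):\quad
  & \mathrm{AE} = E / 0.2\ \mathrm{s}, \\
\Delta = 2 \ (2\Delta = 0.4\ \mathrm{s}):\quad
  & \mathrm{AE} = E / 0.4\ \mathrm{s}, \\
\Delta = 4 \ (2\Delta = 0.8\ \mathrm{s}):\quad
  & \mathrm{AE} = E / 0.8\ \mathrm{s}, \\[4pt]
\text{hypothetical } E = 0.1\ \mathrm{m}:\quad
  & \mathrm{AE} = 0.5,\ 0.25,\ 0.125\ \mathrm{m/s}
    \text{ for } \Delta = 1, 2, 4.
\end{align*}
```

For a fixed location error, moving from Δ = 1 to Δ = 4 therefore reduces the absolute error in speed by a factor of four.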

Soccer example
To begin our investigation, Figure 1 provides a plot of the locations of a player from the CSL dataset taken during a 29-second interval where he is known to be running fast during portions of the interval. When a player is running fast, it is physically impossible to make sharp turns, and therefore, the smoothness of the path suggests apparent accuracy in the location measurements. However, when we take the path locations in Figure 1, and estimate speeds (2) using Δ = 1, there seems to be a significant accuracy problem. Figure 2 provides a plot of the estimated speed versus time for the selected path. In Figure 2, we observe that there are many instances where the player has a recorded speed which increases (or decreases) by roughly 1.0 m/s in the subsequent 1/10th second, and then returns to the baseline speed 1/10th of a second later. When speeds are recorded in the (0, 8) m/s range, frequent fluctuations of this magnitude do not seem plausible. The problem here is that the location measurements were recorded to one decimal place on the meters scale, and therefore, the rounding error is magnified in (2) when dividing by 2Δ, which corresponds to 0.2 s.
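The size of these fluctuations is consistent with a simple worst-case rounding argument. The sketch below assumes that each coordinate is rounded to the nearest 0.1 m, so that each recorded coordinate is off by at most 0.05 m, and it ignores every other source of error. The symbols ε_x and ε_y are introduced here only for this illustration and denote the errors in the coordinate differences.

```latex
\begin{align*}
|\varepsilon_x| &= \left| (\text{error in } x_{t+\Delta}) - (\text{error in } x_{t-\Delta}) \right|
  \le 0.05 + 0.05 = 0.1\ \mathrm{m},
  \qquad |\varepsilon_y| \le 0.1\ \mathrm{m}, \\[4pt]
\text{error in displacement}
  &\le \sqrt{\varepsilon_x^2 + \varepsilon_y^2}
  \le \sqrt{0.1^2 + 0.1^2} \approx 0.141\ \mathrm{m}, \\[4pt]
\text{error in } \hat{s}(t) \text{ with } \Delta = 1
  &\le \frac{0.141\ \mathrm{m}}{0.2\ \mathrm{s}} \approx 0.71\ \mathrm{m/s}.
\end{align*}
```

Since successive estimates can err in opposite directions, jumps of roughly 1 m/s between adjacent frames are within what rounding alone could produce under these assumptions.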
A remedy to the estimation of the instantaneous speed s(t) is to increase the time increment Δ surrounding t. Increasing the length of the time interval 2Δ results in less fluctuation in the estimated speeds, which is desirable. However, this is done at the expense of moving in the direction from instantaneous speeds to average speeds. We have found that the choice Δ = 4 works well in this application. Figure 3 provides the analogous plot to Figure 2 where the time intervals have been widened to intervals of length 0.8 s. In Figure 3, we observe that the fluctuations are less pronounced, and that the plot of estimated speed versus time is smoother. For example, the fluctuations during the interval 16-18 s in Figure 2 are less believable than what is observed in Figure 3.
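The trade-off can be explored numerically before looking at real data. The following sketch uses a synthetic straight run at a constant 7.3 m/s, stores the coordinates to one decimal place, and summarizes the frame-to-frame fluctuation of the speed estimates for several values of Δ. The synthetic path, the `roughness` summary (mean absolute change between successive estimates) and all function names are assumptions made for illustration and are not taken from the CSL analysis.

```python
import numpy as np

def central_speed(x, y, delta, hz=10.0):
    """Symmetric-difference speed at interior frames only (length n - 2*delta)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x[2 * delta:] - x[:-2 * delta]
    dy = y[2 * delta:] - y[:-2 * delta]
    return np.hypot(dx, dy) / (2 * delta / hz)

def roughness(speed):
    """Mean absolute change between successive speed estimates (m/s per frame)."""
    return float(np.mean(np.abs(np.diff(speed))))

# Synthetic straight run at a constant 7.3 m/s, sampled at 10 Hz for 30 s
frames = np.arange(300)
x_true = 0.73 * frames
y_true = np.zeros_like(x_true)

# Mimic coordinates recorded to one decimal place on the meters scale
x_obs = np.round(x_true, 1)
y_obs = np.round(y_true, 1)

# The true speed never changes, so any roughness is an artifact of rounding
summary = {d: roughness(central_speed(x_obs, y_obs, d)) for d in (1, 2, 4)}
for d, r in summary.items():
    print(f"delta={d}: roughness={r:.3f} m/s")
```

On real data the true speed does vary, so a small roughness value alone does not identify the best Δ; the summary is meant to be read alongside plots such as Figures 2 and 3 and knowledge of the sport.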
We refer back to the theoretical analysis of absolute error at the beginning of Section "Exploratory analyses." In this example, we have seen that we prefer the time increment Δ = 4 over Δ = 1. With Δ = 4, speed Δ_l/(2Δ) = 8 m/s

In applications where acceleration measurements are important, one can imagine even greater challenges since acceleration is the derivative of speed. This is illustrated in the following example.

NFL football example
In the second example, we first note that the running patterns of an NFL wide receiver differ from those of a soccer player. Typically, the wide receiver sprints over a short-time interval and does not make many changes of direction. This has implications for the estimation of speed.
In Figure 4, we provide the speed and acceleration estimates for Brandin Cooks based on a 7-second pass route. The red-lined plots correspond to estimates based on Δ = 1 (i.e. intervals of 0.2 s), and the blue-lined plots correspond to estimates based on Δ = 2 (i.e. intervals of 0.4 s). Using Δ = 1, the speed estimates appear satisfactory as there are no unrealistic fluctuations between successive estimates. When we compare the speed estimates using Δ = 1 to those using Δ = 2, there is no apparent improvement in the speed estimates. This suggests that Δ = 1 may be adequate for this application, which is a different conclusion than with the soccer data. This may point to either a difference between the OT technology and the GPS technology, or the intrinsic differences between the motions of soccer players and wide receivers in football.
When we look at the acceleration plots in Figure 4, it appears that Δ = 1 may exhibit untenable fluctuations in acceleration, especially around the 5.5-second mark. For example, from the 5.2-second to 5.9-second mark, there is a change in acceleration in each successive time step, and the acceleration follows an unlikely fluctuating pattern of up, down, up, down, down, up and down (i.e. five changes in direction). From the 5.2-second to 5.9-second mark with Δ = 2, we observe the more believable pattern of up, up, down, same, same, down and same (i.e. only one change in direction). With respect to the estimation of acceleration, Δ = 2 is preferred over Δ = 1.
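The up/down bookkeeping used above can be automated as a quick screening summary. The sketch below counts reversals in the successive changes of a series. The function name `count_direction_changes`, the `tol` argument and the rule that "same" steps are skipped rather than counted as reversals are assumptions of this example.

```python
import numpy as np

def count_direction_changes(series, tol=0.0):
    """Count reversals (up -> down or down -> up) in successive changes.

    Steps with |change| <= tol are treated as "same" and skipped, so a
    pattern such as down, same, down contributes no reversal.
    """
    steps = np.diff(np.asarray(series, dtype=float))
    signs = np.sign(steps[np.abs(steps) > tol])
    return int(np.sum(signs[1:] != signs[:-1]))

def series_from_steps(steps):
    """Build an illustrative series whose successive changes equal `steps`."""
    return np.concatenate([[0.0], np.cumsum(steps)])

# up, down, up, down, down, up, down
pattern_a = series_from_steps([1, -1, 1, -1, -1, 1, -1])
# up, up, down, same, same, down, same
pattern_b = series_from_steps([1, 1, -1, 0, 0, -1, 0])

changes_a = count_direction_changes(pattern_a)
changes_b = count_direction_changes(pattern_b)
```

Applied to acceleration estimates over a short window, a high count relative to the number of frames flags the kind of oscillation that is unlikely to reflect real movement; what counts as "high" still depends on the sport and the player's role.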

Discussion
Tracking data have provided opportunities to study problems in sports analytics that were once unimaginable. However, sound tracking data analyses require data that are reliable, and the reliability of tracking data statistics often degrades with increasingly complex statistics. We have provided some simple principles from exploratory data analysis to help experimenters derive more reliable estimates of player speed. The same principles can be utilized in the calculation of velocities and accelerations.
The principles developed here are general and can be used with any type of player tracking system in any sport. The experimenter needs to consider the estimands of interest. The experimenter also requires domain knowledge of the sport to assess whether the resultant variations in the estimates are reasonable.