A Detailed View on the Spatio-Temporal Information Content and the Arithmetic Coding of Discrete Trajectories

The trace of a moving object is commonly referred to as a trajectory. This paper considers the spatio-temporal information content of a discrete trajectory in relation to a movement prediction model for the object under consideration. The information content is the minimal amount of information necessary to reconstruct the trajectory, given the movement model. We show how the information content of arbitrary trajectories can be determined and use these findings to derive an approximate arithmetic coding scheme for trajectory information, reaching a level of compression that is close to the bound provided by its entropy. We then demonstrate the practical applicability of our ideas by using them to compress real-world vehicular trajectories, showing that this vastly improves upon the results provided by the best state-of-the-art compression schemes for spatio-temporal data.


Introduction
Gathering, storing and transmitting data on the movement of objects are common parts of many applications in ubiquitous computing. These data, referred to as trajectories, basically comprise sequences of position and time measurements. Given that storage space and transmission capacity are valuable resources, in particular in the mobile domain, it is desirable to encode trajectories efficiently, e.g., by means of compression.
In general, compression methods seek to identify and remove redundancies from an input source; in this paper, we focus on redundancy within trajectories. This redundancy results from underlying principles of object mobility, such as kinematics or Newton's laws of motion. It is widely accepted that these principles cause mobility to be predictable to some degree; for example, several approaches have been proposed that use linear models for the compression of trajectories [9,15], though non-linear models have also been discussed recently [13,18].
However, no previous work has regarded the general upper bound for trajectory compression that is given by the information content, or entropy, of such a movement trace. The contribution of this paper is the cautious consideration of the following question: given a prediction model for object movements, how much information does a trajectory contain with respect to this model and what upper compression bound does this imply? Then, we use this knowledge to construct a compression scheme based on arithmetic coding that comes very close to reaching this bound and evaluate the impact of the model parameters on the compression performance.
Throughout this work we use the vehicular domain as an example to illustrate our findings and to demonstrate their applicability to real-world data. However, the ideas presented here can be used to analyze and compress any form of trajectory, provided that a prediction model for the respective mobility can be constructed.
This paper is an extended version of [12]. As novelties, it contains more details about the code model and its components. Also, it presents new findings on the symbol alphabet and the probability distribution and their impact on the arithmetic coding performance.
In the remainder of this paper, we present related work on trajectory compression and probabilistic positioning in Section 2. We introduce our idea of information content for trajectories and how to measure it in Section 3. In Section 4, we discuss how to apply this idea to vehicular trajectories and briefly describe details of an arithmetic coder implementing our model. Both the model and the coder are evaluated in Section 5.

Related work
In the literature, the compression of movement measurements is frequently discussed in the context of Mobile Object Databases (MODs). MODs receive trajectory updates from mobile units and can handle spatio-temporal search requests. For MODs, compression techniques have been proposed that either require an already completed data collection (offline approaches, e.g., [4,6]) or that compress data on the fly (online approaches, e.g., [9,15]). For these approaches, line simplification and linear dead reckoning, both marking the current state of the art of trajectory compression, have been used. The compression performance of both is upper-bounded by optimal line simplification [10]. The authors of [3,18] approximate trajectories using so-called minimax polynomials, so that the maximum approximation error is minimized for the employed parameter values. Further compression techniques that focus on vehicular trajectories use cubic splines [13] and clothoids [1,17]; in general, these non-linear approaches attempt to model the smoothness of vehicular movements or roadways. [11] contains a detailed problem statement and a comparison of the above techniques.
In robot navigation, Probabilistic Positioning is often employed for self-positioning, e.g., within office buildings [8,20]: instead of precise positions, position probabilities for a discrete map are given. We use a similar concept by defining a probability distribution over a limited region, but do not require any map material.
In [19], navigation decisions of pigeons are analyzed based on the stochastic complexity of trajectory sections by deriving the navigational confidence. The authors of [2] propose user movement estimation for cellular networks with Markov models. They determine state transition probabilities based on relative location frequencies and use these to derive compressed position update messages. Both approaches are special cases for information content measurements, but cannot directly be generalized to arbitrary movements. In this paper, we present a formal model that not only can be seen as a generalization of these approaches but can also be adapted to any other application area.
None of the existing approaches consider a general upper bound for trajectory compression that is given by the information content of a movement trace. In this paper, we will show that doing so will lead to significant improvements in the compression ratio of trajectories.

The information content of trajectories
In this section, we will show what the information content of a trajectory is and how it can be measured. To this end, we will introduce a formal model for the entropy calculation of trajectories and discuss its components and parameters.

What is the information content of trajectories?
Any object movement, such as the migration patterns of flocks, the movement of astronomical objects or the trajectories of road vehicles, can be described by a formal model. In general, such models can be used for movement predictions of particular objects based on previous position measurements, exploiting the redundancy and predictability of movements. Since typically not all factors influencing the mobility of an object can be modeled accurately, the actual position of the object might differ from the prediction. This deviation is commonly referred to as innovation, i.e., the uncertainty of the prediction process.
In this paper, we investigate the information content of the innovations. Let us begin with the necessary information theoretical concepts; for more details see [16,22]: given a random variable X with a finite sample space A_X = {a_1, ..., a_I} and a probability distribution P_X = {p_1, ..., p_I} with p_i = P(x = a_i), ∀1 ≤ i ≤ I: p_i > 0 and Σ_{i=1}^{I} p_i = 1. In the following, we will also refer to A_X as the (discrete) alphabet. The Shannon information content (given in bits) of an event x ∈ A_X is defined as

h(x) = log_2 (1 / P(x)),

where P(x) is the probability of its occurrence. Then, the entropy H(X) refers to the average information content of an outcome of X and is defined as

H(X) = Σ_{i=1}^{I} p_i · log_2 (1 / p_i).

In other words, the entropy is the average number of necessary bits to represent an outcome x ∈ A_X.
If we apply this definition to our previous discussion, we can identify the estimation innovation as the outcome of each estimation step: it represents the estimation uncertainty and bears the information that was missing when the prediction was made. So, if the innovations of all position estimation steps are regarded, we can derive the information content of a whole movement trace. In turn, the alphabet A_X and the probability distribution P_X yield the entropy of an outcome, being the average information content.
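These definitions can be illustrated with a short, self-contained Python sketch (function names are ours, for illustration only):

```python
import math

def information_content(p):
    """Shannon information content h(x) = log2(1/P(x)) in bits."""
    return math.log2(1.0 / p)

def entropy(P):
    """Entropy H(X) = sum_i p_i * log2(1/p_i), the average
    information content of an outcome of X, in bits."""
    assert all(p > 0 for p in P) and abs(sum(P) - 1.0) < 1e-9
    return sum(p * math.log2(1.0 / p) for p in P)

# A uniform distribution over four symbols needs log2(4) = 2 bits on average:
print(entropy([0.25, 0.25, 0.25, 0.25]))  # -> 2.0
```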

How to determine the information content of trajectories
Basically, each outcome x of the random variable X is determined by the employed movement estimator. Therefore, we need to formalize all involved components: the movement estimator and the parts of X, namely the alphabet of possible values A_X and the set of corresponding probabilities P_X. The movement estimator is a function θ that determines a two-dimensional position based on an observation vector m = (m_1, ..., m_{N−1}) containing previously collected position measurements:

p̂_N = θ(m).

Then, the innovation i_N is the deviation of the actual measurement m_N from the estimation:

i_N = m_N − θ(m).

So, the innovation is a two-dimensional real vector itself and cannot directly be used as the outcome x, because R² is both uncountable and unbounded. To process the innovation into an outcome x, we need to overcome these two issues.
The innovation domain can be made countable by means of simple discretization: the real innovation vector is mapped onto a grid, with each grid node referring to a particular symbol in A_X. The grid cell width and form are accuracy parameters; their choice is influenced by several aspects, e.g., the highest tolerable discretization error or the highest discretization error under which the movement model still produces reasonable results.
Once it is countable, the innovation domain can be bounded, while still keeping all reasonable innovations covered by A_X. That is, all possible positions within reach in the time period since the last measurement need to be mappable to A_X. Which positions can be reached depends, e.g., on the movement model, or measurement noise. The limitation of the innovation domain is important, because the most probable innovations for any valid trajectory need to be mappable on it: if the limits are set too narrow, i.e., A_X misses reasonable innovations, such innovations could not be covered by the random variable X. Contrariwise, too wide limits would include implausible innovations in A_X and thus would increase its entropy, which then could be significantly higher than the actual entropy of the trajectory.
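The two steps above can be sketched as follows (a minimal Python illustration; the square-cell snapping, the rectangular bounds and all parameter names are simplifying assumptions of ours):

```python
def innovation_to_symbol(dx, dy, cell, w_lon, w_lat):
    """Map a 2-D innovation (dx, dy) in metres onto a bounded square grid.

    dx, dy:       longitudinal/lateral deviation from the predicted position
    cell:         grid cell width (the discretization accuracy parameter)
    w_lon, w_lat: half-extents of the grid; deviations outside are outliers
    Returns a pair of integer grid indices, or None for an outlier.
    """
    if abs(dx) > w_lon or abs(dy) > w_lat:
        return None  # would map to a reserved outlier symbol
    # make the domain countable: snap to the nearest grid node
    return (round(dx / cell), round(dy / cell))

print(innovation_to_symbol(0.34, -0.12, 0.25, 5.0, 2.7))  # -> (1, 0)
print(innovation_to_symbol(9.0, 0.0, 0.25, 5.0, 2.7))     # -> None
```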
Once the movement estimator and the alphabet are known, P X is set up by assigning a probability to each symbol in A X . Like the alphabet, the probability distribution is crucial for the result of the entropy determination.
So, the entropy of a random variable X over the alphabet A X and with a probability distribution P X can be determined directly. To measure the information content of a trajectory, the deviations between the predicted positions and the actual position measurements are mapped to A X . Then, the information content of each measurement can be determined.

Exemplary implementation of information measurement
We can now apply the components necessary for the determination of a trajectory's information content to a specific use case and show how to implement them for vehicular trajectories.
To this end, we state a number of assumptions, upon which we build our model: (1) We assume that the movements of vehicles are regular and can be expressed by the formulae of kinematics or Newton's laws of motion. (2) We expect that, due to this regularity in movement, we can estimate a vehicle's future movement based on past position measurements and limit the area around this estimate that contains all reasonable deviations. (3) We assume that, within this area, positions closer to the estimate are more likely to match the vehicle's next position than those at the border, and that the deviations from the estimate are regular as well, so that they can be learned.

Movement estimator
As trajectory data, we consider position measurements p = (p_x, p_y) ∈ R²; then, the velocity (v) and acceleration (a) vectors of a vehicle at the position p_i at the time t_i can be described by:

v_i = (p_i − p_{i−1}) / (t_i − t_{i−1}),   a_i = (v_i − v_{i−1}) / (t_i − t_{i−1}).

With these quantities, some simple movement models can be set up as described in [14]: the first model only considers the last position and the velocity vector:

p̂_{i+1} = p_i + v_i · Δt,   (1)

where Δt = t_{i+1} − t_i denotes the measurement interval. The second model extends the first one by using the approximated acceleration:

p̂_{i+1} = p_i + v_i · Δt + ½ · a_i · Δt².   (2)

Obviously, more complex movement models, e.g., using sensor data fusion, are conceivable. However, in the context of the arithmetic coding that we will face later on, this means that all data that is used for the estimation needs to be transmitted to the remote receiver side, which increases the communication load. Therefore, we aim at a minimal data basis for the movement estimator and thus at simple movement models.
Moreover, we will show that these simple models already perform very well and thus defer the investigation of other movement models to future work.
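Assuming the standard kinematic relations, the two simple movement models can be sketched as follows (a minimal Python illustration with 2-D tuples; not the authors' implementation):

```python
def velocity(p_prev, p_cur, dt):
    """Approximate velocity from two consecutive position measurements."""
    return ((p_cur[0] - p_prev[0]) / dt, (p_cur[1] - p_prev[1]) / dt)

def estimate_no_acc(p, v, dt):
    """Model 1: constant-velocity prediction p + v * dt."""
    return (p[0] + v[0] * dt, p[1] + v[1] * dt)

def estimate_acc(p, v, a, dt):
    """Model 2: constant-acceleration prediction p + v*dt + 0.5*a*dt^2."""
    return (p[0] + v[0] * dt + 0.5 * a[0] * dt**2,
            p[1] + v[1] * dt + 0.5 * a[1] * dt**2)

v = velocity((0.0, 0.0), (10.0, 5.0), 1.0)   # -> (10.0, 5.0)
print(estimate_no_acc((10.0, 5.0), v, 1.0))  # -> (20.0, 10.0)
```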

The discrete alphabet A_X
As described above, the innovation domain can be made countable and bounded by projecting each innovation to a grid of limited size. In this section, we will discuss possible configurations for the discretization grid; to this end, we will discuss several grid node alignments, how to determine reasonable grid dimensions and how to set up the grid frame.

Discretization grid node alignment
It is clear that the specific grid design depends on the application context; in the vehicular domain, for example, a uniform approximation error for any region of the grid is desirable; this makes regularly tessellated grids (i.e., using regular triangles, squares and hexagons as shown in Fig. 1) an interesting option. Also, the dimensions and the density of such grid cells can easily be adjusted by a maximum discretization error that directly influences the edge length of the polygons.
The use of different tessellations influences the model performance in several ways: with an increasing number of cell edges, both the cell area and the average discretization error increase as well; consequently, a triangular grid has a smaller average discretization error than a square or hexagonal tessellation. In turn, the smaller the average discretization error, the better the movement estimation is likely to work. Finally, the number of resulting cells is inversely proportional to the cell size; this means that the alphabet size will decrease with an increasing number of edges per cell, resulting in a smaller entropy of X.

Discretization grid dimensions
While setting up the grid cells is a comparably straightforward task, the limitation of the grid scope is more challenging, because the grid needs to cover all reasonable (and only those!) measurement innovations. For the use case of vehicular movements, the grid boundaries strongly depend on the possible movements of a vehicle. We therefore introduce in the following a kinematic model to determine these boundaries.
For the determination of the discretization grid dimensions, we refer to a logical (not necessarily geometrical) grid center, at which the grid will be aligned along the movement direction. We set this grid center to the estimated next position according to the non-accelerated movement model 1, disregarding both acceleration and steering. Given such a logical center, we can calculate the maximum spatial deviations that can be achieved under our model. We set up this range using a straightforward line of simple kinematic arguments: The longitudinal dimension of the grid (i.e., the dimension along the movement direction) directly results from the highest possible deceleration dec_max and acceleration acc_max that could cause a deviation from the predictand within one measurement interval Δt. Then, according to Eq. 2, the longitudinal grid dimension interval relative to the logical grid center is

[½ · dec_max · Δt²; ½ · acc_max · Δt²].

For the lateral grid dimensioning, we need to regard an extreme steering behavior to derive the highest achievable lateral deviation from the estimated position. To this end, we consider the vehicle to pass through a curve, with the vehicle's velocity and the radius of the curve being chosen to such an extent that the lateral deviation is maximized. This deviation is limited, however, by the vehicle's velocity and the radius of the curve: given a curve radius r, a vehicle's speed is upper-bounded by the critical cornering speed v_c(r) = √(a_l · r), where a_l refers to the highest possible lateral acceleration [21]. For the determination of a_l, we state according to Coulomb's friction law: a_l ≤ μ_s · g, where μ_s is the static friction coefficient and g ≈ 9.81 m/s² is the earth's standard gravity acceleration. For the choice of μ_s, default reference values as in [14] can be used.
With the critical cornering speed, we can calculate the maximum lateral deviation: at a cornering of more than 90° within the time interval Δt, the lateral deviation equals the sum of the curve radius and the distance that could be driven perpendicular to the assumed driving direction (cf. Fig. 2a). Otherwise, the lateral deviation is merely the width of the curve that has been passed so far (cf. Fig. 2b). Formally, the maximum lateral deviation can be expressed by the function d_l:

d_l(r) = r + v_c(r) · (Δt − (π · r) / (2 · v_c(r)))   if v_c(r) · Δt > (π/2) · r,
d_l(r) = r · (1 − cos(v_c(r) · Δt / r))              otherwise.

As depicted in Fig. 2c, the graph of d_l resembles a square root curve: after a rapid growth with an increasing curve radius of up to approximately 80 m, the curve stagnates and features merely a minor slope. Using default reference parameter values (cf. [14]), we can derive from this analytical model that the grid would need to be at least 2 · 2.7 m = 5.4 m wide. Figure 3a shows an exemplary discretization grid with previous position measurements, the position estimate and the measures on the longitudinal and lateral dimension. In the following, we will refer to the lateral grid size as 2 · w_lat. However, though these grid dimensions are analytically set up, they do not necessarily need to be optimal. Further influences such as increased positioning noise levels may cause innovations to lie outside the analytically deduced boundaries. A simple countermeasure for such situations would be to add a single extra symbol to A_X, representing outliers, which only minimally increases the alphabet size and the entropy of the random variable X. Complementarily, expected or current noise statistics, e.g., dilution of precision (DOP) values, could be regarded for the grid dimensioning: the grid dimensions could simply be increased by a certain margin to set up a guard zone around the analytically determined grid dimensions, thus allowing for a higher noise level by increasing the alphabet. The setup of such a guard zone is nontrivial, however, and beyond the scope of this paper; it thus remains future work.
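The lateral dimensioning argument can be illustrated with a small Python sketch (the friction coefficient value below is an assumed placeholder, not the paper's reference value):

```python
import math

def lateral_deviation(r, dt, mu_s=0.55, g=9.81):
    """Maximum lateral deviation d_l(r) within dt, assuming the vehicle
    corners at the critical speed v_c(r) = sqrt(mu_s * g * r).
    mu_s is an assumed static friction coefficient (illustrative value)."""
    v = math.sqrt(mu_s * g * r)        # critical cornering speed
    phi = v * dt / r                   # swept curve angle in radians
    if phi > math.pi / 2:              # more than a 90-degree turn:
        t90 = (math.pi / 2) * r / v    # time spent on the quarter turn
        return r + v * (dt - t90)      # radius plus perpendicular travel
    return r * (1.0 - math.cos(phi))   # width of the curve passed so far

# The grid half-width is the maximum of d_l over all candidate radii:
w_lat = max(lateral_deviation(0.5 * i, 1.0) for i in range(1, 400))
```

Sweeping r confirms the square-root-like shape described above: d_l grows quickly for small radii and then flattens out.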

Discretization grid frame
In Fig. 3a, we assume a rectangular grid shape. While this is concrete and easy to model, it does not really reflect a true analytic boundary for the measurement deviations from the predictand. For such a boundary, one would have to regard that, due to Coulomb's law, vehicles cannot progress as far on the longitudinal dimension when cornering. If this is taken into account, the resulting grid frame features an elliptic form. We approximate such an elliptic form with p-norms: a grid node with the metric offset (δ_lon, δ_lat) from the logical grid center (the predictand) is mapped to an alphabet symbol iff

(|δ_lon| / w_lon)^p + (|δ_lat| / w_lat)^p ≤ 1,

where w_lon and w_lat denote the longitudinal and lateral half-extents of the grid.
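A possible realization of this p-norm membership test (a sketch under the simplifying assumption of symmetric longitudinal half-extents):

```python
def inside_frame(d_lon, d_lat, w_lon, w_lat, p=2.0):
    """Return True iff a grid node with offset (d_lon, d_lat) from the
    logical grid center lies within the p-norm grid frame.
    p = 2 yields an ellipse; large p approaches the rectangular frame."""
    return abs(d_lon / w_lon)**p + abs(d_lat / w_lat)**p <= 1.0
```

With p = 2, a node near the rectangle's corner is excluded from the alphabet, while a very large p re-admits it, reproducing the rectangular frame as a limit case.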

The probability distribution P_X
The probability distribution of the random variable X assigns a probability p_i to each symbol a_i ∈ A_X, as described in Section 3. Choosing the correct probability distribution, i.e., one that fits the actual nature of the innovations, is non-trivial and needs to be regarded more closely.
In the following sections, we therefore discuss several possible distributions that we will evaluate with our arithmetic coding scheme in Section 5.

Uniform distribution
The simplest possible distribution is the uniform distribution, i.e., p_i = 1/I for all a_i ∈ A_X. Under the assumption that the employed movement estimator satisfies a reasonable degree of accuracy and therefore small deviations from the position estimate are more likely than large ones, this distribution is not what one would expect from P_X in reality. However, it is a good lower-bound benchmark that can be used to validate the performance of other probability distributions.

Normal distribution
Since we expect predominantly accurate results from the position estimators, we assume P_X to be reasonably close to the normal distribution, which is commonly used in the context of noisy position measurements [25]. While it is unlikely that the innovations during an estimation process will be perfectly normally distributed, we consider this to be a good approximation.
As explained above, we derive the alphabet from a two-dimensional grid, so we need to employ a bivariate normal distribution N(μ, Σ) with mean μ = (μ_x, μ_y) and covariance matrix

Σ = ( σ_x²      ρσ_xσ_y )
    ( ρσ_xσ_y   σ_y²    ),

with the standard deviations σ_x, σ_y and the correlation coefficient ρ [24]. The distribution's mean is set according to the estimated next position: while the grid is aligned using the non-accelerated movement model 1, the mean μ of the probability distribution can be determined using other models, e.g., the accelerated movement model 2.
We do not expect the dimensions of the innovation domain to correlate, so ρ = 0. However, the normal distribution is symmetric, which does not necessarily apply to the grid as well. Therefore, we need to find a mapping (skewing) of the probability distribution to the dimensions of the grid. To this end, we propose a projection of the grid, which is schematically depicted in Fig. 4: the mean μ separates the grid into four quadrants (cf. Fig. 4a), each of which is scaled to the dimensions [s_x; s_y], where s_k is the number of standard deviations that are supposed to cover the k axis of each quadrant (cf. Fig. 4b). Due to the scaling, the standard deviation of the distribution can be set to σ = 1, and each grid node is assigned the density value of the standard normal distribution at its scaled coordinates (cf. Fig. 4c). Afterwards, the assigned probabilities need to be normalized to eliminate scaling effects, so that Σ_{i=1}^{I} p_i = 1 applies again (cf. Fig. 4d). Alternatively, asymmetric probability distributions (e.g., lognormal) could also be employed.
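The quadrant-wise scaling and normalization could be sketched as follows (a hypothetical helper; since ρ = 0, the two-dimensional density factorizes into two one-dimensional densities):

```python
import math

def skewed_normal_probs(nodes, mu, extents, s=3.0):
    """Assign quadrant-scaled Gaussian probabilities to grid nodes.

    nodes:   list of (x, y) grid-node coordinates
    mu:      distribution mean (the estimated next position)
    extents: (left, right, bottom, top) distances from mu to the four
             grid borders; the grid need not be symmetric around mu
    s:       number of standard deviations covering each quadrant
    """
    left, right, bottom, top = extents

    def pdf1(z):  # standard normal density (sigma = 1 after scaling)
        return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

    raw = []
    for x, y in nodes:
        zx = s * (x - mu[0]) / (right if x >= mu[0] else left)
        zy = s * (y - mu[1]) / (top if y >= mu[1] else bottom)
        raw.append(pdf1(zx) * pdf1(zy))
    total = sum(raw)
    return [r / total for r in raw]  # normalize so the p_i sum to 1
```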

Trained distributions
Under the assumption that vehicular movements, disregarding the terrain or surroundings, follow a general principle or pattern, we also suppose that there is a generally fitting distribution that can be found, or learned. Since the alphabet is discrete and finite, we attempt to obtain such a distribution from a collection of previously observed traces, referred to as the training set.
To this end, we map each measurement in the training set to its corresponding alphabet symbol using the non-accelerated movement estimator 1 and the grid as described in Section 4.2. For each symbol a_i ∈ A_X, we can then determine the occurrence count n_i. The probability p_i can thus be written as

p_i = n_i / Σ_{j=1}^{I} n_j.

According to the definition in Section 3.1, all symbol probabilities need to be nonzero. We therefore define n'_i := max{n_i, 1} and p'_i analogously to p_i, resulting in a close approximation of p_i that can be used for the determination of the information content and for the arithmetic coding of trajectories. If each symbol occurs at least once, n'_i = n_i and thus p'_i = p_i.
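A minimal Python sketch of this training procedure (illustrative only):

```python
from collections import Counter

def trained_distribution(symbols, alphabet):
    """Estimate P_X from a stream of training symbols.
    max{n_i, 1} keeps every probability nonzero, as required by the
    definition of the random variable X in Section 3.1."""
    counts = Counter(symbols)
    n = [max(counts.get(a, 0), 1) for a in alphabet]
    total = sum(n)
    return {a: ni / total for a, ni in zip(alphabet, n)}

print(trained_distribution(['a', 'a', 'b'], ['a', 'b', 'c']))
# -> {'a': 0.5, 'b': 0.25, 'c': 0.25}
```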

Adaptive distributions
In contrast to predefined distributions, we also regard distributions that evolve over time. Such adaptive distribution models start with an initial setup, e.g., a uniform distribution, and evaluate the observed symbol occurrences to converge towards the actual distribution. Of course, more realistic distributions can also be used as initial setups. Common adaptive distribution models regard n-gram relations in the symbol stream, e.g., simply constructing a unigram probability distribution. The advantage of this approach is that the resulting distribution is an optimal match for all previously observed and, at best, upcoming symbol occurrences. Of course, this will only pay off for sufficiently long trajectories, i.e., if the ratio of trajectory length to alphabet size exceeds a particular threshold. Otherwise, the learned distribution becomes representative too late, so that too few position measurements in the trajectory can benefit from it to compensate for the learning phase, during which the distribution may be far from an acceptable fit.
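A unigram variant of such an adaptive model can be sketched as follows (initial counts of 1 realize the uniform initial setup; class and method names are ours):

```python
class AdaptiveUnigram:
    """Adaptive unigram model: starts uniform and converges towards
    the empirical distribution of the observed symbol stream."""

    def __init__(self, alphabet):
        self.counts = {a: 1 for a in alphabet}  # uniform initial setup
        self.total = len(alphabet)

    def prob(self, symbol):
        return self.counts[symbol] / self.total

    def update(self, symbol):
        """Record one observed symbol occurrence."""
        self.counts[symbol] += 1
        self.total += 1
```

After enough updates, the model's probabilities approach the relative symbol frequencies, which is why long trajectories benefit most.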

Contextual distributions
Finally, we regard probability models that feature more than a single probability distribution and which we refer to as contextual distributions. For these, we assume that the actual distribution of measurement innovations correlates with the vehicle's current movement parameters, such as velocity or acceleration. Thus, for each known context, there is one probability distribution that can be selected for use. In doing so, distinctive situations can be taken into account for the information content determination and for the arithmetic coding, such as halts, accelerations after halts or movements at constant velocities.
It is obvious that contextual distributions are not self-contained probability distributions but rather a way to combine multiple distributions into a single probability model; they can thus be used as an extension rather than as a stand-alone, exclusive alternative. Also, different kinds of distributions can be employed for the individual contexts in order to assemble the optimal fit for each context.

Exemplary alphabets and their entropies
We can now determine the entropy of the random variable X, i.e., the expected information content per position measurement for a given alphabet and probability distribution. Table 1 shows exemplary entropies for varying measurement intervals and accuracy bounds. For the regular square grid setup, we assumed an acceleration interval [dec_max; acc_max] = [−11; 8] m/s². Also, we calculated the entropies for probability distributions with two different standard deviations: we chose s_x = s_y = 3σ and s_x = s_y = 4σ, thus assuming that approximately 99.7 % and 99.99 % of all measurement innovations will lie within the grid, respectively. Please note that the entropy, being an expected value, solely depends on the used movement model θ, the alphabet A_X and the probability distribution P_X of the random variable X, and not on actual measurements.
We can see from the table that even for very high accuracy demands, the expected average information content per symbol is very low: while off-the-shelf GPS receivers provide position measurements as fixed-point numbers with six decimal places and thus can be encoded with 57 bits, the expected average information content at an accuracy bound of 0.1 m always lies below 16 bits. This even applies if the probability distribution of the alphabet is considered uniform. According to our model, the entropy decreases with shorter measurement intervals. This is reasonable, because the more information is provided by measurements, the lower the information that is missing for a perfect estimation.

Model implementation: an arithmetic coder
The estimation results presented in Table 1 encouraged us to build an arithmetic coder based on our formal model, since we could expect compression rates of more than 90 % even at tight accuracy bounds. For the implementation, we used the arithmetic coding project of Bob Carpenter [5], version 1.1 as a basis.
The only modification to our formal model lies in the handling of outlier measurements: Instead of adding an extra symbol to A_X, the encoding stops upon an outlier measurement. This is inevitable, because the mapping of a discretized innovation to a symbol needs to be bijective; this is not fulfilled in the case of outliers. Once an innovation cannot be mapped to a grid node, it is not possible to retrieve a valid grid node in the decoding process. Therefore, in this case the symbol stream is terminated with the End Of Stream symbol, P_X and the estimator are reset and a new encoding begins. This is an undesirable situation, because at least one position needs to be transmitted uncoded; so, even with the mentioned guard zone, using a robust estimator is crucial for the compression performance.
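To illustrate the principle, the following is a deliberately simplified floating-point arithmetic coder with an explicit End Of Stream symbol; it is a toy sketch suitable only for short symbol streams, not the renormalizing integer implementation of [5]:

```python
def _intervals(probs):
    """Cumulative interval [c_i, c_i + p_i) for each symbol; probs must
    sum to 1 and contain the reserved 'EOS' symbol."""
    ivals, c = {}, 0.0
    for sym, p in probs.items():
        ivals[sym] = (c, c + p)
        c += p
    return ivals

def encode(symbols, probs):
    """Narrow [low, high) once per symbol; any number in the final
    interval identifies the whole stream."""
    ivals = _intervals(probs)
    low, high = 0.0, 1.0
    for s in symbols + ['EOS']:  # terminate with End Of Stream
        lo, hi = ivals[s]
        low, high = low + (high - low) * lo, low + (high - low) * hi
    return (low + high) / 2.0

def decode(value, probs):
    """Invert the encoding until the EOS symbol is reached."""
    ivals = _intervals(probs)
    out = []
    while True:
        for s, (lo, hi) in ivals.items():
            if lo <= value < hi:
                break
        if s == 'EOS':
            return out
        out.append(s)
        value = (value - lo) / (hi - lo)  # rescale into the symbol's interval
```

A round trip such as `decode(encode(['a', 'b', 'b', 'a'], probs), probs)` recovers the original symbol stream; frequent symbols narrow the interval less and therefore cost fewer output bits, which is the mechanism that lets the coder approach the entropy bound.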

Movement data and methodology
We evaluate the presented arithmetic coding model on the basis of an extensive real-world movement data set from the OpenStreetMap project [23]. These data are available under the Creative Commons license BY-SA 2.0 [7]. To isolate the effects caused by road topologies, we categorized each movement trace based on the highest object velocity v_max as urban (8.3 m/s < v_max < 17 m/s) or highway (v_max ≥ 17 m/s) movements. We then selected only those traces with 1 Hz measurement frequency; this is a very common position measurement frequency that makes the selected database representative for a huge number of both off-the-shelf and high-accuracy positioning devices. We furthermore excluded traces with fewer than 100 measurement points to neglect side effects due to very short movement periods. In the vehicular domain, such trajectory lengths result mostly from positioning signal loss and thus from erroneous situations which we do not want to regard in this study. In doing so, we retrieved 2263 and 4946 traces for the urban and highway pattern, respectively.
For the evaluation of the trained probability distributions, we also split our trace collection into a training set and two test sets. Since traces are classified as highway or urban based on the maximum velocity, highway traces still contain a sufficient amount of urban traffic and are thus a good choice for obtaining a trained distribution. Hence, we randomly selected 2000 highway traces to train a distribution and used the other 2946 highway traces and all urban traces as two test sets.
For the evaluation, we compressed every collected trace for a variety of approximation error bounds (ε) up to 2 m and analyzed important effects, i.e., the accuracy of the movement estimator, the goodness of fit of the normal probability distribution and the compression performance. For the last-mentioned characteristic, we furthermore take a closer look at the influences of the grid node alignments, the grid frame shape, and the probability distributions with respect to a basic coding scheme configuration.

Movement estimation and discretization
Since it is essential to use an accurate and robust movement model, we performed movement estimations with the non-accelerated and accelerated movement estimators. We were in particular interested in the influence of the discretization and therefore analyzed the movement estimation inaccuracies for varying discretization steps; the results are shown as cumulative distributions in Fig. 5. Without discretization (i.e., ε = 0), the error was below 2 m in 90-95 % of the cases, with slightly worse results for the highway topologies, which we left out for lack of space. For increasing ε, the error distributions widen, which indicates that a coarser discretization lowers the quality of the estimator's observation vector. This in turn has a direct influence on the information content and the compression performance.
Unexpectedly, the non-accelerated estimator (w/o acc) outperforms the accelerated variant (acc), which is more impaired by the discretization because it derives acceleration values from distorted velocities. To reduce this effect, we amended the accelerated estimator by smoothing the computed acceleration values exponentially (acc exp). This improves the situation, but does not provide the same robustness as the non-accelerated estimator (cf. Fig. 5). In all of our tests, this directly resulted in lower compression ratios for the respective movement models. We therefore only regard the non-accelerated model in the remainder of our evaluation.
The discretization grid node alignment also influences the performance of the movement estimator, as discussed in Section 4.2.1. Figure 6 shows the movement estimation error analysis for the non-accelerated movement estimator and the three discussed tessellation schemes. Though the estimation results are very close to each other, the estimator performs slightly best with the triangular tessellation and worst with the hexagonal tessellation. However, with these two tessellation schemes the estimator exhibits more movement underestimations of up to one meter on the longitudinal axis. The reasons for this effect are not completely understood and are subject to further investigation.

Gaussian probability distribution
To evaluate the goodness of fit of the assumed Gaussian probability distribution, we compare it to the actual cumulative distribution functions (cdf) for both topologies over the respective alphabets A_X in Fig. 7. To this end, we serialized the two-dimensional distributions over the grid into one-dimensional distributions over the ordered symbol set to gain a better overview: we concatenated the symbols from cross-sections along the lateral grid axis, causing stepped curves where each "step" refers to one such cross-section. The cdf graphs for the normal distribution resemble the empirically determined ones, though the distributions for both the urban and highway topologies are denser, especially for lower values of ε. This basically confirms our assumption that the Gaussian distribution is a reasonable approximation for P_X, though it is also quite obvious that there is room for improvement. We will evaluate the impact of the other distribution models in Section 5.4.4.
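The serialization described above can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions (independent axes, a single shared sigma), not the paper's exact procedure: a two-dimensional symbol distribution over the grid is flattened by concatenating lateral cross-sections, and its cdf can then be compared against a discretized Gaussian built the same way.

```python
import math
from itertools import accumulate

def serialized_cdf(weights_2d):
    """Flatten a 2-D symbol distribution over the grid into one ordered
    symbol set (concatenating lateral cross-sections) and return its cdf."""
    flat = [w for row in weights_2d for w in row]
    total = sum(flat)
    return [c / total for c in accumulate(flat)]

def gaussian_weights(nodes_2d, mean, sigma):
    """Unnormalized Gaussian weights at grid node coordinates (x, y);
    independent axes with a shared sigma -- an illustrative simplification."""
    return [[math.exp(-((x - mean[0]) ** 2 + (y - mean[1]) ** 2) / (2 * sigma ** 2))
             for (x, y) in row] for row in nodes_2d]
```

Plotting `serialized_cdf` of the empirical counts against `serialized_cdf(gaussian_weights(...))` yields exactly the stepped-curve comparison of Fig. 7, where each step corresponds to one lateral cross-section.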

Basic configuration and benchmarks
At first, we want to evaluate the compression performance of our proposed arithmetic coding scheme in a basic configuration: the vehicle movement is estimated using the non-accelerated movement model, and we use a rectangular discretization grid with a grid node alignment following a square grid cell tessellation. Additionally, we use the current state-of-the-art compression algorithms for spatio-temporal data as benchmarks. As discussed in [11], these comprise the compression schemes based on optimal line simplification and cubic spline interpolation.
The spline-based approach runs in O(n^3) and was designed for relatively short trajectories of ~250 elements; we therefore selected typical trajectories of 1000-1300 positions and cut them into slices of 250 elements, each slice overlapping the previous one by 100 elements, and thus gained 318 and 327 shorter traces for the urban and highway topologies, respectively. In doing so, we avoid side effects and spread both advantageous and disadvantageous effects over multiple slices.
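The slicing step can be sketched in a few lines. This is a minimal illustration of cutting a long trace into 250-element slices with 100 elements of overlap (step size 150); how the paper handles a trailing remainder shorter than one slice is not stated, so this sketch simply drops it.

```python
def slice_trajectory(traj, slice_len=250, overlap=100):
    """Cut a trajectory into fixed-length slices, each overlapping the
    previous one by `overlap` elements (step = slice_len - overlap).
    A trailing remainder shorter than slice_len is dropped in this sketch."""
    step = slice_len - overlap
    slices = []
    start = 0
    while start + slice_len <= len(traj):
        slices.append(traj[start:start + slice_len])
        start += step
    return slices
```

A 1000-position trace thus yields six slices starting at positions 0, 150, ..., 750, with consecutive slices sharing 100 positions.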
Both the line simplification and the arithmetic coding are capable of handling trajectories of several thousand measurements, so we additionally apply these to uncut traces in order to gain a broader basis for the analysis and to examine the ability of these approaches to handle data streams of variable length.
To obtain an optimal probability distribution configuration for the arithmetic coder, we performed compressions using a posteriori knowledge: we measured the empirical distributions of the code symbols for each trajectory and used these as stochastic models for the arithmetic coder. This is only a theoretical optimum, because in productive operation, this distribution would have to be transmitted along with the code bit stream and would cause serious, non-acceptable overhead.

Figure 8a-d depicts the average compression ratios against the error bound ε. For all configurations, the same ranking of compression techniques is visible: the optimal line simplification, representing the upper bound for the performance of current state-of-the-art movement compression, performs worst, being outperformed by the cubic spline approach (only for the trajectories with 250 elements in Fig. 8a and b). The arithmetic coding performs best, especially for very tight accuracy bounds of ε < 1.0 m, even if a uniform probability distribution is used for P_X as reference. When a normal distribution is used, the compression ratios improve by another 20-30 %. The results obtained with a posteriori knowledge are significantly better for ε < 0.5 m, but thereafter are almost reached by the compression ratios under the normal distribution assumption. This underlines our finding from the probability distribution analysis that for growing ε, the impact of the probability distribution decreases. Because the basic arithmetic coding configuration outperforms the current state-of-the-art compression schemes, we will use it as the benchmark for all enhanced code model configurations.
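The role the probability model plays here can be made concrete through the ideal code length that an arithmetic coder approaches: the information content of the symbol stream under the model. The following sketch (an illustration, not the paper's coder) shows why a model matched to the data, such as the a posteriori distribution, yields shorter codes than a uniform one.

```python
import math

def ideal_code_length_bits(symbols, probs):
    """Information content of a symbol stream under a probability model:
    sum of -log2 p(s) over the stream. A well-implemented arithmetic coder
    comes within a few bits of this total."""
    return sum(-math.log2(probs[s]) for s in symbols)

# A skewed stream coded under a uniform vs. a matched model:
stream = ["a", "a", "a", "b"]
uniform = {"a": 0.5, "b": 0.5}
matched = {"a": 0.75, "b": 0.25}  # the a posteriori (empirical) distribution
```

Here `ideal_code_length_bits(stream, uniform)` is 4.0 bits, while the matched model needs about 3.25 bits; the gap is exactly the redundancy the better probability model removes.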
An interesting effect is the visible drop of the compression ratios for very high ε that occurs for all distributions and topologies alike. It originates from the discretization grid, which becomes coarser for growing ε. This can be seen in Fig. 9, which depicts the number of discretization grid cells per dimension over increasing accuracy threshold values.

Impact of discretization grid node alignments
Figure 10 depicts the urban and highway compression performances for discretization grid node alignments with triangular, square and hexagonal grid cell tessellation. For the urban topologies, the hexagonal grid node alignment performs slightly best for low ε, while for ε > 1.0 m, its performance drops to the lowest of the three regarded tessellation models. The triangular grid node alignment achieves slightly worse results than the square tessellation at first, but eventually exceeds its compression ratio. This can be explained with the results presented in Section 5.2: the triangular tessellation has a positive impact on the movement estimation, but on the other hand it increases the alphabet size, causing a slightly worse compression performance. Moreover, the negative influence of the hexagonal tessellation on the movement estimator cannot be compensated by its smaller alphabet.

For the highway scenario, the situation is different: due to the better movement predictability and the smallest alphabet, the coding with the hexagonal grid performs significantly better than with the square grid. For the triangular tessellation, the higher grid node resolution also yields a better compression performance compared to the basic configuration benchmark.

Given this strongly differing impact of the discretization node alignments, one cannot state a generally optimal choice; however, we prefer the square grid tessellation for practical implementations due to its algorithmic simplicity and good performance. Although a larger grid frame causes a larger alphabet, our results also indicate that a more tolerant choice of the grid frame (and thus the alphabet) size is beneficial rather than disadvantageous. This is an interesting result, because it shows that a higher tolerance for outlier measurements is more important to the overall compression performance than an exact determination and dimensioning of the code symbol alphabet.
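The coarsening of the grid with growing ε can be illustrated with a simple model. This is an assumption-laden sketch, not the paper's grid construction: it merely captures that the number of cells along one axis shrinks roughly inversely with the error bound, which is why the alphabet, and with it the compression ratio, collapses for large ε.

```python
import math

def cells_per_dimension(extent_m, eps_m):
    """Approximate number of discretization cells along one grid axis for a
    frame of the given extent and a per-cell error budget eps_m.
    Illustrative model; the paper's exact construction may differ."""
    return max(1, math.ceil(extent_m / eps_m))

# For a hypothetical 10 m frame extent, the per-axis cell counts over
# growing error bounds shrink quickly:
counts = [cells_per_dimension(10.0, e) for e in (0.25, 0.5, 1.0, 2.0)]
```

For the values above this yields 40, 20, 10 and 5 cells per axis, mirroring the qualitative shape of Fig. 9: once only a handful of cells remain, the symbol alphabet carries too little resolution for the probability model to matter.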

Impact of probability distributions
In this section we analyze the compression performance of the trained, adaptive, and contextual distributions described in Section 4.3. As mentioned above, we created the symbol distribution based on a training set of 2000 highway traces and used the trained distributions for the arithmetic coding of all traces in the two test sets. For the contextual distributions, we used 10 individually trained distributions for different velocity classes, selecting the class from the previously observed velocity v. We then used the same selection rule to obtain the correct distribution during the arithmetic coding of the test data.

Figure 12 shows the compression results of the advanced distributions compared to the basic configuration. Though at first sight the compression results seem quite close to each other, it is worth focusing on the relative compression performance with respect to the interval between the upper and lower compression bounds given by the a posteriori and uniform distribution results; this is depicted in Fig. 13. In this figure, a value of zero means that a compression only equals the result achieved with a uniform distribution, and a value of one means that a compression ratio as good as with a posteriori knowledge was achieved. In general, the relative performance plots are very similar for the urban and highway traces; it is clear to see that the trained distributions achieve the best compression results. For the highway traces, the trained distributions with velocity classes even exceed what we referred to as the upper performance limit. This is possible because of the multitude of distributions that are selected depending on the current movement class and that are optimized views on the symbol distributions. Of course, had an a posteriori distribution with velocity classes been used, it could not have been exceeded. The arithmetic coding using an adaptive distribution performs worst for ε < 1.3 m.
The main reason for this is that the number of grid nodes for small ε is very large, so many samples are needed to actually learn the symbol distribution; however, the trajectories are too short most of the time, so the necessary samples cannot be collected in time. Of course, this situation improves for smaller alphabets, whose probability distributions are easier to learn. The relative compression performance again underlines that the Gaussian distribution is a reasonably good fit for P_X; however, these plots also emphasize the heavy drop of compression performance for ε ≥ 1.3 m due to the coarseness of the discretization grid. The trained and adaptive distributions are naturally only marginally affected by this effect.
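The learning problem of the adaptive distribution can be made concrete with a minimal count-based model. This is a generic sketch of adaptive modeling for arithmetic coding (Laplace-smoothed symbol counts), not the paper's specific implementation.

```python
class AdaptiveModel:
    """Adaptive symbol distribution for arithmetic coding: starts uniform
    (initial count 1 per symbol, i.e., Laplace smoothing) and updates the
    counts after every coded symbol."""

    def __init__(self, alphabet_size):
        self.counts = [1] * alphabet_size
        self.total = alphabet_size

    def prob(self, symbol):
        """Current model probability of `symbol`."""
        return self.counts[symbol] / self.total

    def update(self, symbol):
        """Incorporate one observed symbol into the model."""
        self.counts[symbol] += 1
        self.total += 1
```

With a four-symbol alphabet, a single observation already shifts the model noticeably; with the thousands of grid nodes that arise for small ε, most counts never leave their initial value within one short trajectory, which is exactly why the adaptive distribution underperforms there.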
For the highway topologies, the compression ratios are slightly better. This is most likely due to the more limited steering behavior at higher velocities, where lateral movement largely reduces to switching between lanes. This is in fact shown by the symbol distributions in Fig. 14: the longitudinal variations are lower than for smaller velocities, while the lateral variations remain nearly the same.

Conclusion
In this paper we determined the information content and entropy of trajectories with respect to a prediction model. Building on these findings, we specified an arithmetic coding/compression scheme. We demonstrated the practical applicability of our ideas by applying this scheme to a large number of heterogeneous real-world vehicular movement traces. The results of this evaluation show that our approach is superior to the best existing compression schemes for vehicular trajectories.
Two open aspects from [12] are addressed in this paper: first, we analyzed the impact of different discretization grid parameters, namely the grid node alignment and the shape of the grid frame. We found that no general statement can be made regarding the benefit of a particular grid node alignment for the compression performance, and that more realistic, elliptic grid frames do not pay off; instead, a frame should be dimensioned to increase fault tolerance. Second, we investigated trained and adaptive symbol probability distributions. The results showed that trained models, especially those providing distributions for particular velocity classes, improve the arithmetic coding performance significantly for all regarded topologies.