Joint Tracking of Multiple Quantiles Through Conditional Quantiles

Estimation of quantiles is one of the most fundamental real-time analysis tasks. Most real-time data streams vary dynamically with time and incremental quantile estimators document state-of-the art performance to track quantiles of such data streams. However, most are not able to make joint estimates of multiple quantiles in a consistent manner, and estimates may violate the monotone property of quantiles. In this paper we propose the general concept of *conditional quantiles* that can extend incremental estimators to jointly track multiple quantiles. We apply the concept to propose two new estimators. Extensive experimental results, on both synthetic and real-life data, show that the new estimators clearly outperform legacy state-of-the-art joint quantile tracking algorithm and achieve faster adaptivity in dynamically varying data streams.


Introduction
The volumes of automatically generated data are constantly increasing [34] with more urgent demand for being analyzed in real-time [22]. Conventional statistical and data mining techniques are constructed for offline situations and are not applicable for such real-time analysis [23]. Thus a wide range of streaming algorithms are continuously being developed addressing a range real-time tasks such as clustering, filtering, cardinality estimation, estimation of moments or quantiles, predictions and anomaly detection [22].
Given a stream of data, probably the first and most arguably foundational problem is to describe the data distribution. Quantiles are useful to describe the distribution in a flexible and nonparametric way [30]. Estimation of quantiles of data streams has been considered for a wide range of applications like portfolio risk measurement in the stock market [11,1], fraud detection [46], signal processing and filtering [40], climate change monitoring [47], SLA violation monitoring [38,39], network monitoring [9,29], Monte Carlo simulation [43], structural health monitoring [13] and non-parametric statistical testing [26]. Motivated by the importance and the wide range of applications of streaming quantile estimation, in this paper, we will investigate advancing the state-of-the-art when it comes to simultaneous quantile estimation.
Suppose that we are interested in estimating the quantile related to some probability q. The natural approach is to use the q quantile of the sample distribution. Unfortunately, this conventional approach has clear disadvantages for data streams as computation time and memory requirement are linear to the number of samples received so far from the data stream. Such methods thus are infeasible for large data streams.
Several algorithms have been proposed to deal with those challenges. Most of the proposed methods fall under the category of what can be called histogram or batch based methods. The methods are based on efficiently maintaining a histogram estimate of the data stream distribution such that only a small storage footprint is required. Another ally of methods are the so-called incremental update methods. The methods are based on performing small updates of the quantile estimate every time a new sample is received from the data stream. Generally, the current estimate is a convex combination of the estimate at the previous time step and a quantity depending on the current observation. A thorough review of state-of-the-art streaming quantile estimation methods is given in the related work section (Section 2).
In data stream applications, a common situation is that the distribution of the data stream varies with time. Such system or environment is referred to as a dynamical system in the literature. Given a dynamical system, the problem most commonly addressed is to dynamically update estimates of quantiles of all data received from the data stream so far. Histogram based methods are well suited to address this problem. A less studied, yet important, problem is to estimate quantiles of the current distribution of the data stream typically referred to as quantile tracking. Incremental methods can document state-of-the-performance for the quantile tracking problem [45,18], while histogram methods are not well suited for efficient quantile tracking [6].
To address the tracking problem, several incremental quantile estimators have been suggested [7,6,5,45,31,42,29]. The intuitions behind the estimators are simple. If the received sample has a value below some threshold, e.g. the current quantile estimate, the estimate is decreased. Alternatively, whenever the received sample has a value above the same threshold, the estimate is increased. Even though the estimators document state-of-the-art tracking performance [45], neither of them use the values of the received samples directly to update the estimate, but only whether the value of the samples are above or below some varying threshold. Intuitively, this seems like a loss of information received from the data stream. Recently, Hammer et al. [18,16] presented an incremental estimator that used the values of the received samples directly which makes it distinct from all incremental estimators previously presented in the literature. The estimator is in fact a generalized exponentially weighted average of previous observations received from the data stream and documents state-of-the-art performance [18,16].
The incremental estimators above are constructed to track a single quantile, but from a practical point of view it is often more important to jointly track multiple quantiles, e.g. to be able to approximate the current data stream distribution. Of course one could run multiple incremental estimators in parallel, but we will then loose control over the joint properties of the estimates. Even the monotone property of quantiles * most likely will be violated. Surprisingly little research has been devoted to develop incremental quantile estimators that are able to make joint estimates of multiple quantiles for dynamically varying data streams. To the best of our knowledge, the only methods in the literature are due to of Cao et al. [6] and Hammer et al. [15,17,19]. The method by Cao et al. is based on first running an incremental update of each quantile estimate and secondly computing a monotonically increasing approximation of the cumulative distribution of the data stream distribution using a form of monotonically increasing linear interpolation. Finally, the quantile estimates are computed from the approximate cumulative distribution. A disadvantage of the method is that the "monotonization" approach is quite ad-hoc and cannot give any guarantee that the resulting quantile estimates converge to the true quantiles. The methods by Hammer et al. [15,17,19] are based on adjusting the update step size in each iteration to ensure that the monotone property of quantiles is satisfied. A disadvantage with these methods is that the required step sizes to satisfy the monotone property of quantiles can be too small to efficiently track the changes of the data stream distribution. This will be further demonstrated in the paper.
The two algorithms presented in this paper address the shortcomings of the current stateof-the-art multiple quantile tracking estimators described above. We show that the suggested estimators converge to the true quantiles and that the estimators can efficiently make joint estimates of multiple quantiles even when the properties of the data stream distribution change rapidly over time. Compared to most streaming quantile estimators in the literature, the suggested algorithms are extremely computationally and memory efficient. In fact, our algorithms require only storing a single value per quantile estimate.
The two algorithms extend the algorithms in [45] and [18] by applying a subtle idea of conditional quantiles. The idea is general and can also be used to obtain joint estimates based on other incremental estimators.
The remainder of this paper is organized as follows. Section 2 is dedicated to reviewing the state-of-the-art. In Section 3, we motivate the quantile estimators developed in this paper by first showing how the state-of-the-art incremental estimators in general fail to satisfy the monotone property when estimating multiple quantiles. In Section 4 we present the general concept of how to obtain joint estimates based on an incremental estimator. In Sections 5 and 6, we apply the concept to obtain two new algorithms for joint quantile estimation. In Section 7, we present comprehensive experimental results that catalogue the properties of our estimators. Section 8 concludes the article.

Related Work
The most representative work for this type of "streaming" quantile estimator is due to the seminal work of Munro and Paterson [32]. In [32], Munro and Paterson described a p-pass algorithm for selection using O(n 1/(2p) ) space for any p ≥ 2. Cormode and Muthukrishnan [10] proposed a more space-efficient data structure, called the Count-Min sketch, which is inspired by Bloom filters, where one estimates the quantiles of a stream as the quantiles of a random sample of the input. The key idea is to maintain a random sample of an appropriate size to estimate the quantile, where the premise is to select a subset of elements whose quantile approximates the true quantile. From this perspective, the latter body of research requires a certain amount of memory that increases as the required accuracy of the estimator increases [44]. Examples of these works are [3,44,32,12,14,28,27].
In [33], the authors propose a memory efficient method for simultaneous estimation of several quantiles using interpolation methods and a grid structure where each internal grid point is updated upon receiving an observation. The application of this approach is limited for stationary data. The approximation of the quantiles relies on using linear and parabolic interpolations, while the tails of the distribution are approximated using exponential curves. It is worth mentioning that the latter algorithm is based on the P 2 algorithm [20]. In [20], Jain et al. resort to five markers so that to track the quantile, where the markers correspond to different quantiles and the min and max of the observations. Their concept is similar to the notion of histograms, where each marker has two measurements, its height and its position. By definition, each marker has some ideal position, where some adjustments are made to keep it in its ideal position by counting the number of samples exceeding the marker. In simple terms, for example, if the marker corresponds to the 80% quantile, its ideal position will be around the point corresponding to 80% of the data points below the marker. However, such approach does not handle the case of non-stationary quantile estimation as the position of the markers will be affected by stale data points. Then based on the position of the markers, quantiles are computed by supposing that the curve passing through three adjacent markers is parabolic and by using a piecewise parabolic prediction function.
In many network monitoring applications, quantiles are key indicators for monitoring the performance of the system. For instance, system administrators are interested in monitoring the 95% quantile of the response time of a web-server so that to hold it under a certain threshold. Quantile tracking is also useful for detecting abnormal events and in intrusion detection systems in general. However, the immense traffic volume of high speed networks impose some computational challenges: little storage and the fact that the computation needs to be "one pass" on the data. It is worth mentioning that the seminal paper of Robbins and Monro [35] which established the field of research called "stochastic approximation" [24] have included an incremental quantile estimator as a proof of concept of the vast applications of the theory of stochastic approximation. An extension of the latter quantile estimator which first appeared as example in [35] was further developed in [21] in order to handle the case of "extreme quantiles". Moreover, the estimator provided by Tierney [41] falls under the same umbrella of the example given in [35], and thus can be seen as an extension of it.
As Arandjelovic remarks [2], most quantile estimation algorithms are not single-pass algorithms and thus are not applicable for streaming data. On the other hand, the single pass algorithms are concerned with the exact computation of the quantile and thus require a storage space of the order of the size of the data which is clearly an unfeasible condition in the context of big data stream. Thus, we submit that all work on quantile estimation using more than one pass, or storage of the same order of the size of the observations seen so far, is not relevant in the context of this paper.
Given dynamically varying data stream, two main problems are considered namely to i) dynamically update estimates of quantiles of all data received from the stream so far or ii) estimate quantiles of the current distribution of the data stream (tracking). To address problem i), histogram based methods form an important class of memory efficient methods. A representative work in this perspective is due to Schmeiser and Deutsch [36]. In fact, Schmeiser and Deutsch proposed to use equidistant bins where the boundaries are adjusted online. Arandjelovic et al. [2] use a different idea than equidistant bins by attempting to maintain bins in a manner that maximizes the entropy of the corresponding estimate of the historical data distribution. Thus, the bin boundaries are adjusted in an online manner. Nevertheless, histogram based methods have problems addressing problem ii) of tracking quantiles of the current data stream distribution [6].
To address the dynamic tracking problem ii) incremental algorithms represent an important class of methods. However, the research on incremental methods is quite sparse. As described in the introduction the methods are based in making small updates of the quantile estimates every time a new sample is received. In [8,5,7], the authors proposed modifications of the stochastic approximation algorithm [41]. While Tierney [41] uses a sample mean update from previous quantile estimates, several studies [8,5,6,7] propose an exponential decay when taking into account the old estimates. This modification is particularly helpful to track quantiles of non-stationary data stream distributions. Indeed, a "weighted" update scheme is applied to incrementally build local approximations of the distribution function in the neighborhood of the quantiles. More recent incremental quantile estimation approaches are the Frugal algorithm by Ma et al. [31], the DUMIQE algorithm by Yazidi and Hammer [45], and the DQTRE and DQTRSE algorithms by Tiwari and Pandey [42]. A nice property of the DUMIQE and the estimators suggested in this paper is that the update size is automatically adjusted in accordance to the scale/range of the data. This makes the estimators robust to substantial changes in the data stream. The DQTRE and DQTRSE aim to achieve the same aim by estimating the range of the data using peak and valley detectors. However, a disadvantage of these algorithms is that several tuning parameters are required to estimate the range of the data making the algorithms challenging to tune. Furthermore, the incremental algorithms above are constructed to track a single quantile. However, in many practical applications, it is required to track multiple quantiles, e.g. to get an overall non-parametric approximation of the current data stream distribution. As pointed out in the introduction the research on joint tracking multiple quantiles is sparse.

Monotone Property Violations for Incremental Quantile Estimators
Let X n denote a stochastic variable representing possible outcomes from a data stream at time n and let x n denote a random sample (realization). Further let f n (x) represent the distribution of X n and Q n (q) the quantile associated with probability q, i.e P (X n ≤ Q n (q)) = q. The paper will focus on joint tracking of quantiles for K different probabilities q 1 , q 2 , . . . , q K . A straight-forward approach would be to run K quantile tracking algorithms in isolation, but in this case, the joint properties of the quantiles will not be taken into account and even the monotone property of quantiles may get violated. We illustrate this using the deterministic based multiplicative incremental quantile estimator (DUMIQE) approach from [45] as an example. Please see [6] for an illustration for another algorithm. When a new x n is received from the data stream, the DUMIQE updates the estimates as follows for k = 1, 2, . . . , K. Please note that the DUMIQE algorithm assumes that Q n (q k ) > 0 ∀ k, n which is not useful if the true quantiles are negative for some n. To be able to efficiently track any quantile, [45] suggested two simple solutions. The first is based on tracking a phantom variable D n = X n + ∆ n where ∆ n is iteratively updated such that D n > Q min > 0. The second approach is based on combining the DUMIQE above for positive quantiles and a modified version for negative quantiles. Please see [45] for further details. Assume that the monotone property is satisfied and that the sample x n admits a value between Q n (q k ) and Q n (q k+1 ), i.e Then according to Equation (1) the estimates are updated as follows which means that the estimates are increased for the quantiles with an estimate below x n and decreased for the estimates above x n . Consequently, the monotone property may get violated.

Joint Tracking of Multiple Quantiles
In Sections 5 and 6 we will present two new algorithms for joint tracking of multiple quantiles based on extending the DUMIQE and QEWA algorithms [18], respectively. Both algorithms are based on the same concept: first track a central quantile of the distribution and typically the median. Next track other quantiles relative to the central quantile by taking advantage of conditional distributions. E.g. to track the 25% quantile, simply track the median of observations below the estimates of the median. However, some more sophisticated modifications of this concept will be conducted before obtaining the final algorithms.
Please note that the concept is general and can be applied to also extend other incremental tracking algorithms then the DUMIQE and QEWA considered in this paper.
The algorithms will take advantage of the following properties. Property 1: Define Y as a shifted variable of X, X = Y + δ and let Q Y (q) denote the q quantile of Y , then Q X (q) = Q Y (q) + δ. This means that if Y is a shifted variable of X, a 5 similar shift is observed in the quantiles. This follows from Property 2: Let q 1 < q 2 . Consider the conditional probability using that Q Y (q 2 ) = 0 (Property 1). This means that the q 1 quantile of Y is equal to the q 1 /q 2 quantile of the conditional variable Y | Y < 0. Conditioning in the opposite direction using that Q Y (q 1 ) = 0 (Property 1). This means that the q 2 quantile of Y is equal to the (q 2 − q 1 )/(1 − q 1 ) quantile of the conditional variable Y | Y > 0. Property 3: Finally we will take advantage of the multiplicative nature of DUMIQE (Equation (1)), which means that if Q 0 (q) > all following estimates will be strictly positive.
Due to the multiplicative nature of the DUMIQE algorithm (Property 3), we can jointly track multiple quantiles without using Property 2. This simplifies the algorithm. We also tested an algorithm based on DUMIQE using both Properties 1 and 2 resulting in a more complicated algorithm without any improved results. The QEWA algorithm is not multiplicative and the extension will use both Properties 1 and 2.

Extension of the DUMIQE Algorithm
We start by presenting the algorithm for K = 2 before extending to K > 2.

Tracking of Two Quantiles
Assume that q 1 < q 2 and Q n (q 1 ) > 0, n = 1, 2, . . . † . The algorithm will take advantage of a shifted variable Y n,1 = X n − Q n+1 (q 1 ). Let Q Y,n (q 2 ) denote an estimate of the q 2 quantile of Y n,1 . The monotone property Q n (q 1 ) < Q n (q 2 ) implies that Q Y,n (q 1 ) > 0 (Property 1) and we thus require Q Y,n (q 2 ) > 0. The algorithm is initiated with Q 0 (q 1 ) > 0 and Q Y,0 (q 2 ) > 0 and consists of the following updates: † If we are not sure that this is satisfied, a transformation as described in Section 3 can be used. 6 1.1 Update Q n (q 1 ) using the DUMIQE update rule in Equation (1) 1.4 Finally get an estimate for Q n+1 (q 2 ) by shifting back (Property 1).
It is straight forward to see that the monotone property is satisfied. Due to the multiplicative update form of the DUMIQE, Q Y,n (q 2 ) is positive for every n. Thus from Equation (8) it follows that Q n (q 1 ) < Q n (q 2 ) for every n.
The quantiles can further be updating in the opposite direction, i.e. by first tracking the Q n (q 2 ) quantile of the data stream and then Q n (q 1 ) relative to the Q n (q 2 ). We assume that Q n (q 2 ) > 0, n = 1, 2, . . . and again introduce a shifted variable, but in addition change sign Y n,2 = Q n+1 (q 2 ) − X n . Let Q Y,n (q 1 ) denote an estimate of the q 1 quantile of Y n,2 . The algorithm is initiated with Q 0 (q 2 ) > 0 and Q Y,0 (q 1 ) > 0 and consists of the following updates: 2 Compute the shifted observation y n,2 ← Q n+1 (q 2 ) − x n .
2.3 Track the q 1 quantile of the shifted variable using DUMIQE 2.4 Finally get an estimate for Q n+1 (q 1 ) by shifting back (Property 1).

Tracking of Multiple Quantiles
The algorithm for a general K is shown in Algorithm 1. In step 2, q c refers to a central probability of the data stream distribution and typically q c = 0.5 such that Q n (q c ) is an estimate of the median. The function DUMIQE( Q n (q c ), x n , q c , λ) refers to one update with the DUMIQE algorithm with Q n (q c ) referring to the estimate of Q n (q c ), x n the received data stream observation and λ the value of the tuning parameter.
In steps 3 to 7, the procedure 2.1 to 2.4 in the previous section is repeatedly performed. First Q n+1 (q c−1 ) is updated relative to Q n+1 (q c ), then Q n+1 (q c−2 ) relative to Q n+1 (q c−1 ) and so on. In steps 8 to 12, the procedure 1.1 to 1.4 in is repeatedly performed.
Algorithm 1 Joint quantiles tracking algorithm based on the an extension of DUMIQE. Input: end for 8: for k ∈ c + 1, . . . , K do 9: end for 13: end for 6 Extension of the QEWA Algorithm The QEWA algorithm consists of the following updates.
• If x n > Q n (q) • Else • a n+1 ← q where µ + n+1 and µ − n+1 represent estimates of the conditional expectations µ + = E(X n |X n > Q n (q)) and µ − = E(X n |X n < Q n (q)), respectively. From Equation (10) we see that the estimator is in fact a generalized EWA with weights 0 < b n < 1. The weights, b n , are computed such that the estimator tracks the quantile Q n (q) and not the expectation E(X n ) of the data stream distribution ‡ . For more details, we refer to [18].
We start by presenting the algorithm for K = 2 quantiles before extending to a general K. ‡ If constants weights were used, i.e. bn = b, Equation (10) would track the expectation E(Xn) and not Qn(q) 8

Tracking of Two Quantiles
The procedures will consist of the same steps as in Section 5.2, except that steps 1.3 and 2.3 will use conditional quantiles (Property 2) and thus are slightly more involved. The two procedures will be presented in the opposite order compared to above. We take advantage of a shifted variable Y n,2 = X n − Q n+1 (q 2 ), which is assumed to be negative Q Y,n (q 1 ) (Property 1) ‡ . The algorithm is initiated with Q 0 (q 2 ) > 0 and Q Y,0 (q 1 ) < 0 and consists of the following updates: § 1.1 Update Q n (q 2 ) using the QEWA update rules in Equations (10) to (16) to obtain an estimate Q n+1 (q 2 ).
1.3 Track Q Y,n (q 1 ) by tracking the q 1 /q 2 quantile of the conditional variable Y n,2 | Y n,2 < 0 (recall Equation (5)). Thus if y n,2 < 0, apply the update rules in Equations (10) to (16) with q = q 1 /q 2 to obtain a quantile estimate Q Y,n+1 (q 1 ). Further, if y n,2 > 0, we are outside of the support of the conditional variable and thus do no update of the quantile estimate, i.e. Q Y,n+1 (q 1 ) ← Q Y,n (q 1 ).
By the same reasoning as above, also this algorithm ensures the monotone property. ‡ The Y 's can be defined in different ways, but we find this to be the easiest. § Please note that to be able to track a quantile using the QEWA algorithm, the conditional expectations must also be tracked to obtain the weights bn. To simplify the explanation, we only focus on the quantile tracking for now.

Tracking of Multiple Quantiles
The algorithm for a general K is shown in Algorithm 2 where µ − c,n and µ + c,n refer to estimates of the conditional expectations E(X n |X n < Q n (q c )) and E(X n |X n > Q n (q c )), respectively. Further µ − Y,k,n and µ + Y,k,n refer to estimates of the conditional expectations E(Y n,k |Y n,k < Q Y,n (q k )) and E(Y n,k |Y n,k > Q Y,n (q k )), respectively. The function QEWA( Q n (q c ), µ − c,n , µ + c,n , x n , q c , λ, ρ) refers to one update with the QEWA algorithm with Q n (q c ), µ − c,n and µ + c,n representing estimates of Q n (q c ), E(X n |X n < Q n (q c )) and E(X n |X n > Q n (q c )), respectively, x n the data stream observation and λ and ρ the tuning parameter of the QEWA algorithm. In steps 3 to 11 and in steps 12 to 21, the procedures ind 1.1 to 1.4 and 2.1 to 2.4, in the previous section, are repeatedly performed.

end for 22: end for
We end this section with a few remarks. Remark 1: The estimate Q n (q c ), in Algorithms 1 and 2, tracks the overall trend of the data stream, while the other quantiles are updated relative to Q n (q c ) and only need to track changes in shape of the distribution. Thus for most dynamic data streams it is natural to use a step size λ that is on a larger scale than γ. Our experiments will show that the performance of the algorithm is not sensitive on the value of the γ parameter. Any value of γ somewhere between 1/10 and 1/10000 performed well in all our experiments. The performance of Algorithm 2 is not sensitive on the value of ρ either. Following the recommendation in [18] of using ρ = 0.01λ performed well in all our experiments.
Remark 2: In [45] and [18] theoretical results are given proving that the DUMIQE and QEWA estimators converge to the true quantiles for static data streams. Although the DUMIQE and QEWA algorithms are designed to track quantiles for dynamic environments, it is an important requirement that the estimators converge to the true quantile for static data streams. These theoretical results again imply that Algorithms 1 and 2 will converge to the true quantiles for static data streams.

Experiments
In this section, we evaluate the performance of Algorithms 1 and 2 for both synthetic and real life data sets. We will denote Algorithms 1 and 2 by ShiftQ and CondQ, respectively (since computation of quantiles are based on SHIFTed and CONDitional stochastic variables). We compare against the alternative multiple quantile tracking algorithm we are aware of in the literature, namely the method of Cao et al. in [6] and the MDUMIQE by Hammer et al. in [19]. Figure 1 shows a small section of the tracking processes for the algorithm DUMIQE, MDUMIQE, CondQ and ShiftQ. The gray dots show the data stream, that is the same in all the four panels. The black, gray and green curves show tracking of the 0.4, 0.5 and the 0.6 quantiles of the data, respectively. We see that using DUMIQE, the monotone property is violated while the three other algorithms are able to satisfy the monotone property. The idea behind the MDUMIQE is to reduce the step size to avoid monotone property violations. Consequently, as shown on the upper right panel, if the data stream changes rapidly, the MDUMIQE is not able to track efficiently since the adjusted step sizes become too small. We see that both CondQ and ShiftQ are able to efficiently track the dynamics of the data stream. However, CondQ is able to track the dynamics with less noise and thus in total seems the most efficient algorithm.

Synthetic Experiments
We now proceed to a more thorough evaluation of the suggested algorithms. The algorithms are designed to perform well for dynamically changing data streams and the experiments will focus on this. We considered four different data cases. For the first case, the data stream distributions were normally distributed and the expectations, µ n , varied smoothly as follows µ n = a sin 2π T n , n = 1, 2, 3, . . .
which is a sinus function with period T . For the second case, the data stream distributions were also normally distributed, but the expectation switched between values a and −a The standard deviation was set to one. For the two remaining cases, the data stream distributions were χ 2 distributed, one with smooth changes and one with rapid swithces. For the smooth case the number of degrees of freedom, ν n , varied with time as follows ν n = a sin 2π T n + b, n = 1, 2, 3, . . . where b > a such that ν n > 0 for all n. For the switch case, the number of degrees of freedom switched between values a + b and −a + b In the experiments we used a = 2 and b = 6. The χ 2 distribution is quite heavy tailed and both the scale and the shape of the distribution change with time making this a challenging quantile tracking problem.
Tracking error was measured using the average root mean squared error for the different quantiles where N is the total number of samples received from the data stream. In the experiments, we used N = 10 6 which efficiently removed any Monte Carlo errors. Following the recommendations in [18], for the QEWA estimator in the CondQ algorithm we used a ρ/λ ratio equal to 1/100. In order to obtain a good overview of the performance of the algorithms, we measured the estimation error for a wide range of values for the step length in the algorithms. However, in a practical situation, the history of the data stream can be used to track optimal values of the tuning parameters ¶ . Thus the focus will be on the performance of the algorithms using optimal step lengths. Complete results, showing estimation errors for every choice of the tuning parameters, are given in Figures 3 to 6 in Appendix A.
Tables 1 to 4 show estimation error for the different algorithms using an optimal step length. We see that the CondQ outperforms all the other algorithms for each of the 16 cases (data streams) and mostly with a clear margin. For most of the cases, both CondQ and ShiftQ outperform the algorithm by Cao et al. and the MDUMIQE. Further we see that the performance of CondQ and ShiftQ are not sensitive to the choice of the tuning parameter γ. Any of the value of γ between 0.0001 and 0.1, performed well in all the experiments. This makes tuning of the CondQ and ShiftQ algorithms easy.
As expected we observe that for the cases with many quantiles (K = 19) or rapid changes (T = 100), the MDUMIQE performs poorly since the adjusted step size to satisfy the monotone property will be too small to efficiently track the dynamics. The algorithm by Cao et al. performs well for the case with many quantiles (K = 19) and slow changes (T = 1000). However, with few quantiles, the algorithm performs poorer since the approximation of the cumulative distribution becomes poor. Further, with rapid changes (T = 100), the approximation procedure struggles to keep track with the changes in the data stream resulting in poor tracking.

Real-life Data Streams -Activity Change Detection
Activity recognition is a highly active field of research where the goal is to use sensors to automatically detect and identify the activities a user is performing. E.g. one could identify if a user (perhaps an obese child) is performing a healthy amount of exercise. We will focus on identifying changes in activities using accelerometer data which is available on almost any smart cell phone today. ¶ We are currently working on such procedures    We consider an accelerometer dataset from the Wireless Sensor Data Mining (WISDM) project [25]. Accelerations in x, y and z directions where observed, with a frequency of 20 observations per second, while users where performing the activities walking, jogging, walking up a stairway and walking down a stairway. A total of 36 users were observed and the dataset contained a total of 989 875 observations. Current research focuses on supervised approaches where historic and annotated activity observations are used to train an activity classification model [4]. For instance, the work reported in [25] trained models like decision trees and neural networks. However such an approach is highly sensitive to changes over time like if the user changes to an activity that is not part of the training material, becomes fitter, sick etc. In this example, we rather take an unsupervised approach and the goal is to detect whenever the user changes activity. Since we receive 20 accelerometer observations per second, it is important that the streaming approach is computationally efficient.
Change detection is useful as part of a sequential supervised scheme. Whenever a change is detected, the observations from the last activity can be classified and the supervised classifier retrained. If the supervised learner is sufficiently uncertain about the activity type of the last activity, it may ask the user for feedback in an online manner to gradually improve performance. Figure 2 shows in gray x, y and z acceleration for an arbitrary user. The red lines show when the user changed activity. Mostly the acceleration distributions are fairly stationary within an activity, but with some gradual and abrupt changes. The users often changed activities as often as every 30 second making this a challenging tracking and change detection problem.
We suggest the following change detection procedure. Let Q n,w (q) denote the estimate of the quantile associated with probability q of the accelerometer observations at time n and in dimension w ∈ {x, y, z}.
2. In each dimension compute the Euclidean distance between the current quantile estimates and the estimates h seconds back in time 3. In each dimension characterize the main distributional properties of ED n,w by tracking the first two moments using exponentially weighted moving averageŝ µ(ED n,w ) = (1 − ξ)μ(ED n−1,w ) + ξED n,ŵ µ(ED 2 n,w ) = (1 − ξ)μ(ED 2 n−1,w ) + ξED 2 n,ŵ σ(ED n,w ) = μ(ED n,w ) 2 −μ(ED 2 n,w ) 4. When the user changes activity, we expect ED n,w to rapidly increase in at least one dimension w. Thus a new activity is detected when ED n,w is more than η standard deviations higher thenμ(ED n,w ) in at least one dimension, i.e. if max w ED n,w −μ(ED n,w ) σ(ED n,w ) ≥ η 5. When a new activity was detected, restart the tracking of the quantile estimates and go back to step 1. We compare the quantile tracking approach above with tracking the first two moments of the acceleration distribution in each dimension w ∈ {x, y, z} leading to the following approach: 1. Let x n,w denote the observed accelerations at time n in dimension w ∈ {x, y, z}. Track the mean and standard deviation in each dimension using exponentially weighted moving averageμ (X n,w ) = (1 − ν)μ(X n,w ) + νx n,w µ(X 2 n,w ) = (1 − ν)μ(X 2 n,w ) + νx 2 n,w σ(X n,w ) = μ(ED n,w ) 2 −μ(ED 2 n,w ) 2. In each dimension compute the Mahalanobis distance (MD) between the current estimate of the mean and the estimate h seconds back in time, MD n,w = |μ(X n,w ) − µ(X n−h,w )|/σ(X n,w ).
3. In each dimension characterize the main distributional properties of MD n,w by tracking the first two moments using exponentially weighted moving averageŝ 4. When the user changes activity, we expect MD n,w to rapidly increase in at least one dimension w. Thus a new activity is detected when MD n,w where more than η standard deviations higher thenμ(MD n,w ) in at least one dimension, i.e. if max w MD n,w −μ(MD n,w ) σ(MD n,w ) ≥ η 5. When a new activity is detected, restart the tracking of the quantile estimates and go back to step 1.
A disadvantage with the MD approach above is that it only can detect changes in the first two moments of the acceleration distributions, while the quantile tracking approach can detect any changes in the distributions.
We measured the performance of the approaches for a wide range of values for the tuning parameters. To properly characterize the acceleration distributions, we tracked a total of K = 9 quantiles, namely quantiles associated with the probabilities 0.1, 0.2, . . . , 0.9. Given the scale of the observations small values of λ, β and ν are reasonable and we used 0.001, 0.005 and 0.01. Further we used 0.005, 0.01 and 0.05 for ξ, 5, 10 and 15 seconds for h we used and 10, 15 and 25 for η. Finally, following the recommendations in [18], we used ρ/λ = 0.01. We ran the change detection approaches for the whole dataset for all the combinations of the tuning parameters resulting in a total of 162 and 81 experiments for the quantile and MD approaches, respectively.
To measure detection performance we used the well-known measures precision, recall and the F1 score [37]. If the approach detects more than one change between two true changes, we characterize the first change as a correct detection and the others as false detections. Then   where the F1 score is the harmonic mean of precision and recall. Tables 5 to 7 show the top ten results with respect to the F1 score for the different approaches. We see that the CondQ and ShiftQ outperform MD. Further we see that CondQ detects true changes more rapidly than ShiftQ (last column). Using CondQ for change detection seems to be the best alternative.

Closing Remark
Incremental quantile estimators document state-of-the performance to track quantiles of dynamically varying data streams. They are however not able to compute joint estimates of multiple quantiles and even the monotone property of quantiles may get violated. In this paper we present  a new procedure that is able use incremental quantile estimators to obtain joint estimates of multiple quantiles. The fundamental idea is to first track a central quantile and track other quantiles relative to the central quantile. The joint properties are preserved by using properties of shifted and conditional quantiles. We apply the procedure to obtain the ShiftQ and CondQ algorithms which are extensions of the DUMIQE and QEWA algorithms, respectively. The experiments show that the ShiftQ and CondQ algorithms outperform state-of-the-art multiple quantile tracking algorithms in both synthetic and real-life data streams. Further, CondQ outperforms ShiftQ which is as expected since the QEWA documents better performance than DUMIQE [18].
A direction for future research is to apply this general concept of conditional quantiles to also extend other incremental quantile estimators.