Bayesian Analysis of the Stationary MAP2

In this article we describe a method for carrying out Bayesian estimation for the two-state stationary Markov arrival process (MAP2), which has been proposed as a versatile model in a number of contexts. The approach is illustrated on both simulated and real data sets, where the performance of the MAP2 is compared against that of the well-known MMPP2. As an extension of the method, we estimate the queue length and virtual waiting time distributions of a stationary MAP2/G/1 queueing system, a matrix generalization of the M/G/1 queue that allows for dependent inter-arrival times. Our procedure is illustrated with applications in Internet traffic analysis. MSC 2010 subject classifications: 62F15, 62M05, 60K20.

However, the MAP has mainly been studied from a queueing theory perspective. A number of variants of the MAP/G/1 queueing system have been considered in e.g. Wu et al. (2011), Zhang and Hou (2011), Xue and Alfa (2011), Chaudhry et al. (2013), Dudina et al. (2013), and Banerjee et al. (2015). Also, Ramaswami (1980) gave a detailed theoretical analysis of the single-server queue where the arrival process is governed by a MAP with batch arrivals, which was later studied by Lucantoni (1991, 1993), where different numerical algorithms for computing the steady-state solutions were given. While performance analysis for models incorporating MAPs is a well-developed area, less progress has been made on statistical estimation for such processes. Probably the major difficulty for inference for MAPs is that they suffer from identifiability problems: many different MAP parameterizations produce the same joint density for any observed sequence of inter-event times when the underlying states of the process are not observed (Ramírez-Cobo et al., 2010b). In the context of statistical inference, this implies that it is not sensible to estimate the individual parameters of the MAP given a sample of inter-event time data, since there are infinitely many parameterizations which give the same density for the observed process. Some papers have investigated approaches to inference for MAPs, either from a moment matching perspective or via maximum likelihood; see e.g. Riska et al. (2002) or Kriege and Buchholz (2010) and the references given therein for a recent review. However, in most of these articles, with the exceptions of Telek and Horváth (2007) and Bodrog et al. (2008), the issue of identifiability is not accounted for.
Although there has been some work on Bayesian estimation for a specific, identifiable subclass of MAPs, namely the Markov modulated Poisson process (MMPP), see Scott (1999), Scott and Smyth (2003), and Fearnhead and Sherlock (2006), as far as we know, Bayesian inference for the general MAP has not previously been considered. Recently, however, an identifiable representation for the simplest, two-state stationary MAP (MAP 2 ) was found by Bodrog et al. (2008). Using this canonical representation of the MAP 2 , the identifiability problems are removed. Therefore, the first objective of this paper is to undertake Bayesian estimation of the stationary MAP 2 .
Concerning the MAP/G/1 queueing system, most works focus on theoretical aspects rather than on inferential issues, with the exception, to the best of our knowledge, of Riska et al. (2002). However, in teletraffic contexts where the MAP is used as an arrival model, there will typically be parameter uncertainty present, and it will be of interest to estimate probabilities of congestion, queue lengths, waiting time distributions, etc. based on sample data. Similarly, the study of ruin problems in insurance is directly related to the analysis of queueing systems, see Prabhu (1998), Asmussen and Albrecher (2010) or Ramírez-Cobo et al. (2010a). Therefore, given real insurance claim data (which are commonly dependent), it will be important to make inference for the MAP/G/1 queue in order to obtain an estimate of the ruin probability. Thus, the second objective of this work is to obtain numerical predictions of the steady-state distributions of the MAP 2 /G/1 queue which, as far as we know, is an unexplored problem in the literature. Our MCMC (Markov chain Monte Carlo) algorithm for Bayesian inference for the MAP 2 is combined with results from queueing theory to obtain numerical predictions of the waiting time and queue length distributions of the system. The rest of this paper is organized as follows. In Section 2 we review the definition and key properties of the stationary MAP 2 . Special emphasis is put on comparing the MAP 2 with the two-state MMPP (MMPP 2 ), the most considered subclass of MAP in the literature. In Section 3 we introduce an MCMC algorithm for Bayesian inference for the stationary MAP 2 . Our approach is illustrated in detail with simulated and real data sets. In Section 4 we examine the MAP 2 /G/1 queueing system and show how our approach for estimating the MAP 2 can be used to approximate the steady-state distributions of interest. Our results are then applied to a real example of Internet traffic arrivals.
Conclusions and possible extensions to this work are considered in Section 5.

The stationary MAP 2
This section summarizes the main properties of the stationary MAP 2 , with special emphasis on the identifiability problem and differences between the MAP 2 and MMPP 2 processes.

The MAP and the MAP 2
Formally, an (m-state) MAP is a doubly stochastic process {J(t), N(t)}, where J(t) represents an irreducible, continuous-time Markov process with finite state space S = {1, 2, . . . , m}, and N(t) is a counting process that records the number of events occurring in (0, t].
The MAP evolves as follows. An initial state i_0 ∈ S is generated according to some initial probability vector α and, at the end of an exponentially distributed sojourn time in state i ∈ S, with mean 1/λ_i, one of two types of state transition can occur. Firstly, with probability 0 ≤ p_ij1 ≤ 1, a single event occurs and the MAP enters a state j ∈ S, which may be the same as the previous state. Otherwise, with probability 0 ≤ p_ij0 ≤ 1, no event occurs and the MAP enters a different state j ≠ i. Clearly, we have that Σ_{j≠i} p_ij0 + Σ_{j∈S} p_ij1 = 1, for all i ∈ S, where p_ii0 = 0. A MAP can thus be expressed in terms of the initial probability vector α and the parameters {λ, P_0, P_1}, where λ = (λ_1, λ_2, . . . , λ_m), and P_0 and P_1 are m×m transition probability matrices with ij'th elements p_ij0 and p_ij1, respectively.
From now on, we shall concentrate on the specific properties of the MAP 2 , that is, the two-state MAP with m = 2 and state space S = {1, 2}. Good reviews of the theoretical properties of the general MAP can be found in e.g. Lucantoni (1993), Latouche and Ramaswami (1999), and Chakravarthy (2001).
As with any MAP, instead of transition probability matrices, the MAP 2 can be characterized in terms of rate (or intensity) matrices. For the MAP 2 , these matrices are

D_0 = [ −λ_1, λ_1 p_120 ; λ_2 p_210, −λ_2 ],   D_1 = [ λ_1 p_111, λ_1 p_121 ; λ_2 p_211, λ_2 p_221 ].   (1)

Intuitively, D_1 (D_0) can be thought of as governing transition rates that do (do not) generate occurrences of events. D = D_0 + D_1 is the infinitesimal generator of the underlying Markov process J(t), with stationary probability vector π = (π, 1 − π) computed from πD = 0. The stationary version of the process, which we shall consider in this work, is obtained when α = π.
The stationary MAP 2 can be viewed as a Markov renewal process. Indeed, let S r denote the state of the MAP 2 at the time of the rth event, and let T r denote the time between the (r − 1)th and rth events.
Then {(S_r, T_r)}_{r≥1} is a Markov renewal process and, in particular, {S_r}_{r≥0} is a Markov chain with transition matrix P given by

P = (−D_0)^{−1} D_1.   (2)

From the practical viewpoint, it will often be the case that only a sequence of inter-event times t = (t_1, t_2, . . . , t_n) is observed and that the corresponding state sequence of the MAP 2 is not. Therefore, it is important to consider the distribution of the variable T, representing the time between two successive events (inter-event time) in the stationary version of a MAP 2 . It is known that T is phase-type distributed with representation {φ, D_0}, where φ is the stationary probability vector of P, see e.g. Telek and Horváth (2007), and therefore the moments of T can be computed as

μ_k = E[T^k] = k! φ (−D_0)^{−k} e^⊤,   k ≥ 1,   (3)

where e is a row vector with all its components equal to one. The inter-event times in a stationary MAP 2 are known to be correlated and in e.g. Bodrog et al. (2008) it is shown that the autocorrelation coefficient of lag h, say R_h, is given by

R_h = γ^{h−1} R_1,   with R_1 = ( φ(−D_0)^{−1} P (−D_0)^{−1} e^⊤ − μ_1^2 ) / ( μ_2 − μ_1^2 ),   (5)

where −1 ≤ γ < 1 is one of the two eigenvalues of the transition matrix P (as P is stochastic, the other eigenvalue is necessarily equal to 1). Note that the specific structure in (5) implies geometric decay of the autocorrelation function, a property that is also shared by the general m-state MAPs, see Hervé and Ledoux (2013).
Finally, the likelihood function for a sequence of observed inter-event times t = (t_1, t_2, . . . , t_n) is given by

f(t | D_0, D_1) = φ e^{D_0 t_1} D_1 e^{D_0 t_2} D_1 ⋯ e^{D_0 t_n} D_1 e^⊤,   (6)

see for example Chakravarthy (2001).
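The matrix-product likelihood (6) can be evaluated stably by rescaling after each factor and accumulating the logarithm of the scaling constants; a minimal sketch (the function name is ours) using SciPy's matrix exponential is:

```python
import numpy as np
from scipy.linalg import expm

def map2_loglik(t, D0, D1):
    """Log of the likelihood (6), f(t | D0, D1) = phi e^{D0 t_1} D1 ... e^{D0 t_n} D1 e',
    where phi is the stationary vector of P = (-D0)^{-1} D1."""
    P = np.linalg.inv(-D0) @ D1
    A = np.vstack([P.T - np.eye(2), np.ones(2)])
    phi, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)
    v = phi.copy()
    loglik = 0.0
    for ti in t:
        v = v @ expm(D0 * ti) @ D1
        s = v.sum()            # rescale to avoid underflow for long samples
        loglik += np.log(s)
        v /= s                 # after normalization, v @ e' = 1 at the end
    return loglik
```

As a sanity check, when D_0 = −λI and D_1 = λI the MAP 2 degenerates to a Poisson process, and the routine reproduces the exponential log-likelihood Σ (log λ − λ t_r).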

A canonical representation for the stationary MAP 2
Representation (1) of the stationary MAP 2 is known to be over-parameterized, in the same sense as in Rydén (1996): there exist distinct pairs of rate matrices {D_0, D_1} and {D̃_0, D̃_1} such that f(t | D_0, D_1) = f(t | D̃_0, D̃_1) for any given sequence of inter-event times t = (t_1, . . . , t_n) and for all n ≥ 1, where f(t|D_0, D_1) is the observed likelihood function given by Eq. (6). Bodrog et al. (2008) prove that any MAP 2 is completely characterized in terms of four descriptors, the first three moments and the first-lag autocorrelation coefficient, (μ_1, μ_2, μ_3, R_1), see Eqs. (3) and (5). This definition of the MAP 2 in terms of four quantities allowed them to find a unique, canonical representation for the stationary MAP 2 . This canonical representation depends on the sign of the correlation parameter γ in (5). Specifically, if γ ≥ 0, then the canonical form of the MAP 2 is given by

D_0 = [ −λ_1, (1 − a)λ_1 ; 0, −λ_2 ],   D_1 = [ aλ_1, 0 ; (1 − b)λ_2, bλ_2 ],   (7)

where 0 < λ_1 ≤ λ_2 and a, b ∈ [0, 1]. We shall refer to this case as model M 1 . Otherwise, for those MAP 2 s such that γ < 0, the canonical form is

D_0 = [ −λ_1, (1 − a)λ_1 ; 0, −λ_2 ],   D_1 = [ 0, aλ_1 ; bλ_2, (1 − b)λ_2 ],   (8)

where 0 < λ_1 ≤ λ_2 and a, b ∈ [0, 1] as earlier. We shall refer to this case as model M 2 . Note that if a = b = 1, then the process is neither recurrent nor identifiable.
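The canonical forms (7) and (8) are easy to build and verify numerically. The sketch below (an illustration with our own function name) constructs the rate matrices from (λ_1, λ_2, a, b) and lets one check that the second eigenvalue of P = (−D_0)^{−1}D_1 equals ab under M 1 and −ab under M 2 :

```python
import numpy as np

def canonical_map2(lam1, lam2, a, b, model="M1"):
    """Canonical rate matrices (7)-(8) of the stationary MAP2.

    model="M1" gives gamma = a*b >= 0; model="M2" gives gamma = -a*b <= 0.
    Requires 0 < lam1 <= lam2 and a, b in [0, 1].
    """
    D0 = np.array([[-lam1, (1.0 - a) * lam1],
                   [0.0,   -lam2]])
    if model == "M1":
        D1 = np.array([[a * lam1, 0.0],
                       [(1.0 - b) * lam2, b * lam2]])
    else:
        D1 = np.array([[0.0, a * lam1],
                       [b * lam2, (1.0 - b) * lam2]])
    return D0, D1
```

Note that in both forms the only silent (D_0) transition is 1 → 2, a fact used later when reconstructing the hidden transition sequence.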
The transition matrices (2) for the states at an event occurrence under these two models are given by

P = [ a + (1 − a)(1 − b), (1 − a)b ; 1 − b, b ]   and   P = [ (1 − a)b, a + (1 − a)(1 − b) ; b, 1 − b ]

under models M 1 and M 2 , respectively.

MMPP 2 versus MAP 2
As noted in Section 1, a number of works have dealt with inference for the two-state Markov modulated Poisson process (MMPP 2 ), the subclass of MAP 2 s that, in the traditional representation (1) of the MAP 2 , satisfies p 121 = p 211 = 0. In particular, this implies that in the MMPP 2 : 1. jumps from 1 to 2, or from 2 to 1, necessarily define transitions without the occurrence of an event, and 2. events only occur at self-transitions and, since the diagonal elements of P 0 are zero, every self-transition produces an event.
Because of the constraint p 121 = p 211 = 0, the stationary MMPP 2 is represented by four parameters (λ 1 , λ 2 , p 120 and p 210 ). On the contrary, the representation (1) of the general MAP 2 does not constrain any parameter to be equal to 0 (with the exception of p 110 = p 220 = 0, satisfied by all MAP 2 s, including the MMPP 2 s). This implies that in the MAP 2 , events can occur at transitions 1 → 2, 2 → 1 or at self-transitions. This fact makes the MAP 2 a more complex model which, in principle, should imply more versatility. Given that the canonical representation of a general MAP 2 is also expressed in terms of four parameters, it is of interest to study the benefits of modeling using MAP 2 s instead of MMPP 2 s. We shall address this issue below.
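The MMPP 2 restriction translates directly into the rate matrices: D_1 is diagonal. A small sketch (illustrative function name) building an MMPP 2 in the MAP representation (1) from its four parameters is:

```python
import numpy as np

def mmpp2(lam1, lam2, p120, p210):
    """Rate matrices of an MMPP2 written as a MAP (representation (1)).

    Silent transitions (D0 off-diagonals) switch the state; events occur
    only at self-transitions, so D1 is diagonal (p121 = p211 = 0).
    """
    D0 = np.array([[-lam1, lam1 * p120],
                   [lam2 * p210, -lam2]])
    D1 = np.array([[lam1 * (1.0 - p120), 0.0],
                   [0.0, lam2 * (1.0 - p210)]])
    return D0, D1
```

Consistent with the result of Kang and Sung (1995) discussed below, the correlation parameter γ = trace((−D_0)^{−1}D_1) − 1 of any such instance is non-negative.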
Firstly, the class of MAP 2 s is more general than that of MMPP 2 s. Indeed, Asmussen and Koole (1993) show that stationary MAPs can approximate any stationary point process, a property which is not satisfied by MMPPs (as shown in Section 3 of the cited paper). The MAP 2 includes both renewal processes (phase-type renewal processes such as the Erlang and hyperexponential renewal processes) and non-renewal processes, as is the case of the MMPP 2 . This implies that there are classes of MAP 2 s that are not equivalent to any MMPP 2 .
As an example, consider the MAP 2 defined by λ = (0.005, 1) and the transition matrices in (9). Then, the procedure in Ramírez-Cobo et al. (2010b) can be used to show that the only potentially equivalent MMPP 2 to the MAP 2 given by (9) would have to be defined by λ = (0.5013, 0.5037) and transition probabilities which do not define a real MMPP 2 , since p 120 > 1 and p 111 < 0. In order to clarify this phenomenon, consider the canonical representation for (9), which can be obtained from the approach in Bodrog et al. (2008). It can easily be seen that in this case the autocorrelation parameter is γ = −0.9590 and the first-lag autocorrelation coefficient of the inter-event times is R 1 = −0.3219 < 0. Few works have been devoted to the study of the autocorrelation function of the inter-event times produced by the MAP but, for example, Kang and Sung (1995) prove that for any MMPP 2 , γ ≥ 0 and R h ≥ 0 for all h. On the contrary, as we show next, for certain MAP 2 s the autocorrelation function R h can take negative values for some h > 0.
Consider a MAP 2 with canonical representation (7). For this case it can be seen that γ = ab ≥ 0, so that R h = γ^{h−1} R 1 has constant sign, which leads to the following correlation patterns:

a) R h ≥ 0 for all h ≥ 1, or b) R h < 0 for all h ≥ 1.   (10)

If a MAP 2 has canonical representation (8), then γ = −ab ≤ 0, which implies that the sign of R h alternates with h.   (11)

Results in Kang and Sung (1995) concerning the autocorrelation function of the inter-event times produced by an MMPP 2 imply that the pattern described in b) of (10), or any of the alternating patterns in (11), can never be captured by an MMPP 2 . And even the class of MAP 2 s with a correlation pattern as in a) of (10) is still larger than the class of MMPP 2 s. For example, the MAP 2 given by λ = (0.0064, 16.2737) and suitable transition probability matrices satisfies R h > 0 for all h ≥ 1, and yet has no equivalent MMPP 2 representation. The above demonstrates that the class of MAP 2 s is richer than that of MMPP 2 s and highlights the capability of the MAP 2 to capture a range of autocorrelation patterns which cannot be obtained under an MMPP 2 .
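The two correlation regimes can be checked numerically. The sketch below (our own helper, written for illustration) computes R_1, . . . , R_4 for a type-M 1 and a type-M 2 canonical MAP 2 sharing the same λ and a = b = 0.5; the first yields autocorrelations of constant sign, the second an alternating sign pattern:

```python
import numpy as np

def autocorr(D0, D1, h_max=4):
    """Lag-h autocorrelations R_h of the inter-event times of a MAP2."""
    e = np.ones(2)
    M = np.linalg.inv(-D0)
    P = M @ D1
    A = np.vstack([P.T - np.eye(2), np.ones(2)])
    phi, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)
    mu1 = phi @ M @ e
    mu2 = 2.0 * phi @ M @ M @ e
    Ph, R = np.eye(2), []
    for _ in range(h_max):
        Ph = Ph @ P
        R.append((phi @ M @ Ph @ M @ e - mu1 ** 2) / (mu2 - mu1 ** 2))
    return np.array(R)

# Shared D0 with lam1 = 1, lam2 = 4, a = 0.5 (silent 1 -> 2 rate (1-a)*lam1).
D0 = np.array([[-1.0, 0.5], [0.0, -4.0]])
D1_m1 = np.array([[0.5, 0.0], [2.0, 2.0]])   # model M1: gamma = ab = 0.25
D1_m2 = np.array([[0.0, 0.5], [2.0, 2.0]])   # model M2: gamma = -ab = -0.25
```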

Bayesian inference for the MAP 2
Assume now that we observe a sequence of real valued inter-event times, t = (t 1 , t 2 , . . . , t n ) generated from a stationary MAP 2 . In this section we shall develop a Markov chain Monte Carlo algorithm for Bayesian inference in order to approximate the inter-event time distribution and to estimate the parameters of the model. The algorithm is based on the canonical forms for the MAP 2 in (7) and (8).
In order to undertake Bayesian inference, we must first introduce prior distributions for the unknown MAP 2 parameters: the exponential rates λ = (λ 1 , λ 2 ) and transition probabilities a and b.
Unless there is specific information available a priori, it seems reasonable to apply relatively uninformative, but flexible, prior distributions. Here, we adopt the approach of Gruet et al. (1999) and define the following prior structure: since 0 < λ 1 ≤ λ 2 , we write λ 1 = ωλ 2 , where ω has a uniform prior, ω ∼ U(0, 1), and λ 2 has a gamma prior: λ 2 ∼ Ga(α 2 , β 2 ). In addition, we use independent beta (uniform) priors for a and b, a ∼ Be(α a , β a ), b ∼ Be(α b , β b ). In practice we set α 2 = 1, β 2 = 0.001, and α a = α b = β a = β b = 1.
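A draw from this prior structure can be sketched as follows. This is an illustration only: it assumes the reparameterization λ 1 = ωλ 2 implied by θ = (ω, λ 2 , a, b) below, and a shape-rate parameterization Ga(α 2 , β 2 ) of the gamma prior (NumPy's gamma sampler takes a scale, i.e. 1/rate):

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_prior(alpha2=1.0, beta2=0.001, rng=rng):
    """One draw of (lam1, lam2, a, b) from the prior:
    lam1 = omega * lam2 with omega ~ U(0,1), lam2 ~ Ga(alpha2, rate beta2),
    and a, b ~ Be(1, 1), i.e. uniform on [0, 1]."""
    omega = rng.uniform()
    lam2 = rng.gamma(alpha2, 1.0 / beta2)   # scale = 1 / rate
    a, b = rng.uniform(), rng.uniform()
    return omega * lam2, lam2, a, b
```

By construction every draw satisfies the ordering constraint λ 1 ≤ λ 2 .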

The complete data likelihood for the MAP 2
We shall base our approach on reconstruction of the complete sequence of transitions, including both the transitions that take place when an event occurs and those where a state change occurs without an event. Given the full history of the process up to the n'th event, we can reconstruct the complete data likelihood function.
Let t̃ = {t̃_1, . . . , t̃_N} represent the sequence of real-valued inter-transition times and s̃ = {s̃_0, s̃_1, . . . , s̃_N} the sequence of visited states (with and without events). Note that the observed inter-event times t are included in the set t̃. Define g̃ = {g̃_1, . . . , g̃_N} such that g̃_j = 1 (0) if the j-th transition occurred with (without) an event. Then the complete data likelihood is given by

f(t̃, s̃, g̃ | λ, P_0, P_1) = π_{s̃_0} ∏_{j=1}^{N} λ_{s̃_{j−1}} e^{−λ_{s̃_{j−1}} t̃_j} p_{s̃_{j−1} s̃_j g̃_j},   (12)

where π is the stationary probability vector corresponding to the underlying Markov chain, λ = (λ_1, λ_2) denotes the exponential rates, and P_0 and P_1 are the transition probability matrices with elements p_ij0 and p_ij1, the probabilities of a transition from state i to state j without or with an event, respectively.
The complete data likelihood function in Eq. (12) can be simplified in terms of sufficient statistics. Specifically, if n_ij is the number of transitions with an event from state i to state j, m_12 is the number of transitions without an event (which, given the canonical parameterization, must all be from state 1 to state 2), n_j is the number of transitions made from state j and, finally, v_j is the time spent in state j, for i, j = 1, 2, then from (7) and (8),

f(t̃, s̃, g̃ | λ, a, b) = π_{s̃_0} λ_1^{n_1} λ_2^{n_2} e^{−λ_1 v_1 − λ_2 v_2} a^{n_11} (1 − a)^{m_12} b^{n_22} (1 − b)^{n_21}   (13)

in the case of model M 1 , and

f(t̃, s̃, g̃ | λ, a, b) = π_{s̃_0} λ_1^{n_1} λ_2^{n_2} e^{−λ_1 v_1 − λ_2 v_2} a^{n_12} (1 − a)^{m_12} b^{n_21} (1 − b)^{n_22}   (14)

in the case of model M 2 , respectively.
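The sufficient statistics are simple counts and occupation times over the completed sequences; a minimal sketch (our own function name, written for illustration) of their computation is:

```python
import numpy as np

def sufficient_stats(t_tilde, s_tilde, g_tilde):
    """Sufficient statistics of the complete-data likelihood.

    t_tilde: inter-transition times t~_1..t~_N; s_tilde: visited states
    s~_0..s~_N in {1, 2}; g_tilde: g~_1..g~_N, 1 if the j-th transition
    carries an event and 0 otherwise.
    """
    n = {(i, j): 0 for i in (1, 2) for j in (1, 2)}   # event transitions n_ij
    m12 = 0                                           # silent transitions (all 1 -> 2)
    v = {1: 0.0, 2: 0.0}                              # time spent in each state
    for j, (dt, g) in enumerate(zip(t_tilde, g_tilde), start=1):
        i, k = s_tilde[j - 1], s_tilde[j]
        v[i] += dt
        if g == 1:
            n[(i, k)] += 1
        else:
            m12 += 1      # canonical forms only allow silent 1 -> 2 moves
    n1 = n[(1, 1)] + n[(1, 2)] + m12    # transitions made from state 1
    n2 = n[(2, 1)] + n[(2, 2)]          # transitions made from state 2
    return n, m12, n1, n2, v
```

These quantities are exactly what is needed to evaluate (13) or (14) and, together with the priors, to derive the conditional posterior distributions used in the Gibbs sampler below.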

Gibbs algorithm
Assuming that the MAP 2 is in one of the two canonical forms (7) or (8), we propose the following algorithm to sample the posterior distribution of the model parameters θ = (ω, λ 2 , a, b).
Algorithm to estimate the MAP 2 parameters.

1. Set initial values θ^(0) = (ω^(0), λ_2^(0), a^(0), b^(0)).
2. For k = 1, . . . , M:
(a) Compute the rate matrices and the transition matrix P corresponding to θ^(k−1).
(b) Generate the states at events, s^(k), from f(s^(k) | t, θ^(k−1)).
(c) Complete the sequence s̃^(k) from f(s̃^(k) | t, s^(k), θ^(k−1)) by drawing the states when a transition without an event occurs.
(d) Complete the inter-transition times t̃^(k) from f(t̃^(k) | t, s̃^(k), θ^(k−1)).
(e) Generate θ^(k) from the conditional posterior f(θ | t̃^(k), s̃^(k)).
We next describe how steps 2(b)-2(e) are undertaken at each Gibbs iteration. For simplicity of notation, we avoid the use of the superscript (k).
Step 2(b) To generate the set of states at events, s = (s_0, s_1, . . . , s_n), the classic forward-backward sampler is used, see Scott (2002). In order to implement it, the likelihood of the observed inter-event times t given the states at events s is needed. Since the MAP 2 is a Markov renewal process, the sequence of random variables representing the inter-event times, {T_r}_{r≥1}, are conditionally independent given the sequence of states at events, {S_r}_{r≥0}, where the distribution of T_r | S_1, . . . , S_{r−1} is the same as that of T_r | S_{r−1}. See Ch. 10 in Çinlar (1975) for more details. From (7) and (8), the distribution of the inter-event times conditioned on the previous state is as follows, for all r = 1, . . . , n:

T_r | S_{r−1} = 1 ∼ Ex(λ_1) with probability a, and ∼ Ex(λ_1) + Ex(λ_2) with probability 1 − a,   (15)
T_r | S_{r−1} = 2 ∼ Ex(λ_2),   (16)

where Ex(λ) represents an exponential variable with parameter λ. Thus, the likelihood of t and s is

f(t, s | θ) = φ_{s_0} ∏_{r=1}^{n} P(s_{r−1}, s_r) h(t_r | s_{r−1}),   (17)

where P(i, j) denotes the (i, j)-th element of the transition matrix P as in (2), and h(·|s_{r−1}) is the density function corresponding to Eq. (15) or Eq. (16), depending on the value of s_{r−1}. From expression (17), note that we are assuming that our data start with an event. If this is not the case, the value φ_{s_0} should be replaced by π_{s_0}.
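The conditional densities h(·|s_{r−1}) follow from the canonical structure: from state 2 every transition carries an event, while from state 1 either the sojourn ends with an event (probability a) or a silent move to state 2 precedes the event. A sketch of this density (illustrative function name; it assumes λ_1 ≠ λ_2 so that the convolution has the usual hypoexponential form) is:

```python
import numpy as np

def h_density(t, s_prev, lam1, lam2, a):
    """Density of the inter-event time T_r given S_{r-1} (cf. Eqs. (15)-(16)).

    s_prev = 2: T_r ~ Ex(lam2).
    s_prev = 1: mixture of Ex(lam1) (direct event, prob. a) and the sum
    Ex(lam1) + Ex(lam2) (silent move to state 2 first, prob. 1 - a).
    """
    if s_prev == 2:
        return lam2 * np.exp(-lam2 * t)
    direct = lam1 * np.exp(-lam1 * t)
    # hypoexponential density of Ex(lam1) + Ex(lam2), valid for lam1 != lam2
    conv = lam1 * lam2 / (lam2 - lam1) * (np.exp(-lam1 * t) - np.exp(-lam2 * t))
    return a * direct + (1.0 - a) * conv
```

Both branches integrate to one, which provides a quick numerical check of the mixture construction.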
Step 2(c) Consider now the generation of intermediate states to obtain the complete sequence of states, s̃. From representations (7) and (8) there is at most one state transition between any two real events. For r = 1, . . . , n, let

S̃_r = [ S̃_r(1,1), S̃_r(1,2) ; S̃_r(2,1), S̃_r(2,2) ]

represent the numbers of transitions of each type that occur between real events (r − 1) and r, including the final transition associated with the real event. In particular, if s_{r−1} = 2 then there are no intermediate transitions and we have that S̃_r(2, s_r) = 1, with all other entries equal to zero. Otherwise, if s_{r−1} = 1, then either a transition without an event to state 2 can occur, so that S̃_r(1, 2) = 1 and S̃_r(2, s_r) = 1, or there is no intermediate transition, so that S̃_r(1, s_r) = 1. Then, dropping the dependence on a, b for convenience, the following prior probabilities are obtained:

(M 1 ) P( S̃_r = [0, 1 ; 1, 0] | s_{r−1} = 1, s_r = 1 ) = (1 − a)(1 − b) / (a + (1 − a)(1 − b)),
(M 1 ) P( S̃_r = [1, 0 ; 0, 0] | s_{r−1} = 1, s_r = 1 ) = a / (a + (1 − a)(1 − b)),
(M 2 ) P( S̃_r = [0, 1 ; 1, 0] | s_{r−1} = 1, s_r = 1 ) = 1.
Then, conditional on the inter-event times, we have that, for model M 1 , the posterior probabilities q_1 and q_2, defined as

q_1 = P( S̃_r = [0, 1 ; 1, 0] | s_{r−1} = 1, s_r = 1, t_r ),   q_2 = P( S̃_r = [1, 0 ; 0, 0] | s_{r−1} = 1, s_r = 1, t_r ),

are given by

q_1 = (1 − a)(1 − b) g(t_r | λ_1, λ_2) / [ (1 − a)(1 − b) g(t_r | λ_1, λ_2) + a f(t_r | λ_1) ],   q_2 = 1 − q_1,   (18)

and similarly for model M 2 . In Eq. (18), f(·|λ) denotes an exponential density function with rate λ, and g(·|λ_1, λ_2) is the density of a sum of two exponentials with rates λ_1 and λ_2.

Step 2(d) To generate the complete inter-transition times sequence, t̃, we proceed as follows. Consider the first inter-event time, t_1. If s_0 = 2, then t̃_1 = t_1. Otherwise, if s_0 = 1 and no intermediate transition occurs, then t̃_1 = t_1. However, if s_0 = 1 and an intermediate transition occurs, then t̃_1 = c_1 and t̃_2 = c_2, such that c_1 + c_2 = t_1. If T_1 and C_1 represent random variables accounting for the first inter-event time and the time at which the intermediate transition occurs, respectively, and F_{2,2} denotes an F-distribution with parameters (2, 2), then the split of t_1 can be sampled through F_{2,2}. Therefore, we can generate x ∼ F_{2,2} and set c_1 = t_1 / (1 + (λ_2/λ_1) x) and c_2 = t_1 − c_1.
We can now proceed analogously to the completion of t̃ = (t̃_1, . . . , t̃_N).
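The splitting rule of Step 2(d) is a one-liner in practice. The sketch below (our own helper, for illustration) applies the formula as stated above, c_1 = t_1/(1 + (λ_2/λ_1)x) with x ∼ F_{2,2}, and returns both pieces of the split interval:

```python
import numpy as np

rng = np.random.default_rng(1)

def split_interval(t1, lam1, lam2, rng=rng):
    """Split an observed inter-event time t1 = c1 + c2 at the hidden
    1 -> 2 transition, following the F(2,2) rule of Step 2(d):
    draw x ~ F(2,2) and set c1 = t1 / (1 + (lam2 / lam1) * x)."""
    x = rng.f(2, 2)
    c1 = t1 / (1.0 + (lam2 / lam1) * x)
    return c1, t1 - c1
```

Since x > 0 almost surely, the draw always satisfies 0 < c_1 < t_1, so the completed times remain a valid partition of the observed inter-event time.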

Model selection
The procedure to fit a MAP 2 in one of the canonical forms (7) or (8) to a given sequence of inter-event times t has been described in Section 3.2. However, the canonical form is unknown in practice and, therefore, model selection or model averaging needs to be considered. Here, we use Bayes factors for model comparison and follow the approach of Chib (1995) to calculate the marginal likelihoods under each model. Consider model M 1 . Then, given point estimates of the model parameters, θ̂ = (ω̂, λ̂_2, â, b̂) (for example, θ̂ = E(θ|t, M 1 ), the posterior mean), log f(t|M 1 ) is approximated as follows:

log f(t | M 1 ) = log f(t | θ̂, M 1 ) + log f(θ̂ | M 1 ) − log f(θ̂ | t, M 1 ).   (20)

In Eq. (20), the first term denotes the log-likelihood of the inter-event times, computed from Eq. (6). The second term in Eq. (20) concerns the log-prior distributions.
The joint posterior density of the model parameters evaluated at θ̂, the third term in Eq. (20), is approximated as

f(θ̂ | t) ≈ (1/M) Σ_{k=1}^{M} f(θ̂ | t̃^(k), s̃^(k)),

where the model indicator has been dropped for brevity and M is the number of runs of the Gibbs sampler in Section 3.2.
Finally, the value of log f (t|M 2 ) is computed in an analogous way which allows for the derivation of the log Bayes factor.
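Chib's identity underlying Eq. (20) holds exactly at any parameter value and can be verified on a model whose marginal likelihood is available in closed form. The following self-contained illustration uses a conjugate Beta-Bernoulli model (not the MAP 2 itself) and checks the identity against the exact marginal likelihood:

```python
import math

def log_beta(a, b):
    # log of the Beta function B(a, b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# Data: k successes in n Bernoulli trials, Beta(a0, b0) prior on theta.
n, k = 20, 14
a0, b0 = 1.0, 1.0
theta_hat = (a0 + k) / (a0 + b0 + n)   # posterior mean; any point would do

log_lik = k * math.log(theta_hat) + (n - k) * math.log(1 - theta_hat)
log_prior = ((a0 - 1) * math.log(theta_hat)
             + (b0 - 1) * math.log(1 - theta_hat) - log_beta(a0, b0))
log_post = ((a0 + k - 1) * math.log(theta_hat)
            + (b0 + n - k - 1) * math.log(1 - theta_hat)
            - log_beta(a0 + k, b0 + n - k))

chib = log_lik + log_prior - log_post                     # Chib's identity
exact = log_beta(a0 + k, b0 + n - k) - log_beta(a0, b0)   # exact marginal
```

In the MAP 2 setting the posterior ordinate is not available in closed form, which is why it is replaced by the Rao-Blackwellized Gibbs average of the previous paragraph.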

Numerical illustrations
In this section we illustrate the performance of the proposed MCMC algorithm for Bayesian inference of the MAP 2 on a set of simulated and real data sets.
Simulated data. In order to clarify the differences between MAP 2 s and MMPP 2 s, we consider three inter-event time traces, S1, S2 and S3, simulated from three generator models. Since the third of these MAP 2 s is of type M 2 , according to Section 2.3 it does not have any equivalent MMPP 2 .
The generator models were randomly obtained so as to have an equivalent MMPP 2 and a non-negligible autocorrelation structure for the inter-event times (S1), to lack an equivalent MMPP 2 (S2), and to be a MAP 2 of type M 2 with a non-negligible autocorrelation function (S3). The sample sizes were fixed to 1500, 500 and 1500 for S1, S2, and S3, respectively. The reason for this choice is related to the theoretical coefficient of variation of the inter-event time distribution. Specifically, for the first and third generator models the coefficients of variation are equal to 2.48 and 1.68, which implies that if the sample sizes are not large enough, the empirical descriptors such as the mean, variance and autocorrelation coefficients might be far from the theoretical ones and, thus, the estimates of the model parameters might be inaccurate. For the second model, however, the coefficient of variation is close to 1 and, therefore, a smaller sample size is enough to capture the theoretical descriptors.

Figure 1: MCMC trace plots for simulated data set S1. The Gibbs algorithm was applied to a sample of 1500 data generated from a MAP 2 with parameters θ = (0.03, 49.28, 0.9343, 0.9806).
The Gibbs sampler described in Section 3.2, with the prior structure described at the beginning of Section 3, was run for 100,000 iterations (with 10,000 for burn-in) for each data set. A prototype code was written in Matlab and, when run on an Intel Core i5 at 3.2 GHz with 16 GB of DDR3 RAM, took approximately 19 seconds to perform 1000 iterations for a sample size of 500 (the time increased to 56 seconds when the length of the sample was 1500). Concerning the first data set, S1, Figure 1 illustrates the mixing properties of the algorithm in this case.
Table 1 summarizes the results obtained for the simulated data set S1 under both types of MAP 2 (M 1 and M 2 ) and the MMPP 2 . The starting values of the MCMC, posterior means for the model parameters, and corresponding 95% credible intervals, including those for the mean, variance, and first three autocorrelation coefficients of the inter-event time distribution, are included in the table. The sample values are shown in parentheses in the first column. Finally, the marginal likelihoods under both types of MAP 2 and the MMPP 2 are also shown in the last row of the table. Figure 2 shows the fits to the empirical CDF (cumulative distribution function) of the inter-event times provided by both types of MAP 2 and the MMPP 2 . Some comments follow from the table and figure. First, the comparison of the marginal likelihoods obtained under the three models provides evidence in favor of the generating process. Note how the MMPP 2 produces a marginal likelihood that is slightly lower than that of the MAP 2 of type M 1 , but much higher than that corresponding to the MAP 2 of type M 2 . From Figure 2 it is clear that the MAP 2 of type M 1 performs better than the other approaches, the MMPP 2 provides a similar performance to the MAP 2 of type M 1 and, finally, the MAP 2 of type M 2 shows the poorest behavior.
Concerning the parameters defining the generating process, it can be seen how under the generator model the credible intervals include the true values. Similarly, the intervals obtained when fitting the MMPP 2 capture the true parameters, except for parameter a. Regarding the sample values, all models correctly fit the mean and variance. With respect to the empirical autocorrelation function, the MAP 2 of type M 1 outperforms the other models (note how under the MAP 2 of type M 2 some negative values are obtained).
Consider now the analysis of the simulated data set S2, generated from a MAP 2 of type M 1 that does not possess an equivalent MMPP 2 . Table 2 and Figure 3 are organized analogously to those of the previous analysis for the sample S1. Some conclusions arise from the presented results. First, here again the generator model produces the highest value of the marginal likelihood, closely followed by the MAP 2 of type M 2 and the MMPP 2 , respectively. The same deduction is obtained from Figure 3. Note that, despite having similar values for the marginal likelihoods, the estimated parameters under both types of MAP 2 s need not be similar, because representations (7) and (8) define structurally different models. Finally, consider in Table 3 and Figure 4 the results obtained for the sample S3, which was generated by a MAP 2 of type M 2 .
Since any MMPP 2 is a MAP 2 of type M 1 , as occurred in the previous simulated example, the generator model here does not have any equivalent MMPP 2 . For this data set, again the generator model is the one presenting the highest marginal likelihood, followed by the MAP 2 of type M 1 and the MMPP 2 . The analogous conclusion is derived from Figure 4, where it can be observed that the predictive CDF under the MAP 2 of type M 2 is almost indistinguishable from the empirical CDF. Concerning the estimation of the model parameters, note how the true values are included in the credible intervals obtained under the MAP 2 of type M 2 . The moments of the inter-event time distribution are well captured by the three types of models and, as expected, both the MAP 2 of type M 1 and the MMPP 2 fail to fit the alternating autocorrelation function of the inter-event times. With the exception of R 3 , the estimated MAP 2 of type M 2 does capture the correlation coefficients.
Real data. We now illustrate the performance of the MCMC algorithm when fitting the MAP 2 to a real data set. The sample, depicted in Figure 5, reports a sequence of 823 consecutive times (in seconds) between crashes of the corresponding author's personal computer (data available from the author on request).
As in the previous results, for comparison, both types of MAP 2 s and the MMPP 2 are estimated. The same prior distributions, number of iterations, and burn-in period as in the simulated data case are considered. Figure 6 depicts the convergence of the algorithm by showing the evolution of the average parameters for the real data set.

Table 3: Performance of the MCMC for the simulated data set S3, under both types of MAP 2 and MMPP 2 (second to fourth columns, respectively). From top to bottom: starting values of the MCMC, posterior means for the model parameters, 95% credible intervals for the model parameters, 95% credible intervals for the mean, variance, and first three autocorrelation coefficients of the inter-event time distribution, and marginal likelihoods.
Note from Figures 1 and 6 that none of the figures or Monte Carlo computations would change meaningfully if only a few thousand draws had been taken. Table 4 and Figure 7 show, in an analogous way as for the simulated data, the performance of the MCMC algorithm. From the results it is clear that the MAP 2 of type M 1 and the MMPP 2 outperform the MAP 2 of type M 2 , which presents the lowest marginal likelihood. Also, from Figure 7 it can be seen that the estimated CDFs under the MAP 2 of type M 1 and the MMPP 2 are closer to the empirical distribution than that under the MAP 2 of type M 2 . In all cases, the mean of the data is correctly fitted, while the MMPP 2 and the MAP 2 of type M 2 fail to capture the sample variance. Both the MAP 2 of type M 1 and the MMPP 2 underestimate the first autocorrelation coefficient but correctly fit the second and third coefficients. The MAP 2 of type M 2 gives negative estimates for the first and third autocorrelation coefficients.

Figure 7 (panel title): Data set "Crashes logs", MAP 2 (M 1 ).

Table 4: Performance of the MCMC for the real data set, under both types of MAP 2 and MMPP 2 (second to fourth columns, respectively). From top to bottom rows: starting values of the MCMC, posterior means for the model parameters, 95% credible intervals for the model parameters, 95% credible intervals for the mean, variance, and first three autocorrelation coefficients of the inter-event time distribution, and marginal likelihoods.

Prior sensitivity
In this section we aim to test the prior sensitivity of the proposed MCMC algorithm. As seen just before Section 3.1, the prior structure was set as λ_1 = ωλ_2 with ω ∼ U(0, 1), λ_2 ∼ Ga(α_2, β_2), a ∼ Be(α_a, β_a) and b ∼ Be(α_b, β_b), where in the numerical experiments carried out in Section 3.4 the values α_2 = 1, β_2 = 0.001, and α_a = α_b = β_a = β_b = 1 were selected. Figure 8 depicts in solid line the estimated posterior densities for the model parameters obtained when running the MCMC (MAP 2 of type M 1 ) for the S1 data set under the previous prior structure. Note that the summarized information of this posterior sample is given in the second column of Table 1. Figure 8 also represents the posterior distributions of the parameters of the MAP 2 (type M 1 ) under different choices of the prior distributions. In particular, the dashed and dotted lines show the densities under the same priors for a and b as in the solid line case, while the prior for λ_2 becomes more informative (the variance decreases from 10000 to 100 and 0.1 in the dashed and dotted cases, respectively). The posterior densities shown in dashed-dotted and starred solid lines are obtained under the same prior structure for λ_2 as in the solid line case, while the priors for a and b change to more (respectively, less) informative distributions. Only in the case where α_a = α_b = β_a = β_b = 10 do the posteriors differ significantly from those obtained in the rest of the cases. However, this is a very large change in the prior distribution, and we should expect it to be influential when we have relatively small data samples, as is the case here. Since analogous results were obtained when testing the prior sensitivity for the rest of the data sets, it can be concluded that there is little sensitivity in the posterior inference as long as the prior densities for a and b are relatively uninformative.

Inference for the MAP 2 /G/1 queueing system
In this section, we consider the MAP2 as a model for the arrival process in a single-server, first-in-first-out queueing system with independent, generally distributed service times. Following standard notation, we denote this queueing system by MAP2/G/1. The inference approach described in Section 3 will be combined with techniques from the queueing literature in order to estimate predictive equilibrium distributions for this system.

The MAP 2 /G/1 queueing system
The main properties of the MAP 2 /G/1 queueing system are briefly outlined below. For the complete description of the queueing system we refer the reader to Ramaswami (1980), Lucantoni (1993) or Ramírez-Cobo et al. (2014a).
Assume that we have a queueing system with MAP2 arrivals, specified by the rate matrices {D0, D1}, and independent and identically distributed service times with a general distribution, independent of the arrival times. Denote by μ the service rate, so that the expected service time, 1/μ, is finite. Then, the traffic intensity of this system is given by

ρ = λ/μ,    (22)

where λ is the stationary arrival rate (the inverse of the expected inter-event time), defined as λ = πD1 e^T = 1/μ1, with μ1 defined as in Eq. (3). Now define Z(t) to be the number of customers in the system (including any in service) at time t, and let τk be the epoch of the k-th departure from the queue, with τ0 = 0. If the system is stable (ρ < 1), then the stationary distributions of the queueing system are defined as follows. For i ≥ 1, define

xi = lim_{k→∞} P(Z(τk) = i),

the stationary probability that the queue length equals i when a departure occurs. Similarly, denote by yi the stationary probability that the queue length equals i at an arbitrary time. Finally, consider the waiting time CDF, denoted W(x) and defined as the probability that a virtual customer arriving at an arbitrary time waits at most a time x before entering service. Closed-form expressions for the generating functions of the queue length distributions and for the Laplace-Stieltjes transform of the waiting time distribution can be found in Lucantoni (1993). To invert these transforms, the numerical routines described in Lucantoni (1993) and in Abate and Whitt (1995) were implemented.
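As a minimal numerical sketch of these quantities, the stationary arrival rate λ = πD1 e^T and the traffic intensity can be computed from a pair of MAP2 rate matrices; the matrices and the service rate below are purely illustrative, not estimates from this paper.

```python
import numpy as np

# A hypothetical MAP2: D0 holds transition rates without arrivals
# (negative diagonal), D1 holds transition rates with arrivals.
D0 = np.array([[-2.0, 0.4],
               [0.1, -0.5]])
D1 = np.array([[1.6, 0.0],
               [0.2, 0.2]])

# Stationary distribution pi of the generator D = D0 + D1:
# solve pi (D0 + D1) = 0 subject to pi summing to one.
D = D0 + D1
A = np.vstack([D.T, np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Stationary arrival rate lambda = pi D1 e, and traffic intensity
# rho = lambda / mu for a service rate mu (assumed known here).
lam = float(pi @ D1 @ np.ones(2))
mu = 3.0
rho = lam / mu
print(lam, rho, rho < 1)  # the system is stable when rho < 1
```

For these matrices pi = (3/7, 4/7), so lam = 6.4/7 ≈ 0.914 and the queue is stable for the chosen service rate.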

Bayesian estimation of the MAP 2 /G/1 queueing system
As mentioned in Section 1, in some real-life contexts, such as teletraffic or insurance, it is important to estimate the steady-state distributions of queueing systems. In particular, since the MAP2 allows for dependent observations, inference for the MAP2/G/1 queue is of interest; however, inferential analyses of the MAP/G/1 queueing system are rare. As established by McGrath et al. (1987) and Armero and Bayarri (1999), Bayesian reasoning applied to queueing theory offers a number of advantages over other approaches, for instance regarding restrictions of the parameter space, the accuracy of estimators, prediction, and transient analyses. In this section we combine the MCMC algorithm of Section 3.2 with the theoretical results of Section 4.1 to obtain estimates of the queue length and waiting time distributions in a MAP2/G/1 queueing system.
Given a sample of inter-event data, we have seen that the Gibbs sampler can be used to produce a sample of values θ(k) = (ω(k), λ2(k), a(k), b(k)), for k = 1, . . . , M, from the posterior distribution of the MAP2 parameters. Assuming that the service rate μ is known, it is straightforward to estimate the probability that the system is stable as

P(ρ < 1 | data) ≈ (1/M) Σ_{k=1}^{M} I(ρ(k) < 1),    (23)

where ρ(k) is the system traffic intensity calculated from Eq. (22) using θ(k), and I(·) is the indicator function. If this probability is high, then, for each set θ(k) of generated parameters such that ρ(k) < 1, the conditional posterior distributions of the queue length and waiting times, given stability, can be estimated by Rao-Blackwellization (Blackwell, 1947), that is, by simply averaging over the parameters satisfying the stability condition. Thus, for example, the predictive distribution of the waiting time, W(x), is estimated by

Ŵ(x) = (1/C) Σ_{c=1}^{C} W(c)(x),

where W(c)(x) results from inverting the Laplace-Stieltjes transform of the stationary waiting time distribution after substituting the parameter sets θ(c), c = 1, . . . , C ≤ M, that satisfy the stability condition.
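The stability-probability estimate and the Rao-Blackwellized average can be sketched as follows. Since the Laplace-Stieltjes inversion for the MAP2/G/1 waiting time is beyond a short example, the code substitutes a simple M/M/1 virtual waiting-time CDF as a stand-in for W(c)(x), and the posterior draws of the arrival rate are simulated rather than taken from a real Gibbs run; both are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of the stationary arrival rate, one per
# retained MCMC iteration (in practice computed from each theta^(k)).
lam_draws = rng.gamma(shape=50.0, scale=0.018, size=2000)
mu = 1.2  # known service rate

# Traffic intensity per draw and Monte Carlo estimate of P(stable).
rho = lam_draws / mu
p_stable = np.mean(rho < 1.0)

# Rao-Blackwellized predictive waiting-time CDF, averaging only over
# draws satisfying the stability condition rho^(k) < 1. The M/M/1
# virtual waiting-time CDF below is only a stand-in for the inverted
# transform W^(c)(x) of the MAP2/G/1 system.
stable = rho < 1.0

def W_hat(x):
    lam_s = lam_draws[stable]
    return float(np.mean(1.0 - (lam_s / mu) * np.exp(-(mu - lam_s) * x)))

print(p_stable, W_hat(0.0), W_hat(5.0))
```

Only the stable draws enter the average, so the estimate is a conditional (given equilibrium) predictive distribution, exactly as in the procedure described above.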
Note finally that when the service parameter μ is unknown, given an independent sample of service time data, inference for the service rate can be carried out along the lines of, for example, Armero and Bayarri (1994), who treated exponentially distributed service times. Given a Monte Carlo sample μ(1), . . . , μ(M) from the posterior distribution of the service rate, the traffic intensity may be estimated by calculating ρ(k) from (θ(k), μ(k)) and averaging as in (23). In order to condition on the existence of equilibrium, only those parameter sets (θ(k), μ(k)) such that ρ(k) < 1 are retained.
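The unknown-μ case only changes the sketch above in that each draw pairs an arrival-rate value with a service-rate value, and the stability condition is checked jointly. Both sets of draws below are simulated placeholders for real posterior output.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 2000

# Hypothetical posterior draws: arrival rate from the MAP2 Gibbs output,
# service rate from, e.g., a conjugate Gamma posterior fitted to an
# independent sample of exponential service times.
lam_draws = rng.gamma(shape=40.0, scale=0.02, size=M)  # mean 0.8
mu_draws = rng.gamma(shape=60.0, scale=0.02, size=M)   # mean 1.2

# Joint stability condition: retain only pairs with rho^(k) < 1.
rho = lam_draws / mu_draws
keep = rho < 1.0
p_stable = float(keep.mean())
rho_given_stable = float(rho[keep].mean())
print(round(p_stable, 3), round(rho_given_stable, 3))
```

The retained pairs then feed the same Rao-Blackwellized averages as in the known-μ case.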

Application to Internet traffic analysis
Here we estimate the probabilities of congestion and the queue length and waiting time distributions in a MAP2/G/1 queue fed by a sample of real arrival data. The analyzed data set is a sequence of 300 consecutive inter-arrival times (in seconds) from the Internet Traffic Archive (BC-pAug89 trace), http://ita.ee.lbl.gov/html/traces.html. The same data set was explored in Ramirez et al. (2008) and Ramírez-Cobo et al. (2010a), where Bayesian estimation of queueing systems with heavy-tailed arrival processes was considered; however, those works did not take the empirical autocorrelation of the inter-arrival times into account. The trace of the inter-arrival times, together with some of its statistical descriptors (sample mean, variance, and first three autocorrelation coefficients), is shown in the top panel of Figure 9.
Both types of MAP2, as well as the MMPP2, were fitted to the data. In this case, the MAP2 of type M1 showed the best performance. The bottom panel of Figure 9 shows the empirical CDF of the inter-arrival times with the fitted MAP2 distribution superimposed (dashed line), generated using the Gibbs sampler described in Section 3.2. The figure also shows the 95% credible intervals for the statistical descriptors given in the top panel of the same figure.
Now we consider the queueing aspects. Given the MAP2 arrival process, for computational simplicity we assume an exponential service process with rate μ. For the real data set considered (of length 300) and under the described setting, the computational time for estimating the queue according to Section 4.2 was around 32 seconds per 1000 iterations. Table 5 shows the posterior probability of stability (third column) and the expected value of the traffic intensity (fourth column) for an assortment of values of μ (the expected service time is E(S) = 1/μ). As expected, when μ is large (faster service on average), the probability that the system is stable increases. From the table, it is clear that there is a high probability that the system is stable (that is, that no congestion occurs) for values of μ greater than 450.
Figures 10 and 11 illustrate the predictive steady-state queue length (at departures and at an arbitrary time) and waiting time distributions for some values of μ greater than 450. Figure 10 depicts the predictive queue length distributions at departures and at an arbitrary time. As would be expected, for faster services on average (large μ) the probability of an empty system is larger than for slower services. Finally, Figure 11 shows the predictive waiting time distributions for the values of μ considered in the analysis. When the service rate increases, the waiting times become stochastically smaller, again as expected.

Conclusions
In this article, we have illustrated how to carry out Bayesian inference for the stationary MAP2 and then combined this approach with results from queueing theory to estimate the steady-state distributions of interest in a MAP2/G/1 system. The MAP2 has proven suitable for many statistical modeling applications since it combines phase-type distributions with a specific autocorrelation function for the inter-event times, allowing in this way for the modeling of dependent observations. In this work, the problem of the non-identifiability of the MAP2 has been overcome by using the canonical version of the process instead of the redundant form. Extensive comparisons with the MMPP2, a simplified MAP2 widely studied in the literature, have also been provided. A Gibbs sampler has been combined with the Chib criterion for model selection, although a different approach such as reversible jump MCMC (Green, 1995) could also have been adopted, especially when considering inference for the MAP2/G/1 system, given the need to invert a set of generating functions at each iteration.

Figure 9: Top panel: trace of the teletraffic inter-arrival times and sample statistical descriptors (mean, variance and first three autocorrelation coefficients). Bottom panel: empirical CDF of the inter-arrival times (solid line), distribution fitted by the MAP2 (dashed line) and 95% credible intervals for the statistical descriptors.

Table 5: Expected service time (second column), probability that the system is stable (third column) and traffic intensity (fourth column).
A number of extensions are possible, among them the estimation of MAPs with more states, which are expected to show more versatility for modeling purposes than the MAP2, and of the batch Markovian arrival process, the BMAP (Lucantoni, 1993), which allows for correlated arrivals in batches. Generalizations to inference for the BMAP/G/1 queue are also of interest. We are aware of both the theoretical and computational complexities of such problems, due to the lack of unique representations and the increasing number of parameters; these complications present a challenging problem that we hope to address in the future.