Estimating customer impatience in a service system with unobserved balking

This paper studies a service system in which arriving customers are provided with information about the delay they will experience. Based on this information they decide to wait for service or to leave the system. Specifically, every customer has a patience threshold and they balk if the observed delay is above the threshold. The main objective is to estimate the parameters of the customers' patience-level distribution and the corresponding potential arrival rate, using knowledge of the actual queue-length process only. The main complication, and distinguishing feature of our setup, lies in the fact that customers who decide not to join are not observed, remarkably, we manage to devise a procedure to estimate the underlying patience and arrival rate parameters. The model is a multi-server queue with a Poisson stream of customers, enabling evaluation of the corresponding likelihood function of the state-dependent effective arrival process. We establish strong consistency of the MLE and derive the asymptotic distribution of the estimation error. Several applications and extensions of the method are discussed. The performance is further assessed through a series of numerical experiments. By fitting parameters of hyperexponential and generalized-hyperexponential distributions our method provides a robust estimation framework for any continuous patience-level distribution.


Introduction
In many service systems potential customers are provided with real-time information on the delay they will experience [22,33].This information, typically in terms of anticipated delay or current workload, is used by the potential customers to decide to either join the system or to balk.In general, one could view balking as an implicit form of admission control: the load on the service system does not explode as a consequence of the fact that during busy periods some of the potential customers decide to leave, thus mitigating the overall congestion.This means that the system is in effect self-regulating even if the potential service demand is unknown to the administrator.To make sound decisions in designing the service system, however, it is crucial to have knowledge of the volume of the potential demand and the mechanism based on which potential customers decide to join or to leave.With this knowledge, the service provider can, for example, determine the optimal admission price or decide whether it is economically viable to increase the service capacity.
This paper is motivated by the fact that it is challenging to extract this knowledge from data when balking customers cannot be observed by the administrator.Such complication arises naturally in many service systems.The most obvious case is a physical service facility where a visible queue (e.g., shops, parking lots, museums), sometimes even outside of the facility, will deter some of the potential customers.This has become even more prevalent recently with social distancing measures limiting the number of people that can be inside a public space at a given time.This form of data censorship occurs also in systems that provide real-time delay data via electronic means such as dedicated apps for the service.Some examples are expected travel times in transportation services (e.g., Google Maps and Uber) and expected delivery times in food delivery services (e.g. , UberEats in the US, or Deliveroo in Europe).Note that expected delay is often listed for the possible services (e.g., estimated arrival time of a taxi or delivery times for a list of Italian restaurants) even if a user does not indicate the specific service she is interested in, and therefore the system cannot know whether a user balked from the service due to the delay information.
In this paper, we model such service systems with unobservable balking as an M/G/s + H queue, where the last symbol stands for the cumulative distribution function of the customer patience.More specifically, we consider a first-come first-served (FCFS) queue with s (s 1) servers and potential customers arriving according to a Poisson process with rate λ > 0. Potential customers have independent and identically distributed (i.i.d.) patience levels that are distributed as the non-negative random variable Y ∼ H and bring i.i.d.service requirements that are distributed as the non-negative random variable B ∼ G. Customer i joins the system if Y i is greater or equal to the virtual waiting time at the moment of their arrival and otherwise balks without being observed.Note that while customers are homogeneous in a statistical sense, they have individual (thus heterogeneous) patience levels, so that this model can clearly represent the types of customers that will effectively join the system.
We set up an estimation problem by parametrizing the patience distribution as H θ (θ ∈ Θ), where Θ denotes a compact set satisfying some regularity conditions.The goal of this work is to estimate the pair (λ, θ) based on observation of the effective queueing process, which is constructed from records of arrival and departure times not including balking customers.As some of the customers balk, the corresponding effective interarrival times are not the usual i.i.d.exponentially distributed random variables.The effective arrival process is in fact a non-homogeneous Poisson process whose rate depends on the virtual waiting time process.In particular, the effective arrival rate to the system when the virtual waiting time v 0 is λ(1 − H(v)).Note that the marginal effective arrival process is not a non-homogeneous Poisson process with respect to time.Therefore, the dependence structure of the observations of inter-arrival times is directly determined by that of the virtual waiting time process.The workload-dependent arrival dynamics are utilized to derive a maximum-likelihood estimator (MLE) based on the effective inter-arrival observations.Specifically, the density of an inter-arrival time A conditional on initial virtual waiting time v 0 is given by The Markovian dynamics of the queue further imply that the likelihood of the inter-arrival times conditional on the waiting times and job sizes is given by a product of densities with the above form for any sample of observations.
We further study the asymptotic properties of the proposed estimator.Importantly, the asymptotic performance of the estimator is not given by standard results due to the intricate underlying dependence structure.Indeed, so as to prove consistency and asymptotic normality, a subtle reasoning is required to make sure that the situation at hand fulfils specific regularity conditions.It is further shown that for homogeneous (deterministic) patience levels, i.e., P(Y = θ) = 1 for some θ > 0, the asymptotic distribution of the errors is not normal, but rather exponential (with rate n and not √ n).The main contribution of this paper is the development of procedures for consistent estimation of the total arrival rate (corresponding to customers joining the system and balking customers, that is) and the patience-level distribution, observing the queue-length process only.Hence, somewhat counterintuitively, we can estimate load that has never been observed.
The main purpose of this paper is to lay foundations for statistical inference for congested service systems with unobserved balking.Although the queueing model considered in this paper is of course not a perfect replica of any specific real system, it does provide a methodological statistical approach for service systems where unobserved balking due to congestion is a key feature.Furthermore, the framework presented here can be extended to capture additional features, such as noisy waiting time information, as will be discussed in the concluding remarks.
It is also important to stress that the framework we develop in this paper is essentially different from the extensively studied problem of estimating patience parameters from observed abandonment data (see [11,40]).If balking customers are observed, then the data is comprised of waiting times and corresponding indicator variables stating whether a customer joined or balked.Specifically, when it is observed that a customer i joined (resp.balked) given the waiting time W i , the observation is of the form Y W i (resp.Y < W i ).This data can be directly applied to construct an estimator for θ, for example by applying an MLE method.If abandonment times are directly observed then even more information is available, namely the realization of Y i for every abandoning customer, and Y W i for customers who obtained service.Note that in the above scenarios estimation of the total arrival rate is straightforward because all arrivals are observed.However, when balking is unobserved the only information available is that of inter-arrival and waiting times, and therefore indirect estimation methods are called for.Of course, there may be applications where some of the balking or reneging customers are observed, while others are not.For such systems it may be reasonable to adopt a hybrid approach that uses classical patience estimation methods together with those presented in this paper.For the sake of brevity we assume throughout that balking happens immediately and is not observed by the system operator, so that no data regarding the balking customers is available for estimation purposes.
1.1.Contributions.We proceed by detailing our paper's contributions.The focus is on developing estimators with provable performance guarantees.Notably, setting up our estimator is straightforward in the sense that it relies on a closed-form expression for the likelihood, and does not require any queueing-theoretic analysis.As a result, even for models for which stationary performance analysis is involved (or intractable), parameter inference can be done in a relatively direct manner.Our contributions relate to (i) our estimator and its asymptotic properties, (ii) results and experiments for specific patience-level distributions, and (iii) extensions and ramifications.
• Framing the system described above as a queueing model with impatient customers, we can evaluate the corresponding likelihood function (in terms of the observed quantities).This means that, in a parametric context, we can estimate the unknown parametersi.e., λ and the parameter(s) θ corresponding to the patience level Y -relying on a maximum likelihood procedure.The estimation method has several attractive features, the most prominent one being that it does not require simultaneous estimation of the service requirement distribution or even making any distributional assumption about it.In addition, our methodology uses the fundamental queueing dynamics and does not, as is common in the literature, rely on any fluid and/or diffusion approximations.• In the case of a continuous patience-level distribution we prove, under appropriate regularity conditions, strong consistency and asymptotic normality of the resulting estimation errors (scaled by √ n).The proof of the asymptotic normality relies on an application of a suitable version of the martingale central limit theorem.• In the case that the patience level Y is constant (i.e., Y = θ for some θ), more refined results can be obtained.Most notably, we show that the estimation error (scaled by n) converges to an exponential distribution with a parameter that depends on the stationary workload density at θ and the loss probability of customers.• A number of other special cases are dealt with, covering exponentially and (generalized) hyperexponentially distributed patience levels; we present closed-form expressions for the asymptotic variance of the estimation error.The class of generalized hyperexponentially distributions is highly relevant for various reasons, most importantly because these can be used to approximate any positive continuous distribution arbitrarily well [8].In addition, in the call center literature [51] it has been observed that such distributions yield a good fit for observed customer patience in terms of the time until abandonment.We provide several numerical examples that illustrate the robustness of our methodology.Indeed, our experiments show that even if the patience-level distribution is misspecified (i.e., is not generalized hyperexponential itself), the distribution function of the estimated generalized hyperexponential distribution is still remarkably close to the true one.• We also discuss several ramifications and model extensions that can be handled using our framework.These include extending the framework to a system with noisy delay information, estimating the proportion of impatient customers in the population, and estimating specific utility parameters such as service value and waiting time sensitivity.

Related literature.
There is a substantial body of literature on queues with impatient customers.Without attempting to provide an exhaustive overview, we include a brief account of the various research lines.Naor's influential paper [45] can be seen as a pioneering contribution to the field of 'behavioral queues'.It presents a powerful stylized model for a queueing resource in which potential customers decide to join or to balk based on delay information.Since then performance and economic analysis of queueing systems with balking customers has been studied extensively.
In particular, the effect of providing workload information to strategic customers was investigated in e.g.[26,27].Many other model variations are reviewed in [29,30].More background on applied probability and queueing systems in a general context is provided in e.g.[4,17].
For various specific types of queueing systems with impatient customers explicit performance results are known; see, for example, [9,10,38,39,41,42].Importantly, however, the estimation techniques presented in our paper do not require knowledge of e.g. the queue's stationary distribution.It is also worth stressing that despite the fact that queues with impatient customers are well studied, to the best of our knowledge, hardly any workload-based estimation techniques have been developed.A notable exception is [25], featuring an asymptotic analysis of statistical inference of waiting times truncated by abandonments; our setting, however, is crucially different from the one in [25], as in the latter abandoning customers and their sojourn times are observed.
There are various papers on estimation problems in the setting where all customers join, but leave when getting impatient; examples are [1,14,40,54].Importantly, this setting is crucially different from ours, in which the balking customers are not observed.In this branch of the literature the challenge lies in the fact that for customers who are served, we have just a lower bound on their impatience time, making this a 'right-censored' estimation problem.
Another related research direction concerns the so-called queue inference engine: a setting in which one does not observe the arrival process of the customers, but rather the ordered service entry and service completion times (so-called 'transactional data'); see for example [16,36].This situation one comes across when considering e.g.automatic teller machines where the queue of customers waiting for the machine (and in particular their arrival times) is not observable, but the transactional data is recorded.In [16] a multi-server queue is considered, but with the specific feature that balking is assumed to occur when all servers are busy upon arrival, with a fixed probability independent of the queue-length, making the underlying queueing dynamics fundamentally different from ours.
Strictly speaking, [46,47] do not deal with transactional data: it is assumed that the interarrival times corresponding to the non-balking customers are observed (so that its scope is somewhat similar to ours).At the methodological level, however, [46,47] crucially differ from our approach: the proposed estimation procedure relies on discrete-event simulation rather than a rigorously backed maximum likelihood procedure.
Our work fits in the larger branch of the literature on inference for queues.This domain focuses on estimation problems where the queue is observed (for instance periodically), with the objective to estimate its input parameters.We refer to [3] for a broad recent overview of this area.Examples are the procedures for the M/G/1 queue as presented in [19,28].We also refer to the Poisson-probing based approach [50] for Lévy-driven queues [17], and the general framework presented in [20], as well as the corresponding hypothesis-testing problem [44].Some approaches rely on exploiting knowledge of the queue's tail probabilities [15,21,24].A related line of research concerns the detection of stability or instability of queueing networks; see the Monte Carlo based methods of [37,43].
The main conclusion from the above is that, despite the mature literature on queues with impatient customers and related estimation problems, there is a lack of methodologies that can deal with the situation in which balking customers are not observed.1.3.Paper organization.This paper is organized as follows.In Section 2 we describe the model featuring in this paper and introduce the notation used.In addition, we present a number of known results on queues with customer impatience that are relied on later.Then, in Section 3, we construct and analyze our maximum likelihood estimator.For continuous patience-level distributions, we prove strong consistency and asymptotic normality of the estimation error.The case of deterministic patience is treated separately, leading to a similar conclusion regarding strong consistency, but with the notable difference that in this case the limiting distribution of the estimation error is exponential.Section 4 provides detailed analysis of the MLE for exponential, hyperexponential and generalized hyperexponential patience distributions, the latter cases being relevant as they can be used to approximate the distributions of arbitrary non-negative random variables.This includes explicit analysis of the likelihood function and its corresponding asymptotic variance, as well as simulation experiments assessing the performance of the estimators.Section 5 illustrates how the method can be utilized for several applications.In Section 6 we provide concluding remarks and directions for followup research, including a brief discussion of a non-parametric estimation procedure.

Model and preliminaries
In this section we provide a detailed model description, introduce the notation used throughout this paper, state the objective of the paper, and present preliminaries.
Model description.We consider a service system with s servers (s ∈ N).Potential customers arrive at the service system according to a Poisson process with rate λ.Each customer has a service requirement with cumulative distribution function (cdf) G(•) and a patience level with cdf H(•).We define B and Y as the generic random variables corresponding to service requirements and impatience levels, respectively.The interarrival times (T i ) i=1,2... , the service requirements (B i ) i=1,2,... , and patience levels (Y i ) i=1,2,... are independent sequences of i.i.d.distributed random variables.Customer i joins the queue if Y i is smaller than or equal to the waiting time she will experience.More specifically, let V (t) (for t 0) denote the virtual waiting time at time t, which is defined as the time it would take before at least one server becomes idle if no customers have joined the system after time t: where Q(t) denotes the number of customers in the system (including those in service) at time t and N D (t) (for t 0) denotes the total number of customers finishing their service in a time-interval (0, t].For the single-server case (i.e., if s = 1), the virtual waiting time equals to the workload in system, i.e., the sum of remaining service requirements of customers.Note that the virtual waiting time in the multi-server case (s = 2, 3, . ..) does not coincide with the workload in system, unlike in the single-server case: (i) When i ∈ {0, 1, . . ., s − 1} servers are busy, the virtual waiting time equals zero, while the workload equals some positive value and decreases with slope i as time passes without effective arrivals.(ii) When all s servers are busy, the virtual waiting time and the workload decrease with slope 1 and s, respectively, as time passes without effective arrivals.Figure 1 illustrates an example of the realization of the virtual waiting time process for s = 2.The actual waiting time of a customer arriving at time t equals V (t−), so that a customer balks if the virtual waiting time just before the arrival instant exceeds her patience threshold.The resulting system can be considered as an M/G/s queue with impatient customers.
As some customers will balk, there will be a difference between the interarrival times of the customers (covering both the balking and non-balking customers) on one hand, and the effective interarrival times (covering just the non-balking customers) on the other hand.Throughout this paper the sequence of effective interarrival times is denoted by (A i ) i=1,2,... .Importantly, observe that the random variables A i are generally neither identically distributed nor independent.We can reconstruct them from the sequences (T i ) i=1,2,... , (B i ) i=1,2,... , and (Y i ) i=1,2,... in the following manner.For the following construction we let a non-balking customer arrive at time t = 0, as will be further motivated in Remark 1. Denoting the virtual waiting time process at time t by V (t), we have that A 1 = j 1 k=1 T k where T k where Observe that j i has the interpretation of being the index of the i-th non-balking customer.Let Ãi := i k=1 A k denote the i-th effective arrival time.We denote the upward-jump size in the virtual waiting time caused by the i-th joining customer by X i := V ( Ãi ) − V ( Ãi −).In the single-server case (s = 1), we have X i = B j i .In the multi-server case (s 2), on the other hand, X i takes a complicated form depending on both the queue-length and the residual service times of customers in service seen on the arrival instant.For a single-server system (s = 1), Figure 2 illustrates the workload process during a single busy period for the case of deterministic patience, i.e., Y = θ for some θ > 0. The building blocks of our framework, viz. the interarrival and service requirements and their effective counterparts, are highlighted in the figure.It is stressed that the distribution of the effective interarrival times can significantly deviate from the (exponential) distribution of the interarrival times.Figure 3 illustrates this effect for two different patience-level distributions.In addition, as mentioned, the effective interarrival times are not independent, as opposed to the interarrival times.
We focus on the case that the patience-level distribution is determined by a parameter θ that is a vector in R p for some (known) p ∈ N, entailing the estimation problem is parametric.In addition, the arrival rate λ is to be estimated, which is evidently of a parametric nature as well.Alternatively, θ could be a function from some more general space, rendering the estimation problem non-parametric; in Section 6 we briefly discuss an approach that can be used in this setting.Throughout this paper, we assume that we observe the full queue-length process.This in particular means that we observe the arrival and departure epochs of the non-balking customers (and that we do not observe the balking customers at all).The objective is to use this information to somehow learn the true arrival rate and the parameters of the patience-level distribution.More concretely, we wish to devise statistical procedures for estimating the arrival rate λ and the distribution of the patience Y (both corresponding to non-observable quantities) with provable performance properties.
Stationary distribution.We assume that the mean service time E[B] is finite, and that, with ρ := λE[B]/s denoting the traffic intensity, ( The virtual waiting time V (t) during a busy period for a singleserver system and deterministic patience levels.The dotted red lines mark arrival instants of new customers.Arrivals 3 and 5 observe a virtual waiting time level above θ and immediately balk, and thus the virtual waiting time continues to deplete as if there was no arrival.Although six customers have arrived during the busy period, only four effective interarrival times and service requirements are observed, namely (( Under these assumptions, we have that the queue is stable and that a stationary queue-length Q := Q(∞) and a stationary virtual waiting time V := V (∞) exist (see [5] and [6].)Note that lim x→∞ H(x) < 1 holds only if there are customers with infinite patience, i.e., P(Y = ∞) > 0. If P(Y < ∞) = 1, on the other hand, then we have lim x→∞ H(x) = 1, so that the system is stable irrespective of the value of the traffic intensity ρ.As we will see, the MLE method to be developed relies only on the transient dynamics of the system, so that it can be applied even if the stability condition (2) is not satisfied.On the other hand, we will utilize the system stability to discuss the asymptotic properties of the proposed estimator.
For the M/G/s queue with impatient customers, exact expressions for the distributions of the stationary queue-length Q and virtual waiting time V are not known in the literature.However, we can derive the following relations they satisfy, in a similar way to the single-server case [35].Let q n := P(Q = n) (n = 0, 1, . ..) denote the probability mass function of the stationary queue-length.The stationary virtual waiting time distribution has probability mass π 0 := P(V = 0) = s−1 n=0 q n at zero and it is absolutely continuous on (0, ∞).Let X |y denote a generic random variable for the stationary upward-jump size X conditioned that it takes a positive value and that the immediately preceding virtual waiting time equals y: With the level-crossing argument, we can verify that the density function v(x) of the stationary virtual waiting time satisfies the following integral equation (cf.[9]): Figure 3. Logarithmic scale tail distribution (smooth black) of the potential interarrival time T is exponential with rate λ = 1.Simulation of the tail distributions of the stationary effective interarrival times is illustrated for two cases: A d corresponds to deterministic patience levels with θ = 3 (dotted red), while A e corresponds to exponentially distributed patience levels with mean 3 (dashed blue).The service requirements in the example are Erlang distributed with parameters (5, 1.5).The plotted stationary effective inter-arrival distributions are given by the empirical distribution obtained by simulating n = 10 5 effective arrivals for each of the cases.
where J |y (x) := P(X |y > x) denotes the complementary cdf of the conditional upward-jump size X |y .
Owing to the pasta property, the virtual waiting time seen by an arriving customer has the same distribution as the stationary virtual waiting time V .The stationary waiting time W of a non-balking customer is thus given by a conditional random variable [V | V Y ].We can thus identify P(W = 0) and the probability density function (pdf) w(•) of W in terms of the stationary virtual waiting time distribution v(•): where P ℓ denotes the loss probability given by For s = 1, (3) simplifies to where G(x) := 1 − G(x) denotes the complementary cdf of service requirements.A solution of the integral equation ( 6) is, as can be found in [6,34], given in terms of the pdf g e (x) := G(x)/E[X] of the equilibrium distribution (i.e., the residual lifetime distribution) corresponding to the service requirements: In some special cases, the expression (7) for the virtual waiting time density v(•) can be further simplified, as shown in the examples below.
Example 1 (Constant patience levels).We consider the case that the patience levels take a constant value θ 0 (for some θ 0 > 0).Let g e (•) (for x 0, n = 1, 2, . ..) denote the n-fold convolution of the pdf g e (•) of the equilibrium distribution of the service times, and We then have, by [18,34], with as usual '⋆' denoting the convolution operator, Example 2 (Exponential service times).If the service times follow an exponential distribution with mean 1/µ, we have by [34,52] that Observe that if the patience distribution is also exponential with rate θ, then the expressions can be simplified by substituting

Parametric estimation procedure
In the setting considered, we focus on the first non-balking n + 1 customers.We set the time origin t = 0 to the time instant that a non-balking customer joins the system.We record the effective arrival times Ã1 , Ã2 , . . ., Ãn and departure times.From this information, we can fully reconstruct the virtual waiting time process using (1); in particular, we obtain the sequence of virtual waiting times V (0), V ( Ã1 ), . . ., V ( Ãn ) observed by the non-balking customers and the sizes of upward jumps X 0 , X 1 , . . ., X n caused by them.In this section the goal is to estimate, in a parametric context, the arrival rate and patience-level distribution from the data.
The crucial observation is that we can construct a likelihood function of the patience-level distribution using an observed sample of effective interarrival times A 1 , A 2 , . . ., A n , waiting times , and upward-jump sizes X 0 , X 1 , . . ., X n , which can be reconstructed from the observed sequences mentioned in the previous paragraph.We work under the natural assumption that the observed system is in stationarity, so that W 0 is the stationary waiting time of a non-balking customer and X 0 is a stationary upward-jump size.
Remark 1.To make the sequence of waiting times W 1 , W 2 , . . ., W n stationary, we should let the time origin coincide with an arrival instant of a non-balking customer.The validity of this statement can be argued as follows.
We first point out that the sequence of waiting times is in general not a Markov process; only for s = 1 it is.We obtain a Markov process when considering the vector of residual service times at effective arrival instants.The waiting times and upward jump sizes are in fact measurable functions of this process.Therefore, in what follows we let the underlying Markov process of residuals be stationary.
First it is observed that assuming the virtual waiting time at time t = 0 being stationary does not imply that the waiting time W 1 of the first customer follows the stationary distribution W .To see this, consider (for simplicity) an ordinary M/G/s queue (without customer impatience, that is).If the virtual waiting time at time t = 0 follows the stationary virtual waiting time distribution V , then the first customer arriving after time t = 0 finds an idle server with probability Since the stationary waiting time W has the same distribution as the stationary virtual waiting time V in the ordinary M/G/s queue (due to pasta), we find the inequality P(W 1 = 0) > P(V = 0) = P(W = 0).This shows that the waiting time W 1 of the first-arriving customer is biased, in that it is not distributed as W .
Intuitively, the bias is a consequence of the fact that a customer arriving after a long interarrival time is more likely chosen as the first arriving customer in our experiment (cf. the well-known inspection paradox in renewal theory), and such a customer tends to observe the system less congested than time-average.It is easily seen that letting time t = 0 correspond to the arrival instant of a non-balking customer makes the sequence of waiting times W 1 , W 2 , . . ., W n stationary.⋄ Let H(x) := 1 − H(x) denote the complementary cdf of patience levels.Also, we denote by H(x) the probability of a customer joining when the virtual waiting time equals x 0: Note that H(x) = H(x) = P(Y > x) for continuous patience distributions.
We next describe the construction of the likelihood function.We first recall that the effective interarrival times A 1 , A 2 , . . ., A n are not exponentially distributed, due to the fact that between two effective arrival instants there may have been arriving customers who observed a virtual waiting time level exceeding their patience level (as depicted in Figure 3).Despite this, we can still characterize the effective interarrival time distribution in terms of the observed quantities.To see this, suppose that, for some t 0, there have been no effective arrivals in the interval ( Ãi−1 , Ãi−1 + t].An arrival in ( Ãi−1 + t, Ãi−1 + t + ∆t] then occurs with probability λ ∆t + o(∆t), as ∆t ↓ 0. The corresponding customer joins the system (i.e., becomes effective) with probability H(V ( Ãi−1 ) − t) because (i) the virtual waiting time seen on the arrival equals max{0, V ( Ãi−1 )−t} and (ii) H(x) = 1 for x < 0. Therefore, the occurrence of the next (i.e., the i-th) effective arrival follows a timeinhomogeneous Poisson process with time-dependent intensity λ H(V ( Ãi−1 ) − t).Noting that This representation of the distribution of the effective interarrival times facilitates the evaluation of the likelihood.The (conditional) likelihood function for A := (A 1 , A 2 , . . ., A n ) given W := (W 1 , W 2 , . . ., W n ) and X := (X 1 , X 2 , . . ., X n ) is given by the product of the conditional densities This yields the conditional likelihood function where we used a Lindley-type recursion, generalized to the multi-server queue: In the rest of this section, we assume that the patience-level distribution is characterized by a finite-dimensional vector of parameters θ ∈ Θ ⊆ R p .For θ ∈ Θ, let H θ (•), H θ (•), and Hθ (•) denote H(•), H(•), and H(•) given parameter θ, respectively.Let any H θ (•), with θ ∈ Θ, be identifiable in the conventional Kullback-Leibler sense.The maximum likelihood estimator (MLE) of (λ, θ) is then given by where In the sequel we provide an asymptotic analysis describing the performance of this MLE.We first focus, in Sections 3.1-3.2,on the parameters pertaining to the patience only; i.e., the estimation does not cover the arrival rate λ.The object of study is, in self-evident notation, for λ > 0 given, Extending this to a procedure to also include estimation of the arrival rate is relatively straightforward; we get back to it in Section 3.3.
Remark 2. Observe that L is a conditional likelihood and not a full likelihood.This is due to the fact that the upward jump-sizes X i have an elaborate distribution, both marginally and jointly with the waiting times.The exception is the single-server case where X i ∼ B and is independent of the previous waiting and arrival times.Hence, for s = 1, the maximization of L n (H; A, W | X) with respect to them is equivalent to that of the unconditional likelihood function for (A, W , X).
It is noted that the procedure to estimate θ does not require the simultaneous estimation of the service-requirement or upward-jump size distributions.⋄ In more detail, the remainder of this section is organized as follows.First we establish (in Section 3.1) conditions for strong consistency and asymptotic normality of our estimator of the patience parameters, with the errors scaled by √ n, focusing on continuous patience-level distributions.We then consider (in Section 3.2) the case of deterministic patience and establish consistency (independently) for this case as well.It is noted that in the latter case the asymptotic errors are not normally but rather exponentially distributed, with the errors scaled by n.This is due to the fact that for deterministic patience the MLE is obtained on the boundary of the sample data, much like in the well-known case of estimating the parameter θ of a uniform distribution on [0, θ].As mentioned, Section 3.3 discusses the estimation of the arrival rate.
3.1.Continuous patience-level distribution.This subsection covers the asymptotic performance of the MLE for the case that the patience-level distribution is continuous and parametric.Let θ 0 denote the true parameter.Throughout the following analysis the underlying probability measure is P θ 0 , i.e., the probability measure corresponding to the true patience-level distribution.For the asymptotic results we make the following assumptions, which will be discussed in Section 3.1.1.
Assumption.The following assumptions are imposed: (A1) The parameter space Θ ⊂ R p is a compact set such that the true parameter lies in the interior of the set, i.e., θ 0 ∈ Θ o .(A2) The observation period commences at t = 0 which corresponds to the stationary arrival instant of a non-balking customer, and thus the sequence of waiting times Depending on whether the right endpoint takes a finite value or not, we assume one of the following properties: (i) If h sup = ∞, then there exists a positive non-decreasing function f (•) such that lim x→∞ f (x) ∈ (0, ∞] and for some constants (ii) If h sup < ∞, then there exists a positive non-decreasing function f (•) such that lim x→hsup− f (x) ∈ (0, ∞] and for some constants (A4) The gradient vector and Hessian matrix of H θ are continuous with respect to θ.With Ψ 1 (θ) denoting the Hessian matrix of the log-likelihood corresponding to a single waiting time (in stationarity), EΨ 1 (θ) has finite elements in all coordinates for any θ ∈ Θ. (A5) The collection of functions {H θ (x) : θ ∈ Θ} is equicontinuous: for any x 0 and ǫ > 0 there exists a δ > 0 such that |H θ (x) − H θ (y)| < ǫ for any y such that |x − y| < δ, uniformly over all θ ∈ Θ.
We now state the main results of this subsection.In the sequel, N(µ, Σ) denotes a normally distributed random variable with mean vector µ and covariance matrix Σ.
The proofs of the above theorems are provided in Section 3.1.2,after the detailed discussion of (A1)-(A5) that we give in Section 3.1.1.
3.1.1.Discussion of assumptions.We next provide more background on the assumptions imposed.In addition we discuss their possible relaxation.
(1) Assumption (A1) is natural, as it requires that the parameter space is big enough so that it contains the true parameter.In practice the parameter space can be adjusted on-the-fly, for example if for large n we obtain boundary solutions for the first order conditions.( 2) Assumption (A2) facilitates the use of known results on the convergence of stationary dependent sequences.Note that, as long as a stationary virtual waiting time distribution exists (for which we have given the condition as Eq. ( 2) in Section 2), the same results should hold without making this assumption because of the regenerative nature of the process.However, without this assumption the conditions for both consistency and asymptotic normality are harder to verify.In particular, the crucial step for consistency is the uniform convergence of the log-likelihood, established in Lemma 1 below.More elaborate conditions for uniform convergence are detailed in for example [2,48,49].(3) If there are patient customers who do not balk regardless of the waiting time, that is, lim x→∞ H θ 0 (x) > 0, then Assumption (A3) requires that the parameter space Θ is chosen so that the existence of patient customers (lim x→∞ H θ (x) > 0) is assumed for all θ ∈ Θ, which is a reasonable assumption in modeling a service system where both patient and impatient customers exist.In the case of lim x→∞ H θ 0 (x) = 0, on the other hand, Assumption (A3) requires that the decay rate of H θ (x) does not vary too strongly among the distributions H θ , with θ in the parameter space Θ.In practice, this assumption is seldom violated.For example, if the true patience-level distribution decays exponentially, i.e., lim x→∞ e νx H θ 0 (x) = c for some ν > 0 and c > 0, then (A3) is satisfied if the infimum tail function H inf (x) does not decay faster than the doubly exponential function e −e νx (which evidently decays exceptionally fast).As another example, supposing that the true patience-level distribution obeys a power law, i.e., lim x→∞ x k H θ 0 (x) = c for some k > 0 and c > 0, then (A3) is satisfied if H inf (x) does not decay faster than e −x k .(4) Assumption (A4) enables the construction of a standard martingale CLT for the asymptotic distribution of the estimation error of the MLE.It is not a necessary condition, but in cases where the assumption is not satisfied one is typically required to apply ad-hoc analysis to derive an asymptotic distribution.In Section 3.2 we show that for a deterministic patience level, that does not satisfy (A4), the asymptotic distribution of the error is exponential and not normal.Assumption (A2) implies that the expectation of the Hessian matrix of the log-likelihood Ψ 1 (θ) is the covariance matrix of the gradient, and thus it is always positive definite.As a consequence, one just needs to verify that the coordinates are finite for (A4) to hold.(5) The equicontinuity assumption (A5) enables concise analysis and can be verified for many continuous patience-level distributions.For example, if there exists a bounded density for every H θ (•), then it is Lipschitz continuous, and a uniform bound is given by the supremum of the constants in the compact set Θ.The assumption does not hold if the support of the distribution depends on θ.If H θ (x) has some discontinuities (with respect to x), one could pursue replacing (A5) by an appropriate upper semi-continuity assumption; see [23,Thm. 16b] and [48].We will present such an example in Section 5.1.Our asymptotic results may hold by replacing (A5) by (A4) and verifying additional measurability conditions on inf η∈B θ H η (x) for a neighborhood B θ of any θ ∈ Θ (see [2, Corr.2]).
3.1.2.Proofs.From ( 14), first note that we can express the log-likelihood Observe that we replaced Hθ (•) by With W := lim i→∞ W i existing, as n → ∞, the continuous mapping theorem yields where A 1 is the effective interarrival time when the initial virtual waiting time is W 0 + X 0 and W 0 = d W , i.e., the stationary waiting time.From now on we use the compact notations ℓ n (θ) := ℓ n (θ; A, W | X) and ℓ(θ The density of the effective interarrival times ( 11) is uniquely determined by the function H θ (•), and thus so is the likelihood (14).This is because the function dν θ (u) : x 0 is uniquely determined by θ, as a consequence of the fact that for every θ ∈ Θ the measure ν θ (•) corresponds to a different distribution (in the almost-everywhere sense, that is).As a consequence, {H θ (•) : θ ∈ Θ} is a parametric collection of distribution functions such that there is no pair θ 1 , θ 2 ∈ Θ for which H θ 1 (•) = H θ 2 (•) almost everywhere.This entails that the model is identifiable in the Kullback-Leibler sense (see e.g.[23, Ch. 17]), and hence The key step in the proof of Theorem 1 is establishing a uniform version of (18); then strong consistency follows by the methodology of Wald [23,.The virtual waiting time observations are not independent but by (A2) they are stationary, so that we can apply a uniform law for stationary sequences that is commonly used in the econometrics literature.We specifically rely on a theorem taken from [49] and its extensions, in particular the ones developed in [2,48].The uniform convergence is established in Lemma 1 (proven in the appendix).
Proof of Theorem 1. Observe that the strong consistency follows from the identifiability property (19) and from Lemma 1.Note in particular that the proof of [23,Thm. 17] does not rely on i.i.d.observations once uniform convergence has been established, and so the same steps can be applied here.In particular, a sufficient condition (relying on uniform convergence for strong consistency when observations are dependent) can also be found in [31]: for every θ = θ 0 there exists a neighborhood B δ (θ) such that lim n→∞ sup γ∈B δ (θ) almost surely.This follows directly from ( 19) and (20).
We continue by proving asymptotic normality that was stated in Theorem 2. This amounts to showing that the estimation error √ n( θ n − θ 0 ) is asymptotically normal as n → ∞, assuming that (A1)-(A5) are satisfied.To this end we apply the well-known delta method and an appropriate version of the martingale CLT (see e.g.[32, Thm.12.6]).
Proof of Theorem 2. For any given v, we let ∇H θ (v) ∈ R p and ∇ 2 H θ (v) ∈ R p×p denote the gradient and Hessian, respectively, of H θ .These are both continuous with respect to any coordinate θ k of η, as a consequence of (A4).Thus, both the gradient and the Hessian of the log-likelihood are continuous functions.These are given by ∇ℓ n (θ) := ln (θ) and ∇ 2 ℓ n (θ) := n i=1 Ψ i (θ), respectively, where, for k = 1, . . ., p, and, for k, l = 1, . . ., p, Following the lines of the standard delta method [55, Ch. 3], we consider the expansion of ln (•) at the MLE θ n around the true parameter: If θ 0 ∈ Θ, then due to the strong consistency that we found in Theorem 1 we have, as n → ∞, that θ n → as θ 0 .
Furthermore, as θ 0 ∈ Θ by assumption (A1), the smoothness assumption (A4) implies that, as n grows large, ℓ n (θ) converges to a concave function and the MLE is given by a sequence of roots θ n satisfying ln ( θ n ) = 0, i.e., from some N on there is no boundary solution for any n N with probability one.Therefore (24) yields, with i.e., both sides of (25) converge to the same distribution (if the limits exist).In Lemma 2 we apply the stationarity of W i in (A2) to show that B n → as I(θ 0 ), where I(θ 0 ) = −EΨ 1 (θ 0 ).Furthermore, ln (θ 0 )/ √ n is shown to satisfy a martingale CLT with asymptotic variance I(θ 0 ).As E∇ℓ 1 (θ 0 ) = 0, Assumption (A2) implies that the asymptotic variance equals the stationary covariance of the gradient hence I(θ 0 ) is a positive definite matrix (see [23,Ch. 18] for more details).Assumption (A4) further demands that the elements of I(θ 0 ) are all finite, and then combining the above and applying Slutsky's theorem to (25) yields Theorem 2.
We are thus left with showing Lemma 2 below; its proof is given in the appendix.
3.2.Constant patience-level MLE.We next consider the case where all customers have the same patience level θ 0 , i.e., H(y) = 1 {y θ 0 } ; observe that in this case the stability condition ( 2) is always satisfied, but the continuity assumptions of Section 3.1 do not apply.As we will see, in this case the properties of the MLE are markedly different from those identified in Section 3.1.
In this setting, the likelihood function ( 14) reduces to, with the event E i denoting {W i−1 +X i−1 θ} and E c i its complement, .
Observe that L(θ; A, W | X) (i) equals zero for θ ∈ [0, max i=1,2,...,n W i ), (ii) has an upward discontinuity at θ = max i=1,2,...,n W i and decreases for and (iii) takes a constant value for θ max i=1,2,...,n {W i +X i }.We thus find the intuitively appealing property that the MLE is given by i.e., the maximum virtual waiting time at jump times.Note that this estimator resembles the MLE of the parameter θ when the observations are uniformly distributed on [0, θ].
While the estimator θ n is clearly biased (which follows from E θ 0 [ θ n ] < θ 0 for all n), we show that it converges almost surely to θ 0 as n → ∞.In addition, we prove that the estimation error scaled at rate n converges to an exponential random variable. and where P ℓ and v(θ 0 ) denote the stationary loss probability and the virtual waiting time density at level θ 0 .In addition, the asymptotic variance of the estimation error agrees with the variance of the limiting exponential distribution, and is given by The proof of Theorem 3, which is given in the appendix, constructs lower and upper bounds for the MLE and establishes that they both converge to θ 0 .The same bounds are also used to characterize the asymptotic distribution of the estimation error.

3.3.
Estimating the arrival rate.Where in the preceding subsections we focused on estimating the patience-level distribution (in a parametric context) for a given value of the arrival rate λ, we now discuss the estimation of λ.Let λ 0 denote the true value of λ.There are several ways to estimate it, the most basic one relying on the idle period observations.Denote the duration of the k-th idle period by I k , where in the multi-server setting (s > 1) an idle period refers to time intervals such that at least one server is idle.Denote by E k the total number of effective arrivals during idle period k = 1, 2, . . .(meaning, for instance, E k = 1 for all k = 1, 2, . . .when s = 1).Let C n denote the number of idle periods observed up until (and including) the n-th effective arrival.Then we propose the estimator The rationale behind this estimator is that there is no balking during idle periods as the virtual waiting time is zero.Hence, the arrival process during these idle times is homogeneous Poisson with rate λ.Therefore, the estimator λ n is a standard MLE of the rate parameter of an exponentially distributed random variable, and satisfies all desired asymptotic properties.Of course, the above procedure does not exploit a substantial amount of potentially useful data that is collected during busy periods.For the case of a deterministic patience level θ 0 , the above estimator is easily improved upon.Recall that θ n was defined as max{W 1 , W 2 , . . ., W n }, with the immediate consequence that θ n θ 0 .Supposing we observe an arrival such that immediately after this arrival the virtual waiting time level is still below the current value of the estimator, then the next arrival is an effective arrival, and hence occurs after an exponentially distributed random variable with rate λ.In this way we generate more observations, thus allowing to estimate λ with a better precision.Observe that the new observations and the idle times form an i.i.d.sequence.
For the case of a continuous patience-level distribution one may use (33), or alternatively set up a joint MLE for λ and θ.The latter has clear advantages, in particular for small samples or heavily loaded systems.As the log-likelihood function ( 17) is a smooth and concave function with respect to λ, the MLE of the arrival rate for any estimator θn of θ is The asymptotic results of Section 3.1 can therefore be extended in a straightforward manner, so as to cover the joint estimation of (λ, θ).We demonstrate this in Section 4.1, where we detail a procedure for jointly estimating the arrival rate and the patience parameter for the case of an exponentially distributed patience level.

Exponential and generalized hyperexponential patience
In this section we discuss a robust and practical approach for estimating continuous patience distributions.In our approach, this is achieved by fitting the MLE of a generalized hyperexponential (GHE) distribution.This approach is attractive because the class of GHE distributions is known to be dense in the space of non-negative continuous distributions (see for instance [8]), which, in practical terms, means that any non-negative continuous distribution can be approximated arbitrarily closely by a GHE distribution.In our simulation-based experiments highly accurate estimates are obtained, even when the baseline patience distribution itself is not GHE.
We first provide a detailed analysis of the joint MLE for the arrival rate and the single parameter of an exponential patience distribution, to then move to the cases of hyperexponential and generalized hyperexponential patience distributions.We also present a heuristic search method that fits an approximate GHE distribution for any continuous patience distribution.
4.1.Exponentially distributed patience.Suppose that the arrival rate λ 0 is unknown and that the patience-level distribution is exponential, i.e., H θ 0 (x) = 1 − e −θ 0 x for an unknown θ 0 (and x 0).Due to (17) it can be verified, by distinguishing between the cases W 0 + X 0 A 1 and W 0 + X 0 < A 1 , that the log-likelihood for a single observation is here the Lindley recursion ( 13) has been used.Our objective is to analyze the MLE of both λ and θ for a sample of size n, which we denote by ( λ n , θ n ).We assume (A1)-(A2) hold, i.e., compact parameter space and stationary W 0 .In addition, (A3) holds because H θ is exponential for all θ ∈ Θ.The log-likelihood ℓ 1 (λ, θ) is Lipschitz continuous (with respect to the observations) and so (A5) holds.We thus have that Theorem 1 holds and the MLE for (λ, θ) is strongly consistent.
Clearly this is a smooth function.We proceed by computing the gradient and Hessian with respect to (θ, λ).It takes some elementary calculus to verify that gradient is given by For any given value of θ, ℓ 1 (λ, θ) is a concave function in λ, and therefore the MLE of ( 34) is given by The MLE θ n now follows by maximizing 1 n n i=1 ℓ 1 (θ, λ n (θ)) for θ ∈ Θ.The optimizing θ is obtained either on the boundary of the parameter space Θ or by solving the first order condition ∇(ℓ 1 (θ, λ n (θ)) θ = 0.As n → ∞ we are guaranteed to find an interior solution as long as θ 0 ∈ Θ o .
Taking second derivatives, we find that the entries of the Hessian matrix are given by 2 )e −θ(W 0 +X 0 ) .As w k e −θw is a bounded function on w ∈ [0, ∞) for any θ > 0 and k ∈ N, we conclude that −E∇ 2 (ℓ 1 (λ 0 , θ 0 )) < ∞ and therefore (A4) applies, which implies that Theorem 2 holds.In particular, as n → ∞, and the asymptotic covariance is given by the inverse of the Hessian, We performed numerical experiments to assess the performance of the estimation procedure.Table 1 presents approximated confidence intervals for the maximum-likelihood estimators ( λ n , θ n ); in addition we also evaluated the estimator of the arrival rate based on idle periods, denoted by λn , as discussed in Section 3.3.The confidence intervals are evaluated for three congestion levels, namely ρ = λ EX ∈ {0.5, 1, 2}.In all experiments the sample size (of observed waiting times) was n = 1000.Evidently, however, the heavier the system load, the fewer the number of idle periods: on average C n = C 1000 equals 665, 457, and 267 for the three congestion levels.
The numerical output is summarized in Table 1.A first observation is that the MLE for θ n is more accurate as the system load increases.This is because the patience threshold is reached more often and therefore the data is more informative.Furthermore, for a lightly loaded system the MLE λ n and idle-period based estimator λn yield similar (accurate) results, as opposed to the high-load regime in which the two estimators behave quite differently: • As a consequence of the fact that idle periods are observed considerably less frequently for a high load, λn becomes substantially less accurate.• The accuracy of λ n , however, is only slightly reduced in the high-load regime.This is potentially due to the better accuracy of the estimation of θ, being jointly estimated with λ in the MLE procedure.
In the most heavily loaded example (ρ = 2, that is) the 95% confidence interval of the MLE is very similar to the 80% confidence interval of the idle-period based estimator.We conclude that even though the observations of the waiting times and the effective interarrival observations are highly dependent, the MLE λ n provides substantially better confidence intervals than the idle-period based estimator λn .
Estimator  1. Confidence intervals for the MLE s ( θ n , λ n ), as well as the arrival rate estimated from idle periods λn , for different confidence levels.A total of n = 1 000 waiting observations were generated, M = 10 000 times for each parameter setting.The parameter of the exponential patience parameter was θ = 0.5, the arrival rate was λ = 1, and the service requirements followed a Gamma distribution with parameters (η, µ) with µ = 1 and η ∈ {0.5, 1, 2}.
Table 2 presents empirical confidence intervals for the maximum likelihood estimators of (λ, θ) for a multi-server system with s = 5.The confidence intervals are wider than in the single-server case, even though the sample size was taken twice as big, being indicative of the fact that the variance of the estimation error in a system with multiple servers is higher.As in the single-server case, lower system load increases the variance of the estimation error for the patience parameter θ.This effect is even stronger in the multi-server setting due to the fact that balking occurs only when all 5 servers are working, and such a state is not frequently observed.For higher load the estimation of the patience parameter is much more accurate, as was the case for the single-server system.For the arrival rate we observe high accuracy for all load levels.Table 2. Empirical confidence intervals for the MLE s ( θ n , λ n ) for different confidence levels.A total of n = 2 000 waiting observations were generated, M = 10 000 times for each parameter setting.The parameter of the exponential patience parameter was θ = 0.4, the arrival rate was λ = 1.The data was simulated for a system with s = 5 servers, and the service requirements followed a Gamma distribution with parameters (η, µ) with µ = 0.8 and η ∈ {2, 4, 8}, corresponding to loads of ρ = η sµ ∈ {0.5, 1, 2}.
4.2.(Generalized) hyperexponential patience.This subsection focuses on the case of the patience-level distribution being generalized hyperexponential (GHE), meaning that the corresponding cdf can be written as a mixture of exponential cdf s, with weights that sum to 1 but that are not necessarily positive -this in contrast with the standard hyperexponential (HE) distribution, where the weights are assumed to be positive.This case is particularly relevant because, as argued in [51], this distribution has a good empirical fit to patience data.Another motivation for considering this distribution lies in the known fact that the cdf of any continuous positive random variable can be approximated arbitrarily accurately (in terms of a suitably defined metric) by a GHE cdf [8].
For convenience we now assume the arrival rate λ to be known, but it is noted that it can be estimated in a similar manner as in the exponential case discussed in Section 4.1.Suppose the degree of the GHE distribution is p ∈ N: where θ = (α 1 , . . ., α p , β 1 , . . ., β p ) ∈ R 2p , p k=1 α k = 1 (where we, importantly, do not assume the α k to be positive) and β k > 0 for k = 1, . . ., p. Without loss of generality we assume that α 1 > α 2 > . . .> α p .Denote the coordinates of the true parameter θ 0 by (α 0k , β 0k ) for k = 1, . . ., p.
It means that we are to identify the 2p-dimensional parameter vector θ 0 = (α 01 , . . ., α 0p , β 01 , . . ., β 0p ).Similarly to the previous example of the exponential distribution, Assumptions (A1)-(A3) and (A5) are satisfied when assuming stationary waiting times and a compact parameter space Θ.Therefore, the MLE is strongly consistent by Theorem 1.One should be cautious when fitting a GHE distribution because of further conditions to be imposed on the parameters to make sure H θ (•) is a proper cdf.Even though convergence to the true parameters of the distribution is eventually guaranteed, for finite samples the estimated parameters may not yield a proper distribution.For an in-depth discussion on these conditions and the identifiability of GHE distributions, we refer [8, Section 3].
Using the representation ( 17), the log-likelihood pertaining to a single observation is calculated in a similar manner as in the exponential case, yielding As follows with some elementary algebra, the gradient is then given by, for j = 1, . . ., p, The Hessian can be derived by computing the matrix of second derivatives.For the evaluation of the asymptotic covariance, a convenient alternative is to apply (26), i.e., so as to avoid the symbolic evaluation of the matrix of second derivatives.The entries of I(θ 0 ) are finite because these are combinations of terms of the following types: • Products of polynomials (of degree at most 2) and exponentials.For example, we come across a term that is, up to a multiplicative constant, W 2 0 e −β j W 0 .Observe that the mapping x → x k e −x is bounded for x 0.
• Ratios of the form, e.g., up to a multiplicative constant In addition, it is verified that if the patience-level distribution is light-tailed (which is the case for the GHE distribution), then the stationary waiting time W is also light-tailed and has finite moments.Indeed, note that (3) implies so that we have from (4) that the density of W is bounded as w(x) λH(x), i.e., if the patience level distribution is light-tailed, so is W .As a consequence, from the above and Theorem 2, the estimation errors (scaled by √ n, that is) converge to a multivariate normal distribution.
Although there are theoretical guarantees for the asymptotic performance of the MLE, computation of the MLE is not straightforward, even for small parameter spaces such as p = 3.It requires maximizing ℓ n (θ)/n, i.e., solving a non-linear and non-concave program in p×(p−1) variables, with both equality and inequality constraints.This is computationally highly challenging, and standard optimization methods may lead to local maxima.In particular, observe that ∇(ℓ 1 (θ)) β j is very small for large values of β j , which implies that ℓ n (θ)/n displays very 'flat' behavior for large values of these β j .Our experiments revealed that a direct implementation using standard optimization packages often led to points that were even not local maxima.As a consequence, we decided to write dedicated code to compute the MLE.
In light of the inherent complexity of maximizing the likelihood function, in our numerical experiments we have applied the following nested two-step heuristic optimization method.
(a) For each vector of α = (α 1 , . . ., α p ) the objective functions ℓ n (θ)/n was maximized with respect to β = (β 1 , . . ., β p ) using a conventional coordinate descent algorithm.This step was typically fast and accurate, as for given α the objective function behaves nicely.Now we have reduced the problem to an optimization over α.(b) Then a standard L-BFGS quasi-Newton method (see e.g.[13]) is applied to ℓ n (θ)/n, so as to search for the optimal vector α, with β being parameterized by α.The optimization is carried out with the following constraints that ensure that the parameters yield a proper distribution [8]: Note that there is no firm guarantee that this heuristic method converges to the optimal parameters.To overcome this, the search for the optimizing vector α was conducted multiple times for different initial values.The resulting method turned out time-consuming, especially for a relatively large sample sizes n and/or a relatively large dimension p of the parameter space.However, it typically returns considerably more robust results than off-the-shelf optimization routines (that in addition tended to converge very slowly).Table 3 presents the marginal confidence intervals obtained by the normal approximation for an example of a HE cdf H θ (•) with p = 2. Observe that the variance of the estimation error is large, even for a substantial sample size.The accuracy is much higher for the low rate of β 1 = 0.25 than the higher rate of β 2 = 1.This may be explained by the fact that the likelihood function is almost flat for high values of β 2 .
The results presented in Table 3 show that the estimates are concentrated around the true values.The variance of the estimation error, however, is high, even more so given the large number of observations used.It should be realized, though, that HE distributions are not always easily distinguishable, in the sense that seemingly different HE distributions may lead to a very similar cdf.Clearly, from a practical standpoint the main question is not whether the correct parameters are recovered, but rather whether the cdf corresponding to the estimated parameters provides a good fit with the true cdf.In this respect the main conclusion of our experiments is that our MLE procedure performs remarkably well.For example, in Figure 4 we illustrate this by presenting the true cdf and the estimated cdf based on four random samples (using n = 10 000 observations).We have used the same parameters as in Table 3, with ρ = 1.We observe that, although the fitted parameters greatly differ from the correct ones, the fit of the (complementary) cdf is highly accurate.
In the remainder of this subsection we further study the performance of the HE MLE.Our experiments lead to the conclusion that HE MLE provides a good fit even in cases where the true distribution is not HE (i.e., in misspecified scenarios).The examples mainly focus on fitting the target distribution by a (conventional) HE distribution, rather than a GHE distribution, by computing the MLE.The motivation behind this choice is that, when working with a GHE distribution, there is the additional complication that the estimated parameters may not yield a proper cdf.More specifically, the space of valid parameters is hard to characterize, let alone to be coded in terms of constraints of an optimization problem.Nevertheless, we would still like to exploit the extra versatility that the GHE class offers, so as to improve the fit (when compared to the HE MLE).The MLE parameters of the fitted functions are: θ n,1 = (0.529, 0.471, 0.193, 0.736), θ n,2 = (0.884, 0.116, 0.297, 1.969), θ n,3 = (0.75, 0.25, 0.269, 1.031), and θ n,4 = (0.573, 0.427, 0.212, 0.746).The arrival rate in the example is λ = 1, and the service requirements are Gamma distributed with parameters (3,2).This is especially relevant for scenarios in which the model is misspecified, bearing in mind that, as mentioned earlier, in principle any distribution can be arbitrarily accurately approximated by a GHE distribution.
With the above considerations in mind, we implemented the following (seemingly naïve) heuristic model selection method.We generate various random weight vectors α (equipped with a random size p ∈ N).Then, for each of them, we optimize the likelihood function over β using only step (a) above.The best model is then chosen by comparing the various combinations of α and p relying on the Akaike Information Criterion (AIC).The AIC encompasses both the log-likelihood and, in order to avoid overfitting, a penalty for the number of parameters [12].Because step (a) can be performed highly efficiently, a main advantage of this heuristic is that it works very fast, even for bigger values of p, thus providing us with a technique to fit general continuous distributions.Recall from [8] that if we order the components of the weight vector (i.e., α 1 > α 2 > . . .> α p ), then a sufficient condition for (α, β) to yield a cdf is that α 1 > 0 and p i=1 α i β i > 0; we use this principle to select feasible solutions produced by the above heuristic.
Figures 5a-5c illustrate the estimated survival function of using the HE MLE with varying weights for three examples of patience-level distributions.In the first example the patience-level distribution is indeed HE with p = 4, whereas the second and third example are misspecified (corresponding to a lognormal and Gamma patience-level distribution, respectively).For each distribution the MLE was computed using p = 1, 2, and 4 (where p = 3 has been left out because it is barely distinguishable from p = 4).
• As was the case in the setting of Figure 4, the experiments corresponding to the HE patiencelevel distribution show that, even for p = 4 and as many as n = 10 000 waiting-time observations, the MLE does not accurately capture the true parameters.Nevertheless, the fit in terms of the cdf, as displayed in Figure 5a, is remarkably good.We in addition performed the GHE fitting heuristic, which also provide a highly accurate fit and is considerably faster to compute than the MLE for p = 4. • The case of lognormal patience, as illustrated by Figure 5b, shows that the fit is quite good for all estimators except for p = 1 (i.e., an exponential distribution).• Figure 5c presents the fitted distributions for Gamma patience.The fit of the HE MLE is decent, but, importantly, in this case the GHE heuristic performs considerably better than the HE MLE (for p = 1, 2, 4).The GHE heuristic selects a model with p = 10 weights.
The above observations are replicated for a multi-server system with s = 10.Figures 6a and 6b plot the true and estimated patience distribution for lognormal and gamma distributions, respectively, using the misspecified hyperexponential MLE.As in the single-server case, the GHE approximation yields an accurate fit.

Applications
In this section we discuss a series of applications in which our estimation methodology can be directly used.5.1.Unknown proportion of impatient customers.In this subsection we consider a system where an unknown proportion θ ∈ (0, 1) of the arriving customers has a known deterministic patience threshold w > 0. The other customers are patient and always join the queue.Suppose one wishes to estimate the fraction θ from data.
To this end, first observe that Assumption (A4) is not satisfied because H θ (x) is not continuous (in x) at x = w.However, with respect to θ we do have that H θ (x) is continuous.In addition, ℓ n (θ) is concave, and therefore the MLE is given by the first order condition ℓ ′ n ( θ n ) = 0.In this case we can apply results from [31] to establish strong consistency.In particular, the smoothness of ℓ ′ n (θ) enables direct verification of the sufficient condition (21).Asymptotic normality then follows by verifying the conditions of Theorem 2 directly.5.2.Noisy delay messages.Suppose that the patience threshold is a constant θ but the customers do not observe their exact waiting time but rather some noisy estimate W e = e(W ) for some random function e such that E[W e | W ] = W .We assume customers join based on this 'perturbed delay' W e , i.e., if W e θ.Suppose we are in the context that the parameters underlying the noise distribution are known, but the threshold θ is not.
The probability of joining at virtual waiting time level W can be computed given the specific noise distribution.A few examples are: • Additive perturbations.In this case W e = W + ǫ.One could for instance consider normally distributed perturbations: ǫ ∼ N(0, σ 2 ), independent of W , with σ > 0. Let customers facing W e < θ join the system.We thus have, with Φ(•) the cdf of the standard normal distribution, The asymptotic properties of Section 3.1 hold in this case because Φ(•) satisfies the regularity conditions.• Multiplicative perturbations.Now W e = W G, with G non-negative unit-mean, and independent of W .In this case Now if the random variable G is such that the regularity conditions of Section 3.1 are met, then it follows that the asymptotic properties hold in this case as well (with an asymptotic variance that can be expressed in terms of G).

Admission pricing.
Our estimation procedure can be exploited in the context of various problems rooted in operations research.Evidently, when having estimates of the arrival rate and the patience-level distribution, one could consider the option of increasing the service rate so as to potentially raise profits.Thus, the operational decision to be made is whether the increased revenues outweigh the cost of speeding up the service rate.In this subsection we consider another operations research related problem.Suppose that the queue has an admission price of p and that customers are homogenous with a utility function as featuring in the Naor model [45].More concretely, let there be constants r and c such that a customer will join the queue only if the virtual waiting time w upon their arrival satisfies r − p − cw 0, or, alternatively, w is smaller than the threshold value (r−p)/c.Now observe that for any fixed price p the threshold θ(p) = (r − p)/c can be estimated using the MLE procedure presented in Section 3.2.If one of the cost function parameters r and c is known, then the other parameter can be estimated directly.If both are unknown, then their estimation can be performed by an exploration procedure: set two prices, say p 1 and p 2 , and observe the system for each price.Supposing each of these two experiments is done with n clients, we let θ n (p i ) be the MLE for price i = 1, 2. Then the estimators for the cost function parameters r and c are given by, respectively, Theorem 3, in combination with the continuous mapping theorem, implies that, as n → ∞, both r n → as r and c n → as c.Furthermore, for large n confidence intervals for the stationary average revenue per unit of time can be approximated using (30).In particular, observing that the loss probability depends on the price, in self-evident notation, as n → ∞.This opens the possibility of approximating, for n large, the (joint) distribution of n ( r n − r 0 , c n − c 0 )).Note that the loss probability P ℓ needs to be estimated as well.For the single-server case this can be done by estimating the idle probability upon arrival for a given price and using (5).

Concluding remarks
This paper has considered a service system in which clients potentially balk based on the virtual waiting time level they face at arrival.The main objective concerned the development of a framework for estimating the arrival rate and patience-level distribution.Our approach resolves the complication that in our setup only non-balking clients are observed.Distinguishing between the case of a continuous patience-level distribution and constant patience, we developed MLE estimators and quantified their asymptotic properties.Through a sequence of examples and ramifications we have illustrated the performance and broad applicability of our findings.
An important next step could concern the extension to a non-stationary arrival process.For example, one may assume that the potential arrival process is a non-homogeneous Poisson process with an arrival rate function that depends on time, i.e., {λ(t) : t 0}.This means that, when the virtual waiting time is v, the effective arrival rate at time t is given by λ(t)(1 − H(v)), and that the likelihood function in Section 2 can be updated accordingly.If the arrival rate function is known, then our estimation procedure for θ essentially carries over, including its performance guarantees.If the arrival rate function is unknown and parametric assumptions about it are made, then the joint estimation of the arrival rate and patience parameters is possible.Furthermore, in the practically relevant case that the time-dependent arrival rate is cyclic (i.e., λ(t) = λ(t + s) for all t 0 and some cycle length s 0; think of daily or weekly patterns), then the queueing process will still have regenerative dynamics which can be exploited for asymptotic analysis.It is thus anticipated that the framework laid out in this paper can serve as a basis for developing estimation techniques for a wide class of non-stationary models.
While we have developed estimation procedures assuming the independence between the customers' service requirements and patience levels, one could consider an extension that allows dependence.Such an extension could, for example, model the situation in which customers with larger service requirements can be assumed to have more patience.Multi-class queueing models can be used to capture such dependence in service requirements and patience levels [53].More specifically, suppose that there are two customer classes 1 and 2, where class k (k = 1, 2) customers arrive with rate λ k and have service requirements (resp.patience levels), i.i.d.according to a class-dependent cdf G k (•) (resp.H k (•)).It is natural to assume that the customer class is unobservable by the estimator, thus yielding a sequence of dependent service requirements and patience levels.In this setting, we have to estimate the service-requirement cdf G k (•) jointly with the patience-level cdf H k (•) (and the arrival rate λ k ) to compute the likelihood function corresponding to an observed sequence (A, W , X).
Another possible direction for future work concerns the situation in which each customer decides to balk based on her sojourn time, i.e., the virtual waiting time seen by this customer just after (instead of before) her arrival.If the customer precisely knows her service requirement, this problem is easily reduced to the one considered in this paper.The other obvious option is that she balks if the virtual waiting time just before her arrival increased by a 'guess' of her service requirement exceeds her patience threshold.
An alternative to the approach followed in the paper, would be to pursue non-parametric estimation; cf. the results in [28] for the conventional case without balking.In the single-server case it may be of help that we can write, with W (x) and D(x) denoting the cdf s of the waiting times and sojourn times of non-balking customers respectively, if w(x) > 0 and 0 else; the validity of (38) follows from combining (4) with a level-crossing identity.
While it is clear how to estimate W (x) and D(x) using the empirical distribution, estimation of w(x) is more challenging; kernel-based techniques may turn out useful in this context.A final issue to consider is that in many systems customers do not observe exact waiting times, but rather queue lengths.Specifically, this can be modelled as an M/G/s + H system with customers that join or balk after having observed the number of customers in the system.In this case the patience level Y is a discrete random variable indicating at what queue length a customer is willing to join.Therefore, the inter-arrival process at any given queue length q 0 is a Poisson process with rate λ q = λ Hθ (q).As was done in this paper, and MLE for the arrival rates (and the corresponding parameter θ) can be derived from a sample of inter-arrival times and the respective queue lengths.where the first term on the right-hand side is always finite.We thus provide a proof of E[− log(H inf (W 0 )) | W 0 > 0] < ∞ below.Regardless of whether h sup < ∞ or h sup = ∞, it follows from (A3) that for any ǫ ∈ (0, ∞), there exist 0 < y ⋆ 1 < h sup and 0 < y ⋆ 2 < h sup such that e −f (y) H inf (y) < c 1 + ǫ, y y ⋆ 1 , 0 < f (y)H θ 0 (y) < c 2 + ǫ, y y ⋆ 2 .
Using these inequalities, with y ⋆ := max{y which is equivalent to (27).
Proof of Theorem 3.This proof consists of a lower bound and an upper bound.⊲ Lower bound.Note that for θ 0 > W k−1 , we have where (E λ,k ) k=1,2,... denotes a sequence of i.i.d.exponentially distributed random variables with parameter λ.Let N + n ⊂ {1, 2, . . ., n} denote the set of indices of observed virtual waiting times such that N + n = {k ∈ {1, 2, . . ., n} : W k−1 + X k−1 θ 0 }.We see from ( 13) and ( 40) that given k ∈ N + n , the observed virtual waiting time W k is stochastically identical to max{0, θ 0 − A ′ k }, where (A ′ k ) k=0,1,... denotes a sequence of i.i.d.random variables that are exponentially distributed with parameter λ.Therefore, (W k ) k∈N + n are i.i.d. with P(W k x | k ∈ N + n ) = e −λ(θ 0 −x) , x ∈ [0, θ 0 ], (41) so that P max where |N + n | denotes the number of elements in N + n .Obviously, we have θ 0 θ n and θ n max Let, as before, v(•) and w(•) denote the pdf s of the stationary virtual waiting time and the stationary waiting time of non-balking customers, and let P ℓ denote the stationary loss probability.From the ergodicity, we have as n → ∞, where the second equality follows from (3) and (4).Because (44) implies |N + n | → as ∞, we have for any ǫ > 0, using ( 42) and ( 43 as n → ∞.We therefore have that θ n → P θ 0 as n → ∞.Convergence in probability implies that there exists a subsequence ( θ nm ) ∞ m=1 such that θ nm → as θ 0 as m → ∞, and as θ n is a monotone non-decreasing sequence we conclude (29), i.e., the MLE is strongly consistent.

Figure 5 .
Figure 5. Tail distribution of the customer patience, compared with its estimated counterparts.The true patience distributions H θ 0 (t) are hyperexponential, lognormal and Gamma.The fitted distributions H θ (p) n (t) are based on a sample of n = 10 000 observations with the HE MLE for p ∈ {1, 2, 4}, and the GHE heuristic with p = 10.The arrival rate is λ = 1, and the service requirements follow a Gamma distribution with parameters (3, 2).

Figure 6 .
Figure 6.Tail distribution of the customer patience, compared with its estimated counterparts.The true patience distributions H θ 0 are lognormal and gamma.The fitted distributions H θ (p) n (t) are based on a sample of n = 30 000 observations with the HE MLE for p ∈ {1, 3}, and the GHE heuristic.The system has s = 10 has servers with an arrival rate of λ = 10, and the service requirements follow a Gamma distribution with parameters (3, 2).