Using infinite server queues with partial information for occupancy prediction

Motivated by demand prediction for the custodial prison population in England and Wales, this paper describes an approach to the study of service systems using infinite server queues, where the system has non-empty initial state and the elapsed time of individuals initially present is not known. By separating the population into initial content and new arrivals, we can apply several techniques either separately or jointly to those sub-populations, to enable both short-term queue length predictions and longer-term considerations such as managing congestion and analysing the impact of potential interventions. The focus in the paper is the transient behaviour of the $M_t/G/\infty$ queue with a non-homogeneous Poisson arrival process and our analysis considers various possible simplifications, including approximation. We illustrate the approach in that domain using publicly available data in a Bayesian framework to perform model inference.


Introduction
This work is motivated by the problem of predicting short and longer-term implications of policy changes on the custodial elements of the prison system in England and Wales. The model described here was developed following consultation with the Ministry of Justice (MoJ) to add to their methods of forecasting the prison population, to help analyse the implications of changing external factors accounting for the prison population such as government guidelines and sentencing policies, and to consider the uncertainty involved in the model and its predictions.
The nature of the prison system is such that arrivals can't be turned away, hence infinite server queueing models are directly applicable as they support the assumption of no queueing delay for service. While such models have been widely used in modelling service systems, including call centres (Ibrahim et al., 2016), hospital staffing (Pender, 2016), software reliability (Huang & Huang, 2008) and insurance claims (Cheung et al., 2019), the assumptions relevant to our scenario lead us to consider some less well-known results from the queueing systems literature and discuss how they can be useful in this setting.
In this domain, matters of capacity management and overcrowding at individual institutions have to be handled by adjustments involving medium term system-wide considerations such as sentencing patterns, parole guidelines and the use of community service arrangements. This intent to support policy makers leads to a focus on the strategic level of decision support (Grieco et al., 2021;Hulshof et al., 2012), and to some considerations on the development of models expressed in terms that are interpretable by policy makers and can enable "what if?" studies, including quantification of the impact of prospective policy change (Bravo et al., 2019;Dong & Perry, 2020;Kegel et al., 2017;Petris et al., 2009;Tuominen et al., 2022).
The value of decision support tools that can analyse the impact of interventions linked to policy change has been demonstrated in care pathways in health services (Demir & Southern, 2017). While modelling flows through a prison system has not been widely studied, there has been some work using queueing models to study the relationship between policy changes and prison occupancy. For example, Usta and Wein (2015) used a queueing network model of the jail system and a simulation approach to study the optimal mix of pretrial release and forms of sentencing to minimise the amount of recidivism, subject to constraints on the available prison occupancy; and Master et al. (2018) used a M=M=c=c queueing system to assess performance of alternative pretrial release policies, and of sentences with a split of custodial and supervised outcomes.
Using public data, we study a single phase of a prisoner's journey through the prison system, as it is well out of scope to model the full system. Our analysis is both informed by, and limited by, the availability of service system data: the size of the prison population is collected and recorded on a monthly basis, but the time served and the remaining length of stay (service times) of those individuals are not. Indeed for custodial sentence admissions, even within offence types (the 12 main offence groups are listed in Table 1), sentence lengths differ, and there is a difference between a court imposed sentence length and the actual service time, so the model cannot assume remaining service times for those individuals present at a given time, even if their formal sentence lengths are known. This contrasts with, for example, work on bed demand in an intensive care unit that considered existing patients as well as arrivals, but could use knowledge of the length of stay of existing patients (Pagel et al., 2017).
Queueing model analyses typically assume the system begins in an empty state (Li et al., 2019) and then, assuming a Poisson arrival process, the well-known results of Eick et al. (1993) mean that the queue length exhibits a Poisson distribution with a mean derivable from properties of the service time distribution. The scenario motivating the work in this paper involves the less well-studied situation where there are n > 0 individuals initially present and where the elapsed time at t ¼ 0 of each individual is not known (Goldberg & Whitt, 2008;Korhan Aras et al., 2017;Weber, 2005). In this case, the departure process is no longer Poisson, but the queue dynamics can be analysed as a combination involving those already in service at t ¼ 0 and those who subsequently arrive, and the departure process is the superposition of a binomial and a Poisson process (Weber, 2005). This separation into initial content and new input allow these sub-populations to be analysed jointly or separately.
As pointed out by Pagel et al. (2017), in contrast to much research involving applications of queueing systems that rely on steady state distributions, when considering the use of models for informing short term consequences associated with operational decision making, results involving transient distributions of queueing systems become relevant.
As highlighted in a recent review of major healthcare applications of infinite server queuing models (Worthington et al., 2020), time-inhomogeneous infinite server models have been used both for predictive modelling (e.g., ward capacity planning (Bekker & de Bruin, 2010)) and for investigating the impact of policy changes (e.g., the introduction of specialised treatment centres for specified illnesses (Utley et al., 2008)). Our ambition to support the analysis of both short and longer-term policy change means we also consider timevarying arrivals.
Another key choice in formulating a queueing theory model is the assumptions regarding service time. The available data suggest a heavy tailed service time (Ministry of Justice, 2020b). Hence, we are led to the M t =G=1 model as it is well recognised that there is an effect on the performance of the queue of the tail of the distribution as it becomes less exponential (Goldberg & Whitt, 2008). To employ our model as a prediction tool, we must also consider the uncertainty involved as we are considering behaviour beyond the sample of the original data set used to estimate the model parameters (Gans et al., 2003). Incorporating parameter uncertainty as part of model inference is an important step to avoid overconfidence in our results (Aktekin & Soyer, 2011). We consider parameter uncertainty involved in the prediction of the system size using estimates from aggregate historical data. Techniques typically rely on summary observable data such as queue lengths, visit counts and response times, but perform inference in different ways (Spinner et al., 2015). A common approach is by Jongbloed and Koole (2001) using a Poisson mixture model and a recent survey on forecasting of the arrival process is by Ibrahim et al. (2016). We employ a Bayesian framework to perform model inference and prediction (Aktekin & Soyer, 2011;Xie et al., 2014). The Bayesian framework allows us to incorporate expert knowledge about the system into prior distributions of model parameters, which is particularly important since we have limited historical data available to us (O'Hagan, 1998).
In summary, we study the transient behaviour of the time-varying infinite server queue, M t =G=1, fed by a non-homogeneous Poisson arrival process whose occupancy is observed at discrete points in time, but the time in service to that point is not known. The contributions of this paper are: (i) the novel synthesis of results from several authors about transient and stationary behaviour of the M t =G=1 queue; and (ii) application of the approach, using Bayesian inference, to the real-world domain of prison occupancy-a domain that has not been well-studied in the literature.
The structure of this paper is as follows: in Section 2 we describe the motivating application; Section 3 outlines relevant results from the literature for analysing an observed M t =G=1 queue; Section 4 presents a Bayesian framework for the estimation of the model parameters using domain data to illustrate the use of the model to both short-term and longer-term predictions, and includes a discussion of the mathematical assumptions of the model. In Section 5 we provide concluding remarks, with comments comparing the presented queueing theoretic approach with time-series based estimation methods. The Appendices contain further information about the prison system (Appendix A), use of the theory (Appendix B), and some examples of how analysis with our model compares with using a time-series based ARIMA model (Appendix C).

Motivating application: Prison occupancy
The study in this paper was produced following consultation with the Ministry of Justice (MoJ). We briefly describe the custodial elements of the prison system, that is, those that require accommodation, with more details in Appendix A. Data on the prison population is collected and managed by MoJ and the HM Prison and Probation Service (HMPPS). Statistics are regularly released as well as projections of the prison population (Ministry of Justice, 2018).
Attributes of the application domain that guided our modelling choices include: a large system of multiclass arrivals with a high offered load, no abandonments, input parameters that are subject to change and operation over a long time scale. Further, the domain data suggests the use of a nonexponential service time distribution and that a stationarity assumption is reasonable over short time frames, which allowed us to take advantage that several questions of interest have more tractable solutions under the assumptions of a stationary model.
Factors such as the conviction rate (the proportion of those arriving to court that are convicted and sentenced) and the custody rate (the proportion of those sentenced that are given custodial sentences) influence the sentenced population. Hence the size and composition of the prison population is subject to policy and legislation changes, for example, changes in Home Office (government department) resources that can affect charge rates and modifications to the sentencing guidelines (MoJ, 2019). Patterns in the published data illustrate how policy and legislative changes have had subsequent impacts on prison occupancy. Hence, from a policy maker's perspective, being able to adjust model parameters to allow, for example, for a more serious mix of offence groups coming before the courts reflects the importance of reviewing model parameters over time.
Of course, describing the dynamics of the prison system is beyond the scope of this paper, but from a modelling perspective it requires only some reasonable assumptions to treat the flow of prisoners as an M t =G=1 queue (Schwarz et al., 2016). Figure 1 displays our model of a simplified prisoner journey with the prison population divided into three main holding phases (displayed percentages are as of June 2019) (MoJ, 2019): (i) on remand (11%), (ii) sentenced prisoners (79%) and (iii) on recall (9%). Prisoners within the licence phase are in the community. A broader view of the prison system and its constraints would include, for example, the number of offenders on probation, staffing resources required for supervision, demands on the courts and parole hearing frequencies (Crowhurst & Harwich, 2016;Ministry of Justice, 2016, 2018.

An observed M t =G=' queue
Motivated by obtaining a prediction of the population given partial information, we describe results applicable to an infinite server system with Poisson arrivals and general service times observed at time s > 0, under the assumption that we have no information on when the n individuals present at this time each began their service, namely an observed M t =G=1 queue. We define the infinite server queue in Section 3.1 and then present results for the conditional distribution for the observed queue in Section 3.2. As the n individuals initially observed at time s complete service, this sub-population will go to zero as t ! 1: Arrivals after time s will occur according to the Poisson dynamics described in Section 3.1. These results are used in our empirical study in Section 4.

The M t =G=' queue
The M t =G=1 queue is a service system in which individuals arrive according to a non-homogeneous Poisson process with rate function kðtÞ, for À1 < t < 1 and where the service times are independent and identically distributed (i.i.d.) and independent of the arrival process (Eick et al., 1993). Let S be the service time and denote by G its cumulative distribution function (cdf) and g its density. Assume E½S < 1, kðtÞ is bounded and integrable and define the associated random variable S e , the stationary excess of the service time, with cdf G e ðtÞ ¼ PðS e tÞ for t ! 0, where G c ðuÞ ¼ 1 À GðuÞ: Let Q(t) represent the number of busy servers at time t and let mðtÞ ¼ E½QðtÞ: Theorem 3.1. (Eick et al., 1993, Theorem 1) For an M t =G=1 queue that was initially empty at t ¼ -1, for each t, Q(t) has a Poisson distribution with mean The departure process is a Poisson process with time dependent rate function k À ðtÞ, where For each t, Q(t) is independent of the departure process in the interval ðÀ1, t: The impact of the service time beyond its mean can be seen from For kðtÞ ¼ k, the approach to the steady state is given by where the steady state has the insensitivity property, as mð1Þ ¼ kE½S: Similarly, the transient behaviour of a stationary model that has been terminated, that is, for t < 0, kðtÞ ¼ k and zero otherwise, is mðtÞ ¼ kG c e ðtÞE½S: In an empty system, the approach to steady state is seen from Equation (4). For a non-empty system observed at time s, we note a result by Mandjes and Uraniewski (2011) who analysed the approach to steady state identifying a function that behaves as the difference between the transient and stationary distributions in the limit, and of relevance in the following Section 3.2.
Example 3.2. We use the Pareto distribution as it exhibits the heavy tailed non-exponential survival times observed in the prison domain, and we draw upon this later. For G $ Paðh, aÞ with shift parameter h > 0 and shape parameter a > 0, for x ! 0, G c ðxÞ ¼ h a ðx þ hÞ Àa : The high variability of G is indicated by the fact that the tail decays as a power instead of exponentially. For a > 1, E½S ¼ hða À 1Þ À1 , otherwise if a 1 then S has infinite mean and G c ðxÞ is not a directly integrable function. For a > 2, Var½S ¼ h 2 aða À 1Þ À2 ða À 2Þ À1 , for 1 < a 2 the variance is infinite, and for a otherwise the variance is undefined. For a > 1, G c e ðxÞ ¼ h aÀ1 ðx þ hÞ 1Àa : If a > 2 then the SCV is c 2 s ¼ aða À 2Þ À1 , then E½S e ¼ hða À 2Þ À1 :

Conditional distribution
Denote by s the observation time, by s þ d the prediction time, and define the vector of the past arrival intensity,k ¼ fkðtÞ : 0 t sg: For a stochastic process f(t) cut at an observation time s, define the past processf ðtÞ ¼ f ðtÞ if t < s and f ðtÞ ¼ 0 if t ! s, and define the future process as f ðtÞ ¼ 0, if t < s and f ðtÞ ¼ f ðtÞ if t ! s: The past process can be thought of as an M t =G=1 process in which arrivals are terminated at time s, so we are able to draw on results by Goldberg and Whitt (2008) who, motivated by a model of a two-stage item inspection process, studied the distribution of the last departure time from a queue of a terminating arrival process.
For the analysis of the observed queue, we separate the contributions of the sub-populations of those observed at time s and those arriving later. It is useful to consider the regions describing arrival and service pairs as depicted in Figure 2, where for each individual arriving to the system, a point is placed at (u, v), with the u-axis recording arrival times and v-axis recording service times. For example, Region (1) corresponds to individuals arriving and departing by s, and the region A ð2[3Þ ¼ fðu, vÞj u s, u þ v ! sg corresponds to individuals arriving on or before time s. Denote by N(A) the number of arrival-service pairs in a region A. As per Theorem 3.1, the Poisson arrivals and general service times generate a Poisson process on the plane with the intensity of a point occurring at (u, v) being ðkðuÞ, gðvÞÞ: As independent splitting of Poisson processes results in Poisson processes, the numbers of points in two disjoint regions are independent Poisson random variables. For example, consider A ð3Þ ¼ fðu, vÞj u s, u þ v ! s þ dg, corresponding to individuals arriving by s and present at s þ d: Then the proportion of individuals present at time s whose remaining service time from that point is at least d, is clearly NðA ð3Þ Þ=NðA ð2[3Þ Þ: This geometric depiction is captured in the following result.
where mean s is given by It directly follows that if kðtÞ ¼ k for t ! 0, and kðtÞ ¼ 0 for t < 0, then s ¼ kE½SG e ðsÞ and G c s ðxÞ ¼ G e ðs þ xÞ À G e ðxÞ G e ðsÞ : The above result provides an expression for the remaining service time distribution in Equation (5), which is required to calculate the conditional distribution in the following result.
Theorem 3.4. (Weber, 2005, Theorem 6) The random variable Qðs þ djsÞ with QðsÞ ¼ n can be expressed as where p s ðdÞ ¼ G c s ðdÞ is given in Equation (5), and PoðÁÞ and BiðÁ, ÁÞ are the Poisson and Binomial random variables, respectively.
The process Qðs þ dÞ can be written as Qðs þ dÞ ¼Qðs þ dÞ þ Qðs þ dÞ, whereQðtÞ has arrival ratekðtÞ and QðtÞ has arrival rate kðtÞ: As constructed above, Qðs þ djsÞ will be the number of points in regions ð3 [ 5Þ of Figure 2. NðA ð5Þ Þ has Poisson distribution with mean given in Theorem 3.1 and the result for Qðs þ djsÞ is given by Equation (9). The distribution of NðA ð3Þ Þ is binomial with n trials and parameter p s ðdÞ given in Equation (5).
Depending on the size of d relative to s, either existing or new arrivals can dominate the prediction. The calculations can reveal the contribution of each component to future demand requirements. For some quantities of interest the two sub-processes QðtÞ ¼ ðQðtÞ, QðtÞÞ, from the expression in Equation (8) can be studied separately. The results extend easily to multiple classes, i, fQ i ðsÞ ¼ n i g as each class is treated independently.
Proposition 3.5. The conditional distribution is with mean E Qðs þ dÞjQðsÞ ¼ n ½ ¼ np s ðdÞ þ mðs þ dÞ, and variance Var Qðs þ dÞjQðsÞ ¼ nÞ The departure process of the observed M t =G=1 queue at time s þ d will be the superposition of a Poisson process with intensity k À ðs þ dÞ ¼ Ð d 0 kðs þ d À uÞgðuÞdu and a binomial process with parameters n and 1 À p s ðdÞ (Weber, 2005). A consequence of which is that the Poisson-in-Poisson-out feature of the unobserved M t =G=1 system does not hold. Under a high load, the binomial process will be the superposition of several point processes and will be approximately Poisson: for large n, Thus considering the system began with an initial population drawn from a Poisson distribution with mean n and we employ this approximation in Section 4.

Last departure time
The behaviour of the decay of the initial population QðtÞ can be described by results of Goldberg and Whitt (2008). Suppose n individuals are observed at time s and consider the time of the last departure. Define M n ¼ maxfX 1 s , :::, X n s g, where each X i s is distributed according to Equation (5), then P M n x ½ ¼ G s ðxÞ n : Now consider the case where the arrival process is terminated at time c, but where an observation is not made. Following Goldberg and Whitt (2008), let D be the last departure time and define T ¼ ðD À cÞ þ , the remaining time after c until the last departure. To determine the distribution of T, let N ¼ N c be a random variable with a Poisson distribution having mean c , then TddM N : For any random variable Y with continuous cdf, let its quantile function be q Y q Y ðxÞ such that the number P½T q Y ðxÞ ¼ x: Write f ðxÞ $ gðxÞ as x ! 1 when f ðxÞ=gðxÞ ! 1 as x ! 1: Then as x ! 1, P½T > x $ c G c c ðxÞ, and for e À c < x < 1, The moments of which can be calculated by E½T k ¼ Ð 1 0 kx kÀ1 E½T > xdx: Example 3.7. For G $ Paðh, a), and with kðtÞ ¼ k for t 2 ðÀ1, s, with In Figure 9 for fixed E½S ¼ 3, we illustrate varying values ðk, c 2 s Þ: The effect of increasing the SCV is seen in the quantiles of the last departure time T. These results can be employed in the application domain to study the decreasing content of a prison population with no new arrivals.
In this section, we present a method to generate predictions that consider the parameter uncertainty using Bayesian inference, considering queue length data from March 2015 to March 2019 on the Adult/Male sentenced population (MoJ, 2020b, Table 1.2b: Sentenced to immediate custody). We refer to the MoJ documentation for details on counting processes, but we interpret this as the sentenced population and only consider data from 2015 due to changes in the reporting processes.
We aim to provide a demonstration of the estimation approach for the model parameters using simulated monthly counts generated from published quarterly data, as the data is not sufficient to perform Bayesian inference about the system and model parameters. To generate monthly counts we: (i) divide the quarterly counts to obtain a monthly mean count; (ii) fit a smoothing spline to produce monthly predictions; (iii) add noise equal to the residual standard error from a linear model fit.

Bayesian analysis of the sentenced population
To specify the model, we adopt a linear arrival rate of the form kðtÞ ¼ b 0 þ b 1 t and assume that the service time distribution for each offence group is Pareto, and can be adequately estimated from historical data (over a much longer time interval than the prediction lead time). Using public data (MoJ, 2020a, July 2018: June 2019) for the mean sentence length for each offence group, we specify that the service time is 50% of the sentence length for this demonstration. For a large system, define the mean function Mðs þ dÞ ¼ E½Qðs þ dÞ using Equation (11), and Pareto ðh, aÞ distribution as in Example 3.2, we have We specify prior distributions for arrival rate function and service distribution time parameters fb 0 , b 1 , a, hg, where arrival data is used to specify informative priors for arrival rate function. We define a Bayesian model for the monthly numbers of arrivals with non-informative priors. Denote by Q a ðs þ dÞ À Q a ðsÞ the number of individuals that arrive in ½s, s þ d) and is Poisson distributed with mean Ð sþd s ðb 0 þ b 1 uÞdu: We extract the posterior samples of b 0 and b 1 . Since both posterior densities are bell-shaped, we define normal priors for b 0 and b 1 centred on the mean of posterior samples and standard deviation equal to the scaled posterior sample standard deviation for the model. We use a scaled posterior sample standard deviation in order to fully explore the parameter space during Markov chain Monte Carlo (MCMC). We specify a uniform prior for a $ Uniform½2:5, 10, where the lower bound avoids infinite variance and the upper bound ensures good convergence and mixing of the Markov chains. In practice, we observe that increasing the upper bound of a leads to non-identifiability. We use RStan to perform full Bayesian statistical inference by adopting the Hamiltonian Monte Carlo (HMC) and no-U-turn samplers (NUTS) (Carpenter et al., 2017).
We demonstrate long-term (k-step ahead prediction) and short-term (one-step ahead prediction) forecasts. The long-term forecasts are useful in informing long-term planning decisions in relation to the prison system capacity, whereas short-term forecasts could provide the insight needed for the day-to-day prison system operation. To generate Summary motoring predictions about the system behaviour, we perform the following steps: (1) Sample ðb ðkÞ 0 , b ðkÞ 1 , a ðkÞ , h ðkÞ Þ from the posterior distribution ½b 0 , b 1 , a, hjn s using RStan, where n s ¼ ðn 1 , :::, n s Þ with n i , i ¼ 1, :::, s being the number of individuals present in the system at time i.

Predictions and simulation results
Example 4.1. We demonstrate the Bayesian model specification to predict the number of prisoners in the Theft offence group where E½S ¼ 5:22 months. From the arrival count data, we obtain the posterior sample for b 0 with mean 1376.5 and standard deviation 9.7. Similarly, the mean and standard deviation of posterior sample for b 1 are -11.5 and 0.3 respectively. We then specify the priors: b 0 $ Normalð1376:5, 10 Â 9:7Þ, b 1 $ NormalðÀ11:5, 10 Â 0:3Þ and a $ Uniform½2:5, 10, with h ¼ 5:22ða À 1Þ: We set two Markov chains with 10,000 iterations for each chain (including warmup).
In Figure 3, we produce the density plots of marginal posterior distributions for model parameters. We observe that the posterior sample density of b 0 is bell shaped and centered around 677.4 and the standard deviation of posterior samples is 17.76. Similarly, the values of b 1 is centred around À3.77 with posterior standard deviation 0.54. We observe that under our model specification the number of arrivals in Theft Offence group gradually decreases over time. The distributions of posterior samples for h and a are less informative, since we included the mean of service time in our prior specification.
In Figure 4 we illustrate long and short term predictions from August 2019 to March 2020. We note that the short-term projections are more computationally expensive as in order to obtain a new observation, we update the posterior distribution by re-running a Stan programme. The left panel plot in Figure 4 shows the long-term forecast (for 8 months). The number of offenders in Theft offence category is expected to decrease over time. Our predictions slightly underestimate the true values, however the true values lie within two standard deviation prediction interval. We observe that uncertainty about our projections increases with time. The right panel plot in Figure  4 shows the short-term forecasts. The predictions are closer to the true values together with the slightly lower uncertainty about our predictions. To access the performance of the proposed model, we use the Root Mean Square Error (RMSE): , where a lower value indicates a better model performance.
The RMSE for long-term predictions is 53.08, whereas for short-term predictions is 20.9.
Example 4.2. Adopting a similar approach as in Example 4.1, we consider the number of offenders for two further offence groups: Sexual offences and Public order offences. The predictions for Public order offences are illustrated in Figure 5. For Sexual offences the RMSE for long-term forecast is 7.16,  whereas for short-term forecast is 5.10. For Public order offences: the RMSE for long-term forecast is 2.97, whereas for short-term forecast is 1.89.
In the next example we reuse the posterior samples of model parameters derived in Examples 4.1 and 4.2 to demonstrate how the model can provide insight into policy modifications.  Figures 6, 7 and 8 we simulate the effect of increasing and decreasing the mean service time on the offence group population for theft offences, sexual offences and public order offences, respectively, and where the solid lines are the predictors and the dashed lines represent two standard deviation prediction intervals. We remark that to produce these results we reuse samples from the posterior distributions of b 0 , b 1 and a. The new values of h are given by h ¼ E½S ða À 1Þ:

Discussion
In the previous section we illustrated how to use the Bayesian model for short and longer-term predictions, and to provide insight into the implications of possible policy modifications. In Appendix B, drawing on other theoretical results, we briefly describe how modification of sentencing and custody rates can be seen as a form of intervention to enable congestion event recovery. Empirical demonstration of the value of such analysis is beyond the scope of the paper, as this would require data not available to us.
With regard to the predictions in Section 4.2, we note that time-series forecasting methods such as ARIMA models (Shumway et al., 2000) can predict the future prison population by describing the autocorrelation in the data. A direct comparison, using the data from Section 4.1, is presented in Appendix C, and in Section 5 we provide some comments on the different approaches to forecasting.
We now briefly discuss the impact of the assumptions of the mathematical model described in Section 3. It is assumed that the time served by individuals at the observation point t ¼ 0 is unknown, but if the elapsed service times fy i : i ¼ 1, :::, ng of the observed populationQðtÞ are recorded, it can be more effective to use this information to make predictions, while carrying a greater cost (Duffield & Whitt, 1997). In contrast to Equation (5), the conditional remaining service time ccdf for elapsed service time x is H c x ðtÞ ¼ G c ðt þ xÞ=G c ðxÞ: As the conditional remaining service times are no longer identically distributed, the complication of the distributionQðtÞ increases and the resulting process is a Markov process (Korhan Aras et al., 2017), although the estimates for the mean number remaining at time t and variance have simple forms: E½QðtÞ ¼ P n i¼1 H c y i ðtÞ, Var½QðtÞ ¼ P n i¼1 H c y i ðtÞH y i ðtÞ: Further, as noted by Whitt (1999), the importance of conditioning upon the time served at an observation point increases as the service-time distribution differs more from an exponential distribution. If G is highly variable, then the elapsed holding time can greatly help in future prediction and a long elapsed holding time tends to imply a very long remaining holding time. Let Yða, hÞ denote the Pareto service time distribution G as defined in Example 3.2 and let Y x ða, hÞ the associated ccdf for elapsed service time x. By the result for H c x ðtÞ above, Duffield and Whitt (1997, Theorem 8) E½Yðh, aÞ, that is, the mean remaining service time E½Y t ða, hÞ is approximately proportional to the elapsed service time t.

Concluding remarks
This work was motivated by the problem of predicting the population of housed inmates within the prison system in England and Wales, where the size of the prison population is recorded on a regular basis, with attention both to short-term predictions to enable local resource planning, and longer-term projections that can provide insight to policy makers regarding the impact of potential policy variations. We studied the transient behaviour of the timevarying infinite server queue, M t =G=1, fed by a non-homogeneous Poisson arrival process whose occupancy is observed at discrete points in time, but the time in service to that point is not known. Drawing on this analysis, and using publicly available data, we built a model that could be used as a decision support tool for custodial elements of the prison system. We illustrated the use of such a model for population prediction and for analysing the implications of changing external factors influencing the prison population such as government guidelines and sentencing policies. The proposed model produces predictions together with uncertainty bands and aligns with the current guidelines on informed decision making in the UK government (Treasury, 2015).
It is beyond our scope to compare the queueing theory approach to generating predictions with multiple forecasting methods, but we note recent work arguing that for time-series methods Liu et al. (2021) to support general scenario analysis requires extracting components known as "features" from the timeseries data, that in turn can be used to generate alternative scenarios (Kegel et al., 2017;Tuominen et al., 2022). In the queueing theory approach, model  parameters have a direct interpretation for the application domain, which straightforwardly enables the study of a variety of public policy initiatives by setting different input values to the model. We demonstrated this in Example 4.3 for changes in the distribution of sentence lengths. As a further example, changes under consideration to the sentencing and release of serious and dangerous sexual and violent offenders (MoJ, 2019), could be studied by changes in the arrival rate (e.g., the custody rate, conviction rate), studies we were unable to undertake, as they require data not available to us. In contrast, time-series methods are not so amenable to what-if style scenario analysis as they offer the possibility of forecasting future observations, but with limited interpretability of the fitted model (Petris et al., 2009).
The contributions of this paper are: (i) the novel synthesis of results from several authors about transient and stationary behaviour of the M t =G=1 queue to enable construction of a model suited to short and longer-term predictions, and to supporting considerations of parameter uncertainty; and (ii) illustration of the approach to potential policy changes in the realworld domain of prison occupancy.
Reflecting the data available for model building, we focused on the situation where the system has non-empty initial state and where the elapsed time of each individual in the system is not known. The dynamics of the queue is a combination of those already in service at some time and those who subsequently arrive, and separation into initial content and new input allowed these sub-populations to be analysed jointly and separately. We drew on results for the transient and stationary distributions of these queueing systems from several authors to enable an analytic approach. Then, using a Bayesian approach with public historical data that allows the inclusion of expert knowledge, we considered parameter uncertainty involved in the prediction of future arrivals and presented a model that maintains interpretability for the domain application. Incorporating other sources of uncertainty into our process Whitt (2002), including model and process uncertainty, and quantifying the contribution from each within our application is a topic of future research.
Further, we note that restoration of the departure process as approximately Poisson also allows the approach to be extended to a network of processes, referred to as a ðM t =G=1Þ N =M model, in which queue length distribution models have time-dependent product form and would be appropriate for models of many service systems , including further development of a model specific to the prison domain.
The approach is potentially applicable to other service systems, but the queueing model properties of interest will vary according to the application context, and the choice of what approach to take will also depend on the available data. Data driven development of a system model is an increasingly popular approach as discussed in Mandelbaum et al. (2019). However, the infrastructure to collect and manage data at multiple phases and timescales is not yet comprehensive, so parameter inference remains a challenging problem because of limited data availability about successive system states, and for model building methods that can take advantage of available but incomplete data are essential. We acknowledge that some of our articulated modelling assumptions may not apply in domains where finegrained staffing implications are of interest, for example, hospitals and call centres.
The availability of future information (via predictive algorithms, machine learning, or local observation) and the resultant novel queueing analysis, has policy design implications, described as "a broader shift from being reactive to proactive" as future information becomes part of the policy maker's toolkit (Spencer et al., 2014;Walton & Xu, 2021).
released from prison before the end of their sentence can be recalled to custody for breach of release conditions. Figure C1 displays a simplified version of prisoner journeys through the custodial system, with the total prison population divided into three main holding phases (displayed percentages are as at June 2019) (MoJ, 2019): (i) on remand (11%), (ii) sentenced prisoners (79%) and (iii) on recall (9%). Prisoners within the licence phase are in the community.
Arrivals to the Remand phase, are recorded in two streams, Untried prisoners and Sentenced prisoners. Individuals remain in remand for some time according to a length of stay distribution G 1 : Upon completion of the Remand phase, the model assumes all Sentenced prisoners proceed to the Sentenced population and a proportion of the Untried prisoners, with the remaining released. An arrival stream from the courts k 2 ðtÞ ! 0 brings individuals directly into the Sentenced population. All departures from the Sentenced population are released into the community, and are all under Licence, for some time, during which they can be Recalled into custody. If an individual is recalled, they re-enter the custodial system and this figure assumes a prisoner can only be recalled once. In practice, this is not the case, although the data available does not provide information on repeatedly recalled prisoners.

Data
Individual prison establishments record details of individuals on a prison IT system and this data is collected as a central database managed by MoJ and the HM Prison and Probation Service (HMPPS) to produce the various analyses of prison population. Statistics are regularly released as well as projections of the prison population (Ministry of Justice, 2016, 2018.
Between 1993 and 2012 the prison population in England and Wales increased by 41,800 prisoners to over 86,000, studies of this growth can point to patterns in the data that align with earlier identifiable policy changes. Such examples illustrate how modelling approaches enabling an understanding of the impact of possible policy changes can be valuable in this domain.
Since 1993 the courts sentenced more offenders each year. Almost all of the increase (98%) took place within two segments of the population -those sentenced to immediate custody (85% of the increase) and those recalled to prison for breaking the conditions of their release (13% of the increase). Three offence groups, violence against the person, drug offences and sexual offences have had a particular impact on the prison population: the number of these offenders increased and the mean length of stay increased.
The proportion of offenders whose sentences were 4 years or more, grew to 54% of the sentenced prison population in 2012, compared to 45% in 1993. The mean sentence length over all offences in 2018 was 17.3 months. For more serious offences, the mean prison sentence was 58.3 months, more than 24 months longer than in 2006. More than two and a half times as many people were sentenced to 10 years or more in 2018 than in 2006.
From 1999 to 2011, the mean time served increased from 8.1 to 9.5 months for those released from determinate sentences. This was likely due to an increase in the mean custodial determinate sentence length handed down by the courts, and a decline in the parole release rate which meant that offenders had served longer by the time they were released. The second largest increase was within the recall population, caused by changes to the law making it easier to recall prisoners, and changes introduced in the Criminal Justice Act 2003 which lengthened the licence period for most offenders.

Appendix B: Steady state recovery from congestion
To illustrate the applicability of further theory to support policy analysis, we present Duffield and Whitt (1997)'s analysis of what is termed congestion events, that is, where an unusually high number of individuals has been observed, and allowing one to investigate the effects of possible interventions (such as change of arrival rate, or pausing arrivals) on recovery from congestion. These results rely on an assumption that the system is in steady state, i.e., kðtÞ ¼ k, which could be motivated by a slowly changing arrival rate, or for scenario analysis. These results follow directly from the results for the model described in Section 3.2 and can be employed to characterise the effect of possible interventions that could be introduced to change the rate of recovery.
For all s, ¼ s ¼ kE½S, mðs þ dÞ ¼ kE½SG e ðdÞ and p s ðdÞ ¼ G c s ðdÞ ¼ G c e ðdÞ and the conditional mean is E Qðs þ dÞ j QðsÞ ¼ n ½ ¼ kE S ½ G e ðdÞ þ nG c e ðdÞ: As Goldberg and Whitt (2008) noted, since s and G c s ðdÞ can be computed, it can be directly determined when the steady state approximation is reasonable. Duffield and Whitt (1997) proposed the steady state rare event, as an approximation for what is seen at a random (steady state) time and also as an approximation for the associated hitting time event. The likelihood of a high congestion event, that is, having a large number N of individuals present, P N ! n ½ ¼ P 1 k¼n P½PoðÞ ¼ k with ¼ kE½S at t ¼ 0: Define a recovery level k, then a system has recovered from a rare congestion event at time 0 when QðtÞ first reaches k with < k < n: To define the recovery time for the mean, let The following result describes how the system recovers from congestion and the way it depends on cdf G. Theorem B.1. (Duffield & Whitt, 1997, Theorem 2) If there is no t such that G c ðt À Þ > 0 ¼ G c ðtÞ, then G c e ðtÞ is continuous and strictly increasing. Then the recovery time for the mean, b in Equation (B2)  This result is obtained by rearranging Equation (B1). It can be seen that the impact of G on depends on whether recovery level k is close to n or close to v. When the recovery level k is close to v, the cdf matters greatly. If the recovery level k, is closer to n than v, then the recovery time v depends more on the initial portion of the cdf G e than upon its tail. Consistent with this observation, the impact of a long-tail cdf on v differs little from the impact of a short-tail cdf with the same mean if k is suitably close to n, but it differs dramatically if k is suitably close to v. However, "suitably close" depends on the decay rate of the ccdf G c e : If the system does experience a congestion event, the recovery can be aided by reducing the arrival rate, or by pausing arrivals until the congestion has cleared. If the arrival rate is reduced to a new constant rate, is equivalent to reducing the mean, as ¼ kE½S, produces a new recovery equation of the form in Equation (B3). For k < m < n, clearly ðk À Þ=ðn À Þ is increasing in , so that the recovery time for the mean is reduced by decreasing : If the arrivals are terminated until recovery is achieved, ¼ 0 in the recovery equation. Then further recovery can be achieved by turning the arrivals on and replacing n with k, and j for k in the recovery equation, such that m < j < k: Example B.2. For G $ Paðh, a), the recovery equation in Equation (B3) we have t ¼ ð h aÀ1 ðnÀÞ kÀ Þ 1 aÀ1 À h: In Figure  C10, for fixed E½S ¼ 3 and recovery level k ¼ d þ 1e, the black lines illustrate the recovery time for varying levels of congestion for different scenarios ðh, a, c 2 s Þ, where the x-axis shows the congestion level relative to the steady state mean : To aid the recovery for the three scenarios (black lines), we reduce the arrival rate by 20 percent, that is, k 2 ¼ 0:8k 1 and illustrate the effect on the corresponding dashed lines (blue lines).

Appendix C: Comparison to time series analysis
We apply traditional time-series methods, ARIMA models (Shumway et al., 2000) to produce prison population predictions for the offence groups considered in Section 4.2. The order and the values of parameters are obtained using R package forecast (Hyndman and Khandakar, 2008). The forecasts of the number of prisoners in the Theft offence group are produced by ARIMA(2,1,0) with drift given by: y t ¼ À9:76 þ 0:94y tÀ1 þ 0:462y tÀ2 À 0:41y tÀ3 þ t : Similar to the analysis performed in Section 4.2, onestep ahead predictions are produced by iteratively adding one data point at a time and refitting the model and we note that the order of the model was updating with the changes in the training data set. The RMSE for long-term predictions is 30.21, whereas for short-term predictions is 10.16. The results are presented in Figure C11, and we observe that the length of the prediction intervals is lower compared to the predictions produced in Figure C4. This is due to the fact that the parameters of ARIMA model are set to the fixed quantities, whereas we assign probability distributions for the parameters in our proposed model. Similar results are presented for the Sexual offences and Public order offences groups in Figure C12. For example we present the k-step ahead forecasts for the Sexual offences group which are produced by ARIMA(2, 2, 2): y t ¼ 2:55y tÀ1 À 1:95y tÀ2 þ 0:23y tÀ3 þ 0:16y tÀ4 þ t À 1:62 tÀ1 þ 0:68 tÀ2 , where the RMSE for long-term predictions is 4.71, whereas for short-term predictions is 4.42, where the above comment holds observing lower prediction intervals. The k-step ahead forecasts for the Public order offence group was produced by ARIMA(2,0,2): y t ¼ 93:22 þ 0:28y tÀ1 þ 0:47y tÀ2 þ t þ 0:91 tÀ1 þ 0:85 tÀ2 , where the RMSE for long-term predictions is 5.44, whereas for short-term predictions is 2.16.