Regression analyses of the data sets for the analysis of decomposition error in discrete-time open tandem queues

The data sets and regression models presented here are related to the article “Point and interval estimation of decomposition error in discrete-time open tandem queues” [1]. The data sets are the first to analyze the approximation quality of the discrete-time decomposition approach. They contain independent (explanatory) and dependent variables for the analysis of decomposition error, which were obtained using discrete-time queueing models and discrete-event simulation. The independent variables are the utilization parameters of the queues and the variability parameters of the service and arrival processes. The dependent variables are the decomposition errors with respect to the expected value and the 95th-percentile of the waiting time distribution at the downstream queue. This article presents multiple linear regression and quantile regression models that explain the variance of the dependent variables for tandem queues with equal traffic intensity at both queues and for tandem queues with downstream bottlenecks.


Specifications
Subject: Mathematical Modelling
Specific subject area: Discrete-time queueing theory
Type of data: Table, Chart
How the data were acquired: Data were acquired using discrete-time queueing theory and discrete-event simulation. For a given parametrization of the tandem queue (arrival and service rates, and variability parameters), we computed the expected value and the 95th-percentile of waiting time using both methods. Decomposition error is the relative divergence between the results obtained with simulation and discrete-time queueing theory, respectively [1].
Data format: Analyzed, Filtered
Description of data collection: The expected values of inter-arrival and service times (that is, the flow parameters) have been varied in the range [

Value of the Data
• Decomposition approaches for open queueing networks are known to yield approximate results for the analysis of non-renewal downstream arrival processes [1]. The data sets described and analyzed in this article are the first to investigate the approximation quality (that is, decomposition error) of the discrete-time decomposition approach. Decomposition error is the relative divergence between the waiting time (expected value and 95th-percentile) obtained with simulation and with discrete-time queueing theory, respectively [1].
• The regression analyses reveal statistically significant correlations between the variability and utilization parameters of the tandem queue and decomposition error. This suggests that decomposition error can be estimated efficiently with only the input parameters of the tandem queue at hand. Researchers deploying decomposition approaches can use the data and regression models to flag severe decomposition errors with high forecasting accuracy.
• The data may also help to design experiments that analyze the approximation quality of decomposition approaches for other network topologies (e.g. stochastic splits and merges), as well as to validate exact decomposition methods.

Data Description
The data repository supplied with this article contains raw data for the analysis of decomposition error in discrete-time open tandem queues. The data is formatted for the computation and validation of point and interval estimates for decomposition error as well as for the analysis of decomposition error in bottleneck queues.
The repository contains two folders:
• "01 Equal Traffic Intensities": Raw data for the analysis of decomposition error in tandem queues with equal traffic intensities, stored in two .csv-files (training and test data).
• "02 Bottleneck Analyses": Raw data for the analysis of decomposition error in tandem queues with bottlenecks, stored in three .csv-files (downstream bottlenecks, upstream bottlenecks, and equal traffic intensities).
The definition of all variables that appear in the data set is as follows:
1. EV Arrivals External: Expected value of the external arrival process
2. EV Service Upstream: Expected value of the upstream service process
3. EV Service Downstream: Expected value of the downstream service process
4. Utilization Upstream: Utilization of the upstream queue
5. Utilization Downstream: Utilization of the downstream queue
6. SCV Arrivals Downstream: Squared coefficient of variation of the downstream arrival process (that is, the upstream departure process)
7. SCV Service Upstream: Squared coefficient of variation of the upstream service process
8. SCV Service Downstream: Squared coefficient of variation of the downstream service process
9. EV Waiting Decomposition: Expected value of waiting time at the downstream queue, obtained with the discrete-time decomposition approach
10. EV Waiting Simulation: Expected value of waiting time at the downstream queue, obtained with discrete-event simulation
11. 95-perc Waiting Decomposition: 95th-percentile of waiting time at the downstream queue, obtained with the discrete-time decomposition approach
12. 95-perc Waiting Simulation: 95th-percentile of waiting time at the downstream queue, obtained with discrete-event simulation
13. EV Decomposition Error: Decomposition error with respect to the expected value of waiting time
14. 95-perc Decomposition Error: Decomposition error with respect to the 95th-percentile of waiting time
In this article, we present data and regression models for two tandem queue configurations [2]. First, we consider tandem queues with equal traffic intensity, and second, we present data for tandem queues with downstream bottlenecks. For both configurations, we present OLS regression and quantile regression to explain the variance of decomposition error with respect to the expected value and 95th-percentile of waiting time.
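As an illustration of how the two error variables relate to the waiting time variables, the following Python sketch computes a relative error from a decomposition result and a simulation result. The sign convention and the choice of the simulation result as the reference are assumptions made for this sketch (the exact definition is given in [1]), and the numeric values are made up for illustration.

```python
def relative_error(decomposition: float, simulation: float) -> float:
    """Relative divergence of the decomposition result from the simulation result.

    Assumption: the simulation result is the reference, so a positive value
    means that the decomposition approach overestimates waiting time.
    """
    if simulation == 0:
        raise ValueError("simulation result must be non-zero")
    return (decomposition - simulation) / simulation


# Hypothetical row keyed by the variable names of the data set (made-up values).
row = {
    "EV Waiting Decomposition": 10.5,
    "EV Waiting Simulation": 10.0,
    "95-perc Waiting Decomposition": 38.0,
    "95-perc Waiting Simulation": 40.0,
}

ev_error = relative_error(row["EV Waiting Decomposition"], row["EV Waiting Simulation"])
p95_error = relative_error(
    row["95-perc Waiting Decomposition"], row["95-perc Waiting Simulation"]
)
print(f"EV error: {ev_error:.1%}, 95th-percentile error: {p95_error:.1%}")
```

Under this convention, the first value is an overestimation of 5% and the second an underestimation of 5%.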
To this end, Table 1 specifies the variables used in the regression analyses. We use the squared coefficient of variation to describe the variability of the arrival and service processes and the utilization to describe the traffic intensity at the upstream and downstream queue, respectively.

Equal Traffic Intensities
This data set contains 1,166 data points that we partition into two subsets. The training data set consists of 932 randomly chosen data points, and the test data set consists of the remaining 234 data points. Fig. 1 shows the empirical cumulative distribution of decomposition error for the entire data set. It shows that the discrete-time decomposition approach both overestimates and underestimates waiting time in the same proportion. The relative errors range from −21.9% to 32.5% (decomposition error with respect to the expected value) and from −30.8% to 36.7% (decomposition error with respect to the 95th-percentile). The mean absolute values of decomposition error equal 3.93% and 4.51% with respect to the expected value and the 95th-percentile of waiting time, respectively.
Table 2 provides the summary statistics for the independent variables (IVs) in the training data set and the flow parameters of the tandem queue. Note that the expected values of the service processes at the upstream and the downstream queue are equal, and thus we list E(B) for both queues. We normalize the IVs of both subsets with the mean and standard deviation (STD) values listed in Table 2.
Table 3 presents the OLS and quantile regression coefficients and standard errors for decomposition error with respect to the expected value. The training data set was used to compute the coefficients. The OLS regression is statistically significant (F(10, 921) = 2123, p < .001) and explains the majority of the variance of the relative error of the expected value of waiting time (adjusted R² = 0.958). The ANOVA reveals all direct effects and the majority of the interaction effects (with the exception of the interaction between downstream service time variability and utilization) to be statistically significant.
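The normalization of the IVs and the effect structure of the regression can be sketched in Python as follows. The sketch assumes four IVs (the utilization and the three SCVs) together with all six pairwise interactions. This is consistent with the reported degrees of freedom, F(10, 921) for 932 training points, but it is an inference rather than a statement from the data description. The short variable names, the toy rows, and the use of the sample standard deviation are likewise illustrative assumptions.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical short names for the four IVs (utilization and three SCVs).
IVS = ["rho", "scv_arr_down", "scv_srv_up", "scv_srv_down"]
N_TRAIN = 932  # size of the training data set


def fit_scaler(rows):
    """Return {iv: (mean, std)} computed on the training rows only."""
    scaler = {}
    for iv in IVS:
        values = [row[iv] for row in rows]
        scaler[iv] = (mean(values), stdev(values))
    return scaler


def design_row(row, scaler):
    """Intercept, four normalized direct effects, six pairwise interactions."""
    z = [(row[iv] - scaler[iv][0]) / scaler[iv][1] for iv in IVS]
    interactions = [a * b for a, b in combinations(z, 2)]
    return [1.0] + z + interactions


# Toy training rows (made-up values); the real values are in the .csv-files.
train = [
    {"rho": 0.5, "scv_arr_down": 0.4, "scv_srv_up": 0.2, "scv_srv_down": 1.0},
    {"rho": 0.7, "scv_arr_down": 0.9, "scv_srv_up": 1.1, "scv_srv_down": 2.0},
    {"rho": 0.9, "scv_arr_down": 1.4, "scv_srv_up": 2.0, "scv_srv_down": 0.5},
]
scaler = fit_scaler(train)

# Test rows would be normalized with the same training statistics.
x = design_row(train[1], scaler)
n_regressors = len(x) - 1                      # excludes the intercept
residual_df = N_TRAIN - n_regressors - 1
print(n_regressors, residual_df)
```

Reusing the training mean and standard deviation for the test subset keeps both subsets on the same scale, so that coefficients fitted on the training data can be applied to the test data directly.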
Table 3 OLS and quantile regression estimates for decomposition error (dependent variable is the expected value of waiting time) in tandem queues with equal traffic intensity.
The remaining columns of Table 3 present the coefficients of the quantile regressions. The standard errors are obtained with 100 bootstrapping replications each. The Pseudo R² of each model is well above 0.8. All quantile regression equations show similar patterns of changes in coefficient values as the OLS regression. We find the majority of direct and interaction effects to be statistically significant. As in the OLS regression, the interaction effect between the service process variability (at the upstream queueing system) and the utilization is found to be non-significant in every model. While the absolute sizes of the coefficients for most factors vary little across the equations, it should be noted that the weights of the service process variability at the upstream queueing system and of the arrival process variability at the downstream queueing system rise with increasing quantile.
Table 4 presents the OLS regression coefficients for decomposition error with respect to the 95th-percentile of waiting time. We find a statistically significant OLS regression equation (F(10, 921) = 1064, p < .001), which explains the majority of the variance (adjusted R² = 0.920) of decomposition error. All direct effects are statistically significant. The impact patterns of the interaction effects are the same as in Table 3.
The remaining columns of Table 4 show the regression coefficients of the quantile regressions. The standard errors are computed with 100 bootstrapping replications each, and the Pseudo R² of all models is well above 0.6. Except for the service process variability at the downstream queueing system, which is non-significant for the models with τ ≤ .05, all direct effects are found to be statistically significant in every regression model. The majority of interaction coefficients are found to be significant or marginally significant. However, we did find non-significant coefficients for the interaction effect of the service process variability and the arrival process variability (both at the downstream queueing system), as well as in the Q(.975) model. As in Table 3, the absolute sizes of the coefficients vary little for most factors across the equations. However, the weight of the utilization increases with rising quantiles, while (in contrast to Table 3) the weight of the arrival process variability decreases.
Table 4 OLS and quantile regression estimates for decomposition error (dependent variable is the 95th-percentile of waiting time) in tandem queues with equal traffic intensity.
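To make the goodness-of-fit measure of the quantile regressions more concrete, the following Python sketch computes the check (pinball) loss and a Pseudo R² for a given quantile τ. It assumes the common definition of Koenker and Machado, that is, one minus the ratio of the summed check loss of the fitted model to that of an intercept-only model. Whether the reported Pseudo R² values use exactly this definition is an assumption, and the toy data are made up.

```python
import math


def pinball_loss(residual: float, tau: float) -> float:
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile(values, tau):
    """Order statistic at ceil(tau * n), a minimizer of the summed check loss."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(tau * len(ordered)) - 1))
    return ordered[index]


def pseudo_r2(y, fitted, tau):
    """1 - V_model / V_null, with V the summed check loss at quantile tau."""
    v_model = sum(pinball_loss(yi - fi, tau) for yi, fi in zip(y, fitted))
    q = sample_quantile(y, tau)
    v_null = sum(pinball_loss(yi - q, tau) for yi in y)
    return 1.0 - v_model / v_null


# Toy example (made-up values): a perfect fit and an intercept-only fit.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = pseudo_r2(y, y, tau=0.5)
null_fit = pseudo_r2(y, [3.0] * len(y), tau=0.5)
print(perfect, null_fit)
```

A perfect fit yields a value of 1, whereas a model that only reproduces the unconditional sample quantile yields 0.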

Downstream bottlenecks
As Suresh and Whitt [3] mention, in a narrower sense, the bottleneck is the queue with the highest traffic intensity. However, increasing the traffic intensity of a queue by only a small amount may shift the bottleneck position. Therefore, it is intuitive to state that either of the queues is the bottleneck only if its utilization exceeds that of the other queue by more than some threshold ε. We created a data set that contains 969 data points and chose ε = 0.1 to split the data into three subsets. In the first data set (403 data points), the downstream queue is the bottleneck; in the second data set (131 data points), the traffic intensities are similar (|ρ_u − ρ_d| ≤ ε); and in the third data set (435 data points), the upstream queue is the bottleneck. Analogous to the analyses of tandem queues with equal traffic intensities, we use OLS and quantile regression to model decomposition error with respect to the expected value and 95th-percentile of waiting time. We use the first and the second data set to compute the regression coefficients. Table 5 and Table 6 show the results with respect to the expected value and the 95th-percentile of waiting time, respectively.
Table 5 OLS and quantile regression estimates for decomposition error (dependent variable is the expected value of waiting time) in tandem queues with downstream bottlenecks.
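The split into three subsets can be sketched in Python as follows. The function name, the labels, and the example utilizations are hypothetical; only the threshold ε = 0.1 and the subset sizes are taken from the description above.

```python
def classify_bottleneck(rho_up: float, rho_down: float, eps: float = 0.1) -> str:
    """Assign a tandem queue to one of the three subsets.

    "similar" if |rho_up - rho_down| <= eps, otherwise the queue with the
    higher utilization is reported as the bottleneck.
    """
    diff = rho_up - rho_down
    if abs(diff) <= eps:
        return "similar"
    return "upstream" if diff > 0 else "downstream"


# Made-up utilizations for illustration.
examples = [(0.5, 0.9), (0.85, 0.8), (0.9, 0.6)]
labels = [classify_bottleneck(u, d) for u, d in examples]
print(labels)

# Subset sizes as reported: downstream, similar, upstream.
SUBSET_SIZES = {"downstream": 403, "similar": 131, "upstream": 435}
total_points = sum(SUBSET_SIZES.values())
```

Note that utilization differences that lie exactly on the threshold are sensitive to floating-point rounding, so a small tolerance may be advisable when reproducing the split from the raw data.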

Experimental Design, Materials and Methods
We use the algorithm described in [4] to generate 1,166 data points in a four-dimensional space-filling Latin hypercube design (data sets with equal traffic intensities) and 969 data points in a five-dimensional space-filling Latin hypercube design (data sets for bottleneck analyses). The expected values of the external inter-arrival times and the service times are selected independently at random from the interval [1.0, 30.0]. To create the data points with equal traffic intensity, the expected values of service time are equal for the upstream and the downstream queue. The variability parameters of the service time distributions are selected independently at random from the interval [0.1, 3.0].
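The idea of a Latin hypercube design can be illustrated with the following Python sketch. It is a plain random Latin hypercube and not the space-filling algorithm of [4], which additionally optimizes the spread of the points. The assignment of the four dimensions (two expected values and two service time SCVs) is an assumption made for this sketch.

```python
import random


def latin_hypercube(n_points, n_dims, seed=0):
    """Random Latin hypercube sample in the unit cube [0, 1)^n_dims.

    Each dimension is cut into n_points equally wide strata, and every
    stratum contains exactly one point.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def scale(point, bounds):
    """Map a unit-cube point to the given (low, high) interval per dimension."""
    return tuple(low + u * (high - low) for u, (low, high) in zip(point, bounds))


# Assumed dimensions: E(A), E(B), SCV service upstream, SCV service downstream.
BOUNDS = [(1.0, 30.0), (1.0, 30.0), (0.1, 3.0), (0.1, 3.0)]
unit_sample = latin_hypercube(n_points=10, n_dims=len(BOUNDS))
design = [scale(p, BOUNDS) for p in unit_sample]
print(design[0])
```

For the bottleneck data sets, a fifth dimension would be added under the same assumption, so that the expected service times of the two queues can differ.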
In our analyses [1], we assume that the service processes are described by discretized gamma distributions. Let X be a gamma-distributed random variable with shape parameter k and scale parameter θ. The probability density function of X is given by [1, 5]
$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \quad x > 0,$$
where Γ(k) is the gamma function. In order to generate gamma-distributed random variables X with predefined values for E(X) and scv(X), we use the well-known closed-form expressions for the shape and scale parameters of the gamma distribution [1, 5],
$$k = \frac{1}{\mathrm{scv}(X)}, \qquad \theta = E(X)\,\mathrm{scv}(X).$$
We use the squared coefficient of variation (scv) as a normalized measure of statistical dispersion to describe the process variability. Let E(X) denote the expected value of X and Var(X) its variance. The variability of X is defined as [1]
$$\mathrm{scv}(X) = \frac{Var(X)}{E^{2}(X)}.$$
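The moment matching for the gamma distribution can be sketched in Python as follows. Since E(X) = kθ and Var(X) = kθ² for a gamma distribution, a given mean and scv determine both parameters. The sketch covers only the continuous gamma distribution; the discretization used in [1] is not reproduced here, and the function name and the example values are hypothetical.

```python
import random


def gamma_parameters(mean: float, scv: float):
    """Return (shape k, scale theta) such that E(X) = mean and scv(X) = scv.

    Uses E(X) = k * theta and Var(X) = k * theta**2, hence scv(X) = 1 / k.
    """
    if mean <= 0 or scv <= 0:
        raise ValueError("mean and scv must be positive")
    shape = 1.0 / scv
    scale = mean * scv
    return shape, scale


shape, scale = gamma_parameters(mean=12.0, scv=0.5)

# Empirical check with the standard library gamma sampler (shape, scale).
rng = random.Random(1)
samples = [rng.gammavariate(shape, scale) for _ in range(20_000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
sample_scv = sample_var / sample_mean ** 2
print(shape, scale, round(sample_mean, 2), round(sample_scv, 2))
```

The sample mean and sample scv should lie close to the requested values of 12.0 and 0.5, up to sampling noise.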

Ethics Statements
The authors declare that they have no conflict of interests.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Data sets for the analysis of decomposition error in discrete-time open tandem queues (Original data) (KITopen Repository).