Determination of probability distribution of customer input at post office

If we want to analyze a real system with a large amount of input data, it is convenient to determine the probability distribution that best fits the given data. There are many statistical methods for determining the appropriate probability distribution, and one of them is the Chi-Square Goodness of Fit Test. This statistical test can also be used to find the probability distribution of the time intervals between arrivals of customers at a post office. Intervals between customer arrivals occur in continuous time, and therefore we consider continuous probability distributions.


Introduction
Real systems, such as queuing systems at post offices, are based on random events. A system is a set of elements that are arranged in a certain way. Models of systems that are affected by random events contain random variables of different forms. The result of a random event is a random variable [10,12]. These random variables take different values, and according to the type of these values we divide random variables into discrete and continuous ones. Discrete random variables usually take integer values. Continuous random variables take values from a closed or open interval.
When we examine a particular system, we work with a set of data that represent the values of a random variable. In this case, it is advantageous to determine the laws of probability that govern the given data. One of them is the probability distribution, which describes the probability of each value of the random variable. In other words, a probability distribution gives the probability of occurrence of each outcome, and in the context of the queuing system at a post office the outcome represents the event of a customer's arrival at the post office.

Background
Probability theory advanced significantly at the beginning of the 18th century, with the normal distribution playing the predominant role. The rapid development of probability theory probably began with De Moivre's Doctrine of Chances (1718) and continued with Laplace's and Gauss's studies at the beginning of the 19th century, which further increased the dominance of the normal distribution in statistics. The development of the exponential distribution came later. In 1931 T. Kondo devoted his article in Biometrika to the exponential distribution and Pearson's type X curve. In 1937 Sukhatme mentioned for the first time the idea that the exponential distribution may be an alternative to the normal distribution in cases where the form of variation in the population is known and is not normal. In the 1950s Rényi, Epstein and Sobel made a significant contribution to the development of the exponential distribution. Also very important was the paper by W. Weibull in 1951, in which he examined the extension of the exponential distribution that now bears his name. The first characterizations of the exponential distribution were elaborated by Ghurye (1960) and Teicher (1961), who adapted the characterization of the normal distribution to the exponential distribution. The main studies of the exponential distribution thus began in later years, when the foundations of statistics had largely been built. [7] In 1900, Pearson introduced the Chi-Square Goodness of Fit Test, which is universally applicable for determining the probability distribution of a given random variable. Pearson found that, for a sufficiently large amount of data, the test statistic is approximately chi-square distributed with k - 1 degrees of freedom. [3] The point of the test is that the number of classes is fixed and the test statistic is asymptotically chi-square distributed. [2]

Continuous distribution
With respect to this kind of system, which is based on events over time, we consider continuous distributions. A continuous random variable is a random variable with a set of possible values that is infinite and uncountable.

Uniform distribution
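The uniform distribution is defined by two parameters: a is the minimum and b is the maximum. The probability density of the uniform distribution on the interval (a, b) is:

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

The uniform distribution R(a, b) has the distribution function:

$$F(x) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a < x < b \\ 1, & x \ge b \end{cases}$$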

Normal distribution
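The normal distribution is often mentioned in connection with random errors, such as measurement errors caused by a large number of unknown and mutually independent causes. The probability density of the normal distribution is given by the following formula:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}$$

The normal distribution N(µ, σ) has the distribution function:

$$F(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}} \, dt$$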

Exponential distribution
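The exponential distribution describes the time between randomly occurring events. The probability density of the exponential distribution with parameter λ > 0 is given by the following formula:

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

The distribution function of the exponential distribution Exp(λ) is the following:

$$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$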

Gamma distribution
The probability density of the gamma distribution is defined by two parameters α and β, for which α > 0 and β > 0 hold. If the parameter α is a natural number, the gamma distribution is called the Erlang distribution. The distribution function of the gamma distribution cannot, in general, be expressed in closed form.
Knowing the probability distribution of the data is useful if we want to perform simulations in which we generate data from a given probability distribution. There are many different algorithms for generating such random values. Using those algorithms, it is possible to transform random values of the uniform distribution on the interval (0,1) into values of the required distribution.
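One such transformation is the inverse transform method, shown here for the exponential distribution as a minimal sketch. The function names, the seed and the value of λ are illustrative assumptions and are not taken from the measurements in this paper.

```python
import math
import random


def exponential_from_uniform(u, lam):
    """Map u from U(0, 1) to Exp(lam) by inverting F(x) = 1 - exp(-lam * x)."""
    return -math.log(1.0 - u) / lam


def sample_exponential(n, lam, seed=0):
    """Draw n exponential values from independent uniform values."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [exponential_from_uniform(rng.random(), lam) for _ in range(n)]


# Assumed rate lam = 0.5, so the theoretical mean interval is 1 / lam = 2.
samples = sample_exponential(10_000, lam=0.5)
sample_mean = sum(samples) / len(samples)
print(round(sample_mean, 2))
```

The sample mean should lie close to 1/λ, which gives a quick plausibility check of the generator.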
It is essential to realize that the input random values are independent values from the uniform distribution on the interval (0,1).
There are many general mathematical formulas that can be used to analyse a particular system. Queuing theory is one of the mathematical disciplines that offer such formulas; using it means obtaining results analytically by substituting specific parameters into the given formulas. If we decide to analyse a queuing system this way, it is necessary to select the correct model, that is, the model closest to the real system. The individual models offered by queuing theory are characterized by basic parameters. To specify the mathematical model of a queuing system, it is necessary to specify [10]: the customer input; the network of service lines; the average service time; the rules for entering and exiting the system; and other specific elements of the system.
The arrival of customers (customer input) is a stochastic process whose probability distribution reflects the length of the time intervals between customer arrivals. Customer input that meets three properties, namely stationarity, absence of aftereffects (independence) and regularity, is called the elementary flow. The flow is stationary if the probability of a customer's arrival does not depend on the position of the time interval on the time axis. The absence of aftereffects is fulfilled if the events occur independently of each other, which means that a customer enters the service system independently of other customers. The regularity of customer input is based on the principle that no two events happen at the same time: we can always find a small time interval in which only one customer enters the system.
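A customer input with these three properties can be imitated by adding up independent exponential intervals. The following is only a sketch; the arrival rate, the time horizon and the function name are hypothetical values chosen for illustration.

```python
import random


def simulate_arrivals(lam, horizon, seed=1):
    """Return arrival times in (0, horizon] with independent Exp(lam) gaps."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    t = 0.0
    arrivals = []
    while True:
        t += rng.expovariate(lam)  # gap to the next customer
        if t > horizon:
            return arrivals
        arrivals.append(t)


# Assumed intensity of 2 customers per time unit over 1000 time units.
horizon = 1000.0
arrivals = simulate_arrivals(lam=2.0, horizon=horizon)
observed_rate = len(arrivals) / horizon
print(round(observed_rate, 2))
```

Because the gaps are drawn independently from the same distribution, the observed rate stays close to the assumed intensity regardless of where the observation window is placed.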

Objective and methodology
The objective of this paper is to determine the probability distribution of measured data. The probability distribution was determined in order to examine the random variable, which in our case is the customer's arrival at the post office. The intensity of customer arrivals is one of the parameters of the queuing system at the post office. In order to determine the probability distribution of the variable and to create a model, we used the Chi-Square Goodness of Fit Test as a tool of inductive statistics. This method allows us to determine the probability distribution that fits the data and to predict the further behaviour of the system.
To determine the quantitative side of the system we used an empirical method, namely measurement. The objects of the measurement were the time intervals between arrivals of customers at the Bytča Post Office; the measurement was done using timers directly at the Bytča Post Office during different parts of its opening hours. The population is potentially infinite. The required maximum allowable error is ±0.05 and the required confidence level is 95%. To calculate the sample size, we used the relationship for the minimum sample size, in which t_{1-α/2} is the critical value determined from the tables, σ² is the variance calculated from the standard deviation, p is the variability of the population and Δ is the maximum allowable error.
The measured values are divided into intervals. To determine the number of intervals and their length, we used standard statistical formulas for the number of classes and for the interval length. For the graphical representation of the measured data, we used a bar chart. We also used the tools of inductive statistics. Inductive statistics is concerned with statistical hypothesis testing. Testing is based on verifying the null hypothesis against the alternative hypothesis. The Chi-Square Goodness of Fit Test is appropriate for determining a probability distribution. We used this test to verify the agreement of the measured data with the exponential distribution.
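The division of measured values into classes can be sketched as follows. This is an illustration only: the number of classes k is passed in as an input, equal-width classes with length (max - min) / k are an assumption, and the data are hypothetical rather than the measured intervals.

```python
def class_counts(values, k):
    """Split values into k equal-width classes; return (bounds, counts)."""
    lo, hi = min(values), max(values)
    h = (hi - lo) / k  # assumed interval length: range divided by k
    counts = [0] * k
    for v in values:
        # The maximum value falls exactly on the last boundary,
        # so it is placed in the last class.
        i = min(int((v - lo) / h), k - 1)
        counts[i] += 1
    bounds = [lo + i * h for i in range(k + 1)]
    return bounds, counts


# Hypothetical intervals between arrivals (in seconds).
intervals = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bounds, counts = class_counts(intervals, k=5)
print(bounds, counts)
```

The resulting counts are what a bar chart of the measured data would display, one bar per class.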

Results
Since the system for handling customer requests at the post office is a queuing system with two basic input parameters, the intensity of customer arrivals λ and the average service time 1/µ, it was necessary to obtain customer input data. Intervals between customer arrivals at the post office are defined in continuous time. The customer arrival process is a stochastic process, meaning that each customer's arrival is random and no rule is attached to it. In order to examine the properties of the system at the Post Office in Bytča, we made 7 measurements of customer input. After that, we divided the measured data into interval classes, as can be seen in the table below. [8]
In order to create a system model that approaches the real system, it is necessary to find out which probability distribution belongs to the measured data. There are many methods and tests for verifying a probability distribution in practice. We chose the Chi-Square Goodness of Fit Test, which verifies whether the empirical distribution is statistically identical to a given theoretical probability distribution; this test is generally applicable to discrete and continuous distributions when a sufficient amount of data is available. In order to determine which probability distribution could be considered, we plotted the measured data in a graph. [8] Figure 5 showed that it could potentially be an exponential or Erlang distribution. Intervals between customer arrivals generally follow the exponential distribution in systems similar to the queuing system at the Post Office [11]. To confirm or reject the hypothesis about the exponential distribution, we decided to verify whether the measured data fit the exponential distribution: [5] [6]
Null hypothesis H0: Intervals between arrivals of customers are modeled by the exponential distribution.
Alternative hypothesis H1: Intervals between arrivals of customers are not modeled by the exponential distribution.
The level of significance is the probability that we reject a true null hypothesis. In general, this probability must be low, and therefore we chose α = 0.05.
The goal of the Chi-Square Goodness of Fit Test is to compare the calculated test criterion with the critical value found in the table of the Chi-Square distribution. The test criterion is given by the relationship:

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - n p_i)^2}{n p_i}$$

where n_i is the observed frequency in class i, n is the total number of observations, k is the number of classes and p_i represents the probabilities of the individual class intervals. Those probabilities can be calculated using the following formula:

$$p_i = F(\beta) - F(\alpha) = e^{-\lambda \alpha} - e^{-\lambda \beta}$$

where α and β are the class interval boundaries (α here is not the level of significance) and the parameter λ = 1/x̄ is the average customer flow. In the table below, we can see the class probabilities with the test criterion values for each class interval. There is also a condition that the probability of a class cannot be smaller than 5/n; otherwise we have to merge interval classes until the condition is met. For this test the condition is:

$$p_i \ge \frac{5}{2228} \approx 0.0022$$

The calculated test criterion does not exceed the critical value, meaning that we do not reject the null hypothesis: the intensity of the customer input at the post office in Bytča corresponds to the exponential distribution.
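The calculation of the test criterion for an exponential distribution can be sketched as follows. The class probability is taken as exp(-λa) - exp(-λb) for a class with boundaries a and b, and classes are merged from the right tail until each has a probability of at least 5/n. The grouped data, the boundaries and the function names are hypothetical and do not reproduce the measured values from Bytča.

```python
import math


def exp_class_prob(a, b, lam):
    """P(a <= X < b) for X ~ Exp(lam), i.e. F(b) - F(a)."""
    return math.exp(-lam * a) - math.exp(-lam * b)


def chi_square_exponential(bounds, observed, lam):
    """Return (test criterion, number of classes left after merging).

    bounds   -- class boundaries [b0, b1, ..., bk]; bk may be math.inf
    observed -- observed counts n_i, one per class
    """
    n = sum(observed)
    probs = [exp_class_prob(a, b, lam) for a, b in zip(bounds, bounds[1:])]

    # Merge neighbouring classes, starting from the right tail, until each
    # merged class satisfies p_i >= 5 / n.
    merged = []  # list of [probability, observed count]
    acc_p, acc_o = 0.0, 0
    for p, o in zip(reversed(probs), reversed(observed)):
        acc_p += p
        acc_o += o
        if acc_p >= 5 / n:
            merged.append([acc_p, acc_o])
            acc_p, acc_o = 0.0, 0
    if acc_p > 0 or acc_o > 0:  # leftover on the left joins its neighbour
        if merged:
            merged[-1][0] += acc_p
            merged[-1][1] += acc_o
        else:
            merged.append([acc_p, acc_o])

    criterion = sum((o - n * p) ** 2 / (n * p) for p, o in merged)
    return criterion, len(merged)


# Hypothetical grouped data: 1000 intervals with a mean of 2 time units.
mean_interval = 2.0
lam = 1.0 / mean_interval  # lambda estimated as 1 / mean interval
bounds = [0.0, 1.0, 2.0, 3.0, 4.0, math.inf]
observed = [393, 239, 145, 88, 135]
criterion, classes = chi_square_exponential(bounds, observed, lam)
print(criterion, classes)
```

The returned criterion is then compared with the tabulated critical value at the chosen level of significance. The degrees of freedom are k - 1 for a fully specified distribution and are reduced by one for each parameter estimated from the data.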

Conclusions
Determining the probability distribution of measured data allows a deeper analysis of the data and the use of the mathematical relationships that relate to the particular probability distribution. If the probability distribution of the data is known, it is also possible to simulate the data and predict their evolution based on the characteristics of the theoretical probability distribution. Probability distributions also play a very important role in generating random numbers in simulation models built on algorithms. Simulation models have a wide range of uses in many areas, including postal processes. Determining that the exponential distribution fits the time intervals between customer arrivals at the Bytča Post Office can be useful in building a simulation model of the queuing system of this post office. In this research we were also able to calculate the average customer input, which is one of the basic parameters of the queuing system.