Hat and squeeze functions, a way for making precise algorithms

Article history: Received 4 November 2010 Received in revised form 22 January 2011 Accepted 22 January 2011 Available online 27 January 2011 Random variates play key role in any simulation system and there are different algorithms to generate random variates. One of the best algorithms for generating random variates is uniform fractional part algorithm. The algorithm has high performance in terms of efficiency, speed and simplicity. Although the algorithm has useful results, it is an approximate algorithm. In this article, the approximate form of the algorithm has been studied, and some suggestions have also been presented. Through acceptance-rejection approach and hat and squeeze function, the approximate algorithm is transformed to near exact algorithm. The proposed model of this paper has been examined and compared with the traditional one and the preliminary results indicate that it performs better than the other existing algorithms. © 2011 Growing Science Ltd. All rights reserved


Introduction
Random variates play an important role on the implementations of simulation techniques.During the past few years, there has been an increasing interest in development of new techniques based on random variates.In this article, we introduce a new method to generate random variates from continuous distributions.The proposed algorithm called uniform fractional part (UFP) seems to be simpler and more efficient compared with other methods of random variates generation despite the fact that the method is an approximate one.The primary goal of this article is to present a new near exact algorithm based on the previously published work of Ahrens and Dieter (1982).There are different methods for generating random variates such as inverse transformation, convolution, combination and acceptance-rejection method (Banks, 1998).Among the developed algorithms for generating random variates, some of them are more useful and efficient than others but inverse transformation is one of the most precise and the simplest ones.The proposed model of this paper looks for one random number and it is worth to mention that this kind of method is also used for generating of order statistics (Meuwissen & Bedford, 1997).There are some limitations on the implementation of inverse transformation method such as the lack of availability of general form and closed from F for some continues distributions.However, it is possible to use numerical methods such as bisection, secant and Newton-Raphson for generating the function having a complex F .
The combination method is used when distribution function (F) is presentable as convex combination of other distributions of F , F , etc shown in Eq. ( 1).This method is applicable to discrete and continuous distributions. (1) Convolution method is used for distributions in which the value of X can be shown as a sum of random variates which is easier than a direct generation of X.In other words, we assume that y , y ، ... ، y exist such that the sum of these values have the same distribution as of X.When other methods are not applicable, acceptance-rejection method is used.In this method, a hat function is considered for the main function in such a way that the generation of variates is easier than this hat function.The proposed model of this paper is based on acceptance-rejection method and the details of the implementation is presented in this paper.

Classifications of generation algorithms of random variates
In order to have a better understanding of characteristics of different methods for generating random variates a summary of all these methods are presented in Table 1.Approximate algorithm such as uniform fractional part, search index, bisection, secant and Newton-Raphson Interaction, etc.-exact algorithm such as inverse transformation method (Devroye, 1982).
Accuracy 1 Some algorithms require a certain number of random numbers, but in some algorithms the number of random numbers is random itself.

Number of consumed random numbers 2
One by one such as composition-several by one like acceptance-rejection-one by several-several by several.

Number of input to output random numbers 3
Auxiliary algorithms like composition-productive algorithms, inverse transformation, etc Ability to generate random values 4 Inverse transformation, convolution, composition, acceptance-rejection.

Method of generating random value 5
In non-stationary simulation in which the distribution parameters used during the simulation are changed, the algorithm having short term primary setup time must be used.But In stationary simulation, if the algorithms have long primary setup time as the parameters are not changed during the simulation process are used, it is required to implement the algorithm once and therefore, no problem is made from point of view of time.

Setup time 6
Applicable only for discrete distributions-applicable with continues distributions-some special algorithms cover all distributions.

Type of distribution function 7
Applicable for all distributions (black box, universal or automatic methods) such as TDR-applicable for specific distribution such as polar, for normal distribution (Banks et al., 2005) Flexibility 8 independency from a machine, etc. (Hung et al., 2010).The method under the study of this article, uniform fractional part algorithm, almost covers all the mentioned criteria.Before we introduce uniform fractional part method, it is necessary to note that this method is one of the universal or black box methods.Although the algorithm has numerous advantages, it is an approximate algorithm.The object of this article is to improve precision of the existing algorithm.Initially, we introduce uniform fractional part or UFP, which is simpler and efficient compared with other methods of generating random variates.This method is based on a statistical hypothesis.First, the method and its applications are introduced, and then its near exact algorithm is presented.

Description of the uniform fractional part algorithm
This algorithm is one of the latest methods presented by Izady and Mahlooji (2005), and it is an approximate algorithm to be used for continues distributions.The algorithm is based on a theorem explained as an example on Morgan's book (1984, page 72).Since we need the details of the algorithm we explain the theorem in this section.
Theorem: fractional part of total two independent random variables with uniform distribution on 0,1 , itself has uniform distribution on 0,1 .

Fig. 1. Probabilities associated with x
The basic logic of the algorithm is as follows: Initially, an integer value in the scope of x is randomly selected (one column in Fig. 1) and then a value R R , where R and R are two independent uniform [0,1] random variables are added to make a random variate (Mahlooji et al., 2008).3 show the advantage of UFP algorithm in terms of the speed and the precession compared with other techniques for generating random variates using Gamma distribution.To measure how well the data follow a particular distribution Kolmogorov-Smirnov statistic is used.Obviously, the better the distribution fits the data, the smaller this statistic will be.The lower this statistic is, the bigger test p-values will be which means a better precision of the algorithm.Although x some algorithms are more accurate than UFP, their accuracy changes with the modification of the parameters and they are not considered robust but UFP is robust with respect the distribution parameters.
There are 9 algorithms considered for generating random variates.The random variates generated by algorithm 1, 2 and 3 follow continues distribution function with cut off point, arbitrary cut off points and equal area approach, respectively.Algorithm 4 is also based on continues distribution and uses different approaches to determine the tails of the distribution.Algorithm 5 applies continues distribution function with reduction approach for the number of random numbers used when d=1 and algorithm 6 repeats the same procedure using an arbitrary value, d.Algorithm 7 generates the random variates using continues distribution function with recycling uniform random number with d=1 and algorithm 8 uses an arbitrary d.Finally, algorithm 9 generates the random variates using continues distribution function based on generating int(x).The proposed model of this paper uses the idea of algorithm 9. Therefore, from the algorithms presented in UFP algorithm evolution, only the steps associated with the algorithm 9 is presented as follows: 1) Input p, uniform probability cut off points a , a , … , a and d a a for i 1, . . ., k 1, 2) generate the random number u ′ (uniform [0,1] random variables), 3) calculate the value ′ 1 and determine the index i, then identify a value for int (x) based on the value of i, 4) generate the random number u on 0, d where d denotes the length of the interval into which X falls (uniform [0, d ] random variables), 5) deliver X, as x ud a

UFP algorithms
Algorithm 1 is the primary algorithm of UFP in which the cut off points should have integer values.We also need to choose appropriate distribution parameters to maintain the independency condition.In the event where x does not follow uniform distribution there may be a correlation between R 1 and R 2 .Therefore, algorithm 2 removes the constraint associated with the determination of parameters and cut off points include non-integer values.To determine cut off points of the algorithm, we can use two approaches of an equal area and variance increase.There are some studies which indicate that equal area approach has more advantages than the other ones (Mahlooji et al., 2008).One of the problems with UFP algorithm is its inefficiency for the distributions with unlimited boundary.Therefore, the solution presented by this algorithm needs to truncate the distribution tails and the algorithm 4 performs the foresaid work with the pre-determined precision.
As the number of random numbers consumed in the algorithm increases, the error in algorithm increases too.In order to generate a random variate, UFP main algorithm (algorithm 1) needs three random numbers to generate [x].While algorithm 5 uses integer column for cut off points, eliminates the foresaid problem and it only uses two random numbers.In Algorithm 6, the same approach is implemented while cut off pointes are defined on desired basis.In algorithm 7, recycle approach of random number is used while cut off point has integer values.In this algorithm, the number of consumed random numbers is decreased to the lowest value and to generate a random variate, only one random number is used.Therefore, algorithm 7 is classified as a one to one algorithm.In algorithm 8, the same algorithm for generation of variates is implemented while cut off points are defined as desired value.One of the time consuming sections of the algorithm is the generation of the values from [x].Algorithm 9 removes such a problem using equal area approach and generates [x] value with index given to each cut off point.Also, in this algorithm, the value obtained from [x], shall be summed only with one random number generated within an interval of 0 to d and does not need to generate two random numbers, R.

Various applications of UFP algorithm
1. Generation of continues random variates such as Gamma, Beta, Normal, etc.: UFP algorithm can be used for various continues distribution such as Normal, Gamma, Beta, etc. and shows desired results in terms of speed and precision.According to Mahlooji and Izady (2004), the speed and precision of this method is robust against the changes of parameters.
2. Generation of correlated random variates: in some of the simulated models, there is a need to generate one random vector as X x , … , x T from one special joint distribution (or multivariates) in which the single components of its vector may not be independent (Sak et al., 2010).UFP method is also usable in generation of such independent values in which for generation of two values with desired integrated marginal distributions having desired correlations, this issue causes no need of making joint function (Mahlooji & Izady, 2004).
3. Generation of random variates associated with ordering statistics: UFP method is also usable for order statistics.Since this method does not need index search as a part of the algorithm, it is faster than the other methods for order statistics (Mahlooji et al., 2004).
4. Generation of random numbers: UFP algorithm has the capability of fast generation of random numbers in which UFP and alias have been combined and the final algorithm is named UFPG (Mahlooji et al., 2004).Despite linear congruential generators and multiplicative recursive generators which have limited length, the length of this generator is almost unlimited (its course length is more than four billions) and has various advantages such as transfer capability, repeatability, short term primary setup time and passing the various statistical tests (such as run, discrepancy and correlation).

The basics of Acceptance-rejection technique
Von Neumann (Banks, 1998) presented the main idea of this method.In acceptance-rejection method, in order to generate random points in some of the hard and complicated areas A, it is enough to find a simpler area such as B in such a way that the area A is covered.Then starting from region B, random points are generated and if the points are also located in the area A, they will be introduced as accepted points.Therefore, the accepted points will be distributed uniformly in the interval A and the method is called acceptance-rejection method.Squeeze method is a skilled one to promote the efficiency of acceptance-rejection methods.In this method, it is supposed that C is an area to be surrounded by A and also D is another area to surround A in such a way that D A C. Squeeze function is used for evaluation of P points generated in B, which may or may not be located in A. this method to the mentioned method, we can use hat and squeeze functions with uniform density (Franklin & Sen, 1975).The following assumptions hold with acceptance and rejection method.
1. another function exists such as h(x) to surround f(x) (it means h x f x for all Xs in Fig. 4) 2. It is possible to generate the random variate from distribution h(x), such points are shown by x, y .
3. If the graph f(x) is drawn, in this case, the point (x,y) is located in above or below of f(x) curve (y x or y f x ).
The most important factor to determine the desired value is to minimize the space between the two charts h and f based on minimizing the number of rejected points.The next important factor in determining h is to speed up the generation of variates from this distribution.The average number of required points (x,y) to generate an accepted x is to find the trial rate.It is obvious that the number of trial rate in an ideal state is equal to one (Ormann & Erflinger, 1994).The logic of acceptancerejection method is shown in Fig. 5.

The use of acceptance-rejection method on UFP algorithm
Since the main method (algorithm 9 that is named A ′ method) is an approximate one, by defining a hat function h(x) and through the use of acceptance-rejection logic, it can be transformed to a near exact method.Hat function defined for x, located in the interval a , a is uniformly defined on 0, f a .The logic for using hat function is that if for the random pair x, y (in which x is generated from any method and y is generated from a hat function), the inequality y f x is satisfied then the generated points are dispersed uniformly under f(x) density function and consequently, the points have f(x) distribution (because y values are under the density function f x and has it has a uniform distribution).Nevertheless, in algorithm A, evaluation of acceptance condition is a time consuming step for most of distributions.Therefore, the speed of algorithm may increase when squeeze function is used.Therefore, a lower limit of S(x) should be found in such a way that inequality s x f x holds.In this case, if x, vh x (v is uniform [0,1] random variables and h(x) is hat function) pair values is under S(x) (Squeeze function), we can accept random variate x without evaluation of density function.In fact, in algorithm A, local or piecewise hat function and in B, local or piecewise squeeze function are used.Determination of hat and squeeze function depends on the form of the function.If the function is descending (such as Fig. 6), hat function for each 0, f a interval is determined and squeeze function is considered uniformly on 0, f a interval.If the foresaid function is an ascending one (fig 7), the hat function is uniform on 0, f a interval and squeeze function is considered uniformly on 0, f a interval.We apply the definitions of squeeze and hat functions in descending situation due to frequency in different distribution functions, although the other definitions of hat and squeeze functions are also applicable to UFP algorithm.x Algorithm A (UFP algorithm with hat function) Algorithm B (UFP algorithm with hat and squeeze function) 1) generate x with ′ algorithm, 2) generate a random number ~ 0,1 , 3) calculate the value of , 4) if , then, return x (evaluated step for PDF), otherwise; go to step 1. 1) generate x with ′ algorithm, 2) generate a random number ~ 0,1 , 3) calculate the value of , 4) if , then, return x (evaluated step for squeeze function), otherwise; go to step 5.

5) If
, then, return x (evaluated step for PDF), otherwise; go to step 1.Now a situation is considered that the function has maximum and minimum points (fig.8 and Fig. 9).If the foresaid interval includes (M) modes or minimizing points, in this case, the hat function is uniformly defined on 0, f M interval and squeeze is as uniform on 0, f a interval.If the interval is located before or after maximizing or minimizing point, the hat and Squeeze function is considered as determined, previously.If the foresaid distribution functions have a more complicated form, we define h i and s i as follows, (2) : , (3) : .
Therefore, hat and squeeze functions are generated uniformly on 0, h and 0, s .Since we ignore many conditions to determine s i and h i , the performance of the proposed method will be increased.

Verification of the presented approach
For verification of the presented corrective approach, the improved algorithms obtained from this approaches are compared with the original one.Exponential distribution has been used for comparison.Of course, the improved algorithm is applicable for all other continuous distributions.As mentioned, the proposed algorithm is the universal one and has no limitation of use for any special distribution.
x For validation, two criteria of speed and precision are used.For speed criterion, one million random data have been generated and generation speed has been calculated based on random variate µs.For precise criterion also, we calculated P-value parameter based on Anderson-Darling test with 95% confidence level.It is worth to mention that the tests have been performed with Borland compiler of C++ 5.02 under 32-Bit platform and evaluation parameters have been calculated using Minitab software.Anderson-Darling's A 2 statistic defined as follows: The assumption under the study in this test is as follows: H 0 : The data follow a specified distribution H 1 : The data do not follow a specified distribution It is worth to mention that presented times in the tables are only for comparison between the original method and the improved one.More professional coding can reduce the amount of generating time, greatly.An example of generation time of random variates for Beta distribution with other methods presented in Table 2 with a more advanced coding program.Because our objective is the comparison, the coding has been simplified in this article (Izady, 2005).time consuming.In method A, for generation of 1000 random variates, we need 1000 times of density function evaluation.This value was decreased to 28 times for algorithm B, which causes speeding up the algorithm.The results show that by increasing the value of statistics, the number of density function evaluation and also the number of failed values of the algorithm shall decrease the precision of the algorithm.Therefore, as far as the value of these two parameters is low, it shows the good performance of the proposed algorithm.Afterwards the impact of the number of cut off points changes on other factors such as P-value of test, Anderson-Darling statistic value, the number of failed value and the number of density function evaluation shall be studied (Table 4-8).

Table 4
The impact of the number of cut off points on speed of the algorithm  Increasing the number of division causes the shortening of p and consequently, increasing the algorithm precision.Therefore, if p goes toward zero, the precision of this algorithm approaches the precision of other exact methods such as inverse transformation method.As it is observed all criteria under study shall be remained robust by changing exponential distribution parameter.

The performance of the algorithm B based on theoretical and experimental results
Note that the term α plays an important role on algorithm B which uses acceptance rejection method and it is calculated as A A (Hormann et al., 2004).Consider we truncate 0.001 of the area from the end Therefore, the obtained results are as follows: E #f 0.028, α E I 1.015228 .It is interesting to see that the empirical results confirm the theoretical findings.

Conclusions
In this work, we introduced the importance of random variates in simulation and presented an explanation and classification about different random variates algorithms.The proposed model of this paper also presented a transformation approximate algorithm as a near exact UFP method.Our findings indicate that near exact UFP method outperforms approximate UFP method in generating random variates.Future research may be about making all relations theoretical, which has been performed in the article as experimental and simulation.

Fig. 2 .
Fig. 2. The performance of UFP method versus the other methods in terms of time

Fig. 3 .
Fig. 3.The performance of UFP method versus the other methods in terms of p-value

Fig. 6 .
Fig. 6.Hat and squeeze function for descending functions Fig. 7. Hat and squeeze function for ascending functions

Fig. 8 .
Fig. 8. Hat and squeeze function for the function has minimum point Fig. 9. Hat and squeeze function for the function has maximum point

Table 1
Classification of random variates generation algorithms To avoid time wasting, prior to evaluation of p A , if p D or p C, if P is located in D it is also located in A and if there is not P in C, P is not in A either.Efficiency of acceptance and rejection methods depends on two things: easiness and generation of variates from B and another one is the average number of variates generated from B for generation of a number of A (this quantity is defined as In this method, it is supposed that function B is analyzed to B , B ,… in such a way that from each subclass, it is possible to generate variate easier than B. B is the jth class of B. Sampling is as follows.First of all, a section (such as B ) is randomly selected.Then a point is selected uniformly from B .If the point is located in A, the point is accepted.To avoid time consuming operations, squeeze function is used.The foresaid technique is named as stratified acceptance-rejection method.Since random variate x in UFP method is calculated using the equation x int x ud , then the first part of this equation shows how the interval of x is generated and the next part of the equation generates uniform variate in the determined interval.Due to similarity of |A||B| where |A| shows area A).Now a technique is introduced to generate variate to make |A| |B| coefficient closer to 1.

Table 2
comparison with UFP method with other methods from speed point of views

Table 3
shows the result of comparison of various criteria for three algorithms based on exponential distribution with various parameters.In the following table, AD means Anderson-Darling test statistic value, and R is the numbering average of values rejected from 1000 generated random values.fmeans the number of density function evaluation for generation of 1000 random variates.Number of cut off points in Table3is equal to 128 sections, which are considered as power of two, and it is due to capability of programming language for powers of two.As we can observe from Table3, the foresaid methods are robust and stable compared with the changes of distribution parameters.In terms of precision, Algorithm A provides higher P-values and lower Anderson-Darling statistic value than ′ .Therefore A is more precise than A'.In addition, algorithm B seems to be faster compared with two other algorithms.It is due to decreasing number of density function evaluation, which is

Table 5
The impact of the number of cut off points on p-value of test

Table 6
The impact of the number of cut off points on Anderson-Darling statistic value

Table 7
The impact of the number of cut off points on number of failed value