The Ziggurat Method for Generating Random Variables

We provide a new version of our ziggurat method for generating a random variable from a given decreasing density. It is faster and simpler than the original, and will produce, for example, normal or exponential variates at the rate of 15 million per second with a C version on a 400MHz PC. It uses two tables, integers ki and reals wi. Some 99% of the time, the required x is produced by: Generate a random 32-bit integer j and let i be the index formed from the rightmost 8 bits of j. If j < ki return x = j wi. We illustrate with C code that provides for inline generation of both normal and exponential variables, with a short procedure for setting up the necessary tables.


Introduction
In the early 1980's we d e v eloped the ziggurat method for sampling from decreasing densities, Marsaglia and Tsang 9].The method was based on covering the target density w i t h a set of horizontal equalarea rectangles, a `cap' and a tail.The ziggurat appellation came from the appearance of the layered rectangles.A uniform point ( x y) from a randomly chosen rectangle provided the required variate x, most of the time after two table fetches and a test on magnitude.
But the original ziggurat method had a rather complicated procedure for providing the required x for the rare cases when the simple test on magnitude failed.We s h o w here how to form the covering rectangles so that a simpler procedure may be used for the rare cases, and no `cap' is necessary.
The idea of cutting a density i n to many small pieces, then choosing and sampling from one of them, goes back to the early 60's, Marsaglia 4].Particular methods for normal and exponential variates were described in 6,7,8], and they were, for many y ears, fast and widely used standard generators for many platforms.Many of the details are spelled out in successive editions of Knuth's Volume 2,2].
Rather than cutting the density into easily handled pieces|in e ect, expressing the density as a mixture of very simple densities and a complicated residual density, the ziggurat approach c o vers the target density with the union of a collection of sets from which it is easy to choose uniform points, then uses the rejection method.
While the method and the set up were complicated, the original ziggurat method 9] led to some of the fastest methods for normal, exponential and other decreasing densities.Our goal here is to provide a v ersion that is faster and simpler.Methods for normal variables by Ahrens and Dieter 1] and Leva 3] are said to be fast we p r o vide comparisons with those methods.
Details of the new ziggurat method are in Section 2, then Section 3 provides methods for setting up the ziggurat for table size 256 or 128 table size is best chosen a power of 2, to make random selection from the table easy: a 7 or 8-bit segment of a 32-bit random integer.
Section 4 gives speci c implementations for normal and exponential variates, Section 5 provides some time comparisons and Section 6 summarizes.

The Ziggurat Method
We begin with a description of the basic rejection method: Let C be the set of points (x y) under a plot of the curve y = f (x) with nite area.Let Z be a set of points containing C : Z C .The basic idea of the rejection method is: Choose random points (x y) uniformly from Z until you get one that falls in C, then return x as the required variate.The density of such an x will be cf (x), with c the normalizing constant that makes cf (x) a proper density.(Dealing with an unscaled f allows us to ignore the nuisance constants that are part of many densities.)Three of the main criteria for choosing the covering set Z are: 1) Make it easy and fast to select a random point ( x y) from Z 2) Make it easy and fast to decide whether the random (x y) f r o m Z also falls in C 3) Make a r e a (C)=area(Z), the e ciency of the rejection procedure, close to 1.
The ziggurat method for choosing the covering set Z comes closer to meeting all of these criteria than does any other general method we a r e a ware of.A crude version of the ziggurat method is pictured in Figure 1: Here, C is the region under the curve y = f (x) = e ;x 2 =2 x > 0, and Z is the union of 8 sets, 7 rectangles and a bottom strip tailing o to in nity.(We h a ve c hosen only 8 sets for clarity in practice we c hoose 64, 128 or 256 sets whose union is Z.)All 8 sets in the pictured Z have the same area, v, s o it is easy to choose one of the sets at random.Furthermore, 7 of the 8 sets are rectangles from which i t is easy to get a random point ( x y), and further yet, for those rectangles, it is easy to decide if (x y) is in C : If the rightmost coordinates of the rectangles are 0 = x 0 < x 1 < x 2 < < x 7 , and rectangle R i is selected, i > 0, then the x-coordinate of a random point i n R i is U x i with U uniform (0,1), and if x < x i;1 then the random point ( x y) m ust be in C, con rming the acceptance of x without having to generate y. (Assume rectangle R 0 is empty with right e d g e x 0 = 0, the left edge of R 1 .) And nally, it is easy to get a random point ( x y) from the base strip, as the base is itself a rectangle adjoined to the tail, x > x 7 .Let r be the rightmost x i , so that the tail is r < x .We m a y generate from the base strip as follows: generate x = v U = f (r), with U uniform (0,1).If x < r return x, else return an x from the tail.That provides an x from the base rectangle with the required probability: r f(r)=v.Note that code to provide from the rectangle part of the base strip can have the same form as that for the other rectangles: Form x and return that x after a succesful test on magnitude.
For the normal tail, the method of Marsaglia 5] provides: generate x = ; ln(U 1 )=r y = ; ln(U 2 ) until y + y > x x, then return r + x.For the exponential tail, of course, x = r ; ln(U).
For our fastest application, we form the ziggurat with layers of 255 rectangles with common area v, and a bottom strip of area v.The rectangles end at x 1 < x 2 < < x 255 = r.Assume those x's have been determined,as described in the next section.
Since, as in most applications, we provide uniform (0,1) variates U by oating a random integer (32-bit unsigned long), we m a y s a ve time by incorporating the oat operation into the step that forms the x's that are to be returned: For each v alue of the index i in 1 i 255, form the 32-bit integer k i = b2 32 (x i;1 =x i )c, and set w i = :5 32 x i .For the special index i = 0, set k 0 = b2 32 r f(r)=vc and w 0 = :5 32 v = f (r).
We m a y then describe the entire procedure with these simple steps: The Ziggurat Algorithm 1. Generate a random 32-bit integer j and let i be the index provided by the rightmost 8 bits of j .2. Set x = j w i .If j < k i return x. 3.If i = 0 return an x from the tail.
3 Setting up the ziggurat Given the (unscaled) target density f (x) x 0, we want to nd 255 (or 127, 63, etc.) equal-area rectangles, such as pictured in Figure 1, so that the covering set Z is very close to C.
A picture of a ziggurat with some of its 255 rectangles is in Figure 2, for the exponential density.The rectangles are so closely layered that images blur in the lower part if all 255 are shown.So, after the rst 20, only rectangles R 30 R 40 : : : R 250 and the nal R 255 are shown.Those few will give an idea of how closely Z covers C, and the closeness of x i;1 =x i to unity for almost all of the rectangles.The rectangles must be chosen so that their common area, say v, has the same value as that of the nal, base, region.How is this done?
The right edges of the rectangles are at 0 = x 0 < x 1 < x 2 < < x 255 .From the gure, it is clear that we m ust have, using r to designate the rightmost end point, r = x 255 : Even if v is given, it is not easy to nd the necessary x's.Instead, a feasible procedure is this: de ne a function of r, the rightmost end point, say z(r), by a sequence of Maple-like commands: for i from 254 by ; 1 to 1 do x i = f ;1 (v=x i+1 + f (x i+1 )) od return (v ; x 1 + x 1 f (x 1 )) Then the problem is to nd the value of r that makes z(r) = 0.For f (x) = e ;x 2 =2 the choice r = 3 :6541528853610088 will make z(r) = 0, and then from that value for x 255 the other x 0 s may b e found, from the relation x i = f ;1 (v=x i+1 + f (x i+1 )), a procedure that requires inverting f .
The common area of the rectangles and the base turns out to be v = :00492867323399, making the e ciency of the rejection procedure 99.33%.
For f (x) = e ;x , the exponential density, r = 7 :69711747013104972 makes z(r) = 0 and will assign the proper value to x 255 , from which the other x's follow.The common area of the rectangles turns out to be v = :0039496598225815571993, making the e ciency of the rejection procedure 98.9%.
For those who may w ant to use a table of 127 x's: for the (half-) normal density, t h e v alue x 127 = 3:442619855899 will serve to nd v and the other x's the e ciency is 98.78%.For the exponential, x 127 = 6:898315116616, for an e ciency of 97.98%.With memory so readily available, there seems little reason for choosing 127 rather than 255 x's, but we will use 127 for the normal distribution in the implementation below, because we h a ve to accomodate x < 0 a s w ell as x > 0.
Furthermore, with memory so cheap, it is feasible to speed up the generating process even more: form an auxiliary table of integers, k i = b2 32 (x i;1 =x i )c, and, rather than storing a table of x's, store w i as :5 32 x i .Then the fast part of the generating procedure is: Generate j .Form i from the last 8 bits of j .If j < k i , return x = j w i .(Special values k 0 = b2 32 r f(r)=vc and w 0 = :5 32 v = f (r) p r o vide for the fast part when i = 0 .)

Implementation
It turns out that storing a third table, f i = f (x i ) is no more costly in terms of the nal size of the compiled program than is a more complicated version that computes the f (x i )'s.Assuming that table, the essential part of the generating procedure in C looks like this, assuming the variables have been declared, static tables k 256],w 256] and f 256] are setup, and SHR3, UNI are inline generators of unsigned longs or uniform (0,1) variates, using the #define statements given below.The result is a remarkably simple generating procedure, using a 32-bit integer generator such as SHR3, described below: The in nite `for' is executed until one of the three return's provides the required x (better than 99% of the time from the rst return).The method works for any decreasing density, c o n veniently scaled as f (x), for which a tail method can be provided.For symmetric densities, the right half is used and a random attached.The procedures di er only in that di erent f 's are used in the third return three tables k 256],w 256],f 256] need be constructed from that function and di erent tail methods need be provided.Finally, before giving a C implementation that combines both an exponential and a normal generator, we point out a feature of C, the #define statement, that provides inline access to a sequence of expression evaluations, saving the overhead costs of calls to a procedure.For the ziggurat method, as we h a ve outlined it, the required variate is returned some 99% of the time after two table fetches and a test on magnitude.If these steps could be done inline, the overall average running time would be reduced.This can be done|if appropriate #define statements are used.
First, with all integers 32-bit unsigned long, one needs a #define statement that provides an inline random number generator: #define SHR3 (jz=jsr, jsr^=(jsr<<13), jsr^=(jsr>>17), jsr^=(jsr<<5),jz+jsr) then the following #define statement provides inline generation of the fast part of the ziggurat method for exponential variates: #define REXP (j=SHR3, i=(j&255), (j<kz i] ?j*wz i] : efix())) Here efix needs to be a procedure that returns a variate from the tail if i=0, or generates y in f (x i;1 ) < y < f (x i ) and returns x = j w z i] i f y < f (x), or else starts all over.And jsr,i,j need to be static variables.To a void possible confusion with the commonly used variable names i and j, w e designate them iz,jz in the code below.
We now give a single C program for the ziggurat method.It provides inline generation of either exponential (REXP) or normal (RNOR) random variables.To do this, we m ust have seperate arrays for exponentials and normals: ke 256],we 256],fe 256],kn 128],wn 128],fn 128], initialized in a single procedure, zigset, and seperate ` x' procedures that provide for generation when that 99%, the inline part, does not provide an immediate value.
fe 255]=exp(-de) for(i=254 i>=1 i--) { de=-log(ve/de+exp(-de)) ke i+1]= (de/te)*m2 te=de fe i]=exp(-de) we i]=de/m2 } } \begin{verbatim} These comments apply to use of the above code: A main program must be provided, in which use of REXP or RNOR in an expression will provide the required variate.Before such uses, the tables must be set up by m e a n s of a statement s u c h as zigset(17235321) with any (non-zero) unsigned long argument.
SHR3 uses an inline 3-shift shift register and UNI oats it to (0,1).SHR3 is very fast, has period 2 32 ; 1 and does very well in tests of randomness|in particular for sequences made from the last 8 bits.SHR3 adds two successive terms of the sequence y n+1 = y n (I + L 13 )(I + R 17 )(I + L 5 ), with the y 0 s viewed as 1 32 binary vectors, L the 32 32 matrix that causes a left-shift of 1, and R its transpose.The resulting matrix T = ( I + L 13 )(I + R 17 )(I + L 5 ) has order 2 32 ; However, if a method is a record for speed, and at the same time comes from a simple generating procedure, then it is worth considering among the many that have been put forward we think the ziggurat method is such a o n e .

Summary
Extremely fast and simple sampling from decreasing densities can be provided by c o vering, as in Figures 1 and 2, the decreasing density f (x) with layers of 255 equal-area rectangles such that the base, made up of a smaller rectangle with the tail attached, has the same area, v, as each o f the 255 rectangles.The rectangle R i has range 0 < x < x i , with 0 = x 0 < x 1 < < x 255 , and with vertical range f (x i ) < y < f (x i;1 ).The x's are related by x i f (x i;1 ) ; f (x i )] = v and r f(r) + Z 1 r f (x) dx = v with r = x 255 : The x's may be found by successively trying values for r = x 255 until the process v = r f(r) + Z 1 r f (x) dx and x i f (x i;1 ) ; f (x i )] = v for i = 2 5 4 : : : 2 1 yields an x 1 for which x 1 (f(0) ; f (x 1 )) = v.
Implementation is made faster by using tabled values w i = x i =2 32 , k i = b2 32 (x i;1 =x i )c and f i = f (x i ).The fast part, used about 99% of the time, can be implemented inline, for speed and for the convenience of being able to produce a required variate by merely placing, for example, REXP or RNOR in any expressions where such random variables are required.

Figure 1 :
Figure 1: The ziggurat method with 7 rectangles and a bottom, base strip.
nd that the ziggurat algorithm can provide RNOR or REXP, normal or exponential variates, at about 15 million per second for CPU's operating at 400MHz.Exact timings vary, of course, with the platform used and the compiler.We compared times for two other methods said to be fast:Leva 3] and Ahrens-Dieter 1].We used a 400 MHz Pentium II PC with two d i e r e n t compilers: Microsoft Visual C++ (MS) and GNU gcc with -O3 optimization.To p r o vide a fair comparison, we used the fast inline SHR3 or UNI wherever random integers or reals were needed.Times are in nanoseconds. of the criteria by which r a n d o m v ariate generators are judged, but not the only one. We