and

We describe three tests of randomness—tests that many random number generators fail. In particular, all congruential generators—even those based on a prime modulus—fail at least one of the tests, as do many simple generators, such as shift register and lagged Fibonacci. On the other hand, generators that pass the three tests seem to pass all the tests in the Diehard Battery of Tests. Note that these tests concern the randomness of a generator's output as a sequence of independent, uniform 32-bit integers. For uses where the output is converted to uniform variates in [0,1), potential flaws of the output as integers will seldom cause problems after the conversion. Most generators seem to be adequate for producing a set of uniform reals in [0,1), but several important applications, notably in cryptography and number theory—for example, establishing probable primes, complexity of factoring algorithms, random partitions of large integers—may require satisfactory performance on the kinds of tests we describe here.


Introduction
Judging from journal articles and the far more topical content of newsgroup exchanges, there is increasing interest in the use of random number generators (RNG's) and their suitability for various kinds of computer simulations and for use in cryptography and computational number theory.Much of the interest in RNG's may be classified under two headings: (1) Which are good RNG's and (2) How do you decide?
For question (1), we have suggested a number of answers, see [3,5,7], and this note is directed toward question (2), for which we suggest three tests of randomness that serve to help distinguish the good from the not-so-good.
In an effort to provide tests that were more stringent than the usual easy-to-pass tests first described in [2] and extended in Knuth's widely read book [1], Marsaglia described several more tests in [3], then added others to make up the Diehard Battery of Tests of Randomness [6].These tests were included in a CDROM produced under a grant from the National Science Foundation.The CDROM provided a reliable source of random bits (600 megabytes) as well as the Diehard Battery of Tests of Randomness.Some 1000 free copies were distributed to interested researchers worldwide.
The stock of free copies was soon exhausted, but the CDROM is readily available on the internet, from sites at Florida State University and the University of Hong Kong, and it has been accessed hundreds of thousands of times.
The three tests we describe here may be viewed as a distillation of the Diehard battery of tests: reduce the volume and concentrate the essence.Based on our experience, if a random number generator passes the three tests described here, it is likely to pass the tests in Diehard.In addition, this 'distilled version' of Diehard is much easier to apply, because unlike Diehard, which requires a binary file of some 12 megabytes of random bits (and the concern of many queries on creating such files), the three tests here merely require the name of the C function that provides the 32-bit random integers to be tested.
In order to save journal space, specific versions of the three tests are not included, but C versions are readily available, and will be included in new versions of the Diehard CDROM available via the internet.+ 0 If u and v are random 32-bit positive integers, then three objects result from applying Euclid's algorithm to the pair u, v: (1) the number the iterations, k, needed to find the gcd (k=5 in the above example), (2) a variable-length sequence of partial quotients (1,4,3,3,2 in the example) and (3) the gcd-the final value of u (3 in the example).

The gcd Test
Repeating this procedure for many random choices of pairs u, v will produce three lists: (1) a list of k's that are independent and identically distributed (iid), (2) a list of variable-length sequences of partial quotients with elements not iid, and (3) a list of resulting gcd's that are iid.
Successive steps of Euclid's algorithm provide the partial quotients q of the continued fraction expansion of a ratio u/v.Such q's also arise as the partial quotients of the continued fraction expansion of a random real in (0,1].In the latter case, the q's are nearly iid with a distribution conjectured by Gauss: Pr[q < x] = ln(1 + x)/ ln (2).A thorough treatment of Euclid's algorithm, its history and relation to continued fractions, are in Knuth [1], with old and new results (pages 333-379, with extensive "Answers to Exercises" in pages 640-657).
The q's that arise from a random u ∈ {1, 2, . . ., 2 32 −1} and fixed v = n = 2 32 seem adequately close to iid with Gauss's conjectured distribution.But the q's that arise from both u and v randomly chosen from 1 to n = 2 32  −1, although they also seem to be close to iid, have a different distribution, the mixture with weights 1/n of the distributions with fixed v from 1 to n and u random from 1 to v. We have found that apparent distribution empirically.But we have not included a test on the distribution for such q's, as it did not turn out to be as discriminating as do tests on k and the gcd.
Thus we restrict ourselves to the true iid situations: the k values and the gcd values that result from many random choices of u and v in {1, 2, . . ., 2 32 −1}.It turns out that if we choose u and v as 32-bit integers from a random number generator, the distribution of k and the gcd will, for some generators, be significantly different from what we would expect from two truly independent random 32-bit integers.Note that in determining k, the number of steps in Euclid's algorithm, the first step may be 'wasted' if u > v-that is, the first partial quotient may be 0 and one proceeds as in versions of the algorithm where u and v are interchanged if necessary.The latter provides k's after one fewer steps.We have chosen to use the k count for unswitched u and v.
Unfortunately, we do not know the true distribution of k, the number of iterations needed, although extensive empirical study shows that k is quite close to normally distributed with mean µ = 18.5785 and σ = 3.405.Since the discrete variate k is nearly normal, its distribution might be adequately represented by a member of one of the standard discrete families.Poisson will not do, since the mean is not close to the variance, but binomial might serve.The mean of the binomial distribution is Np, and the variance is Npq.Setting the ratio to q = Npq/(Np) = σ 2 /µ = 3.405 2 /18.5785 = .624,say, we should consider a binomial distribution with p = .376,q = .624.But what value for N ?.It has been known since 1733 that k, the number of steps in Euclid's algorithm, takes maximum values N when v and u are successive Fibonacci numbers-see Knuth [1], p360.This serves as a guide for possible values of N for our binomial approximation.The Fibonacci number closest to n = 2 32 is F 48 .Experimenting with a few values for N near 48 yields the following: the distribution of k seems best fit to the binomial distribution when N = 50 and p = .376.The fit is quite good, as indicated in Figure 1, which contains the points (i, Pr[k = i]), (circles), the binomial probabilities for n = 50, p = .376(crosses), and the normal density with mean µ and standard deviation σ .
The binomial probabilities, Pr[k = i] = 50 i .376i .524 50−i serve quite satisfactorily for a test on the k values, (truncating, say, for k ≤ 3 and k ≥ 35), but we use a little more accurate table based on extensive simulation in our version of the gcd test.
(For the general case, we have found that for two random choices of u, v in 1, 2, . . ., n, the expected number of steps in Euclid's algorithm is .842766ln(n) + .06535,and the variance is .5151ln(n) + .1666.) As for the gcd's, when u and v are chosen randomly from a range as large as 1 to 2 32 , the distribution of their gcd is adequately close to the theoretical limit, Pr[gcd = j] = c/j 2 , with c = 6/π 2 .(The probability that u and v are both multiples of d is proportional to 1/d 2 , and the required probability is Pr given that fact.The constant comes from 1 ) Note that we do not need the true distributions to develop a test of randomness.All we need do is compare the sample distribution from a particular RNG with the standard provided by a number of presumably good RNG's, 'presumably good' meaning that they produce results so close to a single one that the single one may be used as a standard.Such a standard is based on a large number of sample distributions, each based on many repetitions (typically 10 10 or 10 11 ) of the test procedure for RNG's that appear to give similar results.Then, if a particular RNG produces results significantly far from the standard, we say that the generator fails the test.This will usually result from a goodness-of-fit test comparison with the standard, returning a p-value very close to 0 or 1.
Use of the empirical distribution for k and the assumed iid gcd's, based on extensive simulations, leads to a test of randomness that shows some RNG's are significantly different from others when applying Euclid's algorithm to pairs of values from the generator.The most frequent disparity is in the distribution of k, the number of steps needed to terminate Euclid's algorithm, but some RNG's also differ significantly in the distribution of the gcd's.
We call this the gcd test.A C version is available separately, but here is an outline of its features: Tables that provide probabilities for the expected values for the k's are included.Any k values< 3 are lumped at 3; k values>35 are lumped at 35.Probabilities for the gcd's have the form c/j 2 and need not be stored in advance, while probabilities for a gcd >100 are lumped at 100.Tables for the cell counts for the k's and for the gcd's , ktbl[36] and gcd[101] are initialized to 0. The following C segment generates n (usually 10 7 ) 32-bit pairs u and v, then increments the gcd and k tables for each pair; (IUNI is a #define that provides the random 32-bit integer): for(i=1;i<=n;i++){k=0; do{u=IUNI; v=IUNI;} while (u==0 || v==0); //get non-zero u and v do{w=u%v;u=v;v=w;k++;} while(v>0); if(u>100) u=100; gcd A standard goodness-of-fit test, (obs-exp) 2 /exp, compares the 33 lumped ktbl counts with the expected counts and converts to a p-value via a chi-square distribution with 32 degrees of freedom.A standard goodness-of-fit test compares the 100 lumped gcd counts with the expected counts and converts to a p-value via a chi-square distribution with 99 degrees of freedom.
Congruential generators x n = ax n−1 mod p, with p a prime and a a primitive root, consistently fail the gcd test, returning a p-value of 1.0000 to indicate a terrible fit to the required k distribution.So do the most commonly used of all RNG's: congruential x n = ax n−1 + odd c mod 2 32 .(For prime moduli> 2 32 , the 32-bit portion is returned via a mask, while for p = 2 32 − 5 the result is taken to be a random 32-bit integer.)However, extended congruential generators, x n = a 1 x n−1 + • • • + a r x n−r mod p for r > 1 and p a prime near 2 32 (using a 32-bit mask if p > 2 32 ) seem to pass both the tests: distribution of k's and distribution of the gcd's.
To look more closely at the number of steps in Euclid's algorithm for various RNG's, here is a table of the frequencies for k ≤ 3, k = 4, . . ., 11 after 10 7 calls to several full-period congruential generators, (moduli 2 32  For details of KISS (Keep It Simple Stupid) and SHR3, a three-shift shift register generator, see [6,7] or search 'discussions' on www.deja.comfor KISS+Marsaglia, or SHR3+Marsaglia using the 'all' and 'complete' archives.
It is clear that the distribution of k, the number of steps in Euclid's algorithm for two random 32-bit integers, departs significantly from the true one, when u and v are chosen with a congruential RNG, whether with prime modulus or modulus 2 32 .They all seem to produce a p-value very close to 1 from the standard (obs-exp) 2 /exp goodness-of-fit test.
It turns out that the average values for k are satisfactorily close to the value 18.7585 found from extensive simulation.(In our investigations, we found Knuth's expression for the expected value of k, 12 ln(2) ln(n)/π 2 + .06can be more accurately represented by 12 ln(2) ln(n))/π 2 + 0.06535.See Knuth [1], pages 337 and 372.)For the distribution of the gcd of two random 32-bit integers, the results are not as striking.It is obvious that two successive values from a congruential generator x n = ax n−1 + odd c mod 2 32 will never have an even gcd, because the trailing bit alternates.As for the congruential generators modulo a prime, most of them seem to pass the gcd distribution test, but not all of them-for example, 69070 is a primitive root for the prime 2 32  −5, but the congruential generator x n = 69070x n−1 mod 2 32 −5 provides an unsatisfactory distribution for the gcd of two successive outputs.Here are a few examples of expected and observed values for the gcd test.

The Gorilla Test
Merits of this test are suggested by a quote from reference [4]: Few images invoke the mysteries and ultimate certainties of a sequence of random events as well as that of the proverbial monkey at a typewriter.
The idea is to use the random number generator to produce a sequence of 'letters' from an alphabet, then study the frequency of k-letter words in that sequence.These were called monkey tests in [4], and they account for many of the most-difficult-to-pass tests in the Diehard battery.Making the sequences very long serves to show the shortcomings of many RNGs, particularly those whose relatively short periods make them unable to produce a reasonable proportion of the possible k-tuples.
Probably the best way to assess the distribution of k-letter words produced by the 'monkey' is to create a cell for each possible word, then count the number of appearances of each word in a long sequence of letters produced by the monkey.Then the quadratic form in a weak inverse of the covariance matrix of the zeroadjusted cell counts provides a chi square test.That turns out to be Q k − Q k−1 , where Q k is the naive Pearson quadratic form, (observed-expected) 2 /expected, for words of length k, see Marsaglia [3].Unfortunately, there are usually too many possible k-letter words to keep cell counts for each one.An alternative, as advocated in [3], is to count the number of missing k-letter words in a long string produced by the monkey.The resulting count should be approximately normally distributed with mean determined by theory and variance determined by extensive simulation.
That is the basis of the test included here.As a strong version of a monkey test, we call it the gorilla test.It is applied as follows: For 32-bit integers produced by the generator, specify one of the 32 possible bit positions, with the bits numbered 0 to 31 from most-to least-significant.Form a sequence of 2 26  + 25 such bits, made up of the designated bit of each of 2 26  + 25 integers from the generator.If x is the number of 26-bit 'words' that do not appear in that sequence, then x should be approximately normally distributed with mean 24687971 and standard deviation 4170, so that ((x − 24687971)/4170) should be uniformly distributed in [0, 1), that is, provide a p-value for the test, where () is the standard normal distribution function.
Below are examples of the output of the Gorilla Test for various RNG's.The output contains the p-value for each of the 32 bit positions of the generator's output.Recall that we first specify the bit position to be studied, from 0 to 31, say bit 3. Then we generate 67,108,889 (2 26  + 25) numbers from the generator and form a string of 67,108,889 bits by taking bit 3 from each of the generator's numbers.In that string of 67,108,889 bits we count the number of 26-bit strings that do not appear.That count should be approximately normal with mean 24687971 and standard deviation 4170.Converting this by means of the appropriate normal probability transformation, we get a number between 0 and 1, a p-value.The output of the Gorilla Test gives the pvalues for each of the 32 bit positions, then does an Anderson-Darling-Kolmogorov-Smirnov test on those 32 presumed uniform (0,1) values.

The Birthday Spacings Test
The birthday spacings test is an extension of the iterated spacings test described by Marsaglia in [3].If m birthdays are randomly chosen from a year of n days, then sorted, the number of duplicated values among the spacings between those ordered birthdays will be asymptotically Poisson distributed with parameter λ = m 3 /(4n).Theory provides little guidance on speed of the approach to limiting form, but extensive simulation with a variety of RNG's provides values of m and n for which the limiting Poisson distribution seems satisfactory.Among these is m = 1024 birthdays for a year of length n = 2 24 with λ = 16, the version used in Diehard [6], and the more stringent versions, n = 2 20 , 2 23 , 2 26 , 2 29 , 2 32  ; m = 256, 512, 1024, 2048, 4096.Experience suggests that a RNG that passes the birthday spacing test for n = 2 32 , m = 2 12 , λ = 4 will pass the test for those other values of n, m, λ, so we have chosen n = 2 32 and m = 2 12 , λ = 4, as the most-difficult-to-pass version of the birthday spacings test.That is the one used here.We shorten the name to the bday test.It goes as follows: The RNG in question produces 4096 birthdays.Each birthday is a 32-bit integer, a 'day' in a 'year' of 2 32 days.The 4096 birthdays are sorted, then differences taken to get the spacings.Then the spacings are sorted, from which a count of the number of duplicate spacings is evident.That provides a purported Poisson value with mean λ = 4.This process is repeated to get a sample of 5000 from that Poisson distribution.Then a goodness-of-fit test, (observedexpected) 2 /expected, is applied to the Poisson sample, the result inserted into the appropriate chi-square distribution function to get a value in [0, 1)-a p-value.
Here is an example of output from the bday test for a good RNG: Notice the different patterns of the observed versus expected values for generators that fail the birthday spacings tests.For some generators, there are far too many values less than the mean, 4, while for others there are far too many values greater than 4.

Summary and Remarks
We have presented three tests: gcd, Gorilla and bday.The first test, gcd, tests whether k, the number of steps to completion of Euclid's algorithm, and the resulting gcd, both have the distributions called for by the underlying probability theory for two independent uniform 32-bit random integers.Many RNG's fail this test; in particular congruential RNG's-even those with a prime modulus-seem to fail the test on the distribution of k, and often on the distribution of the gcd.An analogous test could be used for RNG's producing more than 32 bits; the gcd's should take value j with frequency proportional to 1/j 2 , but the distribution of k, steps to completion, may require a different table of probabilities.
Failure of a RNG to satisfy randomness aspects of Euclid's algorithm, the oldest and one of the most fundamental of those in number theory, may be of concern to those using procedures in computational number theory where conclusions are based on the assumption that random selections of integers from 1 to n are indeed independent and uniform.Examples are in probable prime tests or the complexity of factoring algorithms that depend on random selections.
The second test, the Gorilla Test, is based on what we have called Monkey Tests, in the sense of the RNG as a monkey at a typewriter, randomly producing a very long string of keystrokes.The number of missing k-letter words should be approximately normal with certain mean and variance depending on word size, alphabet size and probabilities for letters of the alphabet.A particularly strong monkey test, hence the Gorilla Test, chooses a particular bit from each 32-bit random integer, forms a string of 2 26 such bits, counts the number of missing 26-bit 'words' in that long string.The test is applied to each of the 32 bit positions; surprisingly, some RNG's consistently fail for certain bit positions.
An analogous version could be used for generators producing fewer or more than 32 bits.The third test, bday, is a strong version of Marsaglia's Birthday Spacings Test, in which each random integer is used to determine a birthday in a year of n days.With m birthdays chosen and put in increasing order, the number of duplicate values among the spacings between the birthdays should have a Poisson distribution with parameter λ = m 3 /(4n).The version here is strong in the sense that many RNG's might do well for smaller m's and n's, but fail as one or the other is increased.A year of n = 2 32 days is the implicit limit for 32-bit RNG's, while choosing m = 4096 birthdays seems to provide a test that some otherwise promising RNG's fail.
Analogous versions could be applied to RNG's on 48, 64 bits, etc, with n increased.Applications where tests such as bday may have implications are in computational number theory, where many procedures use a selection of random integers from 1 to n and conclusions are based on the assumption that they are independent and uniform.Of course, such applications often involve random integers with hundreds of bits, but experience with 32 bits suggests that analogous tests for larger integers may be worth considering.
We begin with an example: Consider integers u = 297 and v = 366.Use Euclid's algorithm to determine the gcd of u and v:

Figure 1 :
Figure 1: Points (i, Pr[k = i]), binomial probabilities and the normal density, µ = 18.7585, σ = 3.405 gcd And here are outputs for several generators that fail the bday test: