Random Number Generators: There should be only one

Posted in Java by Dan on January 11th, 2009

I came across this interesting RNG-related question on StackOverflow the other day. Since I spent a fair bit of time constructing an answer, I thought I’d re-post it here and expand on it slightly.

Basically, the questioner wanted to know why the code below produced distinctly non-random results. It generates 1500 values between 0 and 7 and keeps a tally of how often each value occurs. The results were 1310 fives and 190 sixes (the numbers 0, 1, 2, 3, 4 and 7 did not occur even once). The expectation was that the values would be uniformly distributed, with each of the eight values occurring an approximately equal number of times.

int[] vals = new int[8];
for (int i = 0; i < 1500; i++)
    vals[new Random(i).nextInt(8)]++;
System.out.println(Arrays.toString(vals));

The most suspicious part of this code is that it creates and seeds a new random number generator for every iteration. My initial, brief answer was the suggestion that the code should use only a single RNG instance. It’s good advice and solves the problem for this particular code, but it doesn’t explain why the original version generates only fives and sixes. It’s not like every RNG instance has the same seed, in fact every single one is different (the loop index is used as the seed). Somebody else asked if I knew why the RNGs were behaving the same even though they were seeded differently.  Well I didn’t know, but I decided to find out.

I was pretty sure it was because the seeds were non-random (I’ve discussed the seeding of RNGs previously). Because the seeds were 64-bit integers in the range 0-1500, the top 53 bits were identical (all zeros) across all of the seeds. Ideally the RNG implementation would be able to overcome this lack of diversity in the seed data, but apparently not (if we repeat the experiment with a better RNG, such as the MersenneTwisterRNG, we do not see such obvious anomalies). To see how this all plays out to generate only fives and sixes requires a detour into the source code of java.util.Random.

When you specify a seed in the constructor, it is manipulated as follows:

seed = (seed ^ 0x5DEECE66DL) & mask;

Where the mask simply retains the lower 48 bits and discards the others.

When generating the actual random bits, this seed is manipulated as follows:

randomBits = (seed * 0x5DEECE66DL + 0xBL) & mask;

Now if you consider that the seeds used were sequential (0 – 1499), and they were used once then discarded, the first four seeds generated the following four sets of random bits:

101110110010000010110100011000000000101001110100
101110110001101011010101011100110010010000000111
101110110010110001110010001110011101011101001110
101110110010011010010011010011001111000011100001

Note that the top 10 bits are indentical in each case. This is a problem because we only want to generate values in the range 0-7 (which only requires a few bits) and java.util.Random does this by shifting the higher bits to the right and discarding the low bits. It does this because in the general case the high bits are more random than the low bits. In this case they are not because the seed data was poor.

Finally, to see how these bits convert into the decimal values that we get, you need to know that java.util.Random makes a special case when n is a power of 2. It requests 31 random bits (the top 31 bits from the above 48), multiplies that value by n and then shifts it 31 bits to the right.

Multiplying by 8 (the value of n in this example) is the same as shifting left 3 places. So the net effect of this procedure is to shift the 31 bits 28 places to the right. In each of the 4 examples above, this leaves the bit pattern 101 (or 5 in decimal). In the example code, 1390 different seeds resulted in this particular 3-bit sequence. The others resulted in the sequence 110 (or 6 in decimal).

If we didn't discard the RNGs after just one value, we would see the sequences diverge. While the first four RNG sequences all start with 5, the second values of each are 6, 0, 2 and 4 respectively. The small differences in the initial seeds start to have an influence.

The Singleton RNG

As this problem illustrates, having multiple RNGs in a single application can be problematic. If the seeds are not completely independent of each other, the outputs will not be independent. In this case the seed of one RNG is derived from the seed of the previous RNG (by incrementing it). Everything is much simpler if we only have a single random number generator. Fortunately, java.util.Random, SecureRandom and all of the Uncommons Maths RNGs are thread-safe. A single instance can be accessed from multiple threads without causing problems, so there usually is no need to have multiple RNGs even in complex applications.

There a few ways that you can make sure that you only use a single RNG. You could use the singleton pattern to restrict the number of instances to one, but that approach is frowned upon these days. Dependency injection is the more politically correct alternative. Just pass your single RNG instance as an argument to the constructor of each object that needs to use random numbers. A third option is the functional style, passing the RNG as an argument to each method that needs it.

Repeatability in Multi-Threaded Applications

Repeatability is a useful feature of deterministic pseudo-RNGs. If you use the same seed for two RNG instances, they will produce the same output sequence. This is helpful for debugging applications because it means you can run your random code multiple times and have it behave identically each time.

It seems that the motivation for multiple RNGs in the original StackOverflow question was to maintain this repeatability in the presence of multiple threads (though that is not immediately obvious from the example). A single RNG won't be able to guarantee repeatability in this scenario without a little bit of extra work. The reason for this is the uncertainty introduced by the vagaries of the scheduler. You can guarantee the repeatability of the RNG output, but you can't guarantee that individual values will always be picked up by the same threads.

There are a couple of solutions to this problem that don't involve using one RNG per thread. They both involve decoupling the generation of the random numbers from their consumption. The simplest approach, if you know exactly how many numbers each thread needs, is to pre-generate all of the required random numbers and to pass them to the individual threads in arrays or lists. The numbers are generated in a predictable order but the order in which they are consumed is unspecified (and unimportant because we have partitioned them).

A variation on this approach, for when you don't know how many numbers are required or when it is impractical to pre-generate all of the values and hold them all at once, is to feed the numbers to the threads using separate queues. The RNG can fill the queues in a round-robin fashion while the threads are consuming them at their own pace.

Lessons Learned