Random Number Generators: There should be only one
I came across this interesting RNG-related question on StackOverflow the other day. Since I spent a fair bit of time constructing an answer, I thought I’d re-post it here and expand on it slightly.
Basically, the questioner wanted to know why the code below produced distinctly non-random results. It generates 1500 values between 0 and 7 and keeps a tally of how often each value occurs. The results were 1310 fives and 190 sixes (the numbers 0, 1, 2, 3, 4 and 7 did not occur even once). The expectation was that the values would be uniformly distributed, with each of the eight values occurring an approximately equal number of times.
int[] vals = new int[8]; for (int i = 0; i < 1500; i++) vals[new Random(i).nextInt(8)]++; System.out.println(Arrays.toString(vals));
The most suspicious part of this code is that it creates and seeds a new random number generator for every iteration. My initial, brief answer was the suggestion that the code should use only a single RNG instance. It’s good advice and solves the problem for this particular code, but it doesn’t explain why the original version generates only fives and sixes. It’s not like every RNG instance has the same seed, in fact every single one is different (the loop index is used as the seed). Somebody else asked if I knew why the RNGs were behaving the same even though they were seeded differently. Well I didn’t know, but I decided to find out.
I was pretty sure it was because the seeds were non-random (I’ve discussed the seeding of RNGs previously). Because the seeds were 64-bit integers in the range 0-1500, the top 53 bits were identical (all zeros) across all of the seeds. Ideally the RNG implementation would be able to overcome this lack of diversity in the seed data, but apparently not (if we repeat the experiment with a better RNG, such as the MersenneTwisterRNG, we do not see such obvious anomalies). To see how this all plays out to generate only fives and sixes requires a detour into the source code of java.util.Random.
When you specify a seed in the constructor, it is manipulated as follows:
seed = (seed ^ 0x5DEECE66DL) & mask;
Where the mask simply retains the lower 48 bits and discards the others.
When generating the actual random bits, this seed is manipulated as follows:
randomBits = (seed * 0x5DEECE66DL + 0xBL) & mask;
Now if you consider that the seeds used were sequential (0 – 1499), and they were used once then discarded, the first four seeds generated the following four sets of random bits:
101110110010000010110100011000000000101001110100 101110110001101011010101011100110010010000000111 101110110010110001110010001110011101011101001110 101110110010011010010011010011001111000011100001
Note that the top 10 bits are indentical in each case. This is a problem because we only want to generate values in the range 0-7 (which only requires a few bits) and java.util.Random does this by shifting the higher bits to the right and discarding the low bits. It does this because in the general case the high bits are more random than the low bits. In this case they are not because the seed data was poor.
Finally, to see how these bits convert into the decimal values that we get, you need to know that java.util.Random makes a special case when n is a power of 2. It requests 31 random bits (the top 31 bits from the above 48), multiplies that value by n and then shifts it 31 bits to the right.
Multiplying by 8 (the value of n in this example) is the same as shifting left 3 places. So the net effect of this procedure is to shift the 31 bits 28 places to the right. In each of the 4 examples above, this leaves the bit pattern 101 (or 5 in decimal). In the example code, 1390 different seeds resulted in this particular 3-bit sequence. The others resulted in the sequence 110 (or 6 in decimal).
If we didn’t discard the RNGs after just one value, we would see the sequences diverge. While the first four RNG sequences all start with 5, the second values of each are 6, 0, 2 and 4 respectively. The small differences in the initial seeds start to have an influence.
The Singleton RNG
As this problem illustrates, having multiple RNGs in a single application can be problematic. If the seeds are not completely independent of each other, the outputs will not be independent. In this case the seed of one RNG is derived from the seed of the previous RNG (by incrementing it). Everything is much simpler if we only have a single random number generator. Fortunately, java.util.Random, SecureRandom and all of the Uncommons Maths RNGs are thread-safe. A single instance can be accessed from multiple threads without causing problems, so there usually is no need to have multiple RNGs even in complex applications.
There a few ways that you can make sure that you only use a single RNG. You could use the singleton pattern to restrict the number of instances to one, but that approach is frowned upon these days. Dependency injection is the more politically correct alternative, just pass your single RNG instance as an argument to the constructor of each object that needs to use random numbers. A third option is the functional style, passing the RNG as an argument to each method that needs it.
Repeatability in Multi-Threaded Applications
Repeatability is a useful feature of deterministic pseudo-RNGs. If you use the same seed for two RNG instances, they will produce the same output sequence. This is helpful for debugging applications because it means you can run your random code multiple times and have it behave identically each time.
It seems that the motivation for multiple RNGs in the original StackOverflow question was to maintain this repeatability in the presence of multiple threads (though that is not immediately obvious from the example). A single RNG won’t be able to guarantee repeatability in this scenario without a little bit of extra work. The reason for this is the uncertainty introduced by the vagaries of the scheduler. You can guarantee the repeatability of the RNG output, but you can’t guarantee that individual values will always be picked up by the same threads.
There are a couple of solutions to this problem that don’t involve using one RNG per thread. They both involve decoupling the generation of the random numbers from their consumption. The simplest approach, if you know exactly how many numbers each thread needs, is to pre-generate all of the required random numbers and to pass them to the individual threads in arrays or lists. The numbers are generated in a predictable order but the order in which they are consumed is unspecified (and unimportant because we have partitioned them).
A variation on this approach, for when you don’t know how many numbers are required or when it is impractical to pre-generate all of the values and hold them all at once, is to feed the numbers to the threads using separate queues. The RNG can fill the queues in a round-robin fashion while the threads are consuming them at their own pace.
Lessons Learned
- Use a better RNG than java.util.Random
- Be careful when generating seeds.
- Don’t use multiple RNGs in a single application.

on January 13th, 2009 at 12:13 am
Hi!
Unfortunately, your advice is not good. You should have one RNG per random number sequence you want. Having one global RNG used everywhere, for different purposes, will not be good, since picking random random numbers from a good generator will NOT result in well distributed numbers. Please note that I did not write “random” twice after eachother in the last sentence by mistake.
Mats
on January 13th, 2009 at 3:26 am
Mats, can you expand on what you see as the problem?
I wasn’t specifically thinking about one RNG for different purposes, rather the scenario where you have one problem split across multiple threads (for speed-up reasons) with each thread doing an equal amount of work. If a single RNG feeds the threads in round-robin fashion, then thread 1 will get the 1st, 11th, 21st, 31st… numbers from the sequence. Taking every 10th number from an RNG and ignoring the intermediate values yields a sequence that is no less random than using all of the values. In fact this technique is sometimes used to make pseudorandom sequences harder to reverse-engineer.
But you’re specifically talking about “randomly” picking the values from the RNG. Correct me if I’m wrong, I assume you’re talking about the uncertainty introduced by having several threads fighting over a single RNG (in the case where we are making no effort to maintin repeatability). Are you saying that having multiple clients for a single RNG introduces bias?
For example, an Internet poker site has thousands of tables. Each of these tables has to shuffle a deck of cards for each hand. Are you saying that each table needs its own RNG to make it fair, rather than calling into a shared RNG whenever it needs to shuffle?
on January 13th, 2009 at 7:26 am
I’m pretty sure the answer is something like: if you are using a PRNG for the purpose of defeating predictable patterns(as in poker games), you are doling out numbers from a finite sequence that eventually repeats. Distributing the generation sequence means you generate more numbers and decreases the time to repeat. As well, if you want to reproduce results, a shared PRNG complicates matters considerably.
If you were using a truly random source, I don’t think the first would matter, and the latter would be made impossible.
on January 13th, 2009 at 9:10 am
Sorry, I think you are wrong! Picking numbers 1, 11, 21, 31,… or randomly, will give you a LESS well distributed random sequence than picking values 1, 2, 3, 4, …
Many people try to be smart, by first picking a good RNG, then they believe that the randomness will be even better if they pick numbers randomly out of the generated sequences. The fact is that the numbers get LESS randomly distributed that way, because these sequences aren’t really random, they’re pseudo random.
I did some research on this when I was writing up a proposal for having standard RNG classes in ISO/ANSI C++, around 1995 or so.
You DID actually write that one RNG per application would be a good idea:
A single instance can be accessed from multiple threads without
causing problems, so there usually is no need to have multiple
RNGs even in complex applications.
In complex applications you normally need more than just one random sequence, right?
I see the need to disguise the RNG used, for example in poker applications. Picking randomly out of such sequences, or every tenth number, will of course make it less easy for a bystander to guess the seed or RNG type, but the used numbers will be less well distributed. Perhaps this will not matter much, but if you do scientific calculations (not poker) I would NOT recommend such a technique.
And, yes, I would recommend having one RNG per table. But preventing an educated guess of what seed was used, and what the RNG is used, is of course a problem.
I don’t have my reference literature at hand (I will tomorrow), but I think my claim is fairly well documented in, for example, “Numerical Methods” by Dahlquist and Björk, or possibly “Numerical Recipes 3rd Edition” by Press, Teukolsky, Vetterling and Flannery.
on January 13th, 2009 at 2:20 pm
@James, good point about the period, but if you choose the right RNG (such as CMWC), it’s period will be so long that in practice it doesn’t really matter. The point about the complications of repeatability was one of things I was addressing. Yes, you can have one RNG per thread, and that’s not necessarily a bad thing, as long as you seed them correctly and independently.
@Mats, the complexity I meant was the complexity of multi-threading. Even so, I still believe that you can use a shared RNG for different purposes without problems. Each individual output is supposed to be independent of the other outputs. We know that for a PRNG they are not really independent because of the way that the algorithm works, but from a statistical point of view there is no correlation.
I can’t see how there would be any problem for a true RNG, so I assume you are arguing only about PRNGs. In this case, I would guess that the presence or absence of any problems depends largely on the quality of the algorithm.
I’m a programmer, not a mathematicican, so I decided to test these scenarios. I just ran Diehard on the MersenneTwisterRNG and AESCounterRNG. Instead of using a contiguous sub-sequence of the RNG output, I created the input files from every 10th value.
I then ran it again with input files that were created by generating 10 times more numbers than required and randomly selecting 10% of them (using an independent RNG – different algorithm).
In both cases Diehard found no statistical anomalies in the output. The results were just as good as when the test was run with contiguous values.
On the subject of poker, it absolutely is necessary that online poker sites have well distributed values. It is required in order to be able to obtain a licence to operate the site. The RNG is subjected to rigourous testing and analysis.
on January 14th, 2009 at 12:04 pm
How can there not be a correlation? The pseaudo random numbers are generated by a deterministic machine, right?
If you have a true RNG, where numbers are truly random, then sure, you can pick and choose any sequence you want, and the newly created sequence will be as random as the original sequence.
I tried to find the reference to my claim in the two books I mentioned, and it isn’t there. I think it is in the Communications of the ACM paper “Random Number Generators: Good ones are Hard to Find” from October 1988. It is available to buy for $10, which I thought was too much to prove my point. For you, perhaps it makes sense to get your company to pay for that paper? If I remember correctly, it was very good.
Good luck!
on February 5th, 2009 at 7:57 am
Random numbers…
Random numbers…