New Adventures in Software


A Java Programmer’s Guide to Random Numbers, Part 3: Seeding

Posted in Java by Dan on April 10th, 2008


The trilogy concludes.

This is the last of three articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security (when not losing money depends on nobody knowing what happens next)

Part 3: Seeding

Part 1 of this series discussed different kinds of random number generators (RNGs), highlighted the issues with using the default Java RNGs (java.util.Random and java.security.SecureRandom) and introduced the Uncommons Maths library, which provides three fast, high-quality alternative Java RNGs. Part 2 showed how to generate random numbers from different probability distributions using the classes provided by Uncommons Maths.

One aspect of pseudorandom number generators (PRNGs) that we glossed over in part 1 is the issue of seeding. In this article, we’ll look at two important issues related to the seeding of PRNGs.

Degrees of Freedom

Suppose we want to shuffle a standard deck of 52 playing cards. The Java API provides an implementation of a very good algorithm for shuffling a list (see Collections.shuffle). It even provides a variant of this method that allows us to specify our own RNG to be used. So shuffling a deck of cards (assuming it is represented as a List) is easy in Java.

Unfortunately, even though the algorithm is a perfect shuffle and we can choose an RNG that exhibits no bias, our shuffling code is still likely to be flawed in one respect. The problem is that the RNG is almost certainly not able to generate every possible outcome. There are many orderings of the deck that will never occur. The RNG has insufficient degrees of freedom. To understand why this is, we need to consider how many ways there are of ordering a deck of 52 cards. I’ll spare you the maths and tell you that the answer is 52! (that’s 52 factorial) or, in longhand, 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000.

That’s a pretty big number.

Recall from the first article in this series that when we seed a PRNG we get the same sequence of output each time that we use the same initial seed. The maximum number of possible initial states for the PRNG is equal to the number of possible seed values. If each RNG state gives us a different shuffle, the number of different deck orderings we can generate is determined by the range of the RNG seed.

The CellularAutomatonRNG is very fast, but it only uses a 32-bit seed. So using it for shuffling restricts the result of any given shuffle to one of 4,294,967,296 outcomes. That’s 4.3 billion possibilities but it may as well be zero given the number of outcomes that should be possible but aren’t.

We would probably want to avoid using java.util.Random but, if we did use it, we’d give it a 64-bit long as its seed. The algorithm only actually uses 48 of those bits for seeding so that gives us 281,474,976,710,656 possible initial RNG states. 281 trillion is an improvement, but still woefully inadequate.

The MersenneTwisterRNG use a 64-bit seed. That’s 18,446,744,073,709,551,616 possible outcomes for any given shuffle. Even this huge number is a miserable, tiny, insignificant fraction of one percent of the full set of 52! possibilities that we ought to be able to generate.

SecureRandom may use different algorithms internally, depending on the platform.  On Windows, the default is the SHA1PRNG provider, which gives us a 160-bit seed.  Ignoring SecureRandom’s slowness, this is getting us closer to where we want to be (if you can call less than one trillionth of one percent of the possible outcomes ‘close’).

One more RNG…

It turns out that we need a seed of at least 226 bits in order to achieve the necessary degrees of freedom. This rules out most PRNGs that you might otherwise consider. There is at least one option however: Uncommons Maths’ AESCounterRNG. The AESCounterRNG is based on the AES block cipher. AES is the Advanced Encryption Standard adopted by the US government. The characteristics of AES that make it a good encryption algorithm also coincidentally make it a good RNG. When we use AES as an RNG, the seed is the encryption key. AES supports 128-bit, 192-bit and 256-bit keys.

Using AESCounterRNG with a 256-bit seed, we have the necessary degrees of freedom so that all 52! orderings are possible outcomes of a shuffle. AESCounterRNG, while still much faster than java.security.SecureRandom, is not quite as fast as the other RNGs we’ve been looking at. It does however have other advantages that we shall see later.

Note: Using encryption keys longer than 128 bits in Java requires that you obtain and install the unlimited strength cryptography policy files from Sun.

Security

Before you rush off to set up your new online casino, there’s one further issue to be considered. Remember that pseudorandom number generators are deterministic. That means that if we know which PRNG algorithm is in use and we know what seed was used, then we can predict with absolute certainty every single value that the RNG will ever produce. No problem for non-sensitive applications, but if people are gambling on the outcomes, you’d better be certain that they don’t have this kind of inside information.

Forget about seeding your RNG from the system clock - your seed data needs to be random itself or it is vulnerable (see How We Learned to Cheat at Online Poker: A Study in Software Security for a real world account of how RNG weaknesses can be exploited in online poker). Even if an attacker doesn’t know the seed, with many algorithms it can be reversed engineered by observing the RNG’s output and applying a little computing power.

To address these security concerns, the first step is to pick a PRNG implementation that is not susceptible to reverse-engineering. That brings us back to the AESCounterRNG. Being based on AES means that reverse -engineering the seed would involve cracking the cipher. Good luck doing that in a timely manner.

Disclaimer: You should not rely on the AESCounterRNG to address all of your security concerns. A fuller understanding of the issues can be gained by reading Ferguson and Schneier’s Practical Cryptography - particularly the description of their Fortuna PRNG design.

So where do we get the seed data from? We have a bit of a bootstrapping problem. In order to generate good quality, unpredictable random numbers we first need some good quality, unpredictable random data. By default, all of the Uncommons Maths RNGs seed themselves from /dev/random on Unix systems. On Windows systems seed data will be download from random.org and if neither of those strategies is succesful (perhaps the SecurityManager prevents file or network access) then it will fall back on the seeding strategy provided by SecureRandom.generateSeed. However, seeding strategies are pluggable so that you can provide your own. One possible strategy is to use a true hardware RNG to bootstrap the faster PRNG that will generate the actual output.

The End

And that’s pretty much everything I can tell you about using random number generators in Java. If you have any feedback on Uncommons Maths, please use the issue tracker for bugs and suggestions. If you want to know more about pseudorandom number generators, the following books cover the theoretical and practical issues in much more depth, and much more precisely, than I have done.

The Art of Computer Programming Practical Cryptography

6 Responses to 'A Java Programmer’s Guide to Random Numbers, Part 3: Seeding'

Subscribe to comments with RSS or TrackBack to 'A Java Programmer’s Guide to Random Numbers, Part 3: Seeding'.

  1. James said,

    on April 11th, 2008 at 10:38 am

    Hi Dan
    Great article. This is a very interesting series, which would be great to get onto JavaLobby.
    Let me know if this would be ok with you and we can discuss.

    Regards
    James

  2. Jeff said,

    on April 17th, 2008 at 2:12 am

    “Both the MersenneTwisterRNG and Java’s SecureRandom use a 64-bit seed. ”

    Are you sure SecureRandom uses a 64-bit seed? The SUN SHA1PRNG provider appears to use 160 bits (still less than the number of possible decks).

  3. Dan said,

    on April 17th, 2008 at 7:56 pm

    Jeff, you are absolutely right. Thanks for the correction - I have changed the text to be more accurate. I was repeating information from memory from an article I read quite a while ago. I can’t find the original page, but after your comment I double-checked the implementation of SecureRandom and I was indeed mistaken in my original posting.

  4. Henry said,

    on April 18th, 2008 at 1:10 pm

    hum… aren’t there some hardware random number generator we can use to seed the random gen?

  5. Duru Dominic said,

    on June 16th, 2008 at 6:47 pm

    My name is Dominic from west Africa. I read your article few days ago after a long period of staying in darkness. In fact, we have been deceived by private individual and our Government into believing that the so called lotto machines have “keys” and “plans” which they play and we end up loosing our money on a daily basis.

    These machines, some computerized and balls are owned by these people. Your article has been helpful.This is how it works; The machines have 90 numbers to select from in a random manner.I.e 1,2,3,4,5,…,.,.,.90
    The first five numbers that come out each time the machine is operated are the winning numbers and the last five,I.e,the 86th,87th 88th 89, 90th numbers are called the machine numbers.And all the results are compiled starting from the first time and this is where we llok at to guess the first five numbers to play the next time it is operated,still we loose money. So, is it possible to get accurately the first five.It is played once a week. Can it be solved by Xn+1 = (aXn + b) mod c.If possible, how do we get the constants a,b,and c .How do we get the “seed” PLease I need your help.Below is a sequence of previous results :
    1th 2nd 3rd 4th 5th 86th 87th 88th 89th 90th

    week1 44 35 27 61 39 71 12 33 89 9
    week2 34 77 2 65 62 80 47 74 79 60
    week3 5 34 9 40 41 27 3 58 63 76
    week4 55 49 87 20 3 48 67 61 65 54
    week5 61 45 87 50 38 14 13 71 21 15

    I hope you can look at these and tell give a useful information. Thank you very much for you anticipated response.
    Duru Dominic
    0092348035405004
    Email: perfectaffairs@yahoo.com

  6. Dan said,

    on June 16th, 2008 at 11:09 pm

    Dominic, from your description, it’s not clear how the numbers are generated. If there are actual physical balls that are picked out of a machine (as in the UK National Lottery), then which balls get picked is a result of physical factors and timing. I don’t think it would be very easy to predict but, if you are interested, see the book Fortune’s Formula by William Poundstone. There is a section describing how Claude Shannon and Ed Thorp built a device to predict roulette wheels.

    If the numbers are the result of a PRNG then, to reverse engineer it, you would need to find out what algorithm it uses. As I mentioned in the article, it would probably be an algorithm that is difficult (almost impossible) to reverse engineer. Also you are assuming that the device is not re-seeded from week-to-week. That’s probably not the case.

    Are you suggesting that the lottery is rigged? Detecting bias in the results can be done with statistics but it would probably require a very large sample of outputs to be worthwhile.

    The best plan is not to play. Even when they are fair lotteries are designed to make the players lose money. That’s because their primary function is to raise money for some other purpose. This means that only a fraction of the money paid in goes into the prize pool. In the UK National Lottery, for example, only about 50% of the money is paid out in prizes. Compare this 50% expected return to the 99.4% you’d get playing Blackjack in a casino, and it’s clear that if you like to gamble, you should chose something other than the lottery.

Leave a Reply