New Adventures in Software


Free Genetic Programming Book

Posted in Evolutionary Computation by Dan on April 7th, 2008


I just spotted this on comp.ai.genetic.  The authors of a new book called A Field Guide to Genetic Programming have made it available for download in PDF form free of charge.  Weighing in at around 200 pages, it looks like a reasonably concise introduction to the topic (unlike some of the huge and hideously expensive GP books you can buy on Amazon).

This is good timing for me because I’ve recently started hacking together a GP example application to include in the next release of the Watchmaker Framework for Evolutionary Computation.  So I can catch up on a bit of background reading to make sure I’m doing things sensibly.  Watchmaker is a general purpose evolution framework, intended to address the full range of evolutionary algorithms.  I’ve been claiming for a while that you can use it for genetic programming, so I thought it was about time I demonstrated this.  I’m not aware of anybody having used Watchmaker for GP so far.  I’d love to hear from anybody who has done so.

Genetic programming is also covered in an accessible way in Toby Segaran’s excellent book, Programming Collective Intelligence, which includes GP examples in Python.

A Java Programmer’s Guide to Random Numbers, Part 1: Beyond java.util.Random

Posted in Evolutionary Computation, Java by Dan on April 3rd, 2008


More than you ever wanted to know about randomness in Java programs.

This is the first in a series of articles about random numbers in Java programs. The series covers the following topics and also serves as an introduction to the random numbers package of the Uncommons Maths library:

  • True Random Numbers and Pseudorandom Numbers
  • Statistical Quality (making sure the dice are not loaded)
  • Performance (because sometimes you really do need half a million random numbers a second)
  • Different kinds of randomness (because not everything can be modelled by tossing a coin)
  • Degrees of Freedom (is every possible outcome actually possible?)
  • Security and Integrity (when not losing money depends on nobody knowing what happens next)

Part 1: Beyond java.util.Random

Random numbers are useful in a wide variety of software applications. They provide a crucial element of uncertainty in an otherwise deterministic world. Without random numbers in computers, the hugely popular online poker industry would not exist, video games would be boring and predictable, iTunes would have no shuffle, cryptography would be much more difficult, and many innovative algorithms, such as those used in artificial intelligence and evolutionary computation, simply couldn’t work.

True Random Numbers and Pseudorandom Numbers

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” - John von Neumann

Before continuing, it is important to make the distinction between so-called “true” random numbers and pseudorandom numbers. Though it may often seem otherwise to us as programmers, computer systems are infuriatingly deterministic. They are generally incapable of doing things randomly. To get a computer to behave in a way that is truly random, it is necessary to introduce some non-deterministic input. We could have somebody constantly rolling dice or flipping coins and typing in the results, but that is not very practical. A slightly more feasible approach is to construct a device that observes some real world phenomenon that is known to be unpredictable, such as radioactive decay or atmospheric noise. Data extracted from these events can be used a source of entropy for our applications. You could purchase a device that plugs into a serial port or USB port. To access these devices from Java you’d probably have to use C/C++ and JNI. Alternatively, you could get true random numbers indirectly - from an online service such as Random.org or Hotbits.

Since we can get truly unpredictable random numbers from this kind of hardware, why don’t we satisfy all of our randomness requirements in this way? Well the primary problem is throughput. These devices are quite limited in the quantity of randomness they can produce. They simply aren’t fast enough for many uses.

Pseudorandom numbers are not really random at all. They are the result of deterministic mathematical formulae. The best of these algorithms have been devised so that their output is statistically indistinguishable from true random numbers. PRNGs start with a single numeric seed value. The algorithm is applied to this seed to generate the output and a new, updated seed that is used to generate the next value. The mathematics involved is beyond the scope of this article - the definitive guide is the second volume of Donald Knuth’s The Art of Computer Programming.

An interesting property of this approach is that if you always start with the same seed value, you will always get the same sequence of “random” numbers. Though this can occasionally be useful, you would normally strive to avoid using the same seed value in the interests of unpredictability. A simple approach that is sufficient in many cases is to seed the PRNG from the current value of the system clock.

Aside from speed, another advantage of pseudorandom number generators (PRNGs) over true random number generators (TRNGs) is that they are more predictably unpredictable. That is, the statistical properties of PRNGs are more reliable than is the case with TRNGs.

Statistical Quality

xkcd.com

Why not java.util.Random?

The Java API helpfully provides a PRNG that we can use, the java.util.Random class. If you are a Java programmer you will almost certainly have used it at some point. For non-critical random numbers, e.g. adding some unpredictability to a game, it’s fine. It’s even pretty quick. Unfortunately, it’s not random enough - not by the standards required for more serious applications.

The problem of deciding whether a sequence of numbers is random is not an easy one to solve. You can’t simply look at the output and decide that it doesn’t look random enough. After all, if you toss a coin ten times, it could randomly come up heads every time, even though the probability of this sequence is pretty small. To get any kind of meaningful evaluation requires a large sample of the RNG output (perhaps millions of generated values). This sample can then be subjected to sophisticated statistical analysis.

Probably the best known test suite for random number generators is George Marsaglia’s Diehard Battery of Tests of Randomness. Diehard says that java.util.Random is not sufficiently random. But you don’t have to interpret Diehard’s complicated reports to see for yourself. This applet demonstrates it visibly.

Performance

So if not java.util.Random, how about Java’s other RNG, java.security.SecureRandom? SecureRandom is built for cryptography so it is specifically designed to avoid any such flaws. Diehard reports no issues with its output. Unfortunately, this quality comes at a high price - performance. In benchmarks, SecureRandom can be up to 60 times slower at generting random numbers than java.util.Random. This is bearable if you are only generating random values infrequently, but if your program relies on generating random numbers non-stop and as fast as possible (as many simulations do) then it’s a show-stopping bottleneck.

The good news is that, beyond the core API, there are random number generators as fast as, or faster than, java.util.Random with statistical properties as good as SecureRandom. The Uncommons Maths project provides a comprehensive open source random numbers package. Uncommons Maths provides three Random Number Generators, with different properties, for a wide variety of applications. Unlike java.util.Random, each of these RNGs completes the Diehard suite without any problems. Additionally, each of the Uncommons Maths RNGs is a sub-class of java.util.Random, which means any of them can be used as a drop-in replacement with minimal effort.

A good general-purpose PRNG is the MersenneTwisterRNG class. It is a pure Java port of Makoto Matsumoto and Takuji Nishimura’s proven and ultra-fast Mersenne Twister PRNG for C. Even faster is CellularAutomatonRNG, the Java port of Tony Pasqualoni’s experimental cellular automaton PRNG.

In the next article: Beyond dice and coins - using the right probability distribution.

Watchmaker Framework for Evolutionary Computation 0.4.3

Posted in Evolutionary Computation, Java by Dan on December 14th, 2007


This is mostly a maintenance release.  Uncommons Maths is now a separate project so the Watchmaker Framework has been modified to use the official version of that library. There are a few other minor tweaks (a couple of classes have been moved around, but nothing in the core framework).

Version 0.4.3 also introduces an experimental EvolutionMonitor component. This a Swing view that gives you some insight into the current state of the population while your evolutionary algorithm is running (the screenshot here shows the results of attaching the monitor to the Travelling Salesman example). In this first version all it does is graph the mean and peak fitness scores (using JFreeChart).  Future versions will hopefully display more information (perhaps I will add an API to enable data to be extracted from the population while running).  The EvolutionMonitor implements the EvolutionObserver interface so you can hook it up easily by calling the addEvolutionObserver method of your EvolutionEngine.

The other new feature is a new termination condition for terminating the algorithm when the population fitness begins to stagnate. If this condition is used and there is no fitness improvement within a specified number of generations, the evolution engine will assume that no further improvement can be made and will return the fittest individual found so far. This is often a more practical approach than specifying a maximum total number of generations or a fixed time limit in advance.

Book Review: Programming Collective Intelligence

Posted in Evolutionary Computation, Python, Software Development by Dan on December 13th, 2007


It’s called “Programming Collective Intelligence” and is presented as a book for building “Smart Web 2.0 Applications” but it is essentially an extremely accessible explanation of a wide array of machine learning and data-mining algorithms. How do sites like Amazon and Last.FM make recommendations? How do search engines work? How does Google News manage to categorise and present the most important news articles without human intervention? How do you build a useful spam filter?

All of these questions are answered and compelling example applications are built step-by-step to demonstrate the power of the ideas presented here. Decision trees, genetic algorithms, neural networks, support vector machines, genetic programming, Bayesian classifiers and non-negative matrix factorisation are some of the techniques covered and all without the dry, maths-heavy text that normally fills books on these topics.

The examples throughout are exclusively in Python, which may have put me off had I realised this when I ordered it. I have nothing against Python except for my complete lack of experience with it. However, the examples are easy enough to understand for anybody familiar with other high-level languages. As result of reading the book, I may actually try my hand at a bit of Python hacking now.

How well do these techniques work? Well I’d never have found out about this book but for Amazon’s automated recommendations system. I’d thoroughly recommend this book to anyone looking to learn about interesting AI techniques without wading through opaque academic papers.

(If you find the genetic algorithms and genetic programming topics interesting, check out the Watchmaker Framework for Evolutionary Computation and some of the books recommended there.)

Announcing Uncommons Maths

Posted in Evolutionary Computation, Java by Dan on November 19th, 2007


Uncommons Maths is a Java library consisting of a comprehensive random numbers package and other useful mathematical utility classes. It was originally part of the Watchmaker Framework for Evolutionary Computation but, due to its usefulness in other domains, it has now been converted into a standalone project (Apache Licence).

This article briefly describes what’s available in this first public release. I am most definitely not a mathematician and, as such, this library is written by a programmer for programmers (but if mathematicians find it useful that’s good too). It includes classes that are useful in real world programs and is not intended to ever cover the full spectrum of mathematics. However, I hope that it will expand in scope over time in this spirit of pragmatism. To that end, suggestions and contributions are actively encouraged.

Random Number Generators

The Uncommons Maths library provides three easy-to-use, statistically-sound, high-performance pseudorandom number generators (RNGs). They are:

MersenneTwisterRNG
A Java port of the fast and reliable Mersenne Twister RNG originally developed by Makoto Matsumoto and Takuji Nishimura. This is faster1 than java.util.Random and does not have the statistical flaws2 of that RNG.
CellularAutomatonRNG
A Java port of Tony Pasqualoni’s ultra-fast Cellular Automaton RNG. It uses a 256-cell automaton to generate random values. To the best of my knowledge, this is the fastest1 available pure Java RNG that completes the Diehard test suite without any problems.
AESCounterRNG
This is a cryptographically-strong3 non-linear RNG that is around 10x faster1 than java.security.SecureRandom. Reverse-engineering the generator state from observations of its output would involve cracking the AES block cipher.
  1. A benchmark comparing the performance of these three RNGs and the two JDK RNGs can be found here (under the title “RNG Performance”).
  2. This applet demonstrates the non-randomness of java.util.Random.
  3. The algorithm is not the only security consideration for RNGs. The source, secrecy and integrity of the seed data is also vital. For highly sensitive applications, consider using something like Fortuna.

Probability Distributions

Using the included probability distribution wrappers, these RNGs (and the standard JDK ones) can be used to generate values from Uniform, Normal, Binomial, Poisson and Exponential distributions.

Permutations & Combinations

Uncommons Maths also includes generics-enabled combination and permutation generators. These are based on Java classes originally written by Michael Gilleland.

Statistics

Uncommons Maths provides a statistical data set class that can calculate a variety of descriptive statistics (variance, median, standard deviation, arithmetic and geometric means, etc.) for a set of values.

Other

Other useful features in Uncommons Maths include utility methods to complement those in java.lang.Math, and utility classes for manipulating binary data.

The Unbeatable Draughts (Checkers) Player

Posted in Evolutionary Computation, Software Development by Dan on July 21st, 2007


While some of us can just about muddle through a Sudoku, others are aiming higher. The BBC has the story of how a team from the University of Alberta has solved Checkers (or “Draughts” as we like to call it in this part of the world). The Chinook program was already well capable of beating the best human players, now it’s not worth even trying since it can’t be beaten. Earlier incarnations of Chinook made use of evolutionary approaches. The latest release is the result of a phenomenal amount of CPU time dedicated to analysing every possible game position and determining the best move for each. A solution for Chess is still some way off.

Evolving Sudoku (Watchmaker 0.4.1)

Posted in Evolutionary Computation, Java by Dan on July 21st, 2007


Everybody loves Sudoku (probably), and evolutionary computation is kind of neat, so what could possibly be better than an animated evolutionary Sudoku solver?

The applet’s animation gives a good feel for how the randomly directed search eventually converges on the right solution through the power of cumulative selection. Have a play with the population size setting to trade-off performance with reliability (harder puzzles will typically require a larger population).

I’ve got lots of ideas for improvements to the Watchmaker Framework for Evolutionary Computation, but before I could get started, I first had to finish off the stuff I had been playing with. So that’s what I’ve been up to this evening and the result is version 0.4.1.

Ohloh - Forgettable name, interesting site

Posted in Evolutionary Computation, Software Development by Dan on July 16th, 2007


I discovered the Open Source project directory Ohloh yesterday (thanks to Andres Almiray’s post that I noticed on JavaBlogs). Oddly the name Ohloh is instantly forgettable and I’m having to look up the URL each time I go there.

By querying CVS and Subversion repositories, Ohloh generates information about the state of a project and ties individual contributions to developer profiles. It also allows individuals to list which Open Source projects they are users of. The site uses Google Maps to show how both contributors and users are distributed geographically.

I added my Watchmaker evolutionary computation project (the Ohloh entry is here). Unfortunately java.net’s strategy of hosting web content in the Subversion repository does skew the statistics somewhat (my contributions appear to be mostly HTML content, but this is mainly generated Javadoc output).

Of the information presented, most pleasing for me is the nice green tick the project gets for being well commented:

Across all Java projects on Ohloh, 35% of all source code lines are comments. For Watchmaker Evolution Framework, this figure is 45%.

This high number of comments puts Watchmaker Evolution Framework among the highest one-third of all Java projects on Ohloh.

A high number of comments might indicate that the code is well-documented and organized, and could be a sign of a helpful and disciplined development team.

That’s a good reference for my CV: “Dan is a helpful and disciplined development team.” :)

Scholarpedia: Wikipedia with better standards?

Posted in Evolutionary Computation, Software Development by Dan on June 16th, 2007


I’ve just stumbled upon Scholarpedia, a MediaWiki-based encyclopedia. The key difference between it and Wikipedia is its focus on peer-review of content. Of course, this immediately means that it has substantially less content than Wikipedia but less is more, right? Contributors are nominated and voted in by the public based on their reputation in their area of expertise, most being notable academics.

At present Scholarpedia seems to have quite a narrow focus (most current articles are about various kinds of adaptive systems in computer science). It will be interesting to see how the project progresses. Perhaps the most promising aspect is the quality of authors who have apparently signed up to write various sections. For example, if you could ask anybody to explain Genetic Algorithms, it would probably be John Holland. And who better to write about Hopfield Networks than John Hopfield himself?

Biomorphs and other Evolutionary Algorithms in Java - Watchmaker 0.4.0

Posted in Evolutionary Computation, Java by Dan on May 26th, 2007


Version 0.4.0 of the Watchmaker Framework for Evolutionary Computation is now available for download. This release adds support for interactive evolutionary algorithms. An example applet, based on Richard Dawkin’s famous Biomorph program, demonstrates how the framework can be used with an interactive selection strategy. Using the applet you can direct the evolution of recursive pictures that resemble biological entities such as plants and insects.

The new release also adds a blazingly fast random number generator (a Java port of Tony Pasqualoni’s cellular automaton RNG). This RNG out-performs even the Mersenne Twister. By offering this RNG as an option, the framework provides potential for improved evolutionary performance.

Next Page »