RSS Feed

New Adventures in Software


Escape Analysis in Java 6 Update 14 – Some Informal Benchmarks

Posted in Java by Dan on May 31st, 2009


Sun recently released update 14 of the Java 6 JDK and JRE.  As well as the usual collection of bug fixes, this release includes some experimental new features designed to improve the performance of the JVM (see the release notes).  One of these is Escape Analysis.

To see what kind of impact escape analysis might have on my applications, I decided to try it on a couple of my more CPU-intensive Java programs.  Escape analysis is turned off by default since it is still experimental.  It is enabled using the following command-line option:

-XX:+DoEscapeAnalysis

Benchmark 1

The first program I tested is a statistical simulation.  Basically it generates millions of random numbers (using Uncommons Maths naturally) and does a few calculations.

VM Switches: -server
95 seconds

VM Switches: -server -XX:+DoEscapeAnalysis
73 seconds

Performance improvement using Escape Analysis: 23%

Benchmark 2

The second program I tested is an implementation of non-negative matrix factorisation.

VM Switches: -server
22.6 seconds

VM Switches: -server -XX:+DoEscapeAnalysis
20.8 seconds

Performance improvement using Escape Analysis: 8%

Conclusions

These benchmarks are neither representative nor comprehensive.  Nevertheless, for certain types of program the addition of escape analysis appears to be another signficant step forward in JVM performance.

Want more articles like this? Subscribe to the feed.

Watchmaker 0.6.0 – Evolutionary Computation for Java

Posted in Evolutionary Computation, Java by Dan on April 26th, 2009


Version 0.6.0 of the Watchmaker Framework for Evolutionary Computation is now available for download.  This release incorporates several minor changes that I’ve been making over the last few months.  Consult the changelog for full details, but here are the highlights:

Numerous Improvements to the Evolution Monitor and other Swing Components

The Watchmaker Swing library provides a collection of GUI components that simplify the process of building user interfaces for evolutionary programs. These components have received many improvments for version 0.6.0. As well as controls for manipulating evolution parameters while the program is running, the library also provides an Evolution Monitor component. This provides real-time information about the state of the program, including a view of the fittest candidate so far and a graph showing changes in population fitness over time.

Upgraded to Uncommons Maths 1.2

This means even faster RNGs are available for you to use. It also means that we now use the Uncommons Maths Probability class rather than duplicating it in the framework (this means you may have to change some imports in your code when upgrading from Watchmaker 0.5.x).

Caching Fitness Evaluator

Version 0.6.0 introduces the CachingFitnessEvaluator class. This is a wrapper that provides caching for existing FitnessEvaluator implementations. The results of fitness evaluations are cached so that if the same candidate is evaluated twice, the expense of the fitness calculation can be avoided the second time. The cache uses weak references in order to avoid memory leakage.

Caching of fitness values can be a useful optimisation in situations where the fitness evaluation is expensive and there is a possibility that some candidates will survive from generation to generation unmodified. Programs that use elitism are one example of candidates surviving unmodified. Another scenario is when the configured evolutionary operator does not always modify every candidate in the population for every generation.

Caching of fitness scores is provided as an option rather than as the default Watchmaker Framework behaviour because caching is only valid when fitness evaluations are isolated and repeatable. An isolated fitness evaluation is one where the result depends only upon the candidate being evaluated. This is not the case when candidates are evaluated against the other members of the population.

Mona Lisa Example

After seeing Roger Alsing’s evolution of the Mona Lisa, I was inspired to try to reproduce it using the Watchmaker Framework. I didn’t follow Roger’s methodology but I have come up with something similar. My results aren’t as impressive as his latest efforts but may be interesting anyway. This example was actually included in version 0.5.1 but I didn’t draw attention to it. In 0.6.0 I’ve improved performance and used it to demonstrate the Watchmaker GUI components mentioned above.  You can try it for yourself here.  Maybe you can come up with a combination of parameters that works better than the defaults I have provided?

Useful Watchmaker Links

If you are new to Evolutionary Computation in Java,  these previous articles may be of interest:

Want more articles like this? Subscribe to the feed.

The Java Language Features that Nobody Uses

Posted in Java by Dan on April 17th, 2009


I read Anthony Goubard’s “Top 10 Unused Java Features” on JavaLobby earlier today.  I agree with some of his selections but I think he missed out a few key features that nobody uses.  Restricting myself to just language features (the API is too huge), here are four more widely unused features of Java.

4. The short data type

You use it? I don’t believe you. Everybody* uses int when they want integers, even if they don’t need a 32-bit range.

3. Octal Literals

Who uses Octal these days?** Hexadecimal is a more useful shorthand for binary values.  Worse, the leading-zero notation for Octal literals is just confusing:

int a = 60;
int b = 060;
System.out.println(a + b); // Prints 108.

2. Local Classes

Java has four types of nested class, three of which are widely used.  As well as static nested classes, named inner classes and anonymous inner classes, you can also define named classes within methods, though it’s rare to see one in the wild.

public class TopLevelClass
{
    public void someMethod()
    {
        class LocalClass
        {
            // Some fields and methods here.
        }
 
        LocalClass forLocalPeople = new LocalClass();
    }
}

1. Strict FP

There is probably a programmer out there somewhere for whom Java’s strictfp is vital, but I haven’t met him or her. If you already know what strictfp is used for then you are probably in the top 5% of Java programmers. If you don’t know what strictfp does, here you go, welcome to the top 5%. It’s basically about making sure that your calculations are equally wrong on all platforms.

* OK, maybe you used to be a C programmer.
** Here’s your rhetorical answer.

Related Articles

Want more articles like this? Subscribe to the feed.

Practical Evolutionary Computation: Elitism

Posted in Evolutionary Computation, Java, Software Development by Dan on February 12th, 2009


In my previous article about evolutionary computation, I glossed over the concept of elitism.  The Watchmaker Framework’s evolve methods require you to specify an elite count.  I told you to set this parameter to zero and forget about it.  This brief article ties up that loose end by explaining how to use elitism to improve the performance of your evolutionary algorithm.

In an evolutionary algorithm (EA), sometimes good candidates can be lost when cross-over or mutation results in offspring that are weaker than the parents. Often the EA will re-discover these lost improvements in a subsequent generation but there is no guarantee of this. To combat this we can use a feature known as elitism. Elitism involves copying a small propotion of the fittest candidates, unchanged, into the next generation. This can sometimes have a dramatic impact on performance by ensuring that the EA does not waste time re-discovering previously discarded partial solutions. Candidate solutions that are preserved unchanged through elitism remain eligible for selection as parents when breeding the remainder of the next generation.

NOTE: One potential downside of elitism is that it may make it more likely that the evolution converges on a sub-optimal local maximum.

The Watchmaker Framework supports elitism via the second parameter to the evolve method of an EvolutionEngine. This elite count is the number of candidates in a generation that should be copied unchanged from the previous generation, rather than created via evolution. Collectively these candidates are the elite. So for a population size of 100, setting the elite count to 5 will result in the fittest 5% of each generation being copied, without modification, into the next generation.

If you run the Hello World example from the previous article both with and without elitism, you will see that it completes in fewer generations with elitism enabled (22 generations vs. 40 when I ran it – though your mileage may vary due to the random nature of the evolution).

Source code for the Hello World example (and several other, more interesting evolutionary programs) is included in the download.

This is the third in a short and irregular series of articles on practical Evolutionary Computation, based on the work-in-progress documentation for the Watchmaker Framework for Evolutionary Computation.  The first article provided an introduction to the field of evolutionary computation and the second article showed how to implement a simple evolutionary algorithm using the Watchmaker Framework.

Further Reading

An Introduction to Genetic Algorithms Introduction to Evolutionary Computing The Blind Watchmaker

Want more articles like this? Subscribe to the feed.
Comments Off

Debugging Java Web Start Applications

Posted in Java by Dan on February 6th, 2009


How do you attach a debugger to a Java Web Start application?  Normally you probably wouldn’t bother, just start the application without Web Start and debug as normal.  However,  if you have a bug that shows up only when running in the Web Start sandbox, as I did today, that won’t help.  The SecurityManager restrictions were causing a different branch of my code to be executed than when launching the application from IDEA or the command line.

It was not immediately obvious how to attach the debugger to the Web-Started VM.  In IDEA, to remotely attach a debugger to the JVM, you should start the VM with following set of switches (or similar):

  -Xdebug
  -Xnoagent
  -Djava.compiler=NONE
  -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

Where do these switches go when launching a Web Start application?  Normally you launch the application by just clicking a JNLP link in your browser.

One option, which doesn’t work, is to specify the JVM arguments in JNLP file.  You can already do something like this:

  <j2se version="1.5+" java-vm-args="-ea -server"/>

Adding the debug switches is trivial… and futile.  The problem is that remote debugging requires the VM to open up a socket to accept connections from the debugger.  Rather sensibly, Web Start does not permit untrusted applications to open sockets on users’ machines.  I don’t know if it would work if the application was signed, I was too lazy to go through the hassle of signing the code.

If you want to open a socket on the client machine for debugging purposes, you are going to have to do it from the client machine rather than the JNLP file. The solution is to set the JAVAWS_VM_ARGS environment variable to include the debug switches and then to launch the javaws executable and point it at the unmodified JNLP file. From a bash shell it looks like this:

  export JAVAWS_VM_ARGS="-Xdebug -Xnoagent blah blah"
  javaws http://www.example.com/path_to/application.jnlp

You can then attach the debugger as normal.

Want more articles like this? Subscribe to the feed.

Uncommons Maths 1.2

Posted in Java by Dan on February 5th, 2009


Version 1.2 of Uncommons Maths is now available.  This adds two new RNGs, one that is extremely fast and one that has an extremely long period (both are simply translations of George Marsaglia’s C code).  This release also includes a Rational number type to enable exact arithmetic with fractional values.

Want more articles like this? Subscribe to the feed.

Practical Evolutionary Computation: Implementation

Posted in Evolutionary Computation, Java by Dan on January 21st, 2009


If an evolutionary algorithm is a good fit for a particular problem, there are plenty of options when it comes to implementing it. You may choose to use a high-level programming language for simplicity, or a low-level language for performance. You could write all of the code yourself from scratch, or you could reuse pre-written components and libraries. In this article we will necessarily be using one particular approach, but it is worth noting that there are alternatives.

Evolution Frameworks

As we saw previously, the basic outline of an evolutionary algorithm is fairly straightforward. It consists of a main loop that performs one generation per iteration, supplemented by a few functions to perform fitness evaluation, selection and mutation/cross-over. When implementing a simple EA, writing this structural code is not particularly onerous. However, if you write many different evolutionary programs, you end up writing code that is very similar over and over again.

A good programmer will usually want to extract and reuse this common code. Once you have done this, you have the basis of an evolutionary computation framework. Typically this will consist of an evolution engine that is reusable and that can accept different functions to customise fitness evaluation, selection and evolutionary operators.

An alternative to using a home-grown framework is to choose a ready-made one. There are open source evolutionary computation frameworks available for most programming languages. For popular languages, such as C, C++ and Java, there are dozens.

The advantage of a ready-made framework that is used by many other programmers is that it will have been well tested and should be free of significant bugs and performance problems. It may also provide advanced features such as parallel and/or distributed processing.

The Watchmaker Framework

The Watchmaker Framework for Evolutionary Computation is an extensible, high-performance, object-oriented framework for implementing platform-independent evolutionary algorithms in Java. It is freely available under a permissive Open Source licence.

This article introduces the core components of the Watchmaker Framework and shows how they can be used to implement simple evolutionary algorithms such as the “Hello World” example outlined previously.

The Evolution Engine

The central object of an evolutionary program built with the Watchmaker Framework is the evolution engine. An evolution engine is a general-purpose implementation of the evolutionary algorithm outline introduced previously.

The framework provides multiple implementations of the EvolutionEngine interface, but the one
that you will usually want to use is ConcurrentEvolutionEngine. As its name suggests, it takes
advantage of the parallel processing facilities of your computer in order to speed-up the evolutionary
process.

An EvolutionEngine has a single generic type parameter that indicates the type of object that it can evolve. For the “Hello World” program, we need to be able to evolve Java strings. Code that creates an engine that can evolve strings would look something like this:

EvolutionEngine<String> engine
    = new ConcurrentEvolutionEngine<String>(candidateFactory,
                                            evolutionaryOperator,
                                            fitnessEvaluator,
                                            selectionStrategy,
                                            rng);

Once you have created an EvolutionEngine, your program is as simple as calling the evolve method with appropriate arguments. However, as you can see from the code snippet above, there is a little bit of work to be done first in order to create an EvolutionEngine that is configured appropriately for the given problem. The constructor of the ConcurrentEvolutionEngine class requires five objects. These are:

  • A Candidate Factory
  • An Evolutionary Operator
  • A Fitness Evaluator
  • A Selection Strategy
  • A Random Number Generator

The Candidate Factory

The first object that needs to be plugged into the evolution engine is a candidate factory. Every evolutionary simulation must start with an initial population of candidate solutions and the CandidateFactory interface is the mechanism by which the evolution engine creates this population.

A candidate factory implementation has an associated type. It can only create objects of that type.   The type must match the type of the evolution engine that it is plugged into. You can write your own implementation of CandidateFactory for your program or, if you are using a common type such as strings, lists or arrays, you may be able to use a ready-made factory from the org.uncommons.watchmaker.framework.factories package.

For our “Hello World” program, we can use the provided StringFactory:

// Define the set of permitted characters (A-Z plus space).
char[] chars = new char[27];
for (char c = 'A'; c <= 'Z'; c++)
{
    chars[c - 'A'] = c;
}
chars[26] = ' ';
 
// Factory for random 11-character Strings.
CandidateFactory<String> factory = new StringFactory(chars, 11);

Tip: When writing your own CandidateFactory implementations, it is easiest to extend the provided AbstractCandidateFactory base class since there is then only a single method that must be implemented.

Evolutionary Operators

Evolutionary operators are the components that perform the actual evolution of a population. Cross-over is an evolutionary operator, as is mutation.

In the Watchamker Framework, evolutionary operators are defined in terms of the EvolutionaryOperator interface. This interface declares a single method that takes a list of selected individuals and returns a list of evolved individuals. Some operators (i.e. mutation) will process one individual at a time, whereas others will process individuals in groups (cross-over processes two individuals at a time).

As with candidate factories, evolutionary operators have associated types that must be compatible with the type of the evolution engine that they are used with. And, as with candidate factories, the framework provides several ready-made operators for common types. These can be found in the org.uncommons.watchmaker.framework.operators package. The cross-over and mutation operators that we need for our “Hello World” program are provided by the StringCrossover and StringMutation classes.

The Evolution Pipeline

Alert readers will have noticed that the evolution engine constructor only accepts a single evolutionary operator. So how can we use both cross-over and mutation? The answer is provided by the EvolutionPipeline operator. This is a compound evolutionary operator that chains together multiple operators of the same type.

List<EvolutionaryOperator<String>> operators
    = new LinkedList<EvolutionaryOperator<String>>();
operators.add(new StringCrossover());
operators.add(new StringMutation(chars, new Probability(0.02)));
EvolutionaryOperator<String> pipeline = new EvolutionPipeline<String>(operators);

Note: The evolution pipeline is just one of many useful operators included in the org.uncommons.watchmaker.framework.operators package. Elaborate evolution schemes can be constructed from combinations of these operators. Users of the Watchmaker Framework should take a few minutes to browse the API documentation and familiarise themselves with the available classes.

The Fitness Evaluator

So far we’ve been able to build our evolutionary program by simply combining instances of classes provided by the framework. There is one part of the program that we will always have to write for ourselves though and that is the fitness fuction, which is necessarily different for every program.

In the Watchmaker Framework, a fitness function is written by implementing the FitnessEvaluator interface. The getFitness method of this interface takes a candidate solution and returns its fitness score as a Java double. The method actually takes two arguments, but we can ignore the second for now.

The listing below is a fitness evaluator for the “Hello World” program. It simply assigns one point for each character in the candidate string that matches the corresponding position in the target string.

public class StringEvaluator implements FitnessEvaluator<String>
{
    private final String targetString = "HELLO WORLD";
 
    /**
      * Assigns one "fitness point" for every character in the
      * candidate String that matches the corresponding position in
      * the target string.
      */
    public double getFitness(String candidate,
                             List<? extends String> population)
    {
        int matches = 0;
        for (int i = 0; i < candidate.length(); i++)
        {
            if (candidate.charAt(i) == targetString.charAt(i))
            {
                ++matches;
            }
        }
        return matches;
    }
 
    public boolean isNatural()
    {
        return true;
    }
}

By some fitness measures, a higher value indicates a fitter solution. In other cases a lower value is better. The isNatural method of a fitness evaluator simply specifies which scenario applies. In Watchmaker Framework terminology, a natural fitness function is one that returns higher values for fitter individuals.

Selection Strategy

Selection is a key ingredient in any evolutionary algorithm. It’s what determines which individuals survive to reproduce and which are discarded. All we’ve said about selection so far is that it should favour fitter individuals. This definition permits several different implementations. The Watchmaker Framework includes all of the most common selection strategies in the org.uncommons.watchmaker.framework.selection package. These are sufficient for most evolutionary algorithms but, if necessary, it is straightforward to write your own implementation of the SelectionStrategy interface.

Some selection strategies work better than others for certain problems. Often a little trial-and-error is required to pick the best option. For now we will just create an instance of the RouletteWheelSelection class and use that for our “Hello World” application.  Roulette wheel selection is the most common type of fitness-proportionate selection. It gives all individuals a chance of being selected but favours the fitter individuals since an individual’s selection probability is dervived from its fitness score.

Random Number Generator

The final dependency that must be satisfied in order to create an evolution engine is the random number generator (RNG). An evolution engine has a single RNG that it passes to its candidate factory, evolutionary operator and selection strategy.  A discussion of the merits of various RNGs is beyond the scope of this article.  The standard java RNG (java.util.Random) is flawed, so instead we will use the provided org.uncommons.maths.random.MersenneTwisterRNG.

Completing the Jigsaw

We’ve now got all of the necessary pieces to complete the “Hello World” example application. Assuming that you’ve already created the StringEvaluator class (defined above) in a separate file, the code needed to create the evolution engine looks like this:

// Create a factory to generate random 11-character Strings.
char[] chars = new char[27];
for (char c = 'A'; c <= 'Z'; c++)
{
    chars[c - 'A'] = c;
}
chars[26] = ' ';
CandidateFactory<String> factory = new StringFactory(chars, 11);
 
// Create a pipeline that applies cross-over then mutation.
List<EvolutionaryOperator<String>> operators
    = new LinkedList<EvolutionaryOperator<String>>();
operators.add(new StringMutation(chars, new Probability(0.02)));
operators.add(new StringCrossover());
EvolutionaryOperator<String> pipeline = new EvolutionPipeline<String>(operators);
 
FitnessEvaluator<String> fitnessEvaluator = new StringEvaluator();
SelectionStrategy<Object> selection = new RouletteWheelSelection();
Random rng = new MersenneTwisterRNG();
 
EvolutionEngine<String> engine
    = new ConcurrentEvolutionEngine<String>(factory,
                                            pipeline,
                                            fitnessEvaluator,
                                            selection,
                                            rng);

The listing above only creates the evolution engine, it does not perform any evolution. For that we need to call the evolve method. The evolve method takes three parameters. The first is the size of the population. This is the number of candidate solutions that exist at any time. A bigger population will often result in a satisfactory solution being found in fewer generations. On the other hand, the processing of each generation will take longer because there are more individuals to deal with. For the “Hello World” program, a population size of 10 is fine.

The second parameter is concerned with elitism. For now, just use a value of zero. The final varargs parameter specifies one or more termination conditions.

Termination Conditions

Termination conditions make the evolution stop. There are a few reasons why we would like the evolution to stop. The most obvious is when it has found the solution that we are looking for. In the case of the “Hello World” program, that is when we have found the target string. The target string has a fitness score of 11, so we use the TargetFitness condition.

To complete the evolutionary “Hello World” application, add the following two lines:

String result = engine.evolve(10, 0, new TargetFitness(11));
System.out.println(result);

Note: When we move on to less trivial evolutionary programs, we will rarely be able to specify the outcome so precisely. The org.uncommons.watchmaker.framework.termination package includes other termination conditions that can be used. For example, we may want the program to run for a certain period of time, or a certain number of generations, and then return the best solution it has found up until that point. The ElapsedTime and GenerationCount conditions provide this functionality. Alternatively, we may want the program to continue as long as it is finding progressively better solutions. The Stagnation condition will terminate the evolution after a set number of generations pass without any improvement in the fitness of the fittest candidate. If multiple termination conditions are specified, the evolution will stop as soon as any one of them is satisfied.

Evolution Observers

Compile and run the above code and, perhaps after a brief pause, you’ll see the following output:

  HELLO WORLD

This is quite probably the most convoluted “Hello World” program you’ll ever write. It also gives no hints as to its evolutionary nature. We can make the program more interesting by adding an EvolutionObserver to report on the progress of the evolution at the end of each generation. Add the following code to your program before the call to the evolve method:

engine.addEvolutionObserver(new EvolutionObserver<String>()
{
    public void populationUpdate(PopulationData<? extends String> data)
    {
        System.out.printf("Generation %d: %sn",
                                    data.getGenerationNumber(),
                                    data.getBestCandidate());
    }
});

Re-compile the program and run it again. This time you’ll see all of the steps taken to arrive at the target string:

  Generation 0: JIKDORHOQZJ
  Generation 1: ULLLFQWZPXG
  Generation 2: UEULKFVFZLS
  Generation 3: KLLLFKZGRLS
  Generation 4: HLLLFKZGRLS
  Generation 5: HEDPOYWOZLS
  Generation 6: HEULKIWWZLD
  Generation 7: HPRLOYWOZLS
  Generation 8: HEULOYWOZLS
  Generation 9: HEULOYWORLS
  Generation 10: HEULOYWORLS
  Generation 11: HPLLK WQRLH
  Generation 12: HEBLOYWQRLS
  Generation 13: HEULOYWOBLA
  Generation 14: HEBLOIWMRLD
  Generation 15: HEBLOIWMRLD
  Generation 16: HEYLFNWQRLD
  Generation 17: HEBLOIWORLS
  Generation 18: HEBLOIWORLT
  Generation 19: HEBLOKWGRLD
  Generation 20: HELLAYWORLS
  Generation 21: HELHOIWORLT
  Generation 22: HEWLOIWORLS
  Generation 23: HEBLOYCORLD
  Generation 24: HELLKQWORLD
  Generation 25: HELLOIWORLT
  Generation 26: HELLOIWORLS
  Generation 27: HELLKQWORLD
  Generation 28: HELLFYWORLD
  Generation 29: HELLOIWORLD
  Generation 30: HELLOIWORLD
  Generation 31: HELLOIWORLD
  Generation 32: HELLOIWORLD
  Generation 33: HELLOIWORLD
  Generation 34: HELLOIWORLD
  Generation 35: HELLOIWDRLD
  Generation 36: HELLOIWORLD
  Generation 37: HELLOIWORLD
  Generation 38: HELLOPWORLD
  Generation 39: HELLOIWORLD
  Generation 40: HELLO WORLD
  HELLO WORLD
This is the second in a short series of articles on practical Evolutionary Computation.  The text is taken from the work-in-progress documentation for the Watchmaker Framework for Evolutionary Computation.  The first article provided an introduction to the field of evolutionary computation.

Further Reading

An Introduction to Genetic Algorithms Introduction to Evolutionary Computing The Blind Watchmaker

Want more articles like this? Subscribe to the feed.

Random Number Generators: There should be only one

Posted in Java by Dan on January 11th, 2009


I came across this interesting RNG-related question on StackOverflow the other day.  Since I spent a fair bit of time constructing an answer, I thought I’d re-post it here and expand on it slightly.

Basically, the questioner wanted to know why the code below produced distinctly non-random results.  It generates 1500 values between 0 and 7 and keeps a tally of how often each value occurs. The results were 1310 fives and 190 sixes (the numbers 0, 1, 2, 3, 4 and 7 did not occur even once).  The expectation was that the values would be uniformly distributed, with each of the eight values occurring an approximately equal number of times.

int[] vals = new int[8];
for (int i = 0; i < 1500; i++)
    vals[new Random(i).nextInt(8)]++;
System.out.println(Arrays.toString(vals));

The most suspicious part of this code is that it creates and seeds a new random number generator for every iteration. My initial, brief answer was the suggestion that the code should use only a single RNG instance. It’s good advice and solves the problem for this particular code, but it doesn’t explain why the original version generates only fives and sixes.  It’s not like every RNG instance has the same seed, in fact every single one is different (the loop index is used as the seed).  Somebody else asked if I knew why the RNGs were behaving the same even though they were seeded differently.  Well I didn’t know, but I decided to find out.

I was pretty sure it was because the seeds were non-random (I’ve discussed the seeding of RNGs previously).  Because the seeds were 64-bit integers in the range 0-1500, the top 53 bits were identical (all zeros) across all of the seeds.  Ideally the RNG implementation would be able to overcome this lack of diversity in the seed data, but apparently not (if we repeat the experiment with a better RNG, such as the MersenneTwisterRNG, we do not see such obvious anomalies).  To see how this all plays out to generate only fives and sixes requires a detour into the source code of java.util.Random.

When you specify a seed in the constructor, it is manipulated as follows:

seed = (seed ^ 0x5DEECE66DL) & mask;

Where the mask simply retains the lower 48 bits and discards the others.

When generating the actual random bits, this seed is manipulated as follows:

randomBits = (seed * 0x5DEECE66DL + 0xBL) & mask;

Now if you consider that the seeds used were sequential (0 – 1499), and they were used once then discarded, the first four seeds generated the following four sets of random bits:

101110110010000010110100011000000000101001110100
101110110001101011010101011100110010010000000111
101110110010110001110010001110011101011101001110
101110110010011010010011010011001111000011100001

Note that the top 10 bits are indentical in each case. This is a problem because we only want to generate values in the range 0-7 (which only requires a few bits) and java.util.Random does this by shifting the higher bits to the right and discarding the low bits. It does this because in the general case the high bits are more random than the low bits. In this case they are not because the seed data was poor.

Finally, to see how these bits convert into the decimal values that we get, you need to know that java.util.Random makes a special case when n is a power of 2. It requests 31 random bits (the top 31 bits from the above 48), multiplies that value by n and then shifts it 31 bits to the right.

Multiplying by 8 (the value of n in this example) is the same as shifting left 3 places. So the net effect of this procedure is to shift the 31 bits 28 places to the right. In each of the 4 examples above, this leaves the bit pattern 101 (or 5 in decimal). In the example code, 1390 different seeds resulted in this particular 3-bit sequence. The others resulted in the sequence 110 (or 6 in decimal).

If we didn’t discard the RNGs after just one value, we would see the sequences diverge. While the first four RNG sequences all start with 5, the second values of each are 6, 0, 2 and 4 respectively. The small differences in the initial seeds start to have an influence.

The Singleton RNG

As this problem illustrates, having multiple RNGs in a single application can be problematic. If the seeds are not completely independent of each other, the outputs will not be independent. In this case the seed of one RNG is derived from the seed of the previous RNG (by incrementing it). Everything is much simpler if we only have a single random number generator.  Fortunately, java.util.Random, SecureRandom and all of the Uncommons Maths RNGs are thread-safe.  A single instance can be accessed from multiple threads without causing problems, so there usually is no need to have multiple RNGs even in complex applications.

There a few ways that you can make sure that you only use a single RNG.  You could use the singleton pattern to restrict the number of instances to one, but that approach is frowned upon these days.  Dependency injection is the more politically correct alternative, just pass your single RNG instance as an argument to the constructor of each object that needs to use random numbers.  A third option is the functional style, passing the RNG as an argument to each method that needs it.

Repeatability in Multi-Threaded Applications

Repeatability is a useful feature of deterministic pseudo-RNGs. If you use the same seed for two RNG instances, they will produce the same output sequence.  This is helpful for debugging applications because it means you can run your random code multiple times and have it behave identically each time.

It seems that the motivation for multiple RNGs in the original StackOverflow question was to maintain this repeatability in the presence of multiple threads (though that is not immediately obvious from the example).  A single RNG won’t be able to guarantee repeatability in this scenario without a little bit of extra work.  The reason for this is the uncertainty introduced by the vagaries of the scheduler.  You can guarantee the repeatability of the RNG output, but you can’t guarantee that individual values will always be picked up by the same threads.

There are a couple of solutions to this problem that don’t involve using one RNG per thread.  They both involve decoupling the generation of the random numbers from their consumption.  The simplest approach, if you know exactly how many numbers each thread needs, is to pre-generate all of the required random numbers and to pass them to the individual threads in arrays or lists.  The numbers are generated in a predictable order but the order in which they are consumed is unspecified (and unimportant because we have partitioned them).

A variation on this approach, for when you don’t know how many numbers are required or when it is impractical to pre-generate all of the values and hold them all at once, is to feed the numbers to the threads using separate queues.  The RNG can fill the queues in a round-robin fashion while the threads are consuming them at their own pace.

Lessons Learned

Want more articles like this? Subscribe to the feed.

Watchmaker Framework for Evolutionary Computation – Version 0.5.0

Posted in Evolutionary Computation, Java by Dan on December 10th, 2008


It’s been very nearly a year since the last release (0.4.3) of the Watchmaker Framework for Evolutionary Computation so, before 2008 disappears completely, it’s time for a new version that includes some of the stuff that I’ve been working on intermittently during this time.

Backwards-(In)compatibility

The primary purpose of the 0.5.0 release is to break backwards-compatibility and upset users.  If you’re already using 0.4.3, there are two main changes that you need to be aware of:

Firstly, the StandaloneEvolutionEngine class has been renamed to ConcurrentEvolutionEngine.  The old name did not reflect what distinguished this EvolutionEngine implementation from other implementations (some of which are yet to be added but will be in future releases).  One such alternative is the new SequentialEvolutionEngine, which performs all work on the request thread and, as such, is useful in managed environments that do not permit control over threading.

Secondly, all methods and constructors that take probabilities as parameters now use the new Probability type rather than Java doubles.  There was too much duplication of range-checking code and code that generated events with a given probability.  All of this has now been encapsulated in one place.

Genetic Programming

The Watchmaker download includes source code for several example applications.  New in this release is the tree-based genetic programming (GP) example.  Genetic Programming is a type of evolutionary algorithm that involves evolving computer programs.  The Watchmaker Framework is a general-purpose evolutionary computation framework.  It does not include any GP-specific classes but there is nothing preventing you from applying it to the task of evolving program trees.  This example is a Watchmaker-powered Java implementation of the first GP problem presented in Toby Segaran’s book, Programming Collective Intelligence.  From a set of example inputs and outputs, it generates a program that encapsulates the formula for converting the inputs to the outputs.

Useful Links

Want more articles like this? Subscribe to the feed.

Smaller Java

Posted in Java by Dan on November 9th, 2008


In my previous post I talked about how to reduce the size of your Java binaries without sacrificing functionality.  Using Proguard to strip out unused and redundant code, I was able to squeeze 1.4 megabytes of already-compressed JAR files (an applet plus its libraries) into 276kb.  The motivation was to reduce download times and data transfer costs for network-launched software (applets and Web Start applications).  An 80% size reduction is pretty impressive but can we do better?  Yes we can.

GZip

A JAR file (or Java ARchive) is simply a zip file with a different extension.  In other words, the contents are already compressed.  Given that we’ve already removed redundant information and zipped the files, you might think that we couldn’t compress the code much further.

Files in a zip archive are compressed individually.  In practice, better compression is often achieved by compressing an archive as a whole so that similarities across separate files can be exploited.  For this, we can use gzip.  Compressing the 276kb JAR file results in a 251kb GZipped JAR file.  This is a reduction of about 9%.  Good but not spectacular.

GZip is more effective when its input is uncompressed.  If we expand the 276kb JAR file, repack it as an uncompressed JAR (use jar -0) and then gzip that, the resultant jar.gz file is a mere 193kb.  Now that’s more like it, a 30% reduction on our already spartan 276kb binary.

At this point it is worth noting that you can’t just embed a jar.gz file in an HTML page.  It won’t work.  Instead, what you need to do is set-up content-negotiation on the web server so that when a browser requests the vanilla applet JAR file, it receives the GZipped version.  This is actually pretty straightforward and we’ll cover that shortly.  But first, I don’t think that 193kb is anywhere near small enough.  Can we do better?  Yes we can.

Pack200

We’ve reached the limits of what we can achieve with general purpose compression techniques.  We’ve also reached a lower limit for applets, since we are reliant on the browser for compression.  For Web Start applications though it’s a different story.

The Sun JDK includes a little-known tool called pack200.  It is a compression utility designed specifically for compressing JAR files.  Because Pack200 understands the class file format used by the archive contents, it is able to make optimisations that are unavailable to general purpose tools.  Pack200 restructures the archive and the class files it contains and then GZips the result.

At this point I really didn’t think there was much scope for further reductions in size.  I was wrong.  That 276kb JAR that we started with, the one that started out as 1.4mb of compressed Java bytecode, the one that was squashed to just 193kb by GZip, was reduced to a tiny 81 kilobytes after Pack200 had finished with it.

Over the course of two blog posts, I’ve reduced my data transfer requirements by 94%.  Despite the difference in size, the two programs are functionaly equivalent.  Of course, I shouldn’t pretend that compression is completely free.  The client machine will have to unpack the archive.  This will increase start-up processing, but since smaller files are downloaded quicker, it’s still likely to be faster overall.

Content-Negotiation

To use either Pack200 or GZip to compress network-launched Java applications requires content-negotiation on the web server.  The client tells the web server what encodings it supports and the web server responds with the most appropriate option.  In the case of applets, the client is the web browser.  None of the browsers that I have tested support Pack200, but they will all accept GZip.  For Web Start applications, the javaws launcher does accept Pack200 so that will be used were available.

Sun’s Pack200 page describes a servlet that can perform the necessary content-negotiation.  However, if you’re not already running a servlet container, it’s probably easier to use the features of your web server.  Chris Nokleberg has written some straightforward instructions on how to achieve this with Apache.

Want more articles like this? Subscribe to the feed.
Next Page »