New Adventures in Software


Programming the Semantic Web and Beautiful Data

Posted in Software Development by Dan on June 27th, 2009


As I’ve mentioned previously, I’m a big fan of Toby Segaran’s book Programming Collective Intelligence. It introduces several cutting-edge algorithms for building intelligent web applications through a well-chosen set of compelling example programs. A different author might have made the book a dull, overly mathematical ordeal, but Segaran manages to inspire the reader to find ways to apply these exotic techniques in their own projects. I was therefore interested to discover that he has since collaborated on two new books that will both be released in July.

For Programming the Semantic Web, Segaran has teamed up with Colin Evans and Jamie Taylor. I was unable to find a table of contents for this book but the publisher’s blurb suggests that it will follow the same pragmatic, hands-on formula that worked so well for Programming Collective Intelligence:

With this book, the promise of the Semantic Web — in which machines can find, share, and combine data on the Web — is not just a technical possibility, but a practical reality. Programming the Semantic Web demonstrates several ways to implement semantic web applications, using existing and emerging standards and technologies. With this book, you will learn how to incorporate existing data sources into semantically aware applications and publish rich semantic data.

Programming the Semantic Web will help you:

  • Learn how the semantic web allows new and unexpected uses of data to emerge
  • Understand how semantic technologies promote data portability with a simple, abstract model for knowledge representation
  • Be familiar with semantic standards, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL)
  • Make use of semantic programming techniques to both enrich and simplify current web applications
  • Learn how to incorporate existing data sources into semantically aware applications

Each chapter walks you through a single piece of semantic technology, and explains how you can use it to solve real problems. Whether you’re writing a simple “mashup” or maintaining a high-performance enterprise solution, this book provides a standard, flexible approach for integrating and future-proofing systems and data.

Toby has clearly been keeping himself busy because he’s also found time to co-edit the latest installment in O’Reilly’s Beautiful Code series. In 2007 the original Beautiful Code book presented an eclectic mix of 33 essays about elegance in software design and implementation, each written by a different well-known programmer. The success of this anthology has resulted in O’Reilly issuing three companion volumes in 2009: Beautiful Architecture, Beautiful Security and the forthcoming Beautiful Data: The Stories Behind Elegant Data Solutions (edited by Toby Segaran and Jeff Hammerbacher).

Beautiful Data follows the same format as the other books in the series, with each chapter authored by different expert practitioners. One of these chapters covers the making of the video for Radiohead’s House of Cards single, while another is about data processing challenges faced by NASA’s Mars exploration program.

Clearly I haven’t read either of these books because they are not available yet, so I can’t make any informed recommendations, but they do both look like they could be interesting.

Want more articles like this? Subscribe to the feed.

First Qualifying Solution Submitted for $1 Million Netflix Prize

Posted in Software Development, The Internet by Dan on June 26th, 2009


The word on the street (well, Reddit actually) is that the BellKor’s Pragmatic Chaos team today submitted the first qualifying solution for the Netflix Prize.  If nobody submits a better solution within the next 30 days then they will claim the $1 million reward that has so far eluded the best efforts of thousands of programmers and researchers since the competition was launched in October 2006.

Netflix is a US-based online DVD rental service.  One of their features is that they make movie recommendations to customers based on their previous viewing history.  In order to improve their recommendations system, Netflix has been offering a million dollar reward to any individual or team that is able to develop software that increases the accuracy of these recommendations by at least 10%.

The financial rewards and intellectual challenge of the Netflix Prize have encouraged almost 50,000 individuals and teams to attempt to solve the problem using a vast array of different AI and data-mining techniques.

The BellKor team have overcome such obstacles as the Napoleon Dynamite problem and will no doubt have the champagne on ice while they nervously wait to see if anybody else is able to surpass their results within the next month.

Opera Unite Divides Opinion

Posted in The Internet by Dan on June 17th, 2009


Opera Software would have you believe that yesterday they reinvented the web.  The launch of their new Opera Unite service has received a decent amount of publicity.  By now you’ve probably heard all about it, but if not you can read the details here.

The 10 second summary is that version 10 of Opera’s web browser contains a web server that allows users to serve web content directly from their desktop machines or laptops.  However, this description doesn’t really capture the potential of the platform.

Some commentators have dismissed the announcement with a “so what?”.  Opera Unite content is only going to be available while the user’s computer is switched on and running Opera and will be constrained by their available upload bandwidth (which often isn’t much thanks to the ‘A’ in ADSL).  That doesn’t really cut it when compared to low-cost web hosting packages capable of serving thousands of users, but then the comparison isn’t particularly helpful.

I don’t need Opera Unite to host my personal website from my desktop.  I can install and configure Apache, tweak my firewall/router settings and find a solution to dynamic IP address issues.  The point is that with Opera Unite, you don’t have to do any of that.  Opera have completely eliminated all of that hassle and in doing so have made web serving accessible to even non-technical users.  But that’s only half of the story.  Serving your personal home page via Opera Unite is still sub-optimal.  If you want (semi-)permanent web hosting, pay for some cheap PHP hosting or get a Wordpress.com account.

If somebody gives you an Opera Unite URL, you shouldn’t expect that resource to be still around tomorrow or next year like you would with a link to Wikipedia.  The real value in Opera Unite is in ad hoc sharing and transient collaboration.  Things that were possible but bothersome previously are now trivial because you don’t have to worry about server configuration and networking issues.

For example, say I wanted to invite every reader of this blog to join a chat session.  I could try to find out which IM clients you all use and try to arrange something via MSN Messenger, Skype or Google Talk.  Or I could install and configure my own IRC server.  Or I could try to find a third-party server to host the chat room.  With Opera Unite I can simply open up my lounge and give you all the URL (regardless of which browser you happen to be using).  It just takes a few clicks.  The service is transient.  When we’re done, I kick you all out.

In our chat session I might decide to share some photos or other files with you.  I could send them via e-mail or upload them to an FTP server or a service like Flickr, but again it’s simpler with Unite.  I just enable the appropriate service and share the URL.  You can browse my shared directory and grab what you want directly from my machine.  The link probably won’t work tomorrow, but you won’t need it tomorrow.  Temporary is fine when it’s this easy.

The other service that I’m already finding useful is the media player, which enables me to remotely play my home MP3 collection from the office.  The Unite platform is based on open standards, so it will be interesting to see what other ideas for services people come up with.

Escape Analysis in Java 6 Update 14 – Some Informal Benchmarks

Posted in Java by Dan on May 31st, 2009


Sun recently released update 14 of the Java 6 JDK and JRE.  As well as the usual collection of bug fixes, this release includes some experimental new features designed to improve the performance of the JVM (see the release notes).  One of these is Escape Analysis.

To see what kind of impact escape analysis might have on my applications, I decided to try it on a couple of my more CPU-intensive Java programs.  Escape analysis is turned off by default since it is still experimental.  It is enabled using the following command-line option:

-XX:+DoEscapeAnalysis
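
For anyone who hasn't met the term, escape analysis lets the JIT compiler work out which objects never become visible outside the method (or thread) that creates them, so that it can potentially avoid heap allocation or locking for those objects. The sketch below shows the kind of code that might benefit. The class and method names are made up for illustration, and this is not one of the programs benchmarked in this post.

```java
public class EscapeSketch
{
    /** Small value object that is only ever used inside sumOfSquaredLengths. */
    static final class Vec
    {
        final double x;
        final double y;

        Vec(double x, double y)
        {
            this.x = x;
            this.y = y;
        }

        double lengthSquared()
        {
            return x * x + y * y;
        }
    }

    public static double sumOfSquaredLengths(int n)
    {
        double total = 0;
        for (int i = 0; i < n; i++)
        {
            // This object is never stored in a field, returned, or handed to
            // code that keeps a reference, so it does not "escape" the method.
            Vec v = new Vec(i, 1);
            total += v.lengthSquared();
        }
        return total;
    }

    public static void main(String[] args)
    {
        long start = System.nanoTime();
        double result = sumOfSquaredLengths(50000000);
        long millis = (System.nanoTime() - start) / 1000000;
        System.out.println(result + " in " + millis + "ms");
    }
}
```

Running it with and without the switch (java -server -XX:+DoEscapeAnalysis EscapeSketch) gives a rough feel for the effect, though a loop this simple is not a reliable benchmark.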

Benchmark 1

The first program I tested is a statistical simulation.  Basically it generates millions of random numbers (using Uncommons Maths naturally) and does a few calculations.

VM Switches: -server
95 seconds

VM Switches: -server -XX:+DoEscapeAnalysis
73 seconds

Performance improvement using Escape Analysis: 23%

Benchmark 2

The second program I tested is an implementation of non-negative matrix factorisation.

VM Switches: -server
22.6 seconds

VM Switches: -server -XX:+DoEscapeAnalysis
20.8 seconds

Performance improvement using Escape Analysis: 8%

Conclusions

These benchmarks are neither representative nor comprehensive.  Nevertheless, for certain types of program the addition of escape analysis appears to be another significant step forward in JVM performance.

Upgrading Obsolete Ubuntu Systems

Posted in Linux by Dan on May 30th, 2009


Ubuntu 7.10 (Gutsy Gibbon) recently reached EOL (end-of-life).  Presumably to discourage people from continuing to use a distro that is officially dead, the package repositories for an EOL release disappear from archive.ubuntu.com, which means that if you try to use apt-get to install or upgrade your software you will get 404 errors.

The recommended action for anyone still running 7.10 is to use the do-release-upgrade command to (relatively) painlessly upgrade to a newer, supported version of Ubuntu (remembering to take a backup first, just in case).  The one little catch with this solution is that without access to the package repository, you won’t be able to install the upgrade tool if you don’t already have it.

Fortunately, the Gutsy repository hasn’t been removed completely; it has just been relocated to old-releases.ubuntu.com.  So if you edit /etc/apt/sources.list and replace all occurrences of archive.ubuntu.com with old-releases.ubuntu.com, you will again be able to access the packages for 7.10.  You should then install the update-manager-core package to enable the upgrade.

sudo vi /etc/apt/sources.list
# Refresh the package lists so that apt picks up the relocated repository.
sudo apt-get update
sudo apt-get install update-manager-core

After doing this and before upgrading, it is important to revert the changes made to the sources.list file (i.e. change it back to using archive.ubuntu.com).  This is because the distro upgrade will replace all references to ‘gutsy’ with ‘hardy’ (as in Hardy Heron, the Ubuntu 8.04 release) but will not change the repository addresses.  Since Hardy is hosted at archive.ubuntu.com, leaving it as old-releases.ubuntu.com will cause the upgrade to fail.

sudo vi /etc/apt/sources.list
sudo do-release-upgrade

If all goes well you will end up with a fully functioning Ubuntu 8.04 system.  Hardy Heron is the current LTS (Long Term Support) release.  You have the option of a further upgrade to 9.04 (Jaunty Jackalope), but although it is a more recent release, it will reach EOL earlier because there is no long term support commitment for Jaunty.  Jaunty will reach EOL in October 2010 whereas Hardy will be supported until April 2011 for desktops and April 2013 for servers.

SICP – The most divisive book in Computer Science?

Posted in Software Development by Dan on May 28th, 2009


Structure and Interpretation of Computer Programs (universally referred to as SICP) seems to be mentioned whenever people are discussing the great/classic/essential Computer Science books.  It typically generates a mixed response.

Somebody recently sent a copy (anonymously?) to Python creator Guido van Rossum, apparently as a comment on his supposed ignorance (incidentally, this is an incredibly arsey thing to do).

It seems that SICP is a real love-it-or-hate-it kind of book.  Depending on who you listen to, it’s either a mind-bending classic through which true enlightenment can be achieved, or it’s dull, obvious and poorly written.

The distribution of the reviews for SICP on Amazon (UK) is striking:

[Image: Amazon SICP reviews]

If you haven’t already read it, you can decide for yourself.  The whole thing is available online.  I didn’t get very far the one time I started to read it.  I quickly got bored with the introductory stuff, but I intend to give it another go sometime.  I’ve seen several people recommend the associated video lectures, which may be a better entry point.

Watchmaker 0.6.0 – Evolutionary Computation for Java

Posted in Evolutionary Computation, Java by Dan on April 26th, 2009


Version 0.6.0 of the Watchmaker Framework for Evolutionary Computation is now available for download.  This release incorporates several minor changes that I’ve been making over the last few months.  Consult the changelog for full details, but here are the highlights:

Numerous Improvements to the Evolution Monitor and other Swing Components

The Watchmaker Swing library provides a collection of GUI components that simplify the process of building user interfaces for evolutionary programs. These components have received many improvements for version 0.6.0. As well as controls for manipulating evolution parameters while the program is running, the library also provides an Evolution Monitor component. This provides real-time information about the state of the program, including a view of the fittest candidate so far and a graph showing changes in population fitness over time.

Upgraded to Uncommons Maths 1.2

This means even faster RNGs are available for you to use. It also means that we now use the Uncommons Maths Probability class rather than duplicating it in the framework (this means you may have to change some imports in your code when upgrading from Watchmaker 0.5.x).

Caching Fitness Evaluator

Version 0.6.0 introduces the CachingFitnessEvaluator class. This is a wrapper that provides caching for existing FitnessEvaluator implementations. The results of fitness evaluations are cached so that if the same candidate is evaluated twice, the expense of the fitness calculation can be avoided the second time. The cache uses weak references in order to avoid memory leakage.

Caching of fitness values can be a useful optimisation in situations where the fitness evaluation is expensive and there is a possibility that some candidates will survive from generation to generation unmodified. Programs that use elitism are one example of candidates surviving unmodified. Another scenario is when the configured evolutionary operator does not always modify every candidate in the population for every generation.

Caching of fitness scores is provided as an option rather than as the default Watchmaker Framework behaviour because caching is only valid when fitness evaluations are isolated and repeatable. An isolated fitness evaluation is one where the result depends only upon the candidate being evaluated. This is not the case when candidates are evaluated against the other members of the population.
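
The idea can be sketched in a few lines. This is an illustration only, not the framework's CachingFitnessEvaluator source: the Evaluator interface is a simplified stand-in invented for the example, and a synchronized WeakHashMap is just one way of getting a cache with weakly referenced keys.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CachingSketch
{
    /** Hypothetical, simplified evaluator interface (not the Watchmaker one). */
    public interface Evaluator<T>
    {
        double getFitness(T candidate);
    }

    /** Wraps another evaluator and remembers the scores it has already computed. */
    public static class CachingEvaluator<T> implements Evaluator<T>
    {
        private final Evaluator<T> delegate;

        // WeakHashMap holds its keys weakly, so a cached score does not keep
        // a discarded candidate alive.
        private final Map<T, Double> cache
            = Collections.synchronizedMap(new WeakHashMap<T, Double>());

        public CachingEvaluator(Evaluator<T> delegate)
        {
            this.delegate = delegate;
        }

        public double getFitness(T candidate)
        {
            Double fitness = cache.get(candidate);
            if (fitness == null)
            {
                // Two threads may both get here for the same candidate; that
                // only costs a duplicate evaluation if scores are repeatable.
                fitness = delegate.getFitness(candidate);
                cache.put(candidate, fitness);
            }
            return fitness;
        }
    }

    public static void main(String[] args)
    {
        final AtomicInteger evaluations = new AtomicInteger();
        Evaluator<String> expensive = new Evaluator<String>()
        {
            public double getFitness(String candidate)
            {
                evaluations.incrementAndGet();
                return candidate.length();
            }
        };
        Evaluator<String> cached = new CachingEvaluator<String>(expensive);
        cached.getFitness("HELLO WORLD");
        cached.getFitness("HELLO WORLD");
        System.out.println("Evaluations performed: " + evaluations.get());
    }
}
```

The wrapper only makes sense under the conditions described above: the delegate must return the same score every time it is given the same candidate, and that score must not depend on the rest of the population.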

Mona Lisa Example

After seeing Roger Alsing’s evolution of the Mona Lisa, I was inspired to try to reproduce it using the Watchmaker Framework. I didn’t follow Roger’s methodology but I have come up with something similar. My results aren’t as impressive as his latest efforts but may be interesting anyway. This example was actually included in version 0.5.1 but I didn’t draw attention to it. In 0.6.0 I’ve improved performance and used it to demonstrate the Watchmaker GUI components mentioned above.  You can try it for yourself here.  Maybe you can come up with a combination of parameters that works better than the defaults I have provided?

The Java Language Features that Nobody Uses

Posted in Java by Dan on April 17th, 2009


I read Anthony Goubard’s “Top 10 Unused Java Features” on JavaLobby earlier today.  I agree with some of his selections but I think he missed out a few key features that nobody uses.  Restricting myself to just language features (the API is too huge), here are four more widely unused features of Java.

4. The short data type

You use it? I don’t believe you. Everybody* uses int when they want integers, even if they don’t need a 32-bit range.
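
One likely reason, offered here as an aside rather than something the list itself claims, is that Java promotes short operands to int in arithmetic, so the result has to be cast back. A minimal sketch with made-up names:

```java
public class ShortSketch
{
    public static short next(short s)
    {
        // s + 1 has type int, so "return s + 1;" would not compile.
        // The cast narrows the result and silently wraps on overflow.
        return (short) (s + 1);
    }

    public static void main(String[] args)
    {
        System.out.println(next((short) 41));      // Prints 42.
        System.out.println(next(Short.MAX_VALUE)); // Prints -32768.
    }
}
```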

3. Octal Literals

Who uses Octal these days?** Hexadecimal is a more useful shorthand for binary values.  Worse, the leading-zero notation for Octal literals is just confusing:

int a = 60;
int b = 060;
System.out.println(a + b); // Prints 108.

2. Local Classes

Java has four types of nested class, three of which are widely used.  As well as static nested classes, named inner classes and anonymous inner classes, you can also define named classes within methods, though it’s rare to see one in the wild.

public class TopLevelClass
{
    public void someMethod()
    {
        class LocalClass
        {
            // Some fields and methods here.
        }
 
        LocalClass forLocalPeople = new LocalClass();
    }
}

1. Strict FP

There is probably a programmer out there somewhere for whom Java’s strictfp is vital, but I haven’t met him or her. If you already know what strictfp is used for then you are probably in the top 5% of Java programmers. If you don’t know what strictfp does, here you go, welcome to the top 5%. It’s basically about making sure that your calculations are equally wrong on all platforms.
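
For anyone who has never seen the keyword in use, here is a small sketch (the class and method names are invented for the example). Without strictfp, older JVMs were permitted to keep intermediate results in a wider floating-point format on some hardware, so an overflow like the one below might or might not have happened. With strictfp, the intermediate product has to overflow. Since Java 17 all floating-point evaluation is strict, which makes the keyword redundant on newer versions.

```java
public class StrictSketch
{
    // strictfp may be applied to classes, interfaces and methods.
    public static strictfp double scale(double value, double factor)
    {
        // Evaluated left to right: (value * factor) / factor.
        return value * factor / factor;
    }

    public static void main(String[] args)
    {
        // 1e308 * 10 exceeds Double.MAX_VALUE, so under strict rules the
        // product overflows to Infinity and the division cannot bring it back.
        System.out.println(scale(1e308, 10));
    }
}
```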

* OK, maybe you used to be a C programmer.
** Here’s your rhetorical answer.

5 Ways to Become a Famous Programmer (Probably)

Posted in Software Development by Dan on April 12th, 2009


How do ordinary programmers become famous programmers?  Since I am not already a famous programmer I can’t speak from experience but, from scientific observations of those programmers who are well-known, I have been able to identify the following five strategies for becoming a “thought leader”:

1. Do Great Things

Build software that everybody uses and you’ll become famous.  Easy.

This is the Torvalds method.  Everybody knows who Linus is because of Linux.  And, just to prove it wasn’t a fluke, he followed up by creating Git.  This approach is not fool-proof though.  Not all great projects become famous and not all famous projects have well-known developers.

2. Talk the Talk

Start a blog, get 100,000 readers, retire on the Adsense revenue.

The best known programmers aren’t necessarily known for their programming.  They may still be good programmers but they made their mark by being excellent communicators.  Joel Spolsky is the poster child for this category.  If it weren’t for Joel’s blog who knows whether Fog Creek Software would be in business at all?  By writing regular common sense about software development and management, and by evangelising the programmer’s utopia that he’s building in New York, Joel keeps his company, its products and its offices in the minds of his hundreds of thousands of readers.

Joel’s StackOverflow business partner, Jeff Atwood, is another example of programmer-turned-A-list-blogger.  I would love to know how to get from 200-reader blog to 100k-reader blog.  I’m sure that in Jeff’s case it has a lot to do with the regular posting schedule that he has maintained over a number of years.  Easy-to-read articles, with a hint of danger, posted several times a week lead to an extensive archive of articles that no doubt brings in a huge amount of Google traffic.

Steve Yegge is somebody else you could look to emulate as a blogger, but if you find yourself writing articles so long that they need a full-page table of contents, it’s probably time to lay off the dope.

3. Write the Book

Next time you are in a job interview or pitching for some consulting work and somebody asks “do you know anything about technology XYZ?”, wouldn’t it be great to be able to respond “actually, I wrote the book Professional XYZ in Action for Dummies in a Nutshell”?

There are thousands of software development books published every year.  Technology books have short life spans and publishers are always on the lookout for potential authors to write a book on the next big thing.  If you can demonstrate basic literacy and sufficient technical knowledge, it could be you.  Just don’t expect to get rich from the royalties.  If you divide the amount that you make by the number of hours you spend writing, you’ll be lucky to come out ahead of minimum wage levels.

It’s a big time commitment for modest direct financial rewards but, if you’re playing the long game, writing a book can lead to other opportunities such as speaking at conferences, providing expert consultancy and more.  You also get the satisfaction of seeing your book on Amazon and, better still, you get to harass local bookstores by re-enacting the J.R. Hartley Yellow Pages ad.

4. Become a Cult Figure

Jon Skeet was a respected member of the programming community before the advent of StackOverflow but now, as the number one ranked user by some distance, Jon Skeet is a cult.

5. Work at Google

I don’t know whether being a well-known developer gets you a job at Google or if getting a job at Google makes you a well-known developer but either way there are a lot of famous programmers at Google.

Working at Google is like an extra stamp of credibility.  If you don’t work at Google and you say something stupid, people think you are dumb.  If you do work at Google and you say something stupid, people think that you know something that they don’t.

Using PHPUnit with Hudson

Posted in PHP by Dan on March 25th, 2009


The problem with undocumented standards is that they tend not to be very standardised.  The XML format used by Ant’s JUnitReport task has been adopted, extended and bastardised by several different testing tools to the extent that there are at least half a dozen different dialects currently in use.  This creates a problem for tools like Hudson that try to parse this inconsistent output.  Currently Hudson works correctly with the XML from JUnit, TestNG (including ReportNG) and some other tools but it doesn’t recognise the output from Google Test or PHPUnit.

I was going to make the necessary changes to Hudson so that it accepts the PHPUnit and Google Test variants but I had some problems getting Hudson to build (yay Maven!).  I may return to implementing this fix later but for now I’ve used a quick and dirty hack that massages PHPUnit’s output into a form more acceptable to Hudson.  Since I’m invoking PHPUnit from a shell script, I can use sed to make the necessary modifications.  In PHPUnit’s case, I just need to eliminate the nesting of <testsuite> tags, which can be done by deleting the third and penultimate lines of the XML file:

# Tweak the test result XML to make it acceptable to Hudson by deleting
# the third and penultimate lines (the nested <testsuite> tags).
lines=$(wc -l < test-results/results.xml)
end=$((lines - 1))
sed -i "${end}d;3d" test-results/results.xml