ReportNG Conquers Google, No Longer Just a Typo
This time last year, if you typed “reportng” into Google, you would get this:

This was Google’s way of telling the world that my project was insignificant, that the only people who would type that particular search term were those without full control of their fingers.
The Subversion repository for ReportNG dates back to September 2006. I’ve been waiting some time for Google to acknowledge its existence:
I didn’t have to think too long over the name for ReportNG. Unexciting as it is, it was a fairly obvious choice given its relation to TestNG. It also has a built-in barometer for success: when Google stops asking “Did you mean reporting?”, I’ve made it.
Well, it seems I’ve finally “made it” – whatever that means. It feels like a bit of an anti-climax but, for what it’s worth, Google now considers ReportNG to be something more than a careless mistake:

Jack Nicholson, Software Developer

Jack’s new career in software development was going pretty well, but he didn’t appreciate the daily stand-up meetings.
Zeitgeist 1.0 – An Intelligent RSS News Aggregator
I recently signed up for GitHub. Compared to Java.net or SourceForge, it provides a much lower barrier to entry for code hosting. There’s no need to wait an indeterminate period of time for somebody to approve your project, you just upload it. And because it’s a DVCS, it’s easy for other people to fork your projects and submit patches. Open Source project hosting has become so straightforward, thanks to sites like GitHub and the Bazaar-based Launchpad, that it encourages developers to open up code that they might otherwise have kept to themselves. After all, why bother with local repositories and back-ups when you can get somebody else to do it for you and get free web-hosting and issue-tracking too?
I have a number of trivial and incomplete projects hosted in local Subversion repositories. I am slowly adding to GitHub those that have any worthwhile substance to them. I’m making no promises about the quality of this code, and I don’t intend to spend much time supporting it, but I’m putting it out there in case somebody might have a use for it.
First up is Zeitgeist. This is a small Java library/application for identifying common topics among a set of news articles downloaded from RSS feeds. It’s sort of like what Google News does. There is a basic HTML publisher included that generates a web page for displaying the current top news stories, including relevant pictures.
You give the program a list of RSS feeds that cover a certain topic (maybe world news, or music news, or a particular sport) and it uses non-negative matrix factorisation to detect similarities in the article contents and to group the articles by topic. The original idea comes from Programming Collective Intelligence.
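For the curious, here is a minimal Java sketch of non-negative matrix factorisation using the multiplicative update rules (from Lee and Seung) that Programming Collective Intelligence describes. It is not Zeitgeist’s actual code: the class name `Nmf`, the toy word-count matrix, the fixed iteration count and the seed are all assumptions for illustration. Each row of `v` is an article and each column counts occurrences of one word. The factorisation produces a weights matrix (articles × features) and a features matrix (features × words), and an article’s strongest feature can be treated as its topic.

```java
import java.util.Random;

/**
 * Minimal NMF sketch (Lee and Seung multiplicative updates). V (articles x words)
 * is approximated by W x H, where W is articles x features and H is features x words.
 */
public class Nmf {

    /** Multiplies an (n x m) matrix by an (m x p) matrix. */
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        double[][] c = new double[n][p];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < m; k++)
                for (int j = 0; j < p; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    static double[][] randomMatrix(int rows, int cols, Random rng) {
        double[][] r = new double[rows][cols];
        for (double[] row : r)
            for (int j = 0; j < cols; j++)
                row[j] = rng.nextDouble(); // non-negative starting values
        return r;
    }

    /** Sum of squared differences between V and W x H. */
    public static double squaredError(double[][] v, double[][] w, double[][] h) {
        double[][] wh = multiply(w, h);
        double err = 0;
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v[0].length; j++) {
                double d = v[i][j] - wh[i][j];
                err += d * d;
            }
        return err;
    }

    /** Returns {W, H} after the given number of update rounds. */
    public static double[][][] factorise(double[][] v, int features, int iterations, Random rng) {
        double[][] w = randomMatrix(v.length, features, rng);
        double[][] h = randomMatrix(features, v[0].length, rng);
        final double eps = 1e-9; // guards against division by zero
        for (int it = 0; it < iterations; it++) {
            // H <- H .* (W^T V) ./ (W^T W H), element-wise
            double[][] wt = transpose(w);
            double[][] hNum = multiply(wt, v);
            double[][] hDen = multiply(multiply(wt, w), h);
            for (int i = 0; i < h.length; i++)
                for (int j = 0; j < h[0].length; j++)
                    h[i][j] *= hNum[i][j] / (hDen[i][j] + eps);
            // W <- W .* (V H^T) ./ (W H H^T), element-wise
            double[][] ht = transpose(h);
            double[][] wNum = multiply(v, ht);
            double[][] wDen = multiply(w, multiply(h, ht));
            for (int i = 0; i < w.length; i++)
                for (int j = 0; j < w[0].length; j++)
                    w[i][j] *= wNum[i][j] / (wDen[i][j] + eps);
        }
        return new double[][][] {w, h};
    }

    public static void main(String[] args) {
        // Toy word counts: articles 0-1 mostly use words 0-1, articles 2-3 use words 2-3.
        double[][] v = {{3, 2, 0, 0}, {2, 3, 0, 0}, {0, 0, 4, 1}, {0, 0, 1, 4}};
        double[][][] wh = factorise(v, 2, 200, new Random(42));
        for (int a = 0; a < v.length; a++) {
            double[] weights = wh[0][a];
            int topic = weights[0] >= weights[1] ? 0 : 1; // strongest feature = topic
            System.out.printf("article %d -> topic %d%n", a, topic);
        }
        System.out.printf("squared error: %.3f%n", squaredError(v, wh[0], wh[1]));
    }
}
```

Because the starting matrices are random, different seeds can lead to different groupings, so a sketch like this only gives repeatable output when the seed is fixed.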
The default HTML output looks a bit like the image below, but you could customise it with CSS or by hacking the default templates to modify what information is included (for example, you could add an excerpt instead of just displaying headlines).
The algorithm is not infallible and how well it works depends a lot on the feeds that you select. It’s also non-deterministic, so if you run it multiple times with the same input you will get variations in the output. Perhaps Zeitgeist is not that useful in its current form but it could be used for adding on-topic news headlines to a website or as the basis for something more advanced.
Programmers’ CVs – 20 years behind the times?
Take a programmer’s CV/résumé from the late 1980s and one from today and, aside from the content, what has changed?
Not much. Both will typically be approximately two pages of static, word-processed, black text on white A4 paper (or US Letter in North America). Maybe the text doesn’t always arrive on actual paper these days thanks to the miracles of electronic document transfer, but the format of the typical CV has not really changed since the demise of the typewriter.
NOTE: Just to be clear, I am considering CVs and résumés to be equivalent. Wikipedia makes a distinction but I’m assuming that’s mainly an American thing. In the UK there is typically no distinction and the Latin term “Curriculum Vitae”, abbreviated to “C.V.”, is almost universally preferred to the French “résumé” (I’ve no idea why there isn’t an English word for this type of document).
Have we really found the optimal way of communicating our skills and experiences, or has the humble CV been neglected by the Internet revolution? The IT recruitment industry seems wedded to the Microsoft .DOC format. This is partly because of the ubiquity of Microsoft Office in the corporate world and partly because agents prefer to receive CVs in a format that they can easily edit (which is why I insist on sending my CV as a PDF).
Dynamic CVs
Where’s the innovation? Shouldn’t a CV be something more dynamic? And why are we still e-mailing attachments every time we want somebody to see our CVs? Attached files very quickly become outdated. I often have recruiters that I’ve spoken to in the past phoning me to ask for an updated CV. If my CV were a URL, people would always be able to see the latest version (assuming I let them have access). We do have LinkedIn profiles, which are fine in the context of your LinkedIn network but don’t really work as a general-purpose CV.
Fortunately, there are some people trying to drag the programmer’s CV into the 21st century. Ben Northrop’s coderscv.com provides a free online home for your programming CV. The documents are nicely presented and the timeline view is a neat way to display your own personal history. VisualCV goes even further and embraces multimedia content, though the site is not IT-specific. Maybe it’s not a good idea to have a video as part of your CV but it’s nice to have the choice.
If you are planning to break a few conventions with your CV, either on the web or in a static file, it would be useful to measure the impact of any changes that you make, so you might be interested in TrackMyCV.com, which I saw announced today on Reddit. You use it to make documents trackable in pretty much the same way spammers embed 1-pixel images in e-mails in order to see which messages get read.
StackOverflow Careers
Another attempt to bring programmer CVs into the Web 2.0 age is the recently announced StackOverflow Careers. Despite some minor imperfections, the main StackOverflow site has been a phenomenal success. Co-founder Joel Spolsky has had previous success with the Joel on Software jobs board and we are reminded that he wrote a book on recruiting programmers, so this kind of job-related monetisation was the obvious next step.
Voting peculiarities and reputation anomalies aside, StackOverflow is a meritocracy of sorts and it is this that Joel and business partner Jeff Atwood are attempting to exploit. A CV posted on StackOverflow Careers will be accompanied by a reputation score and a history of contributions to the programming community. The careers feature is currently in beta and is not particularly sophisticated at present but I expect it to expand over time.
I like the idea of expanding the scope of a CV to include other online evidence of programming competence. In this case it’s StackOverflow reputation but it could be Ohloh data or information from Google Code/SourceForge. However, at $99 to list your CV for a year, the current pricing is ridiculous. Most recruitment sites charge employers but let candidates use the service for free. Joel and Jeff are taking a fee from both sides. The justification for charging candidates is that it will ensure that the only CVs listed on the service will be from people who are actively looking for work, increasing the value of the service to potential employers. It should also mean that your CV page is kept free from advertising.
I suspect that Joel and Jeff are aware that $99 is too much but it’s easier to start out too expensive and reduce the price than it is to do the opposite. The $99 figure also serves to make the introductory offer of $29 for 3 years look more reasonable in comparison. The problem with the introductory offer is that it only runs until 9th November. The site is still in beta and the functionality for employers to sign up has not been launched yet. So, if you sign up now to get the reduced rate, you’ll be paying $29 to list your CV on a site that is not used by any employers. It’s an unproven platform with a bootstrapping problem. Most programmers won’t want to pay money for a speculative premium service and employers are likely to be reluctant to sign up to search a database with so few potential hires.
Attention to Detail
A thought for the day, courtesy of Landon Dyer (no relation) a.k.a. DadHacker.
“Good programs do not contain spelling errors or have grammatical mistakes. I think this is probably a result of fractal attention to detail; in great programs things are correct at all levels, down to the periods at the ends of sentences in comments.”
10 Tips for Publishing Open Source Java Libraries
One of the strengths of the Java ecosystem is the huge number of open source libraries that are available. There are often several alternatives when you need a library that provides some specific functionality. Some library authors make it easy to evaluate and use their libraries while others don’t. Open source developers may not care whether their libraries are widely used but I suspect that many are at least partially motivated by the desire to see their projects succeed. With that in mind, here’s a checklist of things to consider to give your open source Java library the best chance of widespread adoption.
1. Make the download link prominent.
If other people can’t figure out how to download your project, it’s not going to be very successful. I’m bemused by the number of open source projects that hide their download links some place obscure. Put it in a prominent location on the front page. Use the word “download” and use large, bold text so that it can’t be missed.
2. Be explicit about the licence.
Potential users will want to know whether your licensing is compatible with their project. Don’t make users have to download and unzip your software in order to find out which licence you use. Display this information prominently on the project’s home page (don’t leave it hidden away in some dark corner of SourceForge’s project pages).
3. Prefer Apache, BSD or LGPL rather than GPL.
Obviously you are free to release your library under any terms that you choose. It’s your work and you get to decide who uses it and how. That said, while the GPL may be a fine choice for end user applications, it doesn’t make much sense for libraries. If you pick a copyleft licence, such as the GPL, your library will be doomed to irrelevance. Even the Free Software Foundation acknowledges this (albeit grudgingly), hence the existence of the LGPL.
The viral nature of the GPL effectively prevents commercial exploitation of your work. This may be exactly what you want, but it also prevents your library from being used by open source projects that use a more permissive licence. This is because they would have to abandon the non-copyleft licence and switch to your chosen licence. That isn’t going to happen.
4. Be conservative about adding dependencies.
Every third-party library that your library depends on is a potential source of pain for your users. They may already depend on a different version of the same library, which can lead to JAR Hell (such problems can be mitigated by using a tool such as Jar Jar Links to isolate dependencies). Injudicious dependencies can also greatly increase the size of your project and every project that uses it. Don’t introduce a dependency unless it adds real value to your library.
5. Document dependencies.
Ideally you should bundle all dependent JARs with your distribution. This makes it much easier for users to get started. Regardless, you should document exactly which versions of which libraries your library requires. NoClassDefFoundError is not the most friendly way to communicate this information.
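Where bundling isn’t possible, one option is for the library to check for its dependencies up front and fail with a clear message. This is a minimal sketch under my own assumptions: `DependencyCheck` is a hypothetical helper, and `com.example.widgets.WidgetFactory` and “Example Widgets 2.1” stand in for a real third-party library. It only checks that a class is present, not which version of the library is on the classpath.

```java
/** Hypothetical guard that reports a missing dependency more helpfully than NoClassDefFoundError. */
public final class DependencyCheck {

    private DependencyCheck() {
    }

    /**
     * Throws IllegalStateException naming the missing library if className cannot be found.
     * Passing false to Class.forName means the class is located but not initialised.
     */
    public static void require(String className, String libraryDescription) {
        try {
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Missing dependency: " + libraryDescription
                    + " (class " + className + " was not found on the classpath).", e);
        }
    }

    public static void main(String[] args) {
        // A JDK class is always present, so this check passes silently.
        require("java.util.List", "the Java class library");
        try {
            // Hypothetical third-party class, assumed to be absent for this demo.
            require("com.example.widgets.WidgetFactory", "Example Widgets 2.1 or later");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A library might call `require` once from a static initialiser or factory method, so users see the message before any code that needs the dependency runs.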
6. Avoid depending on a logging framework.
Depending on a particular logging framework will cause a world of pain for half of your users. Some people like to use Sun’s JDK logging classes to avoid an external dependency, while others prefer Log4J because Sun’s JDK logging isn’t very good. SimpleLog is another alternative.
If you pick the “wrong” logging framework you force your users to make a difficult choice. Either they maintain two separate logging mechanisms in their application, or they replace their preferred framework with the one you insisted that they use, or (more likely) they replace your library with something else.
For most small-to-medium-sized libraries, logging is not a necessity. Problems can be reported to the application code via exceptions and can be logged there. Incidental informational logging can usually be omitted (unless you’ve written something like Hibernate, which really does need trace logging so that you can figure out what is going on).
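As a sketch of what reporting via exceptions might look like, here is a hypothetical `key=value` parser (the `ConfigParser` and `ConfigException` names are invented for this example). The library never touches a logging framework; the application catches the exception and logs it however it likes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical library code that reports problems to the caller instead of logging them. */
public class ConfigParser {

    /** Thrown when a line cannot be parsed; the caller decides whether and how to log it. */
    public static class ConfigException extends Exception {
        private final int lineNumber;

        public ConfigException(String message, int lineNumber) {
            super(message + " (line " + lineNumber + ")");
            this.lineNumber = lineNumber;
        }

        public int getLineNumber() {
            return lineNumber;
        }
    }

    /** Parses "key=value" lines, ignoring blank ones. No logging framework is referenced. */
    public static Map<String, String> parse(String... lines) throws ConfigException {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new ConfigException("Expected key=value but found \"" + line + "\"", i + 1);
            }
            result.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        try {
            parse("name=demo", "broken line");
        } catch (ConfigException e) {
            // Application code: log with whichever framework the application already uses.
            System.err.println("Configuration error: " + e.getMessage());
        }
    }
}
```

The exception carries enough context (the message and line number) for the application to produce a useful log entry without the library choosing a logging back-end.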
7. If you really need logging, use an indirect dependency.
OK, so not all libraries can realistically avoid logging. The solution is to use a logging adapter such as SLF4J. This allows you to write log messages and your users to have the final say over which logging back-end gets used.
8. Make the Javadocs available online.
Some libraries only include API docs in the download or, worse still, don’t generate them at all. If you’re going to have API documentation (and it’s not exactly much effort with Javadoc), put it on the website. Potential users can get a feel for an API by browsing its classes and methods.
9. Provide a minimal example.
In an ideal world your library will be accompanied by a beautiful user manual complete with step-by-step examples for all scenarios. In the real world all we want is a code snippet that shows how to get started with the library. Your online Javadocs can be intimidating if we don’t know which classes to start with.
10. Make the JAR files available in a Maven repository.
This is one that I haven’t really followed through on properly for all of my projects yet, though I intend to. That’s because I don’t use Maven, but some people like to. These people will be more likely to use your library if you make the JAR file(s) available in a public Maven repository (such as Java.net’s). You don’t have to use Maven yourself to do this as there is a set of Ant tasks that you can use to publish artifacts.
That’s Gotta Hurt – Netflix Prize Snatched Away at Last Moment?
30 days ago, the BellKor’s Pragmatic Chaos team submitted the first qualifying solution for the $1 million Netflix prize. Once a team reaches the 10% improvement threshold, the other teams have 30 days to respond, and the prize goes to the best-performing solution at the end of that period.
BellKor achieved 10.05% on 26th June and have since moved on to 10.08%. Several teams that were close to the qualifying mark responded by forming coalitions in a frantic race to find a hybrid solution that would surpass BellKor’s mark before the end of the 30-day period.
The Ensemble is one of these super teams. They achieved the 10% mark two days ago and then today, on the very last day of the competition, they appear to have dramatically snatched the prize with a submission that is just 0.01% better than BellKor’s.
UPDATE: BellKor subsequently submitted an entry that matched the Ensemble’s 10.09% only for the Ensemble to trump that 20 minutes later with a score of 10.10%, 4 minutes before the submissions closed.
UPDATE 2: Simon Owens has posted an interview with one of the members of the winning Ensemble team.
UPDATE 3: The Ensemble themselves have posted an account of the nail-biting final minutes of the competition.
First Qualifying Solution Submitted for $1 Million Netflix Prize
The word on the street (well, Reddit actually) is that the BellKor’s Pragmatic Chaos team today submitted the first qualifying solution for the Netflix Prize. If nobody submits a better solution within the next 30 days then they will claim the $1 million reward that has so far eluded the best efforts of thousands of programmers and researchers since the competition was launched in October 2006.
Netflix is a US-based online DVD rental service. One of their features is that they make movie recommendations to customers based on their previous viewing history. In order to improve their recommendations system, Netflix has been offering a million dollar reward to any individual or team that is able to develop software that increases the accuracy of these recommendations by at least 10%.
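Netflix measured accuracy using the root mean squared error (RMSE) between predicted and actual ratings, and the 10% target was a reduction in RMSE relative to Netflix’s own Cinematch system. The following Java sketch shows the arithmetic; the class name `RmseImprovement` and all of the ratings in it are made up for illustration and are not Netflix data.

```java
/** Sketch: RMSE and percentage improvement over a baseline, with invented example ratings. */
public final class RmseImprovement {

    private RmseImprovement() {
    }

    /** Root mean squared error between predicted and actual ratings. */
    public static double rmse(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and the same length");
        }
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    /** Percentage by which candidateRmse improves on baselineRmse (positive means better). */
    public static double improvementPercent(double baselineRmse, double candidateRmse) {
        return 100.0 * (baselineRmse - candidateRmse) / baselineRmse;
    }

    public static void main(String[] args) {
        double[] actual = {4, 3, 5, 2};           // invented true ratings
        double[] baseline = {3, 3, 4, 3};         // invented baseline predictions
        double[] candidate = {3.5, 3, 4.5, 2.5};  // invented improved predictions
        double base = rmse(baseline, actual);
        double cand = rmse(candidate, actual);
        System.out.printf("baseline %.4f, candidate %.4f, improvement %.1f%%%n",
                base, cand, improvementPercent(base, cand));
    }
}
```

With these invented numbers the candidate’s RMSE is exactly half the baseline’s, a 50% improvement; a qualifying prize entry needed an improvement of at least 10% against Cinematch.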
The financial rewards and intellectual challenge of the Netflix Prize have encouraged almost 50,000 individuals and teams to attempt to solve the problem using a vast array of different AI and data-mining techniques.
The BellKor team have overcome such obstacles as the Napoleon Dynamite problem and will no doubt have the champagne on ice while they nervously wait to see if anybody else is able to surpass their results within the next month.
5 Ways to Become a Famous Programmer (Probably)
How do ordinary programmers become famous programmers? Since I am not already a famous programmer I can’t speak from experience but, from scientific observations of those programmers who are well-known, I have been able to identify the following five strategies for becoming a “thought leader”:
1. Do Great Things
Build software that everybody uses and you’ll become famous. Easy.
This is the Torvalds method. Everybody knows who Linus is because of Linux. And, just to prove it wasn’t a fluke, he followed up by creating Git. This approach is not fool-proof though. Not all great projects become famous and not all famous projects have well-known developers.
2. Talk the Talk
Start a blog, get 100,000 readers, retire on the Adsense revenue.
The best-known programmers aren’t necessarily known for their programming. They may still be good programmers but they made their mark by being excellent communicators. Joel Spolsky is the poster child for this category. If it weren’t for Joel’s blog, who knows whether Fog Creek Software would be in business at all? By regularly dispensing common sense about software development and management, and by evangelising the programmer’s utopia that he’s building in New York, Joel keeps his company, its products and its offices in the minds of his hundreds of thousands of readers.
Joel’s StackOverflow business partner, Jeff Atwood, is another example of programmer-turned-A-list-blogger. I would love to know how to get from a 200-reader blog to a 100k-reader blog. I’m sure that in Jeff’s case it has a lot to do with the regular posting schedule that he has maintained over a number of years. Easy-to-read articles, with a hint of danger, posted several times a week lead to an extensive archive of articles that no doubt brings in a huge amount of Google traffic.
Steve Yegge is somebody else you could look to emulate as a blogger, but if you find yourself writing articles so long that they need a full-page table of contents, it’s probably time to lay off the dope.
3. Write the Book
Next time you are in a job interview or pitching for some consulting work and somebody asks “do you know anything about technology XYZ?”, wouldn’t it be great to be able to respond “actually, I wrote the book Professional XYZ in Action for Dummies in a Nutshell”?
There are thousands of software development books published every year. Technology books have short life spans and publishers are always on the lookout for potential authors to write a book on the next big thing. If you can demonstrate basic literacy and sufficient technical knowledge, it could be you. Just don’t expect to get rich from the royalties. If you divide the amount that you make by the number of hours you spend writing, you’ll be lucky to come out ahead of minimum wage levels.
It’s a big time commitment for modest direct financial rewards but, if you’re playing the long game, writing a book can lead to other opportunities such as speaking at conferences, providing expert consultancy and more. You also get the satisfaction of seeing your book on Amazon and, better still, you get to harass local bookstores by re-enacting the J.R. Hartley Yellow Pages ad.
4. Become a Cult Figure
Jon Skeet was a respected member of the programming community before the advent of StackOverflow but now, as the number one ranked user by some distance, Jon Skeet is a cult.
5. Work at Google
I don’t know whether being a well-known developer gets you a job at Google or if getting a job at Google makes you a well-known developer but either way there are a lot of famous programmers at Google.
Working at Google is like an extra stamp of credibility. If you don’t work at Google and you say something stupid, people think you are dumb. If you do work at Google and you say something stupid, people think that you know something that they don’t.
Software Naming Revisited
What do I know about naming software projects? Maybe it’s not such a good idea to give your project a name which is a common typo of a common word?
Google Search suggests that it’s a mistake:

TestNG has achieved sufficient popularity to overcome that problem. ReportNG is not there yet.
Google Alerts doesn’t like it much either. I have alerts for each of my projects so I can see where they are being used and respond to any queries that may arise elsewhere on the web. An alert for “ReportNG” results in an e-mail every time somebody somewhere on the web misspells “reporting”. Not particularly helpful. So I tried “ReportNG AND TestNG”. Now I get an e-mail for every slack-fingered typist who manages to make two separate typos on the same page.

