I went to Cloudera's Hadoop World NYC 2009 on Friday, and it was quite a show. One theme that ran through many presentations was abstraction layers on top of raw Map Reduce. The two biggest are Pig and Hive, Yahoo's and Facebook's respective solutions to the same basic problem: how to write less code for repetitive Map Reduce tasks. There's a lot of good commentary out there on both. Hive is more like a SQL shell, and if you want to extend it, I think you're going to be writing, say, Python mappers and reducers and streaming data through them into and out of your Hive setup. With Pig, you're operating more at the level of a SQL query optimizer, as they put it in the training, the documentation, and the O'Reilly book, which collectively document Pig very well. You have some iteration facilities, and you can extend it with Java. Pig does more exactly what you tell it to do, whereas Hive is something you 'hint' at. Both are general-purpose tools.
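To make the streaming idea concrete, here's a minimal sketch of the kind of Python mapper and reducer I have in mind. Everything specific in it is my own assumption for illustration: the two-column (user id, URL) input, the tab-separated line format, and the names map_lines, reduce_lines, and run. It isn't taken from the Hive documentation or from any of the talks.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: turn each tab-separated (user_id, url) row into 'user_id<TAB>1'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip malformed rows instead of failing the whole job
        yield f"{fields[0]}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per key. Assumes the input is already sorted by key."""
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run(step, infile=sys.stdin, outfile=sys.stdout):
    """Hypothetical glue: stream lines from infile through step and write the results."""
    for out in step(infile):
        outfile.write(out + "\n")
```

A script used as the mapper would call run(map_lines), and one used as the reducer would call run(reduce_lines). The point is only that each side is a plain filter over lines, so you can test it locally with a list of strings before handing it to a cluster.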
In the more specialized area of web analytics, eBay has a very interesting internal tool called Mobius Query Language, on which Neel Sundaresan gave a fascinating talk. I'll update with a link if Cloudera posts the presentation. It helps you model visits with landmarks, duration, and some other concepts I didn't take notes on. It clearly helps them wrap their code around the maddeningly amorphous user visit: participating in an auction, bidding, winning, abandoning, etc. The language seemed general-purpose enough to apply to any user-behavior modeling. The interface is a SQL-like query language that seems, like Hive, to generate Map Reduce jobs based on a nicely abstracted view of exactly the sorts of questions you want to ask your web analytics system. For the moment, I'm doing what web analytics I do by extending Pig, but I hereby declare the Movement to Get eBay to Open-Source the Mobius Query Language. Who's with me?