Hadoop in the Enterprise

At Hadoop World NYC 2009, one of the most interesting presentations, from a business point of view, was by JP Morgan Chase. They couldn't share too many details for obvious reasons, but they were talking about cost savings of one, two, or three orders of magnitude compared to existing technology. Peter Krey said, humorously, that anyone can save 30-40%; if you demand at least an order of magnitude, it takes a lot of fluff projects off the table. 'Fluff' wasn't the word he used, but you get the idea. Heh.

To state the obvious, Hadoop is a disruptive technology. One way this might play out is as a replacement for the ETL and data warehousing setups in big companies. Picture a pipeline of (1) DB2 tables, (2) VSAM and other structured files, and (3) Oracle OLTP databases feeding (4) Informatica/Ab Initio/DataStage/whatever jobs, which fill up (5) Oracle data warehouses and, finally, (6) SQL Server cubes connected to front-end applications in the hands of analysts. There are a lot of variations on this idea out there, but call it an example of a common pattern. Stages 1 through 3 and stage 6 are hard to dislodge, because they're actual operations and user-facing apps, respectively. Stages 4 and 5, though, are ripe in many cases to be moved off the special-purpose clusters they tend to run on and onto a general-purpose Hadoop cluster on commodity hardware, with probable gains in parallelization and massive cost savings. (A sketch of what one of those stage-4 jobs might look like follows at the end of this post.)

There's a lot of Oracle in that space, and Oracle now has a commanding position relative to the fate of Java. So let's work this angle, but not make it too obvious, or go too far, or maybe Java will start languishing like MySQL. I jest. Sort of.
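To make the stage-4/5 idea concrete, here's a minimal sketch of the kind of job that migrates well: parse pipe-delimited transaction extracts, drop malformed records, and roll up amounts by account, written against Hadoop's MapReduce API. The class names, field layout, and delimiter are hypothetical stand-ins for whatever the real ETL job does; an actual migration would carry far more validation and business logic.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical stand-in for an ETL job: account|date|amount records in,
// per-account totals out.
public class TxnRollup {

    // Map: split each pipe-delimited line, keep well-formed rows,
    // emit (accountId, amount).
    public static class ParseMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\\|");
            if (fields.length < 3) return;            // skip malformed rows
            try {
                ctx.write(new Text(fields[0]),        // account id
                          new DoubleWritable(Double.parseDouble(fields[2])));
            } catch (NumberFormatException e) {
                // unparseable amount: drop it, as the ETL job's reject step would
            }
        }
    }

    // Reduce: sum amounts per account -- the aggregation the warehouse
    // load would otherwise do.
    public static class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text account, Iterable<DoubleWritable> amounts,
                              Context ctx) throws IOException, InterruptedException {
            double total = 0;
            for (DoubleWritable a : amounts) total += a.get();
            ctx.write(account, new DoubleWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "txn-rollup");
        job.setJarByClass(TxnRollup.class);
        job.setMapperClass(ParseMapper.class);
        job.setCombinerClass(SumReducer.class);   // sums are associative, so combine early
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run it with something like `hadoop jar txn-rollup.jar TxnRollup /etl/extracts /etl/rollups` (paths hypothetical). The point is less the code than where it runs: the same aggregation that ties up a licensed ETL cluster spreads across as many commodity nodes as you care to buy.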