<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dalton Clark</title>
	<atom:link href="http://www.daltonclark.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.daltonclark.com/blog</link>
	<description>Ben Clark&#039;s technology blog</description>
	<lastBuildDate>Fri, 22 Jan 2010 14:59:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Me on Hadoop setup at StyleFeeder, part 2</title>
		<link>http://www.daltonclark.com/blog/2010/01/22/hadoop-setup-at-stylefeeder-part-2-patch-rpm/</link>
		<comments>http://www.daltonclark.com/blog/2010/01/22/hadoop-setup-at-stylefeeder-part-2-patch-rpm/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 14:56:02 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=47</guid>
		<description><![CDATA[This post on the StyleFeeder tech blog is a HOWTO for taking a Cloudera Hadoop distribution in the 0.20 series, patching it for yourself, and running a Hadoop cluster on EC2 based on it.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.tech.stylefeeder.com/2010/01/22/hadoop-for-the-lone-analyst-part-2-patching-and-releasing-to-yourself/">This post</a> on the StyleFeeder tech blog is a HOWTO for taking a Cloudera Hadoop distribution in the 0.20 series, patching it for yourself, and running a Hadoop cluster on EC2 based on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2010/01/22/hadoop-setup-at-stylefeeder-part-2-patch-rpm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Congratulations to StyleFeeder</title>
		<link>http://www.daltonclark.com/blog/2010/01/18/stylefeeder-acquired-by-time/</link>
		<comments>http://www.daltonclark.com/blog/2010/01/18/stylefeeder-acquired-by-time/#comments</comments>
		<pubDate>Tue, 19 Jan 2010 02:08:17 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=43</guid>
		<description><![CDATA[My heartfelt congratulations to my clients and colleagues at StyleFeeder, which is being acquired by Time, Inc.  Time is getting a tremendous asset, technology that will give them an edge, and top talent.
]]></description>
			<content:encoded><![CDATA[<p>My heartfelt congratulations to my clients and colleagues at <a href="http://www.stylefeeder.com">StyleFeeder</a>, which is being <a href="http://online.wsj.com/article/SB10001424052748703626604575011191771805782.html?mod=WSJ_business_whatsNews">acquired</a> by Time, Inc.  Time is getting a tremendous asset, technology that will give them an edge, and top talent.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2010/01/18/stylefeeder-acquired-by-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Me on Hadoop Setup, at StyleFeeder</title>
		<link>http://www.daltonclark.com/blog/2010/01/14/ben-clark-on-hadoop-setup-at-stylefeeder/</link>
		<comments>http://www.daltonclark.com/blog/2010/01/14/ben-clark-on-hadoop-setup-at-stylefeeder/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 15:06:08 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=37</guid>
		<description><![CDATA[My colleagues and clients at StyleFeeder are good enough to let me post on their tech blog from time to time.  I&#8217;m exploring Hadoop on their behalf, as partially described here: http://blog.tech.stylefeeder.com/2010/01/14/hadoop-for-the-lone-analyst/.  That&#8217;s basically a HOWTO for Hadoop 0.20 + Apache logs + MySQL on EC2, with tips on streaming, compression, Pig, Redhat/CentOS [...]]]></description>
			<content:encoded><![CDATA[<p>My colleagues and clients at <a href="http://www.stylefeeder.com">StyleFeeder</a> are good enough to let me post on their tech blog from time to time.  I&#8217;m exploring Hadoop on their behalf, as partially described here: <a href="http://blog.tech.stylefeeder.com/2010/01/14/hadoop-for-the-lone-analyst/">http://blog.tech.stylefeeder.com/2010/01/14/hadoop-for-the-lone-analyst/</a>.  That&#8217;s basically a HOWTO for Hadoop 0.20 + Apache logs + MySQL on EC2, with tips on streaming, compression, Pig, Redhat/CentOS and the <a href="http://www.cloudera.com">Cloudera</a> Python scripts for EC2.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2010/01/14/ben-clark-on-hadoop-setup-at-stylefeeder/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Windows 7 Upgrade: worth it?</title>
		<link>http://www.daltonclark.com/blog/2009/11/14/windows-7-upgrade-worth-it/</link>
		<comments>http://www.daltonclark.com/blog/2009/11/14/windows-7-upgrade-worth-it/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 20:37:34 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[OS]]></category>
		<category><![CDATA[Windows]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=27</guid>
		<description><![CDATA[After a couple of virtual machine upgrades that worked pretty well, I decided to upgrade my main workhorse laptop (a 15&#8243; Lenovo W500) from Windows Vista 64-bit Ultimate to Windows 7 64-bit Ultimate.  You can only upgrade to an identical subtype, without 3rd-party tools or a lot of hacking.  The result?  More [...]]]></description>
			<content:encoded><![CDATA[<p>After a couple of virtual machine upgrades that worked pretty well, I decided to upgrade my main workhorse laptop (a 15&#8243; Lenovo W500) from Windows Vista 64-bit Ultimate to Windows 7 64-bit Ultimate.  You can only upgrade to an identical subtype, without 3rd-party tools or a lot of hacking.  The result?  More disk space than when I started, and suddenly all my devices work. Skype is no longer helpless at finding the built-in video camera.  It had been complaining that the camera was already in use.  When I plug in headphones, the speakers go silent without my having to find that really obscure control panel check box.   The ATI Radeon HD 3650, which Lenovo is calling the ATI Mobility FireGL V5700, seems to have discovered the DisplayPort connection to my Dell 2408WFP, and the external monitor picture is really crisp now.  You have to change the display settings twice: first have it duplicate the display on both monitors, then extend it to the other monitor.  Now it can find the external monitor without having to reboot, and without getting a link failure at some point (usually when you&#8217;re working on something really interesting).  I upgraded to VMware 7 while I was at it, and instead of doing this: <img src="http://www.daltonclark.com/blog/wp-content/uploads/2009/11/vmware-splitscreen.png" alt="vmware-splitscreen" title="vmware-splitscreen" width="480" height="150" class="alignleft size-full wp-image-29" /> (split across two screens), which was totally useless, and in a way comical, it&#8217;s allowing me to run CentOS or Ubuntu across two monitors.  I think it&#8217;s the better drivers/OS combo, not the VMware, that&#8217;s working now.  Come to think of it, it&#8217;s pretty outrageous what all was broken before.  Better late than never.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2009/11/14/windows-7-upgrade-worth-it/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Hadoop in the Enterprise</title>
		<link>http://www.daltonclark.com/blog/2009/10/08/hadoop-in-the-enterprise/</link>
		<comments>http://www.daltonclark.com/blog/2009/10/08/hadoop-in-the-enterprise/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 12:06:57 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=17</guid>
		<description><![CDATA[At Hadoop World NYC 2009, one of the most interesting presentations, from a business point of view, was by JP Morgan Chase.  They couldn&#8217;t share too many details for obvious reasons, but they were talking about cost savings of one, two or three orders of magnitude compared to existing technology.  Peter Krey said [...]]]></description>
			<content:encoded><![CDATA[<p>At Hadoop World NYC 2009, one of the most interesting presentations, from a business point of view, was by JP Morgan Chase.  They couldn&#8217;t share too many details for obvious reasons, but they were talking about cost savings of one, two or three orders of magnitude compared to existing technology.  Peter Krey said humorously that anyone can save 30-40%: if you demand at least an order of magnitude, it takes a lot of fluff projects off the table.  &#8216;Fluff&#8217; wasn&#8217;t the word he used, but you get the idea.  Heh.</p>
<p>To state the obvious, Hadoop is a disruptive technology.  One way this might play out is as a replacement for ETL and data warehousing setups in big companies.  Picture a pipeline of (1) DB2 tables (2) VSAM and other structured files, (3) Oracle OLTP databases, (4) Informatica/Ab Initio/DataStage/whatever jobs filling up (5) Oracle data warehouses, and finally (6) SQL Server cubes connected to front-end applications in the hands of analysts.  There are a lot of variations on this idea out there, but let&#8217;s call it an example of a common pattern.  1, 2, 3 and 6 are hard to dislodge, because they&#8217;re actual operations and high-level-user-facing apps, respectively, but 4 and 5 are pretty ripe, in many cases, to be moved from the special-purpose clusters they tend to run on to a general-purpose Hadoop cluster, on commodity hardware, with probable increased parallelization and massive cost savings.  There&#8217;s a lot of Oracle in that space, and Oracle now has a commanding position relative to the fate of Java.  So let&#8217;s work this angle, but not make it too obvious, or go too far, or maybe Java will start languishing like MySQL.  I jest.  Sort of.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2009/10/08/hadoop-in-the-enterprise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>eBay&#8217;s Mobius Query Language at Hadoop World</title>
		<link>http://www.daltonclark.com/blog/2009/10/08/ebays_mobius_query_language_at_hadoop_world/</link>
		<comments>http://www.daltonclark.com/blog/2009/10/08/ebays_mobius_query_language_at_hadoop_world/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 11:39:06 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=13</guid>
		<description><![CDATA[I went to Cloudera&#8217;s Hadoop World NYC 2009 on Friday: it was quite a show.  
One theme that played out through many presentations was abstraction layers on top of raw Map Reduce.  The two biggest are Pig and Hive, which are Yahoo&#8217;s and Facebook&#8217;s solutions to the same basic problem, of how to [...]]]></description>
			<content:encoded><![CDATA[<p>I went to <a href="http://www.cloudera.com" title="Cloudera--services for Hadoop">Cloudera</a>&#8217;s Hadoop World NYC 2009 on Friday: it was quite a show.  </p>
<p>One theme that played out through many presentations was abstraction layers on top of raw Map Reduce.  The two biggest are <a href="http://hadoop.apache.org/pig/">Pig</a> and <a href="http://hadoop.apache.org/hive/">Hive</a>, which are Yahoo&#8217;s and Facebook&#8217;s solutions to the same basic problem, of how to write less code for repetitive Map Reduce tasks.   There&#8217;s a lot of good commentary out there on those.  Hive is more like a SQL shell, and if you want to extend it, I think you&#8217;re going to be writing, say, Python mappers/reducers and streaming them into/out-of your Hive setup.  With Pig, you&#8217;re operating, as they put it in the training/documentation/O&#8217;Reilly book, which collectively document Pig very well, more at the level of a SQL query optimizer.  You have some iteration facilities, and you can extend it with Java.   Pig does more exactly what you tell it to do, and Hive is something you &#8216;hint&#8217; at. These are general-purpose tools.</p>
<p>In the more specialized area of web analytics, eBay has a very interesting internal tool, called Mobius Query Language, on which Neel Sundaresan gave a fascinating talk.  I&#8217;ll update with a link if Cloudera posts the presentation, but it helps you model visits with landmarks, duration, and some other concepts I didn&#8217;t take notes on.  It clearly helps them wrap their code around the maddeningly amorphous user visit: participating in an auction, bidding, winning, abandoning, etc.  The language seemed general-purpose enough for application to any user-behavior modeling.  The interface is a SQL-like query language that seems, like Hive, to generate Map Reduce jobs based on a nicely abstracted view of exactly the sorts of questions you want to ask your web analytics system.  For the moment, I&#8217;m doing what web analytics I&#8217;m doing by extending Pig, but I hereby declare the Movement to Get eBay to Opensource the Mobius Query Language.  Who&#8217;s with me?</p>
<p>On the conference in general, there is some good commentary out there, from <a href="http://dev.hubspot.com/bid/27047/Hadoop-World-NYC-2009" title="Comments on Hadoop World NYC 2009 by Dan Milstein">Dan Milstein</a>, <a href="http://dev.hubspot.com/bid/27054/Hadoop-World-impressions" title="Comments on Hadoop World NYC 2009 by Steve Laniel">Steve Laniel</a>, <a href="http://www.hilarymason.com/blog/hadoop-world-nyc/" title="Comments on Hadoop World NYC 2009 by Hilary Mason">Hilary Mason</a>, and no doubt others.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2009/10/08/ebays_mobius_query_language_at_hadoop_world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Out from behind the firewall</title>
		<link>http://www.daltonclark.com/blog/2009/10/05/out-from-behind-the-firewall/</link>
		<comments>http://www.daltonclark.com/blog/2009/10/05/out-from-behind-the-firewall/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 17:35:41 +0000</pubDate>
		<dc:creator>Ben Clark</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[web]]></category>

		<guid isPermaLink="false">http://www.daltonclark.com/blog/?p=4</guid>
		<description><![CDATA[I have been working in big enterprises for a few years, but no longer.  Here&#8217;s a cross-post I did at the StyleFeeder tech blog, where I&#8217;m currently working.
]]></description>
			<content:encoded><![CDATA[<p>I have been working in big enterprises for a few years, but no longer.  Here&#8217;s a <a title="cross-post" href="http://blog.tech.stylefeeder.com/2009/09/09/simple-clustering-with-aws-and-free-rightscale/">cross-post</a> I did at the StyleFeeder tech blog, where I&#8217;m currently working.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.daltonclark.com/blog/2009/10/05/out-from-behind-the-firewall/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
