Aug 1, 2011

O'Reilly OSCON 2011

This year marked the first occurrences of OSCON Data and Java, which were co-located within the OSCON conference itself. In reality, OSCON only gets fully underway on Wednesday and Thursday of the week, and these conferences ran on Monday and Tuesday. This actually works really well. I spent most of my time in the OSCON Data tracks and loved the opportunities I had to meet some new people, make some new friends and hear the latest in the Data and Java worlds. While this is true for a lot of conferences, the value I get out of conversations with certain people in the areas I work in (conversations I would not normally have the opportunity to have) is not to be understated.

Some thoughts inspired by some recent conferences:

The popularity of certain ASF projects like Hadoop makes their mailing lists look like Twitter streams. I am all for transparency and democracy, but are the mailing lists still the best place to handle decisions?

Now that I think the market has a better understanding of which Big Data tools to use for what, my perception is that, as far as public interest and developer adoption go, NoSQL (Structured Big Data) seems to be outpacing Hadoop (Unstructured Big Data). I personally think this is because there is more interest in asking THE SAME questions of the data that the NoSQL adopters have been asking historically, but now with systems that can handle the increase in data volume. This is opposed to Hadoop, which lets you ask NEW questions of your data. These are two very different types of users.

Why are all the Hadoop presentations about Performance, Reliability and Usability? Hadoop works well enough now that Data-related conferences should not be dominated by these sessions. The World Wide Web is a MASSIVE repository of unstructured data just waiting to be turned into insightful information. Where are all the presentations on the new kinds of information we can derive with Hadoop? This is not necessarily a rant at selection committees. It is quite likely a sign of a more worrisome issue: a dearth of submissions in these areas, which might be causally related to the fact that there still aren't too many people actually doing this.

I'm worried I might be picking up an undertone at Data-related conferences (both from the selection committees and the attendees) that if you don't work for a Data startup or Facebook/eBay/LinkedIn/Yahoo/Twitter, you either don't get Big Data or don't have anything interesting to say about it. One example was at OSCON Data, where the LexisNexis session only had 10 people in the room. In my opinion, this was the most interesting session at OSCON Data. Now I of course grant that different folks have different session interests. LexisNexis had a former federal prosecutor present on how they extracted correlations in their system to build a fraud ring case against 100 residents who were all living in a condo complex on the beach in $1.2 million units but, oddly, were all on various forms of welfare. It was awesome, and a little terrifying that they know so much about us. Net net, we need to avoid a Big Data clique/mafia.

At OSCON itself, I co-presented a session with Glenn Gebhart from Vertica, which I have embedded below. The session was about how to use Apache Nutch, Hadoop Java Map/Reduce and Eclipse Plugin, Apache Pig and Vertica to quantify whether or not we are in a Tech Bubble.
