Hadoop 2.2.0 – Single Node Cluster

We’re going to use the Hadoop tarball we compiled earlier to run a pseudo-cluster, that is, a one-node cluster on a single machine. If you haven’t already read the tutorial on building the tarball, please head over and do that first.

Getting started with Hadoop 2.2.0 – Building

Start up your (virtual) machine and log in as the user ‘hadoop’. First, we’re going to set up the essentials required to run Hadoop. By the way, if you are running a VM, I suggest you kill the machine used for building Hadoop and restart from a fresh instance of Ubuntu to avoid any compatibility issues later. For reference, the OS we are using is 64-bit Ubuntu 12.04.3 LTS.

I wrote a tutorial on getting started with Hadoop back in the day (around mid-2010). It turns out that the distro has moved on quite a bit with the latest versions, so that tutorial is unlikely to work. I tried setting up Hadoop on a single-node “cluster” using Michael Noll’s excellent tutorial, but that too was out of date. And of course, the official documentation on Hadoop’s site is lame.

Having struggled for two days, I finally got the steps smoothed out, and this is an effort to document them for future use.

