Cloud, Geek stuff, Tutorials, Video

Getting Started with Amazon EC2 – Install, Configure, Connect

The second in the series of Amazon Web Services tutorial videos. Here we describe how to create a key pair for shell access to an Amazon Machine Image instance, how to do basic firewall configuration to enable remote login, and finally how to start and connect to a basic Amazon Linux AMI.
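
For reference (this is a rough sketch, not a transcript of the video): once the key pair is downloaded and port 22 is opened in the security group, connecting to an Amazon Linux instance from a shell looks roughly like this. The key file name and hostname below are placeholders.

$ chmod 400 my-keypair.pem
$ ssh -i my-keypair.pem ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com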

All comments and questions would be most welcome.

Produced by: Mohammad Nauman (voice) and Toqeer Ali

Geek stuff, resources, Tutorials

Inserting Source Code and LaTeX in WordPress

A friend of mine asked me the other day how ‘pretty printed’ source code can be inserted into WordPress posts. Here’s how:

If you have a self-hosted WordPress server, get the ‘SyntaxHighlighter Evolved’ plugin; the defaults work fine. If you have a wordpress.com account, the plugin is already built in. For LaTeX, you need the ‘WP-LaTeX’ plugin. After that, you can insert source code using the syntax:

[ sourcecode language="lang" ]
 ... code here (in actual use, there is no space after the [ in the lines above and below).
[ / sourcecode]

You can get a list of supported languages here. LaTeX code can be inserted using the syntax:

$ latex 2^x $

Again, in actual use there is no space after the $ in the above code. There you go!
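
As a concrete (and purely illustrative) example, a highlighted Java snippet plus an inline formula would go into the post editor like this; remember to drop the extra spaces inside the brackets and after the dollar sign when you actually paste it:

[ sourcecode language="java" ]
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, WordPress!");
    }
}
[ / sourcecode]

$ latex e^{i\pi} + 1 = 0 $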

Geek stuff, Linux, Tutorials

Install, Configure and Execute Apache Hadoop from Source

Hadoop is Apache’s implementation of the brand-spanking new programming model called MapReduce, along with some other stuff such as the Hadoop Distributed File System (HDFS). It can be used to parallelize (or distribute) and thus massively speed up certain kinds of data processing. This tutorial will talk about installing, configuring and running the Hadoop framework on a single node. In a future tutorial, we might create a project that actually uses Hadoop for problem solving across multiple clustered nodes; here, we start with the setup of a single node.
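
To make the programming model concrete before we dive into the installation, here is a minimal sketch of a MapReduce job written against the classic org.apache.hadoop.mapred API used by the 0.20 line. It is the usual word-count illustration, not something we build in this tutorial, and the class names are just for demonstration.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

  // map(): emit (word, 1) for every word in an input line
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, ONE);
      }
    }
  }

  // reduce(): sum the counts collected for each word
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Hadoop runs one map() call per input record and groups the emitted (key, value) pairs by key before handing them to reduce(), which is what lets it spread the work across nodes.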

Installing from source is important if you want to make changes to the Hadoop framework itself. I’ve found that it’s also the easier method if you simply want to deploy Hadoop. Whichever path you take, going with SVN is probably the best way. So, first check out the source of a stable branch. I used 0.20.2 because it was the ‘stable’ branch at the time and because I was having trouble checking out 0.20.

But before that, you need to set up the dependencies. Here they are:

  1. JDK (I found 1.6+ to be compatible with the 0.20.2 branch)
  2. Eclipse (SDK or ‘classic’; this is required for building the Hadoop Eclipse plugin. I used 3.6.1)
  3. Ant (for processing the install/configuration scripts)
  4. xerces-c (the XML parser)
  5. SSH server
  6. g++

By the way, I used Ubuntu 10.04 as my dev box. Download binaries of Eclipse, Ant and xerces-c, extract them into your home folder and remember their folder names. We’ll be needing them later.
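
For example (the archive names below are only placeholders; use whichever versions you actually downloaded):

$ cd ~
$ tar -xzf apache-ant-1.8.1-bin.tar.gz
$ tar -xzf eclipse-SDK-3.6.1-linux-gtk.tar.gz
$ tar -xzf xerces-c-3.1.1-x86-linux-gcc-3.4.tar.gz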

Install the rest of the dependencies with:

$ sudo apt-get install sun-java6-jdk ssh g++

Also, the SSH server needs to be set up so that it doesn’t require a password. You can check this with ‘ssh localhost’. If it does require a password, disable that using:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Now, go to your home directory, set up the environment variables and check out the Hadoop source:

nam@zenbox:~$ cd ~
nam@zenbox:~$ export JAVA_HOME=/usr/lib/jvm/java-6-sun
nam@zenbox:~$ export PATH=$PATH:/usr/share/apache-ant-1.8.1
nam@zenbox:~$ svn co http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.21 hadoop

When you do this, you get pre-built Hadoop binaries. (We’re skipping the actual build part here; we’ll come back to it shortly.) You can set up the requirements and the examples and then test the ‘pi’ example like so:

nam@zenbox:~$ cd hadoop
nam@zenbox:~/hadoop$ ant
nam@zenbox:~/hadoop$ ant examples
nam@zenbox:~/hadoop$ bin/hadoop
nam@zenbox:~/hadoop$ bin/hadoop jar hadoop-0.20.2-examples.jar pi 10 1000000

Here’s (part of) what I got as output:

Number of Maps  = 10
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1
...
Starting Job
10/09/25 15:01:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/25 15:01:21 INFO mapred.FileInputFormat: Total input paths to process : 10
10/09/25 15:01:21 INFO mapred.JobClient: Running job: job_local_0001
10/09/25 15:01:21 INFO mapred.FileInputFormat: Total input paths to process : 10
10/09/25 15:01:21 INFO mapred.MapTask: numReduceTasks: 1
10/09/25 15:01:21 INFO mapred.MapTask: io.sort.mb = 100
10/09/25 15:01:21 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/25 15:01:21 INFO mapred.MapTask: record buffer = 262144/327680
10/09/25 15:01:22 INFO mapred.MapTask: Starting flush of map output
10/09/25 15:01:22 INFO mapred.MapTask: Finished spill 0
10/09/25 15:01:22 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
...
10/09/25 15:01:24 INFO mapred.LocalJobRunner:
10/09/25 15:01:24 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/09/25 15:01:24 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9000/user/nam/PiEstimator_TMP_3_141592654/out
10/09/25 15:01:24 INFO mapred.LocalJobRunner: reduce > reduce
10/09/25 15:01:24 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/09/25 15:01:24 INFO mapred.JobClient:  map 100% reduce 100%
10/09/25 15:01:24 INFO mapred.JobClient: Job complete: job_local_0001
10/09/25 15:01:24 INFO mapred.JobClient: Counters: 15
10/09/25 15:01:24 INFO mapred.JobClient:   FileSystemCounters
10/09/25 15:01:24 INFO mapred.JobClient:     FILE_BYTES_READ=1567406
10/09/25 15:01:24 INFO mapred.JobClient:     HDFS_BYTES_READ=192987
10/09/25 15:01:24 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=199597
10/09/25 15:01:24 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1781093
10/09/25 15:01:24 INFO mapred.JobClient:   Map-Reduce Framework
10/09/25 15:01:24 INFO mapred.JobClient:     Reduce input groups=20
10/09/25 15:01:24 INFO mapred.JobClient:     Combine output records=0
10/09/25 15:01:24 INFO mapred.JobClient:     Map input records=10
10/09/25 15:01:24 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/09/25 15:01:24 INFO mapred.JobClient:     Reduce output records=0
10/09/25 15:01:24 INFO mapred.JobClient:     Spilled Records=40
10/09/25 15:01:24 INFO mapred.JobClient:     Map output bytes=180
10/09/25 15:01:24 INFO mapred.JobClient:     Map input bytes=240
10/09/25 15:01:24 INFO mapred.JobClient:     Combine input records=0
10/09/25 15:01:24 INFO mapred.JobClient:     Map output records=20
10/09/25 15:01:24 INFO mapred.JobClient:     Reduce input records=20
Job Finished in 3.58 seconds
Estimated value of Pi is 3.14158440000000000000

So, now that you know Hadoop is actually running and working as it should, it’s time to set up the server. First, you need to define the node configuration in conf/core-site.xml:

<!-- Put site-specific property overrides in this file. -->
<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
 </property>
 <property>
  <name>mapred.job.tracker</name>
  <value>hdfs://localhost:9001</value>
 </property>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <!-- set to 1 to reduce warnings when running on a single node -->
 </property>
</configuration>

Also, setting the JAVA_HOME environment variable in your shell does not work when starting the Hadoop services. So, you need to set it in conf/hadoop-env.sh:

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

Then format the namenode specified in the configuration file above (the help output below lists the available options):

nam@zenbox:~/hadoop$ bin/hadoop namenode help
10/09/25 15:17:18 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = zenbox/127.0.1.1
STARTUP_MSG:   args = [help]
STARTUP_MSG:   version = 0.20.3-dev
STARTUP_MSG:   build =  -r ; compiled by 'nam' on Sat Sep 25 11:41:00 PKT 2010
************************************************************/
Usage: java NameNode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]
10/09/25 15:17:18 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at zenbox/127.0.1.1
************************************************************/

nam@zenbox:~/hadoop$ bin/hadoop namenode -format

Now you can start the services with the start-all.sh script and should see output like the following:

nam@zenbox:~/hadoop$ bin/start-all.sh
starting namenode, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-namenode-zenbox.out
localhost: starting datanode, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-datanode-zenbox.out
localhost: starting secondarynamenode, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-secondarynamenode-zenbox.out
starting jobtracker, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-jobtracker-zenbox.out
localhost: starting tasktracker, logging to /home/nam/hadoop/hadoop-0.20.2/bin/../logs/hadoop-nam-tasktracker-zenbox.out
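
As a quick sanity check (this step is my addition, not part of the original run), the JDK’s jps tool should now list one Java process for each daemon: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.

nam@zenbox:~/hadoop$ jps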

Finally, you can put a file into the Hadoop filesystem, get the file listing, cat the file and remove it from HDFS.

nam@zenbox:~/hadoop$ bin/hadoop dfs -put ~/a.txt a.txt
nam@zenbox:~/hadoop$ bin/hadoop dfs -ls
Found 1 items
-rw-r--r--   3 nam supergroup          5 2010-09-25 15:20 /user/nam/a.txt
nam@zenbox:~/hadoop$ bin/hadoop dfs -cat a.txt
[contents of a.txt here]
nam@zenbox:~/hadoop$ bin/hadoop dfs -rm a.txt
Deleted hdfs://localhost:9000/user/nam/a.txt
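
When you are done experimenting, the daemons can be shut down with the companion script:

nam@zenbox:~/hadoop$ bin/stop-all.sh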

We’ll get to building from source in another installment of this tutorial, inshaallah.

Geek stuff, resources, Typography

Venturing into Arabic Font Design

I started using Arabic (and Urdu) scripts quite a while ago. I came across the whole non-Latin script problem when I was developing software for a government agency related to land revenue. In the course of developing this software, I found myself learning about alternative keyboards and Asian language support in Windows. I also found out about the excellent research going on at FAST-NU Islamabad related to Urdu language support for computers. As an aside, I encourage all readers to go read about their work at CRULP and see the Naskh, Web Naskh and Nastaliq fonts they’ve developed and released free of charge. They’ve done a truly great job, and none of my comments in the rest of the post should be taken as a slight against them or their work. Continue reading “Venturing into Arabic Font Design”

Announcements, Pedagogy, Students

HCI-F07

This post will provide a way for students of HCI (Fall 07 – FAST-NU) to keep track of their results throughout the semester. Quiz/assignment scores and exam grades will be available through the spreadsheet embedded below. It is your responsibility to bookmark this page and visit it after exams to see your official score. If you have any problems or concerns, let me know by email. Commenting on this post is enabled but should only be used if you have suggestions for improving this system. Do not post anything about your scores; I will not answer any such queries here.

https://spreadsheets.google.com/pub?key=0At2k2naHPl_qdHpBVHJqVGpQc0pyNkw2TFhyZ0VpaFE&hl=en&single=true&gid=0&output=html&widget=true