That was advice given to me at a meetup, but I wasn't thrilled about laying out a bunch of money for old hardware that would take up a lot of space. Then one day I stumbled upon a webpage about a cluster built out of Raspberry Pi boards, the little $40 computer that pairs a roughly 90s-era processor (think Pentium II-ish) with peripherals from the 00s.
This seemed like a fun way to get the learning on the cheap, plus it was kinda ridiculous in a cool way. And no one had built one mounted to the network switch, so I could fire up the soldering iron and do something original.
The order came to about $300, all from Amazon. Make sure you get the correct monitor cable, and read through the next section to see whether you also want to order four micro-USB cables.
Components

| # | Item | Quantity | Unit Price ($) |
|---|------|----------|----------------|
| 1 | Raspberry Pi Board (Model B) | 4 | 37.60 |
| 2 | 16GB Class 10 SD Card | 4 | 9.47 |
| 3 | Cat-6 Patch Cable 5-pack | 1 | 7.99 |
| 4 | 8x1 Gigabit Network Switch | 1 | 41.39 |
| 5 | Switching Power Supply | 1 | 8.01 |
| 6 | DC 12V to 5V Converter | 2 | 8.00 |
| 7 | Nylon Standoff Hardware | 1 | 11.00 |
| 8 | Raspberry Pi Case | 1 | 7.99 |
| 9 | DVI to HDMI Cable | 1 | 7.99 |
OK, let's start moving towards software. I used the latest Raspbian release for my OS and it's been fine. The standard Raspberry Pi site can direct you on writing the OS image to an SD card better than I could. The first stop after removing a Pi from the box is to power it up with an SD card, USB mouse, USB keyboard, and a monitor hooked up through the DVI-to-HDMI cable. If all goes well, you'll see a big brash raspberry taking over the monitor. Play around for a while and have fun, but don't expect a lot of speed.
This is all impressive, and personally staring at the pinkish-purplish raspberry does seem to reduce stress, but eventually we want to run headless. This means the Pi board will be networked with another computer and driven through an SSH session.
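As a rough sketch of what that looks like once SSH is enabled on the Pi (raspi-config has a toggle; on Raspbian images of this era it's typically on by default), with the address below just a placeholder for whatever your router hands out:

```
# find the Pi's IP in your router's DHCP client list, then from any machine on the LAN:
ssh pi@192.168.1.50        # default Raspbian user "pi", password "raspberry"

# first order of business once you're in: change that default password
passwd
```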
I'm going to switch from Raspberry Pi to Humble Pie for a minute and point you to itToby's webpage on the same subject. It's well written and it's what I used (thanks a million, Toby!). From here on, my stuff will mostly be helper notes to his material:
This may only help some of you, but it took me a while to figure out a good way of displaying and managing a RasPi graphical SSH session on a 2011 MacBook Air. After crawling around the net, I came up with these steps:
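For reference, a minimal sketch of the general idea, assuming XQuartz is installed on the Mac side (your exact steps may differ):

```
# on the Mac, with XQuartz providing the X server:
ssh -X pi@192.168.1.50     # -X forwards X11 traffic over the SSH session

# on the Pi: any GUI program now opens in an XQuartz window back on the Mac
lxterminal &
```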
Java 7 HotSpot from Oracle is now included with Raspbian and it performs better than OpenJDK, so it's worth the hassle to get going. See the Problems section below for specifics.
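If your image doesn't already have it, the JDK can be pulled from the Raspbian repos; the package name below is from the wheezy-era repositories, so treat it as an assumption for other releases:

```
sudo apt-get update
sudo apt-get install oracle-java7-jdk   # Oracle HotSpot build for ARM
java -version                           # should report a HotSpot Client VM
```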
Just a bunch of small matters here:
```
# Archiving SD card
diskutil list                                            # make sure card appears as disk1
sudo dd if=/dev/rdisk1 bs=1m | gzip > ~/Desktop/pi.gz

# Flashing SD card
diskutil list                                            # make sure card appears as disk1
diskutil unmountDisk /dev/disk1
gzip -dc ~/Desktop/pi.gz | sudo dd of=/dev/rdisk1 bs=1m
```
1. ARM hardware not supported message when starting Hadoop
Cause: Must run in client mode only, not server. Discussed here (right before the comments): raspberrypicloud
Fix: I made the following changes to the /usr/local/hadoop/bin/hadoop script (not sure it's ideal, but it works):
elif [ "$COMMAND" = "datanode" ] ; then CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode' #HADOOP_OPTS=${HADOOP_OPTS/-server/} #HADOOP_OPTS=${HADOOP_OPTS/} if [ "$starting_secure_dn" = "true" ]; then HADOOP_OPTS="-jvm server $HADOOP_OPTS $HADOOP_DATANODE_OPTS" else #HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS" HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS" fi
2. Cannot create /fs/hadoop/tmp/dfs/data directory
Cause: /fs/hadoop directory ownership
Fix:
```
cd /fs
sudo chown john:hadoop hadoop
```
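A quick sanity check afterwards (assuming the same /fs/hadoop layout and the john:hadoop user/group from above):

```
ls -ld /fs/hadoop    # owner and group should now read john hadoop
```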
3. Datanode Daemon Shutting Down Shortly After Starting
Cause: VERSION IDs differing between namenode and datanode
```
cat /fs/hadoop/tmp/dfs/name/current/VERSION
cat /fs/hadoop/tmp/dfs/data/current/VERSION
```
Fix: I blow everything away and reformat HDFS; see the Server Fault post if you want something less extreme:
```
sudo rm -r /fs/hadoop/tmp    # all nodes
hadoop namenode -format      # master only
```
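After reformatting, the daemons have to be brought back up. With a Hadoop 1.x install like the /usr/local/hadoop one above, the stock start scripts look roughly like this (the paths are an assumption based on that layout):

```
/usr/local/hadoop/bin/start-dfs.sh      # master only: starts the namenode and all datanodes
/usr/local/hadoop/bin/start-mapred.sh   # master only: starts the jobtracker and tasktrackers
jps                                     # run on each node to confirm the expected daemons are up
```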
So how does it perform? Well, I loaded it up with slightly over 300 MB of text data and ran wordcount across 3 non-overclocked worker nodes. It took 23 minutes. The same data running on my MacBook Air (SSD disk) in pseudo-distributed mode took 1 minute 19 seconds. So it definitely won't break any speed records; even jps takes 11 seconds. But remember, you're doing this for the learning.
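For reference, that benchmark was just the stock wordcount example that ships with Hadoop 1.x, run along these lines; the local ~/textdata directory and the HDFS paths are placeholders for whatever you load:

```
hadoop fs -put ~/textdata /user/john/input        # copy the text files into HDFS
hadoop jar /usr/local/hadoop/hadoop-examples-*.jar wordcount \
    /user/john/input /user/john/output            # run the job across the cluster
```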
My favorite thing: unplugging one of the nodes and watching replication in action on the browser.
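The namenode web page (port 50070 in Hadoop 1.x) shows this nicely, and the standard HDFS tools give the same picture from the command line:

```
hadoop dfsadmin -report          # live/dead datanodes and per-node capacity
hadoop fsck / -files -blocks     # block locations and any under-replicated blocks
```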
Also I did this project at the same time as studying for the Cloudera Hadoop Administrator Certification (88% woo-hoo) and they reinforced each other well.