Here are the steps to get HBase running on Cloudera's VM:
- Download VM
Get it from Cloudera's website.
- Start VM
As the above page states: "To launch the VMWare image, you will either need VMware Player for windows and linux, or VMware Fusion for Mac."
Note: I have Parallels for Mac and wanted to use that. I used Parallels Transporter to convert the "cloudera-training-0.3.1.vmx" to a new "cloudera-training-0.2-cl3-000002.hdd", then created a new VM in Parallels, selecting Ubuntu Linux as the OS and the newly created .hdd as the disk image. Boot up the VM and you are up and running. I gave it a bit more memory for the graphics so that I could switch the VM to 1440x900, which is the native resolution of the MacBook Pro I am using.
Finally follow the steps explained on the page above, i.e. open a Terminal and issue:
$ cd ~/git
$ ./update-exercises --workspace
- Pull HBase branch
Open a new Terminal (or issue "cd .." in the open one), then:
$ sudo -u hadoop git clone http://git.apache.org/hbase.git /home/hadoop/hbase
$ sudo -u hadoop sh -c "cd /home/hadoop/hbase ; git checkout origin/0.20_on_hadoop-0.18.3"
...
HEAD is now at c050f68... pull up to release
First we clone the repository, then switch to the actual branch. You will notice that I am using "sudo -u hadoop" because Hadoop itself is started under that account and I wanted it to match. Also, the default "training" account does not have SSH set up as explained in Hadoop's quick-start guide. When sudo asks for a password, use the default one, which is "training".
- Build Branch
Continue in Terminal:
$ sudo -u hadoop sh -c "cd /home/hadoop/hbase/ ; export PATH=$PATH:/usr/share/apache-ant-1.7.1/bin ; ant package"
...
BUILD SUCCESSFUL
- Configure HBase
There are a few edits to be made to get HBase running.
$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>

$ sudo -u hadoop vim /home/hadoop/hbase/build/conf/hbase-env.sh
# The java implementation to use. Java 1.6 required.
# export JAVA_HOME=/usr/java/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-6-sun
...
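If you would rather not type the XML by hand in vim, here is a minimal sketch that writes the same hbase-site.xml from a here-document. It generates the file in a scratch directory first so you can inspect it before installing it. The OUT_DIR variable and the scratch directory name are my own assumptions for this example and are not part of the VM or of HBase.

```shell
#!/bin/sh
# Sketch: generate hbase-site.xml in a scratch directory.
# OUT_DIR is a hypothetical location chosen for this example only.
OUT_DIR="${TMPDIR:-/tmp}/hbase-conf-sketch"
mkdir -p "$OUT_DIR"

# Quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$OUT_DIR/hbase-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
</configuration>
EOF

echo "Wrote $OUT_DIR/hbase-site.xml"

# To install it on the VM (not executed here), write it as the hadoop user:
#   sudo -u hadoop tee /home/hadoop/hbase/build/conf/hbase-site.xml \
#     < "$OUT_DIR/hbase-site.xml" > /dev/null
```

The tee form is used for the install step because the input redirection is opened by your own shell, while the target file is written by the hadoop account.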
Note: There is a small glitch in revision 826669 of that Cloudera-specific HBase branch. The master UI (on port 60010 on localhost) will not start because a library path has changed and the Jetty packages are therefore not found. You can fix it by editing the startup script and changing the path it scans:
$ sudo -u hadoop vim /home/hadoop/hbase/build/bin/hbase
Replace
for f in $HBASE_HOME/lib/jsp-2.1/*.jar; do
with
for f in $HBASE_HOME/lib/jetty-ext/*.jar; do
This is only needed until the developers have fixed this in the branch (compare the revision I used, r813052, with what you get). If you do not want the UI, you can ignore this, and the error in the logs too. HBase will still run, just without its web-based interface.
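The same one-line replacement can also be scripted instead of done in vim. The following is only a sketch: it runs sed against a scratch file that contains just the affected line, so you can see the effect without touching the real startup script. The DEMO file name is an assumption made up for this example.

```shell
#!/bin/sh
# Sketch: apply the jsp-2.1 -> jetty-ext path fix with sed.
# DEMO is a hypothetical scratch file holding only the affected line.
DEMO="${TMPDIR:-/tmp}/hbase-start-script-sketch"
printf '%s\n' 'for f in $HBASE_HOME/lib/jsp-2.1/*.jar; do' > "$DEMO"

# -i.bak edits in place and keeps the original as "$DEMO.bak".
# '#' is used as the delimiter so the slashes in the path need no escaping.
sed -i.bak 's#/lib/jsp-2\.1/#/lib/jetty-ext/#' "$DEMO"

cat "$DEMO"

# On the VM (not executed here), the same expression would be applied to the
# real script as the hadoop user:
#   sudo -u hadoop sed -i.bak 's#/lib/jsp-2\.1/#/lib/jetty-ext/#' \
#     /home/hadoop/hbase/build/bin/hbase
```

Keeping the .bak copy makes it easy to revert once the branch itself has been fixed.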
- Rev up the Engine!
The final thing is to start HBase:
$ sudo -u hadoop /home/hadoop/hbase/build/bin/start-hbase.sh
$ sudo -u hadoop /home/hadoop/hbase/build/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.0-0.18.3, r813052, Mon Oct 19 06:51:57 PDT 2009
hbase(main):001:0> list
0 row(s) in 0.2320 seconds
hbase(main):002:0>
Done!
This sums it up. I hope you give HBase on the Cloudera Training VM a whirl as it also has Eclipse installed and therefore provides a quick start into Hadoop and HBase.
Just keep in mind that this is for prototyping only! With such a setup you will only be able to insert a handful of rows. If you overdo it you will bring it to its knees very quickly. But you can safely use it to play around with the shell to create tables or use the API to get used to it and test changes in your code etc.
Update: Updated title to include version number, fixed XML
Great post Lars. Hope to see more HBase posts as well. :)
One thing though:
training@training-vm:~$ hadoop version
Hadoop 0.20.1+133
Subversion -r cf888d18fcd414b839d23b7e61208e3f0118f15b
Compiled by root on Sun Sep 27 00:27:10 UTC 2009
Does this mean we don't need the special HBase branch anymore?
Hi sf, it seems as you say that cloudera-training-0.3.2 is updated to the latest release. Very nice. I checked and it does not have HBase installed on it by the looks. I will try it out and post my findings here. Thanks!
HBase is the Hadoop database. Think of it as a distributed, scalable, big data store.