Hortonworks Sandbox R and RStudio install

This blog post (http://blogr-cs.blogspot.com/2012/12/integration-of-r-rstudio-and-hadoop-in.html) describes how to modify a Cloudera VM to include R, RStudio, and the RHadoop library. This post shows the changes to those steps needed to support the Hortonworks Sandbox.

Figure: The Sandbox VM as seen in VMware Fusion

The IP address is blurred out. I simply installed the latest Sandbox VMware .ova file. When I downloaded it from Hortonworks, the saved file was given the extension .ovf, which gives Fusion (5.03) trouble; manually renaming it to .ova made the import work. Hortonworks import instructions are available here. The actual commands for installing R, RStudio, and the rmr2 library are in the following history listing from my ssh session to the VM.

Figure: ssh history listing of the install steps

Here are the same commands as text, for cut-and-paste convenience:

sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo yum -y install git wget R
ls /etc/default
sudo ln -s /etc/default/hadoop /etc/profile.d/hadoop.sh
cat /etc/profile.d/hadoop.sh | sed 's/export //g' > ~/.Renviron
wget http://download2.rstudio.org/rstudio-server-0.97.332-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-0.97.332-x86_64.rpm
sudo R
wget --no-check-certificate -O rmr2_2.1.0.tar.gz http://goo.gl/uV6Y9
sudo R CMD INSTALL rmr2_2.1.0.tar.gz
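The sed step above rewrites each "export NAME=value" line of the Hadoop profile script as a plain NAME=value line, which is the format R reads from ~/.Renviron at startup. Below is a slightly stricter variant. It removes "export" only at the start of a line, whereas the original /g substitution removes it anywhere, including inside values. It also drops comment and blank lines. The function name profile_to_renviron and the sample values are illustrative assumptions, not the real content of /etc/default/hadoop.

```shell
#!/bin/sh
# Hypothetical helper: turn shell "export NAME=value" lines into the
# NAME=value lines that R reads from ~/.Renviron.
profile_to_renviron() {
  # 1) strip a leading "export" keyword only, leaving values untouched
  # 2) drop comment lines
  # 3) drop blank lines
  sed -e 's/^[[:space:]]*export[[:space:]][[:space:]]*//' \
      -e '/^[[:space:]]*#/d' \
      -e '/^[[:space:]]*$/d'
}

# Illustrative input standing in for /etc/profile.d/hadoop.sh.
profile_to_renviron <<'EOF'
# Hadoop environment
export HADOOP_PREFIX=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf

EOF
```

On the VM you would use it the same way as the original pipeline: profile_to_renviron < /etc/profile.d/hadoop.sh > ~/.Renviron.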

At the sudo R prompt, install the prerequisite R packages (type q() to leave R when done):

install.packages( c('RJSONIO', 'itertools', 'digest', 'Rcpp', 'functional', 'plyr', 'stringr'), repos='http://cran.revolutionanalytics.com')
install.packages( c('reshape2'), repos='http://cran.revolutionanalytics.com')
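You can skip the interactive sudo R session and pass the same install.packages() call to Rscript -e. The sketch below assumes this approach. The package list and CRAN mirror come from this post. The helper r_install_expr and the RUN_INSTALL switch are hypothetical names. Without RUN_INSTALL=1 the script only prints the R expression it would run.

```shell
#!/bin/sh
# Sketch: install the rmr2 prerequisites without an interactive "sudo R".
# Package list and mirror are the ones used in this post.
REPO='http://cran.revolutionanalytics.com'
PKGS='RJSONIO itertools digest Rcpp functional plyr stringr reshape2'

# Build an R call of the form
#   install.packages(c('pkg1','pkg2'), repos='<mirror>')
r_install_expr() {
  repo="$1"
  shift
  list=""
  for p in "$@"; do
    list="${list:+$list,}'$p'"
  done
  printf "install.packages(c(%s), repos='%s')\n" "$list" "$repo"
}

# $PKGS is left unquoted on purpose so each package becomes its own argument.
if [ "${RUN_INSTALL:-0}" = 1 ]; then
  # Needs root (system-wide library) and network access.
  sudo Rscript -e "$(r_install_expr "$REPO" $PKGS)"
else
  # Dry run: only print the R expression.
  r_install_expr "$REPO" $PKGS
fi
```

One commenter also needed bitops. If rmr2 complains about it, appending bitops to PKGS should cover that case.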

The short URL in the second wget command points to the latest rmr2 release at the time of writing, as documented here.
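One commenter hit "Error in getOctD(x, offset, len) : invalid octal digit" when installing rmr2. That message comes from R's tar reader. One possible cause is that the downloaded file is not a tarball at all, for example an HTML page saved in place of the package. The sketch below checks the file before R CMD INSTALL runs. is_tarball and PKG_FILE are hypothetical names, and gzip and tar are assumed to be on the PATH.

```shell
#!/bin/sh
# Sketch: confirm the rmr2 download really is a gzipped tar archive.
is_tarball() {
  # gzip -t verifies the compression layer; tar -tzf lists the archive.
  [ -f "$1" ] && gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1
}

PKG_FILE="${PKG_FILE:-rmr2_2.1.0.tar.gz}"
if is_tarball "$PKG_FILE"; then
  echo "$PKG_FILE looks like a valid source package"
else
  echo "$PKG_FILE is missing or not a .tar.gz; download it again" >&2
fi
```

If wget saved the file under another name, set PKG_FILE to that name before running the check.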

After this, RStudio Server should be reachable in your browser on port 8787 of the VM, e.g. http://<VM IP address>:8787.
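To confirm from the host machine that RStudio Server is listening, you can use something like the sketch below. It assumes curl is installed. SANDBOX_HOST (the VM's IP under VMware) and VM_NAME (the VirtualBox VM name) are placeholder variables. VM_NAME matters only for the Sandbox 1.3 port forward a commenter describes. natpf_rule and rstudio_url are hypothetical helpers. VBoxManage controlvm ... natpf1 adds the rule to a running VM. For a powered-off VM, the equivalent is VBoxManage modifyvm ... --natpf1.

```shell
#!/bin/sh
# Host-side sketch: reach RStudio Server on port 8787 inside the Sandbox VM.

# VirtualBox NAT rule: name,proto,hostip,hostport,guestip,guestport
natpf_rule() {
  printf '%s,tcp,%s,%s,,%s\n' "${1:-rstudio}" "${2:-127.0.0.1}" "${3:-8787}" "${4:-8787}"
}

# URL to open in the browser.
rstudio_url() {
  printf 'http://%s:%s/\n' "${1:-127.0.0.1}" "${2:-8787}"
}

# VirtualBox only: add the port forward to the running VM when VM_NAME is set.
if [ -n "${VM_NAME:-}" ]; then
  VBoxManage controlvm "$VM_NAME" natpf1 "$(natpf_rule)"
fi

url="$(rstudio_url "${SANDBOX_HOST:-127.0.0.1}")"
# curl exits non-zero if nothing accepts the connection within 5 seconds.
if curl --silent --output /dev/null --max-time 5 "$url"; then
  echo "RStudio Server answers at $url"
else
  echo "No answer at $url" >&2
fi
```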


Comments

  1. Thanks for posting this!

    You might want to add the default root password for the Sandbox (hadoop), and mention that you can’t log in to RStudio as the root user (at least, I couldn’t).

    So I logged in to CentOS as root, ran “passwd sandbox” to change the ‘sandbox’ user’s password, and could then log in to RStudio Server as ‘sandbox’ with that password.

  2. I am able to ssh in as root using the default Sandbox password. I did change the password for the sandbox user as you describe above, and I also added the sandbox user to the sudoers list to make things easier.

  3. Pingback: Weekly bookmarks: mars 22nd | robertsahlin.com

  4. If you are using Hortonworks Sandbox 1.3 with VirtualBox, there is now an additional step: create a port forward from host port 8787 to guest port 8787. Do that under Network > Advanced > Port Forwarding in the VM’s VirtualBox settings. You will see several other port forwards there, so just use them as examples. You then connect to 127.0.0.1:8787.

    If you’re using VMware none of this is necessary.

    If you’re curious why: before 1.3, the Sandbox had two NICs on all platforms to work around a VirtualBox limitation that doesn’t allow host/guest communication on NAT interfaces. This caused a lot of deployment problems, so we removed the second interface. Hopefully things will be smoother now.

  5. Some things weren’t working for me, so I did some digging and posted my results to GitHub:

    https://github.com/tavor/hortonworks-sandbox-rmr-rstudio-installation

    If the script works (I haven’t tested it yet), two wget commands should fetch the scripts needed to set things up:

    wget -O install-rstudio.sh https://raw.github.com/tavor/hortonworks-sandbox-rmr-rstudio-installation/master/install-rstudio.sh?raw=true
    wget -O install_packages.r https://raw.github.com/tavor/hortonworks-sandbox-rmr-rstudio-installation/master/install_packages.r?raw=true

    After that, running install-rstudio.sh should work. I will be committing fixes as I test it.

  6. Thanks Jim! Great post!
    In my case ‘bitops’ was missing from the default packages, so I had to install it with install.packages( c(‘bitops’), repo…….) inside the R shell. rmr2 2.2.2 was also needed.
    There was no user named ‘sandbox’ in the VM I downloaded (for Hadoop 2 + YARN), so I created a new user called ‘sandbox’ and set a password for it:
    adduser sandbox
    passwd sandbox
    and RStudio at http://xxx.xxx.xxx.xxx:8787/ magically took off!
    Thanks!
    Have a nice day!

  7. Hi Jim,

    Thanks for the post.

    I am able to do all of the steps except installing rmr2. I even tried 2.2.2 as suggested by Manish, but I get “Error in getOctD(x, offset, len) : invalid octal digit”. It seems I am getting some simple step wrong, but I can’t figure out which. On Windows I get the same error, but I can work around it with the .zip version, which I can’t do in HDP.

    I am also not able to access RStudio on port 8787.

    Appreciate any help on this.

    Thanks
    Ganesan

    • Hi Jim, after carefully reading through the entire post, I am all set with both rmr2 and RStudio Server.

      Thanks much for the post again.
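Several commenters above had to create or reset the ‘sandbox’ user before RStudio Server would accept a login, because root logins were refused. Those steps are collected below as a sketch to run as root inside the VM. RSTUDIO_USER and APPLY are placeholder variables, and user_exists and sudoers_line are hypothetical helpers. The sudoers drop-in assumes /etc/sudoers contains an "#includedir /etc/sudoers.d" line, so check that first. Without APPLY=1 the script changes nothing.

```shell
#!/bin/sh
# Sketch, run as root inside the Sandbox VM: prepare a non-root RStudio login.
RSTUDIO_USER="${RSTUDIO_USER:-sandbox}"

# True if a local account with this name already exists.
user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Line for a sudoers drop-in giving the user full sudo rights.
sudoers_line() {
  printf '%s ALL=(ALL) ALL\n' "$1"
}

if [ "${APPLY:-0}" = 1 ]; then
  # Create the account only if it is missing (some Sandbox builds lack it).
  user_exists "$RSTUDIO_USER" || useradd -m "$RSTUDIO_USER"
  # Prompts for the password you will type at the RStudio login page.
  passwd "$RSTUDIO_USER"
  # Optional sudo rights, as one commenter set up.
  sudoers_line "$RSTUDIO_USER" > "/etc/sudoers.d/$RSTUDIO_USER"
  chmod 0440 "/etc/sudoers.d/$RSTUDIO_USER"
  # Syntax-check the new sudoers file.
  visudo -c -f "/etc/sudoers.d/$RSTUDIO_USER"
else
  echo "Dry run: rerun as root with APPLY=1 to set up '$RSTUDIO_USER'."
fi
```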

