Tuesday, June 18, 2019

Setting up an RStudio server on Amazon Web Services

I have sometimes used Amazon Web Services to outsource very computationally intensive analyses, such as those for my paper last year on species distribution modeling.  Here's a quick guide to setting it up.

I first set up an Amazon Elastic Compute Cloud (EC2) instance with the latest Ubuntu following these instructions.

Then log into the instance over SSH (AWS has very clear and helpful tutorials on this that depend on whether you are logging on from Windows or a Linux operating system) and run the commands below.  The first four add your rstudio user (the one you will log in as from a web browser), make its home directory, let you set the rstudio user password that you will use to log on, and then set permissions so you can write to this directory.

All of the commands I discuss today you will input into the SSH terminal.  This means you are making changes to the remote computer (the Amazon EC2 instance, i.e., your new RStudio webserver), not your local computer.  Do not input the dollar sign - this represents the prompt that the terminal shows.

$sudo useradd rstudio
$sudo mkdir /home/rstudio
$sudo passwd rstudio
$sudo chmod -R 0777 /home/rstudio

Then update your instance.
$sudo apt-get update
$sudo apt-get upgrade

Here I use nano instead of vi to create a new sources.list file (you have to include the .d in the path or it won't be saved).  Other instructions I've seen use vi, but I prefer nano as a simpler option for the uninitiated like myself.
$sudo nano /etc/apt/sources.list.d/sources.list

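Once you're in nano, add this line to the sources.list file (it is a line of file content, not a command, so there is no prompt in front of it).  You can replace the URL with your favorite CRAN mirror and xenial with the codename of whatever version of Ubuntu you have.
deb https://cloud.r-project.org/bin/linux/ubuntu xenial/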
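If you'd rather not type the line by hand, here is a minimal sketch that builds it from the Ubuntu codename.  The function name cran_source_line is my own invention for illustration, and it assumes the mirror follows the same bin/linux/ubuntu layout as the line above.

```shell
# Print the apt source line for CRAN's Ubuntu builds.
# $1: Ubuntu codename (defaults to the running system's, via lsb_release)
# $2: CRAN mirror URL (defaults to the mirror used in this post)
# cran_source_line is a hypothetical helper, not part of any package.
cran_source_line() {
    codename="${1:-$(lsb_release -cs)}"
    mirror="${2:-https://cloud.r-project.org}"
    printf 'deb %s/bin/linux/ubuntu %s/\n' "$mirror" "$codename"
}

# Example: print the line for Ubuntu 16.04 (xenial)
cran_source_line xenial
```

On the server you could pipe the result straight into the file with something like cran_source_line | sudo tee /etc/apt/sources.list.d/sources.list instead of opening nano.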

Next, add the key to your system.
$sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

Then update again...
$sudo apt-get update
...and install the latest version of R (next code line).  If you skip the sources.list and key steps above, apt will install the older r-base version from the default Ubuntu repositories.
$sudo apt-get install r-base

Now that R is installed, install RStudio Server using the three lines below.  Go to the RStudio website for an updated version name for the .deb file (the code below is current as of 11 April 2017 when I first drafted this post for my own reference; I recommend going to find the most up-to-date filename). 


$sudo apt-get install gdebi-core

$wget https://download2.rstudio.org/rstudio-server-1.0.143-amd64.deb
$sudo gdebi rstudio-server-1.0.143-amd64.deb


Then you should be able to go to the IP address for your server (check in the EC2 console in AWS to get the IP address) with :8787 after it, and log in using the username and password you created at the beginning.
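As a small sketch of what that address looks like, the snippet below builds the URL from an IP address and port.  The helper name rstudio_url and the address 203.0.113.10 are placeholders I made up for illustration; use the public IP shown in your own EC2 console.  It also assumes your instance's security group allows inbound traffic on port 8787, otherwise the page will not load.

```shell
# Build the RStudio Server URL from the instance's public IP address.
# $1: public IP address from the EC2 console
# $2: port (defaults to 8787, the port used in this post)
# rstudio_url is a hypothetical helper for illustration only.
rstudio_url() {
    printf 'http://%s:%s\n' "$1" "${2:-8787}"
}

# Example with a placeholder address
rstudio_url 203.0.113.10

# From your local machine you could then check that the server answers:
#   curl -sI --max-time 5 "$(rstudio_url 203.0.113.10)"
```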








Part of the reason I wanted to use an AWS server is that I had a huge dataset that didn't fit on my laptop.  Now that you can log into RStudio, attach an Elastic Block Store (EBS) volume to store your files.  You can think of it like an external hard drive for your Amazon web server.

These instructions are adapted from the AWS tutorial and assume you are using an EBS volume that was created with the instance and is currently empty.
$lsblk
$sudo file -s /dev/xvdb
The answer was "data", which means the volume is empty and has no file system.
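Because mkfs wipes whatever is on the volume, it can be worth checking that answer in a script before formatting.  Here is a minimal sketch; is_unformatted is a hypothetical helper name, and the ext4 sample string is only an illustration of what a formatted volume might report.

```shell
# Succeed only if the output of `file -s` is exactly "<device>: data",
# which is what an empty volume with no file system reports.
# is_unformatted is a hypothetical helper for illustration.
is_unformatted() {
    case "$1" in
        *": data") return 0 ;;
        *)         return 1 ;;
    esac
}

# Sample outputs (illustrative) instead of a real device
empty_volume='/dev/xvdb: data'
formatted_volume='/dev/xvdb: Linux rev 1.0 ext4 filesystem data, UUID=ab82e239-b284-4527-922f-b82b6b9ebc8c'

for answer in "$empty_volume" "$formatted_volume"; do
    if is_unformatted "$answer"; then
        echo "empty, safe to format"
    else
        echo "already has a file system, do not format"
    fi
done
```

On the server you would pass in the real answer, for example is_unformatted "$(sudo file -s /dev/xvdb)", and only run the mkfs step if it succeeds.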

$sudo mkfs -t ext4 /dev/xvdb
$sudo mkdir /data
$sudo mount /dev/xvdb /data

$sudo chmod -R 0777 /data
$sudo chmod -R 0777 /data/*
These two lines add write permission for the folder and files.

I rebooted my instance and was terrified not to find the data drive.  However, that was because the drive has to be mounted (the mount step above) after each reboot.  You can change the fstab file to make it mount automatically at boot.  To do this, look up the UUID of your EBS volume after it has been mounted.
$sudo blkid


You should get an answer back something like this:
/dev/xvdb: UUID="ab82e239-b284-4527-922f-b82b6b9ebc8c" TYPE="ext4"

$sudo cp /etc/fstab /etc/fstab.orig
$sudo nano /etc/fstab

Update: add a line to the fstab file that identifies the volume by its UUID.  (I have forgotten the exact syntax I used, but this askubuntu.com answer explains how.)
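For reference, an fstab entry has six fields: the device (here its UUID), the mount point, the file system type, the mount options, and the dump and fsck-order numbers.  The sketch below prints such a line; it is not necessarily the exact line I used.  The helper name fstab_line is hypothetical, and the options defaults,nofail with 0 2 are an assumption on my part (nofail lets the instance boot even if the volume is missing).

```shell
# Print an /etc/fstab entry that mounts a volume by UUID.
# $1: UUID reported by blkid
# $2: mount point (defaults to /data, as used in this post)
# $3: file system type (defaults to ext4, as used in this post)
# fstab_line is a hypothetical helper; the mount options are an assumption.
fstab_line() {
    uuid="$1"
    mount_point="${2:-/data}"
    fs_type="${3:-ext4}"
    printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Example with the UUID from the blkid output above
fstab_line ab82e239-b284-4527-922f-b82b6b9ebc8c
```

After adding the line to /etc/fstab, running sudo mount -a is a quick way to check for mistakes before you reboot; if it reports an error, restore the /etc/fstab.orig backup.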


The EBS volume should now be mounted permanently.

To mount a volume created from scratch (not at the time of the instance creation), follow these steps and mount the volume as previously.  If the volume size needs increasing later on (i.e., you didn't predict the size of storage you needed), follow these instructions.  Then, if you are using a Linux system like Ubuntu, follow these.

You should now be able to shut down or reboot your RStudio server and still have the EBS volume mounted.
