Intro

Previously, we created an AWS S3 bucket, uploaded files, and launched an AWS EMR cluster via the AWS CLI. Now we are going to open the ports needed to view the cluster's web interfaces in a web browser, and then SSH into the cluster.

Open IP Ports

To use RStudio (port 8787) and Hue (port 8888) from a web browser, you will need to open their ports.

To open ports, go to the EC2 console and open the Security Groups page.

AWS EC2 Console

On the Security Groups page, select the security group for the cluster's master node, then select the Actions dropdown menu and choose Edit inbound rules.

AWS EC2 Edit Inbound Rules

Select Add Rule at the bottom left. Enter 8787 for the Port Range and, under Source, choose Anywhere from the dropdown menu.

WARNING: SELECTING ANYWHERE HAS SECURITY IMPLICATIONS SINCE ANYONE CAN THEN ACCESS THE CLUSTER!

AWS EC2 Edit Inbound Rules

Repeat this process to open Hue to the outside world with 8888 for the Port Range.
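The same two inbound rules can also be added from the AWS CLI instead of the console. The following is a minimal sketch, not a definitive recipe. It assumes the AWS CLI is configured as in the previous part. The security group ID and the CIDR are placeholders you must replace, and the DRY_RUN switch and the run helper are hypothetical conveniences added for illustration. A CIDR of 0.0.0.0/0 corresponds to Anywhere in the console, so the warning above applies to it as well.

```shell
# Placeholders (assumptions): replace with your own security group ID
# and with the address range that should be allowed in.
SECURITY_GROUP_ID="${SECURITY_GROUP_ID:-sg-0123456789abcdef0}"
ALLOWED_CIDR="${ALLOWED_CIDR:-203.0.113.10/32}"

# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Add one inbound TCP rule for the given port.
open_port() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$SECURITY_GROUP_ID" \
    --protocol tcp \
    --port "$1" \
    --cidr "$ALLOWED_CIDR"
}

open_port 8787   # RStudio
open_port 8888   # Hue
```

Restricting the CIDR to a single address (a /32) keeps the ports closed to everyone else, at the cost of having to update the rule whenever your address changes.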

SSH into the cluster

To SSH into the cluster, you need to know its Public DNS. This is listed on the running instances page of the EC2 console.

AWS EC2 Extended View
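The Public DNS of the master node can also be looked up from the command line. This is a minimal sketch, assuming the AWS CLI is configured and you know the cluster ID that was returned when the cluster was launched. The cluster ID shown is a placeholder, and the DRY_RUN switch and the run helper are hypothetical conveniences added for illustration.

```shell
# Placeholder (assumption): replace with your own EMR cluster ID.
CLUSTER_ID="${CLUSTER_ID:-j-XXXXXXXXXXXXX}"

# DRY_RUN=1 (the default here) only prints the command.
# Set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Ask EMR for the master node's public DNS name as plain text.
master_dns() {
  run aws emr describe-cluster \
    --cluster-id "$CLUSTER_ID" \
    --query Cluster.MasterPublicDnsName \
    --output text
}

master_dns
```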

The ssh command is then:

# Use ec2-user for admin rights
ssh -i "<YOUR_KEYPAIR>".pem ec2-user@"<PUBLIC DNS>"

# Use hadoop in order to run Hadoop jobs
ssh -i "<YOUR_KEYPAIR>".pem hadoop@"<PUBLIC DNS>"

So, in my example it would be:

# Use hadoop in order to run Hadoop jobs
ssh -i jjb_keypair.pem hadoop@ec2-52-1-0-32.compute-1.amazonaws.com
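One common stumbling block is that ssh refuses a private key file that other users can read, so it helps to tighten the key's permissions before connecting. The sketch below wraps both steps, using the key file and Public DNS from the example above as defaults. The DRY_RUN switch and the run and connect helpers are hypothetical names added for illustration.

```shell
# Defaults taken from the example above. Replace with your own values.
KEY_FILE="${KEY_FILE:-jjb_keypair.pem}"
PUBLIC_DNS="${PUBLIC_DNS:-ec2-52-1-0-32.compute-1.amazonaws.com}"

# DRY_RUN=1 (the default here) only prints each command.
# Set DRY_RUN=0 to actually run it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Restrict the key to owner read-only, then connect as the given user.
connect() {
  run chmod 400 "$KEY_FILE"
  run ssh -i "$KEY_FILE" "$1@$PUBLIC_DNS"
}

connect hadoop
```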