This guide illustrates how to install the STATS@UIUC Big Data Image on Amazon’s EMR platform. Because we are using Amazon’s EMR platform, we are operating on a different configuration of Hadoop than the one distributed with the UIUC Big Data Virtual Image. At a later time, we will make available an installation process for Amazon’s EC2 platform that mimics the UIUC Big Data Image.

A foreword…

This is new, and there is a lot of potential for a configuration to go astray. Do not be alarmed if this guide is updated frequently.

Table of Contents

The guide has been broken up into multiple parts because the image files make some of the posts quite large.

  1. Obtaining an Amazon Web Services (AWS) Account
  2. Installing Amazon Web Services Command Line Interface (AWS CLI) for Windows, OS X, and Linux
  3. Configuring AWS CLI
  4. Launching an AWS EMR Cluster with RStudio, Hive, Pig, and Hue
  5. Accessing RStudio and Hue on AWS EMR + SSH

Questions? E-mail me.