Intro

The previous post discussed how to interact with the STATS@UIUC Big Data Image. This post details the web interfaces that the image exposes on different ports and shows how to create snapshots, or saves, of the virtual machine state.

In the Browser

This section details the web components of the image. Each component is accessed from your preferred web browser through a different port on the localhost domain. You can think of each port as a different URL. In a production deployment, you might instead have different subdomains pointing to these locations.
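To make the "each port is a different URL" idea concrete, here is a minimal Python sketch that lists the ports covered in this post and checks whether each one is accepting connections. Only the port numbers come from this post; the service labels and the helper names (service_url, port_is_open) are made up for illustration.

```python
import socket

# Ports described in this post; the labels are informal names, not official ones.
SERVICES = {
    "RStudio Server": 8787,
    "Hue": 8000,
    "NameNode": 50070,
    "YARN Resource Manager": 8088,
    "Job History": 19888,
}


def service_url(port, host="localhost"):
    """Build the address you would type into the browser."""
    return f"http://{host}:{port}"


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        status = "up" if port_is_open(port) else "not reachable"
        print(f"{name:<22} {service_url(port)}  {status}")
```

If a service shows as not reachable, the most likely cause is that the image is not running yet or is still booting.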

Accessing RStudio

The modified UIUC HDP image includes RStudio Server, which lets you use R from your browser through the familiar desktop RStudio interface.

To access RStudio, make sure the image is running in VirtualBox.

Open a web browser and enter the following in the address bar:

localhost:8787

To log into RStudio, use the following account information:

Username: rstudio

Password: rstudio

rstudio login page

After logging in, you should be presented with:

rstudio login page

Accessing Hue

Later in the course, we will focus on Pig and Hive. Hortonworks provides a web interface, Hue, that greatly simplifies working with both.

To access the web interface use:

localhost:8000

Unlike RStudio, there is no login required. Upon entering the address, you should see:

hortonworks web interface intro

Accessing the NameNode

The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept.

Within the NameNode web interface, you have access to:

  1. Uptime of the NameNode
  2. Number of Live, Dead, and Decommissioned Nodes
  3. Host and Port Information
  4. Safe Mode Status
  5. Heap Information
  6. Audit Logs
  7. Garbage Collection Metrics
  8. Total Load
  9. File Operations
  10. CPU Usage

Note: The NameNode does not store the file data itself; it only tracks where that data is kept across the cluster.

To access the NameNode web interface use:

localhost:50070

There is no login to access this information. If the image is active, you should see:

hortonworks web interface intro
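Some of the same information can also be read programmatically. The sketch below assumes the NameNode exposes its metrics as JSON through a /jmx servlet on the same port, and that the FSNamesystemState bean carries the NumLiveDataNodes and NumDeadDataNodes fields. Treat the query string and field names as assumptions to verify against your image; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed query: the FSNamesystemState bean served by the NameNode's /jmx servlet.
JMX_URL = (
    "http://localhost:50070/jmx"
    "?qry=Hadoop:service=NameNode,name=FSNamesystemState"
)


def summarize_nodes(jmx_payload):
    """Pull live/dead DataNode counts out of a parsed /jmx response."""
    beans = jmx_payload.get("beans", [])
    if not beans:
        return {}
    bean = beans[0]
    return {
        "live": bean.get("NumLiveDataNodes"),
        "dead": bean.get("NumDeadDataNodes"),
    }


def fetch_node_summary(url=JMX_URL, timeout=5):
    """Download the bean from a running image and summarize it."""
    with urlopen(url, timeout=timeout) as response:
        return summarize_nodes(json.load(response))


if __name__ == "__main__":
    print(fetch_node_summary())
```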

Accessing the YARN/Resource Manager

The YARN/Resource Manager manages the compute resources in a Hadoop cluster and uses them to schedule users’ applications.

Within the YARN/Resource Manager web interface, you have access to:

  1. Cluster Status
  2. Running/Completed/Failed Jobs
  3. Scheduler Information
  4. Node Information
  5. Application Information

To access the YARN/Resource Manager web interface use:

localhost:8088

There is no login to access this information. If the image is active, you should see:

hortonworks web interface intro
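The job and application lists shown in this interface are also available over the Resource Manager's REST API. As a rough sketch, assuming the /ws/v1/cluster/apps endpoint is enabled on the image, the following Python counts applications by state. The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

APPS_URL = "http://localhost:8088/ws/v1/cluster/apps"


def fetch_json(url, timeout=5):
    """GET a URL and parse the body as JSON."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def count_by_state(apps_payload):
    """Count applications per state in a parsed cluster/apps response."""
    # "apps" may be null when nothing has been submitted yet.
    apps = (apps_payload.get("apps") or {}).get("app", [])
    counts = {}
    for app in apps:
        state = app.get("state", "UNKNOWN")
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    for state, n in sorted(count_by_state(fetch_json(APPS_URL)).items()):
        print(f"{state}: {n}")
```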

Accessing the Job History page

The Job History interface lets you view previously submitted jobs that have finished and are no longer running on the system. Think of this as a showcase of what you have been able to do with Hadoop!

To access the Job History web interface use:

localhost:19888

There is no login to access this information. If the image is active, you should see:

hortonworks web interface intro
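If you would rather collect this history in a script, the history server also offers a REST API. The sketch below assumes the /ws/v1/history/mapreduce/jobs endpoint is reachable and that each job entry carries id, state, startTime, and finishTime fields, with times in milliseconds. Check these against the actual response on your image; the helper name is hypothetical.

```python
import json
from urllib.request import Request, urlopen

JOBS_URL = "http://localhost:19888/ws/v1/history/mapreduce/jobs"


def job_durations(jobs_payload):
    """Return (id, state, seconds) for each job in a parsed jobs response."""
    jobs = (jobs_payload.get("jobs") or {}).get("job", [])
    rows = []
    for job in jobs:
        start, finish = job.get("startTime"), job.get("finishTime")
        if start is None or finish is None:
            continue  # skip entries without timing information
        # Assumed to be milliseconds since the epoch; convert to seconds.
        rows.append((job.get("id"), job.get("state"), (finish - start) / 1000.0))
    return rows


if __name__ == "__main__":
    request = Request(JOBS_URL, headers={"Accept": "application/json"})
    with urlopen(request, timeout=5) as response:
        for job_id, state, seconds in job_durations(json.load(response)):
            print(f"{job_id}  {state}  {seconds:.1f}s")
```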

Snapshots to back up work!

Snapshots allow you to save the state of a virtual machine for later use. You can later use a snapshot to revert the virtual machine to the state it was in when the snapshot was taken. In essence, a snapshot is like a photograph: it preserves a point in time. This makes snapshots an ideal way to back up your work on the virtual machine.

Taking a snapshot

There are two ways to take a snapshot of the virtual machine depending on the state it is in.

Snapshot of a running virtual machine

When the VM is running, go to the Machine pull-down menu and select Take Snapshot.

snapshot machine while running

Snapshot of a “saved” or “powered off” virtual machine

First, confirm that the VM is in either the “saved” or the “powered off” state, which is shown in the left-hand panel of the main window.

snapshot machine state

Then click on the Snapshots tab at the top right of the main window.

snapshot topright

To take a snapshot in this mode, there are two options:

  1. Click on the small camera icon.

snapshot camera icon

OR

  2. Right-click on the “Current State” item in the list and select Take Snapshot from the menu.

snapshot current state

Snapshot Save window

With either option, a window will pop up to prompt you for a snapshot name. Please give each snapshot a descriptive label (e.g. “Added book data files”).

snapshot naming

Press OK.

Your new snapshot will then appear in the snapshots list.

snapshot current state

Note: “Current State” of the machine has moved underneath your new snapshot. If you take another snapshot later on, it will be nested one level deeper in the list.
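The same steps can be scripted with VirtualBox's VBoxManage command-line tool, which is handy if you want to snapshot before every work session. This is a minimal sketch, assuming VBoxManage is on your PATH. The VM name below is a placeholder, so replace it with the name shown in the left-hand panel of the VirtualBox main window; the helper names are hypothetical.

```python
import subprocess

# Placeholder VM name (an assumption): use the name VirtualBox shows for your image.
VM_NAME = "My Big Data Image"


def take_snapshot_cmd(vm_name, snapshot_name, description=None):
    """Build the VBoxManage command that takes a snapshot."""
    cmd = ["VBoxManage", "snapshot", vm_name, "take", snapshot_name]
    if description:
        cmd += ["--description", description]
    return cmd


def list_snapshots_cmd(vm_name):
    """Build the VBoxManage command that prints the snapshot tree."""
    return ["VBoxManage", "snapshot", vm_name, "list"]


if __name__ == "__main__":
    # check=True raises an error if VBoxManage reports a failure.
    subprocess.run(take_snapshot_cmd(VM_NAME, "Added book data files"), check=True)
    subprocess.run(list_snapshots_cmd(VM_NAME), check=True)
```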

Restoring a Snapshot

To restore a snapshot, right-click on any snapshot in the list of snapshots and select Restore Snapshot.

snapshot restore previous state

When the snapshot is restored, the machine state previously denoted as the “Current State” is lost.

Deleting a Snapshot

To delete a snapshot, right-click on it in the snapshots tree and select Delete.

snapshot delete snapshot

Note: Deleting a past snapshot will not affect the “Current State” of the virtual machine. When the snapshot is deleted, only the files on disk that are specific to that snapshot are removed.
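Restoring and deleting can be scripted in the same way with VBoxManage. Because restoring discards the “Current State”, the sketch below only prints the commands as a dry run rather than executing them. The VM and snapshot names are placeholders, the helper names are hypothetical, and restoring from the command line is assumed to require that the VM is not running.

```python
import shlex


def restore_snapshot_cmd(vm_name, snapshot_name):
    """Build the VBoxManage command that restores a snapshot."""
    return ["VBoxManage", "snapshot", vm_name, "restore", snapshot_name]


def delete_snapshot_cmd(vm_name, snapshot_name):
    """Build the VBoxManage command that deletes a snapshot."""
    return ["VBoxManage", "snapshot", vm_name, "delete", snapshot_name]


if __name__ == "__main__":
    vm = "My Big Data Image"          # placeholder VM name
    snap = "Added book data files"    # snapshot label from the example above
    # Dry run: show the commands so you can review them before running them.
    print(shlex.join(restore_snapshot_cmd(vm, snap)))
    print(shlex.join(delete_snapshot_cmd(vm, snap)))
```

Once you are happy with the printed commands, you can paste them into a terminal or pass the lists to subprocess.run.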

Misc.

There are no limits on the number of snapshots you can take.

The only limitation is the amount of storage on your hard drive.
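Since disk space is the real constraint, it can be worth checking how much is free before taking another snapshot. Here is a small sketch using Python's standard library. The folder checked and the 10 GiB threshold are arbitrary assumptions; point it at the drive where VirtualBox stores your VM and pick a threshold that suits your image.

```python
import shutil
from pathlib import Path


def free_gib(path):
    """Free space, in GiB, on the drive that holds the given path."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """True if at least needed_gib GiB are free on that drive."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    vm_folder = Path.home()  # assumption: your VM files live on this drive
    print(f"{free_gib(vm_folder):.1f} GiB free on the drive holding {vm_folder}")
    if not has_room(vm_folder, needed_gib=10):  # arbitrary example threshold
        print("Consider deleting old snapshots before taking a new one.")
```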