Creating a Stacked Barplot in R using Base Graphics (no ggplot2)

Stacked barplots, or graphs that depict conditional distributions of data, are great for seeing a level-wise breakdown of the data. Unfortunately, the default interface of base R's barplot() expects the counts to be arranged as a matrix, so a little preparation is needed before a stacked barplot can be drawn. In fact, many R users suggest creating stacked barplots using ggplot2, since it is easier and also looks a bit better. However, for those not yet aboard the ggplot2 train or for those who prefer base R, fear not! For we are about to embark on an adventure that will bring stacked barplots to base R!

Examining the Data

For this example, we will use data concerning web domains in different journals. The dataset looks like the following:

internetrefs
Domain Journal Count
gov NEJM 41
gov JAMA 103
gov Science 111
org NEJM 37
org JAMA 46
org Science 162
com NEJM 6
com JAMA 17
com Science 14
edu NEJM 4
edu JAMA 8
edu Science 47
other NEJM 9
other JAMA 15
other Science 52
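
To follow along without the original file, the table above can be typed in by hand. This is only a sketch: the values are copied from the listing, and stringsAsFactors=TRUE is an assumption made here so that Domain and Journal are stored as factors, which the levels() calls in the next section rely on.

```r
# Hand-entered copy of the internetrefs table shown above.
internetrefs <- data.frame(
  Domain  = rep(c("gov", "org", "com", "edu", "other"), each = 3),
  Journal = rep(c("NEJM", "JAMA", "Science"), times = 5),
  Count   = c(41, 103, 111,
              37,  46, 162,
               6,  17,  14,
               4,   8,  47,
               9,  15,  52),
  stringsAsFactors = TRUE  # keep Domain and Journal as factors
)

# Quick look at the structure: 15 rows, two factors, one numeric column.
str(internetrefs)
```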

Notice that the data has a consistent order: all entries for a given Domain are grouped together, and within each Domain the Journal variable cycles through NEJM, then JAMA, and finally Science. We will exploit this feature of the data. I will also provide a method that arranges the data when such a feature does not exist.

Creating the Matrix

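In order to create a stacked barplot, we must first know a variable’s levels(). Levels are the distinct categories of a factor, which is how R stores a categorical (string) variable. The levels() function returns those distinct values, sorted alphabetically by default. Note that levels() only works on factors; if the columns were read in as plain character strings (the default for read.delim() since R 4.0.0), convert them with factor() first. So: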

#Attach object so that we can reference by Domain, Journal, and Count
#Instead of internetrefs$Count
attach(internetrefs)
levels(Journal)
## [1] "JAMA"    "NEJM"    "Science"
levels(Domain)
## [1] "com"   "edu"   "gov"   "org"   "other"
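
The difference between a character vector and a factor is easy to check directly. The short sketch below uses a made-up vector (journal_chr is a hypothetical name, not part of the dataset) to show that levels() returns NULL for plain strings, that factor() sorts its levels alphabetically, and that unique() instead keeps the order of first appearance.

```r
# A small character vector, in the same order the journals appear in the data.
journal_chr <- c("NEJM", "JAMA", "Science", "NEJM")

levels(journal_chr)           # NULL: character vectors carry no levels

journal_fct <- factor(journal_chr)
levels(journal_fct)           # "JAMA" "NEJM" "Science" (alphabetical)

unique(journal_chr)           # "NEJM" "JAMA" "Science" (order of appearance)
```

The distinction matters later on: labels taken from levels() only line up with the data if the data is sorted the same way.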

Then, using the levels() information, the data must be transformed into a matrix: each row corresponds to one level of the Domain variable and each column to one level of the Journal variable. We build this by:

### USING THE PRE-EXISTING ORDER FEATURE OF THE DATA ###

#Load the count values
data=Count

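#Place data in a matrix that will have 3 columns since the number of levels() for Journal is 3.
#Also, based on the ordering feature of the data, load the matrix such that we fill the matrix by row. 
data=matrix(data,ncol=3,byrow=TRUE)

#Label the columns and rows in the order in which they appear in the data.
#levels() returns alphabetical order, which does not match this fill order,
#so we use unique() to keep the order of first appearance instead.
colnames(data)=unique(as.character(Journal))
rownames(data)=unique(as.character(Domain))

The matrix structure of data is then:

      NEJM JAMA Science
gov     41  103     111
org     37   46     162
com      6   17      14
edu      4    8      47
other    9   15      52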

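But, let’s say that you lack that nice feature of the data that we discussed earlier. One way to obtain it is by sorting the rows of the initial data frame by Domain and then by Journal. Because order() sorts factors by their levels, the sorted rows line up with levels(Domain) and levels(Journal):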

internetrefs_ordered = internetrefs[with(internetrefs, order(Domain,Journal)), ]
  Domain Journal Count
8 com JAMA 17
7 com NEJM 6
9 com Science 14
11 edu JAMA 8
10 edu NEJM 4
12 edu Science 47
2 gov JAMA 103
1 gov NEJM 41
3 gov Science 111
5 org JAMA 46
4 org NEJM 37
6 org Science 162
14 other JAMA 15
13 other NEJM 9
15 other Science 52

From here, the problem then simplifies to the code used for the ordered data:

### AFTER CREATING THE ORDER FEATURE OF THE DATA ###

#Access and load the count values
data_ordered=internetrefs_ordered$Count

#Place data in a matrix that will have 3 columns since the number of levels() for Journal is 3.
#Also, based on the ordering feature of the data, load the matrix such that we fill the matrix by row. 
data_ordered=matrix(data_ordered,ncol=3,byrow=TRUE)

#Label the columns and rows
colnames(data_ordered)=levels(internetrefs_ordered$Journal)
rownames(data_ordered)=levels(internetrefs_ordered$Domain)

The matrix structure of data_ordered is then:

      JAMA NEJM Science
com     17    6      14
edu      8    4      47
gov    103   41     111
org     46   37     162
other   15    9      52
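
If relying on row order feels fragile, base R's xtabs() can build the same Domain-by-Journal table directly from the long-format data frame, whatever order the rows arrive in. This is an alternative to the approach above, not the method used in the rest of this post. A minimal sketch, with the data frame re-entered by hand from the listing so that the block runs on its own:

```r
# Hand-entered copy of the internetrefs table (values from the listing above).
internetrefs <- data.frame(
  Domain  = rep(c("gov", "org", "com", "edu", "other"), each = 3),
  Journal = rep(c("NEJM", "JAMA", "Science"), times = 5),
  Count   = c(41, 103, 111, 37, 46, 162, 6, 17, 14, 4, 8, 47, 9, 15, 52)
)

# Sum Count for every Domain/Journal combination.
# Rows are Domains and columns are Journals, each matched by name
# rather than by position.
counts <- xtabs(Count ~ Domain + Journal, data = internetrefs)
counts
```

The result is a two-way table that can be passed to barplot() and prop.table() in place of the hand-built matrix.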

Building the Stacked Barplot

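There are two ways to build a stacked barplot: percentage-based and counts-based. Also, we can opt to have the segments stacked on top of one another (vertically, the default) or laid out side by side within each group (what this post calls stacked horizontally, produced with beside=TRUE).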

However, special care needs to be taken when including a legend. By default, barplot()’s own legend is drawn inside the plot region, where it can overlap the bars. As a result, we draw the legend separately with legend(), which requires widening the right margin and changing how clipping is handled. This is achieved by setting par():

#mar is defined to receive: c(bottom, left, top, right) .
#The default margin is: c(5, 4, 4, 2) + 0.1 .
#Here, we have widened the right-hand margin of the figure to hold the legend.

#xpd=TRUE clips plotting to the figure region instead of the plot region,
#which allows the legend to be drawn in the margin.
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
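
Since par() changes stay in effect for every later plot on the same device, it can be handy to restore the previous settings afterwards. A small sketch of that pattern follows. The name old_par is arbitrary, and the pdf(NULL) line is an assumption added only so the snippet also runs in a non-interactive session without writing a file.

```r
# In a non-interactive session, draw to a null device (no file is written).
if (!interactive()) pdf(NULL)

# par() returns the previous values of the settings it changes.
old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# ... barplot() and legend() calls go here ...

# Put the margins and clipping back the way they were.
par(old_par)
```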

To build percentage-based barplots, we must use prop.table() to convert the counts in each column into proportions (values between 0 and 1):

#Here, margin indicates whether the proportions are computed within rows (1) or columns (2).
#We use columns since each column (Journal) becomes one bar, so each bar sums to 1.
prop = prop.table(data,margin=2)
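
As a sanity check on the margin argument, the sketch below rebuilds the count matrix by hand (values from the dataset listing, with rows and columns in alphabetical order) and confirms that every column of the result sums to 1. The multiplication by 100 at the end is just one way to display the shares as percentages.

```r
# Domain-by-Journal counts, entered by hand from the dataset listing.
data <- matrix(c( 17,  6,  14,
                   8,  4,  47,
                 103, 41, 111,
                  46, 37, 162,
                  15,  9,  52),
               ncol = 3, byrow = TRUE,
               dimnames = list(c("com", "edu", "gov", "org", "other"),
                               c("JAMA", "NEJM", "Science")))

# Proportions within each column (Journal).
prop <- prop.table(data, margin = 2)

colSums(prop)          # each column sums to 1
round(100 * prop, 1)   # the same shares shown as percentages
```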

If we are looking to build a percentage-based vertical stacked barplot then:

par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(prop, col=heat.colors(length(rownames(prop))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))

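If we are looking to build a percentage-based horizontally stacked barplot, that is, one with the segments drawn side by side via beside=TRUE (rotating the bars themselves would instead require horiz=TRUE), then: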

par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(prop, col=heat.colors(length(rownames(prop))), width=2, beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))

If we are looking to build a counts-based horizontally stacked barplot (segments side by side via beside=TRUE) then:

par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(data, col=heat.colors(length(rownames(data))), width=2, beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))

If we are looking to build a counts-based vertical stacked barplot then:

par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(data, col=heat.colors(length(rownames(data))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))
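
For bars that actually run left to right while staying stacked, barplot() also accepts horiz=TRUE. The sketch below is a variation on the calls above rather than part of the original recipe: it uses barplot()'s own legend.text and args.legend arguments to place the legend in the widened margin, and it re-enters the count matrix by hand so that it runs on its own. The pdf(NULL) and dev.off() lines are assumptions added only for non-interactive runs.

```r
# In a non-interactive session, draw to a null device (no file is written).
if (!interactive()) pdf(NULL)

# Domain-by-Journal counts, entered by hand from the dataset listing.
data <- matrix(c( 17,  6,  14,
                   8,  4,  47,
                 103, 41, 111,
                  46, 37, 162,
                  15,  9,  52),
               ncol = 3, byrow = TRUE,
               dimnames = list(c("com", "edu", "gov", "org", "other"),
                               c("JAMA", "NEJM", "Science")))
prop <- prop.table(data, margin = 2)

# Widen the right margin and allow drawing in it, remembering the old settings.
old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# horiz = TRUE rotates the bars; legend.text = TRUE labels the legend with
# the row names of prop; args.legend is passed on to legend().
mids <- barplot(prop, horiz = TRUE, col = heat.colors(nrow(prop)),
                legend.text = TRUE,
                args.legend = list(x = "topright", inset = c(-0.25, 0)))

par(old_par)
if (!interactive()) dev.off()
```

barplot() returns the midpoints of the bars (stored in mids here), which can be useful for adding text or extra axis labels afterwards.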

In Summary

We’ve talked about a lot of different components required to pull off a stacked barplot in R. Below is the script that you will want to modify to suit your own data:

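#stringsAsFactors=TRUE keeps Domain and Journal as factors so that levels() works
#(since R 4.0.0, read.delim() reads strings as character by default)
internetrefs = read.delim("F:/Desktop/internetrefs.txt", stringsAsFactors=TRUE)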

#Force Order
data_ordered = internetrefs[with(internetrefs, order(Domain,Journal)), ]

#load the count values
data=data_ordered$Count

data=matrix(data,ncol=3,byrow=TRUE)

colnames(data)=levels(data_ordered$Journal)
rownames(data)=levels(data_ordered$Domain)

prop = prop.table(data,margin=2)

par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)

#Percent-based vertically stacked barplot
barplot(prop, col=heat.colors(length(rownames(prop))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))

#Percent-based horizontally stacked barplot (segments side by side via beside=TRUE)
barplot(prop, col=heat.colors(length(rownames(prop))), width=2, beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))

#Counts-based vertically stacked barplot
barplot(data, col=heat.colors(length(rownames(data))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))

#Counts-based horizontally stacked barplot (segments side by side via beside=TRUE)
barplot(data, col=heat.colors(length(rownames(data))), width=2,beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))

Thanks

Special thanks go out to Weihong Huang!