# Creating Stacked Barplot and Grouped Barplot in R using Base Graphics (no ggplot2)

# Creating a Stacked Barplot in R using Base Graphics (no ggplot2)

Stacked barplots, which depict the conditional distribution of one categorical variable within each level of another, are great for seeing a level-wise breakdown of the data. Base R’s barplot() can draw them, but only after the data has been reshaped into a matrix, and adding a usable legend takes some extra work. Because of this, many R users suggest creating stacked barplots with ggplot2, since it is easier and also looks a bit better. However, for those not yet aboard the ggplot2 train, or for those who prefer base R, fear not! We are about to embark on an adventure that will bring stacked barplots to base R!

## Examining the Data

For this example, we will use data concerning web domains in different journals. The dataset looks like the following:

```
internetrefs
```

| Domain | Journal | Count |
|---|---|---|
| gov | NEJM | 41 |
| gov | JAMA | 103 |
| gov | Science | 111 |
| org | NEJM | 37 |
| org | JAMA | 46 |
| org | Science | 162 |
| com | NEJM | 6 |
| com | JAMA | 17 |
| com | Science | 14 |
| edu | NEJM | 4 |
| edu | JAMA | 8 |
| edu | Science | 47 |
| other | NEJM | 9 |
| other | JAMA | 15 |
| other | Science | 52 |

Notice that the data has a consistent order: all rows for a given **Domain** are grouped together, and within each group the **Journal** variable cycles through *NEJM*, then *JAMA*, and finally *Science*. We will exploit this feature of the data. I will also provide a method for arranging the data when this feature does not exist.
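For readers who want to follow along without the original file, the table above can be typed in directly. This is only a sketch: the object name `internetrefs` matches the text, but building it with data.frame() and setting stringsAsFactors=TRUE (so that levels() behaves as shown later) are assumptions, since the original data was read from a file.

```r
# Retype the table shown above; rep() reproduces the grouped/cyclic ordering.
internetrefs <- data.frame(
  Domain  = rep(c("gov", "org", "com", "edu", "other"), each = 3),
  Journal = rep(c("NEJM", "JAMA", "Science"), times = 5),
  Count   = c(41, 103, 111, 37, 46, 162, 6, 17, 14, 4, 8, 47, 9, 15, 52),
  stringsAsFactors = TRUE  # assumption: keep Domain and Journal as factors
)

# Show the first few rows to confirm the layout
head(internetrefs, 3)
```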

### Creating the Matrix

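In order to create a stacked barplot, we must first know each variable’s levels(). Levels belong to factors, which are how R stores categorical (string) variables. The levels() function returns the unique values that a factor takes on, sorted alphabetically by default rather than in order of appearance. Note that since R 4.0.0, read.delim() and data.frame() keep strings as character vectors unless stringsAsFactors=TRUE is set; levels() returns NULL for a character vector, so such columns need to be converted with factor() first. So: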

```
#Attach object so that we can reference by Domain, Journal, and Count
#Instead of internetrefs$Count
attach(internetrefs)
levels(Journal)
```

```
## [1] "JAMA" "NEJM" "Science"
```

```
levels(Domain)
```

```
## [1] "com" "edu" "gov" "org" "other"
```
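One detail worth checking before labelling anything: the levels come back sorted alphabetically, not in the order in which the values appear in the data. The following sketch uses a made-up vector `journal_chr` (an illustrative name, not part of the dataset) to show the difference between levels() and unique():

```r
# Hypothetical vector in the same cyclic order as the Journal column
journal_chr <- c("NEJM", "JAMA", "Science", "NEJM", "JAMA", "Science")

# A character vector has no levels at all
levels(journal_chr)

# Converting to a factor sorts the levels alphabetically
journal <- factor(journal_chr)
levels(journal)

# unique() keeps the order of first appearance instead
unique(journal_chr)

# If appearance order is wanted as the level order, say so explicitly
journal_by_appearance <- factor(journal_chr, levels = unique(journal_chr))
levels(journal_by_appearance)
```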

Then, using the levels() information, the data must be transformed into a matrix structure. The matrix structure takes on the following form: the rows represent the values of the **Domain** variable and the columns represent the values of the **Journal** variable. Because the counts are loaded in the order in which they appear in the data, the labels must follow that same order of appearance, which unique() gives us; levels() returns them alphabetically and would attach the wrong labels here. We induce this by:

```
### USING THE PRE-EXISTING ORDER FEATURE OF THE DATA ###
#Load the count values
data=Count
#Place data in a matrix that will have 3 columns since the number of levels() for Journal is 3.
#Also, based on the ordering feature of the data, load the matrix such that we fill the matrix by row.
data=matrix(data,ncol=3,byrow=TRUE)
#Label the columns and rows in order of appearance (NEJM, JAMA, Science and gov, org, com, edu, other).
#levels() is sorted alphabetically and would mislabel counts stored in this order.
colnames(data)=as.character(unique(Journal))
rownames(data)=as.character(unique(Domain))
```

The matrix structure of data is then:

| | NEJM | JAMA | Science |
|---|---|---|---|
| gov | 41 | 103 | 111 |
| org | 37 | 46 | 162 |
| com | 6 | 17 | 14 |
| edu | 4 | 8 | 47 |
| other | 9 | 15 | 52 |

But, let’s say that you lack that nice feature of the data that we discussed earlier. One way to obtain it is to sort the rows of the initial data frame by **Domain** and then by **Journal**:

```
internetrefs_ordered = internetrefs[with(internetrefs, order(Domain,Journal)), ]
```

| | Domain | Journal | Count |
|---|---|---|---|
| 8 | com | JAMA | 17 |
| 7 | com | NEJM | 6 |
| 9 | com | Science | 14 |
| 11 | edu | JAMA | 8 |
| 10 | edu | NEJM | 4 |
| 12 | edu | Science | 47 |
| 2 | gov | JAMA | 103 |
| 1 | gov | NEJM | 41 |
| 3 | gov | Science | 111 |
| 5 | org | JAMA | 46 |
| 4 | org | NEJM | 37 |
| 6 | org | Science | 162 |
| 14 | other | JAMA | 15 |
| 13 | other | NEJM | 9 |
| 15 | other | Science | 52 |

From here, the problem then simplifies to the code used for the ordered data:

```
### AFTER CREATING THE ORDER FEATURE OF THE DATA ###
#Access and load the count values
data_ordered=internetrefs_ordered$Count
#Place data in a matrix that will have 3 columns since the number of levels() for Journal is 3.
#Also, based on the ordering feature of the data, load the matrix such that we fill the matrix by row.
data_ordered=matrix(data_ordered,ncol=3,byrow=TRUE)
#Label the columns and rows. Sorting with order() put the rows in levels() order,
#so here the levels() labels line up with the counts.
colnames(data_ordered)=levels(internetrefs_ordered$Journal)
rownames(data_ordered)=levels(internetrefs_ordered$Domain)
```

| | JAMA | NEJM | Science |
|---|---|---|---|
| com | 17 | 6 | 14 |
| edu | 8 | 4 | 47 |
| gov | 103 | 41 | 111 |
| org | 46 | 37 | 162 |
| other | 15 | 9 | 52 |
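As an alternative that does not depend on row order at all, the same cross-tabulation can be obtained in one step with xtabs(), which sums **Count** for every Domain/Journal combination. This is a sketch rather than the method used above; the data frame is retyped here (and deliberately shuffled) so that the block runs on its own.

```r
# Retyped copy of the dataset (assumption: same values as the table in the text)
internetrefs <- data.frame(
  Domain  = rep(c("gov", "org", "com", "edu", "other"), each = 3),
  Journal = rep(c("NEJM", "JAMA", "Science"), times = 5),
  Count   = c(41, 103, 111, 37, 46, 162, 6, 17, 14, 4, 8, 47, 9, 15, 52)
)

# Shuffle the rows to show that the result does not rely on any ordering
set.seed(1)
shuffled <- internetrefs[sample(nrow(internetrefs)), ]

# Rows are Domain values, columns are Journal values, cells are summed counts
counts <- xtabs(Count ~ Domain + Journal, data = shuffled)
counts
```

The result is a table object that barplot() and prop.table() accept like a matrix, with rows and columns labelled in sorted order.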

## Building the Stacked Barplot

There are two ways to build a stacked barplot: percentage-based and counts-based. Also, we can opt to stack the segments within each bar or, with beside=TRUE, to place them side by side as a grouped barplot.

However, special care needs to be taken when including a legend. By default, the legend is drawn inside the plot region, where it can overlap the bars. As a result, one needs to widen the right margin and change how clipping is handled so that the legend can sit outside the plot region. This is achieved by setting par():

```
#mar is defined to receive: c(bottom, left, top, right).
#The default margin is: c(5, 4, 4, 2) + 0.1.
#Here, we have expanded the right-hand margin of the figure to hold the legend.
#xpd=TRUE clips all plotting to the figure region instead of the plot region,
#which allows the legend to be drawn in the margin.
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
```
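Since par() changes stay in effect for the rest of the graphics session, it can be handy to save the previous settings and restore them once the plot is done. A minimal sketch, using a small made-up matrix `m` in place of the real data:

```r
# Hypothetical 2x2 matrix, only here to have something to draw
m <- matrix(c(3, 1, 2, 4), ncol = 2,
            dimnames = list(c("a", "b"), c("X", "Y")))

# par() returns the previous values of the settings it changes
old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# Draw the plot and place the legend in the widened right margin
barplot(m, col = heat.colors(nrow(m)))
legend("topright", inset = c(-0.25, 0),
       fill = heat.colors(nrow(m)), legend = rownames(m))

# Put the margins and clipping back the way they were
par(old_par)
```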

To build percentage-based barplots, we use prop.table() to convert the counts in each column into proportions of that column’s total:

```
#Here, margin represents whether it will be run on rows (1) or columns (2)
#We've selected to use prop.table() on columns since each column (a Journal) becomes one bar,
#so the segments of every bar sum to 1.
prop = prop.table(data,margin=2)
```
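To see what prop.table() does with margin=2, the calculation can be checked by hand. The sketch below retypes the counts as a matrix named `counts` (an illustrative name) so that it runs on its own, and compares the result with dividing each column by its total:

```r
# Counts from the dataset, rows = Domain, columns = Journal
counts <- matrix(c(41, 103, 111,
                   37,  46, 162,
                    6,  17,  14,
                    4,   8,  47,
                    9,  15,  52),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(c("gov", "org", "com", "edu", "other"),
                                 c("NEJM", "JAMA", "Science")))

prop <- prop.table(counts, margin = 2)

# Every column (one bar per Journal) now sums to 1
colSums(prop)

# The same result by hand: divide each column by its column total
manual <- sweep(counts, 2, colSums(counts), "/")
all.equal(prop, manual)

# Example cell: gov within NEJM is 41 out of a column total of 97
prop["gov", "NEJM"]
```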

If we are looking to build a **percentage**-based *vertical* stacked barplot then:

```
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(prop, col=heat.colors(length(rownames(prop))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))
```

If we are looking to build a **percentage**-based *grouped* barplot, where beside=TRUE places the segments side by side instead of stacking them, then:

```
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(prop, col=heat.colors(length(rownames(prop))), width=2, beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))
```

If we are looking to build a **counts**-based *grouped* barplot, again with the segments placed side by side, then:

```
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(data, col=heat.colors(length(rownames(data))), width=2, beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))
```

If we are looking to build a **counts**-based *vertical* stacked barplot then:

```
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
barplot(data, col=heat.colors(length(rownames(data))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))
```
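For bars that are stacked but run left to right, barplot() has a horiz argument. The sketch below retypes the counts matrix so that it runs on its own, and lets barplot() draw the legend itself through legend.text and args.legend instead of a separate legend() call; treat it as one possible variation rather than part of the recipe above.

```r
# Counts from the dataset, rows = Domain, columns = Journal
counts <- matrix(c(41, 103, 111,
                   37,  46, 162,
                    6,  17,  14,
                    4,   8,  47,
                    9,  15,  52),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(c("gov", "org", "com", "edu", "other"),
                                 c("NEJM", "JAMA", "Science")))
prop <- prop.table(counts, margin = 2)

# Widen the right margin and allow drawing outside the plot region
old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)

# horiz = TRUE draws the stacked bars horizontally;
# legend.text = TRUE takes the legend labels from the row names
mids <- barplot(prop, horiz = TRUE, col = heat.colors(nrow(prop)),
                legend.text = TRUE,
                args.legend = list(x = "topright", inset = c(-0.25, 0)))

# Restore the previous graphics settings
par(old_par)
```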

## In Summary

We’ve talked about a lot of different components required to pull off a stacked barplot in R. Below is the script that you will want to modify to suit your own data:

```
#stringsAsFactors=TRUE keeps Domain and Journal as factors so that levels() works
#(since R 4.0.0, strings are read as character vectors by default)
internetrefs = read.delim("F:/Desktop/internetrefs.txt", stringsAsFactors=TRUE)
#Force Order
data_ordered = internetrefs[with(internetrefs, order(Domain,Journal)), ]
#load the count values
data=data_ordered$Count
#Fill by row: 3 columns, one per Journal level
data=matrix(data,ncol=3,byrow=TRUE)
colnames(data)=levels(data_ordered$Journal)
rownames(data)=levels(data_ordered$Domain)
#Column-wise proportions: the segments of each Journal's bar sum to 1
prop = prop.table(data,margin=2)
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
#Percent-based stacked barplot
barplot(prop, col=heat.colors(length(rownames(prop))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))
#Percent-based grouped (side-by-side) barplot
barplot(prop, col=heat.colors(length(rownames(prop))), width=2, beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(data))
#Counts-based stacked barplot
barplot(data, col=heat.colors(length(rownames(data))), width=2)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))
#Counts-based grouped (side-by-side) barplot
barplot(data, col=heat.colors(length(rownames(data))), width=2,beside=TRUE)
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(data))), legend=rownames(data))
```

## Thanks

Special thanks go out to Weihong Huang!