## Intro

Oddly enough, I spend a lot of my time talking about different kinds of
data structures and when they should be used. Recently, one of my fellow
graduate students discovered why using **sparse** matrices instead of **regular**
matrices are beneficial. Principally, the sparse matrix format reduces the size
of the overall data structure as it discards uneventful values. For instance,
if an entry value is zero or very close to it, then retaining such values
wouldn’t be informative. Instead, there would be a preference to only store
the locations inside the matrix that have non-zero entries.

### The Matrices War: Regular Matrices vs. Sparse Matrices

To start the discussion, let’s create a regular matrix that is “sparse”. For data,
let’s aim to use the binomial distribution to replicate a coin flip under a
low probability. Thus, heads would be a `1`

and tails would be a `0`

. So,
we’re only going to be interested in knowing when heads lands.

```
# Set a seed for reproducibility
set.seed(9134)
# Generate a matrix containing 1's and 0's with
# a low probability of 1.
standard_matrix = matrix(
rbinom(100*100, size = 1, prob = .06),
nrow = 100
)
# Show the first four rows and columns
standard_matrix[1:4, 1:4]
```

```
## [,1] [,2] [,3] [,4]
## [1,] 0 0 0 0
## [2,] 0 0 0 0
## [3,] 0 0 0 1
## [4,] 0 0 0 0
```

If we convert the matrix to a sparse matrix, the **zero** entries will be
discarded while the **non-zero** entries are preserved and stored by coordinate.
To convert to a sparse matrix, either use the `as()`

or `Matrix()`

functions
that reside in the `Matrix`

*R* package.

```
# Load the Matrix package
library("Matrix")
# Convert the regular matrix to a sparse matrix
sparse_matrix = as(standard_matrix, "sparseMatrix")
# Alternatively, we could use:
# Matrix(sparse_matrix, sparse = TRUE)
# Preview the sparse matrix
sparse_matrix[1:4, 1:4]
```

```
## 4 x 4 sparse Matrix of class "dgCMatrix"
##
## [1,] . . . .
## [2,] . . . .
## [3,] . . . 1
## [4,] . . . .
```

Notice, wherever there was a `0`

entry a period, e.g. `.`

, now acts as a placeholder.
As a result of discarding **zero** entries, the size of the matrix decreases
greatly.

```
# Standard matrix
print(object.size(standard_matrix), units="Kb")
```

```
## 39.3 Kb
```

```
# Sparse matrix
print(object.size(sparse_matrix), units="Kb")
```

```
## 8.4 Kb
```

This is largely because the `sparseMatrix`

format only saves the non-zero
values at a coordinate location under the scheme of:

`i`

is**row**index;`j`

is**column**index;`x`

is the**value**(optional if values other than 1 preferred); and`dim`

is the dimension of the matrix.

With scheme in hand, the sparse matrix can be directly constructed using
`sparseMatrix()`

function in the `Matrix`

*R* package.

```
# Construct a new sparse matrix from a sample.
sample_sparse_matrix = sparseMatrix(
i = c(1, 2, 4), # Row ID
j = c(1, 3, 5), # Column ID
x = c(8, 1, 7), # Contents
dims = list(4, 5) # Dimension of matrix
)
# Show sparse matrix
sample_sparse_matrix
```

```
## 4 x 5 sparse Matrix of class "dgCMatrix"
##
## [1,] 8 . . . .
## [2,] . . 1 . .
## [3,] . . . . .
## [4,] . . . . 7
```

## Fin

In short, sparse matrices are incredibly powerful for storing data that has a small amount of entries. Consider using a sparse matrix if you are encountering memory issues.