Editor’s Note: This post is published within the sitmo R package as a vignette.

# Intro

Many random number generators for various distributions rely on the Probability Integral Transformation theorem, which can be stated succinctly as follows:

Theorem

Let $X$ be a continuous random variable with cumulative distribution function (CDF) $F_X\left({x}\right)$, and define the random variable $U = F_X\left({X}\right)$. Then $U$ is uniformly distributed on $[0,1]$.

Proof

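Given a continuous random variable $X$ whose CDF $F_X$ is strictly increasing (so that $F_X^{-1}$ exists), define $U = F_X\left({X}\right)$. Then, for $u \in [0,1]$:

$$F_U\left({u}\right) = P\left({U \le u}\right) = P\left({F_X\left({X}\right) \le u}\right) = P\left({X \le F_X^{-1}\left({u}\right)}\right) = F_X\left({F_X^{-1}\left({u}\right)}\right) = u$$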

Therefore, $F_U\left({u}\right)$ is the CDF of a Uniform(0,1) RV. Hence, $U$ has a uniform distribution on $[0,1]$.
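In practice the theorem is usually applied in reverse: if $U$ is Uniform(0,1), then $F_X^{-1}\left({U}\right)$ has CDF $F_X$. The following is a minimal sketch of both directions, assuming an Exponential distribution with rate `lambda` as the target. The names `exp_cdf` and `exp_quantile` are illustrative only and are not part of sitmo or Rcpp.

```cpp
#include <cassert>
#include <cmath>

// CDF of an Exponential(lambda) random variable: F(x) = 1 - exp(-lambda * x).
// Applying this to an exponential draw gives a Uniform(0,1) value.
double exp_cdf(double x, double lambda) {
    assert(lambda > 0.0);
    return x <= 0.0 ? 0.0 : 1.0 - std::exp(-lambda * x);
}

// Inverse CDF (quantile function): F^{-1}(u) = -log(1 - u) / lambda.
// Applying this to a Uniform(0,1) draw u gives an Exponential(lambda) value.
double exp_quantile(double u, double lambda) {
    assert(lambda > 0.0 && u >= 0.0 && u < 1.0);
    return -std::log1p(-u) / lambda;
}
```

Passing a uniform draw through `exp_quantile` and the result back through `exp_cdf` should recover the original draw up to floating-point error, which is the round trip the theorem describes.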

# Random Uniform Distribution (runif) in C++

Within the R/C++ API, there are three ways one can use a random uniform distribution.

• Through Rcpp’s hook into the Rmath.h library, which controls random generation, via R::runif(a,b), or through Rcpp’s sugar function Rcpp::runif(n,a,b).
• By using C++11’s built-in generators and statistical distributions to create a uniform real random variable generator.
• By building a uniform generator directly on top of a random number engine, such as the one sitmo provides.

The remainder of this vignette focuses on the last approach: the creation of an RNG.
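For comparison, the C++11 route can be sketched in a few lines. This is a minimal illustration, assuming `std::mt19937` as the engine; the function name `runif_std` is hypothetical and does not come from Rcpp or sitmo.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Draw n values on [a, b) using the C++11 <random> facilities.
std::vector<double> runif_std(std::size_t n, double a, double b, unsigned int seed) {
    assert(a < b);
    std::mt19937 engine(seed);                          // Mersenne Twister engine
    std::uniform_real_distribution<double> dist(a, b);  // maps engine output onto [a, b)
    std::vector<double> out(n);
    for (double& x : out) {
        x = dist(engine);
    }
    return out;
}
```

Here the distribution object performs the scaling step internally, which is the part the rest of this vignette writes out by hand.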

# Creating a Random Uniform Distribution

Creating a random uniform generator requires three ingredients:

1. A source of random numbers $R$ (e.g. 18885, 23945734, 4298034852, and so on)
2. The known maximum/ceiling of that source, $\max\left(R\right)$ (e.g. sitmo::prng_engine::max() or SITMO_RAND_MAX)
3. A way to scale each randomly generated number into the interval $[a,b]$.

sitmo provides a high-quality version of 1., as shown informally later in this vignette, and a means of acquiring 2. Thus, one is left only with writing the correct scaling equation. In particular, a raw draw $R$ is mapped into $[a,b]$ by:

$$a + \frac{R}{\max\left(R\right)} \left(b - a\right)$$

The implementation of this using sitmo is given as follows:
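A minimal sketch of that scaling step is shown here. To keep it compilable without the sitmo and Rcpp headers, it is templated on the engine type and can be exercised with `std::mt19937` as a stand-in; `sitmo::prng_engine` is assumed to offer the same `operator()`, static `max()`, and seed constructor. The names `scale_draw` and `runif_engine` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Scale one raw engine output R into [a, b] via a + (R / max(R)) * (b - a).
// Engine is any generator exposing operator() and a static max().
template <class Engine>
double scale_draw(Engine& eng, double a, double b) {
    assert(a < b);
    const double mx = static_cast<double>(Engine::max());
    return a + (static_cast<double>(eng()) / mx) * (b - a);
}

// Hypothetical helper: n scaled draws from a freshly seeded engine.
template <class Engine>
std::vector<double> runif_engine(std::size_t n, double a, double b, unsigned int seed) {
    Engine eng(seed);
    std::vector<double> out(n);
    for (double& x : out) {
        x = scale_draw(eng, a, b);
    }
    return out;
}
```

With the sitmo headers available, one would presumably instantiate `runif_engine<sitmo::prng_engine>` and copy the result into an `Rcpp::NumericVector`; that wiring is omitted here.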

To verify the quality of sitmo in an informal way, we can test for dependence, measured by correlation, between the streams produced under different seeds. To do so, we generate the same number of realizations under each seed in a range of seeds. With this being said, we consider the following code:
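As a rough sketch of such an experiment, the function below produces one stream of scaled draws per seed over a consecutive range of seeds. It uses `std::mt19937` as a stand-in engine so that it compiles on its own; swapping in `sitmo::prng_engine` is assumed to require only changing the engine type. The name `streams_by_seed` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// One stream per seed: result[j] holds n values on [a, b]
// generated under seed (first_seed + j).
std::vector<std::vector<double>> streams_by_seed(std::size_t n, unsigned int first_seed,
                                                 std::size_t n_seeds, double a, double b) {
    assert(a < b);
    std::vector<std::vector<double>> result(n_seeds, std::vector<double>(n));
    const double mx = static_cast<double>(std::mt19937::max());
    for (std::size_t j = 0; j < n_seeds; ++j) {
        // Stand-in engine; the vignette's subject is sitmo::prng_engine.
        std::mt19937 eng(first_seed + static_cast<unsigned int>(j));
        for (double& x : result[j]) {
            x = a + (static_cast<double>(eng()) / mx) * (b - a);
        }
    }
    return result;
}
```

Each inner vector then plays the role of one column in the comparison between seeds.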

We can visualize the data by using a correlation graph. In this case, streams generated under different seeds (off the diagonal) should show essentially no correlation ($r \approx 0$), whereas each stream compared with itself (on the diagonal) has a correlation of exactly 1 ($r = 1$).
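The quantity plotted in such a graph is the sample Pearson correlation between each pair of streams. A minimal sketch of that computation, with the hypothetical name `pearson`, could look like this:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample Pearson correlation r between two equally long, non-constant series.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());

    // Means of both series.
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    // Centered sums of squares and cross-products.
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Evaluating `pearson` for every pair of streams fills the correlation matrix: the diagonal entries compare a stream with itself, and the off-diagonal entries compare streams from different seeds.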

Observing the correlation graph, we note that the predicted pattern, correlation only on the diagonal, is present. Thus, the streams generated under these seeds behave as desired.