The Motivation

Over the past week, I’ve been converting R scripts into code that hooks into R’s C++ API using RcppArmadillo. The common reaction when I mentioned the project was, “Huh? Why would you bother converting from the R language into R’s C++?” Then I showed them the benchmarks. Needless to say, I now have several folks who want to learn how to write code using RcppArmadillo. This post is meant as the introduction to a series of posts on writing Rcpp/RcppArmadillo code.

The Setup

Before we go further, let’s talk about the work environment basics for RcppArmadillo. First, make sure you install the following packages: RcppArmadillo, Rcpp, inline, and rbenchmark. Note that the package names must be passed together as a single character vector.

install.packages(c("Rcpp", "RcppArmadillo", "inline", "rbenchmark"))

Development Environment

The typical development flow for a code project is to use an RStudio project. To create an Rcpp project from within RStudio, go through the normal project creation steps:

  1. File => New Project
  2. New Directory => R Package
  3. Click the Type: dropdown menu to engage options, Select “Package w/ Rcpp”
    • By default it says, “Package”
  4. Fill in Package name
  5. Select appropriate directory for package
  6. Uncheck create a git repository for this project
    • Git is a version control system for code and is outside the scope of this tutorial.
  7. Press create project

Steps 3 and 6 differ from the normal creation process.

If you create a new C++ file from the drop-down menu, note that the upper right-hand corner of the code editor no longer offers “Run” or “Re-run the previous code region.” The only button carried over from editing R scripts is “Source,” which compiles the C++ file using the R console command sourceCpp().
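
As a minimal sketch of what that compile step looks like, the same sourceCpp() call can be made from the R console by passing the C++ source as a string. The function name crossprod_arma is made up for illustration, and the example assumes RcppArmadillo and a working compiler toolchain are installed:

```r
library("Rcpp")

# Compile a small RcppArmadillo function from a string of C++ code.
# crossprod_arma is a hypothetical name chosen for this sketch.
sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
arma::mat crossprod_arma(const arma::mat& X) {
  // X transposed times X, computed by Armadillo
  return X.t() * X;
}
')

X <- matrix(c(1, 2, 3, 4), nrow = 2)
crossprod_arma(X)   # compare with base R: crossprod(X)
```

The depends attribute tells sourceCpp() to add the RcppArmadillo headers to the build, and the export attribute makes the function callable from R once compilation finishes.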

In order to build your package using the Build tab, modifications need to be made to both the DESCRIPTION file and the NAMESPACE file.

The DESCRIPTION file should look like so:

Package: your package name here
Type: Package
Title: package title
Version: 1.0
Date: YYYY-MM-DD
Author: Your Name Here
Maintainer: Person to send complaints to <complain_to_me@gmail.com>
Description: Short description
License: MIT + file LICENSE
Imports: Rcpp (>= 0.12.9)
LinkingTo: Rcpp, RcppArmadillo

Note that the only real difference is the inclusion of RcppArmadillo in the LinkingTo field!

Within the NAMESPACE file, make the following modifications:

useDynLib(packagename)         # Change me!
importFrom(Rcpp, evalCpp)
exportPattern("^[[:alpha:]]+") # Exports all functions
                               # Remove if proficient with roxygen2's @export tag.

The key is to substitute your package name in useDynLib(packagename).

With these modifications, RStudio will be able to handle RcppArmadillo based packages.

A short caveat to the above: avoid spaces and special characters anywhere in the file path to the source file. For example, you should not try to compile a source file at either of the following paths:

C:/users/name/what up/did you/know spaces/are very/harmful to/rcpp files.cpp

or

C:/users/name/!@#$%^&*()-=+/rcppfile.cpp

Making a simple script

Please note, there are several alternatives to this workflow. For example, if you only want to outsource one function that is loop-intensive, then using the inline package or cppFunction() is preferred.

Here is an example creating an inline function declaration with cppFunction():

#In R
library("Rcpp")

cppFunction('
  //declare return type, specify function name, and function parameters
  int hello_world_rcpp(int n) {

    //C++ for loop
    for(int i = 0; i < n; i++) {

      //prints to R console similar to print() or cat()
      Rcpp::Rcout << "Hello World!" << std::endl;

    }

    //send back the number of times hello world was said.
    return n;
  }')

The R equivalent would be:

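hello_world_r = function(n) {
      # seq_len(n) yields an empty loop when n is 0, unlike 1:n
      for(i in seq_len(n)){
        # "\n" ends the line, matching std::endl in the C++ version
        cat("Hello World!\n");
      }
      return(n);
}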

Calling the function results in:

#In R
hello_world_rcpp(n=2)
Hello World!
Hello World!
[1] 2
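
The inline package mentioned earlier offers a similar route. The following is a rough sketch rather than a recommended pattern: sum_inline is a hypothetical name, and the example assumes the inline and Rcpp packages are installed. With cxxfunction(), you supply only the function body, and the argument arrives as a generic R object that Rcpp converts:

```r
library("inline")

# sum_inline is a hypothetical name used only for this sketch.
sum_inline <- cxxfunction(
  signature(x = "numeric"),
  body = '
    Rcpp::NumericVector v(x);          // treat the R object as a numeric vector
    double total = 0;
    for (int i = 0; i < v.size(); i++) {
      total += v[i];
    }
    return Rcpp::wrap(total);          // convert the double back to an R value
  ',
  plugin = "Rcpp"
)

sum_inline(c(1.5, 2.5, 3))
```

Compared with cppFunction(), this form needs the explicit conversions on the way in and out, which is one reason cppFunction() tends to be the shorter option for a single function.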

Why this wasn’t futile…

In the beginning, I mentioned that the primary reason for converting from R scripts to R’s C++ API was speed. To illustrate, note the following benchmarks created using library("rbenchmark").

library("rbenchmark")
benchmark(rcpp_v = hello_world_rcpp(2),r_v = hello_world_r(2))
##     test replications elapsed relative user.self sys.self user.child   sys.child
## 2    r_v          100    0.01       NA      0.02        0         NA          NA
## 1 rcpp_v          100    0.00       NA      0.00        0         NA          NA

So, we should prefer the Rcpp implementation, although elapsed times this small sit at the limit of the timer’s resolution, so treat the comparison as indicative only. Note, this may not always hold when you are echoing statements out to the console: there may be added lag using Rcpp output statements vs. R output statements. However, the looping procedures within Rcpp should be faster than looping procedures in R. Also, the console output from this benchmark was suppressed. If you run this benchmark on your system, expect every replication of both functions to print its “Hello World!” lines (two per call, 100 replications each) in addition to the output table displayed above.
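
To compare the looping itself without console output getting in the way, a sketch along the following lines may be more telling. The functions sum_sq_rcpp and sum_sq_r are hypothetical and not part of the original benchmark, and the timings you get will depend on your machine:

```r
library("Rcpp")
library("rbenchmark")

# Hypothetical loop-only example: sum of squares from 1 to n in C++.
cppFunction('
  double sum_sq_rcpp(int n) {
    double total = 0;
    for (int i = 1; i <= n; i++) {
      // cast to double before multiplying to avoid integer overflow
      total += static_cast<double>(i) * i;
    }
    return total;
  }')

# The same loop written in plain R.
sum_sq_r <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2   # ^ returns a double, so no integer overflow
  }
  total
}

benchmark(rcpp_v = sum_sq_rcpp(1e5), r_v = sum_sq_r(1e5), replications = 100)
```

Because neither function prints anything, the elapsed column here reflects only the cost of the loop in each language.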