To Rcpp Attributes and Beyond From inline

In the olden days of Rcpp, circa 2012 or before Rcpp 0.10.0, there was a bees knees approach to inlining C++ code with R using a package called inline. Fast forward to when Rcpp 0.10.0 was released on November 12, 2012 and the world of Rcpp changed forever. Now, you may be thinking that I’m being a little overdramatic here. However, that is not the case at all. The pinnacle contribution of Rcpp Attributes made by J.J. Allaire changed everything. Everyday statisticians could harness the power of C++ from within R with very little knowledge of C++ as illustrated by his Rcpp Bay Area talk. While the end result between the two methods is the same, the journey to the end is remarkably different. The differences will be shown are considerable in magnitude. Therefore, we are brought to the pinnacle question of this post:

If a great algorithm exists and no one uses it, does it really exist?

From experience, the answer is resounding no and, thus, the origins of this post to hopefully discourage the use of inline in favor of more modern techniques offered by Rcpp for inlining C++ code.

End of the Beginning

For many, the formal introduction to Rcpp happens through Dirk Eddelbuettel’s Seamless R and C++ Integration with Rcpp (free for University students with a membership to Springer, otherwise $54.99), which unfortunately makes heavy use of the inline package. Indeed, the introductory fibonacci example to show how to embed C++ code within R makes direct use of inline. The use of inline would not be very problematic but it increases the barrier of entry for those just beginning their journey marrying R code with C++. In particular, to inline code with the inline package one must:

  1. translate the computationally intense code to C++
  2. provide faux function body declaration
  3. specify the R signature types for the function variables
  4. save compiled function given by cxxfunction() or cfunction() to a variable

As one can imagine, the approach yields several areas that draw away attention from the main task: improving the speed of computational intensive code. In essence, to accomplish the inlining task, one is forced to have knowledge of a smorgasbord of programming techniques.

Now, the contribution of Rcpp Attributes made by J.J. Allaire alleivated all of the above points in one swoop. The process of inlining code became remarkably straightforward to the point that you can now press a button in the RStudio IDE and voila the code is inlined! Simply put, the focus is now on the act of translating or writing computationally intensive code and not on managing it. In essence, Rcpp Attributes kept the syntactical features associated with functions and automated the previously manual process by introducing cppFunction() and sourceCpp(). The prior maps more closely to the intent of inline while the later provides a more familiar route for seasoned programmers.

Case Study: Row Sums

To investigate the properties of inline code, we’ll opt to recreate the rowSums() function within R that takes a matrix of \(m \times n\) dimensions and reduces it to \(m\) by summing over the \(n\) elements within a row. Mathematically, the operation can be represented by the following series:

$$\sum_{j = 1}^n x_{ij}$$

# Create a test matrix
(test_mat <- matrix(1:4, nrow = 2))
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
# Compute the result using base R implementation
(r_base <- rowSums(test_mat))
## [1] 4 6

Implementing within R

Before talking about C++ code, we begin by displaying the implementation for row summation within R. Please note, the implementation is not optimized to take advantage of vectorization simply to ease the translation effort to C++. So, the resulting R implementation is given as:

# Non-vectorized (e.g. horribly inefficient)
rowSums_r <- function(a) {
  nrows <- nrow(a)
  row_sums <- rep(0, nrows)
  for(i in seq_len(nrows)) { 
    for(j in seq_len(ncol(a))) {
      row_sums[i] <- row_sums[i] + a[i, j]
    }
  }
  row_sums 
}

Running the necessary test checks, yields:

# Calculate row sum with function
(r_func <- rowSums_r(test_mat))
## [1] 4 6
# Check against base R implementation
all.equal(r_func, r_base)
## [1] TRUE

Old school inline use via cxxfunction()

Before we get to the use of Rcpp attributes, we need to take a trip down memory lane or more pressingly using apriori knowledge of the inline package process to compile code. We’ll follow the points outlined in the previous section that compared both approaches. However, this does not mean that there is any level of endorsement toward writing code with this approach. In fact, the take away message from this section should resoundingly be:

DO NOT USE inline FOR C++ CODE!

With this being said, we begin. We start the process off by translating the code from R to C++ using Rcpp syntax:

// Data matrix
Rcpp::NumericMatrix x;

// Initialize storage
Rcpp::NumericVector row_sums(x.nrow());

// Sum over results
for (int i = 0; i < x.nrow(); ++i) {
  for(int j = 0; j < x.ncol(); ++j) {
     row_sums[i] += x(i,j);
 }
}

Moving along, the above code must be morphed so that x can receive the input value from R and be able to send the computed result back into R. In essence, a faux functional body declaration must be crafted. The results of which are a bit weird to say the least.

# Create a faux functional C++ body within R
src <- '
// Convert SEXP to NumericMatrix
Rcpp::NumericMatrix x(x_in); // x_in comes from R

// Proceed with algorithm like normal
Rcpp::NumericVector row_sums(x.nrow());

for (int i = 0; i < x.nrow(); ++i) {
  for(int j = 0; j < x.ncol(); ++j) {
     row_sums[i] += x(i,j);
 }
}

return row_sums;     // returns the value to R
'

Notice that the last code snippet was done entirely as a string in R.

Once the function body is made, we must create the necessary signature input alignment for what type should be passed into the function. In this particular case, the function is expecting to receive a numeric matrix and, thus, we must specify the signature of the object as numeric.

# Include the library
library("inline")

# Set the inputs of function
input_sig = signature(x_in = "numeric")

Having all of the above components completed, we can now proceed to actually compiling the code. The compliation step is where the name of the function is derived. So, in terms of the case study, the name of row_sum_inline is what will be used to call the function later on.

# Compile function and assign name
row_sum_inline <- cxxfunction(sig = input_sig,
                              body = src, plugin="Rcpp")
## ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.

Once the compilation is done, usually takes about 4-5 seconds, we now have a C++ function within R.

# Call C++ function
(r_inline = row_sum_inline(test_mat))
## [1] 4 6
# Test against base-r
all.equal(r_inline, r_base)
## [1] TRUE

Modern Rcpp Inline via Rcpp Attributes

Unlike inline, Rcpp equivalents are far more modern and straight forward to use. Most notably, once a function has been written, it is able to be immediately put inline using either cppFunction() or sourceCpp(). The prior option more closely matches what is available within inline with respect to defining a single operable function while the later option enables users more familiar with C++ to breakaway from embedding code as text within an R file but instead treat it as a standalone C++ file. Regardless of the choice, the end result is cleaner code that is focused principally on the computation rather than on the setup and management of the C++ function.

Unlike the prior option, if you blink, then you may miss the magic of attributes.

To begin, write a proper C++ functional form of rowSums().

NumericVector row_sum_rcpp(NumericMatrix x){
  NumericVector row_sums(x.nrow());
  
  for (int i = 0; i < x.nrow(); ++i) {
    for(int j = 0; j < x.ncol(); ++j) {
      row_sums[i] += x(i,j);
    }
  }
  
  return row_sums; 
}

Using cppFunction(), enclose the above function in strings and run it!

# In R
library("Rcpp")
cppFunction("
NumericVector row_sum_rcpp(NumericMatrix x){
  NumericVector row_sums(x.nrow());
  
  for (int i = 0; i < x.nrow(); ++i) {
    for(int j = 0; j < x.ncol(); ++j) {
      row_sums[i] += x(i,j);
    }
  }
  
  return row_sums; 
}
")

That’s it. Ta-da! The function name given in C++ is exactly the same name one should call within R.

# Call C++ function
(r_rcpp = row_sum_rcpp(test_mat))
## [1] 4 6
# Test against base-r
all.equal(r_rcpp, r_base)
## [1] TRUE

Separation of Concerns

The addition of attributes also allowed for a true separation of concerns model to take place within marrying R code to C++. That is, there could be a distinct area for exploration (R) and for high performance computing (C++) as files would not need to be marred with a mixture of both. By opting for this approach, the gain also is in terms of appropriate syntax highlighting as the code is no longer embedded within a string.

As hinted at earlier, to achieve such a feat one would use sourceCpp(). There are only four major differences between the aforementioned approach with cppFunction().

  1. You must specify the Rcpp.h header file.
  2. Each function that should be imported into R must have the Rcpp attribute of // [[Rcpp::export]] placed immediately preceeding the function.
  3. Append all Rcpp objects with Rcpp:: as the namespace for Rcpp is not automatically loaded.
    • One can get around this with using namespace Rcpp;, however, this is not ideal as it introduces ambiguity into function calls.
  4. Place code in an external file.

Returning to the case study, create a file called row_sum_ex.cpp and embed the row sum formulation previously concocted:

#include <Rcpp.h> // include R header library

// Rcpp attribute tag require to create interface to R.
// [[Rcpp::export]]
Rcpp::NumericVector row_sum_rcpp_ext(Rcpp::NumericMatrix x){
  
  Rcpp::NumericVector row_sums(x.nrow());
  
  for (int i = 0; i < x.nrow(); ++i) {
    for(int j = 0; j < x.ncol(); ++j) {
      row_sums[i] += x(i,j);
    }
  }
  
  return row_sums; 
}

Next, lets compile the row_sum_ex.cpp by using sourceCpp() function.

sourceCpp("path/to/row_sum_ex.cpp")

If we skip this step, then when go to call row_sum_rcpp_ext in R the function will not be available as the code has yet to be: 1. compiled and 2. made available to the R session.

To make sure everything still works as expect, let’s do one last verification check:

# Call C++ function
(r_rcpp_ex = row_sum_rcpp_ext(test_mat))
## [1] 4 6
# Test against base-r
all.equal(r_rcpp_ex, r_base)
## [1] TRUE

Fin

In essence, many things have changed since the introduction of Rcpp Attributes. One crucial turning point is related to the focus afforded by Rcpp Attributes being placed more on the actual calculation than keeping track of inputs or casting objects. While inline is not ideal, the above guide gently provided some transitionary remarks to use more appropriate inlining techniques provided by Rcpp.

comments powered by Disqus