In the olden days of Rcpp, circa 2012 or before Rcpp 0.10.0, there was a bees knees
approach to inlining C++ code with R using a package called
inline
. Fast forward to when Rcpp
0.10.0 was released on November 12, 2012 and the world of Rcpp changed forever.
Now, you may be thinking that I’m being a little overdramatic here. However,
that is not the case at all. The pinnacle contribution
of Rcpp Attributes
made by J.J. Allaire
changed everything. Everyday statisticians could harness the power of C++ from within
R with very little knowledge of C++ as illustrated by his
Rcpp Bay Area talk.
While the end result between the two methods is the same, the journey to the
end is remarkably different. The differences will be shown are considerable
in magnitude. Therefore, we are brought to the pinnacle question of
this post:
If a great algorithm exists and no one uses it, does it really exist?
From experience, the answer is resounding no and, thus, the origins of this
post to hopefully discourage the use of inline
in favor of more modern
techniques offered by Rcpp for inlining C++ code.
End of the Beginning
For many, the formal introduction to Rcpp happens through
Dirk Eddelbuettel’s
Seamless R and C++ Integration with Rcpp
(free for University students with a membership to Springer, otherwise $54.99),
which unfortunately makes heavy use of the
inline
package. Indeed, the
introductory fibonacci example to show how to embed C++ code within R makes
direct use of inline
. The use of inline
would not be very problematic but
it increases the barrier of entry for those just beginning their journey
marrying R code with C++. In particular, to inline code with the inline
package one must:
- translate the computationally intense code to C++
- provide faux function body declaration
- specify the R signature types for the function variables
- save compiled function given by
cxxfunction()
orcfunction()
to a variable
As one can imagine, the approach yields several areas that draw away attention from the main task: improving the speed of computational intensive code. In essence, to accomplish the inlining task, one is forced to have knowledge of a smorgasbord of programming techniques.
Now, the contribution of Rcpp Attributes
made by J.J. Allaire alleivated all of the above points in one swoop. The
process of inlining code became remarkably straightforward to the point that you
can now press a button in the RStudio IDE
and voila the code is inlined! Simply put, the focus is now on the act of
translating or writing computationally intensive code and not on managing it.
In essence, Rcpp Attributes
kept the syntactical features associated with functions and automated the previously
manual process by introducing cppFunction()
and sourceCpp()
. The prior maps
more closely to the intent of inline
while the later provides a more familiar route for seasoned programmers.
Case Study: Row Sums
To investigate the properties of inline code, we’ll opt to recreate the rowSums()
function within R that takes a matrix of \(m \times n\)
dimensions and reduces
it to \(m\)
by summing over the \(n\)
elements within a row. Mathematically, the
operation can be represented by the following series:
$$\sum_{j = 1}^n x_{ij}$$
# Create a test matrix
(test_mat <- matrix(1:4, nrow = 2))
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
# Compute the result using base R implementation
(r_base <- rowSums(test_mat))
## [1] 4 6
Implementing within R
Before talking about C++ code, we begin by displaying the implementation for row summation within R. Please note, the implementation is not optimized to take advantage of vectorization simply to ease the translation effort to C++. So, the resulting R implementation is given as:
# Non-vectorized (e.g. horribly inefficient)
rowSums_r <- function(a) {
nrows <- nrow(a)
row_sums <- rep(0, nrows)
for(i in seq_len(nrows)) {
for(j in seq_len(ncol(a))) {
row_sums[i] <- row_sums[i] + a[i, j]
}
}
row_sums
}
Running the necessary test checks, yields:
# Calculate row sum with function
(r_func <- rowSums_r(test_mat))
## [1] 4 6
# Check against base R implementation
all.equal(r_func, r_base)
## [1] TRUE
Old school inline
use via cxxfunction()
Before we get to the use of Rcpp attributes, we need to take a trip down memory
lane or more pressingly using apriori knowledge of the
inline
package process
to compile code. We’ll follow the points outlined in the previous section that
compared both approaches. However, this does not mean that there is any level
of endorsement toward writing code with this approach. In fact, the take away
message from this section should resoundingly be:
DO NOT USE inline
FOR C++ CODE!
With this being said, we begin. We start the process off by translating the code from R to C++ using Rcpp syntax:
// Data matrix
Rcpp::NumericMatrix x;
// Initialize storage
Rcpp::NumericVector row_sums(x.nrow());
// Sum over results
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
Moving along, the above code must be morphed so that x
can receive the input value
from R and be able to send the computed result back into R. In essence, a
faux functional body declaration must be crafted. The results of which are a
bit weird to say the least.
# Create a faux functional C++ body within R
src <- '
// Convert SEXP to NumericMatrix
Rcpp::NumericMatrix x(x_in); // x_in comes from R
// Proceed with algorithm like normal
Rcpp::NumericVector row_sums(x.nrow());
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
return row_sums; // returns the value to R
'
Notice that the last code snippet was done entirely as a string in R.
Once the function body is made, we must create the necessary signature input
alignment for what type should be passed into the function. In this particular
case, the function is expecting to receive a numeric
matrix and, thus, we must
specify the signature of the object as numeric
.
# Include the library
library("inline")
# Set the inputs of function
input_sig = signature(x_in = "numeric")
Having all of the above components completed, we can now proceed to
actually compiling the code. The compliation step is where the name of the
function is derived. So, in terms of the case study, the name of
row_sum_inline
is what will be used to call the function later on.
# Compile function and assign name
row_sum_inline <- cxxfunction(sig = input_sig,
body = src, plugin="Rcpp")
## ld: warning: text-based stub file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation.tbd and library file /System/Library/Frameworks//CoreFoundation.framework/CoreFoundation are out of sync. Falling back to library file for linking.
Once the compilation is done, usually takes about 4-5 seconds, we now have a C++ function within R.
# Call C++ function
(r_inline = row_sum_inline(test_mat))
## [1] 4 6
# Test against base-r
all.equal(r_inline, r_base)
## [1] TRUE
Modern Rcpp Inline via Rcpp Attributes
Unlike inline
, Rcpp equivalents
are far more modern and straight forward to use. Most notably, once a function
has been written, it is able to be immediately put inline using either
cppFunction()
or sourceCpp()
. The prior option more closely matches what is
available within inline
with
respect to defining a single operable function while the later option enables
users more familiar with C++ to breakaway from embedding code as text within an
R file but instead treat it as a standalone C++ file. Regardless of the choice,
the end result is cleaner code that is focused principally on the computation
rather than on the setup and management of the C++ function.
Unlike the prior option, if you blink, then you may miss the magic of attributes.
To begin, write a proper C++ functional form of rowSums()
.
NumericVector row_sum_rcpp(NumericMatrix x){
NumericVector row_sums(x.nrow());
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
return row_sums;
}
Using cppFunction()
, enclose the above function in strings and run it!
# In R
library("Rcpp")
cppFunction("
NumericVector row_sum_rcpp(NumericMatrix x){
NumericVector row_sums(x.nrow());
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
return row_sums;
}
")
That’s it. Ta-da! The function name given in C++ is exactly the same name one should call within R.
# Call C++ function
(r_rcpp = row_sum_rcpp(test_mat))
## [1] 4 6
# Test against base-r
all.equal(r_rcpp, r_base)
## [1] TRUE
Separation of Concerns
The addition of attributes also allowed for a true separation of concerns model to take place within marrying R code to C++. That is, there could be a distinct area for exploration (R) and for high performance computing (C++) as files would not need to be marred with a mixture of both. By opting for this approach, the gain also is in terms of appropriate syntax highlighting as the code is no longer embedded within a string.
As hinted at earlier, to achieve such a feat one would use sourceCpp()
. There
are only four major differences between the aforementioned approach with cppFunction()
.
- You must specify the
Rcpp.h
header file. - Each function that should be imported into R must have the Rcpp attribute of
// [[Rcpp::export]]
placed immediately preceeding the function. - Append all Rcpp objects with
Rcpp::
as the namespace for Rcpp is not automatically loaded.- One can get around this with
using namespace Rcpp;
, however, this is not ideal as it introduces ambiguity into function calls.
- One can get around this with
- Place code in an external file.
Returning to the case study, create a file called row_sum_ex.cpp and embed the row sum formulation previously concocted:
#include <Rcpp.h> // include R header library
// Rcpp attribute tag require to create interface to R.
// [[Rcpp::export]]
Rcpp::NumericVector row_sum_rcpp_ext(Rcpp::NumericMatrix x){
Rcpp::NumericVector row_sums(x.nrow());
for (int i = 0; i < x.nrow(); ++i) {
for(int j = 0; j < x.ncol(); ++j) {
row_sums[i] += x(i,j);
}
}
return row_sums;
}
Next, lets compile the row_sum_ex.cpp by using sourceCpp()
function.
sourceCpp("path/to/row_sum_ex.cpp")
If we skip this step, then when go to call row_sum_rcpp_ext
in R the function
will not be available as the code has yet to be: 1. compiled and 2. made available
to the R session.
To make sure everything still works as expect, let’s do one last verification check:
# Call C++ function
(r_rcpp_ex = row_sum_rcpp_ext(test_mat))
## [1] 4 6
# Test against base-r
all.equal(r_rcpp_ex, r_base)
## [1] TRUE
Fin
In essence, many things have changed since the introduction of Rcpp Attributes.
One crucial turning point is related to the focus afforded by
Rcpp Attributes
being placed more on the actual calculation than keeping track of inputs or
casting objects. While inline
is
not ideal, the above guide gently provided some transitionary remarks to use
more appropriate inlining techniques provided by Rcpp.