A modulefile Approach to Compiling R on a Cluster

This is a follow-up post to Working with R on a Cluster. Previously, we discussed ways to work with R in a purely distributed command line interface (CLI) environment. In this post, we’ll detail how to set up your own private installation of R on a cluster that supports modulefiles.

Motivation

Lately, I’ve needed more recent versions of R than the ones made available by the campus cluster staff. Staying current lets me keep abreast of new developments in the R community and take advantage of feature-rich R packages. Therefore, I’ve had to resort to building and using my own installation of R on a cluster that runs the CentOS 6.8 operating system (a Red Hat Enterprise Linux derivative). Unfortunately, the guide below is a bit long because the traditional compilation procedure for R is not well suited to high performance computing (HPC) environments, which rely on modulefiles and do not grant root access (e.g. no writing to /usr/local/...).

Finding a list of available modules

Before beginning, make sure that you actually need to set up your own version of R by checking which versions are already available on the cluster. To do this, we will invoke module avail, which lists all available modulefiles on the system.

module avail
------------------------------------------ /usr/share/Modules/modulefiles -------------------------------------------
dot         module-git  module-info modules     null        use.own

---------------------------------------------- /usr/local/modulefiles -----------------------------------------------
BerkeleyDB/5.0(default)             java/1.6.0(default)                 openmpi/1.6.4-intel-13.1
Macaulay2/1.4-r12617                java/1.7                            openmpi/1.6.5-gcc-4.7.1
R/2.13.2                            java/1.7.75                         openmpi/1.6.5-intel-14.0
R/2.15.1                            java/1.8                            openmpi/1.8.4-gcc-4.9.2
R/2.15.3                            lapack                              openmpi/1.8.4-intel-15.0
R/3.0.1                             libpwquality/1.2.4                  openmpi/2.0.1-gcc-6.2.0
R/3.1.0                             libuuid/1.0.2(default)              openssl/1.0.1
R/3.1.2                             libxml2/2.9.1(default)              p7zip/9.20.1
R/3.2.2(default)                    libxslt/1.1.28(default)             p7zip/9.38.1
R/3.2.5                             mathematica/10                      papi/5.4.1
authconfig/6.2.9                    mathematica/11                      petsc/3.3-p6(default)
blas                                mathematica/8.0                     php/5.5.11
boost/1.51.0                        matlab/7.11                         python/2(default)
boost/1.58.0                        matlab/7.14                         python/2.7.3
bzip2/1.0.6                         matlab/8.3                          python/2.7.8
cfd/Ansys-14.5                      matlab/8.4                          python/3
cfd/Ansys-15.0.7                    matlab/8.5                          python/3.4.0
cfd/Ansys-16.0                      matlab/8.6                          pythonmod/2.6(default)
cifs-utils/6.4                      matlab/9.0                          pythonmod/2.7.2
cmake/2.8(default)                  mc/4.8.13                           samba/4.1.11
cmake/3.0.2                         mercurial/1.8(default)              scilab/5.4.0(default)
cmake/3.6.2                         moab/7.2.4                          sssd/1.11.2
cracklib/2.9.0                      moab/7.2.5                          sssd/1.11.6(default)
cuda/5.5                            moab/7.2.6                          sssd/1.12.0
cuda/6.0                            moab/7.2.7                          sssd/1.12.1
cuda/6.5                            moab/7.2.8                          svn/1.6(default)
cuda/7.0                            moab/7.2.9                          svn/1.8.5
ding-libs/0.4.0                     moab/8.0.0                          svn/1.9.0
dos2unix/7.3.2                      moab/8.0.1                          svn/1.9.2
emacs/23.2(default)                 moab/8.1.0                          szip/2.1(default)
env/Physics                         moab/8.1.1                          texlive/2010(default)
env/cse                             moab/9.0.1                          texlive/2015
env/inv-catchenlab                  moab/9.0.2(default)                 torque/4.2.3.h4
env/inv-cse                         mpi/mpich/3.1.3-gcc-4.7.1           torque/4.2.5
env/ncsa                            mpi/openmpi/1.4-intel               torque/4.2.5.h2
env/taub                            mpiexec/0.84                        torque/4.2.6
fftw-3.3.3/mvapich2-2.0b_intel-14.0 mvapich/1.2-gcc+ifort               torque/4.2.7
fftw-3.3.3/openmpi-1.6.5_intel-14.0 mvapich2/1.6-gcc(default)           torque/4.2.8
fuse/2.9.3                          mvapich2/1.6-gcc+ifort              torque/4.2.9
gcc/4.7.1(default)                  mvapich2/1.6-gccdebug               torque/5.0.0
gcc/4.9.2                           mvapich2/1.6-intel                  torque/5.0.1
gcc/6.2.0                           mvapich2/1.9b-intel-13.1            torque/5.0.1p
gdb/7.11.1(default)                 mvapich2/2.0b-gcc-4.7.1             torque/5.1.0p
gettext/0.19.4                      mvapich2/2.0b-intel-14.0            torque/5.1.1
git/1.7(default)                    mvapich2/2.1rc1-gcc-4.9.2           torque/5.1.2.h5
grace/5.1(default)                  mvapich2/2.1rc1-intel-15.0          torque/6.0.1
gsl/1.16                            mvapich2/2.2-gcc-6.2.0              torque/6.0.1h3
h5utils/1.12                        mvapich2/2.2-intel-17.0             torque/6.0.2(default)
hwloc/1.7.2                         mvapich2/mpiexec                    unzip/unzip60
intel/11.1(default)                 mysql/5.6.23                        utils/makedepend/1.0.5
intel/13.1                          octave/3.4(default)                 valgrind/3.10.1
intel/14.0                          openblas/0.2.8-gcc(default)         valgrind/3.9.0
intel/15.0                          openldap/2.4.40                     vim/7.3(default)
intel/15.0.3                        openmpi/1.4-gcc                     visit/2.2.1(default)
intel/16.0.0                        openmpi/1.4-gcc+ifort               vnc/4.1.1
intel/17.0                          openmpi/1.4-intel                   wine/1.6.2
intltool/0.50.2                     openmpi/1.6.4-gcc-4.7.1

From the above, we note that the R versions available are:

  • 2.13.2, 2.15.1, 2.15.3, 3.0.1, 3.1.0, 3.1.2, 3.2.2 (default), 3.2.5

Thus, at the time of this writing, we cannot use any version in the 3.3.x line!
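
Scanning the full listing by eye is error-prone. Below is a minimal sketch of filtering it, assuming that module avail prints whitespace-separated name/version entries as shown above. The helper name list_r_versions is hypothetical and not part of the modules system.

```shell
# Hypothetical helper: read `module avail` text on stdin and print
# the R versions it mentions, one per line.
list_r_versions() {
    tr -s '[:space:]' '\n' |      # one entry per line
        grep -E '^R/[0-9]' |      # keep only R/<version> entries
        sed -e 's|^R/||' -e 's/(default)$//'
}

# Demo on two lines copied from the listing above.
printf '%s\n' \
    'R/3.2.2(default)   libxslt/1.1.28(default)   p7zip/9.38.1' \
    'R/3.2.5            mathematica/10            papi/5.4.1' |
    list_r_versions
```

On the cluster itself, the equivalent call would be module avail 2>&1 | list_r_versions. The 2>&1 is there because many versions of the modules tool write the listing to standard error rather than standard output.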

Loading a modulefile

Hypothetically speaking, let’s say that you did have a version of R that you wanted to use. In that case, you would load it in your environment using:

module load R 
module load R/3.2.2 # equivalent since default

This loads the version of R compiled by the campus cluster staff into your environment. From there, you can access the R CLI by typing the following into the shell:

R
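
After loading a module, it can be worth confirming which R binary the shell will actually run. A small sketch using only standard shell built-ins; report_r is a hypothetical helper name.

```shell
# Hypothetical helper: report which R (if any) is first on PATH.
report_r() {
    if command -v R >/dev/null 2>&1; then
        echo "R found at: $(command -v R)"
        R --version | head -n 1     # first line names the version
    else
        echo "R not on PATH; try: module load R"
    fi
}
report_r
```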

Peeking at the modulefile recipe for R

From the module avail output, we can see that all modulefiles are stored in /usr/local/modulefiles. To see what is required to compile R, let’s peek at the contents of the latest R modulefile.

cat /usr/local/modulefiles/R/3.2.5
#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for R, version 3.2.5"
}

set _module_name        [module-info name]

module-whatis "R-3.2.5 built with gcc-4.9.2, MKL, java-1.8 and texlive"

module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015

         set    approot         /usr/local/R/R-3.2.5
prepend-path    PATH            $approot/bin
prepend-path    LD_LIBRARY_PATH $approot/lib64:$approot/lib64/R/lib
prepend-path    MANPATH         $approot/share/man

From the R modulefile, we note that the following modules have been loaded into the environment:

module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015

Thus, when we compile from source, we will need to make sure the above modules are loaded into the environment.
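
Rather than copying the dependencies by hand, they can be pulled out of the modulefile directly. A sketch, assuming each dependency appears on its own line as module load name/version; modulefile_deps is a hypothetical helper name.

```shell
# Hypothetical helper: print the modules a modulefile loads, one per line.
modulefile_deps() {
    awk '$1 == "module" && $2 == "load" { print $3 }' "$1"
}

# Demo on a throwaway copy of the relevant lines from R/3.2.5.
demo=$(mktemp)
printf '%s\n' 'module load gcc/4.9.2' 'module load intel/15.0.3' \
              'module load java/1.8'  'module load texlive/2015' > "$demo"
modulefile_deps "$demo"
rm -f "$demo"
```

On the cluster, this would be called as modulefile_deps /usr/local/modulefiles/R/3.2.5. Running module show R/3.2.5 should give a similar view without needing to know where the file lives.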

Compiling R from Source

Preparing to Compile R from Source

Before we can install R from source, we must prepare the installation environment.

The first task is to unload any active module using module purge.

module purge # Remove all active modules from the environment

Then, we need to load the suggested modules we gleaned from looking at the latest modulefile, R/3.2.5.

module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015

However, the list is not exhaustive, as quite a few libraries are missing on the cluster. This may not be the case on your cluster, so you may wish to skip this section and try to compile R first. If it fails, come back and work through each of the steps.

Having said this, we will now very quickly walk through how to compile the additional dependencies.

To do so, we will define a local library to hold the dependencies:

# Setup a location to store dependencies
local_lib=$HOME/local_lib

# Create the directory
mkdir -p $local_lib

# Export the bash variable (used in modulefiles)
export local_lib

# Append the following to bash profile
echo "export local_lib=$HOME/local_lib" >> ~/.bash_profile
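
One caveat with the echo line above is that running it more than once appends a duplicate entry each time. A sketch of a guard against that; append_once is a hypothetical helper name, and the demo writes to a scratch file rather than the real profile.

```shell
# Hypothetical helper: append a line to a file only if it is not already there.
append_once() {
    # $1 = line to add, $2 = target file
    touch "$2"
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Demo against a scratch file; on the cluster the target would be ~/.bash_profile.
profile=$(mktemp)
append_once 'export local_lib=$HOME/local_lib' "$profile"
append_once 'export local_lib=$HOME/local_lib' "$profile"   # no-op the second time
cat "$profile"
rm -f "$profile"
```

The single quotes keep $HOME unexpanded in the profile, so it is resolved at login rather than at the moment the line is written.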

Custom modulefiles

To use our own modulefiles, we must always first load the use.own module.

module load use.own

This module searches for custom modulefiles that have been installed in the ~/privatemodules directory (e.g. /home/username/privatemodules/module/version).

zlib

Zlib is a compression library.

Note: Zlib-1.2.11 appears to trigger the following error:

checking if zlib version >= 1.2.5... no
checking whether zlib support suffices... configure: error: zlib library and headers are required

Update (February 27th): This issue has been resolved in the configure script that ships with R version 3.3.3. It turns out the zlib version string was truncated to a maximum of five characters during the check, so 1.2.11 was read as 1.2.1, and the check requiring a version of at least 1.2.5 failed.
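
The effect of the truncation can be reproduced in a few lines of shell. This is an illustration of the behavior described above, not R’s actual configure code; the variable names are made up for the example.

```shell
# Illustration only: mimic a version check that looks at just five characters.
zlib_ver=1.2.11
truncated=$(printf '%.5s' "$zlib_ver")   # keep the first five characters
echo "full: $zlib_ver  truncated: $truncated"

# Plain string comparison against the minimum version.
if [ "$(expr "$truncated" \< 1.2.5)" = 1 ]; then
    echo "check fails: $truncated sorts before 1.2.5"
else
    echo "check passes"
fi
```

Setting zlib_ver=1.2.9 makes the same comparison pass, which is consistent with the workaround chosen here.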

Thus, I’ve opted to use zlib 1.2.9 instead.

zlib_ver=1.2.9
install_path=$HOME/local_lib/zlib/$zlib_ver
mkdir -p $install_path
# Quote the URL so the shell does not try to interpret the ? character
wget "https://downloads.sourceforge.net/project/libpng/zlib/$zlib_ver/zlib-$zlib_ver.tar.gz?r=https%3A%2F%2Fsourceforge.net%2Fprojects%2Flibpng%2Ffiles%2Fzlib%2F$zlib_ver%2F" -O "zlib-$zlib_ver.tar.gz"
# Alternative: http://zlib.net/zlib-$zlib_ver.tar.gz (only hosts the latest version)
tar -xvzf ./zlib-$zlib_ver.tar.gz && cd zlib-$zlib_ver
./configure --prefix=$install_path
make && make install
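
Before moving on, it may help to confirm that the install step actually produced the files later builds depend on. A sketch with a hypothetical helper, check_prefix; the two file names are the usual zlib header and shared library, and the error quoted earlier mentions both the library and the headers.

```shell
# Hypothetical helper: confirm that expected files exist under an install prefix.
check_prefix() {
    prefix=$1; shift
    missing=0
    for f in "$@"; do
        if [ -e "$prefix/$f" ]; then
            echo "ok      $prefix/$f"
        else
            echo "MISSING $prefix/$f"
            missing=1
        fi
    done
    return $missing
}

# Check the zlib header and shared library under the install path used above.
check_prefix "$HOME/local_lib/zlib/1.2.9" include/zlib.h lib/libz.so ||
    echo "zlib install looks incomplete"
```

The same helper can be reused for the other libraries below by changing the prefix and the file names.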

Next, we create a modulefile, saved as ~/privatemodules/zlib/1.2.9, that adds zlib to the relevant paths when loaded.

#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for zlib, version 1.2.9"
}

set _module_name        [module-info name]

module-whatis "zlib 1.2.9 built with gcc-4.9.2"

         set    approot         $::env(local_lib)/zlib/1.2.9
prepend-path    CPATH           $approot/include
prepend-path    LD_LIBRARY_PATH $approot/lib
prepend-path    MANPATH         $approot/share/man
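
Putting the modulefile where use.own can find it is a matter of copying it to name/version under ~/privatemodules. A sketch of that step; install_modulefile is a hypothetical helper, and the demo uses a scratch directory so nothing in $HOME is touched.

```shell
# Hypothetical helper: copy a modulefile to <root>/<name>/<version>.
# The root defaults to ~/privatemodules, the directory use.own searches.
install_modulefile() {
    # $1 = source file, $2 = module name, $3 = version, $4 = optional root
    root=${4:-$HOME/privatemodules}
    mkdir -p "$root/$2" && cp "$1" "$root/$2/$3" && echo "$root/$2/$3"
}

# Demo in a scratch directory.
scratch=$(mktemp -d)
printf '%s\n' '#%Module1.0' 'module-whatis "zlib 1.2.9 built with gcc-4.9.2"' > "$scratch/zlib.module"
install_modulefile "$scratch/zlib.module" zlib 1.2.9 "$scratch/privatemodules"
rm -rf "$scratch"
```

On the cluster, omitting the fourth argument installs into ~/privatemodules, after which module load use.own followed by module load zlib/1.2.9 should pick it up.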

bzip2

bzip2 is a high-quality data compressor.

On our cluster, we did have the option to load this module. However, in case you do not, the instructions to install it from source follow.

Unfortunately, the bzip2 build is non-standard: there is no configure script, and the shared library is built from a separate makefile. When I first compiled R against it, the error told me to modify the makefile’s CFLAGS to include -fPIC. However, a simpler solution I found was to move the .so objects into the $install_path/lib folder.

Skipping both options will result in an R compilation error later.
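
For readers who prefer the -fPIC route, the edit can be scripted. This is a sketch under two assumptions: GNU sed is available (for -i), and the Makefile has a line beginning with CFLAGS=, as the bzip2 1.0.6 Makefile does. The helper name add_fpic is hypothetical.

```shell
# Hypothetical helper: prepend -fPIC to the CFLAGS line of a Makefile (GNU sed).
add_fpic() {
    grep -q '^CFLAGS=.*-fPIC' "$1" || sed -i 's/^CFLAGS=/CFLAGS=-fPIC /' "$1"
}

# Demo on a scratch Makefile; the CFLAGS line is assumed to resemble bzip2's.
mk=$(mktemp)
printf '%s\n' 'CC=gcc' 'CFLAGS=-Wall -Winline -O2 -g $(BIGFILES)' > "$mk"
add_fpic "$mk"
add_fpic "$mk"          # second call changes nothing
grep '^CFLAGS=' "$mk"
rm -f "$mk"
```

In the bzip2 source directory, this would be run as add_fpic Makefile before calling make.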

bzip_version=1.0.6
install_path=$HOME/local_lib/bzip2/$bzip_version
mkdir -p $install_path
wget http://www.bzip.org/$bzip_version/bzip2-$bzip_version.tar.gz
tar -xvzf ./bzip2-$bzip_version.tar.gz && cd bzip2-$bzip_version
make -f Makefile-libbz2_so
make && make install PREFIX=$install_path
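mv *.so* $install_path/lib/

The matching modulefile is saved as ~/privatemodules/bzip2-custom/1.0.6. The bzip2-custom name is the one used when loading it later, and it avoids a clash with the system bzip2 module.

#%Module1.0####################################################################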

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for bzip2, version 1.0.6"
}

set _module_name        [module-info name]

module-whatis "bzip2 1.0.6 built with gcc-4.9.2"

         set    approot         $::env(local_lib)/bzip2/1.0.6
prepend-path    PATH            $approot/bin
prepend-path    CPATH           $approot/include
prepend-path    LD_LIBRARY_PATH $approot/lib
prepend-path    MANPATH         $approot/share/man

xzutils

XZ Utils (xzutils here) is yet another compression library; it provides the infamous liblzma library and header.

xzutils_version=5.2.3
install_path=$HOME/local_lib/xzutils/$xzutils_version
mkdir -p $install_path
wget http://tukaani.org/xz/xz-$xzutils_version.tar.gz
tar -xvzf ./xz-$xzutils_version.tar.gz && cd xz-$xzutils_version
./configure --prefix=$install_path
make && make install

The matching modulefile, saved as ~/privatemodules/xzutils/5.2.3:

#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for xzutils, version 5.2.3"
}

set _module_name        [module-info name]

module-whatis "xzutils 5.2.3 built with gcc-4.9.2"

         set    approot            $::env(local_lib)/xzutils/5.2.3
prepend-path    PATH               $approot/bin
prepend-path    CPATH              $approot/include
prepend-path    CPLUS_INCLUDE_PATH $approot/include
prepend-path    LD_LIBRARY_PATH    $approot/lib
prepend-path    LIBRARY_PATH       $approot/lib
prepend-path    MANPATH            $approot/share/man

PCRE

PCRE or Perl Compatible Regular Expressions contains a set of functions that implement regular expression pattern matching in a manner similar to Perl 5. (Surprise, not a compression library!)

pcre_version=8.40
install_path=$HOME/local_lib/pcre/$pcre_version
mkdir -p $install_path
wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-$pcre_version.tar.gz
tar -xvzf pcre-$pcre_version.tar.gz && cd pcre-$pcre_version
./configure --prefix=$install_path --enable-utf8
make && make install

The matching modulefile, saved as ~/privatemodules/pcre/8.40:

#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for PCRE, version 8.40"
}

set _module_name        [module-info name]

module-whatis "PCRE 8.40 built with gcc-4.9.2"

         set    approot            $::env(local_lib)/pcre/8.40
prepend-path    PATH               $approot/bin
prepend-path    CPATH              $approot/include
prepend-path    LD_LIBRARY_PATH    $approot/lib

curl

curl is a command line tool and library for transferring data with URLs.

curl_version=7.52.1
install_path=$HOME/local_lib/curl/$curl_version
mkdir -p $install_path
wget --no-check-certificate https://curl.haxx.se/download/curl-$curl_version.tar.gz
tar xzvf curl-$curl_version.tar.gz && cd curl-$curl_version
./configure --prefix=$install_path
make && make install

The matching modulefile, saved as ~/privatemodules/curl/7.52.1:

#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for curl, version 7.52.1"
}

set _module_name        [module-info name]

module-whatis "curl 7.52.1 built with gcc-4.9.2"

         set    approot            $::env(local_lib)/curl/7.52.1
prepend-path    PATH               $approot/bin
prepend-path    CPLUS_INCLUDE_PATH $approot/include
prepend-path    LD_LIBRARY_PATH    $approot/lib
prepend-path    LIBRARY_PATH       $approot/lib
prepend-path    MANPATH            $approot/share/man

tcltk

Tcl/Tk (tcltk here) is the Tool Command Language and its GUI toolkit, which some R packages require to function, in particular geoR, which is used in some spatial calculations. It may appear odd that I’m installing from source instead of using a system library. However, when I tried to point the build at what was available on the cluster, I was never able to compile, most likely because a development header was missing.

tcltk_version=8.6.6
install_path=$HOME/local_lib/tcltk/$tcltk_version
mkdir -p $install_path
wget http://prdownloads.sourceforge.net/tcl/tcl$tcltk_version-src.tar.gz
tar xzvf tcl$tcltk_version-src.tar.gz && cd tcl$tcltk_version/unix
./configure --prefix=$install_path
make && make install

The matching modulefile, saved as ~/privatemodules/tcltk/8.6.6:

#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for tcltk, version 8.6.6"
}

set _module_name        [module-info name]

module-whatis "tcltk 8.6.6 built with gcc-4.9.2"

         set    approot            $::env(local_lib)/tcltk/8.6.6
prepend-path    PATH               $approot/bin
prepend-path    CPLUS_INCLUDE_PATH $approot/include
prepend-path    LD_LIBRARY_PATH    $approot/lib
prepend-path    LIBRARY_PATH       $approot/lib
prepend-path    MANPATH            $approot/share/man
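
The modulefiles in this section differ only in the library name, the version, and which path variables they set. A sketch of generating that shared skeleton from the shell; make_modulefile is a hypothetical helper, and the set of prepend-path lines is a superset chosen for illustration, so trim it per library.

```shell
# Hypothetical helper: print a modulefile for a library installed under
# $local_lib/<name>/<version>. $1 = name, $2 = version.
make_modulefile() {
    cat <<EOF
#%Module1.0
module-whatis "$1 $2 built with gcc-4.9.2"

         set    approot         \$::env(local_lib)/$1/$2
prepend-path    PATH            \$approot/bin
prepend-path    CPATH           \$approot/include
prepend-path    LD_LIBRARY_PATH \$approot/lib
prepend-path    LIBRARY_PATH    \$approot/lib
prepend-path    MANPATH         \$approot/share/man
EOF
}

# Print the skeleton for PCRE; redirect to ~/privatemodules/pcre/8.40 to install it.
make_modulefile pcre 8.40
```

The backslashes keep $approot and $::env(local_lib) literal in the output, so they are resolved by the modules system when the file is loaded rather than by the shell when it is written.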

Compiling R from Source

From here, it’s a clear shot to installing R from source by following the recipe in the R Installation and Administration manual.

There are a few differences between the traditional install from source and the one necessitated by the cluster environment. Most notably, the installation must be done without root access. As a result, there are a few configuration options that I suggest using:

  • Supply a local directory via --prefix=, e.g. --prefix=$HOME/R
  • Disable the X Window System via --with-x=no, as R will not be rendering any graphics to a UI.

# Unload modules
module purge

# Load system modules
module load gcc/4.9.2
module load intel/15.0.3 
module load java/1.8 
module load texlive/2015 

# Load own modules
module load use.own

module load zlib/1.2.9
module load bzip2-custom/1.0.6 
module load xzutils/5.2.3
module load pcre/8.40
module load curl/7.52.1
module load tcltk/8.6.6

# Required to avoid loading the system version of bzip2
export LDFLAGS="-L$local_lib/bzip2/1.0.6/lib"

# R version
r_version=3.3.2

# Download the R source for the version chosen above
wget https://cran.r-project.org/src/base/R-3/R-$r_version.tar.gz
tar xvf R-$r_version.tar.gz && cd R-$r_version

# Configure R to be installed into ~/R
./configure --prefix=$HOME/R/$r_version --with-x=no --enable-R-shlib 

make && make install

Now, we must create a modulefile for R, e.g. ~/privatemodules/R/3.3.2:

#%Module1.0####################################################################

proc ModulesHelp { } {
        global _module_name

        puts stderr "\tThis module sets up the environment for R, version 3.3.2"
}

set _module_name        [module-info name]

module-whatis "R-3.3.2 built with gcc-4.9.2, MKL, java-1.8, texlive, zlib, bzip2, xzutils, pcre, and curl"

# Load required modules
module load gcc/4.9.2
module load intel/15.0.3
module load java/1.8
module load texlive/2015

# Load custom modules (make sure use.own is loaded first, e.g. in your profile)
module load zlib/1.2.9
module load bzip2-custom/1.0.6 
module load xzutils/5.2.3
module load pcre/8.40
module load curl/7.52.1

         set    approot            $::env(HOME)/R/3.3.2
prepend-path    PATH               $approot/bin
prepend-path    LD_LIBRARY_PATH    $approot/lib64:$approot/lib64/R/lib
prepend-path    MANPATH            $approot/share/man
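
With the modulefile in place, loading the private build should look much like loading a system one. A sketch, assuming the file above was saved as ~/privatemodules/R/3.3.2; try_private_r is a hypothetical wrapper that degrades gracefully where the module command is not defined.

```shell
# Sketch: load the private R build if the module command exists in this shell.
try_private_r() {
    if type module >/dev/null 2>&1; then
        module load use.own &&
            module load R/3.3.2 &&
            R --version | head -n 1
    else
        echo "module command not available in this shell"
    fi
}
try_private_r
```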