Intro

Recently, I’ve had to think a lot about simplifying the R experience. Specifically, how do you ease engineers who are fluent in MATLAB into working with R? While brainstorming, I stumbled upon quite a few important realizations. One of them is that there is a clear lack of indication as to whether a loaded package is up-to-date. That is, when a package loads, nothing tells the user whether it is current or a version behind (out of date). To find out whether a new version exists, a user must either stay subscribed to listservs or development repos, or initiate a procedure to update all out-of-date packages via update.packages().

Obtaining a Package Version from CRAN

There are four ways to obtain package version information:

  1. Use available.packages() and capture all 7835 (and counting) packages on CRAN.
  2. Target the package's page on CRAN and extract information from that specific page.
  3. Host a version file on your own webserver!
  4. Mimic a CRAN-esque structure for your packages on your own webserver!

Option 1: Nuclear Option

Under the first option, one can easily use:

pkgInfoCRAN = function(pkg.name){
 d = available.packages()[pkg.name,]
 d["Version"]
}

Under the hood, available.packages() downloads a file called PACKAGES.gz from CRAN. This file is regenerated via tools::write_PACKAGES() each time a package is added to CRAN. It is about 206 KB (kilobytes) at the moment but expands to 2.7288 MB when loaded into R.
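The PACKAGES file is plain text in Debian Control File (DCF) format, with one blank-line-separated stanza per package. As a rough sketch of the lookup that available.packages() performs, the snippet below parses a small hand-written stand-in with base R's read.dcf(); the package names and versions are made-up sample data, not real CRAN entries.

```r
# Two hypothetical stanzas in the DCF format used by CRAN's PACKAGES file
packages_txt <- c(
  "Package: pkgA",
  "Version: 1.2.0",
  "",
  "Package: pkgB",
  "Version: 0.9.1"
)

# Write the sample to a temporary file and parse it with read.dcf()
packages_file <- tempfile("PACKAGES")
writeLines(packages_txt, packages_file)
db <- read.dcf(packages_file, fields = c("Package", "Version"))

# Index rows by package name so a version can be looked up directly
rownames(db) <- db[, "Package"]
db["pkgB", "Version"]
```

The real file holds thousands of such stanzas with many more fields, which is why it balloons once loaded into R.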

Option 2: Strike Force

Using the second option, one needs to work a bit harder:

pkgVersionCRAN = function(pkg, cran_url="https://cran.r-project.org/web/packages/")
{
  
  # Create URL of the package's CRAN page
  cran_pkg_loc = paste0(cran_url,pkg)
  
  # Try to establish a connection; return NULL if that fails
  suppressWarnings( conn <- try( url(cran_pkg_loc) , silent=TRUE ) )
  if ( inherits(conn, "try-error") ) {
    return(NULL)
  }
  
  # Try to read the page (fails e.g. when offline or the package is unknown)
  suppressWarnings( cran_pkg_page <- try( readLines(conn) , silent=TRUE ) )
  close(conn)
  if ( inherits(cran_pkg_page, "try-error") ) {
    return(NULL)
  }
  
  # Extract version info: the value sits on the line after the "Version:" label
  version_line = cran_pkg_page[grep("Version:",cran_pkg_page)+1]
  trimws(gsub("<(td|\\/td)>","",version_line))
}

This option works by opening a connection to the package's page on CRAN. The webpage is then downloaded and read into R as a character vector, one element per line. From there, two regular expressions are used to:

  1. Find the line containing the “Version:” label, which sits directly above the version number.
  2. Extract the version number from between <td></td>.

The result is returned as a string.
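The two steps can be tried offline on a hypothetical excerpt of a package page. The HTML below is an assumption modeled on the <td>Version:</td> followed by <td>1.2.0</td> layout the function expects; if CRAN changes its page markup, the pattern will need adjusting.

```r
# Hypothetical excerpt of a CRAN package page (real pages have many more rows)
page <- c(
  "<table>",
  "<tr>",
  "<td>Version:</td>",
  "<td>1.2.0</td>",
  "</tr>",
  "</table>"
)

# Step 1: locate the "Version:" label; the value is on the following line
version_line <- page[grep("Version:", page) + 1]

# Step 2: strip the surrounding <td> and </td> tags
version <- gsub("<(td|\\/td)>", "", version_line)
version
```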

The total download for this approach is about 9.056 KB (HTML: 8504 bytes, connection: 552 bytes), and processing is significantly faster since only a single small page is parsed.

Option 3: Minimalist Approach

In this case, one could simply create a version file and update it each time a new release is pushed to CRAN. An example of a version file would be:

pkg.txt

1.0.0

Then the function developed in Option 2 could be modified so that it resembles:

pkgVersionWeb = function(pkg, web_url="http://thecoatlessprofessor.com/packages/")
{
  
  # Create URL of the version file, e.g. <web_url>pkg.txt
  pkg_ver_loc = paste0(web_url,pkg,".txt")
  
  # Try to establish a connection; return NULL if that fails
  suppressWarnings( conn <- try( url(pkg_ver_loc) , silent=TRUE ) )
  if ( inherits(conn, "try-error") ) {
    return(NULL)
  }
  
  # Try to read the file; return NULL on failure
  suppressWarnings( pkg_ver_file <- try( readLines(conn, warn=FALSE) , silent=TRUE ) )
  close(conn)
  if ( inherits(pkg_ver_file, "try-error") ) {
    return(NULL)
  }
  
  # The version is the first line of the file
  trimws(pkg_ver_file[1])
}

The downsides of this approach are:

  1. You must remember to update the version text file on your webserver.
  2. You must be able to serve the same file to thousands of users.
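One way to soften the first downside is to generate the version file from the package's DESCRIPTION as part of the release routine, so the number cannot drift from the one sent to CRAN. A minimal sketch, where write_version_file() is a hypothetical helper name and uploading the resulting file to the webserver is still left to you:

```r
# Hypothetical helper: write the Version field of a DESCRIPTION file to <Package>.txt
write_version_file <- function(desc_path = "DESCRIPTION", out_dir = ".") {
  desc <- read.dcf(desc_path, fields = c("Package", "Version"))
  out_file <- file.path(out_dir, paste0(desc[1, "Package"], ".txt"))
  writeLines(desc[1, "Version"], out_file)
  out_file
}

# Usage with a throwaway DESCRIPTION file in a temporary directory
tmp <- tempfile("pkgdir")
dir.create(tmp)
writeLines(c("Package: mypkg", "Version: 1.0.0"), file.path(tmp, "DESCRIPTION"))
vf <- write_version_file(file.path(tmp, "DESCRIPTION"), out_dir = tmp)
readLines(vf)
```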

Option 4: Enthusiast Approach

To mimic CRAN, you can follow the advice given in R Data Packages in External Data Repositories, which uses the Additional_repositories field. Once the CRAN-like structure has been deployed, the beauty of this approach is its simplicity: all that is required is a slight modification of Option 1’s code. That is, one must supply the URL of the off-site repository and then construct its contrib URL with utils::contrib.url().

pkgInfoRepo = function(pkg.name, url = "http://smac-group.com/datarepo"){
 d = available.packages(contriburl = contrib.url(url))[pkg.name,]
 d["Version"]
}
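Before deploying anything, the Option 4 lookup can be rehearsed against a local directory laid out like a CRAN repository, since available.packages() also accepts file:// contrib URLs. In this sketch the repository path, the package name mypkg, and its version are all hypothetical, and the PACKAGES index is written by hand with write.dcf() rather than generated from package tarballs by tools::write_PACKAGES() as it normally would be.

```r
# Build a throwaway CRAN-like layout: <repo>/src/contrib/PACKAGES
repo_dir <- tempfile("repo")
contrib_dir <- file.path(repo_dir, "src", "contrib")
dir.create(contrib_dir, recursive = TRUE)

# Hand-written PACKAGES index for a hypothetical package
write.dcf(
  data.frame(Package = "mypkg", Version = "1.1.0", stringsAsFactors = FALSE),
  file = file.path(contrib_dir, "PACKAGES")
)

# Point available.packages() at the local repository via a file:// URL
repo_url <- paste0("file://", normalizePath(repo_dir))
db <- available.packages(contriburl = contrib.url(repo_url, type = "source"))
db["mypkg", "Version"]
```

Swapping repo_url for the webserver's address gives the same lookup as pkgInfoRepo().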

The downsides of this approach are:

  1. You must remember to update the package in two locations: on CRAN and on your webserver.
  2. You must remember to regenerate the PACKAGES file each time a package is added, updated, or removed.
  3. You must be able to serve the same file to thousands of users.

Adding a version check method to .onAttach()

As you may have noticed, each of the methods above returns the version information as a string. The reason is that we would like to use the utils::compareVersion() function to compare version strings. With the remote version in hand, we can modify the .onAttach() function so that, when a user loads the package, the result of comparing their installed version against the remotely available version is displayed.
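utils::compareVersion(a, b) returns 1 when a is the newer version, 0 when the two are equal, and -1 when a is older. The sketch below also shows why a plain string comparison is not a substitute, and wraps the result in a hypothetical version_status() helper that maps it onto the labels used in the .onAttach() code.

```r
# compareVersion(a, b): 1 if a is newer, 0 if equal, -1 if a is older
utils::compareVersion("1.0.1", "1.0.0")   # 1
utils::compareVersion("1.0.0", "1.0.0")   # 0
utils::compareVersion("0.9.0", "1.0.0")   # -1

# Plain string comparison mishandles multi-digit components ('1' sorts before '9')
"1.10.0" > "1.9.0"                        # FALSE
utils::compareVersion("1.10.0", "1.9.0")  # 1

# Hypothetical helper mapping the comparison onto the status labels
version_status <- function(remote, local) {
  switch(as.character(utils::compareVersion(remote, local)),
         "0"  = "CURRENT",
         "1"  = "OUT OF DATE",
         "-1" = "DEVELOPMENT")
}
version_status("1.1.0", "1.0.0")
```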

.onAttach <- function(...){
  # Avoid running if in batch job / user not present
  if (!interactive()) return()
  
  # Obtain the installed package information
  local_version = utils::packageDescription('pkgname')
  
  # Grab the package information from CRAN (pkgVersionCRAN() from Option 2)
  cran_version = pkgVersionCRAN("pkgname")
  
  # Verify we have package information
  if(!is.null(cran_version) && length(cran_version) != 0L){
    latest_version = utils::compareVersion(cran_version, local_version$Version)
    
    d = if(latest_version == 0){
     'CURRENT'
    }else if(latest_version == 1){
      'OUT OF DATE'
    }else{
      'DEVELOPMENT'
    }
  
  }else{ # Gracefully fail.
   d = "ERROR IN OBTAINING REMOTE VERSION INFO"
   latest_version = 0
  }
  
  # Use packageStartupMessage() so that folks can suppress package messages with 
  # suppressPackageStartupMessages(library(pkg))
  packageStartupMessage('Version: ', local_version$Version, ' (', d,') built on ', local_version$Date)
  if(latest_version == 1){
    packageStartupMessage('\n!!! NEW VERSION ', cran_version , ' !!!')
    packageStartupMessage('Download the latest version: ',cran_version,' from CRAN via `install.packages("pkgname")`\n')
  }
}
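Because the messages are raised with packageStartupMessage(), users can silence them and tests can capture them. A small sketch using a hypothetical announce() function in place of the real .onAttach():

```r
# Hypothetical stand-in for the message-emitting part of .onAttach()
announce <- function() {
  packageStartupMessage("Version: 1.0.0 (CURRENT)")
}

# Shown normally...
announce()

# ...but silenced when the user asks for it
suppressPackageStartupMessages(announce())

# Collect the text of startup messages programmatically, e.g. in a unit test
captured <- character(0)
withCallingHandlers(
  announce(),
  packageStartupMessage = function(m) {
    captured <<- c(captured, conditionMessage(m))
    invokeRestart("muffleMessage")
  }
)
captured
```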