R + OpenMP + vecLib on Apple Silicon

This guide is intended to help people who use R and are transitioning to an Apple Silicon machine. It includes a low-level general discussion of things like the Mac’s $PATH variable and R’s Makevars file. The goal is to help users of this software customize some options to make R work (and work very well) on Apple Silicon.

Intro

Apple’s in-house ARM-based systems-on-a-chip (“Apple Silicon,” the first model of which they called the “M1”) are incredible. When I upgraded my top-of-the-line late 2016 MacBook Pro to an M1 Max MacBook Pro, I saw a ~250% speedup in single core operations with much lower power draw. But the biggest gains were in multicore operations: Geekbench benchmarks show a 6- to 7-fold speedup compared to my old laptop. Amazing!

Two years into the Apple Silicon transition, most front-end software that I use for analytical things has been ported to the new architecture, including Python, R, and STATA—really only MATLAB is a holdout, and native support is in beta. But to unlock the real gains of upgrading, you want to leverage all the cores on these chips. And with R, that’s sometimes nontrivial.

Take for example an R package I use a lot: fixest. While fixest is faster, ceteris paribus, than competitors in estimating high-dimensional fixed effects models, these differences are irrelevant if STATA is running on 10 cores and fixest is just using one. Fortunately, fixest calls a C++ backend for computation, and these calls support OpenMP, a multiprocessing API. But to use OpenMP, fixest has to be compiled from source on your machine (as opposed to just installed from a binary, or pre-compiled, package). And R has to know where to find all of the components needed to support that compilation. To understand why it doesn’t by default, we need to take a detour.

The Mac shell (zsh) and PATH

On a Mac running macOS 10.15 (Catalina) or later, which includes every Apple Silicon Mac, the default shell is zsh. (Previously, it was bash, but Apple transitioned to zsh, a separate shell with largely bash-compatible syntax, because more recent versions of bash are licensed under the GNU GPL v3, which Apple purportedly didn’t like.) When you open a window in Terminal, you’re presented with a zsh input interface. If you enter a command (e.g., “python”), zsh will call the executable it associates with that command. It looks for those executables in a set of locations that are defined in the PATH variable. To see the list of these locations, you can run echo $PATH in the Terminal. The order matters: if your computer houses two python executables, the one in the directory listed earlier in PATH wins. If you want to know which python is being called when you enter that command, you can type “which python”.
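
To make the search order concrete, here is a minimal shell sketch. The function names list_path and find_in_path are made up for illustration; they are not standard commands. The first prints PATH one entry per line in search order, and the second lists every PATH directory that holds an executable with a given name. The first match it prints is the one the shell would run.

```shell
# Print each PATH entry on its own line, numbered in search order.
list_path() {
  printf '%s\n' "$PATH" | tr ':' '\n' | nl
}

# List every executable called $1 found in the PATH directories,
# in the order the shell searches them (the first hit is the one that runs).
find_in_path() {
  name=$1
  printf '%s\n' "$PATH" | tr ':' '\n' | while IFS= read -r dir; do
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

list_path
find_in_path sh
```

If find_in_path prints more than one line for a command, you have several copies installed, and reordering PATH changes which one you get.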

The natural question here is: what if I install some executable and want to run it by name from the Terminal, but its location isn’t in my PATH? Well, you have to modify the PATH variable. Imagine we have some software in /this/location. We can add /this/location to the end of the PATH variable by running export PATH=$PATH:/this/location in each new Terminal session before we run the software there. Or we could make this change permanent by editing the files that are run when you open a new Terminal window (lots of detail on this here), which include ~/.zprofile (specific to zsh) and ~/.zshrc (also specific to zsh); ~/.profile, by contrast, is read by sh and by bash login shells but not by zsh by default. Specifically, we could add export PATH=$PATH:/this/location to ~/.zprofile. For this operation, I’d usually use a lean in-Terminal text editor like nano (i.e., run nano ~/.zprofile to start editing the file, type in export PATH=$PATH:/this/location, press ctrl+O and then Enter to save, and press ctrl+X to exit nano).
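
If you would rather not open an editor, the same change can be scripted. The sketch below is one possible way to do it, not the only one: add_to_path_file is a hypothetical helper name, and /this/location is just the placeholder path from the paragraph above. It appends the export line to a startup file only if that exact line is not already there, so running it twice does no harm.

```shell
# Append "export PATH=$PATH:<dir>" to a startup file, but only once.
# $1 = directory to add, $2 = startup file (defaults to ~/.zprofile).
add_to_path_file() {
  dir=$1
  file=${2:-"$HOME/.zprofile"}
  line="export PATH=\$PATH:$dir"
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -Fxq -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Example (uncomment to modify your real ~/.zprofile):
# add_to_path_file /this/location
```

The change takes effect in the next Terminal window you open, since ~/.zprofile is only read when a new login shell starts.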

Homebrew

This noise about the PATH variable and the particulars of zsh is relevant here for one main reason: we’re going to use Homebrew to manage our package installations, and it behaves a little curiously on Apple Silicon.

Homebrew is a free and open-source package management system. It makes things easy that would otherwise be hard. For example, say you want to install wget (a flexible package for pulling content from web servers) on your Mac. You could head over to gnu.org and download the latest version of wget, unzip it, and then compile it. Or, if you have Homebrew installed on your Mac, you could just open a Terminal and type brew install wget. Conveniently, Homebrew also takes care of tracking updates to your installed packages and making sure everything is linked properly so your software remains interoperable through the commands brew upgrade and brew doctor. It’s neat. So here’s our first step:

1. If you haven’t yet installed Homebrew, open a Terminal window, type /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)", and hit Enter.

During the Homebrew installation process, you might be prompted to install Xcode’s Command Line Tools, a set of software that we’ll need anyway for some other tasks. (One thing to note is that these tools will be periodically updated, usually alongside macOS. And when these updates happen, sometimes things will break. In general, I’ve found that running sudo xcode-select --switch /Library/Developer/CommandLineTools seems to fix this, but I don’t know why.)

Throughout this tutorial, we’re going to use Homebrew to manage package installations, but I mentioned it does something weird: if we’re on Apple Silicon, Homebrew installs things in /opt/homebrew, not /usr/local as was the case with Macs running Intel processors. There are many reasons for this change, detailed here, and for many purposes it’s just fine: when Homebrew is installed, the installer asks you to add a line to your shell startup file (~/.zprofile under zsh), eval "$(/opt/homebrew/bin/brew shellenv)", that for the most part handles the PATH snafus. But the crux of it is that RStudio doesn’t run your shell startup files when it starts up. Because of this, package compilation in RStudio is done without Homebrew’s tidy PATH management and you’ll be swimming in errors.
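
A quick way to tell whether a given environment knows about Homebrew is to test whether /opt/homebrew/bin is an entry in PATH. The sketch below does that in plain shell; path_has is a hypothetical helper name, not a built-in. Pasting it into Terminal shows what your shell sees. The environment inside RStudio may differ, which is exactly the problem described above.

```shell
# Succeeds if the directory given as $1 is one of the entries in PATH.
# Wrapping both sides in ':' avoids partial matches such as /opt/homebrew/binx.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if path_has /opt/homebrew/bin; then
  echo "Homebrew's bin directory is on PATH"
else
  echo "Homebrew's bin directory is NOT on PATH"
fi
```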

Rprofile

Fortunately, R (and by extension RStudio) runs its own profile file when starting up, ~/.Rprofile. This is where we’ll want to tell R to modify its PATH variable, and we do that in the R language. But first, let’s use Homebrew to install R and RStudio.

2. Run brew install --cask r

3. Run brew install --cask rstudio

4. Add Sys.setenv(PATH=paste0("/opt/homebrew/bin:",Sys.getenv("PATH"))) to ~/.Rprofile

Congrats! That was easy. Now, to speed things up.

vecLib

Many of the things people commonly do in R use vector and matrix computations, so the function library used for these calls can make big differences in speed. The R binary on Homebrew supports two linear algebra libraries, reference BLAS (which is possibly more stable) and the Apple-optimized BLAS (vecLib, which is much faster). I recommend you use the second. To do this, you need to create a symbolic link between the vecLib library and the default R BLAS library using two commands in the Terminal:

5. cd /Library/Frameworks/R.framework/Resources/lib

6. ln -sf libRblas.vecLib.dylib libRblas.dylib

And that’s that! R should automatically reference the correct linear algebra library when you open it. To make sure this is the case, you can run sessionInfo() in R and it should report, for example, “BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib”.
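
You can also check the link itself from the Terminal, without opening R. This is a small sketch under the assumption that R lives in the framework directory used in steps 5 and 6; blas_target is a made-up helper name. It prints whatever libRblas.dylib currently points to, which should be libRblas.vecLib.dylib after the steps above.

```shell
# Print the file that libRblas.dylib points to.
# $1 = R's lib directory (defaults to the path used in step 5).
blas_target() {
  libdir=${1:-/Library/Frameworks/R.framework/Resources/lib}
  if [ -L "$libdir/libRblas.dylib" ]; then
    readlink "$libdir/libRblas.dylib"
  else
    echo "libRblas.dylib is not a symlink in $libdir" >&2
    return 1
  fi
}

blas_target || true
```

Updating R may replace the link with the reference BLAS again, so this is worth re-running after an upgrade.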

OpenMP

We’re not done. Now we’ll enable OpenMP. First, use Homebrew to install the OpenMP libraries.

7. brew install libomp

So packages know to reference this when they’re being compiled, we need to add some lines to the R Makevars file, which is the repository for appropriate flags/variables to be referenced when we’re compiling packages from source code in R. Specifically, we will:

8. Add the following lines to ~/.R/Makevars:

LDFLAGS+=-L/opt/homebrew/opt/libomp/lib -lomp
CPPFLAGS+=-Xclang -fopenmp

These steps are more or less mirrored in the R documentation.
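
Before compiling anything, it can save time to confirm that the files those flags rely on actually exist. The sketch below assumes Homebrew put libomp under /opt/homebrew/opt/libomp, as in the LDFLAGS line above; check_libomp is a hypothetical helper name. It looks for the header and the library and reports which are present.

```shell
# Check that a libomp installation has the header and library the flags need.
# $1 = libomp prefix (defaults to Homebrew's location on Apple Silicon).
check_libomp() {
  prefix=${1:-/opt/homebrew/opt/libomp}
  rc=0
  for f in include/omp.h lib/libomp.dylib; do
    if [ -e "$prefix/$f" ]; then
      echo "found   $prefix/$f"
    else
      echo "missing $prefix/$f"
      rc=1
    fi
  done
  return "$rc"
}

check_libomp || echo "libomp looks incomplete; try: brew install libomp"
```

If the compiler later complains that it cannot find omp.h even though this check passes, adding -I/opt/homebrew/opt/libomp/include to CPPFLAGS may help. That is my assumption and is not part of the steps above.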

gfortran/gcc

Now we’re getting to the miscellanea. For one, some packages compiling in R try to call gfortran, but the gfortran Homebrew cask was deprecated in 2021 and integrated, instead, into gcc. So your best bet if you don’t want to leave the safety of Homebrew is to do the following:

9. First, brew install gcc

10. Next, add some lines to your Makevars to let R know where gfortran really lives:

FLIBS+=-L/opt/homebrew/opt/gfortran/lib
F77+=/opt/homebrew/bin/gfortran
FC+=/opt/homebrew/bin/gfortran
CFLAGS+=-I/opt/homebrew/include
CPPFLAGS+=-I/opt/homebrew/include
CXXFLAGS+=-I/opt/homebrew/include
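
To confirm that the compiler these lines point at can actually be found, a check along these lines may help. It is only a sketch: require_tools is a made-up helper name, and it reports where each named tool resolves on PATH rather than testing the exact /opt/homebrew/bin/gfortran path.

```shell
# Report where each named tool resolves on PATH; fail if any is missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

require_tools gfortran || echo "gfortran not found; try: brew install gcc"
```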

Stragglers

Inevitably, some packages will still complain when you try to compile them in RStudio. This might look something like this: you are in RStudio, and you try to compile a package named thing from source using the usual command, install.packages('thing', type='source'), only to see an error like “fatal error: 'libpng.h' file not found”.
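
One way to track down a missing header is to search Homebrew's directories for it and turn each hit into a candidate -I flag. The sketch below is an illustration, not a guaranteed fix: suggest_include_flags is a hypothetical helper, /opt/homebrew/opt is assumed as the search root, and the header name is whatever the error message reports (png.h is used here only as an example).

```shell
# Search a directory tree for a header and print a -I flag for each
# directory that contains it.
# $1 = header file name, $2 = search root (defaults to /opt/homebrew/opt).
suggest_include_flags() {
  header=$1
  root=${2:-/opt/homebrew/opt}
  # -L follows symlinks, which Homebrew uses heavily under its prefix.
  find -L "$root" -name "$header" -type f 2>/dev/null |
    while IFS= read -r hit; do
      printf '%s%s\n' '-I' "$(dirname "$hit")"
    done | sort -u
}

suggest_include_flags png.h
```

If it prints nothing, the library is probably not installed through Homebrew yet. If it prints a flag, that flag is a candidate for CPPFLAGS in ~/.R/Makevars.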

Generally, what I do in this case is search StackExchange :\. That generally leads me to running a command in the Terminal like “locate libpng”, adding something to my Makevars file (maybe “LDFLAGS+=-L/opt/homebrew/opt/libpng/lib”), and trying again. This has worked in nearly every case, but my Makevars file is now a bit of a mess:

CFLAGS=-I/opt/homebrew/include -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
CPPFLAGS=-I/opt/homebrew/include -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
CPPFLAGS+=-Xclang -fopenmp
CPPFLAGS+=-I/opt/homebrew/opt/openssl@1.1/include
CXXFLAGS=-I/opt/homebrew/include -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk

FLIBS=-L/opt/homebrew/opt/gfortran/lib
F77=/opt/homebrew/bin/gfortran
FC=/opt/homebrew/bin/gfortran
LDFLAGS+=-L/opt/homebrew/opt/jpeg/lib -L/opt/homebrew/opt/libpng/lib -L/opt/homebrew/opt/udunits/lib -L/opt/homebrew/opt/openssl@1.1/lib -L/opt/homebrew/opt/sqlite3/lib -L/opt/homebrew/opt/proj/lib
LDFLAGS+=-L/opt/homebrew/opt/libomp/lib -lomp
PKG_CONFIG_PATH+=/opt/homebrew/opt/openssl@1.1/lib/pkgconfig
PKG_CFLAGS+=-I/opt/homebrew/opt/openssl@1.1/lib/
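
With a file like this, order matters: a plain = assignment throws away anything an earlier line set for the same variable, while += appends to it. The sketch below is a rough lint for that one pitfall. find_makevars_resets is a made-up name, and it only understands the = and += forms, so treat its output as a hint rather than a verdict.

```shell
# Warn when a plain "VAR=" line appears after an earlier line that already
# set VAR, since the plain assignment discards the earlier value.
# $1 = path to a Makevars file.
find_makevars_resets() {
  awk '
    /^[ \t]*#/ { next }                      # skip comment lines
    !/=/       { next }                      # skip lines without an assignment
    {
      name = substr($0, 1, index($0, "=") - 1)
      is_append = (substr(name, length(name)) == "+")
      if (is_append) name = substr(name, 1, length(name) - 1)
      gsub(/[ \t]/, "", name)                # tolerate "VAR += value"
      if (!is_append && (name in seen))
        printf "line %d: %s= discards earlier %s settings\n", NR, name, name
      seen[name] = 1
    }
  ' "$1"
}

if [ -f "$HOME/.R/Makevars" ]; then
  find_makevars_resets "$HOME/.R/Makevars"
fi
```

No output means every plain assignment comes before the += lines for the same variable.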

As far as I know, patching Makevars like this has failed only for a small set of (great!) geospatial packages, including terra, sf, and exactextractr. To compile these, I’ve had to add some flags during the installation call in R. Specifically, I use something like:

install.packages('terra', type = 'source', configure.args = '--with-proj-include=/opt/homebrew/include/ --with-proj-lib=/opt/homebrew/lib/', configure.vars = 'GDAL_DATA=/opt/homebrew/opt/gdal/share/gdal/')

And that’s about it. To really see the effect of these speedups, open RStudio and compile some packages that leverage OpenMP and vecLib from source, like data.table and fixest, using the usual install.packages('thing', type='source') command. Enjoy!
