Installation of Scanpy on a Mac M1

February 10, 2022

Apple switched its Macs from Intel processors to ARM chips that it designs in-house. This change comes with some trade-offs for Python programming, especially in the data science world. I had to use some data science and bioinformatics libraries and ran into installation issues. The most notable one was with the library Scanpy. First, the library needs some system dependencies. Second, some of the Python libraries it depends on are not yet available for ARM chips: their wheels (prebuilt binary packages) are not yet published on PyPI. I wrote down the steps needed to install Scanpy here so that I remember them.

To manage my environments I use Poetry, conda, or Pipenv. Unfortunately, I did not manage to install Scanpy with pip-based package managers (neither Pipenv nor Poetry worked). The only alternative that worked was conda. But to make it work, you first need to install and configure some dependencies.
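The problem comes from missing ARM builds. So it helps to first check which architecture your shell and your Python interpreter actually run on. An x86_64 interpreter running under Rosetta 2 emulation looks for Intel wheels instead of ARM ones. This is a minimal sketch. It assumes `python3` is the interpreter of the environment you plan to use.

```shell
# Architecture of the current shell: "arm64" when running natively on an M1,
# "x86_64" when the terminal itself runs under Rosetta 2.
shell_arch="$(uname -m)"
# Architecture the Python interpreter runs as (same naming on macOS).
python_arch="$(python3 -c 'import platform; print(platform.machine())')"
echo "shell:  ${shell_arch}"
echo "python: ${python_arch}"
# A mismatch means the interpreter and the terminal run under different
# architectures, e.g. an Intel-only Python build under Rosetta 2.
if [ "${shell_arch}" != "${python_arch}" ]; then
  echo "warning: Python does not run on the same architecture as the shell" >&2
fi
```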

HDF5

HDF5 is a high-performance software library and file format for managing, processing, and storing heterogeneous data, built for fast I/O and storage. Scanpy reads and writes AnnData objects as .h5ad files. These are HDF5 files with additional structure that specifies how AnnData objects are stored. To install HDF5 you need Homebrew on your Mac M1.

brew install hdf5
export HDF5_DIR=/opt/homebrew/Cellar/hdf5/1.12.1 # replace 1.12.1 with the version you installed
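Hard-coding the Cellar version means the variable breaks whenever Homebrew upgrades HDF5. Here is a sketch of an alternative, assuming a zsh shell (the macOS default). `brew --prefix hdf5` prints the formula's `opt` path (for example /opt/homebrew/opt/hdf5), a symlink that Homebrew keeps pointing at the installed version.

```shell
# Only meaningful where Homebrew is installed (/opt/homebrew on Apple Silicon).
if command -v brew >/dev/null 2>&1; then
  # Version-independent path to the installed HDF5.
  HDF5_DIR="$(brew --prefix hdf5)"
  export HDF5_DIR
  echo "HDF5_DIR=${HDF5_DIR}"
  # Persist the setting for future zsh sessions, without adding it twice.
  # Single quotes keep $(brew --prefix hdf5) unexpanded so it is re-evaluated at startup.
  if ! grep -q 'HDF5_DIR' "${HOME}/.zshrc" 2>/dev/null; then
    echo 'export HDF5_DIR="$(brew --prefix hdf5)"' >> "${HOME}/.zshrc"
  fi
else
  echo "Homebrew not found: install it before running 'brew install hdf5'" >&2
fi
```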

LLVMLITE

Next, install LLVM with Homebrew, because llvmlite is built against it. llvmlite provides Python bindings to LLVM for Numba, a compiler that translates a subset of Python and NumPy code into fast machine code. Scanpy relies on Numba, both directly and through dependencies such as umap-learn.

brew install llvm@11

Make sure /opt/homebrew/opt/llvm@11/bin is on your PATH, for example by adding this directory to /etc/paths (editing that file requires sudo). Then install llvmlite in your Python environment.
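You can also make the PATH change and install llvmlite per user, without editing /etc/paths. This is a minimal sketch. It assumes the /opt/homebrew prefix used above and pip as the installer inside the active environment. `LLVM_CONFIG` is the environment variable that llvmlite's build reads to locate `llvm-config`.

```shell
# Versioned Homebrew formulae such as llvm@11 are keg-only: their binaries are not
# linked into /opt/homebrew/bin, so the directory has to be put on PATH explicitly.
LLVM_BIN=/opt/homebrew/opt/llvm@11/bin
export PATH="${LLVM_BIN}:${PATH}"
# Point llvmlite's build at LLVM 11's llvm-config.
export LLVM_CONFIG="${LLVM_BIN}/llvm-config"
# To keep the PATH change, append the same export lines to ~/.zshrc.

# Build and install llvmlite into the active Python environment,
# only if the LLVM 11 toolchain is actually present.
if [ -x "${LLVM_CONFIG}" ]; then
  python3 -m pip install llvmlite
else
  echo "llvm-config not found at ${LLVM_CONFIG}" >&2
fi
```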

Conda

I recommend installing Miniconda. Miniconda is essentially an installer for an empty conda environment that contains only conda, its dependencies, and Python.
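Here is a possible non-interactive installation of the Apple Silicon build of Miniconda. It assumes that Anaconda publishes the installer under the file name below and that zsh is your shell. `-b` runs the installer in batch mode and `-p` sets the installation prefix.

```shell
# Apple Silicon (arm64) build of the Miniconda installer (name assumed, see above).
INSTALLER=Miniconda3-latest-MacOSX-arm64.sh
PREFIX="${HOME}/miniconda3"
# Run only in a native macOS arm64 shell and only if nothing is installed there yet.
if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ] && [ ! -d "${PREFIX}" ]; then
  curl -fsSLO "https://repo.anaconda.com/miniconda/${INSTALLER}"
  # -b: batch mode (accepts the license, no prompts); -p: installation prefix.
  bash "${INSTALLER}" -b -p "${PREFIX}"
  # Add conda's shell hook to ~/.zshrc so that new terminals know `conda`.
  "${PREFIX}/bin/conda" init zsh
  rm -f "${INSTALLER}"
fi
```

Open a new terminal afterwards so that the `conda` command is available for the channel configuration.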

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set offline false
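`conda config --add` puts each new channel at the top of the list. The commands above therefore give conda-forge the highest priority, then bioconda, then defaults. This quick check assumes conda is already on your PATH.

```shell
if command -v conda >/dev/null 2>&1; then
  # Expected order after the commands above: conda-forge, bioconda, defaults.
  channels="$(conda config --show channels)"
  echo "${channels}"
else
  echo "conda not found on PATH" >&2
fi
```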

Once this is done, you can install Scanpy on your machine.

conda install scanpy
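To confirm that the installation worked, import Scanpy with the environment's interpreter and print the versions of a few packages it depends on. This sketch assumes the conda environment is active, so that `python3` resolves to it.

```shell
# Check that Scanpy can be imported, then report a few versions.
if python3 -c 'import scanpy' >/dev/null 2>&1; then
  python3 -c 'import scanpy, anndata, numba; print("scanpy", scanpy.__version__); print("anndata", anndata.__version__); print("numba", numba.__version__)'
  scanpy_ok=1
else
  echo "scanpy cannot be imported by $(command -v python3)" >&2
  scanpy_ok=0
fi
```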

Conclusion

It was a short one, but it will surely be a good reminder for my future Scanpy projects on a Mac M1. I hope these issues will be solved in the future. In the meantime, I still use my Linux computer when I need to work with Scanpy. There, everything works out of the box with a plain pip install.

Update

Numba 0.55.2 and llvmlite 0.38.1 are now available as wheels on PyPI, so you can now easily install Scanpy on a Mac M1. You can see the thread here for more context.
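With those wheels published, a plain virtual environment and pip should be enough. This is a minimal sketch. It assumes a native arm64 Python 3 and uses the Numba and llvmlite versions mentioned above as lower bounds. The `.venv` directory name is an arbitrary choice.

```shell
VENV_DIR=.venv
# Only relevant on an Apple Silicon Mac; elsewhere this block does nothing.
if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
  python3 -m venv "${VENV_DIR}"
  . "${VENV_DIR}/bin/activate"
  # Older pip releases may not recognize macOS arm64 wheel tags, so upgrade first.
  python -m pip install --upgrade pip
  # Lower bounds taken from the versions that ship ARM wheels, then Scanpy itself.
  python -m pip install "numba>=0.55.2" "llvmlite>=0.38.1" scanpy
fi
```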