Setting up ML and AI tools on Apple Silicon
One of the questions I'm regularly asked is whether you can run data analysis, machine learning, or artificial intelligence jobs on Apple Silicon machines. I'm not an expert in AI, but I thought I'd go through the process of setting up an Apple Silicon MacBook Pro (M1 Max) for machine learning using Python. I've tried to document every step, so apologies if it is too detailed.
The machine used here is running macOS Monterey version 12.5.1.
First Steps
I'm using Homebrew and conda to install packages and manage compatibility and dependencies. Detailed notes on installation are in the instructions for installing cheminformatics tools on a Mac: https://www.macinchem.org/reviews/cheminfo/cheminfoMacUpdate.php.
Install Homebrew from https://brew.sh. You may need to install the Xcode Command line tools, details are in the link above.
Install Anaconda or Miniconda in the usual way (I used Miniconda), and let the installer add the conda installation of Python to your PATH environment variable. There is no need to set the PYTHONPATH environment variable.
Here is the link for the Apple Silicon installer (note that this URL points to Miniforge, the conda-forge distribution of conda): https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh. Make sure you use the arm64 version.
Restart the Terminal
You can check the installation using these commands in the Terminal
(base) chrisswain@ChrisM1MBP ~ % echo $PATH
/usr/local/miniconda/bin:
(base) chrisswain@ChrisM1MBP ~ % which python
/Users/chrisswain/miniconda3/bin/python
(base) chrisswain@ChrisM1MBP ~ % python --version
Python 3.8.12
Check you have the arm version installed
(base) chrisswain@ChrisM1MBP ~ % python
Python 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:13:55)
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.platform()
'macOS-12.6-arm64-arm-64bit'
>>>
The Terminal prompt includes (base) because we are in the base python installation environment.
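The architecture check above can also be run non-interactively, for example from a script or a notebook cell. The following is a minimal sketch using only the standard library; the function name describe_interpreter is just an illustrative choice. On a native Apple Silicon build, platform.machine() should report arm64, whereas an Intel build of Python running under Rosetta 2 would report x86_64.

```python
import platform


def describe_interpreter():
    """Summarise the architecture of the Python interpreter that is running."""
    machine = platform.machine()  # e.g. "arm64" on native Apple Silicon builds
    return {
        "python_version": platform.python_version(),
        "system": platform.system(),  # "Darwin" on macOS
        "machine": machine,
        "is_arm64": machine in ("arm64", "aarch64"),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```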
This is probably a good point to install the Xcode command line tools. These include many useful tools such as the Apple LLVM compiler, linker, and Make for compiling executable software from source code.
xcode-select --install
Setting up a Machine Learning environment
We can now set up an environment for machine learning. I have a folder called projects and I created a sub-folder called ForAIML
Whilst the instructions below take you through the complete process, I've also created an environment file, myMLenv.yml, that can be downloaded from http://macinchem.org/reviews/AIML/SettingAIML_files/myMLenv.yml. This can be used to create the conda environment with the following command.
conda env create -f myMLenv.yml -n myML
In the Terminal, cd to this folder and then set up a Python environment. At the moment I'm using Python 3.9.x for most of my work.
conda create -n myML python=3.9
conda activate myML
Collecting package metadata (current_repodata.json): done
Solving environment: done
==> WARNING: A newer version of conda exists. <== current version: 4.13.0 latest version: 4.14.0
Please update conda by running
$ conda update -n base conda
You should see the following displayed in the Terminal. Note that the compiled packages are osx-arm64 builds, mostly from conda-forge.
Package Plan
environment location: /Users/chrisswain/Projects/ForAIML/myML
added / updated specs: - python=3.9
The following packages will be downloaded:
package | build
---------------------------|-----------------
ca-certificates-2022.07.19 | hca03da5_0 124 KB
libsqlite-3.39.2 | h2c9beb0_1 825 KB conda-forge
libzlib-1.2.12 | ha287fd2_2 48 KB conda-forge
openssl-3.0.5 | h7aea29f_1 2.3 MB conda-forge
pip-22.2.2 | pyhd8ed1ab_0 1.5 MB conda-forge
readline-8.1.2 | h46ed386_0 263 KB conda-forge
setuptools-65.1.1 | py38h10201cd_0 1.4 MB conda-forge
sqlite-3.39.2 | h40dfcc0_1 817 KB conda-forge
xz-5.2.6 | h57fd34a_0 230 KB conda-forge
------------------------------------------------------------
Total: 7.4 MB
The following NEW packages will be INSTALLED:
bzip2 conda-forge/osx-arm64::bzip2-1.0.8-h3422bc3_4
ca-certificates pkgs/main/osx-arm64::ca-certificates-2022.07.19-hca03da5_0
libffi conda-forge/osx-arm64::libffi-3.4.2-h3422bc3_5
libsqlite conda-forge/osx-arm64::libsqlite-3.39.2-h2c9beb0_1
libzlib conda-forge/osx-arm64::libzlib-1.2.12-ha287fd2_2
ncurses conda-forge/osx-arm64::ncurses-6.3-h07bb92c_1
openssl conda-forge/osx-arm64::openssl-3.0.5-h7aea29f_1
pip conda-forge/noarch::pip-22.2.2-pyhd8ed1ab_0
python conda-forge/osx-arm64::python-3.8.13-hd3575e6_0_cpython
python_abi conda-forge/osx-arm64::python_abi-3.8-2_cp38
readline conda-forge/osx-arm64::readline-8.1.2-h46ed386_0
setuptools conda-forge/osx-arm64::setuptools-65.1.1-py38h10201cd_0
sqlite conda-forge/osx-arm64::sqlite-3.39.2-h40dfcc0_1
tk conda-forge/osx-arm64::tk-8.6.12-he1e0b03_0
wheel conda-forge/noarch::wheel-0.37.1-pyhd8ed1ab_0
xz conda-forge/osx-arm64::xz-5.2.6-h57fd34a_0
Proceed ([y]/n)? y
Type Y and the installation will proceed.
The Terminal prompt should now include (myML) because we are now in the myML python environment.
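If you want to confirm from inside Python which environment is active, something like the sketch below works. It relies on conda setting the CONDA_DEFAULT_ENV environment variable when an environment is activated; the helper name active_environment is hypothetical.

```python
import os
import sys


def active_environment():
    """Report the conda environment name (if any) and the interpreter in use."""
    return {
        # Set by "conda activate"; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root directory of the running Python installation or environment.
        "prefix": sys.prefix,
        "executable": sys.executable,
    }


env = active_environment()
print(f"conda environment: {env['conda_env']}")
print(f"prefix: {env['prefix']}")
print(f"python executable: {env['executable']}")
```

In the myML environment the first line should show myML, and the prefix should point at the environment's folder.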
We can now start to install a variety of data science analysis and visualisation packages.
(myML) chrisswain@ChrisM1MBP % conda install jupyter pip pandas numpy matplotlib seaborn scikit-learn tqdm scipy lxml version_information lightgbm yellowbrick rdkit=2022.03.4
Now add pytorch
(myML) chrisswain@ChrisM1MBP % pip3 install torch torchvision torchaudio
This will install the packages and any additional dependencies
Installing collected packages: urllib3, typing-extensions, pillow, numpy, idna, charset-normalizer, certifi, torch, requests, torchvision, torchaudio
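Before launching Jupyter it can be useful to confirm that the packages can be found from the active environment. The sketch below uses importlib.util.find_spec from the standard library, which locates a top-level module without importing it. The module list mirrors the install commands above (note that scikit-learn is imported as sklearn), and the helper name missing_modules is hypothetical.

```python
from importlib.util import find_spec

# Import names for the packages installed above (scikit-learn imports as sklearn).
MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn", "sklearn", "tqdm",
    "scipy", "lxml", "lightgbm", "yellowbrick", "rdkit", "torch",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All modules found.")
```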
Now you can start jupyter
(myML) chrisswain@ChrisM1MBP % jupyter notebook
(myML) chrisswain@ChrisM1MBP pytorchtest % jupyter notebook
[I 12:51:09.557 NotebookApp] Serving notebooks from local directory: /Users/chrisswain/Projects/ForAIML
[I 12:51:09.557 NotebookApp] Jupyter Notebook 6.4.12 is running at:
Create a new notebook by clicking on the "New" button and selecting "Python 3 (ipykernel)". Type this code to verify that all the dependencies are available and to check the PyTorch version.
import version_information
%load_ext version_information
%reload_ext version_information
%version_information torch, numpy, scipy, pandas, scikit-learn, seaborn, matplotlib
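If the version_information extension is not available, a rough equivalent can be put together with importlib.metadata from the standard library (Python 3.8 and later). This is only a sketch, and the helper name installed_versions is hypothetical. It looks up distribution names as used by pip, so scikit-learn rather than sklearn.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["torch", "numpy", "scipy", "pandas", "scikit-learn", "seaborn", "matplotlib"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:15s} {ver if ver is not None else 'not installed'}")
```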
Then check PyTorch version/GPU access.
import torch
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
print(f"PyTorch version: {torch.__version__}")
# Check PyTorch has access to MPS (Metal Performance Shaders, Apple's GPU compute framework)
print(f"Is MPS (Metal Performance Shaders) built? {torch.backends.mps.is_built()}")
print(f"Is MPS available? {torch.backends.mps.is_available()}")
# Set the device
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")
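Once a device has been selected, a quick way to see that it actually executes work is to run a small computation on it. The sketch below is one way this might be done and is not part of the original notebook: the helper names pick_device and smoke_test are hypothetical, and the import guard is there only so the snippet degrades gracefully where PyTorch is missing.

```python
try:
    import torch  # installed earlier with pip
except ImportError:  # lets the sketch run where PyTorch is absent
    torch = None


def pick_device():
    """Return "mps" when PyTorch reports the Metal backend usable, else "cpu"."""
    if torch is None:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)  # guard for builds without MPS support
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def smoke_test(size=256):
    """Multiply two random matrices on the chosen device; return (device, shape)."""
    device = pick_device()
    if torch is None:
        return device, None
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    c = a @ b  # matrix multiplication runs on the selected device
    return device, tuple(c.shape)


device, shape = smoke_test()
print(f"Device: {device}, result shape: {shape}")
```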
Hopefully you now have a Python environment on your Apple Silicon machine for AI/ML.
You might want to also have a look at PyCaret https://pycaret.gitbook.io/docs/.
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that exponentially speeds up the experiment cycle and makes you more productive.
PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, spaCy, Optuna, Hyperopt, Ray, and a few more. You can install PyCaret 3.0-rc with pip (this pre-release candidate is not in the yml file):
pip install --pre pycaret
To use TensorFlow we need to add a few more packages, some of which are available from the Apple conda channel. To add the channel, type
conda config --add channels apple
Then we can install
conda install tensorflow-deps
Among other things you should see
tensorflow-deps apple/osx-arm64::tensorflow-deps-2.9.0-0
Then install the following packages using pip
pip install tensorflow-macos
pip install tensorflow-metal
pip install bayesian-optimization
pip install gym
pip install kaggle
Register your Environment
The following command registers your myML environment and makes it available as a kernel in the Jupyter notebook.
python -m ipykernel install --user --name myML --display-name "Python 3.9 (myML)"
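For reference, registering a kernel essentially writes a small kernel.json file that tells Jupyter which Python executable to start. The sketch below builds a dictionary of roughly that shape so you can see what the --display-name option ends up controlling. The helper name kernel_spec is hypothetical, and the exact contents written by your ipykernel version may differ (for example, it may add extra metadata keys).

```python
import json
import sys


def kernel_spec(display_name, python_executable=sys.executable):
    """Build a dictionary shaped like the kernel.json that ipykernel writes."""
    return {
        "argv": [
            python_executable,  # interpreter of the environment being registered
            "-m", "ipykernel_launcher",
            "-f", "{connection_file}",  # placeholder filled in by Jupyter at launch
        ],
        "display_name": display_name,  # label shown in the notebook's kernel menu
        "language": "python",
    }


spec = kernel_spec("Python 3.9 (myML)")
print(json.dumps(spec, indent=2))
```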
Checking TensorFlow
(myML) chrisswain@ChrisM1MBP ~ % python
>>> import tensorflow.keras
>>> import tensorflow as tf
>>> print(f"TensorFlow version: {tf.__version__}")
TensorFlow version: 2.10.0
>>> gpu = len(tf.config.list_physical_devices('GPU')) > 0
>>> print("GPU is", "available" if gpu else "NOT AVAILABLE")
GPU is available
Running simple machine learning projects in a Jupyter Notebook
To further confirm all is working correctly I've created a series of Jupyter notebooks exploring a variety of machine learning data analysis workflows.
The notebooks can be downloaded here JupyterNotebooks
PLSmodel using https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html.
MLRmodel using https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
RFmodel using https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Lightgbm using https://lightgbm.readthedocs.io/en/v3.3.2/.
All examples use a data set from the UCI machine learning repository https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant.
The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), when the power plant was set to work with full load. Features consist of hourly average ambient variables: Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant. A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, the electricity is generated by gas and steam turbines, which are combined in one cycle, and is transferred from one turbine to another. While the Vacuum is collected from and has an effect on the Steam Turbine, the other three ambient variables affect the GT performance.
Pınar Tüfekci, Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods, International Journal of Electrical Power & Energy Systems, Volume 60, September 2014, Pages 126-140, ISSN 0142-0615
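To give a flavour of what the regression notebooks do, here is a deliberately tiny sketch that fits a one-feature least-squares line using only the standard library. It is not the code from the notebooks, which use scikit-learn on all four features. The data below are synthetic numbers generated for illustration (not values from the UCI file), and the names fit_line and r_squared are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for temperature (T) versus energy output (EP); NOT real data.
temps = [rng.uniform(0, 35) for _ in range(200)]
outputs = [500 - 2.0 * t + rng.gauss(0, 1) for t in temps]

# Shuffle the rows and hold out 20% for testing.
rows = list(zip(temps, outputs))
rng.shuffle(rows)
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(pairs, slope, intercept):
    """Coefficient of determination on the given (x, y) pairs."""
    y_bar = mean(y for _, y in pairs)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    ss_tot = sum((y - y_bar) ** 2 for _, y in pairs)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(train)
print(f"slope={slope:.3f}, intercept={intercept:.2f}")
print(f"R^2 on held-out data: {r_squared(test, slope, intercept):.3f}")
```

The fit is made on the training rows only and scored on the held-out rows, which is the same train/test discipline you would apply with the scikit-learn models.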
Last Updated 11 October 2022