Macs in Chemistry

Insanely great science

 

Comparing RDKit on an M1 Max MacBook Pro with an Intel MacBook Pro (2016)

https://www.rdkit.org

The RDKit is an open-source toolkit for cheminformatics, 2D and 3D molecular operations, descriptor generation for machine learning, etc. There's also a molecular database cartridge for PostgreSQL and cheminformatics nodes for KNIME (distributed from the KNIME community site: https://www.knime.org/rdkit).

The RDKit core algorithms and data structures are written in C++. Wrappers are provided to use the toolkit from either Python (2.x and 3.x), Java, or C#.

RDKit was installed on both machines using Miniconda

conda install -c rdkit rdkit
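
Before comparing machines it is worth confirming what each installation is actually running. The short sketch below is not part of the original setup and the function name describe_environment is only illustrative. It reports the CPU architecture Python sees (a native Apple silicon build reports arm64, while an Intel build, including one running under Rosetta 2, reports x86_64) and the RDKit version if RDKit can be imported.

```python
import platform


def describe_environment():
    """Collect the details that matter when comparing benchmark runs."""
    info = {
        "machine": platform.machine(),        # e.g. "arm64" or "x86_64"
        "python": platform.python_version(),
    }
    try:
        import rdkit
        info["rdkit"] = rdkit.__version__
    except ImportError:
        # RDKit is not installed in this environment.
        info["rdkit"] = None
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the M1 machine reports x86_64 here, the benchmark would be measuring translated Intel code rather than native performance.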

There is a standard set of benchmarks that is run with the RDKit in order to detect systematic performance improvements or regressions. They are here:

https://github.com/rdkit/rdkit/blob/master/Regress/Scripts/new_timings.py

The associated data files are in the folder

https://github.com/rdkit/rdkit/blob/master/Regress/Data

The script runs through a variety of cheminformatics operations.

The command used was

python new_timings.py

The script was run three times and the fastest times are shown below.
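
Picking the fastest time for each step across several runs can be automated. The sketch below assumes each run's output has been captured as text and consists of lines of the form "INFO: <step>" followed by "ResultsN: <seconds> seconds ...". The function names parse_timings and fastest_per_step are hypothetical helpers, not part of new_timings.py, and how the output is captured (stdout or stderr) is left open.

```python
import re

INFO_RE = re.compile(r"^INFO:\s*(.+)$")
TIME_RE = re.compile(r"^Results\d+:\s*([0-9.]+)\s*seconds?")


def parse_timings(log_text):
    """Return {step name: seconds} for the log output of one run."""
    timings = {}
    step = None
    for line in log_text.splitlines():
        line = line.strip()
        info = INFO_RE.match(line)
        if info:
            step = info.group(1).strip()
            continue
        result = TIME_RE.match(line)
        if result and step is not None:
            timings[step] = float(result.group(1))
            step = None  # wait for the next INFO line
    return timings


def fastest_per_step(logs):
    """Keep the smallest time seen for each step across all runs."""
    best = {}
    for log_text in logs:
        for step, seconds in parse_timings(log_text).items():
            best[step] = min(seconds, best.get(step, seconds))
    return best
```

Taking the minimum rather than the mean is a common choice for benchmarks, since background activity can only make a run slower.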

The Results

Step                               Label      Intel (s)  M1 Max (s)  Notes
mols from smiles                   Results1       11.08        4.11  50000 passed, 0 failed
Writing: Canonical SMILES          Results2        4.99        2.16
mols from sdf                      Results1        3.96        1.60  10000 passed, 0 failed
patterns from smiles               Results3        0.04        0.02  823 passed, 0 failed
Matching1: HasSubstructMatch       Results4       22.94       13.55
Matching2: GetSubstructMatches     Results5       22.94       13.67
reading SMARTS                     Results6        0.01        0.01  428 patterns
Matching3: HasSubstructMatch       Results7       90.56       56.01
Matching4: GetSubstructMatches     Results8       85.82       50.21
Writing: Mol blocks                Results10      15.20        7.35
BRICS decomposition                Results11      27.28       13.51
Generate 2D coords                 Results12       9.39        4.88
Generate topological fingerprints  Results16      78.53       51.80
Generate morgan fingerprints       Results16       3.31        1.61
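
The relative speedup for each step follows directly from the two sets of timings above (Intel time divided by M1 Max time). A small sketch, with the numbers copied from the results; the 0.1 second cut-off is my own assumption, used to skip steps so short that rounding to two decimals dominates the ratio.

```python
# (Intel seconds, M1 Max seconds), copied from the results above.
TIMINGS = {
    "mols from smiles": (11.08, 4.11),
    "Writing: Canonical SMILES": (4.99, 2.16),
    "mols from sdf": (3.96, 1.60),
    "patterns from smiles": (0.04, 0.02),
    "Matching1: HasSubstructMatch": (22.94, 13.55),
    "Matching2: GetSubstructMatches": (22.94, 13.67),
    "reading SMARTS": (0.01, 0.01),
    "Matching3: HasSubstructMatch": (90.56, 56.01),
    "Matching4: GetSubstructMatches": (85.82, 50.21),
    "Writing: Mol blocks": (15.20, 7.35),
    "BRICS decomposition": (27.28, 13.51),
    "Generate 2D coords": (9.39, 4.88),
    "Generate topological fingerprints": (78.53, 51.80),
    "Generate morgan fingerprints": (3.31, 1.61),
}

MIN_SECONDS = 0.1  # assumption: shorter steps are too coarse to compare


def speedups(timings, min_seconds=MIN_SECONDS):
    """Return {step: Intel time / M1 Max time} for steps long enough to compare."""
    return {
        step: intel / m1
        for step, (intel, m1) in timings.items()
        if min(intel, m1) >= min_seconds
    }


for step, ratio in speedups(TIMINGS).items():
    print(f"{step:<35}{ratio:5.2f}x")
```

With these numbers the M1 Max is faster on every step that is long enough to compare, by a factor of roughly 1.5 to 2.7.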


These all measure single-core performance.

Pharmacelera have created an open-source Python script for conformation generation, genConf.py. The script generates conformations and then applies a number of filters to give a diverse selection of reasonable conformations. This is a very typical workflow and as such is a good measure of the likely performance benefit.
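
To give a feel for the kind of work involved, here is a minimal sketch of a generate-then-filter conformer workflow using RDKit. It is not the genConf.py code: the function names, the number of conformers, the 10 kcal/mol energy window and the 0.5 Å RMSD threshold are all assumptions chosen for illustration, and it assumes MMFF parameters are available for the molecule.

```python
def select_diverse(energies, rms, energy_window=10.0, rms_threshold=0.5):
    """Greedy filter: keep low-energy conformers that differ from those already kept.

    energies: conformer energies, indexed by conformer position
    rms:      callable (i, j) -> RMSD between conformers i and j
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    if not order:
        return []
    lowest = energies[order[0]]
    kept = []
    for i in order:
        if energies[i] - lowest > energy_window:
            break  # sorted by energy, so the rest are outside the window too
        if all(rms(i, j) >= rms_threshold for j in kept):
            kept.append(i)
    return kept


def diverse_conformers(smiles, num_confs=50, seed=42):
    """Embed, minimise and filter conformers for one molecule."""
    # Imported here so select_diverse can be tried without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem, rdMolAlign

    parsed = Chem.MolFromSmiles(smiles)
    if parsed is None:
        raise ValueError(f"could not parse SMILES: {smiles}")
    mol = Chem.AddHs(parsed)

    AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=seed)
    # One (not_converged, energy) tuple per conformer, energies in kcal/mol.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol)
    energies = [energy for _, energy in results]

    conf_ids = [conf.GetId() for conf in mol.GetConformers()]
    heavy = Chem.RemoveHs(mol)  # compare heavy atoms only

    def rms(i, j):
        return rdMolAlign.GetBestRMS(
            heavy, heavy, prbId=conf_ids[i], refId=conf_ids[j]
        )

    kept = select_diverse(energies, rms)
    return mol, [conf_ids[i] for i in kept]
```

Calling diverse_conformers("CCCCO") would return the molecule together with the IDs of the conformers that survive both filters. Each molecule is processed independently, so a run over many structures is dominated by embedding and force-field minimisation time.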

genConf.py script workflow generated by Pharmacelera

Again I used a selection of 1000 random structures from ChEMBL.

The Intel MacBook Pro took 4 hours 43 minutes.
The M1 Max MacBook Pro took 2 hours 46 minutes.
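
As a quick check on what those two times mean in relative terms, the arithmetic can be done with the standard library (the variable names are just for illustration):

```python
from datetime import timedelta

intel = timedelta(hours=4, minutes=43)
m1_max = timedelta(hours=2, minutes=46)

ratio = intel / m1_max   # dividing two timedeltas gives a plain float
saved = intel - m1_max   # time saved on the M1 Max

print(f"M1 Max is {ratio:.2f}x faster, saving {saved}")
```

That works out to about 1.7 times faster, a saving of 1 hour 57 minutes, which is in the same range as the single-core RDKit timings.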

List of tools tested https://www.macinchem.org/reviews/MacBooks/m1macbookpromax.php

Last update 29 November 2021