Macs in Chemistry

Insanely great science

 

Comparing OpenBabel on an M1 Max MacBook Pro with an Intel MacBook Pro (2016)

OpenBabel

Open Babel: An open chemical toolbox. Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. Cheminformatics nodes for KNIME are also available.

Authors: Noel M O'Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch and Geoffrey R Hutchison
Journal of Cheminformatics 2011, 3:33
DOI: https://doi.org/10.1186/1758-2946-3-33

It is used extensively in nearly 50 projects (http://openbabel.org/wiki/Related_Projects), and installers are available for Linux, Mac OS X and Windows.

OpenBabel is written in C++ and the source code is available. Bindings are also available to allow scripting access from Java, .NET, Perl, Python or Ruby.

OpenBabel was installed using miniconda:

conda install -c conda-forge openbabel

File conversion

To test file conversion I used a selection of structures from ChEMBL: 2D structures in SDF format, with molecular weight 250 to 500 and calculated LogP 0 to 5. This is a 2.6 GB file containing 1,144,624 molecules.

The command used was

time obabel -isdf 'ChEMBLsubset.sdf' -osmiles -O 'ChEMBLsubset.smiles'
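For anyone who prefers to script the timing, the same measurement can be sketched in Python. This is a minimal sketch, not the method used for the results here: time_command and conversion_command are hypothetical helper names, and perf_counter reports wall-clock time only, whereas the shell's time also reports user and system time. The demo runs a stand-in command so the sketch works without OpenBabel installed.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (elapsed wall-clock seconds, return code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


def conversion_command(infile, outfile):
    """Build the SDF-to-SMILES command shown above as an argument list."""
    return ["obabel", "-isdf", infile, "-osmiles", "-O", outfile]


if __name__ == "__main__":
    # Stand-in command (assumption): starts a Python interpreter that does nothing.
    # With OpenBabel on the PATH, pass
    # conversion_command("ChEMBLsubset.sdf", "ChEMBLsubset.smiles") instead.
    elapsed, code = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f} s (exit code {code})")
```

Passing the command as a list avoids shell quoting issues with file names.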

Generating 3D structures

To test generating 3D structures I took 1000 random structures from ChEMBL as 2D structures in SDF format and generated an SDF file containing 3D structures.

The command used was

time obabel -isdf ChEMBL1000_2D.sdf -osdf -O ChEMBL1000_3D.sdf --gen3D
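How the random subset was drawn is not stated. One way to pick a fixed number of records from a large SDF file in a single pass is reservoir sampling, sketched below. It relies only on the fact that each SDF record ends with a line containing $$$$. The function names are hypothetical, and the demo input is a placeholder rather than real molecule data.

```python
import io
import random


def iter_sdf_records(handle):
    """Yield each SDF record as one string, including its '$$$$' terminator line."""
    lines = []
    for line in handle:
        lines.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []


def sample_records(handle, k, seed=None):
    """Return k records chosen uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(iter_sdf_records(handle)):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Placeholder input (assumption): five dummy records, not valid molecules.
    fake = "".join(f"mol{i}\n$$$$\n" for i in range(5))
    picked = sample_records(io.StringIO(fake), k=2, seed=0)
    print(len(picked), "records sampled")
```

For a real file, open it with open(path) and pass the handle with k=1000. Only the reservoir is held in memory, so this scales to multi-gigabyte files.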

Generating conformations

The next test was to generate conformations using a genetic algorithm. This is a stochastic conformer generator that generates diverse conformers on either an energy or an RMSD basis.

The command used was

time obabel myfile.sdf -O ga_conformers.sdf --conformer --nconf 100 --score rmsd --writeconformers

Filter based on a calculated property

The command line option --filter restricts conversion to only those molecules which meet specified chemical (and other) criteria. It makes it easy to select a subset of molecules. The information to do this can come either from properties imported with the molecule, for example from an SDF file, or from calculations made by OpenBabel on the molecule. The test was run on 10,000 random structures from ZINC.

time obabel -isdf ZincRandom10K.sdf -osdf -O filtered.sdf --filter "MW<300"
4495 molecules converted
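To illustrate what filtering on a property imported with the molecule means, here is a small pure-Python sketch that reads the data items of an SDF record and applies a numeric threshold. It is not how OpenBabel implements --filter, and it does not calculate anything from the structure. The tag name MolWt and the demo record are hypothetical.

```python
import re

# An SDF data header line starts with '>' and carries the tag name in <...>.
TAG_LINE = re.compile(r"^>.*<([^>]+)>")


def read_properties(record):
    """Return the data items of one SDF record as a {tag: first value line} dict."""
    props = {}
    lines = record.splitlines()
    for i, line in enumerate(lines[:-1]):
        match = TAG_LINE.match(line)
        if match:
            props[match.group(1)] = lines[i + 1].strip()
    return props


def keep_record(record, tag, limit):
    """True if the record carries a numeric `tag` value below `limit`."""
    value = read_properties(record).get(tag)
    try:
        return value is not None and float(value) < limit
    except ValueError:
        return False


if __name__ == "__main__":
    # Placeholder record (assumption): the structure block is a stub, and
    # "MolWt" is a made-up tag standing in for a property stored in the file.
    record = "demo\n  comment\n\nM  END\n>  <MolWt>\n180.16\n\n$$$$\n"
    print(keep_record(record, "MolWt", 300))
```

The sketch keeps only the first value line of each data item, which is enough for single-valued numeric properties.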

Generating a Fastsearch file

OpenBabel provides a format called fs (the fastsearch index), which should be used when searching large datasets (like ChEMBL) for molecules similar to a particular query. There are faster ways of searching (such as using a chemical database), but FastSearch is convenient and should give reasonable performance for most people. Generating the initial fastsearch index takes a while, but subsequent searching is very fast.

The command used was

time obabel -isdf 'ChEMBLsubset.sdf' -ofs -O 'chemblefastsearch.fs'

The timings are shown in the table below.

Task                      Intel time      M1 Max time
File conversion           5 min 45 sec    2 min 52 sec
Convert to 3D             2 min 15 sec    1 min 41 sec
Generate conformations    27 sec          14 sec
Filter                    3.8 sec         1.7 sec
Generate fs               12.8 min        6.6 min


OpenBabel is single-threaded, so these commands do not test multi-core performance.
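The timings in the table can be turned into speed-up ratios with a few lines of Python. This is a simple sketch: the numbers are transcribed from the table and converted to seconds, and speedup is a hypothetical helper name.

```python
# Timings transcribed from the table, in seconds: (Intel, M1 Max).
TIMINGS = {
    "File conversion": (5 * 60 + 45, 2 * 60 + 52),
    "Convert to 3D": (2 * 60 + 15, 1 * 60 + 41),
    "Generate conformations": (27, 14),
    "Filter": (3.8, 1.7),
    "Generate fs": (12.8 * 60, 6.6 * 60),
}


def speedup(intel_seconds, m1_seconds):
    """Ratio of Intel time to M1 Max time; values above 1 mean the M1 Max is faster."""
    return intel_seconds / m1_seconds


for task, (intel, m1) in TIMINGS.items():
    print(f"{task:<24}{speedup(intel, m1):.2f}x")
```

On these figures the M1 Max is roughly 1.3 to 2.2 times faster, with 3D generation showing the smallest gain and filtering the largest.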

List of tools tested https://www.macinchem.org/reviews/MacBooks/m1macbookpromax.php

Last updated 30 Nov 2021