Comparing OpenBabel on an M1 Max MacBook Pro with an Intel MacBook Pro (2016)
OpenBabel
Open Babel: An open chemical toolbox
Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. Cheminformatics nodes for KNIME are also available.
Authors: Noel M O'Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch and Geoffrey R Hutchison. Journal of Cheminformatics 2011, 3:33. DOI: https://doi.org/10.1186/1758-2946-3-33
OpenBabel is used extensively, in nearly 50 projects (http://openbabel.org/wiki/Related_Projects), and installers are available for Linux, macOS and Windows.
OpenBabel is written in C++ and the source code is available. Bindings are also available to allow scripting access from Java, .NET, Perl, Python or Ruby.
OpenBabel was installed using miniconda:
conda install -c conda-forge openbabel
File conversion
File conversion was tested on a selection of structures from ChEMBL: 2D structures in SDF format with molecular weight 250 to 500 and calculated LogP 0 to 5. This is a 2.6 GB file containing 1,144,624 molecules.
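Before timing a conversion it can be useful to confirm how many molecules an input file actually holds. The following is a minimal sketch, not part of the original benchmark: it relies only on the fact that each record in an SDF file ends with a line containing `$$$$`, and the function name `count_sdf_records` is made up for illustration.

```python
def count_sdf_records(path):
    """Count molecules in an SDF file by counting '$$$$' record terminators."""
    count = 0
    # Stream line by line so a multi-gigabyte file is never loaded into memory.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            # Strip only line endings so both Unix and Windows files work.
            if line.rstrip("\r\n") == "$$$$":
                count += 1
    return count


# Example call (assumes the file from the text is in the current directory):
# print(count_sdf_records("ChEMBLsubset.sdf"))
```

Because the file is read as a stream, memory use stays flat regardless of file size.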
The command used was
time obabel -isdf 'ChEMBLsubset.sdf' -osmiles -O 'ChEMBLsubset.smiles'
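The timings in this article come from the shell's `time` command. To repeat the measurements from a script, a small wrapper along these lines could be used. This is a sketch only: the helper names `time_command` and `format_elapsed` are invented for illustration, and the example call assumes `obabel` is on the PATH and the input file exists.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(args, check=True)
    return time.perf_counter() - start


def format_elapsed(seconds):
    """Format a duration in seconds as 'M min S.S s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)} min {secs:.1f} s"


# Example call mirroring the file conversion command above:
# elapsed = time_command(["obabel", "-isdf", "ChEMBLsubset.sdf",
#                         "-osmiles", "-O", "ChEMBLsubset.smiles"])
# print(format_elapsed(elapsed))
```

This measures wall-clock time only, which corresponds to the `real` figure reported by `time`.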
Generating 3D structures
To test 3D structure generation I took 1000 random structures from ChEMBL as 2D structures in SDF format and generated an SDF file containing 3D structures.
The command used was
time obabel -isdf ChEMBL1000_2D.sdf -osdf -O ChEMBL1000_3D.sdf --gen3D
Generating conformations
The next test was to generate conformations using a genetic algorithm. This is a stochastic conformer generator that produces diverse conformers on either an energy or an RMSD basis.
The command used was
time obabel myfile.sdf -O ga_conformers.sdf --conformer --nconf 100 --score rmsd --writeconformers
Filter based on a calculated property
The command line option --filter restricts conversion to only those molecules that meet specified chemical (and other) criteria, which makes it easy to select a subset of molecules. The information used can come either from properties imported with the molecule, for example from an SDF file, or from calculations made by OpenBabel on the molecule. The test was run on 10K random structures from ZINC.
time obabel -isdf ZincRandom10K.sdf -osdf -O filtered.sdf --filter "MW<300"
4495 molecules converted
Generating a Fastsearch file
OpenBabel provides a format called fs (the fastsearch index), which should be used when searching large datasets (like ChEMBL) for molecules similar to a particular query. There are faster ways of searching (such as using a chemical database), but FastSearch is convenient and should give reasonable performance for most people. Generating the initial fastsearch index takes a while, but subsequent searching is very fast.
The command used was
time obabel -isdf 'ChEMBLsubset.sdf' -ofs -O 'chemblefastsearch.fs'
The timings are shown in the table below.
Task | Intel time | M1 Max time |
---|---|---|
File conversion | 5 min 45 s | 2 min 52 s |
Convert to 3D | 2 min 15 s | 1 min 41 s |
Generate conformations | 27 s | 14 s |
Filter | 3.8 s | 1.7 s |
Generate fs | 12.8 min | 6.6 min |
OpenBabel is single-threaded, so these commands do not test multi-core performance.
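The timings in the table can be turned into speed-up ratios (Intel time divided by M1 Max time). The sketch below simply redoes that arithmetic using the figures from the table, converted to seconds. The names `timings` and `speedups` are illustrative only.

```python
# (Intel seconds, M1 Max seconds), transcribed from the table above.
timings = {
    "File conversion": (5 * 60 + 45, 2 * 60 + 52),
    "Convert to 3D": (2 * 60 + 15, 1 * 60 + 41),
    "Generate conformations": (27, 14),
    "Filter": (3.8, 1.7),
    "Generate fs": (12.8 * 60, 6.6 * 60),
}


def speedups(table):
    """Return the Intel / M1 Max time ratio for each task."""
    return {task: intel / m1 for task, (intel, m1) in table.items()}


for task, ratio in speedups(timings).items():
    print(f"{task}: {ratio:.2f}x")
```

On these figures the M1 Max is roughly 1.3 times faster for 3D generation and about 1.9 to 2.2 times faster for the other tasks.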
List of tools tested: https://www.macinchem.org/reviews/MacBooks/m1macbookpromax.php
Last updated 30 Nov 2021