Macs in Chemistry

Insanely Great Science

RDKit conformer generation script


Pharmacelera have written a Python script to generate conformations with RDKit and made it available here.

Conformer generation is one of the first and most important steps in most ligand based experiments, particularly when the ligand’s 3D structure is unknown. For example, the quality of the conformers could affect the results of virtual screening experiments.


RDKit warning


I just saw this message on the RDKit mailing list and I thought I'd flag it.

I've noticed a problem with anaconda python on the Mac. This may also be a problem on linux, but I haven't tested that yet.

Due to some changes in the way the anaconda team is doing python builds, the most recent conda python builds seem to no longer work with the RDKit. The symptom is an error message like "Fatal Python error: PyThreadState_Get: no current thread" when you try to import the rdkit.

I've observed this for the newest 3.5 (3.5.4-hf91e95415) and 3.6 (3.6.2-hd0bf7f115) builds. A workaround is to downgrade to 3.5.3 (conda install python=3.5.3) or 3.6.1 (conda install python=3.6.1).
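If you want a script to warn you before the import fails, a minimal sketch along these lines might help. It is not from the mailing-list message: the function name is hypothetical, and it assumes that matching on the release number alone is a good enough proxy for the two conda builds named above.

```python
import sys

# Releases named in the message as affected. Assumption: the release number
# alone identifies the problem builds; the conda build hash is not checked.
AFFECTED_RELEASES = {(3, 5, 4), (3, 6, 2)}


def is_affected_release(version_info=None):
    """Return True if the given (or running) Python release is one of the affected ones."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major, minor and micro numbers.
    return tuple(version_info[:3]) in AFFECTED_RELEASES


if is_affected_release():
    print("This Python release may not work with the RDKit; "
          "consider downgrading to 3.5.3 or 3.6.1.")
```

Calling is_affected_release((3, 6, 2)) checks a specific release without needing that interpreter installed.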


RDKit and Python3


Greg Landrum posted the following to the RDKit users mailing list, and since a couple of the Jupyter Notebooks I've published make extensive use of RDKit, I thought I'd flag it.

As many of you are no doubt aware, the Python community plans to discontinue support for Python 2 in 2020. A growing number of projects in the Scientific Python stack are making the same transition and have made that explicit here:

I will be adding the RDKit to this list. The RDKit will switch to support only Python 3 by 2020. At some point between now and then - likely during the 2018.09 release cycle - we will create a maintenance branch for Python 2 that will continue to get bug fixes but will no longer have new Python features added. This branch will be maintained, and we will keep doing Python 2 builds, until 2020 when official Python 2 support ends.

Additionally, starting during the 2018.03 release cycle we will accept contributions for new features that are not compatible with Python 2 as long as those features are implemented in such a way that they don't break existing Python 2 code (more on this later). This will allow members of the RDKit community who have made the switch to Python 3 to start making use of the new features of the language in their RDKit contributions.

If you have not made the switch yet to Python 3: please read the web page I link to above and take a look at the list of projects that have committed to transition. The switch from Python 2 to Python 3 isn't always easy, but it's not getting any easier with time and you have a few years to complete it. There are a lot of online resources available to help.

Best Regards, -greg

The list of projects that will be making the transition so far includes: IPython, Jupyter notebook, pandas, Matplotlib, SymPy, Astropy, Software Carpentry, SunPy, xonsh, scikit-bio, PyStan, Axelrod, osBrain, PyMeasure, rpy2, PyMC3, FEniCS, An Introduction to Applied Bioinformatics, music21, QIIME, Altair, gala, cual-id, CIS.


Conformer generation


The generation of multiple conformations is an important step in a number of operations, from preparing input for ab initio calculations to providing input files for docking studies. A recent paper (DOI) compared seven freely available conformer ensemble generators: Balloon (two different algorithms), the RDKit standard conformer ensemble generator, the Experimental-Torsion basic Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK. It also provided a dataset of ligand conformations taken from the PDB.

A recent Twitter discussion involving Greg Landrum and David Koes prompted Greg to publish a blog post describing conformation generation within RDKit. The post compares plain distance geometry for selecting diverse conformations with an approach that combines distance geometry with experimental torsion-angle preferences obtained from small-molecule crystallographic data (ETKDG). He also looks at the impact of force-field minimisation.

A really interesting read with code provided.
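For anyone who wants to try this quickly, here is a minimal sketch of ETKDG embedding followed by force-field minimisation. It is not Greg's code: the function name, the example SMILES, the number of conformers and the random seed are my own illustrative choices, and it assumes a reasonably recent RDKit with MMFF parameters available for the molecule.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def generate_conformers(smiles, num_confs=10, seed=42):
    """Embed conformers with ETKDG, then minimise each one with MMFF94."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # Explicit hydrogens are needed for sensible 3D geometries.
    mol = Chem.AddHs(mol)

    params = AllChem.ETKDG()
    params.randomSeed = seed  # fixed seed so repeated runs give the same result
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, params=params))

    # Returns one (not_converged, energy) pair per conformer.
    results = AllChem.MMFFOptimizeMoleculeConfs(mol)
    energies = [energy for _not_converged, energy in results]
    return mol, conf_ids, energies


if __name__ == "__main__":
    mol, conf_ids, energies = generate_conformers("CCCCO", num_confs=5)
    for conf_id, energy in zip(conf_ids, energies):
        print(conf_id, round(energy, 2))
```

The energies are MMFF94 values in kcal/mol and can be used to rank or filter the ensemble; whether minimisation actually improves the conformers is exactly the question the blog post examines.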


RDKit and Conda install of PostgreSQL cartridge on Mac OS


There has been an interesting discussion about installing rdkit-postgresql95 on Mac OS X on the RDKit mailing list, and I thought it might be of wider interest.

Here's the resolution of the difficulties I was having installing rdkit-postgresql95 on Mac OS X. The problem turned out to be that the package originally posted used Py3.5, and I'm still using 2.7. I may change to 3.5 at some point, but Greg was kind enough to add a 2.7 version of the package.

So, the following invocations work to set up rdkit with the cartridge in a new env on Mac OS X. I'm on El Capitan, by the way, and for clarity, I've not tested the installation, but only checked that it completed successfully.

conda create -n rdk1 -c rdkit rdkit
. activate rdk1
conda install -c greglandrum rdkit-postgresql95

(The last command also installs postgresql 9.5.4-0.)


iPython Notebook issue


I’ve just been made aware of an issue with the Calculated properties iPython Notebook.

The latest update to Pandas restructured the relevant part of the API: in 0.18.1 the "format" module was moved from pandas.core to pandas.formats.

The consequence is that PandasTools now raises an error on attempting to import molecules into a data frame.

from rdkit.Chem import PandasTools
df = PandasTools.LoadSDF("demo.sdf")

AttributeError                          Traceback (most recent call last)
/Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/IPython/core/ in __call__(self, obj)
    341             method = _safe_get_formatter_method(obj, self.print_method)
    342             if method is not None:
--> 343                 return method()
    344             return None
    345         else:

/Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/pandas/core/ in _repr_html_(self)
    567             return self.to_html(max_rows=max_rows, max_cols=max_cols,
--> 568                                 show_dimensions=show_dimensions, notebook=True)
    569         else:
    570             return None

/usr/local/Cellar/rdkit-python/2016.03.1/lib/python3.5/site-packages/rdkit/Chem/ in patchPandasHTMLrepr(self, **kwargs)
    129   Patched default escaping of HTML control characters to allow molecule image rendering dataframes
    130   '''
--> 131   formatter = pd.core.format.DataFrameFormatter(self,buf=None,columns=None,col_space=None,colSpace=None,header=True,index=True,
    132                                                na_rep='NaN',formatters=None,float_format=None,sparsify=None,index_names=True,
    133                                                justify = None, force_unicode=None,bold_rows=True,classes=None,escape=False)

AttributeError: module 'pandas.core' has no attribute 'format'

At the moment the only solution is to make sure you are using Pandas version 0.18.0:

pip uninstall pandas    
pip install pandas==0.18.0
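For code that has to run against more than one Pandas version, one general pattern is to try each known module location in turn. The sketch below is only an illustration of that pattern, not the fix that was applied to PandasTools; the helper name is hypothetical and the two module paths are simply the ones named in the message above.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is missing: try the next location.
            continue
    raise ImportError("none of these modules could be imported: "
                      + ", ".join(candidates))


# Locations from the message above: the 0.18.1 path first, then the older one.
PANDAS_FORMAT_LOCATIONS = ["pandas.formats.format", "pandas.core.format"]

# With an affected Pandas installed, this would pick whichever path exists:
#     fmt = import_first_available(PANDAS_FORMAT_LOCATIONS)
#     formatter_class = fmt.DataFrameFormatter

# Demonstration with standard-library modules only.
module = import_first_available(["no_such_module_for_demo", "json"])
print(module.__name__)
```

The two paths only cover the versions discussed here, so other Pandas releases may need additional entries in the list.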


SAR visualization with RDKit


One of the challenges in using machine learning models to understand structure-activity relationships (SAR) is providing a chemist-friendly visualisation. This excellent blog post describes how to colour code the parts of molecules that are predicted to contribute to an activity.



RDKit updated


RDKit has been updated.

If you used Homebrew to install RDKit as described here, updating is very simple:

brew update
brew upgrade rdkit

You can check which version you have installed using:

MacPro> python
Python 2.7.11 (default, Dec 23 2015, 16:11:50) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import rdBase
>>> print rdBase.rdkitVersion


iPython Notebook to calc physicochemical properties


I've been making increasing use of iPython notebooks, both as a way to perform calculations and as a way of cataloguing the work that I've been doing. One thing I seem to be doing quite regularly is calculating physicochemical properties for libraries of compounds and then creating a trellis of plots to show each of the calculated properties. In the past I've done this with a series of AppleScripts using several applications. This seemed an ideal task to try out using an iPython notebook.




Chemical similarity search in MongoDB


MongoDB (from "humongous") is an open-source, document-oriented database.

Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

As you might expect, chemical searching is not something that is traditionally supported, but there have been a couple of blog articles describing initial efforts, and there is now a detailed step-by-step description available. The post describes the implementation of chemical similarity searching using MongoDB and RDKit fingerprints; it also has some initial comparisons with the more traditional SQL implementation using the RDKit PostgreSQL cartridge.
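To give a feel for what a fingerprint similarity search involves, here is a minimal pure-Python sketch. It is not the MongoDB implementation from the post: it assumes each fingerprint is stored as a set of "on" bit positions, uses the Tanimoto coefficient as the similarity measure, and the function names, compound names and threshold are illustrative choices of mine.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(query_bits, library, threshold=0.7):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    hits = []
    for name, bits in library.items():
        score = tanimoto(query_bits, bits)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy library: hypothetical compound names mapped to on-bit positions.
library = {
    "cmpd_1": {1, 2, 3, 4},
    "cmpd_2": {1, 2, 3, 5},
    "cmpd_3": {9},
}
for name, score in similarity_search({1, 2, 3, 4}, library, threshold=0.5):
    print(name, score)
```

A database-backed version would do the same comparison server-side and avoid scoring every document, which is where the performance comparisons in the post come in.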


FMCS 1.0 - Find Maximum Common Substructure

Andrew Dalke has just released fmcs-1.0. It finds a maximum common substructure of two or more structures. Some of the features are:

  • handles 1,000s of structures
  • several different atom and bond comparison schemes
  • modifiers to require ring bonds only match ring bonds, or that incomplete rings are not allowed in the MCS
  • user-defined atom class typing through isotope labels (SMILES) or through an SD tag field
  • uses an exact solution to find a maximum common substructure
  • reports the current best solution if the timeout is reached

The software is distributed under the 2-clause BSD license and available for no charge from

You must have the Python bindings to RDKit in order to run fmcs.

Usage details are in the README, shown also in the project page at: