Macs in Chemistry

Insanely great science

 

Python, Chemistry and a Mac 1

Python is rapidly becoming the scripting language of choice for scientists, and whilst SciPy and NumPy are probably the best-known scientific tools for Python, there are actually a huge number of Scientific Python Resources available. One that is useful for data analysis is pandas, an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

If you are planning to do much scripting in Python, it is worth investigating IPython, which provides a rich architecture for interactive computing.

Whilst there are several options for installing IPython, I used:

pip install ipython[all]

To check the installation, type:

iptest

I got an error along the lines of:

libpng warning: Application built with libpng-1.5.17 but running with 1.6.10

This seems to be a known problem. The suggested fix is to uninstall XQuartz, uninstall matplotlib and IPython, and then reinstall all three.

Uninstall XQuartz

launchctl unload /Library/LaunchAgents/org.macosforge.xquartz.startx.plist
sudo launchctl unload /Library/LaunchDaemons/org.macosforge.xquartz.privileged_startx.plist
sudo rm -rf /opt/X11* /Library/Launch*/org.macosforge.xquartz.* /Applications/Utilities/XQuartz.app /etc/*paths.d/*XQuartz
sudo pkgutil --forget org.macosforge.xquartz.pkg

Uninstall matplotlib and ipython

pip uninstall matplotlib
pip uninstall ipython

Reinstall matplotlib and ipython

pip install matplotlib
pip install ipython

Then reinstall XQuartz
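Once everything is reinstalled, it can help to confirm which of the relevant packages Python can import, and which versions it finds. The helper below is a minimal sketch; the name `check_imports` is made up for this example and is not part of IPython. It relies only on `importlib.import_module`, so it should run under both Python 2.7 and Python 3.

```python
import importlib

def check_imports(names):
    """Map each module name to its __version__, 'unknown', or None if it won't import."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None  # not installed, or the install is broken
        else:
            # Not every module defines __version__ (the stdlib's math doesn't).
            status[name] = getattr(module, "__version__", "unknown")
    return status

if __name__ == "__main__":
    results = check_imports(["IPython", "matplotlib", "numpy"])
    for name, version in sorted(results.items()):
        print("%-12s %s" % (name, version if version is not None else "NOT FOUND"))
```

Note that this only catches import failures; a warning such as the libpng one above is printed but does not stop the import.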

Python and Open Babel

Open Babel can be accessed via two Python modules:

The openbabel module: This contains the standard Python bindings automatically generated using SWIG from the C++ API.

The pybel module: this is a lightweight wrapper around the classes and methods in the openbabel module. Pybel provides more convenient and Pythonic ways to access the Open Babel toolkit. It is described in N.M. O’Boyle, C. Morley and G.R. Hutchison, "Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit", Chem. Cent. J. 2008, 2, 5 (DOI).

Pybel provides convenience functions and classes that make it simpler to use the Open Babel libraries from Python, especially for file input/output and for accessing the attributes of atoms and molecules.

A simple example

In a Terminal window, type:

python
Python 2.7.6 (default, Mar 13 2014, 10:34:57) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import openbabel
>>> import pybel
>>> mymol = pybel.readstring("smi","CCN(CC)CC")
>>> myWt = mymol.molwt
>>> myWt
101.19
>>>

Pybel Molecule objects have the following attributes: atoms, charge, data, dim, energy, exactmass, formula, molwt, spin, sssr, title and unitcell (if crystal data are present).
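The molwt attribute is the sum of average atomic masses over the molecule's formula. As a sketch, the hypothetical helper below recomputes it from a simple formula string. The atomic masses are an assumption (older IUPAC standard values), chosen because they reproduce both molwt values shown here: triethylamine above (C6H15N) and the second example below (sildenafil, C22H30N6O4S).

```python
import re

# Average atomic masses in g/mol. Assumed values: older IUPAC standard
# atomic weights, which match the molwt figures Pybel printed here.
ATOMIC_MASS = {
    "H": 1.00794,
    "C": 12.0107,
    "N": 14.0067,
    "O": 15.9994,
    "S": 32.065,
}

def molwt_from_formula(formula):
    """Sum average atomic masses for a simple formula such as 'C6H15N'.

    Only handles element symbols followed by optional counts: no brackets,
    charges or isotopes, and only the elements listed in ATOMIC_MASS.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

print(round(molwt_from_formula("C6H15N"), 2))       # 101.19 (triethylamine)
print(round(molwt_from_formula("C22H30N6O4S"), 4))  # 474.5764 (sildenafil)
```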

python
Python 2.7.6 (default, Mar 13 2014, 10:34:57) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> 
>>> import openbabel
>>> import pybel
>>> mymol = pybel.readstring('smi', "CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C")
>>> myWt = mymol.molwt
>>> myWt
474.57639999999975
>>> 
>>> for atom in mymol:
...     print atom.type, atom.coords
... 
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Nar (0.0, 0.0, 0.0)
Nar (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Nar (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Nar (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
O2 (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
Car (0.0, 0.0, 0.0)
So2 (0.0, 0.0, 0.0)
O2 (0.0, 0.0, 0.0)
O2 (0.0, 0.0, 0.0)
N3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
N3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
O3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
C3 (0.0, 0.0, 0.0)
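The listing above can be summarised by tallying the atom types. In the interactive session you could write `Counter(atom.type for atom in mymol)` directly; the sketch below instead uses the types copied from the printed output, so it runs without Open Babel. Hydrogens are implicit in the SMILES input, so only the 33 heavy atoms appear, and grouping the types by element recovers the heavy-atom part of sildenafil's formula. The type-to-element mapping is written by hand and covers only the types that occur here.

```python
from collections import Counter

# Atom types copied from the output above, in the order Pybel listed them.
printed_types = (
    "C3 C3 C3 Car Nar Nar Car Car Nar Car Nar Car O2 Car Car Car Car "
    "Car Car So2 O2 O2 N3 C3 C3 N3 C3 C3 C3 O3 C3 C3 C3"
).split()

type_counts = Counter(printed_types)

# Hand-written mapping from these Open Babel atom types to elements.
ELEMENT_OF_TYPE = {"C3": "C", "Car": "C", "N3": "N", "Nar": "N",
                   "O2": "O", "O3": "O", "So2": "S"}
element_counts = Counter()
for atom_type, n in type_counts.items():
    element_counts[ELEMENT_OF_TYPE[atom_type]] += n

print(len(printed_types))  # 33
for atom_type, n in sorted(type_counts.items()):
    print("%-4s %d" % (atom_type, n))
print(sorted(element_counts.items()))  # [('C', 22), ('N', 6), ('O', 4), ('S', 1)]
```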

The draw() method of a Molecule generates 2D coordinates and a 2D depiction of the molecule. With show=False and a filename, the image is written to that file rather than displayed:

>>> mymol.draw(show=False, filename='/Users/username/Desktop/mymol.png')


If a molecule does not have 3D coordinates, they can be generated using the make3D() method. By default, this includes 50 steps of a geometry optimisation using the MMFF94 forcefield.

>>> mymol.make3D()
>>> for atom in mymol:
...     print atom.type, atom.coords
... 
C3 (0.9102395320252875, -0.08216900571989316, -0.016101157549428615)
C3 (2.4249701004892388, -0.09477976547644144, -0.01626968214051937)
C3 (2.992466507832764, 1.2071469471134735, 0.5393608290047934)
Car (4.480486610263952, 1.2284444412922508, 0.5540253212300712)
....
>>>
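A quick sanity check on the generated geometry is to measure a bond length and a bond angle from the printed coordinates. The first three atoms are the propyl carbons at the start of the SMILES string, so the distances should come out near a typical C-C single bond (about 1.5 Å) and the angle roughly tetrahedral. The `distance` and `angle` helpers are plain-Python sketches written for this example.

```python
import math

# First three coordinates printed after make3D() above (angstroms).
c1 = (0.9102395320252875, -0.08216900571989316, -0.016101157549428615)
c2 = (2.4249701004892388, -0.09477976547644144, -0.01626968214051937)
c3 = (2.992466507832764, 1.2071469471134735, 0.5393608290047934)

def distance(a, b):
    """Euclidean distance between two (x, y, z) tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def angle(a, b, c):
    """Angle a-b-c in degrees, from the dot product of b->a and b->c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    return math.degrees(math.acos(dot / (distance(a, b) * distance(c, b))))

print(round(distance(c1, c2), 3))     # C1-C2 bond length
print(round(distance(c2, c3), 3))     # C2-C3 bond length
print(round(angle(c1, c2, c3), 1))    # C1-C2-C3 angle
```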

Using the IPython Notebook

One of the advantages of using the IPython notebook is the inline rendering of structures, either as a 2D depiction or as a 3D structure that can be rotated.


Pybel can, of course, be combined with other Python libraries. The following script was created in the IPython notebook, which you can download here.

In [43]:
import openbabel
import pybel
import pandas as pd
%matplotlib inline

In [68]:
mymols = []
for mymol in pybel.readfile("sdf", "/Users/swain/Projects/PublishedFragments/frag.sdf"):
    mymols.append(mymol)

print 'Read %s molecules' % len(mymols)   

Read 709 molecules
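Pybel does the parsing for you, but it can be reassuring to cross-check the record count independently. In an SD file every record ends with a line containing only `$$$$`, and the first line of each record is its title, so splitting on those terminators should give the same 709 records if Pybel read every one. A minimal pure-Python sketch (the helper names are hypothetical):

```python
def split_sdf_records(text):
    """Yield each record of SD-file text as a list of lines.

    Records are terminated by a line containing only '$$$$'. Any trailing
    lines after the last terminator are ignored in this sketch.
    """
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)

def sdf_titles(text):
    """Return the first (title) line of every record."""
    return [rec[0] if rec else "" for rec in split_sdf_records(text)]
```

To use it on the fragment file, read the whole file into a string (for example with `open(path).read()`) and check `len(sdf_titles(...))` against the count Pybel reported.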

We can now use Open Babel to calculate a variety of properties and descriptors.

In [69]:
descvalues = []
for mol in mymols:
    descvalues.append(mol.calcdesc())

print [k for k in descvalues[0]]

['TPSA', 'smarts', 'HBD', 'nF', 'logP', 'title', 'MW', 'tbonds', 'cansmi', 'InChI', 'formula', 'InChIKey', 'bonds', 'atoms', 'L5', 'HBA1', 'HBA2', 'sbonds', 'cansmiNS', 'dbonds', 's', 'MP', 'MR', 'abonds']

We now load the descriptors into a pandas DataFrame and plot logP as a histogram.

In [66]:
df2 = pd.DataFrame(descvalues)
df2["logP"].hist()

Out[66]:
<matplotlib.axes.AxesSubplot at 0x112a2f890>
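By default, pandas' `Series.hist()` draws the values into 10 equal-width bins spanning the data range. The sketch below shows what that binning amounts to in plain Python; the function name and the example logP values are hypothetical and are not taken from the fragment set.

```python
def histogram_counts(values, bins=10):
    """Count values in `bins` equal-width bins spanning min(values)..max(values).

    As with numpy.histogram, the last bin also includes the maximum value.
    Degenerate case: if all values are identical, they all go in the first bin.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / float(bins)
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width)
        counts[min(index, bins - 1)] += 1  # clamp the maximum into the last bin
    return counts

# Illustrative logP values only.
example_logp = [-0.5, 0.2, 0.8, 1.1, 1.4, 1.9, 2.3, 2.6, 3.0]
print(histogram_counts(example_logp, bins=4))
```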

Installing Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field

A recent paper in J. Cheminformatics, "Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field" (DOI), describes a free and open-source tool for both computer-aided drug discovery (CADD) developers and researchers. ODDT is released under a permissive 3-clause BSD license for both academic and industrial use, and its source code, additional examples and documentation are available on GitHub.

ODDT (Open Drug Discovery Toolkit)

Programming language: Python

Other requirements:
    at least one of the toolkits:
        OpenBabel (2.3.2+)
        RDKit (2012.03)
    Python (2.7+)
    Numpy (1.6.2+)
    Scipy (0.10+)
    Sklearn (0.11+)
    ffnet (0.7.1+), only for neural network functionality

Installation of the toolkits is described here.
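Requirements such as "Scipy (0.10+)" are easy to misjudge when version strings are compared as text, because "0.9" sorts after "0.10". A minimal sketch of a numeric comparison, assuming simple dotted version numbers; the helper names and the `ODDT_MINIMUMS` table are made up for this example, with the minimum versions taken from the list above.

```python
import re

# Minimum versions copied from the ODDT requirements list above.
ODDT_MINIMUMS = {"numpy": "1.6.2", "scipy": "0.10", "sklearn": "0.11"}

def version_tuple(version):
    """Convert '1.6.2' or '0.14.0rc1' to a tuple of ints such as (1, 6, 2).

    Parsing stops at the first dot-separated piece that does not start with
    a digit; trailing text inside a piece (e.g. 'rc1') is ignored.
    """
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)

def meets_minimum(installed, minimum):
    """True if `installed` is at least `minimum`, comparing numerically."""
    return version_tuple(installed) >= version_tuple(minimum)

print("0.9" >= "0.10")                   # True: text comparison is misleading
print(meets_minimum("0.9", "0.10"))      # False
print(meets_minimum("0.15.1", "0.10"))   # True
```

To check an installed package you could then, for example, `import numpy` and call `meets_minimum(numpy.__version__, ODDT_MINIMUMS["numpy"])`.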

The easiest way to install ODDT on a Mac is to use pip:

pip install oddt

You may see messages suggesting that you upgrade some of the dependencies, such as SciPy; this can also be done with pip:

pip install --upgrade scipy

You can check that everything is working by running Python in a Terminal window:

python
Python 2.7.10 (default, Jun  3 2015, 09:19:56) 
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import oddt
>>> mol = oddt.toolkit.readstring('smi', 'Cc1c(cc(cc1[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-]')
>>> mol.atom_dict['atomtype']
array(['C.3', 'C.ar', 'C.ar', 'C.ar', 'C.ar', 'C.ar', 'C.ar', 'N.pl',
   'O.2', 'O.co', 'N.pl', 'O.2', 'O.co', 'N.pl', 'O.2', 'O.co'], 
  dtype='|S4')
>>> mol.atom_dict['isacceptor']
array([False, False, False, False, False, False, False, False,  True,
    True, False,  True,  True, False,  True,  True], dtype=bool)
>>>
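Because `atom_dict` returns NumPy arrays, the boolean `isacceptor` field can serve as a mask; something like `mol.atom_dict['atomtype'][mol.atom_dict['isacceptor']]` should pick out the acceptor atom types. The sketch below makes the same selection in plain Python on the arrays printed above (for 2,4,6-trinitrotoluene), showing that every nitro oxygen is flagged as an acceptor and none of the nitro nitrogens are.

```python
# Values copied from the ODDT session above.
atomtypes = ['C.3', 'C.ar', 'C.ar', 'C.ar', 'C.ar', 'C.ar', 'C.ar', 'N.pl',
             'O.2', 'O.co', 'N.pl', 'O.2', 'O.co', 'N.pl', 'O.2', 'O.co']
isacceptor = [False, False, False, False, False, False, False, False, True,
              True, False, True, True, False, True, True]

# Keep the atom types whose acceptor flag is True (what a boolean mask does).
acceptor_types = [t for t, flag in zip(atomtypes, isacceptor) if flag]
print(acceptor_types)  # ['O.2', 'O.co', 'O.2', 'O.co', 'O.2', 'O.co']
```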

The publication also includes a series of IPython notebooks to get you started.

Useful Resources

Scientific Python Resources: numeric and scientific resources for Python

Python Scientific Lecture Notes: a quick introduction to central tools and techniques.

Fernando Pérez and Brian E. Granger, "IPython: A System for Interactive Scientific Computing", Computing in Science and Engineering, vol. 9, no. 3, pp. 21-29, May/June 2007 (DOI). URL: http://ipython.org

RDKit tutorials as IPython notebooks

Last updated 23 June 2015