Macs in Chemistry

Insanely Great Science

Publishing computational notebooks with Binder


I've now written a couple of Jupyter notebooks, and one issue that has come up is how to share them in a way that ensures the results remain reproducible in an environment where components are updated regularly.

Binder is a collection of tools for building and executing version-controlled computational environments that contain code, data, and interactive front ends, like Jupyter notebooks. It's 100% open source.

At a high level, Binder is designed to make the following workflow as easy as possible:

  • Users specify a GitHub repository
  • Repository contents are used to build Docker images
  • Containers are deployed on demand in the browser on a cluster running Kubernetes
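As a rough illustration of the first step, the sketch below builds a launch link for a GitHub repository. It assumes the public mybinder.org service and its `/v2/gh/<user>/<repo>/<ref>` URL pattern, neither of which is spelled out in the text above; the function name and the example repository are hypothetical.

```python
from urllib.parse import quote


def binder_launch_url(user, repo, ref, host="https://mybinder.org"):
    """Build a Binder launch URL for a GitHub repository.

    Assumes the /v2/gh/<user>/<repo>/<ref> pattern used by the public
    mybinder.org deployment; `ref` is a branch, tag or commit hash.
    """
    # Percent-encode each part so a ref such as "feature/x" stays one path segment.
    parts = [quote(part, safe="") for part in (user, repo, ref)]
    return f"{host}/v2/gh/" + "/".join(parts)


# Hypothetical repository, used purely for illustration.
url = binder_launch_url("example-user", "example-notebooks", "main")
print(url)
```

Pointing the link at a tag or commit hash rather than a moving branch is one way to keep the shared environment fixed.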

Common use cases include:

  • sharing scientific work
  • sharing journalism
  • running tutorials and demos with minimal setup
  • teaching courses


If you want to find out more, have a look at this blog post by the developers.


Predicting AMES activity Jupyter Notebook


I've been experimenting with Jupyter Notebooks (aka iPython Notebooks) both as an electronic lab notebook and as a means of sharing computational models. The aim is to see how easy it is to share a model along with its training data and an explanation of how the model was built and how it can be used for novel molecules.

The Ames test is a widely employed method that uses bacteria to test whether a given chemical can cause mutations in the DNA of the test organism. More formally, it is a biological assay to assess the mutagenic potential of chemical compounds. PNAS. 70 (8): 2281–5. doi

In this first notebook, a random forest model to predict Ames activity is described….


Molecular Design Toolkit


The Molecular Design Toolkit is an open source environment that aims to seamlessly integrate molecular simulation, visualization and cloud computing. It offers access to a large and still-growing set of computational modelling methods through a science-focused Python API that can be easily installed using pip. It is ideal for building into a Jupyter notebook. The API is designed to handle both small molecules and large biomolecular structures, with molecular mechanics and QM calculations.


There is a series of YouTube videos describing some of the functionality in more detail, starting with this introduction.


nteract: a desktop-based, interactive computing application


This blog post looks very interesting: a notebook environment for coding and data visualisation, based on Jupyter (aka iPython) notebooks.

With nteract, you can create documents that contain executable code, textual content, and images, and convey a computational narrative. Unlike Jupyter, your documents are stand-alone, cross-platform desktop applications, providing a seamless desktop experience and offline usage.

nteract can run your existing Jupyter notebooks without any modification, and supports multiple Jupyter kernels: Python, R, Julia, and JavaScript. Being a native Jupyter notebook, nteract applications can be easily saved to Domino, versioned, shared, and if needed, run on high-performance machines in the cloud, in your VPC, or on-premise.

More details are on GitHub.


iPython Notebook issue


I’ve just been made aware of an issue with one of the Calculated properties iPython Notebooks.

In the latest update to Pandas, the relevant piece of the API was restructured: for 0.18.1 the "format" module was moved from pandas.core to pandas.formats.

The consequence is that PandasTools now raises an error on attempting to import molecules into a data frame.

from rdkit.Chem import PandasTools
df = PandasTools.LoadSDF("demo.sdf")

AttributeError                          Traceback (most recent call last)
/Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/IPython/core/ in __call__(self, obj)
    341             method = _safe_get_formatter_method(obj, self.print_method)
    342             if method is not None:
--> 343                 return method()
    344             return None
    345         else:

/Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/pandas/core/ in _repr_html_(self)
    567             return self.to_html(max_rows=max_rows, max_cols=max_cols,
--> 568                                 show_dimensions=show_dimensions, notebook=True)
    569         else:
    570             return None

/usr/local/Cellar/rdkit-python/2016.03.1/lib/python3.5/site-packages/rdkit/Chem/ in patchPandasHTMLrepr(self, **kwargs)
    129   Patched default escaping of HTML control characters to allow molecule image rendering dataframes
    130   '''
--> 131   formatter = pd.core.format.DataFrameFormatter(self,buf=None,columns=None,col_space=None,colSpace=None,header=True,index=True,
    132                                               na_rep='NaN',formatters=None,float_format=None,sparsify=None,index_names=True,
    133                                                justify = None, force_unicode=None,bold_rows=True,classes=None,escape=False)

AttributeError: module 'pandas.core' has no attribute 'format'

At the moment the only solution is to make sure you are using Pandas version 0.18.0:

pip uninstall pandas    
pip install pandas==0.18.0
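To catch the problem before the traceback appears, a notebook could start with a small version check such as the sketch below. The helper names are hypothetical, and the 0.18.0 cut-off is taken only from the situation described above, so it says nothing about other RDKit and Pandas combinations.

```python
import re

# The Pandas release that, per the note above, still works with PandasTools.
PINNED = (0, 18, 0)


def version_tuple(version):
    """Parse the leading 'X.Y[.Z]' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component (e.g. "0.18") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def newer_than_pin(version, pinned=PINNED):
    """Return True if `version` is later than the pinned release."""
    return version_tuple(version) > pinned


# Only inspect Pandas if it is actually installed in this environment.
try:
    import pandas as pd
except ImportError:
    pd = None

if pd is not None and newer_than_pin(pd.__version__):
    print(f"pandas {pd.__version__} is newer than 0.18.0; "
          "PandasTools may fail as described above")
```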




I came across jupyter-docker-pymol recently and thought I'd give it a mention. It is a container-based installation of PyMOL, with interaction through the browser via ipymol and Jupyter notebook (based on jupyter/notebook).

This project uses PyMOL and Python 3.



Molecular visualization in the Jupyter Notebook with nglview


I'm making increasing use of iPython notebooks and this package looks like it will be very useful.

nglview is a Python package that makes it easy to visualize molecular systems, including trajectories, directly in the Jupyter Notebook. The recent 0.4.0 release of nglview brings a convenient interface for visualizing MDAnalysis Universe and AtomGroup objects directly.

More details here…

The notebook widget allows you to rotate and zoom the molecule and lets you select atoms by clicking on the molecule.

It is easily installed using pip:

pip install nglview


There have been a number of comments and responses via Twitter highlighting this superb demo.


The project is on GitHub, feel free to contribute!


ChEMBL Models iPython Notebook


With the release of ChEMBL 21 comes a set of updated target prediction models.

The good news is that, besides the increase in terms of training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues while trying to use the models.

I've been using the models and I thought I'd share an iPython Notebook I have created. This is based on the ChEMBL notebook with code tidbits taken from the absolutely invaluable Stack Overflow. I'm often in the situation where I actually want to know the predicted activity at specific targets, and specifically want to confirm lack of predicted activity at potential off-targets. I could have a notebook for each target but actually the speed of calculation means that I can calculate all the models and then just cherry pick those of interest.
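The "calculate everything, then cherry pick" step might look something like the sketch below. It assumes the models yield one predicted probability of activity per target; the target names, the probabilities and the 0.5 cut-off are invented placeholders, not ChEMBL output.

```python
def cherry_pick(predictions, targets):
    """Keep only the predictions for the named targets, in the order given."""
    return {t: predictions[t] for t in targets if t in predictions}


def predicted_inactive(predictions, off_targets, threshold=0.5):
    """Return the off-targets whose predicted probability is below `threshold`."""
    return [t for t in off_targets
            if t in predictions and predictions[t] < threshold]


# Placeholder values standing in for the output of all the models.
all_predictions = {
    "TARGET_A": 0.91,
    "TARGET_B": 0.08,
    "TARGET_C": 0.47,
    "TARGET_D": 0.02,
}

of_interest = cherry_pick(all_predictions, ["TARGET_A", "TARGET_C"])
clean = predicted_inactive(all_predictions, ["TARGET_B", "TARGET_D"])
print(of_interest)
print(clean)
```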

Read on…


iPython Notebook to calc physicochemical properties


I've been making increasing use of iPython notebooks, both as a way to perform calculations and as a way of cataloguing the work that I've been doing. One thing I seem to be doing quite regularly is calculating physicochemical properties for libraries of compounds and then creating a trellis of plots to show each of the calculated properties. In the past I've done this with a series of AppleScripts using several applications. This seemed an ideal task to try out using an iPython notebook.




LSH-based similarity search in MongoDB is faster than postgres cartridge


There is a great blog article on ChEMBL-og, describing their work evaluating chemical structure based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.

The article does a great job in explaining the logic behind improving the search performance.

They also provide an iPython notebook so you can try it yourself.
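For readers new to locality-sensitive hashing, here is a minimal pure-Python sketch of the general idea, using MinHash signatures split into bands. It is a generic illustration and not necessarily the scheme used in the ChEMBL-og article or in MongoDB; the class name, parameters and toy fingerprints are all made up, and fingerprints are assumed to be non-empty sets of on-bit indices.

```python
import random
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


class MinHashLSH:
    """Toy LSH index: fingerprints sharing any signature band become candidates."""

    def __init__(self, n_bits, n_bands=16, rows_per_band=2, seed=0):
        rng = random.Random(seed)
        self.rows_per_band = rows_per_band
        # One random permutation of the bit positions per hash function.
        self.perms = []
        for _ in range(n_bands * rows_per_band):
            perm = list(range(n_bits))
            rng.shuffle(perm)
            self.perms.append(perm)
        self.buckets = defaultdict(set)
        self.fingerprints = {}

    def _band_keys(self, bits):
        # MinHash: smallest permuted position among the on bits.
        sig = [min(perm[b] for b in bits) for perm in self.perms]
        r = self.rows_per_band
        return [(i, tuple(sig[i:i + r])) for i in range(0, len(sig), r)]

    def add(self, name, bits):
        bits = frozenset(bits)
        self.fingerprints[name] = bits
        for key in self._band_keys(bits):
            self.buckets[key].add(name)

    def query(self, bits, threshold=0.7):
        bits = frozenset(bits)
        # Only fingerprints that share a bucket are scored exactly.
        candidates = set()
        for key in self._band_keys(bits):
            candidates |= self.buckets.get(key, set())
        scored = ((n, tanimoto(bits, self.fingerprints[n])) for n in candidates)
        return sorted((hit for hit in scored if hit[1] >= threshold),
                      key=lambda hit: -hit[1])


index = MinHashLSH(n_bits=64)
index.add("mol_a", {1, 5, 9, 20, 33})
index.add("mol_b", {1, 5, 9, 20, 34})
index.add("mol_c", {40, 41, 42, 43})
hits = index.query({1, 5, 9, 20, 33}, threshold=0.6)
print(hits)
```

The speed-up comes from computing the exact Tanimoto score only for the candidates that share a bucket with the query, at the cost of occasionally missing a true neighbour.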


Accessing Open Source Malaria Data using an iPython Notebook


The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date a lot of people around the world have made contributions and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, it is often spread over multiple web pages and can be tricky to link molecule with identifier with data. Over the last couple of months a significant effort has been put into populating a spreadsheet with all the information.

I've recently published a Vortex script to access the information, and I've now published an iPython notebook that also shows how to import the data. Why not give it a try and then contribute your findings and suggestions to the Open Source Malaria project?