I've now written a couple of Jupyter notebooks, and one of the issues that has come up is how to share them in a way that keeps the results reproducible in an environment where components are updated regularly.
Binder is a collection of tools for building and executing version-controlled computational environments that contain code, data, and interactive front ends, like Jupyter notebooks. It's 100% open source.
At a high level, Binder is designed to make the following workflow as easy as possible:
- Users specify a GitHub repository
- Repository contents are used to build Docker images
- Containers are deployed on demand in the browser on a cluster running Kubernetes
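As a rough illustration of the first step, here is a minimal sketch of turning a GitHub repository into a launch link. The `binder_launch_url` helper, the placeholder repository names and the `/v2/gh/<owner>/<repo>/<ref>` URL pattern on mybinder.org are my assumptions rather than anything taken from the Binder post, so check the current Binder documentation before relying on it.

```python
from urllib.parse import quote


def binder_launch_url(owner: str, repo: str, ref: str = "master",
                      host: str = "https://mybinder.org") -> str:
    """Build a launch link for a GitHub repository (hypothetical helper).

    Assumes the public service uses the /v2/gh/<owner>/<repo>/<ref> pattern.
    """
    # Percent-encode each path segment so a ref such as "feature/x"
    # does not introduce an extra path separator.
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    return f"{host}/v2/gh/" + "/".join(parts)


if __name__ == "__main__":
    # "example-user" and "ames-notebooks" are placeholder names.
    print(binder_launch_url("example-user", "ames-notebooks"))
```

Pinning `ref` to a tag or commit rather than a moving branch is what ties the link to one version-controlled state of the repository.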
Common use cases include:
- sharing scientific work
- sharing journalism
- running tutorials and demos with minimal setup
- teaching courses
If you want to find out more, have a look at this blog post by the developers.
I've been experimenting with the use of Jupyter Notebooks (aka iPython Notebooks) as an electronic lab notebook and also as a means to share computational models. The aim is to see how easy it would be to share a model along with its training data and an explanation of how the model was built and how it can be used for novel molecules.
The Ames test is a widely employed method that uses bacteria to test whether a given chemical can cause mutations in the DNA of the test organism. More formally, it is a biological assay to assess the mutagenic potential of chemical compounds (PNAS. 70 (8): 2281–5).
In this first notebook, a random forest model to predict Ames activity is described…
The Molecular Design Toolkit is an open source environment that aims to seamlessly integrate molecular simulation, visualization and cloud computing. It offers access to a large and still-growing set of computational modelling methods with a science-focused Python API that can be easily installed using pip. It is ideal for building into a Jupyter notebook. The API is designed to handle both small molecules and large biomolecular structures, and both molecular mechanics and QM calculations.
There is a series of YouTube videos describing some of the functionality in more detail, starting with this introduction.
This blog post looks very interesting: a notebook environment for coding and data visualisation, based on Jupyter (aka iPython) notebooks.
With nteract, you can create documents, that contain executable code, textual content, and images, and convey a computational narrative. Unlike Jupyter, your documents are stand-alone, cross-platform desktop applications, providing a seamless desktop experience and offline usage.
More details are on GitHub.
I’ve just been made aware of an issue with one of the calculated properties iPython Notebooks.
The latest update to Pandas (0.18.1) restructured part of the pandas API: the "format" module was moved from pandas.core to pandas.formats.
The consequence is that PandasTools now raises an error on attempting to import molecules into a data frame.
    from rdkit.Chem import PandasTools
    df = PandasTools.LoadSDF("demo.sdf")

    AttributeError                            Traceback (most recent call last)
    /Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
        341             method = _safe_get_formatter_method(obj, self.print_method)
        342             if method is not None:
    --> 343                 return method()
        344             return None
        345         else:

    /Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/pandas/core/frame.py in _repr_html_(self)
        566
        567             return self.to_html(max_rows=max_rows, max_cols=max_cols,
    --> 568                                 show_dimensions=show_dimensions, notebook=True)
        569         else:
        570             return None

    /usr/local/Cellar/rdkit-python/2016.03.1/lib/python3.5/site-packages/rdkit/Chem/PandasTools.py in patchPandasHTMLrepr(self, **kwargs)
        129   Patched default escaping of HTML control characters to allow molecule image rendering dataframes
        130   '''
    --> 131   formatter = pd.core.format.DataFrameFormatter(self,buf=None,columns=None,col_space=None,colSpace=None,header=True,index=True,
        132                                                 na_rep='NaN',formatters=None,float_format=None,sparsify=None,index_names=True,
        133                                                 justify = None, force_unicode=None,bold_rows=True,classes=None,escape=False)

    AttributeError: module 'pandas.core' has no attribute 'format'
At the moment the only solution is to make sure you are using Pandas version 0.18.0:
    pip uninstall pandas
    pip install pandas==0.18.0
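Pinning the version is the workaround I'm using, but for code you control, a module move like this can also be absorbed with a small import fallback. The sketch below is only an illustration of that general technique: `first_importable` is a hypothetical helper of mine, not part of pandas or RDKit, and it does not repair PandasTools itself, which still refers to `pd.core.format` internally.

```python
import importlib
from types import ModuleType
from typing import Iterable


def first_importable(candidates: Iterable[str]) -> ModuleType:
    """Return the first module in `candidates` that imports cleanly.

    Hypothetical helper: list the new location first and the old one second.
    """
    tried = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers both a missing package and a missing submodule.
            tried.append(name)
    raise ImportError("none of these modules could be imported: " + ", ".join(tried))


# Intended use with the two locations named in the report above
# (left as a comment so the sketch runs without pandas installed):
#   fmt = first_importable(["pandas.formats.format", "pandas.core.format"])
#   formatter_class = fmt.DataFrameFormatter
```

The order of the candidate list matters: putting the newer path first means an upgraded installation never touches the old location.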
I'm making increasing use of iPython notebooks and this package looks like it will be very useful.
nglview is a Python package that makes it easy to visualize molecular systems, including trajectories, directly in the Jupyter Notebook. The recent 0.4.0 release of nglview brings a convenient interface for visualizing MDAnalysis Universe and AtomGroup objects directly:
The notebook widget allows you to rotate and zoom the molecule and lets you select atoms by clicking on the molecule.
It is easily installed using pip:
pip install nglview
There have been a number of comments and responses via Twitter highlighting this superb demo.
The project is on GitHub, feel free to contribute!
With the release of ChEMBL 21 has come a set of updated target prediction models.
The good news is that, besides the increase in terms of training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues while trying to use the models.
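Since the models are tied to particular library versions, it can be worth checking what is installed before loading them. This is a minimal sketch using only the standard library (Python 3.8 or later); `version_mismatches` is a hypothetical helper, and the distribution names passed to it are assumptions, since a package installed outside pip may not report metadata under the name you expect.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each distribution that differs to (wanted, installed).

    `installed` is None when no metadata is found for the distribution.
    A wanted version of "0.17" accepts "0.17" and any "0.17.x" release.
    """
    mismatches = {}
    for dist, wanted in expected.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None
        matches = installed is not None and (
            installed == wanted or installed.startswith(wanted + ".")
        )
        if not matches:
            mismatches[dist] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Version taken from the ChEMBL note above; the distribution name is an assumption.
    print(version_mismatches({"scikit-learn": "0.17"}))
```

An empty dictionary means everything matched; anything else lists what was wanted against what was found.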
I've been using the models and I thought I'd share an iPython Notebook I have created. This is based on the ChEMBL notebook with code tidbits taken from the absolutely invaluable Stack Overflow. I'm often in the situation where I actually want to know the predicted activity at specific targets, and specifically want to confirm lack of predicted activity at potential off-targets. I could have a notebook for each target but actually the speed of calculation means that I can calculate all the models and then just cherry pick those of interest.
I've been making increasing use of iPython notebooks, both as a way to perform calculations and as a way of cataloguing the work that I've been doing. One thing I seem to be doing quite regularly is calculating physicochemical properties for libraries of compounds and then creating a trellis of plots to show each of the calculated properties. In the past I've done this with a series of AppleScripts using several applications. This seemed an ideal task to try out using an iPython notebook.
There is a great blog article on ChEMBL-og, describing their work evaluating chemical structure based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.
The article does a great job in explaining the logic behind improving the search performance.
They also provide an iPython notebook so you can try it yourself.
The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date a lot of people around the world have made contributions and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, it is often spread over multiple web pages and can be tricky to link molecule with identifier with data. Over the last couple of months a significant effort has been put into populating a spreadsheet with all the information.
I've recently published a Vortex script to access the information, and I've now published an iPython notebook that also shows how to import the data. Why not give it a try and then contribute your findings and suggestions to the Open Source Malaria project?