Macs in Chemistry

Insanely Great Science

Accessing Jupyter Notebook model from Vortex

 

I've become a great fan of Jupyter Notebooks as a way of modelling cheminformatics data, and I've published some of the notebooks here.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

In the predicting AMES activity notebook I also looked at the use of pickle to store the predictive model and then access it using a Jupyter notebook without the need to rebuild the model. Whilst a notebook is a nice way to access the predictive model it might also be useful to be able to access it from other applications or from the command line.

In this tutorial we look at providing command line access to the model and then incorporating it into a Vortex script.
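As a rough illustration of the idea, the sketch below pickles a model to disk and then reloads it from a small command-line entry point without rebuilding it. The toy linear model (a dict of weights and a bias), the function names and the file name are all placeholders invented for this example; in the notebook the pickled object is the trained predictive model.

```python
import argparse
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialise the model object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Reload a previously pickled model.

    Only unpickle files you created yourself: pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def predict(model, features):
    """Toy stand-in for a real model: returns 1 if the linear score is positive."""
    score = sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]
    return int(score > 0)


def main(argv=None):
    """Command-line entry point: model path followed by the feature values."""
    parser = argparse.ArgumentParser(description="Score one feature vector with a pickled model")
    parser.add_argument("model_path")
    parser.add_argument("features", nargs="+", type=float)
    args = parser.parse_args(argv)
    label = predict(load_model(args.model_path), args.features)
    print(label)
    return label


# Demonstration: build the (hypothetical) model once, then reuse it from the "command line".
demo_path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")
save_model({"weights": [1.0, -2.0], "bias": 0.5}, demo_path)
demo_label = main([demo_path, "3.0", "1.0"])
```

Saved as a script, the same main function could be driven by sys.argv, which is the sort of entry point another application can call as an external command.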

Scripting Vortex 38



Versions of python modules update

 

In the last post I asked about adding version numbers. Almost immediately I got a brilliant response.

Simply install version_information, using either

pip install version_information

or

conda install version_information

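Then load the extension in a notebook with %load_ext version_information and list the modules of interest with the %version_information magic.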


Versions of python modules

 

I'm in the process of updating the Jupyter notebooks to Python 3 and I'm looking at what I can do to make sure other people can reproduce the results. At the moment I annotate the imported Python modules with version numbers in the Jupyter notebook. Finding the versions is a bit tedious and I was wondering if there was some way to automate this.

from rdkit import Chem #rdkit 2016.03.5
from rdkit.Chem import PandasTools
import pandas as pd #pandas==0.17.1
import pandas_ml as pdml #pandas-ml==0.4.0
from rdkit.Chem import AllChem, DataStructs
import numpy #numpy==1.12.0
from sklearn.model_selection import train_test_split #scikit-learn==0.18.1
import subprocess
from io import StringIO #Python 3 location; in Python 2 this was the StringIO module
import pickle
import os
%matplotlib inline
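
One possible way to automate the annotations, offered as a minimal sketch and not as the approach used in the notebooks, is to ask the standard library for the installed version of each distribution. The helper name report_versions is made up for this example, the names passed in are the distribution names as pip knows them, and packages installed by other means may not always register metadata, in which case they are reported as not installed.

```python
from importlib import metadata  # standard library from Python 3.8


def report_versions(packages):
    """Return a dict mapping each distribution name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Nothing registered under this name in the current environment
            versions[name] = "not installed"
    return versions


# Print one annotation line per module, in the same style as the comments above.
for name, version in report_versions(["rdkit", "pandas", "numpy", "scikit-learn"]).items():
    print(f"#{name}=={version}")
```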

Python tutorials for OpenMM

 

This guide is a set of Jupyter notebooks intended to help researchers already familiar with molecular dynamics simulation learn how to use OpenMM in their research and software projects.

# For Mac OS X, substitute `MacOSX` for `Linux` below
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
export PATH=$HOME/miniconda/bin:$PATH


conda install --yes -c omnia -c conda-forge jupyter notebook openmm mdtraj nglview

There is a detailed document describing OpenMM here

OpenMM is a set of libraries that lets programmers easily add molecular simulation features to their programs, and an “application layer” that exposes those features to end users who just want to run simulations. Instructions for installation under MacOSX are here.

OpenMM works on Mac OS X 10.7 or later. OpenCL is supported on OS X 10.10.3 or later.



A workflow for docking/virtual screening part 2

 

In the previous workflow I described docking a set of ligands with known activity into a target protein. In this workflow we will be docking a set of ligands from the ZINC dataset in a search for novel ligands. Once the ligands are docked, the workflow moves on to finding vendors and selecting subsets for purchase.




A workflow for docking/virtual screening (updated)

 

Whilst high-throughput screening (HTS) has been the starting point for many successful drug discovery programs, the cost of screening, the lack of access to a large diverse sample collection, or the low throughput of the primary assay may preclude HTS as a starting point. In these cases, identifying a smaller selection of compounds with a higher probability of being a hit may be preferable. Directed or virtual screening is a computational technique used in drug discovery research to identify potential hits for evaluation in primary assays. It involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures that are most likely to be active against a drug target. The in silico screen can be based on similarity to known ligands or on docking ligands into the desired binding site.

In this workflow I'll be looking at using docking to identify potential hits.

I've updated the description to give more information about preparing the target protein.



A webinar demonstrating using Jupyter, the free iPython notebook

 

This is a recording of the March 2017 Global Health Compound Design meeting, a webinar demonstrating the use of Jupyter, the free IPython notebook.

https://youtu.be/XqyWctQxhNs

How to get started

Accessing Open Source Malaria data

Calculating physicochemical properties and plotting

Predicting AMES activity.




Publishing computational notebooks with Binder

 

I've now written a couple of Jupyter notebooks, and one of the issues that has come up is how to share them in a way that ensures the results will be reproducible in an environment where updates to components occur regularly.

Binder is a collection of tools for building and executing version-controlled computational environments that contain code, data, and interactive front ends, like Jupyter notebooks. It's 100% open source.

At a high level, Binder is designed to make the following workflow as easy as possible:

  • Users specify a GitHub repository
  • Repository contents are used to build Docker images
  • Containers are deployed on demand in the browser on a cluster running Kubernetes
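
For the first step, a small sketch of how a launch link for a repository might be put together is shown here. It assumes the public mybinder.org service and its /v2/gh/owner/repo/ref URL layout, neither of which is described in this post, and the function name and the repository used in the example are purely hypothetical.

```python
from urllib.parse import quote


def binder_launch_url(owner, repo, ref="HEAD", host="https://mybinder.org"):
    """Build a Binder launch URL for a GitHub repository.

    Assumes the /v2/gh/<owner>/<repo>/<ref> layout used by mybinder.org.
    Each part is percent-encoded so that, for example, a branch name
    containing a slash stays inside a single path segment.
    """
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    return f"{host}/v2/gh/" + "/".join(parts)


# Hypothetical repository, for illustration only
print(binder_launch_url("someuser", "somerepo", "main"))
```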

Common use cases include:

  • sharing scientific work
  • sharing journalism
  • running tutorials and demos with minimal setup
  • teaching courses


If you want to find out more have a look at this blog post by the developers.



Predicting AMES activity Jupyter Notebook

 

I've been experimenting with the use of Jupyter Notebooks (aka IPython Notebooks) both as an electronic lab notebook and as a means to share computational models. The aim is to see how easy it would be to share a model together with the associated training data and an explanation of how the model was built and how it can be used for novel molecules.

The Ames test is a widely employed method that uses bacteria to test whether a given chemical can cause mutations in the DNA of the test organism. More formally, it is a biological assay to assess the mutagenic potential of chemical compounds. PNAS. 70 (8): 2281–5. doi

In this first notebook a random forest model to predict AMES activity is described….



Molecular Design Toolkit

 

The Molecular Design Toolkit is an open source environment that aims to seamlessly integrate molecular simulation, visualization and cloud computing. It offers access to a large and still-growing set of computational modelling methods through a science-focused Python API that can be easily installed using pip. It is ideal for building into a Jupyter notebook. The API is designed to handle both small molecules and large biomolecular structures, and both molecular mechanics and QM calculations.


There is a series of YouTube videos describing some of the functionality in more detail, starting with this introduction.



nteract, a desktop-based interactive computing application

 

This blog post looks very interesting. It describes a notebook environment for coding and data visualisation based on Jupyter (aka IPython) notebooks.

With nteract, you can create documents that contain executable code, textual content, and images, and convey a computational narrative. Unlike Jupyter, your documents are stand-alone, cross-platform desktop applications, providing a seamless desktop experience and offline usage.

nteract can run your existing Jupyter notebooks without any modification, and supports multiple Jupyter kernels: Python, R, Julia, and JavaScript. Being a native Jupyter notebook, nteract applications can be easily saved to Domino, versioned, shared, and if needed, run on high-performance machines in the cloud, in your VPC, or on-premise.

More details are on GitHub.

