Macs in Chemistry

Insanely Great Science

Embedding LaTeX and MathML in Jupyter Notebooks

 

I've been using Jupyter notebooks for a little while, but I only recently found out that you can embed LaTeX or MathML in a notebook!

This notebook is simply a series of examples of what can be done. You can embed equations inline or on a separate line in a Markdown text cell, or render them from a code cell by importing Math or Latex from IPython.display.
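
In a Markdown cell the equation simply goes between dollar signs, $ … $ inline or $$ … $$ for a displayed equation. From a code cell the same thing can be done with IPython's display tools; a minimal sketch (the equations themselves are just illustrations):

# In a code cell: render LaTeX via IPython's display machinery
from IPython.display import Math, Latex

# A displayed equation
Math(r'F(k) = \int_{-\infty}^{\infty} f(x)\, e^{2\pi i k x}\, dx')

# Latex() accepts arbitrary LaTeX source, including inline maths
Latex(r'The mass-energy relation is $E = mc^2$.')

Only the last expression in a cell is rendered automatically; wrap earlier ones in display() if you need more than one per cell.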





Deep Replay

 

This looks rather neat: Deep Replay.

Deep Replay is a package designed to allow you to replay in a visual fashion the training process of a Deep Learning model in Keras.


To install Deep Replay just type:

pip install deepreplay
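
A rough sketch of how it wires into a Keras training run, based on the ReplayData callback and Replay class from the project README (the toy data, file name and group name below are purely illustrative):

from keras.models import Sequential
from keras.layers import Dense
from sklearn.datasets import make_moons
from deepreplay.callbacks import ReplayData
from deepreplay.replay import Replay

# Toy two-dimensional classification problem
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Small network whose training we want to replay later
model = Sequential([Dense(2, input_dim=2, activation='sigmoid', name='hidden'),
                    Dense(1, activation='sigmoid', name='output')])
model.compile(loss='binary_crossentropy', optimizer='sgd')

# The callback records the state of the network at every epoch to an HDF5 file
replay_data = ReplayData(X, y, filename='training.h5', group_name='part1')
model.fit(X, y, epochs=100, callbacks=[replay_data])

# Replay then reads that file back to build the visualisations
replay = Replay(replay_filename='training.h5', group_name='part1')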


ChEMBL 24 predictive models

 

Recently ChEMBL was updated to version 24; the update contains:

  • 2,275,906 compound records
  • 1,828,820 compounds (of which 1,820,035 have mol files)
  • 15,207,914 activities
  • 1,060,283 assays
  • 12,091 targets
  • 69,861 documents

In addition, today they released the predictive models built on the updated database; they can be downloaded from the ChEMBL FTP server: ftp://ftp.ebi.ac.uk/pub/databases/chembl/target_predictions

There are 1,569 models.
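
If you want to grab them programmatically the Python standard library is enough; a minimal sketch (the archive name below is made up, so list the directory first to see what is actually there):

from ftplib import FTP

# Connect to the ChEMBL FTP server and move to the target_predictions directory
ftp = FTP('ftp.ebi.ac.uk')
ftp.login()  # anonymous login
ftp.cwd('/pub/databases/chembl/target_predictions')

# See which files are available
print(ftp.nlst())

# Download one of them (file name assumed for illustration)
filename = 'chembl_24_models.tar.gz'
with open(filename, 'wb') as fh:
    ftp.retrbinary('RETR ' + filename, fh.write)
ftp.quit()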



Accessing a Jupyter Notebook HERG model from Vortex

 

A recent paper, "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI, described a classification model for hERG activity. I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub (https://github.com/AGPreissner/Publications).

The models were downloaded and the Random Forest Jupyter notebooks (using RDKit) were modified to save the generated model with pickle, and another Jupyter notebook was created to access the stored model without the need to rebuild it each time. This notebook was exported as a Python script to allow command-line access, and Vortex scripts were created that allow the user to run the model within Vortex, import the results and view the most significant features.
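
The pickle step is the key trick and is only a few lines; a sketch with toy stand-ins for the fingerprint matrix and labels (not the actual notebook code):

import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for the fingerprint matrix and hERG activity labels
X_train = np.random.randint(0, 2, size=(100, 2048))
y_train = np.random.randint(0, 2, size=100)

# Fit the random forest once...
model = RandomForestClassifier(n_estimators=500)
model.fit(X_train, y_train)

# ...and save it so it never needs to be rebuilt
with open('herg_rf_model.pkl', 'wb') as fh:
    pickle.dump(model, fh)

# Later, from another notebook or a command-line script:
with open('herg_rf_model.pkl', 'rb') as fh:
    model = pickle.load(fh)
print(model.predict(X_train[:5]))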

All models and scripts are available for download.

Full details are here…

[Image: hERG model results in Vortex]



Jupyter and Fortran

 

Well, after my last post about Swift and Jupyter, a reader sent me a link to the use of both the Julia and Fortran programming languages in a Jupyter notebook.

[Image: Fortran code running in a Jupyter notebook]

There is more information in this lecture, Project Jupyter: Architecture and Evolution of an Open Platform for Modern Data Science, by Fernando Perez.

Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to provide tools for human-in-the-loop interactive computing. It provides protocols, file formats, libraries and user-facing tools optimized for the task of humans interactively exploring problems with the aid of a computer, combining natural and programming languages in a common computational narrative.



Swift 4.1 in a Jupyter Notebook

 

I'm a great fan of Jupyter notebooks but I only ever use Python.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

A recent post by Ray Yamamoto Hilton caught my eye; he recently put together a little experiment to demonstrate using Swift 4.1 from within Jupyter notebooks.

You can download a demo notebook here.

[Image: Swift 4.1 running in a Jupyter notebook]



Downloading from the RCSB Protein Data Bank using Python

 

The RCSB Protein Data Bank is an absolutely invaluable resource that provides archive information about the 3D shapes of proteins, nucleic acids, and complex assemblies, helping scientists understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. Currently the PDB contains over 134,000 data files with structural information on 42,547 distinct protein sequences, of which 37,600 are human. They also provide a series of tools to search, view and analyse the data.

Downloading an individual PDB file is pretty trivial and can be done from the web page as shown in the image below. They also provide a Download Tool, launched as a stand-alone application using the Java Web Start protocol; the tool is downloaded locally and must then be opened. I've found this a little temperamental and have had issues with Java versions and security settings.

Since I've been making extensive use of the web services to interact with RCSB I decided to explore the use of Python to download multiple files. I started off creating a Jupyter notebook using the web services provided by RCSB.
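
A simpler variant just pulls the files over HTTP from the files.rcsb.org download URLs rather than going through the full web services; a minimal sketch (the PDB codes are only examples):

import requests

# Example PDB entries to fetch; replace with your own list
pdb_codes = ['1CRN', '4HHB', '1TIM']

for code in pdb_codes:
    url = 'https://files.rcsb.org/download/{}.pdb'.format(code)
    response = requests.get(url)
    response.raise_for_status()
    with open('{}.pdb'.format(code), 'w') as fh:
        fh.write(response.text)
    print('Downloaded', code)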

I've also used variations on this code to create a Python script and a Vortex script.

Full details are here …



Accessing Jupyter Notebook model from Vortex

 

I've become a great fan of Jupyter Notebooks as a way of modelling cheminformatics data, and I've published some of the notebooks here.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

In the predicting AMES activity notebook I also looked at the use of pickle to store the predictive model and then access it using a Jupyter notebook without the need to rebuild the model. Whilst a notebook is a nice way to access the predictive model it might also be useful to be able to access it from other applications or from the command line.

In this tutorial we look at providing command line access to the model and then incorporating it into a Vortex script.
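
The command-line wrapper follows the same pattern as the notebook: load the pickled model, featurise the input molecule and print the prediction. A rough sketch (the file names, fingerprint settings and helper function are illustrative, not the exact script used in the tutorial):

#!/usr/bin/env python
"""Minimal command-line wrapper around a pickled predictive model."""
import argparse
import pickle

import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def featurise(smiles):
    # Morgan fingerprint folded to 2048 bits, returned as a 1 x 2048 numpy array
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    arr = np.zeros((2048,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr.reshape(1, -1)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Predict AMES activity for a SMILES string')
    parser.add_argument('smiles')
    parser.add_argument('--model', default='ames_rf_model.pkl')
    args = parser.parse_args()

    # Load the previously pickled model and print the prediction
    with open(args.model, 'rb') as fh:
        model = pickle.load(fh)
    print(model.predict(featurise(args.smiles))[0])

It can then be called as, for example, python predict_ames.py "c1ccccc1N" from the command line or from within a Vortex script.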

Scripting Vortex 38



Versions of Python modules update

 

In the last post I asked about adding version numbers. Almost immediately I got a brilliant response.

Simply install version_information, using either

pip install version_information

or

conda install version_information

Then

[Image: version_information output]
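
In a notebook the extension is loaded and used as a line magic, listing whichever modules you want reported (the module list is just an example):

%load_ext version_information
%version_information numpy, pandas, rdkit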


Versions of Python modules

 

I'm in the process of updating the Jupyter notebooks to Python 3 and I'm looking at what I can do to make sure other people can reproduce the results. At the moment I annotate the imported Python modules with version numbers in the Jupyter notebook. Finding the versions is a bit tedious and I was wondering if there was some way to automate this?

from rdkit import Chem #rdkit 2016.03.5
from rdkit.Chem import PandasTools
import pandas as pd #pandas==0.17.1
import pandas_ml as pdml #pandas-ml==0.4.0
from rdkit.Chem import AllChem, DataStructs
import numpy #numpy==1.12.0
from sklearn.model_selection import train_test_split #scikit-learn==0.18.1
import subprocess
from io import StringIO  # io.StringIO in Python 3 (was StringIO.StringIO in Python 2)
import pickle
import os
%matplotlib inline
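
One simple way to automate at least part of this is to ask each imported module for its version at run time; a quick sketch (not every package exposes __version__, and RDKit keeps its version on rdkit.rdBase):

import numpy
import pandas
import sklearn
from rdkit import rdBase

# Most scientific Python packages expose a __version__ attribute
for module in (numpy, pandas, sklearn):
    print(module.__name__, module.__version__)

# RDKit is the exception: its version lives on rdkit.rdBase
print('rdkit', rdBase.rdkitVersion)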

Python tutorials for OpenMM

 

This guide is a set of Jupyter notebooks intended to help researchers already familiar with molecular dynamics simulation learn how to use OpenMM in their research and software projects.

# For Mac OS X, substitute `MacOSX` for `Linux` below
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
export PATH=$HOME/miniconda/bin:$PATH


conda install --yes -c omnia -c conda-forge jupyter notebook openmm mdtraj nglview

There is a detailed document describing OpenMM here

OpenMM is a set of libraries that lets programmers easily add molecular simulation features to their programs, and an “application layer” that exposes those features to end users who just want to run simulations. Instructions for installation under MacOSX are here.

OpenMM works on Mac OS X 10.7 or later. OpenCL is supported on OS X 10.10.3 or later.
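
To give a flavour of the application layer the notebooks build on, a minimal OpenMM script looks roughly like this (the input file and force field choices are illustrative; the calls follow the standard OpenMM 7 Python API):

# Load a PDB file, build an AMBER/TIP3P system, minimise and run short dynamics
from simtk.openmm.app import PDBFile, ForceField, Simulation, PDBReporter, PME, HBonds
from simtk.openmm import LangevinIntegrator
from simtk.unit import kelvin, picosecond, picoseconds, nanometer

pdb = PDBFile('input.pdb')  # illustrative input structure
forcefield = ForceField('amber99sb.xml', 'tip3p.xml')
system = forcefield.createSystem(pdb.topology, nonbondedMethod=PME,
                                 nonbondedCutoff=1*nanometer, constraints=HBonds)
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)

simulation = Simulation(pdb.topology, system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.minimizeEnergy()
simulation.reporters.append(PDBReporter('output.pdb', 1000))
simulation.step(10000)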



A workflow for docking/virtual screening part 2

 

In the previous workflow I described docking a set of ligands with known activity into a target protein; in this workflow we will be using a set of ligands from the ZINC dataset to search for novel ligands. Once docked, the workflow moves on to finding vendors and selecting subsets for purchase.

[Image: docked ligand in the binding site]



A workflow for docking/virtual screening (updated)

 

Whilst high-throughput screening (HTS) has been the starting point for many successful drug discovery programs, the cost of screening, the lack of access to a large diverse sample collection, or the low throughput of the primary assay may preclude HTS as a starting point, and identification of a smaller selection of compounds with a higher probability of being a hit may be desired. Directed or virtual screening is a computational technique used in drug discovery research to identify potential hits for evaluation in primary assays. It involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures most likely to be active against a drug target. The in silico screen can be based on similarity to known ligands or on docking ligands into the desired binding site.

In this workflow I'll be looking at using docking to identify potential hits.

I've updated the description to give more information about preparing the target protein.



A webinar demonstrating the use of Jupyter, the free IPython notebook

 

This is a recording of the March 2017 Global Health Compound Design meeting: a webinar demonstrating the use of Jupyter, the free IPython notebook.

https://youtu.be/XqyWctQxhNs

How to get started

Accessing Open Source Malaria data

Calculating physicochemical properties and plotting

Predicting AMES activity.




Publishing computational notebooks with Binder

 

I've now written a couple of Jupyter notebooks, and one of the issues that has come up is how to share the notebooks in a way that ensures the results will be reproducible in an environment where updates to components occur regularly.

Binder is a collection of tools for building and executing version-controlled computational environments that contain code, data, and interactive front ends, like Jupyter notebooks. It's 100% open source.

At a high level, Binder is designed to make the following workflow as easy as possible:

  • Users specify a GitHub repository
  • Repository contents are used to build Docker images
  • Containers are deployed on demand in the browser, on a cluster running Kubernetes

Common use cases include:

  • sharing scientific work
  • sharing journalism
  • running tutorials and demos with minimal setup
  • teaching courses

[Image: Binder]

If you want to find out more have a look at this blog post by the developers.



Predicting AMES activity Jupyter Notebook

 

I've been experimenting with the use of Jupyter Notebooks (aka IPython Notebooks) not only as an electronic lab notebook but also as a means to share computational models. The aim is to see how easy it would be to share a model, the associated training data, and an explanation of how the model was built and how it can be used for novel molecules.

The Ames test is a widely employed method that uses bacteria to test whether a given chemical can cause mutations in the DNA of the test organism. More formally, it is a biological assay to assess the mutagenic potential of chemical compounds (PNAS 70 (8): 2281–5, doi).

In this first notebook a random forest model to predict AMES activity is described…
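
The core of such a model is quite short: RDKit fingerprints feeding a scikit-learn random forest. A rough sketch, with an illustrative training file and column names rather than the actual notebook code:

import pandas as pd
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative training set: a CSV with a SMILES column and a 0/1 AMES activity column
data = pd.read_csv('ames_training_data.csv')

def fingerprint(smiles):
    # 2048-bit Morgan fingerprint as a numpy array
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    arr = np.zeros((2048,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

X = np.array([fingerprint(s) for s in data['smiles']])
y = data['ames'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)
print('Test-set accuracy:', model.score(X_test, y_test))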



Molecular Design Toolkit

 

The Molecular Design Toolkit is an open-source environment that aims to seamlessly integrate molecular simulation, visualization and cloud computing. It offers access to a large and still-growing set of computational modelling methods with a science-focused Python API that can be easily installed using pip, and it is ideal for building into a Jupyter notebook. The API is designed to handle both small molecules and large biomolecular structures, molecular mechanics and QM calculations.

[Image: wavefunction visualisation]

There is a series of YouTube videos describing some of the functionality in more detail, starting with this introduction.



nteract, a desktop-based, interactive computing application

 

This blog post looks very interesting: a notebook environment for coding and data visualisation based on Jupyter (aka IPython) notebooks.

With nteract, you can create documents that contain executable code, textual content, and images, and convey a computational narrative. Unlike Jupyter, your documents are stand-alone, cross-platform desktop applications, providing a seamless desktop experience and offline usage.

nteract can run your existing Jupyter notebooks without any modification, and supports multiple Jupyter kernels: Python, R, Julia, and JavaScript. Being a native Jupyter notebook, nteract applications can be easily saved to Domino, versioned, shared, and if needed, run on high-performance machines in the cloud, in your VPC, or on-premise.

More details are on GitHub.

