Macs in Chemistry

Insanely great science


Accessing a Jupyter Notebook HERG model from Vortex

This is another example of using a Vortex script to access a Machine Learning model in the same way as the AMES model published earlier.

A recent paper "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI described a classification model for HERG activity

Drug-induced inhibition of the human ether-à-go-go-related gene (hERG)-encoded potassium ion channels can lead to fatal cardiotoxicity. Several marketed drugs and promising drug candidates were recalled because of this concern. Diverse modeling methods ranging from molecular similarity assessment to quantitative structure–activity relationship analysis employing machine learning techniques have been applied to data sets of varying size and composition (number of blockers and nonblockers). In this study, we highlight the challenges involved in the development of a robust classifier for predicting the hERG end point using bioactivity data extracted from the public domain. To this end, three different modeling methods, nearest neighbors, random forests, and support vector machines, were employed to develop predictive models using different molecular descriptors, activity thresholds, and training set compositions. Our models demonstrated superior performance in external validations in comparison with those reported in the previous studies from which the data sets were extracted.

I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub

Building the models

The models were downloaded and the Random Forest Jupyter Notebooks using RDKit modified to save the generated model using pickle to store the predictive model, and then another Jupyter notebook was created to access the model without the need to rebuild the model.

You can view these notebooks here.(the download is at the bottom of this page).

Generating the HERG model Jupyter Notebook
Jupyter Notebook to evaluate a single SMILES

This notebook was exported as a python script to allow command line access. The python script accepts a SMILES string input and can be accessed from the command line as shown below.

MacPro:~ Chris$ python /Users/Chris/Projects/HERG/PublicationsMaster/hERG/models/ "Fc1ccc(cc1)n3c2ccc(Cl)cc2c(c3)C5CCN(CCN4C(=O)NCC4)CC5"
[[ 0.03  0.97]]

We can now access this python script from a Vortex script using subprocess. Note we have to use explicit paths to ensure the script uses the correct python, you will need to edit these paths.

The results are posted into two new columns as shown below.


I have a database of around 7000 molecules for which the HERG activity has been published, the results of the RF prediction are shown below (HERGactive = Blue, Inactive= Green).


Highlighting the most important features back onto the molecule

Whilst the classification model is great for scoring a database of molecules it would be very useful to know what features in the molecule are important in the random forest model for determining the outcome.

We can do this within the Jupyter notebook by first calculating the fingerprint contributions for each atom in the molecule then using SimilarityMaps we can plot this as a contour map. We also add the HERG result as the title of the image. I created a folder "pics" to store the generated images, you may want to think about using a tmp folder. I've also hard coded the image file name just to make it easier to follow.


Fingerprint Contributions

mols = [None, Chem.MolFromSmiles(mySMILES)]

radius = 2
bit_size = 1024
fps_morgan2 = []
info_morgan2 = []
num_mols = len(mols) - 1
mols = [mols[i+1] for i in range(num_mols)] # remove reference cmp from list
for m in mols:
    info = {}
    fps_morgan2.append(AllChem.GetMorganFingerprintAsBitVect(m, radius, bit_size, bitInfo=info))

mol_weights = []
for i,m in enumerate(mols):
    weights = []
    orig_pp = rf_clf.predict_proba(np.array(fps_morgan2[i]).reshape(1, -1))[0][1]
    # get bits for each atom
    bitmap = [~DataStructs.ExplicitBitVect(1024) for x in range(m.GetNumAtoms())]
    for bit, es in info_morgan2[i].items():
        for at1, rad in es:
            if rad == 0: # for radius 0
                bitmap[at1][bit] = 0
            else: # for radii > 0
                env = Chem.FindAtomEnvironmentOfRadiusN(m, rad, at1)
                amap = {}
                submol = Chem.PathToSubmol(m, env, atomMap=amap)
                for at2 in amap.keys():
                    bitmap[at2][bit] = 0
    # loop over atoms
    for at1 in range(m.GetNumAtoms()):
        new_fp = np.array(fps_morgan2[i] & bitmap[at1]).reshape(1, -1)
        new_pp = rf_clf.predict_proba(new_fp)[0][1]

Similarity Map

def generateSimilarityMaps(mols, weights, fp):
    '''Generates a similarity map for a set of molecules and weights'''
    # colormap to use
    mycm = cm.PiYG
    # loop over molecules
    for i,m in enumerate(mols):
        fig = Draw.MolToMPL(m, coordScale=1.5, size=(250,250))
        # the values 0.02 and 0.01 can be adjusted for the size of the molecule
        x,y,z = Draw.calcAtomGaussians(m, 0.02, step=0.01, weights=weights[i])
        # use the maximum absolute peak as maximum scale
        maxscale = max(math.fabs(np.min(z)), math.fabs(np.max(z)))
        # this does the coloring
        fig.axes[0].imshow(z, cmap=mycm, interpolation='bilinear', origin='lower', extent=(0,1,0,1), vmin=-maxscale, vmax=maxscale)
        # this draws 10 contour lines
        # alternatively also the z values for the lines can be specified
        fig.axes[0].contour(x, y, z, 10, colors='k', alpha=0.5)
        ax = fig.axes[0]
        ax.set_title(result) #adds HERG result to title
        # this writes the figure in a file
       # fig.savefig('pics/mol'+str(i+1)+'_'+fp+'.png', bbox_inches='tight') #if generating multiple molecules
        fig.savefig('pics/mol1_rf.png', bbox_inches='tight') # You need to decide where to store images

Plot Output

generateSimilarityMaps(mols, mol_weights, 'rf')
from IPython.display import Image


We can now use a contextual Vortex script to access the model for a selected molecule, one of the very little publicised features available in Vortex is the ability to run context dependent scripts, these scripts execute a specific function dependent on the cursor selection. So selecting one field will execute one script, whilst selecting another will execute a different script. These scripts need to be stored in the “context” folder which is inside the “Vortex_Add-ons” folder.

import sys
import com.dotmatics.vortex.util.Util as Util
import subprocess
import os

myhome = os.getenv('HOME')
#You need to edit to the path to the model
hergmodel = myhome + '/Projects/HERG/PublicationsMaster/hERG/models/'

import tempfile

smiles = ''

col = vtable.findColumnWithName('Structure', 0)

if (col == None):
   col = vtable.findColumnWithName('SMILES', 0)
   if (col == None):
      vortex.alert('Load a workspace with a Structure or SMILES column please.')
      smiles = col.getValueAsString(cell_row)
   smiles = vortex.getMolProperty(vtable.getMolFileManager().getMolFileAtRow(cell_row), 'SMILES')
smiles = smiles.encode('utf-8')
vortex.alert('SMILES = '+smiles)

tmpdir = tempfile.mkdtemp()
#tmpdir = '/Users/Chris/Public'

# Command for HERGimage

#MacPro:~ Chris$ python /Users/Chris/Projects/HERG/PublicationsMaster/hERG/models/ "Fc1ccc(cc1)n3c2ccc(Cl)cc2c(c3)C5CCN(CCN4C(=O)NCC4)CC5"

p = subprocess.Popen(['/usr/local/miniconda/bin/python', hergmodel, smiles], stdout=subprocess.PIPE)
output = p.communicate()[0] 

chart = vws.addChart(31)
plot = vws.findPlot(chart)
imagefile = 'file:////' + myhome + '/Projects/HERG/PublicationsMaster/hERG/models/pics/mol1_rf.png'

from java.awt import Dimension

a = plot.getParent().getParent().getParent().getParent()
a.setSize(Dimension(400, 400))

If you right click on a row you can select the script from the popup menu. A dialog box pops up displaying the SMILES (you can comment this out), click OK and the SMILES string is passed on to the HERG prediction model and the image returned and displayed. Once again I've used hard coded paths to make it easier to follow.



You will first need to download the published notebooks and datasets from GitHub You can then add the downloaded files to the "models" folder (replacing rf_classifier.ipynb with the modified version) and create the "pics" folder. This is the folder structure that I have.


All models and scripts can be downloaded here

Last updated 13 June 2018