Macs in Chemistry

Insanely great science

 

Scripting Vortex 8

One of the critical activities of most drug discovery programs is the identification of novel leads. These hits can come from high-throughput screening or fragment-based screening. There is, however, great interest in virtual screening, which allows the in silico evaluation of a vast number of compounds and the selection of a subset that has a greater chance of showing the desired activity. Virtual screening can be achieved by searching using sub-structures or molecular descriptors, by docking potential ligands into the target protein and scoring the resulting docked pose, or by comparing with the shape and/or electrostatic map of a known ligand.

Shape-it is a tool developed by Silicos-it that aligns a reference molecule against a set of database molecules, using the shape of the molecules as the alignment criterion. It is based on the use of Gaussian volumes as a descriptor for molecular shape, as introduced by Grant, J.A.; Gallardo, M.A.; Pickup, B.T. (1996) ‘A fast method of molecular shape comparison: a simple application of a Gaussian description of molecular shape’, J. Comp. Chem. 17, 1653-1666.

Shape-it™ is based on the computation of the overlap between two molecules when their atomic volumes are represented using a Gaussian function. The rationale is to find the single alignment of reference and database molecule in which their volume overlap is maximized. You can read more about the mathematics behind the application here on the Shape-it manual page.
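To make the idea of a Gaussian volume overlap a little more concrete, here is a minimal sketch in plain Python. It is not Shape-it's code: it only sums pairwise atom overlaps (the first-order term), and the amplitude of 2.7 and the atomic radii used in the example are assumptions chosen for illustration. Each atom is a Gaussian whose integral equals the volume of a hard sphere of the same radius, and the Tanimoto score is the overlap divided by the combined volume.

```python
import math

P = 2.7  # Gaussian amplitude; an assumed value for this illustration


def alpha(radius):
    """Width chosen so the Gaussian integrates to the hard-sphere volume."""
    return math.pi * (3.0 * P / (4.0 * math.pi * radius ** 3)) ** (2.0 / 3.0)


def pair_overlap(center_a, radius_a, center_b, radius_b):
    """Analytic overlap integral of two spherical Gaussians."""
    a, b = alpha(radius_a), alpha(radius_b)
    d2 = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    return P * P * math.exp(-a * b * d2 / (a + b)) * (math.pi / (a + b)) ** 1.5


def overlap(mol_a, mol_b):
    """First-order overlap: sum over all atom pairs of the two molecules."""
    return sum(pair_overlap(ca, ra, cb, rb)
               for ca, ra in mol_a for cb, rb in mol_b)


def tanimoto(mol_a, mol_b):
    """Shape Tanimoto: overlap / (self A + self B - overlap)."""
    v_ab = overlap(mol_a, mol_b)
    return v_ab / (overlap(mol_a, mol_a) + overlap(mol_b, mol_b) - v_ab)


# Two hypothetical "molecules", each a list of ((x, y, z), radius) atoms
reference = [((0.0, 0.0, 0.0), 1.7), ((1.5, 0.0, 0.0), 1.7)]
candidate = [((0.5, 0.0, 0.0), 1.7), ((2.0, 0.0, 0.0), 1.7)]
print(tanimoto(reference, candidate))
```

An alignment program then searches over rotations and translations of the database molecule for the pose that maximizes this overlap; that optimization step is not shown here.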

Shape-it can be downloaded from here. To install it, follow the instructions below.

INTRODUCTION AND REQUIREMENTS

The following tools are required to compile shape-it:

If you want to install globally on your system, you will need admin access, and should follow these instructions.

INSTALL GLOBALLY (YOU NEED ADMIN ACCESS)

Then double-click the downloaded shape-it-1.0.0.tar.gz file.

This will create a folder called 'shape-it-1.0.0'.

You now need to configure and compile shape-it. Run the following commands, one after the other:

cd shape-it-1.0.0
cmake CMakeLists.txt
make
make install (you may need to use sudo)
make clean
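Since the Vortex script later calls the binary from Python, it can be handy to confirm from Python that the installed program is reachable. The sketch below is only a suggestion: find_executable is a hypothetical helper name, and it simply walks the folders listed in the PATH environment variable looking for an executable file with the given name.

```python
import os


def find_executable(name, search_path=None):
    """Return the full path of `name` if found on the search path, else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for folder in search_path.split(os.pathsep):
        candidate = os.path.join(folder, name)
        # Accept only regular files that the current user may execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


location = find_executable("shape-it")
if location is None:
    print("shape-it was not found on PATH")
else:
    print("shape-it found at " + location)
```

If the program is reported as missing, check where make install placed it and whether that folder is on your PATH.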

Typing shape-it -h in the Terminal should give the following help message.

ChrisMacbookPro:~ swain$ shape-it -h
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  Shape-it v1.0.0 | Feb 18 2012 15:33:28

-> GCC:         4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)
-> Open Babel:  2.3.1

  Copyright 2012 by Silicos-it, a division of Imacosi BVBA

 Shape-it is free software: you can redistribute it and/or modify
 it under the terms of the GNU Lesser General Public License as published
  by the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  Shape-it is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU Lesser General Public License for more details.

 Shape-it is linked against OpenBabel version 2.
  OpenBabel is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
  the Free Software Foundation version 2 of the License.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


TASK:

  Shape-it is a tool to align pairs of molecules based on the maximal volume overlap.

REQUIRED OPTIONS: 

  -r|--ref <file>
       File of the reference molecule with 3D coordinates.
       Only the first molecule in the reference file will be used.
       Shape-it can also handle a gzipped files if the extension is '.gz'
       All input formats which are recognized by Open Babel are allowed.
  -d|--dbase <file>
       File of the database molecules with 3D coordinates.
       Shape-it can also handle gzipped files if the extension is '.gz'
       All input formats which are recognized by Open Babel are allowed.

OUTPUT OPTIONS: 

       One of these two output options is required.
  -o|--out <file>
       File to write all database or the N best molecules such that their
       coordinates correspond to the best alignment with the reference molecule.
       The first molecule in the file is the reference molecule. When this file
       if of type 'sdf', then each molecule contains a set of properties in which
       the respective scores are reported. These fields are labeled with an identifier
       starting with the tag Shape-it::
-s|--scores <file>
       Tab-delimited text file with the scores of molecules.
       When the N best scoring molecules are reported the molecules are ranked
       with the descending scores.

OPTIONAL OPTIONS: 

 --best <N> 
       When this option is used, only the N best scoring alignments will be reported.
       The scoring function is defined by the --rankBy option.
       By default all molecules in the database are reported with their respective
       scores without any ordering.
  --scoreOnly
       When this option is used the molecules are not aligned, only the volume overlap
       between the reference and the given pose is computed.
  --addIterations <nbr>
       Sets the number of additional iterations in the simulated annealing optimization step.
       The default value is set to 0, which refers to only a local gradient ascent.
       Increasing the number of iterations will add additional steps, and might give better
       alignments but it also takes more time.
  --rankBy <code>
       This option can be used in combination with --best of --cutoff to rank the molecules
       according to a given scoring function. The type of scoring function is indicated with
       a code:
         - TANIMOTO = Taninoto
         - TVERSKY_REF = reference Tversky
         - TVERSKY_DB = database Tversky
       By default TANIMOTO is used.
  --cutoff <value>
       Defines a cutoff value. Only molecules with a score higher than the cutoff
       are reported in the results files. Default value is set to 0.0.
       The scoring function is defined by the --rankBy option.
 --noRef
       By default the reference molecule is written in the output files.
       Use this option to switch off this behavior.

HELP: 

  -h|--help
       Prints this help overview.
  -v|--version
       Prints the version of the program.

The program expects a single reference molecule (with three-dimensional coordinates) and a database file containing one or more molecules (with three-dimensional coordinates) that need to be shape-aligned onto the reference molecule. The tool returns all aligned database molecules and their respective shape overlap scores, or the top-best scoring molecules. Since Shape-it does not do any conformational analysis it is probably worth having multiple conformations of each ligand in the query database to match onto the reference template.
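If the database does contain several conformations of each ligand, you will probably want to keep only the best-scoring conformation per molecule afterwards. Here is a minimal sketch of that step in plain Python. It assumes that all conformers of a molecule share the same name in the first column of the scores file and that the score of interest is in the third column; both are assumptions, so check them against your own output. The molecule names and scores in the example are made up.

```python
def best_per_molecule(rows, name_col=0, score_col=2):
    """Keep the highest-scoring row for each molecule name."""
    best = {}
    for row in rows:
        name = row[name_col]
        score = float(row[score_col])
        if name not in best or score > float(best[name][score_col]):
            best[name] = row
    return best


# Hypothetical rows: name, reference name, score
rows = [
    ["molA", "template", "0.61"],
    ["molA", "template", "0.74"],
    ["molB", "template", "0.52"],
]
for name, row in sorted(best_per_molecule(rows).items()):
    print(name + "\t" + row[2])
```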

The first part of the script should be pretty familiar: it gets the path to the current sdf file, which will be the database file for the shape-it command. Shape-it creates two files: an SD file containing all the aligned molecules, and a text file containing the shape overlap scores as tab-delimited text. In the Vortex script we get the path to the desktop and create the paths to the two output files; if you want to save these files elsewhere you will need to edit these paths.

import os

# get path to desktop
mydesk=os.path.join(os.path.expanduser("~"), "Desktop")

txtoutputfile=(mydesk + '/shapeitOutput.txt')
sdfoutputfile=(mydesk + '/shapeitOutput.sdf')
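If you would rather not edit the paths by hand each time, one option is to wrap the path building in a small function. This is only a sketch: output_paths and its stem argument are hypothetical names, not part of the Vortex script.

```python
import os


def output_paths(folder, stem="shapeitOutput"):
    """Return (scores text path, aligned sdf path) inside `folder`."""
    folder = os.path.expanduser(folder)
    return (os.path.join(folder, stem + ".txt"),
            os.path.join(folder, stem + ".sdf"))


# Same locations as the script uses, built from the desktop folder
txtoutputfile, sdfoutputfile = output_paths("~/Desktop")
print(txtoutputfile)
print(sdfoutputfile)
```

Using os.path.join rather than string concatenation also avoids having to worry about the path separator.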

We now get the template file: we open a dialog and ask the user to select it. The selected file is returned as a "sun.awt.shell.DefaultShellFolder" object, so we then need to get its absolute path for insertion into the command.

#Get Template file

# Open a dialog to choose a file
# getFile(title, extensions, 0 = Open, 1 = Save)
# vortex will keep track of the last folder you looked in etc

tempfile = vortex.getFile("Choose a file", [".sdf", ".mol"], 0)

templatefile=tempfile.getAbsolutePath()

The shape-it command has the following format. There are other options, but we don’t need them for this script.

/usr/local/bin/shape-it -r '/Users/username/Desktop/ChemicalStructures/acetophenone_template_min.sdf' -d '/Users/username/Desktop/input.sdf' -o '/Users/username/Desktop/shapeit.sdf' -s '/Users/username/Desktop/shapeit.txt'  --rankBy TANIMOTO

So we build the command by substituting the relevant file path variables. The --rankBy option is left out because TANIMOTO is the default.

p =subprocess.Popen(['/usr/local/bin/shape-it', '-r',  templatefile, '-d',  sdfFile, '-o', sdfoutputfile, '-s', txtoutputfile],stdout=subprocess.PIPE)
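If you do want to use some of the other options, it may be easier to build the argument list in a function. The sketch below uses only flags listed in the help output above (--rankBy, --best and --cutoff); build_shapeit_command is a hypothetical helper name and the file names in the example are placeholders.

```python
def build_shapeit_command(ref, dbase, out_sdf, out_scores,
                          rank_by=None, best=None, cutoff=None,
                          executable="/usr/local/bin/shape-it"):
    """Return the argument list for subprocess.Popen."""
    command = [executable, "-r", ref, "-d", dbase,
               "-o", out_sdf, "-s", out_scores]
    if rank_by is not None:
        # Codes taken from the --rankBy section of the help text
        if rank_by not in ("TANIMOTO", "TVERSKY_REF", "TVERSKY_DB"):
            raise ValueError("unknown rankBy code: " + rank_by)
        command += ["--rankBy", rank_by]
    if best is not None:
        command += ["--best", str(int(best))]
    if cutoff is not None:
        command += ["--cutoff", str(float(cutoff))]
    return command


# Placeholder file names, keeping only the 50 best Tanimoto scores
print(build_shapeit_command("template.sdf", "input.sdf",
                            "shapeit.sdf", "shapeit.txt",
                            rank_by="TANIMOTO", best=50))
```

Passing the arguments as a list, as the script does, means that file paths containing spaces do not need to be quoted.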

The remainder of the script is a slight modification of previous scripts. We first read the scores file from the desktop, then create the columns; in this case the first two columns are set to text and the remainder to numeric. We then parse the data and import it into the data table. The screening is pretty quick: on my laptop, using a database of drug-like molecules, I was getting through 20-25 molecules per second.
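The parsing step can be tried outside Vortex with a few lines of plain Python. This sketch follows the same convention as the script (first two columns text, the rest numeric) but returns a list of dictionaries instead of filling a table. The column names and values in the example are invented for illustration and are not taken from real shape-it output.

```python
def parse_scores(text, text_columns=2):
    """Split tab-delimited scores into a header list and typed records."""
    lines = [line for line in text.split("\n") if line.strip()]
    if not lines:
        return [], []
    header = lines[0].split("\t")
    records = []
    for line in lines[1:]:
        fields = line.split("\t")
        record = {}
        for i, name in enumerate(header):
            if i >= len(fields):
                break  # tolerate short rows
            value = fields[i]
            if i >= text_columns:
                value = float(value)  # remaining columns are numeric
            record[name] = value
        records.append(record)
    return header, records


# Hypothetical scores text
sample = "name\tref\tscore\nmolA\ttemplate\t0.75\nmolB\ttemplate\t0.40\n"
header, records = parse_scores(sample)
print(header)
print(records)
```

Skipping blank lines matters here because a text file ending in a newline produces an empty final element when split.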

The Vortex Script

import sys

#Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable



sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

import os

# get path to desktop
mydesk=os.path.join(os.path.expanduser("~"), "Desktop")

txtoutputfile=(mydesk + '/shapeitOutput.txt')
sdfoutputfile=(mydesk + '/shapeitOutput.sdf')

#Get Template file

# Open a dialog to choose a file
# getFile(title, extensions, 0 = Open, 1 = Save)
# vortex will keep track of the last folder you looked in etc

tempfile = vortex.getFile("Choose a template file", [".sdf", ".mol"], 0)

#templatefile=(mydesk + '/acetophenone_template_min.sdf')

templatefile=tempfile.getAbsolutePath()

# Run shape-it
#  /usr/local/bin/shape-it -r '/Users/username/Desktop/ChemicalStructures/acetophenone_template_min.sdf' -d '/Users/username/Desktop/input.sdf' -o '/Users/username/Desktop/shapeit.sdf' -s '/Users/username/Desktop/shapeit.txt'  --rankBy TANIMOTO 

p =subprocess.Popen(['/usr/local/bin/shape-it', '-r',  templatefile, '-d',  sdfFile, '-o', sdfoutputfile, '-s', txtoutputfile],stdout=subprocess.PIPE) 
output = p.communicate()[0]


# Read the scores file written by shape-it

f = open(txtoutputfile, 'r')
output=f.read()
f.close()




# Create new columns in table if needed
lines = output.split('\n')
colName = lines[0].split('\t')
for i,c in enumerate(colName):
    if i == 0:
        column = vtable.findColumnWithName(c, 1, 3)
    elif i == 1:
        column = vtable.findColumnWithName(c, 1, 3)
    else:
        column = vtable.findColumnWithName(c, 1, 1)
vtable.fireTableStructureChanged()




keys = []
for i in lines:
    words = i.split('\t')
    if len(words) == 2:
        keys.append(words[0])

# Parse the output
rows = lines[1:len(lines)]
for r in range(0, vtable.getRealRowCount()):
    # Stop if the scores file has fewer rows than the table
    if r >= len(rows):
        break
    vals = rows[r].split('\t')
    for j in range(0, len(vals)):
        column = vtable.findColumnWithName(colName[j], 0)
        column.setValueFromString(r, vals[j])
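The script as written does not check whether the external program actually succeeded before reading the scores file. One way to add such a check is sketched below. run_command is a hypothetical helper, and it assumes that the program signals failure with a non-zero exit code, which I have not verified for shape-it. The Python interpreter stands in for shape-it so the example can be run anywhere.

```python
import subprocess
import sys


def run_command(command):
    """Run a command, returning its standard output or raising on failure."""
    process = subprocess.Popen(command,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    if process.returncode != 0:
        raise RuntimeError("command failed with exit code %d: %s"
                           % (process.returncode, err))
    return out


# Stand-in command; in the Vortex script this would be the shape-it list
print(run_command([sys.executable, "-c", "print('ok')"]))
```

Capturing standard error as well as standard output means any message from the program is included in the exception rather than lost.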

The resulting table can be used to select compounds. Alternatively, you could use the similarity script to add extra columns that measure similarity using different descriptors, and then choose compounds that represent each of the different similarity measures. You could also add physicochemical properties and use them as an additional way to filter compounds, or create scatter plots to look for diversity.


The Vortex script can be downloaded from here: shape-it.vpy.zip

The Vortex Scripts

Scripting Vortex Using OpenBabel
Scripting Vortex 2 Using filter-it
Scripting Vortex 3 Using cxcalc
Scripting Vortex 4 Using MOE
Scripting Vortex 5 Calculating similarities using OpenBabel
Scripting Vortex 6 Filtering compounds
Scripting Vortex 7 Using MayaChemTools
Scripting Vortex 8 Molecular Shape matching
Scripting Vortex 9 Getting a 2D depiction
Scripting Vortex 10 Interacting with the user
Scripting Vortex 11 Interacting with a web service
Scripting Vortex 12 JSON import
Scripting Vortex 13 Using OpenBabel fastsearch
Other Hints, Tips and Tutorials

Last updated 10 March 2012