Macs in Chemistry

Insanely great science

 

Scripting Vortex 13

One thing I’ve needed to do a couple of times recently is give an idea of how many similar compounds are available to the set of compounds I’m currently viewing. For example in designing a fragment library it is very useful to know for a particular fragment how many similar fragments are commercially available. Or when looking at the results of a high-throughput screen how many similar analogues to a particular hit were also screened.

To do this we need a way of doing a rapid similarity search of the reference database. I use OpenBabel in particular using the fast search capability with molecular fingerprints.

Molecular fingerprints encode molecular structure in a series of binary digits (bits) that represent the presence or absence of particular substructures in the molecule. Comparing fingerprints will allow you rapidly to determine the similarity between two molecule

Using OpenBabel

We first need to create the fast search index, by default the fingerprint is FP2 (Indexes linear fragments up to 7 atoms) and does not need to be defined but I’ve left it in so you can see the format if you want to use other fingerprints

/usr/local/bin/obabel   '/Users/swain/Projects/FragFiles/sdf/allFragments.sdf'    -O '/Users/swain/Projects/FragFiles/sdf/allFragments.fs -xfFP2

The available fingerprints are

FP2    Indexes linear fragments up to 7 atoms.
FP3    SMARTS patterns specified in the file patterns.txt
FP4    SMARTS patterns specified in the file SMARTS_InteLigand.txt
MACCS    SMARTS patterns specified in the file MACCS.txt

This process may take a little while, on my MacBook Pro a 50K sdf took less than a minute to generate the fast search index. You can test using a simple search, the following Terminal command returns all structures with a Tanimoto similarity of > 0.85.

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at0.85

c1(N2CCNCC2)ccccc1
N1(CCN(CC1)c1ccccc1)c1ccccc1
N1(CCN(CC1)C)c1ccccc1
N(c1ccccc1)CCN(CC)CC
N1(c2ccc(N)cc2)CCN(CC1)C
N1(c2ccc(N)cc2)CCN(CC1)CC
N1(c2ccc(N)cc2)CCN(CC1)C
N1(c2ccc(N)cc2)CCN(CC1)CC
N1(c2ccc(N)cc2)CCN(CC1)CC
N1(c2ccc(N)cc2)CCNCC1
N1(c2c(N)cccc2)CCNCC1
N1(c2c(N)cccc2)CCN(CC1)C
N1(c2cc(N)ccc2)CCN(CC1)C
N1(c2c(N)cccc2)CCN(CC1)C
N1(c2c(N)cccc2)CCN(CC1)CC
c1cccc(c1)N(CCN)CC
16 molecules converted

Whilst the following command would return the 5 most similar structures

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5

c1(N2CCNCC2)ccccc1
N1(CCN(CC1)c1ccccc1)c1ccccc1
N1(CCN(CC1)C)c1ccccc1
N(c1ccccc1)CCN(CC)CC
N1(c2ccc(N)cc2)CCN(CC1)CC
5 molecules converted

There is a slight catch, whilst all the output is displayed in the Terminal window not all is derived from stdout, this is because by default stderr is also displayed in the Terminal window and stdlog is mapped to stderr. We can check what comes from where by selectively suppressing either stdout or stderr. Many thanks to Chris Morley for explaining this.

This command stops the stderr output

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5 2> /dev/null

This results in

c1(N2CCNCC2)ccccc1 N1(CCN(CC1)c1ccccc1)c1ccccc1
N1(CCN(CC1)C)c1ccccc1
N(c1ccccc1)CCN(CC)CC
N1(c2ccc(N)cc2)CCN(CC1)CC

If we now stop the stdout output

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5 > /dev/null

This results in

5 molecules converted

Since for this script we only need to know the number of similar molecules we need to capture the stderr output.

Script Explanation

The first part imports various libraries and I have to confess I now use a simple boilerplate so not all are actually needed. We then open a dialog to allow the user to select the appropriate fast search file and get the absolute path. We then create a column to store the results in, and then loop through the table extracting the SMILES string for each structure. The next step is key, we create the subprocess but ensure we capture both stdout and stderr, (we actually only need stderr but by capturing both it does give options for modifying the script). You can obviously change the similarity coefficient from 0.85 if you want.

p = subprocess.Popen(['/usr/local/bin/obabel', fsFile, '-o', 'smi', '-s',  smiles, '-at0.85'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

The output is in the following format

'1 molecule converted\n’

We now parse the output, add it to the table and update the display.

Since we are doing a separate search for each row in the table this might be slow for large datasets.

numSimResults

The Vortex Script

# This uses the OpenBabel fastsearch (http://openbabel.org/wiki/Tutorial:Fingerprints)


import sys
import com.dotmatics.vortex.util.Util as Util

if Util.getPlatform() == Util.PlatformIsWindows:
  sys.path.append(vortex.getVortexFolder() + '\\modules\\jythonconsole')
  sys.path.append(vortex.getVortexFolder() + '\\modules\\jythonlib')
else:
  sys.path.append(vortex.getVortexFolder() + '/modules/jythonconsole')
  sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import urllib2
import urllib
import subprocess

# Open a dialog to choose a file
# getFile(title, extensions, 0 = Open, 1 = Save)
# vortex will keep track of the last folder you looked in etc

file = vortex.getFile("Choose a fastsearch file", [".fs"], 0)
#display file path
if file:
    vortex.alert(str(file))


# Need to coerce file path  
fsFile=file.getAbsolutePath() 


#   /usr/local/bin/babel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi  -xt -s'NC(=N)c1ccccc1' 



column = vtable.findColumnWithName('NumSim', 1, 3)
mfm = vtable.getMolFileManager()
rows = vtable.getRealRowCount()
for r in range(0, rows):
    smiles = vortex.getMolProperty(mfm.getMolFileAtRow(r), 'SMILES')
    p = subprocess.Popen(['/usr/local/bin/obabel', fsFile, '-o', 'smi', '-s',  smiles, '-at0.85'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# for stdout output = p.communicate()[0]

# stderr

    output = p.communicate()[1]

    # output format '1 molecule converted\n’

# Parse the output


    vals= output.split(' ')

    column = vtable.findColumnWithName('NumSim', 0)     
    column.setValueFromString(r, vals[0])


vtable.fireTableStructureChanged()

The vortex script can be downloaded from here FindNumSimilar.vpy.zip

Scripting Vortex Using OpenBabel
Scripting Vortex 2 Using filter-it
Scripting Votrex 3 Using cxcalc
Scripting Vortex 4 Using MOE
Scripting Vortex 5 Calculating similarities using OpenBabel
Scripting Vortex 6 Filtering compounds
Scripting Vortex 7 Using MayaChemTools
Scripting Vortex 8 Molecular Shape matching
Scripting Vortex 9 Getting a 2D depiction
Scripting Vortex 10 Interacting with the user
Scripting Vortex 11 Interacting with a web service
Scripting Vortex 12 JSON import
Scripting Vortex 13 Using OpenBabel fastsearch
Other Hints, Tips and Tutorials

Last Updated 7 May 2013