Macs in Chemistry

Insanely great science

 

Scripting Vortex

I’m finding that I am using Vortex more and more in my day job; it is an excellent application for displaying and exploring large or complex datasets. In fact, the only issue is getting data into Vortex. It is possible to save the dataset in SDF format, use other applications to generate additional fields, and then use the rather nice merge function within Vortex. However, I end up with multiple copies of the dataset as I use different applications to calculate descriptors or properties, and it would be nicer to be able to choose an option and have all the columns of data added automatically. Since all the applications I want to use have a command-line interface, I thought this might be an ideal opportunity to try scripting Vortex to send the structures to an external application and import the results.


Vortex contains a powerful scripting facility built on Jython, a Java implementation of the Python programming language, which gives scripts access to the key components of Vortex as well as to Python and Java. Whilst it is possible to build a Swing JPanel to provide a GUI, the scripts I have in mind will not need a user interface. Scripts are accessed via the scripts menu in Vortex, which is built dynamically from the contents of the user's local files folder. On the Mac you will find the vortex folder in the user's home area (addressable via ~/vortex). I created a subfolder inside the Script folder and called it “My Scripts”. Vortex scripts can also be executed by running a .vpy file from the system file browser.

The first four scripts use tools provided by OpenBabel, a free, open-source chemistry toolbox. One of these tools is obprop, which prints a set of standard molecular properties for every molecule in a file. The output includes:

name [Name]
formula [Formula]
mol_weight [Molecular Weight]
exact_mass [Isotopic Mass]
canonical_SMILES [String]
num_atoms [Number]
num_bonds [Number]
num_residues [Number]
sequence [Residue Sequence]
num_rings [Number of Rings (by SSSR)]
logP [Number (octanol-water partition)]
PSA [Number (topological polar surface area)]
MR [Number (molar refractivity)]

The obprop tool can be run from the Terminal. The output for the first molecule in the file is shown below; $$$$ is the delimiter between molecule records.

MacbookPro:~ PROMPT$ /usr/local/bin/obprop '/Users/username/Desktop/temp.sdf' 

name             N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
formula          C21H20N6O
mol_weight       372.423
exact_mass       372.17
canonical_SMILES N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(C1=O)c1ccncc1    N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O

InChI            InChI=1S/C21H20N6O/c22-11-16-1-3-17(4-2-16)14-26-15-24-12-19(26)13-25-20-7-10-27(21(20)28)18-5-8-23-9-6-18/h1-6,8-9,12,15,20,25H,7,10,13-14H2/t20-/m1/s1

num_atoms        48
num_bonds        51
num_residues     0
sequence         -
num_rings        4
logP             2.54908
PSA              86.84
MR               107.424
$$$$
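Because each record is just a series of name/value lines terminated by $$$$, the output is easy to parse. As a standalone illustration (plain Python, independent of Vortex; parse_obprop is a hypothetical helper name, not part of OpenBabel), the sketch below turns obprop output into one dictionary per molecule, using a cut-down copy of the record above as input:

```python
def parse_obprop(text):
    """Split obprop output into one dict of property name -> value per molecule."""
    records = []
    for block in text.split("$$$$"):   # '$$$$' ends each molecule record
        props = {}
        for line in block.splitlines():
            # Property name, then a run of spaces, then the value
            parts = line.split(None, 1)
            if len(parts) == 2:
                props[parts[0]] = parts[1].strip()
        if props:                       # ignore the empty text after the last '$$$$'
            records.append(props)
    return records


# A cut-down copy of the first record shown above
sample = """name             N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
formula          C21H20N6O
mol_weight       372.423
num_rings        4
logP             2.54908
$$$$
"""

records = parse_obprop(sample)
print("%d %s %s" % (len(records), records[0]["formula"], records[0]["logP"]))  # 1 C21H20N6O 2.54908
```

Values are kept as strings, which matches how the Vortex script below hands them to setValueFromString.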

The Vortex script

The script starts by getting the path of the SDF file that was imported into Vortex; it then constructs the obprop command and captures its output in the variable “output”. The next part creates the columns, using the property names from the obprop output as the column names. The last part parses the output: “$$$$” is the divider between molecule records, and each line within a record is a name/value pair.

import sys

# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run obprop on the file
p = subprocess.Popen(['/usr/local/bin/obprop', sdfFile], stdout=subprocess.PIPE)
output = p.communicate()[0]

# Create new columns in table if needed
lines = output.split('\n')
keys = []
for i in lines:
    words = i.split(' ', 1)
    if len(words) == 2:
        keys.append(words[0])

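columns = list(set(keys))
for c in columns:
    # Second argument 1: create the column if it doesn't already exist
    vtable.findColumnWithName(c, 1)
# Refresh the table once, after all the columns exist
vtable.fireTableStructureChanged()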

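# Parse the output: molecule records are separated by '$$$$'
# and appear in the same order as the rows in the table
rows = output.split('$$$$')
for r in range(0, vtable.getRealRowCount()):
    keyvals = rows[r].split('\n')
    if len(keyvals) > 1:
        for i in keyvals:
            words = i.split(' ', 1)
            if len(words) == 2:
                key = words[0]
                # Strip the padding before the value and any trailing whitespace
                value = words[1].strip()
                column = vtable.findColumnWithName(key, 0)
                column.setValueFromString(r, value)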

One advantage of this approach is that if further properties are added to obprop, the script will automatically add further columns.
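In the script above, list(set(keys)) creates each column once, but a set does not preserve order, so the new columns may appear in any order. If you would rather they follow obprop's own order, a minimal alternative for the column-discovery step is sketched below. property_names is a hypothetical helper; in the Vortex script you would call it on output and loop over its result instead of columns. The sample input is illustrative only.

```python
def property_names(text):
    """Property names from obprop output, once each, in first-seen order."""
    names = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and the bare '$$$$' separators (only one field)
        if len(parts) == 2 and parts[0] not in names:
            names.append(parts[0])
    return names


# Illustrative input only: the second record carries an extra,
# hypothetical property line that the first record lacks
sample = (
    "name             mol_a\n"
    "num_rings        4\n"
    "$$$$\n"
    "name             mol_b\n"
    "num_rings        3\n"
    "new_property     1.0\n"
    "$$$$\n"
)

print(property_names(sample))  # ['name', 'num_rings', 'new_property']
```

Scanning every record, rather than just the first, is what lets a property that only some molecules report still get its own column.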

The properties appear as new columns in the Vortex table; I’ve hidden the “sequence” and “num_residues” columns, which carry no useful information for small molecules such as these.

Similarity Calculation Scripts

The next three scripts calculate molecular similarity. One of the tasks I regularly undertake is to take an active lead structure and run a series of searches (substructure, pharmacophore, docking etc.) to identify potential compounds for evaluation, and it is useful to be able to compare the results with similarity measures.

OpenBabel supports four different fingerprints:

PROMPT> babel -L fingerprints
FP2    Indexes linear fragments up to 7 atoms.
FP3    SMARTS patterns specified in the file patterns.txt
FP4    SMARTS patterns specified in the file SMARTS_InteLigand.txt
MACCS    SMARTS patterns specified in the file MACCS.txt

These fingerprints can be used for similarity searches, for example the following command gives you the Tanimoto coefficient between a SMILES string in mysmiles.smi and all the molecules in mymols.sdf:

PROMPT>  babel  mysmiles.smi  mymols.sdf -ofpt
MOL_00000067   Tanimoto from first mol = 0.0888889
MOL_00000083   Tanimoto from first mol = 0.0869565
MOL_00000105   Tanimoto from first mol = 0.0888889
MOL_00000296   Tanimoto from first mol = 0.0714286
MOL_00000320   Tanimoto from first mol = 0.0888889
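The Tanimoto coefficient reported here compares two binary fingerprints: the number of bits set in both, divided by the number of bits set in either. Identical fingerprints therefore score 1 and fingerprints with no bits in common score 0. The sketch below illustrates the formula on toy fingerprints stored as Python integers; it is not OpenBabel's implementation, and real FP2 or MACCS fingerprints are much longer bit strings.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two bit-string fingerprints held as Python ints.

    T = (bits set in both) / (bits set in either)
    """
    both = bin(fp_a & fp_b).count("1")
    either = bin(fp_a | fp_b).count("1")
    if either == 0:
        return 0.0  # neither fingerprint has any bits set; 0.0 is a convention choice
    return both / float(either)


# Two toy 6-bit fingerprints: 3 bits in common, 5 bits set in total
print(tanimoto(0b101101, 0b100111))  # 0.6
```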

If you don’t specify a query file, babel will use the first molecule in the SDF file as the query, as shown below:

PROMPT> babel /Users/username/Desktop/temp.sdf -ofpt

The default fingerprint is FP2. You can change the fingerprint using the "f" output option (-xf followed by the fingerprint name); the example below shows the command and its output.

MacbookPro:~ PROMPT$ babel /Users/username/Desktop/temp.sdf -ofpt -xfMACCS
N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(C1=O)c1cccc2ncccc12   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.979592
Possible superstructure of N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1C   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.843137
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1N   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.830189
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1O   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.803571
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1OC   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.789474
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1S   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.767857
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1SC   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.754386
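If you want to check these results outside Vortex, the lines above are straightforward to parse. The following is a minimal sketch, assuming the molecule titles contain no whitespace (true for the SMILES titles here). It accepts an optional leading “>” because the Vortex script below tests for one, and it attaches a “Possible superstructure” line to the hit on the previous line, as that script does. parse_fpt is a hypothetical name.

```python
import re

# "<title>   Tanimoto from <query> = <score>", with an optional leading '>'
# Assumes the title itself contains no whitespace
SIM_LINE = re.compile(r"^>?(\S+)\s+Tanimoto from .* = ([-+.0-9eE]+)\s*$")


def parse_fpt(text):
    """Return one [title, score, possible_superstructure] list per hit."""
    hits = []
    for line in text.splitlines():
        m = SIM_LINE.match(line)
        if m:
            hits.append([m.group(1), float(m.group(2)), False])
        elif line.startswith("Possible superstructure") and hits:
            hits[-1][2] = True  # the flag refers to the hit on the previous line
    return hits


query = "N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O"
sample = "\n".join([
    query,  # first line: the query itself, no score
    "N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(C1=O)c1cccc2ncccc12   Tanimoto from "
    + query + " = 0.979592",
    "Possible superstructure of " + query,
    "N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1C   Tanimoto from "
    + query + " = 0.843137",
])

for title, score, superstructure in parse_fpt(sample):
    print("%.3f %s %s" % (score, superstructure, title))
```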

The Similarity script

Again, the first part gets the path to the SDF file imported into Vortex (the file has the active lead structure as the first record), and the next part constructs and runs the babel command, capturing the results in “output”. The columns are then created if needed; the second column is there because babel occasionally reports “Possible superstructure”. The output is split into lines at each linefeed (“\n”). Each similarity line (the script looks for lines starting with “>”) is parsed to get the similarity score, while a “Possible superstructure” line flags the molecule on the preceding line.

import sys

# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run babel to compute MACCS fingerprint similarities to the first molecule
p = subprocess.Popen(['/usr/local/bin/babel', sdfFile, '-ofpt', '-xfMACCS'], stdout=subprocess.PIPE)
output = p.communicate()[0]

# Create the result columns if they don't already exist
vtable.findColumnWithName('Sim_MACCS', 1)
vtable.findColumnWithName('Possible_Superstructure', 1)
vtable.fireTableStructureChanged()

# Line 0 is the query (table row 0), so results start at row 1
lines = output.split('\n')
currentRow = 1
for i in range(1, len(lines)):
    line = lines[i]
    if line.startswith('>'):
        # Similarity line: the score is the last whitespace-separated field
        column = vtable.findColumnWithName('Sim_MACCS', 0)
        column.setValueFromString(currentRow, line.split()[-1])
        currentRow += 1
    elif line.startswith('Possible superstructure'):
        # The flag applies to the molecule on the preceding line
        column = vtable.findColumnWithName('Possible_Superstructure', 0)
        column.setValueFromString(currentRow - 1, 'YES')

The script above uses the MACCS fingerprint. To use one of the other fingerprints, change the -xfMACCS option in the babel command and the Sim_MACCS column name.
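Since only the -xf option and the column name differ between the fingerprint variants, one way to avoid keeping several near-identical scripts is to build both from the fingerprint name. The sketch below assumes the four types listed by babel -L fingerprints and the /usr/local/bin/babel path used above; fpt_command is a hypothetical helper, not an OpenBabel or Vortex function.

```python
# Fingerprint types reported by `babel -L fingerprints`
FINGERPRINTS = ("FP2", "FP3", "FP4", "MACCS")


def fpt_command(sdf_file, fingerprint="MACCS", babel="/usr/local/bin/babel"):
    """Return (argument list for subprocess.Popen, similarity column name)."""
    if fingerprint not in FINGERPRINTS:
        raise ValueError("unknown fingerprint type: %s" % fingerprint)
    args = [babel, sdf_file, "-ofpt", "-xf" + fingerprint]
    return args, "Sim_" + fingerprint


args, column_name = fpt_command("/Users/username/Desktop/temp.sdf", "FP2")
print(args)         # ['/usr/local/bin/babel', '/Users/username/Desktop/temp.sdf', '-ofpt', '-xfFP2']
print(column_name)  # Sim_FP2
```

In the Vortex script, args would replace the hard-coded list passed to subprocess.Popen and column_name would replace the 'Sim_MACCS' strings.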

Updated 31 October 2011