Macs in Chemistry

Insanely great science

 

Scripting Vortex 2

This is the second page on scripting Vortex. On the first page I described how to use OpenBabel to calculate a limited selection of chemical properties. In this script we will use one of the brilliant tools from Silicos-it: filter-it, a program for filtering out molecules with unwanted properties (this program used to be known as sieve). It is based on the Open Babel open source C++ API for rapid calculation of molecular properties. The program comes with a number of pre-programmed molecular properties that can be used for filtering. These properties include, amongst others:

filter-it is a command-line-driven program that is controlled by command-line options and a user-definable filter file. It is in this ‘filter’ file that the user defines the actual filter criteria to be used.

filter-it can be downloaded from here. To install it, follow the instructions below.

INTRODUCTION AND REQUIREMENTS

The following tools are required to compile filter-it:

If you want to install globally on your system, you will need admin access, and should follow these instructions.

INSTALL GLOBALLY (YOU NEED ADMIN ACCESS)

Double-click the downloaded filter-it-1.0.0.tar.gz file to unpack it.

This will create a folder called 'filter-it-1.0.0'.

You now need to configure and compile filter-it. Run the following commands, one after the other (the install step may need sudo, as shown):

cd filter-it-1.0.0
cmake CMakeLists.txt
make
sudo make install
make clean

Typing filter-it -h in the Terminal should give the following help message.

ChrisMacbookPro:~ swain$ filter-it -h
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Filter-it v1.0.0 | Feb 18 2012 09:50:49

-> GCC:        4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)
-> Open Babel: 2.3.1

Copyright 2012 by Silicos-it, a division of Imacosi BVBA

Filter-it is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

Filter-it is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
 along with Filter-it.  If not, see http://www.gnu.org/licenses/.

Filter-it is linked against OpenBabel version 2.
OpenBabel is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


TASK: 

Filter-it is a tool to filter molecules from molecules.

USAGE 

 filter-it [options]

REQUIRED OPTIONS:

 --input=<file>
Specifies the file containing the input molecules. The format of the
file is specified by the file extension or, with higher priority, by the
optional --inputFormat=<format> option. Gzipped files are also processed.

 --filter=<file>
Specifies the file that defines the filter criteria.

OPTIONAL OPTIONS:

 --inputFormat=<format>
Specifies the format of the input file. The <format> argument is required.

 --pass[=file]
Specifies the file to which the molecules are written that pass the
filtering. The format of the file is specified by the file extension or,
with higher priority, by the --passFormat=<format> option. If the
extension is .gz, a compressed file will be written. The [file] argument
is optional; if not provided all output will be written to standard output
in a format defined by the --passFormat=<format> option.

 --passFormat=<format>
Specifies the format of the pass file. The <format> argument is required.

 --fail[=file]
Specifies the file to which the molecules are written that do not pass the
filter criteria. The format of the file is specified by the file extension
or, with higher priority, by the --failFormat=<format> option. If the
extension is .gz, a compressed file will be written. The [file] argument
is optional; if not provided all output will be written to standard output
in a format defined by the --failFormat=<format> option.

 --failFormat=<format>
Specifies the format of the fail file. The <format> argument is required.

 --tab[=file]
This flag directs the program to calculate all properties listed in
the filter definition file without applying any filtering step.
The calculated parameters are written to [file]. The [file] argument is
optional; if not provided all output is written to standard output.

 --salts
This flag directs the program not to strip away all salt fragments from the
molecules before the filtering takes place. By specifying this option,
this stripping is not performed and ensures that all salt counterions
are also taken into account during the filtering process.

 --rename
This flag directs the program to rename the title of each molecules
into a increasing digit reflecting the sequence of the molecule in
the input file. Existing titles are overwritten.

 --noLog
This flag specifies whether verbose logging should be switched off. When not
specified, then for each molecule a message is written to standard error
whether the molecule passes or fails the filter criteria. This behaviour
can be switched off with this command-line option. However, even
when this option is specified, information is still written to standard
error but reduced to a large extend.

 -h  --help

 -v  --version
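If you call filter-it from your own Python scripts, it can help to assemble the argument list from the options documented in the help text above. The following is a minimal sketch in plain Python 3, not tested inside Vortex; the function name build_filter_it_args and its parameters are hypothetical, and only options listed in the help output are used. Because --pass and --fail take an optional file argument (shown as [=file]), the sketch attaches file names with "=", which is what that notation suggests.

```python
def build_filter_it_args(input_file, filter_file, executable="filter-it",
                         tab=False, pass_file=None, fail_file=None,
                         keep_salts=False, no_log=False):
    """Hypothetical helper: build a filter-it argument list.

    Uses only the options shown in the filter-it help text.
    """
    args = [executable, "--input=" + input_file, "--filter=" + filter_file]
    if tab:
        # Calculate the properties only; the table goes to standard output.
        args.append("--tab")
    if pass_file is not None:
        # Optional-argument options need the file attached with '='.
        args.append("--pass=" + pass_file)
    if fail_file is not None:
        args.append("--fail=" + fail_file)
    if keep_salts:
        # --salts switches OFF the stripping of salt fragments.
        args.append("--salts")
    if no_log:
        # Reduce the per-molecule messages written to standard error.
        args.append("--noLog")
    return args
```

For example, build_filter_it_args("multi.sdf", "Leadlike.sieve", tab=True) returns ["filter-it", "--input=multi.sdf", "--filter=Leadlike.sieve", "--tab"].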

The downloaded file also contains several other folders, including example filter files. I created a folder called ‘Silicos’ in the ‘Applications’ folder and put all the various Silicos-it filters and related files in there. However you choose to organise things, you will need to know the location of both the filter-it executable and the filter files. Although filter-it is a powerful tool for filtering molecules on a range of parameters, in this script we are just going to use it to calculate a list of 45 different molecular properties. From the command line this would look like this:

/usr/local/bin/filter-it --input=multi.sdf --filter=/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve --tab

This uses multi.sdf as the input file and writes a tab-separated table of the calculated values to standard output.
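Outside Vortex, the same command can be run from an ordinary Python 3 script. The sketch below is a generic helper (the name run_and_capture is hypothetical) built on the standard subprocess.run function: it returns standard output as text and keeps standard error separate, since filter-it writes its per-molecule log there unless --noLog is given. It assumes Python 3.5 or later; Vortex's built-in Jython is a Python 2 implementation and does not provide subprocess.run.

```python
import subprocess


def run_and_capture(args):
    """Run a command and return its standard output as text.

    Standard error (filter-it's log messages) is captured separately so it
    cannot mix with the tab-separated table. A non-zero exit status raises
    subprocess.CalledProcessError.
    """
    result = subprocess.run(args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True,  # decode bytes to str
                            check=True)
    return result.stdout


# Example (paths as in the command above; adjust them to your own set-up):
# table = run_and_capture(["/usr/local/bin/filter-it", "--input=multi.sdf",
#     "--filter=/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve",
#     "--tab"])
```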

Vortex contains a powerful scripting facility built on Jython, a Java implementation of the Python programming language, which gives access to the key components of Vortex, Python and Java. The Vortex script that uses this application is shown below. The script starts by getting the path of the sdf file that was imported into Vortex, then constructs the filter-it command and captures its output in a variable, “output”. The output is then parsed, using \n to separate each line and \t to separate the values on each line. The first line contains the column names, and these are used to create the Vortex columns; the remaining lines contain the data, which is used to populate the table.

import sys
# Uses filter-it from Silicos-it (http://www.silicos-it.com/)
# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run filter-it on the file; equivalent to the command line
# /usr/local/bin/filter-it --input=<sdfFile> --filter=/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve --tab
p = subprocess.Popen(['/usr/local/bin/filter-it', '--input', sdfFile, '--filter', '/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve', '--tab'], stdout=subprocess.PIPE)
output = p.communicate()[0]

# Create new columns in table if needed (the first line holds the column names)
lines = output.split('\n')
colName = lines[0].split('\t')
for c in colName:
    column = vtable.findColumnWithName(c, 1)
vtable.fireTableStructureChanged()

# Collect the first field of any two-value lines (not used further below)
keys = []
for i in lines:
    words = i.split('\t')
    if len(words) == 2:
        keys.append(words[0])

# Parse the output: one line of tab-separated values per molecule
rows = lines[1:]
for r in range(0, vtable.getRealRowCount()):
    vals = rows[r].split('\t')
    for j in range(0, len(vals)):
        column = vtable.findColumnWithName(colName[j], 0)
        column.setValueFromString(r, vals[j])
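Because the parsing step relies only on string splitting, it can be checked outside Vortex. The following plain Python 3 sketch (the function name parse_tab_output is hypothetical) mirrors the split on \n and \t used in the script, but also skips empty lines, such as the one left after the final newline, and raises an error if a row does not have one value per column. Note that the Vortex script pairs output lines with table rows by position, so it assumes filter-it writes exactly one line per molecule, in the same order as the table.

```python
def parse_tab_output(text):
    """Split tab-separated output into (column_names, rows).

    The first non-empty line holds the column names; every following
    non-empty line holds the values for one molecule.
    """
    # splitlines() also copes with '\r\n'; empty lines are dropped.
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return [], []
    header = lines[0].split("\t")
    rows = [line.split("\t") for line in lines[1:]]
    for number, row in enumerate(rows, start=1):
        if len(row) != len(header):
            raise ValueError("row %d has %d values, expected %d"
                             % (number, len(row), len(header)))
    return header, rows
```

A list of dictionaries keyed by property name, for example [dict(zip(header, row)) for row in rows], can be handier if you only want to pick out a few of the properties.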

The image below shows the result. With so many columns you might want to change the order in which they are displayed, or hide some of them. To do this, click the icon in the top right corner of the table; this brings up the dialog box shown below. You can change the order of the columns by highlighting the column name(s) and moving them up or down the list with the arrow icons, or you can move columns to the “Hidden” list if you don’t want them displayed.

The scripts can be downloaded from here: sievecolumns.vpy.zip

For an improved version of this script, see Scripting Vortex 10 Interacting with the user.

The Vortex Scripts

Scripting Vortex Using OpenBabel
Scripting Vortex 2 Using filter-it
Scripting Vortex 3 Using cxcalc
Scripting Vortex 4 Using MOE
Scripting Vortex 5 Calculating similarities using OpenBabel
Scripting Vortex 6 Filtering compounds
Scripting Vortex 7 Using MayaChemTools
Scripting Vortex 8 Molecular Shape matching
Scripting Vortex 9 Getting a 2D depiction
Scripting Vortex 10 Interacting with the user
Scripting Vortex 11 Interacting with a web service
Scripting Vortex 12 JSON import
Scripting Vortex 13 Using OpenBabel fastsearch
Other Hints, Tips and Tutorials

Updated 18 February 2012