Scripting Vortex 2
This is the second page on scripting Vortex. On the first page I described how to use OpenBabel to calculate a limited selection of chemical properties. In this script we will use one of the brilliant tools from Silicos: filter-it, a program for filtering out molecules with unwanted properties (this program used to be known as sieve). It is based on the Open Babel open source C++ API for rapid calculation of molecular properties. The program comes with a number of pre-programmed molecular properties that can be used for filtering. These properties include, amongst others:
- Physicochemical parameters, such as logP, topological polar surface area criteria, number of hydrogen bond acceptors and donors, and Lipinski’s rule-of-five.
- Graph-based properties, including ring-based parameters and rotatable bond criteria.
- Selection criteria by means of SMARTS patterns.
- Similarity criteria.
- Three-dimensional distances between user-definable fragments.
filter-it is a command line-driven program that is instructed by means of command line options and a user-definable filter file. It is by means of this ‘filter’ file that the user can define the actual filter criteria to be used.
filter-it can be downloaded from here. To install it, follow the instructions below.
INTRODUCTION AND REQUIREMENTS
The following tools are required to compile filter-it:
- The latest version of the OpenBabel source code (at least version 2.3)
- A C++ compiler (like g++)
- A makefile system (like GNU make)
- CMake system
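Before starting the build it can help to confirm that the command-line tools in this list are actually on your PATH. The sketch below is one possible way to check, assuming a standard Python 3 interpreter run outside Vortex; the helper name `missing_tools` is made up for illustration, and the check does not cover the OpenBabel source code itself.

```python
import shutil

# Tool names taken from the requirements list above; swap "g++" for
# whichever C++ compiler you actually use.
REQUIRED_TOOLS = ["g++", "make", "cmake"]


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All required build tools were found.")
```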
If you want to install globally on your system, you will need admin access, and should follow these instructions.
INSTALL GLOBALLY (YOU NEED ADMIN ACCESS)
Double-click the downloaded filter-it-1.0.0.tar.gz file.
This will create a folder called 'filter-it-1.0.0'.
You now need to configure and compile filter-it. Run the following commands, one after the other:
cd filter-it-1.0.0
cmake CMakeLists.txt
make
make install (you may need to use sudo)
make clean
Typing filter-it -h in the Terminal should give the following help message.
ChrisMacbookPro:~ swain$ filter-it -h
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Filter-it v1.0.0 | Feb 18 2012 09:50:49
  -> GCC: 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)
  -> Open Babel: 2.3.1
Copyright 2012 by Silicos-it, a division of Imacosi BVBA
Filter-it is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Filter-it is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with Filter-it. If not, see http://www.gnu.org/licenses/.
Filter-it is linked against OpenBabel version 2.
OpenBabel is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
TASK:
  Filter-it is a tool to filter molecules from molecules.
USAGE
  filter-it [options]
REQUIRED OPTIONS:
  --input=<file>
    Specifies the file containing the input molecules. The format of the
    file is specified by the file extension or, with higher priority, by
    the optional --inputFormat=<format> option. Gzipped files are also
    processed.
  --filter=<file>
    Specifies the file that defines the filter criteria.
OPTIONAL OPTIONS:
  --inputFormat=<format>
    Specifies the format of the input file. The <format> argument is
    required.
  --pass[=file]
    Specifies the file to which the molecules are written that pass the
    filtering. The format of the file is specified by the file extension
    or, with higher priority, by the --passFormat=<format> option.
    If the extension is .gz, a compressed file will be written. The [file]
    argument is optional; if not provided all output will be written to
    standard output in a format defined by the --passFormat=<format> option.
  --passFormat=<format>
    Specifies the format of the pass file. The <format> argument is
    required.
  --fail[=file]
    Specifies the file to which the molecules are written that do not pass
    the filter criteria. The format of the file is specified by the file
    extension or, with higher priority, by the --failFormat=<format> option.
    If the extension is .gz, a compressed file will be written. The [file]
    argument is optional; if not provided all output will be written to
    standard output in a format defined by the --failFormat=<format> option.
  --failFormat=<format>
    Specifies the format of the fail file. The <format> argument is
    required.
  --tab[=file]
    This flag directs the program to calculate all properties listed in the
    filter definition file without applying any filtering step. The
    calculated parameters are written to [file]. The [file] argument is
    optional; if not provided all output is written to standard output.
  --salts
    This flag directs the program not to strip away all salt fragments from
    the molecules before the filtering takes place. By specifying this
    option, this stripping is not performed and ensures that all salt
    counterions are also taken into account during the filtering process.
  --rename
    This flag directs the program to rename the title of each molecules into
    a increasing digit reflecting the sequence of the molecule in the input
    file. Existing titles are overwritten.
  --noLog
    This flag specifies whether verbose logging should be switched off. When
    not specified, then for each molecule a message is written to standard
    error whether the molecule passes or fails the filter criteria. This
    behaviour can be switched off with this command-line option.
    However, even when this option is specified, information is still
    written to standard error but reduced to a large extend.
  -h --help
  -v --version
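The options in the help message can be assembled into an argument list before being handed to a subprocess call. The following is a minimal sketch, not part of filter-it or Vortex: the function name `build_filter_it_command` and its parameters are made up for illustration, and only the `--input=`, `--filter=`, `--tab` and `--noLog` options documented above are used.

```python
def build_filter_it_command(executable, input_file, filter_file,
                            tab=True, no_log=False):
    """Build a filter-it argument list from the documented options.

    `executable` is the path to the filter-it binary; `input_file` and
    `filter_file` are passed through the two required options.
    """
    cmd = [executable,
           "--input=" + input_file,
           "--filter=" + filter_file]
    if tab:
        # Calculate the properties only, without filtering.
        cmd.append("--tab")
    if no_log:
        # Reduce the per-molecule messages written to standard error.
        cmd.append("--noLog")
    return cmd


# Paths as used elsewhere on this page; adjust them to your own installation.
example = build_filter_it_command(
    "/usr/local/bin/filter-it",
    "multi.sdf",
    "/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve")
print(" ".join(example))
```

Keeping the arguments in a list, rather than one long string, avoids quoting problems when a path contains spaces.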
The downloaded file also contains several other folders, including example filter files. I created a folder called ‘Silicos’ in the ‘Applications’ folder and put all the various Silicos filters etc. in there. However you choose to organise it, you will need to know the location of both the filter-it executable and the filter files. Whilst filter-it is a powerful tool for filtering molecules on various parameters, for this script we are just going to use it to calculate a list of 45 different molecular properties. If you did this from the command line it would look like this:
/usr/local/bin/filter-it --input=multi.sdf --filter=/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve --tab
This uses multi.sdf as the input file and outputs a tab-separated list of values to stdout.
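Capturing what a command writes to stdout from Python can be sketched as below. This is an illustration under stated assumptions rather than the Vortex script itself: the helper name `run_and_capture` is hypothetical, and the demonstration runs the Python interpreter as a stand-in command so that it works on a machine without filter-it installed.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run `cmd` (a list of arguments) and return (stdout_text, return_code)."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = p.communicate()  # waits for the process to finish
    return out, p.returncode


# Stand-in command that prints one tab-separated line, in place of filter-it.
out, rc = run_and_capture([sys.executable, "-c", "print('NAME\\tVALUE')"])
print(repr(out), rc)
```

To use it with filter-it, pass the argument list for the command line shown above in place of the stand-in command. Checking the return code before parsing the output is a cheap way to notice a wrong path or a missing filter file.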
Vortex contains a powerful scripting facility built on Jython, a Java implementation of the Python programming language, which allows access to the key components of Vortex, Python and Java. The Vortex script to use this application is shown below. The script starts by getting the path of the SDF file that was imported into Vortex; it then constructs the filter-it command and pipes the output into a variable “output”. The output is then parsed, using \n to separate each line and \t to separate each value on each line. The first line contains the column names, which are used to populate the Vortex columns; the other lines contain the data, which is used to populate the table.
import sys
import subprocess

# Uses filter-it from silicos (http://www.silicos-it.com/)

# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run filter-it on the file
# /usr/local/bin/filter-it --input=sdfFile --filter=/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve --tab
p = subprocess.Popen(['/usr/local/bin/filter-it',
                      '--input=' + sdfFile,
                      '--filter=/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve',
                      '--tab'],
                     stdout=subprocess.PIPE)
# communicate() returns (stdout, stderr); we only want stdout
output = p.communicate()[0]

# Create new columns in table if needed
lines = output.split('\n')
colName = lines[0].split('\t')
for c in colName:
    column = vtable.findColumnWithName(c, 1)
vtable.fireTableStructureChanged()

# Collect any two-field lines (not used further in this script)
keys = []
for i in lines:
    words = i.split('\t')
    if len(words) == 2:
        keys.append(words)

# Parse the output: the first line is the header, the rest are data rows
rows = lines[1:len(lines)]
for r in range(0, vtable.getRealRowCount()):
    vals = rows[r].split('\t')
    for j in range(0, len(vals)):
        column = vtable.findColumnWithName(colName[j], 0)
        column.setValueFromString(r, vals[j])
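The parsing step in the script above can be tried outside Vortex on a small piece of text. The sketch below assumes standard Python and uses made-up sample data: the column names `NAME`, `LOGP` and `TPSA` and the values are hypothetical, not real filter-it output, and `parse_tab_output` is an illustrative helper rather than a Vortex function.

```python
def parse_tab_output(text):
    """Split tab-separated text into (header, rows).

    The first non-empty line is treated as the column names; every
    following non-empty line becomes one list of string values.
    """
    lines = [ln for ln in text.split("\n") if ln.strip()]
    if not lines:
        return [], []
    header = lines[0].split("\t")
    rows = [ln.split("\t") for ln in lines[1:]]
    return header, rows


# Hypothetical sample, shaped like a header line followed by data lines.
sample = "NAME\tLOGP\tTPSA\nmol1\t2.1\t45.3\nmol2\t3.4\t20.2\n"
header, rows = parse_tab_output(sample)
for row in rows:
    # Pair each value with its column name, as the Vortex script does by index.
    print(dict(zip(header, row)))
```

Dropping empty lines before splitting avoids the trailing empty entry that a final newline would otherwise leave in the list of rows.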
The image below shows the result.
With so many columns you might want to change the order in which they are displayed or not actually display some of them. To do this click the icon in the top right corner of the table, this will bring up the dialog box shown below. You can change the order of the columns by moving them up or down the list by highlighting the column(s) name and using the arrow icons. Or you can move columns to the “Hidden” list if you don’t want them to display.
The scripts can be downloaded from here. sievecolumns.vpy.zip
For an improved version of this script see Scripting Vortex 10 Interacting with the user
The Vortex Scripts
Scripting Vortex Using OpenBabel
Scripting Vortex 2 Using filter-it
Scripting Vortex 3 Using cxcalc
Scripting Vortex 4 Using MOE
Scripting Vortex 5 Calculating similarities using OpenBabel
Scripting Vortex 6 Filtering compounds
Scripting Vortex 7 Using MayaChemTools
Scripting Vortex 8 Molecular Shape matching
Scripting Vortex 9 Getting a 2D depiction
Scripting Vortex 10 Interacting with the user
Scripting Vortex 11 Interacting with a web service
Scripting Vortex 12 JSON import
Scripting Vortex 13 Using OpenBabel fastsearch
Other Hints, Tips and Tutorials
Updated 18 February 2012