Macs in Chemistry

Insanely great science

 

A Pan Assay Interference Compounds (PAINS) Filter for filter-it

Jonathan B. Baell and Georgina A. Holloway have published a very interesting paper analysing frequent hitters from screening assays (DOI: 10.1021/jm901137j).

This report describes a number of substructural features which can help to identify compounds that appear as frequent hitters (promiscuous compounds) in many biochemical high throughput screens. The compounds identified by such substructural features are not recognised by filters commonly used to identify reactive compounds. Even though these substructural features were identified using only one assay detection technology, such compounds have been reported to be active from many different assays. In fact, these compounds are increasingly prevalent in the literature as potential starting points for further exploration, whereas they may not be.

In the supplementary information they provided the corresponding filters in Sybyl Line Notation (SLN) format. Unfortunately I don't use SYBYL, so I needed them in SMARTS format for use with filter-it.

Two tools proved invaluable in the conversion: SMARTSViewer from the University of Hamburg, and the Xemistry Web Sketcher front-end to the CACTVS cheminformatics toolkit. Rajarshi Guha had already done the majority of the conversion; I went through it line by line, comparing the SLN and SMARTS, and made a few minor edits. I am also grateful to Wolf Ihlenfeldt at Xemistry for helping to sort out a few rather challenging issues.

Creating the sieve file

Filter-it™ is a command-line program for filtering molecules with unwanted properties out of a set of molecules. The program comes with a number of pre-programmed molecular properties that can be used for filtering, as well as several sieve files. The online manual provides details of the sieve file format.

Fragment rules are specifications to define limits on the presence of user-definable molecular substructures within the input molecules. These substructures are defined by means of a SMARTS pattern, and the user can put limits on the number of occurrences for these particular fragments in each molecule.

The general syntax of a fragment rule is:

FRAGMENT name smarts minimum maximum

FRAGMENT is the required keyword to specify a fragment rule. The name field specifies a user-definable name for the fragment, and the smarts field specifies the substructure in SMARTS terminology. The minimum and maximum limits define the allowed number of occurrences of the specific fragment in the molecule.

Thus, to filter out all ester functionalities, you would use a fragment definition like this:

FRAGMENT ester [O;X2;H0][C;X3]=O 0 0
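To make the minimum/maximum semantics concrete, here is a minimal Python sketch (not part of filter-it) that parses a FRAGMENT line and checks a match count against its limits. The names `FragmentRule`, `parse_fragment_rule` and `passes` are hypothetical. The sketch assumes the five fields are separated by whitespace, and the match count itself would have to come from a SMARTS matcher, which is not included here.

```python
from typing import NamedTuple


class FragmentRule(NamedTuple):
    """One FRAGMENT rule: name, SMARTS pattern and allowed occurrence range."""
    name: str
    smarts: str
    minimum: int
    maximum: int


def parse_fragment_rule(line: str) -> FragmentRule:
    """Parse 'FRAGMENT name smarts minimum maximum' (assumed whitespace-separated)."""
    fields = line.split()
    if len(fields) != 5 or fields[0] != "FRAGMENT":
        raise ValueError(f"not a FRAGMENT rule: {line!r}")
    _, name, smarts, lo, hi = fields
    return FragmentRule(name, smarts, int(lo), int(hi))


def passes(rule: FragmentRule, n_matches: int) -> bool:
    """A molecule passes if its match count lies within [minimum, maximum]."""
    return rule.minimum <= n_matches <= rule.maximum


ester = parse_fragment_rule("FRAGMENT ester [O;X2;H0][C;X3]=O 0 0")
print(passes(ester, 0))  # True: no ester group, so the molecule is kept
print(passes(ester, 1))  # False: one ester group exceeds the maximum of 0
```

With limits of `0 0`, any occurrence at all rejects the molecule, which is exactly how the PAINS rules below are written.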

Below I’ve shown the first few lines of the file:

#*************************************************************************************
#                                                                                     
# Sieve definitions derived from the PAINS                                            
#    Created by www.macinchem.org based on http://blog.rguha.net/?p=850               
#  New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS)   
# from Screening Libraries and for Their Exclusion in Bioassays                       
# All FILTER FAMILIES                           
#    DOI: 10.1021/jm901137j                                      
#  Tested on sieve v3.0.0                                                             
#*************************************************************************************

FRAGMENT regId=ene_six_het_A(483) [#6]-1(-[#6](~[!#6&!#1]~[#6]-[!#6&!#1]-[#6]-1=[!#6&!#1])~[!#6&!#1])=[#6;!R]-[#1] 0 0
FRAGMENT regId=hzone_phenol_A(479) c:1:c:c(:c(:c:c:1)-[#6]=[#7]-[#7])-[#8]-[#1] 0 0
FRAGMENT regId=anil_di_alk_A(478) [#6](-[#1])(-[#1])-[#7](-[#6](-[#1])-[#1])-c:1:c:c(:c(:c(:c:1)-[$([#1]),$([#6](-[#1])-[#1]),$([#8]-[#6](-[#1])(-[#1])-[#6](-[#1])-[#1])])-[#7])-[#1] 0 0
FRAGMENT regId=indol_3yl_alk(461) n:1(c(c(c:2:c:1:c:c:c:c:2-[#1])-[#6;X4]-[#1])-[$([#6](-[#1])-[#1]),$([#6]=,:[!#6&!#1]),$([#6](-[#1])-[#7]),$([#6](-[#1])(-[#6](-[#1])-[#1])-[#6](-[#1])(-[#1])-[#7](-[#1])-[#6](-[#1])-[#1])])-[$([#1]),$([#6](-[#1])-[#1])] 0 0
FRAGMENT regId=quinone_A(370) [!#6&!#1]=[#6]-1-[#6]=,:[#6]-[#6](=[!#6&!#1])-[#6]=,:[#6]-1 0 0
FRAGMENT regId=azo_A(324) [#7;!R]=[#7] 0 0
FRAGMENT regId=imine_one_A(321) [#6]-[#6](=[!#6&!#1;!R])-[#6](=[!#6&!#1;!R])-[$([#6]),$([#16](=[#8])=[#8])] 0 0
FRAGMENT regId=mannich_A(296) [#7]-[#6;X4]-c:1:c:c:c:c:c:1-[#8]-[#1] 0 0
FRAGMENT regId=anil_di_alk_B(251) c:1:c:c(:c:c:c:1-[#7](-[#6;X4])-[#6;X4])-[#6]=[#6] 0 0
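With long SMARTS strings like these it is easy for a stray space to creep in when copying and pasting. Assuming (as the FRAGMENT syntax suggests) that the fields are whitespace-separated, such a space silently splits one rule into too many fields. The following hypothetical `check_sieve` helper is a minimal Python sketch for catching that; it is not a filter-it feature, and it only checks field counts, not SMARTS validity.

```python
def check_sieve(text: str) -> list[tuple[int, str]]:
    """Report FRAGMENT lines that do not split into exactly five fields.

    Assumes fields are whitespace-separated, so a stray space inside a
    SMARTS string shows up as an extra field.
    """
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment banner
        fields = line.split()
        if fields[0] == "FRAGMENT" and len(fields) != 5:
            problems.append((lineno, f"expected 5 fields, found {len(fields)}"))
    return problems


sieve_text = """\
# toy sieve: the second rule has a space inside its SMARTS
FRAGMENT regId=azo_A(324) [#7;!R]=[#7] 0 0
FRAGMENT broken_example [#6]-[$  ([#1])] 0 0
"""
print(check_sieve(sieve_text))  # [(3, 'expected 5 fields, found 6')]
```

Running it over the whole sieve file before use should flag any damaged rule by line number.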

Using Filter-it

Details for installing filter-it are described in the Scripting Vortex 2 tutorial. In particular, you need to note where the filter (.sieve) files are stored.

Filter-it can be used from the command line, and the general syntax is something like:

  /usr/local/bin/filter-it --input=sdfFile --filter='/Applications/Silicos/filter-it-1.0.0/filters/Leadlike.sieve' --tab

In this particular case we use:

 /usr/local/bin/filter-it --input='/Users/swain/Desktop/Testing/test.sdf' --pass='/Users/swain/Desktop/filterit.sdf' --filter='/Applications/Silicos/filter-it-1.0.0/filters/PAINS.sieve' --passFormat=sdf
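If you want to drive filter-it from a script rather than typing the command, a minimal Python sketch might look like this. It only reuses the options shown in the command above (`--input`, `--pass`, `--filter`, `--passFormat`); the helper name `filter_it_command` is hypothetical, and the paths are those from this example, so adjust them to your own installation.

```python
import shlex
import subprocess
from pathlib import Path


def filter_it_command(input_sdf, pass_sdf, sieve, exe="/usr/local/bin/filter-it"):
    """Build the argument list for the command above: read input_sdf, apply
    the sieve, and write passing molecules to pass_sdf in SD format."""
    return [
        exe,
        f"--input={input_sdf}",
        f"--pass={pass_sdf}",
        f"--filter={sieve}",
        "--passFormat=sdf",
    ]


cmd = filter_it_command(
    "/Users/swain/Desktop/Testing/test.sdf",
    "/Users/swain/Desktop/filterit.sdf",
    "/Applications/Silicos/filter-it-1.0.0/filters/PAINS.sieve",
)
print(shlex.join(cmd))  # the equivalent shell command line, properly quoted

# Only try to run it if filter-it is actually installed at that path
if Path(cmd[0]).exists():
    subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids the shell altogether, so the quoting around paths that is easy to get wrong on the command line is no longer an issue.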

You will be able to monitor progress in the terminal, and there will be a brief summary of the numbers of compounds that pass or fail. In addition, compounds that pass the filter are stored in the file filterit.sdf on the desktop. Alternatively, if you want details of which rule caused a particular compound to fail, use:

/usr/local/bin/filter-it --input='/Users/swain/Desktop/Testing/test.sdf' --filter='/Applications/Silicos/filter-it-1.0.0/filters/PAINS.sieve' --tab > '/Users/swain/Desktop/filteritTab.txt'

The results for every compound are written in tabulated form to the text file filteritTab.txt, in the format shown below:

NAME regId=ene_six_het_A(483) regId=hzone_phenol_A(479) regId=anil_di_alk_A(478) regId=indol_3yl_alk(461) regId=quinone_A(370) regId=azo_A(324) regId=imine_one_A(321) regId=mannich_A(296) regId=anil_di_alk_B(251) regId=anil_di_alk_C(246) etc.
Compound1 0 0 0 0 0 0 0 0 0….
Compound2 0 0 0 0 0 0 0 0 0….
Compound3 0 0 0 0 0 0 0 0 0….
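To pick out which rules flagged which compounds from this table, something like the following minimal Python sketch could be used. It assumes the `--tab` output is tab-separated, with a header row starting with NAME followed by one column per rule, and a numeric value in each cell; it treats any nonzero value as the rule having matched. The helper name `flagged_rules` is hypothetical, and the two-compound input is made-up sample data.

```python
import csv
import io


def flagged_rules(tab_text: str) -> dict[str, list[str]]:
    """Map each compound name to the rules with a nonzero value in its row.

    Assumes tab-separated text: header 'NAME<TAB>rule1<TAB>rule2...',
    then one row per compound with numeric cells.
    """
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    rule_names = next(reader)[1:]  # drop the NAME column heading
    result = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, values = row[0], row[1:]
        result[name] = [rule for rule, v in zip(rule_names, values) if float(v) != 0]
    return result


tab_sample = (
    "NAME\tregId=ene_six_het_A(483)\tregId=azo_A(324)\n"
    "Compound1\t0\t0\n"
    "Compound2\t0\t1\n"
)
print(flagged_rules(tab_sample))
# {'Compound1': [], 'Compound2': ['regId=azo_A(324)']}
```

For the real output, read the text of filteritTab.txt (for example with pathlib) and pass it in; if the columns turn out to be separated by spaces rather than tabs, change the delimiter accordingly.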

Of course, if you are not comfortable using the command line, iBabel provides a GUI.


Sieve files

You can download the sieve files from http://macinchem.org/reviews/pains/Archive.zip

The zip archive contains four files: one for each of the three families of filters defined in the publication, plus a fourth file that combines all the filters.

Filter Family A (“FreqHit5morethan150.hits”), Filter Family B (“FreqHit5lessthan150.hits”), Filter Family C (“FreqHit5_lessthan15.hits”), All (“All hits”).