Lilly MedChem Rules
In late 2012 Robert Bruns and Ian Watson published a paper entitled “Rules for Identifying Potentially Reactive or Promiscuous Compounds”.
This article describes a set of 275 rules, developed over an 18-year period, used to identify compounds that may interfere with biological assays, allowing their removal from screening sets. Reasons for rejection include reactivity (e.g., acyl halides), interference with assay measurements (fluorescence, absorbance, quenching), activities that damage proteins (oxidizers, detergents), instability (e.g., latent aldehydes), and lack of druggability (e.g., compounds lacking both oxygen and nitrogen). The structural queries were profiled for frequency of occurrence in druglike and nondruglike compound sets and were extensively reviewed by a panel of experienced medicinal chemists. As a means of profiling the rules and as a filter in its own right, an index of biological promiscuity was developed. The 584 gene targets with screening data at Lilly were assigned to 17 subfamilies, and the number of subfamilies at which a compound was active was used as a promiscuity index. For certain compounds, promiscuous activity disappeared after sample repurification, indicating interference from occult contaminants. Because this type of interference is not amenable to substructure search, a “nuisance list” was developed to flag interfering compounds that passed the substructure rules.
The code to implement these rules was kindly made available by Ian Watson on GitHub. Unfortunately, my initial attempts to compile it failed, but Matt was able to provide a patch that compiles under Mac OS X (Mavericks) using Clang. That alone would have been sufficient, but he then went the extra step and made it available via Homebrew.
If you have not yet set up your machine to use Homebrew, I urge you to have a look at Cheminformatics on a Mac; if you follow the instructions there, installation becomes very easy.
brew update
brew install --HEAD lilly-medchem-rules
Since many of the links in the Lilly MedChem Rules script are hard-coded and the script is intended to be run from the source directory, this is a “keg-only” installation.
Use of Lilly MedChem Rules
The README file contains full details, but the normal invocation will be of the form:
Lilly_Medchem_Rules.rb input.smi > okmedchem.smi
Depending on what you have in your bash profile, the safest way is probably to use full paths:
/usr/local/Cellar/lilly-medchem-rules/HEAD/Lilly_Medchem_Rules.rb /Users/username/Desktop/ReactiveTest.smi > /Users/username/Desktop/Temp/okmedchem.smi
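For scripted use, the same full-path invocation can be wrapped in a few lines of Python. This is only a sketch: `build_command` and `run_rules` are hypothetical helper names, not part of the Lilly code, and it assumes the script is executable at the path you pass in.

```python
import os
import subprocess


def build_command(script, input_smi):
    """Return the argument list, with both paths made absolute."""
    return [
        os.path.abspath(os.path.expanduser(script)),
        os.path.abspath(os.path.expanduser(input_smi)),
    ]


def run_rules(script, input_smi, output_smi):
    """Run the rules script and redirect its standard output to output_smi.

    Returns the exit status of the script.
    """
    command = build_command(script, input_smi)
    with open(os.path.expanduser(output_smi), "w") as out:
        completed = subprocess.run(command, stdout=out, check=False)
    return completed.returncode
```

Calling `run_rules` with the Cellar path of `Lilly_Medchem_Rules.rb`, the input file and the output file shown above should be equivalent to the shell command, with the passing molecules written to the output file.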
Running the script produces a number of files. Molecules that pass the rules are written to okmedchem.smi:
CC(C)(C)C(=O)C#N 3,3-dimethyl-2-oxobutanenitrile : D(80) norings:vinylcyanohet
BrC1=CC=C(N)C=C1 4-bromoaniline : D(44) anilinehewd:bromine
C12CCCCC1O2 Cyclohexeneoxide : D(50) het3memringfused
C(C)OP(=O)(OCC)C#N DiethylCyanophosphonate : D(30) norings
C1(=CC=CC=C1)NC1=CC=CC=C1 diphenylamine : D(50) anilinehnewd
COC(=O)C1=CC=C(C=C1)N(=O)=O methyl4-nitrobenzoate : D(95) nitro:ester
C1(=CC=C(C=C1)S(=O)(=O)O)C p-Toluenesulfonicacid : D(40) sulfonicacid
The first token on each line is the SMILES, followed by the molecule name (whatever was in the input file). Some molecules will pass unchanged through the rules, accruing no demerits, for example Anisole in the file above. Other molecules that have passed but have attracted demerits will have the demerits shown in the form D(nn) above, followed by the reason(s) for the demerits. If you don't care about the demerits associated with passing molecules, invoke the script with -noapdm and the demerit information will not be appended.
You should also check the file ok0.log after execution to see evidence of failed SMILES interpretation.
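If you want to post-process the passing molecules, the line layout described above is easy to pick apart. The following is a minimal sketch, not part of the Lilly code: `Passed` and `parse_passed_line` are hypothetical names, and the assumption that multiple reasons are separated by colons is inferred only from the sample output shown here.

```python
import re
from typing import NamedTuple

# Assumed layout, inferred from the sample output only:
#   <SMILES> <name> : D(<demerits>) <reason>[:<reason>...]
_DEMERIT_RE = re.compile(
    r"^(?P<name>.*?)\s*:\s*D\((?P<demerits>\d+)\)\s*(?P<reasons>\S*)\s*$"
)


class Passed(NamedTuple):
    smiles: str
    name: str
    demerits: int
    reasons: list


def parse_passed_line(line):
    """Split one line of the passing-molecule file into its parts."""
    parts = line.split(None, 1)
    if not parts:
        raise ValueError("empty line")
    smiles = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    match = _DEMERIT_RE.match(rest)
    if match is None:
        # No D(nn) field: the molecule passed without demerits.
        return Passed(smiles, rest, 0, [])
    reasons = [r for r in match.group("reasons").split(":") if r]
    return Passed(smiles, match.group("name"), int(match.group("demerits")), reasons)
```

For the methyl 4-nitrobenzoate line above this gives 95 demerits with the reasons `nitro` and `ester`; a line without a D(nn) field is reported with zero demerits and an empty reason list.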
Rejected structures and the reason for rejection are in the bad*.smi files as shown below.
C1=CC=CC=C1 Benzene TP1 notenoughatoms
C(Br)C1=CC=CC=C1 Benzylbromide TP1 nointerestingatoms
Cl[Si](C)(C)C Chloro(trimethyl)silane TP1 notenoughatoms
C1(=CC=CC=C1)C#C Phenylacetylene TP1 nointerestingatoms
C(Br)C=C allylbromide TP1 notenoughatoms
ClP(C)C chloro(dimethyl)phosphine TP1 notenoughatoms
CS(=O)C dimethylsulphoxide TP1 notenoughatoms
IC1=CC=CC=C1 iodobenzene TP1 nointerestingatoms
O1CCCC1 tetrahydrofuran TP1 notenoughatoms
C1(=CC=CC=C1)P(C1=CC=CC=C1)C1=CC=CC=C1 triphenylphosphine TP1 nointerestingatoms
CCCC(=O)OC1=CC=C(C=C1)N(=O)=O 4-Nitrophenylbutyrate (1 matches to 'phenolicesterorcarbamate')
C(=O)(Br)C1=CC=CC=C1 BENZOYLBROMIDE (1 matches to 'acidhalide')
C1(=CC=CC=C1)N=C=NC1=CC=CC=C1 N,N'-Diphenylcarbodiimide (1 matches to 'allene')
CN=C(Br)C1=CC=CC=C1 N-Methylbenzimidoylbromide (1 matches to 'haloimine')
C1(=CC=CC=C1)N=C(Cl)C1=CC=CC=C1 N-Phenylbenzenecarboximidoylchloride (1 matches to 'haloimine')
C1(=CC=CC=C1)N=C=O Phenylisocyanate (1 matches to 'allene')
C(=O)(Cl)C1=CC=CC=C1 benzoylchloride (1 matches to 'acidhalide')
C1(=O)OC(=O)C=C1 maleicanhydride (2 matches to 'michaelrejected')
C(C)OC(=O)CC[N+](C)(C)C 3-Ethoxy-N,N,N-trimethyl-3-oxo-1-propanaminium (1 matches to 'reversemichael')
C(=O)(C1=CC=CC=C1)OOC(=O)C1=CC=CC=C1 Benzoylperoxide (1 matches to 'peroxide')
C1=CC=CC=C1C(O)C#N Mandelonitrile (1 matches to 'cyanohydrin')
C[N+](C)(C)CCC(=O)C N,N,N-Trimethyl-3-oxo-1-butanaminium (1 matches to 'reversemichael')
C(=O)(C)OC(=O)C aceticanhydride (1 matches to 'anhydride')
ClS(=O)(=O)C1=CC=CC=C1 chlorophenylsulfone (1 matches to 'sulfonylhalide')
IC1=CC=CC=C1 iodosobenzene (1 matches to '3valenthalogen')
COS(=O)(=O)C1=CC=C(C)C=C1 methyltosylate (1 matches to 'sulfonylester')
N(=O)C1=CC=CC=C1 nitrosobenzene (1 matches to 'nitroso')
C1=C(C(=O)NN)C=CN=C1 Isoniazid : D(125) N-N:hydrazide_acyclic
C1(=CC=CC=C1)N=N#N Phenylazide : D(100) N=N
C(C1=CC=CC=C1)N=N#N benzylazide : D(100) N=N
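To see at a glance why molecules were rejected, the reasons in the bad*.smi files can be tallied. The sketch below is an illustration under stated assumptions rather than anything shipped with the rules: the function names are hypothetical, and the three line shapes it recognises (`TP1 reason`, `(n matches to 'query')` and `: D(nn) reasons`) are simply the ones visible in the listing above.

```python
import re
from collections import Counter

# The three line endings seen in the sample bad*.smi output.
_MATCHES_RE = re.compile(r"\((\d+) matches to '([^']+)'\)\s*$")
_TP1_RE = re.compile(r"\sTP1\s+(\S+)\s*$")
_DEMERITS_RE = re.compile(r":\s*D\((\d+)\)\s*(\S+)\s*$")


def rejection_reasons(line):
    """Return the list of rejection reasons found at the end of one line."""
    line = line.strip()
    match = _MATCHES_RE.search(line)
    if match:
        return [match.group(2)]
    match = _TP1_RE.search(line)
    if match:
        return [match.group(1)]
    match = _DEMERITS_RE.search(line)
    if match:
        # Assumption: several reasons are joined with colons.
        return match.group(2).split(":")
    return []


def tally_rejections(lines):
    """Count how often each rejection reason occurs across the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(rejection_reasons(line))
    return counts
```

Feeding every line of the bad*.smi files to `tally_rejections` and calling `most_common()` on the result lists the most frequent reasons first.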