Macs in Chemistry

Insanely great science

 

Lilly MedChem Rules

In late 2012 Robert Bruns and Ian Watson published a paper entitled “Rules for Identifying Potentially Reactive or Promiscuous Compounds”.

This article describes a set of 275 rules, developed over an 18-year period, used to identify compounds that may interfere with biological assays, allowing their removal from screening sets. Reasons for rejection include reactivity (e.g., acyl halides), interference with assay measurements (fluorescence, absorbance, quenching), activities that damage proteins (oxidizers, detergents), instability (e.g., latent aldehydes), and lack of druggability (e.g., compounds lacking both oxygen and nitrogen). The structural queries were profiled for frequency of occurrence in druglike and nondruglike compound sets and were extensively reviewed by a panel of experienced medicinal chemists. As a means of profiling the rules and as a filter in its own right, an index of biological promiscuity was developed. The 584 gene targets with screening data at Lilly were assigned to 17 subfamilies, and the number of subfamilies at which a compound was active was used as a promiscuity index. For certain compounds, promiscuous activity disappeared after sample repurification, indicating interference from occult contaminants. Because this type of interference is not amenable to substructure search, a “nuisance list” was developed to flag interfering compounds that passed the substructure rules.

Installation

The code to implement these rules was kindly made available by Ian Watson on GitHub. Unfortunately my initial attempts to compile it failed, but Matt was able to provide a patch that lets it compile under Mac OSX (Mavericks) using Clang. Whilst this would have been sufficient, he then went the extra step and made it available via Homebrew.

If you have not yet set up your machine to use Homebrew, I urge you to have a look at Cheminformatics on a Mac; if you follow the instructions there, installation becomes very easy.

brew update
brew install --HEAD lilly-medchem-rules

Since many of the links in the Lilly MedChem Rules script are hard-coded and the script is intended to be run from the source directory, this is a “keg-only” installation; that is, Homebrew does not link the script into /usr/local/bin.

Use of Lilly MedChem Rules

The README file contains full details, but the normal invocation will be of the form:

Lilly_Medchem_Rules.rb input.smi > okmedchem.smi

Depending on how your PATH is set in your bash startup file, the safest way is probably to use full paths.

/usr/local/Cellar/lilly-medchem-rules/HEAD/Lilly_Medchem_Rules.rb /Users/username/Desktop/ReactiveTest.smi > /Users/username/Desktop/Temp/okmedchem.smi
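A minimal sketch of the same full-path invocation wrapped in a small shell script, so that the paths only need editing in one place. The Cellar path and the Desktop locations come from the example above; the variable names (RULES, INPUT, OUTDIR, STATUS) and the existence checks are additions for illustration and are not part of the Lilly distribution.

```shell
#!/bin/sh
# Paths from the example above; adjust to your own install and files.
RULES="/usr/local/Cellar/lilly-medchem-rules/HEAD/Lilly_Medchem_Rules.rb"
INPUT="$HOME/Desktop/ReactiveTest.smi"
OUTDIR="$HOME/Desktop/Temp"

# Only run if both the script and the input file can be found.
if [ -f "$RULES" ] && [ -f "$INPUT" ]; then
    mkdir -p "$OUTDIR"
    "$RULES" "$INPUT" > "$OUTDIR/okmedchem.smi"
    STATUS="ran"
else
    echo "Cannot find $RULES or $INPUT; check the paths" >&2
    STATUS="skipped"
fi
```

If the script or the input file is missing, nothing is run and a message is written to standard error instead.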

This will result in a number of files being produced.

okmedchem.smi
CC(C)(C)C(=O)C#N 3,3-dimethyl-2-oxobutanenitrile : D(80) norings:vinylcyanohet
ClC1=CC(=O)CC1 3-Chloro-2-cyclopenten-1-one
BrC1=CC=C(N)C=C1 4-bromoaniline : D(44) anilinehewd:bromine
ClC1=CC2=C(C=C1)N(C)C(=O)CN=C2C1=CC=CC=C1 7-Chloro-1-methyl-5-phenyl-1,3-dihydro-2H-1,4-benzodiazepin-2-one
C(N1N=NN=C1N)C=C 74999-22-7
C1(=CC=CC=C1)OC Anisole
C12CCCCC1O2 Cyclohexeneoxide : D(50) het3memringfused
C(C)OP(=O)(OCC)C#N DiethylCyanophosphonate : D(30) norings
C1=C(CN2C=NC=N2)C=CC2=C1C(=CN2)CCN(C)C Rizatriptan
C(=O)(O)C1=CC=CC=C1 benzoicacid
C1(=O)CCCCC1 cyclohexanone
C1(=CC=CC=C1)NC1=CC=CC=C1 diphenylamine : D(50) anilinehnewd
COC(=O)C1=CC=C(C=C1)N(=O)=O methyl4-nitrobenzoate : D(95) nitro:ester
C1(=CC=C(C=C1)S(=O)(=O)O)C p-Toluenesulfonicacid : D(40) sulfonicacid

The first token on each line is the SMILES, followed by the molecule name (whatever was in the input file). Some molecules pass through the rules unchanged, accruing no demerits, for example Anisole in the file above. Molecules that have passed but attracted demerits have them shown in the form D(nn), followed by the reason(s) for the demerits. If you don't care about the demerits associated with passing molecules, invoke the script with -noapdm and the demerit information will not be appended.
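As a rough sketch of how the demerit annotations might be used downstream, the following shell snippet separates molecules with no demerits from those whose D(nn) score reaches a cut-off. The sample records are copied from the okmedchem.smi listing above; the file names sample_ok.smi, clean.smi and flagged.txt and the threshold of 50 are assumptions made purely for illustration. It also assumes one record per line and molecule names without spaces.

```shell
#!/bin/sh
# Sample records taken from the okmedchem.smi listing above.
cat > sample_ok.smi <<'EOF'
CC(C)(C)C(=O)C#N 3,3-dimethyl-2-oxobutanenitrile : D(80) norings:vinylcyanohet
C1(=CC=CC=C1)OC Anisole
C(C)OP(=O)(OCC)C#N DiethylCyanophosphonate : D(30) norings
C(=O)(O)C1=CC=CC=C1 benzoicacid
COC(=O)C1=CC=C(C=C1)N(=O)=O methyl4-nitrobenzoate : D(95) nitro:ester
EOF

THRESHOLD=50   # hypothetical cut-off, chosen only for this example

# Molecules that passed with no demerits: lines lacking a " : D(nn)" annotation.
grep -v ' : D([0-9][0-9]*)' sample_ok.smi > clean.smi

# Molecules at or above the threshold, written as "score name",
# highest score first.
awk -v t="$THRESHOLD" '
    match($0, / : D\([0-9]+\)/) {
        # The digits start 5 characters into the match " : D(nn)".
        score = substr($0, RSTART + 5, RLENGTH - 6) + 0
        if (score >= t) print score, $2
    }' sample_ok.smi | sort -rn > flagged.txt

cat flagged.txt
```

With the sample data, clean.smi holds the Anisole and benzoicacid records, and flagged.txt lists the two molecules scoring 80 and 95.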

You should also check the file ok0.log after execution for evidence of failed SMILES interpretation.

Rejected structures and the reason for rejection are in the bad*.smi files as shown below.

bad0.smi
C1=CC=CC=C1 Benzene TP1 notenoughatoms
C(Br)C1=CC=CC=C1 Benzylbromide TP1 nointerestingatoms
Cl[Si](C)(C)C Chloro(trimethyl)silane TP1 notenoughatoms
C1(=CC=CC=C1)C#C Phenylacetylene TP1 nointerestingatoms
C(Br)C=C allylbromide TP1 notenoughatoms
ClP(C)C chloro(dimethyl)phosphine TP1 notenoughatoms
CS(=O)C dimethylsulphoxide TP1 notenoughatoms
IC1=CC=CC=C1 iodobenzene TP1 nointerestingatoms
O1CCCC1 tetrahydrofuran TP1 notenoughatoms
C1(=CC=CC=C1)P(C1=CC=CC=C1)C1=CC=CC=C1 triphenylphosphine TP1 nointerestingatoms

bad1.smi
CCCC(=O)OC1=CC=C(C=C1)N(=O)=O 4-Nitrophenylbutyrate (1 matches to 'phenolicesterorcarbamate')
C(=O)(Br)C1=CC=CC=C1 BENZOYLBROMIDE (1 matches to 'acidhalide')
C1(=CC=CC=C1)N=C=NC1=CC=CC=C1 N,N'-Diphenylcarbodiimide (1 matches to 'allene')
CN=C(Br)C1=CC=CC=C1 N-Methylbenzimidoylbromide (1 matches to 'haloimine')
C1(=CC=CC=C1)N=C(Cl)C1=CC=CC=C1 N-Phenylbenzenecarboximidoylchloride (1 matches to 'haloimine')
C1(=CC=CC=C1)N=C=O Phenylisocyanate (1 matches to 'allene')
C(=O)(Cl)C1=CC=CC=C1 benzoylchloride (1 matches to 'acidhalide')
C1(=O)OC(=O)C=C1 maleicanhydride (2 matches to 'michaelrejected')

bad2.smi
C(C)OC(=O)CC[N+](C)(C)C 3-Ethoxy-N,N,N-trimethyl-3-oxo-1-propanaminium (1 matches to 'reversemichael')
C(=O)(C1=CC=CC=C1)OOC(=O)C1=CC=CC=C1 Benzoylperoxide (1 matches to 'peroxide')
C1=CC=CC=C1C(O)C#N Mandelonitrile (1 matches to 'cyanohydrin')
C[N+](C)(C)CCC(=O)C N,N,N-Trimethyl-3-oxo-1-butanaminium (1 matches to 'reversemichael')
C(=O)(C)OC(=O)C aceticanhydride (1 matches to 'anhydride')
ClS(=O)(=O)C1=CC=CC=C1 chlorophenylsulfone (1 matches to 'sulfonylhalide')
IC1=CC=CC=C1 iodosobenzene (1 matches to '3valenthalogen')
COS(=O)(=O)C1=CC=C(C)C=C1 methyltosylate (1 matches to 'sulfonylester')
N(=O)C1=CC=CC=C1 nitrosobenzene (1 matches to 'nitroso')

bad3.smi
C1=C(C(=O)NN)C=CN=C1 Isoniazid : D(125) N-N:hydrazide_acyclic
C1(=CC=CC=C1)N=N#N Phenylazide : D(100) N=N
C(C1=CC=CC=C1)N=N#N benzylazide : D(100) N=N
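To get a quick overview of why structures were rejected, the reasons can be tallied with standard shell tools. The sketch below is only an illustration: it uses a handful of records copied from the bad1.smi listing above, and the file names sample_bad1.smi and reason_counts.txt are made up for the example. It assumes the "(N matches to 'reason')" layout seen in bad1.smi and bad2.smi.

```shell
#!/bin/sh
# Sample records taken from the bad1.smi listing above.
cat > sample_bad1.smi <<'EOF'
CCCC(=O)OC1=CC=C(C=C1)N(=O)=O 4-Nitrophenylbutyrate (1 matches to 'phenolicesterorcarbamate')
C1(=CC=CC=C1)N=C=NC1=CC=CC=C1 N,N'-Diphenylcarbodiimide (1 matches to 'allene')
CN=C(Br)C1=CC=CC=C1 N-Methylbenzimidoylbromide (1 matches to 'haloimine')
C1(=CC=CC=C1)N=C=O Phenylisocyanate (1 matches to 'allene')
C(=O)(Cl)C1=CC=CC=C1 benzoylchloride (1 matches to 'acidhalide')
EOF

# Pull out the quoted reason from each line, count how often each occurs,
# and list the most frequent reason first (ties broken alphabetically).
sed -n "s/.*matches to '\([^']*\)').*/\1/p" sample_bad1.smi |
    sort | uniq -c | awk '{print $1, $2}' |
    sort -k1,1nr -k2,2 > reason_counts.txt

cat reason_counts.txt
```

The same pipeline could be pointed at the real bad1.smi and bad2.smi files. The bad0.smi and bad3.smi listings above use different layouts (a TP1 tag and a D(nn) annotation respectively), so they would need their own patterns.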