Macs in Chemistry

Insanely great science


MedChem Wizard KNIME workflow, provided by Dr. Alastair Donald


KNIME, the Konstanz Information Miner, is a visual platform for graphically building and editing workflows and data analysis pipelines from defined components called nodes. KNIME is developed by Prof. Michael Berthold at the University of Konstanz, Germany. It is built on the Eclipse integrated development environment, is written in Java, and is free to download, with versions available for Mac OS X, Windows and Linux. KNIME has rapidly become an invaluable cheminformatics workflow tool, with many companies, institutions and individuals contributing nodes.

The MedChemWizard is a KNIME workflow designed to assist medicinal chemists with idea generation, ligand design and lead optimization using a number of common functional group transformations and medchem rules of thumb. It integrates with BioSolveIT’s SeeSAR tool to provide an estimated binding affinity and a predicted binding mode for each new ligand, allowing the user to interactively inspect the quality of the new ideas and to select those of greatest interest. The workflow is intended to help medicinal chemistry teams avoid spending time synthesizing and testing poor (low potency) compounds, and to facilitate scaffold hopping/morphing into new chemical and IP space. Together, these factors should help improve the overall odds of project success. An outline of what the Wizard accomplishes is shown below, using Viagra as an example, with one example given per type of transformation (highlighted in red).


The idea for the workflow started with the notion of implementing a design tool for annotating, docking and scoring diamine ring structures. Not only was this something that I had personally done a number of times as a chemist (as shown below in the discovery of CHR-3996 and DOI, DOI and DOI), but it is also a widely used scaffold hopping strategy, often seen in the medchem literature because of the reliable and diverse reactivity of diamine building blocks.


With that in mind, a workflow was constructed with the intention that it would be a powerful but simple-to-use tool (especially for KNIME non-experts): essentially, draw in a structure and then click “Execute all”. Once functional, the workflow was expanded to encompass a number of other medchem techniques such as methylene shuffling, bio-isosteric replacements, formation of new ring systems, strategies used to overcome common metabolism problems, etc. The majority of transformations are coded as SMARTS patterns and are fully customizable and extendable. It is my hope that users will contribute to further refine these rules and transformations over the coming months and years to capture the knowledge and experience of the medchem community.
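For readers who plan to add their own transformations, here is a minimal sketch of how a custom rule table might be sanity-checked before it is pasted into the workflow. It assumes the rules are written as reaction SMARTS of the form `reactant>>product` with numbered atom maps. The `Transformation` class, the `check_rule` helper, the class labels and the three example rules are all hypothetical and are not taken from the MedChemWizard itself. The check is purely textual; actually applying a rule to a molecule needs a cheminformatics toolkit and is not shown.

```python
import re
from dataclasses import dataclass

# Matches atom-map labels such as ":1" at the end of a bracketed atom, e.g. "[c:1]".
ATOM_MAP = re.compile(r":(\d+)\]")


@dataclass(frozen=True)
class Transformation:
    """One hypothetical rule: a label, a class and a reaction SMARTS string."""
    name: str
    rule_class: str
    reaction_smarts: str


def check_rule(rule):
    """Return a list of textual problems found in the rule (empty if none)."""
    parts = rule.reaction_smarts.split(">>")
    if len(parts) != 2 or not all(p.strip() for p in parts):
        return ["expected exactly one '>>' separating reactant and product"]
    reactant_maps = set(ATOM_MAP.findall(parts[0]))
    product_maps = set(ATOM_MAP.findall(parts[1]))
    problems = []
    if not reactant_maps:
        problems.append("no mapped atoms on the reactant side")
    # Mapped product atoms with no reactant partner are often a typo.
    unmatched = sorted(product_maps - reactant_maps, key=int)
    if unmatched:
        problems.append(
            "product map numbers missing from reactant: " + ", ".join(unmatched)
        )
    return problems


# Illustrative rules only; the third one has a deliberately mistyped arrow.
RULES = [
    Transformation("methyl to trifluoromethyl", "Metabolism",
                   "[c:1][CH3]>>[c:1]C(F)(F)F"),
    Transformation("aromatic CH to N", "Bioisostere", "[cH:1]>>[n:1]"),
    Transformation("mistyped arrow", "Bioisostere", "[c:1][OH]>[c:1]F"),
]

for rule in RULES:
    problems = check_rule(rule)
    status = "ok" if not problems else "; ".join(problems)
    print(f"{rule.name} ({rule.rule_class}): {status}")
```

A check like this only catches formatting slips. Whether a rule matches the intended substructure still has to be confirmed by running it on a few test molecules.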


Prior to running, the workflow requires some user set-up and input. Docking, scoring and visualizing the predicted binding modes for new molecules requires installation of BioSolveIT’s suite of KNIME nodes, available from our KNIME update site. Installation instructions are included in the workflow download ZIP package.

Ligand preparation

Once the setup process is complete, the user needs to draw in the ligand that they want to modify. In this example I will be using a fragment-style molecule (shown below) active against AKT2 (Protein Kinase B beta isoform, UniProt ID P31751). To enter the ligand, double click the MarvinSketch node to open it, draw in the ligand and click OK.


Receptor Preparation

In order to dock molecules, a protein binding site must be defined. This is done using either the “Prepare Receptor with LeadIT” node or the “Project Reader (FXX)” node (for users who have a standalone installation of LeadIT). The receptor preparation wizard makes generating the binding site very easy. In this example I use structure 4gv1, which is AKT2 complexed with AstraZeneca’s inhibitor AZD5363. All default values are used in receptor preparation: double click on the “Prepare Receptor with LeadIT” node, select “Create new project”, and in the newly opened LeadIT window choose to use the PDB server, type 4gv1 in the box and hit enter. Click the green forward arrow a couple of times, then select AZD5363 (OXZ-501-A) as the “Reference Ligand”. The display window will then highlight the receptor amino acids that define the binding site. Click the green forward arrow a couple more times, then finish the process by clicking on the chequered flag.

Adding a pharmacophore to speed docking and scoring

In order to increase the accuracy of the docking solutions, a hinge-binding pharmacophore constraint was added to the example as shown below. Adding a pharmacophore feature reduces the number of docking poses generated per molecule, speeding up subsequent steps.

Pasted Graphic

To use a pharmacophore in the MedChemWizard docking protocol, the user must first define it: go to the “Docking” menu in the LeadIT receptor preparation session, select “Pharmacophore” and then “Define”. Click on hdon to highlight all the available H bond donors in the active site, then select the correct one (ALA-230) by clicking on it (this is easiest with all protein surfaces hidden). Then switch to H bond acceptors by clicking on hacc, and select GLU-228. Click OK to accept this pharmacophore. By default these pharmacophore features are added as essential; this can be changed using the dialogs further down the window. The pharmacophore features must be enabled before they will be used: go to the “Docking” menu again, select the “Define FlexX Docking” submenu, open the pharmacophore dropdown, select the pharmacophore just defined (usually “Pharm (1)”), then click OK. To exit and return to KNIME, go to the LeadIT menu, select “Exit & Return to KNIME”, then “Save Changes”. CTRL-click on OK in the KNIME window to close it and execute the receptor preparation steps. For a video tutorial on creating a binding site with LeadIT, visit our YouTube channel.

Molecule filtering

After enumerating new ideas, the workflow automatically removes molecules that most medicinal chemists consider “undesirable”, such as pan-assay interference compounds (PAINS) and molecules with substructures flagged as being responsible for adverse toxicological outcomes. If your workflow is not generating many output structures, it could be that many are being filtered off in these steps. This can be checked by looking at the bottom output ports of the three metanodes (Substructure filter, PAINS filter and Tox filter), which show the molecules each filter removed.

Pasted Graphic 1
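The two-port behaviour of the three filter metanodes can be mimicked outside KNIME to see which stage is discarding ideas. The following is a rough sketch, not the workflow's implementation: `split_rows` and `run_filters` are hypothetical helpers, and the three predicates are toy substring checks on SMILES strings chosen only for illustration. The real metanodes run proper substructure searches against their own pattern sets.

```python
def split_rows(rows, is_flagged):
    """Split rows into (kept, removed), like a filter node with two output ports."""
    kept, removed = [], []
    for row in rows:
        (removed if is_flagged(row) else kept).append(row)
    return kept, removed


def run_filters(rows, filters):
    """Apply the filters in order; return survivors and the rows each one removed."""
    removed_by = {}
    for name, is_flagged in filters:
        rows, removed_by[name] = split_rows(rows, is_flagged)
    return rows, removed_by


# Toy stand-ins for the three metanodes (assumed patterns, NOT the real ones).
FILTERS = [
    ("Substructure filter", lambda r: "[N+](=O)[O-]" in r["smiles"]),
    ("PAINS filter", lambda r: "N=N" in r["smiles"]),
    ("Tox filter", lambda r: "C(=O)Cl" in r["smiles"]),
]

ideas = [
    {"name": "idea-1", "smiles": "c1ccccc1N=Nc1ccccc1"},
    {"name": "idea-2", "smiles": "CC(=O)Nc1ccccc1"},
    {"name": "idea-3", "smiles": "CC(=O)Cl"},
]

survivors, removed_by = run_filters(ideas, FILTERS)
for name, rows in removed_by.items():
    print(f"{name}: removed {len(rows)}")
print("kept:", [row["name"] for row in survivors])
```

Keeping the removed rows per stage, rather than just a count, makes it easy to inspect why a promising idea disappeared and to decide whether a filter is too strict for the project.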

Selecting a subset for docking and scoring

The default behaviour is to dock and score all the new ideas, which may take some time! For my example, the workflow generates 594 new structures. If you have limited computational power, or wish to focus on a specific class of product rather than dock and score the entire set, a subset of ligands can be created using the “Nominal Value Row Filter” node. Simply double click on this node, select the column you want to base the filtering on (as shown below with the column “Class”), and choose which of its values to include.

Pasted Graphic 2
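In plain code, filtering on a nominal column amounts to keeping the rows whose value is in a chosen set. The sketch below assumes each idea is a row with a “Class” column, as in the node dialog. The `filter_by_class` helper and the class labels are hypothetical, since the actual labels depend on the transformations in the workflow.

```python
from collections import Counter


def filter_by_class(rows, column, wanted):
    """Keep only the rows whose value in `column` is one of the wanted values."""
    wanted = set(wanted)
    return [row for row in rows if row.get(column) in wanted]


# Made-up rows; the class labels are placeholders for illustration.
ideas = [
    {"name": "idea-1", "Class": "Bioisostere"},
    {"name": "idea-2", "Class": "Ring formation"},
    {"name": "idea-3", "Class": "Bioisostere"},
    {"name": "idea-4", "Class": "Methylene shuffle"},
]

# Count ideas per class first, to judge how large each subset would be.
print(Counter(row["Class"] for row in ideas))

subset = filter_by_class(ideas, "Class", ["Bioisostere"])
print([row["name"] for row in subset])
```

Counting the ideas per class before filtering gives a quick estimate of how much docking time each subset would need.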

Once the docking is complete, the molecules are rescored and ranked using the HYDE scoring function (DOI). A new SeeSAR window opens listing the 100 top-scoring molecules. The original input molecule is listed by name as “INPUT”, making it easy to tell which molecules are predicted to be more potent (if it does not appear on the list, then all the top 100 results are predicted to be more potent!). By default, the workflow ignores docked poses that result in unusual conformers (flagged red by SeeSAR). All results, including red-flagged poses, can be shown by connecting the top output port of the “Rescoring!” node rather than the bottom port.
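The ranking logic described above can be sketched in a few lines. This is an illustration under stated assumptions, not how SeeSAR or the workflow does it: it assumes each molecule carries a single estimated affinity in nM (lower meaning more potent) and one red-flag marker, whereas the real tool works per pose. The field names `est_affinity_nM` and `red_flag` and both helper functions are hypothetical.

```python
def rank_ideas(rows, top_n=100, include_flagged=False):
    """Sort by estimated affinity (lower = more potent) and keep the best top_n."""
    usable = [r for r in rows if include_flagged or not r["red_flag"]]
    ranked = sorted(usable, key=lambda r: r["est_affinity_nM"])
    return ranked[:top_n]


def input_rank(ranked, input_name="INPUT"):
    """Return the 1-based position of the input molecule, or None if it missed the cut."""
    for position, row in enumerate(ranked, start=1):
        if row["name"] == input_name:
            return position
    return None


# Made-up scores for illustration only.
results = [
    {"name": "INPUT", "est_affinity_nM": 850.0, "red_flag": False},
    {"name": "idea-7", "est_affinity_nM": 40.0, "red_flag": False},
    {"name": "idea-12", "est_affinity_nM": 5.0, "red_flag": True},
    {"name": "idea-3", "est_affinity_nM": 2300.0, "red_flag": False},
]

default_view = rank_ideas(results)
print([row["name"] for row in default_view], input_rank(default_view))

all_poses = rank_ideas(results, include_flagged=True)
print([row["name"] for row in all_poses], input_rank(all_poses))
```

Every molecule ranked above “INPUT” is predicted to be more potent than the starting point. A result of `None` from `input_rank` corresponds to the case where the input does not make the top list at all.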

Executing the entire workflow

Once the ligand has been drawn and the protein binding site has been defined, one click on the “Execute all executable nodes” button (>>) sets the entire workflow in motion. I hope that you find the workflow useful – feedback on bugs/improvements/success stories would be very much appreciated!

The workflow can be downloaded from the BioSolveIT website, and there is a webinar on Jan 28.

In this one-hour BioSolveIT webinar you will learn how to fast-track your fragment hit discovery with our cutting-edge in silico solutions, using Protein Kinase B as an example. By the end you will appreciate how our suite of SBDD tools can enrich your work life, enhance your productivity and assist in the identification of potent, selective and synthetically accessible lead-like molecules.

Last updated 21 January 2015.