Macs in Chemistry

Insanely Great Science

drug discovery

Scaffold Hunter update


Scaffold Hunter is a chemical data organization and analysis tool that has been continuously enhanced since the start of its development in 2007. The platform-independent open-source tool was first released in 2009 and provided an interactive visualisation of the so-called scaffold tree, which is a hierarchical classification scheme for molecules based on their common scaffolds. A recent publication describes extensions that significantly increase its applicability to a variety of tasks DOI.

When I first opened the application I did not find it particularly intuitive; fortunately there is an online tutorial, and sample datasets are available.


Accessing ZINC supplier information


ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 100 million purchasable compounds in ready-to-dock, 3D formats. Sterling and Irwin, J. Chem. Inf. Model, 2015. This is an invaluable resource for any type of virtual screening or for anyone looking to create a physical screening or fragment collection.

Once you have done the virtual screening you will rapidly realise that the really time-consuming and tedious part now lies ahead: finding out which vendors stock a particular molecule and then ordering it. Looking up the vendor details for individual compounds is extremely tedious, so this Vortex script may be very useful.

Many more scripts, iPython notebooks and tutorials can be found here.


ChEMBL 21 released


The release of ChEMBL_21 has been announced. This version of the database was prepared on 1st February 2016 and contains:

  • 1,929,473 compound records
  • 1,592,191 compounds (of which 1,583,897 have mol files)
  • 13,968,617 activities
  • 1,212,831 assays
  • 11,019 targets
  • 62,502 source documents


Data can be downloaded from the ChEMBL FTP site or viewed via the ChEMBL interface.

Please see ChEMBL_21 release notes for full details of all changes in this release.


Flagging potential aggregators in Vortex


Promiscuous inhibition caused by small molecule aggregation is a major source of false positive results in high-throughput screening. A recent and particularly valuable publication, Irwin, Duan, Torosyan, Doak, Ziebart, Sterling, Tumanian and Shoichet, J Med Chem, 2015, 58(17), 7076-7087 DOI, has collated over 12,000 organic molecules known to act as aggregators at concentrations used in screening campaigns, and provides a resource, Aggregation Advisor, that can be used to try to predict possible false positives. However, in many instances it would be unwise to submit proprietary information to the public web service. Potential aggregators are flagged based on calculated LogP >3 and/or similarity >0.85 to a known aggregator (using a path-based fingerprint). This script calculates xLogP using the algorithm provided by Dotmatics and then uses OpenBabel fast search to find the closest similarity to a known aggregator.

Full details of the Vortex script are here.
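
The flagging rule described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not the Vortex script itself: the function names are made up, fingerprints are assumed to be sets of "on" bit indices, and similarity is assumed to be Tanimoto. The real script uses Dotmatics xLogP and OpenBabel fast search for these values.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def flag_aggregator(logp, fp, known_fps, logp_cutoff=3.0, sim_cutoff=0.85):
    """Flag a molecule if LogP > 3 and/or it is > 0.85 similar to a known aggregator.

    logp      : precomputed LogP value (hypothetical input, e.g. xLogP)
    fp        : fingerprint of the query molecule (set of bit indices)
    known_fps : iterable of fingerprints of known aggregators
    """
    best = max((tanimoto(fp, known) for known in known_fps), default=0.0)
    reasons = []
    if logp > logp_cutoff:
        reasons.append("logP")
    if best > sim_cutoff:
        reasons.append("similarity")
    return {"flagged": bool(reasons), "reasons": reasons, "max_similarity": best}


if __name__ == "__main__":
    known = [{1, 2, 3, 4}, {10, 11, 12}]
    print(flag_aggregator(3.5, {1, 2, 3}, known))
    print(flag_aggregator(2.0, {10, 11, 12}, known))
```

Returning the reasons alongside the flag makes it easy to see whether a compound was caught by lipophilicity, by similarity, or by both.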



Importing Open Source Malaria Project data


The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date a lot of people around the world have made contributions and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, it is often spread over multiple web pages and can be tricky to link molecule with identifier with data. Over the last couple of months a significant effort has been put into populating a spreadsheet with all the information.

Whilst this is useful for viewing results it is not ideal for trying to build predictive models. Vortex is a chemically intelligent data analysis and visualisation platform. This script provides one-click access to the OSM data and creates a workspace containing all the data, and since it is linked to the live spreadsheet you will always have access to the latest data.



Installing Open Drug Discovery Toolkit (ODDT)


A recent paper in J Cheminformatics, "Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field" DOI, describes a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. Open Drug Discovery Toolkit is released under a permissive 3-clause BSD license for both academic and industrial use. ODDT’s source code, additional examples and documentation are available on GitHub.

To install ODDT on a Mac you first need to install the appropriate toolkits. The easiest way is to use Homebrew, and I've written a page detailing how to do this here.

Once the toolkits are installed you can install ODDT using pip as described here.


ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery


Caleydo is an open source visual analysis framework targeted at biomolecular data. It has been described in a number of publications and I noticed that a recent project ConTour included chemical structures.

Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets.

Christian Partl, Alexander Lex, Marc Streit, Hendrik Strobelt, Anne-Mai Wassermann, Hanspeter Pfister, Dieter Schmalstieg ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery IEEE Transactions on Visualization and Computer Graphics (VAST '14), to appear, 2014.

I’ve added Caleydo to the listing of data analysis tools.


CLC Chemistry Workbench updated


The CLC Drug Discovery Workbench has been updated to version 1.5.

The new features are described on the updates page and in a video.


Open Phacts API update


The OpenPhacts API has been updated to include two new data sets and the corresponding API calls.

1) DisGeNet target-disease associations. These API calls use URI inputs that correspond to either diseases or targets (proteins or genes). The disease identifiers correspond to UMLS CUIs, MeSH IDs or ConceptWiki and can use several namespaces.

2) neXtProt nanopublications for tissue expression (PREVIEW mode). These API calls use URIs that correspond to either tissues or targets. The tissue identifiers correspond to the Caloha tissue ontology from neXtProt. These identifiers can use either the namespace from the neXtProt database (will be operational next week) or the Caloha ontology (operational now).

To reduce the barriers to drug discovery in industry, academia and for small businesses, the Open PHACTS Discovery Platform provides tools and services to interact with multiple integrated and publicly available data sources. To integrate this data, extensive cross-referencing of scientific concepts is needed across all databases.


Medicinal Chemistry Toolkit


The Medicinal Chemistry Toolkit app is a suite of resources to support the day to day work of a medicinal chemist. Based on the experiences of medicinal chemistry experts, we developed otherwise difficult-to-access tools in a portable format for use in meetings, on the move and in the lab. The app is optimised for iPad and contains calculator functions designed to ease the process of calculating values of: Cheng-Prusoff; Dose to man; Gibbs free energy to binding constant; Maximum absorbable dose calculator; Potency shift due to plasma protein binding.
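
Two of these calculators rest on textbook relationships that are easy to sketch. The following is an illustration only, not the app's implementation: the function names are made up, the Cheng-Prusoff form shown is the one for a competitive inhibitor, Ki = IC50 / (1 + [S]/Km), and the free energy conversion uses ΔG = RT ln(Kd) with Kd in mol/L and an assumed default temperature of 298.15 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Ki from IC50 for a competitive inhibitor (all three in the same units)."""
    return ic50 / (1.0 + substrate_conc / km)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL * temp_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal, temp_k=298.15):
    """Dissociation constant in mol/L from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


if __name__ == "__main__":
    # IC50 of 100 nM measured with substrate at its Km gives Ki = 50 nM.
    print(cheng_prusoff_ki(100.0, 10.0, 10.0))
    # A 1 nM binder corresponds to roughly -12.3 kcal/mol at 25 °C.
    print(round(delta_g_from_kd(1e-9), 2))
```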

The app has been designed in collaboration with the editors of the forthcoming book, The Handbook of Medicinal Chemistry: Principles and Practice, which will be published in November 2014 providing a comprehensive, everyday resource for a practicing medicinal chemist throughout the drug development process and is an ideal companion for the biannual MedChem Summer school run by the RSC.



SeeSAR 1.4


SeeSAR from BioSolveIT has just been updated. SeeSAR is intended as an interactive tool for designing and improving ligands for drug discovery. This update includes an option to highlight the neighbouring atoms that lead to a particular HYDE score: in the example, the carbon in the ring that gets a pretty big red (unfavourable) score can be explained by the receptor desolvation penalty ascribed to the carbonyl oxygen. Other additions are stereo hardware support (as a first step supporting the polarized-glass type only), a screenshot feature, and an option to move labels out of the way for a better view.


There is a review of SeeSAR here


CLC Drug Discovery Workbench


The latest beta of CLC Drug Discovery Workbench, v1.5 beta 4, is available for download. The workbench provides an integrated environment for drug discovery, with tools to explore and visualise protein targets and the ligands binding to them:



  • Molecule 3D structure import: Mol2, SDF, PDB
  • Direct download of PDB structures from NCBI
  • Quick-style options including ball-n-sticks and molecular surfaces
  • Custom visualization applied to selected atoms
  • Save molecule visualizations on data
  • Molecule tables with 2D depiction of molecules


  • Generate molecule 3D structure from SMILES or 2D representation*
  • Automatic assignment of atom and bond properties
  • Automatic binding site setup
  • Chemical consistency check
  • Lipinski’s rule of five check


  • Binding pocket finder
  • Easy, graphical protein target setup
  • Fast track molecular docking
  • Optimize ligand interactions in binding site
  • Virtual screening
  • Ligand binding inspection
  • Calculate molecular properties
  • Protein structure and binding site alignment
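
The Lipinski rule of five check in the list above is simple enough to sketch. This is a minimal illustration and not how the workbench implements it: the function names are made up, the four descriptors are assumed to be calculated elsewhere, and allowing at most one violation is a common convention used here as an assumption.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the list of rule-of-five criteria that a compound breaks."""
    violations = []
    if mol_weight > 500:
        violations.append("molecular weight > 500")
    if logp > 5:
        violations.append("logP > 5")
    if h_donors > 5:
        violations.append("H-bond donors > 5")
    if h_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """True if the compound has no more than max_violations violations (assumed default: 1)."""
    found = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return len(found) <= max_violations


if __name__ == "__main__":
    # Illustrative descriptor values for a small, drug-like molecule.
    print(lipinski_violations(180.2, 1.2, 1, 4))
    # A large, lipophilic compound breaking two criteria.
    print(lipinski_violations(800.0, 6.0, 2, 5))
```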



DataWarrior is a data analysis tool that understands chemistry.

DataWarrior combines dynamic graphical views and interactive row filtering with chemical intelligence. Scatter plots, box plots, bar charts and pie charts not only visualize numerical or category data, but also show trends of multiple scaffolds or compound substitution patterns. Chemical descriptors encode various aspects of chemical structures, e.g. the chemical graph, chemical functionality from a synthetic chemist’s point of view or 3-dimensional pharmacophore features. These allow for fundamentally different types of molecular similarity measures, which can be applied for many purposes including row filtering and the customization of graphical views. DataWarrior supports the enumeration of combinatorial libraries as well as the creation of evolutionary libraries. Compounds can be clustered and diverse subsets can be picked. Calculated compound similarities can be used for multidimensional scaling methods, e.g. Kohonen nets. Physicochemical properties can be calculated, structure activity relationship tables can be created and activity cliffs can be visualized.



Installing ACPC on a Mac


One of the advantages of using a Mac for science is that you can often make use of the UNIX underpinnings of Mac OSX to access programs written for Linux.

A recent publication in Journal of Cheminformatics caught my eye. Screening of molecules using electrostatics is usually a very time-consuming process, but this publication describes an interesting and very quick way to screen molecules.

A rotation-translation invariant molecular descriptor of partial charges and its use in ligand-based virtual screening Francois Berenger, Arnout Voet, Xiao Yin Lee and Kam YJ Zhang Journal of Cheminformatics 2014, 6:23 doi

I’ve written instructions for how to install ACPC under Mac OSX.


A Review of Forge V10.2 on the New MacPro


Now that I have my new MacPro I thought it might be interesting to try out a couple of the software packages that I’ve previously reviewed. ForgeV10 allows the scientist to use Cresset’s proprietary electrostatic and physicochemical fields to align, score and compare diverse molecules. It allows the user to build field based pharmacophores to understand structure activity and then use the template to undertake a virtual screen to identify novel scaffolds. I’ve previously reviewed ForgeV10 and FieldAlign, as it was formerly known, so I’m going to focus on the support for multiple processors and a few of the new features.

Read the review here


There is a compilation of software reviews here


Bringing Open Source to Drug Discovery


I gave a talk at the RSC 25th Symposium on Medicinal Chemistry in Eastern England meeting last week entitled “Bringing Open Source to Drug Discovery”.

The slides and pages of links are available here.

I also captured the laptop screen of the demo which I’ve now put on YouTube.

The aim was to show which open source tools are available and how they can be integrated into proprietary tools using scripting. Many of the scripts are available on the hints and tutorials page.


Porting of BUDE (Bristol University Docking Engine) to OpenCL.


A recent publication “High Performance in silico Virtual Drug Screening on Many-Core Processors” DOI describes porting BUDE (Bristol University Docking Engine) to OpenCL.

Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single NVIDIA GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from NVIDIA and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets.
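
To put the quoted figures in context, the peak throughput they imply can be worked out directly. This is just arithmetic on the two numbers in the abstract; the function name is made up for illustration.

```python
def implied_peak_tflops(sustained_tflops, fraction_of_peak):
    """Peak throughput implied by a sustained rate and the fraction of peak it represents."""
    if not 0.0 < fraction_of_peak <= 1.0:
        raise ValueError("fraction_of_peak must be in (0, 1]")
    return sustained_tflops / fraction_of_peak


if __name__ == "__main__":
    # 1.43 TFLOP/s sustained at 46% of peak implies a peak of about 3.1 TFLOP/s.
    print(round(implied_peak_tflops(1.43, 0.46), 2))
```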

BUDE is now one of the fastest HPC applications ever developed and nicely demonstrates the portability of OpenCL across different architectures.

There is a list of GPU accelerated applications here.


TB Mobile Updated


TB Mobile from Collaborative Drug Discovery has been updated.

TB Mobile makes available a set of molecules with activity against Mycobacterium tuberculosis, and known targets available in CDD. It links to pathways, genes and literature (PubMed).

The latest update adds support for iOS 7, and adds further compounds and molecular targets, so there are now 96 unique targets and 805 compounds. There is also major new functionality including interactive clustering, personal favourites, target prediction and exporting capabilities.



Schrodinger Small Molecule Drug Discovery Suite Updated


The Schrodinger Small Molecule Drug Discovery Suite was updated over the weekend. This is a major update that brings in a host of new features and improvements.

Maestro Graphical Interface

  • Improved flexible ligand superposition
  • Additional graphics settings
      • Real-time antialiasing
      • Real-time ambient occlusion, outlines, and cartoon shading effects
  • Multivariate ranking in the Project Table
      • Simultaneously maximize or minimize up to four property values, and rank entries based on the optimization
  • Date Created and Date Modified fields automatically generated in the Project Table
  • Workspace responsiveness of atom labels is up to 2.5x faster
  • Click and drag to rearrange atom, measurement, and adjustment labels in the Workspace
  • Support for bond labels
  • Installed scripts and Tools menu items now searchable in the Task Tree
  • Significant improvements to the Property Calculation interface in the project facility
      • Simultaneously calculate multiple properties
      • Additional 2D properties now available: AlogP, #Hbond acceptors, #HBond donors, #rotatable bonds, polar surface area, molar refractivity, and polarizability

Ligand Docking

  • Ligand efficiencies are now calculated from the DockingScore instead of the GlideScore
  • Generate per-residue interaction energies in Virtual Screening Workflow (VSW) for visualization
  • New server mode in Glide Ligand Designer enables near real-time interactive docking (Glide Ligand Designer Script)
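
The release notes do not spell out the formula, but ligand efficiency is commonly defined as a score divided by the number of heavy (non-hydrogen) atoms, so the choice of score feeds straight into the result. A minimal sketch under that assumption follows; the function name and the example values are illustrative and not taken from Glide.

```python
def ligand_efficiency(score, heavy_atom_count):
    """Score per heavy atom (assumed definition: score / number of non-hydrogen atoms)."""
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")
    return score / heavy_atom_count


if __name__ == "__main__":
    # The same ligand gives a different efficiency depending on which score is used.
    heavy_atoms = 30
    print(ligand_efficiency(-9.0, heavy_atoms))  # e.g. from one score
    print(ligand_efficiency(-8.4, heavy_atoms))  # e.g. from another score
```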

Pharmacophore Modeling

  • Performance improvements to Phase database operations, including faster deletion and insertion of ligands
  • Automatic restart of Phase database subjobs

Field-Based QSAR

  • Use QM-calculated fields in 3D QSAR (command line only; phasefqsar script)
      • fqsar script generates Jaguar input files for computing QM electrostatic fields for use in 3D QSAR

Molecular Dynamics

Monitor secondary structure elements over the course of the trajectory (Simulation Interactions Diagram; SID)

Quantum Mechanics

  • New interface to compute thermodynamic properties for reactions
  • New faster TDDFT algorithm and graphical interface
  • Compute Raman intensities
  • Several improvements to the results script
  • Jaguar pKa displays the computed pKa as an atom label by default
  • Heat of formation graphical interface now supports bromine and iodine
  • Improved numerical stability of the 1st and 2nd derivatives of the D3 correction
  • Increased utility of script
      • Script acts on a group of isomers and skips structures with unique stoichiometries

Protein X-Ray Refinement

Optionally set hydrogen B-values

Workflows & Pipelining

  • Includes the latest version of KNIME (v2.9)
      • Many new features including a Send Email node and ability to save workflows under different names
  • Use any Glide simulation option in the Glide Ligand Docking node
  • Employ a specific template in the Prime Build Homology Modeling node
  • Import ungrouped structures to PyMOL from Run PyMOL node

Job Control

  • Improved fault tolerance
  • Improved handling of suspended jobs in queueing systems

There are also updates to the Biologics Suite and the Materials Science Suite.


A review of FAst MEtabolizer (FAME)


Whilst much computational work is undertaken to support library design, virtual screening, hit selection and affinity optimisation, the reality is that the most challenging issues to resolve in drug discovery often revolve around absorption, distribution, metabolism and excretion (ADME). Whilst we can measure the levels of parent drug in various media, tracking metabolic fate can often be a considerably more difficult proposition requiring significant resources. For this reason, prediction of sites of metabolism has become the subject of current interest.

FAME DOI is a collection of random forest models trained on a comprehensive and highly diverse data set of 20,000 small molecules annotated with their experimentally determined sites of metabolism taken from multiple species (rat, dog and human). In addition dedicated models are available to predict sites of metabolism of phase I and II processes.


FAME offers a high performance prediction of sites of metabolism mediated by a wide variety of mechanisms.

The full review is available here

There is a list of software reviews here.


ROCS Updated


OpenEye have just announced that the virtual screening tool ROCS v3.2 has been released.

Several noteworthy features have been added to this version including a -subrocs option that can drastically improve substructure alignments. Also included is an application rocs-report that uses our 2D depiction technology to make pdf reports of hitlists displayed with 2D similarity, shape and color overlaps, as well as property histograms. Substantial upgrades have been made to vROCS. An improved sketcher now highlights unspecified stereochemistry in atoms and bonds in query structures, and requires the user to correct any unspecified stereochemistry.

ROCS is available for download here.