Cheminformatics for Drug Design: Data, Models and Tools, Meeting Report

This was a joint meeting organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group, held at the Imperial War Museum, Duxford, UK, on Wednesday 12 October 2016. It was an excellent meeting and the conference centre at Duxford was superb; many participants arrived early to have a wander around the historic collection of aeroplanes.

Paul Mortenson (Astex) described the informatics platform that Astex have built to support their drug discovery group. Astex carried out a detailed analysis of their needs and decided that, at the time, no commercial offering was suitable. In particular, they wanted an ease of use and responsiveness that they felt required a bespoke design. They designed the system to run entirely within a web browser, since this meant there was no need to install anything specific on the desktop, tablet or smartphone, and users were already very familiar and comfortable with a web environment. Responsiveness was an important consideration, and having all the servers hosted locally was critical. Since Astex is very much focused on structure-based design, they needed a 3D molecule viewer as well as the usual tables and spreadsheets; for this they used OpenAstexViewer (M.J. Hartshorn, JCAMD, 16, 871 (2002)). This provides high-quality display of both small molecules and proteins, can show shaded molecular surfaces with transparency and property mapping as well as protein schematics, and can be controlled using JavaScript.


Isabelle Giraud (Actelion) then described DataWarrior, a freely available, open-source, cross-platform data analysis tool that understands chemistry. It provides an efficient way to search, sort and analyse structure-activity data. DataWarrior was developed at Actelion and is highly integrated into their drug discovery platform. In 2014 it was decided to release DataWarrior to the public as a stand-alone tool, without the integration layer. Actelion made the decision to open source DataWarrior in an effort to help development.


Matt Segall (Optibrium) gave a very interesting presentation dealing with the critical issue of decision making when the data or information is uncertain. He made the important point that, for most experimental data or calculated properties, it is as important to understand the confidence in a score as the score itself. He then gave a demonstration of the desktop tool StarDrop, which can be used for multi-parameter optimization (MPO), in which the relative importance of each parameter can be fine-tuned to give an overall score.
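
The idea of combining several uncertain properties into one weighted score can be illustrated with a small Python sketch. This is not StarDrop's algorithm; it is a minimal example that assumes each property has a Gaussian error, a simple pass/fail threshold and an importance weight between 0 and 1. All function names, property names, thresholds and values are hypothetical.

```python
import math

def prob_meets(value, sd, threshold, higher_is_better=True):
    """Probability that the true value satisfies the criterion,
    assuming a Gaussian error of standard deviation sd (an assumption)."""
    if sd <= 0:
        # No stated uncertainty: the criterion is simply met or not met.
        ok = value >= threshold if higher_is_better else value <= threshold
        return 1.0 if ok else 0.0
    z = (value - threshold) / (sd * math.sqrt(2))
    p_above = 0.5 * (1 + math.erf(z))
    return p_above if higher_is_better else 1 - p_above

def mpo_score(compound, profile):
    """Overall score: product of per-property probabilities, each raised
    to its importance (0 = ignored, 1 = fully counted)."""
    score = 1.0
    for prop, (threshold, higher_is_better, importance) in profile.items():
        value, sd = compound[prop]
        score *= prob_meets(value, sd, threshold, higher_is_better) ** importance
    return score

# Hypothetical scoring profile: property -> (threshold, higher_is_better, importance)
profile = {
    "potency_pIC50": (7.0, True, 1.0),
    "logP": (3.5, False, 0.5),
}

# Hypothetical compounds: property -> (value, standard deviation)
compounds = {
    "cpd_1": {"potency_pIC50": (7.4, 0.2), "logP": (3.0, 0.3)},
    "cpd_2": {"potency_pIC50": (7.6, 1.0), "logP": (3.0, 0.3)},
}

for name, data in compounds.items():
    print(name, round(mpo_score(data, profile), 3))
```

In this sketch cpd_2 has the higher nominal potency but a much larger uncertainty, so its probability of genuinely exceeding the threshold is lower than that of cpd_1. Lowering an importance weight towards 0 reduces the influence of that property on the overall score.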


Julian Blagg (The Institute of Cancer Research) gave a great talk describing the informatics platform at the ICR. After many years with data held in multiple spreadsheets, presentations and reports, they decided they needed a platform that would allow them to share data seamlessly with both internal and external scientists. After careful review of the options they decided to go forward with the Dotmatics platform. He described the initial set-up, which allowed users to browse biological data and link to chemical structures. This has now been expanded to include electronic notebooks and the capture of high content screening data. The emphasis has been very much on ease of use and unobtrusive capture of information. An added benefit has been the ability to share project-specific data easily with external collaborators.

Subsequently, CRUK has decided to roll this out across all the research labs. Julian highlighted that CRUK is a particularly enlightened research funding agency, recognizing that sharing and long-term storage and retrieval of data is a mission-critical activity.

Vivienne Allen (Charles River Laboratories) gave an interesting talk on the tools used in screening library design, in particular in the design of soft-focus libraries. These are small libraries designed to target a specific protein class (e.g. GPCR, kinase, epigenetic). They use a combination of cheminformatic tools to explore literature sources (e.g. ChEMBL) and fragment known ligands to identify interesting substituents. These are combined with novel scaffolds, and the resulting virtual libraries are evaluated using docking to prioritize targets.

Jerome Hert (Roche) gave a talk on the impact of computer-aided drug discovery (CADD) at Roche. The process was defined as Design, Synthesis, Testing and Analysis steps, and there are opportunities for CADD to contribute to each of them. The presentation used examples from different projects to illustrate this.

Al Dossetter (MedChemica) described how MedChemica have used ADME/T data from a consortium of pharma companies and applied matched molecular pairs analysis (MMPA) to define a set of rules for favorable transformations. They can apply these rules to novel targets to suggest, in an unbiased manner, new compounds that would be worth making. Their analysis showed that whilst the data from an individual company provided only a very limited set of useful rules, combining data from multiple sources resulted in a greater than additive number of rules.
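
The "greater than additive" effect of pooling data can be sketched in a few lines of Python. This is a toy illustration, not MedChemica's method: it assumes each compound has already been split into a core and a single substituent (a real MMPA derives these by fragmenting structures), and it accepts a transformation as a rule only when it is supported by a minimum number of matched pairs. The data sets, values and the `min_pairs` threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def transformation_deltas(records):
    """records: iterable of (core, substituent, value).
    Returns {(sub_a, sub_b): [value_b - value_a, ...]} for every pair of
    compounds sharing a core, with substituents in alphabetical order."""
    by_core = defaultdict(list)
    for core, sub, value in records:
        by_core[core].append((sub, value))
    deltas = defaultdict(list)
    for members in by_core.values():
        for i, (sub_a, val_a) in enumerate(members):
            for sub_b, val_b in members[i + 1:]:
                if sub_a == sub_b:
                    continue
                if sub_a < sub_b:
                    deltas[(sub_a, sub_b)].append(val_b - val_a)
                else:
                    deltas[(sub_b, sub_a)].append(val_a - val_b)
    return deltas

def rules(records, min_pairs=3):
    """Keep only transformations seen in at least min_pairs matched pairs,
    reporting the mean change in the measured property."""
    return {t: mean(d) for t, d in transformation_deltas(records).items()
            if len(d) >= min_pairs}

# Hypothetical measurements (arbitrary units) from two companies.
company_a = [("core1", "H", 5.0), ("core1", "Me", 5.5),
             ("core2", "H", 6.0), ("core2", "Me", 6.4)]
company_b = [("core3", "H", 4.0), ("core3", "Me", 4.6),
             ("core4", "H", 7.0), ("core4", "Me", 7.5)]

print(len(rules(company_a)), len(rules(company_b)),
      len(rules(company_a + company_b)))
```

Here neither company alone has enough examples of the H to Me change to pass the threshold, but the pooled data does, so the combined set yields a rule that neither individual set could support.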

David Leahy (Discovery Bus Ltd.) gave a thought-provoking final talk in which he argued that large pharma companies should be investing in artificial intelligence (AI) to replace medicinal chemists in drug design. There was an unresolved discussion about the patentability of molecules suggested by AI.