Macs in Chemistry

Insanely Great Science


7 June 2023 Cambridge Cheminformatics Network Meeting

In person at the Cambridge Crystallographic Data Centre on Union Road, Cambridge. 7 June 2023, 4-5.30pm UK time. Social event afterwards at the Alma. Registration (for Zoom attendance):


Structure-based Drug Design with Equivariant Diffusion Models, Charlie Harris, University of Cambridge

DECIMER: Deep Learning for Scraping, Curating and Registering Compounds From the Primary Literature, Kohulan Rajan, Jena University

Distributed HPC Workflows with Covalent, Will Cunningham, Agnostiq

More details here


alvaScience updates

alvaBuilder v1.0.10:

  • added SMILES column when exporting molecules in Excel format
  • improved GUI to relocate an alvaRunner project if the file path has changed
  • fixed Copy Scaffold as SMILES
  • fixed potential runtime error when using alvaRunner project on Apple M1/M2 CPU

alvaDesc v2.0.16:

  • enabled BLI calculation for disconnected structures
  • enabled Intrinsic state pseudoconnectivity indices calculation for disconnected structures
  • fixed Copy Scaffold as SMILES
  • fixed Z coloring for PCA and t-SNE charts
  • fixed '3D coordinates' value in Molecule detail when dealing with SDF/MOL2 files
  • fixed QED calculation (could affect molecules including isotopic hydrogens)
  • fixed No. value in the molecule detail panel when the dataset is sorted or filtered

alvaModel v2.0.8:

  • improved Bemis-Murcko framework identification, including exocyclic double bonds in scaffolds and linkers (Prediction detail)
  • fixed color of histograms of the model test set
  • fixed potential runtime error when using Apple M1/M2 CPU

alvaMolecule v2.0.6:

  • minor fixes

alvaRunner v2.0.8:

  • improved Bemis-Murcko framework identification, including exocyclic double bonds in scaffolds and linkers (Prediction detail)
  • fixed potential runtime error when using Apple M1/M2 CPU

In a recent post Pat Walters highlighted the use of molfeat in a Google Colab notebook.

I thought I'd also mention datamol, an open-source toolkit that simplifies molecular processing and featurization workflows for ML scientists in drug discovery.

Cheminformatics support is all built upon the open-source toolkit RDKit. It can be installed using conda

conda install -c conda-forge datamol

Or pip

pip install datamol

The latest version (0.9) appears to need Python 3.9 and RDKit version [2022.03, 2022.09]

There is a comprehensive series of tutorials and extensive documentation.

License is Apache version 2.0.

If you would like to contribute, details are on GitHub.


Jazzy a Python library to calculate a set of atomic/molecular descriptors

Just spotted a very interesting paper "Fast calculation of hydrogen-bond strengths and free energy of hydration of small molecules" DOI.

Jazzy is a Python library that allows you to calculate a set of atomic/molecular descriptors which include the Gibbs free energy of hydration (kJ/mol), its polar/apolar components, and the hydrogen-bond strength of donor and acceptor atoms using either SMILES or MOL/SDF inputs. Jazzy is easy to use, does not require expensive hardware, and produces accurate estimations within milliseconds to seconds for drug-like molecules. The library also exposes functionalities to depict molecules with atomistic hydrogen-bond strengths in two or three dimensions.

Code is on GitHub

And there is a really useful cookbook with examples.


UK-QSAR Spring 2023 meeting

The UK-QSAR Spring 2023 meeting will be held on Thursday 20th April 2023, 9:00 – 17:00, at the Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.

The meeting is organised jointly by EMBL-EBI and Sosei Heptares, and as always is free. The theme this time is “Learning from data” and for the occasion three of the most relevant databases for our fields will be introduced (PubChem, ChEMBL and the Cambridge Structural Database). The afternoon sessions will be focused on protein structure-based techniques (including use of AlphaFold, and ML for virtual screening) and reaction informatics (including applications of Enamine REAL, and machine learning).

Registration is now open. Please register using the following link:


Sheffield 9th Conference on Cheminformatics

The Sheffield Conference on Cheminformatics is always one of the highlights of the calendar. This year it will be held at The Edge, University of Sheffield, UK, Monday 19th – Wednesday 21st June, 2023.

As usual, there is a great lineup of speakers.

Confirmed Attendees & Titles of Paper:

  • Adele Hardie A World of Probabilities: An sMD/MSM Approach for Rational Design of Allosteric Modulators
  • Aras Asaad Persistence homological statistical summaries for ligand-based virtual screening
  • Benoit Baillif Applying atomistic neural networks to bias conformer ensembles towards bioactive-like conformations
  • Dan Woodward Coverage Score: A Model Agnostic Method to Efficiently Explore Chemical Space
  • David Palmer Simultaneous Entropy, Enthalpy and Free Energy Prediction using a Physics-Informed Neural Network and Multi-task Learning
  • Lauren Reid SARkush®: Automated Markush-like structure generation using matched pairs and generic atom scaffolds
  • Helle van den Maagdenberg QSPRpred: a Flexible and Open Quantitative Structure-Property Relationship Modelling Tool
  • Henriette Willems PI5P4K subtype-selective inhibitors: three binding modes from one privileged motif
  • James Webster An in-silico benchmarking platform for generative de novo drug design
  • Marc Lehner Partial Charge Prediction and Pattern Extraction from a AttentiveFP Graph Neural Network
  • Maria J Falaguera Illuminating the Chemical Space of Untargeted Proteins
  • Matteo Ferla Fragmenstein: stitching compounds together
  • Maximilian Beckers Prediction of small molecule developability using large-scale in silico ADMET models
  • Moritz Walter Integrating heterogeneous assay data for ML-based ADME prediction
  • Noel O’Boyle Handling large chemical spaces in Structure-Based Drug Design
  • Rajarshi Guha Virtual Screening of Virtual Libraries using a Genetic Algorithm
  • Richard Gowers The Open Free Energy Consortium: Alchemistry for everyone
  • Richard Sherhod Glolloc: a global-local mixture of experts model and its application to small molecule drug discovery
  • Roger Sayle FNGRPRNTS: Processing just the bits you need, and none of the 1s you don’t.
  • Roxana-Maria Rujan Resolving code names to structures from the medicinal chemistry literature: not as FAIR as it should be
  • Samuel Genheden AiZynthFinder: developments and learnings from three years of industrial application
  • Sébastien Guesné Beyond balanced accuracy: balanced Matthews’ correlation coefficient.
  • Sohvi Luukkonen DrugEx: deep learning for de novo drug design — a case for A2B selective ligands
  • Srijit Seal PKSmart: An Open-Source Computational Model to Predict in vivo Pharmacokinetics of Small Molecules
  • Tuomo Kalliokoski Efficient structure-based virtual screening of ultra-large enumerated chemical spaces using macHine leArning booSTEd dockiNg (HASTEN)
  • Uschi Dolfus Full modification control over retrosynthetic routes for guided optimization of lead structures

StarDrop Update

StarDrop 7.4. This latest release contains several new features.

Core Application:

  • Added the ability to colour StarDrop data sets based on property values (“heatmapping”)
  • Added the ability to specify the size and position of structure in chart pop-ups
  • Added the ability to specify default properties to be displayed in chart pop-ups
  • Added the ability to synchronise the display of chart pop-ups and labels
  • Added the ability to convert chart pop-ups into labels
  • Added the ability to show chart pop-ups whenever compounds are selected


  • Added the ability to edit data set entries from a script
  • Added the ability to refresh a data set from a script

Query Interface:

  • Added the ability to query on pre-defined SMARTS and TEXT values
  • Added support for categorical data types

InChI Use Survey

The InChI group are running a short survey to find out more about how InChI is used. It would be really helpful if you could spare 2-3 minutes to complete the survey.

The International Chemical Identifier (InChI) is a textual identifier for chemical substances, designed to provide a standard way to encode molecular information and to facilitate the search for such information in databases and on the web. InChIs differ from the widely used registry numbers in three respects: firstly, they are freely usable and non-proprietary; secondly, they can be computed from structural information and do not have to be assigned by some organization; and thirdly, most of the information in an InChI is human readable (with a lot of practice). The InChIKey is a fixed length (27 character) condensed digital representation of the InChI that is not designed to be human-understandable.
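As a small illustration of the fixed-length InChIKey described above, the sketch below checks whether a string has the expected 27-character shape. The block layout assumed here (14 letters, a hyphen, 10 letters, a hyphen, 1 letter) is my understanding of the format rather than something stated in this post, and the function name is made up for the example. It only checks the shape of the string; it cannot tell you whether the key corresponds to a real structure.

```python
import re

# Assumed layout: 14 uppercase letters, hyphen, 10 uppercase letters,
# hyphen, 1 uppercase letter (27 characters in total).
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """Return True if text has the 27-character InChIKey shape."""
    return len(text) == 27 and INCHIKEY_PATTERN.fullmatch(text) is not None


# The standard InChIKey for water passes the check; a raw InChI string does not.
print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
print(looks_like_inchikey("InChI=1S/H2O/h1H2"))
```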

More details are on the InChI Trust website.


6th RSC-BMCS/RSC-CICAG Artificial Intelligence in Chemistry

The dates for the 6th RSC-BMCS/RSC-CICAG Artificial Intelligence in Chemistry meeting are now set, so be sure to put them in your calendar. The call for oral and poster abstracts will be opening very soon, as will registration. This will be a hybrid meeting with both in-person and virtual attendance options. #AIChem23.


Spread the word: this is always a fantastic meeting and there are already a number of great speakers in place.

Confirmed Speakers:

Andrew White, University of Rochester and VIAL, US
Kathryn Furnival, AstraZeneca, UK
Laksh Aithani, Charm Therapeutics, UK
Michael Bronstein, University of Oxford, UK
Michelle L. Gill, NVIDIA, US
Noor Shaker, Glamorous AI and X-Chem, UK

Conference website is


Cambridge Cheminformatics Meeting, 8 Feb

Wed 8 Feb at 4pm at



RSC CICAG winter 2022-2023 newsletter is out!!

The latest RSC CICAG newsletter is now available.

This is a really bumper issue.

Contents

Chemical Information and Computer Applications Group Chair’s Report
CICAG Planned and Proposed Future Meetings
Social Media Migration – Opening up Mastodon as a Tool for Scholarly Communication
Cheminformatics: A Digital History ‒ Part 2. A Personal Perspective of the Role of the Web During the Period 1993-1996
InChI Technical Developments
Update from the Royal Society of Chemistry Library
The Open Free Energy Project
Meeting Report: SCI-RSC Workshop on Computational Tools for Drug Discovery 2022
A Crystallography Papermill: The CSD Response
Meeting Report: Ultra-Large Chemical Libraries
Open Science in the Royal Society of Chemistry
The Davy Notebooks Project
This JACS Does Not Exist: Generating Chemistry Abstracts with Machine Learning
Meeting Report: RSC-CICAG and RSC-BMCS 5th Artificial Intelligence in Chemistry Conference.
EU-OPENSCREEN ERIC: an Open-access Research Infrastructure for Chemical Biology and Early Drug Discovery
DECIMER ‒ An Open Toolkit for Optical Chemical Structure Recognition and Document Analysis
Cryo-EM for Industrial-Scale Structure-Based Drug Design
Cryo-EM & Drug Discovery
2022 CSD Updates
News from ACS CINF
News from CAS
RSC Databases Update
UKeiG: Winners of the Prestigious Tony Kent Strix Award 2022
AI4SD News
Book Review: Digital Transformation: New Tools and Methods for Mining Technological Intelligence
Cheminformatics and Chemical Information Books
2022 Reflections on Life at the Catalyst Science and Discovery Centre and Museum in Widnes
Other Chemical Information News

Contributions to the CICAG Newsletter are welcome from all sources ‒ please send to the Newsletter Editor Dr Helen Cooke FRSC: email

When you renew your Royal Society of Chemistry membership you can choose “Chemical Information and Computer Applications Group” (group 86) as one of your groups to be a member of. Use the “Connecting with others” button at If you have already submitted your form you can make a request to join a group via email ( or telephone (01223 432141)


Structure-based searching SQLite

I've been experimenting with SQLite, a software library that provides a relational database management system. It is self-contained, serverless, and requires little or no admin.

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day.

In the first tutorial I describe using it for very fast exact lookup of chemical structures. This tutorial takes you through setting up the database, storing chemical structures as SMILES strings and then accessing it using a Jupyter Notebook.

The second tutorial shows how to create a Python script to access the database from the command line, and how to use AppleScript to access it from ChemDraw. This allows you to get the structure for a specific identifier or check for the identifier for a drawn structure.

The third tutorial shows how to use the fabulous Chemicalite to support high performance chemical structure-based searching of a SQLite database of over 2 million structures.



Using SQLite for exact search

I've been experimenting with SQLite, a software library that provides a relational database management system. It is self-contained, serverless, and requires little or no admin.

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine. SQLite is the most used database engine in the world. SQLite is built into all mobile phones and most computers and comes bundled inside countless other applications that people use every day.

In particular I've been looking at using it for very fast exact lookup of chemical structures. This tutorial takes you through setting up the database, storing chemical structures as SMILES strings and then accessing it using a Jupyter Notebook.

The second tutorial shows how to create a Python script to access the database from the command line, and how to use AppleScript to access it from ChemDraw. This allows you to get the structure for a specific identifier or check for the identifier for a drawn structure.
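To give a flavour of the exact-lookup idea, here is a minimal sketch using Python's built-in sqlite3 module. It is not the code from the tutorials: the table name, column names and compound identifiers are all made up for illustration, and it uses an in-memory database rather than a file. It also assumes every SMILES has been canonicalised the same way before storage and before lookup, since the match is a plain string comparison.

```python
import sqlite3

# In-memory database for the sketch; a real setup would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compounds (identifier TEXT PRIMARY KEY, smiles TEXT NOT NULL)"
)
# An index on the SMILES column keeps exact-match lookups fast.
conn.execute("CREATE INDEX idx_compounds_smiles ON compounds (smiles)")

# Hypothetical identifiers paired with simple SMILES strings.
rows = [
    ("CMPD-0001", "CCO"),
    ("CMPD-0002", "c1ccccc1"),
    ("CMPD-0003", "CC(=O)O"),
]
conn.executemany("INSERT INTO compounds VALUES (?, ?)", rows)
conn.commit()


def smiles_for(identifier):
    """Return the stored SMILES for an identifier, or None if absent."""
    row = conn.execute(
        "SELECT smiles FROM compounds WHERE identifier = ?", (identifier,)
    ).fetchone()
    return row[0] if row else None


def identifier_for(smiles):
    """Return the identifier whose stored SMILES matches exactly, or None."""
    row = conn.execute(
        "SELECT identifier FROM compounds WHERE smiles = ?", (smiles,)
    ).fetchone()
    return row[0] if row else None


print(smiles_for("CMPD-0002"))
print(identifier_for("CCO"))
# "OCC" is the same molecule as "CCO" but a different string, so it is not found.
print(identifier_for("OCC"))
```

The last lookup shows why consistent canonicalisation matters: without it, the same structure written a different way is treated as a different entry.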



Updated Multicentre Cheminformatics Meeting

As many know we have been holding regular cheminformatics network meetings in Cambridge for many years. These have been a great means to find out about new science, network and then socialise in a nearby pub. As we try to spread the word I'm really delighted to announce a multi-centre cheminformatics meeting.

30 November 2022, 3.30-5pm UK / 4.30-6pm CET:

Cross-Site Cambridge/Oxford/Berlin Digital Drug Discovery Meeting!

Cambridge: Cambridge Crystallographic Data Centre, Union Road
Oxford: Phase II Biochemistry Seminar Room, New Biochemistry Building off South Parks Road
Berlin: Institut für Pharmazie, Königin-Luise-Straße

Please turn up 10 minutes before the start time to register where needed

  • Programme

Why is it so Hard to Search Ultra-Large Chemical Libraries?
Roger Sayle, NextMove Software, Cambridge

Fragmenstein: Stitching Compounds Together Like a Reanimated Corpse
Matteo Ferla, Oxford Protein Informatics Group, Department of Statistics

Data-Driven Methods for Active Compound Design and Risk Assessment
Andrea Volkamer, Charité Berlin and Saarland University

  • After-event locations (~5.15pm GMT/~6.15pm CET onwards)

Cambridge: The Alma, Russell Court
Oxford: The Royal Oak, Woodstock Road
Berlin: Luise Dahlem, Königin-Luise-Straße

Please join us in person (just turn up at the event location 10 minutes before the start time in case local registration is needed) or for joining remotely please register here:


RDKit updated

The latest version of RDKit has been released.


  • The new RegistrationHash module provides one of the last pieces required to build a registration system with the RDKit.
  • This release includes an initial version of a C++ implementation of the xyz2mol algorithm for assigning bonds and bond orders based on atomic positions. This work was done as part of the 2022 Google Summer of Code.
  • A collection of new functionality has been added to minimallib and is now accessible from JavaScript and other programming languages.


I first mentioned alvaScience in a review of alvaDesc I wrote in 2019. Since then I've mentioned new products as they have been launched, but only recently have I gone back to the website to look at things in more detail.

At alvaScience, we are constantly exploring and implementing the most promising and innovative technologies in our software tools, which makes them a leading choice for QSAR and other cheminformatics research.

alvaModel is an interesting software tool to create Quantitative Structure Activity/Property Relationship (QSAR/QSPR) models using the descriptors and fingerprints calculated in alvaDesc. It comes with a variety of very useful tools for data and descriptors, such as feature reduction and a variety of machine learning tools:

Regression model

  • Ordinary Least Squares (OLS) model
  • Partial Least Squares (PLS) model
  • KNN regression model
  • Support Vector Machine (SVM) model
  • Consensus model defined as the arithmetic mean of the values predicted by the selected models

Classification model

  • Linear and Quadratic Discriminant Analysis (LDA/QDA) model
  • Partial Least Squares Discriminant Analysis (PLS-DA) model
  • KNN classification model
  • Support Vector Machine (SVM) model
  • Consensus model defined by assigning the class based on the majority of the values predicted by the selected models
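The two consensus definitions in the lists above are simple enough to sketch in a few lines of Python. This is only an illustration of the arithmetic, not alvaModel's own code: the function names and example predictions are made up, and the handling of tied votes (first class seen wins) is my assumption, as the description does not say what happens in a tie.

```python
from collections import Counter
from statistics import mean


def consensus_regression(predictions):
    """Arithmetic mean of the values predicted by the selected models."""
    return mean(predictions)


def consensus_classification(predictions):
    """Class predicted by the majority of the selected models.

    Ties resolve to the class that appears first in the input (an assumption).
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions for one molecule from three models.
print(consensus_regression([5.2, 4.8, 5.6]))
print(consensus_classification(["active", "inactive", "active"]))
```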

Whilst building models is one thing, being able to deploy them easily is something else. alvaRunner helps with this. alvaRunner can be accessed via the command line but I suspect many users will use the graphical interface. Using the GUI, for every imported molecule, you can see the predicted targets and whether the molecule is inside or outside the defined model’s Applicability Domain. You can also sort and filter any column by right-clicking the corresponding column header.


alvaScience will be at the RSC-SCI Workshop on Computational Tools for Drug Discovery 2022; if you would like to try it out, why not come along?


MayaChemTools update

The awesome MayaChemTools has a couple of new additions and updates.

Two new command line scripts:

In addition, the and scripts have been updated to optionally filter matched torsions by atom indices for performing torsion scans. A number of enhancements have been made to script including visualization of docked poses.

All scripts are listed here


alvaMolecule updated

alvaScience have just released an update to alvaMolecule.


Here is the list of the main changes of alvaMolecule:

  • Import XYZ cartesian coordinates format (*.xyz)
  • Export dataset as Excel file (*.xlsx)
  • Delete molecules
  • Find molecular structures in Google Patents/Scholar
  • Molecule grid: filter molecules by substructure (SMARTS)
  • Molecule grid: column value can be set as footer of a molecule cell
  • Charts: added molecule hint on charts
  • Charts: added context menu to save data of charts
  • Charts: added 'Lasso selection'
  • Highlight substructures (SMARTS)
  • Highlight Bemis-Murcko features
  • Standardizers: standardize nitro group as [N+]([O-])(=O)
  • Standardizers: remove radicals preserving usual valence
  • Standardizers: neutralize atoms
  • Standardizers: neutralize molecule
  • Standardizers: custom standardizer with SMIRKS
  • Checkers: modified the non-standard atom set which now includes the following atoms: H, B, C, N, O, P, S, F, Cl, Br, I
  • Duplicate analysis: identify duplicated structures
  • Duplicate analysis: automatically delete duplicated structures
  • Duplicate analysis: identify molecules with the same value for a specific column
  • Scaffold analysis: identify the Bemis-Murcko scaffolds

alvaMolecule is free for academic and non-commercial use.


1st EUOS/SLAS Joint Challenge: Compound Solubility

The latest Kaggle challenge is up.

Develop new methods to predict compound solubility based on chemical structure.

EU-OPENSCREEN ERIC and SLAS challenge you to develop a reliable algorithm that can predict the solubility of a small molecule, an essential feature of all biologically active compounds. EU-OPENSCREEN ERIC provides a high-quality data set of experimentally measured aqueous solubility of about 100,000 small molecules which was produced at an EU-OPENSCREEN ERIC high throughput screening partner site. 70,000 of these molecules will be available for download on Kaggle, and the residual 30,000 compounds will be withheld for prediction.

Full details are here


OPSIN 2.7.0 has been released

OPSIN (Open Parser for Systematic IUPAC Nomenclature) has been updated.

OPSIN is a Java library for IUPAC name-to-structure conversion offering high recall and precision on organic chemical nomenclature.

Java 8 (or higher) is required for OPSIN 2.7.0.

Supported outputs are SMILES, CML (Chemical Markup Language) and InChI (IUPAC International Chemical Identifier).

Convert a chemical name to SMILES

java -jar opsin-cli-2.7.0-jar-with-dependencies.jar -osmi input.txt output.txt

where input.txt contains chemical name/s, one per line
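If you want to drive this from Python, a rough sketch might look like the following. It writes chemical names to a file, one per line, and builds the same command shown above. The helper names are made up for the example, the jar is assumed to be in the working directory, and the call that actually runs OPSIN is left commented out because it needs Java and the jar to be present.

```python
import tempfile
from pathlib import Path

# Jar name taken from the command above; assumed to be in the current directory.
OPSIN_JAR = "opsin-cli-2.7.0-jar-with-dependencies.jar"


def write_names(names, path):
    """Write chemical names to a text file, one name per line."""
    Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")


def opsin_command(input_path, output_path, jar=OPSIN_JAR):
    """Build the name-to-SMILES command line as a list of arguments."""
    return ["java", "-jar", jar, "-osmi", str(input_path), str(output_path)]


with tempfile.TemporaryDirectory() as workdir:
    input_path = Path(workdir) / "input.txt"
    output_path = Path(workdir) / "output.txt"
    write_names(["ethanol", "benzene", "acetic acid"], input_path)
    command = opsin_command(input_path, output_path)
    print(" ".join(command))
    # To run the conversion for real (requires Java and the OPSIN jar):
    # import subprocess
    # subprocess.run(command, check=True)
```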


RSC CICAG Summer newsletter

The RSC CICAG newsletter is now available as a PDF.

Chemical Information and Computer Applications Group Chair’s Report
Your CICAG Committee - Introducing Our New Members
CICAG Planned and Proposed Future Meetings
Free Workshops on Open-Source Tools for Chemistry
The COVID Moonshot
Practical Cheminformatics with Open-Source Software
The Catalyst Science and Discovery Centre Archives
Chemical Data Recovery 3: Legacy Chemical Data Recovery
Svante Wold, 1941-2022
Cheminformatics: a Digital History - Part 1 Early days at Sheffield: a Personal Perspective
Being #CompChemURG: Forging New pathways
UKeiG Call for Nominations for the Prestigious Tony Kent Strix Award 2022
Welcome to the New Era of Scientific Publishing
Greg Landrum Receives the Mike Lynch Award
Bioinformatics in the Post-AlphaFold 2 Era
Diana Leitch – Reflections on her Life in Chemistry, Chemical Information and Librarianship
Meeting Report: AI in Drug Discovery
AI4SD News
The IUPAC Green Book
RSC Historical Group – Women in Chemistry Symposium
ACS CINF Report for July 2022
News from CAS
Chemical Information / Cheminformatics and Related Books
Other Chemical Information News

We have already started compiling content for the winter newsletter. If you have suggestions or would like to contribute, please get in touch.


RSC CICAG Open Source tools for Chemistry Workshops


The latest of the RSC CICAG workshops is now online

This is the latest of 20 workshops that are available on the RSC CICAG YouTube channel. These workshops have now been viewed over 21,000 times and they are a fabulous way to find out about some of the Open-Source tools and resources that are available to chemists.

RSC CICAG are also organising a number of meetings

Details of the RSC CICAG meeting on Ultra-large Chemical Libraries are available on the CICAG website

This one-day meeting will be held on 10 August 2022 10:00-17:00, at Burlington House, London. Registration is open, the speakers have been finalised and it looks a great line-up. Remember bursaries are also available.

RSC CICAG and BMCS are organising the 5th Artificial Intelligence in Chemistry Symposium. #AIChem22


This two-day meeting (Thursday-Friday, 1st-2nd September 2022) will be held at Churchill College, Cambridge, UK. Details of the meeting and registration are on the conference website.

RSC CICAG in collaboration with the SCI are also organising the SCI-RSC Workshop on Computational Tools for Drug Discovery 2022.

This will be held on 23 November 2022 at The Studio Birmingham. Full details and registration are on the website


RSC CICAG Open Source Tools for Chemistry: Scoring of shape and ESP similarity (Ester Heid)


The latest of the RSC CICAG workshops is now online

Electrostatic effects along with volume restrictions play a major role in enzyme and receptor recognition. Evaluating electrostatic and shape similarities of pairs of molecules such as proposed versus known ligands can therefore be valuable indicators of prospective binding affinities. This workshop will demonstrate how to compute electrostatic and shape similarities using the open-source tool ESP-Sim. Available options for comparing electrostatics will be discussed interactively on selected examples of public datasets, along with advice on embedding and aligning molecules prior to computing similarities.

Whilst comparing molecules using 1D or 2D descriptors is well known, most molecules are three dimensional, as are biomolecule binding sites. The comparison of molecular shapes and electrostatics is particularly challenging and this workshop is a perfect introduction. Come along and you will have a chance to ask questions directly.

All materials are available on GitHub


Chemfp 4.0 has been released


Chemfp 4.0 was recently released, with support for several diversity selection algorithms, and an improved API for interactive use in a notebook environment.

Chemfp is an analytics package for cheminformatics fingerprints. It contains command-line tools and an extensive Python library for fingerprint generation, high-performance similarity search, diversity selection, and exploratory research.

The new diversity selection algorithms are MaxMin, sphere exclusion (both random and directed), and HeapSweep.
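For anyone unfamiliar with MaxMin, the idea is to repeatedly pick the candidate whose nearest already-picked neighbour is furthest away. The sketch below is a naive pure-Python illustration of that idea using Tanimoto distance on fingerprints stored as sets of "on" bits. It is not chemfp's implementation or API (chemfp is far faster and handles real fingerprint formats); the function names and the toy fingerprints are made up for the example.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def maxmin_select(fingerprints, n_picks, first=0):
    """Naive MaxMin: each pick maximises the minimum distance to earlier picks."""
    n_picks = min(n_picks, len(fingerprints))
    if n_picks == 0:
        return []
    picked = [first]
    # Distance from every fingerprint to its nearest picked fingerprint so far.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < n_picks:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        # Update nearest-pick distances now that 'best' has joined the picks.
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i], 1.0 - tanimoto(fp, fingerprints[best]))
    return picked


# Toy fingerprints: indices 0, 1 and 3 overlap heavily, index 2 shares nothing.
toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_select(toy, 3))
```

Starting from the first fingerprint, the second pick is the one that shares no bits with it, which is the behaviour you want from a diversity selection.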

People who live in the Jupyter notebook will likely enjoy the new chemfp user experience. Most long-term actions support progress bars, chemfp's Python objects have more informative repr()s, search results added Pandas integration, and there are new high-level APIs that let you express a lot of functionality compactly.

The Base License covers most in-house use of chemfp, though a few features are either limited or disabled and require a license key to unlock. For alternative licenses, including source code and no-cost academic licensing, see -- or try one of the re-formatted ChEMBL datasets at which include an embedded authorization key.


Electrostatic and shape similarity workshop


The latest RSC CICAG Open-Source Tools for Chemistry workshop

23 June 2022 Scoring of shape and ESP similarity (Ester Heid)

Electrostatic effects along with volume restrictions play a major role in enzyme and receptor recognition. Evaluating electrostatic and shape similarities of pairs of molecules such as proposed versus known ligands can therefore be valuable indicators of prospective binding affinities. This workshop will demonstrate how to compute electrostatic and shape similarities using the open-source tool ESP-Sim. Available options for comparing electrostatics will be discussed interactively on selected examples of public datasets, along with advice on embedding and aligning molecules prior to computing similarities.

Whilst comparing molecules using 1D or 2D descriptors is well known, most molecules are three dimensional, as are biomolecule binding sites. The comparison of molecular shapes and electrostatics is particularly challenging and this workshop is a perfect introduction. Come along and you will have a chance to ask questions directly.



Cambridge Cheminformatics Network Meeting


Next Meeting: 4pm (UK time) 8 June 2022, via Zoom and in person (hybrid!) details are here

The IN-PERSON meeting will be held at the Cambridge Crystallographic Data Centre on Union Road, and be capped at 30 attendees. For IN-PERSON attendance please email andreas AT for registration. Afterwards, at 6pm, we will go to the Panton Arms, and everyone is welcome to join there!

HYBRID MEETING - please use this registration link for VIRTUAL attendance:

Programme

Efficient algorithms for fingerprint similarity search and diversity selection
Andrew Dalke, Dalke Scientific (remote)

Chemical substructure and similarity search at scale on a Graph computing platform
Andrew Stolman, Abbvie (remote)

Automated determination of optimal λ schedules for free energy calculations
Sofia Bariami and Mark Mackey, Cresset (in-person)

The Cambridge Cheminformatics Network Meetings are free to attend and open to all. We start our meetings in the afternoon at 4pm (UK time) with a series of short scientific talks, either on Zoom, or in person. If the latter, we will continue the evening with a mixer at the local pub.


iBabel updated


The latest version of iBabel is now available. The big change is iBabel is now a universal application.


More details here


Ultra Large Chemical Libraries Conference


More details of the RSC CICAG meeting on Ultra-large Chemical Libraries are available.

This one-day meeting will be held on 10 August 2022 10:00-17:00, at Burlington House, London.

Registration is open and a number of the speakers have been finalised; it looks a great line-up.

Roger Sayle, NextMove Software Limited, United Kingdom
Carol Mulrooney, GSK, United States
Jan H Jensen, University of Copenhagen, Denmark
Noah Harrison, Evariste Technologies, United Kingdom
Peter Pogany, GSK, United Kingdom

There is still time to submit poster abstracts. A limited number of bursaries are available; the application form should be submitted to the organisers. A maximum of £300 will be reimbursed on submission of receipts.

If you would like to exhibit, sponsor or support this meeting please contact the organisers.



Comparing an M1 MacBook with Intel MacBookPro for Cheminformatics/CompChem Updated


I'm slowly working through a variety of cheminformatics toolkits and computational chemistry applications. I'm trying to run some "real world" workflows so you can see what kind of performance improvement you might expect.

The index page is here and I'll update it as I test more applications.


Installing AlphaFold2 on Apple Silicon


AlphaFold2 is an artificial intelligence (AI) program developed by Alphabet's/Google's DeepMind which performs predictions of protein structure. Despite the name, AlphaFold2 does not actually predict the folding mechanism; instead it predicts the final 3D structure of a protein from the protein sequence DOI.

Source code for the AlphaFold model, trained weights and inference script are available under an open-source license at

I've compiled step-by-step instructions for installing AlphaFold2 on a MacBook Pro M1 Max here.

Many thanks to Yoshitaka Moriwaki for help.


Matched molecular pair database generation and analysis


Matched molecular pair analysis (MMPA) is a popular structure-activity method in cheminformatics that compares the properties of two molecules that differ only by a single chemical transformation (e.g. substitution of a hydrogen atom by a chlorine atom). Because the structural difference between the two molecules is small, any experimentally observed change in a physical or biological property between the matched molecular pair could be associated with this particular molecular transformation.
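The bookkeeping behind this idea can be sketched in a few lines of Python: group the pairs by transformation and average the property change for each group. This is only an illustration and not how mmpdb works internally (finding the pairs by fragmenting structures is the hard part, and is exactly what mmpdb does for you). The transformation labels are plain strings and the property values are invented purely for the example.

```python
from collections import defaultdict
from statistics import mean

# (transformation label, property of molecule A, property of molecule B)
# All values are invented for illustration; they are not measured data.
pairs = [
    ("H>>Cl", 1.0, 1.7),
    ("H>>Cl", 2.1, 2.9),
    ("H>>F", 1.0, 1.2),
]


def mean_delta_by_transformation(matched_pairs):
    """Return {transformation: (mean property change, number of pairs)}."""
    deltas = defaultdict(list)
    for transformation, prop_a, prop_b in matched_pairs:
        deltas[transformation].append(prop_b - prop_a)
    return {t: (mean(values), len(values)) for t, values in deltas.items()}


for transformation, (delta, count) in mean_delta_by_transformation(pairs).items():
    print(f"{transformation}: mean change {delta:+.2f} from {count} pair(s)")
```

The more pairs that share a transformation, the more confidence you can have that the average change is down to the transformation itself rather than noise.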

Andrew Dalke has recently published open source code to support this methodology.

To install

python -m pip install mmpdb

The package has been tested on Python 3.9.

You will need a copy of the RDKit cheminformatics toolkit, available from , which in turn requires NumPy. You will also need SciPy, peewee, and click. The latter three are listed as dependencies in setup.cfg and should be installed automatically.

Full details are described in this publication.

A. Dalke, J. Hert, C. Kramer. mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets. J. Chem. Inf. Model., 2018, 58 (5), pp 902–910. DOI.


Comparing energy usage between M1 Mac and Intel


The pages comparing cheminformatics/compchem apps on the MacBook Pro M1 Max are proving very popular. Several readers have asked me to compare energy usage, which is an excellent suggestion.

Based on a suggestion I purchased a Nevsetpo Power Meter UK Plug Power Monitor Watts Meter Plug and I've used it to test a selection of tasks. Once plugged into a socket it monitors the total energy consumption of any device plugged into it. Both machines were fully charged and the "Optimised battery charging" was switched off.

I tried a few computationally intensive tasks and details of energy consumption are here.



Comparing a M1 MacBook with Intel MacBookPro for Cheminformatics/CompChem


As some of you may have seen, I've started comparing my new MacBook Pro Apple M1 Max with my old Intel MacBook.


I'm slowly working through a variety of cheminformatics toolkits and computational chemistry applications. I'm trying to run some "real world" workflows so you can see what kind of performance improvement you might expect.

The index page is here and I'll update it as I test more applications.

When possible I've used the latest builds for the M1 arm architecture. Both machines were connected to power and had no other applications running. To date I've looked at the following.

More to come.


An introduction to cheminformatics, data analysis and machine learning

The video of the latest RSC CICAG Open Source Tools for Chemistry workshop is now online.

Part of the RSC CICAG Open Chemical Sciences workshop series. Workshop given by Pat Walters entitled "An introduction to cheminformatics, data analysis and machine learning".

A hands-on workshop on building and validating ML models, including:

  • Initial exploratory data analysis
  • ML model building
  • Model evaluation
  • Making predictions on a larger data set

The notebooks for the tutorial are here. If there are any problems, you should be able to just reload the page. The code and notebooks are in a GitHub repo.

The video is here on YouTube.

More workshop videos are available on the RSC CICAG playlist


New additions to MayaChemTools


There have been a couple of new additions to the fabulous list of tools and scripts on MayaChemTools.

MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.




These scripts rely on the presence of Psi4 and RDKit in your environment. In addition, the script for further details.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU LGPL as published by the Free Software Foundation.


RSC CICAG workshops


The next of the monthly RSC CICAG workshops is on 19 August 2021 and is entitled "An introduction to cheminformatics, data analysis and machine learning" with Pat Walters of Relay Therapeutics, who is also a blogger. If you don't follow his blog, have a look now.

This will be a hands-on workshop on building and validating ML models, including:

  • Initial exploratory data analysis
  • ML model building
  • Model evaluation
  • Making predictions on a larger data set

I'm sure this will be an invaluable experience. Over 150 people have already registered and you can register using the link below.

Registration for these free workshops is now open (#OpenChem21)

These workshops are sponsored by Liverpool ChiroChem


InChI version 1.06: now more than 99.99% reliable


A nice review of the latest version

In this paper, we report on the current state of the InChI Software, the details of the improvements in the v1.06 release, and the results of a test of the InChI run on PubChem, a database of more than a hundred million molecules. The upgrade introduces significant new features, including support for pseudo-element atoms and an improved description of polymers. We expect that few, if any, applications using the standard InChI will need to change as a result of the changes in version 1.06.

It can be downloaded here


Update to MayaChemTools


MayaChemTools is an ever-increasing collection of Python and Perl scripts that support cheminformatics and computational chemistry.

The latest additions are based on PSI4, an open-source quantum chemistry package.

PSI4 provides a wide variety of quantum chemical methods using state-of-the-art numerical methods and algorithms. Several parts of the code feature shared-memory parallelization to run efficiently on multi-core machines. An advanced parser written in Python allows the user input to have a very simple style for routine computations, but it can also automate very complex tasks with ease.

The command line Python scripts based on Psi4 provide functionality for the following tasks:

  • Calculation of single point energies
  • Calculation of molecular properties and partial charges
  • Performing structure minimization
  • Generating molecular conformations
  • Visualizing frontier molecular orbitals and dual descriptors
  • Visualizing electrostatic potential on densities and molecular surfaces

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU LGPL as published by the Free Software Foundation.


Automated reaction mapping


A recent publication caught my eye.

Extraction of organic chemistry grammar from unsupervised learning of chemical reactions DOI.

Anyone who has been involved in building a reaction database will know that atom mapping reagents/starting materials onto products is a very time-consuming and tedious process that is often fraught with errors. So any method that automates this process is a significant step forward.

Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks.


They also supply a file containing Common patent reaction templates. This file contains the most common patent reaction templates (USPTO grants), including the year of the first appearance, the patent numbers, frequently used reagents, and the template count. The templates were extracted after applying RXNMapper to generate the atom-mapping.


DataWarrior update


A new version of DataWarrior has been released

v05.05.00: April 2021

  • 3D-Structure alignment considering shape and pharmacophoric features (PheSA)
  • Google Patent search and results in DataWarrior (keyword, structure, date, ...)
  • Link to Spaya synthesis planning server
  • Searchable and navigatable user manual
  • Macro to retrieve and visualize world-wide Corona virus spreading
  • Lots of new features, range filter animations, smarter labels, ...



RSC CICAG YouTube channel


The video of the ChimeraX workshop is now online

The RSC CICAG YouTube channel is building up a very useful collection of videos



Open Chemistry and Google Summer of Code


OpenChem is once again participating in the Google Summer of Code.

Avogadro 2 Project Ideas

  • Project: Python-based Compute and Data Server
  • Project: Biological Data Visualization
  • Project: Scripting Bindings
  • Project: Integrate with RDKit
  • Project: Tools for Interactive Molecular Dynamics

Open Babel Project Ideas

  • Project: Integrate CoordGen library
  • Project: Implement MMTF format
  • Project: Test Framework Overhaul
  • Project: Develop a JavaScript version of Open Babel
  • Project: Develop a validation and standardization filter

cclib Project Ideas

  • Project: Support for QCSchema JSON output
  • Project: Implement new parsers
  • Project: Discovering computational chemistry content online

QC-Devs Project Ideas

  • Project: Visualization of Molecular Structure and Reactivity
  • Project: Extended Interoperability of ChemTools and Quantum Chemistry Software
  • Project: Visualize Chemical Reactions
  • Project: Extended interoperability of GOpt and Quantum Chemistry Software
  • Project: Implement Workflows for Calculation and Usage of Databases of Isolated Atom Densities
  • Project: Orthogonal Procrustes for Rectangular Matrices
  • Project: Faster Molecular Integrals with Density-Fitting

3Dmol.js Project Ideas

  • Project: Improve 3Dmol.js

gnina Project Ideas

  • Project: Improve gnina

NWChem Project Ideas

  • Project NWChem-JSON
  • Project NWChem-Python-Jupyter Interface
  • JSON-LD for Chemical Data

DeepChem Project Ideas

  • Project: PyTorch Lightning Implementation
  • Project: Semiconductor Modeling Support
  • Project: Protein Language Models

Miscellaneous Project Ideas

  • Project: OneMol: Google Docs & YouTube for Molecules

There are more details of the potential ideas here or contribute your own idea.


RDKit blog


If you are an RDKit user then you should bookmark Greg Landrum's RDKit blog. This is a new site and all the old content will be migrated in due course.



Searching Ultra large databases


There have been many estimates of the size of chemical space; an oft-quoted number is 10^60, a number large enough to be effectively infinite.

At the beginning of December 2020 the NIH held a workshop looking at ultra large chemistry databases.

Program Dec. 1, 2020

10:45 Susan Gregurick Welcome
11:00 Yurii Moroz Making virtual REAL: an Approach to Access Billions of Make-on-Demand Compounds
11:30 Daniel Kuhn Searching for novel chemical hit matter in large chemical spaces
12:00 Uta Lessel Boehringer Ingelheim Comprehensive Library of Accessible Innovative Molecules (BICLAIM)
12:30 Zhijie Liu Build & Explore Virtual Libraries for Drug Discovery Projects
1:00 Christos Nicolaou Idea2Data: Expediting Drug Discovery through Proximal Library Exploitation
1:30 Jason Deng & John Shirley Introduction to DEL informatics and Virtual Spaces at WuXi AppTec
2:00 Jennifer Elward Exploring GSK Space: Practical Application of Large Scale Virtual Screening
2:30 Venkatesh Mysore Screening Billions of Compounds on the AtomNet Model: Approaches and Future Directions

Dec. 2, 2020

11:00 Rick Stevens A Large-scale (4.2 Billion Molecules, 60TB) Compound Feature db for Deep Learning in Virtual Drug Screening
11:30 Vladimir Poroikov Revealing Antiviral Hits Among Billion Molecules with Ligand and Target-based Approaches
12:00 Jean-Louis Reymond The GDB Databases and Their Use for Drug Discovery
12:30 Matthias Rarey Combinatorial Approaches for Searching Synthetically Accessible Chemical Space
1:00 John Irwin Virtual Screening of Ultra Large Chemistry Databases
1:30 Gergely Zahoranszky-Kohalmi Integrated Computational Platform for Chemistry Automation
2:00 Tudor Oprea The Art of Navigating in Chemical Bioactivity Space
2:30 Marc Nicklaus & Nadya Tarasova SAVI: Billions of Easily Synthesizable Compounds Generated with Expert-System Rules
3:00 Jim Brase ATOM – Scalable Deep Learning of Generative Models for Molecular Design Optimization

Dec. 3, 2020

11:00 Andrew Dalke & Brian Cole Compression of Chemfp Databases
11:30 Roger Sayle Advances in Searching Ultra-Large (100+ Billion Compound) Compound Chemical Databases: Arthor and SmallWorld
12:00 Christian Lemmen Efficient 3D Exploration of Multi-Billion Compound Spaces
12:30 Lutz Weber & Christoph Ruttkies SciWalker Next Generation - a Novel Comprehensive Semantic Chemistry Search Engine for Heterogeneous Documents and Databases
1:00 Wolf-Dietrich Ihlenfeldt Cloud Databases and Chemical Structure Searching
1:30 Evan Bolton Chemical Space is Infinite: How Can One Scale to Infinity While Still Being Usable/Useful
2:00 Ian Wetherbee & Stephen Boyer A Collaborative Database for Chemistry in Google BigQuery
2:30 Eugene Raush Chemical Substructure Search in Ultra Large Chemical Databases: Fast Virtual Screening with Rapid Isoster Discovery Engine (RIDE)
3:00 Mark McGann GigaDocking: Structure Based Virtual Screening of Billion of Molecules

The presentations are now available online.


An invitation and a recipe


Around this time last year RSC CICAG held a meeting to discuss 20 years of the rule of 5. This meeting involved presentations but also less formal panel discussions and informal chats. This proved to be really popular and CICAG had planned to hold another end of year meeting, again with the aim of fostering more informal interactions. However, the efforts involved in converting the AI in chemistry and the Open Chemical Science meetings from physical to virtual meetings meant that we had to postpone the end of year meeting till next year.

In its place we have organised an impromptu webinar looking back at the history of subjects at the core of CICAG's interests.

Andrew Dalke has kindly agreed to talk about the history of cheminformatics from punch cards to the present day.

John Overington will also talk about the history of ChEMBL, an absolutely invaluable open-source database that we now take for granted.

Our aim is that we will try to make this informal with plenty of time to ask questions/reminisce so stock up on mince pies and mulled wine, click the link below and settle back for a fascinating hour or two.

You are invited to a Zoom webinar.

When: Dec 22, 2020 04:00 PM London
Topic: Looking back at Cheminformatics and ChEMBL

Register in advance for this webinar:

After registering, you will receive a confirmation email containing information about joining the webinar.

I promised a recipe.

Mulled Wine

Whilst you can buy bottles of mulled wine, I think making your own gives better results (we are chemists after all).


A bottle of inexpensive red wine
2 oranges
2 cinnamon sticks
4 Cloves
2 Star anise
30 g Sugar (or more to taste)

Pour the red wine into a saucepan and add the cinnamon sticks (you can use ground cinnamon), cloves and star anise. Then add the zest from one of the oranges plus the orange juice together with the sugar. Heat gently to dissolve the sugar and then simmer on low heat for 10 mins.

Serve whilst warm and add orange slices to decorate.

You can add a little brandy or Grand Marnier to give a little extra kick.


Version 2.0.0 of alvaDesc released


Just got this message

We are happy to inform you that we have just released the version 2.0.0 of alvaDesc.


Here is the list of the main changes:

  • the descriptors can be explored using a spreadsheet-like table with sorting and filtering capabilities
  • PCA data can be exported to tab-separated text files
  • in addition to PCA, the t-SNE algorithm can be used for dimensionality reduction
  • the 'Lasso selection' can be used to select molecules in all charts
  • many 3D descriptors (e.g., 3D Atom Pairs) can be calculated on molecules having 3D coordinates only on the molecular skeleton (i.e., having no 3D coordinates for H atoms)
  • if needed, the partial charges are calculated using Gasteiger's "Partial Equalization of Orbital Electronegativity" (PEOE)

There is a review of alvaDesc here.


Molecular Similarity Search Benchmark (MssBenchmark)


This looks like it could be a very useful resource.

The Molecular Similarity Search Benchmark (MssBenchmark) is on GitHub; the benchmarks can be run on your local machine or on an HPC cluster.

Currently supports

They also have ChEMBL and Molport as test datasets.


  • ansicolors==1.1.8
  • docker==2.6.1
  • h5py==2.7.1
  • matplotlib==2.1.0
  • numpy==1.13.3
  • pyyaml==3.12
  • psutil==5.4.2
  • scipy==1.0.0
  • scikit-learn==0.19.1
  • jinja2==2.10
  • h5sparse==0.1.0

Online Events


The current global pandemic means that more events are moving online. Here are details of a few that have been sent to me.

Dotmatics User Symposium | Cambridge 2020 14th & 15th October Details and Registration.

KNIME Introduction to Working with Chemical Data October 12 - 16, 2020 details and registration.

Virtual RDKit UGM 6-8 October 2020 details and registration.

16th German Conference on Cheminformatics and EuroSAMPL Satellite Workshop 2-3 November 2020 details

Open Chemical Science 9 - 13 November 2020 details.


iBabel downloaded over 500 times


I just noticed that the latest update to iBabel has now been downloaded over 500 times.

iBabel is a graphical user interface (GUI) to the open-source cheminformatics toolkit Open Babel described in an article in J Cheminformatics, Open Babel: An open chemical toolbox DOI. iBabel was originally written as an AppleScript Studio application which underwent several updates, but gradually became unsupportable. So rather than try to patch the AppleScript Studio/AppleScriptObjC application I decided to start afresh with a complete rewrite using Cocoa/Swift to take advantage of new technologies. This has meant some features have been lost and some new ones added, but hopefully the core features are still there.


Instructions for downloading and installing are on the main iBabel page. Remember you need to install Open Babel first.


AI3 Science Discovery Network+ YouTube Channel


The AI3SD Network+ (Artificial Intelligence and Augmented Intelligence for Automated Investigations for Scientific Discovery) YouTube channel is now up and running.

The network+ is funded by EPSRC and hosted by the University of Southampton and aims to bring together researchers looking to show how cutting edge artificial and augmented intelligence technologies can be used to push the boundaries of scientific discovery.

  1. Drug Repositioning for COVID-19 - Professor John Overington (Medicines Discovery Catapult)
  2. InChI: Measuring the Molecules - Professor Jonathan Goodman (University of Cambridge)
  3. Design Fiction as a Method and why we might use it to consider AI - Dr Naomi Jacobs (Lancaster University)
  4. Neural Networks and Explanatory Opacity - Dr Will McNeill (University of Southampton)
  5. Dimensionality in Chemistry: Using multidimensional data for machine learning - Dr Ella Gale (University of Bristol)



alvaBuilder is a software tool for de novo molecular design. With its simple interface, it can be used to generate novel molecules having a desirable set of properties (e.g., similarity to a given molecule, MW, logP, SAscore, QED, etc.) starting from a training set of your choice.


There is a video introduction here.


New addition to MayaChemTools


I've just heard about a new addition to the superb MayaChemTools.

MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The latest addition

Perform torsion scan for molecules around torsion angles specified using SMILES/SMARTS patterns. A molecule is optionally minimized before performing a torsion scan. A set of initial 3D structures are generated for a molecule by scanning the torsion angle across the specified range and updating the 3D coordinates of the molecule. A conformation ensemble is optionally generated for each 3D structure representing a specific torsion angle. The conformation with the lowest energy is selected to represent the torsion angle. An option is available to skip the generation of the conformation ensemble and simply calculate the energy for the initial 3D structure for a specific torsion angle.

The torsions are specified using SMILES or SMARTS patterns. A substructure match is performed to select torsion atoms in a molecule. The SMILES pattern match must correspond to four torsion atoms. The SMARTS patterns containing atom indices may match more than four atoms.
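The scan logic described above can be sketched in a few lines of plain Python. This is only an illustration of the loop structure (step through the angles, evaluate an ensemble, keep the lowest energy); the energy function is a made-up threefold cosine potential and the conformer offsets are hypothetical, standing in for the force-field calculations the real script performs with RDKit.

```python
import math


def toy_energy(angle_deg, conformer_offset=0.0):
    """Made-up threefold torsion potential plus a per-conformer offset (arbitrary units)."""
    return 1.5 * (1 + math.cos(math.radians(3 * angle_deg))) + conformer_offset


def torsion_scan(start, stop, step, conformer_offsets=(0.0,)):
    """Return (angle, lowest energy) for each angle from start to stop inclusive.

    conformer_offsets mimics a conformation ensemble: one energy is computed
    per offset and the lowest is kept. A single offset of 0.0 corresponds to
    skipping ensemble generation and scoring only the initial structure.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    profile = []
    angle = start
    while angle <= stop:
        energies = [toy_energy(angle, offset) for offset in conformer_offsets]
        profile.append((angle, min(energies)))
        angle += step
    return profile


for angle, energy in torsion_scan(0, 180, 30, conformer_offsets=(0.4, 0.0, 0.9)):
    print(f"{angle:4d} deg  {energy:6.3f}")
```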


infiniSee 2.0 released


The amount of chemical space that is now accessible has increased rapidly over the last couple of years, with vendors now offering billions of molecules either available from stock or via rapid synthesis. This has made searching increasingly difficult. infiniSee is a tool for searching these vast chemical spaces rapidly using a feature tree search.

Under the hood lies the FTrees similarity engine. FTrees employs a fuzzy pharmacophore descriptor that is able to pick out structurally distinct (therefore distant) molecules that are actually close neighbours in pharmacophore space.

This major update introduces a new user interface making infiniSee more intuitive and easy-to-use. In addition to the new interface, the update brings a new mode called Web Service. It allows power-users to shift the heavy-duty computing to a different machine.


There is a review of an older version of infiniSee here.


The updated APIs of IBM RXN for Chemistry are now available.


IBM RXN is a free web service for predicting chemical reactions.

Whether it’s daily research activity or experiments for fun, IBM RXN can help you predict chemical reaction outcomes or design retrosynthesis in just seconds

  • We provide a state-of-the-art trained artificial intelligence (AI) model that can be used in your daily research activities irrespective of the purpose
  • Use the prediction mode to open a project and invite collaborators to collectively plan complex synthesis.
  • Use the challenge mode to test your Organic Chemistry knowledge and prepare for class exams
  • Design your retrosynthesis either using the automatic or the interactive mode. In the interactive mode, IBM RXN for Chemistry – just like an assistant – recommends disconnections and you choose.

For testing the synthesizability of a molecule or for digitizing recipes, use the RXN APIs whenever you need an AI-driven organic chemistry assistant in your code.

The full documentation is here.


iBabel version 4.0 released


At the start of 2020 I decided that I'd try learning to program in Swift using Xcode. My first project was molSeeker, a tool for searching online resources using chemical identifiers. This was really just a vehicle for me to learn Swift and now I'm delighted to be able to release a more substantial effort, a complete rewrite of iBabel.

iBabel is a graphical user interface (GUI) to the open-source cheminformatics toolkit Open Babel described in an article in J Cheminformatics, Open Babel: An open chemical toolbox DOI. iBabel was originally written as an AppleScript Studio application which underwent several updates. However, recent changes have made this unsupportable so I decided on a complete Cocoa/Swift rewrite.

You can read all about the new version of iBabel here including the links to the download. I've attached a couple of screenshots to give you an idea of what functionality is available.

File Conversion Tools


Viewing Molecule Files


Extension of molSeeker


iBabel 4.0 is freely available for download.

Many thanks to Peter Ertl and Bruno Bienfait for the JSME editor and David Koes for 3Dmol.js the 3D viewer and all the OpenBabel team.


The beta of the 2020.03 RDKit released


The beta of the 2020.03 RDKit is now available on GitHub

Backwards incompatible changes:

  • Searches for equal molecules (i.e. mol1 @= mol2) in the PostgreSQL cartridge now use the do_chiral_sss option. So if do_chiral_sss is false (the default), the molecules CC(F)Cl and C[C@H](F)Cl will be considered to be equal. Previously these molecules were always considered to be different.
  • Attempting to create a MolSupplier from a filename pointing to an empty file, a file that does not exist or something that is not a standard file (i.e. something like a directory) now generates an exception.
  • The cmake option RDK_OPTIMIZE_NATIVE has been renamed to RDK_OPTIMIZE_POPCNT


  • The drawings generated by the MolDraw2D objects are now significantly improved and can include simple atom and bond annotations
  • An initial implementation of a modified scaffold network algorithm is now available
  • A few new descriptor/fingerprint types are available - BCUTs, Morse atom fingerprints, Coulomb matrices, and MHFP and SECFP fingerprints

Plus lots of bug fixes.


ChemRPS a Chemical Registration and Publishing System


Whilst there are many commercial packages for creating structure searchable chemical databases there is little in the way of Open Source packages, in particular a solution that provides a web front end. There is the RDKit PostgreSQL cartridge; however, installing PostgreSQL and building the database is probably a step too far for those unfamiliar with the use of the command line.


I recently came across ChemRPS. Whilst this uses the same RDKit PostgreSQL cartridge, it adds a search engine (API) and a preconfigured web server with register/search web pages, including the Ketcher structure editor from EPAM. The installation comes as a Docker image, which should make things much easier.

The system had not been tested on a Mac so I've detailed the instructions in this review…


Cambridge Cheminformatics Meeting


A brief reminder about the next Cambridge Cheminformatics Network Meeting on Wednesday, 12 February 2020, at the Cambridge Crystallographic Data Centre (CCDC), starting at 3.30pm with coffee and talks from 4pm onwards - this time it will be a 'startup event', with the program being as follows:

"Quantum Software for Quantum Chemistry"
Joan Camps, Riverlane

"Improving virtual screening by combining molecular docking and hydrophobic profile similarity"
Javier Vazquez, Pharmacelera

"Imputation of heterogeneous assay data using deep learning"
Tom Whitehead, Intellegens

We will as usual retreat to the Alma afterwards (around 5.30pm) - no registration is necessary, and everyone is welcome to attend. (If you plan to attend please aim to turn up somewhat before the event starts, given that participants need to sign in upon arrival at the CCDC which may take a few minutes.)


Vega Hub


Vega Hub offers a variety of models for predicting properties.

Compared with many existing QSAR models, we have put greater emphasis on ensuring that the models generate transparent, understandable, reproducible and verifiable results. To achieve this, a series of tools has been optimised, which can relate the results obtained for the target chemical to the results obtained for similar (structurally related) compounds.

Models can be downloaded here


Small molecules approved by FDA in 2019


After I posted Small molecules approved by FDA in 2019 a number of people contacted me asking for the dataset, and then asked how it was created. So I thought I'd put together a brief description of the process.


Ensemble learning in Cheminformatics


Yet another invaluable post on cheminformatics and machine learning: Python package for Ensemble learning #Chemoinformatics #Scikit learn.

Ensemble learning sometime outperform than single model. So it is useful for try to use the method. Fortunately now we can use ensemble learning very easily by using a python package named ‘mlens‘

Install using pip

pip install mlens

ML-Ensemble (mlens) is an open-source high performance ensemble learning package written in Python. The code is available on GitHub

ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framework to build memory efficient, maximally parallelized ensemble networks in as few lines of codes as possible.
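As a rough illustration of why combining models can help, here is a minimal majority-vote sketch in plain Python. It does not use mlens or its API (mlens builds layered ensembles on top of scikit-learn estimators); the three sets of "model predictions" and the true labels are invented so that each model makes a different mistake.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class labels from several models, given one prediction list per model."""
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest count for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of samples where the predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Invented labels: every model is wrong on a different sample.
truth = [1, 0, 1, 1, 0]
model_predictions = [
    [1, 0, 0, 1, 0],  # hypothetical model A
    [1, 1, 1, 1, 0],  # hypothetical model B
    [0, 0, 1, 1, 0],  # hypothetical model C
]

for name, preds in zip("ABC", model_predictions):
    print(f"model {name}: accuracy {accuracy(preds, truth):.2f}")
print(f"ensemble: accuracy {accuracy(majority_vote(model_predictions), truth):.2f}")
```

The gain only appears when the models' errors are not strongly correlated; three copies of the same model would vote for the same mistakes.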


CSFP - A New Molecular Fingerprint


An interesting paper in JCIM, Connected Subgraph Fingerprints: Representing Molecules Using Exhaustive Subgraph Enumeration DOI.

The very popular ECFP fingerprint enumerates all circular substructures, i.e. substructures with a central atom and a spherical extension around them. However, not all chemical reasonable substructures are circular. They could be shaped as paths, cycles or any other irregular form and consequently cannot be represented as single features in ECFPs. To overcome these limitations, we developed a novel algorithm named CONSENS systematically enumerating all connected substructures within given size limits. CONSENS is the central element of a novel fingerprint named CSFP - Connected Subgraph Fingerprint. CSFPs are not only richer in represented substructures, furthermore they allow finegrained control to the chemical model encoded.

ConsensLib is a header-only C++ library for efficient enumeration of connected induced subgraphs. The CONSENS (Connected Subgraph Enumeration Strategy) algorithm enumerates all node sets that form a connected subgraph of a given query graph. It is available on GitHub
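To show what "enumerating all connected substructures within given size limits" means, here is a naive sketch in plain Python. It is not the CONSENS algorithm and makes no attempt at its efficiency; it simply grows connected atom sets one neighbour at a time and removes duplicates with a set. The adjacency dictionary is a hand-written heavy-atom graph of isobutane used as an assumed example.

```python
def connected_subgraphs(adjacency, max_size):
    """Return every connected node set with between 1 and max_size nodes.

    adjacency maps each node to the set of its neighbours. Any connected set
    of k nodes can be built by adding a neighbouring node to some connected
    set of k - 1 nodes, so growing level by level finds all of them.
    """
    found = set()
    frontier = {frozenset([node]) for node in adjacency}
    while frontier:
        found |= frontier
        next_frontier = set()
        for nodes in frontier:
            if len(nodes) >= max_size:
                continue
            # All atoms bonded to the current set but not yet part of it.
            neighbours = set().union(*(adjacency[n] for n in nodes)) - nodes
            for neighbour in neighbours:
                next_frontier.add(nodes | {neighbour})
        frontier = next_frontier - found
    return found


# Heavy-atom graph of isobutane: central carbon 0 bonded to carbons 1, 2 and 3.
isobutane = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

for atoms in sorted(connected_subgraphs(isobutane, 3), key=lambda s: (len(s), sorted(s))):
    print(sorted(atoms))
```

The three-atom set made of atoms 0, 1 and 2 is a path through the central carbon rather than a complete sphere around any one atom, which is the kind of non-circular feature the abstract refers to.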


OpenEye Toolkits v2019.Oct


OpenEye have announced the release of OpenEye Toolkits v2019.Oct. These libraries include the usual support for C++, C#, Java, and Python.


  • Spruce TK, a new toolkit for preparing biomolecular structures for modeling applications, is now available in both C++ and Python APIs. Full details of the methodology are here.
  • OEChem TK now provides improved substructure search capability, allowing users to search tens of millions of molecules in seconds.
  • SMIRNOFF, a small molecule force field from the Open Force Field Initiative, is now integrated into OEFF TK. The force field can handle almost all pharmaceutically relevant chemical space

Full details are in the release notes.

This is the last release to support macOS 10.12.


RDKit 2019_09_1 (Q3 2019) Release


A new version of RDKit has been released


  • The substructure matching code is now about 30% faster. This also improves the speed of reaction matching and the FMCS code.
  • A minimal JavaScript wrapper has been added as part of the core release.
  • It's now possible to get information about why molecule sanitization failed.
  • A flexible new molecular hashing scheme has been added.

There are however a number of backward incompatible changes detailed in the documents.

Also the old MolHash code should be considered deprecated. This release introduces a more flexible alternative.

Binaries have been uploaded. The available conda binaries for this release are:

  • Linux 64bit: python 3.6, 3.7
  • Mac OS 64bit: python 3.6, 3.7
  • Windows 64bit: python 3.6, 3.7

Some things that will be finished over the next couple of days:

  • The conda build scripts will be updated to reflect the new version
  • The homebrew script


Installing RDKit using Homebrew


I just saw this message on the RDKit users message board which offers a method to install RDKit using Homebrew. I use Anaconda to install RDKit so I've not tested it.

Recently, I updated the brew install recipe for rdkit on Mac. The biggest change is that boost and boost-python's versions were pinned down, so that the brew install recipe should be much more reproducible than before. Here is a fail-safe way to install rdkit with it (with Python wrappers, and InChI support):

I've added the instructions to the Cheminformatics on a Mac page as an alternative to using Anaconda to install RDKit.

The RDKit is an open source toolkit for cheminformatics, 2D and 3D molecular operations, descriptor generation for machine learning, etc.



Crowdfunding software development


Some time ago I wrote a piece on my thoughts on scientific software development. I got a lot of very positive feedback, and one of the comments about not knowing about available cheminformatics toolkits led me to create a page on open source toolkits. However, this really did not address the underlying problem of how to fund specialist scientific software.

Which is why I was intrigued to hear about Andrew Dalke's efforts to crowdfund the development of open source cheminformatics software.

This is an experiment to see if a crowdfunding consortium can be used to fund the matched molecular pair program “mmpdb”. The deadline to join is 1 February 2020!

The project is mmpdb; initial work was described in an article in JCIM "mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets" DOI.

Here we present mmpdb, an open-source matched molecular pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large data sets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit.

Go over to the project page to find out more and if you can contribute please do, and also please share the link. He will be talking at the RDKit UGM #rdkitugm2019 and the presentation will probably be online later.


Greg Landrum's ACS talk on RDKit


I've created a page of open source cheminformatics toolkits here.


Jupyter notebook to look at molecular similarity


I was recently asked for a tool to compare the similarity of a list of molecules with every other molecule in the list. I suspect there may be commercial tools to do this but for small numbers of compounds it is easy to visualise in a Jupyter notebook using RDKit.

Read more here, MolecularSimilarityNotebook
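
As a rough illustration of the all-against-all comparison, here is a minimal sketch in plain Python. It is not the code from the notebook: the fingerprints are hand-written sets of on-bit indices standing in for fingerprints that RDKit would generate, and the molecule names and the helper functions `tanimoto` and `similarity_matrix` are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_matrix(fps):
    """Return a symmetric matrix of pairwise Tanimoto similarities."""
    n = len(fps)
    matrix = [[1.0] * n for _ in range(n)]  # diagonal: each molecule vs itself
    for i, j in combinations(range(n), 2):
        score = tanimoto(fps[i], fps[j])
        matrix[i][j] = matrix[j][i] = score
    return matrix


# Stand-in fingerprints (assumed values, for illustration only).
fingerprints = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 7, 12}),
    "mol_c": frozenset({2, 3}),
}

names = list(fingerprints)
matrix = similarity_matrix([fingerprints[name] for name in names])
for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:.2f}" for value in row))
```

For a handful of compounds the quadratic number of comparisons is not a problem, which is why a notebook is a comfortable place to do this.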



Jupyter notebook to create Wordcloud of tweets


I've often wanted to try creating a word cloud and when Noel O'Boyle collected together all the tweets from the Sheffield Conf on Chemoinformatics this seemed a good opportunity.

Relive the Sheffield Conf on Chemoinformatics with these #shef2019 tweets I've pulled down from Twitter, link to tweet.

The Jupyter notebook used to create the word cloud is here; it uses the excellent word cloud generator word_cloud. You will need to download the text of the tweets from the link provided in the tweet.



Adding substructure searching to a FileMaker Pro Database


Anyone who has had to store or search a collection of chemical structures rapidly realises that they need a software tool with a little chemical intelligence. Whilst there are a number of commercial databases, they tend to be rather expensive and often require a knowledge of SQL or dedicated IT support. That is fine for large corporations but not suitable for a single chemist or small group. In contrast, FileMaker Pro is a popular desktop database with an easy-to-use interface (there are also server and mobile versions). Unfortunately, whilst it is easy to use, it does not support chemical structure-based searching. Fortunately, FileMaker Pro comes with an easy-to-use scripting interface, and we can create scripts that run command line applications like Open Babel.


This tutorial shows how to add substructure and similarity searching to a FileMaker Pro database. Full details are available here, including a download of the example database.


NextMove open source MolHash


MolHash is a command-line application and programming library for generating hashes from molecular structures. This section gives an overview of each of the most useful hash functions in turn. The user should find it straightforward to add additional hash functions, or tweak the existing ones.

The source code is available on GitHub

CMAKE, RDKit and Boost are required.

There are detailed compilation and installation instructions on GitHub, but I got several errors asking where RDKit was, etc.

Fortunately, thanks to Matt, you can now install using conda

conda install -c mcs07 -c conda-forge molhash

Once installed you can check it is working by typing this in the Terminal

MacPro:username$ molhash -help
usage:  molhash [options] <infile> [<outfile>]
    Use a hyphen for <infile> to read from stdin
    -a  Process all the molecule (and not just the single largest component)
    -sa Suppress atom stereo
    -sb Suppress bond stereo
    -sh Suppress explicit hydrogens
    -si Suppress isotopes
    -sm Suppress atom maps
    -t  Store titles only
hash type:
    -g   anonymous graph [default]
    -e   element graph
    -s   canonical smiles
    -m   Murcko scaffold
    -mf  molecular formula
    -ab  atom and bond counts
    -dv  degree vector
    -me  mesomer
    -ht  hetatom tautomer
    -hp  hetatom protomer
    -rp  redox-pair
    -ri  regioisomer
    -nq  net charge

An example of usage

MacPro:username$ echo "c1ccccc1C(=O)Cl" | molhash -mf -
C7H5ClO c1ccc(cc1)C(=O)Cl

CGRtools: Python Library for Molecule, Reaction and Condensed Graph of Reaction Processing


CGRtools is a set of tools for processing reactions based on the Condensed Graph of Reaction (CGR) approach. Details are on GitHub, and it has been published in JCIM DOI.

Basic operations:

  • Read/write/convert formats: MDL .RDF and .SDF, SMILES, .MRV
  • Standardize reactions and check the validity of structures.
  • Produce CGRs.
  • Perform subgraph search.
  • Build/correct molecules and reactions.
  • Produce template-based reactions.

Stable versions are available through PyPI

pip install CGRTools

Install CGRtools library DEV version for features that are not well tested

pip install -U git+

There is also a tutorial using Jupyter notebook


A review of alvaDesc


alvaDesc is a desktop tool for the calculation of a wide range of molecular descriptors and a number of molecular fingerprints. alvaDesc can be used to determine over 5000 different descriptors (the full list is here).

It can be accessed via the command line or via a GUI.


The complete review is here.


Chemfiles 0.9


Just got this message

We are very happy to announce the release of the 0.9 version of Chemfiles. Chemfiles is a C++ library providing write and read access to chemistry file formats. Chemfiles also has bindings to other languages and can be used from C, Fortran, Python, Julia and Rust.

Source code is available on GitHub, and the library is described in detail here DOI.

It can be installed using Conda

conda install -c conda-forge chemfiles

There are other libraries for file conversion, in particular OpenBabel, a C++ library providing conversions between more than 110 formats.


Open Forcefield 0.2


The 0.2.0 release of the Open Force Field Toolkit, featuring RDKit support and the new-and-improved SMIRNOFF v0.2 force field spec has been announced.

We're excited to announce the public release of the Open Force Field toolkit version 0.2.0! Most notably, this release adds the ability to assign SMIRNOFF parameters and AM1-BCC charges with a completely open-source backend, adding support for the RDKit and AmberTools via a new ToolkitWrapper infrastructure that can be extended in the future to support additional cheminformatics toolkits. The OpenEye Toolkit will continue to be supported, as well as used internally in our parameter-fitting pipelines in the short term. We're extremely grateful to the long list of contributors that have made this release possible, especially Shuzhe Wang from the Riniker group for piloting much of the RDKit functionality.


The Chemfp Project


The Chemfp project started as a way to promote the FPS format for cheminformatics fingerprint exchange and has evolved into a set of command-line tools and a Python library for fingerprint generation and high-performance similarity search. The 10 years of work and research results of the chemfp project have now been described in an excellent publication.

I looked at Chemfp when comparing various options for clustering large datasets and Chemfp was one of the highest performing, and Andrew Dalke was very responsive to questions.


Chembience updated


Update to RDKit 2018.09.2 and Postgres 10.7.

Chembience is a Docker based platform supporting the fast development of chemoinformatics-centric web applications and microservices. It creates a clean separation between your scientific web service implementation and any host-specific or infrastructure-related configuration requirements.


Counting Identical structures in two datasets


Sometimes I have two datasets and I just want to know the overlap of identical structures. This Vortex script counts the number of identical structures by comparing InChIKeys. It then displays a matrix showing how many unique molecules are in each dataset and how many molecules are in both datasets.
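
The underlying idea is just set arithmetic on the InChIKeys. Here is a minimal sketch in plain Python; it is not the Vortex script itself, the function name `overlap_counts` is hypothetical, and the keys are simply sample strings standing in for a column read from each dataset.

```python
def overlap_counts(keys_a, keys_b):
    """Count InChIKeys found only in A, only in B, and in both datasets."""
    set_a, set_b = set(keys_a), set(keys_b)  # sets also collapse duplicates
    return {
        "only_a": len(set_a - set_b),
        "only_b": len(set_b - set_a),
        "both": len(set_a & set_b),
    }


# Sample InChIKey columns (illustrative input, including one duplicate in A).
dataset_a = [
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
    "UHOVQNZJYSORNB-UHFFFAOYSA-N",
    "UHOVQNZJYSORNB-UHFFFAOYSA-N",
]
dataset_b = [
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
]

counts = overlap_counts(dataset_a, dataset_b)
print(counts)
```

Comparing full InChIKeys treats stereoisomers as different molecules; comparing only the first 14 characters would be one way to relax that, if it suits the question being asked.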



Update to MayaChemTools


I just heard that the following command line scripts available as part of MayaChemTools package now have implemented multiprocessing functionality.










2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry


In June 2018 the First RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting was held in London. This proved to be enormously popular: there were more oral abstracts and poster submissions than we had space for, and it was so over-subscribed that we could have filled a venue double the size.

Planning for the second meeting is now in full swing, and it will be held in Cambridge 2-3 September 2019.

Event : 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry
Dates : Monday-Tuesday, 2nd to 3rd September 2019
Place : Fitzwilliam College, Cambridge, UK
Websites : Event website, and RSC website.

Twitter #AIChem19


Applications for both oral and poster presentations are welcomed. Posters will be displayed throughout the day and applicants are asked if they wish to provide a two-minute flash oral presentation when submitting their abstract. The closing dates for submissions are:

  • 31st March for oral and
  • 5th July for poster

Full details can be found on the Event website.


New release of MayaChemTools


A new release of MayaChemTools is now available. It comprises a fantastic collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors and partial charges
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Performing R group decomposition
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules and salts from molecules

The command line Python scripts based on PyMOL provide functionality for the following tasks:

  • Aligning macromolecules
  • Splitting macromolecules into chains and ligands
  • Listing information about macromolecules
  • Calculation of physicochemical properties
  • Comparison of macromolecules based on RMSD
  • Conversion between different ligand file formats
  • Mutating amino acids and nucleic acids
  • Generating Ramachandran plots
  • Visualizing X-ray electron density and cryo-EM density
  • Visualizing macromolecules in terms of chains, ligands, and ligand binding pockets
  • Visualizing cavities and pockets in macromolecules
  • Visualizing macromolecular interfaces
  • Visualizing surface and buried residues in macromolecules


Programming Languages for Chemical Information


This looks like it should be well worth bookmarking.

This thematic series comprises a set of invited papers, each one describing the use of a single language for the development of cheminformatics software that implement algorithms and analyses and aims to cover a variety of language paradigms. The issue will be rolling, such that as papers on new languages are submitted they will be automatically added to this issue.

The first article DOI is by Kevin Theisen (of ChemDoodle fame) reviewing HTML5/JavaScript. Apparently there have been more lines of JavaScript written than all other programming languages combined, so it seems appropriate as a kick-off article.


Using the Python 3 library fpsim2 for similarity searches


FPSim2 is a new tool for fast similarity search on big compound datasets (>100 million) being developed at ChEMBL. It was developed as a Python3 library to support either in memory or out-of-core fast similarity searches on such dataset sizes.

It is built using RDKit and can be installed using conda. It requires Python 3.6 and a recent version of RDKit.

I've written a couple of Jupyter notebooks to demonstrate its use.

You can read the full tutorial here, and download the notebooks.


Chemical reactions from US patents (1976-Sep2016)


Great work by NextMove: an open, machine-readable, freely-reusable, annotated reaction data set, available for download here.

Reactions extracted by text-mining from United States patents published between 1976 and September 2016. The reactions are available as CML or reaction SMILES. Note that the reaction SMILES are derived from the CML.

Reaction SMILES

For convenience the reaction SMILES includes tab delimited columns for: PatentNumber, ParagraphNum, Year, TextMinedYield, CalculatedYield
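
To give a feel for working with these files, here is a minimal sketch of parsing one tab-delimited line in plain Python. It assumes the reaction SMILES is the first column followed by the five columns listed above; the example line is invented for illustration (it is not taken from the data set), and `ReactionRecord`, `split_components` and `parse_reaction_line` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class ReactionRecord(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]
    patent_number: str
    paragraph_num: str
    year: int
    text_mined_yield: Optional[str]
    calculated_yield: Optional[str]


def split_components(part: str) -> List[str]:
    """Split one side of a reaction SMILES into molecules on '.'."""
    return part.split(".") if part else []


def parse_reaction_line(line: str) -> ReactionRecord:
    """Parse 'rxn<TAB>PatentNumber<TAB>ParagraphNum<TAB>Year<TAB>yields...'."""
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (6 - len(fields))  # pad any missing trailing columns
    rxn, patent, paragraph, year, mined, calculated = fields[:6]
    rxn = rxn.split(" ", 1)[0]  # drop anything after a space (assumed annotation)
    reactants, agents, products = rxn.split(">")  # reactants>agents>products
    return ReactionRecord(
        split_components(reactants),
        split_components(agents),
        split_components(products),
        patent,
        paragraph,
        int(year),
        mined or None,  # yields kept as text; empty column becomes None
        calculated or None,
    )


# Invented example line, for illustration only.
example_line = "CC(=O)O.OCC>OS(=O)(=O)O>CC(=O)OCC\tUS0000000\t0001\t1976\t85%\t\n"
record = parse_reaction_line(example_line)
print(record.year, record.reactants, record.products)
```

Splitting on "." is deliberately naive: it will separate the ions of a salt into individual components, so anything beyond a quick look would be better done with a proper cheminformatics toolkit.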

Now that we have a large initial data set it would be great if others could contribute using the same format.

There is a fabulous detailed review of this invaluable resource on the Depth-First blog




Just came across this.

Alvascience cheminformatics tools.

BMFpred is an easy-to-use software implementing the QSAR models described in “F. Grisoni, V.Consonni, M.Vighi (2018). Acceptable-by-design QSARs to predict the dietary biomagnification of organic chemicals in fish, Integrated Environmental Assessment and Management” to predict the laboratory-based fish Biomagnification Factor (BMF) of chemicals.

alvaDesc is the next generation tool for the calculation of a wide range of molecular descriptors and a number of molecular fingerprints. Specifically it calculates almost 4000 descriptors independent of 3-dimensional information such as constitutional, topological, pharmacophore. It includes ETA and Atom-type E-state indices together with functional groups and fragment counts. Additionally, alvaDesc implements an extensive number of 3-dimensional descriptors such as 3D-autocorrelation, Weighted Holistic Invariant Molecular descriptors (WHIM) and GETAWAY.


Also available as a KNIME node, the alvaDesc KNIME Plugin contains three KNIME nodes:

  • Descriptor: calculates molecular descriptors
  • Fingerprint: calculates molecular fingerprints
  • Molecule Reader: reads standard molecule files and can be used as a source for the other two nodes (which are also compatible with KNIME standard molecule nodes)


Workshop on Computational Tools for Drug Discovery


In many companies/institutions/universities new arrivals are presented with a variety of desktop tools with little or no advice on how to use them other than "pick it up as you go along". This workshop is intended to provide expert tutorials to get you started and show what can be achieved with the software.

The tutorials will be given by a series of outstanding experts: Christian Lemmen (BioSolveIT), Akos Tarcsay (ChemAxon), Giovanna Tedesco (Cresset), Dan Ormsby (Dotmatics), Greg Landrum (KNIME) and Matt Segall (Optibrium). You will be able to install the software packages on your own laptops, together with a license to allow you to use them for a limited period after the event.

Registration and full details are here.

Computational Tools Flyer


OpenEye Applications v2018.Nov released


OpenEye announced the release of OpenEye Applications v2018.Nov. These applications include the usual support for Linux, Mac OSX, and Windows; download here.

Supported versions of Mac OSX: 10.11, 10.12, 10.13, 10.14.

  • OpenEye Applications are now released as an applications package. It includes all the OpenEye Applications suites except for VIDA and AFITT.
  • ROCS is now built on top of Shape TK 2.0.
  • QUACPAC now includes improvements to the tautomer functionality.

Full release notes


Cambridge Cheminformatics Network


28 November 2018
Cambridge Cheminformatics Meeting
CCDC, Union Road

3.30pm coffee; 4pm talks start; ~5.30pm drinks at The Alma

"Free ligand conformations in Structure Based Drug Discovery" Elisabetta Chiarparin, AstraZeneca

"Digital design – From molecules to medicines with structural informatics" Andrew Maloney , CCDC

"New Trend in Therapeutics Research - Artificial Intelligence for Identifying Novel Therapeutic Targets, Biomarkers and Drug Repositioning Opportunities" Namshik Han, Milner Institute


Making a Random Selection


Sometimes it is the simplest scripts that prove to be the most useful; the most downloaded AppleScript on the site is the one that simply prints the text on the clipboard.

I regularly need to select a specified number of molecules in a random fashion and this script does just that. Import an SDF file containing structures into Vortex and run the script to make a random selection.
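
Outside Vortex, the same idea can be sketched in a few lines of plain Python. This is not the Vortex script: it assumes the SD file has been read into a string, relies on the fact that SD records end with a "$$$$" line, and uses placeholder records rather than real mol blocks. The names `split_sdf` and `random_selection` are hypothetical.

```python
import random


def split_sdf(text):
    """Split SD file text into records, each ending with its '$$$$' line."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
    return records


def random_selection(records, count, seed=None):
    """Pick `count` distinct records at random (without replacement)."""
    if count > len(records):
        raise ValueError("cannot select more records than are available")
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return rng.sample(records, count)


# Placeholder records standing in for real SD file entries.
sdf_text = "".join(f"molecule_{i}\n(mol block omitted)\n$$$$\n" for i in range(5))
records = split_sdf(sdf_text)
picked = random_selection(records, 2, seed=42)
print([record.splitlines()[0] for record in picked])
```

Passing a seed is optional, but it is handy when the same "random" subset needs to be regenerated later.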


Full details here…


OpenEye Toolkits v2018.Oct released


OpenEye have announced the release of OpenEye Toolkits v2018.Oct. These libraries include the usual support for C++, Python, C#, and Java. HIGHLIGHTS:

  • Omega TK now includes a method specifically tuned to sample macrocyclic conformational space.
  • FastROCS TK is now available in C++ and Java.
  • Quacpac TK includes improvements to the tautomer functionality.

Full details are in the Release notes.


How to contribute to RDKit


I just noticed that Greg Landrum has posted a page on how to contribute to RDKit.

There are many ways to contribute; you don't have to be a Python or C++ developer. Simply being an active user, asking questions and contributing solutions helps other users. Improving the documentation is always a great place for newcomers to start, particularly highlighting things that are not as clear as they could be.

I've also added the link to the Toolkits page.


Install RDKit using Conda


Just highlighted on the RDKit email list, you can install RDKit using conda.

RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.

There are other cheminformatics toolkits described here, and details on how to install a wide range of cheminformatics tools on a Mac are here.


Installing Cheminformatics packages on a Mac


A while back I wrote a very popular page describing how to install a wide variety of cheminformatics packages on a Mac. Since then there have been some changes with Homebrew which mean that a few of the scientific applications are no longer available, so I've decided to rewrite the page to cover installing the missing packages using Anaconda.

I've also included a list of quick demos so you can check everything is working as expected.

Full details are here

Packages include:

  • OpenBabel
  • RDKit
  • cdk
  • chemspot
  • indigo
  • inchi
  • opsin
  • osra
  • pymol
  • oddt

The page also covers gfortran and a selection of developer tools.


Open Source Cheminformatics Toolkits


When I wrote the article entitled A few thoughts on scientific software, one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I'd publish a page that will hopefully stop people reinventing the wheel. Here are four open-source toolkits that I'm aware of; if I've missed any, my apologies, and please send me details. Listing of Open-source cheminformatics toolkits


RSC Chemical Information and Computer Applications Group


The website for the RSC Chemical Information and Computer Applications Group (CICAG) has undergone an update and now includes more information on forthcoming events and awards, together with the latest CICAG newsletter. Please feel free to share.

The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community, and to meet the requirements of the RSC’s strategy and charter.

CICAG works to support users of chemical information, data and computer applications and to advance excellence in the chemical sciences. It informs RSC members and others of the latest developments in these rapidly evolving areas and promotes the wider recognition of excellence in chemical information and computer applications at this level.





Chembience is a Docker based platform intended for the fast development of chemoinformatics-centric web applications and micro-services based on RDKit. It supports a clean separation of your scientific web service implementation work from any infrastructure related configuration requirements.


At its current development stage, Chembience supports three base types of application (App) containers: (1) a Django/Django REST framework-based App container which is specifically suited for the development of web-based Python applications, (2) a Python shell-based App container which allows for the execution of script-based Python applications, and (3) a Jupyter-based App container which lets you run Jupyter notebooks (currently only a Python kernel is supported).


Mixfile format


An interesting post on Mixtures & cheminformatics on designing a new file format to handle mixtures of chemicals, in particular things like "LDA within a solvent mixture of THF and hexanes, in a ratio of 1 to 7".


The format hasn’t been locked down yet, but it is very simple: it’s JSON-based, in order to make it easy to read & write with any software platform, and have high human readability. It’s hierarchical, making it possible to describe mixtures-of-mixtures, which happens frequently. Each component is expected to provide a structure and quantity whenever these are known, with name being also highly encouraged. Other information like canonical identifiers, database links, cross references, etc., can easily be encapsulated – the Mixfile is intended to be an inclusive container of information – but they do not necessarily impart much-if-any special meaning to the software that interprets them.
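
To make the hierarchical idea concrete, here is a minimal sketch of how the LDA example might be represented as nested JSON and walked in Python. The field names (`name`, `contents`, `ratio`) and the function `leaf_components` are my own assumptions for illustration; they are not taken from the Mixfile specification, which as noted has not been locked down.

```python
import json

# Illustrative structure only: field names are assumed, not the real Mixfile spec.
mixture = {
    "name": "LDA solution",
    "contents": [
        {"name": "LDA"},  # quantity left out because it is not known here
        {
            "name": "solvent",
            "contents": [  # a mixture inside a mixture
                {"name": "THF", "ratio": 1},
                {"name": "hexanes", "ratio": 7},
            ],
        },
    ],
}


def leaf_components(node, path=()):
    """Yield (path, component) for every component with no sub-components."""
    here = path + (node["name"],)
    children = node.get("contents", [])
    if not children:
        yield here, node
    for child in children:
        yield from leaf_components(child, here)


# Round-trip through JSON text to show that nothing beyond plain JSON is needed.
restored = json.loads(json.dumps(mixture, indent=2))
leaves = list(leaf_components(restored))
for path, component in leaves:
    print(" > ".join(path), "ratio:", component.get("ratio"))
```

Because each level is just a name plus optional contents, a mixture-of-mixtures needs no special handling beyond the recursion.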

More info can be found on the GitHub page




Just got this message which I thought readers might be interested in

chemfp 1.5 is now available from and from PyPI (the Python package index) through "pip install chemfp".

The software is available in source code form under the MIT license. For more information see the home page at or the documentation page at .

Chemfp is a set of command-line tools and a Python library for working with cheminformatics fingerprints. It can use OEChem/OEGraphSim, RDKit, or Open Babel to create fingerprints in the FPS format, and it implements a high-speed Tanimoto search.

As far as I can tell, chemfp 1.5 is the fastest free/open source fingerprint search system for the CPU. (Some proprietary/commercial toolkits are faster, including the commercial version of chemfp, and GPU-based search is usually faster than the CPU.)

The main changes for this release are:

  • 10% faster performance for k-nearest search
  • fixed a bug in symmetric k-nearest neighbor when multiple fingerprints have no bits set
  • improved the use of chemfp as a baseline benchmark for similarity search tools

Similarity search performance benchmark

Concerning the last point, I have assembled a data set which can be used to benchmark similarity search performance for several different search types, fingerprint types, and scoring functions. This includes pre-computed fingerprints and expected search results, as well as timing numbers for several different versions of chemfp.

My hope is that it evolves into a standard benchmark that helps evaluate search tools - bearing in mind that performance is only one of many factors that go into selecting a tool.

The benchmark files are at . Those files which fall under copyright are distributed under the MIT license.

Many thanks to ChEMBL, OpenEye, PubChem, Open Babel, RDKit, and Daniel Lemire for providing the data and resources for putting this benchmark together.

Best regards,



ACS awards for Computers in Chemistry


Nominations are now open for the Computers in Chemistry division of the ACS awards.

More details here


AI in Chemistry meeting report


RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 - Royal Society of Chemistry at Burlington House, London, UK
Post-event Report on Speaker Presentations, written by Bursary Awardees


Accessing a Jupyter Notebook hERG model from Vortex


A recent paper "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI described a classification model for hERG activity. I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub.

The models were downloaded and the Random Forest Jupyter Notebooks (using RDKit) modified to save the generated model using pickle to store the predictive model, and then another Jupyter notebook was created to access the model without the need to rebuild the model each time. This notebook was exported as a python script to allow command line access, and Vortex scripts created that allow the user to run the model within Vortex and import the results and view the most significant features.
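
The save-once, load-many pattern is the key step here. Below is a minimal sketch of it using only the standard library; it is not the code from the notebooks. A plain dictionary stands in for the trained Random Forest model so the example runs without scikit-learn or RDKit, and the file name and the helpers `save_model` and `load_model` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialise a model object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Reload a pickled model; only do this with files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for the trained classifier (assumed content, for illustration only).
stand_in_model = {"description": "placeholder for a trained classifier", "n_trees": 100}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "herg_model.pkl"  # hypothetical file name
    save_model(stand_in_model, model_path)      # done once, after training
    restored = load_model(model_path)           # done each time a prediction is needed

print(restored == stand_in_model)
```

A pickled model is tied to the library versions used to create it, so it is worth recording those alongside the file.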

All models and scripts are available for download.

Full details are here…



A quick look at CypReact


Sometimes you just want to know which enzymes are likely to be involved in the metabolism of a molecule. CypReact DOI takes a structure (SMILES or SDF input) and predicts whether the molecule will react with any one of nine of the most important human cytochrome P450 (CYP450) enzymes [CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, or CYP3A4]. Read more here.



How Do You Build and Validate 1500 Models and What Can You Learn from Them?


Greg Landrum's ICCS 2018 presentation on slideshare


Implementing AB-MPS scoring


Whilst the rule of 5 (Ro5) has provided a useful way to describe small molecule drug space, it is also clear that there are a significant number of molecular classes that exist beyond the rule of 5 boundaries (bRo5). In a review of the AbbVie compound collection DOI the authors were able to identify key findings that might explain the success (or failure) of bRo5 projects. From an analysis of a variety of calculated physicochemical properties, a simple multiparametric scoring function (AB-MPS) was devised that correlated preclinical PK results with cLogD, number of rotatable bonds (NRB), and number of aromatic rings (NAR).

AB-MPS = Abs(cLogD-3) + NAR + NRB
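
The formula translates directly into code. Here is a minimal Python sketch; it is not the Vortex script, it assumes cLogD, the aromatic ring count and the rotatable bond count have already been calculated elsewhere, and the function name `ab_mps` and the example values are made up for illustration.

```python
def ab_mps(clogd: float, n_aromatic_rings: int, n_rotatable_bonds: int) -> float:
    """AB-MPS = Abs(cLogD - 3) + NAR + NRB."""
    return abs(clogd - 3) + n_aromatic_rings + n_rotatable_bonds


# Made-up property values for a hypothetical compound.
score = ab_mps(clogd=4.5, n_aromatic_rings=3, n_rotatable_bonds=8)
print(score)  # 1.5 + 3 + 8 = 12.5
```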

Now implemented as a Vortex script.


Chemical Information and Computer Applications Group (CICAG) website


The new RSC CICAG website is now live, so why not have a look and provide suggestions and feedback?


The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community.

It also provides links to the social media feeds (Twitter, LinkedIn etc.).


Diversity Genie


Diversity Genie is a desktop software tool for analyzing and manipulating chemical data. Its capabilities include:

  • Mapping molecules and their properties with Sammon embedding.

  • Filtering and converting sets of molecules in SDF, SMILES, and InChI formats.

  • Plotting histograms, scatter plots, and ROC curves.

  • Computing well-known molecular properties and merging CSV files.

  • Creating machine learning models using powerful gradient boosting methods.

Diversity Genie 3 is completely free to use by academia and for personal non-commercial use. You can download Mac OSX, Windows and Linux builds at



mmpdb: An Open Source Matched Molecular Pair Platform for Large Multi-Property Datasets


An interesting paper on chemrxiv DOI

Matched Molecular Pair Analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property datasets. Here we present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from


MOE update 2018.01 released


The latest update to Chemical Computing Group's Molecular Operating Environment (MOE) software includes a variety of new features, enhancements

Windows XP (finally!) and macOS 10.6 have been removed from the list of officially supported platforms. Supported Windows platforms are Vista/7/8/10, and the minimum supported macOS is 10.7 (Lion).

Amber14:EHT Forcefield. The Amber14 parameter set is now supported in MOE. The new parameters consist of improvements to nucleic acids; otherwise, protein and small molecule parameters (and charges) are unchanged. The forcefield can be selected in the MOE | Footer.

TCR-MHC Protein Complex Database. A new MOE Project database containing T-Cell Receptor (TCR) – Major Histocompatibility Complex (MHC) x-ray structures has been added to MOE. The database can be accessed with MOE | Protein | Search | TCR-MHC | TCR-MHC which will launch the MOE Project Search panel.

Several applications have been parallelized to run in the moe -mpu environment:

  • Descriptor calculations with the SVL function QuaSAR_DescriptorMDB.
  • Energy minimization in the Database Viewer DBV | Compute | Molecule | Energy Minimize.
  • Conformational search using MDB input files in MOE | Compute | Conformations | Search.
  • Rotamer library generation with DBV | Compute | Build Rotamer Library.
  • Project database creation with the SVL run file dbupdate.svl and the scripts $MOE/bin/projupdate and $MOE/bin/projupdate.bat.

I plan to review the latest version of MOE in the near future.


Awesome Python Chemistry


A curated list of awesome Python frameworks, libraries, software and resources related to Chemistry.

A blog post giving more details


Vida updated


VIDA v4.4.0 has been released. This upgrade adds several new features and fixes many previous issues.

  • A new ribbon style that produces ribbons with a smoother appearance has been introduced into VIDA.
  • Improvements to the Builder/Sketcher, including:
      • closing the Sketcher window prompts for Save, Save as New, Discard, or Cancel
      • closing the Builder closes the Sketcher window
      • an additional “Save As New” option in the toolbar and Builder context menu
      • hitting Return now finishes adding typed-in molecules from the Sketcher
  • Significant improvements to the Extension Manager. In addition, extensions can be centrally deactivated.

VIDA is built on top of the OpenEye Toolkits v2017.Oct libraries to ensure that it and ancillary programs take full advantage of the state-of-the-art improvements in all underlying programming libraries. Support for macOS El Capitan (10.11), macOS Sierra (10.12), and macOS High Sierra (10.13) has been added.


Google summer of code chemistry ideas


The Open Chemistry project has collected together project ideas for GSoC 2018. The ideas cover a wide range of topics in chemistry.

The full listing is available here and includes projects that make use of a number of open source toolkits such as Open Babel, RDKit and cclib.


Molecular Materials Informatics Apps


Molecular Materials Informatics, Inc have been busy recently with updates to many of their applications.

The following mobile apps have all been updated:

PolyPharma Poly-pharmacology of molecular structures: use structure activity relationships to view predicted activities against biological targets, physical properties, and off-targets to avoid. Calculations are done using Bayesian models and other kinds of calculations that are performed on the device.

Green Lab Notebook allows recording of multistep chemical reactions, using molecular structure, name and stoichiometry as the primary components. When quantities are provided, interconversions are calculated automatically, and green chemistry metrics are shown.

SAR Table app is designed for creating tables containing a series of related structures, their activity/property data, and associated text. Structures are represented by scaffolds and substituents, which are combined together to automatically generate a construct molecule. The table editor has many convenience features and data checking cues to make the data entry process as efficient as possible.

MolPrime is a chemical structure drawing tool based on the unique sketcher from the Mobile Molecular DataSheet (MMDS).

Approved Drugs app contains over a thousand chemical structures and names of small molecule drugs approved by the US Food & Drug Administration (FDA). Structures and names can be browsed in a list, searched by name, filtered by structural features, and ranked by similarity to a user-drawn structure. The detail view allows viewing of a 3D conformation as well as tautomers. Structures can be exported in a variety of ways, e.g. email, twitter, clipboard.

Green Solvents reference card for chemical solvents, with data regarding their "greenness": safety, health and environmental effects.

For the desktop the OS X Molecular DataSheet (XMDS) is an interactive cheminformatics tool for viewing and editing molecular structures, chemical reactions and data. It is designed to be instantly intuitive to anyone who has used a Mac, a spreadsheet and any chemical structure sketcher.



MayaChem Tools


MayaChemTools is a fabulous collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules

These invaluable scripts can be used in other applications; I've written a Vortex script that uses them.


Screenlamp:- A toolkit for ligand-based virtual screening


A recent publication "Enabling the hypothesis-driven prioritization of ligand candidates in big databases: Screenlamp and its application to GPCR inhibitor discovery for invasive species control" (DOI) describes a very interesting software tool for virtual screening.

While the advantage of screening vast databases of molecules to cover greater molecular diversity is often mentioned, in reality, only a few studies have been published demonstrating inhibitor discovery by screening more than a million compounds for features that mimic a known three-dimensional (3D) ligand. Two factors contribute: the general difficulty of discovering potent inhibitors, and the lack of free, user-friendly software to incorporate project-specific knowledge and user hypotheses into 3D ligand-based screening. The Screenlamp modular toolkit presented here was developed with these needs in mind.

The Screenlamp homepage gives more details and installation instructions. Screenlamp is written in Python (3.6) and can be downloaded from GitHub

Certain submodules within screenlamp require external software to sample low-energy conformations of molecules and to generate pair-wise overlays. The tools currently used for those tasks in the pre-built, automated screening pipeline are OpenEye OMEGA and OpenEye ROCS. However, screenlamp does not strictly require OMEGA and ROCS, and you are free to use any open source alternative provided that the output files are compatible with the screenlamp tools, which use the MOL2 file format.

Screenlamp is research software and has been made available to other researchers under a permissive Apache v2 open source license.


Macs in Chemistry Annual website review


At the end of each year I have a look at the website analytics to see which items were the most popular.

Over the year there were 60,000 unique visitors with 25% visiting the site on multiple occasions. The US provided 30% of the visitors and the UK 10% with Germany, Canada and Japan around 5%. As might be expected 60% of the visitors were using a Mac, but 25% of the visitors were Windows users and 10% iOS. Looking at the last month's Mac visitors, 53% were using Mac OS X 10.13, 25% 10.12 and 12% 10.11.

Safari and Chrome (each 41%) were the most used web browsers with the once dominant Internet Explorer down at 2%.

The most viewed blog pages in 2017 were

The most popular web pages were (other than the main page)

The continued popularity of the Fortran on a Mac web page is interesting. I'm not a big Fortran user but if anyone knows of items that could be added to the page I'd be delighted to hear about them. I've done a couple of updates to the Cheminformatics on a Mac page and I think I'll need to add a section on Bioconda in the future.

Interestingly the Scientific Applications under High Sierra page was of only transient popularity. It seems this update to Mac OS X was relatively benign with very few issues.

2017 also saw the 2000th download of iBabel. iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a molecule viewer. I'm planning to do an update to iBabel to take advantage of some of the updates to OpenBabel but if you have any suggestions I'd be happy to see if I can include them.


2017 also saw the migration of the website from http to https, a change that went pretty seamlessly with only a couple of minor glitches.

The Twitter feed is increasing in popularity with 390 followers. The most popular tweets were

Creating a Bioconda recipe
RSC meeting on AI in Chemistry

The RSS feed still has around 100 followers


Creating a Bioconda recipe


A little while back I mentioned BioConda. You can read more details in this publication "Bioconda: A sustainable and comprehensive software distribution for the life sciences", DOI. Conda is a platform- and language-independent package manager that sports easy distribution, installation and version management of software.

The conda package manager has recently made installing software a vastly more streamlined process. Conda is a combination of other package managers you may have encountered, such as pip, CPAN, CRAN, Bioconductor, apt-get, and homebrew. Conda is both language- and OS-agnostic, and can be used to install C/C++, Fortran, Go, R, Python, Java etc

The bioconda channel is a Conda channel providing bioinformatics related packages for Linux and Mac OS. Looking through the packages it is clear that it already contains a number of chemistry packages. These include (updated 24 November 2017):

  • OpenBabel
  • RDKit
  • Opsin
  • chemfp
  • gromacs
  • osra
  • Autodock Vina
  • openmg
  • align-it
  • strip-it
  • shape-it
  • np-likeness-scorer
  • Smina

Bioconda offers a collection of over 3100 software tools, which are continuously maintained, updated, and extended by a growing global community of more than 330 contributors. Rather than try to duplicate this effort for a "Chemconda" it seems more efficient to encourage chemists to contribute to Bioconda. If you do package a chemistry application for Bioconda please let me know and I'll publicise it on my blog and add it to the list above. To start things rolling I've added to Bioconda and I've written a page describing how to create a bioconda recipe.

Link to page Creating a Bioconda recipe


chemfp 1.3 released


Chemfp is a set of command-line tools and a Python library for working with cheminformatics fingerprints. It can use OEChem/OEGraphSim, RDKit, or Open Babel to create fingerprints in the FPS format, and it implements a high-speed Tanimoto search.
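To make the idea of a Tanimoto search over FPS data concrete, here is a minimal pure-Python sketch. It is not chemfp's API or its optimised search code; the function names and the tiny in-memory example file are made up for illustration. It assumes the basic FPS layout of header lines starting with "#" followed by lines containing a hex-encoded fingerprint, a tab and an identifier.

```python
from io import StringIO


def read_fps(handle):
    """Yield (identifier, fingerprint-as-int) pairs from FPS-formatted text."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        hex_fp, identifier = line.split("\t")[:2]
        # Byte order does not matter for popcounts as long as it is consistent.
        yield identifier, int(hex_fp, 16)


def tanimoto(a, b):
    """Tanimoto similarity of two bit fingerprints stored as integers."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen for this sketch when both are empty
    return bin(a & b).count("1") / union


def search(query, targets, threshold=0.7):
    """Return (score, identifier) hits at or above the threshold, best first."""
    hits = [(tanimoto(query, fp), name) for name, fp in targets]
    return sorted((hit for hit in hits if hit[0] >= threshold), reverse=True)


# Hypothetical 16-bit fingerprints, purely for illustration.
example = StringIO("#FPS1\n#num_bits=16\n00ff\tmol_a\n0f0f\tmol_b\n00fe\tmol_c\n")
targets = list(read_fps(example))
hits = search(int("00ff", 16), targets, threshold=0.5)
print(hits)
```

A linear scan like this is fine for small files; the speed of chemfp comes from optimised popcount routines and search bounds that this sketch does not attempt to reproduce.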

The software is available under the MIT license. Further information and documentation are available online.

There are many changes over chemfp 1.1, which was the last release of the public/no-cost version of chemfp. The biggest ones are:

  • Tested against the current version of all of the toolkits

  • Added support for the Avalon and pattern fingerprints in RDKit

  • In-memory Tanimoto searches for 166-bit MACCS keys on computers with the POPCNT instruction are about 30% faster.

  • FPS loading is about 40% faster. As a result, file-based searches are about 25% faster.

  • The in-memory search algorithms in version 1.1 were parallelized with OpenMP, but the NxM k-nearest search was left out. That case is now also parallelized.

  • Some of the APIs from the commercial version were backported to 1.3, including the fingerprint writer API and functions for substructure fingerprint screening.

  • Added and improved docstrings

This release supports Python 2.7 but it no longer supports Python 2.5 or Python 2.6. The commercial version supports Python 2.7 and Python 3.5+, handles more than 4GB of fingerprint data, and has a binary fingerprint format for fast loading.

It is available from


SeeSAR Updated


SeeSAR has been updated to version 7. There is a review of an older version of SeeSAR here. However SeeSAR is a constantly evolving and improving piece of software.

SeeSAR is a software tool for interactive, visual compound prioritization as well as compound evolution. Structure-based design work ideally supports a multi-parameter optimization to maximize the likelihood of success, rather than affinity alone. Having the relevant parameters at hand in combination with real-time visual computer assistance in 3D is one of the strengths of SeeSAR.


This update includes

  • Full integration of ReCore functionality. Now - besides fragment-replacement, joining and merging of fragments is also possible. In addition, you can fine-tune results delivered by ReCore using pharmacophore filters.
  • Editing turns into full-blown designing. Besides atom by atom changes you may now add the most common rings with just one click. Large changes can therefore be made to molecules quickly, and this in itself necessitates another new feature: multiple poses are generated based on a superposition of the maximum common substructure.
  • Display of torsion distribution. On the one hand, we have now integrated the latest update of the database of torsion angle distributions from the CSD, while on the other, it is now possible to view the torsion angle distribution for a particular rotatable bond.
  • Miscellaneous enhancements. The SDF export covers your particular selection of favourites and any comments attached to a molecule. For a numerical filter it is now possible to define both a lower and upper bound. Last but not least, besides distances, you may now measure angles and torsions.


DataWarrior Updated


I notice that DataWarrior has had a couple of updates recently.

DataWarrior combines dynamic graphical views and interactive row filtering with chemical intelligence. Scatter plots, box plots, bar charts and pie charts not only visualize numerical or category data, but also show trends of multiple scaffolds or compound substitution patterns.

The latest updates

v04.06.01: August 2017 Fixed plugin interface bug. Various small bug-fixes and improvements.
v04.06.00: July 2017 New plugin interface to easily develop database access extensions.

DataWarrior can be downloaded here


Molecular Query Language


This looks interesting: Molecular Query Language (MolQL) is a declarative language for describing selections/substructures of molecular data. The language provides a wide range of queries and can be used as a compilation target for various selection expressions such as the ones provided by PyMol, JMOL, or VMD.

There are a number of examples that you can explore in your web browser here.


Clustering Update


I previously mentioned a comparison of various tools to cluster large datasets. I've now updated the Vortex script to allow the user to select the centroid of each cluster. I tried it on a clustered dataset of 4.3 million structures and the script only took a few seconds to run.

The page on clustering is here and the Vortex script can be downloaded here
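For readers who want to experiment outside Vortex, one common way to pick a centroid for each cluster can be sketched in plain Python. This is not the Vortex script itself; it assumes fingerprints are held as sets of on-bits, that cluster identifiers have already been assigned, and that the centroid is defined as the member with the highest summed Tanimoto similarity to the other members of its cluster. All names here are hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_centroids(fingerprints, cluster_ids):
    """Return {cluster_id: index of the member most similar to its cluster}."""
    members = defaultdict(list)
    for index, cluster in enumerate(cluster_ids):
        members[cluster].append(index)

    centroids = {}
    for cluster, indices in members.items():
        # Score each member by its summed similarity to every cluster member.
        centroids[cluster] = max(
            indices,
            key=lambda i: sum(
                tanimoto(fingerprints[i], fingerprints[j]) for j in indices
            ),
        )
    return centroids


# Toy data: five molecules in two clusters.
fingerprints = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}, {10, 11}, {10, 11, 12}]
cluster_ids = [0, 0, 0, 1, 1]
centroids = pick_centroids(fingerprints, cluster_ids)
print(centroids)
```

The cost is quadratic in the size of each cluster, so very large clusters would need sampling or a cheaper definition of the centroid.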


Launch of Flare


Cresset just announced the launch of Flare, a new software tool to aid the understanding of protein-ligand interactions.

Key new technology available in Flare 1.0:

  • Visualize the electrostatics of the protein active site using protein interaction potentials
  • Calculate the positions and stability of water in apo and liganded proteins using 3D-RISM
  • Understand the energetics of ligand binding using the WaterSwap technique.


Flare uses the XED force field to calculate a detailed map of the electrostatic character of the protein active site. The interaction potentials provide you with vital knowledge of the fundamental processes that underlie ligand-protein binding, helping you to perfect the design of new molecules. The position and energetics of water molecules in and around the active site is of crucial importance in understanding ligand binding. Knowledge of which water molecules are tightly bound and which are energetically unfavorable can give valuable insights into structure-activity relationships and help you to decide where to place ligand atoms. Cresset’s 3D-RISM analysis utilizes the advanced inter-molecular descriptions of the XED force field to give you a water analysis you can trust.

Flare is available for Mac OSX, Linux and Windows and free evaluation is available


StarDrop 6.4


StarDrop 6.4 now links prepared 3D docking and alignment models with data visualisation, 2D SAR analyses and predictive models in a single interface.

Computational chemists can make their validated 3D models available to their colleagues via StarDrop’s Pose Generation Interface, which is compatible with software from major computational chemistry providers, including:

  • FlexX™ – BioSolveIT
  • Gold™ – Cambridge Crystallographic Data Centre
  • MOE™ – Chemical Computing Group
  • AutoDock Vina – The Scripps Research Institute
  • POSIT™ – OpenEye Scientific
  • …extendable to other third party applications.

The Pose Generation Interface communicates with a Pose Generation Server, on which computational chemists can easily publish their validated docking or 3D alignment models. These are made instantly available for StarDrop users to submit their compounds and the resulting poses, protein structures and scores are returned directly to StarDrop for visualisation and analysis.

The Pose Generation Server can be installed wherever you run your 3D modelling software, supporting Linux, Windows® and Mac®

There are more details in the poster presented at the Spring ACS 2017.


Schrödinger Updated


The Schrödinger small molecule discovery suite has been updated. This looks to be a substantial update and is described in the video below.

Supported MacOS X 10.12 and Mac OS X 10.9 - 10.11

3D Support, Supported: Interlaced stereo via Zalman 3D Monitors


A Functional Group Count Script


I recently wrote a review of Reaction Workflows, a web-based tool that allows users to build workflows from nodes that provide inputs and outputs or perform actions, including ones to perform reaction-, scaffold-, and transform-based enumeration, and it is all done within a web browser interface using drag and drop. Whilst you can draw input structures, one of the real strengths is the ability to import pre-categorised reagent files, e.g. acid chlorides or secondary amines. This script is intended to help with this within Vortex.

This script is a variation of the high performance sub-structure search scripts described previously, however instead of simply flagging the presence (or absence) of a SMARTS query we provide a count of the number of times a SMARTS query is identified within a molecule. The script uses all available cores and is thus capable of running multiple queries in parallel and can thus handle very large datasets. The script currently contains around 70 different SMARTS queries for both functional groups and atom counts and I'd be happy to add any suggestions.

Read more….


RDKit and Python3


Greg Landrum posted the following to the RDKit users and since a couple of the Jupyter Notebooks I've published make extensive use of RDKit I thought I'd flag it.

As many of you are no doubt aware, the Python community plans to discontinue support for Python 2 in 2020. A growing number of projects in the Scientific Python stack are making the same transition and have made that explicit here:

I will be adding the RDKit to this list. The RDKit will switch to support only Python 3 by 2020. At some point between now and then - likely during the 2018.09 release cycle - we will create a maintenance branch for Python 2 that will continue to get bug fixes but will no longer have new Python features added. This branch will be maintained, and we will keep doing Python 2 builds, until 2020 when official Python 2 support ends.

Additionally, starting during the 2018.03 release cycle we will accept contributions for new features that are not compatible with Python 2 as long as those features are implemented in such a way that they don't break existing Python 2 code (more on this later). This will allow members of the RDKit community who have made the switch to Python 3 to start making use of the new features of the language in their RDKit contributions.

If you have not made the switch yet to Python 3: please read the web page I link to above and take a look at the list of projects that have committed to transition. The switch from Python 2 to Python 3 isn't always easy, but it's not getting any easier with time and you have a few years to complete it. There are a lot of online resources available to help.

Best Regards, -greg

The list of projects that will be making the transition so far includes: IPython, Jupyter notebook, pandas, Matplotlib, SymPy, Astropy, Software Carpentry, SunPy, xonsh, scikit-bio, PyStan, Axelrod, osBrain, PyMeasure, rpy2, PyMC3, FEniCS, An Introduction to Applied Bioinformatics, music21, QIIME, Altair, gala, cual-id, CIS


ROCS v3.2.2 released


ROCS is a shape-based superposition method. Molecules are aligned by a solid-body optimization process that maximizes the overlap volume between them. Volume overlap in this context is not the hard-sphere overlap volume, but rather a Gaussian-based overlap parameterized to reproduce hard-sphere volumes. ROCS uses only the heavy atoms of a ligand; hydrogens are ignored.
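The idea of a Gaussian-based overlap can be illustrated with a short sketch. This is not OpenEye's implementation: it uses a first-order sum of pairwise atom overlaps, performs no alignment optimization, and assumes an amplitude of 2.7 and a single radius of 1.7 Å for every heavy atom. Each atom is modelled as a Gaussian whose width is chosen so that its integrated volume equals the corresponding hard-sphere volume.

```python
import math

P = 2.7       # Gaussian amplitude (assumed value for this sketch)
RADIUS = 1.7  # one radius in angstroms for every heavy atom (assumption)

# Width chosen so that P * (pi / ALPHA) ** 1.5 equals (4/3) * pi * RADIUS ** 3.
ALPHA = math.pi * (3 * P / (4 * math.pi)) ** (2 / 3) / RADIUS ** 2


def pair_overlap(a, b):
    """Overlap integral of two identical-width atomic Gaussians."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return P * P * math.exp(-ALPHA * d2 / 2) * (math.pi / (2 * ALPHA)) ** 1.5


def overlap_volume(mol_a, mol_b):
    """First-order overlap: sum over all atom pairs between the molecules."""
    return sum(pair_overlap(a, b) for a in mol_a for b in mol_b)


def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto for the current (fixed) relative orientation."""
    v_ab = overlap_volume(mol_a, mol_b)
    return v_ab / (overlap_volume(mol_a, mol_a) + overlap_volume(mol_b, mol_b) - v_ab)


# Heavy-atom coordinates only; hydrogens are left out, as in the description.
chain = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 2.0, y, z) for x, y, z in chain]

print(shape_tanimoto(chain, chain))    # identical placement
print(shape_tanimoto(chain, shifted))  # same shape, displaced by 2 angstroms
```

A superposition method would then rotate and translate one molecule to maximize this overlap. The first-order sum also over-counts volume where atoms within one molecule overlap each other, which a fuller treatment corrects with higher-order terms.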

ROCS is built on top of the OpenEye Toolkits v2017.Feb libraries to ensure that ROCS and the ancillary programs are taking advantage of state-of-the-art improvements in the underlying programming libraries. This version of ROCS fixes a bug that prevented molecule streaming using pipes and named pipes on Linux and OS X systems. ROCS now accepts molecule streams or named pipes as database files.

Support for Mac OS X 10.10, 10.11, and MacOS X Sierra 10.12 has been added. Mac OS X 10.8 and 10.9 are no longer supported.


Options for Clustering large datasets of Molecules


Clustering is an invaluable cheminformatics technique for subdividing a typically large compound collection into small groups of similar compounds. One of the advantages is that once clustered you can store the cluster identifiers and then refer to them later; this is particularly valuable when dealing with very large datasets. Clustering is often used in the analysis of high-throughput screening results, or the analysis of virtual screening or docking studies.

On this page I've explored multiple options for clustering, from Open Source toolkits to sophisticated desktop applications.

Read More….


ToMoCoMD framework


ToMoCoMD-CARDD is an interactive and user-friendly free multi-platform framework designed to calculate 2/3-D numerical descriptors (indices) for molecular structures, with the objective of characterizing or discriminating among them. It can be downloaded here

Reference DOI.


RSC Chemical Information and Computer Applications Group survey on Social Media Channels


The Royal Society of Chemistry Chemical Information and Computer Applications Group (CICAG) are conducting a survey to find out more about the way that scientists use the various social media channels.

The survey is very short and feedback would be appreciated from everyone, you don't have to be a member of the RSC (or CICAG) to contribute.

The survey can be found here

The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups.


Dotmatics Reaction Workflows


Workflow tools have become increasingly popular; Pipeline Pilot, KNIME and Taverna are perhaps the best known. Most are desktop client based but some have a web page that allows users to run protocols that expert users have created.

Dotmatics Reaction Workflows (RW) is a web-based tool that allows users to build workflows from nodes that provide inputs and outputs or perform actions, including ones to perform reaction-, scaffold-, and transform-based enumeration, and it is all done within a web browser interface using drag and drop. I've been looking at Reaction Workflows for enumerating a potential library array.

Read more here….



A webinar demonstrating using Jupyter, the free iPython notebook


This is a recording of the March 2017 Global Health Compound Design meeting. A webinar demonstrating using Jupyter, the free iPython notebook.

How to get started

Accessing Open Source Malaria data

Calculating physicochemical properties and plotting

Predicting AMES activity.


Chirys View


Chirys View is a simple molecular spreadsheet for Mac OS X. It has been designed as a fast viewer for collections of molecules represented as an SDF file (Structured Data Format). On import, molecular weight, exact mass, molecular formula, and hydrogen bond acceptor and donor counts are automatically calculated. You can combine multiple SDF files by multiple file imports or by copying and pasting from one document into another. You can then save selected compounds as a new SDF file.
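Combining SDF files is straightforward outside a GUI too, because each record in an SDF file ends with a line containing only "$$$$". The following is a rough sketch of that idea and not a description of how Chirys View works internally; the function names and the toy records are invented for the example.

```python
def split_sdf(text):
    """Split SDF text into records, each ending with its '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records  # trailing text without a terminator is ignored


def combine_sdf(*texts):
    """Concatenate the records of several SDF texts into one SDF text."""
    return "".join(record for text in texts for record in split_sdf(text))


# Toy stand-ins for real records (a real record also holds a molfile block).
file_one = "mol_a\n> <ID>\n1\n\n$$$$\nmol_b\n> <ID>\n2\n\n$$$$\n"
file_two = "mol_c\n> <ID>\n3\n\n$$$$\n"

combined = combine_sdf(file_one, file_two)
print(len(split_sdf(combined)))
```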

I imported 1 million structures from ChEMBL and whilst it took a few minutes to load and used 27GB RAM it did so without complaints. Scrolling down a list of a million compounds is a little impractical but list sorting is pretty responsive. I had a look at some of the more complex structures and the molecular layout seems excellent and clearly legible.


As a simple molecular selection tool Chirys View works very well. My only complaint is that when you import 3D structures (e.g. from a docking run) the structures can be difficult to discern (see below), it would be nice to have a convert to 2D option.



A Review of MOEsaic


I've just added a review of MOEsaic, a web service application that is part of the MOE install from Chemical Computing Group.

MOEsaic is a browser-based application for analyzing series of small molecule chemical structures and related property data (e.g. from medicinal chemistry projects). Once structure-property data is uploaded to the server, MOEsaic allows users to perform structure based searching and data analysis.

There is a complete listing of reviews here.


OpenEye Toolkits v2017.Feb released


OpenEye have announced the release of OpenEye Toolkits v2017.Feb. These libraries include the usual support for C++, Python, C#, and Java.


FastROCS TK now allows customization of starting points for shape overlap optimization.
Quacpac TK now includes a flexible molecular charging engine.
OEMedchem TK now allows MCS similarity scores to be computed for a query molecule compared to a set of indexed target structures.


MOE updated


Chemical Computing Group have announced an update to MOE. The MOE 2016.0802 update contains a number of updates to the biomolecule modelling including improved hydrogen bond detection, and the addition of a number of unnatural amino acids.

There have also been improvements to MOE/web. The MOE/web version compatibility check has been broadened. MOE/web license waiting has been improved. HTTPS authentication proxy server support has been improved.


InChI Updated


Download InChI version 1 (software version 1.05) for Standard and Non-Standard InChI/InChIKey (27 January 2017)

This package contains InChI Software version 1.05 (January 2017) final release.

In this version:

  • support for chemical element numbers 113-118 was newly added;
  • experimental support of InChI/InChIKey for simple regular single-strand polymers was implemented;
  • experimental support of large molecules containing up to 32767 atoms was added;
  • the ability to read input files in Molfile V3000 format, which is necessary for large molecules, was added;
  • provisional support for extended features of Molfile V3000 was added;
  • InChI API Library was significantly updated; in particular, a novel API procedure for direct conversion of Molfile input to InChI has been added; a whole new set of API procedures for both low and high-level operations (InChI extensible interface, IXA) has been added;
  • the source code was significantly modified in order to ensure multi-thread execution safety of the InChI Library;
  • several minor bugfixes/changes were made and several convenience options were added to the inchi-1 executable.




Chembench is a web-based tool for QSAR (Quantitative Structure-Activity Relationship) modeling and prediction. Chembench doesn't require any programming or scripting knowledge to use. It's an interface that lets you skip past the hassles of file management and translating between programs, so you can focus on the science of making and applying predictive models. DOI.

It includes models/datasets for things like brain penetration, PGP, AMES, skin penetration etc. You can use the existing models or build your own and then evaluate novel compounds.


SeeSAR Updated


A new version of SeeSAR is now available for download.

Version 5.5 includes several new features and has undergone some tweaks under the hood to improve speed.


From the release notes:-

2D browsing featuring in-view molecule properties
To further enhance the 2D browsing, we have added an illustration of the molecules' key properties in the form of a radar plot. A thumbnail of the plot is embedded in each of the 2D molecule pictures, providing a quick overview. It enlarges upon mouse-over and provides access to the configuration dialog. Add or remove property-axes, optionally fine-tune the scales and set 'desired' value ranges. A hit or miss of the latter is indicated by green or red dots on the corners of the color-coded characteristic shape of the molecule on the plot (the greener the better).
Detecting novel/unoccupied binding sites
Now SeeSAR can search your protein for unoccupied pockets based on the world-renowned DoGSite-Algorithm. You may then select these to become the binding site, within which to generate poses and calculate binding affinities for your molecules. The new binding site definition feature lets you either use a selected molecule from the table (based on a 6.5Å shell around it, as before) or will detect and visualize empty pockets for you to select instead.
Multiple reference molecules
The reference molecule in SeeSAR always stays in view even when you select other entries from the molecule tables. Now, however, you are able to set - and keep in view - as many reference molecules as you like. Either set them individually - in the selected molecule menu (as before) - or mark several as favorites and set them all as references at once, via the new menu button below the table.
Multiple core replacements with just one click
With the new multiple solutions button for ReCore in the molecule editor, brainstorming new scaffold ideas became yet easier. You can now generate 10 new alternative core replacements at once. The new molecules are saved directly to the table so that you can immediately see their estimated binding affinity and view all structures in 2D at a glance.
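The release notes mention defining a binding site as a 6.5Å shell around a selected molecule. A distance-based selection of that kind can be sketched in a few lines of Python. This is only an illustration of the general idea, not SeeSAR's algorithm (which may, for instance, work on whole residues); the atom labels and coordinates below are invented.

```python
import math


def atoms_within_shell(protein_atoms, ligand_atoms, cutoff=6.5):
    """Return labels of protein atoms within `cutoff` angstroms of the ligand."""
    selected = []
    for label, coords in protein_atoms:
        # Keep the atom if it is close to at least one ligand atom.
        if any(math.dist(coords, lig) <= cutoff for lig in ligand_atoms):
            selected.append(label)
    return selected


# Hypothetical coordinates in angstroms.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    ("ALA1:CA", (3.0, 0.0, 0.0)),
    ("GLY2:CA", (8.1, 0.0, 0.0)),
    ("SER3:CA", (0.0, 6.0, 0.0)),
    ("LEU4:CA", (-7.0, 0.0, 0.0)),
]

site = atoms_within_shell(protein, ligand)
print(site)
```

math.dist requires Python 3.8 or later. For a full protein a spatial index would be faster than this all-pairs loop.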


OSRA 2.1.0 released


Just got this email

I am glad to announce the release of OSRA 2.1.0. OSRA (Optical Structure Recognition Application) is a tool for converting images of molecules into SDF, SMILES and many other chemical formats. Images can be pictures of single molecules or complete PDF documents with multiple pages of text and graphics. In addition to molecules OSRA can also recognize reactions, and, starting with this version, simple polymers.

The improvements in this version: - Significantly improved recognition of PDF documents, no longer dependent on Ghostscript at runtime. - Recognition of polymers (different approach from POSRA - a separate tool focused on polymer recognition).

The new version is available at

Please note that if you are building from source the dependencies have changed. OSRA now requires poppler (version 0.41) to process PDF files and a custom-patched version of OpenBabel to save polymer MOL and SDF files. The patched version of OpenBabel is provided at the above url. OSRA no longer requires Ghostscript to be installed.


MayaChemTools: An Open Source Package for Computational Drug Discovery


Just noticed this paper.

MayaChemTools: An Open Source Package for Computational Drug Discovery, DOI: 10.1021/acs.jcim.6b00505.

MayaChemTools is a growing collection of Perl scripts, modules, and classes to support a variety of computational drug discovery needs, such as manipulation and analysis of data, generation of two-dimensional (2D) fingerprints, similarity searching, and calculation of physicochemical properties.

MayaChemTools is freely available online, under the terms of the GNU LGPL, as published by the Free Software Foundation.

It is possible to access them using a Vortex script.


Scientific Applications under Sierra (Update 14)


Whilst there are many sites that track the compatibility on common desktop applications, it is often difficult to find out information about scientific applications. I’ll update the list regularly and feel free to send in information.

When I compiled similar lists for Yosemite and El Capitan they proved very popular, with 13,000 page views. I hope this page is similarly useful.

4Peaks no reported issues

Avogadro all seems OK

BBEdit version 11.6.2 and newer are compatible, recommend against using earlier versions

Brainsight requires version 2.3.3 for full compatibility with 10.12 Sierra. You could note too that 2.0 through 2.2.x will never work because 10.12 removed support for garbage collected applications. 2.3.x uses ARC

ChemDraw the official line is that it is not supported, even under El Capitan there were reports of copy/paste issues. One user reports “ChemDraw 15 is working fine for me. copy/paste, everything without issues”.

ChemDoodle no reported issues

CrystalMaker “We are pleased to confirm that all our latest software runs fine on macOS “Sierra”, as well as OS X 10.11 “El Capitan”, 10.10 “Yosemite”, and earlier.”

CYLview app launcher (icon on the desktop or the dock) does not work; it needs to be started from “Terminal”

DataDesk Data Desk 8 for Mac runs on OS X 10.7 up to 10.12

DataWarrior requires Java installation

DEVONagent has been updated to version 3.9.5 to support Sierra, in addition this update brings support for Qwant. DEVONthink has also been updated for Sierra.

EndNote From Endnote (Thomson Reuters) version 7: Message for Mac user planning to update to Sierra: In preparation for Apple's release of macOS Sierra on September 20, we have been testing various versions of EndNote. Through our testing, we discovered some issues with the EndNote PDF viewer. These issues have been reported to Apple, but in the meantime, we recommend that you DO NOT upgrade to macOS Sierra.

EnzymeX no reported issues

Evernote there is a bug in some versions of Evernote for Mac that can cause images and other attachments to be deleted from a note under specific conditions. Evernote have released an updated version of Evernote for Mac, version 6.9.1, to resolve this.

Findings no reported issues

Fortran users will be happy to hear there are no reported issues with FTranProjectBuilder

GAMESS no reported issues.

Homebrew, after every update it is worth checking your homebrew installation.

Username$ brew doctor
Please note that these warnings are just used to help the Homebrew maintainers
with debugging if you file an issue. If everything you use Homebrew for is
working fine: please don't worry and just ignore them. Thanks!

Warning: /usr/local is not writable.
Even if this directory was writable when you installed Homebrew, other
software may change permissions on this directory. For example, upgrading
to OS X El Capitan has been known to do this. Some versions of the
"InstantOn" component of Airfoil or running Cocktail cleanup/optimizations
are known to do this as well.

You should probably change the ownership and permissions of /usr/local
back to your user account.
sudo chown -R $(whoami) /usr/local

Once corrected you can then type

brew update
brew upgrade

You may get this error

$brew update
/usr/local/Library/ line 32: /usr/local/Library/ENV/scm/git: No such file or directory

Simply retyping brew update seems to resolve the issue.

If you have previously installed Openbabel using

brew install mcs07/cheminformatics/open-babel --HEAD

The "--HEAD" part means install the latest development version from GitHub. The latest release of OpenBabel is now available, so you can replace the development version by typing

brew uninstall mcs07/cheminformatics/open-babel
Uninstalling /usr/local/Cellar/open-babel/HEAD... (309 files, 14.6M)
brew install mcs07/cheminformatics/open-babel

You can check you have the latest version installed by typing this in a Terminal window

obabel -V
Open Babel 2.4.0 -- Sep 24 2016 -- 14:01:18

iBabel seems to work fine with the latest version of OpenBabel under Sierra. One advantage to updating to OpenBabel 2.4.0 is that previews now work with Quicklook.


IDL does run under Sierra but does require some tuning. Detailed instructions here

iPython Notebook all working fine

Lego Mindstorms At the moment everything seems to be running really great on Sierra. However, please let your readers know they are welcome to contact us via the website, the way you did, if they run into any errors. We'd be happy to solve them!

Manuscripts no reported issues

MarvinSketch I had to reinstall Java for Mac OSX; this is the last version of Java Apple created to support legacy applications. The same applies to MarvinSpace and MarvinView.

Matlab in general appears to be fine except for certain language localisations; for these languages a patch is available for download. A patch is also available for MATLAB Runtime R2016b.

Mathematica 11.0.1 has been compatibility tested with macOS Sierra and you should not run into any OS-specific compatibility issues. The font-panel is disabled, but we are actively working to address this as soon as possible.

Mendeley no issues reported

MOE working fine, XQuartz did not need reinstalling. However the MOE app launcher (icon on the desktop or the dock) does not work because Apple changed some fundamental system components which affects lots of programs not specifically compiled for the newest MacOSXs. Also you cannot double click on a file to open it in MOE. You can still start MOE from the command line


It then works perfectly. Update: I just had an email from CCG support saying that the problem with the MOE app launcher on MacOSX Sierra has been fixed in the MOE 2016 release.

MOPAC all seems to be working fine.

osra crashed with error abort trap: 6. I uninstalled using brew then reinstalled

brew uninstall osra
Uninstalling /usr/local/Cellar/osra/2.0.1... (7 files, 1.6M)
brew install osra

Then worked fine

Pandoc depends on llvm-3.5, not supported on Sierra. Llvm-3.9 is supported, installation using Homebrew seems to be OK.

Papers Mac 3.4.7 (527) is now available! It fixes a couple of problems under Sierra: a crash that can occur when switching PDFs has been fixed, and the search in PDF functionality is restored.

R latest version (3.3.1) all seems fine

rdkit installed using Homebrew works fine.

Readcube Version 2.22.13732 is a macOS Sierra (v10.12) compatibility update.

Scansnap Note for using ScanSnap or ScanSnap applications on macOS Sierra: in order to avoid the ScanSnap compatibility problems, please do not use ScanSnap or ScanSnap applications on macOS Sierra in the following manner, as doing so may cause some pages to be deleted or to become blank.

  • Do not use [ScanSnap Organizer], [ScanSnap Merge Pages], or [CardMinder]
  • Do not use Excellent mode when scanning A3 (11.7 in. x 16.5 in.) documents

No image data will be lost nor any blank pages produced when content that has been scanned in the A4 (8.3 in. x 11.7 in.), Letter (8.5 in. x 11 in.), Legal (8.5 in. x 14 in.), or smaller sizes is saved.

Schrodinger a reader sent in this response: “We received your query regarding MacOS Sierra. Unfortunately our current 2016-3 release does not yet support MacOS Sierra, but we have plans to include support for this OS in the upcoming 2016-4 release of our software.”

SeeSAR version 5.3 now, 5.4 will come out shortly. No compatibility issues observed/reported.

Studies no reported issues

UCSF Chimera version 1.11.1 seems to be working fine

Wizard worked great with the developer pre-releases, no reported issues

Vortex no problems so far, the embedded chemical drawing app Elemental appears to have no issues.

XQuartz did not require reinstallation :-) however there are reports of an intermittent display not found error when launching apps from a Linux box.

Allow applications downloaded from anywhere in macOS Sierra: if you open the security panel in the Settings, the default options in Sierra no longer include the option to open applications from Anywhere.


Apple have removed this function on macOS Sierra, but you can re-enable it by running this in Terminal

sudo spctl --master-disable


You can restore it back to the default setting using

sudo spctl --master-enable

I’ll add more updates later.


David Weininger


I just heard that David Weininger died last Wednesday. For me, his invention of SMILES was one of those ideas that you instantly knew was going to change the way we did science. So much of what we do in storing, searching and analysing chemical information is based on his pioneering work. I only met him once, at a Daylight UGM, but it was clear from our first conversation that he was a scientist with a special insight.

SMILES as a simple yet comprehensive chemical language in which molecules and reactions can be specified using ASCII characters representing atom and bond symbols

Anthony Nicholls of OpenEye has written a lovely tribute that is well worth reading


Cheminformatics for Drug Design: Data, Models and Tools, Meeting Report


This was a joint meeting organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group, held at Imperial War Museum, Duxford, UK, on Wednesday 12 October 2016. This was an excellent meeting and the conference centre at Duxford was superb; many participants arrived early to have a wander around the historic collection of aeroplanes.

Read full report.


ICM version 3.8-5


MolSoft have announced the release of ICM version 3.8-5.

  • Generate a 2D Interaction Diagram of a ligand with the binding pocket. The image is annotated with hydrogen bonds and interacting residues.
  • 3D ligand editor is a powerful tool for the interactive design of new lead compounds in 3D
  • ICMJS is a JavaScript/HTML5 viewer for 3D Molecular Graphics which does not require any plugin or installation.
  • Support for the Macromolecular Transmission Format (MMTF)
  • Support for Mac retina display
  • Add docking restraints by selecting atoms in the receptor
  • Updates to protein modelling, bioinformatics and cheminformatics

Full release notes are here


MOE 2016.08 released


Chemical Computing Group have just announced an update to MOE. This release has fixed a couple of Mac OSX 10.12 (Sierra) issues but also brings a host of new features.

  • MOEsaic: Web-Application for Ligand Analytics
  • Spectral Analysis for Structure Determination
  • Enhanced Protein Patch Analyzer
  • Integrated Antibody Project Database and Antibody Homology Modeler
  • Small Footprint MOE to Facilitate Large Scale Deployments
  • Physical and Virtual Rendering of Structures

A more detailed description of the new and enhanced features in MOE 2016.08 can be found at


Over 1000 downloads of iBabel


I just noticed that the latest version of iBabel has been downloaded over 1000 times, this is fantastic news and it certainly allows me to justify the effort put into creating the application.


iBabel started out as an AppleScript Studio application designed as a front-end to OpenBabel (DOI). It was updated several times and is now an AppleScriptObjC application built with Xcode. As well as acting as a front-end to OpenBabel it also provides a front-end to tools built on OpenBabel and a molecule viewer using a selection of JavaScript viewers via an embedded web view.

I’m occasionally asked about the best way to install OpenBabel and I usually refer people to the page I wrote on installing cheminformatics tools on a Mac; this gives instructions on how to install a wide variety of cheminformatics toolkits and applications.

If you only want to install OpenBabel then the best way is to use Homebrew.

Homebrew is a package manager for Mac OSX that installs packages in its own directory then symlinks the files to /usr/local. To install Homebrew you first need to have access to the command line tools for Xcode; the easiest way to do this is to download Xcode from the Mac App Store

  1. Start Xcode on the Mac.
  2. Choose Preferences from the Xcode menu.
  3. In the General panel, click Downloads.
  4. On the Downloads window, choose the Components tab.
  5. Click the Install button next to Command Line Tools. You are asked for your Apple Developer login during the install process.

Alternatively, you can download the Xcode command line tools directly from the developer portal as a .dmg file. On the "Downloads for Apple Developers" list, select the Command Line Tools entry that you want.

To install Homebrew type this command in the Terminal

ruby -e "$(curl -fsSL"

Then type

brew doctor

The 'brew doctor' command checks everything is fine, e.g. it will warn if the developer tools are missing, or if there are unexpected items in /usr/local/bin and /usr/local/lib that may clash and might need to be deleted.

It is a good idea to first update the package list

brew update

To install a range of cheminformatics packages we can use a custom “tap” created by Matt

brew tap mcs07/cheminformatics

Then to specifically install Openbabel use

brew install mcs07/cheminformatics/open-babel

To check OpenBabel is working type this in a Terminal window:

obabel -:'C1=CC=CC=C1F' -ocan 
1 molecule converted
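
The same check can be run from a Python script. This is only a minimal sketch: the function name obabel_canonical_smiles is made up for illustration, and it assumes the obabel binary installed by Homebrew is on your PATH. It returns None rather than failing if obabel cannot be found.

```python
import shutil
import subprocess


def obabel_canonical_smiles(smiles, exe="obabel"):
    """Return Open Babel's canonical SMILES for `smiles`.

    Hypothetical helper: returns None if `exe` is not found on the PATH.
    """
    if shutil.which(exe) is None:
        return None
    # Same command as typed in the Terminal: obabel -:'SMILES' -ocan
    result = subprocess.run(
        [exe, "-:" + smiles, "-ocan"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Take the first whitespace-separated token of standard output, if any.
    tokens = result.stdout.split()
    return tokens[0] if tokens else None


if __name__ == "__main__":
    print(obabel_canonical_smiles("C1=CC=CC=C1F"))
```

If this prints None then obabel is not on the PATH seen by Python, which is worth checking before blaming the installation.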


Scripting Vortex 34, analysis of categorical information


I often need to tag individual molecules within a dataset with a specific property, perhaps the results of clustering algorithms, the results of PAINS filtering, or liver toxicity filters. Alternatively, if you have a drug discovery project with multiple chemotypes you might want to tag particular groups of compounds as belonging to a named series to aid analysis.

A question that might then arise is “How many molecules belong to each category?”. Whilst you can see the numbers in the sidebar there is not an easy way to export the results.

Hopefully this script can help.
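
For anyone working outside Vortex, the underlying idea is just a tally of the values in one column. The sketch below is plain Python and is not the Vortex script; the column name "Series", the example rows and the function names are all made up for illustration.

```python
import csv
import io
from collections import Counter


def count_categories(rows, column):
    """Count how many rows carry each value of `column`.

    Rows with a missing or blank value are grouped as 'unassigned'.
    """
    return Counter(
        (row.get(column) or "").strip() or "unassigned" for row in rows
    )


def counts_to_csv(counts, column):
    """Render the counts as CSV text, most populated category first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column, "count"])
    for value, n in counts.most_common():
        writer.writerow([value, n])
    return buffer.getvalue()


# Hypothetical dataset: each molecule tagged with a named series.
example = [
    {"ID": "mol1", "Series": "A"},
    {"ID": "mol2", "Series": "B"},
    {"ID": "mol3", "Series": "A"},
    {"ID": "mol4", "Series": ""},
    {"ID": "mol5", "Series": "B"},
    {"ID": "mol6", "Series": "A"},
]

print(counts_to_csv(count_categories(example, "Series"), "Series"))
```

The CSV text can be written to a file or pasted into a spreadsheet, which gets around the lack of an export option for the sidebar counts.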



OpenBabel 2.4.0 released


A major new update to OpenBabel has been released. Version 2.4.0 is a significant change and is highly recommended.

New file formats

  • DALTON output files (read only) and DALTON input files (read/write) (Casper Steinmann)
  • JSON format used by ChemDoodle (read/write) (Matt Swain)
  • JSON format used by PubChem (read/write) (Matt Swain)
  • LPMD's atomic configuration file (read/write) (Joaquin Peralta)
  • The format used by the CONTFF and POSFF files in MDFF (read/write) (Kirill Okhotnikov)
  • ORCA output files (read only) and ORCA input files (write only) (Dagmar Lenk)
  • ORCA-AICCM's extended XYZ format (read/write) (Dagmar Lenk)
  • Painter format for custom 2D depictions (write only) (Noel O'Boyle)
  • Siesta output files (read only) (Patrick Avery)
  • Smiley parser for parsing SMILES according to the OpenSMILES specification (read only) (Tim Vandermeersch)
  • STL 3D-printing format (write only) (Matt Harvey)
  • Turbomole AOFORCE output (read only) (Mathias Laurin)
  • A representation of the VDW surface as a point cloud (write only) (Matt Harvey)

New file format capabilities and options

  • AutoDock PDBQT: Options to preserve hydrogens and/or atom names (Matt Harvey)
  • CAR: Improved space group support in .car files (kartlee)
  • CDXML: Read/write isotopes (Roger Sayle)
  • CIF: Extract charges (Kirill Okhotnikov)
  • CIF: Improved support for space-groups and symmetries (Alexandr Fonari)
  • DL_Poly: Cell information is now read (Kirill Okhotnikov)
  • Gaussian FCHK: Parse alpha and beta orbitals (Geoff Hutchison)
  • Gaussian out: Extract true enthalpy of formation, quadrupole, polarizability tensor, electrostatic potential fitting points and potential values, and more (David van der Spoel)
  • MDL Mol: Read in atom class information by default and optionally write it out (Roger Sayle)
  • MDL Mol: Support added for ZBO, ZCH and HYD extensions (Matt Swain)
  • MDL Mol: Implement the MDL valence model on reading (Roger Sayle)
  • MDL SDF: Option to write out an ASCII depiction as a property (Noel O'Boyle)
  • mmCIF: Improved mmCIF reading (Patrick Fuller)
  • mmCIF: Support for atom occupancy and atom_type (Kirill Okhotnikov)
  • Mol2: Option to read UCSF Dock scores (Maciej Wójcikowski)
  • MOPAC: Read z-matrix data and parse (and prefer) ESP charges (Geoff Hutchison)
  • NWChem: Support sequential calculations by optionally overwriting earlier ones (Dmitriy Fomichev)
  • NWChem: Extract info on MEP(IRC), NEB and quadrupole moments (Dmitriy Fomichev)
  • PDB: Read/write PDB insertion codes (Steffen Möller)
  • PNG: Options to crop the margin, and control the background and bond colors (Fredrik Wallner)
  • PQR: Use a stored atom radius (if present) in preference to the generic element radius (Zhixiong Zhao)
  • PWSCF: Extend parsing of lattice vectors (David Lonie)
  • PWSCF: Support newer versions, and the 'alat' term (Patrick Avery)
  • SVG: Option to avoid addition of hydrogens to fill valence (Lee-Ping)
  • SVG: Option to draw as ball-and-stick (Jean-Noël Avila)
  • VASP: Vibration intensities are calculated (Christian Neiss, Mathias Laurin)
  • VASP: Custom atom element sorting on writing (Kirill Okhotnikov)

Other new features and improvements

  • 2D layout: Improved the choice of which bonds to designate as hash/wedge bonds around a stereo center (Craig James)
  • 3D builder: Use bond length corrections based on bond order from Pyykko and Atsumi (Geoff Hutchison)
  • 3D generation: "--gen3d", allow user to specify the desired speed/quality (Geoff Hutchison)
  • Aromaticity: Improved detection (Geoff Hutchison)
  • Canonicalisation: Changed behaviour for multi-molecule SMILES. Now each molecule is canonicalized individually and then sorted. (Geoff Hutchison/Tim Vandermeersch)
  • Charge models: "--print" writes the partial charges to standard output after calculation (Geoff Hutchison)
  • Conformations: Confab, the systematic conformation generator, has been incorporated into Open Babel (David Hall/Noel O'Boyle)
  • Conformations: Initial support for ring rotamer sampling (Geoff Hutchison)
  • Conformer searching: Performance improvement by avoiding gradient calculation and optimising the default parameters (Geoff Hutchison)
  • EEM charge model: Extend to use additional params from (Tomáš Raček)
  • FillUnitCell operation: Improved behavior (Patrick Fuller)
  • Find duplicates: The "--duplicate" option can now return duplicates instead of just removing them (Chris Morley)
  • GAFF forcefield: Atom types updated to match Wang et al. J. Comp. Chem. 2004, 25, 1157 (Mohammad Ghahremanpour)
  • New charge model: EQeq crystal charge equilibration method (a speed-optimized crystal-focused charge estimator) (David Lonie)
  • New charge model: "fromfile" reads partial charges from a named file (Matt Harvey)
  • New conversion operation: "changecell", for changing cell dimensions (Kirill Okhotnikov)
  • New command-line utility: "obthermo", for extracting thermochemistry data from QM calculations (David van der Spoel)
  • New fingerprint: ECFP (Geoff Hutchison/Noel O'Boyle/Roger Sayle)
  • OBConversion: Improvements and API changes to deal with a long-standing memory leak (David Koes)
  • OBAtom::IsHBondAcceptor(): Definition updated to take into account the atom environment (Stefano Forli)
  • Performance: Faster ring-finding algorithm (Roger Sayle)
  • Performance: Faster fingerprint similarity calculations if compiled with -DOPTIMIZE_NATIVE=ON (Noel O'Boyle/Jeff Janes)
  • SMARTS matching: The "-s" option now accepts an integer specifying the number of matches required (Chris Morley)
  • UFF: Update to use traditional Rappe angle potential (Geoff Hutchison)

Language bindings

  • Bindings: Support compiling only the bindings against system libopenbabel (Reinis Danne)
  • Java bindings: Add example Scala program using the Java bindings (Reinis Danne)
  • New bindings: PHP (Maciej Wójcikowski)
  • PHP bindings: BaPHPel, a simplified interface (Maciej Wójcikowski)
  • Python bindings: Add 3D depiction support for Jupyter notebook (Patrick Fuller)
  • Python bindings, Pybel: calccharges() and convertdbonds() added (Patrick Fuller, Björn Grüning)
  • Python bindings, Pybel: compress output if filename ends with .gz (Maciej Wójcikowski)
  • Python bindings, Pybel: Residue support (Maciej Wójcikowski)

Development/Build/Install Improvements

  • Version control: move to git and GitHub from subversion and SourceForge
  • Continuous integration: Travis for Linux builds and Appveyor for Windows builds (David Lonie and Noel O'Boyle)
  • Python installer: Improvements to the Python installer and "pip install openbabel" (David Hall, Matt Swain, Joshua Swamidass)
  • Compilation speedup: Speed up compilation by combining the tests (Noel O'Boyle)
  • MacOSX: Support compiling with libc++ on MacOSX (Matt Swain)


Cheminformatics for Drug Design: Data, Models & Tools


This is a joint meeting Organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group. To be held at Imperial War Museum, Duxford, UK, on Wednesday 12 October 2016.

There is an interesting line up of speakers and exhibitors and a chance to have a look around the aerospace museum. More details and the booking form are here





I just noticed that iScienceSearch has been updated.

Search by structure, text, name and identifiers in 100 chemical and biological databases. A single front-end allows you to get links with answers for your query searching the scientific web! NEW! iScienceSearchlite is a new iScienceSearch version with a simplified UI, which is optimized for smaller screens and slow Internet connections.

There is a review of an earlier version here


CICAG Newsletter


The CICAG newsletter is now available here

CICAG aims to keep its members abreast of the latest activities, services, and developments in all aspects of chemical information, from generation through to archiving, and in the computer applications used in this rapidly changing area through meetings, newsletters and professional networking


Cheminformatics for Drug Design: Data, Models & Tools

I’ve just heard that the poster deadline for the Cheminformatics for Drug Design: Data, Models & Tools meeting organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group has been extended.

Imperial War Museum, Duxford, UK Wednesday 12 October 2016

Full details are available here

Sounds like an excellent meeting and you will have a chance to look around the aircraft at the Duxford Imperial War Museum.


Accessing ZINC supplier information


ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 100 million purchasable compounds in ready-to-dock, 3D formats. Sterling and Irwin, J. Chem. Inf. Model, 2015. This is an invaluable resource for any type of virtual screening or for anyone looking to create a physical screening or fragment collection.

Once you have done the virtual screening you will rapidly realise that the really time-consuming and tedious part now lies ahead: finding out which vendors stock a particular molecule and then ordering it. Looking up the vendor details for individual compounds is extremely tedious, so this Vortex script may be very useful.

Many more scripts, iPython notebooks and tutorials can be found here.


StarDrop 6.3 released

Optibrium have just announced the release of StarDrop 6.3. Perhaps the highlight of this release is the introduction of the new SeeSAR module.

The SeeSAR module developed in collaboration with BioSolveIT provides seamless access in StarDrop to 3D structures based on X-ray crystallography or predicted with any docking software. The intuitive link between this 3D information and StarDrop’s cheminformatics analyses and visualisations, based on 2-dimensional compound structure, gives new insights into structure-activity relationships (SAR) within your project chemistry and aids the design of improved compounds. It also supports collaboration between computational and synthetic chemists, helping to share the results of 3D modelling with all decision makers.


You can watch a video tutorial here


Over 700 iBabel 3.6 Downloads


I just noticed that the latest version of iBabel has now been downloaded over 700 times since it was released at the start of the year.

iBabel started out as an AppleScript Studio application designed as a front-end to OpenBabel (DOI). It was updated several times and is now an AppleScriptObjC application built with Xcode. As well as acting as a front-end to OpenBabel it also provides a front-end to tools built on OpenBabel and a molecule viewer.



Cheminformatics for Drug Design: Data, Models & Tools


A joint meeting Organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group

More Details and booking form

A4 Cheminfo flyer


MolSync web structure searching


Interesting post on structure based searching via a web interface.

The MolSync website and the technology behind it have been moving forward rapidly. The public-facing deployment now shows a proof of concept page for performing molecule searches:


Cytoscape Update


Cytoscape has been updated to version 3.4.0

Note: this update requires Java 8 to be installed and Mac OS X 10.9 or later.

Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.


iBabel over 500 downloads


I just noticed that iBabel has now been downloaded over 500 times since the start of the year. I'm surprised and delighted that it has proved so popular.


iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a variety of molecule viewers.


Cheminformatics job opportunities


I've been sent details of a couple of jobs and I thought I'd pass them on.

BIO – Cheminformatics Data Scientist (Stratified Medical)

We are looking for an experienced and innovative cheminformatician to make a significant contribution in influencing the drug discovery process by applying your expertise in chemical methods development and use of intelligent algorithms to chemical data. Working within the Biomedical Data R&D team, you will be required to bring your experience and ideas across a broad range of drug design areas.

More details here…

Technical Expert Cheminformatics (Syngenta, Jealott's Hill)

In this role, you will provide cheminformatics and mathematical modeling support to the computational chemistry platform in Chemical Research.

More details here…


RDKit updated


RDKit has been updated.

If you used Homebrew to install RDKit as described here, updating is very simple

brew update
brew upgrade rdkit

You can check which version you have installed using

MacPro> python
Python 2.7.11 (default, Dec 23 2015, 16:11:50) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import rdBase
>>> print rdBase.rdkitVersion

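The session above uses Python 2 syntax. Under Python 3 the same check might look like the sketch below; the helper name rdkit_version is made up for illustration, and it returns None instead of raising an error if RDKit cannot be imported.

```python
def rdkit_version():
    """Return the installed RDKit version string, or None if RDKit is missing."""
    try:
        from rdkit import rdBase
    except ImportError:
        return None
    return rdBase.rdkitVersion


print(rdkit_version())
```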

Over 400 iBabel downloads


The latest version of iBabel has now been downloaded over 400 times since it was released in January.


iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and molecule viewers


BALL Project


BALL (Biochemical ALgorithms Library) is an application framework implemented in C++ that has been specifically designed to reduce development times in the field of Computational Molecular Biology and Molecular Modeling. It provides an extensive set of data structures as well as classes for Molecular Mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, and visualization.

BALLView is BALL’s standalone molecular modelling and visualization application. Furthermore, it is also a framework for developing molecular visualization functionality.


It can be downloaded from here and requires

  • CMake >= 2.8.12
  • git
  • Python 2.7
  • Qt 5.4

Installation instructions for Mac OSX are here


Opsin Updated


Opsin (Open Parser for Systematic IUPAC Nomenclature) has been updated. If you used Homebrew to install it you can get the latest version by simply typing

brew update
brew upgrade opsin

Opsin is a freely available algorithm that interprets the majority of organic chemical nomenclature (DOI). You can try it out using this website


StarDrop 6.3

Optibrium have announced the latest update to the StarDrop application. The highlight for version 6.3 is perhaps the integration of SeeSAR, an intuitive structure-based design tool.

The new SeeSAR module for StarDrop provides a state-of-the-art and scientifically rigorous approach to understanding the binding of compounds in their protein targets in 3D. Users can import ligand and protein structures, derived from crystal structures or predicted with any docking software, and visualise the key interactions driving potency. This is seamlessly linked to StarDrop’s chemoinformatics methods based on 2-dimensional (2D) compound structure and its unique Card View approach to interpreting the resulting structure-activity relationships.

A preview of StarDrop 6.3 will be on show at the American Chemical Society National Meeting, 13th-17th March 2016.

There are reviews of SeeSAR and StarDrop in the reviews section.


iPython Notebook to calc physicochemical properties


I've been making increasing use of iPython notebooks, both as a way to perform calculations and as a way of cataloguing the work that I've been doing. One thing I seem to be doing quite regularly is calculating physicochemical properties for libraries of compounds and then creating a trellis of plots to show each of the calculated properties. In the past I've done this with a series of AppleScripts using several applications. This seemed an ideal task to try out using an iPython notebook.
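
As a rough sketch of the calculation step only (this is not the notebook itself), something like the following would compute a few common properties for a list of SMILES, assuming RDKit is installed. The function name physchem_table and the choice of properties are illustrative assumptions.

```python
try:
    from rdkit import Chem
    from rdkit.Chem import Descriptors
except ImportError:  # RDKit is not installed
    Chem = None


def physchem_table(smiles_list):
    """Return one dict of calculated properties per parsable SMILES string."""
    if Chem is None:
        raise RuntimeError("RDKit is required for this calculation")
    rows = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            # Skip anything RDKit cannot parse.
            continue
        rows.append({
            "smiles": smiles,
            "MW": Descriptors.MolWt(mol),
            "LogP": Descriptors.MolLogP(mol),
            "TPSA": Descriptors.TPSA(mol),
            "HBD": Descriptors.NumHDonors(mol),
            "HBA": Descriptors.NumHAcceptors(mol),
        })
    return rows
```

The list of dicts can be passed straight to a pandas DataFrame, which is a convenient starting point for plotting one histogram per property.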




ChEMBL 21 released


The release of ChEMBL_21 has been announced. This version of the database was prepared on 1st February 2016 and contains:

  • 1,929,473 compound records
  • 1,592,191 compounds (of which 1,583,897 have mol files)
  • 13,968,617 activities
  • 1,212,831 assays
  • 11,019 targets
  • 62,502 source documents


Data can be downloaded from the ChEMBL ftpsite or viewed via the ChEMBL interface

Please see ChEMBL_21 release notes for full details of all changes in this release.


Summer of Code


I was just sent details of this

Interested in doing some chemistry programming this summer? Have students that might be interested?

Open Chemistry has been accepted into the Google Summer of Code for 2016 - including Open Babel, Avogadro, cclib and 3DMol.js.

If you are a student and interested in doing open chemistry software development this summer (or know of someone who is), we're definitely up for good proposal ideas. Take a look at our suggestions or come up with one on your own:

Student proposals can be submitted between March 14th and March 25th. Instructions are at the Summer of Code website.


OpenEye Toolkit Updated


OpenEye have announced the release of OpenEye Toolkits v2016.Feb. These libraries include the usual support for C++, Python, C# and Java.

The update addresses several key features.


OpenEye toolkits are used in web services that require protection from malicious users. The most obvious attack vector against the OpenEye toolkits is file format parsing since scientific file formats are complex and often underdefined and there is the potential for embedded malicious code. This update closes a number of potential vulnerabilities.

FastROCS TK: Database Loading Performance

An interesting development is that physical memory limits on GPUs mean that for larger libraries loading the dataset actually takes longer than the search itself. This release addresses that issue.

OEMedChem TK

This also contains the first official release of OEMedChem TK, in particular access to matched molecular pairs.

This 2016.Feb release no longer supports OSX 10.8, but support has been added for OSX 10.11. This 2016.Feb release supports Python 3.5 for the following platforms: OSX 10.10, OSX 10.11, Ubuntu 12, Ubuntu 14, RedHat 6, and RedHat 7

Full release notes are here.


iBabel Downloads


I thought I'd have a look at the number of downloads of iBabel there have been since I announced the latest release last month. So far there have been over 250 downloads and there seems to be a steady stream of downloads as shown in the plot below.


iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a selection of molecule viewers


Unicon and Mona


I've just added Unicon and Mona to the alphabetical listing of applications.

UNICON is a command-line tool to cope with common cheminformatics tasks. The functionality of UNICON ranges from file conversion between standard formats SDF, MOL2, SMILES, and PDB via the generation of 2D structure coordinates and 3D structures to the enumeration of tautomeric forms, protonation states and conformer ensemble.

Mona is an interactive tool that can be used to prepare and visualize large small-molecule datasets. A set-centric workflow makes it possible to handle hundreds of thousands of molecules intuitively.


PolyPharma app updated


The iOS app PolyPharma has been updated. PolyPharma uses structure activity relationships to view predicted activities against biological targets, physical properties, and off-targets to avoid. Calculations are done using Bayesian models and other kinds of calculations that are performed on the device.

More details are available in this presentation.


RSC Undergraduate Research Bursaries


The RSC's Undergraduate Research Bursaries are now open for 2016 entries, seeking talented chemical sciences students to undertake a research placement this summer.

These research bursaries are to fund short (6-8 weeks) summer research projects for undergraduate chemistry students in the middle years of their course. The purpose of the awards is to give experience of research to undergraduates with research potential and to encourage them to consider a career in scientific research.

The bursary is worth £200 per week (£210 in London) for up to 8 weeks to cover a defined research placement.

The deadline for applications is 21 February 2016.

Please note that, for the first time, in 2016 CICAG will be funding one student bursary for research work which falls into one or more of the following areas: cheminformatics, chemical information, chemical data management, chemistry data analytics, applications of computational chemistry.

For more information about guidelines, eligibility criteria, award conditions, and the application form please see the Undergraduate Research Bursaries web page.

Any questions should be directed to the Undergraduate Bursaries team (see the link on the Undergraduate Research Bursaries web page linked to above).


Installing Checkmol/Matchmol under Mac OSX


Checkmol is a command-line utility program which reads molecular structure files in different formats and analyzes the input molecule for the presence of various functional groups and structural elements. At present, approx. 200 different functional groups are recognized. Output can be either clear text (English or German), a bitstring or its ASCII representation, or a set of special 8-character codes. This output can be easily placed into a database table, permitting the creation of chemical databases with a functional group search option. It was written by Norbert Haider, Department of Pharmaceutical Chemistry (now: Department of Drug and Natural Product Synthesis), University of Vienna, Austria.

The software is available both as source code and as a binary compiled for Linux (x86 architecture). It is entirely written in Pascal and it was compiled with Free Pascal 1.0.11 or Free Pascal 2.4.0 (starting from v0.4c). So to install it we first need a Pascal compiler, which can be downloaded from SourceForge.

Full details are here.




If you are renewing your Royal Society of Chemistry subscription, remember that your membership covers up to three interest groups. If you are interested in cheminformatics you might be interested in the Chemical Information and Computer Applications Group.

I'm on the committee and we have a couple of interesting meetings in the planning stage.


iBabel 3.6 is released



iBabel started out as an AppleScript Studio application designed as a front-end to OpenBabel DOI. It was updated several times and is now an AppleScriptObjC application built with Xcode. As well as acting as a front-end to OpenBabel, it also provided a front-end to tools built on OpenBabel and a molecule viewer using a selection of Java applets and plugins via an embedded web view.

This all worked perfectly for a while, but various security issues mean that Java applets and plugins in an embedded web view no longer function, and calls to remote web servers to provide JavaScript viewers also cause security issues. In addition, OpenBabel has been substantially rewritten, so many of the small programs built on OpenBabel are no longer supported. Their functionality has not been lost, however: it has been incorporated into the main OpenBabel program. Security updates, sandboxing and changes within El Capitan also meant I had to update a number of features.

Now things have settled down a bit I've restarted work on iBabel and an update is now available.

I've transitioned most of the calls to babel over to obabel (the differences are highlighted here) and replaced the calls to the tools built on OpenBabel with the corresponding new calls to obabel.

Updating the viewers, however, has taken more time than I expected, with new security features in Mac OSX updates causing unexpected issues. Whilst the work is not yet complete, I have removed all the Java or plugin-based molecular viewers and replaced them with JavaScript versions.

Full details are here.


An early Christmas present from Chemical Computing Group


Chemical Computing Group have just released an update to MOE; version 2015.10 includes:

Protein-Protein Docking

  • Generate docked poses using FFT followed by all atom minimization
  • Define receptor and ligand sites to focus docking
  • Automatically detect antibody CDR sites

Integrated Alignment, Consensus and Superposition in the Sequence Editor

  • Manipulate multimeric protein sequences using split side-by-side Sequence Editor panes
  • Use dendrograms to visualize pairwise similarity, identity and RMSD relationships
  • Select residues based on plotted values using resizable sequence editor plots

Distributed Pharmacophore Searching

  • Run pharmacophore searches on a cluster directly from MOE GUI
  • Perform fast corporate database searches
  • Access multiple databases stored on a central server

Covalent Docking and Electron Density Docking

  • Use reaction-based organic transformations for covalent docking
  • Minimize ligand strain energy while maximizing ligand fit to electron density
  • Run docking through an enhanced streamlined scenario-based interface

Extended Hückel Descriptors and pKa Model

  • Compute molecular properties such as logP, logS and molar refractivity
  • Determine populations of ligand protonation states at a given pH
  • Calculate the pKa and pKb of small molecules

13C NMR Analysis

  • Apply QM conformation refinement to calculate 13C NMR shielding
  • Convert computed shieldings and predict 13C NMR chemical shifts
  • Compare computed chemical shifts to experimental shifts for structure determination

I'll write a review in the New Year.


Chemical Identifier Resolver


I just got this message so I thought I'd pass it on. I'll update any scripts that use the chemical identifier resolver in the New Year.

To all users of programmatic services on the web server of the CADD Group at the NCI/NIH:

The CACTUS web server will move to a significantly reconfigured system on new hardware by the end of the year. This move is planned to take place during the last week of December 2015. This move will also entail a change of the host's IPv4 address. Concurrent with the cut-over, the HTTPS protocol will be enabled for all services. Both HTTP and HTTPS will be supported in parallel for a transition period. We plan to turn off HTTP permanently by end of March 2016. Disruptions to users caused by the move should be minimal. If you encounter any bugs or different behavior starting 1/1/2016, please let us know immediately.
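For scripts that call the resolver, the practical consequence of this notice is to address the service by host name rather than by IP address and to request it over HTTPS. Here is a minimal Python sketch, assuming the resolver's commonly documented URL pattern `/chemical/structure/<identifier>/<representation>` on the host `cactus.nci.nih.gov` (neither is stated in the message above). The helper name `resolver_url` is hypothetical and no network request is made.

```python
from urllib.parse import quote

# Assumed host name and path layout of the NCI/CADD Chemical Identifier
# Resolver; neither is given in the announcement itself.
HOST = "cactus.nci.nih.gov"


def resolver_url(identifier, representation="smiles", scheme="https"):
    """Build a resolver URL for an identifier (hypothetical helper)."""
    # safe="" percent-encodes "/" as well, so identifiers that contain
    # slashes (for example InChI strings) stay inside one path segment.
    encoded = quote(identifier, safe="")
    return "{}://{}/chemical/structure/{}/{}".format(
        scheme, HOST, encoded, representation
    )


if __name__ == "__main__":
    print(resolver_url("aspirin"))
    print(resolver_url("acetic acid", "stdinchikey"))
```

Keeping the scheme in one place like this means a script only needs a single change once HTTP is switched off.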


HELM integration with RDKit


The Pistoia Alliance HELM project have announced free MarvinBeans 5.0 licenses and integration of HELM with the RDKit cheminformatics suite.

The Pistoia Alliance HELM project has made two major announcements that help cement the reputation of HELM as the de-facto standard for describing and working with complex macromolecular structures. Firstly, HELM users can now take advantage of free MarvinBeans 5.0 licenses for the HELM toolkit. Secondly, RDKit is now HELM-enabled, making it a valuable addition to the extensive range of open source HELM-enabled tools.


Reactions in XMDS


I just heard about an update to the OSX Molecular DataSheet (XMDS).

After much procrastination, chemical reactions have started to make their way into the OSX Molecular DataSheet (XMDS) beta.

This could be a very interesting development.


OpenEye Toolkits v2015.October released


OpenEye have announced the release of OpenEye Toolkits v2015.October. These libraries include the usual support for C++, Python, C# and Java.

New Features

  • FastROCS TK was added to the OpenEye toolkits collection
  • Molecule reading performance improvement in OEChem TK
  • The capabilities of the OEBio-Fragment Network have been expanded
  • 213 new ring templates have been added to the OEChem TK built-in ring dictionary

The full release notes give more details.

In particular, note that the 2015.Oct release is the last to support Mac OSX 10.8, so it is time to upgrade if you have not already done so.


OS X Molecular DataSheet (XMDS) app updated


The OS X Molecular DataSheet (XMDS) app has been updated recently. This is a chemically aware spreadsheet editor: it operates on a grid of editable cells, made up of typed columns, that can be molecules, numbers or plain text.

The latest update brings drag and drop; as you might imagine, moving cells containing molecules is rather more complicated than moving numbers or text. You can read full details here.



Brood v3.0 released


OpenEye have announced the release of Brood v3.0, a bioisostere replacement program.

  • Custom Fragment Conformations: Fragment geometries can now be derived from any 3D source, including the CSD.
  • Fragment Joining and Cyclization: Finding a fragment to bridge two disconnected molecules or cyclize a molecule is now directly supported.
  • Improved Filter Properties: Property filters can now have both minimum and maximum values.
  • Mapping Fragments to Source Molecules: Molecules BROOD constructs now include the source molecule from which the replacement fragment is derived.
  • Results Navigation: BROOD’s results navigation tool has been redesigned to be more intuitive, giving users an easy way to quickly explore the clustered and aligned analog molecules.

Full details are available in the release notes.


LSH-based similarity search in MongoDB is faster than postgres cartridge


There is a great blog article on ChEMBL-og, describing their work evaluating chemical structure based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.

The article does a great job in explaining the logic behind improving the search performance.

They also provide an IPython notebook so you can try it yourself.
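To give a feel for the idea, here is a minimal pure-Python sketch of locality-sensitive hashing with MinHash signatures and banding. It is not the ChEMBL-og implementation and leaves MongoDB out entirely: fingerprints are assumed to be non-empty sets of "on" bit indices, and every function name and parameter value is hypothetical.

```python
import random
from collections import defaultdict

PRIME = 2147483647  # 2**31 - 1; bit indices are assumed to be smaller than this


def make_hash_params(n_hashes, seed=0):
    """Random (a, b) pairs defining hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(n_hashes)]


def minhash_signature(bits, params):
    """MinHash signature of a fingerprint given as a non-empty set of bit indices."""
    return tuple(min((a * x + b) % PRIME for x in bits) for a, b in params)


def band_keys(signature, bands):
    """Cut a signature into equal bands; each band becomes one bucket key."""
    rows = len(signature) // bands
    return [(i, signature[i * rows:(i + 1) * rows]) for i in range(bands)]


def build_index(fingerprints, params, bands):
    """Map every band of every molecule's signature to the molecule name."""
    index = defaultdict(set)
    for name, bits in fingerprints.items():
        for key in band_keys(minhash_signature(bits, params), bands):
            index[key].add(name)
    return index


def candidates(query_bits, index, params, bands):
    """Molecules sharing at least one band with the query."""
    found = set()
    for key in band_keys(minhash_signature(query_bits, params), bands):
        found |= index.get(key, set())
    return found


def tanimoto(a, b):
    """Exact Tanimoto coefficient, used to rank the candidates."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    fps = {
        "mol_a": {1, 4, 9, 16, 25, 36},
        "mol_b": {1, 4, 9, 16, 25, 49},
        "mol_c": {2, 3, 5, 7, 11, 13},
    }
    params = make_hash_params(32, seed=1)
    index = build_index(fps, params, bands=8)
    query = {1, 4, 9, 16, 25, 36}
    hits = candidates(query, index, params, bands=8)
    for name in sorted(hits, key=lambda n: -tanimoto(query, fps[n])):
        print(name, round(tanimoto(query, fps[name]), 2))
```

The speed-up comes from computing the exact Tanimoto only for molecules that share a bucket with the query. More bands with fewer rows each make the filter more permissive, while fewer and longer bands make it stricter.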


KNIME 2.12 released


The latest update to KNIME has been released.

The KNIME Analytics Platform incorporates hundreds of processing nodes for data I/O, preprocessing and cleansing, modeling, analysis and data mining as well as various interactive views, such as scatter plots, parallel coordinates and others. It integrates all of the analysis modules of the well known Weka data mining environment and additional plugins allow R-scripts to be run, offering access to a vast library of statistical routines.

What's New in KNIME 2.12

Analytics

  • Decision Tree to Rule Set (New node)
  • Rule Handling (New node)
  • Statistics measure as aggregation methods in GroupBy node
  • Extended PMML Support (New node)
  • Data Generation (New node)
  • More Statistics Nodes (New set of nodes)

Tool Integration

  • New MongoDB Integration (New set of nodes)
  • Javascript Integration (New set of nodes)
  • Extended JSON Processing (New set of nodes)
  • XML XPath Interactive extraction (New node)
  • Extended Python Integration (New node)


SeeSAR Updated


SeeSAR has been updated to version 3.1; the release notes highlight two significant new features.

SeeSAR is a software tool for interactive, visual compound prioritization as well as compound evolution.

  • Working with "big data": With this update we lifted the limit of handling only a maximum of 5000 poses in SeeSAR. We know that a lot of people like to do their compound analysis and prioritization after virtual screening campaigns also with much bigger sets. It is not likely that you will look at more than a couple of hundred poses, however, since the filtering (see also below) is extremely efficient, it provides quite an attractive opportunity to load all your data (not just the top x) and do your prioritization with all properties at hand right here in SeeSAR.
  • Enhanced filtering: Behind the scenes SeeSAR knows so much more about your compounds than what is displayed in the table. The basic stuff like no. of acceptors and donors, rotatable bonds, etc. to do the usual Lipinski-type filtering is of course available, but also more elaborate stuff like the number of hydrogen bonds formed or the number of torsions that lie outside the statistical "norm". All of these are now available for filtering to help you optimally trim down your data to find the really interesting part.

NOTE! SeeSAR project files from older versions are incompatible and cannot be loaded. By default SeeSAR puts a new version in a separate location. The recommendation is to export your data from the old project file with the old version and import it into the latest SeeSAR. This is a one-time effort, which allows you to benefit from the features of the most up-to-date version.


A review of cApp


One of the most common tasks for those involved in cheminformatics is handling files containing molecular information. These files can be in a variety of file types, and usually the task involved is relatively minor. cApp is a Java application that provides a simple interface to a variety of everyday activities.

cApp requires JRE7 and uses the Chemistry Development Kit (CDK), an open-source Java library for chem- and bioinformatics, and associated software, JChemPaint as chemical editor, and routines developed within the Program Collection for Structural Biology and Biophysical Chemistry by the Hofmann group. Full details of cApp are described in a J Cheminformatics paper DOI.

You can read the review here.


InChI, the IUPAC International Chemical Identifier


InChI is the International Chemical Identifier developed under the auspices of IUPAC. InChIs are intended to be unique identifiers; they are freely usable and non-proprietary; they can be computed from structural information and do not have to be assigned by some organization; and most of the information in an InChI is human readable (in theory!).

A recent paper in J Cheminformatics DOI describes the design, layout and algorithms of InChI. If you want to understand or implement the code this is a great starting point.

The paper is organized as follows. First, we discuss the general concepts associated with chemical identifiers. Then we outline the design goals of InChI and our general approach, focussing on the InChI model of chemical structure and the hierarchical layered structure of the Identifier; the concept of Standard InChI is introduced. This is followed by a detailed description of each of the possible major InChI layers, accounting for molecular connectivity, charge, stereochemistry, isotopic enrichment, position of hydrogen atoms and bonding in metal compounds, and the sublayers associated with these layers. We then describe the workflow of InChI generation (normalization, canonicalization, and serialization stages), as well as generation of the compact hashed code derived from InChI (InChIKey); the related algorithms and implementation details are briefly discussed. Finally, we provide information about InChI Software, licensing, known problems/limitations, and future prospects for InChI.
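The layered structure is easy to see by splitting an identifier on its "/" separators. Here is a minimal Python sketch, not part of the InChI software: the function name `split_inchi` is hypothetical, and the example is the Standard InChI for ethanol.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    version = parts[0]                      # e.g. "1S" for Standard InChI
    rest = [p for p in parts[1:] if p]
    formula = None
    # Every layer except the formula starts with a lowercase prefix letter
    # (c = connectivity, h = hydrogens, q = charge, t/m/s = stereo, ...).
    if rest and not rest[0][0].islower():
        formula = rest.pop(0)
    # A list of pairs rather than a dict, because a prefix can occur more
    # than once (for example inside the isotopic layer).
    layers = [(p[0], p[1:]) for p in rest]
    return {"version": version, "formula": formula, "layers": layers}


if __name__ == "__main__":
    ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
    parsed = split_inchi(ethanol)
    print(parsed["version"], parsed["formula"])
    for letter, value in parsed["layers"]:
        print(letter, value)
```

For ethanol this yields the version `1S`, the formula `C2H6O`, a connectivity layer `c1-2-3` and a hydrogen layer `h3H,2H2,1H3`. The sketch only separates layers; it does not validate or interpret them.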

The source code and documentation can also be downloaded from here.


OpenEye toolkits updated


OpenEye has announced the release of OpenEye Toolkits v2015.June. These libraries include the usual support for C++, Python, C# and Java and are now available for download.

New Features Highlights:

  • PDB Splitting in OEBio TK
  • PAINS (Pan Assay Interference Compounds) filter in OEMolProp TK
  • Matched molecular pair improvements in OEMedChem TK
  • Custom ring template dictionaries in OEChem TK
  • Anaconda support for easier Python toolkit installation

MoSS Molecular Substructure Miner


MoSS is mainly a program to find frequent molecular substructures and discriminative fragments in a database of molecule descriptions. It can be used in the context of drug discovery and synthesis prediction for the purpose of analyzing the outcome of screening tests. Given a database of graphs, MoSS finds all (closed) frequent substructures, that is, all substructures that appear with a user-specified minimum frequency in the database (and do not have super-structures that occur with the same frequency).
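The notions of "frequent" and "closed" can be illustrated with a toy analogue. MoSS itself mines molecular graphs; the sketch below only assumes that each molecule has already been reduced to a set of fragment labels and enumerates fragment sets by brute force, so it is an illustration of the definitions rather than of the MoSS algorithm. All names and data are hypothetical.

```python
from itertools import combinations


def frequent_closed(database, min_support):
    """Return {fragment_set: support} for the closed frequent fragment sets.

    database is a list of sets of fragment labels, one set per molecule.
    """
    items = sorted(set().union(*database))
    support = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for mol in database if candidate <= mol)
            if count >= min_support:
                support[candidate] = count
                found = True
        if not found:
            # No frequent set of this size, so no larger one can be frequent.
            break
    # Closed: no proper superset occurs with the same frequency.
    return {
        s: c for s, c in support.items()
        if not any(s < t and c == support[t] for t in support)
    }


if __name__ == "__main__":
    molecules = [
        {"ring", "OH", "NH2"},
        {"ring", "OH"},
        {"ring", "NH2"},
        {"ring", "OH", "Cl"},
    ]
    for frags, count in frequent_closed(molecules, min_support=2).items():
        print(sorted(frags), count)
```

In this toy data `{OH}` is frequent but not closed, because every molecule containing `OH` also contains `ring`, so the larger set `{ring, OH}` has the same support.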

MoSS has been included in CheS-Mapper.


CheS-Mapper Updated


CheS-Mapper has been updated. CheS-Mapper (Chemical Space Mapper) is a 3D-viewer for chemical datasets with small compounds. Whilst executable jar files can be downloaded from the website, the source code is available on GitHub.

There is a review of an older version of CheS-Mapper here.


DataWarrior Update


DataWarrior 4.1.1 is available for download. In addition to precompiled binaries, all Java source files and the script to build DataWarrior on Linux/MacOSX can be downloaded for free use under the GNU public license. DataWarrior is a free data visualization and analysis program with embedded chemical intelligence.


There is a review of DataWarrior here.


HackaMol: An Object-Oriented Modern Perl Library for Molecular Hacking on Multiple Scales


To be honest I can't remember when I last used Perl, but this publication brought back a few memories DOI.

HackaMol is an open source, object-oriented toolkit written in Modern Perl that organizes atoms within molecules and provides chemically intuitive attributes and methods.

Source code and example scripts are available online at http:// There is also a description of an IPerl Notebook in the supporting information.

There is also a very interesting extension, HackaMol::X::Vina, a structured class that provides an interface with the AutoDock Vina docking program.


Scientific Applications under Yosemite


I just thought I'd like to thank all those who contributed to the Scientific Applications under Yosemite web page. Many users and developers contacted me, either via email or in the comments section, and they certainly added information about applications that I don't have access to.

To date the page has been viewed well over 10,000 times with readers from 188 different countries. Viewers spent an average of just under two minutes on the page and it still attracts 800 page views a month.

Given that 75% of the visitors to the site are now using Yosemite I suspect most scientists have now made the transition and I won't be updating the page any more. Once again thanks for the contributions.


SeeSAR 3.0


BioSolveIT has just announced the release of SeeSAR 3.0.

This update of SeeSAR qualifies as major release 3, since it covers two milestones in its development. So far every SeeSAR session has started from scratch. The only way to retain molecules was to save them to file and re-load them again in a subsequent session. Needless to say that loading meant recalculating all Hyde-scores again...

Project files: Starting with Version 3.0, SeeSAR allows you to store all session data in a project file. This includes the protein, ligands loaded from file and new (edited) ligands. Resuming your work on a project is now as easy as double-clicking on the project-file. As a result, everything just got a hell of a lot faster! Whilst calculating Hyde-scores for say 1000 compounds took around half an hour (depending on your hardware), loading the same information from a project file now takes only a few seconds. Note that you can also generate a project file on the command line, allowing you to outsource the calculation of Hyde-scores to a different machine. This enhancement is also a great way to exchange data and ideas with a colleague! Simply store your SeeSAR session as a project file in a commonly accessible location (e.g. a network drive). Your colleague can take a look with just a double-click.

Hyde update: Hyde is quite sensitive with regards to the precise geometry of a binding pose. Even the tiniest difference in a pose can distort an anyway stretched hydrogen bond just so much that it is not recognized anymore - thereby leaving you with a huge desolvation penalty for such atoms, without the gain from the h-bond. This "sharpness" of Hyde is its greatest strength (for example by highlighting real activity cliffs), but also its greatest weakness (especially if the structure has flaws or is of low resolution). In order to minimize such troubles, we optimize each pose before the Hyde affinity assessment. We improved this optimization significantly. It is now fully flexible and with sharper clash criteria, making it suitable for docked poses as well as edited compounds. All of this as efficient as before, just perfect for interactive use.

There is a review of an earlier version of SeeSAR here


MOE updated


MOE2014.0901 Update is now available. MOE is a fully integrated molecular modelling and drug discovery software package.

MOE 2014.0901 updates:


Protein Builder

  • Option for AMBER residue name
  • Append/prepend multiple residue sequence specified by single-letter names

Builder

  • Added H’s inherit color if there is a consistent coloring in the residue

sddesc: New -smi:p option causes field headers to be written to the output ASCII file

Bug Fixes:

  • MOESVLRUNPATH now properly honored
  • Combinatorial Builder now honors different attachment point locations on the same R-group
  • Database Save As one entry per file mode now properly generates unique filenames
  • Dock Template Forcing batch file now correctly generated
  • Saved views in .moe files now properly restored
  • Auto-save when Database Viewer display attributes are changed can now be disabled to prevent changes to the database file modification date when only the display is changed and not the database content
  • SVL function Deprotonate now works properly
  • Various MOE Project and Project Database Update bugs
  • Various minor bug fixes

There are reviews of MOE available here

Moe:- Molecular modeling
Moe Update (Jan 2009):- Molecular modeling
Review of MOE (2009.10 release):- Molecular modeling
Moe Update (December 2010.10 release):- Molecular modeling
Moe Update (December 2011 release):- Molecular modeling
Moe Update (December 2012 release):- Molecular modeling


Decoy Finder 2.0


DecoyFinder has been updated to version 2.0. Decoy Finder is a graphical tool which helps find sets of decoy molecules for a given group of active ligands. It does so by finding molecules which have a similar number of rotational bonds, hydrogen bond acceptors, hydrogen bond donors, logP value and molecular weight, but are chemically different, which is defined by a maximum Tanimoto value threshold between active ligand and decoy molecule MACCS fingerprints. Optionally, a maximum Tanimoto value threshold can be set between decoys in order to assure chemical diversity in the decoy set.
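The selection rule described above can be sketched in a few lines of Python. This is not DecoyFinder's code: fingerprints are toy sets of "on" bits rather than real MACCS keys, the property tolerances and Tanimoto thresholds are hypothetical values chosen for illustration, and all function names are made up.

```python
# Hypothetical tolerances for "similar" properties; not DecoyFinder's defaults.
TOLERANCES = {"rot_bonds": 1, "hba": 2, "hbd": 1, "logp": 1.0, "mw": 25.0}


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def property_matched(active, candidate, tolerances=TOLERANCES):
    """True if every physical property lies within its tolerance."""
    return all(abs(active[key] - candidate[key]) <= tol
               for key, tol in tolerances.items())


def select_decoys(active, pool, max_tanimoto=0.75, max_decoy_tanimoto=None):
    """Pick property-matched but chemically different molecules from pool."""
    decoys = []
    for cand in pool:
        if not property_matched(active, cand):
            continue
        if tanimoto(active["fp"], cand["fp"]) > max_tanimoto:
            continue  # too similar to the active to count as a decoy
        if max_decoy_tanimoto is not None and any(
            tanimoto(cand["fp"], d["fp"]) > max_decoy_tanimoto for d in decoys
        ):
            continue  # too similar to a decoy that was already chosen
        decoys.append(cand)
    return decoys


if __name__ == "__main__":
    active = {"name": "active", "rot_bonds": 3, "hba": 4, "hbd": 2,
              "logp": 2.1, "mw": 300.0, "fp": {1, 2, 3, 4}}
    pool = [
        dict(active, name="analogue"),                      # same fingerprint
        dict(active, name="decoy_1", fp={1, 10, 11, 12}),
        dict(active, name="heavy", mw=400.0, fp={20, 21}),  # property mismatch
        dict(active, name="decoy_2", fp={1, 10, 11, 12}),   # duplicate of decoy_1
    ]
    chosen = select_decoys(active, pool, max_decoy_tanimoto=0.9)
    print([d["name"] for d in chosen])
```

The optional `max_decoy_tanimoto` argument plays the role of the inter-decoy threshold: a candidate is skipped if it is too similar to a decoy that has already been accepted.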

There have been some changes in the dependencies: it now needs RDKit (with OpenBabel being optional) and PyQt4 instead of PySide.

Installation of RDKit was already described in the page on setting up a Mac for Cheminformatics, and I've now added the instructions for PyQt:

brew install pyqt

cinfony is a common API to several cheminformatics toolkits. It uses the Python programming language, and builds on top of Open Babel, RDKit, the CDK, Indigo, JChem, OPSIN and cheminformatics webservices. Currently it is hosted on Google Code, which is closing down. Fortunately the source code is also hosted on GitHub, but you will need to look at the Google Code site to read full details of the project. So the installation is:

git clone
cd cinfony
python setup.py install

Then download the DecoyFinder2 source, and run


PYMOL under Yosemite


Reading through the discussion on Scientific Applications under Yosemite, it seems some people are having problems with PYMOL. I thought I'd mention that installation of PYMOL using Homebrew is included on the page describing how to set up a Mac for Cheminformatics. The page also describes how to install a wide range of other useful tools.




ChemStack is a collection of components that allow users to build chemically intelligent systems, such as collaboration tools, information portals, electronic laboratory notebooks, eLearning systems etc.

Some examples of useful chemical interfaces include:

  • A sketcher to draw molecules
  • Viewers to display molecules
  • Components to display and interact with spectra
  • 3D graphics engines to investigate 3D structures
  • Text based input for IUPAC names and queries

There is a demo here that searches the ChEMBL database of 1.5M structures.


Substructure searching very large compound collections.


I previously described how to script multiple substructure searches in Vortex using SMARTS. There are many occasions when this sort of feature is useful, for example if you want to flag molecules that contain reactive functional groups, toxicophores, or PAINS functional groups that have been shown to interfere with a variety of screens. Whilst the script worked fine, it was rather slow for larger datasets. In the latest tutorial you can see how to take advantage of some of the latest features in Vortex to substantially improve search speeds, allowing collections of 70 million compounds to be searched on a desktop.

Scripting Vortex 24:- Substructure searching very large compound collections.

There are many more scripts listed on the Hints and Tutorials Page.


OpenEye Toolkits v2015.Feb released


OpenEye have announced the release of OpenEye Toolkits v2015.Feb. These libraries include the usual support for C++, Python, C# and Java.


  • Depiction of protein-ligand interactions in Grapheme TK
  • Improvements to matched pair analysis in OEMedChem TK
  • Improved orientation options for images from OEDepict TK
  • Better ring layout in OEDepict TK
  • A major upgrade to the documentation system



Cheminformatics on a Mac


A little while back I wrote a detailed tutorial for getting a wide variety of cheminformatics tools running on a Mac.

Someone just let me know about an issue with OSRA, a utility designed to convert graphical representations of chemical structures, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES.

It turns out that OSRA requires Ghostscript to process PDF images; this can be installed using brew.

brew install ghostscript

MedChem Wizard KNIME workflow


The MedChemWizard is a KNIME workflow designed to assist medicinal chemists with idea generation, ligand design and lead optimization using a number of common functional group transformations and medchem rules-of-thumb. This tutorial, provided by Dr. Alastair Donald, gives a detailed description of its use.



Vortex scripts to access ChEMBL


ChEMBL is a manually curated chemical database of bioactive molecules. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK. The database currently contains over 1.4 million unique structures with the associated activity at 10,579 different targets. It also acts as a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases.

Whilst the database can be downloaded, the data can also be accessed via a web interface (shown below) and a series of web services. These Vortex scripts show how it is possible to pull data from ChEMBL into Vortex.

As usual I’ve written it as a tutorial to try and offer some explanation of how the script works: Scripting Vortex 23:- Accessing ChEMBL using Web Services

I think this rather nicely shows the power of web services and JSON.
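As a rough illustration of why JSON makes this easy, here is a small Python sketch that is separate from the Vortex scripts in the tutorial. It assumes the layout of the current ChEMBL REST API (a `molecule/<id>.json` resource with `molecule_chembl_id`, `pref_name` and `molecule_structures.canonical_smiles` fields), which may differ from the service the tutorial used. The payload is an abbreviated, hand-written sample rather than a captured response, and no network call is made.

```python
import json

# Assumed root of the current ChEMBL REST API; not taken from the tutorial.
BASE = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id):
    """URL of the JSON record for one molecule (assumed layout)."""
    return "{}/molecule/{}.json".format(BASE, chembl_id)


def summarize(payload_text):
    """Pull a few fields out of a molecule record returned as JSON text."""
    record = json.loads(payload_text)
    structures = record.get("molecule_structures") or {}
    return {
        "id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
    }


# Abbreviated, hand-written sample in the assumed shape (aspirin).
SAMPLE = """
{
  "molecule_chembl_id": "CHEMBL25",
  "pref_name": "ASPIRIN",
  "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"}
}
"""

if __name__ == "__main__":
    print(molecule_url("CHEMBL25"))
    print(summarize(SAMPLE))
```

Once the text is parsed, the record is an ordinary nested dictionary, so pulling values into table columns is a matter of a few key lookups.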

There is a list of other Vortex scripts on the Hints and Tutorials page.


Scientific Applications under Yosemite (Update 12)


Whilst there are many sites that track the compatibility of common desktop applications, it is often difficult to find information about scientific applications. Given that this seems to be such a major upgrade I thought I’d set up a spare machine to test applications before I update my main machine. I’ll update the list regularly, so feel free to send in information.

I have a number of applications/libraries/toolkits installed using Homebrew in /usr/local; this is known to cause extended installation times for Yosemite. So don’t worry if it appears the install is stuck at 1 min remaining.

If you do use Homebrew then it is worth updating:

brew update
brew upgrade

Aabel 3 appears to be working fine

BBEdit version 10.5.13 and newer are compatible with Yosemite

Beaker all seems OK

ChemBioDraw versions 12, 13 and 14 all function as before.

ChemDoodle all seems to work fine

Chimera aka UCSF Chimera versions 1.10 and higher are working on Yosemite.

ConQuest and Mercury from CCDC work fine but you may need to reinstall XQuartz (see below)

Cresset report that, after testing, Torch, Spark, Forge and Blaze appear to be compatible with Yosemite; the only cosmetic issues are due to a couple of as yet unresolved bugs in Qt here.

Cytoscape 3.1.1 seems to be working fine

DataWarrior no issues

EndNote X7.2 works well with Yosemite.

Findings Electronic Notebook no issues, only small issue is that the ‘+’ button of the window does not trigger full-screen, though it can still be done via the Window menu.

IDL 8.2 and earlier gags on a missing reference in libPng.dyld, but IDL 8.3 and later is OK

Igor Pro version works fine

iNMR no problems reported

MacVector 13.0.6 No significant issues reported

Marvin Marvin, Instant JChem, and JChem suite all work but require Java 7 (available here)

Mathematica no issues reported

Matlab 2014b works fine; 2012a through 2014a need patching (directions available from the MathWorks support site). Older versions will not run at all.

MOE works fine but you may need to reinstall XQuartz (see below)

OpenBabel no issues so far

Opsin all works fine

OSRA no issues

Papers Current version is compatible but not optimised, they hope to have a beta out of a substantially redesigned version next week.

Pro Fit 6.2 appears to work fine.

Pybel no issues reported

PyCharm works fine

Pymol All these are confirmed to work:

  • MacPyMOL
  • MacPyMOLX11Hybrid after XQuartz reinstall (see below)
  • Open-Source PyMOL with homebrew

Known issues with MacPyMOL:

  • Movie export broken.
  • Edu-only-PyMOL (free student version) does not work. (Now updated to work with Yosemite.)

No reports so far about:

  • Other legacy versions (0.99 etc.). Apparently the program will not open.
  • Open-Source PyMOL with fink or macports

PyRx 0.8 for docking works fine

RDkit no issues reported

SeeSAR all seems to be working fine

Sente 6.7.8 seems to run fine, except that it cannot open a reference library from the File > Open... dialog box. Workaround is to open from Finder.

Spartan 14 does not work because the Sentinel drivers are broken in Yosemite. The problem is NOT with Spartan, it is with the SafeNet developed Sentinel Run-Time Environment driver (the license manager). SafeNet has not given a definitive date when they will release an updated driver with Yosemite compatibility, but they are working on this. Best advice is to not upgrade but if you have to then contact for a temporary alternative license procedure.

Torch no issues

VarSeq no issues

Vortex Upgraded when the developer preview came out.  All works fine

The VVI products work well enough on Yosemite, but I'd like to achieve a higher level of quality for Yosemite (and iOS/iPad). There is an ongoing beta program for this product: which is Graph Builder reincarnated on the iPad. There is also a beta program ramping for Graph Builder on Yosemite: but a last minute interaction bug with Yosemite has delayed that for perhaps a few days. Please feel free to broadcast this information as you see fit. Beta program participation should be directed to

VMD no issues reported

Wizard Pro is fully Yosemite compatible

XQuartz it seems the Yosemite installer deleted the symlink between /opt/X11 and /usr/X11; you can either reinstall XQuartz or try "ln -s /opt/X11 /usr/X11"
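
Before reinstalling anything it may be worth checking whether the symlink really is missing. The following is only a minimal sketch: the function name check_symlink and its return values are my own invention for illustration, and it only inspects the two paths mentioned above without changing anything.

```python
import os

def check_symlink(link_path, target_path):
    """Report the state of link_path relative to target_path.

    Returns "ok", "missing", "not-a-symlink" or "wrong-target".
    (Hypothetical helper, not part of any XQuartz tooling.)
    """
    if not os.path.lexists(link_path):
        return "missing"
    if not os.path.islink(link_path):
        return "not-a-symlink"
    # Resolve both sides so relative and absolute link targets compare equal.
    if os.path.realpath(link_path) != os.path.realpath(target_path):
        return "wrong-target"
    return "ok"

if __name__ == "__main__":
    # Paths taken from the note above; the check itself modifies nothing.
    print(check_symlink("/usr/X11", "/opt/X11"))
```

If this prints "missing", recreating the link with the ln command above (or reinstalling XQuartz) should be the fix.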

Updated 30 October 2014


VIDA v4.3.0 released


OpenEye have announced the release of VIDA v4.3. This is a major update with many new features and enhancements, including improvements to depiction, 2D alignment, list manager manipulation, surface selection and display, default colouring schemes, both visual and list-driven atom subset selection, cluster viewing, colouring by SD property and extension management.

One feature I’m sure will be very popular is the new set of advanced depiction options, including atom property maps from the Grapheme TK, substructure highlighting, and 2D structure alignment, all available for depiction in the 2D window and spreadsheet.


Support for Mac OS X 10.8 and 10.9 was added
Mac OS X 10.6 is no longer supported


Impressions of Apple’s Swift, after a bit of practice


Swift is a new programming language from Apple for iOS and OS X apps that builds on the best of C and Objective-C, without the constraints of C compatibility. I’m delighted to hear that people are starting to explore its use in scientific applications. Dr. Alex M. Clark has posted his early impressions on the Cheminformatics blog, well worth a read.

There is also the Swift blog for more interesting tips.


CheS-Mapper updated


CheS-Mapper has been updated to version 2.4.

New Features

  • Add Moss as new structural fragment mining algorithm
  • Show the number of distinct 3D positions (at the top right, alongside other dataset info)
  • Mapping warnings are now accessible within the viewer (Menu: Help > Show mapping warnings)
  • Add hint for multiselection of compounds via 'control'-key (shown the first 3 times when zooming into compounds)

More Changes

  • The viewer no longer zooms out when changing component size or spread
  • Add log conversion of feature values, by adding a new feature, instead of log-highlighting (gives better overview of log-distributed values, e.g. within the chart)
  • Multiple selected compounds are now highlighted within the chart for nominal features (was only possible for numerical features)

Fix

  • Fix error that showed structural fragment values as '1'/'0' instead of 'match'/'no-match'

CheS-Mapper (Chemical Space Mapper) is an open-source 3D viewer for chemical datasets of small molecules. A publication in the Journal of Cheminformatics describes an early version of the application (DOI: 10.1186/1758-2946-4-7), and there is a review here.


Open source pKa


A little while ago I suggested on Twitter that it might be useful if all chemistry undergraduates conduct a LogP or pKa determination as part of their practical classes. These results could then be stored in an open access database that would grow into a fantastic resource.

Sven Kochmann has now fleshed that initial idea out into a detailed proposal. Well worth reading and I’d encourage people to participate.


Asteris Updated


Asteris has been updated. Asteris is an iOS app that arose from a collaboration between Optibrium and Integrated Chemistry Design that allows medicinal chemists to design new molecules on their iPad and then calculate a range of physicochemical and ADME properties.

What's New in Version 1.0.2

  • Add sulfoxide support, using either double bond or separated charges.
  • Add multiple ring creation with one gesture if atoms are selected.
  • Permit scaling with selected atoms and bonds.
  • Add wavy bonds if Single bond tapped a second time.
  • Add Presentation Mode.

There is a review of version one here


Compiling Plane of Best Fit


I was recently asked about compiling plane of best fit (PBF), an algorithm to quantify and characterize the 3D character of molecules, as described in Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules, Nicholas C. Firth, Nathan Brown, and Julian Blagg, Journal of Chemical Information and Modeling 2012 52 (10), 2516-252 DOI. The source code is all available from the RDKit repository.

The compilation had a slight glitch but full details are here.
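
For anyone curious what the calculation involves, here is a rough pure-Python sketch of the idea as I understand it: fit a least-squares plane through the atom coordinates and report the mean distance of the atoms from that plane. This is not the RDKit code; the function names are hypothetical, the coordinates are assumed to be heavy-atom positions from a single 3D conformer, and the plane normal is found from the scatter matrix using a closed-form eigenvalue formula for symmetric 3x3 matrices rather than a linear algebra library.

```python
import math

def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def _smallest_eigenvalue(c):
    """Smallest eigenvalue of a symmetric 3x3 matrix (trigonometric form)."""
    p1 = c[0][1] ** 2 + c[0][2] ** 2 + c[1][2] ** 2
    if p1 == 0.0:
        # Already diagonal: the eigenvalues are the diagonal entries.
        return min(c[0][0], c[1][1], c[2][2])
    q = (c[0][0] + c[1][1] + c[2][2]) / 3.0
    p2 = sum((c[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(c[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding error
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)

def plane_of_best_fit(coords):
    """Mean distance of the points from their least-squares plane."""
    n = len(coords)
    if n < 3:
        raise ValueError("need at least three points")
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    centred = [[p[k] - centroid[k] for k in range(3)] for p in coords]
    # 3x3 scatter matrix of the centred coordinates.
    c = [[sum(p[i] * p[j] for p in centred) for j in range(3)]
         for i in range(3)]
    lam = _smallest_eigenvalue(c)
    # The plane normal is the eigenvector for the smallest eigenvalue. It is
    # orthogonal to every row of (C - lam*I), so take the cross product of
    # the pair of rows that gives the longest vector.
    m = [[c[i][j] - (lam if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    normal = max((_cross(m[0], m[1]), _cross(m[0], m[2]), _cross(m[1], m[2])),
                 key=lambda v: sum(x * x for x in v))
    length = math.sqrt(sum(x * x for x in normal))
    if length < 1e-12:
        raise ValueError("best-fit plane is not unique for these points")
    normal = [x / length for x in normal]
    return sum(abs(sum(p[k] * normal[k] for k in range(3)))
               for p in centred) / n
```

A perfectly planar set of atoms gives 0, and the value grows as the atoms move out of the plane. For real work the compiled RDKit implementation is the one to use; this is only meant to show the shape of the calculation.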


Learning Python


It seems that Python is becoming the preferred language for scripting in science, so I wrote a getting started page for Chemists. Several people have since pointed out a couple of resources that may be useful, in particular Rosalind.

Rosalind is a platform for learning bioinformatics and programming through problem solving.

It looks like an excellent starting point for newcomers and more experienced programmers; whilst focussed on bioinformatics, the exercises are useful for all disciplines.

For chemists chempython looks to be a very useful resource.


DataWarrior review


DataWarrior is a data analysis tool that understands chemistry; it provides an efficient way to search, sort and analyse structure-activity data. DataWarrior was developed at Actelion, where it is highly integrated into the drug discovery platform. In 2014 it was decided to release DataWarrior to the public as a stand-alone tool without the integration layer. DataWarrior is a Java application and thus is cross-platform.

I’ve written a review on my initial impressions.


OpenEye Toolkits Updated


OEToolkits 2014.Jun This release of the OpenEye toolkits is focused on stability and new platform support. The last release, 2014.Feb, was a major feature release introducing numerous new features. This release focused on fixing many bugs and improving the overall stability of the OpenEye toolkits.

There is still a major new feature being added in this release:

FreeForm API added to Szybki TK

Mac Users should note this release will be the last release to support OSX 10.7.


ChemAxon 6.3 released


ChemAxon have just announced the release of version 6.3.

This release includes several new features including the ability to draw and analyze complex patent Markush structures, display Markush hierarchy, work with R-Groups and enumerate Markush structures.

There is a new aqueous Solubility Predictor, which is based on the topology of the input molecules but also calculates the pH dependence and the solubility at a desired pH level.

IUPAC name conversion now supports Japanese names as well as the existing English and Chinese names, even if mixed in the same document, so you can extract all the chemistry from documents in these languages.

There are also updates for Marvin JS 6.3, Standardizer & Structure Checker 6.3, Instant JChem 6.3, and Compound Registration 6.3.


Lilly MedChem Rules


In late 2012 Robert Bruns and Ian Watson published a paper entitled Rules for Identifying Potentially Reactive or Promiscuous Compounds. These 275 rules encapsulated 18 years of Drug Discovery experience and are used to identify potentially troublesome molecules.

The code to implement these rules was kindly made available by Ian Watson on GitHub. Unfortunately my initial attempts to compile it failed, but Matt was able to provide a patch to compile under Mac OS X (Mavericks) using Clang. Whilst this would have been sufficient, he then went the extra step and made it available via Homebrew.

You can read more here.


Python, Chemistry and a Mac 1


After I posted the page on setting up a Mac for Cheminformatics I was asked if I could do something similar for writing chemistry (or Science in general) Python scripts on a Mac. So I’ve written a “How to” page on setting up your Mac to use the iPython notebook and write simple scripts that use Pybel to access OpenBabel.

The page is here Python, Chemistry and a Mac 1, and I’ll probably add more pages/scripts in the future.


Cheminformatics on a Mac


I’ve recently needed to set up a new Mac and I realised that the current installation process for all the applications, tools, chemistry toolboxes, and associated dependencies was unmanageable. I have a mixture of apps that I have compiled myself, others for which I simply used the precompiled binaries, others from MacPorts etc.

I decided to write a detailed account of the process of installing a number of toolkits and packages using Homebrew and PIP.

You can read the full account here in the hints and tutorials.

I’d be delighted to hear of any comments or suggestions for addition.


SMARTS viewer and editor


SMILES (Simplified Molecular Input Line Entry System) is a simple yet comprehensive chemical language in which molecules and reactions can be specified using ASCII characters representing atom and bond symbols. This system is compact and human readable which has made it an attractive way to store chemical information within a database.

Some examples

  • Ethanol CCO
  • Cyclohexane C1CCCCC1
  • Nicotine CN1CCC[C@H]1c2cccnc2
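
To show how little is needed to start reading these strings, here is a minimal sketch that counts the heavy (non-hydrogen) atoms in a SMILES string with a regular expression. It is an illustration only and not a real SMILES parser: the function name is hypothetical, it assumes well-formed input, and it only recognises bracket atoms plus the usual unbracketed atoms (B, C, N, O, P, S, F, Cl, Br, I and their aromatic lower-case forms). For anything serious use a toolkit such as OpenBabel or RDKit.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter atoms
# (upper case aliphatic, lower case aromatic). Ring-closure digits, bond
# symbols and parentheses are simply ignored.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")

# Explicit hydrogens written as bracket atoms, e.g. [H], [2H] or [H+].
_BRACKET_H = re.compile(r"\[\d*H[+-]?\d*\]")

def count_heavy_atoms(smiles):
    """Count non-hydrogen atoms in a (well-formed) SMILES string."""
    tokens = _ATOM.findall(smiles)
    return sum(1 for token in tokens if not _BRACKET_H.fullmatch(token))

if __name__ == "__main__":
    for name, smiles in [("Ethanol", "CCO"),
                         ("Cyclohexane", "C1CCCCC1"),
                         ("Nicotine", "CN1CCC[C@H]1c2cccnc2")]:
        print(name, count_heavy_atoms(smiles))
```

Note that the chiral centre in nicotine, written as the bracket atom [C@H], is counted as one carbon; the hydrogen inside the brackets belongs to that atom.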

In order to search for specific sub-structures it is necessary to create a query that describes the pattern of atoms and bonds (subgraph) required within the molecule (graph). SMARTS is a language that allows you to specify substructures using rules that are straightforward extensions of SMILES. That said, complex queries can be challenging to interpret, which is why the SMARTS viewer and SMARTS editor from BioSolveIT, two tools developed by Karen Schomburg and Lars Wetzer at the Center for Bioinformatics at the University of Hamburg, are so valuable.

The tools are provided free until June 30th 2014

K. Schomburg, H.-C. Ehrlich, K. Stierand, M.Rarey From Structure Diagrams to Visual Chemical Patterns J. Chem. Inf. Model., 2010, 50 (9), pp 1529-1535

K. Schomburg, L. Wetzer, M. Rarey Interactive Design of generic chemical patterns Drug Discov Today (2013)



We are starting to see companies exploit the client server model in bringing ever more sophisticated scientific applications to the iPad.

Asteris is a joint development from Optibrium, the creators of StarDrop, and Integrated Chemistry Design, who created Chirys Draw. Asteris uses Chirys Draw’s touch interface to design novel molecules and then applies StarDrop’s predictive modeling power. Guided by the Glowing Molecule™ visualization, instant feedback dramatically reduces the time it takes you to identify high quality compound designs. Using Asteris you can calculate a range of simple “core properties” and ADME properties, including solubility, hERG inhibition and CNS penetration, using rigorously validated models from the StarDrop platform.

  • Molecular Weight
  • Number of rotatable bonds
  • Flexibility
  • Number of hydrogen bond donors
  • Number of hydrogen bond acceptors
  • Topological polar surface area.
  • logP
  • logS
  • logS7.4
  • logD
  • 2C9 pKi
  • hERGpIC50
  • BBB log([brain]:[blood])
  • BBB category
  • HIA category
  • P-gp category
  • 2D6 affinity category
  • PPB90 category

All of the predictions are calculated using StarDrop’s ADME QSAR module. You will need to be connected to the internet to perform these calculations using the secure Asteris cloud server. Alternatively, you can run the calculations on your own server with the “Enterprise” edition.

All communications with the server use industry-standard SSL encryption. No compound structures or data are stored on the server. Calculate "core properties" for an unlimited number of molecules for free. Calculate ADME properties for 20 new compounds each month, free of charge. Additional ADME property calculations can be purchased via an in-app purchase.

There are demo videos on the support site.


Diversity Genie released


Diversity Genie is a small but powerful utility to analyze datasets of small organic molecules. Its features include:

  • Calculation and comparison of diversity of chemical sets
  • Ability to handle sets of millions of molecules
  • Sorting, slicing, and merging large SD files
  • Conversion between SMILES, InChI, and SDF formats
  • Filtering based on property values and structural uniqueness
  • Computation of 2D and 3D atomic coordinates
  • Addition/Removal of implicit hydrogens
  • Computation of molecular properties such as molecular weight, number of rotatable bonds, number of HBD, HBA, as well as other descriptors
  • Export and import of data to/from CSV files
  • Data visualization
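
I don't know which diversity measure Diversity Genie uses internally, but a common way to put a number on the diversity of a set is the average pairwise Tanimoto distance between molecular fingerprints. The sketch below assumes each fingerprint is already available as a Python set of "on" bit positions (generating the fingerprints needs a cheminformatics toolkit and is not shown), and the function names are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of 'on' fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0

def mean_pairwise_diversity(fingerprints):
    """Average Tanimoto distance (1 - similarity) over all pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    # Toy fingerprints, not derived from real molecules.
    fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
    print(round(mean_pairwise_diversity(fps), 3))
```

The all-pairs loop scales with the square of the number of molecules, so a tool that handles millions of structures presumably uses something smarter than this, such as sampling or clustering.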

OEChem Updates


On the 10th anniversary the OEChem toolkit from OpenEye has been updated:

  • Added support for OSX 10.9 Mavericks.
  • The next toolkit release, 2014.Jun, will be the last release to support OSX 10.7.
  • This release will be the last release to support OSX 10.6.
  • The next toolkit release, 2014.Jun, will be the last release to support 64-bit Ubuntu 10.04.
  • GCC 4.8.2 support added for RHEL6. GCC 4.8.1 had a bug that made it impossible to compile OpenEye header files. Please use 4.8.2+.
  • Experimental support for Python 3.3 added.

Scripting Vortex to access UniChem


UniChem is a new web resource provided by the EBI. It is a 'Unified Chemical Identifier' system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between databases. Currently UniChem contains data from 21 different data sources.

This script, originally created by Sune Askjær, first calculates the InChIKey for molecules in a workspace and then uses UniChem to search for information in multiple databases. It then provides a summary and a link to a locally generated summary table.


Full details are here Scripting Vortex 18.
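
The Vortex script itself is on the linked page, but the general pattern of looking something up by InChIKey can be sketched in a few lines of Python. This is not the Vortex script: the function names are hypothetical, the base URL in the example is a placeholder rather than the real UniChem address, and the only thing checked is that the key has the standard 27-character layout (14 letters, hyphen, 10 letters, hyphen, 1 letter) before a lookup URL is built.

```python
import re
from urllib.parse import quote

# Standard InChIKey layout: 14 upper-case letters, 10 upper-case letters
# and a single final letter, separated by hyphens.
_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

def is_inchikey(text):
    """True if text has the layout of a standard InChIKey."""
    return _INCHIKEY.fullmatch(text) is not None

def lookup_url(inchikey, base_url):
    """Build a lookup URL of the form <base_url>/<inchikey>.

    base_url is supplied by the caller; the address used in the example
    below is a placeholder, not the real service endpoint.
    """
    if not is_inchikey(inchikey):
        raise ValueError("not a valid InChIKey: %r" % inchikey)
    return base_url.rstrip("/") + "/" + quote(inchikey)

if __name__ == "__main__":
    print(lookup_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
                     "https://example.org/unichem/inchikey"))
```

Validating the key first avoids sending malformed queries, which is useful when the keys come from a column in a workspace that may contain blanks or failed calculations.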


Scripting Vortex 17 tutorial


In the tutorial Scripting Vortex 15 I showed how it is possible to create a contextual script for Vortex that downloads a specific PDB file. A FlexAlign Vortex script then identifies the structure column, gets the SMILES string of the selected molecule, generates a 3D structure and uses Flex Align to do a one-shot flexalign between the ligand in the system in MOE and the incoming ligand.

While this is useful if you have similar structures (perhaps analogues in a series) there will certainly be situations where it may be preferable to dock the new ligand into the binding site. The Scripting Vortex 17 tutorial describes how to achieve this.


A review of FAst MEtabolizer (FAME)


Whilst much computational work is undertaken to support library design, virtual screening, hit selection and affinity optimisation, the reality is that the most challenging issues to resolve in drug discovery often revolve around absorption, distribution, metabolism and excretion (ADME). Whilst we can measure the levels of parent drug in various media, tracking metabolic fate can often be a considerably more difficult proposition requiring significant resources. For this reason prediction of sites of metabolism has become the subject of current interest.

FAME DOI is a collection of random forest models trained on a comprehensive and highly diverse data set of 20,000 small molecules annotated with their experimentally determined sites of metabolism taken from multiple species (rat, dog and human). In addition dedicated models are available to predict sites of metabolism of phase I and II processes.


FAME offers a high performance prediction of sites of metabolism mediated by a wide variety of mechanisms.

The full review is available here

There is a list of software reviews here.


Marvin 6.1.5 has been released

Marvin has been updated

Bug fix: Java Web Start did not run on Macintosh computers.

It can be downloaded from here

Note, many of you have bumped into the problem where the Gatekeeper security of OS X blocks the launching of applications that were not downloaded from the Mac App Store. The solution is to modify the default settings of Gatekeeper:

  • 'Apple > System Preferences > Security & Privacy '
  • In the 'General' section the setting of ' Allow applications downloaded from :' should be set to ' Anywhere'

After this you will not get the "damaged dmg" popup and you can install the downloaded dmg. After installation it is probably a good idea to reset the Security settings.

If you are using Java applets it is probably worth reading this article

Apple have just introduced some new security settings in Safari for Java. In an average browser, a Java Applet must be signed before it can touch your file system. With the new security update of Safari the Applet must be trusted as well. This means that you have to allow the Applet to read and write your file system. Marvin starts by accessing some files on your computer, which means that it might not start without this permission or might not behave correctly.




Ever had problems with an unusually formatted PDB file? PDBinout is a file conversion tool for PDB files that might interest you. It was created by Tomasz Woźniak at the Laboratory of Structural Chemistry of Nucleic Acids, Institute of Bioorganic Chemistry, Polish Academy of Sciences

PDB format is the most commonly used by various programs to define three-dimensional structure of biomolecules. Those programs however, often use different versions of this format. Therefore, it is often necessary to write own re-formatting scripts or change files manually, which makes PDB files less convenient to use. There are only few tools allowing to change one or two versions of PDB format into another and no comprehensive approach for unifying PDB format was developed. Here we present an open-source, Python-based tool PDBinout for processing and conversion of various versions of PDB file format for biostructural applications. Moreover, PDBinout allows to create one’s own PDB versions.

The download also includes a tutorial.
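
To give an idea of why small format differences cause so much trouble, here is a minimal sketch of reading one ATOM/HETATM record. It is not taken from PDBinout: the function name is hypothetical, and it assumes the standard fixed-column layout of the wwPDB format (serial in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, coordinates in 31-54, element in 77-78). Because fields are located by column position rather than by whitespace, a file that shifts any field by a single character will be misread.

```python
def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record into a dict.

    Returns None for any other record type. Column positions follow the
    standard PDB layout; variant formats may place fields elsewhere.
    """
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        return None
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        # Older files often omit the element columns; this is then "".
        "element": line[76:78].strip(),
    }

def read_atoms(lines):
    """Collect the parsed ATOM/HETATM records from an iterable of lines."""
    atoms = []
    for line in lines:
        atom = parse_atom_line(line)
        if atom is not None:
            atoms.append(atom)
    return atoms
```

With a file open as handle, read_atoms(handle) returns a list of dictionaries, one per atom. A converter such as PDBinout has to deal with everything this sketch ignores, for example differing atom naming conventions and non-standard column usage.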

Reference Woźniak T. and Adamiak R.W. (2013) Personalization of structural PDB files, Acta Biochimica Polonica 60, Paper in Press


pKa Prospector


OpenEye have just announced the release of pKa Prospector v1.0, a database of high quality experimental pKa determinations. The ionisation state of a drug molecule can have profound effects on affinity, dissolution, absorption, distribution, metabolism and off-target activity. The ability to predict pKa is often compromised by the lack of relevant experimental data; pKa Prospector is intended to address that issue.

The built-in experimental pKa database was compiled by Tony Slater of pKaData Limited from a collection of IUPAC sources. Each measurement has been individually verified, curated, and assigned a metric of quality. There are more than 30,000 experiments across 12,000 molecules represented. The database is particularly relevant for medicinal chemistry due to the strong preponderance of room temperature aqueous measurements, the many molecules with multiple experimental records, and the presence of over three hundred different heterocycles.

It is also possible to add additional experimental results and have them integrated into the application, thus expanding the chemical space covered. The search uses rooted maximum common substructure (MCS) with "electronically-aware" scoring; alternatively the database can be searched by similarity or substructure. Ionizable groups are automatically identified and highlighted.
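
To see why an accurate pKa matters so much, it helps to put numbers on the ionisation state. The sketch below is not part of pKa Prospector; it simply applies the Henderson-Hasselbalch relationship to a single ionisable group, and the function name is my own. For a base, the pKa passed in is that of the conjugate acid.

```python
def fraction_ionised(pka, ph, acid=True):
    """Fraction of molecules carrying a charge at the given pH.

    Uses the Henderson-Hasselbalch relationship for one ionisable group.
    For acid=False, pka is the pKa of the conjugate acid of the base.
    """
    if acid:
        # HA <=> A- + H+ : the ionised form is the conjugate base A-.
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    # BH+ <=> B + H+ : the ionised form is the protonated base BH+.
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

if __name__ == "__main__":
    # Hypothetical example values: an acid with pKa 4.4 and a base whose
    # conjugate acid has pKa 9.4, both at pH 7.4.
    print(round(fraction_ionised(4.4, 7.4, acid=True), 4))
    print(round(fraction_ionised(9.4, 7.4, acid=False), 4))
```

Because the dependence is logarithmic, an error of one pKa unit changes the ratio of ionised to neutral species tenfold, which is why good experimental data is so valuable.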


Bug in ChemBioDraw.


There was a blog entry on In the Pipeline about a bug in ChemDraw. Actually this has been known for a while (and present in previous versions) but it seems it still has not been fixed in the latest version of ChemBioDraw 13 on the Mac. As you can see in the image below, including explicit hydrogens in your structure significantly impacts the calculated LogP. Whilst people don’t often add explicit atoms to phenyl rings (except perhaps in SAR studies), they often add them to heteroatoms.


At the moment there is no bug fix and no date set for a fix to any version of ChemBioDraw; the only approach is to avoid adding explicit hydrogens to structures if you want to calculate LogP. I’ve looked at a number of other applications and there seem to be no issues with ChemDoodle, Elemental, Marvin or OpenBabel.


Marvin 6.1.2 released


New features and improvements

  • MarvinSketch Dialog
  • 'Zoom to scaffold' checkbox option has been added to the "Preferences>Save/Load" tab. Documentation
  • Structure Checker
  • External structure checker configuration file URL can be set via Java System Property.


  • Editing
  • Electron-flow arrow could not be drawn from the A-B bond to the incipient A-C bond of an A-B-C structure. Forum
  • Import/Export
  • MolInputStream and MolImporter could have different format options.
  • MolImporter did not close its inputstream when an exception was thrown in the constructor.
  • Molecule type property was allowed in SDF, CSSDF export.
  • The coordinates of the sequence residue imported from SCSR MOL files were wrong if the residue had three attachment points.
  • Color and text format of atom label is exported to CDX and imported from CDX and CDXML. Forum
  • Graphical brackets were not imported from CDX files.
  • Gaussian Z-matrix input format
  • Command line, title line, and extra input properties were not exported to Gaussian Z-matrix input format. Forum
  • Clean 2D
  • Cleaning of position variation bonds could create overlapping bonds.
  • Cleaning of bridged systems could result in overlapping atoms. Forum
  • Calculations
  • Topology Analysis
  • Missing method has been added: TopologyAnalyserPlugin.getFsp3(). API Documentation
  • logD
  • New logD training documentation has been added. Documentation
  • Structure Checker
  • Fixer options in MarvinSketch are updated with newly defined settings.
  • External checkers can be loaded from JAR file in case the JAR file contains a space.

StarDrop 5.4

StarDrop was recently updated to version 5.4. This brings an update to the virtual library design module and scaffold based design, and there have also been improvements to the plotting and data visualisation.

There are now seven optional plugins with three exciting new options.

Derek Nexus™ - Knowledge based toxicity prediction

The new Derek Nexus module for StarDrop provides Lhasa Limited's world-leading technology for knowledge-based prediction of key toxicities. Using data from published and donated (unpublished) sources, Derek Nexus identifies structure-toxicity relationships that alert you to the potential for your compounds to cause toxicity. The Derek Nexus module provides predictions of the likelihood of a compound causing toxicity in over 40 endpoints, including mutagenicity, hepatotoxicity and cardiotoxicity.

BIOSTER™ - A world of chemistry experience

BIOSTER is developed and updated in collaboration with Digital Chemistry and is available as an optional extension to StarDrop's Nova module. This combination enables you to quickly and easily search the comprehensive BIOSTER database to identify transformations that are relevant to your compounds. These can be automatically applied to generate novel structures with a high likelihood of biological activity and synthetic accessibility, prioritised against the property profile you require for your project. BIOSTER brings the collective experience of the chemistry community to help you to discover new active analogues of your compounds based on the tried and tested principle of isosterism. The BIOSTER module contains a unique compilation of over 20,000 precedented bioisosteric transformations, manually curated from the literature by Dr István Ujváry, complete with references to the original publications in which they are described.

torch3D™

The renamed torch3D module, using Cresset’s unique Field technology to understand and apply 3D Structure Activity Relationship (SAR), has been updated to include the latest version of Cresset’s XED force field providing insight into compounds’ 3D structures, biological activities and interactions.

These certainly significantly expand the potential utility of StarDrop, but note that these are not part of the standard install and may require additional licensing.