Macs in Chemistry

Insanely Great Science

artificial intelligence

AI in Chemistry literature


Some of the more popular pages on the site are the compilation of resources, a listing of open-source cheminformatics toolkits and a list of useful Python libraries for data science, so I thought I'd flag a recent post by Pat Walters on his Practical Cheminformatics blog listing some interesting machine learning publications from 2020.

His post AI in Drug Discovery 2020 is certainly an invaluable starting point.




OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.

You can read about it in this publication DOI.

All code is available on GitHub


  • Modern NVIDIA GPU, compute capability 3.5 or newer.
  • Python 3.5 or newer (we recommend Anaconda distribution)
  • CUDA 9.0 or newer

Required Python packages: numpy, pyyaml, scipy, ipython, mkl, scikit-learn, six, pytest, pytest-cov
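Before installing, it can be handy to check which of these requirements are already in place. The sketch below is my own, not part of OpenChem: it only checks the Python version and whether each package can be found, and the import names (for example `yaml` for pyyaml and `sklearn` for scikit-learn) are my assumption of the usual ones. It does not check the GPU or CUDA.

```python
import importlib.util
import sys

# Package name -> import name. The import names are assumed to be the usual
# ones; "mkl" is left out because it is a native library and may not be
# importable as a Python module.
PACKAGE_TO_MODULE = {
    "numpy": "numpy",
    "pyyaml": "yaml",
    "scipy": "scipy",
    "ipython": "IPython",
    "scikit-learn": "sklearn",
    "six": "six",
    "pytest": "pytest",
    "pytest-cov": "pytest_cov",
}


def python_is_new_enough(minimum=(3, 5)):
    """True if the running interpreter is at least the given (major, minor)."""
    return sys.version_info[:2] >= minimum


def missing_packages(package_to_module=PACKAGE_TO_MODULE):
    """Return the package names whose module cannot be found."""
    return [
        package
        for package, module in package_to_module.items()
        if importlib.util.find_spec(module) is None
    ]


print("Python 3.5 or newer:", python_is_new_enough())
print("Missing packages:", missing_packages())
```

Anything reported as missing can then be installed with conda or pip before trying OpenChem itself.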

The software is licensed under the MIT license


AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning


This looks very interesting DOI.

We present the open-source AiZynthFinder software that can be readily used in retrosynthetic planning. The algorithm is based on a Monte Carlo tree search that recursively breaks down a molecule to purchasable precursors. The tree search is guided by an artificial neural network policy that suggests possible precursors by utilizing a library of known reaction templates. The software is fast and can typically find a solution in less than 10 s and perform a complete search in less than 1 min.
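To get a feel for the idea of recursively breaking a target down to purchasable precursors, here is a toy sketch. It is not AiZynthFinder code and it is not a Monte Carlo tree search: it is a plain depth-first search that tries the highest-scoring suggestion first. The molecule names, the scores standing in for the neural network policy, and the stock list are all made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-in for a template library plus policy network:
# each target maps to candidate precursor sets with a made-up score.
TOY_POLICY: Dict[str, List[Tuple[float, List[str]]]] = {
    "amide": [(0.9, ["acid", "amine"]), (0.1, ["nitrile"])],
    "acid": [(0.8, ["ester"])],
}

# Hypothetical stock of purchasable building blocks.
TOY_STOCK: Set[str] = {"amine", "ester"}

Route = List[Tuple[str, List[str]]]


def find_route(target: str, stock: Set[str],
               policy: Dict[str, List[Tuple[float, List[str]]]],
               max_depth: int = 3) -> Optional[Route]:
    """Return a list of (product, precursors) steps, or None if no route is found."""
    if target in stock:
        return []  # already purchasable, no steps needed
    if max_depth == 0:
        return None  # give up on this branch
    # Try the suggestions in order of decreasing score.
    candidates = sorted(policy.get(target, []), key=lambda c: c[0], reverse=True)
    for _score, precursors in candidates:
        steps: Route = [(target, precursors)]
        for precursor in precursors:
            sub_route = find_route(precursor, stock, policy, max_depth - 1)
            if sub_route is None:
                break  # one precursor cannot be made, try the next suggestion
            steps.extend(sub_route)
        else:
            return steps  # every precursor was resolved
    return None


route = find_route("amide", TOY_STOCK, TOY_POLICY)
for product, precursors in route:
    print(product, "<=", " + ".join(precursors))
```

The real software replaces the lookup table with a trained network and reaction templates, and explores the tree much more cleverly than this greedy search.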

Source code is on GitHub

Tested under macOS Catalina

Requires RDKit, Tensorflow, graphviz

It can then be installed using pip.

The software is licensed under the MIT license


Discovery of new antibacterials using artificial intelligence


Given the huge popularity of the 3rd Artificial Intelligence in Chemistry Meeting earlier this week I thought I'd flag another meeting that might be of interest.

The Global Antibiotics Research and Development Partnership are organising a meeting entitled "Discovery of new antibacterials using artificial intelligence". You can register here; it looks like it will be an interesting session.



3rd AI in Chemistry Posters


The heavily oversubscribed 3rd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting will be taking place next week, 28th-29th September 2020 (Twitter hashtag #AIChem20). There is an accompanying poster session, and there is a chance to talk to the poster presenters in the breakout rooms at the end of each day (you will need the latest version of Zoom, 5.3.0).

Most of the posters are now available for viewing on Twitter, so you can always have a browse and ask questions there even if you won't be at the meeting: #AIChem20poster.

Below is a table containing all posters.

Poster Number | Name | Title | Twitter link
P01 | Antreas Afantitis | Enalos cheminformatics tools: development of a de novo drug design module | View on twitter
P02 | Nurlybek Amangeldiuly | Transfer learning with graph neural networks for protein-ligand binding kinetics prediction | View on twitter
P03 | Andy Sode Anker | Characterising the atomic structure of mono-metallic nanoparticles from x-ray scattering data using conditional generative models | View on twitter
P04 | Jenna Bilbrey | A look inside the black box: using graph-theoretical descriptors for the post-hoc interpretation of neural networks | View on twitter
P05 | Nicolas Bosc | MAIP: a prediction platform for predicting blood-stage malaria inhibitors | View on twitter
P06 | Xiaojing Cong | Receptor-ligand prediction by proteochemometric modeling: an application to G protein-coupled olfactory receptors | View on twitter
P07 | Simon Durr | EVOLVE: a genetic algorithm to predict thermostability | View on twitter
P08 | Umberto Esposito | Building a connected data pipeline to target drug development challenges |
P09 | Benedek Fabian | MolBERT: molecular representation learning with advanced language models and useful auxiliary tasks | View on twitter
P10 | Miguel Garcia-Ortegon | Improving VAE molecular representations by tailoring them to predict docking poses and scores | View on twitter
P11 | Wenhao Gao | Can we synthesize molecules proposed by generative models | View on twitter
P12 | Helena Gaspar | Proteochemometric models using multiple sequence alignments and a SentencePiece-based masked language model: application to CYP and kinome selectivity modelling | View on twitter
P13 | Ed Griffen | An explainable AI system for medicinal chemists | View on twitter
P14 | Ed Griffen | Chemists: AI is here, unite to get the benefits | View on twitter
P15 | Thomas Hadfield | Explicit incorporation of structural information into a fragment elaboration model via deep reinforcement learning | View on twitter
P16 | Hans Hanley | GENerateZ: designing anticancer drugs using transcriptomic data, genetic algorithms, and variational autoencoder | View on twitter
P17 | Fergus Imrie | Generating property-matched decoy molecules using deep learning | View on twitter
P18 | Kjell Jorner | Uniform quantitative predictive modelling for route design | View on twitter
P19 | Itai Levin | Computationally assisted synthesis planning for hybrid chemoenzymatic pathways | View on twitter
P20 | Timur Madzhidov | Deep conditional variational autoencoder for reaction conditions prediction | View on twitter
P21 | Gergely Makara | AI-assisted lead optimization with derivatization design | View on twitter
P22 | Neann Mathai | Performance and scope of a similarity-based and a random forest-based machine learning approach for small-molecule target prediction | View on twitter
P23 | Janosh Menke | Enhancing molecular fingerprints using neural networks | View on twitter
P24 | Juan Carlos Mobarec | Evolutionary chemistry for the design of desired pharmacological profiles | View on twitter
P25 | Rohit Modee | Neural network potentials for representing potential energy surface and their applicability for geometry optimization | View on twitter
P26 | Joseph Morrone | Challenges and progress in combining docking programs with deep neural networks | View on twitter
P27 | Eva Nittinger | Non-additivity in public and inhouse data and its influence on ML performance | View on twitter
P28 | Ferruccio Palazzesi | Integrating multi task graph convolutional neural network with a deep generative model | View on twitter
P29 | Yashaswi Pathak | Deep learning enabled inorganic material generator | View on twitter
P30 | Quentin Perron | Integrating data-driven computer-aided synthetic planning with generative AI |
P31 | Daniel Probst | Classification of chemical reactions through NLP-inspired fingerprinting | View on twitter
P32 | Mikolaj Sacha | Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits | View on twitter
P33 | withdrawn
P34 | Jenke Scheen | Data-driven estimation of optimally-designed perturbation networks in relative alchemical free energy calculations | View on twitter
P35 | Philippe Schwaller | RXNMapper: unsupervised attention-guided atom-mapping | View on twitter
P36 | Matthew Segall | Imputation versus prediction and applications in drug discovery | View on twitter
P37 | Vishnu Sresht | Can generative models learn privileged substructures and identify new bioisosteres? | View on twitter
P38 | Gergely Takács | Analysis of commercial and public compound databases by self-organizing maps | View on twitter
P39 | Morgan Thomas | Towards integrating deep generative models with structure-based design | View on twitter
P40 | Hao Tian | What is hidden behind allostery? An integrated framework to decipher key components in AuLOV dimerization | View on twitter
P41 | Alain Vaucher | Learning how to do chemical reactions from data | View on twitter
P42 | Alexander van Teijlingen | Beyond tripeptides - two-step active machine learning for very large datasets | View on twitter
P43 | James Wallace | eApps – enabling a predict first culture for computational medicinal chemistry | View on twitter
P44 | withdrawn
P45 | Yuanqing Wang | Bayesian active drug discovery via deep graph kernel learning | View on twitter
P46 | Robbie Warringham | DigitalGlassware: structuring and contextualising chemical outcomes for faster discovery | View on twitter
P47 | withdrawn
P48 | Jerome Wicker | AIScape: a machine learning platform for activity and ADME predictions | View on twitter

Some of the presenters have also recorded 2-minute lightning presentations describing their work; these are available on the RSC CICAG YouTube channel.

Day 1 (odd numbered posters)

Day 2 (even numbered posters)



3rd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry


The heavily oversubscribed 3rd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting will be taking place next week, 28th-29th September 2020 (Twitter hashtag #AIChem20).

This is now an online virtual event, which has required an immense amount of reorganisation behind the scenes and significant expenditure. This would not have been possible without the generous support of the event sponsors, AstraZeneca and MSD, and the exhibitors CCDC, Concept Life Sciences, IKTOS, Liverpool ChiroChem, Mcule and o2h discovery.

Some of the exhibitors have kindly provided videos describing their work, so why not have a browse?


Liverpool ChiroChem


o2h discovery

If you are at the meeting they will also be able to talk to you directly at the breakout session at the end of each day.


AI in Chemistry Conference


The heavily oversubscribed 3rd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting will be taking place 28th-29th September 2020 (Twitter hashtag #AIChem20). This has been converted to a virtual event and those registered will be getting more details soon.

There is an accompanying poster session; the posters are being hosted on Twitter and you can see them all using the hashtag AIChem20posters. A number of the poster presenters have also created 2-minute lightning presentations, which will be uploaded to YouTube once the last few presentations have been included.

If you are registered and find you can no longer attend, please send an e-mail to the BMCS Secretariat promptly, so that your place can be allocated to one of those on the waiting list.


AI3 Science Discovery Network+ YouTube Channel


The AI3SD Network+ (Artificial Intelligence and Augmented Intelligence for Automated Investigations for Scientific Discovery) YouTube channel is now up and running.

The Network+ is funded by EPSRC and hosted by the University of Southampton, and aims to bring together researchers looking to show how cutting-edge artificial and augmented intelligence technologies can be used to push the boundaries of scientific discovery.

  1. Drug Repositioning for COVID-19 - Professor John Overington (Medicines Discovery Catapult)
  2. InChI: Measuring the Molecules - Professor Jonathan Goodman (University of Cambridge)
  3. Design Fiction as a Method and why we might use it to consider AI - Dr Naomi Jacobs (Lancaster University)
  4. Neural Networks and Explanatory Opacity - Dr Will McNeill (University of Southampton)
  5. Dimensionality in Chemistry: Using multidimensional data for machine learning - Dr Ella Gale (University of Bristol)

Swift for TensorFlow (and other things).


After creating MolSeeker and iBabel4 I've been investigating the use of Swift and in particular the open-source use. provides a nice introduction and overview, it also highlights the Google Summer of Code Swift projects which are a fabulous way for students to get involved.

The Google Swift for TensorFlow group have been very active, and Tryolabs have recently posted a detailed summary, including a comparison with other languages.

Two years ago, a small team at Google started working on making Swift the first mainstream language with first-class language-integrated differentiable programming capabilities. The scope and initial results of the project have been remarkable, and general public usability is not very far off.

They have now provided support for Jupyter notebooks

There is also an interesting blog post here

IBM also seem to be using Swift, and are highlighting how it can be used to leverage Watson.

Developers can take advantage of the Watson Developer Cloud’s Swift SDK to easily build Watson-powered applications for iOS or Linux platforms. Leverage the power of Watson’s advanced artificial intelligence, machine learning, and deep learning techniques to understand unstructured data and engage with users in new ways.

Since Swift is a relatively new language it is worth looking at the ongoing evolution.


Jupyter notebook to access IBM RXN AI-assisted retrosynthesis


A Python wrapper for the IBM RXN API has been released, available on GitHub.

To install

pip install rxn4chemistry

You will need to register and get an API key from here.

This demo shows how to use it for retrosynthesis ideas.


The page also includes links to download the notebook.


Jupyter notebook to access IBM RXN API


A Python wrapper for the IBM RXN API has been released, available on GitHub.

To install

pip install rxn4chemistry

You will need to register and get an API key from here.

A simple demo using a Jupyter Notebook


This is going to be very useful.


The updated APIs of IBM RXN for Chemistry are now available.


IBM RXN is a free web service for predicting chemical reactions.

Whether it’s daily research activity or experiments for fun, IBM RXN can help you predict chemical reaction outcomes or design retrosynthesis in just seconds.

  • We provide a state-of-the-art trained artificial intelligence (AI) model that can be used in your daily research activities irrespective of the purpose
  • Use the prediction mode to open a project and invite collaborators to collectively plan complex synthesis.
  • Use the challenge mode to test your Organic Chemistry knowledge and prepare for class exams.
  • Design your retrosynthesis using either the automatic or the interactive mode. In the interactive mode, IBM RXN for Chemistry – just like an assistant – recommends disconnections and you choose.

For testing the synthesizability of a molecule or for digitizing recipes, use the RXN APIs whenever you need an AI-driven organic chemistry assistant in your code.

The full documentation is here.


Macs and CUDA


One of the highlights for me at the recent 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry in Cambridge was the work of Adrian Roitberg and Olexandr Isayev et al on Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning DOI.

Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.

The presentation was really compelling and looks like an example where AI can be truly transformational. The good news is that the code is all freely available on GitHub; the bad news is that it "Works only under Ubuntu variants of Linux with a NVIDIA GPU" and the Python binaries are built for Python 3.6 and CUDA 9.2.

In the past I would have stopped there, but with the increasing number of external GPUs and an NVIDIA CUDA Installation Guide for Mac OS X I'm wondering if there might be a path forward. I'd be very interested to hear about experiences with external GPUs with NVIDIA graphics cards and using the CUDA toolkit on a Mac.


Olexandr emailed me to mention that they have a pure Python version. This will run on a Mac, however there is no GPU acceleration.

TorchANI is a pytorch implementation of ANI. It is currently under alpha release, which means, the API is not stable yet. If you find a bug of TorchANI, or have some feature request, feel free to open an issue on GitHub, or send us a pull requests

Also stumbled across the paper

Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks


2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry


The 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry is now over, after two intensive days of presentations and posters. Many thanks to all who took part and made it such a successful event.

Special mention to the Poster prize winners.

P17 by Jenke Scheen of the University of Edinburgh Entitled: "Improving the accuracy of alchemical free energy methods by learning correction terms for binding energy estimates"

P6 by Adam Green of the University of Leeds Entitled: "Activity-directed discovery of inhibitors of the p53/MDM2 interaction: towards autonomous functional molecule discovery"

P3 by Ya Chen of the University of Hamburg Entitled: "NP-Scout: machine learning approach for the identification of natural products and natural product-like compounds in large molecular databases"

If you want to browse through the Twitter feeds search for the #AIChem19 hashtag.

Many of the presentations are now available in pdf format on the meeting website.

We are already thinking about a possible 3rd meeting, and any feedback would be much appreciated.


A parallel Fortran framework for neural networks and deep learning


Since Fortran on a Mac is one of the most popular pages on the site I thought I'd mention this paper, submitted to ACM SIGPLAN Fortran Forum, that I've just come across.

A parallel Fortran framework for neural networks and deep learning DOI.

This paper describes neural-fortran, a parallel Fortran framework for neural networks and deep learning. It features a simple interface to construct feed-forward neural networks of arbitrary structure and size, several activation functions, and stochastic gradient descent as the default optimization algorithm. Neural-fortran also leverages the Fortran 2018 standard collective subroutines to achieve data-based parallelism on shared- or distributed-memory machines. First, I describe the implementation of neural networks with Fortran derived types, whole-array arithmetic, and collective sum and broadcast operations to achieve parallelism. Second, I demonstrate the use of neural-fortran in an example of recognizing hand-written digits from images. Finally, I evaluate the computational performance in both serial and parallel modes. Ease of use and computational performance are similar to an existing popular machine learning framework, making neural-fortran a viable candidate for further development and use in production.


2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry



I was just looking through the delegate registrations for the 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Meeting taking place in Cambridge, UK, 2nd to 3rd September 2019. We now have significantly more registrations than the first meeting. Participants are coming from 16 different countries, and whilst the UK and US predominate there are many participants from the rest of Europe and even some from Japan and Korea. There are 90 different organisations represented and I'm delighted to see there are over 20 student attendees, many from overseas. A number of students are presenting posters and the lineup of people taking part in the flash poster session can be found here.

Registration is still open for what looks like it will be another outstanding meeting.

A few people have said they are planning a visit to Cambridge for a holiday around the meeting and have asked for suggestions of things to do. Visit Cambridge is a good place to start.


Autocompletion with deep learning


This looks really interesting

TabNine is an autocompleter that helps you write code faster by adding a deep learning model which significantly improves suggestion quality. You can see videos at the link above.

There has been a lot of hype about deep learning in the past few years. Neural networks are state-of-the-art in many academic domains, and they have been deployed in production for tasks such as autonomous driving, speech synthesis, and adding dog ears to human faces. Yet developer tools have been slow to benefit from these advances

Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it. To achieve this goal, it learns complex behaviour, such as type inference in dynamically typed languages.
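To make the training objective concrete (predict each token from the tokens before it), here is a deliberately tiny sketch. It is not how TabNine works internally: instead of a deep learning model it uses a simple bigram frequency table that only looks at the single previous token, and the "training set" is a made-up handful of tokens.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for previous, following in zip(tokens, tokens[1:]):
        counts[previous][following] += 1
    return counts


def suggest(counts, previous, k=1):
    """Return up to k most frequent tokens seen after `previous`."""
    return [token for token, _ in counts.get(previous, Counter()).most_common(k)]


# Made-up training data: a few whitespace-separated code tokens.
corpus = "import numpy as np import pandas as pd import numpy as np".split()
model = train_bigram(corpus)

print(suggest(model, "import"))   # most common token after "import"
print(suggest(model, "as", k=2))  # two most common tokens after "as"
```

Even this toy shows why the quality of the training code matters: the model can only ever suggest what it has seen most often.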

An interesting idea, my only concern is the quality of code in the training set.


AI in Chemistry bursaries still available


The 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting is filling up fast, however there are still 6 bursaries unallocated. The closing date for applications is 15 July. The bursaries are available up to a value of £250, to support registration, travel and accommodation costs for PhD and post-doctoral applicants studying at European academic institutions.

You can find details here

Twitter hashtag - #AIChem19


Molecular Transformer


When this paper first appeared on the arXiv preprint server, "Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction", it generated considerable interest.

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
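Treating reaction prediction as translation means the SMILES strings first have to be split into tokens, much like words in a sentence. The sketch below shows one way this could be done; the regular expression is my own simplified assumption, not the tokenizer from the paper, and the esterification example is just for illustration.

```python
import re

# Simplified, assumed token pattern: bracket atoms, two-letter halogens,
# two-digit ring closures, single-letter atoms, digits, then bond and
# branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+\\/().:@~*$>]")


def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("could not tokenize: " + smiles)
    return tokens


# "Source sentence": acetic acid and ethanol; "target sentence": ethyl acetate.
source = "CC(=O)O.OCC"
target = "CC(=O)OCC"

print(" ".join(tokenize(source)), "->", " ".join(tokenize(target)))
```

A translation model would then be trained on many such source and target token sequences, exactly as it would be for pairs of sentences in two languages.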

I just noticed that it had been recently updated.

If you are interested in this exciting area of chemistry you might be interested to know the code is available on GitHub and the trained model is available online

One of the authors, Alpha Lee, is speaking at the 2nd Artificial Intelligence in Chemistry Meeting #AIChem19, 2nd to 3rd September 2019, Fitzwilliam College, Cambridge, UK. You can register for the meeting here if you would like to hear first hand about this technology.


The full lineup of speakers is here. Also remember there are bursaries available for the meeting.


Can we trust Published data?

I posted a poll on twitter

Looking at abstracts for the AI in Chemistry Meeting … many mine published data. The quality of the public data is obviously critical for good models. Is this something the AI community should be concerned about or get involved with to improve the quality of the literature?

The results are now in and, interestingly, despite nearly 2.5K impressions only 28 people voted. Of those who voted, the overwhelming majority feel that AI scientists should help to improve the quality of the literature.


The comments associated with the tweet are interesting. Certainly many machine learning models are robust enough to accommodate some poor data, but I think there is a deeper concern.

Elisabeth Bik has regularly flagged questionable publications; unfortunately these are not always detected before their influence has propagated through the literature.

For a very detailed example look at 5-HTTLPR: A POINTED REVIEW looking at an unusual version of the serotonin transporter gene 5-HTTLPR.

I've heard of many examples of scientists being unable to reproduce literature findings. Usually little happens; however, Amgen were able to reproduce only 6 out of 53 'landmark' studies and they published their findings.

How many times do scientists assume failure to reproduce published findings is their error?

There have been several studies looking at the possible causes of the failure to reproduce work. In 2011, an evaluation of 246 antibodies used in epigenetic studies found that one-quarter failed tests for specificity, meaning that they often bound to more than one target. Four antibodies were perfectly specific, but to the wrong target (see Reproducibility crisis: Blame it on the antibodies).

See also "The antibody horror show: an introductory guide for the perplexed" DOI

Colourful as this may appear, the outcomes for the community are uniformly grim, including badly damaged scientific careers, wasted public funding, and contaminated literature.

If you are mining literature data to predict novel drug targets then Caveat emptor.



Special Issue "Machine Learning with Python"


I was just sent details of a Special Issue "Machine Learning with Python" for the journal Information.

We live in this day and age where quintillions of bytes of data are generated and collected every day. Around the globe, researchers and companies are leveraging these vast amounts of data in countless application areas, ranging from drug discovery to improving transportation with self-driving cars. As we all know, Python evolved into the lingua franca of machine learning and artificial intelligence research over the last couple of years. What makes Python particularly attractive for us researchers is that it gives us access to a cohesive set of tools for scientific computing and is easy to teach and learn. Also, as a language that bridges many different technologies and different fields, Python fosters interdisciplinary collaboration. And besides making us more productive in our research, sharing tools we develop in Python has the potential to reach a wide audience and benefit the broader research community.

This special issue is now open for submission.


Can we trust published data


A Twitter poll: can we trust published data, and should the AI community be involved?

Looking at abstracts for … many mine published data. The quality of the public data is obviously critical for good models.


Swift for TensorFlow Models


This repository contains TensorFlow models written in Swift.

Swift for TensorFlow is a next-generation platform for machine learning, incorporating the latest research across machine learning, compilers, differentiable programming, systems design, and beyond. This is an early-stage project: it is not feature-complete nor production-ready, but it is ready for pioneers to try in projects, give feedback, and help shape the future!

This is the second public release of Swift for TensorFlow, available across Google Colaboratory, Linux, and macOS.


In which area is Artificial Intelligence likely to most impact Chemistry, the results are in


I ran a poll last week asking "In which area is Artificial Intelligence likely to most impact Chemistry?" And we now have the results.


Whilst Molecular Design was the most popular choice it was interesting to see that all options were well supported. This suggests that there are opportunities for artificial intelligence to have an impact in many facets of chemistry. I'm delighted to see this since this was part of the thinking behind the AI in Chemistry meeting and I think the line up of speakers will have something for everyone.

2nd RSC-BMCS / RSC-CICAG, Artificial Intelligence in Chemistry, Monday-Tuesday, 2nd to 3rd September 2019. Fitzwilliam College, Cambridge, UK. #AIChem19

Artificial Intelligence is presently experiencing a renaissance in development of new methods and practical applications to ongoing challenges in Chemistry. Following the success of the inaugural “Artificial Intelligence in Chemistry” meeting in 2018, we are pleased to announce that the Biological & Medicinal Chemistry Sector (BMCS) and Chemical Information & Computer Applications Group (CICAG) of the Royal Society of Chemistry are once again organising a conference to present the current efforts in applying these new methods. The meeting will be held over two days and will combine aspects of artificial intelligence and deep machine learning methods to applications in chemistry.

Programme (draft)

Monday, 2nd September
Registration, refreshments
Deep learning applied to ligand-based de novo design: a real-life lead optimization case study
Quentin Perron, IKTOS, France
A Turing test for molecular generators
Jacob Bush, GlaxoSmithKline, UK
Flash poster presentations
Refreshments, exhibition and posters
Presentation title to be confirmed
Keynote: Regina Barzilay, Massachusetts Institute of Technology, USA
Lunch, exhibition and posters
Artificial intelligence for predicting molecular Electrostatic Potentials (ESPs): a step towards developing ESP-guided knowledge-based scoring functions
Prakash Rathi, Astex Pharmaceuticals, UK
Molecular transformer for chemical reaction prediction and uncertainty estimation
Alpha Lee, University of Cambridge, UK
Drug discovery disrupted - quantum physics meets machine learning
Noor Shaker, GTN, UK
Refreshments, exhibition and posters
Application of AI in chemistry: where are we in drug design?
Christian Tyrchan, AstraZeneca, Sweden
Presentation title to be confirmed
Anthony Nicholls, OpenEye Scientific Software, USA
17.30 Close
18.45 Drinks reception
19.15 Conference dinner

Tuesday, 3rd September
09.00 Deep generative models for 3D compound design from fragment screens
Fergus Imrie, University of Oxford, UK
DeeplyTough: learning to structurally compare protein binding sites
Joshua Meyers, BenevolentAI, UK
Discovery of nanoporous materials for energy applications
Maciej Haranczyk, IMDEA Materials Institute, Spain
Refreshments, exhibition and posters
Deep learning for drug discovery
Keynote: David Koes, University of Pittsburgh, USA
Networking lunch, exhibition and posters
Presentation title to be confirmed
Olexandr Isayev, University of North Carolina at Chapel Hill, USA
Dreaming functional molecules with generative ML models
Christoph Kreisbeck, Kebotix, USA
Refreshments, exhibition and posters
Presentation title to be confirmed
Keynote: Adrian Roitberg, University of Florida, USA

You can get more information and register here


2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry


In June 2018 the first RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting was held in London. This proved to be enormously popular: there were more oral abstracts and poster submissions than we had space for, and it was so over-subscribed we could have filled a venue double the size.

Planning for the second meeting is now in full swing, and it will be held in Cambridge 2-3 September 2019.

Event : 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry
Dates : Monday-Tuesday, 2nd to 3rd September 2019
Place : Fitzwilliam College, Cambridge, UK
Websites : Event website, and RSC website.

Twitter #AIChem19


Applications for both oral and poster presentations are welcomed. Posters will be displayed throughout the day and applicants are asked if they wish to provide a two-minute flash oral presentation when submitting their abstract. The closing dates for submissions are:

  • 31st March for oral and
  • 5th July for poster

Full details can be found on the Event website.


GuacaMol, benchmarking models.


Comparison of different algorithms is an under-researched area, and this publication looks like a useful starting point.

GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.
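As a rough illustration of what "the ability to generate novel molecules" might mean in practice, here is a small sketch of two simple scores. These are my own simplified definitions, not the GuacaMol package's API, and they assume the SMILES strings have already been canonicalised (a real benchmark would use a toolkit such as RDKit for that). The molecules are made up.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    if not generated:
        raise ValueError("no generated molecules")
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules that are absent from the training set."""
    unique = set(generated)
    if not unique:
        raise ValueError("no generated molecules")
    return len(unique - set(training)) / len(unique)


# Made-up example data, assumed to be canonical SMILES already.
training_set = {"CCO", "CCN", "c1ccccc1"}
generated_set = ["CCO", "CCC", "CCC", "CCCl"]

print("uniqueness:", uniqueness(generated_set))
print("novelty:", novelty(generated_set, training_set))
```

Here three of the four generated strings are distinct, and two of those three are not in the training set.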

Source code :

The easiest way to install guacamol is with pip:

pip install git+ --process-dependency-links

guacamol requires the RDKit library (version 2018.09.1.0 or newer).
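The version requirement above can be checked before installing. The sketch below compares dotted version strings numerically; the helper names are hypothetical and the parsing is deliberately simple (it just reads the integer fields), so treat it as an illustration, not a general version parser.

```python
import re


def parse_version(text):
    """Turn a dotted version string such as '2018.09.1' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(installed, minimum="2018.09.1.0"):
    """Return True if `installed` is at least `minimum`.

    Shorter versions are padded with zeros so that '2018.09.1'
    and '2018.09.1.0' compare as equal.
    """
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("2018.09.1"))   # same release, fewer fields
print(meets_minimum("2018.03.4"))   # older release
```

With RDKit installed, the same check should work as `meets_minimum(rdkit.__version__)`, assuming the installed build exposes its version string that way.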


IBM RXN twitter feed


Just saw a new Twitter feed that might be of interest to any synthetic chemists interested in retrosynthesis/reaction prediction.


Account for news and general info on the freely available AI platform made by #compchem chemists for #organic chemists

You can try out the reaction planning service for free here


Optibrium and Intellegens Collaborate

Optibrium and Intellegens Collaborate to Apply Novel Deep Learning Methods to Drug Discovery

Partnership combines Intellegens’ proprietary AI technology with Optibrium’s expertise in predictive modelling and compound design. Optibrium provides elegant software solutions for small molecule design, optimisation and data analysis. By leveraging Intellegens’ Alchemite™ technology, the partnership will create a “next generation” predictive modelling platform that is capable of delivering more accurate predictions and enabling better decision-making when it comes to the optimisation of compounds.

Read more.


Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science


A timely tutorial and example workflow.

we have put together a more comprehensive workflow, serving as a blueprint for anyone to build her or his own version of a Guided Analytics application to combine just the right amount of automation and interaction for a specific set of problems.

Full details here


KNIME update


What’s New in KNIME Analytics Platform 3.6.

  • KNIME Deep Learning
  • Constant Value Column Filter
  • Numeric Outliers
  • Column Expressions
  • Scorer (JavaScript)
  • Git Nodes
  • Call Workflow (Table Based)
  • KNIME Server Connection
  • Text Processing
  • Usability Improvements
  • Connect/Unconnect nodes using keyboard shortcuts
  • Zooming
  • Replacing and connecting nodes with node drop
  • Node repository search
  • Usability improvements in the KNIME Explorer
  • Copy from/Paste to JavaScript Table view/editor
  • Miscellaneous
  • Performance: Column Store (Preview)
  • Making views beautiful: CSS changes
  • KNIME Big Data Extensions
  • Create Local Big Data Environment
  • KNIME H2O Sparkling Water Integration
  • Support for Apache Spark v2.3
  • Big Data File Handling Nodes (Parquet/ORC)
  • Spark PCA
  • Spark Pivot
  • Frequent Item Sets and Association Rules
  • Previews
  • Create Spark Context via Livy
  • Database Integration
  • Apache Kafka Integration
  • KNIME Server
  • Management (Client Preferences)
  • Job View (Preview)
  • Distributed Executors (Preview)
  • General release notes
  • JSON Path library update
  • Java Snippet Bundle Imports

I suspect it will be KNIME Deep Learning that catches the eye: it offers the ability to set up deep learning models using drag and drop. You can use regular TensorFlow models within KNIME Analytics Platform and seamlessly convert from Keras to TensorFlow for efficient network execution.


The new Create Local Big Data Environment node creates a fully functional local big data environment including Apache Spark, Apache Hive and HDFS. It allows you to try out the nodes of the KNIME Big Data Extensions without a Hadoop cluster.


Second Major DeepChem Release


A major update to DeepChem has been announced.

This major version release finishes consolidating the DeepChem codebase around our TensorGraph API for constructing complex models in DeepChem. We've made a variety of improvements to TensorGraph's saving/loading features and added a number of new tutorials improving our documentation of TensorGraph. We've also removed a number of older deprecated submodules and models in favor of the new, standardized TensorGraph implementations.

In addition, we've implemented a number of new deep models and algorithms, including DRAGONNs, Molecular Autoencoders, MIX+GANs, continuous space A3C, MCTS for RL, Mol2Vec and more. We've also continued improving our core graph convolutional implementations.

Also remember that registration for the RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Meeting is now open.


Artificial Intelligence in Chemistry


I mentioned the first announcement of a meeting to be held next year.

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry, Friday, 15th June 2018, Royal Society of Chemistry at Burlington House, London, UK.
Twitter hashtag - #RSC_AIChem


A number of the speakers have now been confirmed.

Confirmed Speakers

Keynote: What I learned about machine learning - revisited (Bob Sheridan, Merck)

Presentation title to be confirmed (Nadine Schneider, Novartis)

Scaling de novo design, from single target to disease portfolio (Willem van Hoorn, Exscientia)

Presentation title to be confirmed (Marwin Segler, Benevolent AI)

Molecular de novo design through deep learning (Ola Engkvist, AstraZeneca)

I also notice that there are a number of EPSRC funding opportunities.

Artificial Intelligence - UKRI CDTs: EPSRC is expected to support 10-20 doctoral training positions.

The call is now open for around 15 Centres for Doctoral Training (CDTs) focused on areas relevant to Artificial Intelligence (AI) across UKRI's remit. This call opens against the background of Professor Dame Wendy Hall and Jérôme Pesenti's review, Growing the artificial intelligence industry in the UK, and the Government's Industrial Strategy White Paper, Building a Britain fit for the Future. This investment in AI skills will be kick-started by support for over 100 studentships that will be funded during 2018/19 via the Research Councils' current mechanisms and schemes.

Universities are invited to apply against two priority areas:

  • Enabling Intelligence, a priority area within Engineering and Physical Sciences Research Council's (EPSRC) main CDT call
  • Applications and Implications of Artificial Intelligence (AIAI), a new priority area relevant to all Research Councils.

More info.


Deep Learning Cheat Sheet (using Python Libraries)


Just came across this really invaluable resource.

  • Deep Learning Cheat Sheet (using Python Libraries)
  • PySpark Cheat Sheet: Spark in Python
  • Data Science in Python: Pandas Cheat Sheet
  • Cheat Sheet: Python Basics For Data Science
  • A Cheat Sheet on Probability
  • Cheat Sheet: Data Visualization with R
  • New Machine Learning Cheat Sheet by Emily Barry
  • Matplotlib Cheat Sheet
  • One-page R: a survival guide to data science with R
  • Cheat Sheet: Data Visualization in Python
  • Stata Cheat Sheet
  • Common Probability Distributions: The Data Scientist’s Crib Sheet
  • Data Science Cheat Sheet
  • 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets
  • 14 Great Machine Learning, Data Science, R, DataViz Cheat Sheets


RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry


The first announcement of a meeting to be held next year.

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry, Friday, 15th June 2018, Royal Society of Chemistry at Burlington House, London, UK.
Twitter hashtag - #RSC_AIChem


Artificial Intelligence is presently experiencing a renaissance in development of new methods and practical applications to ongoing challenges in Chemistry. We are pleased to announce that the Biological & Medicinal Chemistry Sector (BMCS) and Chemical Information & Computer Applications Group (CICAG) of the Royal Society of Chemistry are organising a one-day conference entitled Artificial Intelligence in Chemistry to present the current efforts in applying these new methods. We will combine aspects of artificial intelligence and deep machine learning methods to applications in chemistry.

Applications for oral and poster presentations are welcomed. Posters will be displayed throughout the day and applicants will be asked if they would like to provide a two-minute flash oral presentation when submitting their abstract. Closing dates are 31st January for oral and 13th April for poster submissions.

More details here


“Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions


An interesting paper uses 1,808,938 reactions from the patent literature as a training set to build a model to predict reactions.

There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Consequently, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model, trained end-to-end and fully data-driven. We propose a novel way of tokenization, which is arbitrarily extensible with reaction information. With this approach, we demonstrate results superior to the state-of-the-art solution by a significant margin on the top-1 accuracy. Specifically, our approach achieves an accuracy of 80.1% without relying on auxiliary knowledge such as reaction templates. Also, 66.4% accuracy is reached on a larger and noisier dataset.
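To give a feel for what treating reactions as a translation problem involves, here is a minimal sketch of splitting SMILES strings into tokens and scoring top-1 accuracy. The regular expression is a simple atom-wise pattern chosen for illustration, not the tokenisation scheme proposed in the paper, and the function names and toy predictions are hypothetical.

```python
import re

# Illustrative atom-wise pattern (an assumption, not the paper's tokenizer):
# bracket atoms, two-letter halogens, two-digit ring closures,
# then any other single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


def top1_accuracy(predictions, targets):
    """Fraction of cases where the single best prediction matches
    the recorded product exactly (string comparison)."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


print(tokenize("CC(=O)Cl"))
print(tokenize("c1ccccc1[N+](=O)[O-]"))

# Toy example: one of the two predicted products matches.
print(top1_accuracy(["CCO", "CC"], ["CCO", "CN"]))
```

A sequence-to-sequence model would be trained to map the reactant tokens to the product tokens, and figures such as the 80.1% quoted above are top-1 accuracies of this kind. In practice both strings would be canonicalised before comparison, which this sketch leaves out.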

There is also a brief video describing the work.