Macs in Chemistry

Insanely Great Science

artificial intelligence

Macs and CUDA

 

One of the highlights for me at the recent 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting in Cambridge was the work of Adrian Roitberg, Olexandr Isayev et al. on "Approaching coupled cluster accuracy with a general-purpose neural network potential through transfer learning" DOI.

Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.

The presentation was really compelling and looks like an example where AI can be truly transformational. The good news is that the code is all freely available on GitHub, https://github.com/isayev/ASE_ANI; the bad news is that it "Works only under Ubuntu variants of Linux with a NVIDIA GPU", with Python binaries built for Python 3.6 and CUDA 9.2.

In the past I would have stopped there, but with the increasing number of external GPU enclosures and an NVIDIA CUDA Installation Guide for Mac OS X, I'm wondering if there might be a path forward. I'd be very interested to hear about experiences with external GPUs using NVIDIA graphics cards and the CUDA toolkit on a Mac.

Update

Olexandr emailed me to mention that they have a pure Python version, https://github.com/aiqm/torchani. This will run on a Mac, however there is no GPU acceleration.

TorchANI is a PyTorch implementation of ANI. It is currently under alpha release, which means the API is not stable yet. If you find a bug in TorchANI, or have a feature request, feel free to open an issue on GitHub, or send us a pull request.
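For anyone wanting to try it on a Mac, below is a minimal sketch of computing a single-point energy with the pre-trained ANI-1ccx potential via TorchANI. The class and method names (torchani.models.ANI1ccx, species_to_tensor) follow the TorchANI documentation at the time of writing, and the methane geometry is just an illustrative input; since the API is still in alpha, check the repository README if anything has moved.

```python
import torch
import torchani

# Load the pre-trained ANI-1ccx potential (model parameters are downloaded on first use)
model = torchani.models.ANI1ccx()

# A methane molecule: element symbols plus Cartesian coordinates in Angstrom
species = model.species_to_tensor('CHHHH').unsqueeze(0)
coordinates = torch.tensor([[[ 0.000,  0.000,  0.000],
                             [ 0.629,  0.629,  0.629],
                             [-0.629, -0.629,  0.629],
                             [-0.629,  0.629, -0.629],
                             [ 0.629, -0.629, -0.629]]], dtype=torch.float32)

# The model returns energies in Hartree; set requires_grad on coordinates to get forces
energy = model((species, coordinates)).energies
print(energy.item())
```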

Also stumbled across the paper

Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks, https://arxiv.org/abs/1909.02487 (arXiv).


Comments

2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry

 

The 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting is now over, two intensive days of presentations and posters. Many thanks to all who took part and made it such a successful event.

Special mention to the poster prize winners:

P17 by Jenke Scheen of the University of Edinburgh, entitled "Improving the accuracy of alchemical free energy methods by learning correction terms for binding energy estimates"

P6 by Adam Green of the University of Leeds, entitled "Activity-directed discovery of inhibitors of the p53/MDM2 interaction: towards autonomous functional molecule discovery"

P3 by Ya Chen of the University of Hamburg, entitled "NP-Scout: machine learning approach for the identification of natural products and natural product-like compounds in large molecular databases"

If you want to browse through the Twitter feeds search for the #AIChem19 hashtag.

Many of the presentations are now available in pdf format on the meeting website.

We are already thinking about a possible 3rd meeting, and any feedback would be much appreciated.

Comments

A parallel Fortran framework for neural networks and deep learning

 

Since Fortran on a Mac is one of the most popular pages on the site, I thought I'd mention this paper, submitted to ACM SIGPLAN Fortran Forum, which I've just come across.

A parallel Fortran framework for neural networks and deep learning DOI.

This paper describes neural-fortran, a parallel Fortran framework for neural networks and deep learning. It features a simple interface to construct feed-forward neural networks of arbitrary structure and size, several activation functions, and stochastic gradient descent as the default optimization algorithm. Neural-fortran also leverages the Fortran 2018 standard collective subroutines to achieve data-based parallelism on shared- or distributed-memory machines. First, I describe the implementation of neural networks with Fortran derived types, whole-array arithmetic, and collective sum and broadcast operations to achieve parallelism. Second, I demonstrate the use of neural-fortran in an example of recognizing hand-written digits from images. Finally, I evaluate the computational performance in both serial and parallel modes. Ease of use and computational performance are similar to an existing popular machine learning framework, making neural-fortran a viable candidate for further development and use in production.


Comments

2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry

 

AI-webpage-image

I was just looking through the delegate registrations for the 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Meeting taking place in Cambridge, UK, 2nd to 3rd September 2019. We now have significantly more registrations than the first meeting; participants are coming from 16 different countries, and whilst the UK and US predominate there are many participants from the rest of Europe and even some from Japan and Korea. There are 90 different organisations represented, and I'm delighted to see there are over 20 student attendees, many from overseas. A number of students are presenting posters, and the lineup of people taking part in the flash poster session can be found here.

Registration is still open for what looks like it will be another outstanding meeting.

A few people have said they are planning a visit to Cambridge for a holiday around the meeting and have asked for suggestions of things to do. Visit Cambridge is a good place to start.


Comments

Autocompletion with deep learning

 

This looks really interesting

TabNine is an autocompleter that helps you write code faster by adding a deep learning model which significantly improves suggestion quality. You can see videos at the link above.

There has been a lot of hype about deep learning in the past few years. Neural networks are state-of-the-art in many academic domains, and they have been deployed in production for tasks such as autonomous driving, speech synthesis, and adding dog ears to human faces. Yet developer tools have been slow to benefit from these advances

Deep TabNine is trained on around 2 million files from GitHub. During training, its goal is to predict each token given the tokens that come before it. To achieve this goal, it learns complex behaviour, such as type inference in dynamically typed languages.
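The "predict each token given the tokens that come before it" objective is standard language modelling. As a rough illustration of the idea (and emphatically not TabNine's actual architecture or code), here is a toy next-token predictor in PyTorch; every name and dimension in it is made up for the example.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 128, 32, 64  # toy byte-level vocabulary

class TinyCodeLM(nn.Module):
    """Minimal recurrent language model: predicts the next token at every position."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.out(hidden)  # logits over the vocabulary for each position

model = TinyCodeLM()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A single line of "source code" encoded as byte values
text = torch.tensor([[ord(c) for c in "def add(a, b): return a + b"]])
inputs, targets = text[:, :-1], text[:, 1:]  # each target is the token following its input prefix

logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimiser.step()
print(float(loss))
```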

An interesting idea; my only concern is the quality of the code in the training set.

Comments

AI in Chemistry bursaries still available

 

The 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting is filling up fast; however, there are still 6 bursaries unallocated. The closing date for applications is 15 July. The bursaries are available up to a value of £250, to support registration, travel and accommodation costs for PhD and post-doctoral applicants studying at European academic institutions.

You can find details here https://www.maggichurchouseevents.co.uk/bmcs/AI-2019.htm.

Twitter hashtag - #AIChem19

Comments

Molecular Transformer

 

When this paper first appeared on the arXiv preprint server, "Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction" (https://arxiv.org/abs/1811.02633), it generated considerable interest.

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
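To make the "machine translation between SMILES strings" framing concrete, the sketch below tokenizes a reaction SMILES into the kind of token sequence a sequence-to-sequence model would consume. The regular expression follows the atom-level tokenization scheme described in the Molecular Transformer work, but treat this as an illustrative reimplementation rather than the authors' exact code.

```python
import re

# Atom-level SMILES tokenizer in the spirit of the Molecular Transformer paper:
# bracket atoms, two-character halogens and two-digit ring labels stay as single tokens.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list:
    """Split a (reaction) SMILES string into model-ready tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    assert smiles == "".join(tokens), "tokenizer dropped characters"
    return tokens

# Example: an esterification written as reactants>reagents>products
reaction = "CC(=O)O.OCC>[H+]>CC(=O)OCC.O"
print(tokenize_smiles(reaction))
# ['C', 'C', '(', '=', 'O', ')', 'O', '.', 'O', 'C', 'C', '>', '[H+]', '>', ...]
```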

I just noticed that it had been recently updated.

If you are interested in this exciting area of chemistry you might be interested to know the code is available on GitHub and the trained model is available online https://rxn.res.ibm.com.

One of the authors, Alpha Lee, is speaking at the 2nd Artificial Intelligence in Chemistry Meeting #AIChem19, 2nd to 3rd September 2019, Fitzwilliam College, Cambridge, UK. You can register for the meeting here if you would like to hear first hand about this technology.

AI-webpage-image

The full lineup of speakers is here. Also remember there are bursaries available for the meeting.


Comments

Can we trust published data?

I posted a poll on Twitter:

Looking at abstracts for the AI in Chemistry Meeting … many mine published data. The quality of the public data is obviously critical for good models. Is this something the AI community should be concerned about or get involved with to improve the quality of the literature?

The results are now in and, interestingly, despite nearly 2.5K impressions only 28 people voted. Of those that voted, the overwhelming majority feel that AI scientists should help to improve the quality of the literature.

AIpollresult

The comments associated with the tweet are interesting. Certainly many machine learning models are robust enough to accommodate some poor data, but I think there is a deeper concern.

Elisabeth Bik has regularly flagged questionable publications; unfortunately these are not always detected before their influence has propagated through the literature.

For a very detailed example look at 5-HTTLPR: A POINTED REVIEW looking at an unusual version of the serotonin transporter gene 5-HTTLPR.

I've heard of many examples of scientists being unable to reproduce literature findings, and usually little happens. However, Amgen were able to reproduce only 6 out of 53 'landmark' studies, and they published their findings.

How many times do scientists assume failure to reproduce published findings is their error?

There have been several studies looking at the possible causes of the failure to reproduce work. In 2011, an evaluation of 246 antibodies used in epigenetic studies found that one-quarter failed tests for specificity, meaning that they often bound to more than one target. Four antibodies were perfectly specific, but to the wrong target (Reproducibility crisis: Blame it on the antibodies).

See also "The antibody horror show: an introductory guide for the perplexed" DOI

Colourful as this may appear, the outcomes for the community are uniformly grim, including badly damaged scientific careers, wasted public funding, and contaminated literature.

If you are mining literature data to predict novel drug targets, then caveat emptor.

 

Comments

Special Issue "Machine Learning with Python"

 

I was just sent details of a Special Issue, "Machine Learning with Python", for the journal Information.

We live in this day and age where quintillions of bytes of data are generated and collected every day. Around the globe, researchers and companies are leveraging these vast amounts of data in countless application areas, ranging from drug discovery to improving transportation with self-driving cars. As we all know, Python evolved into the lingua franca of machine learning and artificial intelligence research over the last couple of years. What makes Python particularly attractive for us researchers is that it gives us access to a cohesive set of tools for scientific computing and is easy to teach and learn. Also, as a language that bridges many different technologies and different fields, Python fosters interdisciplinary collaboration. And besides making us more productive in our research, sharing tools we develop in Python has the potential to reach a wide audience and benefit the broader research community.

This special issue is now open for submission.

Comments

Can we trust published data

 

A Twitter poll: can we trust published data, and should the AI community be involved?

Looking at abstracts for https://www.maggichurchouseevents.co.uk/bmcs/AI-2019.htm … many mine published data. The quality of the public data is obviously critical for good models.

https://twitter.com/macinchem/status/1135795628901093376

Comments

Swift for TensorFlow Models

 

This repository contains TensorFlow models written in Swift.

Swift for TensorFlow is a next-generation platform for machine learning, incorporating the latest research across machine learning, compilers, differentiable programming, systems design, and beyond. This is an early-stage project: it is not feature-complete nor production-ready, but it is ready for pioneers to try in projects, give feedback, and help shape the future!

This is the second public release of Swift for TensorFlow, available across Google Colaboratory, Linux, and macOS.


Comments

In which area is Artificial Intelligence likely to most impact Chemistry, the results are in

 

I ran a poll last week asking "In which area is Artificial Intelligence likely to most impact Chemistry?", and we now have the results.

pollResults

Whilst Molecular Design was the most popular choice, it was interesting to see that all options were well supported. This suggests that there are opportunities for artificial intelligence to have an impact in many facets of chemistry. I'm delighted to see this, since it was part of the thinking behind the AI in Chemistry meeting, and I think the line-up of speakers will have something for everyone.

2nd RSC-BMCS / RSC-CICAG, Artificial Intelligence in Chemistry, Monday-Tuesday, 2nd to 3rd September 2019. Fitzwilliam College, Cambridge, UK. #AIChem19

Synopsis
Artificial Intelligence is presently experiencing a renaissance in development of new methods and practical applications to ongoing challenges in Chemistry. Following the success of the inaugural “Artificial Intelligence in Chemistry” meeting in 2018, we are pleased to announce that the Biological & Medicinal Chemistry Sector (BMCS) and Chemical Information & Computer Applications Group (CICAG) of the Royal Society of Chemistry are once again organising a conference to present the current efforts in applying these new methods. The meeting will be held over two days and will combine aspects of artificial intelligence and deep machine learning methods to applications in chemistry.

Programme (draft)

Monday, 2nd September
08.30
Registration, refreshments
09.30
Deep learning applied to ligand-based de novo design: a real-life lead optimization case study
Quentin Perron, IKTOS, France
10.00
A Turing test for molecular generators
Jacob Bush, GlaxoSmithKline, UK
10.30
Flash poster presentations
11.00
Refreshments, exhibition and posters
11.30
Presentation title to be confirmed
Keynote: Regina Barzilay, Massachusetts Institute of Technology, USA
12.30
Lunch, exhibition and posters
14.00
Artificial intelligence for predicting molecular Electrostatic Potentials (ESPs): a step towards developing ESP-guided knowledge-based scoring functions
Prakash Rathi, Astex Pharmaceuticals, UK
14.30
Molecular transformer for chemical reaction prediction and uncertainty estimation
Alpha Lee, University of Cambridge, UK
15.00
Drug discovery disrupted - quantum physics meets machine learning
Noor Shaker, GTN, UK
15.30
Refreshments, exhibition and posters
16.00
Application of AI in chemistry: where are we in drug design?
Christian Tyrchan, AstraZeneca, Sweden
16.30
Presentation title to be confirmed
Anthony Nicholls, OpenEye Scientific Software, USA
17.30 Close
18.45 Drinks reception
19.15 Conference dinner

Tuesday, 3rd September
08.30
Refreshments
09.00
Deep generative models for 3D compound design from fragment screens
Fergus Imrie, University of Oxford, UK
09.30
DeeplyTough: learning to structurally compare protein binding sites
Joshua Meyers, BenevolentAI, UK
10.00
Discovery of nanoporous materials for energy applications
Maciej Haranczyk, IMDEA Materials Institute, Spain
10.30
Refreshments, exhibition and posters
11.00
Deep learning for drug discovery
Keynote: David Koes, University of Pittsburgh, USA
12.00
Networking lunch, exhibition and posters
14.00
Presentation title to be confirmed
Olexandr Isayev, University of North Carolina at Chapel Hill, USA
14.30
Dreaming functional molecules with generative ML models
Christoph Kreisbeck, Kebotix, USA
15.00
Refreshments, exhibition and posters
15.30
Presentation title to be confirmed
Keynote: Adrian Roitberg, University of Florida, USA
16.30
Close

You can get more information and register here https://www.maggichurchouseevents.co.uk/bmcs/AI-2019.htm.


Comments

2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry

 

In June 2018 the first RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry meeting was held in London. This proved to be enormously popular; there were more oral abstracts and poster submissions than we had space for, and it was so over-subscribed we could have filled a venue double the size.

Planning for the second meeting is now in full swing, and it will be held in Cambridge 2-3 September 2019.

Event : 2nd RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry
Dates : Monday-Tuesday, 2nd to 3rd September 2019
Place : Fitzwilliam College, Cambridge, UK
Websites : Event website, and RSC website.

Twitter #AIChem19

aifirst_announcement-

Applications for both oral and poster presentations are welcomed. Posters will be displayed throughout the day, and applicants are asked if they wish to provide a two-minute flash oral presentation when submitting their abstract. The closing dates for submissions are:

  • 31st March for oral and
  • 5th July for poster

Full details can be found on the Event website.


Comments

GuacaMol, benchmarking models.

 

Comparison of different algorithms is an under-researched area; this publication looks like a useful starting point.

GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.

Source code : https://github.com/BenevolentAI/guacamol.

The easiest way to install guacamol is with pip:

pip install git+https://github.com/BenevolentAI/guacamol.git#egg=guacamol --process-dependency-links

guacamol requires the RDKit library (version 2018.09.1.0 or newer).
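As a rough idea of how the benchmarks are driven, the sketch below wires a trivial generator into GuacaMol's distribution-learning assessment. The class and function names (DistributionMatchingGenerator, assess_distribution_learning) are taken from the GuacaMol README, but the "generator" here just replays training SMILES and the file names are placeholders, so treat it as a template for plugging in a real generative model.

```python
import random
from typing import List

from guacamol.assess_distribution_learning import assess_distribution_learning
from guacamol.distribution_matching_generator import DistributionMatchingGenerator


class RandomReplayGenerator(DistributionMatchingGenerator):
    """Toy 'model' that simply resamples SMILES from the training set."""

    def __init__(self, smiles_file: str):
        with open(smiles_file) as f:
            self.smiles = [line.strip() for line in f if line.strip()]

    def generate(self, number_samples: int) -> List[str]:
        return random.choices(self.smiles, k=number_samples)


if __name__ == "__main__":
    training_file = "guacamol_v1_train.smiles"  # ChEMBL-derived training set from the GuacaMol data release
    generator = RandomReplayGenerator(training_file)
    assess_distribution_learning(
        generator,
        chembl_training_file=training_file,
        json_output_file="distribution_learning_results.json",
    )
```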


Comments

IBM RXN twitter feed

 

Just saw a new Twitter feed that might be of interest to any synthetic chemists interested in retrosynthesis/reaction prediction.

https://twitter.com/ForRxn

#rxnforchemistry

Account for news and general info on the freely available AI platform made by #compchem chemists for #organic chemists

You can try out the reaction planning service for free here https://rxn.res.ibm.com.




Comments

Optibrium and Intellegens Collaborate

Optibrium and Intellegens Collaborate to Apply Novel Deep Learning Methods to Drug Discovery

Partnership combines Intellegens’ proprietary AI technology with Optibrium’s expertise in predictive modelling and compound design. Optibrium provides elegant software solutions for small molecule design, optimisation and data analysis. By leveraging Intellegens’ Alchemite™ technology, the partnership will create a “next generation” predictive modelling platform that is capable of delivering more accurate predictions and enabling better decision-making when it comes to the optimisation of compounds.

Read more.


Comments

Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science

 

A timely tutorial and example workflow.

we have put together a more comprehensive workflow, serving as a blueprint for anyone to build her or his own version of a Guided Analytics application to combine just the right amount of automation and interaction for a specific set of problems.

Full details here


Comments

KNIME update

 

What’s New in KNIME Analytics Platform 3.6.

  • KNIME Deep Learning
  • Constant Value Column Filter
  • Numeric Outliers
  • Column Expressions
  • Scorer (JavaScript)
  • Git Nodes
  • Call Workflow (Table Based)
  • KNIME Server Connection
  • Text Processing
  • Usability Improvements
  • Connect/Unconnect nodes using keyboard shortcuts
  • Zooming
  • Replacing and connecting nodes with node drop
  • Node repository search
  • Usability improvements in the KNIME Explorer
  • Copy from/Paste to JavaScript Table view/editor
  • Miscellaneous
  • Performance: Column Store (Preview)
  • Making views beautiful: CSS changes
  • KNIME Big Data Extensions
  • Create Local Big Data Environment
  • KNIME H2O Sparkling Water Integration
  • Support for Apache Spark v2.3
  • Big Data File Handling Nodes (Parquet/ORC)
  • Spark PCA
  • Spark Pivot
  • Frequent Item Sets and Association Rules
  • Previews
  • Create Spark Context via Livy
  • Database Integration
  • Apache Kafka Integration
  • KNIME Server

  • Management (Client Preferences)

  • Job View (Preview)
  • Distributed Executors (Preview)
  • General release notes

  • JSON Path library update

  • Java Snippet Bundle Imports

I suspect it will be the KNIME Deep Learning integration that catches the eye: the ability to set up deep learning models using drag and drop, use regular TensorFlow models within KNIME Analytics Platform, and seamlessly convert from Keras to TensorFlow for efficient network execution.

deeplearning

The new Create Local Big Data Environment node creates a fully functional local big data environment including Apache Spark, Apache Hive and HDFS. It allows you to try out the nodes of the KNIME Big Data Extensions without a Hadoop cluster.


Comments

Second Major DeepChem Release

 

A major update to DeepChem has been announced.

This major version release finishes consolidating the DeepChem codebase around our TensorGraph API for constructing complex models in DeepChem. We've made a variety of improvements to TensorGraph's saving/loading features and added a number of new tutorials improving our documentation of TensorGraph. We've also removed a number of older deprecated submodules and models in favor of the new, standardized TensorGraph implementations.

In addition, we've implemented a number of new deep models and algorithms, including DRAGONNs, Molecular Autoencoders, MIX+GANs, continuous space A3C, MCTS for RL, Mol2Vec and more. We've also continued improving our core graph convolutional implementations.
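For a flavour of what working with DeepChem looks like, here is a minimal sketch that loads a MoleculeNet dataset and fits a graph-convolution model. The loader and model names (dc.molnet.load_delaney, GraphConvModel) are those in current DeepChem documentation rather than release-specific TensorGraph classes, so adjust them to match the version you install.

```python
import deepchem as dc

# Load the Delaney aqueous-solubility dataset, featurized for graph convolutions
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer='GraphConv')
train_dataset, valid_dataset, test_dataset = datasets

# A graph-convolutional network for the single regression task
model = dc.models.GraphConvModel(n_tasks=len(tasks), mode='regression')
model.fit(train_dataset, nb_epoch=10)

# Evaluate with Pearson R^2 on training and held-out data
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print("train:", model.evaluate(train_dataset, [metric], transformers))
print("test:", model.evaluate(test_dataset, [metric], transformers))
```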

Also remember the RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Meeting registration is now open.


Comments

Artificial Intelligence in Chemistry

 

I mentioned the first announcement of a meeting to be held next year.

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 Royal Society of Chemistry at Burlington House, London, UK.
Twitter hashtag - #RSC_AIChem

AI-web-image-1

A number of the speakers have now been confirmed.

Confirmed Speakers

Keynote: What I learned about machine learning - revisited Bob Sheridan, Merck

Presentation title to be confirmed Nadine Schneider, Novartis

Scaling de novo design, from single target to disease portfolio Wilhem van Hoorn, Exscientia

Presentation title to be confirmed Marwin Segler, Benevolent AI

Molecular de novo design through deep learning Ola Engkvist, AstraZeneca

I also notice that there are a number of EPSRC funding opportunities.

Artificial Intelligence - UKRI CDTs: EPSRC is expected to support 10-20 doctoral training positions.

The call is now open for around 15 Centres for Doctoral Training (CDTs) focused on areas relevant to Artificial Intelligence (AI) across UKRI's remit. This call opens against the background of Professor Dame Wendy Hall and Jérôme Pesenti's review, Growing the artificial intelligence industry in the UK, and the Government's Industrial Strategy White Paper, Building a Britain fit for the Future. This investment in AI skills will be kick-started by support for over 100 studentships that will be funded during 2018/19 via the Research Councils current mechanisms and schemes.

Universities are invited to apply against two priority areas:

  • Enabling Intelligence, a priority area within Engineering and Physical Sciences Research Council's (EPSRC) main CDT call
  • Applications and Implications of Artificial Intelligence (AIAI), a new priority area relevant to all Research Councils.

More info..



Comments

Deep Learning Cheat Sheet (using Python Libraries)

 

Just came across this really invaluable set of resources.

  • Deep Learning Cheat Sheet (using Python Libraries)
  • PySpark Cheat Sheet: Spark in Python
  • Data Science in Python: Pandas Cheat Sheet
  • Cheat Sheet: Python Basics For Data Science
  • A Cheat Sheet on Probability
  • Cheat Sheet: Data Visualization with R
  • New Machine Learning Cheat Sheet by Emily Barry
  • Matplotlib Cheat Sheet
  • One-page R: a survival guide to data science with R
  • Cheat Sheet: Data Visualization in Python
  • Stata Cheat Sheet
  • Common Probability Distributions: The Data Scientist’s Crib Sheet
  • Data Science Cheat Sheet
  • 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets
  • 14 Great Machine Learning, Data Science, R , DataViz Cheat Sheets



Comments

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry

 

The first announcement of a meeting to be held next year.

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 Royal Society of Chemistry at Burlington House, London, UK.
Twitter hashtag - #RSC_AIChem

AIfirst_announcement

Artificial Intelligence is presently experiencing a renaissance in development of new methods and practical applications to ongoing challenges in Chemistry. We are pleased to announce that the Biological & Medicinal Chemistry Sector (BMCS) and Chemical Information & Computer Applications Group (CICAG) of the Royal Society of Chemistry are organising a one-day conference entitled Artificial Intelligence in Chemistry to present the current efforts in applying these new methods. We will combine aspects of artificial intelligence and deep machine learning methods to applications in chemistry.

Applications for oral and poster presentations are welcomed. Posters will be displayed throughout the day and applicants will be asked if they would like to provide a two-minute flash oral presentation when submitting their abstract. Closing dates are 31st January for oral and 13th April for poster submissions.

More details here http://www.maggichurchouseevents.co.uk/bmcs/AI-2018.htm.


Comments

“Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions

 

An interesting paper that uses 1,808,938 reactions from the patent literature as a training set to build a model to predict reactions.

There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Consequently, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model, trained end-to-end and fully data-driven. We propose a novel way of tokenization, which is arbitrarily extensible with reaction information. With this approach, we demonstrate results superior to the state-of-the-art solution by a significant margin on the top-1 accuracy. Specifically, our approach achieves an accuracy of 80.1% without relying on auxiliary knowledge such as reaction templates. Also, 66.4% accuracy is reached on a larger and noisier dataset.

There is also a brief video describing the work.


Comments