When this paper first appeared on the arXiv preprint server, "Molecular Transformer - A Model for Uncertainty-Calibrated Chemical Reaction Prediction" https://arxiv.org/abs/1811.02633, it generated considerable interest.
Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
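To make the "machine translation between SMILES strings" idea a little more concrete, here is a minimal sketch of how a reaction SMILES might be turned into a source/target pair of token sequences. This is not the authors' code: the regular expression is a simplified tokenizer I have assumed for illustration, the function names are hypothetical, and reactants and reagents are simply merged into one input, in the spirit of the "no reactant-reagent split" setting mentioned in the abstract.

```python
import re
from typing import List, Tuple

# Simplified SMILES tokenizer (an assumption for illustration, not the paper's exact pattern).
# Order matters: bracket atoms and two-letter halogens must be tried before single letters.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%[0-9]{2}|[A-Za-z]|[0-9]|[()=#\-+\\/.:~@?>*$])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, checking that nothing was dropped."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def to_translation_pair(reaction_smiles: str) -> Tuple[str, str]:
    """Turn 'reactants>reagents>products' into space-separated source and target strings."""
    reactants, reagents, products = reaction_smiles.split(">")
    # Merge reactants and reagents into one input, skipping an empty reagent field.
    precursors = ".".join(part for part in (reactants, reagents) if part)
    return " ".join(tokenize(precursors)), " ".join(tokenize(products))


# Acetyl chloride + ethanol -> ethyl acetate
source, target = to_translation_pair("CC(=O)Cl.OCC>>CC(=O)OCC")
print(source)
print(target)
```

The source string is what a translation model would read and the target string is what it would be trained to emit; the round-trip check in `tokenize` is there so that an unexpected character fails loudly rather than being silently discarded.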
I just noticed that it had been recently updated.
One of the authors, Alpha Lee, is speaking at the 2nd Artificial Intelligence in Chemistry Meeting #AIChem19, 2nd to 3rd September 2019, Fitzwilliam College, Cambridge, UK. You can register for the meeting here if you would like to hear first hand about this technology.
The full lineup of speakers is here. Also remember there are bursaries available for the meeting.
Application deadline for the 2019 Grant is April 19, 2019. Successful applicants will be notified no later than May 24, 2019.
The Grant Program has been created to provide funding for the career development of young researchers who have demonstrated excellence in their education, research or development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and compounds. One or more Grants will be awarded annually up to a total combined maximum of ten thousand U.S. dollars ($10,000). Grantees have the option of payments being made in U.S. dollars or in British Pounds equivalent to the U.S. dollar amount. Grants are awarded for specific purposes, and within one year each grantee is required to submit a brief written report detailing how the grant funds were allocated. Grantees are also requested to recognize the support of the Trust in any paper or presentation that is given as a result of that support.
There are more details of the requirements on the website.
The 2018 awards went to:
Stephen Capuzzi, Division of Chemical Biology and Medicinal Chemistry at the University of North Carolina Eshelman School of Pharmacy, Chapel Hill (USA), was awarded a Grant to attend the 31st ICAR in Porto, Portugal from 06/11/2018 to 06/15/2018, where he presented his research entitled “Computer-Aided Discovery and Characterization of Novel Ebola Virus Inhibitors.”
Christopher Cooper, Cavendish Laboratory, University of Cambridge, UK, was awarded a Grant to present his current research on systematic, high-throughput screening of organic dyes for co-sensitized dye-sensitized solar cells. He presented his work at the Solar Energy Conversion Gordon Research Conference and Seminar held June 16-22, 2018 in Hong Kong.
Mark Driver, Chemistry Department, University of Cambridge, UK, was awarded a Grant to offset costs to attend the 7th EUCheMS conference where he will present a poster on his research that focuses on the development and applications of a theoretical approach to model hydrogen bonding.
Genqing Wang, La Trobe Institute for Molecular Sciences, La Trobe University, Australia, was awarded a Grant to present his work at the Fragment-Based Lead Discovery Conference (FBLD2018) in San Diego, USA in October 2018. The current focus of his work is the development of novel anti-virulence drugs which potentially overcome the problems of antibiotic resistance of Gram-negative bacteria.
Roshan Singh, University of Oxford, UK, was awarded a Grant to conduct research within Dr. Marcus Lundberg’s Group at Uppsala University, Sweden, as part of a collaboration that he has set up between them and Professor Edward Solomon’s Group at Stanford University, California. He conducts research within Professor John McGrady’s group at the University of Oxford. The collaboration will look to consolidate the experimental studies on heme Fe(IV)=O complexes currently being carried out by Solomon’s Group with future multi-reference calculations to be conducted within Lundberg’s Group.
Great work by NextMove: an open, machine-readable, freely reusable, annotated reaction data set is available for download here https://figshare.com/articles/ChemicalreactionsfromUSpatents1976-Sep2016/5104873
Reactions extracted by text-mining from United States patents published between 1976 and September 2016. The reactions are available as CML or reaction SMILES. Note that the reaction SMILES are derived from the CML.
For convenience the reaction SMILES file includes tab-delimited columns for: PatentNumber, ParagraphNum, Year, TextMinedYield, CalculatedYield.
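If you want to work with the reaction SMILES file, something along these lines should get you started. This is only a sketch: I am assuming that the reaction SMILES itself is the first column, followed by the five metadata columns in the order listed above, and the example line is made up for illustration rather than taken from the data set. The yield columns are kept as text because their exact format is not described here.

```python
from dataclasses import dataclass


@dataclass
class ReactionRecord:
    reaction_smiles: str
    patent_number: str
    paragraph_num: str
    year: int
    text_mined_yield: str   # kept as text; may be empty
    calculated_yield: str   # kept as text; may be empty


def parse_reaction_line(line: str) -> ReactionRecord:
    """Parse one tab-delimited line (assumed layout: SMILES first, then five metadata columns)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-delimited fields, got {len(fields)}")
    smiles, patent, paragraph, year, mined, calculated = fields
    return ReactionRecord(smiles, patent, paragraph, int(year), mined, calculated)


# Made-up example line, not a real entry from the data set.
example = "CC(=O)Cl.OCC>>CC(=O)OCC\tUS0000000\t0001\t2001\t85%\t84.2%"
record = parse_reaction_line(example)

# A reaction SMILES has the form reactants>agents>products.
reactants, agents, products = record.reaction_smiles.split(">")
print(record.patent_number, record.year, products)
```

Only the trailing newline is stripped before splitting, so lines with empty yield columns still give six fields.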
Now that we have a large initial data set it would be great if others could contribute using the same format.
There is a fabulous detailed review of this invaluable resource on the Depth-First blog http://depth-first.com/articles/2019/01/28/the-nextmove-patent-reaction-dataset/
Just saw a new Twitter feed that might be of interest to any synthetic chemists interested in retrosynthesis/reaction prediction.
Account for news and general info on the freely available AI platform made by #compchem chemists for #organic chemists
You can try out the reaction planning service for free here https://rxn.res.ibm.com.
A couple more examples of the use of augmented reality to display chemistry.
This also looks interesting.
MoleculAR - sneak peak on an augmented reality app to help organic chemistry students visualise molecules in 3D, using just their lecture notes and mobile devices! (Mark Coster, @MarkCoster_Chem, July 8, 2018)
Touching proteins with virtual bare hands
….A more accessible and intuitive visualization of the three-dimensional configuration of the atomic geometry in the models can be achieved through the implementation of immersive virtual reality (VR). While bespoke commercial VR suites are available, in this work, we present a freely available software pipeline for visualising protein structures through VR. New consumer hardware, such as the HTC Vive and the Oculus Rift utilized in this study, are available at reasonable prices….
I bookmarked this paper a while back but have only just had time to read it through: STK: A Python Toolkit for Supramolecular Assembly. STK is a tool for the automated assembly, molecular optimization and property calculation of supramolecular materials. It has a simple Python API and integration with third-party computational codes.
Additional linking functional groups can be defined as SMARTS and STK can be extended by adding additional optimisation force-fields.
I just stumbled across a fascinating series of lectures. These are recordings of the live discussions behind the ongoing software development led by Stephen Wolfram.
Of particular interest might be the discussion on incorporating chemistry into the Wolfram language.
A little while back I mentioned BioConda. You can read more details in this publication "Bioconda: A sustainable and comprehensive software distribution for the life sciences", DOI. Conda is a platform- and language-independent package manager that sports easy distribution, installation and version management of software.
The conda package manager has recently made installing software a vastly more streamlined process. Conda is a combination of other package managers you may have encountered, such as pip, CPAN, CRAN, Bioconductor, apt-get, and homebrew. Conda is both language- and OS-agnostic, and can be used to install C/C++, Fortran, Go, R, Python, Java, etc.
The bioconda channel is a Conda channel providing bioinformatics-related packages for Linux and Mac OS. Looking through the packages it is clear that it already contains a number of chemistry packages. These include (updated 24 November 2017):
- AutoDock Vina
Bioconda offers a collection of over 3100 software tools, which are continuously maintained, updated, and extended by a growing global community of more than 330 contributors. Rather than try to duplicate this effort for a "Chemconda" it seems more efficient to encourage chemists to contribute to Bioconda. If you do package a chemistry application for Bioconda please let me know and I'll publicise it on my blog and add it to the list above. To start things rolling I've added PubChem.py to Bioconda and I've written a page describing how to create a bioconda recipe.
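For anyone who has not used a conda channel before, the sketch below simply assembles the install command for a package from the bioconda channel. The package name `autodock-vina` and the channel order (conda-forge first, then bioconda) are my assumptions, so check the package page for the exact name before running anything. The function name is hypothetical and the code only builds and prints the command; it does not run conda.

```python
import shlex
from typing import List, Sequence


def conda_install_command(
    package: str,
    channels: Sequence[str] = ("conda-forge", "bioconda"),
) -> List[str]:
    """Build (but do not run) a 'conda install' command for the given package."""
    cmd = ["conda", "install", "--yes"]
    for channel in channels:
        # --channel is the long form of -c; channels listed first are searched first.
        cmd += ["--channel", channel]
    cmd.append(package)
    return cmd


# Assumed package name; verify it on the bioconda site before use.
command = conda_install_command("autodock-vina")
print(shlex.join(command))
```

The printed line can be pasted into a terminal, or the list can be passed to `subprocess.run` if you want to script the installation.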
Virtual reality apps for the iPhone are becoming more common and the latest is Learning Carbons VR. This is an educational virtual reality (VR) app where students can learn about the various forms of carbon.
For many of these types of apps you will need a VR Google Cardboard headset with head straps and a Bluetooth gamepad (MFi certified - made for iPhone).
iScienceSearch is an internet search portal for scientists that allows you to perform a search across multiple data sources with a single query. I wrote a review a while back but it has undergone several updates since then and has been significantly expanded.
iScienceSearch allows both text and structure based searches, but the really interesting thing is that when you do a search using a single query item it automatically searches in the background for other synonyms, structure, CAS Registry Number, InChI etc.
The screenshot below was generated using a structure-based query; as you can see, the search results also include the results from text-based queries using synonyms. The filters on the left-hand side let you sort and filter the results so you can focus on the most relevant information.
The searching is free and requires no registration.
The GPView program is a C++ package for wave function analysis and visualization.
It is developed and maintained by Tian Shi and Ping Wang (Ref: http://arxiv.org/abs/1602.07302).
In this manuscript, we will introduce a recently developed program GPView, which can be used for wave function analysis and visualization. The wave function analysis module can calculate and generate 3D cubes for various types of molecular orbitals and electron density related with electronic excited states, such as natural orbitals, natural transition orbitals, natural difference orbitals, hole-particle density, detachment-attachment density and transition density. The visualization module of GPView can display molecular and electronic (iso-surfaces) structures. It is also able to animate single trajectories of molecular dynamics and non-adiabatic excited state molecular dynamics using the data stored in existing files. There are also other utilities help to extract and process the output of quantum chemistry calculations. The GPView provides full graphic user interface (GUI) which makes it very easy to use.
LaTeX is used for the markup and publication of scientific documents and is particularly popular in mathematics, physics and computer science. I know some chemists use it, so I thought I'd mention this resource of Chemistry LaTeX packages. It includes packages for most of the major Chemistry journals.
The ChemSpider Website has been updated.
ChemSpider is a free chemical structure database providing fast text and structure search access to over 34 million structures from hundreds of data sources.
The secret of a good iOS app is often finding a niche that is useful but does not require lots of functionality or screen real estate. Chemical Names Spell Checker ticks both boxes nicely.
The Chemical Names Spell Checker provides chemical name spell checking and chemical name synonym look-up. Data are provided by the ChemSpell service that contains more than 1.3 million chemical names related to organic, inorganic, pharmaceutical, toxicological, and environmental health topics.
Once checked the name can be copied to the clipboard for use in another application.
The ChemSpell Web Service API is free of charge. Neither registration nor licensing is required. This app nicely underlines the power of chemistry web services.
Well worth all chemists having on their iPhone or iPad.
I thought I would highlight a recent publication I read in Journal of Cheminformatics “Molecule database framework: a framework for creating database applications with chemical structure search capability” Journal of Cheminformatics 2013, 5:48 DOI.
From the abstract:
Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes:
- Chemical structure searches combined with property searches
- Support for multi-component compounds (mixtures)
- Import and export of SD-files
- Optional security (authorization)
For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files.
While not a drag and drop solution it provides a means to create your own personal chemically searchable database.
Molecule Database Framework is available for download on the project's web page on Bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework.
I was recently sent a link to an educational chemistry app Chemistry By Design: Learning by Using the Graphical Language of Organic Chemistry by University of Arizona. I had a quick look at it and it seems quite an interesting way to learn organic synthesis. There are around a thousand synthetic routes to explore and it seems to cover a wide range of synthetic organic chemistry; the synthetic targets include natural products and pharmaceuticals. I was particularly delighted to see that Woodward’s 1954 synthesis of strychnine is included.
Whilst looking it up I noticed several other educational chemistry apps, such as Organic Chemistry Essentials and Organic Chemistry FlashCards. What is clear is that it would be very useful to have a science category to help find these sorts of applications.
There is a list of mobile science apps here.