A recent publication caught my eye.
Extraction of organic chemistry grammar from unsupervised learning of chemical reactions DOI.
Anyone who has been involved in building a reaction database will know that mapping the atoms of reagents/starting materials onto products is a time-consuming, tedious process that is often fraught with errors. So any method that automates this process is a significant step forward.
From the abstract: "Here, we demonstrate that Transformer Neural Networks learn atom-mapping information between products and reactants without supervision or human labeling. Using the Transformer attention weights, we build a chemically agnostic, attention-guided reaction mapper and extract coherent chemical grammar from unannotated sets of reactions. Our method shows remarkable performance in terms of accuracy and speed, even for strongly imbalanced and chemically complex reactions with nontrivial atom-mapping. It provides the missing link between data-driven and rule-based approaches for numerous chemical reaction tasks."
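For anyone who has not worked with atom-mapped reactions, here is a minimal sketch of what the mapping buys you once you have it. This is not RXNMapper or anything from the paper: it only uses the Python standard library to read the map numbers (the `:1`, `:2`, ... labels) out of an already mapped reaction SMILES. The esterification example and the helper names `map_numbers` and `summarise_mapping` are my own, purely for illustration, and the regular expression is a simplification rather than a full SMILES parser.

```python
import re

# Matches the atom-map number inside a bracket atom, e.g. the 6 in [O:6].
# A simplification for illustration, not a full SMILES parser.
_MAP_NUMBER = re.compile(r"\[[^\]]*?:(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers found in a SMILES string."""
    return {int(n) for n in _MAP_NUMBER.findall(smiles)}


def summarise_mapping(rxn_smiles):
    """Split a mapped reaction SMILES (reactants>agents>products) and report
    which left-hand molecules contribute atoms to the product, and which
    mapped atoms do not end up in the product."""
    reactants, _agents, products = rxn_smiles.split(">")
    product_maps = map_numbers(products)
    contributing = []
    lost = set()
    for mol in reactants.split("."):
        maps = map_numbers(mol)
        if maps & product_maps:
            contributing.append(mol)
        lost |= maps - product_maps
    return contributing, sorted(lost)


# Hypothetical example: acetic acid + methanol (with unmapped sulfuric acid)
# giving methyl acetate.
rxn = (
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6].O=S(=O)(O)O"
    ">>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
)
contributing, lost = summarise_mapping(rxn)
print(contributing)  # the two molecules whose atoms appear in the product
print(lost)          # map numbers of atoms that leave, here the acid OH oxygen
```

With a mapping in place, separating true reactants from reagents and spotting leaving groups becomes simple bookkeeping, which is why getting the mapping automatically is so useful.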
They also supply a file containing Common patent reaction templates. This file contains the most common patent reaction templates (USPTO grants), including the year of the first appearance, the patent numbers, frequently used reagents, and the template count. The templates were extracted after applying RXNMapper to generate the atom-mapping.
A second, really nice paper looks at reaction classification based on text description, and visualisation using reaction fingerprints: Mapping the space of chemical reactions using attention-based neural networks. DOI
The code can be installed using conda, and all of it is on GitHub https://github.com/rxn4chemistry/rxnfp/tree/master/.