Comparison of different algorithms is an under-researched area; this publication looks like a useful starting point.
De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.
Source code: https://github.com/BenevolentAI/guacamol.
The easiest way to install guacamol is with pip:
pip install git+https://github.com/BenevolentAI/guacamol.git#egg=guacamol --process-dependency-links
guacamol requires the RDKit library (version 2018.09.1.0 or newer).
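To give a feel for what the benchmarks measure, here is a minimal stdlib-only sketch of a trivial baseline "generative model" that just resamples SMILES from its training set. The `TrainingSetSampler` class and its `generate()` method are illustrative assumptions, not guacamol's actual API; such a baseline would score perfectly on distribution fidelity but fail the novelty benchmark, which is exactly the kind of trade-off the suite is designed to expose.

```python
import random

class TrainingSetSampler:
    """Toy baseline 'generative model': resamples SMILES from the training set.

    Hypothetical interface for illustration only -- see the guacamol
    repository for the real generator classes the benchmarks expect.
    """

    def __init__(self, training_smiles):
        self.training_smiles = list(training_smiles)

    def generate(self, number_samples):
        # Draw with replacement from the training set: high fidelity to the
        # training distribution, zero novelty.
        return [random.choice(self.training_smiles)
                for _ in range(number_samples)]

train = ["CCO", "c1ccccc1", "CC(=O)O"]
model = TrainingSetSampler(train)
samples = model.generate(5)
print(samples)
```

Every sample is by construction a training molecule, so a novelty benchmark would score this baseline at zero while distribution-matching metrics would rate it highly.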
Just having a look at the new ChEMBL interface; I quite like the easy way to embed records into web pages:
<object data="https://www.ebi.ac.uk/chembl/beta/embed/#mini_report_card/Compound/CHEMBL1471" width="100%" height="300"></object>
and it is displayed as shown below.
Will be doing some more investigation later this week.
With the release of ChEMBL 21 has come an updated set of target prediction models.
The good news is that, besides the increase in terms of training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues while trying to use the models.
I've been using the models and thought I'd share an IPython notebook I have created. It is based on the ChEMBL notebook, with code tidbits taken from the absolutely invaluable Stack Overflow. I'm often in the situation where I want to know the predicted activity at specific targets, and in particular want to confirm a lack of predicted activity at potential off-targets. I could have a notebook for each target, but the speed of calculation means I can run all the models and then just cherry-pick the targets of interest.
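The cherry-picking step above amounts to running every model and then filtering the resulting predictions. A minimal sketch of that filtering, with made-up probabilities standing in for the output of the real ChEMBL scikit-learn models (the ChEMBL target IDs are real, the values are not):

```python
# Hypothetical predicted probabilities for one compound across all models;
# in practice these would come from the ChEMBL target prediction models.
predictions = {
    "CHEMBL203": 0.91,   # EGFR (intended target)
    "CHEMBL240": 0.07,   # hERG (potential off-target)
    "CHEMBL220": 0.12,   # AChE (potential off-target)
}

off_targets = ["CHEMBL240", "CHEMBL220"]
threshold = 0.5  # illustrative cut-off for calling a prediction 'active'

# Keep only off-targets whose predicted activity exceeds the threshold.
off_target_hits = {t: p for t, p in predictions.items()
                   if t in off_targets and p >= threshold}
print(off_target_hits)  # {} -> no predicted off-target activity
```

An empty result here is the desired outcome: the compound is predicted active at the intended target but not at the off-targets checked.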
ChEMBL 21 introduced a few new tables, which are now available via the API. Keyword searching has been improved.
Compound images now have a transparent background by default.
The official Python client library has been updated as well to reflect the recent changes. It can be installed using pip:
pip install -U chembl_webresource_client
The release of ChEMBL_21 has been announced. This version of the database was prepared on 1st February 2016 and contains:
- 1,929,473 compound records
- 1,592,191 compounds (of which 1,583,897 have mol files)
- 13,968,617 activities
- 1,212,831 assays
- 11,019 targets
- 62,502 source documents
Please see ChEMBL_21 release notes for full details of all changes in this release.
There is a great blog article on ChEMBL-og, describing their work evaluating chemical structure based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.
The article does a great job in explaining the logic behind improving the search performance.
They also provide an IPython notebook so you can try it yourself.
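The core trick behind fast substructure search in any database, MongoDB included, is a fingerprint screen-out: a molecule can only contain the query substructure if its fingerprint contains every bit set in the query's fingerprint. A stdlib-only toy illustration of that prefilter, with made-up bit sets standing in for real RDKit pattern fingerprints:

```python
# Toy fingerprints: each molecule maps to the set of bits its (hypothetical)
# pattern fingerprint sets. Real bit sets would come from RDKit.
db = {
    "mol_a": {1, 4, 9, 12},
    "mol_b": {1, 4, 7},
    "mol_c": {2, 4, 9, 12, 15},
}
query_bits = {4, 9, 12}

# Screen out molecules whose fingerprint is missing any query bit;
# only the survivors need the expensive full substructure match.
candidates = [name for name, bits in db.items() if query_bits <= bits]
print(candidates)  # ['mol_a', 'mol_c']
```

The ChEMBL-og article goes further by exploiting bit rarity so MongoDB can reject most non-matches from its indexes before this kind of set comparison is even needed.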
Excellent blog post on the ChEMBL python update.