ChEMBL Compound Curation Pipeline


With the imminent release of ChEMBL 26, I was interested to hear about the new chemical curation pipeline that has been developed.

The pipeline includes three functions:

  1. Check: identifies and validates problem structures before they are added to the database.

  2. Standardize: standardises chemical structures according to a set of predefined ChEMBL business rules.

  3. GetParent: generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents.

The code is all on GitHub and notebooks are available.
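
To give a feel for what the GetParent step described above is doing, here is a toy sketch in plain Python. It is not the ChEMBL implementation: the function name `get_parent`, the `SALTS_AND_SOLVENTS` set and its entries are all hypothetical, and it matches components by naive string comparison on SMILES rather than by real structure handling. It only illustrates the idea of dropping components that appear on a defined list of salts and solvents.

```python
# Hypothetical, tiny list of counter-ions and solvents written as SMILES.
# The real pipeline uses its own curated list; these entries are illustrative only.
SALTS_AND_SOLVENTS = {"[Na+]", "[K+]", "[Cl-]", "Cl", "O"}


def get_parent(smiles: str) -> str:
    """Return the SMILES with listed salt/solvent components removed.

    Components of a multi-component SMILES are separated by ".".
    If every component is on the list, the input is returned unchanged
    so that a compound never disappears entirely.
    """
    components = smiles.split(".")
    kept = [c for c in components if c not in SALTS_AND_SOLVENTS]
    return ".".join(kept) if kept else smiles


print(get_parent("CC(=O)[O-].[Na+]"))  # sodium acetate
print(get_parent("CCN.Cl"))            # ethylamine hydrochloride
print(get_parent("[Na+].[Cl-]"))       # only salt components present
```

Because the comparison is on raw strings, a salt written in a different but equivalent SMILES form would be missed, and nothing here neutralises charges or validates the structure. That gap is a large part of why a proper cheminformatics toolkit and separate Check and Standardize steps are needed.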

