ChEMBL Compound Curation Pipeline
With the imminent release of ChEMBL 26, I was interested to hear about the new chemical curation pipeline that has been developed.
The pipeline includes three functions:
Check: Identifies and validates problem structures before they are added to the database.
Standardize: Standardises chemical structures according to a set of predefined ChEMBL business rules.
GetParent: Generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents.
The code is all on GitHub at https://github.com/chembl/ChEMBLStructurePipeline and notebooks are available.
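
To give a feel for how the three functions might be called from Python, here is a minimal sketch. It assumes the chembl_structure_pipeline package and RDKit are installed, and it uses the checker and standardizer calls shown in the project's README. The curate helper and the sodium acetate example are my own illustration, not part of the package, and the guarded import is only there so the snippet fails with a clear message when the package is missing.

```python
# Sketch only: assumes `pip install chembl_structure_pipeline` plus RDKit.
try:
    from chembl_structure_pipeline import checker, standardizer
except ImportError:
    # Package missing: keep the script importable and fail explicitly later.
    checker = standardizer = None


def curate(molblock):
    """Run Check, Standardize and GetParent on one MDL molblock string.

    `curate` is a hypothetical helper name, not part of the package.
    """
    if checker is None or standardizer is None:
        raise RuntimeError("chembl_structure_pipeline is not installed")

    # Check: report structure issues (the README shows penalty/message pairs).
    issues = checker.check_molblock(molblock)
    # Standardize: apply the ChEMBL standardisation rules.
    standardized = standardizer.standardize_molblock(molblock)
    # GetParent: strip salts/solvents; the second value is an exclusion flag.
    parent, excluded = standardizer.get_parent_molblock(molblock)
    return {
        "issues": issues,
        "standardized": standardized,
        "parent": parent,
        "excluded": excluded,
    }


if __name__ == "__main__" and standardizer is not None:
    from rdkit import Chem

    # Illustrative input: sodium acetate, a simple salt form.
    mol = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")
    result = curate(Chem.MolToMolBlock(mol))
    print(result["issues"])
    print(result["parent"])
```

The functions take and return molblock strings, so structures held as SMILES need converting first, as in the example above. The notebooks in the repository are the better guide to what each step actually does.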