Chemical Structure validation/standardisation by Greg Landrum
The next CICAG Open Source tools workshop is coming up.
21 April: Chemical Structure validation/standardisation (Greg Landrum). Data curation is possibly the most important step in building a model or database. This workshop will deal with chemical structure validation and standardisation.
Registration is free and you will be sent login details at a later date.
These workshops are sponsored by LiverpoolChiroChem.
Some pre-event details for the upcoming workshop
The workshop will be hands-on, so you'll get the most out of it if you have the most recent versions of the RDKit and the ChEMBL Structure Pipeline installed. You can download a conda environment specification here that will create a minimal environment which should have everything you need for the workshop. The conda environment uses Python 3.7, since that's what's available on Google Colab, but you should also be able to use Python 3.8 or 3.9 without problems; just change the Python version spec in the environment file.
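Once the environment is created, a quick check along the following lines can confirm that both packages are visible to Python. This is only a minimal sketch: the helper names `check_installed` and `describe_environment` are made up for illustration, and the only assumption about the packages is that their import names are `rdkit` and `chembl_structure_pipeline`.

```python
import importlib.util
import sys

# Import names of the two packages the workshop relies on (assumed import names).
REQUIRED_MODULES = ("rdkit", "chembl_structure_pipeline")


def check_installed(module_names=REQUIRED_MODULES):
    """Return {module_name: bool} saying whether each module can be found.

    Uses importlib.util.find_spec, so nothing is actually imported.
    """
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


def describe_environment(module_names=REQUIRED_MODULES):
    """Build a short, human-readable report of the Python version and modules."""
    lines = ["Python %d.%d" % sys.version_info[:2]]
    for name, found in check_installed(module_names).items():
        lines.append("%s: %s" % (name, "found" if found else "MISSING"))
    return "\n".join(lines)
```

Calling `print(describe_environment())` inside the activated environment should list both packages as found; anything marked MISSING means the environment needs another look before the workshop.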
I'll be using a sample dataset from PubChem Substance: https://ftp.ncbi.nlm.nih.gov/pubchem/Substance/CURRENT-Full/SDF/Substance_000000001_000500000.sdf.gz
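If you want a first look at the downloaded file before the workshop, the sketch below reads a gzipped SDF record by record using only the Python standard library. It is an illustration rather than the workshop's own code: the function names are hypothetical, and it relies only on the SDF convention that each record ends with a line containing `$$$$`. In practice you would more likely hand the file to the RDKit (for example via `Chem.ForwardSDMolSupplier`), which also parses the structures.

```python
import gzip
import itertools


def iter_sdf_records(handle):
    """Yield SDF records one at a time from a text-mode file handle.

    Each yielded string holds the molblock and data fields of one record,
    without the terminating '$$$$' line.
    """
    lines = []
    for line in handle:
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    # A well-formed SDF ends with '$$$$'; keep a trailing record just in case.
    if any(text.strip() for text in lines):
        yield "".join(lines)


def read_first_records(path, n=5):
    """Return the first n records of a gzipped SDF file as a list of strings."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return list(itertools.islice(iter_sdf_records(handle), n))
```

Because `iter_sdf_records` is a generator, `read_first_records` only decompresses as much of the file as it needs, which is convenient for a file of this size.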
If you have problems creating (or don't want to create) a local environment, I also created a notebook in Google Colab which you can use. The cells at the beginning of this notebook install all required software and download the sample dataset.