FDA small molecules 2019
After I posted "Small molecules approved by FDA in 2019", a number of people contacted me asking for the dataset and then asked how it was created, so I thought I'd put together a brief description of the process.
First, copy and paste the table from the FDA website into a text editor (I used BBEdit), tidy it up, and save it as comma-delimited text.
Open the file in Vortex and you should see something like the image below. Some of the drugs are mixtures of components, e.g. Trikafta is a combination of tezacaftor, ivacaftor and elexacaftor. These rows were duplicated and a single active ingredient was copied into each row.
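I did the row duplication by hand in Vortex, but it could also be scripted. The sketch below is only an illustration using the Python standard library: the column names ("Drug Name", "Active Ingredient") and the separators (commas, semicolons and the word "and") are assumptions about how the tidied CSV might look, not the actual layout of the FDA table.

```python
import csv
import io
import re


def split_combinations(rows, column="Active Ingredient"):
    """Yield one row per active ingredient, copying the other columns."""
    for row in rows:
        # Assumed separators: comma, semicolon, or the whole word "and".
        parts = re.split(r"\s*(?:,|;|\band\b)\s*", row[column])
        for part in parts:
            if part:  # skip empty fragments such as those from ", and"
                new_row = dict(row)
                new_row[column] = part
                yield new_row


# Illustrative input; in practice this would be the CSV saved from the text editor.
sample = io.StringIO(
    "Drug Name,Active Ingredient\n"
    'Trikafta,"tezacaftor, ivacaftor and elexacaftor"\n'
    "Xcopri,cenobamate\n"
)

expanded = list(split_combinations(csv.DictReader(sample)))
for row in expanded:
    print(row["Drug Name"], row["Active Ingredient"])
```

Ingredient names that themselves contain a comma would be split incorrectly by this pattern, so the output is worth checking by eye.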
There are several places you can search for the structures: PubChem, ChemSpider, ChEMBL and the Chemical Identifier Resolver. All have public APIs and I've written a couple of Vortex scripts to access them. In this instance I used the ChemSpider script to retrieve the chemical structures using the active ingredient as the query term; the structure is returned as a SMILES string and automatically rendered as a 2D structure by Vortex.
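For anyone without Vortex, a name-to-SMILES lookup can be sketched in a few lines of Python. This is not the ChemSpider script I used; it swaps in the Chemical Identifier Resolver because its URL scheme needs no API key. The helper names cir_smiles_url and fetch_smiles are made up for this example.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the NCI/CADD Chemical Identifier Resolver.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_smiles_url(name):
    """Build the resolver URL that returns a SMILES string for a name."""
    return f"{CIR_BASE}/{quote(name, safe='')}/smiles"


def fetch_smiles(name, timeout=10):
    """Return the SMILES for a name, or None if the lookup fails."""
    try:
        with urlopen(cir_smiles_url(name), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:
        # Covers HTTP errors (e.g. name not found), network errors and timeouts.
        return None


# Example call (needs network access, so it is left commented out):
# smiles = fetch_smiles("ivacaftor")
```

Names that fail to resolve come back as None, which makes it easy to spot the biologics and any small molecules that need to be looked up by hand.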
The rows containing biologics had no structures, so these rows were hidden and the remaining small molecules were selected. The selected structures were then exported from Vortex in SDF format. The SDF file was imported into a MOE database and the structures were converted to 3D and minimised. The normalised Principal Moments of Inertia were then calculated and the file exported in SDF format again. The physicochemical properties and the plots were then generated in a Jupyter Notebook using RDKit and ChemAxon, with Seaborn for plotting.
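To show what the normalised Principal Moments of Inertia actually are, here is a small pure-Python sketch. It is not the MOE calculation: it simply takes 3D coordinates and masses, builds the inertia tensor about the centre of mass, and returns the two ratios I1/I3 and I2/I3 (with I1 <= I2 <= I3). The function names are invented for this example, and in practice a library routine would normally replace the hand-written eigenvalue step.

```python
import math


def inertia_tensor(coords, masses):
    """Return the 3x3 inertia tensor about the centre of mass."""
    total = sum(masses)
    com = [sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[i] - com[i] for i in range(3))
        t[0][0] += m * (y * y + z * z)
        t[1][1] += m * (x * x + z * z)
        t[2][2] += m * (x * x + y * y)
        t[0][1] -= m * x * y
        t[0][2] -= m * x * z
        t[1][2] -= m * y * z
    # Mirror the upper triangle so the tensor is symmetric.
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t


def symmetric_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order (trigonometric method)."""
    off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if off == 0.0:
        # Already diagonal: the eigenvalues are the diagonal entries.
        return sorted(a[i][i] for i in range(3))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]


def normalised_pmi(coords, masses):
    """Return (I1/I3, I2/I3), the two normalised principal moment ratios."""
    i1, i2, i3 = symmetric_eigenvalues(inertia_tensor(coords, masses))
    if i3 <= 0.0:
        raise ValueError("need at least two atoms at different positions")
    return i1 / i3, i2 / i3


# Idealised shapes with unit masses: a rod, a flat square and an octahedron.
rod = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
disc = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
sphere = disc + [(0, 0, 1), (0, 0, -1)]
for name, pts in [("rod", rod), ("disc", disc), ("sphere", sphere)]:
    print(name, normalised_pmi(pts, [1.0] * len(pts)))
```

The three test shapes land on the corners of the usual PMI triangle plot: rod-like, disc-like and sphere-like. If you already have RDKit molecules with 3D conformers, rdMolDescriptors.CalcNPR1 and CalcNPR2 give the same two ratios directly.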
Last Updated 3 January 2019