Macs in Chemistry

Insanely great science

 

Compiling Plane of Best Fit

I was recently asked about compiling the code for plane of best fit (PBF), an algorithm to quantify and characterize the 3D character of molecules described in Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules, Nicholas C. Firth, Nathan Brown, and Julian Blagg, Journal of Chemical Information and Modeling 2012 52 (10), 2516-252. All of the source code is available from the RDKit repository https://github.com/rdkit/rdkit/tree/master/Contrib/PBF.
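The calculation itself is short. As we understand the paper, a plane is fitted through the heavy atoms of a 3D conformer and the PBF score is the average distance of those atoms from that plane. The sketch below is not the RDKit Contrib code: it assumes plain std::array coordinates rather than RDKit's Point3D, fits the least-squares plane through the centroid using the eigenvector of the coordinate scatter matrix with the smallest eigenvalue, and swaps the Eigen library the RDKit code depends on for a small hand-written Jacobi eigen-solver so that it compiles with no dependencies. The names pbfScore and jacobiEigen3 are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Point = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Diagonalize a symmetric 3x3 matrix with cyclic Jacobi rotations (a stand-in
// for Eigen's solver). On return the diagonal of a holds the eigenvalues and
// column k of v is the unit eigenvector belonging to a[k][k].
void jacobiEigen3(Mat3& a, Mat3& v) {
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j) v[i][j] = (i == j) ? 1.0 : 0.0;
  for (int sweep = 0; sweep < 50; ++sweep) {
    double off = std::fabs(a[0][1]) + std::fabs(a[0][2]) + std::fabs(a[1][2]);
    if (off < 1e-12) break;  // off-diagonal part is negligible: done
    for (int p = 0; p < 2; ++p) {
      for (int q = p + 1; q < 3; ++q) {
        if (std::fabs(a[p][q]) < 1e-15) continue;
        // Rotation angle chosen so that the rotated a[p][q] becomes zero.
        double theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q]);
        double t = (theta >= 0 ? 1.0 : -1.0) /
                   (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
        double c = 1.0 / std::sqrt(t * t + 1.0), s = t * c;
        for (int k = 0; k < 3; ++k) {  // a = a * J
          double akp = a[k][p], akq = a[k][q];
          a[k][p] = c * akp - s * akq;
          a[k][q] = s * akp + c * akq;
        }
        for (int k = 0; k < 3; ++k) {  // a = J^T * a
          double apk = a[p][k], aqk = a[q][k];
          a[p][k] = c * apk - s * aqk;
          a[q][k] = s * apk + c * aqk;
        }
        for (int k = 0; k < 3; ++k) {  // accumulate eigenvectors: v = v * J
          double vkp = v[k][p], vkq = v[k][q];
          v[k][p] = c * vkp - s * vkq;
          v[k][q] = s * vkp + c * vkq;
        }
      }
    }
  }
}

// Hypothetical helper: mean absolute distance of the heavy-atom positions
// from their least-squares plane, which passes through the centroid and whose
// normal is the scatter-matrix eigenvector with the smallest eigenvalue.
double pbfScore(const std::vector<Point>& pts) {
  assert(pts.size() >= 3 && "need at least three atoms to define a plane");
  const double n = static_cast<double>(pts.size());
  Point centroid{0.0, 0.0, 0.0};
  for (const auto& p : pts)
    for (int i = 0; i < 3; ++i) centroid[i] += p[i] / n;
  Mat3 scatter{};  // unnormalized covariance; scaling does not move eigenvectors
  for (const auto& p : pts)
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
        scatter[i][j] += (p[i] - centroid[i]) * (p[j] - centroid[j]);
  Mat3 vecs{};
  jacobiEigen3(scatter, vecs);
  int k = 0;  // index of the smallest eigenvalue, i.e. the plane normal
  for (int i = 1; i < 3; ++i)
    if (scatter[i][i] < scatter[k][k]) k = i;
  double sum = 0.0;
  for (const auto& p : pts) {
    double d = 0.0;  // signed distance = (p - centroid) . normal
    for (int i = 0; i < 3; ++i) d += (p[i] - centroid[i]) * vecs[i][k];
    sum += std::fabs(d);
  }
  return sum / n;
}
```

In practice the vector would hold the heavy-atom coordinates of one conformer, and the score comes out in the same units as the coordinates (Å for coordinates read from an SD file). A perfectly flat molecule scores zero, however it is oriented.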

Compilation should have been straightforward. It requires RDKit and Eigen (http://eigen.tuxfamily.org/) to be installed, and installing both is described on the Cheminformatics on a Mac page.

cd Desktop/pbfstuff/
g++ PBFRDKit.cpp PBFCalculatorForSDF.cpp -L/usr/local/lib -I/usr/local/include/rdkit -I/usr/local/include/eigen3 -DUSE_EIGEN2 -lFileParsers -lGraphMol -lDataStructs -lEigenSolvers -lRDGeometryLib -lRDGeneral -o PBFCalculator

But I got a linker error:

Undefined symbols for architecture x86_64:
  "Invar::operator<<(std::basic_ostream<char, std::char_traits<char> >&, Invar::Invariant const&)", referenced from:
   getBestFitPlane(std::vector<RDGeom::Point3D, std::allocator<RDGeom::Point3D> > const&, std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> > const*) in ccsSXicQ.o
   PBFRD(RDKit::ROMol&, int) in ccsSXicQ.o
....
....
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

After a number of emails, discussions and some experimentation, we finally pinpointed the problem. There seems to be a conflict between the gcc installed by Homebrew and Clang, Apple's default replacement for gcc. Calling Apple's compiler explicitly by its full path, as shown below, lets compilation proceed and creates the PBFCalculator executable.

/usr/bin/g++ PBFRDKit.cpp PBFCalculatorForSDF.cpp -L/usr/local/lib -I/usr/local/include/rdkit -I/usr/local/include/eigen3 -DUSE_EIGEN2 -lFileParsers -lGraphMol -lDataStructs -lEigenSolvers -lRDGeometryLib -lRDGeneral -o PBFCalculator


ls
PBFCalculator       PBFCalculatorForSDF.cpp PBFRDKit.cpp    PBFRDKit.h
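The linker message hints at the likely root cause, although this is our interpretation rather than something confirmed at the time. On OS X 10.9, /usr/bin/g++ runs Apple's Clang, which builds against LLVM's libc++ by default, whereas GCC (including Homebrew's) uses GNU libstdc++. libc++ places its types in the inline namespace std::__1, so if the RDKit libraries were built with Clang they export std::__1::basic_ostream symbols, while code compiled by GCC asks for plain std::basic_ostream, which is exactly the kind of symbol reported as missing above. A quick, dependency-free way to see which library a given compiler uses is to test the macro each library defines; cxxStdlibName is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>  // any standard header pulls in the library's identifying macro

// Hypothetical helper: reports which C++ standard library this file was
// compiled against. LLVM's libc++ defines _LIBCPP_VERSION; GNU libstdc++
// defines __GLIBCXX__.
std::string cxxStdlibName() {
#if defined(_LIBCPP_VERSION)
  return "libc++";
#elif defined(__GLIBCXX__)
  return "libstdc++";
#else
  return "unknown";
#endif
}
```

Printing the result from a one-line main and building the file once with /usr/bin/g++ and once with the Homebrew g++ should show the two compilers disagreeing; PBFCalculator has to be built with whichever one matches the library RDKit itself was compiled against.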

Using PBFCalculator

From within the folder containing the PBFCalculator executable, the following Terminal command is all that is needed to run the application:

./PBFCalculator /Users/username/Desktop/pbfstuff/FragData.sdf

The output is a new file, FragDataScoredPBF.sdf, in which each molecule has a new field called “PBF_Score”. A file containing nearly 1000 fragments ran through the calculation in a couple of seconds. As might be expected, the majority of the fragments were low scoring, i.e. close to planar.

[Figure: distribution of binned PBF scores for the fragment set]
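To build a binned view like this from the output, the scores can be pulled out of FragDataScoredPBF.sdf and counted. This is a minimal sketch that assumes only the standard SD data-item layout, a header line starting with '>' that names the field in angle brackets followed by the value on the next line. readPbfScores and binScores are hypothetical helpers, and the bin width and count are arbitrary choices, not the ones used for the figure.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect the value of every PBF_Score data item in an SD file stream.
// A data-item header starts with '>' and contains the field name in angle
// brackets; the value sits on the line that follows it.
std::vector<double> readPbfScores(std::istream& in) {
  std::vector<double> scores;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] == '>' &&
        line.find("<PBF_Score>") != std::string::npos) {
      if (std::getline(in, line))
        scores.push_back(std::strtod(line.c_str(), nullptr));
    }
  }
  return scores;
}

// Count scores into nBins equal-width bins starting at 0; anything past the
// last bin edge is lumped into the last bin.
std::vector<int> binScores(const std::vector<double>& scores, double binWidth,
                           int nBins) {
  assert(binWidth > 0.0 && nBins > 0);
  std::vector<int> counts(nBins, 0);
  for (double s : scores) {
    double b = s / binWidth;
    int bin = b <= 0.0 ? 0 : (b >= nBins ? nBins - 1 : static_cast<int>(b));
    ++counts[bin];
  }
  return counts;
}
```

Opening the file with std::ifstream and passing the stream to readPbfScores gives the raw scores; clamping the top bin, rather than dropping outliers, keeps every molecule in the counts.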

Comparison with npmi

In the past I’ve used npmi (normalized ratios of principal moments of inertia), as described by Sauer WH, Schwarz MK (2003) Molecular shape diversity of combinatorial libraries: A prerequisite for broad bioactivity. J Chem Inf Comput Sci 43:987–1003, as a shape descriptor. With the principal moments ordered I1 ≤ I2 ≤ I3, npr1 = I1/I3 and npr2 = I2/I3. The scatterplot below plots npr1 versus npr2, colour coded by shape (rod-like = green; disc-like = yellow; sphere-like = red). I’ve also plotted this categorisation in the left-hand histogram, and binned the PBF results in the right-hand histogram.

Selecting the middle bin of the PBF plot highlights a line of points on the npr scatterplot running parallel to its left edge. Whilst both descriptors are intended to provide information on the 3D structure of the molecule, it looks like PBF provides more granularity, which may be particularly useful when looking at small fragments. On the other hand, categories such as rod-like, disc-like and sphere-like are easier to visualise.

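[Figure: PBF versus npr, showing the npr1/npr2 scatterplot with histograms of the shape categories and binned PBF scores]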
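The npr coordinates used in this comparison are simple ratios once the three principal moments of inertia are known. The sketch below assumes the moments have already been obtained (for example as the eigenvalues of the inertia tensor of a 3D conformer) and places a molecule in the shape triangle whose corners are rod (0, 1), disc (0.5, 0.5) and sphere (1, 1). Assigning each molecule to the nearest corner is one simple, hypothetical way to produce rod/disc/sphere categories; it is not necessarily how the colour coding above was done. Npr, nprFromMoments and nprShapeClass are illustrative names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>

struct Npr {
  double npr1;  // I1 / I3
  double npr2;  // I2 / I3
};

// Normalized PMI ratios from three principal moments given in any order.
Npr nprFromMoments(std::array<double, 3> moments) {
  std::sort(moments.begin(), moments.end());  // now I1 <= I2 <= I3
  assert(moments[2] > 0.0 && "largest principal moment must be positive");
  return {moments[0] / moments[2], moments[1] / moments[2]};
}

// Hypothetical categorisation: the nearest corner of the npr triangle.
std::string nprShapeClass(const Npr& n) {
  struct Corner { const char* name; double x, y; };
  const Corner corners[] = {{"rod-like", 0.0, 1.0},
                            {"disc-like", 0.5, 0.5},
                            {"sphere-like", 1.0, 1.0}};
  std::string best;
  double bestD = 1e300;
  for (const auto& c : corners) {
    double dx = n.npr1 - c.x, dy = n.npr2 - c.y;
    double d = dx * dx + dy * dy;  // squared distance is enough to compare
    if (d < bestD) { bestD = d; best = c.name; }
  }
  return best;
}
```

A linear molecule (moments 0, I, I) lands on the rod corner, a symmetric flat disc (I, I, 2I) on the disc corner and a sphere (I, I, I) on the sphere corner. Sorting inside nprFromMoments means callers need not order the moments themselves.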

Last updated 6 August 2014.