Macs in Chemistry

Insanely great science

 

Installing ACPC on a Mac

One of the advantages of using a Mac for science is that you can often make use of the UNIX underpinnings of Mac OS X to run programs written for Linux.

A recent publication in the Journal of Cheminformatics caught my eye. Screening molecules using electrostatics is usually a very time-consuming process, but this publication describes an interesting and very quick way to do it.

A rotation-translation invariant molecular descriptor of partial charges and its use in ligand-based virtual screening. Francois Berenger, Arnout Voet, Xiao Yin Lee and Kam YJ Zhang. Journal of Cheminformatics 2014, 6:23.

Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods including molecular descriptor based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven to be useful in drug discovery. Because of our interest in electrostatics and high throughput ligand-based virtual screening, we sought to exploit the information contained in atomic coordinates and partial charges of a molecule.

Installation of ACPC

The program ACPC is freely available for download, but it is installed using the OCaml package manager OPAM. If you have read my earlier post about using Homebrew (brew) and pip, the installation is, as you might expect, fairly straightforward. We first use brew to install automake and opam, and then use opam to install ACPC.

brew update
brew install automake

brew install opam
opam init
eval `opam config env`
opam update


opam install acpc

Reading through the publication and the README, it seems there are a few dependencies:

Gnuplot, a portable command-line driven graphing utility; CROC, a package for calculating ROC curves and Concentrated ROC (CROC) curves; and sympy, a Python library for symbolic mathematics. These we can install using brew and pip.

brew install gnuplot --cairo --with-x --tests

pip install CROC
pip install sympy
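Before going further, it can be handy to confirm that the command-line tools installed so far are actually on the PATH. The following is only a minimal sketch: check_tools is a hypothetical helper name, not part of ACPC, opam or Homebrew, and it only looks for executables (the Python packages CROC and sympy are not covered).

```shell
# check_tools: report, for each name given, whether an executable is on PATH.
# Hypothetical helper for illustration; not part of ACPC, opam or Homebrew.
check_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# The command-line tools installed in the steps above.
check_tools brew opam acpc gnuplot
```

Anything reported as missing usually means the install step failed or the shell needs restarting so that the opam environment is picked up.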

Now that we have everything in place, we can perform a quick check:

Chrismacpro:~ chrisswain$ acpc -help
2014-05-14 10:17:07.938 INFO : 

Copyright (C) 2014, Zhang Initiative Research Unit,
Institute Laboratories, RIKEN
2-1 Hirosawa, Wako, Saitama 351-0198, Japan

Example: acpc -q query.mol2 -db database.mol2

  -cmp {CC|Tani|Tref|Tdb} LBAC+/- comparison method (default: CC)
  -htq                    list molecules scoring Higher Than the Query with itself
  -q query.mol2           query (incompatible with -qf)
  -qf f                   file containing a list of mol2 files (incompatible with -q)
  -db db.mol2             database
  -dx float               X axis discretization (default: 0.005000)
  -v                      output intermediate results
  -nopp                   don't rm duplicate molecules
  -np nprocs              max CPUs to use (default: 1)
  -ng                     no gnuplot
  -nr                     no ROC curve (also sets -ng)
  -o output.mol2          output file (also requires -top, incompatible with -qf)
  -top N                  nb. best scoring molecules to output (also requires -o)
  -help                   Display this list of options
  --help                  Display this list of options
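The -qf option takes a file containing a list of mol2 files, which should be convenient when there are several query molecules. Below is a minimal sketch for building such a list from a directory. The function name make_query_list is hypothetical, and the one-path-per-line layout is an assumption, since the help text does not spell out the format.

```shell
# make_query_list DIR OUT: write the paths of all .mol2 files in DIR to OUT.
# Assumption: -qf expects one mol2 path per line (not stated in the help text).
make_query_list() {
    for f in "$1"/*.mol2; do
        # Skip the unexpanded pattern when DIR contains no .mol2 files.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$2"
}

# Usage (paths are placeholders; acpc must be installed):
#   make_query_list ./queries queries.txt
#   acpc -qf queries.txt -db database.mol2
```

Note that, according to the help text, -qf cannot be combined with -q or with -o.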

Running a search

One of the authors (Francois Berenger) kindly provided me with a test set.

chrismacpro:DUD_ACPC_1.0_validation chrisswain$ acpc -q /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/probeligand.mol2 -db /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/gpb_ligdecs.25conf.mol2
2014-05-14 12:54:40.219 INFO : 

Copyright (C) 2014, Zhang Initiative Research Unit,
Institute Laboratories, RIKEN
2-1 Hirosawa, Wako, Saitama 351-0198, Japan

2014-05-14 12:54:47.052 INFO : 45449 molecule(s) in /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/gpb_ligdecs.25conf.mol2
2014-05-14 12:54:59.672 INFO : 1 molecule(s) in /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/probeligand.mol2
2014-05-14 12:55:00.198 WARN : removed 43603 molecules (duplicated names)
2014-05-14 12:55:00.198 INFO : writing names, scores and ranks in /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/probeligand.ranks
2014-05-14 12:55:00.201 INFO : writing scores and labels in /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/probeligand.scored-label
2014-05-14 12:55:00.479 INFO : db: /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/gpb_ligdecs.25conf.mol2 q: /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/probeligand.mol2 cmp: CC dx: 0.005000 AUC: 0.853
running:
gnuplot -persist /Users/chrisswain/Downloads/data/berenger/DUD_ACPC_1.0_validation/probeligand.gpl
2014-05-14 12:55:00.495 INFO : speed: 2241.49 molecules/s

The plot below shows the ROC curve for this particular search.

[Figure: gnuplot ROC curve for the probeligand search]

The publication describes the results for searching 40 different targets, comparing the results with those found using other search methods.
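When repeating a search over several targets, it helps to collect the AUC values automatically. The summary line in the log from my run ends with "AUC: 0.853", so a small filter can pull the number out. This is a sketch only: extract_auc is a hypothetical helper, and it assumes the summary line keeps the format seen in that run.

```shell
# extract_auc: read acpc log text on stdin and print the value after "AUC: ".
# Hypothetical helper; assumes the summary line format seen in the run above.
extract_auc() {
    sed -n 's/.*AUC: \([0-9][0-9.]*\).*/\1/p'
}

# Example using the summary line from the run above (paths shortened).
printf '%s\n' 'INFO : db: gpb_ligdecs.25conf.mol2 q: probeligand.mol2 cmp: CC dx: 0.005000 AUC: 0.853' | extract_auc
```

Depending on where acpc writes its log, you may need to redirect standard error (2>&1) before piping the output into the filter.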

A few things to note

The recommended charge models are MOE's MMFF94x or, as a fallback, Open Babel's Gasteiger. It is important to generate the query and the search database using the same software.

Don't run two queries at the same time with the same query molecule file (-q SOMEQUERY.mol2), whether on the same computer or on top of NFS. The result files are named after the query file (SOMEQUERY.ranks, SOMEQUERY.scores and SOMEQUERY.scored-label), so concurrent runs may overwrite each other's results in strange ways.
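One way to sidestep the problem is to give every run its own copy of the query file. In my run above the result files were written next to the query, so a private copy in a fresh directory should keep the outputs apart. The sketch below assumes that behaviour holds in general; private_query is a hypothetical helper name.

```shell
# private_query FILE: copy FILE into a new, unique directory and print the
# path of the copy. Hypothetical helper; assumes acpc writes its result files
# next to the query file, as seen in the log above.
private_query() {
    pq_dir=$(mktemp -d "${TMPDIR:-/tmp}/acpc_run.XXXXXX") || return 1
    cp "$1" "$pq_dir/" || return 1
    printf '%s\n' "$pq_dir/$(basename "$1")"
}

# Usage (file names are placeholders; acpc must be installed):
#   q=$(private_query probeligand.mol2)
#   acpc -q "$q" -db database.mol2
```

Each call creates a new directory, so two runs started with the same original query end up with separate .ranks and .scored-label files.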

I did a search on a file containing 500,000 structures and it consumed 24 GB of RAM. For larger datasets you can use acpc_big, which has a simpler output but requires less RAM.
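To judge in advance whether a database might need acpc_big, it helps to know how many molecules it contains. In the mol2 format each molecule record starts with an @<TRIPOS>MOLECULE line, so counting those lines gives the number of records (including every conformer). The sketch below also scales my single data point (500,000 structures, 24 GB) linearly. That linear scaling is purely an assumption for a rough guess, and both function names are hypothetical.

```shell
# count_mol2 FILE: number of molecule records in a mol2 file, counted as
# lines starting with @<TRIPOS>MOLECULE.
count_mol2() {
    # grep exits non-zero when the count is 0; "|| true" keeps that harmless.
    grep -c '^@<TRIPOS>MOLECULE' "$1" || true
}

# estimate_ram_gb N: rough RAM guess in GB for N records.
# Assumption: memory grows linearly from the one observation of
# 500,000 structures using 24 GB; real usage may differ considerably.
estimate_ram_gb() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 24 / 500000 }'
}

# Usage (file name is a placeholder):
#   n=$(count_mol2 database.mol2)
#   echo "$n records, roughly $(estimate_ram_gb "$n") GB"
```

If the estimate is close to the memory available on your machine, acpc_big is probably the safer choice.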

Last updated 16 May 2014.