Virtual Chemical Libraries
A very interesting paper on virtual chemical libraries by W. Patrick Walters (DOI) describes how it is now possible to generate virtual libraries containing billions of molecules. Libraries this large pose a number of practical challenges, in particular for their use in virtual screening.
Consider a virtual screen with a false positive rate of 1% (an optimistic estimate even for the best virtual screening methods). Applied to a library of 1 million molecules, almost all of which will be inactive, it would yield roughly 10,000 false positive hits. (A “false positive” is an inactive molecule that is predicted to be active.)
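The arithmetic above scales linearly with library size, which is why billion-molecule libraries make the problem so much worse. The short sketch below assumes that the false positive rate applies uniformly to every inactive molecule. The function name and the `fraction_active` parameter are hypothetical, introduced only for illustration.

```python
def expected_false_positives(library_size: int,
                             false_positive_rate: float,
                             fraction_active: float = 0.0) -> float:
    """Expected number of inactive molecules predicted to be active.

    Assumes the false positive rate applies uniformly to every inactive
    molecule. fraction_active is a hypothetical parameter; it defaults
    to 0, treating (almost) the whole library as inactive.
    """
    inactive = library_size * (1.0 - fraction_active)
    return inactive * false_positive_rate


# Compare a 1-million-molecule library with a 1-billion-molecule one.
for size in (1_000_000, 1_000_000_000):
    fp = expected_false_positives(size, 0.01)
    print(f"{size:>13,} molecules -> {fp:,.0f} false positives")
```

At the same 1% rate, a library of a billion molecules would produce around 10 million false positives, far more than could ever be purchased or assayed.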
Another consideration with very large virtual libraries is the time and CPU resources required to process them. Substructure and 2D similarity searches are very fast and can make use of hashed fingerprints, whereas 3D or docking searches are orders of magnitude slower and require either storing multiple conformations of each ligand or generating conformations on the fly. Realistically, these searches require access to large compute clusters; cloud-based resources are now relatively accessible, but using them efficiently and securely requires significant expertise.
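To illustrate why fingerprint-based 2D similarity searches are so cheap, here is a toy sketch. Each molecule is reduced to a fixed-length bit vector, and comparing two molecules then takes just a bitwise AND, an OR and a bit count (the Tanimoto coefficient). The "features" used here are simply overlapping 3-character substrings of a SMILES string, hashed with `zlib.crc32`. This is an assumption made purely for illustration: real cheminformatics toolkits hash chemically meaningful features such as atom paths or circular atom environments. The names `hashed_fingerprint`, `tanimoto` and `N_BITS` are hypothetical.

```python
import zlib

N_BITS = 1024  # fingerprint length; an illustrative choice


def hashed_fingerprint(smiles: str, n: int = 3, n_bits: int = N_BITS) -> int:
    """Toy hashed fingerprint stored as a Python int used as a bit vector.

    Every length-n substring of the SMILES string is treated as a
    "feature" and hashed (crc32, deterministic across runs) to one bit.
    Real toolkits hash chemical substructures, not raw substrings.
    """
    bits = 0
    for i in range(len(smiles) - n + 1):
        feature = smiles[i:i + n].encode()
        bits |= 1 << (zlib.crc32(feature) % n_bits)
    return bits


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared set bits / total set bits."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0
    return bin(fp_a & fp_b).count("1") / union


# Rank a tiny "library" by similarity to aspirin.
query = hashed_fingerprint("CC(=O)Oc1ccccc1C(=O)O")
library = {
    "salicylic acid": "OC(=O)c1ccccc1O",
    "benzene": "c1ccccc1",
    "ethanol": "CCO",
}
for name, smi in library.items():
    print(f"{name:15s} {tanimoto(query, hashed_fingerprint(smi)):.2f}")
```

Fingerprints can be precomputed once and stored. A similarity search is then a linear scan of a few integer operations per molecule. Docking, by contrast, has to sample conformations and poses for every molecule.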
Even the fastest docking programs need around 2 seconds per molecule to dock an ensemble of conformations into a protein binding site. At this rate, docking 680 million molecules would take about 1.36 billion CPU seconds, or approximately 15,700 CPU days (roughly 43 CPU years).
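The back-of-the-envelope estimate above can be reproduced in a few lines. The same sketch also shows why large clusters are needed. It assumes perfect parallel scaling across cores, which is an idealisation that ignores scheduling, I/O and conformer-generation overheads. The core counts and function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def docking_cpu_days(n_molecules: int, seconds_per_molecule: float) -> float:
    """Total single-CPU time, in days, to dock n_molecules one after another."""
    return n_molecules * seconds_per_molecule / SECONDS_PER_DAY


def wall_clock_days(cpu_days: float, n_cores: int) -> float:
    """Idealised wall-clock time assuming perfect scaling over n_cores."""
    return cpu_days / n_cores


total = docking_cpu_days(680_000_000, 2.0)
print(f"{total:,.0f} CPU days")  # prints "15,741 CPU days"

# Hypothetical cluster sizes, to show the effect of parallelism.
for cores in (1_000, 10_000):
    print(f"{cores:,} cores -> {wall_clock_days(total, cores):.1f} days")
```

Even with perfect scaling, a 1,000-core cluster would still need a couple of weeks for a single screen of this size.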