Macs in Chemistry

Insanely Great Science

Unix commands for helping deal with very large files


I regularly handle very large files containing millions of chemical structures, and whilst BBEdit is my usual tool for editing text files, in practice it becomes rather cumbersome for really large files (> 2 GB). For these cases I've compiled a list of useful UNIX commands that make life easier.

The page is part of the Hints and Tutorials section and can be viewed here.

Whilst I use them when dealing with large chemical structure files they are equally useful when dealing with any large text or data files. If anyone has any additional suggestions please feel free to submit them.


Implementing AB-MPS scoring


Whilst the rule of 5 (Ro5) has provided a useful way to describe small-molecule drug space, it is also clear that a significant number of molecular classes exist beyond the rule of 5 boundaries (bRo5). In a review of the AbbVie compound collection DOI the authors identified key findings that might explain the success (or failure) of bRo5 projects. From an analysis of a variety of calculated physicochemical properties they devised a simple multiparametric scoring function (AB-MPS) that correlated preclinical PK results with cLogD, number of rotatable bonds (NRB), and number of aromatic rings (NAR).

AB-MPS = Abs(cLogD-3) + NAR + NRB

Now implemented as a Vortex script.
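
For anyone who wants to try the score outside Vortex, the formula above can be sketched in a few lines of plain Python. This is only an illustration, not the Vortex script itself: the function name `ab_mps`, the compound names and the descriptor values are all made up for the example, and cLogD, NAR and NRB are assumed to have been calculated already with whatever tool you normally use.

```python
def ab_mps(clogd, n_aromatic_rings, n_rotatable_bonds):
    """Return AB-MPS = Abs(cLogD - 3) + NAR + NRB."""
    return abs(clogd - 3.0) + n_aromatic_rings + n_rotatable_bonds


# Hypothetical precomputed descriptors: (cLogD, NAR, NRB)
compounds = {
    "cmpd_A": (3.0, 2, 5),
    "cmpd_B": (5.5, 4, 10),
}

# Score every compound and print the results
scores = {name: ab_mps(*values) for name, values in compounds.items()}
for name, score in scores.items():
    print(f"{name}: AB-MPS = {score:.1f}")
```

Note that the cLogD term is symmetric around 3, so a compound with cLogD 1 is penalised exactly as much as one with cLogD 5.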


Chemical Information and Computer Applications Group (CICAG) website


The new RSC CICAG website is now live. Why not have a look and provide suggestions and feedback?


The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community.

The site also provides links to the group's social media feeds (Twitter, LinkedIn, etc.).


Intel® Distribution for Python


Anyone fancy taking this for a test drive and providing some information on performance?

Get real performance results and download the free Intel Distribution for Python that includes everything you need for blazing-fast computing, analytics, machine learning, and more. Use Intel Python with existing code, and you’re all set for a significant performance boost.

The core computing packages, NumPy, SciPy, and scikit-learn, are accelerated under the hood with powerful, multithreaded native performance libraries such as Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, and others, to deliver native code-like performance results to Python. We leverage Intel® hardware capabilities using multiple cores and the latest Intel® Advanced Vector Extensions (Intel® AVX) instructions, including Intel® AVX-512. The Intel Python team reimplemented select algorithms to dramatically improve their performance. Examples include NumPy FFT and random number generation, SciPy FFT, and more.

Available for Windows, Linux and macOS.

Minimum System Requirements

  • Processors: Intel Atom® processor or Intel® Core™ i3 processor
  • Disk space: 1 GB
  • Operating systems: Windows 7 or later, macOS, and Linux
  • Python versions: 2.7.X, 3.5.X, 3.6
  • Included development tools: Conda, conda-env, Jupyter Notebook (IPython)


Diversity Genie


Diversity Genie is a desktop software tool for analyzing and manipulating chemical data. Its capabilities include:

  • Mapping molecules and their properties with Sammon embedding.

  • Filtering and converting sets of molecules in SDF, SMILES, and InChI formats.

  • Plotting histograms, scatter plots, and ROC curves.

  • Computing well-known molecular properties and merging CSV files.

  • Creating machine learning models using powerful gradient boosting methods.

Diversity Genie 3 is completely free to use by academia and for personal non-commercial use. You can download Mac OSX, Windows and Linux builds at