Macs in Chemistry

Insanely Great Science

Unix commands for helping deal with very large files


I regularly handle very large files containing millions of chemical structures, and whilst BBEdit is my usual tool for editing text files, in practice it becomes rather cumbersome for really large files (> 2 GB). For these cases I've compiled a list of useful UNIX commands that make life easier.

The page is part of the Hints and Tutorials section and can be viewed here.

Whilst I use them when dealing with large chemical structure files they are equally useful when dealing with any large text or data files.


A suggestion from a reader. Sometimes, rather than one large file, download sites provide the data as a large number of individual files. We can keep track of the number of files using this simple command.

MacPro:~ Chris$ ls | wc -l
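If you are already working in Python, the same count can be sketched with the standard library. This is only an illustration: the function name `count_entries` and its `include_hidden` argument are my own. Like a plain `ls`, it skips names that start with a dot unless asked otherwise, and it counts entries as they are read rather than building a sorted listing first.

```python
import os


def count_entries(directory=".", include_hidden=False):
    """Count the entries in a directory, streaming them one at a time."""
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Plain `ls` hides names starting with a dot, so mirror that by default.
            if include_hidden or not entry.name.startswith("."):
                count += 1
    return count


# Report the number of visible entries in the current directory.
print(count_entries("."))
```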

If anyone has any additional suggestions please feel free to submit them.


Implementing AB-MPS scoring


Whilst the rule of 5 (Ro5) has provided a useful way to describe small molecule drug space, it is also clear that a significant number of molecular classes exist beyond the rule of 5 boundaries (bRo5). In a review of the AbbVie compound collection (DOI), the authors identified key findings that might explain the success (or failure) of bRo5 projects. From an analysis of a variety of calculated physicochemical properties they devised a simple multiparametric scoring function (AB-MPS) that correlated preclinical PK results with cLogD, number of rotatable bonds (NRB), and number of aromatic rings (NAR).

AB-MPS = Abs(cLogD-3) + NAR + NRB

Now implemented as a Vortex script.
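Outside Vortex, the formula above is easy to sketch in plain Python. This is not the Vortex script itself; the function and argument names are hypothetical, and the three descriptor values are assumed to have been calculated already by whatever toolkit you use.

```python
def ab_mps(clogd, n_aromatic_rings, n_rotatable_bonds):
    """AB-MPS = Abs(cLogD - 3) + NAR + NRB, from precomputed descriptors."""
    return abs(clogd - 3) + n_aromatic_rings + n_rotatable_bonds


# Hypothetical compound: cLogD 4.5, 3 aromatic rings, 8 rotatable bonds.
print(ab_mps(4.5, 3, 8))
```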


Chemical Information and Computer Applications Group (CICAG) website


The new RSC CICAG website is now live; why not have a look and provide suggestions and feedback?


The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community.

The site also provides links to the social media feeds (Twitter, LinkedIn etc.).


Intel® Distribution for Python


Anyone fancy taking this for a test drive and providing some information on performance?

Get real performance results and download the free Intel Distribution for Python that includes everything you need for blazing-fast computing, analytics, machine learning, and more. Use Intel Python with existing code, and you’re all set for a significant performance boost.

The core computing packages, NumPy, SciPy, and scikit-learn, are accelerated under the hood with powerful, multithreaded native performance libraries such as Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, and others, to deliver native code-like performance results to Python. We leverage Intel® hardware capabilities using multiple cores and the latest Intel® Advanced Vector Extensions (Intel® AVX) instructions, including Intel® AVX-512. The Intel Python team reimplemented select algorithms to dramatically improve their performance. Examples include NumPy FFT and random number generation, SciPy FFT, and more.

Available for Windows, Linux and macOS.

Minimum System Requirements

  • Processors: Intel Atom® processor or Intel® Core™ i3 processor
  • Disk space: 1 GB
  • Operating systems: Windows 7 or later, macOS, and Linux
  • Python versions: 2.7.X, 3.5.X, 3.6
  • Included development tools: Conda, conda-env, Jupyter Notebook (IPython)


Diversity Genie


Diversity Genie is a desktop software tool which allows you to analyze and manipulate chemical data. Its capabilities include:

  • Mapping molecules and their properties with Sammon embedding.

  • Filtering and converting sets of molecules in SDF, SMILES, and InChI formats.

  • Plotting histograms, scatter plots, and ROC curves.

  • Computing well-known molecular properties and merging CSV files.

  • Creating machine learning models using powerful gradient boosting methods.

Diversity Genie 3 is completely free to use by academia and for personal non-commercial use. You can download Mac OSX, Windows and Linux builds at



CCP4 release 7.0 update 056 now available


Collaborative Computational Project No. 4 (CCP4) exists to produce and support a world-leading, integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography, and other biophysical techniques.

Details of the latest update are here


Google Summer of Code, Open Chemistry Projects


The details of some of the projects taking part in the Google Summer of Code are now online here under the Open Chemistry header.

Really interesting work includes 3-D coordinate generation, standardising fingerprint APIs, a framework for molecular validation, and standardization and molecular dynamics in Avogadro.

Good luck to all that are taking part!!


deMon2k code version 5 released


deMon (density of Montréal) is a software package for density functional theory (DFT) calculations. It uses the linear combination of Gaussian-type orbital (LCGTO) approach for the self-consistent solution of the Kohn-Sham (KS) DFT equations. The calculation of the four-center electron repulsion integrals is avoided by introducing an auxiliary function basis for the variational fitting of the Coulomb potential.

The user guide provides installation instructions; the build requires a Fortran compiler, BASH and MPI.


ChemDoodle 9.0 released


I just saw that ChemDoodle 9.0 has been released and I plan to have a detailed look later this month.

ChemDoodle 9 is a major revision of every aspect of the software. We spent over 2 years overhauling and improving the cheminformatics engine, interface, drawing controls, image and chemical file types, graphics, and operating system compatibility. In addition to the new features, the entire codebase has been refactored for the current best standards to take advantage of the latest performance, memory and security features of the operating system.

What is new in ChemDoodle 9

  • A new user manual discusses all the new features in detail over several pages, too many to list here. (click to load manual, section 1.2)
  • Drawing and Graphics – Tons of new systems for making your graphics quicker. Auto-placement of attributes (charges/radicals/stereocenters/etc.). An improved text tool that can create both atom text and formatted captions. Draw chiral carbon nanotubes in addition to zigzag and armchair. New dynamic brackets and structure highlights. Better drawing tools for advanced figures.
  • Chemistry – State-of-the-art implementation of the most recent CIP rules. A clearer and more powerful warning system. Advanced implicit hydrogen handling including the analysis of advanced aromatic resonance systems. Full support for the latest elements as defined by IUPAC and much more!
  • Interface – A brand new customizable cursor system, improved IUPAC name-to-structure interface. Improved color palettes, now with Rasmol, CPK and Custom color sets. HTTPS support for PubChem is now implemented for access in MolGrabber. Improved color choosers including alpha support and high resolution improvements across the entire application.
  • Chemical Files – The Nature style sheet has been added. SMILES interpretation has seen significant work, with a focus on very advanced cheminformatics techniques. Added support for the RCSB MacroMolecular Transmission Format (MMTF). More support for ChemDraw, MDL CT, MRV and ISIS/Sketch files.
  • Images – TIFF images can now be exported with custom DPI settings. GIF image output can now have semi-transparent pixels merged with white. Added viewBox attribute for SVG. When saving files, you can now use alternate extensions and other image file chooser improvements. Control which image file types are shown in the save image choosers.
  • Vector Art – New glassware graphics have been added as well as dozens of new BioArt.
  • Customizability – The keyboard and tools shortcuts are now fully customizable by the user. The user settings folder location can now be controlled. Custom attribute names and values are now persisted through restarts.
  • Windows – Full support for high-DPI screens, without the manual scaling required in the past. The OLE plugin has been rebuilt for the most current compliance with Windows libraries.
  • macOS – Improved and full Retina support. Native file choosers.


Jupyter and Fortran


Well, after my last post about Swift and Jupyter, a reader sent me a link to the use of both the Julia and Fortran programming languages in a Jupyter Notebook.


There is more information in this lecture, "Project Jupyter: Architecture and Evolution of an Open Platform for Modern Data Science", by Fernando Perez.

Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to provide tools for human-in-the-loop interactive computing. It provides protocols, file formats, libraries and user-facing tools optimized for the task of humans interactively exploring problems with the aid of a computer, combining natural and programming languages in a common computational narrative.


Swift 4.1 in a Jupyter Notebook


I'm a great fan of Jupyter Notebooks but I only ever use Python.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

A recent post by Ray Yamamoto Hilton caught my eye; he put together a little experiment to demonstrate using Swift 4.1 from within Jupyter Notebooks.

You can download a demo notebook here.



Amber 18 and AmberTools 18 released


Amber is a suite of biomolecular simulation programs. It began in the late 1970s, and is maintained by an active development community.

Amber 18 major new features include:

  • Free energy calculations on GPUs
  • GPU support for 12-6-4 ion potentials
  • Domain decomposition for CPU-parallelism
  • Nudged elastic band calculations for pmemd (CPU and partial GPU implementation)
  • Constant redox potential calculations, to supplement constant pH simulations
  • Support and significant performance improvements for the latest Maxwell, Pascal and Volta GPUs from NVIDIA.
  • New pmemd.gem code for advanced force fields, including AMOEBA

AmberTools 18 new features include:

  • CUDA-enabled pbsa solver; extensions for membrane modeling with PB
  • lambda-dynamics method for constant pH simulations
  • packmol_memgen tool for building lipids and bilayers
  • New ("middle") integration algorithms in sander
  • Build tools based on CMake
  • Continued updates and extensions to cpptraj: ability to obtain energies from snapshots of PME simulations; pairlist and other speedups; improved scripting abilities

Instructions for installing Amber under Mac OSX are here.

You will need to install gfortran. Whilst you can download the binary, it might be worth considering using Homebrew as described here.


NWChem updated


Just catching up.

NWChem 6.8 is now available on GitHub.

NWChem provides many methods for computing the properties of molecular and periodic systems using standard quantum mechanical descriptions of the electronic wavefunction or density. Its classical molecular dynamics capabilities provide for the simulation of macromolecules and solutions, including the computation of free energies using a variety of force fields. These approaches may be combined to perform mixed quantum-mechanics and molecular-mechanics simulations.

Instructions are available for compiling NWChem on various platforms, including Mac OSX.


STK: A Python Toolkit for Supramolecular Assembly


I bookmarked this paper a while back but have only just had time to read it through: STK: A Python Toolkit for Supramolecular Assembly. STK is a tool for the automated assembly, molecular optimization and property calculation of supramolecular materials. It has a simple Python API and integration with third party computational codes.

The source code of the program can be found at and the detailed documentation is here.

Additional linking functional groups can be defined as SMARTS and STK can be extended by adding additional optimisation force-fields.



Top 12 unix commands for data scientists.


A really useful post on KDnuggets.

With the beautiful intuitive interface it is sometimes easy to forget that Mac OS X has Unix underpinnings and that the Terminal gives access to a whole set of invaluable tools.

This post is a short overview of a dozen Unix-like operating system command line tools which can be useful for data science tasks. The list does not include any general file management commands (pwd, ls, mkdir, rm, ...) or remote session management tools (rsh, ssh, ...), but is instead made up of utilities which would be useful from a data science perspective, generally those related to varying degrees of data inspection and processing. They are all included within a typical Unix-like operating system as well.

If you regularly have to deal with very large data files some of these commands will be invaluable, for example:

head outputs the first n lines of a file (10, by default) to standard output. The number of lines displayed can be set with the -n option.

head -n 5 myfile.txt
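The same idea carries over to Python when you want to peek at a huge file from inside a script. A minimal sketch, with function names of my own choosing: `itertools.islice` stops reading after n lines, so the rest of the file is never loaded into memory.

```python
from itertools import islice


def first_lines(lines, n=10):
    """Return the first n items from any iterable of lines (e.g. an open file)."""
    return list(islice(lines, n))


def head(path, n=10, encoding="utf-8"):
    """Rough equivalent of `head -n`: read only the first n lines of a text file."""
    # errors="replace" keeps going if the file contains odd bytes.
    with open(path, encoding=encoding, errors="replace") as handle:
        return first_lines(handle, n)
```

For example, `head("myfile.txt", 5)` would return the first five lines as a list of strings, newline characters included.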

Read more here.


Review of MOE 2018.01


The 2018.01 release of Chemical Computing Group's Molecular Operating Environment (MOE) software includes a number of new features, enhancements and changes. I've written a review that highlights a number of the features.


Read more here….


Roundtrip editing with ChemDraw 17.1


Whenever there is an update to ChemDraw I always hold my breath to see if round-trip editing (i.e. the ability to copy and paste from a chemical drawing package into Word for example and then be able to copy and paste the structure back from Word into the chemical drawing application) has been broken.

Fortunately this blog post provides an invaluable update to the current situation.


RDKit code changes


I just saw this on the RDKit email circulation list and since I know a number of readers use RDKit I thought I'd mention it.

When we do the beta for the 2018.03.1 release we're going to switch the C++ backend to use modern C++ (=C++11). For people who can't switch to use that code, we will continue to provide bug fixes for the 2017.09 release for at least another 6 months.

This should only affect people who need to build the RDKit C++ code themselves. If you use a binary version of the RDKit like the ones available inside of Anaconda Python or KNIME, this change should have no impact upon you.

It looks like we're almost there. Hopefully we will be able to do a beta of the 2018.03 release by the end of the week.


Updated Literature search script


I've updated the Vortex script to run text-based queries of PubMed.

If you regularly use the E-utilities API you might want to read this.

After May 1, 2018, NCBI will limit your access to the E-utilities unless you have one of these keys. Obtaining an API key is quick, and simple, and will allow you to access NCBI data faster. If you don’t have an API key, E-utilities will still work, but you may be limited to fewer requests than allowed with an API key.

After May 1, 2018, any computer (IP address) that submits more than 3 E-utility requests per second will receive an error message. This limit applies to any combination of requests to EInfo, ESearch, ESummary, EFetch, ELink, EPost, ESpell, and EGquery.
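One way a script could stay under a limit of three requests per second is a small client-side throttle. The sketch below is only an illustration: the class name and its arguments are my own, and the clock and sleep functions are injectable purely so the logic can be checked without real waiting.

```python
import collections
import time


class RequestThrottle:
    """Block so that no more than max_calls happen in any period-second window."""

    def __init__(self, max_calls=3, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = collections.deque()  # times of the most recent requests

    def wait(self):
        """Call immediately before each request; returns once it is safe to send."""
        while True:
            now = self._clock()
            # Forget requests that have fallen out of the rolling window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Window is full: sleep until the oldest request expires, then re-check.
            self._sleep(self.period - (now - self._stamps[0]))
```

In use you would create one `RequestThrottle()` for the whole script and call its `wait()` method before every E-utilities request.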

If you write software or scripts that access the E-utilities API then the users will need to get their own API key. Calls will have this format

I've updated this script to reflect this change, and I've highlighted where you need to add your API key in the script. I've also tried to ensure that any query string is encoded to make it URL safe, and I've extended the search range up to 2018.
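As a rough sketch (not the Vortex script itself), an ESearch call carrying an API key can be assembled with the Python standard library, with `urlencode` making the query string URL safe. The helper name `build_esearch_url` and the placeholder key are my own; the endpoint and the `db`, `term`, `retmax`, `datetype`, `mindate`, `maxdate` and `api_key` parameters follow NCBI's E-utilities documentation. The function only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, api_key, db="pubmed", mindate=None, maxdate=None, retmax=20):
    """Build an ESearch URL with the query string safely encoded."""
    params = {"db": db, "term": term, "retmax": retmax, "api_key": api_key}
    if mindate and maxdate:
        # Restrict by publication date, e.g. mindate="2000", maxdate="2018".
        params.update({"datetype": "pdat", "mindate": mindate, "maxdate": maxdate})
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder; substitute the key from your NCBI account.
print(build_esearch_url("kinase inhibitor[Title]", "YOUR_API_KEY", maxdate="2018", mindate="2000"))
```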



iRASPA: GPU-accelerated visualization software for materials scientists


A recent publication (DOI) describes a new application for materials science.

A new macOS software package, iRASPA, for visualisation and editing of materials is presented. iRASPA is a document-based app that manages multiple documents with each document containing a unique set of data that is stored in a file located either in the application sandbox or in iCloud drive. The latter allows collaboration on a shared document (on High Sierra). A document contains a gallery of projects that show off the main features, a CloudKit-based access to the CoRE MOF database (approximately 8000 structures), and local projects of the user. Each project contains a scene of one or more structures that can initially be read from CIF, PDB or XYZ-files, or made from scratch. Main features of iRASPA are: structure creation and editing, pictures and movies, ambient occlusion and high-dynamic range rendering, collage of structures, (transparent) adsorption surfaces, cell replicas and supercells, symmetry operations like space group and primitive cell detection, screening of structures using user-defined predicates, and GPU-computation of helium void fraction and surface areas in a matter of seconds. Leveraging the latest graphics technologies like Metal, iRASPA can render hundreds of thousands of atoms (including ambient occlusion) with stunning performance.


iRASPA is available from the Mac App Store.


SeeSAR updated


A new version of SeeSAR (7.3) is available. This update includes:

  • Easy mode switching: from the molecules table to the editor or the inspirator and back in just one click...
  • Automated workflows: in the settings you can now decide about which calculations should happen automatically
  • Menus re-organized: buttons are grouped for better overview and almost all table entries obtained a convenient context menu, simply right-click to give it a try
  • Excel export: this is one of the rather hidden Easter Eggs. Besides SDF you may save tables now as XLSX (including the 2D depiction)
  • Saved settings: user settings (the layout, background color, etc.) are now saved separately from project settings (filters and visualization features)

Full release notes are available.


RDKit in SAMSON


I've posted about Samson a couple of times and it just keeps getting better and better.

SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and means "Software for Adaptive Modeling and Simulation Of Nanosystems".

A recent blog post highlights the use of RDKit in Samson.

In this post I will present you the RDKit-SMILES Manager module that I integrated in the SAMSON platform. As some of you know, RDKit is an open source toolkit for cheminformatics which is widely used in the bioinformatics research. One of its features is the conversion of molecules from their SMILES code to a 2D and 3D structures. Thanks to the new SAMSON Element, it is now possible to use these features in the SAMSON platform. SMILES code files (.smi) or text files (.txt) containing several SMILES codes can be read using the import button.

The new module allows you to import a file containing SMILES strings, generate 2D depictions, and by right-clicking on these images, you can open, generate the 3D structure in SAMSON or save the image as png or svg.


It is also possible to run substructure searching using SMARTS.


Rodeo: A Python IDE for Data Scientists


I've just added Rodeo, a Python IDE built for analysing data, to the page of data analysis tools.



Introducing IBM Watson Services for Core ML


This should be an interesting development for those developing scientific apps for iOS: the ability to access IBM Watson capabilities.

With Watson Services for Core ML, it’s easy to build apps that access powerful Watson capabilities right from iPhone and iPad, so you can provide dynamic, intelligent insights that improve over time. And with the IBM Cloud Developer Console for Apple, you can quickly tap into Watson Services for Core ML and other services on IBM Cloud

To get you started there is a project on GitHub

Classify images with Watson Visual Recognition and Core ML. The images are classified offline using a deep neural network that is trained by Visual Recognition.

There is a database of Mobile apps for science.


Chemistry WebVR:- This is so cool


Jonas Bostrom, who spoke at the Chemistry on Mobile Devices Meeting, just sent me a link to EduChem VR - WebVR, highlighting the use of virtual reality in chemistry.

"Chemistry WebVR" is web-based platform to learn about organic chemistry. You can experience important concepts like stereochemistry, molecular geometries, atom orbitals or reactions mechanisms in a virtual reality. It is userfriendly and works direct in your smartphone browser. The target is University courses and advanced high-school levels.

There is a demo of an SN2 reaction here, and if you explore you will see a link to sign up as a beta tester.


mmpdb: An Open Source Matched Molecular Pair Platform for Large Multi-Property Datasets


An interesting paper on ChemRxiv (DOI).

Matched Molecular Pair Analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property datasets. Here we present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from


NMR solvent peaks


I just noticed this mentioned on Twitter and so I've added it to the Mobile Science site.

NMR Solvent peaks is a conveniently-searchable version of the ungainly table of NMR data most organic chemists keep a copy of nearby. Instead of searching through the table for a peak near your unidentified peak, just enter your solvent and the peak's multiplicity and location and you'll have a short list of candidate impurities

There is also a web-based version and a twitter feed for submitting bugs and finding out about updates.

There are a number of other NMR apps available


WWDC 2018


The Apple Worldwide Developers Conference takes place in San Jose, CA, June 4–8. The opportunity to buy tickets to WWDC18 is offered by random selection. Registration is open until Thursday, March 22, 2018 at 10:00 a.m. PDT

To register, you must be a member of the Apple Developer Program or Apple Developer Enterprise Program as of March 13, 2018 at 10:00 a.m. PDT, and agree to the WWDC18 Registration and Attendance Policy. Your membership must be current, valid, and in good standing from this date until the end of WWDC18.


Flagging Potential Kinase Inhibitors


Most kinase inhibitors bind in the region of the ATP binding site, using the hydrogen bonding interactions of the hinge region shown in the schematic below. We can use the knowledge of these hinge binding motifs to flag potential kinase inhibitors.




BBEdit 12.1.2 Released


BBEdit 12.1.2 is a minor update to my favourite text editor.

From the release notes.

There's a new item in the Application preferences, as part of the software update settings: "Early Access". You can use this to turn on (or off) notification of pre-release maintenance updates for the version of BBEdit that you're using. (Note that even if you turn on Early Access, you will not receive notice of pre-release versions of feature updates or major upgrades.)

A new setting in the "Editing" preferences allows you to control whether tick marks appear in the scroll bar for Live Search matches. Turning this off can be useful if you're working in very large files and have so many results that the application stalls while trying to update the marks.

There are also a number of bug fixes including.

Fixed bug in which the Markdown tokenizer was confused by empty URL references (e.g. ) in such a way that editing in certain subsequent parts of the file would cause syntax coloring to get out of whack. This change also fixes a bug in the Markdown syntax coloring in which links with an empty description or URL were not properly recognized and colored.

BBEdit 12.1.2 requires Mac OS X 10.11.6 or later, and is compatible with macOS 10.13 "High Sierra"

I use BBEdit extensively for Markdown editing but there are a number of alternatives.


Top 20 programming languages


RedMonk have published their Programming Language Rankings. The data source used for these queries is the GitHub Archive.

  1. JavaScript
  2. Java
  3. Python
  4. PHP
  5. C#
  6. C++
  7. CSS
  8. Ruby
  9. C
  10. Swift
  11. Objective-C
  12. Shell
  13. R
  14. TypeScript
  15. Scala
  16. Go
  17. PowerShell
  18. Perl
  19. Haskell
  20. Lua

Swift (+1): Finally, the apprentice is now the master. Technically, this isn’t entirely accurate, as Swift merely tied the language it effectively replaced – Objective C – rather than passing it. Still, it’s difficult to view this run as anything but a changing of the guard. Apple’s support for Objective C and the consequent opportunities it created via the iOS platform have kept the language in a high profile role almost as long as we’ve been doing these rankings. Even as Swift grew at an incredible rate, Objective C’s history kept it out in front of its replacement. Eventually, however, the trajectories had to intersect, and this quarter’s run is the first occasion in which this has happened. In a world in which it’s incredibly difficult to break into the Top 25 of language rankings, let alone the Top 10, Swift managed the chore in less than four years. It remains a growth phenomenon, even if its ability to penetrate the server side has not met expectations.


Three-Dimensional Printing of Ellipsoidal Structures Using Mercury


A recent paper on ChemRxiv

A description of how to use the Mercury software from the CCDC to print 3-dimensional crystal structures that depict the anisotropic displacement parameters, matching the commonly used ellipsoidal depiction used in scientific papers. Details on how to convert a cif file into a 3D printing data file is included in the main paper, and details on the preparation of that data file for printing on a number of different 3D printers is included in the ESI.


There is more on 3D printing here.


Vortex update

Dotmatics have announced the impending release of the latest update to Vortex

The focus appears to be on the enhancement of the Vortex bioinformatics tools reviewed previously.


Script Debugger 7 released


A new version of Script Debugger has been released.

Script Debugger is an integrated development environment focused entirely on AppleScript. This focus allows it to deliver a suite of tools that make AppleScript development amazingly productive. You can use it to write and edit code, analyze target applications, debug scripts, and more.



Second Major DeepChem Release


A major update to DeepChem has been announced.

This major version release finishes consolidating the DeepChem codebase around our TensorGraph API for constructing complex models in DeepChem. We've made a variety of improvements to TensorGraph's saving/loading features and added a number of new tutorials improving our documentation of TensorGraph. We've also removed a number of older deprecated submodules and models in favor of the new, standardized TensorGraph implementations.

In addition, we've implemented a number of new deep models and algorithms, including DRAGONNs, Molecular Autoencoders, MIX+GANs, continuous space A3C, MCTS for RL, Mol2Vec and more. We've also continued improving our core graph convolutional implementations.

Also remember the RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Meeting registration is now open.


SAMSON 0.7.0 is available


SAMSON has been updated with a number of cool features; I particularly like the embedded Jupyter console.

SAMSON is a platform for computational nanoscience.

Python scripting is now available! Most of the SAMSON API is exposed in Python, and a Jupyter console embedded in SAMSON allows you to create models and run simulations, generate movies, perform analysis and reporting, etc., directly from scripts.


What’s more, Python makes it even easier to integrate and pipeline SAMSON and SAMSON Elements with well-known packages from diverse fields, e.g. TensorFlow, PyRosetta, RDKit, ASE, etc., to name a few.


Data Analysis tools


I've just added the simple lightweight CSV editor Table Tool to the Data Analysis tools page.

The Data Analysis tools page contains a listing of over 100 applications, tools and libraries that can be used for data analysis under Mac OSX.


OMEGA v3.0.0 released


Conformational analysis is a critical component of molecular modelling and I've always viewed OMEGA from OpenEye as the standard to which all other software packages should be compared.

OMEGA's knowledge-based approach produces high-quality conformers, superior to those of many other methods. It has also been found to be the fastest of commercially available conformer generators. Benchmarking Conformer Ensemble Generators, Friedrich, N.-O.; de Bruyn Kops, C.; Flachsenberg, F.; Sommer, K.; Rarey, M.; Kirchmair, J. J. Chem. Inf. Model. 2017, 57, 2719-2728. DOI.

OMEGA’s capability has been expanded for molecules containing large rings by adding a method specifically tuned to sample macrocyclic conformational space. The approach is based on a rewritten version of the original OMEGA distance geometry algorithm.


In this update support for macOS El Capitan (10.11), macOS Sierra (10.12), and macOS High Sierra (10.13) has been added.


Microsoft Quantum Development Kit Samples and Libraries under MacOSX


Well this is well out of my comfort zone but I thought I'd mention it.

Welcome to the Microsoft Quantum Development Kit! This repository contains the libraries and samples provided with the Quantum Development Kit

The Microsoft Quantum Development Kit has been tested under MacOSX, Ubuntu Linux, but may work on other distributions. The Python interoperability feature has been developed for the Anaconda distribution of Python 3.6. Please see the README file provided with the Python sample for more details

Thank you for your interest in Microsoft Quantum Development Kit preview. The development kit contains the tools you'll need to build your own quantum computing programs and experiments.

So off you go…..


Google Summer of Code:- Open Chemistry


There are a number of interesting projects being undertaken in this year's Google Summer of Code.

If you know of any students that might be interested then perhaps point them to the Open Chemistry Project.

The Open Chemistry project is a collection of open source, cross platform libraries and applications for the exploration, analysis and generation of chemical data. The organization is an umbrella of leading projects developed by long-time collaborators and innovators in open chemistry such as the Avogadro, Open Babel, and cclib projects. These three alone have been downloaded over 700,000 times and cited in over 2,000 academic papers. Our goal is to improve the state of the art, and facilitate the open exchange of chemical data and ideas while utilizing the best technologies from quantum chemistry codes, molecular dynamics, informatics, analytics, and visualization.

There is a list of the GSoC Ideas 2018 here but of course students can add their own.


MOE update 2018.01 released


The latest update to Chemical Computing Group's Molecular Operating Environment (MOE) software includes a variety of new features and enhancements.

Windows XP (finally!) and macOS 10.6 have been removed from the list of officially supported platforms. Supported Windows platforms are Vista/7/8/10, and the minimum supported macOS is 10.7 (Lion).

Amber14:EHT Forcefield. The Amber14 parameter set is now supported in MOE. The new parameters consist of improvements to nucleic acids; otherwise, protein and small molecule parameters (and charges) are unchanged. The forcefield can be selected in the MOE | Footer.

TCR-MHC Protein Complex Database. A new MOE Project database containing T-Cell Receptor (TCR) – Major Histocompatibility Complex (MHC) x-ray structures has been added to MOE. The database can be accessed with MOE | Protein | Search | TCR-MHC | TCR-MHC which will launch the MOE Project Search panel.

Several applications have been parallelized to run in the moe -mpu environment:

  • Descriptor calculations with the SVL function QuaSAR_DescriptorMDB.
  • Energy minimization in the Database Viewer DBV | Compute | Molecule | Energy Minimize.
  • Conformational search using MDB input files in MOE | Compute | Conformations | Search.
  • Rotamer library generation with DBV | Compute | Build Rotamer Library.
  • Project database creation with the SVL run file dbupdate.svl and the scripts $MOE/bin/projupdate and $MOE/bin/projupdate.bat.

I plan to review the latest version of MOE in the near future.


CDD Vault is Now an ELN


CDD Vault ELN is an extension to CDD Vault for archiving and selectively sharing experimental text and data. CDD Vault ELN helps you capture and collaborate around unstructured information (conversations, notes, documents, images, files) and structured data (experimental results, plots, SAR).

You can easily capture and link to a variety of objects in CDD Vault ELN including:

  • Images
  • File attachments
  • Links to CDD Vault & other resources
  • Tables
  • Structures


Awesome Python Chemistry


A curated list of awesome Python frameworks, libraries, software and resources related to Chemistry.

A blog post gives more details.


Vida updated


VIDA v4.4.0 has been released. This upgrade adds several new features and fixes many previous issues.

  • A new ribbon style that produces ribbons with a smoother appearance has been introduced into VIDA.
  • Improvements to the Builder/Sketcher, including:
      • closing the Sketcher window prompts for Save, Save as New, Discard, or Cancel
      • closing the Builder closes the Sketcher window
      • an additional “Save As New” option in the toolbar and Builder context menu
      • hitting Return now finishes adding typed-in molecules from the Sketcher
  • Significant improvements to the Extension Manager. In addition, extensions can be centrally deactivated.

VIDA is built on top of the OpenEye Toolkits v2017.Oct libraries to ensure that it and ancillary programs take full advantage of the state-of-the-art improvements in all underlying programming libraries. Support for macOS El Capitan (10.11), macOS Sierra (10.12), and macOS High Sierra (10.13) has been added.


KNIME tutorial


Don't forget to sign up for your chance to hear a webinar by Greg Landrum, KNIME's VP for Life Sciences, this Wednesday. He will be talking about processing malaria HTS results using KNIME and will give a tutorial on workflows developed for ligand-based virtual screening, based on the results of a phenotypic HTS against malaria.

Wed, Feb 21, 2018 3:00 PM - 4:00 PM GMT

Register Here.


The Royal Society of Chemistry Chemical Information and Computer Applications Group (CICAG) Winter Newsletter is now available Online


The Winter 2017-18 edition of the CICAG Newsletter has been published and can be downloaded from the Newsletters webpage.

Features in this edition which may be of interest include:

  • Details of CICAG's upcoming Artificial Intelligence in Chemistry meeting
  • 30th Anniversary celebration of the Catalyst Science Discovery Centre and a look at the scientific history and achievements of the area
  • Tony Kent Strix Award and Annual Lecture 2017 and eLucidate from UKeiG
  • Other CICAG planned and proposed meetings along with other upcoming conferences and events
  • Meeting reports
  • Book reviews
  • News from Infochem and CAS
  • A review of the latest chemical information news and developments

PhD Student and Post-Doc Conference Bursaries

Did you know that most CICAG sponsored meetings have a number of bursaries available for PhD and post-doctoral students? Normally up to a value of £250, these awards help to cover registration and travel costs. Preference will be given to members of the RSC (and meeting co-sponsors if applicable), especially those who are selected to give posters.

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 Royal Society of Chemistry at Burlington House, London, UK. Twitter hashtag - #RSC_AIChem


Google summer of code chemistry ideas


The Open Chemistry project has collected together project ideas for GSoC 2018, covering a wide range of topics in chemistry.

The full listing is available here and includes projects that make use of a number of open source toolkits such as Open Babel, RDKit and cclib.


Molecular Materials Informatics Apps


Molecular Materials Informatics, Inc have been busy recently with updates to many of their applications.

The following mobile apps have all been updated:

PolyPharma: poly-pharmacology of molecular structures. Use structure activity relationships to view predicted activities against biological targets, physical properties, and off-targets to avoid. Calculations are done using Bayesian models and other kinds of calculations that are performed on the device.

Green Lab Notebook allows recording of multistep chemical reactions, using molecular structure, name and stoichiometry as the primary components. When quantities are provided, interconversions are calculated automatically, and green chemistry metrics are shown.

SAR Table app is designed for creating tables containing a series of related structures, their activity/property data, and associated text. Structures are represented by scaffolds and substituents, which are combined together to automatically generate a construct molecule. The table editor has many convenience features and data checking cues to make the data entry process as efficient as possible.

MolPrime is a chemical structure drawing tool based on the unique sketcher from the Mobile Molecular DataSheet (MMDS).

Approved Drugs app contains over a thousand chemical structures and names of small molecule drugs approved by the US Food & Drug Administration (FDA). Structures and names can be browsed in a list, searched by name, filtered by structural features, and ranked by similarity to a user-drawn structure. The detail view allows viewing of a 3D conformation as well as tautomers. Structures can be exported in a variety of ways, e.g. email, twitter, clipboard.

Green Solvents is a reference card for chemical solvents, with data regarding their "greenness": safety, health and environmental effects.

For the desktop the OS X Molecular DataSheet (XMDS) is an interactive cheminformatics tool for viewing and editing molecular structures, chemical reactions and data. It is designed to be instantly intuitive to anyone who has used a Mac, a spreadsheet and any chemical structure sketcher.



BBEdit 12 is now 64bit


To call BBEdit a text editor is a great injustice; it is the Swiss army knife of text editors and I use it constantly.

The latest update has a major change: BBEdit is now 64-bit. This comes with several advantages, as the release notes describe.

BBEdit is now built as a 64-bit application. This works around various reported bugs in the OS and has other beneficial side effects: the application starts more quickly on a "cold" launch; 64-bit color pickers and contextual-menu plug-ins are now available; and our customers are even more handsome and athletic than before.

Beginning with this version, you can open documents that are much larger than was previously possible. In the Before Time, documents whose in-memory size (about twice the on-disk size) exceeded roughly 1.5GB would fail to open and report an out-of-memory error, as would documents whose internal structure required generation of large quantities of syntax coloring and/or code folding information (such as complicated XML documents). Beginning with this version, you can perform many large-scale operations on very large files without running out of memory or needing to clear Undo state.

Support for the Touch Bar has been added to various windows (applicable only to computers that have a Touch Bar, of course).
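As a rough illustration of the old size limit quoted in the release notes, here is a minimal Python sketch that estimates whether a file would have been too large for the earlier 32-bit versions. The factor of two and the 1.5GB threshold are the approximate figures from the release notes; the function names and the use of binary gigabytes are my own assumptions, and this is not how BBEdit itself decides anything.

```python
import os

# Approximate figures taken from the release notes; treating "GB" as
# 1024**3 bytes is an assumption made for this sketch.
IN_MEMORY_FACTOR = 2.0
OLD_LIMIT_BYTES = 1.5 * 1024**3


def estimated_in_memory_bytes(on_disk_bytes):
    """Estimate in-memory size as roughly twice the on-disk size."""
    return on_disk_bytes * IN_MEMORY_FACTOR


def would_have_failed_before(on_disk_bytes):
    """True if the estimate exceeds the old (roughly 1.5GB) limit."""
    return estimated_in_memory_bytes(on_disk_bytes) > OLD_LIMIT_BYTES


def check_file(path):
    """Apply the same estimate to a real file on disk."""
    return would_have_failed_before(os.path.getsize(path))
```

By this estimate anything much over about 750MB on disk would have been a problem, which matches my experience with large structure files.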

There are many more updates and fixes described in detail in the release notes.

BBEdit 12 requires macOS 10.11.6 ("El Capitan") or later, and is compatible with macOS 10.13 "High Sierra".

If you are using macOS 10.13 "High Sierra", please make sure that you have updated to the latest available OS version (10.13.3 or later).


MayaChem Tools


MayaChemTools is a fabulous collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files
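To give a feel for what similarity searching on 2D fingerprints involves, here is a minimal pure-Python sketch using the Tanimoto coefficient, a commonly used measure for comparing bit fingerprints. It assumes each fingerprint is represented simply as the set of "on" bit positions; the function names are hypothetical and this is not the MayaChemTools implementation, which offers its own scripts and many more options.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / union


def rank_by_similarity(query, library):
    """Return (name, score) pairs for a dict of fingerprints, best match first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def similarity_matrix(fingerprints):
    """Full pairwise similarity matrix for a list of fingerprints."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]
```

For example, `rank_by_similarity({1, 2, 3, 4}, {"a": {1, 2, 3}, "b": {7, 8}})` puts "a" first, since it shares three of the four bits in the union.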

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules

These invaluable scripts can be used in other applications; I've written a Vortex Script that uses them.


Artificial Intelligence in Chemistry


I mentioned the first announcement of a meeting to be held next year.

RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 Royal Society of Chemistry at Burlington House, London, UK.
Twitter hashtag - #RSC_AIChem


A number of the speakers have now been confirmed.

Confirmed Speakers

Keynote: What I learned about machine learning - revisited (Bob Sheridan, Merck)

Presentation title to be confirmed (Nadine Schneider, Novartis)

Scaling de novo design, from single target to disease portfolio (Willem van Hoorn, Exscientia)

Presentation title to be confirmed (Marwin Segler, Benevolent AI)

Molecular de novo design through deep learning (Ola Engkvist, AstraZeneca)

I also notice that there are a number of EPSRC funding opportunities

Artificial Intelligence - UKRI CDTs EPSRC is expected to support 10-20 doctoral training positions.

The call is now open for around 15 Centres for Doctoral Training (CDTs) focused on areas relevant to Artificial Intelligence (AI) across UKRI's remit. This call opens against the background of Professor Dame Wendy Hall and Jérôme Pesenti's review, Growing the artificial intelligence industry in the UK, and the Government's Industrial Strategy White Paper, Building a Britain fit for the Future. This investment in AI skills will be kick-started by support for over 100 studentships that will be funded during 2018/19 via the Research Councils' current mechanisms and schemes.

Universities are invited to apply against two priority areas:

  • Enabling Intelligence, a priority area within Engineering and Physical Sciences Research Council's (EPSRC) main CDT call
  • Applications and Implications of Artificial Intelligence (AIAI), a new priority area relevant to all Research Councils

More info..


Screenlamp:- A toolkit for ligand-based virtual screening


A recent publication "Enabling the hypothesis-driven prioritization of ligand candidates in big databases: Screenlamp and its application to GPCR inhibitor discovery for invasive species control" DOI describes a very interesting software tool for virtual screening.

While the advantage of screening vast databases of molecules to cover greater molecular diversity is often mentioned, in reality, only a few studies have been published demonstrating inhibitor discovery by screening more than a million compounds for features that mimic a known three-dimensional (3D) ligand. Two factors contribute: the general difficulty of discovering potent inhibitors, and the lack of free, user-friendly software to incorporate project-specific knowledge and user hypotheses into 3D ligand-based screening. The Screenlamp modular toolkit presented here was developed with these needs in mind.

The Screenlamp homepage gives more details and installation instructions. Screenlamp is written in Python (3.6) and can be downloaded from GitHub.

Certain submodules within screenlamp require external software to sample low-energy conformations of molecules and to generate pair-wise overlays. The tools that are currently being used in the pre-built, automated screening pipeline are OpenEye OMEGA and OpenEye ROCS to accomplish those tasks. However, screenlamp does not strictly require OMEGA and ROCS, and you are free to use any open source alternative provided that the output files are compatible with the screenlamp tools, which use the MOL2 file format.

Screenlamp is research software and has been made available to other researchers under a permissive Apache v2 open source license.


Wolfram|Alpha Updated


Wolfram|Alpha has been updated.

Wolfram|Alpha. Building on 25 years of development led by Stephen Wolfram, Wolfram|Alpha has rapidly become the world's definitive source for instant expert knowledge and computation.

There are more apps on the MobileScience website.


Spark V10.5 released


Cresset have just announced the latest release of Spark, a scaffold hopping and bioisostere replacement tool.



  • New wizards to support ligand growing and linking, macrocyclization and water replacement experiments
  • Enhanced Spark database update functionality
  • New pharmacophore constraints
  • Enhancements in search algorithm and advanced options.


Findings2 released


A new version of the very popular electronic notebook Findings has been released. You can try it out for free with no time limit. It allows the creation of up to 20 entries. Purchase Findings Pro to allow the creation of unlimited entries.


Remember there is a mobile version of Findings for your iPhone or iPad.


SeeSAR version 7.2 released


SeeSAR has been updated.

Get fresh inspiration from this huge update of SeeSAR! We realized, on the one hand, that the functionality of the editor was growing and growing, making it more and more complicated to use. On the other hand, access to the full functionality of ReCore demands a different kind of user interface. So we "took the bull by the horns" and, akin to the editor, created the new Inspirator which you can use to do:

  • Core replacement This feature is the same but with a much improved UI. You are able to directly select and visualize the bonds that will be clipped to carve out a core fragment for replacement. The clipped bonds now remain in place (even while you define sphere constraints) up until you define a new query. Also the display of results is much enhanced, as you can see the new core fragments highlighted in 2D as well as in 3D. For reference, your query molecule stays visible as well.
  • Fragment linking and merging You may of course launch the Inspirator with more than just one molecule. In this case, you can define bonds to clip on different molecules, thereby requesting linker fragments that will connect the remaining pieces. Note that it is not mandatory to clip a terminal part of each molecule to create the query, you may replace a core part in one and connect it to another fragment at the same time.
  • Fragment growing This was possibly the most frequently requested functionality in ReCore: Cut just one bond and grow onto this bond using a fragment library of typical side chains. In this way, you can, for example, reach out to nearby subpockets. The new growing algorithm can very quickly scan through a (for now) ready-made library of typical fragments. You may of course define sphere constraints at the same time in order to target particular locations in the bi

You can download SeeSAR here and use it for free for 7 days.


Xplor-NIH for molecular structure determination from NMR


A discussion on the new developments of Xplor-NIH DOI. Xplor-NIH is a popular software package for biomolecular structure determination from nuclear magnetic resonance (NMR) and other data sources.

Most of Xplor-NIH's code is now being developed directly in the Python language, and thus is directly accessible for modification by the end-user without recompilation, while code paths which require high performance, such as those executed at every timestep of molecular dynamics, are coded in C++. The Python interface to Xplor-NIH provides an extensible toolbox for developing further functionality. Precompiled packages for most popular Unix and Unix-like operating systems (such as Linux and Mac OS X), as well as documentation and support are available directly from


MedChemStructures Genius


The idea behind MedChem Structures Genius is that the chemical structure can be used as a visual and semantic mark to gain information on drug molecules (mode of action, side effects, bioavailability,…). This app, aimed at both students and professionals, allows learning to recognize chemical drug structures and link them to their INN and their pharmacological class. The quiz allows self-evaluation. Only small molecules, peptides and biochemical molecules are listed (no biologics, vaccines, …). The drug classification has been adapted from the ATC WHO classification.


There are many more science apps on the Mobile Science site.


GROMACS updated


The official release of GROMACS 2018 is now available.

GROMACS is one of the major software packages for the simulation of biological macromolecules.

Highlights from this update include:-

  • PME long-ranged interactions can now run on a single GPU, which means many fewer CPU cores are needed for good performance.
  • Optimized SIMD support for recent CPU architectures: AMD Zen, Intel Skylake-X and Skylake Xeon-SP.
  • The AWH (Accelerated Weight Histogram) method is now supported, which is an adaptive biasing method used for overcoming free energy barriers and calculating free energies (see
  • A new dual-list dynamic-pruning algorithm for the short-ranged interactions, that uses an inner and outer list to permit a longer-lived outer list, while doing less work overall and making runs less sensitive to the choice of the “nstlist” parameter.
  • A physical validation suite is added, which runs a series of short simulations, to verify the expected statistical properties, e.g. of energy distributions between the simulations, as a sensitive test that the code correctly samples the expected ensemble.
  • Conserved quantities are computed and reported for more integration schemes - now including all Berendsen and Parrinello-Rahman schemes.
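The inner and outer list idea mentioned in the highlights can be sketched in a few lines of Python. This is only a toy illustration of the general concept, assuming simple particle coordinates and two hypothetical cutoffs (`r_outer` and `r_inner`); it is not the GROMACS implementation, which is considerably more sophisticated. The expensive all-pairs search builds the outer list only occasionally, and a cheap pruning step regenerates the inner list from it as the particles move.

```python
import math
from itertools import combinations


def build_outer_list(positions, r_outer):
    """Expensive step: all-pairs search using the larger (buffered) cutoff."""
    return [
        (i, j)
        for i, j in combinations(range(len(positions)), 2)
        if math.dist(positions[i], positions[j]) <= r_outer
    ]


def prune_to_inner_list(positions, outer_list, r_inner):
    """Cheap step: keep only those outer-list pairs inside the smaller cutoff."""
    return [
        (i, j)
        for i, j in outer_list
        if math.dist(positions[i], positions[j]) <= r_inner
    ]
```

Because the pruning step only looks at pairs already in the outer list, the outer list can be kept for longer between full searches, as long as no pair has moved from beyond `r_outer` to within `r_inner` in the meantime.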


Fortran on a Mac


I was sent a few updates over the Christmas break and so I've updated the Fortran on a Mac page.


SeeSAR for Parallelized Fragment Growing & Pocket Exploration


I see that SeeSAR now supports parallelized 'real' fragment growing.

SeeSAR is a software tool for interactive, visual compound prioritisation as well as compound evolution. Structure-based design work ideally supports a multi-parameter optimization to maximise the likelihood of success, rather than affinity alone. Having the relevant parameters at hand in combination with real-time visual computer assistance in 3D is one of the strengths of SeeSAR. Stimulating exploration with SeeSAR, we have embarked on pursuing a new cheminformatics compute paradigm of "Propose & Validate".


You can download SeeSAR here and use it for free for 7 days.


Behind the Scenes in Real-Life Software Design by Stephen Wolfram


I just stumbled across a fascinating series of lectures. These are recordings of the live discussions behind the ongoing software development led by Stephen Wolfram.

Of particular interest might be the discussion on incorporating chemistry into the Wolfram language.


UCSF ChimeraX


A recent publication DOI describes an update to the popular molecule viewer UCSF Chimera.

UCSF ChimeraX is next-generation software for the visualization and analysis of molecular structures, density maps, 3D microscopy, and associated data. It addresses challenges in the size, scope, and disparate types of data attendant with cutting-edge experimental methods, while providing advanced options for high-quality rendering (interactive ambient occlusion, reliable molecular surface calculations, etc.) and professional approaches to software design and distribution.

The application can be downloaded here

It is important to note that ChimeraX is not backward compatible with Chimera and does not read Chimera session files. It has been tested on macOS 10.12. The ChimeraX user interface is implemented in Qt, offering a native-like look and feel on each platform. ChimeraX is largely implemented in Python, an interpreted programming language. To manipulate very large datasets interactively, ChimeraX uses memory-efficient data structures combined with high-performance algorithms implemented in C++. Macromolecular Crystallographic Information File (mmCIF) is the preferred format for atomic data in ChimeraX; mmCIF replaces the aged and more limited PDB format and offers a number of advantages.



Python support in Excel


The most popular suggestion on the "How can we improve Excel for Windows" forum is Python as an Excel scripting language, with over 4500 votes, and it has elicited a comment from the Microsoft Excel team.

Thanks for the continued passion around this topic. We’d like to gather more information to help us better understand the needs around Excel and Python integration.

Followed by a survey.

Of course one would hope that they also add it to the Mac version of Excel.


Suggestions for a Laser Pointer


I give a course that consists of a full day of lectures. In the past I've had to use a selection of laser pointers and batteries because they don't last.

So I'm looking for a laser pointer that will last for several hours, and be bright enough to show up on the large flat screens used in many lecture theatres these days.

Any suggestions welcome.


Predicting the Conformational Energy of Small Molecules


An interesting publication in JCIM, Atom Types Independent Molecular Mechanics Method for Predicting the Conformational Energy of Small Molecules, DOI.

We report herein our effort to incorporate lone pairs into our model to extend its applicability domain to any saturated small molecules. The developed model H-TEQ 2 has been validated on a wide variety of molecules from polyaromatic molecules to carbohydrates and molecules with high heteroatoms/carbon ratios.


Deep Learning Cheat Sheet (using Python Libraries)


Just came across this really invaluable resource.

  • Deep Learning Cheat Sheet (using Python Libraries)
  • PySpark Cheat Sheet: Spark in Python
  • Data Science in Python: Pandas Cheat Sheet
  • Cheat Sheet: Python Basics For Data Science
  • A Cheat Sheet on Probability
  • Cheat Sheet: Data Visualization with R
  • New Machine Learning Cheat Sheet by Emily Barry
  • Matplotlib Cheat Sheet
  • One-page R: a survival guide to data science with R
  • Cheat Sheet: Data Visualization in Python
  • Stata Cheat Sheet
  • Common Probability Distributions: The Data Scientist’s Crib Sheet
  • Data Science Cheat Sheet
  • 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets
  • 14 Great Machine Learning, Data Science, R , DataViz Cheat Sheets




YANK is a GPU-accelerated Python framework for exploring algorithms for alchemical free energy calculations.


  • Modular Python framework to facilitate development and testing of new algorithms
  • GPU-accelerated via the OpenMM toolkit
  • Alchemical free energy calculations in both explicit and implicit solvent
  • Hamiltonian exchange among alchemical intermediates with Gibbs sampling framework
  • General Markov chain Monte Carlo framework for exploring enhanced sampling methods
  • Built-in equilibration detection and convergence diagnostics
  • Support for AMBER prmtop/inpcrd files
  • Support for absolute binding free energy calculations
  • Support for transfer free energies (such as hydration or partition free energies)

Install using conda

$ conda config --add channels omnia --add channels conda-forge
$ conda install yank

conda will install dependencies from binary packages automatically, including difficult-to-install packages such as OpenMM, numpy, and scipy. YANK runs on Python 3.5 and Python 3.6.
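Since only Python 3.5 and 3.6 are listed, it may be worth checking which interpreter is active in your conda environment before installing. A minimal sketch follows; the `SUPPORTED` set simply mirrors the versions stated above and the function name is my own, not part of YANK.

```python
import sys

# Versions listed above as supported; update this set if that changes.
SUPPORTED = {(3, 5), (3, 6)}


def is_supported(version_info=None):
    """Return True if the (major, minor) version is one of those listed."""
    v = sys.version_info if version_info is None else version_info
    return (v[0], v[1]) in SUPPORTED


if __name__ == "__main__":
    status = "listed" if is_supported() else "not listed"
    print("Python %d.%d is %s as supported" % (sys.version_info[0], sys.version_info[1], status))
```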


Macs in Chemistry Annual website review


At the end of each year I have a look at the website analytics to see which items were the most popular.

Over the year there were 60,000 unique visitors with 25% visiting the site on multiple occasions. The US provided 30% of the visitors and the UK 10% with Germany, Canada and Japan around 5%. As might be expected 60% of the visitors were using a Mac, but 25% of the visitors were Windows users and 10% iOS. Looking at the last month's Mac visitors, 53% were using Mac OS X 10.13, 25% 10.12 and 12% 10.11.

Safari and Chrome (each 41%) were the most used web browsers with the once dominant Internet Explorer down at 2%.

The most viewed blog pages in 2017 were

The most popular web pages were (other than the main page)

The continued popularity of the Fortran on a Mac web page is interesting. I'm not a big Fortran user, but if anyone knows of items that could be added to the page I'd be delighted to hear about them. I've done a couple of updates to the Cheminformatics on a Mac page and I think I'll need to add a section on Bioconda in the future.

Interestingly the Scientific Applications under High Sierra page was of only transient popularity. It seems this update to macOS was relatively benign with very few issues.

2017 also saw the 2000th download of iBabel. iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a molecule viewer. I'm planning to do an update to iBabel to take advantage of some of the updates to OpenBabel, but if you have any suggestions I'd be happy to see if I can include them.


2017 also saw the migration of the website from http to https, a change that went pretty seamlessly with only a couple of minor glitches.

The Twitter feed is increasing in popularity with 390 followers. The most popular tweets were:

Creating a Bioconda recipe
RSC meeting on AI in Chemistry

The RSS feed still has around 100 followers.