Macs in Chemistry

Insanely Great Science

open source

SketchEl 2


As highlighted recently, SketchEl 2, a chemical drawing package, is now open source.


The SketchEl 2 project is underway as a desktop app, based on web technology and delivered as an Electron package. The GitHub repository is now public, on account of there being enough functionality to be arguably useful. This is a very early release, so if you feel inclined to try it out, do be ready to give some useful feedback.

The repository can be found here


Psi4 1.1: An Open-Source Electronic Structure Program


A recent paper describes Psi4 1.1: An Open-Source Electronic Structure Program Emphasizing Automation, Advanced Libraries, and Interoperability DOI

Psi4 is an ab initio electronic structure program providing methods such as Hartree–Fock, density functional theory, configuration interaction, and coupled-cluster theory. The 1.1 release represents a major update meant to automate complex tasks, such as geometry optimization using complete-basis-set extrapolation or focal-point methods. Conversion of the top-level code to a Python module means that Psi4 can now be used in complex workflows alongside other Python tools.
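Complete-basis-set (CBS) extrapolation combines results from a sequence of basis sets to estimate the infinite-basis limit. Psi4 offers several schemes, and the post does not say which one a given workflow uses. As a hedged illustration only, the sketch below implements one widely used two-point formula for correlation energies, which assumes E(X) = E_CBS + A * X**-3 for a basis set of cardinal number X (3 for cc-pVTZ, 4 for cc-pVQZ). The function name and the energies are made up; this is not Psi4's API.

```python
def cbs_two_point(e_x: float, x: int, e_y: float, y: int, power: int = 3) -> float:
    """Two-point inverse-power extrapolation to the complete-basis-set limit.

    Assumes E(n) = E_cbs + A * n**-power for basis sets with cardinal
    numbers n = x and n = y; solving the two equations eliminates A.
    """
    if x == y:
        raise ValueError("need two basis sets with different cardinal numbers")
    xp, yp = x ** power, y ** power
    return (xp * e_x - yp * e_y) / (xp - yp)


# Synthetic check with hypothetical values (hartree): energies generated
# from a known limit should give that limit back.
e_cbs, a = -0.30, 0.50
e_tz = e_cbs + a / 3 ** 3   # stands in for a cc-pVTZ correlation energy
e_qz = e_cbs + a / 4 ** 3   # stands in for a cc-pVQZ correlation energy
print(cbs_two_point(e_tz, 3, e_qz, 4))  # close to -0.30
```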

Psi4 1.1 can be downloaded from here, with versions supporting Python 2.7, 3.5 and 3.6.

Note the installation instructions for Mac: install Xcode via the App Store, and make sure you open Xcode and accept the license agreement after installing it.


Scaffold Hunter update


Scaffold Hunter is a chemical data organization and analysis tool that has been continuously enhanced since the start of its development in 2007. The platform-independent open-source tool was first released in 2009 and provided an interactive visualisation of the so-called scaffold tree, a hierarchical classification scheme for molecules based on their common scaffolds. A recent publication describes extensions that significantly increase its applicability to a variety of tasks DOI.

When I first opened the application I did not find it particularly intuitive; fortunately, there is an online tutorial, and sample datasets are available.


aRMSD: A Comprehensive Tool for Structural Analysis


aRMSD is an open toolbox for structural comparison between two molecules, with various capabilities to explore different aspects of structural similarity and diversity. Crystallographic data from CIF files are fully supported, and the results can be rendered with the help of the vtk package.

A. Wagner, H.-J. Himmel, J. Chem. Inf. Model, 2017, 57, 428-438 DOI
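At the core of most structural comparisons is the root-mean-square deviation over matched atoms, RMSD = sqrt((1/N) * sum of |a_i - b_i|^2). aRMSD offers considerably more than this; the minimal sketch below is not aRMSD code and assumes the two structures already share the same atom order and have already been superimposed.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between matched, pre-aligned atoms (same units as the input, e.g. Å).

    Assumes atom i in coords_a corresponds to atom i in coords_b and that no
    further superposition is needed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # math.dist gives the Euclidean distance between two points (Python 3.8+)
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
           [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]))  # sqrt(0.5), about 0.707
```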


MayaChemTools: An Open Source Package for Computational Drug Discovery


Just noticed this paper.

MayaChemTools: An Open Source Package for Computational Drug Discovery, DOI: 10.1021/acs.jcim.6b00505.

MayaChemTools is a growing collection of Perl scripts, modules, and classes to support a variety of computational drug discovery needs, such as manipulation and analysis of data, generation of two-dimensional (2D) fingerprints, similarity searching, and calculation of physicochemical properties.
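Fingerprint-based similarity searching usually compares fingerprints with a coefficient such as Tanimoto, |A ∩ B| / (|A| + |B| - |A ∩ B|). MayaChemTools itself is written in Perl and its own scripts are what you would use in practice; the Python sketch below only illustrates the calculation, representing each fingerprint as a set of "on" bit positions. The fingerprints and compound names are made up.

```python
from typing import Dict, Iterable, List, Tuple


def tanimoto(fp_a: Iterable[int], fp_b: Iterable[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as on-bit positions."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_search(query: Iterable[int],
                      library: Dict[str, Iterable[int]],
                      threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs at or above threshold, most similar first."""
    query = set(query)
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


# Hypothetical fingerprints for illustration only
library = {"cmpd-a": {1, 2, 3}, "cmpd-b": {2, 3, 4}, "cmpd-c": {7}}
print(similarity_search({1, 2, 3}, library, threshold=0.5))
# [('cmpd-a', 1.0), ('cmpd-b', 0.5)]
```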

MayaChemTools is freely available online under the terms of the GNU LGPL, as published by the Free Software Foundation.

It is also possible to access the scripts from a Vortex script.


Darwin source code released


It is easy to forget that the heart of Mac OS X is the open-source Darwin code.

Apple have recently released the source for the latest update, OS X 10.12.

In addition, Apple have made Swift open source, and it now supports a wider variety of platforms.


OpenBabel 2.4.0 released


A major new update to OpenBabel has been released; version 2.4.0 is a significant change and is highly recommended.

New file formats

  • DALTON output files (read only) and DALTON input files (read/write) (Casper Steinmann)
  • JSON format used by ChemDoodle (read/write) (Matt Swain)
  • JSON format used by PubChem (read/write) (Matt Swain)
  • LPMD's atomic configuration file (read/write) (Joaquin Peralta)
  • The format used by the CONTFF and POSFF files in MDFF (read/write) (Kirill Okhotnikov)
  • ORCA output files (read only) and ORCA input files (write only) (Dagmar Lenk)
  • ORCA-AICCM's extended XYZ format (read/write) (Dagmar Lenk)
  • Painter format for custom 2D depictions (write only) (Noel O'Boyle)
  • Siesta output files (read only) (Patrick Avery)
  • Smiley parser for parsing SMILES according to the OpenSMILES specification (read only) (Tim Vandermeersch)
  • STL 3D-printing format (write only) (Matt Harvey)
  • Turbomole AOFORCE output (read only) (Mathias Laurin)
  • A representation of the VDW surface as a point cloud (write only) (Matt Harvey)

New file format capabilities and options

  • AutoDock PDBQT: Options to preserve hydrogens and/or atom names (Matt Harvey)
  • CAR: Improved space group support in .car files (kartlee)
  • CDXML: Read/write isotopes (Roger Sayle)
  • CIF: Extract charges (Kirill Okhotnikov)
  • CIF: Improved support for space-groups and symmetries (Alexandr Fonari)
  • DL_Poly: Cell information is now read (Kirill Okhotnikov)
  • Gaussian FCHK: Parse alpha and beta orbitals (Geoff Hutchison)
  • Gaussian out: Extract true enthalpy of formation, quadrupole, polarizability tensor, electrostatic potential fitting points and potential values, and more (David van der Spoel)
  • MDL Mol: Read in atom class information by default and optionally write it out (Roger Sayle)
  • MDL Mol: Support added for ZBO, ZCH and HYD extensions (Matt Swain)
  • MDL Mol: Implement the MDL valence model on reading (Roger Sayle)
  • MDL SDF: Option to write out an ASCII depiction as a property (Noel O'Boyle)
  • mmCIF: Improved mmCIF reading (Patrick Fuller)
  • mmCIF: Support for atom occupancy and atom_type (Kirill Okhotnikov)
  • Mol2: Option to read UCSF Dock scores (Maciej Wójcikowski)
  • MOPAC: Read z-matrix data and parse (and prefer) ESP charges (Geoff Hutchison)
  • NWChem: Support sequential calculations by optionally overwriting earlier ones (Dmitriy Fomichev)
  • NWChem: Extract info on MEP(IRC), NEB and quadrupole moments (Dmitriy Fomichev)
  • PDB: Read/write PDB insertion codes (Steffen Möller)
  • PNG: Options to crop the margin, and control the background and bond colors (Fredrik Wallner)
  • PQR: Use a stored atom radius (if present) in preference to the generic element radius (Zhixiong Zhao)
  • PWSCF: Extend parsing of lattice vectors (David Lonie)
  • PWSCF: Support newer versions, and the 'alat' term (Patrick Avery)
  • SVG: Option to avoid addition of hydrogens to fill valence (Lee-Ping)
  • SVG: Option to draw as ball-and-stick (Jean-Noël Avila)
  • VASP: Vibration intensities are calculated (Christian Neiss, Mathias Laurin)
  • VASP: Custom atom element sorting on writing (Kirill Okhotnikov)

Other new features and improvements

  • 2D layout: Improved the choice of which bonds to designate as hash/wedge bonds around a stereo center (Craig James)
  • 3D builder: Use bond length corrections based on bond order from Pyykko and Atsumi (Geoff Hutchison)
  • 3D generation: "--gen3d", allow user to specify the desired speed/quality (Geoff Hutchison)
  • Aromaticity: Improved detection (Geoff Hutchison)
  • Canonicalisation: Changed behaviour for multi-molecule SMILES. Now each molecule is canonicalized individually and then sorted. (Geoff Hutchison/Tim Vandermeersch)
  • Charge models: "--print" writes the partial charges to standard output after calculation (Geoff Hutchison)
  • Conformations: Confab, the systematic conformation generator, has been incorporated into Open Babel (David Hall/Noel O'Boyle)
  • Conformations: Initial support for ring rotamer sampling (Geoff Hutchison)
  • Conformer searching: Performance improvement by avoiding gradient calculation and optimising the default parameters (Geoff Hutchison)
  • EEM charge model: Extend to use additional params from (Tomáš Raček)
  • FillUnitCell operation: Improved behavior (Patrick Fuller)
  • Find duplicates: The "--duplicate" option can now return duplicates instead of just removing them (Chris Morley)
  • GAFF forcefield: Atom types updated to match Wang et al. J. Comp. Chem. 2004, 25, 1157 (Mohammad Ghahremanpour)
  • New charge model: EQeq crystal charge equilibration method, a speed-optimized crystal-focused charge estimator (David Lonie)
  • New charge model: "fromfile" reads partial charges from a named file (Matt Harvey)
  • New conversion operation: "changecell", for changing cell dimensions (Kirill Okhotnikov)
  • New command-line utility: "obthermo", for extracting thermochemistry data from QM calculations (David van der Spoel)
  • New fingerprint: ECFP (Geoff Hutchison/Noel O'Boyle/Roger Sayle)
  • OBConversion: Improvements and API changes to deal with a long-standing memory leak (David Koes)
  • OBAtom::IsHBondAcceptor(): Definition updated to take into account the atom environment (Stefano Forli)
  • Performance: Faster ring-finding algorithm (Roger Sayle)
  • Performance: Faster fingerprint similarity calculations if compiled with -DOPTIMIZE_NATIVE=ON (Noel O'Boyle/Jeff Janes)
  • SMARTS matching: The "-s" option now accepts an integer specifying the number of matches required (Chris Morley)
  • UFF: Update to use traditional Rappe angle potential (Geoff Hutchison)
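The change to multi-molecule canonical SMILES can be pictured as: split the input at the "." disconnections, canonicalize each component on its own, then sort the pieces so that the order of components in the input no longer matters. The sketch below is not Open Babel's code. The per-component canonicalizer is passed in by the caller (a hypothetical hook where a toolkit call would go), plain string sorting is an assumption about the ordering, and it assumes no ring-closure bond spans a ".".

```python
from typing import Callable


def canonical_multi_smiles(smiles: str, canonicalize: Callable[[str], str]) -> str:
    """Canonicalize each dot-separated component, then sort and re-join them.

    canonicalize is a hypothetical caller-supplied function for one component.
    Assumes no ring-closure bond spans a '.', so splitting on '.' yields
    independent molecules.
    """
    parts = [canonicalize(part) for part in smiles.split(".") if part]
    return ".".join(sorted(parts))


# With an identity "canonicalizer", only the sorting step is visible:
print(canonical_multi_smiles("CCO.C", lambda s: s))  # C.CCO
print(canonical_multi_smiles("C.CCO", lambda s: s))  # C.CCO
```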

Language bindings

  • Bindings: Support compiling only the bindings against system libopenbabel (Reinis Danne)
  • Java bindings: Add example Scala program using the Java bindings (Reinis Danne)
  • New bindings: PHP (Maciej Wójcikowski)
  • PHP bindings: BaPHPel, a simplified interface (Maciej Wójcikowski)
  • Python bindings: Add 3D depiction support for Jupyter notebook (Patrick Fuller)
  • Python bindings, Pybel: calccharges() and convertdbonds() added (Patrick Fuller, Björn Grüning)
  • Python bindings, Pybel: compress output if filename ends with .gz (Maciej Wójcikowski)
  • Python bindings, Pybel: Residue support (Maciej Wójcikowski)
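The Pybel change to compress output when the filename ends with .gz follows a common pattern: choose the file opener from the filename suffix. The stdlib sketch below illustrates that pattern only; it is not Pybel's implementation, and the helper names are hypothetical.

```python
import gzip


# Hypothetical helpers illustrating the suffix-based pattern; not Pybel's API.
def output_opener(filename: str):
    """Pick gzip.open for names ending in '.gz', otherwise the built-in open."""
    return gzip.open if filename.endswith(".gz") else open


def write_text(filename: str, text: str) -> None:
    # "wt" requests text mode from both gzip.open and the built-in open
    with output_opener(filename)(filename, "wt") as handle:
        handle.write(text)
```

For example, write_text("hits.sdf.gz", sdf_block) would write a gzip-compressed file, while write_text("hits.sdf", sdf_block) writes plain text (sdf_block being whatever text you want to save).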

Development/Build/Install Improvements

  • Version control: move from Subversion and SourceForge to Git and GitHub
  • Continuous integration: Travis for Linux builds and Appveyor for Windows builds (David Lonie and Noel O'Boyle)
  • Python installer: Improvements to the Python installer and "pip install openbabel" (David Hall, Matt Swain, Joshua Swamidass)
  • Compilation speedup: Speed up compilation by combining the tests (Noel O'Boyle)
  • MacOSX: Support compiling with libc++ on MacOSX (Matt Swain)


Importing Open Source Malaria Data into DataWarrior


Thomas Sander has provided a version of DataWarrior that can directly import the Open Source Malaria Data.

The new version can be downloaded here. Once it is downloaded, you will need to temporarily adjust your security settings to open it the first time, because DataWarrior is not from the Mac App Store or an identified developer. Once it is open, make sure you reset your security settings.


Once it is installed and open, select the macro to retrieve the Open Source Malaria Data.


The import only takes a few seconds and pulls the data directly from the Open Source Malaria spreadsheet, so it will contain the latest information.


There are now several options for accessing the Open Source Malaria data: you can use the Cheminfo spreadsheet, a Vortex script, or even an IPython notebook.


Open Source Molecular Modeling


A great publication on Open Source Molecular Modeling.

The success of molecular modeling and computational chemistry efforts are, by definition, dependent on quality software applications. Open source software development provides many advantages to users of modeling applications, not the least of which is that the software is free and completely extendable. In this review we categorize, enumerate, and describe available open source software packages for molecular modeling and computational chemistry. An updated online version of this catalog can be found at

From toolkits to desktop applications, a fantastic and comprehensive listing.


The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development


I just came across an interesting paper on cross-platform OpenCL programming, The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development. In particular, it highlights a number of issues, including framework bugs, specification limitations and program bugs, and offers workarounds.

An increasing number of scientific applications are taking advantage of GPU acceleration.


Parkinson disease mobile data collected using ResearchKit


ResearchKit is an open-source framework that allows researchers and developers to create powerful apps for medical research.

The Parkinson app is one of the first five apps built using ResearchKit.

mPower is a unique iPhone application that uses a mix of surveys and tasks that activate phone sensors to collect and track health and symptoms of Parkinson Disease (PD) progression - like dexterity, balance or gait. The goal of this app is to learn more about the variations of PD, and to improve the way we describe these variations and to learn how mobile devices and sensors can help us to measure PD and its progression to ultimately improve the quality of life for people with PD.

The initial results have now been published in Scientific Data 3, Article number: 160011 (2016) DOI, with around 15,000 people contributing data to the study.


Open Science prize applications



The applications for the Open Science Prize are now in: 92 proposals, and they all look brilliant. They include apps for mobile devices, machine learning from public datasets, linking science and scientists, tracking disease and much, much more. Well worth popping over and having a look.


FreeSASA: An open source C library for solvent accessible surface area calculations


Solvent accessible surface area (SASA) is an important quantity in the study of protein structure, and whilst there are many tools for this sort of calculation, FreeSASA represents the first open-source, free-standing tool for it. FreeSASA is an open source C library for SASA calculations that provides both command-line and Python interfaces.

Source code is available for download here. Building the FreeSASA library and command-line interface requires only standard C and GNU libraries and a C99-compliant compiler, and should be straightforward on any UNIX system (it has been tested on Mac OS X 10.8 and Debian 8).

Mitternacht S. FreeSASA: An open source C library for solvent accessible surface area calculations [version 1; referees: awaiting peer review]. F1000Research 2016, 5:189 DOI
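To show what a SASA calculation involves, here is a minimal sketch of the classic Shrake-Rupley test-point method: place points on a sphere of radius (atomic radius + probe radius) around each atom, and count the fraction that is not buried inside any neighbour's expanded sphere. This is not FreeSASA code (the library has its own optimised implementations and radius sets); the 1.4 Å probe is the conventional water-sized choice, and the radii in the check are illustrative. The loop is O(N² × points), so it is meant for reading rather than production use.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def sphere_points(n: int) -> List[Coord]:
    """Roughly uniform points on a unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(coords: Sequence[Coord], radii: Sequence[float],
                  probe: float = 1.4, n_points: int = 960) -> List[float]:
    """Per-atom SASA (Å² if inputs are in Å); illustrative, not FreeSASA."""
    unit = sphere_points(n_points)
    expanded = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + ri * ux, ci[1] + ri * uy, ci[2] + ri * uz)
            # A test point is buried if it lies inside any other expanded sphere
            buried = any(
                j != i and math.dist(p, cj) < rj
                for j, (cj, rj) in enumerate(zip(coords, expanded))
            )
            if not buried:
                exposed += 1
        # Exposed fraction of the expanded sphere's surface area
        areas.append(4.0 * math.pi * ri * ri * exposed / n_points)
    return areas
```

An isolated atom returns the full area 4π(r + probe)², and overlapping neighbours reduce each other's values.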


Tabula is awesome!


I recently needed to download the supplementary information provided with a publication, and my heart sank when I saw it was a PDF file. My worst fears were justified when I tried simply to copy and paste SMILES strings, together with 5 columns of data, into a spreadsheet: no chance of it copying across in an ordered manner!

Then I tried Tabula, a tool for "liberating data tables locked inside PDF files". It worked perfectly: nearly 2000 rows of data spread over 11 pages were converted to a CSV file in a couple of mouse clicks. This is wonderful and should be part of any data scientist's toolkit.

It is included on the Data Analysis Tools page but really deserves a special mention.


Apple and Open Source


Whilst the decision to make Swift open source certainly captured the headlines, it is worth noting that Apple contributes to many more open source projects; there are more details about these on the developer and main Apple websites.


Swift Open Source


As I previously highlighted after WWDC, Apple planned to open source Swift, and it has now announced that Swift is open source.

More details are on the Swift blog

Swift is now open source. Today Apple launched the open source Swift community, as well as amazing new tools and resources, including:

  • A site dedicated to the open source Swift community
  • Public source code repositories at
  • A new Swift package manager project for easily sharing and building code
  • A Swift-native core libraries project with higher-level functionality above the standard library
  • Platform support for all Apple platforms as well as Linux

The community site is entirely new and dedicated to open source Swift. It hosts resources for the community of developers that want to help evolve Swift, contribute fixes, and most importantly, interact with each other. It also provides development snapshots for Apple and Linux platforms, which require OS X 10.11 (El Capitan) or Ubuntu 14.04 or 15.10 (64-bit).

Source code is available on GitHub.




Polyphony is an open source software suite written in Python. Its purpose is the superimposition-free analysis and comparison of multiple 3D structures of the same or closely related protein molecules.

Absolute requirements

  • Python 2.6 or later
  • SciPy
  • NumPy
  • Biopython, especially the Bio.PDB module

Highly recommended

All following documentation assumes that you have these installed.

  • IPython, for interactive Python scripting
  • matplotlib, for graph plotting
  • PyMOL, for interactive 3D visualisation (open source version available on SourceForge)

William R Pitt, Rinaldo W Montalvão and Tom L Blundell, BMC Bioinformatics, 2014, 15:324 doi
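One general way to compare two structures without superimposing them is to compare their internal distance matrices, which do not change under rotation or translation. The sketch below shows only that general idea; it is not Polyphony's method (the paper describes what Polyphony actually computes), and it assumes both structures list the same atoms in the same order, for example matched Cα atoms.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """All pairwise Euclidean distances between atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def distance_rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMS difference between two distance matrices (no superposition needed).

    Assumes atom i in one structure corresponds to atom i in the other.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of matched atoms")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    # Upper triangle only: each atom pair counted once
    diffs = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else 0.0


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# The same shape rotated 90 degrees about z and shifted: distances are unchanged
b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
print(distance_rmsd(a, b))  # 0.0
```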


Importing Open Source Malaria Project data


The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date a lot of people around the world have made contributions and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, the information is often spread over multiple web pages, and it can be tricky to link each molecule and its identifier with the data. Over the last couple of months a significant effort has been put into populating a spreadsheet with all the information.

Whilst this is useful for viewing results, it is not ideal for trying to build predictive models. Vortex is a chemically intelligent data analysis and visualisation platform. This script provides one-click access to the OSM data and creates a workspace containing all the data; since it is linked to the live spreadsheet, you will always have access to the latest data.
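Outside Vortex, the same step of linking identifier, structure and data can be sketched in plain Python once the spreadsheet has been exported as CSV. The column names used here ("Compound ID", "SMILES", "IC50") and the sample rows are hypothetical placeholders; adjust them to whatever headers the real OSM sheet uses.

```python
import csv
import io
from typing import Dict


def index_by_id(csv_text: str, id_col: str = "Compound ID",
                smiles_col: str = "SMILES") -> Dict[str, Dict[str, str]]:
    """Map each compound identifier to its SMILES plus the remaining columns.

    id_col and smiles_col default to hypothetical header names.
    """
    records: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        compound_id = (row.pop(id_col) or "").strip()
        smiles = (row.pop(smiles_col) or "").strip()
        if not compound_id or not smiles:
            continue  # skip rows that cannot be linked to a structure
        records[compound_id] = {"smiles": smiles, **row}
    return records


# Made-up sample; the second row has no identifier and is skipped
sample = "Compound ID,SMILES,IC50\nCMPD-1,CCO,1.2\n,CC,3.4\n"
print(index_by_id(sample))  # {'CMPD-1': {'smiles': 'CCO', 'IC50': '1.2'}}
```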



Installing Open Drug Discovery Toolkit (ODDT)


A recent paper in J Cheminformatics described Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field DOI. ODDT is a free and open source tool for both computer-aided drug discovery (CADD) developers and researchers. It is released under a permissive 3-clause BSD license for both academic and industrial use. ODDT’s source code, additional examples and documentation are available on GitHub.

To install ODDT on a Mac you first need to install the appropriate toolkits. The easiest way is to use Homebrew; I've written a page detailing how to do this here.

Once these are installed, you can install ODDT using pip as described here.


Swift 2.0


More news on Swift 2.0 on the Swift Blog

Today at WWDC, we announced Swift 2.0. This new version has even better performance, a new error handling API, and first-class support for availability checking. And platform APIs feel even more natural in Swift with enhancements to the Apple SDKs.

Open Source

In addition to new features, the big news is that Apple will be making Swift open source later this year. We are all incredibly excited about this, and look forward to giving you a lot more information as the open source release gets nearer. Here is what we can tell you so far:

Swift source code will be released under an OSI-approved permissive license. Contributions from the community will be accepted — and encouraged. At launch we intend to contribute ports for OS X, iOS, and Linux. Source code will include the Swift compiler and standard library. We think it would be amazing for Swift to be on all your favorite platforms. We are excited about the opportunities an open source Swift creates for our industry. Baked-in safety features combined with excellent speed mean it has the chance to dramatically improve software versus using C-based languages. Swift is packed with modern features, it’s fun to write, and we believe it will get used in a lot of places. Together, we have an exciting road ahead.


Swift Open Source


Perhaps one of the more unexpected news items from WWDC 2015.

Swift is now Open Source!




The latest issue of Journal of Cheminformatics has a paper that might be of interest to a variety of people involved in spectroscopy or data visualisation: SpeckTackle: JavaScript charts for spectroscopy.

We present SpeckTackle, a custom-tailored JavaScript charting library for spectroscopy in life sciences. SpeckTackle is cross-browser compatible and easy to integrate into existing resources, as we demonstrate for the MetaboLights database. Its default chart types cover common visualisation tasks following the de facto ‘look and feel’ standards for spectra visualisation.

SpeckTackle is an open-source JavaScript library to create custom-tailored charts for spectroscopy in life sciences. Implemented charts exist for mass spectrometry, one- and two-dimensional NMR, UV/VIS, IR, and general continuous data use cases such as chromatograms.

The authors kindly supply a demo web page demonstrating different chart types and functions of the SpeckTackle library. Example data is embedded in the web page (800 kb file size). Click on the buttons at the top of the page to see the data displayed. For the Chromatogram, Difference Chart and Spectral Match, click the button and then the Add Data button.

Highlighting a section of a spectrum expands the view, and hovering the mouse over the 2D NMR spectra shows a tooltip giving chemical shifts.

I've added this to the spectroscopy resources page


HackaMol: An Object-Oriented Modern Perl Library for Molecular Hacking on Multiple Scales


To be honest I can't remember when I last used Perl but this publication brought back a few memories DOI.

HackaMol is an open source, object-oriented toolkit written in Modern Perl that organizes atoms within molecules and provides chemically intuitive attributes and methods.

Source code and example scripts are available online at http:// There is also a description of an IPerl Notebook in the supporting information.

There is also a very interesting extension, HackaMol::X::Vina, a structured class that provides an interface with the AutoDock Vina docking program.


Open Phacts API update


The OpenPhacts API has been updated to include two new data sets and the corresponding API calls.

1) DisGeNet target-disease associations

These API calls use URI inputs that correspond to either diseases or targets (proteins or genes). The disease identifiers correspond to UMLS CUIs, MeSH IDs or ConceptWiki and can use several namespaces.

2) neXtProt nanopublications for tissue expression (PREVIEW mode)

These API calls use URIs that correspond to either tissues or targets. The tissue identifiers correspond to the Caloha tissue ontology from neXtProt. These identifiers can use either the namespace from the neXtProt database (operational next week) or the Caloha ontology (operational now).

To reduce the barriers to drug discovery in industry, academia and for small businesses, the Open PHACTS Discovery Platform provides tools and services to interact with multiple integrated and publicly available data sources. To integrate this data, extensive cross-referencing of scientific concepts is needed across all databases.


Canonical SMILES

I’m a great fan of SMILES notation (simplified molecular-input line-entry system) as a compact means of storing chemical structures, and whilst there are many tools for creating SMILES strings, they often give different (but acceptable) results. Various algorithms for generating canonical SMILES have been developed, including those by Daylight Chemical Information Systems, OpenEye Scientific Software, MEDIT, Chemical Computing Group and MolSoft LLC, but all use proprietary code. In the latest issue of Journal of Cheminformatics, Noel O’Boyle describes the development of Universal SMILES and Inchified SMILES as implemented in Open Babel, an open source cheminformatics toolkit. DOI
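Canonical SMILES generation hinges on numbering the atoms in a way that depends only on the structure, not on the input order. A common building block is iterative refinement: start from simple atom invariants (here element and degree), then repeatedly re-rank each atom by its own rank plus its neighbours' ranks until the number of distinct classes stops growing. The sketch below shows only that step, under simplifying assumptions (hydrogen-suppressed graph, no bond orders, charges or stereo, and no tie-breaking between symmetry-equivalent atoms). It is not the InChI-based labelling the Universal SMILES paper describes.

```python
from typing import List, Sequence, Tuple


def _ranks(keys: Sequence) -> List[int]:
    """Replace each key by its position among the sorted distinct keys."""
    order = {key: rank for rank, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def refine_ranks(elements: Sequence[str], bonds: Sequence[Tuple[int, int]]) -> List[int]:
    """Order-independent atom ranks by iterative invariant refinement.

    Symmetry-equivalent atoms keep equal ranks; a full canonicaliser would
    break those ties and then choose a traversal order for writing SMILES.
    """
    neighbours: List[List[int]] = [[] for _ in elements]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # Initial invariant: element symbol and number of heavy-atom neighbours
    ranks = _ranks([(el, len(neighbours[i])) for i, el in enumerate(elements)])
    while True:
        # Keeping the old rank first means classes can only split, never merge
        keys = [(ranks[i], tuple(sorted(ranks[j] for j in neighbours[i])))
                for i in range(len(elements))]
        new_ranks = _ranks(keys)
        if len(set(new_ranks)) == len(set(ranks)):
            return new_ranks
        ranks = new_ranks


# Ethanol written in two atom orders: each atom gets the same rank either way
print(refine_ranks(["C", "C", "O"], [(0, 1), (1, 2)]))  # [0, 1, 2]
print(refine_ranks(["O", "C", "C"], [(0, 1), (1, 2)]))  # [2, 1, 0]
```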


Eyescale announces the release of GPU-SD 1.0.

GPU-SD is a library and daemon for the discovery and announcement of graphics processing units using ZeroConf. It enables auto-configuration of ad-hoc GPU clusters and multi-GPU machines. GPU-SD is used by the upcoming Equalizer 1.2 release for automatic configuration of local and remote GPU resources.

Version 1.0 of GPU-SD provides automatic local discovery for Linux (X11/GLX), Mac OS X (CGL, GLX) and Windows (WGL, WGL_NV_gpu_affinity, WGL_AMD_gpu_association), a simple network announcement daemon using DNS service discovery and ZeroConf networking, as well as remote discovery of resources announced using the GPU-SD daemon.

GPU-SD is a cross-platform library, available for Linux, Windows and Mac OS X and supports both 32-bit and 64-bit execution. It is licensed under the LGPL open source license, which allows free usage in commercial and open source projects. For more information about GPU-SD, please visit


OpenCL Q & A

The latest lecture on OpenCL is on MacResearch.