When I wrote the article entitled A few thoughts on scientific software, one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I'd publish a page that will hopefully stop people reinventing the wheel. Here are a few open-source cheminformatics toolkits that I'm aware of.
As a follow-up, I thought I'd put together a list of useful Python libraries for data science.
As always, I'm happy to hear comments or suggestions for additions.
When I wrote the article entitled A few thoughts on scientific software, one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I'd publish a page that will hopefully stop people reinventing the wheel. Here are four open-source toolkits that I'm aware of; if I've missed any, my apologies, and please send me details. Listing of Open-source cheminformatics toolkits
Just catching up.
NWChem 6.8 is now available on GitHub: https://github.com/nwchemgit/nwchem.
NWChem provides many methods for computing the properties of molecular and periodic systems using standard quantum mechanical descriptions of the electronic wavefunction or density. Its classical molecular dynamics capabilities provide for the simulation of macromolecules and solutions, including the computation of free energies using a variety of force fields. These approaches may be combined to perform mixed quantum-mechanics and molecular-mechanics simulations.
Instructions for compiling NWChem on various platforms, including Mac OS X, are available at https://github.com/nwchemgit/nwchem/wiki/Compiling-NWChem.
Manuscriptsapp is a great writing tool designed from the ground up for creating scientific publications. This week we heard of an interesting development: it's now free, and it will be open source.
There is a detailed blog post here giving the background.
It integrates nicely with a variety of reference managers (Mendeley, Zotero, Papers 3, Bookends and EndNote) with a couple of clicks, and you can cite directly with specially supported reference managers: F1000Workspace, Papers (Magic Citations) or Bookends. It has a simple table editor with header, body and footer styles built in and customizable. Tables can be imported from and exported to Word, Markdown, even LaTeX. You can create equations in LaTeX markup, or paste from MathType. Chemistry support is limited but is certainly on their to-do list, and they would love to have interested chemists to work with.
If you have not used it before, now would be a good time to download it and try it out: http://updates.manuscriptsapp.com/apps/manuscripts/download.
As highlighted recently, SketchEl2, a chemical drawing package, is now open source.
The SketchEl 2 project is underway as a desktop app, based on web technology and delivered as an Electron package. The GitHub repository is now public, on account of there being enough functionality to be arguably useful. This is a very early release, so do be ready to give some useful feedback if you feel so inclined to try it out.
The repository can be found here https://github.com/aclarkxyz/web_sketchel2
A recent paper describes Psi4 1.1: An Open-Source Electronic Structure Program Emphasizing Automation, Advanced Libraries, and Interoperability DOI
Psi4 is an ab initio electronic structure program providing methods such as Hartree–Fock, density functional theory, configuration interaction, and coupled-cluster theory. The 1.1 release represents a major update meant to automate complex tasks, such as geometry optimization using complete-basis-set extrapolation or focal-point methods. Conversion of the top-level code to a Python module means that Psi4 can now be used in complex workflows alongside other Python tools.
Psi4 1.1 can be downloaded from here with versions supporting Python 2.7, 3.5 and 3.6.
Note the installation instructions for Mac: install Xcode via the App Store, and make sure you open Xcode and accept the license agreement after you install it.
Scaffold Hunter is a chemical data organization and analysis tool that has been continuously enhanced since the start of its development in 2007. The platform-independent open-source tool was first released in 2009 and provided an interactive visualisation of the so-called scaffold tree, which is a hierarchical classification scheme for molecules based on their common scaffolds. A recent publication describes extensions that significantly increase the applicability for a variety of tasks DOI.
When I first opened the application I did not find it particularly intuitive. Fortunately, there is an online tutorial, and sample datasets are available.
aRMSD is an open toolbox for structural comparison between two molecules with various capabilities to explore different aspects of structural similarity and diversity. Crystallographic data provided from cif files is fully supported and the results can be rendered with the help of the vtk package.
A. Wagner, H.-J. Himmel, J. Chem. Inf. Model, 2017, 57, 428-438 DOI
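For readers new to structural comparison, the quantity at the heart of tools like this is the root-mean-square deviation between matched atoms. The sketch below is only the bare formula in plain Python: it assumes the two structures list the same atoms in the same order and are already in a common frame, and it does none of the superposition or weighting that a dedicated toolbox would handle. The function name and the coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty structures with the same atom count")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Sum of squared distances between matched atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Illustrative coordinates (in Å): mol_b is mol_a shifted by 1 Å along z
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mol_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(mol_a, mol_b))
```

Because no alignment is done here, a pure translation like the one above shows up as a non-zero RMSD, which is exactly why real tools superimpose the structures first.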
Just noticed this paper.
MayaChemTools: An Open Source Package for Computational Drug Discovery, DOI: 10.1021/acs.jcim.6b00505.
MayaChemTools is a growing collection of Perl scripts, modules, and classes to support a variety of computational drug discovery needs, such as manipulation and analysis of data, generation of two-dimensional (2D) fingerprints, similarity searching, and calculation of physicochemical properties.
MayaChemTools is freely available online at www.MayaChemTools.org, under the terms of the GNU LGPL, as published by the Free Software Foundation.
It is possible to access them using a Vortex script.
A major new update to Open Babel has been released. Version 2.4.0 is a significant change and is highly recommended.
New file formats
- DALTON output files (read only) and DALTON input files (read/write) (Casper Steinmann)
- JSON format used by ChemDoodle (read/write) (Matt Swain)
- JSON format used by PubChem (read/write) (Matt Swain)
- LPMD's atomic configuration file (read/write) (Joaquin Peralta)
- The format used by the CONTFF and POSFF files in MDFF (read/write) (Kirill Okhotnikov)
- ORCA output files (read only) and ORCA input files (write only) (Dagmar Lenk)
- ORCA-AICCM's extended XYZ format (read/write) (Dagmar Lenk)
- Painter format for custom 2D depictions (write only) (Noel O'Boyle)
- Siesta output files (read only) (Patrick Avery)
- Smiley parser for parsing SMILES according to the OpenSMILES specification (read only) (Tim Vandermeersch)
- STL 3D-printing format (write only) (Matt Harvey)
- Turbomole AOFORCE output (read only) (Mathias Laurin)
- A representation of the VDW surface as a point cloud (write only) (Matt Harvey)
New file format capabilities and options
- AutoDock PDBQT: Options to preserve hydrogens and/or atom names (Matt Harvey)
- CAR: Improved space group support in .car files (kartlee)
- CDXML: Read/write isotopes (Roger Sayle)
- CIF: Extract charges (Kirill Okhotnikov)
- CIF: Improved support for space-groups and symmetries (Alexandr Fonari)
- DL_Poly: Cell information is now read (Kirill Okhotnikov)
- Gaussian FCHK: Parse alpha and beta orbitals (Geoff Hutchison)
- Gaussian out: Extract true enthalpy of formation, quadrupole, polarizability tensor, electrostatic potential fitting points and potential values, and more (David van der Spoel)
- MDL Mol: Read in atom class information by default and optionally write it out (Roger Sayle)
- MDL Mol: Support added for ZBO, ZCH and HYD extensions (Matt Swain)
- MDL Mol: Implement the MDL valence model on reading (Roger Sayle)
- MDL SDF: Option to write out an ASCII depiction as a property (Noel O'Boyle)
- mmCIF: Improved mmCIF reading (Patrick Fuller)
- mmCIF: Support for atom occupancy and atom_type (Kirill Okhotnikov)
- Mol2: Option to read UCSF Dock scores (Maciej Wójcikowski)
- MOPAC: Read z-matrix data and parse (and prefer) ESP charges (Geoff Hutchison)
- NWChem: Support sequential calculations by optionally overwriting earlier ones (Dmitriy Fomichev)
- NWChem: Extract info on MEP(IRC), NEB and quadrupole moments (Dmitriy Fomichev)
- PDB: Read/write PDB insertion codes (Steffen Möller)
- PNG: Options to crop the margin, and control the background and bond colors (Fredrik Wallner)
- PQR: Use a stored atom radius (if present) in preference to the generic element radius (Zhixiong Zhao)
- PWSCF: Extend parsing of lattice vectors (David Lonie)
- PWSCF: Support newer versions, and the 'alat' term (Patrick Avery)
- SVG: Option to avoid addition of hydrogens to fill valence (Lee-Ping)
- SVG: Option to draw as ball-and-stick (Jean-Noël Avila)
- VASP: Vibration intensities are calculated (Christian Neiss, Mathias Laurin)
- VASP: Custom atom element sorting on writing (Kirill Okhotnikov)
Other new features and improvements
- 2D layout: Improved the choice of which bonds to designate as hash/wedge bonds around a stereo center (Craig James)
- 3D builder: Use bond length corrections based on bond order from Pyykko and Atsumi (http://dx.doi.org/10.1002/chem.200901472) (Geoff Hutchison)
- 3D generation: "--gen3d", allow user to specify the desired speed/quality (Geoff Hutchison)
- Aromaticity: Improved detection (Geoff Hutchison)
- Canonicalisation: Changed behaviour for multi-molecule SMILES. Now each molecule is canonicalized individually and then sorted. (Geoff Hutchison/Tim Vandermeersch)
- Charge models: "--print" writes the partial charges to standard output after calculation (Geoff Hutchison)
- Conformations: Confab, the systematic conformation generator, has been incorporated into Open Babel (David Hall/Noel O'Boyle)
- Conformations: Initial support for ring rotamer sampling (Geoff Hutchison)
- Conformer searching: Performance improvement by avoiding gradient calculation and optimising the default parameters (Geoff Hutchison)
- EEM charge model: Extend to use additional params from http://dx.doi.org/10.1186/s13321-015-0107-1 (Tomáš Raček)
- FillUnitCell operation: Improved behavior (Patrick Fuller)
- Find duplicates: The "--duplicate" option can now return duplicates instead of just removing them (Chris Morley)
- GAFF forcefield: Atom types updated to match Wang et al. J. Comp. Chem. 2004, 25, 1157 (Mohammad Ghahremanpour)
- New charge model: EQeq crystal charge equilibration method (a speed-optimized crystal-focused charge estimator, http://pubs.acs.org/doi/abs/10.1021/jz3008485) (David Lonie)
- New charge model: "fromfile" reads partial charges from a named file (Matt Harvey)
- New conversion operation: "changecell", for changing cell dimensions (Kirill Okhotnikov)
- New command-line utility: "obthermo", for extracting thermochemistry data from QM calculations (David van der Spoel)
- New fingerprint: ECFP (Geoff Hutchison/Noel O'Boyle/Roger Sayle)
- OBConversion: Improvements and API changes to deal with a long-standing memory leak (David Koes)
- OBAtom::IsHBondAcceptor(): Definition updated to take into account the atom environment (Stefano Forli)
- Performance: Faster ring-finding algorithm (Roger Sayle)
- Performance: Faster fingerprint similarity calculations if compiled with -DOPTIMIZE_NATIVE=ON (Noel O'Boyle/Jeff Janes)
- SMARTS matching: The "-s" option now accepts an integer specifying the number of matches required (Chris Morley)
- UFF: Update to use traditional Rappe angle potential (Geoff Hutchison)
- Bindings: Support compiling only the bindings against system libopenbabel (Reinis Danne)
- Java bindings: Add example Scala program using the Java bindings (Reinis Danne)
- New bindings: PHP (Maciej Wójcikowski)
- PHP bindings: BaPHPel, a simplified interface (Maciej Wójcikowski)
- Python bindings: Add 3D depiction support for Jupyter notebook (Patrick Fuller)
- Python bindings, Pybel: calccharges() and convertdbonds() added (Patrick Fuller, Björn Grüning)
- Python bindings, Pybel: compress output if filename ends with .gz (Maciej Wójcikowski)
- Python bindings, Pybel: Residue support (Maciej Wójcikowski)
- Version control: move to git and GitHub from subversion and SourceForge
- Continuous integration: Travis for Linux builds and Appveyor for Windows builds (David Lonie and Noel O'Boyle)
- Python installer: Improvements to the Python setup.py installer and "pip install openbabel" (David Hall, Matt Swain, Joshua Swamidass)
- Compilation speedup: Speed up compilation by combining the tests (Noel O'Boyle)
- MacOSX: Support compiling with libc++ on MacOSX (Matt Swain)
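Several entries in the list above concern fingerprints (the new ECFP fingerprint and the faster similarity calculations). As a reminder of what a fingerprint similarity calculation involves, here is a minimal pure-Python sketch of the Tanimoto coefficient. It stores each fingerprint as a Python integer whose set bits mark the features present; that representation and the function name are assumptions for illustration, not Open Babel's API.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints held as integers (bit i set = feature i present)."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in either fingerprint
    # Two empty fingerprints share nothing; returning 0.0 here is a convention chosen for this sketch
    return common / total if total else 0.0

# Two made-up 6-bit fingerprints
fp_1 = 0b101101
fp_2 = 0b100111
print(tanimoto(fp_1, fp_2))
```

The speed-ups mentioned in the release notes come from doing this bit counting with native CPU instructions rather than anything like the string-based count used here.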
Thomas Sander from openmolecules.org has provided a version of DataWarrior that can directly import the Open Source Malaria Data.
The new version can be downloaded here: http://www.openmolecules.org/datawarrior. Once downloaded, you will need to temporarily adjust your security settings to open it the first time. This is because DataWarrior is not from the Mac App Store or an identified developer. Once it is open, make sure you reset your security settings.
Once installed and opened select the macro as shown below to retrieve the Open Source Malaria Data.
The import only takes a few seconds and pulls the data directly from the Open Source Malaria spreadsheet, so it will contain the latest information.
A great publication on Open Source Molecular Modeling.
The success of molecular modeling and computational chemistry efforts are, by definition, dependent on quality software applications. Open source software development provides many advantages to users of modeling applications, not the least of which is that the software is free and completely extendable. In this review we categorize, enumerate, and describe available open source software packages for molecular modeling and computational chemistry. An updated online version of this catalog can be found at https://opensourcemolecularmodeling.github.io.
From toolkits to desktop applications, a fantastic and comprehensive listing.
I just came across an interesting paper on cross-platform OpenCL programming: The Hitchhiker’s Guide to Cross-Platform OpenCL Application Development. In particular, it highlights a number of issues and offers workarounds. These include framework bugs, specification limitations and program bugs.
There are an increasing number of scientific applications taking advantage of GPU acceleration.
ResearchKit is an open-source framework that allows researchers and developers to create powerful apps for medical research.
The Parkinson app is one of the first five apps built using ResearchKit.
mPower is a unique iPhone application that uses a mix of surveys and tasks that activate phone sensors to collect and track health and symptoms of Parkinson Disease (PD) progression - like dexterity, balance or gait. The goal of this app is to learn more about the variations of PD, and to improve the way we describe these variations and to learn how mobile devices and sensors can help us to measure PD and its progression to ultimately improve the quality of life for people with PD.
The initial results have now been published in Scientific Data 3, Article number: 160011 (2016) DOI, with around 15,000 people contributing data to the study.
Calculating solvent accessible surface area (SASA) is an important step in the study of protein structure, and whilst there are many tools to undertake this sort of calculation, FreeSASA represents the first open-source, free-standing tool for it. FreeSASA is an open source C library for SASA calculations that provides both command-line and Python interfaces.
Source code is available for download here. Building the FreeSASA library and command-line interface only requires standard C and GNU libraries and a C99-compliant compiler, and should be straightforward on any UNIX system (it has been tested on Mac OS X 10.8 and Debian 8).
Mitternacht S. FreeSASA: An open source C library for solvent accessible surface area calculations [version 1; referees: awaiting peer review]. F1000Research 2016, 5:189 DOI
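To give a feel for what a SASA calculation involves, here is a small pure-Python sketch of the classic Shrake and Rupley point-counting approximation: each atom's sphere is expanded by the probe radius, covered with test points, and the exposed fraction of points gives the accessible area. This is an illustration written for this page, not FreeSASA's code or API; the function names, the 1.4 Å probe default and the point count are assumptions.

```python
import math

def sphere_points(n):
    """Roughly even points on a unit sphere (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

def shrake_rupley(atoms, probe=1.4, n_points=200):
    """atoms: list of (x, y, z, radius) in Å. Returns per-atom SASA in Å^2."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe  # radius of the probe-expanded sphere
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            # A test point is buried if it lies inside any other expanded sphere
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas

# Two overlapping carbon-sized atoms (illustrative coordinates and radii)
print(shrake_rupley([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]))
```

The double loop over atoms makes this quadratic in the number of atoms, which is fine for a demonstration but is one reason a compiled library with neighbour lists is the sensible choice for real proteins.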
I recently needed to download the supplementary information provided with a publication, and my heart sank when I saw it was provided as a PDF file. My worst fears were justified when I tried to simply copy and paste SMILES strings together with 5 columns of data into a spreadsheet: no chance of it copying across in an ordered manner!
Then I tried Tabula, a tool for "liberating data tables locked inside PDF files". It worked perfectly: nearly 2000 rows of data spread over 11 pages were converted to a csv file in a couple of mouse clicks. This is wonderful and should be part of any data scientist's toolkit.
It is included on the Data Analysis Tools page but really deserves a special mention.
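Once the table is out of the PDF and in a csv file, it is easy to pick up in a script. The sketch below uses only Python's standard csv module; the column names and values are invented for illustration (they are not from the publication mentioned above), and an in-memory string stands in for the exported file.

```python
import csv
import io

# Hypothetical extract: column names and values are made up for illustration.
exported = io.StringIO(
    "SMILES,ID,IC50_uM\n"
    "CCO,cmpd-1,12.5\n"
    "c1ccccc1,cmpd-2,\n"
    "CC(=O)O,cmpd-3,0.8\n"
)

def load_rows(handle, numeric_column):
    """Read a csv table into dicts, converting one column to float (blank cells become None)."""
    rows = []
    for row in csv.DictReader(handle):
        raw = row[numeric_column].strip()
        row[numeric_column] = float(raw) if raw else None
        rows.append(row)
    return rows

rows = load_rows(exported, "IC50_uM")
print(len(rows), rows[0]["SMILES"], rows[0]["IC50_uM"])
```

With a real export you would replace the `io.StringIO` object with `open("table.csv", newline="")`, where the filename is whatever you saved the extracted table as.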
As I previously highlighted after WWDC, Apple has announced that Swift is now open source.
More details are on the Swift blog:
Swift is now open source. Today Apple launched the open source Swift community, as well as amazing new tools and resources including:
- Swift.org – a site dedicated to the open source Swift community
- Public source code repositories at github.com/apple
- A new Swift package manager project for easily sharing and building code
- A Swift-native core libraries project with higher-level functionality above the standard library
- Platform support for all Apple platforms as well as Linux
Swift.org is an entirely new site dedicated to open source Swift. This site hosts resources for the community of developers that want to help evolve Swift, contribute fixes, and most importantly, interact with each other. It also provides development snapshots for Apple and Linux platforms; these require OS X 10.11 (El Capitan) or Ubuntu 14.04 or 15.10 (64-bit).
Source code is available on GitHub.
Polyphony is an open source software suite written in Python. Its purpose is the superimposition-free analysis and comparison of multiple 3D structures of the same or closely related protein molecules.
It requires Python 2.6 or later, SciPy, NumPy and Biopython (especially the Bio.PDB module).
All following documentation assumes that you have these installed.
Also listed are IPython for interactive Python scripting, matplotlib for graph plotting, and PyMOL for interactive 3D visualisation (an open source version is available on SourceForge).
William R Pitt, Rinaldo W Montalvão and Tom L Blundell, BMC Bioinformatics, 2014, 15:324 doi
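To illustrate what "superimposition free" can mean in practice, the sketch below compares two structures through their internal distance matrices, which do not change when a structure is rotated or translated. This is a generic example of a superposition-free descriptor chosen for illustration; it is not taken from Polyphony and may differ from the measures that package actually uses. All names and coordinates are made up.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between points; unchanged by rotation or translation."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def max_distance_difference(coords_a, coords_b):
    """Largest change in any pairwise distance between two equally ordered structures."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    return max(abs(x - y) for row_a, row_b in zip(da, db) for x, y in zip(row_a, row_b))

def rotate_z_and_shift(coords, angle, shift):
    """Rigid-body move: rotate about the z axis by angle (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
moved = rotate_z_and_shift(original, 0.7, (3.0, -2.0, 5.0))   # same shape, new pose
deformed = original[:3] + [(0.0, 1.5, 2.0)]                   # last point displaced

print(max_distance_difference(original, moved))
print(max_distance_difference(original, deformed))
```

The rigidly moved copy gives a difference that is zero to within rounding, while the deformed copy does not, and neither comparison needed the structures to be overlaid first.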
The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date a lot of people around the world have made contributions and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, it is often spread over multiple web pages and can be tricky to link molecule with identifier with data. Over the last couple of months a significant effort has been put into populating a spreadsheet with all the information.
Whilst this is useful for viewing results, it is not ideal for trying to build predictive models. Vortex is a chemically intelligent data analysis and visualisation platform. This script provides one-click access to the OSM data and creates a workspace containing all the data, and since it is linked to the live spreadsheet you will always have access to the latest data.
A recent paper in J Cheminformatics described Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field DOI. ODDT is a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. Open Drug Discovery Toolkit is released under a permissive 3-clause BSD license for both academic and industrial use. ODDT’s source code, additional examples and documentation are available on GitHub.
To install ODDT on a Mac you first need to install the appropriate toolkits. The easiest way is to use Homebrew; I've written a page detailing how to do this here.
Once they are installed, you can install ODDT using pip as described here.
More news on Swift 2.0 on the Swift Blog:
Today at WWDC, we announced Swift 2.0. This new version has even better performance, a new error handling API, and first-class support for availability checking. And platform APIs feel even more natural in Swift with enhancements to the Apple SDKs.
Open Source
In addition to new features, the big news is that Apple will be making Swift open source later this year. We are all incredibly excited about this, and look forward to giving you a lot more information as the open source release gets nearer. Here is what we can tell you so far:
- Swift source code will be released under an OSI-approved permissive license.
- Contributions from the community will be accepted and encouraged.
- At launch we intend to contribute ports for OS X, iOS, and Linux.
- Source code will include the Swift compiler and standard library.
- We think it would be amazing for Swift to be on all your favorite platforms.
We are excited about the opportunities an open source Swift creates for our industry. Baked-in safety features combined with excellent speed mean it has the chance to dramatically improve software versus using C-based languages. Swift is packed with modern features, it’s fun to write, and we believe it will get used in a lot of places. Together, we have an exciting road ahead.
The authors kindly supply a demo web page demonstrating different chart types and functions of the SpeckTackle library. Example data is embedded in the web page (800 kb file size). Click on the buttons at the top of the page to see the data displayed. For the Chromatogram, Difference Chart and Spectral Match click the button then the Add Data button.
Highlighting a section of the spectra expands the view, and mousing over the 2D NMR spectra provides a tooltip giving chemical shifts.
I've added this to the spectroscopy resources page.
To be honest, I can't remember when I last used Perl, but this publication brought back a few memories DOI.
HackaMol is an open source, object-oriented toolkit written in Modern Perl that organizes atoms within molecules and provides chemically intuitive attributes and methods.
There is also a very interesting extension, HackaMol::X::Vina, a structured class that provides an interface to the AutoDock Vina docking program.
The OpenPhacts API has been updated to include two new data sets and the corresponding API calls.
1) DisGeNet target-disease associations. These API calls use URI inputs that correspond to either diseases or targets (proteins or genes). The disease identifiers correspond to UMLS CUIs, MeSH IDs or ConceptWiki and can use several namespaces, e.g. http://linkedlifedata.com/resource/umls/id/C0004238, http://purl.bioontology.org/ontology/MSH/D001281, or http://www.conceptwiki.org/concept/index/095cb66f-76ef-41b5-a8ae-c39352e6007e
2) neXtProt nanopublications for tissue expression (PREVIEW mode). These API calls use URIs that correspond to either tissues or targets. The tissue identifiers correspond to the Caloha tissue ontology from neXtProt. These identifiers can use either the namespace from the neXtProt database (e.g. http://www.nextprot.org/db/term/TS-0564, will be operational next week) or the Caloha ontology (ftp://ftp.nextprot.org/pub/currentrelease/controlledvocabularies/caloha.obo#TS-0564, operational now).
To reduce the barriers to drug discovery in industry, academia and for small businesses, the Open PHACTS Discovery Platform provides tools and services to interact with multiple integrated and publicly available data sources. To integrate this data, extensive cross-referencing of scientific concepts is needed across all databases.
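Since these calls take a URI as their input, one practical detail is that the URI has to be percent-encoded before it can travel as a query parameter. The sketch below shows that step with Python's standard library. The endpoint and the parameter name `uri` are placeholders chosen for illustration, not the documented Open PHACTS API, so check the API documentation for the real paths and any required keys.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint and parameter name: assumptions, not the documented Open PHACTS API.
BASE_URL = "https://api.example.org/disease"

# One of the disease identifiers quoted above
disease_uri = "http://linkedlifedata.com/resource/umls/id/C0004238"

def build_query(base_url, uri):
    """Attach a URI to a base URL as a percent-encoded query parameter."""
    return base_url + "?" + urlencode({"uri": uri})

request_url = build_query(BASE_URL, disease_uri)
print(request_url)

# Decoding the query string recovers the original identifier unchanged
print(parse_qs(urlsplit(request_url).query)["uri"][0])
```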
I’m a great fan of SMILES notation (simplified molecular-input line-entry system) as a compact means of storing chemical structures, and whilst there are many tools for creating SMILES strings, they often give different (but acceptable) results. Various algorithms for generating canonical SMILES have been developed, including those by Daylight Chemical Information Systems, OpenEye Scientific Software, MEDIT, Chemical Computing Group and MolSoft LLC, but all use proprietary code. In the latest issue of Journal of Cheminformatics, Noel O’Boyle describes the development of Universal SMILES and Inchified SMILES as implemented in Open Babel, an open source cheminformatics toolkit. DOI
GPU-SD is a library and daemon for the discovery and announcement of graphics processing units using ZeroConf. It enables auto-configuration of ad-hoc GPU clusters and multi-GPU machines. GPU-SD is used by the upcoming Equalizer 1.2 release for automatic configuration of local and remote GPU resources.
Version 1.0 of GPU-SD provides automatic local discovery for Linux (X11/GLX), Mac OS X (CGL, GLX) and Windows (WGL, WGL_NV_gpu_affinity, WGL_AMD_gpu_association), a simple network announcement daemon using DNS service discovery and ZeroConf networking, as well as remote discovery of resources announced using the GPU-SD daemon.
GPU-SD is a cross-platform library, available for Linux, Windows and Mac OS X and supports both 32-bit and 64-bit execution. It is licensed under the LGPL open source license, which allows free usage in commercial and open source projects. For more information about GPU-SD, please visit http://www.equalizergraphics.com/gpu-sd