Macs in Chemistry

Insanely Great Science

Khronos Releases OpenGL 4.6 with SPIR-V Support


The Khronos™ Group, an open consortium of leading hardware and software companies, announced at the SIGGRAPH 2017 Conference the immediate public availability of the OpenGL® 4.6 specification. OpenGL 4.6 integrates the functionality of numerous ARB and EXT extensions created by Khronos members AMD, Intel, and NVIDIA into core, including the capability to ingest SPIR-V™ shaders.

The OpenGL 4.6 specification can be found at The GLSL to SPIR-V compiler glslang has been updated with GLSL 4.60 support, and can be found at


Khronos Releases OpenCL 2.2


I see that OpenCL 2.2 has been released and, reading through the press release, there are a couple of notes that might be of wider interest.

“By finalizing OpenCL 2.2, Khronos has delivered on its promise to make C++ a first-class kernel language in the OpenCL standard,” said Neil Trevett, OpenCL chair and Khronos president. “The OpenCL working group is now free to continue its work with SYCL, to converge the power of single source parallel C++ programming with standard ISO C++, and to explore new markets and opportunities for OpenCL, such as embedded vision and inferencing. We are also working to converge with, and leverage, the Khronos Vulkan API, merging advanced graphics and compute into a single API.”

Vulkan is a new generation graphics and compute API that provides high-efficiency, cross-platform access to modern GPUs used in a wide variety of devices from computers and consoles to mobile phones and embedded platforms.

There is a page of GPU-accelerated science applications here.


Updated the GPU Science page


I've just updated the GPU Science page, adding a couple of new applications including cryoSPARC.


Amber16 and AmberTools16 released


AmberTools consists of several independently developed packages that work well by themselves, and with Amber itself. The suite can also be used to carry out complete molecular dynamics simulations, with either explicit water or generalized Born solvent models.

The AmberTools suite is free of charge, and its components are mostly released under the GNU General Public License (GPL). A few components are included that are in the public domain or which have other, open-source, licenses. The sander program now has the LGPL license. AmberTools is distributed in source code format, and must be compiled in order to be used. You will need C, C++ and Fortran90 compilers

The Amber16 package builds on AmberTools16 by adding the pmemd program, which resembles the sander (molecular dynamics) code in AmberTools, but provides (much) better performance on multiple CPUs, and dramatic speed improvements on GPUs. Major new features include:

Semi-Isotropic Pressure Scaling (GPU)
Charmm VDW Force Switch (CPU, GPU)
Enhanced NMR Restraint support + R^6 averaging support (GPU)
Gaussian Accelerated Molecular Dynamics (CPU, GPU)
Support for external electric fields (CPU)
Expanded umbrella sampling support (GPU)
Constant pH supported with replica exchange along pH coordinate (GPU)
Support for gas phase MD (igb=6) (CPU, GPU)
Support and significant performance improvements for the latest Kepler, Maxwell and Pascal GPUs from NVIDIA.


GPU-Accelerated Scientific Applications


There has been a discussion about GPU-Accelerated Scientific Applications on CCL so I thought it was a good time to update the GPU Science page.

Many thanks to all those who contributed.


OLCF’s second OpenACC hackathon


The GPU Science page is always pretty popular, so I thought I'd mention an upcoming event.

OLCF’s second OpenACC hackathon will take place the week of October 19th, 2015

The goal of each hackathon is for current or prospective user groups of large hybrid CPU-GPU systems to send teams of 3-6 developers along with either (1) a (potentially) scalable application that needs to be ported to GPU accelerators, or (2) an application running on accelerators which needs optimization. There will be intensive mentoring during this 5-day hands-on workshop, with the goal that the teams leave with applications running on GPUs, or at least with a clear roadmap of how to get there. Our mentors come from national laboratories, universities and vendors, and besides having extensive experience in programming with OpenACC, many of them develop the OpenACC-capable compilers and help define the OpenACC standard.

The application period is now open and closes on 3 July, 2015. Space will be limited to a maximum of eight teams, with two mentors for each team. Groups will be notified about acceptance or rejection of their application by Friday, July 31, 2015. See below how to apply. Prior GPU experience is not required! Those groups whose application successfully passes the selection process will receive further information regarding registration.


Vulkan: New graphics API


The Khronos Group have announced a series of lectures describing Vulkan, the new cross-platform graphics API: "Vulkan: High-Efficiency GPU Graphics and Compute". Vulkan is also known as the Next Generation OpenGL Initiative.

There are more details on the Khronos website.


CUDA for Mac


Nvidia have released CUDA driver 6.0.51, which is required for CUDA support on Mac OS X 10.9 Mavericks. Note: Quadro FX for Mac or GeForce for Mac must be installed prior to CUDA 6.0.51 installation.

Alternatively, the latest CUDA driver can be downloaded from within Mac OS X: open System Preferences > Other > CUDA and click 'Install CUDA Update'.

There are a number of GPU-accelerated science applications described on the GPU Science Page.




I’ve just been sent details of this webinar.

An Overview of AMBER 14 - Creating the World's Fastest Molecular Dynamics Software Package
Tuesday, May 13, 2014 9:00 AM - 10:00 AM PDT

This webinar will provide an overview of the AMBER Molecular Dynamics Software package with focus on what is new with regards to GPU acceleration in the recently released version 14. This includes details of peer-to-peer support and optimizations, which have resulted in version 14 being the fastest MD software package on commodity hardware. Benchmarks will be provided, along with recommended hardware choices. In addition, an overview of the new GPU centric features in AMBER 14 will be covered, including support for multi-dimensional replica exchange MD, hydrogen mass repartitioning, accelerated MD, Scaled MD, and support-as-a-service on Amazon Web Services. This is a joint webinar by Ross C. Walker, University of California San Diego, Scott Le Grand, Amazon Web Services, and Adrian Roitberg, University of Florida.


OpenCL training course


Simon McIntosh-Smith has just released a new OpenCL training course, "HandsOnOpenCL", via GitHub. It includes a comprehensive set of exercises and solutions in C, C++ and Python.

There is a list of GPU-accelerated scientific applications here.


Expanding Computational Chemistry with GPUs

I see there is a session on GPU enabled science at the September ACS meeting.

Link to Abstracts.

There is a day and a half of talks in the Division of Computers in Chemistry.

NVIDIA will also host a poster competition for GPU-accelerated computational chemistry on the evening of Sept. 10th. NVIDIA will award a free GPU to the author of the best poster, as selected by a panel of experts.

There is a listing of GPU-accelerated scientific applications here.


ACEMD webinar

The number of scientific applications making use of graphics card processing continues to increase, and NVIDIA are hosting another webinar. The latest webinar describes the use of ACEMD.

ACEMD is a production bio-molecular dynamics software package specially optimized to run on the graphics processing units (GPUs) of graphics cards. It reads CHARMM/NAMD and AMBER input files with a simple and powerful configuration interface. ACEMD allows performance equivalent to over 100 CPUs and microsecond-long trajectories on workstation hardware. ACEMD is the computational engine behind one of the largest distributed computing projects worldwide, currently totalling thousands of GPUs. ACEMD is compatible with CUDA and OpenCL, the new standard framework for parallel and high-performance computing over different architectures.

You can now view previously recorded webinars on GPU-accelerated applications such as AMBER, NAMD, GROMACS and FastROCS here (free registration required). There is a page of GPU-accelerated scientific applications here.



I was at the Cresset UGM last week and had a chance to hear more about BlazeGPU. The original CPU application Blaze uses the shape and electrostatic character of known ligands to rapidly search large chemical collections for molecules with similar properties. The latest version BlazeGPU runs at 40 times the speed of the CPU version of Blaze but loses nothing in accuracy. At a fraction of the hardware cost, BlazeGPU delivers the same effective, ligand based virtual screening as Blaze, based on the shape and electrostatic nature of molecules.

BlazeGPU is written in OpenCL and OpenCL libraries are available from NVidia and AMD for their graphics cards, but also from Intel for the CPU and for their new Xeon Phi coprocessor cards. BlazeGPU is currently designed only to run on the GPU - for CPU-only clusters the original code is just as fast, and on a machine with a reasonably fast GPU or two the CPU tends to run flat out just feeding data to the graphics card, so there's not that much gain running on the CPU as well as the GPU.

Currently the conformer generation still runs on the CPU, but they are looking at the possibility of porting that to OpenCL as well in the future.

The relative performance is shown in the plot below. It is worth noting that these are relatively inexpensive graphics cards that you can pick up on Amazon or eBay for a few hundred pounds. Also note that for a $2.10/hour GPU instance on Amazon EC2 you can process 2m conformations.


There are more examples of GPU science here.


NWChem 6.3 released

An update to NWChem has been released with a host of new features.

NWChem 6.3 includes a new real-time, time-dependent density functional theory capability developed by Ken Lopata, EMSL's first William Wiley Distinguished Postdoctoral Fellow. This capability allows researchers to probe the ultrafast dynamical behavior of molecules and materials in response to an applied electric field.

With this release, researchers will for the first time be able to perform large scale coupled cluster with perturbative triples calculations utilizing the NVIDIA GPU technology. A highly scalable multi-reference coupled cluster capability will also be available in NWChem 6.3.

EMSL Computing greatly expanded NWChem 6.3 plane wave capability with access to a large set of density functional and pseudopotentials or effective potentials, and a more extensive suite of functionality for the projector augmented wave methodology.

The latest basis sets in the Basis Set Exchange have been added to the NWChem basis set library. In addition, NWChem 6.3 includes a new set of reaction path methodologies and tools for various spectroscopies, including Python scripts to post-process UV/Vis and core spectra. Binaries are not yet available, but the source code and instructions for compilation on a Mac are available. You will need Xcode and gfortran 4.6.2 from


International Workshop on OpenCL (IWOCL)

This might be of interest to those involved in developing scientific applications that take advantage of the GPU.

The International Workshop on OpenCL (IWOCL) is an annual meeting of vendors, researchers and developers to promote the evolution and advancement of the OpenCL standard. The meeting is open to anyone who is interested in contributing to, and participating in the OpenCL community. IWOCL is the premier forum for the presentation and discussion of new designs, trends, algorithms, programming models, software, tools and ideas for OpenCL. Additionally, IWOCL provides a formal channel for community feedback to OpenCL promoters and contributors.

May 13-14, 2013, Georgia Institute of Technology; more details here.


FastROCS updated

I just got an email from OpenEye announcing an update to FastROCS.

FastROCS is an extremely fast shape comparison application, based on the idea that molecules have similar shape if their volumes overlay well and any volume mismatch is a measure of dissimilarity. It uses a smooth Gaussian function to represent the molecular volume [1], so it is possible to routinely minimize to the best global match.


  • Processes 2 million conformations per second on a Quad Fermi box
  • Returns overlays based on the quality of the 3D shape match against the query
  • Overlays are intuitive and visually informative when viewed in standard visualizers (e.g. VIDA)
  • Available as an XML-RPC based web service
  • Jobs can be launched and the subsequent results viewed directly from within VIDA
  • Reports rigorous Tanimoto measure between shapes
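The idea of scoring shape similarity from the overlap of Gaussian volumes can be illustrated with a minimal sketch. This is not OpenEye's implementation: it assumes each atom is a single spherical Gaussian, keeps only the pairwise (first-order) overlap terms, and does not optimise the overlay. The function names, the prefactor `p` and the `alpha` widths are hypothetical values chosen for illustration. The shape Tanimoto is computed as O_AB / (O_AA + O_BB - O_AB).

```python
import math


def gaussian_overlap(center_a, center_b, alpha_a, alpha_b, p=2.7):
    """Overlap integral of two spherical Gaussians p*exp(-alpha*|r - R|^2).

    p and the alpha values are illustrative assumptions, not FastROCS parameters.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    s = alpha_a + alpha_b
    return p * p * (math.pi / s) ** 1.5 * math.exp(-alpha_a * alpha_b * d2 / s)


def overlap_volume(mol_a, mol_b):
    """First-order overlap: sum over all atom pairs (higher-order terms ignored)."""
    return sum(
        gaussian_overlap(ca, cb, aa, ab)
        for ca, aa in mol_a
        for cb, ab in mol_b
    )


def shape_tanimoto(mol_a, mol_b):
    """Tanimoto measure between two fixed (already overlaid) shapes."""
    o_ab = overlap_volume(mol_a, mol_b)
    o_aa = overlap_volume(mol_a, mol_a)
    o_bb = overlap_volume(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)


# Each "molecule" is a list of (centre, alpha) pairs; coordinates in Angstroms.
query = [((0.0, 0.0, 0.0), 0.8), ((1.5, 0.0, 0.0), 0.8)]
shifted = [((0.5, 0.0, 0.0), 0.8), ((2.0, 0.0, 0.0), 0.8)]
far_away = [((100.0, 0.0, 0.0), 0.8), ((101.5, 0.0, 0.0), 0.8)]

print(shape_tanimoto(query, query))     # identical shapes score 1
print(shape_tanimoto(query, shifted))   # partial overlap scores between 0 and 1
print(shape_tanimoto(query, far_away))  # no overlap scores 0
```

A real shape search would also rotate and translate one molecule to maximise the overlap before computing the score; that optimisation step is omitted here.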



GPU accelerated version of GROMACS 4.6

I thought I would give a plug to an upcoming webinar that Dr. Erik Lindahl at Stockholm University and NVIDIA are presenting to discuss the latest GPU-acceleration technologies available to GROMACS users. Join to learn about the latest accelerated version of GROMACS 4.6, which features are supported, its installation and use, and how it performs with the latest NVIDIA Kepler GPUs. The webinar is planned for Thursday, April 4th, 2013 9:00 AM - 10:00 AM Pacific Standard Time.

Register here:

There is a list of GPU-accelerated applications here.


OpenMM Updated

OpenMM has been updated.

OpenMM is a toolkit for molecular simulation. It can be used either as a stand-alone application for running simulations, or as a library you call from your own code. It provides a combination of extreme flexibility (through custom forces and integrators), openness, and high performance (especially on recent GPUs) that make it truly unique among simulation codes.


Open Molecular Mechanics (OpenMM) workshop

This might be of interest.

Simbios invites you to join us at its next Open Molecular Mechanics (OpenMM) workshop.

Where: Stanford University
When: March 26-29, 2013
Registration: Free but required, and spaces are limited. To register or for more information, visit

OpenMM is open-source software that enables molecular dynamics (MD) simulations to be accelerated on high performance computer architectures. It has demonstrated speed ups for both implicit solvent[1] and explicit solvent simulations[2] on graphics processing units (GPUs). Its performance, openness, and extreme flexibility (via custom forces and integrators) make it truly unique among simulation codes.

A well-designed framework provides an application layer and a library, so that non-programmers can easily and quickly run MD simulations and develop custom algorithms on GPUs, while programmers are simultaneously able to integrate OpenMM cleanly into their own programs.

The workshop offers two tracks:  one for those who want to use OpenMM to run molecular dynamics simulations (no programming experience is needed), and another for programmers interested in integrating OpenMM into their own software.  

The last two days of the workshop are devoted to having the OpenMM team assist participants with their individual projects.   You can sign up for an instructional track, just the open working days, or both.  

OpenMM is supported by Simbios, an NIH National Center for Physics-Based Simulation of Biological Structures. To learn more about Simbios and its research and software tools, visit

[1] OpenMM accelerated code running on NVIDIA GeForce GTX 280 GPU vs. conventional code with Amber9 running on Intel Xeon 2.66 GHz CPU. MS Friedrichs, et al., "Accelerating Molecular Dynamic Simulation on Graphics Processing Units," J. Comp. Chem., 2009, 30(6):864-872.

[2] Eastman, P. and Pande, V.S., "Efficient Nonbonded Interactions for Molecular Dynamics on a Graphics Processing Unit," J. Comp. Chem., 2010, 31(6):1268-1272.

There is a listing of GPU accelerated scientific applications here.


Try MD/QC applications remotely on latest Tesla K20 GPU Accelerators for free

I just got this email from NVIDIA

Read the benchmark reports on how the Tesla K20 GPUs can increase application performance over the previous generation Fermi GPUs, and CPUs alone.
AMBER 12 Benchmark Report -
NAMD 2.9 Benchmark Report -
LAMMPS Benchmark Report -
GROMACS 4.6 Benchmark Report -
To sign up for the free and easy Tesla K20 GPU test drive today and try AMBER on Tesla K20, all you need to do is:
1. Register on
2. Log in to a remote cluster with Tesla K20 GPUs (we will send you instructions).
3. Run your application to get speed-up results for Tesla K20.
Several MD/QC applications are already pre-loaded on the cluster. You can also try your code.

There is a listing of GPU accelerated scientific applications here.


OpenCL conference

I just noticed the announcement of the first International Workshop on OpenCL.

The International Workshop on OpenCL (IWOCL) is an annual meeting of vendors, researchers and developers to promote the evolution and advancement of the OpenCL standard. The meeting is open to anyone who is interested in contributing to, and participating in the OpenCL community. IWOCL is the premier forum for the presentation and discussion of new designs, trends, algorithms, programming models, software, tools and ideas for OpenCL. Additionally, IWOCL provides a formal channel for community feedback to OpenCL promoters and contributors.

The closing date for submitting papers is Feb 8th.

We solicit the submission of unpublished technical papers detailing innovative, original research related to OpenCL. All topics related to OpenCL are of interest, including OpenCL applications from any domain (e.g., scientific computing, video games, computer graphics, multimedia, information retrieval, optimization, text processing, data mining, finance, signal and image processing and numerical solvers), OpenCL performance analysis and modeling, OpenCL performance and correctness tools and proposed OpenCL extensions.

There is a listing of GPU accelerated scientific applications here.


Fen Zi GPU-based MD simulations

Fen Zi (yun dong de Fen Zi = Moving MOLECULES) is a CUDA code that enables large-scale, GPU-based MD simulations. The code of Fen Zi is now available in Google Code at Any help or feedback is welcome!

Fen Zi currently includes:
- NVT and NVE ensembles (NPT coming soon)
- Force field: CHARMM force field, Flexible Water Models
- Lennard-Jones interactions: Switching or shifting
- Long distance electrostatic interactions: Ewald summation method and Reaction field
- Solvent: Explicit or implicit model; TIP3; Flexible SPC/Fw water model
- Exclusion lists for VDW and electrostatic interactions: NBXMod from 1 to 5
- Restraint potentials to probe the free energetic evaluation of processes
- Shake/Rattle bond constraints for atom–atom bonds involving at least one hydrogen atom in the bonded pair
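To illustrate what "switching or shifting" of Lennard-Jones interactions means in general, here is a minimal sketch. It is not taken from the Fen Zi source: the shift shown is the simple truncate-and-shift form, the switch is the commonly used CHARMM-style polynomial between an inner and outer cutoff, and the function names, parameter values and units are assumptions for illustration only.

```python
def lj(r, epsilon, sigma):
    """Plain 12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_shifted(r, epsilon, sigma, r_cut):
    """Truncate at r_cut and shift so the energy is exactly zero at the cutoff."""
    if r >= r_cut:
        return 0.0
    return lj(r, epsilon, sigma) - lj(r_cut, epsilon, sigma)


def switch(r, r_on, r_off):
    """Polynomial switch: 1 below r_on, 0 above r_off, smooth in between."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    num = (r_off**2 - r**2) ** 2 * (r_off**2 + 2.0 * r**2 - 3.0 * r_on**2)
    return num / (r_off**2 - r_on**2) ** 3


def lj_switched(r, epsilon, sigma, r_on, r_off):
    """Lennard-Jones energy smoothly turned off between r_on and r_off."""
    return lj(r, epsilon, sigma) * switch(r, r_on, r_off)


# Hypothetical parameters: epsilon in kcal/mol, distances in Angstroms.
eps, sig = 0.15, 3.4
for r in (3.8, 8.0, 9.0, 10.0):
    print(r, lj_shifted(r, eps, sig, 10.0), lj_switched(r, eps, sig, 8.0, 10.0))
```

Both variants avoid the energy jump that a bare cutoff would introduce; the switched form also brings the force smoothly to zero at the outer cutoff.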

There is a listing of GPU accelerated scientific applications here.


GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs

A recent publication describes the development of a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. They observed speed-ups of 50–60 times compared with CPU implementation on a well-known breast microarray study and its associated data sets.

The source code of the proposed GPU-based fast and scalable k nearest neighbor search technique (GPU-FS-kNN) is available at under GNU Public License (GPL).
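For readers unfamiliar with the underlying problem, a brute-force k-nearest-neighbour search can be sketched in a few lines. This is a CPU-only illustration of the "compute all distances, then keep the k smallest" pattern that GPU implementations parallelise; it is not the GPU-FS-kNN code, and the function name and sample data are hypothetical.

```python
import heapq
import math


def knn(points, query, k):
    """Return the k nearest points to query as (distance, index) pairs.

    Every distance is computed independently, which is what makes the
    problem a natural fit for data-parallel hardware such as GPUs.
    """
    dists = ((math.dist(p, query), i) for i, p in enumerate(points))
    return heapq.nsmallest(k, dists)


# Hypothetical 2-D sample data.
samples = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0)]
print(knn(samples, (0.0, 0.0), 2))
```

For large data sets the full distance matrix does not fit in memory, so practical implementations process the data in chunks; that part is not shown here.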

There is a listing of GPU accelerated scientific applications here.


UK Many-Core developer conference 2012 (UKMAC 2012)

Based on the weblogs there appears to be significant interest in GPU accelerated scientific applications so I thought I’d highlight this meeting that was mentioned to me.

UK Many-Core developer conference 2012 (UKMAC 2012)

Registration is now open. Only a limited number of spaces are available, so please register early (£70 for the conference only, £99.50 including the conference dinner). The UK Many-Core developer conference 2012 (UKMAC 2012) follows on from the UK GPU developer conference and is now in its fourth year. Previous conferences have been held at Oxford, Cambridge and Imperial College.


GPU accelerated applications in science

Last week I highlighted a couple of GPU accelerated applications and based on the number of views of the page I thought it might be worth looking to see how many GPU accelerated science applications are available.

It should be noted from the start that there are two main programming frameworks for writing programs that can execute on the GPU. OpenCL, originally developed by Apple, is an open standard supported by a wide variety of graphics card vendors. The other major framework is CUDA, which was developed by Nvidia and is specific to Nvidia graphics units. Whilst it is true that for several years CUDA gave higher performance, recent developments with OpenCL have probably closed the gap; see "A Comprehensive Performance Comparison of CUDA and OpenCL" DOI.

I’ve compiled a listing of GPU-accelerated science applications here.


More GPU accelerated applications

After posting about Lumo, which accelerates the visualization of molecular orbitals from electronic structure calculations by harnessing the power of the GPU, I received the following email which describes more GPU-accelerated applications.

New Molecular Dynamics Benchmark Reports (Oct 2012) are now available to compare CPU vs GPU and NVIDIA's new Kepler GPU performance.

These reports are intended to assist computational chemistry researchers and IT managers to discover acceleration achieved by running MD applications on GPU based computing solutions.

Download Benchmark Reports -

AMBER, GROMACS, LAMMPS, NAMD reports provide:
a. Benchmark data on latest GPU architectures
b. Hardware recommendations

Also, if you are looking to try GPUs: Sign up for a FREE GPU Test Drive on a remote cluster with AMBER, GROMACS, LAMMPS and NAMD preinstalled -


Lumo:- Molecular Orbital Visualisation

I’ve recently noticed an increasing interest in harnessing the computational power of the graphics card to accelerate scientific calculations.

The latest application is Lumo, which accelerates the visualization of molecular orbitals from electronic structure calculations by harnessing the power of the graphics processing unit in modern Macs. Lumo currently reads formatted checkpoint files from Gaussian03/09 calculations and there is preliminary support for Orca output files. Lumo was designed to speed up the slow part of looking at molecular orbitals and making molecular orbital diagrams. Lumo eliminates several steps along the process by reading in the output of programs like Gaussian, quickly visualizing the orbitals, and creating pictures of the essential orbitals in seconds.

Lumo requires Mac OS 10.6 or higher, 64-bit processor, and an OpenCL capable compute device. Lumo is routinely run on MacBook Pros and MacBook Airs. For analysis of larger systems, it is recommended to have at least 4GB of system RAM.

There is a movie of Lumo in action on the website.