Macs in Chemistry

Insanely great science

A Review of Flare version 2.0

Cresset provide a variety of software packages to support small molecule design, built on the foundation of their extended forcefield XED forcefield. When I first reviewed a couple of Cresset products FieldView, FieldAlign and Forge the forcefield was only applicable to small molecules. However the forcefield has been constantly developed and can now be applied to proteins. This is a significant advance.

Introduction to XED Forcefield

The XED forcefield is designed to more accurately reflect electrostatics (without resorting to wave functions). Whilst more traditional molecular mechanics forcefields rely on atom based charges (actually the charge is a point at the centre of the nucleus!) they fail to accurately describe electrostatics on the molecular surface. This is because they fail to accommodate things like lone pairs, pi-clouds etc. the XED forcefield addresses this problem by including additional points around each atom DOI. A comparison of the complete surface properties for all conformations would be a hugely demanding computational task so instead Cresset use field points, representing the maxima and minima of the fields.

images forge3

This means it more accurately models hydrogen-bond interaction geometries, pi-stacking interactions and sigma holes seen with halobenzenes.

image001

This makes these tools ideal for bioisosteric replacements, replacements for an ester are shown below.

ester_bio

Flare Version 2 is a recent extension to the portfolio with the introduction of Electrostatic Complementarity (EC), i.e. a comparison of electrostatics on both the small molecule ligand and the target protein DOI.

Electrostatic interactions between small molecules and their respective receptors are essential for molecular recognition and are also key contributors to the binding free energy. Assessing the electrostatic match of protein-ligand complexes therefore provides important insights into why ligands bind and what can be changed to improve binding. Ideally, ligand and protein electrostatic potentials at the protein-ligand interaction interface should maximize their complementarity while minimizing desolvation penalties.

In addition Flare version 2 includes a new Python API, that allows users to automate tasks by scripting, but also integration with other Python packages such as RDKit cheminformatics toolkit, and Python modules for graphing, statistics (NumPy, SciPy, MatPlotLib), and Jupyter notebook integration.

Using Flare version 2

Flare is available for download packaged as .dmg which can be unpacked and simply dropped into the Applications folder. It is a 1.5GB installation but it includes all the user manual, a Python interpreter (plus RDKit) and a number of third party tools such as Amber Tools. Double clicking on the application icon open the main Flare window. This interface gives rapid access to all the key components. In the centre is the main 3D window where the molecular structures/surfaces are displayed, at the top right are six tabs that change the top menu strip depending on the task in hand. The Ligand table displays a 2D image of the small molecule structure plus associated data such as calculated physicochemical properties, activity, scoring etc. To the right is a panel that can be used to display a log of activities undertaken, plus it can be used to store snapshots in a Storyboard. It also provides access to the Python command line. The protein table displays information about the imported proteins, such as sequence information, details of ligands, waters, cofactors, ions etc. It also allows the user to select particular components.

Flare_1

The 3D structures of proteins can be imported in .pdb format either directly from the Protein Data Bank or by file import from the local file system. A variety of file formats are supported, including sdf, mol2, smi, pdb and xed. When first importing a protein the user is asked to select the protonation state. In practice any protein downloaded from the PDB will need to be cleaned up prior to use. One major issue with macromolecular X-ray crystal structures is that of missing or poorly resolved atomic data. Areas that cannot be well-resolved may result in either multiple models, alternate locations (atoms are assigned "partial occupancies" based on how often they are found in different locations) or data being absent altogether. Another issue with X-ray crystal structures is that the bonding patterns of non-amino acids must be inferred from the positions of heavy atoms since the position of hydrogen atoms cannot usually be located with any degree of accuracy. For this reason it is important to choose the "Do full preparation on proteins and ligands" option.

Currently, the Protein Preparation engine can re-build missing side chains picking a reasonable rotamer depending on the local environment. It cannot build missing backbone atoms and hence missing loops; when a missing loop is detected, the "open" termini will be simply capped with Ace/Nme residues if requested by the user through the relevant checkbox in the Protein Preparation dialog. Protein Preparation will not alter existing residue stereochemistry, but it will attempt to guess the correct tautomer for His residues and flip Gln/Asn amide groups as needed. Protonation of side-chains does take into account the local environment, and so does tautomer assignment; this also applies to tautomeric and protonation state assignment for ligands.

The "Apply rule-based protonation" is fine for small molecule import. For protonating ligands, Flare uses a set of rules based on a SMARTS-like language. These rules aim at assigning a reasonable protonation state at pH 7 avoiding multiple protonation of neighbouring groups (e.g., double protonation of a piperazine) and taking into account topology (e.g., differential basicity of amines depending on their degree of substitution, aliphatic/aromatic character, and the like).

protonation

The import and checking takes a couple of minutes; once complete the Protein Table contains the original protein structure together with the checked and protonated structure (3UAG_P). The main 3D window is populated with structures and the ligands are extracted and populate the Ligand Table automatically if the "Auto-extract" option is checked during import (which is why the ligands don't appear in 3UAG_P in the Protein Table). Ligands can also be extracted manually from the Protein Table and added to the Ligand Table. A residue is considered to be a ligand if it has at least 5 carbon atoms and does not fall into other known categories based on topology (e.g., natural and non-natural amino acids, nucleic acid bases, water, etc.) and dictionary lookup from a dictionary of known cofactors and buffers.

3uagImport

All the panes are interconnected so moving the mouse over the amino-acid codes in the Protein Table window transiently highlights the corresponding amino-acid in the 3D view, clicking on an amino acid selects the residue in both Protein Table and 3D window as shown in the movie below.


With a residue selected right-click gives a variety of options via a dropdown menu; this affords access to the usual variety of display options and protein visualisation styles. The main 3D window also allows you to rotate and translate the protein; I found that even with complex visualisations the performance was excellent. The dropdown menus also give access to a variety of editing functions.

residueSelection

Editing the ligand

If you have used any of the Cresset software before you will be familiar with the "Edit a copy" option available from a right-click on the desired ligand in the Ligand Table.

LigandEdit

This opens the molecule editor with a copy of the chosen ligand in the active site. The editing tools are pretty standard and anyone who has used a chemical drawing package will pick it up fairly quickly.

editor

Once the structure has been modified you can add the fields, view hydrogen bonds etc. If happy you can then save the structure to the Ligand Table.

EditedLigand

Docking

Flare provides a very intuitive interface for docking ligands into the desired active site:

Docking

Flare has excellent multiprocessor support and on my MacPro it runs each of the docking runs in parallel. The docking algorithm (Lead Finder from BioMolTech https://www.biomoltech.com) assumes that the protein is rigid and analyses the possible conformations of the ligand by rotating functional groups along each freely rotatable bond.

DockingRun

The docked molecules are returned to the Ligand table with a "D" suffix appended to the molecule name. Scores from three different scoring functions are also returned:

DockedLigands

You can now generate the electrostatic surface on both ligand and protein as shown below. As is apparent from the image below it is very difficult to evaluate which regions of the electrostatic surface complement each other and in which regions there is a clash. This is where the Electrostatic Complementarity function comes in.

ElectrostaticSurfaces

Electrostatic Complementarity

In order to colour a ligand by its complementarity, a surface is first placed over the ligand. For each vertex on the surface, the electrostatic potentials due to the ligand and to the protein are computed. These two values are then clipped to a maximum absolute value (roughly corresponding to the maximum potential on the surface of a water molecule) and added together. The result is divided by the larger of the two potentials, and linearly rescaled so that a point where the ligand and protein electrostatic potentials are equal and opposite gets a score of 1, a place where they are equal gets -1, and a place where either potential is 0 gets 0. Finally, the resulting values are scaled down at points on the ligand surface which are too far away from any protein atom, to reflect the fact that solvent-exposed portions of the ligand contribute little information. The final score at each point on the surface is used to determine the colour of the surface at that point. A score of 1 (meaning perfect electrostatic complementarity) will be coloured in green; a score of 0 will be coloured in white; a score of -1 (meaning perfect electrostatic clash) is coloured in red.

Three different scores are available; the first computed score ('Complementarity') is the normalised surface integral of the complementarity score: effectively the average value of that score over the surface of the ligand. The other two scores ('Complementarity r' and 'Complementarity rho') are the Pearson’s correlation coefficient and the Spearman rank correlation coefficient respectively. I suspect deciding which is the most useful will come with experience and will depend on the particular case. Apparently the Spearman’s rho number is more robust against background electric fields, which may be useful if the computed protein electric potential is being biased by a large net charge on the protein.

ECscore

Whilst the numbers are useful for ranking compounds they don't really offer an insight into why molecules score differently. For this viewing the local Electrostatic Complementarity (EC) mapped on the molecule surface is much more useful. The view below shows the EC mapped onto the surface (green = good; red = bad), by switching on and off the protein surface/vdW/electrostatics you can really get a good feel for which areas of the ligand might need further modification. Incidentally you change the molecular surface by clicking the tiny down arrow as shown in the image below which is easy to miss.

ECsurface

WaterSwap

WaterSwap described in the publication "Rapid decomposition and visualisation of protein-ligand binding free energies by residue and by water" DOI, has enabled the development of new visualisations of binding affinities based on free energy decompositions to per-residue and per-water molecule components. These provide a clear picture of which protein-ligand interactions are strong, and which active site water molecules are stabilised or destabilised upon binding.

The method works by constructing a reaction coordinate that swaps the ligand bound to the protein with an equivalent volume of water molecules. The effect is to move the ligand from being bound to the protein to being free in solution, while simultaneously transferring an equivalent volume of water from being free in solution to being bound to the protein.

As you might imagine Waterswap is a very computationally intensive series of calculations, and Cresset's Engine Broker enables client machines to use remote or cloud based HPC resources to speed up computationally intensive calculation exploiting remote cluster nodes in addition to the local machine where Flare is running.

Cresset indicate you should consider 24-72 h CPU time (depending on the number of WaterSwap iterations you carry out) on a 16-36 core machine for a single WaterSwap calculation. You can install the Cresset Engine Broker on a single AWS instance.

The AWS prices as of today for spot and on-demand instances, respectively:

Instance type Spot price On-demand price
c4.4xlarge $0.1448 per Hour $0.876 per Hour
c4.8xlarge $0.2896 per Hour $1.591 per Hour


Since WaterSwap calculations can be restarted from their last state, you can safely go for AWS spot instances, as no results would be lost even in the event of a sudden shutdown of your instance due to the spot price spiking above your bid.

Protein residues in the Ligand and Water proteins are coloured on a scale from green to red according to the WaterSwap energetics:

The Ligand result indicates where the ligand is gaining most of its binding energy (the green residues) and where the desolvation is unfavourably impacting the binding energy of the ligand (the red residues). It also shows Swap water and Bound water. The Swap water contains the box of waters that has been exchanged for the ligand.

waterswapview

Whilst WaterSwap is a really interesting way to generate ideas and give an indication where to perhaps focus resources when trying to optimise affinity, I'd take the absolute binding free energy with a pinch of salt. This paragraph in the Flare user manual is worth highlighting:

While WaterSwap aims to calculate the absolute binding free energy, it is not magic, and cannot overcome errors in the parameters or model of the complex. WaterSwap will only be as accurate as the underlying model of the complex, and as it neglects terms such as polarisability, ionic effects and concentration effects, as well as the energy differences between different protein conformations and ligand entropy, the results cannot always be compared directly with experiment (like all absolute binding methods). Because WaterSwap ignores many effects, including the entropy cost of putting the ligand into the bound conformation, the WaterSwap binding free energy is an overestimate of the true value. Typically, we find that WaterSwap overestimates the binding free energy by 15-20 kcal mol-1 (e.g. we get values in the range -25 to -35 kcal mol-1. As such, WaterSwap is best used to rank binding of different ligands, or to compare binding free energies of different binding modes of the same ligand. Care should be taken when interpreting the results of WaterSwap, and, ideally, repeat calculations should be performed, e.g. by running on snapshots taken from an equilibrated molecular dynamics simulation.

Summary

Flare gives access to a very powerful set of tools designed to aid ligand binding, docking, electrostatic modelling and WaterSwap, all within a well thought-out interface. The storyboard feature also allows the user to store snapshots of progress and coupled with the log acts like a notebook. Many of these calculations are computationally intensive and are not ideal for running on a laptop so access to a compute cluster would be recommended; Cresset provide detailed instructions for setting one up and when in place it occurs seamlessly.

The "Edit a copy" feature in the Ligand table really does encourage iterative design, and whilst some will complain about the time that some of the calculations take to complete I always point out that it is much shorter than the time taken to actually make the compounds. The various colour-coded displays are very useful ways to aid interpretation.

If you are using protein structures from Protein Data Bank you might want to invest in additional software for protein preparation, however it sounds like improving this side is high on the Flare to do list.

It is always interesting to think about the target market for this sort of software, whilst it would be a valuable tool for expert molecular modellers I can also see it being used by experienced medicinal chemists (perhaps after an expert has prepared the protein).

Update

I've now written a review on Flare integration with Python.

Worth reading

Characterizing hydration sites in protein-ligand complexes towards the design of novel ligands DOI

Biomolecular Solvation Structure Revealed by Molecular Dynamics Simulations DOI

Last updated 11 March 2019