I previously mentioned a comparison of various tools to cluster large datasets. I've now updated the Vortex script to allow the user to select the centroid of each cluster. I tried it on a clustered dataset of 4.3 million structures and the script only took a few seconds to run.
The page on clustering is here and the Vortex script can be downloaded here http://macinchem.org/reviews/vortex_scripts/ChoseCentreFromClusters.vpy.zip.
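To illustrate the idea of choosing a cluster centre, here is a minimal Python sketch. It is not the Vortex script itself: it assumes each molecule already has a cluster identifier and a fingerprint stored as a set of on-bits, and it takes the "centroid" to be the member with the highest total Tanimoto similarity to the rest of its cluster. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def choose_centroids(cluster_ids, fingerprints):
    """Return {cluster_id: row index of the most central member}."""
    # Group row indices by cluster identifier
    members = defaultdict(list)
    for row, cid in enumerate(cluster_ids):
        members[cid].append(row)

    centroids = {}
    for cid, rows in members.items():
        def total_similarity(row):
            return sum(tanimoto(fingerprints[row], fingerprints[other])
                       for other in rows)
        # max() keeps the first row on ties, so the result is deterministic
        centroids[cid] = max(rows, key=total_similarity)
    return centroids


# Toy data (hypothetical): five molecules in two clusters
cluster_ids = [1, 1, 1, 2, 2]
fingerprints = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}, {7, 8}, {7, 8, 9}]
centroids = choose_centroids(cluster_ids, fingerprints)
print(centroids)
```

This all-pairs approach scales with the square of the cluster size, so for very large clusters a cheaper definition of the centre would probably be needed.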
Cresset just announced the launch of Flare, a new software tool to aid the understanding of protein-ligand interactions.
Key new technology available in Flare 1.0:
- Visualize the electrostatics of the protein active site using protein interaction potentials
- Calculate the positions and stability of water in apo and liganded proteins using 3D-RISM
- Understand the energetics of ligand binding using the WaterSwap technique.
Flare uses the XED force field to calculate a detailed map of the electrostatic character of the protein active site. The interaction potentials provide you with vital knowledge of the fundamental processes that underlie ligand-protein binding, helping you to perfect the design of new molecules. The position and energetics of water molecules in and around the active site is of crucial importance in understanding ligand binding. Knowledge of which water molecules are tightly bound and which are energetically unfavorable can give valuable insights into structure-activity relationships and help you to decide where to place ligand atoms. Cresset’s 3D-RISM analysis utilizes the advanced inter-molecular descriptions of the XED force field to give you a water analysis you can trust.
Flare is available for Mac OSX, Linux and Windows, and a free evaluation is available.
StarDrop 6.4 now links prepared 3D docking and alignment models with data visualisation, 2D SAR analyses and predictive models in a single interface.
Computational chemists can make their validated 3D models available to their colleagues via StarDrop’s Pose Generation Interface, which is compatible with software from major computational chemistry providers, including:
- FlexX™ – BioSolveIT
- Gold™ – Cambridge Crystallographic Data Centre
- MOE™ – Chemical Computing Group
- AutoDock Vina – The Scripps Research Institute
- POSIT™ – OpenEye Scientific
- …extendable to other third party applications.
The Pose Generation Interface communicates with a Pose Generation Server, on which computational chemists can easily publish their validated docking or 3D alignment models. These are made instantly available for StarDrop users to submit their compounds and the resulting poses, protein structures and scores are returned directly to StarDrop for visualisation and analysis.
The Pose Generation Server can be installed wherever you run your 3D modelling software, supporting Linux, Windows® and Mac®
There are more details in the poster presented at the Spring ACS 2017.
I recently wrote a review of Reaction Workflows, a web-based tool that allows users to build workflows from nodes that provide inputs and outputs or perform actions, including ones to perform reaction-, scaffold-, and transform-based enumeration, and it is all done within a web browser interface using drag and drop. Whilst you can draw input structures, one of the real strengths is the ability to import pre-categorised reagent files, e.g. acid chlorides or secondary amines. This script is intended to help with this within Vortex.
This script is a variation of the high performance sub-structure search scripts described previously, however instead of simply flagging the presence (or absence) of a SMARTS query we provide a count of the number of times a SMARTS query is identified within a molecule. The script uses all available cores and is thus capable of running multiple queries in parallel and can thus handle very large datasets. The script currently contains around 70 different SMARTS queries for both functional groups and atom counts and I'd be happy to add any suggestions.
Greg Landrum posted the following to the RDKit users and since a couple of the Jupyter Notebooks I've published make extensive use of RDKit I thought I'd flag it.
As many of you are no doubt aware, the Python community plans to discontinue support for Python 2 in 2020. A growing number of projects in the Scientific Python stack are making the same transition and have made that explicit here: http://www.python3statement.org/
I will be adding the RDKit to this list. The RDKit will switch to support only Python 3 by 2020. At some point between now and then - likely during the 2018.09 release cycle - we will create a maintenance branch for Python 2 that will continue to get bug fixes but will no longer have new Python features added. This branch will be maintained, and we will keep doing Python 2 builds, until 2020 when official Python 2 support ends.
Additionally, starting during the 2018.03 release cycle we will accept contributions for new features that are not compatible with Python 2 as long as those features are implemented in such a way that they don't break existing Python 2 code (more on this later). This will allow members of the RDKit community who have made the switch to Python 3 to start making use of the new features of the language in their RDKit contributions.
If you have not made the switch yet to Python 3: please read the web page I link to above and take a look at the list of projects that have committed to transition. The switch from Python 2 to Python 3 isn't always easy, but it's not getting any easier with time and you have a few years to complete it. There are a lot of online resources available to help.
Best Regards, -greg
The list of projects that will be making the transition so far includes: IPython, Jupyter notebook, pandas, Matplotlib, SymPy, Astropy, Software Carpentry, SunPy, xonsh, scikit-bio, PyStan, Axelrod, osBrain, PyMeasure, rpy2, PyMC3, FEniCS, An Introduction to Applied Bioinformatics, music21, QIIME, Altair, gala, cual-id, CIS.
ROCS is a shape-based superposition method. Molecules are aligned by a solid-body optimization process that maximizes the overlap volume between them. Volume overlap in this context is not the hard-sphere overlap volume, but rather a Gaussian-based overlap parameterized to reproduce hard-sphere volumes. ROCS uses only the heavy atoms of a ligand, hydrogens are ignored.
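The Gaussian overlap idea can be sketched in a few lines of Python. This is not the ROCS implementation: it is a first-order approximation that only sums pairwise atom-atom overlap integrals, it scores a fixed pose rather than optimising the alignment, and the amplitude value of 2.7 and the 1.7 Å radius are assumptions chosen for illustration. Each Gaussian exponent is set so that a single atom's Gaussian volume equals its hard-sphere volume.

```python
import math

P = 2.7  # Gaussian amplitude; an assumed, commonly quoted value


def alpha(radius):
    """Exponent chosen so one Gaussian's volume equals the hard-sphere volume."""
    return math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / radius ** 2


def overlap_volume(mol_a, mol_b):
    """First-order Gaussian overlap: sum of pairwise atom-atom integrals.

    Each molecule is a list of (x, y, z, radius) tuples for heavy atoms only.
    """
    total = 0.0
    for xa, ya, za, ra in mol_a:
        for xb, yb, zb, rb in mol_b:
            a, b = alpha(ra), alpha(rb)
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            # Closed-form integral of the product of two spherical Gaussians
            total += (P * P * math.exp(-a * b * d2 / (a + b))
                      * (math.pi / (a + b)) ** 1.5)
    return total


def shape_tanimoto(mol_a, mol_b):
    """Shape Tanimoto from self- and cross-overlaps at the given pose."""
    o_ab = overlap_volume(mol_a, mol_b)
    o_aa = overlap_volume(mol_a, mol_a)
    o_bb = overlap_volume(mol_b, mol_b)
    return o_ab / (o_aa + o_bb - o_ab)


# Toy example (hypothetical coordinates): a two-atom "ligand" and a shifted copy
ligand = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
shifted = [(x + 1.0, y, z, r) for x, y, z, r in ligand]
print(shape_tanimoto(ligand, ligand))
print(shape_tanimoto(ligand, shifted))
```

Because the overlap is a smooth function of the coordinates, a score like this can be maximised by rigid-body optimisation, which is the step this sketch leaves out.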
ROCS is built on top of the OpenEye Toolkits v2017.Feb libraries to ensure that ROCS and the ancillary programs are taking advantage of state-of-the-art improvements in the underlying programming libraries. This version of ROCS fixes a bug that prevented molecule streaming using pipes and named pipes on Linux and OS X systems. ROCS now accepts molecule streams or named pipes as database files.
Support for Mac OS X 10.10, 10.11, and macOS Sierra 10.12 has been added. Mac OS X 10.8 and 10.9 are no longer supported.
Clustering is an invaluable cheminformatics technique for subdividing a typically large compound collection into small groups of similar compounds. One of the advantages is that once clustered you can store the cluster identifiers and then refer to them later; this is particularly valuable when dealing with very large datasets. This is often used in the analysis of high-throughput screening results, or the analysis of virtual screening or docking studies.
On this page I've explored multiple options for clustering, from Open Source toolkits to sophisticated desktop applications.
ToMoCoMD-CARDD is an interactive and user-friendly free multi-platform framework designed to calculate 2/3-D numerical descriptors (indices) for molecular structures, with the objective of characterizing or discriminating among them. It can be downloaded here http://tomocomd.com/software.
The Royal Society of Chemistry Chemical Information and Computer Applications Group (CICAG) are conducting a survey to find out more about the way that scientists use the various social media channels.
The survey is very short and feedback would be appreciated from everyone, you don't have to be a member of the RSC (or CICAG) to contribute.
The survey can be found here https://www.surveymonkey.co.uk/r/YSYFRDP.
The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups.
Workflow tools have become increasingly popular; Pipeline Pilot, Knime and Taverna are perhaps the best known. Most are desktop client based but some have a web page that allows users to run protocols that expert users have created.
Dotmatics Reaction Workflows (RW) is a web-based tool that allows users to build workflows from nodes that provide inputs and outputs or perform actions, including ones to perform reaction-, scaffold-, and transform-based enumeration, and it is all done within a web browser interface using drag and drop. I've been looking at Reaction Workflows for enumerating a potential library array.
This is a recording of the March 2017 Global Health Compound Design meeting. A webinar demonstrating using Jupyter, the free iPython notebook.
How to get started
Accessing Open Source Malaria data
Calculating physicochemical properties and plotting
Predicting AMES activity.
Chirys View is a simple molecular spreadsheet for Mac OSX. It has been designed as a fast viewer for collections of molecules represented as an SDF file (Structured Data Format). On import molecular weight, exact mass, molecular formula, hydrogen bond acceptor and donor counts are automatically calculated. You can combine multiple SDF files by multiple file imports or by copying and pasting from one document into another. You can then save selected compounds as a new SDF file.
I imported 1 million structures from ChEMBL and whilst it took a few minutes to load and used 27GB RAM it did so without complaints. Scrolling down a list of a million compounds is a little impractical but list sorting is pretty responsive. I had a look at some of the more complex structures and the molecular layout seems excellent and clearly legible.
As a simple molecular selection tool Chirys View works very well. My only complaint is that when you import 3D structures (e.g. from a docking run) the structures can be difficult to discern (see below), it would be nice to have a convert to 2D option.
I've just added a review of MOEsaic, this is a web service application that is part of the MOE install from Chemical Computing Group.
MOEsaic is a browser-based application for analyzing series of small molecule chemical structures and related property data (e.g. from medicinal chemistry projects). Once structure-property data is uploaded to the server, MOEsaic allows users to perform structure based searching and data analysis.
There is a complete listing of reviews here.
OpenEye have announced the release of OpenEye Toolkits v2017.Feb. These libraries include the usual support for C++, Python, C#, and Java.
EXAMPLE NEW FEATURES
FastROCS TK now allows customization of starting points for shape overlap optimization.
Quacpac TK now includes a flexible molecular charging engine.
OEMedchem TK now allows MCS similarity scores to be computed for a query molecule compared to a set of indexed target structures.
Chemical Computing Group have announced an update to MOE. The MOE 2016.0802 update contains a number of updates to the biomolecule modelling including improved hydrogen bond detection, and the addition of a number of unnatural amino acids.
There have also been improvements to MOE/web. The MOE/web version compatibility check has been broadened. MOE/web license waiting has been improved. HTTPS authentication proxy server support has been improved.
Download InChI version 1 (software version 1.05) for Standard and Non-Standard InChI/InChIKey (27 January 2017)
This package contains InChI Software version 1.05 (January 2017) final release.
In this version:
- support for chemical element numbers 113-118 was newly added;
- experimental support of InChI/InChIKey for simple regular single-strand polymers was implemented;
- experimental support of large molecules containing up to 32767 atoms was added;
- ability to read necessary for large molecules input files in Molfile V3000 format was added;
- provisional support for extended features of Molfile V3000 was added;
- InChI API Library was significantly updated; in particular, a novel API procedure for direct conversion of Molfile input to InChI has been added; a whole new set of API procedures for both low and high-level operations (InChI extensible interface, IXA) has been added;
- the source code was significantly modified in order to ensure multi-thread execution safety of the InChI Library;
- several minor bugfixes/changes were made and several convenience options were added to the inchi-1 executable.
Chembench is a web-based tool for QSAR (Quantitative Structure-Activity Relationship) modeling and prediction. Chembench doesn't require any programming or scripting knowledge to use. It's an interface that lets you skip past the hassles of file management and translating between programs, so you can focus on the science of making and applying predictive models. DOI.
It includes models/datasets for things like brain penetration, PGP, AMES, skin penetration etc. You can use the existing models or build your own and then evaluate novel compounds.
A new version of SeeSAR is now available for download.
Version 5.5 includes several new features and has undergone some tweaks under the hood to improve speed.
From the release notes:-
2D browsing featuring in-view molecule properties
To further enhance the 2D browsing, we have added an illustration of the molecules' key properties in the form of a radar plot. A thumbnail of the plot is embedded in each of the 2D molecule pictures, providing a quick overview. It enlarges upon mouse-over and provides access to the configuration dialog. Add or remove property-axes, optionally fine-tune the scales and set 'desired' value ranges. A hit or miss of the latter is indicated by green or red dots on the corners of the color-coded characteristic shape of the molecule on the plot (the greener the better).
Detecting novel/unoccupied binding sites
Now SeeSAR can search your protein for unoccupied pockets based on the world-renowned DoGSite algorithm. You may then select these to become the binding site, within which to generate poses and calculate binding affinities for your molecules. The new binding site definition feature lets you either use a selected molecule from the table (based on a 6.5Å shell around it, as before) or will detect and visualize empty pockets for you to select instead.
Multiple reference molecules
The reference molecule in SeeSAR always stays in view even when you select other entries from the molecule tables. Now, however, you are able to set - and keep in view - as many reference molecules as you like. Either set them individually - in the selected molecule menu (as before) - or mark several as favorites and set them all as references at once, via the new menu button below the table.
Multiple core replacements with just one click
With the new multiple solutions button for ReCore in the molecule editor, brainstorming new scaffold ideas became yet easier. You can now generate 10 new alternative core replacements at once. The new molecules are saved directly to the table so that you can immediately see their estimated binding affinity and view all structures in 2D at a glance.
Just got this email
I am glad to announce the release of OSRA 2.1.0. OSRA (Optical Structure Recognition Application) is a tool for converting images of molecules into SDF, SMILES and many other chemical formats. Images can be pictures of single molecules or complete PDF documents with multiple pages of text and graphics. In addition to molecules OSRA can also recognize reactions, and, starting with this version, simple polymers.
The improvements in this version:
- Significantly improved recognition of PDF documents, no longer dependent on Ghostscript at runtime.
- Recognition of polymers (different approach from POSRA - a separate tool focused on polymer recognition).
The new version is available at osra.sf.net
Please note that if you are building from source the dependencies have changed. OSRA now requires poppler (version 0.41) to process PDF files and a custom-patched version of OpenBabel to save polymer MOL and SDF files. The patched version of OpenBabel is provided at the above url. OSRA no longer requires Ghostscript to be installed.
Just noticed this paper.
MayaChemTools: An Open Source Package for Computational Drug Discovery, DOI: 10.1021/acs.jcim.6b00505.
MayaChemTools is a growing collection of Perl scripts, modules, and classes to support a variety of computational drug discovery needs, such as manipulation and analysis of data, generation of two-dimensional (2D) fingerprints, similarity searching, and calculation of physicochemical properties.
MayaChemTools is freely available online at www.MayaChemTools.org, under the terms of the GNU LGPL, as published by the Free Software Foundation.
It is possible to access them using a Vortex script.
Whilst there are many sites that track the compatibility of common desktop applications, it is often difficult to find information about scientific applications. I’ll update the list regularly, so feel free to send in information.
4Peaks no reported issues
Avogadro all seems OK
BBEdit version 11.6.2 and newer are compatible, recommend against using earlier versions
Brainsight requires version 2.3.3 for full compatibility with 10.12 Sierra. You could note too that 2.0 through 2.2.x will never work because 10.12 removed support for garbage collected applications. 2.3.x uses ARC
ChemDraw the official line is that it is not supported, even under El Capitan there were reports of copy/paste issues. One user reports “ChemDraw 15 is working fine for me. copy/paste, everything without issues”.
ChemDoodle no reported issues
CrystalMaker “We are pleased to confirm that all our latest software runs fine on macOS “Sierra”, as well as OS X 10.11 “El Capitan”, 10.10 “Yosemite”, and earlier.”
CYLview app launcher (icon on the desktop or the dock) does not work; you need to start it from “Terminal”
DataDesk Data Desk 8 for Mac runs on OS X 10.7 up to 10.12
DataWarrior requires Java installation
EndNote From Endnote (Thomson Reuters) version 7: Message for Mac user planning to update to Sierra: In preparation for Apple's release of macOS Sierra on September 20, we have been testing various versions of EndNote. Through our testing, we discovered some issues with the EndNote PDF viewer. These issues have been reported to Apple, but in the meantime, we recommend that you DO NOT upgrade to macOS Sierra.
EnzymeX no reported issues
Evernote a bug in some versions of Evernote for Mac that can cause images and other attachments to be deleted from a note under specific conditions. We've released an updated version of Evernote for Mac, version 6.9.1, to resolve this.
Findings no reported issues
Fortran users will be happy to hear there are no reported issues with FTranProjectBuilder
GAMESS no reported issues.
Homebrew, after every update it is worth checking your homebrew installation.
Username$ brew doctor
Please note that these warnings are just used to help the Homebrew maintainers with debugging if you file an issue. If everything you use Homebrew for is working fine: please don't worry and just ignore them. Thanks!
Warning: /usr/local is not writable.
You should probably change the ownership and permissions of /usr/local back to your user account.
sudo chown -R $(whoami) /usr/local
Warning: /usr/local is not writable.
Even if this directory was writable when you installed Homebrew, other software may change permissions on this directory. For example, upgrading to OS X El Capitan has been known to do this. Some versions of the "InstantOn" component of Airfoil or running Cocktail cleanup/optimizations are known to do this as well.
You should probably change the ownership and permissions of /usr/local back to your user account.
sudo chown -R $(whoami) /usr/local
Once corrected you can then type
brew update
brew upgrade
You may get this error
$ brew update
/usr/local/Library/brew.sh: line 32: /usr/local/Library/ENV/scm/git: No such file or directory
simply retyping brew update seems to resolve the issue
If you have previously installed Openbabel using
brew install mcs07/cheminformatics/open-babel --HEAD
The "--HEAD" part means install the latest development version from GitHub. The latest version of OpenBabel is now available so type
brew uninstall mcs07/cheminformatics/open-babel
Uninstalling /usr/local/Cellar/open-babel/HEAD... (309 files, 14.6M)
brew install mcs07/cheminformatics/open-babel
You can check you have the latest version installed by typing this in a Terminal window
obabel -V
Open Babel 2.4.0 -- Sep 24 2016 -- 14:01:18
iBabel seems to work fine with the latest version of OpenBabel under Sierra. One advantage to updating to OpenBabel 2.4.0 is that previews now work with Quicklook.
iPython Notebook all working fine
Lego Mindstorms At the moment everything seems to be running really great on Sierra. However, please let your readers know they are welcome to contact us via the website, the way you did, if they run into any errors. We'd be happy to solve them!
Manuscripts no reported issues
Mathematica 11.0.1 has been compatibility tested with macOS Sierra and you should not run into any OS-specific compatibility issues. The font-panel is disabled, but we are actively working to address this as soon as possible.
Mendeley no issues reported
MOE working fine, XQuartz did not need reinstalling. However the MOE app launcher (icon on the desktop or the dock) does not work because Apple changed some fundamental system components which affects lots of programs not specifically compiled for the newest MacOSXs. Also you cannot double click on a file to open it in MOE. You can still start MOE from the command line
It then works perfectly. Update, just had an email from CCG support, The problem with the MOE app launcher on MacOSX Sierra has been fixed in the MOE 2016 release.
MOPAC all seems to be working fine.
osra crashed with error abort trap: 6. I uninstalled using brew then reinstalled
brew uninstall osra
Uninstalling /usr/local/Cellar/osra/2.0.1... (7 files, 1.6M)
brew install osra
Then worked fine
Pandoc depends on llvm-3.5, not supported on Sierra. Llvm-3.9 is supported, installation using Homebrew seems to be OK.
Papers Mac 3.4.7 (527) is now available! Fixes a couple of problems under Sierra: a crash that can occur when switching PDFs, and the search in PDF functionality is restored.
R latest version (3.3.1) all seems fine
rdkit installed using home-brew works fine.
Readcube Version 2.22.13732 is Mac OS Sierra (v10.12) compatibility update.
Scansnap Note for using ScanSnap or ScanSnap Applications on macOS Sierra: In order to avoid the ScanSnap compatibility problems, please do not use ScanSnap or ScanSnap applications on macOS Sierra in the following manner as doing so may cause some pages to be deleted or to become blank.
- Do not use [ScanSnap Organizer], [ScanSnap Merge Pages], or [CardMinder]
- Do not use Excellent mode when scanning A3 (11.7 in. x 16.5 in.) documents
No image data will be lost nor any blank pages produced when content that has been scanned in the A4 (8.3 in. x 11.7 in.), Letter (8.5 in. x 11 in.), Legal (8.5 in. x 14 in.), or smaller sizes is saved.
Schrodinger a reader sent in this response. We received your query regarding MacOS Sierra. Unfortunately our current, 2016-3 release, do not yet support MacOS Sierra but we have plans to include support for this OS for the upcoming 2016-4 release of our software.
SeeSAR version 5.3 now, 5.4 will come out shortly. No compatibility issues observed/reported.
Studies no reported issues
UCSF Chimera version 1.11.1 seems to be working fine
Wizard worked great with the developer pre-releases, no reported issues
Vortex no problems so far, the embedded chemical drawing app Elemental appears to have no issues.
XQuartz did not require reinstallation :-) however there are reports of an intermittent display not found error when launching apps from a Linux box.
Allow applications downloaded from anywhere in macOS Sierra, if you open the security panel in the Settings the default options in Sierra are as shown below. There is no longer the option to open applications from Anywhere.
Apple have removed this function on macOS Sierra, but you can re-enable it by running this in Terminal
sudo spctl --master-disable
You can restore it back to the default setting using
sudo spctl --master-enable
I’ll add more updates later.
I just heard that David Weininger had died last Wednesday, for me his invention of SMILES was one of those ideas that you instantly knew was going to change the way we did science. So much of what we do in storing, searching and analysing chemical information is based on his pioneering work. I only met him once at a Daylight UGM but it was clear from our first conversation that he was a scientist with a special insight.
SMILES as a simple yet comprehensive chemical language in which molecules and reactions can be specified using ASCII characters representing atom and bond symbols
Anthony Nicholls of OpenEye has written a lovely tribute that is well worth reading
This was a joint meeting organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group, held at Imperial War Museum, Duxford, UK, on Wednesday 12 October 2016. This was an excellent meeting and the conference centre at Duxford was superb; many participants arrived early to have a wander around the historic collection of aeroplanes.
MolSoft have announced the release of ICM version 3.8-5.
- Generate a 2D Interaction Diagram of a ligand with the binding pocket. The image is annotated with hydrogen bonds and interacting residues.
- 3D ligand editor is a powerful tool for the interactive design of new lead compounds in 3D
- Support for the Macromolecular Transmission Format (MMTF)
- Support for Mac retina display
- Add docking restraints by selecting atoms in the receptor
- Updates to protein modelling, bioinformatics and cheminformatics
Chemical Computing Group have just announced an update to MOE. This release has fixed a couple of Mac OSX 10.12 (Sierra) issues but also brings a host of new features.
- MOEsaic: Web-Application for Ligand Analytics
- Spectral Analysis for Structure Determination
- Enhanced Protein Patch Analyzer
- Integrated Antibody Project Database and Antibody Homology Modeler
- Small Footprint MOE to Facilitate Large Scale Deployments
- Physical and Virtual Rendering of Structures
A more detailed description of the new and enhanced features in MOE 2016.08 can be found at http://www.chemcomp.com/print/moe2016.08.pdf.
I just noticed that the latest version of iBabel has been downloaded over 1000 times, this is fantastic news and it certainly allows me to justify the effort put into creating the application.
I’m occasionally asked about the best way to install OpenBabel and I usually refer people to the page I wrote on installing cheminformatics tools on a Mac, this gives instructions on how to install a wide variety of cheminformatics toolkits and applications.
If you only want to install Openbabel then the best way is to use Homebrew.
Homebrew is a package manager for Mac OSX that installs packages in its own directory then symlinks the files to /usr/local. To install Homebrew you first need to have access to the command line tools for Xcode; the easiest way to do this is to download Xcode from the Mac App Store.
- Start Xcode on the Mac.
- Choose Preferences from the Xcode menu.
- In the General panel, click Downloads.
- On the Downloads window, choose the Components tab.
- Click the Install button next to Command Line Tools. You are asked for your Apple Developer login during the install process.
Or you can download the Xcode command line tools directly from the developer portal as a .dmg file: https://developer.apple.com/downloads/index.action. On the "Downloads for Apple Developers" list, select the Command Line Tools entry that you want.
To install Homebrew type this command in the Terminal
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
The 'brew doctor' command checks everything is fine. e.g. it will warn if the developer tools are missing, and if there are unexpected items in /usr/local/bin and /usr/local/lib that may clash and might need to be deleted.
It is a good idea to first update the package list using 'brew update'.
To install a range of cheminformatics packages we can use a custom “tap” created by Matt
brew tap mcs07/cheminformatics
Then to specifically install Openbabel use
brew install mcs07/cheminformatics/open-babel
To check OpenBabel is working type this in a Terminal window:
obabel -:'C1=CC=CC=C1F' -ocan
Fc1ccccc1
1 molecule converted
I often need to tag individual molecules within a dataset with a specific property, perhaps the results of clustering algorithms, the results of PAINS filtering, or Liver toxicity filters. Alternatively if you have a drug discovery project with multiple chemotypes you might want to tag particular groups of compounds as belonging to a named series to aid analysis.
A question that might then arise is “How many molecules belong to each category?”. Whilst you can see the numbers in the sidebar there is not an easy way to export the results.
Hopefully this script can help.
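The counting step itself is simple, and can be sketched in plain Python. This is an illustration rather than the Vortex script: it assumes the table is available as a list of dictionaries, and the column name "Series", the compound identifiers and the helper name count_categories are all hypothetical. Rows with no tag are skipped, and the result is written as tab-delimited text that could be pasted into a spreadsheet.

```python
import csv
import io
from collections import Counter


def count_categories(rows, column):
    """Count how many rows carry each non-empty value in the given column."""
    return Counter(row[column] for row in rows if row.get(column))


# Toy table (hypothetical): one untagged compound is ignored
rows = [
    {"ID": "cpd-1", "Series": "Indoles"},
    {"ID": "cpd-2", "Series": "Quinolines"},
    {"ID": "cpd-3", "Series": "Indoles"},
    {"ID": "cpd-4", "Series": ""},
    {"ID": "cpd-5", "Series": "Indoles"},
    {"ID": "cpd-6", "Series": "Quinolines"},
]
counts = count_categories(rows, "Series")

# Export as tab-delimited text, most populated category first
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["Series", "Count"])
for category, n in counts.most_common():
    writer.writerow([category, n])
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch self-contained; swapping it for an open file handle would save the counts to disk instead.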
A major new update to OpenBabel has been released, version 2.4.0 is a significant change and is highly recommended.
New file formats
- DALTON output files (read only) and DALTON input files (read/write) (Casper Steinmann)
- JSON format used by ChemDoodle (read/write) (Matt Swain)
- JSON format used by PubChem (read/write) (Matt Swain)
- LPMD's atomic configuration file (read/write) (Joaquin Peralta)
- The format used by the CONTFF and POSFF files in MDFF (read/write) (Kirill Okhotnikov)
- ORCA output files (read only) and ORCA input files (write only) (Dagmar Lenk)
- ORCA-AICCM's extended XYZ format (read/write) (Dagmar Lenk)
- Painter format for custom 2D depictions (write only) (Noel O'Boyle)
- Siesta output files (read only) (Patrick Avery)
- Smiley parser for parsing SMILES according to the OpenSMILES specification (read only) (Tim Vandermeersch)
- STL 3D-printing format (write only) (Matt Harvey)
- Turbomole AOFORCE output (read only) (Mathias Laurin)
- A representation of the VDW surface as a point cloud (write only) (Matt Harvey)
New file format capabilities and options
- AutoDock PDBQT: Options to preserve hydrogens and/or atom names (Matt Harvey)
- CAR: Improved space group support in .car files (kartlee)
- CDXML: Read/write isotopes (Roger Sayle)
- CIF: Extract charges (Kirill Okhotnikov)
- CIF: Improved support for space-groups and symmetries (Alexandr Fonari)
- DL_Poly: Cell information is now read (Kirill Okhotnikov)
- Gaussian FCHK: Parse alpha and beta orbitals (Geoff Hutchison)
- Gaussian out: Extract true enthalpy of formation, quadrupole, polarizability tensor, electrostatic potential fitting points and potential values, and more (David van der Spoel)
- MDL Mol: Read in atom class information by default and optionally write it out (Roger Sayle)
- MDL Mol: Support added for ZBO, ZCH and HYD extensions (Matt Swain)
- MDL Mol: Implement the MDL valence model on reading (Roger Sayle)
- MDL SDF: Option to write out an ASCII depiction as a property (Noel O'Boyle)
- mmCIF: Improved mmCIF reading (Patrick Fuller)
- mmCIF: Support for atom occupancy and atom_type (Kirill Okhotnikov)
- Mol2: Option to read UCSF Dock scores (Maciej Wójcikowski)
- MOPAC: Read z-matrix data and parse (and prefer) ESP charges (Geoff Hutchison)
- NWChem: Support sequential calculations by optionally overwriting earlier ones (Dmitriy Fomichev)
- NWChem: Extract info on MEP(IRC), NEB and quadrupole moments (Dmitriy Fomichev)
- PDB: Read/write PDB insertion codes (Steffen Möller)
- PNG: Options to crop the margin, and control the background and bond colors (Fredrik Wallner)
- PQR: Use a stored atom radius (if present) in preference to the generic element radius (Zhixiong Zhao)
- PWSCF: Extend parsing of lattice vectors (David Lonie)
- PWSCF: Support newer versions, and the 'alat' term (Patrick Avery)
- SVG: Option to avoid addition of hydrogens to fill valence (Lee-Ping)
- SVG: Option to draw as ball-and-stick (Jean-Noël Avila)
- VASP: Vibration intensities are calculated (Christian Neiss, Mathias Laurin)
- VASP: Custom atom element sorting on writing (Kirill Okhotnikov)
Other new features and improvements
- 2D layout: Improved the choice of which bonds to designate as hash/wedge bonds around a stereo center (Craig James)
- 3D builder: Use bond length corrections based on bond order from Pyykko and Atsumi (http://dx.doi.org/10.1002/chem.200901472) (Geoff Hutchison)
- 3D generation: "--gen3d", allow user to specify the desired speed/quality (Geoff Hutchison)
- Aromaticity: Improved detection (Geoff Hutchison)
- Canonicalisation: Changed behaviour for multi-molecule SMILES. Now each molecule is canonicalized individually and then sorted. (Geoff Hutchison/Tim Vandermeersch)
- Charge models: "--print" writes the partial charges to standard output after calculation (Geoff Hutchison)
- Conformations: Confab, the systematic conformation generator, has been incorporated into Open Babel (David Hall/Noel O'Boyle)
- Conformations: Initial support for ring rotamer sampling (Geoff Hutchison)
- Conformer searching: Performance improvement by avoiding gradient calculation and optimising the default parameters (Geoff Hutchison)
- EEM charge model: Extend to use additional params from http://dx.doi.org/10.1186/s13321-015-0107-1 (Tomáš Raček)
- FillUnitCell operation: Improved behavior (Patrick Fuller)
- Find duplicates: The "--duplicate" option can now return duplicates instead of just removing them (Chris Morley)
- GAFF forcefield: Atom types updated to match Wang et al. J. Comp. Chem. 2004, 25, 1157 (Mohammad Ghahremanpour)
- New charge model: EQeq crystal charge equilibration method (a speed-optimized crystal-focused charge estimator, http://pubs.acs.org/doi/abs/10.1021/jz3008485) (David Lonie)
- New charge model: "fromfile" reads partial charges from a named file (Matt Harvey)
- New conversion operation: "changecell", for changing cell dimensions (Kirill Okhotnikov)
- New command-line utility: "obthermo", for extracting thermochemistry data from QM calculations (David van der Spoel)
- New fingerprint: ECFP (Geoff Hutchison/Noel O'Boyle/Roger Sayle)
- OBConversion: Improvements and API changes to deal with a long-standing memory leak (David Koes)
- OBAtom::IsHBondAcceptor(): Definition updated to take into account the atom environment (Stefano Forli)
- Performance: Faster ring-finding algorithm (Roger Sayle)
- Performance: Faster fingerprint similarity calculations if compiled with -DOPTIMIZE_NATIVE=ON (Noel O'Boyle/Jeff Janes)
- SMARTS matching: The "-s" option now accepts an integer specifying the number of matches required (Chris Morley)
- UFF: Update to use traditional Rappe angle potential (Geoff Hutchison)
- Bindings: Support compiling only the bindings against system libopenbabel (Reinis Danne)
- Java bindings: Add example Scala program using the Java bindings (Reinis Danne)
- New bindings: PHP (Maciej Wójcikowski)
- PHP bindings: BaPHPel, a simplified interface (Maciej Wójcikowski)
- Python bindings: Add 3D depiction support for Jupyter notebook (Patrick Fuller)
- Python bindings, Pybel: calccharges() and convertdbonds() added (Patrick Fuller, Björn Grüning)
- Python bindings, Pybel: compress output if filename ends with .gz (Maciej Wójcikowski)
- Python bindings, Pybel: Residue support (Maciej Wójcikowski)
- Version control: move to git and GitHub from subversion and SourceForge
- Continuous integration: Travis for Linux builds and Appveyor for Windows builds (David Lonie and Noel O'Boyle)
- Python installer: Improvements to the Python setup.py installer and "pip install openbabel" (David Hall, Matt Swain, Joshua Swamidass)
- Compilation speedup: Speed up compilation by combining the tests (Noel O'Boyle)
- MacOSX: Support compiling with libc++ on MacOSX (Matt Swain)
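The fingerprint similarity item in the list above comes down to counting bits: a Tanimoto coefficient is the number of bits shared by two fingerprints divided by the number set in either. The following is a minimal sketch of that calculation, assuming fingerprints are held as plain Python integers. It is an illustration only, not Open Babel's implementation, and the function names and example fingerprints are hypothetical.

```python
def popcount(bits: int) -> int:
    """Count the set bits in a non-negative integer fingerprint."""
    return bin(bits).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient: shared on-bits / bits set in either fingerprint."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(fp_a & fp_b) / union


# Hypothetical 8-bit fingerprints, purely for illustration
fp1 = 0b10110010
fp2 = 0b10100110
similarity = tanimoto(fp1, fp2)
print(similarity)
```

Because the whole calculation is a pair of bitwise operations followed by two bit counts, any speed-up in bit counting feeds directly into similarity searching.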
This is a joint meeting organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group, to be held at the Imperial War Museum, Duxford, UK, on Wednesday 12 October 2016.
There is an interesting line up of speakers and exhibitors and a chance to have a look around the aerospace museum. More details and the booking form are here https://www.soci.org/Events/Display-Event?EventCode=FCHEM481.
I just noticed that iScienceSearch has been updated.
Search by structure, text, name and identifiers in 100 chemical and biological databases. A single front-end allows you to get links with answers for your query searching the scientific web! NEW! iScienceSearchlite is a new iScienceSearch version with a simplified UI, which is optimized for smaller screens and slow Internet connections.
There is a review of an earlier version here
The CICAG newsletter is now available here
CICAG aims to keep its members abreast of the latest activities, services, and developments in all aspects of chemical information, from generation through to archiving, and in the computer applications used in this rapidly changing area through meetings, newsletters and professional networking
I’ve just heard that the poster deadline for the Cheminformatics for Drug Design: Data, Models & Tools meeting organised by SCI's Fine Chemicals Group and RSC's Chemical Information and Computer Applications Group has been extended.
Imperial War Museum, Duxford, UK Wednesday 12 October 2016
Full details are available here https://www.soci.org/Events/Display-Event?EventCode=FCHEM481
Sounds an excellent meeting and you will have a chance to look around the aircraft at the Duxford Imperial War Museum.
ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 100 million purchasable compounds in ready-to-dock, 3D formats. Sterling and Irwin, J. Chem. Inf. Model, 2015. This is an invaluable resource for any type of virtual screening or for anyone looking to create a physical screening or fragment collection.
Once you have done the virtual screening you will rapidly realise that the really time-consuming and tedious part now lies ahead: finding out which vendors stock a particular molecule and then ordering it. Looking up the vendor details for individual compounds is extremely tedious, so this Vortex script may be very useful.
Optibrium have just announced the release of StarDrop 6.3. Perhaps the highlight of this release is the introduction of the new SeeSAR module.
The SeeSAR module, developed in collaboration with BioSolveIT, provides seamless access in StarDrop to 3D structures based on X-ray crystallography or predicted with any docking software. The intuitive link between this 3D information and StarDrop’s cheminformatics analyses and visualisations, based on 2-dimensional compound structure, gives new insights into structure-activity relationships (SAR) within your project chemistry and aids the design of improved compounds. It also supports collaboration between computational and synthetic chemists, helping to share the results of 3D modelling with all decision makers.
I just noticed that the latest version of iBabel has now been downloaded over 700 times since it was released at the start of the year.
iBabel started out as an AppleScript Studio application designed as a front-end to OpenBabel DOI. It was updated several times and is now an AppleScriptObjC application built with Xcode. As well as acting as a front-end to OpenBabel, it also provides a front-end to tools built on OpenBabel and a molecule viewer.
Cytoscape has been updated to version 3.4.0
Note: this update requires Java 8 and Mac OS X 10.9 or later.
Cytoscape is an open source software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data.
I just noticed that iBabel has now been downloaded over 500 times since the start of the year. I'm surprised and delighted that it has proved so popular.
iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a variety of molecule viewers.
I've been sent details of a couple of jobs and I thought I'd pass them on.
BIO – Cheminformatics Data Scientist (Stratified Medical)
We are looking for an experienced and innovative cheminformatician to make a significant contribution in influencing the drug discovery process by applying your expertise in chemical methods development and use of intelligent algorithms to chemical data. Working within the Biomedical Data R&D team, you will be required to bring your experience and ideas across a broad range of drug design areas.
Technical Expert Cheminformatics (Syngenta, Jealott's Hill)
In this role, you will provide cheminformatics and mathematical modeling support to the computational chemistry platform in Chemical Research.
RDKit has been updated.
If you used Homebrew to install RDKit as described here, updating is very simple
brew update
brew upgrade rdkit
You can check which version you have installed using
MacPro> python
Python 2.7.11 (default, Dec 23 2015, 16:11:50)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import rdBase
>>> print rdBase.rdkitVersion
2016.03.1
>>>
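The session above uses the Python 2 print statement. Under Python 3 the same check might look like the sketch below, which also copes with RDKit not being installed. The helper name is hypothetical; `rdBase.rdkitVersion` is the attribute used in the session above.

```python
def installed_rdkit_version():
    """Return the RDKit version string, or None if RDKit cannot be imported."""
    try:
        from rdkit import rdBase
    except ImportError:
        return None
    return rdBase.rdkitVersion


version = installed_rdkit_version()
print(version if version is not None else "RDKit is not installed")
```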
The latest version of iBabel has now been downloaded over 400 times since it was released in January.
iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and molecule viewers
BALL (Biochemical ALgorithms Library) is an application framework implemented in C++ that has been specifically designed to reduce development times in the field of Computational Molecular Biology and Molecular Modeling. It provides an extensive set of data structures as well as classes for Molecular Mechanics, advanced solvation methods, comparison and analysis of protein structures, file import/export, and visualization.
BALLView is BALL’s standalone molecular modelling and visualization application. Furthermore, it is also a framework for developing molecular visualization functionality.
It can be downloaded from here and requires
- CMake >= 2.8.12
- Python 2.7
- Qt 5.4
Installation instructions for Mac OSX are here
Opsin (Open Parser for Systematic IUPAC Nomenclature) has been updated. If you used Homebrew to install it you can get the latest version by simply typing
brew update
brew upgrade opsin
Optibrium have announced the latest update to the StarDrop application. The highlight for version 6.3 is perhaps the integration of SeeSAR, an intuitive structure-based design tool.
The new SeeSAR module for StarDrop provides a state-of-the-art and scientifically rigorous approach to understanding the binding of compounds in their protein targets in 3D. Users can import ligand and protein structures, derived from crystal structures or predicted with any docking software, and visualise the key interactions driving potency. This is seamlessly linked to StarDrop’s chemoinformatics methods based on 2-dimensional (2D) compound structure and its unique Card View approach to interpreting the resulting structure-activity relationships.
A preview of StarDrop 6.3 will be on show at the American Chemical Society National Meeting, 13th-17th March 2016.
I've been making increasing use of iPython notebooks, both as a way to perform calculations and as a way of cataloguing the work that I've been doing. One thing I seem to be doing quite regularly is calculating physicochemical properties for libraries of compounds and then creating a trellis of plots to show each of the calculated properties. In the past I've done this with a series of AppleScripts using several applications. This seemed an ideal task to try out using an iPython notebook.
The release of ChEMBL_21 has been announced. This version of the database was prepared on 1st February 2016 and contains:
- 1,929,473 compound records
- 1,592,191 compounds (of which 1,583,897 have mol files)
- 13,968,617 activities
- 1,212,831 assays
- 11,019 targets
- 62,502 source documents
Please see ChEMBL_21 release notes for full details of all changes in this release.
I was just sent details of this
Interested in doing some chemistry programming this summer? Have students that might be interested?
Open Chemistry has been accepted into the Google Summer of Code for 2016 - including Open Babel, Avogadro, cclib and 3DMol.js.
If you are a student and interested in doing open chemistry software development this summer (or know of someone who is), we're definitely up for good proposal ideas. Take a look at our suggestions or come up with one on your own:
Student proposals can be submitted between March 14th and March 25th. Instructions are at the Summer of Code website.
OpenEye have announced the release of OpenEye Toolkits v2016.Feb. These libraries include the usual support for C++, Python, C# and Java.
The update addresses several key features.
OpenEye toolkits are used in web services that require protection from malicious users. The most obvious attack vector against the OpenEye toolkits is file format parsing since scientific file formats are complex and often underdefined and there is the potential for embedded malicious code. This update closes a number of potential vulnerabilities.
FastROCS TK: Database Loading Performance
An interesting development is that physical memory limits on GPUs mean that, for larger libraries, loading the dataset actually takes longer than the search itself. This release addresses that issue.
This release also contains the first official release of OEMedChem TK, in particular access to matched molecular pairs.
This 2016.Feb release no longer supports OSX 10.8, but support has been added for OSX 10.11. It supports Python 3.5 on the following platforms: OSX 10.10, OSX 10.11, Ubuntu 12, Ubuntu 14, RedHat 6, and RedHat 7.
I thought I'd have a look at the number of downloads of iBabel there have been since I announced the latest release last month. So far there have been over 250 downloads and there seems to be a steady stream of downloads as shown in the plot below.
iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a selection of molecule viewers
I've just added Unicon and Mona to the alphabetical listing of applications.
UNICON is a command-line tool to cope with common cheminformatics tasks. The functionality of UNICON ranges from file conversion between standard formats SDF, MOL2, SMILES, and PDB via the generation of 2D structure coordinates and 3D structures to the enumeration of tautomeric forms, protonation states and conformer ensemble.
Mona is an interactive tool that can be used to prepare and visualize large small-molecule datasets. A set-centric workflow makes it possible to handle hundreds of thousands of molecules intuitively.
The iOS app PolyPharma has been updated. PolyPharma uses structure activity relationships to view predicted activities against biological targets, physical properties, and off-targets to avoid. Calculations are done using Bayesian models and other kinds of calculations that are performed on the device.
More details are available in this presentation.
The RSC's Undergraduate Research Bursaries are now open for 2016 entries, seeking talented chemical sciences students to undertake a research placement this summer.
These research bursaries are to fund short (6-8 weeks) summer research projects for undergraduate chemistry students in the middle years of their course. The purpose of the awards is to give experience of research to undergraduates with research potential and to encourage them to consider a career in scientific research.
The bursary is worth £200 per week (£210 in London) for up to 8 weeks to cover a defined research placement.
The deadline for applications is 21 February 2016.
Please note that, for the first time, in 2016 CICAG will be funding one student bursary for research work which falls into one or more of the following areas: cheminformatics, chemical information, chemical data management, chemistry data analytics, applications of computational chemistry.
For more information about guidelines, eligibility criteria, award conditions, and the application form please see the Undergraduate Research Bursaries web page http://www.rsc.org/Education/HEstudents/undergraduate-bursary.asp
Any questions should be directed to the Undergraduate Bursaries team (see the link on the Undergraduate Research Bursaries web page linked to above).
Checkmol is a command-line utility program which reads molecular structure files in different formats and analyzes the input molecule for the presence of various functional groups and structural elements. At present, approx. 200 different functional groups are recognized. Output can be either clear text (English or German), a bitstring or its ASCII representation, or a set of special 8-character codes. This output can be easily placed into a database table, permitting the creation of chemical databases with a functional group search option. It was written by Norbert Haider, Department of Pharmaceutical Chemistry (now: Department of Drug and Natural Product Synthesis), University of Vienna, Austria.
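To make the database idea concrete, here is a minimal sketch of a functional group search over stored bitstrings using SQLite. The bit positions, the table layout and the hand-assigned example bitstrings are all hypothetical; checkmol's real bit assignments and output are not reproduced here.

```python
import sqlite3

# Hypothetical bit positions; checkmol's real bit assignments are not reproduced here
CARBOXYLIC_ACID = 1 << 0
PRIMARY_AMINE = 1 << 1
PHENOL = 1 << 2


def bitstring_to_int(bitstring: str) -> int:
    """Convert text like '011' to an integer, reading the first character as bit 0."""
    return sum(1 << i for i, ch in enumerate(bitstring) if ch == "1")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (name TEXT, fg_bits INTEGER)")
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?)",
    [
        ("glycine", bitstring_to_int("110")),         # acid + amine
        ("salicylic acid", bitstring_to_int("101")),  # acid + phenol
        ("aniline", bitstring_to_int("010")),         # amine only
    ],
)


def find_with_groups(mask: int) -> list:
    """Return names of compounds whose bits include every bit set in mask."""
    rows = conn.execute(
        "SELECT name FROM compounds WHERE (fg_bits & ?) = ? ORDER BY name",
        (mask, mask),
    )
    return [name for (name,) in rows]


acids = find_with_groups(CARBOXYLIC_ACID)
amino_acids = find_with_groups(CARBOXYLIC_ACID | PRIMARY_AMINE)
print(acids, amino_acids)
```

Storing the bitstring as an integer keeps each search to a single bitwise AND per row, and combining masks with `|` asks for several functional groups at once.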
The software is available both as source code and as a binary compiled for Linux (x86 architecture). It is entirely written in Pascal and it was compiled with Free Pascal 1.0.11 or Free Pascal 2.4.0 (starting from v0.4c). To install it we first need a Pascal compiler, which can be downloaded from SourceForge.
If you are renewing your Royal Society of Chemistry subscription, remember that your membership covers up to three interest groups. If you are interested in cheminformatics you might be interested in the Chemical Information and Computer Applications Group.
I'm on the committee and we have a couple of interesting meetings in the planning stage.
iBabel started out as an AppleScript Studio application designed as a front-end to OpenBabel DOI. It was updated several times and is now an AppleScriptObjC application built with Xcode. As well as acting as a front-end to OpenBabel, it also provided a front-end to tools built on OpenBabel and a molecule viewer using a selection of Java applets and plugins via an embedded web view.
Now things have settled down a bit I've restarted work on iBabel and an update is now available.
I've transitioned most of the calls to babel over to obabel (the differences are highlighted here) and replaced the calls to the tools built on OpenBabel with the corresponding new calls to obabel.
Chemical Computing Group have just released an update to MOE. Version 2015.10 includes:
- Generate docked poses using FFT followed by all atom minimization
- Define receptor and ligand sites to focus docking
- Automatically detect antibody CDR sites
Integrated Alignment, Consensus and Superposition in the Sequence Editor
- Manipulate multimeric protein sequences using split side-by-side Sequence Editor panes
- Use dendrograms to visualize pairwise similarity, identity and RMSD relationships
- Select residues based on plotted values using resizable sequence editor plots
Distributed Pharmacophore Searching
- Run pharmacophore searches on a cluster directly from MOE GUI
- Perform fast corporate database searches
- Access multiple databases stored on a central server
Covalent Docking and Electron Density Docking
- Use reaction-based organic transformations for covalent docking
- Minimize ligand strain energy while maximizing ligand fit to electron density
- Run docking through an enhanced streamlined scenario-based interface
Extended Hückel Descriptors and pKa Model
- Compute molecular properties such as logP, logS and molar refractivity
- Determine populations of ligand protonation states at a given pH
- Calculate the pKa and pKb of small molecules
13C NMR Analysis
- Apply QM conformation refinement to calculate 13C NMR shielding
- Convert computed shieldings and predict 13C NMR chemical shifts
- Compare computed chemical shifts to experimental shifts for structure determination
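The "populations of ligand protonation states at a given pH" item in the list above can be illustrated for the simplest case, a single ionisable group, with the Henderson-Hasselbalch relationship. This is a textbook sketch under that assumption, not MOE's model, and the function names and example pKa are hypothetical.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of a monoprotic acid HA present as A- at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of the same acid still present as HA."""
    return 1.0 - fraction_deprotonated(pka, ph)


# Hypothetical acid with pKa 4.8 at pH 7.4
ionised = fraction_deprotonated(4.8, 7.4)
print(round(ionised, 4))
```

When the pH equals the pKa the two forms are equally populated, and each pH unit above the pKa shifts the ratio by a factor of ten towards the deprotonated form.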
I'll write a review in the New Year.
I just got this message so I thought I'd pass it on, I'll update any scripts that use the chemical identifier resolver in the New Year.
To all users of programmatic services on the cactus.nci.nih.gov web server of the CADD Group at the NCI/NIH:
The CACTUS web server will move to a significantly reconfigured system on new hardware by the end of the year. This move is planned to take place during the last week of December 2015. This move will also entail a change of the host's IPv4 address. Concurrent with the cut-over, the HTTPS protocol will be enabled for all services. Both HTTP and HTTPS will be supported in parallel for a transition period. We plan to turn off HTTP permanently by end of March 2016. Disruptions to users caused by the move should be minimal. If you encounter any bugs or different behavior starting 1/1/2016, please let us know immediately.
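For scripts that call the chemical identifier resolver, the practical change is to switch hard-coded `http://` URLs to `https://`. The sketch below builds a resolver URL and upgrades an existing one without making any network request. The URL pattern (`/chemical/structure/<identifier>/<representation>`) is an assumption based on the resolver's public usage and the helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed URL pattern for the chemical identifier resolver:
#   https://cactus.nci.nih.gov/chemical/structure/<identifier>/<representation>
RESOLVER_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str = "smiles") -> str:
    """Build an HTTPS resolver URL, percent-encoding the identifier."""
    return f"{RESOLVER_BASE}/{quote(identifier, safe='')}/{representation}"


def upgrade_to_https(url: str) -> str:
    """Rewrite an http:// URL to https://, leaving other URLs unchanged."""
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


url = resolver_url("aspirin")
print(url)
```

The resulting string can be passed to `urllib.request.urlopen` or any other HTTP client. Keeping the base URL in one constant means a future host or protocol change needs only a single edit.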
The Pistoia Alliance HELM project have announced free MarvinBeans 5.0 licenses and integration of HELM with the RDKit cheminformatics suite.
The Pistoia Alliance HELM project has made two major announcements that help cement the reputation of HELM as the de-facto standard for describing and working with complex macromolecular structures. Firstly, HELM users can now take advantage of free MarvinBeans 5.0 licenses for the HELM toolkit. Secondly, RDKit is now HELM-enabled, making it a valuable addition to the extensive range of open source HELM-enabled tools.
OpenEye have announced the release of OpenEye Toolkits v2015.October. These libraries include the usual support for C++, Python, C# and Java.
- FastROCS TK was added to the OpenEye toolkits collection
- Molecule reading performance improvement in OEChem TK
- The capabilities of the OEBio-Fragment Network have been expanded
- 213 new ring templates have been added to the OEChem TK built-in ring dictionary
In particular note the 2015.Oct release is the last to support Mac OSX 10.8 so time to upgrade if you have not already done so.
The OS X Molecular DataSheet XMDS app has been updated recently. This is a chemically aware spreadsheet editor: it operates on a grid of editable cells, made up of typed columns, that can be molecules, numbers or plain text.
The latest update brings drag and drop; as you might imagine, moving cells containing molecules is rather more complicated than moving numbers or text. You can read full details here
OpenEye have announced the release of BROOD v3.0, a bioisostere replacement program.
- Custom Fragment Conformations: Fragment geometries can now be derived from any 3D source, including the CSD.
- Fragment Joining and Cyclization: Finding a fragment to bridge two disconnected molecules or cyclize a molecule is now directly supported.
- Improved Filter Properties: Property filters can now have both minimum and maximum values.
- Mapping Fragments to Source Molecules: Molecules BROOD constructs now include the source molecule from which the replacement fragment is derived.
- Results Navigation: BROOD’s results navigation tool has been redesigned to be more intuitive, giving users an easy way to quickly explore the clustered and aligned analog molecules.
Full details are available in the release notes.
There is a great blog article on ChEMBL-og, describing their work evaluating chemical structure based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.
The article does a great job in explaining the logic behind improving the search performance.
They also provide an iPython notebook so you can try it yourself.
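As a flavour of the kind of logic that speeds up similarity searching in any database, here is a generic pruning rule based on fingerprint bit counts. It is not taken from the ChEMBL-og article or its notebook. It assumes fingerprints held as Python integers, and all names and example data are hypothetical.

```python
import math


def popcount(bits: int) -> int:
    """Count the set bits in a non-negative integer fingerprint."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient between two integer fingerprints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def bit_count_window(query_bits: int, threshold: float) -> tuple:
    """Range of on-bit counts a candidate needs to be able to reach the threshold.

    Tanimoto <= min(n_q, n_c) / max(n_q, n_c), so a candidate with n_c on-bits
    can only reach the threshold if threshold * n_q <= n_c <= n_q / threshold.
    A small tolerance guards against floating-point rounding at the edges.
    """
    eps = 1e-9
    low = math.ceil(query_bits * threshold - eps)
    high = math.floor(query_bits / threshold + eps)
    return low, high


def search(query: int, database: list, threshold: float) -> list:
    """Return (fingerprint, similarity) pairs at or above the threshold."""
    low, high = bit_count_window(popcount(query), threshold)
    hits = []
    for fp in database:
        if not low <= popcount(fp) <= high:
            continue  # cheap rejection, no similarity calculation needed
        score = tanimoto(query, fp)
        if score >= threshold:
            hits.append((fp, score))
    return hits


# Hypothetical 8-bit fingerprints
database = [0b11110000, 0b11100000, 0b00000001, 0b11111111, 0b11010100]
hits = search(0b11110000, database, threshold=0.7)
print(hits)
```

In a real database the bit count would be stored and indexed alongside each fingerprint, so the window becomes a range query that discards most candidates before any fingerprint is compared.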
The latest update to KNIME has been released.
The KNIME Analytics Platform incorporates hundreds of processing nodes for data I/O, preprocessing and cleansing, modeling, analysis and data mining as well as various interactive views, such as scatter plots, parallel coordinates and others. It integrates all of the analysis modules of the well known Weka data mining environment and additional plugins allow R-scripts to be run, offering access to a vast library of statistical routines.
What's New in KNIME 2.12
Analytics
- Decision Tree to Rule Set (New node)
- Rule Handling (New node)
- Statistics measure as aggregation methods in GroupBy node
- Extended PMML Support (New node)
- Data Generation (New node)
- More Statistics Nodes (New set of nodes)
SeeSAR has been updated to version 3.1, the release notes highlight two significant new features.
SeeSAR is a software tool for interactive, visual compound prioritization as well as compound evolution.
- Working with "big data": With this update we lifted the limit of handling only a maximum of 5000 poses in SeeSAR. We know that a lot of people like to do their compound analysis and prioritization after virtual screening campaigns also with much bigger sets. It is not likely that you will look at more than a couple of hundred poses, however, since the filtering (see also below) is extremely efficient, it provides quite an attractive opportunity to load all your data (not just the top x) and do your prioritization with all properties at hand right here in SeeSAR.
- Enhanced filtering: Behind the scenes SeeSAR knows so much more about your compounds than what is displayed in the table. The basic stuff like no. of acceptors and donors, rotatable bonds, etc. to do the usual Lipinski-type filtering is of course available, but also more elaborate stuff like the number of hydrogen bonds formed or the number of torsions that lie outside the statistical "norm". All of these are now available for filtering to help you optimally trim down your data to find the really interesting part.
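For readers unfamiliar with the "usual Lipinski-type filtering" mentioned above, the sketch below applies the rule-of-five limits to a small list of compounds and keeps those with at most one violation. The property names and values are hypothetical, and this does not represent SeeSAR's own data model or interface.

```python
# Hypothetical per-compound properties; SeeSAR's own data model is not shown here
compounds = [
    {"name": "cpd-1", "mw": 342.4, "logp": 2.1, "donors": 2, "acceptors": 5},
    {"name": "cpd-2", "mw": 612.7, "logp": 5.8, "donors": 6, "acceptors": 11},
    {"name": "cpd-3", "mw": 489.5, "logp": 5.3, "donors": 1, "acceptors": 7},
]

# Lipinski rule-of-five upper limits
LIMITS = {"mw": 500, "logp": 5, "donors": 5, "acceptors": 10}


def violations(compound: dict) -> int:
    """Count how many rule-of-five limits the compound exceeds."""
    return sum(1 for key, limit in LIMITS.items() if compound[key] > limit)


def lipinski_filter(items: list, max_violations: int = 1) -> list:
    """Keep compounds with no more than max_violations rule-of-five violations."""
    return [c for c in items if violations(c) <= max_violations]


kept = [c["name"] for c in lipinski_filter(compounds)]
print(kept)
```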
NOTE! SeeSAR project files from older versions are incompatible and cannot be loaded. By default SeeSAR puts a new version in a separate location. The recommendation is to export your data from the old project file with the old version and import it into the latest SeeSAR. This is a one-time effort, which allows you to benefit from the features of the most up-to-date version.
One of the most common tasks for those involved in cheminformatics is handling files containing molecular information. These files can be in a variety of file types and usually the task involved is relatively minor. cApp is a Java application that provides a simple interface to a variety of everyday activities.
cApp requires JRE7 and uses the Chemistry Development Kit (CDK), an open-source Java library for chem- and bioinformatics, and associated software, JChemPaint as chemical editor, and routines developed within the Program Collection for Structural Biology and Biophysical Chemistry by the Hofmann group. Full details of cApp are described in a J Cheminformatics paper DOI.
InChI is the International Chemical Identifier, developed under the auspices of IUPAC. InChIs are intended to be unique identifiers; they are freely usable and non-proprietary, they can be computed from structural information and do not have to be assigned by some organization, and most of the information in an InChI is human readable (in theory!).
A recent paper in J Cheminformatics DOI describes the design, layout and algorithms of InChI, if you want to understand or implement the code this is a great starting point.
The paper is organized as follows. First, we discuss the general concepts associated with chemical identifiers. Then we outline the design goals of InChI and our general approach, focussing on the InChI model of chemical structure and the hierarchical layered structure of the Identifier; the concept of Standard InChI is introduced. This is followed by a detailed description of each of the possible major InChI layers, accounting for molecular connectivity, charge, stereochemistry, isotopic enrichment, position of hydrogen atoms and bonding in metal compounds, and the sublayers associated with these layers. We then describe the workflow of InChI generation (normalization, canonicalization, and serialization stages), as well as generation of the compact hashed code derived from InChI (InChIKey); the related algorithms and implementation details are briefly discussed. Finally, we provide information about InChI Software, licensing, known problems/limitations, and future prospects for InChI.
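The layered structure described above is visible directly in the identifier string, where layers are separated by "/" and most carry a one-letter prefix. The sketch below splits the Standard InChI for ethanol into its layers. It is a simplification for illustration: the function name is hypothetical, and it does not handle InChIs without a formula layer or prefixes that repeat in later sections (a repeated prefix would overwrite the earlier one).

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into its version, formula and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        if part:
            # Each remaining layer starts with a one-letter prefix, e.g. c, h, q, t
            layers[part[0]] = part[1:]
    return layers


ethanol = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol)
```

Here "1S" marks a Standard InChI, "c" is the connectivity layer and "h" gives the hydrogen positions.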
The source code and documentation can also be downloaded from here http://www.inchi-trust.org/downloads/
OpenEye has announced the release of OpenEye Toolkits v2015.June. These libraries include the usual support for C++, Python, C# and Java and are now available for download.
New Features Highlights:
- PDB Splitting in OEBio TK
- PAINS (Pan Assay Interference Compounds) filter in OEMolProp TK
- Matched molecular pair improvements in OEMedChem TK
- Custom ring template dictionaries in OEChem TK
- Anaconda support for easier Python toolkit installation
MoSS is mainly a program to find frequent molecular substructures and discriminative fragments in a database of molecule descriptions. It can be used in the context of drug discovery and synthesis prediction for the purpose of analyzing the outcome of screening tests. Given a database of graphs, MoSS finds all (closed) frequent substructures, that is, all substructures that appear with a user-specified minimum frequency in the database (and do not have super-structures that occur with the same frequency).
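The "closed frequent" condition is easier to see on a toy problem. The sketch below swaps molecular graphs for plain sets of fragment labels and uses brute-force enumeration, so it is an analogy rather than the MoSS algorithm. A pattern is kept if it occurs at least `min_support` times and no larger pattern occurs equally often. All names and data are hypothetical.

```python
from itertools import combinations


def closed_frequent_sets(database: list, min_support: int) -> dict:
    """Return {itemset: support} for closed frequent itemsets.

    Brute-force enumeration, suitable only for tiny examples.
    """
    items = sorted(set().union(*database))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for record in database if cand <= record)
            if support >= min_support:
                frequent[cand] = support
    # Keep a set only if no proper superset occurs equally often
    return {
        s: n
        for s, n in frequent.items()
        if not any(s < t and n == frequent[t] for t in frequent)
    }


# Hypothetical fragment labels per molecule
molecules = [
    frozenset({"phenyl", "amide", "fluoro"}),
    frozenset({"phenyl", "amide"}),
    frozenset({"phenyl", "amide", "nitrile"}),
    frozenset({"phenyl", "fluoro"}),
]
closed = closed_frequent_sets(molecules, min_support=2)
print(closed)
```

In this example "amide" alone is frequent but not closed, because every molecule containing it also contains "phenyl"; reporting only the larger pattern loses no information.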
MoSS has been included in CheS-Mapper
DataWarrior 4.1.1 is available for download. In addition to precompiled binaries, all Java source files and the script to build DataWarrior on Linux/MacOSX can be downloaded for free use under the GNU public license. DataWarrior is a free data visualization and analysis program with embedded chemical intelligence.
There is a review of DataWarrior here.
To be honest I can't remember when I last used Perl but this publication brought back a few memories DOI.
HackaMol is an open source, object-oriented toolkit written in Modern Perl that organizes atoms within molecules and provides chemically intuitive attributes and methods.
There is also a very interesting extension HackaMol::X::Vina, a structured class that provides an interface with the AutoDock Vina docking program
I just thought I'd like to thank all those who contributed to the Scientific Applications under Yosemite web page, many users and developers contacted me either via email or in the comments section and they certainly added information about applications that I don't have access to.
To date the page has been viewed well over 10,000 times with readers from 188 different countries. Viewers spent an average of just under two minutes on the page and it still attracts 800 page views a month.
Given that 75% of the visitors to the site are now using Yosemite I suspect most scientists have now made the transition and I won't be updating the page any more. Once again thanks for the contributions.
BioSolveIT has just announced the release of SeeSAR 3.0.
This update of SeeSAR qualifies as major release 3, since it covers two milestones in its development. So far every SeeSAR session has started from scratch. The only way to retain molecules was to save them to file and re-load them again in a subsequent session. Needless to say that loading meant recalculating all Hyde-scores again...
Project files: Starting with Version 3.0, SeeSAR allows you to store all session data in a project file. This includes the protein, ligands loaded from file and new (edited) ligands. Resuming your work on a project is now as easy as double-clicking on the project-file. As a result, everything just got a hell of a lot faster! Whilst calculating Hyde-scores for say 1000 compounds took around half an hour (depending on your hardware), loading the same information from a project file now takes only a few seconds. Note that you can also generate a project file on the command line, allowing you to outsource the calculation of Hyde-scores to a different machine. This enhancement is also a great way to exchange data and ideas with a colleague! Simply store your SeeSAR session as a project file in a commonly accessible location (e.g. a network drive). Your colleague can take a look with just a double-click.
Hyde update: Hyde is quite sensitive with regards to the precise geometry of a binding pose. Even the tiniest difference in a pose can distort an anyway stretched hydrogen bond just so much that it is not recognized anymore - thereby leaving you with a huge desolvation penalty for such atoms, without the gain from the h-bond. This "sharpness" of Hyde is its greatest strength (for example by highlighting real activity cliffs), but also its greatest weakness (especially if the structure has flaws or is of low resolution). In order to minimize such troubles, we optimize each pose before the Hyde affinity assessment. We improved this optimization significantly. It is now fully flexible and with sharper clash criteria, making it suitable for docked poses as well as edited compounds. All of this as efficient as before, just perfect for interactive use.
There is a review of an earlier version of SeeSAR here
The MOE 2014.0901 update is now available. MOE is a fully integrated molecular modelling and drug discovery software package.
MOE 2014.0901 updates:
- Option for AMBER residue name
- Append/prepend multiple residue sequence specified by single-letter names
Builder:
- Added H’s inherit color if there is a consistent coloring in the residue
sddesc:
- New -smi:p option causes field headers to be written to the output ASCII file
- MOESVLRUNPATH now properly honored
- Combinatorial Builder now honors different attachment point locations on the same R-group
- Database Save As one entry per file mode now properly generates unique filenames
- Dock Template Forcing batch file now correctly generated
- Saved views in .moe files now properly restored
- Auto-save when Database Viewer display attributes are changed can now be disabled to prevent changes to the database file modification date when only the display is changed and not the database content
- SVL function Deprotonate now works properly
- Various MOE Project and Project Database Update bugs
- Various minor bug fixes
There are reviews of MOE available here
Moe:- Molecular modeling
Moe Update (Jan 2009):- Molecular modeling
Review of MOE (2009.10 release):- Molecular modeling
Moe Update (December 2010.10 release):- Molecular modeling
Moe Update (December 2011 release):- Molecular modeling
Moe Update (December 2012 release):- Molecular modeling
DecoyFinder has been updated to version 2.0. Decoy Finder is a graphical tool which helps finding sets of decoy molecules for a given group of active ligands. It does so by finding molecules which have a similar number of rotational bonds, hydrogen bond acceptors, hydrogen bond donors, logP value and molecular weight, but are chemically different, which is defined by a maximum Tanimoto value threshold between active ligand and decoy molecule MACCS fingerprints. Optionally, a maximum Tanimoto value threshold can be set between decoys in order to assure chemical diversity in the decoy set.
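The selection logic described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not DecoyFinder's own code: the `Mol` record, the tolerance values and both thresholds are hypothetical, and the fingerprints are assumed to have been computed elsewhere (for example MACCS keys from RDKit) and stored as sets of "on" bit positions. I have also assumed that a decoy must be dissimilar to every active, not just the one it is property-matched to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mol:
    """Hypothetical record: precomputed properties plus fingerprint 'on' bits."""
    name: str
    mol_wt: float
    logp: float
    hbd: int
    hba: int
    rot_bonds: int
    fp_bits: frozenset


def tanimoto(a, b):
    """Tanimoto coefficient of two sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_properties(active, cand, mw_tol=25.0, logp_tol=1.0, count_tol=1):
    """True if the candidate's physical properties are close to the active's.

    The tolerances are illustrative values, not DecoyFinder defaults.
    """
    return (
        abs(active.mol_wt - cand.mol_wt) <= mw_tol
        and abs(active.logp - cand.logp) <= logp_tol
        and abs(active.hbd - cand.hbd) <= count_tol
        and abs(active.hba - cand.hba) <= count_tol
        and abs(active.rot_bonds - cand.rot_bonds) <= count_tol
    )


def pick_decoys(actives, candidates, max_active_sim=0.75, max_decoy_sim=0.9):
    """Return candidates that look like the actives physically but not chemically."""
    decoys = []
    for cand in candidates:
        # Must be property-matched to at least one active.
        if not any(similar_properties(a, cand) for a in actives):
            continue
        # Must be chemically different from every active (assumption, see text).
        if any(tanimoto(a.fp_bits, cand.fp_bits) > max_active_sim for a in actives):
            continue
        # Optional diversity filter between the decoys already chosen.
        if max_decoy_sim is not None and any(
            tanimoto(d.fp_bits, cand.fp_bits) > max_decoy_sim for d in decoys
        ):
            continue
        decoys.append(cand)
    return decoys
```

Passing `max_decoy_sim=None` switches off the optional decoy-to-decoy diversity check.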
There have been some changes in the dependencies: it now needs RDKit (with OpenBabel being optional) and PyQt4 instead of PySide.
Installation of RDKit was already described in the page on setting up a Mac for Cheminformatics, and I've now added the instructions for PyQt:
brew install pyqt
cinfony is a common API to several cheminformatics toolkits. It uses the Python programming language, and builds on top of Open Babel, RDKit, the CDK, Indigo, JChem, OPSIN and cheminformatics webservices. Currently it is hosted on Google Code, which is closing down. Fortunately the source code is also hosted on GitHub, but you will need to look at the Google Code site to read full details of the project. So the installation is:-
git clone https://github.com/cinfony/cinfony.git
cd cinfony
python setup.py install
Then download the DecoyFinder2 source, and run
Reading through the discussion on Scientific Applications under Yosemite, it seems some people are having problems with PyMOL. Installation of PyMOL using Homebrew is included on the page describing how to set up a Mac for Cheminformatics, which also describes how to install a wide range of other useful tools.
ChemStack is a collection of components that allow users to build chemically intelligent systems, such as collaboration tools, information portals, electronic laboratory notebooks, eLearning systems etc.
Some examples of useful chemical interfaces include:
- A sketcher to draw molecules
- Viewers to display molecules
- Components to display and interact with spectra
- 3D graphics engines to investigate 3D structures
- Text based input for IUPAC names and queries
There is a demo here that searches the ChEMBL database of 1.5M structures.
I previously described how to script multiple sub-structure searches in Vortex using SMARTS. This sort of feature is often useful, for example if you want to flag molecules that contain reactive functional groups, toxicophores, or PAINS functional groups that have been shown to interfere with a variety of screens. Whilst the script worked fine it was rather slow for larger datasets. The latest tutorial shows how to take advantage of some of the latest features in Vortex to substantially improve search speeds, allowing searching of 70 million compound collections on a desktop.
Scripting Vortex 24:- Substructure searching very large compound collections.
There are many more scripts listed on the Hints and Tutorials Page.
OpenEye have announced the release of OpenEye Toolkits v2015.Feb. These libraries include the usual support for C++, Python, C# and Java.
NEW FEATURES HIGHLIGHTS
- Depiction of protein-ligand interactions in Grapheme TK
- Improvements to matched pair analysis in OEMedChem TK
- Improved orientation options for images from OEDepict TK
- Better ring layout in OEDepict TK
- A major upgrade to the documentation system
A little while back I wrote a detailed tutorial for getting a wide variety of cheminformatics tools running on a Mac.
Someone just let me know about an issue with OSRA, a utility designed to convert graphical representations of chemical structures, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES.
It turns out that OSRA requires Ghostscript to process PDF images; this can be installed using brew.
brew install ghostscript
The MedChemWizard is a KNIME workflow designed to assist medicinal chemists with idea generation, ligand design and lead optimization using a number of common functional group transformations and medchem rules-of-thumb. This tutorial, provided by Dr. Alastair Donald, gives a detailed description of its use.
ChEMBL is a manually curated chemical database of bioactive molecules. It is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK. The database currently contains over 1.4 million unique structures with the associated activity at 10,579 different targets. It also acts as a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases.
Whilst the database can be downloaded, the data can also be accessed via a web interface (shown below) and a series of web services. These Vortex scripts show how it is possible to pull data from ChEMBL into Vortex.
As usual I’ve written it as a tutorial to try and offer some explanation of how the script works: Scripting Vortex 23:- Accessing ChEMBL using Web Services
I think this rather nicely shows the power of web services and json.
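For readers who want to try the same idea outside Vortex, here is a minimal sketch in plain Python 3 (it is not the Vortex script from the tutorial). The base URL and the field names `molecule_chembl_id`, `pref_name` and `molecule_structures` are my assumptions about the current ChEMBL REST interface and may differ from the service the tutorial was written against, so check them against the ChEMBL documentation before relying on them.

```python
import json
from urllib.request import urlopen

# Assumption: base URL of the ChEMBL REST API; the tutorial may use another endpoint.
CHEMBL_BASE = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id):
    """Build the (assumed) URL that returns one molecule record as JSON."""
    return f"{CHEMBL_BASE}/molecule/{chembl_id}.json"


def summarise_molecule(json_text):
    """Pull a few fields out of a molecule record returned as JSON text.

    The field names are assumptions; missing fields come back as None.
    """
    record = json.loads(json_text)
    structures = record.get("molecule_structures") or {}
    return {
        "chembl_id": record.get("molecule_chembl_id"),
        "name": record.get("pref_name"),
        "smiles": structures.get("canonical_smiles"),
    }


def fetch_molecule(chembl_id, timeout=30):
    """Download one record over HTTP and summarise it (needs network access)."""
    with urlopen(molecule_url(chembl_id), timeout=timeout) as response:
        return summarise_molecule(response.read().decode("utf-8"))
```

Calling `fetch_molecule("CHEMBL25")` should return a small dictionary for that compound if the service matches these assumptions. Keeping the parsing in `summarise_molecule` separate from the download makes it easy to test against a saved response.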
There is a list of other Vortex scripts on the Hints and Tutorials page
Whilst there are many sites that track the compatibility of common desktop applications, it is often difficult to find out information about scientific applications. Given that this seems to be such a major upgrade I thought I’d set up a spare machine to test applications before I update my main machine. I’ll update the list regularly, so feel free to send in information.
I have a number of applications/libraries/toolkits installed using Homebrew in /usr/local; this is known to cause extended installation times for Yosemite. So don’t worry if it appears the install is stuck at 1 min remaining.
If you do use Homebrew then it is worth updating:
brew update
brew upgrade
Aabel 3 appears to be working fine
BBEdit version 10.5.13 and newer are compatible with Yosemite
Beaker all seems OK
ChemBioDraw versions 12, 13 and 14 all function as before.
ChemDoodle all seems to work fine
Chimera aka UCSF Chimera versions 1.10 and higher are working on Yosemite.
ConQuest and Mercury from CCDC work fine but you may need to reinstall XQuartz (see below)
Cytoscape 3.1.1 seems to be working fine
DataWarrior no issues
EndNote X7.2 works well with Yosemite.
Findings Electronic Notebook no issues, only small issue is that the ‘+’ button of the window does not trigger full-screen, though it can still be done via the Window menu.
IDL 8.2 and earlier gags on a missing reference in libPng.dyld, but IDL 8.3 and later is OK
Igor Pro version 18.104.22.168 works fine
iNMR no problems reported
MacVector 13.0.6 No significant issues reported
Mathematica no issues reported
MOE works fine but you may need to reinstall XQuartz (see below)
OpenBabel no issues so far
Opsin all works fine
OSRA no issues
Papers Current version is compatible but not optimised, they hope to have a beta out of a substantially redesigned version next week.
Pro Fit 6.2 appears to work fine.
Pybel no issues reported
PyCharm works fine
Pymol All these are confirmed to work:
- MacPyMOL 22.214.171.124
- MacPyMOLX11Hybrid 126.96.36.199 after XQuartz reinstall (see below)
- Open-Source PyMOL with homebrew
Known issues with MacPyMOL:
- Movie export broken.
Edu-only-PyMOL (free Student version):
- Did not work at first (now updated to work with Yosemite).
No reports so far about:
- Other legacy versions (0.99 etc.); apparently the program will not open
- Open-Source PyMOL with fink or macports
PyRx 0.8 for docking works fine
RDKit no issues reported
SeeSAR all seems to be working fine
Sente 6.7.8 seems to run fine, except that it cannot open a reference library from the File > Open... dialog box. Workaround is to open from Finder.
Spartan 14 does not work because the Sentinel drivers are broken in Yosemite. The problem is NOT with Spartan, it is with the SafeNet developed Sentinel Run-Time Environment driver (the license manager). SafeNet has not given a definitive date when they will release an updated driver with Yosemite compatibility, but they are working on this. Best advice is to not upgrade but if you have to then contact email@example.com for a temporary alternative license procedure.
Torch no issues
VarSeq no issues
Vortex Upgraded when the developer preview came out. All works fine
The VVI products work well enough on Yosemite, but I'd like to achieve a higher level of quality for Yosemite (and iOS/iPad). There is an ongoing beta program for this product: https://itunes.apple.com/us/app/graph-ide/id904733611?mt=8 which is Graph Builder reincarnated on the iPad. There is also a beta program ramping for Graph Builder on Yosemite: https://itunes.apple.com/us/app/graph-builder/id470597599?mt=12 but a last minute interaction bug with Yosemite has delayed that for perhaps a few days. Please feel free to broadcast this information as you see fit. Beta program participation should be directed to firstname.lastname@example.org
VMD no issues reported
Wizard Pro is fully Yosemite compatible
XQuartz it seems the Yosemite installer deleted the symlink between /opt/X11 and /usr/X11; you can either reinstall XQuartz or try "ln -s /opt/X11 /usr/X11"
Updated 30 October 2014
OpenEye have announced the release of VIDA v4.3. This is a major update with many new features and enhancements, including improvements to depiction, 2D alignment, list manager manipulation, surface selection and display, default colouring schemes, both visual and list-driven atom subset selection, cluster viewing, colouring by SD property and extension management.
One feature I’m sure will be very popular is the new advanced depiction options: atom property maps from the Grapheme TK, substructure highlighting, and 2D structure alignment are all available for depiction in the 2D window and spreadsheet.
Support for Mac OS X 10.8 and 10.9 was added
Mac OS X 10.6 is no longer supported
Swift is a new programming language from Apple for iOS and OS X apps that builds on the best of C and Objective-C, without the constraints of C compatibility. I’m delighted to hear that people are starting to explore its use in scientific applications. Dr. Alex M. Clark has posted his early impressions on the Cheminformatics blog; it is well worth a read.
There is also the Swift blog for more interesting tips.
CheS-Mapper has been updated to version 2.4.
New Features
- Add Moss as new structural fragment mining algorithm
- Show the number of distinct 3D positions (at the top right, alongside other dataset info)
- Mapping warnings are now accessible within the viewer (Menu: Help > Show mapping warnings)
- Add hint for multiselection of compounds via 'control'-key (is shown when zooming into compounds for the first 3 times)
More Changes
- The viewer no longer zooms out when changing component size or spread
- Add log conversion of feature values, by adding a new feature, instead of log-highlighting (gives better overview of log-distributed values, e.g. within the chart)
- Multiple selected compounds are now highlighted within the chart for nominal features (was only possible for numerical features)
Fix
- Fix error that showed structural fragment values as '1'/'0' instead of 'match'/'no-match'
CheS-Mapper (Chemical Space Mapper) is an open source 3D-viewer for chemical datasets of small molecules. A publication in the Journal of Cheminformatics describes an early version of the application (DOI: 10.1186/1758-2946-4-7), and there is a review here.
A little while ago I suggested on Twitter that it might be useful if all chemistry undergraduates conduct a LogP or pKa determination as part of their practical classes. These results could then be stored in an open access database that would grow into a fantastic resource.
Sven Kochmann has now fleshed that initial idea out into a detailed proposal. Well worth reading and I’d encourage people to participate.
Asteris has been updated. Asteris is an iOS app that arose from a collaboration between Optibrium and Integrated Chemistry Design that allows medicinal chemists to design new molecules on their iPad and then calculate a range of physicochemical and ADME properties.
What's New in Version 1.0.2
- Add sulfoxide support, using either double bond or separated charges
- Add multiple ring creation with one gesture if atoms are selected
- Permit scaling with selected atoms and bonds
- Add wavy bonds if Single bond tapped a second time
- Add Presentation Mode
There is a review of version one here
I was recently asked about compiling an algorithm, plane of best fit (PBF), to quantify and characterize the 3D character of molecules as described in Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules, Nicholas C. Firth, Nathan Brown, and Julian Blagg, Journal of Chemical Information and Modeling 2012 52 (10), 2516-252 DOI. The source code is all available from the rdkit repository https://github.com/rdkit/rdkit/tree/master/Contrib/PBF.
The compilation had a slight glitch but full details are here.
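For anyone curious about what the PBF score measures, here is a rough pure-Python sketch of the idea as I understand it from the paper: fit the least-squares plane through the heavy atom coordinates and report the average distance of those atoms from it. This is not the RDKit Contrib code. The function name is hypothetical, and finding the plane normal by power iteration with an arbitrary start vector is my own simplification, so treat it as an illustration only.

```python
import math


def plane_of_best_fit_score(coords, iterations=500):
    """Mean distance of 3D points from their least-squares plane.

    coords is a list of (x, y, z) tuples, e.g. heavy atom positions.
    """
    n = len(coords)
    if n < 3:
        raise ValueError("need at least three points to define a plane")
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    centred = [[p[k] - centroid[k] for k in range(3)] for p in coords]
    # 3x3 scatter matrix of the centred coordinates.
    scatter = [[sum(p[i] * p[j] for p in centred) for j in range(3)]
               for i in range(3)]
    trace = scatter[0][0] + scatter[1][1] + scatter[2][2]
    if trace == 0.0:
        return 0.0  # all points coincide
    # The plane normal is the eigenvector of the smallest eigenvalue of the
    # scatter matrix. That vector is the dominant eigenvector of
    # (trace * I - scatter), so simple power iteration can find it.
    shifted = [[(trace if i == j else 0.0) - scatter[i][j] for j in range(3)]
               for i in range(3)]
    normal = [0.3, 0.5, 0.8]  # arbitrary start vector (assumption)
    for _ in range(iterations):
        product = [sum(shifted[i][j] * normal[j] for j in range(3))
                   for i in range(3)]
        length = math.sqrt(sum(x * x for x in product))
        normal = [x / length for x in product]
    # Average absolute distance of each point from the fitted plane.
    return sum(abs(sum(p[k] * normal[k] for k in range(3)))
               for p in centred) / n
```

A perfectly flat set of points scores zero and the score grows as the points move out of the plane. For real work a library eigen-solver or SVD would be more robust than power iteration, which converges slowly when the two smallest eigenvalues are close.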
It seems that Python is becoming the preferred language for scripting in science. I wrote a getting started page for Chemists, and several people have pointed out a couple of resources that may be useful, in particular Rosalind.
Rosalind is a platform for learning bioinformatics and programming through problem solving.
It looks like an excellent starting point for newcomers and more experienced programmers; whilst focussed on bioinformatics, the exercises are useful for all disciplines.
For chemists chempython looks to be a very useful resource.
DataWarrior is a data analysis tool that understands chemistry; it provides an efficient way to search, sort and analyse structure-activity data. DataWarrior was developed at Actelion, where it is highly integrated into the drug discovery platform. In 2014 it was decided to release DataWarrior without the integration layer as a stand-alone tool to the public. DataWarrior is a Java application and thus is cross platform.
I’ve written a review on my initial impressions.
OEToolkits 2014.Jun
This release of the OpenEye toolkits is focused on stability and new platform support. The last release, 2014.Feb, was a major feature release introducing numerous new features. This release focused on fixing many bugs and improving the overall stability of the OpenEye toolkits.
There is still a major new feature being added in this release:
FreeForm API added to Szybki TK
Mac Users should note this release will be the last release to support OSX 10.7.
ChemAxon have just announced the release of version 6.3.
This release includes several new features including the ability to draw and analyze complex patent Markush structures, display Markush hierarchy, work with R-Groups and enumerate Markush structures.
There is a new Solubility Predictor: the aqueous solubility predictor is based on the topology of the input molecules, but also calculates the pH dependence and the solubility at a desired pH level.
IUPAC name conversion now supports Japanese names as well as the existing English and Chinese names, even if mixed in the same document, so you can extract all the chemistry from documents in these languages.
There are also updates for Marvin JS 6.3, Standardizer & Structure Checker 6.3, Instant JChem 6.3, and Compound Registration 6.3.
In late 2012 Robert Bruns and Ian Watson published a paper entitled Rules for Identifying Potentially Reactive or Promiscuous Compounds. These 275 rules encapsulated 18 years of Drug Discovery experience and are used to identify potentially troublesome molecules.
The code to implement these rules was kindly made available by Ian Watson on GitHub. Unfortunately my initial attempts to compile this failed, but Matt was able to provide a patch to compile under Mac OSX (Mavericks) using Clang. Whilst this would be sufficient, he then went the extra step and made it available via Homebrew.
You can read more here.
After I posted the page on setting up a Mac for Cheminformatics I was asked if I could do something similar for writing chemistry (or Science in general) Python scripts on a Mac. So I’ve written a “How to” page on setting up your Mac to use the iPython notebook and write simple scripts that use Pybel to access OpenBabel.
The page is here Python, Chemistry and a Mac 1, and I’ll probably add more pages/scripts in the future.
I’ve recently needed to set up a new Mac and I realised that the current installation process for all the applications, tools, chemistry toolboxes, and associated dependencies was unmanageable. I have a mixture of apps that I have compiled myself, others that I have simply used the precompiled binaries, others from Macports etc.
I decided to write a detailed account of the process of installing a number of toolkits and packages using Homebrew and PIP.
You can read the full account here in the hints and tutorials.
I’d be delighted to hear of any comments or suggestions for addition.
SMILES (Simplified Molecular Input Line Entry System) is a simple yet comprehensive chemical language in which molecules and reactions can be specified using ASCII characters representing atom and bond symbols. This system is compact and human readable which has made it an attractive way to store chemical information within a database.
Ethanol CCO
Cyclohexane C1CCCCC1
Nicotine CN1CCC[C@H]1c2cccnc2
In order to search for specific sub-structures it is necessary to create a query that describes the pattern of atoms and bonds (subgraph) required within the molecule (graph). SMARTS is a language that allows you to specify substructures using rules that are straightforward extensions of SMILES. That said, complex queries can get challenging to interpret, which is why the SMARTS viewer and SMARTS editor from BioSolveIT, two tools developed by Karen Schomburg and Lars Wetzer at the Center for Bioinformatics at the University of Hamburg, are so valuable.
The tools are provided free until June 30th 2014
K. Schomburg, H.-C. Ehrlich, K. Stierand, M. Rarey From Structure Diagrams to Visual Chemical Patterns J. Chem. Inf. Model., 2010, 50 (9), pp 1529-1535 http://pubs.acs.org/doi/abs/10.1021/ci100209a
K. Schomburg, L. Wetzer, M. Rarey Interactive Design of generic chemical patterns Drug Discov Today (2013) http://dx.doi.org/10.1016/j.drudis.2013.02.001
We are starting to see companies exploit the client server model in bringing ever more sophisticated scientific applications to the iPad.
Asteris is a joint development from Optibrium, the creators of StarDrop, and Integrated Chemistry Design, who created Chirys Draw. Asteris uses Chirys Draw’s touch interface to design novel molecules and then applies StarDrop’s predictive modeling power. Guided by the Glowing Molecule™ visualization, instant feedback dramatically reduces the time it takes you to identify high quality compound designs. Using Asteris you can calculate a range of simple “core properties” and ADME properties, including solubility, hERG inhibition and CNS penetration, using rigorously validated models from the StarDrop platform.
- CORE PROPERTIES
- Molecular Weight
- Number of rotatable bonds
- Number of hydrogen bond donors
- Number of hydrogen bond acceptors
- Topological polar surface area.
- STARDROP ADME PROPERTIES
- 2C9 pKi
- BBB log([brain]:[blood])
- BBB category
- HIA category
- P-gp category
- 2D6 affinity category
- PPB90 category
All of the predictions are calculated using StarDrop’s ADME QSAR module. You will need to be connected to the internet to perform these calculations using the secure Asteris cloud server. Alternatively, you can run the calculations on your own server with the “Enterprise” edition.
All communications with the server use industry-standard SSL encryption. No compound structures or data are stored on the server. Calculate "core properties" for an unlimited number of molecules for free. Calculate ADME properties for 20 new compounds each month, free of charge. Additional ADME property calculations can be purchased via an in-app purchase.
There are demo videos on the support site.
Diversity Genie is a small but powerful utility to analyze datasets of small organic molecules. Its features include:
- Calculation and comparison of diversity of chemical sets
- Ability to handle sets of millions of molecules
- Sorting, slicing, and merging large SD files
- Conversion between SMILES, InChI, and SDF formats
- Filtering based on property values and structural uniqueness
- Computation of 2D and 3D atomic coordinates
- Addition/Removal of implicit hydrogens
- Computation of molecular properties such as molecular weight, number of rotatable bonds, number of HBD, HBA, as well as other descriptors
- Export and import of data to/from CSV files
- Data visualization
On its 10th anniversary the OEChem toolkit from OpenEye has been updated:
- Added support for OSX 10.9 Mavericks.
- The next toolkit release, 2014.Jun, will be the last release to support OSX 10.7.
- This release will be the last release to support OSX 10.6.
- The next toolkit release, 2014.Jun, will be the last release to support 64-bit Ubuntu 10.04.
- GCC 4.8.2 support added for RHEL6. GCC 4.8.1 had a bug that made it impossible to compile OpenEye header files. Please use 4.8.2+.
- Experimental support for Python 3.3 added.
UniChem is a new web resource provided by the EBI. It is a 'Unified Chemical Identifier' system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between databases. Currently UniChem contains data from 21 different data sources.
This script, originally created by Sune Askjær, first calculates the InChIKey for molecules in a workspace and then uses UniChem to search for information in multiple databases. It then provides a summary and a link to a locally generated summary table.
Full details are here Scripting Vortex 18.
In the tutorial Scripting Vortex 15 I showed how it is possible to create a contextual script for Vortex that downloads a specific PDB file. A FlexAlign Vortex script then identifies the structure column, gets the SMILES string of the selected molecule, generates a 3D structure and uses FlexAlign to do a one-shot flexalign between the ligand in the system in MOE and the incoming ligand.
While this is useful if you have similar structures (perhaps analogues in a series) there will certainly be situations where it may be preferable to dock the new ligand into the binding site. The Scripting Vortex 17 tutorial describes how to achieve this.
Whilst much computational work is undertaken to support library design, virtual screening, hit selection and affinity optimisation, the reality is that the most challenging issues to resolve in drug discovery often revolve around absorption, distribution, metabolism and excretion (ADME). Whilst we can measure the levels of parent drug in various media, tracking metabolic fate can often be a considerably more difficult proposition requiring significant resources. For this reason prediction of sites of metabolism has become the subject of current interest.
FAME DOI is a collection of random forest models trained on a comprehensive and highly diverse data set of 20,000 small molecules annotated with their experimentally determined sites of metabolism taken from multiple species (rat, dog and human). In addition dedicated models are available to predict sites of metabolism of phase I and II processes.
FAME offers a high performance prediction of sites of metabolism mediated by a wide variety of mechanisms.
There is a list of software reviews here.
Marvin has been updated
Bugfixes, Java Webstart did not run on Macintosh computers.
It can be downloaded from here
Note, many of you have bumped into the problem when the Gatekeeper Security of OS X blocks launching applications downloaded not from the Apple store. The solution is to modify the default settings of the Gatekeeper:
- 'Apple > System Preferences > Security & Privacy'
- In the 'General' section the setting of 'Allow applications downloaded from:' should be set to 'Anywhere'
After this you would not get the "damaged dmg" popup and you can install the downloaded dmg. After install it is probably a good idea to reset the Security settings.
If you are using Java applets it is probably worth reading this article
Apple have just introduced some new security settings in Safari for Java. In an average browser, for a Java Applet to be able to touch your file system that Applet must be signed. In the new security update of Safari this Applet must be trusted as well. This means that you have to allow the Applet to read and write your file system. Marvin starts with accessing some files on your computer, which means that it might not start without this permission or might not behave correctly.
Ever had problems with an unusually formatted PDB file? PDBinout is a file conversion tool for PDB files that might interest you. It was created by Tomasz Woźniak at the Laboratory of Structural Chemistry of Nucleic Acids, Institute of Bioorganic Chemistry, Polish Academy of Sciences
PDB format is the most commonly used by various programs to define three-dimensional structure of biomolecules. Those programs however, often use different versions of this format. Therefore, it is often necessary to write own re-formatting scripts or change files manually, which makes PDB files less convenient to use. There are only few tools allowing to change one or two versions of PDB format into another and no comprehensive approach for unifying PDB format was developed. Here we present an open-source, Python-based tool PDBinout for processing and conversion of various versions of PDB file format for biostructural applications. Moreover, PDBinout allows to create one’s own PDB versions.
The download also includes a tutorial.
Reference Woźniak T. and Adamiak R.W. (2013) Personalization of structural PDB files, Acta Biochimica Polonica 60, Paper in Press
OpenEye have just announced the release of pKa Prospector v1.0 a database of high quality experimental pKa determinations. The ionisation state of a drug molecule can have profound effects on affinity, dissolution, absorption, distribution, metabolism and off-target activity. The ability to predict pKa is often compromised by the lack of relevant experimental data, pKa Prospector is intended to address that issue.
The built-in experimental pKa database was compiled by Tony Slater of pKaData Limited from a collection of IUPAC sources. Each measurement has been individually verified, curated, and assigned a metric of quality. There are more than 30,000 experiments across 12,000 molecules represented. The database is particularly relevant for medicinal chemistry due to the strong preponderance of room temperature aqueous measurements, the many molecules with multiple experimental records, and the presence of over three hundred different heterocycles.
It is also possible to add additional experimental results and have them integrated into the application thus expanding the chemical space covered. The search uses rooted maximum common substructure (MCS) with "electronically-aware" scoring, alternatively it can be searched by similarity or substructure. Ionizable groups are automatically identified and highlighted.
There was a blog entry on In the Pipeline about a bug in ChemDraw. Actually this has been known for a while (and present in previous versions) but it seems it still has not been fixed in the latest version of ChemBioDraw 13 on the Mac. As you can see in the image below, including explicit hydrogens in your structure significantly impacts the calculated LogP. Whilst people don’t often add explicit atoms to phenyl rings (except perhaps in SAR studies), they often add them to heteroatoms.
At the moment there is no bug fix and no date set for a fix to any version of ChemBioDraw; the only approach is to avoid adding explicit hydrogens to structures if you want to calculate LogP. I’ve looked at a number of other applications and there seem to be no issues with ChemDoodle, Elemental, Marvin or OpenBabel.
New features and improvements
- MarvinSketch Dialog
- 'Zoom to scaffold' checkbox option has been added to the "Preferences>Save/Load" tab. Documentation
- Structure Checker
- External structure checker configuration file URL can be set via Java System Property.
- Electron-flow arrow could not be drawn from the A-B bond to the incipient A-C bond of an A-B-C structure. Forum
- MolInputStream and MolImporter could have different format options.
- MolImporter did not close its inputstream when an exception was thrown in the constructor.
- MOL, SDF, RXN, RDF
- Molecule type property was allowed in SDF, CSSDF export.
- The coordinates of the sequence residue imported from SCSR MOL files were wrong if the residue had three attachment point.
- Color and text format of atom label is exported to CDX and imported from CDX and CDXML. Forum
- Graphical brackets were not imported from CDX files.
- Gaussian Z-matrix input format
- Command line, title line, and extra input properties were not exported to Gaussian Z-matrix input format. Forum
- Clean 2D
- Cleaning of position variation bonds could create overlapping bonds.
- Cleaning of bridged systems could result in overlapping atoms. Forum
- Topology Analysis
- Missing method has been added: TopologyAnalyserPlugin.getFsp3(). API Documentation
- New logD training documentation has been added. Documentation
- Structure Checker
- Fixer options in MarvinSketch are updated with newly defined settings.
- External checkers can be loaded from JAR file in case the JAR file contains a space.
StarDrop was recently updated to version 5.4. This brings an update to the virtual library design module and scaffold-based design, and there have also been improvements to the plotting and data visualisation.
There are now seven optional plugins with three exciting new options.
Derek Nexus™ - Knowledge based toxicity prediction The new Derek Nexus module for StarDrop provides Lhasa Limited's world-leading technology for knowledge-based prediction of key toxicities. Using data from published and donated (unpublished) sources, Derek Nexus identifies structure-toxicity relationships that alert you to the potential for your compounds to cause toxicity. The Derek Nexus module provides predictions of the likelihood of a compound causing toxicity in over 40 endpoints, including mutagenicity, hepatotoxicity and cardiotoxicity.
BIOSTER™ - A world of chemistry experience BIOSTER is developed and updated in collaboration with Digital Chemistry and is available as an optional extension to StarDrop's Nova module. This combination enables you to quickly and easily search the comprehensive BIOSTER database to identify transformations that are relevant to your compounds. These can be automatically applied to generate novel structures with a high likelihood of biological activity and synthetic accessibility, prioritised against the property profile you require for your project. BIOSTER brings the collective experience of the chemistry community to help you to discover new active analogues of your compounds based on the tried and tested principle of isosterism. The BIOSTER module contains a unique compilation of over 20,000 precedented bioisosteric transformations, manually curated from the literature by Dr István Ujváry, complete with references to the original publications in which they are described.
torch3D™ The renamed torch3D module, using Cresset’s unique Field technology to understand and apply 3D Structure Activity Relationship (SAR), has been updated to include the latest version of Cresset’s XED force field providing insight into compounds’ 3D structures, biological activities and interactions.
These certainly significantly expand the potential utility of StarDrop, but note that these are not part of the standard install and may require additional licensing.
Marvin 6.1 has been released:
New drawing and displaying features in MarvinSketch and MarvinView
Among others, drawing peptide cycles and bridges is now available, and IUPAC numbers can be displayed in MarvinSketch and MarvinView
Better images to structures conversion in Document to Structure
Optical Structure Recognition tools CLiDE and Imago can be used, in addition to OSRA
New Chinese Document to Structure feature
Chemical names in the flow of Chinese sentences are detected, without the spaces that separate words in English
Homology groups have been added
Structure Checker has become an integral part of Marvin Beans
Installing Marvin Beans will install the fully functional Structure Checker application
- Structure Checker: Checker names and error messages can be localized or customized
- Elemental Analysis: Charge is taken into account in atomic mass calculation
- Structure Checker: “Copy as action string” option is available
In tutorial 4 we looked at using the command line tool sddesc from Chemical Computing Group to calculate a number of molecular descriptors and then import them into Vortex. However, there are a couple of issues with doing this, not the least of which is ensuring all the environment variables are set correctly. An alternative is to use MOE as a web service and access the tools using SOAP (Simple Object Access Protocol). This protocol provides a specification for exchanging structured information in the implementation of web services in computer networks. It relies on the XML Information Set for its message format.
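To make the SOAP message format a little more concrete, here is a minimal sketch that builds a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one, but the service namespace, the operation name (CalculateDescriptors) and the parameter names are hypothetical placeholders, not the interface MOE actually exposes.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used for illustration only.
SERVICE_NS = "urn:example:descriptor-service"


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (bytes) that calls `operation` with `params`."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    # The operation element sits inside the Body; each parameter is a child of it.
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, operation))
    for name, value in params.items():
        child = ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name))
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameter names.
request = build_soap_request("CalculateDescriptors",
                             {"smiles": "c1ccccc1O", "descriptor": "logP"})
print(request.decode("utf-8"))
```

In practice the resulting XML would be sent to the service endpoint in an HTTP POST, and the reply would be another envelope to parse in the same way.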
UniChem is a new web resource provided by the EBI. It is a 'Unified Chemical Identifier' system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between databases. UniChem currently contains data from 19 different databases.
Since ChemBioDraw can generate InChIKeys, I thought it might be interesting to write an AppleScript that accesses this service. The InChIKey is a short, fixed-length character signature based on a hash code of the InChI string. By definition, hashing is a one-way conversion procedure and the original structure cannot be restored from the InChIKey, which allows confidential searching.
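Because the InChIKey has a fixed layout (27 characters: a 14-letter block, a 10-letter block and a single final letter, separated by hyphens), it is easy to sanity-check a key before submitting it to a lookup service. The sketch below only checks the layout; it cannot tell whether the key corresponds to a real structure. The function names are my own.

```python
import re

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_valid_inchikey(key):
    """True if `key` has the 27-character InChIKey layout."""
    return INCHIKEY_PATTERN.fullmatch(key.strip()) is not None


def connectivity_block(key):
    """Return the first 14-letter block, which is hashed from the core connectivity."""
    if not is_valid_inchikey(key):
        raise ValueError("not a well-formed InChIKey: %r" % key)
    return key.strip().split("-")[0]


aspirin = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(is_valid_inchikey(aspirin))
print(connectivity_block(aspirin))
```

Searching on the first block alone is a common way to find records that share a skeleton but differ in stereochemistry or isotopic labelling.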
Manipulating large collections of molecules can be a laborious exercise, involving many repetitive tasks such as checking for duplicates or filtering by physico-chemical properties as you sift through vendors' catalogues. A recent publication, Journal of Cheminformatics 2013, 5:38 DOI, describes a tool that may be useful.
To support intuition-driven processing of compound collections, we developed MONA, an interactive tool that has been designed to prepare and visualize large small-molecule datasets. Using an SQL database common cheminformatics tasks such as analysis and filtering can be performed interactively with various methods for visual support. Great care was taken in creating a simple, intuitive user interface which can be instantly used without any setup steps. MONA combines the interactivity of molecule database systems with the simplicity of pipelining tools, thus enabling the case-to-case application of chemistry expert knowledge.
Mona has been built on top of the well-known Naomi cheminformatics framework that has been proven to be robust, accurate and extremely efficient. Mona facilitates:
- Loading molecule files stored in SDF, MOL2 or SMILES format
- Scanning entire directories for molecule data
- 2D depiction of hundreds of thousands of molecules
- Duplicate removal
- Filtering based on physico-chemical properties, functional groups, and substructures
- SMARTS editing and SMARTS match visualization
- Set-operations such as union, intersection, sub-set splitting
- Visualization of property distributions
I’m not a big user of R, a free software environment for statistical computing and graphics, but occasionally I notice cheminformatics modules being published. The latest issue of Bioinformatics DOI has a paper describing “fmcsR: Mismatch Tolerant Maximum Common Substructure Searching in R”.
The fmcsR package provides an R interface, with the time consuming steps of the FMCS algorithm implemented in C++. It includes utilities for pairwise compound comparisons, structure similarity searching, clustering and visualization of MCSs. In comparison to an existing MCS tool, fmcsR shows better time performance over a wide range of compound sizes. When mismatching of atoms or bonds is turned on, the compute times increase as expected, and the resulting FMCSs are often substantially larger than their strict MCS counterparts. Based on extensive virtual screening (VS) tests, the flexible matching feature enhances the enrichment of active structures at the top of MCS-based similarity search results. With respect to overall and early enrichment performance, FMCS outperforms most of the seven other VS methods considered in these tests.
fmcsR is freely available for all common operating systems from the Bioconductor site http://www.bioconductor.org/packages/devel/bioc/html/fmcsR.html.
DecoyFinder is a graphical tool which helps finding sets of decoy molecules for a given group of active ligands. It does so by finding molecules which have a similar number of rotational bonds, hydrogen bond acceptors, hydrogen bond donors, logP value and molecular weight, but are chemically different, which is defined by a maximum Tanimoto value threshold between active ligand and decoy molecule MACCS fingerprints. Optionally, a maximum Tanimoto value threshold can be set between decoys in order to assure chemical diversity in the decoy set.
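The selection logic described above can be sketched in a few lines of plain Python. This is not DecoyFinder's own code: fingerprints are represented here as simple sets of "on" bit positions rather than real MACCS keys, and the property names, tolerances and thresholds are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    shared = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - shared
    return shared / total if total else 0.0


def matches_properties(candidate, active, tolerances):
    """True if every listed property of the candidate is within tolerance of the active."""
    return all(
        abs(candidate["props"][name] - active["props"][name]) <= tol
        for name, tol in tolerances.items()
    )


def select_decoys(candidates, active, tolerances, max_to_active=0.75, max_between=None):
    """Keep property-matched candidates that are dissimilar to the active.

    If max_between is given, also reject candidates too similar to a decoy
    that has already been chosen, so the final set stays diverse.
    """
    chosen = []
    for cand in candidates:
        if not matches_properties(cand, active, tolerances):
            continue
        if tanimoto(cand["fp"], active["fp"]) > max_to_active:
            continue
        if max_between is not None and any(
            tanimoto(cand["fp"], decoy["fp"]) > max_between for decoy in chosen
        ):
            continue
        chosen.append(cand)
    return chosen


# Toy data: all property values and bit sets are invented for illustration.
similar = {"mw": 310.0, "logp": 2.8, "hbd": 2, "hba": 4, "rot": 5}
active = {"name": "ligand", "fp": {1, 2, 3, 4, 5, 6},
          "props": {"mw": 300.0, "logp": 2.5, "hbd": 2, "hba": 4, "rot": 5}}
tolerances = {"mw": 25.0, "logp": 1.0, "hbd": 1, "hba": 2, "rot": 1}
candidates = [
    {"name": "a", "props": similar, "fp": {1, 2, 20, 21, 22, 23}},  # dissimilar: kept
    {"name": "b", "props": similar, "fp": {1, 2, 3, 4, 5, 6}},      # same bits as active
    {"name": "c", "props": dict(similar, mw=450.0), "fp": {30, 31, 32}},  # too heavy
    {"name": "d", "props": similar, "fp": {1, 2, 20, 21, 22, 23}},  # duplicate of "a"
]
decoys = select_decoys(candidates, active, tolerances, max_between=0.9)
print([decoy["name"] for decoy in decoys])  # prints ['a']
```

Leaving max_between unset reproduces the basic behaviour, in which decoys are only compared against the active ligand and not against each other.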
Having spent the weekend getting it to run under Mac OS X I thought I’d write it all up so others can hopefully do it a little more smoothly.
The latest update to SYBYL-X has been released. Version 2.1 is only supported on 64-bit systems. In addition, Python 2.4 is no longer supported; if you are using the latest Mac OS X then you should have Python 2.5.1. To check, simply type python in a Terminal window.
chrismacbookpro:~ chris$ python
Python 2.5.1 (r251:54863, Nov 13 2007, 11:10:08)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
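The same check can be scripted rather than read off the interpreter banner. This is a small sketch; the minimum of 2.5 is my reading of the note above (2.4 dropped, 2.5.1 acceptable), not a requirement taken from the SYBYL-X documentation.

```python
import sys


def python_at_least(major, minor):
    """True if the running interpreter is at least version major.minor."""
    return tuple(sys.version_info[:2]) >= (major, minor)


print("Running Python %d.%d" % tuple(sys.version_info[:2]))
if not python_at_least(2, 5):  # assumed minimum, see the note above
    print("This Python is older than 2.5 and may not be supported")
```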
The major change in SYBYL-X is a new Job Control System, which replaces Netbatch and gives a consistent interface that is implemented across all SYBYL-X modules. It is compatible with popular job queuing systems like Oracle Grid Engine, LSF, and Torque, and provides improved multi-processor support for key applications (Surflex-Sim, Surflex-Dock, Topomer Search, and UNITY).
There have also been updates to the Molecular Data Explorer (MDE) including:
- Set 3D Viewer preferences for the display of proteins
- Switch the structure viewing between the 3D Viewer and SYBYL’s main graphics window
- Mark compounds in the 3D Viewer
- Mark compounds in the Grid Viewer
- Use a right-click menu in the Grid Viewer
- Display a regression line and/or unity line in a Scatter Plot
- Tile Viewers in a grid
- Set precision of column data
- Save structures to a database
- Export structures and associated column data to a MOL2 file
- Copy a table
The Python QSAR functionality is now accessible outside of SYBYL as standalone Python scripts. Results of the Python jobs can be read into SYBYL using the new readXML expression generator. See $TA_LIB/python/lib/python2.7/site-packages/tripos/qsarutl/README for more information.
QSAR Project Manager enhancements include the ability to:
- Modify names of structure sets and descriptor sets via a right-click menu
- Rename and delete items in the Project Data list via a right-click menu
I’ve just read a new blog entry on Noel O’Blog regarding the development of useful cheminformatics code. In the piece he advocates the use of Open Babel:
I'm going to propose that you should write or adapt this code as an Open Babel plugin. I've just done this for Confab, the conformer generator I wrote some time back. If you do this, you don't need to consider how to put together the build infrastructure, write the code for reading/writing file formats, or for handling command-line options and arguments (in fact, you get a lot of additional functionality for free). More generally, the software will compile cross-platform, be included in every major Linux distribution and be available to a very large number of people. It will also have a lifetime beyond the end of the grant that funded it.
I couldn’t agree more, added to which it is nice to see how the code you contribute can be built on and extended.
There was an interesting entry on the ChemSpider blog this week; apparently they are starting to capture spectral information:
The RSC now encourages authors for several of our journals to supply extra information, structures and spectra in their original file formats – which are attached to the article as supplementary information. Already we’ve seen several submissions of data that we have incorporated into ChemSpider records, both enriching the ChemSpider database and also showcasing the research of these authors through their publications. In this way, the RSC hopes to encourage the addition of reusable data files to the research paper as the start of its efforts to promote increased data sharing within chemical science research. In a few short weeks we’ve received a number of submissions from authors that include key chemical structures as mol files and in some cases extra data including 1H and 13C NMR spectra as well as UV and IR spectra.
Once you have done that you can see the spectra displayed using a Java applet.
Silicos-it have just announced that Strip-it version 1.0.2 has been released and is now available for download. Strip-it 1.0.2 includes a new command line feature, the --noHeader option, to suppress the generation of the header line in the output. This new release is based on a patch kindly provided by Björn Grüning from the University of Freiburg.
Those who follow the ChemSpider Blog may have noticed that there have been a number of enhancements to ChemSpider. Details are covered in three blog posts: the first covers autocompletion and combined structure and property searches, the second covers searching the supplementary information, and the final post describes combining searches and then using spectral information to identify a reaction product.
OpenEye have announced updates to a couple of their products
OMEGA v2.5 is designed to produce high quality multi conformer databases.
Highlights from this release include:
- OpenMPI version 1.6 is supported on all platforms. The -mpinp and -mpihostfile flags are now used to run OMEGA and makefraglib in MPI mode. These new flags replace the oempirun script.
- PVM (parallel virtual machine) is no longer supported.
- An option has been added to allow hydrogen atoms in -OH, -SH, and amines to take part in conformational sampling. This new option can be enabled via the -sampleHydrogens parameter. By default, hydrogen atoms are not sampled.
- Using -fixsmarts without -fixmol will now rematch for every input structure. Previously, this would only match the first input structure and reuse that match for the rest of the calculation. Using both -fixsmarts and -fixmol will continue to match against the fixmol and use that match for the entire calculation.
EON v2.2 compares electrostatic potential maps of pre-aligned molecules and determines the Tanimoto measures for the comparison.
Highlights from this release include:
- OpenMPI version 1.6 is supported on all platforms. The -mpinp and -mpihostfile flags are now used to run EON and makefraglib in MPI mode. These new flags replace the oempirun script.
- PVM (parallel virtual machine) is no longer supported.
- The default hitlist format has been changed from sdf to oeb for increased functionality and decreased filesize. The output format is adjustable with the -oformat parameter.
- SD tags are now prefixed with EON_. The tags are optional with the -sdTags parameter. Additionally, any existing ROCS tags will not be removed because ROCS and EON tags no longer conflict.
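For readers wondering what a Tanimoto measure means for something continuous like a potential map, the usual generalisation is T = A.B / (A.A + B.B - A.B), where A and B are the two fields sampled on the same grid. The sketch below applies that formula to plain lists of numbers. It illustrates the general formula only and is not EON's implementation, which works on full 3D potential grids.

```python
def continuous_tanimoto(field_a, field_b):
    """Tanimoto measure for two fields sampled at the same grid points.

    Uses T = A.B / (A.A + B.B - A.B); identical fields give 1.0 and
    exactly opposite fields give -1/3.
    """
    if len(field_a) != len(field_b):
        raise ValueError("both fields must be sampled on the same grid")
    ab = sum(x * y for x, y in zip(field_a, field_b))
    aa = sum(x * x for x in field_a)
    bb = sum(y * y for y in field_b)
    denominator = aa + bb - ab
    # Two all-zero fields carry no information; report 0.0 rather than divide by zero.
    return ab / denominator if denominator else 0.0


# Invented grid values, for illustration only.
reference = [0.5, -0.25, 1.0, 0.0]
opposite = [-value for value in reference]
print(continuous_tanimoto(reference, reference))
print(continuous_tanimoto(reference, opposite))
```

Unlike the bit-string version, this measure can go negative, which is what happens when regions of positive potential in one molecule line up with negative regions in the other.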
Schrödinger have just announced the latest release of their entire suite of software programs. This covers tools for drug design, materials science, biological modelling and general purpose modelling.
There are a huge number of new features and improvements in the small molecule drug discovery suite as well as Materials Science. However, a couple of features caught my eye. pKa prediction for both rule-based and QM-based methods has been improved. Covalent ligand docking has been added and includes a variety of common docking chemistries, and pi-stacking interactions in docking have been improved. The molecular dynamics has been updated and now includes support for GPU acceleration. The QM tools look to have been updated and the interface improved. ADME tools have been updated and, in particular, P450 site of metabolism prediction has improved accuracy.
Predictive capabilities that can be applied across a wide range of chemical systems, including:
- Reaction thermochemistry and reaction path exploration
- Rate constants for reactions and transport from transition state theory
- Validated models for calculating oxidation and reduction potentials
- Accurate heats of formation and atomization energies for larger systems
- Reliable properties for systems containing transition metals
- Efficient calculation of electric field dependent properties
- Prediction of vibrational and electronic spectra for complex systems
- Multiple pre-defined calculation modes representing tested simulation parameters balancing speed and accuracy
There has also been an update to PyMOL with improved rendering speeds and a couple of bug fixes.
This is certainly a great update and well worth having a detailed look at.
Creating SMARTS strings can be an interesting experience; thankfully there are a few tools that make the task easier. A recent addition is SMARTSeditor from BioSolveIT. The SMARTS tools are prototypes developed by their academic partner at the Center for Bioinformatics (ZBH) of the University of Hamburg. The software may be used free of charge up until December 31st 2013.
SMARTSeditor is an interactive GUI application that lets you draw substructure patterns. Jump-starting from a molecule you may develop a SMARTS in a quite intuitive fashion by editing the topology and properties. Using pre-defined patterns for common functional groups lets you quickly reach your goal. SMARTSeditor supports recursions, allowing you to go to any level of complexity without getting lost.
You can read more here Schomburg, K.T., Wetzer, L., Rarey, M. Interactive design of generic chemical patterns. Drug Discov Today (2013)
You can also try the SMARTSviewer out online here.
The Chemical Activity Predictor service is the first one of the new Apps from NCI-NIH. It provides the prediction of a (growing) number of small molecule physicochemical or biological properties calculated by QSAR models created in the GUSAR software. To use it you can simply paste in a SMILES string. It is only a beta test but worth having a look at.
ChemAxon have announced an update to their desktop suite of applications.
Version 6 for scientists:
- All GUIs are refreshed to simplify both the user experience and access to key chemistry functions
- Plexus – new web-based application for medicinal and computational chemists, featuring dynamic data visualization,
Mac OS X 10.8 users should also note this support message:
Many of you have bumped into the problem when the Gatekeeper Security of OS X Mountain Lion blocks launching applications downloaded not from the Apple store. The solution is to modify the default settings of the Gatekeeper:
- Select 'Apple > System Preferences > Security & Privacy '
- In the 'General' section the setting of ' Allow applications downloaded from :' should be set to ' Anywhere'! After this you would not get the "damaged dmg" popup and you can install the downloaded dmg. You even don't need to re-download it, the "damaged dmg" is misleading.
ChemmineR a cheminformatics package for analyzing drug-like small molecule data in R was recently updated. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.
To install, start R and enter
The Living Molecules iOS app has been updated. This version now activates the phone's flashlight to better capture the glyphs in dim conditions.
The Living Molecules app allows you to turn chemical structures into a molecular glyph, which can be used on documents like posters, manuscripts or web pages. Anyone who has the app can use the camera to capture molecular glyphs, and bring the chemical data onto their device. A molecular glyph is the chemical equivalent of a QR code.
Some time ago I described a Safari extension that uses chemicalize.org to index a web page for chemical content.
For an example of a “chemicalized” page have a look at this
As you can see below all molecules mentioned in the page become links that on a mouse over reveal the structure, they also provide a handy ribbon of structures across the top of the page that is useful for quickly scanning and navigation.
A recent publication by Southan and Stracz, Extracting and connecting chemical structures from text sources using chemicalize.org, Journal of Cheminformatics 2013, 5:20, describes how this information is being used to provide better indexing of the internet in a chemically intelligent manner. They include a demonstration of a number of web pages and document sources that were indexed in this manner, including PDFs from the patent office.
chemicalize.org now has 15000 unique visitors a month – which is a huge growth compared to spring 2012. These users contribute to the database every day, making sure it’s up-to-date and contains new interests as well. The database today contains 327000 structures that were converted from 545000 names and identifiers coming from 367000 webpages.
These structures and links have now been uploaded to PubChem and if you are interested in what sort of molecules have been registered via chemicalize.org you can browse them on the PubChem website here
A Pan Assay Interference Compounds (PAINS) Filter for filter-it
Jonathan B. Baell and Georgina A. Holloway published a very interesting paper on their analysis of frequent hitters from screening assays. DOI
This report describes a number of substructural features which can help to identify compounds that appear as frequent hitters (promiscuous compounds) in many biochemical high throughput screens. The compounds identified by such substructural features are not recognised by filters commonly used to identify reactive compounds. Even though these substructural features were identified using only one assay detection technology, such compounds have been reported to be active from many different assays. In fact, these compounds are increasingly prevalent in the literature as potential starting points for further exploration, whereas they may not be
In the supplementary information they provided the corresponding filters in Sybyl Line Notation (SLN) format. Unfortunately I don’t use SYBYL and so needed them in SMARTS format for use with filter-it.
This article describes the process of creating a .sieve file for use with filter-it.
The Open Chemistry Group have just announced the availability of the first beta release of a suite of software packages for chemists.
It consists of Avogadro 2 an update to the well established molecular editing package, see a recent paper describing it for more details “Avogadro: an advanced semantic chemical editor, visualization, and analysis platform” DOI.
Some notable new features of Avogadro 2 include:
- Scalable data structures capable of addressing the needs of large molecular systems.
- A flexible file I/O API supporting seamless addition of formats at runtime.
- A Python-based input generator API, creating an input for a range of quantum codes.
- A specialized scene graph for supporting scalable molecular rendering.
- OpenGL 2.1/GLSL based rendering, employing point sprites, VBOs, etc.
- Unit tests for core classes, with ongoing work to improve coverage.
- Binary installers generated nightly.
- Use of MoleQueue to run computational codes such as NWChem, MOPAC, GAMESS, etc.
The final element of this first beta release is MongoChem, a chemically aware database built on MongoDB and intended to address the need for researchers and groups to be able to effectively store, index, search and retrieve relevant chemical data. It uses Open Babel to provide the cheminformatics input.
There is a slide presentation describing the project in more detail here.
Marvin 5.12.3 has been released with a couple of bug fixes
- Name to Structure (n2s)
- Names with ylium and uide suffixes are now supported.
- Calculation NMR (HNMR, CNMR Prediction, ...)
- Coupling of a nucleus with a group of magnetically equivalent nuclei was not handled properly.
- NMR Predictor did not consider negative coupling constant values.
JSDraw has been updated as part of the Cheminformatics DevSuite 2.5.1 Release.
There is an online demo of 3D structure generation here using JSME and JSmol.
Noel O’Boyle gave a brief talk at the New Orleans ACS describing the new features and plans for Open Babel
I’ve just written a review of StarDrop, an application from Optibrium designed to aid decision making for scientists involved in drug discovery, which has recently been updated.
- Virtual Library Enumeration – The Nova plug-in module for StarDrop now has the added ability to quickly and easily enumerate a virtual library based on a template scaffold that you define with substitution points and variable fragments. You can sketch the groups to substitute at each point, select them from a user-defined or centrally administered library, or take them from a decomposition of another series using the R-group analysis tool in StarDrop
- Data visualisation - now allows you to apply interactive filters to your graphs and plots to quickly focus on the most interesting compounds. StarDrop now also supports the analysis of dates allowing you to explore variations of properties or scores with time
- Clustering - this new tool enables you to easily identify groups of similar compounds within a data set, based on either their structural similarity or properties
- Dataset Filtering - this helps you to remove compounds from a data set with unwanted sub-structures or property values. You can define any number of criteria with which to filter a data set
- Duplicate Removal - when combining compound data from multiple sources it’s common to end up with multiple copies of the same compound in a single data set. The duplicate removal tool makes it easy to find these and choose the entries that you want to keep.
- ADME QSAR – new model for predicting log([Brain]:[Blood]) (the old model remains available for consistency with previously calculated results)
- StarDrop now includes a FieldAlign module which, using Cresset's molecular Field technology, provides a unique, 3-dimensional (3D) insight into the biological activity, properties and interactions of your compounds.
There is a comprehensive list of software reviews here.
MayaChemTools: An open source package for computational discovery, COMP poster #306, 243rd ACS National Meeting & Exposition, March 25-29 2012, San Diego, CA
The current release of MayaChemTools provides command line Perl scripts for the following tasks:
- Manipulation of SD, CSV/TSV, Sequence/Alignments and PDB files
- Analysis of data in SD, CSV/TSV and Sequence/Alignments files
- Information about data in SD, CSV/TSV, Sequence/Alignments, PDB and fingerprints files
- Exporting data from MySQL, Oracle and PostgreSQL tables into text files
- Properties of periodic table elements, amino acids and nucleic acids
- Elemental analysis
- Support for multiple valence and aromaticity models
- Generation of fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs and topological pharmacophore atom triplets
- Generation of fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
- Calculation of similarity matrices using a variety of similarity and distance coefficients
- Calculation of physicochemical properties including rotatable bonds, van der Waals molecular volume, hydrogen bond donors and acceptors, logP (SLogP), molar refractivity (SMR), topological polar surface area (TPSA), molecular complexity and so on
- Similarity searching using fingerprints
I’ve written a script for Vortex that uses MayaChemTools to calculate molecular properties.
Marvin 5.12.2 has been released with a couple of bug fixes
- Conversion from explicit hydrogen to implicit one removed stereo centers not having explicit hydrogen ligand.
- Non-ring bond information was imported as query strings from SMARTS.
- After SMARTS import, those atoms that had no explicit aromatic property but had aromatic bond got query aromaticity property.
Way back in the distant past when I first joined the Pharma industry, I remember working with a dumb terminal running sub-structure queries on a remote mainframe that seemed to take forever on our relatively modest corporate database. Returning the results would then bring our network to a crawl, much to the annoyance of my colleagues. I’ve just downloaded ElementalDB from Dotmatics; this is an iPad application that does a substructure search of a 1,200,000 structure database in less than a second.
New features and improvements
S orbitals and oval shaped s or p orbitals are imported from CDX/CDXML.
Painting, Charge symbol on carbon atoms was missing when the atom numbers were visible and the display of carbon atom labels was turned off. When two atoms had more than one electron flow arrows between them, the electron flow arrows overlapped each other. The second electron flow arrow started from a wrong position when a single electron and an electron pair flow arrow started from an atom which had a lone pair and a radical as well.
Editing, Atom Lists and NOT Lists could not be created by typing atomic symbols separated with commas (e.g., "f,br,cl" or "!f,br,cl").
Import/Export, MRV and CML export wrote out characters incorrectly which are not supported by the character set. SDF files having invalid header could not be imported. Deuterium and tritium isotopes were converted to simple hydrogen atom if a molecule was exported to ChemAxon compressed MOL format (CSMOL). MolExporter.exportToObject() added an extra newline to SMILES. Nitrogens connecting two aromatic rings had radical after import if nitrogen was bracketed in the SMILES representation. Absolute stereo flag was missing during InChI export/import and InChIKey export.
Molecule Representation, Number of added implicit Hydrogen atoms were incorrect in some cases for positively charged sulfur atom.
Calculation, After canonical tautomer generation, the information of "double cis or trans" bond type might have been lost in certain cases.
In addition, Structure Checker configuration can be accessed via URL from MarvinSketch, the Structure Checker application, and via a Structure Checker API call. Users can now also access a custom web service to extend name to structure conversion, for instance with corporate IDs or common name dictionaries. Typing abbreviated group names is now case sensitive. When pasting an unrecognised format onto the canvas, the "Import as" dialog appears, and the user can choose the correct format. Structures can be copied as "Daylight SMARTS" and "ChemAxon SMARTS (CXSMARTS)" formats. The MMFF94 forcefield has been added to Generate3D and can also be used in the Conformer Plugin and Molecular Dynamics Plugin.
The complete release notes are available here
In the previous tutorial we made use of the Virtual Computational Chemistry Laboratory web service to calculate aLogP and LogS, both these results were returned in a simple text format. More recently there has been an increased use of JSON format for data exchange.
Molinspiration provide a number of cheminformatics tools and also a RESTful web service, which can be used to calculate a range of molecular properties and bioactivity predictions.
The output from both web services is available either as a JSON string or plain text, and the web service can be accessed by submitting a URL.
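As a rough illustration of why JSON is convenient for this kind of data exchange, the sketch below decodes a JSON string into a dictionary of numeric properties using the Python standard library. The response text is invented: the field names and values are placeholders and should not be taken as the actual output format of the Molinspiration service.

```python
import json

# Invented response text: the field names and numbers are placeholders,
# not the real output of any particular service.
example_response = '{"smiles": "CCO", "properties": {"logP": 0.1, "TPSA": 20.2}}'


def parse_properties(response_text):
    """Decode a JSON response and return its property values as floats."""
    data = json.loads(response_text)
    return {name: float(value) for name, value in data["properties"].items()}


properties = parse_properties(example_response)
for name in sorted(properties):
    print("%s = %.2f" % (name, properties[name]))
```

With plain text output the same job needs hand-written splitting and type conversion for each service, which is where most of the fragile code in these scripts tends to live.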
I’ve been slowly updating iBabel.
There are now separate versions for Mac OS X 10.7.x and 10.8.x
More details on the iBabel3 page.
Just saw this announcement from OpenEye
OpenEye is pleased to announce that the OpenEye Toolkits v2013.Feb have been released. This release includes the C++, Python, .NET and Java versions of the Toolkits. Please see the release notes for specific details on the improvements and fixes made in this release.
- OSX 10.8 support added for C++, Python, and Java. Note, C++ and Python are built with the Clang 4 compiler.
- Visual Studio 2012 support added for C++ and C#.
- OSX 10.6 32-bit python is no longer supported.
- The next release, 2013.Jun, will be the last release to support SuSe 10.
- The documentation examples were accidentally left out of the last release on Windows. They can now be found in C:\Python27\OpenEye-2013.Feb.1\docexamples.
- Importing the python toolkits when there are spaces in the directory structure should now work properly.
- Added the toolkit package version to the openeye module, e.g., openeye.version == "2013.Feb.x" for this release.
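A version string of this shape could be used in a script to check for a minimum release. The sketch below is only an illustration: it works on plain strings such as "2013.Feb.1", does not import the OpenEye toolkits, and the helper names `parse_release` and `at_least` are hypothetical. The month has to be converted to a number because a plain string comparison would sort "Feb" before "Jan".

```python
# Three-letter month abbreviations as used in release names like "2013.Feb.1".
MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}


def parse_release(version):
    """Turn 'YYYY.Mon.N' into a sortable (year, month, patch) tuple."""
    year, month, patch = version.split(".")
    return int(year), MONTHS[month], int(patch)


def at_least(version, minimum):
    """Return True if 'version' is the same as or newer than 'minimum'."""
    return parse_release(version) >= parse_release(minimum)


print(parse_release("2013.Feb.1"))
print(at_least("2013.Feb.1", "2012.Oct.1"))
```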
I’ve just added the latest script for Vortex.
In previous scripts we have generated data using a local Java program, C program, PERL script, and SVL program. In this tutorial rather than have a local application generate the data we will use a web service.
There are more scripts on the Hints and Tutorial pages.
OpenEye has announced the release of OEDocking v3.0.1. This is a bug fix release to the FRED, HYBRID, and POSIT programs. Of note, the report generated by both FRED and HYBRID has been significantly improved with this release.
- The program dockreport has been renamed to DOCKING_REPORT
NEW FEATURES AND IMPROVEMENTS
- The formatting of the DOCKING_REPORT has been significantly improved and now includes:
- Added a protein interaction fingerprint
- Polar Surface Area (PSA)
- Improved the geometry detection for hydrogen bond protein constraints in FRED and HYBRID. These constraints should now be tighter.
- Stereo isomer detection in POSIT was not handling bridgeheads properly; this caused some non-stereo molecules to be identified as stereoisomers.
- Fixed a bug in FRED and HYBRID where clash detection between hydrogen bonding groups was occasionally too strict.
I’ve previously highlighted the use of ChemDoodle web components to display molecular structures within a web page, and a recent publication DOI by Henry Rzepa led me to explore some of the newer ways to render molecules within a web page without the use of applets or plugins.
Optibrium have just announced that StarDrop 5.3 is now available with many new features. The highlights include:
- Virtual Library Enumeration – The Nova plug-in module for StarDrop now has the added ability to quickly and easily enumerate a virtual library based on a template scaffold that you define with substitution points and variable fragments. You can sketch the groups to substitute at each point, select them from a user-defined or centrally administered library, or take them from a decomposition of another series using the R-group analysis tool in StarDrop
- Data visualisation - now allows you to apply interactive filters to your graphs and plots to quickly focus on the most interesting compounds. StarDrop now also supports the analysis of dates allowing you to explore variations of properties or scores with time
- Clustering - this new tool enables you to easily identify groups of similar compounds within a data set, based on either their structural similarity or properties
- Dataset Filtering - this helps you to remove compounds from a data set with unwanted sub-structures or property values. You can define any number of criteria with which to filter a data set
- Duplicate Removal - when combining compound data from multiple sources it’s common to end up with multiple copies of the same compound in a single data set. The duplicate removal tool makes it easy to find these and choose the entries that you want to keep.
- ADME QSAR – new model for predicting log([Brain]:[Blood]) (the old model remains available for consistency with previously calculated results)
I just noticed that version 1.4.1 of the Mobile Molecular DataSheet (MMDS) has just been submitted to the iTunes AppStore, and its notable new feature is the ability to select a datasheet and calculate structure-based properties. A new column is created for each selected property, and the calculation feature is applied to each row. The available properties currently include molecular weight/formula, log P, molar refractivity and topological polar surface area. The functionality is provided by the molsync.com webservice.
There is more information on the Cheminformatics blog.
There is a page of mobile science applications here.
The new SMARTCyp version 2.4 includes solvent accessible surface area (SASA) in the scoring function. SASA is computed using the 2DSASA algorithm from 2D coordinates.
A paper describing the new models and their predictive accuracy on nine CYP isoforms is available in Molecular Pharmaceutics DOI
MacVector Inc have released a free version of MacVector.
If you used a temporary trial license, then when the 21 days were up, MacVector would simply refuse to start unless you entered a new valid license code. With the release of MacVector 12.7 we have changed that behavior. Now, when the trial license (or any annual license) expires, MacVector will give you the option of continuing to work, but with reduced functionality. All of the functions in the Analyze menu become disabled, but you can still open, edit, save and print MacVector documents, or save MacVector files in other formats.
ichemlabs have announced the release of ChemDoodle Web Components 5.
ChemDoodle Web Components 5 is a massive update. The most notable addition is a Full Sketcher, for drawing multiple molecules, shapes and figures, in addition to the Single Molecule Sketcher already provided. iChemLabs Cloud services and the ChemDoodle JSON format have been updated and drastically improved. The entire codebase has been reoptimized and cleaned, doubling the performance in desktop browsers and more than quadrupling the performance in mobile browsers. All Canvases now handle managing multiple molecules and shapes. Many new additions have been added and dozens of bug fixes have been implemented. We will be unrolling our new proprietary options over the next month, but of course, everything is available for free today under the GPL license!
There is a tutorial for using ChemDoodle web components here.
I’ve just finished a review of the latest version of MOE from the Chemical Computing Group.
There are a number of new features that will be of particular interest to Mac users and I’ve included a few tips for using Marvin as the external 2D chemical drawing package.
There is a collection of software reviews here.
The Chemical Computing Group have announced the release of PSILO version 2012.11. PSILO is a protein structure database and visualization system that provides an easily accessible, consolidated repository for macromolecular and protein-ligand information. Some key features in PSILO include:
- 3D Interaction Query
- Pocket Similarity Search
- Project Standard Orientation
New and enhanced features in PSILO 2012.11 include: domain motif search, nonredundant BLAST summary report, automatic GPCR annotation and interactive protein:ligand interaction diagrams. PSILO offers research organizations a means to systematically track, register and search both experimental and computational macromolecular data. A web-browser interface facilitates searching and accessing public and private data.
KNIME 2.7 has been released.
KNIME now runs on Java 7 for Windows and Linux systems (Mac stays on Java 6). Eclipse update 3.7 increases stability on Mac and some Linux systems. BIRT 3.7 brings Open Office support among other new features.
JFreeChart nodes now have more setting options in the “General Plot Options” tab of their configuration window.
In R -> Local there are a number of new nodes to import:
- “Table to R” can read a KNIME table into R and output the R workspace.
- “R to Table” takes an R workspace and outputs a KNIME table.
- “R +Data to R” takes an R workspace and optional data input and outputs an R workspace.
- “R to R-View” takes an R workspace and outputs a KNIME view
There is a KNIME tutorial here
SMARTCyp 2.3 has been released with some additional improvements including:
- Improved energies for N-oxidations
- Empirical correction for unlikely N-oxidations of tertiary alkylamines
- A filtering functionality for excluding compounds with very low activation barriers to CYP-mediated oxidations
- A SMILES string can now be input directly on the command line using the -smiles flag.
Available as usual at http://www.farma.ku.dk/smartcyp
The science behind the improved N-oxidations and the empirical correction has also been published in a paper in Angewandte Chemie: DOI
A great collection of freeware tools provided by Michel Petitjean
- ARMS: Spatial Alignment with the RMS (Root Mean Square) method. (fixed pairwise correspondence)
- ASV: Analytical calculation of van der Waals surfaces and volumes. (or any union of spheres)
- CSR: The Combined SDM/RMS Algorithm for spatial alignment of two molecules. (pairwise correspondence computed)
- CYL: Minimal radius enclosing cylinder. Minimal radius circumscribed cylinder.
- DIVCF: Selects by clustering major conformations of a molecule in a set of its conformers.
- DOG: Docking Geometrically two molecules. (fixed pairwise correspondence)
- GRD: Computation of the Radius and Diameter of a molecular graph. (computes also the topological shape index)
- MCG: Optimal Partition (classification): numerical variables and non-euclidean spaces. The number of classes is computed.
- POP: Optimal Partition (classification): categorical variables. The number of classes is computed.
- POSE: Computes the RMSD between two ligand poses. No rotation translation is performed.
- QCM: Quantitative Chirality Measure of a conformer (graph automorphisms enumeration included)
- RADI: Computation of the Radius and Diameter of a spatial set. (computes also various other geometrical parameters)
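To make the POSE entry above more concrete, here is a minimal sketch of an RMSD between two poses computed without any rotation or translation. It is not Michel Petitjean's code; it assumes both poses list the same atoms in the same order (a fixed pairwise correspondence), and the function name and coordinates are invented for the example.

```python
import math


def rmsd_no_fit(coords_a, coords_b):
    """RMSD between two equally ordered sets of (x, y, z) coordinates.

    No superposition is performed: the poses are compared exactly as given.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two made-up poses of a two-atom "ligand", the second shifted by 1 along z.
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

print(rmsd_no_fit(pose_a, pose_b))
```

Because no fitting is done, a pose that is merely translated within the binding site still gives a non-zero RMSD, which is usually what is wanted when comparing docking poses in the same receptor frame.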
CLC bio is pleased to announce a new release of Molegro Virtual Docker, an integrated platform for computational drug design available for Windows, Linux, and Mac OS X. Molegro Virtual Docker offers high-quality protein-ligand docking based on novel optimization techniques combined with a user interface experience focusing on usability and productivity.
New features in version 5.5:
A new 'Energy Maps' tool provides volumetric visualization of protein force fields. This makes it possible to understand why a compound interacts with a given receptor, and may provide insights on how to improve the binding.
We also added a new execution mode in the Docking Wizard: 'Run Docking in Multiple Processes'. This makes it possible to run medium sized jobs on a local machine, while utilizing multiple CPU cores and even multiple GPU graphics cards. For large jobs on multiple machines, Molegro Virtual Grid should still be used.
The ray-tracer has been improved to more closely match the live 3D view output. This makes it possible to create high resolution renderings of the 3D view.
OpenEye is pleased to announce that the OpenEye Toolkits v2012.Oct have been released. This release includes the C++, Python, and .NET versions of the Toolkits
C++ examples build system changed to CMake for all supported platforms: Linux, Windows, and OSX.
This is a new release of the OpenEye Toolkits with versions of the following libraries:
Details of the changes to the individual libraries are here
Marvin from ChemAxon has been updated to version 5.11
New features and improvements
- Image I/O
- Recently added rendering options are now available to be set from MolPrinter API (Absolute label visibility, Peptide display type, R-group visibility, Any bond style, Lone pair rendering style, Charge rendering style). Documentation
- MSketch GUI
- A new "imageImportServiceURL=[URL]" program argument was added to the MarvinSketch application.
- MSketch applet
- A new "imageImportServiceURL" was added as an applet parameter.
- Graphical object handling
- When an MMidPoint object was set as an end point for an MPolyLine, getting the MMidPoint location caused a StackOverflowError.
- Document to Structure (d2s)
- Names broken over two lines with a hyphen (-) are now recognized.
- Names followed by a superscript text, for instance, a reference or footnote number (e.g., "aspirin11") are now recognized.
- Name to Structure (n2s)
- In some cases, such as "4-methylthiophenylmethyl", there is an ambiguity whether "thiophenyl" refers to a compound derived from thiophene or thiophenol. Name to Structure now gives priority to the thiophenol related compound interpretation; though, "thiophenyl" by itself will still be supported as thiophene derivatives.
- If R-group visibility was turned off and any of the bonds had label(s) to paint, an ArrayIndexOutOfBounds exception was thrown.
- Image I/O
- Display parameters of charge, lone pair, peptide could not be set for molexporter. The default values were charge "in a circle", lone pair "as line", peptide "three letter format". Image copy also used these values.
- MOL, SDF, RXN, RDF
- Aliphatic query properties of atoms with query string were not read from MDL formats.
- After importing Extended MOL files that contain superatom S-groups the orientation of S-groups could be changed.
- Atom containing both aliphatic and unsaturated query properties were exported incorrectly to MDL formats.
- SDF import returned structure with incorrect S-group embedding.
- SMILES T* option did not export all SDF fields, but only those which appeared in the first molecule.
- Molecule Representation
- Two superatom S-groups being each others' parents caused infinite loop. In these cases, now java.lang.IllegalStateException is thrown.
- Valence Check
- Cloning of BicyclostereoDescriptor in RxnMolecules threw java.lang.ArrayIndexOutOfBoundsException.
- Clean 2D
- Terminal methyl-group in phosphate-ester was cleaned incorrectly.
- Clean2D could not handle condensed adamantane derivatives. Forum topic
- Other (HBDA, Huckel Analysis, ...)
- The --pH command line option did not work in hydrogen bond acceptor-donor calculation.
- Structure Checker
- If fixer action was not defined, default fixer was not applied in structurechecker command line tool.
This version of ChemBioDraw released in August 2012 is the first release since Cambridgesoft became part of Perkin-Elmer and there are a significant number of changes. This is the first version to be released since the introduction of Mac OS X 10.7 and 10.8 and both are now officially supported. In addition the ChemDraw plugin is now supported in 64 bit mode and Microsoft Office 2011 is supported. I’ve written a brief review here.
I recently wrote a review of ForgeV10 from Cresset in which I imported the results into Vortex to do the analysis. There were, however, two issues with doing this. Firstly, interpretation of the 3D structures is sometimes difficult; this can be resolved by creating a 2D rendering of the structure. The other issue is trying to interpret the docking pose whilst looking at the analysis of the results in, say, a Vortex scatter plot.
I’m a great fan of SMILES notation (simplified molecular-input line-entry system) as a compact means of storing chemical structures, and whilst there are many tools for creating SMILES strings they often give different (but acceptable) results. Various algorithms for generating Canonical SMILES have been developed, including those by Daylight Chemical Information Systems, OpenEye Scientific Software, MEDIT, Chemical Computing Group and MolSoft LLC, but all use proprietary code. In the latest issue of Journal of Cheminformatics Noel O’Boyle describes the development of Universal SMILES and Inchified SMILES as implemented in Open Babel, an open source cheminformatics toolkit. DOI
Cresset have announced the formal release of sparkV10, the replacement for FieldStere.
- Updated molecular mechanics force field that uses a single analogue nitrogen atom and updates the field patterns for many functional groups including aromatic halides
- Added capability to read protein excluded volumes from pdb files
- Added new cluster algorithms for clustering of results
- Added option to edit reference molecules in the molecular editor
- Added capability to manage columns in the results table
- New optional module for scoring results using StarDrop models. This does not require access to a StarDrop server: simply place StarDrop model files in a directory and they are used automatically if you have the right license. The standard ADMET models that Optibrium have created are supplied, but it works equally well with any models created by StarDrop.
- Added fragment import option in database generator
- Added capability to rescore all results against a 3D QSAR model using Forge or Torch
- Added capability to search databases for a particular fragment or substructure
- Added option to delete entire clusters from results
- Added depth cue to 3D window
- Added a GUI interface for selecting a portion of a molecule and writing command line arguments
- Cleaner GUI with improved buttons
Users should note:-
SparkV10 completely replaces Cresset’s previous “FieldStere” application. If FieldStere is currently installed then it is recommended to uninstall the binary to avoid confusion over which application should be used to open FieldStere project files.
VIDA v4.2.0 has been released. This is an important update that offers many significant new features, is built on the most recent OpenEye Toolkits, and adds support for the Ubuntu platform. Among these new features is the ability to perform "telemodeling" by sharing interactive VIDA sessions over a network between multiple users at different locations. In addition, vast improvements have been made to the rendering engine to provide more vivid and realistic 3D graphics. The user interface has also been further streamlined for a more intuitive user experience and the new ability to export files as PDF documents enhances the off-line user experience as well. VIDA is available for download now. Existing licenses will continue to work. If a new license is needed, please contact your account manager or email email@example.com to request one.
Avogadro is a free, open source, cross-platform molecular editor designed for flexible use in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas. Packages are available for Windows, Linux and Mac OS X. The source code is available under the GNU GPLv2.
This release highlights a great deal of new features, including a built-in crystal library, crystallographic editing, building slabs / surfaces with arbitrary Miller planes, support for Abinit (and soon Quantum Espresso), searching for IUPAC names in PubChem, custom atomic colors and radii, and much more.
See the Release Notes: http://avogadro.openmolecules.net/wiki/Avogadro_1.1.0
What does Avogadro do?
- An intuitive "builder," including common fragments, downloading directly from PDB or PubChem, and peptide sequences
- Innovative "auto-optimize" tool which allows you to continue to build and modify, during molecular mechanics optimization
- Interfaces to many common computational packages
- Designed to help both educational users and advanced research
- Plugins that allow Avogadro to be extended and customized
- Well defined public API, library and Python bindings for development
- Embedded Python interpreter
- Translations available in 19+ languages
For more information: http://avogadro.openmolecules.net/wiki/
Vortex is an advanced data analysis package that understands chemistry, and its capabilities can be extended by the use of scripts. I’ve now created a Vortex script exchange that users can use to download or share scripts.
There are also a series of scripting tutorials here to provide a starting point for creating new scripts.
Hopefully these scripts will be valuable to you.
I recently wrote a review of ForgeV10 in which I imported the results into Vortex for analysis. This works fine; the only issue is that the resulting structures are 3D, which sometimes makes them difficult to interpret. This script uses OpenBabel to create SMILES which can be rendered as 2D images.
This is a review of ForgeV10, the latest offering from Cresset. Whilst it is a new product, those familiar with FieldAlign and FieldTemplater will recognise much of the functionality. ForgeV10 allows the scientist to use Cresset’s proprietary electrostatic and physicochemical fields to align, score and compare diverse molecules. It allows the user to build field based pharmacophores to understand structure activity and then use the template to undertake a virtual screen to identify novel scaffolds.
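The conversion step can be sketched outside Vortex as follows. This is not the Vortex script itself (Vortex scripts run inside the application); it is a standalone Python 3 illustration that assumes the Open Babel `obabel` command line tool is installed and on the PATH. The file names and function names are hypothetical. Open Babel infers the formats from the file extensions.

```python
import subprocess


def build_obabel_command(input_sdf, output_smi):
    """Build the obabel command that converts an SD file to a SMILES file."""
    return ["obabel", input_sdf, "-O", output_smi]


def convert_to_smiles(input_sdf, output_smi):
    """Run the conversion; raises CalledProcessError if obabel fails."""
    subprocess.run(build_obabel_command(input_sdf, output_smi), check=True)


# Show the command that would be run for a hypothetical results file.
print(" ".join(build_obabel_command("forge_results.sdf", "forge_results.smi")))
```

Calling `convert_to_smiles("forge_results.sdf", "forge_results.smi")` would then write one SMILES string per structure, which can be depicted in 2D.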
There is a compilation of software reviews here.
Screening Assistant 2 (SA2) is an open-source Java program dedicated to the storage and analysis of small to very large chemical libraries. SA2 stores unique molecules in a MySQL database and encapsulates several chemoinformatics methods, including provider management, interactive visualisation, scaffold analysis, diverse subset creation, descriptor calculation, sub-structure / SMARTS search, similarity search and filtering.
A recent publication describes it in detail: "Mining Chemical Libraries with Screening Assistant 2", Vincent Le Guilloux, Alban Arrault, Lionel Colliandre, Stéphane Bourg, Philippe Vayer and Luc Morin-Allory. DOI
CWM Global Search is an Internet search tool for scientists that want to search for chemical data on the Internet - it makes a federated search over many scientific databases on the Internet.
- Search the Internet by structure
- Find structures for synonyms, CAS Numbers, names
- Submit several compounds in one search - use SDFiles.
- Find biological effects of a compound
CWM Global Search presently searches more than 60 free chemical and pharma relevant databases -- containing more than 100 million pages which associate chemical structures with data.
License fee per year: 5 copies for 1000 Euro, single copy for 240 Euro. The limitation of the free version is that only a subset of result links can be opened. You will always be told how many hits a query finds in a given data source, which helps you decide whether to re-execute the query using the native user interface. Supports Internet Explorer, Chrome and Firefox on Windows and Safari on Mac computers.
From the latest issue of Journal of Cheminformatics
The work presented here details the Avogadro library, which is a framework providing a code library and application programming interface (API) with three-dimensional visualization capabilities; and has direct applications to research and education in the fields of chemistry, physics, materials science, and biology. The Avogadro application provides a rich graphical interface using dynamically loaded plugins through the library itself. The application and library can each be extended by implementing a plugin module in C++ or Python to explore different visualization techniques, build/manipulate molecular structures, and interact with other programs. We describe some example extensions, one which uses a genetic algorithm to find stable crystal structures, and one which interfaces with the PackMol program to create packed, solvated structures for molecular dynamics simulations. The 1.0 release series of Avogadro is the main focus of the results discussed here.
I’ve added CORINA to the alphabetical listing. CORINA is a fast and powerful 3D structure generator for small and medium sized, typically drug-like molecules. Its robustness, comprehensiveness, speed and performance make CORINA a perfect application to convert large chemical datasets or databases.
I was wondering when someone would use an iPad as the front-end to a fully featured modelling package running on a remote server. It looks like Wavefunction have done a pretty impressive job of taking their sophisticated Spartan computational chemistry package from the desktop to mobile devices.
iSpartan creates molecules as familiar 2D sketches, directly converts these into 3D structures, and calculates low energy conformations. Atomic and molecular properties, NMR and infrared spectra, molecular orbitals and electrostatic potential maps are available from a 5,000 molecule subset of the Spartan Spectra and Properties Database (SSPD). The database may also be searched by substructure. Properties, spectra and graphical models of molecules in the SSPD subset are available for examination.
iSpartan Server is an available add-on to the iSpartan app. iSpartan Server installs on a Windows or Macintosh computer and converts iSpartan from an application whose primary utility is sketching molecules in 2D and visualizing them in 3D, into an open-ended molecular modeling research tool providing access to the full Spartan Spectra and Properties Database (SSPD, currently ~170,000 molecules) and to the computational engines used to produce the data in the SSPD. For molecules not included in the database, connection to iSpartan Server supports calculation of structures, properties, and spectra for all user generated molecules from iSpartan running on the iPad, iPhone, and iPodTouch.
There is a listing of science apps for iOS here
Friday the 13th turned out to be a nightmare for OpenEye. It turns out that a mathematical operation in the licensing software failed, meaning users were unable to use certain versions of their software. To their great credit they delivered an update last night that resolved the problem.
VIDA v4.1.2 has been released. This is a very important bug fix release that enables continued use of VIDA after the licensing problem that was discovered on July 13, 2012.
To benefit from the display clarity of devices such as the New iPad, chemicalize.org’s chemical structure image generator (built using Marvin) now handles high DPI displays (Retina displays), that also includes iPhone 4 and most new Android devices. Orientation change and touch events like drag, tap and swipe are enabled now.
POSIT - Ligand guided pose prediction
POSIT is designed to use bound ligand information to improve pose prediction. Using a combination of OpenEye approaches, including structure generation, shape alignment and flexible fitting, it produces a predicted pose whose accuracy depends on similarity measures to known ligand poses. As such, it produces a reliability estimate for each predicted pose.
The optimizer has been enhanced to produce better aligned structures in certain cases.
A memory leak in the optimizer was fixed; POSIT should now properly handle large streams of molecules. The -mcs flag is now turned off by default. In some cases, the mcs was taking far too long for no real benefit in pose prediction.
I just noticed ChemDoodle web components have been updated
I was at the Cresset Science Meeting last week and heard about the plans to update their comprehensive suite of drug discovery and design computational tools.
Together with some interesting updates to the tools, the suite has undergone something of a makeover. All of the software tools have been renamed using a “Fire” theme and refocussed on specific user needs rather than on the software capabilities. The renaming will not be complete until September, so in the interim the links on some of the download pages still point to the originally named application.
TorchV10lite is a free 3D molecule viewing, editing and drawing application that shows your molecules in 3D overlaid with field patterns generated using their proprietary field technology together with 2D structure and physicochemical properties. It is the replacement of FieldView.
TorchV10 is a powerful design and 3D SAR tool for medicinal chemists. It is used to take leaps in structural design by identifying compounds with similar fields but different 2D chemical structures while maintaining or improving biological activity. It is the replacement for FieldAlign and due for release very soon.
SparkV10 is a powerful way of generating novel and diverse structures for your project. sparkV10 uses Cresset’s field technology to find biologically equivalent replacements for key moieties in your molecule, enabling you to find new structures in new chemical space. You can then use calculated physicochemical properties to filter and select the best designs. sparkV10 is the exciting replacement for FieldStere and due for release very soon.
The three applications above look to be intended for use by Medicinal Chemists whilst the remaining two applications are perhaps better suited to those more experienced in computational chemistry.
ForgeV10 takes advantage of Cresset’s patented ligand comparison method to align, score and compare molecules from a biological viewpoint, using the shape and electrostatic character of your molecules to create qualitative and quantitative 3D models of activity. forgeV10 combines FieldAlign and FieldTemplater in a single application.
BlazeV10 uses the shape and electrostatic character of known ligands to rapidly search large chemical collections for molecules with similar shape and electrostatic properties. It is installed and runs on a Linux cluster but is operated through a web-browser, enabling access from any platform and multiple locations.
Many molecular visualisation/modelling tools seem to assume the charge associated with an atom sits as a point at the centre of the nucleus. Whilst this makes the computation easy, it does not really reflect what the electrostatic surface “looks like”. Cresset has pioneered the use of field point descriptors to give a more accurate description of the charge around an atom and to enable better comparisons and visualisation. This has been shown to be particularly important when trying to understand some molecular interactions, such as aryl-aryl interactions, or when creating bioisosteric replacements.
Cresset now have an impressive suite of tools for drug discovery and I hope to review them in due course.
As part of an initiative to provide computational chemistry tutorials there is a competition now on.
Details for the competition
Requirements
Use freely available software tools and develop tutorials & models for workflows as requested in the challenges.
Criteria to Judge
- Quality of predictive models
- Statistical measures, held-out test sets
- Quality of workflows
- Are these state-of-the-art?
- Clarity of the tutorials
- Suitable for undergraduate courses
- Include principles of underlying science
- Include description of “common pitfalls”
- Include description of all preparative steps & required resource
- Ease of use of the tools
- Can they be tailored/amended if new insights emerge (project specific or general insights)?
- Innovation of the computational methods
- Challenge 1: Workflow to analyze HTS data & build models for further hit finding
- Challenge 2: Structure-based design workflow, new chemotypes
- Challenge 3: Structure-based design workflow, medicinal chemistry strategy
- Challenge 4: Call for innovative drug discovery workflows
OpenEye is pleased to announce that the OpenEye Toolkits v2012.Jun.1 have been released. This release features numerous important bug fixes as well as support for a new platform: Ubuntu 12.04 LTS. Please note that this will be the last release to support Visual Studio 2003, Python 2.5 on Windows, and OS X 10.5. More specific details are provided in the release notes below. The OpenEye Toolkits are available for download now. Existing licenses will continue to work. If a new license is needed, please contact your account manager or email firstname.lastname@example.org to request one.
RELEASE NOTES
This is a new release of the OpenEye Toolkits with versions of the following libraries:
- OEChem TK: 1.8.0
- OEDepict TK: 2.0.3
- OEDocking TK: 1.1.2
- Grapheme TK: 1.0.3
- GraphSim TK: 2.0.2
- Grid TK: 1.4.0
- Lexichem TK: 2.1.2
- MolProp TK: 2.1.3
- Omega TK: 2.4.6
- Quacpac TK: 1.5.2
- Shape TK: 1.8.2
- Spicoli TK: 1.1.2
- Szybki TK: 1.7.1
- Zap TK: 2.1.3
Changes in platform support:
- Added support for 64-bit Ubuntu 12.04 LTS. A reminder that 32-bit will not be supported on future Linux distributions.
- Last release to support Visual Studio 2003. Please upgrade to Visual Studio 2008 or 2010.
- Last release to support Python 2.5 on Windows. Please upgrade to Python 2.6 or 2.7.
- Last release to support OSX 10.5. Please upgrade to OSX 10.6 or 10.7.
A new version of StarDrop is now available. The new features include
- FieldAlign – this new module, using Cresset's molecular Field technology, provides a unique, 3-dimensional (3D) insight into the biological activity, properties and interactions of your compounds, helping to guide the design of novel, potent compounds with a high chance of success. There is a review of FieldView and FieldAlign here.
- R-Group analysis – analyse a chemical series to interactively visualise the impact of variations to R-groups, linkers, atoms or fragments on compound properties. Explore the SAR of your chemistry, identify new optimisation strategies and automatically enumerate the missing combinations
- ADME QSAR – new models for predicting 2C9 pKi, BBB category and P-gp category (the old models remain available for consistency with previously calculated results)
- Nova – now available with the ability to select compounds using a combination of properties and chemical diversity
Scaffold Hunter is a JAVA-based software tool for the analysis of structure-related biochemical data. It enables generation of and navigation in a scaffold tree hierarchy annotated with various data.
Optibrium have just announced the imminent release of the next version of StarDrop
The highlight of this new release is the addition of a new plug-in module that provides access to Cresset's FieldAlign™ technology, which offers a unique, 3-dimensional insight into the biological activity of your compounds. This new development is the first result of the technology exchange between Optibrium and Cresset, and adds another powerful tool to StarDrop that will enable you to understand the three-dimensional (3D) structure activity relationship (SAR) of your chemistry. Version 5.2 also introduces new enhancements of StarDrop's core capabilities, in particular a flexible tool for performing automatic R-group analysis. This new feature analyses a chemical series to interactively visualise the impact of variations to R-groups, linkers, atoms or fragments on compound properties, to help chemists further understand the SAR of their chemistry and identify new optimisation strategies.
Andrew Dalke has just released fmcs-1.0. It finds a maximum common substructure of two or more structures. Some of the features are:
- handles 1,000s of structures
- several different atom and bond comparison schemes
- modifiers to require ring bonds only match ring bonds, or that incomplete rings are not allowed in the MCS
- user-defined atom class typing through isotope labels (SMILES) or through an SD tag field
- uses an exact solution to find a maximum common substructure
- reports the current best solution if the timeout is reached
The software is distributed under the 2-clause BSD license and available for no charge from https://bitbucket.org/dalke/fmcs/downloads/fmcs-1.0.tar.gz
You must have the Python bindings to RDKit in order to run fmcs.
Usage details are in the README, shown also in the project page at: https://bitbucket.org/dalke/fmcs/
FITTED is a suite of programs to dock flexible ligands into flexible proteins. This software relies on a genetic algorithm to account for flexibility of the two molecules and location of water molecules, and on a novel application of a switching function to retain or displace water molecules and to form potential covalent bonds (covalent docking) with the protein side-chains.
The Suite includes many new features and implementations:
- FITTED is a suite of programs (FITTED, PREPARE, ProCESS and SMART)
- JAVA GUI for easy keyword file editing and docking
- Fully automated and flexible protein docking program
- Automated covalent docking
- Automatic protein preparation from pdb to mol2
- Multi-mol2 support for docking and ligand processing
- Uses an evolutionary algorithm
- Semi-flexible protein docking with flexible waters
- Has the ability to consider water molecules displaceable
- Keyword files are simpler than ever
- Support for Windows, Linux 32 and 64 bits, Mac OSX
FastROCS is an extremely fast shape comparison application, based on the idea that molecules have similar shape if their volumes overlay well and that any volume mismatch is a measure of dissimilarity. Running on the latest high performance graphics cards, it can process 2 million conformations per second on a Quad Fermi box.
If you want to find out more about the use of GPUs in scientific computing take a look at this podcast.
SMARTCyp 2.2 has been released including the following updates:
- One new energy rule for sulfur atoms double bonded to sp2 carbon atoms.
- Updated protonated amine SMARTS matching following analysis of a larger 2D6 data set.
- Faster predictions by rewriting the SMARTS matching code for the pharmacophores in the 2D6 and 2C9 models.
- Web site update with links to the 2C9 model paper, and prediction accuracy results on nine different isoforms.
http://www.farma.ku.dk/smartcyp/
forgeV10 takes advantage of Cresset’s patented ligand comparison method to align, score and compare molecules from a biological viewpoint.
It is designed to:
- Decipher complex SAR and communicate the results
- Design better molecules based on predictions you can trust
- Prepare detailed pharmacophores
- Virtually screen 10 000 compounds on your desktop
- Generate ADME and off target activity profiles.
An interesting blog entry on Noel O’Blog regarding capturing stereochemistry from 2D representations.
Anyone involved with capturing this sort of information will be familiar with the difficulty of interpreting stereochemical information faithfully. I have to say I always store a SMILES string in a database, and not just because it captures stereochemistry. It is a very compact way of storing chemical information; as a simple text string it can always be exported to a text editor, and after a little practice it becomes a very handy way to create SMARTS queries.
SMARTCyp has been updated to version 2.1.1. SMARTCyp is a method for predicting which sites in a molecule are most liable to metabolism by Cytochrome P450, a major contributor to oxidative metabolism.
The latest update now includes models for prediction of CYP2D6 and CYP2C9 specific metabolism.
I’ve just completed a review of CheS-Mapper.
CheS-Mapper (Chemical Space Mapper) is a 3D-viewer for chemical datasets of small molecules; a recent publication in the Journal of Cheminformatics describes the application (DOI: 10.1186/1758-2946-4-7), and more information is available on the wiki page. Whilst there are many applications for the visual analysis of data, very few provide the tools needed to handle chemical structures. CheS-Mapper is a Java application that runs under Mac OSX (I only tested Lion), is based on the Java libraries Jmol, CDK and WEKA, and utilizes OpenBabel and R; it provides an interesting means to explore chemical data sets.
There is a complete list of software reviews here.
Sdfchecker is a free inspection and manipulation program for SDFiles (.sdf). Summary of functions:
- Indicate number of structure records
- Indicate number of blank structure records
- Display list of Data Field names
- Remove blank structure records
- Split large files into smaller multiple files, a single random sized file, or containing a specified range of records
- Convert into individual MOL files
- Inspect for duplicate Data Field names within each record
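To give a feel for the kind of checks listed above, here is a minimal Python sketch (not Sdfchecker's own code) that counts records, flags blank structures and lists data field names. It assumes V2000-style records terminated by `$$$$`; the function names and the tiny sample file are purely illustrative.

```python
def split_records(sdf_text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def is_blank(record):
    """Treat a record as blank when its counts line reports zero atoms."""
    # Assumption: V2000 molfile layout, where line 4 is the counts line
    # and its first three characters hold the atom count.
    if len(record) < 4:
        return True
    try:
        return int(record[3][:3]) == 0
    except ValueError:
        return True


def field_names(record):
    """Collect data field names from header lines such as '> <Name>'."""
    names = []
    for line in record:
        if line.startswith(">") and "<" in line:
            start = line.index("<") + 1
            end = line.find(">", start)
            if end != -1:
                names.append(line[start:end])
    return names


# Tiny hand-made example: one methane record and one empty structure.
SAMPLE = "\n".join([
    "methane", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "M  END", "> <ID>", "cpd1", "", "> <MW>", "16.04", "", "$$$$",
    "empty", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <ID>", "cpd2", "", "$$$$",
])

records = split_records(SAMPLE)
print("records:", len(records))
print("blank structures:", sum(is_blank(r) for r in records))
print("fields in first record:", field_names(records[0]))
```

Real-world files can also contain V3000 blocks, where the atom count is stored elsewhere, so the blank-structure test would need extending for those.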
The Mobile Molecular DataSheet (MMDS) has been updated. Two major usability enhancements:
(1) Additional tool banks on the left and right side of the sketcher provide simplified drawing tools that are more familiar to users of desktop chemical drawing software.
(2) A tooltip system provides tips, live demonstrations and links to documentation.
I recently wrote a couple of Applescripts that use the Chemical Identifier Resolver (CIR), a web service that performs various chemical name to structure conversions, and it occurred to me that it should be possible to use this service to generate images for use as popups on a graph, in the same way that I’ve previously described using Flot and ChemSpider. The ChemSpider approach works well but relies on the structure already being in the ChemSpider database; for novel structures we need a service for generating the image from a chemical identifier. CIR provides a simple web service for doing exactly this: for example, submit a SMILES string and it can return a 2D image.
This tutorial shows how to create an interactive plot using Flot and CIR
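As a rough illustration of the idea, the Python sketch below builds the address of a CIR image for a given identifier. It assumes the publicly documented URL pattern `https://cactus.nci.nih.gov/chemical/structure/<identifier>/<representation>`; the helper name `cir_image_url` is made up for this example, and the tutorial's own JavaScript may do things differently.

```python
from urllib.parse import quote

# Assumed base address of the CIR service.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_image_url(identifier, representation="image"):
    """Build a CIR URL for an identifier such as a SMILES string or a name."""
    # Percent-encode every reserved character: SMILES can contain
    # '#', '=', '/', '(' and ')' which are not safe in a URL path.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


# Benzoic acid as SMILES; the result can be used as the src of an <img> tag.
print(cir_image_url("c1ccccc1C(=O)O"))
```

Because the function only builds a string, it can be tested without a network connection; fetching the image is left to the browser or to whatever plotting code displays the popup.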
From the KNIME newsletter
“…good news for our Mac Users! We have just released KNIME 2.5.4 which fixes issues caused by the latest Apple update of the Java environment. We are grateful to the very active KNIME community which has helped to identify and fix this problem.”
KNIME Desktop 2.5.4 can be downloaded from the download page (http://www.knime.org/download) or you can upgrade your existing KNIME installation by using the built-in update functionality available in the "File" menu
There is also a KNIME tutorial here
Virtual models for property Evaluation of chemicals within a Global Architecture (VEGA). Using the VEGA platform, you can access a series of QSAR (quantitative structure-activity relationship) models for regulatory purposes, or develop your own model for research purposes. QSAR models can be used to predict the property of a chemical compound, using information obtained from its structure. This version comes with some minor error fixes and with a new model (BCF Read-Across).
There is an interesting publication in Journal of Cheminformatics 2012, 4:7 doi:10.1186/1758-2946-4-7 describing CheS-Mapper.
CheS-Mapper (Chemical Space Mapper) is a 3D-viewer for chemical datasets with small compounds.
It can be used to analyze the relationship between the structure of chemical compounds, their physicochemical properties, and biological or toxic effects. CheS-Mapper divides large datasets into clusters of similar compounds and consequently arranges them in 3D space, such that their spatial proximity reflects their similarity.
The SVG support in OpenBabel has undergone significant improvements due to the brilliant efforts of Noel O’Boyle and Chris Morley, in particular the ability to colour a substructure within a molecule. This requires installation of the development version of OpenBabel at present.
I’ve added a movie to show it in action.
One of the critical activities of most drug discovery programs is the identification of novel leads; these hits can come from high throughput screening or fragment-based screening. There is however great interest in virtual screening, which allows the evaluation in silico of a vast number of compounds and the selection of a subset that have a greater chance of desired activity. The virtual screening can be achieved by searching using sub-structures or molecular descriptors, by docking potential ligands into the target protein and scoring the resulting docked pose, or by comparing with the shape and/or electrostatic map of a known ligand.
Shape-it is a tool developed by Silicos-it that aligns a reference molecule against a set of database molecules using the shape of the molecules as the alignment criterion. It is based on the use of Gaussian volumes as a descriptor for molecular shape, as introduced by Grant, J.A.; Gallardo, M.A.; Pickup, B.T. (1996) ‘A fast method of molecular shape comparison: a simple application of a Gaussian description of molecular shape’, J. Comp. Chem. 17, 1653-1666.
This script shows how to run shape-it from within Vortex, bringing in the shape matching scores for filtering and analysis.
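The Gaussian description of shape mentioned above can be illustrated with a few lines of Python. This is only a sketch of the general idea, not Shape-it's implementation: it keeps just the pairwise (first-order) overlap terms, does no alignment, and the amplitude `P = 2.7` and the way the Gaussian width is derived from an atomic radius are assumptions made for the example.

```python
import math

P = 2.7  # Gaussian amplitude (assumed value for this sketch)


def alpha(radius):
    """Width chosen so the Gaussian's volume equals the hard-sphere volume."""
    return math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / radius ** 2


def overlap(mol_a, mol_b):
    """Sum of pairwise Gaussian overlap integrals between two sets of atoms.

    Each atom is a tuple (x, y, z, radius) in Angstroms.
    """
    total = 0.0
    for xa, ya, za, ra in mol_a:
        for xb, yb, zb, rb in mol_b:
            a, b = alpha(ra), alpha(rb)
            d2 = (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
            # Closed-form integral of the product of two spherical Gaussians.
            total += P * P * math.exp(-a * b * d2 / (a + b)) * (math.pi / (a + b)) ** 1.5
    return total


def shape_tanimoto(mol_a, mol_b):
    """Tanimoto-style score built from the overlap volumes."""
    v_ab = overlap(mol_a, mol_b)
    return v_ab / (overlap(mol_a, mol_a) + overlap(mol_b, mol_b) - v_ab)


# Two carbon-sized atoms, and the same pair shifted by 1 Angstrom along x.
reference = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
shifted = [(x + 1.0, y, z, r) for x, y, z, r in reference]
print(shape_tanimoto(reference, reference))
print(shape_tanimoto(reference, shifted))
```

An alignment tool would search over rotations and translations of one molecule to maximise the overlap before reporting the score; here the coordinates are compared exactly as given.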
MolSoft have announced the release of ICM version 3.7-2c.
New features include Atomic Property Fields (APF). APF is a 3D pharmacophoric potential implemented on a grid. APF can be generated from one or multiple ligands and seven properties are assigned from empiric physico-chemical components (hydrogen bond donors, acceptors, Sp2 hybridization, lipophilicity, size, electropositive/negative and charge).
The 3D ligand Editor is a powerful new tool for the interactive design of new lead compounds in 3D. It allows you to make modifications to the ligand and see the effect of the modification on the ligand binding energy and interaction with the receptor.
Use AQUASITES to design chemicals based on their ability to displace or keep water molecules inside the ligand binding site of proteins. The first step is to identify water binding sites and then the second step is to estimate the free energy of water displacement for a particular ligand(s).
Protein Modelling: inside ICM there are many features for homology modelling and loop modelling. This new option can be used if you have a gap in your protein and you want to find loops in the PDB which fit the gap.
"Pipe-able" Scripting in ICM. New options to pipe icm commands and scripts. Easy way to write pipe-able scripts (see $ICMHOME/molpipe/*.icm). Easy way to add parallelism to unix/mac ICM scripts: fork with pipe option ($ICMHOME\molpipe*.icm)
I’ve just added a new Vortex script; this one uses a Perl script that is part of the excellent MayaChemTools.
Scripting Vortex Using OpenBabel
Scripting Vortex 2 Using filter-it
Scripting Vortex 3 Using cxcalc
Scripting Vortex 4 Using MOE
Scripting Vortex 5 Calculating similarities using OpenBabel
Scripting Vortex 6 Filtering compounds
Scripting Vortex 7 Using MayaChemTools
I’ve just added another Vortex script. In this script we will make use of the ability of filter-it to categorise input molecules into 1) a set of molecules that fulfil all criteria as defined in the filter definition file (passed molecules), and 2) a set of molecules that do not fulfil at least one of the defined filter criteria (failed molecules). The filter file defines the criteria for acceptable calculated physicochemical properties and also any substructures that should be included or excluded during the filtering. The filter file is a simple text file that users can define for themselves; there is a detailed explanation on the silicos-it website. They also provide several example filters: “Leadlike”, “Druglike”, “CMCLike” and “Clean”, which cleans up a file without imposing a “drug like” filter. It should be relatively straightforward for users to create their own filters; one could imagine a rule-of-3 filter that might be used in fragment-based screening approaches, or a toxicophore filter based on SMARTS shown to be implicated in a specific toxicity. It might also be possible to define project specific filters if a project requires a specific profile. If you need help it might be worth contacting Silicos-it.
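The passed/failed split described above is easy to picture in code. The Python sketch below is not filter-it and does not use its filter file format; it simply partitions molecules whose properties have already been calculated against a hypothetical rule-of-3 style set of upper limits. The property names, thresholds and example values are all assumptions for illustration.

```python
# Hypothetical rule-of-3 style upper limits (not a filter-it definition file).
RULE_OF_3 = {"mw": 300.0, "logp": 3.0, "hbd": 3, "hba": 3}


def partition(molecules, limits):
    """Split molecules into those meeting every limit and those failing any."""
    passed, failed = [], []
    for mol in molecules:
        ok = all(mol[prop] <= limit for prop, limit in limits.items())
        (passed if ok else failed).append(mol)
    return passed, failed


# Property values below are made up purely for illustration.
molecules = [
    {"id": "frag-1", "mw": 152.1, "logp": 1.2, "hbd": 1, "hba": 2},
    {"id": "lead-1", "mw": 412.5, "logp": 4.1, "hbd": 2, "hba": 6},
]

passed, failed = partition(molecules, RULE_OF_3)
print("passed:", [m["id"] for m in passed])
print("failed:", [m["id"] for m in failed])
```

A project-specific profile would just be another dictionary of limits; substructure rules based on SMARTS need a cheminformatics toolkit and are outside the scope of this sketch.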
I’ve mentioned Silicos-it in the past and I thought I’d highlight them again since they have had a major makeover, the website has moved and the tools have been updated and renamed.
Silicos-it has contributed its expertise to the chemoinformatics community by porting its source code into the open source domain. Examples include the spectrophore descriptors, the filtering program filter-it and the pharmacophore tool align-it.
Filter-it™ is a command-line program for filtering molecules with unwanted properties out of a set of molecules. The program comes with a number of pre-programmed molecular properties that can be used for filtering.
I used filter-it (previously called Sieve) in a Vortex script; I’ve rewritten the script and the tutorial to account for the name change.
Strip-it™ is a tool to extract molecular scaffolds according to predefined rules. These rules are based on the definitions of scaffolds as described by Bemis & Murcko (J. Med. Chem. 1996, 39, 2887), Pollock (J. Chem. Inf. Model. 2008, 48, 1304) and Schuffenhauer (J. Chem. Inf. Model. 2007, 47, 47).
Align-it™ is a pharmacophore-based tool to align small molecules. The tool is based on the concept of modeling pharmacophoric features by Gaussian 3D volumes instead of the more common point or sphere representations. The smooth nature of these continuous functions has a beneficial effect on the optimisation problem introduced during alignment.
Shape-it™ is a shape-based alignment tool that represents molecules as a set of atomic Gaussians. The software is based on the method described by Grant and Pickup (J. Phys. Chem. 1995, 99, 3503).
Spectrophores are one-dimensional descriptors generated from the property fields surrounding the molecules. This technology allows the accurate description of molecules in terms of their surface properties or fields. Comparison of molecules’ property fields provides a robust structure-independent method of aligning actives from different chemical classes. When applied to molecules such as ligands and drugs, Spectrophores can be used as powerful molecular descriptors in the fields of chemoinformatics, virtual screening, and QSAR modeling. The Spectrophore code was developed by Silicos, and donated to the OpenBabel project in July 2010.
I’ve added two new applications from Metamolecular to the alphabetical listing.
ChemVector™ offers a modern solution to the chemical structure imaging problem. Features
- Runs with all commonly-used browsers on Windows, Linux, Mac, and iPad. This includes Internet Explorer 6-9 in addition to Firefox, Google Chrome, and Safari.
- Renders structures directly from individual molfiles on a server, or as inline content.
- Renders chemical structure content directly from ChemDraw™ binary files (.cdx).
- Declarative syntax replaces <img> tags with analogous <object> tags, making it easy for both developers and designers to work with the resulting markup.
- Non-blocking implementation makes it possible to render dozens of structures on a single page while maintaining UI responsiveness.
- Structures can be magnified pre- or post-rendering with no pixelation.
ChemCore is the cheminformatics foundation of all of the Metamolecular products and services. Written in Java and cross-compilable to a number of target runtimes and platforms, ChemCore is both fast and flexible.
- Fast subgraph matching
- Powerful graph query capabilities
- Flexible, efficient graph traversals
- Fast file input/output
- A complete system of atomic weights and elemental properties
- Sensible handling of implicit hydrogens
- Molecule validation and correctness-checking
- Molecule transformations, including canonicalization and salt-stripping
- Extensively tested and documented
I was reading the announcements of new products from OpenEye and I thought I should update the listings.
AFITT from OpenEye is the only software to offer a fully automatic ligand fitting process that optimizes a real-space fit to density while keeping conformational strain to a minimum. It capitalizes on a combination of core technologies that OpenEye has developed, specifically conformer generation, shape potential, high quality small molecule structure minimization, and visualization. The key step, after finding the appropriate conformers and aligning them to density, is the implementation of a refinement that combines force field and shape potentials, via a series of adiabatic optimizations. The AFITT distribution includes both a GUI and a collection of command-line applications.
BROOD is a software application designed to help project teams in drug discovery explore chemical and property space around their hit or lead molecule. BROOD generates analogs of the lead by replacing selected fragments in the molecule with fragments that have similar shape and electrostatics, yet with selectively modified molecular properties. BROOD fragment searching has multiple applications, including lead-hopping, side-chain enumeration, patent breaking, fragment merging, property manipulation, and patent protection by SAR expansion.
FILTER is a very fast molecular filtering and selection application. It uses a combination of physical property calculations and functional group knowledge to remove undesirable compounds before they enter experimental or virtual screening. Undesirable properties may include: toxic functionalities, a high likelihood of binding covalently with the target protein, interfering with the experimental assay, and/or a low probability of oral bioavailability.
QUACPAC provides pKa and tautomer enumeration in order to get correct protonation states. It also offers multiple partial charge models (including MMFF94, AM1-BCC, and AMBER) that cover a range of speed and quality in order to allow appropriate charging for every end use. QUACPAC's approach to tautomeric enumeration is to provide multiple tautomeric states rather than one "correct" tautomer. Subsequent downstream processes are then used to identify the appropriate tautomeric form.
SZYBKI optimizes molecular structures with the Merck Molecular Force Field, either with or without solvent effect, to yield quality 3D molecular structures for use as input to other programs. Since the chemistry of molecular interactions is a matter of shape and electrostatics, it is impossible to consider either without reasonable 3D molecular structures. SZYBKI also refines portions of a protein structure and optimizes ligands within a protein active site, making it useful in conjunction with docking programs.
I’ve just posted the latest tutorial on scripting the chemically intelligent spreadsheet application Vortex; this tutorial shows how to use OpenBabel to provide similarity searching.
The full list of Vortex scripting tutorials is shown below.
More hints and tutorials can be found here.
This might be of interest.
Dotmatics is looking to expand the team working on Vortex, its data analysis platform. The candidate should have several years software development experience with Java and preferably with the Swing graphical user interface toolkit. The ideal candidate will have a degree or PhD in the life sciences, and will have experience with data visualisation and analysis techniques such as clustering. Experience with cheminformatics systems or statistical software, such as R, will be advantageous. Candidates will probably have experience working within the pharmaceutical/biotech sector or the life science software development industry.
The position will be based at the UK headquarters in Bishops Stortford (Herts, UK). We offer a competitive salary, benefits and a pleasant working environment at the Old Monastery site. Further information about the company and our software can be found at http://www.dotmatics.com.
This is the fourth tutorial on scripting Vortex, a chemically intelligent data visualisation package. In the previous tutorials we have looked at getting data from OpenBabel, sieve, and cxcalc; in this tutorial we will be using MOE as the compute engine. MOE from Chemical Computing Group is probably best known as a graphical user interface to a suite of computational chemistry tools. Whilst this is indubitably the means by which many users will interact with the program, it is worth finding out about the command-line tools that are available. These tools are often accessed by pipeline tools such as Knime to allow rapid processing of large files. CCG provides four very useful command-line tools; in particular, sddesc allows the calculation of some or all of the MOE molecular descriptors for each molecular entry.
The Vortex Scripts
Whilst Vortex has tools that allow you to do some analysis, and of course you can use the scripting facility to access statistical or model building packages like R, in this tutorial we will be using a model taken from the literature and implementing it within Vortex using a calculation field to construct the algorithm.
KNIME (Konstanz Information Miner) is a user-friendly and comprehensive open-source data integration, processing, analysis, and exploration platform. From day one, KNIME has been developed using rigorous software engineering practices and is used by professionals in both industry and academia in over 60 countries.
Release Date: December 1, 2012
Enhancements
- Enh 2933: Database Schema Browser for Database (Connection) Reader nodes
- Enh 2924: Database (Connection) Reader allows executing multi-line SELECT and non-SELECT queries
- Enh 2976: New Database dialog "Connection" tab more user friendly UI
- Enh 2952: Node-Annotations (multiline labels) replacing one-line labels underneath a node
- Enh 2882: Sort data in table view by clicking the column header
- Enh 2914: TableView supports Ctrl-C on a single cell
- Enh 2959: Tips & Tricks dialog is shown when KNIME starts
- Enh 2934: New editor action that allows to align nodes vertically (in addition to align horizontally)
- Enh 2928: Automatic checks for updates during startup (added command line argument "-checkForUpdates")
- Enh 2840: Missing Value node multiple column selection in Individual tab
- Enh 2876: Resolved Rename node name confusion: new name: Column Rename
- Enh 2975: Decision Tree View has zoom functionality
- Enh 2980: Weblog Reader is now able to read compressed files
- Enh 2974: File browsers in reader nodes (SDF, CSV, etc.) open with directory of currently selected file
- Enh 2907: XPath node can return missing value instead of empty string/NaN for non-matches
- Enh 2908: XPath node allows returning of attributes in a node set (multi-matches)
- Enh 2937: SubsetMatcher node allows mismatches
- Enh 2878: Add hidden debug option to initialize sorter memory service
- Enh 2883: Added ability to parallelize computation in ColumnRearranger
- Enh 2958: Added #clearHistory method for FileChooserHistory
- Enh 2964: Color chooser DialogComponent and SettingsModel is added
- Enh 2271: Upgrade of CDK integration (better renderer, SMARTS parsing) - part of community extensions
There is a KNIME Tutorial here
ChemAxon's Calculator (cxcalc) is a really useful command line program in Marvin Beans and JChem that performs chemical calculations using calculator plugins. There are a lot of calculations provided by ChemAxon (e.g. charge, pKa, logP, logD), and others can be added by writing custom plugins. Perhaps one of the most useful is the ability to calculate the acidic and basic pKa. Calculation of pKa is essential to get a reasonable hold on the LogD of a molecule. LogD is probably the most critical physicochemical property in drug discovery: it has a major influence on absorption, cell penetration, metabolism, CYP450 inhibition and induction, PGP transporter activity and activity at the HERG channel, and is often a critical component of any structure activity relationship.
These scripts make use of cxcalc to generate data columns in Vortex
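The link between pKa and LogD can be made concrete with the standard textbook approximation for a monoprotic compound, which assumes that only the neutral form partitions into octanol. The Python sketch below shows that relationship only; it is not how cxcalc computes logD, and the function names are illustrative.

```python
import math


def logd_acid(logp, pka, ph=7.4):
    """Approximate logD of a monoprotic acid (neutral form partitions only)."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_base(logp, pka, ph=7.4):
    """Approximate logD of a monoprotic base (neutral form partitions only)."""
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))


# An acid with pKa equal to the pH is half ionised, so logD drops by log10(2).
print(logd_acid(logp=2.0, pka=7.4))
# A base two pKa units above the pH is roughly 99% ionised at that pH.
print(logd_base(logp=2.0, pka=9.4))
```

The sketch shows why an error of one unit in a predicted pKa can translate into roughly one unit of error in logD for a largely ionised compound, which is the reason a reliable pKa calculation matters so much.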
This is the second page on scripting Vortex; on the first page I described how to use OpenBabel to calculate a limited selection of chemical properties. In this script we will use one of the brilliant tools from Silicos.
SIEVE is a program for filtering out molecules with unwanted properties. It is based on the Open Babel open source C++ API for rapid calculation of 45 different molecular properties.
The OSIRIS Property Explorer shown in this page is an integral part of Actelion's inhouse substance registration system. It lets you draw chemical structures and calculates on-the-fly various drug-relevant properties whenever a structure is valid. Prediction results are valued and color coded. Properties with high risks of undesired effects like mutagenicity or a poor intestinal absorption are shown in red, whereas a green color indicates drug-conform behaviour.
It can be downloaded from here (24MB); this version requires Mac OS X 10.6 or higher and OpenBabel 2.3.
Much will seem familiar to previous users of iBabel and the screenshots of the old version give a good overview of the capabilities, whilst the images below highlight a few of the new features.
The “Add title and index” option appends a title (default is Mol, but you can edit this in the adjacent text box) and an index number to multi-molecular files, e.g. Mol 1, Mol 2, Mol 3 etc. This is essential if you want to search files displayed in the “Viewer” since you need a unique identifier for each structure. In many cases the molecules will already have a molecule id.
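The effect of the option can be mimicked for a SMILES file with a few lines of Python. This is only a sketch of the idea and not iBabel's own code (which drives OpenBabel); the function name is hypothetical, and for simplicity any existing name on a line is discarded.

```python
def add_title_and_index(smiles_lines, title="Mol"):
    """Label each SMILES line as '<title> <n>', numbering from 1."""
    labelled = []
    index = 0
    for line in smiles_lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines so the numbering stays contiguous
        index += 1
        # Keep the SMILES (first token) and write the new title after a tab.
        labelled.append(f"{parts[0]}\t{title} {index}")
    return labelled


for row in add_title_and_index(["CCO", "", "c1ccccc1 benzene"]):
    print(row)
```

Each structure ends up with a unique identifier, which is what the search in the “Viewer” relies on.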
Another new feature with OpenBabel 2.3 is the ability to generate 2D and 3D coordinates.
Perhaps the biggest changes have come with the “Viewer”: by storing the table data in an array we can use some of the cool ObjC functions such as the continuously updating selection count and the live searching of the “Name” text field. To import records, identify the input file using the input button and then click the “Import” button.
The buttons highlighted in green allow the user to delete the highlighted row, delete all the “Selected” rows or clear all records completely. The selection can be modified using the buttons highlighted in pale blue.
For the other viewers, JMOL/JChemPaint are in the application bundle. ChemBioDraw needs to be in the Application folder but only works on some machines (something to do with only supporting 32-bit which I think we will have to wait for CambridgeSoft to address). Because of Java security issues Marvin has to be in the same file structure as the htm page; I think you only need to put an alias to Marvin in the Macintosh HD:Public folder or User:Public folder. The 2D and 3D radio buttons allow you to choose an appropriate display.
It also supports JME as the editor but you need to get a copy from Peter Ertl directly and put it in the Public folder.
The PChem button pulls structures from PubChem; this can either be a single structure or a list (here is an example caslist.txt you can download to try).
As you can see the list contains a mixture of systematic names, trivial names, drug names and CAS numbers but the smart people at PubChem sort all that out nicely.
The result is two files: output.smi, which contains the successful searches, and NoStructure.txt, which contains cases where no structure was found. You can then import the file to view the structures.
I’d be delighted to hear of any bugs (honest) or any suggestions for how iBabel might be improved.
There are a number of Safari Extensions described on this site that access similar services and with the help of Matt I'm happy to announce a new addition.
The Safari Extension for Opsin (download) allows the user to highlight a chemical name in a web page and then control-click to bring up a dropdown menu; click on "Display ... using Opsin" and a small window will open displaying the chemical structure. What is particularly nice is that in addition to providing the structure in png format the same web service also provides the chemical structure in SMILES, InChI and CML format. If you click one of the buttons at the bottom of the structure window the structure will be downloaded in the appropriate format. You can read more about this extension here.
There is a full listing of the Safari Extensions here.
A selection of extensions that should be useful for chemists.
Chemspider :- Displays structure of highlighted chemical/drug and links to ChemSpider page.
PubChem :- Search PubChem for the highlighted compound
eMolecules :- Search eMolecules for the highlighted compound
Chemicalize :- Submit the current URL to chemicalize.org
DrugBank :- Search DrugBank for the highlighted compound
- Instant JChem Personal (a new OS independent desktop application for working with chemical and non chemical data) and Marvin, a chemical editor and viewer suite, are free for all users
- All products are free for academic teachers and researchers - including the enterprise edition of Instant JChem
- Most products are free for freely accessible, non commercial websites