Blueprint for Scientific Visualization and Cheminformatics Analysis for Small Molecule Project Data
Dotmatics have just announced the release of Blueprint, a web-based visualization and scientific analytics application designed to help scientists working on small molecule discovery projects.
Those at the last Dotmatics user group meeting saw an early demo of this platform, a web-based, interactive data visualisation and analysis system with chemical intelligence, and now it has been released.
- Load datasets from Browser or from files (SD, SMILES)
- Visualize structures in interactive tables, grids, matrices
- Visualize data as scatter plots, bar charts, line charts or pie charts
- Calculate molecular properties, ligand profiles, ligand efficiencies
- Refine datasets by filtering on structure and/or properties
- Perform R-group and matched molecular pairs analysis
- Generate program metrics
- Share workspaces, datasets and analyses
- Send selections to Browser, Vortex
- Export results to files
I'm sure we will hear more at the Dotmatics UGM.
Novartis Open Source tools for Drug Discovery
I'm sure most readers of this site are aware of the Open-Source cheminformatics toolkit RDKit, which was first developed at Novartis. However, I wonder how many are aware of the other Open-Source tools that Novartis have supported.
You can read more about them here.
The Novartis Institutes for BioMedical Research (NIBR) is pioneering new informatics tools for drug discovery. We believe in the power of open-sourced, global collaboration for the greater good. Join us to help patients worldwide.
They are available on GitHub here.
They include Habitat, an object management system; OntoBrowser, a tool to manage ontologies and controlled terminologies; YAP, an extensible parallel framework written in Python using OpenMPI libraries; and GridVar, a jQuery plugin that visualises multi-dimensional datasets as layers organised in a row-column format.
SCI-RSC Workshop on Computational Tools for Drug Discovery
I'm delighted to report this meeting seems to be filling up fast.
All scientists working in drug discovery need tools and techniques for handling chemical information. This workshop offers a unique opportunity to try out a range of software packages for themselves with expert tuition in different aspects of pre-clinical drug discovery. Attendees will be able to choose from sessions covering data processing and visualisation; ligand and structure-based design, or ADMET prediction run by the software providers. All software and training materials required for the workshop will be provided for attendees to install and run on their own laptops and use for a limited period afterwards
Presentations from Optibrium / Cresset / Dotmatics /BioSolveIT/ Knime / ChemAxon
Open Drug Discovery Toolkit
Open Drug Discovery Toolkit (ODDT) is a modular and comprehensive toolkit for use in cheminformatics, molecular modeling etc. ODDT is written in Python, and makes extensive use of Numpy/Scipy.
Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field DOI.
The Open Drug Discovery Toolkit was developed as a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. ODDT reimplements many state-of-the-art methods, such as machine learning scoring functions (RF-Score and NNScore) and wraps other external software to ease the process of developing CADD pipelines. ODDT is an out-of-the-box solution designed to be easily customizable and extensible.
To install
Install a clean Miniconda environment, if you don't already have one.
Install ODDT:
conda install -c oddt oddt
Or you can use pip:
pip install oddt
Requirements
Python 2.7+ or 3.4+
OpenBabel (2.3.2+) and/or RDKit (2016.03)
Numpy (1.8+)
Scipy (0.13+)
Sklearn (0.18+)
joblib (0.8+)
pandas (0.17)
Skimage (0.10+) (optional, only for surface generation)
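As a rough illustration, the minimum versions listed above could be checked from Python before installing ODDT. This is only a sketch and not part of ODDT: the helper names (parse_version, meets_minimum, check_requirements) are made up for this example, and the distribution names used as keys (for instance scikit-learn for Sklearn) are an assumption about how the packages are named on PyPI. OpenBabel and RDKit are left out because the way they are packaged varies.

```python
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions taken from the requirements list above.
# Keys are ASSUMED PyPI distribution names.
MINIMUMS = {
    "numpy": "1.8",
    "scipy": "0.13",
    "scikit-learn": "0.18",
    "joblib": "0.8",
    "pandas": "0.17",
}


def parse_version(text):
    """Turn '1.10.2' into (1, 10, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '18rc1': keep 18, ignore the rest
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums=MINIMUMS):
    """Return {distribution name: 'ok' | 'too old (x.y)' | 'missing'}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = "too old (%s)" % installed
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(name, status)
```

Comparing versions as tuples of integers avoids the trap of string comparison, where "0.9" would sort after "0.13".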
Scaffold Hunter update
Scaffold Hunter is a chemical data organization and analysis tool that has been continuously enhanced since the start of its development in 2007. The platform-independent open-source tool was first released in 2009 and provided an interactive visualisation of the so-called scaffold tree, which is a hierarchical classification scheme for molecules based on their common scaffolds. A recent publication describes extensions that significantly increase the applicability for a variety of tasks DOI.
When I first opened the application I did not find it particularly intuitive; fortunately there is an online tutorial and sample datasets available.
Accessing ZINC supplier information
ZINC is a free database of commercially-available compounds for virtual screening. ZINC contains over 100 million purchasable compounds in ready-to-dock, 3D formats. Sterling and Irwin, J. Chem. Inf. Model, 2015. This is an invaluable resource for any type of virtual screening or for anyone looking to create a physical screening or fragment collection.
Once you have done the virtual screening you will rapidly realise that the really time-consuming and tedious part now lies ahead: finding out which vendors stock a particular molecule and then ordering it. Looking up the vendor details for individual compounds is extremely tedious, so this Vortex script may be very useful.
Many more scripts, iPython notebooks and tutorials can be found here.
ChEMBL 21 released
The release of ChEMBL_21 has been announced. This version of the database was prepared on 1st February 2016 and contains:
- 1,929,473 compound records
- 1,592,191 compounds (of which 1,583,897 have mol files)
- 13,968,617 activities
- 1,212,831 assays
- 11,019 targets
- 62,502 source documents
Data can be downloaded from the ChEMBL FTP site or viewed via the ChEMBL interface.
Please see ChEMBL_21 release notes for full details of all changes in this release.
Flagging potential aggregators in Vortex
Promiscuous inhibition caused by small molecule aggregation is a major source of false positive results in high-throughput screening. A recent, particularly valuable publication, Irwin, Duan, Torosyan, Doak, Ziebart, Sterling, Tumanian and Shoichet, J Med Chem, 2015, 58(17), 7076-7087 DOI, has collated over 12,000 organic molecules known to act as aggregators at concentrations used in screening campaigns, and provides a resource, Aggregation Advisor, that can be used to try to predict possible false positives. However, in many instances it would be unwise to submit proprietary information to the public web service. Potential aggregators are flagged based on calculated LogP >3 and/or similarity >0.85 to a known aggregator (using a path-based fingerprint). This script calculates xLogP using the algorithm provided by Dotmatics and then uses OpenBabel fast search to calculate the closest similarity to a known aggregator.
Full details of the Vortex script are here.
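The flagging rule described above can be sketched in plain Python. This is not the Vortex script: it assumes the logP value and a path-based fingerprint (represented here as a set of on-bit indices) have already been computed by some toolkit, and the names tanimoto and flag_aggregator are hypothetical. Similarity is taken to be the Tanimoto coefficient, which is an assumption; the two thresholds are those quoted above.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def flag_aggregator(logp, fingerprint, known_aggregators,
                    logp_cutoff=3.0, sim_cutoff=0.85):
    """Return (flagged, reasons, best_similarity) for one molecule.

    known_aggregators is an iterable of fingerprints (sets of on-bits)
    for molecules already reported as aggregators.
    """
    # Closest similarity to any known aggregator (0.0 if the list is empty).
    best = max((tanimoto(fingerprint, fp) for fp in known_aggregators),
               default=0.0)
    reasons = []
    if logp > logp_cutoff:
        reasons.append("logP > %.1f" % logp_cutoff)
    if best > sim_cutoff:
        reasons.append("similarity > %.2f" % sim_cutoff)
    return bool(reasons), reasons, best


# Toy data: the bit indices are invented for illustration only.
known = [{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {20, 21, 22}]
query = {1, 2, 3, 4, 5, 6, 7, 8, 9}  # shares 9 of 10 bits with the first entry

flagged, reasons, best = flag_aggregator(2.1, query, known)
print(flagged, reasons, round(best, 2))
```

Returning the reasons alongside the flag keeps the "and/or" nature of the rule visible, so a compound flagged only on logP can be treated differently from one that closely resembles a known aggregator.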
Importing Open Source Malaria Project data
The Open Source Malaria project is trying a different approach to curing malaria. Guided by open source principles, everything is open and anyone can contribute. To date a lot of people around the world have made contributions and the project is at a very exciting stage. Whilst everyone can see the compounds that have been made and the biological data, it is often spread over multiple web pages and can be tricky to link molecule with identifier with data. Over the last couple of months a significant effort has been put into populating a spreadsheet with all the information.
Whilst this is useful for viewing results, it is not ideal for trying to build predictive models. Vortex is a chemically intelligent data analysis and visualisation platform. This script provides one-click access to the OSM data and creates a workspace containing all the data, and since it is linked to the live spreadsheet you will always have access to the latest data.
Installing Open Drug Discovery Toolkit (ODDT)
A recent paper in J Cheminformatics, Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field DOI, described a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. Open Drug Discovery Toolkit is released on a permissive 3-clause BSD license for both academic and industrial use. ODDT’s source code, additional examples and documentation are available on GitHub.
To install ODDT on a Mac you first need to install the appropriate toolkits. The easiest way is to use Homebrew; I've written a page detailing how to do this here.
Once the toolkits are installed you can install ODDT using pip as described here.
ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery
Caleydo is an open source visual analysis framework targeted at biomolecular data. It has been described in a number of publications and I noticed that a recent project ConTour included chemical structures.
Large scale data analysis is nowadays a crucial part of drug discovery. Biologists and chemists need to quickly explore and evaluate potentially effective yet safe compounds based on many datasets that are in relationship with each other. However, there is a lack of tools that support them in these processes. To remedy this, we developed ConTour, an interactive visual analytics technique that enables the exploration of these complex, multi-relational datasets.
Christian Partl, Alexander Lex, Marc Streit, Hendrik Strobelt, Anne-Mai Wassermann, Hanspeter Pfister, Dieter Schmalstieg ConTour: Data-Driven Exploration of Multi-Relational Datasets for Drug Discovery IEEE Transactions on Visualization and Computer Graphics (VAST '14), to appear, 2014.
I’ve added Caleydo to the listing of data analysis tools.
CLC Chemistry Workbench updated
The CLC Drug Discovery Workbench has been updated to version 1.5.
The new features are described on the updates page and in a video.
Open PHACTS API update
The Open PHACTS API has been updated to include two new data sets and the corresponding API calls.
1) DisGeNet target-disease associations. These API calls use URI inputs that correspond to either diseases or targets (proteins or genes). The disease identifiers correspond to UMLS CUIs, MeSH IDs or ConceptWiki and can use several namespaces, e.g. http://linkedlifedata.com/resource/umls/id/C0004238, http://purl.bioontology.org/ontology/MSH/D001281, or http://www.conceptwiki.org/concept/index/095cb66f-76ef-41b5-a8ae-c39352e6007e
2) neXtProt nanopublications for tissue expression (PREVIEW mode) These API calls use URIs that correspond to either tissues or targets. The tissue identifiers correspond to the Caloha tissue ontology from neXtProt. These identifiers can use either the namespace from the neXtProt database (e.g. http://www.nextprot.org/db/term/TS-0564, will be operational next week) or the Caloha ontology (ftp://ftp.nextprot.org/pub/currentrelease/controlledvocabularies/caloha.obo#TS-0564, operational now).
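Because the same disease or tissue can be referred to under several namespaces, it can help to check which scheme a URI belongs to before passing it to an API call. The sketch below does only that: it uses the example URIs quoted above, makes no request to the Open PHACTS API, and the function name identifier_scheme, the prefix boundaries and the scheme labels are assumptions made for illustration.

```python
# Namespace prefixes cut from the example URIs above; labels are illustrative.
NAMESPACES = [
    ("http://linkedlifedata.com/resource/umls/id/", "UMLS CUI"),
    ("http://purl.bioontology.org/ontology/MSH/", "MeSH"),
    ("http://www.conceptwiki.org/concept/index/", "ConceptWiki"),
    ("http://www.nextprot.org/db/term/", "neXtProt term"),
    ("ftp://ftp.nextprot.org/pub/currentrelease/controlledvocabularies/caloha.obo#",
     "Caloha ontology"),
]


def identifier_scheme(uri):
    """Return (scheme_label, local_id), or (None, uri) if the namespace is unknown."""
    for prefix, label in NAMESPACES:
        if uri.startswith(prefix):
            return label, uri[len(prefix):]
    return None, uri


examples = [
    "http://linkedlifedata.com/resource/umls/id/C0004238",
    "http://purl.bioontology.org/ontology/MSH/D001281",
    "http://www.nextprot.org/db/term/TS-0564",
    "ftp://ftp.nextprot.org/pub/currentrelease/controlledvocabularies/caloha.obo#TS-0564",
]
for uri in examples:
    print(identifier_scheme(uri))
```

Splitting a URI into namespace and local identifier also makes it easy to see that the two tissue URIs above carry the same Caloha term, TS-0564.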
To reduce the barriers to drug discovery in industry, academia and for small businesses, the Open PHACTS Discovery Platform provides tools and services to interact with multiple integrated and publicly available data sources. To integrate this data, extensive cross-referencing of scientific concepts is needed across all databases.
Medicinal Chemistry Toolkit
The Medicinal Chemistry Toolkit app is a suite of resources to support the day to day work of a medicinal chemist. Based on the experiences of medicinal chemistry experts, we developed otherwise difficult-to-access tools in a portable format for use in meetings, on the move and in the lab. The app is optimised for iPad and contains calculator functions designed to ease the process of calculating values of: Cheng-Prusoff; Dose to man; Gibbs free energy to binding constant; Maximum absorbable dose calculator; Potency shift due to plasma protein binding.
The app has been designed in collaboration with the editors of the forthcoming book, The Handbook of Medicinal Chemistry: Principles and Practice, which will be published in November 2014, providing a comprehensive, everyday resource for a practicing medicinal chemist throughout the drug development process. The app is an ideal companion for the biannual MedChem Summer school run by the RSC.
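Two of the calculations the app covers are simple enough to sketch in a few lines. The code below is not taken from the app; it uses the textbook forms of the Cheng-Prusoff equation for a competitive inhibitor, Ki = IC50 / (1 + [S]/Km), and of the Gibbs relation ΔG = RT ln Kd, with function names chosen for this example.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def cheng_prusoff_ki(ic50, substrate_conc, km):
    """Ki from IC50 for a competitive inhibitor.

    ic50 and the returned Ki share one unit; substrate_conc and km
    must share a unit with each other.
    """
    return ic50 / (1.0 + substrate_conc / km)


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy in kJ/mol from a dissociation constant in mol/L."""
    return R * temperature_k * math.log(kd_molar) / 1000.0


def kd_from_delta_g(delta_g_kj_mol, temperature_k=298.15):
    """Dissociation constant in mol/L from a binding free energy in kJ/mol."""
    return math.exp(delta_g_kj_mol * 1000.0 / (R * temperature_k))


# IC50 of 100 nM measured with substrate at its Km gives Ki = 50 nM.
print(cheng_prusoff_ki(100.0, 10.0, 10.0))
# Free energy corresponding to a 1 nM dissociation constant at 25 C.
print(round(delta_g_from_kd(1e-9), 1))
```

A useful rule of thumb that falls out of the second function is that each tenfold change in Kd at room temperature corresponds to roughly 5.7 kJ/mol.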
SeeSAR 1.4
SeeSAR from BioSolveIT has just been updated. SeeSAR is intended as an interactive tool for designing/improving ligands for drug discovery. This update includes an option to highlight the neighbouring atoms that lead to a particular HYDE score; in the example below, the carbon in the ring that gets a pretty big red (unfavourable) score can be explained by the receptor desolvation penalty ascribed to the carbonyl oxygen. Other additions are stereo hardware support (as a first step supporting the polarized-glass type only), a screenshot feature, and an option to move labels out of the way for a better view.
There is a review of SeeSAR here.
CLC Drug Discovery Workbench
The latest beta of CLC Drug Discovery Workbench, v1.5 beta 4, is available for download. The workbench provides an integrated environment for drug discovery, with tools to explore and visualise protein targets and the ligands binding to them.
MOLECULE STRUCTURE VISUALIZATION
- Molecule 3D structure import: Mol2, SDF, PDB
- Direct download of PDB structures from NCBI
- Quick-style options including ball-n-sticks and molecular surfaces
- Custom visualization applied to selected atoms
- Save molecule visualizations on data
- Molecule tables with 2D depiction of molecules
CHEMICAL AWARENESS
- Generate molecule 3D structure from SMILES or 2D representation
- Automatic assignment of atom and bond properties
- Automatic binding site setup
- Chemical consistency check
- Lipinski’s rule of five check
STRUCTURE BASED DRUG DISCOVERY
- Binding pocket finder
- Easy, graphical protein target setup
- Fast track molecular docking
- Optimize ligand interactions in binding site
- Virtual screening
- Ligand binding inspection
- Calculate molecular properties
- Protein structure and binding site alignment
DataWarrior
DataWarrior is a data analysis tool that understands chemistry.
DataWarrior combines dynamic graphical views and interactive row filtering with chemical intelligence. Scatter plots, box plots, bar charts and pie charts not only visualize numerical or category data, but also show trends of multiple scaffolds or compound substitution patterns. Chemical descriptors encode various aspects of chemical structures, e.g. the chemical graph, chemical functionality from a synthetic chemist’s point of view or 3-dimensional pharmacophore features. These allow for fundamentally different types of molecular similarity measures, which can be applied for many purposes including row filtering and the customization of graphical views. DataWarrior supports the enumeration of combinatorial libraries as well as the creation of evolutionary libraries. Compounds can be clustered and diverse subsets can be picked. Calculated compound similarities can be used for multidimensional scaling methods, e.g. Kohonen nets. Physicochemical properties can be calculated, structure activity relationship tables can be created and activity cliffs be visualized.
Installing ACPC on a Mac
One of the advantages of using a Mac for science is that you can often make use of the UNIX underpinnings of Mac OSX to access programs written for Linux.
A recent publication in Journal of Cheminformatics caught my eye. Screening of molecules using electrostatics is usually a very time-consuming process, but this publication describes an interesting and very quick way to screen molecules.
A rotation-translation invariant molecular descriptor of partial charges and its use in ligand-based virtual screening Francois Berenger, Arnout Voet, Xiao Yin Lee and Kam YJ Zhang Journal of Cheminformatics 2014, 6:23 doi
I’ve written instructions for how to install ACPC under Mac OSX.
A Review of Forge V10.2 on the New MacPro
Now that I have my new MacPro I thought it might be interesting to try out a couple of the software packages that I’ve previously reviewed. ForgeV10 allows the scientist to use Cresset’s proprietary electrostatic and physicochemical fields to align, score and compare diverse molecules. It allows the user to build field based pharmacophores to understand structure activity and then use the template to undertake a virtual screen to identify novel scaffolds. I’ve previously reviewed ForgeV10 and, as it was formerly known, FieldAlign, so I’m going to focus on the support for multiple processors and a few of the new features.
There is a compilation of software reviews here.
Bringing Open Source to Drug Discovery
I gave a talk at the RSC 25th Symposium on Medicinal Chemistry in Eastern England meeting last week entitled “Bringing Open Source to Drug Discovery”.
The slides and pages of links are available here.
I also captured the laptop screen of the demo which I’ve now put on YouTube.
https://www.youtube.com/watch?v=sG9vDIfp0NE&feature=youtu.be
The aim was to show which open source tools are available and how they can be integrated into proprietary tools using scripting. Many of the scripts are available on the hints and tutorials page.
Porting of BUDE (Bristol University Docking Engine) to OpenCL.
A recent publication “High Performance in silico Virtual Drug Screening on Many-Core Processors” DOI describes porting BUDE (Bristol University Docking Engine) to OpenCL.
Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single NVIDIA GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from NVIDIA and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets.
BUDE is now one of the fastest HPC applications ever developed and nicely demonstrates the portability of OpenCL across different architectures.
There is a list of GPU accelerated applications here.
TB Mobile Updated
TB Mobile from Collaborative Drug Discovery has been updated.
TB Mobile makes available a set of molecules with activity against Mycobacterium tuberculosis, and known targets available in CDD. It links to pathways (biocyc.org), genes (tbdb.org) and literature (PubMed).
The latest update adds support for iOS7, and adds further compounds and molecular targets. So now there are 96 unique targets and 805 compounds. There is also major new functionality including interactive clustering, personal favourites, target prediction and exporting capabilities.
Schrodinger Small Molecule Drug Discovery Suite Updated
The Schrodinger Small Molecule Drug Discovery Suite was updated over the weekend. This is a major update that brings in a host of new features and improvements.
Maestro Graphical Interface
Improved flexible ligand superposition
Additional graphics settings
Real-time antialiasing
Real-time ambient occlusion, outlines, and cartoon shading effects
Multivariate ranking in the Project Table
Simultaneously maximize or minimize up to four property values, and rank entries based on the optimization
Date Created and Date Modified fields automatically generated in the Project Table
Workspace responsiveness of atom labels is up to 2.5x faster
Click and drag to rearrange atom, measurement, and adjustment labels in the Workspace
Support for bond labels
Installed scripts and Tools menu items now searchable in the Task Tree
Significant improvements to the Property Calculation interface in the project facility
Simultaneously calculate multiple properties
Additional 2D properties now available: AlogP, #Hbond acceptors, #HBond donors, #rotatable bonds, polar surface area, molar refractivity, and polarizability
Ligand Docking
Ligand efficiencies are now calculated from the DockingScore instead of the GlideScore
Generate per-residue interaction energies in Virtual Screening Workflow (VSW) for visualization
New server mode in Glide Ligand Designer enables near real-time interactive docking (Glide Ligand Designer Script)
Pharmacophore Modeling
Performance improvements to Phase database operations, including faster deletion and insertion of ligands
Automatic restart of Phase database subjobs
Field-Based QSAR
Use QM-calculated fields in 3D QSAR (command line only; phasefqsar script)
phasefqsar script generates Jaguar input files for computing QM electrostatic fields for use in 3D QSAR
Molecular Dynamics
Monitor secondary structure elements over the course of the trajectory (Simulation Interactions Diagram; SID)
Quantum Mechanics
New interface to compute thermodynamic properties for reactions
New faster TDDFT algorithm and graphical interface
Compute Raman intensities
Several improvements to the results script
Jaguar pKa displays the computed pKa as an atom label by default
Heat of formation graphical interface now supports bromine and iodine
Improved numerical stability of the 1st and 2nd derivatives of the D3 correction
Increased utility of canonical.py script
Script acts on a group of isomers and skips structures with unique stoichiometries
Protein X-Ray Refinement
Optionally set hydrogen B-values
Workflows & Pipelining
Includes the latest version of KNIME (v2.9)
Many new features including a Send Email node and ability to save workflows under different names; see http://tech.knime.org/whats-new-in-knime-29 for a complete list of new features
Use any Glide simulation option in the Glide Ligand Docking node
Employ a specific template in the Prime Build Homology Modeling node
Import ungrouped structures to PyMOL from Run PyMOL node
Job Control
Improved fault tolerance
Improved handling of suspended jobs in queueing systems
There are also updates to the Biologics Suite and the Materials Science Suite.
A review of FAst MEtabolizer (FAME)
Whilst much computational work is undertaken to support library design, virtual screening, hit selection and affinity optimisation, the reality is that the most challenging issues to resolve in drug discovery often revolve around absorption, distribution, metabolism and excretion (ADME). Whilst we can measure the levels of parent drug in various media, tracking metabolic fate can often be a considerably more difficult proposition requiring significant resources. For this reason, prediction of sites of metabolism has become the subject of current interest.
FAME DOI is a collection of random forest models trained on a comprehensive and highly diverse data set of 20,000 small molecules annotated with their experimentally determined sites of metabolism taken from multiple species (rat, dog and human). In addition dedicated models are available to predict sites of metabolism of phase I and II processes.
FAME offers a high performance prediction of sites of metabolism mediated by a wide variety of mechanisms.
The full review is available here.
There is a list of software reviews here.
ROCS Updated
OpenEye have just announced that the virtual screening tool ROCS v3.2 has been released.
Several noteworthy features have been added to this version including a -subrocs option that can drastically improve substructure alignments. Also included is an application rocs-report that uses our 2D depiction technology to make pdf reports of hitlists displayed with 2D similarity, shape and color overlaps, as well as property histograms. Substantial upgrades have been made to vROCS. An improved sketcher now highlights unspecified stereochemistry in atoms and bonds in query structures, and requires the user to correct any unspecified stereochemistry.
ROCS is available for download here.