Rescoring Docking using RF-Score-VS
A little while back I described a docking workflow including a rescoring script for Vortex, so I thought it might be useful to include this on a separate page.
Recently, machine-learning scoring functions trained on protein-ligand complexes have shown significant promise. An example is RF-Score-VS, trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets DOI.
Our results show RF-Score-VS can substantially improve virtual screening performance: RF-Score-VS top 1% provides 55.6% hit rate, whereas that of Vina only 16.2% (for smaller percent the difference is even more encouraging: RF-Score-VS top 0.1% achieves 88.6% hit rate for 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results.
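To make the "top 1% hit rate" figures concrete, here is a minimal Python sketch of how such a number can be computed from a ranked screening list. The function name and the toy data are my own assumptions for illustration and are not taken from the paper or from the RF-Score-VS code. Note that Vina scores are better when more negative, whereas a predicted affinity is better when higher, hence the `higher_is_better` switch.

```python
def hit_rate_at_top(scores, is_active, fraction, higher_is_better=True):
    """Fraction of actives among the best-scoring `fraction` of the molecules."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    # Rank molecules from best to worst score
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0],
                    reverse=higher_is_better)
    # Always look at at least one molecule
    n_top = max(1, int(len(ranked) * fraction))
    top = ranked[:n_top]
    return sum(1 for _, active in top if active) / n_top


# Made-up scores for ten molecules (higher = better) and their activity labels
toy_scores = [9.1, 8.7, 8.2, 7.9, 6.5, 6.1, 5.8, 5.2, 4.9, 4.0]
toy_is_active = [True, True, False, True, False, False, True, False, False, False]

print(hit_rate_at_top(toy_scores, toy_is_active, 0.2))  # top 20%: 2 of 2 are active
print(hit_rate_at_top(toy_scores, toy_is_active, 0.4))  # top 40%: 3 of 4 are active
```

For Vina-style energies you would call the same function with `higher_is_better=False`.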
Binaries for RF-Score-VS are available at https://github.com/oddt/rfscorevs_binary.
The full details of the Vortex script are here.
Mobile Science Apps
The new Mobile Science site has been running for a couple of months now and is getting an increasing number of visitors.
The top 10 most viewed apps are:
Elemental, the Dotmatics chemistry sketch utility provided at no cost.
Wolfram|Alpha, the world's definitive source for instant expert knowledge and computation.
ChemSpider, which allows you to search the ChemSpider chemical database, provided by the RSC.
PocketCAS: Mathematics Toolkit, the most advanced mathematics application for iPhone and iPad.
ElementalDB, which searches within the 1.4M compound ChEMBL 19 dataset locally on your iPad.
MoleculAR Viewer, an Augmented Reality app that allows you to visualize and interact with molecules.
MestReNova, designed for NMR data analysis productivity and flexibility anywhere.
CalcKit, which lets you create your own personalized calculators.
ChemTube3D, an Open Educational Resource containing interactive 3D animations and structures.
WebMO Molecule Editor, which allows users to build and view molecules in 3-D, visualize orbitals and symmetry elements.
The most searched categories are Chemistry, Medical and Biology.
Feel free to let me know of any apps I've missed or that have been updated recently.
19th annual KDnuggets Software Poll
The results of the 19th annual KDnuggets Software Poll are now in. Continuing the trend of the last few years, Python keeps expanding its user base and is now up to 66%. Since a couple of the other options are also Python-based, this could be an underestimate.
There is more detailed analysis on the website. Interestingly Python seems to be the only programming language that is increasing in use.
Workshop on Computational Tools for Drug Discovery
In many companies/institutions/universities new arrivals are presented with a variety of desktop tools with little or no advice on how to use them other than "pick it up as you go along". This workshop is intended to provide expert tutorials to get you started and show what can be achieved with the software.
The tutorials will be given by a series of outstanding experts: Christian Lemmen (BioSolveIT), Akos Tarcsay (ChemAxon), Giovanna Tedesco (Cresset), Dan Ormsby (Dotmatics), Greg Landrum (Knime) and Matt Segall (Optibrium). You will be able to install the software packages on your own laptop, together with a license that allows you to use them for a limited period after the event.
Registration and full details are here.
PLUMED Version 2.5 (Dec 19, 2018)
PLUMED is a plugin that works with a large number of molecular dynamics codes. It can be used to analyze features of the dynamics on-the-fly or to perform a wide variety of free energy methods. PLUMED can also work as a command line tool to perform analysis on trajectories saved in most of the existing formats.
Huge number of changes.
PLUMED: A portable plugin for free-energy calculations with molecular dynamics DOI
OpenEye Applications v2018.Nov released
OpenEye announced the release of OpenEye Applications v2018.Nov. These applications include the usual support for Linux, MacOSX, and Windows, download here.
Supported versions of Mac OSX: 10.11, 10.12, 10.13, 10.14.
- OpenEye Applications are now released as an applications package. It includes all the OpenEye Applications suites except for VIDA and AFITT.
- ROCS is now built on top of Shape TK 2.0.
- QUACPAC now includes improvements to the tautomer functionality.
ORCA Version 4.1 released
With the release of ORCA 4.1, the forum and download site have moved to a new server at the Max Planck Institute fuer Kohlenforschung, where the ORCA team now has its home base. They are now at https://orcaforum.kofo.mpg.de.
ORCA is an ab initio quantum chemistry program package that contains modern electronic structure methods including density functional theory, many-body perturbation, coupled cluster, multireference methods, and semi-empirical quantum chemistry methods. Its main field of application is larger molecules, transition metal complexes, and their spectroscopic properties. DOI.
List of new features for ORCA 4.1:
SCF/DFT
- B97M-V, wB97M-V, wB97X-V plus various D3 variants of B97 functionals
- Simple input keywords for DSD-BLYP, DSD-PBEP86, and DSD-PBEB95
- CPCM analytic Hessian
- DLPNO-double hybrid DFT including gradient
- SymRelax option in %method
Semiempirical methods
- XTB method of Grimme et al.
Coupled cluster
- Iterative solution of the full (T) equations for DLPNO-CCSD(T)
- Open shell DLPNO-CCSD density and spin density matrices
- Full DLPNO-MP2 gradient
- CIM (Cluster in molecules) Implementation with MP2, CCSD(T), DLPNO-MP2 and DLPNO-CCSD(T)
- IP and EA coupled cluster methods and their DLPNO variants
- STEOM-CCSD for open shells
- SOC between bt-PNO-STEOM and STEOM states
- Improved Multilevel implementation including multilevel DLPNO-IP
- F12-Triples scaling for RHF canonical CCSD(T) based on the CCSD/ CCSD-F12 ratio
Multireference
- New CASSCF SuperCIPT converger is reliable and efficient.
- New options for final orbitals to find partner orbitals for the chosen active space e.g. bonding / anti-bonding partners.
- MC-RPA (Multiconfigurational random phase approximation)
  - AO driven integral direct for calculations on larger molecules
  - Fock matrix -> conventional, direct, RIJ/COSX
  - MPI parallel
  - NTOs for visualizing transitions
- Checking stability of state specific CASSCF wave functions by orca_mcrpa
- Dynamic correlation dressed (DCD-CAS) method with inclusion of relativistic effects (SOC, spin-spin, magnetic fields)
- CASSCF RIJCOSX allows two separate auxiliary basis sets
- CASCI/NEVPT2 protocol for XAS and RIXS
Optimization
- Nudge elastic band method to locate transition states
- Enabled 3-dimensional relaxed potential energy surface scan
- Improvement of redundant internal coordinate generation
- Faster and smoother convergence for 3-dimensional systems and embedded cluster models
- Intrinsic reaction coordinate (IRC) following
- Swart model Hessian (good for weak interactions)
Molecular Dynamics
- MD simulations can now use Cartesian, distance, angle, and dihedral angle constraints.
- The MD module now features cells of several geometries (cube, orthorhombic, parallelepiped, sphere, ellipsoid), which can help to keep the system inside of a well-defined volume.
- The cells can be defined as elastic, such that their size adapts to the system. This makes it possible to run simulations under constant pressure.
- Ability to define regions (subsets of atoms) enables applications such as thermostating different parts of the system to different temperatures (cold solute in hot solvent, temperature gradients, ...)
- Trajectories can now be written in XYZ and PDB file format.
- A restart file is written in every simulation step. Simulations can be restarted to seamlessly continue.
- The energy drift of the simulation is now displayed in every step.
- The MD module now works with a broader range of methods (semiempirics, ECPs, QM/MM).
- Fixed a bug in the time integration of the equations of motion which compromised energy conservation.
Spectroscopic properties
- orca_pnmr module tool to calculate paramagnetic NMR spectra
- NMR chemical shifts with RI-MP2 and double hybrid DFT including GIAOs, spin-component scaling and CPCM
- NMR Spin-Spin coupling in calculations with DFT/HF
- NMR with ZORA
- Maximoff-Scuseria correction for the kinetic energy density in GIAO-based calculations with meta-GGA functionals
- Exact and gauge invariant transition moments and approximate decomposition into dipole, quadrupole etc terms in all modules.
- PNO-ROCIS method for more efficient X-ray absorption calculations
- IP-ROCISD for high spin ROHF references
- TD-DFT:
- Transient spectra (excited state absorption) for CIS/TDA
- Triplet gradients (with RIJ, COSX and all) for all cases.
- Spin orbit coupling (including CPCM) and gradients
- Root following scheme for optimization
- Slow term to correct energy of relaxed excited state
- Full TD-DFT with double hybrids
- ESD module to calculate spectroscopic properties
- Vibrationally resolved absorption spectra including Duschinsky rotation and/or vibronic coupling.
- Fluorescence and Phosphorescence rates with same options.
- Resonance Raman spectra with the same options
- works with CIS/TDDFT, ROCIS, CASSCF and EOM/STEOM.
- Seven different schemes for obtaining an excited state PES and five different choices of coordinate systems
Analysis tools:
- Open Shell LED
- Dispersion interaction Density plots
- LED for DLPNO-MP2
- LED for the frozen state
- Update of AIM interface
- NBO 7 compatibility (i4)
Miscellaneous
- Compound method (Infrastructure, plus W2.2, W1, G2(MP2), G2(MP2-SVP), G2(MP2-SV) methods)
- Property file (additional properties, plus new infrastructure)
- Decomposition of correlation energy for canonical RHF CCSD energies to singlet-triplet pairs
- Additional EP2 extrapolation schemes using RI-MP2 and DLPNO-MP2 methods as cheap methods (request from forum)
- Lanthanide new def2 basis sets
- def2-XVP/C auxiliary basis sets for Ce-Lu by Chmela and Harding.
- Robust Second order optimizer for localized orbitals
- Added a few basis sets.
DIRAC18 released
The DIRAC program computes molecular properties using relativistic quantum chemical methods. It is named after P.A.M. Dirac, the father of relativistic electronic structure theory.
It can be downloaded from the Zenodo repository.
New features are described here.
DIRAC, a relativistic ab initio electronic structure program, Release DIRAC18 (2018), written by T. Saue, L. Visscher, H. J. Aa. Jensen, and R. Bast, with contributions from V. Bakken, K. G. Dyall, S. Dubillard, U. Ekström, E. Eliav, T. Enevoldsen, E. Faßhauer, T. Fleig, O. Fossgaard, A. S. P. Gomes, E. D. Hedegård, T. Helgaker, J. Henriksson, M. Iliaš, Ch. R. Jacob, S. Knecht, S. Komorovský, O. Kullie, J. K. Lærdahl, C. V. Larsen, Y. S. Lee, H. S. Nataraj, M. K. Nayak, P. Norman, G. Olejniczak, J. Olsen, J. M. H. Olsen, Y. C. Park, J. K. Pedersen, M. Pernpointner, R. Di Remigio, K. Ruud, P. Sałek, B. Schimmelpfennig, A. Shee, J. Sikkema, A. J. Thorvaldsen, J. Thyssen, J. van Stralen, S. Villaume, O. Visser, T. Winther, and S. Yamamoto (available at https://doi.org/10.5281/zenodo.2253986, see also http://www.diracprogram.org).
Most popular Python IDE, Editors
I always keep an eye out for the polls on KDnuggets. The latest one looks at Python editors and IDEs; over 1900 people took part and the results are shown below (users could select up to 3). There is more detail in the linked page.
I've become a great fan of Jupyter, and not only for Python.
Wolfram|Alpha
Wolfram|Alpha has been updated to include support for the iPhone XS Max and bug fixes.
Remember the Star Trek computer? It's finally happening--with Wolfram|Alpha. Building on 25 years of development led by Stephen Wolfram, Wolfram|Alpha has rapidly become the world's definitive source for instant expert knowledge and computation.
RSC CICAG Interest Group
Royal Society of Chemistry members will be getting their annual subscription details around now. Can I remind people that your membership entitles you to membership of up to THREE Interest Groups. Apparently only around 25% take advantage of this option, so I'd urge you to have a look at the groups available.
In particular I'd like to highlight:-
86 Chemical Information and Computer Applications Group
The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community, and to meet the requirements of the RSC’s strategy and charter. The storage, retrieval, analysis and preservation of chemical information and data are of critical importance for research, development and education in the chemical sciences. All chemists, and everybody else who works with chemical substances, need tools and techniques for handling chemical information.
If you have already submitted your form you can make a request to join a group via email (membership@rsc.org) or telephone (01223 432141).
If you want to find out more about CICAG activities the newsletters are available here and if you have ideas for future activities feel free to contact the committee.
Installing Osprey 3.0 under Mac OS X
A recent publication described OSPREY 3.0: Open-Source Protein Redesign for You, with Powerful New Features DOI.
We present Osprey 3.0, a new and greatly improved release of the osprey protein design software. Osprey 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of osprey when running the same algorithms on the same hardware. Moreover, osprey 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of osprey, osprey 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that osprey 3.0 accurately predicts the effect of mutations on protein–protein binding.
Osprey 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open-source software (GPLv2).
The source code is available on GitHub https://github.com/donaldlab/OSPREY3/.
Unfortunately the installation instructions do not include Mac OSX but there are instructions for "Debian-like Linux" which seemed promising. With the invaluable help of Nathan Guerin I was able to get OSPREY installed.
Open Source Python Data Science Libraries
When I wrote the article entitled A few thoughts on scientific software, one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I'd publish a page that hopefully stops people reinventing the wheel. Here are a few open-source cheminformatics toolkits that I'm aware of.
As a follow up I thought I'd put together a list of useful Python libraries for data science.
As always, happy to hear comments or suggestions for additions.
DOCK 6.9 released
DOCK 6.9 has been released.
This is a release of the new ligand searching method DOCKDN: De Novo design using fragment-based assembly. De novo design can be used to explore vast areas of chemical space in computational lead discovery. DOCKDN is an iterative fragment growth method, in which new molecules are built using rules for allowable connections based on known molecules.
Full information on what is new in DOCK 6.9 is available at http://dock.compbio.ucsf.edu/DOCK6/newin_6.9.txt
GuacaMol, benchmarking models.
Comparison of different algorithms is an under-researched area; this publication looks like a useful starting point.
GuacaMol: Benchmarking Models for De Novo Molecular Design
De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.
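Two of the ideas in the abstract, generating molecules that are not repeats of each other and that are not already in the training set, can be illustrated with a few lines of Python. This is only a toy sketch under my own assumptions: the function names are hypothetical, it is not GuacaMol's implementation, and it compares SMILES as plain strings, whereas a real comparison would first need canonical SMILES (for example from RDKit).

```python
def uniqueness(generated):
    """Fraction of generated SMILES strings that are distinct."""
    if not generated:
        raise ValueError("no molecules were generated")
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of the distinct generated SMILES that are absent from the training set."""
    unique = set(generated)
    if not unique:
        raise ValueError("no molecules were generated")
    return len(unique - set(training)) / len(unique)


# Made-up example data: plain strings, assumed to be already canonical
toy_generated = ["CCO", "CCO", "c1ccccc1", "CCN"]
toy_training = ["CCO", "CCC"]

print(uniqueness(toy_generated))             # 3 distinct out of 4 generated
print(novelty(toy_generated, toy_training))  # 2 of the 3 distinct are new
```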
Source code : https://github.com/BenevolentAI/guacamol.
The easiest way to install guacamol is with pip:
pip install git+https://github.com/BenevolentAI/guacamol.git#egg=guacamol --process-dependency-links
guacamol requires the RDKit library (version 2018.09.1.0 or newer).
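If you are not sure whether an existing RDKit installation is recent enough, a quick check along these lines may help. This is a sketch that assumes RDKit version strings of the usual dotted form (such as "2018.09.1"); the helper names are hypothetical and the comparison simply pads missing components with zeros.

```python
def version_tuple(version, width=4):
    """Turn '2018.09.1' into (2018, 9, 1, 0); non-numeric parts such as 'dev1' are skipped."""
    parts = [int(piece) for piece in version.split(".") if piece.isdigit()]
    parts += [0] * (width - len(parts))  # pad so short and long forms compare equally
    return tuple(parts[:width])


def meets_minimum(installed, minimum="2018.09.1.0"):
    """True if the installed version is at least the required minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    import rdkit  # only needed for the live check below
    print("RDKit", rdkit.__version__, "OK for guacamol:", meets_minimum(rdkit.__version__))
except ImportError:
    print("RDKit is not installed in this environment")
```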
IBM RXN twitter feed
Just saw a new Twitter feed that might be of interest for any synthetic chemists interested in retrosynthesis/reaction prediction.
Account for news and general info on the freely available AI platform made by #compchem chemists for #organic chemists
You can try out the reaction planning service for free here https://rxn.res.ibm.com.
Samson documentation updated
I've mentioned Samson a couple of times and I noticed that the documentation has been updated. Documentation is a critical but often overlooked feature of software.
SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and means "Software for Adaptive Modeling and Simulation Of Nanosystems".
ConstruQt API
Just got details of an interesting service.
ChemAlive (www.chemalive.com) would like to offer ConstruQt, its core molecular design tool based on quantum mechanics (QM), for trial.
Currently the tool:
- Transforms a list of SMILES or InChI molecular designations into state-of-the-art 3D molecular structures in SD format
- Manages the conformational space of the molecules with a robust shape searching algorithm
- Generates all reasonable tautomeric forms of the molecule and prioritizes them by energy
- Generates all diastereomeric forms of the molecules and differentiates them by energy
- All molecules are stored in our unique database architecture making the calculations easily augmented and carried through to other processes
The last bullet point is worth noting, so don't submit anything confidential.
Cambridge Cheminformatics Network
28 November 2018
Cambridge Cheminformatics Meeting
CCDC, Union Road
3.30pm coffee; 4pm talks start; ~5.30pm drinks at The Alma
"Free ligand conformations in Structure Based Drug Discovery"
Elisabetta Chiarparin, AstraZeneca
https://uk.linkedin.com/in/elisabetta-chiarparin-206a021
https://www.astrazeneca.co.uk/
"Digital design – From molecules to medicines with structural informatics"
Andrew Maloney, CCDC
https://www.ccdc.cam.ac.uk/researchandconsultancy/ccdcresearch/ccdcresearchers/?id=d215312f-9564-49d2-b47c-a4bba9324f2f
https://www.ccdc.cam.ac.uk/
"New Trend in Therapeutics Research - Artificial Intelligence for Identifying Novel Therapeutic Targets, Biomarkers and Drug Repositioning Opportunities"
Namshik Han, Milner Institute
https://crukcambridgecentre.org.uk/users/namshik-han
https://www.milner.cam.ac.uk/
AppleScript and Mojave
Well worth a read.
Executing AppleScript in a Mac app on macOS Mojave and dealing with AppleEvent sandboxing
Over a weekend recently I built a tiny Mac app (more on that later). What I was trying to achieve required executing AppleScript, like so many things on macOS. It seemed simple enough, but of course new app sandboxing restrictions in macOS Mojave got in the way.
Making a Random Selection
Sometimes it is the simplest scripts that prove to be the most useful; the most downloaded AppleScript on the site is the one that simply prints the text on the clipboard.
I regularly need to select a specified number of molecules in a random fashion and this script does just that. Import an SDF file containing structures into Vortex and run the script to make a random selection.
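Outside Vortex, the same idea can be sketched in plain Python. This is not the Vortex script, just a minimal illustration under my own assumptions: the function names are hypothetical, and it relies only on the fact that each record in an SD file ends with a line containing `$$$$`.

```python
import random


def split_sdf_records(sdf_text):
    """Split the text of an SD file into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def random_selection(records, n, seed=None):
    """Pick n distinct records at random; pass a seed to make the choice repeatable."""
    if n > len(records):
        raise ValueError("cannot select more records than the file contains")
    return random.Random(seed).sample(records, n)


# Five placeholder records (not valid molblocks, just enough to show the splitting)
toy_sdf = "".join(f"mol{i}\n  toy record\n\n$$$$\n" for i in range(5))

records = split_sdf_records(toy_sdf)
subset = random_selection(records, 2, seed=42)
print(len(records), "records read,", len(subset), "selected")
```

Writing `"".join(subset)` to a new file gives an SD file containing only the selected molecules.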
Mobile Science Update
The mobile science section is now up and running, and is slowly getting more views.
Just to emphasise that you don't need to log in to search, simply type in your query and hit search.
For example, searching for "Viewer" returns 55 hits and you can further refine your search.
Each App has a detailed description and a link to the iTunes App Store for download; the entry for Elemental is shown below.
As ever, I'm happy to hear about possible additions to the site.
Which versions of Mac OS and iOS are in use?
I'm occasionally approached by developers asking about the versions of Mac OS or iOS the visitors to the site are using. Whilst you can get general information from sites like NetMarketShare or Mixpanel I guess it is useful to know what versions scientists might be using.
Over the last month the operating systems used by visitors are 60% Mac, 21% Windows, 12% iOS, 4% Linux, 2.8% Android.
Of the Mac users, 43% are now using 10.14 (Mojave) and 35% are using 10.13, with all older versions each well below 10%.
For the visitors using iOS, 71% are using 12.x and 16% are using 11.x.
These figures would suggest that visitors to the site are probably among the early adopters, and that Chemistry software developers should try and ensure they support the latest versions of operating systems as early as possible.
LigandScout 4.3 released
Inte:Ligand have just announced the release of LigandScout 4.3.
The LigandScout software suite comprises the most user friendly molecular design tools available to chemists and modelers worldwide. The platform seamlessly integrates computational technology for designing, filtering, searching and prioritizing molecules for synthesis and biological assessment.
This is a significant update and expands LigandScout's molecular dynamics support. This update also now includes halogen binding as a new pharmacophoric element. In addition plotting has received an upgrade.
Furthermore, LigandScout 4.3 Expert introduces a completely new set of features summarized under the term Remote Execution. It is now possible to screen large compound libraries on remote High Performance Computing directly from within the graphical LigandScout user interface.
It can be downloaded here http://www.inteligand.com/ligandscout4/downloads/LigandScout43macos20181012.dmg
You can read about the technology behind LigandScout here DOI and there is a review of an earlier version here.
In addition there are now over 40 LigandScout nodes for KNIME.
KNIME Analytics Platform is the open source software for creating data science applications, workflows and services. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.
BBEdit 12.5 released
BBEdit 12.5 requires Mac OS X 10.12.6 or later, and is compatible with macOS 10.14 "Mojave".
If you are using macOS 10.13 "High Sierra", please make sure that you have updated to the latest available OS version (10.13.6 or later).
If you are using macOS 10.14 "Mojave", please make sure that you have updated to the latest available OS version (10.14.1 or later).
There are a number of new additions:
- There is a new command on the "Go" menu: "Commands...". This brings up a modal panel which lists everything that you can do from a menu in BBEdit: menu commands, clippings, scripts, stationery, text filters, as well as open text documents and recent files.
- You can now generate lipsum, using the "Lorem Ipsum" command on the Insert submenu of the Edit menu.
- Added support for Grep patterns to Canonize. This can be turned on using the check box in the "Canonize" dialog box; or by using a mode line in a Canonize data file.
- Multi-File Search results windows now get a "reload" button, which you can use to repeat the search using the same settings.
Full details are in the release notes.
IGMPlot release 2.4
The new IGMPlot release 2.4 is available for download at http://igmplot.univ-reims.fr. It provides chemists with a visual analysis of covalent and non-covalent interactions.
Detailed installation notes are in the documentation (page 5).
IGMPlot is written in C++. It has been installed and tested on several platforms (Linux computational centers, MacOS, Windows 10) and with several compilers and versions (GNU, Intel, PGI). It can be compiled with or without OpenMP support.
On MacOS machines, a sequential version of IGMPlot can be obtained with the Clang compiler. In the Makefile choose the options:
- CppCompilerFamily=GNU
- CppCompilerVersion=5andabove
- OpenMP=NO
- CC=g++
On MacOS machines, to leverage OpenMP multicore execution, you must install a gcc (g++) version different from the one provided by the compiler front end “Clang”, which so far has no built-in support for OpenMP. You might install gcc with the command ‘brew install gcc --without-multilib’ (see for instance https://stackoverflow.com/questions/35134681/installing-openmp-on-mac-os-x-10-11). This way, the compiler might be installed somewhere like /usr/local/Cellar/gcc/7.1.0/bin/g++-7. In this example, make sure the g++-7 command is available on your PATH and adjust the IGMPlot makefile accordingly (replacing the g++ command with g++-7, for instance).
This link might also be useful OpenMP under MacOSX.
An automated framework for NMR chemical shift calculations of small organic molecules
A recent paper in Journal of Cheminformatics describes An automated framework for NMR chemical shift calculations of small organic molecules DOI.
As an alternative, we introduce the in silico Chemical Library Engine (ISiCLE) NMR chemical shift module to accurately and automatically calculate NMR chemical shifts of small organic molecules through use of quantum chemical calculations. ISiCLE performs density functional theory (DFT)-based calculations for predicting chemical properties—specifically NMR chemical shifts in this manuscript—via the open source, high-performance computational chemistry software, NWChem.
ISiCLE is available from GitHub https://github.com/pnnl/isicle or can be installed using Conda (with required dependencies):
conda create -n isicle -c bioconda -c openbabel -c rdkit -c ambermd python=3.6.1 openbabel rdkit ambertools snakemake numpy pandas yaml statsmodels
In addition, ensure the following third-party software is installed and added to your PATH:
cxcalc (license required from ChemAxon, Marvin)
NWChem http://www.nwchem-sw.org/index.php/Download.
ISiCLE is implemented using the Snakemake workflow management system, enabling scalability, portability, provenance, fault tolerance, and automatic job restarting. Snakemake provides a readable Python-based workflow definition language and execution environment that scales, without modification, from single-core workstations to compute clusters through as-available job queuing based on a task dependency graph.
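To give a feel for what "job queuing based on a task dependency graph" means, here is a small sketch using only the Python standard library (`graphlib`, Python 3.9 or later). It is not Snakemake's API and not ISiCLE's actual workflow; the task names are hypothetical and chosen only to resemble an NMR shift calculation. Tasks in the same batch have all their prerequisites finished and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks; each one maps to the set of tasks it depends on
dependencies = {
    "optimize_geometry": {"generate_3d_structure"},
    "compute_shielding": {"optimize_geometry"},
    "compute_reference_shielding": set(),
    "convert_to_shifts": {"compute_shielding", "compute_reference_shielding"},
}


def execution_batches(deps):
    """Group tasks into batches; every task in a batch has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that can start right now
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished, unlocking dependants
    return batches


for number, batch in enumerate(execution_batches(dependencies), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

On a single core the batches simply run one task after another; on a cluster each batch could be submitted as independent jobs, which is the kind of scaling the quoted description refers to.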
There are more details on Snakemake here.
I've added ISiCLE to the Spectroscopy Page.
Think science for the new £50 note
A slightly different post.
The Bank of England are looking for nominations for who should be on the new £50 note and you can make your nomination here.
https://www.bankofengland.co.uk/banknotes/50-pound-note-nominations.
You can nominate as many people as you like. But anyone who appears on the new £50 note must:
- have contributed to the field of science
- be real, so no fictional characters please
- not be alive (Her Majesty the Queen is the only exception)
- have shaped thought, innovation, leadership or values in the UK
- inspire people, not divide them
Firstly, I'm delighted to see a scientist being proposed for the note, and reading the comments in various news outlets I'm staggered by the number of scientists who have been suggested.
For my part I've nominated Ada Lovelace and Charles Babbage. There is plenty of room on a £50 note for two figures, and computing needs both software and hardware.
Mobile Science Updated
A number of people have contacted me about the Mobile Science pages being down. This was due to a problem with the system I was using to search and display the contents of the database.
I've now transferred to a new system and rebuilt the database. I used the opportunity to remove apps that were no longer available and to update to newer versions. The search interface is here: https://www.macinchem.org/mobsci/index.php.
I've tried to tag all apps with appropriate comments so hopefully searching should identify the relevant applications.
Please have a browse and let me know if anything should be added.
The SAMPL6 Blind Prediction Challenge for Computational Chemistry
Now on GitHub https://github.com/MobleyLab/SAMPL6.
SAMPL6 Part II will include an octanol-water log P prediction challenge and will be followed by a joint D3R/SAMPL workshop in San Diego, Aug 22-23, 2019, immediately before the San Diego ACS National Meeting. A special issue or special section of JCAMD will be organized to disseminate the results of this challenge.
Embedding LaTeX and MathML in Jupyter Notebooks
I've been using Jupyter notebooks for a little while but I only just recently found out that you can embed LaTeX or MathML into a notebook!
This notebook is just a series of examples of what can be done. You can embed equations inline or have them on a separate line in a markdown text cell, or produce them from a code cell by importing Math or invoking latex.
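As a rough illustration of the code-cell route, the sketch below shows two options. `IPython.display.Math` is the standard class for this, and Jupyter will also typeset any object that provides a `_repr_latex_` method. The `Equation` class and the `as_math` helper are hypothetical names of my own, and the import is guarded only so the snippet also runs where IPython is not installed. In a markdown cell the equivalent is simply writing `$...$` inline or `$$...$$` on its own line, and the `%%latex` cell magic renders a whole cell as LaTeX.

```python
GIBBS = r"\Delta G = \Delta H - T\,\Delta S"


class Equation:
    """Hypothetical helper: an object that Jupyter renders as a typeset equation."""

    def __init__(self, latex):
        self.latex = latex

    def _repr_latex_(self):
        # Jupyter calls this method and hands the returned string to MathJax
        return f"$${self.latex}$$"


def as_math(latex):
    """Return IPython's Math object if IPython is available, else the fallback above."""
    try:
        from IPython.display import Math
        return Math(latex)
    except ImportError:
        return Equation(latex)


equation = as_math(GIBBS)
equation  # as the last expression in a notebook cell this displays the equation
```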
OpenEye Toolkits v2018.Oct released
OpenEye have announced the release of OpenEye Toolkits v2018.Oct. These libraries include the usual support for C++, Python, C#, and Java. HIGHLIGHTS:
- Omega TK now includes a method specifically tuned to sample macrocyclic conformational space.
- FastROCS TK is now available in C++ and Java.
- Quacpac TK includes improvements to the tautomer functionality.
Full details are in the Release notes.
How to contribute to RDKit
I just noticed that Greg Landrum has posted a page on how to contribute to RDKit. https://github.com/rdkit/rdkit/wiki/HowToContribute.
There are many ways to contribute and you don't have to be a Python or C++ developer; simply being an active user, asking questions and contributing solutions helps other users. Improving the documentation is always a great place for newcomers to start, particularly highlighting things that are not as clear as they could be.
I've also added the link to the Toolkits page.
PythoMS: A Python framework for analysis of mass spectrometric data
An interesting publication for those who use mass spectrometry, PythoMS: A Python Framework to Simplify and Assist in the Processing and Interpretation of Mass Spectrometric Data (ChemRxiv).
The PythoMS framework introduces a library of classes and a variety of scripts that quickly perform time-consuming tasks: making proprietary output readable; binning intensity vs time data to simulate longer scan times (and hence reduce noise); calculate theoretical isotope patterns and overlay them in histogram form on experimental data (an approach that works even for overlapping signals); render videos that enable zooming into the baseline of intensity vs. time plots (useful to make sense of data collected over a large dynamic range) or that depict the evolution of different species in a time-lapse format; calculate aggregates; and provide a quick first-pass at identifying fragments in MS/MS spectra. PythoMS is a living project that will continue to evolve as additional scripts are developed and deployed.
All available on GitHub under the MIT license https://github.com/larsyunker/PythoMS. This package has been written for python 3.5+.
I've added it to the Spectroscopy page.
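The binning step mentioned in the abstract is easy to picture with a small example. The following is a minimal sketch in plain Python, not PythoMS code: the function name bin_scans, and the choice to sum intensities and average times within each bin, are assumptions made purely for illustration.

```python
def bin_scans(times, intensities, n):
    """Combine every n consecutive scans into one binned point.

    Hypothetical helper, not part of PythoMS. Intensities within a bin
    are summed (mimicking a longer scan time) and the bin is reported
    at the mean of its time points. A trailing partial bin is dropped.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")

    binned_times, binned_intensities = [], []
    # Step through the data in blocks of n scans
    for start in range(0, len(times) - n + 1, n):
        block_t = times[start:start + n]
        block_i = intensities[start:start + n]
        binned_times.append(sum(block_t) / n)
        binned_intensities.append(sum(block_i))
    return binned_times, binned_intensities


# Made-up example data: six scans binned in pairs
t, i = bin_scans([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [10, 12, 9, 11, 30, 28], 2)
print(t)  # [0.5, 2.5, 4.5]
print(i)  # [22, 20, 58]
```

For real data the PythoMS scripts on GitHub are the place to look; this only shows the idea of trading time resolution for signal.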
Scientific Applications under Mojave Update 6
Whilst there are many sites that track the compatibility of common desktop applications, it is often difficult to find information about scientific applications. Based on the number of page views for the High Sierra, Sierra, El Capitan and Yosemite lists, this is apparently a useful resource.
Apple has said that macOS 10.14 (Mojave) will run on every Mac released from 2012 onwards. It is always a good idea to have a Time Machine backup before undertaking a major update like this.
Much like prior versions of macOS, you can easily create a bootable install drive for macOS Mojave 10.14. The initial installer is small (15 MB). Once the main installer (5GB+) has downloaded and you are prompted to start the installation, click Cancel; the installer will be in the /Applications folder (Install macOS Mojave.app). You must copy it to another location, otherwise it will be deleted if you launch the installer and it completes the Mojave update. I found installing or updating to macOS Mojave takes about 45 - 60 minutes depending on the age of the machine.
I’ll update the list regularly and please feel free to send in information.
Amsterdam Modeling Suite has Mojave support, more details soon.
AppleScript: the security changes in Mojave have some significant impacts, particularly on scripting additions, that are described in detail here.
Avogadro version 1.2.0 no issues
BBEdit have a comprehensive list of compatibility notes for each application and version https://www.barebones.com/support/new-os.html. BBEdit 12.1.6 has just been released, which brings full Mojave compatibility.
ChemDoodle all seems fine no issues reported
ChemDoodle3D all seems fine no issues reported
ChemDraw as a policy they do not test until after new operating systems have been released, will check back later. A user replied "I am on ChemDraw Pro 17.1.1.0 and I have found no compatibility issues on Mojave.".
Chimera initially had an issue with buttons not being shown until the windows containing them were resized. "Just an update that we’ve fixed the button issue in Chimera on Mojave". The 1.13.1 candidate release (and recent 1.14 daily builds) are working on Mojave.
ChimeraX No issues reported.
ChirysView "As far as we know, there are no issues with our products and Mojave".
Cresset No issues to date, more detailed testing underway.
CrystalMaker "CrystalMaker X works beautifully with Mojave. We’re a full 64-bit app with sandboxing and code signing. Our forthcoming 10.4 update (next week or so) will include a “Dark Mode” option".
DataWarrior no issues to date
Findings no issues
iBabel no issues to date
iRaspa no issues
MarvinSketch latest version no issues, earlier versions may require the old Apple supplied JDK to be reinstalled
MarvinView no issues
Microsoft have already announced compatibility notes.
Word, Excel, PowerPoint, Outlook, OneDrive, Skype for Business, and OneNote will install and run on macOS 10.14 Mojave. Microsoft fully supports Office 2016, Office 2019 and Office 365 for Mac on 10.14 Mojave when you have the following Office updates installed:
Office 365/2019 - Build 16.17.0 or later
Office 2016 - Build 16.16.2 or later
Skype for Business 16.21.65 or later
Mnova lite no issues reported
Molecular Materials and Informatics have just released an update for the Molecular notebook app
MOE There seem to be no problems running MOE 2018.01 or higher on MacOS 10.14.
MolSoft We have not had any reported issues about running ICM-Pro or ICM-Chemist-Pro on Mac OSX 10.14 Mojave.
O "It runs on my laptop without problem and I’ve not heard any news from users so I don’t think there are any issues at the moment."
Openbabel no reported issues
PyCharm only very minor issues
PyMOL "Although Mojave was just released, all of our automated and QA tests do not show anything out of the ordinary. While it's probably still too early to declare official support, we haven't encountered anything that suggests users should worry about upgrading with regards to the PyMOL application".
RDKit no reported issues
Schrodinger suites are not yet officially supported on Mojave. Tests are ongoing, and we're hoping to officially support that platform in our next release (2018-4).
SeeSAR "0 issues with Mojave at our end".
StarDrop Our initial tests suggest that everything is working as expected.
TextWrangler is not compatible with Mojave. It has been sunsetted, and is now part of BBEdit.
Vortex no issues reported.
xQuartz I've only done limited testing but all OK for me.
Updated 30 October 2018
New functionality in PyMOL command line scripts
MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.
The PyMOL command line scripts now have additional functionality:
- Volume objects to visualize X-ray and cryo-EM density for complex, chains, ligands, binding pockets, pocket solvents, pocket inorganics, etc.
- Alignment of macromolecules and densities during visualization of X-ray and cryo-EM densities
- Surface colored by vacuum electrostatics at residue level for chains and pockets
- Surface colored by hydrophobicity along with charge at atom level for chains and pockets
- Aromatic, polar, positively charged, negatively charged, and other residue group objects for chains and pockets
Camelot, python tool for extracting PDF table data
Camelot is described as a PDF Table Extraction for Humans, it is a Python library that makes it easy to extract tables from PDF files.
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
<TableList n=1>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]
<Table shape=(7, 7)>
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas DataFrame!
Camelot only works with text-based PDFs and not scanned documents. Camelot also comes with a command-line interface. It can be installed using conda:
$ conda install -c camelot-dev camelot-py
I've added it to the Data Analysis tools page.
Molecular Notebook with “dark mode”
Molecular Materials and Informatics have just released an update for the Molecular notebook app.
A desktop app to draw chemical structures and reactions, and to organise your data in convenient spreadsheet-like files. The Molecular Notebook is designed from the ground up to leverage the macOS platform, integrating seamlessly and making presentation quality graphics just one drag away.
The Molecular Notebook desktop app for chemical structure & data content creation has a refresh on the iTunes AppStore: it now responds to the dark-mode preference.
TTClust : A molecular simulation clustering program
TTClust DOI is a Python program used to cluster molecular dynamics simulation trajectories. It only requires a trajectory and a topology file (compatible with most molecular dynamics packages such as Amber, Gromacs, CHARMM, NAMD, or trajectories in PDB format, thanks to the MDTraj package).
It is available on GitHub https://github.com/tubiana/TTClust.
For Mac users
If you have issues with pip, first try adding the --ignore-installed argument:
sudo pip install --ignore-installed -r requirements.txt
If it still doesn't work, it may be because of System Integrity Protection (SIP). In this case I suggest you install Anaconda or Miniconda and restart your terminal afterwards. The pip command should then work because your default Python will be the Anaconda (or Miniconda) Python.
If you still have issues with the GUI or missing packages, install with pip:
pip install wxpython==4.0.0b1
pip install pandas
pip install ttclust
To activate autocompletion for the argparse module, run this command (only once):
sudo activate-global-python-argcomplete
Install RDKit using Conda
Just highlighted on the RDKit email list, you can install RDKit using conda.
https://anaconda.org/conda-forge/rdkit
RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.
There are other cheminformatics toolkits described here, and details on how to install a wide range of cheminformatics tools on a Mac are given here.
pywindow: Automated Structural Analysis of Molecular Pores
An interesting recent publication describes pywindow DOI, a Python package for the analysis of structural properties of molecular pores (porous organic cages, but also MOFs and metal-organic cages).
Structural analysis of molecular pores can yield important information on their behavior in solution and in the solid state. We developed pywindow, a python package that enables the automated analysis of structural features of porous molecular materials, such as molecular cages.
Freely available on Github https://github.com/JelfsMaterialsGroup/pywindow
Requires numpy, scipy, scikit-learn
A number of Jupyter notebook examples are provided
Example1: Structural analysis of a single molecule loaded from a file type. (multiple examples)
Example2: Structural analysis of a single molecule loaded from an RDKit Molecule object. (requires RDKit)
Example3: Calculating an average molecule diameter.
Example4: Analysis of a MOF.
Example5: Analysis of a metal-organic cage.
Example6: Analysis of a periodic system containing several molecular pores that requires unit cell reconstruction.
Example7: Analysis of an MD trajectory containing a single molecular pore.
Example8: Analysis of an MD trajectory containing a periodic system with multiple molecular pores that requires unit cell reconstruction.
Mobile Science
Some may have noticed that the mobile science section of the site has gone down. The data for this section is stored in a MySQL database and displayed using a content management system called Pligg (aka Kliqqi). This has worked flawlessly for many years, but the latest update to PHP seems to have broken something. Since Pligg has not been updated for several years, I suspect it has been abandoned. So I need to move to another front-end to the database; any suggestions would be welcome, but being user friendly is essential.
iRASPA: GPU-accelerated visualisation software for materials scientists
Just came across this application and I thought it would be worth flagging: iRASPA is a GPU-accelerated visualization package aimed at material science. Molecular Simulation Journal. 44 (8): 653–676 DOI
iRASPA is a visualization package (with editing capabilities) aimed at material science. Examples of materials are metals, metal-oxides, ceramics, biomaterials, zeolites, clays, and metal-organic frameworks. iRASPA is exclusively for macOS and as such can leverage the latest visualization technologies with stunning performance. iRASPA extensively utilizes GPU computing. For example, void-fractions and surface areas can be computed in a fraction of a second for small/medium structures and in a few seconds for very large unit cells. It can handle large structures (hundreds of thousands of atoms), including ambient occlusion, with high frame rates.
Via iCloud, iRASPA has access to the CoRE Metal-Organic Frameworks database containing 4764 structures and 2932 structures enhanced with atomic charges. All the structures can be screened (in real-time) using user-defined predicates. The cloud structures can be queried for surface areas, void fraction, and other pore structure properties.
iRASPA is written in Swift.
Open Force Field Consortium
The Open Force Field Consortium is an academic-industry collaboration designed to improve the small molecule force fields used to guide pharmaceutical drug discovery.
The Consortium will develop an extensible, open source toolkit for constructing, applying, and evaluating force fields; produce and curate public datasets necessary to build high-accuracy biomolecular force fields; and apply these tools and datasets to generate improved force fields. Academic and industry partners work together to ensure its success.
Turbomole Update
There is a new Turbomole release.
TURBOMOLE has been developed to provide a fast and stable code to treat molecules for industrial application. With the TURBOMOLE implementation of RI-DFT, one of the fastest DFT methods will be available at your fingertips.
TURBOMOLE V7.3 (July 2018) New features:
- PNO-CCSD(T0) and PNO-CCSD(T) energies for closed-shell systems [1]
- new DFT-D4 dispersion correction based on xTB [2]
- modernized NMR (with RI-J, COSMO, meta-GGAs, low-order scaling HF-exchange, SMP parallelization) [3]
- VCD spectra using COSMO
- periodic DFT with larger basis sets (treatment of linear dependency)
- two-photon absorption cross sections and analytic frequency-dependent hyperpolarizabilities with TDDFT/TDHF [4]
- X2C gradients for 1- and 2-component DFT, full X2C and DLU-X2C [5]
- vibronic absorption/emission spectra (new module: radless) [6]
- CC2 vertical excited states with COSMO [7]
- NTO (natural transition orbitals) for TDDFT
- RI-GW based on dRPA (very fast GW and BSE) [8]
Efficiency:
- GW and Bethe-Salpeter based on fast dRPA
- support of RI-J and linear scaling HF exchange in NMR calculations
- PNO-MP2 closed shell energy calculations significantly more efficient
Usability:
- new scripts for parallel execution which recognize the most frequently used queuing systems
TmoleX (4.4) now supports:
- PNO-MP2, PNO-CCSD, PNO-CCSD(T0) and PNO-CCSD(T)
- DFT-D4 dispersion correction
- X2C relativistic two-component treatment for spin-orbit coupling terms, and new X2C basis sets
- Fukui indices and functions (calculation and visualization)
- movie exports to mp4 file format
- B97-3c functional
iOS12 adoption
The adoption of iOS 12 is being monitored by Mixpanel and it currently stands at around 20%. This is rather slower uptake than iOS 11, but it looks like there is a clean transition from iOS 11 to iOS 12.
The comparison with Android OS adoption is interesting.
Reenabling Extensions after Safari 12 update
The latest update to Safari (Version 12) brings a range of features intended to improve online security and privacy. Unfortunately one of the consequences is that only Safari Extensions available from the App Store are enabled and you will get a message that Safari no longer supports unsafe extensions and you are directed to the App Store.
Whilst I'm sure that extensions from major developers will migrate to the App Store I suspect that those Extensions provided by scientists may well not make the transition. This is a shame because some are very useful. However you can build the extension yourself to get around the problem.
This tutorial shows how to extract the code from an existing extension and then build it using Extension Builder.
Installing Cheminformatics packages on a Mac
A while back I wrote a very popular page describing how to install a wide variety of cheminformatics packages on a Mac. Since then there have been some changes with Homebrew which mean that a few of the scientific applications are no longer available, so I've decided to rewrite the page to cover installing the missing packages using Anaconda.
I've also included a list of quick demos so you can check everything is working as expected.
Packages include:
- OpenBabel
- RDKit
- cdk
- chemspot
- indigo
- inchi
- opsin
- osra
- pymol
- oddt
These are in addition to gfortran and a selection of developer tools.
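As a quick sanity check after installation, something along these lines can report which of the Python-accessible toolkits are visible to the current interpreter. This is a minimal sketch using only the standard library: the helper name check_modules is made up for illustration, and the import names in the list are assumptions that may differ between package versions.

```python
import importlib.util


def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names assumed for the Python-accessible packages in the list above
toolkits = ["rdkit", "openbabel", "pymol", "oddt", "indigo"]

for name, found in check_modules(toolkits).items():
    print(f"{name:10s} {'OK' if found else 'not found'}")
```

Finding a module only shows it is installed for that Python; it does not prove the package works, so the demos on the page are still worth running.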
iChemLabs and SciFinder-n
Just got this update from iChemLabs the developers of ChemDoodle.
“iChemLabs customized one of the leading chemistry sketchers and graphics drawing tools for the new SciFinder-n interface,” said Kevin Theisen, President, iChemLabs, LLC. “Our collaboration with CAS has been very successful in helping researchers develop and visualize better chemical structures more rapidly within SciFinder-n, and we look forward to continuing to provide SciFinder-n users with this best-in-class experience.”
Open Source Cheminformatics Toolkits
When I wrote the article entitled A few thoughts on scientific software, one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I'd publish a page that will hopefully stop people reinventing the wheel. Here are four open-source toolkits that I'm aware of; if I've missed any, my apologies, and please send me details. Listing of Open-source cheminformatics toolkits
MayaChemTools
MayaChemTools now includes a collection of Python scripts for PyMOL.
The command line Python scripts based on PyMOL provide functionality for the following tasks:
- Aligning macromolecules
- Splitting macromolecules into chains and ligands
- Listing information about macromolecules
- Calculation of physicochemical properties
- Comparison of macromolecules based on RMSD
- Conversion between different ligand file formats
- Visualizing X-ray electron density and cryo-EM density
- Visualizing macromolecules in terms of chains, ligands, and ligand binding pockets
MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.
Optibrium and Intellegens Collaborate
Optibrium and Intellegens Collaborate to Apply Novel Deep Learning Methods to Drug Discovery
Partnership combines Intellegens’ proprietary AI technology with Optibrium’s expertise in predictive modelling and compound design. Optibrium provides elegant software solutions for small molecule design, optimisation and data analysis. By leveraging Intellegens’ AlchemiteTM technology, the partnership will create a “next generation” predictive modelling platform that is capable of delivering more accurate predictions and enabling better decision-making when it comes to the optimisation of compounds.
MGMS Young Modellers’ Forum 2018
Molecular Graphics and Modelling Society Young Modellers’ Forum 2018.
To encourage young molecular modellers at the beginning of their careers, the MGMS invites PhD students who wish to present their work on any aspect of computational chemistry, cheminformatics, or computational biology at the 2018 Young Modellers’ Forum. Other members of the modelling community are strongly encouraged to attend this event as it is your opportunity to see these talented young modellers and to assist us in the evaluation of the prizes. There is also the chance to discuss the talks afterwards in the pub.
Abstract submission 5th October 2018
Date: Friday, 30th November, 2018
Venue: Room QA063, Queen Anne Court, The Old Naval College, Greenwich
Location: Details of how to get to the campus can be found at http://www2.gre.ac.uk/about/travel/greenwich.
RSC Chemical Information and Computer Applications Group
The website for the RSC Chemical Information and Computer Applications Group (CICAG), http://www.rsccicag.org, has been updated and now includes more information on forthcoming events and awards, together with the latest CICAG newsletter. Please feel free to share.
The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community, and to meet the requirements of the RSC’s strategy and charter.
CICAG works to support users of chemical information, data and computer applications and advance excellence in the chemical sciences; to inform RSC members and others of the latest developments in these rapidly evolving areas; and to promote the wider recognition of excellence in chemical information and computer applications at this level.
Virtual Chemical Libraries
A very interesting paper on Virtual Chemical Libraries by W. Patrick Walters DOI describes how it is now possible to generate virtual libraries containing billions of compounds. These vast virtual libraries result in a number of practical challenges, in particular for their use in virtual screening.
If we consider a virtual screen with a false positive rate of 1% (an optimistic estimate for even the best virtual screening methods), a virtual screen on a library of 1 million molecules would yield 10,000 false positive hits. (A “false positive” is an inactive molecule which is predicted to be active).
Another consideration with very large virtual libraries is the time and CPU resource required for processing. Whilst substructure and 2D similarity searches are very fast and can make use of hashed fingerprints, 3D or docking searches are orders of magnitude slower and require either storage of multiple conformations of the ligand or conformation generation on the fly. Realistically these require access to large compute clusters; cloud-based resources are now relatively accessible but require significant expertise to use efficiently and securely.
Even the fastest docking programs require 2 seconds per molecule to dock an ensemble of conformations into a protein binding site. At this rate, approximately 15,327 CPU days would be required to dock 680 million molecules.
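The scale of both problems is easy to reproduce with a few lines of arithmetic. This is a back-of-the-envelope sketch, not taken from the paper: the function names are made up, the false-positive estimate assumes essentially the whole library is inactive, and the docking estimate assumes a flat 2 seconds per molecule on a single CPU.

```python
def false_positives(library_size, false_positive_rate):
    """Expected number of inactives wrongly predicted active.

    Simplification: treats essentially the whole library as inactive.
    """
    return library_size * false_positive_rate


def docking_cpu_days(n_molecules, seconds_per_molecule):
    """CPU days needed to dock n_molecules one after another."""
    seconds_per_day = 24 * 60 * 60
    return n_molecules * seconds_per_molecule / seconds_per_day


print(round(false_positives(1_000_000, 0.01)))   # 10000
print(round(docking_cpu_days(680_000_000, 2)))   # 15741
```

A straight multiplication gives roughly 15,700 CPU days, the same order as the figure quoted above; the small difference presumably comes from slightly different inputs in the original estimate.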
With this in mind it is perhaps appropriate to flag that D3R Grand Challenge 4 has just opened. Full details are published on the Drug Design Data Resource site.
Papers Update
The latest update to Papers 3 (version 3.4.16):
- Adds support for Microsoft Word version 16
- Adds Citations support for Scrivener
- Resolves issues with PDF annotations
There are more reference management apps here.
Chembience
Chembience is a Docker based platform intended for the fast development of chemoinformatics-centric web applications and micro-services based on RDKit. It supports a clean separation of your scientific web service implementation work from any infrastructure related configuration requirements.
At its current development stage, Chembience supports three base types of application (App) containers: (1) a Django/Django REST framework-based App container which is specifically suited for the development of web-based Python applications, (2) a Python shell-based App container which allows for the execution of script-based Python applications, and (3) a Jupyter-based App container which lets you run Jupyter notebooks (currently only a Python kernel is supported).
Augmented Reality in Chemistry
The use of augmented and virtual reality in chemistry is slowly starting to gain traction. The initial use of virtual reality in drug discovery is well documented but usually confined to highly specialised hardware, which has limited its exposure to a wider audience. However, as described by Jonas Boström at the recent Chemistry on Mobile Devices Meeting, this is starting to change: Virtual reality smartphone apps making chemistry look and feel cool.
Virtual reality smartphone apps are making chemistry look and feel cool. This project aims to enhance the learning experience for school chemistry lessons by providing virtual reality viewing of molecules using inexpensive Google Cardboard viewers.
EduChemVR have a number of apps for download to allow users to interact with macromolecules or learn stereochemistry.
The power of the latest generation of smartphones has also enabled scientists to explore augmented reality, which is now being used in a number of situations. One is to enhance publications, as demonstrated by Alistair Crow; if you want to know how to do this, instructions are available here. Many people have probably used the superb ChemTube3D website created by Nick Greeves at the University of Liverpool, which is an invaluable educational resource; this is also accessible via a smartphone app.
ChemTube3D contains interactive 3D animations and structures, with supporting information for some of the most important topics covered during an undergraduate chemistry degree
More recently some of the pages have been enhanced to provide access to virtual reality models, if you would like to develop similar pages there is an AppleScript droplet to batch convert Jmol files into files suitable for AR.
More recently Mark Costner has released MoleculAR: an augmented reality (AR) app to view molecules in 3D.
The images of molecules for use with the MoleculAR augmented reality app are available on GitHub and there is a more detailed explanation here.
Mixfile format
An interesting post on Mixtures & cheminformatics on designing a new file format to handle mixtures of chemicals, in particular things like "LDA within a solvent mixture of THF and hexanes, in a ratio of 1 to 7".
The format hasn’t been locked down yet, but it is very simple: it’s JSON-based, in order to make it easy to read & write with any software platform, and have high human readability. It’s hierarchical, making it possible to describe mixtures-of-mixtures, which happens frequently. Each component is expected to provide a structure and quantity whenever these are known, with name being also highly encouraged. Other information like canonical identifiers, database links, cross references, etc., can easily be encapsulated – the Mixfile is intended to be an inclusive container of information – but they do not necessarily impart much-if-any special meaning to the software that interprets them.
More info can be found on the GitHub page https://github.com/cdd/mixtures.
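To make the hierarchical idea concrete, here is a rough sketch of how the LDA example might be held as nested JSON and walked recursively. The key names (name, contents, ratio) and the helper leaf_components are illustrative guesses only and should not be taken as the actual Mixfile specification, which is documented in the GitHub repository.

```python
import json

# Hypothetical layout: key names are assumptions for illustration only,
# not the real Mixfile schema.
mixture = {
    "name": "LDA in THF/hexanes",
    "contents": [
        {"name": "LDA"},
        {
            "name": "solvent",
            "contents": [
                {"name": "THF", "ratio": 1},
                {"name": "hexanes", "ratio": 7},
            ],
        },
    ],
}


def leaf_components(node):
    """Recursively collect the names of components with no sub-components."""
    children = node.get("contents", [])
    if not children:
        return [node["name"]]
    names = []
    for child in children:
        names.extend(leaf_components(child))
    return names


print(json.dumps(mixture, indent=2))
print(leaf_components(mixture))  # ['LDA', 'THF', 'hexanes']
```

Because a component can itself carry a contents list, mixtures-of-mixtures need no special handling; the same function works at any depth.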
D3R Grand Challenge 4
I've written a couple of tutorials on docking here and here that have been popular pages.
The tools used for docking are being regularly updated and so the D3R Grand Challenge 4, a new blinded prediction challenge for protein-ligand poses and affinities is an invaluable data point for comparison of the current state of play.
The Grand Challenge 4 (GC4) will open on September 4, with the following submission deadlines:
- Stage 1a, cross-docking challenge: October 4
- Stage 1b, self-docking challenge: October 19
- Stage 2, affinity ranking and free energies: December 4
Challenge components will include:
- Affinity ranking of ~450 Cathepsin S inhibitors drawn from the same large dataset used in GC3
- Affinity ranking of ~150 beta secretase 1 (BACE) inhibitors
- Pose prediction of 20 BACE inhibitors
- Free energy prediction challenges suitable for alchemical free energy methods, for both Cathepsin and BACE
Full details will be published on the Drug Design Data Resource site.
OMEGA v3.0.1 released
OpenEye have just announced the release of OMEGA v3.0.1. This upgrade fixes several bugs and adds a number of internal improvements.
Major bug fixes
- A bug that caused memory leaks in OMEGA classic, dense, pose, and rocs modes, has been fixed. Previously, a substantial memory leak was experienced when running OMEGA on a large database.
- OMEGA macrocycle no longer uses excessive memory for molecules with terminal heavy atoms.
OMEGA performs rapid conformational expansion of drug-like molecules, yielding a throughput of tens of thousands of compounds per day per processor. OMEGA is very effective at reproducing bioactive conformations, and provides an optimal balance between speed and performance when used on large compound databases.
Deep Replay
This looks rather neat: Deep Replay.
Deep Replay is a package designed to allow you to replay in a visual fashion the training process of a Deep Learning model in Keras.
To install Deep Replay just type:
pip install deepreplay
Chemfp
Just got this message, which I thought readers might be interested in:
chemfp 1.5 is now available from http://dalkescientific.com/releases/chemfp-1.5.tar.gz and from PyPI (the Python package index) through "pip install chemfp".
The software is available in source code form under the MIT license. For more information see the home page at http://chemfp.com/ or the documentation page at https://chemfp.readthedocs.io/en/chemfp-1.5/ .
Chemfp is a set of command-line tools and a Python library for working with cheminformatics fingerprints. It can use OEChem/OEGraphSim, RDKit, or Open Babel to create fingerprints in the FPS format, and it implements a high-speed Tanimoto search.
As far as I can tell, chemfp 1.5 is the fastest free/open source fingerprint search system for the CPU. (Some proprietary/commercial toolkits are faster, including the commercial version of chemfp, and GPU-based search is usually faster than the CPU.)
The main changes for this release are:
- 10% faster performance for k-nearest search
- fixed a bug in symmetric k-nearest neighbor when multiple fingerprints have no bits set
- improved the use of chemfp as a baseline benchmark for similarity search tools
Similarity search performance benchmark
Concerning the last point, I have assembled a data set which can be used to benchmark similarity search performance for several different search types, fingerprint types, and scoring functions. This includes pre-computed fingerprints and expected search results, as well as timing numbers for several different versions of chemfp.
My hope is that it evolves into a standard benchmark that help evaluate search tools - bearing in mind that performance is only one of many factors that go into selecting a tool.
The benchmark files are at https://bitbucket.org/dalke/chemfp_benchmark . Those files which fall under copyright are distributed under the MIT license.
Many thanks to ChEMBL, OpenEye, PubChem, Open Babel, RDKit, and Daniel Lemire for providing the data and resources for putting this benchmark together.
Best regards,
Andrew dalke@dalkescientific.com
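For readers unfamiliar with the Tanimoto search mentioned in the message, the underlying calculation is simple even though making it fast is not. The sketch below is plain Python and has nothing to do with chemfp's own API or implementation: fingerprints are stored as Python integers, the function names are made up, and scoring two empty fingerprints as 0.0 is a convention chosen here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as Python ints.

    Each set bit in the integer represents one 'on' fingerprint bit.
    Two all-zero fingerprints score 0.0 (a convention for this sketch).
    """
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def k_nearest(query, targets, k):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in targets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 8-bit fingerprints (made-up data)
targets = {"mol1": 0b10110010, "mol2": 0b10110000, "mol3": 0b00001111}
print(k_nearest(0b10110010, targets, 2))
# [('mol1', 1.0), ('mol2', 0.75)]
```

A brute-force loop like this scores every target; dedicated tools such as chemfp exist because that becomes far too slow for millions of fingerprints.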
Aug 15 1998 Apple launches the iMac
On August 15, 1998, Apple launched the first iMac into the world. The multi-colored, gumdrop-shaped iMac proved to be the perfect launchpad for a revitalised Apple.
The first iMac had fairly modest specs: a 233 - 700MHz PowerPC 750 G3 processor, 128GB of storage, a 15-inch CRT, a CD-ROM drive, and an ATI graphics card. Since then Apple has regularly upgraded the iMac.
The latest Pro version boasts up to 18-core 2.3 GHz Intel Xeon W processors (Turbo Boost up to 4.3GHz), 32GB of 2666MHz DDR4 memory (four SO-DIMM slots, user configurable to 128GB), up to 4TB SSD storage, a 27-inch (diagonal) Retina 5K display and a Radeon Pro Vega 64 graphics card with 16GB of high bandwidth memory, and of course it is available in space grey.
Happy 20th birthday.
REALizer KNIME workflow from BioSolveIT
BioSolveIT have added to their collection of KNIME workflows.
The "REALizer" helps you to post-process the results from searches in the REAL Space, leading you to those compounds of biggest interest.
Fortran on a Mac update
As I've noted on several occasions I'm not a big Fortran user but looking at the website stats the Fortran on a Mac page is now the third most regularly read page on the site and page views seem to be increasing.
I was recently sent a new link and I have added it to the Fortran on a Mac page.
Sourcery Institute offers a variety of resources for Fortran programmers: the Sourcery Institute tap for Homebrew formulae not in homebrew/homebrew-core, a Coarray Fortran Jupyter notebook kernel, forks of flang and gcc, and OpenCoarrays, a transport layer for coarray Fortran compilers.
An AppleScript droplet to generate Augmented Reality files from Jmol
Augmented reality is finding new applications in science, in particular the ability to enhance publications or lecture notes, and viewers can set up a free account with Augment to provide easy access.
I was asked recently if it might be possible to create an AppleScript droplet onto which you could simply drop a chemical structure file to generate the files needed for Augment, and this is an ideal use case for a droplet.
This script uses Jmol to generate the Wavefront .obj and .mtl files, which can be used with Augment.
You can read more about the script and download it here.
Nick Greeves has tweeted an example of its use here and a demo page here.
Updated INSENSITIVE
Insensitive (Incredible Nuclear Spin EvolutioN Simulation Tool Intended for Visual Education) is an application to simulate the NMR experiment based on the quantum mechanical density matrix formalism.
It is available for Mac OS X 10.6 and above and iOS 5.1.1 and above. Full details can be found in Concepts In Magnetic Resonance, 2011, 38A (2), 17-24 DOI.
The NMR experiment is usually described by a choice of three models that operate on different levels of abstraction: the vector model, the product operator formalism and the density matrix approach. The transition between these models poses a didactic challenge for teacher and student alike. A new computer program is presented, which simulates a spin system on the textbook level and compares the three approaches, with the possibility to manipulate the system at every step. It closes a gap between NMR education and professional simulation tools. Some algorithms are explained, which are used in the simulation to extract information from the density matrix.
ACS awards for Computers in Chemistry
Nominations are now open for the Computers in Chemistry division of the ACS awards.
More details here http://www.acscomp.org/awards.
New ChEMBL interface
Just having a look at the new ChEMBL interface. I quite like the easy way to embed records into web pages:
<object data="https://www.ebi.ac.uk/chembl/beta/embed/#mini_report_card/Compound/CHEMBL1471" width="100%" height="300"></object>
and it is displayed as shown below.
I will be doing some more investigations later this week.
Intelligently Automating Machine Learning, Artificial Intelligence, and Data Science
A timely tutorial and example workflow.
we have put together a more comprehensive workflow, serving as a blueprint for anyone to build her or his own version of a Guided Analytics application to combine just the right amount of automation and interaction for a specific set of problems.
LabMathX
LabMathX is a MacOSX program for scientific analysis, calculations and Visualisation that includes support for older hardware.
- LabMathX is Scriptable with AppleScript. Check the Dictionary with Apple's Script Editor.
- LabMathX Supports Services and Can Be Accessed From the Services Menu in Other Services-Aware Applications.
- LabMathX and Its Plug-ins Are Written in Objective C Under Cocoa.
Apps at discount prices
A summer promotion is offering 12 applications at discount prices, pick and choose the ones you want.
Here is the list of participating apps:
iOS apps:
- Mindnode by IdeasOnCanvas GmbH (AUT) → now 10,99€/$9.99 (30% OFF)
- Notebooks by Alfons Schmid (AUT) → now 4,49€/$3.99 (40% OFF)
- Inko by Creaceed SPRL (BEL) → now 14,99€/$13.99 (30% OFF)
- Prizmo Go by Creaceed SPRL (BEL) → now 3,49€/$2.99 (40% OFF)
- Grafio by Ten Touch Ltd. (BGR) → now 8,99€/$7.99 (20% OFF)
- PocketCAS by Daniel Alm (DEU) → now 4,99€/$3.99 (50% OFF)
- Money by Jumsoft (LTU) → now 1,09€/$0.99 (65% OFF on Standard IAP)
Mac apps:
- Mindnode by IdeasOnCanvas GmbH (AUT) → now 29,99€/$26.99 (30% OFF)
- Notebooks by Alfons Schmid (AUT) → now 9,99€/$8.99 (50% OFF)
- Prizmo by Creaceed SPRL (BEL) → now 38,99€/$32.99 (30% OFF)
- Remote Buddy by IOSPIRIT GmbH (DEU) → now 19,99€/$17.99 (20% OFF)
- PocketCAS by Daniel Alm (DEU) → now 9,99€/$8.99 (50% OFF)
- Findings by Findings Software SAS (FRA) → now 32,99€/$29.99 (40% OFF)
- PDF Watermarker by seense (FRA) → now 8,99€/$7.99 (60% OFF)
- Money by Jumsoft (LTU) → now 16,99€/$14.99 (40% OFF on Standard IAP)
- Studies by The Mental Faculty B.V. (NLD) → now 21,99€/$19.99 (30% OFF)
- Workspaces by Apptorium (POL) → now 6,99€/$5.99 (35% OFF)
- FiveNotes by Apptorium (POL) → now 3,49€/$2.99 (40% OFF)
RSC CICAG website
The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups. The new website is now live: http://www.rsccicag.org
Why not have a browse around and let us know what else you would like to see included.
Wolfram|Alpha
Wolfram|Alpha has been updated
Across thousands of domains--with more continually added--Wolfram|Alpha uses its vast collection of algorithms and data to compute answers and generate reports for you. The Wolfram|Alpha App plugs directly into the Wolfram|Alpha supercomputing cloud, computing answers to your questions quickly, efficiently, and without draining your battery.
There are more iPhone/iPad science apps on the Mobile Science Website.
A few thoughts on scientific software
Whilst this website is aimed at providing a resource for Mac-using chemists, regular readers will know that much of the content is platform agnostic and includes much code/software that will be of interest to all scientists.
I recently got a rather sad email
It seems that Third Street Software quietly disappeared, breaking the syncing for Sente (reference management).
I've also heard about a couple of other smaller software developers who are finding life very tough, and it started me thinking about the status of scientific software. After exchanging emails with a number of people in the industry (many thanks for their input) I thought I'd collect a few thoughts on my blog.
You can read it here https://www.macinchem.org/reviews/scientificsoftware/software.php.
BBEdit updated
BBEdit 12.1.5 contains fixes for reported issues. This update does not contain any new features.
The full release notes are available here https://www.barebones.com/support/bbedit/notes-12.1.5.html.
KNIME update
What’s New in KNIME Analytics Platform 3.6.
- KNIME Deep Learning
- Constant Value Column Filter
- Numeric Outliers
- Column Expressions
- Scorer (JavaScript)
- Git Nodes
- Call Workflow (Table Based)
- KNIME Server Connection
- Text Processing
- Usability Improvements
- Connect/Unconnect nodes using keyboard shortcuts
- Zooming
- Replacing and connecting nodes with node drop
- Node repository search
- Usability improvements in the KNIME Explorer
- Copy from/Paste to JavaScript Table view/editor
- Miscellaneous
- Performance: Column Store (Preview)
- Making views beautiful: CSS changes
- KNIME Big Data Extensions
- Create Local Big Data Environment
- KNIME H2O Sparkling Water Integration
- Support for Apache Spark v2.3
- Big Data File Handling Nodes (Parquet/ORC)
- Spark PCA
- Spark Pivot
- Frequent Item Sets and Association Rules
- Previews
- Create Spark Context via Livy
- Database Integration
- Apache Kafka Integration
- KNIME Server
- Management (Client Preferences)
- Job View (Preview)
- Distributed Executors (Preview)
- General release notes
- JSON Path library update
- Java Snippet Bundle Imports
I suspect it will be KNIME Deep Learning that catches the eye: the ability to set up deep learning models using drag and drop, use regular TensorFlow models within KNIME Analytics Platform, and seamlessly convert from Keras to TensorFlow for efficient network execution.
The new Create Local Big Data Environment node creates a fully functional local big data environment including Apache Spark, Apache Hive and HDFS. It allows you to try out the nodes of the KNIME Big Data Extensions without a Hadoop cluster.
Results from Avogadro Survey
The results of the Avogadro 2018 Community Survey are now in.
Avogadro is an advanced 3D molecule editor and visualizer designed for cross-platform use in computational chemistry, molecular modeling, bioinformatics, materials science, and related areas. It offers flexible high quality rendering and a powerful plugin architecture.
The results are well worth browsing through, but here are a few things I've picked out:
- The most common way people hear about Avogadro is by word of mouth.
- Most people install downloaded binaries
- Many users can code, mainly Python
- Most tasks performed centre around initial molecule building and editing
You can download from sourceforge here https://sourceforge.net/projects/avogadro/files/latest/download
The use of augmented reality in chemistry
A couple more examples of the use of augmented reality to display chemistry
This also looks interesting.
MoleculAR - sneak peak on an augmented reality app to help organic chemistry students visualise molecules in 3D, using just their lecture notes and mobile devices! pic.twitter.com/NOa9Q3bAYZ
— Mark Coster (@MarkCoster_Chem) July 8, 2018
Touching proteins with virtual bare hands
….A more accessible and intuitive visualization of the three-dimensional configuration of the atomic geometry in the models can be achieved through the implementation of immersive virtual reality (VR). While bespoke commercial VR suites are available, in this work, we present a freely available software pipeline for visualising protein structures through VR. New consumer hardware, such as the HTC Vive and the Oculus Rift utilized in this study, are available at reasonable prices….
https://doi.org/10.1007/s10822-018-0123-0
ChEMBL 24 predictive models
ChEMBL was recently updated to version 24. The update contains:
- 2,275,906 compound records
- 1,828,820 compounds (of which 1,820,035 have mol files)
- 15,207,914 activities
- 1,060,283 assays
- 12,091 targets
- 69,861 documents
In addition, today they released the predictive models built on the updated database. They can be downloaded from the ChEMBL ftp server ftp://ftp.ebi.ac.uk/pub/databases/chembl/target_predictions
There are 1569 models.
Tips & Tricks for Using KNIME
The KNIME blog has a post containing lots of user-submitted tips and tricks.
Ever sat next to a friend or colleague at the computer and were awed when you suddenly realised the way they do certain tasks is much better? We recently asked KNIME users to share their tips and tricks on using KNIME. In this series of posts we’ll be showing you how the experts use KNIME in the hopes that by sharing ideas you’ll discover some handy techniques.
AI in Chemistry meeting report
RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry
Friday, 15th June 2018 - Royal Society of Chemistry at Burlington House, London, UK
Post-event Report on Speaker Presentations, written by Bursary Awardees
http://www.maggichurchouseevents.co.uk/bmcs/Downloads/Archive/AI%20-%20post-event%20report.pdf
Added MestReNova to Mobile apps
I've just added MestReNova to the mobile science site.
MestReNova is an iPad app for viewing and manipulating NMR spectra.
There are an increasing number of spectroscopy apps available.
1 million page views
I was looking at the website stats and I just noticed that last month the site passed 1 million page views since it was changed to the current format. I'm delighted (and slightly surprised) that the site has proved to be so popular.
The top 5 most popular pages are:
EzMol
EzMol - An easy to use simple molecular graphics program
EzMol aims to fill a quite different role to that delivered by superb programs such as PyMol and Chimera. EzMol is designed at the occasional user and provides a step-by-step wizard to rapidly generate an image for inspection and publication. For example, residue selection, colouring and labelling using a paint-box approach so no typing of commands
You can read more here DOI.
Mnova 12.0.3 (minor release)
Update
IUPAC Name
- Able to name molecules with atoms in non-standard valence
- Implement skeletal replacement (“a”) nomenclature for heteropolycyclic ring systems
- Naming of branched ring assemblies
- Correct names of several suffix groups
- Able to name ring assemblies of 3-6 identical cyclic systems
MS
- Precursors m/z values displayed in the MSn extracted spectra title
Full details here http://resources.mestrelab.com/whats-new-mnova-12-0-3/
There is a review of Mnova here
Updated Conda
I've been checking a few things since I updated. One thing that was immediately apparent was the similarity maps in RDKit are much nicer! As you can see from the output of the HERG prediction.
Feel like I got something for free.
Add mathematical equations to your document in Pages, Numbers, and Keynote
I previously mentioned that there is LaTeX and MathML support in Pages and iBooks Author. This has now been extended to Numbers and Keynote.
Add mathematical equations to your document in Pages, Numbers, and Keynote https://support.apple.com/en-us/HT207569.
You can include mathematical expressions and equations in your Pages, Numbers, or Keynote document when you use LaTeX commands or MathML elements.
Electronic Lab Notebooks
This looks useful: a comparison of electronic lab notebooks.
The Electronic Lab Notebook Matrix has been created to aid HMS researchers in the process of identifying a usable Electronic Lab Notebook solutions to meet their specific research needs. Through this resource, researchers can compare and contrast the numerous solutions available today, and also explore individual options in-depth.
Updating conda
I've been putting off doing any updates until I finished a substantial piece of work, but now I have time so wish me luck.
conda update -n root conda
conda update --all
Accessing a Jupyter Notebook HERG model from Vortex
A recent paper "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI described a classification model for HERG activity. I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub (https://github.com/AGPreissner/Publications).
The models were downloaded and the Random Forest Jupyter Notebooks (using RDKit) were modified to save the generated predictive model using pickle. Another Jupyter notebook was then created to load the saved model, avoiding the need to rebuild it each time. This notebook was exported as a Python script to allow command line access, and Vortex scripts were created that allow the user to run the model within Vortex, import the results and view the most significant features.
All models and scripts are available for download.
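The save-once, load-many pattern described above can be sketched with the standard library pickle module. This is only a minimal illustration, not the actual notebook code: the function names, the file name and the stand-in dictionary are all hypothetical, and in the real workflow the object being saved would be the trained Random Forest model. Note that pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialise a trained model object to disk so it need not be rebuilt."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a previously saved model object (only unpickle trusted files)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Hypothetical stand-in for a trained classifier; any picklable object works.
    stand_in_model = {"name": "example_rf", "n_estimators": 100}
    model_path = os.path.join(tempfile.gettempdir(), "example_model.pkl")
    save_model(stand_in_model, model_path)
    restored = load_model(model_path)
    print(restored == stand_in_model)
```

Exporting the loading notebook as a script then only needs the load step, which is what makes command line access quick.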
Schrödinger Software Release 2018-2
Schrödinger have announced a major update to their software suite.
Full details are here https://www.schrodinger.com/newfeatures
BioSolveIT update SeeSAR and more.
BioSolveIT have announced significant changes and improvements in SeeSAR resulting in another major release to version 8. The biggest change is that they now provide full protein visualization support. While the focus of the tool is for the most part still on the defined binding site, you can now...: see the whole protein in all its glory! As always, a major update means that HYDE scores must be re-calculated to stay in line with the changes made in the underlying structures. We certainly believe that these enhancements are well worth it:
- improved alignment
- full protein support in the sequence view
- search&find specific amino acids, waters or other protein components
- all protein visualization controls bundled
- enhanced pharmacophore handling
- fragment growing for covalent binders
For details see: https://www.biosolveit.de/SeeSAR/changes.html
They also have two new tools:
REALSpaceNavigator is the world's largest, ultra-fast searchable chemical space developed in collaboration with Enamine Ltd. It comprises roughly 3.8 billion compounds today, which will be delivered on demand in less than 4 weeks with an exceptional success rate of 80% and above.
PepSee is a software tool for interactive, visual compound prioritization as well as the design of next-generation peptide therapeutics. Peptide design ideally supports a multi-parameter optimization to maximize the likelihood of success. PepSee visualizes the relevant parameters at hand, side by side with the sequence data. Color-coded display stimulates SAR exploration. The main features of PepSee comprise:
- comfortable sequence & data import (from Excel, FASTA, PLN, Text, even PDF)
- automated as well as manual sequence alignment
- various data coloring and plotting options
- organizing and annotating your compounds
- interactive design of novel peptides
Data Creator Updated
One of the things that I’m occasionally asked for is a test data set that can be used to evaluate an application. Whilst I keep a couple of data sets that I can use, perhaps Data Creator will provide a more comprehensive solution. Data Creator is an application designed to fill this important niche: it can be used to build very large data sets using field types defined by the user and then filled with random realistic content.
Data Creator can create sample tables (rows and columns) as you like and fill them with pseudo-random proper content (rows of content) with a single click. You can select which kind of fields (columns) you like (name of animals, colors, fruits, english surname, german names and so on with over 50 different kind of data) and have all the contents filled for how many rows you like in a click. It can export to Comma separated value, Tab separated values, html tables, even web pages ready to click or in any custom format you like.
The latest update brings a couple of bug fixes and the following changes:
- New type 'Decimal Number in Range' to support many requested formats such as currency (example: $ 1.99)
- Improved error detection of data formatting
- Optimized for macOS 10.12 Sierra
There is a review of DataCreator here.
A Review of MNova NMR
MNova NMR is Mestrelab Research’s NMR analysis program that can be used to quickly view, process and analyse both 1D and 2D spectra, as well as to easily produce publication quality assignments and images. The software can be downloaded from Mestrelab’s website (45-day free trial licences are available).
You can read the review here
Mobile Science Apps
I just checked the most upvoted apps on the Mobile Science site
https://www.macinchem.org/mobilescience/upvoted/.
ChemDoodle still tops the list but Medicinal Chemistry Toolkit and Elemental are picking up votes as is WolframAlpha. The newly updated Findings lab notebook also remains popular.
The virtual reality macromolecule viewer Learning MacroMols VR is also popular.
A quick look at CypReact
Sometimes you just want to know which enzymes are likely to be involved in the metabolism of a molecule. CypReact DOI takes a structure (SMILES or SDF input) and predicts whether the molecule will react with any one of nine of the most important human cytochrome P450 (CYP450) enzymes [CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP2E1, or CYP3A4]. Read more here.
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
Greg Landrum's ICCS 2018 presentation on slideshare
Scaling Python with Dask webinar
This looks to be an interesting webinar on Dask
https://know.anaconda.com/Scaling-Python-Dask-Webinar.html Wednesday, May 30th at 2:00PM CDT.
Dask is a flexible parallel computing library for analytic computing.
Dask is composed of two components:
- Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
- “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
Unix commands for helping deal with very large files
I'm regularly handling very large files containing millions of chemical structures, and whilst BBEdit is my usual tool for editing text files, in practice it becomes rather cumbersome for really large files (> 2 GB). For these cases I've compiled a useful list of UNIX commands that make life easier.
The page is part of the Hints and Tutorials section and can be viewed here.
Whilst I use them when dealing with large chemical structure files they are equally useful when dealing with any large text or data files.
Updated
A suggestion from a reader. Sometimes, rather than one large file, download sites provide the data as a large number of individual files. We can keep track of the number of files using this simple command.
MacPro:~ Chris$ ls | wc -l
177248
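If you would rather do the same count from a Python script, a minimal sketch might look like the following. The function name is hypothetical, and it assumes that, like a plain ls, you want to skip hidden entries whose names start with a dot.

```python
import os


def count_entries(directory="."):
    """Count the non-hidden entries in a directory, similar to `ls | wc -l`."""
    with os.scandir(directory) as entries:
        # scandir yields entries lazily, so very large directories are handled
        # without building a full list of names in memory.
        return sum(1 for entry in entries if not entry.name.startswith("."))


if __name__ == "__main__":
    print(count_entries("."))
```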
If anyone has any additional suggestions please feel free to submit them.
Implementing AB-MPS scoring
Whilst the rule of 5 (Ro5) has provided a useful way to describe small molecule drug space, it is also clear that there are a significant number of molecular classes that exist beyond the rule of 5 boundaries (bRo5). In a review of the AbbVie compound collection DOI the authors identified key findings that might explain the success (or failure) of bRo5 projects. From an analysis of a variety of calculated physicochemical properties they devised a simple multiparametric scoring function (AB-MPS) that correlated preclinical PK results with cLogD, number of rotatable bonds (NRB), and number of aromatic rings (NAR).
AB-MPS = Abs(cLogD-3) + NAR + NRB
Now implemented as a Vortex script.
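The formula above is simple enough to sketch as a plain Python function. This is not the Vortex script itself, just an illustration of the arithmetic. The function and argument names are my own, and it assumes cLogD, the aromatic ring count and the rotatable bond count have already been calculated elsewhere.

```python
def ab_mps(clogd, n_aromatic_rings, n_rotatable_bonds):
    """Return AB-MPS = Abs(cLogD - 3) + NAR + NRB for one molecule."""
    return abs(clogd - 3.0) + n_aromatic_rings + n_rotatable_bonds


if __name__ == "__main__":
    # Hypothetical property values, for illustration only.
    print(ab_mps(clogd=4.5, n_aromatic_rings=3, n_rotatable_bonds=8))
```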
Chemical Information and Computer Applications Group (CICAG) website
The new RSC CICAG website is now live at http://www.rsccicag.org. Why not have a look and provide suggestions and feedback?
The Chemical Information and Computer Applications Group (CICAG) is one of the RSC’s many member-led Interest Groups, which exist to benefit RSC members and the wider chemical science community.
The site also provides links to the social media feeds (Twitter, LinkedIn etc.).
Intel® Distribution for Python
Anyone fancy taking this for a test drive and providing some information on performance?
Get real performance results and download the free Intel Distribution for Python that includes everything you need for blazing-fast computing, analytics, machine learning, and more. Use Intel Python with existing code, and you’re all set for a significant performance boost.
The core computing packages, Numpy, SciPy, and scikit-learn, are accelerated under the hood with powerful, multithreaded native performance libraries such as Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, and others, to deliver native code-like performance results to Python. We leverage Intel® hardware capabilities using multiple cores and the latest Intel® Advanced Vector Extensions (Intel® AVX) instructions, including Intel® AVX-512. The Intel Python team reimplemented select algorithms to dramatically improve their performance. Examples include NumPy FFT and random number generation, SciPy FFT, and more.
Available for Windows, Linux and macOS.
Minimum System Requirements
- Processors: Intel Atom® processor or Intel® Core™ i3 processor
- Disk space: 1 GB
- Operating systems: Windows* 7 or later, macOS, and Linux
- Python* versions: 2.7.X, 3.5.X, 3.6
- Included development tools: Conda, conda-env, Jupyter Notebook (IPython)
Diversity Genie
Diversity Genie is a desktop software tool which allows you to analyze and manipulate chemical data. Its capabilities include:
- Mapping molecules and their properties with Sammon embedding.
- Filtering and converting sets of molecules in SDF, SMILES, and InChI formats.
- Plotting histograms, scatter plots, and ROC curves.
- Computing well-known molecular properties and merging CSV files.
- Creating machine learning models using powerful gradient boosting methods.
Diversity Genie 3 is completely free to use by academia and for personal non-commercial use. You can download Mac OSX, Windows and Linux builds at
http://www.diversitygenie.com/index.html
CCP4 release 7.0 update 056 now available
Collaborative Computational Project No. 4 (CCP4) exists to produce and support a world-leading, integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography, and other biophysical techniques.
Details of the latest update are here https://twitter.com/ccp4_mx/status/991256632729403392
Google Summer of Code, Open Chemistry Projects
The details of some of the projects taking part in the Google Summer of Code are now online here https://summerofcode.withgoogle.com/organizations/6513013473935360/ under the Open Chemistry header.
Really interesting work includes 3-D coordinate generation, standardising fingerprint APIs, a framework for molecular validation and standardization, and molecular dynamics in Avogadro.
Good luck to all that are taking part!!
deMon2k code version 5 released
deMon (density of Montréal) is a software package for density functional theory (DFT) calculations. It uses the linear combination of Gaussian-type orbital (LCGTO) approach for the self-consistent solution of the Kohn-Sham (KS) DFT equations. The calculation of the four-center electron repulsion integrals is avoided by introducing an auxiliary function basis for the variational fitting of the Coulomb potential.
The user guide provides installation instructions. Installation requires a Fortran compiler, BASH and MPI.
ChemDoodle 9.0 released
I just saw that ChemDoodle 9.0 has been released and I plan to have a detailed look later this month.
ChemDoodle 9 is a major revision of every aspect of the software. We spent over 2 years overhauling and improving the cheminformatics engine, interface, drawing controls, image and chemical file types, graphics, and operating system compatibility. In addition to the new features, the entire codebase has been refactored for the current best standards to take advantage of the latest performance, memory and security features of the operating system.
What is new in ChemDoodle 9
- A new user manual discusses all the new features in detail over several pages, too many to list here. (click to load manual, section 1.2)
- Drawing and Graphics – Tons of new systems for making your graphics quicker. Auto-placement of attributes (charges/radicals/stereocenters/etc.). An improved text tool that can create both atom text and formatted captions. Draw chiral carbon nanotubes in addition to zigzag and armchair. New dynamic brackets and structure highlights. Better drawing tools for advanced figures.
- Chemistry – State-of-the-art implementation of the most recent CIP rules. A clearer and more powerful warning system. Advanced implicit hydrogen handling including the analysis of advanced aromatic resonance systems. Full support for the latest elements as defined by IUPAC and much more!
- Interface – A brand new customizable cursor system, improved IUPAC name-to-structure interface. Improved color palettes, now with Rasmol, CPK and Custom color sets. HTTPS support for PubChem is now implemented for access in MolGrabber. Improved color choosers including alpha support and high resolution improvements across the entire application.
- Chemical Files – The Nature style sheet has been added. SMILES interpretation has seen significant work, with a focus on very advanced cheminformatics techniques. Added support for the RCSB MacroMolecular Transmission Format (MMTF). More support for ChemDraw, MDL CT, MRV and ISIS/Sketch files.
- Images – TIFF images can now be exported with custom DPI settings. GIF image output can now have semi-transparent pixels merged with white. Added viewBox attribute for SVG. When saving files, you can now use alternate extensions and other image file chooser improvements. Control which image file types are shown in the save image choosers.
- Vector Art – New glassware graphics have been added as well as dozens of new BioArt.
- Customizability – The keyboard and tools shortcuts are now fully customizable by the user. The user settings folder location can now be controlled. Custom attribute names and values are now persisted through restarts.
- Windows – Full support for high-DPI screens, without the manual scaling required in the past. The OLE plugin has been rebuilt for the most current compliance with Windows libraries.
- macOS – Improved and full Retina support. Native file choosers.
Jupyter and Fortran
Well, after my last post about Swift and Jupyter a reader sent me a link to the use of both Julia and Fortran programming languages in a Jupyter Notebook.
More information in this lecture Project Jupyter: Architecture and Evolution of an Open Platform for Modern Data Science by Fernando Perez.
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to provide tools for human-in-the-loop interactive computing. It provides protocols, file formats, libraries and user-facing tools optimized for the task of humans interactively exploring problems with the aid of a computer, combining natural and programming languages in a common computational narrative.
Swift 4.1 in a Jupyter Notebook
I'm a great fan of Jupyter Notebooks but I only ever use Python.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
A recent post by Ray Yamamoto Hilton caught my eye: he put together a little experiment to demonstrate using Swift 4.1 from within Jupyter Notebooks.
You can download a demo notebook here.
Amber 18 and AmberTools 18 released
Amber is a suite of biomolecular simulation programs. It began in the late 1970s and is maintained by an active development community.
Amber 18 major new features include:
- Free energy calculations on GPUs
- GPU support for 12-6-4 ion potentials
- Domain decomposition for CPU-parallelism
- Nudged elastic band calculations for pmemd (CPU and partial GPU implementation)
- Constant redox potential calculations, to supplement constant pH simulations
- Support and significant performance improvements for the latest Maxwell, Pascal and Volta GPUs from NVIDIA.
- New pmemd.gem code for advanced force fields, including AMOEBA
AmberTools 18 new features include
- CUDA-enabled pbsa solver; extensions for membrane modeling with PB
- lambda-dynamics method for constant pH simulations
- packmol_memgen tool for building lipids and bilayers
- New ("middle") integration algorithms in sander
- Build tools based on CMake
- Continued updates and extensions to cpptraj:
  - ability to obtain energies from snapshots of PME simulations
  - Pairlist and other speedups
  - improved scripting abilities
Instructions for installing Amber under Mac OSX are here http://ambermd.org/Installation.php
You will need to install gfortran. Whilst you can download the binary, it might be worth considering using Homebrew as described here.
NWChem updated
Just catching up.
NWChem 6.8 is now available on Github https://github.com/nwchemgit/nwchem.
NWChem provides many methods for computing the properties of molecular and periodic systems using standard quantum mechanical descriptions of the electronic wavefunction or density. Its classical molecular dynamics capabilities provide for the simulation of macromolecules and solutions, including the computation of free energies using a variety of force fields. These approaches may be combined to perform mixed quantum-mechanics and molecular-mechanics simulations.
Instructions for compiling NWChem on various platforms, including Mac OSX, are here https://github.com/nwchemgit/nwchem/wiki/Compiling-NWChem.
STK: A Python Toolkit for Supramolecular Assembly
I bookmarked this paper a while back but have only just had time to read it through, STK: A Python Toolkit for Supramolecular Assembly. STK is a tool for the automated assembly, molecular optimization and property calculation of supramolecular materials. It has a simple Python API and integration with third party computational codes.
The source code of the program can be found at https://github.com/lukasturcani/stk and the detailed documentation is here.
Additional linking functional groups can be defined as SMARTS and STK can be extended by adding additional optimisation force-fields.
Top 12 unix commands for data scientists.
A really useful post on KDnuggets.
With the beautiful intuitive interface it is sometimes easy to forget that Mac OS X has unix underpinnings and that the Terminal gives access to a whole set of invaluable tools.
This post is a short overview of a dozen Unix-like operating system command line tools which can be useful for data science tasks. The list does not include any general file management commands (pwd, ls, mkdir, rm, ...) or remote session management tools (rsh, ssh, ...), but is instead made up of utilities which would be useful from a data science perspective, generally those related to varying degrees of data inspection and processing. They are all included within a typical Unix-like operating system as well.
If you regularly have to deal with very large data files some of these commands will be invaluable, for example:
head outputs the first n lines of a file (10, by default) to standard output. The number of lines displayed can be set with the -n option.
head -n 5 myfile.txt
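If you want the same quick look from inside a script, a rough Python equivalent is easy to sketch. The helper name `head` and its signature are my own choices for illustration, not something from the KDnuggets post, and the sketch assumes a UTF-8 text file.

```python
from itertools import islice


def head(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # islice stops after n lines, so a very large file is never loaded whole.
        return [line.rstrip("\n") for line in islice(handle, n)]
```

Calling `head("myfile.txt", 5)` returns the first five lines as a list of strings, which is handy for checking column headers before committing to a full parse.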
Review of MOE 2018.01
The 2018.01 release of Chemical Computing Group's Molecular Operating Environment (MOE) software includes a number of new features, enhancements and changes. I have written a review that highlights a number of the features.
Roundtrip editing with ChemDraw 17.1
Whenever there is an update to ChemDraw I always hold my breath to see if round-trip editing (i.e. the ability to copy and paste from a chemical drawing package into Word for example and then be able to copy and paste the structure back from Word into the chemical drawing application) has been broken.
Fortunately this blog post provides an invaluable update to the current situation.
RDKit code changes
I just saw this on the RDKit email circulation list and since I know a number of readers use RDKit I thought I'd mention it.
When we do the beta for the 2018.03.1 release we're going to switch the C++ backend to use modern C++ (=C++11). For people who can't switch to use that code, we will continue to provide bug fixes for the 2017.09 release for at least another 6 months.
This should only affect people who need to build the RDKit C++ code themselves. If you use a binary version of the RDKit like the ones available inside of Anaconda Python or KNIME, this change should have no impact upon you.
It looks like we're almost there. Hopefully we will be able to do a beta of the 2018.03 release by the end of the week.
Updated Literature search script
I've updated the Vortex script to run text based queries of PubMed.
If you regularly use the E-utilities API you might want to read this.
After May 1, 2018, NCBI will limit your access to the E-utilities unless you have one of these keys. Obtaining an API key is quick, and simple, and will allow you to access NCBI data faster. If you don’t have an API key, E-utilities will still work, but you may be limited to fewer requests than allowed with an API key.
After May 1, 2018, any computer (IP address) that submits more than 3 E-utility requests per second will receive an error message. This limit applies to any combination of requests to EInfo, ESearch, ESummary, EFetch, ELink, EPost, ESpell, and EGquery.
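One way to stay under that limit is to space requests out on the client side. The sketch below is a minimal example rather than anything provided by NCBI: the `Throttle` class and its parameters are hypothetical names, and it simply enforces a minimum gap between successive calls.

```python
import time


class Throttle:
    """Space out calls so that roughly max_per_second are made at most."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the minimum interval since the previous call has passed."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling `wait()` on a shared `Throttle` instance immediately before each request keeps calls at least a third of a second apart. Setting `max_per_second` a little below 3 leaves some headroom.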
If you write software or scripts that access the E-utilities API then the users will need to get their own API key. Calls will have this format:
https://www.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&api_key=ABCD123
I've updated this script to reflect this change, and I've highlighted where you need to add your api key in the script. I've also tried to ensure that any query string should be encoded to make it URL safe and I've extended the search range up to 2018.
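As a rough illustration of this kind of call, here is a minimal sketch of building a URL-safe ESearch request with an API key using only the Python standard library. It is not the Vortex script itself: the function name `build_esearch_url` is hypothetical, `ABCD123` is just the placeholder key from the example URL, and the date values are assumptions.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://www.ncbi.nlm.nih.gov/entrez/eutils/"


def build_esearch_url(term, api_key, mindate=None, maxdate=None, retmax=20):
    """Build a PubMed ESearch URL with the query string safely encoded."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "api_key": api_key}
    if mindate and maxdate:
        # Restrict by publication date; both ends of the range are required.
        params.update({"datetype": "pdat", "mindate": mindate, "maxdate": maxdate})
    # urlencode escapes spaces, ampersands, etc. so free-text queries are URL safe.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(build_esearch_url("kinase inhibitor", "ABCD123", mindate="2017", maxdate="2018"))
```

Building the query from a dictionary means a search term containing characters such as `&` cannot accidentally be read as an extra parameter.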
iRASPA: GPU-accelerated visualization software for materials scientists
A recent publication DOI describes a new application for materials science.
A new macOS software package, iRASPA, for visualisation and editing of materials is presented. iRASPA is a document-based app that manages multiple documents with each document containing a unique set of data that is stored in a file located either in the application sandbox or in iCloud drive. The latter allows collaboration on a shared document (on High Sierra). A document contains a gallery of projects that show off the main features, a CloudKit-based access to the CoRE MOF database (approximately 8000 structures), and local projects of the user. Each project contains a scene of one or more structures that can initially be read from CIF, PDB or XYZ-files, or made from scratch. Main features of iRASPA are: structure creation and editing, pictures and movies, ambient occlusion and high-dynamic range rendering, collage of structures, (transparent) adsorption surfaces, cell replicas and supercells, symmetry operations like space group and primitive cell detection, screening of structures using user-defined predicates, and GPU-computation of helium void fraction and surface areas in a matter of seconds. Leveraging the latest graphics technologies like Metal, iRASPA can render hundreds of thousands of atoms (including ambient occlusion) with stunning performance.
iRASPA is available from the Mac App Store.
SeeSAR updated
A new version of SeeSAR (7.3) is available. This update includes:
- Easy mode switching: from the molecules table to the editor or the inspirator and back in just one click...
- Automated workflows: in the settings you can now decide about which calculations should happen automatically
- Menus re-organized: buttons are grouped for better overview and almost all table entries obtained a convenient context menu, simply right-click to give it a try
- Excel export: this is one of the rather hidden Easter Eggs. Besides SDF you may save tables now as XLSX (including the 2D depiction)
- Saved settings: user settings (the layout, background color, etc.) are now saved separately from project settings (filters and visualization features)
Full release notes are available.
RDKit in SAMSON
I've posted about SAMSON a couple of times and it just keeps getting better and better.
SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and means "Software for Adaptive Modeling and Simulation Of Nanosystems".
A recent blog post highlights the use of RDKit in SAMSON.
In this post I will present you the RDKit-SMILES Manager module that I integrated in the SAMSON platform. As some of you know, RDKit is an open source toolkit for cheminformatics which is widely used in the bioinformatics research. One of its features is the conversion of molecules from their SMILES code to a 2D and 3D structures. Thanks to the new SAMSON Element, it is now possible to use these features in the SAMSON platform. SMILES code files (.smi) or text files (.txt) containing several SMILES codes can be read using the import button.
The new module allows you to import a file containing SMILES strings and generate 2D depictions. By right-clicking on these images you can open them, generate the 3D structure in SAMSON, or save the image as PNG or SVG.
It is also possible to run substructure searching using SMARTS.
Introducing IBM Watson Services for Core ML
The ability to access IBM Watson capabilities should be an interesting development for those building scientific apps for iOS.
With Watson Services for Core ML, it’s easy to build apps that access powerful Watson capabilities right from iPhone and iPad, so you can provide dynamic, intelligent insights that improve over time. And with the IBM Cloud Developer Console for Apple, you can quickly tap into Watson Services for Core ML and other services on IBM Cloud.
To get you started there is a project on GitHub https://github.com/watson-developer-cloud/visual-recognition-coreml.
Classify images with Watson Visual Recognition and Core ML. The images are classified offline using a deep neural network that is trained by Visual Recognition.
There is a database of Mobile apps for science.
Chemistry WebVR:- This is so cool
Jonas Bostrom, who spoke at the Chemistry on Mobile Devices Meeting, just sent me a link to EduChem VR - WebVR highlighting the use of virtual reality in chemistry.
"Chemistry WebVR" is a web-based platform to learn about organic chemistry. You can experience important concepts like stereochemistry, molecular geometries, atomic orbitals or reaction mechanisms in virtual reality. It is user-friendly and works directly in your smartphone browser. The target is university courses and advanced high-school levels.
There is a demo of an SN2 reaction here and if you explore you will see a link to sign up as a beta tester.
mmpdb: An Open Source Matched Molecular Pair Platform for Large Multi-Property Datasets
An interesting paper on ChemRxiv DOI.
Matched Molecular Pair Analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property datasets. Here we present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from https://github.com/rdkit/mmpdb
NMR solvent peaks
I just noticed this mentioned on Twitter and so I've added it to the Mobile Science site.
NMR Solvent peaks is a conveniently-searchable version of the ungainly table of NMR data most organic chemists keep a copy of nearby. Instead of searching through the table for a peak near your unidentified peak, just enter your solvent and the peak's multiplicity and location and you'll have a short list of candidate impurities
There is also a web-based version and a twitter feed for submitting bugs and finding out about updates.
There are a number of other NMR apps available.
WWDC 2018
The Apple Worldwide Developers Conference takes place in San Jose, CA, June 4–8. The opportunity to buy tickets to WWDC18 is offered by random selection. Registration is open until Thursday, March 22, 2018 at 10:00 a.m. PDT
To register, you must be a member of the Apple Developer Program or Apple Developer Enterprise Program as of March 13, 2018 at 10:00 a.m. PDT, and agree to the WWDC18 Registration and Attendance Policy. Your membership must be current, valid, and in good standing from this date until the end of WWDC18.
Flagging Potential Kinase Inhibitors
Most kinase inhibitors bind in the region of the ATP binding site using the hydrogen bonding interactions of the hinge region shown in the schematic below. We can use the knowledge of these hinge binding motifs to flag potential kinase inhibitors.
BBEdit 12.1.2 Released
BBEdit 12.1.2 is a minor update to my favourite text editor.
From the release notes.
There's a new item in the Application preferences, as part of the software update settings: "Early Access". You can use this to turn on (or off) notification of pre-release maintenance updates for the version of BBEdit that you're using. (Note that even if you turn on Early Access, you will not receive notice of pre-release versions of feature updates or major upgrades.)
A new setting in the "Editing" preferences allows you to control whether tick marks appear in the scroll bar for Live Search matches. Turning this off can be useful if you're working in very large files and have so many results that the application stalls while trying to update the marks.
There are also a number of bug fixes, including:
Fixed bug in which the Markdown tokenizer was confused by empty URL references (e.g. ) in such a way that editing in certain subsequent parts of the file would cause syntax coloring to get out of whack. This change also fixes a bug in the Markdown syntax coloring in which links with an empty description or URL were not properly recognized and colored.
BBEdit 12.1.2 requires Mac OS X 10.11.6 or later, and is compatible with macOS 10.13 "High Sierra".
I use BBEdit extensively for Markdown editing but there are a number of alternatives.
Top 20 programming languages
RedMonk have published their Programming Language Rankings. The data source used for these queries is the GitHub Archive.
- JavaScript
- Java
- Python
- PHP
- C#
- C++
- CSS
- Ruby
- C
- Swift
- Objective-C
- Shell
- R
- TypeScript
- Scala
- Go
- PowerShell
- Perl
- Haskell
- Lua
Swift (+1): Finally, the apprentice is now the master. Technically, this isn’t entirely accurate, as Swift merely tied the language it effectively replaced – Objective C – rather than passing it. Still, it’s difficult to view this run as anything but a changing of the guard. Apple’s support for Objective C and the consequent opportunities it created via the iOS platform have kept the language in a high profile role almost as long as we’ve been doing these rankings. Even as Swift grew at an incredible rate, Objective C’s history kept it out in front of its replacement. Eventually, however, the trajectories had to intersect, and this quarter’s run is the first occasion in which this has happened. In a world in which it’s incredibly difficult to break into the Top 25 of language rankings, let alone the Top 10, Swift managed the chore in less than four years. It remains a growth phenomenon, even if its ability to penetrate the server side has not met expectations.
Three-Dimensional Printing of Ellipsoidal Structures Using Mercury
A recent paper on ChemRxiv:
A description of how to use the Mercury software from the CCDC to print 3-dimensional crystal structures that depict the anisotropic displacement parameters, matching the ellipsoidal depiction commonly used in scientific papers. Details on how to convert a CIF file into a 3D printing data file are included in the main paper, and details on the preparation of that data file for printing on a number of different 3D printers are included in the ESI.
There is more on 3D printing here.
Vortex update
Dotmatics have announced the impending release of the latest update to Vortex.
The focus appears to be on the enhancement of the Vortex bioinformatics tools reviewed previously.
Script Debugger 7 released
A new version of Script Debugger has been released.
Script Debugger is an integrated development environment focused entirely on AppleScript. This focus allows it to deliver a suite of tools that make AppleScript development amazingly productive. You can use it to write and edit code, analyze target applications, debug scripts, and more.
Second Major DeepChem Release
A major update to DeepChem has been announced.
This major version release finishes consolidating the DeepChem codebase around our TensorGraph API for constructing complex models in DeepChem. We've made a variety of improvements to TensorGraph's saving/loading features and added a number of new tutorials improving our documentation of TensorGraph. We've also removed a number of older deprecated submodules and models in favor of the new, standardized TensorGraph implementations.
In addition, we've implemented a number of new deep models and algorithms, including DRAGONNs, Molecular Autoencoders, MIX+GANs, continuous space A3C, MCTS for RL, Mol2Vec and more. We've also continued improving our core graph convolutional implementations.
Also remember the RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Meeting registration is now open.
SAMSON 0.7.0 is available
SAMSON has been updated with a number of cool features; I particularly like the embedded Jupyter console.
SAMSON is a platform for computational nanoscience.
Python scripting is now available! Most of the SAMSON API is exposed in Python, and a Jupyter console embedded in SAMSON allows you to create models and run simulations, generate movies, perform analysis and reporting, etc., directly from scripts.
What’s more, Python makes it even easier to integrate and pipeline SAMSON and SAMSON Elements with well-known packages from diverse fields, e.g. TensorFlow, PyRosetta, RDKit, ASE, etc., to name a few.
Data Analysis tools
I've just added the simple lightweight CSV editor Table Tool to the Data Analysis tools page.
The Data Analysis tools page contains a listing of over 100 applications, tools and libraries that can be used for data analysis under Mac OSX.
OMEGA v3.0.0 released
Conformational analysis is a critical component of molecular modelling and I've always viewed OMEGA from OpenEye as the standard to which all other software packages should be compared.
OMEGA's knowledge-based approach produces high-quality conformers, superior to those of many other methods. It has also been found to be the fastest of commercially available conformer generators. Benchmarking Conformer Ensemble Generators, Friedrich, N.-O.; de Bruyn Kops, C.; Flachsenberg, F.; Sommer, K.; Rarey, M.; Kirchmair, J. J. Chem. Inf. Model. 2017, 57, 2719-2728. DOI.
OMEGA’s capability has been expanded for molecules containing large rings by adding a method specifically tuned to sample macrocyclic conformational space. The approach is based on a rewritten version of the original OMEGA distance geometry algorithm.
In this update support for macOS El Capitan (10.11), macOS Sierra (10.12), and macOS High Sierra (10.13) has been added.
Microsoft Quantum Development Kit Samples and Libraries under MacOSX
Well this is well out of my comfort zone but I thought I'd mention it.
Welcome to the Microsoft Quantum Development Kit! This repository contains the libraries and samples provided with the Quantum Development Kit https://github.com/microsoft/quantum.
The Microsoft Quantum Development Kit has been tested under macOS and Ubuntu Linux, but may work on other distributions. The Python interoperability feature has been developed for the Anaconda distribution of Python 3.6. Please see the README file provided with the Python sample for more details.
Thank you for your interest in Microsoft Quantum Development Kit preview. The development kit contains the tools you'll need to build your own quantum computing programs and experiments.
So off you go…..
Google Summer of Code:- Open Chemistry
There are a number of interesting projects being undertaken in this year's Google Summer of Code.
If you know of any students that might be interested then perhaps point them to the Open Chemistry Project.
The Open Chemistry project is a collection of open source, cross platform libraries and applications for the exploration, analysis and generation of chemical data. The organization is an umbrella of leading projects developed by long-time collaborators and innovators in open chemistry such as the Avogadro, Open Babel, and cclib projects. These three alone have been downloaded over 700,000 times and cited in over 2,000 academic papers. Our goal is to improve the state of the art, and facilitate the open exchange of chemical data and ideas while utilizing the best technologies from quantum chemistry codes, molecular dynamics, informatics, analytics, and visualization.
There is a list of the GSoC Ideas 2018 here but of course students can add their own.
MOE update 2018.01 released
The latest update to Chemical Computing Group's Molecular Operating Environment (MOE) software includes a variety of new features and enhancements.
Windows XP (finally!) and macOS 10.6 have been removed from the list of officially supported platforms. Supported Windows platforms are Vista/7/8/10, and the minimum supported macOS is 10.7 (Lion).
Amber14:EHT Forcefield. The Amber14 parameter set is now supported in MOE. The new parameters consist of improvements to nucleic acids; otherwise, protein and small molecule parameters (and charges) are unchanged. The forcefield can be selected in the MOE | Footer.
TCR-MHC Protein Complex Database. A new MOE Project database containing T-Cell Receptor (TCR) – Major Histocompatibility Complex (MHC) x-ray structures has been added to MOE. The database can be accessed with MOE | Protein | Search | TCR-MHC | TCR-MHC which will launch the MOE Project Search panel.
Several applications have been parallelized to run in the moe -mpu environment:
- Descriptor calculations with the SVL function QuaSAR_DescriptorMDB.
- Energy minimization in the Database Viewer DBV | Compute | Molecule | Energy Minimize.
- Conformational search using MDB input files in MOE | Compute | Conformations | Search.
- Rotamer library generation with DBV | Compute | Build Rotamer Library.
- Project database creation with the SVL run file dbupdate.svl and the scripts $MOE/bin/projupdate and $MOE/bin/projupdate.bat.
I plan to review the latest version of MOE in the near future.
CDD Vault is Now an ELN
CDD Vault ELN is an extension to CDD Vault for archiving and selectively sharing experimental text and data. CDD Vault ELN helps you capture and collaborate around unstructured information (conversations, notes, documents, images, files) and structured data (experimental results, plots, SAR).
You can easily capture and link to a variety of objects in CDD Vault ELN including:
- Images
- File attachments
- Links to CDD Vault & other resources
- Tables
- Structures
Awesome Python Chemistry
A curated list of awesome Python frameworks, libraries, software and resources related to Chemistry.
https://github.com/lmmentel/awesome-python-chemistry
A blog post giving more details is at http://lukaszmentel.com/blog/awesome-python-chemistry/index.html.
VIDA updated
VIDA v4.4.0 has been released. This upgrade adds several new features and fixes many previous issues.
- A new ribbon style that produces ribbons with a smoother appearance has been introduced into VIDA.
- Improvements to the Builder/Sketcher, including:
  - closing the Sketcher window prompts for Save, Save as New, Discard, or Cancel
  - closing the Builder closes the Sketcher window
  - an additional “Save As New” option in the toolbar and Builder context menu
  - hitting Return now finishes adding typed-in molecules from the Sketcher
- Significant improvements to the Extension Manager. In addition, extensions can be centrally deactivated.
VIDA is built on top of the OpenEye Toolkits v2017.Oct libraries to ensure that it and ancillary programs take full advantage of the state-of-the-art improvements in all underlying programming libraries. Support for macOS El Capitan (10.11), macOS Sierra (10.12), and macOS High Sierra (10.13) has been added.
KNIME tutorial
Don't forget to sign up for your chance to hear a webinar by Greg Landrum, KNIME's VP for Life Sciences, this Wednesday. He will be talking about processing malaria HTS results using KNIME and will give a tutorial on workflows developed for ligand-based virtual screening, based on results of a phenotypic HTS against malaria.
Wed, Feb 21, 2018 3:00 PM - 4:00 PM GMT
The Royal Society of Chemistry Chemical Information and Computer Applications Group (CICAG) Winter Newsletter is now available Online
The Winter 2017-18 edition of the CICAG Newsletter has been published and can be downloaded from the Newsletters webpage.
Features in this edition which may be of interest include:
- Details of CICAG's upcoming Artificial Intelligence in Chemistry meeting
- 30th Anniversary celebration of the Catalyst Science Discovery Centre and a look at the scientific history and achievements of the area
- Tony Kent Strix Award and Annual Lecture 2017 and eLucidate from UKeiG
- Other CICAG planned and proposed meetings along with other upcoming conferences and events
- Meeting reports
- Book reviews
- News from Infochem and CAS
- A review of the latest chemical information news and developments
PhD Student and Post-Doc Conference Bursaries
Did you know that most CICAG sponsored meetings have a number of bursaries available for PhD and post-doctoral students? Normally up to a value of £250, these awards help to cover registration and travel costs. Preference will be given to members of the RSC (and meeting co-sponsors if applicable), especially those who are selected to give posters.
RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 Royal Society of Chemistry at Burlington House, London, UK. Twitter hashtag - #RSC_AIChem
Google summer of code chemistry ideas
The Open Chemistry project have collected together project ideas for GSoC 2018. The ideas cover a wide range of topics in chemistry.
The full listing is available here and includes projects that make use of a number of open source toolkits such as Open Babel, RDKit and cclib.
Molecular Materials Informatics Apps
Molecular Materials Informatics, Inc have been busy recently with updates to many of their applications.
The following mobile apps have all been updated:
PolyPharma Poly-pharmacology of molecular structures: use structure activity relationships to view predicted activities against biological targets, physical properties, and off-targets to avoid. Calculations are done using Bayesian models and other kinds of calculations that are performed on the device.
Green Lab Notebook allows recording of multistep chemical reactions, using molecular structure, name and stoichiometry as the primary components. When quantities are provided, interconversions are calculated automatically, and green chemistry metrics are shown.
SAR Table app is designed for creating tables containing a series of related structures, their activity/property data, and associated text. Structures are represented by scaffolds and substituents, which are combined together to automatically generate a construct molecule. The table editor has many convenience features and data checking cues to make the data entry process as efficient as possible.
MolPrime is a chemical structure drawing tool based on the unique sketcher from the Mobile Molecular DataSheet (MMDS).
Approved Drugs app contains over a thousand chemical structures and names of small molecule drugs approved by the US Food & Drug Administration (FDA). Structures and names can be browsed in a list, searched by name, filtered by structural features, and ranked by similarity to a user-drawn structure. The detail view allows viewing of a 3D conformation as well as tautomers. Structures can be exported in a variety of ways, e.g. email, twitter, clipboard.
Green Solvents reference card for chemical solvents, with data regarding their "greenness": safety, health and environmental effects.
For the desktop the OS X Molecular DataSheet (XMDS) is an interactive cheminformatics tool for viewing and editing molecular structures, chemical reactions and data. It is designed to be instantly intuitive to anyone who has used a Mac, a spreadsheet and any chemical structure sketcher.
BBEdit 12 is now 64-bit
To call BBEdit a text editor is a great injustice; it is the Swiss army knife of text editors and I use it constantly.
The latest update has a major change: BBEdit is now 64-bit. This comes with several advantages, as the release notes describe.
BBEdit is now built as a 64-bit application. This works around various reported bugs in the OS and has other beneficial side effects: the application starts more quickly on a "cold" launch; 64-bit color pickers and contextual-menu plug-ins are now available; and our customers are even more handsome and athletic than before.
Beginning with this version, you can open documents that are much larger than was previously possible. In the Before Time, documents whose in-memory size (about twice the on-disk size) exceeded roughly 1.5GB would fail to open and report an out-of-memory error, as would documents whose internal structure required generation of large quantities of syntax coloring and/or code folding information (such as complicated XML documents). Beginning with this version, you can perform many large-scale operations on very large files without running out of memory or needing to clear Undo state. Support for the Touch Bar has been added to various windows (applicable only to computers that have a Touch Bar, of course).
There are many more updates and fixes described in detail in the release notes.
BBEdit 12 requires macOS 10.11.6 ("El Capitan") or later, and is compatible with macOS 10.13 "High Sierra".
If you are using macOS 10.13 "High Sierra", please make sure that you have updated to the latest available OS version (10.13.3 or later).
MayaChemTools
MayaChemTools is a fabulous collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.
The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:
- Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
- Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
- Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
- Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
- Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
- Similarity searching and calculation of similarity matrices using available 2D fingerprints
- Listing properties of elements in the periodic table, amino acids, and nucleic acids
- Exporting data from relational database tables into text files
The command line Python scripts based on RDKit provide functionality for the following tasks:
- Calculation of molecular descriptors
- Comparison of 3D molecules based on RMSD and shape
- Conversion between different molecular file formats
- Enumeration of compound libraries and stereoisomers
- Filtering molecules using SMARTS, PAINS, and names of functional groups
- Generation of graph and atomic molecular frameworks
- Generation of images for molecules
- Performing structure minimization and conformation generation based on distance geometry and forcefields
- Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
- Removal of duplicate molecules
These invaluable scripts can be used in other applications; I've written a Vortex script that uses them.
Artificial Intelligence in Chemistry
I mentioned the first announcement of a meeting to be held next year.
RSC-BMCS / RSC-CICAG Artificial Intelligence in Chemistry Friday, 15th June 2018 Royal Society of Chemistry at Burlington House, London, UK.
Twitter hashtag - #RSC_AIChem
A number of the speakers have now been confirmed.
Confirmed Speakers
Keynote: What I learned about machine learning - revisited (Bob Sheridan, Merck)
Presentation title to be confirmed (Nadine Schneider, Novartis)
Scaling de novo design, from single target to disease portfolio (Willem van Hoorn, Exscientia)
Presentation title to be confirmed (Marwin Segler, Benevolent AI)
Molecular de novo design through deep learning (Ola Engkvist, AstraZeneca)
I also notice that there are a number of EPSRC funding opportunities.
Artificial Intelligence - UKRI CDTs EPSRC is expected to support 10-20 doctoral training positions.
The call is now open for around 15 Centres for Doctoral Training (CDTs) focused on areas relevant to Artificial Intelligence (AI) across UKRI's remit. This call opens against the background of Professor Dame Wendy Hall and Jérôme Pesenti's review, Growing the artificial intelligence industry in the UK, and the Government's Industrial Strategy White Paper, Building a Britain fit for the Future. This investment in AI skills will be kick-started by support for over 100 studentships that will be funded during 2018/19 via the Research Councils' current mechanisms and schemes.
Universities are invited to apply against two priority areas:
Enabling Intelligence, a priority area within Engineering and Physical Sciences Research Council's (EPSRC) main CDT call
Applications and Implications of Artificial Intelligence (AIAI), a new priority area relevant to all Research Councils.
Screenlamp:- A toolkit for ligand-based virtual screening
A recent publication, "Enabling the hypothesis-driven prioritization of ligand candidates in big databases: Screenlamp and its application to GPCR inhibitor discovery for invasive species control" (DOI: http://dx.doi.org/10.1007/s10822-018-0100-7), describes a very interesting software tool for virtual screening.
While the advantage of screening vast databases of molecules to cover greater molecular diversity is often mentioned, in reality, only a few studies have been published demonstrating inhibitor discovery by screening more than a million compounds for features that mimic a known three-dimensional (3D) ligand. Two factors contribute: the general difficulty of discovering potent inhibitors, and the lack of free, user-friendly software to incorporate project-specific knowledge and user hypotheses into 3D ligand-based screening. The Screenlamp modular toolkit presented here was developed with these needs in mind.
The Screenlamp homepage gives more details and installation instructions. Screenlamp is written in Python (3.6) and can be downloaded from GitHub https://github.com/psa-lab/screenlamp.
Certain submodules within screenlamp require external software to sample low-energy conformations of molecules and to generate pair-wise overlays. The pre-built, automated screening pipeline currently uses OpenEye OMEGA and OpenEye ROCS for those tasks. However, screenlamp does not strictly require OMEGA and ROCS, and you are free to use any open source alternative, provided that the output files are compatible with the screenlamp tools, which use the MOL2 file format.
Screenlamp is research software and has been made available to other researchers under a permissive Apache v2 open source license.
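Since the screenlamp tools exchange data as MOL2 files, it can be handy to check what an alternative conformer generator or overlay tool has written before feeding it into the pipeline. The following is a minimal sketch, not part of screenlamp itself: it assumes a multi-molecule MOL2 file in which each record starts with the standard `@<TRIPOS>MOLECULE` tag followed by the molecule name on the next line. The function names `split_mol2` and `molecule_name` are hypothetical helpers introduced for illustration.

```python
from typing import Iterator, List

# Standard Tripos record header that opens every molecule in a MOL2 file
MOLECULE_TAG = "@<TRIPOS>MOLECULE"


def split_mol2(text: str) -> Iterator[List[str]]:
    """Yield each molecule record of a multi-molecule MOL2 text as a list of lines.

    Hypothetical helper for illustration only (not a screenlamp function).
    Lines that appear before the first MOLECULE tag (e.g. comments) are skipped.
    """
    record: List[str] = []
    for line in text.splitlines():
        # A new MOLECULE tag closes the record collected so far
        if line.startswith(MOLECULE_TAG) and record:
            yield record
            record = []
        # Only collect lines once the first MOLECULE tag has been seen
        if line.startswith(MOLECULE_TAG) or record:
            record.append(line)
    if record:
        yield record


def molecule_name(record: List[str]) -> str:
    """Return the molecule name, which is the line directly after the MOLECULE tag."""
    return record[1].strip() if len(record) > 1 else ""
```

Reading a file and listing the names would then look like `[molecule_name(r) for r in split_mol2(open("overlays.mol2").read())]`, where `overlays.mol2` is a placeholder file name.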
Wolfram|Alpha Updated
Wolfram|Alpha has been updated.
Wolfram|Alpha. Building on 25 years of development led by Stephen Wolfram, Wolfram|Alpha has rapidly become the world's definitive source for instant expert knowledge and computation.
There are more apps on the MobileScience website.
Spark V10.5 released
Cresset have just announced the latest release of Spark, a scaffold hopping and bioisostere replacement tool.
Highlights
- New wizards to support ligand growing and linking, macrocyclization and water replacement experiments
- Enhanced Spark database update functionality
- New pharmacophore constraints
- Enhancements in search algorithm and advanced options.
Findings2 released
A new version of the very popular electronic notebook Findings has been released. You can try it out for free with no time limit. It allows the creation of up to 20 entries. Purchase Findings Pro to allow the creation of unlimited entries.
Remember there is a mobile version of Findings for your iPhone or iPad.
SeeSAR version 7.2 released
SeeSAR has been updated.
Get fresh inspiration from this huge update of SeeSAR! We realized, on the one hand, that the functionality of the editor was growing and growing, making it more and more complicated to use. On the other hand, access to the full functionality of ReCore demands a different kind of user interface. So we "took the bull by the horns" and, akin to the editor, created the new Inspirator which you can use to do:
- Core replacement This feature is the same but with a much improved UI. You are able to directly select and visualize the bonds that will be clipped to carve out a core fragment for replacement. The clipped bonds now remain in place (even while you define sphere constraints) up until you define a new query. Also the display of results is much enhanced, as you can see the new core fragments highlighted in 2D as well as in 3D. For reference, your query molecule stays visible as well.
- Fragment linking and merging You may of course launch the Inspirator with more than just one molecule. In this case, you can define bonds to clip on different molecules, thereby requesting linker fragments that will connect the remaining pieces. Note that it is not mandatory to clip a terminal part of each molecule to create the query, you may replace a core part in one and connect it to another fragment at the same time.
- Fragment growing This was possibly the most frequently requested functionality in ReCore: Cut just one bond and grow onto this bond using a fragment library of typical side chains. In this way, you can, for example, reach out to nearby subpockets. The new growing algorithm can very quickly scan through a (for now) ready-made library of typical fragments. You may of course define sphere constraints at the same time in order to target particular locations in the bi
You can download SeeSAR here and use it for free for 7 days.
Xplor-NIH for molecular structure determination from NMR
A discussion on the new developments of Xplor-NIH DOI. Xplor-NIH is a popular software package for biomolecular structure determination from nuclear magnetic resonance (NMR) and other data sources.
Most of Xplor-NIH's code is now being developed directly in the Python language, and thus is directly accessible for modification by the end-user without recompilation, while code paths which require high performance, such as those executed at every timestep of molecular dynamics, are coded in C++. The Python interface to Xplor-NIH provides an extensible toolbox for developing further functionality. Precompiled packages for most popular Unix and Unix-like operating systems (such as Linux and Mac OS X), as well as documentation and support are available directly from http://nmr.cit.nih.gov/xplor-nih/.
MedChemStructures Genius
The idea behind MedChem Structures Genius is that the chemical structure can be used as a visual and semantic mark to gain information on drug molecules (mode of action, side effects, bioavailability,…). This app, aimed at both students and professionals, teaches users to recognize chemical drug structures and link them to their INN and their pharmacological class. The quiz allows self-evaluation. Only small molecules, peptides and biochemical molecules are listed (no biologics, vaccines, …). The drug classification has been adapted from the ATC WHO classification.
There are many more science apps on the Mobile Science site.
GROMACS updated
The official release of GROMACS 2018 is now available.
GROMACS is one of the major software packages for the simulation of biological macromolecules.
Highlights from this update include:-
- PME long-ranged interactions can now run on a single GPU, which means many fewer CPU cores are needed for good performance.
- Optimized SIMD support for recent CPU architectures: AMD Zen, Intel Skylake-X and Skylake Xeon-SP.
- The AWH (Accelerated Weight Histogram) method is now supported, which is an adaptive biasing method used for overcoming free energy barriers and calculating free energies (see http://dx.doi.org/10.1063/1.4890371).
- A new dual-list dynamic-pruning algorithm for the short-ranged interactions, which uses an inner and outer list to permit a longer-lived outer list, while doing less work overall and making runs less sensitive to the choice of the “nstlist” parameter.
- A physical validation suite is added, which runs a series of short simulations, to verify the expected statistical properties, e.g. of energy distributions between the simulations, as a sensitive test that the code correctly samples the expected ensemble.
- Conserved quantities are computed and reported for more integration schemes - now including all Berendsen and Parrinello-Rahman schemes.
Fortran on a Mac
SeeSAR for Parallelized Fragment Growing & Pocket Exploration
I see that SeeSAR now supports a parallelized 'real' fragment growing.
SeeSAR is a software tool for interactive, visual compound prioritisation as well as compound evolution. Structure-based design work ideally supports a multi-parameter optimization to maximise the likelihood of success, rather than affinity alone. Having the relevant parameters at hand in combination with real-time visual computer assistance in 3D is one of the strengths of SeeSAR. Stimulating exploration with SeeSAR, we have embarked on pursuing a new cheminformatics compute paradigm of "Propose & Validate".
You can download SeeSAR here and use it for free for 7 days.
Behind the Scenes in Real-Life Software Design By Stephen_Wolfram · 48 videos
I just stumbled across a fascinating series of lectures. These are recordings of the live discussions behind the ongoing software development led by Stephen Wolfram.
Of particular interest might be the discussion on incorporating chemistry into the Wolfram language.
https://www.twitch.tv/videos/181269427?collection=F82InZg17BQFzw.
UCSF ChimeraX
A recent publication DOI describes an update to the popular molecule viewer UCSF Chimera.
UCSF ChimeraX is next-generation software for the visualization and analysis of molecular structures, density maps, 3D microscopy, and associated data. It addresses challenges in the size, scope, and disparate types of data attendant with cutting-edge experimental methods, while providing advanced options for high-quality rendering (interactive ambient occlusion, reliable molecular surface calculations, etc.) and professional approaches to software design and distribution.
The application can be downloaded here http://www.rbvi.ucsf.edu/chimerax/download.html
It is important to note that ChimeraX is not backward compatible with Chimera and does not read Chimera session files. It has been tested on MacOS X 10.12. The ChimeraX user interface is implemented in Qt, offering a native-like look and feel on each platform. ChimeraX is largely implemented using Python, an interpreted programming language. To manipulate these very large datasets interactively, ChimeraX uses memory-efficient data structures combined with high-performance algorithms implemented in C++. MacroMolecular Crystallographic Interchange Format (mmCIF) is the preferred format for atomic data in ChimeraX; mmCIF replaces the aged and more limited PDB format and offers a number of advantages.
Python support in Excel
The most popular suggestion on the "How can we improve Excel for Windows" forum is Python as an Excel scripting language, with over 4500 votes, and it has elicited a comment from the Microsoft Excel team.
Thanks for the continued passion around this topic. We’d like to gather more information to help us better understand the needs around Excel and Python integration.
Followed by a survey.
Of course one would hope that they also add it to the Mac version of Excel.
Suggestions for a Laser Pointer
I give a course that consists of a full day of lectures. In the past I've had to use a selection of laser pointers and spare batteries because they don't last.
So I'm looking for a laser pointer that will last for several hours, and be bright enough to show up on the large flat screens used in many lecture theatres these days.
Any suggestions welcome.
Predicting the Conformational Energy of Small Molecules
An interesting publication in JCIM, Atom Types Independent Molecular Mechanics Method for Predicting the Conformational Energy of Small Molecules, DOI.
We report herein our effort to incorporate lone pairs into our model to extend its applicability domain to any saturated small molecules. The developed model H-TEQ 2 has been validated on a wide variety of molecules from polyaromatic molecules to carbohydrates and molecules with high heteroatoms/carbon ratios.
Deep Learning Cheat Sheet (using Python Libraries)
Just came across this really invaluable resource.
- Deep Learning Cheat Sheet (using Python Libraries)
- PySpark Cheat Sheet: Spark in Python
- Data Science in Python: Pandas Cheat Sheet
- Cheat Sheet: Python Basics For Data Science
- A Cheat Sheet on Probability
- Cheat Sheet: Data Visualization with R
- New Machine Learning Cheat Sheet by Emily Barry
- Matplotlib Cheat Sheet
- One-page R: a survival guide to data science with R
- Cheat Sheet: Data Visualization in Python
- Stata Cheat Sheet
- Common Probability Distributions: The Data Scientist’s Crib Sheet
- Data Science Cheat Sheet
- 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets
- 14 Great Machine Learning, Data Science, R , DataViz Cheat Sheets
YANK
YANK is a GPU-accelerated Python framework for exploring algorithms for alchemical free energy calculations.
Features
- Modular Python framework to facilitate development and testing of new algorithms
- GPU-accelerated via the OpenMM toolkit
- Alchemical free energy calculations in both explicit and implicit solvent
- Hamiltonian exchange among alchemical intermediates with Gibbs sampling framework
- General Markov chain Monte Carlo framework for exploring enhanced sampling methods
- Built-in equilibration detection and convergence diagnostics
- Support for AMBER prmtop/inpcrd files
- Support for absolute binding free energy calculations
- Support for transfer free energies (such as hydration or partition free energies)
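To give a feel for what "Hamiltonian exchange among alchemical intermediates" involves, here is a minimal sketch of the textbook pairwise Metropolis criterion for swapping the configurations of two replicas. It is not YANK's own code or API; YANK's Gibbs sampling scheme for mixing replicas is more elaborate. The function names and the convention that `u_ab` means the reduced potential (in units of kT) of configuration b evaluated in alchemical state a are assumptions made for this illustration.

```python
import math
import random


def swap_probability(u_ii: float, u_jj: float, u_ij: float, u_ji: float) -> float:
    """Metropolis probability of accepting a swap between replicas i and j.

    u_ab is the reduced potential (in kT) of configuration b evaluated in
    state a (an assumed naming convention for this sketch). The swap is
    accepted with probability min(1, exp(-[(u_ij + u_ji) - (u_ii + u_jj)])).
    """
    log_p = -((u_ij + u_ji) - (u_ii + u_jj))
    # Avoid overflow in exp() when the swap lowers the total reduced energy
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def attempt_swap(u_ii: float, u_jj: float, u_ij: float, u_ji: float,
                 rng: random.Random) -> bool:
    """Draw a uniform random number and decide whether the swap is accepted."""
    return rng.random() < swap_probability(u_ii, u_jj, u_ij, u_ji)
```

A swap that leaves the total reduced energy unchanged or lowers it is always accepted, while an unfavourable swap is accepted with exponentially decreasing probability, which is what lets configurations diffuse between neighbouring alchemical states.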
Install using conda
$ conda config --add channels omnia --add channels conda-forge
$ conda install yank
conda will install dependencies from binary packages automatically, including difficult-to-install packages such as OpenMM, numpy, and scipy. YANK runs on Python 3.5 and Python 3.6.
Mac in Chemistry Annual website review
At the end of each year I have a look at the website analytics to see which items were the most popular.
Over the year there were 60,000 unique visitors with 25% visiting the site on multiple occasions. The US provided 30% of the visitors and the UK 10% with Germany, Canada and Japan around 5%. As might be expected 60% of the visitors were using a Mac, but 25% of the visitors were Windows users and 10% iOS. Looking at the last month's Mac visitors, 53% were using Mac OS X 10.13, 25% 10.12 and 12% 10.11.
Safari and Chrome (each 41%) were the most used web browsers with the once dominant Internet Explorer down at 2%.
The most viewed blog pages in 2017 were
- Installing Molden
- ChemBioDraw and Word 15
- Mac OSX installer for Coot
- Scientific keyboards for iOS
- Tools for Mac Fortran Programmers
The most popular web pages were (other than the main page)
- Fortran on a Mac
- Cheminformatics on a Mac
- Data Analysis Application on a Mac
- Hints and Tutorials
- Spectroscopy
- Reference Management
The continued popularity of the Fortran on a Mac web page is interesting. I'm not a big Fortran user, but if anyone knows of items that could be added to the page I'd be delighted to hear about them. I've done a couple of updates to the Cheminformatics on a Mac page and I think I'll need to add a section on Bioconda in the future.
Interestingly, the Scientific Applications under High Sierra page was of only transient popularity. It seems this update to Mac OSX was relatively benign, with very few issues.
2017 also saw the 2000th download of iBabel. iBabel is a GUI (graphical user interface) for the open source cheminformatics toolkit OpenBabel. It also provides an interface to a variety of tools built using OpenBabel and a molecule viewer. I'm planning to do an update to iBabel to take advantage of some of the updates to OpenBabel, but if you have any suggestions I'd be happy to see if I can include them.
2017 also saw the migration of the website from http to https, a change that went pretty seamlessly with only a couple of minor glitches.
The Twitter feed is increasing in popularity with 390 followers. The most popular tweets were
Creating a Bioconda recipe
RSC meeting on AI in Chemistry
The RSS feed still has around 100 followers.