Macs in Chemistry

Insanely Great Science

AutoDock Vina 1.2.0

 

A new publication describes an update to AutoDock Vina, "AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings" DOI.

AutoDock Vina is arguably one of the fastest and most widely used open-source programs for molecular docking. However, compared to other programs in the AutoDock Suite, it lacks support for modeling specific features such as macrocycles or explicit water molecules. Here, we describe the implementation of this functionality in AutoDock Vina 1.2.0. Additionally, AutoDock Vina 1.2.0 supports the AutoDock4.2 scoring function, simultaneous docking of multiple ligands, and a batch mode for docking a large number of ligands. Furthermore, we implemented Python bindings to facilitate scripting and the development of docking workflows. This work is an effort toward the unification of the features of the AutoDock4 and AutoDock Vina programs.

The source code is available at https://github.com/ccsb-scripps/AutoDock-Vina.

  • AutoDock4.2 and Vina scoring functions
  • Support of simultaneous docking of multiple ligands and batch mode for virtual screening
  • Support of macrocycle molecules
  • Hydrated docking protocol
  • Can write and load external AutoDock maps
  • Python bindings for Python 3 (Linux and Mac)
  • AutoDock Vina is distributed under the Apache License, Version 2.0.
Comments

Additions to MayaChemTools

 

A couple of new scripts have been added to the excellent MayaChemTools, a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

RDKitFilterTorsionLibraryAlerts.py - Filter torsion library alerts Direct Link

And

Psi4CalculateInteractionEnergy.py - Calculate interaction energy Direct Link.

Comments

Generating IUPAC names for molecules

 

I was recently asked if I could generate IUPAC names for a series of molecules for a patent filing. There are many chemical drawing packages that will generate the IUPAC name for a single compound, and whilst I could have spent several hours cutting and pasting, I decided to write a simple Vortex script to do the task.

Read the details here ….

Comments

Open Source Antibiotics Structures

 

The OpenSourceAntibiotics project is a consortium of researchers interested in open ways to discover and develop new, inexpensive medicines for bacterial infections. All data is in the open and anyone can contribute. Whilst all data is on the wiki, it can sometimes be tricky to link a structure to its identifier. In an effort to make these more accessible, and hopefully indexed by search engines, a page containing structures, identifiers, SMILES and InChIKeys has been created.

You can view the page here https://opensourceantibiotics.github.io/murligase/CompChemTools/ForIndexing/OSA_data.html.

This page is updated nightly via a cron job. This calls a shell script that runs a Python script that reads the data from the master spreadsheet, uses RDKit to generate the images of the structures and create the html page. The shell script then uploads the html file to GitHub.
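The page-building step might look something like the sketch below. It is not the project's actual script: the column names and the in-memory CSV are assumptions standing in for the master spreadsheet, and the RDKit image-generation step is left out so that the example needs only the standard library.

```python
import csv
import html
import io

# Hypothetical column names; the real master spreadsheet may differ.
COLUMNS = ["Identifier", "SMILES", "InChIKey"]


def rows_to_html(rows, title="Structures"):
    """Render an iterable of dicts as a minimal, indexable HTML table."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in COLUMNS)
    body = []
    for row in rows:
        # Missing values are rendered as empty cells.
        cells = "".join(
            f"<td>{html.escape(row.get(c) or '')}</td>" for c in COLUMNS
        )
        body.append(f"<tr>{cells}</tr>")
    return (
        f"<html><head><title>{html.escape(title)}</title></head><body>"
        f"<table><tr>{head}</tr>{''.join(body)}</table></body></html>"
    )


# Stand-in for the spreadsheet export: a two-compound CSV held in memory.
example_csv = io.StringIO(
    "Identifier,SMILES,InChIKey\n"
    "EX-1,CCO,LFQSCWFLJHTTHZ-UHFFFAOYSA-N\n"
    "EX-2,c1ccccc1,UHOVQNZJYSORNB-UHFFFAOYSA-N\n"
)
page = rows_to_html(csv.DictReader(example_csv))
```

Escaping every cell with html.escape matters here because SMILES strings can contain characters that need escaping in HTML.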

Hopefully the html page will be indexed by search engines which will allow anyone to search for the structures. Please feel free to share.

Comments

JupyterLite runs entirely in a web browser

 

I've only just stumbled across this. JupyterLite is a JupyterLab distribution that runs entirely in the browser, built from the ground up using JupyterLab components and extensions, so there is no need to start a Python Jupyter server on the host machine.

  • Python kernel backed by Pyodide running in a Web Worker
    • Initial support for interactive visualization libraries such as altair, bqplot, ipywidgets, matplotlib, and plotly
  • JavaScript and P5.js kernels running in an IFrame
  • View hosted example Notebooks and other files, then edit, save, and download from the browser's IndexedDB (or localStorage)
  • Support for saving settings for JupyterLab/Lite core and federated extensions
  • Basic session and kernel management to have multiple kernels running at the same time
  • Support for Code Consoles

You can try it out here https://jupyterlite.readthedocs.io/en/latest/_static/lab/index.html, all you need is a static web page.

Could be very useful for teaching.

Comments

JupyterLab Desktop App now available

I started using IPython Notebooks many years ago; these became Jupyter notebooks and I'm now transitioning to JupyterLab.

I noticed recently there is now a JupyterLab desktop app.

JupyterLab App is the cross-platform standalone application distribution of JupyterLab. It is a self-contained desktop application which bundles a Python environment with several popular Python libraries ready to use in scientific computing and data science workflows.

It is available from GitHub https://github.com/jupyterlab/jupyterlab_app#download.

JupyterLab App works on Debian and Fedora based Linux, macOS and Windows operating systems.

Comments

New additions to MayaChemTools

 

There have been a couple of new additions to the fabulous list of tools and scripts on MayaChemTools.

MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

o Psi4GenerateConstrainedConformers.py http://www.mayachemtools.org/docs/scripts/html/Psi4GenerateConstrainedConformers.html.

o Psi4PerformConstrainedMinimization.py http://www.mayachemtools.org/docs/scripts/html/Psi4PerformConstrainedMinimization.html.

o Psi4PerformTorsionScan.py http://www.mayachemtools.org/docs/scripts/html/Psi4PerformTorsionScan.html.

These scripts rely on the presence of Psi4 https://psicode.org/ and RDKit in your environment. In addition, see the script RDKitPerformTorsionScan.py, and http://www.mayachemtools.org/index.html for further details.

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU LGPL as published by the Free Software Foundation.

Comments

Update to MayaChemTools

 

MayaChemTools is an ever-increasing collection of Python and Perl scripts that support cheminformatics and computational chemistry.

The latest additions are based on PSI4, an open-source quantum chemistry package.

PSI4 provides a wide variety of quantum chemical methods using state-of-the-art numerical methods and algorithms. Several parts of the code feature shared-memory parallelization to run efficiently on multi-core machines. An advanced parser written in Python allows the user input to have a very simple style for routine computations, but it can also automate very complex tasks with ease.

The command line Python scripts based on Psi4 provide functionality for the following tasks:

  • Calculation of single point energies
  • Calculation of molecular properties and partial charges
  • Performing structure minimization
  • Generating molecular conformations
  • Visualizing frontier molecular orbitals and dual descriptors
  • Visualizing electrostatic potential on densities and molecular surfaces

MayaChemTools is free software; you can redistribute it and/or modify it under the terms of the GNU LGPL as published by the Free Software Foundation.

Comments

Ammolite

 

This looks really interesting. Ammolite enables the transfer of structure-related objects from Biotite to PyMOL for visualization, via PyMOL’s Python API:

  • Import AtomArray and AtomArrayStack objects into PyMOL - without intermediate structure files
  • Convert PyMOL objects into AtomArray and AtomArrayStack instances.
  • Use Biotite’s boolean masks for atom selection in PyMOL.
  • Display images rendered with PyMOL in Jupyter notebooks.

To install

conda install -c conda-forge ammolite

The Biotite package bundles popular tasks in computational molecular biology into a uniform Python library.

Comments

Python on Apple Silicon

 

A lot of people have been asking me about running data analysis on the new laptops with M1 chips. It looks like we are starting to see a few benchmarks appearing.

A recent blog post, "Are The New M1 Macbooks Any Good for Data Science? Let’s Find Out", suggests that the performance of the M1 chip continues to impress.

All benchmarks come with caveats: some use "native" installations whilst others require Rosetta.

Python is approximately three times faster when run natively on a new M1 chip, NumPy looks to be slightly slower, pandas is twice as fast, and scikit-learn is twice as fast.
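When comparing numbers like these it helps to know whether your own interpreter is a native build or running under Rosetta. A minimal check using only the standard library is sketched below; the function name and the keys of the returned dictionary are my own, and the "arm64" test is specific to macOS (a native Apple Silicon build reports "arm64", a translated Intel build reports "x86_64").

```python
import platform


def describe_python_build():
    """Summarise the interpreter version and the architecture it runs as."""
    machine = platform.machine()  # e.g. "arm64" or "x86_64"
    return {
        "python": platform.python_version(),
        "system": platform.system(),
        "machine": machine,
        # True for a native Apple Silicon build on macOS.
        "native_arm64": machine == "arm64",
    }


info = describe_python_build()
print(info)
```

Recording this alongside any timing results makes it much easier to compare them with other people's benchmarks.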

Instructions for installing TensorFlow 2.4 on Apple Silicon M1: installation under Conda environment have also been reported.

PyCharm, JetBrains’ IDE for Python development, now supports Apple Silicon M1 processors.

Comments

JupyterLab 3.0 released

 

JupyterLab is the next-generation web-based user interface for Project Jupyter.

JupyterLab 3.0 includes a number of new features and enhancements that are described on the Jupyter blog. Full details are described in the ChangeLog.

To install using conda

conda install -c conda-forge jupyterlab=3

However, note that some extensions may not yet have been updated.

Comments

OpenChem

 

OpenChem is a deep learning toolkit for Computational Chemistry with PyTorch backend. The goal of OpenChem is to make Deep Learning models an easy-to-use tool for Computational Chemistry and Drug Design Researchers.

You can read about it in this publication DOI.

All code is available on GitHub https://github.com/Mariewelt/OpenChem.

Requires

  • Modern NVIDIA GPU, compute capability 3.5 or newer.
  • Python 3.5 or newer (we recommend Anaconda distribution)
  • CUDA 9.0 or newer

Required Python packages: numpy, pyyaml, scipy, ipython, mkl, scikit-learn, six, pytest, pytest-cov

The software is licensed under the MIT license

Comments

RDKit blog

 

If you are an RDKit user then you should bookmark Greg Landrum's RDKit blog https://greglandrum.github.io/rdkit-blog/about/. This is a new site and all the old content will be migrated in due course.

Comments

AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning

 

This looks very interesting DOI.

We present the open-source AiZynthFinder software that can be readily used in retrosynthetic planning. The algorithm is based on a Monte Carlo tree search that recursively breaks down a molecule to purchasable precursors. The tree search is guided by an artificial neural network policy that suggests possible precursors by utilizing a library of known reaction templates. The software is fast and can typically find a solution in less than 10 s and perform a complete search in less than 1 min.

Source code is on GitHub https://github.com/MolecularAI/aizynthfinder.

Tested under macOS Catalina

Requires RDKit, TensorFlow and graphviz.

It can then be installed using pip.

The software is licensed under the MIT license

Comments

IOData: A python library for reading, writing, and converting computational chemistry file formats

 

IOData is a free and open‐source Python library for parsing, storing, and converting various file formats commonly used by quantum chemistry, molecular dynamics, and plane‐wave density‐functional‐theory software programs. In addition, IOData supports a flexible framework for generating input files for various software packages. While designed and released for stand‐alone use, its original purpose was to facilitate the interoperability of various modules in the HORTON and ChemTools software packages with external (third‐party) molecular quantum chemistry and solid‐state density‐functional‐theory packages. IOData is designed to be easy to use, maintain, and extend; this is why we wrote IOData in Python and adopted many principles of modern software development, including comprehensive documentation, extensive testing, continuous integration/delivery protocols, and package management. This article is the official release note of the IOData library.

DOI

Source code is on GitHub https://github.com/theochem/iodata, and it can be installed using conda or pip.

There is also OpenBabel, which has support for 113 formats in total.

Comments

Python: First version released to run natively on Apple Silicon

 

Python 3.9.1 has been released; this version now supports Apple Silicon (M1 chip).

Installer news: 3.9.1 is the first version of Python to support macOS 11 Big Sur. With Xcode 11 and later it is now possible to build “Universal 2” binaries which work on Apple Silicon. We are providing such an installer as the macos11.0 variant. This installer can be deployed back to older versions, tested down to OS X 10.9. As we are waiting for an updated version of pip, please consider the macos11.0 installer experimental. This work would not have been possible without the effort of Ronald Oussoren, Ned Deily, and Lawrence D’Anna from Apple. Thank you!

Also note macOS ARM builds on conda-forge, and clang compilers for conda-build 3 https://anaconda.org/conda-forge/clang_osx-arm64


Comments

Molecular Similarity Search Benchmark (MssBenchmark)

 

This looks like it could be a very useful resource.

Molecular Similarity Search Benchmark (MssBenchmark) is on GitHub at https://github.uconn.edu/mldrugdiscovery/MssBenchmark. The benchmarks can be run on your local machine or on an HPC cluster.

Currently supports

They also have ChEMBL and Molport as test datasets.

Requires

  • ansicolors==1.1.8
  • docker==2.6.1
  • h5py==2.7.1
  • matplotlib==2.1.0
  • numpy==1.13.3
  • pyyaml==3.12
  • psutil==5.4.2
  • scipy==1.0.0
  • scikit-learn==0.19.1
  • jinja2==2.10
  • h5sparse==0.1.0
Comments

Writing Python with Xcode

 

I was reading a KDnuggets article on a recent poll, "What Python IDE / Editor you used the most in 2020?". As expected, the poll was topped by Jupyter Notebook (42%), with JupyterLab adding a further 14%. Visual Studio Code, PyCharm and Spyder were also popular options.

I started wondering if it was possible to use Xcode to write Python. The answer is "Yes", but it requires a little setting up. After a fair bit of online searching I managed to put together a set of instructions that I thought I'd share.

Writing Python with Xcode


Comments

PyMOL 2.4 released

 

PyMOL 2.4 has been released. Download ready-to-use bundles from https://pymol.org/ or update your installation with

conda install -c schrodinger pymol

Highlights:

  • Incentive PyMOL only:

    • Support for https://lookingglassfactory.com/schrodinger
    • Pi-Pi and Pi-Cation interactions (A > find > pi-interactions)
    • WaterMap result presets (A > preset > WaterMap ...)
    • APBS Plugin improvements (multi-state assemblies, propka pH calculation)
  • Open-Source and Incentive PyMOL:

    • Distinguish .mrc and .ccp4 formats (origin interpretation)
    • Trajectory handling improvements
    • Improved error handling in Python API with exceptions
    • ... many bug fixes

This will be the last release with support for Python 2.7.

Full release notes https://pymol.org/dokuwiki/?id=media:new24

Comments

f90wrap: an automated tool for constructing deep Python interfaces to modern Fortran codes

 

f90wrap is a tool to automatically generate Python extension modules which interface to Fortran libraries that make use of derived types. It builds on the capabilities of the popular f2py utility by generating a simpler Fortran 90 interface to the original Fortran code which is then suitable for wrapping with f2py, together with a higher-level Pythonic wrapper that makes the existence of an additional layer transparent to the final user. f90wrap has been used to wrap a number of large software packages of relevance to the condensed matter physics community, including the QUIP molecular dynamics code and the CASTEP density functional theory code.

The full paper is here https://iopscience.iop.org/article/10.1088/1361-648X/ab82d2

Install using PIP

pip install f90wrap

Source code is on GitHub https://github.com/jameskermode/f90wrap.

Now added to the Fortran on a Mac page

Comments

Swift for Tensorflow (and other things).

 

After creating MolSeeker and iBabel4 I've been investigating the use of Swift and in particular the open-source use.

Swift.org provides a nice introduction and overview, it also highlights the Google Summer of Code Swift projects which are a fabulous way for students to get involved.

The Google Swift for TensorFlow group have been very active, and Tryolabs have recently posted a detailed summary, including a comparison with other languages.

Two years ago, a small team at Google started working on making Swift the first mainstream language with first-class language-integrated differentiable programming capabilities. The scope and initial results of the project have been remarkable, and general public usability is not very far off.

They have now provided support for Jupyter notebooks https://github.com/google/swift-jupyter

There is also an interesting blog post here fast.ai.

IBM also seem to be using Swift https://developer.ibm.com/technologies/swift/ and are highlighting its use with Watson.

Developers can take advantage of the Watson Developer Cloud’s Swift SDK to easily build Watson-powered applications for iOS or Linux platforms. Leverage the power of Watson’s advanced artificial intelligence, machine learning, and deep learning techniques to understand unstructured data and engage with users in new ways.

Since Swift is a relatively new language it is worth looking at the ongoing evolution.

Comments

Jupyter notebook to access IBM RXN AI-assisted retrosynthesis

 

A Python wrapper for the IBM RXN API has been released, available on GitHub https://github.com/rxn4chemistry/rxn4chemistry

To install

pip install rxn4chemistry

You will need to register and get an API key from here https://rxn.res.ibm.com/rxn/user/profile.

This demo shows how to use it to generate retrosynthesis ideas.

The page also includes links to download the notebook.

Comments

Jupyter notebook to access IBM RXN API

 

A Python wrapper for the IBM RXN API has been released, available on GitHub https://github.com/rxn4chemistry/rxn4chemistry

To install

pip install rxn4chemistry

You will need to register and get an API key from here https://rxn.res.ibm.com/rxn/user/profile.

Simple demo using Jupyter Notebook

This is going to be very useful.

Comments

Ensemble learning in Cheminformatics

 

Yet another invaluable post on cheminformatics and machine learning: "Python package for Ensemble learning #Chemoinformatics #Scikit learn".

Ensemble learning sometimes outperforms a single model, so it is worth trying. Fortunately, ensemble learning is now very easy to use thanks to a Python package named ‘mlens‘.

Install using PIP

pip install mlens

ML-Ensemble (mlens) is an open-source high performance ensemble learning package written in Python, code is available on GitHub https://github.com/flennerhag/mlens.

ML-Ensemble combines a Scikit-learn high-level API with a low-level computational graph framework to build memory efficient, maximally parallelized ensemble networks in as few lines of codes as possible.


Comments

Modin for distributed Pandas calculations

 

Modin is a library designed to accelerate Pandas by automatically distributing the computation across all of the system’s available CPU cores. Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. Modin is a DataFrame designed for datasets from 1MB to 1TB+

It can be installed using PIP

pip install modin

If you don't have Ray or Dask installed, you will need to install Modin with one of the targets:

pip install modin[ray] # Install Modin dependencies and Ray to run on Ray
pip install modin[dask] # Install Modin dependencies and Dask to run on Dask
pip install modin[all] # Install all of the above

Currently, Modin depends on pandas version 0.23.4.

I've added Modin to the Open Source Data Science Python Libraries.

Comments

Python and CUDA

 

After my last post on Macs and CUDA I was sent a link to CuPy, a library supported by NVIDIA that makes it easy to run CUDA code in Python using NumPy arrays as input.

CuPy's interface is highly compatible with NumPy; in most cases it can be used as a drop-in replacement. All you need to do is just replace numpy with cupy in your Python code. It supports various methods, indexing, data types, broadcasting and more.

To install

pip install cupy

Note

The latest version of cuDNN and NCCL libraries are included in binary packages (wheels). For the source package, you will need to install cuDNN/NCCL before installing CuPy, if you want to use it.

Or you can install versions specific to the particular CUDA environment. Full details are on GitHub https://github.com/cupy/cupy.

Comments

Determining the Amino Acids in a collection of peptides

 

I've recently become interested in comparing the amino-acid composition of peptides, for example cyclic versus linear peptides, or brain-penetrant versus non-penetrant peptides. I had a look around but could not find any tools that did this; in particular I wanted to include any non-proteinogenic amino acids.

This tutorial provides a means to analyse many thousands of peptides using Vortex.
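The underlying idea is simple enough to sketch in plain Python. This is not the Vortex script from the tutorial: it assumes each peptide has already been broken down into a list of residue codes, and the peptides and the three-letter codes below (including "Orn" for ornithine as a non-proteinogenic example) are made up for illustration.

```python
from collections import Counter


def composition(sequences):
    """Return the fractional residue composition over a set of peptides."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)  # seq is a list of residue codes
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()} if total else {}


# Hypothetical peptides, each given as a list of residue codes.
cyclic = [["Cys", "Gly", "Orn", "Cys"], ["Cys", "Ala", "Cys"]]
linear = [["Gly", "Ala", "Lys"], ["Ala", "Ala", "Gly"]]

cyclic_comp = composition(cyclic)
linear_comp = composition(linear)

# Residues ranked by how much more frequent they are in the cyclic set.
diff = sorted(
    set(cyclic_comp) | set(linear_comp),
    key=lambda r: cyclic_comp.get(r, 0) - linear_comp.get(r, 0),
    reverse=True,
)
```

Working with lists of residue codes rather than one-letter strings is what lets non-standard amino acids be counted alongside the usual twenty.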

Comments

Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks

 

As a regular Jupyter/Python user this publication (PLoS Comput Biol 15(7): e1007007) DOI is a great reminder of good practice, and as Jupyter becomes increasingly popular as a means to share code/data/results writing the notebook in a manner that helps readers is increasingly important.

This ability to combine executable code and descriptive text in a single document has close ties to Knuth’s notion of “literate programming” and has convinced many researchers to switch to computational notebooks from other programming environments. Jupyter Notebooks in particular have seen widespread adoption: as of December 2018, there were more than 3 million Jupyter Notebooks shared publicly on GitHub (https://www.github.com), many of which document academic research.

There are of course many different ways to share Jupyter notebooks.

Whether you use notebooks to track preliminary analyses, to present polished results to collaborators, as finely tuned pipelines for recurring analyses, or for all of the above, following this advice will help you write and share analyses that are easier to read, run, and explore.

Comments

An interactive RDKit widget for Jupyter: a first pass

 

This looks like it could be very interesting.

A blog post by Greg Landrum describes a widget for displaying molecules in a Jupyter Notebook: you can select atoms, and the selection is propagated back to Python.

This is basic, but I think it's a decent start towards something that could be really useful. Interested? Have suggestions (ideally accompanied by code!) on how to improve it? If it looks like this is actually likely to be used, I will figure out how to create a standalone nbwidget out of this and create a separate github repo for it.

Looks like a useful tool for selecting bonds for conformational analysis, selecting bonds for creating a Ramachandran plot, selecting groups for bioisosteric replacement……

Sounds like Greg is looking for input.


Comments

Jupyter notebook to look at molecular similarity

 

I was recently asked for a tool to compare the similarity of a list of molecules with every other molecule in the list. I suspect there may be commercial tools to do this but for small numbers of compounds it is easy to visualise in a Jupyter notebook using RDKit.

Read more here, MolecularSimilarityNotebook
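For reference, the all-against-all comparison boils down to something like the sketch below. It is not the code from the notebook: the fingerprints here are hypothetical sets of on-bits, whereas in practice they would be generated by a toolkit such as RDKit, and Tanimoto is just one common choice of similarity measure.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_matrix(fingerprints):
    """All-against-all similarity for a list of fingerprints."""
    return [[tanimoto(p, q) for q in fingerprints] for p in fingerprints]


# Hypothetical fingerprints, one set of on-bits per molecule.
fps = [{1, 2, 3, 4}, {2, 3, 4, 5}, {7, 8}]
matrix = similarity_matrix(fps)
```

The matrix is symmetric with ones on the diagonal, so for larger lists only the upper triangle needs to be computed.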

Comments

Extending Jupyter

 

I'm a great fan of Jupyter notebooks and I'm always looking for ways to get more out of them. I came across this blog post recently which is packed with useful tips.

99 ways to extend the Jupyter ecosystem

Whenever someone says ‘You can do that with an extension’ in the Jupyter ecosystem, it is often not clear what kind of extension they are talking about. The Jupyter ecosystem is very modular and extensible, so there are lots of ways to extend it. This blog post aims to provide a quick summary of the most common ways to extend Jupyter, and links to help you explore the extension ecosystem.

I've also published some notebooks under Tips and Tutorials, Jupyter notebooks


Comments

Python leads the 11 top Data Science, Machine Learning platforms

 

The results of the latest KDnuggets poll, which is in its 20th year, are in. Python is clearly moving to become the dominant platform with the votes for R slowly declining.

The blog post on KDnuggets gives far more detailed analysis and is well worth reading.


Comments

Jupyter notebook to create Wordcloud of tweets

 

I've often wanted to try creating a word cloud and when Noel O'Boyle collected together all the tweets from the Sheffield Conf on Chemoinformatics this seemed a good opportunity.

Relive the Sheffield Conf on Chemoinformatics with these #shef2019 tweets I've pulled down from Twitter, link to tweet.

The Jupyter notebook used to create the word cloud is here; it uses the excellent word cloud generator word_cloud. You will need to download the text from the tweets from the link provided in the tweet.
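A word cloud is essentially a picture of word frequencies, so the counting step can be sketched with the standard library alone. This is only an illustration of the idea, not what word_cloud does internally: the stop-word list, the regular expressions and the example tweets are all assumptions of mine.

```python
import re
from collections import Counter

# A small, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "at", "on", "for", "is"}


def word_frequencies(texts):
    """Count words across texts, ignoring URLs, @mentions and stop words."""
    counts = Counter()
    for text in texts:
        # Blank out links and mentions before splitting into words.
        text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        words = re.findall(r"[a-z][a-z0-9'-]+", text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


# Made-up tweets standing in for the downloaded text.
tweets = [
    "Great talk on docking at #shef2019 https://example.org/x",
    "@someone docking and machine learning at #shef2019",
]
freqs = word_frequencies(tweets)
```

Since every tweet carries the conference hashtag, it will dominate the counts; you may want to add it to the stop words before drawing the cloud.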

Comments

Binder news

 

If you use Binder to serve your Jupyter notebooks you will be interested in this.

Have a repository full of Jupyter notebooks? With Binder, open those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere

We flipped the switch on making mybinder.org a federation. This means that there are now two clusters that serve requests for mybinder.org. What changes for you as a user? Hopefully nothing. You will notice that if you visit mybinder.org (or any other link to it) you will be redirected to gke.mybinder.org or ovh.mybinder.org. Beyond that small change everything should keep working as before.

This should mean that Binder becomes more robust and not susceptible to outages. Now this is in place it should also be possible to add further server resources.

Comments

Special Issue "Machine Learning with Python"

 

I was just sent details of a Special Issue, "Machine Learning with Python", for the journal Information.

We live in this day and age where quintillions of bytes of data are generated and collected every day. Around the globe, researchers and companies are leveraging these vast amounts of data in countless application areas, ranging from drug discovery to improving transportation with self-driving cars. As we all know, Python evolved into the lingua franca of machine learning and artificial intelligence research over the last couple of years. What makes Python particularly attractive for us researchers is that it gives us access to a cohesive set of tools for scientific computing and is easy to teach and learn. Also, as a language that bridges many different technologies and different fields, Python fosters interdisciplinary collaboration. And besides making us more productive in our research, sharing tools we develop in Python has the potential to reach a wide audience and benefit the broader research community.

This special issue is now open for submission.

Comments

End of the line for Python 2

 

Just a reminder that support for Python 2.7 will end on Jan 1 2020 (there will be no 2.8). All major scientific packages now support Python 3.x and there will be no further updates to the Python 2.x versions.

An increasing number of projects have pledged to drop support for Python 2.7 no later than 2020; these include pandas, RDKit, IPython, Matplotlib, NumPy, SciPy, BioPython, Psi4, scikit-learn, TensorFlow, Jupyter notebook and many more.

Time to update those old scripts and Jupyter notebooks.

Comments

CGRtools: Python Library for Molecule, Reaction and Condensed Graph of Reaction Processing

 

CGRtools is a set of tools for processing reactions based on the Condensed Graph of Reaction (CGR) approach, with details on GitHub https://github.com/cimm-kzn/CGRtools. Published in JCIM DOI.

Basic operations:

  • Read /write /convert formats MDL .RDF and .SDF, SMILES, .MRV
  • Standardize reactions and check that structures are valid.
  • Produce CGRs.
  • Perform subgraph search.
  • Build /correct molecules and reactions.
  • Produce template based reactions.

The stable version is available through PyPI

pip install CGRtools

Install the development version of CGRtools for features that are not yet well tested

pip install -U git+https://github.com/cimm-kzn/CGRtools.git@master#egg=CGRtools

There is also a tutorial using Jupyter notebook https://github.com/cimm-kzn/CGRtools/tree/master/tutorial


Comments

HELM notation in Jupyter Notebook

 

I was recently asked for a way to visualise HELM notation.

HELM (Hierarchical Editing Language for Macromolecules) enables the representation of a wide range of biomolecules such as proteins, nucleotides, antibody drug conjugates etc. whose size and complexity render existing small-molecule and sequence-based informatics methodologies impractical or unusable.
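To give a flavour of the notation, a simple peptide can be written as PEPTIDE1{A.G.C}$$$$, where the part before the first $ lists the polymers and their dot-separated monomers. The sketch below pulls the monomers out of that first section. It is a toy parser written for illustration, not a HELM implementation: the function name is my own, and it assumes no inline SMILES monomers (which can themselves contain dots or dollar signs) and ignores connections and annotations.

```python
import re

# Matches one simple polymer, e.g. PEPTIDE1{A.G.[dF].C}
POLYMER = re.compile(r"([A-Z]+\d+)\{([^}]*)\}")


def parse_simple_polymers(helm):
    """Return {polymer_id: [monomer, ...]} from the first HELM section."""
    first_section = helm.split("$", 1)[0]
    polymers = {}
    for polymer_id, body in POLYMER.findall(first_section):
        # Monomers are dot-separated; multi-character ones are bracketed.
        polymers[polymer_id] = [m.strip("[]") for m in body.split(".")]
    return polymers


example = "PEPTIDE1{A.G.[dF].C}$$$$"
parsed = parse_simple_polymers(example)
```

For anything beyond a quick look at the monomer list, a proper toolkit should be used to read the notation.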

The RDKit provides limited support for HELM notation (currently peptides only) and a simple Jupyter Notebook provides an easy interface as shown here.


Comments

A Quick look at Flare and Python

I recently wrote a review of Flare Version 2, a recent extension to the Cresset portfolio that introduces Electrostatic Complementarity (EC), i.e. a comparison of the electrostatics of both the small-molecule ligand and the target protein. In addition, Flare version 2 includes a new Python API that allows users to automate tasks by scripting and to integrate with other Python packages, such as the RDKit cheminformatics toolkit and Python modules for graphing and statistics (NumPy, SciPy, Matplotlib), as well as with Jupyter notebooks. It is this aspect of Flare that is the subject of this review.


Comments

Python Collection

 

I was sent this recently.

Python Collection

This collection publishes articles describing new Python modules and libraries, as well as applications developed in Python. Python is a free, open source programming language with an emphasis on readability which is widely used in science due to its ease of use and high-performance. Python’s usefulness in research is further bolstered by scientific libraries and tools such as Numpy, Scipy, Pandas, IPython and MatPlotlib. As for example demonstrated by Biopython, Python libraries can be incredibly valuable to other researchers. Publishing a citable, peer reviewed article outlining a new package boosts its visibility and enables its creators to receive proper credit for their contribution.

Very little there at present but I'll keep an eye on it for the future.


Comments

QUBEKit: QUantum BEspoke FF toolKit

 

Just saw an interesting paper "QUBEKit: Automating the Derivation of Force Field Parameters from Quantum Mechanics" DOI.

QUBEKit is a Python-based force field derivation toolkit that allows users to derive accurate molecular mechanics parameters directly from quantum mechanical calculations.

Code is available on GitHub QUBEKit, and there is a user tutorial on the Wiki Page.

Requirements:

  • Anaconda3
  • Biochemical and Organic Simulation System (BOSS)
  • OpenMM
  • Gaussian09
  • ONETEP
  • Matlab 2017

Python modules used:

  • numpy
  • argparse
  • collections
  • colorama
  • matplotlib

Comments

Counting Identical structures in two datasets

 

Sometimes I have two datasets and I just want to know the overlap of identical structures. This Vortex script counts the number of identical structures by comparing InChIKeys. It then displays a matrix showing how many unique molecules are in each dataset and how many molecules are in both datasets.
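Outside Vortex, the same comparison can be sketched with Python sets. This is not the Vortex script itself: the function name and the two small example datasets are my own, and it assumes the InChIKeys have already been generated for each structure.

```python
def overlap_matrix(keys_a, keys_b):
    """Count unique InChIKeys in each dataset and those shared by both."""
    a, b = set(keys_a), set(keys_b)
    shared = len(a & b)
    # Rows and columns are ordered A, B; the diagonal holds unique counts.
    return [[len(a), shared], [shared, len(b)]]


# Example keys: ethanol, benzene (listed twice), methane and water.
dataset_a = [
    "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
    "UHOVQNZJYSORNB-UHFFFAOYSA-N",
    "UHOVQNZJYSORNB-UHFFFAOYSA-N",
    "VNWKTOKETHGBQD-UHFFFAOYSA-N",
]
dataset_b = [
    "UHOVQNZJYSORNB-UHFFFAOYSA-N",
    "XLYOFNOQVPJJNP-UHFFFAOYSA-N",
]
matrix = overlap_matrix(dataset_a, dataset_b)
```

Converting to sets first means duplicates within a dataset are only counted once, which is what you want when asking how many distinct structures overlap.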

Comments

Update to MayaChemTools

 

I just heard that the following command line scripts, available as part of the MayaChemTools package, now support multiprocessing.

  • RDKitCalculateMolecularDescriptors.py
  • RDKitCalculatePartialCharges.py
  • RDKitGenerateConformers.py
  • RDKitFilterChEMBLAlerts.py
  • RDKitFilterPAINS.py
  • RDKitPerformMinimization.py
  • RDKitRemoveSalts.py
  • RDKitSearchSMARTS.py


Comments

PyMOL 2.3 released

 

Just got this message

We are happy to announce the release of PyMOL 2.3. Download ready-to-use bundles from https://pymol.org/2/ or update your installation with "conda install -c schrodinger pymol". New features include:

  • Atom-level cartoon transparency
  • Fast MMTF export
  • Sequence viewer gaps display

This is the first time there are PyMOL bundles with Python 3. If you use custom or third-party Python 2 scripts, they might stop working until you convert them.

Full release notes are here https://pymol.org/dokuwiki/?id=media:new23 and


Comments

New release of MayaChemTools

 

A new release of MayaChemTools is now available. It comprises a fantastic collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors and partial charges
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Performing R group decomposition
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules and salts from molecules

The command line Python scripts based on PyMOL provide functionality for the following tasks:

  • Aligning macromolecules
  • Splitting macromolecules into chains and ligands
  • Listing information about macromolecules
  • Calculation of physicochemical properties
  • Comparison of macromolecules based on RMSD
  • Conversion between different ligand file formats
  • Mutating amino acids and nucleic acids
  • Generating Ramachandran plots
  • Visualizing X-ray electron density and cryo-EM density
  • Visualizing macromolecules in terms of chains, ligands, and ligand binding pockets
  • Visualizing cavities and pockets in macromolecules
  • Visualizing macromolecular interfaces
  • Visualizing surface and buried residues in macromolecules

Comments

Using the Python 3 library fpsim2 for similarity searches

 

FPSim2 is a new tool for fast similarity search on big compound datasets (>100 million) being developed at ChEMBL. It was developed as a Python3 library to support either in memory or out-of-core fast similarity searches on such dataset sizes.

It is built using RDKit and can be installed using conda. It requires Python 3.6 and a recent version of RDKit.
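For anyone new to fingerprint similarity searching, the underlying idea can be sketched in a few lines. This is not the FPSim2 API: it is a toy illustration that assumes Tanimoto similarity (the coefficient most commonly used for this kind of search) and stores each fingerprint as a Python integer whose set bits are the "on" features. The molecule names and bit patterns are invented.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit sets."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in either fingerprint
    return common / total if total else 0.0


def search(query: int, database: dict, threshold: float = 0.7):
    """Return (name, similarity) pairs at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical 8-bit fingerprints; real ones are typically 1024 or 2048 bits.
database = {"mol1": 0b10110110, "mol2": 0b10110100, "mol3": 0b01001001}
query = 0b10110110

results = search(query, database, threshold=0.7)
print(results)
```

A linear scan like this is fine for a few thousand molecules; tools built for hundreds of millions of compounds rely on compiled code and pruning tricks rather than a plain Python loop.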

I've written a couple of Jupyter notebooks to demonstrate its use.

You can read the full tutorial here, and download the notebooks.






Comments

Comparison of bioactivity predictions

 

Small molecules can potentially bind to a variety of biomolecular targets, and whilst counter-screening against a wide panel of targets is feasible, it can be rather expensive and is probably only realistic once a compound has been identified as being of particular interest. For this reason there is considerable interest in building computational models to predict potential interactions. With the advent of large, well-annotated data sets of biological activity such as ChEMBL and BindingDB, this has become possible.

ChEMBL 24 contains 15,207,914 activity data points on 12,091 targets and 2,275,906 compounds; BindingDB contains 1,454,892 binding data points for 7,082 protein targets and 652,068 small molecules.

These predictions may aid understanding of the molecular mechanisms underlying a molecule's bioactivity and help predict potential side effects or cross-reactivity.

Whilst there are a number of sites that can be used to predict bioactivity data, I'm going to compare one site, Polypharmacology Browser 2 (PPB2) http://ppb2.gdb.tools, with two tools that can be downloaded to run the predictions locally: one based on Jupyter notebook models built on ChEMBL data by the ChEMBL group https://github.com/madgpap/notebooks/blob/master/targetpred21_demo.ipynb, and a more recent random forest model, PIDGIN. If you are using proprietary molecules it is unwise to use the online tools.

Read the article here

Comments

Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data

 

Around 4% of the population suffer from colour blindness in one form or another, with red/green colour blindness being the most common. Sadly, in many plots, graphs and presentations little effort is made to make things easier for people with colour blindness.

Color blindness, also known as color vision deficiency (CVD), is the decreased ability to see color or differences in color. Simple tasks such as selecting ripe fruit, choosing clothing, and reading traffic lights can be more challenging. Color blindness may also make some educational activities more difficult.

A recent publication seeks to address this need: Optimizing colormaps with consideration for color vision deficiency to enable accurate interpretation of scientific data DOI.

While there have been some attempts to make aesthetically pleasing or subjectively tolerable colormaps for those with CVD, our goal was to make optimized colormaps for the most accurate perception of scientific data by as many viewers as possible. We developed a Python module, cmaputil, to create CVD-optimized colormaps, which imports colormaps and modifies them to be perceptually uniform in CVD-safe colorspace while linearizing and maximizing the brightness range. The module is made available to the science community to enable others to easily create their own CVD-optimized colormaps.

journal.pone.0199239.g001


Comments

Rescoring Docking using RF-Score-VS

 

A little while back I described a docking workflow including a rescoring script for Vortex, so I thought it might be useful to include this on a separate page.

Recently, machine-learning scoring functions trained on protein-ligand complexes have shown significant promise. One example is RF-Score-VS, which was trained on 15 426 active and 893 897 inactive molecules docked to a set of 102 targets DOI.

Our results show RF-Score-VS can substantially improve virtual screening performance: RF-Score-VS top 1% provides 55.6% hit rate, whereas that of Vina only 16.2% (for smaller percent the difference is even more encouraging: RF-Score-VS top 0.1% achieves 88.6% hit rate for 27.5% using Vina). In addition, RF-Score-VS provides much better prediction of measured binding affinity than Vina (Pearson correlation of 0.56 and −0.18, respectively). Lastly, we test RF-Score-VS on an independent test set from the DEKOIS benchmark and observed comparable results.
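The "hit rate in the top x%" figures quoted above are simple to compute once you have a score and an active/inactive label for each docked molecule. The sketch below is my own illustration rather than code from the paper; it assumes higher scores are better (for raw Vina energies you would rank the other way round), and the scores and labels are invented.

```python
def hit_rate_at_top(scores, is_active, fraction):
    """Fraction of actives among the top-scoring `fraction` of molecules."""
    # Rank molecules from best to worst score (higher is assumed better).
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top = ranked[:n_top]
    return sum(1 for _, active in top if active) / n_top


# Invented scores and activity labels for ten docked molecules.
scores = [9.1, 8.7, 8.2, 7.9, 7.5, 6.8, 6.1, 5.9, 5.2, 4.4]
actives = [True, True, False, True, False, False, False, True, False, False]

rate = hit_rate_at_top(scores, actives, 0.3)
print(f"Hit rate in top 30%: {rate:.1%}")
```

Running the same function on two sets of scores for the same molecules gives a like-for-like comparison of two scoring functions.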

Binaries for RF-Score-VS are available https://github.com/oddt/rfscorevs_binary.

results.png

The full details of the Vortex script are here.


Comments

19th annual KDnuggets Software Poll

 

The results of the 19th annual KDnuggets Software Poll are now in. Continuing the trend of the last few years, Python has expanded its user base again and is now up to 66%. Since a couple of the other options are also Python-based, this could be an underestimate.


There is more detailed analysis on the website. Interestingly Python seems to be the only programming language that is increasing in use.


Comments

Most popular Python IDE, Editors

 

I always keep an eye out for the polls on KDnuggets. The latest one looks at Python editors and IDEs; over 1900 people took part and the results are shown below (users could select up to 3). There is more detail on the linked page.

poll-top-python-ide-468

I've become a great fan of Jupyter, and not only for Python.



Comments

Installing Osprey 3.0 under Mac OS X

 

A recent publication describes OSPREY 3.0: Open-Source Protein Redesign for You, with Powerful New Features DOI.

We present Osprey 3.0, a new and greatly improved release of the osprey protein design software. Osprey 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of osprey when running the same algorithms on the same hardware. Moreover, osprey 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of osprey, osprey 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that osprey 3.0 accurately predicts the effect of mutations on protein–protein binding.

Osprey 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open‐source software GPLv2.

The source code is available on GitHub https://github.com/donaldlab/OSPREY3/.

Unfortunately the installation instructions do not cover Mac OS X, but there are instructions for "Debian-like Linux", which seemed promising. With the invaluable help of Nathan Guerin I was able to get OSPREY installed.

Read more…..


Comments

Open Source Python Data Science Libraries

 

When I wrote the article entitled A few thoughts on scientific software, one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I'd publish a page that hopefully stops people reinventing the wheel. Here are a few open-source cheminformatics toolkits that I'm aware of.

As a follow-up I thought I'd put together a list of useful Python libraries for data science.

As always, I'm happy to hear comments or suggestions for additions.



Comments

GuacaMol, benchmarking models.

 

Comparison of different algorithms is an under-researched area, and this publication looks like a useful starting point.

GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.

Source code : https://github.com/BenevolentAI/guacamol.

The easiest way to install guacamol is with pip:

pip install git+https://github.com/BenevolentAI/guacamol.git#egg=guacamol --process-dependency-links

guacamol requires the RDKit library (version 2018.09.1.0 or newer).


Comments

Samson documentation updated

 

I've mentioned Samson a couple of times and I noticed that the documentation has been updated. Documentation is a critical but often overlooked feature of software.

SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and means "Software for Adaptive Modeling and Simulation Of Nanosystems".



Comments

Making a Random Selection

 

Sometimes it is the simplest scripts that prove to be the most useful. The most downloaded AppleScript on the site is the one that simply prints the text on the clipboard.

I regularly need to select a specified number of molecules in a random fashion and this script does just that. Import an SDF file containing structures into Vortex and run the script to make a random selection.
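Outside Vortex the same idea is a couple of lines of standard-library Python. This is a minimal sketch rather than the Vortex script itself; the function name is hypothetical and the list of identifiers stands in for the rows of an imported file.

```python
import random


def random_selection(records, n, seed=None):
    """Pick n distinct records at random; a seed makes the pick repeatable."""
    if n > len(records):
        raise ValueError("cannot select more records than are available")
    rng = random.Random(seed)  # private generator, leaves global state alone
    return rng.sample(records, n)


# Stand-in for 100 molecules read from a file.
molecules = [f"mol_{i}" for i in range(1, 101)]

picked = random_selection(molecules, 10, seed=42)
print(picked)
```

Recording the seed alongside the results means the same subset can be regenerated later.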

results

Full details here….


Comments

An automated framework for NMR chemical shift calculations of small organic molecules

 

A recent paper in Journal of Cheminformatics describes An automated framework for NMR chemical shift calculations of small organic molecules DOI.

As an alternative, we introduce the in silico Chemical Library Engine (ISiCLE) NMR chemical shift module to accurately and automatically calculate NMR chemical shifts of small organic molecules through use of quantum chemical calculations. ISiCLE performs density functional theory (DFT)-based calculations for predicting chemical properties—specifically NMR chemical shifts in this manuscript—via the open source, high-performance computational chemistry software, NWChem.

ISiCLE is available from GitHub https://github.com/pnnl/isicle or can be installed using Conda (with required dependencies):

conda create -n isicle -c bioconda -c openbabel -c rdkit -c ambermd python=3.6.1 openbabel rdkit ambertools snakemake numpy pandas yaml statsmodels

In addition, ensure the following third-party software is installed and added to your PATH:

cxcalc (license required from ChemAxon, Marvin)
NWChem http://www.nwchem-sw.org/index.php/Download.

ISiCLE is implemented using the Snakemake workflow management system, enabling scalability, portability, provenance, fault tolerance, and automatic job restarting. Snakemake provides a readable Python-based workflow definition language and execution environment that scales, without modification, from single-core workstations to compute clusters through as-available job queuing based on a task dependency graph.

There are more details on Snakemake here.

I've added ISiCLE to the Spectroscopy Page.


Comments

Embedding LaTeX and MathML in Jupyter Notebooks

 

I've been using Jupyter notebooks for a little while, but I only recently found out that you can embed LaTeX or MathML into a notebook!

This notebook is just a series of examples of what can be done. You can embed equations inline or on a separate line in a markdown text cell, or render them from a code cell by importing Math or invoking latex.
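As a quick illustration of the options mentioned above, here is a minimal sketch. It assumes the standard IPython.display helpers Math and Latex; the import is guarded so the snippet also runs as a plain script, where nothing is rendered. The equations are just examples.

```python
# In a markdown cell, inline maths goes between single dollar signs and
# display maths between double dollar signs. The same strings work from code.
inline = r"$E = mc^2$"
display_eq = r"$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$"

try:
    from IPython.display import Latex, Math, display
except ImportError:  # not running under IPython/Jupyter
    Latex = Math = display = None

if display is not None:
    # Math() takes bare LaTeX; Latex() expects the dollar delimiters.
    display(Math(r"\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}"))
    display(Latex(display_eq))

# Another route in a notebook is to start a code cell with the %%latex
# cell magic and write raw LaTeX underneath it.
```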




Comments

How to contribute to RDKit

 

I just noticed that Greg Landrum has posted a page on how to contribute to RDKit. https://github.com/rdkit/rdkit/wiki/HowToContribute.

There are many ways to contribute. You don't have to be a Python or C++ developer; simply being an active user, asking questions and contributing solutions helps other users. Improving the documentation is always a great place for newcomers to start, particularly highlighting things that are not as clear as they could be.

I've also added the link to the Toolkits page.


Comments

PythoMS: A Python framework for analysis of mass spectrometric data

 

An interesting publication for those who use mass spectrometry: PythoMS: A Python Framework to Simplify and Assist in the Processing and Interpretation of Mass Spectrometric Data ChemRxiv.

The PythoMS framework introduces a library of classes and a variety of scripts that quickly perform time-consuming tasks: making proprietary output readable; binning intensity vs time data to simulate longer scan times (and hence reduce noise); calculate theoretical isotope patterns and overlay them in histogram form on experimental data (an approach that works even for overlapping signals); render videos that enable zooming into the baseline of intensity vs. time plots (useful to make sense of data collected over a large dynamic range) or that depict the evolution of different species in a time-lapse format; calculate aggregates; and provide a quick first-pass at identifying fragments in MS/MS spectra. PythoMS is a living project that will continue to evolve as additional scripts are developed and deployed.
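To make the binning idea concrete, here is a small sketch of my own. It is not taken from PythoMS and the function name is hypothetical. It assumes that combining n consecutive scans means summing their intensities and reporting the mean time of each bin, and it drops any incomplete bin at the end of the trace.

```python
def bin_intensities(times, intensities, n):
    """Sum every n consecutive scans; report the mean time of each bin."""
    binned_times, binned_intensities = [], []
    # Step through complete bins only; a trailing partial bin is discarded.
    for start in range(0, len(intensities) - n + 1, n):
        chunk_t = times[start:start + n]
        chunk_i = intensities[start:start + n]
        binned_times.append(sum(chunk_t) / n)
        binned_intensities.append(sum(chunk_i))
    return binned_times, binned_intensities


# Invented intensity vs time trace with seven scans.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
intensities = [10, 12, 11, 30, 28, 32, 9]

bt, bi = bin_intensities(times, intensities, 3)
print(bt, bi)
```

Summing n scans raises the signal roughly n-fold while random noise grows more slowly, which is why binning mimics a longer scan time.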

All available on GitHub under the MIT license https://github.com/larsyunker/PythoMS. This package has been written for python 3.5+.

I've added it to the Spectroscopy page.



Comments

New functionality in PyMOL command line scripts

 

MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The PyMOL command line scripts now have additional functionality:

  • Volume objects to visualize X-ray and cryo-EM density for complex, chains, ligands, binding pockets, pocket solvents, pocket inorganics, etc.
  • Alignment of macromolecules and densities during visualization of X-ray and cryo-EM densities
  • Surface colored by vacuum electrostatics at residue level for chains and pockets
  • Surface colored by hydrophobicity along with charge at atom level for chains and pockets
  • Aromatic, polar, positively charged, negatively charged, and other residue group objects for chains and pockets

Comments

Camelot, python tool for extracting PDF table data

 

Camelot is described as "PDF Table Extraction for Humans"; it is a Python library that makes it easy to extract tables from PDF files.

>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
<TableList n=1>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]
<Table shape=(7, 7)>
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas DataFrame!

Camelot only works with text-based PDFs and not scanned documents. Camelot also comes with a command-line interface. It can be installed using conda:

$ conda install -c camelot-dev camelot-py

I've added it to the Data Analysis tools page

Comments

TTClust : A molecular simulation clustering program

 

TTClust DOI is a Python program used to cluster molecular dynamics simulation trajectories. It only requires a trajectory and a topology file (compatible with most molecular dynamics packages such as Amber, Gromacs, Charmm and Namd, or a trajectory in PDB format, thanks to the MDTraj package).

It is available on GitHub https://github.com/tubiana/TTClust.

For Mac users

If you have issues with pip, first try adding the --ignore-installed argument to pip:

sudo pip install --ignore-installed -r requirements.txt

If it still doesn't work, it may be because of System Integrity Protection (SIP). In this case I suggest you install Anaconda or Miniconda and restart your terminal afterwards. The pip command should then work because your default Python will be the Anaconda (or Miniconda) Python.

If you still have issues with the GUI or missing packages, install them with pip:

pip install wxpython==4.0.0b1
pip install pandas
pip install ttclust

To activate autocompletion for the argparse module, you have to use this command (only once):

sudo activate-global-python-argcomplete



Comments

Install RDKit using Conda

 

Just highlighted on the RDKit email list, you can install RDKit using conda.

https://anaconda.org/conda-forge/rdkit

RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.

There are other cheminformatics toolkits described here, and details on how to install a wide range of cheminformatics tools on a Mac here.


Comments

pywindow: Automated Structural Analysis of Molecular Pores

 

An interesting recent publication describes pywindow DOI, a Python package for the analysis of structural properties of molecular pores (porous organic cages, but also MOFs and metal-organic cages).

Structural analysis of molecular pores can yield important information on their behavior in solution and in the solid state. We developed pywindow, a python package that enables the automated analysis of structural features of porous molecular materials, such as molecular cages.

Freely available on Github https://github.com/JelfsMaterialsGroup/pywindow

Requires numpy, scipy, scikit-learn

A number of Jupyter notebook examples are provided

Example 1: Structural analysis of a single molecule loaded from a file type. (multiple examples)
Example 2: Structural analysis of a single molecule loaded from an RDKit Molecule object. (requires RDKit)
Example 3: Calculating an average molecule diameter.
Example 4: Analysis of a MOF.
Example 5: Analysis of a metal-organic cage.
Example 6: Analysis of a periodic system containing several molecular pores that requires unit cell reconstruction.
Example 7: Analysis of an MD trajectory containing a single molecular pore.
Example 8: Analysis of an MD trajectory containing a periodic system with multiple molecular pores that requires unit cell reconstruction.


Comments

Installing Cheminformatics packages on a Mac

 

A while back I wrote a very popular page describing how to install a wide variety of cheminformatics packages on a Mac. Since then, changes to Homebrew have meant that a few of the scientific applications are no longer available, so I've decided to rewrite the page, installing the missing packages using Anaconda.

I've also included a list of quick demos so you can check everything is working as expected.

Full details are here

Packages include:

  • OpenBabel
  • RDKit
  • cdk
  • chemspot
  • indigo
  • inchi
  • opsin
  • osra
  • pymol
  • oddt

The page also covers gfortran and a selection of developer tools.

Comments

MayaChemTools

 

MayaChemTools now includes a collection of Python scripts for PyMOL.

The command line Python scripts based on PyMOL provide functionality for the following tasks:

  • Aligning macromolecules
  • Splitting macromolecules into chains and ligands
  • Listing information about macromolecules
  • Calculation of physicochemical properties
  • Comparison of macromolecules based on RMSD
  • Conversion between different ligand file formats
  • Visualizing X-ray electron density and cryo-EM density
  • Visualizing macromolecules in terms of chains, ligands, and ligand binding pockets

MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.


Comments

ChEMBL 24 predictive models

 

ChEMBL was recently updated to version 24. The update contains:

  • 2,275,906 compound records
  • 1,828,820 compounds (of which 1,820,035 have mol files)
  • 15,207,914 activities
  • 1,060,283 assays
  • 12,091 targets
  • 69,861 documents

In addition, today they released the predictive models built on the updated database. They can be downloaded from the ChEMBL ftp server ftp://ftp.ebi.ac.uk/pub/databases/chembl/target_predictions

There are 1569 models.


Comments

Updated Conda

 

I've been checking a few things since I updated. One thing that was immediately apparent was that the similarity maps in RDKit are much nicer, as you can see from the output of the HERG prediction.

hergactiverdkit

Feel like I got something for free.


Comments

Updating conda

 

I've been putting off doing any updates until I finished a substantial piece of work, but now I have time so wish me luck.

conda update -n root conda

conda update --all

Comments

Accessing a Jupyter Notebook HERG model from Vortex

 

A recent paper "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI described a classification model for HERG activity. I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub (https://github.com/AGPreissner/Publications).

The models were downloaded and the Random Forest Jupyter Notebooks (using RDKit) modified to save the generated model using pickle to store the predictive model, and then another Jupyter notebook was created to access the model without the need to rebuild the model each time. This notebook was exported as a python script to allow command line access, and Vortex scripts created that allow the user to run the model within Vortex and import the results and view the most significant features.
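The save-once, reload-many-times pattern described above looks roughly like this. It is a sketch rather than the code from the notebooks: a plain dictionary stands in for the fitted Random Forest so the example has no dependencies, and the file name and helper functions are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier. In the real workflow this would be the
# fitted Random Forest object; a dict keeps the sketch dependency-free.
model = {"name": "herg_rf_placeholder", "threshold": 0.5, "weights": [0.2, 0.7, 0.1]}


def save_model(obj, path):
    """Serialise the model to disk so it need not be rebuilt each time."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_model(path):
    """Read a previously pickled model back into memory."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


model_path = os.path.join(tempfile.gettempdir(), "herg_model.pkl")
save_model(model, model_path)
restored = load_model(model_path)
print(restored["name"])
```

Pickle files can execute arbitrary code when loaded, so only unpickle models from sources you trust, and bear in mind that a pickled model may not load under a different version of the library that created it.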

All models and scripts are available for download.

Full details are here…

hergactiveVortex


Comments

Scaling Python with Dask webinar

 

This looks to be an interesting webinar on Dask

https://know.anaconda.com/Scaling-Python-Dask-Webinar.html Wednesday, May 30th at 2:00PM CDT.

Dask is a flexible parallel computing library for analytic computing.

Dask is composed of two components:

  • Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
  • “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.

Comments

Intel® Distribution for Python

 

Anyone fancy taking this for a test drive and providing some information on performance?

Get real performance results and download the free Intel Distribution for Python that includes everything you need for blazing-fast computing, analytics, machine learning, and more. Use Intel Python with existing code, and you’re all set for a significant performance boost.

The core computing packages, Numpy, SciPy, and scikit-learn, are accelerated under the hood with powerful, multithreaded native performance libraries such as Intel® Math Kernel Library, Intel® Data Analytics Acceleration Library, and others, to deliver native code-like performance results to Python. We leverage Intel® hardware capabilities using multiple cores and the latest Intel® Advanced Vector Extensions (Intel® AVX) instructions, including Intel® AVX-512. The Intel Python team reimplemented select algorithms to dramatically improve their performance. Examples include NumPy FFT and random number generation, SciPy FFT, and more.

Available for Windows, Linux and macOS.

Minimum System Requirements

  • Processors: Intel Atom® processor or Intel® Core™ i3 processor
  • Disk space: 1 GB
  • Operating systems: Windows* 7 or later, macOS, and Linux
  • Python* versions: 2.7.X, 3.5.X, 3.6
  • Included development tools: Conda, conda-env, Jupyter Notebook (IPython)

Comments

STK: A Python Toolkit for Supramolecular Assembly

 

I bookmarked this paper a while back but have only just had time to read it through: STK: A Python Toolkit for Supramolecular Assembly. STK is a tool for the automated assembly, molecular optimization and property calculation of supramolecular materials. It has a simple Python API and integration with third party computational codes.

The source code of the program can be found at https://github.com/lukasturcani/stk and the detailed documentation is here.

Additional linking functional groups can be defined as SMARTS and STK can be extended by adding additional optimisation force-fields.

molecular_cage


Comments

RDkit in Samson

 

I've posted about Samson a couple of times and it just keeps getting better and better.

SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and means "Software for Adaptive Modeling and Simulation Of Nanosystems".

A recent blog post highlights the use of RDKit in Samson.

In this post I will present you the RDKit-SMILES Manager module that I integrated in the SAMSON platform. As some of you know, RDKit is an open source toolkit for cheminformatics which is widely used in the bioinformatics research. One of its features is the conversion of molecules from their SMILES code to a 2D and 3D structures. Thanks to the new SAMSON Element, it is now possible to use these features in the SAMSON platform. SMILES code files (.smi) or text files (.txt) containing several SMILES codes can be read using the import button.

The new module allows you to import a file containing SMILES strings, generate 2D depictions, and by right-clicking on these images, you can open, generate the 3D structure in SAMSON or save the image as png or svg.

GenAll

It is also possible to run substructure searching using SMARTS.


Comments

Rodeo: A Python IDE for Data Scientists

 

Just added Rodeo, a Python IDE built for analysing data, to the page of data analysis tools.

rodeo-overview-shot


Comments

mmpdb: An Open Source Matched Molecular Pair Platform for Large Multi-Property Datasets

 

An interesting paper on ChemRxiv DOI.

Matched Molecular Pair Analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property datasets. Here we present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from https://github.com/rdkit/mmpdb


Comments

Flagging Potential Kinase Inhibitors

 

Most kinase inhibitors bind in the region of the ATP binding site, using the hydrogen bonding interactions of the hinge region shown in the schematic below. We can use the knowledge of these hinge binding motifs to flag potential kinase inhibitors.

schematicatpbinding

Read more ….


Comments

Top 20 programming languages

 

RedMonk have published their Programming Language Rankings. The data source used for these queries is the GitHub Archive.

  1. JavaScript
  2. Java
  3. Python
  4. PHP
  5. C#
  6. C++
  7. CSS
  8. Ruby
  9. C
  10. Swift
  11. Objective-C
  12. Shell
  13. R
  14. TypeScript
  15. Scala
  16. Go
  17. PowerShell
  18. Perl
  19. Haskell
  20. Lua

Swift (+1): Finally, the apprentice is now the master. Technically, this isn’t entirely accurate, as Swift merely tied the language it effectively replaced – Objective C – rather than passing it. Still, it’s difficult to view this run as anything but a changing of the guard. Apple’s support for Objective C and the consequent opportunities it created via the iOS platform have kept the language in a high profile role almost as long as we’ve been doing these rankings. Even as Swift grew at an incredible rate, Objective C’s history kept it out in front of its replacement. Eventually, however, the trajectories had to intersect, and this quarter’s run is the first occasion in which this has happened. In a world in which it’s incredibly difficult to break into the Top 25 of language rankings, let alone the Top 10, Swift managed the chore in less than four years. It remains a growth phenomenon, even if its ability to penetrate the server side has not met expectations.


Comments

Awesome Python Chemistry

 

A curated list of awesome Python frameworks, libraries, software and resources related to Chemistry.

https://github.com/lmmentel/awesome-python-chemistry

A blog post giving more details http://lukaszmentel.com/blog/awesome-python-chemistry/index.html.


Comments

MayaChem Tools

 

MayaChemTools is a fabulous collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules

These invaluable scripts can be used in other applications; I've written a Vortex script that uses them.


Comments

UCSF ChimeraX

 

A recent publication DOI describes an update to the popular molecule viewer UCSF Chimera.

UCSF ChimeraX is next-generation software for the visualization and analysis of molecular structures, density maps, 3D microscopy, and associated data. It addresses challenges in the size, scope, and disparate types of data attendant with cutting-edge experimental methods, while providing advanced options for high-quality rendering (interactive ambient occlusion, reliable molecular surface calculations, etc.) and professional approaches to software design and distribution.

The application can be downloaded here http://www.rbvi.ucsf.edu/chimerax/download.html

It is important to note that ChimeraX is not backward compatible with Chimera and does not read Chimera session files. It has been tested on Mac OS X 10.12. The ChimeraX user interface is implemented in Qt, offering a native-like look and feel on each platform. ChimeraX is largely implemented in Python, an interpreted programming language. To manipulate very large datasets interactively, ChimeraX uses memory-efficient data structures combined with high-performance algorithms implemented in C++. The macromolecular Crystallographic Information File (mmCIF) is the preferred format for atomic data in ChimeraX; it replaces the aged and more limited PDB format and offers a number of advantages.


Comments

Python support in Excel

 

The most popular suggestion on the "How can we improve Excel for Windows" forum is Python as an Excel scripting language, with over 4500 votes, and it has elicited a comment from the Microsoft Excel team.

Thanks for the continued passion around this topic. We’d like to gather more information to help us better understand the needs around Excel and Python integration.

Followed by a survey.

Of course one would hope that they also add it to the Mac version of Excel.

Comments

Deep Learning Cheat Sheet (using Python Libraries)

 

Just came across this really invaluable resource.

  • Deep Learning Cheat Sheet (using Python Libraries)
  • PySpark Cheat Sheet: Spark in Python
  • Data Science in Python: Pandas Cheat Sheet
  • Cheat Sheet: Python Basics For Data Science
  • A Cheat Sheet on Probability
  • Cheat Sheet: Data Visualization with R
  • New Machine Learning Cheat Sheet by Emily Barry
  • Matplotlib Cheat Sheet
  • One-page R: a survival guide to data science with R
  • Cheat Sheet: Data Visualization in Python
  • Stata Cheat Sheet
  • Common Probability Distributions: The Data Scientist’s Crib Sheet
  • Data Science Cheat Sheet
  • 24 Data Science, R, Python, Excel, and Machine Learning Cheat Sheets
  • 14 Great Machine Learning, Data Science, R, DataViz Cheat Sheets



Comments

YANK

 

YANK is a GPU-accelerated Python framework for exploring algorithms for alchemical free energy calculations.

Features

  • Modular Python framework to facilitate development and testing of new algorithms
  • GPU-accelerated via the OpenMM toolkit
  • Alchemical free energy calculations in both explicit and implicit solvent
  • Hamiltonian exchange among alchemical intermediates with Gibbs sampling framework
  • General Markov chain Monte Carlo framework for exploring enhanced sampling methods
  • Built-in equilibration detection and convergence diagnostics
  • Support for AMBER prmtop/inpcrd files
  • Support for absolute binding free energy calculations
  • Support for transfer free energies (such as hydration or partition free energies)
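
As a rough illustration of the Hamiltonian exchange idea listed above, the sketch below evaluates the standard Metropolis criterion for swapping configurations between two alchemical states. It is a simplified pairwise version written for this note, not YANK's Gibbs sampling implementation or its API, and the function names swap_probability and attempt_swap are hypothetical.

```python
import math
import random


def swap_probability(u_ii, u_jj, u_ij, u_ji):
    """Metropolis probability of swapping configurations between states i and j.

    u_ab is the reduced (dimensionless) potential of the configuration
    currently held by replica b, evaluated in thermodynamic state a.
    """
    # Change in total reduced potential if the two configurations are exchanged.
    delta = (u_ij + u_ji) - (u_ii + u_jj)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_swap(u_ii, u_jj, u_ij, u_ji, rng=random):
    """Return True if a proposed swap is accepted."""
    return rng.random() < swap_probability(u_ii, u_jj, u_ij, u_ji)
```

A swap that lowers the combined reduced potential is always accepted; otherwise it is accepted with probability exp(-delta), which is what lets configurations diffuse between neighbouring alchemical intermediates.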

Install using conda

$ conda config --add channels omnia --add channels conda-forge
$ conda install yank

conda will install dependencies from binary packages automatically, including difficult-to-install packages such as OpenMM, numpy, and scipy. YANK runs on Python 3.5 and Python 3.6.


Comments

Open Drug Discovery Toolkit

 

Open Drug Discovery Toolkit (ODDT) is a modular and comprehensive toolkit for use in cheminformatics, molecular modeling, etc. ODDT is written in Python and makes extensive use of NumPy/SciPy.

Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field DOI.

The Open Drug Discovery Toolkit was developed as a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. ODDT reimplements many state-of-the-art methods, such as machine learning scoring functions (RF-Score and NNScore) and wraps other external software to ease the process of developing CADD pipelines. ODDT is an out-of-the-box solution designed to be easily customizable and extensible.

To install

Install a clean Miniconda environment, if you don't already have one.

Install ODDT:

conda install -c oddt oddt

Or you can use pip:

pip install oddt

Requirements

Python 2.7+ or 3.4+
OpenBabel (2.3.2+) and/or RDKit (2016.03)
Numpy (1.8+)
Scipy (0.13+)
Sklearn (0.18+)
joblib (0.8+)
pandas (0.17)
Skimage (0.10+) (optional, only for surface generation)


Comments

Cluster mols

 

cluster_mols is a PyMOL plugin that allows the user to quickly select compounds from a virtual screen to be purchased or synthesized.

The most up to date version (recommended) of clustermols is available through BitBucket at: https://bitbucket.org/mpb21/clustermols_py/overview

This plugin has a number of required dependencies and is currently supported only on Linux and OS X.

Baumgartner, Matthew (2016) IMPROVING RATIONAL DRUG DESIGN BY INCORPORATING NOVEL BIOPHYSICAL INSIGHT. Doctoral Dissertation, University of Pittsburgh.


Comments

FreeSASA

 

FreeSASA is a command line tool, C-library and Python module for calculating solvent accessible surface areas (SASA).

The Read Me gives download, build and installation instructions, and also details how to build the Python interface.
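
To make the quantity being calculated concrete, here is a plain-Python sketch of the classical Shrake and Rupley test-point approach to estimating SASA. It is written for illustration only and is not FreeSASA's code or its Python API; the function names, the 1.4 Å probe radius default and the number of test points are assumptions chosen for the example.

```python
import math


def sphere_points(n):
    """Spread n points roughly evenly over a unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sasa(atoms, probe=1.4, n_points=200):
    """Per-atom SASA for atoms given as (x, y, z, radius) tuples."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded = ri + probe  # radius of the probe-centre surface
        exposed = 0
        for ux, uy, uz in unit:
            px = xi + expanded * ux
            py = yi + expanded * uy
            pz = zi + expanded * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                d2 = (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2
                # A test point is buried if it lies inside a neighbour's
                # probe-expanded sphere.
                if d2 < (rj + probe) ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        # Exposed fraction of test points times the expanded sphere's area.
        areas.append(4.0 * math.pi * expanded ** 2 * exposed / n_points)
    return areas
```

For an isolated atom every test point is exposed, so the result is simply the area of the probe-expanded sphere; bringing a second atom close removes the test points that fall inside it. The all-pairs loop is fine for a demonstration but a real tool would use a neighbour list.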

Simon Mitternacht (2016) FreeSASA: An open source C library for solvent accessible surface area calculations. F1000Research 5:189. DOI


Comments

Scripting PubMed searches

 

PubMed comprises more than 24 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites. NCBI also provides a number of programming tools that allow access to this information: the E-utilities are a set of server-side programs that provide a stable interface into the Entrez query and database system.

To access these data, a piece of software first posts an E-utility URL to NCBI, then retrieves the results of this posting, after which it processes the data as required. The software can thus use any computer language that can send a URL to the E-utilities server and interpret the XML response; examples of such languages are Perl, Python, Java, and C++.

A while back I wrote a Vortex script that helps with this sort of search when you have multiple terms you want to search. I've updated this script to incorporate the changes requiring API keys to allow multiple requests to the E-utilities API, and I've highlighted where you need to add your own API key in the script. I've also tried to ensure that any query string is encoded to make it URL safe.
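
As a minimal sketch of the URL-building step described above, the Python below assembles an ESearch request using only the standard library. The esearch.fcgi endpoint and the db, term, retmax and api_key parameters belong to the public E-utilities interface; the helper name build_esearch_url and the way the key is passed in are illustrative assumptions and are not taken from the Vortex script.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, api_key=None, retmax=20):
    """Return a URL-safe ESearch query against PubMed for one search term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    if api_key:
        # Supply your own NCBI API key; none is hard-coded in this sketch.
        params["api_key"] = api_key
    # urlencode escapes spaces, brackets and other unsafe characters.
    return ESEARCH_URL + "?" + urlencode(params)
```

Looping over a list of search terms and calling build_esearch_url for each one gives a set of requests that can then be sent with any HTTP client and parsed as XML.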

The update is detailed more fully here….


Comments

Downloading from the RCSB Protein Data Bank using Python

 

The RCSB Protein Data Bank is an absolutely invaluable resource that provides archive-information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps scientists understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. Currently the PDB contains over 134,000 data files containing structural information on 42,547 distinct protein sequences, of which 37,600 are human sequences. They also provide a series of tools to search, view and analyse the data.

Downloading an individual PDB file is pretty trivial and can be done from the web page as shown in the image below. They also provide a Download Tool launched as a stand-alone application using the Java Web Start protocol. The tool is downloaded locally and must then be opened. I've found this a little temperamental and had issues with Java versions and security settings.

Since I've been making extensive use of the web services to interact with RCSB I decided to explore the use of Python to download multiple files. I started off creating a Jupyter notebook using the web services provided by RCSB.

I've also used variations on this code to create a Python script and a Vortex script.
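
For readers who want a starting point, the sketch below shows one way to fetch several entries with the standard library. It is not the notebook code referred to above: it assumes the public files.rcsb.org download address, and the function names and the default output folder are hypothetical choices for this example.

```python
import urllib.request
from pathlib import Path

# Public RCSB file-download address; entries are served as <ID>.pdb.
DOWNLOAD_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id):
    """Return the download URL for a four-character PDB identifier."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("not a valid PDB identifier: %r" % pdb_id)
    return DOWNLOAD_URL.format(pdb_id=pdb_id)


def download_pdb_files(pdb_ids, out_dir="pdb_files"):
    """Download each entry into out_dir and return the saved paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for pdb_id in pdb_ids:
        url = pdb_url(pdb_id)
        path = target / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(path))  # needs network access
        saved.append(path)
    return saved
```

Calling download_pdb_files with a list of identifiers writes one .pdb file per entry into the output folder. Validating the identifier before building the URL means a typo in the list fails early rather than producing a confusing HTTP error.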

Full details are here …


Comments

Interacting with the RCSB Protein Data Bank

 

The RCSB Protein Data Bank is an absolutely invaluable resource that provides archive-information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps scientists understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. Currently the PDB contains over 134,000 data files containing structural information on 42,547 distinct protein sequences, of which 37,600 are human sequences. They also provide a series of tools to search, view and analyse the data.

The latest addition to the Hints and Tutorials page is a couple of Vortex scripts for interacting with the RCSB Protein Data Bank. Specifically, they search for PDB structures associated with a list of UniProt codes, and then search for associated information. Read more here…

Comments

Bioconda: A sustainable and comprehensive software distribution for the life sciences

 

Bioconda is a channel for the conda package manager specializing in bioinformatics software.

Bioconda supports only 64-bit Linux and Mac OSX.

Bioconda offers a collection of over 2900 software tools, which are continuously maintained, updated, and extended by a growing global community of more than 250 contributors. Bioconda improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without administrative privileges.

The conda package manager has recently made installing software a vastly more streamlined process. Conda is a combination of other package managers you may have encountered, such as pip, CPAN, CRAN, Bioconductor, apt-get, and homebrew. Conda is both language- and OS-agnostic, and can be used to install C/C++, Fortran, Go, R, Python, Java, etc.

You can read more details in this publication "Bioconda: A sustainable and comprehensive software distribution for the life sciences", doi.

Whilst there are a number of compilations of bioinformatics software, Bioconda looks to be by far the most comprehensive.

After installing Conda, the first step is to set up the Bioconda channel

conda config --add channels conda-forge
conda config --add channels bioconda

Packages can then be installed using

conda install cnvkit

This installs CNVkit plus the appropriate Python and R dependencies.


Comments

SAMSON, Software for Adaptive Modeling and Simulation Of Nanosystems

 

SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and the name stands for "Software for Adaptive Modeling and Simulation Of Nanosystems".

SAMSON has an open architecture which allows anyone to extend it - and adapt it to their needs - by downloading SAMSON Elements (modules). SAMSON Elements come in many flavors: apps, editors, controllers, models, parsers, etc., and are adapted to different application domains. SAMSON Elements help users build new models, perform calculations, run interactive or offline simulations, visualize and interpret results, and more. Add new SAMSON Elements to SAMSON straight from SAMSON Connect.

In the latest news, Python scripting is coming to SAMSON 0.7.0. Most of the SAMSON API is now exposed in Python, and this will allow you to create models and run simulations, generate movies, perform analysis and reporting, etc., directly from scripts. Python will make it even easier to integrate and pipeline SAMSON and SAMSON Elements with well-known packages from diverse fields, e.g. TensorFlow, PyRosetta, RDKit and ASE.


Comments

RDKit conformer generation script

 

At Pharmacelera we have written a Python script to generate conformations with RDKit and made it available here.

Conformer generation is one of the first and most important steps in most ligand-based experiments, particularly when the ligand’s 3D structure is unknown. For example, the quality of the conformers could affect the results of virtual screening experiments.


Comments

RDKit warning

 

I just saw this message on the RDKit mailing list and I thought I'd flag it.

I've noticed a problem with anaconda python on the Mac. This may also be a problem on linux, but I haven't tested that yet.

Due to some changes in the way the anaconda team is doing python builds, the most recent conda python builds seem to no longer work with the RDKit. The symptom is an error message like "Fatal Python error: PyThreadState_Get: no current thread" when you try to import the rdkit.

I've observed this for the newest 3.5 (3.5.4-hf91e95415) and 3.6 (3.6.2-hd0bf7f115) builds. A workaround is to downgrade to 3.5.3 (conda install python=3.5.3) or 3.6.1 (conda install python=3.6.1).
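
A quick way to see whether an environment is running one of the Python versions named in that message is sketched below. This is only a convenience check written for this note: the version number alone does not identify a particular conda build, and the names AFFECTED_VERSIONS and is_affected are hypothetical.

```python
import sys

# Python releases reported as affected in the message above.
AFFECTED_VERSIONS = {(3, 5, 4), (3, 6, 2)}


def is_affected(version_info=None):
    """Return True if the interpreter version matches an affected release."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major, minor and micro; ignore release level and serial.
    return tuple(version_info[:3]) in AFFECTED_VERSIONS
```

If is_affected() returns True inside a conda environment and importing the rdkit fails with the error quoted above, the downgrade suggested in the message is the workaround to try.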

Comments

Scoria: a Python module for manipulating 3D molecular data

 

Just catching up on reading the literature and came across this interesting Python paper in the Journal of Cheminformatics, DOI.

Scoria is useful for both analyzing molecular dynamics (MD) trajectories and molecular modeling. For example, we have used beta-version Scoria functions to create large-scale lipid-bilayer models, to construct small-molecule models with improved predicted binding affinities, to measure MD-sampled binding-pocket shapes and volumes, and to develop neural-network docking scoring functions, among other applications. As an additional example, in this manuscript we describe a trajectory-analysis Scoria script that colors the atoms of one protein chain by the frequency of their contacts with a second chain.


Comments

Predicting sites of metabolism Vortex script

 

It is really useful to have two site-of-metabolism tools available that use contrasting methodologies. FAME 2 uses a curated dataset of experimentally determined metabolism data to build a machine learning model using simple descriptors. In contrast, SMARTCyp uses precomputed activation energies from density functional theory (DFT) calculations of model compounds.

I previously wrote a script displaying the results of a SMARTCyp calculation in a webview. The first part of the script imports the smartcyp.jar; however, with each update I was finding issues, so I thought it might be better to simply treat SMARTCyp as a command line application and use subprocess to access it.

Using a similar script we can also access FAME 2.
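
The general pattern of treating a Java tool as a command line application can be sketched as follows. This is not the Vortex script itself: the helper names are hypothetical, and the argument layout (java -jar, then the jar, any options, then the input file) is an assumption that should be checked against the documentation of the tool being called.

```python
import subprocess


def build_command(jar_path, input_file, extra_args=()):
    """Assemble the argument list for running a Java command-line tool."""
    # Assumed layout: java -jar <jar> [options] <input file>
    return ["java", "-jar", jar_path, *extra_args, input_file]


def run_tool(command, timeout=300):
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces, and check=True turns a failed run into an exception instead of silently returning empty output.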

More details here.


Comments

chemfp 1.3 released

 

Chemfp is a set of command-line tools and a Python library for working with cheminformatics fingerprints. It can use OEChem/OEGraphSim, RDKit, or Open Babel to create fingerprints in the FPS format, and it implements a high-speed Tanimoto search.
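
To show what a Tanimoto search computes, here is a small pure-Python sketch that stores each fingerprint as an integer bit field. It illustrates the similarity measure only and is not chemfp's API or its optimised implementation; the function names, the 0.7 default threshold and the convention of returning 0.0 for two empty fingerprints are choices made for this example.

```python
def popcount(bits):
    """Count the set bits in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as integers."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(fp_a & fp_b) / union


def search(query, targets, threshold=0.7):
    """Return (id, score) pairs scoring at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in targets.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The score is the number of bits set in both fingerprints divided by the number set in either, so the inner loop of any such search is dominated by bit counting.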

The software is available under the MIT license. For more information see http://chemfp.com/. Documentation is available from http://chemfp.readthedocs.io/en/chemfp-1.3/ .

There are many changes over chemfp 1.1, which was the last release of the public/no-cost version of chemfp. The biggest ones are:

  • Tested against the current version of all of the toolkits

  • Added support for the Avalon and pattern fingerprints in RDKit

  • In-memory Tanimoto searches for 166-bit MACCS keys on computers with the POPCNT instruction are about 30% faster.

  • FPS loading is about 40% faster. As a result, file-based searches are about 25% faster.

  • The in-memory search algorithms in version 1.1 were parallelized with OpenMP, but the NxM k-nearest search was left out. That case is now also parallelized.

  • Some of the APIs from the commercial version were backported to 1.3, including the fingerprint writer API and functions for substructure fingerprint screening.

  • Added and improved docstrings

This release supports Python 2.7 but it no longer supports Python 2.5 or Python 2.6. The commercial version supports Python 2.7 and Python 3.5+, handles more than 4GB of fingerprint data, and has a binary fingerprint format for fast loading.

It is available from http://dalkescientific.com/releases/chemfp-1.3.tar.gz.


Comments

Accessing Jupyter Notebook model from Vortex

Chemical Drawing Programs – The Comparison of Accelrys (Symyx) Draw, ChemDraw, DrawIt, ACD/ChemSketch, ChemDoodle and Chemistry 4-D Draw

http://dragon.unideb.hu/~gundat/rajzprogramok/dprog.html

There is also a comparison of six chemical drawing packages here