pyCHARMM: Embedding CHARMM Functionality in a Python Framework
CHARMM is a very highly regarded biomolecular simulation and modelling package. pyCHARMM allows the user to access CHARMM functionality from Python.
We anticipate that pyCHARMM will be a robust platform for the development of comprehensive and complex workflows utilizing Python and its extensive functionality as well as an optimal platform for users to learn molecular modeling methods and practices within a Python-friendly environment such as Jupyter Notebooks.
The publication is here https://doi.org/10.26434/chemrxiv-2023-j6wt1
And there is a GitHub repo for a workshop here https://github.com/clbrooksiii/pyCHARMM-Workshop
Setting up ML and AI tools on Apple Silicon
I've had a number of questions about setting up a machine learning/artificial intelligence environment on an Apple Silicon Mac. So I've tried to write a step by step guide.
Setting up ML and AI tools on Apple Silicon, using Homebrew and conda to install and manage compatibility and dependencies.
I've also created a .yml file that you can use instead of going through all the steps.
There are a couple of example Jupyter notebooks that give a starting point for trying things out.
I'm very much aware that this is a bit of a moving target at the moment so comments/suggestions are much appreciated.
JupyterLab Desktop App now available
I started using IPython Notebooks many years ago; these became Jupyter notebooks and I'm now transitioning to JupyterLab.
I noticed recently there is now a JupyterLab desktop app.
JupyterLab App is the cross-platform standalone application distribution of JupyterLab. It is a self-contained desktop application which bundles a Python environment with several popular Python libraries ready to use in scientific computing and data science workflows.
It is available from GitHub https://github.com/jupyterlab/jupyterlab_app#download.
JupyterLab App works on Debian and Fedora based Linux, macOS and Windows operating systems.
Ammolite
This looks really interesting, Ammolite enables the transfer of structure related objects from Biotite to PyMOL for visualization, via PyMOL’s Python API:
- Import AtomArray and AtomArrayStack objects into PyMOL - without intermediate structure files
- Convert PyMOL objects into AtomArray and AtomArrayStack instances.
- Use Biotite’s boolean masks for atom selection in PyMOL.
- Display images rendered with PyMOL in Jupyter notebooks.
To install
conda install -c conda-forge ammolite
Biotite package bundles popular tasks in computational molecular biology into a uniform Python library.
JupyterLab 3.0 released
JupyterLab is the next-generation web-based user interface for Project Jupyter.
JupyterLab 3.0 includes a number of new features and enhancements that are described on the Jupyter blog. Full details are described in the ChangeLog
To install using conda
conda install -c conda-forge jupyterlab=3
However note that some extensions may not yet have been updated.
Jupyter Notebook for docking either locally or using Colab
Here are two variations of a Jupyter Notebook to help with docking experiments. The first version runs locally and requires the user to install RDKit, OpenBabel, SMINA and py3Dmol; the second version can be run using Google Colab and thus all you require is a web browser.
LFortran 0.9.0 is released
I'm not a big Fortran user but I know that the Fortran on Mac page is regularly the most popular page on the site, so I do post snippets of news I hope are useful.
LFortran 0.9.0 is released.
LFortran is a modern open-source (BSD licensed) interactive Fortran compiler built on top of LLVM. It can execute user’s code interactively to allow exploratory work (much like Python, MATLAB or Julia) as well as compile to binaries with the goal to run user’s code on modern architectures such as multi-core CPUs and GPUs.
The easiest is to install using Conda:
conda install lfortran jupyter
The Fortran Jupyter notebooks now just work on Linux, macOS and Windows, including stdout capture (print *, "Hello World!"), etc.
More information is here lfortran.org and on GitLab https://gitlab.com/lfortran/lfortran.
Swift for TensorFlow (and other things).
After creating MolSeeker and iBabel4 I've been investigating the use of Swift and in particular the open-source use.
Swift.org provides a nice introduction and overview, it also highlights the Google Summer of Code Swift projects which are a fabulous way for students to get involved.
The Google Swift for TensorFlow group have been very active, and Tryolabs have recently posted a detailed summary, including a comparison with other languages.
Two years ago, a small team at Google started working on making Swift the first mainstream language with first-class language-integrated differentiable programming capabilities. The scope and initial results of the project have been remarkable, and general public usability is not very far off.
They have now provided support for Jupyter notebooks https://github.com/google/swift-jupyter
There is also an interesting blog post here fast.ai.
IBM also seem to be using Swift https://developer.ibm.com/technologies/swift/ and are highlighting leveraging Watson.
Developers can take advantage of the Watson Developer Cloud’s Swift SDK to easily build Watson-powered applications for iOS or Linux platforms. Leverage the power of Watson’s advanced artificial intelligence, machine learning, and deep learning techniques to understand unstructured data and engage with users in new ways.
Since Swift is a relatively new language it is worth looking at the ongoing evolution.
Jupyter notebook to access IBM RXN AI-assisted retrosynthesis
A python wrapper for the IBM RXN api has been released, available on GitHub https://github.com/rxn4chemistry/rxn4chemistry
To install
pip install rxn4chemistry
You will need to register and get an api key from here https://rxn.res.ibm.com/rxn/user/profile.
This demo shows how to use for retrosynthesis ideas.
The page also includes links to download the notebook.
Jupyter notebook to access IBM RXN API
A python wrapper for the IBM RXN api has been released, available on GitHub https://github.com/rxn4chemistry/rxn4chemistry
To install
pip install rxn4chemistry
You will need to register and get an api key from here https://rxn.res.ibm.com/rxn/user/profile.
Simple demo using Jupyter Notebook
This is going to be very useful.
Interactive plots in Jupyter Notebooks updated
I've been using Jupyter notebooks for a while for a wide variety of projects.
I've been looking at ways to produce interactive plots within a Jupyter notebook and, after trying a couple of options, I can now produce interactive data frames, in addition to 2D and 3D scatterplots that include structures on tooltips.
Full review and the Jupyter notebook are here.
Interactive plots in Jupyter notebooks
I've been looking at ways to produce interactive plots within a Jupyter notebook and after trying a couple of options I used Plotly. This seems fairly straight-forward to use and I can produce interactive data frames, in addition to 2D and 3D scatterplots.
More details are shown here together with the jupyter notebook. It is very much a work in progress and suggestions are welcome. In particular, whilst I can get text to appear when hovering over a data point I'd be interested in ideas of how to get the structure displayed when you mouse over a point.
Modin for distributed Pandas calculations
Modin is a library designed to accelerate Pandas by automatically distributing the computation across all of the system’s available CPU cores. Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. Modin is a DataFrame designed for datasets from 1MB to 1TB+
It can be installed using pip
pip install modin
If you don't have Ray or Dask installed, you will need to install Modin with one of the targets:
pip install modin[ray] # Install Modin dependencies and Ray to run on Ray
pip install modin[dask] # Install Modin dependencies and Dask to run on Dask
pip install modin[all] # Install all of the above
Currently, Modin depends on pandas version 0.23.4.
I've added Modin to the Open Source Data Science Python Libraries.
Determining the Amino Acids in a collection of peptides
I've recently become interested in comparing the amino-acid composition of peptides, for example cyclic versus linear peptides, or brain penetrant versus non-penetrant. I had a look around but could not find any tools that did this; in particular I wanted to include any non-proteinogenic amino-acids.
This tutorial provides a means to analyse many thousands of peptides using Vortex.
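As a rough illustration of the kind of comparison described above, here is a minimal sketch in plain Python. It is not the Vortex script from the tutorial. It assumes each peptide is already available as a list of monomer names, so that non-proteinogenic residues are counted like any other; the function names and the example peptides are hypothetical.

```python
from collections import Counter


def composition(peptide):
    """Return the fraction of each monomer in one peptide.

    `peptide` is a list of monomer names, so non-standard residues
    (for example "Orn" or "meA") are counted like any other.
    """
    counts = Counter(peptide)
    total = sum(counts.values())
    return {monomer: n / total for monomer, n in counts.items()}


def mean_composition(peptides):
    """Average the per-peptide fractions over a collection of peptides."""
    totals = Counter()
    for peptide in peptides:
        totals.update(composition(peptide))
    return {monomer: value / len(peptides) for monomer, value in totals.items()}


# Hypothetical example data, not taken from the tutorial.
cyclic = [["A", "G", "Orn", "G"], ["G", "G", "L", "meA"]]
linear = [["A", "A", "K", "L"], ["K", "L", "L", "S"]]

cyclic_mean = mean_composition(cyclic)
linear_mean = mean_composition(linear)

# Print the two mean compositions side by side, one monomer per line.
for monomer in sorted(set(cyclic_mean) | set(linear_mean)):
    print(monomer, cyclic_mean.get(monomer, 0.0), linear_mean.get(monomer, 0.0))
```

Averaging per-peptide fractions rather than pooling raw counts stops a few very long peptides from dominating the comparison; pooling would be an equally reasonable choice.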
Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks
As a regular Jupyter/Python user this publication (PLoS Comput Biol 15(7): e1007007) DOI is a great reminder of good practice, and as Jupyter becomes increasingly popular as a means to share code/data/results writing the notebook in a manner that helps readers is increasingly important.
This ability to combine executable code and descriptive text in a single document has close ties to Knuth’s notion of “literate programming” and has convinced many researchers to switch to computational notebooks from other programming environments. Jupyter Notebooks in particular have seen widespread adoption: as of December 2018, there were more than 3 million Jupyter Notebooks shared publicly on GitHub (https://www.github.com), many of which document academic research.
There are of course many different ways to share Jupyter notebooks.
Whether you use notebooks to track preliminary analyses, to present polished results to collaborators, as finely tuned pipelines for recurring analyses, or for all of the above, following this advice will help you write and share analyses that are easier to read, run, and explore.
An interactive RDKit widget for Jupyter: a first pass
This looks like it could be very interesting.
A blog post by Greg Landrum describes a widget for displaying molecules in a Jupyter Notebook where you can select atoms, with the selection propagated to Python.
This is basic, but I think it's a decent start towards something that could be really useful. Interested? Have suggestions (ideally accompanied by code!) on how to improve it? If it looks like this is actually likely to be used, I will figure out how to create a standalone nbwidget out of this and create a separate github repo for it.
Looks like a useful tool for selecting bonds for conformational analysis, selecting bonds for creating a Ramachandran plot, selecting groups for bioisosteric replacement……
Sounds like Greg is looking for input.
Jupyter notebook to look at molecular similarity
I was recently asked for a tool to compare the similarity of a list of molecules with every other molecule in the list. I suspect there may be commercial tools to do this but for small numbers of compounds it is easy to visualise in a Jupyter notebook using RDKit.
Read more here, MolecularSimilarityNotebook
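To give a flavour of the all-against-all comparison, here is a minimal sketch that does not use RDKit and is not the code from the notebook. It assumes each molecule's fingerprint is already available as a Python set of on-bits and that similarity is measured with the Tanimoto coefficient; both are assumptions made for illustration, and the example bit sets are made up.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(fingerprints):
    """All-against-all similarity as a list of rows (symmetric, diagonal 1.0)."""
    n = len(fingerprints)
    matrix = [[1.0] * n for _ in range(n)]
    # Each pair is computed once and written to both halves of the matrix.
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return matrix


# Hypothetical on-bit sets standing in for real fingerprints.
fps = [{1, 2, 3, 4}, {2, 3, 4, 5}, {7, 8}]
matrix = similarity_matrix(fps)
for row in matrix:
    print(["%.2f" % value for value in row])
```

For n molecules this needs n(n-1)/2 comparisons, which is why the approach suits small lists of compounds.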
Extending Jupyter
I'm a great fan of Jupyter notebooks and I'm always looking for ways to get more out of them. I came across this blog post recently which is packed with useful tips
99 ways to extend the Jupyter ecosystem
Whenever someone says ‘You can do that with an extension’ in the Jupyter ecosystem, it is often not clear what kind of extension they are talking about. The Jupyter ecosystem is very modular and extensible, so there are lots of ways to extend it. This blog post aims to provide a quick summary of the most common ways to extend Jupyter, and links to help you explore the extension ecosystem.
I've also published some notebooks under Tips and Tutorials, Jupyter notebooks
Jupyter notebook to create Wordcloud of tweets
I've often wanted to try creating a word cloud and when Noel O'Boyle collected together all the tweets from the Sheffield Conf on Chemoinformatics this seemed a good opportunity.
Relive the Sheffield Conf on Chemoinformatics with these #shef2019 tweets I've pulled down from Twitter, link to tweet.
The Jupyter notebook used to create the word cloud is here, it uses the excellent word cloud generator word_cloud. You will need to download the text from the tweets from the link provided in the tweet.
Binder news
If you use Binder to serve your Jupyter notebooks you will be interested in this.
Have a repository full of Jupyter notebooks? With Binder, open those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere
We flipped the switch on making mybinder.org a federation. This means that there are now two clusters that serve requests for mybinder.org. What changes for you as a user? Hopefully nothing. You will notice that if you visit mybinder.org (or any other link to it) you will be redirected to gke.mybinder.org or ovh.mybinder.org. Beyond that small change everything should keep working as before
This should mean that Binder becomes more robust and not susceptible to outages. Now this is in place it should also be possible to add further server resources.
End of the line for Python 2
Just a reminder that support for Python 2.7 will end on Jan 1 2020 (there will be no 2.8). All major scientific packages now support Python 3.x and there will be no further updates to the Python 2.x versions.
An increasing number of projects have pledged to drop support for Python 2.7 no later than 2020; these include pandas, RDKit, IPython, Matplotlib, NumPy, SciPy, BioPython, Psi4, scikit-learn, TensorFlow, Jupyter notebook and many more.
Time to update those old scripts and Jupyter notebooks.
CGRtools: Python Library for Molecule, Reaction and Condensed Graph of Reaction Processing
CGRtools is a set of tools for processing of reactions based on Condensed Graph of Reaction (CGR) approach, details on Github https://github.com/cimm-kzn/CGRtools. Published in JCIM DOI
Basic operations:
- Read /write /convert formats MDL .RDF and .SDF, SMILES, .MRV
- Standardize reactions and valid structures checker.
- Produce CGRs.
- Perform subgraph search.
- Build /correct molecules and reactions.
- Produce template based reactions.
Stable versions are available through PyPI
pip install CGRtools
Install the DEV version of the CGRtools library for features that are not yet well tested
pip install -U git+https://github.com/cimm-kzn/CGRtools.git@master#egg=CGRtools
There is also a tutorial using Jupyter notebook https://github.com/cimm-kzn/CGRtools/tree/master/tutorial
HELM notation in Jupyter Notebook
I was recently asked for a way to visualise HELM notation
HELM (Hierarchical Editing Language for Macromolecules) enables the representation of a wide range of biomolecules such as proteins, nucleotides, antibody drug conjugates etc. whose size and complexity render existing small-molecule and sequence-based informatics methodologies impractical or unusable.
The RDKit provides limited support for HELM notation (currently peptide) and a simple Jupyter Notebook provides an easy interface as shown here
Using the Python 3 library fpsim2 for similarity searches
FPSim2 is a new tool for fast similarity search on big compound datasets (>100 million) being developed at ChEMBL. It was developed as a Python3 library to support either in memory or out-of-core fast similarity searches on such dataset sizes.
It is built using RDKit and can be installed using conda. It requires Python 3.6 and a recent version of RDKit.
I've written a couple of Jupyter notebooks to demonstrate its use.
You can read the full tutorial here, and download the notebooks.
Comparison of bioactivity predictions
Small molecules can potentially bind to a variety of biomolecular targets and whilst counter-screening against a wide variety of targets is feasible it can be rather expensive and is probably only realistic once a compound has been identified as of particular interest. For this reason there is considerable interest in building computational models to predict potential interactions. With the advent of large data sets of well annotated biological activity such as ChEMBL and BindingDB this has become possible.
ChEMBL 24 contains 15,207,914 activity data on 12,091 targets, 2,275,906 compounds, BindingDB contains 1,454,892 binding data, for 7,082 protein targets and 652,068 small molecules.
These predictions may aid understanding of the molecular mechanisms underlying a molecule's bioactivity and help predict potential side effects or cross-reactivity.
Whilst there are a number of sites that can be used to predict bioactivity data I'm going to compare one site, Polypharmacology Browser 2 (PPB2) http://ppb2.gdb.tools with two tools that can be downloaded to run the predictions locally: one based on Jupyter notebook models built by the ChEMBL group using ChEMBL data https://github.com/madgpap/notebooks/blob/master/targetpred21_demo.ipynb and a more recent random forest model, PIDGIN. If you are using proprietary molecules it is unwise to use the online tools.
A Jupyter Kernel for Swift
I'm constantly impressed by the expansion of Jupyter; it is rapidly becoming the first-choice platform for interactive computing.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
The latest expansion is a Jupyter Kernel for Swift, intended to make it possible to use Jupyter with the Swift for TensorFlow project.
Swift for TensorFlow is a new way to develop machine learning models. It gives you the power of TensorFlow directly integrated into the Swift programming language. With Swift, you can write the following imperative code, and Swift automatically turns it into a single TensorFlow Graph and runs it with the full performance of TensorFlow Sessions on CPU, GPU and TPU.
Requires macOS 10.13.5 or later, with Xcode 10.0 beta or later
Most popular Python IDE, Editors
I always keep an eye out for the polls on KDnuggets, the latest one looks at Python editors or IDEs, over 1900 people took part and the results are shown below (users could select up to 3). There is more detail in the linked page.
I've become a great fan of Jupyter, and not only for Python.
Embedding LaTeX and MathML in Jupyter Notebooks
I've been using Jupyter notebooks for a little while but I only just recently found out that you can embed LaTeX or MathML into a notebook!
This notebook is just a series of examples of what can be done. You can embed equations inline or on a separate line in a markdown text cell, or in a code cell by importing Math or invoking latex.
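In a markdown cell an inline equation is written between single dollar signs and a displayed one between double dollar signs. For code cells, the Math mentioned above is, as far as I know, `IPython.display.Math`. The sketch below shows a related route that needs no imports: Jupyter will typeset any object that defines a `_repr_latex_` method. The `Equation` class is a hypothetical helper written for this illustration and is not taken from the notebook.

```python
class Equation:
    """Hypothetical helper: wraps LaTeX source so Jupyter renders it."""

    def __init__(self, source):
        self.source = source

    def _repr_latex_(self):
        # Jupyter calls this method and typesets the returned string.
        return f"$${self.source}$$"


# As the last expression in a code cell this displays the typeset equation.
Equation(r"\Delta G = -RT \ln K")
```

Outside a notebook the object has no special behaviour, so the same code can sit in a script without pulling in IPython.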
Deep Replay
This looks rather neat, Deep Replay
Deep Replay is a package designed to allow you to replay in a visual fashion the training process of a Deep Learning model in Keras.
To install Deep Replay just type:
pip install deepreplay
ChEMBL 24 predictive models
Recently ChEMBL was updated to version 24; the update contains:
- 2,275,906 compound records
- 1,828,820 compounds (of which 1,820,035 have mol files)
- 15,207,914 activities
- 1,060,283 assays
- 12,091 targets
- 69,861 documents
In addition, today they released the predictive models built on the updated database; they can be downloaded from the ChEMBL ftp server ftp://ftp.ebi.ac.uk/pub/databases/chembl/target_predictions
There are 1569 models.
Accessing a Jupyter Notebook hERG model from Vortex
A recent paper "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI described a classification model for hERG activity. I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub https://github.com/AGPreissner/Publications.
The models were downloaded and the Random Forest Jupyter Notebooks (using RDKit) modified to save the generated model using pickle to store the predictive model, and then another Jupyter notebook was created to access the model without the need to rebuild the model each time. This notebook was exported as a python script to allow command line access, and Vortex scripts created that allow the user to run the model within Vortex and import the results and view the most significant features.
All models and scripts are available for download.
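The save-once, load-many pattern described above can be sketched with the standard library alone. This is an illustration and not the code from the downloadable notebooks: the "model" here is a hypothetical dictionary holding a threshold, standing in for the fitted Random Forest, and the file name is made up.

```python
import pickle
import tempfile
from pathlib import Path


def predict(model, values):
    """Classify each value as 1 (active) or 0 using the stored threshold."""
    return [int(value >= model["threshold"]) for value in values]


# Hypothetical stand-in for a trained model: in practice this would be
# the fitted Random Forest object.
model = {"description": "toy classifier", "threshold": 0.5}

# Save once, after training.
model_path = Path(tempfile.gettempdir()) / "toy_model.pkl"
with open(model_path, "wb") as handle:
    pickle.dump(model, handle)

# Later (for example in a second notebook or a command-line script),
# load the stored model instead of rebuilding it.
with open(model_path, "rb") as handle:
    restored = pickle.load(handle)

print(predict(restored, [0.2, 0.7]))
```

Unpickling can execute arbitrary code, so only load pickle files from a source you trust.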
Jupyter and Fortran
Well, after my last post about Swift and Jupyter a reader sent me a link to the use of both Julia and Fortran programming languages in a Jupyter Notebook.
More information in this lecture Project Jupyter: Architecture and Evolution of an Open Platform for Modern Data Science by Fernando Perez.
Project Jupyter, evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to provide tools for human-in-the-loop interactive computing. It provides protocols, file formats, libraries and user-facing tools optimized for the task of humans interactively exploring problems with the aid of a computer, combining natural and programming languages in a common computational narrative.
Swift 4.1 in a Jupyter Notebook
I'm a great fan of Jupyter Notebooks but I only ever use Python.
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text
A recent post by Ray Yamamoto Hilton caught my eye; he put together a little experiment to demonstrate using Swift 4.1 from within Jupyter Notebooks.
You can download a demo notebook here.
Downloading from the RCSB Protein Data Bank using Python
The RCSB Protein Data Bank is an absolutely invaluable resource that provides archive-information about the 3D shapes of proteins, nucleic acids, and complex assemblies that helps scientists understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease. Currently the PDB contains over 134,000 data files containing structural information on 42547 distinct protein sequences of which 37600 are human sequences. They also provide a series of tools to search, view and analyse the data.
Downloading an individual PDB file is pretty trivial and can be done from the web page as shown in the image below. They also provide a Download Tool launched as a stand-alone application using the Java Web Start protocol. The tool is downloaded locally and must then be opened. I've found this a little temperamental and had issues with Java versions and security settings.
Since I've been making extensive use of the web services to interact with RCSB I decided to explore the use of Python to download multiple files. I started off creating a Jupyter notebook using the web services provided by RCSB.
I've also used variations on this code to create a python script and a Vortex script.
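For readers who want a starting point, here is a minimal sketch of downloading several entries with the standard library. It is not the code from the notebook. It assumes that files can be fetched from https://files.rcsb.org/download/ by PDB ID; the function names and the default folder name are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed base URL for RCSB file downloads.
BASE_URL = "https://files.rcsb.org/download"


def pdb_url(pdb_id):
    """Build the download URL for one PDB entry, e.g. '4hhb' -> .../4HHB.pdb."""
    return f"{BASE_URL}/{pdb_id.strip().upper()}.pdb"


def download_pdb_files(pdb_ids, folder="pdb_files"):
    """Download each entry into `folder` and return the saved paths."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for pdb_id in pdb_ids:
        destination = target / f"{pdb_id.strip().upper()}.pdb"
        urlretrieve(pdb_url(pdb_id), destination)
        saved.append(destination)
    return saved


# Example call (needs network access):
# download_pdb_files(["4HHB", "1CRN"])
```

For a long list of IDs it would be sensible to add a short pause between requests and to catch download errors so that one missing entry does not stop the whole run.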
Accessing Jupyter Notebook model from Vortex
Chemical Drawing Programs – The Comparison of Accelrys (Symyx) Draw, ChemDraw, DrawIt, ACD/ChemSketch, ChemDoodle and Chemistry 4-D Draw
http://dragon.unideb.hu/~gundat/rajzprogramok/dprog.html
There is also a comparison of six chemical drawing packages here