Macs in Chemistry

Insanely Great Science

An invitation and a recipe

 

Around this time last year RSC CICAG held a meeting to discuss 20 years of the rule of 5. The meeting involved presentations but also less formal panel discussions and informal chats. This proved to be really popular, and CICAG had planned to hold another end-of-year meeting with the aim of fostering more informal interactions. However, the effort involved in converting the AI in Chemistry and the Open Chemical Science meetings from physical to virtual meetings meant that we had to postpone the end-of-year meeting until next year.

In its place we have organised an impromptu webinar looking back at the history of subjects at the core of CICAG's interests.

Andrew Dalke has kindly agreed to talk about the history of cheminformatics from punch cards to the present day.

John Overington will also talk about the history of ChEMBL, an absolutely invaluable open-access database that we now take for granted.

Our aim is to keep this informal, with plenty of time to ask questions and reminisce, so stock up on mince pies and mulled wine, click the link below and settle back for a fascinating hour or two.

You are invited to a Zoom webinar.

When: Dec 22, 2020 04:00 PM London

Topic: Looking back at Cheminformatics and ChEMBL

Register in advance for this webinar: https://us02web.zoom.us/webinar/register/WN_-bWCQe32RW6ZKVEyb8qq4A

After registering, you will receive a confirmation email containing information about joining the webinar.

I promised a recipe.

Mulled Wine

Whilst you can buy bottles of mulled wine I think making your own gives better results (we are chemists after all).

Ingredients

A bottle of inexpensive red wine
2 oranges
2 cinnamon sticks
4 Cloves
2 Star anise
30 g Sugar (or more to taste)

Pour the red wine into a saucepan and add the cinnamon sticks (you can use ground cinnamon), cloves and star anise. Then add the zest from one of the oranges plus the orange juice, together with the sugar. Heat gently to dissolve the sugar and then simmer on a low heat for 10 mins.

Serve whilst warm and add orange slices to decorate.

You can add a little brandy or Grand Marnier to give a little extra kick.




Workshop on Open-Source Tools for Chemistry

 

Just a couple of notes on software to install before the event, for those attending the free online Workshop on Open-Source Tools for Chemistry, 9-13 November 2020.

Monday 13-30 to 15-30 Cheminformatics and Data Analysis using DataWarrior (Isabelle Giraud)

DataWarrior can be downloaded from here http://www.openmolecules.org/datawarrior/download.html

The training files can all be downloaded from here

Monday 16-00 to 18-00 Molecular visualisation using PyMOL (Garrett Morris)

Software to install:

PyMOL via Conda:
Conda: https://www.anaconda.com/distribution/ or Miniconda: https://docs.conda.io/en/latest/miniconda.html
https://anaconda.org/psi4/pymol or https://omicx.cc/2019/05/26/install-pymol-windows/

PyMOL via MacPorts:
http://www.ub.edu/cbdd/?q=content/installing-pymol-macports
% sudo port install tcl -corefoundation
% sudo port install tk -quartz
% sudo port install pymol

PyMOL from GitHub: https://github.com/schrodinger/pymol-open-source

Tuesday 11-00 to 13-00 Chemistry in the cloud: leveraging Google Colab for quantum chemistry (Jan Jensen)

Participants should download Chrome and have a Google account
Participants should make sure they can access this page: https://bit.ly/37fIYbp.
Some basic degree of Python proficiency is required for the course

It would be great if participants could fill out this survey https://forms.gle/pjwsnJTb4X6QpiHK9 early enough to help me design the course

Wednesday 13-30 to 15-30 Accessing biological and chemical data in ChEMBL (Anna Gaulton)

Requires a modern web-browser (with javascript not blocked) such as Chrome/Safari

Thursday 16-00 to 18-00 Fragment based screening, XChem at Diamond (Rachael Skyner)

Requires the Chrome web browser. If there is time, Rachael would like to give an introduction to the new Python API. We can go through the installation at the workshop, but you must have Anaconda installed.

Friday 11-00 to 13-00 An introduction to KNIME workflows (Greg Landrum)

KNIME can be downloaded here https://www.knime.com/downloads

Registration

This event will be free to attend but registration is required.

More details and registration can be found here https://www.rsc.org/events/detail/43180/workshop-on-open-source-tools-for-chemistry.

Last Updated 28 October 2020


Workshop on Open-Source Tools for Chemistry

 

All scientists working in chemistry need software tools for accessing, handling and storing chemical information, or performing molecular modelling and computational chemistry. There is now a wealth of open-source tools to help in these activities; however, many are not as well-known as commercial offerings. This workshop offers a unique opportunity for attendees to try out a range of open-source software packages for themselves with expert tuition in different aspects of chemistry.


The software packages will be presented over six two-hour sessions as follows:

09 November: 13.30 - 15.30  Cheminformatics and data analysis using DataWarrior (Isabelle Giraud)

09 November: 16.00 - 18.00  Molecular visualization using PyMOL (Garrett M Morris)

10 November: 11.00 - 13.00  Chemistry in the cloud: leveraging Google Colab for quantum chemistry  (Jan Jensen)

11 November: 13.30 - 15.30  Accessing biological and chemical data in ChEMBL (Anna Gaulton)

12 November: 16.00 - 18.00  Fragment-based screening, XChem at Diamond (Rachael Skyner)

13 November: 11.00 - 13.00  Interactive and automated chemical data analysis with KNIME (Greg Landrum)

Registration

This event will be free to attend but registration is required.

More details and registration can be found here https://www.rsc.org/events/detail/43180/workshop-on-open-source-tools-for-chemistry.



ChEMBL Compound Curation Pipeline

 

With the imminent release of ChEMBL 26 I was interested to hear about the new chemical curation pipeline that had been developed.

The pipeline includes three functions:

  1. Check: identifies and validates problem structures before they are added to the database

  2. Standardize: standardises chemical structures according to a set of predefined ChEMBL business rules

  3. GetParent: generates parent structures of multi-component compounds based on a set of rules and a defined list of salts and solvents

The code is all on GitHub https://github.com/chembl/ChEMBLStructurePipeline and notebooks are available.
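The three functions can be chained in a few lines. The following is only a sketch: it assumes the chembl_structure_pipeline package (and RDKit) is installed, the function names are those I understand the project's README to use, and the wrapper name curate is made up for illustration.

```python
# Sketch only: needs the chembl_structure_pipeline package, which needs RDKit.
try:
    from chembl_structure_pipeline import checker, standardizer
except ImportError:  # keeps the file importable where the package is absent
    checker = standardizer = None


def curate(molblock: str) -> dict:
    """Run Check, Standardize and GetParent on one MDL molblock string.

    `curate` is a hypothetical convenience wrapper, not part of the pipeline.
    """
    if checker is None:
        raise RuntimeError("chembl_structure_pipeline is not installed")
    # Check: returns the problems found, each with a penalty score.
    issues = checker.check_molblock(molblock)
    # Standardize: applies the ChEMBL standardisation rules.
    std_molblock = standardizer.standardize_molblock(molblock)
    # GetParent: strips salts and solvents; the second value is an exclusion flag.
    parent_molblock, exclude = standardizer.get_parent_molblock(std_molblock)
    return {
        "issues": issues,
        "standardized": std_molblock,
        "parent": parent_molblock,
        "exclude": exclude,
    }
```

Calling curate on the text of a mol file returns the three results together, so the issues can be inspected before deciding whether to keep the standardised or parent structure.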



Comparison of bioactivity predictions

 

Small molecules can potentially bind to a variety of biomolecular targets, and whilst counter-screening against a wide variety of targets is feasible, it can be rather expensive and is probably only realistic once a compound has been identified as being of particular interest. For this reason there is considerable interest in building computational models to predict potential interactions. With the advent of large data sets of well-annotated biological activity such as ChEMBL and BindingDB this has become possible.

ChEMBL 24 contains 15,207,914 activity data points on 12,091 targets and 2,275,906 compounds; BindingDB contains 1,454,892 binding data points for 7,082 protein targets and 652,068 small molecules.

These predictions may aid understanding of the molecular mechanisms underlying a molecule's bioactivity and help predict potential side effects or cross-reactivity.

Whilst there are a number of sites that can be used to predict bioactivity data, I'm going to compare one site, Polypharmacology Browser 2 (PPB2) http://ppb2.gdb.tools, with two tools that can be downloaded to run the predictions locally: one based on Jupyter notebooks, using models built from ChEMBL data by the ChEMBL group https://github.com/madgpap/notebooks/blob/master/targetpred21_demo.ipynb, and a more recent random forest model, PIDGIN. If you are using proprietary molecules it is unwise to use the online tools.

Read the article here


GuacaMol, benchmarking models.

 

Comparison of different algorithms is an under-researched area, and this publication looks like a useful starting point.

GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.
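To make "the ability to generate novel molecules" a little more concrete, here is a toy sketch of two simple scores in that spirit. It is not GuacaMol's code: it compares raw SMILES strings, whereas a real benchmark would canonicalise and validate the structures first (for example with RDKit), and the function names and the example molecules are my own.

```python
def uniqueness(generated):
    """Fraction of generated strings that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated strings that are absent from the training set."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training)) / len(unique)


# Tiny made-up example: raw strings are compared as-is (an assumption of this sketch).
training_set = ["CCO", "CCN", "c1ccccc1"]
samples = ["CCO", "CCO", "CCC", "CCCl"]

print(uniqueness(samples))             # 3 distinct strings out of 4 -> 0.75
print(novelty(samples, training_set))  # 2 of the 3 distinct strings are new
```

A generator that simply memorises its training set would score well on uniqueness and badly on novelty, which is why the two are reported separately.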

Source code : https://github.com/BenevolentAI/guacamol.

The easiest way to install guacamol is with pip:

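pip install git+https://github.com/BenevolentAI/guacamol.git#egg=guacamol --process-dependency-links

Note that pip 19.0 and later no longer accept the --process-dependency-links option, so with a recent pip the flag needs to be dropped.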

guacamol requires the RDKit library (version 2018.09.1.0 or newer).
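If you are not sure whether an existing RDKit install is recent enough, the version string can be compared numerically. This is a small sketch using only the standard library; the helper names are my own, and it assumes RDKit version strings are dot-separated numbers (possibly with a suffix such as b1), as in 2018.09.1.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn '2018.09.1.0' into (2018, 9, 1, 0), ignoring suffixes such as 'b1'."""
    parts = []
    for field in version.split("."):
        match = re.match(r"\d+", field)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "2018.09.1.0") -> bool:
    """Compare two version strings field by field, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("2018.09.1"))  # same release as the minimum -> True
print(meets_minimum("2018.03.4"))  # older release -> False
```

With RDKit installed, the installed version is available as rdkit.__version__, so meets_minimum(rdkit.__version__) performs the check.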



New ChEMBL interface

 

I've just been having a look at the new ChEMBL interface, and I quite like the easy way to embed records into web pages:

<object data="https://www.ebi.ac.uk/chembl/beta/embed/#mini_report_card/Compound/CHEMBL1471" width="100%" height="300"></object>

and it is displayed as shown below.

I will be doing some more investigations later this week.
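If you want to embed several records, the tag can be generated rather than typed by hand. This is a minimal sketch that simply reuses the URL pattern from the snippet in this post; the function name and defaults are my own, and the beta address may well change.

```python
from html import escape

# URL pattern copied from the embed snippet in this post (a beta address, so it may change).
EMBED_BASE = "https://www.ebi.ac.uk/chembl/beta/embed/#mini_report_card/Compound/"


def embed_tag(chembl_id: str, width: str = "100%", height: str = "300") -> str:
    """Build the <object> tag that embeds a compound mini report card."""
    url = EMBED_BASE + chembl_id
    return (
        f'<object data="{escape(url, quote=True)}" '
        f'width="{escape(width, quote=True)}" '
        f'height="{escape(height, quote=True)}"></object>'
    )


for compound in ["CHEMBL1471"]:
    print(embed_tag(compound))
```

The attribute values are passed through html.escape so that an identifier read from a file cannot break the markup.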


ChEMBL Models iPython Notebook

 

With the release of ChEMBL 21 has come a set of updated target prediction models.

The good news is that, besides the increase in terms of training data (compounds and targets), the new models were built using the latest stable versions of RDKit (2015.09.2) and scikit-learn (0.17). The latter was upgraded from the much older 0.14 version, which was causing incompatibility issues while trying to use the models.

I've been using the models and I thought I'd share an iPython Notebook I have created. This is based on the ChEMBL notebook with code tidbits taken from the absolutely invaluable Stack Overflow. I'm often in the situation where I actually want to know the predicted activity at specific targets, and specifically want to confirm lack of predicted activity at potential off-targets. I could have a notebook for each target but actually the speed of calculation means that I can calculate all the models and then just cherry pick those of interest.
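The cherry-picking step itself is simple once every model has been run. The sketch below is not the notebook's code: it assumes the predictions have already been collected into a dictionary of target identifier to predicted probability, and the identifiers, scores and the 0.5 threshold are placeholders invented for illustration.

```python
def cherry_pick(predictions, targets_of_interest, off_targets, threshold=0.5):
    """Pull out the targets of interest and flag off-targets predicted active.

    predictions maps a target identifier to a predicted probability of activity.
    """
    wanted = {t: predictions[t] for t in targets_of_interest if t in predictions}
    # Off-targets at or above the threshold contradict the hoped-for
    # "no predicted activity" result, so they are the ones worth a look.
    flagged = {
        t: p for t, p in predictions.items() if t in off_targets and p >= threshold
    }
    return wanted, flagged


# Placeholder identifiers and scores, invented purely to show the call.
scores = {"TARGET_A": 0.91, "TARGET_B": 0.12, "TARGET_C": 0.67, "TARGET_D": 0.03}
wanted, flagged = cherry_pick(scores, ["TARGET_A"], {"TARGET_B", "TARGET_C"})

print(wanted)   # {'TARGET_A': 0.91}
print(flagged)  # {'TARGET_C': 0.67}
```

Keeping the full dictionary around means a new off-target can be checked later without rerunning any of the models.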

Read on…



ChEMBL21 Update

 

With the release of ChEMBL21 we also get updates to the web services.

ChEMBL 21 introduced a few new tables, which are now available via the API. Keyword searching has been improved.

Compound images now have a transparent background by default.

The official Python client library has also been updated to reflect recent changes. It can be installed using pip:

pip install -U chembl_webresource_client
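The web services can also be reached without the client library, since they are plain HTTP. The sketch below uses only the standard library; the base address and the .json suffix reflect my understanding of the current ChEMBL REST interface and should be treated as assumptions, and the helper names are my own.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base address of the ChEMBL REST web services.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def record_url(resource: str, chembl_id: str) -> str:
    """Build the address of a single record, requesting JSON output."""
    return f"{BASE_URL}/{quote(resource)}/{quote(chembl_id)}.json"


def fetch_record(resource: str, chembl_id: str, timeout: float = 30) -> dict:
    """Download and decode one record (needs network access)."""
    with urlopen(record_url(resource, chembl_id), timeout=timeout) as response:
        return json.load(response)


# Building the address needs no network connection.
print(record_url("molecule", "CHEMBL1471"))
```

Calling fetch_record("molecule", "CHEMBL1471") should return the record as a dictionary; the client library wraps the same kind of request and adds conveniences such as filtering.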


ChEMBL 21 released

 

The release of ChEMBL_21 has been announced. This version of the database was prepared on 1st February 2016 and contains:

  • 1,929,473 compound records
  • 1,592,191 compounds (of which 1,583,897 have mol files)
  • 13,968,617 activities
  • 1,212,831 assays
  • 11,019 targets
  • 62,502 source documents


Data can be downloaded from the ChEMBL FTP site or viewed via the ChEMBL interface.

Please see ChEMBL_21 release notes for full details of all changes in this release.



LSH-based similarity search in MongoDB is faster than postgres cartridge

 

There is a great blog article on ChEMBL-og describing their work evaluating chemical-structure-based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.

The article does a great job in explaining the logic behind improving the search performance.

They also provide an iPython notebook so you can try it yourself.
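For a feel of why locality-sensitive hashing (LSH) speeds things up, here is a toy, in-memory sketch of the general MinHash idea. It is not the code from the article and involves no MongoDB: fingerprints are assumed to be non-empty sets of "on" bit positions, and the class name, band and row counts are arbitrary choices of mine.

```python
import random
from collections import defaultdict


def make_permutations(n_hashes, n_bits, seed=0):
    """One random permutation of the bit positions per hash function."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_hashes):
        perm = list(range(n_bits))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash(on_bits, perms):
    """For each permutation, the smallest permuted position of any set bit."""
    return tuple(min(perm[b] for b in on_bits) for perm in perms)


def tanimoto(a, b):
    return len(a & b) / len(a | b)


class LSHIndex:
    """Bucket fingerprints by bands of their MinHash signature."""

    def __init__(self, n_bits, bands=8, rows=2, seed=0):
        self.bands, self.rows = bands, rows
        self.perms = make_permutations(bands * rows, n_bits, seed)
        self.buckets = defaultdict(set)
        self.fingerprints = {}

    def _keys(self, on_bits):
        sig = minhash(on_bits, self.perms)
        for i in range(self.bands):
            yield (i, sig[i * self.rows:(i + 1) * self.rows])

    def add(self, name, on_bits):
        self.fingerprints[name] = frozenset(on_bits)
        for key in self._keys(on_bits):
            self.buckets[key].add(name)

    def query(self, on_bits, threshold=0.7):
        """Score only the candidates sharing a bucket, not the whole collection."""
        query_bits = frozenset(on_bits)
        candidates = set()
        for key in self._keys(query_bits):
            candidates |= self.buckets.get(key, set())
        hits = [(n, tanimoto(query_bits, self.fingerprints[n])) for n in candidates]
        return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


index = LSHIndex(n_bits=64)
index.add("a", {1, 5, 9, 20})
index.add("b", {2, 6, 40, 41})
print(index.query({1, 5, 9, 20}))  # [('a', 1.0)]
```

Similar fingerprints tend to share at least one band and so land in the same bucket, which means the exact Tanimoto score only has to be computed for a short candidate list. The price is that some true neighbours can be missed, and the band and row counts control that trade-off.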


ChEMBL python update.

 

Excellent blog post on the ChEMBL python update.

http://chembl.blogspot.co.uk/2015/07/chembl-python-client-update.html
