Macs in Chemistry

Insanely Great Science

Installing RDKit using Homebrew

 

I just saw this message on the RDKit users message board, which offers a method to install RDKit using Homebrew. I use Anaconda to install RDKit, so I've not tested it.

Recently, I updated the brew install recipe for rdkit on Mac. The biggest change is that boost and boost-python's versions were pinned down, so that the brew install recipe should be much more reproducible than before. Here is a fail-safe way to install rdkit with it (with Python wrappers, and InChI support):

I've added the instructions to the Cheminformatics on a Mac page as an alternative to using Anaconda to install RDKit.

The RDKit is an open source toolkit for cheminformatics, 2D and 3D molecular operations, descriptor generation for machine learning, etc.



Comments

Crowdfunding software development

 

Some time ago I wrote a piece on my thoughts on scientific software development. I got a lot of very positive feedback, and one of the comments, about not knowing which cheminformatics toolkits were available, led me to create a page on open source toolkits. However, this did not really address the underlying problem of how to fund specialist scientific software.

Which is why I was intrigued to hear about Andrew Dalke's efforts to crowdfund the development of open source cheminformatics software.

This is an experiment to see if a crowdfunding consortium can be used to fund the matched molecular pair program “mmpdb”. The deadline to join is 1 February 2020!

The project is mmpdb; the initial work was described in an article in JCIM, "mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets" DOI.

Here we present mmpdb, an open-source matched molecular pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large data sets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit.

Go over to the project page http://mmpdb.dalkescientific.com to find out more and if you can contribute please do, and also please share the link. He will be talking at the RDKit UGM #rdkitugm2019 and the presentation will probably be online later.


Comments

Determining the Amino Acids in a collection of peptides

 

I've recently become interested in comparing the amino-acid composition of peptides, for example cyclic versus linear peptides, or brain-penetrant versus non-penetrant. I had a look around but could not find any tools that did this; in particular I wanted to include any non-proteinogenic amino acids.

This tutorial provides a means to analyse many thousands of peptides using Vortex.
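
The kind of composition count involved can be sketched in a few lines of Python. This is only an illustration, not the Vortex script from the tutorial: it assumes each peptide is written as a simple single-chain HELM string such as PEPTIDE1{A.G.[Orn].K}$$$$ whose monomer symbols contain no dots, and the function names monomers and composition are made up for this example.

```python
import re
from collections import Counter

# Matches the monomer list of one simple HELM peptide chain, e.g. the
# "A.G.[Orn].K" in "PEPTIDE1{A.G.[Orn].K}$$$$" (assumed input format).
_CHAIN = re.compile(r"PEPTIDE\d+\{([^}]*)\}")


def monomers(helm):
    """Return the monomer symbols of the first peptide chain in a HELM string."""
    match = _CHAIN.search(helm)
    if match is None:
        raise ValueError("no PEPTIDE chain found in %r" % helm)
    # Monomers are separated by dots; non-natural ones are wrapped in [ ].
    return [m.strip("[]") for m in match.group(1).split(".") if m]


def composition(helm_strings):
    """Count how often each monomer occurs across a collection of peptides."""
    counts = Counter()
    for helm in helm_strings:
        counts.update(monomers(helm))
    return counts


if __name__ == "__main__":
    # Made-up example peptides containing two non-natural monomers.
    peptides = ["PEPTIDE1{A.G.[Orn].K}$$$$", "PEPTIDE1{G.G.[dF].A}$$$$"]
    for symbol, n in composition(peptides).most_common():
        print(symbol, n)
```

Because the counts are keyed on the monomer symbol rather than on a fixed table of the twenty natural residues, non-proteinogenic amino acids are counted in the same way as the natural ones.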

Comments

OraRdkitCart an Oracle data cartridge

 

OraRdkitCart is an Oracle data cartridge/extensible index to allow substructure and similarity searching using SQL queries on tables which contain indexed chemical structures.

It uses a Java RMI server and RDKit wrappers for chemical structure handling.

The cartridge has been tested on Oracle 12C and Oracle 18C. It would be expected to run on Oracle 19C, but has not yet been tested.

Full details on GitHub https://jones-gareth.github.io/OraRdKitCart/index.html

Comments

Greg Landrum's ACS talk on RDKit

 

I've created a page of open source cheminformatics toolkits here.




Comments

Novartis Open Source tools for Drug Discovery

 

I'm sure most readers of this site are aware of the Open-Source cheminformatics toolkit RDKit that was first developed in Novartis. However I wonder how many are aware of the other Open-Source tools that Novartis have supported.

You can read more about them here

The Novartis Institutes for BioMedical Research (NIBR) is pioneering new informatics tools for drug discovery. We believe in the power of open-sourced, global collaboration for the greater good. Join us to help patients worldwide.

They are available on GitHub here.

They include Habitat, an object management system; OntoBrowser, a tool to manage ontologies and controlled terminologies; YAP, an extensible parallel framework written in Python using OpenMPI libraries; and GridVar, a jQuery plugin that visualises multi-dimensional datasets as layers organised in a row-column format.

Comments

An interactive RDKit widget for Jupyter: a first pass

 

This looks like it could be very interesting.

A blog post by Greg Landrum describes a widget for displaying molecules in a Jupyter Notebook, in which you can select atoms and have the selection propagated to Python.

This is basic, but I think it's a decent start towards something that could be really useful. Interested? Have suggestions (ideally accompanied by code!) on how to improve it? If it looks like this is actually likely to be used, I will figure out how to create a standalone nbwidget out of this and create a separate github repo for it.

Looks like a useful tool for selecting bonds for conformational analysis, selecting bonds for creating a Ramachandran plot, selecting groups for bioisosteric replacement……

Sounds like Greg is looking for input.


Comments

Jupyter notebook to look at molecular similarity

 

I was recently asked for a tool to compare the similarity of a list of molecules with every other molecule in the list. I suspect there may be commercial tools to do this but for small numbers of compounds it is easy to visualise in a Jupyter notebook using RDKit.

Read more here, MolecularSimilarityNotebook
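
The all-against-all comparison itself can be sketched in plain Python. This is not the code from the notebook: it assumes each molecule has already been reduced to a set of "on" fingerprint bits (in practice a toolkit such as RDKit would generate these), and the names tanimoto and similarity_matrix are illustrative only.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two sets of 'on' fingerprint bits."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


def similarity_matrix(fingerprints):
    """Return the symmetric N x N matrix of pairwise Tanimoto similarities."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # every molecule is identical to itself
        for j in range(i + 1, n):
            # Similarity is symmetric, so compute each pair only once.
            matrix[i][j] = matrix[j][i] = tanimoto(fingerprints[i], fingerprints[j])
    return matrix


if __name__ == "__main__":
    # Made-up bit sets standing in for real fingerprints.
    fps = [{1, 2, 3, 4}, {2, 3, 4, 5}, {7, 8}]
    for row in similarity_matrix(fps):
        print(" ".join("%.2f" % value for value in row))
```

The number of comparisons grows with the square of the list length, which is why this approach suits small numbers of compounds.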



Comments

Openforcefield

 

The Open Force Field Initiative is an open source, open science, and open data approach to better force fields. All the code is on GitHub and they also provide highly curated datasets.

The idea is to enable molecular mechanics on small and macromolecules jointly using open and freely available software.

A recent blog post from Peter Schmidtke caught my eye.

Recently a few updates of the openforcefield toolkit came out … a game changer, as you’ll see.

The work investigated whether the 768 fragments from the XChem fragment library at Diamond can be parametrised with the new version of Open Force Field (0.4) and how they behave after a simple minimisation.

In short, all fragments technically pass the parametrisation and minimisation steps, and this was supported by visual inspection.

All the code is on GitHub.


Comments

NextMove open source MolHash

 

MolHash is a command-line application and programming library for generating hashes from molecular structures. This section gives an overview of each of the most useful hash functions in turn. The user should find it straightforward to add additional hash functions, or tweak the existing ones.

The source code is available on GitHub https://github.com/nextmovesoftware/molhash.

CMAKE, RDKit and Boost are required.

There are detailed compilation and installation instructions on GitHub, but when I followed them I got several errors asking where RDKit was, etc.

Fortunately, thanks to Matt, you can now install using conda:

conda install -c mcs07 -c conda-forge molhash

Once installed you can check it is working by typing this in the Terminal

MacPro:username$ molhash -help
usage:  molhash [options] <infile> [<outfile>]
    Use a hyphen for <infile> to read from stdin
options:    
    -a  Process all the molecule (and not just the single largest component)
    -sa Suppress atom stereo
    -sb Suppress bond stereo
    -sh Suppress explicit hydrogens
    -si Suppress isotopes
    -sm Suppress atom maps
    -t  Store titles only
hash type:
    -g   anonymous graph [default]
    -e   element graph
    -s   canonical smiles
    -m   Murcko scaffold
    -mf  molecular formula
    -ab  atom and bond counts
    -dv  degree vector
    -me  mesomer
    -ht  hetatom tautomer
    -hp  hetatom protomer
    -rp  redox-pair
    -ri  regioisomer
    -nq  net charge

An example of usage

MacPro:username$ echo "c1ccccc1C(=O)Cl" | molhash -mf -
C7H5ClO c1ccc(cc1)C(=O)Cl
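
To call molhash from a script, something like the following Python sketch could be used. It assumes, based only on the example above, that each output line holds the hash followed by whitespace and then the structure; the helper names molhash_command, parse_molhash_line and run_molhash are invented here, and the wrapper has not been checked against every option.

```python
import subprocess


def molhash_command(hash_flag="-mf"):
    """Build the argument list; '-' tells molhash to read from stdin."""
    return ["molhash", hash_flag, "-"]


def parse_molhash_line(line):
    """Split one output line into (hash, remainder of the line)."""
    fields = line.strip().split(None, 1)
    if not fields:
        raise ValueError("empty molhash output line")
    remainder = fields[1] if len(fields) > 1 else ""
    return (fields[0], remainder)


def run_molhash(smiles_list, hash_flag="-mf"):
    """Pipe SMILES through molhash (which must be on PATH) and parse the output."""
    result = subprocess.run(
        molhash_command(hash_flag),
        input="\n".join(smiles_list) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if molhash exits with an error
    )
    return [parse_molhash_line(line)
            for line in result.stdout.splitlines() if line.strip()]
```

Calling run_molhash(["c1ccccc1C(=O)Cl"]) would then be the scripted equivalent of the echo command above.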
Comments

End of the line for Python 2

 

Just a reminder that support for Python 2.7 will end on 1 January 2020 (there will be no 2.8). All major scientific packages now support Python 3.x, and there will be no further updates to the Python 2.x versions.

An increasing number of projects have pledged to drop support for Python 2.7 no later than 2020. These include pandas, RDKit, IPython, Matplotlib, NumPy, SciPy, Biopython, Psi4, scikit-learn, TensorFlow, Jupyter notebook and many more.

Time to update those old scripts and Jupyter notebooks.

Comments

HELM notation in Jupyter Notebook

 

I was recently asked for a way to visualise HELM notation

HELM (Hierarchical Editing Language for Macromolecules) enables the representation of a wide range of biomolecules such as proteins, nucleotides, antibody drug conjugates etc. whose size and complexity render existing small-molecule and sequence-based informatics methodologies impractical or unusable.

The RDKit provides limited support for HELM notation (currently peptide) and a simple Jupyter Notebook provides an easy interface as shown here


Comments

A review of alvaDesc

 

alvaDesc is a desktop tool for the calculation of a wide range of molecular descriptors and a number of molecular fingerprints from https://www.alvascience.com. alvaDesc can be used to determine over 5000 different descriptors (the full list is here).

It can be accessed via the command line or via a GUI.


The complete review is here.



Comments

A Quick look at Flare and Python

I recently wrote a review of Flare Version 2, a recent extension to the Cresset portfolio that introduces Electrostatic Complementarity (EC), i.e. a comparison of the electrostatics of the small-molecule ligand and the target protein. In addition, Flare version 2 includes a new Python API that allows users to automate tasks by scripting and to integrate with other Python packages, such as the RDKit cheminformatics toolkit and Python modules for graphing and statistics (NumPy, SciPy, Matplotlib), as well as Jupyter notebook integration. It is this aspect of Flare that is the subject of this review.


Comments

Chembience updated

 

Update to RDKit 2018.09.2 and Postgres 10.7.

Chembience is a Docker based platform supporting the fast development of chemoinformatics-centric web applications and microservices. It creates a clean separation between your scientific web service implementation and any host-specific or infrastructure-related configuration requirements.


Comments

Update to MayaChemTools

 

I just heard that the following command line scripts, available as part of the MayaChemTools package, now implement multiprocessing functionality.

o RDKitCalculateMolecularDescriptors.py

o RDKitCalculatePartialCharges.py

o RDKitGenerateConformers.py

o RDKitFilterChEMBLAlerts.py

o RDKitFilterPAINS.py

o RDKitPerformMinimization.py

o RDKitRemoveSalts.py

o RDKitSearchSMARTS.py


Comments

New release of MayaChemTools

 

A new release of MayaChemTools is now available. It comprises a fantastic collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors and partial charges
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Performing R group decomposition
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules and salts from molecules

The command line Python scripts based on PyMOL provide functionality for the following tasks:

  • Aligning macromolecules
  • Splitting macromolecules into chains and ligands
  • Listing information about macromolecules
  • Calculation of physicochemical properties
  • Comparison of macromolecules based on RMSD
  • Conversion between different ligand file formats
  • Mutating amino acids and nucleic acids
  • Generating Ramachandran plots
  • Visualizing X-ray electron density and cryo-EM density
  • Visualizing macromolecules in terms of chains, ligands, and ligand binding pockets
  • Visualizing cavities and pockets in macromolecules
  • Visualizing macromolecular interfaces
  • Visualizing surface and buried residues in macromolecules

Comments

GuacaMol, benchmarking models.

 

Comparison of different algorithms is an under-researched area; this publication looks like a useful starting point.

GuacaMol: Benchmarking Models for De Novo Molecular Design

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies to well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multi-objective optimization tasks. The benchmarking framework is available as an open-source Python package.

Source code : https://github.com/BenevolentAI/guacamol.

The easiest way to install guacamol is with pip:

pip install git+https://github.com/BenevolentAI/guacamol.git#egg=guacamol --process-dependency-links

guacamol requires the RDKit library (version 2018.09.1.0 or newer).


Comments

An automated framework for NMR chemical shift calculations of small organic molecules

 

A recent paper in Journal of Cheminformatics describes An automated framework for NMR chemical shift calculations of small organic molecules DOI.

As an alternative, we introduce the in silico Chemical Library Engine (ISiCLE) NMR chemical shift module to accurately and automatically calculate NMR chemical shifts of small organic molecules through use of quantum chemical calculations. ISiCLE performs density functional theory (DFT)-based calculations for predicting chemical properties—specifically NMR chemical shifts in this manuscript—via the open source, high-performance computational chemistry software, NWChem.

ISiCLE is available from GitHub https://github.com/pnnl/isicle or can be installed using Conda (with the required dependencies):

conda create -n isicle -c bioconda -c openbabel -c rdkit -c ambermd python=3.6.1 openbabel rdkit ambertools snakemake numpy pandas yaml statsmodels

In addition, ensure the following third-party software is installed and added to your PATH:

cxcalc (license required from ChemAxon, Marvin)
NWChem http://www.nwchem-sw.org/index.php/Download.
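
A quick way to confirm that both tools are visible is a PATH check from Python. This sketch is not part of ISiCLE; the executable names cxcalc and nwchem are assumptions about how the tools are installed on your system, and missing_tools is a name made up for this example.

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your installation.
    missing = missing_tools(["cxcalc", "nwchem"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Run it from inside the activated isicle environment so that the same PATH is checked as the one the workflow will see.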

ISiCLE is implemented using the Snakemake workflow management system, enabling scalability, portability, provenance, fault tolerance, and automatic job restarting. Snakemake provides a readable Python-based workflow definition language and execution environment that scales, without modification, from single-core workstations to compute clusters through as-available job queuing based on a task dependency graph.

There are more details on Snakemake here.

I've added ISiCLE to the Spectroscopy Page.


Comments

How to contribute to RDKit

 

I just noticed that Greg Landrum has posted a page on how to contribute to RDKit. https://github.com/rdkit/rdkit/wiki/HowToContribute.

There are many ways to contribute: you don't have to be a Python or C++ developer. Simply being an active user, asking questions and contributing solutions helps other users. Improving the documentation is always a great place for newcomers to start, particularly by highlighting things that are not as clear as they could be.

I've also added the link to the Toolkits page.


Comments

Install RDKit using Conda

 

Just highlighted on the RDKit email list, you can install RDKit using conda.

https://anaconda.org/conda-forge/rdkit

RDKit is a collection of cheminformatics and machine-learning software written in C++ and Python.

Other cheminformatics toolkits are described here, and how to install a wide range of cheminformatics tools on a Mac is detailed here.


Comments

Installing Cheminformatics packages on a Mac

 

A while back I wrote a very popular page describing how to install a wide variety of cheminformatics packages on a Mac. Since then, changes to Homebrew have meant that a few of the scientific applications are no longer available, so I've decided to rewrite the page to cover installing the missing packages using Anaconda.

I've also included a list of quick demos so you can check everything is working as expected.

Full details are here

Packages include:

  • OpenBabel
  • RDKit
  • cdk
  • chemspot
  • indigo
  • inchi
  • opsin
  • osra
  • pymol
  • oddt

These are in addition to gfortran and a selection of developer tools.

Comments

Chembience

 

Chembience is a Docker based platform intended for the fast development of chemoinformatics-centric web applications and micro-services based on RDKit. It supports a clean separation of your scientific web service implementation work from any infrastructure related configuration requirements.


At its current development stage, Chembience supports three base types of application (App) containers: (1) a Django/Django REST framework-based App container which is specifically suited for the development of web-based Python applications, (2) a Python shell-based App container which allows for the execution of script-based Python applications, and (3) a Jupyter-based App container which lets you run Jupyter notebooks (currently only a Python kernel is supported).


Comments

Updated Conda

 

I've been checking a few things since I updated. One thing that was immediately apparent was that the similarity maps in RDKit are much nicer, as you can see from the output of the HERG prediction.


Feel like I got something for free.


Comments

Accessing a Jupyter Notebook HERG model from Vortex

 

A recent paper "The Catch-22 of Predicting hERG Blockade Using Publicly Accessible Bioactivity Data" DOI described a classification model for HERG activity. I was delighted to see that all the datasets used in the study, including the training and external datasets, and the models generated using these datasets were provided as individual data files (CSV) and Python Jupyter notebooks, respectively, on GitHub (https://github.com/AGPreissner/Publications).

The models were downloaded and the Random Forest Jupyter Notebooks (using RDKit) modified to save the generated model using pickle to store the predictive model, and then another Jupyter notebook was created to access the model without the need to rebuild the model each time. This notebook was exported as a python script to allow command line access, and Vortex scripts created that allow the user to run the model within Vortex and import the results and view the most significant features.
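
The save-once, reload-many-times pattern looks roughly like this in Python. The sketch is not the code from the notebooks: ThresholdModel is a trivial, hypothetical stand-in for the trained Random Forest classifier, and the file name is arbitrary.

```python
import os
import pickle
import tempfile


class ThresholdModel:
    """Stand-in for a trained classifier (hypothetical, for illustration)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Label a value 1 ("active") when it reaches the threshold, else 0.
        return [1 if v >= self.threshold else 0 for v in values]


def save_model(model, path):
    """Serialise the model so it does not have to be rebuilt next time."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Restore a previously saved model from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "herg_model.pkl")
    save_model(ThresholdModel(0.5), path)   # done once, after training
    print(load_model(path).predict([0.2, 0.7]))  # done on every later run
```

A pickled model generally needs the same library versions at load time as were used when it was saved, and pickle files should only be loaded from sources you trust.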

All models and scripts are available for download.

Full details are here…



Comments

How Do You Build and Validate 1500 Models and What Can You Learn from Them?

 

Greg Landrum's ICCS 2018 presentation on slideshare


Comments

Google Summer of Code, Open Chemistry Projects

 

The details of some of the projects taking part in the Google Summer of Code are now online here https://summerofcode.withgoogle.com/organizations/6513013473935360/ under the Open Chemistry header.

Really interesting work includes 3-D coordinate generation, standardising fingerprint APIs, a framework for molecular validation and standardization, and molecular dynamics in Avogadro.

Good luck to all that are taking part!!


Comments

RDKit code changes

 

I just saw this on the RDKit email circulation list and since I know a number of readers use RDKit I thought I'd mention it.

When we do the beta for the 2018.03.1 release we're going to switch the C++ backend to use modern C++ (=C++11). For people who can't switch to use that code, we will continue to provide bug fixes for the 2017.09 release for at least another 6 months.

This should only affect people who need to build the RDKit C++ code themselves. If you use a binary version of the RDKit like the ones available inside of Anaconda Python or KNIME, this change should have no impact upon you.

It looks like we're almost there. Hopefully we will be able to do a beta of the 2018.03 release by the end of the week.


Comments

RDKit in SAMSON

 

I've posted about SAMSON a couple of times and it just keeps getting better and better.

SAMSON is a novel software platform for computational nanoscience. Rapidly build models of nanotubes, proteins, and complex nanosystems. Run interactive simulations to simulate chemical reactions, bend graphene sheets, (un)fold proteins. SAMSON's generic architecture makes it suitable for material science, life science, physics, electronics, chemistry, and even education. SAMSON is developed by the NANO-D group at INRIA, and means "Software for Adaptive Modeling and Simulation Of Nanosystems".

A recent blog post highlights the use of RDKit in SAMSON.

In this post I will present you the RDKit-SMILES Manager module that I integrated in the SAMSON platform. As some of you know, RDKit is an open source toolkit for cheminformatics which is widely used in the bioinformatics research. One of its features is the conversion of molecules from their SMILES code to a 2D and 3D structures. Thanks to the new SAMSON Element, it is now possible to use these features in the SAMSON platform. SMILES code files (.smi) or text files (.txt) containing several SMILES codes can be read using the import button.

The new module allows you to import a file containing SMILES strings, generate 2D depictions, and by right-clicking on these images, you can open, generate the 3D structure in SAMSON or save the image as png or svg.


It is also possible to run substructure searching using SMARTS.


Comments

mmpdb: An Open Source Matched Molecular Pair Platform for Large Multi-Property Datasets

 

An interesting paper on ChemRxiv DOI

Matched Molecular Pair Analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property datasets. Here we present mmpdb, an open source Matched Molecular Pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large datasets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from https://github.com/rdkit/mmpdb


Comments

Google Summer of Code: Open Chemistry

 

There are a number of interesting projects being undertaken in this year's Google Summer of Code.

If you know of any students that might be interested then perhaps point them to the Open Chemistry Project.

The Open Chemistry project is a collection of open source, cross platform libraries and applications for the exploration, analysis and generation of chemical data. The organization is an umbrella of leading projects developed by long-time collaborators and innovators in open chemistry such as the Avogadro, Open Babel, and cclib projects. These three alone have been downloaded over 700,000 times and cited in over 2,000 academic papers. Our goal is to improve the state of the art, and facilitate the open exchange of chemical data and ideas while utilizing the best technologies from quantum chemistry codes, molecular dynamics, informatics, analytics, and visualization.

There is a list of the GSoC Ideas 2018 here but of course students can add their own.


Comments

MayaChem Tools

 

MayaChemTools is a fabulous collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs.

The core set of command line Perl scripts available in the current release of MayaChemTools has no external dependencies and provides functionality for the following tasks:

  • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignments, and PDB files
  • Listing information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
  • Calculation of a key set of physicochemical properties, such as molecular weight, hydrogen bond donors and acceptors, logP, and topological polar surface area
  • Generation of 2D fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs, topological atom triplets, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
  • Generation of 2D fingerprints with atom types corresponding to atomic invariants, DREIDING, E-state, functional class, MMFF94, SLogP, SYBYL, TPSA and UFF
  • Similarity searching and calculation of similarity matrices using available 2D fingerprints
  • Listing properties of elements in the periodic table, amino acids, and nucleic acids
  • Exporting data from relational database tables into text files

The command line Python scripts based on RDKit provide functionality for the following tasks:

  • Calculation of molecular descriptors
  • Comparison of 3D molecules based on RMSD and shape
  • Conversion between different molecular file formats
  • Enumeration of compound libraries and stereoisomers
  • Filtering molecules using SMARTS, PAINS, and names of functional groups
  • Generation of graph and atomic molecular frameworks
  • Generation of images for molecules
  • Performing structure minimization and conformation generation based on distance geometry and forcefields
  • Picking and clustering molecules based on 2D fingerprints and various clustering methodologies
  • Removal of duplicate molecules

These invaluable scripts can be used in other applications; I've written a Vortex script that uses them.


Comments

“Found in Translation”: Predicting Outcomes of Complex Organic Chemistry Reactions

 

An interesting paper uses 1,808,938 reactions from the patent literature as a training set to build a model to predict reactions.

There is an intuitive analogy of an organic chemist's understanding of a compound and a language speaker's understanding of a word. Consequently, it is possible to introduce the basic concepts and analyze potential impacts of linguistic analysis to the world of organic chemistry. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model, trained end-to-end and fully data-driven. We propose a novel way of tokenization, which is arbitrarily extensible with reaction information. With this approach, we demonstrate results superior to the state-of-the-art solution by a significant margin on the top-1 accuracy. Specifically, our approach achieves an accuracy of 80.1% without relying on auxiliary knowledge such as reaction templates. Also, 66.4% accuracy is reached on a larger and noisier dataset.

There is also a brief video describing the work.


Comments

RDKit conformer generation script

 

At Pharmacelera we have written a Python script to generate conformations with RDKit and made it available here.

Conformer generation is one of the first and most important steps in most ligand based experiments, particularly when the ligand’s 3D structure is unknown. For example, the quality of the conformers could affect the results of virtual screening experiments.


Comments

RDKit warning

 

I just saw this message on the RDKit mailing list and I thought I'd flag it.

I've noticed a problem with anaconda python on the Mac. This may also be a problem on linux, but I haven't tested that yet.

Due to some changes in the way the anaconda team is doing python builds, the most recent conda python builds seem to no longer work with the RDKit. The symptom is an error message like "Fatal Python error: PyThreadState_Get: no current thread" when you try to import the rdkit.

I've observed this for the newest 3.5 (3.5.4-hf91e95415) and 3.6 (3.6.2-hd0bf7f115) builds. A workaround is to downgrade to 3.5.3 (conda install python=3.5.3) or 3.6.1 (conda install python=3.6.1).

Comments

RDKit and Python3

 

Greg Landrum posted the following to the RDKit users list, and since a couple of the Jupyter Notebooks I've published make extensive use of RDKit I thought I'd flag it.

As many of you are no doubt aware, the Python community plans to discontinue support for Python 2 in 2020. A growing number of projects in the Scientific Python stack are making the same transition and have made that explicit here: http://www.python3statement.org/

I will be adding the RDKit to this list. The RDKit will switch to support only Python 3 by 2020. At some point between now and then - likely during the 2018.09 release cycle - we will create a maintenance branch for Python 2 that will continue to get bug fixes but will no longer have new Python features added. This branch will be maintained, and we will keep doing Python 2 builds, until 2020 when official Python 2 support ends.

Additionally, starting during the 2018.03 release cycle we will accept contributions for new features that are not compatible with Python 2 as long as those features are implemented in such a way that they don't break existing Python 2 code (more on this later). This will allow members of the RDKit community who have made the switch to Python 3 to start making use of the new features of the language in their RDKit contributions.

If you have not made the switch yet to Python 3: please read the web page I link to above and take a look at the list of projects that have committed to transition. The switch from Python 2 to Python 3 isn't always easy, but it's not getting any easier with time and you have a few years to complete it. There are a lot of online resources available to help.

Best Regards, -greg

The list of projects that will be making the transition so far includes: IPython, Jupyter notebook, pandas, Matplotlib, SymPy, Astropy, Software Carpentry, SunPy, xonsh, scikit-bio, PyStan, Axelrod, osBrain, PyMeasure, rpy2, PyMC3, FEniCS, An Introduction to Applied Bioinformatics, music21, QIIME, Altair, gala, cual-id and CIS.



Conformer generation

 

The generation of multiple conformations is an important step in a number of operations, from preparing input for ab initio calculations to providing input files for docking studies. A recent paper (DOI) compared seven freely available conformer ensemble generators: Balloon (two different algorithms), the RDKit standard conformer ensemble generator, the Experimental-Torsion basic Knowledge Distance Geometry (ETKDG) algorithm, Confab, Frog2 and Multiconf-DOCK. It also provided a dataset of ligand conformations taken from the PDB.

A recent Twitter discussion involving Greg Landrum and David Koes prompted Greg to publish a blog post describing conformation generation within RDKit. The post compares plain distance geometry for selecting diverse conformations with an approach that combines distance geometry with experimental torsion-angle preferences obtained from small-molecule crystallographic data (ETKDG). He also looks at the impact of force-field minimisation.

A really interesting read with code provided.
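For readers who want to try the idea before reading the post, here is a minimal sketch of ETKDG embedding followed by an optional MMFF94 minimisation. It is not the code from Greg's post: the function name, the default of 10 conformers and the fixed random seed are assumptions made for illustration, and RDKit is imported inside the function so the file still loads on a machine without it.

```python
def embed_conformers(smiles, n_confs=10, seed=42, minimise=True):
    """Generate conformers with ETKDG and optionally minimise them with MMFF94."""
    # Imported here so the module can be loaded even when RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    mol = Chem.AddHs(mol)  # explicit hydrogens matter for 3D embedding

    params = AllChem.ETKDG()  # distance geometry plus torsion-angle preferences
    params.randomSeed = seed  # fixed seed so repeated runs give the same result
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, n_confs, params))

    energies = {}
    if minimise and conf_ids:
        # One (not_converged, energy) pair per conformer, in conformer order.
        results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
        energies = {cid: energy for cid, (_, energy) in zip(conf_ids, results)}
    return mol, conf_ids, energies
```

Calling `embed_conformers("CCCCO", n_confs=5)` returns the molecule with its embedded conformers, their ids, and a dictionary of minimised energies keyed by conformer id. Passing `minimise=False` skips the force-field step, which makes it easy to compare the two cases discussed in the post.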



RDKit and Conda install of PostgreSQL cartridge on Mac OS

 

There has been an interesting discussion on the RDKit mailing list about installing rdkit-postgresql95 on Mac OS X, and I thought it might be of wider interest.

Here's the resolution of the difficulties I was having installing rdkit-postgresql95 on Mac OS X. The problem turned out to be that the package originally posted used Py3.5, and I'm still using 2.7. I may change to 3.5 at some point, but Greg was kind enough to add a 2.7 version of the package.

So, the following invocations work to set up rdkit with the cartridge in a new env on Mac OS X. I'm on El Capitan, by the way, and for clarity, I've not tested the installation, but only checked that it completed successfully.

conda create -n rdk1 -c rdkit rdkit
. activate rdk1
conda install -c greglandrum rdkit-postgresql95

(The last command also installs postgresql 9.5.4-0.)



iPython Notebook issue

 

I’ve just been made aware of an issue with the Calculated properties iPython Notebook.

The latest update to Pandas restructured part of the pandas API: in 0.18.1 the "format" module was moved from pandas.core to pandas.formats.

The consequence is that PandasTools now raises an error on attempting to import molecules into a data frame.

from rdkit.Chem import PandasTools
df = PandasTools.LoadSDF("demo.sdf")

AttributeError                          Traceback (most recent call last)
/Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    341             method = _safe_get_formatter_method(obj, self.print_method)
    342             if method is not None:
--> 343                 return method()
    344             return None
    345         else:

/Users/philopon/mysrc/python/mordred/.direnv/python-3.5.1/lib/python3.5/site-packages/pandas/core/frame.py in _repr_html_(self)
    566 
    567             return self.to_html(max_rows=max_rows, max_cols=max_cols,
--> 568                                 show_dimensions=show_dimensions, notebook=True)
    569         else:
    570             return None

/usr/local/Cellar/rdkit-python/2016.03.1/lib/python3.5/site-packages/rdkit/Chem/PandasTools.py in patchPandasHTMLrepr(self, **kwargs)
    129   Patched default escaping of HTML control characters to allow molecule image rendering dataframes
    130   '''
--> 131   formatter = pd.core.format.DataFrameFormatter(self,buf=None,columns=None,col_space=None,colSpace=None,header=True,index=True,
    132                                                na_rep='NaN',formatters=None,float_format=None,sparsify=None,index_names=True,
    133                                                justify = None, force_unicode=None,bold_rows=True,classes=None,escape=False)

AttributeError: module 'pandas.core' has no attribute 'format'

At the moment the only solution is to make sure you are using Pandas version 0.18.0:

pip uninstall pandas
pip install pandas==0.18.0
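Code that has to survive this kind of module move can try each known location in turn instead of pinning a version. The helper below is a generic sketch using only the standard library; its name is made up, and the two pandas paths in the usage comment are simply the old and new locations mentioned above (other pandas versions may have moved the module again).

```python
import importlib


def import_first_available(module_paths):
    """Return the first module in module_paths that can be imported."""
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError:
            continue  # this layout is not present; try the next candidate
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(module_paths))


# Hypothetical usage for the problem above (module paths taken from the post):
# fmt = import_first_available(["pandas.formats.format", "pandas.core.format"])
# formatter_class = fmt.DataFrameFormatter
```

The order of the list matters: the first path that imports wins, so put the newest location first.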


SAR visualization with RDKit

 

One of the issues with using machine learning models to help understand structure-activity relationships (SAR) is providing a nice, chemist-friendly visualisation. This excellent blog post describes how to colour code the parts of a molecule that are predicted to contribute to an activity.




RDKit updated

 

RDKit has been updated.

If you used Homebrew to install RDKit as described here, updating is very simple:

brew update
brew upgrade rdkit

You can check which version you have installed using

MacPro> python
Python 2.7.11 (default, Dec 23 2015, 16:11:50) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdkit import rdBase
>>> print rdBase.rdkitVersion
2016.03.1
>>>
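The version string printed above can also be compared in a script, for example to confirm that an upgrade took effect. The two helpers below are a sketch with made-up names; they assume the year.month.patch layout seen in "2016.03.1" and ignore any non-numeric suffix.

```python
import re


def parse_rdkit_version(version):
    """Turn a string such as '2016.03.1' into a tuple of ints, e.g. (2016, 3, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def is_at_least(installed, required):
    """Return True if the installed version is the required version or newer."""
    return parse_rdkit_version(installed) >= parse_rdkit_version(required)


# Usage with a real installation (needs RDKit):
# from rdkit import rdBase
# print(is_at_least(rdBase.rdkitVersion, "2016.03.1"))
```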


iPython Notebook to calc physicochemical properties

 

I've been making increasing use of iPython notebooks, both as a way to perform calculations and as a way of cataloguing the work that I've been doing. One thing I seem to be doing quite regularly is calculating physicochemical properties for libraries of compounds and then creating a trellis of plots to show each of the calculated properties. In the past I've done this with a series of AppleScripts using several applications. This seemed an ideal task to try out using an iPython notebook.
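As a rough illustration of the calculation step only, the sketch below computes a handful of common properties for one SMILES string with RDKit's Descriptors module. It is not the notebook's code: the function name, the choice of properties and the dictionary keys are assumptions, and RDKit is imported inside the function so the file loads without it.

```python
def physchem_properties(smiles):
    """Return a dict of basic physicochemical properties, or None for a bad SMILES."""
    # Imported here so the module can be loaded even when RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # unparsable input; let the caller decide how to report it
    return {
        "MolWt": Descriptors.MolWt(mol),             # average molecular weight
        "LogP": Descriptors.MolLogP(mol),            # Crippen estimate of logP
        "TPSA": Descriptors.TPSA(mol),               # topological polar surface area
        "HBD": Descriptors.NumHDonors(mol),          # hydrogen-bond donors
        "HBA": Descriptors.NumHAcceptors(mol),       # hydrogen-bond acceptors
        "RotB": Descriptors.NumRotatableBonds(mol),  # rotatable bonds
    }
```

Applying the function to every structure in a library gives one row per compound, which is a convenient shape for loading into a data frame and plotting each property in its own panel.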




Chemical similarity search in MongoDB

 

MongoDB (from "humongous") is an open-source, document-oriented database.

Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

As you might expect, chemical searching is not something that is traditionally supported, but there have been a couple of blog articles describing initial efforts, and there is now a detailed step-by-step description available. The post describes an implementation of chemical similarity searching using MongoDB and RDKit fingerprints, and it also has some initial comparisons with the more traditional SQL implementation using the RDKit PostgreSQL cartridge.
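The core of a fingerprint similarity search can be sketched in a few lines of plain Python. The example below is not the code from the post and does not use MongoDB: it assumes each fingerprint is stored as a collection of on-bit indices, and the function names and the in-memory dictionary standing in for a database collection are made up. The pre-filter relies on a general property of the Tanimoto coefficient: it can never exceed the ratio of the smaller on-bit count to the larger one.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return float(len(a & b)) / union


def similarity_search(query_bits, library, threshold=0.7):
    """Return (name, score) pairs from library scoring at least threshold, best first."""
    query = set(query_bits)
    hits = []
    for name, bits in library.items():
        bits = set(bits)
        smaller, larger = sorted((len(query), len(bits)))
        # The score cannot exceed smaller/larger, so skip hopeless candidates cheaply.
        if larger == 0 or float(smaller) / larger < threshold:
            continue
        score = tanimoto(query, bits)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In a database setting the on-bit count would typically be stored alongside each fingerprint, so the cheap pre-filter can run as an ordinary range query before any set arithmetic is done.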


FMCS 1.0 - Find Maximum Common Substructure

Andrew Dalke has just released fmcs-1.0. It finds a maximum common substructure of two or more structures. Some of the features are:

  • handles 1,000s of structures
  • several different atom and bond comparison schemes
  • modifiers to require ring bonds only match ring bonds, or that incomplete rings are not allowed in the MCS
  • user-defined atom class typing through isotope labels (SMILES) or through an SD tag field
  • uses an exact solution to find a maximum common substructure
  • reports the current best solution if the timeout is reached

The software is distributed under the 2-clause BSD license and available for no charge from https://bitbucket.org/dalke/fmcs/downloads/fmcs-1.0.tar.gz

You must have the Python bindings to RDKit in order to run fmcs.

Usage details are in the README, which is also shown on the project page at: https://bitbucket.org/dalke/fmcs/
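To give a feel for what a maximum common substructure search returns, here is a minimal sketch using the rdFMCS module that ships with recent RDKit releases. Note the assumption: this calls RDKit's own MCS implementation, not the fmcs-1.0 package or its command-line interface, and the wrapper function name and the 10 second default timeout are chosen for illustration only.

```python
def find_mcs(smiles_list, timeout=10):
    """Return (smarts, n_atoms, n_bonds, timed_out) for the MCS of the given SMILES."""
    # Imported here so the module can be loaded even when RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    mols = [Chem.MolFromSmiles(smi) for smi in smiles_list]
    if any(mol is None for mol in mols):
        raise ValueError("one or more SMILES could not be parsed")

    result = rdFMCS.FindMCS(
        mols,
        timeout=timeout,           # seconds; the best solution so far is returned
        ringMatchesRingOnly=True,  # ring bonds may only match ring bonds
        completeRingsOnly=True,    # no partial rings in the answer
    )
    return result.smartsString, result.numAtoms, result.numBonds, result.canceled
```

For phenol and aniline, `find_mcs(["c1ccccc1O", "c1ccccc1N"])` reports the shared benzene ring; the last element of the returned tuple is True only if the search stopped because the timeout was reached.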