LSH-based similarity search in MongoDB is faster than postgres cartridge


There is a great blog article on ChEMBL-og describing their work evaluating chemical-structure-based searching in MongoDB. MongoDB is a NoSQL database designed for scalability and performance that is attracting a lot of interest at the moment.

The article does a great job of explaining the logic behind improving the search performance.

They also provide an IPython notebook so you can try it yourself.
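For readers who have not met locality-sensitive hashing (LSH) before, the general idea can be sketched in a few lines of Python. This is a generic MinHash-with-banding illustration, not the implementation from the ChEMBL-og article: the function names, the hash family and the representation of a fingerprint as a non-empty set of on-bit indices are all assumptions made for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

PRIME = 2_147_483_647  # 2**31 - 1; bit indices must be smaller than this

def make_hash_params(n_hashes: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_hashes)]

def minhash_signature(bits: Set[int], params: List[Tuple[int, int]]) -> Tuple[int, ...]:
    """MinHash signature of a non-empty set of on-bit indices."""
    return tuple(min((a * x + b) % PRIME for x in bits) for a, b in params)

def band_keys(signature: Tuple[int, ...], n_bands: int) -> List[Tuple[int, Tuple[int, ...]]]:
    """Split a signature into n_bands equal slices; each slice is one bucket key."""
    rows = len(signature) // n_bands
    return [(i, signature[i * rows:(i + 1) * rows]) for i in range(n_bands)]

def build_index(library: Dict[str, Set[int]], params, n_bands: int):
    """Map every band key to the ids of the molecules that produce it."""
    index = defaultdict(set)
    for mol_id, bits in library.items():
        for key in band_keys(minhash_signature(bits, params), n_bands):
            index[key].add(mol_id)
    return index

def candidates(query: Set[int], index, params, n_bands: int) -> Set[str]:
    """Ids sharing at least one band with the query; only these need exact scoring."""
    found: Set[str] = set()
    for key in band_keys(minhash_signature(query, params), n_bands):
        found |= index.get(key, set())
    return found

# Toy fingerprints (hypothetical data): m1 matches the query, m2 shares no bits with it.
params = make_hash_params(16, seed=42)
library = {"m1": {1, 5, 9}, "m2": {100, 200, 300}}
index = build_index(library, params, n_bands=4)
print(candidates({1, 5, 9}, index, params, n_bands=4))  # {'m1'}
```

The speed-up comes from the final step: instead of scoring the query against every molecule, only the molecules that land in one of the query's buckets are compared exactly. More bands with fewer rows each return more candidates (fewer misses, more work), and the reverse holds for fewer, longer bands.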


myChEMBL and Docker under MacOSX


Following on from the release of ChEMBL 20 earlier in the year, we now see the release of the myChEMBL virtual machines supporting a CentOS-based image, along with the existing Ubuntu version. What might be of interest to Mac OS X users are the myChEMBL Docker images.

Docker is an open platform for building, shipping and running distributed applications. Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries, anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in. Installation under Linux is straightforward and instructions for Mac OS X are provided.

Installation on OS X is more complicated. This is because the standard OS X installation downloads and configures VirtualBox and runs a very lightweight 64-bit Linux with Docker installed. The problem is that this won't work for myChEMBL: the virtual machine has only 20GB of available disk space and the myChEMBL container is 23GB after decompressing. So in order to use it, you first have to resize the volume, which is explained here:

Once done, the steps are very simple:

  • Download the myChEMBL image from the FTP.
  • Uncompress it.
  • Load the image into Docker.
  • Run it.

Detailed instructions are provided here.

After successful completion of the steps above, you can open your browser and go to the address of your Docker host: the local machine if you are running Docker locally, or http://someotherhost/ if you are running Docker on some other host. You should then be able to see the myChEMBL launchpad page.


ChemSpider website update


The ChemSpider Website has been updated.

ChemSpider is a free chemical structure database providing fast text and structure search access to over 34 million structures from hundreds of data sources.

The new website has a much cleaner look and, perhaps more importantly, is now viewable on the smaller screens of mobile devices. This is achieved by collapsing the individual record page into a tabbed view to reduce scrolling on long records, and by eliminating the Java-based plugins and replacing them with JavaScript versions.




A few applications have been updated over the last week or so.

Wizard Pro for Mac has been updated to version 1.7.0. Highlights from the update include:

1-click Data Refresh: Suppose you've imported and cleaned your data, perhaps built a few models -- and then your data changes. Now, thanks to the new "Refresh" button in the toolbar, you can instantly update all of your analyses using fresh data from the original source. Customize how columns are matched up with a convenient popover, and feel free to move or rename the source data file on your computer -- Wizard will automatically keep an eye on it. Command-R to refresh the data, Command-E to configure the link.

Revamped menu system: Wizard has a new modular architecture that means you'll only see menus relevant to what you're doing -- that is, Raw Data, Pivot, Summary, Model, and Predict each have their own menu now. Most of the menus are more concise, so you can find what you're looking for faster.

JSON support: by popular request, Wizard 1.7 can import JavaScript Object Notation (JSON) data files. Make sure your data is encoded as an array of objects and Wizard will do the rest. Export any table or result as JSON, too.
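"An array of objects" means one JSON object per row, with the column names as keys. As a minimal sketch, assuming your data starts out as a list of column names plus a list of rows, something like the following Python would produce a file in that shape. The helper name and the example column names are hypothetical, and I have not checked the output against Wizard itself.

```python
import json
from typing import Any, List, Sequence

def rows_to_json(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Encode tabular data as a JSON array with one object per row."""
    records: List[dict] = [dict(zip(columns, row)) for row in rows]
    return json.dumps(records, indent=2)

# Hypothetical two-row table.
text = rows_to_json(
    ["compound", "ic50_nM"],
    [["aspirin", 1200.0], ["ibuprofen", 850.0]],
)
print(text)
```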

There is a review of Wizard Pro here.

Plot2 is a scientific 2D plotting program designed for everyday plotting. It is easy to use, creates high-quality plots, and allows easy and powerful manipulations and calculations of data. The latest update fixes an export bug.

MyScript Calculator for iOS has been updated to fix a problem with the tutorial being played repeatedly and a drag-and-drop bug.


ChEMBL 20 released


ChEMBL 20 has been released.

The updated database contains

  • 1,715,135 compound records
  • 1,463,270 compounds (of which 1,456,020 have mol files)
  • 13,520,737 activities
  • 1,148,942 assays
  • 10,774 targets
  • 59,610 source documents

A number of structural alerts have now been added. These include Pfizer LINT filters, Glaxo Wellcome Hard Filters, Bristol-Myers Squibb HTS Deck Filters, NIH MLSMR Excluded Functionality Filters, University of Dundee NTD Screening Library Filters and Pan Assay Interference Compounds (PAINS) Filters. The PAINS annotation was created using the Vortex script described here.




I was recently sent details of a new website, Chemplore. The aim is to provide a modern, interactive and easy way to visualize small molecules and macromolecules in the browser. It's built using many modern web technologies and tools including WebGL, SVG and Go.

It pulls data from a variety of sources including PubChem and PDB, and provides interactive 2D and 3D viewers plus a variety of chemical information.


It is currently in beta and the developers are looking for feedback.




I just got this email:

iScienceSearch, the Internet search engine for chemists is now completely free! Please click to start the application. There is nothing to download. This application will run in your browser.

I’ve previously reviewed iScienceSearch and it seems to have been updated considerably since then.

iScienceSearch is a meta search engine that searches over 100 different databases. The search engine is intelligent and will search using any synonyms or chemical structures of your search query to extend the search to data sources that might not include the original query text.

  • Search the Internet by drawing a structure.
  • Type a name or identifier and you get the structure.
  • Find suppliers for lab and research chemicals.
  • Search the AKosSamples database by substructure.



Dataglass, database access for iOS


I suspect most scientists now find that they are storing data in SQL databases. I noticed that Impathic have released a series of tools to access a variety of SQL databases from your iOS device, so whether you are using MySQL, Oracle, Access, etc., there is probably a Dataglass app to give you access.


Chemical similarity search in MongoDB


MongoDB (from "humongous") is an open-source, document-oriented database.

Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

As you might expect, chemical searching is not something that is traditionally supported, but there have been a couple of blog articles describing initial efforts, and there is now a detailed step-by-step description available. The post describes an implementation of chemical similarity searching using MongoDB and RDKit fingerprints. It also has some initial comparisons with the more traditional SQL implementation using the RDKit PostgreSQL cartridge.
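To give a feel for what a fingerprint similarity search involves, here is a minimal pure-Python sketch. It assumes each fingerprint is stored as a set of on-bit indices and uses the Tanimoto coefficient, the usual choice for this kind of search; the function names and toy data are hypothetical, and it is not the code from the post. The bit-count check relies on a standard bound: a molecule can only reach a Tanimoto score of t against the query if its number of on-bits lies between t times the query's count and the query's count divided by t. A database index on the bit count can apply the same filter before any fingerprints are compared.

```python
from typing import Dict, List, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def similarity_search(
    query: Set[int], library: Dict[str, Set[int]], threshold: float
) -> List[Tuple[str, float]]:
    """Return (id, score) pairs with score >= threshold, best first.

    Assumes 0 < threshold <= 1.
    """
    n_query = len(query)
    hits = []
    for mol_id, bits in library.items():
        n = len(bits)
        # Skip fingerprints whose bit count makes the threshold unreachable.
        if n < threshold * n_query or n * threshold > n_query:
            continue
        score = tanimoto(query, bits)
        if score >= threshold:
            hits.append((mol_id, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

# Toy library (hypothetical data): "c" and "d" are rejected on bit count alone.
library = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3},
    "c": {10, 11},
    "d": set(range(1, 13)),
}
print(similarity_search({1, 2, 3, 4}, library, threshold=0.7))
# [('a', 1.0), ('b', 0.75)]
```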


Molecule database framework


I thought I would highlight a recent publication I read in the Journal of Cheminformatics: “Molecule database framework: a framework for creating database applications with chemical structure search capability”, Journal of Cheminformatics 2013, 5:48 DOI.

From the abstract:

Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes: Chemical structure searches combined with property searches. Support for multi-component compounds (mixtures). Import and export of SD-files. Optional security (authorization). For chemical structure searching Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files.

While not a drag-and-drop solution, it provides a means to create your own personal chemically searchable database.

Molecule Database Framework is available for download on the project's web page on Bitbucket: