Macs in Chemistry

Insanely great science

 

Flexible Search of UniChem

UniChem is a web resource provided by the EBI. It is a 'Unified Chemical Identifier' system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases. UniChem currently contains data from 27 different data sources and provides links to 108,941,995 structures.

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI: http://dx.doi.org/10.1186/1758-2946-5-3

The previous script showed how to search using a ChEMBL ID; however, one of the attractions of UniChem is that you can search with any molecule identifier, provided you know the corresponding datasource. This script allows the user to take a column of any molecule identifiers and search the specified datasource using a common web service.

The first part of the script populates a dialog box that allows the user to select both the column that contains the molecule IDs and the datasource that is to be searched.


All RESTful queries are constructed using the following base URL:

https://www.ebi.ac.uk/unichem/rest/

Specific query URLs are then constructed by adding a method name to this base URL, followed by the input data.

Input data may consist of three types:

src_compound_id (the molecule identifier)
src_id (the number of the datasource; ChEMBL is 1)
InChIKey

If the column contained ChEMBL IDs, the URL would have the form:

https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1

For other datasources, we just need to change the src_id (and supply an identifier from that datasource).
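As a minimal sketch in plain Python (independent of Vortex), the base URL, the method name and the input data can simply be joined with slashes. The helper name `unichem_url` is hypothetical, and only the `src_compound_id` method described in this article is assumed; `YOUR_DRUGBANK_ID` is a placeholder, not a real identifier.

```python
# Hypothetical helper: base URL + method name + inputs, separated by '/'.
BASE_URL = 'https://www.ebi.ac.uk/unichem/rest/'


def unichem_url(method, *inputs):
    """Return the query URL for a UniChem REST method and its inputs."""
    parts = [method] + [str(value) for value in inputs]
    return BASE_URL + '/'.join(parts)


# ChEMBL example from the text: compound CHEMBL1089 in datasource 1
print(unichem_url('src_compound_id', 'CHEMBL1089', 1))
# prints https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1

# Another datasource: only the identifier and the src_id change (placeholder ID)
print(unichem_url('src_compound_id', 'YOUR_DRUGBANK_ID', 2))
```

Keeping the method name as a parameter means the same helper could be reused for other UniChem methods, but their exact names should be checked against the UniChem documentation.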

One slight complication is that, whilst there are 27 datasources, their numbering goes up to 31, because src_ids 13, 16, 19 and 30 are missing. We can get the index position of the datasource selected in the dialog:

input_dbx = input_db.getSelectedIndex()

but this index does not necessarily correspond to the src_id required for the URL, so we need a list of datasource numbers:

datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '14', '15', '17', '18', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '31']

and then use the index position of the datasource to look up the src_id:

chosen_db = datasourceNumbers[input_dbx]
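Rather than typing the list by hand, it could be derived from the numbering described above. This is a sketch assuming, as stated in the text, that src_ids run from 1 to 31 with 13, 16, 19 and 30 missing; the helper `src_id_for_index` is hypothetical.

```python
# Assumption from the text: src_ids 1..31, with 13, 16, 19 and 30 missing.
MISSING_SRC_IDS = {13, 16, 19, 30}

# Same content as the hand-written datasourceNumbers list (strings, in order)
datasourceNumbers = [str(n) for n in range(1, 32) if n not in MISSING_SRC_IDS]


def src_id_for_index(index):
    """Map a position in the dialog's datasource list to a UniChem src_id."""
    return datasourceNumbers[index]


print(len(datasourceNumbers))  # prints 27
print(src_id_for_index(0))     # prints 1 (ChEMBL)
print(src_id_for_index(12))    # prints 14, because 13 is skipped
```

The index lookup only works while the order of the `datasources` names matches this list, so the two lists must be kept in step.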

We can then construct the URL:

api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s' % (chembl_id, chosen_db)
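The line above inserts the cell value into the URL as-is. A slightly more defensive sketch is shown below; percent-encoding the identifier and returning `None` for blank cells are assumptions added here (the original script does neither), and `build_api_url` is a hypothetical name. The import fallback lets it run under Python 3 or Python 2/Jython.

```python
try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2 / Jython

API_TEMPLATE = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s'


def build_api_url(compound_id, chosen_db):
    """Return the UniChem URL for compound_id in datasource chosen_db, or None."""
    if compound_id is None or not compound_id.strip():
        return None  # blank cell: nothing to look up
    # safe='' also encodes '/', which would otherwise be read as a path separator
    return API_TEMPLATE % (quote(compound_id.strip(), safe=''), chosen_db)


print(build_api_url('CHEMBL1089', '1'))
# prints https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1
print(build_api_url('   ', '1'))  # prints None
```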

The rest of the script is similar to the previous version.


The Vortex Script

# Flexible UniChem search to get all IDs
#Authored by Chris Swain (http://www.macinchem.org)
#All rights reserved.

# Python imports
import urllib2
from com.xhaus.jyson import JysonCodec as json

# Java Swing, used to build the dialog box
import javax.swing
import javax.swing as swing

# vortex, vtable and layout are used without being imported;
# they come from the Vortex scripting environment

columnNames = vtable.getColumnNames()
datasources = ['ChEMBL', 'Drugbank', 'PDB', 'Guide to Pharm', 'Drugs of the Future', 'Kegg Ligand', 'ChEBI', 'NIH Clinical', 'ZINC', 'eMolecules', 'IBM IP', 'Gene Expression', 'NFDA Substance', 'SureChEMBL Patents', 'PharmGKB', 'Human Metab', 'Selleck', 'Thomson Pharma', 'Pubchem', 'Mcule', 'NMR shift DB', 'Networks', 'Toxicology Resource', 'Human Metab Recon', 'MolPort', 'Japanese Chemicals', 'BindingDB']
datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '14', '15', '17', '18', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '31']


input_label = swing.JLabel("ID column (for input)")
input_cb = swing.JComboBox(columnNames)
input_db_label = swing.JLabel("Datasource (for input)")
input_db = swing.JComboBox(datasources)
panel = swing.JPanel()

layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb,    1, 0)
layout.fill(panel, input_db_label,  0, 2)
layout.fill(panel, input_db,    1, 2)

ret = vortex.showInDialog(panel, "Choose ID column and Datasource")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()
    input_dbx = input_db.getSelectedIndex()

    if input_idx == 0:
        vortex.alert("You must choose a column")
    else:
        chosen_col = vtable.getColumn(input_idx)
        # Convert the position in the datasource list into the UniChem src_id
        chosen_db = datasourceNumbers[input_dbx]

        # Column names from https://www.ebi.ac.uk/unichem/ucquery/listSources

        # Keys are UniChem src_ids (13, 16, 19 and 30 are not used)
        cols = {
            '1': vtable.findColumnWithName('ChEMBL', 1),
            '2': vtable.findColumnWithName('Drugbank', 1),
            '3': vtable.findColumnWithName('PDB', 1),
            '4': vtable.findColumnWithName('Guide to Pharm', 1),
            '5': vtable.findColumnWithName('Drugs of the Future', 1),
            '6': vtable.findColumnWithName('Kegg Ligand', 1),
            '7': vtable.findColumnWithName('ChEBI', 1),
            '8': vtable.findColumnWithName('NIH Clinical', 1),
            '9': vtable.findColumnWithName('ZINC', 1),
            '10': vtable.findColumnWithName('eMolecules', 1),
            '11': vtable.findColumnWithName('IBM IP', 1),
            '12': vtable.findColumnWithName('Gene Expression', 1),
            '14': vtable.findColumnWithName('NFDA Substance', 1),
            '15': vtable.findColumnWithName('SureChEMBL Patents', 1),
            '17': vtable.findColumnWithName('PharmGKB', 1),
            '18': vtable.findColumnWithName('Human Metab', 1),
            '20': vtable.findColumnWithName('Selleck', 1),
            '21': vtable.findColumnWithName('Thomson Pharma', 1),
            '22': vtable.findColumnWithName('Pubchem', 1),
            '23': vtable.findColumnWithName('Mcule', 1),
            '24': vtable.findColumnWithName('NMR shift DB', 1),
            '25': vtable.findColumnWithName('Networks', 1),
            '26': vtable.findColumnWithName('Toxicology Resource', 1),
            # Distinct name, so src_id 27 does not share the 'Human Metab' column with src_id 18
            '27': vtable.findColumnWithName('Human Metab Recon', 1),
            '28': vtable.findColumnWithName('MolPort', 1),
            '29': vtable.findColumnWithName('Japanese Chemicals', 1),
            '31': vtable.findColumnWithName('BindingDB', 1),
        }

        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
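            chembl_id = chosen_col.getValueAsString(r)
            # Skip empty cells, which would give a malformed URL
            if chembl_id is None or not chembl_id.strip():
                continue
            chembl_id = chembl_id.strip()
            # e.g. https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1
            api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s' % (chembl_id, chosen_db)
            try:
                molecule_record = urllib2.urlopen(api_url).read()
            except urllib2.URLError:
                # HTTPError is a subclass of URLError, so this skips the row
                # on both HTTP errors and network failures
                continue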
            j = json.loads(molecule_record)
            for entry in j:
                src_id = entry['src_id']
                if src_id in cols:
                    cols[src_id].setValueFromString(r, entry['src_compound_id'])


vtable.fireTableStructureChanged()
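The loop above expects each response to be a JSON list of objects with `src_id` and `src_compound_id` keys (the key names are taken from the script). The sketch below shows that parsing step in plain Python with the standard `json` module instead of Jyson. The sample response is made up for illustration and is not real UniChem output, and `ids_by_source` is a hypothetical helper. One design difference: if a source lists more than one identifier for a structure, the script's repeated `setValueFromString` calls would presumably leave only the last one in the cell, whereas this sketch collects them all.

```python
import json

# Made-up response in the shape the script expects (not real UniChem data)
sample_response = '''[
  {"src_id": "1", "src_compound_id": "CHEMBL_EXAMPLE"},
  {"src_id": "9", "src_compound_id": "ZINC_EXAMPLE_A"},
  {"src_id": "9", "src_compound_id": "ZINC_EXAMPLE_B"},
  {"src_id": "2", "src_compound_id": "NOT_REQUESTED"}
]'''


def ids_by_source(response_text, wanted_src_ids):
    """Group src_compound_id values by src_id, keeping only wanted sources."""
    grouped = {}
    for entry in json.loads(response_text):
        src_id = entry['src_id']
        if src_id in wanted_src_ids:
            grouped.setdefault(src_id, []).append(entry['src_compound_id'])
    return grouped


# In the script, wanted_src_ids would be the keys of the cols dictionary
result = ids_by_source(sample_response, {'1', '9'})
for src_id in sorted(result, key=int):
    # Multiple IDs from one source are joined into a single cell value
    print('%s: %s' % (src_id, '; '.join(result[src_id])))
# prints:
# 1: CHEMBL_EXAMPLE
# 9: ZINC_EXAMPLE_A; ZINC_EXAMPLE_B
```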

The script can be downloaded from https://macinchem.org/reviews/vortex_scripts/UniChemSearch.vpy.zip

Page Updated 15 February 2016