Macs in Chemistry

Insanely great science

 

Scripting Vortex 23

ChEMBL is a manually curated chemical database of bioactive molecules. It is maintained by the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK. The database currently contains over 1.4 million unique structures with the associated activity at 10,579 different targets. It also acts as a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases.

Whilst the database can be downloaded, the data can also be accessed via a web interface (shown below) and a series of web services.

chembl

The currently available web services are:

General Methods

Check API status

Compound Methods

Get compound by ChEMBLID
Get compound by Standard InChiKey
Get list of compounds matching Canonical SMILES
Get list of compounds matching Canonical SMILES using HTTP POST
Get list of compounds containing the substructure represented by a given Canonical SMILES
Get list of compounds containing the substructure represented by a given Canonical SMILES using HTTP POST
Get list of compounds similar to the one represented by a given Canonical SMILES, at a given cutoff percentage
Get list of compounds similar to the one represented by a given Canonical SMILES, at a given cutoff percentage using HTTP POST
Get image of a ChEMBL compound by ChEMBLID
Get individual compound bioactivities
Get alternative compound forms (e.g. parent and salts) of a compound
Get mechanism of action details for compound (where compound is a drug)

Target Methods

Get all targets
Get target by ChEMBLID
Get target by UniProt Accession Identifier
Get individual target bioactivities
Get approved drugs for target

Assay Methods

Get assay by ChEMBLID
Get individual assay bioactivities

We can use these web services to access ChEMBL data from within Vortex. The following scripts illustrate some of the ways to do this.

UniprotID to ChEMBL target information.

When reading interesting results in the literature it is often useful to find out more about a particular target. This script uses the Uniprot ID to interrogate ChEMBL via the “Get target by UniProt Accession Identifier” web service and bring back target information. Because we can’t be sure what the column containing the Uniprot IDs will be called (e.g. Uniprot ID, uniprot_id, UNIPROTid), the first part of the script pops up a dialog asking the user to select the desired column.

vortex23_1
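As an aside, the column-naming problem can also be approached by normalising the titles before comparing them. The sketch below is not part of the Vortex script; the function names and the sample column titles are made up for illustration, and it only covers the spelling variants mentioned above.

```python
import re

def normalise(name):
    """Lower-case a column title and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def guess_uniprot_column(column_names):
    """Return the index of the first title that looks like a Uniprot ID column, or -1."""
    for index, name in enumerate(column_names):
        if normalise(name) == "uniprotid":
            return index
    return -1

# Hypothetical column titles, for illustration only
columns = ["Name", "UNIPROTid", "Activity"]
print(guess_uniprot_column(columns))
```

A guess like this could be used to suggest a default in the dialog, but asking the user remains the safer option when the column title is something unexpected.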

We then construct the query string to access the appropriate web service, and then pull back the data. There is a little error trapping because some Uniprot IDs may not be in ChEMBL.

mystr = "http://www.ebi.ac.uk/chemblws/targets/uniprot/" + uniprotID + ".json"
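The two steps, building the query string and trapping failed lookups, can be sketched outside Vortex as follows. This is a minimal illustration rather than the script itself: `target_url`, `fetch_or_none` and `fake_opener` are names invented here, the whitespace trimming and URL quoting are additions, and `IOError` is caught as a broader stand-in for the `urllib2.HTTPError` used in the script. The fake opener replaces the network call so the sketch runs offline.

```python
try:
    from urllib.parse import quote   # Python 3
except ImportError:
    from urllib import quote         # Python 2 / Jython

BASE_URL = "http://www.ebi.ac.uk/chemblws/targets/uniprot/"

def target_url(uniprot_id):
    """Build the query URL for one Uniprot accession, trimming stray whitespace."""
    return BASE_URL + quote(uniprot_id.strip()) + ".json"

def fetch_or_none(url, opener):
    """Return the response text, or None when the lookup fails.

    `opener` is any callable that takes a URL and returns text. In the
    Vortex script this role is played by urllib2.urlopen(url).read().
    """
    try:
        return opener(url)
    except IOError:
        # HTTP errors are subclasses of IOError in both Python 2 and 3
        return None

def fake_opener(url):
    """Offline stand-in for the web service (assumption, for illustration)."""
    if url.endswith("/P26769.json"):
        return '{"target": {"chemblId": "CHEMBL2095179"}}'
    raise IOError("Target not found")

print(target_url(" P26769 "))
print(fetch_or_none(target_url("P26769"), fake_opener))
print(fetch_or_none(target_url("P55957"), fake_opener))
```

Returning `None` for a failed lookup lets the calling loop skip that row and carry on, which is what the `continue` in the script does.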

The data is returned in json format as shown below.

{"target": {"targetType": "PROTEIN FAMILY", "chemblId": "CHEMBL2095179", "geneNames": "Unspecified", "description": "Adenylate cyclase", "compoundCount": 75, "bioactivityCount": 137, "proteinAccession": "P26769", "synonyms": "ATP pyrophosphate-lyase 2,Adenylate cyclase type II,Adenylyl cyclase 2,4.6.1.1,Adcy2,Adenylate cyclase type 2", "organism": "Rattus norvegicus", "preferredName": "Adenylate cyclase"}}

The last part of the script parses the data and populates the table.

jsonData1
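The parsing step can be tried on its own with the sample response shown above. This sketch uses Python's standard `json` module instead of the Jyson codec that the Vortex script imports, and `FIELD_MAP` and `target_cells` are names introduced here; the column titles are the ones the script creates.

```python
import json

# Sample response copied from the example above
response = '''{"target": {"targetType": "PROTEIN FAMILY", "chemblId": "CHEMBL2095179", "geneNames": "Unspecified", "description": "Adenylate cyclase", "compoundCount": 75, "bioactivityCount": 137, "proteinAccession": "P26769", "synonyms": "ATP pyrophosphate-lyase 2,Adenylate cyclase type II,Adenylyl cyclase 2,4.6.1.1,Adcy2,Adenylate cyclase type 2", "organism": "Rattus norvegicus", "preferredName": "Adenylate cyclase"}}'''

# Column title in the Vortex table -> key in the JSON record
FIELD_MAP = [
    ("ChEMBLID", "chemblId"),
    ("Num Compds", "compoundCount"),
    ("BioactivityCount", "bioactivityCount"),
    ("target_Type", "targetType"),
    ("preferred_Name", "preferredName"),
]

def target_cells(text):
    """Return (column title, cell text) pairs for one target record."""
    target = json.loads(text)["target"]
    return [(column, str(target[key])) for column, key in FIELD_MAP]

for column, value in target_cells(response):
    print(column + ": " + value)
```

Keeping the column-to-key mapping in one list makes it easy to add further fields, such as `organism`, without repeating the lookup code.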

The Vortex Script

#ChEMBL Target Search using uniprotID

# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json

# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os


input_label     = swing.JLabel("Uniprot column (for input)")
input_cb    = workspace.getColumnComboBox()

panel = swing.JPanel()

layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb,    1, 0)

ret = vortex.showInDialog(panel, "Choose uniprot column")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()

    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        col     = vtable.getColumn(input_idx - 1)

        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            uniprotID = col.getValueAsString(r)
            mystr = "http://www.ebi.ac.uk/chemblws/targets/uniprot/" + uniprotID + ".json"
            try:
                myreturn = urllib2.urlopen(mystr).read()
            except urllib2.HTTPError:
                continue
            # Some accessions are not in ChEMBL, e.g. "Target not found for accession:P55957"
            # if myreturn.find('Target not found') != -1:
            j = json.loads(myreturn)
            TheData = str(j['target']['chemblId'])
            colChemblID = vtable.findColumnWithName('ChEMBLID', 1)
            colChemblID.setValueFromString(r, TheData)
            TheData = str(j['target']['compoundCount'])
            colCompounds = vtable.findColumnWithName('Num Compds', 1)
            colCompounds.setValueFromString(r, TheData)
            TheData = str(j['target']['bioactivityCount'])
            colBio = vtable.findColumnWithName('BioactivityCount', 1)
            colBio.setValueFromString(r, TheData)
            TheData = str(j['target']['targetType'])
            colType = vtable.findColumnWithName('target_Type', 1)
            colType.setValueFromString(r, TheData)
            TheData = str(j['target']['preferredName'])
            colType = vtable.findColumnWithName('preferred_Name', 1)
            colType.setValueFromString(r, TheData)




vtable.fireTableStructureChanged()

Getting ChEMBL Target Data

After pulling back the target information associated with a particular Uniprot ID we may want to find out more about the compounds that have been tested against this target. The table now contains the ChEMBLID (highlighted in red) for the target and we can use this to interrogate ChEMBL to find all molecules that have been tested against this target.

jsonData2

To capture the desired ChEMBL ID we need to know the column and the particular cell containing the ID. To do this we can use an action from the user right-clicking on a cell to capture the contents.

taskID = col.getValueAsString(cell_row)

We also capture the text in the “preferred_Name” column to use as the label for a new workspace that will contain the results.

col1 = vtable.findColumnWithName('preferred_Name', 0)

TableName = col1.getValueAsString(cell_row)

We then construct the URL needed to access the web service and then pull back the data.

mystr = "https://www.ebi.ac.uk/chemblws/targets/" + taskID + "/bioactivities.json"
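Because the ID comes from whichever cell the user clicked, it can be worth checking that it looks like a ChEMBL ID before building the URL. The sketch below is an optional extra, not part of the script: `bioactivities_url` is a name invented here, and the pattern assumes that ChEMBL IDs are the text CHEMBL followed by digits, as in every example on this page.

```python
import re

BASE_URL = "https://www.ebi.ac.uk/chemblws/"
CHEMBL_ID = re.compile(r"^CHEMBL\d+$")   # assumed ID format

def bioactivities_url(resource, chembl_id):
    """Build the bioactivities URL for a target or a compound.

    Raises ValueError if the resource or the ID does not look right.
    """
    chembl_id = chembl_id.strip().upper()
    if resource not in ("targets", "compounds"):
        raise ValueError("Unknown resource: %r" % resource)
    if not CHEMBL_ID.match(chembl_id):
        raise ValueError("Not a ChEMBL ID: %r" % chembl_id)
    return BASE_URL + resource + "/" + chembl_id + "/bioactivities.json"

print(bioactivities_url("targets", "CHEMBL2095179"))
```

Inside Vortex the `ValueError` could be turned into a `vortex.alert` message, so that clicking on the wrong column gives a readable warning rather than a failed web request.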

The data in json format looks like this

{"bioactivities": [{"units": "nM", "reference": "Bioorg. Med. Chem. Lett., (2010) 20:19:5811", "target_chemblid": "CHEMBL2111430", "target_name": "MIF/CD74 (Macrophage migration inhibitory factor and HLA-DR antigens-associated invariant chain)", "bioactivity_type": "IC50", "ingredient_cmpd_chemblid": "CHEMBL1257355", "value": "7000", "operator": "=", "parent_cmpd_chemblid": "CHEMBL1257355", "assay_chemblid": "CHEMBL1259539", "activity_comment": "Unspecified", "name_in_reference": "10", "assay_description": "Inhibition of human recombinant biotinylated MIF/CD74 interaction after 30 mins", "organism": "Homo sapiens", "assay_type": "B", "target_confidence": 5}]}

The last part of the script parses the data into a list of rows and then creates the column headers.
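The parsing can be checked outside Vortex with the sample record above, here abridged to the seven fields the script keeps. This sketch uses the standard `json` module in place of Jyson, and `bioactivity_rows` is a name introduced for illustration. It produces the list of rows that the script hands to `arrayToWorkspace`.

```python
import json

# Sample record from the example above, abridged to the fields that are used
response = '''{"bioactivities": [{"units": "nM",
 "target_name": "MIF/CD74 (Macrophage migration inhibitory factor and HLA-DR antigens-associated invariant chain)",
 "bioactivity_type": "IC50", "value": "7000", "parent_cmpd_chemblid": "CHEMBL1257355",
 "assay_description": "Inhibition of human recombinant biotinylated MIF/CD74 interaction after 30 mins",
 "organism": "Homo sapiens"}]}'''

# The column headers double as the JSON keys to read from each record
column_names = ['parent_cmpd_chemblid', 'target_name', 'bioactivity_type',
                'value', 'units', 'assay_description', 'organism']

def bioactivity_rows(text):
    """Return one list of strings per bioactivity record."""
    rows = []
    for ba in json.loads(text)["bioactivities"]:
        rows.append([str(ba[name]) for name in column_names])
    return rows

rows = bioactivity_rows(response)
print(len(rows))
print(rows[0][0] + " " + rows[0][2] + " " + rows[0][3] + " " + rows[0][4])
```

Using one list for both the headers and the keys keeps the two in step: adding a field such as `reference` is a single change.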

We then create a new workspace using all the items we created in the script.

arrayToWorkspace(rows, column_names, TableName)

The result is shown below: a new workspace showing all molecules that have been assayed against that target.

targetdata

You need to put this script in the “context” folder which is inside the “Vortex_Add-ons” folder.

The Vortex Script

#ChEMBL Targets Data Search
#Authored by Chris Swain (http://www.macinchem.org)
#All rights reserved.

# Python imports
import urllib2
import urllib
import csv
import sys
from com.xhaus.jyson import JysonCodec as json

# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import com.dotmatics.vortex.table.VortexTableModel as vtm
import jarray
import binascii
import string
import os

# Example search string
# http://www.ebi.ac.uk/chemblws/targets/CHEMBL2095179/bioactivities.json

col = vtable.findColumnWithName('ChEMBLID', 0)
col1 = vtable.findColumnWithName('preferred_Name', 0)
if (col == None):
    vortex.alert('Load a workspace with a ChEMBLID column please.')
    quit()
else:
    taskID = col.getValueAsString(cell_row)
    #  taskID = "CHEMBL2095179"
    TableName = col1.getValueAsString(cell_row)


mystr = "https://www.ebi.ac.uk/chemblws/targets/" + taskID + "/bioactivities.json"

myreturn = urllib2.urlopen(mystr).read()

j = json.loads(myreturn)


rows = []
for ba in j['bioactivities']:
    values = [ba['parent_cmpd_chemblid'], ba['target_name'], ba['bioactivity_type'], ba['value'], ba['units'], ba['assay_description'], ba['organism']]
    row = [str(i) for i in values]
    rows.append(row)

#vortex.addTable("Bioactivities", csvstring, 0, 4, -1, 0)

column_names = ['parent_cmpd_chemblid', 'target_name', 'bioactivity_type', 'value', 'units', 'assay_description', 'organism']

arrayToWorkspace(rows, column_names, TableName)


vtable.fireTableStructureChanged()

ChEMBLID to SMILES script

Whilst the table above contains the textual information associated with an assay, it does not include the chemical structure. This script uses the parent_cmpd_chemblid field and the compound web service (for example https://www.ebi.ac.uk/chemblws/compounds/CHEMBL1.json) to access the chemical data.

withStructures

The data in json format looks like this

{"compound": {"smiles": "COc1ccc2[C@@H]3[C@H](COc2c1)C(C)(C)OC4=C3C(=O)C(=O)C5=C4OC(C)(C)[C@@H]6COc7cc(OC)ccc7[C@H]56", "chemblId": "CHEMBL1", "passesRuleOfThree": "No", "molecularWeight": 544.59, "molecularFormula": "C32H32O8", "acdLogp": 7.67, "stdInChiKey": "GHBOEFUAGSHXPO-XZOTUCIWSA-N", "knownDrug": "No", "medChemFriendly": "Yes", "rotatableBonds": 2, "alogp": 3.63, "numRo5Violations": 1, "acdLogd": 7.67}}

By parsing the data we can pull out the SMILES string and populate the table; Vortex then renders the SMILES to display the structure. It is also possible to modify the script to access the calculated properties and add them to the table.
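As a rough sketch of that modification, the function below pulls out the SMILES string together with a few of the calculated properties from the sample response above. The choice of properties, the name `compound_cells` and the use of the standard `json` module rather than Jyson are all assumptions made for this illustration.

```python
import json

# Sample response copied from the example above
response = '''{"compound": {"smiles": "COc1ccc2[C@@H]3[C@H](COc2c1)C(C)(C)OC4=C3C(=O)C(=O)C5=C4OC(C)(C)[C@@H]6COc7cc(OC)ccc7[C@H]56", "chemblId": "CHEMBL1", "passesRuleOfThree": "No", "molecularWeight": 544.59, "molecularFormula": "C32H32O8", "acdLogp": 7.67, "stdInChiKey": "GHBOEFUAGSHXPO-XZOTUCIWSA-N", "knownDrug": "No", "medChemFriendly": "Yes", "rotatableBonds": 2, "alogp": 3.63, "numRo5Violations": 1, "acdLogd": 7.67}}'''

# Properties to copy into the table (an arbitrary selection)
PROPERTY_KEYS = ["molecularWeight", "molecularFormula", "alogp", "numRo5Violations"]

def compound_cells(text, keys=PROPERTY_KEYS):
    """Return a dict of column title -> cell text for one compound record."""
    compound = json.loads(text)["compound"]
    cells = {"SMILES": str(compound["smiles"])}
    for key in keys:
        # .get() leaves the cell empty if a record lacks the property
        value = compound.get(key)
        cells[key] = "" if value is None else str(value)
    return cells

cells = compound_cells(response)
for title in ["SMILES"] + PROPERTY_KEYS:
    print(title + ": " + cells[title])
```

In the Vortex script each of these values would be written with `findColumnWithName(title, 1)` followed by `setValueFromString(r, value)`, in the same way as the SMILES column.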

The Vortex Script

#Use ChEMBLid to get SMILES string
#Authored by Chris Swain (http://www.macinchem.org)
#All rights reserved.

# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json

# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os

# "https://www.ebi.ac.uk/chemblws/compounds/CHEMBL1.json"
colsmi = vtable.findColumnWithName('SMILES', 0)

input_label     = swing.JLabel("ChEMBLid column (for input)")
input_cb    = workspace.getColumnComboBox()

panel       = swing.JPanel()

layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb,    1, 0)

ret = vortex.showInDialog(panel, "Choose ChEMBLid column")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()

    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        col = vtable.getColumn(input_idx - 1)


        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            chemblId = col.getValueAsString(r)
            mystr = "http://www.ebi.ac.uk/chemblws/compounds/" + chemblId + ".json"
            try:
                myreturn = urllib2.urlopen(mystr).read()
            except urllib2.HTTPError:
                continue
            # Some ChEMBL IDs may not be found

            j = json.loads(myreturn)
            TheData = str(j['compound']['smiles'])
            colsmi = vtable.findColumnWithName('SMILES', 1)
            colsmi.setValueFromString(r, TheData)


vtable.fireTableStructureChanged()

Getting ChEMBL Compound Data Search

Now that we have a workspace containing all the molecules tested against a particular target, the next step in the analysis might be to select a particularly interesting molecule and see if there is any more biological data in ChEMBL associated with it.

To capture the desired ChEMBL ID we need to know the column and the particular cell containing the ID. To do this we can use an action from the user right-clicking on a cell to capture the contents.

taskID = col.getValueAsString(cell_row)

We use the ChEMBL ID followed by “BioProfile” as the label for a new workspace that will contain the results.

The data is returned in this format and can be parsed to populate a new workspace.

{"bioactivities": [{"reference": "Bioorg. Med. Chem. Lett., (2004) 14:9:2047", "target_chemblid": "CHEMBL1985", "target_name": "Glucagon receptor", "organism": "Homo sapiens", "ingredient_cmpd_chemblid": "CHEMBL63923", "value": "73", "operator": "=", "assay_chemblid": "CHEMBL680804", "parent_cmpd_chemblid": "CHEMBL63923", "units": "nM", "activity_comment": "Unspecified", "name_in_reference": "6k", "assay_description": "In vitro binding affinity against human glucagon receptor (h-GlucR) was determined", "bioactivity_type": "Ki", "assay_type": "B", "target_confidence": 8}, {"reference": "Bioorg. Med. Chem. Lett., (2004) 14:9:2047", "target_chemblid": "CHEMBL2097167", "target_name": "Adenylate cyclase", "organism": "Homo sapiens", "ingredient_cmpd_chemblid": "CHEMBL63923", "value": "2000", "operator": ">", "assay_chemblid": "CHEMBL645297", "parent_cmpd_chemblid": "CHEMBL63923", "units": "nM", "activity_comment": "Unspecified", "name_in_reference": "6k", "assay_description": "In vitro inhibitory activity against glucagon induced human adenylate cyclase", "bioactivity_type": "Ki", "assay_type": "B", "target_confidence": 4}]}
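The parsing for the compound profile can be sketched as below, using the two records above abridged to the fields that end up in the workspace. Two of the column headers differ from the JSON keys (“qual” holds the operator and “Reference” holds the reference), so the sketch keeps an explicit header-to-key mapping. `COLUMNS` and `profile_rows` are names invented for this illustration, and the standard `json` module stands in for Jyson.

```python
import json

# The two records from the example above, abridged to the fields that are used
response = '''{"bioactivities": [
 {"reference": "Bioorg. Med. Chem. Lett., (2004) 14:9:2047", "target_name": "Glucagon receptor",
  "organism": "Homo sapiens", "value": "73", "operator": "=", "parent_cmpd_chemblid": "CHEMBL63923",
  "units": "nM", "bioactivity_type": "Ki",
  "assay_description": "In vitro binding affinity against human glucagon receptor (h-GlucR) was determined"},
 {"reference": "Bioorg. Med. Chem. Lett., (2004) 14:9:2047", "target_name": "Adenylate cyclase",
  "organism": "Homo sapiens", "value": "2000", "operator": ">", "parent_cmpd_chemblid": "CHEMBL63923",
  "units": "nM", "bioactivity_type": "Ki",
  "assay_description": "In vitro inhibitory activity against glucagon induced human adenylate cyclase"}]}'''

# (column header, JSON key) pairs
COLUMNS = [
    ("parent_cmpd_chemblid", "parent_cmpd_chemblid"),
    ("target_name", "target_name"),
    ("bioactivity_type", "bioactivity_type"),
    ("qual", "operator"),
    ("value", "value"),
    ("units", "units"),
    ("assay_description", "assay_description"),
    ("organism", "organism"),
    ("Reference", "reference"),
]

def profile_rows(text):
    """Return (column headers, rows) for a compound's bioactivities."""
    headers = [header for header, _ in COLUMNS]
    rows = [[str(ba[key]) for _, key in COLUMNS]
            for ba in json.loads(text)["bioactivities"]]
    return headers, rows

headers, rows = profile_rows(response)
for row in rows:
    # target, activity type, qualifier, value and units
    print(" ".join([row[1], row[2], row[3], row[4], row[5]]))
```

Keeping the operator next to the value matters for records like the second one, where the result is a lower bound rather than a measured value.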

The result is shown below.

compoundData

The Vortex Script

#ChEMBL Compound Data Search
#Authored by Chris Swain (http://www.macinchem.org)
#All rights reserved.

# Python imports
import urllib2
import urllib
import csv
import sys
from com.xhaus.jyson import JysonCodec as json

# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import com.dotmatics.vortex.table.VortexTableModel as vtm
import jarray
import binascii
import string
import os

# Example search string
# http://www.ebi.ac.uk/chemblws/compounds/CHEMBL63923/bioactivities.json

input_label     = swing.JLabel("ChEMBLid column (for input)")
input_cb    = workspace.getColumnComboBox()

panel       = swing.JPanel()

layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb,    1, 0)

# Get name of column containing the compound ChEMBLID
ret = vortex.showInDialog(panel, "Choose ChEMBL_ID column")
input_idx = input_cb.getSelectedIndex()
col = vtable.getColumn(input_idx - 1)


#col = vtable.findColumnWithName('parent_cmpd_chemblid', 0)

if (col == None):
    vortex.alert('Load a workspace with a parent_cmpd_chemblid column please.')
    quit()
else:
    taskID = col.getValueAsString(cell_row)         
    #  taskID = "CHEMBL2095179"
    TableName = taskID + " BioProfile"

# Use this string in console for testing
# mystr = "http://www.ebi.ac.uk/chemblws/compounds/CHEMBL2095179/bioactivities.json"


mystr = "https://www.ebi.ac.uk/chemblws/compounds/" + taskID + "/bioactivities.json"

myreturn = urllib2.urlopen(mystr).read()

j = json.loads(myreturn)

rows = []
for ba in j['bioactivities']:
    values = [ba['parent_cmpd_chemblid'], ba['target_name'], ba['bioactivity_type'], ba['operator'], ba['value'], ba['units'], ba['assay_description'], ba['organism'], ba['reference']]
    row = [str(i) for i in values]
    rows.append(row)

#vortex.addTable("Bioactivities", csvstring, 0, 4, -1, 0)

column_names = ['parent_cmpd_chemblid', 'target_name', 'bioactivity_type', 'qual', 'value', 'units', 'assay_description', 'organism', 'Reference']

arrayToWorkspace(rows, column_names, TableName)


vtable.fireTableStructureChanged()

The four scripts can be downloaded from here.

These two scripts need to be added to the scripts folder.

ChEMBLid2SMILES.vpy
ChEMBLtargetfromUniprot.vpy

These two scripts need to be stored in the “context” folder, which is inside the “Vortex_Add-ons” folder.

ChEMBLTargetDataV1.vpy
ChEMBLCompoundDataV1.vpy

Page Updated 31 October 2014